* [Qemu-devel] [PATCH v6 0/2] mirror: Improve zero write and discard
From: Fam Zheng @ 2015-11-20 10:12 UTC
To: qemu-devel; +Cc: Kevin Wolf, pbonzini, Jeff Cody, qemu-block, mreitz
v6: Address Paolo's comments in mirror_iteration:
- break if we've already got a chunk to work on;
- fix a typo in comment.
Patch 1 rewrites mirror_iteration. Patch 2 is a small DRY cleanup.
Fam Zheng (2):
mirror: Rewrite mirror_iteration
mirror: Add mirror_wait_for_io
block/mirror.c | 325 ++++++++++++++++++++++++++++++++++-----------------------
1 file changed, 196 insertions(+), 129 deletions(-)
--
2.4.3
* [Qemu-devel] [PATCH v6 1/2] mirror: Rewrite mirror_iteration
From: Fam Zheng @ 2015-11-20 10:12 UTC
To: qemu-devel; +Cc: Kevin Wolf, pbonzini, Jeff Cody, qemu-block, mreitz
The "pnum < nb_sectors" condition in deciding whether to actually copy
data is unnecessarily strict, and the qiov initialization is
unnecessary for bdrv_aio_write_zeroes and bdrv_aio_discard.
Rewrite mirror_iteration to fix both flaws.
Signed-off-by: Fam Zheng <famz@redhat.com>
---
v5: Address Max's review comments:
- Fix parameter name of mirror_do_read().
- Simplify the buffer waiting loop in mirror_do_read.
- Don't skip next dirty chunk when collecting consecutive dirty
chunks.
- Check sector range when collecting consecutive dirty chunks.
- Don't misuse a negative return value of
bdrv_get_block_status_above.
---
block/mirror.c | 307 +++++++++++++++++++++++++++++++++++----------------------
1 file changed, 187 insertions(+), 120 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index 52c9abf..ff8149d 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -45,7 +45,6 @@ typedef struct MirrorBlockJob {
BlockdevOnError on_source_error, on_target_error;
bool synced;
bool should_complete;
- int64_t sector_num;
int64_t granularity;
size_t buf_size;
int64_t bdev_length;
@@ -157,113 +156,76 @@ static void mirror_read_complete(void *opaque, int ret)
mirror_write_complete, op);
}
-static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
+/* Round sector_num and/or nb_sectors to target cluster if COW is needed, and
+ * return the offset of the adjusted ending sector against
+ * sector_num + nb_sectors. */
+static int mirror_cow_align(MirrorBlockJob *s,
+ int64_t *sector_num,
+ int *nb_sectors)
+{
+ bool head_need_cow, tail_need_cow;
+ int diff = 0;
+ int sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
+
+ head_need_cow = !test_bit(*sector_num / sectors_per_chunk, s->cow_bitmap);
+ tail_need_cow = !test_bit((*sector_num + *nb_sectors) / sectors_per_chunk,
+ s->cow_bitmap);
+ if (head_need_cow || tail_need_cow) {
+ int64_t rounded_sector_num;
+ int rounded_nb_sectors;
+ bdrv_round_to_clusters(s->target, *sector_num, *nb_sectors,
+ &rounded_sector_num, &rounded_nb_sectors);
+ assert(*sector_num >= rounded_sector_num);
+ assert(rounded_nb_sectors >= *nb_sectors);
+ if (tail_need_cow) {
+ int diff = rounded_sector_num + rounded_nb_sectors
+ - (*sector_num + *nb_sectors);
+ *nb_sectors += diff;
+ }
+ if (head_need_cow) {
+ int diff = *sector_num - rounded_sector_num;
+ *sector_num = rounded_sector_num;
+ *nb_sectors += diff;
+ }
+ }
+ return diff;
+}
+
+/* Submit async read while handling COW.
+ * Returns: nb_sectors if no alignment is necessary, or
+ * (new_end - sector_num) if tail is rounded up or down due to
+ * alignment or buffer limit.
+ */
+static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
+ int nb_sectors)
{
BlockDriverState *source = s->common.bs;
- int nb_sectors, sectors_per_chunk, nb_chunks;
- int64_t end, sector_num, next_chunk, next_sector, hbitmap_next_sector;
- uint64_t delay_ns = 0;
+ int sectors_per_chunk, nb_chunks;
+ int ret = nb_sectors;
MirrorOp *op;
- int pnum;
- int64_t ret;
- s->sector_num = hbitmap_iter_next(&s->hbi);
- if (s->sector_num < 0) {
- bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
- s->sector_num = hbitmap_iter_next(&s->hbi);
- trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
- assert(s->sector_num >= 0);
- }
-
- hbitmap_next_sector = s->sector_num;
- sector_num = s->sector_num;
sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
- end = s->bdev_length / BDRV_SECTOR_SIZE;
- /* Extend the QEMUIOVector to include all adjacent blocks that will
- * be copied in this operation.
- *
- * We have to do this if we have no backing file yet in the destination,
- * and the cluster size is very large. Then we need to do COW ourselves.
- * The first time a cluster is copied, copy it entirely. Note that,
- * because both the granularity and the cluster size are powers of two,
- * the number of sectors to copy cannot exceed one cluster.
- *
- * We also want to extend the QEMUIOVector to include more adjacent
- * dirty blocks if possible, to limit the number of I/O operations and
- * run efficiently even with a small granularity.
- */
- nb_chunks = 0;
- nb_sectors = 0;
- next_sector = sector_num;
- next_chunk = sector_num / sectors_per_chunk;
+ if (s->cow_bitmap) {
+ ret += mirror_cow_align(s, &sector_num, &nb_sectors);
+ }
+ /* We can only handle as much as buf_size at a time. */
+ nb_sectors = MIN(s->buf_size >> BDRV_SECTOR_BITS, nb_sectors);
+ assert(nb_sectors);
+ /* The sector range must meet granularity because:
+ * 1) Caller passes in aligned values;
+ * 2) mirror_cow_align is used only when target cluster is larger. */
+ assert(!(nb_sectors % sectors_per_chunk));
+ assert(!(sector_num % sectors_per_chunk));
+ nb_chunks = nb_sectors / sectors_per_chunk;
- /* Wait for I/O to this cluster (from a previous iteration) to be done. */
- while (test_bit(next_chunk, s->in_flight_bitmap)) {
+ while (s->buf_free_count < nb_chunks) {
trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
s->waiting_for_io = true;
qemu_coroutine_yield();
s->waiting_for_io = false;
}
- do {
- int added_sectors, added_chunks;
-
- if (!bdrv_get_dirty(source, s->dirty_bitmap, next_sector) ||
- test_bit(next_chunk, s->in_flight_bitmap)) {
- assert(nb_sectors > 0);
- break;
- }
-
- added_sectors = sectors_per_chunk;
- if (s->cow_bitmap && !test_bit(next_chunk, s->cow_bitmap)) {
- bdrv_round_to_clusters(s->target,
- next_sector, added_sectors,
- &next_sector, &added_sectors);
-
- /* On the first iteration, the rounding may make us copy
- * sectors before the first dirty one.
- */
- if (next_sector < sector_num) {
- assert(nb_sectors == 0);
- sector_num = next_sector;
- next_chunk = next_sector / sectors_per_chunk;
- }
- }
-
- added_sectors = MIN(added_sectors, end - (sector_num + nb_sectors));
- added_chunks = (added_sectors + sectors_per_chunk - 1) / sectors_per_chunk;
-
- /* When doing COW, it may happen that there is not enough space for
- * a full cluster. Wait if that is the case.
- */
- while (nb_chunks == 0 && s->buf_free_count < added_chunks) {
- trace_mirror_yield_buf_busy(s, nb_chunks, s->in_flight);
- s->waiting_for_io = true;
- qemu_coroutine_yield();
- s->waiting_for_io = false;
- }
- if (s->buf_free_count < nb_chunks + added_chunks) {
- trace_mirror_break_buf_busy(s, nb_chunks, s->in_flight);
- break;
- }
- if (IOV_MAX < nb_chunks + added_chunks) {
- trace_mirror_break_iov_max(s, nb_chunks, added_chunks);
- break;
- }
-
- /* We have enough free space to copy these sectors. */
- bitmap_set(s->in_flight_bitmap, next_chunk, added_chunks);
-
- nb_sectors += added_sectors;
- nb_chunks += added_chunks;
- next_sector += added_sectors;
- next_chunk += added_chunks;
- if (!s->synced && s->common.speed) {
- delay_ns = ratelimit_calculate_delay(&s->limit, added_sectors);
- }
- } while (delay_ns == 0 && next_sector < end);
-
/* Allocate a MirrorOp that is used as an AIO callback. */
op = g_new(MirrorOp, 1);
op->s = s;
@@ -274,47 +236,152 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
* from s->buf_free.
*/
qemu_iovec_init(&op->qiov, nb_chunks);
- next_sector = sector_num;
while (nb_chunks-- > 0) {
MirrorBuffer *buf = QSIMPLEQ_FIRST(&s->buf_free);
- size_t remaining = (nb_sectors * BDRV_SECTOR_SIZE) - op->qiov.size;
+ size_t remaining = nb_sectors * BDRV_SECTOR_SIZE - op->qiov.size;
QSIMPLEQ_REMOVE_HEAD(&s->buf_free, next);
s->buf_free_count--;
qemu_iovec_add(&op->qiov, buf, MIN(s->granularity, remaining));
-
- /* Advance the HBitmapIter in parallel, so that we do not examine
- * the same sector twice.
- */
- if (next_sector > hbitmap_next_sector
- && bdrv_get_dirty(source, s->dirty_bitmap, next_sector)) {
- hbitmap_next_sector = hbitmap_iter_next(&s->hbi);
- }
-
- next_sector += sectors_per_chunk;
}
- bdrv_reset_dirty_bitmap(s->dirty_bitmap, sector_num, nb_sectors);
-
/* Copy the dirty cluster. */
s->in_flight++;
s->sectors_in_flight += nb_sectors;
trace_mirror_one_iteration(s, sector_num, nb_sectors);
- ret = bdrv_get_block_status_above(source, NULL, sector_num,
- nb_sectors, &pnum);
- if (ret < 0 || pnum < nb_sectors ||
- (ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) {
- bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
- mirror_read_complete, op);
- } else if (ret & BDRV_BLOCK_ZERO) {
+ bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
+ mirror_read_complete, op);
+ return ret;
+}
+
+static void mirror_do_zero_or_discard(MirrorBlockJob *s,
+ int64_t sector_num,
+ int nb_sectors,
+ bool is_discard)
+{
+ MirrorOp *op;
+
+ /* Allocate a MirrorOp that is used as an AIO callback. The qiov is zeroed
+ * so the freeing in mirror_iteration_done is a nop. */
+ op = g_new0(MirrorOp, 1);
+ op->s = s;
+ op->sector_num = sector_num;
+ op->nb_sectors = nb_sectors;
+
+ s->in_flight++;
+ s->sectors_in_flight += nb_sectors;
+ if (is_discard) {
+ bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
+ mirror_write_complete, op);
+ } else {
bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
mirror_write_complete, op);
- } else {
- assert(!(ret & BDRV_BLOCK_DATA));
- bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
- mirror_write_complete, op);
+ }
+}
+
+static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
+{
+ BlockDriverState *source = s->common.bs;
+ int64_t sector_num;
+ uint64_t delay_ns = 0;
+ /* At least the first dirty chunk is mirrored in one iteration. */
+ int nb_chunks = 1;
+ int64_t end = s->bdev_length / BDRV_SECTOR_SIZE;
+ int sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
+
+ sector_num = hbitmap_iter_next(&s->hbi);
+ if (sector_num < 0) {
+ bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
+ sector_num = hbitmap_iter_next(&s->hbi);
+ trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
+ assert(sector_num >= 0);
+ }
+
+ /* Find the number of consecutive dirty chunks following the first dirty
+ * one, and wait for in flight requests in them. */
+ while (nb_chunks * sectors_per_chunk < (s->buf_size >> BDRV_SECTOR_BITS)) {
+ int64_t hbitmap_next;
+ int64_t next_sector = sector_num + nb_chunks * sectors_per_chunk;
+ int64_t next_chunk = next_sector / sectors_per_chunk;
+ if (next_sector >= end ||
+ !bdrv_get_dirty(source, s->dirty_bitmap, next_sector)) {
+ break;
+ }
+ if (test_bit(next_chunk, s->in_flight_bitmap)) {
+ if (nb_chunks > 0) {
+ break;
+ }
+ trace_mirror_yield_in_flight(s, next_sector, s->in_flight);
+ s->waiting_for_io = true;
+ qemu_coroutine_yield();
+ s->waiting_for_io = false;
+ /* Now retry. */
+ } else {
+ hbitmap_next = hbitmap_iter_next(&s->hbi);
+ assert(hbitmap_next == next_sector);
+ nb_chunks++;
+ }
+ }
+
+ /* Clear dirty bits before querying the block status, because
+ * calling bdrv_get_block_status_above could yield - if some blocks are
+ * marked dirty in this window, we need to know.
+ */
+ bdrv_reset_dirty_bitmap(s->dirty_bitmap, sector_num,
+ nb_chunks * sectors_per_chunk);
+ bitmap_set(s->in_flight_bitmap, sector_num / sectors_per_chunk, nb_chunks);
+ while (nb_chunks > 0 && sector_num < end) {
+ int ret;
+ int io_sectors;
+ enum MirrorMethod {
+ MIRROR_METHOD_COPY,
+ MIRROR_METHOD_ZERO,
+ MIRROR_METHOD_DISCARD
+ } mirror_method = MIRROR_METHOD_COPY;
+
+ assert(!(sector_num % sectors_per_chunk));
+ ret = bdrv_get_block_status_above(source, NULL, sector_num,
+ nb_chunks * sectors_per_chunk,
+ &io_sectors);
+ if (ret < 0) {
+ io_sectors = nb_chunks * sectors_per_chunk;
+ }
+
+ io_sectors -= io_sectors % sectors_per_chunk;
+ if (io_sectors < sectors_per_chunk) {
+ io_sectors = sectors_per_chunk;
+ } else if (ret > 0 && !(ret & BDRV_BLOCK_DATA)) {
+ int64_t target_sector_num;
+ int target_nb_sectors;
+ bdrv_round_to_clusters(s->target, sector_num, io_sectors,
+ &target_sector_num, &target_nb_sectors);
+ if (target_sector_num == sector_num &&
+ target_nb_sectors == io_sectors) {
+ mirror_method = ret & BDRV_BLOCK_ZERO ?
+ MIRROR_METHOD_ZERO :
+ MIRROR_METHOD_DISCARD;
+ }
+ }
+
+ switch (mirror_method) {
+ case MIRROR_METHOD_COPY:
+ io_sectors = mirror_do_read(s, sector_num, io_sectors);
+ break;
+ case MIRROR_METHOD_ZERO:
+ mirror_do_zero_or_discard(s, sector_num, io_sectors, false);
+ break;
+ case MIRROR_METHOD_DISCARD:
+ mirror_do_zero_or_discard(s, sector_num, io_sectors, true);
+ break;
+ default:
+ abort();
+ }
+ assert(io_sectors);
+ sector_num += io_sectors;
+ nb_chunks -= io_sectors / sectors_per_chunk;
+ delay_ns += ratelimit_calculate_delay(&s->limit, io_sectors);
}
return delay_ns;
}
--
2.4.3
* [Qemu-devel] [PATCH v6 2/2] mirror: Add mirror_wait_for_io
From: Fam Zheng @ 2015-11-20 10:12 UTC
To: qemu-devel; +Cc: Kevin Wolf, pbonzini, Jeff Cody, qemu-block, mreitz
The three lines are duplicated a number of times now; refactor them into
a function.
Signed-off-by: Fam Zheng <famz@redhat.com>
---
block/mirror.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index ff8149d..ea5a76a 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -191,6 +191,14 @@ static int mirror_cow_align(MirrorBlockJob *s,
return diff;
}
+static inline void mirror_wait_for_io(MirrorBlockJob *s)
+{
+ assert(!s->waiting_for_io);
+ s->waiting_for_io = true;
+ qemu_coroutine_yield();
+ s->waiting_for_io = false;
+}
+
/* Submit async read while handling COW.
* Returns: nb_sectors if no alignment is necessary, or
* (new_end - sector_num) if tail is rounded up or down due to
@@ -221,9 +229,7 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
while (s->buf_free_count < nb_chunks) {
trace_mirror_yield_in_flight(s, sector_num, s->in_flight);
- s->waiting_for_io = true;
- qemu_coroutine_yield();
- s->waiting_for_io = false;
+ mirror_wait_for_io(s);
}
/* Allocate a MirrorOp that is used as an AIO callback. */
@@ -314,9 +320,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
break;
}
trace_mirror_yield_in_flight(s, next_sector, s->in_flight);
- s->waiting_for_io = true;
- qemu_coroutine_yield();
- s->waiting_for_io = false;
+ mirror_wait_for_io(s);
/* Now retry. */
} else {
hbitmap_next = hbitmap_iter_next(&s->hbi);
@@ -406,9 +410,7 @@ static void mirror_free_init(MirrorBlockJob *s)
static void mirror_drain(MirrorBlockJob *s)
{
while (s->in_flight > 0) {
- s->waiting_for_io = true;
- qemu_coroutine_yield();
- s->waiting_for_io = false;
+ mirror_wait_for_io(s);
}
}
@@ -583,9 +585,7 @@ static void coroutine_fn mirror_run(void *opaque)
if (s->in_flight == MAX_IN_FLIGHT || s->buf_free_count == 0 ||
(cnt == 0 && s->in_flight > 0)) {
trace_mirror_yield(s, s->in_flight, s->buf_free_count, cnt);
- s->waiting_for_io = true;
- qemu_coroutine_yield();
- s->waiting_for_io = false;
+ mirror_wait_for_io(s);
continue;
} else if (cnt != 0) {
delay_ns = mirror_iteration(s);
--
2.4.3
* Re: [Qemu-devel] [PATCH v6 1/2] mirror: Rewrite mirror_iteration
From: Max Reitz @ 2015-12-18 16:41 UTC
To: Fam Zheng, qemu-devel; +Cc: Kevin Wolf, pbonzini, Jeff Cody, qemu-block
On 20.11.2015 11:12, Fam Zheng wrote:
> The "pnum < nb_sectors" condition in deciding whether to actually copy
> data is unnecessarily strict, and the qiov initialization is
> unnecessary for bdrv_aio_write_zeroes and bdrv_aio_discard.
>
> Rewrite mirror_iteration to fix both flaws.
>
> Signed-off-by: Fam Zheng <famz@redhat.com>
>
> ---
> v5: Address Max's review comments:
> - Fix parameter name of mirror_do_read().
> - Simplify the buffer waiting loop in mirror_do_read.
> - Don't skip next dirty chunk when collecting consecutive dirty
> chunks.
> - Check sector range when collecting consecutive dirty chunks.
> - Don't misuse a negative return value of
> bdrv_get_block_status_above.
> ---
> block/mirror.c | 307 +++++++++++++++++++++++++++++++++++----------------------
> 1 file changed, 187 insertions(+), 120 deletions(-)
>
> diff --git a/block/mirror.c b/block/mirror.c
> index 52c9abf..ff8149d 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -45,7 +45,6 @@ typedef struct MirrorBlockJob {
> BlockdevOnError on_source_error, on_target_error;
> bool synced;
> bool should_complete;
> - int64_t sector_num;
> int64_t granularity;
> size_t buf_size;
> int64_t bdev_length;
> @@ -157,113 +156,76 @@ static void mirror_read_complete(void *opaque, int ret)
> mirror_write_complete, op);
> }
>
> -static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
> +/* Round sector_num and/or nb_sectors to target cluster if COW is needed, and
> + * return the offset of the adjusted ending sector against
> + * sector_num + nb_sectors. */
> +static int mirror_cow_align(MirrorBlockJob *s,
> + int64_t *sector_num,
> + int *nb_sectors)
> +{
> + bool head_need_cow, tail_need_cow;
> + int diff = 0;
> + int sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> +
> + head_need_cow = !test_bit(*sector_num / sectors_per_chunk, s->cow_bitmap);
> + tail_need_cow = !test_bit((*sector_num + *nb_sectors) / sectors_per_chunk,
Should this be (*sector_num + *nb_sectors - 1) so that we actually check
the last chunk of the sector range (instead of the next chunk if
*sector_num + *nb_sectors is divisible by sectors_per_chunk)?
> + s->cow_bitmap);
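That is, something like this sketch:

    tail_need_cow = !test_bit((*sector_num + *nb_sectors - 1) / sectors_per_chunk,
                              s->cow_bitmap);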
> + if (head_need_cow || tail_need_cow) {
> + int64_t rounded_sector_num;
> + int rounded_nb_sectors;
> + bdrv_round_to_clusters(s->target, *sector_num, *nb_sectors,
> + &rounded_sector_num, &rounded_nb_sectors);
> + assert(*sector_num >= rounded_sector_num);
> + assert(rounded_nb_sectors >= *nb_sectors);
You could move these assertions into the following conditional blocks;
that would make more sense to me:
> + if (tail_need_cow) {
> + int diff = rounded_sector_num + rounded_nb_sectors
> + - (*sector_num + *nb_sectors);
assert(diff >= 0);
Also, I don't like shadowing of variables very much.
> + *nb_sectors += diff;
> + }
> + if (head_need_cow) {
> + int diff = *sector_num - rounded_sector_num;
assert(diff >= 0);
> + *sector_num = rounded_sector_num;
> + *nb_sectors += diff;
> + }
> + }
> + return diff;
diff is always 0.
(This is why I don't like shadowing of variables very much.)
> +}
> +
> +/* Submit async read while handling COW.
> + * Returns: nb_sectors if no alignment is necessary, or
> + * (new_end - sector_num) if tail is rounded up or down due to
> + * alignment or buffer limit.
> + */
> +static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
> + int nb_sectors)
> {
> BlockDriverState *source = s->common.bs;
> - int nb_sectors, sectors_per_chunk, nb_chunks;
> - int64_t end, sector_num, next_chunk, next_sector, hbitmap_next_sector;
> - uint64_t delay_ns = 0;
> + int sectors_per_chunk, nb_chunks;
> + int ret = nb_sectors;
> MirrorOp *op;
> - int pnum;
> - int64_t ret;
>
> - s->sector_num = hbitmap_iter_next(&s->hbi);
> - if (s->sector_num < 0) {
> - bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
> - s->sector_num = hbitmap_iter_next(&s->hbi);
> - trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
> - assert(s->sector_num >= 0);
> - }
> -
> - hbitmap_next_sector = s->sector_num;
> - sector_num = s->sector_num;
> sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> - end = s->bdev_length / BDRV_SECTOR_SIZE;
>
> - /* Extend the QEMUIOVector to include all adjacent blocks that will
> - * be copied in this operation.
> - *
> - * We have to do this if we have no backing file yet in the destination,
> - * and the cluster size is very large. Then we need to do COW ourselves.
> - * The first time a cluster is copied, copy it entirely. Note that,
> - * because both the granularity and the cluster size are powers of two,
> - * the number of sectors to copy cannot exceed one cluster.
> - *
> - * We also want to extend the QEMUIOVector to include more adjacent
> - * dirty blocks if possible, to limit the number of I/O operations and
> - * run efficiently even with a small granularity.
> - */
> - nb_chunks = 0;
> - nb_sectors = 0;
> - next_sector = sector_num;
> - next_chunk = sector_num / sectors_per_chunk;
> + if (s->cow_bitmap) {
> + ret += mirror_cow_align(s, &sector_num, &nb_sectors);
mirror_cow_align() always returns 0, but I assume it is supposed to
return the difference of nb_sectors before and after the call (in which
case this line is correct).
> + }
> + /* We can only handle as much as buf_size at a time. */
> + nb_sectors = MIN(s->buf_size >> BDRV_SECTOR_BITS, nb_sectors);
> + assert(nb_sectors);
Maybe we should move these three lines before the mirror_cow_align()
call. I don't think it will make a difference in practice, but it seems
cleaner to me that way.
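Something like this (just a sketch of the reordering):

    /* We can only handle as much as buf_size at a time. */
    nb_sectors = MIN(s->buf_size >> BDRV_SECTOR_BITS, nb_sectors);
    assert(nb_sectors);
    if (s->cow_bitmap) {
        ret += mirror_cow_align(s, &sector_num, &nb_sectors);
    }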
> + /* The sector range must meet granularity because:
> + * 1) Caller passes in aligned values;
> + * 2) mirror_cow_align is used only when target cluster is larger. */
> + assert(!(nb_sectors % sectors_per_chunk));
> + assert(!(sector_num % sectors_per_chunk));
> + nb_chunks = nb_sectors / sectors_per_chunk;
[...]
> +static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
> +{
[...]
> + while (nb_chunks > 0 && sector_num < end) {
> + int ret;
> + int io_sectors;
> + enum MirrorMethod {
> + MIRROR_METHOD_COPY,
> + MIRROR_METHOD_ZERO,
> + MIRROR_METHOD_DISCARD
> + } mirror_method = MIRROR_METHOD_COPY;
> +
> + assert(!(sector_num % sectors_per_chunk));
> + ret = bdrv_get_block_status_above(source, NULL, sector_num,
> + nb_chunks * sectors_per_chunk,
> + &io_sectors);
> + if (ret < 0) {
> + io_sectors = nb_chunks * sectors_per_chunk;
> + }
> +
> + io_sectors -= io_sectors % sectors_per_chunk;
> + if (io_sectors < sectors_per_chunk) {
> + io_sectors = sectors_per_chunk;
> + } else if (ret > 0 && !(ret & BDRV_BLOCK_DATA)) {
Why > 0 and not >= 0?
Max
> + int64_t target_sector_num;
> + int target_nb_sectors;
> + bdrv_round_to_clusters(s->target, sector_num, io_sectors,
> + &target_sector_num, &target_nb_sectors);
> + if (target_sector_num == sector_num &&
> + target_nb_sectors == io_sectors) {
> + mirror_method = ret & BDRV_BLOCK_ZERO ?
> + MIRROR_METHOD_ZERO :
> + MIRROR_METHOD_DISCARD;
> + }
> + }
> +
> + switch (mirror_method) {
> + case MIRROR_METHOD_COPY:
> + io_sectors = mirror_do_read(s, sector_num, io_sectors);
> + break;
> + case MIRROR_METHOD_ZERO:
> + mirror_do_zero_or_discard(s, sector_num, io_sectors, false);
> + break;
> + case MIRROR_METHOD_DISCARD:
> + mirror_do_zero_or_discard(s, sector_num, io_sectors, true);
> + break;
> + default:
> + abort();
> + }
> + assert(io_sectors);
> + sector_num += io_sectors;
> + nb_chunks -= io_sectors / sectors_per_chunk;
> + delay_ns += ratelimit_calculate_delay(&s->limit, io_sectors);
> }
> return delay_ns;
> }
>
* Re: [Qemu-devel] [PATCH v6 2/2] mirror: Add mirror_wait_for_io
From: Max Reitz @ 2015-12-18 16:42 UTC
To: Fam Zheng, qemu-devel; +Cc: Kevin Wolf, pbonzini, Jeff Cody, qemu-block
On 20.11.2015 11:12, Fam Zheng wrote:
> The three lines are duplicated a number of times now; refactor them into
> a function.
>
> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
> block/mirror.c | 24 ++++++++++++------------
> 1 file changed, 12 insertions(+), 12 deletions(-)
Reviewed-by: Max Reitz <mreitz@redhat.com>
* Re: [Qemu-devel] [PATCH v6 1/2] mirror: Rewrite mirror_iteration
From: Fam Zheng @ 2015-12-23 3:53 UTC
To: Max Reitz; +Cc: Kevin Wolf, pbonzini, Jeff Cody, qemu-devel, qemu-block
On Fri, 12/18 17:41, Max Reitz wrote:
> On 20.11.2015 11:12, Fam Zheng wrote:
> > The "pnum < nb_sectors" condition in deciding whether to actually copy
> > data is unnecessarily strict, and the qiov initialization is
> > unnecessary for bdrv_aio_write_zeroes and bdrv_aio_discard.
> >
> > Rewrite mirror_iteration to fix both flaws.
> >
> > Signed-off-by: Fam Zheng <famz@redhat.com>
> >
> > ---
> > v5: Address Max's review comments:
> > - Fix parameter name of mirror_do_read().
> > - Simplify the buffer waiting loop in mirror_do_read.
> > - Don't skip next dirty chunk when collecting consecutive dirty
> > chunks.
> > - Check sector range when collecting consecutive dirty chunks.
> > - Don't misuse a negative return value of
> > bdrv_get_block_status_above.
> > ---
> > block/mirror.c | 307 +++++++++++++++++++++++++++++++++++----------------------
> > 1 file changed, 187 insertions(+), 120 deletions(-)
> >
> > diff --git a/block/mirror.c b/block/mirror.c
> > index 52c9abf..ff8149d 100644
> > --- a/block/mirror.c
> > +++ b/block/mirror.c
> > @@ -45,7 +45,6 @@ typedef struct MirrorBlockJob {
> > BlockdevOnError on_source_error, on_target_error;
> > bool synced;
> > bool should_complete;
> > - int64_t sector_num;
> > int64_t granularity;
> > size_t buf_size;
> > int64_t bdev_length;
> > @@ -157,113 +156,76 @@ static void mirror_read_complete(void *opaque, int ret)
> > mirror_write_complete, op);
> > }
> >
> > -static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
> > +/* Round sector_num and/or nb_sectors to target cluster if COW is needed, and
> > + * return the offset of the adjusted ending sector against
> > + * sector_num + nb_sectors. */
> > +static int mirror_cow_align(MirrorBlockJob *s,
> > + int64_t *sector_num,
> > + int *nb_sectors)
> > +{
> > + bool head_need_cow, tail_need_cow;
> > + int diff = 0;
> > + int sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> > +
> > + head_need_cow = !test_bit(*sector_num / sectors_per_chunk, s->cow_bitmap);
> > + tail_need_cow = !test_bit((*sector_num + *nb_sectors) / sectors_per_chunk,
>
> Should this be (*sector_num + *nb_sectors - 1) so that we actually check
> the last chunk of the sector range (instead of the next chunk if
> *sector_num + *nb_sectors is divisible by sectors_per_chunk)?
Yes, you're right.
>
> > + s->cow_bitmap);
> > + if (head_need_cow || tail_need_cow) {
> > + int64_t rounded_sector_num;
> > + int rounded_nb_sectors;
> > + bdrv_round_to_clusters(s->target, *sector_num, *nb_sectors,
> > + &rounded_sector_num, &rounded_nb_sectors);
> > + assert(*sector_num >= rounded_sector_num);
> > + assert(rounded_nb_sectors >= *nb_sectors);
>
> You could move these assertions into the following conditional blocks;
> that would make more sense to me:
>
> > + if (tail_need_cow) {
> > + int diff = rounded_sector_num + rounded_nb_sectors
> > + - (*sector_num + *nb_sectors);
>
> assert(diff >= 0);
>
> Also, I don't like shadowing of variables very much.
>
> > + *nb_sectors += diff;
> > + }
> > + if (head_need_cow) {
> > + int diff = *sector_num - rounded_sector_num;
>
> assert(diff >= 0);
Okay.
>
> > + *sector_num = rounded_sector_num;
> > + *nb_sectors += diff;
> > + }
> > + }
> > + return diff;
>
> diff is always 0.
>
> (This is why I don't like shadowing of variables very much.)
The first shadowing is a mistake, will fix, and I'll also rename the second.
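Roughly like this (a sketch of the planned fix, with your asserts folded in;
head_diff is just a placeholder name):

    if (tail_need_cow) {
        /* Assign to the function-scope diff instead of shadowing it. */
        diff = rounded_sector_num + rounded_nb_sectors
               - (*sector_num + *nb_sectors);
        assert(diff >= 0);
        *nb_sectors += diff;
    }
    if (head_need_cow) {
        int head_diff = *sector_num - rounded_sector_num;
        assert(head_diff >= 0);
        *sector_num = rounded_sector_num;
        *nb_sectors += head_diff;
    }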
>
> > +}
> > +
> > +/* Submit async read while handling COW.
> > + * Returns: nb_sectors if no alignment is necessary, or
> > + * (new_end - sector_num) if tail is rounded up or down due to
> > + * alignment or buffer limit.
> > + */
> > +static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
> > + int nb_sectors)
> > {
> > BlockDriverState *source = s->common.bs;
> > - int nb_sectors, sectors_per_chunk, nb_chunks;
> > - int64_t end, sector_num, next_chunk, next_sector, hbitmap_next_sector;
> > - uint64_t delay_ns = 0;
> > + int sectors_per_chunk, nb_chunks;
> > + int ret = nb_sectors;
> > MirrorOp *op;
> > - int pnum;
> > - int64_t ret;
> >
> > - s->sector_num = hbitmap_iter_next(&s->hbi);
> > - if (s->sector_num < 0) {
> > - bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
> > - s->sector_num = hbitmap_iter_next(&s->hbi);
> > - trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
> > - assert(s->sector_num >= 0);
> > - }
> > -
> > - hbitmap_next_sector = s->sector_num;
> > - sector_num = s->sector_num;
> > sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
> > - end = s->bdev_length / BDRV_SECTOR_SIZE;
> >
> > - /* Extend the QEMUIOVector to include all adjacent blocks that will
> > - * be copied in this operation.
> > - *
> > - * We have to do this if we have no backing file yet in the destination,
> > - * and the cluster size is very large. Then we need to do COW ourselves.
> > - * The first time a cluster is copied, copy it entirely. Note that,
> > - * because both the granularity and the cluster size are powers of two,
> > - * the number of sectors to copy cannot exceed one cluster.
> > - *
> > - * We also want to extend the QEMUIOVector to include more adjacent
> > - * dirty blocks if possible, to limit the number of I/O operations and
> > - * run efficiently even with a small granularity.
> > - */
> > - nb_chunks = 0;
> > - nb_sectors = 0;
> > - next_sector = sector_num;
> > - next_chunk = sector_num / sectors_per_chunk;
> > + if (s->cow_bitmap) {
> > + ret += mirror_cow_align(s, &sector_num, &nb_sectors);
>
> mirror_cow_align() always returns 0, but I assume it is supposed to
> return the difference of nb_sectors before and after the call (in which
> case this line is correct).
Yes, except it is supposed to return the "progress" (in case of rounding to
the target cluster, counting up to the last copied sector instead of nb_sectors).
>
> > + }
> > + /* We can only handle as much as buf_size at a time. */
> > + nb_sectors = MIN(s->buf_size >> BDRV_SECTOR_BITS, nb_sectors);
> > + assert(nb_sectors);
>
> Maybe we should move these three lines before the mirror_cow_align()
> call. I don't think it will make a difference in practice, but it seems
> cleaner to me that way.
>
> > + /* The sector range must meet granularity because:
> > + * 1) Caller passes in aligned values;
> > + * 2) mirror_cow_align is used only when target cluster is larger. */
> > + assert(!(nb_sectors % sectors_per_chunk));
> > + assert(!(sector_num % sectors_per_chunk));
> > + nb_chunks = nb_sectors / sectors_per_chunk;
>
> [...]
>
> > +static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
> > +{
>
> [...]
>
> > + while (nb_chunks > 0 && sector_num < end) {
> > + int ret;
> > + int io_sectors;
> > + enum MirrorMethod {
> > + MIRROR_METHOD_COPY,
> > + MIRROR_METHOD_ZERO,
> > + MIRROR_METHOD_DISCARD
> > + } mirror_method = MIRROR_METHOD_COPY;
> > +
> > + assert(!(sector_num % sectors_per_chunk));
> > + ret = bdrv_get_block_status_above(source, NULL, sector_num,
> > + nb_chunks * sectors_per_chunk,
> > + &io_sectors);
> > + if (ret < 0) {
> > + io_sectors = nb_chunks * sectors_per_chunk;
> > + }
> > +
> > + io_sectors -= io_sectors % sectors_per_chunk;
> > + if (io_sectors < sectors_per_chunk) {
> > + io_sectors = sectors_per_chunk;
> > + } else if (ret > 0 && !(ret & BDRV_BLOCK_DATA)) {
>
> Why > 0 and not >= 0?
Good idea, >= 0 looks better (0 means unknown or unallocated, and discard is
appropriate).
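I.e. the check would become (sketch):

    } else if (ret >= 0 && !(ret & BDRV_BLOCK_DATA)) {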
Fam
>
> Max
>
> > + int64_t target_sector_num;
> > + int target_nb_sectors;
> > + bdrv_round_to_clusters(s->target, sector_num, io_sectors,
> > + &target_sector_num, &target_nb_sectors);
> > + if (target_sector_num == sector_num &&
> > + target_nb_sectors == io_sectors) {
> > + mirror_method = ret & BDRV_BLOCK_ZERO ?
> > + MIRROR_METHOD_ZERO :
> > + MIRROR_METHOD_DISCARD;
> > + }
> > + }
> > +
> > + switch (mirror_method) {
> > + case MIRROR_METHOD_COPY:
> > + io_sectors = mirror_do_read(s, sector_num, io_sectors);
> > + break;
> > + case MIRROR_METHOD_ZERO:
> > + mirror_do_zero_or_discard(s, sector_num, io_sectors, false);
> > + break;
> > + case MIRROR_METHOD_DISCARD:
> > + mirror_do_zero_or_discard(s, sector_num, io_sectors, true);
> > + break;
> > + default:
> > + abort();
> > + }
> > + assert(io_sectors);
> > + sector_num += io_sectors;
> > + nb_chunks -= io_sectors / sectors_per_chunk;
> > + delay_ns += ratelimit_calculate_delay(&s->limit, io_sectors);
> > }
> > return delay_ns;
> > }
> >
>
>