From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([208.118.235.92]:40632) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1T9ipC-00024N-1z for qemu-devel@nongnu.org; Thu, 06 Sep 2012 16:37:19 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1T9ip9-0002fu-VY for qemu-devel@nongnu.org; Thu, 06 Sep 2012 16:37:17 -0400 Received: from mx1.redhat.com ([209.132.183.28]:47114) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1T9ip9-0002fh-MP for qemu-devel@nongnu.org; Thu, 06 Sep 2012 16:37:15 -0400 Message-ID: <50490975.8040702@redhat.com> Date: Thu, 06 Sep 2012 16:37:09 -0400 From: Jeff Cody MIME-Version: 1.0 References: <812f2aca86713b00473144bb272f014cc5e8a7f4.1346351079.git.jcody@redhat.com> <5048AC83.8050508@redhat.com> In-Reply-To: <5048AC83.8050508@redhat.com> Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [RFC v2 PATCH 2/6] block: add live block commit functionality Reply-To: jcody@redhat.com List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Kevin Wolf Cc: stefanha@gmail.com, supriyak@linux.vnet.ibm.com, eblake@redhat.com, qemu-devel@nongnu.org, pbonzini@redhat.com On 09/06/2012 10:00 AM, Kevin Wolf wrote: > Am 30.08.2012 20:47, schrieb Jeff Cody: >> This adds the live commit coroutine. This iteration focuses on the >> commit only below the active layer, and not the active layer itself. >> >> The behaviour is similar to block streaming; the sectors are walked >> through, and anything that exists above 'base' is committed back down >> into base. At the end, intermediate images are deleted, and the >> chain stitched together. Images are restored to their original open >> flags upon completion. >> >> Signed-off-by: Jeff Cody >> --- >> block/Makefile.objs | 1 + >> block/commit.c | 202 ++++++++++++++++++++++++++++++++++++++++++++++++++++ >> block_int.h | 19 +++++ >> trace-events | 2 + >> 4 files changed, 224 insertions(+) >> create mode 100644 block/commit.c >> >> diff --git a/block/Makefile.objs b/block/Makefile.objs >> index b5754d3..4a136b8 100644 >> --- a/block/Makefile.objs >> +++ b/block/Makefile.objs >> @@ -4,6 +4,7 @@ block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o >> block-obj-y += qed-check.o >> block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o >> block-obj-y += stream.o >> +block-obj-y += commit.o >> block-obj-$(CONFIG_WIN32) += raw-win32.o >> block-obj-$(CONFIG_POSIX) += raw-posix.o >> block-obj-$(CONFIG_LIBISCSI) += iscsi.o >> diff --git a/block/commit.c b/block/commit.c >> new file mode 100644 >> index 0000000..bd3d882 >> --- /dev/null >> +++ b/block/commit.c >> @@ -0,0 +1,202 @@ >> +/* >> + * Live block commit >> + * >> + * Copyright Red Hat, Inc. 2012 >> + * >> + * Authors: >> + * Jeff Cody >> + * Based on stream.c by Stefan Hajnoczi >> + * >> + * This work is licensed under the terms of the GNU LGPL, version 2 or later. >> + * See the COPYING.LIB file in the top-level directory. >> + * >> + */ >> + >> +#include "trace.h" >> +#include "block_int.h" >> +#include "qemu/ratelimit.h" >> + >> +enum { >> + /* >> + * Size of data buffer for populating the image file. This should be large >> + * enough to process multiple clusters in a single call, so that populating >> + * contiguous regions of the image is efficient. >> + */ >> + COMMIT_BUFFER_SIZE = 512 * 1024, /* in bytes */ >> +}; >> + >> +#define SLICE_TIME 100000000ULL /* ns */ >> + >> +typedef struct CommitBlockJob { >> + BlockJob common; >> + RateLimit limit; >> + BlockDriverState *active; >> + BlockDriverState *top; >> + BlockDriverState *base; >> + BlockErrorAction on_error; >> + int base_flags; >> + int top_flags; >> +} CommitBlockJob; >> + >> +static int coroutine_fn commit_populate(BlockDriverState *bs, >> + BlockDriverState *base, >> + int64_t sector_num, int nb_sectors, >> + void *buf) >> +{ >> + if (bdrv_read(bs, sector_num, buf, nb_sectors)) { >> + return -EIO; >> + } >> + if (bdrv_write(base, sector_num, buf, nb_sectors)) { >> + return -EIO; >> + } > > Don't throw error codes away. > OK, I'll pass the return from bdrv_read/write to the caller. > What should we do with backing files that are smaller than the image to > commit? In this version, data is copied up to the size of the backing > file, and then we get -EIO from bdrv_co_do_writev(). > We could leave it like that, and let it receive -EIO in that case. Alternatively, we could try and determine before the commit if the data will fit in the base, and return -ENOSPC if not. >> + return 0; >> +} >> + >> +static void coroutine_fn commit_run(void *opaque) >> +{ >> + CommitBlockJob *s = opaque; >> + BlockDriverState *active = s->active; >> + BlockDriverState *top = s->top; >> + BlockDriverState *base = s->base; >> + BlockDriverState *top_child = NULL; >> + int64_t sector_num, end; >> + int error = 0; >> + int ret = 0; >> + int n = 0; >> + void *buf; >> + int bytes_written = 0; >> + >> + s->common.len = bdrv_getlength(top); >> + if (s->common.len < 0) { >> + block_job_complete(&s->common, s->common.len); >> + return; >> + } >> + >> + top_child = bdrv_find_child(active, top); >> + >> + end = s->common.len >> BDRV_SECTOR_BITS; >> + buf = qemu_blockalign(top, COMMIT_BUFFER_SIZE); >> + >> + for (sector_num = 0; sector_num < end; sector_num += n) { >> + uint64_t delay_ns = 0; >> + bool copy; >> + >> +wait: >> + /* Note that even when no rate limit is applied we need to yield >> + * with no pending I/O here so that qemu_aio_flush() returns. >> + */ >> + block_job_sleep_ns(&s->common, rt_clock, delay_ns); >> + if (block_job_is_cancelled(&s->common)) { >> + break; >> + } >> + /* Copy if allocated above the base */ >> + ret = bdrv_co_is_allocated_above(top, base, sector_num, >> + COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE, >> + &n); >> + copy = (ret == 1); >> + trace_commit_one_iteration(s, sector_num, n, ret); >> + if (ret >= 0 && copy) { > > If ret == 1, it's automatically >= 0... > > By the way, I'm not sure if the interface is 0/1/-errno or > 0/positive/-errno. > The header for bdrv_co_is_allocated() states that it returns true if it is allocated, and false otherwise. However, the function actually returns true if allocated, false if not, and something < 0 on failure. But either way, you are right, I can just use 'copy' here. >> + if (s->common.speed) { >> + delay_ns = ratelimit_calculate_delay(&s->limit, n); >> + if (delay_ns > 0) { >> + goto wait; >> + } >> + } >> + ret = commit_populate(top, base, sector_num, n, buf); >> + bytes_written += n * BDRV_SECTOR_SIZE; >> + } >> + if (ret < 0) { >> + if (s->on_error == BLOCK_ERR_STOP_ANY || >> + s->on_error == BLOCK_ERR_STOP_ENOSPC) { > > Shouldn't there be a check for ret == -ENOSPC then...? And does this > error handling do anything useful if you can't pause the job? Wouldn't > it retry all the time? > We should check for ret == -ENOSPC, yes. On the error handling, we should probably just stop completely here, and return error. >> + n = 0; >> + continue; >> + } >> + if (error == 0) { >> + error = ret; >> + } >> + if (s->on_error == BLOCK_ERR_REPORT) { >> + break; >> + } >> + } >> + ret = 0; >> + >> + /* Publish progress */ >> + s->common.offset += n * BDRV_SECTOR_SIZE; >> + } >> + >> + if (!block_job_is_cancelled(&s->common) && sector_num == end && ret == 0) { >> + /* success */ >> + if (bdrv_delete_intermediate(active, top, base)) { >> + /* something went wrong! */ >> + /* TODO:add error reporting here */ > > Indeed :-) > >> + } >> + } >> + >> + /* restore base open flags here if appropriate (e.g., change the base back >> + * to r/o). These reopens do not need to be atomic, since we won't abort >> + * even on failure here */ >> + >> + if (s->base_flags != bdrv_get_flags(base)) { >> + bdrv_reopen(base, s->base_flags, NULL); >> + } >> + if (s->top_flags != bdrv_get_flags(top_child)) { >> + bdrv_reopen(top_child, s->top_flags, NULL); >> + } >> + >> + qemu_vfree(buf); >> + block_job_complete(&s->common, ret); >> +} >> + >> +static void commit_set_speed(BlockJob *job, int64_t speed, Error **errp) >> +{ >> + CommitBlockJob *s = container_of(job, CommitBlockJob, common); >> + >> + if (speed < 0) { >> + error_set(errp, QERR_INVALID_PARAMETER, "speed"); >> + return; >> + } >> + ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME); >> +} >> + >> +static BlockJobType commit_job_type = { >> + .instance_size = sizeof(CommitBlockJob), >> + .job_type = "commit", >> + .set_speed = commit_set_speed, >> +}; >> + >> +void commit_start(BlockDriverState *bs, BlockDriverState *base, >> + BlockDriverState *top, int64_t speed, >> + BlockErrorAction on_error, BlockDriverCompletionFunc *cb, >> + void *opaque, int orig_base_flags, int orig_top_flags, >> + Error **errp) >> +{ >> + CommitBlockJob *s; >> + >> + if ((on_error == BLOCK_ERR_STOP_ANY || >> + on_error == BLOCK_ERR_STOP_ENOSPC) && >> + !bdrv_iostatus_is_enabled(bs)) { >> + error_set(errp, QERR_INVALID_PARAMETER_COMBINATION); >> + return; >> + } >> + >> + s = block_job_create(&commit_job_type, bs, speed, cb, opaque, errp); >> + if (!s) { >> + return; >> + } >> + >> + s->base = base; >> + s->top = top; >> + s->active = bs; >> + >> + s->base_flags = orig_base_flags; >> + s->top_flags = orig_top_flags; > > So it's the caller who is expected to reopen r/w and then pass the > original flags in? Can't we do both of it here? > Yes, that would be cleaner - I'll move the reopen() to commit_start(). >> + >> + s->on_error = on_error; >> + s->common.co = qemu_coroutine_create(commit_run); >> + >> + trace_commit_start(bs, base, top, s, s->common.co, opaque, >> + orig_base_flags, orig_top_flags); >> + qemu_coroutine_enter(s->common.co, s); >> + >> + return; > > Unnecessary return. > > Kevin >