From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <4F4BB712.8090005@redhat.com>
Date: Mon, 27 Feb 2012 12:02:10 -0500
From: Jeff Cody
Reply-To: jcody@redhat.com
MIME-Version: 1.0
References: <4F4BA56B.7040608@redhat.com>
In-Reply-To: <4F4BA56B.7040608@redhat.com>
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command
To: Kevin Wolf
Cc: pbonzini@redhat.com, eblake@redhat.com, qemu-devel@nongnu.org, armbru@redhat.com, lcapitulino@redhat.com

On 02/27/2012 10:46 AM, Kevin Wolf wrote:
> Am 26.02.2012 20:31, schrieb Jeff Cody:
>> This is a QAPI/QMP only command to take a snapshot of a group of
>> devices. This is similar to the blockdev-snapshot-sync command, except
>> blockdev-group-snapshot-sync accepts a list of devices, filenames, and
>> formats.
>>
>> An attempt is made to keep the group snapshot atomic; if the
>> creation or open of any of the new snapshots fails, then all of
>> the new snapshots are abandoned, and the name of the snapshot image
>> that failed is returned. The failure case should not interrupt
>> any operations.
>>
>> Rather than use bdrv_close() along with a subsequent bdrv_open() to
>> perform the pivot, the original image is never closed and the new
>> image is placed 'in front' of the original image via manipulation
>> of the BlockDriverState fields. Thus, once the new snapshot image
>> has been successfully created, there are no more failure points
>> before pivoting to the new snapshot.
>>
>> This allows the group of disks to remain consistent with each other,
>> even across snapshot failures.
>>
>> Signed-off-by: Jeff Cody
>> ---
>>  block.c          |   47 ++++++++++++++++++++
>>  block.h          |    1 +
>>  blockdev.c       |  128 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  qapi-schema.json |   38 ++++++++++++++++
>>  4 files changed, 214 insertions(+), 0 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 3621d11..0045ab1 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -880,6 +880,53 @@ void bdrv_make_anon(BlockDriverState *bs)
>>      bs->device_name[0] = '\0';
>>  }
>>
>> +/*
>> + * Add new bs contents at the top of an image chain while the chain is live,
>> + * while keeping required fields on the top layer.
>> + *
>> + * It is assumed that bs_new already points to an existing image,
>> + * with the correct backing filename of top->backing_file
>> + */
>> +void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
>> +{
>> +    BlockDriverState tmp;
>> +
>> +    /* the new bs must not be in bdrv_states */
>> +    bdrv_make_anon(bs_new);
>> +
>> +    tmp = *bs_new;
>> +    tmp.backing_hd = bs_new;
>> +
>> +    /* there are some fields that need to stay on the top layer: */
>> +
>> +    /* dev info */
>> +    tmp.dev_ops = bs_top->dev_ops;
>> +    tmp.dev_opaque = bs_top->dev_opaque;
>> +    tmp.dev = bs_top->dev;
>> +    tmp.buffer_alignment = bs_top->buffer_alignment;
>> +    tmp.copy_on_read = bs_top->copy_on_read;
>> +
>> +    /* i/o timing parameters */
>> +    tmp.slice_time = bs_top->slice_time;
>> +    tmp.slice_start = bs_top->slice_start;
>> +    tmp.slice_end = bs_top->slice_end;
>> +    tmp.io_limits = bs_top->io_limits;
>> +    tmp.io_base = bs_top->io_base;
>> +    tmp.throttled_reqs = bs_top->throttled_reqs;
>> +    tmp.block_timer = bs_top->block_timer;
>> +    tmp.io_limits_enabled = bs_top->io_limits_enabled;
>
> What about iostatus_enabled/iostatus? Possibly on_read/write_error and
> geometry, too. (Maybe the correct answer is that you're doing the right
> thing, but I wanted to bring it up)

I think you are right, those should probably be preserved as well; I
don't know if it hurts to not preserve them, but since these are all set
from drive_init, I think it makes sense to keep them on the top layer.

>
>> +
>> +    /* keep the same entry in bdrv_states */
>> +    pstrcpy(tmp.device_name, sizeof(tmp.device_name), bs_top->device_name);
>> +    tmp.list = bs_top->list;
>> +
>> +    /* swap contents of the fixed new bs and the current top */
>> +    *bs_new = *bs_top;
>> +    *bs_top = tmp;
>> +
>> +    bdrv_detach_dev(bs_new, bs_new->dev);
>> +}
>
> The last line would actually deserve a comment /* clear the copied
> fields in the new backing file */, which makes clear that there are
> probably some more fields missing in this section.

OK, added.

>
>> +
>>  void bdrv_delete(BlockDriverState *bs)
>>  {
>>      assert(!bs->dev);
>> diff --git a/block.h b/block.h
>> index cae289b..190a780 100644
>> --- a/block.h
>> +++ b/block.h
>> @@ -114,6 +114,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
>>  int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
>>  BlockDriverState *bdrv_new(const char *device_name);
>>  void bdrv_make_anon(BlockDriverState *bs);
>> +void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top);
>>  void bdrv_delete(BlockDriverState *bs);
>>  int bdrv_parse_cache_flags(const char *mode, int *flags);
>>  int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
>> diff --git a/blockdev.c b/blockdev.c
>> index 05e7c5e..560f7e8 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -714,6 +714,134 @@ void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file,
>>      }
>>  }
>>
>> +
>> +/* New and old BlockDriverState structs for group snapshots */
>> +typedef struct BlkGroupSnapshotStates {
>> +    BlockDriverState *old_bs;
>> +    BlockDriverState *new_bs;
>> +    bool is_open;
>> +    QSIMPLEQ_ENTRY(BlkGroupSnapshotStates) entry;
>> +} BlkGroupSnapshotStates;
>> +
>> +/*
>> + * 'Atomic' group snapshots. The snapshots are taken as a set, and if any fail
>> + * then we do not pivot any of the devices in the group, and abandon the
>> + * snapshots
>> + */
>> +void qmp_blockdev_group_snapshot_sync(SnapshotDevList *dev_list,
>> +                                                    Error **errp)
>
> Too much indentation?

Sure :) Fixed.
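Coming back to your iostatus/geometry question above: what I have in
mind is roughly the following addition to the "fields that need to stay
on the top layer" section. This is only a sketch; I still need to verify
that all of these are plain members of BlockDriverState that can be
copied the same way as the existing ones (the geometry hints in
particular):

    /* i/o error handling and status reporting (assuming these can be
     * copied like the fields above) */
    tmp.on_read_error = bs_top->on_read_error;
    tmp.on_write_error = bs_top->on_write_error;
    tmp.iostatus_enabled = bs_top->iostatus_enabled;
    tmp.iostatus = bs_top->iostatus;

    /* geometry hints set up by drive_init */
    tmp.cyls = bs_top->cyls;
    tmp.heads = bs_top->heads;
    tmp.secs = bs_top->secs;
    tmp.translation = bs_top->translation;

The idea is the same as for the existing fields: everything that
drive_init sets up stays attached to the device, not to the image.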
>
>> +{
>> +    int ret = 0;
>> +    SnapshotDevList *dev_entry = dev_list;
>> +    SnapshotDev *dev_info = NULL;
>> +    BlkGroupSnapshotStates *states;
>> +    BlockDriver *proto_drv;
>> +    BlockDriver *drv;
>> +    int flags;
>> +    const char *format;
>> +    const char *snapshot_file;
>> +
>> +    QSIMPLEQ_HEAD(snap_bdrv_states, BlkGroupSnapshotStates) snap_bdrv_states;
>> +    QSIMPLEQ_INIT(&snap_bdrv_states);
>> +
>> +    /* We don't do anything in this loop that commits us to the snapshot */
>> +    while (NULL != dev_entry) {
>> +        dev_info = dev_entry->value;
>> +        dev_entry = dev_entry->next;
>> +
>> +        states = g_malloc0(sizeof(BlkGroupSnapshotStates));
>> +        states->is_open = false;
>> +
>> +        states->old_bs = bdrv_find(dev_info->device);
>> +
>> +        if (!states->old_bs) {
>> +            error_set(errp, QERR_DEVICE_NOT_FOUND, dev_info->device);
>> +            goto fail;
>> +        }
>> +
>> +        if (bdrv_in_use(states->old_bs)) {
>> +            error_set(errp, QERR_DEVICE_IN_USE, dev_info->device);
>> +            goto fail;
>> +        }
>> +
>> +        snapshot_file = dev_info->snapshot_file;
>> +
>> +        flags = states->old_bs->open_flags;
>> +
>> +        if (!dev_info->has_format) {
>> +            format = "qcow2";
>> +        } else {
>> +            format = dev_info->format;
>> +        }
>> +
>> +        drv = bdrv_find_format(format);
>> +        if (!drv) {
>> +            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
>> +            goto fail;
>> +        }
>> +
>> +        proto_drv = bdrv_find_protocol(snapshot_file);
>> +        if (!proto_drv) {
>> +            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
>> +            goto fail;
>> +        }
>> +
>> +        /* create new image w/backing file */
>> +        ret = bdrv_img_create(snapshot_file, format,
>> +                              states->old_bs->filename,
>> +                              drv->format_name, NULL, -1, flags);
>> +        if (ret) {
>> +            error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file);
>> +            goto fail;
>> +        }
>> +
>> +        /* We will manually add the backing_hd field to the bs later */
>> +        states->new_bs = bdrv_new("");
>> +        ret = bdrv_open(states->new_bs, snapshot_file,
>> +                        flags | BDRV_O_NO_BACKING, drv);
>> +        states->is_open = true;
>> +
>> +        if (ret != 0) {
>> +            error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file);
>> +            goto close_and_fail;
>> +        }
>> +        QSIMPLEQ_INSERT_TAIL(&snap_bdrv_states, states, entry);
>
> Inserting it only here means that all the error cases above leak states.

Yes. Will move it up to immediately after the bdrv_open().

>
>> +    }
>> +
>> +    /*
>> +     * Now we will drain, flush, & pivot everything - we are committed at this
>> +     * point.
>> +     */
>> +    bdrv_drain_all();
>
> I would feel more comfortable if we could do the bdrv_drain_all() at the
> very beginning of the function. Not that I know of a specific scenario
> that would go wrong, but you have a nested main loop here that could do
> more or less anything.

I can move it up to the beginning if desired... My thought was that it
was best to drain closer to the point of commit.

>
>> +    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
>> +        bdrv_flush(states->old_bs);
>
> This can return an error which must be checked. And of course, we must
> do it before committing to the snapshot (but after bdrv_drain_all).

Hmm. If the flush returns an error, do we abandon at this point?
Perhaps it would be best to loop through and flush each device first,
and if no flushes fail, _then_ loop through and perform the
bdrv_append(). Once we are calling bdrv_append(), we want no possible
failure points.
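Concretely, I am picturing something like this (untested sketch; the
error argument is just a placeholder for whatever error class fits best
here):

    /* first pass: flush all devices while we can still abandon the new
     * images if anything fails */
    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
        ret = bdrv_flush(states->old_bs);
        if (ret < 0) {
            error_set(errp, QERR_IO_ERROR); /* placeholder error class */
            goto close_and_fail;
        }
    }

    /* second pass: past the point of no return, pivot every device;
     * bdrv_append() cannot fail */
    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
        bdrv_append(states->new_bs, states->old_bs);
    }

That keeps every possible failure before the first bdrv_append(), so the
group either pivots completely or not at all.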
>
>> +        /* This removes our old bs from the bdrv_states, and adds the new bs */
>> +        bdrv_append(states->new_bs, states->old_bs);
>> +    }
>> +
>> +    /* success */
>> +    goto exit;
>> +
>> +close_and_fail:
>> +    /*
>> +     * failure, and it is all-or-none; abandon each new bs, and keep using
>> +     * the original bs for all images
>> +     */
>> +    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
>> +        if (states->is_open) {
>> +            bdrv_close(states->new_bs);
>
> bdrv_delete, as Stefan already mentioned.
>
>> +        }
>> +    }
>> +fail:
>> +exit:
>> +    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
>> +        g_free(states);
>> +    }
>> +    return;
>> +}
>
> Kevin