* [Qemu-devel] [PATCH v2 0/2] Group Live Snapshots
@ 2012-02-26 19:31 Jeff Cody
  2012-02-26 19:31 ` [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command Jeff Cody
  2012-02-26 19:31 ` [Qemu-devel] [PATCH v2 2/2] QMP: Add qmp command for blockdev-group-snapshot-sync Jeff Cody
  0 siblings, 2 replies; 13+ messages in thread

From: Jeff Cody @ 2012-02-26 19:31 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, eblake, armbru, lcapitulino

This patchset adds the ability to take a snapshot of a group of devices,
rather than each device individually.  Upon failure of any snapshot, all
snapshots taken by the command are abandoned, and the appropriate failure
code is returned.

This differs from v1 as follows:

* The QAPI input mechanism for JSON arrays of qdict items is now used
  correctly, and there is no modification of the existing monitor code.
  This drops the original patch 1 from v1.

* Rather than using bdrv_close() and bdrv_open() to pivot the snapshot,
  the fields of the BlockDriverState are manipulated so that there are
  no irrecoverable failure points in the snapshot process.  This is
  based on a suggestion by Kevin Wolf.

* The qapi & block code was broken out into patch 1/2, and the QMP
  command placed in patch 2/2.

* Since there are no irrecoverable error points, there is no need for a
  command to return a list of failures.  There is at most one failure to
  report, which is the first failure encountered.  In light of that,
  patch 3 from v1 was dropped.

Some things for careful review in patch 1/2, in the new bdrv_append()
function:

* Are all of the relevant fields preserved in the top bs? (see
  'bdrv_append()')
* Conversely, are any fields being preserved that should not be?
* Are there race condition concerns at the end of bdrv_append(), at the
  line '*bs_top = tmp;', which replaces the contents of the current
  top bs?
Jeff Cody (2):
  qapi: Introduce blockdev-group-snapshot-sync command
  QMP: Add qmp command for blockdev-group-snapshot-sync

 block.c          |   47 ++++++++++++++++++++
 block.h          |    1 +
 blockdev.c       |  128 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json |   38 ++++++++++++++++
 qmp-commands.hx  |   39 ++++++++++++++++
 5 files changed, 253 insertions(+), 0 deletions(-)

--
1.7.9.rc2.1.g69204

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command
  2012-02-26 19:31 [Qemu-devel] [PATCH v2 0/2] Group Live Snapshots Jeff Cody
@ 2012-02-26 19:31 ` Jeff Cody
  2012-02-27 11:13   ` Stefan Hajnoczi
  2012-02-27 15:46   ` Kevin Wolf
  2012-02-26 19:31 ` [Qemu-devel] [PATCH v2 2/2] QMP: Add qmp command for blockdev-group-snapshot-sync Jeff Cody
  1 sibling, 2 replies; 13+ messages in thread

From: Jeff Cody @ 2012-02-26 19:31 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, eblake, armbru, lcapitulino

This is a QAPI/QMP-only command to take a snapshot of a group of
devices.  It is similar to the blockdev-snapshot-sync command, except
that blockdev-group-snapshot-sync accepts a list of devices, filenames,
and formats.

The snapshot of the group is kept as close to atomic as possible: if
the creation or open of any of the new snapshots fails, then all of the
new snapshots are abandoned, and the name of the snapshot image that
failed is returned.  The failure case should not interrupt any
operations.

Rather than using bdrv_close() along with a subsequent bdrv_open() to
perform the pivot, the original image is never closed and the new image
is placed 'in front' of the original image via manipulation of the
BlockDriverState fields.  Thus, once the new snapshot image has been
successfully created, there are no more failure points before pivoting
to the new snapshot.

This allows the group of disks to remain consistent with each other,
even across snapshot failures.
Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 block.c          |   47 ++++++++++++++++++++
 block.h          |    1 +
 blockdev.c       |  128 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi-schema.json |   38 ++++++++++++++++
 4 files changed, 214 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 3621d11..0045ab1 100644
--- a/block.c
+++ b/block.c
@@ -880,6 +880,53 @@ void bdrv_make_anon(BlockDriverState *bs)
     bs->device_name[0] = '\0';
 }

+/*
+ * Add new bs contents at the top of an image chain while the chain is live,
+ * while keeping required fields on the top layer.
+ *
+ * It is assumed that bs_new already points to an existing image,
+ * with the correct backing filename of top->backing_file
+ */
+void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
+{
+    BlockDriverState tmp;
+
+    /* the new bs must not be in bdrv_states */
+    bdrv_make_anon(bs_new);
+
+    tmp = *bs_new;
+    tmp.backing_hd = bs_new;
+
+    /* there are some fields that need to stay on the top layer: */
+
+    /* dev info */
+    tmp.dev_ops = bs_top->dev_ops;
+    tmp.dev_opaque = bs_top->dev_opaque;
+    tmp.dev = bs_top->dev;
+    tmp.buffer_alignment = bs_top->buffer_alignment;
+    tmp.copy_on_read = bs_top->copy_on_read;
+
+    /* i/o timing parameters */
+    tmp.slice_time = bs_top->slice_time;
+    tmp.slice_start = bs_top->slice_start;
+    tmp.slice_end = bs_top->slice_end;
+    tmp.io_limits = bs_top->io_limits;
+    tmp.io_base = bs_top->io_base;
+    tmp.throttled_reqs = bs_top->throttled_reqs;
+    tmp.block_timer = bs_top->block_timer;
+    tmp.io_limits_enabled = bs_top->io_limits_enabled;
+
+    /* keep the same entry in bdrv_states */
+    pstrcpy(tmp.device_name, sizeof(tmp.device_name), bs_top->device_name);
+    tmp.list = bs_top->list;
+
+    /* swap contents of the fixed new bs and the current top */
+    *bs_new = *bs_top;
+    *bs_top = tmp;
+
+    bdrv_detach_dev(bs_new, bs_new->dev);
+}
+
 void bdrv_delete(BlockDriverState *bs)
 {
     assert(!bs->dev);
diff --git a/block.h b/block.h
index cae289b..190a780 100644
--- a/block.h
+++ b/block.h
@@ -114,6 +114,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
 int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
 BlockDriverState *bdrv_new(const char *device_name);
 void bdrv_make_anon(BlockDriverState *bs);
+void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top);
 void bdrv_delete(BlockDriverState *bs);
 int bdrv_parse_cache_flags(const char *mode, int *flags);
 int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
diff --git a/blockdev.c b/blockdev.c
index 05e7c5e..560f7e8 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -714,6 +714,134 @@ void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file,
     }
 }

+
+/* New and old BlockDriverState structs for group snapshots */
+typedef struct BlkGroupSnapshotStates {
+    BlockDriverState *old_bs;
+    BlockDriverState *new_bs;
+    bool is_open;
+    QSIMPLEQ_ENTRY(BlkGroupSnapshotStates) entry;
+} BlkGroupSnapshotStates;
+
+/*
+ * 'Atomic' group snapshots.  The snapshots are taken as a set, and if any fail
+ * then we do not pivot any of the devices in the group, and abandon the
+ * snapshots
+ */
+void qmp_blockdev_group_snapshot_sync(SnapshotDevList *dev_list,
+                                                 Error **errp)
+{
+    int ret = 0;
+    SnapshotDevList *dev_entry = dev_list;
+    SnapshotDev *dev_info = NULL;
+    BlkGroupSnapshotStates *states;
+    BlockDriver *proto_drv;
+    BlockDriver *drv;
+    int flags;
+    const char *format;
+    const char *snapshot_file;
+
+    QSIMPLEQ_HEAD(snap_bdrv_states, BlkGroupSnapshotStates) snap_bdrv_states;
+    QSIMPLEQ_INIT(&snap_bdrv_states);
+
+    /* We don't do anything in this loop that commits us to the snapshot */
+    while (NULL != dev_entry) {
+        dev_info = dev_entry->value;
+        dev_entry = dev_entry->next;
+
+        states = g_malloc0(sizeof(BlkGroupSnapshotStates));
+        states->is_open = false;
+
+        states->old_bs = bdrv_find(dev_info->device);
+
+        if (!states->old_bs) {
+            error_set(errp, QERR_DEVICE_NOT_FOUND, dev_info->device);
+            goto fail;
+        }
+
+        if (bdrv_in_use(states->old_bs)) {
+            error_set(errp, QERR_DEVICE_IN_USE, dev_info->device);
+            goto fail;
+        }
+
+        snapshot_file = dev_info->snapshot_file;
+
+        flags = states->old_bs->open_flags;
+
+        if (!dev_info->has_format) {
+            format = "qcow2";
+        } else {
+            format = dev_info->format;
+        }
+
+        drv = bdrv_find_format(format);
+        if (!drv) {
+            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
+            goto fail;
+        }
+
+        proto_drv = bdrv_find_protocol(snapshot_file);
+        if (!proto_drv) {
+            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
+            goto fail;
+        }
+
+        /* create new image w/backing file */
+        ret = bdrv_img_create(snapshot_file, format,
+                              states->old_bs->filename,
+                              drv->format_name, NULL, -1, flags);
+        if (ret) {
+            error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file);
+            goto fail;
+        }
+
+        /* We will manually add the backing_hd field to the bs later */
+        states->new_bs = bdrv_new("");
+        ret = bdrv_open(states->new_bs, snapshot_file,
+                        flags | BDRV_O_NO_BACKING, drv);
+        states->is_open = true;
+
+        if (ret != 0) {
+            error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file);
+            goto close_and_fail;
+        }
+        QSIMPLEQ_INSERT_TAIL(&snap_bdrv_states, states, entry);
+    }
+
+    /*
+     * Now we will drain, flush, & pivot everything - we are committed at this
+     * point.
+     */
+    bdrv_drain_all();
+    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
+        bdrv_flush(states->old_bs);
+
+        /* This removes our old bs from the bdrv_states, and adds the new bs */
+        bdrv_append(states->new_bs, states->old_bs);
+    }
+
+    /* success */
+    goto exit;
+
+close_and_fail:
+    /*
+     * failure, and it is all-or-none; abandon each new bs, and keep using
+     * the original bs for all images
+     */
+    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
+        if (states->is_open) {
+            bdrv_close(states->new_bs);
+        }
+    }
+fail:
exit:
+    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
+        g_free(states);
+    }
+    return;
+}
+
+
 static void eject_device(BlockDriverState *bs, int force, Error **errp)
 {
     if (bdrv_in_use(bs)) {
diff --git a/qapi-schema.json b/qapi-schema.json
index 80debe6..a0cab8f 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1107,6 +1107,44 @@
 { 'command': 'block_resize', 'data': { 'device': 'str', 'size': 'int' }}

 ##
+# @SnapshotDev
+#
+# @device: the name of the device to generate the snapshot from.
+#
+# @snapshot-file: the target of the new image.  A new file will be created.
+#
+# @format: #optional the format of the snapshot image, default is 'qcow2'.
+##
+{ 'type': 'SnapshotDev',
+  'data': {'device': 'str', 'snapshot-file': 'str', '*format': 'str' } }
+
+##
+# @blockdev-group-snapshot-sync
+#
+# Generates a synchronous snapshot of a group of one or more block devices,
+# as atomically as possible.  If the snapshot of any device in the group
+# fails, then the entire group snapshot will be abandoned and the
+# appropriate error returned.
+#
+# List of:
+# @SnapshotDev: information needed for the device snapshot
+#
+# Returns: nothing on success
+#          If @device is not a valid block device, DeviceNotFound
+#          If @device is busy, DeviceInUse will be returned
+#          If @snapshot-file can't be created, OpenFileFailed
+#          If @snapshot-file can't be opened, OpenFileFailed
+#          If @format is invalid, InvalidBlockFormat
+#
+# Note: The group snapshot attempt returns failure on the first snapshot
+# device failure.  Therefore, there will be only one device or snapshot
+# file returned in an error condition, and subsequent devices will not
+# have been attempted.
+##
+{ 'command': 'blockdev-group-snapshot-sync',
+  'data': { 'devlist': [ 'SnapshotDev' ] } }
+
+##
 # @blockdev-snapshot-sync
 #
 # Generates a synchronous snapshot of a block device.
--
1.7.9.rc2.1.g69204
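[Editorial illustration] Going by the schema above, a client invocation of the new command would look roughly like the following. This is a hedged sketch: the device names and file paths are hypothetical, and the exact wire format is defined by the QMP glue in patch 2/2, which is not quoted in this thread.

```json
{ "execute": "blockdev-group-snapshot-sync",
  "arguments": {
    "devlist": [
      { "device": "virtio0",
        "snapshot-file": "/images/virtio0-snap.qcow2" },
      { "device": "virtio1",
        "snapshot-file": "/images/virtio1-snap.qcow2",
        "format": "qcow2" }
    ]
  }
}
```

On success the reply carries an empty return object; on the first failing device the whole group is abandoned and a single error (e.g. DeviceNotFound, OpenFileFailed) is returned.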
* Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command
  2012-02-26 19:31 ` [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command Jeff Cody
@ 2012-02-27 11:13   ` Stefan Hajnoczi
  2012-02-27 14:26     ` Jeff Cody
  2012-02-27 15:46   ` Kevin Wolf
  1 sibling, 1 reply; 13+ messages in thread

From: Stefan Hajnoczi @ 2012-02-27 11:13 UTC (permalink / raw)
To: Jeff Cody; +Cc: kwolf, armbru, qemu-devel, lcapitulino, pbonzini, eblake

On Sun, Feb 26, 2012 at 7:31 PM, Jeff Cody <jcody@redhat.com> wrote:

Do you have automated tests for this feature?

> +/*
> + * Add new bs contents at the top of an image chain while the chain is live,
> + * while keeping required fields on the top layer.

Please also document the swap behavior.  It's pretty important for the
caller to realize that once this function returns, their
BlockDriverState arguments will have swapped.

> + * It is assumed that bs_new already points to an existing image,
> + * with the correct backing filename of top->backing_file

Not sure what this means.  Isn't bs_new going to use bs_top as its
backing file?  Why "top->backing_file"?

> +void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
> +{
> +    BlockDriverState tmp;
> +
> +    /* the new bs must not be in bdrv_states */
> +    bdrv_make_anon(bs_new);
> +
> +    tmp = *bs_new;
> +    tmp.backing_hd = bs_new;

This is tricky; it would be nice to comment that we will swap bs_new
and bs_top later, therefore we need a pointer to bs_new here even
though it doesn't make sense yet.

> +
> +    /* there are some fields that need to stay on the top layer: */
> +
> +    /* dev info */
> +    tmp.dev_ops = bs_top->dev_ops;
> +    tmp.dev_opaque = bs_top->dev_opaque;
> +    tmp.dev = bs_top->dev;
> +    tmp.buffer_alignment = bs_top->buffer_alignment;
> +    tmp.copy_on_read = bs_top->copy_on_read;
> +
> +    /* i/o timing parameters */
> +    tmp.slice_time = bs_top->slice_time;
> +    tmp.slice_start = bs_top->slice_start;
> +    tmp.slice_end = bs_top->slice_end;
> +    tmp.io_limits = bs_top->io_limits;
> +    tmp.io_base = bs_top->io_base;
> +    tmp.throttled_reqs = bs_top->throttled_reqs;
> +    tmp.block_timer = bs_top->block_timer;
> +    tmp.io_limits_enabled = bs_top->io_limits_enabled;

Please add a comment into BlockDriverState to warn that changes to
fields may require updates to this function too!

> +        /* We will manually add the backing_hd field to the bs later */
> +        states->new_bs = bdrv_new("");
> +        ret = bdrv_open(states->new_bs, snapshot_file,
> +                        flags | BDRV_O_NO_BACKING, drv);
> +        states->is_open = true;

What is the point of is_open?  If we failed then new_bs is actually
not open here.

I think you can remove this field and just do the following in
close_and_fail:

if (states->new_bs) {
    bdrv_delete(states->new_bs);
}

(BTW I think close_and_fail is currently leaking new_bs because it
only closes it and does not delete it!)
* Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command
  2012-02-27 11:13   ` Stefan Hajnoczi
@ 2012-02-27 14:26     ` Jeff Cody
  2012-02-27 14:40       ` Stefan Hajnoczi
  0 siblings, 1 reply; 13+ messages in thread

From: Jeff Cody @ 2012-02-27 14:26 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kwolf, qemu-devel, armbru, lcapitulino, pbonzini, eblake

On 02/27/2012 06:13 AM, Stefan Hajnoczi wrote:
> On Sun, Feb 26, 2012 at 7:31 PM, Jeff Cody <jcody@redhat.com> wrote:
>
> Do you have automated tests for this feature?

No, not yet.  The testing has been manual.

>> +/*
>> + * Add new bs contents at the top of an image chain while the chain is live,
>> + * while keeping required fields on the top layer.
>
> Please also document the swap behavior.  It's pretty important for the
> caller to realize that once this function returns, their
> BlockDriverState arguments will have swapped.

Good point.  How about this:

/*
 * Add new bs contents at the top of an image chain while the chain is
 * live, while keeping required fields on the top layer.
 *
 * This will modify the BlockDriverState fields, and swap contents
 * between bs_new and bs_top.  Both bs_new and bs_top are modified.
 *
 * A new image file is not created.  It is assumed that bs_new already
 * points to an existing image, with the correct backing filename of
 * bs_top->backing_file.
 */

>> + * It is assumed that bs_new already points to an existing image,
>> + * with the correct backing filename of top->backing_file
>
> Not sure what this means.  Isn't bs_new going to use bs_top as its
> backing file?  Why "top->backing_file"?

Sorry, that should have been 'bs_top->backing_file'.  The image file is
not created by this function.  I added some more explanation, and
corrected that typo, in the above comment block.  Let me know if you
think it still needs more clarification.

>> +void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
>> +{
>> +    BlockDriverState tmp;
>> +
>> +    /* the new bs must not be in bdrv_states */
>> +    bdrv_make_anon(bs_new);
>> +
>> +    tmp = *bs_new;
>> +    tmp.backing_hd = bs_new;
>
> This is tricky; it would be nice to comment that we will swap bs_new
> and bs_top later, therefore we need a pointer to bs_new here even
> though it doesn't make sense yet.

OK, thanks.  I will add some clarifying comments - you are right, it is
a bit tricky and counter-intuitive.  It may also be clearer if this is
done closer to the actual swap ('*bs_top = tmp').  I will move it down
to above the swap.

>> +
>> +    /* there are some fields that need to stay on the top layer: */
>> +
>> +    /* dev info */
>> +    tmp.dev_ops = bs_top->dev_ops;
>> +    tmp.dev_opaque = bs_top->dev_opaque;
>> +    tmp.dev = bs_top->dev;
>> +    tmp.buffer_alignment = bs_top->buffer_alignment;
>> +    tmp.copy_on_read = bs_top->copy_on_read;
>> +
>> +    /* i/o timing parameters */
>> +    tmp.slice_time = bs_top->slice_time;
>> +    tmp.slice_start = bs_top->slice_start;
>> +    tmp.slice_end = bs_top->slice_end;
>> +    tmp.io_limits = bs_top->io_limits;
>> +    tmp.io_base = bs_top->io_base;
>> +    tmp.throttled_reqs = bs_top->throttled_reqs;
>> +    tmp.block_timer = bs_top->block_timer;
>> +    tmp.io_limits_enabled = bs_top->io_limits_enabled;
>
> Please add a comment into BlockDriverState to warn that changes to
> fields may require updates to this function too!

OK; I will add a blanket warning at the top to inspect bdrv_append()
when adding new fields, to see if they need updating in bdrv_append().

>> +        /* We will manually add the backing_hd field to the bs later */
>> +        states->new_bs = bdrv_new("");
>> +        ret = bdrv_open(states->new_bs, snapshot_file,
>> +                        flags | BDRV_O_NO_BACKING, drv);
>> +        states->is_open = true;
>
> What is the point of is_open?  If we failed then new_bs is actually
> not open here.

Previous entries may have been opened.  You are right, however, in that
it is not needed at all.  We can rely on new_bs being non-NULL.  I will
remove the 'is_open' references.

> I think you can remove this field and just do the following in
> close_and_fail:
>
> if (states->new_bs) {
>     bdrv_delete(states->new_bs);
> }
>
> (BTW I think close_and_fail is currently leaking new_bs because it
> only closes it and does not delete it!)

You are right.  That should be bdrv_delete(), and not bdrv_close().  I
will fix that, and also rename close_and_fail to delete_and_fail for
good measure.

Thanks,
Jeff
* Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command
  2012-02-27 14:26     ` Jeff Cody
@ 2012-02-27 14:40       ` Stefan Hajnoczi
  2012-02-27 15:23         ` Jeff Cody
  0 siblings, 1 reply; 13+ messages in thread

From: Stefan Hajnoczi @ 2012-02-27 14:40 UTC (permalink / raw)
To: jcody; +Cc: kwolf, qemu-devel, armbru, lcapitulino, pbonzini, eblake

On Mon, Feb 27, 2012 at 2:26 PM, Jeff Cody <jcody@redhat.com> wrote:
> On 02/27/2012 06:13 AM, Stefan Hajnoczi wrote:
>> On Sun, Feb 26, 2012 at 7:31 PM, Jeff Cody <jcody@redhat.com> wrote:
>>
>> Do you have automated tests for this feature?
>
> No, not yet.  The testing has been manual.

For image streaming I used the Python unittest framework along with
QMP/qmp.py to create tests.  I am going to submit it as a qemu-iotest.
We really need something along the lines of a harness with QMP support
so that these block layer features can be tested.  I will CC you on
the email.

>>> +/*
>>> + * Add new bs contents at the top of an image chain while the chain is live,
>>> + * while keeping required fields on the top layer.
>>
>> Please also document the swap behavior.  It's pretty important for the
>> caller to realize that once this function returns, their
>> BlockDriverState arguments will have swapped.
>
> Good point.  How about this:
>
> /*
>  * Add new bs contents at the top of an image chain while the chain is
>  * live, while keeping required fields on the top layer.
>  *
>  * This will modify the BlockDriverState fields, and swap contents
>  * between bs_new and bs_top.  Both bs_new and bs_top are modified.

Looks good.

>>> + * It is assumed that bs_new already points to an existing image,
>>> + * with the correct backing filename of top->backing_file
>>
>> Not sure what this means.  Isn't bs_new going to use bs_top as its
>> backing file?  Why "top->backing_file"?
>
> Sorry, that should have been 'bs_top->backing_file'.  The image file is
> not created by this function.  I added some more explanation, and
> corrected that typo, in the above comment block.  Let me know if you
> think it still needs more clarification.

I still don't follow.  Old bs_top's image file itself becomes the
backing file, not bs_top->backing_file.  Perhaps I'm misinterpreting
because of how the swap changes bs_top and bs_new, but I'm reading it
from the perspective of the caller when they pass in bs_top.

The rest looks good.

Stefan
* Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command
  2012-02-27 14:40       ` Stefan Hajnoczi
@ 2012-02-27 15:23         ` Jeff Cody
  2012-02-27 15:31           ` Stefan Hajnoczi
  0 siblings, 1 reply; 13+ messages in thread

From: Jeff Cody @ 2012-02-27 15:23 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kwolf, qemu-devel, armbru, lcapitulino, pbonzini, eblake

On 02/27/2012 09:40 AM, Stefan Hajnoczi wrote:
> On Mon, Feb 27, 2012 at 2:26 PM, Jeff Cody <jcody@redhat.com> wrote:
>> On 02/27/2012 06:13 AM, Stefan Hajnoczi wrote:
>>> On Sun, Feb 26, 2012 at 7:31 PM, Jeff Cody <jcody@redhat.com> wrote:
>>>
>>> Do you have automated tests for this feature?
>>>
>>
>> No, not yet.  The testing has been manual.
>
> For image streaming I used the Python unittest framework along with
> QMP/qmp.py to create tests.  I am going to submit it as a qemu-iotest.
> We really need something along the lines of a harness with QMP
> support so that these block layer features can be tested.  I will CC
> you on the email.

Excellent, thanks.

>>>> +/*
>>>> + * Add new bs contents at the top of an image chain while the chain is live,
>>>> + * while keeping required fields on the top layer.
>>>
>>> Please also document the swap behavior.  It's pretty important for the
>>> caller to realize that once this function returns, their
>>> BlockDriverState arguments will have swapped.
>>
>> Good point.  How about this:
>>
>> /*
>>  * Add new bs contents at the top of an image chain while the chain is
>>  * live, while keeping required fields on the top layer.
>>  *
>>  * This will modify the BlockDriverState fields, and swap contents
>>  * between bs_new and bs_top.  Both bs_new and bs_top are modified.
>
> Looks good.
>
>>>> + * It is assumed that bs_new already points to an existing image,
>>>> + * with the correct backing filename of top->backing_file
>>>
>>> Not sure what this means.  Isn't bs_new going to use bs_top as its
>>> backing file?  Why "top->backing_file"?
>>
>> Sorry, that should have been 'bs_top->backing_file'.  The image file is
>> not created by this function.  I added some more explanation, and
>> corrected that typo, in the above comment block.  Let me know if you
>> think it still needs more clarification.
>
> I still don't follow.  Old bs_top's image file itself becomes the
> backing file, not bs_top->backing_file.  Perhaps I'm misinterpreting
> because of how the swap changes bs_top and bs_new, but I'm reading it
> from the perspective of the caller when they pass in bs_top.
>

Maybe it would be better to just replace that part of the comment with
something that says "This function does not create the image file".

The function bdrv_append() will neither create the image file, nor set
(in the case of qcow2) the extended headers of the file to have the
backing filename.  It is only concerned with the bs contents.

For example, in group snapshots in blockdev.c, the image file is first
created, and then we do the swap:

    /* create new image w/backing file */
    ret = bdrv_img_create(snapshot_file, format,
                          states->old_bs->filename,
                          drv->format_name, NULL, -1, flags);
    ...
    ret = bdrv_open(states->new_bs, snapshot_file,
                    flags | BDRV_O_NO_BACKING, drv);
    ...
    bdrv_append(states->new_bs, states->old_bs);

> The rest looks good.
>
> Stefan
* Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command
  2012-02-27 15:23         ` Jeff Cody
@ 2012-02-27 15:31           ` Stefan Hajnoczi
  0 siblings, 0 replies; 13+ messages in thread

From: Stefan Hajnoczi @ 2012-02-27 15:31 UTC (permalink / raw)
To: jcody; +Cc: kwolf, qemu-devel, armbru, lcapitulino, pbonzini, eblake

On Mon, Feb 27, 2012 at 3:23 PM, Jeff Cody <jcody@redhat.com> wrote:
> On 02/27/2012 09:40 AM, Stefan Hajnoczi wrote:
>> On Mon, Feb 27, 2012 at 2:26 PM, Jeff Cody <jcody@redhat.com> wrote:
>>> On 02/27/2012 06:13 AM, Stefan Hajnoczi wrote:
>>>> On Sun, Feb 26, 2012 at 7:31 PM, Jeff Cody <jcody@redhat.com> wrote:
>>>>> + * It is assumed that bs_new already points to an existing image,
>>>>> + * with the correct backing filename of top->backing_file
>>>>
>>>> Not sure what this means.  Isn't bs_new going to use bs_top as its
>>>> backing file?  Why "top->backing_file"?
>>>
>>> Sorry, that should have been 'bs_top->backing_file'.  The image file is
>>> not created by this function.  I added some more explanation, and
>>> corrected that typo, in the above comment block.  Let me know if you
>>> think it still needs more clarification.
>>
>> I still don't follow.  Old bs_top's image file itself becomes the
>> backing file, not bs_top->backing_file.  Perhaps I'm misinterpreting
>> because of how the swap changes bs_top and bs_new, but I'm reading it
>> from the perspective of the caller when they pass in bs_top.
>>
>
> Maybe it would be better to just replace that part of the comment with
> something that says "This function does not create the image file".
>
> The function bdrv_append() will neither create the image file, nor set
> (in the case of qcow2) the extended headers of the file to have the
> backing filename.  It is only concerned with the bs contents.

Makes sense.

Stefan
* Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command
  2012-02-26 19:31 ` [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command Jeff Cody
  2012-02-27 11:13   ` Stefan Hajnoczi
@ 2012-02-27 15:46   ` Kevin Wolf
  2012-02-27 16:51     ` Paolo Bonzini
  2012-02-27 17:02     ` Jeff Cody
  1 sibling, 2 replies; 13+ messages in thread

From: Kevin Wolf @ 2012-02-27 15:46 UTC (permalink / raw)
To: Jeff Cody; +Cc: pbonzini, eblake, qemu-devel, armbru, lcapitulino

On 26.02.2012 20:31, Jeff Cody wrote:
> This is a QAPI/QMP only command to take a snapshot of a group of
> devices.  This is similar to the blockdev-snapshot-sync command, except
> blockdev-group-snapshot-sync accepts a list of devices, filenames, and
> formats.
>
> It is attempted to keep the snapshot of the group atomic; if the
> creation or open of any of the new snapshots fails, then all of
> the new snapshots are abandoned, and the name of the snapshot image
> that failed is returned.  The failure case should not interrupt
> any operations.
>
> Rather than use bdrv_close() along with a subsequent bdrv_open() to
> perform the pivot, the original image is never closed and the new
> image is placed 'in front' of the original image via manipulation
> of the BlockDriverState fields.  Thus, once the new snapshot image
> has been successfully created, there are no more failure points
> before pivoting to the new snapshot.
>
> This allows the group of disks to remain consistent with each other,
> even across snapshot failures.
>
> Signed-off-by: Jeff Cody <jcody@redhat.com>
> ---
>  block.c          |   47 ++++++++++++++++++++
>  block.h          |    1 +
>  blockdev.c       |  128 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  qapi-schema.json |   38 ++++++++++++++++
>  4 files changed, 214 insertions(+), 0 deletions(-)
>
> diff --git a/block.c b/block.c
> index 3621d11..0045ab1 100644
> --- a/block.c
> +++ b/block.c
> @@ -880,6 +880,53 @@ void bdrv_make_anon(BlockDriverState *bs)
>      bs->device_name[0] = '\0';
>  }
>
> +/*
> + * Add new bs contents at the top of an image chain while the chain is live,
> + * while keeping required fields on the top layer.
> + *
> + * It is assumed that bs_new already points to an existing image,
> + * with the correct backing filename of top->backing_file
> + */
> +void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
> +{
> +    BlockDriverState tmp;
> +
> +    /* the new bs must not be in bdrv_states */
> +    bdrv_make_anon(bs_new);
> +
> +    tmp = *bs_new;
> +    tmp.backing_hd = bs_new;
> +
> +    /* there are some fields that need to stay on the top layer: */
> +
> +    /* dev info */
> +    tmp.dev_ops = bs_top->dev_ops;
> +    tmp.dev_opaque = bs_top->dev_opaque;
> +    tmp.dev = bs_top->dev;
> +    tmp.buffer_alignment = bs_top->buffer_alignment;
> +    tmp.copy_on_read = bs_top->copy_on_read;
> +
> +    /* i/o timing parameters */
> +    tmp.slice_time = bs_top->slice_time;
> +    tmp.slice_start = bs_top->slice_start;
> +    tmp.slice_end = bs_top->slice_end;
> +    tmp.io_limits = bs_top->io_limits;
> +    tmp.io_base = bs_top->io_base;
> +    tmp.throttled_reqs = bs_top->throttled_reqs;
> +    tmp.block_timer = bs_top->block_timer;
> +    tmp.io_limits_enabled = bs_top->io_limits_enabled;

What about iostatus_enabled/iostatus?  Possibly on_read/write_error
and geometry, too.

(Maybe the correct answer is that you're doing the right thing, but I
wanted to bring it up)

> +
> +    /* keep the same entry in bdrv_states */
> +    pstrcpy(tmp.device_name, sizeof(tmp.device_name), bs_top->device_name);
> +    tmp.list = bs_top->list;
> +
> +    /* swap contents of the fixed new bs and the current top */
> +    *bs_new = *bs_top;
> +    *bs_top = tmp;
> +
> +    bdrv_detach_dev(bs_new, bs_new->dev);
> +}

The last line would actually deserve a comment /* clear the copied
fields in the new backing file */, which makes clear that there are
probably some more fields missing in this section.

> +
>  void bdrv_delete(BlockDriverState *bs)
>  {
>      assert(!bs->dev);
> diff --git a/block.h b/block.h
> index cae289b..190a780 100644
> --- a/block.h
> +++ b/block.h
> @@ -114,6 +114,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
>  int bdrv_create_file(const char* filename, QEMUOptionParameter *options);
>  BlockDriverState *bdrv_new(const char *device_name);
>  void bdrv_make_anon(BlockDriverState *bs);
> +void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top);
>  void bdrv_delete(BlockDriverState *bs);
>  int bdrv_parse_cache_flags(const char *mode, int *flags);
>  int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
> diff --git a/blockdev.c b/blockdev.c
> index 05e7c5e..560f7e8 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -714,6 +714,134 @@ void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file,
>      }
>  }
>
> +
> +/* New and old BlockDriverState structs for group snapshots */
> +typedef struct BlkGroupSnapshotStates {
> +    BlockDriverState *old_bs;
> +    BlockDriverState *new_bs;
> +    bool is_open;
> +    QSIMPLEQ_ENTRY(BlkGroupSnapshotStates) entry;
> +} BlkGroupSnapshotStates;
> +
> +/*
> + * 'Atomic' group snapshots.  The snapshots are taken as a set, and if any fail
> + * then we do not pivot any of the devices in the group, and abandon the
> + * snapshots
> + */
> +void qmp_blockdev_group_snapshot_sync(SnapshotDevList *dev_list,
> +                                                 Error **errp)

Too much indentation?

> +{
> +    int ret = 0;
> +    SnapshotDevList *dev_entry = dev_list;
> +    SnapshotDev *dev_info = NULL;
> +    BlkGroupSnapshotStates *states;
> +    BlockDriver *proto_drv;
> +    BlockDriver *drv;
> +    int flags;
> +    const char *format;
> +    const char *snapshot_file;
> +
> +    QSIMPLEQ_HEAD(snap_bdrv_states, BlkGroupSnapshotStates) snap_bdrv_states;
> +    QSIMPLEQ_INIT(&snap_bdrv_states);
> +
> +    /* We don't do anything in this loop that commits us to the snapshot */
> +    while (NULL != dev_entry) {
> +        dev_info = dev_entry->value;
> +        dev_entry = dev_entry->next;
> +
> +        states = g_malloc0(sizeof(BlkGroupSnapshotStates));
> +        states->is_open = false;
> +
> +        states->old_bs = bdrv_find(dev_info->device);
> +
> +        if (!states->old_bs) {
> +            error_set(errp, QERR_DEVICE_NOT_FOUND, dev_info->device);
> +            goto fail;
> +        }
> +
> +        if (bdrv_in_use(states->old_bs)) {
> +            error_set(errp, QERR_DEVICE_IN_USE, dev_info->device);
> +            goto fail;
> +        }
> +
> +        snapshot_file = dev_info->snapshot_file;
> +
> +        flags = states->old_bs->open_flags;
> +
> +        if (!dev_info->has_format) {
> +            format = "qcow2";
> +        } else {
> +            format = dev_info->format;
> +        }
> +
> +        drv = bdrv_find_format(format);
> +        if (!drv) {
> +            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
> +            goto fail;
> +        }
> +
> +        proto_drv = bdrv_find_protocol(snapshot_file);
> +        if (!proto_drv) {
> +            error_set(errp, QERR_INVALID_BLOCK_FORMAT, format);
> +            goto fail;
> +        }
> +
> +        /* create new image w/backing file */
> +        ret = bdrv_img_create(snapshot_file, format,
> +                              states->old_bs->filename,
> +                              drv->format_name, NULL, -1, flags);
> +        if (ret) {
> +            error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file);
> +            goto fail;
> +        }
> +
> +        /* We will manually add the backing_hd field to the bs later */
> +        states->new_bs = bdrv_new("");
> +        ret = bdrv_open(states->new_bs, snapshot_file,
> +                        flags | BDRV_O_NO_BACKING, drv);
> +        states->is_open = true;
> +
> +        if (ret != 0) {
> +            error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file);
> +            goto close_and_fail;
> +        }
> +        QSIMPLEQ_INSERT_TAIL(&snap_bdrv_states, states, entry);

Inserting it only here means that all the error cases above leak
states.

> +    }
> +
> +    /*
> +     * Now we will drain, flush, & pivot everything - we are committed at this
> +     * point.
> +     */
> +    bdrv_drain_all();

I would feel more comfortable if we could do the bdrv_drain_all() at
the very beginning of the function.  Not that I know of a specific
scenario that would go wrong, but you have a nested main loop here that
could do more or less anything.

> +    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
> +        bdrv_flush(states->old_bs);

This can return an error which must be checked.  And of course, we
must do it before committing to the snapshot (but after
bdrv_drain_all).

> +        /* This removes our old bs from the bdrv_states, and adds the new bs */
> +        bdrv_append(states->new_bs, states->old_bs);
> +    }
> +
> +    /* success */
> +    goto exit;
> +
> +close_and_fail:
> +    /*
> +     * failure, and it is all-or-none; abandon each new bs, and keep using
> +     * the original bs for all images
> +     */
> +    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
> +        if (states->is_open) {
> +            bdrv_close(states->new_bs);

bdrv_delete, as Stefan already mentioned.

> +        }
> +    }
> +fail:
> +exit:
> +    QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
> +        g_free(states);
> +    }
> +    return;
> +}

Kevin
* Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command 2012-02-27 15:46 ` Kevin Wolf @ 2012-02-27 16:51 ` Paolo Bonzini 2012-02-27 17:02 ` Jeff Cody 1 sibling, 0 replies; 13+ messages in thread From: Paolo Bonzini @ 2012-02-27 16:51 UTC (permalink / raw) To: Kevin Wolf; +Cc: Jeff Cody, lcapitulino, eblake, qemu-devel, armbru On 02/27/2012 04:46 PM, Kevin Wolf wrote: > > + /* > > + * Now we will drain, flush, & pivot everything - we are committed at this > > + * point. > > + */ > > + bdrv_drain_all(); > > I would feel more comfortable if we could do the bdrv_drain_all() at the > very beginning of the function. Not that I know of a specific scenario > that would go wrong, but you have a nested main loop here that could do > more or less anything. The same could be said of a bdrv_read or a bdrv_write done by bdrv_open, no? Paolo ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command 2012-02-27 15:46 ` Kevin Wolf 2012-02-27 16:51 ` Paolo Bonzini @ 2012-02-27 17:02 ` Jeff Cody 2012-02-27 17:22 ` Kevin Wolf 1 sibling, 1 reply; 13+ messages in thread From: Jeff Cody @ 2012-02-27 17:02 UTC (permalink / raw) To: Kevin Wolf; +Cc: pbonzini, eblake, qemu-devel, armbru, lcapitulino On 02/27/2012 10:46 AM, Kevin Wolf wrote: > Am 26.02.2012 20:31, schrieb Jeff Cody: >> This is a QAPI/QMP only command to take a snapshot of a group of >> devices. This is similar to the blockdev-snapshot-sync command, except >> blockdev-group-snapshot-sync accepts a list of devices, filenames, and >> formats. >> >> It is attempted to keep the snapshot of the group atomic; if the >> creation or open of any of the new snapshots fails, then all of >> the new snapshots are abandoned, and the name of the snapshot image >> that failed is returned. The failure case should not interrupt >> any operations. >> >> Rather than use bdrv_close() along with a subsequent bdrv_open() to >> perform the pivot, the original image is never closed and the new >> image is placed 'in front' of the original image via manipulation >> of the BlockDriverState fields. Thus, once the new snapshot image >> has been successfully created, there are no more failure points >> before pivoting to the new snapshot. >> >> This allows the group of disks to remain consistent with each other, >> even across snapshot failures. 
>> >> Signed-off-by: Jeff Cody <jcody@redhat.com> >> --- >> block.c | 47 ++++++++++++++++++++ >> block.h | 1 + >> blockdev.c | 128 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> qapi-schema.json | 38 ++++++++++++++++ >> 4 files changed, 214 insertions(+), 0 deletions(-) >> >> diff --git a/block.c b/block.c >> index 3621d11..0045ab1 100644 >> --- a/block.c >> +++ b/block.c >> @@ -880,6 +880,53 @@ void bdrv_make_anon(BlockDriverState *bs) >> bs->device_name[0] = '\0'; >> } >> >> +/* >> + * Add new bs contents at the top of an image chain while the chain is live, >> + * while keeping required fields on the top layer. >> + * >> + * It is assumed that bs_new already points to an existing image, >> + * with the correct backing filename of top->backing_file >> + */ >> +void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top) >> +{ >> + BlockDriverState tmp; >> + >> + /* the new bs must not be in bdrv_states */ >> + bdrv_make_anon(bs_new); >> + >> + tmp = *bs_new; >> + tmp.backing_hd = bs_new; >> + >> + /* there are some fields that need to stay on the top layer: */ >> + >> + /* dev info */ >> + tmp.dev_ops = bs_top->dev_ops; >> + tmp.dev_opaque = bs_top->dev_opaque; >> + tmp.dev = bs_top->dev; >> + tmp.buffer_alignment = bs_top->buffer_alignment; >> + tmp.copy_on_read = bs_top->copy_on_read; >> + >> + /* i/o timing parameters */ >> + tmp.slice_time = bs_top->slice_time; >> + tmp.slice_start = bs_top->slice_start; >> + tmp.slice_end = bs_top->slice_end; >> + tmp.io_limits = bs_top->io_limits; >> + tmp.io_base = bs_top->io_base; >> + tmp.throttled_reqs = bs_top->throttled_reqs; >> + tmp.block_timer = bs_top->block_timer; >> + tmp.io_limits_enabled = bs_top->io_limits_enabled; > > What about iostatus_enabled/iostatus? Possibly on_read/write_error and > geometry, too. 
(Maybe the correct answer is that you're doing the right > thing, but I wanted to bring it up) I think you are right, those should probably be preserved as well; I don't know if it hurts to not preserve them, but since these are all set from drive_init, I think it makes sense to keep them on the top layer. > >> + >> + /* keep the same entry in bdrv_states */ >> + pstrcpy(tmp.device_name, sizeof(tmp.device_name), bs_top->device_name); >> + tmp.list = bs_top->list; >> + >> + /* swap contents of the fixed new bs and the current top */ >> + *bs_new = *bs_top; >> + *bs_top = tmp; >> + >> + bdrv_detach_dev(bs_new, bs_new->dev); >> +} > > The last line would actually deserve a comment /* clear the copied > fields in the new backing file */, which makes clear that there are > probably some more fields missing in this section. OK, added. > >> + >> void bdrv_delete(BlockDriverState *bs) >> { >> assert(!bs->dev); >> diff --git a/block.h b/block.h >> index cae289b..190a780 100644 >> --- a/block.h >> +++ b/block.h >> @@ -114,6 +114,7 @@ int bdrv_create(BlockDriver *drv, const char* filename, >> int bdrv_create_file(const char* filename, QEMUOptionParameter *options); >> BlockDriverState *bdrv_new(const char *device_name); >> void bdrv_make_anon(BlockDriverState *bs); >> +void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top); >> void bdrv_delete(BlockDriverState *bs); >> int bdrv_parse_cache_flags(const char *mode, int *flags); >> int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags); >> diff --git a/blockdev.c b/blockdev.c >> index 05e7c5e..560f7e8 100644 >> --- a/blockdev.c >> +++ b/blockdev.c >> @@ -714,6 +714,134 @@ void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file, >> } >> } >> >> + >> +/* New and old BlockDriverState structs for group snapshots */ >> +typedef struct BlkGroupSnapshotStates { >> + BlockDriverState *old_bs; >> + BlockDriverState *new_bs; >> + bool is_open; >> + 
QSIMPLEQ_ENTRY(BlkGroupSnapshotStates) entry; >> +} BlkGroupSnapshotStates; >> + >> +/* >> + * 'Atomic' group snapshots. The snapshots are taken as a set, and if any fail >> + * then we do not pivot any of the devices in the group, and abandon the >> + * snapshots >> + */ >> +void qmp_blockdev_group_snapshot_sync(SnapshotDevList *dev_list, >> + Error **errp) > > Too much indentation? Sure :) Fixed. > >> +{ >> + int ret = 0; >> + SnapshotDevList *dev_entry = dev_list; >> + SnapshotDev *dev_info = NULL; >> + BlkGroupSnapshotStates *states; >> + BlockDriver *proto_drv; >> + BlockDriver *drv; >> + int flags; >> + const char *format; >> + const char *snapshot_file; >> + >> + QSIMPLEQ_HEAD(snap_bdrv_states, BlkGroupSnapshotStates) snap_bdrv_states; >> + QSIMPLEQ_INIT(&snap_bdrv_states); >> + >> + /* We don't do anything in this loop that commits us to the snapshot */ >> + while (NULL != dev_entry) { >> + dev_info = dev_entry->value; >> + dev_entry = dev_entry->next; >> + >> + states = g_malloc0(sizeof(BlkGroupSnapshotStates)); >> + states->is_open = false; >> + >> + states->old_bs = bdrv_find(dev_info->device); >> + >> + if (!states->old_bs) { >> + error_set(errp, QERR_DEVICE_NOT_FOUND, dev_info->device); >> + goto fail; >> + } >> + >> + if (bdrv_in_use(states->old_bs)) { >> + error_set(errp, QERR_DEVICE_IN_USE, dev_info->device); >> + goto fail; >> + } >> + >> + snapshot_file = dev_info->snapshot_file; >> + >> + flags = states->old_bs->open_flags; >> + >> + if (!dev_info->has_format) { >> + format = "qcow2"; >> + } else { >> + format = dev_info->format; >> + } >> + >> + drv = bdrv_find_format(format); >> + if (!drv) { >> + error_set(errp, QERR_INVALID_BLOCK_FORMAT, format); >> + goto fail; >> + } >> + >> + proto_drv = bdrv_find_protocol(snapshot_file); >> + if (!proto_drv) { >> + error_set(errp, QERR_INVALID_BLOCK_FORMAT, format); >> + goto fail; >> + } >> + >> + /* create new image w/backing file */ >> + ret = bdrv_img_create(snapshot_file, format, >> + 
states->old_bs->filename, >> + drv->format_name, NULL, -1, flags); >> + if (ret) { >> + error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file); >> + goto fail; >> + } >> + >> + /* We will manually add the backing_hd field to the bs later */ >> + states->new_bs = bdrv_new(""); >> + ret = bdrv_open(states->new_bs, snapshot_file, >> + flags | BDRV_O_NO_BACKING, drv); >> + states->is_open = true; >> + >> + if (ret != 0) { >> + error_set(errp, QERR_OPEN_FILE_FAILED, snapshot_file); >> + goto close_and_fail; >> + } >> + QSIMPLEQ_INSERT_TAIL(&snap_bdrv_states, states, entry); > > Inserting it only here means that all the error cases above leak states. Yes. Will move it up to immediately after the bdrv_open(). > >> + } >> + >> + /* >> + * Now we will drain, flush, & pivot everything - we are committed at this >> + * point. >> + */ >> + bdrv_drain_all(); > > I would feel more comfortable if we could do the bdrv_drain_all() at the > very beginning of the function. Not that I know of a specific scenario > that would go wrong, but you have a nested main loop here that could do > more or less anything. I can move it up to the beginning if desired... My thought was that it was best to drain closer to the point of commit. > >> + QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) { >> + bdrv_flush(states->old_bs); > > This can return an error which must be checked. And of course, we must > do it before committing to the snapshot (but after bdrv_drain_all). Hmm. If the flush returns error, do we abandon at this point? Perhaps it would be best to loop through and flush each device first, and if no flushes fail, _then_ loop through and perform the bdrv_append(). Once we are calling bdrv_append(), we want no possible failure points. 
> >> + /* This removes our old bs from the bdrv_states, and adds the new bs */ >> + bdrv_append(states->new_bs, states->old_bs); >> + } >> + >> + /* success */ >> + goto exit; >> + >> +close_and_fail: >> + /* >> + * failure, and it is all-or-none; abandon each new bs, and keep using >> + * the original bs for all images >> + */ >> + QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) { >> + if (states->is_open) { >> + bdrv_close(states->new_bs); > > bdrv_delete, as Stefan already mentioned. > >> + } >> + } >> +fail: >> +exit: >> + QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) { >> + g_free(states); >> + } >> + return; >> +} > > Kevin ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command 2012-02-27 17:02 ` Jeff Cody @ 2012-02-27 17:22 ` Kevin Wolf 2012-02-27 17:31 ` Jeff Cody 0 siblings, 1 reply; 13+ messages in thread From: Kevin Wolf @ 2012-02-27 17:22 UTC (permalink / raw) To: jcody; +Cc: pbonzini, eblake, qemu-devel, armbru, lcapitulino Am 27.02.2012 18:02, schrieb Jeff Cody: >>> + >>> + /* keep the same entry in bdrv_states */ >>> + pstrcpy(tmp.device_name, sizeof(tmp.device_name), bs_top->device_name); >>> + tmp.list = bs_top->list; >>> + >>> + /* swap contents of the fixed new bs and the current top */ >>> + *bs_new = *bs_top; >>> + *bs_top = tmp; >>> + >>> + bdrv_detach_dev(bs_new, bs_new->dev); >>> +} >> >> The last line would actually deserve a comment /* clear the copied >> fields in the new backing file */, which makes clear that there are >> probably some more fields missing in this section. > > OK, added. Only the comment or also clearing other fields? For some of them it's very obvious that they need to be cleared (copy on read, I/O throttling). >>> + } >>> + >>> + /* >>> + * Now we will drain, flush, & pivot everything - we are committed at this >>> + * point. >>> + */ >>> + bdrv_drain_all(); >> >> I would feel more comfortable if we could do the bdrv_drain_all() at the >> very beginning of the function. Not that I know of a specific scenario >> that would go wrong, but you have a nested main loop here that could do >> more or less anything. > > I can move it up to the beginning if desired... My thought was that it > was best to drain closer to the point of commit. As long as we don't create new AIO requests here, drained is drained. But anyway, I'm not requesting a change here, it was just a feeling. >> >>> + QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) { >>> + bdrv_flush(states->old_bs); >> >> This can return an error which must be checked. And of course, we must >> do it before committing to the snapshot (but after bdrv_drain_all). 
> > Hmm. If the flush returns error, do we abandon at this point? Perhaps it > would be best to loop through and flush each device first, and if no > flushes fail, _then_ loop through and perform the bdrv_append(). Once we > are calling bdrv_append(), we want no possible failure points. Yes, this is what I meant. Sorry for the somewhat vague wording. Kevin ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command 2012-02-27 17:22 ` Kevin Wolf @ 2012-02-27 17:31 ` Jeff Cody 0 siblings, 0 replies; 13+ messages in thread From: Jeff Cody @ 2012-02-27 17:31 UTC (permalink / raw) To: Kevin Wolf; +Cc: pbonzini, lcapitulino, eblake, qemu-devel, armbru On 02/27/2012 12:22 PM, Kevin Wolf wrote: > Am 27.02.2012 18:02, schrieb Jeff Cody: >>>> + >>>> + /* keep the same entry in bdrv_states */ >>>> + pstrcpy(tmp.device_name, sizeof(tmp.device_name), bs_top->device_name); >>>> + tmp.list = bs_top->list; >>>> + >>>> + /* swap contents of the fixed new bs and the current top */ >>>> + *bs_new = *bs_top; >>>> + *bs_top = tmp; >>>> + >>>> + bdrv_detach_dev(bs_new, bs_new->dev); >>>> +} >>> >>> The last line would actually deserve a comment /* clear the copied >>> fields in the new backing file */, which makes clear that there are >>> probably some more fields missing in this section. >> >> OK, added. > > Only the comment or also clearing other fields? For some of them it's > very obvious that they need to be cleared (copy on read, I/O throttling). Sorry; yes to clearing the other fields as well. I don't know if it hurts to leave them not cleared, but logically it makes sense to clear them out. > >>>> + } >>>> + >>>> + /* >>>> + * Now we will drain, flush, & pivot everything - we are committed at this >>>> + * point. >>>> + */ >>>> + bdrv_drain_all(); >>> >>> I would feel more comfortable if we could do the bdrv_drain_all() at the >>> very beginning of the function. Not that I know of a specific scenario >>> that would go wrong, but you have a nested main loop here that could do >>> more or less anything. >> >> I can move it up to the beginning if desired... My thought was that it >> was best to drain closer to the point of commit. > > As long as we don't create new AIO requests here, drained is drained. > > But anyway, I'm not requesting a change here, it was just a feeling. 
> >>> >>>> + QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) { >>>> + bdrv_flush(states->old_bs); >>> >>> This can return an error which must be checked. And of course, we must >>> do it before committing to the snapshot (but after bdrv_drain_all). >> >> Hmm. If the flush returns error, do we abandon at this point? Perhaps it >> would be best to loop through and flush each device first, and if no >> flushes fail, _then_ loop through and perform the bdrv_append(). Once we >> are calling bdrv_append(), we want no possible failure points. > > Yes, this is what I meant. Sorry for the somewhat vague wording. > > Kevin > ^ permalink raw reply [flat|nested] 13+ messages in thread
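The fix converged on in this subthread (flush every device in the group first, and pivot only once every flush has succeeded) is a two-phase, all-or-nothing commit pattern. Below is a minimal Python sketch of just that control flow; the names `flush` and `pivot` are stand-ins for bdrv_flush() and bdrv_append(), so this is an illustration of the structure, not QEMU code:

```python
def group_snapshot(devices, flush, pivot):
    """Two-phase all-or-nothing commit: phase 1 runs every operation
    that can fail; phase 2 pivots, with no remaining failure points."""
    # Phase 1: flush every device; abandon the group on the first error.
    for dev in devices:
        if not flush(dev):
            return False          # nothing has been pivoted yet
    # Phase 2: committed. Pivots cannot fail from here on.
    for dev in devices:
        pivot(dev)
    return True


# A failing flush in the middle must leave the whole group unpivoted.
pivoted = []
ok = group_snapshot(["hd0", "hd1", "hd2"],
                    flush=lambda d: d != "hd1",
                    pivot=pivoted.append)
print(ok, pivoted)  # False [] -- no device was pivoted
```

This is the ordering Kevin asks for above: flush errors surface before any pivot runs, so a group snapshot either pivots every device or none of them.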
* [Qemu-devel] [PATCH v2 2/2] QMP: Add qmp command for blockdev-group-snapshot-sync 2012-02-26 19:31 [Qemu-devel] [PATCH v2 0/2] Group Live Snapshots Jeff Cody 2012-02-26 19:31 ` [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command Jeff Cody @ 2012-02-26 19:31 ` Jeff Cody 1 sibling, 0 replies; 13+ messages in thread From: Jeff Cody @ 2012-02-26 19:31 UTC (permalink / raw) To: qemu-devel; +Cc: kwolf, pbonzini, eblake, armbru, lcapitulino This adds the QMP command for blockdev-group-snapshot-sync. It takes an array as input for the devlist argument. The array consists of the following elements: + device: device to snapshot. e.g. "ide-hd0", "virtio0" + snapshot-file: path & file for the snapshot image. e.g. "/tmp/file.img" + format: snapshot format. e.g., "qcow2". Optional There is no HMP equivalent for the command. Signed-off-by: Jeff Cody <jcody@redhat.com> --- qmp-commands.hx | 39 +++++++++++++++++++++++++++++++++++++++ 1 files changed, 39 insertions(+), 0 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index bd6b641..365489c 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -665,6 +665,45 @@ EQMP .args_type = "device:B", .mhandler.cmd_new = qmp_marshal_input_block_job_cancel, }, + { + .name = "blockdev-group-snapshot-sync", + .args_type = "devlist:O", + .params = "device:B,snapshot-file:s,format:s?", + .mhandler.cmd_new = qmp_marshal_input_blockdev_group_snapshot_sync, + }, + +SQMP +blockdev-group-snapshot-sync +---------------------------- + +Synchronous snapshot of one or more block devices. An array input is +accepted that contains the device and snapshot file information for +each device in the group. The default format, if not specified, is qcow2. + +If there is any failure creating or opening a new snapshot, all snapshots +for the group are abandoned, and the original pre-snapshot disk images +remain in use. 
+ + +Arguments: + +devlist array: + - "device": device name to snapshot (json-string) + - "snapshot-file": name of new image file (json-string) + - "format": format of new image (json-string, optional) + +Example: + +-> { "execute": "blockdev-group-snapshot-sync", "arguments": + { "devlist": [{ "device": "ide-hd0", + "snapshot-file": "/some/place/my-image", + "format": "qcow2" }, + { "device": "ide-hd1", + "snapshot-file": "/some/place/my-image2", + "format": "qcow2" }] } } +<- { "return": {} } + +EQMP { .name = "blockdev-snapshot-sync", -- 1.7.9.rc2.1.g69204 ^ permalink raw reply related [flat|nested] 13+ messages in thread
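Since QMP is driven over a JSON socket (QEMU's own test tooling uses Python for this), the documented command can also be assembled programmatically. The following is a sketch only: the device names and snapshot paths are the documentation's example values, and the actual socket transport to the monitor is omitted:

```python
import json

# Build the blockdev-group-snapshot-sync command from the example above.
# Device names and snapshot-file paths are the docs' example values.
cmd = {
    "execute": "blockdev-group-snapshot-sync",
    "arguments": {
        "devlist": [
            {"device": "ide-hd0",
             "snapshot-file": "/some/place/my-image",
             "format": "qcow2"},
            {"device": "ide-hd1",
             "snapshot-file": "/some/place/my-image2",
             "format": "qcow2"},
        ],
    },
}

# "format" is optional per the args_type ("format:s?"); qcow2 is the default.
wire = json.dumps(cmd)
print(wire[:60])
```

On success the monitor replies with an empty return object, { "return": {} }, as shown in the documentation above.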
end of thread, other threads:[~2012-02-27 17:31 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-02-26 19:31 [Qemu-devel] [PATCH v2 0/2] Group Live Snapshots Jeff Cody 2012-02-26 19:31 ` [Qemu-devel] [PATCH v2 1/2] qapi: Introduce blockdev-group-snapshot-sync command Jeff Cody 2012-02-27 11:13 ` Stefan Hajnoczi 2012-02-27 14:26 ` Jeff Cody 2012-02-27 14:40 ` Stefan Hajnoczi 2012-02-27 15:23 ` Jeff Cody 2012-02-27 15:31 ` Stefan Hajnoczi 2012-02-27 15:46 ` Kevin Wolf 2012-02-27 16:51 ` Paolo Bonzini 2012-02-27 17:02 ` Jeff Cody 2012-02-27 17:22 ` Kevin Wolf 2012-02-27 17:31 ` Jeff Cody 2012-02-26 19:31 ` [Qemu-devel] [PATCH v2 2/2] QMP: Add qmp command for blockdev-group-snapshot-sync Jeff Cody