* [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot
@ 2016-01-12  6:03 Denis V. Lunev
  2016-01-12 14:16 ` Kevin Wolf
  2016-01-14 11:33 ` [Qemu-devel] [PATCH 1/1] RESUME " Denis V. Lunev
  0 siblings, 2 replies; 25+ messages in thread
From: Denis V. Lunev @ 2016-01-12  6:03 UTC
Cc: Kevin Wolf, qemu-block, qemu-devel, Paolo Bonzini, Denis V. Lunev, Laszlo Ersek

This is a long story. OVMF VMs cannot be snapshotted using
'virsh snapshot' as they have a "pflash" device which is configured as a
"raw" image. There was a discussion about that in the past.

A good description of the topic has been provided by Laszlo Ersek, see below:

"It is true that a pflash drive is "just a drive" *internally* to QEMU.
It is also true that it more or less takes the same -drive options as
any other *disk* drive. But those facts are just implementation details.

The relevant trait of pflash storage files is that they are not *disk
images*, on the libvirt domain XML level. They are not created in
storage pools, you cannot specify their caching attributes, you don't
specify their guest-visible frontend in separation (like virtio-blk /
virtio-scsi / pflash). Those details are hidden (on purpose).

Consequently, pflash storage files are expected to be *small* in size
(in practice: identically sized to the varstore template they are
instantiated from). They are created under /var/lib/libvirt/qemu/nvram.
Although you can edit their path in the domain XML, they are not
considered disks."

Thus we should avoid selecting "pflash" drives for VM state saving.

For now "pflash" is a read-write raw image, as it is configured by libvirt.
Thus there are no such images in the field, and we could safely disable
the ability to save state to those images inside QEMU.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Laszlo Ersek <lersek@redhat.com>
CC: qemu-block@nongnu.org
---
 block/snapshot.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/block/snapshot.c b/block/snapshot.c
index 2d86b88..1a03581 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -25,6 +25,7 @@
 #include "block/snapshot.h"
 #include "block/block_int.h"
 #include "qapi/qmp/qerror.h"
+#include "sysemu/blockdev.h"
 
 QemuOptsList internal_snapshot_opts = {
     .name = "snapshot",
@@ -481,8 +482,14 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void)
     BlockDriverState *bs = NULL;
 
     while (not_found && (bs = bdrv_next(bs))) {
+        DriveInfo *dinfo;
         AioContext *ctx = bdrv_get_aio_context(bs);
 
+        dinfo = bs->blk != NULL ? blk_legacy_dinfo(bs->blk) : NULL;
+        if (dinfo != NULL && dinfo->type == IF_PFLASH) {
+            continue;
+        }
+
         aio_context_acquire(ctx);
         not_found = !bdrv_can_snapshot(bs);
         aio_context_release(ctx);
-- 
2.5.0
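[Editorial sketch] The selection rule in the patch above can be modelled outside QEMU as follows. This is a minimal sketch under stated assumptions, not QEMU code: ToyDrive, ToyIfType, toy_can_snapshot() and toy_find_vmstate_drive() are hypothetical names, and the "writable qcow2" test is only a stand-in for whatever bdrv_can_snapshot() really checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's interface types and drive state. */
typedef enum { TOY_IF_VIRTIO, TOY_IF_IDE, TOY_IF_PFLASH } ToyIfType;

typedef struct {
    const char *name;
    ToyIfType   type;
    bool        read_only;
    bool        is_qcow2;   /* assumption: only qcow2 holds internal snapshots */
} ToyDrive;

/* Simplified stand-in for bdrv_can_snapshot(). */
static bool toy_can_snapshot(const ToyDrive *d)
{
    return !d->read_only && d->is_qcow2;
}

/*
 * Return the first drive that may hold the VM state, never choosing a
 * pflash drive (the rule the patch adds). Returns NULL if none qualifies.
 */
static const ToyDrive *toy_find_vmstate_drive(const ToyDrive *drives, size_t n)
{
    assert(drives != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (drives[i].type == TOY_IF_PFLASH) {
            continue;               /* skip pflash unconditionally */
        }
        if (toy_can_snapshot(&drives[i])) {
            return &drives[i];
        }
    }
    return NULL;
}
```

Note that when the only snapshot-capable drive is a pflash drive, this rule finds no VM state drive at all, which is the case raised later in the thread for boards that have nothing but flash.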
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot 2016-01-12 6:03 [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot Denis V. Lunev @ 2016-01-12 14:16 ` Kevin Wolf 2016-01-12 14:59 ` Paolo Bonzini 2016-01-12 15:10 ` Denis V. Lunev 2016-01-14 11:33 ` [Qemu-devel] [PATCH 1/1] RESUME " Denis V. Lunev 1 sibling, 2 replies; 25+ messages in thread From: Kevin Wolf @ 2016-01-12 14:16 UTC (permalink / raw) To: Denis V. Lunev; +Cc: Paolo Bonzini, Laszlo Ersek, qemu-devel, qemu-block Am 12.01.2016 um 07:03 hat Denis V. Lunev geschrieben: > There is a long-long story. OVMF VMs can not be snapsotted using > 'virsh snapshot' as they have "pflash" device which is configured as > "raw" image. There was a discussion in the past about that. > > Good description has been provided on topic by Laszlo Ersek, see below: > > "It is true that a pflash drive is "just a drive" *internally* to QEMU. > It is also true that it more or less takes the same -drive options as > any other *disk* drive. But those facts are just implementation details. > > The relevant trait of pflash storage files is that they are not *disk > images*, on the libvirt domain XML level. They are not created in > storage pools, you cannot specify their caching attributes, you don't > specify their guest-visible frontend in separation (like virtio-blk / > virtio-scsi / pflash). Those details are hidden (on purpose). > > Consequently, pflash storage files are expected to be *small* in size > (in practice: identically sized to the varstore template they are > instantiated from). They are created under /var/lib/libvirt/qemu/nvram. > Although you can edit their path in the domain XML, they are not > considered disks." > > Thus we should avoid selection of "pflash" drives for VM state saving. > > For now "pflash" is read-write raw image as it configured by libvirt. 
> Thus there are no such images in the field and we could safely disable > ability to save state to those images inside QEMU. This is obviously broken. If you write to the pflash, then it needs to be snapshotted in order to keep a consistent state. If you want to avoid snapshotting the image, make it read-only and it will be skipped even today. Kevin ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot 2016-01-12 14:16 ` Kevin Wolf @ 2016-01-12 14:59 ` Paolo Bonzini 2016-01-12 15:13 ` Denis V. Lunev 2016-01-12 15:20 ` Kevin Wolf 2016-01-12 15:10 ` Denis V. Lunev 1 sibling, 2 replies; 25+ messages in thread From: Paolo Bonzini @ 2016-01-12 14:59 UTC (permalink / raw) To: Kevin Wolf, Denis V. Lunev; +Cc: Laszlo Ersek, qemu-devel, qemu-block On 12/01/2016 15:16, Kevin Wolf wrote: >> Thus we should avoid selection of "pflash" drives for VM state saving. >> >> For now "pflash" is read-write raw image as it configured by libvirt. >> Thus there are no such images in the field and we could safely disable >> ability to save state to those images inside QEMU. > > This is obviously broken. If you write to the pflash, then it needs to > be snapshotted in order to keep a consistent state. > > If you want to avoid snapshotting the image, make it read-only and it > will be skipped even today. Sort of. The point of having flash is to _not_ make it read-only, so that is not a solution. Flash is already being snapshotted as part of saving RAM state. In fact, for this reason the device (at least the one used with OVMF; I haven't checked other pflash devices) can simply save it back to disk on the migration destination, without the need to use "migrate -b" or shared storage. See commit 4c0cfc72b31a79f737a64ebbe0411e4b83e25771: Author: Laszlo Ersek <lersek@redhat.com> Date: Sat Aug 23 12:19:07 2014 +0200 pflash_cfi01: write flash contents to bdrv on incoming migration A drive that backs a pflash device is special: - it is very small, - its entire contents are kept in a RAMBlock at all times, covering the guest-phys address range that provides the guest's view of the emulated flash chip. The pflash device model keeps the drive (the host-side file) and the guest-visible flash contents in sync. 
When migrating the guest, the guest-visible flash contents (the RAMBlock) is migrated by default, but on the target host, the drive (the host-side file) remains in full sync with the RAMBlock only if: - the source and target hosts share the storage underlying the pflash drive, - or the migration requests full or incremental block migration too, which then covers all drives. Due to the special nature of pflash drives, the following scenario makes sense as well: - no full nor incremental block migration, covering all drives, alongside the base migration (justified eg. by shared storage for "normal" (big) drives), - non-shared storage for pflash drives. In this case, currently only those portions of the flash drive are updated on the target disk that the guest reprograms while running on the target host. In order to restore accord, dump the entire flash contents to the bdrv in a post_load() callback. - The read-only check follows the other call-sites of pflash_update(); - both "pfl->ro" and pflash_update() reflect / consider the case when "pfl->bs" is NULL; - the total size of the flash device is calculated as in pflash_cfi01_realize(). When using shared storage, or requesting full or incremental block migration along with the normal migration, the patch should incur a harmless rewrite from the target side. It is assumed that, on the target host, RAM is loaded ahead of the call to pflash_post_load(). I don't like very much using IF_PFLASH this way, which is why I hadn't replied to the patch so far---I hadn't made up my mind about *what* to suggest instead, or whether to just accept it. However, it does work. Perhaps a separate "I know what I am doing" skip-snapshot option? Or a device callback saying "not snapshotting this is fine"? Paolo ^ permalink raw reply [flat|nested] 25+ messages in thread
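[Editorial sketch] The write-back behaviour described in the quoted commit message can be illustrated roughly as below. This is a sketch under stated assumptions, not the code from hw/block/pflash_cfi01.c: ToyFlash and toy_flash_post_load() are hypothetical, and a plain memory buffer stands in for the host-side drive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a flash device and its backing drive. */
typedef struct {
    unsigned char *ram;       /* guest-visible flash contents (the RAMBlock) */
    unsigned char *backing;   /* stand-in for the host-side file; NULL if none */
    size_t         size;      /* total size of the flash device */
    bool           read_only;
} ToyFlash;

/*
 * After incoming migration has loaded RAM, dump the whole flash contents
 * to the backing store so that both are in sync again. Nothing is written
 * for a read-only device or when no drive is attached.
 * Returns the number of bytes written.
 */
static size_t toy_flash_post_load(ToyFlash *f)
{
    assert(f != NULL);
    if (f->read_only || f->backing == NULL) {
        return 0;
    }
    memcpy(f->backing, f->ram, f->size);
    return f->size;
}
```

With shared storage the copy would merely rewrite identical data, which matches the "harmless rewrite" mentioned in the commit message.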
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot 2016-01-12 14:59 ` Paolo Bonzini @ 2016-01-12 15:13 ` Denis V. Lunev 2016-01-12 15:16 ` Peter Maydell 2016-01-12 15:20 ` Kevin Wolf 1 sibling, 1 reply; 25+ messages in thread From: Denis V. Lunev @ 2016-01-12 15:13 UTC (permalink / raw) To: Paolo Bonzini, Kevin Wolf; +Cc: Laszlo Ersek, qemu-devel, qemu-block On 01/12/2016 05:59 PM, Paolo Bonzini wrote: > > On 12/01/2016 15:16, Kevin Wolf wrote: >>> Thus we should avoid selection of "pflash" drives for VM state saving. >>> >>> For now "pflash" is read-write raw image as it configured by libvirt. >>> Thus there are no such images in the field and we could safely disable >>> ability to save state to those images inside QEMU. >> This is obviously broken. If you write to the pflash, then it needs to >> be snapshotted in order to keep a consistent state. >> >> If you want to avoid snapshotting the image, make it read-only and it >> will be skipped even today. > Sort of. The point of having flash is to _not_ make it read-only, so > that is not a solution. > > Flash is already being snapshotted as part of saving RAM state. In > fact, for this reason the device (at least the one used with OVMF; I > haven't checked other pflash devices) can simply save it back to disk > on the migration destination, without the need to use "migrate -b" or > shared storage. > > See commit 4c0cfc72b31a79f737a64ebbe0411e4b83e25771: > > Author: Laszlo Ersek <lersek@redhat.com> > Date: Sat Aug 23 12:19:07 2014 +0200 > > pflash_cfi01: write flash contents to bdrv on incoming migration > > A drive that backs a pflash device is special: > - it is very small, > - its entire contents are kept in a RAMBlock at all times, covering the > guest-phys address range that provides the guest's view of the emulated > flash chip. > > The pflash device model keeps the drive (the host-side file) and the > guest-visible flash contents in sync. 
When migrating the guest, the > guest-visible flash contents (the RAMBlock) is migrated by default, but on > the target host, the drive (the host-side file) remains in full sync with > the RAMBlock only if: > - the source and target hosts share the storage underlying the pflash > drive, > - or the migration requests full or incremental block migration too, which > then covers all drives. > > Due to the special nature of pflash drives, the following scenario makes > sense as well: > - no full nor incremental block migration, covering all drives, alongside > the base migration (justified eg. by shared storage for "normal" (big) > drives), > - non-shared storage for pflash drives. > > In this case, currently only those portions of the flash drive are updated > on the target disk that the guest reprograms while running on the target > host. > > In order to restore accord, dump the entire flash contents to the bdrv in > a post_load() callback. > > - The read-only check follows the other call-sites of pflash_update(); > - both "pfl->ro" and pflash_update() reflect / consider the case when > "pfl->bs" is NULL; > - the total size of the flash device is calculated as in > pflash_cfi01_realize(). > > When using shared storage, or requesting full or incremental block > migration along with the normal migration, the patch should incur a > harmless rewrite from the target side. > > It is assumed that, on the target host, RAM is loaded ahead of the call to > pflash_post_load(). > > I don't like very much using IF_PFLASH this way, which is why I hadn't > replied to the patch so far---I hadn't made up my mind about *what* to > suggest instead, or whether to just accept it. However, it does work. > > Perhaps a separate "I know what I am doing" skip-snapshot option? Or > a device callback saying "not snapshotting this is fine"? > > Paolo Paolo, it looks I have made a bad description :( The idea of this patch was trivial. 
First of all, I would like to keep this image internally snapshotted.
That is why the ultimate goal was to switch from raw to qcow2, to keep
changes inside the image.

However, in that case this drive could be selected to save the VM state,
which could be big. The function being changed selects the image for VM
state saving.

Here I would like to prevent IF_PFLASH drives from being selected, so
that the pflash image stays small, as required by the libvirt guys.

Den
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot
  2016-01-12 15:13 ` Denis V. Lunev
@ 2016-01-12 15:16 ` Peter Maydell
  2016-01-12 15:26 ` Kevin Wolf
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Maydell @ 2016-01-12 15:16 UTC
To: Denis V. Lunev
Cc: Kevin Wolf, Paolo Bonzini, Laszlo Ersek, QEMU Developers, Qemu-block

On 12 January 2016 at 15:13, Denis V. Lunev <den@openvz.org> wrote:
> The idea of this patch was trivial. First of all, I would like to keep
> this image internally snapshoted. That is why the ultimate goal
> was to switch from raw to qcow2 to keep changes inside the
> image.
>
> Though in this case this drive could be selected to save VM
> state, which could be big. The function being changed selects
> the image for VM state saving.
>
> here I would like to skip IP_PFLASH from being selected to keep
> it small as required by libvirt guys.

This has to be a board-specific decision. Some of our machine
models might have no backing storage other than an IF_PFLASH
drive, but it's still nice to be able to do savevm/loadvm on them.

thanks
-- PMM
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot 2016-01-12 15:16 ` Peter Maydell @ 2016-01-12 15:26 ` Kevin Wolf 0 siblings, 0 replies; 25+ messages in thread From: Kevin Wolf @ 2016-01-12 15:26 UTC (permalink / raw) To: Peter Maydell Cc: Denis V. Lunev, Laszlo Ersek, QEMU Developers, Qemu-block, Paolo Bonzini Am 12.01.2016 um 16:16 hat Peter Maydell geschrieben: > On 12 January 2016 at 15:13, Denis V. Lunev <den@openvz.org> wrote: > > The idea of this patch was trivial. First of all, I would like to keep > > this image internally snapshoted. That is why the ultimate goal > > was to switch from raw to qcow2 to keep changes inside the > > image. > > > > Though in this case this drive could be selected to save VM > > state, which could be big. The function being changed selects > > the image for VM state saving. > > > > here I would like to skip IP_PFLASH from being selected to keep > > it small as required by libvirt guys. > > This has to be a board specific decision. Some of our machine > models might have no backing storage other than an IP_PFLASH > drive, but it's still nice to be able to do vmsave/vmload on them. Maybe we can give flash images lower priority than other images? I'm not sure if we don't break compatibility with such a change, though. loadvm on existing snapshots could fail now. We might need to change that first so that it can find snapshots even on images that wouldn't be the VM state image for new snapshots any more. Kevin ^ permalink raw reply [flat|nested] 25+ messages in thread
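[Editorial sketch] The "lower priority" idea suggested above could look roughly like this. It is only a sketch under assumptions: PrioDrive and prio_find_vmstate_drive() are hypothetical names, and can_snapshot is a precomputed flag standing in for bdrv_can_snapshot().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical drive description. */
typedef struct {
    const char *name;
    bool        is_pflash;
    bool        can_snapshot;   /* stand-in for bdrv_can_snapshot() */
} PrioDrive;

/*
 * Prefer the first snapshot-capable non-flash drive. A snapshot-capable
 * flash drive is remembered and used only if nothing better exists, so
 * boards whose sole storage is flash can still save VM state.
 */
static const PrioDrive *prio_find_vmstate_drive(const PrioDrive *drives,
                                                size_t n)
{
    const PrioDrive *fallback = NULL;

    assert(drives != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!drives[i].can_snapshot) {
            continue;
        }
        if (!drives[i].is_pflash) {
            return &drives[i];          /* regular drive wins immediately */
        }
        if (fallback == NULL) {
            fallback = &drives[i];      /* first usable flash drive */
        }
    }
    return fallback;
}
```

The sketch also shows where the compatibility concern comes from: a snapshot taken under the old ordering may have its VM state on the flash image, while the new ordering would point loadvm at a different drive.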
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot 2016-01-12 14:59 ` Paolo Bonzini 2016-01-12 15:13 ` Denis V. Lunev @ 2016-01-12 15:20 ` Kevin Wolf 2016-01-12 15:35 ` Paolo Bonzini 2016-01-12 15:47 ` Denis V. Lunev 1 sibling, 2 replies; 25+ messages in thread From: Kevin Wolf @ 2016-01-12 15:20 UTC (permalink / raw) To: Paolo Bonzini; +Cc: Denis V. Lunev, Laszlo Ersek, qemu-devel, qemu-block Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben: > > > On 12/01/2016 15:16, Kevin Wolf wrote: > >> Thus we should avoid selection of "pflash" drives for VM state saving. > >> > >> For now "pflash" is read-write raw image as it configured by libvirt. > >> Thus there are no such images in the field and we could safely disable > >> ability to save state to those images inside QEMU. > > > > This is obviously broken. If you write to the pflash, then it needs to > > be snapshotted in order to keep a consistent state. > > > > If you want to avoid snapshotting the image, make it read-only and it > > will be skipped even today. > > Sort of. The point of having flash is to _not_ make it read-only, so > that is not a solution. > > Flash is already being snapshotted as part of saving RAM state. In > fact, for this reason the device (at least the one used with OVMF; I > haven't checked other pflash devices) can simply save it back to disk > on the migration destination, without the need to use "migrate -b" or > shared storage. > [...] > I don't like very much using IF_PFLASH this way, which is why I hadn't > replied to the patch so far---I hadn't made up my mind about *what* to > suggest instead, or whether to just accept it. However, it does work. > > Perhaps a separate "I know what I am doing" skip-snapshot option? Or > a device callback saying "not snapshotting this is fine"? Boy, is this ugly... What do you do with disk-only snapshots? The recovery only works as long as you have VM state. 
Kevin
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot 2016-01-12 15:20 ` Kevin Wolf @ 2016-01-12 15:35 ` Paolo Bonzini 2016-01-12 15:47 ` Denis V. Lunev 1 sibling, 0 replies; 25+ messages in thread From: Paolo Bonzini @ 2016-01-12 15:35 UTC (permalink / raw) To: Kevin Wolf; +Cc: Denis V. Lunev, Laszlo Ersek, qemu-devel, qemu-block On 12/01/2016 16:20, Kevin Wolf wrote: > > Flash is already being snapshotted as part of saving RAM state. In > > fact, for this reason the device (at least the one used with OVMF; I > > haven't checked other pflash devices) can simply save it back to disk > > on the migration destination, without the need to use "migrate -b" or > > shared storage. > > Boy, is this ugly... > > What do you do with disk-only snapshots? The recovery only works as long > as you have VM state. Turns out I had misunderstood Denis's patch, but FWIW this _is_ done as part of migration or savevm, so the VM state is available. Paolo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot 2016-01-12 15:20 ` Kevin Wolf 2016-01-12 15:35 ` Paolo Bonzini @ 2016-01-12 15:47 ` Denis V. Lunev 2016-01-12 16:35 ` Denis V. Lunev 2016-01-13 10:37 ` Laszlo Ersek 1 sibling, 2 replies; 25+ messages in thread From: Denis V. Lunev @ 2016-01-12 15:47 UTC (permalink / raw) To: Kevin Wolf, Paolo Bonzini; +Cc: Laszlo Ersek, qemu-devel, qemu-block On 01/12/2016 06:20 PM, Kevin Wolf wrote: > Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben: >> >> On 12/01/2016 15:16, Kevin Wolf wrote: >>>> Thus we should avoid selection of "pflash" drives for VM state saving. >>>> >>>> For now "pflash" is read-write raw image as it configured by libvirt. >>>> Thus there are no such images in the field and we could safely disable >>>> ability to save state to those images inside QEMU. >>> This is obviously broken. If you write to the pflash, then it needs to >>> be snapshotted in order to keep a consistent state. >>> >>> If you want to avoid snapshotting the image, make it read-only and it >>> will be skipped even today. >> Sort of. The point of having flash is to _not_ make it read-only, so >> that is not a solution. >> >> Flash is already being snapshotted as part of saving RAM state. In >> fact, for this reason the device (at least the one used with OVMF; I >> haven't checked other pflash devices) can simply save it back to disk >> on the migration destination, without the need to use "migrate -b" or >> shared storage. >> [...] >> I don't like very much using IF_PFLASH this way, which is why I hadn't >> replied to the patch so far---I hadn't made up my mind about *what* to >> suggest instead, or whether to just accept it. However, it does work. >> >> Perhaps a separate "I know what I am doing" skip-snapshot option? Or >> a device callback saying "not snapshotting this is fine"? > Boy, is this ugly... > > What do you do with disk-only snapshots? 
The recovery only works as long
> as you have VM state.
>
> Kevin

Actually, I am in a bit of trouble :(

I understand that this is ugly, but I would like to make 'virsh snapshot'
work for OVMF VMs. We need this in order to make a release.

Currently the libvirt guys generate XML in the following way:

  <os>
    <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram>
  </os>

This results in:

qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
     -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1

This obviously cannot pass the check in bdrv_all_can_snapshot(), as the
'pflash' drive is RW and raw, i.e. it cannot be snapshotted.

They have discussed the switch to the following command line:

qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
     -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1

and say that in this case the VM state could end up in the PFLASH
drive, which should not be big, as the location of the file is
different. This means that I am doomed here.

Either we force the libvirt people to give up their opinion that
pflash should be small, which I am unable to do, or I invent a way
to ban VM state saving into pflash.

OK. There are 2 options.

1) Ban pflash as it was done.
2) Add 'no-vmstate' flag to -drive (invented just now).

Den

P.S. Here is a summary that my colleague has received from the libvirt list.

-------- Forwarded Message --------
Subject: Re: Snapshotting OVMF guests
Date: Mon, 11 Jan 2016 13:56:29 +0100
From: Laszlo Ersek <lersek@redhat.com>
To: Dmitry Andreev <dandreev@virtuozzo.com>
CC: Michal Privoznik <mprivozn@redhat.com>, Markus Armbruster <armbru@redhat.com>

Hello Dmitry,

(Cc: Markus.)

> https://bugzilla.redhat.com/show_bug.cgi?id=1180955

I have now re-read that BZ.
In comment 7 I wrote, > However, if Michal's v2 libvirt patchset was applied, and the varstore > drive was qcow2, then qemu would dump the *entire VM state*, including > memory and device state, into the varstore drive (the 6th drive) under > the command line visible in comment #0. That's *completely* bogus; > much worse than rejecting the snapshot request. It is bogus for size and configuration reasons. It is true that a pflash drive is "just a drive" *internally* to QEMU. It is also true that it more or less takes the same -drive options as any other *disk* drive. But those facts are just implementation details. The relevant trait of pflash storage files is that they are not *disk images*, on the libvirt domain XML level. They are not created in storage pools, you cannot specify their caching attributes, you don't specify their guest-visible frontend in separation (like virtio-blk / virtio-scsi / pflash). Those details are hidden (on purpose). Consequently, pflash storage files are expected to be *small* in size (in practice: identically sized to the varstore template they are instantiated from). They are created under /var/lib/libvirt/qemu/nvram. Although you can edit their path in the domain XML, they are not considered disks. This is also reflected in the way they are migrated. They are not migrated with NBD / live storage migration / blockdev migration. Instead, on the target host, when the in-migration completes, the entire contents of the flash drive are written out in one shot to the target host file. Please see: - the pflash_post_load() function in QEMU's "hw/block/pflash_cfi01.c", - and QEMU commit 4c0cfc72. Storing large amounts or data in the pflash storage file would be incompatible with this concept. ... We also had an internal team discussion at Red Hat about this. I won't re-read it now, but I think I can share a part of my own BZ comment 9. In that comment I tried to summarize the internal discussion more or less for myself. 
(I made that comment private because it contained RH product related bits too -- I won't quote those bits now.) So from comment 9: > [...] the upshot from [the internal discussion] seems to be that > "savevm" is *in general* inappropriate for any non-trivial -drive > setup and/or for a -drive setup that is subject to change (eg. > reordering on the command line). Comment 11 in the BZ shows that we plan to document the limitation that internal snapshotting will never be supported for OVMF. External snapshotting *should* be, but it isn't yet either (because, at least at the time of writing the BZ comment, reverting to external snapshots wasn't supported). Bottom line, pflash is implemented as a drive internally, but it is not considered a *disk* drive, for migration, snapshotting, being stored in pools, or for storing large amounts of data. I hope this helps. If you'd like to enable snapshotting for OVMF virtual machines, that would be awesome; but I think it would require implementing the above-mentioned "revert to external snapshot" functionality. Thanks! Laszlo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot 2016-01-12 15:47 ` Denis V. Lunev @ 2016-01-12 16:35 ` Denis V. Lunev 2016-01-12 16:52 ` Kevin Wolf 2016-01-13 10:37 ` Laszlo Ersek 1 sibling, 1 reply; 25+ messages in thread From: Denis V. Lunev @ 2016-01-12 16:35 UTC (permalink / raw) To: Kevin Wolf, Paolo Bonzini; +Cc: Laszlo Ersek, qemu-devel, qemu-block On 01/12/2016 06:47 PM, Denis V. Lunev wrote: > On 01/12/2016 06:20 PM, Kevin Wolf wrote: >> Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben: >>> >>> On 12/01/2016 15:16, Kevin Wolf wrote: >>>>> Thus we should avoid selection of "pflash" drives for VM state >>>>> saving. >>>>> >>>>> For now "pflash" is read-write raw image as it configured by libvirt. >>>>> Thus there are no such images in the field and we could safely >>>>> disable >>>>> ability to save state to those images inside QEMU. >>>> This is obviously broken. If you write to the pflash, then it needs to >>>> be snapshotted in order to keep a consistent state. >>>> >>>> If you want to avoid snapshotting the image, make it read-only and it >>>> will be skipped even today. >>> Sort of. The point of having flash is to _not_ make it read-only, so >>> that is not a solution. >>> >>> Flash is already being snapshotted as part of saving RAM state. In >>> fact, for this reason the device (at least the one used with OVMF; I >>> haven't checked other pflash devices) can simply save it back to disk >>> on the migration destination, without the need to use "migrate -b" or >>> shared storage. >>> [...] >>> I don't like very much using IF_PFLASH this way, which is why I hadn't >>> replied to the patch so far---I hadn't made up my mind about *what* to >>> suggest instead, or whether to just accept it. However, it does work. >>> >>> Perhaps a separate "I know what I am doing" skip-snapshot option? Or >>> a device callback saying "not snapshotting this is fine"? >> Boy, is this ugly... 
>> >> What do you do with disk-only snapshots? The recovery only works as long >> as you have VM state. >> >> Kevin > actually I am in a bit of trouble :( > > I understand that this is ugly, but I would like to make working > 'virsh snapshot' for OVFM VMs. This is necessary for us to make > a release. > > Currently libvirt guys generate XML in the following way: > > <os> > <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type> > <loader readonly='yes' > type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader> > <nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram> > </os> > > This results in: > > qemu -drive > file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on > \ > -drive > file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1 > > This obviously can not pass check in bdrv_all_can_snapshot() > as 'pflash' is RW and raw, i.e. can not be snapshoted. > > They have discussed the switch to the following command line: > > qemu -drive > file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on > \ > -drive > file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1 > > and say that in this case VM state could fall into PFLASH > drive which is should not be big as the location of the > file is different. This means that I am doomed here. > > Either we should force libvirt people to forget about their > opinion that pflash should be small which I am unable to > do or I should invent a way to ban VM state saving into > pflash. > > OK. There are 2 options. > > 1) Ban pflash as it was done. > 2) Add 'no-vmstate' flag to -drive (invented just now). 
>

something like this:

diff --git a/block.c b/block.c
index 3e1877d..8900589 100644
--- a/block.c
+++ b/block.c
@@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = {
             .help = "Block driver to use for the node",
         },
         {
+            .name = "novmstate",
+            .type = QEMU_OPT_BOOL,
+            .help = "Ignore for selecting to save VM state",
+        },
+        {
             .name = BDRV_OPT_CACHE_WB,
             .type = QEMU_OPT_BOOL,
             .help = "Enable writeback mode",
@@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
     bs->request_alignment = 512;
     bs->zero_beyond_eof = true;
     bs->read_only = !(bs->open_flags & BDRV_O_RDWR);
+    bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false);
 
     if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) {
         error_setg(errp,
diff --git a/block/snapshot.c b/block/snapshot.c
index 2d86b88..33cdd86 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void)
     while (not_found && (bs = bdrv_next(bs))) {
         AioContext *ctx = bdrv_get_aio_context(bs);
 
+        if (bs->disable_vmstate_save) {
+            continue;
+        }
+
         aio_context_acquire(ctx);
         not_found = !bdrv_can_snapshot(bs);
         aio_context_release(ctx);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 256609d..855a209 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -438,6 +438,9 @@ struct BlockDriverState {
     /* do we need to tell the quest if we have a volatile write cache? */
     int enable_write_cache;
 
+    /* skip this BDS searching for one to save VM state */
+    bool disable_vmstate_save;
+
     /* the following member gives a name to every node on the bs graph. */
     char node_name[32];
     /* element of the list of named nodes building the graph */
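[Editorial sketch] The draft above reads a boolean option with a default of false. How such a lookup behaves on a -drive style option string can be sketched as below. This is an illustration under assumptions, not QEMU's option parser: toy_opt_get_bool() is a hypothetical helper that only understands plain "key=on" / "key=off" segments separated by commas, and "novmstate" is the option name proposed in this thread, not an existing QEMU option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Look up "key=on" or "key=off" in a comma-separated option string.
 * Returns defval when the key is absent or its value is not recognized.
 * Keys are matched only at the start of a segment, so "vmstate" does
 * not match a "novmstate=..." segment.
 */
static bool toy_opt_get_bool(const char *opts, const char *key, bool defval)
{
    const char *p = opts;
    size_t klen;

    assert(opts != NULL && key != NULL);
    klen = strlen(key);

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
            const char *val = p + klen + 1;
            size_t vlen = len - klen - 1;

            if (vlen == 2 && strncmp(val, "on", 2) == 0) {
                return true;
            }
            if (vlen == 3 && strncmp(val, "off", 3) == 0) {
                return false;
            }
        }
        if (end == NULL) {
            break;
        }
        p = end + 1;
    }
    return defval;
}
```

Because the default is a parameter, the same helper covers both polarities: an opt-out flag that defaults to false, or a positive-sense flag that defaults to true and is switched off only for the drives that must not receive VM state.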
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot 2016-01-12 16:35 ` Denis V. Lunev @ 2016-01-12 16:52 ` Kevin Wolf 2016-01-12 16:58 ` Denis V. Lunev 2016-01-12 17:40 ` Markus Armbruster 0 siblings, 2 replies; 25+ messages in thread From: Kevin Wolf @ 2016-01-12 16:52 UTC (permalink / raw) To: Denis V. Lunev; +Cc: Paolo Bonzini, Laszlo Ersek, qemu-devel, qemu-block Am 12.01.2016 um 17:35 hat Denis V. Lunev geschrieben: > On 01/12/2016 06:47 PM, Denis V. Lunev wrote: > >On 01/12/2016 06:20 PM, Kevin Wolf wrote: > >>Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben: > >>> > >>>On 12/01/2016 15:16, Kevin Wolf wrote: > >>>>>Thus we should avoid selection of "pflash" drives for VM > >>>>>state saving. > >>>>> > >>>>>For now "pflash" is read-write raw image as it configured by libvirt. > >>>>>Thus there are no such images in the field and we could > >>>>>safely disable > >>>>>ability to save state to those images inside QEMU. > >>>>This is obviously broken. If you write to the pflash, then it needs to > >>>>be snapshotted in order to keep a consistent state. > >>>> > >>>>If you want to avoid snapshotting the image, make it read-only and it > >>>>will be skipped even today. > >>>Sort of. The point of having flash is to _not_ make it read-only, so > >>>that is not a solution. > >>> > >>>Flash is already being snapshotted as part of saving RAM state. In > >>>fact, for this reason the device (at least the one used with OVMF; I > >>>haven't checked other pflash devices) can simply save it back to disk > >>>on the migration destination, without the need to use "migrate -b" or > >>>shared storage. > >>>[...] > >>>I don't like very much using IF_PFLASH this way, which is why I hadn't > >>>replied to the patch so far---I hadn't made up my mind about *what* to > >>>suggest instead, or whether to just accept it. However, it does work. > >>> > >>>Perhaps a separate "I know what I am doing" skip-snapshot option? 
Or > >>>a device callback saying "not snapshotting this is fine"? > >>Boy, is this ugly... > >> > >>What do you do with disk-only snapshots? The recovery only works as long > >>as you have VM state. > >> > >>Kevin > >actually I am in a bit of trouble :( > > > >I understand that this is ugly, but I would like to make working > >'virsh snapshot' for OVFM VMs. This is necessary for us to make > >a release. > > > >Currently libvirt guys generate XML in the following way: > > > > <os> > > <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type> > > <loader readonly='yes' > >type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader> > ><nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram> > > </os> > > > >This results in: > > > >qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on > >\ > > -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1 > > > >This obviously can not pass check in bdrv_all_can_snapshot() > >as 'pflash' is RW and raw, i.e. can not be snapshoted. > > > >They have discussed the switch to the following command line: > > > >qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on > >\ > > -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1 > > > >and say that in this case VM state could fall into PFLASH > >drive which is should not be big as the location of the > >file is different. This means that I am doomed here. > > > >Either we should force libvirt people to forget about their > >opinion that pflash should be small which I am unable to > >do or I should invent a way to ban VM state saving into > >pflash. > > > >OK. There are 2 options. > > > >1) Ban pflash as it was done. > >2) Add 'no-vmstate' flag to -drive (invented just now). 
> > > something like this: > > diff --git a/block.c b/block.c > index 3e1877d..8900589 100644 > --- a/block.c > +++ b/block.c > @@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = { > .help = "Block driver to use for the node", > }, > { > + .name = "novmstate", > + .type = QEMU_OPT_BOOL, > + .help = "Ignore for selecting to save VM state", > + }, > + { > .name = BDRV_OPT_CACHE_WB, > .type = QEMU_OPT_BOOL, > .help = "Enable writeback mode", > @@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState > *bs, BdrvChild *file, > bs->request_alignment = 512; > bs->zero_beyond_eof = true; > bs->read_only = !(bs->open_flags & BDRV_O_RDWR); > + bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false); > > if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) { > error_setg(errp, > diff --git a/block/snapshot.c b/block/snapshot.c > index 2d86b88..33cdd86 100644 > --- a/block/snapshot.c > +++ b/block/snapshot.c > @@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void) > while (not_found && (bs = bdrv_next(bs))) { > AioContext *ctx = bdrv_get_aio_context(bs); > > + if (bs->disable_vmstate_save) { > + continue; > + } > + > aio_context_acquire(ctx); > not_found = !bdrv_can_snapshot(bs); > aio_context_release(ctx); > diff --git a/include/block/block_int.h b/include/block/block_int.h > index 256609d..855a209 100644 > --- a/include/block/block_int.h > +++ b/include/block/block_int.h > @@ -438,6 +438,9 @@ struct BlockDriverState { > /* do we need to tell the quest if we have a volatile write cache? */ > int enable_write_cache; > > + /* skip this BDS searching for one to save VM state */ > + bool disable_vmstate_save; > + > /* the following member gives a name to every node on the bs graph. */ > char node_name[32]; > /* element of the list of named nodes building the graph */ That sounds like an option. (No pun intended.) We can discuss the option name (perhaps "vmstate" defaulting to "on" is better?) 
and variable names (I'd prefer them to match the option name); also you'll need to extend the QAPI schema for blockdev-add. But all of these are minor points and the idea seems sane.

Kevin
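For readers following the naming discussion, here is a minimal sketch of how the selection loop could behave with a per-drive "vmstate" flag that defaults to on. It is a simplified standalone model, not QEMU code: `struct drive`, its fields, and `find_vmstate_drive()` are hypothetical names invented for illustration, and the real patch works on BlockDriverState inside bdrv_all_find_vmstate_bs().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a block device (not QEMU's BDS). */
struct drive {
    const char *name;
    bool can_snapshot;  /* e.g. a writable qcow2 image; false for raw or read-only */
    bool vmstate;       /* assumed user option; "on" (true) unless the user disables it */
};

/*
 * Return the first drive that is allowed to hold VM state and is able to
 * take an internal snapshot, or NULL if no drive qualifies.
 */
const struct drive *find_vmstate_drive(const struct drive *drives, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!drives[i].vmstate) {
            continue;   /* the user opted this drive out, e.g. a pflash varstore */
        }
        if (drives[i].can_snapshot) {
            return &drives[i];
        }
    }
    return NULL;
}
```

With a positive flag the default case needs no extra configuration, and only the pflash drive would carry vmstate=off; whether that spelling is what gets merged is still open in the thread.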
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot 2016-01-12 16:52 ` Kevin Wolf @ 2016-01-12 16:58 ` Denis V. Lunev 2016-01-12 17:40 ` Markus Armbruster 1 sibling, 0 replies; 25+ messages in thread From: Denis V. Lunev @ 2016-01-12 16:58 UTC (permalink / raw) To: Kevin Wolf; +Cc: Paolo Bonzini, Laszlo Ersek, qemu-devel, qemu-block On 01/12/2016 07:52 PM, Kevin Wolf wrote: > Am 12.01.2016 um 17:35 hat Denis V. Lunev geschrieben: >> On 01/12/2016 06:47 PM, Denis V. Lunev wrote: >>> On 01/12/2016 06:20 PM, Kevin Wolf wrote: >>>> Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben: >>>>> On 12/01/2016 15:16, Kevin Wolf wrote: >>>>>>> Thus we should avoid selection of "pflash" drives for VM >>>>>>> state saving. >>>>>>> >>>>>>> For now "pflash" is read-write raw image as it configured by libvirt. >>>>>>> Thus there are no such images in the field and we could >>>>>>> safely disable >>>>>>> ability to save state to those images inside QEMU. >>>>>> This is obviously broken. If you write to the pflash, then it needs to >>>>>> be snapshotted in order to keep a consistent state. >>>>>> >>>>>> If you want to avoid snapshotting the image, make it read-only and it >>>>>> will be skipped even today. >>>>> Sort of. The point of having flash is to _not_ make it read-only, so >>>>> that is not a solution. >>>>> >>>>> Flash is already being snapshotted as part of saving RAM state. In >>>>> fact, for this reason the device (at least the one used with OVMF; I >>>>> haven't checked other pflash devices) can simply save it back to disk >>>>> on the migration destination, without the need to use "migrate -b" or >>>>> shared storage. >>>>> [...] >>>>> I don't like very much using IF_PFLASH this way, which is why I hadn't >>>>> replied to the patch so far---I hadn't made up my mind about *what* to >>>>> suggest instead, or whether to just accept it. However, it does work. >>>>> >>>>> Perhaps a separate "I know what I am doing" skip-snapshot option? 
Or >>>>> a device callback saying "not snapshotting this is fine"? >>>> Boy, is this ugly... >>>> >>>> What do you do with disk-only snapshots? The recovery only works as long >>>> as you have VM state. >>>> >>>> Kevin >>> actually I am in a bit of trouble :( >>> >>> I understand that this is ugly, but I would like to make working >>> 'virsh snapshot' for OVFM VMs. This is necessary for us to make >>> a release. >>> >>> Currently libvirt guys generate XML in the following way: >>> >>> <os> >>> <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type> >>> <loader readonly='yes' >>> type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader> >>> <nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram> >>> </os> >>> >>> This results in: >>> >>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on >>> \ >>> -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1 >>> >>> This obviously can not pass check in bdrv_all_can_snapshot() >>> as 'pflash' is RW and raw, i.e. can not be snapshoted. >>> >>> They have discussed the switch to the following command line: >>> >>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on >>> \ >>> -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1 >>> >>> and say that in this case VM state could fall into PFLASH >>> drive which is should not be big as the location of the >>> file is different. This means that I am doomed here. >>> >>> Either we should force libvirt people to forget about their >>> opinion that pflash should be small which I am unable to >>> do or I should invent a way to ban VM state saving into >>> pflash. >>> >>> OK. There are 2 options. >>> >>> 1) Ban pflash as it was done. >>> 2) Add 'no-vmstate' flag to -drive (invented just now). 
>>> >> something like this: >> >> diff --git a/block.c b/block.c >> index 3e1877d..8900589 100644 >> --- a/block.c >> +++ b/block.c >> @@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = { >> .help = "Block driver to use for the node", >> }, >> { >> + .name = "novmstate", >> + .type = QEMU_OPT_BOOL, >> + .help = "Ignore for selecting to save VM state", >> + }, >> + { >> .name = BDRV_OPT_CACHE_WB, >> .type = QEMU_OPT_BOOL, >> .help = "Enable writeback mode", >> @@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState >> *bs, BdrvChild *file, >> bs->request_alignment = 512; >> bs->zero_beyond_eof = true; >> bs->read_only = !(bs->open_flags & BDRV_O_RDWR); >> + bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false); >> >> if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) { >> error_setg(errp, >> diff --git a/block/snapshot.c b/block/snapshot.c >> index 2d86b88..33cdd86 100644 >> --- a/block/snapshot.c >> +++ b/block/snapshot.c >> @@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void) >> while (not_found && (bs = bdrv_next(bs))) { >> AioContext *ctx = bdrv_get_aio_context(bs); >> >> + if (bs->disable_vmstate_save) { >> + continue; >> + } >> + >> aio_context_acquire(ctx); >> not_found = !bdrv_can_snapshot(bs); >> aio_context_release(ctx); >> diff --git a/include/block/block_int.h b/include/block/block_int.h >> index 256609d..855a209 100644 >> --- a/include/block/block_int.h >> +++ b/include/block/block_int.h >> @@ -438,6 +438,9 @@ struct BlockDriverState { >> /* do we need to tell the quest if we have a volatile write cache? */ >> int enable_write_cache; >> >> + /* skip this BDS searching for one to save VM state */ >> + bool disable_vmstate_save; >> + >> /* the following member gives a name to every node on the bs graph. */ >> char node_name[32]; >> /* element of the list of named nodes building the graph */ > That sounds like an option. (No pun intended.) 
>
> We can discuss the option name (perhaps "vmstate" defaulting to "on" is
> better?) and variable names (I'd prefer them to match the option name);
> also you'll need to extend the QAPI schema for blockdev-add. But all of
> these are minor points and the idea seems sane.
>
> Kevin

Perfect! Thanks, all, for the discussion :)

Den
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot 2016-01-12 16:52 ` Kevin Wolf 2016-01-12 16:58 ` Denis V. Lunev @ 2016-01-12 17:40 ` Markus Armbruster 2016-01-12 17:50 ` Kevin Wolf ` (2 more replies) 1 sibling, 3 replies; 25+ messages in thread From: Markus Armbruster @ 2016-01-12 17:40 UTC (permalink / raw) To: Kevin Wolf Cc: Denis V. Lunev, Laszlo Ersek, qemu-devel, qemu-block, Paolo Bonzini Kevin Wolf <kwolf@redhat.com> writes: > Am 12.01.2016 um 17:35 hat Denis V. Lunev geschrieben: >> On 01/12/2016 06:47 PM, Denis V. Lunev wrote: >> >On 01/12/2016 06:20 PM, Kevin Wolf wrote: >> >>Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben: >> >>> >> >>>On 12/01/2016 15:16, Kevin Wolf wrote: >> >>>>>Thus we should avoid selection of "pflash" drives for VM >> >>>>>state saving. >> >>>>> >> >>>>>For now "pflash" is read-write raw image as it configured by libvirt. >> >>>>>Thus there are no such images in the field and we could >> >>>>>safely disable >> >>>>>ability to save state to those images inside QEMU. >> >>>>This is obviously broken. If you write to the pflash, then it needs to >> >>>>be snapshotted in order to keep a consistent state. >> >>>> >> >>>>If you want to avoid snapshotting the image, make it read-only and it >> >>>>will be skipped even today. >> >>>Sort of. The point of having flash is to _not_ make it read-only, so >> >>>that is not a solution. >> >>> >> >>>Flash is already being snapshotted as part of saving RAM state. In >> >>>fact, for this reason the device (at least the one used with OVMF; I >> >>>haven't checked other pflash devices) can simply save it back to disk >> >>>on the migration destination, without the need to use "migrate -b" or >> >>>shared storage. >> >>>[...] >> >>>I don't like very much using IF_PFLASH this way, which is why I hadn't >> >>>replied to the patch so far---I hadn't made up my mind about *what* to >> >>>suggest instead, or whether to just accept it. However, it does work. 
>> >>> >> >>>Perhaps a separate "I know what I am doing" skip-snapshot option? Or >> >>>a device callback saying "not snapshotting this is fine"? >> >>Boy, is this ugly... >> >> >> >>What do you do with disk-only snapshots? The recovery only works as long >> >>as you have VM state. >> >> >> >>Kevin >> >actually I am in a bit of trouble :( >> > >> >I understand that this is ugly, but I would like to make working >> >'virsh snapshot' for OVFM VMs. This is necessary for us to make >> >a release. >> > >> >Currently libvirt guys generate XML in the following way: >> > >> > <os> >> > <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type> >> > <loader readonly='yes' >> >type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader> >> ><nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram> >> > </os> >> > >> >This results in: >> > >> >qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on >> >\ >> > -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1 >> > >> >This obviously can not pass check in bdrv_all_can_snapshot() >> >as 'pflash' is RW and raw, i.e. can not be snapshoted. >> > >> >They have discussed the switch to the following command line: >> > >> >qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on >> >\ >> > -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1 >> > >> >and say that in this case VM state could fall into PFLASH >> >drive which is should not be big as the location of the >> >file is different. This means that I am doomed here. >> > >> >Either we should force libvirt people to forget about their >> >opinion that pflash should be small which I am unable to >> >do or I should invent a way to ban VM state saving into >> >pflash. >> > >> >OK. There are 2 options. >> > >> >1) Ban pflash as it was done. >> >2) Add 'no-vmstate' flag to -drive (invented just now). 
>> > >> something like this: >> >> diff --git a/block.c b/block.c >> index 3e1877d..8900589 100644 >> --- a/block.c >> +++ b/block.c >> @@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = { >> .help = "Block driver to use for the node", >> }, >> { >> + .name = "novmstate", >> + .type = QEMU_OPT_BOOL, >> + .help = "Ignore for selecting to save VM state", >> + }, >> + { >> .name = BDRV_OPT_CACHE_WB, >> .type = QEMU_OPT_BOOL, >> .help = "Enable writeback mode", >> @@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState >> *bs, BdrvChild *file, >> bs->request_alignment = 512; >> bs->zero_beyond_eof = true; >> bs->read_only = !(bs->open_flags & BDRV_O_RDWR); >> + bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false); >> >> if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) { >> error_setg(errp, >> diff --git a/block/snapshot.c b/block/snapshot.c >> index 2d86b88..33cdd86 100644 >> --- a/block/snapshot.c >> +++ b/block/snapshot.c >> @@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void) >> while (not_found && (bs = bdrv_next(bs))) { >> AioContext *ctx = bdrv_get_aio_context(bs); >> >> + if (bs->disable_vmstate_save) { >> + continue; >> + } >> + >> aio_context_acquire(ctx); >> not_found = !bdrv_can_snapshot(bs); >> aio_context_release(ctx); >> diff --git a/include/block/block_int.h b/include/block/block_int.h >> index 256609d..855a209 100644 >> --- a/include/block/block_int.h >> +++ b/include/block/block_int.h >> @@ -438,6 +438,9 @@ struct BlockDriverState { >> /* do we need to tell the quest if we have a volatile write cache? */ >> int enable_write_cache; >> >> + /* skip this BDS searching for one to save VM state */ >> + bool disable_vmstate_save; >> + >> /* the following member gives a name to every node on the bs graph. */ >> char node_name[32]; >> /* element of the list of named nodes building the graph */ > > That sounds like an option. (No pun intended.) 
>
> We can discuss the option name (perhaps "vmstate" defaulting to "on" is
> better?) and variable names (I'd prefer them to match the option name);
> also you'll need to extend the QAPI schema for blockdev-add. But all of
> these are minor points and the idea seems sane.

I've always thought that QEMU picking the image to take the VM state is
backwards. Adding means to guide that pick like "don't pick this one,
please" may help ease the pain, but it's still backwards.

The *user* should pick it.
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot 2016-01-12 17:40 ` Markus Armbruster @ 2016-01-12 17:50 ` Kevin Wolf 2016-01-12 17:54 ` Denis V. Lunev ` (2 more replies) 2016-01-12 17:53 ` Denis V. Lunev 2016-01-13 10:41 ` Laszlo Ersek 2 siblings, 3 replies; 25+ messages in thread From: Kevin Wolf @ 2016-01-12 17:50 UTC (permalink / raw) To: Markus Armbruster Cc: Denis V. Lunev, Laszlo Ersek, qemu-devel, qemu-block, Paolo Bonzini Am 12.01.2016 um 18:40 hat Markus Armbruster geschrieben: > Kevin Wolf <kwolf@redhat.com> writes: > > > Am 12.01.2016 um 17:35 hat Denis V. Lunev geschrieben: > >> On 01/12/2016 06:47 PM, Denis V. Lunev wrote: > >> >On 01/12/2016 06:20 PM, Kevin Wolf wrote: > >> >>Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben: > >> >>> > >> >>>On 12/01/2016 15:16, Kevin Wolf wrote: > >> >>>>>Thus we should avoid selection of "pflash" drives for VM > >> >>>>>state saving. > >> >>>>> > >> >>>>>For now "pflash" is read-write raw image as it configured by libvirt. > >> >>>>>Thus there are no such images in the field and we could > >> >>>>>safely disable > >> >>>>>ability to save state to those images inside QEMU. > >> >>>>This is obviously broken. If you write to the pflash, then it needs to > >> >>>>be snapshotted in order to keep a consistent state. > >> >>>> > >> >>>>If you want to avoid snapshotting the image, make it read-only and it > >> >>>>will be skipped even today. > >> >>>Sort of. The point of having flash is to _not_ make it read-only, so > >> >>>that is not a solution. > >> >>> > >> >>>Flash is already being snapshotted as part of saving RAM state. In > >> >>>fact, for this reason the device (at least the one used with OVMF; I > >> >>>haven't checked other pflash devices) can simply save it back to disk > >> >>>on the migration destination, without the need to use "migrate -b" or > >> >>>shared storage. > >> >>>[...] 
> >> >>>I don't like very much using IF_PFLASH this way, which is why I hadn't > >> >>>replied to the patch so far---I hadn't made up my mind about *what* to > >> >>>suggest instead, or whether to just accept it. However, it does work. > >> >>> > >> >>>Perhaps a separate "I know what I am doing" skip-snapshot option? Or > >> >>>a device callback saying "not snapshotting this is fine"? > >> >>Boy, is this ugly... > >> >> > >> >>What do you do with disk-only snapshots? The recovery only works as long > >> >>as you have VM state. > >> >> > >> >>Kevin > >> >actually I am in a bit of trouble :( > >> > > >> >I understand that this is ugly, but I would like to make working > >> >'virsh snapshot' for OVFM VMs. This is necessary for us to make > >> >a release. > >> > > >> >Currently libvirt guys generate XML in the following way: > >> > > >> > <os> > >> > <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type> > >> > <loader readonly='yes' > >> >type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader> > >> ><nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram> > >> > </os> > >> > > >> >This results in: > >> > > >> >qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on > >> >\ > >> > -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1 > >> > > >> >This obviously can not pass check in bdrv_all_can_snapshot() > >> >as 'pflash' is RW and raw, i.e. can not be snapshoted. > >> > > >> >They have discussed the switch to the following command line: > >> > > >> >qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on > >> >\ > >> > -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1 > >> > > >> >and say that in this case VM state could fall into PFLASH > >> >drive which is should not be big as the location of the > >> >file is different. This means that I am doomed here. 
> >> > > >> >Either we should force libvirt people to forget about their > >> >opinion that pflash should be small which I am unable to > >> >do or I should invent a way to ban VM state saving into > >> >pflash. > >> > > >> >OK. There are 2 options. > >> > > >> >1) Ban pflash as it was done. > >> >2) Add 'no-vmstate' flag to -drive (invented just now). > >> > > >> something like this: > >> > >> diff --git a/block.c b/block.c > >> index 3e1877d..8900589 100644 > >> --- a/block.c > >> +++ b/block.c > >> @@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = { > >> .help = "Block driver to use for the node", > >> }, > >> { > >> + .name = "novmstate", > >> + .type = QEMU_OPT_BOOL, > >> + .help = "Ignore for selecting to save VM state", > >> + }, > >> + { > >> .name = BDRV_OPT_CACHE_WB, > >> .type = QEMU_OPT_BOOL, > >> .help = "Enable writeback mode", > >> @@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState > >> *bs, BdrvChild *file, > >> bs->request_alignment = 512; > >> bs->zero_beyond_eof = true; > >> bs->read_only = !(bs->open_flags & BDRV_O_RDWR); > >> + bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false); > >> > >> if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) { > >> error_setg(errp, > >> diff --git a/block/snapshot.c b/block/snapshot.c > >> index 2d86b88..33cdd86 100644 > >> --- a/block/snapshot.c > >> +++ b/block/snapshot.c > >> @@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void) > >> while (not_found && (bs = bdrv_next(bs))) { > >> AioContext *ctx = bdrv_get_aio_context(bs); > >> > >> + if (bs->disable_vmstate_save) { > >> + continue; > >> + } > >> + > >> aio_context_acquire(ctx); > >> not_found = !bdrv_can_snapshot(bs); > >> aio_context_release(ctx); > >> diff --git a/include/block/block_int.h b/include/block/block_int.h > >> index 256609d..855a209 100644 > >> --- a/include/block/block_int.h > >> +++ b/include/block/block_int.h > >> @@ -438,6 +438,9 @@ struct BlockDriverState { > >> 
/* do we need to tell the quest if we have a volatile write cache? */
> >>      int enable_write_cache;
> >>
> >> +    /* skip this BDS searching for one to save VM state */
> >> +    bool disable_vmstate_save;
> >> +
> >>      /* the following member gives a name to every node on the bs graph. */
> >>      char node_name[32];
> >>      /* element of the list of named nodes building the graph */
> >
> > That sounds like an option. (No pun intended.)
> >
> > We can discuss the option name (perhaps "vmstate" defaulting to "on" is
> > better?) and variable names (I'd prefer them to match the option name);
> > also you'll need to extend the QAPI schema for blockdev-add. But all of
> > these are minor points and the idea seems sane.
>
> I've always thought that QEMU picking the image to take the VM state is
> backwards. Adding means to guide that pick like "don't pick this one,
> please" may help ease the pain, but it's still backwards.
>
> The *user* should pick it.

Designing the API now when it has been in use for ten years is
backwards, too. We have to take it as is and make the best of it.

We could add an optional argument to savevm that tells which image to
save the VM state to. But if it's missing, we still need to make a pick.
Of course, libvirt should then always use that option and then we don't
need a separate vmstate=[on|off] option.

If we go that way, we need to improve loadvm to get VM state from any of
the images of a VM, because the user could have saved the state to any.
(Making that improvement is probably a good idea anyway.)

Kevin
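The two halves of this proposal (an optional savevm target with a fallback pick, and a loadvm that searches every image) can be sketched roughly as follows. This is only an illustration under stated assumptions: `struct image`, `pick_savevm_target()` and `find_vmstate_for_load()` are hypothetical names, the snapshot tag is modelled as a plain string, and nothing here reflects an existing QEMU interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of one image attached to a VM. */
struct image {
    const char *name;
    bool can_snapshot;        /* supports internal snapshots */
    const char *vmstate_tag;  /* tag of the snapshot holding VM state, or NULL */
};

/*
 * savevm side: if the caller names a target, only that image is acceptable;
 * if no target is given (requested == NULL), fall back to the first image
 * that can take a snapshot. Returns NULL when nothing qualifies.
 */
struct image *pick_savevm_target(struct image *imgs, size_t n,
                                 const char *requested)
{
    for (size_t i = 0; i < n; i++) {
        if (!imgs[i].can_snapshot) {
            continue;
        }
        if (requested == NULL || strcmp(imgs[i].name, requested) == 0) {
            return &imgs[i];
        }
    }
    return NULL;
}

/*
 * loadvm side: the state may have been saved to any image, so look at all
 * of them instead of assuming the one savevm would pick by default.
 */
struct image *find_vmstate_for_load(struct image *imgs, size_t n,
                                    const char *tag)
{
    for (size_t i = 0; i < n; i++) {
        if (imgs[i].vmstate_tag != NULL &&
            strcmp(imgs[i].vmstate_tag, tag) == 0) {
            return &imgs[i];
        }
    }
    return NULL;
}
```

In this model a management layer that always passes an explicit target never depends on the fallback order, which is the reason given above for not needing a separate vmstate=[on|off] option in that case.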
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot 2016-01-12 17:50 ` Kevin Wolf @ 2016-01-12 17:54 ` Denis V. Lunev 2016-01-13 8:09 ` Markus Armbruster 2016-01-13 10:43 ` Laszlo Ersek 2 siblings, 0 replies; 25+ messages in thread From: Denis V. Lunev @ 2016-01-12 17:54 UTC (permalink / raw) To: Kevin Wolf, Markus Armbruster Cc: Paolo Bonzini, Laszlo Ersek, qemu-devel, qemu-block On 01/12/2016 08:50 PM, Kevin Wolf wrote: > Am 12.01.2016 um 18:40 hat Markus Armbruster geschrieben: >> Kevin Wolf <kwolf@redhat.com> writes: >> >>> Am 12.01.2016 um 17:35 hat Denis V. Lunev geschrieben: >>>> On 01/12/2016 06:47 PM, Denis V. Lunev wrote: >>>>> On 01/12/2016 06:20 PM, Kevin Wolf wrote: >>>>>> Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben: >>>>>>> On 12/01/2016 15:16, Kevin Wolf wrote: >>>>>>>>> Thus we should avoid selection of "pflash" drives for VM >>>>>>>>> state saving. >>>>>>>>> >>>>>>>>> For now "pflash" is read-write raw image as it configured by libvirt. >>>>>>>>> Thus there are no such images in the field and we could >>>>>>>>> safely disable >>>>>>>>> ability to save state to those images inside QEMU. >>>>>>>> This is obviously broken. If you write to the pflash, then it needs to >>>>>>>> be snapshotted in order to keep a consistent state. >>>>>>>> >>>>>>>> If you want to avoid snapshotting the image, make it read-only and it >>>>>>>> will be skipped even today. >>>>>>> Sort of. The point of having flash is to _not_ make it read-only, so >>>>>>> that is not a solution. >>>>>>> >>>>>>> Flash is already being snapshotted as part of saving RAM state. In >>>>>>> fact, for this reason the device (at least the one used with OVMF; I >>>>>>> haven't checked other pflash devices) can simply save it back to disk >>>>>>> on the migration destination, without the need to use "migrate -b" or >>>>>>> shared storage. >>>>>>> [...] 
>>>>>>> I don't like very much using IF_PFLASH this way, which is why I hadn't >>>>>>> replied to the patch so far---I hadn't made up my mind about *what* to >>>>>>> suggest instead, or whether to just accept it. However, it does work. >>>>>>> >>>>>>> Perhaps a separate "I know what I am doing" skip-snapshot option? Or >>>>>>> a device callback saying "not snapshotting this is fine"? >>>>>> Boy, is this ugly... >>>>>> >>>>>> What do you do with disk-only snapshots? The recovery only works as long >>>>>> as you have VM state. >>>>>> >>>>>> Kevin >>>>> actually I am in a bit of trouble :( >>>>> >>>>> I understand that this is ugly, but I would like to make working >>>>> 'virsh snapshot' for OVFM VMs. This is necessary for us to make >>>>> a release. >>>>> >>>>> Currently libvirt guys generate XML in the following way: >>>>> >>>>> <os> >>>>> <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type> >>>>> <loader readonly='yes' >>>>> type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader> >>>>> <nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram> >>>>> </os> >>>>> >>>>> This results in: >>>>> >>>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on >>>>> \ >>>>> -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1 >>>>> >>>>> This obviously can not pass check in bdrv_all_can_snapshot() >>>>> as 'pflash' is RW and raw, i.e. can not be snapshoted. >>>>> >>>>> They have discussed the switch to the following command line: >>>>> >>>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on >>>>> \ >>>>> -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1 >>>>> >>>>> and say that in this case VM state could fall into PFLASH >>>>> drive which is should not be big as the location of the >>>>> file is different. This means that I am doomed here. 
>>>>> >>>>> Either we should force libvirt people to forget about their >>>>> opinion that pflash should be small which I am unable to >>>>> do or I should invent a way to ban VM state saving into >>>>> pflash. >>>>> >>>>> OK. There are 2 options. >>>>> >>>>> 1) Ban pflash as it was done. >>>>> 2) Add 'no-vmstate' flag to -drive (invented just now). >>>>> >>>> something like this: >>>> >>>> diff --git a/block.c b/block.c >>>> index 3e1877d..8900589 100644 >>>> --- a/block.c >>>> +++ b/block.c >>>> @@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = { >>>> .help = "Block driver to use for the node", >>>> }, >>>> { >>>> + .name = "novmstate", >>>> + .type = QEMU_OPT_BOOL, >>>> + .help = "Ignore for selecting to save VM state", >>>> + }, >>>> + { >>>> .name = BDRV_OPT_CACHE_WB, >>>> .type = QEMU_OPT_BOOL, >>>> .help = "Enable writeback mode", >>>> @@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState >>>> *bs, BdrvChild *file, >>>> bs->request_alignment = 512; >>>> bs->zero_beyond_eof = true; >>>> bs->read_only = !(bs->open_flags & BDRV_O_RDWR); >>>> + bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false); >>>> >>>> if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) { >>>> error_setg(errp, >>>> diff --git a/block/snapshot.c b/block/snapshot.c >>>> index 2d86b88..33cdd86 100644 >>>> --- a/block/snapshot.c >>>> +++ b/block/snapshot.c >>>> @@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void) >>>> while (not_found && (bs = bdrv_next(bs))) { >>>> AioContext *ctx = bdrv_get_aio_context(bs); >>>> >>>> + if (bs->disable_vmstate_save) { >>>> + continue; >>>> + } >>>> + >>>> aio_context_acquire(ctx); >>>> not_found = !bdrv_can_snapshot(bs); >>>> aio_context_release(ctx); >>>> diff --git a/include/block/block_int.h b/include/block/block_int.h >>>> index 256609d..855a209 100644 >>>> --- a/include/block/block_int.h >>>> +++ b/include/block/block_int.h >>>> @@ -438,6 +438,9 @@ struct BlockDriverState { >>>> /* 
do we need to tell the quest if we have a volatile write cache? */
>>>>      int enable_write_cache;
>>>>
>>>> +    /* skip this BDS searching for one to save VM state */
>>>> +    bool disable_vmstate_save;
>>>> +
>>>>      /* the following member gives a name to every node on the bs graph. */
>>>>      char node_name[32];
>>>>      /* element of the list of named nodes building the graph */
>>> That sounds like an option. (No pun intended.)
>>>
>>> We can discuss the option name (perhaps "vmstate" defaulting to "on" is
>>> better?) and variable names (I'd prefer them to match the option name);
>>> also you'll need to extend the QAPI schema for blockdev-add. But all of
>>> these are minor points and the idea seems sane.
>> I've always thought that QEMU picking the image to take the VM state is
>> backwards. Adding means to guide that pick like "don't pick this one,
>> please" may help ease the pain, but it's still backwards.
>>
>> The *user* should pick it.
> Designing the API now when it has been in use for ten years is
> backwards, too. We have to take it as is and make the best of it.
>
> We could add an optional argument to savevm that tells which image to
> save the VM state to. But if it's missing, we still need to make a pick.
> Of course, libvirt should then always use that option and then we don't
> need a separate vmstate=[on|off] option.
>
> If we go that way, we need to improve loadvm to get VM state from any of
> the images of a VM, because the user could have saved the state to any.
> (Making that improvement is probably a good idea anyway.)
>
> Kevin

No. There is a window now: savevm is currently implemented in HMP only,
and I have a pending patchset that switches it to QMP. We can require
something additional as part of that switch.

Den
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot 2016-01-12 17:50 ` Kevin Wolf 2016-01-12 17:54 ` Denis V. Lunev @ 2016-01-13 8:09 ` Markus Armbruster 2016-01-13 10:43 ` Laszlo Ersek 2 siblings, 0 replies; 25+ messages in thread From: Markus Armbruster @ 2016-01-13 8:09 UTC (permalink / raw) To: Kevin Wolf Cc: Denis V. Lunev, Laszlo Ersek, qemu-devel, qemu-block, Paolo Bonzini Kevin Wolf <kwolf@redhat.com> writes: > Am 12.01.2016 um 18:40 hat Markus Armbruster geschrieben: >> Kevin Wolf <kwolf@redhat.com> writes: >> >> > Am 12.01.2016 um 17:35 hat Denis V. Lunev geschrieben: >> >> On 01/12/2016 06:47 PM, Denis V. Lunev wrote: >> >> >On 01/12/2016 06:20 PM, Kevin Wolf wrote: >> >> >>Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben: >> >> >>> >> >> >>>On 12/01/2016 15:16, Kevin Wolf wrote: >> >> >>>>>Thus we should avoid selection of "pflash" drives for VM >> >> >>>>>state saving. >> >> >>>>> >> >> >>>>>For now "pflash" is read-write raw image as it configured by libvirt. >> >> >>>>>Thus there are no such images in the field and we could >> >> >>>>>safely disable >> >> >>>>>ability to save state to those images inside QEMU. >> >> >>>>This is obviously broken. If you write to the pflash, then it needs to >> >> >>>>be snapshotted in order to keep a consistent state. >> >> >>>> >> >> >>>>If you want to avoid snapshotting the image, make it read-only and it >> >> >>>>will be skipped even today. >> >> >>>Sort of. The point of having flash is to _not_ make it read-only, so >> >> >>>that is not a solution. >> >> >>> >> >> >>>Flash is already being snapshotted as part of saving RAM state. In >> >> >>>fact, for this reason the device (at least the one used with OVMF; I >> >> >>>haven't checked other pflash devices) can simply save it back to disk >> >> >>>on the migration destination, without the need to use "migrate -b" or >> >> >>>shared storage. >> >> >>>[...] 
>> >> >>>I don't like very much using IF_PFLASH this way, which is why I hadn't >> >> >>>replied to the patch so far---I hadn't made up my mind about *what* to >> >> >>>suggest instead, or whether to just accept it. However, it does work. >> >> >>> >> >> >>>Perhaps a separate "I know what I am doing" skip-snapshot option? Or >> >> >>>a device callback saying "not snapshotting this is fine"? >> >> >>Boy, is this ugly... >> >> >> >> >> >>What do you do with disk-only snapshots? The recovery only works as long >> >> >>as you have VM state. >> >> >> >> >> >>Kevin >> >> >actually I am in a bit of trouble :( >> >> > >> >> >I understand that this is ugly, but I would like to make working >> >> >'virsh snapshot' for OVFM VMs. This is necessary for us to make >> >> >a release. >> >> > >> >> >Currently libvirt guys generate XML in the following way: >> >> > >> >> > <os> >> >> > <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type> >> >> > <loader readonly='yes' >> >> >type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader> >> >> ><nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram> >> >> > </os> >> >> > >> >> >This results in: >> >> > >> >> >qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on >> >> >\ >> >> > -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1 >> >> > >> >> >This obviously can not pass check in bdrv_all_can_snapshot() >> >> >as 'pflash' is RW and raw, i.e. can not be snapshoted. >> >> > >> >> >They have discussed the switch to the following command line: >> >> > >> >> >qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on >> >> >\ >> >> > -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1 >> >> > >> >> >and say that in this case VM state could fall into PFLASH >> >> >drive which is should not be big as the location of the >> >> >file is different. This means that I am doomed here. 
>> >> > >> >> >Either we should force libvirt people to forget about their >> >> >opinion that pflash should be small which I am unable to >> >> >do or I should invent a way to ban VM state saving into >> >> >pflash. >> >> > >> >> >OK. There are 2 options. >> >> > >> >> >1) Ban pflash as it was done. >> >> >2) Add 'no-vmstate' flag to -drive (invented just now). >> >> > >> >> something like this: >> >> >> >> diff --git a/block.c b/block.c >> >> index 3e1877d..8900589 100644 >> >> --- a/block.c >> >> +++ b/block.c >> >> @@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = { >> >> .help = "Block driver to use for the node", >> >> }, >> >> { >> >> + .name = "novmstate", >> >> + .type = QEMU_OPT_BOOL, >> >> + .help = "Ignore for selecting to save VM state", >> >> + }, >> >> + { >> >> .name = BDRV_OPT_CACHE_WB, >> >> .type = QEMU_OPT_BOOL, >> >> .help = "Enable writeback mode", >> >> @@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState >> >> *bs, BdrvChild *file, >> >> bs->request_alignment = 512; >> >> bs->zero_beyond_eof = true; >> >> bs->read_only = !(bs->open_flags & BDRV_O_RDWR); >> >> + bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false); >> >> >> >> if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) { >> >> error_setg(errp, >> >> diff --git a/block/snapshot.c b/block/snapshot.c >> >> index 2d86b88..33cdd86 100644 >> >> --- a/block/snapshot.c >> >> +++ b/block/snapshot.c >> >> @@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void) >> >> while (not_found && (bs = bdrv_next(bs))) { >> >> AioContext *ctx = bdrv_get_aio_context(bs); >> >> >> >> + if (bs->disable_vmstate_save) { >> >> + continue; >> >> + } >> >> + >> >> aio_context_acquire(ctx); >> >> not_found = !bdrv_can_snapshot(bs); >> >> aio_context_release(ctx); >> >> diff --git a/include/block/block_int.h b/include/block/block_int.h >> >> index 256609d..855a209 100644 >> >> --- a/include/block/block_int.h >> >> +++ b/include/block/block_int.h 
>> >> @@ -438,6 +438,9 @@ struct BlockDriverState { >> >> /* do we need to tell the quest if we have a volatile write cache? */ >> >> int enable_write_cache; >> >> >> >> + /* skip this BDS searching for one to save VM state */ >> >> + bool disable_vmstate_save; >> >> + >> >> /* the following member gives a name to every node on the bs graph. */ >> >> char node_name[32]; >> >> /* element of the list of named nodes building the graph */ >> > >> > That sounds like an option. (No pun intended.) >> > >> > We can discuss the option name (perhaps "vmstate" defaulting to "on" is >> > better?) and variable names (I'd prefer them to match the option name); >> > also you'll need to extend the QAPI schema for blockdev-add. But all of >> > these are minor points and the idea seems sane. >> >> I've always thought that QEMU picking the image to take the VM state is >> backwards. Adding means to guide that pick like "don't pick this one, >> please" may help ease the pain, but it's still backwards. >> >> The *user* should pick it. > > Designing the API now when it has been in use for ten years is > backwards, too. We have to take it as is and make the best of it. As Den pointed out, the *QMP* interface doesn't exist, yet, and there's no excuse for creating it backwards now. HMP is not a stable interface, but we commonly make a reasonable effort to keep it muscle-memory-compatible. > We could add an optional argument to savevm that tells which image to > save the VM state to. But if it's missing, we still need to make a pick. > Of course, libvirt should then always use that option and then we don't > need a separate vmstate=[on|off] option. > > If we go that way, we need to improve loadvm to get VM state from any of > the images of a VM, because the user could have saved the state to any. > (Making that improvement is probably a good idea anyway.) I want none of the "QEMU picks the image to receive the VM state" baggage in QMP. 
I want none of the other baggage, either: overloaded "ID or name" parameter, non-atomically deleting old snapshots with the same ID or name. Instead, let's have simple, non-magical commands that do one thing.

Speaking of "do one thing": coupling "save VM state" to "snapshot all storage internally" is problematic. What if you want to combine "save VM state" with external snapshots, or some combination of external and internal snapshots?

Let me explain. We have "external" and "internal" solutions both for snapshotting storage and VM state: internal vs. external snapshot, migrate to file vs. internal VM snapshot. Do we want to expose the building blocks, or do we want to expose only certain combinations?

Back to HMP. In general, HMP commands should be built on top of QMP commands. Adding convenience features is fine. Defaulting savevm's destination could be such a convenience feature. Very low complexity when the default is unambiguous, i.e. if there's just one possible destination. But if there's more than one, it adds significant complexity to the interface. If we want it anyway for muscle-memory-compatibility, I'd spit out a warning when it's ambiguous.

Like you, I can't see a need for a separate vmstate=[on|off] knob.

Note that my opinions on HMP interfaces are less strong than on QMP interfaces.

^ permalink raw reply [flat|nested] 25+ messages in thread
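To make the "default the destination, but warn when it is ambiguous" idea from the message above concrete, here is a minimal sketch in plain C. It is not QEMU code: the `Image` struct and `default_vmstate_destination()` are hypothetical names invented for illustration, and "can snapshot" is reduced to a single boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an attached image; not a QEMU type. */
typedef struct Image {
    const char *name;
    bool can_snapshot;   /* e.g. a writable qcow2 image */
} Image;

/*
 * Pick a default destination for the VM state.
 * Returns the index of the first snapshot-capable image, or -1 if none.
 * *ambiguous is set to true when more than one image would qualify,
 * in which case a warning is printed instead of failing silently.
 */
static int default_vmstate_destination(const Image *images, size_t n,
                                       bool *ambiguous)
{
    int first = -1;
    size_t candidates = 0;

    assert(ambiguous != NULL);

    for (size_t i = 0; i < n; i++) {
        if (images[i].can_snapshot) {
            if (first < 0) {
                first = (int)i;
            }
            candidates++;
        }
    }

    *ambiguous = candidates > 1;
    if (*ambiguous) {
        fprintf(stderr,
                "warning: %zu possible VM state destinations, "
                "defaulting to '%s'\n", candidates, images[first].name);
    }
    return first;
}
```

With exactly one capable image the default is unambiguous and nothing is printed; with two or more the caller still gets an answer, but the ambiguity is surfaced rather than hidden.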
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot 2016-01-12 17:50 ` Kevin Wolf 2016-01-12 17:54 ` Denis V. Lunev 2016-01-13 8:09 ` Markus Armbruster @ 2016-01-13 10:43 ` Laszlo Ersek 2 siblings, 0 replies; 25+ messages in thread From: Laszlo Ersek @ 2016-01-13 10:43 UTC (permalink / raw) To: Kevin Wolf, Markus Armbruster Cc: Denis V. Lunev, qemu-devel, qemu-block, Paolo Bonzini On 01/12/16 18:50, Kevin Wolf wrote: > Am 12.01.2016 um 18:40 hat Markus Armbruster geschrieben: >> Kevin Wolf <kwolf@redhat.com> writes: >> >>> Am 12.01.2016 um 17:35 hat Denis V. Lunev geschrieben: >>>> On 01/12/2016 06:47 PM, Denis V. Lunev wrote: >>>>> On 01/12/2016 06:20 PM, Kevin Wolf wrote: >>>>>> Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben: >>>>>>> >>>>>>> On 12/01/2016 15:16, Kevin Wolf wrote: >>>>>>>>> Thus we should avoid selection of "pflash" drives for VM >>>>>>>>> state saving. >>>>>>>>> >>>>>>>>> For now "pflash" is read-write raw image as it configured by libvirt. >>>>>>>>> Thus there are no such images in the field and we could >>>>>>>>> safely disable >>>>>>>>> ability to save state to those images inside QEMU. >>>>>>>> This is obviously broken. If you write to the pflash, then it needs to >>>>>>>> be snapshotted in order to keep a consistent state. >>>>>>>> >>>>>>>> If you want to avoid snapshotting the image, make it read-only and it >>>>>>>> will be skipped even today. >>>>>>> Sort of. The point of having flash is to _not_ make it read-only, so >>>>>>> that is not a solution. >>>>>>> >>>>>>> Flash is already being snapshotted as part of saving RAM state. In >>>>>>> fact, for this reason the device (at least the one used with OVMF; I >>>>>>> haven't checked other pflash devices) can simply save it back to disk >>>>>>> on the migration destination, without the need to use "migrate -b" or >>>>>>> shared storage. >>>>>>> [...] 
>>>>>>> I don't like very much using IF_PFLASH this way, which is why I hadn't >>>>>>> replied to the patch so far---I hadn't made up my mind about *what* to >>>>>>> suggest instead, or whether to just accept it. However, it does work. >>>>>>> >>>>>>> Perhaps a separate "I know what I am doing" skip-snapshot option? Or >>>>>>> a device callback saying "not snapshotting this is fine"? >>>>>> Boy, is this ugly... >>>>>> >>>>>> What do you do with disk-only snapshots? The recovery only works as long >>>>>> as you have VM state. >>>>>> >>>>>> Kevin >>>>> actually I am in a bit of trouble :( >>>>> >>>>> I understand that this is ugly, but I would like to make working >>>>> 'virsh snapshot' for OVFM VMs. This is necessary for us to make >>>>> a release. >>>>> >>>>> Currently libvirt guys generate XML in the following way: >>>>> >>>>> <os> >>>>> <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type> >>>>> <loader readonly='yes' >>>>> type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader> >>>>> <nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram> >>>>> </os> >>>>> >>>>> This results in: >>>>> >>>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on >>>>> \ >>>>> -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1 >>>>> >>>>> This obviously can not pass check in bdrv_all_can_snapshot() >>>>> as 'pflash' is RW and raw, i.e. can not be snapshoted. >>>>> >>>>> They have discussed the switch to the following command line: >>>>> >>>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on >>>>> \ >>>>> -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1 >>>>> >>>>> and say that in this case VM state could fall into PFLASH >>>>> drive which is should not be big as the location of the >>>>> file is different. This means that I am doomed here. 
>>>>> >>>>> Either we should force libvirt people to forget about their >>>>> opinion that pflash should be small which I am unable to >>>>> do or I should invent a way to ban VM state saving into >>>>> pflash. >>>>> >>>>> OK. There are 2 options. >>>>> >>>>> 1) Ban pflash as it was done. >>>>> 2) Add 'no-vmstate' flag to -drive (invented just now). >>>>> >>>> something like this: >>>> >>>> diff --git a/block.c b/block.c >>>> index 3e1877d..8900589 100644 >>>> --- a/block.c >>>> +++ b/block.c >>>> @@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = { >>>> .help = "Block driver to use for the node", >>>> }, >>>> { >>>> + .name = "novmstate", >>>> + .type = QEMU_OPT_BOOL, >>>> + .help = "Ignore for selecting to save VM state", >>>> + }, >>>> + { >>>> .name = BDRV_OPT_CACHE_WB, >>>> .type = QEMU_OPT_BOOL, >>>> .help = "Enable writeback mode", >>>> @@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState >>>> *bs, BdrvChild *file, >>>> bs->request_alignment = 512; >>>> bs->zero_beyond_eof = true; >>>> bs->read_only = !(bs->open_flags & BDRV_O_RDWR); >>>> + bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false); >>>> >>>> if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) { >>>> error_setg(errp, >>>> diff --git a/block/snapshot.c b/block/snapshot.c >>>> index 2d86b88..33cdd86 100644 >>>> --- a/block/snapshot.c >>>> +++ b/block/snapshot.c >>>> @@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void) >>>> while (not_found && (bs = bdrv_next(bs))) { >>>> AioContext *ctx = bdrv_get_aio_context(bs); >>>> >>>> + if (bs->disable_vmstate_save) { >>>> + continue; >>>> + } >>>> + >>>> aio_context_acquire(ctx); >>>> not_found = !bdrv_can_snapshot(bs); >>>> aio_context_release(ctx); >>>> diff --git a/include/block/block_int.h b/include/block/block_int.h >>>> index 256609d..855a209 100644 >>>> --- a/include/block/block_int.h >>>> +++ b/include/block/block_int.h >>>> @@ -438,6 +438,9 @@ struct BlockDriverState { >>>> /* 
do we need to tell the quest if we have a volatile write cache? */ >>>> int enable_write_cache; >>>> >>>> + /* skip this BDS searching for one to save VM state */ >>>> + bool disable_vmstate_save; >>>> + >>>> /* the following member gives a name to every node on the bs graph. */ >>>> char node_name[32]; >>>> /* element of the list of named nodes building the graph */ >>> >>> That sounds like an option. (No pun intended.) >>> >>> We can discuss the option name (perhaps "vmstate" defaulting to "on" is >>> better?) and variable names (I'd prefer them to match the option name); >>> also you'll need to extend the QAPI schema for blockdev-add. But all of >>> these are minor points and the idea seems sane. >> >> I've always thought that QEMU picking the image to take the VM state is >> backwards. Adding means to guide that pick like "don't pick this one, >> please" may help ease the pain, but it's still backwards. >> >> The *user* should pick it. > > Designing the API now when it has been in use for ten years is > backwards, too. We have to take it as is and make the best of it. > > We could add an optional argument to savevm that tells which image to > save the VM state to. But if it's missing, we still need to make a pick. > Of course, libvirt should then always use that option and then we don't > need a separate vmstate=[on|off] option. Sounds great! > If we go that way, we need to improve loadvm to get VM state from any of > the images of a VM, because the user could have saved the state to any. > (Making that improvement is probably a good idea anyway.) What if there are several images with vmstate in them? (OTOH that doesn't seem to be a well-defined case even now, so nothing would amount to a regression.) Thanks Laszlo > > Kevin > ^ permalink raw reply [flat|nested] 25+ messages in thread
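The loadvm question raised above (scan every image for the VM state, and decide what to do when several images hold one) can be sketched as follows. This is an illustration only, not QEMU's implementation: `Snapshot`, `Image`, the `vm_state_size` field and `find_vmstate_images()` are assumed names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical snapshot record; a non-zero size means VM state is stored. */
typedef struct Snapshot {
    const char *name;
    unsigned long vm_state_size;
} Snapshot;

/* Hypothetical image with its list of internal snapshots. */
typedef struct Image {
    const char *name;
    const Snapshot *snaps;
    size_t nb_snaps;
} Image;

/*
 * Scan all images for a snapshot called snap_name that carries VM state.
 * Returns how many images hold such a snapshot and stores the first hit
 * (or NULL) in *first. A result greater than 1 is the "several images
 * with vmstate" case, which the caller has to treat as ambiguous.
 */
static size_t find_vmstate_images(const Image *images, size_t n,
                                  const char *snap_name,
                                  const Image **first)
{
    size_t hits = 0;

    assert(snap_name != NULL && first != NULL);
    *first = NULL;

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < images[i].nb_snaps; j++) {
            const Snapshot *s = &images[i].snaps[j];
            if (strcmp(s->name, snap_name) == 0 && s->vm_state_size > 0) {
                if (*first == NULL) {
                    *first = &images[i];
                }
                hits++;
                break;   /* count each image at most once */
            }
        }
    }
    return hits;
}
```

Returning a count instead of just the first match leaves the policy (error out, warn, or take the first) to the caller.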
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot 2016-01-12 17:40 ` Markus Armbruster 2016-01-12 17:50 ` Kevin Wolf @ 2016-01-12 17:53 ` Denis V. Lunev 2016-01-13 10:41 ` Laszlo Ersek 2 siblings, 0 replies; 25+ messages in thread From: Denis V. Lunev @ 2016-01-12 17:53 UTC (permalink / raw) To: Markus Armbruster, Kevin Wolf Cc: Paolo Bonzini, Laszlo Ersek, qemu-devel, qemu-block On 01/12/2016 08:40 PM, Markus Armbruster wrote: > Kevin Wolf <kwolf@redhat.com> writes: > >> Am 12.01.2016 um 17:35 hat Denis V. Lunev geschrieben: >>> On 01/12/2016 06:47 PM, Denis V. Lunev wrote: >>>> On 01/12/2016 06:20 PM, Kevin Wolf wrote: >>>>> Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben: >>>>>> On 12/01/2016 15:16, Kevin Wolf wrote: >>>>>>>> Thus we should avoid selection of "pflash" drives for VM >>>>>>>> state saving. >>>>>>>> >>>>>>>> For now "pflash" is read-write raw image as it configured by libvirt. >>>>>>>> Thus there are no such images in the field and we could >>>>>>>> safely disable >>>>>>>> ability to save state to those images inside QEMU. >>>>>>> This is obviously broken. If you write to the pflash, then it needs to >>>>>>> be snapshotted in order to keep a consistent state. >>>>>>> >>>>>>> If you want to avoid snapshotting the image, make it read-only and it >>>>>>> will be skipped even today. >>>>>> Sort of. The point of having flash is to _not_ make it read-only, so >>>>>> that is not a solution. >>>>>> >>>>>> Flash is already being snapshotted as part of saving RAM state. In >>>>>> fact, for this reason the device (at least the one used with OVMF; I >>>>>> haven't checked other pflash devices) can simply save it back to disk >>>>>> on the migration destination, without the need to use "migrate -b" or >>>>>> shared storage. >>>>>> [...] 
>>>>>> I don't like very much using IF_PFLASH this way, which is why I hadn't >>>>>> replied to the patch so far---I hadn't made up my mind about *what* to >>>>>> suggest instead, or whether to just accept it. However, it does work. >>>>>> >>>>>> Perhaps a separate "I know what I am doing" skip-snapshot option? Or >>>>>> a device callback saying "not snapshotting this is fine"? >>>>> Boy, is this ugly... >>>>> >>>>> What do you do with disk-only snapshots? The recovery only works as long >>>>> as you have VM state. >>>>> >>>>> Kevin >>>> actually I am in a bit of trouble :( >>>> >>>> I understand that this is ugly, but I would like to make working >>>> 'virsh snapshot' for OVFM VMs. This is necessary for us to make >>>> a release. >>>> >>>> Currently libvirt guys generate XML in the following way: >>>> >>>> <os> >>>> <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type> >>>> <loader readonly='yes' >>>> type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader> >>>> <nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram> >>>> </os> >>>> >>>> This results in: >>>> >>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on >>>> \ >>>> -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1 >>>> >>>> This obviously can not pass check in bdrv_all_can_snapshot() >>>> as 'pflash' is RW and raw, i.e. can not be snapshoted. >>>> >>>> They have discussed the switch to the following command line: >>>> >>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on >>>> \ >>>> -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1 >>>> >>>> and say that in this case VM state could fall into PFLASH >>>> drive which is should not be big as the location of the >>>> file is different. This means that I am doomed here. 
>>>> >>>> Either we should force libvirt people to forget about their >>>> opinion that pflash should be small which I am unable to >>>> do or I should invent a way to ban VM state saving into >>>> pflash. >>>> >>>> OK. There are 2 options. >>>> >>>> 1) Ban pflash as it was done. >>>> 2) Add 'no-vmstate' flag to -drive (invented just now). >>>> >>> something like this: >>> >>> diff --git a/block.c b/block.c >>> index 3e1877d..8900589 100644 >>> --- a/block.c >>> +++ b/block.c >>> @@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = { >>> .help = "Block driver to use for the node", >>> }, >>> { >>> + .name = "novmstate", >>> + .type = QEMU_OPT_BOOL, >>> + .help = "Ignore for selecting to save VM state", >>> + }, >>> + { >>> .name = BDRV_OPT_CACHE_WB, >>> .type = QEMU_OPT_BOOL, >>> .help = "Enable writeback mode", >>> @@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState >>> *bs, BdrvChild *file, >>> bs->request_alignment = 512; >>> bs->zero_beyond_eof = true; >>> bs->read_only = !(bs->open_flags & BDRV_O_RDWR); >>> + bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false); >>> >>> if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) { >>> error_setg(errp, >>> diff --git a/block/snapshot.c b/block/snapshot.c >>> index 2d86b88..33cdd86 100644 >>> --- a/block/snapshot.c >>> +++ b/block/snapshot.c >>> @@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void) >>> while (not_found && (bs = bdrv_next(bs))) { >>> AioContext *ctx = bdrv_get_aio_context(bs); >>> >>> + if (bs->disable_vmstate_save) { >>> + continue; >>> + } >>> + >>> aio_context_acquire(ctx); >>> not_found = !bdrv_can_snapshot(bs); >>> aio_context_release(ctx); >>> diff --git a/include/block/block_int.h b/include/block/block_int.h >>> index 256609d..855a209 100644 >>> --- a/include/block/block_int.h >>> +++ b/include/block/block_int.h >>> @@ -438,6 +438,9 @@ struct BlockDriverState { >>> /* do we need to tell the quest if we have a volatile write 
cache? */ >>> int enable_write_cache; >>> >>> + /* skip this BDS searching for one to save VM state */ >>> + bool disable_vmstate_save; >>> + >>> /* the following member gives a name to every node on the bs graph. */ >>> char node_name[32]; >>> /* element of the list of named nodes building the graph */ >> That sounds like an option. (No pun intended.) >> >> We can discuss the option name (perhaps "vmstate" defaulting to "on" is >> better?) and variable names (I'd prefer them to match the option name); >> also you'll need to extend the QAPI schema for blockdev-add. But all of >> these are minor points and the idea seems sane. > I've always thought that QEMU picking the image to take the VM state is > backwards. Adding means to guide that pick like "don't pick this one, > please" may help ease the pain, but it's still backwards. > > The *user* should pick it.

The user can already pass a hint here by setting 'vmstate=on' on a single drive. There is also the option of selecting the drive explicitly, like this:

    ##
    # @savevm
    #
    # Save a VM snapshot. An old snapshot with the same name is deleted if it exists.
    #
    # @name: identifier of a snapshot to be created
    #
    # Returns: Nothing on success
    #
    # Since 2.6
    ##
    { 'command': 'savevm', 'data': {'name': 'str', 'vmstate-drive': 'str'} }

But for loadvm it would be better to really scan all drives to locate the vmstate. This is IMHO better, but it is unclear to me how to do this just now. I'll need to think about it.

Actually, we can do both things: code the option in QMP and also provide protection with "vmstate=off" for certain BDSes.

Den

^ permalink raw reply [flat|nested] 25+ messages in thread
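The combination proposed above (an explicit drive argument on savevm plus a per-node "vmstate=off" protection for the fallback pick) might look roughly like this. It is a self-contained sketch, not a patch: `Node` and `pick_vmstate_node()` are hypothetical, and the choice that an explicit pick overrides the per-node hint is an assumption of this example, not something the thread settled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a block node; not QEMU's BlockDriverState. */
typedef struct Node {
    const char *name;
    bool can_snapshot;   /* e.g. writable qcow2 */
    bool vmstate;        /* false models "vmstate=off" */
} Node;

/*
 * Return the node that should receive the VM state, or NULL.
 * If 'wanted' is non-NULL, only the node with that name is considered
 * (explicit user pick; it must still be able to take a snapshot).
 * Otherwise fall back to the first node in list order that both allows
 * VM state and can take a snapshot.
 */
static const Node *pick_vmstate_node(const Node *nodes, size_t n,
                                     const char *wanted)
{
    assert(nodes != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        const Node *node = &nodes[i];

        if (wanted != NULL) {
            if (strcmp(node->name, wanted) == 0) {
                return node->can_snapshot ? node : NULL;
            }
            continue;
        }
        if (node->vmstate && node->can_snapshot) {
            return node;
        }
    }
    return NULL;
}
```

In this sketch a pflash node marked vmstate=off is never chosen implicitly, while a management layer that always names the destination never depends on the fallback at all.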
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot 2016-01-12 17:40 ` Markus Armbruster 2016-01-12 17:50 ` Kevin Wolf 2016-01-12 17:53 ` Denis V. Lunev @ 2016-01-13 10:41 ` Laszlo Ersek 2 siblings, 0 replies; 25+ messages in thread From: Laszlo Ersek @ 2016-01-13 10:41 UTC (permalink / raw) To: Markus Armbruster, Kevin Wolf Cc: Denis V. Lunev, qemu-devel, qemu-block, Paolo Bonzini On 01/12/16 18:40, Markus Armbruster wrote: > Kevin Wolf <kwolf@redhat.com> writes: > >> Am 12.01.2016 um 17:35 hat Denis V. Lunev geschrieben: >>> On 01/12/2016 06:47 PM, Denis V. Lunev wrote: >>>> On 01/12/2016 06:20 PM, Kevin Wolf wrote: >>>>> Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben: >>>>>> >>>>>> On 12/01/2016 15:16, Kevin Wolf wrote: >>>>>>>> Thus we should avoid selection of "pflash" drives for VM >>>>>>>> state saving. >>>>>>>> >>>>>>>> For now "pflash" is read-write raw image as it configured by libvirt. >>>>>>>> Thus there are no such images in the field and we could >>>>>>>> safely disable >>>>>>>> ability to save state to those images inside QEMU. >>>>>>> This is obviously broken. If you write to the pflash, then it needs to >>>>>>> be snapshotted in order to keep a consistent state. >>>>>>> >>>>>>> If you want to avoid snapshotting the image, make it read-only and it >>>>>>> will be skipped even today. >>>>>> Sort of. The point of having flash is to _not_ make it read-only, so >>>>>> that is not a solution. >>>>>> >>>>>> Flash is already being snapshotted as part of saving RAM state. In >>>>>> fact, for this reason the device (at least the one used with OVMF; I >>>>>> haven't checked other pflash devices) can simply save it back to disk >>>>>> on the migration destination, without the need to use "migrate -b" or >>>>>> shared storage. >>>>>> [...] 
>>>>>> I don't like very much using IF_PFLASH this way, which is why I hadn't >>>>>> replied to the patch so far---I hadn't made up my mind about *what* to >>>>>> suggest instead, or whether to just accept it. However, it does work. >>>>>> >>>>>> Perhaps a separate "I know what I am doing" skip-snapshot option? Or >>>>>> a device callback saying "not snapshotting this is fine"? >>>>> Boy, is this ugly... >>>>> >>>>> What do you do with disk-only snapshots? The recovery only works as long >>>>> as you have VM state. >>>>> >>>>> Kevin >>>> actually I am in a bit of trouble :( >>>> >>>> I understand that this is ugly, but I would like to make working >>>> 'virsh snapshot' for OVFM VMs. This is necessary for us to make >>>> a release. >>>> >>>> Currently libvirt guys generate XML in the following way: >>>> >>>> <os> >>>> <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type> >>>> <loader readonly='yes' >>>> type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader> >>>> <nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram> >>>> </os> >>>> >>>> This results in: >>>> >>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on >>>> \ >>>> -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1 >>>> >>>> This obviously can not pass check in bdrv_all_can_snapshot() >>>> as 'pflash' is RW and raw, i.e. can not be snapshoted. >>>> >>>> They have discussed the switch to the following command line: >>>> >>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on >>>> \ >>>> -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1 >>>> >>>> and say that in this case VM state could fall into PFLASH >>>> drive which is should not be big as the location of the >>>> file is different. This means that I am doomed here. 
>>>> >>>> Either we should force libvirt people to forget about their >>>> opinion that pflash should be small which I am unable to >>>> do or I should invent a way to ban VM state saving into >>>> pflash. >>>> >>>> OK. There are 2 options. >>>> >>>> 1) Ban pflash as it was done. >>>> 2) Add 'no-vmstate' flag to -drive (invented just now). >>>> >>> something like this: >>> >>> diff --git a/block.c b/block.c >>> index 3e1877d..8900589 100644 >>> --- a/block.c >>> +++ b/block.c >>> @@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = { >>> .help = "Block driver to use for the node", >>> }, >>> { >>> + .name = "novmstate", >>> + .type = QEMU_OPT_BOOL, >>> + .help = "Ignore for selecting to save VM state", >>> + }, >>> + { >>> .name = BDRV_OPT_CACHE_WB, >>> .type = QEMU_OPT_BOOL, >>> .help = "Enable writeback mode", >>> @@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState >>> *bs, BdrvChild *file, >>> bs->request_alignment = 512; >>> bs->zero_beyond_eof = true; >>> bs->read_only = !(bs->open_flags & BDRV_O_RDWR); >>> + bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false); >>> >>> if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) { >>> error_setg(errp, >>> diff --git a/block/snapshot.c b/block/snapshot.c >>> index 2d86b88..33cdd86 100644 >>> --- a/block/snapshot.c >>> +++ b/block/snapshot.c >>> @@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void) >>> while (not_found && (bs = bdrv_next(bs))) { >>> AioContext *ctx = bdrv_get_aio_context(bs); >>> >>> + if (bs->disable_vmstate_save) { >>> + continue; >>> + } >>> + >>> aio_context_acquire(ctx); >>> not_found = !bdrv_can_snapshot(bs); >>> aio_context_release(ctx); >>> diff --git a/include/block/block_int.h b/include/block/block_int.h >>> index 256609d..855a209 100644 >>> --- a/include/block/block_int.h >>> +++ b/include/block/block_int.h >>> @@ -438,6 +438,9 @@ struct BlockDriverState { >>> /* do we need to tell the quest if we have a volatile write 
cache? */ >>> int enable_write_cache; >>> >>> + /* skip this BDS searching for one to save VM state */ >>> + bool disable_vmstate_save; >>> + >>> /* the following member gives a name to every node on the bs graph. */ >>> char node_name[32]; >>> /* element of the list of named nodes building the graph */ >> >> That sounds like an option. (No pun intended.) >> >> We can discuss the option name (perhaps "vmstate" defaulting to "on" is >> better?) and variable names (I'd prefer them to match the option name); >> also you'll need to extend the QAPI schema for blockdev-add. But all of >> these are minor points and the idea seems sane. > > I've always thought that QEMU picking the image to take the VM state is > backwards. Adding means to guide that pick like "don't pick this one, > please" may help ease the pain, but it's still backwards. I agree. This is the gist of the argument that you made last time too, and I agreed then as well. One piece of good illustration for why this is brittle is your observation that just changing the order of drives (without any pflash being present) breaks this as well. (Libvirt only gets away with that because it also stashes the domain XML when a snapshot is made, and the domain XML dictates the order of drive options.) > The *user* should pick it. I guess so. Thanks! Laszlo ^ permalink raw reply [flat|nested] 25+ messages in thread
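The brittleness mentioned above, where merely reordering the drives changes which image receives the VM state, is easy to demonstrate with a first-match loop. The sketch below is a simplified model, not QEMU code; `Drive` and `first_snapshottable()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical drive description; not a QEMU type. */
typedef struct Drive {
    const char *id;
    bool can_snapshot;
} Drive;

/*
 * First-match selection: return the id of the first drive in list order
 * that can take a snapshot, or NULL if there is none. The result depends
 * purely on the order in which the drives were given.
 */
static const char *first_snapshottable(const Drive *drives, size_t n)
{
    assert(drives != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (drives[i].can_snapshot) {
            return drives[i].id;
        }
    }
    return NULL;
}
```

Calling it on the same two snapshot-capable drives in swapped order yields a different destination each time, which is why a saved snapshot can only be loaded reliably if the original drive order is reproduced.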
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot 2016-01-12 15:47 ` Denis V. Lunev 2016-01-12 16:35 ` Denis V. Lunev @ 2016-01-13 10:37 ` Laszlo Ersek 2016-01-13 11:11 ` Denis V. Lunev 1 sibling, 1 reply; 25+ messages in thread From: Laszlo Ersek @ 2016-01-13 10:37 UTC (permalink / raw) To: Denis V. Lunev, qemu-devel Cc: Kevin Wolf, Paolo Bonzini, Dmitry Andreev, qemu-block

meta comment here:

On 01/12/16 16:47, Denis V. Lunev wrote:

> P.S. Here is a summary that my colleague has receiver from libvirt
> list.
>
> -------- Forwarded Message --------
> Subject: Re: Snapshotting OVMF guests
> Date: Mon, 11 Jan 2016 13:56:29 +0100
> From: Laszlo Ersek <lersek@redhat.com>
> To: Dmitry Andreev <dandreev@virtuozzo.com>
> CC: Michal Privoznik <mprivozn@redhat.com>, Markus Armbruster
> <armbru@redhat.com>
>
> Hello Dmitry,
>
> [...]

Your colleague Dmitry did not receive this from the libvirt list. He received it from me in private. See the headers above.

Please do not publicize a private exchange without asking for permission first.

In the present case I don't mind it. I stand by everything I said, and I would have written mostly the same if I had been contacted publicly, on-list.

But if you contact me in private *first*, then I expect the discussion to remain private. If you want to forward the email to a public list, please ask for permission. Otherwise I might consider it more prudent for myself to answer all private queries with just "please ask me this on the list instead".

I appreciate that you guys are working on this, but let's handle emails sensibly.

Thanks
Laszlo

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot 2016-01-13 10:37 ` Laszlo Ersek @ 2016-01-13 11:11 ` Denis V. Lunev 2016-01-13 12:15 ` Laszlo Ersek 0 siblings, 1 reply; 25+ messages in thread From: Denis V. Lunev @ 2016-01-13 11:11 UTC (permalink / raw) To: Laszlo Ersek, qemu-devel Cc: Kevin Wolf, Paolo Bonzini, Dmitry Andreev, qemu-block On 01/13/2016 01:37 PM, Laszlo Ersek wrote: > meta comment here: > > On 01/12/16 16:47, Denis V. Lunev wrote: > >> P.S. Here is a summary that my colleague has receiver from libvirt >> list. >> >> -------- Forwarded Message -------- >> Subject: Re: Snapshotting OVMF guests >> Date: Mon, 11 Jan 2016 13:56:29 +0100 >> From: Laszlo Ersek <lersek@redhat.com> >> To: Dmitry Andreev <dandreev@virtuozzo.com> >> CC: Michal Privoznik <mprivozn@redhat.com>, Markus Armbruster >> <armbru@redhat.com> >> >> Hello Dmitry, >> >> [...] > Your colleague Dmitry did not receive this from the libvirt list. He > received the from me in private. See the headers above. > > Please do not publicize a private exchange without asking for permission > first. > > In the present case I don't mind it. I stand by everything I said, and I > would have written mostly the same if I had been contacted publicly, > on-list. > > But if you contact me in private *first*, then I expect the discussion > to remain private. If you want to forward the email to a public list, > please ask for permission. Otherwise I might consider it more prudent > for myself to answer all private queries with just "please ask me this > on the list instead". > > I appreciate that you guys are working on this, but let's handle emails > sensibly. > > Thanks > Laszlo > Sorry :( I have not properly checked the message :( I am guilty.. Den ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot
  2016-01-13 11:11 ` Denis V. Lunev
@ 2016-01-13 12:15   ` Laszlo Ersek
  0 siblings, 0 replies; 25+ messages in thread
From: Laszlo Ersek @ 2016-01-13 12:15 UTC (permalink / raw)
To: Denis V. Lunev, qemu-devel
Cc: Kevin Wolf, Paolo Bonzini, Dmitry Andreev, qemu-block

On 01/13/16 12:11, Denis V. Lunev wrote:
> On 01/13/2016 01:37 PM, Laszlo Ersek wrote:
>> meta comment here:
>>
>> On 01/12/16 16:47, Denis V. Lunev wrote:
>>
>>> P.S. Here is a summary that my colleague has received from the libvirt
>>> list.
>>>
>>> -------- Forwarded Message --------
>>> Subject: Re: Snapshotting OVMF guests
>>> Date: Mon, 11 Jan 2016 13:56:29 +0100
>>> From: Laszlo Ersek <lersek@redhat.com>
>>> To: Dmitry Andreev <dandreev@virtuozzo.com>
>>> CC: Michal Privoznik <mprivozn@redhat.com>, Markus Armbruster
>>> <armbru@redhat.com>
>>>
>>> Hello Dmitry,
>>>
>>> [...]
>> Your colleague Dmitry did not receive this from the libvirt list. He
>> received it from me in private. See the headers above.
>>
>> Please do not publicize a private exchange without asking for permission
>> first.
>>
>> In the present case I don't mind it. I stand by everything I said, and I
>> would have written mostly the same if I had been contacted publicly,
>> on-list.
>>
>> But if you contact me in private *first*, then I expect the discussion
>> to remain private. If you want to forward the email to a public list,
>> please ask for permission. Otherwise I might consider it more prudent
>> for myself to answer all private queries with just "please ask me this
>> on the list instead".
>>
>> I appreciate that you guys are working on this, but let's handle emails
>> sensibly.
>>
>> Thanks
>> Laszlo
>>
> Sorry :( I did not check the message properly :(
>
> I am guilty.

No prob, it's just that I've burned myself a few times before, hence
I've grown to double-check the address list when receiving and sending
email.

"List address not present" implies "other guy wants it to be private" to
me. :)

Cheers
Laszlo

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot
  2016-01-12 14:16 ` Kevin Wolf
  2016-01-12 14:59   ` Paolo Bonzini
@ 2016-01-12 15:10   ` Denis V. Lunev
  2016-01-12 15:28     ` Kevin Wolf
  1 sibling, 1 reply; 25+ messages in thread
From: Denis V. Lunev @ 2016-01-12 15:10 UTC (permalink / raw)
To: Kevin Wolf; +Cc: Paolo Bonzini, Laszlo Ersek, qemu-devel, qemu-block

On 01/12/2016 05:16 PM, Kevin Wolf wrote:
> On 12.01.2016 at 07:03, Denis V. Lunev wrote:
>> There is a long-long story. OVMF VMs can not be snapshotted using
>> 'virsh snapshot' as they have "pflash" device which is configured as
>> "raw" image. There was a discussion in the past about that.
>>
>> Good description has been provided on topic by Laszlo Ersek, see below:
>>
>> "It is true that a pflash drive is "just a drive" *internally* to QEMU.
>> It is also true that it more or less takes the same -drive options as
>> any other *disk* drive. But those facts are just implementation details.
>>
>> The relevant trait of pflash storage files is that they are not *disk
>> images*, on the libvirt domain XML level. They are not created in
>> storage pools, you cannot specify their caching attributes, you don't
>> specify their guest-visible frontend in separation (like virtio-blk /
>> virtio-scsi / pflash). Those details are hidden (on purpose).
>>
>> Consequently, pflash storage files are expected to be *small* in size
>> (in practice: identically sized to the varstore template they are
>> instantiated from). They are created under /var/lib/libvirt/qemu/nvram.
>> Although you can edit their path in the domain XML, they are not
>> considered disks."
>>
>> Thus we should avoid selection of "pflash" drives for VM state saving.
>>
>> For now "pflash" is read-write raw image as it is configured by libvirt.
>> Thus there are no such images in the field and we could safely disable
>> ability to save state to those images inside QEMU.
> This is obviously broken. If you write to the pflash, then it needs to
> be snapshotted in order to keep a consistent state.
>
> If you want to avoid snapshotting the image, make it read-only and it
> will be skipped even today.
>
> Kevin
You are interpreting the patch slightly wrong.

The pflash image will be snapshotted once I replace the raw image with a
qcow2 image, but this image will not be selected for state saving, i.e.
it will remain compact.

Den

^ permalink raw reply [flat|nested] 25+ messages in thread
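[Editor's note] The distinction Denis draws here, between a drive being
included in an internal snapshot and a drive being selected to hold the
saved VM state, can be sketched in C. This is only a simplified model
under stated assumptions: struct drive, enum drive_if, needs_snapshot(),
can_snapshot_all() and find_vmstate_drive() are hypothetical names
invented for illustration and are not QEMU APIs. The model assumes that
every writable drive takes part in the snapshot, that read-only drives
are skipped, and that the VM state goes to the first suitable drive that
is not a pflash device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not QEMU types. */
enum drive_if { DRIVE_IF_VIRTIO, DRIVE_IF_PFLASH };

struct drive {
    const char *name;
    enum drive_if type;
    bool read_only;
    bool supports_snapshot; /* assumed: true for qcow2, false for raw */
};

/* Assumption: a writable drive must be part of every internal snapshot,
 * while a read-only drive is skipped. */
bool needs_snapshot(const struct drive *d)
{
    return !d->read_only;
}

/* An internal snapshot is possible only if every drive that needs one
 * has an image format that supports it. */
bool can_snapshot_all(const struct drive *drives, size_t n)
{
    assert(drives != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (needs_snapshot(&drives[i]) && !drives[i].supports_snapshot) {
            return false;
        }
    }
    return true;
}

/* Pick the drive that stores the VM state: the first writable,
 * snapshot-capable drive that is not a pflash device. The pflash drive
 * is still snapshotted, it just never receives the VM state. */
const struct drive *find_vmstate_drive(const struct drive *drives, size_t n)
{
    assert(drives != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct drive *d = &drives[i];
        if (d->type == DRIVE_IF_PFLASH) {
            continue; /* skipped for state saving only */
        }
        if (!d->read_only && d->supports_snapshot) {
            return d;
        }
    }
    return NULL;
}
```

In this model a writable raw pflash image makes can_snapshot_all()
fail, which corresponds to the problem described in the commit message.
Once the pflash image is qcow2, the snapshot covers it, but
find_vmstate_drive() still returns the regular disk, so the small
varstore image does not grow by the size of the saved VM state.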
* Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot
  2016-01-12 15:10 ` Denis V. Lunev
@ 2016-01-12 15:28   ` Kevin Wolf
  0 siblings, 0 replies; 25+ messages in thread
From: Kevin Wolf @ 2016-01-12 15:28 UTC (permalink / raw)
To: Denis V. Lunev; +Cc: Paolo Bonzini, Laszlo Ersek, qemu-devel, qemu-block

On 12.01.2016 at 16:10, Denis V. Lunev wrote:
> On 01/12/2016 05:16 PM, Kevin Wolf wrote:
> >On 12.01.2016 at 07:03, Denis V. Lunev wrote:
> >>There is a long-long story. OVMF VMs can not be snapshotted using
> >>'virsh snapshot' as they have "pflash" device which is configured as
> >>"raw" image. There was a discussion in the past about that.
> >>
> >>Good description has been provided on topic by Laszlo Ersek, see below:
> >>
> >>"It is true that a pflash drive is "just a drive" *internally* to QEMU.
> >>It is also true that it more or less takes the same -drive options as
> >>any other *disk* drive. But those facts are just implementation details.
> >>
> >>The relevant trait of pflash storage files is that they are not *disk
> >>images*, on the libvirt domain XML level. They are not created in
> >>storage pools, you cannot specify their caching attributes, you don't
> >>specify their guest-visible frontend in separation (like virtio-blk /
> >>virtio-scsi / pflash). Those details are hidden (on purpose).
> >>
> >>Consequently, pflash storage files are expected to be *small* in size
> >>(in practice: identically sized to the varstore template they are
> >>instantiated from). They are created under /var/lib/libvirt/qemu/nvram.
> >>Although you can edit their path in the domain XML, they are not
> >>considered disks."
> >>
> >>Thus we should avoid selection of "pflash" drives for VM state saving.
> >>
> >>For now "pflash" is read-write raw image as it is configured by libvirt.
> >>Thus there are no such images in the field and we could safely disable
> >>ability to save state to those images inside QEMU.
> >This is obviously broken. If you write to the pflash, then it needs to
> >be snapshotted in order to keep a consistent state.
> >
> >If you want to avoid snapshotting the image, make it read-only and it
> >will be skipped even today.
> >
> >Kevin
> You are interpreting the patch slightly wrong.
>
> The pflash image will be snapshotted once I replace the raw image with a
> qcow2 image, but this image will not be selected for state saving, i.e.
> it will remain compact.

Sorry, I misunderstood. That's more reasonable indeed.

Kevin

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH 1/1] RESUME blk: do not select PFLASH device for internal snapshot
  2016-01-12  6:03 [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot Denis V. Lunev
  2016-01-12 14:16 ` Kevin Wolf
@ 2016-01-14 11:33 ` Denis V. Lunev
  1 sibling, 0 replies; 25+ messages in thread
From: Denis V. Lunev @ 2016-01-14 11:33 UTC (permalink / raw)
Cc: Kevin Wolf, Paolo Bonzini, Laszlo Ersek, qemu-devel, qemu-block

On 01/12/2016 09:03 AM, Denis V. Lunev wrote:
> There is a long-long story. OVMF VMs can not be snapshotted using
> 'virsh snapshot' as they have "pflash" device which is configured as
> "raw" image. There was a discussion in the past about that.

The results of the discussion are available in this submission:

    [PATCH v5 0/8] QMP wrappers for VM snapshot operations

^ permalink raw reply [flat|nested] 25+ messages in thread