From: "Denis V. Lunev" <den@openvz.org>
To: Kevin Wolf <kwolf@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>
Cc: Laszlo Ersek <lersek@redhat.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 1/1] blk: do not select PFLASH device for internal snapshot
Date: Tue, 12 Jan 2016 18:47:40 +0300	[thread overview]
Message-ID: <5695201C.2050504@openvz.org> (raw)
In-Reply-To: <20160112152051.GG4841@noname.redhat.com>

On 01/12/2016 06:20 PM, Kevin Wolf wrote:
> Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben:
>>
>> On 12/01/2016 15:16, Kevin Wolf wrote:
>>>> Thus we should avoid selection of "pflash" drives for VM state saving.
>>>>
>>>> For now "pflash" is read-write raw image as it configured by libvirt.
>>>> Thus there are no such images in the field and we could safely disable
>>>> ability to save state to those images inside QEMU.
>>> This is obviously broken. If you write to the pflash, then it needs to
>>> be snapshotted in order to keep a consistent state.
>>>
>>> If you want to avoid snapshotting the image, make it read-only and it
>>> will be skipped even today.
>> Sort of.  The point of having flash is to _not_ make it read-only, so
>> that is not a solution.
>>
>> Flash is already being snapshotted as part of saving RAM state.  In
>> fact, for this reason the device (at least the one used with OVMF; I
>> haven't checked other pflash devices) can simply save it back to disk
>> on the migration destination, without the need to use "migrate -b" or
>> shared storage.
>> [...]
>> I don't like very much using IF_PFLASH this way, which is why I hadn't
>> replied to the patch so far---I hadn't made up my mind about *what* to
>> suggest instead, or whether to just accept it.  However, it does work.
>>
>> Perhaps a separate "I know what I am doing" skip-snapshot option?  Or
>> a device callback saying "not snapshotting this is fine"?
> Boy, is this ugly...
>
> What do you do with disk-only snapshots? The recovery only works as long
> as you have VM state.
>
> Kevin
Actually, I am in a bit of trouble :(

I understand that this is ugly, but I would like to make
'virsh snapshot' work for OVMF VMs. This is necessary for us to
make a release.

Currently the libvirt guys generate the XML in the following way:

   <os>
     <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
     <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader>
     <nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram>
   </os>

This results in:

qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
      -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1

This obviously cannot pass the check in bdrv_all_can_snapshot(),
as the 'pflash' drive is read-write and raw, i.e. it cannot be
snapshotted.

They have discussed switching to the following command line:

qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
      -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1

and say that in this case the VM state could end up in the pflash
drive, which is not supposed to be big, as the location of the
file is different. This means that I am doomed here.
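For completeness, the qcow2 varstore in that command line would have to be
created from the raw one first, e.g. with qemu-img (the paths are taken
from the command lines above; whether libvirt itself would perform this
conversion is an assumption here):

qemu-img convert -f raw -O qcow2 \
    /var/lib/libvirt/qemu/nvram/f20efi_VARS.fd \
    /var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2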

Either we should force the libvirt people to abandon their view
that pflash should be small, which I am unable to do, or I should
invent a way to ban saving VM state into pflash.

OK. There are two options:

1) Ban pflash, as this patch does.
2) Add a 'no-vmstate' flag to -drive (invented just now).
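Option 2 might look like this on the command line (the 'no-vmstate'
option name is invented in this thread and does not exist in QEMU; the
syntax below is purely a sketch of the proposal):

qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
     -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1,no-vmstate=on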

Den


P.S. Here is a summary that my colleague has received from the
libvirt list.

-------- Forwarded Message --------
Subject: Re: Snapshotting OVMF guests
Date: Mon, 11 Jan 2016 13:56:29 +0100
From: Laszlo Ersek <lersek@redhat.com>
To: Dmitry Andreev <dandreev@virtuozzo.com>
CC: Michal Privoznik <mprivozn@redhat.com>, Markus Armbruster 
<armbru@redhat.com>

Hello Dmitry,

(Cc: Markus.)

 > https://bugzilla.redhat.com/show_bug.cgi?id=1180955

I have now re-read that BZ. In comment 7 I wrote,

> However, if Michal's v2 libvirt patchset was applied, and the varstore
> drive was qcow2, then qemu would dump the *entire VM state*, including
> memory and device state, into the varstore drive (the 6th drive) under
> the command line visible in comment #0. That's *completely* bogus;
> much worse than rejecting the snapshot request.

It is bogus for size and configuration reasons.

It is true that a pflash drive is "just a drive" *internally* to QEMU.
It is also true that it more or less takes the same -drive options as
any other *disk* drive. But those facts are just implementation details.

The relevant trait of pflash storage files is that they are not *disk
images*, on the libvirt domain XML level. They are not created in
storage pools, you cannot specify their caching attributes, you don't
specify their guest-visible frontend in separation (like virtio-blk /
virtio-scsi / pflash). Those details are hidden (on purpose).

Consequently, pflash storage files are expected to be *small* in size
(in practice: identically sized to the varstore template they are
instantiated from). They are created under /var/lib/libvirt/qemu/nvram.
Although you can edit their path in the domain XML, they are not
considered disks.

This is also reflected in the way they are migrated. They are not
migrated with NBD / live storage migration / blockdev migration.
Instead, on the target host, when the in-migration completes, the entire
contents of the flash drive are written out in one shot to the target
host file.

Please see:
- the pflash_post_load() function in QEMU's "hw/block/pflash_cfi01.c",
- and QEMU commit 4c0cfc72.

Storing large amounts of data in the pflash storage file would be
incompatible with this concept.

... We also had an internal team discussion at Red Hat about this. I
won't re-read it now, but I think I can share a part of my own BZ
comment 9. In that comment I tried to summarize the internal discussion
more or less for myself. (I made that comment private because it
contained RH product related bits too -- I won't quote those bits now.)

So from comment 9:

> [...] the upshot from [the internal discussion] seems to be that
> "savevm" is *in general* inappropriate for any non-trivial -drive
> setup and/or for a -drive setup that is subject to change (eg.
> reordering on the command line).

Comment 11 in the BZ shows that we plan to document the limitation that
internal snapshotting will never be supported for OVMF. External
snapshotting *should* be, but it isn't yet either (because, at least at
the time of writing the BZ comment, reverting to external snapshots
wasn't supported).

Bottom line, pflash is implemented as a drive internally, but it is not
considered a *disk* drive, for migration, snapshotting, being stored in
pools, or for storing large amounts of data.

I hope this helps.

If you'd like to enable snapshotting for OVMF virtual machines, that
would be awesome; but I think it would require implementing the
above-mentioned "revert to external snapshot" functionality.

Thanks!
Laszlo

