[Qemu-devel] Live migration protocol, device features, ABIs and other beasts

* [Qemu-devel] Live migration protocol, device features, ABIs and other beasts
@ 2009-11-22 15:03 Dor Laor
  2009-11-22 15:49 ` Anthony Liguori
  2009-11-23 12:15 ` Juan Quintela
  0 siblings, 2 replies; 96+ messages in thread
From: Dor Laor @ 2009-11-22 15:03 UTC (permalink / raw)
  To: qemu-devel

In the last couple of days we discovered some issues regarding stable 
ABI and the robustness of the live migration protocol. Let's just jump 
right into it, ordered by complexity:

1. Control *every* feature exposed to the guest by qemu cmdline:

    While thinking about cross-version migration and reviewing some
    patches, I noticed that we often use feature bits to expose
    functionality to the guest driver (example: VIRTIO_BLK_F_BARRIER),
    but we do not control them from the qemu cmdline.

    The result is that a guest running on a newer qemu cannot live
    migrate to an older qemu that lacks the barrier feature.

    Like this barrier example, there are probably many cases where we
    keep the device/driver ABI but forget the new/old release ABI.

    The solution here is simpler - every guest-visible change should
    translate into a cmdline option. This is part of the machine type
    and in addition should be configurable.
    It's an issue we should all keep in the back of our heads and bring
    up whenever a new capability or change is introduced.
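
The idea of gating a guest-visible feature bit behind a configurable
switch can be sketched as follows. This is only an illustration, not
QEMU code: the `ex_` names, the bit positions and the `barrier` switch
are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative feature bits; names and bit positions are made up. */
#define EX_BLK_F_BARRIER  (1u << 0)
#define EX_BLK_F_SEG_MAX  (1u << 2)

/* Hypothetical per-device configuration: defaults would come from the
 * machine type, and the user could override them. */
struct ex_blk_config {
    bool barrier;   /* expose the barrier feature to the guest? */
};

/* Everything this build of the device model is able to implement. */
static uint32_t ex_blk_supported_features(void)
{
    return EX_BLK_F_BARRIER | EX_BLK_F_SEG_MAX;
}

/* Features actually offered to the guest: what the code supports,
 * minus anything the configuration switched off. */
static uint32_t ex_blk_guest_features(const struct ex_blk_config *cfg)
{
    uint32_t features = ex_blk_supported_features();

    if (!cfg->barrier) {
        features &= ~EX_BLK_F_BARRIER;
    }
    return features;
}
```

With `barrier` set to false, a newer build offers the guest the same
feature set as an older build that never had the bit, which is the
precondition for migrating between the two.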

2. Live migration inherent problem.

    Currently, even with VMState, the protocol is not flexible enough.
    We ran into a problem when we needed to fix the pvclock migration
    issue. The fix added 2 fields to the save/load state and thus
    needed a new version number.
    The trouble is that the load function does not accept sections with
    versions greater than the one it supports.
    We cannot even create a new 'hack section' for new code since the
    sections are ordered and expected to be an exact match on the
    destination.
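
The version check described above can be sketched roughly like this.
It is a simplified model under stated assumptions, not the actual
savevm.c code: `ex_section_handler`, `ex_load_section` and
`ex_clock_load` are hypothetical names.

```c
#include <errno.h>

/* Hypothetical registration record for one device section. */
struct ex_section_handler {
    const char *idstr;
    int version_id;                             /* newest loadable version */
    int (*load)(void *opaque, int version_id);
};

/* Example load callback: just records which version it was given. */
static int ex_clock_load(void *opaque, int version_id)
{
    int *loaded_version = opaque;

    *loaded_version = version_id;
    return 0;
}

/* Refuse any section newer than what this build understands; older
 * versions are passed on to the device's load callback. */
static int ex_load_section(const struct ex_section_handler *h,
                           void *opaque, int incoming_version)
{
    if (incoming_version > h->version_id) {
        return -EINVAL;
    }
    return h->load(opaque, incoming_version);
}
```

A destination whose handler was registered with version 1 loads a
version 1 section and rejects a version 2 section before the device
code sees any of it, which is why adding two fields blocks new->old
migration.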

    The result is that new->old migration cannot work. This is not even
    across releases! It means that even fixing a small bug in the
    current release prevents live migration between different instances
    of the code.
    It forces us to choose between fixing the pvclock migration issue
    and allowing new->old migration. Another ugly hack is to add a
    cmdline option to control this behavior. Still, it's a pain for the
    mgmt stack and users.

    The solution here is more complex. One can claim that we should allow
    newer sections to be accepted by current code (and send the section
    size) and send optional sections. This would be a nasty workaround.

    IMHO we should 'specify' the migration protocol and introduce
    capabilities, feature bits, etc. This way we'll have a robust,
    extensible protocol that will withstand any potential issue. Both
    Michael Tsirkin and I suggested this at the time vmstate was
    introduced. Vmstate is good for the code but it's not a protocol.

Which protocol should we use? You're smarter than me, please suggest
one.
wrt the above guest ABI issue, we should write a qemu spec with clear
definitions for devices, drivers, versions, etc.

Looking forward to an interesting, fruitful discussion,
Dor

^ permalink raw reply	[flat|nested] 96+ messages in thread

* Re: [Qemu-devel] Live migration protocol, device features, ABIs and other beasts
  2009-11-22 15:03 [Qemu-devel] Live migration protocol, device features, ABIs and other beasts Dor Laor
@ 2009-11-22 15:49 ` Anthony Liguori
  2009-11-22 20:22   ` [Qemu-devel] " Paolo Bonzini
  2009-11-24 13:21   ` Michael S. Tsirkin
  2009-11-23 12:15 ` Juan Quintela
  1 sibling, 2 replies; 96+ messages in thread
From: Anthony Liguori @ 2009-11-22 15:49 UTC (permalink / raw)
  To: dlaor; +Cc: qemu-devel

Dor Laor wrote:
> In the last couple of days we discovered some issues regarding stable 
> ABI and the robustness of the live migration protocol. Let's just jump 
> right into it, ordered by complexity:
>
> 1. Control *every* feature exposed to the guest by qemu cmdline:
>
>    While thinking on cross version migration, and reviewing some
>    patches, I noticed that there are many times that we use feature bits
>    in order to expose functionality for the guest driver - example:
>    VIRTIO_BLK_F_BARRIER, but we do not control it from qemu cmdline.
>
>    The result is that guest running on a newer qemu cannot live migrate
>    into older qemu without the barrier feature.
>
>    Like this barrier example, there are probably many cases that we
>    do keep device/driver abi but forget new/old release abi.
>
>    The solution here is simpler - Every guest visible change should
>    translate into cmdline option. This is part of the machine type and
>    in addition should be configurable.
>    It's an issue we all should keep in the back of our heads and popup
>    when a new capability/change are introduced.

s/cmdline/qdev/g and I agree with you.  There's nothing protocol 
specific about this though.

> 2. Live migration inherent problem.
>
>    Currently, even with VMState, the protocol is not flexible enough.
>    We run into problem when we needed to fix pvclock migration issue.
>    The fix included 2 additional fields in save/load state and thus
>    needed a new version number.
>    The trouble is that the load function does not accept sections with
>    versions greater than the one it supports.

This is a feature, not a bug.  You cannot migrate from a newer qemu to
an older one.  There's simply no way to support this in a sane way.

>    We cannot even create a new 'hack section' for new code since the
>    sections are ordered and expected to be exact match on the
>    destination.
>
>    The result is that new->old migration cannot work. This is not cross
>    releases even! It means that even a small bug in current release
>    prevents live migration between various instances of the code.
>    It forces us to decide whether to fix pvclock migration issue vs
>    allow new->old migration. Another ugly hack is to add cmdline that
>    will control this behavior. Still it's a pain to mgmt stack and
>    users.

This is a pretty normal policy (backwards compat but not forwards compat).

>    The solution here is more complex. One can claim that we should allow
>    newer sections to be accepted by current code (and send the section
>    size) and send optional sections. This would be a nasty work around.
>
>    IMHO we should 'specify' the migration protocol and introduce
>    capabilities, feature bits, etc. This way we'll have a robust,
>    extensible protocol that will withstand any potential issue. Both
>    Michael Tisrkin and I suggest it at the time vmstate was introduced.
>    Vmstate is good for the code but it's not a protocol.

I don't see how this fixes anything.  If you used feature bits, how do 
you migrate from a version that has a feature bit that an older version 
doesn't know about?  Do you just ignore it?

Migration needs to be conservative.  There should be only two possible 
outcomes: 1) a successful live migration or 2) graceful failure with the 
source VM still running correctly.  Silently ignoring things that could
affect the guest's behavior means that it's possible that after failure,
the guest will fail in an unexpected way.
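
The conservative rule can be expressed as a single check on the
destination: any feature bit set by the source that the destination
does not know about must abort the load rather than be ignored. A
minimal sketch, assuming a hypothetical 64-bit feature word
(`ex_features_acceptable` is an illustrative name, not an existing
function):

```c
#include <stdbool.h>
#include <stdint.h>

/* True only if every bit the source set is known to the destination.
 * A false result should fail the migration gracefully, leaving the
 * source VM running. */
static bool ex_features_acceptable(uint64_t src_features,
                                   uint64_t dst_known)
{
    return (src_features & ~dst_known) == 0;
}
```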

> Which protocol should we use? You're smarter than me, please suggest
> one.
> wrt the above guest abi issue, we should write a qemu spec with clear 
> definitions for devices, drivers, versions, etc.

I don't think there's a problem with what we have now.  The only thing I
think we should add is a vendor sub-versioning mechanism.
Unfortunately, we have downstreams that make lots of changes.  Today,
since we have a single version space, versioning clashes are inevitable
because of the shared namespace.  A sub-versioning mechanism would give
downstreams a way to backport features and change the device models
without their versioning clashing with upstream.

It also provides a way to determine if two downstreams are compatible 
with each other which is a pretty neat concept.

This could be done as a small, incremental change to the current protocol.
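
One possible shape for such a sub-version, purely as a sketch: the
struct layout, the vendor strings and the compatibility rule below are
all assumptions for illustration, not an agreed design.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical section version: the upstream number plus a vendor
 * namespace with its own revision counter. */
struct ex_version {
    int upstream;         /* upstream section version */
    const char *vendor;   /* "" when there are no downstream changes */
    int vendor_sub;       /* downstream revision within 'vendor' */
};

/* Assumed rule: the destination must know the upstream version, and a
 * stream carrying vendor changes is only loadable by the same vendor
 * at the same or a newer sub-version. */
static bool ex_can_load(const struct ex_version *dst,
                        const struct ex_version *src)
{
    if (src->upstream > dst->upstream) {
        return false;
    }
    if (src->vendor[0] == '\0') {
        return true;                 /* pure upstream stream */
    }
    if (strcmp(src->vendor, dst->vendor) != 0) {
        return false;                /* different downstream namespace */
    }
    return src->vendor_sub <= dst->vendor_sub;
}
```

Under this rule two downstreams can be compared directly: streams from
"vendor-a" and "vendor-b" are reported as incompatible instead of
silently colliding on the same version number.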

Regards,

Anthony Liguori


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
  2009-11-22 15:49 ` Anthony Liguori
@ 2009-11-22 20:22   ` Paolo Bonzini
  2009-11-23  2:17     ` Anthony Liguori
  2009-11-24 13:21   ` Michael S. Tsirkin
  1 sibling, 1 reply; 96+ messages in thread
From: Paolo Bonzini @ 2009-11-22 20:22 UTC (permalink / raw)
  To: qemu-devel


> I don't see how this fixes anything. If you used feature bits, how do
> you migrate from a version that has a feature bit that an older version
> doesn't know about? Do you just ignore it?

I'd go with chunks instead of feature bits, specifying them like in the
PNG specification:

       Each chunk consists of four parts:

       Length
          A 4-byte unsigned integer giving the number of bytes in the
          chunk's data field. The length counts only the data field, not
          itself, the chunk type code, or the CRC.  Zero is a valid
          length.  Although encoders and decoders should treat the length
          as unsigned, its value must not exceed (2^31)-1 bytes.

       Chunk Type
          A 4-byte chunk type code.  For convenience in description and
          in examining PNG files, type codes are restricted to consist of
          uppercase and lowercase ASCII letters (A-Z and a-z, or 65-90
          and 97-122 decimal).  [...]  Four bits of the type code,
          namely bit 5 (value 32) of each byte, are used to convey chunk
          properties. The property bits are an inherent part of
          the chunk name, and hence are fixed for any chunk type.

          The semantics of the property bits are:

          Ancillary bit: bit 5 of first byte
             0 (uppercase) = critical, 1 (lowercase) = ancillary.

             Chunks that are not strictly necessary in order to
             meaningfully display the contents of the file are known as
             "ancillary" chunks.  [Mandatory chunks may still be useful
             to QEMU, for example there could be a "PCI" chunk type].

          Private bit: bit 5 of second byte
             0 (uppercase) = public, 1 (lowercase) = private.

             A public chunk is one that is part of the PNG specification
             or is registered in the list of PNG special-purpose public
             chunk types.  Applications can also define private
             (unregistered) chunks for their own purposes. [This could
             mean that only savannah qemu.git can define public chunks.]

          Reserved bit: bit 5 of third byte
             Must be 0 (uppercase) in files conforming to this version of
             PNG.

             The significance of the case of the third letter of the
             chunk name is reserved for possible future expansion.

          Safe-to-copy bit: bit 5 of fourth byte
             0 (uppercase) = unsafe to copy, 1 (lowercase) = safe to
             copy.  [Would not matter for QEMU, I guess].

       Chunk Data
          The data bytes appropriate to the chunk type, if any.  This
          field can be of zero length.

       CRC
          A 4-byte CRC (Cyclic Redundancy Check) calculated on the
          preceding bytes in the chunk, including the chunk type code and
          chunk data fields, but not including the length field. The CRC
          is always present, even for chunks containing no data.
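
The property bits quoted above are cheap to test in code. The sketch
below decodes the ancillary and private bits from a 4-byte type code
and reads the big-endian length field. The policy for unknown chunks
(skip if ancillary, fail if critical) follows the PNG convention;
applying it to migration data, and all `ex_` names, are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_CASE_BIT 0x20   /* bit 5 (value 32): set means lowercase */

enum ex_action { EX_LOAD, EX_SKIP, EX_FAIL };

/* Bit 5 of the first byte: lowercase = ancillary. */
static bool ex_chunk_is_ancillary(const uint8_t type[4])
{
    return (type[0] & EX_CASE_BIT) != 0;
}

/* Bit 5 of the second byte: lowercase = private. */
static bool ex_chunk_is_private(const uint8_t type[4])
{
    return (type[1] & EX_CASE_BIT) != 0;
}

/* Length is a 4-byte unsigned integer in network byte order. */
static uint32_t ex_read_be32(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Known chunks are loaded; unknown ancillary chunks are skipped using
 * the length field; unknown critical chunks abort the load. */
static enum ex_action ex_chunk_action(const uint8_t type[4], bool known)
{
    if (known) {
        return EX_LOAD;
    }
    return ex_chunk_is_ancillary(type) ? EX_SKIP : EX_FAIL;
}
```

For example, an unknown "IHDR"-style (uppercase first letter) chunk
yields EX_FAIL, while an unknown "tEXt"-style chunk yields EX_SKIP.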

> Migration needs to be conservative. There should be only two possible
> outcomes: 1) a successful live migration or 2) graceful failure with the
> source VM still running correctly. Silently ignoring things that could
> affect the guests behavior means that it's possible that after failure,
> the guest will fail in an unexpected way.

It's up to the source to decide what information is extra.  For example, 
the state of a RNG emulation is nice-to-have, but as long as it is 
initialized from another random source on the destination you shouldn't 
care.

Paolo


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
  2009-11-22 20:22   ` [Qemu-devel] " Paolo Bonzini
@ 2009-11-23  2:17     ` Anthony Liguori
  2009-11-23  8:18       ` Paolo Bonzini
                         ` (3 more replies)
  0 siblings, 4 replies; 96+ messages in thread
From: Anthony Liguori @ 2009-11-23  2:17 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

Paolo Bonzini wrote:
>
>> I don't see how this fixes anything. If you used feature bits, how do
>> you migrate from a version that has a feature bit that an older version
>> doesn't know about? Do you just ignore it?
>
> I'd go with chunk instead of feature bits, specifying them like in the 
> PNG specification:

You mean, each device would have multiple sections?  We already use 
chunks for each device state.

>> Migration needs to be conservative. There should be only two possible
>> outcomes: 1) a successful live migration or 2) graceful failure with the
>> source VM still running correctly. Silently ignoring things that could
>> affect the guests behavior means that it's possible that after failure,
>> the guest will fail in an unexpected way.
>
> It's up to the source to decide what information is extra.  For 
> example, the state of a RNG emulation is nice-to-have, but as long as 
> it is initialized from another random source on the destination you 
> shouldn't care.

We only migrate things that are guest visible.  Everything else is left 
to the user to configure.  We wouldn't migrate the state of a RNG 
emulation provided that it doesn't have an impact on the guest.

By definition, anything that is guest visible is important because it 
affects the guest's behavior.

Regards,

Anthony Liguori


* [Qemu-devel] Re: Live migration protocol, device features, ABIs   and   other beasts
  2009-11-23  2:17     ` Anthony Liguori
@ 2009-11-23  8:18       ` Paolo Bonzini
  2009-11-23 13:04         ` Anthony Liguori
  2009-11-23  8:26       ` Gleb Natapov
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 96+ messages in thread
From: Paolo Bonzini @ 2009-11-23  8:18 UTC (permalink / raw)
  To: qemu-devel

On 11/23/2009 03:17 AM, Anthony Liguori wrote:
> You mean, each device would have multiple sections?  We already use
> chunks for each device state.

If they want to, yes.

> We only migrate things that are guest visible.  Everything else is left
> to the user to configure.  We wouldn't migrate the state of a RNG
> emulation provided that it doesn't have an impact on the guest.

The project doing lockstep virtualization would need to migrate it, for 
example.

> By definition, anything that is guest visible is important because it
> affects the guest's behavior.

Yes, but vendors want backwards-compatibility whenever possible. 
Anything that is guest visible is important, but some things are less 
important than others (or they wouldn't have been overlooked in the 
first place).

Paolo


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
  2009-11-23  2:17     ` Anthony Liguori
  2009-11-23  8:18       ` Paolo Bonzini
@ 2009-11-23  8:26       ` Gleb Natapov
  2009-11-23  9:29         ` Paolo Bonzini
       [not found]         ` <m3iqd14edf.fsf@neno.neno>
  2009-11-23 13:51       ` Eduardo Habkost
  2009-11-24 13:17       ` [Qemu-devel] " Michael S. Tsirkin
  3 siblings, 2 replies; 96+ messages in thread
From: Gleb Natapov @ 2009-11-23  8:26 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, qemu-devel

On Sun, Nov 22, 2009 at 08:17:46PM -0600, Anthony Liguori wrote:
> Paolo Bonzini wrote:
> >
> >>I don't see how this fixes anything. If you used feature bits, how do
> >>you migrate from a version that has a feature bit that an older version
> >>doesn't know about? Do you just ignore it?
> >
> >I'd go with chunk instead of feature bits, specifying them like in
> >the PNG specification:
> 
> You mean, each device would have multiple sections?  We already use
> chunks for each device state.
> 
Each device can send device info in multiple formats (each format with
its own ID) and destination will choose the one it supports.

--
			Gleb.


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
  2009-11-23  8:26       ` Gleb Natapov
@ 2009-11-23  9:29         ` Paolo Bonzini
  2009-11-23  9:31           ` Gleb Natapov
       [not found]         ` <m3iqd14edf.fsf@neno.neno>
  1 sibling, 1 reply; 96+ messages in thread
From: Paolo Bonzini @ 2009-11-23  9:29 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: qemu-devel

On 11/23/2009 09:26 AM, Gleb Natapov wrote:
>>> >  >I'd go with chunk instead of feature bits, specifying them like in
>>> >  >the PNG specification:
>> >
>> >  You mean, each device would have multiple sections?  We already use
>> >  chunks for each device state.
>> >
> Each device can send device info in multiple formats (each format with
> its own ID) and destination will choose the one it supports.

First of all, we'd need a mechanism to send _lengths_ of chunks.  This 
is especially important since there could be other consumers than QEMU 
for vm state data, and these may be interested only in few pieces of data.

Paolo


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
  2009-11-23  9:29         ` Paolo Bonzini
@ 2009-11-23  9:31           ` Gleb Natapov
       [not found]             ` <m3einp4e7c.fsf@neno.neno>
  0 siblings, 1 reply; 96+ messages in thread
From: Gleb Natapov @ 2009-11-23  9:31 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Mon, Nov 23, 2009 at 10:29:12AM +0100, Paolo Bonzini wrote:
> On 11/23/2009 09:26 AM, Gleb Natapov wrote:
> >>>>  >I'd go with chunk instead of feature bits, specifying them like in
> >>>>  >the PNG specification:
> >>>
> >>>  You mean, each device would have multiple sections?  We already use
> >>>  chunks for each device state.
> >>>
> >Each device can send device info in multiple formats (each format with
> >its own ID) and destination will choose the one it supports.
> 
> First of all, we'd need a mechanism to send _lengths_ of chunks.
And we need a mechanism to match incoming chunks to a consumer,
rather than rely on the order of incoming data.

> This is especially important since there could be other consumers
> than QEMU for vm state data, and these may be interested only in few
> pieces of data.
> 
> Paolo

--
			Gleb.


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-22 15:03 [Qemu-devel] Live migration protocol, device features, ABIs and other beasts Dor Laor
  2009-11-22 15:49 ` Anthony Liguori
@ 2009-11-23 12:15 ` Juan Quintela
  2009-11-23 13:09   ` Anthony Liguori
                     ` (2 more replies)
  1 sibling, 3 replies; 96+ messages in thread
From: Juan Quintela @ 2009-11-23 12:15 UTC (permalink / raw)
  To: dlaor; +Cc: qemu-devel

Dor Laor <dlaor@redhat.com> wrote:
> In the last couple of days we discovered some issues regarding stable
> ABI and the robustness of the live migration protocol. Let's just jump
> right into it, ordered by complexity:
>
> 1. Control *every* feature exposed to the guest by qemu cmdline:
>
>    While thinking on cross version migration, and reviewing some
>    patches, I noticed that there are many times that we use feature bits
>    in order to expose functionality for the guest driver - example:
>    VIRTIO_BLK_F_BARRIER, but we do not control it from qemu cmdline.

In my opinion this is madness, the qemu command line is already too
complicated.  I agree with Anthony to put it in qdev.
I will go further: I think this kind of issue should be put into
the machine type.

If you start qemu with -M pc-0.10, it should save the state in a 0.10
compatible way (that doesn't happen at the moment, but it should work
that way).

>    The solution here is simpler - Every guest visible change should
>    translate into cmdline option. This is part of the machine type and
>    in addition should be configurable.
>    It's an issue we all should keep in the back of our heads and popup
>    when a new capability/change are introduced.

I think this again creates an exponential number of possibilities, when
we are only interested in a small subset of the combinations.

Upstream: it only cares about what was in 0.10, 0.11 and 0.12, three
combinations, not all possible combinations of all devices.

Downstream: we, for instance, care about RHEL5.4, RHEL5.5 or similar, not
all other possible combinations.

Notice that this is important: we change devices constantly, but we need
new machine types only once every X months, i.e. much less often.

> 2. Live migration inherent problem.
>
>    Currently, even with VMState, the protocol is not flexible enough.
>    We run into problem when we needed to fix pvclock migration issue.
>    The fix included 2 additional fields in save/load state and thus
>    needed a new version number.
>    The trouble is that the load function does not accept sections with
>    versions greater than the one it supports.

This is a feature :)
We need the _save function to decide whether it can save in an old enough
format.  As things stand, there is no infrastructure for doing different
things on save, but we should be able to.

>    IMHO we should 'specify' the migration protocol and introduce
>    capabilities, feature bits, etc. This way we'll have a robust,
>    extensible protocol that will withstand any potential issue. Both
>    Michael Tisrkin and I suggest it at the time vmstate was introduced.
>    Vmstate is good for the code but it's not a protocol.

A very small machine has:
- "ide"
- "cpu"
- "e1000"
- "acpi"
- "apic"
- "cirrus_vga"
- "usb"
- "pckbd"
- "ps2mouse"
- ....

(no machine has so few devices).  Assume that each device has only 2
features/capabilities/...; that is 2^9 combinations.  Now start to think
about test cases for going forward <-> backward, etc.

Now change it again, and decide that you support 2 kinds of machines:
- pc-0.10
- pc-0.11

(that is what you are going to be interested in), and now, in that case,
it becomes easy to know what you need to support/test.

i.e. old acpi and new apic is not a valid combination, ...

> Which protocol should we use? You're smarter than me, please suggest
> one.
> wrt the above guest abi issue, we should write a qemu spec with clear
> definitions for devices, drivers, versions, etc.
>
> Looking forward interesting fruitful discussion,

My idea here is that we need to make further use of machine
descriptions; once that is done, we need something like a new property
for qdev (version?).  Once that is there, each device could:
- if version != last_version -> die (what happens now)
- do something sensible, i.e. not use the "new" features that do not
  exist in that version
- edit the savevm format in an easy way.

With respect to VMState, we have the information of _when_ a field was
introduced.  With VMState it is not difficult to save state in an old
format (we just need to add some infrastructure, but nothing too
difficult).  What is needed is to test/check that, for each particular
device, saving the subset of fields is enough.  And this is a tough job.
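
Saving in an old format from a field table could look roughly like the
sketch below. It is a simplified model, not the real VMState
structures: `ex_field`, `ex_clock_state` and `ex_save` are hypothetical
names, and fields are copied in host byte order for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical field descriptor: where the field lives and in which
 * section version it first appeared. */
struct ex_field {
    const char *name;
    size_t offset;
    size_t size;
    int version_introduced;
};

struct ex_clock_state {
    uint32_t ticks;
    uint32_t irq_level;
    uint32_t boot_offset;    /* added in version 2 */
};

static const struct ex_field ex_clock_fields[] = {
    { "ticks", offsetof(struct ex_clock_state, ticks), 4, 1 },
    { "irq_level", offsetof(struct ex_clock_state, irq_level), 4, 1 },
    { "boot_offset", offsetof(struct ex_clock_state, boot_offset), 4, 2 },
};

/* Write only the fields that already existed in target_version.
 * Returns the number of bytes written, or 0 if buf is too small. */
static size_t ex_save(const void *state, const struct ex_field *fields,
                      size_t nfields, int target_version,
                      uint8_t *buf, size_t buflen)
{
    size_t used = 0;

    for (size_t i = 0; i < nfields; i++) {
        if (fields[i].version_introduced > target_version) {
            continue;    /* field unknown to the old format */
        }
        if (used + fields[i].size > buflen) {
            return 0;
        }
        memcpy(buf + used, (const uint8_t *)state + fields[i].offset,
               fields[i].size);
        used += fields[i].size;
    }
    return used;
}
```

Saving at version 1 emits 8 bytes and at version 2 emits 12. The code
is the easy part; whether dropping `boot_offset` leaves the guest in a
correct state is the per-device question that still has to be checked
by hand.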

Later, Juan.


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
       [not found]         ` <m3iqd14edf.fsf@neno.neno>
@ 2009-11-23 12:36           ` Gleb Natapov
       [not found]             ` <m3r5rpwcww.fsf@neno.neno>
  2009-11-24 13:28             ` Michael S. Tsirkin
  2009-11-23 13:01           ` Anthony Liguori
  1 sibling, 2 replies; 96+ messages in thread
From: Gleb Natapov @ 2009-11-23 12:36 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Paolo Bonzini, qemu-devel

On Mon, Nov 23, 2009 at 01:25:32PM +0100, Juan Quintela wrote:
> Gleb Natapov <gleb@redhat.com> wrote:
> > On Sun, Nov 22, 2009 at 08:17:46PM -0600, Anthony Liguori wrote:
> >> Paolo Bonzini wrote:
> >> >
> >> >>I don't see how this fixes anything. If you used feature bits, how do
> >> >>you migrate from a version that has a feature bit that an older version
> >> >>doesn't know about? Do you just ignore it?
> >> >
> >> >I'd go with chunk instead of feature bits, specifying them like in
> >> >the PNG specification:
> >> 
> >> You mean, each device would have multiple sections?  We already use
> >> chunks for each device state.
> >> 
> > Each device can send device info in multiple formats (each format with
> > its own ID) and destination will choose the one it supports.
> 
> RAM anyone?  You send 1GB of info in different formats, just in case :)
> 
RAM is migrated very differently from devices. No need to extend this
for RAM.

> In this case, I think that the only two realistic solutions are to
> increase negotiation during migration:
> 
> - source -> target: I can save this devices with this versions:
>    "cpu" 10 - 12
>    "apic" 3
>    "ide" 2-4
>    "virtio-net" 5-10
>    ....
> - target -> source:
>    * I don't support "virtio-net" at all -> failed migration
>    * send it as:
>       "cpu" 11
>       "apic" 3
>       "ide" 2
>       "virtio-net" 10
>      thankyou very much :)
> 
> The other (more simple solution) is:
> 
> - source -> target:
>    my machine types are: pc-0.10, pc-0.11
>    my devices are: "cpu", "apic", "ide", "virtio-net"
> - target-> source
>    * I dont' have "virtio-net" -> fail, or
>    * send it as pc-0.10, please
> 
Yes, I proposed sending device state in multiple formats specifically to
avoid a negotiation step, but maybe proper negotiation is not so
bad/complex after all.
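
The per-device negotiation quoted above comes down to intersecting two
version ranges. A minimal sketch, assuming the target also keeps a
range per device and prefers the highest common version (`ex_range` and
`ex_pick_version` are hypothetical names):

```c
#include <stddef.h>
#include <string.h>

/* Versions of one device section that a side can save or load. */
struct ex_range {
    const char *device;
    int min;
    int max;
};

/* Returns the highest version both sides support for the offered
 * device, or -1 if the target lacks the device or the ranges do not
 * overlap (either case means the migration must fail). */
static int ex_pick_version(const struct ex_range *offer,
                           const struct ex_range *local, size_t nlocal)
{
    for (size_t i = 0; i < nlocal; i++) {
        if (strcmp(local[i].device, offer->device) != 0) {
            continue;
        }
        int lo = offer->min > local[i].min ? offer->min : local[i].min;
        int hi = offer->max < local[i].max ? offer->max : local[i].max;
        return lo <= hi ? hi : -1;
    }
    return -1;
}
```

With a source offering "cpu" 10-12 and a target that loads "cpu" up to
11, the answer is 11; a target with no "virtio-net" entry answers -1.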

> My problem implementing optional features/sections/... is not the
> savevm/VMState bits.  At the end, implementing that is easy.  What is
> more dificult is once that a device have 5 features, what are the valid
> combinations.  i.e. if you have pci and msix features, msix requires
> pci.  In this case, the dependency is trivial, but in others that
> hasen't to be so obvious.
It doesn't matter what a device supports and how it is configured. This
can be handled by each device separately, i.e. if the destination detects
that the source had MSIX enabled for the device but the destination
doesn't, it will signal an error.

> 
> Having subversions for downstream is a different problem that needs to
> be fixed.
> 
> Later, Juan.

--
			Gleb.


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
       [not found]             ` <m3einp4e7c.fsf@neno.neno>
@ 2009-11-23 12:37               ` Gleb Natapov
  0 siblings, 0 replies; 96+ messages in thread
From: Gleb Natapov @ 2009-11-23 12:37 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Paolo Bonzini, qemu-devel

On Mon, Nov 23, 2009 at 01:29:11PM +0100, Juan Quintela wrote:
> Gleb Natapov <gleb@redhat.com> wrote:
> > On Mon, Nov 23, 2009 at 10:29:12AM +0100, Paolo Bonzini wrote:
> >> On 11/23/2009 09:26 AM, Gleb Natapov wrote:
> >> >>>>  >I'd go with chunk instead of feature bits, specifying them like in
> >> >>>>  >the PNG specification:
> >> >>>
> >> >>>  You mean, each device would have multiple sections?  We already use
> >> >>>  chunks for each device state.
> >> >>>
> >> >Each device can send device info in multiple formats (each format with
> >> >its own ID) and destination will choose the one it supports.
> >> 
> >> First of all, we'd need a mechanism to send _lengths_ of chunks.
> > And we need the mechanism to match incoming chunks to a consumer.
> > Not relay on order of incoming data.
> 
> We already do that:
> see savevm.c:qemu_loadvm_state()
> 
Cool. Didn't know that.

> After reading a section, we call find_se(), with searchs for a
> matching section, we can do the same for any subsection/chunk/<name it>
> 
> static SaveStateEntry *find_se(const char *idstr, int instance_id)
> {
>     SaveStateEntry *se;
> 
>     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
>         if (!strcmp(se->idstr, idstr) &&
>             instance_id == se->instance_id)
>             return se;
>     }
>     return NULL;
> }
> 
> Later, Juan.

--
			Gleb.


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
       [not found]         ` <m3iqd14edf.fsf@neno.neno>
  2009-11-23 12:36           ` Gleb Natapov
@ 2009-11-23 13:01           ` Anthony Liguori
       [not found]             ` <m3vdh1wd0n.fsf@neno.neno>
  1 sibling, 1 reply; 96+ messages in thread
From: Anthony Liguori @ 2009-11-23 13:01 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Paolo Bonzini, qemu-devel, Gleb Natapov

Juan Quintela wrote:
> Gleb Natapov <gleb@redhat.com> wrote:
>   
>> On Sun, Nov 22, 2009 at 08:17:46PM -0600, Anthony Liguori wrote:
>>     
>>> Paolo Bonzini wrote:
>>>       
>>>>> I don't see how this fixes anything. If you used feature bits, how do
>>>>> you migrate from a version that has a feature bit that an older version
>>>>> doesn't know about? Do you just ignore it?
>>>>>           
>>>> I'd go with chunk instead of feature bits, specifying them like in
>>>> the PNG specification:
>>>>         
>>> You mean, each device would have multiple sections?  We already use
>>> chunks for each device state.
>>>
>>>       
>> Each device can send device info in multiple formats (each format with
>> its own ID) and destination will choose the one it supports.
>>     
>
> RAM anyone?  You send 1GB of info in different formats, just in case :)
>
> In this case, I think that the only two realistic solutions are to
> increase negotiation during migration:
>
> - source -> target: I can save this devices with this versions:
>    "cpu" 10 - 12
>    "apic" 3
>    "ide" 2-4
>    "virtio-net" 5-10
>    ....
> - target -> source:
>    * I don't support "virtio-net" at all -> failed migration
>    * send it as:
>       "cpu" 11
>       "apic" 3
>       "ide" 2
>       "virtio-net" 10
>      thankyou very much :)
>   

I'm not at all convinced that you can downgrade the version of a device 
without exposing a functional change to a guest.  In fact, I'm pretty 
certain that it's provably impossible.  Please give a counter example of 
where this mechanism would be safe.

> Having subversions for downstream is a different problem that needs to
> be fixed.
>   

Yup.  However, I believe the problem it addresses is really what's
driving this discussion of optional features.

Regards,

Anthony Liguori

> Later, Juan.
>   


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
  2009-11-23  8:18       ` Paolo Bonzini
@ 2009-11-23 13:04         ` Anthony Liguori
  0 siblings, 0 replies; 96+ messages in thread
From: Anthony Liguori @ 2009-11-23 13:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

Paolo Bonzini wrote:
> On 11/23/2009 03:17 AM, Anthony Liguori wrote:
>> You mean, each device would have multiple sections?  We already use
>> chunks for each device state.
>
> If they want to, yes.
>
>> We only migrate things that are guest visible.  Everything else is left
>> to the user to configure.  We wouldn't migrate the state of a RNG
>> emulation provided that it doesn't have an impact on the guest.
>
> The project doing lockstep virtualization would need to migrate it, 
> for example.

Lock step is an entirely different beast.  The live migration protocol 
is not suitable for it.

>> By definition, anything that is guest visible is important because it
>> affects the guest's behavior.
>
> Yes, but vendors want backwards-compatibility whenever possible. 
> Anything that is guest visible is important, but some things are less 
> important than others (or they wouldn't have been overlooked in the 
> first place).

I disagree.  Everything is equally important if we want migration to be 
correct.

I don't see how backwards compatibility fits into this picture though.  
The only argument I've heard for a change here is forwards compatibility 
which is not something I would ever expect any vendor to want to support.

Regards,

Anthony Liguori

* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 12:15 ` Juan Quintela
@ 2009-11-23 13:09   ` Anthony Liguori
  2009-11-23 14:13     ` Juan Quintela
  2009-11-24 10:39   ` Dor Laor
  2009-11-24 13:59   ` Michael S. Tsirkin
  2 siblings, 1 reply; 96+ messages in thread
From: Anthony Liguori @ 2009-11-23 13:09 UTC (permalink / raw)
  To: Juan Quintela; +Cc: dlaor, qemu-devel

Juan Quintela wrote:
> Dor Laor <dlaor@redhat.com> wrote:
>   
> I will go further, and think that this kind of issues should be put into
> the machine type.
>   

I agree.

> If you start qemu with -M pc-0.10, it should save the state in a 0.10
> compatible way (that doesn't happen at the moment, but it should work
> that way).
>   

Yes, that's a very good point.

>>    The solution here is simpler - Every guest visible change should
>>    translate into cmdline option. This is part of the machine type and
>>    in addition should be configurable.
>>    It's an issue we all should keep in the back of our heads and popup
>>    when a new capability/change are introduced.
>>     
>
> I think this again creates an exponential number of possibilities, when
> we are only interested in a small part of the combinations.
>
> Upstream: It only cares about what was in 0.10, 0.11 and 0.12: three
> combinations, not all possible combinations of all devices.
>
> Downstream: (we for instance) care about RHEL5.4 RHEL5.5 or similar, not
> all other possible combinations.
>
> Notice that this is important: we change devices constantly, but we need
> new machine types only once every X months, i.e. much less often.
>   

Yes, I think this is an important point to keep things under control.  
Supporting all possible combinations of different git versions is going 
to be unrealistically difficult.  However, making sure that we can 
reproduce the major versions would work well.

A test suite that ran in the guest and tried to fingerprint the machine 
would be pretty helpful.  Dump the full config for each PCI device, 
where they sit in the device model, etc.  Dump ACPI tables.  For a 
subset of devices we support, query as much info as possible.  Dump it 
all to a config file.

We could run it against 0.11, then when 0.12 rolls around, run it again 
in a guest with -M pc-0.11.  A simple diff of the two config files would 
tell us whether we broke guest compatibility.
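
As a rough sketch of the comparison step, assuming the dump has already
been parsed into key/value pairs (the key names used here, such as
"pci.00:03.0.class", are invented for illustration and do not come from
any existing tool):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fingerprint entry, e.g. {"pci.00:03.0.class", "0x0100"}. */
typedef struct {
    const char *key;
    const char *value;
} FingerprintEntry;

/* Return the value stored for `key`, or NULL if the dump lacks it. */
static const char *lookup(const FingerprintEntry *fp, size_t n,
                          const char *key)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(fp[i].key, key) == 0) {
            return fp[i].value;
        }
    }
    return NULL;
}

/* Count guest-visible differences between a baseline dump (say, taken
 * on 0.11) and a candidate dump (0.12 started with -M pc-0.11).  A key
 * that is missing, added, or has a different value counts once. */
static size_t fingerprint_diff(const FingerprintEntry *base, size_t n_base,
                               const FingerprintEntry *cand, size_t n_cand)
{
    size_t diffs = 0;
    assert(base != NULL && cand != NULL);
    for (size_t i = 0; i < n_base; i++) {
        const char *v = lookup(cand, n_cand, base[i].key);
        if (v == NULL || strcmp(v, base[i].value) != 0) {
            diffs++;
        }
    }
    for (size_t i = 0; i < n_cand; i++) {
        if (lookup(base, n_base, cand[i].key) == NULL) {
            diffs++;
        }
    }
    return diffs;
}
```

Any non-zero result would mean the compat machine type no longer
reproduces what the guest saw on the older release.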

>> Which protocol should we use? You're smarter than me, please suggest
>> one.
>> wrt the above guest abi issue, we should write a qemu spec with clear
>> definitions for devices, drivers, versions, etc.
>>
>> Looking forward to an interesting, fruitful discussion,
>>     
>
> My idea here is that we need to make further use of machine
> descriptions; once that is done, we need something like a new property
> for qdev (version?).  Once there, each device could do:
> - if version != last_version -> die (what happens now)
> - do something sensible, not use the "new" features not existing on that
>   version
> - edit the savevm format in an easy way.
>   

But this would only kick in when using pc-0.11 or something, right?

Regards,

Anthony Liguori

* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23  2:17     ` Anthony Liguori
  2009-11-23  8:18       ` Paolo Bonzini
  2009-11-23  8:26       ` Gleb Natapov
@ 2009-11-23 13:51       ` Eduardo Habkost
  2009-11-23 14:21         ` Paolo Bonzini
  2009-11-23 14:53         ` [Qemu-devel] " Anthony Liguori
  2009-11-24 13:17       ` [Qemu-devel] " Michael S. Tsirkin
  3 siblings, 2 replies; 96+ messages in thread
From: Eduardo Habkost @ 2009-11-23 13:51 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, qemu-devel

Excerpts from Anthony Liguori's message of Mon Nov 23 00:17:46 -0200 2009:
> Paolo Bonzini wrote:
> >
> >> I don't see how this fixes anything. If you used feature bits, how do
> >> you migrate from a version that has a feature bit that an older version
> >> doesn't know about? Do you just ignore it?
> >
> > I'd go with chunk instead of feature bits, specifying them like in the 
> > PNG specification:
> 
> You mean, each device would have multiple sections?  We already use 
> chunks for each device state.

I was going to suggest that. The current section system would be a good
candidate for a "capability" system. e.g. instead of adding new data to
the "cpu" section and increasing its version number, you add a
"cpu.pvclock-msrs" section that contains the new data. This way, if a
vendor backports the fix, it doesn't mess up with the section version
numbers.

We could improve the API to make it easier to do, or maybe even improve
the protocol/format to make the "subsection" representation more
compact. Either way, my point is that using names to track new
capabilities sounds better than having to agree on capability bits.
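
A minimal sketch of the destination side of that idea, assuming the
conservative rule that any section the destination does not know is a
hard failure.  check_sections() is a made-up helper, and the section
names other than "cpu.pvclock-msrs" are only examples:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: return 0 if every section named in the incoming stream
 * is known to the destination, or -1 on the first unknown one. */
static int check_sections(const char *const *incoming, size_t n_in,
                          const char *const *known, size_t n_known)
{
    assert(incoming != NULL && known != NULL);
    for (size_t i = 0; i < n_in; i++) {
        int found = 0;
        for (size_t j = 0; j < n_known; j++) {
            if (strcmp(incoming[i], known[j]) == 0) {
                found = 1;
                break;
            }
        }
        if (!found) {
            return -1;  /* unknown capability: refuse the migration */
        }
    }
    return 0;
}
```

A vendor that backports the fix only has to add the one name to its
list of known sections; the version number of the "cpu" section itself
stays untouched.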


> 
> >> Migration needs to be conservative. There should be only two possible
> >> outcomes: 1) a successful live migration or 2) graceful failure with the
> >> source VM still running correctly. Silently ignoring things that could
> >> affect the guests behavior means that it's possible that after failure,
> >> the guest will fail in an unexpected way.
> >
> > It's up to the source to decide what information is extra.  For 
> > example, the state of a RNG emulation is nice-to-have, but as long as 
> > it is initialized from another random source on the destination you 
> > shouldn't care.
> 
> We only migrate things that are guest visible.  Everything else is left 
> to the user to configure.  We wouldn't migrate the state of a RNG 
> emulation provided that it doesn't have an impact on the guest.
> 
> By definition, anything that is guest visible is important because it 
> affects the guest's behavior.

Right, but I wouldn't be surprised if a user complains that "I know that
my guest doesn't use that VM feature, so I want to be able to migrate to
an older version anyway".
-- 
Eduardo

* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 13:09   ` Anthony Liguori
@ 2009-11-23 14:13     ` Juan Quintela
  2009-11-24 14:05       ` Michael S. Tsirkin
  0 siblings, 1 reply; 96+ messages in thread
From: Juan Quintela @ 2009-11-23 14:13 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: dlaor, qemu-devel

Anthony Liguori <anthony@codemonkey.ws> wrote:
> Juan Quintela wrote:
>> Dor Laor <dlaor@redhat.com> wrote:
>>>     
>>
>> My idea here is that we need to make further use of machine
>> descriptions; once that is done, we need something like a new property
>> for qdev (version?).  Once there, each device could do:
>> - if version != last_version -> die (what happens now)
>> - do something sensible, not use the "new" features not existing on that
>>   version
>> - edit the savevm format in an easy way.
>>   
>
> But this would only kick in when using pc-0.11 or something, right?

Yep.

At this point, pc-0.10 is just:

static QEMUMachine pc_machine_v0_10 = {
    .name = "pc-0.10",
    .desc = "Standard PC, qemu 0.10",
    .init = pc_init_pci,
    .max_cpus = 255,
    .compat_props = (CompatProperty[]) {
        {
            .driver   = "virtio-blk-pci",
            .property = "class",
            .value    = stringify(PCI_CLASS_STORAGE_OTHER),
        },{
            .driver   = "virtio-console-pci",
            .property = "class",
            .value    = stringify(PCI_CLASS_DISPLAY_OTHER),
        },{
            .driver   = "virtio-net-pci",
            .property = "vectors",
            .value    = stringify(0),
        },{
            .driver   = "virtio-blk-pci",
            .property = "vectors",
            .value    = stringify(0),
        },
        { /* end of list */ }
    },
};

But to really make it work, we need to take a list of each savevm format
change and put it here.  Notice that several changes are needed:
- savevm infrastructure save functions don't know about the version id
- devices don't know how to "behave" as another version
- other things that I have probably missed
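
To illustrate how such a table could also pin a savevm version, here is
a small sketch of the lookup.  CompatProp and compat_value() are
stand-ins written for this example, not the actual qdev code, and a
"savevm_version" property does not exist today; it is the kind of entry
that would have to be added:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the CompatProperty entries quoted above. */
typedef struct {
    const char *driver;
    const char *property;
    const char *value;
} CompatProp;

/* Return the value a machine type pins for driver.property, or
 * `fallback` (the device's current default) if the table has no entry.
 * The table ends with an all-NULL entry. */
static const char *compat_value(const CompatProp *table, const char *driver,
                                const char *property, const char *fallback)
{
    assert(table != NULL && driver != NULL && property != NULL);
    for (; table->driver != NULL; table++) {
        if (strcmp(table->driver, driver) == 0 &&
            strcmp(table->property, property) == 0) {
            return table->value;
        }
    }
    return fallback;
}
```

A device's save function could then ask for its own "savevm_version"
and write the older format when the machine type pins one, falling back
to the latest version otherwise.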

Later, Juan.

* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 13:51       ` Eduardo Habkost
@ 2009-11-23 14:21         ` Paolo Bonzini
  2009-11-23 15:00           ` Anthony Liguori
  2009-11-23 15:02           ` Eduardo Habkost
  2009-11-23 14:53         ` [Qemu-devel] " Anthony Liguori
  1 sibling, 2 replies; 96+ messages in thread
From: Paolo Bonzini @ 2009-11-23 14:21 UTC (permalink / raw)
  To: Eduardo Habkost; +Cc: qemu-devel

On 11/23/2009 02:51 PM, Eduardo Habkost wrote:
> Right, but I wouldn't be surprised if a user complains that "I know that
> my guest doesn't use that VM feature, so I want to be able to migrate to
> an older version anyway".

That's a bit more tricky.  What if the older version doesn't support 
sound (just making up an example) and you know that you're not using a 
client that plays sound, but still the sound card is part of the 
machine?  I think there is no doubt that the migration (or save/restore) 
should be aborted in that case.

I would not generalize very much, the problem that Dor posed is very 
specific and probably quite rare.  However, it's real.

Paolo

* [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
       [not found]             ` <m3r5rpwcww.fsf@neno.neno>
@ 2009-11-23 14:32               ` Gleb Natapov
  2009-11-23 14:51                 ` Anthony Liguori
  0 siblings, 1 reply; 96+ messages in thread
From: Gleb Natapov @ 2009-11-23 14:32 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Paolo Bonzini, qemu-devel

On Mon, Nov 23, 2009 at 03:09:35PM +0100, Juan Quintela wrote:
> Gleb Natapov <gleb@redhat.com> wrote:
> > On Mon, Nov 23, 2009 at 01:25:32PM +0100, Juan Quintela wrote:
> >> Gleb Natapov <gleb@redhat.com> wrote:
> >> > On Sun, Nov 22, 2009 at 08:17:46PM -0600, Anthony Liguori wrote:
> 
> > Yes, I proposed to send device state in multiple formats specifically to
> > avoid a negotiation step, but maybe proper negotiation is not so
> > bad/complex after all.
> 
> The advantage of proper negotiation is that the target can tell you "no".
> It doesn't matter how many formats you send: if the target doesn't
> understand them, it is not going to work.
> 
The only difference is when it will be known that migration is not
possible: before even attempting it or during migration.

> >> My problem implementing optional features/sections/... is not the
> >> savevm/VMState bits.  In the end, implementing that is easy.  What is
> >> more difficult is, once a device has 5 features, working out which
> >> combinations are valid.  E.g. if you have pci and msix features, msix
> >> requires pci.  In this case the dependency is trivial, but in others
> >> it need not be so obvious.
> > It doesn't matter what a device supports and how it is configured. This
> > can be handled by each device separately, i.e. if the destination detects
> > that the source had MSIX enabled for the device but the destination
> > hasn't, it will signal an error.
> 
> And guess what, with current code migration is going to "succeed" on the
> source host and fail on the target host.
Then the current code is buggy. It should be possible to abort migration
in the middle if a device can't understand the data it received.

--
			Gleb.

* [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
       [not found]             ` <m3vdh1wd0n.fsf@neno.neno>
@ 2009-11-23 14:49               ` Anthony Liguori
  2009-11-23 15:21                 ` Eduardo Habkost
                                   ` (2 more replies)
  0 siblings, 3 replies; 96+ messages in thread
From: Anthony Liguori @ 2009-11-23 14:49 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Paolo Bonzini, qemu-devel, Gleb Natapov

Juan Quintela wrote:
> Anthony Liguori <anthony@codemonkey.ws> wrote:
>   
>> Juan Quintela wrote:
>>     
>
>   
>> I'm not at all convinced that you can downgrade the version of a
>> device without exposing a functional change to a guest.  In fact, I'm
>> pretty certain that it's provably impossible.  Please give a counter
>> example of where this mechanism would be safe.
>>     
>
> The problem that we are having in RHEL just now is that there are two
> new fields to make pvclock/kvmclock more exact (this is qemu-kvm tree):
>
>         /* KVM pvclock msr */
>         VMSTATE_UINT64_V(system_time_msr, CPUState, 12),
>         VMSTATE_UINT64_V(wall_clock_msr, CPUState, 12),
>
> Before we added those values to the state, we used whatever time the host
> was using for those values (yes, we had drift).
>
> But if we don't send those two values, we are no worse off than we were
> before adding them to the state.
>   

But the effect is that after you migrate, you change behavior.  In this 
case, you migrate a guest that isn't drifting and then after migration, 
you start drifting.
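
For readers less familiar with versioned fields: a field declared with
VMSTATE_UINT64_V(..., 12) is only present in streams of version 12 or
later.  The sketch below models that rule with made-up types (it is not
the VMState implementation, and the "tsc" field is just a placeholder).
It shows why an old stream loads fine on a new qemu with the MSRs left
at their defaults, while a loader that only knows an older version has
to reject a newer stream outright:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified CPU state; the real CPUState is far larger. */
typedef struct {
    uint64_t tsc;
    uint64_t system_time_msr;
    uint64_t wall_clock_msr;
} DemoCPUState;

/* One field descriptor: where it lives and the first section version
 * that carries it. */
typedef struct {
    size_t offset;
    int since_version;
} DemoField;

static const DemoField demo_fields[] = {
    { offsetof(DemoCPUState, tsc), 1 },
    { offsetof(DemoCPUState, system_time_msr), 12 },
    { offsetof(DemoCPUState, wall_clock_msr), 12 },
};

#define DEMO_CURRENT_VERSION 12

/* Load a section of `version_id` from `buf` (host-endian uint64 values,
 * for simplicity).  Fields newer than the stream are left untouched.
 * Returns the number of values consumed, or -1 if the stream is newer
 * than this loader understands or is too short. */
static int demo_load(DemoCPUState *s, int version_id,
                     const uint64_t *buf, size_t n_buf)
{
    size_t used = 0;
    assert(s != NULL && buf != NULL);
    if (version_id > DEMO_CURRENT_VERSION) {
        return -1;
    }
    for (size_t i = 0; i < sizeof(demo_fields) / sizeof(demo_fields[0]); i++) {
        if (demo_fields[i].since_version > version_id) {
            continue;  /* field did not exist in that version */
        }
        if (used >= n_buf) {
            return -1;
        }
        memcpy((char *)s + demo_fields[i].offset, &buf[used],
               sizeof(uint64_t));
        used++;
    }
    return (int)used;
}
```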

Changing guest behavior during migration means that the guest becomes 
part of the equation with respect to how well it behaves with this 
change.  If we can prove a guest behaves exactly the same before and 
after migration, then assuming we're correct, we don't have to test 
migration with more than one guest.  Practically speaking, testing with 
more guests is good because it uncovers new bugs.

However, if we rely on certain guest behavior, then it blows up the 
testing matrix because now we have to test every guest with every 
workload to see whether it works with migration.  It's a slippery slope 
that's hard to get off once you start.

> Our problem here is (you can substitute qemu versions for RHEL
> if it makes you feel better):
>
> A client starts with qemu 0.10 and has his image running there;
> so far so good.
>
> Then qemu 0.12 appears.
> He wants to test it; no problem, you can go from qemu 0.11 to qemu-0.12.
>
> But (and this is the big but), he wants to be sure that he can go back
> to 0.11 if anything bad happens.  Then we want to start:
>
> qemu-0.12 -M pc-0.11
>
> and have this be able to migrate back to qemu-0.11.
>
> Being able to save state with qemu-0.12 in qemu-0.11 format is quite
> difficult (especially because we didn't even try).
>   

But that's the real fix here.

> But if you now replace qemu-0.11 and qemu-0.12 with RHEL5.4 and
> RHEL5.4.1, you will see that the code bases are going to be really,
> really similar.  And if any savevm format is changed, it is because
> there is no other solution.
>   

In our own stable branch, we do not introduce any savevm changes.  I 
would recommend the same policy for RHEL :-)

> In the cases that we have had so far, this is feasible. I.e. the new
> field just gives a "more exact" behaviour, and not sending this new
> value just gives the same behaviour as before.
>   

You may be willing to expose this to your users but as an upstream 
policy, I'm very opposed to it.  You're breaking the contract of 
migration by changing the guest's behavior from underneath it.

If I'm a large scale virtualization deployment and I'm using live 
migration transparently to balance the load globally, it needs to be 
completely transparent to the guests running in the deployment.  Failure 
needs to be very exact.

With the time drift example, you've introduced a policy into qemu that 
really belongs in the management layer.  You've decided that changing 
guest behavior by introducing drift in pvclock is acceptable compared to 
the value of this one use-case.

A better approach would be having an option to "force" a migration 
across incompatible versions.  I think such an option would be pretty 
dangerous to offer but at least it puts the decision in the hands of the 
management software where it belongs.

Regards,

Anthony Liguori

* [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
  2009-11-23 14:32               ` Gleb Natapov
@ 2009-11-23 14:51                 ` Anthony Liguori
  2009-11-23 14:53                   ` Gleb Natapov
       [not found]                   ` <m33a45s009.fsf@neno.neno>
  0 siblings, 2 replies; 96+ messages in thread
From: Anthony Liguori @ 2009-11-23 14:51 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Paolo Bonzini, Juan Quintela, qemu-devel

Gleb Natapov wrote:
>>>> My problem implementing optional features/sections/... is not the
>>>> savevm/VMState bits.  In the end, implementing that is easy.  What is
>>>> more difficult is, once a device has 5 features, working out which
>>>> combinations are valid.  E.g. if you have pci and msix features, msix
>>>> requires pci.  In this case the dependency is trivial, but in others
>>>> it need not be so obvious.
>>>>         
>>> It doesn't matter what a device supports and how it is configured. This
>>> can be handled by each device separately, i.e. if the destination detects
>>> that the source had MSIX enabled for the device but the destination
>>> hasn't, it will signal an error.
>>>       
>> And guess what, with current code migration is going to "succeed" on the
>> source host and fail on the target host.
>>     
> Then the current code is buggy. It should be possible to abort migration
> in the middle if a device can't understand the data it received.
>   
>   

It can: post_load() can return an error, which will terminate the 
migration.  This can be used to validate fields beyond whether they fit 
into the type specified.

Regards,

Anthony Liguori
> --
> 			Gleb.
>   

* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 13:51       ` Eduardo Habkost
  2009-11-23 14:21         ` Paolo Bonzini
@ 2009-11-23 14:53         ` Anthony Liguori
  2009-11-24 14:28           ` [Qemu-devel] " Michael S. Tsirkin
  1 sibling, 1 reply; 96+ messages in thread
From: Anthony Liguori @ 2009-11-23 14:53 UTC (permalink / raw)
  To: Eduardo Habkost; +Cc: Paolo Bonzini, qemu-devel

Eduardo Habkost wrote:
>>>> Migration needs to be conservative. There should be only two possible
>>>> outcomes: 1) a successful live migration or 2) graceful failure with the
>>>> source VM still running correctly. Silently ignoring things that could
>>>> affect the guests behavior means that it's possible that after failure,
>>>> the guest will fail in an unexpected way.
>>>>         
>>> It's up to the source to decide what information is extra.  For 
>>> example, the state of a RNG emulation is nice-to-have, but as long as 
>>> it is initialized from another random source on the destination you 
>>> shouldn't care.
>>>       
>> We only migrate things that are guest visible.  Everything else is left 
>> to the user to configure.  We wouldn't migrate the state of a RNG 
>> emulation provided that it doesn't have an impact on the guest.
>>
>> By definition, anything that is guest visible is important because it 
>> affects the guest's behavior.
>>     
>
> Right, but I wouldn't be surprised if a user complains that "I know that
> my guest doesn't use that VM feature, so I want to be able to migrate to
> an older version anyway".
>   

This could be addressed with a "force" migration feature.  That said, I 
don't believe that the overwhelming majority of users are in a position 
to determine whether they can safely migrate to an older version.

Regards,

Anthony Liguori

* [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
  2009-11-23 14:51                 ` Anthony Liguori
@ 2009-11-23 14:53                   ` Gleb Natapov
  2009-11-23 15:05                     ` Anthony Liguori
       [not found]                   ` <m33a45s009.fsf@neno.neno>
  1 sibling, 1 reply; 96+ messages in thread
From: Gleb Natapov @ 2009-11-23 14:53 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, Juan Quintela, qemu-devel

On Mon, Nov 23, 2009 at 08:51:17AM -0600, Anthony Liguori wrote:
> Gleb Natapov wrote:
> >>>>My problem implementing optional features/sections/... is not the
> >>>>savevm/VMState bits.  In the end, implementing that is easy.  What is
> >>>>more difficult is, once a device has 5 features, working out which
> >>>>combinations are valid.  E.g. if you have pci and msix features, msix
> >>>>requires pci.  In this case the dependency is trivial, but in others
> >>>>it need not be so obvious.
> >>>It doesn't matter what a device supports and how it is configured. This
> >>>can be handled by each device separately, i.e. if the destination detects
> >>>that the source had MSIX enabled for the device but the destination
> >>>hasn't, it will signal an error.
> >>And guess what, with current code migration is going to "succeed" on the
> >>source host and fail on the target host.
> >Then the current code is buggy. It should be possible to abort migration
> >in the middle if a device can't understand the data it received.
> 
> It can, post_load() can error which will terminate the migration.
> This can be used to validate fields beyond whether they fit into the
> type specified.
> 
Then I don't see why Juan claims what he claims.

--
			Gleb.

* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 14:21         ` Paolo Bonzini
@ 2009-11-23 15:00           ` Anthony Liguori
  2009-11-23 15:37             ` Eduardo Habkost
  2009-11-23 15:02           ` Eduardo Habkost
  1 sibling, 1 reply; 96+ messages in thread
From: Anthony Liguori @ 2009-11-23 15:00 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Eduardo Habkost, qemu-devel

Paolo Bonzini wrote:
> On 11/23/2009 02:51 PM, Eduardo Habkost wrote:
>> Right, but I wouldn't be surprised if a user complains that "I know that
>> my guest doesn't use that VM feature, so I want to be able to migrate to
>> an older version anyway".
>
> That's a bit more tricky.  What if the older version doesn't support 
> sound (just making up an example) and you know that you're not using a 
> client that plays sound, but still the sound card is part of the 
> machine?  I think there is no doubt that the migration (or 
> save/restore) should be aborted in that case.
>
> I would not generalize very much, the problem that Dor posed is very 
> specific and probably quite rare.  However, it's real.

I think the problem is that you shouldn't be changing the guest visible 
state in a stable update of qemu.  If you change the guest visible state 
in a stable update, then you won't be able to support live migration 
between arbitrary stable versions.  You can't introduce features without 
introducing forward compatibility issues.  If you're adding new guest 
visible state, you've added a feature.

This is not a live migration problem, this is a problem with your stable 
branch policy.

Regards,

Anthony Liguori

> Paolo

* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 14:21         ` Paolo Bonzini
  2009-11-23 15:00           ` Anthony Liguori
@ 2009-11-23 15:02           ` Eduardo Habkost
  2009-11-23 15:12             ` Anthony Liguori
  1 sibling, 1 reply; 96+ messages in thread
From: Eduardo Habkost @ 2009-11-23 15:02 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Mon, Nov 23, 2009 at 03:21:24PM +0100, Paolo Bonzini wrote:
> On 11/23/2009 02:51 PM, Eduardo Habkost wrote:
>> Right, but I wouldn't be surprised if a user complains that "I know that
>> my guest doesn't use that VM feature, so I want to be able to migrate to
>> an older version anyway".
>
> That's a bit more tricky.  What if the older version doesn't support  
> sound (just making up an example) and you know that you're not using a  
> client that plays sound, but still the sound card is part of the  
> machine?  I think there is no doubt that the migration (or save/restore)  
> should be aborted in that case.

Yes, it's not always that simple. I am thinking about simpler cases
where it is possible to migrate and we know that the missing migration
data won't impact the specific guest OS that is being run.

The pvclock MSRs are an example: if the guest is not using pvclock, not
restoring the MSRs won't make any difference. Strictly speaking, not
migrating them is wrong, but the user may argue that they know it won't
impact their guest OS, and that they want to take the risk. I am not
arguing we should always listen to them, I am just saying that some
users may want that to work.

Also, on the pvclock MSR case (and probably others), any argument
against doing backward migration would also be valid against doing
forward migration when the source process doesn't have the fix yet,
because the pvclock MSRs won't be migrated anyway. Forward migration is
as broken as backward migration, but we don't prevent migration in that
direction.

>
> I would not generalize very much, the problem that Dor posed is very  
> specific and probably quite rare.  However, it's real.
>

-- 
Eduardo

* [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
  2009-11-23 14:53                   ` Gleb Natapov
@ 2009-11-23 15:05                     ` Anthony Liguori
  2009-11-23 15:22                       ` Gleb Natapov
  0 siblings, 1 reply; 96+ messages in thread
From: Anthony Liguori @ 2009-11-23 15:05 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Paolo Bonzini, Juan Quintela, qemu-devel

Gleb Natapov wrote:
> Then I don't see why Juan claims what he claims.
>   

Live migration is unidirectional.  As long as qemu can send out all of 
the data without the stream closing, it will "succeed" on the source.  
While this may sound like a bug, it's an impossible problem to solve as 
it's dealing with reliable communication between two unreliable nodes 
(i.e. the Two Generals' Problem).  This is why the source qemu does not 
exit after a successful live migration.  It merely stays in the stopped 
state.  The idea is that a third party management tool can be the 
"reliable third party" that can make the final determination about 
whether the migration has succeeded and take actions on the source and 
destination nodes appropriately.

In this precise case, if post_load() fails, it may or may not cause the 
source to fail the migration depending on how large the TCP window sizes 
are, how much data is in flight, and how much state is left to process.

Regards.

Anthony Liguori
> --
> 			Gleb.
>   

* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 15:02           ` Eduardo Habkost
@ 2009-11-23 15:12             ` Anthony Liguori
  2009-11-24 14:26               ` [Qemu-devel] " Michael S. Tsirkin
  0 siblings, 1 reply; 96+ messages in thread
From: Anthony Liguori @ 2009-11-23 15:12 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel

Eduardo Habkost wrote:
> The pvclock MSRs are an example: if the guest is not using pvclock, not
> restoring the MSRs won't make any difference. Strictly speaking, not
> migrating them is wrong, but the user may argue that they know it won't
> impact their guest OS, and that they want to take the risk.

Once you start dealing with issues of risk vs. benefit, it's a policy 
and belongs in the management layer.

We don't make risk vs. benefit assessments in qemu.  We defer those 
types of decisions.

Today, we only succeed migration when we know it will be successful.  We 
could allow a management tool to override this check such that it could 
implement such a policy.  But that's a really dangerous option to offer.

> Also, on the pvclock MSR case (and probably others), any argument
> against doing backward migration would also be valid against doing
> forward migration when the source process doesn't have the fix yet,
> because the pvclock MSRs won't be migrated anyway. Forward migration is
> as broken as backward migration, but we don't prevent migration in that
> direction.
>   

A bug that is visible to a guest is no longer a bug, but a feature that 
has to be supported for as long as that release is supported.  If we 
feel that it's too dangerous of a bug, then we need to fail gracefully 
and refuse to load that state on any other system forcing a proper 
shutdown/startup for migration to a new version of qemu.

For the purposes of compatibility, it is something that we have to 
preserve.  In this case, you're introducing two MSRs that are readable 
and writable by a guest.  If you migrate, all of a sudden you lose those 
MSRs' contents.  You cannot have live migration cause an MSR to disappear 
regardless of what the purpose of that MSR is.

Regards,

Anthony Liguori

* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 14:49               ` Anthony Liguori
@ 2009-11-23 15:21                 ` Eduardo Habkost
  2009-11-23 16:16                   ` Anthony Liguori
       [not found]                 ` <m3y6lxqkpv.fsf@neno.neno>
  2009-11-24 13:39                 ` Michael S. Tsirkin
  2 siblings, 1 reply; 96+ messages in thread
From: Eduardo Habkost @ 2009-11-23 15:21 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, Juan Quintela, Gleb Natapov, qemu-devel

Excerpts from Anthony Liguori's message of Mon Nov 23 12:49:23 -0200 2009:
> Juan Quintela wrote:
> > But if you now replace qemu-0.11 and qemu-0.12 with RHEL5.4 and
> > RHEL5.4.1, you will see that the code bases are going to be really,
> > really similar.  And if any savevm format is changed, it is because
> > there is no other solution.
> >   
> 
> In our own stable branch, we do not introduce any savevm changes.  I 
> would recommend the same policy for RHEL :-)

But what if you need to add a savevm change to make migration work
properly on the stable branch? You can't just tell users "migration is
known to be broken on the stable branch, please don't run migrations
when using the stable branch". That's the case for the pvclock MSR
migration fix.

In a perfect world, the set of state data that is migrated by the
current implementation would always match exactly the expected behavior
of the virtual machine. Unfortunately sometimes the implementation
doesn't follow the "contract" (be it some written specification,
documentation, or just user expectations).  When that happens, it is a
bug in qemu and it needs to be fixed on the stable branch.

Note that (right now) I am not arguing for backward migration, but just
arguing that we can't have a strict "no savevm changes" policy on the
stable branch.
-- 
Eduardo

* [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
  2009-11-23 15:05                     ` Anthony Liguori
@ 2009-11-23 15:22                       ` Gleb Natapov
  2009-11-23 15:30                         ` Paolo Bonzini
  2009-11-23 15:32                         ` Anthony Liguori
  0 siblings, 2 replies; 96+ messages in thread
From: Gleb Natapov @ 2009-11-23 15:22 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, Juan Quintela, qemu-devel

On Mon, Nov 23, 2009 at 09:05:58AM -0600, Anthony Liguori wrote:
> Gleb Natapov wrote:
> >Then I don't see why Juan claims what he claims.
> 
> Live migration is unidirectional.  As long as qemu can send out all
> of the data without the stream closing, it will "succeed" on the
> source.  While this may sound like a bug, it's an impossible problem
> to solve as it's dealing with reliable communication between two
> unreliable nodes (i.e. the two general's problem).  This is why the
> source qemu does not exit after a successful live migration.  It
As far as I remember, the Two Generals' Problem is about an unreliable
channel, not unreliable nodes. Why not have the destination send ACK/NACK
to the source when it knows that migration succeeded/failed? If the source
gets NACK it continues, if it gets ACK it exits, otherwise it stays in
paused state. Yes, there are worst-case scenarios where this will not work,
but it will not be worse than what we have now.
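
A minimal sketch of the rule proposed above, in Python. The names
(Reply, SourceAction, source_action) are illustrative assumptions and
not QEMU code; None stands for "no reply arrived".

```python
from enum import Enum
from typing import Optional


class Reply(Enum):
    ACK = "ack"
    NACK = "nack"


class SourceAction(Enum):
    EXIT = "exit"            # destination has taken over the guest
    RESUME = "resume"        # migration failed, keep running on the source
    STAY_PAUSED = "paused"   # outcome unknown, leave it to management


def source_action(reply: Optional[Reply]) -> SourceAction:
    """Map the destination's reply (None = nothing received) to an action."""
    if reply is Reply.ACK:
        return SourceAction.EXIT
    if reply is Reply.NACK:
        return SourceAction.RESUME
    return SourceAction.STAY_PAUSED
```

The only case that still needs a third party is the last one, where the
source cannot tell a lost reply from a dead destination.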

> merely stays in the stopped state.  The idea is that a third party
> management tool can be the "reliable third party" that can make the
> final determination about whether the migration has succeeded and
> take actions on the source and destination nodes appropriately.
> 
> In this precise case, if post_load() fails, it may or may not cause
> the source to fail the migration depending on how large the TCP
> window sizes are, how much data is in flight, and how much state is
> left to process.
> 
If post_load() fails it should inform management about the failure and
management will restart the source. Is this how it works now?

--
			Gleb.


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
  2009-11-23 15:22                       ` Gleb Natapov
@ 2009-11-23 15:30                         ` Paolo Bonzini
  2009-11-23 15:32                         ` Anthony Liguori
  1 sibling, 0 replies; 96+ messages in thread
From: Paolo Bonzini @ 2009-11-23 15:30 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Juan Quintela, qemu-devel

On 11/23/2009 04:22 PM, Gleb Natapov wrote:
> As far as I remember the two general's problem talks about unreliable
> channel, not unreliable nodes. Why not having destination send ACK/NACK
> to the source when it knows that migration succeeded/failed. If source
> gets NACK it continues, if it gets ACK it exits, otherwise it stays in
> paused state. Yes, there are worst case scenarios where this will not work,
> but it will not be worse than what we have now.

Also, this can be done in a per-protocol manner.  TCP and Unix socket 
migration would support it, while exec (and maybe fd) migration would not.

Paolo


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
  2009-11-23 15:22                       ` Gleb Natapov
  2009-11-23 15:30                         ` Paolo Bonzini
@ 2009-11-23 15:32                         ` Anthony Liguori
  2009-11-23 15:49                           ` Gleb Natapov
  1 sibling, 1 reply; 96+ messages in thread
From: Anthony Liguori @ 2009-11-23 15:32 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Paolo Bonzini, Juan Quintela, qemu-devel

Gleb Natapov wrote:
> On Mon, Nov 23, 2009 at 09:05:58AM -0600, Anthony Liguori wrote:
>   
>> Gleb Natapov wrote:
>>     
>>> Then I don't see why Juan claims what he claims.
>>>       
>> Live migration is unidirectional.  As long as qemu can send out all
>> of the data without the stream closing, it will "succeed" on the
>> source.  While this may sound like a bug, it's an impossible problem
>> to solve as it's dealing with reliable communication between two
>> unreliable nodes (i.e. the two general's problem).  This is why the
>> source qemu does not exit after a successful live migration.  It
>>     
> As far as I remember the two general's problem talks about unreliable
> channel, not unreliable nodes.

That's just semantics.  The problem is that one general does not know if 
the other general received the message.  Even if there was a reliable 
channel between the two generals, if one of the generals can die with no 
indication, then you still have the same problem, i.e. the first general 
doesn't know for sure if the second general received the message.

>  Why not having destination send ACK/NACK
> to the source when it knows that migration succeeded/failed.

1) Source sends migration traffic
2) Destination receives it, sends Ack
3) Destination needs to wait to receive Ack from Source before starting 
guest to ensure that guest does not start twice
4) Source receives Ack from Destination, sends Ack
5) Source kills guest
6) Destination receives Ack from Source, starts guest

If Destination dies in between 5 and 6, the VM disappears.
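
The failure window in the six steps above can be walked through with a
small simulation. This is a simplified model under stated assumptions:
the function name and the crash-injection parameter are hypothetical,
only the destination can die, and messages are never lost.

```python
def three_way_handshake(dest_dies_after_step=None):
    """Return (source_still_has_guest, destination_running) for steps 1-6."""
    dest_alive = True
    dest_acked = False
    source_has_guest = True   # paused on the source, but still recoverable
    dest_running = False
    for step in range(1, 7):
        if step == 2 and dest_alive:
            dest_acked = True            # 2) destination sends Ack
        elif step == 4 and not dest_acked:
            break                        # 4) no Ack seen: source stays paused
        elif step == 5:
            source_has_guest = False     # 5) source kills guest
        elif step == 6 and dest_alive:
            dest_running = True          # 6) destination starts guest
        if dest_dies_after_step == step:
            dest_alive = False           # hypothetical crash injection
    return source_has_guest, dest_running
```

With dest_dies_after_step=5 neither side holds the guest any more, which
is the "VM disappears" case.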

>  If source
> gets NACK it continues, if it gets ACK it exits, otherwise it stays in
> paused state. Yes, there are worst case scenarios where this will not work,
> but it will not be worse than what we have now.
>   

It introduces a round trip in a path that's extremely sensitive to 
latency.  Waiting for those acks == guest down time.  Since it doesn't 
make things fundamentally reliable, why bother?

A management tool doesn't exist in the down time path so it can look at 
both ends at its leisure to determine if something went wrong.

>> merely stays in the stopped state.  The idea is that a third party
>> management tool can be the "reliable third party" that can make the
>> final determination about whether the migration has succeeded and
>> take actions on the source and destination nodes appropriately.
>>
>> In this precise case, if post_load() fails, it may or may not cause
>> the source to fail the migration depending on how large the TCP
>> window sizes are, how much data is in flight, and how much state is
>> left to process.
>>
>>     
> If post_load() fails it should inform management about failure and
> management will restart the source. Is this how it works now?
>   

It informs management on the destination node and it can take 
appropriate action by sending cont to the source.  This minimizes down 
time in the common case (successful migration).

Regards,

Anthony Liguori


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 15:00           ` Anthony Liguori
@ 2009-11-23 15:37             ` Eduardo Habkost
  0 siblings, 0 replies; 96+ messages in thread
From: Eduardo Habkost @ 2009-11-23 15:37 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, qemu-devel

On Mon, Nov 23, 2009 at 09:00:05AM -0600, Anthony Liguori wrote:
<snip>
>
> I think the problem is that you shouldn't be changing the guest visible  
> state in a stable update of qemu.  If you change the guest visible state  
> in a stable update, then you won't be able to support live migration  
> between arbitrary stable versions.  You can't introduce features without  
> introducing forward compatibility issues.  If you're adding new guest  
> visible state, you've added a feature.
>
> This is not a live migration problem, this is a problem with your stable  
> branch policy.

It is not a feature, because the data was already supposed to be part of
the guest visible state, but the implementation was buggy. If the
current implementation were the ultimate authority regarding what is a
bug and what is a new feature, no software would ever have any bugs:
everything we call a "bug fix" today would be called a "feature".

I would agree with you if the ultimate authority regarding expected
guest visible state was our implementation. But things aren't that
simple: sometimes the definition of "guest visible state" in the
implementation is buggy, and we can't just tell users "the feature your
guests are using shouldn't be part of the guest visible state, don't use
it".

-- 
Eduardo


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
  2009-11-23 15:32                         ` Anthony Liguori
@ 2009-11-23 15:49                           ` Gleb Natapov
  2009-11-23 16:09                             ` Anthony Liguori
  0 siblings, 1 reply; 96+ messages in thread
From: Gleb Natapov @ 2009-11-23 15:49 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, Juan Quintela, qemu-devel

On Mon, Nov 23, 2009 at 09:32:48AM -0600, Anthony Liguori wrote:
> Gleb Natapov wrote:
> >On Mon, Nov 23, 2009 at 09:05:58AM -0600, Anthony Liguori wrote:
> >>Gleb Natapov wrote:
> >>>Then I don't see why Juan claims what he claims.
> >>Live migration is unidirectional.  As long as qemu can send out all
> >>of the data without the stream closing, it will "succeed" on the
> >>source.  While this may sound like a bug, it's an impossible problem
> >>to solve as it's dealing with reliable communication between two
> >>unreliable nodes (i.e. the two general's problem).  This is why the
> >>source qemu does not exit after a successful live migration.  It
> >As far as I remember the two general's problem talks about unreliable
> >channel, not unreliable nodes.
> 
> That's just semantics.  The problem is that one general does not
> know if the other general received the message.  Even if there was a
> reliable channel between the two generals, if one of the generals
> can die with no indication, then you still have the same problem,
> i.e. the first general doesn't know for sure if the second general
> received the message.
> 
> > Why not having destination send ACK/NACK
> >to the source when it knows that migration succeeded/failed.
> 
> 1) Source sends migration traffic
> 2) Destination receives it, sends Ack
> 3) Destination needs to wait to receive Ack from Source before
> starting guest to ensure that guest does not start twice
> 4) Source receives Ack from Destination, sends Ack
> 5) Source kills guest
> 6) Destination receives Ack from Source, starts guest
> 
> If Destination dies in between 5 and 6, the VM disappears.
> 
1) Source sends migration traffic
2) Destination receives it, sends Ack
3) Destination starts running
4) Source receives Ack from Destination
5) Source kills guest

If Source does not receive Ack it stays paused and waits for management to
sort things out.
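
The five-step variant can be enumerated in the same spirit. This is only
a sketch: one_way_ack and its two boolean inputs are assumptions made
for illustration, and a destination crash after the Ack has been sent is
deliberately left out of the model.

```python
from itertools import product


def one_way_ack(dest_loads_ok, ack_delivered):
    """Return (source_state, destination_running) for the five-step variant."""
    dest_running = dest_loads_ok                        # 3) starts without waiting
    ack_seen = dest_loads_ok and ack_delivered          # 4) the Ack may get lost
    source_state = "killed" if ack_seen else "paused"   # 5) or wait for management
    return source_state, dest_running


# Every combination of "destination loaded fine" and "Ack reached the source".
outcomes = {
    (ok, delivered): one_way_ack(ok, delivered)
    for ok, delivered in product([True, False], repeat=2)
}
```

In this model the guest is never lost: either the source is still paused
or the destination is running. The (True, False) case leaves both, and
that is the one management has to resolve.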

> > If source
> >gets NACK it continues, if it gets ACK it exits, otherwise it stays in
> >paused state. Yes, there are worst case scenarios where this will not work,
> >but it will not be worse than what we have now.
> 
> It introduces a round trip in a path that's extremely sensitive to
> latency.  Waiting for those acks == guest down time.  Since it
> doesn't make things fundamentally reliable, why bother?
No additional latency. See above. Destination starts running right away.
 
> 
> A management tool doesn't exist in the down time path so it can look
> at both ends at its leisure to determine if something went wrong.
> 
> >>merely stays in the stopped state.  The idea is that a third party
> >>management tool can be the "reliable third party" that can make the
> >>final determination about whether the migration has succeeded and
> >>take actions on the source and destination nodes appropriately.
> >>
> >>In this precise case, if post_load() fails, it may or may not cause
> >>the source to fail the migration depending on how large the TCP
> >>window sizes are, how much data is in flight, and how much state is
> >>left to process.
> >>
> >If post_load() fails it should inform management about failure and
> > >management will restart the source. Is this how it works now?
> 
> It informs management on the destination node and it can take
> appropriate action by sending cont to the source.  This minimizes
> down time in the common case (successful migration).
> 
> Regards,
> 
> Anthony Liguori

--
			Gleb.


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
       [not found]                   ` <m33a45s009.fsf@neno.neno>
@ 2009-11-23 16:05                     ` Gleb Natapov
  2009-11-23 16:10                       ` Anthony Liguori
  0 siblings, 1 reply; 96+ messages in thread
From: Gleb Natapov @ 2009-11-23 16:05 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Paolo Bonzini, qemu-devel

On Mon, Nov 23, 2009 at 05:01:58PM +0100, Juan Quintela wrote:
> Anthony Liguori <anthony@codemonkey.ws> wrote:
> > Gleb Natapov wrote:
> >>>>> My problem implementing optional features/sections/... is not the
> >>>>> savevm/VMState bits.  At the end, implementing that is easy.  What is
> >>>>> more difficult is, once a device has 5 features, which combinations
> >>>>> are valid.  i.e. if you have pci and msix features, msix requires
> >>>>> pci.  In this case, the dependency is trivial, but in others it
> >>>>> doesn't have to be so obvious.
> >>>>>         
> >>>> It doesn't matter what device support and how it is configured. This can
> >>>> be handled by each device separately. i.e if destination detects that
> >>>> source had MSIX enabled for the device but destination hasn't it will
> >>>> signal an error.
> >>>>       
> >>> And guess what, with current code migration is going to "succeed" on the
> >>> source host and fail on the target host.
> >>>     
> >> Then current code is buggy. It should be possible to abort migration in
> >> the middle if device can't understand the data it received.
> >>   
> >
> > It can, post_load() can error which will terminate the migration.
> > This can be used to validate fields beyond whether they fit into the
> > type specified.
> 
> Yes, but "source" never sees that.
> it is like:
> 
> source:
>    set things
>    foreach device
>      sent state
>    end
> 
> target:
>    set things
>    while data
>      receive device
> 
> If it fails in one of the last devices, the source declares migration
> "success" before the target finishes reading data.  You can test it: put a
> return -1 in any of the post_load() functions, and you will see that
> migration succeeds on the source and fails on the target.
> 
> Mark was the one who explained this bug to me.
> 
According to Anthony this is not a bug. Management has all the means to
resolve this situation properly. The bug would be if dst and src both
run or both exit.
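
The behaviour in the quoted pseudocode can be reproduced with a toy
model. It assumes the whole stream fits into a buffer between the two
sides (compare the earlier remark about TCP window sizes); the function
and device names are made up for illustration and are not QEMU code.

```python
from collections import deque


def migrate(devices, failing_device=None):
    """Return (source_result, target_result) for a one-way migration stream."""
    stream = deque()

    # Source: send every device's state; nothing ever flows back.
    for name in devices:
        stream.append(name)
    source_result = "success"   # all data was written, so the source is done

    # Target: load sections in order; a hypothetical post_load() error aborts.
    target_result = "success"
    while stream:
        name = stream.popleft()
        if name == failing_device:
            target_result = f"failed in post_load of {name}"
            break
    return source_result, target_result
```

For example, migrate(["cpu", "ide", "virtio-net"], "virtio-net") reports
success on the source and a failure on the target, so a third party has
to look at both ends to learn the real outcome.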

--
			Gleb.


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
  2009-11-23 15:49                           ` Gleb Natapov
@ 2009-11-23 16:09                             ` Anthony Liguori
  2009-11-23 16:15                               ` Gleb Natapov
  0 siblings, 1 reply; 96+ messages in thread
From: Anthony Liguori @ 2009-11-23 16:09 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Paolo Bonzini, Juan Quintela, qemu-devel

Gleb Natapov wrote:
> On Mon, Nov 23, 2009 at 09:32:48AM -0600, Anthony Liguori wrote:
>   
>> Gleb Natapov wrote:
>>     
>>> On Mon, Nov 23, 2009 at 09:05:58AM -0600, Anthony Liguori wrote:
>>>       
>>>> Gleb Natapov wrote:
>>>>         
>>>>> Then I don't see why Juan claims what he claims.
>>>>>           
>>>> Live migration is unidirectional.  As long as qemu can send out all
>>>> of the data without the stream closing, it will "succeed" on the
>>>> source.  While this may sound like a bug, it's an impossible problem
>>>> to solve as it's dealing with reliable communication between two
>>>> unreliable nodes (i.e. the two general's problem).  This is why the
>>>> source qemu does not exit after a successful live migration.  It
>>>>         
>>> As far as I remember the two general's problem talks about unreliable
>>> channel, not unreliable nodes.
>>>       
>> That's just semantics.  The problem is that one general does not
>> know if the other general received the message.  Even if there was a
>> reliable channel between the two generals, if one of the generals
>> can die with no indication, then you still have the same problem,
>> i.e. the first general doesn't know for sure if the second general
>> received the message.
>>
>>     
>>> Why not having destination send ACK/NACK
>>> to the source when it knows that migration succeeded/failed.
>>>       
>> 1) Source sends migration traffic
>> 2) Destination receives it, sends Ack
>> 3) Destination needs to wait to receive Ack from Source before
>> starting guest to ensure that guest does not start twice
>> 4) Source receives Ack from Destination, sends Ack
>> 5) Source kills guest
>> 6) Destination receives Ack from Source, starts guest
>>
>> If Destination dies in between 5 and 6, the VM disappears.
>>
>>     
> 1) Source sends migration traffic
> 2) Destination receives it, sends Ack
> 3) Destination starts running
> 4) Source receives Ack from Destination
> 5) Source kills guest
>
> If Source does not receive Ack it stays paused and waits for management to
> sort things out.
>   

Is it really useful to kill the source guest in this case?  I'm wary of 
how useful an unreliable ack is namely because it introduces rather 
complex semantics from a management tool perspective.  If folks think it 
would be really useful, I'm not fundamentally opposed to it.

Regards,

Anthony Liguori


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
  2009-11-23 16:05                     ` Gleb Natapov
@ 2009-11-23 16:10                       ` Anthony Liguori
  0 siblings, 0 replies; 96+ messages in thread
From: Anthony Liguori @ 2009-11-23 16:10 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Paolo Bonzini, Juan Quintela, qemu-devel

Gleb Natapov wrote:
> According to Anthony this is not a bug. Management has all the means to
> resolve this situation properly. The bug would be if dst and src both
> run or both exit.
>   

Yup.  And they do.  If you do the same migration with libvirt, it will 
fail gracefully with a -1 in the post_load().

> --
> 			Gleb.
>   

Regards,

Anthony Liguori


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
  2009-11-23 16:09                             ` Anthony Liguori
@ 2009-11-23 16:15                               ` Gleb Natapov
  2009-11-23 16:19                                 ` Anthony Liguori
  0 siblings, 1 reply; 96+ messages in thread
From: Gleb Natapov @ 2009-11-23 16:15 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, Juan Quintela, qemu-devel

On Mon, Nov 23, 2009 at 10:09:15AM -0600, Anthony Liguori wrote:
> Gleb Natapov wrote:
> >On Mon, Nov 23, 2009 at 09:32:48AM -0600, Anthony Liguori wrote:
> >>Gleb Natapov wrote:
> >>>On Mon, Nov 23, 2009 at 09:05:58AM -0600, Anthony Liguori wrote:
> >>>>Gleb Natapov wrote:
> >>>>>Then I don't see why Juan claims what he claims.
> >>>>Live migration is unidirectional.  As long as qemu can send out all
> >>>>of the data without the stream closing, it will "succeed" on the
> >>>>source.  While this may sound like a bug, it's an impossible problem
> >>>>to solve as it's dealing with reliable communication between two
> >>>>unreliable nodes (i.e. the two general's problem).  This is why the
> >>>>source qemu does not exit after a successful live migration.  It
> >>>As far as I remember the two general's problem talks about unreliable
> >>>channel, not unreliable nodes.
> >>That's just semantics.  The problem is that one general does not
> >>know if the other general received the message.  Even if there was a
> >>reliable channel between the two generals, if one of the generals
> >>can die with no indication, then you still have the same problem,
> >>i.e. the first general doesn't know for sure if the second general
> >>received the message.
> >>
> >>>Why not having destination send ACK/NACK
> >>>to the source when it knows that migration succeeded/failed.
> >>1) Source sends migration traffic
> >>2) Destination receives it, sends Ack
> >>3) Destination needs to wait to receive Ack from Source before
> >>starting guest to ensure that guest does not start twice
> >>4) Source receives Ack from Destination, sends Ack
> >>5) Source kills guest
> >>6) Destination receives Ack from Source, starts guest
> >>
> >>If Destination dies in between 5 and 6, the VM disappears.
> >>
> >1) Source sends migration traffic
> >2) Destination receives it, sends Ack
> >3) Destination starts running
> >4) Source receives Ack from Destination
> >5) Source kills guest
> >
> >If Source does not receive Ack it stays paused and waits for management to
> >sort things out.
> 
> Is it really useful to kill the source guest in this case?  I'm wary
> of how useful an unreliable ack is namely because it introduces
> rather complex semantics from a management tool perspective.  If
> folks think it would be really useful, I'm not fundamentally opposed
> to it.
> 
I am OK with management being responsible for sorting things out. Juan
said that the destination can't abort migration in the middle, so I pointed
out an easy solution that will work in 99.999% of cases.

--
			Gleb.


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 15:21                 ` Eduardo Habkost
@ 2009-11-23 16:16                   ` Anthony Liguori
  2009-11-23 17:08                     ` Eduardo Habkost
  0 siblings, 1 reply; 96+ messages in thread
From: Anthony Liguori @ 2009-11-23 16:16 UTC (permalink / raw)
  To: Eduardo Habkost; +Cc: Paolo Bonzini, Juan Quintela, Gleb Natapov, qemu-devel

Eduardo Habkost wrote:
> Excerpts from Anthony Liguori's message of Mon Nov 23 12:49:23 -0200 2009:
>   
>> Juan Quintela wrote:
>>     
>>> But if you now replace qemu-0.11 and qemu-0.12 with RHEL5.4 and
>>> RHEL5.4.1, you will see that the code bases are going to be really,
>>> really similar.  And if any savevm format is changed, it is because
>>> there is no other solution.
>>>   
>>>       
>> In our own stable branch, we do not introduce any savevm changes.  I 
>> would recommend the same policy for RHEL :-)
>>     
>
> But what if you need to add a savevm change to make migration work
> properly on the stable branch?

Define "properly".

If we have to introduce a new version in VMstate, there are two 
possibilities.  The first is that we have to blacklist the old version 
because it was fundamentally broken.  This is rare but it happens.  In 
this case, we would not be able to support migrating from that stable 
release to any other stable release.  Really unfortunate for users but 
we would have no other choice.

The second is that we introduce a new version but don't blacklist the 
old.  This means the old version wasn't fundamentally broken.  It also 
means that the "fix" is a feature.  It makes things better but isn't 
strictly required.  That gets deferred to the next release.
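
The two cases can be made concrete with a small version check. This is
a sketch under assumptions: can_load and its parameters are hypothetical
names, not the VMState API, and the only rules modelled are "never load
a section newer than the binary supports" and "refuse blacklisted
versions".

```python
def can_load(section_version, supported_version, blacklisted=frozenset()):
    """Decide whether an incoming section version is acceptable."""
    if section_version > supported_version:
        return False   # stream is newer than this binary understands
    if section_version in blacklisted:
        return False   # format known to be fundamentally broken
    return True


# Case 1: version 1 is blacklisted, so a fixed binary refuses old streams.
case1_old_to_new = can_load(1, supported_version=2, blacklisted={1})

# Case 2: version 1 is still accepted, so old -> new works but new -> old does not.
case2_old_to_new = can_load(1, supported_version=2)
case2_new_to_old = can_load(2, supported_version=1)
```

In both cases the new-to-old direction fails, which is why either kind
of version bump is a problem inside a stable series.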

>  You can't just tell users "migration is
> known to be broken on the stable branch, please don't run migrations
> when using the stable branch". That's the case for the pvclock MSR
> migration fix.
>   

You're assuming that backporting the pvclock change is a bug fix.  It's 
a new feature as far as I'm concerned and doesn't belong in stable.

> In a perfect world, the set of state data that is migrated by the
> current implementation would always match exactly the expected behavior
> of the virtual machine. Unfortunately sometimes the implementation
> doesn't follow the "contract" (be it some written specification,
> documentation, or just user expectations).  When that happens, it is a
> bug on qemu and it needs to be fixed on the stable branch.
>
> Note that (right now) I am not arguing for backward migration, but just
> arguing that we can't have a strict "no savevm changes" policy on the
> stable branch.
>   

That's exactly what I'm advocating: a strict savevm policy for stable 
branch.  It's something I've always enforced in the past.  It's 
necessary to preserve the integrity of live migration.

Regards,

Anthony Liguori


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
  2009-11-23 16:15                               ` Gleb Natapov
@ 2009-11-23 16:19                                 ` Anthony Liguori
  0 siblings, 0 replies; 96+ messages in thread
From: Anthony Liguori @ 2009-11-23 16:19 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Paolo Bonzini, Juan Quintela, qemu-devel

Gleb Natapov wrote:
> I am OK with management being responsible for sorting things out. Juan
> said that the destination can't abort migration in the middle, so I pointed
> out an easy solution that will work in 99.999% of cases.
>   

I think there's something elegant about doing migration in a 
unidirectional stream.  It makes live migration work to a file, to a 
qcow2 file, etc. with no protocol change.

The ack'ing bits can be done by a live migration transport since it 
happens entirely at the end of the stream.  The old ssh: migration used 
to do this and provided a simple means that worked 99.999% of the time :-)

Regards,

Anthony Liguori

> --
> 			Gleb.
>   


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
       [not found]                 ` <m3y6lxqkpv.fsf@neno.neno>
@ 2009-11-23 16:44                   ` Anthony Liguori
       [not found]                     ` <m3zl6db11z.fsf@neno.neno>
  2009-11-23 20:24                     ` Eduardo Habkost
  0 siblings, 2 replies; 96+ messages in thread
From: Anthony Liguori @ 2009-11-23 16:44 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Paolo Bonzini, qemu-devel, Gleb Natapov

Juan Quintela wrote:
> you can weasel the way you want (I can also do it).
>
> Customer had: 5.4 <-> 5.4 migration working (suboptimally)
> Now appears 5.4.1 that works best with migration.  But he want to do the
> migration in two steps:
>
> migrate from qemu 5.4 -> 5.4.1, and be able to migrate back if he doesn't
> like it.
>
> At some point, he will migrate to 5.4.1 knowing that he loses backward
> migration.  Think of a cluster of machines here: you just add a
> 5.4.1 machine into the mix, and want this to work while you haven't
> changed _all_ the machines.
>   

If I'm a customer and you introduce this sort of change in a .z release, 
I would certainly want to know about it and have control over it.

I don't want to transparently migrate from 5.4.1 to 5.4.0 and have my 
guest's time start drifting.  I specifically want that to fail.

If I wanted to support both models because I didn't care, then I would 
start with -M 5.4.0 on all of my nodes.  I know you don't have a -M 
5.4.1 and -M 5.4.0 but if you're introducing this sort of change, you 
really should.

>> However, if we rely on certain guest behavior, then it blows up the
>> testing matrix because now we have to test every guest with every
>> workload to see whether it works with migration.  It's a slippery
>> slope that's hard to get off once you start.
>>     
>
> I know :( But life sometimes doesn't agree with you.  Notice that I
> understand that our problem is different from the upstream one.  Our problem
> is more in migrating from 0.11.0 -> 0.11.1, and being able to go back.
> Changes in the savevm are only introduced if there is no other solution.
> But we want to be able to get the 0.11.0 behaviour in 0.11.1, because we
> have a mixed environment.  Requesting to upgrade all the hosts at the
> same time is not going to fly with any BOFH :)
>   

You've made a policy decision.  As a user, I really don't like that 
policy decision and it makes me want to make sure that we upgrade all of 
our hosts at once to avoid this problem.  Of course, I'm a control freak 
and I'm particularly concerned about time drift issues as that's been 
consuming a bit of my time lately.

>>> But if you now replace qemu-0.11 and qemu-0.12 with RHEL5.4 and
>>> RHEL5.4.1, you will see that the code bases are going to be really,
>>> really similar.  And if any savevm format is changed, it is because
>>> there is no other solution.
>>>   
>>>       
>> In our own stable branch, we do not introduce any savevm changes.  I
>> would recommend the same policy for RHEL :-)
>>     
>
> Except if we find a bug, and there is no other solution.  That is what
> we try to do.  And we would not change the format for a new feature, but
> what happens if it was a bug that a field is really missing?
>   

Can we reasonably support a guest that doesn't have this older field?  
If the answer is "yes", then it's a feature that can be delayed until 
the next release.

  

>> You may be willing to expose this to your users but as an upstream
>> policy, I'm very opposed to it.  You're breaking the contract of
>> migration by changing the guests behavior from underneath it.
>>     
>
> The lawyer inside me:
> - You are lying when you told me that qemu-0.11 -M pc-0.10 gives me a
>   pc-0.10 like machine.  The savevm format is different.
>
> (after talking about contracts, I couldn't resist)
>   

That's a bug that we need to fix.

> I could make more examples to you.  But that would just make the
> discussion longer.  What we have here is:
>
> - migration between 0.11.0 -> 0.11.0 works some way
> - I want "that very way" between 0.11.1 -> 0.11.0.
>   

Not a problem as long as we don't introduce features in the stable branch.

>> A better approach would be having an option to "force" a migration
>> across incompatible versions.  I think such an option would be pretty
>> dangerous to offer but at least it puts the decision in the hands of
>> the management software where it belongs.
>>     
>
> The difference is where you put things.  In the source (newer code) or
> in the target (older code).  By definition, once that you have changed
> something, you can change it to be backward compatible.  What is a bit
> more difficult is to take the time machine, go to the past, and change
> 5.4 to be compatible with 5.4.1. (*)
>   

The problem here isn't migration, it's what you've decided to backport 
into your stable branch.

Note that the discussion we're having isn't about backporting pvclock to 
qemu or qemu/kvm's stable branch.  We're not going to change the 
migration protocol in upstream to support a decision that we haven't 
actually made.

And from an upstream position, I would oppose implementing the pvclock 
change in the stable branch exactly because of the problems it would 
create with live migration.

Regards,

Anthony Liguori


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 16:16                   ` Anthony Liguori
@ 2009-11-23 17:08                     ` Eduardo Habkost
  2009-11-23 18:28                       ` Anthony Liguori
  0 siblings, 1 reply; 96+ messages in thread
From: Eduardo Habkost @ 2009-11-23 17:08 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, Juan Quintela, Gleb Natapov, qemu-devel

Excerpts from Anthony Liguori's message of Mon Nov 23 14:16:39 -0200 2009:
> Eduardo Habkost wrote:
> > Excerpts from Anthony Liguori's message of Mon Nov 23 12:49:23 -0200 2009:
<snip>
> >>>       
> >> In our own stable branch, we do not introduce any savevm changes.  I 
> >> would recommend the same policy for RHEL :-)
> >>     
> >
> > But what if you need to add a savevm change to make migration work
> > properly on the stable branch?
> 
> Define "properly".

It depends on many factors: user expectations, written specifications,
documentation. In the pvclock MSR case, it means guest OSes and users
expect the MSR values to be kept by the virtual machine, because that's
how pvclock is expected to work.

> 
> If we have to introduce a new version in VMstate, there are two 
> possibilities.  The first is that we have to blacklist the old version 
> because it was fundamentally broken.  This is rare but it happens.  In 
> this case, we would not be able to support migrating from that stable 
> release to any other stable release.  Really unfortunate for users but 
> we would have no other choice.

Well, we may have an option (described below).

> 
> The second is that we introduce a new version but don't blacklist the 
> old.  This means the old version wasn't fundamentally broken.  It also 
> means that the "fix" is a feature.  It makes things better but isn't 
> strictly required.  That gets deferred to the next release.

Unfortunately sometimes you can't defer to the next release.

However, I think there is another possibility to handle the format
change: you can support migration from the old version to a newer
version, but the machine type (or maybe other internal field) is set to
tell that we are running a machine that has the old behavior (e.g. "this
is a machine that doesn't keep the pvclock MSR values").

Qemu version x.y.1 would support only the old machine type because it
doesn't have the new fix/feature. Qemu version x.y.2 would support both
machine types, because it has the fix but it will support migration from
x.y.1.

If you have a running guest and you want the pvclock (or other
guest-visible) behavior to change, you have to "move it to new virtual
hardware", meaning you should restart the guest using the new machine
type. Migrating guests would never change their machine type (or the
internal field used just for that), because you can't change the
definition of "guest visible state" of a running virtual machine.
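
A minimal sketch of this rule in Python, offered only as an illustration.
The build names, machine type names and field lists are hypothetical and
are not real qemu identifiers; the point is that the machine type travels
with the guest and decides what gets saved, while the build only decides
which machine types it can run.

```python
# Hypothetical model of the proposal; names and version strings are
# illustrative and do not correspond to real qemu machine types.

# Machine types each qemu build knows how to run.
SUPPORTED = {
    "x.y.1": {"pc-x.y.1"},
    "x.y.2": {"pc-x.y.1", "pc-x.y.2"},
}

# Guest-visible state that each machine type promises to migrate.
SAVED_FIELDS = {
    "pc-x.y.1": ["cpu_regs"],                  # old behavior: pvclock MSRs not kept
    "pc-x.y.2": ["cpu_regs", "pvclock_msrs"],  # fixed behavior
}


def can_migrate(machine_type, dest_build):
    """The machine type travels with the guest; the destination must know it."""
    return machine_type in SUPPORTED[dest_build]


def savevm_fields(machine_type):
    """What gets saved depends on the machine type, not on the running build."""
    return SAVED_FIELDS[machine_type]
```

Under these assumptions a "pc-x.y.1" guest can move freely between x.y.1
and x.y.2, while a "pc-x.y.2" guest is refused by x.y.1.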

(All above just to keep the ability of fixing bugs on guest-visible
behavior while keeping the ability to migrate between different
versions. I am not arguing it is worth all the work, but I am starting
to think it is the only sane solution if we want to keep both
abilities).

The above addresses one point where I think you are right: changing the
definition of "guest visible state" of a running VM isn't something
desirable. We may still disagree about the policy of a stable branch,
but I agree about not changing the savevm format of a running VM.

> 
> >  You can't just tell users "migration is
> > known to be broken on the stable branch, please don't run migrations
> > when using the stable branch". That's the case for the pvclock MSR
> > migration fix.
> >   
> 
> You're assuming that backporting the pvclock change is a bug fix.  It's 
> a new feature as far as I'm concerned and doesn't belong in stable.

It is a bug fix because the definition of "guest visible state" is
buggy. Guests expect pvclock state to be kept, and it was not being
kept.

If you consider that every savevm change is a feature, you are assuming
that the definition of "guest visible state" of the current
implementation is perfect and would never be considered buggy. I don't
think it is a reasonable assumption.

> 
> > In a perfect world, the set of state data that is migrated by the
> > current implementation would always match exactly the expected behavior
> > of the virtual machine. Unfortunately sometimes the implementation
> > doesn't follow the "contract" (be it some written specification,
> > documentation, or just user expectations).  When that happens, it is a
> > bug on qemu and it needs to be fixed on the stable branch.
> >
> > Note that (right now) I am not arguing for backward migration, but just
> > arguing that we can't have a strict "no savevm changes" policy on the
> > stable branch.
> >   
> 
> That's exactly what I'm advocating: a strict savevm policy for stable 
> branch.  It's something I've always enforced in the past.  It's 
> necessary to preserve the integrity of live migration.

That may be good enough for upstream Qemu, but IMO for RHEL it is not a
realistic policy. If the definition of "guest visible state" is buggy on
the current implementation, we can't drop entirely the possibility of
fixing it on our stable branch.

-- 
Eduardo


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 17:08                     ` Eduardo Habkost
@ 2009-11-23 18:28                       ` Anthony Liguori
  2009-11-23 19:24                         ` Eduardo Habkost
  2009-11-24 11:00                         ` Dor Laor
  0 siblings, 2 replies; 96+ messages in thread
From: Anthony Liguori @ 2009-11-23 18:28 UTC (permalink / raw)
  To: Eduardo Habkost; +Cc: Paolo Bonzini, Juan Quintela, Gleb Natapov, qemu-devel

Eduardo Habkost wrote:
> That may be good enough for upstream Qemu, but IMO for RHEL it is not a
> realistic policy. If the definition of "guest visible state" is buggy on
> the current implementation, we can't drop entirely the possibility of
> fixing it on our stable branch.
>   

After mulling over it a bit, here's what I'd suggest:

1) Integrate VMstate with qdev
2) Introduce a bitmap blacklist for unsupported VMstate versions
3) Expose that bitmap as a qdev property for each device.
4) By default, upstream qemu will always set the bitmap to be 100% correct.

This provides a mechanism for informed users and downstreams to reduce 
correctness in favor of migration compatibility on a case-by-case basis.
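
A rough Python sketch of what such a blacklist could look like, under the
assumption that bit N of the bitmap means "refuse incoming version N".
The class and attribute names are hypothetical; nothing here is existing
qemu or qdev API.

```python
class DeviceVMState:
    """Hypothetical per-device load policy with a version blacklist bitmap."""

    def __init__(self, name, version_id, blacklist=0):
        self.name = name
        self.version_id = version_id  # newest format this build understands
        self.blacklist = blacklist    # bit N set => incoming version N is refused

    def accepts(self, incoming_version):
        """Refuse formats from the future and formats marked as broken."""
        if incoming_version > self.version_id:
            return False
        return ((self.blacklist >> incoming_version) & 1) == 0


# Upstream default: version 1 of the "cpu" format is known to be broken.
strict = DeviceVMState("cpu", version_id=2, blacklist=1 << 1)

# A downstream or an informed user clears the bit to trade correctness
# for migration compatibility.
relaxed = DeviceVMState("cpu", version_id=2, blacklist=0)
```

With these values, strict.accepts(1) is False and relaxed.accepts(1) is
True, which is the per-device override described above.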

This takes qemu out of the business of creating these sorts of policies 
but allows RHEL to make decisions about what default policy it uses.  It 
also lets well-informed users of RHEL override those policy decisions 
when they deem it to be appropriate.

This would make me happy both from an upstream qemu perspective but also 
as a consumer of RHEL.

Regards,

Anthony Liguori


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and   other beasts
       [not found]                     ` <m3zl6db11z.fsf@neno.neno>
@ 2009-11-23 18:44                       ` Anthony Liguori
  0 siblings, 0 replies; 96+ messages in thread
From: Anthony Liguori @ 2009-11-23 18:44 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Paolo Bonzini, qemu-devel, Gleb Natapov

Juan Quintela wrote:
>> The problem here isn't migration, it's what you've decided to backport
>> into your stable branch.
>>     
>
> No, the problem is that I made a mistake in the past and didn't add a
> field to the state that I should have.  It just happens to work without
> that field in several use cases.  But in others, it fails spectacularly.
> What to do here?
>   

Blacklist the broken version of the format.  If it's known to fail under 
normal usage, that's the only right thing to do given the mechanisms we 
possess today.

>> And from an upstream position, I would oppose implementing the pvclock
>> change in the stable branch exactly because of the problems it would
>> create with live migration.
>>     
>
> Let's say that the problem is when half of the users told you that you
> can't change it in the stable branch, and the other half told you that
> without the change, migration doesn't work in a reliable way :(
>   

If one user needs one behavior and another user absolutely needs a 
different behavior, you provide a mechanism for a user to choose which 
policy they want.

See my proposal about configurable device blacklists.

> It doesn't matter what you do, you lose :(
>   

If we stick strictly to providing mechanisms and default policies, we 
can't lose because we always pass the blame higher in the stack :-)

Regards,

Anthony Liguori


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 18:28                       ` Anthony Liguori
@ 2009-11-23 19:24                         ` Eduardo Habkost
  2009-11-23 19:49                           ` Anthony Liguori
  2009-11-24 11:00                         ` Dor Laor
  1 sibling, 1 reply; 96+ messages in thread
From: Eduardo Habkost @ 2009-11-23 19:24 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, Juan Quintela, Gleb Natapov, qemu-devel

On Mon, Nov 23, 2009 at 12:28:16PM -0600, Anthony Liguori wrote:
> Eduardo Habkost wrote:
>> That may be good enough for upstream Qemu, but IMO for RHEL it is not a
>> realistic policy. If the definition of "guest visible state" is buggy on
>> the current implementation, we can't drop entirely the possibility of
>> fixing it on our stable branch.
>>   
>
> After mulling over it a bit, here's what I'd suggest:
>
> 1) Integrate VMstate with qdev
> 2) Introduce a bitmap blacklist for unsupported VMstate versions
> 3) Expose that bitmap as a qdev property for each device.
> 4) By default, upstream qemu will always set the bitmap to be 100% correct.
>
> This provides a mechanism for informed users and downstreams to reduce  
> correctness in favor of migration compatibility on a case-by-case basis.

Is this for backward migration? (I suppose so, as forward migration is
already handled by the usual version_id checks on the *_load()
functions).

If so, even with this bitmap, how would the migration source process
know which version it should use when generating the savevm data?
(considering that the migration stream is unidirectional, today) We have
been considering using a "set-savevm-version" monitor command that would
be used by management if backward migration is forced by the user.

BTW, we still have the "machine type" suggestion, that would still keep
guest-visible state correctness and allow backward migration when it is
100% correct and safe. With such mechanism, VMs created with the x.y.1
machine type could be safely migrated from x.y.2 to x.y.1. (Although
the bitmap suggestion could have some use even in this case, if the user
really wants to force migration of an x.y.2 machine to x.y.1).

>
> This takes qemu out of the business of creating these sort of policies  
> but allows RHEL to make decisions about what default policy it uses.  It  
> also lets well informed users of RHEL to override those policy decisions  
> when they deem it to be appropriate.
>
> This would make me happy both from an upstream qemu perspective but also  
> as a consumer of RHEL.

What about the suggestion of using multiple sections per device, every
time a new feature is added, instead of just increasing the version
numbers linearly? It allows us to keep the savevm version info
consistent on the multiple downstream trees.

Suppose we have the following scenario:

1) Device Foo has features A, B, C on "foo" section, sets version to 1
2) Downstream tree (e.g. RHEL) is branched off upstream
3) Device Foo adds support to feature D, version changed to 2
4) Device Foo adds support to feature E, version changed to 3
5) Feature E is backported to a downstream tree. Now it supports
   features A,B,C,E, and its versioning scheme will be incompatible with
   upstream.

What I suggest is something like:

1) Device Foo has features A,B,C, on "foo" section (or maybe on "foo.a",
   "foo.b", and "foo.c" sections, depending if they make sense
   individually)
2) Device Foo adds support to feature D, adds "foo.d" section
3) Device Foo adds support to feature E, adds "foo.e" section

Each section will still have its version number as usual (for cases
where a feature is changed in an incompatible way), but the point is that
independent features get stored on different sections.
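
To make the scheme concrete, here is a small Python sketch. It assumes
every feature gets its own section (the "foo.a", "foo.b", ... variant)
and that a destination can load a stream when it knows every incoming
section at that version or newer. The function names are hypothetical.

```python
def sections_for(features):
    """One named section per feature, each with its own version number."""
    return {"foo." + feature.lower(): 1 for feature in features}


def can_load_sections(incoming, supported):
    """True if the destination knows every incoming section at that version."""
    return all(
        name in supported and version <= supported[name]
        for name, version in incoming.items()
    )


upstream = sections_for("ABCDE")    # upstream after adding D and E
downstream = sections_for("ABCE")   # downstream that backported only E
```

In this model a stream from the downstream tree (A,B,C,E) loads on
upstream, while an upstream stream carrying "foo.d" is refused by the
downstream tree, without either side needing to agree on a linear
version number.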

-- 
Eduardo


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 19:24                         ` Eduardo Habkost
@ 2009-11-23 19:49                           ` Anthony Liguori
  2009-11-23 21:21                             ` Eduardo Habkost
  0 siblings, 1 reply; 96+ messages in thread
From: Anthony Liguori @ 2009-11-23 19:49 UTC (permalink / raw)
  To: Paolo Bonzini, Juan Quintela, Gleb Natapov, qemu-devel

Eduardo Habkost wrote:
> On Mon, Nov 23, 2009 at 12:28:16PM -0600, Anthony Liguori wrote:
>   
>> Eduardo Habkost wrote:
>>     
>>> That may be good enough for upstream Qemu, but IMO for RHEL it is not a
>>> realistic policy. If the definition of "guest visible state" is buggy on
>>> the current implementation, we can't drop entirely the possibility of
>>> fixing it on our stable branch.
>>>   
>>>       
>> After mulling over it a bit, here's what I'd suggest:
>>
>> 1) Integrate VMstate with qdev
>> 2) Introduce a bitmap blacklist for unsupported VMstate versions
>> 3) Expose that bitmap as a qdev property for each device.
>> 4) By default, upstream qemu will always set the bitmap to be 100% correct.
>>
>> This provides a mechanism for informed users and downstreams to reduce  
>> correctness in favor of migration compatibility on a case-by-case basis.
>>     
>
> Is this for backward migration?

It's for migrating from an older qemu to a newer one.  Normally, newer 
qemu will happily support older formats but in this case, we broke 
something and we need to blacklist the old format.  This lets you 
override that black list.

> If so, even with this bitmap, how would the migration source process
> know which version it should use when generating the savevm data?
>   

To properly support this in -M pc-0.11, we'll need to be able to set the 
version to migrate for each qdev device.  Again, this is something that 
could be overridden as a qdev property.  The effect would be that we 
force a newer qemu to generate the older savevm format.
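
As a sketch only (Python, with made-up property and field names), the
idea is that the machine type supplies a default save version per
device, and an explicit property can override it:

```python
# Hypothetical field layout per version of one device's savevm section.
FIELDS_BY_VERSION = {
    1: ["tsc"],
    2: ["tsc", "system_time_msr", "wall_clock_msr"],
}

# Hypothetical defaults that a machine type would set as device properties.
MACHINE_DEFAULTS = {
    "pc-0.11": {"cpu.savevm_version": 1},
    "pc-0.12": {"cpu.savevm_version": 2},
}


def save(state, machine_type, overrides=None):
    """Emit (version, values) in the format selected by the properties."""
    props = dict(MACHINE_DEFAULTS[machine_type])
    props.update(overrides or {})
    version = props["cpu.savevm_version"]
    return version, [state[field] for field in FIELDS_BY_VERSION[version]]
```

A newer qemu started with -M pc-0.11 would then write the version 1
layout, and an explicit override on a pc-0.12 guest would force the same
older layout at the cost of dropping the extra fields.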

> (considering that the migration stream is unidirectional, today) We have
> been considering using a "set-savevm-version" monitor command that would
> be used by management if backward migration is forced by the user.
>   

qdev property is the right approach I think.  It's really a per-device 
setting.  It needs to get tied to machine type too and that's a 
convenient way to do that.

> BTW, we still have the "machine type" suggestion, that would still keep
> guest-visible state correctness and allow backward migration when it is
> 100% correct and safe. With such mechanism, VMs created with the x.y.1
> machine type could be safely migrated from x.y.2 to x.y.1. (Although
> the bitmap suggestion could have some use even in this case, if the user
> really wants to force migration of an x.y.2 machine to x.y.1).
>   

In theory, a user can manually specify everything in a machine type.

>> This takes qemu out of the business of creating these sort of policies  
>> but allows RHEL to make decisions about what default policy it uses.  It  
>> also lets well informed users of RHEL to override those policy decisions  
>> when they deem it to be appropriate.
>>
>> This would make me happy both from an upstream qemu perspective but also  
>> as a consumer of RHEL.
>>     
>
> What about the suggestion of using multiple sections per device, every
> time a new feature is added, instead of just increasing the version
> numbers linearly? It allows us to keep the savevm version info
> consistent on the multiple downstream trees.
>   

It doesn't because it's just as likely to get clashes in subsection 
names.  For instance, RHEL5.4 may call the pvclock msr subsection 
"pvclock-msrs" and then upstream may call it "pvclock-msrs" and flip the 
order of the fields.

To support downstreams effectively, we need vendor specific versioning 
so that we can separate the upstream qemu namespace from each of the 
downstreams.

> Suppose we have the following scenario:
>
> 1) Device Foo has features A, B, C on "foo" section, sets version to 1
> 2) Downstream tree (e.g. RHEL) is branched off upstream
> 3) Device Foo adds support to feature D, version changed to 2
> 4) Device Foo adds support to feature E, version changed to 3
> 5) Feature E is backported to a downstream tree. Now it supports
>    features A,B,C,E, and its versioning scheme will be incompatible with
>    upstream.
>   

Downstream adds a "RHEL" subversion.  This allows downstream to add a 
subversion to each device if it modifies it.  When it backports E, it 
bumps the downstream version from 0->1.

As long as the backported features aren't enabled, the migration will be 
compatible to upstream.  Once one of these backported features is 
enabled, migration will fail gracefully.
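
A small Python sketch of that rule, assuming each device section carries
an upstream version plus a (vendor, subversion) pair, and that the source
writes subversion 0 whenever no backported feature is in use. All names
are hypothetical.

```python
def stamp(upstream_version, vendor, backports_in_use):
    """Header the source writes for one device section."""
    subversion = 1 if backports_in_use else 0
    return {"version": upstream_version, "vendor": vendor, "subversion": subversion}


def can_load_header(header, dest_version, dest_vendor, dest_subversion):
    """Decide on the destination whether a section is loadable."""
    if header["version"] > dest_version:
        return False
    if header["subversion"] == 0:
        return True  # nothing vendor-specific in the stream
    return (header["vendor"] == dest_vendor
            and header["subversion"] <= dest_subversion)
```

A RHEL guest that never enabled the backported feature E stamps
subversion 0 and stays loadable by upstream; once E is enabled the stream
is refused up front by anything that is not a matching RHEL build.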

> What I suggest is something like:
>
> 1) Device Foo has features A,B,C, on "foo" section (or maybe on "foo.a",
>    "foo.b", and "foo.c" sections, depending if they make sense
>    individually)
> 2) Device Foo adds support to feature D, adds "foo.d" section
> 3) Device Foo adds support to feature E, adds "foo.e" section
>   

The combinations blow up quickly.  Just because A,B,C,E works for a 
given downstream, doesn't mean that it would work with the upstream code 
base.  Features are rarely so independent of one another.

It also doesn't address things like QXL which aren't just a simple 
matter of a backported upstream feature.

Regards,

Anthony Liguori


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 16:44                   ` Anthony Liguori
       [not found]                     ` <m3zl6db11z.fsf@neno.neno>
@ 2009-11-23 20:24                     ` Eduardo Habkost
  1 sibling, 0 replies; 96+ messages in thread
From: Eduardo Habkost @ 2009-11-23 20:24 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, Juan Quintela, Gleb Natapov, qemu-devel

Excerpts from Anthony Liguori's message of Mon Nov 23 14:44:04 -0200 2009:
> 
> I don't want to transparently migrate from 5.4.1 to 5.4.0 and have my 
> guest's time start drifting.  I specifically want that to fail.

If you migrate from 5.4.0 to 5.4.0 or from 5.4.0 to 5.4.1, the guest
will also start drifting. Do you expect migration to fail on all those
cases too?


>
<snip>
> You've made a policy decision.  As a user, I really don't like that 
> policy decision and it makes me want to make sure that we upgrade all of 
> our hosts at once to avoid this problem.  Of course, I'm a control freak 
> and I'm particularly concerned about time drift issues as that's been 
> consuming a bit of my time lately.
<snip>
> 
> Can we reasonably support a guest that doesn't have this older field?  
> If the answer is "yes", then it's a feature that can be delayed until 
> the next release.

If you are really concerned about the time drift as you say above, the
answer is "no". A guest that doesn't have the new fields on savevm will
start drifting as soon as it is migrated--it doesn't matter if it is a
forward, backward, or sideways migration.
-- 
Eduardo


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 19:49                           ` Anthony Liguori
@ 2009-11-23 21:21                             ` Eduardo Habkost
  0 siblings, 0 replies; 96+ messages in thread
From: Eduardo Habkost @ 2009-11-23 21:21 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, Juan Quintela, Gleb Natapov, qemu-devel

On Mon, Nov 23, 2009 at 01:49:09PM -0600, Anthony Liguori wrote:
> Eduardo Habkost wrote:
>> On Mon, Nov 23, 2009 at 12:28:16PM -0600, Anthony Liguori wrote:
<snip>
>>>>         
>>> After mulling over it a bit, here's what I'd suggest:
>>>
>>> 1) Integrate VMstate with qdev
>>> 2) Introduce a bitmap blacklist for unsupported VMstate versions
>>> 3) Expose that bitmap as a qdev property for each device.
>>> 4) By default, upstream qemu will always set the bitmap to be 100% correct.
>>>
>>> This provides a mechanism for informed users and downstreams to 
>>> reduce  correctness in favor of migration compatibility on a 
>>> case-by-case basis.
>>>     
>>
>> Is this for backward migration?
>
> It's for migrating from an older qemu to a newer one.  Normally, newer  
> qemu will happily support older formats but in this case, we broke  
> something and we need to blacklist the old format.  This lets you  
> override that black list.

Then we can already do that: just return an error on the load function
if version_id is too low.

Doing it with a bitmap on qdev would be more flexible, but it is already
possible today.


>
>> If so, even with this bitmap, how would the migration source process
>> know which version it should use when generating the savevm data?
>>   
>
> To properly support this in -M pc-0.11, we'll need to be able to set the  
> version to migrate for each qdev device.  Again, this is something that  
> could be overridden as a qdev property.  The effect would be that we  
> force a newer qemu to generate the older savevm format.

Right. Unlike the bitmap, this is something we can't easily
reproduce with the current infrastructure.

>
>> (considering that the migration stream is unidirectional, today) We have
>> been considering using a "set-savevm-version" monitor command that would
>> be used by management if backward migration is forced by the user.
>>   
>
> qdev property is the right approach I think.  It's really a per-device  
> setting.  It needs to get tied to machine type too and that's a  
> convenient way to do that.
>
>> BTW, we still have the "machine type" suggestion, that would still keep
>> guest-visible state correctness and allow backward migration when it is
>> 100% correct and safe. With such mechanism, VMs created with the x.y.1
>> machine type could be safely migrated from x.y.2 to x.y.1. (Although
>> the bitmap suggestion could have some use even in this case, if the user
>> really wants to force migration of an x.y.2 machine to x.y.1).
>>   
>
> In theory, a user can manually specify everything in a machine type.

So the qdev magic would be used to provide input to this system. I see.



>
>>> This takes qemu out of the business of creating these sort of 
>>> policies  but allows RHEL to make decisions about what default policy 
>>> it uses.  It  also lets well informed users of RHEL to override those 
>>> policy decisions  when they deem it to be appropriate.
>>>
>>> This would make me happy both from an upstream qemu perspective but 
>>> also  as a consumer of RHEL.
>>>     
>>
>> What about the suggestion of using multiple sections per device, every
>> time a new feature is added, instead of just increasing the version
>> numbers linearly? It allows us to keep the savevm version info
>> consistent on the multiple downstream trees.
>>   
>
> It doesn't because it's just as likely to get clashes in subsection  
> names.  For instance, RHEL5.4 may call the pvclock msr subsection  
> "pvclock-msrs" and then upstream may call it "pvclock-msrs" and flip the  
> order of the fields.

If we implemented this on RHEL before including it upstream, then we
could have a "RHEL" subversion flag set (or simply call it
"pvclock-msrs-rhel"). But if we backport it, there is no reason to make
the version scheme incompatible.


>
> To support downstreams effectively, we need vendor specific versioning  
> so that we can separate the upstream qemu namespace from each of the  
> downstreams.

The problem is that newer versions of downstream code will be branched
off newer upstream versions, in the future. A mechanism that helps
keeping the version numbers compatible where possible (not always) would
facilitate code contribution on both directions.


>
>> Suppose we have the following scenario:
>>
>> 1) Device Foo has features A, B, C on "foo" section, sets version to 1
>> 2) Downstream tree (e.g. RHEL) is branched off upstream
>> 3) Device Foo adds support to feature D, version changed to 2
>> 4) Device Foo adds support to feature E, version changed to 3
>> 5) Feature E is backported to a downstream tree. Now it supports
>>    features A,B,C,E, and its versioning scheme will be incompatible with
>>    upstream.
>>   
>
> Downstream adds a "RHEL" subversion.  This allows downstream to add a  
> subversion to each device if it modifies it.  When it backports E, it  
> bumps the downstream version from 0->1.
>
> As long as the backported features aren't enabled, the migration will be  
> compatible to upstream.  Once one of these backported features is  
> enabled, migration will fail gracefully.
>
>> What I suggest is something like:
>>
>> 1) Device Foo has features A,B,C, on "foo" section (or maybe on "foo.a",
>>    "foo.b", and "foo.c" sections, depending if they make sense
>>    individually)
>> 2) Device Foo adds support to feature D, adds "foo.d" section
>> 3) Device Foo adds support to feature E, adds "foo.e" section
>>   
>
> The combinations blow up quickly.  Just because A,B,C,E works for a  
> given downstream, doesn't mean that it would work with the upstream code  
> base.  Features are rarely so independent of one another.
>
> It also doesn't address things like QXL which aren't just a simple  
> matter of a backported upstream feature.

My point is that sometimes they are clearly independent, and when that
happens, keeping the version schemes compatible is a good thing. The
pvclock MSRs are an example: they are clearly independent from the other
MSRs and it wouldn't hurt Qemu if they were added as a separate
section.

There would be other benefits, too: the pvclock MSR section could be
disabled if pvclock support is disabled on the command-line or machine
definition. We would even have an answer to the user that wanted
backward migration: "yeah, if you were so sure the guest OS didn't use
pvclock, you could have disabled pvclock when the VM was created, and
migration would be possible".
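
In Python, the behavior I have in mind looks roughly like this. The
section names and the pvclock_enabled switch are hypothetical; the sketch
only shows that a section tied to a guest-visible feature is emitted when
that feature is enabled for the VM and omitted otherwise.

```python
def build_sections(pvclock_enabled):
    """Sections the source would emit for the CPU."""
    sections = ["cpu"]
    if pvclock_enabled:
        sections.append("cpu.pvclock-msrs")
    return sections


def loadable_by(sections, known_sections):
    """A destination can load the stream only if it knows every section."""
    return all(name in known_sections for name in sections)


OLD_QEMU = {"cpu"}                      # predates the pvclock MSR section
NEW_QEMU = {"cpu", "cpu.pvclock-msrs"}
```

A VM created with pvclock disabled produces a stream that OLD_QEMU can
load, so backward migration is possible exactly when the guest could
never have relied on the missing state.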

Yes, there are many cases where doing this won't be enough or won't be
as simple, and we will need a "sub-version" field like you suggested for
stuff that is too different from upstream. I am just suggesting that we
encourage savevm features to be introduced in a more "modular" way where
possible, to facilitate collaboration.

-- 
Eduardo


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 12:15 ` Juan Quintela
  2009-11-23 13:09   ` Anthony Liguori
@ 2009-11-24 10:39   ` Dor Laor
  2009-11-24 14:01     ` Michael S. Tsirkin
  2009-11-24 13:59   ` Michael S. Tsirkin
  2 siblings, 1 reply; 96+ messages in thread
From: Dor Laor @ 2009-11-24 10:39 UTC (permalink / raw)
  To: Juan Quintela; +Cc: qemu-devel

On 11/23/2009 02:15 PM, Juan Quintela wrote:
> Dor Laor<dlaor@redhat.com>  wrote:
>> >  In the last couple of days we discovered some issues regarding stable
>> >  ABI and the robustness of the live migration protocol. Let's just jump
>> >  right into it, ordered by complexity:
>> >
>> >  1. Control *every* feature exposed to the guest by qemu cmdline:
>> >
>> >      While thinking on cross version migration, and reviewing some
>> >      patches, I noticed that there are many times that we use feature bits
>> >      in order to expose functionality for the guest driver - example:
>> >      VIRTIO_BLK_F_BARRIER, but we do not control it from qemu cmdline.
> In my opinion this is madness, qemu command line is already too
> complicated.  I agree with anthony to put it in the command line.

Qemu's cmdline is currently our config file. Actually there is nothing 
wrong with it. Human users shouldn't need to care about these changes 
and management software should have no problem manipulating it.
We do need the flexibility to control our features like any other 
software component.

> I will go further, and think that this kind of issues should be put into
> the machine type.
>
> If you start qemu with -M pc-0.10, it should save the state in a 0.10
> compatible way (that doesn't happen at the moment, but it should work
> that way).

That's the idea - to keep it part of qdev and by default use it with -M.


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 18:28                       ` Anthony Liguori
  2009-11-23 19:24                         ` Eduardo Habkost
@ 2009-11-24 11:00                         ` Dor Laor
  1 sibling, 0 replies; 96+ messages in thread
From: Dor Laor @ 2009-11-24 11:00 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: qemu-devel, Paolo Bonzini, Eduardo Habkost, Gleb Natapov, Juan Quintela

On 11/23/2009 08:28 PM, Anthony Liguori wrote:
> Eduardo Habkost wrote:
>> That may be good enough for upstream Qemu, but IMO for RHEL it is not a
>> realistic policy. If the definition of "guest visible state" is buggy on
>> the current implementation, we can't drop entirely the possibility of
>> fixing it on our stable branch.
>
> After mulling over it a bit, here's what I'd suggest:
>
> 1) Integrate VMstate with qdev
> 2) Introduce a bitmap blacklist for unsupported VMstate versions
> 3) Expose that bitmap as a qdev property for each device.
> 4) By default, upstream qemu will always set the bitmap to be 100% correct.
>
> This provides a mechanism for informed users and downstreams to reduce
> correctness in favor of migration compatibility on a case-by-case basis.
>
> This takes qemu out of the business of creating these sort of policies
> but allows RHEL to make decisions about what default policy it uses. It
> also lets well informed users of RHEL to override those policy decisions
> when they deem it to be appropriate.
>
> This would make me happy both from an upstream qemu perspective but also
> as a consumer of RHEL.

How does this improve the pvclock MSR migration problem?
This is a 5.4.1 -> 5.4.0 migration problem that originated from the time 
drift of pvclock *on* live migration...
It doesn't work today and it will surely be put on the blacklist if 
implemented this way. Only Windows guests might benefit from it, since 
they do not use pvclock, but it will turn into a management nightmare 
instead.
In Qemu we know best which fields/sections are optional and which are 
must-have.

If we move to a real protocol, with a specification of every device 
and chunk blocks similar to Paolo's and Eduardo's suggestions, we will 
have maximum flexibility and we can allow such migrations to happen.

IMHO we need to standardize the migration protocol like any other network 
protocol. The current approach is to use qdev and -M and it's not good 
enough. We should make an effort and try to harden it.
It's the same as the user/kernel ABI - user mode queries capabilities, 
not a version number.

Lastly, come to think of it, we should make the migration fail at the 
beginning instead of doing all the dirty bit tracking, pausing the VM and 
only then being surprised by a different version/capability.
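
A sketch of the ordering I mean, in Python. It assumes the source can
learn the destination's capabilities before it starts (how that exchange
would happen is not specified here), and the step names are only
placeholders for the real phases.

```python
def migrate(source_caps, dest_caps, log):
    """Check capabilities first; only then start the expensive phases."""
    missing = sorted(set(source_caps) - set(dest_caps))
    if missing:
        log.append("refused: destination lacks " + ", ".join(missing))
        return False  # guest keeps running on the source, nothing was paused
    log.append("start dirty tracking")
    log.append("pause VM and send device state")
    return True
```

The failure path never reaches dirty tracking or the pause, so a
mismatch costs nothing beyond the initial check.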

>
> Regards,
>
> Anthony Liguori
>
>
>
>


* [Qemu-devel] Re: Re: Live migration protocol, device features, ABIs and   other beasts
  2009-11-23  2:17     ` Anthony Liguori
                         ` (2 preceding siblings ...)
  2009-11-23 13:51       ` Eduardo Habkost
@ 2009-11-24 13:17       ` Michael S. Tsirkin
  2009-11-24 13:35         ` Paul Brook
  3 siblings, 1 reply; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-24 13:17 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, qemu-devel

On Sun, Nov 22, 2009 at 08:17:46PM -0600, Anthony Liguori wrote:
> Paolo Bonzini wrote:
>>
>>> I don't see how this fixes anything. If you used feature bits, how do
>>> you migrate from a version that has a feature bit that an older version
>>> doesn't know about? Do you just ignore it?
>>
>> I'd go with chunk instead of feature bits, specifying them like in the  
>> PNG specification:
>
> You mean, each device would have multiple sections?  We already use  
> chunks for each device state.
>
>>> Migration needs to be conservative. There should be only two possible
>>> outcomes: 1) a successful live migration or 2) graceful failure with the
>>> source VM still running correctly. Silently ignoring things that could
>>> affect the guests behavior means that it's possible that after failure,
>>> the guest will fail in an unexpected way.
>>
>> It's up to the source to decide what information is extra.  For  
>> example, the state of a RNG emulation is nice-to-have, but as long as  
>> it is initialized from another random source on the destination you  
>> shouldn't care.
>
> We only migrate things that are guest visible.  Everything else is left  
> to the user to configure.  We wouldn't migrate the state of a RNG  
> emulation provided that it doesn't have an impact on the guest.

But some things might have greater impact than others.

Consider INTx state in PCI as an example.
Qemu used not to migrate it; as a result, a guest might
lose an interrupt when migrating between such old qemu binaries.
This is rare enough, and often harmless, so for a while
no one noticed.

Later, we noticed this is user visible.
As a fix, we are now migrating INTx state and
guests are happy when migrating from new qemu to new qemu.

But it's easy to support migration to old qemu just
by discarding the INTx state, and this is not
at all harder, or worse, than migrating from old qemu
to new one.


> By definition, anything that is guest visible is important because it  
> affects the guest's behavior.
>
> Regards,
>
> Anthony Liguori
>


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-22 15:49 ` Anthony Liguori
  2009-11-22 20:22   ` [Qemu-devel] " Paolo Bonzini
@ 2009-11-24 13:21   ` Michael S. Tsirkin
  2009-11-24 13:45     ` Anthony Liguori
  1 sibling, 1 reply; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-24 13:21 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: dlaor, qemu-devel

On Sun, Nov 22, 2009 at 09:49:26AM -0600, Anthony Liguori wrote:
>>    We cannot even create a new 'hack section' for new code since the
>>    sections are ordered and expected to be exact match on the
>>    destination.
>>
>>    The result is that new->old migration cannot work. This is not cross
>>    releases even! It means that even a small bug in current release
>>    prevents live migration between various instances of the code.
>>    It forces us to decide whether to fix pvclock migration issue vs
>>    allow new->old migration. Another ugly hack is to add cmdline that
>>    will control this behavior. Still it's a pain to mgmt stack and
>>    users.
>
> This is a pretty normal policy (backwards compat but not forwards compat).

No one is asking that old qemu magically understands new format. It
would be enough that new qemu is able to migrate to a format that old
one understands.

This is backwards compatibility, not forwards compatibility.

If a new version of a word processor won't save in a format
that the old version can read, is it still backwards compatible?

-- 
MST


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 12:36           ` Gleb Natapov
       [not found]             ` <m3r5rpwcww.fsf@neno.neno>
@ 2009-11-24 13:28             ` Michael S. Tsirkin
  1 sibling, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-24 13:28 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Paolo Bonzini, Juan Quintela, qemu-devel

On Mon, Nov 23, 2009 at 02:36:40PM +0200, Gleb Natapov wrote:
> > My problem implementing optional features/sections/... is not the
> > savevm/VMState bits.  At the end, implementing that is easy.  What is
> > more difficult is, once a device has 5 features, knowing what the valid
> > combinations are.  i.e. if you have pci and msix features, msix requires
> > pci.  In this case, the dependency is trivial, but in others it
> > doesn't have to be so obvious.
> It doesn't matter what device support and how it is configured. This can
> be handled by each device separately. i.e if destination detects that
> source had MSIX enabled for the device but destination hasn't it will
> signal an error.

Yes, this can't work.
But the reverse can: if source does not have MSIX capability
and destination does have it, you can remove MSIX in destination
and make user happy.
One way to do this would be to use the same machine description
on both sides.

-- 
MST


* Re: [Qemu-devel] Re: Re: Live migration protocol, device features, ABIs and   other beasts
  2009-11-24 13:17       ` [Qemu-devel] " Michael S. Tsirkin
@ 2009-11-24 13:35         ` Paul Brook
  2009-11-24 13:49           ` [Qemu-devel] " Michael S. Tsirkin
  0 siblings, 1 reply; 96+ messages in thread
From: Paul Brook @ 2009-11-24 13:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Michael S. Tsirkin

> But it's easy to support migration to old qemu just
> by discarding the INTx state, and this is not
> at all harder, or worse, than migrating from old qemu
> to new one.

Do we really care about migrating to older versions?

Migrating to a new version (backward compatibility) I see the use, it allows 
people to do upgrades with minimal downtime. I have my reservations about how 
feasible this is long-term, but within a release series it's not too bad.

However is migrating to an old version (forward compatibility) really a 
worthwhile thing to support? It sounds like the sort of thing that we're never 
really going to test properly, so will probably fail a good proportion of the 
time anyway. Reading in old state files is a whole lot easier (to write 
maintain, and stay sane) than producing state that is bug-compatible with 
previous versions.
This feels something where the best answer all round is "Don't do that".

Paul


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 14:49               ` Anthony Liguori
  2009-11-23 15:21                 ` Eduardo Habkost
       [not found]                 ` <m3y6lxqkpv.fsf@neno.neno>
@ 2009-11-24 13:39                 ` Michael S. Tsirkin
  2 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-24 13:39 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, Juan Quintela, Gleb Natapov, qemu-devel

On Mon, Nov 23, 2009 at 08:49:23AM -0600, Anthony Liguori wrote:
> Juan Quintela wrote:
>> Anthony Liguori <anthony@codemonkey.ws> wrote:
>>   
>>> Juan Quintela wrote:
>>>     
>>
>>   
>>> I'm not at all convinced that you can downgrade the version of a
>>> device without exposing a functional change to a guest.  In fact, I'm
>>> pretty certain that it's provably impossible.  Please give a counter
>>> example of where this mechanism would be safe.
>>>     
>>
>> The problem that we are having in RHEL just now is that there are two
>> new fields to make pvclock/kvmclock more exact (this is qemu-kvm tree):
>>
>>         /* KVM pvclock msr */
>>         VMSTATE_UINT64_V(system_time_msr, CPUState, 12),
>>         VMSTATE_UINT64_V(wall_clock_msr, CPUState, 12),
>>
>> Before we added that values to the state, we used whatever time the host
>> were using for that values (yes, we had drift).
>>
>> But if we don't send that two values, we are not worse that we were
>> before adding that to the state.
>>   
>
> But the effect is that after you migrate, you change behavior.  In this  
> case, you migrate a guest that isn't drifting and then after migration,  
> you start drifting.

Why is this much better than the other way around?
Migrating from old qemu to new one, we stop drifting.

Users and applications that are sensitive to time drift simply can't use
qemu versions which have time drift, but not all of them are.  There's
no reason to try and prevent everyone from doing this by breaking
new->old migration.
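
For reference, the quoted VMSTATE_UINT64_V(..., 12) entries tag each field
with the first version that carries it. A toy stand-alone model of that
idea (not the real VMState code; only the two MSR names come from this
thread, everything else is invented for illustration) might look like:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for CPUState. */
typedef struct {
    uint64_t tsc;               /* hypothetical field, carried since version 1 */
    uint64_t system_time_msr;   /* carried since version 12 */
    uint64_t wall_clock_msr;    /* carried since version 12 */
} ToyCpuState;

/* One descriptor per uint64_t field, tagged with its first stream version. */
typedef struct {
    size_t offset;
    int    since_version;
} ToyField;

static const ToyField toy_cpu_fields[] = {
    { offsetof(ToyCpuState, tsc),              1 },
    { offsetof(ToyCpuState, system_time_msr), 12 },
    { offsetof(ToyCpuState, wall_clock_msr),  12 },
};

/*
 * Load a stream of the given version from in[0..n_in).  Fields newer than
 * the stream are left at whatever the caller initialised them to.
 * Returns the number of words consumed.
 */
static size_t toy_cpu_load(ToyCpuState *s, int stream_version,
                           const uint64_t *in, size_t n_in)
{
    size_t used = 0;
    size_t i;

    for (i = 0; i < sizeof(toy_cpu_fields) / sizeof(toy_cpu_fields[0]); i++) {
        if (toy_cpu_fields[i].since_version > stream_version) {
            continue;               /* not in this stream: keep the default */
        }
        assert(used < n_in);        /* caller must supply enough words */
        memcpy((unsigned char *)s + toy_cpu_fields[i].offset,
               &in[used], sizeof(uint64_t));
        used++;
    }
    return used;
}
```

Loading a version 11 stream leaves both MSR fields at the destination's
own defaults, which is the "host time, with drift" behaviour described
above; a version 12 stream overwrites them.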


-- 
MST


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 13:21   ` Michael S. Tsirkin
@ 2009-11-24 13:45     ` Anthony Liguori
  2009-11-24 13:55       ` Michael S. Tsirkin
  0 siblings, 1 reply; 96+ messages in thread
From: Anthony Liguori @ 2009-11-24 13:45 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: dlaor, qemu-devel

Michael S. Tsirkin wrote:
> On Sun, Nov 22, 2009 at 09:49:26AM -0600, Anthony Liguori wrote:
>   
>>>    We cannot even create a new 'hack section' for new code since the
>>>    sections are ordered and expected to be exact match on the
>>>    destination.
>>>
>>>    The result is that new->old migration cannot work. This is not cross
>>>    releases even! It means that even a small bug in current release
>>>    prevents live migration between various instances of the code.
>>>    It forces us to decide whether to fix pvclock migration issue vs
>>>    allow new->old migration. Another ugly hack is to add cmdline that
>>>    will control this behavior. Still it's a pain to mgmt stack and
>>>    users.
>>>       
>> This is a pretty normal policy (backwards compat but not forwards compat).
>>     
>
> No one is asking that old qemu magically understands new format. It
> would be enough that new qemu is able to migrate to a format that old
> one understands.
>   

I've got no problem with that provided it's something explicitly 
requested by the user.  This requires no migration protocol change 
though so if this is what folks are asking for, why is this thread 
focused around changing the live migration protocol?

Regards,

Anthony Liguori


* [Qemu-devel] Re: Re: Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 13:35         ` Paul Brook
@ 2009-11-24 13:49           ` Michael S. Tsirkin
  2009-11-24 13:59             ` [Qemu-devel] " Paul Brook
       [not found]             ` <m3my2ct2qe.fsf@neno.neno>
  0 siblings, 2 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-24 13:49 UTC (permalink / raw)
  To: Paul Brook; +Cc: Paolo Bonzini, qemu-devel

On Tue, Nov 24, 2009 at 01:35:35PM +0000, Paul Brook wrote:
> > But it's easy to support migration to old qemu just
> > by discarding the INTx state, and this is not
> > at all harder, or worse, than migrating from old qemu
> > to new one.
> 
> Do we really care about migrating to older versions?
> 
> Migrating to a new version (backward compatibility) I see the use, it allows 
> people to do upgrades with minimal downtime. I have my reservations about how 
> feasible this is long-term, but within a release series it's not too bad.
> 
> However is migrating to an old version (forward compatibility) really a 
> worthwhile thing to support?

Migrating to an old version does not have to require forward compatibility IMO.
Wikipedia says:
"Forward compatibility or upward compatibility (sometimes confused with
 extensibility) is the ability of a system to gracefully accept input
 intended for later versions of itself. "

Supporting an explicit option to migrate to old versions would be enough.

> It sounds like the sort of thing that we're never 
> really going to test properly, so will probably fail a good proportion of the 
> time anyway.

Why is it harder to test than old->new migration?
If you have a migration test, just run it both ways.

> Reading in old state files is a whole lot easier (to write 
> maintain, and stay sane) than producing state that is bug-compatible with 
> previous versions.

It seems to me that old->new and new->old migrations are
of about the same level of difficulty.
Supporting one of these but not the other is of course
easier than supporting both, but I don't see where
"a whole lot" comes from.

> This feels something where the best answer all round is "Don't do that".
> 
> Paul
> 


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 13:45     ` Anthony Liguori
@ 2009-11-24 13:55       ` Michael S. Tsirkin
  0 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-24 13:55 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: dlaor, qemu-devel

On Tue, Nov 24, 2009 at 07:45:13AM -0600, Anthony Liguori wrote:
> Michael S. Tsirkin wrote:
>> On Sun, Nov 22, 2009 at 09:49:26AM -0600, Anthony Liguori wrote:
>>   
>>>>    We cannot even create a new 'hack section' for new code since the
>>>>    sections are ordered and expected to be exact match on the
>>>>    destination.
>>>>
>>>>    The result is that new->old migration cannot work. This is not cross
>>>>    releases even! It means that even a small bug in current release
>>>>    prevents live migration between various instances of the code.
>>>>    It forces us to decide whether to fix pvclock migration issue vs
>>>>    allow new->old migration. Another ugly hack is to add cmdline that
>>>>    will control this behavior. Still it's a pain to mgmt stack and
>>>>    users.
>>>>       
>>> This is a pretty normal policy (backwards compat but not forwards compat).
>>>     
>>
>> No one is asking that old qemu magically understands new format. It
>> would be enough that new qemu is able to migrate to a format that old
>> one understands.
>>   
>
> I've got no problem with that provided it's something explicitly  
> requested by the user.  This requires no migration protocol change  
> though so if this is what folks are asking for, why is this thread  
> focused around changing the live migration protocol?

I think the claim is that a section-based migration
protocol would make implementing this easier than
a version-based one, especially for downstream, which might
cherry-pick a feature from a new release to an old one.

It seems the requirement discussion and the implementation
discussion are going on together in this thread.
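
To make the section-based idea concrete, here is a rough sketch of a
stream made of named sections, each flagged as optional or mandatory (in
the spirit of the PNG chunks mentioned earlier in the thread). The wire
layout, the section names and all function names are assumptions made up
for this example; this is not the current savevm format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: 8-byte NUL-padded name, 1-byte optional flag,
 * 1-byte payload length, then the payload. */
enum { TOY_SECTION_HDR = 10 };

/* Sections this (hypothetical) destination knows how to load. */
static bool toy_section_known(const char *name)
{
    return strcmp(name, "cpu") == 0 || strcmp(name, "pci") == 0;
}

/* Append one section at buf[pos]; returns the new write position. */
static size_t toy_section_put(uint8_t *buf, size_t pos, const char *name,
                              bool optional, const uint8_t *payload,
                              uint8_t len)
{
    assert(strlen(name) <= 8);
    memset(buf + pos, 0, 8);
    memcpy(buf + pos, name, strlen(name));
    buf[pos + 8] = optional ? 1 : 0;
    buf[pos + 9] = len;
    if (len > 0) {
        memcpy(buf + pos + TOY_SECTION_HDR, payload, len);
    }
    return pos + TOY_SECTION_HDR + len;
}

/*
 * Walk the stream.  Unknown optional sections are skipped; an unknown
 * mandatory section or a truncated stream fails the whole load.
 * Returns the number of sections accepted, or -1 on failure.
 */
static int toy_stream_scan(const uint8_t *buf, size_t size)
{
    size_t pos = 0;
    int accepted = 0;

    while (pos < size) {
        char name[9];
        bool optional;
        size_t len;

        if (size - pos < TOY_SECTION_HDR) {
            return -1;
        }
        memcpy(name, buf + pos, 8);
        name[8] = '\0';
        optional = buf[pos + 8] != 0;
        len = buf[pos + 9];
        pos += TOY_SECTION_HDR;
        if (size - pos < len) {
            return -1;
        }
        if (toy_section_known(name)) {
            accepted++;
        } else if (!optional) {
            return -1;
        }
        pos += len;
    }
    return accepted;
}
```

In this model a downstream that cherry-picks a feature only adds a new
section name; an older destination either skips it (optional) or refuses
the migration cleanly (mandatory), without any global version bump.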

> Regards,
>
> Anthony Liguori
>
>


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 12:15 ` Juan Quintela
  2009-11-23 13:09   ` Anthony Liguori
  2009-11-24 10:39   ` Dor Laor
@ 2009-11-24 13:59   ` Michael S. Tsirkin
  2 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-24 13:59 UTC (permalink / raw)
  To: Juan Quintela; +Cc: dlaor, qemu-devel

On Mon, Nov 23, 2009 at 01:15:01PM +0100, Juan Quintela wrote:
> Dor Laor <dlaor@redhat.com> wrote:
> > In the last couple of days we discovered some issues regarding stable
> > ABI and the robustness of the live migration protocol. Let's just jump
> > right into it, ordered by complexity:
> >
> > 1. Control *every* feature exposed to the guest by qemu cmdline:
> >
> >    While thinking on cross version migration, and reviewing some
> >    patches, I noticed that there are many times that we use feature bits
> >    in order to expose functionality for the guest driver - example:
> >    VIRTIO_BLK_F_BARRIER, but we do not control it from qemu cmdline.
> 
> In my opinion this is madness, qemu command line is already too
> complicated.  I agree with anthony to put it in the command line.
> I will go further, and think that this kind of issues should be put into
> the machine type.
> 
> If you start qemu with -M pc-0.10, it should save the state in a 0.10
> compatible way (that don't happens at the moment, but it should work
> that way).

The way you save state is fundamentally different from what you have in
the machine.  It's easy to imagine a case where you migrate qemu-1.0 to
qemu-2.0 to qemu-3.0. There's no reason I can see to use 1.0 format in
the second step in the process, and not requiring it will
make it easier for us to get rid of old format support in
the future.
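
As a small sketch of keeping the two apart: the machine type stays fixed
for the guest's lifetime, while the stream format is picked per migration
from what both binaries understand. The struct and function names are
invented for illustration.

```c
#include <assert.h>

typedef struct {
    const char *machine;     /* guest-visible, fixed for the guest's lifetime */
    int max_stream_format;   /* newest stream format this binary reads/writes */
} ToyQemu;

/* Newest format both ends understand; the machine type plays no part. */
static int toy_pick_format(const ToyQemu *src, const ToyQemu *dst)
{
    assert(src->max_stream_format >= 1 && dst->max_stream_format >= 1);
    return src->max_stream_format < dst->max_stream_format
         ? src->max_stream_format
         : dst->max_stream_format;
}
```

With three binaries that all run the same machine type and support formats
up to 1, 2 and 3 respectively, the first hop uses format 1 and the second
hop uses format 2, so nothing forces the 1.0 format on the second step.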

-- 
MST


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 13:49           ` [Qemu-devel] " Michael S. Tsirkin
@ 2009-11-24 13:59             ` Paul Brook
  2009-11-24 14:21               ` Michael S. Tsirkin
       [not found]             ` <m3my2ct2qe.fsf@neno.neno>
  1 sibling, 1 reply; 96+ messages in thread
From: Paul Brook @ 2009-11-24 13:59 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Paolo Bonzini, qemu-devel

> > Reading in old state files is a whole lot easier (to write
> > maintain, and stay sane) than producing state that is bug-compatible with
> > previous versions.
> 
> It seems to me that old->new and new->old migrations are
> of about the same level of difficulty.
> Supporting one of these but not the other is of course
> easier than supporting both, but I don't see where
> "a whole lot" comes from.

Migrating from old version requires the restore routine be version aware. 
Migrating to old versions requires the save routine also be version aware, 
which I'd expect to be about double the amount of work.

Paul


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 10:39   ` Dor Laor
@ 2009-11-24 14:01     ` Michael S. Tsirkin
  2009-11-24 14:21       ` Juan Quintela
  0 siblings, 1 reply; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-24 14:01 UTC (permalink / raw)
  To: Dor Laor; +Cc: qemu-devel, Juan Quintela

On Tue, Nov 24, 2009 at 12:39:50PM +0200, Dor Laor wrote:
> On 11/23/2009 02:15 PM, Juan Quintela wrote:
>> Dor Laor<dlaor@redhat.com>  wrote:
>>> >  In the last couple of days we discovered some issues regarding stable
>>> >  ABI and the robustness of the live migration protocol. Let's just jump
>>> >  right into it, ordered by complexity:
>>> >
>>> >  1. Control*every*  feature exposed to the guest by qemu cmdline:
>>> >
>>> >      While thinking on cross version migration, and reviewing some
>>> >      patches, I noticed that there are many times that we use feature bits
>>> >      in order to expose functionality for the guest driver - example:
>>> >      VIRTIO_BLK_F_BARRIER, but we do not control it from qemu cmdline.
>> In my opinion this is madness, qemu command line is already too
>> complicated.  I agree with anthony to put it in the command line.
>
> Qemu's cmdline is currently our config file.. Actually there is nothing  
> wrong with it. Human users shouldn't be interested with these changes  
> and management software should not have problem manipulating it.
> We do need flexibility of controlling our features like any other  
> software component.
>
>> I will go further, and think that this kind of issues should be put into
>> the machine type.
>>
>> If you start qemu with -M pc-0.10, it should save the state in a 0.10
>> compatible way (that don't happens at the moment, but it should work
>> that way).
>
> That's the idea - to keep it part of qdev and by default use it with -M.

I think we want to keep these things separate:
machine description should be for things that
are both guest visible and not changeable by guest,
so it absolutely must stay constant as long as the guest
is alive.



-- 
MST


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 14:13     ` Juan Quintela
@ 2009-11-24 14:05       ` Michael S. Tsirkin
  2009-11-24 14:20         ` Juan Quintela
  2009-11-25 13:36         ` Gerd Hoffmann
  0 siblings, 2 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-24 14:05 UTC (permalink / raw)
  To: Juan Quintela; +Cc: dlaor, qemu-devel

On Mon, Nov 23, 2009 at 03:13:59PM +0100, Juan Quintela wrote:
> Anthony Liguori <anthony@codemonkey.ws> wrote:
> > Juan Quintela wrote:
> >> Dor Laor <dlaor@redhat.com> wrote:
> >>>     
> >>
> >> My idea here is that we need to have further use of machine
> >> descriptions, once that is done, we need something like a new property
> >> for qdev (version?).  Once there, each device could do:
> >> - if version != last_version -> die (what it happens now)
> >> - do someting sensible, not use the "new" features not existing on that
> >>   version
> >> - edit the savevm format in an easy way.
> >>   
> >
> > But this would only kick in when using pc-0.11 or something, right?
> 
> Yeap.
> 
> At this point, pc-0.10 is just:
> 
> static QEMUMachine pc_machine_v0_10 = {
>     .name = "pc-0.10",
>     .desc = "Standard PC, qemu 0.10",
>     .init = pc_init_pci,
>     .max_cpus = 255,
>     .compat_props = (CompatProperty[]) {
>         {
>             .driver   = "virtio-blk-pci",
>             .property = "class",
>             .value    = stringify(PCI_CLASS_STORAGE_OTHER),
>         },{
>             .driver   = "virtio-console-pci",
>             .property = "class",
>             .value    = stringify(PCI_CLASS_DISPLAY_OTHER),
>         },{
>             .driver   = "virtio-net-pci",
>             .property = "vectors",
>             .value    = stringify(0),
>         },{
>             .driver   = "virtio-blk-pci",
>             .property = "vectors",
>             .value    = stringify(0),
>         },
>         { /* end of list */ }
>     },
> 
> But to really make it work, we need to take a list of each savevm format
> change and put it here.  Notice that several changes are needed:
> - savevm infrastructure save functions don't know about version id
> - devices don't know to "behave" as other version
> - other things that I have probably missed
> 
> Later, Juan.

Why do you think this is the right place for it, I wonder?
This describes the machine, it does not seem to have
anything to do with how we migrate it.

-- 
MST


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 14:05       ` Michael S. Tsirkin
@ 2009-11-24 14:20         ` Juan Quintela
  2009-11-24 14:35           ` Michael S. Tsirkin
  2009-11-25 13:36         ` Gerd Hoffmann
  1 sibling, 1 reply; 96+ messages in thread
From: Juan Quintela @ 2009-11-24 14:20 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: dlaor, qemu-devel

"Michael S. Tsirkin" <mst@redhat.com> wrote:

>> But to really make it work, we need to take a list of each savevm format
>> change and put it here.  Notice that several changes are needed:
>> - savevm infrastructure save functions don't know about version id
>> - devices don't know to "behave" as other version
>> - other things that I have probably missed
>> 
>> Later, Juan.
>
> Why do you think this is the right place for it, I wonder?
> This describes the machine, it does not seem to have
> anything to do with how we migrate it.

Because here is where we know that _this_ machine had _these_ versions of
each device.  And this is also the trivial part to describe: I want a
machine like the one in qemu-0.11.

What is more, it makes it trivial for downstream to do things like:
- I define my machine, called foo-7.0, and the changes that it has over
  any other machine.
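
Roughly what I have in mind, as a stand-alone toy rather than real qemu
code: the table mirrors the compat_props list quoted earlier, but the
"savevm-version" property, the "cpu" driver entry and every Toy* name are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *driver;
    const char *property;
    const char *value;
} ToyCompatProp;

typedef struct {
    const char *name;
    const ToyCompatProp *compat_props;   /* terminated by a NULL driver */
} ToyMachine;

/* A downstream machine: what it changes relative to the default one. */
static const ToyCompatProp foo_7_0_props[] = {
    { "virtio-net-pci", "vectors",        "0"  },
    { "cpu",            "savevm-version", "11" },   /* hypothetical property */
    { NULL, NULL, NULL }                            /* end of list */
};

static const ToyMachine foo_7_0 = { "foo-7.0", foo_7_0_props };

/* Value the machine pins for driver/property, or fallback if it has none. */
static const char *toy_machine_prop(const ToyMachine *m, const char *driver,
                                    const char *property, const char *fallback)
{
    const ToyCompatProp *p;

    assert(m != NULL && m->compat_props != NULL);
    for (p = m->compat_props; p->driver != NULL; p++) {
        if (strcmp(p->driver, driver) == 0 &&
            strcmp(p->property, property) == 0) {
            return p->value;
        }
    }
    return fallback;
}
```

A device's save code would then ask the machine which savevm version to
emit and fall back to its newest version when the machine says nothing.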

Later, Juan.


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 14:01     ` Michael S. Tsirkin
@ 2009-11-24 14:21       ` Juan Quintela
  2009-11-24 14:38         ` Michael S. Tsirkin
  2009-11-24 16:05         ` Michael S. Tsirkin
  0 siblings, 2 replies; 96+ messages in thread
From: Juan Quintela @ 2009-11-24 14:21 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Dor Laor, qemu-devel

"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Nov 24, 2009 at 12:39:50PM +0200, Dor Laor wrote:
>> On 11/23/2009 02:15 PM, Juan Quintela wrote:
>>> Dor Laor<dlaor@redhat.com>  wrote:
>>>> >  In the last couple of days we discovered some issues regarding stable
>>>> >  ABI and the robustness of the live migration protocol. Let's just jump
>>>> >  right into it, ordered by complexity:
>>>> >
>>>> >  1. Control*every*  feature exposed to the guest by qemu cmdline:
>>>> >
>>>> >      While thinking on cross version migration, and reviewing some
>>>> >      patches, I noticed that there are many times that we use feature bits
>>>> >      in order to expose functionality for the guest driver - example:
>>>> >      VIRTIO_BLK_F_BARRIER, but we do not control it from qemu cmdline.
>>> In my opinion this is madness, qemu command line is already too
>>> complicated.  I agree with anthony to put it in the command line.
>>
>> Qemu's cmdline is currently our config file.. Actually there is nothing  
>> wrong with it. Human users shouldn't be interested with these changes  
>> and management software should not have problem manipulating it.
>> We do need flexibility of controlling our features like any other  
>> software component.
>>
>>> I will go further, and think that this kind of issues should be put into
>>> the machine type.
>>>
>>> If you start qemu with -M pc-0.10, it should save the state in a 0.10
>>> compatible way (that don't happens at the moment, but it should work
>>> that way).
>>
>> That's the idea - to keep it part of qdev and by default use it with -M.
>
> I think we want to keep these things separate:
> machine description should be for things that
> are both guest visible and not changeable by guest,
> so it absolutely must stay constant as long as the guest
> is alive.

That is exactly what we need here: that the version of the savevm protocol
for each device stays the same.

Later, Juan.


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 13:59             ` [Qemu-devel] " Paul Brook
@ 2009-11-24 14:21               ` Michael S. Tsirkin
  2009-11-24 17:06                 ` Blue Swirl
  0 siblings, 1 reply; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-24 14:21 UTC (permalink / raw)
  To: Paul Brook; +Cc: Paolo Bonzini, qemu-devel

On Tue, Nov 24, 2009 at 01:59:49PM +0000, Paul Brook wrote:
> > > Reading in old state files is a whole lot easier (to write
> > > maintain, and stay sane) than producing state that is bug-compatible with
> > > previous versions.
> > 
> > It seems to me that old->new and new->old migrations are
> > of about the same level of difficulty.
> > Supporting one of these but not the other is of course
> > easier than supporting both, but I don't see where
> > "a whole lot" comes from.
> 
> Migrating from old version requires the restore routine be version aware. 
> Migrating to old versions requires the save routine also be version aware, 
> which I'd expect to be about double the amount of work.
> 
> Paul

Heh, it seems the question is whether double is a lot or not :)

The advantage of both save and restore being version aware
is that you can do light compatibility testing without
having an old qemu lying around.
This is not enough, but better than nothing.
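
A minimal sketch of what I mean, using the INTx example from earlier in
the thread. The state layout, version numbers and function names are all
invented for illustration; real device code would look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy device state: intx is assumed to have been added in version 2. */
typedef struct {
    uint32_t config;
    uint8_t  intx;
} ToyPciState;

enum { TOY_PCI_V1 = 1, TOY_PCI_V2 = 2 };

/* Version-aware save: returns the number of bytes written (4 or 5). */
static size_t toy_pci_save(const ToyPciState *s, int version, uint8_t *buf)
{
    size_t n = 0;

    assert(version == TOY_PCI_V1 || version == TOY_PCI_V2);
    buf[n++] = (uint8_t)(s->config >> 24);
    buf[n++] = (uint8_t)(s->config >> 16);
    buf[n++] = (uint8_t)(s->config >> 8);
    buf[n++] = (uint8_t)s->config;
    if (version >= TOY_PCI_V2) {
        buf[n++] = s->intx;         /* dropped when targeting version 1 */
    }
    return n;
}

/* Version-aware load: returns 0 on success, -1 on a length mismatch. */
static int toy_pci_load(ToyPciState *s, int version,
                        const uint8_t *buf, size_t len)
{
    size_t want = (version >= TOY_PCI_V2) ? 5 : 4;

    if (len != want) {
        return -1;
    }
    s->config = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                ((uint32_t)buf[2] << 8) | buf[3];
    s->intx = (version >= TOY_PCI_V2) ? buf[4] : 0;
    return 0;
}

/* Save at the given version and load it back, all inside one binary. */
static bool toy_pci_roundtrip_ok(const ToyPciState *in, int version)
{
    uint8_t buf[5];
    ToyPciState out;
    size_t len = toy_pci_save(in, version, buf);

    if (toy_pci_load(&out, version, buf, len) != 0) {
        return false;
    }
    return out.config == in->config &&
           out.intx == (version >= TOY_PCI_V2 ? in->intx : 0);
}
```

Running the round trip once per supported version catches save/load
mismatches in the new code, though of course it cannot catch cases where
the new code misremembers what the old binary actually wrote.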

-- 
MST


* [Qemu-devel] Re: Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 15:12             ` Anthony Liguori
@ 2009-11-24 14:26               ` Michael S. Tsirkin
  0 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-24 14:26 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, qemu-devel

On Mon, Nov 23, 2009 at 09:12:15AM -0600, Anthony Liguori wrote:
> Eduardo Habkost wrote:
>> The pvclock MSRs are an example: if the guest is not using pvclock, not
>> restoring the MSRs won't make any difference. Strictly speaking, not
>> migrating them is wrong, but the user may argue that they know it won't
>> impact their guest OS, and that they want to take the risk.
>
> Once you start dealing with issues of risk vs. benefit, it's a policy  
> and belongs in the management layer.
> 
> We don't make risk vs. benefit assessments in qemu.  We defer those  
> types of decisions.
>
> Today, we only succeed migration when we know it will be successful.  We  
> could allow a management tool to override this check such that it could  
> implement such a policy.  But that's a really dangerous option to offer.
> 
>> Also, on the pvclock MSR case (and probably others), any argument
>> against doing backward migration would also be valid against doing
>> forward migration when the source process doesn't have the fix yet,
>> because the pvclock MSRs won't be migrated anyway. Forward migration is
>> as broken as backward migration, but we don't prevent migration on that
>> direction.
>>   
>
> A bug that is visible to a guest is no longer a bug, but a feature that  
> has to be supported for as long as that release is supported.  If we  
> feel that it's too dangerous of a bug, then we need to fail gracefully  
> and refuse to load that state on any other system forcing a proper  
> shutdown/startup for migration to a new version of qemu.
>
> For the purposes of compatibility, it is something that we have to  
> preserve.  In this case, you're introducing two MSRs that are readable  
> and writable by a guest.  If you migrate all of the sudden you lose that  
> MSRs content.  You cannot have live migration cause an MSR to disappear  
> regardless of what the purpose of that MSR is.
>
> Regards,
>
> Anthony Liguori

Same with having MSRs appear, surely?

-- 
MST


* [Qemu-devel] Re: Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-23 14:53         ` [Qemu-devel] " Anthony Liguori
@ 2009-11-24 14:28           ` Michael S. Tsirkin
  2009-11-24 14:33             ` [Qemu-devel] " Anthony Liguori
  0 siblings, 1 reply; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-24 14:28 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, Eduardo Habkost, qemu-devel

On Mon, Nov 23, 2009 at 08:53:54AM -0600, Anthony Liguori wrote:
> Eduardo Habkost wrote:
>>>>> Migration needs to be conservative. There should be only two possible
>>>>> outcomes: 1) a successful live migration or 2) graceful failure with the
>>>>> source VM still running correctly. Silently ignoring things that could
>>>>> affect the guests behavior means that it's possible that after failure,
>>>>> the guest will fail in an unexpected way.
>>>>>         
>>>> It's up to the source to decide what information is extra.  For  
>>>> example, the state of a RNG emulation is nice-to-have, but as long 
>>>> as it is initialized from another random source on the destination 
>>>> you shouldn't care.
>>>>       
>>> We only migrate things that are guest visible.  Everything else is 
>>> left to the user to configure.  We wouldn't migrate the state of a 
>>> RNG emulation provided that it doesn't have an impact on the guest.
>>>
>>> By definition, anything that is guest visible is important because it 
>>> affects the guest's behavior.
>>>     
>>
>> Right, but I wouldn't be surprised if a user complains that "I know that
>> my guest don't use that VM feature, so I want to be able to migrate to
>> an older version anyway".
>>   
>
> This could be addressed with a "force" migration feature.  That said, I  
> don't believe that the overwhelming majority of users are in a position  
> to determine whether they can safely migrate to an older version.
>
> Regards,
>
> Anthony Liguori

It's very easy: if their guest runs fine on the old qemu,
it should be safe to migrate there.


-- 
MST


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 14:28           ` [Qemu-devel] " Michael S. Tsirkin
@ 2009-11-24 14:33             ` Anthony Liguori
  2009-11-24 16:05               ` Michael S. Tsirkin
  0 siblings, 1 reply; 96+ messages in thread
From: Anthony Liguori @ 2009-11-24 14:33 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Paolo Bonzini, Eduardo Habkost, qemu-devel

Michael S. Tsirkin wrote:
> It's very easy: if their guest runs fine on the old qemu,
> it should be safe to migrate there.
>   

"Runs fine" is a qualitative statement.  There is no way for qemu to 
know whether a guest runs fine or not.  There is no way that we can make 
that statement either.  It has to be something that is controlled higher 
in the stack.

Regards,

Anthony Liguori


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 14:20         ` Juan Quintela
@ 2009-11-24 14:35           ` Michael S. Tsirkin
  2009-11-25 13:42             ` Gerd Hoffmann
  0 siblings, 1 reply; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-24 14:35 UTC (permalink / raw)
  To: Juan Quintela; +Cc: dlaor, qemu-devel

On Tue, Nov 24, 2009 at 03:20:47PM +0100, Juan Quintela wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> >> But to really make it work, we need to take a list of each savevm format
> >> change and put it here.  Notice that several changes are needed:
> >> - savevm infrastructure save functions don't know about version id
> >> - devices don't know to "behave" as other version
> >> - other things that I have probably missed
> >> 
> >> Later, Juan.
> >
> > Why do you think this is the right place for it, I wonder?
> > This describes the machine, it does not seem to have
> > anything to do with how we migrate it.
> 
> Because here is where we know that _this_ machine had _these_ versions of
> each device.

So? migration format is not related to what devices
you have at all. It's how you save them.

>  And this is also the trivial part to describe: I want a
> machine like the one in qemu-0.11.

Yes, but there might be a ton of reasons to want a
machine like the one in qemu 0.11.
The need to migrate to an old qemu is very rare;
it is a completely separate decision that
one might take long after starting qemu.

> What is more, it makes trivial for downstream to do things like:
> - I define my machine, called foo-7.0, and the changes that it has over
>   any other machine.
> 
> Later, Juan.

Yes, machine description is a good thing.
No, using it for things that have nothing to do with
describing the machine is not a good thing.

-- 
MST


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 14:21       ` Juan Quintela
@ 2009-11-24 14:38         ` Michael S. Tsirkin
  2009-11-24 16:05         ` Michael S. Tsirkin
  1 sibling, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-24 14:38 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Dor Laor, qemu-devel

On Tue, Nov 24, 2009 at 03:21:34PM +0100, Juan Quintela wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Tue, Nov 24, 2009 at 12:39:50PM +0200, Dor Laor wrote:
> >> On 11/23/2009 02:15 PM, Juan Quintela wrote:
> >>> Dor Laor<dlaor@redhat.com>  wrote:
> >>>> >  In the last couple of days we discovered some issues regarding stable
> >>>> >  ABI and the robustness of the live migration protocol. Let's just jump
> >>>> >  right into it, ordered by complexity:
> >>>> >
> >>>> >  1. Control*every*  feature exposed to the guest by qemu cmdline:
> >>>> >
> >>>> >      While thinking on cross version migration, and reviewing some
> >>>> >      patches, I noticed that there are many times that we use feature bits
> >>>> >      in order to expose functionality for the guest driver - example:
> >>>> >      VIRTIO_BLK_F_BARRIER, but we do not control it from qemu cmdline.
> >>> In my opinion this is madness, qemu command line is already too
> >>> complicated.  I agree with anthony to put it in the command line.
> >>
> >> Qemu's cmdline is currently our config file.. Actually there is nothing  
> >> wrong with it. Human users shouldn't be interested with these changes  
> >> and management software should not have problem manipulating it.
> >> We do need flexibility of controlling our features like any other  
> >> software component.
> >>
> >>> I will go further, and think that this kind of issues should be put into
> >>> the machine type.
> >>>
> >>> If you start qemu with -M pc-0.10, it should save the state in a 0.10
> >>> compatible way (that doesn't happen at the moment, but it should work
> >>> that way).
> >>
> >> That's the idea - to keep it part of qdev and by default use it with -M.
> >
> > I think we want to keep these things separate:
> > machine description should be for things that
> > are both guest visible and not changeable by guest,
> > so it absolutely must stay constant as long as guest
> > > is alive.
> 
> That is exactly what we need here, that version of the savevm protocol
> for each device is the same.
> 
> Later, Juan.

Same device should be able to use many format versions.

As I said in a separate thread, you might want to migrate from
qemu-1.0 to qemu-2.0 and then to qemu-3.0.
The second hop should use a new format, while
all three versions must present the same machine.

Doing it this way will let us get rid of some old
formats eventually, without forcing reboots for users.

-- 
MST


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 14:21       ` Juan Quintela
  2009-11-24 14:38         ` Michael S. Tsirkin
@ 2009-11-24 16:05         ` Michael S. Tsirkin
  2009-11-25  9:30           ` Juan Quintela
  1 sibling, 1 reply; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-24 16:05 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Dor Laor, qemu-devel

On Tue, Nov 24, 2009 at 03:21:34PM +0100, Juan Quintela wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Tue, Nov 24, 2009 at 12:39:50PM +0200, Dor Laor wrote:
> >> On 11/23/2009 02:15 PM, Juan Quintela wrote:
> >>> Dor Laor<dlaor@redhat.com>  wrote:
> >>>> >  In the last couple of days we discovered some issues regarding stable
> >>>> >  ABI and the robustness of the live migration protocol. Let's just jump
> >>>> >  right into it, ordered by complexity:
> >>>> >
> >>>> >  1. Control*every*  feature exposed to the guest by qemu cmdline:
> >>>> >
> >>>> >      While thinking on cross version migration, and reviewing some
> >>>> >      patches, I noticed that there are many times that we use feature bits
> >>>> >      in order to expose functionality for the guest driver - example:
> >>>> >      VIRTIO_BLK_F_BARRIER, but we do not control it from qemu cmdline.
> >>> In my opinion this is madness, qemu command line is already too
> >>> complicated.  I agree with anthony to put it in the command line.
> >>
> >> Qemu's cmdline is currently our config file.. Actually there is nothing  
> >> wrong with it. Human users shouldn't be interested with these changes  
> >> and management software should not have problem manipulating it.
> >> We do need flexibility of controlling our features like any other  
> >> software component.
> >>
> >>> I will go further, and think that this kind of issues should be put into
> >>> the machine type.
> >>>
> >>> If you start qemu with -M pc-0.10, it should save the state in a 0.10
> >>> compatible way (that doesn't happen at the moment, but it should work
> >>> that way).
> >>
> >> That's the idea - to keep it part of qdev and by default use it with -M.
> >
> > I think we want to keep these things separate:
> > machine description should be for things that
> > are both guest visible and not changeable by guest,
> > so it absolutely must stay constant as long as guest
> > > is alive.
> 
> That is exactly what we need here, that version of the savevm protocol
> for each device is the same.
> 
> Later, Juan.

A device already supports load for a range
of versions between X and Y. We want to support
saving to a range of versions.

Which versions to use is a separate decision
which should be taken at run time, not
at startup time.
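
A minimal sketch of what "saving to a range of versions" could look like,
assuming a made-up device (DemoDev) whose last two fields were added in
format v2. None of these names come from qemu; the point is only that the
save routine takes the target version as a run-time argument, mirroring the
version range that load already accepts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device: tsc_shift and tsc_mul were added in format v2. */
typedef struct {
    uint32_t ctrl;
    uint32_t status;
    uint32_t tsc_shift;
    uint32_t tsc_mul;
} DemoDev;

enum { DEMO_MIN_VERSION = 1, DEMO_MAX_VERSION = 2, DEMO_MAX_BYTES = 20 };

static size_t put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
    return 4;
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Save in the version chosen by the caller at run time.
 * buf must hold DEMO_MAX_BYTES. Returns bytes written, or -1. */
int demo_save(const DemoDev *d, int version, uint8_t *buf)
{
    size_t n = 0;

    assert(d && buf);
    if (version < DEMO_MIN_VERSION || version > DEMO_MAX_VERSION) {
        return -1;                      /* refuse instead of guessing */
    }
    n += put_be32(buf + n, (uint32_t)version);
    n += put_be32(buf + n, d->ctrl);
    n += put_be32(buf + n, d->status);
    if (version >= 2) {                 /* v1 readers do not know these */
        n += put_be32(buf + n, d->tsc_shift);
        n += put_be32(buf + n, d->tsc_mul);
    }
    return (int)n;
}

/* Load any version in [DEMO_MIN_VERSION, DEMO_MAX_VERSION]. Returns 0 or -1. */
int demo_load(DemoDev *d, const uint8_t *buf, size_t len)
{
    uint32_t version;

    assert(d && buf);
    if (len < 4) {
        return -1;
    }
    version = get_be32(buf);
    if (version < (uint32_t)DEMO_MIN_VERSION ||
        version > (uint32_t)DEMO_MAX_VERSION) {
        return -1;
    }
    if (len != (version >= 2 ? 20u : 12u)) {
        return -1;
    }
    d->ctrl = get_be32(buf + 4);
    d->status = get_be32(buf + 8);
    d->tsc_shift = version >= 2 ? get_be32(buf + 12) : 0;
    d->tsc_mul = version >= 2 ? get_be32(buf + 16) : 0;
    return 0;
}
```

Note that saving as v1 here silently drops the two newer fields. Whether
that is acceptable for a given device is exactly the policy question argued
elsewhere in this thread; the sketch does not settle it.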

-- 
MST


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 14:33             ` [Qemu-devel] " Anthony Liguori
@ 2009-11-24 16:05               ` Michael S. Tsirkin
       [not found]                 ` <m3skc2r66t.fsf@neno.neno>
  0 siblings, 1 reply; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-24 16:05 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, Eduardo Habkost, qemu-devel

On Tue, Nov 24, 2009 at 08:33:11AM -0600, Anthony Liguori wrote:
> Michael S. Tsirkin wrote:
>> It's very easy: if their guest runs fine on the old qemu,
>> it should be safe to migrate there.
>>   
>
> "Runs fine" is a qualitative statement.  There is no way for qemu to  
> know whether a guest runs fine or not.

The entity between the keyboard and chair is best placed to decide that.
That entity has already expressed the decision taken by running
the appropriate qemu monitor command.

>  There is no way that we can make  
> that statement either.  It has to be something that is controlled higher  
> in the stack.
> Regards,
>
> Anthony Liguori
>


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 14:21               ` Michael S. Tsirkin
@ 2009-11-24 17:06                 ` Blue Swirl
  2009-11-24 17:08                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 96+ messages in thread
From: Blue Swirl @ 2009-11-24 17:06 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Paolo Bonzini, Paul Brook, qemu-devel

On Tue, Nov 24, 2009 at 4:21 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Nov 24, 2009 at 01:59:49PM +0000, Paul Brook wrote:
>> > > Reading in old state files is a whole lot easier (to write,
>> > > maintain, and stay sane) than producing state that is bug-compatible with
>> > > previous versions.
>> >
>> > It seems to me that old->new and new->old migrations are
>> > of about the same level of difficulty.
>> > Supporting one of these but not the other is of course
>> > easier than supporting both, but I don't see where
>> > "a whole lot" comes from.
>>
>> Migrating from old version requires the restore routine be version aware.
>> Migrating to old versions requires the save routine also be version aware,
>> which I'd expect to be about double the amount of work.
>>
>> Paul
>
> Heh, it seems the question is whether double is a lot or not :)
>
> The advantage of both save and restore being version aware
> is that you can do light compatibility testing without
> having an old qemu lying around.
> This is not enough, but better than nothing.

I think the best for us would be if we could make the translator
between versions an external tool with a separate project.

Supporting only the current version would make QEMU simpler to
maintain, any backward compatibility baggage could be thrown out.

The external version translator tool could support arbitrary
conversion between the whole NxN matrix of versions (including distro
hacks), or just those that RHEL happens to use. The tool would not be
limited to QEMU development environment, it could use databases, XSLT,
SOA or be written in C#.
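
One way such a translator might avoid writing the full NxN matrix is to
chain converters between adjacent versions. The sketch below assumes this
chaining design, a made-up DemoState record and made-up conversion steps;
none of it is an existing tool or a real qemu format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device state as seen by an external translator. */
typedef struct {
    int version;
    int ctrl;
    int tsc_shift;   /* only meaningful from v2 on */
} DemoState;

typedef void (*demo_step_fn)(DemoState *s);

/* Made-up per-version changes, one converter per adjacent pair. */
static void up_1_to_2(DemoState *s)   { s->tsc_shift = 0; s->version = 2; }
static void down_2_to_1(DemoState *s) { s->tsc_shift = 0; s->version = 1; }
static void up_2_to_3(DemoState *s)   { s->ctrl *= 2;     s->version = 3; }
static void down_3_to_2(DemoState *s) { s->ctrl /= 2;     s->version = 2; }

enum { DEMO_OLDEST = 1, DEMO_NEWEST = 3 };

/* demo_up[v - DEMO_OLDEST] converts v -> v + 1 */
static const demo_step_fn demo_up[] = { up_1_to_2, up_2_to_3 };
/* demo_down[v - DEMO_OLDEST - 1] converts v -> v - 1 */
static const demo_step_fn demo_down[] = { down_2_to_1, down_3_to_2 };

/* Walk the chain one version at a time. Returns 0 on success, -1 if
 * either end is outside the supported range. */
int demo_translate(DemoState *s, int target)
{
    assert(s != NULL);
    if (s->version < DEMO_OLDEST || s->version > DEMO_NEWEST ||
        target < DEMO_OLDEST || target > DEMO_NEWEST) {
        return -1;
    }
    while (s->version < target) {
        demo_up[s->version - DEMO_OLDEST](s);
    }
    while (s->version > target) {
        demo_down[s->version - DEMO_OLDEST - 1](s);
    }
    return 0;
}
```

With N versions this needs 2(N-1) converters. The trade-off is that a
downgrade step that drops data (down_2_to_1 above) makes a round trip
lossy, which a direct converter between two distant versions might avoid.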


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 17:06                 ` Blue Swirl
@ 2009-11-24 17:08                   ` Michael S. Tsirkin
  2009-11-24 17:43                     ` Paolo Bonzini
  0 siblings, 1 reply; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-24 17:08 UTC (permalink / raw)
  To: Blue Swirl; +Cc: Paolo Bonzini, Paul Brook, qemu-devel

On Tue, Nov 24, 2009 at 07:06:01PM +0200, Blue Swirl wrote:
> On Tue, Nov 24, 2009 at 4:21 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Tue, Nov 24, 2009 at 01:59:49PM +0000, Paul Brook wrote:
> >> > > Reading in old state files is a whole lot easier (to write,
> >> > > maintain, and stay sane) than producing state that is bug-compatible with
> >> > > previous versions.
> >> >
> >> > It seems to me that old->new and new->old migrations are
> >> > of about the same level of difficulty.
> >> > Supporting one of these but not the other is of course
> >> > easier than supporting both, but I don't see where
> >> > "a whole lot" comes from.
> >>
> >> Migrating from old version requires the restore routine be version aware.
> >> Migrating to old versions requires the save routine also be version aware,
> >> which I'd expect to be about double the amount of work.
> >>
> >> Paul
> >
> > Heh, it seems the question is whether double is a lot or not :)
> >
> > The advantage of both save and restore being version aware
> > is that you can do light compatibility testing without
> > having an old qemu lying around.
> > This is not enough, but better than nothing.
> 
> I think the best for us would be if we could make the translator
> between versions an external tool with a separate project.
> 
> Supporting only the current version would make QEMU simpler to
> maintain, any backward compatibility baggage could be thrown out.
> 
> The external version translator tool could support arbitrary
> conversion between the whole NxN matrix of versions (including distro
> hacks), or just those that RHEL happens to use. The tool would not be
> limited to QEMU development environment, it could use databases, XSLT,
> SOA or be written in C#.

Yea, maybe cross-hypervisor migration could be made to work :)
All that would be possible if the migration protocol were
specified at some level. As it is, the protocol
really dumps out internal information about the current
qemu implementation, and it seems that making
it a separate project would just slow us down.

-- 
MST


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
       [not found]             ` <m3my2ct2qe.fsf@neno.neno>
@ 2009-11-24 17:41               ` Paolo Bonzini
  0 siblings, 0 replies; 96+ messages in thread
From: Paolo Bonzini @ 2009-11-24 17:41 UTC (permalink / raw)
  To: Juan Quintela; +Cc: qemu-devel, Paul Brook, Michael S. Tsirkin

On 11/24/2009 03:30 PM, Juan Quintela wrote:
> No, new ->  old is way, way more difficult.

New->old is way more difficult with the current migration file format. 
The current migration file format is not at all designed to be read by 
an older version.

Or, for that matter, by a tool that only cares about the state of a couple 
of devices (this would IMNSHO be much more important, were it not for the 
RHEL case that started the discussion).

Thinking about the latter and using it to solve the former would be the 
right way to do it.  Some time ago when I had just started at Red Hat I 
wrote a mini-library to parse qemu migrate data, and I couldn't believe 
that there was no length field anywhere.  Luckily I needed only RAM and 
CPU data, so I only needed to know about a couple of extraneous chunks.
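
To illustrate why a length field matters, here is a minimal sketch of a
reader for a hypothetical framing in which every section is
[1-byte id][4-byte big-endian length][payload]. This is not the actual qemu
stream layout, and the section ids are invented; it only shows that with a
length present a reader can step over sections it does not understand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SEC_RAM = 1, SEC_CPU = 2 };   /* hypothetical section ids */
#define SEC_HDR_LEN 5u               /* 1-byte id + 4-byte length */

static uint32_t rd_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Return a pointer to the payload of the first section with id `want`
 * and store its length in *len_out. Returns NULL if the section is
 * absent or the stream is truncated. Unknown sections are skipped. */
const uint8_t *find_section(const uint8_t *buf, size_t size,
                            uint8_t want, uint32_t *len_out)
{
    size_t off = 0;

    assert(buf && len_out);
    while (size - off >= SEC_HDR_LEN) {
        uint8_t id = buf[off];
        uint32_t len = rd_be32(buf + off + 1);

        off += SEC_HDR_LEN;
        if (len > size - off) {
            return NULL;             /* payload runs past the buffer */
        }
        if (id == want) {
            *len_out = len;
            return buf + off;
        }
        off += len;                  /* not ours: skip without parsing */
    }
    return NULL;
}
```

A tool that only wants RAM and CPU data would call find_section() for those
two ids and never need to know the layout of any other device.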

Paolo


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 17:08                   ` Michael S. Tsirkin
@ 2009-11-24 17:43                     ` Paolo Bonzini
  2009-11-24 18:51                       ` Anthony Liguori
  0 siblings, 1 reply; 96+ messages in thread
From: Paolo Bonzini @ 2009-11-24 17:43 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Blue Swirl, Paul Brook, qemu-devel

On 11/24/2009 06:08 PM, Michael S. Tsirkin wrote:
>> >  The external version translator tool could support arbitrary
>> >  conversion between the whole NxN matrix of versions (including distro
>> >  hacks), or just those that RHEL happens to use. The tool would not be
>> >  limited to QEMU development environment, it could use databases, XSLT,
>> >  SOA or be written in C#.
> Yea, maybe cross-hypervisor migration could be made to work:)
> All that would be possible if the migration protocol would
> be specified at some level. As it is, the protocol
> really dumps out internal information about current
> qemu implementation, and it seems that making
> it a separate project would just slow us down.

We have Juan's VMState work to start from.  If we can take it a step 
further and use anything (including CPP) to make the primary 
representation of state an XML document or anything like that (and 
convert it back to VMState structs at build time), it would not be a 
huge amount of work, and it would give important benefits.


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 17:43                     ` Paolo Bonzini
@ 2009-11-24 18:51                       ` Anthony Liguori
  2009-11-24 18:56                         ` Blue Swirl
  2009-11-24 18:57                         ` Paolo Bonzini
  0 siblings, 2 replies; 96+ messages in thread
From: Anthony Liguori @ 2009-11-24 18:51 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Blue Swirl, qemu-devel, Paul Brook, Michael S. Tsirkin

Paolo Bonzini wrote:
> On 11/24/2009 06:08 PM, Michael S. Tsirkin wrote:
>>> >  The external version translator tool could support arbitrary
>>> >  conversion between the whole NxN matrix of versions (including 
>>> distro
>>> >  hacks), or just those that RHEL happens to use. The tool would 
>>> not be
>>> >  limited to QEMU development environment, it could use databases, 
>>> XSLT,
>>> >  SOA or be written in C#.
>> Yea, maybe cross-hypervisor migration could be made to work:)
>> All that would be possible if the migration protocol would
>> be specified at some level. As it is, the protocol
>> really dumps out internal information about current
>> qemu implementation, and it seems that making
>> it a separate project would just slow us down.
>
> We have Juan's VMState work to start from.  If we can take it a step 
> further and use anything (including CPP) to make the primary 
> representation of state an XML document or anything like that (and 
> convert it back to VMState structs at build time), it would not be a 
> huge work, and it would give important benefits.

Like adding tremendous complexity for little to no gain.

Sorry, I couldn't help myself :-)

Regards,

Anthony Liguori


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 18:51                       ` Anthony Liguori
@ 2009-11-24 18:56                         ` Blue Swirl
  2009-11-24 19:24                           ` Anthony Liguori
  2009-11-24 18:57                         ` Paolo Bonzini
  1 sibling, 1 reply; 96+ messages in thread
From: Blue Swirl @ 2009-11-24 18:56 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel, Paolo Bonzini, Paul Brook, Michael S. Tsirkin

On Tue, Nov 24, 2009 at 8:51 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
> Paolo Bonzini wrote:
>>
>> On 11/24/2009 06:08 PM, Michael S. Tsirkin wrote:
>>>>
>>>> >  The external version translator tool could support arbitrary
>>>> >  conversion between the whole NxN matrix of versions (including distro
>>>> >  hacks), or just those that RHEL happens to use. The tool would not be
>>>> >  limited to QEMU development environment, it could use databases,
>>>> > XSLT,
>>>> >  SOA or be written in C#.
>>>
>>> Yea, maybe cross-hypervisor migration could be made to work:)
>>> All that would be possible if the migration protocol would
>>> be specified at some level. As it is, the protocol
>>> really dumps out internal information about current
>>> qemu implementation, and it seems that making
>>> it a separate project would just slow us down.
>>
>> We have Juan's VMState work to start from.  If we can take it a step
>> further and use anything (including CPP) to make the primary representation
>> of state an XML document or anything like that (and convert it back to
>> VMState structs at build time), it would not be a huge work, and it would
>> give important benefits.
>
> Like adding tremendous complexity for little to no gain.

But the complexity would be a problem only for the transformation
matrix project. For QEMU the gain would be simplified design, maybe at
the expense of some CPP magic.


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 18:51                       ` Anthony Liguori
  2009-11-24 18:56                         ` Blue Swirl
@ 2009-11-24 18:57                         ` Paolo Bonzini
  2009-11-24 19:29                           ` Anthony Liguori
  1 sibling, 1 reply; 96+ messages in thread
From: Paolo Bonzini @ 2009-11-24 18:57 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Blue Swirl, qemu-devel, Paul Brook, Michael S. Tsirkin

On 11/24/2009 07:51 PM, Anthony Liguori wrote:
>> to make the primary representation of state an XML document

Since my brain is not working well today, I'll just point out that of 
course I meant "the primary representation of _schemas_ an XML document" 
or anything like that.

>> or  anything like that (and convert it back to VMState structs at build
>> time), it would not be a huge work, and it would give important benefits.
>
> Like adding tremendous complexity for little to no gain.

Anything that could result in a libqemustate or something like that 
would be complex, but would have gain.  (Yes, I've seen the smiley 
despite aforementioned problems with the brain).

Paolo


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 18:56                         ` Blue Swirl
@ 2009-11-24 19:24                           ` Anthony Liguori
  0 siblings, 0 replies; 96+ messages in thread
From: Anthony Liguori @ 2009-11-24 19:24 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel, Paolo Bonzini, Paul Brook, Michael S. Tsirkin

Blue Swirl wrote:
> But the complexity would be a problem only for the transformation
> matrix project. For QEMU the gain would be simplified design, maybe at
> the expense of some CPP magic.
>   

I don't think it's always a matter of just transforming state.  There 
will be certain features that need to be enabled/disabled to ensure 
compatibility.  That certainly is information that has to live in qemu.

Regards,

Anthony Liguori


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 18:57                         ` Paolo Bonzini
@ 2009-11-24 19:29                           ` Anthony Liguori
  2009-11-24 20:01                             ` Michael S. Tsirkin
  0 siblings, 1 reply; 96+ messages in thread
From: Anthony Liguori @ 2009-11-24 19:29 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Blue Swirl, qemu-devel, Paul Brook, Michael S. Tsirkin

Paolo Bonzini wrote:
> On 11/24/2009 07:51 PM, Anthony Liguori wrote:
>>> to make the primary representation of state an XML document
>
> Since my brain is not working well today, I'll just point out that of 
> course I meant "the primary representation of _schemas_ an XML 
> document" or anything like that.
>
>>> or  anything like that (and convert it back to VMState structs at build
>>> time), it would not be a huge work, and it would give important 
>>> benefits.
>>
>> Like adding tremendous complexity for little to no gain.
>
> Anything that could result in a libqemustate or something like that 
> would be complex, but would have gain.  (Yes, I've seen the smiley 
> despite aforementioned problems with the brain).

Which would be...?

"Increased flexibility" is not a quantifiable gain.  If there's a 
particular feature we need to support, let's try to support the feature.

There are bigger fish to fry with live migration than the protocol 
format.  For instance, we need to do a fair bit of work to build an 
infrastructure that will let us test this stuff in a sane way so we can 
avoid introducing things like the pvclock regression.

Regards,

Anthony Liguori
> Paolo


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 19:29                           ` Anthony Liguori
@ 2009-11-24 20:01                             ` Michael S. Tsirkin
  0 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-24 20:01 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Blue Swirl, Paolo Bonzini, Paul Brook, qemu-devel

On Tue, Nov 24, 2009 at 01:29:09PM -0600, Anthony Liguori wrote:
> Paolo Bonzini wrote:
>> On 11/24/2009 07:51 PM, Anthony Liguori wrote:
>>>> to make the primary representation of state an XML document
>>
>> Since my brain is not working well today, I'll just point out that of  
>> course I meant "the primary representation of _schemas_ an XML  
>> document" or anything like that.
>>
>>>> or  anything like that (and convert it back to VMState structs at build
>>>> time), it would not be a huge work, and it would give important  
>>>> benefits.
>>>
>>> Like adding tremendous complexity for little to no gain.
>>
>> Anything that could result in a libqemustate or something like that  
>> would be complex, but would have gain.  (Yes, I've seen the smiley  
>> despite aforementioned problems with the brain).
>
> Which would be...?
>
> "Increased flexibility" is not a quantifiable gain.  If there's a  
> particular feature we need to support, let's try to support the feature.
>
> There are bigger fish to fry with live migration than the protocol  
> format.  For instance, we need to do a fair bit of work to build an  
> infrastructure that will let us test this stuff in a sane way so we can  
> avoid introducing things like the pvclock regression.

BTW, I think that the ability to save in old format will
help in this respect: it makes it possible to do some unit
testing without having old qemu lying around.

-- 
MST


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 16:05         ` Michael S. Tsirkin
@ 2009-11-25  9:30           ` Juan Quintela
  2009-11-25  9:32             ` Michael S. Tsirkin
  0 siblings, 1 reply; 96+ messages in thread
From: Juan Quintela @ 2009-11-25  9:30 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Dor Laor, qemu-devel

"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Nov 24, 2009 at 03:21:34PM +0100, Juan Quintela wrote:

> A device already supports load for a range
> of versions between X and Y. We want to support
> saving to a range of versions.
>
> Which versions to use is a separate decision
> which should be taken on run time, not
> at startup time.

Not in the general case.

Suppose that v8 brings featureX to one device.  If you _know_ that you don't
want to use featureX, startup time is the proper place.  The important
thing is not the savevm format (we can make any change here); what we
really need is that the guest still runs on the destination, and for
that you can't change the hardware too much.

Later, Juan.


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-25  9:30           ` Juan Quintela
@ 2009-11-25  9:32             ` Michael S. Tsirkin
  2009-11-25 13:36               ` Juan Quintela
  0 siblings, 1 reply; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-25  9:32 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Dor Laor, qemu-devel

On Wed, Nov 25, 2009 at 10:30:47AM +0100, Juan Quintela wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Tue, Nov 24, 2009 at 03:21:34PM +0100, Juan Quintela wrote:
> 
> > A device already supports load for a range
> > of versions between X and Y. We want to support
> > saving to a range of versions.
> >
> > Which versions to use is a separate decision
> > which should be taken on run time, not
> > at startup time.
> 
> Not in the general case.

If that means "not in all cases", I agree.
But it seems pretty common for bugfixes.

> Think that v8 brings featureX to one device.  If you _know_ that you don't
> want to use feature X, startup time is the proper place.  Important
> thing is not the savevm format (we can do any change here), what we
> really need is that the guest still runs on the destination, and for
> that you can't change the hardware too much.
> 
> Later, Juan.

I think it's clear: if you change guest visible properties
these are features that might belong in machine description.
If not - they don't belong in machine description.

-- 
MST


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-25  9:32             ` Michael S. Tsirkin
@ 2009-11-25 13:36               ` Juan Quintela
  0 siblings, 0 replies; 96+ messages in thread
From: Juan Quintela @ 2009-11-25 13:36 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Dor Laor, qemu-devel

"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Nov 25, 2009 at 10:30:47AM +0100, Juan Quintela wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> > On Tue, Nov 24, 2009 at 03:21:34PM +0100, Juan Quintela wrote:
>> 
>> > A device already supports load for a range
>> > of versions between X and Y. We want to support
>> > saving to a range of versions.
>> >
>> > Which versions to use is a separate decision
>> > which should be taken on run time, not
>> > at startup time.
>> 
>> Not in the general case.
>
> If that means "not in all cases", I agree.
> But it seems pretty common for bugfixes.
>
>> Think that v8 brings featureX to one device.  If you _know_ that you don't
>> want to use feature X, startup time is the proper place.  Important
>> thing is not the savevm format (we can do any change here), what we
>> really need is that the guest still runs on the destination, and for
>> that you can't change the hardware too much.
>> 
>> Later, Juan.
>
> I think it's clear: if you change guest visible properties
> these are features that might belong in machine description.
> If not - they don't belong in machine description.

Ah, ok.  Now we have some agreement (on this second part).
About the 1st part, the difference is that I don't want yet another
mechanism for these bugfixes; just use the same one as the machine
description.

I think we can state it as:
- have different mechanisms for bug fixes and for guest visible changes
  pro: bugfixes get easier
  con: we have _two_ mechanisms

- have a single way for bugfixes and guest visible changes
  pro: we only have one mechanism
  con: bugfixes are "elevated to the machine description"

We can stop discussing here, because clearly neither is better than the
other.  We just have to choose one, with its advantages and
disadvantages.

Later, Juan.


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 14:05       ` Michael S. Tsirkin
  2009-11-24 14:20         ` Juan Quintela
@ 2009-11-25 13:36         ` Gerd Hoffmann
  2009-11-25 13:40           ` Michael S. Tsirkin
  1 sibling, 1 reply; 96+ messages in thread
From: Gerd Hoffmann @ 2009-11-25 13:36 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: dlaor, qemu-devel, Juan Quintela

On 11/24/09 15:05, Michael S. Tsirkin wrote:
> On Mon, Nov 23, 2009 at 03:13:59PM +0100, Juan Quintela wrote:
>>> But this would only kick in when using pc-0.11 or something, right?
>>
>> Yeap.
>>
>> At this point, pc-0.10 is just:
>>
>> static QEMUMachine pc_machine_v0_10 = {
>>      .name = "pc-0.10",
>>      .desc = "Standard PC, qemu 0.10",
>>      .init = pc_init_pci,
>>      .max_cpus = 255,
>>      .compat_props = (CompatProperty[]) {
>>          {
>>              .driver   = "virtio-blk-pci",
>>              .property = "class",
>>              .value    = stringify(PCI_CLASS_STORAGE_OTHER),
>>          },{
>>              .driver   = "virtio-console-pci",
>>              .property = "class",
>>              .value    = stringify(PCI_CLASS_DISPLAY_OTHER),
>>          },{
>>              .driver   = "virtio-net-pci",
>>              .property = "vectors",
>>              .value    = stringify(0),
>>          },{
>>              .driver   = "virtio-blk-pci",
>>              .property = "vectors",
>>              .value    = stringify(0),
>>          },
>>          { /* end of list */ }
>>      },
>>
>> But to really make it work, we need to take a list of each savevm format
>> change and put it here.  Notice that several changes are needed:
>> - savevm infrastructure save functions don't know about version id
>> - devices don't know to "behave" as other version
>> - other things that I have probably missed
>>
>> Later, Juan.
>
> Why do you think this the right place for it, I wonder?
> This describes the machine, it does not seem to have
> anything to do with how we migrate it.

Well.  It turns off MSI for virtio-net-pci when you start it with -M 
pc-0.10.  Which makes virtio-net-pci savevm sections compatible with the 
qemu 0.10 ...

We could add a DeviceState->savevm field and make that available as a
property for devices which need to support multiple versions.  Then
we can use the compat properties to switch back to the older format with
-M pc-0.10.
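
To make the idea concrete, here is a minimal sketch of a per-device savevm
version that a compat property list can override.  The type and function
names (SketchDeviceState, SketchCompatProperty, sketch_apply_compat) and the
version numbers are hypothetical stand-ins, not the real qdev API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a qdev device. */
typedef struct {
    const char *driver;   /* device model name, e.g. "virtio-net-pci" */
    int savevm;           /* savevm section version this device writes */
} SketchDeviceState;

/* Hypothetical compat entry, shaped like the list in the machine type. */
typedef struct {
    const char *driver;
    const char *property;
    const char *value;
} SketchCompatProperty;

/* Apply every compat entry that targets this device's "savevm" property.
 * The list ends with an entry whose driver is NULL. */
void sketch_apply_compat(SketchDeviceState *dev,
                         const SketchCompatProperty *props)
{
    for (; props->driver != NULL; props++) {
        if (strcmp(props->driver, dev->driver) == 0 &&
            strcmp(props->property, "savevm") == 0) {
            dev->savevm = atoi(props->value);
        }
    }
}

/* What a pc-0.10 style machine type might carry (version number invented). */
const SketchCompatProperty sketch_pc_0_10_compat[] = {
    { "virtio-net-pci", "savevm", "1" },
    { NULL, NULL, NULL }
};
```

Devices not named in the list keep whatever version they were created with,
so only the devices whose format actually changed need an entry.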

cheers,
   Gerd


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-25 13:36         ` Gerd Hoffmann
@ 2009-11-25 13:40           ` Michael S. Tsirkin
  2009-11-25 13:59             ` Gerd Hoffmann
  0 siblings, 1 reply; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-25 13:40 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: dlaor, qemu-devel, Juan Quintela

On Wed, Nov 25, 2009 at 02:36:49PM +0100, Gerd Hoffmann wrote:
> On 11/24/09 15:05, Michael S. Tsirkin wrote:
>> On Mon, Nov 23, 2009 at 03:13:59PM +0100, Juan Quintela wrote:
>>>> But this would only kick in when using pc-0.11 or something, right?
>>>
>>> Yeap.
>>>
>>> At this point, pc-0.10 is just:
>>>
>>> static QEMUMachine pc_machine_v0_10 = {
>>>      .name = "pc-0.10",
>>>      .desc = "Standard PC, qemu 0.10",
>>>      .init = pc_init_pci,
>>>      .max_cpus = 255,
>>>      .compat_props = (CompatProperty[]) {
>>>          {
>>>              .driver   = "virtio-blk-pci",
>>>              .property = "class",
>>>              .value    = stringify(PCI_CLASS_STORAGE_OTHER),
>>>          },{
>>>              .driver   = "virtio-console-pci",
>>>              .property = "class",
>>>              .value    = stringify(PCI_CLASS_DISPLAY_OTHER),
>>>          },{
>>>              .driver   = "virtio-net-pci",
>>>              .property = "vectors",
>>>              .value    = stringify(0),
>>>          },{
>>>              .driver   = "virtio-blk-pci",
>>>              .property = "vectors",
>>>              .value    = stringify(0),
>>>          },
>>>          { /* end of list */ }
>>>      },
>>>
>>> But to really make it work, we need to take a list of each savevm format
>>> change and put it here.  Notice that several changes are needed:
>>> - savevm infrastructure save functions don't know about version id
>>> - devices don't know to "behave" as other version
>>> - other things that I have probably missed
>>>
>>> Later, Juan.
>>
>> Why do you think this the right place for it, I wonder?
>> This describes the machine, it does not seem to have
>> anything to do with how we migrate it.
>
> Well.  It turns off MSI for virtio-net-pci when you start it with -M  
> pc-0.10.  Which makes virtio-net-pci savevm sections compatible with the  
> qemu 0.10 ...
>
> We could add a DeviceState->savevm field and make that available as  
> property for devices which need to support multiple versions.  Then you  
> we can use the compat properties to switch back to the older format with  
> -M pc-0.10.
>
> cheers,
>   Gerd

I'm confused sorry. Of course when you want to migrate to qemu 0.10
you must have a compatible machine. And savevm format has nothing
to do with it IMO, so MSI is orthogonal to this discussion.
It just shows that it was smart not to save MSI state when
MSI is not present (/me pats self on the back).

In this thread we were discussing changes like pvclock bug,
where we change savevm format without changing the machine,
or almost without changing the machine.

-- 
MST


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-25 13:42             ` Gerd Hoffmann
@ 2009-11-25 13:42               ` Michael S. Tsirkin
  2009-11-25 14:10                 ` Gerd Hoffmann
  0 siblings, 1 reply; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-25 13:42 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: dlaor, qemu-devel, Juan Quintela

On Wed, Nov 25, 2009 at 02:42:25PM +0100, Gerd Hoffmann wrote:
>>>   And this is also the trivial part to describe: I want a
>>> machine like the one in qemu-0.11.
>>
>> Yes, but there might be a ton of reasons to want a
>> machine like the one in qemu 0.11.
>> The need to migrate to old qemu is very rare,
>> it is a completely separate decision
>> one might take long after starting qemu,
>
> Doesn't work.  If you have a qemu 0.11 machine, a virtio nic and your  
> guest uses MSI-X you simply can't migrate to qemu 0.10.  End of story.
> If you want to be able to migrate to 0.10 you have to start in 0.10  
> compat mode with MSI-X disabled.  So IMHO it does makes sense to tie the  
> savevm format to -M pc-<version>.
>
> cheers,
>   Gerd

MSI-X is an orthogonal issue, let's not mix it in.
Maybe you are using e1000, or maybe you
disabled MSI-X with a flag.

-- 
MST


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-24 14:35           ` Michael S. Tsirkin
@ 2009-11-25 13:42             ` Gerd Hoffmann
  2009-11-25 13:42               ` Michael S. Tsirkin
  0 siblings, 1 reply; 96+ messages in thread
From: Gerd Hoffmann @ 2009-11-25 13:42 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: dlaor, qemu-devel, Juan Quintela

>>   And this is also the trivial part to describe: I want a
>> machine like the one in qemu-0.11.
>
> Yes, but there might be a ton of reasons to want a
> machine like the one in qemu 0.11.
> The need to migrate to old qemu is very rare,
> it is a completely separate decision
> one might take long after starting qemu,

Doesn't work.  If you have a qemu 0.11 machine, a virtio nic and your 
guest uses MSI-X you simply can't migrate to qemu 0.10.  End of story.

If you want to be able to migrate to 0.10 you have to start in 0.10
compat mode with MSI-X disabled.  So IMHO it does make sense to tie the
savevm format to -M pc-<version>.

cheers,
   Gerd


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-25 13:40           ` Michael S. Tsirkin
@ 2009-11-25 13:59             ` Gerd Hoffmann
  2009-11-25 14:03               ` Michael S. Tsirkin
  0 siblings, 1 reply; 96+ messages in thread
From: Gerd Hoffmann @ 2009-11-25 13:59 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: dlaor, qemu-devel, Juan Quintela

On 11/25/09 14:40, Michael S. Tsirkin wrote:

>> We could add a DeviceState->savevm field and make that available as
>> property for devices which need to support multiple versions.  Then you
>> we can use the compat properties to switch back to the older format with
>> -M pc-0.10.

> I'm confused sorry. Of course when you want to migrate to qemu 0.10
> you must have a compatible machine. And savevm format has nothing
> to do with it IMO, so MSI is orthogonal to this discussion.
> It just shows that it was smart not to save MSI state when
> MSI is not present (/me pats self on the back).
>
> In this thread we were discussing changes like pvclock bug,
> where we change savevm format without changing the machine,
> or almost without changing the machine.

If 0.12 has this fixed (and thus a new version) and 0.11 hasn't, then
you'll want -M pc-0.11 to use the old (buggy) savevm version.  You have
to stay bug compatible, otherwise you can't migrate to the old buggy
version because the old qemu can't handle the new format.

Could be implemented via DeviceState->savevm as outlined above.
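
A minimal sketch of what "staying bug compatible" could look like in a save
function: the same device state is written either in the old layout or in
the new one that carries the two extra fields.  The struct, field names and
version numbers are invented for illustration, and the buffer is written in
host byte order, which real savevm code would not do.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical device state: two fields existed in version 1,
 * two more were added by a bug fix in version 2. */
typedef struct {
    uint32_t base_a, base_b;   /* present since version 1 */
    uint32_t fix_a, fix_b;     /* added in version 2 */
} SketchClockState;

/* Serialize in the requested version.  Returns the number of bytes
 * written, or 0 if the version is unknown or the buffer is too small. */
size_t sketch_clock_save(const SketchClockState *s, int version,
                         uint8_t *buf, size_t len)
{
    uint32_t fields[4] = { s->base_a, s->base_b, s->fix_a, s->fix_b };
    size_t count;

    if (version == 1) {
        count = 2;              /* old format: the fix fields are dropped */
    } else if (version == 2) {
        count = 4;              /* new format: everything is sent */
    } else {
        return 0;
    }
    if (len < count * sizeof(uint32_t)) {
        return 0;
    }
    memcpy(buf, fields, count * sizeof(uint32_t));
    return count * sizeof(uint32_t);
}
```

Writing version 1 loses the fix on the destination, which is exactly the
"old buggy version" trade-off described above.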

cheers,
   Gerd


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-25 13:59             ` Gerd Hoffmann
@ 2009-11-25 14:03               ` Michael S. Tsirkin
  2009-11-25 14:53                 ` Juan Quintela
  0 siblings, 1 reply; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-25 14:03 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: dlaor, qemu-devel, Juan Quintela

On Wed, Nov 25, 2009 at 02:59:58PM +0100, Gerd Hoffmann wrote:
> On 11/25/09 14:40, Michael S. Tsirkin wrote:
>
>>> We could add a DeviceState->savevm field and make that available as
>>> property for devices which need to support multiple versions.  Then you
>>> we can use the compat properties to switch back to the older format with
>>> -M pc-0.10.
>
>> I'm confused sorry. Of course when you want to migrate to qemu 0.10
>> you must have a compatible machine. And savevm format has nothing
>> to do with it IMO, so MSI is orthogonal to this discussion.
>> It just shows that it was smart not to save MSI state when
>> MSI is not present (/me pats self on the back).
>>
>> In this thread we were discussing changes like pvclock bug,
>> where we change savevm format without changing the machine,
>> or almost without changing the machine.
>
> If 0.12 has this fixed (and thus a new version) and 0.11 hasn't, then  
> you'll want -M pc-0.11 use the old (buggy) savevm version.  You have to  
> stay bug compatible otherwise you can't migrate to the old buggy version  
> because the old qemu can't handle the new format.
>
> Could be implemented via DeviceState->savevm as outlined above.
>
> cheers,
>   Gerd

There might be many reasons to use -M pc-0.11.  Migrating to old qemu is
only one of them.  We should not force old savevm bugs on all users that
use -M pc-0.11.  In particular, I think with time (years) we might drop
support for old savevm bugs, but I see no reason not to support old
machines indefinitely.


-- 
MST


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-25 14:10                 ` Gerd Hoffmann
@ 2009-11-25 14:09                   ` Michael S. Tsirkin
  2009-11-25 14:52                     ` Gerd Hoffmann
  2009-11-26 18:03                     ` Andrea Arcangeli
  0 siblings, 2 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-25 14:09 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: dlaor, qemu-devel, Juan Quintela

On Wed, Nov 25, 2009 at 03:10:16PM +0100, Gerd Hoffmann wrote:
>   Hi,
>
>>> Doesn't work.  If you have a qemu 0.11 machine, a virtio nic and your
>>> guest uses MSI-X you simply can't migrate to qemu 0.10.  End of story.
>>> If you want to be able to migrate to 0.10 you have to start in 0.10
>>> compat mode with MSI-X disabled.  So IMHO it does makes sense to tie the
>>> savevm format to -M pc-<version>.
>>>
>>> cheers,
>>>    Gerd
>>
>> MSI-X is an orthogonal issue, let's not mix it in.
>
> It isn't.  I was making that point with feature=MSI-X.  The same  
> argument is true for any other feature:  If $feature is new in qemu  
> $newversion and you are using it you can't migrate to qemu $oldversion  
> which hasn't $feature.  The versioned machine types turn $feature in  
> $newversion, so migration to $oldversion could work.
>
> cheers,
>   Gerd

We were discussing features that are (mostly) not user-visible.
It is clear that if you have a user-visible change you have
a different machine, so you can not migrate.

Now if you fix a bug by changing the savevm format, without user visible
changes, you *also* can not migrate, but this does not make it a
feature or a good fit for the machine description.

-- 
MST


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-25 13:42               ` Michael S. Tsirkin
@ 2009-11-25 14:10                 ` Gerd Hoffmann
  2009-11-25 14:09                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 96+ messages in thread
From: Gerd Hoffmann @ 2009-11-25 14:10 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: dlaor, qemu-devel, Juan Quintela

   Hi,

>> Doesn't work.  If you have a qemu 0.11 machine, a virtio nic and your
>> guest uses MSI-X you simply can't migrate to qemu 0.10.  End of story.
>> If you want to be able to migrate to 0.10 you have to start in 0.10
>> compat mode with MSI-X disabled.  So IMHO it does makes sense to tie the
>> savevm format to -M pc-<version>.
>>
>> cheers,
>>    Gerd
>
> MSI-X is an orthogonal issue, let's not mix it in.

It isn't.  I was making that point with feature=MSI-X.  The same
argument is true for any other feature:  If $feature is new in qemu
$newversion and you are using it you can't migrate to qemu $oldversion
which hasn't $feature.  The versioned machine types turn $feature off in
$newversion, so migration to $oldversion could work.
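
The argument can be written down as a bitmask check: migration can only work
if every feature exposed to the guest is also known to the destination, and a
versioned machine type can mask out what the older qemu lacks.  This is a
sketch only; the feature bits and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits. */
#define SKETCH_F_OLD (1u << 0)   /* present in $oldversion and $newversion */
#define SKETCH_F_NEW (1u << 1)   /* introduced in $newversion */

/* Migration is possible only if the guest sees no feature that the
 * destination does not implement. */
bool sketch_can_migrate(uint32_t exposed, uint32_t known_to_dest)
{
    return (exposed & ~known_to_dest) == 0;
}

/* A versioned machine type restricts the defaults of the running qemu
 * to the set that the older release offered. */
uint32_t sketch_apply_machine_mask(uint32_t defaults, uint32_t allowed)
{
    return defaults & allowed;
}
```

With the defaults of $newversion the check fails against $oldversion; after
applying the old machine type's mask it passes.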

cheers,
   Gerd


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-25 14:09                   ` Michael S. Tsirkin
@ 2009-11-25 14:52                     ` Gerd Hoffmann
  2009-11-26 18:03                     ` Andrea Arcangeli
  1 sibling, 0 replies; 96+ messages in thread
From: Gerd Hoffmann @ 2009-11-25 14:52 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: dlaor, qemu-devel, Juan Quintela

On 11/25/09 15:09, Michael S. Tsirkin wrote:
> We were discussing features that are (mostly) not user-visible.
> It is clear that if you have a user-visible change you have
> a different machine, so you can not migrate.
>
> Now if you fix a bug by changing savevm format, without user visible
> changes you *also* can not migrate, but this does not make it into
> feature or make it a good fit for machine description.

Well, it does fit into the machine description IMHO.  When you want to
migrate to an old qemu version you need to know what the old qemu version
understands, and the machine description for old qemu is a natural fit.

We still could make the use of this information optional though, so 
savevm/migration could use either the most recent savevm format 
(ignoring DeviceState->savevm_version) or the savevm format matching the 
machine type (using DeviceState->savevm_version) depending on what the 
user chooses.
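
A minimal sketch of that optional behaviour, assuming each device knows both
the newest format it can write and the version pinned by the machine type.
The enum, struct and function names are hypothetical.

```c
/* Hypothetical user choice for savevm/migration. */
typedef enum {
    SKETCH_SAVEVM_LATEST,        /* ignore the machine type's version */
    SKETCH_SAVEVM_MACHINE_TYPE   /* honour the machine type's version */
} SketchSavevmPolicy;

typedef struct {
    int latest_version;   /* newest format this qemu can write */
    int savevm_version;   /* format pinned by the machine type, 0 if unset */
} SketchDeviceVersions;

/* Pick the format to write for one device. */
int sketch_pick_version(const SketchDeviceVersions *d,
                        SketchSavevmPolicy policy)
{
    if (policy == SKETCH_SAVEVM_MACHINE_TYPE && d->savevm_version > 0) {
        return d->savevm_version;
    }
    return d->latest_version;
}
```

Users who run -M pc-0.11 for other reasons would pick the "latest" policy and
never see the old format.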

cheers,
   Gerd


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-25 14:03               ` Michael S. Tsirkin
@ 2009-11-25 14:53                 ` Juan Quintela
  2009-11-25 15:01                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 96+ messages in thread
From: Juan Quintela @ 2009-11-25 14:53 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: dlaor, Gerd Hoffmann, qemu-devel

"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Nov 25, 2009 at 02:59:58PM +0100, Gerd Hoffmann wrote:
>> On 11/25/09 14:40, Michael S. Tsirkin wrote:
>>
>>>> We could add a DeviceState->savevm field and make that available as
>>>> property for devices which need to support multiple versions.  Then you
>>>> we can use the compat properties to switch back to the older format with
>>>> -M pc-0.10.
>>
>>> I'm confused sorry. Of course when you want to migrate to qemu 0.10
>>> you must have a compatible machine. And savevm format has nothing
>>> to do with it IMO, so MSI is orthogonal to this discussion.
>>> It just shows that it was smart not to save MSI state when
>>> MSI is not present (/me pats self on the back).
>>>
>>> In this thread we were discussing changes like pvclock bug,
>>> where we change savevm format without changing the machine,
>>> or almost without changing the machine.
>>
>> If 0.12 has this fixed (and thus a new version) and 0.11 hasn't, then  
>> you'll want -M pc-0.11 use the old (buggy) savevm version.  You have to  
>> stay bug compatible otherwise you can't migrate to the old buggy version  
>> because the old qemu can't handle the new format.
>>
>> Could be implemented via DeviceState->savevm as outlined above.
>>
>> cheers,
>>   Gerd
>
> There might be many reasons to use -M pc-0.11.  Migrating to old qemu is
> only one of them.  We should not force old savevm bugs on all users that
> use -M pc-0.11.  In partucular, I think with time (years) we might drop
> support for old savevm bugs, but I see no reason not to support old
> machines indefinitely.

You have to get both or drop both.  If you do a savevm change, the
machine is not pc-0.11 anymore, it is pc-0.11.1 or whatever you want to
call it.

Savevm format is tied to whatever devices version you are using, really.
It just happens that for some cases, you could move between one type of
machine and another.  But that is only some cases.  And IMHO, you don't
want to generate another set of infrastructure for that case.

You need to change save formats when you change the machine description.
I think that everybody agrees here.  Now, there are times when one
machine can save in other savevm formats (notice that I am not saying
how many such cases there are).  To be able to do this, you need yet
another infrastructure.

What people are saying here (me, and now Gerd) is that the cost/benefit
of adding the new infrastructure for another use case is too high.  That
it is better to get this behaviour with new machine description types.

Obviously you disagree here.  Then we can stop discussing what cases are
doable one way and another, and go back to the "real discussion".

Does it make sense to only have one mechanism to change savevm formats
(Machine description types) or should we have Machine description types
and another one when machines can save in different formats?

Again, it is a tradeoff between amount of infrastructure and
flexibility.  But the discussion that we are having at this moment is
this one.

Notice that I haven't said yet which are more probable: bugfixes that
can be applied at savevm time or changes that need to be applied at
device creation time.

Michael, do you agree that the discussion is to have two mechanisms vs
only one?  That way, we don't have to continue creating examples :)

Later, Juan.


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-25 14:53                 ` Juan Quintela
@ 2009-11-25 15:01                   ` Michael S. Tsirkin
  0 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-25 15:01 UTC (permalink / raw)
  To: Juan Quintela; +Cc: dlaor, Gerd Hoffmann, qemu-devel

On Wed, Nov 25, 2009 at 03:53:40PM +0100, Juan Quintela wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Wed, Nov 25, 2009 at 02:59:58PM +0100, Gerd Hoffmann wrote:
> >> On 11/25/09 14:40, Michael S. Tsirkin wrote:
> >>
> >>>> We could add a DeviceState->savevm field and make that available as
> >>>> property for devices which need to support multiple versions.  Then you
> >>>> we can use the compat properties to switch back to the older format with
> >>>> -M pc-0.10.
> >>
> >>> I'm confused sorry. Of course when you want to migrate to qemu 0.10
> >>> you must have a compatible machine. And savevm format has nothing
> >>> to do with it IMO, so MSI is orthogonal to this discussion.
> >>> It just shows that it was smart not to save MSI state when
> >>> MSI is not present (/me pats self on the back).
> >>>
> >>> In this thread we were discussing changes like pvclock bug,
> >>> where we change savevm format without changing the machine,
> >>> or almost without changing the machine.
> >>
> >> If 0.12 has this fixed (and thus a new version) and 0.11 hasn't, then  
> >> you'll want -M pc-0.11 use the old (buggy) savevm version.  You have to  
> >> stay bug compatible otherwise you can't migrate to the old buggy version  
> >> because the old qemu can't handle the new format.
> >>
> >> Could be implemented via DeviceState->savevm as outlined above.
> >>
> >> cheers,
> >>   Gerd
> >
> > There might be many reasons to use -M pc-0.11.  Migrating to old qemu is
> > only one of them.  We should not force old savevm bugs on all users that
> > use -M pc-0.11.  In partucular, I think with time (years) we might drop
> > support for old savevm bugs, but I see no reason not to support old
> > machines indefinitely.
> 
> You had to get both or drop both.  If you do a savevm change, machine is
> not pc-0.11 anymore, it is pc-0.11.1  or whatever do you want to call
> it.

Well, no. machine is whatever you boot into.
If it looks the same when you first boot into it,
it is the same machine.

> Savevm format is tied to whatever devices version you are using, really.

Yes. But not the other way around.
devices are not tied to savevm format.


> It just happens that for some cases, you could move between one type of
> machine and another.  But that is only some cases.  And IMHO, you don't
> want to generate another set of infrastructure for that case.
> 
> You need to change saveformats when you change machine description.  I
> think that everybody agrees here.  Now, there are times, when one
> machine can save in other savevm formats (notice that I am not telling
> how many cases are).  To be able to do this, you need yet another
> infrastructure.

No, this is *not* what me and others are telling you.
What people are saying is that you often have many
savevm formats for the same machine.
This is the issue we are trying to address.


> What people are telling here (me, gerd now) is that the cost/benefit of
> adding  the new infrastructure for another use case is too high.  That
> it is better to get this behaviour with new machine description types.
> 
> Obviously you disagree here.

I do not disagree. But the argument you are making for me
above is not the one I am making at all.

>  Then we can stop discussing what cases are
> doable one way and another, and go back to the "real discussion".
> 
> Does it makes sense to only have one mechanism to change savevm formats
> (Machine description types) or should we have Machine descrition types
> and another one when machines can save in different formats?
> 
> Again, it is a tradeoff between amount of infrastructure and
> flexibility.  But the discussion that we are having at this moment is
> this one.
> 
> Notice that I haven't told yet what are more probable, bugfixes that can
> be changed at the savevm moment or changes that need to change at device
> creation time.
> 
> Michael, do you agree that the discusion is to have two mechanisms vs
> only one?  That way, we don't have to continue creating examples :)
> 
> Later, Juan.

Looks like we are talking about different things.

-- 
MST


* [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
       [not found]                 ` <m3skc2r66t.fsf@neno.neno>
@ 2009-11-25 16:28                   ` Michael S. Tsirkin
  0 siblings, 0 replies; 96+ messages in thread
From: Michael S. Tsirkin @ 2009-11-25 16:28 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Paolo Bonzini, Eduardo Habkost, qemu-devel

On Wed, Nov 25, 2009 at 04:10:34PM +0100, Juan Quintela wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Tue, Nov 24, 2009 at 08:33:11AM -0600, Anthony Liguori wrote:
> >> Michael S. Tsirkin wrote:
> >>> It's very easy: if their guest runs fine on the old qemu,
> >>> it should be safe to migrate there.
> >>>   
> >>
> >> "Runs fine" is a qualitative statement.  There is no way for qemu to  
> >> know whether a guest runs fine or not.
> >
> > The entity between the keyboard and chair is best placed to decide that.
> > That entity has already expressed the decision taken by running
> > the appropriate qemu monitor command.
> >
> >>  There is no way that we can make  
> >> that statement either.  It has to be something that is controlled higher  
> >> in the stack.
> 
> Not that this would be the 1st time that the entity between keyboard
> and monitor shot himself in the foot :)
> 
> I really think "strongly" that this "loose" migration is just looking
> for problems.  If we want to create migration, we want the destination to
> consume/understand all the fields.  If any of the fields are not going
> to be used, they shouldn't be sent in the 1st place.  Whether this is
> managed by a negotiation phase at savevm time, or at a high level by a
> managing application, doesn't matter.  If the target doesn't
> understand/consume all info sent during migration -> migration fails.
> 
> Notice that this is "quantitative" not "qualitative".  It is trivial to
> check if we have consumed everything or not, if anything is missing,
> etc.  On the other hand, when it is safe to not use some fields is
> way more complicated to state, and that belongs to the management
> application/savevm negotiation, etc.  Not to the poor savevm protocol.
> 
> Later, Juan.

I agree. A proposed solution to that is a "savevm description file",
specifying savevm versions to use and/or which fields should be sent.
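
The "quantitative" check quoted above is simple to express: a load succeeds
only if the section contains exactly the bytes the target expects, no fewer
and no more.  The following is a sketch with a hypothetical function name and
a flat array of 32-bit fields standing in for a real section layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load nfields 32-bit values from buf.  Returns 0 on success and -1 if
 * the section is shorter than expected or has unconsumed data left. */
int sketch_strict_load(const uint8_t *buf, size_t len,
                       uint32_t *fields, size_t nfields)
{
    size_t want = nfields * sizeof(uint32_t);

    if (len != want) {
        return -1;   /* missing data, or data the target would ignore */
    }
    memcpy(fields, buf, want);
    return 0;
}
```

Deciding which fields may safely be left out is the hard, "qualitative" part,
and stays outside this check.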

-- 
MST


* Re: [Qemu-devel] Re: Live migration protocol, device features, ABIs and other beasts
  2009-11-25 14:09                   ` Michael S. Tsirkin
  2009-11-25 14:52                     ` Gerd Hoffmann
@ 2009-11-26 18:03                     ` Andrea Arcangeli
  1 sibling, 0 replies; 96+ messages in thread
From: Andrea Arcangeli @ 2009-11-26 18:03 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Juan Quintela, dlaor, Gerd Hoffmann, qemu-devel

Very lengthy discussion, apologies if I repeat something from one of the
various threads, but I read lots of these discussions and I'm still
somewhat confused about what this is all about...

On Wed, Nov 25, 2009 at 04:09:55PM +0200, Michael S. Tsirkin wrote:
> We were discussing features that are (mostly) not user-visible.
> It is clear that if you have a user-visible change you have
> a different machine, so you can not migrate.
> 
> Now if you fix a bug by changing savevm format, without user visible
> changes you *also* can not migrate, but this does not make it into
> feature or make it a good fit for machine description.

There clearly has to be a separation already of machine definition
otherwise forward migration to new qemu version couldn't be guaranteed
in the first place!

To migrate back all we need is the ability of the new version of qemu
to write savevm in the old version format negotiated as
max(oldformats[], newformats[]). It already has to be able to "read"
the old savevm format but it wasn't required to write it yet, writing
old format is the only new requirement. The machine definition is the
old one because it comes from an old qemu and it has to be handled by
new qemu if forward migration was possible in the first place.
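
One way to read the negotiation step is "pick the highest savevm format that
both sides support".  The sketch below assumes each side advertises a plain
list of integer format versions; the function names are hypothetical and no
such handshake exists in the protocol as discussed here.

```c
#include <stddef.h>

/* Return the highest version present in both lists, or -1 if the two
 * sides have no format in common. */
int sketch_negotiate(const int *src, size_t nsrc,
                     const int *dst, size_t ndst)
{
    int best = -1;

    for (size_t i = 0; i < nsrc; i++) {
        for (size_t j = 0; j < ndst; j++) {
            if (src[i] == dst[j] && src[i] > best) {
                best = src[i];
            }
        }
    }
    return best;
}

/* A downgrade happens when the source could have written something
 * newer than the negotiated format.  Returns 1 in that case, else 0. */
int sketch_is_downgrade(int negotiated, const int *src, size_t nsrc)
{
    for (size_t i = 0; i < nsrc; i++) {
        if (src[i] > negotiated) {
            return 1;
        }
    }
    return 0;
}
```

A management layer could use the second function to decide when to show a
warning before the migration proceeds.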

Clearly the migration won't be done safely across the cluster until
all host nodes are upgraded, so I think the highlevel GUI should print
a warning when it notices a migration from new savevm format to old
savevm format. (obviously only savevm format can change here,
machine definition isn't changing if migration is possible at all and
it should just return error!).

Then in an orthogonal way (totally different problem) we need to
ensure all VMs are started with the same guest visible machine
definition (that should be true even if the savevm format doesn't
change). That is done with -M, if that's the desired API, when the
update changes qemu significantly; if we didn't change the qemu drivers
significantly, no -M parameter is needed. And if we upgrade the machine
definition, migration will simply stop, and that's a feature, not a bug.

Now how fine-grained we want the savevm format to be (versioned per
device), and how complex we want the negotiation protocol to be (to be
more extensible in the future), is all a matter of implementation
details.

In very short, all we can reasonably be discussing here is to add the
ability for new qemu to write the older (buggy) savevm format to allow
backwards migration, and to negotiate the highest savevm format for a
backwards migration at the start of the connection. There should be a
warning when a savevm format downgrade happens during migration, so the
user knows he's risking instability, and he should confirm once
negotiation is complete and the downgrade has been noticed. After that
we can still migrate (with a warning) from fixed pvclock to broken
pvclock (the latter will remain potentially unstable, which is why the
warning is required in my view), and users won't be forced to upgrade
all hosts at once to still migrate across the whole cluster.


Thread overview: 96+ messages
2009-11-22 15:03 [Qemu-devel] Live migration protocol, device features, ABIs and other beasts Dor Laor
2009-11-22 15:49 ` Anthony Liguori
2009-11-22 20:22   ` [Qemu-devel] " Paolo Bonzini
2009-11-23  2:17     ` Anthony Liguori
2009-11-23  8:18       ` Paolo Bonzini
2009-11-23 13:04         ` Anthony Liguori
2009-11-23  8:26       ` Gleb Natapov
2009-11-23  9:29         ` Paolo Bonzini
2009-11-23  9:31           ` Gleb Natapov
     [not found]             ` <m3einp4e7c.fsf@neno.neno>
2009-11-23 12:37               ` Gleb Natapov
     [not found]         ` <m3iqd14edf.fsf@neno.neno>
2009-11-23 12:36           ` Gleb Natapov
     [not found]             ` <m3r5rpwcww.fsf@neno.neno>
2009-11-23 14:32               ` Gleb Natapov
2009-11-23 14:51                 ` Anthony Liguori
2009-11-23 14:53                   ` Gleb Natapov
2009-11-23 15:05                     ` Anthony Liguori
2009-11-23 15:22                       ` Gleb Natapov
2009-11-23 15:30                         ` Paolo Bonzini
2009-11-23 15:32                         ` Anthony Liguori
2009-11-23 15:49                           ` Gleb Natapov
2009-11-23 16:09                             ` Anthony Liguori
2009-11-23 16:15                               ` Gleb Natapov
2009-11-23 16:19                                 ` Anthony Liguori
     [not found]                   ` <m33a45s009.fsf@neno.neno>
2009-11-23 16:05                     ` Gleb Natapov
2009-11-23 16:10                       ` Anthony Liguori
2009-11-24 13:28             ` Michael S. Tsirkin
2009-11-23 13:01           ` Anthony Liguori
     [not found]             ` <m3vdh1wd0n.fsf@neno.neno>
2009-11-23 14:49               ` Anthony Liguori
2009-11-23 15:21                 ` Eduardo Habkost
2009-11-23 16:16                   ` Anthony Liguori
2009-11-23 17:08                     ` Eduardo Habkost
2009-11-23 18:28                       ` Anthony Liguori
2009-11-23 19:24                         ` Eduardo Habkost
2009-11-23 19:49                           ` Anthony Liguori
2009-11-23 21:21                             ` Eduardo Habkost
2009-11-24 11:00                         ` Dor Laor
     [not found]                 ` <m3y6lxqkpv.fsf@neno.neno>
2009-11-23 16:44                   ` Anthony Liguori
     [not found]                     ` <m3zl6db11z.fsf@neno.neno>
2009-11-23 18:44                       ` Anthony Liguori
2009-11-23 20:24                     ` Eduardo Habkost
2009-11-24 13:39                 ` Michael S. Tsirkin
2009-11-23 13:51       ` Eduardo Habkost
2009-11-23 14:21         ` Paolo Bonzini
2009-11-23 15:00           ` Anthony Liguori
2009-11-23 15:37             ` Eduardo Habkost
2009-11-23 15:02           ` Eduardo Habkost
2009-11-23 15:12             ` Anthony Liguori
2009-11-24 14:26               ` [Qemu-devel] " Michael S. Tsirkin
2009-11-23 14:53         ` [Qemu-devel] " Anthony Liguori
2009-11-24 14:28           ` [Qemu-devel] " Michael S. Tsirkin
2009-11-24 14:33             ` [Qemu-devel] " Anthony Liguori
2009-11-24 16:05               ` Michael S. Tsirkin
     [not found]                 ` <m3skc2r66t.fsf@neno.neno>
2009-11-25 16:28                   ` Michael S. Tsirkin
2009-11-24 13:17       ` [Qemu-devel] " Michael S. Tsirkin
2009-11-24 13:35         ` Paul Brook
2009-11-24 13:49           ` [Qemu-devel] " Michael S. Tsirkin
2009-11-24 13:59             ` [Qemu-devel] " Paul Brook
2009-11-24 14:21               ` Michael S. Tsirkin
2009-11-24 17:06                 ` Blue Swirl
2009-11-24 17:08                   ` Michael S. Tsirkin
2009-11-24 17:43                     ` Paolo Bonzini
2009-11-24 18:51                       ` Anthony Liguori
2009-11-24 18:56                         ` Blue Swirl
2009-11-24 19:24                           ` Anthony Liguori
2009-11-24 18:57                         ` Paolo Bonzini
2009-11-24 19:29                           ` Anthony Liguori
2009-11-24 20:01                             ` Michael S. Tsirkin
     [not found]             ` <m3my2ct2qe.fsf@neno.neno>
2009-11-24 17:41               ` Paolo Bonzini
2009-11-24 13:21   ` Michael S. Tsirkin
2009-11-24 13:45     ` Anthony Liguori
2009-11-24 13:55       ` Michael S. Tsirkin
2009-11-23 12:15 ` Juan Quintela
2009-11-23 13:09   ` Anthony Liguori
2009-11-23 14:13     ` Juan Quintela
2009-11-24 14:05       ` Michael S. Tsirkin
2009-11-24 14:20         ` Juan Quintela
2009-11-24 14:35           ` Michael S. Tsirkin
2009-11-25 13:42             ` Gerd Hoffmann
2009-11-25 13:42               ` Michael S. Tsirkin
2009-11-25 14:10                 ` Gerd Hoffmann
2009-11-25 14:09                   ` Michael S. Tsirkin
2009-11-25 14:52                     ` Gerd Hoffmann
2009-11-26 18:03                     ` Andrea Arcangeli
2009-11-25 13:36         ` Gerd Hoffmann
2009-11-25 13:40           ` Michael S. Tsirkin
2009-11-25 13:59             ` Gerd Hoffmann
2009-11-25 14:03               ` Michael S. Tsirkin
2009-11-25 14:53                 ` Juan Quintela
2009-11-25 15:01                   ` Michael S. Tsirkin
2009-11-24 10:39   ` Dor Laor
2009-11-24 14:01     ` Michael S. Tsirkin
2009-11-24 14:21       ` Juan Quintela
2009-11-24 14:38         ` Michael S. Tsirkin
2009-11-24 16:05         ` Michael S. Tsirkin
2009-11-25  9:30           ` Juan Quintela
2009-11-25  9:32             ` Michael S. Tsirkin
2009-11-25 13:36               ` Juan Quintela
2009-11-24 13:59   ` Michael S. Tsirkin
