All of lore.kernel.org
 help / color / mirror / Atom feed
* QEMU/KVM migration backwards compatibility broken?
@ 2019-06-06  0:09 ` Liran Alon
  0 siblings, 0 replies; 26+ messages in thread
From: Liran Alon @ 2019-06-06  0:09 UTC (permalink / raw)
  To: kvm list, qemu-devel; +Cc: Paolo Bonzini, Dr. David Alan Gilbert

Hi,

Looking at QEMU source code, I am puzzled regarding how migration backwards compatibility is preserved regarding X86CPU.

As I understand it, fields that are based on KVM capabilities and guest runtime usage are defined in VMState subsections in order to not send them if not necessary.
This is done such that in case they are not needed and we migrate to an old QEMU which don’t support loading this state, migration will still succeed
(As .needed() method will return false and therefore this state won’t be sent as part of migration stream).
Furthermore, in case .needed() returns true and old QEMU don’t support loading this state, migration fails. As it should because we are aware that guest state
is not going to be restored properly on destination.

I’m puzzled about what will happen in the following scenario:
1) Source is running new QEMU with new KVM that supports save of some VMState subsection.
2) Destination is running new QEMU that supports load this state but with old kernel that doesn’t know how to load this state.

I would have expected in this case that if source .needed() returns true, then migration will fail because of lack of support in destination kernel.
However, it seems from current QEMU code that this will actually succeed in many cases.

For example, if msr_smi_count is sent as part of migration stream (See vmstate_msr_smi_count) and destination have has_msr_smi_count==false,
then destination will succeed loading migration stream but kvm_put_msrs() will actually ignore env->msr_smi_count and will successfully load guest state.
Therefore, migration will succeed even though it should have failed…

It seems to me that QEMU should have for every such VMState subsection, a .post_load() method that verifies that relevant capability is supported by kernel
and otherwise fail migration.

What do you think? Should I really create a patch to modify all these CPUX86 VMState subsections to behave like this?

Thanks,
-Liran

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [Qemu-devel] QEMU/KVM migration backwards compatibility broken?
@ 2019-06-06  0:09 ` Liran Alon
  0 siblings, 0 replies; 26+ messages in thread
From: Liran Alon @ 2019-06-06  0:09 UTC (permalink / raw)
  To: kvm list, qemu-devel; +Cc: Paolo Bonzini, Dr. David Alan Gilbert

Hi,

Looking at QEMU source code, I am puzzled regarding how migration backwards compatibility is preserved regarding X86CPU.

As I understand it, fields that are based on KVM capabilities and guest runtime usage are defined in VMState subsections in order to not send them if not necessary.
This is done such that in case they are not needed and we migrate to an old QEMU which don’t support loading this state, migration will still succeed
(As .needed() method will return false and therefore this state won’t be sent as part of migration stream).
Furthermore, in case .needed() returns true and old QEMU don’t support loading this state, migration fails. As it should because we are aware that guest state
is not going to be restored properly on destination.

I’m puzzled about what will happen in the following scenario:
1) Source is running new QEMU with new KVM that supports save of some VMState subsection.
2) Destination is running new QEMU that supports load this state but with old kernel that doesn’t know how to load this state.

I would have expected in this case that if source .needed() returns true, then migration will fail because of lack of support in destination kernel.
However, it seems from current QEMU code that this will actually succeed in many cases.

For example, if msr_smi_count is sent as part of migration stream (See vmstate_msr_smi_count) and destination have has_msr_smi_count==false,
then destination will succeed loading migration stream but kvm_put_msrs() will actually ignore env->msr_smi_count and will successfully load guest state.
Therefore, migration will succeed even though it should have failed…

It seems to me that QEMU should have for every such VMState subsection, a .post_load() method that verifies that relevant capability is supported by kernel
and otherwise fail migration.

What do you think? Should I really create a patch to modify all these CPUX86 VMState subsections to behave like this?

Thanks,
-Liran

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: QEMU/KVM migration backwards compatibility broken?
  2019-06-06  0:09 ` [Qemu-devel] " Liran Alon
@ 2019-06-06  8:42   ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert @ 2019-06-06  8:42 UTC (permalink / raw)
  To: Liran Alon; +Cc: kvm list, qemu-devel, Paolo Bonzini

* Liran Alon (liran.alon@oracle.com) wrote:
> Hi,
> 
> Looking at QEMU source code, I am puzzled regarding how migration backwards compatibility is preserved regarding X86CPU.
> 
> As I understand it, fields that are based on KVM capabilities and guest runtime usage are defined in VMState subsections in order to not send them if not necessary.
> This is done such that in case they are not needed and we migrate to an old QEMU which don’t support loading this state, migration will still succeed
> (As .needed() method will return false and therefore this state won’t be sent as part of migration stream).
> Furthermore, in case .needed() returns true and old QEMU don’t support loading this state, migration fails. As it should because we are aware that guest state
> is not going to be restored properly on destination.
> 
> I’m puzzled about what will happen in the following scenario:
> 1) Source is running new QEMU with new KVM that supports save of some VMState subsection.
> 2) Destination is running new QEMU that supports load this state but with old kernel that doesn’t know how to load this state.
> 
> I would have expected in this case that if source .needed() returns true, then migration will fail because of lack of support in destination kernel.
> However, it seems from current QEMU code that this will actually succeed in many cases.
> 
> For example, if msr_smi_count is sent as part of migration stream (See vmstate_msr_smi_count) and destination have has_msr_smi_count==false,
> then destination will succeed loading migration stream but kvm_put_msrs() will actually ignore env->msr_smi_count and will successfully load guest state.
> Therefore, migration will succeed even though it should have failed…
> 
> It seems to me that QEMU should have for every such VMState subsection, a .post_load() method that verifies that relevant capability is supported by kernel
> and otherwise fail migration.
> 
> What do you think? Should I really create a patch to modify all these CPUX86 VMState subsections to behave like this?

I don't know the x86 specific side that much; but from my migration side
the answer should mostly be through machine types - indeed for smi-count
there's a property 'x-migrate-smi-count' which is off for machine types
pre 2.11 (see hw/i386/pc.c pc_compat_2_11) - so if you've got an old
kernel you should stick to the old machine types.

There's nothing guarding running the new machine type on old-kernels;
and arguably we should have a check at startup that complains if
your kernel is missing something the machine type uses.
However, that would mean that people running with -M pc   would fail
on old kernels.

A post-load is also a valid check; but one question is whether,
for a particular register, the pain is worth it - it depends on the
symptom that the missing state causes.  If it's minor then you might
conclude it's not worth a failed migration;  if it's a hung or
corrupt guest then yes it is.   Certainly a warning printed is worth
it.

Dave

> Thanks,
> -Liran
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] QEMU/KVM migration backwards compatibility broken?
@ 2019-06-06  8:42   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert @ 2019-06-06  8:42 UTC (permalink / raw)
  To: Liran Alon; +Cc: Paolo Bonzini, qemu-devel, kvm list

* Liran Alon (liran.alon@oracle.com) wrote:
> Hi,
> 
> Looking at QEMU source code, I am puzzled regarding how migration backwards compatibility is preserved regarding X86CPU.
> 
> As I understand it, fields that are based on KVM capabilities and guest runtime usage are defined in VMState subsections in order to not send them if not necessary.
> This is done such that in case they are not needed and we migrate to an old QEMU which don’t support loading this state, migration will still succeed
> (As .needed() method will return false and therefore this state won’t be sent as part of migration stream).
> Furthermore, in case .needed() returns true and old QEMU don’t support loading this state, migration fails. As it should because we are aware that guest state
> is not going to be restored properly on destination.
> 
> I’m puzzled about what will happen in the following scenario:
> 1) Source is running new QEMU with new KVM that supports save of some VMState subsection.
> 2) Destination is running new QEMU that supports load this state but with old kernel that doesn’t know how to load this state.
> 
> I would have expected in this case that if source .needed() returns true, then migration will fail because of lack of support in destination kernel.
> However, it seems from current QEMU code that this will actually succeed in many cases.
> 
> For example, if msr_smi_count is sent as part of migration stream (See vmstate_msr_smi_count) and destination have has_msr_smi_count==false,
> then destination will succeed loading migration stream but kvm_put_msrs() will actually ignore env->msr_smi_count and will successfully load guest state.
> Therefore, migration will succeed even though it should have failed…
> 
> It seems to me that QEMU should have for every such VMState subsection, a .post_load() method that verifies that relevant capability is supported by kernel
> and otherwise fail migration.
> 
> What do you think? Should I really create a patch to modify all these CPUX86 VMState subsections to behave like this?

I don't know the x86 specific side that much; but from my migration side
the answer should mostly be through machine types - indeed for smi-count
there's a property 'x-migrate-smi-count' which is off for machine types
pre 2.11 (see hw/i386/pc.c pc_compat_2_11) - so if you've got an old
kernel you should stick to the old machine types.

There's nothing guarding running the new machine type on old-kernels;
and arguably we should have a check at startup that complains if
your kernel is missing something the machine type uses.
However, that would mean that people running with -M pc   would fail
on old kernels.

A post-load is also a valid check; but one question is whether,
for a particular register, the pain is worth it - it depends on the
symptom that the missing state causes.  If it's minor then you might
conclude it's not worth a failed migration;  if it's a hung or
corrupt guest then yes it is.   Certainly a warning printed is worth
it.

Dave

> Thanks,
> -Liran
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: QEMU/KVM migration backwards compatibility broken?
  2019-06-06  8:42   ` [Qemu-devel] " Dr. David Alan Gilbert
@ 2019-06-06  9:11     ` Liran Alon
  -1 siblings, 0 replies; 26+ messages in thread
From: Liran Alon @ 2019-06-06  9:11 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: kvm list, qemu-devel, Paolo Bonzini



> On 6 Jun 2019, at 11:42, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> 
> * Liran Alon (liran.alon@oracle.com) wrote:
>> Hi,
>> 
>> Looking at QEMU source code, I am puzzled regarding how migration backwards compatibility is preserved regarding X86CPU.
>> 
>> As I understand it, fields that are based on KVM capabilities and guest runtime usage are defined in VMState subsections in order to not send them if not necessary.
>> This is done such that in case they are not needed and we migrate to an old QEMU which don’t support loading this state, migration will still succeed
>> (As .needed() method will return false and therefore this state won’t be sent as part of migration stream).
>> Furthermore, in case .needed() returns true and old QEMU don’t support loading this state, migration fails. As it should because we are aware that guest state
>> is not going to be restored properly on destination.
>> 
>> I’m puzzled about what will happen in the following scenario:
>> 1) Source is running new QEMU with new KVM that supports save of some VMState subsection.
>> 2) Destination is running new QEMU that supports load this state but with old kernel that doesn’t know how to load this state.
>> 
>> I would have expected in this case that if source .needed() returns true, then migration will fail because of lack of support in destination kernel.
>> However, it seems from current QEMU code that this will actually succeed in many cases.
>> 
>> For example, if msr_smi_count is sent as part of migration stream (See vmstate_msr_smi_count) and destination have has_msr_smi_count==false,
>> then destination will succeed loading migration stream but kvm_put_msrs() will actually ignore env->msr_smi_count and will successfully load guest state.
>> Therefore, migration will succeed even though it should have failed…
>> 
>> It seems to me that QEMU should have for every such VMState subsection, a .post_load() method that verifies that relevant capability is supported by kernel
>> and otherwise fail migration.
>> 
>> What do you think? Should I really create a patch to modify all these CPUX86 VMState subsections to behave like this?
> 
> I don't know the x86 specific side that much; but from my migration side
> the answer should mostly be through machine types - indeed for smi-count
> there's a property 'x-migrate-smi-count' which is off for machine types
> pre 2.11 (see hw/i386/pc.c pc_compat_2_11) - so if you've got an old
> kernel you should stick to the old machine types.
> 
> There's nothing guarding running the new machine type on old-kernels;
> and arguably we should have a check at startup that complains if
> your kernel is missing something the machine type uses.
> However, that would mean that people running with -M pc   would fail
> on old kernels.
> 
> A post-load is also a valid check; but one question is whether,
> for a particular register, the pain is worth it - it depends on the
> symptom that the missing state causes.  If it's minor then you might
> conclude it's not worth a failed migration;  if it's a hung or
> corrupt guest then yes it is.   Certainly a warning printed is worth
> it.
> 
> Dave

I think we should have flags that allow user to specify which VMState subsections user explicitly allow to avoid restore even though they are required to fully restore guest state.
But it seems to me that the behaviour should be to always fail migration in case we load a VMState subsections that we are unable to restore unless user explicitly specified this is ok
for this specific subsection.

Therefore, it seems that for every VMState subsection that it’s restore is based on kernel capability we should:
1) Have a user-controllable flag (which is also tied to machine-type?) to explicitly allow avoid restoring this state if cannot. Default should be “false”.
2) Have a .post_load() method that verifies we have required kernel capability to restore this state, unless flag (1) was specified as “true”.

Note that above mentioned flags is different than flags such as “x-migrate-smi-count”.
The purpose of “x-migrate-smi-count” flag is to avoid sending the VMState subsection to begin with in case we know we migrate to older QEMU which don’t even have the relevant VMState subsection. But it is not relevant for the case both source and destination runs QEMU which understands the VMState subsection but run on kernels with different capabilities.

Also note regarding your first paragraph, that specifying flags based on kernel you are running on doesn’t help for the case discussed here.
As source QEMU is running on new kernel. Unless you meant that source QEMU should use relevant machine-type based on the destination kernel.
i.e. You should launch QEMU with old machine-type as long as you have hosts in your migration pool that runs with old kernel.
I don’ think it’s the right approach though. As there is no way to change flags such as “x-migrate-smi-count” dynamically after all hosts in migration pool have been upgraded.

What do you think?

-Liran

> 
>> Thanks,
>> -Liran
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] QEMU/KVM migration backwards compatibility broken?
@ 2019-06-06  9:11     ` Liran Alon
  0 siblings, 0 replies; 26+ messages in thread
From: Liran Alon @ 2019-06-06  9:11 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Paolo Bonzini, qemu-devel, kvm list



> On 6 Jun 2019, at 11:42, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> 
> * Liran Alon (liran.alon@oracle.com) wrote:
>> Hi,
>> 
>> Looking at QEMU source code, I am puzzled regarding how migration backwards compatibility is preserved regarding X86CPU.
>> 
>> As I understand it, fields that are based on KVM capabilities and guest runtime usage are defined in VMState subsections in order to not send them if not necessary.
>> This is done such that in case they are not needed and we migrate to an old QEMU which don’t support loading this state, migration will still succeed
>> (As .needed() method will return false and therefore this state won’t be sent as part of migration stream).
>> Furthermore, in case .needed() returns true and old QEMU don’t support loading this state, migration fails. As it should because we are aware that guest state
>> is not going to be restored properly on destination.
>> 
>> I’m puzzled about what will happen in the following scenario:
>> 1) Source is running new QEMU with new KVM that supports save of some VMState subsection.
>> 2) Destination is running new QEMU that supports load this state but with old kernel that doesn’t know how to load this state.
>> 
>> I would have expected in this case that if source .needed() returns true, then migration will fail because of lack of support in destination kernel.
>> However, it seems from current QEMU code that this will actually succeed in many cases.
>> 
>> For example, if msr_smi_count is sent as part of migration stream (See vmstate_msr_smi_count) and destination have has_msr_smi_count==false,
>> then destination will succeed loading migration stream but kvm_put_msrs() will actually ignore env->msr_smi_count and will successfully load guest state.
>> Therefore, migration will succeed even though it should have failed…
>> 
>> It seems to me that QEMU should have for every such VMState subsection, a .post_load() method that verifies that relevant capability is supported by kernel
>> and otherwise fail migration.
>> 
>> What do you think? Should I really create a patch to modify all these CPUX86 VMState subsections to behave like this?
> 
> I don't know the x86 specific side that much; but from my migration side
> the answer should mostly be through machine types - indeed for smi-count
> there's a property 'x-migrate-smi-count' which is off for machine types
> pre 2.11 (see hw/i386/pc.c pc_compat_2_11) - so if you've got an old
> kernel you should stick to the old machine types.
> 
> There's nothing guarding running the new machine type on old-kernels;
> and arguably we should have a check at startup that complains if
> your kernel is missing something the machine type uses.
> However, that would mean that people running with -M pc   would fail
> on old kernels.
> 
> A post-load is also a valid check; but one question is whether,
> for a particular register, the pain is worth it - it depends on the
> symptom that the missing state causes.  If it's minor then you might
> conclude it's not worth a failed migration;  if it's a hung or
> corrupt guest then yes it is.   Certainly a warning printed is worth
> it.
> 
> Dave

I think we should have flags that allow user to specify which VMState subsections user explicitly allow to avoid restore even though they are required to fully restore guest state.
But it seems to me that the behaviour should be to always fail migration in case we load a VMState subsections that we are unable to restore unless user explicitly specified this is ok
for this specific subsection.

Therefore, it seems that for every VMState subsection that it’s restore is based on kernel capability we should:
1) Have a user-controllable flag (which is also tied to machine-type?) to explicitly allow avoid restoring this state if cannot. Default should be “false”.
2) Have a .post_load() method that verifies we have required kernel capability to restore this state, unless flag (1) was specified as “true”.

Note that above mentioned flags is different than flags such as “x-migrate-smi-count”.
The purpose of “x-migrate-smi-count” flag is to avoid sending the VMState subsection to begin with in case we know we migrate to older QEMU which don’t even have the relevant VMState subsection. But it is not relevant for the case both source and destination runs QEMU which understands the VMState subsection but run on kernels with different capabilities.

Also note regarding your first paragraph, that specifying flags based on kernel you are running on doesn’t help for the case discussed here.
As source QEMU is running on new kernel. Unless you meant that source QEMU should use relevant machine-type based on the destination kernel.
i.e. You should launch QEMU with old machine-type as long as you have hosts in your migration pool that runs with old kernel.
I don’ think it’s the right approach though. As there is no way to change flags such as “x-migrate-smi-count” dynamically after all hosts in migration pool have been upgraded.

What do you think?

-Liran

> 
>> Thanks,
>> -Liran
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: QEMU/KVM migration backwards compatibility broken?
  2019-06-06  9:11     ` [Qemu-devel] " Liran Alon
@ 2019-06-06  9:23       ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert @ 2019-06-06  9:23 UTC (permalink / raw)
  To: Liran Alon; +Cc: kvm list, qemu-devel, Paolo Bonzini

* Liran Alon (liran.alon@oracle.com) wrote:
> 
> 
> > On 6 Jun 2019, at 11:42, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> > 
> > * Liran Alon (liran.alon@oracle.com) wrote:
> >> Hi,
> >> 
> >> Looking at QEMU source code, I am puzzled regarding how migration backwards compatibility is preserved regarding X86CPU.
> >> 
> >> As I understand it, fields that are based on KVM capabilities and guest runtime usage are defined in VMState subsections in order to not send them if not necessary.
> >> This is done such that in case they are not needed and we migrate to an old QEMU which don’t support loading this state, migration will still succeed
> >> (As .needed() method will return false and therefore this state won’t be sent as part of migration stream).
> >> Furthermore, in case .needed() returns true and old QEMU don’t support loading this state, migration fails. As it should because we are aware that guest state
> >> is not going to be restored properly on destination.
> >> 
> >> I’m puzzled about what will happen in the following scenario:
> >> 1) Source is running new QEMU with new KVM that supports save of some VMState subsection.
> >> 2) Destination is running new QEMU that supports load this state but with old kernel that doesn’t know how to load this state.
> >> 
> >> I would have expected in this case that if source .needed() returns true, then migration will fail because of lack of support in destination kernel.
> >> However, it seems from current QEMU code that this will actually succeed in many cases.
> >> 
> >> For example, if msr_smi_count is sent as part of migration stream (See vmstate_msr_smi_count) and destination have has_msr_smi_count==false,
> >> then destination will succeed loading migration stream but kvm_put_msrs() will actually ignore env->msr_smi_count and will successfully load guest state.
> >> Therefore, migration will succeed even though it should have failed…
> >> 
> >> It seems to me that QEMU should have for every such VMState subsection, a .post_load() method that verifies that relevant capability is supported by kernel
> >> and otherwise fail migration.
> >> 
> >> What do you think? Should I really create a patch to modify all these CPUX86 VMState subsections to behave like this?
> > 
> > I don't know the x86 specific side that much; but from my migration side
> > the answer should mostly be through machine types - indeed for smi-count
> > there's a property 'x-migrate-smi-count' which is off for machine types
> > pre 2.11 (see hw/i386/pc.c pc_compat_2_11) - so if you've got an old
> > kernel you should stick to the old machine types.
> > 
> > There's nothing guarding running the new machine type on old-kernels;
> > and arguably we should have a check at startup that complains if
> > your kernel is missing something the machine type uses.
> > However, that would mean that people running with -M pc   would fail
> > on old kernels.
> > 
> > A post-load is also a valid check; but one question is whether,
> > for a particular register, the pain is worth it - it depends on the
> > symptom that the missing state causes.  If it's minor then you might
> > conclude it's not worth a failed migration;  if it's a hung or
> > corrupt guest then yes it is.   Certainly a warning printed is worth
> > it.
> > 
> > Dave
> 
> I think we should have flags that allow user to specify which VMState subsections user explicitly allow to avoid restore even though they are required to fully restore guest state.
> But it seems to me that the behaviour should be to always fail migration in case we load a VMState subsections that we are unable to restore unless user explicitly specified this is ok
> for this specific subsection.
> Therefore, it seems that for every VMState subsection that it’s restore is based on kernel capability we should:
> 1) Have a user-controllable flag (which is also tied to machine-type?) to explicitly allow avoid restoring this state if cannot. Default should be “false”.
> 2) Have a .post_load() method that verifies we have required kernel capability to restore this state, unless flag (1) was specified as “true”.

This seems a lot of flags; users aren't going to know what to do with
all of them; I don't see what will set/control them.

> Note that above mentioned flags is different than flags such as “x-migrate-smi-count”.
> The purpose of “x-migrate-smi-count” flag is to avoid sending the VMState subsection to begin with in case we know we migrate to older QEMU which don’t even have the relevant VMState subsection. But it is not relevant for the case both source and destination runs QEMU which understands the VMState subsection but run on kernels with different capabilities.
> 
> Also note regarding your first paragraph, that specifying flags based on kernel you are running on doesn’t help for the case discussed here.
> As source QEMU is running on new kernel. Unless you meant that source QEMU should use relevant machine-type based on the destination kernel.
> i.e. You should launch QEMU with old machine-type as long as you have hosts in your migration pool that runs with old kernel.

That's what I meant; stick to the old machine-type unless you know it's
safe to use a newer one.

> I don’ think it’s the right approach though. As there is no way to change flags such as “x-migrate-smi-count” dynamically after all hosts in migration pool have been upgraded.
> 
> What do you think?

I don't have an easy answer.  The users already have to make sure they
use a machine type that's old enough for all the QEMUs installed in
their cluster; making sure it's also old enough for their oldest
kernel isn't too big a difference - *except* that it's much harder to
tell which kernel corresponds to which feature/machine type etc - so
how does a user know what the newest supported machine type is?
Failing at startup when selecting a machine type that the current
kernel can't support would help that.

Dave

> -Liran
> 
> > 
> >> Thanks,
> >> -Liran
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] QEMU/KVM migration backwards compatibility broken?
@ 2019-06-06  9:23       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert @ 2019-06-06  9:23 UTC (permalink / raw)
  To: Liran Alon; +Cc: Paolo Bonzini, qemu-devel, kvm list

* Liran Alon (liran.alon@oracle.com) wrote:
> 
> 
> > On 6 Jun 2019, at 11:42, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> > 
> > * Liran Alon (liran.alon@oracle.com) wrote:
> >> Hi,
> >> 
> >> Looking at QEMU source code, I am puzzled regarding how migration backwards compatibility is preserved regarding X86CPU.
> >> 
> >> As I understand it, fields that are based on KVM capabilities and guest runtime usage are defined in VMState subsections in order to not send them if not necessary.
> >> This is done such that in case they are not needed and we migrate to an old QEMU which don’t support loading this state, migration will still succeed
> >> (As .needed() method will return false and therefore this state won’t be sent as part of migration stream).
> >> Furthermore, in case .needed() returns true and old QEMU don’t support loading this state, migration fails. As it should because we are aware that guest state
> >> is not going to be restored properly on destination.
> >> 
> >> I’m puzzled about what will happen in the following scenario:
> >> 1) Source is running new QEMU with new KVM that supports save of some VMState subsection.
> >> 2) Destination is running new QEMU that supports load this state but with old kernel that doesn’t know how to load this state.
> >> 
> >> I would have expected in this case that if source .needed() returns true, then migration will fail because of lack of support in destination kernel.
> >> However, it seems from current QEMU code that this will actually succeed in many cases.
> >> 
> >> For example, if msr_smi_count is sent as part of migration stream (See vmstate_msr_smi_count) and destination have has_msr_smi_count==false,
> >> then destination will succeed loading migration stream but kvm_put_msrs() will actually ignore env->msr_smi_count and will successfully load guest state.
> >> Therefore, migration will succeed even though it should have failed…
> >> 
> >> It seems to me that QEMU should have for every such VMState subsection, a .post_load() method that verifies that relevant capability is supported by kernel
> >> and otherwise fail migration.
> >> 
> >> What do you think? Should I really create a patch to modify all these CPUX86 VMState subsections to behave like this?
> > 
> > I don't know the x86 specific side that much; but from my migration side
> > the answer should mostly be through machine types - indeed for smi-count
> > there's a property 'x-migrate-smi-count' which is off for machine types
> > pre 2.11 (see hw/i386/pc.c pc_compat_2_11) - so if you've got an old
> > kernel you should stick to the old machine types.
> > 
> > There's nothing guarding running the new machine type on old-kernels;
> > and arguably we should have a check at startup that complains if
> > your kernel is missing something the machine type uses.
> > However, that would mean that people running with -M pc   would fail
> > on old kernels.
> > 
> > A post-load is also a valid check; but one question is whether,
> > for a particular register, the pain is worth it - it depends on the
> > symptom that the missing state causes.  If it's minor then you might
> > conclude it's not worth a failed migration;  if it's a hung or
> > corrupt guest then yes it is.   Certainly a warning printed is worth
> > it.
> > 
> > Dave
> 
> I think we should have flags that allow user to specify which VMState subsections user explicitly allow to avoid restore even though they are required to fully restore guest state.
> But it seems to me that the behaviour should be to always fail migration in case we load a VMState subsections that we are unable to restore unless user explicitly specified this is ok
> for this specific subsection.
> Therefore, it seems that for every VMState subsection that it’s restore is based on kernel capability we should:
> 1) Have a user-controllable flag (which is also tied to machine-type?) to explicitly allow avoid restoring this state if cannot. Default should be “false”.
> 2) Have a .post_load() method that verifies we have required kernel capability to restore this state, unless flag (1) was specified as “true”.

This seems a lot of flags; users aren't going to know what to do with
all of them; I don't see what will set/control them.

> Note that above mentioned flags is different than flags such as “x-migrate-smi-count”.
> The purpose of “x-migrate-smi-count” flag is to avoid sending the VMState subsection to begin with in case we know we migrate to older QEMU which don’t even have the relevant VMState subsection. But it is not relevant for the case both source and destination runs QEMU which understands the VMState subsection but run on kernels with different capabilities.
> 
> Also note regarding your first paragraph, that specifying flags based on kernel you are running on doesn’t help for the case discussed here.
> As source QEMU is running on new kernel. Unless you meant that source QEMU should use relevant machine-type based on the destination kernel.
> i.e. You should launch QEMU with old machine-type as long as you have hosts in your migration pool that runs with old kernel.

That's what I meant; stick to the old machine-type unless you know it's
safe to use a newer one.

> I don’ think it’s the right approach though. As there is no way to change flags such as “x-migrate-smi-count” dynamically after all hosts in migration pool have been upgraded.
> 
> What do you think?

I don't have an easy answer.  The users already have to make sure they
use a machine type that's old enough for all the QEMUs installed in
their cluster; making sure it's also old enough for their oldest
kernel isn't too big a difference - *except* that it's much harder to
tell which kernel corresponds to which feature/machine type etc - so
how does a user know what the newest supported machine type is?
Failing at startup when selecting a machine type that the current
kernel can't support would help that.

Dave

> -Liran
> 
> > 
> >> Thanks,
> >> -Liran
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: QEMU/KVM migration backwards compatibility broken?
  2019-06-06  9:23       ` [Qemu-devel] " Dr. David Alan Gilbert
@ 2019-06-06 10:09         ` Liran Alon
  -1 siblings, 0 replies; 26+ messages in thread
From: Liran Alon @ 2019-06-06 10:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: kvm list, qemu-devel, Paolo Bonzini



> On 6 Jun 2019, at 12:23, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> 
> * Liran Alon (liran.alon@oracle.com) wrote:
>> 
>> 
>>> On 6 Jun 2019, at 11:42, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
>>> 
>>> * Liran Alon (liran.alon@oracle.com) wrote:
>>>> Hi,
>>>> 
>>>> Looking at QEMU source code, I am puzzled regarding how migration backwards compatibility is preserved regarding X86CPU.
>>>> 
>>>> As I understand it, fields that are based on KVM capabilities and guest runtime usage are defined in VMState subsections in order to not send them if not necessary.
>>>> This is done such that in case they are not needed and we migrate to an old QEMU which don’t support loading this state, migration will still succeed
>>>> (As .needed() method will return false and therefore this state won’t be sent as part of migration stream).
>>>> Furthermore, in case .needed() returns true and old QEMU don’t support loading this state, migration fails. As it should because we are aware that guest state
>>>> is not going to be restored properly on destination.
>>>> 
>>>> I’m puzzled about what will happen in the following scenario:
>>>> 1) Source is running new QEMU with new KVM that supports save of some VMState subsection.
>>>> 2) Destination is running new QEMU that supports load this state but with old kernel that doesn’t know how to load this state.
>>>> 
>>>> I would have expected in this case that if source .needed() returns true, then migration will fail because of lack of support in destination kernel.
>>>> However, it seems from current QEMU code that this will actually succeed in many cases.
>>>> 
>>>> For example, if msr_smi_count is sent as part of migration stream (See vmstate_msr_smi_count) and destination have has_msr_smi_count==false,
>>>> then destination will succeed loading migration stream but kvm_put_msrs() will actually ignore env->msr_smi_count and will successfully load guest state.
>>>> Therefore, migration will succeed even though it should have failed…
>>>> 
>>>> It seems to me that QEMU should have for every such VMState subsection, a .post_load() method that verifies that relevant capability is supported by kernel
>>>> and otherwise fail migration.
>>>> 
>>>> What do you think? Should I really create a patch to modify all these CPUX86 VMState subsections to behave like this?
>>> 
>>> I don't know the x86 specific side that much; but from my migration side
>>> the answer should mostly be through machine types - indeed for smi-count
>>> there's a property 'x-migrate-smi-count' which is off for machine types
>>> pre 2.11 (see hw/i386/pc.c pc_compat_2_11) - so if you've got an old
>>> kernel you should stick to the old machine types.
>>> 
>>> There's nothing guarding running the new machine type on old-kernels;
>>> and arguably we should have a check at startup that complains if
>>> your kernel is missing something the machine type uses.
>>> However, that would mean that people running with -M pc   would fail
>>> on old kernels.
>>> 
>>> A post-load is also a valid check; but one question is whether,
>>> for a particular register, the pain is worth it - it depends on the
>>> symptom that the missing state causes.  If it's minor then you might
>>> conclude it's not worth a failed migration;  if it's a hung or
>>> corrupt guest then yes it is.   Certainly a warning printed is worth
>>> it.
>>> 
>>> Dave
>> 
>> I think we should have flags that allow user to specify which VMState subsections user explicitly allow to avoid restore even though they are required to fully restore guest state.
>> But it seems to me that the behaviour should be to always fail migration in case we load a VMState subsections that we are unable to restore unless user explicitly specified this is ok
>> for this specific subsection.
>> Therefore, it seems that for every VMState subsection that it’s restore is based on kernel capability we should:
>> 1) Have a user-controllable flag (which is also tied to machine-type?) to explicitly allow avoid restoring this state if cannot. Default should be “false”.
>> 2) Have a .post_load() method that verifies we have required kernel capability to restore this state, unless flag (1) was specified as “true”.
> 
> This seems a lot of flags; users aren't going to know what to do with
> all of them; I don't see what will set/control them.

True but I think users will want to specify only for a handful of VMState subsections that it is OK to not restore them even thought hey are deemed needed by source QEMU.
We can create flags only for those VMState subsections.
User should set these flags explicitly on QEMU command-line. As a “-cpu” property? I don’t think these flags should be tied to machine-type.

> 
>> Note that above mentioned flags is different than flags such as “x-migrate-smi-count”.
>> The purpose of “x-migrate-smi-count” flag is to avoid sending the VMState subsection to begin with in case we know we migrate to older QEMU which don’t even have the relevant VMState subsection. But it is not relevant for the case both source and destination runs QEMU which understands the VMState subsection but run on kernels with different capabilities.
>> 
>> Also note regarding your first paragraph, that specifying flags based on kernel you are running on doesn’t help for the case discussed here.
>> As source QEMU is running on new kernel. Unless you meant that source QEMU should use relevant machine-type based on the destination kernel.
>> i.e. You should launch QEMU with old machine-type as long as you have hosts in your migration pool that runs with old kernel.
> 
> That's what I meant; stick to the old machine-type unless you know it's
> safe to use a newer one.
> 
>> I don’ think it’s the right approach though. As there is no way to change flags such as “x-migrate-smi-count” dynamically after all hosts in migration pool have been upgraded.
>> 
>> What do you think?
> 
> I don't have an easy answer.  The users already have to make sure they
> use a machine type that's old enough for all the QEMUs installed in
> their cluster; making sure it's also old enough for their oldest
> kernel isn't too big a difference - *except* that it's much harder to
> tell which kernel corresponds to which feature/machine type etc - so
> how does a user know what the newest supported machine type is?
> Failing at startup when selecting a machine type that the current
> kernel can't support would help that.
> 
> Dave

First, machine-type express the set of vHW behaviour and properties that is exposed to guest.
Therefore, machine-type shouldn’t change for a given guest lifetime (including Live-Migrations).
Otherwise, guest will experience different vHW behaviour and properties before/after Live-Migration.
So I think machine-type is not relevant for this discussion. We should focus on flags which specify
migration behaviour (such as “x-migrate-smi-count” which can also be controlled by machine-type but not only).

Second, this strategy results in inefficient migration management. Consider the following scenario:
1) Guest running on new_qemu+old_kernel migrate to host with new_qemu+new_kernel.
Because source is old_kernel than destination QEMU is launched with (x-migrate-smi-count == false).
2) Assume at this point fleet of hosts have half of hosts with old_kernel and half with new_kernel.
3) Further assume that guest workload indeed use msr_smi_count and therefore relevant VMState subsection should be sent to properly preserve guest state.
4) From some reason, we decide to migrate again the guest in (1).
Even if guest is migrated to a host with new_kernel, then QEMU still avoids sending msr_smi_count VMState subsection because it is launched with (x-migrate-smi-count == false).

Therefore, I think it makes more sense that source QEMU will always send all VMState subsection that are deemed needed (i.e. .nedeed() returns true)
and let receive-side decide if migration should fail if this subsection was sent but failed to be restored.
The only case which I think sender should limit the VMState subsection it sends to destination is because source is running older QEMU
which is not even aware of this VMState subsection (Which is to my understanding the rational behind using “x-migrate-smi-count” and tie it up to machine-type).

Third, let’s assume all hosts in fleet was upgraded to new_kernel. How do I modify all launched QEMUs on these new hosts to now have “x-migrate-smi-count” set to true?
As I would like future migrations to do send this VMState subsection. Currently there is no QMP command to update these flags.

Fourth, I think it’s not trivial for management-plane to be aware with which flags it should set on destination QEMU based on currently running kernels on fleet.
It’s not the same as machine-type, as already discussed above doesn’t change during the entire lifetime of guest.

I’m also not sure it is a good idea that we currently control flags such as “x-migrate-smi-count” from machine-type.
As it means that if a guest was initially launched using some old QEMU, it will *forever* not migrate some VMState subsection during all it’s Live-Migrations.
Even if all hosts and all QEMUs on fleet are capable of migrating this state properly.
Maybe it is preferred that this flag was specified as part of “migrate” command itself in case management-plane knows it wishes to migrate even though dest QEMU
is older and doesn’t understand this specific VMState subsection.

I’m left pretty confused about QEMU’s migration compatibility strategy...

-Liran

> 
>> -Liran
>> 
>>> 
>>>> Thanks,
>>>> -Liran
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>> 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] QEMU/KVM migration backwards compatibility broken?
@ 2019-06-06 10:09         ` Liran Alon
  0 siblings, 0 replies; 26+ messages in thread
From: Liran Alon @ 2019-06-06 10:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Paolo Bonzini, qemu-devel, kvm list



> On 6 Jun 2019, at 12:23, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> 
> * Liran Alon (liran.alon@oracle.com) wrote:
>> 
>> 
>>> On 6 Jun 2019, at 11:42, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
>>> 
>>> * Liran Alon (liran.alon@oracle.com) wrote:
>>>> Hi,
>>>> 
>>>> Looking at QEMU source code, I am puzzled regarding how migration backwards compatibility is preserved regarding X86CPU.
>>>> 
>>>> As I understand it, fields that are based on KVM capabilities and guest runtime usage are defined in VMState subsections in order to not send them if not necessary.
>>>> This is done such that in case they are not needed and we migrate to an old QEMU which don’t support loading this state, migration will still succeed
>>>> (As .needed() method will return false and therefore this state won’t be sent as part of migration stream).
>>>> Furthermore, in case .needed() returns true and old QEMU don’t support loading this state, migration fails. As it should because we are aware that guest state
>>>> is not going to be restored properly on destination.
>>>> 
>>>> I’m puzzled about what will happen in the following scenario:
>>>> 1) Source is running new QEMU with new KVM that supports save of some VMState subsection.
>>>> 2) Destination is running new QEMU that supports load this state but with old kernel that doesn’t know how to load this state.
>>>> 
>>>> I would have expected in this case that if source .needed() returns true, then migration will fail because of lack of support in destination kernel.
>>>> However, it seems from current QEMU code that this will actually succeed in many cases.
>>>> 
>>>> For example, if msr_smi_count is sent as part of migration stream (See vmstate_msr_smi_count) and destination have has_msr_smi_count==false,
>>>> then destination will succeed loading migration stream but kvm_put_msrs() will actually ignore env->msr_smi_count and will successfully load guest state.
>>>> Therefore, migration will succeed even though it should have failed…
>>>> 
>>>> It seems to me that QEMU should have for every such VMState subsection, a .post_load() method that verifies that relevant capability is supported by kernel
>>>> and otherwise fail migration.
>>>> 
>>>> What do you think? Should I really create a patch to modify all these CPUX86 VMState subsections to behave like this?
>>> 
>>> I don't know the x86 specific side that much; but from my migration side
>>> the answer should mostly be through machine types - indeed for smi-count
>>> there's a property 'x-migrate-smi-count' which is off for machine types
>>> pre 2.11 (see hw/i386/pc.c pc_compat_2_11) - so if you've got an old
>>> kernel you should stick to the old machine types.
>>> 
>>> There's nothing guarding running the new machine type on old-kernels;
>>> and arguably we should have a check at startup that complains if
>>> your kernel is missing something the machine type uses.
>>> However, that would mean that people running with -M pc   would fail
>>> on old kernels.
>>> 
>>> A post-load is also a valid check; but one question is whether,
>>> for a particular register, the pain is worth it - it depends on the
>>> symptom that the missing state causes.  If it's minor then you might
>>> conclude it's not worth a failed migration;  if it's a hung or
>>> corrupt guest then yes it is.   Certainly a warning printed is worth
>>> it.
>>> 
>>> Dave
>> 
>> I think we should have flags that allow user to specify which VMState subsections user explicitly allow to avoid restore even though they are required to fully restore guest state.
>> But it seems to me that the behaviour should be to always fail migration in case we load a VMState subsections that we are unable to restore unless user explicitly specified this is ok
>> for this specific subsection.
>> Therefore, it seems that for every VMState subsection that it’s restore is based on kernel capability we should:
>> 1) Have a user-controllable flag (which is also tied to machine-type?) to explicitly allow avoid restoring this state if cannot. Default should be “false”.
>> 2) Have a .post_load() method that verifies we have required kernel capability to restore this state, unless flag (1) was specified as “true”.
> 
> This seems a lot of flags; users aren't going to know what to do with
> all of them; I don't see what will set/control them.

True but I think users will want to specify only for a handful of VMState subsections that it is OK to not restore them even thought hey are deemed needed by source QEMU.
We can create flags only for those VMState subsections.
User should set these flags explicitly on QEMU command-line. As a “-cpu” property? I don’t think these flags should be tied to machine-type.

> 
>> Note that above mentioned flags is different than flags such as “x-migrate-smi-count”.
>> The purpose of “x-migrate-smi-count” flag is to avoid sending the VMState subsection to begin with in case we know we migrate to older QEMU which don’t even have the relevant VMState subsection. But it is not relevant for the case both source and destination runs QEMU which understands the VMState subsection but run on kernels with different capabilities.
>> 
>> Also note regarding your first paragraph, that specifying flags based on kernel you are running on doesn’t help for the case discussed here.
>> As source QEMU is running on new kernel. Unless you meant that source QEMU should use relevant machine-type based on the destination kernel.
>> i.e. You should launch QEMU with old machine-type as long as you have hosts in your migration pool that runs with old kernel.
> 
> That's what I meant; stick to the old machine-type unless you know it's
> safe to use a newer one.
> 
>> I don’ think it’s the right approach though. As there is no way to change flags such as “x-migrate-smi-count” dynamically after all hosts in migration pool have been upgraded.
>> 
>> What do you think?
> 
> I don't have an easy answer.  The users already have to make sure they
> use a machine type that's old enough for all the QEMUs installed in
> their cluster; making sure it's also old enough for their oldest
> kernel isn't too big a difference - *except* that it's much harder to
> tell which kernel corresponds to which feature/machine type etc - so
> how does a user know what the newest supported machine type is?
> Failing at startup when selecting a machine type that the current
> kernel can't support would help that.
> 
> Dave

First, machine-type express the set of vHW behaviour and properties that is exposed to guest.
Therefore, machine-type shouldn’t change for a given guest lifetime (including Live-Migrations).
Otherwise, guest will experience different vHW behaviour and properties before/after Live-Migration.
So I think machine-type is not relevant for this discussion. We should focus on flags which specify
migration behaviour (such as “x-migrate-smi-count” which can also be controlled by machine-type but not only).

Second, this strategy results in inefficient migration management. Consider the following scenario:
1) Guest running on new_qemu+old_kernel migrate to host with new_qemu+new_kernel.
Because source is old_kernel than destination QEMU is launched with (x-migrate-smi-count == false).
2) Assume at this point fleet of hosts have half of hosts with old_kernel and half with new_kernel.
3) Further assume that guest workload indeed use msr_smi_count and therefore relevant VMState subsection should be sent to properly preserve guest state.
4) From some reason, we decide to migrate again the guest in (1).
Even if guest is migrated to a host with new_kernel, then QEMU still avoids sending msr_smi_count VMState subsection because it is launched with (x-migrate-smi-count == false).

Therefore, I think it makes more sense that source QEMU will always send all VMState subsection that are deemed needed (i.e. .nedeed() returns true)
and let receive-side decide if migration should fail if this subsection was sent but failed to be restored.
The only case which I think sender should limit the VMState subsection it sends to destination is because source is running older QEMU
which is not even aware of this VMState subsection (Which is to my understanding the rational behind using “x-migrate-smi-count” and tie it up to machine-type).

Third, let’s assume all hosts in fleet was upgraded to new_kernel. How do I modify all launched QEMUs on these new hosts to now have “x-migrate-smi-count” set to true?
As I would like future migrations to do send this VMState subsection. Currently there is no QMP command to update these flags.

Fourth, I think it’s not trivial for management-plane to be aware with which flags it should set on destination QEMU based on currently running kernels on fleet.
It’s not the same as machine-type, as already discussed above doesn’t change during the entire lifetime of guest.

I’m also not sure it is a good idea that we currently control flags such as “x-migrate-smi-count” from machine-type.
As it means that if a guest was initially launched using some old QEMU, it will *forever* not migrate some VMState subsection during all it’s Live-Migrations.
Even if all hosts and all QEMUs on fleet are capable of migrating this state properly.
Maybe it is preferred that this flag was specified as part of “migrate” command itself in case management-plane knows it wishes to migrate even though dest QEMU
is older and doesn’t understand this specific VMState subsection.

I’m left pretty confused about QEMU’s migration compatibility strategy...

-Liran

> 
>> -Liran
>> 
>>> 
>>>> Thanks,
>>>> -Liran
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>> 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: QEMU/KVM migration backwards compatibility broken?
  2019-06-06 10:09         ` [Qemu-devel] " Liran Alon
@ 2019-06-06 10:39           ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert @ 2019-06-06 10:39 UTC (permalink / raw)
  To: Liran Alon; +Cc: kvm list, qemu-devel, Paolo Bonzini

* Liran Alon (liran.alon@oracle.com) wrote:
> 
> 
> > On 6 Jun 2019, at 12:23, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> > 
> > * Liran Alon (liran.alon@oracle.com) wrote:
> >> 
> >> 
> >>> On 6 Jun 2019, at 11:42, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> >>> 
> >>> * Liran Alon (liran.alon@oracle.com) wrote:
> >>>> Hi,
> >>>> 
> >>>> Looking at QEMU source code, I am puzzled regarding how migration backwards compatibility is preserved regarding X86CPU.
> >>>> 
> >>>> As I understand it, fields that are based on KVM capabilities and guest runtime usage are defined in VMState subsections in order to not send them if not necessary.
> >>>> This is done such that in case they are not needed and we migrate to an old QEMU which don’t support loading this state, migration will still succeed
> >>>> (As .needed() method will return false and therefore this state won’t be sent as part of migration stream).
> >>>> Furthermore, in case .needed() returns true and old QEMU don’t support loading this state, migration fails. As it should because we are aware that guest state
> >>>> is not going to be restored properly on destination.
> >>>> 
> >>>> I’m puzzled about what will happen in the following scenario:
> >>>> 1) Source is running new QEMU with new KVM that supports save of some VMState subsection.
> >>>> 2) Destination is running new QEMU that supports load this state but with old kernel that doesn’t know how to load this state.
> >>>> 
> >>>> I would have expected in this case that if source .needed() returns true, then migration will fail because of lack of support in destination kernel.
> >>>> However, it seems from current QEMU code that this will actually succeed in many cases.
> >>>> 
> >>>> For example, if msr_smi_count is sent as part of migration stream (See vmstate_msr_smi_count) and destination have has_msr_smi_count==false,
> >>>> then destination will succeed loading migration stream but kvm_put_msrs() will actually ignore env->msr_smi_count and will successfully load guest state.
> >>>> Therefore, migration will succeed even though it should have failed…
> >>>> 
> >>>> It seems to me that QEMU should have for every such VMState subsection, a .post_load() method that verifies that relevant capability is supported by kernel
> >>>> and otherwise fail migration.
> >>>> 
> >>>> What do you think? Should I really create a patch to modify all these CPUX86 VMState subsections to behave like this?
> >>> 
> >>> I don't know the x86 specific side that much; but from my migration side
> >>> the answer should mostly be through machine types - indeed for smi-count
> >>> there's a property 'x-migrate-smi-count' which is off for machine types
> >>> pre 2.11 (see hw/i386/pc.c pc_compat_2_11) - so if you've got an old
> >>> kernel you should stick to the old machine types.
> >>> 
> >>> There's nothing guarding running the new machine type on old-kernels;
> >>> and arguably we should have a check at startup that complains if
> >>> your kernel is missing something the machine type uses.
> >>> However, that would mean that people running with -M pc   would fail
> >>> on old kernels.
> >>> 
> >>> A post-load is also a valid check; but one question is whether,
> >>> for a particular register, the pain is worth it - it depends on the
> >>> symptom that the missing state causes.  If it's minor then you might
> >>> conclude it's not worth a failed migration;  if it's a hung or
> >>> corrupt guest then yes it is.   Certainly a warning printed is worth
> >>> it.
> >>> 
> >>> Dave
> >> 
> >> I think we should have flags that allow user to specify which VMState subsections user explicitly allow to avoid restore even though they are required to fully restore guest state.
> >> But it seems to me that the behaviour should be to always fail migration in case we load a VMState subsections that we are unable to restore unless user explicitly specified this is ok
> >> for this specific subsection.
> >> Therefore, it seems that for every VMState subsection that it’s restore is based on kernel capability we should:
> >> 1) Have a user-controllable flag (which is also tied to machine-type?) to explicitly allow avoid restoring this state if cannot. Default should be “false”.
> >> 2) Have a .post_load() method that verifies we have required kernel capability to restore this state, unless flag (1) was specified as “true”.
> > 
> > This seems a lot of flags; users aren't going to know what to do with
> > all of them; I don't see what will set/control them.
> 
> True but I think users will want to specify only for a handful of VMState subsections that it is OK to not restore them even thought hey are deemed needed by source QEMU.
> We can create flags only for those VMState subsections.
> User should set these flags explicitly on QEMU command-line. As a “-cpu” property? I don’t think these flags should be tied to machine-type.

I don't see who is going to work out these flags and send them.

> > 
> >> Note that above mentioned flags is different than flags such as “x-migrate-smi-count”.
> >> The purpose of “x-migrate-smi-count” flag is to avoid sending the VMState subsection to begin with in case we know we migrate to older QEMU which don’t even have the relevant VMState subsection. But it is not relevant for the case both source and destination runs QEMU which understands the VMState subsection but run on kernels with different capabilities.
> >> 
> >> Also note regarding your first paragraph, that specifying flags based on kernel you are running on doesn’t help for the case discussed here.
> >> As source QEMU is running on new kernel. Unless you meant that source QEMU should use relevant machine-type based on the destination kernel.
> >> i.e. You should launch QEMU with old machine-type as long as you have hosts in your migration pool that runs with old kernel.
> > 
> > That's what I meant; stick to the old machine-type unless you know it's
> > safe to use a newer one.
> > 
> >> I don’ think it’s the right approach though. As there is no way to change flags such as “x-migrate-smi-count” dynamically after all hosts in migration pool have been upgraded.
> >> 
> >> What do you think?
> > 
> > I don't have an easy answer.  The users already have to make sure they
> > use a machine type that's old enough for all the QEMUs installed in
> > their cluster; making sure it's also old enough for their oldest
> > kernel isn't too big a difference - *except* that it's much harder to
> > tell which kernel corresponds to which feature/machine type etc - so
> > how does a user know what the newest supported machine type is?
> > Failing at startup when selecting a machine type that the current
> > kernel can't support would help that.
> > 
> > Dave
> 
> First, machine-type express the set of vHW behaviour and properties that is exposed to guest.
> Therefore, machine-type shouldn’t change for a given guest lifetime (including Live-Migrations).
> Otherwise, guest will experience different vHW behaviour and properties before/after Live-Migration.
> So I think machine-type is not relevant for this discussion. We should focus on flags which specify
> migration behaviour (such as “x-migrate-smi-count” which can also be controlled by machine-type but not only).

Machine type specifies two things:
  a) The view from the guest
  b) Migration compatibility

(b) is explicitly documented in qemu's docs/devel/migration.rst, see the
subsection on subsections.

> Second, this strategy results in inefficient migration management. Consider the following scenario:
> 1) Guest running on new_qemu+old_kernel migrate to host with new_qemu+new_kernel.
> Because source is old_kernel than destination QEMU is launched with (x-migrate-smi-count == false).
> 2) Assume at this point fleet of hosts have half of hosts with old_kernel and half with new_kernel.
> 3) Further assume that guest workload indeed use msr_smi_count and therefore relevant VMState subsection should be sent to properly preserve guest state.
> 4) From some reason, we decide to migrate again the guest in (1).
> Even if guest is migrated to a host with new_kernel, then QEMU still avoids sending msr_smi_count VMState subsection because it is launched with (x-migrate-smi-count == false).
> 
> Therefore, I think it makes more sense that source QEMU will always send all VMState subsection that are deemed needed (i.e. .nedeed() returns true)
> and let receive-side decide if migration should fail if this subsection was sent but failed to be restored.
> The only case which I think sender should limit the VMState subsection it sends to destination is because source is running older QEMU
> which is not even aware of this VMState subsection (Which is to my understanding the rational behind using “x-migrate-smi-count” and tie it up to machine-type).

But we want to avoid failed migrations if we can; so in general we don't
want to be sending subsections to destinations that can't handle them.
The only case where it's reasonable is when there's a migration bug such
that the behaviour in the guest is really nasty; if there's a choice
between a failed migration or a hung/corrupt guest I'll take a failed
migration.

> Third, let’s assume all hosts in fleet was upgraded to new_kernel. How do I modify all launched QEMUs on these new hosts to now have “x-migrate-smi-count” set to true?
> As I would like future migrations to do send this VMState subsection. Currently there is no QMP command to update these flags.

I guess that's possible - it's pretty painful though; you're going to
have to teach your management layer about features/fixes of the kernels
and which flags to tweak in qemu.  Having said that, if you could do it,
then you'd avoid having to restart VMs to pick up a few fixes.

> Fourth, I think it’s not trivial for management-plane to be aware with which flags it should set on destination QEMU based on currently running kernels on fleet.
> It’s not the same as machine-type, as already discussed above doesn’t change during the entire lifetime of guest.

Right, which is why I don't see your idea of adding flags will work.
I don't see how anything will figure out what the right flags to use
are.
(Getting the management layers to do sane things with the cpuid flags
is already a nightmare, and they're fairly well understood).

> I’m also not sure it is a good idea that we currently control flags such as “x-migrate-smi-count” from machine-type.
> As it means that if a guest was initially launched using some old QEMU, it will *forever* not migrate some VMState subsection during all it’s Live-Migrations.
> Even if all hosts and all QEMUs on fleet are capable of migrating this state properly.
> Maybe it is preferred that this flag was specified as part of “migrate” command itself in case management-plane knows it wishes to migrate even though dest QEMU
> is older and doesn’t understand this specific VMState subsection.
> 
> I’m left pretty confused about QEMU’s migration compatibility strategy...

The compatibility strategy is the machine type;  but yes it does
have a problem when it's not really just a qemu version - but also
kernel (and external libraries, etc).
My general advice is that users should be updating their kernels and
qemus together; but I realise there's lots of cases where that
doesn't work.

Dave

> -Liran
> 
> > 
> >> -Liran
> >> 
> >>> 
> >>>> Thanks,
> >>>> -Liran
> >>> --
> >>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >> 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] QEMU/KVM migration backwards compatibility broken?
@ 2019-06-06 10:39           ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert @ 2019-06-06 10:39 UTC (permalink / raw)
  To: Liran Alon; +Cc: Paolo Bonzini, qemu-devel, kvm list

* Liran Alon (liran.alon@oracle.com) wrote:
> 
> 
> > On 6 Jun 2019, at 12:23, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> > 
> > * Liran Alon (liran.alon@oracle.com) wrote:
> >> 
> >> 
> >>> On 6 Jun 2019, at 11:42, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> >>> 
> >>> * Liran Alon (liran.alon@oracle.com) wrote:
> >>>> Hi,
> >>>> 
> >>>> Looking at QEMU source code, I am puzzled regarding how migration backwards compatibility is preserved regarding X86CPU.
> >>>> 
> >>>> As I understand it, fields that are based on KVM capabilities and guest runtime usage are defined in VMState subsections in order to not send them if not necessary.
> >>>> This is done such that in case they are not needed and we migrate to an old QEMU which don’t support loading this state, migration will still succeed
> >>>> (As .needed() method will return false and therefore this state won’t be sent as part of migration stream).
> >>>> Furthermore, in case .needed() returns true and old QEMU don’t support loading this state, migration fails. As it should because we are aware that guest state
> >>>> is not going to be restored properly on destination.
> >>>> 
> >>>> I’m puzzled about what will happen in the following scenario:
> >>>> 1) Source is running new QEMU with new KVM that supports save of some VMState subsection.
> >>>> 2) Destination is running new QEMU that supports load this state but with old kernel that doesn’t know how to load this state.
> >>>> 
> >>>> I would have expected in this case that if source .needed() returns true, then migration will fail because of lack of support in destination kernel.
> >>>> However, it seems from current QEMU code that this will actually succeed in many cases.
> >>>> 
> >>>> For example, if msr_smi_count is sent as part of migration stream (See vmstate_msr_smi_count) and destination have has_msr_smi_count==false,
> >>>> then destination will succeed loading migration stream but kvm_put_msrs() will actually ignore env->msr_smi_count and will successfully load guest state.
> >>>> Therefore, migration will succeed even though it should have failed…
> >>>> 
> >>>> It seems to me that QEMU should have for every such VMState subsection, a .post_load() method that verifies that relevant capability is supported by kernel
> >>>> and otherwise fail migration.
> >>>> 
> >>>> What do you think? Should I really create a patch to modify all these CPUX86 VMState subsections to behave like this?
> >>> 
> >>> I don't know the x86 specific side that much; but from my migration side
> >>> the answer should mostly be through machine types - indeed for smi-count
> >>> there's a property 'x-migrate-smi-count' which is off for machine types
> >>> pre 2.11 (see hw/i386/pc.c pc_compat_2_11) - so if you've got an old
> >>> kernel you should stick to the old machine types.
> >>> 
> >>> There's nothing guarding running the new machine type on old-kernels;
> >>> and arguably we should have a check at startup that complains if
> >>> your kernel is missing something the machine type uses.
> >>> However, that would mean that people running with -M pc   would fail
> >>> on old kernels.
> >>> 
> >>> A post-load is also a valid check; but one question is whether,
> >>> for a particular register, the pain is worth it - it depends on the
> >>> symptom that the missing state causes.  If it's minor then you might
> >>> conclude it's not worth a failed migration;  if it's a hung or
> >>> corrupt guest then yes it is.   Certainly a warning printed is worth
> >>> it.
> >>> 
> >>> Dave
> >> 
> >> I think we should have flags that allow user to specify which VMState subsections user explicitly allow to avoid restore even though they are required to fully restore guest state.
> >> But it seems to me that the behaviour should be to always fail migration in case we load a VMState subsections that we are unable to restore unless user explicitly specified this is ok
> >> for this specific subsection.
> >> Therefore, it seems that for every VMState subsection that it’s restore is based on kernel capability we should:
> >> 1) Have a user-controllable flag (which is also tied to machine-type?) to explicitly allow avoid restoring this state if cannot. Default should be “false”.
> >> 2) Have a .post_load() method that verifies we have required kernel capability to restore this state, unless flag (1) was specified as “true”.
> > 
> > This seems a lot of flags; users aren't going to know what to do with
> > all of them; I don't see what will set/control them.
> 
> True but I think users will want to specify only for a handful of VMState subsections that it is OK to not restore them even thought hey are deemed needed by source QEMU.
> We can create flags only for those VMState subsections.
> User should set these flags explicitly on QEMU command-line. As a “-cpu” property? I don’t think these flags should be tied to machine-type.

I don't see who is going to work out these flags and send them.

> > 
> >> Note that above mentioned flags is different than flags such as “x-migrate-smi-count”.
> >> The purpose of “x-migrate-smi-count” flag is to avoid sending the VMState subsection to begin with in case we know we migrate to older QEMU which don’t even have the relevant VMState subsection. But it is not relevant for the case both source and destination runs QEMU which understands the VMState subsection but run on kernels with different capabilities.
> >> 
> >> Also note regarding your first paragraph, that specifying flags based on kernel you are running on doesn’t help for the case discussed here.
> >> As source QEMU is running on new kernel. Unless you meant that source QEMU should use relevant machine-type based on the destination kernel.
> >> i.e. You should launch QEMU with old machine-type as long as you have hosts in your migration pool that runs with old kernel.
> > 
> > That's what I meant; stick to the old machine-type unless you know it's
> > safe to use a newer one.
> > 
> >> I don’ think it’s the right approach though. As there is no way to change flags such as “x-migrate-smi-count” dynamically after all hosts in migration pool have been upgraded.
> >> 
> >> What do you think?
> > 
> > I don't have an easy answer.  The users already have to make sure they
> > use a machine type that's old enough for all the QEMUs installed in
> > their cluster; making sure it's also old enough for their oldest
> > kernel isn't too big a difference - *except* that it's much harder to
> > tell which kernel corresponds to which feature/machine type etc - so
> > how does a user know what the newest supported machine type is?
> > Failing at startup when selecting a machine type that the current
> > kernel can't support would help that.
> > 
> > Dave
> 
> First, machine-type express the set of vHW behaviour and properties that is exposed to guest.
> Therefore, machine-type shouldn’t change for a given guest lifetime (including Live-Migrations).
> Otherwise, guest will experience different vHW behaviour and properties before/after Live-Migration.
> So I think machine-type is not relevant for this discussion. We should focus on flags which specify
> migration behaviour (such as “x-migrate-smi-count” which can also be controlled by machine-type but not only).

Machine type specifies two things:
  a) The view from the guest
  b) Migration compatibility

(b) is explicitly documented in qemu's docs/devel/migration.rst, see the
subsection on subsections.

> Second, this strategy results in inefficient migration management. Consider the following scenario:
> 1) Guest running on new_qemu+old_kernel migrate to host with new_qemu+new_kernel.
> Because source is old_kernel than destination QEMU is launched with (x-migrate-smi-count == false).
> 2) Assume at this point fleet of hosts have half of hosts with old_kernel and half with new_kernel.
> 3) Further assume that guest workload indeed use msr_smi_count and therefore relevant VMState subsection should be sent to properly preserve guest state.
> 4) From some reason, we decide to migrate again the guest in (1).
> Even if guest is migrated to a host with new_kernel, then QEMU still avoids sending msr_smi_count VMState subsection because it is launched with (x-migrate-smi-count == false).
> 
> Therefore, I think it makes more sense that source QEMU will always send all VMState subsection that are deemed needed (i.e. .nedeed() returns true)
> and let receive-side decide if migration should fail if this subsection was sent but failed to be restored.
> The only case which I think sender should limit the VMState subsection it sends to destination is because source is running older QEMU
> which is not even aware of this VMState subsection (Which is to my understanding the rational behind using “x-migrate-smi-count” and tie it up to machine-type).

But we want to avoid failed migrations if we can; so in general we don't
want to be sending subsections to destinations that can't handle them.
The only case where it's reasonable is when there's a migration bug such
that the behaviour in the guest is really nasty; if there's a choice
between a failed migration or a hung/corrupt guest I'll take a failed
migration.

> Third, let’s assume all hosts in fleet was upgraded to new_kernel. How do I modify all launched QEMUs on these new hosts to now have “x-migrate-smi-count” set to true?
> As I would like future migrations to do send this VMState subsection. Currently there is no QMP command to update these flags.

I guess that's possible - it's pretty painful though; you're going to
have to teach your management layer about features/fixes of the kernels
and which flags to tweak in qemu.  Having said that, if you could do it,
then you'd avoid having to restart VMs to pick up a few fixes.

> Fourth, I think it’s not trivial for management-plane to be aware with which flags it should set on destination QEMU based on currently running kernels on fleet.
> It’s not the same as machine-type, as already discussed above doesn’t change during the entire lifetime of guest.

Right, which is why I don't see your idea of adding flags will work.
I don't see how anything will figure out what the right flags to use
are.
(Getting the management layers to do sane things with the cpuid flags
is already a nightmare, and they're fairly well understood).

> I’m also not sure it is a good idea that we currently control flags such as “x-migrate-smi-count” from machine-type.
> As it means that if a guest was initially launched using some old QEMU, it will *forever* not migrate some VMState subsection during all it’s Live-Migrations.
> Even if all hosts and all QEMUs on fleet are capable of migrating this state properly.
> Maybe it is preferred that this flag was specified as part of “migrate” command itself in case management-plane knows it wishes to migrate even though dest QEMU
> is older and doesn’t understand this specific VMState subsection.
> 
> I’m left pretty confused about QEMU’s migration compatibility strategy...

The compatibility strategy is the machine type;  but yes it does
have a problem when it's not really just a qemu version - but also
kernel (and external libraries, etc).
My general advice is that users should be updating their kernels and
qemus together; but I realise there's lots of cases where that
doesn't work.

Dave

> -Liran
> 
> > 
> >> -Liran
> >> 
> >>> 
> >>>> Thanks,
> >>>> -Liran
> >>> --
> >>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >> 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: QEMU/KVM migration backwards compatibility broken?
  2019-06-06 10:39           ` [Qemu-devel] " Dr. David Alan Gilbert
@ 2019-06-06 10:57             ` Liran Alon
  -1 siblings, 0 replies; 26+ messages in thread
From: Liran Alon @ 2019-06-06 10:57 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: kvm list, qemu-devel, Paolo Bonzini



> On 6 Jun 2019, at 13:39, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> 
> * Liran Alon (liran.alon@oracle.com) wrote:
>> 
>> 
>>> On 6 Jun 2019, at 12:23, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
>>> 
>>> * Liran Alon (liran.alon@oracle.com) wrote:
>>>> 
>>>> 
>>>>> On 6 Jun 2019, at 11:42, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
>>>>> 
>>>>> * Liran Alon (liran.alon@oracle.com) wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> Looking at QEMU source code, I am puzzled regarding how migration backwards compatibility is preserved regarding X86CPU.
>>>>>> 
>>>>>> As I understand it, fields that are based on KVM capabilities and guest runtime usage are defined in VMState subsections in order to not send them if not necessary.
>>>>>> This is done such that in case they are not needed and we migrate to an old QEMU which don’t support loading this state, migration will still succeed
>>>>>> (As .needed() method will return false and therefore this state won’t be sent as part of migration stream).
>>>>>> Furthermore, in case .needed() returns true and old QEMU don’t support loading this state, migration fails. As it should because we are aware that guest state
>>>>>> is not going to be restored properly on destination.
>>>>>> 
>>>>>> I’m puzzled about what will happen in the following scenario:
>>>>>> 1) Source is running new QEMU with new KVM that supports save of some VMState subsection.
>>>>>> 2) Destination is running new QEMU that supports load this state but with old kernel that doesn’t know how to load this state.
>>>>>> 
>>>>>> I would have expected in this case that if source .needed() returns true, then migration will fail because of lack of support in destination kernel.
>>>>>> However, it seems from current QEMU code that this will actually succeed in many cases.
>>>>>> 
>>>>>> For example, if msr_smi_count is sent as part of migration stream (See vmstate_msr_smi_count) and destination have has_msr_smi_count==false,
>>>>>> then destination will succeed loading migration stream but kvm_put_msrs() will actually ignore env->msr_smi_count and will successfully load guest state.
>>>>>> Therefore, migration will succeed even though it should have failed…
>>>>>> 
>>>>>> It seems to me that QEMU should have for every such VMState subsection, a .post_load() method that verifies that relevant capability is supported by kernel
>>>>>> and otherwise fail migration.
>>>>>> 
>>>>>> What do you think? Should I really create a patch to modify all these CPUX86 VMState subsections to behave like this?
>>>>> 
>>>>> I don't know the x86 specific side that much; but from my migration side
>>>>> the answer should mostly be through machine types - indeed for smi-count
>>>>> there's a property 'x-migrate-smi-count' which is off for machine types
>>>>> pre 2.11 (see hw/i386/pc.c pc_compat_2_11) - so if you've got an old
>>>>> kernel you should stick to the old machine types.
>>>>> 
>>>>> There's nothing guarding running the new machine type on old-kernels;
>>>>> and arguably we should have a check at startup that complains if
>>>>> your kernel is missing something the machine type uses.
>>>>> However, that would mean that people running with -M pc   would fail
>>>>> on old kernels.
>>>>> 
>>>>> A post-load is also a valid check; but one question is whether,
>>>>> for a particular register, the pain is worth it - it depends on the
>>>>> symptom that the missing state causes.  If it's minor then you might
>>>>> conclude it's not worth a failed migration;  if it's a hung or
>>>>> corrupt guest then yes it is.   Certainly a warning printed is worth
>>>>> it.
>>>>> 
>>>>> Dave
>>>> 
>>>> I think we should have flags that allow user to specify which VMState subsections user explicitly allow to avoid restore even though they are required to fully restore guest state.
>>>> But it seems to me that the behaviour should be to always fail migration in case we load a VMState subsections that we are unable to restore unless user explicitly specified this is ok
>>>> for this specific subsection.
>>>> Therefore, it seems that for every VMState subsection that it’s restore is based on kernel capability we should:
>>>> 1) Have a user-controllable flag (which is also tied to machine-type?) to explicitly allow avoid restoring this state if cannot. Default should be “false”.
>>>> 2) Have a .post_load() method that verifies we have required kernel capability to restore this state, unless flag (1) was specified as “true”.
>>> 
>>> This seems a lot of flags; users aren't going to know what to do with
>>> all of them; I don't see what will set/control them.
>> 
>> True but I think users will want to specify only for a handful of VMState subsections that it is OK to not restore them even thought hey are deemed needed by source QEMU.
>> We can create flags only for those VMState subsections.
>> User should set these flags explicitly on QEMU command-line. As a “-cpu” property? I don’t think these flags should be tied to machine-type.
> 
> I don't see who is going to work out these flags and send them.
> 
>>> 
>>>> Note that above mentioned flags is different than flags such as “x-migrate-smi-count”.
>>>> The purpose of “x-migrate-smi-count” flag is to avoid sending the VMState subsection to begin with in case we know we migrate to older QEMU which don’t even have the relevant VMState subsection. But it is not relevant for the case both source and destination runs QEMU which understands the VMState subsection but run on kernels with different capabilities.
>>>> 
>>>> Also note regarding your first paragraph, that specifying flags based on kernel you are running on doesn’t help for the case discussed here.
>>>> As source QEMU is running on new kernel. Unless you meant that source QEMU should use relevant machine-type based on the destination kernel.
>>>> i.e. You should launch QEMU with old machine-type as long as you have hosts in your migration pool that runs with old kernel.
>>> 
>>> That's what I meant; stick to the old machine-type unless you know it's
>>> safe to use a newer one.
>>> 
>>>> I don’ think it’s the right approach though. As there is no way to change flags such as “x-migrate-smi-count” dynamically after all hosts in migration pool have been upgraded.
>>>> 
>>>> What do you think?
>>> 
>>> I don't have an easy answer.  The users already have to make sure they
>>> use a machine type that's old enough for all the QEMUs installed in
>>> their cluster; making sure it's also old enough for their oldest
>>> kernel isn't too big a difference - *except* that it's much harder to
>>> tell which kernel corresponds to which feature/machine type etc - so
>>> how does a user know what the newest supported machine type is?
>>> Failing at startup when selecting a machine type that the current
>>> kernel can't support would help that.
>>> 
>>> Dave
>> 
>> First, machine-type express the set of vHW behaviour and properties that is exposed to guest.
>> Therefore, machine-type shouldn’t change for a given guest lifetime (including Live-Migrations).
>> Otherwise, guest will experience different vHW behaviour and properties before/after Live-Migration.
>> So I think machine-type is not relevant for this discussion. We should focus on flags which specify
>> migration behaviour (such as “x-migrate-smi-count” which can also be controlled by machine-type but not only).
> 
> Machine type specifies two things:
>  a) The view from the guest
>  b) Migration compatibility
> 
> (b) is explicitly documented in qemu's docs/devel/migration.rst, see the
> subsection on subsections.
> 
>> Second, this strategy results in inefficient migration management. Consider the following scenario:
>> 1) Guest running on new_qemu+old_kernel migrate to host with new_qemu+new_kernel.
>> Because source is old_kernel than destination QEMU is launched with (x-migrate-smi-count == false).
>> 2) Assume at this point fleet of hosts have half of hosts with old_kernel and half with new_kernel.
>> 3) Further assume that guest workload indeed use msr_smi_count and therefore relevant VMState subsection should be sent to properly preserve guest state.
>> 4) From some reason, we decide to migrate again the guest in (1).
>> Even if guest is migrated to a host with new_kernel, then QEMU still avoids sending msr_smi_count VMState subsection because it is launched with (x-migrate-smi-count == false).
>> 
>> Therefore, I think it makes more sense that source QEMU will always send all VMState subsection that are deemed needed (i.e. .nedeed() returns true)
>> and let receive-side decide if migration should fail if this subsection was sent but failed to be restored.
>> The only case which I think sender should limit the VMState subsection it sends to destination is because source is running older QEMU
>> which is not even aware of this VMState subsection (Which is to my understanding the rational behind using “x-migrate-smi-count” and tie it up to machine-type).
> 
> But we want to avoid failed migrations if we can; so in general we don't
> want to be sending subsections to destinations that can't handle them.
> The only case where it's reasonable is when there's a migration bug such
> that the behaviour in the guest is really nasty; if there's a choice
> between a failed migration or a hung/corrupt guest I'll take a failed
> migration.
> 
>> Third, let’s assume all hosts in fleet was upgraded to new_kernel. How do I modify all launched QEMUs on these new hosts to now have “x-migrate-smi-count” set to true?
>> As I would like future migrations to do send this VMState subsection. Currently there is no QMP command to update these flags.
> 
> I guess that's possible - it's pretty painful though; you're going to
> have to teach your management layer about features/fixes of the kernels
> and which flags to tweak in qemu.  Having said that, if you could do it,
> then you'd avoid having to restart VMs to pick up a few fixes.
> 
>> Fourth, I think it’s not trivial for management-plane to be aware with which flags it should set on destination QEMU based on currently running kernels on fleet.
>> It’s not the same as machine-type, as already discussed above doesn’t change during the entire lifetime of guest.
> 
> Right, which is why I don't see your idea of adding flags will work.
> I don't see how anything will figure out what the right flags to use
> are.
> (Getting the management layers to do sane things with the cpuid flags
> is already a nightmare, and they're fairly well understood).
> 
>> I’m also not sure it is a good idea that we currently control flags such as “x-migrate-smi-count” from machine-type.
>> As it means that if a guest was initially launched using some old QEMU, it will *forever* not migrate some VMState subsection during all it’s Live-Migrations.
>> Even if all hosts and all QEMUs on fleet are capable of migrating this state properly.
>> Maybe it is preferred that this flag was specified as part of “migrate” command itself in case management-plane knows it wishes to migrate even though dest QEMU
>> is older and doesn’t understand this specific VMState subsection.
>> 
>> I’m left pretty confused about QEMU’s migration compatibility strategy...
> 
> The compatibility strategy is the machine type;  but yes it does
> have a problem when it's not really just a qemu version - but also
> kernel (and external libraries, etc).
> My general advice is that users should be updating their kernels and
> qemus together; but I realise there's lots of cases where that
> doesn't work.
> 
> Dave

I think it’s not practical advise to expect users to always upgrade kernel and QEMU together.
In fact, users prefer to upgrade them separately to avoid doing major upgrades at once and to better pin-point root-cause of issues.

Compiling all above very useful discussion (thanks for this!), I may have a better suggestion that doesn’t require any additional flags:
1) Source QEMU will always send all all VMState subsections that is deemed by source QEMU as required to not break guest semantic behaviour.
This is done by .needed() methods that examine guest runtime state to understand if this state is required to be sent or not.
2) Destination QEMU will provide a generic QMP command which allows to set names of VMState subsections that if accepted on migration stream
and failed to be loaded (because either subsection name is not implemented or because .post_load() method failed) then the failure should be ignored
and migration should continue as usual. By default, the list of this names will be empty.
3) Destination QEMU will implement .post_load() method for all these VMState subsections that depend on kernel capability to be restored properly
such that it will fail subsection load in case kernel capability is not present. (Note that this load failure will be ignored if subsection name is specified in (2)).

Above suggestion have the following properties:
1) Doesn’t require any flag to be added to QEMU.
2) Moves all control on whether to fail migration because of failure to load VMState subsection to receiver side. Sender always attempts to send max state he believes is required.
3) We remove coupling of migration compatibility from machine-type.

What do you think?

-Liran

> 
>> -Liran
>> 
>>> 
>>>> -Liran
>>>> 
>>>>> 
>>>>>> Thanks,
>>>>>> -Liran
>>>>> --
>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>> 
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>> 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] QEMU/KVM migration backwards compatibility broken?
@ 2019-06-06 10:57             ` Liran Alon
  0 siblings, 0 replies; 26+ messages in thread
From: Liran Alon @ 2019-06-06 10:57 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Paolo Bonzini, qemu-devel, kvm list



> On 6 Jun 2019, at 13:39, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> 
> * Liran Alon (liran.alon@oracle.com) wrote:
>> 
>> 
>>> On 6 Jun 2019, at 12:23, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
>>> 
>>> * Liran Alon (liran.alon@oracle.com) wrote:
>>>> 
>>>> 
>>>>> On 6 Jun 2019, at 11:42, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
>>>>> 
>>>>> * Liran Alon (liran.alon@oracle.com) wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> Looking at QEMU source code, I am puzzled regarding how migration backwards compatibility is preserved regarding X86CPU.
>>>>>> 
>>>>>> As I understand it, fields that are based on KVM capabilities and guest runtime usage are defined in VMState subsections in order to not send them if not necessary.
>>>>>> This is done such that in case they are not needed and we migrate to an old QEMU which don’t support loading this state, migration will still succeed
>>>>>> (As .needed() method will return false and therefore this state won’t be sent as part of migration stream).
>>>>>> Furthermore, in case .needed() returns true and old QEMU don’t support loading this state, migration fails. As it should because we are aware that guest state
>>>>>> is not going to be restored properly on destination.
>>>>>> 
>>>>>> I’m puzzled about what will happen in the following scenario:
>>>>>> 1) Source is running new QEMU with new KVM that supports save of some VMState subsection.
>>>>>> 2) Destination is running new QEMU that supports load this state but with old kernel that doesn’t know how to load this state.
>>>>>> 
>>>>>> I would have expected in this case that if source .needed() returns true, then migration will fail because of lack of support in destination kernel.
>>>>>> However, it seems from current QEMU code that this will actually succeed in many cases.
>>>>>> 
>>>>>> For example, if msr_smi_count is sent as part of migration stream (See vmstate_msr_smi_count) and destination have has_msr_smi_count==false,
>>>>>> then destination will succeed loading migration stream but kvm_put_msrs() will actually ignore env->msr_smi_count and will successfully load guest state.
>>>>>> Therefore, migration will succeed even though it should have failed…
>>>>>> 
>>>>>> It seems to me that QEMU should have for every such VMState subsection, a .post_load() method that verifies that relevant capability is supported by kernel
>>>>>> and otherwise fail migration.
>>>>>> 
>>>>>> What do you think? Should I really create a patch to modify all these CPUX86 VMState subsections to behave like this?
>>>>> 
>>>>> I don't know the x86 specific side that much; but from my migration side
>>>>> the answer should mostly be through machine types - indeed for smi-count
>>>>> there's a property 'x-migrate-smi-count' which is off for machine types
>>>>> pre 2.11 (see hw/i386/pc.c pc_compat_2_11) - so if you've got an old
>>>>> kernel you should stick to the old machine types.
>>>>> 
>>>>> There's nothing guarding running the new machine type on old-kernels;
>>>>> and arguably we should have a check at startup that complains if
>>>>> your kernel is missing something the machine type uses.
>>>>> However, that would mean that people running with -M pc   would fail
>>>>> on old kernels.
>>>>> 
>>>>> A post-load is also a valid check; but one question is whether,
>>>>> for a particular register, the pain is worth it - it depends on the
>>>>> symptom that the missing state causes.  If it's minor then you might
>>>>> conclude it's not worth a failed migration;  if it's a hung or
>>>>> corrupt guest then yes it is.   Certainly a warning printed is worth
>>>>> it.
>>>>> 
>>>>> Dave
>>>> 
>>>> I think we should have flags that allow user to specify which VMState subsections user explicitly allow to avoid restore even though they are required to fully restore guest state.
>>>> But it seems to me that the behaviour should be to always fail migration in case we load a VMState subsections that we are unable to restore unless user explicitly specified this is ok
>>>> for this specific subsection.
>>>> Therefore, it seems that for every VMState subsection that it’s restore is based on kernel capability we should:
>>>> 1) Have a user-controllable flag (which is also tied to machine-type?) to explicitly allow avoid restoring this state if cannot. Default should be “false”.
>>>> 2) Have a .post_load() method that verifies we have required kernel capability to restore this state, unless flag (1) was specified as “true”.
>>> 
>>> This seems a lot of flags; users aren't going to know what to do with
>>> all of them; I don't see what will set/control them.
>> 
>> True but I think users will want to specify only for a handful of VMState subsections that it is OK to not restore them even thought hey are deemed needed by source QEMU.
>> We can create flags only for those VMState subsections.
>> User should set these flags explicitly on QEMU command-line. As a “-cpu” property? I don’t think these flags should be tied to machine-type.
> 
> I don't see who is going to work out these flags and send them.
> 
>>> 
>>>> Note that above mentioned flags is different than flags such as “x-migrate-smi-count”.
>>>> The purpose of “x-migrate-smi-count” flag is to avoid sending the VMState subsection to begin with in case we know we migrate to older QEMU which don’t even have the relevant VMState subsection. But it is not relevant for the case both source and destination runs QEMU which understands the VMState subsection but run on kernels with different capabilities.
>>>> 
>>>> Also note regarding your first paragraph, that specifying flags based on kernel you are running on doesn’t help for the case discussed here.
>>>> As source QEMU is running on new kernel. Unless you meant that source QEMU should use relevant machine-type based on the destination kernel.
>>>> i.e. You should launch QEMU with old machine-type as long as you have hosts in your migration pool that runs with old kernel.
>>> 
>>> That's what I meant; stick to the old machine-type unless you know it's
>>> safe to use a newer one.
>>> 
>>>> I don’ think it’s the right approach though. As there is no way to change flags such as “x-migrate-smi-count” dynamically after all hosts in migration pool have been upgraded.
>>>> 
>>>> What do you think?
>>> 
>>> I don't have an easy answer.  The users already have to make sure they
>>> use a machine type that's old enough for all the QEMUs installed in
>>> their cluster; making sure it's also old enough for their oldest
>>> kernel isn't too big a difference - *except* that it's much harder to
>>> tell which kernel corresponds to which feature/machine type etc - so
>>> how does a user know what the newest supported machine type is?
>>> Failing at startup when selecting a machine type that the current
>>> kernel can't support would help that.
>>> 
>>> Dave
>> 
>> First, machine-type express the set of vHW behaviour and properties that is exposed to guest.
>> Therefore, machine-type shouldn’t change for a given guest lifetime (including Live-Migrations).
>> Otherwise, guest will experience different vHW behaviour and properties before/after Live-Migration.
>> So I think machine-type is not relevant for this discussion. We should focus on flags which specify
>> migration behaviour (such as “x-migrate-smi-count” which can also be controlled by machine-type but not only).
> 
> Machine type specifies two things:
>  a) The view from the guest
>  b) Migration compatibility
> 
> (b) is explicitly documented in qemu's docs/devel/migration.rst, see the
> subsection on subsections.
> 
>> Second, this strategy results in inefficient migration management. Consider the following scenario:
>> 1) Guest running on new_qemu+old_kernel migrate to host with new_qemu+new_kernel.
>> Because source is old_kernel than destination QEMU is launched with (x-migrate-smi-count == false).
>> 2) Assume at this point fleet of hosts have half of hosts with old_kernel and half with new_kernel.
>> 3) Further assume that guest workload indeed use msr_smi_count and therefore relevant VMState subsection should be sent to properly preserve guest state.
>> 4) From some reason, we decide to migrate again the guest in (1).
>> Even if guest is migrated to a host with new_kernel, then QEMU still avoids sending msr_smi_count VMState subsection because it is launched with (x-migrate-smi-count == false).
>> 
>> Therefore, I think it makes more sense that source QEMU will always send all VMState subsection that are deemed needed (i.e. .nedeed() returns true)
>> and let receive-side decide if migration should fail if this subsection was sent but failed to be restored.
>> The only case which I think sender should limit the VMState subsection it sends to destination is because source is running older QEMU
>> which is not even aware of this VMState subsection (Which is to my understanding the rational behind using “x-migrate-smi-count” and tie it up to machine-type).
> 
> But we want to avoid failed migrations if we can; so in general we don't
> want to be sending subsections to destinations that can't handle them.
> The only case where it's reasonable is when there's a migration bug such
> that the behaviour in the guest is really nasty; if there's a choice
> between a failed migration or a hung/corrupt guest I'll take a failed
> migration.
> 
>> Third, let’s assume all hosts in fleet was upgraded to new_kernel. How do I modify all launched QEMUs on these new hosts to now have “x-migrate-smi-count” set to true?
>> As I would like future migrations to do send this VMState subsection. Currently there is no QMP command to update these flags.
> 
> I guess that's possible - it's pretty painful though; you're going to
> have to teach your management layer about features/fixes of the kernels
> and which flags to tweak in qemu.  Having said that, if you could do it,
> then you'd avoid having to restart VMs to pick up a few fixes.
> 
>> Fourth, I think it’s not trivial for management-plane to be aware with which flags it should set on destination QEMU based on currently running kernels on fleet.
>> It’s not the same as machine-type, as already discussed above doesn’t change during the entire lifetime of guest.
> 
> Right, which is why I don't see your idea of adding flags will work.
> I don't see how anything will figure out what the right flags to use
> are.
> (Getting the management layers to do sane things with the cpuid flags
> is already a nightmare, and they're fairly well understood).
> 
>> I’m also not sure it is a good idea that we currently control flags such as “x-migrate-smi-count” from machine-type.
>> As it means that if a guest was initially launched using some old QEMU, it will *forever* not migrate some VMState subsection during all it’s Live-Migrations.
>> Even if all hosts and all QEMUs on fleet are capable of migrating this state properly.
>> Maybe it is preferred that this flag was specified as part of “migrate” command itself in case management-plane knows it wishes to migrate even though dest QEMU
>> is older and doesn’t understand this specific VMState subsection.
>> 
>> I’m left pretty confused about QEMU’s migration compatibility strategy...
> 
> The compatibility strategy is the machine type;  but yes it does
> have a problem when it's not really just a qemu version - but also
> kernel (and external libraries, etc).
> My general advice is that users should be updating their kernels and
> qemus together; but I realise there's lots of cases where that
> doesn't work.
> 
> Dave

I think it’s not practical advise to expect users to always upgrade kernel and QEMU together.
In fact, users prefer to upgrade them separately to avoid doing major upgrades at once and to better pin-point root-cause of issues.

Compiling all above very useful discussion (thanks for this!), I may have a better suggestion that doesn’t require any additional flags:
1) Source QEMU will always send all all VMState subsections that is deemed by source QEMU as required to not break guest semantic behaviour.
This is done by .needed() methods that examine guest runtime state to understand if this state is required to be sent or not.
2) Destination QEMU will provide a generic QMP command which allows to set names of VMState subsections that if accepted on migration stream
and failed to be loaded (because either subsection name is not implemented or because .post_load() method failed) then the failure should be ignored
and migration should continue as usual. By default, the list of this names will be empty.
3) Destination QEMU will implement .post_load() method for all these VMState subsections that depend on kernel capability to be restored properly
such that it will fail subsection load in case kernel capability is not present. (Note that this load failure will be ignored if subsection name is specified in (2)).

Above suggestion have the following properties:
1) Doesn’t require any flag to be added to QEMU.
2) Moves all control on whether to fail migration because of failure to load VMState subsection to receiver side. Sender always attempts to send max state he believes is required.
3) We remove coupling of migration compatibility from machine-type.

What do you think?

-Liran

> 
>> -Liran
>> 
>>> 
>>>> -Liran
>>>> 
>>>>> 
>>>>>> Thanks,
>>>>>> -Liran
>>>>> --
>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>> 
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>> 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: QEMU/KVM migration backwards compatibility broken?
  2019-06-06 10:57             ` [Qemu-devel] " Liran Alon
@ 2019-06-06 11:07               ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert @ 2019-06-06 11:07 UTC (permalink / raw)
  To: Liran Alon; +Cc: kvm list, qemu-devel, Paolo Bonzini

* Liran Alon (liran.alon@oracle.com) wrote:
> 
> 
> > On 6 Jun 2019, at 13:39, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> > 
> > * Liran Alon (liran.alon@oracle.com) wrote:
> >> 
> >> 
> >>> On 6 Jun 2019, at 12:23, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> >>> 
> >>> * Liran Alon (liran.alon@oracle.com) wrote:
> >>>> 
> >>>> 
> >>>>> On 6 Jun 2019, at 11:42, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> >>>>> 
> >>>>> * Liran Alon (liran.alon@oracle.com) wrote:
> >>>>>> Hi,
> >>>>>> 
> >>>>>> Looking at QEMU source code, I am puzzled regarding how migration backwards compatibility is preserved regarding X86CPU.
> >>>>>> 
> >>>>>> As I understand it, fields that are based on KVM capabilities and guest runtime usage are defined in VMState subsections in order to not send them if not necessary.
> >>>>>> This is done such that in case they are not needed and we migrate to an old QEMU which don’t support loading this state, migration will still succeed
> >>>>>> (As .needed() method will return false and therefore this state won’t be sent as part of migration stream).
> >>>>>> Furthermore, in case .needed() returns true and old QEMU don’t support loading this state, migration fails. As it should because we are aware that guest state
> >>>>>> is not going to be restored properly on destination.
> >>>>>> 
> >>>>>> I’m puzzled about what will happen in the following scenario:
> >>>>>> 1) Source is running new QEMU with new KVM that supports save of some VMState subsection.
> >>>>>> 2) Destination is running new QEMU that supports load this state but with old kernel that doesn’t know how to load this state.
> >>>>>> 
> >>>>>> I would have expected in this case that if source .needed() returns true, then migration will fail because of lack of support in destination kernel.
> >>>>>> However, it seems from current QEMU code that this will actually succeed in many cases.
> >>>>>> 
> >>>>>> For example, if msr_smi_count is sent as part of migration stream (See vmstate_msr_smi_count) and destination have has_msr_smi_count==false,
> >>>>>> then destination will succeed loading migration stream but kvm_put_msrs() will actually ignore env->msr_smi_count and will successfully load guest state.
> >>>>>> Therefore, migration will succeed even though it should have failed…
> >>>>>> 
> >>>>>> It seems to me that QEMU should have for every such VMState subsection, a .post_load() method that verifies that relevant capability is supported by kernel
> >>>>>> and otherwise fail migration.
> >>>>>> 
> >>>>>> What do you think? Should I really create a patch to modify all these CPUX86 VMState subsections to behave like this?
> >>>>> 
> >>>>> I don't know the x86 specific side that much; but from my migration side
> >>>>> the answer should mostly be through machine types - indeed for smi-count
> >>>>> there's a property 'x-migrate-smi-count' which is off for machine types
> >>>>> pre 2.11 (see hw/i386/pc.c pc_compat_2_11) - so if you've got an old
> >>>>> kernel you should stick to the old machine types.
> >>>>> 
> >>>>> There's nothing guarding running the new machine type on old-kernels;
> >>>>> and arguably we should have a check at startup that complains if
> >>>>> your kernel is missing something the machine type uses.
> >>>>> However, that would mean that people running with -M pc   would fail
> >>>>> on old kernels.
> >>>>> 
> >>>>> A post-load is also a valid check; but one question is whether,
> >>>>> for a particular register, the pain is worth it - it depends on the
> >>>>> symptom that the missing state causes.  If it's minor then you might
> >>>>> conclude it's not worth a failed migration;  if it's a hung or
> >>>>> corrupt guest then yes it is.   Certainly a warning printed is worth
> >>>>> it.
> >>>>> 
> >>>>> Dave
> >>>> 
> >>>> I think we should have flags that allow user to specify which VMState subsections user explicitly allow to avoid restore even though they are required to fully restore guest state.
> >>>> But it seems to me that the behaviour should be to always fail migration in case we load a VMState subsections that we are unable to restore unless user explicitly specified this is ok
> >>>> for this specific subsection.
> >>>> Therefore, it seems that for every VMState subsection that it’s restore is based on kernel capability we should:
> >>>> 1) Have a user-controllable flag (which is also tied to machine-type?) to explicitly allow avoid restoring this state if cannot. Default should be “false”.
> >>>> 2) Have a .post_load() method that verifies we have required kernel capability to restore this state, unless flag (1) was specified as “true”.
> >>> 
> >>> This seems a lot of flags; users aren't going to know what to do with
> >>> all of them; I don't see what will set/control them.
> >> 
> >> True but I think users will want to specify only for a handful of VMState subsections that it is OK to not restore them even thought hey are deemed needed by source QEMU.
> >> We can create flags only for those VMState subsections.
> >> User should set these flags explicitly on QEMU command-line. As a “-cpu” property? I don’t think these flags should be tied to machine-type.
> > 
> > I don't see who is going to work out these flags and send them.
> > 
> >>> 
> >>>> Note that above mentioned flags is different than flags such as “x-migrate-smi-count”.
> >>>> The purpose of “x-migrate-smi-count” flag is to avoid sending the VMState subsection to begin with in case we know we migrate to older QEMU which don’t even have the relevant VMState subsection. But it is not relevant for the case both source and destination runs QEMU which understands the VMState subsection but run on kernels with different capabilities.
> >>>> 
> >>>> Also note regarding your first paragraph, that specifying flags based on kernel you are running on doesn’t help for the case discussed here.
> >>>> As source QEMU is running on new kernel. Unless you meant that source QEMU should use relevant machine-type based on the destination kernel.
> >>>> i.e. You should launch QEMU with old machine-type as long as you have hosts in your migration pool that runs with old kernel.
> >>> 
> >>> That's what I meant; stick to the old machine-type unless you know it's
> >>> safe to use a newer one.
> >>> 
> >>>> I don’ think it’s the right approach though. As there is no way to change flags such as “x-migrate-smi-count” dynamically after all hosts in migration pool have been upgraded.
> >>>> 
> >>>> What do you think?
> >>> 
> >>> I don't have an easy answer.  The users already have to make sure they
> >>> use a machine type that's old enough for all the QEMUs installed in
> >>> their cluster; making sure it's also old enough for their oldest
> >>> kernel isn't too big a difference - *except* that it's much harder to
> >>> tell which kernel corresponds to which feature/machine type etc - so
> >>> how does a user know what the newest supported machine type is?
> >>> Failing at startup when selecting a machine type that the current
> >>> kernel can't support would help that.
> >>> 
> >>> Dave
> >> 
> >> First, machine-type express the set of vHW behaviour and properties that is exposed to guest.
> >> Therefore, machine-type shouldn’t change for a given guest lifetime (including Live-Migrations).
> >> Otherwise, guest will experience different vHW behaviour and properties before/after Live-Migration.
> >> So I think machine-type is not relevant for this discussion. We should focus on flags which specify
> >> migration behaviour (such as “x-migrate-smi-count” which can also be controlled by machine-type but not only).
> > 
> > Machine type specifies two things:
> >  a) The view from the guest
> >  b) Migration compatibility
> > 
> > (b) is explicitly documented in qemu's docs/devel/migration.rst, see the
> > subsection on subsections.
> > 
> >> Second, this strategy results in inefficient migration management. Consider the following scenario:
> >> 1) Guest running on new_qemu+old_kernel migrate to host with new_qemu+new_kernel.
> >> Because source is old_kernel than destination QEMU is launched with (x-migrate-smi-count == false).
> >> 2) Assume at this point fleet of hosts have half of hosts with old_kernel and half with new_kernel.
> >> 3) Further assume that guest workload indeed use msr_smi_count and therefore relevant VMState subsection should be sent to properly preserve guest state.
> >> 4) From some reason, we decide to migrate again the guest in (1).
> >> Even if guest is migrated to a host with new_kernel, then QEMU still avoids sending msr_smi_count VMState subsection because it is launched with (x-migrate-smi-count == false).
> >> 
> >> Therefore, I think it makes more sense that source QEMU will always send all VMState subsection that are deemed needed (i.e. .nedeed() returns true)
> >> and let receive-side decide if migration should fail if this subsection was sent but failed to be restored.
> >> The only case which I think sender should limit the VMState subsection it sends to destination is because source is running older QEMU
> >> which is not even aware of this VMState subsection (Which is to my understanding the rational behind using “x-migrate-smi-count” and tie it up to machine-type).
> > 
> > But we want to avoid failed migrations if we can; so in general we don't
> > want to be sending subsections to destinations that can't handle them.
> > The only case where it's reasonable is when there's a migration bug such
> > that the behaviour in the guest is really nasty; if there's a choice
> > between a failed migration or a hung/corrupt guest I'll take a failed
> > migration.
> > 
> >> Third, let’s assume all hosts in fleet was upgraded to new_kernel. How do I modify all launched QEMUs on these new hosts to now have “x-migrate-smi-count” set to true?
> >> As I would like future migrations to do send this VMState subsection. Currently there is no QMP command to update these flags.
> > 
> > I guess that's possible - it's pretty painful though; you're going to
> > have to teach your management layer about features/fixes of the kernels
> > and which flags to tweak in qemu.  Having said that, if you could do it,
> > then you'd avoid having to restart VMs to pick up a few fixes.
> > 
> >> Fourth, I think it’s not trivial for management-plane to be aware with which flags it should set on destination QEMU based on currently running kernels on fleet.
> >> It’s not the same as machine-type, as already discussed above doesn’t change during the entire lifetime of guest.
> > 
> > Right, which is why I don't see your idea of adding flags will work.
> > I don't see how anything will figure out what the right flags to use
> > are.
> > (Getting the management layers to do sane things with the cpuid flags
> > is already a nightmare, and they're fairly well understood).
> > 
> >> I’m also not sure it is a good idea that we currently control flags such as “x-migrate-smi-count” from machine-type.
> >> As it means that if a guest was initially launched using some old QEMU, it will *forever* not migrate some VMState subsection during all it’s Live-Migrations.
> >> Even if all hosts and all QEMUs on fleet are capable of migrating this state properly.
> >> Maybe it is preferred that this flag was specified as part of “migrate” command itself in case management-plane knows it wishes to migrate even though dest QEMU
> >> is older and doesn’t understand this specific VMState subsection.
> >> 
> >> I’m left pretty confused about QEMU’s migration compatibility strategy...
> > 
> > The compatibility strategy is the machine type;  but yes it does
> > have a problem when it's not really just a qemu version - but also
> > kernel (and external libraries, etc).
> > My general advice is that users should be updating their kernels and
> > qemus together; but I realise there's lots of cases where that
> > doesn't work.
> > 
> > Dave
> 
> I think it’s not practical advise to expect users to always upgrade kernel and QEMU together.
> In fact, users prefer to upgrade them separately to avoid doing major upgrades at once and to better pin-point root-cause of issues.

It's tricky; for distro-based users, hitting 'update' and getting both
makes a lot of sense; but as you say you ened to let them do stuff
individually if they want to, so they can track down problems.
There's also a newer problem which is people want to run the QEMU in
containers on hosts that have separate update schedules - the kernel
version relationship is then much more fluid.

> Compiling all above very useful discussion (thanks for this!), I may have a better suggestion that doesn’t require any additional flags:
> 1) Source QEMU will always send all all VMState subsections that is deemed by source QEMU as required to not break guest semantic behaviour.
> This is done by .needed() methods that examine guest runtime state to understand if this state is required to be sent or not.

So that's as we already do.

> 2) Destination QEMU will provide a generic QMP command which allows to set names of VMState subsections that if accepted on migration stream
> and failed to be loaded (because either subsection name is not implemented or because .post_load() method failed) then the failure should be ignored
> and migration should continue as usual. By default, the list of this names will be empty.

The format of the migration stream means that you can't skip an unknown
subsection; it's not possible to resume parsing the stream without
knowing what was supposed to be there. [This is pretty awful
but my last attempts to rework it hit a dead end]

So we still need to tie subsections to machine types; that way
you don't send them to old qemu's and there for you don't have the
problem of the qemu receiving something it doesn't know.

Still, you could skip things where the destination kernel doesn't know
about it.

> 3) Destination QEMU will implement .post_load() method for all these VMState subsections that depend on kernel capability to be restored properly
> such that it will fail subsection load in case kernel capability is not present. (Note that this load failure will be ignored if subsection name is specified in (2)).
> 
> Above suggestion have the following properties:
> 1) Doesn’t require any flag to be added to QEMU.

There's no logical difference between 'flags' and 'names of subsections'
- they're got the same problem in someone somewhere knowing which are
  safe.

> 2) Moves all control on whether to fail migration because of failure to load VMState subsection to receiver side. Sender always attempts to send max state he believes is required.
> 3) We remove coupling of migration compatibility from machine-type.
> 
> What do you think?

Sorry, can't do (3) - we need to keep the binding for subsections to
machine types for qemu compatibility;  I'm open for something for
kernel compat, but not when it's breaking the qemu subsection
checks.

Dave

> 
> -Liran
> 
> > 
> >> -Liran
> >> 
> >>> 
> >>>> -Liran
> >>>> 
> >>>>> 
> >>>>>> Thanks,
> >>>>>> -Liran
> >>>>> --
> >>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >>>> 
> >>> --
> >>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >> 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] QEMU/KVM migration backwards compatibility broken?
@ 2019-06-06 11:07               ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert @ 2019-06-06 11:07 UTC (permalink / raw)
  To: Liran Alon; +Cc: Paolo Bonzini, qemu-devel, kvm list

* Liran Alon (liran.alon@oracle.com) wrote:
> 
> 
> > On 6 Jun 2019, at 13:39, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> > 
> > * Liran Alon (liran.alon@oracle.com) wrote:
> >> 
> >> 
> >>> On 6 Jun 2019, at 12:23, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> >>> 
> >>> * Liran Alon (liran.alon@oracle.com) wrote:
> >>>> 
> >>>> 
> >>>>> On 6 Jun 2019, at 11:42, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> >>>>> 
> >>>>> * Liran Alon (liran.alon@oracle.com) wrote:
> >>>>>> Hi,
> >>>>>> 
> >>>>>> Looking at QEMU source code, I am puzzled regarding how migration backwards compatibility is preserved regarding X86CPU.
> >>>>>> 
> >>>>>> As I understand it, fields that are based on KVM capabilities and guest runtime usage are defined in VMState subsections in order to not send them if not necessary.
> >>>>>> This is done such that in case they are not needed and we migrate to an old QEMU which don’t support loading this state, migration will still succeed
> >>>>>> (As .needed() method will return false and therefore this state won’t be sent as part of migration stream).
> >>>>>> Furthermore, in case .needed() returns true and old QEMU don’t support loading this state, migration fails. As it should because we are aware that guest state
> >>>>>> is not going to be restored properly on destination.
> >>>>>> 
> >>>>>> I’m puzzled about what will happen in the following scenario:
> >>>>>> 1) Source is running new QEMU with new KVM that supports save of some VMState subsection.
> >>>>>> 2) Destination is running new QEMU that supports load this state but with old kernel that doesn’t know how to load this state.
> >>>>>> 
> >>>>>> I would have expected in this case that if source .needed() returns true, then migration will fail because of lack of support in destination kernel.
> >>>>>> However, it seems from current QEMU code that this will actually succeed in many cases.
> >>>>>> 
> >>>>>> For example, if msr_smi_count is sent as part of migration stream (See vmstate_msr_smi_count) and destination have has_msr_smi_count==false,
> >>>>>> then destination will succeed loading migration stream but kvm_put_msrs() will actually ignore env->msr_smi_count and will successfully load guest state.
> >>>>>> Therefore, migration will succeed even though it should have failed…
> >>>>>> 
> >>>>>> It seems to me that QEMU should have for every such VMState subsection, a .post_load() method that verifies that relevant capability is supported by kernel
> >>>>>> and otherwise fail migration.
> >>>>>> 
> >>>>>> What do you think? Should I really create a patch to modify all these CPUX86 VMState subsections to behave like this?
> >>>>> 
> >>>>> I don't know the x86 specific side that much; but from my migration side
> >>>>> the answer should mostly be through machine types - indeed for smi-count
> >>>>> there's a property 'x-migrate-smi-count' which is off for machine types
> >>>>> pre 2.11 (see hw/i386/pc.c pc_compat_2_11) - so if you've got an old
> >>>>> kernel you should stick to the old machine types.
> >>>>> 
> >>>>> There's nothing guarding running the new machine type on old-kernels;
> >>>>> and arguably we should have a check at startup that complains if
> >>>>> your kernel is missing something the machine type uses.
> >>>>> However, that would mean that people running with -M pc   would fail
> >>>>> on old kernels.
> >>>>> 
> >>>>> A post-load is also a valid check; but one question is whether,
> >>>>> for a particular register, the pain is worth it - it depends on the
> >>>>> symptom that the missing state causes.  If it's minor then you might
> >>>>> conclude it's not worth a failed migration;  if it's a hung or
> >>>>> corrupt guest then yes it is.   Certainly a warning printed is worth
> >>>>> it.
> >>>>> 
> >>>>> Dave
> >>>> 
> >>>> I think we should have flags that allow user to specify which VMState subsections user explicitly allow to avoid restore even though they are required to fully restore guest state.
> >>>> But it seems to me that the behaviour should be to always fail migration in case we load a VMState subsections that we are unable to restore unless user explicitly specified this is ok
> >>>> for this specific subsection.
> >>>> Therefore, it seems that for every VMState subsection that it’s restore is based on kernel capability we should:
> >>>> 1) Have a user-controllable flag (which is also tied to machine-type?) to explicitly allow avoid restoring this state if cannot. Default should be “false”.
> >>>> 2) Have a .post_load() method that verifies we have required kernel capability to restore this state, unless flag (1) was specified as “true”.
> >>> 
> >>> This seems a lot of flags; users aren't going to know what to do with
> >>> all of them; I don't see what will set/control them.
> >> 
> >> True but I think users will want to specify only for a handful of VMState subsections that it is OK to not restore them even thought hey are deemed needed by source QEMU.
> >> We can create flags only for those VMState subsections.
> >> User should set these flags explicitly on QEMU command-line. As a “-cpu” property? I don’t think these flags should be tied to machine-type.
> > 
> > I don't see who is going to work out these flags and send them.
> > 
> >>> 
> >>>> Note that above mentioned flags is different than flags such as “x-migrate-smi-count”.
> >>>> The purpose of “x-migrate-smi-count” flag is to avoid sending the VMState subsection to begin with in case we know we migrate to older QEMU which don’t even have the relevant VMState subsection. But it is not relevant for the case both source and destination runs QEMU which understands the VMState subsection but run on kernels with different capabilities.
> >>>> 
> >>>> Also note regarding your first paragraph, that specifying flags based on kernel you are running on doesn’t help for the case discussed here.
> >>>> As source QEMU is running on new kernel. Unless you meant that source QEMU should use relevant machine-type based on the destination kernel.
> >>>> i.e. You should launch QEMU with old machine-type as long as you have hosts in your migration pool that runs with old kernel.
> >>> 
> >>> That's what I meant; stick to the old machine-type unless you know it's
> >>> safe to use a newer one.
> >>> 
> >>>> I don’ think it’s the right approach though. As there is no way to change flags such as “x-migrate-smi-count” dynamically after all hosts in migration pool have been upgraded.
> >>>> 
> >>>> What do you think?
> >>> 
> >>> I don't have an easy answer.  The users already have to make sure they
> >>> use a machine type that's old enough for all the QEMUs installed in
> >>> their cluster; making sure it's also old enough for their oldest
> >>> kernel isn't too big a difference - *except* that it's much harder to
> >>> tell which kernel corresponds to which feature/machine type etc - so
> >>> how does a user know what the newest supported machine type is?
> >>> Failing at startup when selecting a machine type that the current
> >>> kernel can't support would help that.
> >>> 
> >>> Dave
> >> 
> >> First, machine-type express the set of vHW behaviour and properties that is exposed to guest.
> >> Therefore, machine-type shouldn’t change for a given guest lifetime (including Live-Migrations).
> >> Otherwise, guest will experience different vHW behaviour and properties before/after Live-Migration.
> >> So I think machine-type is not relevant for this discussion. We should focus on flags which specify
> >> migration behaviour (such as “x-migrate-smi-count” which can also be controlled by machine-type but not only).
> > 
> > Machine type specifies two things:
> >  a) The view from the guest
> >  b) Migration compatibility
> > 
> > (b) is explicitly documented in qemu's docs/devel/migration.rst, see the
> > subsection on subsections.
> > 
> >> Second, this strategy results in inefficient migration management. Consider the following scenario:
> >> 1) Guest running on new_qemu+old_kernel migrate to host with new_qemu+new_kernel.
> >> Because source is old_kernel than destination QEMU is launched with (x-migrate-smi-count == false).
> >> 2) Assume at this point fleet of hosts have half of hosts with old_kernel and half with new_kernel.
> >> 3) Further assume that guest workload indeed use msr_smi_count and therefore relevant VMState subsection should be sent to properly preserve guest state.
> >> 4) From some reason, we decide to migrate again the guest in (1).
> >> Even if guest is migrated to a host with new_kernel, then QEMU still avoids sending msr_smi_count VMState subsection because it is launched with (x-migrate-smi-count == false).
> >> 
> >> Therefore, I think it makes more sense that source QEMU will always send all VMState subsection that are deemed needed (i.e. .nedeed() returns true)
> >> and let receive-side decide if migration should fail if this subsection was sent but failed to be restored.
> >> The only case which I think sender should limit the VMState subsection it sends to destination is because source is running older QEMU
> >> which is not even aware of this VMState subsection (Which is to my understanding the rational behind using “x-migrate-smi-count” and tie it up to machine-type).
> > 
> > But we want to avoid failed migrations if we can; so in general we don't
> > want to be sending subsections to destinations that can't handle them.
> > The only case where it's reasonable is when there's a migration bug such
> > that the behaviour in the guest is really nasty; if there's a choice
> > between a failed migration or a hung/corrupt guest I'll take a failed
> > migration.
> > 
> >> Third, let’s assume all hosts in fleet was upgraded to new_kernel. How do I modify all launched QEMUs on these new hosts to now have “x-migrate-smi-count” set to true?
> >> As I would like future migrations to do send this VMState subsection. Currently there is no QMP command to update these flags.
> > 
> > I guess that's possible - it's pretty painful though; you're going to
> > have to teach your management layer about features/fixes of the kernels
> > and which flags to tweak in qemu.  Having said that, if you could do it,
> > then you'd avoid having to restart VMs to pick up a few fixes.
> > 
> >> Fourth, I think it’s not trivial for management-plane to be aware with which flags it should set on destination QEMU based on currently running kernels on fleet.
> >> It’s not the same as machine-type, as already discussed above doesn’t change during the entire lifetime of guest.
> > 
> > Right, which is why I don't see your idea of adding flags will work.
> > I don't see how anything will figure out what the right flags to use
> > are.
> > (Getting the management layers to do sane things with the cpuid flags
> > is already a nightmare, and they're fairly well understood).
> > 
> >> I’m also not sure it is a good idea that we currently control flags such as “x-migrate-smi-count” from machine-type.
> >> As it means that if a guest was initially launched using some old QEMU, it will *forever* not migrate some VMState subsection during all it’s Live-Migrations.
> >> Even if all hosts and all QEMUs on fleet are capable of migrating this state properly.
> >> Maybe it is preferred that this flag was specified as part of “migrate” command itself in case management-plane knows it wishes to migrate even though dest QEMU
> >> is older and doesn’t understand this specific VMState subsection.
> >> 
> >> I’m left pretty confused about QEMU’s migration compatibility strategy...
> > 
> > The compatibility strategy is the machine type;  but yes it does
> > have a problem when it's not really just a qemu version - but also
> > kernel (and external libraries, etc).
> > My general advice is that users should be updating their kernels and
> > qemus together; but I realise there's lots of cases where that
> > doesn't work.
> > 
> > Dave
> 
> I think it’s not practical advise to expect users to always upgrade kernel and QEMU together.
> In fact, users prefer to upgrade them separately to avoid doing major upgrades at once and to better pin-point root-cause of issues.

It's tricky; for distro-based users, hitting 'update' and getting both
makes a lot of sense; but as you say you ened to let them do stuff
individually if they want to, so they can track down problems.
There's also a newer problem which is people want to run the QEMU in
containers on hosts that have separate update schedules - the kernel
version relationship is then much more fluid.

> Compiling all above very useful discussion (thanks for this!), I may have a better suggestion that doesn’t require any additional flags:
> 1) Source QEMU will always send all all VMState subsections that is deemed by source QEMU as required to not break guest semantic behaviour.
> This is done by .needed() methods that examine guest runtime state to understand if this state is required to be sent or not.

So that's as we already do.

> 2) Destination QEMU will provide a generic QMP command which allows to set names of VMState subsections that if accepted on migration stream
> and failed to be loaded (because either subsection name is not implemented or because .post_load() method failed) then the failure should be ignored
> and migration should continue as usual. By default, the list of this names will be empty.

The format of the migration stream means that you can't skip an unknown
subsection; it's not possible to resume parsing the stream without
knowing what was supposed to be there. [This is pretty awful
but my last attempts to rework it hit a dead end]

So we still need to tie subsections to machine types; that way
you don't send them to old qemu's and there for you don't have the
problem of the qemu receiving something it doesn't know.

Still, you could skip things where the destination kernel doesn't know
about it.

> 3) Destination QEMU will implement .post_load() method for all these VMState subsections that depend on kernel capability to be restored properly
> such that it will fail subsection load in case kernel capability is not present. (Note that this load failure will be ignored if subsection name is specified in (2)).
> 
> Above suggestion have the following properties:
> 1) Doesn’t require any flag to be added to QEMU.

There's no logical difference between 'flags' and 'names of subsections'
- they're got the same problem in someone somewhere knowing which are
  safe.

> 2) Moves all control on whether to fail migration because of failure to load VMState subsection to receiver side. Sender always attempts to send max state he believes is required.
> 3) We remove coupling of migration compatibility from machine-type.
> 
> What do you think?

Sorry, can't do (3) - we need to keep the binding for subsections to
machine types for qemu compatibility;  I'm open for something for
kernel compat, but not when it's breaking the qemu subsection
checks.

Dave

> 
> -Liran
> 
> > 
> >> -Liran
> >> 
> >>> 
> >>>> -Liran
> >>>> 
> >>>>> 
> >>>>>> Thanks,
> >>>>>> -Liran
> >>>>> --
> >>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >>>> 
> >>> --
> >>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >> 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: QEMU/KVM migration backwards compatibility broken?
  2019-06-06 11:07               ` [Qemu-devel] " Dr. David Alan Gilbert
@ 2019-06-06 11:29                 ` Liran Alon
  -1 siblings, 0 replies; 26+ messages in thread
From: Liran Alon @ 2019-06-06 11:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: kvm list, qemu-devel, Paolo Bonzini



> On 6 Jun 2019, at 14:07, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> 
> * Liran Alon (liran.alon@oracle.com) wrote:
>> 
>> 
>>> On 6 Jun 2019, at 13:39, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
>>> 
>>> * Liran Alon (liran.alon@oracle.com) wrote:
>>>> 
>>>> 
>>>>> On 6 Jun 2019, at 12:23, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
>>>>> 
>>>>> * Liran Alon (liran.alon@oracle.com) wrote:
>>>>>> 
>>>>>> 
>>>>>>> On 6 Jun 2019, at 11:42, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
>>>>>>> 
>>>>>>> * Liran Alon (liran.alon@oracle.com) wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> Looking at QEMU source code, I am puzzled regarding how migration backwards compatibility is preserved regarding X86CPU.
>>>>>>>> 
>>>>>>>> As I understand it, fields that are based on KVM capabilities and guest runtime usage are defined in VMState subsections in order to not send them if not necessary.
>>>>>>>> This is done such that in case they are not needed and we migrate to an old QEMU which don’t support loading this state, migration will still succeed
>>>>>>>> (As .needed() method will return false and therefore this state won’t be sent as part of migration stream).
>>>>>>>> Furthermore, in case .needed() returns true and old QEMU don’t support loading this state, migration fails. As it should because we are aware that guest state
>>>>>>>> is not going to be restored properly on destination.
>>>>>>>> 
>>>>>>>> I’m puzzled about what will happen in the following scenario:
>>>>>>>> 1) Source is running new QEMU with new KVM that supports save of some VMState subsection.
>>>>>>>> 2) Destination is running new QEMU that supports load this state but with old kernel that doesn’t know how to load this state.
>>>>>>>> 
>>>>>>>> I would have expected in this case that if source .needed() returns true, then migration will fail because of lack of support in destination kernel.
>>>>>>>> However, it seems from current QEMU code that this will actually succeed in many cases.
>>>>>>>> 
>>>>>>>> For example, if msr_smi_count is sent as part of migration stream (See vmstate_msr_smi_count) and destination have has_msr_smi_count==false,
>>>>>>>> then destination will succeed loading migration stream but kvm_put_msrs() will actually ignore env->msr_smi_count and will successfully load guest state.
>>>>>>>> Therefore, migration will succeed even though it should have failed…
>>>>>>>> 
>>>>>>>> It seems to me that QEMU should have for every such VMState subsection, a .post_load() method that verifies that relevant capability is supported by kernel
>>>>>>>> and otherwise fail migration.
>>>>>>>> 
>>>>>>>> What do you think? Should I really create a patch to modify all these CPUX86 VMState subsections to behave like this?
>>>>>>> 
>>>>>>> I don't know the x86 specific side that much; but from my migration side
>>>>>>> the answer should mostly be through machine types - indeed for smi-count
>>>>>>> there's a property 'x-migrate-smi-count' which is off for machine types
>>>>>>> pre 2.11 (see hw/i386/pc.c pc_compat_2_11) - so if you've got an old
>>>>>>> kernel you should stick to the old machine types.
>>>>>>> 
>>>>>>> There's nothing guarding running the new machine type on old-kernels;
>>>>>>> and arguably we should have a check at startup that complains if
>>>>>>> your kernel is missing something the machine type uses.
>>>>>>> However, that would mean that people running with -M pc   would fail
>>>>>>> on old kernels.
>>>>>>> 
>>>>>>> A post-load is also a valid check; but one question is whether,
>>>>>>> for a particular register, the pain is worth it - it depends on the
>>>>>>> symptom that the missing state causes.  If it's minor then you might
>>>>>>> conclude it's not worth a failed migration;  if it's a hung or
>>>>>>> corrupt guest then yes it is.   Certainly a warning printed is worth
>>>>>>> it.
>>>>>>> 
>>>>>>> Dave
>>>>>> 
>>>>>> I think we should have flags that allow user to specify which VMState subsections user explicitly allow to avoid restore even though they are required to fully restore guest state.
>>>>>> But it seems to me that the behaviour should be to always fail migration in case we load a VMState subsections that we are unable to restore unless user explicitly specified this is ok
>>>>>> for this specific subsection.
>>>>>> Therefore, it seems that for every VMState subsection that it’s restore is based on kernel capability we should:
>>>>>> 1) Have a user-controllable flag (which is also tied to machine-type?) to explicitly allow avoid restoring this state if cannot. Default should be “false”.
>>>>>> 2) Have a .post_load() method that verifies we have required kernel capability to restore this state, unless flag (1) was specified as “true”.
>>>>> 
>>>>> This seems a lot of flags; users aren't going to know what to do with
>>>>> all of them; I don't see what will set/control them.
>>>> 
>>>> True but I think users will want to specify only for a handful of VMState subsections that it is OK to not restore them even thought hey are deemed needed by source QEMU.
>>>> We can create flags only for those VMState subsections.
>>>> User should set these flags explicitly on QEMU command-line. As a “-cpu” property? I don’t think these flags should be tied to machine-type.
>>> 
>>> I don't see who is going to work out these flags and send them.
>>> 
>>>>> 
>>>>>> Note that above mentioned flags is different than flags such as “x-migrate-smi-count”.
>>>>>> The purpose of “x-migrate-smi-count” flag is to avoid sending the VMState subsection to begin with in case we know we migrate to older QEMU which don’t even have the relevant VMState subsection. But it is not relevant for the case both source and destination runs QEMU which understands the VMState subsection but run on kernels with different capabilities.
>>>>>> 
>>>>>> Also note regarding your first paragraph, that specifying flags based on kernel you are running on doesn’t help for the case discussed here.
>>>>>> As source QEMU is running on new kernel. Unless you meant that source QEMU should use relevant machine-type based on the destination kernel.
>>>>>> i.e. You should launch QEMU with old machine-type as long as you have hosts in your migration pool that runs with old kernel.
>>>>> 
>>>>> That's what I meant; stick to the old machine-type unless you know it's
>>>>> safe to use a newer one.
>>>>> 
>>>>>> I don’ think it’s the right approach though. As there is no way to change flags such as “x-migrate-smi-count” dynamically after all hosts in migration pool have been upgraded.
>>>>>> 
>>>>>> What do you think?
>>>>> 
>>>>> I don't have an easy answer.  The users already have to make sure they
>>>>> use a machine type that's old enough for all the QEMUs installed in
>>>>> their cluster; making sure it's also old enough for their oldest
>>>>> kernel isn't too big a difference - *except* that it's much harder to
>>>>> tell which kernel corresponds to which feature/machine type etc - so
>>>>> how does a user know what the newest supported machine type is?
>>>>> Failing at startup when selecting a machine type that the current
>>>>> kernel can't support would help that.
>>>>> 
>>>>> Dave
>>>> 
>>>> First, machine-type express the set of vHW behaviour and properties that is exposed to guest.
>>>> Therefore, machine-type shouldn’t change for a given guest lifetime (including Live-Migrations).
>>>> Otherwise, guest will experience different vHW behaviour and properties before/after Live-Migration.
>>>> So I think machine-type is not relevant for this discussion. We should focus on flags which specify
>>>> migration behaviour (such as “x-migrate-smi-count” which can also be controlled by machine-type but not only).
>>> 
>>> Machine type specifies two things:
>>> a) The view from the guest
>>> b) Migration compatibility
>>> 
>>> (b) is explicitly documented in qemu's docs/devel/migration.rst, see the
>>> subsection on subsections.
>>> 
>>>> Second, this strategy results in inefficient migration management. Consider the following scenario:
>>>> 1) Guest running on new_qemu+old_kernel migrate to host with new_qemu+new_kernel.
>>>> Because source is old_kernel than destination QEMU is launched with (x-migrate-smi-count == false).
>>>> 2) Assume at this point fleet of hosts have half of hosts with old_kernel and half with new_kernel.
>>>> 3) Further assume that guest workload indeed use msr_smi_count and therefore relevant VMState subsection should be sent to properly preserve guest state.
>>>> 4) From some reason, we decide to migrate again the guest in (1).
>>>> Even if guest is migrated to a host with new_kernel, then QEMU still avoids sending msr_smi_count VMState subsection because it is launched with (x-migrate-smi-count == false).
>>>> 
>>>> Therefore, I think it makes more sense that source QEMU will always send all VMState subsection that are deemed needed (i.e. .nedeed() returns true)
>>>> and let receive-side decide if migration should fail if this subsection was sent but failed to be restored.
>>>> The only case which I think sender should limit the VMState subsection it sends to destination is because source is running older QEMU
>>>> which is not even aware of this VMState subsection (Which is to my understanding the rational behind using “x-migrate-smi-count” and tie it up to machine-type).
>>> 
>>> But we want to avoid failed migrations if we can; so in general we don't
>>> want to be sending subsections to destinations that can't handle them.
>>> The only case where it's reasonable is when there's a migration bug such
>>> that the behaviour in the guest is really nasty; if there's a choice
>>> between a failed migration or a hung/corrupt guest I'll take a failed
>>> migration.
>>> 
>>>> Third, let’s assume all hosts in fleet was upgraded to new_kernel. How do I modify all launched QEMUs on these new hosts to now have “x-migrate-smi-count” set to true?
>>>> As I would like future migrations to do send this VMState subsection. Currently there is no QMP command to update these flags.
>>> 
>>> I guess that's possible - it's pretty painful though; you're going to
>>> have to teach your management layer about features/fixes of the kernels
>>> and which flags to tweak in qemu.  Having said that, if you could do it,
>>> then you'd avoid having to restart VMs to pick up a few fixes.
>>> 
>>>> Fourth, I think it’s not trivial for management-plane to be aware with which flags it should set on destination QEMU based on currently running kernels on fleet.
>>>> It’s not the same as machine-type, as already discussed above doesn’t change during the entire lifetime of guest.
>>> 
>>> Right, which is why I don't see your idea of adding flags will work.
>>> I don't see how anything will figure out what the right flags to use
>>> are.
>>> (Getting the management layers to do sane things with the cpuid flags
>>> is already a nightmare, and they're fairly well understood).
>>> 
>>>> I’m also not sure it is a good idea that we currently control flags such as “x-migrate-smi-count” from machine-type.
>>>> As it means that if a guest was initially launched using some old QEMU, it will *forever* not migrate some VMState subsection during all it’s Live-Migrations.
>>>> Even if all hosts and all QEMUs on fleet are capable of migrating this state properly.
>>>> Maybe it is preferred that this flag was specified as part of “migrate” command itself in case management-plane knows it wishes to migrate even though dest QEMU
>>>> is older and doesn’t understand this specific VMState subsection.
>>>> 
>>>> I’m left pretty confused about QEMU’s migration compatibility strategy...
>>> 
>>> The compatibility strategy is the machine type;  but yes it does
>>> have a problem when it's not really just a qemu version - but also
>>> kernel (and external libraries, etc).
>>> My general advice is that users should be updating their kernels and
>>> qemus together; but I realise there's lots of cases where that
>>> doesn't work.
>>> 
>>> Dave
>> 
>> I think it’s not practical advise to expect users to always upgrade kernel and QEMU together.
>> In fact, users prefer to upgrade them separately to avoid doing major upgrades at once and to better pin-point root-cause of issues.
> 
> It's tricky; for distro-based users, hitting 'update' and getting both
> makes a lot of sense; but as you say you ened to let them do stuff
> individually if they want to, so they can track down problems.
> There's also a newer problem which is people want to run the QEMU in
> containers on hosts that have separate update schedules - the kernel
> version relationship is then much more fluid.
> 
>> Compiling all above very useful discussion (thanks for this!), I may have a better suggestion that doesn’t require any additional flags:
>> 1) Source QEMU will always send all all VMState subsections that is deemed by source QEMU as required to not break guest semantic behaviour.
>> This is done by .needed() methods that examine guest runtime state to understand if this state is required to be sent or not.
> 
> So that's as we already do.

Besides the fact that today we also expect to add a flag tied to machine-type for every new VMState subsection we add that didn’t exist on previous QEMU versions...

> 
>> 2) Destination QEMU will provide a generic QMP command which allows to set names of VMState subsections that if accepted on migration stream
>> and failed to be loaded (because either subsection name is not implemented or because .post_load() method failed) then the failure should be ignored
>> and migration should continue as usual. By default, the list of this names will be empty.
> 
> The format of the migration stream means that you can't skip an unknown
> subsection; it's not possible to resume parsing the stream without
> knowing what was supposed to be there. [This is pretty awful
> but my last attempts to rework it hit a dead end]

Wow… That is indeed pretty awful.
I thought every VMState subsection have a header with a length field… :(

Why did your last attempts to add such a length field to migration stream protocol failed?

> 
> So we still need to tie subsections to machine types; that way
> you don't send them to old qemu's and there for you don't have the
> problem of the qemu receiving something it doesn't know.

I agree that if there is no way to skip a VMState subsection in the stream, then we must
have a way to specify to source QEMU to prevent sending this subsection to destination…

I would suggest though that instead of having a flag tied to machine-type, we will have a QMP command
that can specify names of subsections we explicitly wish to be skipped sending to destination even if their .needed() method returns true.

This seems like a more explicit approach and doesn’t come with the down-side of forever not migrating this VMState subsection
for the entire lifetime of guest.

> 
> Still, you could skip things where the destination kernel doesn't know
> about it.
> 
>> 3) Destination QEMU will implement .post_load() method for all these VMState subsections that depend on kernel capability to be restored properly
>> such that it will fail subsection load in case kernel capability is not present. (Note that this load failure will be ignored if subsection name is specified in (2)).
>> 
>> Above suggestion have the following properties:
>> 1) Doesn’t require any flag to be added to QEMU.
> 
> There's no logical difference between 'flags' and 'names of subsections'
> - they're got the same problem in someone somewhere knowing which are
>  safe.

I agree. But creating additional flags does come with a development and testing overhead and makes code less intuitive.
I would have prefer to use subsection names.

> 
>> 2) Moves all control on whether to fail migration because of failure to load VMState subsection to receiver side. Sender always attempts to send max state he believes is required.
>> 3) We remove coupling of migration compatibility from machine-type.
>> 
>> What do you think?
> 
> Sorry, can't do (3) - we need to keep the binding for subsections to
> machine types for qemu compatibility;  I'm open for something for
> kernel compat, but not when it's breaking the qemu subsection
> checks.
> 
> Dave

Agree. I have proposed now above how to not break qemu subsection checks while still not tie this to machine-type.
Please tell me what you think on that approach. :)

We can combine that approach together with implementing the mentioned .post_load() methods and maybe it solves the discussion at hand here.

-Liran

> 
>> 
>> -Liran
>> 
>>> 
>>>> -Liran
>>>> 
>>>>> 
>>>>>> -Liran
>>>>>> 
>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> -Liran
>>>>>>> --
>>>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>>>> 
>>>>> --
>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>> 
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>> 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] QEMU/KVM migration backwards compatibility broken?
@ 2019-06-06 11:29                 ` Liran Alon
  0 siblings, 0 replies; 26+ messages in thread
From: Liran Alon @ 2019-06-06 11:29 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Paolo Bonzini, qemu-devel, kvm list



> On 6 Jun 2019, at 14:07, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> 
> * Liran Alon (liran.alon@oracle.com) wrote:
>> 
>> 
>>> On 6 Jun 2019, at 13:39, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
>>> 
>>> * Liran Alon (liran.alon@oracle.com) wrote:
>>>> 
>>>> 
>>>>> On 6 Jun 2019, at 12:23, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
>>>>> 
>>>>> * Liran Alon (liran.alon@oracle.com) wrote:
>>>>>> 
>>>>>> 
>>>>>>> On 6 Jun 2019, at 11:42, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
>>>>>>> 
>>>>>>> * Liran Alon (liran.alon@oracle.com) wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> Looking at QEMU source code, I am puzzled regarding how migration backwards compatibility is preserved regarding X86CPU.
>>>>>>>> 
>>>>>>>> As I understand it, fields that are based on KVM capabilities and guest runtime usage are defined in VMState subsections in order to not send them if not necessary.
>>>>>>>> This is done such that in case they are not needed and we migrate to an old QEMU which don’t support loading this state, migration will still succeed
>>>>>>>> (As .needed() method will return false and therefore this state won’t be sent as part of migration stream).
>>>>>>>> Furthermore, in case .needed() returns true and old QEMU don’t support loading this state, migration fails. As it should because we are aware that guest state
>>>>>>>> is not going to be restored properly on destination.
>>>>>>>> 
>>>>>>>> I’m puzzled about what will happen in the following scenario:
>>>>>>>> 1) Source is running new QEMU with new KVM that supports save of some VMState subsection.
>>>>>>>> 2) Destination is running new QEMU that supports load this state but with old kernel that doesn’t know how to load this state.
>>>>>>>> 
>>>>>>>> I would have expected in this case that if source .needed() returns true, then migration will fail because of lack of support in destination kernel.
>>>>>>>> However, it seems from current QEMU code that this will actually succeed in many cases.
>>>>>>>> 
>>>>>>>> For example, if msr_smi_count is sent as part of migration stream (See vmstate_msr_smi_count) and destination have has_msr_smi_count==false,
>>>>>>>> then destination will succeed loading migration stream but kvm_put_msrs() will actually ignore env->msr_smi_count and will successfully load guest state.
>>>>>>>> Therefore, migration will succeed even though it should have failed…
>>>>>>>> 
>>>>>>>> It seems to me that QEMU should have for every such VMState subsection, a .post_load() method that verifies that relevant capability is supported by kernel
>>>>>>>> and otherwise fail migration.
>>>>>>>> 
>>>>>>>> What do you think? Should I really create a patch to modify all these CPUX86 VMState subsections to behave like this?
>>>>>>> 
>>>>>>> I don't know the x86 specific side that much; but from my migration side
>>>>>>> the answer should mostly be through machine types - indeed for smi-count
>>>>>>> there's a property 'x-migrate-smi-count' which is off for machine types
>>>>>>> pre 2.11 (see hw/i386/pc.c pc_compat_2_11) - so if you've got an old
>>>>>>> kernel you should stick to the old machine types.
>>>>>>> 
>>>>>>> There's nothing guarding running the new machine type on old-kernels;
>>>>>>> and arguably we should have a check at startup that complains if
>>>>>>> your kernel is missing something the machine type uses.
>>>>>>> However, that would mean that people running with -M pc   would fail
>>>>>>> on old kernels.
>>>>>>> 
>>>>>>> A post-load is also a valid check; but one question is whether,
>>>>>>> for a particular register, the pain is worth it - it depends on the
>>>>>>> symptom that the missing state causes.  If it's minor then you might
>>>>>>> conclude it's not worth a failed migration;  if it's a hung or
>>>>>>> corrupt guest then yes it is.   Certainly a warning printed is worth
>>>>>>> it.
>>>>>>> 
>>>>>>> Dave
>>>>>> 
>>>>>> I think we should have flags that allow user to specify which VMState subsections user explicitly allow to avoid restore even though they are required to fully restore guest state.
>>>>>> But it seems to me that the behaviour should be to always fail migration in case we load a VMState subsections that we are unable to restore unless user explicitly specified this is ok
>>>>>> for this specific subsection.
>>>>>> Therefore, it seems that for every VMState subsection that it’s restore is based on kernel capability we should:
>>>>>> 1) Have a user-controllable flag (which is also tied to machine-type?) to explicitly allow avoid restoring this state if cannot. Default should be “false”.
>>>>>> 2) Have a .post_load() method that verifies we have required kernel capability to restore this state, unless flag (1) was specified as “true”.
>>>>> 
>>>>> This seems a lot of flags; users aren't going to know what to do with
>>>>> all of them; I don't see what will set/control them.
>>>> 
>>>> True but I think users will want to specify only for a handful of VMState subsections that it is OK to not restore them even thought hey are deemed needed by source QEMU.
>>>> We can create flags only for those VMState subsections.
>>>> User should set these flags explicitly on QEMU command-line. As a “-cpu” property? I don’t think these flags should be tied to machine-type.
>>> 
>>> I don't see who is going to work out these flags and send them.
>>> 
>>>>> 
>>>>>> Note that above mentioned flags is different than flags such as “x-migrate-smi-count”.
>>>>>> The purpose of “x-migrate-smi-count” flag is to avoid sending the VMState subsection to begin with in case we know we migrate to older QEMU which don’t even have the relevant VMState subsection. But it is not relevant for the case both source and destination runs QEMU which understands the VMState subsection but run on kernels with different capabilities.
>>>>>> 
>>>>>> Also note regarding your first paragraph, that specifying flags based on kernel you are running on doesn’t help for the case discussed here.
>>>>>> As source QEMU is running on new kernel. Unless you meant that source QEMU should use relevant machine-type based on the destination kernel.
>>>>>> i.e. You should launch QEMU with old machine-type as long as you have hosts in your migration pool that runs with old kernel.
>>>>> 
>>>>> That's what I meant; stick to the old machine-type unless you know it's
>>>>> safe to use a newer one.
>>>>> 
>>>>>> I don’ think it’s the right approach though. As there is no way to change flags such as “x-migrate-smi-count” dynamically after all hosts in migration pool have been upgraded.
>>>>>> 
>>>>>> What do you think?
>>>>> 
>>>>> I don't have an easy answer.  The users already have to make sure they
>>>>> use a machine type that's old enough for all the QEMUs installed in
>>>>> their cluster; making sure it's also old enough for their oldest
>>>>> kernel isn't too big a difference - *except* that it's much harder to
>>>>> tell which kernel corresponds to which feature/machine type etc - so
>>>>> how does a user know what the newest supported machine type is?
>>>>> Failing at startup when selecting a machine type that the current
>>>>> kernel can't support would help that.
>>>>> 
>>>>> Dave
>>>> 
>>>> First, machine-type express the set of vHW behaviour and properties that is exposed to guest.
>>>> Therefore, machine-type shouldn’t change for a given guest lifetime (including Live-Migrations).
>>>> Otherwise, guest will experience different vHW behaviour and properties before/after Live-Migration.
>>>> So I think machine-type is not relevant for this discussion. We should focus on flags which specify
>>>> migration behaviour (such as “x-migrate-smi-count” which can also be controlled by machine-type but not only).
>>> 
>>> Machine type specifies two things:
>>> a) The view from the guest
>>> b) Migration compatibility
>>> 
>>> (b) is explicitly documented in qemu's docs/devel/migration.rst, see the
>>> subsection on subsections.
>>> 
>>>> Second, this strategy results in inefficient migration management. Consider the following scenario:
>>>> 1) Guest running on new_qemu+old_kernel migrate to host with new_qemu+new_kernel.
>>>> Because source is old_kernel than destination QEMU is launched with (x-migrate-smi-count == false).
>>>> 2) Assume at this point fleet of hosts have half of hosts with old_kernel and half with new_kernel.
>>>> 3) Further assume that guest workload indeed use msr_smi_count and therefore relevant VMState subsection should be sent to properly preserve guest state.
>>>> 4) From some reason, we decide to migrate again the guest in (1).
>>>> Even if guest is migrated to a host with new_kernel, then QEMU still avoids sending msr_smi_count VMState subsection because it is launched with (x-migrate-smi-count == false).
>>>> 
>>>> Therefore, I think it makes more sense that source QEMU will always send all VMState subsection that are deemed needed (i.e. .nedeed() returns true)
>>>> and let receive-side decide if migration should fail if this subsection was sent but failed to be restored.
>>>> The only case which I think sender should limit the VMState subsection it sends to destination is because source is running older QEMU
>>>> which is not even aware of this VMState subsection (Which is to my understanding the rational behind using “x-migrate-smi-count” and tie it up to machine-type).
>>> 
>>> But we want to avoid failed migrations if we can; so in general we don't
>>> want to be sending subsections to destinations that can't handle them.
>>> The only case where it's reasonable is when there's a migration bug such
>>> that the behaviour in the guest is really nasty; if there's a choice
>>> between a failed migration or a hung/corrupt guest I'll take a failed
>>> migration.
>>> 
>>>> Third, let’s assume all hosts in fleet was upgraded to new_kernel. How do I modify all launched QEMUs on these new hosts to now have “x-migrate-smi-count” set to true?
>>>> As I would like future migrations to do send this VMState subsection. Currently there is no QMP command to update these flags.
>>> 
>>> I guess that's possible - it's pretty painful though; you're going to
>>> have to teach your management layer about features/fixes of the kernels
>>> and which flags to tweak in qemu.  Having said that, if you could do it,
>>> then you'd avoid having to restart VMs to pick up a few fixes.
>>> 
>>>> Fourth, I think it’s not trivial for management-plane to be aware with which flags it should set on destination QEMU based on currently running kernels on fleet.
>>>> It’s not the same as machine-type, as already discussed above doesn’t change during the entire lifetime of guest.
>>> 
>>> Right, which is why I don't see your idea of adding flags will work.
>>> I don't see how anything will figure out what the right flags to use
>>> are.
>>> (Getting the management layers to do sane things with the cpuid flags
>>> is already a nightmare, and they're fairly well understood).
>>> 
>>>> I’m also not sure it is a good idea that we currently control flags such as “x-migrate-smi-count” from machine-type.
>>>> As it means that if a guest was initially launched using some old QEMU, it will *forever* not migrate some VMState subsection during all it’s Live-Migrations.
>>>> Even if all hosts and all QEMUs on fleet are capable of migrating this state properly.
>>>> Maybe it is preferred that this flag was specified as part of “migrate” command itself in case management-plane knows it wishes to migrate even though dest QEMU
>>>> is older and doesn’t understand this specific VMState subsection.
>>>> 
>>>> I’m left pretty confused about QEMU’s migration compatibility strategy...
>>> 
>>> The compatibility strategy is the machine type;  but yes it does
>>> have a problem when it's not really just a qemu version - but also
>>> kernel (and external libraries, etc).
>>> My general advice is that users should be updating their kernels and
>>> qemus together; but I realise there's lots of cases where that
>>> doesn't work.
>>> 
>>> Dave
>> 
>> I think it’s not practical advise to expect users to always upgrade kernel and QEMU together.
>> In fact, users prefer to upgrade them separately to avoid doing major upgrades at once and to better pin-point root-cause of issues.
> 
> It's tricky; for distro-based users, hitting 'update' and getting both
> makes a lot of sense; but as you say you ened to let them do stuff
> individually if they want to, so they can track down problems.
> There's also a newer problem which is people want to run the QEMU in
> containers on hosts that have separate update schedules - the kernel
> version relationship is then much more fluid.
> 
>> Compiling all above very useful discussion (thanks for this!), I may have a better suggestion that doesn’t require any additional flags:
>> 1) Source QEMU will always send all all VMState subsections that is deemed by source QEMU as required to not break guest semantic behaviour.
>> This is done by .needed() methods that examine guest runtime state to understand if this state is required to be sent or not.
> 
> So that's as we already do.

Besides the fact that today we also expect to add a flag tied to machine-type for every new VMState subsection we add that didn’t exist on previous QEMU versions...

> 
>> 2) Destination QEMU will provide a generic QMP command which allows to set names of VMState subsections that if accepted on migration stream
>> and failed to be loaded (because either subsection name is not implemented or because .post_load() method failed) then the failure should be ignored
>> and migration should continue as usual. By default, the list of this names will be empty.
> 
> The format of the migration stream means that you can't skip an unknown
> subsection; it's not possible to resume parsing the stream without
> knowing what was supposed to be there. [This is pretty awful
> but my last attempts to rework it hit a dead end]

Wow… That is indeed pretty awful.
I thought every VMState subsection have a header with a length field… :(

Why did your last attempts to add such a length field to migration stream protocol failed?

> 
> So we still need to tie subsections to machine types; that way
> you don't send them to old qemu's and there for you don't have the
> problem of the qemu receiving something it doesn't know.

I agree that if there is no way to skip a VMState subsection in the stream, then we must
have a way to specify to source QEMU to prevent sending this subsection to destination…

I would suggest though that instead of having a flag tied to machine-type, we will have a QMP command
that can specify names of subsections we explicitly wish to be skipped sending to destination even if their .needed() method returns true.

This seems like a more explicit approach and doesn’t come with the down-side of forever not migrating this VMState subsection
for the entire lifetime of guest.

> 
> Still, you could skip things where the destination kernel doesn't know
> about it.
> 
>> 3) Destination QEMU will implement .post_load() method for all these VMState subsections that depend on kernel capability to be restored properly
>> such that it will fail subsection load in case kernel capability is not present. (Note that this load failure will be ignored if subsection name is specified in (2)).
>> 
>> Above suggestion have the following properties:
>> 1) Doesn’t require any flag to be added to QEMU.
> 
> There's no logical difference between 'flags' and 'names of subsections'
> - they're got the same problem in someone somewhere knowing which are
>  safe.

I agree. But creating additional flags does come with a development and testing overhead and makes code less intuitive.
I would have prefer to use subsection names.

> 
>> 2) Moves all control on whether to fail migration because of failure to load VMState subsection to receiver side. Sender always attempts to send max state he believes is required.
>> 3) We remove coupling of migration compatibility from machine-type.
>> 
>> What do you think?
> 
> Sorry, can't do (3) - we need to keep the binding for subsections to
> machine types for qemu compatibility;  I'm open for something for
> kernel compat, but not when it's breaking the qemu subsection
> checks.
> 
> Dave

Agree. I have proposed now above how to not break qemu subsection checks while still not tie this to machine-type.
Please tell me what you think on that approach. :)

We can combine that approach together with implementing the mentioned .post_load() methods and maybe it solves the discussion at hand here.

-Liran

> 
>> 
>> -Liran
>> 
>>> 
>>>> -Liran
>>>> 
>>>>> 
>>>>>> -Liran
>>>>>> 
>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> -Liran
>>>>>>> --
>>>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>>>> 
>>>>> --
>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>> 
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>> 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] QEMU/KVM migration backwards compatibility broken?
  2019-06-06 10:09         ` [Qemu-devel] " Liran Alon
@ 2019-06-06 13:13           ` Roman Kagan
  -1 siblings, 0 replies; 26+ messages in thread
From: Roman Kagan @ 2019-06-06 13:13 UTC (permalink / raw)
  To: Liran Alon; +Cc: Dr. David Alan Gilbert, Paolo Bonzini, qemu-devel, kvm list

On Thu, Jun 06, 2019 at 01:09:56PM +0300, Liran Alon wrote:
> First, machine-type express the set of vHW behaviour and properties that is exposed to guest.
> Therefore, machine-type shouldn’t change for a given guest lifetime (including Live-Migrations).
> Otherwise, guest will experience different vHW behaviour and properties before/after Live-Migration.
> So I think machine-type is not relevant for this discussion. We should focus on flags which specify
> migration behaviour (such as “x-migrate-smi-count” which can also be controlled by machine-type but not only).

My experience in dealing with migration compatibility matters (see e.g.
9b4cf107b09d18ac30f46fd1c4de8585ccba030c) is that the strategy is
basically "don't let anything but QEMU control what the guest sees".  If
QEMU is started with a particular set of flags defining guest-visible
things (and machine-type is just a convenient preset of flags), and
anything in the environment (host kernel, libraries, etc.) prevents it
from actually providing that functionality, it should detect it before
running any guest code or taking the incoming migration stream, and
refuse to start instead.

AFAICS from a quick glance, QEMU doesn't have a flag to which the
visiblity of MSR_SMI_COUNT is tied, which IMHO is a bug.  As the
presence of this MSR in the CPU is only determined by the CPU model,
it'd probably be correct to just refuse starting when such a CPU is
requested and the host KVM doesn't support MSR_SMI_COUNT.

> Third, let’s assume all hosts in fleet was upgraded to new_kernel. How
> do I modify all launched QEMUs on these new hosts to now have
> “x-migrate-smi-count” set to true?

In (my) ideal world this flag shouldn't have existed.  Dunno how it came
about; I guess it was some sort of a compromise.  The real problem is
IMO that QEMU lets the control loose on whether MSR_SMI_COUNT is present
in the guest.

Roman.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] QEMU/KVM migration backwards compatibility broken?
@ 2019-06-06 13:13           ` Roman Kagan
  0 siblings, 0 replies; 26+ messages in thread
From: Roman Kagan @ 2019-06-06 13:13 UTC (permalink / raw)
  To: Liran Alon; +Cc: Paolo Bonzini, Dr. David Alan Gilbert, kvm list, qemu-devel

On Thu, Jun 06, 2019 at 01:09:56PM +0300, Liran Alon wrote:
> First, machine-type express the set of vHW behaviour and properties that is exposed to guest.
> Therefore, machine-type shouldn’t change for a given guest lifetime (including Live-Migrations).
> Otherwise, guest will experience different vHW behaviour and properties before/after Live-Migration.
> So I think machine-type is not relevant for this discussion. We should focus on flags which specify
> migration behaviour (such as “x-migrate-smi-count” which can also be controlled by machine-type but not only).

My experience in dealing with migration compatibility matters (see e.g.
9b4cf107b09d18ac30f46fd1c4de8585ccba030c) is that the strategy is
basically "don't let anything but QEMU control what the guest sees".  If
QEMU is started with a particular set of flags defining guest-visible
things (and machine-type is just a convenient preset of flags), and
anything in the environment (host kernel, libraries, etc.) prevents it
from actually providing that functionality, it should detect it before
running any guest code or taking the incoming migration stream, and
refuse to start instead.

AFAICS from a quick glance, QEMU doesn't have a flag to which the
visiblity of MSR_SMI_COUNT is tied, which IMHO is a bug.  As the
presence of this MSR in the CPU is only determined by the CPU model,
it'd probably be correct to just refuse starting when such a CPU is
requested and the host KVM doesn't support MSR_SMI_COUNT.

> Third, let’s assume all hosts in fleet was upgraded to new_kernel. How
> do I modify all launched QEMUs on these new hosts to now have
> “x-migrate-smi-count” set to true?

In (my) ideal world this flag shouldn't have existed.  Dunno how it came
about; I guess it was some sort of a compromise.  The real problem is
IMO that QEMU lets the control loose on whether MSR_SMI_COUNT is present
in the guest.

Roman.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: QEMU/KVM migration backwards compatibility broken?
  2019-06-06 11:29                 ` [Qemu-devel] " Liran Alon
@ 2019-06-06 13:31                   ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert @ 2019-06-06 13:31 UTC (permalink / raw)
  To: Liran Alon; +Cc: kvm list, qemu-devel, Paolo Bonzini

* Liran Alon (liran.alon@oracle.com) wrote:
> 
> 
> > On 6 Jun 2019, at 14:07, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:

<snip>

> > It's tricky; for distro-based users, hitting 'update' and getting both
> > makes a lot of sense; but as you say you ened to let them do stuff
> > individually if they want to, so they can track down problems.
> > There's also a newer problem which is people want to run the QEMU in
> > containers on hosts that have separate update schedules - the kernel
> > version relationship is then much more fluid.
> > 
> >> Compiling all above very useful discussion (thanks for this!), I may have a better suggestion that doesn’t require any additional flags:
> >> 1) Source QEMU will always send all all VMState subsections that is deemed by source QEMU as required to not break guest semantic behaviour.
> >> This is done by .needed() methods that examine guest runtime state to understand if this state is required to be sent or not.
> > 
> > So that's as we already do.
> 
> Besides the fact that today we also expect to add a flag tied to machine-type for every new VMState subsection we add that didn’t exist on previous QEMU versions...
> 
> > 
> >> 2) Destination QEMU will provide a generic QMP command which allows to set names of VMState subsections that if accepted on migration stream
> >> and failed to be loaded (because either subsection name is not implemented or because .post_load() method failed) then the failure should be ignored
> >> and migration should continue as usual. By default, the list of this names will be empty.
> > 
> > The format of the migration stream means that you can't skip an unknown
> > subsection; it's not possible to resume parsing the stream without
> > knowing what was supposed to be there. [This is pretty awful
> > but my last attempts to rework it hit a dead end]
> 
> Wow… That is indeed pretty awful.
> I thought every VMState subsection have a header with a length field… :(

No, no length - it's just a header saying it's a subsection with the
name, then just unformatted data (that had better match what you
expect!).

> Why did your last attempts to add such a length field to migration stream protocol failed?

There's a lot of stuff that's open coded rather than going through
VMState's, so you don't know how much data they'll end up generating.
So the only way to do that is to write to a buffer and then get the
length and dump the buffer.  Actually all that's rare in subsections
but does happen elsewhere.  I got some of some of those nasty cases
but I got stuck trying to get rid of some of the other opencoding
(and still keep it compatible).

> > 
> > So we still need to tie subsections to machine types; that way
> > you don't send them to old qemu's and there for you don't have the
> > problem of the qemu receiving something it doesn't know.
> 
> I agree that if there is no way to skip a VMState subsection in the stream, then we must
> have a way to specify to source QEMU to prevent sending this subsection to destination…
> 
> I would suggest though that instead of having a flag tied to machine-type, we will have a QMP command
> that can specify names of subsections we explicitly wish to be skipped sending to destination even if their .needed() method returns true.

I don't like the thought of generically going behind the devices back;
it's pretty rare to have to do this, so adding a qmp command to tweak
properties that we've already got seems to make more sense to me.

> This seems like a more explicit approach and doesn’t come with the down-side of forever not migrating this VMState subsection
Dave

> for the entire lifetime of guest.
> 
> > 
> > Still, you could skip things where the destination kernel doesn't know
> > about it.
> > 
> >> 3) Destination QEMU will implement .post_load() method for all these VMState subsections that depend on kernel capability to be restored properly
> >> such that it will fail subsection load in case kernel capability is not present. (Note that this load failure will be ignored if subsection name is specified in (2)).
> >> 
> >> Above suggestion have the following properties:
> >> 1) Doesn’t require any flag to be added to QEMU.
> > 
> > There's no logical difference between 'flags' and 'names of subsections'
> > - they're got the same problem in someone somewhere knowing which are
> >  safe.
> 
> I agree. But creating additional flags does come with a development and testing overhead and makes code less intuitive.
> I would have prefer to use subsection names.
> 
> > 
> >> 2) Moves all control on whether to fail migration because of failure to load VMState subsection to receiver side. Sender always attempts to send max state he believes is required.
> >> 3) We remove coupling of migration compatibility from machine-type.
> >> 
> >> What do you think?
> > 
> > Sorry, can't do (3) - we need to keep the binding for subsections to
> > machine types for qemu compatibility;  I'm open for something for
> > kernel compat, but not when it's breaking the qemu subsection
> > checks.
> > 
> > Dave
> 
> Agree. I have proposed now above how to not break qemu subsection checks while still not tie this to machine-type.
> Please tell me what you think on that approach. :)
> 
> We can combine that approach together with implementing the mentioned .post_load() methods and maybe it solves the discussion at hand here.
> 
> -Liran
> 
> > 
> >> 
> >> -Liran
> >> 
> >>> 
> >>>> -Liran
> >>>> 
> >>>>> 
> >>>>>> -Liran
> >>>>>> 
> >>>>>>> 
> >>>>>>>> Thanks,
> >>>>>>>> -Liran
> >>>>>>> --
> >>>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >>>>>> 
> >>>>> --
> >>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >>>> 
> >>> --
> >>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >> 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] QEMU/KVM migration backwards compatibility broken?
@ 2019-06-06 13:31                   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert @ 2019-06-06 13:31 UTC (permalink / raw)
  To: Liran Alon; +Cc: Paolo Bonzini, qemu-devel, kvm list

* Liran Alon (liran.alon@oracle.com) wrote:
> 
> 
> > On 6 Jun 2019, at 14:07, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:

<snip>

> > It's tricky; for distro-based users, hitting 'update' and getting both
> > makes a lot of sense; but as you say you ened to let them do stuff
> > individually if they want to, so they can track down problems.
> > There's also a newer problem which is people want to run the QEMU in
> > containers on hosts that have separate update schedules - the kernel
> > version relationship is then much more fluid.
> > 
> >> Compiling all above very useful discussion (thanks for this!), I may have a better suggestion that doesn’t require any additional flags:
> >> 1) Source QEMU will always send all all VMState subsections that is deemed by source QEMU as required to not break guest semantic behaviour.
> >> This is done by .needed() methods that examine guest runtime state to understand if this state is required to be sent or not.
> > 
> > So that's as we already do.
> 
> Besides the fact that today we also expect to add a flag tied to machine-type for every new VMState subsection we add that didn’t exist on previous QEMU versions...
> 
> > 
> >> 2) Destination QEMU will provide a generic QMP command which allows to set names of VMState subsections that if accepted on migration stream
> >> and failed to be loaded (because either subsection name is not implemented or because .post_load() method failed) then the failure should be ignored
> >> and migration should continue as usual. By default, the list of this names will be empty.
> > 
> > The format of the migration stream means that you can't skip an unknown
> > subsection; it's not possible to resume parsing the stream without
> > knowing what was supposed to be there. [This is pretty awful
> > but my last attempts to rework it hit a dead end]
> 
> Wow… That is indeed pretty awful.
> I thought every VMState subsection have a header with a length field… :(

No, no length - it's just a header saying it's a subsection with the
name, then just unformatted data (that had better match what you
expect!).

> Why did your last attempts to add such a length field to migration stream protocol failed?

There's a lot of stuff that's open coded rather than going through
VMState's, so you don't know how much data they'll end up generating.
So the only way to do that is to write to a buffer and then get the
length and dump the buffer.  Actually all that's rare in subsections
but does happen elsewhere.  I got some of some of those nasty cases
but I got stuck trying to get rid of some of the other opencoding
(and still keep it compatible).

> > 
> > So we still need to tie subsections to machine types; that way
> > you don't send them to old qemu's and there for you don't have the
> > problem of the qemu receiving something it doesn't know.
> 
> I agree that if there is no way to skip a VMState subsection in the stream, then we must
> have a way to specify to source QEMU to prevent sending this subsection to destination…
> 
> I would suggest though that instead of having a flag tied to machine-type, we will have a QMP command
> that can specify names of subsections we explicitly wish to be skipped sending to destination even if their .needed() method returns true.

I don't like the thought of generically going behind the devices back;
it's pretty rare to have to do this, so adding a qmp command to tweak
properties that we've already got seems to make more sense to me.

> This seems like a more explicit approach and doesn’t come with the down-side of forever not migrating this VMState subsection
Dave

> for the entire lifetime of guest.
> 
> > 
> > Still, you could skip things where the destination kernel doesn't know
> > about it.
> > 
> >> 3) Destination QEMU will implement .post_load() method for all these VMState subsections that depend on kernel capability to be restored properly
> >> such that it will fail subsection load in case kernel capability is not present. (Note that this load failure will be ignored if subsection name is specified in (2)).
> >> 
> >> Above suggestion have the following properties:
> >> 1) Doesn’t require any flag to be added to QEMU.
> > 
> > There's no logical difference between 'flags' and 'names of subsections'
> > - they're got the same problem in someone somewhere knowing which are
> >  safe.
> 
> I agree. But creating additional flags does come with a development and testing overhead and makes code less intuitive.
> I would have prefer to use subsection names.
> 
> > 
> >> 2) Moves all control on whether to fail migration because of failure to load VMState subsection to receiver side. Sender always attempts to send max state he believes is required.
> >> 3) We remove coupling of migration compatibility from machine-type.
> >> 
> >> What do you think?
> > 
> > Sorry, can't do (3) - we need to keep the binding for subsections to
> > machine types for qemu compatibility;  I'm open for something for
> > kernel compat, but not when it's breaking the qemu subsection
> > checks.
> > 
> > Dave
> 
> Agree. I have proposed now above how to not break qemu subsection checks while still not tie this to machine-type.
> Please tell me what you think on that approach. :)
> 
> We can combine that approach together with implementing the mentioned .post_load() methods and maybe it solves the discussion at hand here.
> 
> -Liran
> 
> > 
> >> 
> >> -Liran
> >> 
> >>> 
> >>>> -Liran
> >>>> 
> >>>>> 
> >>>>>> -Liran
> >>>>>> 
> >>>>>>> 
> >>>>>>>> Thanks,
> >>>>>>>> -Liran
> >>>>>>> --
> >>>>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >>>>>> 
> >>>>> --
> >>>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >>>> 
> >>> --
> >>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >> 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: QEMU/KVM migration backwards compatibility broken?
  2019-06-06 13:31                   ` [Qemu-devel] " Dr. David Alan Gilbert
@ 2019-06-06 15:16                     ` Liran Alon
  -1 siblings, 0 replies; 26+ messages in thread
From: Liran Alon @ 2019-06-06 15:16 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: kvm list, qemu-devel, Paolo Bonzini


> On 6 Jun 2019, at 16:31, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> 
>>> 
>>> So we still need to tie subsections to machine types; that way
>>> you don't send them to old qemu's and there for you don't have the
>>> problem of the qemu receiving something it doesn't know.
>> 
>> I agree that if there is no way to skip a VMState subsection in the stream, then we must
>> have a way to specify to source QEMU to prevent sending this subsection to destination…
>> 
>> I would suggest though that instead of having a flag tied to machine-type, we will have a QMP command
>> that can specify names of subsections we explicitly wish to be skipped sending to destination even if their .needed() method returns true.
> 
> I don't like the thought of generically going behind the devices back;
> it's pretty rare to have to do this, so adding a qmp command to tweak
> properties that we've already got seems to make more sense to me.
> 
>> This seems like a more explicit approach and doesn’t come with the down-side of forever not migrating this VMState subsection
> Dave

If I understand you correctly, this is what you propose:
1) Have a .post_load() method for VMState subsections that depend on kernel capability to fail migration in case capability do not exist.
2) For specific problematic VMState subsections, add property such that it’s .needed() method will return false in case the property is set to false (value is true by default).
3) Have a QMP command that allows dynamically changing the value of these properties.
4) Properties values are still tied to machine-type? I think not right?

I instead propose the following:
1) Same as (1) above.
2) Add a MigrationParameter (and matching MigrationCapability) named “avoid_state” that specifies list of subsection names to avoid sending in migration even if their .needed() method will return false. i.e. We will modify migration/vmstate.c to not even call .needed() method of such subsection.

I believe the second proposal have the following advantages:
1) Less error-prone: .needed() methods are written only once and don’t need to take into account additional properties when calculating if they are required or not. Just depend on guest state.
2) Generic: We don’t require additional patch to add a new property to support avoiding sending some subsection in case it doesn’t matter for some workload. As we have discovered only late after msr_smi_count was added (by me) at that point. Second approach allows avoid sending any subsection that is deemed not important to guest workload by migration admin.
3) Not tied to machine-type: Properties are usually tied to machine-type as they need to remain same forever for the lifetime of the guest. However, migration parameters are per-migration and are meant to be tweaked and changed. This allows a guest that used to run on old QEMU and moved to new QEMU to now have better state saved for it’s next future migrations.

Currently we indeed have very rare cases like this ([git grep \"x-migrate | wc -l] product only 4 results…) but I’m not sure it’s not only because we haven’t analysed carefully the case of
restored properties that it’s property depend on kernel capability.

As a start thought, we can start by at least agreeing to implement (1) and consider the property VS MigrationParameter discussion for a later time.

What do you think?

-Liran









^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] QEMU/KVM migration backwards compatibility broken?
@ 2019-06-06 15:16                     ` Liran Alon
  0 siblings, 0 replies; 26+ messages in thread
From: Liran Alon @ 2019-06-06 15:16 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: Paolo Bonzini, qemu-devel, kvm list


> On 6 Jun 2019, at 16:31, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> 
>>> 
>>> So we still need to tie subsections to machine types; that way
>>> you don't send them to old qemu's and there for you don't have the
>>> problem of the qemu receiving something it doesn't know.
>> 
>> I agree that if there is no way to skip a VMState subsection in the stream, then we must
>> have a way to specify to source QEMU to prevent sending this subsection to destination…
>> 
>> I would suggest though that instead of having a flag tied to machine-type, we will have a QMP command
>> that can specify names of subsections we explicitly wish to be skipped sending to destination even if their .needed() method returns true.
> 
> I don't like the thought of generically going behind the devices back;
> it's pretty rare to have to do this, so adding a qmp command to tweak
> properties that we've already got seems to make more sense to me.
> 
>> This seems like a more explicit approach and doesn’t come with the down-side of forever not migrating this VMState subsection
> Dave

If I understand you correctly, this is what you propose:
1) Have a .post_load() method for VMState subsections that depend on kernel capability to fail migration in case capability do not exist.
2) For specific problematic VMState subsections, add property such that it’s .needed() method will return false in case the property is set to false (value is true by default).
3) Have a QMP command that allows dynamically changing the value of these properties.
4) Properties values are still tied to machine-type? I think not right?

I instead propose the following:
1) Same as (1) above.
2) Add a MigrationParameter (and matching MigrationCapability) named “avoid_state” that specifies list of subsection names to avoid sending in migration even if their .needed() method will return false. i.e. We will modify migration/vmstate.c to not even call .needed() method of such subsection.

I believe the second proposal have the following advantages:
1) Less error-prone: .needed() methods are written only once and don’t need to take into account additional properties when calculating if they are required or not. Just depend on guest state.
2) Generic: We don’t require additional patch to add a new property to support avoiding sending some subsection in case it doesn’t matter for some workload. As we have discovered only late after msr_smi_count was added (by me) at that point. Second approach allows avoid sending any subsection that is deemed not important to guest workload by migration admin.
3) Not tied to machine-type: Properties are usually tied to machine-type as they need to remain same forever for the lifetime of the guest. However, migration parameters are per-migration and are meant to be tweaked and changed. This allows a guest that used to run on old QEMU and moved to new QEMU to now have better state saved for it’s next future migrations.

Currently we indeed have very rare cases like this ([git grep \"x-migrate | wc -l] product only 4 results…) but I’m not sure it’s not only because we haven’t analysed carefully the case of
restored properties that it’s property depend on kernel capability.

As a start thought, we can start by at least agreeing to implement (1) and consider the property VS MigrationParameter discussion for a later time.

What do you think?

-Liran










^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: QEMU/KVM migration backwards compatibility broken?
  2019-06-06 15:16                     ` [Qemu-devel] " Liran Alon
@ 2019-06-10  9:44                       ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert @ 2019-06-10  9:44 UTC (permalink / raw)
  To: Liran Alon; +Cc: kvm list, qemu-devel, Paolo Bonzini

* Liran Alon (liran.alon@oracle.com) wrote:
> 
> > On 6 Jun 2019, at 16:31, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> > 
> >>> 
> >>> So we still need to tie subsections to machine types; that way
> >>> you don't send them to old qemu's and there for you don't have the
> >>> problem of the qemu receiving something it doesn't know.
> >> 
> >> I agree that if there is no way to skip a VMState subsection in the stream, then we must
> >> have a way to specify to source QEMU to prevent sending this subsection to destination…
> >> 
> >> I would suggest though that instead of having a flag tied to machine-type, we will have a QMP command
> >> that can specify names of subsections we explicitly wish to be skipped sending to destination even if their .needed() method returns true.
> > 
> > I don't like the thought of generically going behind the devices back;
> > it's pretty rare to have to do this, so adding a qmp command to tweak
> > properties that we've already got seems to make more sense to me.
> > 
> >> This seems like a more explicit approach and doesn’t come with the down-side of forever not migrating this VMState subsection
> > Dave
> 
> If I understand you correctly, this is what you propose:
> 1) Have a .post_load() method for VMState subsections that depend on kernel capability to fail migration in case capability do not exist.

Yes (wehther it fails or prints a warning depends on how significant the
capability is; if it's a guest crash then fail is probably best).

> 2) For specific problematic VMState subsections, add property such that it’s .needed() method will return false in case the property is set to false (value is true by default).
> 3) Have a QMP command that allows dynamically changing the value of these properties.
> 4) Properties values are still tied to machine-type? I think not right?

Property values are initialised from the machine type; in your case
where you want to upgrade to use a new feature then you can use
(3) to change it.

> I instead propose the following:
> 1) Same as (1) above.
> 2) Add a MigrationParameter (and matching MigrationCapability) named “avoid_state” that specifies list of subsection names to avoid sending in migration even if their .needed() method will return false. i.e. We will modify migration/vmstate.c to not even call .needed() method of such subsection.
> 
> I believe the second proposal have the following advantages:
> 1) Less error-prone: .needed() methods are written only once and don’t need to take into account additional properties when calculating if they are required or not. Just depend on guest state.
> 2) Generic: We don’t require additional patch to add a new property to support avoiding sending some subsection in case it doesn’t matter for some workload. As we have discovered only late after msr_smi_count was added (by me) at that point. Second approach allows avoid sending any subsection that is deemed not important to guest workload by migration admin.
> 3) Not tied to machine-type: Properties are usually tied to machine-type as they need to remain same forever for the lifetime of the guest. However, migration parameters are per-migration and are meant to be tweaked and changed. This allows a guest that used to run on old QEMU and moved to new QEMU to now have better state saved for it’s next future migrations.
> 
> Currently we indeed have very rare cases like this ([git grep \"x-migrate | wc -l] product only 4 results…) but I’m not sure it’s not only because we haven’t analysed carefully the case of
> restored properties that it’s property depend on kernel capability.
> 
> As a start thought, we can start by at least agreeing to implement (1) and consider the property VS MigrationParameter discussion for a later time.
> 
> What do you think?

I still don't like exposing a list of migration subsections into an
interface.

Dave

> -Liran
> 
> 
> 
> 
> 
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Qemu-devel] QEMU/KVM migration backwards compatibility broken?
@ 2019-06-10  9:44                       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 26+ messages in thread
From: Dr. David Alan Gilbert @ 2019-06-10  9:44 UTC (permalink / raw)
  To: Liran Alon; +Cc: Paolo Bonzini, qemu-devel, kvm list

* Liran Alon (liran.alon@oracle.com) wrote:
> 
> > On 6 Jun 2019, at 16:31, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> > 
> >>> 
> >>> So we still need to tie subsections to machine types; that way
> >>> you don't send them to old qemu's and there for you don't have the
> >>> problem of the qemu receiving something it doesn't know.
> >> 
> >> I agree that if there is no way to skip a VMState subsection in the stream, then we must
> >> have a way to specify to source QEMU to prevent sending this subsection to destination…
> >> 
> >> I would suggest though that instead of having a flag tied to machine-type, we will have a QMP command
> >> that can specify names of subsections we explicitly wish to be skipped sending to destination even if their .needed() method returns true.
> > 
> > I don't like the thought of generically going behind the devices back;
> > it's pretty rare to have to do this, so adding a qmp command to tweak
> > properties that we've already got seems to make more sense to me.
> > 
> >> This seems like a more explicit approach and doesn’t come with the down-side of forever not migrating this VMState subsection
> > Dave
> 
> If I understand you correctly, this is what you propose:
> 1) Have a .post_load() method for VMState subsections that depend on kernel capability to fail migration in case capability do not exist.

Yes (wehther it fails or prints a warning depends on how significant the
capability is; if it's a guest crash then fail is probably best).

> 2) For specific problematic VMState subsections, add property such that it’s .needed() method will return false in case the property is set to false (value is true by default).
> 3) Have a QMP command that allows dynamically changing the value of these properties.
> 4) Properties values are still tied to machine-type? I think not right?

Property values are initialised from the machine type; in your case
where you want to upgrade to use a new feature then you can use
(3) to change it.

> I instead propose the following:
> 1) Same as (1) above.
> 2) Add a MigrationParameter (and matching MigrationCapability) named “avoid_state” that specifies list of subsection names to avoid sending in migration even if their .needed() method will return false. i.e. We will modify migration/vmstate.c to not even call .needed() method of such subsection.
> 
> I believe the second proposal have the following advantages:
> 1) Less error-prone: .needed() methods are written only once and don’t need to take into account additional properties when calculating if they are required or not. Just depend on guest state.
> 2) Generic: We don’t require additional patch to add a new property to support avoiding sending some subsection in case it doesn’t matter for some workload. As we have discovered only late after msr_smi_count was added (by me) at that point. Second approach allows avoid sending any subsection that is deemed not important to guest workload by migration admin.
> 3) Not tied to machine-type: Properties are usually tied to machine-type as they need to remain same forever for the lifetime of the guest. However, migration parameters are per-migration and are meant to be tweaked and changed. This allows a guest that used to run on old QEMU and moved to new QEMU to now have better state saved for it’s next future migrations.
> 
> Currently we indeed have very rare cases like this ([git grep \"x-migrate | wc -l] product only 4 results…) but I’m not sure it’s not only because we haven’t analysed carefully the case of
> restored properties that it’s property depend on kernel capability.
> 
> As a start thought, we can start by at least agreeing to implement (1) and consider the property VS MigrationParameter discussion for a later time.
> 
> What do you think?

I still don't like exposing a list of migration subsections into an
interface.

Dave

> -Liran
> 
> 
> 
> 
> 
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2019-06-10  9:51 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-06  0:09 QEMU/KVM migration backwards compatibility broken? Liran Alon
2019-06-06  0:09 ` [Qemu-devel] " Liran Alon
2019-06-06  8:42 ` Dr. David Alan Gilbert
2019-06-06  8:42   ` [Qemu-devel] " Dr. David Alan Gilbert
2019-06-06  9:11   ` Liran Alon
2019-06-06  9:11     ` [Qemu-devel] " Liran Alon
2019-06-06  9:23     ` Dr. David Alan Gilbert
2019-06-06  9:23       ` [Qemu-devel] " Dr. David Alan Gilbert
2019-06-06 10:09       ` Liran Alon
2019-06-06 10:09         ` [Qemu-devel] " Liran Alon
2019-06-06 10:39         ` Dr. David Alan Gilbert
2019-06-06 10:39           ` [Qemu-devel] " Dr. David Alan Gilbert
2019-06-06 10:57           ` Liran Alon
2019-06-06 10:57             ` [Qemu-devel] " Liran Alon
2019-06-06 11:07             ` Dr. David Alan Gilbert
2019-06-06 11:07               ` [Qemu-devel] " Dr. David Alan Gilbert
2019-06-06 11:29               ` Liran Alon
2019-06-06 11:29                 ` [Qemu-devel] " Liran Alon
2019-06-06 13:31                 ` Dr. David Alan Gilbert
2019-06-06 13:31                   ` [Qemu-devel] " Dr. David Alan Gilbert
2019-06-06 15:16                   ` Liran Alon
2019-06-06 15:16                     ` [Qemu-devel] " Liran Alon
2019-06-10  9:44                     ` Dr. David Alan Gilbert
2019-06-10  9:44                       ` [Qemu-devel] " Dr. David Alan Gilbert
2019-06-06 13:13         ` Roman Kagan
2019-06-06 13:13           ` Roman Kagan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.