* ARM Snapshots Not Backwards-Compatible
@ 2021-02-03 4:01 Aaron Lindsay
2021-02-03 8:21 ` Philippe Mathieu-Daudé
2021-02-03 10:01 ` Peter Maydell
0 siblings, 2 replies; 22+ messages in thread
From: Aaron Lindsay @ 2021-02-03 4:01 UTC (permalink / raw)
To: qemu-devel; +Cc: Peter Maydell
Hello,
I'm attempting to restore an AArch64 snapshot taken on QEMU 4.1.0 on
QEMU 5.2.0, using system mode. My previous impression, possibly from
https://wiki.qemu.org/Features/Migration/Troubleshooting#Basics was that
this ought to work:
> Note that QEMU supports migrating forward between QEMU versions
Note that I'm using qemu-system-aarch64 with -loadvm.
However, I've run into several issues I thought I should report. The
first of them was that this commit changed the address of CBAR, which
resulted in a mismatch of the register IDs in `cpu_post_load` in
target/arm/machine.c:
https://patchwork.kernel.org/project/qemu-devel/patch/20190927144249.29999-2-peter.maydell@linaro.org/
The second was that several system registers have changed which bits are
allowed to be written in different circumstances, seemingly as a result
of a combination of bugfixes and implementation of additional behavior.
These hit errors detected in `write_list_to_cpustate` in
target/arm/helper.c.
The third is that meanings of the bits in env->features (as defined by
`enum arm_features` in target/arm/cpu.h) has shifted. For example,
ARM_FEATURE_PXN, ARM_FEATURE_CRC, ARM_FEATURE_VFP, ARM_FEATURE_VFP3,
ARM_FEATURE_VFP4 have all been removed and ARM_FEATURE_V8_1M has been
added since 4.1.0. Heck, even I have added a field there in the past.
Unfortunately, these additions/removals mean that when env->features is
saved on one version and restored on another the bits can mean different
things. Notably, the removal of the *VFP features means that a snapshot
of a CPU reporting it supports ARM_FEATURE_VFP3 on 4.1.0 thinks it's now
ARM_FEATURE_M on 5.2.0!
My guess, given the numerous issues and the additional complexity
required to properly implement backwards-compatible snapshotting, is
that this is not a primary goal. However, if it is a goal, what steps
can/should we take to support it more thoroughly?
Thanks!
-Aaron
p.s. Now for an admission: the snapshots I'm testing with were
originally taken with `-cpu max`. This was unintentional, and I
understand if the response is that I can't expect `-cpu max` checkpoints
to work across QEMU versions... but I also don't think that all of these
issues can be blamed on that (notably CBAR and env->features).
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ARM Snapshots Not Backwards-Compatible 2021-02-03 4:01 ARM Snapshots Not Backwards-Compatible Aaron Lindsay @ 2021-02-03 8:21 ` Philippe Mathieu-Daudé 2021-02-03 10:27 ` Dr. David Alan Gilbert 2021-02-03 10:01 ` Peter Maydell 1 sibling, 1 reply; 22+ messages in thread From: Philippe Mathieu-Daudé @ 2021-02-03 8:21 UTC (permalink / raw) To: Aaron Lindsay Cc: Peter Maydell, Andrew Jones, Juan Quintela, qemu-devel, Dr. David Alan Gilbert, qemu-arm Cc'ing migration team and qemu-arm@ list. On 2/3/21 5:01 AM, Aaron Lindsay wrote: > Hello, > > I'm attempting to restore an AArch64 snapshot taken on QEMU 4.1.0 on > QEMU 5.2.0, using system mode. My previous impression, possibly from > https://wiki.qemu.org/Features/Migration/Troubleshooting#Basics was that > this ought to work: > >> Note that QEMU supports migrating forward between QEMU versions > > Note that I'm using qemu-system-aarch64 with -loadvm. > > However, I've run into several issues I thought I should report. The > first of them was that this commit changed the address of CBAR, which > resulted in a mismatch of the register IDs in `cpu_post_load` in > target/arm/machine.c: > https://patchwork.kernel.org/project/qemu-devel/patch/20190927144249.29999-2-peter.maydell@linaro.org/ > > The second was that several system registers have changed which bits are > allowed to be written in different circumstances, seemingly as a result > of a combination of bugfixes and implementation of additional behavior. > These hit errors detected in `write_list_to_cpustate` in > target/arm/helper.c. > > The third is that meanings of the bits in env->features (as defined by > `enum arm_features` in target/arm/cpu.h) has shifted. For example, > ARM_FEATURE_PXN, ARM_FEATURE_CRC, ARM_FEATURE_VFP, ARM_FEATURE_VFP3, > ARM_FEATURE_VFP4 have all been removed and ARM_FEATURE_V8_1M has been > added since 4.1.0. Heck, even I have added a field there in the past. > Unfortunately, these additions/removals mean that when env->features is > saved on one version and restored on another the bits can mean different > things. Notably, the removal of the *VFP features means that a snapshot > of a CPU reporting it supports ARM_FEATURE_VFP3 on 4.1.0 thinks it's now > ARM_FEATURE_M on 5.2.0! > > My guess, given the numerous issues and the additional complexity > required to properly implement backwards-compatible snapshotting, is > that this is not a primary goal. However, if it is a goal, what steps > can/should we take to support it more thoroughly? > > Thanks! > > -Aaron > > p.s. Now for an admission: the snapshots I'm testing with were > originally taken with `-cpu max`. This was unintentional, and I > understand if the response is that I can't expect `-cpu max` checkpoints > to work across QEMU versions... but I also don't think that all of these > issues can be blamed on that (notably CBAR and env->features). > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ARM Snapshots Not Backwards-Compatible 2021-02-03 8:21 ` Philippe Mathieu-Daudé @ 2021-02-03 10:27 ` Dr. David Alan Gilbert 2021-02-03 10:38 ` Peter Maydell 0 siblings, 1 reply; 22+ messages in thread From: Dr. David Alan Gilbert @ 2021-02-03 10:27 UTC (permalink / raw) To: Philippe Mathieu-Daudé Cc: Peter Maydell, Andrew Jones, Juan Quintela, qemu-devel, Aaron Lindsay, qemu-arm * Philippe Mathieu-Daudé (philmd@redhat.com) wrote: > Cc'ing migration team and qemu-arm@ list. I'll have to leave the detail of that to the ARM peole; but from a migration point of view I think we do want the 64 bit ARM migrations to be stable now. Please tie incompatible changes to machine types. Dave > On 2/3/21 5:01 AM, Aaron Lindsay wrote: > > Hello, > > > > I'm attempting to restore an AArch64 snapshot taken on QEMU 4.1.0 on > > QEMU 5.2.0, using system mode. My previous impression, possibly from > > https://wiki.qemu.org/Features/Migration/Troubleshooting#Basics was that > > this ought to work: > > > >> Note that QEMU supports migrating forward between QEMU versions > > > > Note that I'm using qemu-system-aarch64 with -loadvm. > > > > However, I've run into several issues I thought I should report. The > > first of them was that this commit changed the address of CBAR, which > > resulted in a mismatch of the register IDs in `cpu_post_load` in > > target/arm/machine.c: > > https://patchwork.kernel.org/project/qemu-devel/patch/20190927144249.29999-2-peter.maydell@linaro.org/ > > > > The second was that several system registers have changed which bits are > > allowed to be written in different circumstances, seemingly as a result > > of a combination of bugfixes and implementation of additional behavior. > > These hit errors detected in `write_list_to_cpustate` in > > target/arm/helper.c. > > > > The third is that meanings of the bits in env->features (as defined by > > `enum arm_features` in target/arm/cpu.h) has shifted. For example, > > ARM_FEATURE_PXN, ARM_FEATURE_CRC, ARM_FEATURE_VFP, ARM_FEATURE_VFP3, > > ARM_FEATURE_VFP4 have all been removed and ARM_FEATURE_V8_1M has been > > added since 4.1.0. Heck, even I have added a field there in the past. > > Unfortunately, these additions/removals mean that when env->features is > > saved on one version and restored on another the bits can mean different > > things. Notably, the removal of the *VFP features means that a snapshot > > of a CPU reporting it supports ARM_FEATURE_VFP3 on 4.1.0 thinks it's now > > ARM_FEATURE_M on 5.2.0! > > > > My guess, given the numerous issues and the additional complexity > > required to properly implement backwards-compatible snapshotting, is > > that this is not a primary goal. However, if it is a goal, what steps > > can/should we take to support it more thoroughly? > > > > Thanks! > > > > -Aaron > > > > p.s. Now for an admission: the snapshots I'm testing with were > > originally taken with `-cpu max`. This was unintentional, and I > > understand if the response is that I can't expect `-cpu max` checkpoints > > to work across QEMU versions... but I also don't think that all of these > > issues can be blamed on that (notably CBAR and env->features). > > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ARM Snapshots Not Backwards-Compatible 2021-02-03 10:27 ` Dr. David Alan Gilbert @ 2021-02-03 10:38 ` Peter Maydell 2021-02-03 10:49 ` Dr. David Alan Gilbert 0 siblings, 1 reply; 22+ messages in thread From: Peter Maydell @ 2021-02-03 10:38 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Andrew Jones, Juan Quintela, QEMU Developers, Aaron Lindsay, qemu-arm, Philippe Mathieu-Daudé On Wed, 3 Feb 2021 at 10:28, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > > * Philippe Mathieu-Daudé (philmd@redhat.com) wrote: > > Cc'ing migration team and qemu-arm@ list. > > I'll have to leave the detail of that to the ARM peole; but from a > migration point of view I think we do want the 64 bit ARM migrations to > be stable now. Please tie incompatible changes to machine types. That is the intention, but because there's no upstream testing of migration compat, we never notice if we get it wrong. What is x86 doing to keep cross-version migration working ? thanks -- PMM ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ARM Snapshots Not Backwards-Compatible 2021-02-03 10:38 ` Peter Maydell @ 2021-02-03 10:49 ` Dr. David Alan Gilbert 2021-02-03 10:52 ` Peter Maydell 0 siblings, 1 reply; 22+ messages in thread From: Dr. David Alan Gilbert @ 2021-02-03 10:49 UTC (permalink / raw) To: Peter Maydell Cc: Andrew Jones, Juan Quintela, QEMU Developers, Aaron Lindsay, qemu-arm, Philippe Mathieu-Daudé * Peter Maydell (peter.maydell@linaro.org) wrote: > On Wed, 3 Feb 2021 at 10:28, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > > > > * Philippe Mathieu-Daudé (philmd@redhat.com) wrote: > > > Cc'ing migration team and qemu-arm@ list. > > > > I'll have to leave the detail of that to the ARM peole; but from a > > migration point of view I think we do want the 64 bit ARM migrations to > > be stable now. Please tie incompatible changes to machine types. > > That is the intention, but because there's no upstream testing > of migration compat, we never notice if we get it wrong. > What is x86 doing to keep cross-version migration working ? I know there used to be some of our team running Avocado tests for compatibility regularly, I'm not sure of the current status. It's something we also do regularly around when we do downstream releases, so we tend to catch them then, although even on x86 that often turns out to be a bit late. Avocado can take a separate qemu path for source/destination. Dave > thanks > -- PMM > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ARM Snapshots Not Backwards-Compatible 2021-02-03 10:49 ` Dr. David Alan Gilbert @ 2021-02-03 10:52 ` Peter Maydell 2021-02-03 10:59 ` Dr. David Alan Gilbert 2021-02-03 12:44 ` Andrew Jones 0 siblings, 2 replies; 22+ messages in thread From: Peter Maydell @ 2021-02-03 10:52 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Andrew Jones, Juan Quintela, QEMU Developers, Aaron Lindsay, qemu-arm, Philippe Mathieu-Daudé On Wed, 3 Feb 2021 at 10:49, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > > * Peter Maydell (peter.maydell@linaro.org) wrote: > > On Wed, 3 Feb 2021 at 10:28, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > > > > > > * Philippe Mathieu-Daudé (philmd@redhat.com) wrote: > > > > Cc'ing migration team and qemu-arm@ list. > > > > > > I'll have to leave the detail of that to the ARM peole; but from a > > > migration point of view I think we do want the 64 bit ARM migrations to > > > be stable now. Please tie incompatible changes to machine types. > > > > That is the intention, but because there's no upstream testing > > of migration compat, we never notice if we get it wrong. > > What is x86 doing to keep cross-version migration working ? > > I know there used to be some of our team running Avocado tests for > compatibility regularly, I'm not sure of the current status. > It's something we also do regularly around when we do downstream > releases, so we tend to catch them then, although even on x86 that > often turns out to be a bit late. So downstream testing only? I think that unless we either (a) start doing migration-compat testing consistently upstream or (b) RedHat or some other downstream start testing and reporting compat issues to us for aarch64 as they do for x86-64, in practice we're just not going to have working migration compat despite our best intentions. (None of the issues Aaron raises were deliberate compat breaks -- they're all "we made a change we didn't think affected migration but it turns out that it does".) thanks -- PMM ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ARM Snapshots Not Backwards-Compatible 2021-02-03 10:52 ` Peter Maydell @ 2021-02-03 10:59 ` Dr. David Alan Gilbert 2021-02-03 16:04 ` Philippe Mathieu-Daudé 2021-02-03 12:44 ` Andrew Jones 1 sibling, 1 reply; 22+ messages in thread From: Dr. David Alan Gilbert @ 2021-02-03 10:59 UTC (permalink / raw) To: Peter Maydell Cc: Andrew Jones, Juan Quintela, QEMU Developers, Aaron Lindsay, qemu-arm, Philippe Mathieu-Daudé * Peter Maydell (peter.maydell@linaro.org) wrote: > On Wed, 3 Feb 2021 at 10:49, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > > > > * Peter Maydell (peter.maydell@linaro.org) wrote: > > > On Wed, 3 Feb 2021 at 10:28, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > > > > > > > > * Philippe Mathieu-Daudé (philmd@redhat.com) wrote: > > > > > Cc'ing migration team and qemu-arm@ list. > > > > > > > > I'll have to leave the detail of that to the ARM peole; but from a > > > > migration point of view I think we do want the 64 bit ARM migrations to > > > > be stable now. Please tie incompatible changes to machine types. > > > > > > That is the intention, but because there's no upstream testing > > > of migration compat, we never notice if we get it wrong. > > > What is x86 doing to keep cross-version migration working ? > > > > I know there used to be some of our team running Avocado tests for > > compatibility regularly, I'm not sure of the current status. > > It's something we also do regularly around when we do downstream > > releases, so we tend to catch them then, although even on x86 that > > often turns out to be a bit late. > > So downstream testing only? I thought there used to be some regular avocado testing of upstream but I'm not sure if it's all architectures and I'm not sure if it's still happening; I haven't seen any migration issues from it for a while, which makes me think it isn't. > I think that unless we either (a) start > doing migration-compat testing consistently upstream or (b) RedHat or > some other downstream start testing and reporting compat issues > to us for aarch64 as they do for x86-64, in practice we're just > not going to have working migration compat despite our best > intentions. (None of the issues Aaron raises were deliberate > compat breaks -- they're all "we made a change we didn't think > affected migration but it turns out that it does".) I'd agree; we still hit this too often on x86 as well. Dave > thanks > -- PMM > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ARM Snapshots Not Backwards-Compatible 2021-02-03 10:59 ` Dr. David Alan Gilbert @ 2021-02-03 16:04 ` Philippe Mathieu-Daudé 0 siblings, 0 replies; 22+ messages in thread From: Philippe Mathieu-Daudé @ 2021-02-03 16:04 UTC (permalink / raw) To: Dr. David Alan Gilbert, avocado-devel Cc: Peter Maydell, Andrew Jones, Juan Quintela, QEMU Developers, Aaron Lindsay, qemu-arm On 2/3/21 11:59 AM, Dr. David Alan Gilbert wrote: > * Peter Maydell (peter.maydell@linaro.org) wrote: >> On Wed, 3 Feb 2021 at 10:49, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: >>> >>> * Peter Maydell (peter.maydell@linaro.org) wrote: >>>> On Wed, 3 Feb 2021 at 10:28, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: >>>>> >>>>> * Philippe Mathieu-Daudé (philmd@redhat.com) wrote: >>>>>> Cc'ing migration team and qemu-arm@ list. >>>>> >>>>> I'll have to leave the detail of that to the ARM peole; but from a >>>>> migration point of view I think we do want the 64 bit ARM migrations to >>>>> be stable now. Please tie incompatible changes to machine types. >>>> >>>> That is the intention, but because there's no upstream testing >>>> of migration compat, we never notice if we get it wrong. >>>> What is x86 doing to keep cross-version migration working ? >>> >>> I know there used to be some of our team running Avocado tests for >>> compatibility regularly, I'm not sure of the current status. >>> It's something we also do regularly around when we do downstream >>> releases, so we tend to catch them then, although even on x86 that >>> often turns out to be a bit late. >> >> So downstream testing only? > > I thought there used to be some regular avocado testing of upstream but > I'm not sure if it's all architectures and I'm not sure if it's still > happening; I haven't seen any migration issues from it for a while, > which makes me think it isn't. > >> I think that unless we either (a) start >> doing migration-compat testing consistently upstream or (b) RedHat or >> some other downstream start testing and reporting compat issues >> to us for aarch64 as they do for x86-64, in practice we're just >> not going to have working migration compat despite our best >> intentions. (None of the issues Aaron raises were deliberate >> compat breaks -- they're all "we made a change we didn't think >> affected migration but it turns out that it does".) > > I'd agree; we still hit this too often on x86 as well. Cc'ing avocado-devel, since previously discussed: https://www.mail-archive.com/qemu-devel@nongnu.org/msg654139.html ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ARM Snapshots Not Backwards-Compatible 2021-02-03 10:52 ` Peter Maydell 2021-02-03 10:59 ` Dr. David Alan Gilbert @ 2021-02-03 12:44 ` Andrew Jones 2021-02-03 15:45 ` aaron--- via 1 sibling, 1 reply; 22+ messages in thread From: Andrew Jones @ 2021-02-03 12:44 UTC (permalink / raw) To: Peter Maydell Cc: Juan Quintela, QEMU Developers, Dr. David Alan Gilbert, Aaron Lindsay, qemu-arm, Philippe Mathieu-Daudé On Wed, Feb 03, 2021 at 10:52:59AM +0000, Peter Maydell wrote: > On Wed, 3 Feb 2021 at 10:49, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > > > > * Peter Maydell (peter.maydell@linaro.org) wrote: > > > On Wed, 3 Feb 2021 at 10:28, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > > > > > > > > * Philippe Mathieu-Daudé (philmd@redhat.com) wrote: > > > > > Cc'ing migration team and qemu-arm@ list. > > > > > > > > I'll have to leave the detail of that to the ARM peole; but from a > > > > migration point of view I think we do want the 64 bit ARM migrations to > > > > be stable now. Please tie incompatible changes to machine types. > > > > > > That is the intention, but because there's no upstream testing > > > of migration compat, we never notice if we get it wrong. > > > What is x86 doing to keep cross-version migration working ? > > > > I know there used to be some of our team running Avocado tests for > > compatibility regularly, I'm not sure of the current status. > > It's something we also do regularly around when we do downstream > > releases, so we tend to catch them then, although even on x86 that > > often turns out to be a bit late. > > So downstream testing only? Not even downstream for the Arm architecture, at least not at Red Hat. The support we have for Arm Virt is still limited to the use cases for which it has been deployed. To this day that hasn't included migration[*]. > I think that unless we either (a) start > doing migration-compat testing consistently upstream or This is the best choice and it can certainly be an additional approach regardless of what goes on downstream. I can look into this. A pointer to the x86 tests would be a good start. It's pretty simple to do a quick migration test between two versions of QEMU, but we need the whole build two versions of QEMU stuff first, which I hope already exists. > (b) RedHat or > some other downstream start testing and reporting compat issues > to us for aarch64 as they do for x86-64, Red Hat isn't currently allocating resources to this type of testing for AArch64. And, even if it were to start, it would likely focus on compat testing of RHEL arm-virt machine types, which aren't exactly the same thing as the upstream arm-virt machine types. In fact, currently, there are only two (virt-rhel8.2.0 and virt-rhel8.3.0). Thanks, drew [*] Migration support hasn't been a real high priority for AArch64 KVM, because it currently requires identical host CPU types and identical host kernels to work. It'd be nice if the destination host kernel could at least be a later version, but not even that is guaranteed to work at this time. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ARM Snapshots Not Backwards-Compatible 2021-02-03 12:44 ` Andrew Jones @ 2021-02-03 15:45 ` aaron--- via 2021-02-03 15:53 ` Andrew Jones 0 siblings, 1 reply; 22+ messages in thread From: aaron--- via @ 2021-02-03 15:45 UTC (permalink / raw) To: Andrew Jones Cc: Peter Maydell, Juan Quintela, QEMU Developers, Dr. David Alan Gilbert, qemu-arm, Philippe Mathieu-Daudé On Feb 03 13:44, Andrew Jones wrote: > On Wed, Feb 03, 2021 at 10:52:59AM +0000, Peter Maydell wrote: > > On Wed, 3 Feb 2021 at 10:49, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > > > > > > * Peter Maydell (peter.maydell@linaro.org) wrote: > > > > On Wed, 3 Feb 2021 at 10:28, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > > > > > > > > > > * Philippe Mathieu-Daudé (philmd@redhat.com) wrote: > > > > > > Cc'ing migration team and qemu-arm@ list. > > > > > > > > > > I'll have to leave the detail of that to the ARM peole; but from a > > > > > migration point of view I think we do want the 64 bit ARM migrations to > > > > > be stable now. Please tie incompatible changes to machine types. > > > > > > > > That is the intention, but because there's no upstream testing > > > > of migration compat, we never notice if we get it wrong. > > > > What is x86 doing to keep cross-version migration working ? > > > > > > I know there used to be some of our team running Avocado tests for > > > compatibility regularly, I'm not sure of the current status. > > > It's something we also do regularly around when we do downstream > > > releases, so we tend to catch them then, although even on x86 that > > > often turns out to be a bit late. > > > > So downstream testing only? > > Not even downstream for the Arm architecture, at least not at Red Hat. The > support we have for Arm Virt is still limited to the use cases for which > it has been deployed. To this day that hasn't included migration[*]. > > > I think that unless we either (a) start > > doing migration-compat testing consistently upstream or > > This is the best choice and it can certainly be an additional approach > regardless of what goes on downstream. I can look into this. A pointer > to the x86 tests would be a good start. It's pretty simple to do a > quick migration test between two versions of QEMU, but we need the > whole build two versions of QEMU stuff first, which I hope already > exists. Does this mean that this is largely an issue of developing the tests, and not a need for a place to host them? Or would additional hardware/hosting be required for these tests to be run on? -Aaron ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ARM Snapshots Not Backwards-Compatible 2021-02-03 15:45 ` aaron--- via @ 2021-02-03 15:53 ` Andrew Jones 0 siblings, 0 replies; 22+ messages in thread From: Andrew Jones @ 2021-02-03 15:53 UTC (permalink / raw) To: Aaron Lindsay Cc: Peter Maydell, Juan Quintela, QEMU Developers, Dr. David Alan Gilbert, qemu-arm, Philippe Mathieu-Daudé On Wed, Feb 03, 2021 at 10:45:47AM -0500, Aaron Lindsay wrote: > On Feb 03 13:44, Andrew Jones wrote: > > On Wed, Feb 03, 2021 at 10:52:59AM +0000, Peter Maydell wrote: > > > On Wed, 3 Feb 2021 at 10:49, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > > > > > > > > * Peter Maydell (peter.maydell@linaro.org) wrote: > > > > > On Wed, 3 Feb 2021 at 10:28, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > > > > > > > > > > > > * Philippe Mathieu-Daudé (philmd@redhat.com) wrote: > > > > > > > Cc'ing migration team and qemu-arm@ list. > > > > > > > > > > > > I'll have to leave the detail of that to the ARM peole; but from a > > > > > > migration point of view I think we do want the 64 bit ARM migrations to > > > > > > be stable now. Please tie incompatible changes to machine types. > > > > > > > > > > That is the intention, but because there's no upstream testing > > > > > of migration compat, we never notice if we get it wrong. > > > > > What is x86 doing to keep cross-version migration working ? > > > > > > > > I know there used to be some of our team running Avocado tests for > > > > compatibility regularly, I'm not sure of the current status. > > > > It's something we also do regularly around when we do downstream > > > > releases, so we tend to catch them then, although even on x86 that > > > > often turns out to be a bit late. > > > > > > So downstream testing only? > > > > Not even downstream for the Arm architecture, at least not at Red Hat. The > > support we have for Arm Virt is still limited to the use cases for which > > it has been deployed. To this day that hasn't included migration[*]. > > > > > I think that unless we either (a) start > > > doing migration-compat testing consistently upstream or > > > > This is the best choice and it can certainly be an additional approach > > regardless of what goes on downstream. I can look into this. A pointer > > to the x86 tests would be a good start. It's pretty simple to do a > > quick migration test between two versions of QEMU, but we need the > > whole build two versions of QEMU stuff first, which I hope already > > exists. > > Does this mean that this is largely an issue of developing the tests, > and not a need for a place to host them? Or would additional > hardware/hosting be required for these tests to be run on? > I wouldn't expect the new tests to require new resources. Whatever upstream already has for the tests that are already being run should be sufficient. I think we just need to see if we can develop/enable these types of tests in the already present CI. At least for starters... Thanks, drew ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ARM Snapshots Not Backwards-Compatible 2021-02-03 4:01 ARM Snapshots Not Backwards-Compatible Aaron Lindsay 2021-02-03 8:21 ` Philippe Mathieu-Daudé @ 2021-02-03 10:01 ` Peter Maydell 2021-02-03 14:58 ` Aaron Lindsay 1 sibling, 1 reply; 22+ messages in thread From: Peter Maydell @ 2021-02-03 10:01 UTC (permalink / raw) To: Aaron Lindsay; +Cc: QEMU Developers On Wed, 3 Feb 2021 at 04:01, Aaron Lindsay <aaron@os.amperecomputing.com> wrote: > > Hello, > > I'm attempting to restore an AArch64 snapshot taken on QEMU 4.1.0 on > QEMU 5.2.0, using system mode. My previous impression, possibly from > https://wiki.qemu.org/Features/Migration/Troubleshooting#Basics was that > this ought to work: > > > Note that QEMU supports migrating forward between QEMU versions > > Note that I'm using qemu-system-aarch64 with -loadvm. You don't say what machine type and command line you're using. Strictly, Strictly speaking, the intended guarantee only covers versioned machines, eg "virt-4.1" on QEMU 4.1 to "virt-4.1" on QEMU 5.2. > The third is that meanings of the bits in env->features (as defined by > `enum arm_features` in target/arm/cpu.h) has shifted. For example, > ARM_FEATURE_PXN, ARM_FEATURE_CRC, ARM_FEATURE_VFP, ARM_FEATURE_VFP3, > ARM_FEATURE_VFP4 have all been removed and ARM_FEATURE_V8_1M has been > added since 4.1.0. Heck, even I have added a field there in the past. > Unfortunately, these additions/removals mean that when env->features is > saved on one version and restored on another the bits can mean different > things. Notably, the removal of the *VFP features means that a snapshot > of a CPU reporting it supports ARM_FEATURE_VFP3 on 4.1.0 thinks it's now > ARM_FEATURE_M on 5.2.0! Ow. I didn't realize the env->features was in the migration state :-( There is no reason for it to be, because it's a constant property of the CPU. The easy fix is to replace VMSTATE_UINT64(env.features, ARMCPU), in target/arm/machine.c with whatever the syntax is for "ignore 64 bits of data here". Then we'll ignore whatever is coming in from the source, which we don't need, and we'll stop sending it out if we're the destination. thanks -- PMM ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ARM Snapshots Not Backwards-Compatible 2021-02-03 10:01 ` Peter Maydell @ 2021-02-03 14:58 ` Aaron Lindsay 2021-02-03 15:02 ` Philippe Mathieu-Daudé 0 siblings, 1 reply; 22+ messages in thread From: Aaron Lindsay @ 2021-02-03 14:58 UTC (permalink / raw) To: Peter Maydell Cc: Andrew Jones, Juan Quintela, qemu-devel, Dr. David Alan Gilbert, qemu-arm, philmd On Feb 03 10:01, Peter Maydell wrote: > On Wed, 3 Feb 2021 at 04:01, Aaron Lindsay <aaron@os.amperecomputing.com> wrote: > > Note that I'm using qemu-system-aarch64 with -loadvm. > > You don't say what machine type and command line you're using. Strictly, > Strictly speaking, the intended guarantee only covers versioned > machines, eg "virt-4.1" on QEMU 4.1 to "virt-4.1" on QEMU 5.2. Sorry for the omission, `-M virt` was used to capture the snapshots on 4.1.0 and I'm using `-M virt-4.1` to restore it on 5.2.0. You get a fairly helpful error message about that one if you screw it up: "qemu-system-aarch64: Machine type received is 'virt-4.1' and local is 'virt-5.2'" I'm restoring using the following command-line on 5.2.0. Snapshots were taken with the same, except `-M virt` and without `-loadvm` on 4.1.0: $ qemu-system-aarch64 -smp 1 -M virt-4.1 -cpu max -m 8G -icount 0 -nographic -semihosting -kernel ./Image -append 'console=ttyAMA0 root=/dev/vda3 rw earlycon=smh' -drive if=none,id=image,file=rootfs.qcow2,format=qcow2 -device virtio-blk-pci,drive=image -loadvm "snapshot-name" I suspect that some of the differences I raised in this section: > > These hit errors detected in `write_list_to_cpustate` in > > target/arm/helper.c. may be from the fact that I'm running on -max, which understandably may cause changes in the ID registers between QEMU versions. I'm less concerned about issues related to my usage of -max than I am about the others. > > The third is that meanings of the bits in env->features (as defined by > > `enum arm_features` in target/arm/cpu.h) has shifted. For example, > > ARM_FEATURE_PXN, ARM_FEATURE_CRC, ARM_FEATURE_VFP, ARM_FEATURE_VFP3, > > ARM_FEATURE_VFP4 have all been removed and ARM_FEATURE_V8_1M has been > > added since 4.1.0. Heck, even I have added a field there in the past. > > Unfortunately, these additions/removals mean that when env->features is > > saved on one version and restored on another the bits can mean different > > things. Notably, the removal of the *VFP features means that a snapshot > > of a CPU reporting it supports ARM_FEATURE_VFP3 on 4.1.0 thinks it's now > > ARM_FEATURE_M on 5.2.0! > > Ow. I didn't realize the env->features was in the migration state :-( > There is no reason for it to be, because it's a constant property > of the CPU. The easy fix is to replace > VMSTATE_UINT64(env.features, ARMCPU), > in target/arm/machine.c with whatever the syntax is for "ignore > 64 bits of data here". Then we'll ignore whatever is coming in > from the source, which we don't need, and we'll stop sending it > out if we're the destination. I'll look into this. -Aaron ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ARM Snapshots Not Backwards-Compatible 2021-02-03 14:58 ` Aaron Lindsay @ 2021-02-03 15:02 ` Philippe Mathieu-Daudé 2021-02-03 15:10 ` Dr. David Alan Gilbert 0 siblings, 1 reply; 22+ messages in thread From: Philippe Mathieu-Daudé @ 2021-02-03 15:02 UTC (permalink / raw) To: Aaron Lindsay, Peter Maydell Cc: Andrew Jones, qemu-arm, qemu-devel, Dr. David Alan Gilbert, Juan Quintela On 2/3/21 3:58 PM, Aaron Lindsay wrote: > On Feb 03 10:01, Peter Maydell wrote: >>> The third is that meanings of the bits in env->features (as defined by >>> `enum arm_features` in target/arm/cpu.h) has shifted. For example, >>> ARM_FEATURE_PXN, ARM_FEATURE_CRC, ARM_FEATURE_VFP, ARM_FEATURE_VFP3, >>> ARM_FEATURE_VFP4 have all been removed and ARM_FEATURE_V8_1M has been >>> added since 4.1.0. Heck, even I have added a field there in the past. >>> Unfortunately, these additions/removals mean that when env->features is >>> saved on one version and restored on another the bits can mean different >>> things. Notably, the removal of the *VFP features means that a snapshot >>> of a CPU reporting it supports ARM_FEATURE_VFP3 on 4.1.0 thinks it's now >>> ARM_FEATURE_M on 5.2.0! >> >> Ow. I didn't realize the env->features was in the migration state :-( >> There is no reason for it to be, because it's a constant property >> of the CPU. The easy fix is to replace >> VMSTATE_UINT64(env.features, ARMCPU), >> in target/arm/machine.c with whatever the syntax is for "ignore >> 64 bits of data here". Then we'll ignore whatever is coming in >> from the source, which we don't need, and we'll stop sending it >> out if we're the destination. > > I'll look into this. I think this is: VMSTATE_UNUSED(sizeof(uint64_t)) > > -Aaron > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ARM Snapshots Not Backwards-Compatible 2021-02-03 15:02 ` Philippe Mathieu-Daudé @ 2021-02-03 15:10 ` Dr. David Alan Gilbert 2021-02-03 15:26 ` Peter Maydell 2021-02-03 15:42 ` ARM Snapshots Not Backwards-Compatible aaron--- via 0 siblings, 2 replies; 22+ messages in thread From: Dr. David Alan Gilbert @ 2021-02-03 15:10 UTC (permalink / raw) To: Philippe Mathieu-Daudé Cc: Peter Maydell, Andrew Jones, Juan Quintela, qemu-devel, Aaron Lindsay, qemu-arm * Philippe Mathieu-Daudé (philmd@redhat.com) wrote: > On 2/3/21 3:58 PM, Aaron Lindsay wrote: > > On Feb 03 10:01, Peter Maydell wrote: > >>> The third is that meanings of the bits in env->features (as defined by > >>> `enum arm_features` in target/arm/cpu.h) has shifted. For example, > >>> ARM_FEATURE_PXN, ARM_FEATURE_CRC, ARM_FEATURE_VFP, ARM_FEATURE_VFP3, > >>> ARM_FEATURE_VFP4 have all been removed and ARM_FEATURE_V8_1M has been > >>> added since 4.1.0. Heck, even I have added a field there in the past. > >>> Unfortunately, these additions/removals mean that when env->features is > >>> saved on one version and restored on another the bits can mean different > >>> things. Notably, the removal of the *VFP features means that a snapshot > >>> of a CPU reporting it supports ARM_FEATURE_VFP3 on 4.1.0 thinks it's now > >>> ARM_FEATURE_M on 5.2.0! > >> > >> Ow. I didn't realize the env->features was in the migration state :-( > >> There is no reason for it to be, because it's a constant property > >> of the CPU. The easy fix is to replace > >> VMSTATE_UINT64(env.features, ARMCPU), > >> in target/arm/machine.c with whatever the syntax is for "ignore > >> 64 bits of data here". Then we'll ignore whatever is coming in > >> from the source, which we don't need, and we'll stop sending it > >> out if we're the destination. > > > > I'll look into this. > > I think this is: > > VMSTATE_UNUSED(sizeof(uint64_t)) It's interesting that on x86 we've got a longterm request to *add* cpu features to the stream to detect screwups caused by using mismatched CPUs; so it's not necessarily a bad idea to include it once you realise it's there. Dave > > > > -Aaron > > > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ARM Snapshots Not Backwards-Compatible 2021-02-03 15:10 ` Dr. David Alan Gilbert @ 2021-02-03 15:26 ` Peter Maydell 2021-02-03 15:54 ` Aaron Lindsay 2021-02-03 16:13 ` [PATCH] target/arm: Don't migrate CPUARMState.features Aaron Lindsay 2021-02-03 15:42 ` ARM Snapshots Not Backwards-Compatible aaron--- via 1 sibling, 2 replies; 22+ messages in thread From: Peter Maydell @ 2021-02-03 15:26 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Andrew Jones, Juan Quintela, QEMU Developers, Aaron Lindsay, qemu-arm, Philippe Mathieu-Daudé On Wed, 3 Feb 2021 at 15:11, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > It's interesting that on x86 we've got a longterm request to *add* cpu > features to the stream to detect screwups caused by using mismatched > CPUs; so it's not necessarily a bad idea to include it once you realise > it's there. I think we would want to do that by checking the ID registers, not the legacy ad-hoc feature-flags word. In fact I think for KVM at least we already perform the check on the ID registers, in that the kernel will return an error if we pass it a value for an ID register and it's not the value it should be. I forget whether we try to check this for TCG (the mechanism for handling sysreg migration there is different). As a side note I suspect that the first two issues Aaron ran into are TCG-only and don't affect KVM. thanks -- PMM ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ARM Snapshots Not Backwards-Compatible 2021-02-03 15:26 ` Peter Maydell @ 2021-02-03 15:54 ` Aaron Lindsay 2021-02-03 16:13 ` [PATCH] target/arm: Don't migrate CPUARMState.features Aaron Lindsay 1 sibling, 0 replies; 22+ messages in thread From: Aaron Lindsay @ 2021-02-03 15:54 UTC (permalink / raw) To: Peter Maydell Cc: Andrew Jones, Juan Quintela, Dr. David Alan Gilbert, QEMU Developers, qemu-arm, Philippe Mathieu-Daudé On Feb 03 15:26, Peter Maydell wrote: > On Wed, 3 Feb 2021 at 15:11, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > > It's interesting that on x86 we've got a longterm request to *add* cpu > > features to the stream to detect screwups caused by using mismatched > > CPUs; so it's not necessarily a bad idea to include it once you realise > > it's there. > > I think we would want to do that by checking the ID registers, > not the legacy ad-hoc feature-flags word. In fact I think for > KVM at least we already perform the check on the ID registers, > in that the kernel will return an error if we pass it a value > for an ID register and it's not the value it should be. > I forget whether we try to check this for TCG (the mechanism > for handling sysreg migration there is different). Ah, I sent my reply where I mentioned attempting to find a way to selectively check env.features before I saw this. I'll just send the patch for `VMSTATE_UNUSED(sizeof(uint64_t))`. -Aaron ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH] target/arm: Don't migrate CPUARMState.features 2021-02-03 15:26 ` Peter Maydell 2021-02-03 15:54 ` Aaron Lindsay @ 2021-02-03 16:13 ` Aaron Lindsay 2021-02-03 16:24 ` Philippe Mathieu-Daudé ` (2 more replies) 1 sibling, 3 replies; 22+ messages in thread From: Aaron Lindsay @ 2021-02-03 16:13 UTC (permalink / raw) To: qemu-devel, Peter Maydell Cc: Andrew Jones, Juan Quintela, Dr . David Alan Gilbert, Aaron Lindsay, qemu-arm, philmd As feature flags are added or removed, the meanings of bits in the `features` field can change between QEMU versions, causing migration failures. Additionally, migrating the field is not useful because it is a constant function of the CPU being used. Signed-off-by: Aaron Lindsay <aaron@os.amperecomputing.com> Suggested-by: Peter Maydell <peter.maydell@linaro.org> --- target/arm/machine.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/arm/machine.c b/target/arm/machine.c index c9e9fd0a12..7f2511b6ed 100644 --- a/target/arm/machine.c +++ b/target/arm/machine.c @@ -834,7 +834,7 @@ const VMStateDescription vmstate_arm_cpu = { VMSTATE_UINT64(env.exclusive_addr, ARMCPU), VMSTATE_UINT64(env.exclusive_val, ARMCPU), VMSTATE_UINT64(env.exclusive_high, ARMCPU), - VMSTATE_UINT64(env.features, ARMCPU), + VMSTATE_UNUSED(sizeof(uint64_t)), VMSTATE_UINT32(env.exception.syndrome, ARMCPU), VMSTATE_UINT32(env.exception.fsr, ARMCPU), VMSTATE_UINT64(env.exception.vaddress, ARMCPU), -- 2.17.1 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] target/arm: Don't migrate CPUARMState.features 2021-02-03 16:13 ` [PATCH] target/arm: Don't migrate CPUARMState.features Aaron Lindsay @ 2021-02-03 16:24 ` Philippe Mathieu-Daudé 2021-02-03 20:06 ` Andrew Jones 2021-02-08 13:11 ` Peter Maydell 2 siblings, 0 replies; 22+ messages in thread From: Philippe Mathieu-Daudé @ 2021-02-03 16:24 UTC (permalink / raw) To: Aaron Lindsay, qemu-devel, Peter Maydell Cc: Andrew Jones, qemu-arm, Dr . David Alan Gilbert, Juan Quintela On 2/3/21 5:13 PM, Aaron Lindsay wrote: > As feature flags are added or removed, the meanings of bits in the > `features` field can change between QEMU versions, causing migration > failures. Additionally, migrating the field is not useful because it is > a constant function of the CPU being used. Please don't bury patches within mailing list threads. BTW you found yet another 13 years old problem :) 918f5dca18d ("target-arm: Extend feature flags to 64 bits") aa941b94450 ("Savevm/loadvm bits for ARM core, the PXA2xx peripherals and Spitz hardware.") > > Signed-off-by: Aaron Lindsay <aaron@os.amperecomputing.com> > Suggested-by: Peter Maydell <peter.maydell@linaro.org> > --- > target/arm/machine.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/target/arm/machine.c b/target/arm/machine.c > index c9e9fd0a12..7f2511b6ed 100644 > --- a/target/arm/machine.c > +++ b/target/arm/machine.c > @@ -834,7 +834,7 @@ const VMStateDescription vmstate_arm_cpu = { > VMSTATE_UINT64(env.exclusive_addr, ARMCPU), > VMSTATE_UINT64(env.exclusive_val, ARMCPU), > VMSTATE_UINT64(env.exclusive_high, ARMCPU), > - VMSTATE_UINT64(env.features, ARMCPU), > + VMSTATE_UNUSED(sizeof(uint64_t)), > VMSTATE_UINT32(env.exception.syndrome, ARMCPU), > VMSTATE_UINT32(env.exception.fsr, ARMCPU), > VMSTATE_UINT64(env.exception.vaddress, ARMCPU), > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] target/arm: Don't migrate CPUARMState.features 2021-02-03 16:13 ` [PATCH] target/arm: Don't migrate CPUARMState.features Aaron Lindsay 2021-02-03 16:24 ` Philippe Mathieu-Daudé @ 2021-02-03 20:06 ` Andrew Jones 2021-02-08 13:11 ` Peter Maydell 2 siblings, 0 replies; 22+ messages in thread From: Andrew Jones @ 2021-02-03 20:06 UTC (permalink / raw) To: Aaron Lindsay Cc: Peter Maydell, Juan Quintela, qemu-devel, Dr . David Alan Gilbert, qemu-arm, philmd On Wed, Feb 03, 2021 at 11:13:40AM -0500, Aaron Lindsay wrote: > As feature flags are added or removed, the meanings of bits in the > `features` field can change between QEMU versions, causing migration > failures. Additionally, migrating the field is not useful because it is > a constant function of the CPU being used. > > Signed-off-by: Aaron Lindsay <aaron@os.amperecomputing.com> > Suggested-by: Peter Maydell <peter.maydell@linaro.org> > --- > target/arm/machine.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/target/arm/machine.c b/target/arm/machine.c > index c9e9fd0a12..7f2511b6ed 100644 > --- a/target/arm/machine.c > +++ b/target/arm/machine.c > @@ -834,7 +834,7 @@ const VMStateDescription vmstate_arm_cpu = { > VMSTATE_UINT64(env.exclusive_addr, ARMCPU), > VMSTATE_UINT64(env.exclusive_val, ARMCPU), > VMSTATE_UINT64(env.exclusive_high, ARMCPU), > - VMSTATE_UINT64(env.features, ARMCPU), > + VMSTATE_UNUSED(sizeof(uint64_t)), > VMSTATE_UINT32(env.exception.syndrome, ARMCPU), > VMSTATE_UINT32(env.exception.fsr, ARMCPU), > VMSTATE_UINT64(env.exception.vaddress, ARMCPU), > -- > 2.17.1 > I tested this with a KVM guest. Prior to this patch, a migration of a 4.1 guest from a 4.1 build of QEMU to a build of the latest bits of QEMU would fail with a segmentation fault. With this patch, the migration succeeds. However, migrating a virt-4.1 machine type from a QEMU of the latest bits to a 4.1 build of QEMU still fails with qemu-system-aarch64: error while loading state for instance 0x0 of device 'pl011' qemu-system-aarch64: load of migration failed: No such file or directory This means "ping-pong" migrations fail, which is another thing we would prefer to work. But, as that appears to be a different bug, regarding this patch Reviewed-by: Andrew Jones <drjones@redhat.com> Tested-by: Andrew Jones <drjones@redhat.com> Thanks, drew ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] target/arm: Don't migrate CPUARMState.features 2021-02-03 16:13 ` [PATCH] target/arm: Don't migrate CPUARMState.features Aaron Lindsay 2021-02-03 16:24 ` Philippe Mathieu-Daudé 2021-02-03 20:06 ` Andrew Jones @ 2021-02-08 13:11 ` Peter Maydell 2 siblings, 0 replies; 22+ messages in thread From: Peter Maydell @ 2021-02-08 13:11 UTC (permalink / raw) To: Aaron Lindsay Cc: Andrew Jones, Juan Quintela, QEMU Developers, Dr . David Alan Gilbert, qemu-arm, Philippe Mathieu-Daudé On Wed, 3 Feb 2021 at 16:14, Aaron Lindsay <aaron@os.amperecomputing.com> wrote: > > As feature flags are added or removed, the meanings of bits in the > `features` field can change between QEMU versions, causing migration > failures. Additionally, migrating the field is not useful because it is > a constant function of the CPU being used. > > Signed-off-by: Aaron Lindsay <aaron@os.amperecomputing.com> > Suggested-by: Peter Maydell <peter.maydell@linaro.org> > --- > target/arm/machine.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Applied to target-arm.next, thanks. -- PMM ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: ARM Snapshots Not Backwards-Compatible 2021-02-03 15:10 ` Dr. David Alan Gilbert 2021-02-03 15:26 ` Peter Maydell @ 2021-02-03 15:42 ` aaron--- via 1 sibling, 0 replies; 22+ messages in thread From: aaron--- via @ 2021-02-03 15:42 UTC (permalink / raw) To: Dr. David Alan Gilbert Cc: Peter Maydell, Andrew Jones, Juan Quintela, qemu-devel, qemu-arm, Philippe Mathieu-Daudé On Feb 03 15:10, Dr. David Alan Gilbert wrote: > * Philippe Mathieu-Daudé (philmd@redhat.com) wrote: > > On 2/3/21 3:58 PM, Aaron Lindsay wrote: > > > On Feb 03 10:01, Peter Maydell wrote: > > >>> The third is that meanings of the bits in env->features (as defined by > > >>> `enum arm_features` in target/arm/cpu.h) has shifted. For example, > > >>> ARM_FEATURE_PXN, ARM_FEATURE_CRC, ARM_FEATURE_VFP, ARM_FEATURE_VFP3, > > >>> ARM_FEATURE_VFP4 have all been removed and ARM_FEATURE_V8_1M has been > > >>> added since 4.1.0. Heck, even I have added a field there in the past. > > >>> Unfortunately, these additions/removals mean that when env->features is > > >>> saved on one version and restored on another the bits can mean different > > >>> things. Notably, the removal of the *VFP features means that a snapshot > > >>> of a CPU reporting it supports ARM_FEATURE_VFP3 on 4.1.0 thinks it's now > > >>> ARM_FEATURE_M on 5.2.0! > > >> > > >> Ow. I didn't realize the env->features was in the migration state :-( > > >> There is no reason for it to be, because it's a constant property > > >> of the CPU. The easy fix is to replace > > >> VMSTATE_UINT64(env.features, ARMCPU), > > >> in target/arm/machine.c with whatever the syntax is for "ignore > > >> 64 bits of data here". Then we'll ignore whatever is coming in > > >> from the source, which we don't need, and we'll stop sending it > > >> out if we're the destination. > > > > > > I'll look into this. > > > > I think this is: > > > > VMSTATE_UNUSED(sizeof(uint64_t)) > > It's interesting that on x86 we've got a longterm request to *add* cpu > features to the stream to detect screwups caused by using mismatched > CPUs; so it's not necessarily a bad idea to include it once you realise > it's there. Another approach would be to assign integer values to the members of `enum arm_features` in target/arm/cpu.h instead of letting the compiler assign them for us. Of course, this only fixes the problem moving forward and doesn't allow future QEMU versions to gain the ability to load snapshots taken with previous QEMU versions. But seeing as how no one else has raised the issue, I'm not sure how many people this impacts. I've tested the `VMSTATE_UNUSED(sizeof(uint64_t))` change and it appears to have the same effect as me manually fixing up env.features (which is to say: it works). Maybe we could change it to `VMSTATE_UINT64(env.incoming_features, ARMCPU),` instead and add a check in `cpu_post_load` that the incoming_features match the expected ones (along with something matching in `cpu_pre_save` to populate it)? We could ignore the incoming_features field entirely in cpu_post_load for old snapshot versions, but check that they match moving forward (perhaps generating a mask from the current value of the enum to ignore fields which have been removed)? I can send a patch for whichever path we settle on. -Aaron ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2021-02-08 19:24 UTC | newest] Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-02-03 4:01 ARM Snapshots Not Backwards-Compatible Aaron Lindsay 2021-02-03 8:21 ` Philippe Mathieu-Daudé 2021-02-03 10:27 ` Dr. David Alan Gilbert 2021-02-03 10:38 ` Peter Maydell 2021-02-03 10:49 ` Dr. David Alan Gilbert 2021-02-03 10:52 ` Peter Maydell 2021-02-03 10:59 ` Dr. David Alan Gilbert 2021-02-03 16:04 ` Philippe Mathieu-Daudé 2021-02-03 12:44 ` Andrew Jones 2021-02-03 15:45 ` aaron--- via 2021-02-03 15:53 ` Andrew Jones 2021-02-03 10:01 ` Peter Maydell 2021-02-03 14:58 ` Aaron Lindsay 2021-02-03 15:02 ` Philippe Mathieu-Daudé 2021-02-03 15:10 ` Dr. David Alan Gilbert 2021-02-03 15:26 ` Peter Maydell 2021-02-03 15:54 ` Aaron Lindsay 2021-02-03 16:13 ` [PATCH] target/arm: Don't migrate CPUARMState.features Aaron Lindsay 2021-02-03 16:24 ` Philippe Mathieu-Daudé 2021-02-03 20:06 ` Andrew Jones 2021-02-08 13:11 ` Peter Maydell 2021-02-03 15:42 ` ARM Snapshots Not Backwards-Compatible aaron--- via
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.