* Guest migration between different Ryzen CPU generations
@ 2022-05-31 17:00 mike tancsa
  2022-06-02 12:42 ` Igor Mammedov
  2022-06-09 14:01 ` Paolo Bonzini
  0 siblings, 2 replies; 9+ messages in thread
From: mike tancsa @ 2022-05-31 17:00 UTC (permalink / raw)
  To: kvm

Hello,

     I have been using KVM since the Ubuntu 18 and 20.x LTS series of 
kernels and distributions without any issues on a whole range of guests 
up until now. Recently, we spun up an Ubuntu 22 LTS hypervisor to add to 
the mix and eventually upgrade to. The hardware is a series of Ryzen 7 
CPUs (3700X).  Migrations back and forth worked without issue on the 
Ubuntu 20.x kernels.  The first Ubuntu 22 machine was on identical 
hardware and all was good with that too. The second Ubuntu 22 based 
machine was spun up with a newer-generation Ryzen, a 5800X.  On the 
initial kernel version that came with that release back in April, 
migrations worked as expected between the different hardware as well as 
between the different kernel and QEMU / KVM versions that come by 
default with the distribution. Not sure if migrations between kernel 
and KVM versions "accidentally" worked all these years, but they did.  
However, we ran into an issue with kernel 5.15.0-33-generic (possibly 
with 5.15.0-30 as well) that is part of Ubuntu.  Migrations to 
older-generation CPUs no longer worked.  I could send a guest TO the 
new box and all was fine, but upon sending the guest from it to another 
hypervisor, the sender would see it as successfully migrated, but the 
VM would typically just hang with 100% CPU utilization, or sometimes 
crash.  I tried a 5.18 kernel from May 22nd and the behavior is 
different again: if I specify the CPU as EPYC or EPYC-IBPB, I can 
migrate back and forth.

Quick summary

1. On Ubuntu 20.04 LTS with the latest Ubuntu updates, I can migrate VMs 
   back and forth between a 3700X and a 5800X without issue. Guests are 
   a mix of Ubuntu, Fedora and FreeBSD.
2. On Ubuntu 22 LTS with the original kernel from release day, I can 
   migrate VMs back and forth between a 3700X and a 5800X without issue.
3. On Ubuntu 22 LTS with everything up to date as of mid May 2022, I can 
   migrate from the 3700X to the 5800X without issue. But going from the 
   5800X to the 3700X results in a migrated VM that either crashes 
   inside the guest or has the CPU pegged at 100%, spinning its wheels 
   with the guest frozen and needing a hard reset. This happens with or 
   without --live and with or without --unsafe. The crash / hang 
   happens once the VM is fully migrated, with the sender thinking it 
   was successfully sent and the receiver thinking it successfully 
   arrived.
4. On stock Ubuntu 22 (5.15.0-33-generic), I can migrate back and forth 
   to Ubuntu 20 as long as the hardware / CPU is identical (in this 
   case, 3700X).
5. On Ubuntu 22 LTS with everything up to date as of mid May 2022 and 
   kernel 5.18.0-051800-generic #202205222030 SMP PREEMPT_DYNAMIC Sun 
   May 22, I can migrate VMs back and forth if their CPU definition is 
   EPYC or EPYC-IBPB. If the definition (in my one test case anyway) is 
   Nehalem, then I get a frozen VM on migration back to the 3700X.
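
For reference, the --live and --unsafe options mentioned above are options of libvirt's "virsh migrate". The post does not show the exact command line that was used, so the following is only a sketch of how such an invocation could be assembled. The guest name, the destination host name and the qemu+ssh destination URI are assumptions made for illustration.

```python
def build_migrate_command(domain, dest_host, live=True, unsafe=False):
    """Assemble a `virsh migrate` argument list (it is not executed here).

    `domain` and `dest_host` are hypothetical names; the qemu+ssh URI
    scheme is an assumption about how the hypervisors reach each other.
    """
    cmd = ["virsh", "migrate"]
    if live:
        cmd.append("--live")      # keep the guest running while memory is copied
    if unsafe:
        cmd.append("--unsafe")    # skip libvirt's migration safety checks
    cmd += [domain, f"qemu+ssh://{dest_host}/system"]
    return cmd


if __name__ == "__main__":
    # Example: live-migrate a guest from the 5800X host to a 3700X host.
    print(" ".join(build_migrate_command("guest1", "hv-3700x")))
```

The list form can be handed to subprocess.run() unchanged, which avoids shell quoting problems with guest names.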

Some more details at

https://ubuntuforums.org/showthread.php?t=2475399

Is this a bug? Expected behavior? Is there a better place to ask 
these questions?

Thanks in advance!

     ---Mike



* Re: Guest migration between different Ryzen CPU generations
  2022-05-31 17:00 Guest migration between different Ryzen CPU generations mike tancsa
@ 2022-06-02 12:42 ` Igor Mammedov
  2022-06-02 15:09   ` mike tancsa
  2022-06-09 14:01 ` Paolo Bonzini
  1 sibling, 1 reply; 9+ messages in thread
From: Igor Mammedov @ 2022-06-02 12:42 UTC (permalink / raw)
  To: mike tancsa; +Cc: kvm, Leonardo Bras

On Tue, 31 May 2022 13:00:07 -0400
mike tancsa <mike@sentex.net> wrote:

> Hello,
> 
>      I have been using kvm since the Ubuntu 18 and 20.x LTS series of 
> kernels and distributions without any issues on a whole range of Guests 
> up until now. Recently, we spun up an Ubuntu LTS 22 hypervisor to add to 
> the mix and eventually upgrade to. Hardware is a series of Ryzen 7 CPUs 
> (3700x).  Migrations back and forth without issue for Ubuntu 20.x 
> kernels.  The first Ubuntu 22 machine was on identical hardware and all 
> was good with that too. The second Ubuntu 22 based machine was spun up 
> with a newer gen Ryzen, a 5800x.  On the initial kernel version that 
> came with that release back in April, migrations worked as expected 
> between hardware as well as different kernel versions and qemu / KVM 
> versions that come default with the distribution. Not sure if migrations 
> between kernel and KVM versions "accidentally" worked all these years, 
> but they did.  However, we ran into an issue with the kernel 
> 5.15.0-33-generic (possibly with 5.15.0-30 as well) thats part of 
> Ubuntu.  Migrations no longer worked to older generation CPUs.  I could 
> send a guest TO the box and all was fine, but upon sending the guest to 
> another hypervisor, the sender would see it as successfully migrated, 
> but the VM would typically just hang, with 100% CPU utilization, or 
> sometimes crash.  I tried a 5.18 kernel from May 22nd and again the 
> behavior is different. If I specify the CPU as EPYC or EPYC-IBPB, I can 
> migrate back and forth.

Perhaps you are hitting the issue fixed by:
https://lore.kernel.org/lkml/CAJ6HWG66HZ7raAa+YK0UOGLF+4O3JnzbZ+a-0j8GNixOhLk9dA@mail.gmail.com/T/





* Re: Guest migration between different Ryzen CPU generations
  2022-06-02 12:42 ` Igor Mammedov
@ 2022-06-02 15:09   ` mike tancsa
  2022-06-02 21:46     ` Sean Christopherson
  0 siblings, 1 reply; 9+ messages in thread
From: mike tancsa @ 2022-06-02 15:09 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: kvm, Leonardo Bras

On 6/2/2022 8:42 AM, Igor Mammedov wrote:
> On Tue, 31 May 2022 13:00:07 -0400
> mike tancsa <mike@sentex.net> wrote:
>
>> Hello,
>>
>>       I have been using kvm since the Ubuntu 18 and 20.x LTS series of
>> kernels and distributions without any issues on a whole range of Guests
>> up until now. Recently, we spun up an Ubuntu LTS 22 hypervisor to add to
>> the mix and eventually upgrade to. Hardware is a series of Ryzen 7 CPUs
>> (3700x).  Migrations back and forth without issue for Ubuntu 20.x
>> kernels.  The first Ubuntu 22 machine was on identical hardware and all
>> was good with that too. The second Ubuntu 22 based machine was spun up
>> with a newer gen Ryzen, a 5800x.  On the initial kernel version that
>> came with that release back in April, migrations worked as expected
>> between hardware as well as different kernel versions and qemu / KVM
>> versions that come default with the distribution. Not sure if migrations
>> between kernel and KVM versions "accidentally" worked all these years,
>> but they did.  However, we ran into an issue with the kernel
>> 5.15.0-33-generic (possibly with 5.15.0-30 as well) thats part of
>> Ubuntu.  Migrations no longer worked to older generation CPUs.  I could
>> send a guest TO the box and all was fine, but upon sending the guest to
>> another hypervisor, the sender would see it as successfully migrated,
>> but the VM would typically just hang, with 100% CPU utilization, or
>> sometimes crash.  I tried a 5.18 kernel from May 22nd and again the
>> behavior is different. If I specify the CPU as EPYC or EPYC-IBPB, I can
>> migrate back and forth.
> perhaps you are hitting issue fixed by:
> https://lore.kernel.org/lkml/CAJ6HWG66HZ7raAa+YK0UOGLF+4O3JnzbZ+a-0j8GNixOhLk9dA@mail.gmail.com/T/
>
Thanks for the response. I am not sure. That patch is from Feb. Would 
the bug have been introduced sometime in May to the 5.15 kernel that 
Ubuntu 22 would have tracked?

Looking at the CPU flags diff between the 5800 and the 3700,

diff -u 3700x 5800x
--- 3700x       2022-06-02 14:57:00.331309878 +0000
+++ 5800x       2022-06-02 14:56:52.403340136 +0000
@@ -77,6 +77,7 @@
  hw_pstate
  ssbd
  mba
+ibrs
  ibpb
  stibp
  vmmcall
@@ -85,6 +86,8 @@
  avx2
  smep
  bmi2
+erms
+invpcid
  cqm
  rdt_a
  rdseed
@@ -122,13 +125,15 @@
  vgif
  v_spec_ctrl
  umip
+pku
+ospke
+vaes
+vpclmulqdq
  rdpid
  overflow_recov
  succor
  smca
-sme
-sev
-sev_es
+fsrm
  bugs
  sysret_ss_attrs
  spectre_v1
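
A flag comparison like the one above can also be produced programmatically. The sketch below assumes the input is the text of /proc/cpuinfo saved from each host; the function names and the shortened sample flag lists are hypothetical and only illustrate the approach.

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def compare_flags(source_flags, dest_flags):
    """List the flags that exist on only one side of a migration."""
    return {
        "only_on_source": sorted(source_flags - dest_flags),
        "only_on_dest": sorted(dest_flags - source_flags),
    }


if __name__ == "__main__":
    # Shortened sample data; real hosts report far more flags.
    src = parse_flags("flags\t\t: fpu avx2 ibpb pku ospke vaes\n")  # e.g. 5800X
    dst = parse_flags("flags\t\t: fpu avx2 ibpb sme sev\n")         # e.g. 3700X
    print(compare_flags(src, dst))
```

Flags that appear under "only_on_source" are the ones a guest may have been using before the move and that the destination cannot provide.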





* Re: Guest migration between different Ryzen CPU generations
  2022-06-02 15:09   ` mike tancsa
@ 2022-06-02 21:46     ` Sean Christopherson
  2022-06-03 13:18       ` mike tancsa
  0 siblings, 1 reply; 9+ messages in thread
From: Sean Christopherson @ 2022-06-02 21:46 UTC (permalink / raw)
  To: mike tancsa; +Cc: Igor Mammedov, kvm, Leonardo Bras

On Thu, Jun 02, 2022, mike tancsa wrote:
> On 6/2/2022 8:42 AM, Igor Mammedov wrote:
> > On Tue, 31 May 2022 13:00:07 -0400
> > mike tancsa <mike@sentex.net> wrote:
> > 
> > > Hello,
> > > 
> > >       I have been using kvm since the Ubuntu 18 and 20.x LTS series of
> > > kernels and distributions without any issues on a whole range of Guests
> > > up until now. Recently, we spun up an Ubuntu LTS 22 hypervisor to add to
> > > the mix and eventually upgrade to. Hardware is a series of Ryzen 7 CPUs
> > > (3700x).  Migrations back and forth without issue for Ubuntu 20.x
> > > kernels.  The first Ubuntu 22 machine was on identical hardware and all
> > > was good with that too. The second Ubuntu 22 based machine was spun up
> > > with a newer gen Ryzen, a 5800x.  On the initial kernel version that
> > > came with that release back in April, migrations worked as expected
> > > between hardware as well as different kernel versions and qemu / KVM
> > > versions that come default with the distribution. Not sure if migrations
> > > between kernel and KVM versions "accidentally" worked all these years,
> > > but they did.  However, we ran into an issue with the kernel
> > > 5.15.0-33-generic (possibly with 5.15.0-30 as well) thats part of
> > > Ubuntu.  Migrations no longer worked to older generation CPUs.  I could
> > > send a guest TO the box and all was fine, but upon sending the guest to
> > > another hypervisor, the sender would see it as successfully migrated,
> > > but the VM would typically just hang, with 100% CPU utilization, or
> > > sometimes crash.  I tried a 5.18 kernel from May 22nd and again the
> > > behavior is different. If I specify the CPU as EPYC or EPYC-IBPB, I can
> > > migrate back and forth.
> > perhaps you are hitting issue fixed by:
> > https://lore.kernel.org/lkml/CAJ6HWG66HZ7raAa+YK0UOGLF+4O3JnzbZ+a-0j8GNixOhLk9dA@mail.gmail.com/T/
> > 
> Thanks for the response. I am not sure.

I suspect Igor is right.  PKRU/PKU, the offending XSAVE feature in that bug, is
in the "new in 5800" list below, and that bug fix went into v5.17, i.e. should
also be fixed in v5.18.
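
As a rough illustration of that version reasoning, the sketch below checks whether a kernel release string is at or above the upstream version that carries the fix. The function names are hypothetical. It is only a heuristic: it looks at the upstream major.minor numbers and knows nothing about fixes a distribution may have backported into an older series.

```python
import re


def upstream_version(release):
    """Extract (major, minor) from a release string such as '5.15.0-33-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def has_upstream_fix(release, fixed_in=(5, 17)):
    """True if the release is based on an upstream version at or above `fixed_in`."""
    return upstream_version(release) >= fixed_in


if __name__ == "__main__":
    for rel in ("5.15.0-33-generic", "5.18.0-051800-generic"):
        print(rel, has_upstream_fix(rel))
```

A False result does not prove that the bug is present, since a stable or distribution kernel may have received the fix as a backport.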

Unfortunately, there's no Fixes: provided and I'm having a hell of a time trying
to figure out when the bug was actually introduced.  The v5.15 code base is quite
different due to a rather massive FPU rework in v5.16.  That fix definitely would
not apply cleanly, but it doesn't mean that the underlying root cause is different,
e.g. the buggy code could easily have been lurking for multiple kernel versions
before the rework in v5.16.

> That patch is from Feb. Would the bug have been introduced sometime in May to
> the 5.15 kernel than Ubuntu 22 would have tracked ?

Dates don't necessarily mean a whole lot when it comes to stable kernels, e.g.
it's not uncommon for a change to be backported to a stable kernel weeks/months
after it initially landed in the upstream tree.

Is moving to v5.17 or later an option for you?  If not, what was the "original"
Ubuntu 22 kernel version that worked?  Ideally, assuming it's the same FPU/PKU bug,
the fix would be backported to v5.15, but that's likely going to be quite difficult,
especially without knowing exactly which commit introduced the bug.

> Looking at the CPU flags diff between the 5800 and the 3700,
> 
> diff -u 3700x 5800x
> --- 3700x       2022-06-02 14:57:00.331309878 +0000
> +++ 5800x       2022-06-02 14:56:52.403340136 +0000
> @@ -77,6 +77,7 @@
>  hw_pstate
>  ssbd
>  mba
> +ibrs
>  ibpb
>  stibp
>  vmmcall
> @@ -85,6 +86,8 @@
>  avx2
>  smep
>  bmi2
> +erms
> +invpcid
>  cqm
>  rdt_a
>  rdseed
> @@ -122,13 +125,15 @@
>  vgif
>  v_spec_ctrl
>  umip
> +pku
> +ospke
> +vaes
> +vpclmulqdq
>  rdpid
>  overflow_recov
>  succor
>  smca
> -sme
> -sev
> -sev_es
> +fsrm
>  bugs
>  sysret_ss_attrs
>  spectre_v1


* Re: Guest migration between different Ryzen CPU generations
  2022-06-02 21:46     ` Sean Christopherson
@ 2022-06-03 13:18       ` mike tancsa
  2022-06-03 15:09         ` Sean Christopherson
  0 siblings, 1 reply; 9+ messages in thread
From: mike tancsa @ 2022-06-03 13:18 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Igor Mammedov, kvm, Leonardo Bras

On 6/2/2022 5:46 PM, Sean Christopherson wrote:
> On Thu, Jun 02, 2022, mike tancsa wrote:
>> On 6/2/2022 8:42 AM, Igor Mammedov wrote:
>>> On Tue, 31 May 2022 13:00:07 -0400
>>> mike tancsa <mike@sentex.net> wrote:
>>>
>>>> Hello,
>>>>
>>>>        I have been using kvm since the Ubuntu 18 and 20.x LTS series of
>>>> kernels and distributions without any issues on a whole range of Guests
>>>> up until now. Recently, we spun up an Ubuntu LTS 22 hypervisor to add to
>>>> the mix and eventually upgrade to. Hardware is a series of Ryzen 7 CPUs
>>>> (3700x).  Migrations back and forth without issue for Ubuntu 20.x
>>>> kernels.  The first Ubuntu 22 machine was on identical hardware and all
>>>> was good with that too. The second Ubuntu 22 based machine was spun up
>>>> with a newer gen Ryzen, a 5800x.  On the initial kernel version that
>>>> came with that release back in April, migrations worked as expected
>>>> between hardware as well as different kernel versions and qemu / KVM
>>>> versions that come default with the distribution. Not sure if migrations
>>>> between kernel and KVM versions "accidentally" worked all these years,
>>>> but they did.  However, we ran into an issue with the kernel
>>>> 5.15.0-33-generic (possibly with 5.15.0-30 as well) thats part of
>>>> Ubuntu.  Migrations no longer worked to older generation CPUs.  I could
>>>> send a guest TO the box and all was fine, but upon sending the guest to
>>>> another hypervisor, the sender would see it as successfully migrated,
>>>> but the VM would typically just hang, with 100% CPU utilization, or
>>>> sometimes crash.  I tried a 5.18 kernel from May 22nd and again the
>>>> behavior is different. If I specify the CPU as EPYC or EPYC-IBPB, I can
>>>> migrate back and forth.
>>> perhaps you are hitting issue fixed by:
>>> https://lore.kernel.org/lkml/CAJ6HWG66HZ7raAa+YK0UOGLF+4O3JnzbZ+a-0j8GNixOhLk9dA@mail.gmail.com/T/
>>>
>> Thanks for the response. I am not sure.
> I suspect Igor is right.  PKRU/PKU, the offending XSAVE feature in that bug, is
> in the "new in 5800" list below, and that bug fix went into v5.17, i.e. should
> also be fixed in v5.18.
>
> Unfortunately, there's no Fixes: provided and I'm having a hell of a time trying
> to figure out when the bug was actually introduced.  The v5.15 code base is quite
> different due to a rather massive FPU rework in v5.16.  That fix definitely would
> not apply cleanly, but it doesn't mean that the underlying root cause is different,
> e.g. the buggy code could easily have been lurking for multiple kernel versions
> before the rework in v5.16.
>> That patch is from Feb. Would the bug have been introduced sometime in May to
>> the 5.15 kernel than Ubuntu 22 would have tracked ?
> Dates don't necessarily mean a whole lot when it comes to stable kernels, e.g.
> it's not uncommon for a change to be backported to a stable kernel weeks/months
> after it initially landed in the upstream tree.
>
> Is moving to v5.17 or later an option for you?  If not, what was the "original"
> Ubuntu 22 kernel version that worked?  Ideally, assuming it's the same FPU/PKU bug,
> the fix would be backported to v5.15, but that's likely going to be quite difficult,
> especially without knowing exactly which commit introduced the bug.

Thanks Sean, I can, but it just means adjusting our workflow a bit. For 
our hypervisors we like to just track LTS and be conservative in what 
software we install and stick with apps and kernels designed 
specifically to work with that release / distribution. The Ubuntu 22 
kernel that worked back in April was 5.15.0-25-generic.  TBH, if I am 
told we were just lucky things worked with different hardware and 
different kernels and KVM versions (i.e. migrating bidirectionally 
between Ubuntu 20.x and 22.x) I would be fine with that too.  But I was 
a little surprised that a kernel version bump within the 5.15 series 
would break what was working.

     ---Mike



* Re: Guest migration between different Ryzen CPU generations
  2022-06-03 13:18       ` mike tancsa
@ 2022-06-03 15:09         ` Sean Christopherson
  0 siblings, 0 replies; 9+ messages in thread
From: Sean Christopherson @ 2022-06-03 15:09 UTC (permalink / raw)
  To: mike tancsa; +Cc: Igor Mammedov, kvm, Leonardo Bras

On Fri, Jun 03, 2022, mike tancsa wrote:
> On 6/2/2022 5:46 PM, Sean Christopherson wrote:
> > On Thu, Jun 02, 2022, mike tancsa wrote:
> > > On 6/2/2022 8:42 AM, Igor Mammedov wrote:
> > > > On Tue, 31 May 2022 13:00:07 -0400
> > > > mike tancsa <mike@sentex.net> wrote:
> > > > 
> > > > > Hello,
> > > > > 
> > > > >        I have been using kvm since the Ubuntu 18 and 20.x LTS series of
> > > > > kernels and distributions without any issues on a whole range of Guests
> > > > > up until now. Recently, we spun up an Ubuntu LTS 22 hypervisor to add to
> > > > > the mix and eventually upgrade to. Hardware is a series of Ryzen 7 CPUs
> > > > > (3700x).  Migrations back and forth without issue for Ubuntu 20.x
> > > > > kernels.  The first Ubuntu 22 machine was on identical hardware and all
> > > > > was good with that too. The second Ubuntu 22 based machine was spun up
> > > > > with a newer gen Ryzen, a 5800x.  On the initial kernel version that
> > > > > came with that release back in April, migrations worked as expected
> > > > > between hardware as well as different kernel versions and qemu / KVM
> > > > > versions that come default with the distribution. Not sure if migrations
> > > > > between kernel and KVM versions "accidentally" worked all these years,
> > > > > but they did.  However, we ran into an issue with the kernel
> > > > > 5.15.0-33-generic (possibly with 5.15.0-30 as well) thats part of
> > > > > Ubuntu.  Migrations no longer worked to older generation CPUs.  I could
> > > > > send a guest TO the box and all was fine, but upon sending the guest to
> > > > > another hypervisor, the sender would see it as successfully migrated,
> > > > > but the VM would typically just hang, with 100% CPU utilization, or
> > > > > sometimes crash.  I tried a 5.18 kernel from May 22nd and again the
> > > > > behavior is different. If I specify the CPU as EPYC or EPYC-IBPB, I can
> > > > > migrate back and forth.
> > > > perhaps you are hitting issue fixed by:
> > > > https://lore.kernel.org/lkml/CAJ6HWG66HZ7raAa+YK0UOGLF+4O3JnzbZ+a-0j8GNixOhLk9dA@mail.gmail.com/T/
> > > > 
> > > Thanks for the response. I am not sure.
> > I suspect Igor is right.  PKRU/PKU, the offending XSAVE feature in that bug, is
> > in the "new in 5800" list below, and that bug fix went into v5.17, i.e. should
> > also be fixed in v5.18.
> > 
> > Unfortunately, there's no Fixes: provided and I'm having a hell of a time trying
> > to figure out when the bug was actually introduced.  The v5.15 code base is quite
> > different due to a rather massive FPU rework in v5.16.  That fix definitely would
> > not apply cleanly, but it doesn't mean that the underlying root cause is different,
> > e.g. the buggy code could easily have been lurking for multiple kernel versions
> > before the rework in v5.16.
> > > That patch is from Feb. Would the bug have been introduced sometime in May to
> > > the 5.15 kernel than Ubuntu 22 would have tracked ?
> > Dates don't necessarily mean a whole lot when it comes to stable kernels, e.g.
> > it's not uncommon for a change to be backported to a stable kernel weeks/months
> > after it initially landed in the upstream tree.
> > 
> > Is moving to v5.17 or later an option for you?  If not, what was the "original"
> > Ubuntu 22 kernel version that worked?  Ideally, assuming it's the same FPU/PKU bug,
> > the fix would be backported to v5.15, but that's likely going to be quite difficult,
> > especially without knowing exactly which commit introduced the bug.
> 
> Thanks Sean, I can, but it just means adjusting our work flow a bit. For our
> hypervisors we like to just track LTS and be conservative in what software
> we install and stick with apps and kernels designed specifically to work
> with that release / distribution.

Yeah, tracking LTS is the right thing to do.  I'll try to verify and bisect the bug,
and then get the fix backported to v5.15.y, but it may be a week or two before that
happens.

> The Ubuntu 22 kernel that worked back in April was 5.15.0-25-generic.  TBH,
> if I am told we were just lucky things worked with different hardware and
> different kernels and KVM versions (ie.  migrating bidirectionally from
> ubuntu 20.x to 22.x) I would be fine with that too.  But I was a little
> surprised that a kernel version bump from 5.15 would break what was working.

Migrating between kernel/KVM versions is absolutely supposed to work; this is
firmly a kernel bug.


* Re: Guest migration between different Ryzen CPU generations
  2022-05-31 17:00 Guest migration between different Ryzen CPU generations mike tancsa
  2022-06-02 12:42 ` Igor Mammedov
@ 2022-06-09 14:01 ` Paolo Bonzini
  2022-06-09 14:08   ` mike tancsa
  1 sibling, 1 reply; 9+ messages in thread
From: Paolo Bonzini @ 2022-06-09 14:01 UTC (permalink / raw)
  To: mike tancsa, kvm

On 5/31/22 19:00, mike tancsa wrote:
> On Ubuntu 22 LTS, with the original kernel from release day, I can 
> migrate VMs back and forth between a 3700x and a 5800x without issue
> On Ubuntu 22 LTS with everything up to date as of mid May 2022, I can 
> migrate from the 3700X to the 5800x without issue. But going from the 
> 5800x to the 3700x results in a migrated VM that either crashes inside 
> the VM or has the CPU pegged at 100% spinning its wheels with the guest 
> frozen and needing a hard reset. This is with --live or without and with 
> --unsafe or without. The crash / hang happens once the VM is fully 
> migrated with the sender thinking it was successfully sent and the 
> receiver thinking it successfully arrived in.
> On stock Ubuntu 22 (5.15.0-33-generic) I can migrate back and forth to 
> Ubuntu 20 as long as the hardware / cpu is identical (in this case, 3700x)
> On Ubuntu 22 LTS with everything up to date as of mid May 2022 with 
> 5.18.0-051800-generic #202205222030 SMP PREEMPT_DYNAMIC Sun May 22. I 
> can migrate VMs back and forth that have as its CPU def EPYC or 
> EPYC-IBPB. If the def (in my one test case anyways) is Nehalem then I 
> get a frozen VM on migration back to the 3700X.

Hi, this is probably related to the patch at 
https://www.spinics.net/lists/stable/msg538630.html, which, however, 
still needs a backport to 5.15.

Note that using Intel definitions on AMD or vice versa is not always 
going to work, though in this case it seems to be a regression.
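
The caveat about mixing vendors can be expressed as a simple pre-flight check. The sketch below is a hypothetical helper, not part of QEMU or libvirt. Its table covers only the three CPU model names mentioned in this thread, and the vendor labels "AMD" and "Intel" are chosen for the example.

```python
# Vendor of each guest CPU model named in this thread (illustrative table).
MODEL_VENDOR = {
    "EPYC": "AMD",
    "EPYC-IBPB": "AMD",
    "Nehalem": "Intel",
}


def vendor_mismatch(model, host_vendor):
    """True if the guest CPU model is known to belong to a different vendor.

    Models missing from the table return False because nothing is known
    about them; that is not a statement that they are safe.
    """
    vendor = MODEL_VENDOR.get(model)
    return vendor is not None and vendor != host_vendor


if __name__ == "__main__":
    for model in ("EPYC", "EPYC-IBPB", "Nehalem"):
        print(model, "on AMD host, mismatch:", vendor_mismatch(model, "AMD"))
```

A check like this only flags the cross-vendor case; it says nothing about whether two hosts of the same vendor expose the same feature set.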

Paolo


* Re: Guest migration between different Ryzen CPU generations
  2022-06-09 14:01 ` Paolo Bonzini
@ 2022-06-09 14:08   ` mike tancsa
  2022-06-09 14:31     ` Paolo Bonzini
  0 siblings, 1 reply; 9+ messages in thread
From: mike tancsa @ 2022-06-09 14:08 UTC (permalink / raw)
  To: Paolo Bonzini, kvm

On 6/9/2022 10:01 AM, Paolo Bonzini wrote:
> On 5/31/22 19:00, mike tancsa wrote:
>> On Ubuntu 22 LTS, with the original kernel from release day, I can 
>> migrate VMs back and forth between a 3700x and a 5800x without issue
>> On Ubuntu 22 LTS with everything up to date as of mid May 2022, I can 
>> migrate from the 3700X to the 5800x without issue. But going from the 
>> 5800x to the 3700x results in a migrated VM that either crashes 
>> inside the VM or has the CPU pegged at 100% spinning its wheels with 
>> the guest frozen and needing a hard reset. This is with --live or 
>> without and with --unsafe or without. The crash / hang happens once 
>> the VM is fully migrated with the sender thinking it was successfully 
>> sent and the receiver thinking it successfully arrived in.
>> On stock Ubuntu 22 (5.15.0-33-generic) I can migrate back and forth 
>> to Ubuntu 20 as long as the hardware / cpu is identical (in this 
>> case, 3700x)
>> On Ubuntu 22 LTS with everything up to date as of mid May 2022 with 
>> 5.18.0-051800-generic #202205222030 SMP PREEMPT_DYNAMIC Sun May 22. I 
>> can migrate VMs back and forth that have as its CPU def EPYC or 
>> EPYC-IBPB. If the def (in my one test case anyways) is Nehalem then I 
>> get a frozen VM on migration back to the 3700X.
> Hi, this is probably related to the patch at 
> https://www.spinics.net/lists/stable/msg538630.html, which needs a 
> backport to 5.15 however.
>
> Note that using Intel definitions on AMD or vice versa is not going to 
> always work, though in this case it seems to be a regression.
>
Thanks for the follow-up. Forgive the naive question, but I am new to 
Linux. Do patches like this typically get picked up by distributions 
like Ubuntu, or would I need to open a bug report to flag this for them 
so it's included in their updates?

     ---Mike




* Re: Guest migration between different Ryzen CPU generations
  2022-06-09 14:08   ` mike tancsa
@ 2022-06-09 14:31     ` Paolo Bonzini
  0 siblings, 0 replies; 9+ messages in thread
From: Paolo Bonzini @ 2022-06-09 14:31 UTC (permalink / raw)
  To: mike tancsa, kvm

On 6/9/22 16:08, mike tancsa wrote:
>>
> Thanks for the followup. Forgive the naive question, but I am new to 
> linux. Do patches like this typically get picked up by distributions 
> like Ubuntu, or would I need open a bug report to flag this for them so 
> its included in their updates ?

Yes, they are.

Paolo

