* AMD Opteron 6276 to Intel Xeon E5645 live migration failure
@ 2014-02-28 12:16 ` Nick Thomas
  0 siblings, 0 replies; 6+ messages in thread
From: Nick Thomas @ 2014-02-28 12:16 UTC (permalink / raw)
  To: Qemu-devel; +Cc: kvm

Hi there,

We recently acquired some Intel Xeon E5645 machines, and are attempting
to add them to our existing cluster of AMD Opteron 6200/6300 KVM hosts.
My life is much easier if live migration works between all hosts in the
cluster, but I can reliably trigger a migration failure when moving a
guest from a 6276 host to an E5645 host.

The failure looks like:

KVM: injection failed, MSI lost (Operation not permitted)
KVM: entry failed, hardware error 0x80000021

If you're running a guest on an Intel machine without unrestricted mode
support, the failure can be most likely due to the guest entering an
invalid state for Intel VT. For example, the guest maybe running in big
real mode which is not supported on less recent Intel processors.

EAX=00000000 EBX=81600000 ECX=00000000 EDX=00000000
ESI=00000003 EDI=81600000 EBP=8168f060 ESP=81601f10
EIP=8102b36c EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     00000000 0000ffff
IDT=     00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

The guest itself is running Linux in ordinary 64-bit mode at the point
where it's migrated.
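
For reference, the control registers in the dump above can be decoded mechanically. The following is a minimal sketch, not part of the original report: the bit positions are the architectural ones for CR0 and EFER (only a subset is listed), and the helper names are made up for illustration. If the decoding is right, the dumped state describes a CPU in real mode (CR0.PE and CR0.PG clear, EFER.LMA clear) rather than a 64-bit guest.

```python
# Subset of architectural bit positions; names of helpers are illustrative.
CR0_BITS = {"PE": 0, "ET": 4, "NW": 29, "CD": 30, "PG": 31}
EFER_BITS = {"SCE": 0, "LME": 8, "LMA": 10, "NXE": 11}


def set_flags(value, bits):
    """Return the sorted names of the flags in `bits` that are set in `value`."""
    return sorted(name for name, pos in bits.items() if (value >> pos) & 1)


def cpu_mode(cr0, efer):
    """Classify the operating mode implied by CR0 and EFER."""
    if not cr0 & (1 << CR0_BITS["PE"]):
        return "real mode"
    if efer & (1 << EFER_BITS["LMA"]):
        return "long mode"
    return "protected mode"


cr0 = 0x60000010   # CR0 value from the dump above
efer = 0x0         # EFER value from the dump above

print("CR0 flags: ", set_flags(cr0, CR0_BITS))
print("EFER flags:", set_flags(efer, EFER_BITS))
print("mode:      ", cpu_mode(cr0, efer))
```

A running 64-bit Linux guest would be expected to show PE and PG set in CR0 and LME/LMA set in EFER, so comparing this against "info registers" on the source before migration may help narrow down which part of the state is lost.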

I also get this in dmesg:

kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using workaround

I've tried kernels 3.2.54 and 3.10.26 on the source host, and 3.2.54,
3.10.26 and 3.12.9 on the destination; and qemu versions 1.5.3 and
1.7.0. Same error in all cases. I've also tried twiddling the
emulate_invalid_guest_state flag, to no effect.

I can reproduce it with a guest as simple as:

/opt/qemu-1.5.3/bin/qemu-system-x86_64 -enable-kvm -nographic \
    -monitor stdio \
    -kernel /boot/vmlinuz-3.10.26-bigv-20 \
    -initrd /boot/initrd.img-3.10.26-bigv-20 \
    -net none

on the source, then migrating that to the destination.
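
The two halves of the reproducer can be sketched as follows. This is an illustration only: the destination host name and the TCP port are hypothetical placeholders, and the sketch assumes the usual arrangement where the destination is started with the same options plus "-incoming", and the migration is then triggered from the source monitor with the "migrate" command. The script only builds and prints the command lines; it does not start QEMU.

```python
QEMU = "/opt/qemu-1.5.3/bin/qemu-system-x86_64"
KERNEL = "/boot/vmlinuz-3.10.26-bigv-20"
INITRD = "/boot/initrd.img-3.10.26-bigv-20"


def qemu_argv(incoming_port=None):
    """Build the reproducer command line.

    With incoming_port set, the guest waits for an incoming migration
    on that TCP port instead of booting (destination side).
    """
    argv = [QEMU, "-enable-kvm", "-nographic", "-monitor", "stdio",
            "-kernel", KERNEL, "-initrd", INITRD, "-net", "none"]
    if incoming_port is not None:
        argv += ["-incoming", "tcp:0:%d" % incoming_port]
    return argv


def migrate_command(dest_host, port):
    """Monitor command to type on the source once the destination is listening."""
    return "migrate -d tcp:%s:%d" % (dest_host, port)


PORT = 4444                      # hypothetical port
DEST = "dest-host.example"       # hypothetical destination host name

print("destination:", " ".join(qemu_argv(incoming_port=PORT)))
print("source:     ", " ".join(qemu_argv()))
print("monitor:    ", migrate_command(DEST, PORT))
```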

I can reliably boot and run the guest on the E5645 host, and migrate
that guest to the AMD host. Migrating the same guest back again fails
with the same error.

Migrating from a later Intel machine (i7-4600U) to the earlier Intel
seems to work fine.

I'd love to get this working, but I'm a little ignorant on where to
begin, or even if it's possible at all. Are these CPUs just too old, or
is a fixup missing in qemu (or kvm)?

/Nick

* Re: AMD Opteron 6276 to Intel Xeon E5645 live migration failure
  2014-02-28 12:16 ` [Qemu-devel] " Nick Thomas
@ 2014-02-28 12:57   ` Paolo Bonzini
  -1 siblings, 0 replies; 6+ messages in thread
From: Paolo Bonzini @ 2014-02-28 12:57 UTC (permalink / raw)
  To: Nick Thomas, Qemu-devel; +Cc: kvm

On 28/02/2014 13:16, Nick Thomas wrote:
> I'd love to get this working, but I'm a little ignorant on where to
> begin, or even if it's possible at all. Are these CPUs just too old, or
> is a fixup missing in qemu (or kvm)?

It's the latter (in kvm).  Note that for migration to work, especially 
for such different models, you have to disable CPU features that aren't 
present in both models.  In general the way to do this is to add "-cpu".

I'd start debugging with "-cpu kvm64" on both sides.
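
One way to see which features differ between two hosts is to compare the "flags" lines of /proc/cpuinfo from each side. The sketch below is an illustration, not something from this thread: the function names are made up, and the two flag lines are heavily shortened examples rather than output captured from the actual machines. Note also that /proc/cpuinfo flag names do not always match the feature names accepted by "-cpu".

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def compare_hosts(flags_a, flags_b):
    """Split two flag sets into (common, only_a, only_b), each sorted."""
    return (sorted(flags_a & flags_b),
            sorted(flags_a - flags_b),
            sorted(flags_b - flags_a))


# Illustrative, shortened flag lines; NOT captured from the real hosts.
amd = "flags\t\t: fpu sse2 ssse3 sse4_1 sse4_2 aes avx svm sse4a xop fma4"
intel = "flags\t\t: fpu sse2 ssse3 sse4_1 sse4_2 aes vmx"

common, only_amd, only_intel = compare_hosts(parse_flags(amd), parse_flags(intel))
print("common:    ", " ".join(common))
print("AMD only:  ", " ".join(only_amd))
print("Intel only:", " ".join(only_intel))
```

Anything outside the "common" list is a candidate for disabling via "-cpu" on both sides; a baseline model such as kvm64 sidesteps the question by exposing only a small feature set.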

Paolo

* Re: AMD Opteron 6276 to Intel Xeon E5645 live migration failure
  2014-02-28 12:57   ` [Qemu-devel] " Paolo Bonzini
@ 2014-02-28 13:15     ` Nick Thomas
  -1 siblings, 0 replies; 6+ messages in thread
From: Nick Thomas @ 2014-02-28 13:15 UTC (permalink / raw)
  To: Paolo Bonzini, Qemu-devel; +Cc: kvm

On 28/02/14 12:57, Paolo Bonzini wrote:
> On 28/02/2014 13:16, Nick Thomas wrote:
>> I'd love to get this working, but I'm a little ignorant on where to
>> begin, or even if it's possible at all. Are these CPUs just too old, or
>> is a fixup missing in qemu (or kvm)?
> 
> It's the latter (in kvm).  Note that for migration to work, especially
> for such different models, you have to disable CPU features that aren't
> present in both models.  In general the way to do this is to add "-cpu".
> 
> I'd start debugging with "-cpu kvm64" on both sides.
> 
> Paolo

Sorry, I forgot to mention: most of my tests have been run with -cpu
qemu64,-vmx,-svm. I've just re-run the minimal test case with that, and
with -cpu kvm64, as a sanity check. Same behaviour.

In the interests of debugging, I guess the next step is to attach gdb to
the destination and find out what's happening around the time the error
messages are pushed out?

I found an earlier thread (
http://www.spinics.net/lists/kvm/msg73478.html ) with a similar flavour
that referenced a "vmxcap" tool - the output of that is here:

Basic VMX Information
  Revision                                 15
  VMCS size                                1024
  VMCS restricted to 32 bit addresses      no
  Dual-monitor support                     yes
  VMCS memory type                         6
  INS/OUTS instruction information         yes
  IA32_VMX_TRUE_*_CTLS support             yes
pin-based controls
  External interrupt exiting               yes
  NMI exiting                              yes
  Virtual NMIs                             yes
  Activate VMX-preemption timer            yes
  Process posted interrupts                no
primary processor-based controls
  Interrupt window exiting                 yes
  Use TSC offsetting                       yes
  HLT exiting                              yes
  INVLPG exiting                           yes
  MWAIT exiting                            yes
  RDPMC exiting                            yes
  RDTSC exiting                            yes
  CR3-load exiting                         default
  CR3-store exiting                        default
  CR8-load exiting                         yes
  CR8-store exiting                        yes
  Use TPR shadow                           yes
  NMI-window exiting                       yes
  MOV-DR exiting                           yes
  Unconditional I/O exiting                yes
  Use I/O bitmaps                          yes
  Monitor trap flag                        yes
  Use MSR bitmaps                          yes
  MONITOR exiting                          yes
  PAUSE exiting                            yes
  Activate secondary control               yes
secondary processor-based controls
  Virtualize APIC accesses                 yes
  Enable EPT                               yes
  Descriptor-table exiting                 yes
  Enable RDTSCP                            yes
  Virtualize x2APIC mode                   yes
  Enable VPID                              yes
  WBINVD exiting                           yes
  Unrestricted guest                       yes
  APIC register emulation                  no
  Virtual interrupt delivery               no
  PAUSE-loop exiting                       yes
  RDRAND exiting                           no
  Enable INVPCID                           no
  Enable VM functions                      no
  VMCS shadowing                           no
  EPT-violation #VE                        no
VM-Exit controls
  Save debug controls                      default
  Host address-space size                  yes
  Load IA32_PERF_GLOBAL_CTRL               yes
  Acknowledge interrupt on exit            yes
  Save IA32_PAT                            yes
  Load IA32_PAT                            yes
  Save IA32_EFER                           yes
  Load IA32_EFER                           yes
  Save VMX-preemption timer value          yes
VM-Entry controls
  Load debug controls                      default
  IA-64 mode guest                         yes
  Entry to SMM                             yes
  Deactivate dual-monitor treatment        yes
  Load IA32_PERF_GLOBAL_CTRL               yes
  Load IA32_PAT                            yes
  Load IA32_EFER                           yes
Miscellaneous data
  VMX-preemption timer scale (log2)        7
  Store EFER.LMA into IA-32e mode guest control yes
  HLT activity state                       yes
  Shutdown activity state                  yes
  Wait-for-SIPI activity state             yes
  IA32_SMBASE support                      no
  Number of CR3-target values              4
  MSR-load/store count recommenation       0
  IA32_SMM_MONITOR_CTL[2] can be set to 1  no
  VMWRITE to VM-exit information fields    no
  MSEG revision identifier                 0
VPID and EPT capabilities
  Execute-only EPT translations            yes
  Page-walk length 4                       yes
  Paging-structure memory type UC          yes
  Paging-structure memory type WB          yes
  2MB EPT pages                            yes
  1GB EPT pages                            yes
  INVEPT supported                         yes
  EPT accessed and dirty flags             no
  Single-context INVEPT                    yes
  All-context INVEPT                       yes
  INVVPID supported                        yes
  Individual-address INVVPID               yes
  Single-context INVVPID                   yes
  All-context INVVPID                      yes
  Single-context-retaining-globals INVVPID yes
VM Functions
  EPTP Switching                           no


Mostly Greek, but I assume one of the things labeled "no" is the source
of my woes?
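
To pull out just the entries reported as "no", the vmxcap output can be filtered with a few lines of script. This is a rough sketch that assumes the layout shown above (unindented section headings, indented "name value" rows); the function name is made up, and the sample is a short excerpt of the output rather than the whole listing.

```python
import re


def unsupported_features(vmxcap_text):
    """Map each section heading to the list of features reported as 'no'."""
    result = {}
    section = None
    for raw in vmxcap_text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if not line.startswith(" "):
            # Unindented lines are section headings.
            section = line
            continue
        # Indented rows: feature name, padding, then a single-token value.
        match = re.match(r"\s+(.*\S)\s+(\S+)$", line)
        if match and match.group(2) == "no":
            result.setdefault(section, []).append(match.group(1))
    return result


# Short excerpt of the vmxcap output above.
sample = """\
pin-based controls
  Virtual NMIs                             yes
  Process posted interrupts                no
secondary processor-based controls
  Unrestricted guest                       yes
  APIC register emulation                  no
  Virtual interrupt delivery               no
"""

for section, names in unsupported_features(sample).items():
    print(section)
    for name in names:
        print("  " + name)
```

A "no" here only says the host CPU lacks an optional VMX feature; it does not by itself identify the cause of the failure. It may be worth noting that "Unrestricted guest" is reported as "yes" on this host, so the "without unrestricted mode support" case from the error text does not appear to apply directly.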

/Nick
