kvm.vger.kernel.org archive mirror
* [Bug 213781] New: KVM: x86/svm: The guest (#vcpu>1) can't boot up with QEMU "-overcommit cpu-pm=on"
@ 2021-07-19 10:08 bugzilla-daemon
  2021-07-19 10:57 ` [Bug 213781] " bugzilla-daemon
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: bugzilla-daemon @ 2021-07-19 10:08 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=213781

            Bug ID: 213781
           Summary: KVM: x86/svm: The guest (#vcpu>1) can't boot up with
                    QEMU "-overcommit cpu-pm=on"
           Product: Virtualization
           Version: unspecified
    Kernel Version: 5.14.0-rc1+
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: blocking
          Priority: P1
         Component: kvm
          Assignee: virtualization_kvm@kernel-bugs.osdl.org
          Reporter: like.xu.linux@gmail.com
        Regression: No

Hi,

This is an upstream bug and is very easy to reproduce on AMD platforms.
It was introduced by commit
e72436bc3a5206f95bb384e741154166ddb3202e.

QEMU reports the following register state:

KVM internal error. Suberror: 1
emulation failure
EAX=000f38b3 EBX=00000000 ECX=000002ff EDX=00000001
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00006d88
EIP=000fc95a EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008300 DPL=0 TSS16-busy
GDT=     000f50a0 00000037
IDT=     000f50de 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=34 41 0f 00 e8 5b 26 ff ff c7 05 38 41 0f 00 00 00 00 00 f4 <eb> fd fa fc
66 b8 00 c2 00 00 8e d8 8e d0 66 bc 58 f8 00 00 e9 07 f9 66 54 66 0f b7 e4 66

At the time of the failure, dump_vmcb() reports:

[47175.214140] SVM: VMCB 00000000a4006788, last attempted VMRUN on CPU 81
[47175.215862] SVM: VMCB Control Area:
[47175.216155] SVM: cr_read:            0010
[47175.216426] SVM: cr_write:           0110
[47175.216699] SVM: dr_read:            00ff
[47175.216939] SVM: dr_write:           00ff
[47175.217170] SVM: exceptions:         00060042
[47175.217400] SVM: intercepts:         bc4c8027 0000624f
[47175.217651] SVM: pause filter count: 0
[47175.217879] SVM: pause filter threshold:0
[47175.218107] SVM: iopm_base_pa:       0000000194674000
[47175.218342] SVM: msrpm_base_pa:      00000040857d4000
[47175.218589] SVM: tsc_offset:         ffff92710e0ed2c0
[47175.218823] SVM: asid:               1
[47175.219052] SVM: tlb_ctl:            0
[47175.219280] SVM: int_ctl:            03000200
[47175.219522] SVM: int_vector:         00000000
[47175.219753] SVM: int_state:          00000000
[47175.219981] SVM: exit_code:          00000400
[47175.220208] SVM: exit_info1:         0000000100000014
[47175.220441] SVM: exit_info2:         00000000000fc000
[47175.220684] SVM: exit_int_info:      00000000
[47175.220913] SVM: exit_int_info_err:  00000000
[47175.221140] SVM: nested_ctl:         1
[47175.221363] SVM: nested_cr3:         0000004184ca8000
[47175.221598] SVM: avic_vapic_bar:     0000000000000000
[47175.221823] SVM: ghcb:               0000000000000000
[47175.222047] SVM: event_inj:          00000000
[47175.222272] SVM: event_inj_err:      00000000
[47175.222497] SVM: virt_ext:           2
[47175.222739] SVM: next_rip:           0000000000000000
[47175.222968] SVM: avic_backing_page:  0000000000000000
[47175.223198] SVM: avic_logical_id:    0000000000000000
[47175.223425] SVM: avic_physical_id:   0000000000000000
[47175.223665] SVM: vmsa_pa:            0000000000000000
[47175.223885] SVM: VMCB State Save Area:
[47175.224105] SVM: es:   s: 0010 a: 0c93 l: ffffffff b: 0000000000000000
[47175.224342] SVM: cs:   s: 0008 a: 049b l: ffffffff b: 0000000000000000
[47175.224588] SVM: ss:   s: 0010 a: 0c93 l: ffffffff b: 0000000000000000
[47175.224817] SVM: ds:   s: 0010 a: 0c93 l: ffffffff b: 0000000000000000
[47175.225043] SVM: fs:   s: 0010 a: 0c93 l: ffffffff b: 0000000000000000
[47175.225266] SVM: gs:   s: 0010 a: 0c93 l: ffffffff b: 0000000000000000
[47175.225486] SVM: gdtr: s: 0000 a: 0000 l: 00000037 b: 00000000000f50a0
[47175.225720] SVM: ldtr: s: 0000 a: 0082 l: 0000ffff b: 0000000000000000
[47175.225939] SVM: idtr: s: 0000 a: 0000 l: 00000000 b: 00000000000f50de
[47175.226156] SVM: tr:   s: 0000 a: 0083 l: 0000ffff b: 0000000000000000
[47175.226445] SVM: cpl:            0                efer:         0000000000001000
[47175.226682] SVM: cr0:            0000000000000011 cr2:          0000000000000000
[47175.226900] SVM: cr3:            0000000000000000 cr4:          0000000000000000
[47175.227112] SVM: dr6:            00000000ffff0ff0 dr7:          0000000000000400
[47175.227327] SVM: rip:            00000000000fc95a rflags:       0000000000000002
[47175.227554] SVM: rsp:            0000000000006d88 rax:          00000000000f38b3
[47175.227768] SVM: star:           0000000000000000 lstar:        0000000000000000
[47175.227983] SVM: cstar:          0000000000000000 sfmask:       0000000000000000
[47175.228198] SVM: kernel_gs_base: 0000000000000000 sysenter_cs:  0000000000000000
[47175.228413] SVM: sysenter_esp:   0000000000000000 sysenter_eip: 0000000000000000
[47175.228641] SVM: gpat:           0007040600070406 dbgctl:       0000000000000000
[47175.228859] SVM: br_from:        0000000000000000 br_to:        0000000000000000
[47175.229076] SVM: excp_from:      0000000000000000 excp_to:      0000000000000000
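
For readers decoding the dump above: exit_code 0x400 is the SVM nested page fault (#NPF) exit, exit_info1 carries a page-fault style error code, and exit_info2 is the faulting guest physical address. The following is only a minimal sketch of that decoding. The bit names are my own labels (not kernel macros), and the bit positions are taken from the x86 #PF error code layout plus the two SVM qualifier bits 32 and 33.

```python
# Sketch: decode the #NPF exit information from the VMCB dump.
# Bit names are illustrative labels, not kernel identifiers.
NPF_BITS = {
    0: "present",                  # clear means the translation was not present
    1: "write",
    2: "user",
    3: "reserved-bit",
    4: "instruction-fetch",
    32: "final-gpa-translation",   # fault on the final guest physical address
    33: "guest-page-table-walk",   # fault while walking the guest page tables
}

SVM_EXIT_NPF = 0x400
PAGE_SHIFT = 12  # assuming 4 KiB pages


def decode_npf(exit_info1):
    """Return the names of all error-code bits set in exit_info1."""
    return [name for bit, name in sorted(NPF_BITS.items())
            if exit_info1 & (1 << bit)]


def gpa_to_gfn(gpa):
    """Guest physical address to guest frame number."""
    return gpa >> PAGE_SHIFT


print(decode_npf(0x0000000100000014))
print(hex(gpa_to_gfn(0x00000000000fc000)))
```

With the values from the dump, the fault decodes as a not-present instruction fetch on the final translation of gfn 0xfc, i.e. the page that contains the faulting rip 0xfc95a.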

For reference, the relevant part of the BIOS code:

   fc940:       00 00
   fc942:       72 f3                   jb     fc937 <entry_smp+0xb>
   fc944:       8b 25 34 41 0f 00       mov    0xf4134,%esp
   fc94a:       e8 5b 26 ff ff          call   eefaa <handle_smp>
   fc94f:       c7 05 38 41 0f 00 00    movl   $0x0,0xf4138
   fc956:       00 00 00
   fc959:       f4                      hlt
   fc95a:       eb fd                   jmp    fc959 <entry_smp+0x2d>
   fc95c:       fa                      cli
   fc95d:       fc                      cld
   fc95e:       66 b8 00 c2             mov    $0xc200,%ax
   fc962:       00 00                   add    %al,(%eax)
   fc964:       8e d8                   mov    %eax,%ds
   fc966:       8e d0                   mov    %eax,%ss
   fc968:       66 bc 58 f8             mov    $0xf858,%sp
   fc96c:       00 00                   add    %al,(%eax)
   fc96e:       e9 07 f9 66 54          jmp    5476c27a <code32flat_end+0x5466c27a>
   fc973:       66 0f b7 e4             movzww %sp,%sp
   fc977:       66 9c                   pushfw
   fc979:       fa                      cli
   fc97a:       fc                      cld
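
The faulting EIP 0xfc95a points at the bytes "eb fd" (marked <eb> in the QEMU Code= dump). As a small worked check, assuming the standard encoding of a short jump (opcode EB followed by a signed 8-bit displacement relative to the next instruction), the target can be computed as below. The helper name is made up for this illustration.

```python
# Sketch: compute the target of a 2-byte short jmp (EB rel8).
def short_jmp_target(addr, insn):
    """addr is the address of the jmp itself, insn its two raw bytes."""
    if len(insn) != 2 or insn[0] != 0xEB:
        raise ValueError("not a 2-byte short jmp (EB rel8)")
    # Sign-extend the 8-bit displacement.
    rel8 = insn[1] - 0x100 if insn[1] & 0x80 else insn[1]
    # The displacement is relative to the end of the instruction.
    return addr + 2 + rel8


print(hex(short_jmp_target(0xfc95a, bytes([0xEB, 0xFD]))))
```

The result is 0xfc959, the hlt at the end of entry_smp, which matches the disassembly: the vCPU is sitting in the "hlt; jmp" loop when the code fetch faults.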

Or the corresponding source from SeaBIOS:

// Entry point for QEMU smp sipi interrupts.
        DECLFUNC entry_smp
entry_smp:
        // Transition to 32bit mode.
        cli
        cld
        movl $2f + BUILD_BIOS_ADDR, %edx
        jmp transition32_nmi_off
        .code32
        // Acquire lock and take ownership of shared stack
1:      rep ; nop
2:      lock btsl $0, SMPLock
        jc 1b
        movl SMPStack, %esp
        // Call handle_smp
        calll _cfunc32flat_handle_smp - BUILD_BIOS_ADDR
        // Release lock and halt processor.
        movl $0, SMPLock
3:      hlt
        jmp 3b
        .code16

The related trace:

       CPU 1/KVM-1278472 [119] d..2 246654.769260: kvm_entry: vcpu 1, rip 0xfc95a
       CPU 1/KVM-1278472 [119] ...1 246654.769261: kvm_exit: vcpu 1 reason npf rip 0xfc95a info1 0x0000000100000014 info2 0x00000000000fc000 intr_info 0x00000000 error_code 0x00000000
       CPU 1/KVM-1278472 [119] ...1 246654.769262: kvm_page_fault: address fc000 error_code 14
       CPU 1/KVM-1278472 [119] d..2 246654.769262: kvm_entry: vcpu 1, rip 0xfc95a
       CPU 1/KVM-1278472 [119] ...1 246654.769263: kvm_exit: vcpu 1 reason npf rip 0xfc95a info1 0x0000000100000014 info2 0x00000000000fc000 intr_info 0x00000000 error_code 0x00000000
       CPU 1/KVM-1278472 [119] ...1 246654.769263: kvm_page_fault: address fc000 error_code 14
       CPU 1/KVM-1278472 [119] ...1 246654.769272: kvm_emulate_insn: 0:fc95a: (prot32)
       CPU 1/KVM-1278472 [119] ...1 246654.769274: kvm_emulate_insn: 0:fc95a: (prot32) failed
       CPU 1/KVM-1278472 [119] ...1 246654.769275: kvm_fpu: unload
       CPU 1/KVM-1278472 [119] ...1 246654.769275: kvm_userspace_exit: reason KVM_EXIT_INTERNAL_ERROR (17)


My early findings:

- Emulation of the instruction at EIP 0xfc95a is requested by
kvm_mmu_page_fault() with (EMULTYPE_ALLOW_RETRY_PF | EMULTYPE_PF);
- __do_insn_fetch_bytes() is called from x86_decode_insn() because
svm->vmcb->control.insn_len is 0 (not sure if this is another erratum about #NPF);
- kvm_fetch_guest_virt() returns X86EMUL_IO_NEEDED;
- Please note that tools/testing/selftests/kvm/set_memory_region_test also
produces "kvm_emulate_insn: ffff0000:fff0: (real) failed".

Please share your understanding of the problem, or propose a fix.

Thanks,
Like Xu

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 213781] KVM: x86/svm: The guest (#vcpu>1) can't boot up with QEMU "-overcommit cpu-pm=on"
  2021-07-19 10:08 [Bug 213781] New: KVM: x86/svm: The guest (#vcpu>1) can't boot up with QEMU "-overcommit cpu-pm=on" bugzilla-daemon
@ 2021-07-19 10:57 ` bugzilla-daemon
  2021-07-29  1:57 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: bugzilla-daemon @ 2021-07-19 10:57 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=213781

Maxim Levitsky (maximlevitsky@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maximlevitsky@gmail.com

--- Comment #1 from Maxim Levitsky (maximlevitsky@gmail.com) ---
Sadly, I know exactly why this happens, and yes, this commit is technically to
blame.

But the root cause is the non-atomic memslot updates that qemu does. I hope it
will be fixed one way or another.


* [Bug 213781] KVM: x86/svm: The guest (#vcpu>1) can't boot up with QEMU "-overcommit cpu-pm=on"
  2021-07-19 10:08 [Bug 213781] New: KVM: x86/svm: The guest (#vcpu>1) can't boot up with QEMU "-overcommit cpu-pm=on" bugzilla-daemon
  2021-07-19 10:57 ` [Bug 213781] " bugzilla-daemon
@ 2021-07-29  1:57 ` bugzilla-daemon
  2021-07-29  9:29 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: bugzilla-daemon @ 2021-07-29  1:57 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=213781

--- Comment #2 from Like Xu (like.xu.linux@gmail.com) ---
Hi Maxim,

Are there any updates on this issue? Could you provide more details
about the "non-atomic memslot updates that qemu does" so that I can try to fix it?


* [Bug 213781] KVM: x86/svm: The guest (#vcpu>1) can't boot up with QEMU "-overcommit cpu-pm=on"
  2021-07-19 10:08 [Bug 213781] New: KVM: x86/svm: The guest (#vcpu>1) can't boot up with QEMU "-overcommit cpu-pm=on" bugzilla-daemon
  2021-07-19 10:57 ` [Bug 213781] " bugzilla-daemon
  2021-07-29  1:57 ` bugzilla-daemon
@ 2021-07-29  9:29 ` bugzilla-daemon
  2022-06-22 12:49 ` bugzilla-daemon
  2022-06-22 13:00 ` bugzilla-daemon
  4 siblings, 0 replies; 7+ messages in thread
From: bugzilla-daemon @ 2021-07-29  9:29 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=213781

--- Comment #3 from Maxim Levitsky (maximlevitsky@gmail.com) ---
For all practical purposes you can just revert this commit.
The fix for the root cause is not simple, and I will work on it when I get to it.


* [Bug 213781] KVM: x86/svm: The guest (#vcpu>1) can't boot up with QEMU "-overcommit cpu-pm=on"
  2021-07-19 10:08 [Bug 213781] New: KVM: x86/svm: The guest (#vcpu>1) can't boot up with QEMU "-overcommit cpu-pm=on" bugzilla-daemon
                   ` (2 preceding siblings ...)
  2021-07-29  9:29 ` bugzilla-daemon
@ 2022-06-22 12:49 ` bugzilla-daemon
  2022-06-22 13:00   ` Maxim Levitsky
  2022-06-22 13:00 ` bugzilla-daemon
  4 siblings, 1 reply; 7+ messages in thread
From: bugzilla-daemon @ 2022-06-22 12:49 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=213781

Like Xu (like.xu.linux@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Kernel Version|5.14.0-rc1+                 |5.19.0-rc1+

--- Comment #4 from Like Xu (like.xu.linux@gmail.com) ---
The issue still exists on AMD after the commit was reverted in 31c25585695a.

I just confirmed that it is caused by non-atomic accesses to the memslots, i.e. a race between:
- __do_insn_fetch_bytes(), called for the #NPF on the prot32 code page;
- kvm_vm_ioctl_set_memory_region(), called from user space.

Note that the expected result of [selftests::test_zero_memory_regions on x86_64]
is that the guest triggers an internal KVM error, because the initial code
fetch encounters a non-existent memslot and results in an emulation
failure.

More cases like this will gradually emerge. I'm not sure whether KVM has documentation
pointing out this restriction on memslot updates (fixing only one application, QEMU, may
be one-sided), or whether we need to add something unwise such as a
gfn_to_memslot(kvm, gpa_to_gfn(cr2_or_gpa)) check in x86_emulate_instruction().

Any other suggestions?
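
To make the race described above concrete, here is a toy model in Python. It is not KVM code: MemslotTable, set_region() and fetch_outcome() are hypothetical names, and the slot layout (one slot covering 0xc0000-0xfffff) is an assumption chosen only so that gfn 0xfc falls inside it. The one behaviour borrowed from the real API is that setting a region with size 0 deletes the slot.

```python
# Toy model of a non-atomic memslot update racing with a guest code fetch.
class MemslotTable:
    def __init__(self):
        self.slots = {}  # slot id -> (base_gfn, npages)

    def set_region(self, slot_id, base_gfn, npages):
        """npages == 0 deletes the slot, as a zero-sized region does in KVM."""
        if npages == 0:
            self.slots.pop(slot_id, None)
        else:
            self.slots[slot_id] = (base_gfn, npages)

    def gfn_to_memslot(self, gfn):
        """Return the id of the slot containing gfn, or None."""
        for slot_id, (base, npages) in self.slots.items():
            if base <= gfn < base + npages:
                return slot_id
        return None


def fetch_outcome(table, gpa):
    """Model of a code fetch: it only succeeds if a memslot backs the page."""
    if table.gfn_to_memslot(gpa >> 12) is None:
        return "emulation failure"
    return "ok"


def simulate():
    table = MemslotTable()
    table.set_region(0, 0xc0, 0x40)                 # assumed initial layout
    outcomes = [fetch_outcome(table, 0xfc95a)]
    table.set_region(0, 0, 0)                       # step 1: delete the slot
    outcomes.append(fetch_outcome(table, 0xfc95a))  # vCPU fetches in the window
    table.set_region(0, 0xc0, 0x40)                 # step 2: re-add the slot
    outcomes.append(fetch_outcome(table, 0xfc95a))
    return outcomes


print(simulate())
```

The middle fetch lands in the window between the delete and the re-add, which is the situation in which the guest above ends up with KVM_EXIT_INTERNAL_ERROR.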


* Re: [Bug 213781] KVM: x86/svm: The guest (#vcpu>1) can't boot up with QEMU "-overcommit cpu-pm=on"
  2022-06-22 12:49 ` bugzilla-daemon
@ 2022-06-22 13:00   ` Maxim Levitsky
  0 siblings, 0 replies; 7+ messages in thread
From: Maxim Levitsky @ 2022-06-22 13:00 UTC (permalink / raw)
  To: bugzilla-daemon, kvm

On Wed, 2022-06-22 at 12:49 +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=213781
> 
> Like Xu (like.xu.linux@gmail.com) changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>      Kernel Version|5.14.0-rc1+                 |5.19.0-rc1+
> 
> --- Comment #4 from Like Xu (like.xu.linux@gmail.com) ---
> The issue still exists on the AMD after we revert the commit in 31c25585695a.
> 
> Just confirmed that it's caused by non-atomic accesses to memslot:
> - __do_insn_fetch_bytes() from the prot32 code page #NPF;
> - kvm_vm_ioctl_set_memory_region() from user space;
> 
> Considering the expected result [selftests::test_zero_memory_regions on x86_64]
> is that the guest will trigger an internal KVM error due to the initial code
> fetch encountering a non-existent memslot and resulting in an emulation
> failure.
> 
> More similar cases will gradually emerge. I'm not sure if KVM has documentation
> pointing out this restriction on memslot updates (fix one application QEMU may
> be one-sided), or any need to add something unwise like check
> gfn_to_memslot(kvm, gpa_to_gfn(cr2_or_gpa)) in the x86_emulate_instruction().
> 
> Any other suggestions ?
> 

Yep, agree. This has to be fixed at both the qemu and kvm level: kvm needs a new API to upload
a set of memslot changes atomically (the easy part), and qemu needs code to
batch the memslot updates when it does SMM-related memslot updates.

Best regards,
	Maxim Levitsky



* [Bug 213781] KVM: x86/svm: The guest (#vcpu>1) can't boot up with QEMU "-overcommit cpu-pm=on"
  2021-07-19 10:08 [Bug 213781] New: KVM: x86/svm: The guest (#vcpu>1) can't boot up with QEMU "-overcommit cpu-pm=on" bugzilla-daemon
                   ` (3 preceding siblings ...)
  2022-06-22 12:49 ` bugzilla-daemon
@ 2022-06-22 13:00 ` bugzilla-daemon
  4 siblings, 0 replies; 7+ messages in thread
From: bugzilla-daemon @ 2022-06-22 13:00 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=213781

--- Comment #5 from mlevitsk@redhat.com ---
On Wed, 2022-06-22 at 12:49 +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=213781
> 
> Like Xu (like.xu.linux@gmail.com) changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>      Kernel Version|5.14.0-rc1+                 |5.19.0-rc1+
> 
> --- Comment #4 from Like Xu (like.xu.linux@gmail.com) ---
> The issue still exists on the AMD after we revert the commit in 31c25585695a.
> 
> Just confirmed that it's caused by non-atomic accesses to memslot:
> - __do_insn_fetch_bytes() from the prot32 code page #NPF;
> - kvm_vm_ioctl_set_memory_region() from user space;
> 
> Considering the expected result [selftests::test_zero_memory_regions on x86_64]
> is that the guest will trigger an internal KVM error due to the initial code
> fetch encountering a non-existent memslot and resulting in an emulation
> failure.
> 
> More similar cases will gradually emerge. I'm not sure if KVM has documentation
> pointing out this restriction on memslot updates (fix one application QEMU may
> be one-sided), or any need to add something unwise like check
> gfn_to_memslot(kvm, gpa_to_gfn(cr2_or_gpa)) in the x86_emulate_instruction().
> 
> Any other suggestions ?
> 

Yep, agree. This has to be fixed at both the qemu and kvm level: kvm needs a new API to upload
a set of memslot changes atomically (the easy part), and qemu needs code to
batch the memslot updates when it does SMM-related memslot updates.

Best regards,
        Maxim Levitsky


