All of lore.kernel.org
 help / color / mirror / Atom feed
* [Bug 216867] New: KVM instruction emulation breaks LOCK instruction atomicity when CMPXCHG fails
@ 2022-12-30  3:02 bugzilla-daemon
  2023-01-03 22:05 ` Sean Christopherson
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: bugzilla-daemon @ 2022-12-30  3:02 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216867

            Bug ID: 216867
           Summary: KVM instruction emulation breaks LOCK instruction
                    atomicity when CMPXCHG fails
           Product: Virtualization
           Version: unspecified
    Kernel Version: 6.0.14-300.fc37.x86_64
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: kvm
          Assignee: virtualization_kvm@kernel-bugs.osdl.org
          Reporter: ercli@ucdavis.edu
        Regression: No

Created attachment 303502
  --> https://bugzilla.kernel.org/attachment.cgi?id=303502&action=edit
c.img.xz: Guest software to reproduce this bug (xz compressed)

Host CPU model: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
Host kernel version: 6.0.14-300.fc37.x86_64
Host kernel arch: x86_64
Guest: a system level software (called LHV) I wrote myself, 32-bits, compressed
and attached as c.img.xz.
QEMU command line: qemu-system-i386 -m 256M -smp 4 -enable-kvm -serial stdio
-drive media=disk,file=c.img
The problem does not go away if using -machine kernel_irqchip=off
The problem goes away if -accel tcg is used (also remove -enable-kvm)

Actual behavior: serial port shows a few lines, then stops. Example:

...
counts: count0  count1  count2  count0-count1+count2
counts: 1       0       1       0
counts: 2       1       1       0
counts: 3       2       1       0
counts: 4       2       3       -1
counts: 5       3       4       -2
counts: 6       3       5       -2
counts: 7       4       6       -3
counts: 8       4       7       -3
counts: 9       4       8       -3
counts: 10      5       8       -3
(no more outputs due to deadlock)

Usually the output is shorter. Sometimes only the table header.

Expected behaivor (reproducible using TCG): serial port shows:

...
counts: count0  count1  count2  count0-count1+count2
counts: 1       0       1       0
counts: 2       1       1       0
counts: 3       1       2       0
counts: 4       2       2       0
counts: 5       2       3       0
counts: 6       3       3       0
counts: 7       3       4       0
counts: 8       4       4       0
counts: 9       5       4       0
counts: 10      5       5       0
counts: 11      5       6       0
counts: 12      6       6       0
counts: 13      7       6       0
...

Explanation:

See the following for source code, line 5 - 89:

https://github.com/lxylxy123456/uberxmhf/blob/b5935eaf8aab38ce1933da1c1be22dcf1b992eaf/xmhf/src/xmhf-core/xmhf-runtime/xmhf-startup/lhv.c#L5

My code performs the following experiment repeatedly on 3 CPUs:

* Initially, "ptr" at address 0xb8000 (VGA memory mapped I/O) is set to 0
* CPU 0 writes 0x12345678 to ptr, then increases counter "count0".
* In an infinite loop, CPU 1 tries exchanges ptr with register EAX (contains 0)
using the XCHG instruction. If CPU 1 sees 0x12345678, it increases counter
"count1".
* CPU 2's behavior is similar to CPU 1, except it increases counter "count2"
when it sees 0x12345678.

Ideally, after each experiment there should always be count1 + count2 = count0.
However, in KVM, there may be count1 + count2 > count0. This because CPU 0
writes 0x12345678 to ptr once, but CPU 1 and CPU 2 both get 0x12345678 in XCHG.
Note that XCHG instruction always implements the locking protocol.

There is also a deadlock after running the experiment a few times. However I am
not trying to explain it for now.

Guessed cause:

I guess that KVM emulates the XCHG instruction that accesses 0xb8000. The call
stack should be:

...
 x86_emulate_instruction (arch/x86/kvm/x86.c)
  x86_emulate_insn (arch/x86/kvm/emulate.c)
   writeback (arch/x86/kvm/emulate.c)
    segmented_cmpxchg (arch/x86/kvm/emulate.c)
     emulator_cmpxchg_emulated (arch/x86/kvm/x86.c, ->cmpxchg_emulated)
      emulator_try_cmpxchg_user (arch/x86/kvm/x86.c)
       ...
        CMPXCHG instruction

Suppose CPU 2 wants to write 0 to ptr using writeback(), and expecting ptr to
already contain 0x13245678. However, CPU 1 changes the content of ptr to 0. So
* The CMPXCHG instruction fails (clears ZF).
* emulator_try_cmpxchg_user returns 1.
* emulator_cmpxchg_emulated() returns X86EMUL_CMPXCHG_FAILED.
* segmented_cmpxchg() returns X86EMUL_CMPXCHG_FAILED.
* writeback() returns X86EMUL_CMPXCHG_FAILED.
* x86_emulate_insn() returns EMULATION_OK.

Thus, I think the root cause of this bug is that x86_emulate_insn() ignores the
X86EMUL_CMPXCHG_FAILED error. The correct behavior should be retrying the
emulation using the updated value (similar to load-linked/store-conditional).

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Bug 216867] New: KVM instruction emulation breaks LOCK instruction atomicity when CMPXCHG fails
  2022-12-30  3:02 [Bug 216867] New: KVM instruction emulation breaks LOCK instruction atomicity when CMPXCHG fails bugzilla-daemon
@ 2023-01-03 22:05 ` Sean Christopherson
  2023-01-03 22:05 ` [Bug 216867] " bugzilla-daemon
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Sean Christopherson @ 2023-01-03 22:05 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: kvm

On Fri, Dec 30, 2022, bugzilla-daemon@kernel.org wrote:
 
> My code performs the following experiment repeatedly on 3 CPUs:
> 
> * Initially, "ptr" at address 0xb8000 (VGA memory mapped I/O) is set to 0
> * CPU 0 writes 0x12345678 to ptr, then increases counter "count0".
> * In an infinite loop, CPU 1 tries exchanges ptr with register EAX (contains 0)
> using the XCHG instruction. If CPU 1 sees 0x12345678, it increases counter
> "count1".
> * CPU 2's behavior is similar to CPU 1, except it increases counter "count2"
> when it sees 0x12345678.
> 
> Ideally, after each experiment there should always be count1 + count2 = count0.
> However, in KVM, there may be count1 + count2 > count0. This because CPU 0
> writes 0x12345678 to ptr once, but CPU 1 and CPU 2 both get 0x12345678 in XCHG.
> Note that XCHG instruction always implements the locking protocol.
> 
> There is also a deadlock after running the experiment a few times. However I am
> not trying to explain it for now.

Is the suspect deadlock in userspace, the guest, or in the host kernel?

> Guessed cause:
> 
> I guess that KVM emulates the XCHG instruction that accesses 0xb8000. The call
> stack should be:
> 
> ...
>  x86_emulate_instruction (arch/x86/kvm/x86.c)
>   x86_emulate_insn (arch/x86/kvm/emulate.c)
>    writeback (arch/x86/kvm/emulate.c)
>     segmented_cmpxchg (arch/x86/kvm/emulate.c)
>      emulator_cmpxchg_emulated (arch/x86/kvm/x86.c, ->cmpxchg_emulated)
>       emulator_try_cmpxchg_user (arch/x86/kvm/x86.c)
>        ...
>         CMPXCHG instruction
> 
> Suppose CPU 2 wants to write 0 to ptr using writeback(), and expecting ptr to
> already contain 0x13245678. However, CPU 1 changes the content of ptr to 0. So
> * The CMPXCHG instruction fails (clears ZF).
> * emulator_try_cmpxchg_user returns 1.
> * emulator_cmpxchg_emulated() returns X86EMUL_CMPXCHG_FAILED.
> * segmented_cmpxchg() returns X86EMUL_CMPXCHG_FAILED.
> * writeback() returns X86EMUL_CMPXCHG_FAILED.
> * x86_emulate_insn() returns EMULATION_OK.
> 
> Thus, I think the root cause of this bug is that x86_emulate_insn() ignores the
> X86EMUL_CMPXCHG_FAILED error. The correct behavior should be retrying the
> emulation using the updated value (similar to load-linked/store-conditional).

KVM does retry the emulation, albeit in a very roundabout and non-robust way.
On X86EMUL_CMPXCHG_FAILED, x86_emulate_insn() skips the EIP update and doesn't
writeback GPRs.  x86_emulate_instruction() is flawed and emulates single-step, but
the "eip" written should be the original RIP, i.e. shouldn't advance past the
instructions being emulated.  The single-step mess should be fixed, but I doubt
that's the root cause here.

Is there a memslot for 0xb8000?  I assume not since KVM is emulating (have you
actually verified that, e.g. with tracepoints?).  KVM's ABI doesn't support
atomic MMIO operations, i.e. if there's no memslot, KVM will effectively drop
the LOCK semantics.  If that's indeed what's happening, you should see

  kvm: emulating exchange as write

in the host dmesg (just once though).

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug 216867] KVM instruction emulation breaks LOCK instruction atomicity when CMPXCHG fails
  2022-12-30  3:02 [Bug 216867] New: KVM instruction emulation breaks LOCK instruction atomicity when CMPXCHG fails bugzilla-daemon
  2023-01-03 22:05 ` Sean Christopherson
@ 2023-01-03 22:05 ` bugzilla-daemon
  2023-01-03 22:38 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: bugzilla-daemon @ 2023-01-03 22:05 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216867

--- Comment #1 from Sean Christopherson (seanjc@google.com) ---
On Fri, Dec 30, 2022, bugzilla-daemon@kernel.org wrote:

> My code performs the following experiment repeatedly on 3 CPUs:
> 
> * Initially, "ptr" at address 0xb8000 (VGA memory mapped I/O) is set to 0
> * CPU 0 writes 0x12345678 to ptr, then increases counter "count0".
> * In an infinite loop, CPU 1 tries exchanges ptr with register EAX (contains
> 0)
> using the XCHG instruction. If CPU 1 sees 0x12345678, it increases counter
> "count1".
> * CPU 2's behavior is similar to CPU 1, except it increases counter "count2"
> when it sees 0x12345678.
> 
> Ideally, after each experiment there should always be count1 + count2 =
> count0.
> However, in KVM, there may be count1 + count2 > count0. This because CPU 0
> writes 0x12345678 to ptr once, but CPU 1 and CPU 2 both get 0x12345678 in
> XCHG.
> Note that XCHG instruction always implements the locking protocol.
> 
> There is also a deadlock after running the experiment a few times. However I
> am
> not trying to explain it for now.

Is the suspect deadlock in userspace, the guest, or in the host kernel?

> Guessed cause:
> 
> I guess that KVM emulates the XCHG instruction that accesses 0xb8000. The
> call
> stack should be:
> 
> ...
>  x86_emulate_instruction (arch/x86/kvm/x86.c)
>   x86_emulate_insn (arch/x86/kvm/emulate.c)
>    writeback (arch/x86/kvm/emulate.c)
>     segmented_cmpxchg (arch/x86/kvm/emulate.c)
>      emulator_cmpxchg_emulated (arch/x86/kvm/x86.c, ->cmpxchg_emulated)
>       emulator_try_cmpxchg_user (arch/x86/kvm/x86.c)
>        ...
>         CMPXCHG instruction
> 
> Suppose CPU 2 wants to write 0 to ptr using writeback(), and expecting ptr to
> already contain 0x13245678. However, CPU 1 changes the content of ptr to 0.
> So
> * The CMPXCHG instruction fails (clears ZF).
> * emulator_try_cmpxchg_user returns 1.
> * emulator_cmpxchg_emulated() returns X86EMUL_CMPXCHG_FAILED.
> * segmented_cmpxchg() returns X86EMUL_CMPXCHG_FAILED.
> * writeback() returns X86EMUL_CMPXCHG_FAILED.
> * x86_emulate_insn() returns EMULATION_OK.
> 
> Thus, I think the root cause of this bug is that x86_emulate_insn() ignores
> the
> X86EMUL_CMPXCHG_FAILED error. The correct behavior should be retrying the
> emulation using the updated value (similar to load-linked/store-conditional).

KVM does retry the emulation, albeit in a very roundabout and non-robust way.
On X86EMUL_CMPXCHG_FAILED, x86_emulate_insn() skips the EIP update and doesn't
writeback GPRs.  x86_emulate_instruction() is flawed and emulates single-step,
but
the "eip" written should be the original RIP, i.e. shouldn't advance past the
instructions being emulated.  The single-step mess should be fixed, but I doubt
that's the root cause here.

Is there a memslot for 0xb8000?  I assume not since KVM is emulating (have you
actually verified that, e.g. with tracepoints?).  KVM's ABI doesn't support
atomic MMIO operations, i.e. if there's no memslot, KVM will effectively drop
the LOCK semantics.  If that's indeed what's happening, you should see

  kvm: emulating exchange as write

in the host dmesg (just once though).

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug 216867] KVM instruction emulation breaks LOCK instruction atomicity when CMPXCHG fails
  2022-12-30  3:02 [Bug 216867] New: KVM instruction emulation breaks LOCK instruction atomicity when CMPXCHG fails bugzilla-daemon
  2023-01-03 22:05 ` Sean Christopherson
  2023-01-03 22:05 ` [Bug 216867] " bugzilla-daemon
@ 2023-01-03 22:38 ` bugzilla-daemon
  2023-01-03 22:39 ` bugzilla-daemon
  2023-01-03 22:48 ` bugzilla-daemon
  4 siblings, 0 replies; 6+ messages in thread
From: bugzilla-daemon @ 2023-01-03 22:38 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216867

--- Comment #2 from Eric Li (ercli@ucdavis.edu) ---
在 2023-01-03星期二的 22:05 +0000,bugzilla-daemon@kernel.org写道:
> https://bugzilla.kernel.org/show_bug.cgi?id=216867
> 
> --- Comment #1 from Sean Christopherson (seanjc@google.com) ---
> On Fri, Dec 30, 2022, bugzilla-daemon@kernel.org wrote:
> 
> > My code performs the following experiment repeatedly on 3 CPUs:
> > 
> > * Initially, "ptr" at address 0xb8000 (VGA memory mapped I/O) is
> > set to 0
> > * CPU 0 writes 0x12345678 to ptr, then increases counter "count0".
> > * In an infinite loop, CPU 1 tries exchanges ptr with register EAX
> > (contains
> > 0)
> > using the XCHG instruction. If CPU 1 sees 0x12345678, it increases
> > counter
> > "count1".
> > * CPU 2's behavior is similar to CPU 1, except it increases counter
> > "count2"
> > when it sees 0x12345678.
> > 
> > Ideally, after each experiment there should always be count1 +
> > count2 =
> > count0.
> > However, in KVM, there may be count1 + count2 > count0. This
> > because CPU 0
> > writes 0x12345678 to ptr once, but CPU 1 and CPU 2 both get
> > 0x12345678 in
> > XCHG.
> > Note that XCHG instruction always implements the locking protocol.
> > 
> > There is also a deadlock after running the experiment a few times.
> > However I
> > am
> > not trying to explain it for now.
> 
> Is the suspect deadlock in userspace, the guest, or in the host
> kernel?
> 

The deadlock happens in the guest. It is due to how my experiment is
implemented. It is not directly related to KVM.

> > Guessed cause:
> > 
> > I guess that KVM emulates the XCHG instruction that accesses
> > 0xb8000. The
> > call
> > stack should be:
> > 
> > ...
> >  x86_emulate_instruction (arch/x86/kvm/x86.c)
> >   x86_emulate_insn (arch/x86/kvm/emulate.c)
> >    writeback (arch/x86/kvm/emulate.c)
> >     segmented_cmpxchg (arch/x86/kvm/emulate.c)
> >      emulator_cmpxchg_emulated (arch/x86/kvm/x86.c, -
> > >cmpxchg_emulated)
> >       emulator_try_cmpxchg_user (arch/x86/kvm/x86.c)
> >        ...
> >         CMPXCHG instruction
> > 
> > Suppose CPU 2 wants to write 0 to ptr using writeback(), and
> > expecting ptr to
> > already contain 0x13245678. However, CPU 1 changes the content of
> > ptr to 0.
> > So
> > * The CMPXCHG instruction fails (clears ZF).
> > * emulator_try_cmpxchg_user returns 1.
> > * emulator_cmpxchg_emulated() returns X86EMUL_CMPXCHG_FAILED.
> > * segmented_cmpxchg() returns X86EMUL_CMPXCHG_FAILED.
> > * writeback() returns X86EMUL_CMPXCHG_FAILED.
> > * x86_emulate_insn() returns EMULATION_OK.
> > 
> > Thus, I think the root cause of this bug is that x86_emulate_insn()
> > ignores
> > the
> > X86EMUL_CMPXCHG_FAILED error. The correct behavior should be
> > retrying the
> > emulation using the updated value (similar to load-linked/store-
> > conditional).
> 
> KVM does retry the emulation, albeit in a very roundabout and non-
> robust way.
> On X86EMUL_CMPXCHG_FAILED, x86_emulate_insn() skips the EIP update
> and doesn't
> writeback GPRs.  x86_emulate_instruction() is flawed and emulates
> single-step,
> but
> the "eip" written should be the original RIP, i.e. shouldn't advance
> past the
> instructions being emulated.  The single-step mess should be fixed,
> but I doubt
> that's the root cause here.
> 

I see, thanks for the explanation. Now the retrying code looks correct
to me (though I agree that the code could have been written in a better
way).

> Is there a memslot for 0xb8000?  I assume not since KVM is emulating
> (have you
> actually verified that, e.g. with tracepoints?).  KVM's ABI doesn't
> support
> atomic MMIO operations, i.e. if there's no memslot, KVM will
> effectively drop
> the LOCK semantics.  If that's indeed what's happening, you should
> see
> 
>   kvm: emulating exchange as write
> 
> in the host dmesg (just once though).
> 

You are right. I see "kvm: emulating exchange as write" when I run the
guest I wrote. Looks like this is the check that causes KVM to drop
LOCK on VGA MMIO:

>       gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, NULL);
>
>       if (gpa == INVALID_GPA ||
>           (gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
>               goto emul_write;

Closing this bug since LOCK on MMIO is not supported by KVM's ABI.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug 216867] KVM instruction emulation breaks LOCK instruction atomicity when CMPXCHG fails
  2022-12-30  3:02 [Bug 216867] New: KVM instruction emulation breaks LOCK instruction atomicity when CMPXCHG fails bugzilla-daemon
                   ` (2 preceding siblings ...)
  2023-01-03 22:38 ` bugzilla-daemon
@ 2023-01-03 22:39 ` bugzilla-daemon
  2023-01-03 22:48 ` bugzilla-daemon
  4 siblings, 0 replies; 6+ messages in thread
From: bugzilla-daemon @ 2023-01-03 22:39 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216867

Eric Li (ercli@ucdavis.edu) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |WILL_NOT_FIX

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug 216867] KVM instruction emulation breaks LOCK instruction atomicity when CMPXCHG fails
  2022-12-30  3:02 [Bug 216867] New: KVM instruction emulation breaks LOCK instruction atomicity when CMPXCHG fails bugzilla-daemon
                   ` (3 preceding siblings ...)
  2023-01-03 22:39 ` bugzilla-daemon
@ 2023-01-03 22:48 ` bugzilla-daemon
  4 siblings, 0 replies; 6+ messages in thread
From: bugzilla-daemon @ 2023-01-03 22:48 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216867

--- Comment #3 from Eric Li (ercli@ucdavis.edu) ---
Just FYI: VMware Workstation Player 16.2.4 has the same behavior, which is
reported in
https://communities.vmware.com/t5/VMware-Workstation-Player/LOCK-instruction-atomicity-broken-on-VGA-memory-mapped-IO/m-p/2946505#M39982

I tested on the following hypervisors / emulators, and do not see this bug:
* Virtual Box (version 6.1.40)
* QEMU TCG (version 5.2.0)
* Hyper-V

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-01-03 22:49 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-30  3:02 [Bug 216867] New: KVM instruction emulation breaks LOCK instruction atomicity when CMPXCHG fails bugzilla-daemon
2023-01-03 22:05 ` Sean Christopherson
2023-01-03 22:05 ` [Bug 216867] " bugzilla-daemon
2023-01-03 22:38 ` bugzilla-daemon
2023-01-03 22:39 ` bugzilla-daemon
2023-01-03 22:48 ` bugzilla-daemon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.