* [Bug 216177] New: kvm-unit-tests vmx has about 60% of failure chance
@ 2022-06-27  2:17 bugzilla-daemon
  2022-06-28  0:28 ` [Bug 216177] " bugzilla-daemon
                   ` (12 more replies)
  0 siblings, 13 replies; 19+ messages in thread
From: bugzilla-daemon @ 2022-06-27  2:17 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216177

            Bug ID: 216177
           Summary: kvm-unit-tests vmx has about 60% of failure chance
           Product: Virtualization
           Version: unspecified
    Kernel Version: 5.19-rc1
          Hardware: Intel
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: kvm
          Assignee: virtualization_kvm@kernel-bugs.osdl.org
          Reporter: lixiao.yang@intel.com
        Regression: No

Created attachment 301281
  --> https://bugzilla.kernel.org/attachment.cgi?id=301281&action=edit
vmx failure log

Environment:
CPU Architecture: x86_64
Host OS: Red Hat Enterprise Linux 8.4 (Ootpa)
Host kernel: 5.19.0-rc1
gcc: gcc version 8.4.1
Host kernel source: https://git.kernel.org/pub/scm/virt/kvm/kvm.git
Branch: next
Commit: 4b88b1a518b337de1252b8180519ca4c00015c9e

Qemu source: https://git.qemu.org/git/qemu.git
Branch: master
Commit: 40d522490714b65e0856444277db6c14c5cc3796

kvm-unit-tests source: https://gitlab.com/kvm-unit-tests/kvm-unit-tests.git
Branch: master
Commit: ca85dda2671e88d34acfbca6de48a9ab32b1810d

Bug Detailed Description:
The kvm-unit-tests vmx test fails about 60% of the time. In my case, it
failed in 6 out of 10 runs.

Reproducing Steps:
rmmod kvm_intel
modprobe kvm_intel nested=Y
git clone https://gitlab.com/kvm-unit-tests/kvm-unit-tests.git
cd kvm-unit-tests
./configure
make standalone
cd tests
./vmx -cpu host

Actual Result:
...
SUMMARY: 430101 tests, 1 unexpected failures, 2 expected failures, 4 skipped
FAIL vmx (430101 tests, 1 unexpected failures, 2 expected failures, 4 skipped)

Expected Result:
...
SUMMARY: 430101 tests, 2 expected failures, 4 skipped
PASS vmx (430101 tests, 2 expected failures, 4 skipped)
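
A minimal sketch of how the 6-in-10 failure rate above could be measured
automatically. It assumes the standalone script prints its final PASS/FAIL
status line to stdout without color codes, as in the output shown above. The
helper names (run_status, failure_rate, repeat_test) are illustrative and are
not part of kvm-unit-tests.

```python
import re
import subprocess

# Matches the final status line of a standalone test script, e.g.
# "FAIL vmx (430101 tests, 1 unexpected failures, ...)".
STATUS_RE = re.compile(r"^(PASS|FAIL|SKIP)\s+(\S+)", re.MULTILINE)


def run_status(output: str) -> str:
    """Return the last PASS/FAIL/SKIP status found in the test output."""
    matches = STATUS_RE.findall(output)
    return matches[-1][0] if matches else "UNKNOWN"


def failure_rate(statuses) -> float:
    """Fraction of runs whose status is FAIL."""
    statuses = list(statuses)
    if not statuses:
        return 0.0
    return sum(s == "FAIL" for s in statuses) / len(statuses)


def repeat_test(cmd=("./vmx", "-cpu", "host"), runs=10):
    """Run the standalone test repeatedly and collect one status per run."""
    statuses = []
    for _ in range(runs):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        statuses.append(run_status(proc.stdout))
    return statuses
```

Run from the tests directory, failure_rate(repeat_test(runs=10)) would return
0.6 for a result like the one reported here.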

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 216177] kvm-unit-tests vmx has about 60% of failure chance
  2022-06-27  2:17 [Bug 216177] New: kvm-unit-tests vmx has about 60% of failure chance bugzilla-daemon
@ 2022-06-28  0:28 ` bugzilla-daemon
  2022-06-28  0:37   ` Nadav Amit
  2022-06-28  0:37 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 19+ messages in thread
From: bugzilla-daemon @ 2022-06-28  0:28 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216177

Sean Christopherson (seanjc@google.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |seanjc@google.com

--- Comment #1 from Sean Christopherson (seanjc@google.com) ---
It's vmx_preemption_timer_expiry_test, which is known to be flaky (though IIRC
it's KVM that's at fault).

Test suite: vmx_preemption_timer_expiry_test
FAIL: Last stored guest TSC (28067103426) < TSC deadline (28067086048)


* Re: [Bug 216177] kvm-unit-tests vmx has about 60% of failure chance
  2022-06-28  0:28 ` [Bug 216177] " bugzilla-daemon
@ 2022-06-28  0:37   ` Nadav Amit
  0 siblings, 0 replies; 19+ messages in thread
From: Nadav Amit @ 2022-06-28  0:37 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: kvm



> On Jun 27, 2022, at 5:28 PM, bugzilla-daemon@kernel.org wrote:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=216177
> 
> Sean Christopherson (seanjc@google.com) changed:
> 
>           What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                 CC|                            |seanjc@google.com
> 
> --- Comment #1 from Sean Christopherson (seanjc@google.com) ---
> It's vmx_preemption_timer_expiry_test, which is known to be flaky (though IIRC
> it's KVM that's at fault).
> 
> Test suite: vmx_preemption_timer_expiry_test
> FAIL: Last stored guest TSC (28067103426) < TSC deadline (28067086048)

For the record:

https://lore.kernel.org/kvm/D121A03E-6861-4736-8070-5D1E4FEE1D32@gmail.com/


* [Bug 216177] kvm-unit-tests vmx has about 60% of failure chance
  2022-06-27  2:17 [Bug 216177] New: kvm-unit-tests vmx has about 60% of failure chance bugzilla-daemon
  2022-06-28  0:28 ` [Bug 216177] " bugzilla-daemon
@ 2022-06-28  0:37 ` bugzilla-daemon
  2022-06-28  1:19 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: bugzilla-daemon @ 2022-06-28  0:37 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216177

--- Comment #2 from Nadav Amit (nadav.amit@gmail.com) ---
> On Jun 27, 2022, at 5:28 PM, bugzilla-daemon@kernel.org wrote:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=216177
> 
> Sean Christopherson (seanjc@google.com) changed:
> 
>           What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                 CC|                            |seanjc@google.com
> 
> --- Comment #1 from Sean Christopherson (seanjc@google.com) ---
> It's vmx_preemption_timer_expiry_test, which is known to be flaky (though IIRC
> it's KVM that's at fault).
> 
> Test suite: vmx_preemption_timer_expiry_test
> FAIL: Last stored guest TSC (28067103426) < TSC deadline (28067086048)

For the record:

https://lore.kernel.org/kvm/D121A03E-6861-4736-8070-5D1E4FEE1D32@gmail.com/


* [Bug 216177] kvm-unit-tests vmx has about 60% of failure chance
  2022-06-27  2:17 [Bug 216177] New: kvm-unit-tests vmx has about 60% of failure chance bugzilla-daemon
  2022-06-28  0:28 ` [Bug 216177] " bugzilla-daemon
  2022-06-28  0:37 ` bugzilla-daemon
@ 2022-06-28  1:19 ` bugzilla-daemon
  2022-06-28  1:42   ` Nadav Amit
  2022-06-28  1:30 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 19+ messages in thread
From: bugzilla-daemon @ 2022-06-28  1:19 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216177

--- Comment #3 from Yang Lixiao (lixiao.yang@intel.com) ---
(In reply to Nadav Amit from comment #2)
> > On Jun 27, 2022, at 5:28 PM, bugzilla-daemon@kernel.org wrote:
> > 
> > https://bugzilla.kernel.org/show_bug.cgi?id=216177
> > 
> > Sean Christopherson (seanjc@google.com) changed:
> > 
> >           What    |Removed                     |Added
> > ----------------------------------------------------------------------------
> >                 CC|                            |seanjc@google.com
> > 
> > --- Comment #1 from Sean Christopherson (seanjc@google.com) ---
> > It's vmx_preemption_timer_expiry_test, which is known to be flaky (though IIRC
> > it's KVM that's at fault).
> > 
> > Test suite: vmx_preemption_timer_expiry_test
> > FAIL: Last stored guest TSC (28067103426) < TSC deadline (28067086048)
> 
> For the record:
> 
> https://lore.kernel.org/kvm/D121A03E-6861-4736-8070-5D1E4FEE1D32@gmail.com/

Thanks for your reply. So this is a KVM bug, and you have sent a patch to kvm
to fix this bug, right?


* [Bug 216177] kvm-unit-tests vmx has about 60% of failure chance
  2022-06-27  2:17 [Bug 216177] New: kvm-unit-tests vmx has about 60% of failure chance bugzilla-daemon
                   ` (2 preceding siblings ...)
  2022-06-28  1:19 ` bugzilla-daemon
@ 2022-06-28  1:30 ` bugzilla-daemon
  2022-06-28  1:42 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: bugzilla-daemon @ 2022-06-28  1:30 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216177

--- Comment #4 from Sean Christopherson (seanjc@google.com) ---
On Tue, Jun 28, 2022, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216177
> 
> --- Comment #3 from Yang Lixiao (lixiao.yang@intel.com) ---
> (In reply to Nadav Amit from comment #2)
> > > On Jun 27, 2022, at 5:28 PM, bugzilla-daemon@kernel.org wrote:
> > > 
> > > https://bugzilla.kernel.org/show_bug.cgi?id=216177
> > > 
> > > Sean Christopherson (seanjc@google.com) changed:
> > > 
> > >           What    |Removed                     |Added
> > > ----------------------------------------------------------------------------
> > >                 CC|                            |seanjc@google.com
> > > 
> > > --- Comment #1 from Sean Christopherson (seanjc@google.com) ---
> > > It's vmx_preemption_timer_expiry_test, which is known to be flaky (though IIRC
> > > it's KVM that's at fault).
> > > 
> > > Test suite: vmx_preemption_timer_expiry_test
> > > FAIL: Last stored guest TSC (28067103426) < TSC deadline (28067086048)
> > 
> > For the record:
> > 
> > https://lore.kernel.org/kvm/D121A03E-6861-4736-8070-5D1E4FEE1D32@gmail.com/
> 
> Thanks for your reply. So this is a KVM bug, and you have sent a patch to kvm
> to fix this bug, right?

No, AFAIK no one has posted a fix.  If it's the KVM issue I'm thinking of, the
fix is non-trivial.  It'd require scheduling a timer in L0 with a deadline shorter
than what L1 requests when emulating the VMX timer, and then busy waiting in L0 if
the host timer fires early.  KVM already does this for e.g. L1's TSC deadline timer.
That code would need to be adapted for the nested VMX preemption timer.
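
The "arm the timer early, then busy wait" idea described above can be
illustrated with a user-space analogy. This is only a sketch of the general
technique, not KVM code: wait_until and advance_ns are hypothetical names, and
time.sleep() stands in for the host hrtimer.

```python
import time


def wait_until(deadline_ns: int, advance_ns: int = 200_000) -> int:
    """Sleep until shortly before deadline_ns, then spin until it is reached.

    Returns the monotonic timestamp at which the deadline was observed.
    """
    # Arm the coarse timer for a point *before* the requested deadline.
    early_wakeup_ns = deadline_ns - advance_ns
    remaining_ns = early_wakeup_ns - time.monotonic_ns()
    if remaining_ns > 0:
        time.sleep(remaining_ns / 1e9)

    # The coarse timer normally fires early relative to the real deadline;
    # burn the remaining time so the event is never delivered before it.
    while (now := time.monotonic_ns()) < deadline_ns:
        pass
    return now
```

The margin trades CPU time spent spinning against the risk that the coarse
timer itself wakes up after the deadline; choosing it is the hard part.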


* Re: [Bug 216177] kvm-unit-tests vmx has about 60% of failure chance
  2022-06-28  1:19 ` bugzilla-daemon
@ 2022-06-28  1:42   ` Nadav Amit
  2022-06-28  4:39     ` Jim Mattson
  0 siblings, 1 reply; 19+ messages in thread
From: Nadav Amit @ 2022-06-28  1:42 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: kvm



> On Jun 27, 2022, at 6:19 PM, bugzilla-daemon@kernel.org wrote:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=216177
> 
> --- Comment #3 from Yang Lixiao (lixiao.yang@intel.com) ---
> (In reply to Nadav Amit from comment #2)
>>> On Jun 27, 2022, at 5:28 PM, bugzilla-daemon@kernel.org wrote:
>>> 
>>> https://bugzilla.kernel.org/show_bug.cgi?id=216177
>>> 
>>> Sean Christopherson (seanjc@google.com) changed:
>>> 
>>>          What    |Removed                     |Added
>>> ----------------------------------------------------------------------------
>>>                CC|                            |seanjc@google.com
>>> 
>>> --- Comment #1 from Sean Christopherson (seanjc@google.com) ---
>>> It's vmx_preemption_timer_expiry_test, which is known to be flaky (though IIRC
>>> it's KVM that's at fault).
>>> 
>>> Test suite: vmx_preemption_timer_expiry_test
>>> FAIL: Last stored guest TSC (28067103426) < TSC deadline (28067086048)
>> 
>> For the record:
>> 
>> https://lore.kernel.org/kvm/D121A03E-6861-4736-8070-5D1E4FEE1D32@gmail.com/
> 
> Thanks for your reply. So this is a KVM bug, and you have sent a patch to kvm
> to fix this bug, right?

As I noted, at some point I did not manage to reproduce the failure.

The failure on bare-metal that I experienced hints that this is either a test
bug or (much less likely) a hardware bug. But I do not think it is likely to be
a KVM bug.



* [Bug 216177] kvm-unit-tests vmx has about 60% of failure chance
  2022-06-27  2:17 [Bug 216177] New: kvm-unit-tests vmx has about 60% of failure chance bugzilla-daemon
                   ` (3 preceding siblings ...)
  2022-06-28  1:30 ` bugzilla-daemon
@ 2022-06-28  1:42 ` bugzilla-daemon
  2022-06-28  1:48 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: bugzilla-daemon @ 2022-06-28  1:42 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216177

--- Comment #5 from Nadav Amit (nadav.amit@gmail.com) ---
> On Jun 27, 2022, at 6:19 PM, bugzilla-daemon@kernel.org wrote:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=216177
> 
> --- Comment #3 from Yang Lixiao (lixiao.yang@intel.com) ---
> (In reply to Nadav Amit from comment #2)
>>> On Jun 27, 2022, at 5:28 PM, bugzilla-daemon@kernel.org wrote:
>>> 
>>> https://bugzilla.kernel.org/show_bug.cgi?id=216177
>>> 
>>> Sean Christopherson (seanjc@google.com) changed:
>>> 
>>>          What    |Removed                     |Added
>>> ----------------------------------------------------------------------------
>>>                CC|                            |seanjc@google.com
>>> 
>>> --- Comment #1 from Sean Christopherson (seanjc@google.com) ---
>>> It's vmx_preemption_timer_expiry_test, which is known to be flaky (though IIRC
>>> it's KVM that's at fault).
>>> 
>>> Test suite: vmx_preemption_timer_expiry_test
>>> FAIL: Last stored guest TSC (28067103426) < TSC deadline (28067086048)
>> 
>> For the record:
>> 
>> https://lore.kernel.org/kvm/D121A03E-6861-4736-8070-5D1E4FEE1D32@gmail.com/
> 
> Thanks for your reply. So this is a KVM bug, and you have sent a patch to kvm
> to fix this bug, right?

As I noted, at some point I did not manage to reproduce the failure.

The failure on bare-metal that I experienced hints that this is either a test
bug or (much less likely) a hardware bug. But I do not think it is likely to be
a KVM bug.


* [Bug 216177] kvm-unit-tests vmx has about 60% of failure chance
  2022-06-27  2:17 [Bug 216177] New: kvm-unit-tests vmx has about 60% of failure chance bugzilla-daemon
                   ` (4 preceding siblings ...)
  2022-06-28  1:42 ` bugzilla-daemon
@ 2022-06-28  1:48 ` bugzilla-daemon
  2022-06-28  2:19 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: bugzilla-daemon @ 2022-06-28  1:48 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216177

--- Comment #6 from Sean Christopherson (seanjc@google.com) ---
On Tue, Jun 28, 2022, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216177
> 
> --- Comment #5 from Nadav Amit (nadav.amit@gmail.com) ---
> > On Jun 27, 2022, at 6:19 PM, bugzilla-daemon@kernel.org wrote:
> > 
> > https://bugzilla.kernel.org/show_bug.cgi?id=216177
> > 
> > --- Comment #3 from Yang Lixiao (lixiao.yang@intel.com) ---
> > (In reply to Nadav Amit from comment #2)
> >>> On Jun 27, 2022, at 5:28 PM, bugzilla-daemon@kernel.org wrote:
> >>> 
> >>> https://bugzilla.kernel.org/show_bug.cgi?id=216177
> >>> 
> >>> Sean Christopherson (seanjc@google.com) changed:
> >>> 
> >>>          What    |Removed                     |Added
> >>> ----------------------------------------------------------------------------
> >>>                CC|                            |seanjc@google.com
> >>> 
> >>> --- Comment #1 from Sean Christopherson (seanjc@google.com) ---
> >>> It's vmx_preemption_timer_expiry_test, which is known to be flaky (though IIRC
> >>> it's KVM that's at fault).
> >>> 
> >>> Test suite: vmx_preemption_timer_expiry_test
> >>> FAIL: Last stored guest TSC (28067103426) < TSC deadline (28067086048)
> >> 
> >> For the record:
> >> 
> >> https://lore.kernel.org/kvm/D121A03E-6861-4736-8070-5D1E4FEE1D32@gmail.com/
> > 
> > Thanks for your reply. So this is a KVM bug, and you have sent a patch to kvm
> > to fix this bug, right?
> 
> As I noted, at some point I did not manage to reproduce the failure.
> 
> The failure on bare-metal that I experienced hints that this is either a test
> bug or (much less likely) a hardware bug. But I do not think it is likely to be
> a KVM bug.

Oooh, your failure was on bare-metal.  I didn't grok that.  Though it could be
both a hardware bug and a KVM bug :-)


* [Bug 216177] kvm-unit-tests vmx has about 60% of failure chance
  2022-06-27  2:17 [Bug 216177] New: kvm-unit-tests vmx has about 60% of failure chance bugzilla-daemon
                   ` (5 preceding siblings ...)
  2022-06-28  1:48 ` bugzilla-daemon
@ 2022-06-28  2:19 ` bugzilla-daemon
  2022-06-28  4:39 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: bugzilla-daemon @ 2022-06-28  2:19 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216177

--- Comment #7 from Yang Lixiao (lixiao.yang@intel.com) ---
(In reply to Sean Christopherson from comment #6)
> On Tue, Jun 28, 2022, bugzilla-daemon@kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=216177
> > 
> > --- Comment #5 from Nadav Amit (nadav.amit@gmail.com) ---
> > > On Jun 27, 2022, at 6:19 PM, bugzilla-daemon@kernel.org wrote:
> > > 
> > > https://bugzilla.kernel.org/show_bug.cgi?id=216177
> > > 
> > > --- Comment #3 from Yang Lixiao (lixiao.yang@intel.com) ---
> > > (In reply to Nadav Amit from comment #2)
> > >>> On Jun 27, 2022, at 5:28 PM, bugzilla-daemon@kernel.org wrote:
> > >>> 
> > >>> https://bugzilla.kernel.org/show_bug.cgi?id=216177
> > >>> 
> > >>> Sean Christopherson (seanjc@google.com) changed:
> > >>> 
> > >>>          What    |Removed                     |Added
> > >>> ----------------------------------------------------------------------------
> > >>>                CC|                            |seanjc@google.com
> > >>> 
> > >>> --- Comment #1 from Sean Christopherson (seanjc@google.com) ---
> > >>> It's vmx_preemption_timer_expiry_test, which is known to be flaky (though IIRC
> > >>> it's KVM that's at fault).
> > >>> 
> > >>> Test suite: vmx_preemption_timer_expiry_test
> > >>> FAIL: Last stored guest TSC (28067103426) < TSC deadline (28067086048)
> > >> 
> > >> For the record:
> > >> 
> > >> https://lore.kernel.org/kvm/D121A03E-6861-4736-8070-5D1E4FEE1D32@gmail.com/
> > > 
> > > Thanks for your reply. So this is a KVM bug, and you have sent a patch to kvm
> > > to fix this bug, right?
> > 
> > As I noted, at some point I did not manage to reproduce the failure.
> > 
> > The failure on bare-metal that I experienced hints that this is either a test
> > bug or (much less likely) a hardware bug. But I do not think it is likely to be
> > a KVM bug.
> 
> Oooh, your failure was on bare-metal.  I didn't grok that.  Though it could be
> both a hardware bug and a KVM bug :-)

I ran kvm-unit-tests vmx on bare metal (not in a VM), and this failure
occurred on two different Ice Lake machines and one Cooper Lake machine.


* Re: [Bug 216177] kvm-unit-tests vmx has about 60% of failure chance
  2022-06-28  1:42   ` Nadav Amit
@ 2022-06-28  4:39     ` Jim Mattson
  0 siblings, 0 replies; 19+ messages in thread
From: Jim Mattson @ 2022-06-28  4:39 UTC (permalink / raw)
  To: Nadav Amit; +Cc: bugzilla-daemon, kvm

On Mon, Jun 27, 2022 at 8:54 PM Nadav Amit <nadav.amit@gmail.com> wrote:

> The failure on bare-metal that I experienced hints that this is either a test
> bug or (much less likely) a hardware bug. But I do not think it is likely to be
> a KVM bug.

KVM does not use the VMX-preemption timer to virtualize L1's
VMX-preemption timer (and that is why KVM is broken). The KVM bug was
introduced with commit f4124500c2c1 ("KVM: nVMX: Fully emulate
preemption timer"), which uses an L0 CLOCK_MONOTONIC hrtimer to
emulate L1's VMX-preemption timer. There are many reasons that this
cannot possibly work, not the least of which is that the
CLOCK_MONOTONIC timer is subject to time slew.

Currently, KVM reserves L0's VMX-preemption timer for emulating L1's
APIC timer. Better would be to determine whether L1's APIC timer or
L1's VMX-preemption timer is scheduled to fire first, and use L0's
VMX-preemption timer to trigger a VM-exit on the nearest alarm.
Alternatively, as Sean noted, one could perhaps arrange for the
hrtimer to fire early enough that it won't fire late, but I don't
really think that's a viable solution.
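
The "nearest alarm" selection proposed above can be sketched as follows. This
is an illustration only, not KVM code: the function name and the convention
that None means "timer not armed" are assumptions, and both deadlines are
taken to be expressed in the same TSC units.

```python
from typing import Optional, Tuple


def nearest_l1_timer(
    apic_deadline_tsc: Optional[int],
    vmx_timer_deadline_tsc: Optional[int],
) -> Optional[Tuple[str, int]]:
    """Pick whichever of L1's two timers is scheduled to fire first.

    Returns (source, deadline), or None if neither timer is armed. The
    result is the deadline L0's VMX-preemption timer would be programmed
    with in order to force a VM-exit at the right moment.
    """
    candidates = []
    if apic_deadline_tsc is not None:
        candidates.append(("apic_timer", apic_deadline_tsc))
    if vmx_timer_deadline_tsc is not None:
        candidates.append(("vmx_preemption_timer", vmx_timer_deadline_tsc))
    if not candidates:
        return None
    # On a tie, min() keeps the first candidate, i.e. the APIC timer.
    return min(candidates, key=lambda c: c[1])
```

After the resulting VM-exit, L0 would still have to work out which of the two
events became due and re-arm the timer for the other one.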

I can't explain the bare-metal failures, but I will note that the test
assumes the default treatment of SMIs and SMM. The test will likely
fail with the dual-monitor treatment of SMIs and SMM. Aside from the
older CPUs with broken VMX-preemption timers, I don't know of any
relevant errata.

Of course, it is possible that the test itself is buggy. For the
person who reported bare-metal failures on Ice Lake and Cooper Lake,
how long was the test in VMX non-root mode past the VMX-preemption
timer deadline?


* [Bug 216177] kvm-unit-tests vmx has about 60% of failure chance
  2022-06-27  2:17 [Bug 216177] New: kvm-unit-tests vmx has about 60% of failure chance bugzilla-daemon
                   ` (6 preceding siblings ...)
  2022-06-28  2:19 ` bugzilla-daemon
@ 2022-06-28  4:39 ` bugzilla-daemon
  2022-06-28  6:11 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: bugzilla-daemon @ 2022-06-28  4:39 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216177

--- Comment #8 from Jim Mattson (jmattson@google.com) ---
On Mon, Jun 27, 2022 at 8:54 PM Nadav Amit <nadav.amit@gmail.com> wrote:

> The failure on bare-metal that I experienced hints that this is either a test
> bug or (much less likely) a hardware bug. But I do not think it is likely to be
> a KVM bug.

KVM does not use the VMX-preemption timer to virtualize L1's
VMX-preemption timer (and that is why KVM is broken). The KVM bug was
introduced with commit f4124500c2c1 ("KVM: nVMX: Fully emulate
preemption timer"), which uses an L0 CLOCK_MONOTONIC hrtimer to
emulate L1's VMX-preemption timer. There are many reasons that this
cannot possibly work, not the least of which is that the
CLOCK_MONOTONIC timer is subject to time slew.

Currently, KVM reserves L0's VMX-preemption timer for emulating L1's
APIC timer. Better would be to determine whether L1's APIC timer or
L1's VMX-preemption timer is scheduled to fire first, and use L0's
VMX-preemption timer to trigger a VM-exit on the nearest alarm.
Alternatively, as Sean noted, one could perhaps arrange for the
hrtimer to fire early enough that it won't fire late, but I don't
really think that's a viable solution.

I can't explain the bare-metal failures, but I will note that the test
assumes the default treatment of SMIs and SMM. The test will likely
fail with the dual-monitor treatment of SMIs and SMM. Aside from the
older CPUs with broken VMX-preemption timers, I don't know of any
relevant errata.

Of course, it is possible that the test itself is buggy. For the
person who reported bare-metal failures on Ice Lake and Cooper Lake,
how long was the test in VMX non-root mode past the VMX-preemption
timer deadline?


* [Bug 216177] kvm-unit-tests vmx has about 60% of failure chance
  2022-06-27  2:17 [Bug 216177] New: kvm-unit-tests vmx has about 60% of failure chance bugzilla-daemon
                   ` (7 preceding siblings ...)
  2022-06-28  4:39 ` bugzilla-daemon
@ 2022-06-28  6:11 ` bugzilla-daemon
  2022-06-28 18:24   ` Jim Mattson
  2022-06-28 18:24 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 19+ messages in thread
From: bugzilla-daemon @ 2022-06-28  6:11 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216177

--- Comment #9 from Yang Lixiao (lixiao.yang@intel.com) ---
(In reply to Jim Mattson from comment #8)
> On Mon, Jun 27, 2022 at 8:54 PM Nadav Amit <nadav.amit@gmail.com> wrote:
> 
> > The failure on bare-metal that I experienced hints that this is either a test
> > bug or (much less likely) a hardware bug. But I do not think it is likely to be
> > a KVM bug.
> 
> KVM does not use the VMX-preemption timer to virtualize L1's
> VMX-preemption timer (and that is why KVM is broken). The KVM bug was
> introduced with commit f4124500c2c1 ("KVM: nVMX: Fully emulate
> preemption timer"), which uses an L0 CLOCK_MONOTONIC hrtimer to
> emulate L1's VMX-preemption timer. There are many reasons that this
> cannot possibly work, not the least of which is that the
> CLOCK_MONOTONIC timer is subject to time slew.
> 
> Currently, KVM reserves L0's VMX-preemption timer for emulating L1's
> APIC timer. Better would be to determine whether L1's APIC timer or
> L1's VMX-preemption timer is scheduled to fire first, and use L0's
> VMX-preemption timer to trigger a VM-exit on the nearest alarm.
> Alternatively, as Sean noted, one could perhaps arrange for the
> hrtimer to fire early enough that it won't fire late, but I don't
> really think that's a viable solution.
> 
> I can't explain the bare-metal failures, but I will note that the test
> assumes the default treatment of SMIs and SMM. The test will likely
> fail with the dual-monitor treatment of SMIs and SMM. Aside from the
> older CPUs with broken VMX-preemption timers, I don't know of any
> relevant errata.
> 
> Of course, it is possible that the test itself is buggy. For the
> person who reported bare-metal failures on Ice Lake and Cooper Lake,
> how long was the test in VMX non-root mode past the VMX-preemption
> timer deadline?

On the first Ice Lake:
Test suite: vmx_preemption_timer_expiry_test
FAIL: Last stored guest TSC (28067103426) < TSC deadline (28067086048)

On the second Ice Lake:
Test suite: vmx_preemption_timer_expiry_test
FAIL: Last stored guest TSC (27014488614) < TSC deadline (27014469152)

On Cooper Lake:
Test suite: vmx_preemption_timer_expiry_test
FAIL: Last stored guest TSC (29030585690) < TSC deadline (29030565024)
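
The FAIL lines print the condition the test expected to hold; in each case the
last stored guest TSC is actually past the deadline. A small sketch for
extracting the overrun in TSC cycles from such a line (the helper name
overrun_cycles is illustrative, not part of kvm-unit-tests):

```python
import re

# Matches the failure message format shown in the logs above.
FAIL_RE = re.compile(
    r"Last stored guest TSC \((\d+)\) < TSC deadline \((\d+)\)"
)


def overrun_cycles(line: str) -> int:
    """Return (last stored guest TSC - TSC deadline) for one FAIL line."""
    match = FAIL_RE.search(line)
    if match is None:
        raise ValueError("not a preemption-timer expiry failure line")
    last_tsc, deadline = (int(g) for g in match.groups())
    return last_tsc - deadline


logs = {
    "first Ice Lake": "FAIL: Last stored guest TSC (28067103426) < TSC deadline (28067086048)",
    "second Ice Lake": "FAIL: Last stored guest TSC (27014488614) < TSC deadline (27014469152)",
    "Cooper Lake": "FAIL: Last stored guest TSC (29030585690) < TSC deadline (29030565024)",
}
overruns = {host: overrun_cycles(line) for host, line in logs.items()}
# -> 17378, 19462 and 20666 cycles respectively
```

Converting these cycle counts to time would require each host's TSC
frequency, which is not given in the report.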


* Re: [Bug 216177] kvm-unit-tests vmx has about 60% of failure chance
  2022-06-28  6:11 ` bugzilla-daemon
@ 2022-06-28 18:24   ` Jim Mattson
  0 siblings, 0 replies; 19+ messages in thread
From: Jim Mattson @ 2022-06-28 18:24 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: kvm

On Mon, Jun 27, 2022 at 11:32 PM <bugzilla-daemon@kernel.org> wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=216177
>
> --- Comment #9 from Yang Lixiao (lixiao.yang@intel.com) ---
> (In reply to Jim Mattson from comment #8)
> > On Mon, Jun 27, 2022 at 8:54 PM Nadav Amit <nadav.amit@gmail.com> wrote:
> >
> > > The failure on bare-metal that I experienced hints that this is either a test
> > > bug or (much less likely) a hardware bug. But I do not think it is likely to be
> > > a KVM bug.
> >
> > KVM does not use the VMX-preemption timer to virtualize L1's
> > VMX-preemption timer (and that is why KVM is broken). The KVM bug was
> > introduced with commit f4124500c2c1 ("KVM: nVMX: Fully emulate
> > preemption timer"), which uses an L0 CLOCK_MONOTONIC hrtimer to
> > emulate L1's VMX-preemption timer. There are many reasons that this
> > cannot possibly work, not the least of which is that the
> > CLOCK_MONOTONIC timer is subject to time slew.
> >
> > Currently, KVM reserves L0's VMX-preemption timer for emulating L1's
> > APIC timer. Better would be to determine whether L1's APIC timer or
> > L1's VMX-preemption timer is scheduled to fire first, and use L0's
> > VMX-preemption timer to trigger a VM-exit on the nearest alarm.
> > Alternatively, as Sean noted, one could perhaps arrange for the
> > hrtimer to fire early enough that it won't fire late, but I don't
> > really think that's a viable solution.
> >
> > I can't explain the bare-metal failures, but I will note that the test
> > assumes the default treatment of SMIs and SMM. The test will likely
> > fail with the dual-monitor treatment of SMIs and SMM. Aside from the
> > older CPUs with broken VMX-preemption timers, I don't know of any
> > relevant errata.
> >
> > Of course, it is possible that the test itself is buggy. For the
> > person who reported bare-metal failures on Ice Lake and Cooper Lake,
> > how long was the test in VMX non-root mode past the VMX-preemption
> > timer deadline?
>
> On the first Ice lake:
> Test suite: vmx_preemption_timer_expiry_test
> FAIL: Last stored guest TSC (28067103426) < TSC deadline (28067086048)
>
> On the second Ice lake:
> Test suite: vmx_preemption_timer_expiry_test
> FAIL: Last stored guest TSC (27014488614) < TSC deadline (27014469152)
>
> On Cooper lake:
> Test suite: vmx_preemption_timer_expiry_test
> FAIL: Last stored guest TSC (29030585690) < TSC deadline (29030565024)

Wow! Those are *huge* overruns. What is the value of MSR 0x9B on these hosts?
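
MSR 0x9B is IA32_SMM_MONITOR_CTL, so the question presumably relates to the
earlier remark about the dual-monitor treatment of SMIs and SMM. A minimal
sketch for reading and decoding it on Linux, assuming the msr kernel module is
loaded and the script runs as root; the function and dictionary key names are
illustrative.

```python
import os
import struct

IA32_SMM_MONITOR_CTL = 0x9B


def read_msr(index: int, cpu: int = 0) -> int:
    """Read a 64-bit MSR through the Linux msr driver (/dev/cpu/N/msr)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The driver uses the file offset as the MSR index.
        return struct.unpack("<Q", os.pread(fd, 8, index))[0]
    finally:
        os.close(fd)


def decode_smm_monitor_ctl(value: int) -> dict:
    """Split IA32_SMM_MONITOR_CTL into its architecturally defined fields."""
    return {
        # Bit 0: valid bit, set when the dual-monitor treatment is configured.
        "valid": bool(value & 0x1),
        # Bit 2: controls SMI unblocking by VMXOFF.
        "smi_unblock_by_vmxoff_ctl": bool(value & 0x4),
        # Bits 31:12: physical base address of MSEG.
        "mseg_base": value & 0xFFFFF000,
    }
```

On a host, decode_smm_monitor_ctl(read_msr(IA32_SMM_MONITOR_CTL)) would show
whether the valid bit is set; a value of 0 means the dual-monitor treatment
has not been configured.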


* [Bug 216177] kvm-unit-tests vmx has about 60% of failure chance
  2022-06-27  2:17 [Bug 216177] New: kvm-unit-tests vmx has about 60% of failure chance bugzilla-daemon
                   ` (8 preceding siblings ...)
  2022-06-28  6:11 ` bugzilla-daemon
@ 2022-06-28 18:24 ` bugzilla-daemon
  2022-06-29  0:22 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 19+ messages in thread
From: bugzilla-daemon @ 2022-06-28 18:24 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216177

--- Comment #10 from Jim Mattson (jmattson@google.com) ---
On Mon, Jun 27, 2022 at 11:32 PM <bugzilla-daemon@kernel.org> wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=216177
>
> --- Comment #9 from Yang Lixiao (lixiao.yang@intel.com) ---
> (In reply to Jim Mattson from comment #8)
> > On Mon, Jun 27, 2022 at 8:54 PM Nadav Amit <nadav.amit@gmail.com> wrote:
> >
> > > The failure on bare-metal that I experienced hints that this is either a test
> > > bug or (much less likely) a hardware bug. But I do not think it is likely to be
> > > a KVM bug.
> >
> > KVM does not use the VMX-preemption timer to virtualize L1's
> > VMX-preemption timer (and that is why KVM is broken). The KVM bug was
> > introduced with commit f4124500c2c1 ("KVM: nVMX: Fully emulate
> > preemption timer"), which uses an L0 CLOCK_MONOTONIC hrtimer to
> > emulate L1's VMX-preemption timer. There are many reasons that this
> > cannot possibly work, not the least of which is that the
> > CLOCK_MONOTONIC timer is subject to time slew.
> >
> > Currently, KVM reserves L0's VMX-preemption timer for emulating L1's
> > APIC timer. Better would be to determine whether L1's APIC timer or
> > L1's VMX-preemption timer is scheduled to fire first, and use L0's
> > VMX-preemption timer to trigger a VM-exit on the nearest alarm.
> > Alternatively, as Sean noted, one could perhaps arrange for the
> > hrtimer to fire early enough that it won't fire late, but I don't
> > really think that's a viable solution.
> >
> > I can't explain the bare-metal failures, but I will note that the test
> > assumes the default treatment of SMIs and SMM. The test will likely
> > fail with the dual-monitor treatment of SMIs and SMM. Aside from the
> > older CPUs with broken VMX-preemption timers, I don't know of any
> > relevant errata.
> >
> > Of course, it is possible that the test itself is buggy. For the
> > person who reported bare-metal failures on Ice Lake and Cooper Lake,
> > how long was the test in VMX non-root mode past the VMX-preemption
> > timer deadline?
>
> On the first Ice lake:
> Test suite: vmx_preemption_timer_expiry_test
> FAIL: Last stored guest TSC (28067103426) < TSC deadline (28067086048)
>
> On the second Ice lake:
> Test suite: vmx_preemption_timer_expiry_test
> FAIL: Last stored guest TSC (27014488614) < TSC deadline (27014469152)
>
> On Cooper lake:
> Test suite: vmx_preemption_timer_expiry_test
> FAIL: Last stored guest TSC (29030585690) < TSC deadline (29030565024)

Wow! Those are *huge* overruns. What is the value of MSR 0x9B on these hosts?
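
For reference, the size of each overrun follows directly from the numbers
quoted above. A minimal sketch of the arithmetic, in Python; the dictionary
and function names are illustrative only and are not part of kvm-unit-tests:

```python
# TSC readings copied from the three failure reports quoted above:
# (last stored guest TSC, TSC deadline).
REPORTS = {
    "Ice Lake #1": (28067103426, 28067086048),
    "Ice Lake #2": (27014488614, 27014469152),
    "Cooper Lake": (29030585690, 29030565024),
}


def overrun_cycles(last_guest_tsc, deadline):
    """Return how many TSC cycles past the deadline the last guest TSC store was."""
    return last_guest_tsc - deadline


for host, (last_tsc, deadline) in REPORTS.items():
    print(f"{host}: {overrun_cycles(last_tsc, deadline)} cycles past the deadline")
```

By this arithmetic, the last stored guest TSC is between roughly 17,000 and
21,000 cycles past the deadline on each host.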



* [Bug 216177] kvm-unit-tests vmx has about 60% of failure chance
  2022-06-27  2:17 [Bug 216177] New: kvm-unit-tests vmx has about 60% of failure chance bugzilla-daemon
                   ` (9 preceding siblings ...)
  2022-06-28 18:24 ` bugzilla-daemon
@ 2022-06-29  0:22 ` bugzilla-daemon
  2022-06-29  2:32   ` Jim Mattson
  2022-06-29  2:32 ` bugzilla-daemon
  2022-06-29  2:50 ` bugzilla-daemon
  12 siblings, 1 reply; 19+ messages in thread
From: bugzilla-daemon @ 2022-06-29  0:22 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216177

--- Comment #11 from Yang Lixiao (lixiao.yang@intel.com) ---
(In reply to Jim Mattson from comment #10)
> On Mon, Jun 27, 2022 at 11:32 PM <bugzilla-daemon@kernel.org> wrote:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=216177
> >
> > --- Comment #9 from Yang Lixiao (lixiao.yang@intel.com) ---
> > (In reply to Jim Mattson from comment #8)
> > > On Mon, Jun 27, 2022 at 8:54 PM Nadav Amit <nadav.amit@gmail.com> wrote:
> > >
> > > > The failure on bare-metal that I experienced hints that this is either
> > > > a test bug or (much less likely) a hardware bug. But I do not think it
> > > > is likely to be a KVM bug.
> > >
> > > KVM does not use the VMX-preemption timer to virtualize L1's
> > > VMX-preemption timer (and that is why KVM is broken). The KVM bug was
> > > introduced with commit f4124500c2c1 ("KVM: nVMX: Fully emulate
> > > preemption timer"), which uses an L0 CLOCK_MONOTONIC hrtimer to
> > > emulate L1's VMX-preemption timer. There are many reasons that this
> > > cannot possibly work, not the least of which is that the
> > > CLOCK_MONOTONIC timer is subject to time slew.
> > >
> > > Currently, KVM reserves L0's VMX-preemption timer for emulating L1's
> > > APIC timer. Better would be to determine whether L1's APIC timer or
> > > L1's VMX-preemption timer is scheduled to fire first, and use L0's
> > > VMX-preemption timer to trigger a VM-exit on the nearest alarm.
> > > Alternatively, as Sean noted, one could perhaps arrange for the
> > > hrtimer to fire early enough that it won't fire late, but I don't
> > > really think that's a viable solution.
> > >
> > > I can't explain the bare-metal failures, but I will note that the test
> > > assumes the default treatment of SMIs and SMM. The test will likely
> > > fail with the dual-monitor treatment of SMIs and SMM. Aside from the
> > > older CPUs with broken VMX-preemption timers, I don't know of any
> > > relevant errata.
> > >
> > > Of course, it is possible that the test itself is buggy. For the
> > > person who reported bare-metal failures on Ice Lake and Cooper Lake,
> > > how long was the test in VMX non-root mode past the VMX-preemption
> > > timer deadline?
> >
> > On the first Ice lake:
> > Test suite: vmx_preemption_timer_expiry_test
> > FAIL: Last stored guest TSC (28067103426) < TSC deadline (28067086048)
> >
> > On the second Ice lake:
> > Test suite: vmx_preemption_timer_expiry_test
> > FAIL: Last stored guest TSC (27014488614) < TSC deadline (27014469152)
> >
> > On Cooper lake:
> > Test suite: vmx_preemption_timer_expiry_test
> > FAIL: Last stored guest TSC (29030585690) < TSC deadline (29030565024)
> 
> Wow! Those are *huge* overruns. What is the value of MSR 0x9B on these hosts?

All of the values of MSR 0x9B on the three hosts are 0.
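
For context, MSR 0x9B is IA32_SMM_MONITOR_CTL, which relates to the
dual-monitor treatment of SMIs and SMM mentioned in comment #8. A minimal
sketch of decoding the value, assuming the field layout given in the Intel SDM
(bit 0 valid, bit 2 SMI unblocking by VMXOFF, bits 31:12 MSEG base); the
function name is hypothetical:

```python
def decode_smm_monitor_ctl(value):
    """Split a raw IA32_SMM_MONITOR_CTL (MSR 0x9B) value into its fields.

    The bit positions are an assumption taken from the Intel SDM layout.
    """
    return {
        "valid": bool(value & 0x1),                # bit 0
        "vmxoff_unblocks_smi": bool(value & 0x4),  # bit 2
        "mseg_base": value & 0xFFFFF000,           # bits 31:12
    }


# The value reported on all three hosts.
print(decode_smm_monitor_ctl(0))
```

A value of 0 has the valid bit clear, which suggests the dual-monitor
treatment is not in use on these hosts.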



* Re: [Bug 216177] kvm-unit-tests vmx has about 60% of failure chance
  2022-06-29  0:22 ` bugzilla-daemon
@ 2022-06-29  2:32   ` Jim Mattson
  0 siblings, 0 replies; 19+ messages in thread
From: Jim Mattson @ 2022-06-29  2:32 UTC (permalink / raw)
  To: bugzilla-daemon; +Cc: kvm

On Tue, Jun 28, 2022 at 5:22 PM <bugzilla-daemon@kernel.org> wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=216177
>
> --- Comment #11 from Yang Lixiao (lixiao.yang@intel.com) ---
> (In reply to Jim Mattson from comment #10)
> > On Mon, Jun 27, 2022 at 11:32 PM <bugzilla-daemon@kernel.org> wrote:
> > >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=216177
> > >
> > > --- Comment #9 from Yang Lixiao (lixiao.yang@intel.com) ---
> > > (In reply to Jim Mattson from comment #8)
> > > > On Mon, Jun 27, 2022 at 8:54 PM Nadav Amit <nadav.amit@gmail.com> wrote:
> > > >
> > > > > The failure on bare-metal that I experienced hints that this is either
> > > > > a test bug or (much less likely) a hardware bug. But I do not think it
> > > > > is likely to be a KVM bug.
> > > >
> > > > KVM does not use the VMX-preemption timer to virtualize L1's
> > > > VMX-preemption timer (and that is why KVM is broken). The KVM bug was
> > > > introduced with commit f4124500c2c1 ("KVM: nVMX: Fully emulate
> > > > preemption timer"), which uses an L0 CLOCK_MONOTONIC hrtimer to
> > > > emulate L1's VMX-preemption timer. There are many reasons that this
> > > > cannot possibly work, not the least of which is that the
> > > > CLOCK_MONOTONIC timer is subject to time slew.
> > > >
> > > > Currently, KVM reserves L0's VMX-preemption timer for emulating L1's
> > > > APIC timer. Better would be to determine whether L1's APIC timer or
> > > > L1's VMX-preemption timer is scheduled to fire first, and use L0's
> > > > VMX-preemption timer to trigger a VM-exit on the nearest alarm.
> > > > Alternatively, as Sean noted, one could perhaps arrange for the
> > > > hrtimer to fire early enough that it won't fire late, but I don't
> > > > really think that's a viable solution.
> > > >
> > > > I can't explain the bare-metal failures, but I will note that the test
> > > > assumes the default treatment of SMIs and SMM. The test will likely
> > > > fail with the dual-monitor treatment of SMIs and SMM. Aside from the
> > > > older CPUs with broken VMX-preemption timers, I don't know of any
> > > > relevant errata.
> > > >
> > > > Of course, it is possible that the test itself is buggy. For the
> > > > person who reported bare-metal failures on Ice Lake and Cooper Lake,
> > > > how long was the test in VMX non-root mode past the VMX-preemption
> > > > timer deadline?
> > >
> > > On the first Ice lake:
> > > Test suite: vmx_preemption_timer_expiry_test
> > > FAIL: Last stored guest TSC (28067103426) < TSC deadline (28067086048)
> > >
> > > On the second Ice lake:
> > > Test suite: vmx_preemption_timer_expiry_test
> > > FAIL: Last stored guest TSC (27014488614) < TSC deadline (27014469152)
> > >
> > > On Cooper lake:
> > > Test suite: vmx_preemption_timer_expiry_test
> > > FAIL: Last stored guest TSC (29030585690) < TSC deadline (29030565024)
> >
> > Wow! Those are *huge* overruns. What is the value of MSR 0x9B on these hosts?
>
> All of the values of MSR 0x9B on the three hosts are 0.
>
> --
> You may reply to this email to add a comment.
>
> You are receiving this mail because:
> You are watching the assignee of the bug.

Doh! There is a glaring bug in the test. I'll post a fix soon.


* [Bug 216177] kvm-unit-tests vmx has about 60% of failure chance
  2022-06-27  2:17 [Bug 216177] New: kvm-unit-tests vmx has about 60% of failure chance bugzilla-daemon
                   ` (10 preceding siblings ...)
  2022-06-29  0:22 ` bugzilla-daemon
@ 2022-06-29  2:32 ` bugzilla-daemon
  2022-06-29  2:50 ` bugzilla-daemon
  12 siblings, 0 replies; 19+ messages in thread
From: bugzilla-daemon @ 2022-06-29  2:32 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216177

--- Comment #12 from Jim Mattson (jmattson@google.com) ---
On Tue, Jun 28, 2022 at 5:22 PM <bugzilla-daemon@kernel.org> wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=216177
>
> --- Comment #11 from Yang Lixiao (lixiao.yang@intel.com) ---
> (In reply to Jim Mattson from comment #10)
> > On Mon, Jun 27, 2022 at 11:32 PM <bugzilla-daemon@kernel.org> wrote:
> > >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=216177
> > >
> > > --- Comment #9 from Yang Lixiao (lixiao.yang@intel.com) ---
> > > (In reply to Jim Mattson from comment #8)
> > > > On Mon, Jun 27, 2022 at 8:54 PM Nadav Amit <nadav.amit@gmail.com> wrote:
> > > >
> > > > > The failure on bare-metal that I experienced hints that this is either
> > > > > a test bug or (much less likely) a hardware bug. But I do not think it
> > > > > is likely to be a KVM bug.
> > > >
> > > > KVM does not use the VMX-preemption timer to virtualize L1's
> > > > VMX-preemption timer (and that is why KVM is broken). The KVM bug was
> > > > introduced with commit f4124500c2c1 ("KVM: nVMX: Fully emulate
> > > > preemption timer"), which uses an L0 CLOCK_MONOTONIC hrtimer to
> > > > emulate L1's VMX-preemption timer. There are many reasons that this
> > > > cannot possibly work, not the least of which is that the
> > > > CLOCK_MONOTONIC timer is subject to time slew.
> > > >
> > > > Currently, KVM reserves L0's VMX-preemption timer for emulating L1's
> > > > APIC timer. Better would be to determine whether L1's APIC timer or
> > > > L1's VMX-preemption timer is scheduled to fire first, and use L0's
> > > > VMX-preemption timer to trigger a VM-exit on the nearest alarm.
> > > > Alternatively, as Sean noted, one could perhaps arrange for the
> > > > hrtimer to fire early enough that it won't fire late, but I don't
> > > > really think that's a viable solution.
> > > >
> > > > I can't explain the bare-metal failures, but I will note that the test
> > > > assumes the default treatment of SMIs and SMM. The test will likely
> > > > fail with the dual-monitor treatment of SMIs and SMM. Aside from the
> > > > older CPUs with broken VMX-preemption timers, I don't know of any
> > > > relevant errata.
> > > >
> > > > Of course, it is possible that the test itself is buggy. For the
> > > > person who reported bare-metal failures on Ice Lake and Cooper Lake,
> > > > how long was the test in VMX non-root mode past the VMX-preemption
> > > > timer deadline?
> > >
> > > On the first Ice lake:
> > > Test suite: vmx_preemption_timer_expiry_test
> > > FAIL: Last stored guest TSC (28067103426) < TSC deadline (28067086048)
> > >
> > > On the second Ice lake:
> > > Test suite: vmx_preemption_timer_expiry_test
> > > FAIL: Last stored guest TSC (27014488614) < TSC deadline (27014469152)
> > >
> > > On Cooper lake:
> > > Test suite: vmx_preemption_timer_expiry_test
> > > FAIL: Last stored guest TSC (29030585690) < TSC deadline (29030565024)
> >
> > Wow! Those are *huge* overruns. What is the value of MSR 0x9B on these hosts?
>
> All of the values of MSR 0x9B on the three hosts are 0.
>
> --
> You may reply to this email to add a comment.
>
> You are receiving this mail because:
> You are watching the assignee of the bug.

Doh! There is a glaring bug in the test. I'll post a fix soon.



* [Bug 216177] kvm-unit-tests vmx has about 60% of failure chance
  2022-06-27  2:17 [Bug 216177] New: kvm-unit-tests vmx has about 60% of failure chance bugzilla-daemon
                   ` (11 preceding siblings ...)
  2022-06-29  2:32 ` bugzilla-daemon
@ 2022-06-29  2:50 ` bugzilla-daemon
  12 siblings, 0 replies; 19+ messages in thread
From: bugzilla-daemon @ 2022-06-29  2:50 UTC (permalink / raw)
  To: kvm

https://bugzilla.kernel.org/show_bug.cgi?id=216177

--- Comment #13 from Yang Lixiao (lixiao.yang@intel.com) ---
(In reply to Jim Mattson from comment #12)
> On Tue, Jun 28, 2022 at 5:22 PM <bugzilla-daemon@kernel.org> wrote:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=216177
> >
> > --- Comment #11 from Yang Lixiao (lixiao.yang@intel.com) ---
> > (In reply to Jim Mattson from comment #10)
> > > On Mon, Jun 27, 2022 at 11:32 PM <bugzilla-daemon@kernel.org> wrote:
> > > >
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=216177
> > > >
> > > > --- Comment #9 from Yang Lixiao (lixiao.yang@intel.com) ---
> > > > (In reply to Jim Mattson from comment #8)
> > > > > On Mon, Jun 27, 2022 at 8:54 PM Nadav Amit <nadav.amit@gmail.com> wrote:
> > > > >
> > > > > > The failure on bare-metal that I experienced hints that this is either
> > > > > > a test bug or (much less likely) a hardware bug. But I do not think it
> > > > > > is likely to be a KVM bug.
> > > > >
> > > > > KVM does not use the VMX-preemption timer to virtualize L1's
> > > > > VMX-preemption timer (and that is why KVM is broken). The KVM bug was
> > > > > introduced with commit f4124500c2c1 ("KVM: nVMX: Fully emulate
> > > > > preemption timer"), which uses an L0 CLOCK_MONOTONIC hrtimer to
> > > > > emulate L1's VMX-preemption timer. There are many reasons that this
> > > > > cannot possibly work, not the least of which is that the
> > > > > CLOCK_MONOTONIC timer is subject to time slew.
> > > > >
> > > > > Currently, KVM reserves L0's VMX-preemption timer for emulating L1's
> > > > > APIC timer. Better would be to determine whether L1's APIC timer or
> > > > > L1's VMX-preemption timer is scheduled to fire first, and use L0's
> > > > > VMX-preemption timer to trigger a VM-exit on the nearest alarm.
> > > > > Alternatively, as Sean noted, one could perhaps arrange for the
> > > > > hrtimer to fire early enough that it won't fire late, but I don't
> > > > > really think that's a viable solution.
> > > > >
> > > > > I can't explain the bare-metal failures, but I will note that the test
> > > > > assumes the default treatment of SMIs and SMM. The test will likely
> > > > > fail with the dual-monitor treatment of SMIs and SMM. Aside from the
> > > > > older CPUs with broken VMX-preemption timers, I don't know of any
> > > > > relevant errata.
> > > > >
> > > > > Of course, it is possible that the test itself is buggy. For the
> > > > > person who reported bare-metal failures on Ice Lake and Cooper Lake,
> > > > > how long was the test in VMX non-root mode past the VMX-preemption
> > > > > timer deadline?
> > > >
> > > > On the first Ice lake:
> > > > Test suite: vmx_preemption_timer_expiry_test
> > > > FAIL: Last stored guest TSC (28067103426) < TSC deadline (28067086048)
> > > >
> > > > On the second Ice lake:
> > > > Test suite: vmx_preemption_timer_expiry_test
> > > > FAIL: Last stored guest TSC (27014488614) < TSC deadline (27014469152)
> > > >
> > > > On Cooper lake:
> > > > Test suite: vmx_preemption_timer_expiry_test
> > > > FAIL: Last stored guest TSC (29030585690) < TSC deadline (29030565024)
> > >
> > > Wow! Those are *huge* overruns. What is the value of MSR 0x9B on these hosts?
> >
> > All of the values of MSR 0x9B on the three hosts are 0.
> >
> > --
> > You may reply to this email to add a comment.
> >
> > You are receiving this mail because:
> > You are watching the assignee of the bug.
> Doh! There is a glaring bug in the test. I'll post a fix soon.

Thanks!



end of thread, other threads:[~2022-06-29  2:50 UTC | newest]

Thread overview: 19+ messages
2022-06-27  2:17 [Bug 216177] New: kvm-unit-tests vmx has about 60% of failure chance bugzilla-daemon
2022-06-28  0:28 ` [Bug 216177] " bugzilla-daemon
2022-06-28  0:37   ` Nadav Amit
2022-06-28  0:37 ` bugzilla-daemon
2022-06-28  1:19 ` bugzilla-daemon
2022-06-28  1:42   ` Nadav Amit
2022-06-28  4:39     ` Jim Mattson
2022-06-28  1:30 ` bugzilla-daemon
2022-06-28  1:42 ` bugzilla-daemon
2022-06-28  1:48 ` bugzilla-daemon
2022-06-28  2:19 ` bugzilla-daemon
2022-06-28  4:39 ` bugzilla-daemon
2022-06-28  6:11 ` bugzilla-daemon
2022-06-28 18:24   ` Jim Mattson
2022-06-28 18:24 ` bugzilla-daemon
2022-06-29  0:22 ` bugzilla-daemon
2022-06-29  2:32   ` Jim Mattson
2022-06-29  2:32 ` bugzilla-daemon
2022-06-29  2:50 ` bugzilla-daemon
