kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sandipan Das <sandipan.das@amd.com>
To: Like Xu <like.xu.linux@gmail.com>
Cc: kvm@vger.kernel.org, Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [kvm-unit-tests PATCH v3 10/13] x86/pmu: Update testcases to cover Intel Arch PMU Version 1
Date: Thu, 8 Sep 2022 13:53:18 +0530	[thread overview]
Message-ID: <27ef941b-05df-7fa4-a54e-8571b0bf70e7@amd.com> (raw)
In-Reply-To: <a1e202f0-260e-fe00-4e39-42e390d4021b@gmail.com>

On 9/6/2022 7:05 PM, Like Xu wrote:
> On 6/9/2022 4:16 pm, Sandipan Das wrote:
>> Hi Like,
>>
>> On 8/19/2022 4:39 PM, Like Xu wrote:
>>> From: Like Xu <likexu@tencent.com>
>>>
>>> For most unit tests, the basic framework and use cases which test
>>> any PMU counter do not require any changes, except for two things:
>>>
>>> - No access to registers introduced only in PMU version 2 and above;
>>> - Expanded tolerance for testing counter overflows
>>>    due to the loss of uniform control of the gloabl_ctrl register
>>>
>>> Adding some pmu_version() return value checks can seamlessly support
>>> Intel Arch PMU Version 1, while opening the door for AMD PMUs tests.
>>>
>>> Signed-off-by: Like Xu <likexu@tencent.com>
>>> ---
>>>   x86/pmu.c | 64 +++++++++++++++++++++++++++++++++++++------------------
>>>   1 file changed, 43 insertions(+), 21 deletions(-)
>>>
>>> [...]
>>> @@ -327,13 +335,21 @@ static void check_counter_overflow(void)
>>>               cnt.config &= ~EVNTSEL_INT;
>>>           idx = event_to_global_idx(&cnt);
>>>           __measure(&cnt, cnt.count);
>>> -        report(cnt.count == 1, "cntr-%d", i);
>>> +
>>> +        report(check_irq() == (i % 2), "irq-%d", i);
>>> +        if (pmu_version() > 1)
>>> +            report(cnt.count == 1, "cntr-%d", i);
>>> +        else
>>> +            report(cnt.count < 4, "cntr-%d", i);
>>> +
>>> [...]
>>
>> Sorry I missed this in the previous response. With an upper bound of
>> 4, I see this test failing some times for at least one of the six
>> counters (with NMI watchdog disabled on the host) on a Milan (Zen 3)
>> system. Increasing it further does reduce the probability but I still
>> see failures. Do you see the same behaviour on systems with Zen 3 and
>> older processors?
> 
> A hundred runs on my machine did not report a failure.
> 

Was this on a Zen 4 system?

> But I'm not surprised by this, because some AMD platforms do
> have hw PMU errata which requires bios or ucode fixes.
> 
> Please help find the right upper bound for all your available AMD boxes.
> 

Even after updating the microcode, the tests failed just as often in an
overnight loop. However, upon closer inspection, the reason for failure
was different. The variance is well within the bounds now but sometimes,
is_the_count_reproducible() is true. Since this selects the original
verification criteria (cnt.count == 1), the tests fail.

> What makes me most nervous is that AMD's core hardware events run
> repeatedly against the same workload, and their count results are erratic.
> 

With that in mind, should we consider having the following change?

diff --git a/x86/pmu.c b/x86/pmu.c
index bb16b3c..39979b8 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -352,7 +352,7 @@ static void check_counter_overflow(void)
                .ctr = gp_counter_base,
                .config = EVNTSEL_OS | EVNTSEL_USR | (*gp_events)[1].unit_sel /* instructions */,
        };
-       bool precise_event = is_the_count_reproducible(&cnt);
+       bool precise_event = is_intel() ? is_the_count_reproducible(&cnt) : false;

        __measure(&cnt, 0);
        count = cnt.count;

With this, the tests always pass. I will run another overnight loop and
report back if I see any errors.

> You may check is_the_count_reproducible() in the test case:
> [1]https://lore.kernel.org/kvm/20220905123946.95223-7-likexu@tencent.com/

On Zen 4 systems, this is always false and the overflow tests always
pass irrespective of whether PerfMonV2 is enabled for the guest or not.

- Sandipan

  reply	other threads:[~2022-09-08  8:23 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-19 11:09 [kvm-unit-tests PATCH v3 00/13] x86/pmu: Test case optimization, fixes and additions Like Xu
2022-08-19 11:09 ` [kvm-unit-tests PATCH v3 01/13] x86/pmu: Introduce __start_event() to drop all of the manual zeroing Like Xu
2022-08-19 11:09 ` [kvm-unit-tests PATCH v3 02/13] x86/pmu: Introduce multiple_{one, many}() to improve readability Like Xu
2022-08-19 11:09 ` [kvm-unit-tests PATCH v3 03/13] x86/pmu: Reset the expected count of the fixed counter 0 when i386 Like Xu
2022-10-05 22:18   ` Sean Christopherson
2022-10-17  7:30     ` Like Xu
2022-10-21 19:16       ` Sean Christopherson
2022-08-19 11:09 ` [kvm-unit-tests PATCH v3 04/13] x86/pmu: Add tests for Intel Processor Event Based Sampling (PEBS) Like Xu
2022-10-05 22:21   ` Sean Christopherson
2022-10-05 22:22     ` Sean Christopherson
2022-08-19 11:09 ` [kvm-unit-tests PATCH v3 05/13] x86: create pmu group for quick pmu-scope testing Like Xu
2022-10-05 22:23   ` Sean Christopherson
2022-08-19 11:09 ` [kvm-unit-tests PATCH v3 06/13] x86/pmu: Test emulation instructions on full-width counters Like Xu
2022-08-19 11:09 ` [kvm-unit-tests PATCH v3 07/13] x86/pmu: Pop up FW prefix to avoid out-of-context propagation Like Xu
2022-10-05 22:25   ` Sean Christopherson
2022-08-19 11:09 ` [kvm-unit-tests PATCH v3 08/13] x86/pmu: Add PDCM check before accessing PERF_CAP register Like Xu
2022-10-05 22:28   ` Sean Christopherson
2022-08-19 11:09 ` [kvm-unit-tests PATCH v3 09/13] x86/pmu: Report SKIP when testing Intel LBR on AMD platforms Like Xu
2022-10-05 22:29   ` Sean Christopherson
2022-08-19 11:09 ` [kvm-unit-tests PATCH v3 10/13] x86/pmu: Update testcases to cover Intel Arch PMU Version 1 Like Xu
2022-09-06  7:15   ` Sandipan Das
2022-09-06 13:28     ` Like Xu
2022-09-06  8:16   ` Sandipan Das
2022-09-06 13:35     ` Like Xu
2022-09-08  8:23       ` Sandipan Das [this message]
2022-09-19  7:09         ` Like Xu
2022-10-21  7:32           ` Like Xu
2022-10-25  8:34             ` Sandipan Das
2022-10-05 22:35   ` Sean Christopherson
2022-10-18  9:32     ` Like Xu
2022-08-19 11:09 ` [kvm-unit-tests PATCH v3 11/13] x86/pmu: Refine message when testing PMU on AMD platforms Like Xu
2022-08-19 11:09 ` [kvm-unit-tests PATCH v3 12/13] x86/pmu: Add assignment framework for Intel-specific HW resources Like Xu
2022-09-06  7:19   ` Sandipan Das
2022-10-05 22:44   ` Sean Christopherson
2022-10-21  7:21     ` Like Xu
2022-10-21 18:22       ` Sean Christopherson
2022-08-19 11:09 ` [kvm-unit-tests PATCH v3 13/13] x86/pmu: Update testcases to cover AMD PMU Like Xu
2022-09-06  7:32   ` Sandipan Das
2022-10-05 22:48   ` Sean Christopherson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=27ef941b-05df-7fa4-a54e-8571b0bf70e7@amd.com \
    --to=sandipan.das@amd.com \
    --cc=kvm@vger.kernel.org \
    --cc=like.xu.linux@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).