Re: [PATCH v2] perf/amd: Implement erratum #1292 workaround for F19h M00-0Fh

From: Like Xu <like.xu.linux@gmail.com>
To: Jim Mattson <jmattson@google.com>, Ravi Bangoria <ravi.bangoria@amd.com>
Cc: eranian@google.com, santosh.shukla@amd.com, pbonzini@redhat.com,
	seanjc@google.com, wanpengli@tencent.com, vkuznets@redhat.com,
	joro@8bytes.org, peterz@infradead.org, mingo@redhat.com,
	alexander.shishkin@linux.intel.com, tglx@linutronix.de,
	bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
	kvm@vger.kernel.org, x86@kernel.org,
	linux-perf-users@vger.kernel.org, ananth.narayan@amd.com,
	kim.phillips@amd.com
Subject: Re: [PATCH v2] perf/amd: Implement erratum #1292 workaround for F19h M00-0Fh
Date: Wed, 9 Feb 2022 18:18:45 +0800	[thread overview]
Message-ID: <e58ca80c-b54c-48b3-fb0b-3e9497c877b7@gmail.com> (raw)
In-Reply-To: <CALMp9eQ9K+CXHVZ1zSyw78n-agM2+NQ1xJ4niO-YxSkQCLcK-A@mail.gmail.com>

On 4/2/2022 9:01 pm, Jim Mattson wrote:
> On Fri, Feb 4, 2022 at 1:33 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>
>>
>>
>> On 03-Feb-22 11:25 PM, Jim Mattson wrote:
>>> On Wed, Feb 2, 2022 at 9:18 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>>>
>>>> Hi Jim,
>>>>
>>>> On 03-Feb-22 9:39 AM, Jim Mattson wrote:
>>>>> On Wed, Feb 2, 2022 at 2:52 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
>>>>>>
>>>>>> Perf counter may overcount for a list of Retire Based Events. Implement
>>>>>> workaround for Zen3 Family 19 Model 00-0F processors as suggested in
>>>>>> Revision Guide[1]:
>>>>>>
>>>>>>    To count the non-FP affected PMC events correctly:
>>>>>>      o Use Core::X86::Msr::PERF_CTL2 to count the events, and
>>>>>>      o Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
>>>>>>      o Program Core::X86::Msr::PERF_CTL2[20] to 0b.
>>>>>>
>>>>>> Note that the specified workaround applies only to counting events and
>>>>>> not to sampling events. Thus sampling event will continue functioning
>>>>>> as is.
>>>>>>
>>>>>> Although the issue exists on all previous Zen revisions, the workaround
>>>>>> is different and thus not included in this patch.
>>>>>>
>>>>>> This patch needs Like's patch[2] to make it work on kvm guest.
>>>>>
>>>>> IIUC, this patch along with Like's patch actually breaks PMU
>>>>> virtualization for a kvm guest.
>>>>>
>>>>> Suppose I have some code which counts event 0xC2 [Retired Branch
>>>>> Instructions] on PMC0 and event 0xC4 [Retired Taken Branch
>>>>> Instructions] on PMC1. I then divide PMC1 by PMC0 to see what
>>>>> percentage of my branch instructions are taken. On hardware that
>>>>> suffers from erratum 1292, both counters may overcount, but if the
>>>>> inaccuracy is small, then my final result may still be fairly close to
>>>>> reality.
>>>>>
>>>>> With these patches, if I run that same code in a kvm guest, it looks
>>>>> like one of those events will be counted on PMC2 and the other won't
>>>>> be counted at all. So, when I calculate the percentage of branch
>>>>> instructions taken, I either get 0 or infinity.
>>>>
>>>> Events get multiplexed internally. See below quick test I ran inside
>>>> guest. My host is running with my+Like's patch and guest is running
>>>> with only my patch.
>>>
>>> Your guest may be multiplexing the counters. The guest I posited does not.
>>
>> It would be helpful if you can provide an example.
> 
> Perf on any current Linux distro (i.e. without your fix).

The patch for errata #1292 (like most hw issues or vulnerabilities) should be
applied to both the host and guest.

For non-patched guests on a patched host, the KVM-created perf_events
will be true for is_sampling_event() due to get_sample_period().

I think we (KVM) have a congenital defect in distinguishing whether guest
counters are used in counting mode or sampling mode, which is just
a different use of pure software.

> 
>>> I hope that you are not saying that kvm's *thread-pinned* perf events
>>> are not being multiplexed at the host level, because that completely
>>> breaks PMU virtualization.
>>
>> IIUC, multiplexing happens inside the guest.
> 
> I'm not sure that multiplexing is the answer. Extrapolation may
> introduce greater imprecision than the erratum.

If you run the same test on the patched host, the PMC2 will be
used in a multiplexing way. This is no different.

> 
> If you count something like "instructions retired" three ways:
> 1) Unfixed counter
> 2) PMC2 with the fix
> 3) Multiplexed on PMC2 with the fix
> 
> Is (3) always more accurate than (1)?

The loss of accuracy is due to a reduction in the number of trustworthy counters,
not to these two workaround patches. Any multiplexing (whatever on the host or
the guest) will result in a loss of accuracy. Right ?

I'm not sure if we should provide a sysfs knob for (1), is there a precedent for 
this ?

> 
>> Thanks,
>> Ravi