All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jim Mattson <jmattson@google.com>
To: Like Xu <like.xu.linux@gmail.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>,
	eranian@google.com, santosh.shukla@amd.com, pbonzini@redhat.com,
	seanjc@google.com, wanpengli@tencent.com, vkuznets@redhat.com,
	joro@8bytes.org, peterz@infradead.org, mingo@redhat.com,
	alexander.shishkin@linux.intel.com, tglx@linutronix.de,
	bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
	kvm@vger.kernel.org, x86@kernel.org,
	linux-perf-users@vger.kernel.org, ananth.narayan@amd.com,
	kim.phillips@amd.com
Subject: Re: [PATCH v2] perf/amd: Implement erratum #1292 workaround for F19h M00-0Fh
Date: Wed, 9 Feb 2022 13:40:53 -0800	[thread overview]
Message-ID: <CALMp9eRuBwj9Sg60jg2ucWd-XoAcE_kXP5ULFgpkSHg88sOJZQ@mail.gmail.com> (raw)
In-Reply-To: <e58ca80c-b54c-48b3-fb0b-3e9497c877b7@gmail.com>

On Wed, Feb 9, 2022 at 2:19 AM Like Xu <like.xu.linux@gmail.com> wrote:
>
> On 4/2/2022 9:01 pm, Jim Mattson wrote:
> > On Fri, Feb 4, 2022 at 1:33 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
> >>
> >>
> >>
> >> On 03-Feb-22 11:25 PM, Jim Mattson wrote:
> >>> On Wed, Feb 2, 2022 at 9:18 PM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
> >>>>
> >>>> Hi Jim,
> >>>>
> >>>> On 03-Feb-22 9:39 AM, Jim Mattson wrote:
> >>>>> On Wed, Feb 2, 2022 at 2:52 AM Ravi Bangoria <ravi.bangoria@amd.com> wrote:
> >>>>>>
> >>>>>> Perf counter may overcount for a list of Retire Based Events. Implement
> >>>>>> workaround for Zen3 Family 19 Model 00-0F processors as suggested in
> >>>>>> Revision Guide[1]:
> >>>>>>
> >>>>>>    To count the non-FP affected PMC events correctly:
> >>>>>>      o Use Core::X86::Msr::PERF_CTL2 to count the events, and
> >>>>>>      o Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
> >>>>>>      o Program Core::X86::Msr::PERF_CTL2[20] to 0b.
> >>>>>>
> >>>>>> Note that the specified workaround applies only to counting events and
> >>>>>> not to sampling events. Thus sampling event will continue functioning
> >>>>>> as is.
> >>>>>>
> >>>>>> Although the issue exists on all previous Zen revisions, the workaround
> >>>>>> is different and thus not included in this patch.
> >>>>>>
> >>>>>> This patch needs Like's patch[2] to make it work on kvm guest.
> >>>>>
> >>>>> IIUC, this patch along with Like's patch actually breaks PMU
> >>>>> virtualization for a kvm guest.
> >>>>>
> >>>>> Suppose I have some code which counts event 0xC2 [Retired Branch
> >>>>> Instructions] on PMC0 and event 0xC4 [Retired Taken Branch
> >>>>> Instructions] on PMC1. I then divide PMC1 by PMC0 to see what
> >>>>> percentage of my branch instructions are taken. On hardware that
> >>>>> suffers from erratum 1292, both counters may overcount, but if the
> >>>>> inaccuracy is small, then my final result may still be fairly close to
> >>>>> reality.
> >>>>>
> >>>>> With these patches, if I run that same code in a kvm guest, it looks
> >>>>> like one of those events will be counted on PMC2 and the other won't
> >>>>> be counted at all. So, when I calculate the percentage of branch
> >>>>> instructions taken, I either get 0 or infinity.
> >>>>
> >>>> Events get multiplexed internally. See below quick test I ran inside
> >>>> guest. My host is running with my+Like's patch and guest is running
> >>>> with only my patch.
> >>>
> >>> Your guest may be multiplexing the counters. The guest I posited does not.
> >>
> >> It would be helpful if you can provide an example.
> >
> > Perf on any current Linux distro (i.e. without your fix).
>
> The patch for errata #1292 (like most hw issues or vulnerabilities) should be
> applied to both the host and guest.

As I'm sure you are aware, guests are often not patched. For example,
we have a lot of Debian-9 guests running on Milan, despite the fact
that it has to be booted with "nopcid" due to a bug introduced on
4.9-stable. We submitted the fix and notified Debian about a year ago,
but they have not seen fit to cut a new kernel. Do you think they will
cut a new kernel for this patch?

> For non-patched guests on a patched host, the KVM-created perf_events
> will be true for is_sampling_event() due to get_sample_period().
>
> I think we (KVM) have a congenital defect in distinguishing whether guest
> counters are used in counting mode or sampling mode, which is just
> a different use of pure software.

I have no idea what you are saying. However, when kvm sees a guest
counter used in sampling mode, it will just request a PERF_TYPE_RAW
perf event with the INT bit set in 'config.' If it sees a guest
counter used in counting mode, it will either request a PERF_TYPE_RAW
perf event or a PERF_TYPE_HARDWARE perf event, depending on whether or
not it finds the requested event in amd_event_mapping[].

> >
> >>> I hope that you are not saying that kvm's *thread-pinned* perf events
> >>> are not being multiplexed at the host level, because that completely
> >>> breaks PMU virtualization.
> >>
> >> IIUC, multiplexing happens inside the guest.
> >
> > I'm not sure that multiplexing is the answer. Extrapolation may
> > introduce greater imprecision than the erratum.
>
> If you run the same test on the patched host, the PMC2 will be
> used in a multiplexing way. This is no different.
>
> >
> > If you count something like "instructions retired" three ways:
> > 1) Unfixed counter
> > 2) PMC2 with the fix
> > 3) Multiplexed on PMC2 with the fix
> >
> > Is (3) always more accurate than (1)?

Since Ravi has gone dark, I will answer my own question.

For better reproducibility, I simplified his program to:

int main() { return 0;}

On an unpatched Milan host, I get instructions retired between 21911
and 21915. I get branch instructions retired between 5565 and 5566. It
does not matter if I count them separately or at the same time.

After applying v3 of Ravi's patch, if I try to count these events at
the same time, I get 36869 instructions retired and 4962 branch
instructions on the first run. On subsequent runs, perf refuses to
count both at the same time. I get branch instructions retired between
5565 and 5567, but no instructions retired. Instead, perf tells me:

Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog

If I just count one thing at a time (on the patched kernel), I get
between 21911 and 21916 instructions retired, and I get between 5565
and 5566 branch instructions retired.

I don't know under what circumstances the unfixed counters overcount
or by how much. However, for this simple test case, the fixed PMC2
yields the same results as any unfixed counter. Ravi's patch, however
makes counting two of these events simultaneously either (a)
impossible, or (b) highly inaccurate (from 10% under to 68% over).

> The loss of accuracy is due to a reduction in the number of trustworthy counters,
> not to these two workaround patches. Any multiplexing (whatever on the host or
> the guest) will result in a loss of accuracy. Right ?

Yes, that's my point. Fixing one inaccuracy by using a mechanism that
introduces another inaccuracy only makes sense if the inaccuracy you
are fixing is worse than the inaccuracy you are introducing. That does
not appear to be the case here, but I am not privy to all of the
details of this erratum.

  reply	other threads:[~2022-02-09 21:41 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-17  5:57 [PATCH] KVM: x86/pmu: Clear reserved bit PERF_CTL2[43] for AMD erratum 1292 Like Xu
2022-02-02  4:28 ` [PATCH] perf/amd: Implement errata #1292 workaround for F19h M00-0Fh Ravi Bangoria
2022-02-02  5:27   ` Stephane Eranian
2022-02-02  6:02     ` Ravi Bangoria
2022-02-02  6:16       ` Stephane Eranian
2022-02-02  6:32         ` Ravi Bangoria
2022-02-02 10:51           ` [PATCH v2] perf/amd: Implement erratum " Ravi Bangoria
2022-02-02 14:36             ` Peter Zijlstra
2022-02-02 15:32               ` Ravi Bangoria
2022-02-03  9:58                 ` [PATCH v3] " Ravi Bangoria
2022-02-09 12:51                   ` Peter Zijlstra
2022-02-10  4:05                     ` Ravi Bangoria
2022-02-10  8:46                       ` Peter Zijlstra
2022-02-03  4:09             ` [PATCH v2] " Jim Mattson
2022-02-03  5:18               ` Ravi Bangoria
2022-02-03 17:55                 ` Jim Mattson
2022-02-04  9:32                   ` Ravi Bangoria
2022-02-04 13:01                     ` Jim Mattson
2022-02-09 10:18                       ` Like Xu
2022-02-09 21:40                         ` Jim Mattson [this message]
2022-02-10  4:06                           ` Ravi Bangoria
2022-02-10 13:56                             ` Like Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALMp9eRuBwj9Sg60jg2ucWd-XoAcE_kXP5ULFgpkSHg88sOJZQ@mail.gmail.com \
    --to=jmattson@google.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=ananth.narayan@amd.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=eranian@google.com \
    --cc=hpa@zytor.com \
    --cc=joro@8bytes.org \
    --cc=kim.phillips@amd.com \
    --cc=kvm@vger.kernel.org \
    --cc=like.xu.linux@gmail.com \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=ravi.bangoria@amd.com \
    --cc=santosh.shukla@amd.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.