From: Peter Zijlstra <peterz@infradead.org>
To: Stephane Eranian <eranian@google.com>
Cc: linux-kernel@vger.kernel.org, kim.phillips@amd.com,
	acme@redhat.com, jolsa@redhat.com, songliubraving@fb.com
Subject: Re: [PATCH v6 06/12] perf/x86/amd: add AMD branch sampling period adjustment
Date: Tue, 15 Mar 2022 13:08:53 +0100
Message-ID: <20220315120853.GG8939@worktop.programming.kicks-ass.net>
In-Reply-To: <CABPqkBRQwYnxcXigKwF83BPhQmombqa6nuF5-krqN=00Loy_gg@mail.gmail.com>

On Wed, Mar 09, 2022 at 03:03:39PM -0800, Stephane Eranian wrote:
> On Fri, Mar 4, 2022 at 7:45 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, Feb 09, 2022 at 04:32:04PM +0100, Peter Zijlstra wrote:
> > > On Tue, Feb 08, 2022 at 01:16:31PM -0800, Stephane Eranian wrote:
> > > > Add code to adjust the sampling event period when used with the Branch
> > > > Sampling feature (BRS). Given the depth of the BRS (16), the period is
> > > > reduced by that depth such that in the best case scenario, BRS saturates at
> > > > the desired sampling period. In practice, though, the processor may execute
> > > > more branches. Given a desired period P and a depth D, the kernel programs
> > > > the actual period at P - D. After P occurrences of the sampling event, the
> > > > counter overflows. It then may take X branches (skid) before the NMI is
> > > > caught and held by the hardware and BRS activates. Then, after D branches,
> > > > BRS saturates and the NMI is delivered.  With no skid, the effective period
> > > > would be (P - D) + D = P. In practice, however, it will likely be (P - D) +
> > > > X + D. There is no way to eliminate X or predict X.
> > > >
> > > > Signed-off-by: Stephane Eranian <eranian@google.com>
> > > > ---
> > > >  arch/x86/events/core.c       |  7 +++++++
> > > >  arch/x86/events/perf_event.h | 12 ++++++++++++
> > > >  2 files changed, 19 insertions(+)
> > > >
> > > > diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> > > > index c2a890caeb0a..ed285f640efe 100644
> > > > --- a/arch/x86/events/core.c
> > > > +++ b/arch/x86/events/core.c
> > > > @@ -1374,6 +1374,13 @@ int x86_perf_event_set_period(struct perf_event *event)
> > > >         x86_pmu.set_topdown_event_period)
> > > >             return x86_pmu.set_topdown_event_period(event);
> > > >
> > > > +   /*
> > > > +    * decrease period by the depth of the BRS feature to get
> > > > +    * the last N taken branches and approximate the desired period
> > > > +    */
> > > > +   if (has_branch_stack(event))
> > > > +           period = amd_brs_adjust_period(period);
> > > > +
> > > >     /*
> > > >      * If we are way outside a reasonable range then just skip forward:
> > > >      */
> > > > diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> > > > index 3485a4cf0241..25b037b571e4 100644
> > > > --- a/arch/x86/events/perf_event.h
> > > > +++ b/arch/x86/events/perf_event.h
> > > > @@ -1263,6 +1263,14 @@ static inline bool amd_brs_active(void)
> > > >     return cpuc->brs_active;
> > > >  }
> > > >
> > > > +static inline s64 amd_brs_adjust_period(s64 period)
> > > > +{
> > > > +   if (period > x86_pmu.lbr_nr)
> > > > +           return period - x86_pmu.lbr_nr;
> > > > +
> > > > +   return period;
> > > > +}
> > >
> > > This makes no sense to me without also enforcing that the event is in
> > > fact that branch retired thing.
> >
> > So what are we going to do with all these patches? Note that I did pick
> > them up for testing and I've fixed at least 2 build problems with them.
> >
> > But I still don't think they're actually completely sane. So there's the
> > above issue, subtracting lbr_nr from a random event just makes no sense.
> 
> 
> You are right. Initially, retired_branch_taken was the only event
> possible. In that case, subtracting lbr_nr made sense. I have since
> relaxed the event restriction, which exposes this problem. Given how
> BRS works, I am okay with restricting it to retired_branch_taken,
> because no matter what, the hw is going to activate at P (period)
> and wait for 16 taken branches before delivering the NMI. So if I
> am sampling on cycles with P=1000000, then the NMI is delivered
> at P + X + Z, where X = number of cycles elapsed for the 16 taken
> branches (unpredictable) and Z = the interrupt skid for the NMI (which
> is extremely big on AMD). With retired_branch_taken, that formula
> becomes P + 16 + Z, where Z is the number of taken branches
> during the skid. But given that BRS saturates when full, you do lose
> the content because of the Z skid. My opinion is that we keep the
> lbr_nr subtraction and force the event to be only retired_branch_taken.

OK, can you do me a delta patch and tell me which commit to merge it in?
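
For reference, the period arithmetic from the patch description can be
modelled in user space. This is only a minimal sketch, not the delta patch
asked for above: brs_adjust_period() mirrors the quoted
amd_brs_adjust_period() but takes the depth as a parameter instead of
reading x86_pmu.lbr_nr, and brs_effective_period() plus the BRS_DEPTH
constant are hypothetical names introduced here for illustration. It
assumes the sampling event counts taken branches, so P, D and the skid X
are all in the same unit.

```c
#include <stdint.h>

/* Depth of BRS as stated in the patch description (16 entries). */
#define BRS_DEPTH 16

/*
 * Same logic as the quoted amd_brs_adjust_period(), with the depth
 * passed explicitly so the sketch does not depend on x86_pmu.
 */
static int64_t brs_adjust_period(int64_t period, int64_t depth)
{
	if (period > depth)
		return period - depth;

	return period;
}

/*
 * Hypothetical model of the effective period in taken branches:
 * programmed period (P - D), plus the skid X before BRS activates,
 * plus the D branches needed for BRS to saturate.
 */
static int64_t brs_effective_period(int64_t period, int64_t depth,
				    int64_t skid)
{
	return brs_adjust_period(period, depth) + skid + depth;
}
```

With no skid, brs_effective_period(1000000, BRS_DEPTH, 0) gives back the
requested period. Any non-zero X is added on top, which matches the
(P - D) + X + D expression in the changelog.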

> > But there's also the whole exclusion thing, IIRC you're making it
> > exclusive against other LBR users, but AFAICT having one LBR user active
> > will completely screw over any other sampling event due to introducing
> > these massive skids.
> 
> 
> The skid is not massive compared to the actual skid of regular interrupt-based
> sampling. You are looking at the time it takes to execute 16 taken branches
> vs. 2000+ cycles for the NMI skid.  And this would happen only if the other
> events overflow during that 16 taken branch window.

Wait, you're telling me that regs->ip is 2000 cycles/CPI further along
than the instruction that caused the PMI on AMD? That seems beyond
useless.

That's also not what I seem to remember from the last time I used perf
on AMD (admittedly a while ago). Normally the reported IP is a few
instructions beyond the eventing IP. Yielding the normal perf-annotate
output that's shifted but mostly trivially readable.

However, if you delay that NMI for however many instructions it takes to
do 16 branches, the reported IP (regs->ip) will be completely unrelated
to the eventing IP (the one that actually triggered PMI).

In that case the perf-annotate output becomes really hard to interpret.
Esp. if you don't know which IPs were basically garbage.

One possible work-around might be to discard the sample for any
!retired_branch_taken overflow and reprogram those counters with a very
small (1?) value to 'insta' take a new sample without interference. But
that's yuck too.
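
To make that work-around concrete, here is a rough user-space sketch of the
decision only. Everything in it is hypothetical: struct overflow_decision
and decide_on_overflow() are names made up for this illustration and do not
exist in the kernel, and the sketch assumes the overflow handler can tell
whether BRS held back the NMI and whether the overflowing counter is the
taken-branch event.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of handling one counter overflow. */
struct overflow_decision {
	bool    keep_sample;	/* false: drop the sample, IP is unreliable */
	int64_t next_period;	/* period to program for the next sample */
};

/*
 * Sketch of the work-around: if BRS delayed the NMI and the overflowing
 * counter is not the taken-branch event, the reported IP is unrelated to
 * the eventing IP, so drop the sample and re-arm with a period of 1 to
 * take a fresh sample right away. Otherwise keep the sample and re-arm
 * with the period the user asked for.
 */
static struct overflow_decision
decide_on_overflow(bool brs_delayed_nmi, bool is_branch_taken_event,
		   int64_t user_period)
{
	struct overflow_decision d;

	if (brs_delayed_nmi && !is_branch_taken_event) {
		d.keep_sample = false;
		d.next_period = 1;
	} else {
		d.keep_sample = true;
		d.next_period = user_period;
	}

	return d;
}
```

The sketch leaves out how the period would be restored after the quick
re-sample, and it does not address the case where the re-sample is itself
delayed by BRS.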


Thread overview: 29+ messages
2022-02-08 21:16 [PATCH v6 00/12] perf/x86/amd: Add AMD Fam19h Branch Sampling support Stephane Eranian
2022-02-08 21:16 ` [PATCH v6 01/12] perf/core: add perf_clear_branch_entry_bitfields() helper Stephane Eranian
2022-02-08 21:16 ` [PATCH v6 02/12] x86/cpufeatures: add AMD Fam19h Branch Sampling feature Stephane Eranian
2022-02-08 21:16 ` [PATCH v6 03/12] perf/x86/amd: add AMD Fam19h Branch Sampling support Stephane Eranian
2022-02-08 21:16 ` [PATCH v6 04/12] perf/x86/amd: add branch-brs helper event for Fam19h BRS Stephane Eranian
2022-02-08 21:16 ` [PATCH v6 05/12] perf/x86/amd: enable branch sampling priv level filtering Stephane Eranian
2022-02-08 21:16 ` [PATCH v6 06/12] perf/x86/amd: add AMD branch sampling period adjustment Stephane Eranian
2022-02-09 15:32   ` Peter Zijlstra
2022-03-04 15:45     ` Peter Zijlstra
2022-03-09 23:03       ` Stephane Eranian
2022-03-15 12:08         ` Peter Zijlstra [this message]
2022-03-17 17:11           ` Stephane Eranian
2022-02-08 21:16 ` [PATCH v6 07/12] perf/x86/amd: make Zen3 branch sampling opt-in Stephane Eranian
2022-02-08 21:16 ` [PATCH v6 08/12] ACPI: add perf low power callback Stephane Eranian
2022-02-08 21:16 ` [PATCH v6 09/12] perf/x86/amd: add idle hooks for branch sampling Stephane Eranian
2022-02-08 21:16 ` [PATCH v6 10/12] perf tools: Improve IBS error handling Stephane Eranian
2022-02-09 15:47   ` Arnaldo Carvalho de Melo
2022-03-15  6:49     ` Ravi Bangoria
2022-03-15  2:01   ` Stephane Eranian
2022-03-15  6:23     ` Ravi Bangoria
2022-03-15  7:12       ` Stephane Eranian
2022-03-15  7:45   ` Ravi Bangoria
2022-03-16  0:03     ` Stephane Eranian
2022-03-16 11:07       ` Ravi Bangoria
2022-03-16 11:16         ` Ravi Bangoria
2022-02-08 21:16 ` [PATCH v6 11/12] perf tools: Improve error handling of AMD Branch Sampling Stephane Eranian
2022-02-16 14:17   ` Arnaldo Carvalho de Melo
2022-02-08 21:16 ` [PATCH v6 12/12] perf report: add addr_from/addr_to sort dimensions Stephane Eranian
2022-02-16 14:21   ` Arnaldo Carvalho de Melo
