From: Stephane Eranian <eranian@google.com>
To: Robert Richter <robert.richter@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 06/12] perf/x86-ibs: Precise event sampling with IBS for AMD CPUs
Date: Fri, 27 Apr 2012 15:10:22 +0200
Message-ID: <CABPqkBTS2NkFqjVT9HgzPT=TTo9oOc__H9ZDDbXJK7WRpO0Vgg@mail.gmail.com>
In-Reply-To: <20120427125410.GG18810@erda.amd.com>

Robert,

I did not follow the entire discussion, but based on your initial
post:

> perf record -a -e cpu-cycles:p ...    # use ibs op counting cycle count
> perf record -a -e r076:p ...          # same as -e cpu-cycles:p
> perf record -a -e r0C1:p ...          # use ibs op counting micro-ops
>
> Each IBS sample contains a linear address that points to the
> instruction that was causing the sample to trigger. With ibs we have
> skid 0.
>
> Though the skid is 0, we map IBS sampling to following precise levels:
>
>  1: RIP taken from IBS sample or (if invalid) from stack.

I assume by stack you mean pt_regs, right?

>  2: RIP always taken from IBS sample, samples with an invalid rip
>     are dropped. Thus samples of an event containing two precise
>     modifiers (e.g. r076:pp) only contain (precise) addresses
>     detected with IBS.
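
For readers following along, here is a rough sketch of how the event
strings above relate to the precise levels: the hex digits after "r"
give the raw event code and each "p" raises the requested level by one.
The struct raw_event and parse_raw_event() names are made up for this
illustration. This is not the perf tool's parser, and it rejects every
modifier other than "p".

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container for what the sketch extracts from a spec. */
struct raw_event {
	uint64_t config;     /* raw event code, e.g. 0x76 or 0xC1 */
	int      precise_ip; /* number of 'p' modifiers, 0..3 */
};

/*
 * Parse a spec such as "r076:pp". Returns 0 on success, -1 on error.
 * Only the "r<hex>" event form and the 'p' modifier are handled.
 */
static int parse_raw_event(const char *spec, struct raw_event *ev)
{
	char *end;

	if (spec[0] != 'r')
		return -1;

	ev->config = strtoull(spec + 1, &end, 16);
	if (end == spec + 1)
		return -1;          /* no hex digits after 'r' */

	ev->precise_ip = 0;
	if (*end == '\0')
		return 0;           /* no modifiers at all */
	if (*end != ':')
		return -1;

	for (end++; *end != '\0'; end++) {
		if (*end != 'p')
			return -1;  /* modifier not covered by this sketch */
		ev->precise_ip++;
	}

	/* precise_ip is documented for levels 0 through 3 only. */
	return ev->precise_ip <= 3 ? 0 : -1;
}
```

With this, "r076:p" gives config 0x76 at level 1 and "r076:pp" gives
the same config at level 2, which are the two cases discussed here.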

I don't think you need the distinction between 1 and 2. You can
always use the pt_regs IP as a fallback. You can mark that the
IP is precise with the PERF_RECORD_MISC_EXACT_IP flag in the sample
header. This is how it's done with PEBS. What's wrong with that?
It may actually be better than silently dropping samples, since
dropping them may introduce some bias.
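
To make the suggestion concrete, here is a minimal sketch of the
fallback-and-flag idea. It is not the actual patch. The struct
ibs_sample, struct sample_out, pick_ip() and keep_sample() names are
hypothetical; only the flag value (1 << 14) matches
PERF_RECORD_MISC_EXACT_IP in <linux/perf_event.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as PERF_RECORD_MISC_EXACT_IP in <linux/perf_event.h>. */
#define MISC_EXACT_IP (1u << 14)

/* Hypothetical, simplified view of one IBS sample. */
struct ibs_sample {
	uint64_t ibs_rip;   /* linear address reported by IBS */
	bool     rip_valid; /* did IBS mark that address as valid? */
};

/* Hypothetical result: the IP to report plus the header misc bits. */
struct sample_out {
	uint64_t ip;
	uint16_t misc;
};

/*
 * Always emit a sample: use the IBS address when it is valid and flag
 * it as exact, otherwise fall back to the interrupted IP (pt_regs).
 */
static struct sample_out pick_ip(const struct ibs_sample *s,
				 uint64_t regs_ip)
{
	struct sample_out out = { .ip = regs_ip, .misc = 0 };

	if (s->rip_valid) {
		out.ip = s->ibs_rip;
		out.misc |= MISC_EXACT_IP;
	}
	return out;
}

/* Consumer-side policy: optionally discard samples without the flag. */
static bool keep_sample(uint16_t misc, bool precise_only)
{
	return !precise_only || (misc & MISC_EXACT_IP) != 0;
}
```

The point is that nothing is dropped at the source. A consumer that
wants precise addresses only filters on the flag, and one that wants
every overflow keeps everything.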


On Fri, Apr 27, 2012 at 2:54 PM, Robert Richter <robert.richter@amd.com> wrote:
> On 27.04.12 14:39:21, Stephane Eranian wrote:
>> On Fri, Apr 27, 2012 at 2:34 PM, Robert Richter <robert.richter@amd.com> wrote:
>> > On 23.04.12 11:56:59, Robert Richter wrote:
>> >> On 14.04.12 12:21:46, Peter Zijlstra wrote:
>> >> > On Mon, 2012-04-02 at 20:19 +0200, Robert Richter wrote:
>> >> > > + * We map IBS sampling to following precise levels:
>> >> > > + *
>> >> > > + *  1: RIP taken from IBS sample or (if invalid) from stack
>> >> > > + *  2: RIP always taken from IBS sample, samples with an invalid rip
>> >> > > + *     are dropped. Thus samples of an event containing two precise
>> >> > > + *     modifiers (e.g. r076:pp) only contain (precise) addresses
>> >> > > + *     detected with IBS.
>> >> >
>> >> >             /*
>> >> >              * precise_ip:
>> >> >              *
>> >> >              *  0 - SAMPLE_IP can have arbitrary skid
>> >> >              *  1 - SAMPLE_IP must have constant skid
>> >> >              *  2 - SAMPLE_IP requested to have 0 skid
>> >> >              *  3 - SAMPLE_IP must have 0 skid
>> >> >              *
>> >> >              *  See also PERF_RECORD_MISC_EXACT_IP
>> >> >              */
>> >> >
>> >> > your 1 doesn't have constant skid. I would suggest only supporting 2 and
>> >> > letting userspace drop !PERF_RECORD_MISC_EXACT_IP records if so desired.
>> >>
>> >> Ah, didn't notice the PERF_RECORD_MISC_EXACT_IP flag. Will set this
>> >> flag for precise events.
>> >
>> Why not use 2? IBS has 0 skid, unless I am mistaken.
>
> Events with r076:p would fail then. But r076:pp is actually better and
> a subset of level 1. Thus both levels should work.
>
> And there is still the question how samples with imprecise rip should
> be handled. Sometimes we want to get all samples and sometimes all
> samples should always contain a precise rip, other samples should be
> dropped then. But there is no option or modifier for this yet.
>
> My suggestion was to use level 1 for all samples and level 2 for
> samples that only contain a precise rip, saving level 3 for future
> use.
>
> -Robert
>
>>
>> > Peter,
>> >
>> > I have a patch on top that implements the support of the
>> > PERF_RECORD_MISC_EXACT_IP flag. But I am not quite sure about how to
>> > use the precise levels. What do you suggest?
>> >
>> > Thanks,
>> >
>> > -Robert
>> >
>> >>
>> >> Problem is that this flag is not yet well supported, only perf-top
>> >> uses it to count the total number of exact samples. Esp. perf-annotate
>> >> and perf-report do not support it, and there are no modifiers to
>> >> select precise-only sampling (or is this level 3?).
>> >>
>> >> Both might be useful: You might need only precise-rip samples (perf-
>> >> annotate usage), on the other side you want samples with every
>> >> clock/ops count overflow (e.g. to get a counting statistic). The
>> >> p-modifier specification (see perf-list) is not sufficient to select
>> >> both of it.
>> >>
>> >> Another question I have: Isn't precise level 2 a special case of level
>> >> 1 where the skid is constant and 0? The problem I see is, if people
>> >> want to measure precise rip, they simply use r076:p. Level 2 (r076:pp)
>> >> is actually better than 1, but they might think not to be able to
>> >> sample precise-rip if we throw an error for r076:p. Thus, I would
>> >> prefer to also allow level 1.
>> >>
>> >> > That said, mixing the IBS pmu into the regular core pmu isn't exactly
>> >> > pretty..
>> >>
>> >> IBS is currently the only way to do precise-rip sampling on amd cpus.
>> >> IBS events fit well with its corresponding perfctr events (0x76/
>> >> 0xc1). So what don't you like with this approach? I will also post IBS
>> >> perf tool support where IBS can be directly used.
>> >>
>> >> -Robert
>> >>
>> >> --
>> >> Advanced Micro Devices, Inc.
>> >> Operating System Research Center

