From: Adrian Hunter <adrian.hunter@intel.com>
To: "Steinar H. Gunderson" <sesse@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@redhat.com>, Namhyung Kim <namhyung@kernel.org>,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] perf intel-pt: Synthesize cycle events
Date: Mon, 21 Mar 2022 11:16:56 +0200
Message-ID: <371faf0d-f794-4a2e-0a1c-9d454d7c8b12@intel.com>
In-Reply-To: <YjHfGrZovk3N/H0f@google.com>

On 16.3.2022 14.59, Steinar H. Gunderson wrote:
> On Wed, Mar 16, 2022 at 01:19:46PM +0200, Adrian Hunter wrote:
>>> I guess the good news is that the perf report coming out of your version
>>> looks more likely to me; I have some functions that are around 1% that
>>> shouldn't intuitively be that much (and, if I write some Perl to sum up
>>> the cycles from the IPC lines in perf script, are more around 0.1%).
>>> So perhaps we should stop chasing the difference? I don't know.
>> That doesn't sound right.  I will look at it more closely in the next few days.
> 
> If you need, I can supply the perf.data and binaries, but we're talking
> a couple of gigabytes of data (and I don't know immediately if there's
> an easy way I can package up everything perf.data references) :-)
> 
> /* Steinar */

I had another look at this and it seemed *mostly* OK to me.  One change
I would make is to subject the cycle period to the logic of the 'A' option
(approximate IPC).

So what does the 'A' option do?

By default, IPC is output only when the exact number of cycles and
instructions is known for the sample.  Decoding walks instructions
to reconstruct the control flow, so the exact number of instructions
is always known, but a cycle count (CYC packet) is only produced
together with another packet, that is, only for indirect/async
branches or the first conditional branch of a TNT packet.

Reporting exact IPC makes sense when sampling every branch or
instruction, but makes less sense when sampling less often.

For example with:

$ perf record -e intel_pt/cyc/u uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.218 MB perf.data ]

Sampling every 50us, exact IPC is reported only twice:

$ perf script --itrace=i50us -F+ipc
           uname 2007962 [005] 2426597.185314:      91866 instructions:uH:      7f3feb913deb _dl_relocate_object+0x40b (/usr/lib/x86_64-linux-gnu/ld-2.31.so)
           uname 2007962 [005] 2426597.185353:      21959 instructions:uH:      7f3feb91158f do_lookup_x+0xcf (/usr/lib/x86_64-linux-gnu/ld-2.31.so)
           uname 2007962 [005] 2426597.185670:     129834 instructions:uH:      7f3feb72e05a read_alias_file+0x23a (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
           uname 2007962 [005] 2426597.185709:      39373 instructions:uH:      7f3feb72ed52 _nl_explode_name+0x52 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)
           uname 2007962 [005] 2426597.185947:     137486 instructions:uH:      7f3feb87e5f3 __strlen_avx2+0x13 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)         IPC: 0.88 (420518/472789) 
           uname 2007962 [005] 2426597.186026:      79196 instructions:uH:      7f3feb87e5f3 __strlen_avx2+0x13 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)         IPC: 1.34 (79196/59092) 
           uname 2007962 [005] 2426597.186066:      29855 instructions:uH:      7f3feb78dee6 _int_malloc+0x446 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)

But if we relax the requirement and just use the number of cycles
counted so far, whether it is exactly correct or not, we can get
approximate IPC for every sample:

$ perf script --itrace=i50usA -F+ipc
           uname 2007962 [005] 2426597.185314:      91866 instructions:uH:      7f3feb913deb _dl_relocate_object+0x40b (/usr/lib/x86_64-linux-gnu/ld-2.31.so)    IPC: 0.74 (91866/122744) 
           uname 2007962 [005] 2426597.185353:      21959 instructions:uH:      7f3feb91158f do_lookup_x+0xcf (/usr/lib/x86_64-linux-gnu/ld-2.31.so)     IPC: 0.92 (21959/23822) 
           uname 2007962 [005] 2426597.185670:     129834 instructions:uH:      7f3feb72e05a read_alias_file+0x23a (/usr/lib/x86_64-linux-gnu/libc-2.31.so)      IPC: 0.77 (129834/167753) 
           uname 2007962 [005] 2426597.185709:      39373 instructions:uH:      7f3feb72ed52 _nl_explode_name+0x52 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)      IPC: 1.01 (39373/38881) 
           uname 2007962 [005] 2426597.185947:     137486 instructions:uH:      7f3feb87e5f3 __strlen_avx2+0x13 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)         IPC: 1.14 (137486/119589) 
           uname 2007962 [005] 2426597.186026:      79196 instructions:uH:      7f3feb87e5f3 __strlen_avx2+0x13 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)         IPC: 1.34 (79196/59092) 
           uname 2007962 [005] 2426597.186066:      29855 instructions:uH:      7f3feb78dee6 _int_malloc+0x446 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)          IPC: 1.33 (29855/22282) 
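
The figures in parentheses are instructions/cycles, and the two listings
can be cross-checked with a little arithmetic.  The sketch below is only
an illustration: ipc_hundredths() and sum_counts() are hypothetical
helpers, not perf functions, and the integer truncation is an assumption
inferred from the output above (137486/119589 is about 1.1497 but is
shown as 1.14).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: IPC in hundredths, truncated rather than rounded.
 * Returns 0 when there are no cycles to divide by.
 */
static uint64_t ipc_hundredths(uint64_t insn_cnt, uint64_t cyc_cnt)
{
	return cyc_cnt ? insn_cnt * 100 / cyc_cnt : 0;
}

/* Instruction and cycle counts of the first five samples in the 'A' output */
static const uint64_t approx_insn[] = { 91866, 21959, 129834, 39373, 137486 };
static const uint64_t approx_cyc[]  = { 122744, 23822, 167753, 38881, 119589 };

/* Sum n counts */
static uint64_t sum_counts(const uint64_t *v, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += v[i];
	return total;
}
```

Summing the five approximate intervals gives 420518 instructions and
472789 cycles, which is exactly the first exact-IPC line in the listing
without 'A'.  So without 'A' the cycles are not lost, they are carried
forward until the next point where the count is exact.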


So the cycle sample function looks like this:

static int intel_pt_synth_cycle_sample(struct intel_pt_queue *ptq)
{
	struct intel_pt *pt = ptq->pt;
	union perf_event *event = ptq->event_buf;
	struct perf_sample sample = { .ip = 0, };
	u64 period = 0;

	/* Same gating as IPC, so the 'A' option applies to the period too */
	if (ptq->sample_ipc)
		period = ptq->ipc_cyc_cnt - ptq->last_cy_cyc_cnt;

	/* No cycles to report yet: leave the last_cy_* counters untouched */
	if (!period || intel_pt_skip_event(pt))
		return 0;

	intel_pt_prep_sample(pt, ptq, event, &sample);

	sample.id = ptq->pt->cycles_id;
	sample.stream_id = ptq->pt->cycles_id;
	sample.period = period;

	/* Cycles and instructions since the previous cycle sample */
	sample.cyc_cnt = period;
	sample.insn_cnt = ptq->ipc_insn_cnt - ptq->last_cy_insn_cnt;
	ptq->last_cy_insn_cnt = ptq->ipc_insn_cnt;
	ptq->last_cy_cyc_cnt = ptq->ipc_cyc_cnt;

	return intel_pt_deliver_synth_event(pt, event, &sample, pt->cycles_sample_type);
}
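
To make the bookkeeping easier to follow outside the perf tree, here is
a stand-alone model of the same delta logic.  It is a sketch under
assumptions: struct cyc_state, struct cyc_sample and
synth_cycle_sample() are hypothetical stand-ins for the fields and
function above, and the intel_pt_skip_event() check and event delivery
are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the counters kept in struct intel_pt_queue */
struct cyc_state {
	bool sample_ipc;           /* IPC (exact or 'A') allowed for this sample */
	uint64_t ipc_cyc_cnt;      /* cycle count so far */
	uint64_t ipc_insn_cnt;     /* instruction count so far */
	uint64_t last_cy_cyc_cnt;  /* counts at the previous cycle sample */
	uint64_t last_cy_insn_cnt;
};

/* What a synthesized cycle sample would carry in this model */
struct cyc_sample {
	uint64_t period;           /* cycles since the previous cycle sample */
	uint64_t insn_cnt;         /* instructions since the previous cycle sample */
};

/* Returns true and fills *out if a cycle sample would be emitted */
static bool synth_cycle_sample(struct cyc_state *st, struct cyc_sample *out)
{
	uint64_t period = 0;

	if (st->sample_ipc)
		period = st->ipc_cyc_cnt - st->last_cy_cyc_cnt;

	/* Nothing to report: the last_cy_* counters are left alone */
	if (!period)
		return false;

	out->period = period;
	out->insn_cnt = st->ipc_insn_cnt - st->last_cy_insn_cnt;
	st->last_cy_cyc_cnt = st->ipc_cyc_cnt;
	st->last_cy_insn_cnt = st->ipc_insn_cnt;
	return true;
}
```

Because the last_cy_* counters only advance when a sample is emitted,
cycles seen while sample_ipc is false are folded into the period of the
next emitted sample rather than dropped.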


With regard to the results you got with perf report, please try:

	perf report --itrace=y0nse --show-total-period --stdio

and see if the percentages and cycle counts for rarely executed
functions make more sense.
