* LBR unwinding for user defined dynamic trace point
@ 2015-09-15 13:12 Milian Wolff
  2015-09-15 14:02 ` Andi Kleen
  0 siblings, 1 reply; 6+ messages in thread
From: Milian Wolff @ 2015-09-15 13:12 UTC (permalink / raw)
  To: linux-perf-users

Hey all,

I just tried to use LBR with a user defined dynamic trace point and get a 
strange error message:

$ sudo perf probe -x /lib64/libc.so.6 malloc
$ sudo perf record -e probe_libc:malloc --call-graph lbr -F 1000 ./foo
Error:
The sys_perf_event_open() syscall returned with 95 (Operation not supported) 
for event (probe_libc:malloc).
/bin/dmesg may provide additional information.
No CONFIG_PERF_EVENTS=y kernel support configured?

When I do the same with the Dwarf unwinder, it works just fine. Using the LBR 
unwinder with a performance counter like instructions also works fine. Does 
anyone know what the issue is here?
-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

* Re: LBR unwinding for user defined dynamic trace point
  2015-09-15 13:12 LBR unwinding for user defined dynamic trace point Milian Wolff
@ 2015-09-15 14:02 ` Andi Kleen
  2015-09-16  9:55   ` Milian Wolff
  0 siblings, 1 reply; 6+ messages in thread
From: Andi Kleen @ 2015-09-15 14:02 UTC (permalink / raw)
  To: Milian Wolff, 5A; +Cc: linux-perf-users

Milian Wolff <milian.wolff@kdab.com> writes:
>
> When I do the same with the Dwarf unwinder, it works just fine. Using the LBR 
> unwinder with a performance counter like instructions also works fine. Does 
> anyone know what the issue is here?

LBR is only supported for PMU sampling at this point.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only
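
The "No CONFIG_PERF_EVENTS=y kernel support configured?" hint in the error output is generic. As the reply above indicates, the limitation here is in LBR support, not in the kernel configuration. A minimal sketch for ruling out the config hint follows; the helper name `kernel_has_perf_events` is hypothetical, and the location of the kernel config file is distribution-dependent and assumed, not taken from the thread.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given kernel config file
# enables CONFIG_PERF_EVENTS. Reads gzip-compressed files (such as
# /proc/config.gz) as well as plain-text ones.
kernel_has_perf_events() {
    cfg="$1"
    case "$cfg" in
        *.gz) gzip -dc "$cfg" ;;
        *)    cat "$cfg" ;;
    esac 2>/dev/null | grep -q '^CONFIG_PERF_EVENTS=y$'
}
```

On many distributions the config lives at /boot/config-$(uname -r), so a check could look like `kernel_has_perf_events "/boot/config-$(uname -r)" && echo "perf events enabled"`. If that succeeds, the error 95 above is not caused by a missing CONFIG_PERF_EVENTS.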

* Re: LBR unwinding for user defined dynamic trace point
  2015-09-15 14:02 ` Andi Kleen
@ 2015-09-16  9:55   ` Milian Wolff
  2015-09-16 18:56     ` Andi Kleen
  0 siblings, 1 reply; 6+ messages in thread
From: Milian Wolff @ 2015-09-16  9:55 UTC (permalink / raw)
  To: Andi Kleen; +Cc: 5A, linux-perf-users

On Tuesday, September 15, 2015 7:02:37 AM CEST Andi Kleen wrote:
> Milian Wolff <milian.wolff@kdab.com> writes:
> > When I do the same with the Dwarf unwinder, it works just fine. Using the
> > LBR unwinder with a performance counter like instructions also works
> > fine. Does anyone know what the issue is here?
> 
> LBR is only supported for PMU sampling at this point.

OK, thanks. Can you give me some more information on why that is? Is it 
fundamentally not possible, or simply not yet implemented?

Thanks
-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

* Re: LBR unwinding for user defined dynamic trace point
  2015-09-16  9:55   ` Milian Wolff
@ 2015-09-16 18:56     ` Andi Kleen
  2016-04-07 15:05       ` Milian Wolff
  0 siblings, 1 reply; 6+ messages in thread
From: Andi Kleen @ 2015-09-16 18:56 UTC (permalink / raw)
  To: Milian Wolff; +Cc: Andi Kleen, 5A, linux-perf-users

On Wed, Sep 16, 2015 at 11:55:22AM +0200, Milian Wolff wrote:
> On Tuesday, September 15, 2015 7:02:37 AM CEST Andi Kleen wrote:
> > Milian Wolff <milian.wolff@kdab.com> writes:
> > > When I do the same with the Dwarf unwinder, it works just fine. Using the
> > > LBR unwinder with a performance counter like instructions also works
> > > fine. Does anyone know what the issue is here?
> > 
> > LBR is only supported for PMU sampling at this point.
> 
> OK, thanks. Can you give me some more information on why that is? Is it 
> fundamentally not possible, or simply not yet implemented?

For LBR callgraph it could be implemented. For some other LBR usages
there would be limitations, as there is no LBR freezing for software
trace points.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: LBR unwinding for user defined dynamic trace point
  2015-09-16 18:56     ` Andi Kleen
@ 2016-04-07 15:05       ` Milian Wolff
  2016-04-07 15:55         ` Andi Kleen
  0 siblings, 1 reply; 6+ messages in thread
From: Milian Wolff @ 2016-04-07 15:05 UTC (permalink / raw)
  To: Andi Kleen; +Cc: 5A, linux-perf-users

On Wednesday, September 16, 2015 8:56:14 PM CEST Andi Kleen wrote:
> On Wed, Sep 16, 2015 at 11:55:22AM +0200, Milian Wolff wrote:
> > On Tuesday, September 15, 2015 7:02:37 AM CEST Andi Kleen wrote:
> > > Milian Wolff <milian.wolff@kdab.com> writes:
> > > > When I do the same with the Dwarf unwinder, it works just fine. Using
> > > > the
> > > > LBR unwinder with a performance counter like instructions also works
> > > > fine. Does anyone know what the issue is here?
> > > 
> > > LBR is only supported for PMU sampling at this point.
> > 
> > OK, thanks. Can you give me some more information on why that is? Is it
> > fundamentally not possible, or simply not yet implemented?
> 
> For LBR callgraph it could be implemented. For some other LBR usages
> there would be limitations, as there is no LBR freezing for software
> trace points.

Hey Andi,

I read your articles on LBR on LWN, thanks a lot for that! I want to revive 
this older thread though:

Could you tell me what would be required to get LBR callgraphs supported for 
trace points? I noticed that it also does not work with static tracepoints or 
software events:

~~~~~~~~~~~~~~~~~~~~~~~
$ perf record -e faults --call-graph lbr ls
Error:
The sys_perf_event_open() syscall returned with 95 (Operation not supported) 
for event (faults).
/bin/dmesg may provide additional information.
No CONFIG_PERF_EVENTS=y kernel support configured?

$ perf record -e raw_syscalls:sys_enter --call-graph lbr ls
Error:
The sys_perf_event_open() syscall returned with 95 (Operation not supported) 
for event (raw_syscalls:sys_enter).
/bin/dmesg may provide additional information.
No CONFIG_PERF_EVENTS=y kernel support configured?
~~~~~~~~~~~~~~~~~~~~~~~

It does work fine for hardware events though:

~~~~~~~~~~~~~~~~~~~~~~~
$ perf record -e cache-misses --call-graph lbr ls
WARNING: No sample_id_all support, falling back to unordered processing
...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (6 samples) ]
~~~~~~~~~~~~~~~~~~~~~~~

Due to the reasons and advantages you point out in your article for LBR 
callgraphs, I'd love to see it supported for the above use-cases as well, if 
possible. If you can guide me, and it does not require me to patch the kernel 
itself, then I'm also willing to come up with a patch for perf.

Thanks
-- 
Milian Wolff | milian.wolff@kdab.com | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

* Re: LBR unwinding for user defined dynamic trace point
  2016-04-07 15:05       ` Milian Wolff
@ 2016-04-07 15:55         ` Andi Kleen
  0 siblings, 0 replies; 6+ messages in thread
From: Andi Kleen @ 2016-04-07 15:55 UTC (permalink / raw)
  To: Milian Wolff; +Cc: Andi Kleen, linux-perf-users

> Could you tell me what would be required to get LBR callgraphs supported for 
> trace points? I noticed that it also does not work with static tracepoints or 
> software events:

The perf trace point handler code would need to know how to read the LBR
registers and dump them into the trace stream. Right now that's only
implemented for the PMU overflow code. In principle it is not very
complicated, see
arch/x86/kernel/cpu/perf_event_intel_lbr.c:intel_pmu_lbr_read()
Basically, the trace point handler would need to call that, but save the
information somewhere other than the cpuc, and then dump it into the
trace buffer.

For kernel LBRs it would also need to stop/restart them to avoid
pollution, but for the case you're likely interested in (LBR with ring 3
filter for user programs), that's not needed.

> Due to the reasons and advantages you point out in your article for LBR 
> callgraphs, I'd love to see it supported for the above use-cases as well, if 
> possible. If you can guide me, and it does not require me to patch the kernel 
> itself, then I'm also willing to come up with a patch for perf.

It requires patching the kernel.

-Andi
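
Until such a kernel change exists, a script can fall back to the DWARF unwinder, which the original post reports as working for the probe. The sketch below tries an LBR call graph for an event and falls back to DWARF when `perf record` refuses it. The function names and the `PERF` variable are hypothetical. Note that any failure of `perf record` (including a permissions problem) is treated as "LBR not supported", which is an assumption made for brevity.

```shell
#!/bin/sh
# Path to the perf binary; overridable, e.g. for testing (assumption).
PERF="${PERF:-perf}"

# Succeeds if `perf record` accepts an LBR call graph for the event.
# The recording of the trivial `true` workload goes to /dev/null.
event_supports_lbr() {
    "$PERF" record -e "$1" --call-graph lbr -o /dev/null -- true >/dev/null 2>&1
}

# Prints "lbr" when the event accepts LBR call graphs, else "dwarf".
pick_callgraph_mode() {
    if event_supports_lbr "$1"; then
        echo lbr
    else
        echo dwarf
    fi
}
```

A possible use: `perf record -e faults --call-graph "$(pick_callgraph_mode faults)" ls`. On the systems described in this thread, that would be expected to select DWARF for software events and trace points and LBR for hardware events such as cache-misses.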
