* perf not showing me call graph for memcpy no matter what
@ 2021-12-16 13:37 Timothy Miller
  2021-12-16 13:54 ` Timothy Miller
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Timothy Miller @ 2021-12-16 13:37 UTC (permalink / raw)
  To: linux-perf-users

Hi,

I am doing some software profiling on an aarch64 system, and I’m using the Linux perf tool. The problem I’m running into is that “__GI___memcpy_simd” keeps showing up as the function with the most CPU usage.

Unfortunately, no matter what I do, this function keeps showing up as orphaned. That is, I cannot get a stack trace for it so I can find out who is calling it.

I have tried using dwarf mode, but it always gets overloaded.

I have tried using lbr mode, but I get the following error:
   Error:
   PMU Hardware doesn't support sampling/overflow-interrupts.

I’ve rebuilt my application and all relevant libraries with -fno-omit-frame-pointer so that I could use the default frame pointer mode. Unfortunately, I still can’t get a call graph for this function.

I emailed the glibc mailing list about this, trying to find out how to work around this problem, perhaps adding frame pointer to the assembly implementation of memcpy. They suggested I try attaching a debugger, and I’ve found that I can get stack traces just fine. They suggest that I seem to be running into some kind of bug in perf. 

Any help/advice would be appreciated.

Thanks.



* Re: perf not showing me call graph for memcpy no matter what
From: Timothy Miller @ 2021-12-16 13:54 UTC (permalink / raw)
  To: linux-perf-users

Sorry about the duplicate. Majordomo kept throwing errors when I would try to join or verify. It really didn’t like the email address I tried to use before, so I assumed that it wasn’t going to accept the post either.


* Re: perf not showing me call graph for memcpy no matter what
From: James Clark @ 2021-12-16 17:23 UTC (permalink / raw)
  To: Timothy Miller, linux-perf-users; +Cc: German Gomez



On 16/12/2021 13:37, Timothy Miller wrote:
> Hi,
> 
> I am doing some software profiling on an aarch64 system, and I’m using the Linux perf tool. The problem I’m running into is that “__GI___memcpy_simd” keeps showing up as the function with the most CPU usage.
>
> Unfortunately, no matter what I do, this function keeps showing up as orphaned. That is, I cannot get a stack trace for it so I can find out who is calling it.

Hi Timothy,

Do you have a full reproducer? Maybe one that works in a fresh docker container, and with what you expect to see vs what you actually see.

What version of perf and libunwind are you using?

 
> I have tried using dwarf mode, but it always gets overloaded.

Not sure what you mean by overloaded.

> 
> I have tried using lbr mode, but I get the following error:
>    Error:
>    PMU Hardware doesn't support sampling/overflow-interrupts.
> 
> I’ve rebuilt my application and all relevant libraries with -fno-omit-frame-pointer so that I could use the default frame pointer mode. Unfortunately, I still can’t get a call graph for this function.

Even with that option, the compiler will still omit the frame pointer where it sees no need for one,
for example in a leaf function. That doesn't sound like exactly your issue, though.

You could try the patch series "[PATCH v4 0/6] Fix missing leaf-function callers when recording", which is
currently on the mailing list, but that will only recover the caller of the last (leaf) function. It sounds
like you have more frames missing?
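
To illustrate the leaf-function case, here is a minimal C sketch. It assumes a simplified AArch64 layout in which each frame record holds the caller's frame pointer and the saved link register, and it models return addresses as function names. The names `frame_record`, `fp_unwind` and `fp_unwind_with_lr` are hypothetical; this models the idea only and is not perf's implementation.

```c
#include <stddef.h>

/* Simplified model of an AArch64 frame record: the caller's frame
 * pointer followed by the saved link register (return address).
 * Return addresses are modelled as function names for readability. */
struct frame_record {
    const struct frame_record *prev_fp; /* caller's frame record, or NULL */
    const char *saved_lr;               /* function this frame returns into */
};

/* Walk the frame-pointer chain: report the sampled pc first, then the
 * saved lr of every frame record reachable from fp.
 * Returns the number of entries written to out. */
size_t fp_unwind(const char *pc, const struct frame_record *fp,
                 const char **out, size_t max)
{
    size_t n = 0;

    if (n < max)
        out[n++] = pc;
    while (fp != NULL && n < max) {
        out[n++] = fp->saved_lr;
        fp = fp->prev_fp;
    }
    return n;
}

/* Same walk, but the live link register is reported right after pc.
 * This is only correct when the sampled function has not pushed its own
 * frame record, so the caller must already know that it is a leaf. */
size_t fp_unwind_with_lr(const char *pc, const char *lr,
                         const struct frame_record *fp,
                         const char **out, size_t max)
{
    size_t n = 0;

    if (n < max)
        out[n++] = pc;
    if (n < max)
        out[n++] = lr;
    while (fp != NULL && n < max) {
        out[n++] = fp->saved_lr;
        fp = fp->prev_fp;
    }
    return n;
}
```

Take a chain `main` -> `process_buffer` -> `memcpy` in which `memcpy` pushes no frame record. The frame pointer then still points at the record of `process_buffer`, whose saved link register returns into `main`, so `fp_unwind` yields `memcpy`, `main`, `_start` and skips `process_buffer`. Passing the live link register to `fp_unwind_with_lr` restores the missing caller.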

> 
> I emailed the glibc mailing list about this, trying to find out how to work around this problem, perhaps adding frame pointer to the assembly implementation of memcpy. They suggested I try attaching a debugger, and I’ve found that I can get stack traces just fine. They suggest that I seem to be running into some kind of bug in perf. 
> 
> Any help/advice would be appreciated.
> 
> Thanks.
> 
> 

Thanks
James


* Re: perf not showing me call graph for memcpy no matter what
From: Milian Wolff @ 2021-12-17  9:16 UTC (permalink / raw)
  To: linux-perf-users, Timothy Miller


On Thursday, 16 December 2021 14:37:03 CET Timothy Miller wrote:
> Hi,
> 
> I am doing some software profiling on an aarch64 system, and I’m using the
> Linux perf tool. The problem I’m running into is that “__GI___memcpy_simd”
> keeps showing up as the function with the most CPU usage.
> 
> Unfortunately, no matter what I do, this function keeps showing up as
> orphaned. That is, I cannot get a stack trace for it so I can find out who
> is calling it.
> 
> I have tried using dwarf mode, but it always gets overloaded.
> 
> I have tried using lbr mode, but I get the following error:
>    Error:
>    PMU Hardware doesn't support sampling/overflow-interrupts.
> 
> I’ve rebuilt my application and all relevant libraries with
> -fno-omit-frame-pointer so that I could use the default frame pointer mode.
> Unfortunately, I still can’t get a call graph for this function.
> 
> I emailed the glibc mailing list about this, trying to find out how to work
> around this problem, perhaps adding frame pointer to the assembly
> implementation of memcpy. They suggested I try attaching a debugger, and
> I’ve found that I can get stack traces just fine. They suggest that I seem
> to be running into some kind of bug in perf.
> 
> Any help/advice would be appreciated.

Hey Timothy,

I haven't followed upstream perf development closely recently, but a few years
ago perf lacked support for cross-machine unwinding between arm and x86. Maybe
that's the issue you are running into?

Can you try the AppImage of hotspot available at [1] and see if that one works
when you pass the right sysroot and potentially other flags? See [2] and
`hotspot --help` for more information.

Hotspot uses a different unwinding mechanism and, at least in the past, was
better at unwinding `perf.data` files recorded on arm when analyzing them on an
x86 machine.

One way or another, it would be great if you could share an MWE, i.e. the
`perf.data` file together with the `perf archive` output for a minimal example
that just calls `memcpy` in a loop.

Cheers

[1]: https://github.com/KDAB/hotspot/releases/tag/continuous
[2]: https://github.com/KDAB/hotspot#embedded-systems

-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts


* Re: perf not showing me call graph for memcpy no matter what
From: Milian Wolff @ 2021-12-18 12:47 UTC (permalink / raw)
  To: Timothy Miller; +Cc: linux-perf-users


On Friday, 17 December 2021 16:36:36 CET Timothy Miller wrote:
> > On Dec 17, 2021, at 4:16 AM, Milian Wolff <milian.wolff@kdab.com> wrote:
> > 
> > 
> > Can you try the AppImage of hotspot available at [1] and see if that one
> > works when you pass the right sysroot and potentially other flags, see
> > [2] and `hotspot --help` for more information.
> 
> Is this hotspot just a GUI front-end to perf? Or is it more?

It's just a GUI front end, i.e. a replacement for `perf report`.

> Is this appimage built for aarch64?

No.

> I don’t have a Linux desktop environment set up. I’m just using ssh to
> access remote servers.

You could try a simple VM with any run-of-the-mill Linux desktop distribution
installed.

> Can I use hotspot without a GUI?

No.

