* Perf call graphs on ARM
@ 2021-04-14 13:22 Fontius Sebastian (XC-DA/ESV9)
  2021-04-14 13:38 ` Arnaldo Carvalho de Melo
  2021-04-16 11:16 ` James Clark
  0 siblings, 2 replies; 4+ messages in thread
From: Fontius Sebastian (XC-DA/ESV9) @ 2021-04-14 13:22 UTC
  To: linux-perf-users

Hi everyone,

we are having trouble getting perf to output a call graph using frame pointers on a Raspberry Pi 2 Model B Revision 1.1 (ARM Cortex-A7) running Raspbian with its official kernels, versions 4.9 and 5.10.

Let me illustrate the problem with a small test program, aaa.cpp:

#include <iostream>
#include <cmath>
__attribute__((noinline)) double G(double aaa) {
  return sqrt(aaa);
}
__attribute__((noinline)) double doit(double aaa) {
  for(int i=0; i<1000; i++)
    aaa=G(aaa);
  return aaa;
}
int main() {
  double aaa = 12;
  for(int i=0; i<10000; i++){
    aaa = doit(aaa);
    for(int j=0; j<1000; j++)
      aaa++;
  }
  std::cout << aaa << std::endl;
}

We compile this program with GCC (Raspbian 8.3.0-6+rpi1) as follows:

g++ -O2 -fno-omit-frame-pointer aaa.cpp

Then we run perf on it like this:

perf record -e cycles --call-graph fp -- ./a.out
perf report

We would expect perf's output to contain a call tree like the following:

main
  + doit
    + G

Instead, all of these functions appear attached directly to a.out (although the runtimes are reported correctly).

The frame pointers do seem to be emitted into the binary, but they are not being used for the call graph. We can see them in the assembly output generated by compiling with -save-temps:

g++ -O2 -fno-omit-frame-pointer -save-temps aaa.cpp

This gives the following output for the doit() function:

_Z4doitd:
	.fnstart
.LFB1758:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 1, uses_anonymous_args = 0
	vmov.f64 d7, d0
	push {r4, r5, fp, lr}
	mov r4, #1000
	add fp, sp, #12
.L7:
	vmov.f64 d0, d7
	bl _Z1Gd
	subs r4, r4, #1
	vmov.f64 d7, d0
	bne .L7
	pop {r4, r5, fp, pc}
	.cantunwind
	.fnend

In addition to frame pointers, we also tried the DWARF format (--call-graph dwarf), but it has drawbacks: it uses roughly 20x the space of the FP format and is much slower to record. The recording also seems unstable and can hang the whole Raspberry Pi, requiring a hard reset. On kernel 5.10, the DWARF format also showed the same 'disconnected' call stack (i.e., all functions directly below a.out).

We also tried Ubuntu 20.04 with its kernel 5.4, but there both the FP and DWARF call graphs were 'disconnected' as well.

We're at a loss as to what is going wrong here. Does anyone have an idea what we could try to debug this further, or even to understand the problem?

Best regards

Sebastian Fontius

Chassis Systems Control, Image Processing 9 (XC-DA/ESV9)
Robert Bosch GmbH | Postfach 16 61 | 71226 Leonberg | GERMANY | www.bosch.com

Registered office: Stuttgart; Registration court: Amtsgericht Stuttgart, HRB 14000;
Chairman of the Supervisory Board: Franz Fehrenbach; Board of Management: Dr. Volkmar Denner,
Prof. Dr. Stefan Asenkerschbaumer, Filiz Albrecht, Dr. Michael Bolle, Dr. Christian Fischer,
Dr. Stefan Hartung, Dr. Markus Heyn, Harald Kröger, Rolf Najork, Uwe Raschke




Thread overview: 4+ messages
2021-04-14 13:22 Perf call graphs on ARM Fontius Sebastian (XC-DA/ESV9)
2021-04-14 13:38 ` Arnaldo Carvalho de Melo
2021-04-16  8:19   ` Fontius Sebastian (XC-DA/ESV9)
2021-04-16 11:16 ` James Clark
