On Tuesday, 30 October 2018 23:34:35 CET Milian Wolff wrote:
> On Wednesday, 24 October 2018 16:48:18 CET Andi Kleen wrote:
> > > Can someone at least confirm whether unwinding from a function prologue
> > > via .eh_frame (but without .debug_frame) should actually be possible?
> >
> > Yes it should be possible. Asynchronous unwind tables should work
> > from any instruction.

> We can find `7f91345bdaf8+1 = 7f91345bdaf9` at offset 16 (search for
> "f9 da 5b 34 91 7f"). Using that address makes unwinding work for this sample.
> What could be the reason for this shift?

I believe I have found the culprit: PEBS seems to be at fault here, i.e. the
RIP/RSP and the ustack dump of the sample simply don't fit together. Check this
out:

```
$ for i in $(seq 10); do perf record -q -e "cycles:" --call-graph dwarf ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"; done
0
0
0
0
0
0
0
0
0
0
$ for i in $(seq 10); do perf record -q -e "cycles:p" --call-graph dwarf ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"; done
0
0
0
0
0
0
0
0
0
0
$ for i in $(seq 10); do perf record -q -e "cycles:pp" --call-graph dwarf ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"; done
37
39
35
28
40
39
29
37
31
26
$ for i in $(seq 10); do perf record -q -e "cycles:ppp" --call-graph dwarf ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"; done
79
70
76
77
70
90
64
78
86
74
```

Note how precise levels 0 and 1 do not produce any samples where unwinding
fails. Precise level 2 produces some, and precise level 3 roughly doubles that
number (about 76 vs. 34 failures per run on average).

I can currently reproduce this pattern on two separate Intel CPUs and kernel
versions:

Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz with 4.18.16-arch1-1-ARCH
Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 4.14.78-1-lts

Could someone else try this? What about AMD and IBS - is it also affected?
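
In case it helps with reproducing this on a machine without pcre2grep, here is
a minimal sketch of an equivalent counter in plain awk. The function name
`count_failed_unwinds` is made up for illustration, and it only approximates
the multiline regex used above: it counts every line containing `[unknown]`
that directly follows a line containing `hypot_finite`.

```shell
# Hypothetical helper (not part of perf): reads `perf script` output on stdin
# and prints how many times a frame line mentioning hypot_finite is
# immediately followed by a frame line mentioning [unknown].
count_failed_unwinds() {
    awk '
        # Previous line was a hypot_finite frame and this one is unresolved.
        prev_hit && index($0, "[unknown]") { n++ }
        # Remember whether the current line mentions hypot_finite.
        { prev_hit = (index($0, "hypot_finite") > 0) }
        # n + 0 forces a numeric 0 when nothing matched.
        END { print n + 0 }
    '
}
```

It would be used as `perf script | count_failed_unwinds` in place of the
pcre2grep call in the loops above.
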
What about newer/different Intel CPUs? Better yet, can someone come up with a
fix for this on Intel with maximum precise level?

Thanks

-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts