On Friday, 2 November 2018 12:26:35 CET Jiri Olsa wrote:
> On Thu, Nov 01, 2018 at 11:08:18PM +0100, Milian Wolff wrote:
> > On Tuesday, 30 October 2018 23:34:35 CET Milian Wolff wrote:
> > > On Wednesday, 24 October 2018 16:48:18 CET Andi Kleen wrote:
> > > > > Can someone at least confirm whether unwinding from a function
> > > > > prologue via .eh_frame (but without .debug_frame) should actually
> > > > > be possible?
> > > >
> > > > Yes it should be possible. Asynchronous unwind tables should work
> > > > from any instruction.
> > >
> > > We can find `7f91345bdaf8+1 = 7f91345bdaf9` at offset 16 (search for
> > > "f9 da 5b 34 91 7f"). Using that address makes unwinding work for this
> > > sample. What could be the reason for this shift?
> >
> > I believe I have found the culprit: PEBS seems to be at fault here - i.e.
> > the RIP/RSP and the ustack dump of the sample simply don't fit together.
> >
> > Check this out:
> >
> > ```
> > $ for i in $(seq 10); do perf record -q -e "cycles:" --call-graph dwarf \
> >       ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M \
> >       "hypot_finite.*\n.*\[unknown\]"; done
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> >
> > $ for i in $(seq 10); do perf record -q -e "cycles:p" --call-graph dwarf \
> >       ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M \
> >       "hypot_finite.*\n.*\[unknown\]"; done
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> >
> > $ for i in $(seq 10); do perf record -q -e "cycles:pp" --call-graph dwarf \
> >       ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M \
> >       "hypot_finite.*\n.*\[unknown\]"; done
> > 37
> > 39
> > 35
> > 28
> > 40
> > 39
> > 29
> > 37
> > 31
> > 26
> >
> > $ for i in $(seq 10); do perf record -q -e "cycles:ppp" --call-graph dwarf \
> >       ./cpp-inlining > /dev/null; perf script | pcre2grep -c -M \
> >       "hypot_finite.*\n.*\[unknown\]"; done
> > 79
> > 70
> > 76
> > 77
> > 70
> > 90
> > 64
> > 78
> > 86
> > 74
> > ```
> >
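
The four loops above differ only in the event modifier, so anyone who wants to rerun the comparison could fold them into one script. This is a minimal sketch under stated assumptions, not the exact commands from the runs above: `count_failed_unwinds` and `run_level` are made-up helper names, awk stands in for pcre2grep (it counts lines naming hypot_finite that are directly followed by an [unknown] frame, which should approximate the multi-line pattern), and perf plus the same ./cpp-inlining binary are assumed to be available in the current directory.

```shell
#!/bin/sh
# Count perf-script lines that name hypot_finite and are directly followed
# by an [unknown] frame, i.e. samples where the unwinder gave up early.
count_failed_unwinds() {
    awk '
        prev && /\[unknown\]/ { n++ }       # previous line was hypot_finite
        { prev = /hypot_finite/ }           # remember whether this line is
        END { print n + 0 }                 # print 0 when nothing matched
    '
}

# Record once with the given event (e.g. cycles:pp) and print the count.
# Assumption: perf and ./cpp-inlining exist; perf.data is overwritten.
run_level() {
    perf record -q -e "$1" --call-graph dwarf ./cpp-inlining > /dev/null
    perf script | count_failed_unwinds
}

# Run the comparison only when asked to, so the helpers can be sourced alone.
if [ "${1:-}" = "--run" ]; then
    for ev in cycles: cycles:p cycles:pp cycles:ppp; do
        printf '%s\t%s\n' "$ev" "$(run_level "$ev")"
    done
fi
```

Invoked with `--run`, the script prints one line per precise level with the number of failed unwinds; wrapping the call in `seq 10` as above would show the run-to-run spread.
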
> > Note how precise levels 0 and 1 do not produce any samples where unwinding
> > fails. But precise level 2 produces some, and precise level 3 roughly
> > doubles their number.
> >
> > I can reproduce this pattern on two separate Intel CPUs and kernel
> > versions currently:
> >
> > Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz with 4.18.16-arch1-1-ARCH
> > Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 4.14.78-1-lts
> >
> > Could someone else try this? What about AMD and IBS - is it also affected?
> > What about newer/different Intel CPUs?
>
> I tried on intel and can't actually see that.. what do the failed samples
> look like? like is there the stack dump attached, what's in the regs?
>
> could you please paste the 'perf report -D' output for some of the
> failed samples?

See here for one case:

https://paste.kde.org/prryvdilq

What Intel CPU did you use? What microcode version? Which kernel version?

Generally, isn't what I'm seeing actually a necessary evil of the ustack-based
unwinding in perf? I mean, the general procedure is as follows, if I'm not
mistaken:

- PMU triggers interrupt and PEBS stores RIP etc.
- code continues to execute, possibly changing the stack
- PMU interrupt is handled, and a perf sample is generated
- register values are used from "past" status stored in PEBS
- but ustack dump is based on the "current" status

From this latter discrepancy, it must directly follow that *sometimes* the
ustack dump represents a state that cannot be unwound from, no?

Cheers

-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts