From: Milian Wolff <milian.wolff@kdab.com>
To: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>,
	linux-kernel@vger.kernel.org, Jiri Olsa <jolsa@kernel.org>,
	namhyung@kernel.org, linux-perf-users@vger.kernel.org,
	Arnaldo Carvalho <acme@kernel.org>
Subject: Re: PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]
Date: Mon, 05 Nov 2018 23:54:40 +0100
Message-ID: <3799078.YBnU1OB0PF@agathebauer>
In-Reply-To: <20181105205119.GC25674@krava>

On Monday, 5 November 2018 21:51:19 CET Jiri Olsa wrote:
> On Fri, Nov 02, 2018 at 06:56:50PM +0100, Milian Wolff wrote:
> 
> SNIP
> 
> > > > Note how precise levels 0 and 1 do not produce any samples where
> > > > unwinding fails. But precise level 2 produces some, and precise
> > > > level 3 increases the amount (by ca. ~2x).
> > > > 
> > > > I can reproduce this pattern on two separate Intel CPUs and kernel
> > > > versions currently:
> > > > 
> > > > Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz with 4.18.16-arch1-1-ARCH
> > > > Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 4.14.78-1-lts
> > > > 
> > > > Could someone else try this? What about AMD and IBS - is it also
> > > > affected? What about newer/different Intel CPUs?
> > > 
> > > I tried on Intel and can't actually see that.. what do the failed samples
> > > look like? is there a stack dump attached, what's in the regs?
> > > 
> > > could you please paste the 'perf report -D' output for some of the
> > > failed samples?
> > 
> > See here for one case: https://paste.kde.org/prryvdilq
> 
> we should really print some helpful debug output
> for this.. like showing some markers where the stack
> data starts

Further down below, the offset for the ustack start is given (0xe0). But yes, 
that would be welcome.

> > What Intel CPU did you use? What microcode version? Which kernel version?
> > 
> > Generally, isn't what I'm seeing actually a necessary evil of the ustack
> > based unwinding in perf? I mean, the general procedure is as follows if
> > I'm not mistaken:
> > 
> > - PMU triggers interrupt and PEBS stores RIP etc.
> > - code continues to execute, possibly changing the stack
> 
> I don't think the code continues to execute.. the stack is ok

Are you sure about this? I mean, isn't that the whole reason why we need PEBS? 
Generally, if you are sure about this, can you point me to some documentation 
on this to allow me to understand it better?

Also, how do you explain the scenario I am seeing, with `cycles` and 
`cycles:p` not suffering from this issue, but `cycles:pp` and `cycles:ppp` 
leading to broken samples? It _has_ to be PEBS then, no? What else could 
explain this?
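
For anyone who wants to try the comparison across precise levels, the
recordings could be generated along these lines. This is only a sketch: the
workload `./a.out` and the output file names are placeholders, and the exact
command line used earlier in the thread is not shown in this message. It
assumes that `perf record --call-graph dwarf` with the event modifiers `:p`,
`:pp` and `:ppp` corresponds to the precise levels 1 to 3 discussed here.

```python
import shlex


def perf_record_cmd(precise_level, workload, output):
    """Build a perf record command line for the given precise level (0-3)."""
    # Precise level N is requested by appending N 'p' modifiers to the event.
    event = "cycles" if precise_level == 0 else "cycles:" + "p" * precise_level
    return ["perf", "record", "--call-graph", "dwarf",
            "-e", event, "-o", output, "--", *workload]


# Print one command per precise level; "./a.out" is a placeholder workload.
for level in range(4):
    cmd = perf_record_cmd(level, ["./a.out"], f"perf.precise{level}.data")
    print(shlex.join(cmd))
```

The resulting files can then be compared with `perf report` or `perf script`
to count the samples where unwinding fails.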

> the problem I saw in the past is that the copy from user is not
> 100% reliable and sometimes you might not get the full stack data you
> asked for

But that would indicate missing data at the end of the ustack dump? In our 
case, the "problematic" data is always at the start.

Also note the apparent shift in the ustack copy which - in one case - directly 
correlates with the code being executed, i.e. from objdump in libm I see:

0x0000000000029688 <+40>:    sub    $0x28,%rsp
(https://paste.kde.org/poywa7y2z)

The address of the expected parent frame is 7ffff7c7caf8 (hypotf32x+0x18). 
This can be found at offset 80 in the ustack dump (cf. 
https://paste.kde.org/prryvdilq: "f9 ca c7 f7 ff 7f" is found at 0x130, 
minus 0xe0 yields 0x50 or 80).

From the libunwind (or libdw) debug output in perf, we see that the unwinder 
tries to access offset 32 (cf. https://paste.kde.org/prryvdilq#line-610), 
which is offset by 48 from the desired value of 80. This offset is *very* 
close to the value of 40 we see in the libm disassembly for __hypot_function 
("$0x28,%rsp"). Is this really just a coincidence?
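
The arithmetic behind these offsets can be double-checked with a few lines of
Python. This is just a sketch: the constants are copied from the numbers
quoted above, the variable names are made up for illustration, and nothing
here inspects an actual perf.data file.

```python
# Values copied from the message above; names are illustrative only.
USTACK_START = 0xE0    # position in the dump where the ustack data begins
RETADDR_AT = 0x130     # position where the expected return address bytes appear
UNWINDER_OFFSET = 32   # ustack offset the unwinder actually tried to read
STACK_ADJUST = 0x28    # immediate operand of "sub $0x28,%rsp"

# Offset of the expected return address relative to the ustack start.
expected_offset = RETADDR_AT - USTACK_START
# Distance between where the unwinder looked and where it should have looked.
shift = expected_offset - UNWINDER_OFFSET
# What remains of the shift after accounting for the stack adjustment.
remainder = shift - STACK_ADJUST

print(f"expected offset in ustack: {expected_offset} (0x{expected_offset:x})")
print(f"shift between expected and accessed offset: {shift}")
print(f"stack adjustment from the disassembly: {STACK_ADJUST}")
print(f"remainder not covered by the stack adjustment: {remainder}")
```

The remainder comes out to 8 bytes. Whether that has any meaning is not
established here.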

> have you tried with libdw unwinder? if one of the unwinder
> shows more callchains, we need to fix the other one ;-)

Yes, I've looked at both unwinders. Both try to access the same values, and 
both break due to seemingly wrong data being read from the stack. And if you 
look at my other patches, you may have seen that I've regularly fixed the 
libdw unwinder to bring it closer to libunwind.

Thanks
-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
