linux-kernel.vger.kernel.org archive mirror
* Re: perf is unable to read dwarf from go programs
       [not found] <CABWYdi2jvPUq128XDv_VbY=vFknFyJHbUR=0_K9WuA0mFTkPvg@mail.gmail.com>
@ 2019-11-27 21:15 ` Ivan Babrou
  2019-11-29 13:49   ` Jiri Olsa
  0 siblings, 1 reply; 7+ messages in thread
From: Ivan Babrou @ 2019-11-27 21:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-perf-users, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, kernel-team

There was no response in linux-perf-users@, so I think it's fair to
ask maintainers.

On Fri, Nov 8, 2019 at 3:53 PM Ivan Babrou <ivan@cloudflare.com> wrote:
>
> I have a simple piece of code that burns CPU for 1s:
>
> * https://gist.github.com/bobrik/cf022ff6950d09032fa13a984e2272ed
>
> I can build it just fine: go build -o /tmp/burn burn.go
>
> And I can see correct stacks if I record with fp:
>
> perf record -e cpu-clock -g -F 99 /tmp/burn
>
> But if I record with dwarf:
>
> perf record -e cpu-clock -g -F 99 --call-graph dwarf /tmp/burn
>
> Then stacks are lost with the following complaints during "perf script":
>
> BFD: Dwarf Error: found dwarf version '376', this reader only handles
> version 2, 3 and 4 information.
> BFD: Dwarf Error: found dwarf version '31863', this reader only
> handles version 2, 3 and 4 information.
> BFD: Dwarf Error: found dwarf version '65271', this reader only
> handles version 2, 3 and 4 information.
> BFD: Dwarf Error: found dwarf version '289', this reader only handles
> version 2, 3 and 4 information.
> ...
>
> That's on Linux 5.4-rc5 with binutils 2.28. On Linux 4.19.80 with
> binutils 2.31.1 I don't see the error, but the stacks are not any
> better:
>
> burn   788  3994.230871:   10101010 cpu-clock:
>           479955 crypto/sha512.blockAVX2+0x965 (/tmp/burn)
>
> burn   786  3994.241393:   10101010 cpu-clock:
>           40b2a7 runtime.mallocgc+0x697 (/tmp/burn)
>
> burn   782  3994.248061:   10101010 cpu-clock:
>           4746f1 crypto/sha512.(*digest).Write+0x21 (/tmp/burn)
>
> Compare to an fp version:
>
> burn   762  3892.587246:   10101010 cpu-clock:
>           479b19 crypto/sha512.blockAVX2+0xb29 (/tmp/burn)
> d186b8c721c0c207 [unknown] ([unknown])
>
> burn   760  3892.597158:   10101010 cpu-clock:
>           474783 crypto/sha512.(*digest).Write+0xb3 (/tmp/burn)
>           474e52 crypto/sha512.(*digest).checkSum+0xd2 (/tmp/burn)
>           4749e3 crypto/sha512.(*digest).Sum+0xa3 (/tmp/burn)
>           4840d3 main.burn+0xe3 (/tmp/burn)
>           483fda main.main+0x2a (/tmp/burn)
>           4298ee runtime.main+0x21e (/tmp/burn)
>       c000016060 [unknown] ([unknown])
> 89481eebc0313474 [unknown] ([unknown])
>
> I can understand AVX being off, but other stacks should be in order.
>
> Interestingly, in production I can see that some binaries are not
> affected by this issue.
>
> Is this an expected outcome?
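
The gist itself is not reproduced in the archive; judging by the
crypto/sha512, main.burn and runtime.mallocgc frames in the traces above,
it is roughly equivalent to the sketch below (a reconstruction, so details
may differ from the actual gist):

package main

import (
  "crypto/sha512"
  "time"
)

func burn() []byte {
  h := sha512.New()
  h.Write(make([]byte, 64*1024))
  return h.Sum(nil)
}

func main() {
  // Burn CPU for roughly one second by hashing over and over.
  deadline := time.Now().Add(time.Second)
  for time.Now().Before(deadline) {
    burn()
  }
}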

* Re: perf is unable to read dwarf from go programs
  2019-11-27 21:15 ` perf is unable to read dwarf from go programs Ivan Babrou
@ 2019-11-29 13:49   ` Jiri Olsa
  2019-11-29 15:14     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 7+ messages in thread
From: Jiri Olsa @ 2019-11-29 13:49 UTC (permalink / raw)
  To: Ivan Babrou
  Cc: linux-kernel, linux-perf-users, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Namhyung Kim,
	kernel-team

On Wed, Nov 27, 2019 at 01:15:20PM -0800, Ivan Babrou wrote:
> There was no response in linux-perf-users@, so I think it's fair to
> ask maintainers.
> 
> On Fri, Nov 8, 2019 at 3:53 PM Ivan Babrou <ivan@cloudflare.com> wrote:
> >
> > I have a simple piece of code that burns CPU for 1s:
> >
> > * https://gist.github.com/bobrik/cf022ff6950d09032fa13a984e2272ed
> >
> > I can build it just fine: go build -o /tmp/burn burn.go
> >
> > And I can see correct stacks if I record with fp:
> >
> > perf record -e cpu-clock -g -F 99 /tmp/burn
> >
> > But if I record with dwarf:
> >
> > perf record -e cpu-clock -g -F 99 --call-graph dwarf /tmp/burn
> >
> > Then stacks are lost with the following complaints during "perf script":
> >
> > BFD: Dwarf Error: found dwarf version '376', this reader only handles
> > version 2, 3 and 4 information.
> > BFD: Dwarf Error: found dwarf version '31863', this reader only
> > handles version 2, 3 and 4 information.
> > BFD: Dwarf Error: found dwarf version '65271', this reader only
> > handles version 2, 3 and 4 information.
> > BFD: Dwarf Error: found dwarf version '289', this reader only handles
> > version 2, 3 and 4 information.

hi,
the binary generated by go has compressed debug info (on my setup)
and libunwind (default dwarf unwinder) does not seem to support that

but when I compile perf with libdw unwind support:

  $ make DEBUG=1 VF=1 NO_LIBUNWIND=1

I'm getting proper backtraces (below), maybe it's time to change
the default dwarf unwinder ;-)
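
for anyone who wants to check the compressed-debug-info part on their own
binary, here is a small sketch using only Go's debug/elf package (the file
name and output format are mine, nothing perf- or libunwind-specific):

// zdebug.go: report which DWARF sections of a binary are compressed.
package main

import (
  "debug/elf"
  "fmt"
  "os"
  "strings"
)

func main() {
  f, err := elf.Open(os.Args[1])
  if err != nil {
    fmt.Fprintln(os.Stderr, err)
    os.Exit(1)
  }
  defer f.Close()

  for _, s := range f.Sections {
    switch {
    case strings.HasPrefix(s.Name, ".zdebug_"),
      strings.HasPrefix(s.Name, ".debug_") && s.Flags&elf.SHF_COMPRESSED != 0:
      // Go's linker emits zlib-compressed .zdebug_* sections; other
      // toolchains may instead set SHF_COMPRESSED on plain .debug_*.
      fmt.Println(s.Name, "compressed")
    case strings.HasPrefix(s.Name, ".debug_"):
      fmt.Println(s.Name, "uncompressed")
    }
  }
}

a reasonably recent Go toolchain should show .zdebug_* entries for a
binary like the one above.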

thanks,
jirka


---
    51.63%  ex       ex                [.] crypto/sha512.blockAVX2
            |
            ---crypto/sha512.blockAVX2
               |          
                --51.48%--crypto/sha512.block
                          crypto/sha512.(*digest).Write
                          crypto/sha512.(*digest).checkSum
                          crypto/sha512.(*digest).Sum
                          main.burn
                          main.main
                          runtime.main
                          runtime.goexit

    11.55%  ex       ex                [.] runtime.mallocgc
            |
            ---runtime.mallocgc
               |          
               |--7.45%--runtime.newobject
               |          |          
               |           --7.45%--main.burn
               |                     main.main
               |                     runtime.main
               |                     runtime.goexit
               |          
                --3.40%--runtime.growslice
                          crypto/sha512.(*digest).Sum
                          main.burn
                          main.main
                          runtime.main
                          runtime.goexit

     3.69%  ex       ex                [.] crypto/sha512.(*digest).Write
            |
            ---crypto/sha512.(*digest).Write
               |          
               |--2.91%--crypto/sha512.(*digest).checkSum
               |          crypto/sha512.(*digest).Sum
               |          main.burn
               |          main.main
               |          runtime.main
               |          runtime.goexit
               |          
                --0.57%--main.burn
                          main.main
                          runtime.main
                          runtime.goexit

     3.44%  ex       ex                [.] runtime.memclrNoHeapPointers
            |
            ---runtime.memclrNoHeapPointers
               |          
                --2.92%--runtime.(*mheap).alloc
                          runtime.(*mcentral).grow
                          runtime.(*mcentral).cacheSpan
                          runtime.(*mcache).refill
                          runtime.(*mcache).nextFree
                          runtime.mallocgc
                          |          
                          |--2.27%--runtime.newobject
                          |          main.burn
                          |          main.main
                          |          runtime.main
                          |          runtime.goexit
                          |          
                           --0.64%--runtime.growslice
                                     crypto/sha512.(*digest).Sum
                                     main.burn
                                     main.main
                                     runtime.main
                                     runtime.goexit
...


* Re: perf is unable to read dwarf from go programs
  2019-11-29 13:49   ` Jiri Olsa
@ 2019-11-29 15:14     ` Arnaldo Carvalho de Melo
  2019-12-02 19:49       ` Ivan Babrou
  0 siblings, 1 reply; 7+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-11-29 15:14 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ivan Babrou, linux-kernel, linux-perf-users, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Namhyung Kim, kernel-team

On Fri, Nov 29, 2019 at 02:49:29PM +0100, Jiri Olsa wrote:
> On Wed, Nov 27, 2019 at 01:15:20PM -0800, Ivan Babrou wrote:
> > There was no response in linux-perf-users@, so I think it's fair to
> > ask maintainers.
> > 
> > On Fri, Nov 8, 2019 at 3:53 PM Ivan Babrou <ivan@cloudflare.com> wrote:
> > >
> > > I have a simple piece of code that burns CPU for 1s:
> > >
> > > * https://gist.github.com/bobrik/cf022ff6950d09032fa13a984e2272ed
> > >
> > > I can build it just fine: go build -o /tmp/burn burn.go
> > >
> > > And I can see correct stacks if I record with fp:
> > >
> > > perf record -e cpu-clock -g -F 99 /tmp/burn
> > >
> > > But if I record with dwarf:
> > >
> > > perf record -e cpu-clock -g -F 99 --call-graph dwarf /tmp/burn
> > >
> > > Then stacks are lost with the following complaints during "perf script":
> > >
> > > BFD: Dwarf Error: found dwarf version '376', this reader only handles
> > > version 2, 3 and 4 information.
> > > BFD: Dwarf Error: found dwarf version '31863', this reader only
> > > handles version 2, 3 and 4 information.
> > > BFD: Dwarf Error: found dwarf version '65271', this reader only
> > > handles version 2, 3 and 4 information.
> > > BFD: Dwarf Error: found dwarf version '289', this reader only handles
> > > version 2, 3 and 4 information.
> 
> hi,
> the binary generated by go has compressed debug info (on my setup)
> and libunwind (default dwarf unwinder) does not seem to support that
> 
> but when I compile perf with libdw unwind support:
> 
>   $ make DEBUG=1 VF=1 NO_LIBUNWIND=1
> 
> I'm getting proper backtraces (below), maybe it's time to change
> the default dwarf unwinder ;-)

we have some 'perf test' entries testing the unwinding routines, can you
please check if those pass when switching to libdw's unwinder?

- Arnaldo
 
> thanks,
> jirka
> 
> 
> ---
>     51.63%  ex       ex                [.] crypto/sha512.blockAVX2
>             |
>             ---crypto/sha512.blockAVX2
>                |          
>                 --51.48%--crypto/sha512.block
>                           crypto/sha512.(*digest).Write
>                           crypto/sha512.(*digest).checkSum
>                           crypto/sha512.(*digest).Sum
>                           main.burn
>                           main.main
>                           runtime.main
>                           runtime.goexit
> 
>     11.55%  ex       ex                [.] runtime.mallocgc
>             |
>             ---runtime.mallocgc
>                |          
>                |--7.45%--runtime.newobject
>                |          |          
>                |           --7.45%--main.burn
>                |                     main.main
>                |                     runtime.main
>                |                     runtime.goexit
>                |          
>                 --3.40%--runtime.growslice
>                           crypto/sha512.(*digest).Sum
>                           main.burn
>                           main.main
>                           runtime.main
>                           runtime.goexit
> 
>      3.69%  ex       ex                [.] crypto/sha512.(*digest).Write
>             |
>             ---crypto/sha512.(*digest).Write
>                |          
>                |--2.91%--crypto/sha512.(*digest).checkSum
>                |          crypto/sha512.(*digest).Sum
>                |          main.burn
>                |          main.main
>                |          runtime.main
>                |          runtime.goexit
>                |          
>                 --0.57%--main.burn
>                           main.main
>                           runtime.main
>                           runtime.goexit
> 
>      3.44%  ex       ex                [.] runtime.memclrNoHeapPointers
>             |
>             ---runtime.memclrNoHeapPointers
>                |          
>                 --2.92%--runtime.(*mheap).alloc
>                           runtime.(*mcentral).grow
>                           runtime.(*mcentral).cacheSpan
>                           runtime.(*mcache).refill
>                           runtime.(*mcache).nextFree
>                           runtime.mallocgc
>                           |          
>                           |--2.27%--runtime.newobject
>                           |          main.burn
>                           |          main.main
>                           |          runtime.main
>                           |          runtime.goexit
>                           |          
>                            --0.64%--runtime.growslice
>                                      crypto/sha512.(*digest).Sum
>                                      main.burn
>                                      main.main
>                                      runtime.main
>                                      runtime.goexit
> ...

-- 

- Arnaldo

* Re: perf is unable to read dwarf from go programs
  2019-11-29 15:14     ` Arnaldo Carvalho de Melo
@ 2019-12-02 19:49       ` Ivan Babrou
  2019-12-03 10:22         ` Jiri Olsa
  0 siblings, 1 reply; 7+ messages in thread
From: Ivan Babrou @ 2019-12-02 19:49 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, linux-kernel, linux-perf-users, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Namhyung Kim, kernel-team

I've tried building with libdw with mixed results:

1. I can see stacks from some Go programs that were invisible before (yay!)

2. Warnings like below still appear in great numbers for a system-wide
flamegraph:

BFD: Dwarf Error: found dwarf version '18345', this reader only
handles version 2, 3 and 4 information.

I'm not sure how to pinpoint this to a particular binary and would
appreciate some help with this (see the sketch after point 5 below).

3. It takes minutes to produce a flamegraph of a running system
whereas before it only took seconds. See this flamegraph of "perf
script" itself:

* https://gist.github.com/bobrik/a9c46cffe9daa5840abd137443d8bab0#file-flamegraph-perf-svg

Seems like there is no caching and debug info is getting reparsed
continuously for every stack. It's possible that it was not an issue
before, because we spent no time decompressing dwarf.

4. Pretty much all LuaJIT stack frames that were marked as unknown
before are now gone.

See before and after here: https://imgur.com/a/1LNfqAk

Before:

nginx-cache 94572 446642.722028:   10101010 cpu-clock:
            5607d8d56718 ngx_http_lua_shdict_lookup+0x48 (inlined)
            5607d8d5a09d ngx_http_lua_ffi_shdict_incr+0xcd
(/usr/local/nginx-cache/sbin/nginx-cache)
            560802fe58e4 [unknown] (/tmp/perf-94572.map)

After:

nginx-cache 94572 446543.008703:   10101010 cpu-clock:
            5607d8d56718 ngx_http_lua_shdict_lookup+0x48 (inlined)
            5607d8d59da7 ngx_http_lua_ffi_shdict_get+0x197
(/usr/local/nginx-cache/sbin/nginx-cache)

The key is /tmp/perf-*.map frame at the bottom. I don't know if it's
expected, but we grew dependent on knowing this.

5. Special [stack], [heap] and [anon] frames are also gone, and
you can see the following during "perf script" run:

open("[stack]", O_RDONLY)               = -1 ENOENT (No such file or directory)
open("[heap]", O_RDONLY)                = -1 ENOENT (No such file or directory)
open("//anon", O_RDONLY)                = -1 ENOENT (No such file or directory)

On Fri, Nov 29, 2019 at 7:14 AM Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
>
> On Fri, Nov 29, 2019 at 02:49:29PM +0100, Jiri Olsa wrote:
> > On Wed, Nov 27, 2019 at 01:15:20PM -0800, Ivan Babrou wrote:
> > > There was no response in linux-perf-users@, so I think it's fair to
> > > ask maintainers.
> > >
> > > On Fri, Nov 8, 2019 at 3:53 PM Ivan Babrou <ivan@cloudflare.com> wrote:
> > > >
> > > > I have a simple piece of code that burns CPU for 1s:
> > > >
> > > > * https://gist.github.com/bobrik/cf022ff6950d09032fa13a984e2272ed
> > > >
> > > > I can build it just fine: go build -o /tmp/burn burn.go
> > > >
> > > > And I can see correct stacks if I record with fp:
> > > >
> > > > perf record -e cpu-clock -g -F 99 /tmp/burn
> > > >
> > > > But if I record with dwarf:
> > > >
> > > > perf record -e cpu-clock -g -F 99 --call-graph dwarf /tmp/burn
> > > >
> > > > Then stacks are lost with the following complaints during "perf script":
> > > >
> > > > BFD: Dwarf Error: found dwarf version '376', this reader only handles
> > > > version 2, 3 and 4 information.
> > > > BFD: Dwarf Error: found dwarf version '31863', this reader only
> > > > handles version 2, 3 and 4 information.
> > > > BFD: Dwarf Error: found dwarf version '65271', this reader only
> > > > handles version 2, 3 and 4 information.
> > > > BFD: Dwarf Error: found dwarf version '289', this reader only handles
> > > > version 2, 3 and 4 information.
> >
> > hi,
> > the binary generated by go has compressed debug info (on my setup)
> > and libunwind (default dwarf unwinder) does not seem to support that
> >
> > but when I compile perf with libdw unwind support:
> >
> >   $ make DEBUG=1 VF=1 NO_LIBUNWIND=1
> >
> > I'm getting proper backtraces (below), maybe it's time to change
> > the default dwarf unwinder ;-)
>
> we have some 'perf test' entries testing the unwinding routines, can you
> please check if those pass when switching to libdw's unwinder?
>
> - Arnaldo
>
> > thanks,
> > jirka
> >
> >
> > ---
> >     51.63%  ex       ex                [.] crypto/sha512.blockAVX2
> >             |
> >             ---crypto/sha512.blockAVX2
> >                |
> >                 --51.48%--crypto/sha512.block
> >                           crypto/sha512.(*digest).Write
> >                           crypto/sha512.(*digest).checkSum
> >                           crypto/sha512.(*digest).Sum
> >                           main.burn
> >                           main.main
> >                           runtime.main
> >                           runtime.goexit
> >
> >     11.55%  ex       ex                [.] runtime.mallocgc
> >             |
> >             ---runtime.mallocgc
> >                |
> >                |--7.45%--runtime.newobject
> >                |          |
> >                |           --7.45%--main.burn
> >                |                     main.main
> >                |                     runtime.main
> >                |                     runtime.goexit
> >                |
> >                 --3.40%--runtime.growslice
> >                           crypto/sha512.(*digest).Sum
> >                           main.burn
> >                           main.main
> >                           runtime.main
> >                           runtime.goexit
> >
> >      3.69%  ex       ex                [.] crypto/sha512.(*digest).Write
> >             |
> >             ---crypto/sha512.(*digest).Write
> >                |
> >                |--2.91%--crypto/sha512.(*digest).checkSum
> >                |          crypto/sha512.(*digest).Sum
> >                |          main.burn
> >                |          main.main
> >                |          runtime.main
> >                |          runtime.goexit
> >                |
> >                 --0.57%--main.burn
> >                           main.main
> >                           runtime.main
> >                           runtime.goexit
> >
> >      3.44%  ex       ex                [.] runtime.memclrNoHeapPointers
> >             |
> >             ---runtime.memclrNoHeapPointers
> >                |
> >                 --2.92%--runtime.(*mheap).alloc
> >                           runtime.(*mcentral).grow
> >                           runtime.(*mcentral).cacheSpan
> >                           runtime.(*mcache).refill
> >                           runtime.(*mcache).nextFree
> >                           runtime.mallocgc
> >                           |
> >                           |--2.27%--runtime.newobject
> >                           |          main.burn
> >                           |          main.main
> >                           |          runtime.main
> >                           |          runtime.goexit
> >                           |
> >                            --0.64%--runtime.growslice
> >                                      crypto/sha512.(*digest).Sum
> >                                      main.burn
> >                                      main.main
> >                                      runtime.main
> >                                      runtime.goexit
> > ...
>
> --
>
> - Arnaldo

* Re: perf is unable to read dwarf from go programs
  2019-12-02 19:49       ` Ivan Babrou
@ 2019-12-03 10:22         ` Jiri Olsa
  2019-12-19 23:38           ` Ivan Babrou
  0 siblings, 1 reply; 7+ messages in thread
From: Jiri Olsa @ 2019-12-03 10:22 UTC (permalink / raw)
  To: Ivan Babrou
  Cc: Arnaldo Carvalho de Melo, linux-kernel, linux-perf-users,
	Peter Zijlstra, Ingo Molnar, Alexander Shishkin, Namhyung Kim,
	kernel-team

On Mon, Dec 02, 2019 at 11:49:55AM -0800, Ivan Babrou wrote:
> I've tried building with libdw with mixed results:
> 
> 1. I can see stacks from some Go programs that were invisible before (yay!)
> 
> 2. Warnings like below still appear in great numbers for a system-wide
> flamegraph:
> 
> BFD: Dwarf Error: found dwarf version '18345', this reader only
> handles version 2, 3 and 4 information.
> 
> I'm not sure how to pinpoint this to a particular binary and would
> appreciate some help with this.

I'd need some way of reproducing this, could you please
paste me command lines you used?

> 
> 3. It takes minutes to produce a flamegraph of a running system
> whereas before it only took seconds. See this flamegraph of "perf
> script" itself:
> 
> * https://gist.github.com/bobrik/a9c46cffe9daa5840abd137443d8bab0#file-flamegraph-perf-svg
> 
> Seems like there is no caching and debug info is getting reparsed
> continuously for every stack. It's possible that it was not an issue
> before, because we spent no time decompressing dwarf.

possibly, if we have some clear reproducer we can hand it
to the libdw guy that helped us develop this

> 
> 4. Pretty much all LuaJIT stack frames that were marked as unknown
> before are now gone.
> 
> See before and after here: https://imgur.com/a/1LNfqAk
> 
> Before:
> 
> nginx-cache 94572 446642.722028:   10101010 cpu-clock:
>             5607d8d56718 ngx_http_lua_shdict_lookup+0x48 (inlined)
>             5607d8d5a09d ngx_http_lua_ffi_shdict_incr+0xcd
> (/usr/local/nginx-cache/sbin/nginx-cache)
>             560802fe58e4 [unknown] (/tmp/perf-94572.map)
> 
> After:
> 
> nginx-cache 94572 446543.008703:   10101010 cpu-clock:
>             5607d8d56718 ngx_http_lua_shdict_lookup+0x48 (inlined)
>             5607d8d59da7 ngx_http_lua_ffi_shdict_get+0x197
> (/usr/local/nginx-cache/sbin/nginx-cache)
> 
> The key is /tmp/perf-*.map frame at the bottom. I don't know if it's
> expected, but we grew dependent on knowing this.
> 
> 5. Special [stack], [heap] and [anon] frames are also gone, and
> you can see the following during "perf script" run:
> 
> open("[stack]", O_RDONLY)               = -1 ENOENT (No such file or directory)
> open("[heap]", O_RDONLY)                = -1 ENOENT (No such file or directory)
> open("//anon", O_RDONLY)                = -1 ENOENT (No such file or directory)

strange, let's start with the reproducer and I'll check
on this if I see it

thanks,
jirka


* Re: perf is unable to read dwarf from go programs
  2019-12-03 10:22         ` Jiri Olsa
@ 2019-12-19 23:38           ` Ivan Babrou
  2019-12-19 23:57             ` Ivan Babrou
  0 siblings, 1 reply; 7+ messages in thread
From: Ivan Babrou @ 2019-12-19 23:38 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, linux-kernel, linux-perf-users,
	Peter Zijlstra, Ingo Molnar, Alexander Shishkin, Namhyung Kim,
	kernel-team

At first I thought it was related to C/Go interaction, but it looks
even easier to trigger.

Ok, I have a reproduction for both random dwarf versions and for
uncached binary symbol resolution:

$ cat /tmp/main.go
package main

func main() {
  for {
    src := make([]byte, 100*1024*1024)
    dst := make([]byte, len(src))
    copy(dst, src)
  }
}

Compile and run it:

$ go build -o /tmp/memmove /tmp/main.go && /tmp/memmove

Then record a trace:

$ sudo perf record -p $(pidof memmove) --call-graph dwarf -- sleep 1

Getting stack traces generates warnings:

$ sudo perf script > /dev/null
BFD: Dwarf Error: found dwarf version '376', this reader only handles
version 2, 3 and 4 information.
BFD: Dwarf Error: found dwarf version '31863', this reader only
handles version 2, 3 and 4 information.
BFD: Dwarf Error: found dwarf version '65267', this reader only
handles version 2, 3 and 4 information.
BFD: Dwarf Error: found dwarf version '33833', this reader only
handles version 2, 3 and 4 information.
BFD: Dwarf Error: found dwarf version '41392', this reader only
handles version 2, 3 and 4 information.

Here's an example of raw perf script output where it generates the warning:

memmove 254821 125950.754906:     174236 cycles: BFD: Dwarf Error:
found dwarf version '33833', this reader only handles version 2, 3 and
4 information.

                  44d549 runtime.memmove+0x4d9 (/tmp/memmove)
                  452644 main.main+0x94 (/tmp/memmove)
                  42668d runtime.main+0x21d (/tmp/memmove)
                  44bef0 runtime.goexit+0x0 (/tmp/memmove)

Timed execution shows quite a bit of time spent:

$ time sudo perf script > /dev/null
... whole bunch of warnings about dwarf version here ..

real 0m24.356s
user 0m23.617s
sys 0m0.723s

And you can see object files being open over and over again:

$ sudo bpftrace -e 'tracepoint:syscalls:sys_enter_open,
tracepoint:syscalls:sys_enter_openat {
@filename[str(args->filename)]++ }'
... some unrelated files here ...
@filename[/lib/x86_64-linux-gnu/libpcre.so.3]: 595
@filename[/usr/lib/locale/locale-archive]: 625
@filename[/lib/x86_64-linux-gnu/libpthread.so.0]: 789
@filename[/lib/x86_64-linux-gnu/libdl.so.2]: 838
@filename[/var/log/journal/0bc45a9309f7404586bcddee0d45bf9d/system.journa]: 1044
@filename[/lib/x86_64-linux-gnu/libc.so.6]: 1273
@filename[/etc/ld.so.cache]: 1501
@filename[/usr/lib/x86_64-linux-gnu/../lib/x86_64-linux-gnu/elfutils/libe]: 5784
@filename[/usr/lib/x86_64-linux-gnu/elfutils/libebl_x86_64.so]: 5784
@filename[/tmp/memmove]: 5789

I'm going to guess that anything open for 5000+ times is related to
perf script run, as I can also see those in strace.

This is on Debian Stretch and Linux 5.4.5 (perf is from 5.4.5 as well).

This should cover points 2 and 3, I'll have to get back to you on
points 4 and 5, but those are separate.

On Tue, Dec 3, 2019 at 2:22 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Mon, Dec 02, 2019 at 11:49:55AM -0800, Ivan Babrou wrote:
> > I've tried building with libdw with mixed results:
> >
> > 1. I can see stacks from some Go programs that were invisible before (yay!)
> >
> > 2. Warnings like below still appear in great numbers for a system-wide
> > flamegraph:
> >
> > BFD: Dwarf Error: found dwarf version '18345', this reader only
> > handles version 2, 3 and 4 information.
> >
> > I'm not sure how to pinpoint this to a particular binary and would
> > appreciate some help with this.
>
> I'd need some way of reproducing this, could you please
> paste me command lines you used?
>
> >
> > 3. It takes minutes to produce a flamegraph of a running system
> > whereas before it only took seconds. See this flamegraph of "perf
> > script" itself:
> >
> > * https://gist.github.com/bobrik/a9c46cffe9daa5840abd137443d8bab0#file-flamegraph-perf-svg
> >
> > Seems like there is no caching and debug info is getting reparsed
> > continuously for every stack. It's possible that it was not an issue
> > before, because we spent no time decompressing dwarf.
>
> possibly, if we have some clear reproducer we can hand it
> to the libdw guy that helped us develop this
>
> >
> > 4. Pretty much all LuaJIT stack frames that were marked as unknown
> > before are now gone.
> >
> > See before and after here: https://imgur.com/a/1LNfqAk
> >
> > Before:
> >
> > nginx-cache 94572 446642.722028:   10101010 cpu-clock:
> >             5607d8d56718 ngx_http_lua_shdict_lookup+0x48 (inlined)
> >             5607d8d5a09d ngx_http_lua_ffi_shdict_incr+0xcd
> > (/usr/local/nginx-cache/sbin/nginx-cache)
> >             560802fe58e4 [unknown] (/tmp/perf-94572.map)
> >
> > After:
> >
> > nginx-cache 94572 446543.008703:   10101010 cpu-clock:
> >             5607d8d56718 ngx_http_lua_shdict_lookup+0x48 (inlined)
> >             5607d8d59da7 ngx_http_lua_ffi_shdict_get+0x197
> > (/usr/local/nginx-cache/sbin/nginx-cache)
> >
> > The key is /tmp/perf-*.map frame at the bottom. I don't know if it's
> > expected, but we grew dependent on knowing this.
> >
> > 5. Special [stack], [heap] and [anon] frames are also gone, and
> > you can see the following during "perf script" run:
> >
> > open("[stack]", O_RDONLY)               = -1 ENOENT (No such file or directory)
> > open("[heap]", O_RDONLY)                = -1 ENOENT (No such file or directory)
> > open("//anon", O_RDONLY)                = -1 ENOENT (No such file or directory)
>
> strange, let's start with the reproducer and I'll check
> on this if I see it
>
> thanks,
> jirka
>

* Re: perf is unable to read dwarf from go programs
  2019-12-19 23:38           ` Ivan Babrou
@ 2019-12-19 23:57             ` Ivan Babrou
  0 siblings, 0 replies; 7+ messages in thread
From: Ivan Babrou @ 2019-12-19 23:57 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, linux-kernel, linux-perf-users,
	Peter Zijlstra, Ingo Molnar, Alexander Shishkin, Namhyung Kim,
	kernel-team

Found a reproducer for missing /tmp/perf-XXX frames:

$ cat hello.java
class HelloWorld {
    public static void main(String args[]) {
        for (long i = 0; ; i++) {}
    }
}

Compile it:

$ javac hello.java

Run it:

$ java HelloWorld

Trace it with libdw:

$ sudo perf record -p $(pidof java) --call-graph dwarf -- sleep 1

Observe traces in perf script not having any frames:

java 61467 127535.601549:     149893 cycles:

java 61467 127535.601602:     606223 cycles:

java 61467 127535.601802:     890579 cycles:

java 61467 127535.602158:     840878 cycles:

java 61467 127535.602453:     809547 cycles:

java 61467 127535.602718:     794099 cycles:

java 61467 127535.602981:     777283 cycles:

Now trace it with libunwind:

$ sudo perf record -p $(pidof java) --call-graph dwarf -- sleep 1

And observe traces with /tmp/perf-XXX:

java 61467 127583.896584:     118386 cycles:
            7fcd55342ce8 [unknown] (/tmp/perf-61466.map)

java 61467 127583.896626:     537046 cycles:
            7fcd55342d17 [unknown] (/tmp/perf-61466.map)

java 61467 127583.896804:     866323 cycles:
            7fcd55342ce8 [unknown] (/tmp/perf-61466.map)

java 61467 127583.897091:     825548 cycles:
            7fcd55342d17 [unknown] (/tmp/perf-61466.map)

java 61467 127583.897361:     812276 cycles:
            7fcd55342d17 [unknown] (/tmp/perf-61466.map)

java 61467 127583.897627:     804520 cycles:
            7fcd55342ce5 [unknown] (/tmp/perf-61466.map)

This should cover all issues I mentioned.
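
For reference on the /tmp/perf-<pid>.map frames: perf resolves addresses in
JIT-generated code through a plain-text map file that the runtime (or an
agent such as perf-map-agent for HotSpot) writes, one "START SIZE symbolname"
line per JIT'd function, all in hex. A toy writer just to illustrate the
format (the addresses and names below are invented):

// perfmap.go: illustrate the /tmp/perf-<pid>.map format that perf reads.
package main

import (
  "fmt"
  "os"
)

func main() {
  f, err := os.Create(fmt.Sprintf("/tmp/perf-%d.map", os.Getpid()))
  if err != nil {
    panic(err)
  }
  defer f.Close()

  // One "START SIZE symbolname" line per JIT'd function, hex, space-separated.
  entries := []struct {
    start, size uint64
    name        string
  }{
    {0x7fcd55342c00, 0x120, "HelloWorld.main"},
    {0x7fcd55342d20, 0x80, "Interpreter stub"},
  }
  for _, e := range entries {
    fmt.Fprintf(f, "%x %x %s\n", e.start, e.size, e.name)
  }
}

Presumably the libunwind output above resolves its [unknown] frames against
exactly this file, which is why losing those frames with the libdw path
matters.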

On Thu, Dec 19, 2019 at 3:38 PM Ivan Babrou <ivan@cloudflare.com> wrote:
>
> At first I thought it was related to C/Go interaction, but it looks
> even easier to trigger.
>
> Ok, I have a reproduction for both random dwarf versions and for
> uncached binary symbol resolution:
>
> $ cat /tmp/main.go
> package main
>
> func main() {
>   for {
>     src := make([]byte, 100*1024*1024)
>     dst := make([]byte, len(src))
>     copy(dst, src)
>   }
> }
>
> Compile and run it:
>
> $ go build -o /tmp/memmove /tmp/main.go && /tmp/memmove
>
> Then record a trace:
>
> $ sudo perf record -p $(pidof memmove) --call-graph dwarf -- sleep 1
>
> Getting stack traces generates warnings:
>
> $ sudo perf script > /dev/null
> BFD: Dwarf Error: found dwarf version '376', this reader only handles
> version 2, 3 and 4 information.
> BFD: Dwarf Error: found dwarf version '31863', this reader only
> handles version 2, 3 and 4 information.
> BFD: Dwarf Error: found dwarf version '65267', this reader only
> handles version 2, 3 and 4 information.
> BFD: Dwarf Error: found dwarf version '33833', this reader only
> handles version 2, 3 and 4 information.
> BFD: Dwarf Error: found dwarf version '41392', this reader only
> handles version 2, 3 and 4 information.
>
> Here's an example of raw perf script output where it generates the warning:
>
> memmove 254821 125950.754906:     174236 cycles: BFD: Dwarf Error:
> found dwarf version '33833', this reader only handles version 2, 3 and
> 4 information.
>
>                   44d549 runtime.memmove+0x4d9 (/tmp/memmove)
>                   452644 main.main+0x94 (/tmp/memmove)
>                   42668d runtime.main+0x21d (/tmp/memmove)
>                   44bef0 runtime.goexit+0x0 (/tmp/memmove)
>
> Timed execution shows quite a bit of time spent:
>
> $ time sudo perf script > /dev/null
> ... whole bunch of warnings about dwarf version here ..
>
> real 0m24.356s
> user 0m23.617s
> sys 0m0.723s
>
> And you can see object files being open over and over again:
>
> $ sudo bpftrace -e 'tracepoint:syscalls:sys_enter_open,
> tracepoint:syscalls:sys_enter_openat {
> @filename[str(args->filename)]++ }'
> ... some unrelated files here ...
> @filename[/lib/x86_64-linux-gnu/libpcre.so.3]: 595
> @filename[/usr/lib/locale/locale-archive]: 625
> @filename[/lib/x86_64-linux-gnu/libpthread.so.0]: 789
> @filename[/lib/x86_64-linux-gnu/libdl.so.2]: 838
> @filename[/var/log/journal/0bc45a9309f7404586bcddee0d45bf9d/system.journa]: 1044
> @filename[/lib/x86_64-linux-gnu/libc.so.6]: 1273
> @filename[/etc/ld.so.cache]: 1501
> @filename[/usr/lib/x86_64-linux-gnu/../lib/x86_64-linux-gnu/elfutils/libe]: 5784
> @filename[/usr/lib/x86_64-linux-gnu/elfutils/libebl_x86_64.so]: 5784
> @filename[/tmp/memmove]: 5789
>
> I'm going to guess that anything open for 5000+ times is related to
> perf script run, as I can also see those in strace.
>
> This is on Debian Stretch and Linux 5.4.5 (perf is from 5.4.5 as well).
>
> This should cover points 2 and 3, I'll have to get back to you on
> points 4 and 5, but those are separate.
>
> On Tue, Dec 3, 2019 at 2:22 AM Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > On Mon, Dec 02, 2019 at 11:49:55AM -0800, Ivan Babrou wrote:
> > > I've tried building with libdw with mixed results:
> > >
> > > 1. I can see stacks from some Go programs that were invisible before (yay!)
> > >
> > > 2. Warnings like below still appear in great numbers for a system-wide
> > > flamegraph:
> > >
> > > BFD: Dwarf Error: found dwarf version '18345', this reader only
> > > handles version 2, 3 and 4 information.
> > >
> > > I'm not sure how to pinpoint this to a particular binary and would
> > > appreciate some help with this.
> >
> > I'd need some way of reproducing this, could you please
> > paste me command lines you used?
> >
> > >
> > > 3. It takes minutes to produce a flamegraph of a running system
> > > whereas before it only took seconds. See this flamegraph of "perf
> > > script" itself:
> > >
> > > * https://gist.github.com/bobrik/a9c46cffe9daa5840abd137443d8bab0#file-flamegraph-perf-svg
> > >
> > > Seems like there is no caching and debug info is getting reparsed
> > > continuously for every stack. It's possible that it was not an issue
> > > before, because we spent no time decompressing dwarf.
> >
> > possibly, if we have some clear reproducer we can hand it
> > to the libdw guy that helped us develop this
> >
> > >
> > > 4. Pretty much all LuaJIT stack frames that were marked as unknown
> > > before are now gone.
> > >
> > > See before and after here: https://imgur.com/a/1LNfqAk
> > >
> > > Before:
> > >
> > > nginx-cache 94572 446642.722028:   10101010 cpu-clock:
> > >             5607d8d56718 ngx_http_lua_shdict_lookup+0x48 (inlined)
> > >             5607d8d5a09d ngx_http_lua_ffi_shdict_incr+0xcd
> > > (/usr/local/nginx-cache/sbin/nginx-cache)
> > >             560802fe58e4 [unknown] (/tmp/perf-94572.map)
> > >
> > > After:
> > >
> > > nginx-cache 94572 446543.008703:   10101010 cpu-clock:
> > >             5607d8d56718 ngx_http_lua_shdict_lookup+0x48 (inlined)
> > >             5607d8d59da7 ngx_http_lua_ffi_shdict_get+0x197
> > > (/usr/local/nginx-cache/sbin/nginx-cache)
> > >
> > > The key is /tmp/perf-*.map frame at the bottom. I don't know if it's
> > > expected, but we grew dependent on knowing this.
> > >
> > > 5. Special [stack], [heap] and [anon] frames are also gone, and
> > > you can see the following during "perf script" run:
> > >
> > > open("[stack]", O_RDONLY)               = -1 ENOENT (No such file or directory)
> > > open("[heap]", O_RDONLY)                = -1 ENOENT (No such file or directory)
> > > open("//anon", O_RDONLY)                = -1 ENOENT (No such file or directory)
> >
> > strange, let's start with the reproducer and I'll check
> > on this if I see it
> >
> > thanks,
> > jirka
> >

Thread overview: 7+ messages
     [not found] <CABWYdi2jvPUq128XDv_VbY=vFknFyJHbUR=0_K9WuA0mFTkPvg@mail.gmail.com>
2019-11-27 21:15 ` perf is unable to read dwarf from go programs Ivan Babrou
2019-11-29 13:49   ` Jiri Olsa
2019-11-29 15:14     ` Arnaldo Carvalho de Melo
2019-12-02 19:49       ` Ivan Babrou
2019-12-03 10:22         ` Jiri Olsa
2019-12-19 23:38           ` Ivan Babrou
2019-12-19 23:57             ` Ivan Babrou
