* Haswell LBR call stacks
@ 2015-06-14 11:02 Milian Wolff
  2015-08-04 17:24 ` Haswell LBR call stacks - broken Milian Wolff
  0 siblings, 1 reply; 5+ messages in thread
From: Milian Wolff @ 2015-06-14 11:02 UTC (permalink / raw)
  To: linux-perf-users

Hey all,

Some time ago I read this interesting article: 

http://article.gmane.org/gmane.linux.kernel/1809078

It mentions a new call stack unwinding method for perf, based on the Haswell LBR
facility. I now have a new laptop with a Broadwell i7-5600U CPU, but my perf
version 4.0.3, running against a Linux 4.0.4 kernel (both vanilla Arch Linux
packages), does not seem to support this feature. Was it ever merged into
mainline? Is a special compiler flag required to enable it? Is there anything
else I'm missing?

callchain: Unknown --call-graph option value: lbr

 usage: perf record [<options>] [<command>]
    or: perf record [<options>] -- <command> [<options>]

        --call-graph <mode[,dump_size]>
                          setup and enables call-graph (stack chain/backtrace) recording: fp dwarf

Thanks
-- 
Milian Wolff
mail@milianw.de
http://milianw.de


* Re: Haswell LBR call stacks - broken
  2015-06-14 11:02 Haswell LBR call stacks Milian Wolff
@ 2015-08-04 17:24 ` Milian Wolff
  2015-08-04 18:10   ` Andi Kleen
  0 siblings, 1 reply; 5+ messages in thread
From: Milian Wolff @ 2015-08-04 17:24 UTC (permalink / raw)
  To: linux-perf-users

On Sunday 14 June 2015 13:02:57 Milian Wolff wrote:
> Hey all,
> 
> Some time ago I read this interesting article:
> 
> http://article.gmane.org/gmane.linux.kernel/1809078
> 
> It mentions a new call stack unwinding for perf, based on Haswell LBR
> facility. I now have a new Laptop with a Broadwell i7-5600U CPU, but my perf
> version 4.0.3 running against a Linux 4.0.4 kernel (all vanilla Archlinux
> versions) does not seem to support this feature. Was it ever included in
> the mainline? Is a special compiler flag required to enable this feature?
> Anything else I'm missing?
> 
> callchain: Unknown --call-graph option value: lbr
> 
>  usage: perf record [<options>] [<command>]
>     or: perf record [<options>] -- <command> [<options>]
> 
>         --call-graph <mode[,dump_size]>
>                           setup and enables call-graph (stack
> chain/backtrace) recording: fp dwarf

OK, with a more recent perf v4.2.rc5 on an Intel(R) Core(TM) i7-4770 CPU @
3.40GHz it actually works. Somewhat :) It is much faster, but the call stacks
do not terminate properly and are sometimes not demangled correctly. I observe
the following behavior:

    15.56%  ex_string_compa  libQt5Core.so.5.5.0   [.] QString::compare_helper
            |
            |--5.84%-- _ZN7QString14compare_helperEPK5QChariS2_iN2Qt15CaseSensitivityE@plt
            |          QString::compare_helper
            |          main
            |          |
            |          |--4.24%-- main
            |          |          |
            |          |          |--1.60%-- main
            |          |          |          |
            |          |          |          |--1.07%-- main
            |          |          |          |          |
            |          |          |          |          |--0.54%-- main
            |          |          |          |          |          main
            |          |          |          |          |
            |          |          |          |           --0.53%-- QString::compare_helper
            |          |          |          |                     main
            |          |          |          |
            |          |          |           --0.53%-- QString::compare_helper
            |          |          |                     main
            |          |          |                     main
            |          |          |                     QString::compare_helper
            |          |          |                     main
            |          |          |                     QString::compare_helper
...

The correct callgraph, as shown by --call-graph dwarf, is:

    21.62%  ex_string_compa  libQt5Core.so.5.5.0   [.] QString::compare_helper                                        
            |
            ---QString::compare_helper
               main

Is this a known (undocumented) limitation or a bug? Is there anything I could 
do to get this fixed?
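
As a rough illustration of why the two-entry chain is the expected result, here is a toy model of a call-stack style branch record: a call pushes an entry and a return pops it, so repeated call/return pairs should not accumulate frames. This is only a conceptual sketch and not how perf or the CPU implements it. The class `LbrCallStackModel`, the function `sample_inside_compare_helper`, and the capacity of 16 entries are assumptions made for the example.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Toy model (hypothetical, for illustration only): a bounded stack of
// call records where calls push and returns pop.
class LbrCallStackModel {
public:
    explicit LbrCallStackModel(std::size_t capacity) : capacity_(capacity) {}

    // A call records the caller; when the buffer is full the oldest
    // (outermost) record is dropped.
    void call(const std::string &caller, const std::string &callee)
    {
        if (!callers_.empty() && callers_.size() >= capacity_) {
            callers_.pop_front();
        }
        callers_.push_back(caller);
        current_ = callee;
    }

    // A return removes the most recent call record, if there is one.
    void ret()
    {
        if (callers_.empty()) {
            return;
        }
        current_ = callers_.back();
        callers_.pop_back();
    }

    // Call chain as seen at sample time, innermost frame first.
    std::vector<std::string> callchain() const
    {
        std::vector<std::string> chain{current_};
        chain.insert(chain.end(), callers_.rbegin(), callers_.rend());
        return chain;
    }

private:
    std::size_t capacity_;
    std::deque<std::string> callers_;
    std::string current_ = "main"; // assumed entry point for the model
};

// Mimics the loop in the test program: many balanced call/return pairs,
// then one sample taken while inside the callee.
std::vector<std::string> sample_inside_compare_helper()
{
    LbrCallStackModel lbr(16); // assumed number of entries
    for (int i = 0; i < 1000; ++i) {
        lbr.call("main", "QString::compare_helper");
        lbr.ret();
    }
    lbr.call("main", "QString::compare_helper");
    return lbr.callchain();
}
```

In this model the sampled chain holds only `QString::compare_helper` followed by `main`, which matches the dwarf output above. The repeated `main` frames in the lbr output therefore look like records that were not popped or were stitched incorrectly, though the model says nothing about where that happens.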

Thanks
-- 
Milian Wolff
mail@milianw.de
http://milianw.de


* Re: Haswell LBR call stacks - broken
  2015-08-04 17:24 ` Haswell LBR call stacks - broken Milian Wolff
@ 2015-08-04 18:10   ` Andi Kleen
  2015-08-04 21:41     ` Liang, Kan
  0 siblings, 1 reply; 5+ messages in thread
From: Andi Kleen @ 2015-08-04 18:10 UTC (permalink / raw)
  To: Milian Wolff; +Cc: linux-perf-users, kan.liang

Milian Wolff <mail@milianw.de> writes:
>
> OK, with a more recent perf v4.2.rc5 on a Intel(R) Core(TM) i7-4770 CPU @ 
> 3.40GHz it actually works. Somewhat :) It is much faster, but the callstacks 
> don't terminate properly, and are sometimes not correctly demangled. I observe 
> the following behavior:
>
>     15.56%  ex_string_compa  libQt5Core.so.5.5.0   [.] QString::compare_helper                 
>             |          
>             |--5.84%-- 
> _ZN7QString14compare_helperEPK5QChariS2_iN2Qt15CaseSensitivityE@plt

Maybe the demangler doesn't like the @plt.
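
One way to check that hypothesis outside of perf is to run the symbol through the C++ ABI demangler with and without the suffix. The sketch below is an illustration only and is not taken from perf's source. It uses `abi::__cxa_demangle` from `<cxxabi.h>` (available with GCC and Clang), and the helper names `demangle_or_raw` and `demangle_plt_symbol` are made up for this example.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (Itanium C++ ABI, GCC/Clang)
#include <cstdlib>   // std::free
#include <string>

// Hypothetical helper: returns the demangled name, or the input
// unchanged when the demangler rejects it.
std::string demangle_or_raw(const std::string &symbol)
{
    int status = 0;
    char *out = abi::__cxa_demangle(symbol.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        std::free(out); // free(nullptr) is a no-op
        return symbol;
    }
    std::string result(out);
    std::free(out);
    return result;
}

// Hypothetical helper: strips a trailing "@plt", demangles the rest,
// and re-attaches the suffix so the PLT origin stays visible.
std::string demangle_plt_symbol(const std::string &symbol)
{
    const std::string suffix = "@plt";
    if (symbol.size() > suffix.size() &&
        symbol.compare(symbol.size() - suffix.size(), suffix.size(), suffix) == 0) {
        const std::string base = symbol.substr(0, symbol.size() - suffix.size());
        return demangle_or_raw(base) + suffix;
    }
    return demangle_or_raw(symbol);
}
```

Passing the full `...CaseSensitivityE@plt` string to `demangle_or_raw` is likely to return it unchanged, because the trailing `@plt` is not part of a valid mangled name, while `demangle_plt_symbol` should produce a readable `QString::compare_helper(...)@plt`.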


>             |          QString::compare_helper
>             |          main
>             |          |          
>             |          |--4.24%-- main
>             |          |          |          
>             |          |          |--1.60%-- main
>             |          |          |          |          
>             |          |          |          |--1.07%-- main

Yes that looks like a bug.

Adding Kan.

-Andi

>             |          |          |          |          |          
>             |          |          |          |          |--0.54%-- main
>             |          |          |          |          |          main
>             |          |          |          |          |          
>             |          |          |          |           --0.53%-- 
> QString::compare_helper
>             |          |          |          |                     main
>             |          |          |          |          
>             |          |          |           --0.53%-- 
> QString::compare_helper
>             |          |          |                     main
>             |          |          |                     main
>             |          |          |                     
> QString::compare_helper
>             |          |          |                     main
>             |          |          |                     
> QString::compare_helper
> ...
>
> The correct callgraph, as shown by --call-graph dwarf, is:
>
>     21.62%  ex_string_compa  libQt5Core.so.5.5.0   [.] QString::compare_helper                                        
>             |
>             ---QString::compare_helper
>                main
>
> Is this a known (undocumented) limitation or a bug? Is there anything I could 
> do to get this fixed?
>
> Thanks

-- 
ak@linux.intel.com -- Speaking for myself only


* RE: Haswell LBR call stacks - broken
  2015-08-04 18:10   ` Andi Kleen
@ 2015-08-04 21:41     ` Liang, Kan
  2015-08-05  9:02       ` Milian Wolff
  0 siblings, 1 reply; 5+ messages in thread
From: Liang, Kan @ 2015-08-04 21:41 UTC (permalink / raw)
  To: Andi Kleen, Milian Wolff; +Cc: linux-perf-users

Hi Milian,

Is it possible to share your test case/steps with me?

Does --call-graph fp work?

Thanks,
Kan



* Re: Haswell LBR call stacks - broken
  2015-08-04 21:41     ` Liang, Kan
@ 2015-08-05  9:02       ` Milian Wolff
  0 siblings, 0 replies; 5+ messages in thread
From: Milian Wolff @ 2015-08-05  9:02 UTC (permalink / raw)
  To: Liang, Kan; +Cc: Andi Kleen, linux-perf-users

On Tuesday 04 August 2015 21:41:47 Liang, Kan wrote:
> Hi Milian,
> 
> Is it possible to share your test case/steps with me?

Sure, I hope Qt is fine with you.

main.cpp:
~~~~~~~~~~~~~~~~~~~~~~
#include <QString>
#include <QStringList> // QStringList is used below; include it explicitly
#include <QTextStream>

int main()
{
    QStringList haystack;
    for (int i = 0; i < 1000; ++i) {
        haystack << QString::number(i);
    }

    uint matches = 0;
    for (int i = 0; i < 1000; ++i) {
        foreach (const QString &str, haystack) {
            if (str == "needle") {
                ++matches;
            }
        }
    }

    QTextStream out(stdout);
    out << "Matches: " << matches << endl;

    return 0;
}

~~~~~~~~~~~~~~~~~~~~~~

lbr.pro:
~~~~~~~~~~~~~~~~~~~~~~
TEMPLATE = app

SOURCES = main.cpp

CONFIG += release
QMAKE_CXXFLAGS += -g
~~~~~~~~~~~~~~~~~~~~~~

To build, put both files into a folder and then run:

~~~~~~~~~~~~~~~~~~~~~~
mkdir build
cd build
qmake-qt5 ..
make
perf record --call-graph lbr ./lbr
perf report --stdio
~~~~~~~~~~~~~~~~~~~~~~

> Does --call-graph fp work?

No, I'm on a 64-bit architecture, and most libs (esp. Qt) are built without
frame pointers. --call-graph dwarf does work though.

Bye
-- 
Milian Wolff
mail@milianw.de
http://milianw.de
