* Haswell LBR call stacks
@ 2015-06-14 11:02 Milian Wolff
2015-08-04 17:24 ` Haswell LBR call stacks - broken Milian Wolff
0 siblings, 1 reply; 5+ messages in thread
From: Milian Wolff @ 2015-06-14 11:02 UTC (permalink / raw)
To: linux-perf-users
Hey all,
Some time ago I read this interesting article:
http://article.gmane.org/gmane.linux.kernel/1809078
It mentions a new call stack unwinding method for perf, based on the Haswell
LBR facility. I now have a new laptop with a Broadwell i7-5600U CPU, but my perf
version 4.0.3 running against a Linux 4.0.4 kernel (all vanilla Arch Linux
versions) does not seem to support this feature. Was it ever included in
mainline? Is a special compiler flag required to enable it? Is there anything
else I'm missing?
callchain: Unknown --call-graph option value: lbr
usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
--call-graph <mode[,dump_size]>
setup and enables call-graph (stack chain/backtrace)
recording: fp dwarf
Thanks
--
Milian Wolff
mail@milianw.de
http://milianw.de
* Re: Haswell LBR call stacks - broken
2015-06-14 11:02 Haswell LBR call stacks Milian Wolff
@ 2015-08-04 17:24 ` Milian Wolff
2015-08-04 18:10 ` Andi Kleen
0 siblings, 1 reply; 5+ messages in thread
From: Milian Wolff @ 2015-08-04 17:24 UTC (permalink / raw)
To: linux-perf-users
On Sunday 14 June 2015 13:02:57 Milian Wolff wrote:
> Hey all,
>
> Some time ago I read this interesting article:
>
> http://article.gmane.org/gmane.linux.kernel/1809078
>
> It mentions a new call stack unwinding for perf, based on Haswell LBR
> facility. I now have a new Laptop with a Broadwell i7-5600U CPU, but my perf
> version 4.0.3 running against a Linux 4.0.4 kernel (all vanilla Archlinux
> versions) does not seem to support this feature. Was it ever included in
> the mainline? Is a special compiler flag required to enable this feature?
> Anything else I'm missing?
>
> callchain: Unknown --call-graph option value: lbr
>
> usage: perf record [<options>] [<command>]
> or: perf record [<options>] -- <command> [<options>]
>
> --call-graph <mode[,dump_size]>
> setup and enables call-graph (stack
> chain/backtrace) recording: fp dwarf
OK, with a more recent perf (v4.2-rc5) on an Intel(R) Core(TM) i7-4770 CPU @
3.40GHz it actually works. Somewhat :) It is much faster, but the call stacks
don't terminate properly, and some symbols are not demangled correctly. I
observe the following behavior:
15.56% ex_string_compa libQt5Core.so.5.5.0 [.] QString::compare_helper
|
|--5.84%-- _ZN7QString14compare_helperEPK5QChariS2_iN2Qt15CaseSensitivityE@plt
| QString::compare_helper
| main
| |
| |--4.24%-- main
| | |
| | |--1.60%-- main
| | | |
| | | |--1.07%-- main
| | | | |
| | | | |--0.54%-- main
| | | | | main
| | | | |
| | | | --0.53%-- QString::compare_helper
| | | | main
| | | |
| | | --0.53%-- QString::compare_helper
| | | main
| | | main
| | |
QString::compare_helper
| | | main
| | |
QString::compare_helper
...
The correct callgraph, as shown by --call-graph dwarf, is:
21.62% ex_string_compa libQt5Core.so.5.5.0 [.] QString::compare_helper
|
---QString::compare_helper
main
Is this a known (undocumented) limitation or a bug? Is there anything I could
do to get this fixed?
Thanks
--
Milian Wolff
mail@milianw.de
http://milianw.de
* Re: Haswell LBR call stacks - broken
2015-08-04 17:24 ` Haswell LBR call stacks - broken Milian Wolff
@ 2015-08-04 18:10 ` Andi Kleen
2015-08-04 21:41 ` Liang, Kan
0 siblings, 1 reply; 5+ messages in thread
From: Andi Kleen @ 2015-08-04 18:10 UTC (permalink / raw)
To: Milian Wolff; +Cc: linux-perf-users, kan.liang
Milian Wolff <mail@milianw.de> writes:
>
> OK, with a more recent perf v4.2.rc5 on a Intel(R) Core(TM) i7-4770 CPU @
> 3.40GHz it actually works. Somewhat :) It is much faster, but the callstacks
> don't terminate properly, and are sometimes not correctly demangled. I observe
> the following behavior:
>
> 15.56% ex_string_compa libQt5Core.so.5.5.0 [.] QString::compare_helper
> |
> |--5.84%--
> _ZN7QString14compare_helperEPK5QChariS2_iN2Qt15CaseSensitivityE@plt
Maybe the demangler doesn't like the @plt.
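One way to test that guess outside of perf is sketched below, assuming a GCC or Clang toolchain that provides `abi::__cxa_demangle` from `<cxxabi.h>`. The helper name `demangle_keep_suffix` is hypothetical and is not part of perf; it only shows that the part before the `@` is a valid mangled name once the suffix is split off. Whether perf's own demangler fails for this reason is not established in this thread.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (Itanium C++ ABI, GCC/Clang)

#include <cstdlib>
#include <memory>
#include <string>

// Hypothetical helper, not taken from perf: demangle a symbol that may carry
// a trailing "@suffix" such as "@plt". The suffix is split off before the
// demangler sees the name and appended again afterwards.
// If the part before '@' is not a valid mangled name, the input is returned
// unchanged.
std::string demangle_keep_suffix(const std::string &symbol)
{
    const std::string::size_type at = symbol.find('@');
    const std::string mangled = symbol.substr(0, at);
    const std::string suffix =
        (at == std::string::npos) ? std::string() : symbol.substr(at);

    int status = 0;
    // __cxa_demangle returns a malloc()ed buffer, so release it with free().
    std::unique_ptr<char, decltype(&std::free)> demangled(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
        &std::free);
    if (status != 0 || !demangled) {
        return symbol;
    }
    return std::string(demangled.get()) + suffix;
}
```

Passing the `@plt` symbol from the report above through this helper returns the demangled `QString::compare_helper` signature with `@plt` appended, while a plain name such as `main@plt` comes back untouched.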
> | QString::compare_helper
> | main
> | |
> | |--4.24%-- main
> | | |
> | | |--1.60%-- main
> | | | |
> | | | |--1.07%-- main
Yes that looks like a bug.
Adding Kan.
-Andi
> | | | | |
> | | | | |--0.54%-- main
> | | | | | main
> | | | | |
> | | | | --0.53%--
> QString::compare_helper
> | | | | main
> | | | |
> | | | --0.53%--
> QString::compare_helper
> | | | main
> | | | main
> | | |
> QString::compare_helper
> | | | main
> | | |
> QString::compare_helper
> ...
>
> The correct callgraph, as shown by --call-graph dwarf, is:
>
> 21.62% ex_string_compa libQt5Core.so.5.5.0 [.] QString::compare_helper
> |
> ---QString::compare_helper
> main
>
> Is this a known (undocumented) limitation or a bug? Is there anything I could
> do to get this fixed?
>
> Thanks
--
ak@linux.intel.com -- Speaking for myself only
* RE: Haswell LBR call stacks - broken
2015-08-04 18:10 ` Andi Kleen
@ 2015-08-04 21:41 ` Liang, Kan
2015-08-05 9:02 ` Milian Wolff
0 siblings, 1 reply; 5+ messages in thread
From: Liang, Kan @ 2015-08-04 21:41 UTC (permalink / raw)
To: Andi Kleen, Milian Wolff; +Cc: linux-perf-users
Hi Milian,
Is it possible to share your test case/steps with me?
Does --call-graph fp work?
Thanks,
Kan
>
> Milian Wolff <mail@milianw.de> writes:
> >
> > OK, with a more recent perf v4.2.rc5 on a Intel(R) Core(TM) i7-4770
> > CPU @ 3.40GHz it actually works. Somewhat :) It is much faster, but
> > the callstacks don't terminate properly, and are sometimes not
> > correctly demangled. I observe the following behavior:
> >
> > 15.56% ex_string_compa libQt5Core.so.5.5.0 [.] QString::compare_helper
> > |
> > |--5.84%-- _ZN7QString14compare_helperEPK5QChariS2_iN2Qt15CaseSensitivityE@plt
>
> Maybe the demangler doesn't like the @plt.
>
>
> > | QString::compare_helper
> > | main
> > | |
> > | |--4.24%-- main
> > | | |
> > | | |--1.60%-- main
> > | | | |
> > | | | |--1.07%-- main
>
> Yes that looks like a bug.
>
> Adding Kan.
>
> -Andi
>
> > | | | | |
> > | | | | |--0.54%-- main
> > | | | | | main
> > | | | | |
> > | | | | --0.53%--
> > QString::compare_helper
> > | | | | main
> > | | | |
> > | | | --0.53%--
> > QString::compare_helper
> > | | | main
> > | | | main
> > | | |
> > QString::compare_helper
> > | | | main
> > | | |
> > QString::compare_helper
> > ...
> >
> > The correct callgraph, as shown by --call-graph dwarf, is:
> >
> > 21.62% ex_string_compa libQt5Core.so.5.5.0 [.] QString::compare_helper
> > |
> > ---QString::compare_helper
> > main
> >
> > Is this a known (undocumented) limitation or a bug? Is there anything
> > I could do to get this fixed?
> >
> > Thanks
>
> --
> ak@linux.intel.com -- Speaking for myself only
* Re: Haswell LBR call stacks - broken
2015-08-04 21:41 ` Liang, Kan
@ 2015-08-05 9:02 ` Milian Wolff
0 siblings, 0 replies; 5+ messages in thread
From: Milian Wolff @ 2015-08-05 9:02 UTC (permalink / raw)
To: Liang, Kan; +Cc: Andi Kleen, linux-perf-users
On Tuesday 04 August 2015 21:41:47 Liang, Kan wrote:
> Hi Milian,
>
> Is it possible to share your test case/steps with me?
Sure, I hope Qt is fine with you.
main.cpp:
~~~~~~~~~~~~~~~~~~~~~~
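#include <QString>
#include <QStringList>
#include <QTextStream>

int main()
{
    // Build a list of 1000 short strings to search through.
    QStringList haystack;
    for (int i = 0; i < 1000; ++i) {
        haystack << QString::number(i);
    }

    // Compare every entry against a string that never matches, 1000 times,
    // so that the string comparison dominates the profile.
    uint matches = 0;
    for (int i = 0; i < 1000; ++i) {
        foreach (const QString &str, haystack) {
            if (str == "needle") {
                ++matches;
            }
        }
    }

    QTextStream out(stdout);
    out << "Matches: " << matches << endl;
    return 0;
}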
~~~~~~~~~~~~~~~~~~~~~~
lbr.pro:
~~~~~~~~~~~~~~~~~~~~~~
TEMPLATE = app
SOURCES = main.cpp
CONFIG += release
QMAKE_CXXFLAGS += -g
~~~~~~~~~~~~~~~~~~~~~~
To build, put both into a folder and then do:
~~~~~~~~~~~~~~~~~~~~~~
mkdir build
cd build
qmake-qt5 ..
make
perf record --call-graph lbr ./lbr
perf report --stdio
~~~~~~~~~~~~~~~~~~~~~~
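In case Qt is not at hand, a Qt-free analog of the loop above could look roughly like this. It is only a sketch: the function name `count_matches` is made up for illustration, and I have not confirmed that it shows the same broken LBR call chains, since the compiler may inline the `std::string` comparison instead of calling into the shared library through the PLT.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical Qt-free variant of the test case: fill a haystack with the
// decimal strings "0" .. "n-1", then scan it n times for `needle` and count
// how often it is found.
unsigned count_matches(int n, const char *needle)
{
    std::vector<std::string> haystack;
    if (n > 0) {
        haystack.reserve(static_cast<std::size_t>(n));
    }
    for (int i = 0; i < n; ++i) {
        haystack.push_back(std::to_string(i));
    }

    unsigned matches = 0;
    for (int i = 0; i < n; ++i) {
        for (const std::string &str : haystack) {
            // std::string::compare stands in for QString::compare_helper.
            if (str.compare(needle) == 0) {
                ++matches;
            }
        }
    }
    return matches;
}
```

Calling `count_matches(1000, "needle")` from a small `main` mirrors the Qt version. The resulting binary can be recorded with `perf record --call-graph lbr` and inspected with `perf report --stdio` in the same way as above.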
> Does --call-graph fp work?
No, I'm on a 64-bit architecture, and most libs (esp. Qt) are built without
frame pointers. --call-graph dwarf does work, though.
Bye
--
Milian Wolff
mail@milianw.de
http://milianw.de