All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH/RFC 0/4] perf annotate: Add --source-only option and the new source code TUI view
@ 2017-06-28  3:18 Taeung Song
  2017-06-28  3:26 ` Taeung Song
  2017-06-28  9:53 ` Milian Wolff
  0 siblings, 2 replies; 8+ messages in thread
From: Taeung Song @ 2017-06-28  3:18 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, Adrian Hunter, Andi Kleen, David Ahern, Jin Yao,
	Jiri Olsa, Kim Phillips, Masami Hiramatsu, Milian Wolff,
	Namhyung Kim, Wang Nan

Hi,

The --source-only option and new source code TUI view can show
the result of performance analysis based on full source code per symbol(function).
(Namhyung Kim told me this idea and it was also requested by others some time ago..)

If someone wants to see the cause, he/she will need to dig into the asm.
But before that, looking at the source level can give a hint or clue
for the problem.

For example, if target symbol is 'hex2u64' of util/util.c,
the output is like below.

    $ perf annotate --source-only --stdio -s hex2u64
 Percent |      Source code of util.c for cycles:ppp (42 samples)
-----------------------------------------------------------------
    0.00 : 354   * While we find nice hex chars, build a long_val.
    0.00 : 355   * Return number of chars processed.
    0.00 : 356   */
    0.00 : 357  int hex2u64(const char *ptr, u64 *long_val)
    2.38 : 358  {
    2.38 : 359          const char *p = ptr;
    0.00 : 360          *long_val = 0;
    0.00 : 361
   30.95 : 362          while (*p) {
   23.81 : 363                  const int hex_val = hex(*p);
    0.00 : 364
   14.29 : 365                  if (hex_val < 0)
    0.00 : 366                          break;
    0.00 : 367
   26.19 : 368                  *long_val = (*long_val << 4) | hex_val;
    0.00 : 369                  p++;
    0.00 : 370          }
    0.00 : 371
    0.00 : 372          return p - ptr;
    0.00 : 373  }

And I added many perf developers into Cc: because I want to listen to your opinions
about this new feature, if you don't mind.

If you give some feedback, I'd appreciate it! :)

Thanks,
Taeung

Taeung Song (4):
  perf annotate: Add --source-only option
  perf annotate: Add new source code view to the annotate TUI browser
  perf annotate: Fold or unfold partial disassembly lines on source code
    view
  perf annotate: Support a 'o' key showing addresses on the new source
    code view

 tools/perf/builtin-annotate.c     |   2 +
 tools/perf/ui/browser.h           |   1 +
 tools/perf/ui/browsers/annotate.c | 307 +++++++++++++++++++++++++++++++++++++-
 tools/perf/util/annotate.c        | 303 ++++++++++++++++++++++++++++++++++++-
 tools/perf/util/annotate.h        |  30 ++++
 tools/perf/util/symbol.c          |   1 +
 tools/perf/util/symbol.h          |   1 +
 7 files changed, 633 insertions(+), 12 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFC 0/4] perf annotate: Add --source-only option and the new source code TUI view
  2017-06-28  3:18 [PATCH/RFC 0/4] perf annotate: Add --source-only option and the new source code TUI view Taeung Song
@ 2017-06-28  3:26 ` Taeung Song
  2017-06-28  9:53 ` Milian Wolff
  1 sibling, 0 replies; 8+ messages in thread
From: Taeung Song @ 2017-06-28  3:26 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, Adrian Hunter, Andi Kleen, David Ahern, Jin Yao,
	Jiri Olsa, Kim Phillips, Masami Hiramatsu, Milian Wolff,
	Namhyung Kim, Wang Nan

Additionally,
the code is available on 'annotate/src-only' branch at

   git://github.com/taeung/linux-perf.git

Thanks,
Taeung

On 06/28/2017 12:18 PM, Taeung Song wrote:
> Hi,
> 
> The --source-only option and new source code TUI view can show
> the result of performance analysis based on full source code per symbol(function).
> (Namhyung Kim told me this idea and it was also requested by others some time ago..)
> 
> If someone wants to see the cause, he/she will need to dig into the asm.
> But before that, looking at the source level can give a hint or clue
> for the problem.
> 
> For example, if target symbol is 'hex2u64' of util/util.c,
> the output is like below.
> 
>      $ perf annotate --source-only --stdio -s hex2u64
>   Percent |      Source code of util.c for cycles:ppp (42 samples)
> -----------------------------------------------------------------
>      0.00 : 354   * While we find nice hex chars, build a long_val.
>      0.00 : 355   * Return number of chars processed.
>      0.00 : 356   */
>      0.00 : 357  int hex2u64(const char *ptr, u64 *long_val)
>      2.38 : 358  {
>      2.38 : 359          const char *p = ptr;
>      0.00 : 360          *long_val = 0;
>      0.00 : 361
>     30.95 : 362          while (*p) {
>     23.81 : 363                  const int hex_val = hex(*p);
>      0.00 : 364
>     14.29 : 365                  if (hex_val < 0)
>      0.00 : 366                          break;
>      0.00 : 367
>     26.19 : 368                  *long_val = (*long_val << 4) | hex_val;
>      0.00 : 369                  p++;
>      0.00 : 370          }
>      0.00 : 371
>      0.00 : 372          return p - ptr;
>      0.00 : 373  }
> 
> And I added many perf developers into Cc: because I want to listen to your opinions
> about this new feature, if you don't mind.
> 
> If you give some feedback, I'd appreciate it! :)
> 
> Thanks,
> Taeung
> 
> Taeung Song (4):
>    perf annotate: Add --source-only option
>    perf annotate: Add new source code view to the annotate TUI browser
>    perf annotate: Fold or unfold partial disassembly lines on source code
>      view
>    perf annotate: Support a 'o' key showing addresses on the new source
>      code view
> 
>   tools/perf/builtin-annotate.c     |   2 +
>   tools/perf/ui/browser.h           |   1 +
>   tools/perf/ui/browsers/annotate.c | 307 +++++++++++++++++++++++++++++++++++++-
>   tools/perf/util/annotate.c        | 303 ++++++++++++++++++++++++++++++++++++-
>   tools/perf/util/annotate.h        |  30 ++++
>   tools/perf/util/symbol.c          |   1 +
>   tools/perf/util/symbol.h          |   1 +
>   7 files changed, 633 insertions(+), 12 deletions(-)
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFC 0/4] perf annotate: Add --source-only option and the new source code TUI view
  2017-06-28  3:18 [PATCH/RFC 0/4] perf annotate: Add --source-only option and the new source code TUI view Taeung Song
  2017-06-28  3:26 ` Taeung Song
@ 2017-06-28  9:53 ` Milian Wolff
  2017-06-28 16:27   ` Taeung Song
                     ` (2 more replies)
  1 sibling, 3 replies; 8+ messages in thread
From: Milian Wolff @ 2017-06-28  9:53 UTC (permalink / raw)
  To: Taeung Song
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Adrian Hunter,
	Andi Kleen, David Ahern, Jin Yao, Jiri Olsa, Kim Phillips,
	Masami Hiramatsu, Namhyung Kim, Wang Nan

[-- Attachment #1: Type: text/plain, Size: 6632 bytes --]

On Wednesday, June 28, 2017 5:18:08 AM CEST Taeung Song wrote:
> Hi,
> 
> The --source-only option and new source code TUI view can show
> the result of performance analysis based on full source code per
> symbol(function). (Namhyung Kim told me this idea and it was also requested
> by others some time ago..)
> 
> If someone wants to see the cause, he/she will need to dig into the asm.
> But before that, looking at the source level can give a hint or clue
> for the problem.
> 
> For example, if target symbol is 'hex2u64' of util/util.c,
> the output is like below.
> 
>     $ perf annotate --source-only --stdio -s hex2u64
>  Percent |      Source code of util.c for cycles:ppp (42 samples)
> -----------------------------------------------------------------
>     0.00 : 354   * While we find nice hex chars, build a long_val.
>     0.00 : 355   * Return number of chars processed.
>     0.00 : 356   */
>     0.00 : 357  int hex2u64(const char *ptr, u64 *long_val)
>     2.38 : 358  {
>     2.38 : 359          const char *p = ptr;
>     0.00 : 360          *long_val = 0;
>     0.00 : 361
>    30.95 : 362          while (*p) {
>    23.81 : 363                  const int hex_val = hex(*p);
>     0.00 : 364
>    14.29 : 365                  if (hex_val < 0)
>     0.00 : 366                          break;
>     0.00 : 367
>    26.19 : 368                  *long_val = (*long_val << 4) | hex_val;
>     0.00 : 369                  p++;
>     0.00 : 370          }
>     0.00 : 371
>     0.00 : 372          return p - ptr;
>     0.00 : 373  }
> 
> And I added many perf developers into Cc: because I want to listen to your
> opinions about this new feature, if you don't mind.
> 
> If you give some feedback, I'd appreciate it! :)

Thanks Taeung,

I requested this feature some time ago and it's really cool to see someone 
step up and implement it - much appreciated!

I just tested it out on my pet-example that leverages C++ instead of C:

~~~~~
#include <complex>
#include <cmath>
#include <random>
#include <iostream>

using namespace std;

int main()
{
    uniform_real_distribution<double> uniform(-1E5, 1E5);
    default_random_engine engine;
    double s = 0;
    for (int i = 0; i < 10000000; ++i) {
        s += norm(complex<double>(uniform(engine), uniform(engine)));
    }
    cout << s << '\n';
    return 0;
}
~~~~~

Compile it with:

g++ -O2 -g -std=c++11 test.cpp -o test

Then record it with perf:

perf record --call-graph dwarf ./test

Then analyse it with `perf report`. You'll see one entry for main with 
something like:

+  100.00%    39.69%  cpp-inlining  cpp-inlining      [.] main

Select it and annotate it, then switch to your new source-only view:

main  test.cpp
       │  30                                                                                                                                                                                                    
       │  31    using namespace std;                                                                                                                                                                            
       │  32                                                                                                                                                                                                    
       │  33    int main()                                                                                                                                                                                      
       │+ 34    {                                                                                                                                                                                               
       │  35        uniform_real_distribution<double> uniform(-1E5, 1E5);                                                                                                                                       
       │  36        default_random_engine engine;                                                                                                                                                               
       │+ 37        double s = 0;                                                                                                                                                                               
       │+ 38        for (int i = 0; i < 10000000; ++i) {                                                                                                                                                        
  4.88 │+ 39            s += norm(complex<double>(uniform(engine), 
uniform(engine)));                                                                                                                           
       │  40        }                                                                                                                                                                                           
       │  41        cout << s << '\n';                                                                                                                                                                          
       │  42        return 0;                                                                                                                                                                                   
       │+ 43    } 

Note: the line numbers are off b/c my file contains a file-header on-top. 
Ignore that.

Note2: There is no column header shown, so it's unclear what the first column 
represents.

Note 3: report showed 39.69% self cost in main, 100.00% inclusive. annotate 
shows 4.88... What is that?

What this shows, is that it's extremely important to visualize inclusive cost 
_and_ self cost in this view. Additionally, we need to account for inlining. 
Right now, we only see the self cost that is directly within main, I suspect. 
For C++ this is usually very misleading, and basically makes the annotate view 
completely useless for application-level profiling. If a second column would 
be added with the inclusive cost with the ability to drill down, then I could 
easily see myself using this view.

I would appreciate if you could take this into account.

Thanks a lot


-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 3826 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFC 0/4] perf annotate: Add --source-only option and the new source code TUI view
  2017-06-28  9:53 ` Milian Wolff
@ 2017-06-28 16:27   ` Taeung Song
  2017-06-28 16:32     ` Milian Wolff
  2017-06-29  7:11   ` Namhyung Kim
  2017-06-30 16:21   ` Taeung Song
  2 siblings, 1 reply; 8+ messages in thread
From: Taeung Song @ 2017-06-28 16:27 UTC (permalink / raw)
  To: Milian Wolff
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Adrian Hunter,
	Andi Kleen, David Ahern, Jin Yao, Jiri Olsa, Kim Phillips,
	Masami Hiramatsu, Namhyung Kim, Wang Nan



On 06/28/2017 06:53 PM, Milian Wolff wrote:
> On Wednesday, June 28, 2017 5:18:08 AM CEST Taeung Song wrote:
>> Hi,
>>
>> The --source-only option and new source code TUI view can show
>> the result of performance analysis based on full source code per
>> symbol(function). (Namhyung Kim told me this idea and it was also requested
>> by others some time ago..)
>>
>> If someone wants to see the cause, he/she will need to dig into the asm.
>> But before that, looking at the source level can give a hint or clue
>> for the problem.
>>
>> For example, if target symbol is 'hex2u64' of util/util.c,
>> the output is like below.
>>
>>      $ perf annotate --source-only --stdio -s hex2u64
>>   Percent |      Source code of util.c for cycles:ppp (42 samples)
>> -----------------------------------------------------------------
>>      0.00 : 354   * While we find nice hex chars, build a long_val.
>>      0.00 : 355   * Return number of chars processed.
>>      0.00 : 356   */
>>      0.00 : 357  int hex2u64(const char *ptr, u64 *long_val)
>>      2.38 : 358  {
>>      2.38 : 359          const char *p = ptr;
>>      0.00 : 360          *long_val = 0;
>>      0.00 : 361
>>     30.95 : 362          while (*p) {
>>     23.81 : 363                  const int hex_val = hex(*p);
>>      0.00 : 364
>>     14.29 : 365                  if (hex_val < 0)
>>      0.00 : 366                          break;
>>      0.00 : 367
>>     26.19 : 368                  *long_val = (*long_val << 4) | hex_val;
>>      0.00 : 369                  p++;
>>      0.00 : 370          }
>>      0.00 : 371
>>      0.00 : 372          return p - ptr;
>>      0.00 : 373  }
>>
>> And I added many perf developers into Cc: because I want to listen to your
>> opinions about this new feature, if you don't mind.
>>
>> If you give some feedback, I'd appreciate it! :)
> 
> Thanks Taeung,
> 
> I requested this feature some time ago and it's really cool to see someone
> step up and implement it - much appreciated!

Thank you so much, Milian !! :)

> 
> I just tested it out on my pet-example that leverages C++ instead of C:
> 
> ~~~~~
> #include <complex>
> #include <cmath>
> #include <random>
> #include <iostream>
> 
> using namespace std;
> 
> int main()
> {
>      uniform_real_distribution<double> uniform(-1E5, 1E5);
>      default_random_engine engine;
>      double s = 0;
>      for (int i = 0; i < 10000000; ++i) {
>          s += norm(complex<double>(uniform(engine), uniform(engine)));
>      }
>      cout << s << '\n';
>      return 0;
> }
> ~~~~~
> 
> Compile it with:
> 
> g++ -O2 -g -std=c++11 test.cpp -o test
> 
> Then record it with perf:
> 
> perf record --call-graph dwarf ./test
> 
> Then analyse it with `perf report`. You'll see one entry for main with
> something like:
> 
> +  100.00%    39.69%  cpp-inlining  cpp-inlining      [.] main
> 
> Select it and annotate it, then switch to your new source-only view:
> 
> main  test.cpp
>         │  30
>         │  31    using namespace std;
>         │  32
>         │  33    int main()
>         │+ 34    {
>         │  35        uniform_real_distribution<double> uniform(-1E5, 1E5);
>         │  36        default_random_engine engine;
>         │+ 37        double s = 0;
>         │+ 38        for (int i = 0; i < 10000000; ++i) {
>    4.88 │+ 39            s += norm(complex<double>(uniform(engine),
> uniform(engine)));
>         │  40        }
>         │  41        cout << s << '\n';
>         │  42        return 0;
>         │+ 43    }
> 
> Note: the line numbers are off b/c my file contains a file-header on-top.
> Ignore that.
> 
> Note2: There is no column header shown, so it's unclear what the first column
> represents.
> 
> Note 3: report showed 39.69% self cost in main, 100.00% inclusive. annotate
> shows 4.88... What is that?
> 
> What this shows, is that it's extremely important to visualize inclusive cost
> _and_ self cost in this view. Additionally, we need to account for inlining.
> Right now, we only see the self cost that is directly within main, I suspect.
> For C++ this is usually very misleading, and basically makes the annotate view
> completely useless for application-level profiling. If a second column would
> be added with the inclusive cost with the ability to drill down, then I could
> easily see myself using this view.
> 
> I would appreciate if you could take this into account.
> 
> Thanks a lot
> 
> 

Sure, I got it.
I'll investigate this weird case and recheck this patchset based on your 
comments,
and then I'll reply again. :)

Thanks,
Taeung

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFC 0/4] perf annotate: Add --source-only option and the new source code TUI view
  2017-06-28 16:27   ` Taeung Song
@ 2017-06-28 16:32     ` Milian Wolff
  0 siblings, 0 replies; 8+ messages in thread
From: Milian Wolff @ 2017-06-28 16:32 UTC (permalink / raw)
  To: Taeung Song
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Adrian Hunter,
	Andi Kleen, David Ahern, Jin Yao, Jiri Olsa, Kim Phillips,
	Masami Hiramatsu, Namhyung Kim, Wang Nan

[-- Attachment #1: Type: text/plain, Size: 5535 bytes --]

On Wednesday, June 28, 2017 6:27:34 PM CEST Taeung Song wrote:
> On 06/28/2017 06:53 PM, Milian Wolff wrote:
> > On Wednesday, June 28, 2017 5:18:08 AM CEST Taeung Song wrote:
> >> Hi,
> >> 
> >> The --source-only option and new source code TUI view can show
> >> the result of performance analysis based on full source code per
> >> symbol(function). (Namhyung Kim told me this idea and it was also
> >> requested
> >> by others some time ago..)
> >> 
> >> If someone wants to see the cause, he/she will need to dig into the asm.
> >> But before that, looking at the source level can give a hint or clue
> >> for the problem.
> >> 
> >> For example, if target symbol is 'hex2u64' of util/util.c,
> >> the output is like below.
> >> 
> >>      $ perf annotate --source-only --stdio -s hex2u64
> >>   
> >>   Percent |      Source code of util.c for cycles:ppp (42 samples)
> >> 
> >> -----------------------------------------------------------------
> >> 
> >>      0.00 : 354   * While we find nice hex chars, build a long_val.
> >>      0.00 : 355   * Return number of chars processed.
> >>      0.00 : 356   */
> >>      0.00 : 357  int hex2u64(const char *ptr, u64 *long_val)
> >>      2.38 : 358  {
> >>      2.38 : 359          const char *p = ptr;
> >>      0.00 : 360          *long_val = 0;
> >>      0.00 : 361
> >>     
> >>     30.95 : 362          while (*p) {
> >>     23.81 : 363                  const int hex_val = hex(*p);
> >>     
> >>      0.00 : 364
> >>     
> >>     14.29 : 365                  if (hex_val < 0)
> >>     
> >>      0.00 : 366                          break;
> >>      0.00 : 367
> >>     
> >>     26.19 : 368                  *long_val = (*long_val << 4) | hex_val;
> >>     
> >>      0.00 : 369                  p++;
> >>      0.00 : 370          }
> >>      0.00 : 371
> >>      0.00 : 372          return p - ptr;
> >>      0.00 : 373  }
> >> 
> >> And I added many perf developers into Cc: because I want to listen to
> >> your
> >> opinions about this new feature, if you don't mind.
> >> 
> >> If you give some feedback, I'd appreciate it! :)
> > 
> > Thanks Taeung,
> > 
> > I requested this feature some time ago and it's really cool to see someone
> > step up and implement it - much appreciated!
> 
> Thank you so much, Milian !! :)
> 
> > I just tested it out on my pet-example that leverages C++ instead of C:
> > 
> > ~~~~~
> > #include <complex>
> > #include <cmath>
> > #include <random>
> > #include <iostream>
> > 
> > using namespace std;
> > 
> > int main()
> > {
> > 
> >      uniform_real_distribution<double> uniform(-1E5, 1E5);
> >      default_random_engine engine;
> >      double s = 0;
> >      for (int i = 0; i < 10000000; ++i) {
> >      
> >          s += norm(complex<double>(uniform(engine), uniform(engine)));
> >      
> >      }
> >      cout << s << '\n';
> >      return 0;
> > 
> > }
> > ~~~~~
> > 
> > Compile it with:
> > 
> > g++ -O2 -g -std=c++11 test.cpp -o test
> > 
> > Then record it with perf:
> > 
> > perf record --call-graph dwarf ./test
> > 
> > Then analyse it with `perf report`. You'll see one entry for main with
> > something like:
> > 
> > +  100.00%    39.69%  cpp-inlining  cpp-inlining      [.] main
> > 
> > Select it and annotate it, then switch to your new source-only view:
> > 
> > main  test.cpp
> > 
> >         │  30
> >         │  31    using namespace std;
> >         │  32
> >         │  33    int main()
> >         │+ 34    {
> >         │  35        uniform_real_distribution<double> uniform(-1E5, 1E5);
> >         │  36        default_random_engine engine;
> >         │+ 37        double s = 0;
> >         │+ 38        for (int i = 0; i < 10000000; ++i) {
> >    
> >    4.88 │+ 39            s += norm(complex<double>(uniform(engine),
> > 
> > uniform(engine)));
> > 
> >         │  40        }
> >         │  41        cout << s << '\n';
> >         │  42        return 0;
> >         │+ 43    }
> > 
> > Note: the line numbers are off b/c my file contains a file-header on-top.
> > Ignore that.
> > 
> > Note2: There is no column header shown, so it's unclear what the first
> > column represents.
> > 
> > Note 3: report showed 39.69% self cost in main, 100.00% inclusive.
> > annotate
> > shows 4.88... What is that?
> > 
> > What this shows, is that it's extremely important to visualize inclusive
> > cost _and_ self cost in this view. Additionally, we need to account for
> > inlining. Right now, we only see the self cost that is directly within
> > main, I suspect. For C++ this is usually very misleading, and basically
> > makes the annotate view completely useless for application-level
> > profiling. If a second column would be added with the inclusive cost with
> > the ability to drill down, then I could easily see myself using this
> > view.
> > 
> > I would appreciate if you could take this into account.
> > 
> > Thanks a lot
> 
> Sure, I got it.
> I'll investigate this weird case and recheck this patchset based on your
> comments,
> and then I'll reply again. :)

Cool, I'm happy to test this. Note though that this is not really a "weird 
case" for a C++ developer. It's rather the norm of what we have to deal 
with...

Cheers

-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 3826 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFC 0/4] perf annotate: Add --source-only option and the new source code TUI view
  2017-06-28  9:53 ` Milian Wolff
  2017-06-28 16:27   ` Taeung Song
@ 2017-06-29  7:11   ` Namhyung Kim
  2017-06-30  7:14     ` Taeung Song
  2017-06-30 16:21   ` Taeung Song
  2 siblings, 1 reply; 8+ messages in thread
From: Namhyung Kim @ 2017-06-29  7:11 UTC (permalink / raw)
  To: Milian Wolff
  Cc: Taeung Song, Arnaldo Carvalho de Melo, linux-kernel,
	Adrian Hunter, Andi Kleen, David Ahern, Jin Yao, Jiri Olsa,
	Kim Phillips, Masami Hiramatsu, Wang Nan, kernel-team

Hello,

On Wed, Jun 28, 2017 at 11:53:22AM +0200, Milian Wolff wrote:
> On Wednesday, June 28, 2017 5:18:08 AM CEST Taeung Song wrote:
> > Hi,
> > 
> > The --source-only option and new source code TUI view can show
> > the result of performance analysis based on full source code per
> > symbol(function). (Namhyung Kim told me this idea and it was also requested
> > by others some time ago..)
> > 
> > If someone wants to see the cause, he/she will need to dig into the asm.
> > But before that, looking at the source level can give a hint or clue
> > for the problem.
> > 
> > For example, if target symbol is 'hex2u64' of util/util.c,
> > the output is like below.
> > 
> >     $ perf annotate --source-only --stdio -s hex2u64
> >  Percent |      Source code of util.c for cycles:ppp (42 samples)
> > -----------------------------------------------------------------
> >     0.00 : 354   * While we find nice hex chars, build a long_val.
> >     0.00 : 355   * Return number of chars processed.
> >     0.00 : 356   */
> >     0.00 : 357  int hex2u64(const char *ptr, u64 *long_val)
> >     2.38 : 358  {
> >     2.38 : 359          const char *p = ptr;
> >     0.00 : 360          *long_val = 0;
> >     0.00 : 361
> >    30.95 : 362          while (*p) {
> >    23.81 : 363                  const int hex_val = hex(*p);
> >     0.00 : 364
> >    14.29 : 365                  if (hex_val < 0)
> >     0.00 : 366                          break;
> >     0.00 : 367
> >    26.19 : 368                  *long_val = (*long_val << 4) | hex_val;
> >     0.00 : 369                  p++;
> >     0.00 : 370          }
> >     0.00 : 371
> >     0.00 : 372          return p - ptr;
> >     0.00 : 373  }
> > 
> > And I added many perf developers into Cc: because I want to listen to your
> > opinions about this new feature, if you don't mind.
> > 
> > If you give some feedback, I'd appreciate it! :)
> 
> Thanks Taeung,
> 
> I requested this feature some time ago and it's really cool to see someone 
> step up and implement it - much appreciated!
> 
> I just tested it out on my pet-example that leverages C++ instead of C:
> 
> ~~~~~
> #include <complex>
> #include <cmath>
> #include <random>
> #include <iostream>
> 
> using namespace std;
> 
> int main()
> {
>     uniform_real_distribution<double> uniform(-1E5, 1E5);
>     default_random_engine engine;
>     double s = 0;
>     for (int i = 0; i < 10000000; ++i) {
>         s += norm(complex<double>(uniform(engine), uniform(engine)));
>     }
>     cout << s << '\n';
>     return 0;
> }
> ~~~~~
> 
> Compile it with:
> 
> g++ -O2 -g -std=c++11 test.cpp -o test
> 
> Then record it with perf:
> 
> perf record --call-graph dwarf ./test
> 
> Then analyse it with `perf report`. You'll see one entry for main with 
> something like:
> 
> +  100.00%    39.69%  cpp-inlining  cpp-inlining      [.] main
> 
> Select it and annotate it, then switch to your new source-only view:
> 
> main  test.cpp
>        │  30                                                                                                                                                                                             >        │  31    using namespace std;                                                                                                                                                                     >        │  32                                                                                                                                                                                             >        │  33    int main()                                                                                                                                                                               >        │+ 34    {                                                                                                                                                                                        >        │  35        uniform_real_distribution<double> uniform(-1E5, 1E5);                                                                                                                                >        │  36        default_random_engine engine;                                                                                                                                                        >        │+ 37        double s = 0;                                                                                                                                                                        >        │+ 38        for (int i = 0; i < 10000000; ++i) {                                                                                                                                                 >   4.88 │+ 39            s += norm(complex<double>(uniform(engine), uniform(engine)));                                                                                                                    >        │  40        }                                                                                                                                                                                    >        │  41        cout << s << '\n';                                                                                                                                                                   >        │  42        return 0;                                                                                                                                                                            >        │+ 43    } 
> 
> Note: the line numbers are off b/c my file contains a file-header on-top. 
> Ignore that.
> 
> Note2: There is no column header shown, so it's unclear what the first column 
> represents.
> 
> Note 3: report showed 39.69% self cost in main, 100.00% inclusive. annotate 
> shows 4.88... What is that?
> 
> What this shows, is that it's extremely important to visualize inclusive cost 
> _and_ self cost in this view. Additionally, we need to account for inlining. 
> Right now, we only see the self cost that is directly within main, I suspect. 

Currently perf annotate doesn't use the sample period, it uses sample
count instead and print the percentage within the function.  So it's a
different number to the perf report.  I think we need to fix this
first.

Thanks,
Namhyung


> For C++ this is usually very misleading, and basically makes the annotate view 
> completely useless for application-level profiling. If a second column would 
> be added with the inclusive cost with the ability to drill down, then I could 
> easily see myself using this view.
> 
> I would appreciate if you could take this into account.
> 
> Thanks a lot
> 
> 
> -- 
> Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
> KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
> Tel: +49-30-521325470
> KDAB - The Qt Experts

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFC 0/4] perf annotate: Add --source-only option and the new source code TUI view
  2017-06-29  7:11   ` Namhyung Kim
@ 2017-06-30  7:14     ` Taeung Song
  0 siblings, 0 replies; 8+ messages in thread
From: Taeung Song @ 2017-06-30  7:14 UTC (permalink / raw)
  To: Namhyung Kim, Milian Wolff, Arnaldo Carvalho de Melo
  Cc: linux-kernel, Adrian Hunter, Andi Kleen, David Ahern, Jin Yao,
	Jiri Olsa, Kim Phillips, Masami Hiramatsu, Wang Nan, kernel-team

Hi,

On 06/29/2017 04:11 PM, Namhyung Kim wrote:
> Hello,
> 
> On Wed, Jun 28, 2017 at 11:53:22AM +0200, Milian Wolff wrote:
>> On Wednesday, June 28, 2017 5:18:08 AM CEST Taeung Song wrote:
>>> Hi,
>>>
>>> The --source-only option and new source code TUI view can show
>>> the result of performance analysis based on full source code per
>>> symbol(function). (Namhyung Kim told me this idea and it was also requested
>>> by others some time ago..)
>>>
>>> If someone wants to see the cause, he/she will need to dig into the asm.
>>> But before that, looking at the source level can give a hint or clue
>>> for the problem.
>>>
>>> For example, if target symbol is 'hex2u64' of util/util.c,
>>> the output is like below.
>>>
>>>      $ perf annotate --source-only --stdio -s hex2u64
>>>   Percent |      Source code of util.c for cycles:ppp (42 samples)
>>> -----------------------------------------------------------------
>>>      0.00 : 354   * While we find nice hex chars, build a long_val.
>>>      0.00 : 355   * Return number of chars processed.
>>>      0.00 : 356   */
>>>      0.00 : 357  int hex2u64(const char *ptr, u64 *long_val)
>>>      2.38 : 358  {
>>>      2.38 : 359          const char *p = ptr;
>>>      0.00 : 360          *long_val = 0;
>>>      0.00 : 361
>>>     30.95 : 362          while (*p) {
>>>     23.81 : 363                  const int hex_val = hex(*p);
>>>      0.00 : 364
>>>     14.29 : 365                  if (hex_val < 0)
>>>      0.00 : 366                          break;
>>>      0.00 : 367
>>>     26.19 : 368                  *long_val = (*long_val << 4) | hex_val;
>>>      0.00 : 369                  p++;
>>>      0.00 : 370          }
>>>      0.00 : 371
>>>      0.00 : 372          return p - ptr;
>>>      0.00 : 373  }
>>>
>>> And I added many perf developers into Cc: because I want to listen to your
>>> opinions about this new feature, if you don't mind.
>>>
>>> If you give some feedback, I'd appreciate it! :)
>>
>> Thanks Taeung,
>>
>> I requested this feature some time ago and it's really cool to see someone
>> step up and implement it - much appreciated!
>>
>> I just tested it out on my pet-example that leverages C++ instead of C:
>>
>> ~~~~~
>> #include <complex>
>> #include <cmath>
>> #include <random>
>> #include <iostream>
>>
>> using namespace std;
>>
>> int main()
>> {
>>      uniform_real_distribution<double> uniform(-1E5, 1E5);
>>      default_random_engine engine;
>>      double s = 0;
>>      for (int i = 0; i < 10000000; ++i) {
>>          s += norm(complex<double>(uniform(engine), uniform(engine)));
>>      }
>>      cout << s << '\n';
>>      return 0;
>> }
>> ~~~~~
>>
>> Compile it with:
>>
>> g++ -O2 -g -std=c++11 test.cpp -o test
>>
>> Then record it with perf:
>>
>> perf record --call-graph dwarf ./test
>>
>> Then analyse it with `perf report`. You'll see one entry for main with
>> something like:
>>
>> +  100.00%    39.69%  cpp-inlining  cpp-inlining      [.] main
>>
>> Select it and annotate it, then switch to your new source-only view:
>>
>> main  test.cpp
>>         │  30                                                                                                                                                                                             >        │  31    using namespace std;                                                                                                                                                                     >        │  32                                                                                                                                                                                             >        │  33    int main()                                                                                                                                                                               >        │+ 34    {                                                                                                                                                                                        >        │  35        uniform_real_distribution<double> uniform(-1E5, 1E5);                                                                                                                                >        │  36        default_random_engine engine;                                                                                                                                                        >        │+ 37        double s = 0;                                                                                                                                                                        >        │+ 38        for (int i = 0; i < 10000000; ++i) {                                                                                                                                                 >   4.88 │+ 39            s += norm(complex<double>(uniform(engine), uniform(engine)));                                                                                                                
    >        │  40        }                                                                                                                                                                                    >        │  41        cout << s << '\n';                                                                                                                                                                   >        │  42        return 0;                                                                                                                                                                            >        │+ 43    }
>>
>> Note: the line numbers are off b/c my file contains a file-header on-top.
>> Ignore that.
>>
>> Note2: There is no column header shown, so it's unclear what the first column
>> represents.
>>
>> Note 3: report showed 39.69% self cost in main, 100.00% inclusive. annotate
>> shows 4.88... What is that?
>>
>> What this shows, is that it's extremely important to visualize inclusive cost
>> _and_ self cost in this view. Additionally, we need to account for inlining.
>> Right now, we only see the self cost that is directly within main, I suspect.
> 
> Currently perf annotate doesn't use the sample period, it uses sample
> count instead and print the percentage within the function.  So it's a
> different number to the perf report.  I think we need to fix this
> first.
> 
> Thanks,
> Namhyung
> 

I understood. Hum.. so we need to replace the menu and column about 
total period to things about sample count like below ?

"t             Toggle total period view"

   -> "t             Toggle showing the number of samples for"

(I'm not sure what a short key(e.g. 't') is proper..)

Or modifying the code related to the number of samples,
show actual total period on perf-annotate ?

What do you think about this change ?

Thanks,
Taeung

> 
>> For C++ this is usually very misleading, and basically makes the annotate view
>> completely useless for application-level profiling. If a second column would
>> be added with the inclusive cost with the ability to drill down, then I could
>> easily see myself using this view.
>>
>> I would appreciate if you could take this into account.
>>
>> Thanks a lot
>>
>>
>> -- 
>> Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
>> KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
>> Tel: +49-30-521325470
>> KDAB - The Qt Experts
> 
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFC 0/4] perf annotate: Add --source-only option and the new source code TUI view
  2017-06-28  9:53 ` Milian Wolff
  2017-06-28 16:27   ` Taeung Song
  2017-06-29  7:11   ` Namhyung Kim
@ 2017-06-30 16:21   ` Taeung Song
  2 siblings, 0 replies; 8+ messages in thread
From: Taeung Song @ 2017-06-30 16:21 UTC (permalink / raw)
  To: Milian Wolff
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Adrian Hunter,
	Andi Kleen, David Ahern, Jin Yao, Jiri Olsa, Kim Phillips,
	Masami Hiramatsu, Namhyung Kim, Wang Nan

I'm late..

On 06/28/2017 06:53 PM, Milian Wolff wrote:
> On Wednesday, June 28, 2017 5:18:08 AM CEST Taeung Song wrote:
>> Hi,
>>
>> The --source-only option and new source code TUI view can show
>> the result of performance analysis based on full source code per
>> symbol(function). (Namhyung Kim told me this idea and it was also requested
>> by others some time ago..)
>>
>> If someone wants to see the cause, he/she will need to dig into the asm.
>> But before that, looking at the source level can give a hint or clue
>> for the problem.
>>
>> For example, if target symbol is 'hex2u64' of util/util.c,
>> the output is like below.
>>
>>      $ perf annotate --source-only --stdio -s hex2u64
>>   Percent |      Source code of util.c for cycles:ppp (42 samples)
>> -----------------------------------------------------------------
>>      0.00 : 354   * While we find nice hex chars, build a long_val.
>>      0.00 : 355   * Return number of chars processed.
>>      0.00 : 356   */
>>      0.00 : 357  int hex2u64(const char *ptr, u64 *long_val)
>>      2.38 : 358  {
>>      2.38 : 359          const char *p = ptr;
>>      0.00 : 360          *long_val = 0;
>>      0.00 : 361
>>     30.95 : 362          while (*p) {
>>     23.81 : 363                  const int hex_val = hex(*p);
>>      0.00 : 364
>>     14.29 : 365                  if (hex_val < 0)
>>      0.00 : 366                          break;
>>      0.00 : 367
>>     26.19 : 368                  *long_val = (*long_val << 4) | hex_val;
>>      0.00 : 369                  p++;
>>      0.00 : 370          }
>>      0.00 : 371
>>      0.00 : 372          return p - ptr;
>>      0.00 : 373  }
>>
>> And I added many perf developers into Cc: because I want to listen to your
>> opinions about this new feature, if you don't mind.
>>
>> If you give some feedback, I'd appreciate it! :)
> 
> Thanks Taeung,
> 
> I requested this feature some time ago and it's really cool to see someone
> step up and implement it - much appreciated!
> 
> I just tested it out on my pet-example that leverages C++ instead of C:
> 
> ~~~~~
> #include <complex>
> #include <cmath>
> #include <random>
> #include <iostream>
> 
> using namespace std;
> 
> int main()
> {
>      uniform_real_distribution<double> uniform(-1E5, 1E5);
>      default_random_engine engine;
>      double s = 0;
>      for (int i = 0; i < 10000000; ++i) {
>          s += norm(complex<double>(uniform(engine), uniform(engine)));
>      }
>      cout << s << '\n';
>      return 0;
> }
> ~~~~~
> 
> Compile it with:
> 
> g++ -O2 -g -std=c++11 test.cpp -o test
> 
> Then record it with perf:
> 
> perf record --call-graph dwarf ./test
> 
> Then analyse it with `perf report`. You'll see one entry for main with
> something like:
> 
> +  100.00%    39.69%  cpp-inlining  cpp-inlining      [.] main
> 
> Select it and annotate it, then switch to your new source-only view:
> 
> main  test.cpp
>         │  30
>         │  31    using namespace std;
>         │  32
>         │  33    int main()
>         │+ 34    {
>         │  35        uniform_real_distribution<double> uniform(-1E5, 1E5);
>         │  36        default_random_engine engine;
>         │+ 37        double s = 0;
>         │+ 38        for (int i = 0; i < 10000000; ++i) {
>    4.88 │+ 39            s += norm(complex<double>(uniform(engine),
> uniform(engine)));
>         │  40        }
>         │  41        cout << s << '\n';
>         │  42        return 0;
>         │+ 43    }
> 
> Note: the line numbers are off b/c my file contains a file-header on-top.
> Ignore that.
> 
> Note2: There is no column header shown, so it's unclear what the first column
> represents.

Okey, I'll add the first column.

> 
> Note 3: report showed 39.69% self cost in main, 100.00% inclusive. annotate
> shows 4.88... What is that?

My case is different from your result..but I'll keep digging things
related to the inclusive and self cost to understand the above case, 
remaking v2 patchset.

> 
> What this shows, is that it's extremely important to visualize inclusive cost
> _and_ self cost in this view. Additionally, we need to account for inlining.
> Right now, we only see the self cost that is directly within main, I suspect.

I handled only one source code file on this patchset,
so I also think it seems to be needed to check other source code file to 
handle inlining..

> For C++ this is usually very misleading, and basically makes the annotate view
> completely useless for application-level profiling. If a second column would
> be added with the inclusive cost with the ability to drill down, then I could
> easily see myself using this view.

Okey, will try to make the feature for 'drill down' after this basic 
patchset.


BTW, I'll resend v2 patchset later

   - considering several source files when getting srcline info
   - considering inline function
   - investigating the inclusive and self cost

Thanks,
Taeung

> 
> I would appreciate if you could take this into account.
> 
> Thanks a lot
> 
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-06-30 16:22 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-28  3:18 [PATCH/RFC 0/4] perf annotate: Add --source-only option and the new source code TUI view Taeung Song
2017-06-28  3:26 ` Taeung Song
2017-06-28  9:53 ` Milian Wolff
2017-06-28 16:27   ` Taeung Song
2017-06-28 16:32     ` Milian Wolff
2017-06-29  7:11   ` Namhyung Kim
2017-06-30  7:14     ` Taeung Song
2017-06-30 16:21   ` Taeung Song

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.