All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 2/2] perf sort: Improve symbol sort output by separating unresolved samples by type
@ 2011-08-31  1:51 Anton Blanchard
  2011-08-31 12:36 ` emunson
  0 siblings, 1 reply; 2+ messages in thread
From: Anton Blanchard @ 2011-08-31  1:51 UTC (permalink / raw)
  To: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, emunson, imunsie
  Cc: linux-kernel


I took a profile that suggested 60% of total CPU time was in the
hypervisor:

# perf report --sort symbol
...
    60.20%  [H] 0x33d43c        
     4.43%  [k] ._spin_lock_irqsave
     1.07%  [k] ._spin_lock

Using perf stat to get the user/kernel/hypervisor breakdown contradicted
this.

The problem is we merge all unresolved samples into the one unknown
bucket. If add a comparison by sample type to sort__sym_cmp we get the
real picture:

# perf report --sort symbol
...
    57.11%  [.] 0x80fbf63c
     4.43%  [k] ._spin_lock_irqsave 
     1.07%  [k] ._spin_lock
     0.65%  [H] 0x33d43c

So it was almost all userspace, not hypervisor as the initial profile
suggested.

I found another issue while adding this. Symbol sorting sometimes shows
multiple entries for the unknown bucket:

# perf report --sort symbol
...
    16.65%  [.] 0x6cd3a8        
     7.25%  [.] 0x422460
     5.37%  [.] yylex
     4.79%  [.] malloc
     4.78%  [.] _int_malloc
     4.03%  [.] _int_free
     3.95%  [.] hash_source_code_string
     2.82%  [.] 0x532908
     2.64%  [.] 0x36b538
     0.94%  [H] 0x8000000000e132a4
     0.82%  [H] 0x800000000000e8b0

This happens because we aren't consistent with our sorting. On
one hand we check to see if both symbols match and for two unresolved
samples sym is NULL so we match:

        if (left->ms.sym == right->ms.sym)
                return 0;

On the other hand we use sample IP for unresolved samples when
comparing against a symbol:

       ip_l = left->ms.sym ? left->ms.sym->start : left->ip;
       ip_r = right->ms.sym ? right->ms.sym->start : right->ip;

This means unresolved samples end up spread across the rbtree and we
can't merge them all.

If we use cmp_null all unresolved samples will end up in the one bucket
and the output makes more sense:

# perf report --sort symbol
...
    39.12%  [.] 0x36b538        
     5.37%  [.] yylex
     4.79%  [.] malloc
     4.78%  [.] _int_malloc
     4.03%  [.] _int_free
     3.95%  [.] hash_source_code_string
     2.26%  [H] 0x800000000000e8b0

Signed-off-by: Anton Blanchard <anton@samba.org>
---

Index: linux-2.6-tip/tools/perf/util/sort.c
===================================================================
--- linux-2.6-tip.orig/tools/perf/util/sort.c	2011-08-31 10:26:17.296618713 +1000
+++ linux-2.6-tip/tools/perf/util/sort.c	2011-08-31 10:32:12.703212452 +1000
@@ -157,11 +157,17 @@ sort__sym_cmp(struct hist_entry *left, s
 {
 	u64 ip_l, ip_r;
 
+	if (!left->ms.sym && !right->ms.sym)
+		return right->level - left->level;
+
+	if (!left->ms.sym || !right->ms.sym)
+		return cmp_null(left->ms.sym, right->ms.sym);
+
 	if (left->ms.sym == right->ms.sym)
 		return 0;
 
-	ip_l = left->ms.sym ? left->ms.sym->start : left->ip;
-	ip_r = right->ms.sym ? right->ms.sym->start : right->ip;
+	ip_l = left->ms.sym->start;
+	ip_r = right->ms.sym->start;
 
 	return (int64_t)(ip_r - ip_l);
 }

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH 2/2] perf sort: Improve symbol sort output by separating unresolved samples by type
  2011-08-31  1:51 [PATCH 2/2] perf sort: Improve symbol sort output by separating unresolved samples by type Anton Blanchard
@ 2011-08-31 12:36 ` emunson
  0 siblings, 0 replies; 2+ messages in thread
From: emunson @ 2011-08-31 12:36 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: Peter Zijlstra, Paul Mackerras, Ingo Molnar,
	Arnaldo Carvalho de Melo, Frederic Weisbecker, imunsie,
	linux-kernel

 On Wed, 31 Aug 2011 11:51:45 +1000, Anton Blanchard wrote:
> I took a profile that suggested 60% of total CPU time was in the
> hypervisor:
>
> # perf report --sort symbol
> ...
>     60.20%  [H] 0x33d43c
>      4.43%  [k] ._spin_lock_irqsave
>      1.07%  [k] ._spin_lock
>
> Using perf stat to get the user/kernel/hypervisor breakdown 
> contradicted
> this.
>
> The problem is we merge all unresolved samples into the one unknown
> bucket. If add a comparison by sample type to sort__sym_cmp we get 
> the
> real picture:
>
> # perf report --sort symbol
> ...
>     57.11%  [.] 0x80fbf63c
>      4.43%  [k] ._spin_lock_irqsave
>      1.07%  [k] ._spin_lock
>      0.65%  [H] 0x33d43c
>
> So it was almost all userspace, not hypervisor as the initial profile
> suggested.
>
> I found another issue while adding this. Symbol sorting sometimes 
> shows
> multiple entries for the unknown bucket:
>
> # perf report --sort symbol
> ...
>     16.65%  [.] 0x6cd3a8
>      7.25%  [.] 0x422460
>      5.37%  [.] yylex
>      4.79%  [.] malloc
>      4.78%  [.] _int_malloc
>      4.03%  [.] _int_free
>      3.95%  [.] hash_source_code_string
>      2.82%  [.] 0x532908
>      2.64%  [.] 0x36b538
>      0.94%  [H] 0x8000000000e132a4
>      0.82%  [H] 0x800000000000e8b0
>
> This happens because we aren't consistent with our sorting. On
> one hand we check to see if both symbols match and for two unresolved
> samples sym is NULL so we match:
>
>         if (left->ms.sym == right->ms.sym)
>                 return 0;
>
> On the other hand we use sample IP for unresolved samples when
> comparing against a symbol:
>
>        ip_l = left->ms.sym ? left->ms.sym->start : left->ip;
>        ip_r = right->ms.sym ? right->ms.sym->start : right->ip;
>
> This means unresolved samples end up spread across the rbtree and we
> can't merge them all.
>
> If we use cmp_null all unresolved samples will end up in the one 
> bucket
> and the output makes more sense:
>
> # perf report --sort symbol
> ...
>     39.12%  [.] 0x36b538
>      5.37%  [.] yylex
>      4.79%  [.] malloc
>      4.78%  [.] _int_malloc
>      4.03%  [.] _int_free
>      3.95%  [.] hash_source_code_string
>      2.26%  [H] 0x800000000000e8b0
>
> Signed-off-by: Anton Blanchard <anton@samba.org>


 Acked-by: Eric B Munson <emunson@mgebm.net>


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2011-08-31 12:36 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-31  1:51 [PATCH 2/2] perf sort: Improve symbol sort output by separating unresolved samples by type Anton Blanchard
2011-08-31 12:36 ` emunson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.