From: kan.liang@linux.intel.com
To: acme@kernel.org, jolsa@redhat.com, peterz@infradead.org,
	mingo@redhat.com, linux-kernel@vger.kernel.org
Cc: namhyung@kernel.org, adrian.hunter@intel.com,
	mathieu.poirier@linaro.org, ravi.bangoria@linux.ibm.com,
	alexey.budankov@linux.intel.com, vitaly.slobodskoy@intel.com,
	pavel.gerasimov@intel.com, mpe@ellerman.id.au,
	eranian@google.com, ak@linux.intel.com,
	Kan Liang <kan.liang@linux.intel.com>
Subject: [PATCH V3 17/17] perf hist: Add fast path for duplicate entries check
Date: Fri, 13 Mar 2020 11:33:19 -0700
Message-ID: <20200313183319.17739-18-kan.liang@linux.intel.com>
In-Reply-To: <20200313183319.17739-1-kan.liang@linux.intel.com>

From: Kan Liang <kan.liang@linux.intel.com>

Perf checks for duplicate entries in a callchain before adding an
entry. However, the check is very slow, especially with a deep call
stack. Almost 50% of the elapsed time of perf report is spent on the
check when the call stack consistently has a depth of 32.

hist_entry__cmp() is used to compare the new entry with the old
entries. It walks all the available sort keys in the sort_list and
calls each key's specific cmp, which is very slow.
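
For reference, hist_entry__cmp() is roughly the following. This is a
simplified sketch, not the verbatim source; the real function in
tools/perf/util/hist.c also skips undefined dynamic entries:

 int64_t hist_entry__cmp(struct hist_entry *left, struct hist_entry *right)
 {
 	struct hists *hists = left->hists;
 	struct perf_hpp_fmt *fmt;
 	int64_t cmp = 0;

 	/* Walk every configured sort key and call its cmp callback. */
 	hists__for_each_sort_list(hists, fmt) {
 		cmp = fmt->cmp(fmt, left, right);
 		if (cmp)
 			break;
 	}

 	return cmp;
 }
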
In most cases there are no duplicate entries in a callchain; the
symbols usually differ. It is much faster to do a quick check of the
symbols first, and only run the full cmp when the symbols are exactly
the same. The quick check compares only symbols, not DSOs. Export
_sort__sym_cmp() so the fast path can use it.
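
The symbol comparison itself is cheap. Ignoring inlined symbols, it
boils down to an address compare. A simplified sketch (the full
version in tools/perf/util/sort.c also matches inlined symbols by
name and address range):

 int64_t _sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r)
 {
 	if (!sym_l || !sym_r)
 		return cmp_null(sym_l, sym_r);

 	if (sym_l == sym_r)
 		return 0;

 	/* Inlined-symbol handling elided. */
 	return (int64_t)(sym_l->start - sym_r->start);
 }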

 $ perf record --call-graph lbr ./tchain_edit_64

 Without the patch:
 $ time perf report --stdio
 real    0m21.142s
 user    0m21.110s
 sys     0m0.033s

 With the patch:
 $ time perf report --stdio
 real    0m10.977s
 user    0m10.948s
 sys     0m0.027s

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/hist.c | 23 +++++++++++++++++++++++
 tools/perf/util/sort.c |  2 +-
 tools/perf/util/sort.h |  2 ++
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index e74a5acf66d9..311d6d119f3c 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -1057,6 +1057,20 @@ iter_next_cumulative_entry(struct hist_entry_iter *iter,
 	return fill_callchain_info(al, node, iter->hide_unresolved);
 }
 
+static bool
+hist_entry__fast__sym_diff(struct hist_entry *left,
+			   struct hist_entry *right)
+{
+	struct symbol *sym_l = left->ms.sym;
+	struct symbol *sym_r = right->ms.sym;
+
+	if (!sym_l && !sym_r)
+		return left->ip != right->ip;
+
+	return !!_sort__sym_cmp(sym_l, sym_r);
+}
+
+
 static int
 iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 			       struct addr_location *al)
@@ -1083,6 +1097,7 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 	};
 	int i;
 	struct callchain_cursor cursor;
+	bool fast = hists__has(he_tmp.hists, sym);
 
 	callchain_cursor_snapshot(&cursor, &callchain_cursor);
 
@@ -1093,6 +1108,14 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 	 * It's possible that it has cycles or recursive calls.
 	 */
 	for (i = 0; i < iter->curr; i++) {
+		/*
+		 * For most cases, there are no duplicate entries in callchain.
+		 * The symbols are usually different. Do a quick check for
+		 * symbols first.
+		 */
+		if (fast && hist_entry__fast__sym_diff(he_cache[i], &he_tmp))
+			continue;
+
 		if (hist_entry__cmp(he_cache[i], &he_tmp) == 0) {
 			/* to avoid calling callback function */
 			iter->he = NULL;
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index ab0cfd790ad0..33e0fa1bc203 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -234,7 +234,7 @@ static int64_t _sort__addr_cmp(u64 left_ip, u64 right_ip)
 	return (int64_t)(right_ip - left_ip);
 }
 
-static int64_t _sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r)
+int64_t _sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r)
 {
 	if (!sym_l || !sym_r)
 		return cmp_null(sym_l, sym_r);
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 6c862d62d052..c3c3c68cbfdd 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -309,5 +309,7 @@ int64_t
 sort__daddr_cmp(struct hist_entry *left, struct hist_entry *right);
 int64_t
 sort__dcacheline_cmp(struct hist_entry *left, struct hist_entry *right);
+int64_t
+_sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r);
 char *hist_entry__srcline(struct hist_entry *he);
 #endif	/* __PERF_SORT_H */
-- 
2.17.1


Thread overview: 26+ messages
2020-03-13 18:33 [PATCH V3 00/17] Stitch LBR call stack (Perf Tools) kan.liang
2020-03-13 18:33 ` [PATCH V3 01/17] perf pmu: Add support for PMU capabilities kan.liang
2020-03-18 19:47   ` Arnaldo Carvalho de Melo
2020-03-19 13:07     ` Liang, Kan
2020-03-13 18:33 ` [PATCH V3 02/17] perf header: Support CPU " kan.liang
2020-03-18 19:49   ` Arnaldo Carvalho de Melo
2020-03-13 18:33 ` [PATCH V3 03/17] perf record: Clear HEADER_CPU_PMU_CAPS for non LBR call stack mode kan.liang
2020-03-13 18:33 ` [PATCH V3 04/17] perf stat: Clear HEADER_CPU_PMU_CAPS kan.liang
2020-03-13 18:33 ` [PATCH V3 05/17] perf machine: Remove the indent in resolve_lbr_callchain_sample kan.liang
2020-03-18 19:50   ` Arnaldo Carvalho de Melo
2020-03-13 18:33 ` [PATCH V3 06/17] perf machine: Refine the function for LBR call stack reconstruction kan.liang
2020-03-13 18:33 ` [PATCH V3 07/17] perf machine: Factor out lbr_callchain_add_kernel_ip() kan.liang
2020-03-13 18:33 ` [PATCH V3 08/17] perf machine: Factor out lbr_callchain_add_lbr_ip() kan.liang
2020-03-13 18:33 ` [PATCH V3 09/17] perf thread: Add a knob for LBR stitch approach kan.liang
2020-03-13 18:33 ` [PATCH V3 10/17] perf tools: Save previous sample for LBR stitching approach kan.liang
2020-03-18 12:14   ` Jiri Olsa
2020-03-18 14:20     ` Liang, Kan
2020-03-13 18:33 ` [PATCH V3 11/17] perf tools: Save previous cursor nodes " kan.liang
2020-03-13 18:33 ` [PATCH V3 12/17] perf tools: Stitch LBR call stack kan.liang
2020-03-13 18:33 ` [PATCH V3 13/17] perf report: Add option to enable the LBR stitching approach kan.liang
2020-03-13 18:33 ` [PATCH V3 14/17] perf script: " kan.liang
2020-03-13 18:33 ` [PATCH V3 15/17] perf top: " kan.liang
2020-03-18 12:14   ` Jiri Olsa
2020-03-18 14:19     ` Liang, Kan
2020-03-13 18:33 ` [PATCH V3 16/17] perf c2c: " kan.liang
2020-03-13 18:33 ` kan.liang [this message]
