From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
To: acme@redhat.com, mingo@kernel.org, peterz@infradead.org,
	eranian@google.com, robert.richter@amd.com, asharma@fb.com
Cc: mpjohn@us.ibm.com, Anton Blanchard <anton@au1.ibm.com>,
	paulus@samba.org, linux-kernel@vger.kernel.org,
	linuxppc-dev@ozlabs.org
Subject: [RFC][PATCH] perf: Add a few generic stalled-cycles events
Date: Thu, 11 Oct 2012 18:28:39 -0700
Message-ID: <20121012012839.GA15348@us.ibm.com>


From 89cb6a25b9f714e55a379467a832ee015014ed11 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Date: Tue, 18 Sep 2012 10:59:01 -0700
Subject: [PATCH] perf: Add a few generic stalled-cycles events

The existing generic event 'stalled-cycles-backend' corresponds to the
PM_CMPLU_STALL event on Power7. While this event is useful, detailed
performance analysis often requires finding more specific reasons
for the stalled cycles. For instance, stalled cycles on Power7 can
occur due to, among others:

	- Instruction fetch unit (IFU)
	- Load-store unit (LSU)
	- Fixed-point unit (FXU)
	- Branch unit (BRU)

While it is possible to use raw codes to monitor these events, doing so
quickly becomes cumbersome, because performance analysis frequently
requires mapping the raw event codes in reports back to their symbolic
names.
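
To illustrate the manual mapping described above, here is a minimal sketch
of a lookup table from raw Power7 event codes to symbolic names. The codes
and names are the ones used in the power7-pmu.c hunk of this patch; the
struct and the helper lookup_stall_event() are hypothetical and exist only
for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* One row per stall event: raw Power7 code, PMU name, proposed generic name. */
struct stall_event {
	uint64_t raw_code;
	const char *pmu_name;
	const char *generic_name;
};

/* Codes and names as they appear in this patch. */
static const struct stall_event power7_stall_events[] = {
	{ 0x20014, "PM_CMPLU_STALL_FXU", "stalled-cycles-fixed-point" },
	{ 0x20012, "PM_CMPLU_STALL_LSU", "stalled-cycles-load-store" },
	{ 0x4004c, "PM_CMPLU_STALL_IFU", "stalled-cycles-instruction-fetch" },
	{ 0x4004e, "PM_CMPLU_STALL_BRU", "stalled-cycles-branch" },
};

/*
 * Hypothetical helper: return the table entry for a raw event code,
 * or NULL if the code is not one of the four stall events above.
 */
static const struct stall_event *lookup_stall_event(uint64_t raw_code)
{
	size_t n = sizeof(power7_stall_events) / sizeof(power7_stall_events[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (power7_stall_events[i].raw_code == raw_code)
			return &power7_stall_events[i];
	}
	return NULL;
}
```

With generic event names, this kind of per-report translation would no
longer have to be done by hand.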

This patch is a proposal to generalize such perf events. Since
the code changes are quite simple, I bundled all four events together.

I am not familiar with how readily these events would map to other
architectures. Here is some information on the events for Power7:

	stalled-cycles-fixed-point (PM_CMPLU_STALL_FXU)

		Following a completion stall, the last instruction to finish
		before completion resumes was from the Fixed Point Unit.

		Completion stall is any period when no groups completed and
		the completion table was not empty for that thread.

	stalled-cycles-load-store (PM_CMPLU_STALL_LSU)

		Following a completion stall, the last instruction to finish
		before completion resumes was from the Load-Store Unit.

	stalled-cycles-instruction-fetch (PM_CMPLU_STALL_IFU)

		Following a completion stall, the last instruction to finish
		before completion resumes was from the Instruction Fetch Unit.

	stalled-cycles-branch (PM_CMPLU_STALL_BRU)

		Following a completion stall, the last instruction to finish
		before completion resumes was from the Branch Unit.
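
As a rough sketch of how counts from these events might be combined, the
helpers below compute each unit's share of the total completion-stall
cycles. This assumes the per-unit counts are subsets of the PM_CMPLU_STALL
total (stalled-cycles-backend), which the descriptions above suggest but do
not state outright. The struct and function names are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical container for counts read from the existing
 * stalled-cycles-backend event and the four proposed events.
 */
struct stall_counts {
	uint64_t total;	/* stalled-cycles-backend (PM_CMPLU_STALL) */
	uint64_t fxu;	/* stalled-cycles-fixed-point */
	uint64_t lsu;	/* stalled-cycles-load-store */
	uint64_t ifu;	/* stalled-cycles-instruction-fetch */
	uint64_t bru;	/* stalled-cycles-branch */
};

/* Percentage of total stall cycles attributed to one unit; 0.0 if total is 0. */
static double stall_share_pct(uint64_t unit_cycles, uint64_t total_cycles)
{
	if (total_cycles == 0)
		return 0.0;
	return 100.0 * (double)unit_cycles / (double)total_cycles;
}

/*
 * Stall cycles not covered by the four per-unit events. Clamped to 0
 * in case the per-unit counts add up to more than the total (for
 * example, if the counters were not all measured over the same interval).
 */
static uint64_t stall_other(const struct stall_counts *c)
{
	uint64_t known = c->fxu + c->lsu + c->ifu + c->bru;

	return c->total > known ? c->total - known : 0;
}
```

A caller would fill struct stall_counts from the counter values reported for
one run, then print stall_share_pct() for each unit alongside stall_other().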

I am looking for feedback on this approach and on whether it can be
extended further. Power7 has 530 events[2], of which a "CPI stack
analysis"[1] uses about 26.


[1] CPI Stack analysis
	https://www.power.org/documentation/commonly-used-metrics-for-performance-analysis

[2] Power7 events:
	https://www.power.org/documentation/comprehensive-pmu-event-reference-power7/

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 arch/powerpc/perf/power7-pmu.c |    4 ++++
 include/linux/perf_event.h     |    4 ++++
 tools/perf/builtin-stat.c      |    4 ++++
 tools/perf/util/evsel.c        |    4 ++++
 tools/perf/util/parse-events.l |    4 ++++
 tools/perf/util/python.c       |    4 ++++
 6 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index 1251e4d..813e7c7 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -304,6 +304,10 @@ static int power7_generic_events[] = {
 	[PERF_COUNT_HW_CACHE_MISSES] = 0x400f0,		/* LD_MISS_L1	*/
 	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = 0x10068,	/* BRU_FIN	*/
 	[PERF_COUNT_HW_BRANCH_MISSES] = 0x400f6,	/* BR_MPRED	*/
+	[PERF_COUNT_HW_STALLED_CYCLES_FIXED_POINT] = 0x20014,/* CMPLU_STALL_FXU */
+	[PERF_COUNT_HW_STALLED_CYCLES_LOAD_STORE] = 0x20012,/* CMPLU_STALL_LSU */
+	[PERF_COUNT_HW_STALLED_CYCLES_INSTRUCTION_FETCH] = 0x4004c,/* CMPLU_STALL_IFU */
+	[PERF_COUNT_HW_STALLED_CYCLES_BRANCH] = 0x4004e,/* CMPLU_STALL_BRU */
 };
 
 #define C(x)	PERF_COUNT_HW_CACHE_##x
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index bdb4161..ff9f0a6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -55,6 +55,10 @@ enum perf_hw_id {
 	PERF_COUNT_HW_STALLED_CYCLES_FRONTEND	= 7,
 	PERF_COUNT_HW_STALLED_CYCLES_BACKEND	= 8,
 	PERF_COUNT_HW_REF_CPU_CYCLES		= 9,
+	PERF_COUNT_HW_STALLED_CYCLES_FIXED_POINT = 10,
+	PERF_COUNT_HW_STALLED_CYCLES_LOAD_STORE	= 11,
+	PERF_COUNT_HW_STALLED_CYCLES_INSTRUCTION_FETCH = 12,
+	PERF_COUNT_HW_STALLED_CYCLES_BRANCH	= 13,
 
 	PERF_COUNT_HW_MAX,			/* non-ABI */
 };
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 861f0ae..6275dbb 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -77,6 +77,10 @@ static struct perf_event_attr default_attrs[] = {
   { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS		},
   { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS	},
   { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES		},
+  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_FIXED_POINT },
+  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_LOAD_STORE },
+  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_INSTRUCTION_FETCH },
+  { .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_BRANCH },
 
 };
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 2eaae14..17e3190 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -77,6 +77,10 @@ static const char *perf_evsel__hw_names[PERF_COUNT_HW_MAX] = {
 	"stalled-cycles-frontend",
 	"stalled-cycles-backend",
 	"ref-cycles",
+	"stalled-cycles-fixed-point",
+	"stalled-cycles-load-store",
+	"stalled-cycles-instruction-fetch",
+	"stalled-cycles-branch",
 };
 
 static const char *__perf_evsel__hw_name(u64 config)
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 384ca74..0c49c05 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -102,6 +102,10 @@ branch-instructions|branches			{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_
 branch-misses					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_MISSES); }
 bus-cycles					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_BUS_CYCLES); }
 ref-cycles					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_REF_CPU_CYCLES); }
+stalled-cycles-fixed-point			{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_FIXED_POINT); }
+stalled-cycles-load-store			{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_LOAD_STORE); }
+stalled-cycles-instruction-fetch		{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_INSTRUCTION_FETCH); }
+stalled-cycles-branch				{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_BRANCH); }
 cpu-clock					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_CLOCK); }
 task-clock					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_TASK_CLOCK); }
 page-faults|faults				{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS); }
diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
index 0688bfb..c563b30 100644
--- a/tools/perf/util/python.c
+++ b/tools/perf/util/python.c
@@ -952,6 +952,10 @@ static struct {
 
 	{ "COUNT_HW_STALLED_CYCLES_FRONTEND",	  PERF_COUNT_HW_STALLED_CYCLES_FRONTEND },
 	{ "COUNT_HW_STALLED_CYCLES_BACKEND",	  PERF_COUNT_HW_STALLED_CYCLES_BACKEND },
+	{ "COUNT_HW_STALLED_CYCLES_FIXED_POINT",  PERF_COUNT_HW_STALLED_CYCLES_FIXED_POINT },
+	{ "COUNT_HW_STALLED_CYCLES_LOAD_STORE",  PERF_COUNT_HW_STALLED_CYCLES_LOAD_STORE },
+	{ "COUNT_HW_STALLED_CYCLES_INSTRUCTION_FETCH",  PERF_COUNT_HW_STALLED_CYCLES_INSTRUCTION_FETCH },
+	{ "COUNT_HW_STALLED_CYCLES_BRANCH",  PERF_COUNT_HW_STALLED_CYCLES_BRANCH },
 
 	{ "COUNT_SW_CPU_CLOCK",	       PERF_COUNT_SW_CPU_CLOCK },
 	{ "COUNT_SW_TASK_CLOCK",       PERF_COUNT_SW_TASK_CLOCK },
-- 
1.7.1
