linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH 00/10][v6] powerpc/perf: Export memory hierarchy level in Power7/8.
@ 2013-10-16  2:06 Sukadev Bhattiprolu
  2013-10-16  2:06 ` [PATCH 01/10][v6] powerpc: Rename branch_opcode() to instr_opcode() Sukadev Bhattiprolu
                   ` (9 more replies)
  0 siblings, 10 replies; 16+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-16  2:06 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Michael Ellerman, linux-kernel, Stephane Eranian, linuxppc-dev,
	Paul Mackerras, Anshuman Khandual

Power7 and Power8 processors record the memory hierarchy level (e.g. L2, L3)
from which a load or store instruction was satisfied. Export this hierarchy
information to user space via the perf_mem_data_src object.

Thanks to input from Stephane Eranian, Michael Ellerman, Michael Neuling
and Anshuman Khandual.

Sukadev Bhattiprolu (10):
  powerpc: Rename branch_opcode() to instr_opcode()
  powerpc/Power7: detect load/store instructions
  tools/perf: silence compiler warnings
  tools/perf: Remove local byteorder.h.
  powerpc/perf: Remove PME_ prefix for power7 events
  powerpc/perf: Export Power8 generic events in sysfs
  powerpc/perf: Add Power8 event PM_MRK_GRP_CMPL to sysfs.
  powerpc/perf: Define big-endian version of perf_mem_data_src
  powerpc/perf: Export Power8 memory hierarchy info to user space.
  powerpc/perf: Export Power7 memory hierarchy info to user space.

 arch/powerpc/include/asm/code-patching.h     |    1 +
 arch/powerpc/include/asm/perf_event_server.h |    4 +-
 arch/powerpc/lib/code-patching.c             |   51 +++++++++++-
 arch/powerpc/perf/core-book3s.c              |   11 +++
 arch/powerpc/perf/power7-pmu.c               |  112 +++++++++++++++++++++++---
 arch/powerpc/perf/power8-events-list.h       |   21 +++++
 arch/powerpc/perf/power8-pmu.c               |   97 ++++++++++++++++++++--
 include/uapi/linux/perf_event.h              |   16 ++++
 tools/perf/Makefile                          |    1 -
 tools/perf/util/include/asm/byteorder.h      |    2 -
 tools/perf/util/include/linux/types.h        |   20 +++++
 tools/perf/util/srcline.c                    |    4 +-
 12 files changed, 316 insertions(+), 24 deletions(-)
 create mode 100644 arch/powerpc/perf/power8-events-list.h
 delete mode 100644 tools/perf/util/include/asm/byteorder.h

-- 
1.7.9.5

* [PATCH 01/10][v6] powerpc: Rename branch_opcode() to instr_opcode()
  2013-10-16  2:06 [PATCH 00/10][v6] powerpc/perf: Export memory hierarchy level in Power7/8 Sukadev Bhattiprolu
@ 2013-10-16  2:06 ` Sukadev Bhattiprolu
  2013-10-16  2:06 ` [PATCH 02/10][v6] powerpc/Power7: detect load/store instructions Sukadev Bhattiprolu
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-16  2:06 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Michael Ellerman, linux-kernel, Stephane Eranian, linuxppc-dev,
	Paul Mackerras, Anshuman Khandual

The logic used in branch_opcode() to extract the opcode of an instruction
also applies to non-branch instructions, so rename it to instr_opcode().
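
As a minimal illustration of why the helper is not branch-specific, the
sketch below applies the same shift-and-mask to a branch and to a load.
instr_is_lwz() is a hypothetical helper added only for this example; it is
not part of the patch.

```c
#include <stdint.h>

/* Primary opcode: the top 6 bits of a 32-bit Power instruction word. */
static unsigned int instr_opcode(uint32_t instr)
{
	return (instr >> 26) & 0x3F;
}

/* I-form branches (b, bl, ba, bla) use primary opcode 18. */
static int instr_is_branch_iform(uint32_t instr)
{
	return instr_opcode(instr) == 18;
}

/* Hypothetical non-branch user: D-form "lwz" uses primary opcode 32. */
static int instr_is_lwz(uint32_t instr)
{
	return instr_opcode(instr) == 32;
}
```

For instance, 0x48000000 (an unconditional branch) and 0x80610000
(lwz r3,0(r1)) go through the same extraction.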

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 arch/powerpc/lib/code-patching.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 17e5b23..2bc9db3 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -72,19 +72,19 @@ unsigned int create_cond_branch(const unsigned int *addr,
 	return instruction;
 }
 
-static unsigned int branch_opcode(unsigned int instr)
+static unsigned int instr_opcode(unsigned int instr)
 {
 	return (instr >> 26) & 0x3F;
 }
 
 static int instr_is_branch_iform(unsigned int instr)
 {
-	return branch_opcode(instr) == 18;
+	return instr_opcode(instr) == 18;
 }
 
 static int instr_is_branch_bform(unsigned int instr)
 {
-	return branch_opcode(instr) == 16;
+	return instr_opcode(instr) == 16;
 }
 
 int instr_is_relative_branch(unsigned int instr)
-- 
1.7.9.5

* [PATCH 02/10][v6] powerpc/Power7: detect load/store instructions
  2013-10-16  2:06 [PATCH 00/10][v6] powerpc/perf: Export memory hierarchy level in Power7/8 Sukadev Bhattiprolu
  2013-10-16  2:06 ` [PATCH 01/10][v6] powerpc: Rename branch_opcode() to instr_opcode() Sukadev Bhattiprolu
@ 2013-10-16  2:06 ` Sukadev Bhattiprolu
  2013-10-16  8:25   ` David Laight
  2013-10-16  2:06 ` [PATCH 03/10][v6] tools/perf: silence compiler warnings Sukadev Bhattiprolu
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 16+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-16  2:06 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Michael Ellerman, linux-kernel, Stephane Eranian, linuxppc-dev,
	Paul Mackerras, Anshuman Khandual

Implement instr_is_load_store_2_06() to detect whether a given instruction
is one of the fixed-point or floating-point load/store instructions in the
POWER Instruction Set Architecture v2.06.

This function will be used in a follow-on patch to save the memory hierarchy
information of a load/store on a Power7 system. (Power8 systems set bits in
the SIER to identify load/store operations and hence don't need similar
functionality.)
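
A rough user-space sketch of the two decoding steps involved is shown
below: the primary-opcode range test used by the patch, and extraction of
the 10-bit extended opcode for X-form instructions (primary opcode 31).
The helper names are made up for this example, and the sketch stops short
of classifying extended opcodes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Primary opcode: the top 6 bits of the instruction word. */
static unsigned int primary_opcode(uint32_t instr)
{
	return (instr >> 26) & 0x3F;
}

/* 10-bit extended opcode of an X-form instruction (primary opcode 31). */
static unsigned int extended_opcode(uint32_t instr)
{
	return (instr >> 1) & 0x3FF;
}

/* Primary-opcode ranges that the patch treats as loads/stores. */
static bool is_load_store_primary(uint32_t instr)
{
	unsigned int op = primary_opcode(instr);

	return (op >= 32 && op <= 58) || op == 61 || op == 62;
}
```

For example, 0x7C642A14 (add r3,r4,r5) and 0x7C64282E (lwzx r3,r4,r5) both
have primary opcode 31 and differ only in the extended opcode (266 vs. 23),
which is why the second step is needed.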

Based on optimized code from Michael Ellerman and comments from Tom Musta.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
Changelog[v6]
	- [Michael Ellerman, Tom Musta]: Optimize the implementation to
	  avoid the for loop.

 arch/powerpc/include/asm/code-patching.h |    1 +
 arch/powerpc/lib/code-patching.c         |   45 ++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index a6f8c7a..9cc3ef1 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -34,6 +34,7 @@ int instr_is_branch_to_addr(const unsigned int *instr, unsigned long addr);
 unsigned long branch_target(const unsigned int *instr);
 unsigned int translate_branch(const unsigned int *dest,
 			      const unsigned int *src);
+int instr_is_load_store_2_06(const unsigned int *instr);
 
 static inline unsigned long ppc_function_entry(void *func)
 {
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 2bc9db3..49fb9d7 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -159,6 +159,51 @@ unsigned int translate_branch(const unsigned int *dest, const unsigned int *src)
 	return 0;
 }
 
+/*
+ * Determine if the op code in the instruction corresponds to a load or
+ * store instruction. Ignore the vector load instructions like evlddepx,
+ * evstddepx for now.
+ *
+ * This function is valid for POWER ISA 2.06.
+ *
+ * Reference:	PowerISA_V2.06B_Public.pdf, Sections 3.3.2 through 3.3.6
+ *		and 4.6.2 through 4.6.4, Appendix F (Opcode Maps).
+ */
+int instr_is_load_store_2_06(const unsigned int *instr)
+{
+	unsigned int op, upper, lower;
+
+	op = instr_opcode(*instr);
+
+	if ((op >= 32 && op <= 58) || (op == 61 || op == 62))
+		return true;
+
+	if (op != 31)
+		return false;
+
+	upper = (*instr >> 6) & 0x1f;	/* extended opcode, bits 9:5 */
+	lower = (*instr >> 1) & 0x1f;	/* extended opcode, bits 4:0 */
+
+	/* Short circuit as many misses as we can */
+	if (lower < 3 || lower > 23)
+		return false;
+
+	if (lower == 3) {
+		if (upper >= 16)
+			return true;
+
+		return false;
+	}
+
+	if (lower == 7 || lower == 12)
+		return true;
+
+	if (lower >= 20) /* && lower <= 23 (implicit) */
+		return true;
+
+	return false;
+}
+
 
 #ifdef CONFIG_CODE_PATCHING_SELFTEST
 
-- 
1.7.9.5

* [PATCH 03/10][v6] tools/perf: silence compiler warnings
  2013-10-16  2:06 [PATCH 00/10][v6] powerpc/perf: Export memory hierarchy level in Power7/8 Sukadev Bhattiprolu
  2013-10-16  2:06 ` [PATCH 01/10][v6] powerpc: Rename branch_opcode() to instr_opcode() Sukadev Bhattiprolu
  2013-10-16  2:06 ` [PATCH 02/10][v6] powerpc/Power7: detect load/store instructions Sukadev Bhattiprolu
@ 2013-10-16  2:06 ` Sukadev Bhattiprolu
  2013-10-16  2:06 ` [PATCH 04/10][v6] tools/perf: Remove local byteorder.h Sukadev Bhattiprolu
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-16  2:06 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Michael Ellerman, linux-kernel, Stephane Eranian, linuxppc-dev,
	Paul Mackerras, Anshuman Khandual

The variables 'file' and 'line' in get_srcline() may be used uninitialized,
which causes compiler warnings. These warnings are treated as errors unless
the build is run with WERROR=0.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 tools/perf/util/srcline.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index 10983a9..0477055 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -223,8 +223,8 @@ out:
 
 char *get_srcline(struct dso *dso, unsigned long addr)
 {
-	char *file;
-	unsigned line;
+	char *file = NULL;
+	unsigned line = 0;
 	char *srcline;
 	char *dso_name = dso->long_name;
 	size_t size;
-- 
1.7.9.5

* [PATCH 04/10][v6] tools/perf: Remove local byteorder.h.
  2013-10-16  2:06 [PATCH 00/10][v6] powerpc/perf: Export memory hierarchy level in Power7/8 Sukadev Bhattiprolu
                   ` (2 preceding siblings ...)
  2013-10-16  2:06 ` [PATCH 03/10][v6] tools/perf: silence compiler warnings Sukadev Bhattiprolu
@ 2013-10-16  2:06 ` Sukadev Bhattiprolu
  2013-10-16  2:06 ` [PATCH 05/10][v6] powerpc/perf: Remove PME_ prefix for power7 events Sukadev Bhattiprolu
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-16  2:06 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Michael Ellerman, linux-kernel, Stephane Eranian, linuxppc-dev,
	Paul Mackerras, Anshuman Khandual

Remove the local tools/perf/util/include/asm/byteorder.h and add
a few missing typedefs to tools/perf/util/include/linux/types.h.

The local byteorder.h complicates defining big/little endian versions
of data structures in include/uapi/linux/perf_event.h.

Fix proposed by Michael Ellerman.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 tools/perf/Makefile                     |    1 -
 tools/perf/util/include/asm/byteorder.h |    2 --
 tools/perf/util/include/linux/types.h   |   20 ++++++++++++++++++++
 3 files changed, 20 insertions(+), 3 deletions(-)
 delete mode 100644 tools/perf/util/include/asm/byteorder.h

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index b62e12d..3c4a7d9 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -225,7 +225,6 @@ LIB_H += util/include/linux/types.h
 LIB_H += util/include/linux/linkage.h
 LIB_H += util/include/asm/asm-offsets.h
 LIB_H += util/include/asm/bug.h
-LIB_H += util/include/asm/byteorder.h
 LIB_H += util/include/asm/hweight.h
 LIB_H += util/include/asm/swab.h
 LIB_H += util/include/asm/system.h
diff --git a/tools/perf/util/include/asm/byteorder.h b/tools/perf/util/include/asm/byteorder.h
deleted file mode 100644
index 2a9bdc0..0000000
--- a/tools/perf/util/include/asm/byteorder.h
+++ /dev/null
@@ -1,2 +0,0 @@
-#include <asm/types.h>
-#include "../../../../include/uapi/linux/swab.h"
diff --git a/tools/perf/util/include/linux/types.h b/tools/perf/util/include/linux/types.h
index eb46478..775f68e 100644
--- a/tools/perf/util/include/linux/types.h
+++ b/tools/perf/util/include/linux/types.h
@@ -7,10 +7,30 @@
 #define __bitwise
 #endif
 
+#ifndef __le16
+typedef __u16 __bitwise __le16;
+#endif
+
 #ifndef __le32
 typedef __u32 __bitwise __le32;
 #endif
 
+#ifndef __be16
+typedef __u16 __bitwise __be16;
+#endif
+
+#ifndef __le64
+typedef __u64 __bitwise __le64;
+#endif
+
+#ifndef __be32
+typedef __u32 __bitwise __be32;
+#endif
+
+#ifndef __be64
+typedef __u64 __bitwise __be64;
+#endif
+
 #define DECLARE_BITMAP(name,bits) \
 	unsigned long name[BITS_TO_LONGS(bits)]
 
-- 
1.7.9.5

* [PATCH 05/10][v6] powerpc/perf: Remove PME_ prefix for power7 events
  2013-10-16  2:06 [PATCH 00/10][v6] powerpc/perf: Export memory hierarchy level in Power7/8 Sukadev Bhattiprolu
                   ` (3 preceding siblings ...)
  2013-10-16  2:06 ` [PATCH 04/10][v6] tools/perf: Remove local byteorder.h Sukadev Bhattiprolu
@ 2013-10-16  2:06 ` Sukadev Bhattiprolu
  2013-10-16  2:06 ` [PATCH 06/10][v6] powerpc/perf: Export Power8 generic events in sysfs Sukadev Bhattiprolu
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-16  2:06 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Michael Ellerman, linux-kernel, Stephane Eranian, linuxppc-dev,
	Paul Mackerras, Anshuman Khandual

We used the PME_ prefix earlier to avoid some macro/variable name
collisions.  We have since changed the way we define/use the event
macros so we no longer need the prefix.

By dropping the prefix, we keep the event macros consistent with
their official names.

Reported-by: Michael Ellerman <ellerman@au1.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/perf_event_server.h |    2 +-
 arch/powerpc/perf/power7-pmu.c               |   18 +++++++++---------
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index 3fd2f1b..d7b3419 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -138,7 +138,7 @@ extern ssize_t power_events_sysfs_show(struct device *dev,
 #define	EVENT_PTR(_id, _suffix)		&EVENT_VAR(_id, _suffix).attr.attr
 
 #define	EVENT_ATTR(_name, _id, _suffix)					\
-	PMU_EVENT_ATTR(_name, EVENT_VAR(_id, _suffix), PME_##_id,	\
+	PMU_EVENT_ATTR(_name, EVENT_VAR(_id, _suffix), _id,		\
 			power_events_sysfs_show)
 
 #define	GENERIC_EVENT_ATTR(_name, _id)	EVENT_ATTR(_name, _id, _g)
diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index 56c67bc..ae24dfc 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -54,7 +54,7 @@
  * Power7 event codes.
  */
 #define EVENT(_name, _code) \
-	PME_##_name = _code,
+	_name = _code,
 
 enum {
 #include "power7-events-list.h"
@@ -318,14 +318,14 @@ static void power7_disable_pmc(unsigned int pmc, unsigned long mmcr[])
 }
 
 static int power7_generic_events[] = {
-	[PERF_COUNT_HW_CPU_CYCLES] =			PME_PM_CYC,
-	[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =	PME_PM_GCT_NOSLOT_CYC,
-	[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] =	PME_PM_CMPLU_STALL,
-	[PERF_COUNT_HW_INSTRUCTIONS] =			PME_PM_INST_CMPL,
-	[PERF_COUNT_HW_CACHE_REFERENCES] =		PME_PM_LD_REF_L1,
-	[PERF_COUNT_HW_CACHE_MISSES] =			PME_PM_LD_MISS_L1,
-	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] =		PME_PM_BRU_FIN,
-	[PERF_COUNT_HW_BRANCH_MISSES] =			PME_PM_BR_MPRED,
+	[PERF_COUNT_HW_CPU_CYCLES] =			PM_CYC,
+	[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =	PM_GCT_NOSLOT_CYC,
+	[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] =	PM_CMPLU_STALL,
+	[PERF_COUNT_HW_INSTRUCTIONS] =			PM_INST_CMPL,
+	[PERF_COUNT_HW_CACHE_REFERENCES] =		PM_LD_REF_L1,
+	[PERF_COUNT_HW_CACHE_MISSES] =			PM_LD_MISS_L1,
+	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] =		PM_BRU_FIN,
+	[PERF_COUNT_HW_BRANCH_MISSES] =			PM_BR_MPRED,
 };
 
 #define C(x)	PERF_COUNT_HW_CACHE_##x
-- 
1.7.9.5

* [PATCH 06/10][v6] powerpc/perf: Export Power8 generic events in sysfs
  2013-10-16  2:06 [PATCH 00/10][v6] powerpc/perf: Export memory hierarchy level in Power7/8 Sukadev Bhattiprolu
                   ` (4 preceding siblings ...)
  2013-10-16  2:06 ` [PATCH 05/10][v6] powerpc/perf: Remove PME_ prefix for power7 events Sukadev Bhattiprolu
@ 2013-10-16  2:06 ` Sukadev Bhattiprolu
  2013-10-16  2:06 ` [PATCH 07/10][v6] powerpc/perf: Add Power8 event PM_MRK_GRP_CMPL to sysfs Sukadev Bhattiprolu
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-16  2:06 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Michael Ellerman, linux-kernel, Stephane Eranian, linuxppc-dev,
	Paul Mackerras, Anshuman Khandual

Export generic perf events for Power8 in sysfs.
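
The patch expands one event list several times with different EVENT()
definitions. A self-contained sketch of that pattern follows. It uses a
list macro instead of re-including a header so that it fits in one file;
POWER8_EVENT_LIST, AS_ENUM, AS_DESC and struct event_desc are names
invented for this example.

```c
#include <stddef.h>

/* Hypothetical stand-in for power8-events-list.h: one X() per event. */
#define POWER8_EVENT_LIST(X)			\
	X(PM_CYC,		0x0001e)	\
	X(PM_INST_CMPL,		0x00002)	\
	X(PM_BRU_FIN,		0x10068)

/* Expansion 1: an enum of event codes. */
#define AS_ENUM(_name, _code)	_name = _code,
enum { POWER8_EVENT_LIST(AS_ENUM) };
#undef AS_ENUM

/* Expansion 2: a name/code table generated from the same list. */
struct event_desc {
	const char *name;
	unsigned int code;
};

#define AS_DESC(_name, _code)	{ #_name, _code },
static const struct event_desc power8_events[] = {
	POWER8_EVENT_LIST(AS_DESC)
};
#undef AS_DESC

static const size_t n_power8_events =
	sizeof(power8_events) / sizeof(power8_events[0]);
```

Adding an event then means touching only the list; every expansion picks
it up automatically.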

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
Changelog[v6]:
	[Michael Ellerman] Drop PME_ prefix in macros

 arch/powerpc/perf/power8-events-list.h |   20 +++++++++++++++
 arch/powerpc/perf/power8-pmu.c         |   44 +++++++++++++++++++++++++++-----
 2 files changed, 58 insertions(+), 6 deletions(-)
 create mode 100644 arch/powerpc/perf/power8-events-list.h

diff --git a/arch/powerpc/perf/power8-events-list.h b/arch/powerpc/perf/power8-events-list.h
new file mode 100644
index 0000000..1368547
--- /dev/null
+++ b/arch/powerpc/perf/power8-events-list.h
@@ -0,0 +1,20 @@
+/*
+ * Performance counter support for POWER8 processors.
+ *
+ * Copyright 2013 Sukadev Bhattiprolu, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+/*
+ * Some power8 event codes.
+ */
+EVENT(PM_CYC,				0x0001e)
+EVENT(PM_GCT_NOSLOT_CYC,		0x100f8)
+EVENT(PM_CMPLU_STALL,			0x4000a)
+EVENT(PM_INST_CMPL,			0x00002)
+EVENT(PM_BRU_FIN,			0x10068)
+EVENT(PM_BR_MPRED_CMPL,			0x400f6)
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 2ee4a70..5141d97 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -18,13 +18,13 @@
 /*
  * Some power8 event codes.
  */
-#define PM_CYC				0x0001e
-#define PM_GCT_NOSLOT_CYC		0x100f8
-#define PM_CMPLU_STALL			0x4000a
-#define PM_INST_CMPL			0x00002
-#define PM_BRU_FIN			0x10068
-#define PM_BR_MPRED_CMPL		0x400f6
+#define EVENT(_name, _code)	_name = _code,
 
+enum {
+#include "power8-events-list.h"
+};
+
+#undef EVENT
 
 /*
  * Raw event encoding for POWER8:
@@ -510,6 +510,37 @@ static void power8_disable_pmc(unsigned int pmc, unsigned long mmcr[])
 		mmcr[1] &= ~(0xffUL << MMCR1_PMCSEL_SHIFT(pmc + 1));
 }
 
+GENERIC_EVENT_ATTR(cpu-cycles,			PM_CYC);
+GENERIC_EVENT_ATTR(stalled-cycles-frontend,	PM_GCT_NOSLOT_CYC);
+GENERIC_EVENT_ATTR(stalled-cycles-backend,	PM_CMPLU_STALL);
+GENERIC_EVENT_ATTR(instructions,		PM_INST_CMPL);
+GENERIC_EVENT_ATTR(branch-instructions,		PM_BRU_FIN);
+GENERIC_EVENT_ATTR(branch-misses,		PM_BR_MPRED_CMPL);
+
+#define EVENT(_name, _code)	POWER_EVENT_ATTR(_name, _name);
+#include "power8-events-list.h"
+#undef EVENT
+
+#define EVENT(_name, _code)	POWER_EVENT_PTR(_name),
+
+static struct attribute *power8_events_attr[] = {
+	GENERIC_EVENT_PTR(PM_CYC),
+	GENERIC_EVENT_PTR(PM_GCT_NOSLOT_CYC),
+	GENERIC_EVENT_PTR(PM_CMPLU_STALL),
+	GENERIC_EVENT_PTR(PM_INST_CMPL),
+	GENERIC_EVENT_PTR(PM_BRU_FIN),
+	GENERIC_EVENT_PTR(PM_BR_MPRED_CMPL),
+
+	#include "power8-events-list.h"
+	#undef EVENT
+	NULL
+};
+
+static struct attribute_group power8_pmu_events_group = {
+	.name = "events",
+	.attrs = power8_events_attr,
+};
+
 PMU_FORMAT_ATTR(event,		"config:0-49");
 PMU_FORMAT_ATTR(pmcxsel,	"config:0-7");
 PMU_FORMAT_ATTR(mark,		"config:8");
@@ -546,6 +577,7 @@ struct attribute_group power8_pmu_format_group = {
 
 static const struct attribute_group *power8_pmu_attr_groups[] = {
 	&power8_pmu_format_group,
+	&power8_pmu_events_group,
 	NULL,
 };
 
-- 
1.7.9.5

* [PATCH 07/10][v6] powerpc/perf: Add Power8 event PM_MRK_GRP_CMPL to sysfs.
  2013-10-16  2:06 [PATCH 00/10][v6] powerpc/perf: Export memory hierarchy level in Power7/8 Sukadev Bhattiprolu
                   ` (5 preceding siblings ...)
  2013-10-16  2:06 ` [PATCH 06/10][v6] powerpc/perf: Export Power8 generic events in sysfs Sukadev Bhattiprolu
@ 2013-10-16  2:06 ` Sukadev Bhattiprolu
  2013-10-16  2:06 ` [PATCH 08/10][v6] powerpc/perf: Define big-endian version of perf_mem_data_src Sukadev Bhattiprolu
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 16+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-16  2:06 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Michael Ellerman, linux-kernel, Stephane Eranian, linuxppc-dev,
	Paul Mackerras, Anshuman Khandual

The perf event PM_MRK_GRP_CMPL is useful for analyzing the memory hierarchy
behavior of applications.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
Changelog[v6]:
	- [Michael Ellerman]: Drop redundant PME_ prefix from event name.

 arch/powerpc/perf/power8-events-list.h |    1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/perf/power8-events-list.h b/arch/powerpc/perf/power8-events-list.h
index 1368547..b39e117 100644
--- a/arch/powerpc/perf/power8-events-list.h
+++ b/arch/powerpc/perf/power8-events-list.h
@@ -18,3 +18,4 @@ EVENT(PM_CMPLU_STALL,			0x4000a)
 EVENT(PM_INST_CMPL,			0x00002)
 EVENT(PM_BRU_FIN,			0x10068)
 EVENT(PM_BR_MPRED_CMPL,			0x400f6)
+EVENT(PM_MRK_GRP_CMPL,			0x40130)
-- 
1.7.9.5

* [PATCH 08/10][v6] powerpc/perf: Define big-endian version of perf_mem_data_src
  2013-10-16  2:06 [PATCH 00/10][v6] powerpc/perf: Export memory hierarchy level in Power7/8 Sukadev Bhattiprolu
                   ` (6 preceding siblings ...)
  2013-10-16  2:06 ` [PATCH 07/10][v6] powerpc/perf: Add Power8 event PM_MRK_GRP_CMPL to sysfs Sukadev Bhattiprolu
@ 2013-10-16  2:06 ` Sukadev Bhattiprolu
  2013-10-16  2:06 ` [PATCH 09/10][v6] powerpc/perf: Export Power8 memory hierarchy info to user space Sukadev Bhattiprolu
  2013-10-16  2:06 ` [PATCH 10/10][v6] powerpc/perf: Export Power7 " Sukadev Bhattiprolu
  9 siblings, 0 replies; 16+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-16  2:06 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Michael Ellerman, linux-kernel, Stephane Eranian, linuxppc-dev,
	Paul Mackerras, Anshuman Khandual

perf_mem_data_src is a union that is initialized via the ->val field
and accessed via the bit-fields. For this to work on big-endian
platforms, we also need a big-endian representation of perf_mem_data_src.
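
The sketch below shows the idea in user space: the bit-field order is
flipped with the host byte order so that a value written through 'val'
reads back the same through the fields. It assumes GCC or Clang (for the
predefined __BYTE_ORDER__ macro and 64-bit bit-fields); the patch itself
keys off __BIG_ENDIAN_BITFIELD instead. union mem_data_src and
mem_data_src_from_val() are names made up for this example.

```c
#include <stdint.h>

union mem_data_src {
	uint64_t val;
	struct {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
		/* Little-endian: first field occupies the lowest bits. */
		uint64_t mem_op:5,	/* type of opcode */
			 mem_lvl:14,	/* memory hierarchy level */
			 mem_snoop:5,	/* snoop mode */
			 mem_lock:2,	/* lock instr */
			 mem_dtlb:7,	/* tlb access */
			 mem_rsvd:31;
#else
		/* Big-endian: first field occupies the highest bits. */
		uint64_t mem_rsvd:31,
			 mem_dtlb:7,
			 mem_lock:2,
			 mem_snoop:5,
			 mem_lvl:14,
			 mem_op:5;
#endif
	};
};

/* Initialize through ->val, as the kernel does. */
static union mem_data_src mem_data_src_from_val(uint64_t v)
{
	union mem_data_src d;

	d.val = v;
	return d;
}
```

With either layout, a val of 0x442 is expected to read back as
mem_op == 0x02 and mem_lvl == 0x22.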

Cc: Stephane Eranian <eranian@google.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
Changelog [v6]
	- [Michael Ellerman] Use __BIG_ENDIAN_BITFIELD to simplify the
	  endian check.

Changelog [v5]:
	- perf_event.h includes <byteorder.h> which pulls in the local
	  byteorder.h when building the perf tool. This local byteorder.h
	  leaves __LITTLE_ENDIAN and __BIG_ENDIAN undefined.
	  Include <endian.h> explicitly in the local byteorder.h.

Changelog [v2]:
	- [Vince Weaver, Michael Ellerman] No __KERNEL__ in uapi headers.

 include/uapi/linux/perf_event.h |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index ca1d90b..383052b7 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -695,6 +695,7 @@ enum perf_callchain_context {
 #define PERF_FLAG_FD_OUTPUT		(1U << 1)
 #define PERF_FLAG_PID_CGROUP		(1U << 2) /* pid=cgroup id, per-cpu mode only */
 
+#if defined(__LITTLE_ENDIAN_BITFIELD)
 union perf_mem_data_src {
 	__u64 val;
 	struct {
@@ -706,6 +707,21 @@ union perf_mem_data_src {
 			mem_rsvd:31;
 	};
 };
+#elif defined(__BIG_ENDIAN_BITFIELD)
+union perf_mem_data_src {
+	__u64 val;
+	struct {
+		__u64	mem_rsvd:31,
+			mem_dtlb:7,	/* tlb access */
+			mem_lock:2,	/* lock instr */
+			mem_snoop:5,	/* snoop mode */
+			mem_lvl:14,	/* memory hierarchy level */
+			mem_op:5;	/* type of opcode */
+	};
+};
+#else
+#error "Unknown endianness"
+#endif
 
 /* type of opcode (load/store/prefetch,code) */
 #define PERF_MEM_OP_NA		0x01 /* not available */
-- 
1.7.9.5

* [PATCH 09/10][v6] powerpc/perf: Export Power8 memory hierarchy info to user space.
  2013-10-16  2:06 [PATCH 00/10][v6] powerpc/perf: Export memory hierarchy level in Power7/8 Sukadev Bhattiprolu
                   ` (7 preceding siblings ...)
  2013-10-16  2:06 ` [PATCH 08/10][v6] powerpc/perf: Define big-endian version of perf_mem_data_src Sukadev Bhattiprolu
@ 2013-10-16  2:06 ` Sukadev Bhattiprolu
  2013-10-16  2:06 ` [PATCH 10/10][v6] powerpc/perf: Export Power7 " Sukadev Bhattiprolu
  9 siblings, 0 replies; 16+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-16  2:06 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Michael Ellerman, linux-kernel, Stephane Eranian, linuxppc-dev,
	Paul Mackerras, Anshuman Khandual

On Power8, the LDST field in the SIER identifies the memory hierarchy level
(e.g. L1, L2) from which a data-cache miss for a marked instruction
was satisfied.

Use the 'perf_mem_data_src' object to export this hierarchy level to user
space. Fortunately, the memory hierarchy levels in Power8 map fairly easily
into the arch-neutral levels as described by the ldst_src_map[] table.
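
A user-space sketch of that lookup is shown below, using the same shifts
and masks as the patch but mapping to descriptive strings instead of
PERF_MEM_* bits. The SIER_* macro names, the strings and
sier_data_source() are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

#define SIER_TYPE_SHIFT	15
#define SIER_TYPE_MASK	(0x7ULL << SIER_TYPE_SHIFT)
#define SIER_LDST_SHIFT	1
#define SIER_LDST_MASK	(0x7ULL << SIER_LDST_SHIFT)

/* Indexed by the 3-bit LDST field; mirrors ldst_src_map[] in the patch. */
static const char *const ldst_names[8] = {
	"N/A",
	"load hit L1",
	"load hit L2",
	"load hit L3",
	"load hit local RAM",
	"load hit remote cache (1 hop)",
	"load hit remote cache (2 hops)",
	"store miss L1",
};

/* Returns NULL when the sampled instruction is not a load or store. */
static const char *sier_data_source(uint64_t sier)
{
	uint64_t type = (sier & SIER_TYPE_MASK) >> SIER_TYPE_SHIFT;

	/* 1 = load, 2 = store */
	if (type != 1 && type != 2)
		return NULL;

	return ldst_names[(sier & SIER_LDST_MASK) >> SIER_LDST_SHIFT];
}
```

Because the LDST field is 3 bits wide, the index is always in the range
0..7 and needs no separate bounds check.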

Usage:

	perf record -d -e 'cpu/PM_MRK_GRP_CMPL/' <application>
	perf report -n --mem-mode --sort=mem,sym,dso,symbol_daddr,dso_daddr

		For samples involving load/store instructions, the memory
		hierarchy level is shown as "L1 hit", "Remote RAM hit" etc.
	# or

	perf record --data <application>
	perf report -D

		Sample records contain a 'data_src' field which encodes the
		memory hierarchy level: Eg: data_src 0x442 indicates
		MEM_OP_LOAD, MEM_LVL_HIT, MEM_LVL_L2 (i.e load hit L2).
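
The 0x442 example can be checked by hand with plain shifts and masks. The
constants below are local copies of values from
include/uapi/linux/perf_event.h (PERF_MEM_OP_*, PERF_MEM_LVL_*), written
from memory for this sketch; verify them against the header before relying
on them. The helper names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the PERF_MEM_* values in perf_event.h. */
#define MEM_OP_SHIFT	0
#define MEM_OP_LOAD	0x02
#define MEM_LVL_SHIFT	5
#define MEM_LVL_HIT	0x02
#define MEM_LVL_L2	0x20

/* mem_op is 5 bits wide, starting at bit 0. */
static uint64_t data_src_op(uint64_t val)
{
	return (val >> MEM_OP_SHIFT) & 0x1f;
}

/* mem_lvl is 14 bits wide, starting at bit 5. */
static uint64_t data_src_lvl(uint64_t val)
{
	return (val >> MEM_LVL_SHIFT) & 0x3fff;
}

static bool is_load_hit_l2(uint64_t val)
{
	uint64_t want = MEM_LVL_HIT | MEM_LVL_L2;

	return (data_src_op(val) & MEM_OP_LOAD) &&
	       (data_src_lvl(val) & want) == want;
}
```

Here 0x442 splits into 0x002 (load) plus 0x440, which is
(MEM_LVL_HIT | MEM_LVL_L2) << 5.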

Note that the PMU event PM_MRK_GRP_CMPL tracks all marked group completion
events. While some of these are loads and stores, other instructions such
as 'add' may also be sampled. One alternative is to sample on
PM_MRK_GRP_CMPL and throw away the samples that are neither loads nor
stores, but that could yield an inconsistent profile of the application.

As the precise semantics of 'perf mem -t load' or 'perf mem -t store' (which
require sampling only loads or only stores) cannot be implemented on Power,
we don't implement 'perf mem' on Power for now.

Thanks to input from Stephane Eranian, Michael Ellerman and Michael Neuling.

Cc: Stephane Eranian <eranian@google.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
Changelog[v2]:
	Drop support for 'perf mem' for Power (use perf-record and perf-report
	directly)

 arch/powerpc/include/asm/perf_event_server.h |    2 +
 arch/powerpc/perf/core-book3s.c              |   11 ++++++
 arch/powerpc/perf/power8-pmu.c               |   53 ++++++++++++++++++++++++++
 3 files changed, 66 insertions(+)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index d7b3419..5f2c449 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -38,6 +38,8 @@ struct power_pmu {
 	void            (*config_bhrb)(u64 pmu_bhrb_filter);
 	void		(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
 	int		(*limited_pmc_event)(u64 event_id);
+	void		(*get_mem_data_src)(union perf_mem_data_src *dsrc,
+				struct pt_regs *regs);
 	u32		flags;
 	const struct attribute_group	**attr_groups;
 	int		n_generic;
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index eeae308..5221ba1 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1696,6 +1696,13 @@ ssize_t power_events_sysfs_show(struct device *dev,
 	return sprintf(page, "event=0x%02llx\n", pmu_attr->id);
 }
 
+static inline void power_get_mem_data_src(union perf_mem_data_src *dsrc,
+				struct pt_regs *regs)
+{
+	if  (ppmu->get_mem_data_src)
+		ppmu->get_mem_data_src(dsrc, regs);
+}
+
 struct pmu power_pmu = {
 	.pmu_enable	= power_pmu_enable,
 	.pmu_disable	= power_pmu_disable,
@@ -1777,6 +1784,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
 			data.br_stack = &cpuhw->bhrb_stack;
 		}
 
+		if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC &&
+						ppmu->get_mem_data_src)
+			ppmu->get_mem_data_src(&data.data_src, regs);
+
 		if (perf_event_overflow(event, &data, regs))
 			power_pmu_stop(event, 0);
 	}
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 5141d97..c25b5c3 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -541,6 +541,58 @@ static struct attribute_group power8_pmu_events_group = {
 	.attrs = power8_events_attr,
 };
 
+#define POWER8_SIER_TYPE_SHIFT	15
+#define POWER8_SIER_TYPE_MASK	(0x7LL << POWER8_SIER_TYPE_SHIFT)
+
+#define POWER8_SIER_LDST_SHIFT	1
+#define POWER8_SIER_LDST_MASK	(0x7LL << POWER8_SIER_LDST_SHIFT)
+
+#define P(a, b)			PERF_MEM_S(a, b)
+#define PLH(a, b)		(P(OP, LOAD) | P(LVL, HIT) | P(a, b))
+#define PSM(a, b)		(P(OP, STORE) | P(LVL, MISS) | P(a, b))
+
+/*
+ * Power8 interpretations:
+ * REM_CCE1: 1-hop indicates L2/L3 cache of a different core on same chip
+ * REM_CCE2: 2-hop indicates different chip or different node.
+ */
+static u64 ldst_src_map[] = {
+	/* 000 */	P(LVL, NA),
+
+	/* 001 */	PLH(LVL, L1),
+	/* 010 */	PLH(LVL, L2),
+	/* 011 */	PLH(LVL, L3),
+	/* 100 */	PLH(LVL, LOC_RAM),
+	/* 101 */	PLH(LVL, REM_CCE1),
+	/* 110 */	PLH(LVL, REM_CCE2),
+
+	/* 111 */	PSM(LVL, L1),
+};
+
+static inline bool is_load_store_inst(u64 sier)
+{
+	u64 val;
+	val = (sier & POWER8_SIER_TYPE_MASK) >> POWER8_SIER_TYPE_SHIFT;
+
+	/* 1 = load, 2 = store */
+	return val == 1 || val == 2;
+}
+
+static void power8_get_mem_data_src(union perf_mem_data_src *dsrc,
+			struct pt_regs *regs)
+{
+	u64 idx;
+	u64 sier;
+
+	sier = mfspr(SPRN_SIER);
+
+	if (is_load_store_inst(sier)) {
+		idx = (sier & POWER8_SIER_LDST_MASK) >> POWER8_SIER_LDST_SHIFT;
+
+		dsrc->val |= ldst_src_map[idx];
+	}
+}
+
 PMU_FORMAT_ATTR(event,		"config:0-49");
 PMU_FORMAT_ATTR(pmcxsel,	"config:0-7");
 PMU_FORMAT_ATTR(mark,		"config:8");
@@ -639,6 +691,7 @@ static struct power_pmu power8_pmu = {
 	.get_constraint		= power8_get_constraint,
 	.get_alternatives	= power8_get_alternatives,
 	.disable_pmc		= power8_disable_pmc,
+	.get_mem_data_src	= power8_get_mem_data_src,
 	.flags			= PPMU_HAS_SSLOT | PPMU_HAS_SIER | PPMU_BHRB | PPMU_EBB,
 	.n_generic		= ARRAY_SIZE(power8_generic_events),
 	.generic_events		= power8_generic_events,
-- 
1.7.9.5

* [PATCH 10/10][v6] powerpc/perf: Export Power7 memory hierarchy info to user space.
  2013-10-16  2:06 [PATCH 00/10][v6] powerpc/perf: Export memory hierarchy level in Power7/8 Sukadev Bhattiprolu
                   ` (8 preceding siblings ...)
  2013-10-16  2:06 ` [PATCH 09/10][v6] powerpc/perf: Export Power8 memory hierarchy info to user space Sukadev Bhattiprolu
@ 2013-10-16  2:06 ` Sukadev Bhattiprolu
  9 siblings, 0 replies; 16+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-16  2:06 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Michael Ellerman, linux-kernel, Stephane Eranian, linuxppc-dev,
	Paul Mackerras, Anshuman Khandual

On Power7, the DCACHE_SRC field in the MMCRA register identifies the memory
hierarchy level (e.g. L2, L3) from which a data-cache miss for a marked
instruction was satisfied.

Use the 'perf_mem_data_src' object to export this hierarchy level to user
space. Some memory hierarchy levels in Power7 don't map onto the arch-neutral
levels. However, since the newer generation of the processor (i.e. Power8)
uses fewer levels than Power7, we don't really need to define new hierarchy
levels just for Power7.

Instead, we map as many levels as possible and approximate the rest. See the
comments near dcache_src_map[] in the patch.
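
A minimal user-space sketch of the lookup described above, assuming the
MMCRA layout given by the POWER7_MMCRA_* defines in this patch (miss flag at
bit 55, DCACHE_SRC in bits 51..54, counting from the least significant bit).
The strings stand in for the perf level encodings, and dcache_src_name() is a
hypothetical helper that is not part of the patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Same values as the POWER7_MMCRA_* defines in the patch. */
#define DCACHE_MISS		(0x1ULL << 55)
#define DCACHE_SRC_SHIFT	51
#define DCACHE_SRC_MASK		(0xFULL << DCACHE_SRC_SHIFT)

/* Strings stand in for the PERF_MEM_LVL_* values in dcache_src_map[]. */
static const char *const src_name[16] = {
	"L2", "L3", "N/A", "N/A",
	"REM_CCE1", "REM_CCE1", "REM_CCE1", "REM_CCE1",
	"REM_CCE2", "REM_CCE2", "REM_CCE2", "REM_CCE2",
	"LOC_RAM", "REM_RAM2", "REM_RAM2", "N/A",
};

/* Hypothetical helper: returns NULL when no data-cache miss was recorded. */
static const char *dcache_src_name(uint64_t mmcra)
{
	if (!(mmcra & DCACHE_MISS))
		return NULL;

	return src_name[(mmcra & DCACHE_SRC_MASK) >> DCACHE_SRC_SHIFT];
}
```

With the miss bit set and DCACHE_SRC = 12 (FROM_LMEM) the helper yields
"LOC_RAM"; with the miss bit clear it returns NULL, which corresponds to the
case where the patch falls back to the L1-hit check.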

Usage:

	perf record -d -e 'cpu/PM_MRK_GRP_CMPL/' <application>
	perf report -n --mem-mode --sort=mem,sym,dso,symbol_daddr,dso_daddr

		For samples involving load/store instructions, the memory
		hierarchy level is shown as "L1 hit", "Remote RAM hit" etc.
	# or

	perf record --data <application>
	perf report -D

		Sample records contain a 'data_src' field which encodes the
		memory hierarchy level, e.g. data_src 0x442 indicates
		MEM_OP_LOAD, MEM_LVL_HIT, MEM_LVL_L2 (i.e. load hit L2).
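
The 0x442 example can be checked by hand against the bit layout of
perf_mem_data_src. A minimal sketch, assuming the values below match the
PERF_MEM_OP_* and PERF_MEM_LVL_* definitions in
include/uapi/linux/perf_event.h (mem_op shifted by 0, mem_lvl shifted by 5);
the MEM_* names and load_hit_l2() are local to this example:

```c
#include <stdint.h>

/* Local copies of the uapi values: PERF_MEM_OP_* and PERF_MEM_LVL_*. */
#define MEM_OP_SHIFT	0
#define MEM_OP_LOAD	0x02
#define MEM_LVL_SHIFT	5
#define MEM_LVL_HIT	0x02
#define MEM_LVL_L2	0x20

/* Mirrors PERF_MEM_S(): place a field value at its shift. */
#define MEM_S(field, val) \
	((uint64_t)MEM_##field##_##val << MEM_##field##_SHIFT)

/* Build the data_src value for "load that hit in L2". */
static uint64_t load_hit_l2(void)
{
	return MEM_S(OP, LOAD) | MEM_S(LVL, HIT) | MEM_S(LVL, L2);
}
```

The three terms are 0x002 (load), 0x040 (hit) and 0x400 (L2), which OR
together to 0x442.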

Note that the PMU event PM_MRK_GRP_CMPL tracks all marked group completion
events. While some of these are loads and stores, others like 'add'
instructions may also be sampled.

As such, the precise semantics of 'perf mem -t load' or 'perf mem -t store'
(which require sampling only loads or only stores) cannot be implemented on
Power. (Sampling on PM_MRK_GRP_CMPL and throwing away non-load and non-store
samples could yield an inconsistent profile of the application.)

Thanks to input from Stephane Eranian, Michael Ellerman and Michael Neuling.

Cc: Stephane Eranian <eranian@google.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
Changelog[v4]:
	Drop support for 'perf mem' for Power (use perf-record and perf-report
	directly)

Changelog[v3]:
	[Michael Ellerman] If newer levels that we defined in [v2] are not
	needed for Power8, ignore the new levels for Power7 also, and
	approximate them.
	Move the TLB level mapping into a separate patchset.

Changelog[v2]:
        [Stephane Eranian] Define new levels rather than ORing the L2 and L3
        with REM_CCE1 and REM_CCE2.
        [Stephane Eranian] allocate a bit PERF_MEM_XLVL_NA for architectures
        that don't use the ->mem_xlvl field.
        Insert the TLB patch ahead so the new TLB bits are contiguous with
        existing TLB bits.

 arch/powerpc/perf/power7-pmu.c |   94 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)

diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index ae24dfc..3e86bb8 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -11,8 +11,10 @@
 #include <linux/kernel.h>
 #include <linux/perf_event.h>
 #include <linux/string.h>
+#include <linux/uaccess.h>
 #include <asm/reg.h>
 #include <asm/cputable.h>
+#include <asm/code-patching.h>
 
 /*
  * Bits in event code for POWER7
@@ -317,6 +319,97 @@ static void power7_disable_pmc(unsigned int pmc, unsigned long mmcr[])
 		mmcr[1] &= ~(0xffUL << MMCR1_PMCSEL_SH(pmc));
 }
 
+#define POWER7_MMCRA_DCACHE_MISS	(0x1LL << 55)
+#define POWER7_MMCRA_DCACHE_SRC_SHIFT	51
+#define POWER7_MMCRA_DCACHE_SRC_MASK	(0xFLL << POWER7_MMCRA_DCACHE_SRC_SHIFT)
+
+#define P(a, b)		PERF_MEM_S(a, b)
+#define PLH(a, b)	(P(OP, LOAD) | P(LVL, HIT) | P(a, b))
+/*
+ * Map the Power7 DCACHE_SRC field (bits 9..12) in MMCRA register to the
+ * architecture-neutral memory hierarchy levels. For the levels in Power7
+ * that don't map to the arch-neutral levels, approximate to nearest
+ * level.
+ *
+ *	1-hop:	indicates another core on the same chip (2.1 and 3.1 levels).
+ *	2-hops:	indicates a different chip on same or different node (remote
+ *		and distant levels).
+ *
+ * For consistency with this interpretation of the hops, we don't use
+ * the REM_RAM1 level below.
+ *
+ * The *SHR and *MOD states of the cache are ignored/not exported to user.
+ *
+ * ### Levels marked with ### in comments below are approximated
+ */
+static u64 dcache_src_map[] = {
+	PLH(LVL, L2),			/* 00: FROM_L2 */
+	PLH(LVL, L3),			/* 01: FROM_L3 */
+
+	P(LVL, NA),			/* 02: Reserved */
+	P(LVL, NA),			/* 03: Reserved */
+
+	PLH(LVL, REM_CCE1),		/* 04: FROM_L2.1_SHR ### */
+	PLH(LVL, REM_CCE1),		/* 05: FROM_L2.1_MOD ### */
+
+	PLH(LVL, REM_CCE1),		/* 06: FROM_L3.1_SHR ### */
+	PLH(LVL, REM_CCE1),		/* 07: FROM_L3.1_MOD ### */
+
+	PLH(LVL, REM_CCE2),		/* 08: FROM_RL2L3_SHR ### */
+	PLH(LVL, REM_CCE2),		/* 09: FROM_RL2L3_MOD ### */
+
+	PLH(LVL, REM_CCE2),		/* 10: FROM_DL2L3_SHR ### */
+	PLH(LVL, REM_CCE2),		/* 11: FROM_DL2L3_MOD ### */
+
+	PLH(LVL, LOC_RAM),		/* 12: FROM_LMEM */
+	PLH(LVL, REM_RAM2),		/* 13: FROM_RMEM ### */
+	PLH(LVL, REM_RAM2),		/* 14: FROM_DMEM */
+
+	P(LVL, NA),			/* 15: Reserved */
+};
+
+/*
+ * Determine the memory-hierarchy information (if applicable) for the
+ * instruction/address we are sampling. If we encountered a DCACHE_MISS,
+ * mmcra[DCACHE_SRC_MASK] specifies the memory level from which the operand
+ * was loaded.
+ *
+ * Otherwise, it is an L1-hit, provided the instruction was a load/store.
+ */
+static void power7_get_mem_data_src(union perf_mem_data_src *dsrc,
+			struct pt_regs *regs)
+{
+	u64 idx;
+	u64 mmcra = regs->dsisr;
+	u64 addr;
+	int ret;
+	unsigned int instr;
+
+	if (mmcra & POWER7_MMCRA_DCACHE_MISS) {
+		idx = mmcra & POWER7_MMCRA_DCACHE_SRC_MASK;
+		idx >>= POWER7_MMCRA_DCACHE_SRC_SHIFT;
+
+		dsrc->val |= dcache_src_map[idx];
+		return;
+	}
+
+	instr = 0;
+	addr = perf_instruction_pointer(regs);
+
+	if (is_kernel_addr(addr))
+		instr = *(unsigned int *)addr;
+	else {
+		pagefault_disable();
+		ret = __get_user_inatomic(instr, (unsigned int __user *)addr);
+		pagefault_enable();
+		if (ret)
+			instr = 0;
+	}
+	if (instr && instr_is_load_store_2_06(&instr))
+		dsrc->val |= PLH(LVL, L1);
+}
+
+
 static int power7_generic_events[] = {
 	[PERF_COUNT_HW_CPU_CYCLES] =			PM_CYC,
 	[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =	PM_GCT_NOSLOT_CYC,
@@ -437,6 +530,7 @@ static struct power_pmu power7_pmu = {
 	.get_constraint		= power7_get_constraint,
 	.get_alternatives	= power7_get_alternatives,
 	.disable_pmc		= power7_disable_pmc,
+	.get_mem_data_src	= power7_get_mem_data_src,
 	.flags			= PPMU_ALT_SIPR,
 	.attr_groups		= power7_pmu_attr_groups,
 	.n_generic		= ARRAY_SIZE(power7_generic_events),
-- 
1.7.9.5

* RE: [PATCH 02/10][v6] powerpc/Power7: detect load/store instructions
  2013-10-16  2:06 ` [PATCH 02/10][v6] powerpc/Power7: detect load/store instructions Sukadev Bhattiprolu
@ 2013-10-16  8:25   ` David Laight
  2013-10-16  9:38     ` Anshuman Khandual
  2013-10-16 15:27     ` Sukadev Bhattiprolu
  0 siblings, 2 replies; 16+ messages in thread
From: David Laight @ 2013-10-16  8:25 UTC (permalink / raw)
  To: Sukadev Bhattiprolu, Arnaldo Carvalho de Melo
  Cc: Michael Ellerman, linux-kernel, Stephane Eranian, linuxppc-dev,
	Paul Mackerras, Anshuman Khandual

> Implement instr_is_load_store_2_06() to detect whether a given instruction
> is one of the fixed-point or floating-point load/store instructions in the
> POWER Instruction Set Architecture v2.06.
...
> +int instr_is_load_store_2_06(const unsigned int *instr)
> +{
> +	unsigned int op, upper, lower;
> +
> +	op = instr_opcode(*instr);
> +
> +	if ((op >= 32 && op <= 58) || (op == 61 || op == 62))
> +		return true;
> +
> +	if (op != 31)
> +		return false;
> +
> +	upper = op >> 5;
> +	lower = op & 0x1f;
> +
> +	/* Short circuit as many misses as we can */
> +	if (lower < 3 || lower > 23)
> +		return false;
> +
> +	if (lower == 3) {
> +		if (upper >= 16)
> +			return true;
> +
> +		return false;
> +	}
> +
> +	if (lower == 7 || lower == 12)
> +		return true;
> +
> +	if (lower >= 20) /* && lower <= 23 (implicit) */
> +		return true;
> +
> +	return false;
> +}

I can't help feeling the code could do with some comments about
which actual instructions are selected where.

	David

* Re: [PATCH 02/10][v6] powerpc/Power7: detect load/store instructions
  2013-10-16  8:25   ` David Laight
@ 2013-10-16  9:38     ` Anshuman Khandual
  2013-10-16 15:39       ` Sukadev Bhattiprolu
  2013-10-16 15:27     ` Sukadev Bhattiprolu
  1 sibling, 1 reply; 16+ messages in thread
From: Anshuman Khandual @ 2013-10-16  9:38 UTC (permalink / raw)
  To: David Laight
  Cc: Michael Ellerman, linux-kernel, Stephane Eranian, linuxppc-dev,
	Paul Mackerras, Arnaldo Carvalho de Melo, Sukadev Bhattiprolu

On 10/16/2013 01:55 PM, David Laight wrote:
>> Implement instr_is_load_store_2_06() to detect whether a given instruction
>> is one of the fixed-point or floating-point load/store instructions in the
>> POWER Instruction Set Architecture v2.06.
> ...

Is the opcode encoding dependent on the ISA version? Do the basic load
and store instructions change with newer ISA versions? BTW, there is a
newer version of the ISA, "PowerISA_V2.07_PUBLIC.pdf", here at power.org:

https://www.power.org/documentation/power-isa-version-2-07/

It does not sound like a good idea to analyse instructions with function
names that specify an ISA version number. Besides, this function does not
belong to a specific processor or platform. It has to be a bit more generic.
 
>> +int instr_is_load_store_2_06(const unsigned int *instr)
>> +{
>> +	unsigned int op, upper, lower;
>> +
>> +	op = instr_opcode(*instr);
>> +
>> +	if ((op >= 32 && op <= 58) || (op == 61 || op == 62))
>> +		return true;
>> +
>> +	if (op != 31)
>> +		return false;
>> +
>> +	upper = op >> 5;
>> +	lower = op & 0x1f;
>> +
>> +	/* Short circuit as many misses as we can */
>> +	if (lower < 3 || lower > 23)
>> +		return false;
>> +
>> +	if (lower == 3) {
>> +		if (upper >= 16)
>> +			return true;
>> +
>> +		return false;
>> +	}
>> +
>> +	if (lower == 7 || lower == 12)
>> +		return true;
>> +
>> +	if (lower >= 20) /* && lower <= 23 (implicit) */
>> +		return true;
>> +
>> +	return false;
>> +}
> 
> I can't help feeling the code could do with some comments about
> which actual instructions are selected where.

Yeah, I agree. At least note which category of load/store instructions is
selected in each case.

* Re: [PATCH 02/10][v6] powerpc/Power7: detect load/store instructions
  2013-10-16  8:25   ` David Laight
  2013-10-16  9:38     ` Anshuman Khandual
@ 2013-10-16 15:27     ` Sukadev Bhattiprolu
  2013-10-17 17:20       ` Sukadev Bhattiprolu
  1 sibling, 1 reply; 16+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-16 15:27 UTC (permalink / raw)
  To: David Laight
  Cc: Michael Ellerman, linux-kernel, Stephane Eranian, linuxppc-dev,
	Paul Mackerras, Arnaldo Carvalho de Melo, Anshuman Khandual

David Laight [David.Laight@aculab.com] wrote:
| 
| I can't help feeling the code could do with some comments about
| which actual instructions are selected where.

At a high level, only the load and store instructions are selected.

I added a reference to Appendix F (Opcode Maps) in the function
header. The opcode map is a table of upper x lower values. From
that table it should be fairly straightforward to see which instructions
are selected.

How about I add this to the function header?

 * Please use the table in Appendix F (Opcode Maps) to determine
 * the instructions selected by this function.

There are over 100 instructions selected by this list and I wasn't
sure if we should list them all.
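
To make the upper x lower lookup concrete, here is a minimal sketch that
splits the 10-bit extended opcode of an opcode-31 instruction into the two
5-bit halves used to index the opcode map. It assumes the extended opcode
sits in instruction bits 21..30 (ISA numbering), i.e. bits 10..1 counting
from the least significant bit. The helper names are hypothetical, and the
row/column tests simply repeat the ones in the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Primary opcode: the top 6 bits of the instruction word. */
static unsigned int primary_opcode(uint32_t instr)
{
	return instr >> 26;
}

/* High 5 bits of the 10-bit extended opcode. */
static unsigned int xo_upper(uint32_t instr)
{
	return (instr >> 6) & 0x1f;
}

/* Low 5 bits of the 10-bit extended opcode. */
static unsigned int xo_lower(uint32_t instr)
{
	return (instr >> 1) & 0x1f;
}

/* Same upper/lower tests as the patch, for opcode-31 instructions only. */
static bool is_ldst_op31(uint32_t instr)
{
	unsigned int upper = xo_upper(instr), lower = xo_lower(instr);

	if (primary_opcode(instr) != 31)
		return false;
	if (lower == 3)
		return upper >= 16;
	return lower == 7 || lower == 12 || (lower >= 20 && lower <= 23);
}
```

For instance, lwzx r3,r4,r5 encodes as 0x7C64282E (extended opcode 23, so
upper = 0 and lower = 23) and is selected, while add r3,r4,r5 encodes as
0x7C642A14 (upper = 8, lower = 10) and is not.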

Sukadev

* Re: [PATCH 02/10][v6] powerpc/Power7: detect load/store instructions
  2013-10-16  9:38     ` Anshuman Khandual
@ 2013-10-16 15:39       ` Sukadev Bhattiprolu
  0 siblings, 0 replies; 16+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-16 15:39 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: Michael Ellerman, linux-kernel, Stephane Eranian, linuxppc-dev,
	David Laight, Paul Mackerras, Arnaldo Carvalho de Melo

Anshuman Khandual [khandual@linux.vnet.ibm.com] wrote:
| On 10/16/2013 01:55 PM, David Laight wrote:
| >> Implement instr_is_load_store_2_06() to detect whether a given instruction
| >> is one of the fixed-point or floating-point load/store instructions in the
| >> POWER Instruction Set Architecture v2.06.
| > ...
| 
| Is the opcode encoding dependent on the ISA version? Do the basic load
| and store instructions change with newer ISA versions?

TBH, I don't know whether the encoding is dependent on the ISA version.

We need this for a very narrow/specific purpose on Power7 _and_ did not
want to set up expectations that it will work with all versions. Hence
the horribly named function :-)

| BTW, there is a
| newer version of the ISA, "PowerISA_V2.07_PUBLIC.pdf", here at power.org:
| 
| https://www.power.org/documentation/power-isa-version-2-07/

Yes, but on Power8 there is a bit in the SIER that tells us whether it
is a load or store instruction. We use that and don't need to determine
it in software.

Power7 does not have such a bit and we need this only for Power7. We are
not targeting this "memory hierarchy" feature for Power6 or older processors.

| 
| It does not sound like a good idea to analyse instructions with function
| names that specify an ISA version number. Besides, this function does not
| belong to a specific processor or platform. It has to be a bit more generic.
| 
| >> +int instr_is_load_store_2_06(const unsigned int *instr)
| >> +{
| >> +	unsigned int op, upper, lower;
| >> +
| >> +	op = instr_opcode(*instr);
| >> +
| >> +	if ((op >= 32 && op <= 58) || (op == 61 || op == 62))
| >> +		return true;
| >> +
| >> +	if (op != 31)
| >> +		return false;
| >> +
| >> +	upper = op >> 5;
| >> +	lower = op & 0x1f;
| >> +
| >> +	/* Short circuit as many misses as we can */
| >> +	if (lower < 3 || lower > 23)
| >> +		return false;
| >> +
| >> +	if (lower == 3) {
| >> +		if (upper >= 16)
| >> +			return true;
| >> +
| >> +		return false;
| >> +	}
| >> +
| >> +	if (lower == 7 || lower == 12)
| >> +		return true;
| >> +
| >> +	if (lower >= 20) /* && lower <= 23 (implicit) */
| >> +		return true;
| >> +
| >> +	return false;
| >> +}
| > 
| > I can't help feeling the code could do with some comments about
| > which actual instructions are selected where.
| 
| Yeah, I agree. At least note which category of load/store instructions is
| selected in each case.

Like I mentioned in the other message, how about adding a couple
of lines in the function header ?

* Re: [PATCH 02/10][v6] powerpc/Power7: detect load/store instructions
  2013-10-16 15:27     ` Sukadev Bhattiprolu
@ 2013-10-17 17:20       ` Sukadev Bhattiprolu
  0 siblings, 0 replies; 16+ messages in thread
From: Sukadev Bhattiprolu @ 2013-10-17 17:20 UTC (permalink / raw)
  To: David Laight, Arnaldo Carvalho de Melo
  Cc: Michael Ellerman, linux-kernel, Stephane Eranian, linuxppc-dev,
	Paul Mackerras, Anshuman Khandual

| 
| How about I add this to the function header?
| 
|  * Please use the table in Appendix F (Opcode Maps) to determine
|  * the instructions selected by this function.

Here is the updated patch with the comment.
---

From 38d1f9ac67a7f50db593e5875a8de6a2ecbea8e0 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Date: Fri, 23 Aug 2013 18:35:02 -0700
Subject: [PATCH 6/10][v6] powerpc/Power7: detect load/store instructions

Implement instr_is_load_store_2_06() to detect whether a given instruction
is one of the fixed-point or floating-point load/store instructions in the
POWER Instruction Set Architecture v2.06.

This function will be used in a follow-on patch to save memory hierarchy
information of the load/store on a Power7 system. (Power8 systems set some
bits in the SIER to identify load/store operations and hence don't need
similar functionality.)

Based on optimized code from Michael Ellerman and comments from Tom Musta.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
Changelog[v6]
	- [David Laight, Anshuman Khandual] Add a comment in the function
	  header to help better understand which instructions are selected
	  by instr_is_load_store_2_06().
	- [Michael Ellerman, Tom Musta]: Optimize the implementation to
	  avoid a for loop.

 arch/powerpc/include/asm/code-patching.h |    1 +
 arch/powerpc/lib/code-patching.c         |   48 ++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index a6f8c7a..9cc3ef1 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -34,6 +34,7 @@ int instr_is_branch_to_addr(const unsigned int *instr, unsigned long addr);
 unsigned long branch_target(const unsigned int *instr);
 unsigned int translate_branch(const unsigned int *dest,
 			      const unsigned int *src);
+int instr_is_load_store_2_06(const unsigned int *instr);
 
 static inline unsigned long ppc_function_entry(void *func)
 {
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 2bc9db3..84571aa 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -159,6 +159,54 @@ unsigned int translate_branch(const unsigned int *dest, const unsigned int *src)
 	return 0;
 }
 
+/*
+ * Determine if the op code in the instruction corresponds to a load or
+ * store instruction. Ignore the vector load instructions like evlddepx,
+ * evstddepx for now.
+ *
+ * This function is valid for POWER ISA 2.06.
+ *
+ * Reference:	PowerISA_V2.06B_Public.pdf, Sections 3.3.2 through 3.3.6
+ *		and 4.6.2 through 4.6.4, Appendix F (Opcode Maps).
+ *
+ *		Use the tables in Appendix F (Opcode Maps) to identify
+ *		instructions selected by this function.
+ */
+int instr_is_load_store_2_06(const unsigned int *instr)
+{
+	unsigned int op, upper, lower;
+
+	op = instr_opcode(*instr);
+
+	if ((op >= 32 && op <= 58) || (op == 61 || op == 62))
+		return true;
+
+	if (op != 31)
+		return false;
+
+	upper = (*instr >> 6) & 0x1f;	/* extended opcode, bits 21..25 */
+	lower = (*instr >> 1) & 0x1f;	/* extended opcode, bits 26..30 */
+
+	/* Short circuit as many misses as we can */
+	if (lower < 3 || lower > 23)
+		return false;
+
+	if (lower == 3) {
+		if (upper >= 16)
+			return true;
+
+		return false;
+	}
+
+	if (lower == 7 || lower == 12)
+		return true;
+
+	if (lower >= 20) /* && lower <= 23 (implicit) */
+		return true;
+
+	return false;
+}
+
 
 #ifdef CONFIG_CODE_PATCHING_SELFTEST
 
-- 
1.7.9.5

Thread overview: 16+ messages
2013-10-16  2:06 [PATCH 00/10][v6] powerpc/perf: Export memory hierarchy level in Power7/8 Sukadev Bhattiprolu
2013-10-16  2:06 ` [PATCH 01/10][v6] powerpc: Rename branch_opcode() to instr_opcode() Sukadev Bhattiprolu
2013-10-16  2:06 ` [PATCH 02/10][v6] powerpc/Power7: detect load/store instructions Sukadev Bhattiprolu
2013-10-16  8:25   ` David Laight
2013-10-16  9:38     ` Anshuman Khandual
2013-10-16 15:39       ` Sukadev Bhattiprolu
2013-10-16 15:27     ` Sukadev Bhattiprolu
2013-10-17 17:20       ` Sukadev Bhattiprolu
2013-10-16  2:06 ` [PATCH 03/10][v6] tools/perf: silence compiler warnings Sukadev Bhattiprolu
2013-10-16  2:06 ` [PATCH 04/10][v6] tools/perf: Remove local byteorder.h Sukadev Bhattiprolu
2013-10-16  2:06 ` [PATCH 05/10][v6] powerpc/perf: Remove PME_ prefix for power7 events Sukadev Bhattiprolu
2013-10-16  2:06 ` [PATCH 06/10][v6] powerpc/perf: Export Power8 generic events in sysfs Sukadev Bhattiprolu
2013-10-16  2:06 ` [PATCH 07/10][v6] powerpc/perf: Add Power8 event PM_MRK_GRP_CMPL to sysfs Sukadev Bhattiprolu
2013-10-16  2:06 ` [PATCH 08/10][v6] powerpc/perf: Define big-endian version of perf_mem_data_src Sukadev Bhattiprolu
2013-10-16  2:06 ` [PATCH 09/10][v6] powerpc/perf: Export Power8 memory hierarchy info to user space Sukadev Bhattiprolu
2013-10-16  2:06 ` [PATCH 10/10][v6] powerpc/perf: Export Power7 " Sukadev Bhattiprolu