* [PATCH 0/8][v4] powerpc/perf: Export memory hierarchy level in Power7/8.
From: Sukadev Bhattiprolu @ 2013-09-14  0:49 UTC
  To: linux-kernel
  Cc: linuxppc-dev, Stephane Eranian, Michael Ellerman, Paul Mackerras,
	Anshuman Khandual

Power7 and Power8 processors save the memory hierarchy level (e.g. L2, L3)
from which a load or store instruction was satisfied. Export this hierarchy
information to user space via the perf_mem_data_src object.

Thanks to Stephane Eranian, Michael Ellerman and Michael Neuling for their input.

Sukadev Bhattiprolu (8):
  powerpc/perf: Rename Power8 macros to start with PME
  powerpc/perf: Export Power8 generic events in sysfs
  powerpc/perf: Add PM_MRK_GRP_CMPL event to sysfs.
  powerpc/perf: Define big-endian version of perf_mem_data_src
  powerpc/perf: Export Power8 memory hierarchy info to user space.
  powerpc: Rename branch_opcode() to instr_opcode()
  power: implement is_instr_load_store().
  powerpc/perf: Export Power7 memory hierarchy info to user space.

 arch/powerpc/include/asm/code-patching.h     |    1 +
 arch/powerpc/include/asm/perf_event_server.h |    2 +
 arch/powerpc/lib/code-patching.c             |   96 ++++++++++++++++++++++-
 arch/powerpc/perf/core-book3s.c              |   11 +++
 arch/powerpc/perf/power7-pmu.c               |   94 +++++++++++++++++++++++
 arch/powerpc/perf/power8-pmu.c               |  105 +++++++++++++++++++++++---
 include/uapi/linux/perf_event.h              |   58 ++++++++++++++
 7 files changed, 352 insertions(+), 15 deletions(-)

-- 
1.7.9.5


* [PATCH 1/8][v4] powerpc/perf: Rename Power8 macros to start with PME
From: Sukadev Bhattiprolu @ 2013-09-14  0:49 UTC
  To: linux-kernel
  Cc: linuxppc-dev, Stephane Eranian, Michael Ellerman, Paul Mackerras,
	Anshuman Khandual

We use helpers like GENERIC_EVENT_ATTR() to list the generic events in
sysfs. GENERIC_EVENT_ATTR() pastes a PME_ prefix onto the event identifier
it is given to look up the event code, so to avoid name collisions the perf
event-code macros must start with PME.
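A minimal user-space sketch of the collision follows. The demo_* macros and struct are hypothetical, simplified stand-ins modelled on the powerpc EVENT_VAR()/EVENT_ATTR() helpers, not the kernel definitions. Because the generic helper forwards its _id argument to another macro, that argument is macro-expanded first. If PM_CYC were itself defined as 0x0001e, the PME_##_id paste would produce the undefined identifier PME_0x0001e; with the code named PME_PM_CYC, the bare token PM_CYC survives and is pasted into both the variable name and the PME_PM_CYC lookup.

```c
#include <assert.h>
#include <string.h>

/* Event code, named with the PME_ prefix as this patch requires. */
#define PME_PM_CYC	0x0001e

/* Hypothetical stand-in for the kernel's per-event sysfs attribute. */
struct demo_event_attr {
	const char *name;		/* sysfs file name, e.g. "cpu-cycles" */
	unsigned long long id;		/* raw event code */
};

/* Hypothetical helpers modelled on EVENT_VAR()/EVENT_ATTR(). */
#define DEMO_EVENT_VAR(_id, _suffix)	demo_attr_##_id##_suffix
#define DEMO_EVENT_ATTR(_name, _id, _suffix)				\
	static const struct demo_event_attr DEMO_EVENT_VAR(_id, _suffix) = \
		{ #_name, PME_##_id }

/*
 * Extra macro level: _id is macro-expanded before DEMO_EVENT_ATTR() sees
 * it. If PM_CYC were a macro for 0x0001e, PME_##_id would become the
 * undefined identifier PME_0x0001e and this would not compile.
 */
#define DEMO_GENERIC_EVENT_ATTR(_name, _id)	DEMO_EVENT_ATTR(_name, _id, _g)

DEMO_GENERIC_EVENT_ATTR(cpu-cycles, PM_CYC);	/* defines demo_attr_PM_CYC_g */

/* Look up a sysfs event name; returns the raw code or -1 if unknown. */
static long long demo_event_code(const char *name)
{
	assert(name != NULL);
	if (strcmp(name, demo_attr_PM_CYC_g.name) == 0)
		return (long long)demo_attr_PM_CYC_g.id;
	return -1;
}
```

The variable name (demo_attr_PM_CYC_g) and the event code (PME_PM_CYC) are both derived from the single token PM_CYC, which is why that token must not itself be a macro.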

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 arch/powerpc/perf/power8-pmu.c |   24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 96a64d6..30c6b12 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -18,12 +18,12 @@
 /*
  * Some power8 event codes.
  */
-#define PM_CYC				0x0001e
-#define PM_GCT_NOSLOT_CYC		0x100f8
-#define PM_CMPLU_STALL			0x4000a
-#define PM_INST_CMPL			0x00002
-#define PM_BRU_FIN			0x10068
-#define PM_BR_MPRED_CMPL		0x400f6
+#define PME_PM_CYC				0x0001e
+#define PME_PM_GCT_NOSLOT_CYC			0x100f8
+#define PME_PM_CMPLU_STALL			0x4000a
+#define PME_PM_INST_CMPL			0x00002
+#define PME_PM_BRU_FIN				0x10068
+#define PME_PM_BR_MPRED_CMPL			0x400f6
 
 
 /*
@@ -550,12 +550,12 @@ static const struct attribute_group *power8_pmu_attr_groups[] = {
 };
 
 static int power8_generic_events[] = {
-	[PERF_COUNT_HW_CPU_CYCLES] =			PM_CYC,
-	[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =	PM_GCT_NOSLOT_CYC,
-	[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] =	PM_CMPLU_STALL,
-	[PERF_COUNT_HW_INSTRUCTIONS] =			PM_INST_CMPL,
-	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] =		PM_BRU_FIN,
-	[PERF_COUNT_HW_BRANCH_MISSES] =			PM_BR_MPRED_CMPL,
+	[PERF_COUNT_HW_CPU_CYCLES] =			PME_PM_CYC,
+	[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =	PME_PM_GCT_NOSLOT_CYC,
+	[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] =	PME_PM_CMPLU_STALL,
+	[PERF_COUNT_HW_INSTRUCTIONS] =			PME_PM_INST_CMPL,
+	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] =		PME_PM_BRU_FIN,
+	[PERF_COUNT_HW_BRANCH_MISSES] =			PME_PM_BR_MPRED_CMPL,
 };
 
 static u64 power8_bhrb_filter_map(u64 branch_sample_type)
-- 
1.7.9.5


* [PATCH 2/8][v4] powerpc/perf: Export Power8 generic events in sysfs
From: Sukadev Bhattiprolu @ 2013-09-14  0:49 UTC
  To: linux-kernel
  Cc: linuxppc-dev, Stephane Eranian, Michael Ellerman, Paul Mackerras,
	Anshuman Khandual

Export generic perf events for Power8 in sysfs.
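User space can read these entries back. The sketch below assumes the Power8 PMU appears as /sys/bus/event_source/devices/cpu and that each file under its events/ directory holds one line in the "event=0x%02llx" format produced by power_events_sysfs_show(); demo_parse_event() and demo_read_event() are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse text such as "event=0x02\n" (the power_events_sysfs_show()
 * format). Returns the event code, or -1 if the text is not in that form.
 * %llx accepts an optional 0x prefix, as strtoull() does with base 16.
 */
static long long demo_parse_event(const char *text)
{
	unsigned long long code;

	assert(text != NULL);
	if (sscanf(text, "event=%llx", &code) != 1)
		return -1;
	return (long long)code;
}

/* Read one events/ file; returns -1 if it cannot be opened or parsed. */
static long long demo_read_event(const char *path)
{
	char buf[64];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return demo_parse_event(buf);
}
```

On a kernel carrying this patch, demo_read_event("/sys/bus/event_source/devices/cpu/events/instructions") would be expected to return 0x2 (PM_INST_CMPL).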

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 arch/powerpc/perf/power8-pmu.c |   23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 30c6b12..ff98fb8 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -510,6 +510,28 @@ static void power8_disable_pmc(unsigned int pmc, unsigned long mmcr[])
 		mmcr[1] &= ~(0xffUL << MMCR1_PMCSEL_SHIFT(pmc + 1));
 }
 
+GENERIC_EVENT_ATTR(cpu-cycles,			PM_CYC);
+GENERIC_EVENT_ATTR(stalled-cycles-frontend,	PM_GCT_NOSLOT_CYC);
+GENERIC_EVENT_ATTR(stalled-cycles-backend,	PM_CMPLU_STALL);
+GENERIC_EVENT_ATTR(instructions,		PM_INST_CMPL);
+GENERIC_EVENT_ATTR(branch-instructions,		PM_BRU_FIN);
+GENERIC_EVENT_ATTR(branch-misses,		PM_BR_MPRED_CMPL);
+
+static struct attribute *power8_events_attr[] = {
+	GENERIC_EVENT_PTR(PM_CYC),
+	GENERIC_EVENT_PTR(PM_GCT_NOSLOT_CYC),
+	GENERIC_EVENT_PTR(PM_CMPLU_STALL),
+	GENERIC_EVENT_PTR(PM_INST_CMPL),
+	GENERIC_EVENT_PTR(PM_BRU_FIN),
+	GENERIC_EVENT_PTR(PM_BR_MPRED_CMPL),
+	NULL
+};
+
+static struct attribute_group power8_pmu_events_group = {
+	.name = "events",
+	.attrs = power8_events_attr,
+};
+
 PMU_FORMAT_ATTR(event,		"config:0-49");
 PMU_FORMAT_ATTR(pmcxsel,	"config:0-7");
 PMU_FORMAT_ATTR(mark,		"config:8");
@@ -546,6 +568,7 @@ struct attribute_group power8_pmu_format_group = {
 
 static const struct attribute_group *power8_pmu_attr_groups[] = {
 	&power8_pmu_format_group,
+	&power8_pmu_events_group,
 	NULL,
 };
 
-- 
1.7.9.5


* [PATCH 3/8][v4] powerpc/perf: Add PM_MRK_GRP_CMPL event to sysfs.
From: Sukadev Bhattiprolu @ 2013-09-14  0:49 UTC
  To: linux-kernel
  Cc: linuxppc-dev, Stephane Eranian, Michael Ellerman, Paul Mackerras,
	Anshuman Khandual

The perf event PM_MRK_GRP_CMPL is useful for analyzing the memory
hierarchy behavior of applications. Export it in sysfs so that it can be
specified by name.
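As a hedged illustration of requesting this event directly, the sketch below fills in a struct perf_event_attr for the raw code 0x40130 with data-address and data-source sampling. It assumes the core PMU accepts the code as a PERF_TYPE_RAW config; demo_fill_mrk_attr() and its choices (user-space only, start disabled) are illustrative and not necessarily what the perf tool does. The resulting attr would be passed to the perf_event_open(2) system call, which is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/perf_event.h>

/* Raw event code for PM_MRK_GRP_CMPL, taken from this patch. */
#define DEMO_PM_MRK_GRP_CMPL	0x40130ULL

/*
 * Hypothetical helper: prepare an attr that samples PM_MRK_GRP_CMPL in
 * user space and records the sampled data address and data source.
 */
static void demo_fill_mrk_attr(struct perf_event_attr *attr, uint64_t period)
{
	assert(attr != NULL && period > 0);
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_RAW;		/* core PMU, raw event code */
	attr->config = DEMO_PM_MRK_GRP_CMPL;	/* bit 8 is the "mark" format bit */
	attr->sample_period = period;
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_ADDR |
			    PERF_SAMPLE_DATA_SRC;
	attr->disabled = 1;			/* enable later via ioctl */
	attr->exclude_kernel = 1;
}
```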

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 arch/powerpc/perf/power8-pmu.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index ff98fb8..5c61e59 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -24,6 +24,7 @@
 #define PME_PM_INST_CMPL			0x00002
 #define PME_PM_BRU_FIN				0x10068
 #define PME_PM_BR_MPRED_CMPL			0x400f6
+#define PME_PM_MRK_GRP_CMPL			0x40130
 
 
 /*
@@ -517,6 +518,8 @@ GENERIC_EVENT_ATTR(instructions,		PM_INST_CMPL);
 GENERIC_EVENT_ATTR(branch-instructions,		PM_BRU_FIN);
 GENERIC_EVENT_ATTR(branch-misses,		PM_BR_MPRED_CMPL);
 
+POWER_EVENT_ATTR(PM_MRK_GRP_CMPL,		PM_MRK_GRP_CMPL);
+
 static struct attribute *power8_events_attr[] = {
 	GENERIC_EVENT_PTR(PM_CYC),
 	GENERIC_EVENT_PTR(PM_GCT_NOSLOT_CYC),
@@ -524,6 +527,8 @@ static struct attribute *power8_events_attr[] = {
 	GENERIC_EVENT_PTR(PM_INST_CMPL),
 	GENERIC_EVENT_PTR(PM_BRU_FIN),
 	GENERIC_EVENT_PTR(PM_BR_MPRED_CMPL),
+
+	POWER_EVENT_PTR(PM_MRK_GRP_CMPL),
 	NULL
 };
 
-- 
1.7.9.5


* [PATCH 4/8][v4] powerpc/perf: Define big-endian version of perf_mem_data_src
From: Sukadev Bhattiprolu @ 2013-09-14  0:49 UTC
  To: linux-kernel
  Cc: linuxppc-dev, Stephane Eranian, Michael Ellerman, Paul Mackerras,
	Anshuman Khandual

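perf_mem_data_src is a union that is initialized via the ->val field
and accessed via the bitfields. Compilers for big-endian targets such as
ppc64 allocate bitfields starting from the most significant bit, so the
little-endian declaration would place the fields at different bit positions
within ->val. For this to work on big-endian platforms, we also need a
big-endian representation of perf_mem_data_src.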
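The effect of the two layouts can be checked with a user-space copy of the union. This is a sketch: demo_mem_data_src and the demo_mem_*() helpers are hypothetical, and the layout is selected with the GCC/Clang predefined __BYTE_ORDER__ macro rather than the __PERF_BYTE_ORDER macro this patch introduces. With the matching layout, initializing ->val to 0x442 reads back mem_op = 0x2 (load) and mem_lvl = 0x22 (hit, L2) on either byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical copy of perf_mem_data_src for experimentation. */
union demo_mem_data_src {
	uint64_t val;
	struct {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
		uint64_t mem_op:5,	/* bits 0-4 of val */
			 mem_lvl:14,	/* bits 5-18 */
			 mem_snoop:5,
			 mem_lock:2,
			 mem_dtlb:7,
			 mem_rsvd:31;
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
		/* Reverse order: bitfields are allocated from the MSB down. */
		uint64_t mem_rsvd:31,
			 mem_dtlb:7,
			 mem_lock:2,
			 mem_snoop:5,
			 mem_lvl:14,
			 mem_op:5;	/* still bits 0-4 of val */
#else
#error "unsupported byte order"
#endif
	};
};

static_assert(sizeof(union demo_mem_data_src) == 8, "must fit in one u64");

/* Initialize through ->val, then read back through the bitfields. */
static unsigned int demo_mem_op(uint64_t val)
{
	union demo_mem_data_src src = { .val = val };

	return src.mem_op;
}

static unsigned int demo_mem_lvl(uint64_t val)
{
	union demo_mem_data_src src = { .val = val };

	return src.mem_lvl;
}
```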

Cc: Stephane Eranian <eranian@google.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
Changelog [v2]:
	- [Vince Weaver, Michael Ellerman] No __KERNEL__ in uapi headers.

 include/uapi/linux/perf_event.h |   58 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 62c25a2..7a4f9bb 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -19,6 +19,50 @@
 #include <asm/byteorder.h>
 
 /*
+ * Kernel and userspace check for endianness in incompatible ways.
+ * In user space, <endian.h> defines both __BIG_ENDIAN and __LITTLE_ENDIAN
+ * but sets __BYTE_ORDER to one or the other. So user-space checks look like:
+ *
+ *	#if __BYTE_ORDER == __LITTLE_ENDIAN
+ *
+ * In the kernel, __BYTE_ORDER is undefined, so using the above check doesn't
+ * work. Further, kernel code assumes that exactly one of __BIG_ENDIAN and
+ * __LITTLE_ENDIAN is defined.  So the kernel checks are like:
+ *
+ *	#if defined(__LITTLE_ENDIAN)
+ *
+ * But we can't use that check in user space since __LITTLE_ENDIAN (and
+ * __BIG_ENDIAN) are always defined.
+ *
+ * Since some perf data structures depend on endianness _and_ are shared
+ * between kernel and user, perf needs its own notion of endian macros (at
+ * least until user and kernel endian checks converge).
+ */
+#define __PERF_LE	1234
+#define __PERF_BE	4321
+
+#if defined(__BYTE_ORDER)
+
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+#define __PERF_BYTE_ORDER	__PERF_LE
+#elif __BYTE_ORDER == __BIG_ENDIAN
+#define __PERF_BYTE_ORDER	__PERF_BE
+#endif
+
+#else /* __BYTE_ORDER */
+
+#if defined(__LITTLE_ENDIAN) && defined(__BIG_ENDIAN)
+#error "Cannot determine endianness"
+#elif defined(__LITTLE_ENDIAN)
+#define __PERF_BYTE_ORDER	__PERF_LE
+#elif defined(__BIG_ENDIAN)
+#define __PERF_BYTE_ORDER	__PERF_BE
+#endif
+
+
+#endif /* __BYTE_ORDER */
+
+/*
  * User-space ABI bits:
  */
 
@@ -659,6 +703,7 @@ enum perf_callchain_context {
 #define PERF_FLAG_FD_OUTPUT		(1U << 1)
 #define PERF_FLAG_PID_CGROUP		(1U << 2) /* pid=cgroup id, per-cpu mode only */
 
+#if __PERF_BYTE_ORDER == __PERF_LE
 union perf_mem_data_src {
 	__u64 val;
 	struct {
@@ -670,6 +715,19 @@ union perf_mem_data_src {
 			mem_rsvd:31;
 	};
 };
+#elif __PERF_BYTE_ORDER == __PERF_BE
+union perf_mem_data_src {
+	__u64 val;
+	struct {
+		__u64	mem_rsvd:31,
+			mem_dtlb:7,	/* tlb access */
+			mem_lock:2,	/* lock instr */
+			mem_snoop:5,	/* snoop mode */
+			mem_lvl:14,	/* memory hierarchy level */
+			mem_op:5;	/* type of opcode */
+	};
+};
+#endif
 
 /* type of opcode (load/store/prefetch,code) */
 #define PERF_MEM_OP_NA		0x01 /* not available */
-- 
1.7.9.5


* [PATCH 5/8][v4] powerpc/perf: Export Power8 memory hierarchy info to user space.
From: Sukadev Bhattiprolu @ 2013-09-14  0:49 UTC
  To: linux-kernel
  Cc: linuxppc-dev, Stephane Eranian, Michael Ellerman, Paul Mackerras,
	Anshuman Khandual

On Power8, the LDST field in SIER identifies the memory hierarchy level
(e.g. L1, L2) from which a data-cache miss for a marked instruction
was satisfied.

Use the 'perf_mem_data_src' object to export this hierarchy level to user
space. Fortunately, the memory hierarchy levels in Power8 map fairly easily
onto the arch-neutral levels, as described by the ldst_src_map[] table.
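The mapping can be exercised off-line with a small user-space sketch. The SIER shifts, masks and table contents are copied from the patch; the DEMO_* constants restate a subset of the PERF_MEM_* values under hypothetical names. Unlike the patch, which ORs a table entry into dsrc->val only for loads and stores, demo_sier_to_data_src() returns 0 for other instruction types, which is equivalent when the caller ORs the result in.

```c
#include <assert.h>
#include <stdint.h>

/* SIER field positions, copied from the patch. */
#define POWER8_SIER_TYPE_SHIFT	15
#define POWER8_SIER_TYPE_MASK	(0x7ULL << POWER8_SIER_TYPE_SHIFT)
#define POWER8_SIER_LDST_SHIFT	1
#define POWER8_SIER_LDST_MASK	(0x7ULL << POWER8_SIER_LDST_SHIFT)

/* Subset of the PERF_MEM_* encodings from include/uapi/linux/perf_event.h. */
#define DEMO_OP(x)		((uint64_t)(x) << 0)	/* PERF_MEM_OP_SHIFT */
#define DEMO_LVL(x)		((uint64_t)(x) << 5)	/* PERF_MEM_LVL_SHIFT */
#define DEMO_OP_LOAD		0x02
#define DEMO_OP_STORE		0x04
#define DEMO_LVL_NA		0x01
#define DEMO_LVL_HIT		0x02
#define DEMO_LVL_MISS		0x04
#define DEMO_LVL_L1		0x08
#define DEMO_LVL_L2		0x20
#define DEMO_LVL_L3		0x40
#define DEMO_LVL_LOC_RAM	0x80
#define DEMO_LVL_REM_CCE1	0x400
#define DEMO_LVL_REM_CCE2	0x800

/* Same shape as the patch's PLH() and PSM() helpers. */
#define DEMO_LOAD_HIT(l) \
	(DEMO_OP(DEMO_OP_LOAD) | DEMO_LVL(DEMO_LVL_HIT) | DEMO_LVL(l))
#define DEMO_STORE_MISS(l) \
	(DEMO_OP(DEMO_OP_STORE) | DEMO_LVL(DEMO_LVL_MISS) | DEMO_LVL(l))

static const uint64_t demo_ldst_src_map[8] = {
	DEMO_LVL(DEMO_LVL_NA),			/* 000 */
	DEMO_LOAD_HIT(DEMO_LVL_L1),		/* 001 */
	DEMO_LOAD_HIT(DEMO_LVL_L2),		/* 010 */
	DEMO_LOAD_HIT(DEMO_LVL_L3),		/* 011 */
	DEMO_LOAD_HIT(DEMO_LVL_LOC_RAM),	/* 100 */
	DEMO_LOAD_HIT(DEMO_LVL_REM_CCE1),	/* 101 */
	DEMO_LOAD_HIT(DEMO_LVL_REM_CCE2),	/* 110 */
	DEMO_STORE_MISS(DEMO_LVL_L1),		/* 111 */
};

/* Return the data_src bits for a SIER value; 0 if not a load/store. */
static uint64_t demo_sier_to_data_src(uint64_t sier)
{
	uint64_t type = (sier & POWER8_SIER_TYPE_MASK) >> POWER8_SIER_TYPE_SHIFT;
	uint64_t idx = (sier & POWER8_SIER_LDST_MASK) >> POWER8_SIER_LDST_SHIFT;

	if (type != 1 && type != 2)	/* 1 = load, 2 = store */
		return 0;
	assert(idx < 8);
	return demo_ldst_src_map[idx];
}
```

For example, a SIER value with TYPE = 1 (load) and LDST = 010 maps to 0x442, the "load hit L2" encoding used in the usage notes of this patch.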

Usage:

	perf record -d -e 'cpu/PM_MRK_GRP_CMPL/' <application>
	perf report -n --mem-mode --sort=mem,sym,dso,symbol_daddr,dso_daddr

		For samples involving load/store instructions, the memory
		hierarchy level is shown as "L1 hit", "Remote RAM hit" etc.
	# or

	perf record --data <application>
	perf report -D

		Sample records contain a 'data_src' field which encodes the
		memory hierarchy level. For example, data_src 0x442 indicates
		PERF_MEM_OP_LOAD, PERF_MEM_LVL_HIT and PERF_MEM_LVL_L2
		(i.e. a load that hit in L2).
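Decoding a data_src value by hand can be sketched as follows, assuming only the OP (bits 0-4) and LVL (bits 5-18) fields are of interest. The helper names are hypothetical; the bit names follow the PERF_MEM_OP_* and PERF_MEM_LVL_* definitions in include/uapi/linux/perf_event.h. Shifting and masking the raw value, rather than reading the bitfields, keeps the decoder independent of host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ARRAY_LEN(a)	(sizeof(a) / sizeof((a)[0]))

/* Names of the OP and LVL bits, in bit order (PERF_MEM_OP_*, PERF_MEM_LVL_*). */
static const char *const op_names[] = {
	"NA", "LOAD", "STORE", "PFETCH", "EXEC",
};
static const char *const lvl_names[] = {
	"NA", "HIT", "MISS", "L1", "LFB", "L2", "L3", "LOC_RAM",
	"REM_RAM1", "REM_RAM2", "REM_CCE1", "REM_CCE2", "IO", "UNC",
};

/* Append the name of every set bit in 'bits' to buf, space-separated. */
static void append_names(char *buf, size_t len, uint64_t bits,
			 const char *const *names, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (bits & (1ULL << i)) {
			size_t used = strlen(buf);

			snprintf(buf + used, len - used, "%s%s",
				 used ? " " : "", names[i]);
		}
	}
}

/* Decode the OP and LVL fields, e.g. 0x442 -> "LOAD HIT L2". */
static const char *demo_decode_data_src(uint64_t data_src, char *buf,
					size_t len)
{
	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	append_names(buf, len, data_src & 0x1f, op_names, ARRAY_LEN(op_names));
	append_names(buf, len, (data_src >> 5) & 0x3fff, lvl_names,
		     ARRAY_LEN(lvl_names));
	return buf;
}
```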

Note that the PMU event PM_MRK_GRP_CMPL tracks all marked group completion
events. While some of these are loads and stores, others, such as 'add'
instructions, may also be sampled. The alternative of sampling on
PM_MRK_GRP_CMPL and discarding non-load and non-store samples could
yield an inconsistent profile of the application.

As the precise semantics of 'perf mem -t load' or 'perf mem -t store' (which
require sampling only loads or only stores) cannot be implemented on Power,
we don't implement 'perf mem' on Power for now.

Thanks to input from Stephane Eranian, Michael Ellerman and Michael Neuling.

Cc: Stephane Eranian <eranian@google.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
Changelog[v4]:
	Drop support for 'perf mem' for Power (use perf-record and perf-report
	directly)

 arch/powerpc/include/asm/perf_event_server.h |    2 +
 arch/powerpc/perf/core-book3s.c              |   11 ++++++
 arch/powerpc/perf/power8-pmu.c               |   53 ++++++++++++++++++++++++++
 3 files changed, 66 insertions(+)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index cc5f45b..2252798 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -37,6 +37,8 @@ struct power_pmu {
 	void            (*config_bhrb)(u64 pmu_bhrb_filter);
 	void		(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
 	int		(*limited_pmc_event)(u64 event_id);
+	void		(*get_mem_data_src)(union perf_mem_data_src *dsrc,
+				struct pt_regs *regs);
 	u32		flags;
 	const struct attribute_group	**attr_groups;
 	int		n_generic;
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index a3985ae..e61fd05 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1693,6 +1693,13 @@ ssize_t power_events_sysfs_show(struct device *dev,
 	return sprintf(page, "event=0x%02llx\n", pmu_attr->id);
 }
 
+static inline void power_get_mem_data_src(union perf_mem_data_src *dsrc,
+				struct pt_regs *regs)
+{
+	if (ppmu->get_mem_data_src)
+		ppmu->get_mem_data_src(dsrc, regs);
+}
+
 struct pmu power_pmu = {
 	.pmu_enable	= power_pmu_enable,
 	.pmu_disable	= power_pmu_disable,
@@ -1774,6 +1781,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
 			data.br_stack = &cpuhw->bhrb_stack;
 		}
 
+		if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC &&
+						ppmu->get_mem_data_src)
+			ppmu->get_mem_data_src(&data.data_src, regs);
+
 		if (perf_event_overflow(event, &data, regs))
 			power_pmu_stop(event, 0);
 	}
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 5c61e59..4ecf903 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -537,6 +537,58 @@ static struct attribute_group power8_pmu_events_group = {
 	.attrs = power8_events_attr,
 };
 
+#define POWER8_SIER_TYPE_SHIFT	15
+#define POWER8_SIER_TYPE_MASK	(0x7LL << POWER8_SIER_TYPE_SHIFT)
+
+#define POWER8_SIER_LDST_SHIFT	1
+#define POWER8_SIER_LDST_MASK	(0x7LL << POWER8_SIER_LDST_SHIFT)
+
+#define P(a, b)			PERF_MEM_S(a, b)
+#define PLH(a, b)		(P(OP, LOAD) | P(LVL, HIT) | P(a, b))
+#define PSM(a, b)		(P(OP, STORE) | P(LVL, MISS) | P(a, b))
+
+/*
+ * Power8 interpretations:
+ * REM_CCE1: 1-hop indicates L2/L3 cache of a different core on same chip
+ * REM_CCE2: 2-hop indicates different chip or different node.
+ */
+static u64 ldst_src_map[] = {
+	/* 000 */	P(LVL, NA),
+
+	/* 001 */	PLH(LVL, L1),
+	/* 010 */	PLH(LVL, L2),
+	/* 011 */	PLH(LVL, L3),
+	/* 100 */	PLH(LVL, LOC_RAM),
+	/* 101 */	PLH(LVL, REM_CCE1),
+	/* 110 */	PLH(LVL, REM_CCE2),
+
+	/* 111 */	PSM(LVL, L1),
+};
+
+static inline bool is_load_store_inst(u64 sier)
+{
+	u64 val;
+	val = (sier & POWER8_SIER_TYPE_MASK) >> POWER8_SIER_TYPE_SHIFT;
+
+	/* 1 = load, 2 = store */
+	return val == 1 || val == 2;
+}
+
+static void power8_get_mem_data_src(union perf_mem_data_src *dsrc,
+			struct pt_regs *regs)
+{
+	u64 idx;
+	u64 sier;
+
+	sier = mfspr(SPRN_SIER);
+
+	if (is_load_store_inst(sier)) {
+		idx = (sier & POWER8_SIER_LDST_MASK) >> POWER8_SIER_LDST_SHIFT;
+
+		dsrc->val |= ldst_src_map[idx];
+	}
+}
+
 PMU_FORMAT_ATTR(event,		"config:0-49");
 PMU_FORMAT_ATTR(pmcxsel,	"config:0-7");
 PMU_FORMAT_ATTR(mark,		"config:8");
@@ -640,6 +692,7 @@ static struct power_pmu power8_pmu = {
 	.get_constraint		= power8_get_constraint,
 	.get_alternatives	= power8_get_alternatives,
 	.disable_pmc		= power8_disable_pmc,
+	.get_mem_data_src	= power8_get_mem_data_src,
 	.flags			= PPMU_HAS_SSLOT | PPMU_HAS_SIER | PPMU_BHRB | PPMU_EBB,
 	.n_generic		= ARRAY_SIZE(power8_generic_events),
 	.generic_events		= power8_generic_events,
-- 
1.7.9.5


 	.get_constraint		= power8_get_constraint,
 	.get_alternatives	= power8_get_alternatives,
 	.disable_pmc		= power8_disable_pmc,
+	.get_mem_data_src	= power8_get_mem_data_src,
 	.flags			= PPMU_HAS_SSLOT | PPMU_HAS_SIER | PPMU_BHRB | PPMU_EBB,
 	.n_generic		= ARRAY_SIZE(power8_generic_events),
 	.generic_events		= power8_generic_events,
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 6/8][v4] powerpc: Rename branch_opcode() to instr_opcode()
  2013-09-14  0:49 ` Sukadev Bhattiprolu
@ 2013-09-14  0:49   ` Sukadev Bhattiprolu
  -1 siblings, 0 replies; 27+ messages in thread
From: Sukadev Bhattiprolu @ 2013-09-14  0:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: linuxppc-dev, Stephane Eranian, Michael Ellerman, Paul Mackerras,
	Anshuman Khandual

The logic used in branch_opcode() to extract the opcode of an instruction
applies to non-branch instructions as well, so rename it to instr_opcode().

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 arch/powerpc/lib/code-patching.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 17e5b23..2bc9db3 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -72,19 +72,19 @@ unsigned int create_cond_branch(const unsigned int *addr,
 	return instruction;
 }
 
-static unsigned int branch_opcode(unsigned int instr)
+static unsigned int instr_opcode(unsigned int instr)
 {
 	return (instr >> 26) & 0x3F;
 }
 
 static int instr_is_branch_iform(unsigned int instr)
 {
-	return branch_opcode(instr) == 18;
+	return instr_opcode(instr) == 18;
 }
 
 static int instr_is_branch_bform(unsigned int instr)
 {
-	return branch_opcode(instr) == 16;
+	return instr_opcode(instr) == 16;
 }
 
 int instr_is_relative_branch(unsigned int instr)
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 7/8][v4] power: implement is_instr_load_store().
  2013-09-14  0:49 ` Sukadev Bhattiprolu
@ 2013-09-14  0:49   ` Sukadev Bhattiprolu
  -1 siblings, 0 replies; 27+ messages in thread
From: Sukadev Bhattiprolu @ 2013-09-14  0:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: linuxppc-dev, Stephane Eranian, Michael Ellerman, Paul Mackerras,
	Anshuman Khandual

Implement instr_is_load_store() to detect whether a given instruction
is one of the fixed-point or floating-point load/store instructions.
This function will be used in a follow-on patch to save the memory
hierarchy information of the load/store.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/code-patching.h |    1 +
 arch/powerpc/lib/code-patching.c         |   90 ++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+)

diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index a6f8c7a..3e47fe0 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -34,6 +34,7 @@ int instr_is_branch_to_addr(const unsigned int *instr, unsigned long addr);
 unsigned long branch_target(const unsigned int *instr);
 unsigned int translate_branch(const unsigned int *dest,
 			      const unsigned int *src);
+int instr_is_load_store(const unsigned int *instr);
 
 static inline unsigned long ppc_function_entry(void *func)
 {
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 2bc9db3..7e5dc6f 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -159,6 +159,96 @@ unsigned int translate_branch(const unsigned int *dest, const unsigned int *src)
 	return 0;
 }
 
+static unsigned int load_store_xval(const unsigned int instr)
+{
+	return (instr >> 1) & 0x3FF;	/* bits 21..30 */
+}
+
+/*
+ * Values of bits 21:30 of Fixed-point and Floating-point Load and Store
+ * instructions.
+ *
+ * Reference:	PowerISA_V2.06B_Public.pdf, Sections 3.3.2 through 3.3.6 and
+ *		4.6.2 through 4.6.4.
+ */
+#define	x_lbzx		87
+#define	x_lbzux		119
+#define	x_lhzx		279
+#define	x_lhzux		311
+#define	x_lhax		343
+#define	x_lhaux		375
+#define	x_lwzx		23
+#define	x_lwzux		55
+#define	x_lwax		341
+#define	x_lwaux		373
+#define	x_ldx		21
+#define	x_ldux		53
+#define	x_stbx		215
+#define	x_stbux		247
+#define	x_sthx		407
+#define	x_sthux		439
+#define	x_stwx		151
+#define	x_stwux		183
+#define	x_stdx		149
+#define	x_stdux		181
+#define	x_lhbrx		790
+#define	x_lwbrx		534
+#define	x_sthbrx	918
+#define	x_stwbrx	662
+#define	x_ldbrx		532
+#define	x_stdbrx	660
+#define	x_lswi		597
+#define	x_lswx		533
+#define	x_stswi		725
+#define	x_stswx		661
+#define	x_lfsx		535
+#define	x_lfsux		567
+#define	x_lfdx		599
+#define	x_lfdux		631
+#define	x_lfiwax	855
+#define	x_lfiwzx	887
+#define	x_stfsx		663
+#define	x_stfsux	695
+#define	x_stfdx		727
+#define	x_stfdux	759
+#define	x_stfiwx	983
+#define	x_lfdpx		791
+#define	x_stfdpx	919
+
+static unsigned int x_form_load_store[] = {
+	x_lbzx,     x_lbzux,    x_lhzx,     x_lhzux,    x_lhax,
+	x_lhaux,    x_lwzx,     x_lwzux,    x_lwax,     x_lwaux,
+	x_ldx,      x_ldux,     x_stbx,     x_stbux,    x_sthx,
+	x_sthux,    x_stwx,     x_stwux,    x_stdx,     x_stdux,
+	x_lhbrx,    x_lwbrx,    x_sthbrx,   x_stwbrx,   x_ldbrx,
+	x_stdbrx,   x_lswi,     x_lswx,     x_stswi,    x_stswx,
+	x_lfsx,     x_lfsux,    x_lfdx,     x_lfdux,    x_lfiwax,
+	x_lfiwzx,   x_stfsx,    x_stfsux,   x_stfdx,    x_stfdux,
+	x_stfiwx,   x_lfdpx,    x_stfdpx
+};
+
+int instr_is_load_store(const unsigned int *instr)
+{
+	unsigned int op;
+	int i, n;
+
+	op = instr_opcode(*instr);
+
+	if ((op >= 32 && op <= 58) || (op == 61 || op == 62))
+		return 1;
+
+	if (op == 31) {
+		n = ARRAY_SIZE(x_form_load_store);
+
+		for (i = 0; i < n; i++) {
+			if (x_form_load_store[i] == load_store_xval(*instr))
+				return 1;
+		}
+	}
+
+	return 0;
+}
+
 
 #ifdef CONFIG_CODE_PATCHING_SELFTEST
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 8/8][v4] powerpc/perf: Export Power7 memory hierarchy info to user space.
  2013-09-14  0:49 ` Sukadev Bhattiprolu
@ 2013-09-14  0:49   ` Sukadev Bhattiprolu
  -1 siblings, 0 replies; 27+ messages in thread
From: Sukadev Bhattiprolu @ 2013-09-14  0:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: linuxppc-dev, Stephane Eranian, Michael Ellerman, Paul Mackerras,
	Anshuman Khandual

On Power7, the DCACHE_SRC field in the MMCRA register identifies the memory
hierarchy level (e.g. L2, L3) from which a data-cache miss for a marked
instruction was satisfied.

Use the 'perf_mem_data_src' object to export this hierarchy level to user
space. Some memory hierarchy levels in Power7 don't map to the arch-neutral
levels. However, since the newer generation of the processor (i.e. Power8)
uses fewer levels than Power7, we don't really need to define new hierarchy
levels just for Power7.

Instead, we map as many levels as possible and approximate the rest. See the
comments near dcache_src_map[] in the patch.

Usage:

	perf record -d -e 'cpu/PM_MRK_GRP_CMPL/' <application>
	perf report -n --mem-mode --sort=mem,sym,dso,symbol_daddr,dso_daddr

		For samples involving load/store instructions, the memory
		hierarchy level is shown as "L1 hit", "Remote RAM hit" etc.
	# or

	perf record --data <application>
	perf report -D

		Sample records contain a 'data_src' field which encodes the
		memory hierarchy level. E.g. data_src 0x442 indicates
		PERF_MEM_OP_LOAD, PERF_MEM_LVL_HIT, PERF_MEM_LVL_L2 (i.e. a load
		that hit in L2).

Note that the PMU event PM_MRK_GRP_CMPL tracks all marked group completion
events. While some of these are loads and stores, other instructions such
as 'add' may also be sampled.

As such, the precise semantics of 'perf mem -t load' or 'perf mem -t store'
(which require sampling only loads or only stores) cannot be implemented on
Power. (Sampling on PM_MRK_GRP_CMPL and throwing away non-load and non-store
samples could yield an inconsistent profile of the application.)

Thanks to input from Stephane Eranian, Michael Ellerman and Michael Neuling.

Cc: Stephane Eranian <eranian@google.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
Changelog[v4]:
	Drop support for 'perf mem' for Power (use perf-record and perf-report
	directly)

Changelog[v3]:
	[Michael Ellerman] If newer levels that we defined in [v2] are not
	needed for Power8, ignore the new levels for Power7 also, and
	approximate them.
	Separate the TLB level mapping to a separate patchset.

Changelog[v2]:
        [Stephane Eranian] Define new levels rather than ORing the L2 and L3
        with REM_CCE1 and REM_CCE2.
        [Stephane Eranian] allocate a bit PERF_MEM_XLVL_NA for architectures
        that don't use the ->mem_xlvl field.
        Insert the TLB patch ahead so the new TLB bits are contiguous with
        existing TLB bits.

 arch/powerpc/perf/power7-pmu.c |   94 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)

diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index 56c67bc..ddfa548 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -11,8 +11,10 @@
 #include <linux/kernel.h>
 #include <linux/perf_event.h>
 #include <linux/string.h>
+#include <linux/uaccess.h>
 #include <asm/reg.h>
 #include <asm/cputable.h>
+#include <asm/code-patching.h>
 
 /*
  * Bits in event code for POWER7
@@ -317,6 +319,97 @@ static void power7_disable_pmc(unsigned int pmc, unsigned long mmcr[])
 		mmcr[1] &= ~(0xffUL << MMCR1_PMCSEL_SH(pmc));
 }
 
+#define POWER7_MMCRA_DCACHE_MISS	(0x1LL << 55)
+#define POWER7_MMCRA_DCACHE_SRC_SHIFT	51
+#define POWER7_MMCRA_DCACHE_SRC_MASK	(0xFLL << POWER7_MMCRA_DCACHE_SRC_SHIFT)
+
+#define P(a, b)		PERF_MEM_S(a, b)
+#define PLH(a, b)	(P(OP, LOAD) | P(LVL, HIT) | P(a, b))
+/*
+ * Map the Power7 DCACHE_SRC field (bits 9..12) in MMCRA register to the
+ * architecture-neutral memory hierarchy levels. For the levels in Power7
+ * that don't map to the arch-neutral levels, approximate to nearest
+ * level.
+ *
+ *	1-hop:	indicates another core on the same chip (2.1 and 3.1 levels).
+ *	2-hops:	indicates a different chip on same or different node (remote
+ *		and distant levels).
+ *
+ * For consistency with this interpretation of the hops, we don't use
+ * the REM_RAM1 level below.
+ *
+ * The *SHR and *MOD states of the cache are ignored/not exported to user.
+ *
+ * ### Levels marked with ### in comments below are approximated
+ */
+static u64 dcache_src_map[] = {
+	PLH(LVL, L2),			/* 00: FROM_L2 */
+	PLH(LVL, L3),			/* 01: FROM_L3 */
+
+	P(LVL, NA),			/* 02: Reserved */
+	P(LVL, NA),			/* 03: Reserved */
+
+	PLH(LVL, REM_CCE1),		/* 04: FROM_L2.1_SHR ### */
+	PLH(LVL, REM_CCE1),		/* 05: FROM_L2.1_MOD ### */
+
+	PLH(LVL, REM_CCE1),		/* 06: FROM_L3.1_SHR ### */
+	PLH(LVL, REM_CCE1),		/* 07: FROM_L3.1_MOD ### */
+
+	PLH(LVL, REM_CCE2),		/* 08: FROM_RL2L3_SHR ### */
+	PLH(LVL, REM_CCE2),		/* 09: FROM_RL2L3_MOD ### */
+
+	PLH(LVL, REM_CCE2),		/* 10: FROM_DL2L3_SHR ### */
+	PLH(LVL, REM_CCE2),		/* 11: FROM_DL2L3_MOD ### */
+
+	PLH(LVL, LOC_RAM),		/* 12: FROM_LMEM */
+	PLH(LVL, REM_RAM2),		/* 13: FROM_RMEM ### */
+	PLH(LVL, REM_RAM2),		/* 14: FROM_DMEM */
+
+	P(LVL, NA),			/* 15: Reserved */
+};
+
+/*
+ * Determine the memory-hierarchy information (if applicable) for the
+ * instruction/address we are sampling. If we encountered a DCACHE_MISS,
+ * mmcra[DCACHE_SRC_MASK] specifies the memory level from which the operand
+ * was loaded.
+ *
+ * Otherwise, it is an L1-hit, provided the instruction was a load/store.
+ */
+static void power7_get_mem_data_src(union perf_mem_data_src *dsrc,
+			struct pt_regs *regs)
+{
+	u64 idx;
+	u64 mmcra = regs->dsisr;	/* MMCRA, saved by perf_read_regs() */
+	u64 addr;
+	int ret;
+	unsigned int instr;
+
+	if (mmcra & POWER7_MMCRA_DCACHE_MISS) {
+		idx = mmcra & POWER7_MMCRA_DCACHE_SRC_MASK;
+		idx >>= POWER7_MMCRA_DCACHE_SRC_SHIFT;
+
+		dsrc->val |= dcache_src_map[idx];
+		return;
+	}
+
+	instr = 0;
+	addr = perf_instruction_pointer(regs);
+
+	if (is_kernel_addr(addr)) {
+		instr = *(unsigned int *)addr;
+	} else {
+		pagefault_disable();
+		ret = __get_user_inatomic(instr, (unsigned int __user *)addr);
+		pagefault_enable();
+		if (ret)
+			instr = 0;
+	}
+	if (instr && instr_is_load_store(&instr))
+		dsrc->val |= PLH(LVL, L1);
+}
+
+
 static int power7_generic_events[] = {
 	[PERF_COUNT_HW_CPU_CYCLES] =			PME_PM_CYC,
 	[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =	PME_PM_GCT_NOSLOT_CYC,
@@ -437,6 +530,7 @@ static struct power_pmu power7_pmu = {
 	.get_constraint		= power7_get_constraint,
 	.get_alternatives	= power7_get_alternatives,
 	.disable_pmc		= power7_disable_pmc,
+	.get_mem_data_src	= power7_get_mem_data_src,
 	.flags			= PPMU_ALT_SIPR,
 	.attr_groups		= power7_pmu_attr_groups,
 	.n_generic		= ARRAY_SIZE(power7_generic_events),
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 7/8][v4] power: implement is_instr_load_store().
  2013-09-14  0:49   ` Sukadev Bhattiprolu
  (?)
@ 2013-09-16 12:22   ` Tom Musta
  -1 siblings, 0 replies; 27+ messages in thread
From: Tom Musta @ 2013-09-16 12:22 UTC (permalink / raw)
  To: Sukadev Bhattiprolu, linuxppc-dev

On 9/13/2013 7:49 PM, Sukadev Bhattiprolu wrote:
> Implement is_instr_load_store() to detect whether a given instruction
> is one of the fixed-point or floating-point load/store instructions.
> This function will be used in a follow-on patch to save memory hierarchy
> information of the load/store.
>
>
> +/*
> + * Values of bits 21:30 of Fixed-point and Floating-point Load and Store
> + * instructions.
> + *
> + * Reference:	PowerISA_V2.06B_Public.pdf, Sections 3.3.2 through 3.3.6 and
> + *		4.6.2 through 4.6.4.
> + */
> +#define	x_lbzx		87
> +#define	x_lbzux		119
> +#define	x_lhzx		279
> <snip>
> +
> +static unsigned int x_form_load_store[] = {
> +	x_lbzx,     x_lbzux,    x_lhzx,     x_lhzux,    x_lhax,
> +	x_lhaux,    x_lwzx,     x_lwzux,    x_lwax,     x_lwaux,
> +	x_ldx,      x_ldux,     x_stbx,     x_stbux,    x_sthx,
> +	x_sthux,    x_stwx,     x_stwux,    x_stdx,     x_stdux,
> +	x_lhbrx,    x_lwbrx,    x_sthbrx,   x_stwbrx,   x_ldbrx,
> +	x_stdbrx,   x_lswi,     x_lswx,     x_stswi,    x_stswx,
> +	x_lfsx,     x_lfsux,    x_lfdx,     x_lfdux,    x_lfiwax,
> +	x_lfiwzx,   x_stfsx,    x_stfsux,   x_stfdx,    x_stfdux,
> +	x_stfiwax,  x_lfdpx,    x_stfdpx
> +};
> +
> <snip>
> +
>   
>   
Can you explain why this function only covers fixed point and floating 
point instructions?  I.e., why did you skip Altivec and VSX?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/8][v4] powerpc/perf: Rename Power8 macros to start with PME
  2013-09-14  0:49   ` Sukadev Bhattiprolu
@ 2013-09-18  5:24     ` Anshuman Khandual
  -1 siblings, 0 replies; 27+ messages in thread
From: Anshuman Khandual @ 2013-09-18  5:24 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: linux-kernel, linuxppc-dev, Stephane Eranian, Michael Ellerman,
	Paul Mackerras

On 09/14/2013 06:19 AM, Sukadev Bhattiprolu wrote:
> We use helpers like GENERIC_EVENT_ATTR() to list the generic events in
> sysfs. To avoid name collisions, GENERIC_EVENT_ATTR() requires the perf
> event macros to start with PME.

We have all the raw event codes for P7 covered with the help of the
power7-events-list.h enumeration.

/*
 * Power7 event codes.
 */
#define EVENT(_name, _code) \
	PME_##_name = _code,

enum {
#include "power7-events-list.h"
};
#undef EVENT

Just wondering whether it's a good idea to rename only these selected macros
so they can be consumed by GENERIC_EVENT_ATTR() right here, or whether we need
the comprehensive list of raw events for P8 first. Just an idea.

Regards
Anshuman


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 8/8][v4] powerpc/perf: Export Power7 memory hierarchy info to user space.
  2013-09-14  0:49   ` Sukadev Bhattiprolu
@ 2013-09-18 10:47     ` Anshuman Khandual
  -1 siblings, 0 replies; 27+ messages in thread
From: Anshuman Khandual @ 2013-09-18 10:47 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: linux-kernel, linuxppc-dev, Paul Mackerras, Michael Ellerman,
	Stephane Eranian

On 09/14/2013 06:19 AM, Sukadev Bhattiprolu wrote:
> On Power7, the DCACHE_SRC field in MMCRA register identifies the memory
> hierarchy level (eg: L2, L3 etc) from which a data-cache miss for a
> marked instruction was satisfied.
> 
> Use the 'perf_mem_data_src' object to export this hierarchy level to user
> space. Some memory hierarchy levels in Power7 don't map to the arch-neutral
> levels. However, since the newer generation of the processor (i.e. Power8)
> uses fewer levels than Power7, we don't really need to define new hierarchy
> levels just for Power7.
> 
> Instead, we map as many levels as possible and approximate the rest. See the
> comments near dcache_src_map[] in the patch.
> 
> Usage:
> 
> 	perf record -d -e 'cpu/PM_MRK_GRP_CMPL/' <application>
> 	perf report -n --mem-mode --sort=mem,sym,dso,symbol_daddr,dso_daddr
> 
> 		For samples involving load/store instructions, the memory
> 		hierarchy level is shown as "L1 hit", "Remote RAM hit" etc.
> 	# or
> 
> 	perf record --data <application>
> 	perf report -D
> 
> 		Sample records contain a 'data_src' field which encodes the
> 		memory hierarchy level, e.g. data_src 0x442 indicates
> 		MEM_OP_LOAD, MEM_LVL_HIT, MEM_LVL_L2 (i.e. load hit L2).

Successfully built and boot-tested this entire patchset on both a P7 and a P8 system.
I am running some sample tests with the ebizzy micro-benchmark. So far I have seen
only the values 0x142 and 0x0 in the data_src field of the sample records. I will
experiment a bit more on P7 and P8 systems and post the results.

Regards
Anshuman 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 8/8][v4] powerpc/perf: Export Power7 memory hierarchy info to user space.
  2013-09-14  0:49   ` Sukadev Bhattiprolu
@ 2013-09-19  8:41     ` Anshuman Khandual
  -1 siblings, 0 replies; 27+ messages in thread
From: Anshuman Khandual @ 2013-09-19  8:41 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: linux-kernel, linuxppc-dev, Stephane Eranian, Michael Ellerman,
	Paul Mackerras

On 09/14/2013 06:19 AM, Sukadev Bhattiprolu wrote:
> +static void power7_get_mem_data_src(union perf_mem_data_src *dsrc,
> +			struct pt_regs *regs)
> +{
> +	u64 idx;
> +	u64 mmcra = regs->dsisr;
> +	u64 addr;
> +	int ret;
> +	unsigned int instr;
> +
> +	if (mmcra & POWER7_MMCRA_DCACHE_MISS) {
> +		idx = mmcra & POWER7_MMCRA_DCACHE_SRC_MASK;
> +		idx >>= POWER7_MMCRA_DCACHE_SRC_SHIFT;
> +
> +		dsrc->val |= dcache_src_map[idx];
> +		return;
> +	}
> +
> +	instr = 0;
> +	addr = perf_instruction_pointer(regs);
> +
> +	if (is_kernel_addr(addr))
> +		instr = *(unsigned int *)addr;
> +	else {
> +		pagefault_disable();
> +		ret = __get_user_inatomic(instr, (unsigned int __user *)addr);
> +		pagefault_enable();
> +		if (ret)
> +			instr = 0;
> +	}
> +	if (instr && instr_is_load_store(&instr))


I am wondering whether
"(mmcra & POWER7_MMCRA_DCACHE_SRC_MASK) >> POWER7_MMCRA_DCACHE_SRC_SHIFT"
can ever be non-zero when the marked instruction does not have the
MMCRA[POWER7_MMCRA_DCACHE_MISS] bit set. If it can, we should compute
dsrc->val from it as in the DCACHE_MISS case above. I ran a couple of
experiments on a P7 box, but could not find an instance of a marked
instruction with the MMCRA[POWER7_MMCRA_DCACHE_MISS] bit clear and a
non-zero POWER7_MMCRA_DCACHE_SRC field.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 8/8][v4] powerpc/perf: Export Power7 memory hierarchy info to user space.
  2013-09-19  8:41     ` Anshuman Khandual
@ 2013-09-24 22:30       ` Sukadev Bhattiprolu
  -1 siblings, 0 replies; 27+ messages in thread
From: Sukadev Bhattiprolu @ 2013-09-24 22:30 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: linux-kernel, linuxppc-dev, Stephane Eranian, Michael Ellerman,
	Paul Mackerras

Anshuman Khandual [khandual@linux.vnet.ibm.com] wrote:
| On 09/14/2013 06:19 AM, Sukadev Bhattiprolu wrote:
| > +static void power7_get_mem_data_src(union perf_mem_data_src *dsrc,
| > +			struct pt_regs *regs)
| > +{
| > +	u64 idx;
| > +	u64 mmcra = regs->dsisr;
| > +	u64 addr;
| > +	int ret;
| > +	unsigned int instr;
| > +
| > +	if (mmcra & POWER7_MMCRA_DCACHE_MISS) {
| > +		idx = mmcra & POWER7_MMCRA_DCACHE_SRC_MASK;
| > +		idx >>= POWER7_MMCRA_DCACHE_SRC_SHIFT;
| > +
| > +		dsrc->val |= dcache_src_map[idx];
| > +		return;
| > +	}
| > +
| > +	instr = 0;
| > +	addr = perf_instruction_pointer(regs);
| > +
| > +	if (is_kernel_addr(addr))
| > +		instr = *(unsigned int *)addr;
| > +	else {
| > +		pagefault_disable();
| > +		ret = __get_user_inatomic(instr, (unsigned int __user *)addr);
| > +		pagefault_enable();
| > +		if (ret)
| > +			instr = 0;
| > +	}
| > +	if (instr && instr_is_load_store(&instr))
| 
| 
| I am wondering whether
| "(mmcra & POWER7_MMCRA_DCACHE_SRC_MASK) >> POWER7_MMCRA_DCACHE_SRC_SHIFT"
| can ever be non-zero when the marked instruction does not have the
| MMCRA[POWER7_MMCRA_DCACHE_MISS] bit set. If it can, we should compute
| dsrc->val from it as in the DCACHE_MISS case above. I ran a couple of
| experiments on a P7 box, but could not find an instance of a marked
| instruction with the MMCRA[POWER7_MMCRA_DCACHE_MISS] bit clear and a
| non-zero POWER7_MMCRA_DCACHE_SRC field.

I confirmed again with the hardware team that if there is no DCACHE_MISS,
the DCACHE_SRC field is clear.

Thanks,

Sukadev


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread

Thread overview: 27+ messages
2013-09-14  0:49 [PATCH 0/8][v4] powerpc/perf: Export memory hierarchy level in Power7/8 Sukadev Bhattiprolu
2013-09-14  0:49 ` [PATCH 1/8][v4] powerpc/perf: Rename Power8 macros to start with PME Sukadev Bhattiprolu
2013-09-18  5:24   ` Anshuman Khandual
2013-09-14  0:49 ` [PATCH 2/8][v4] powerpc/perf: Export Power8 generic events in sysfs Sukadev Bhattiprolu
2013-09-14  0:49 ` [PATCH 3/8][v4] powerpc/perf: Add PM_MRK_GRP_CMPL event to sysfs Sukadev Bhattiprolu
2013-09-14  0:49 ` [PATCH 4/8][v4] powerpc/perf: Define big-endian version of perf_mem_data_src Sukadev Bhattiprolu
2013-09-14  0:49 ` [PATCH 5/8][v4] powerpc/perf: Export Power8 memory hierarchy info to user space Sukadev Bhattiprolu
2013-09-14  0:49 ` [PATCH 6/8][v4] powerpc: Rename branch_opcode() to instr_opcode() Sukadev Bhattiprolu
2013-09-14  0:49 ` [PATCH 7/8][v4] power: implement is_instr_load_store() Sukadev Bhattiprolu
2013-09-16 12:22   ` Tom Musta
2013-09-14  0:49 ` [PATCH 8/8][v4] powerpc/perf: Export Power7 memory hierarchy info to user space Sukadev Bhattiprolu
2013-09-18 10:47   ` Anshuman Khandual
2013-09-19  8:41   ` Anshuman Khandual
2013-09-24 22:30     ` Sukadev Bhattiprolu
