* [PATCH 0/8][v4] powerpc/perf: Export memory hierarchy level in Power7/8.
@ 2013-09-14 0:49 Sukadev Bhattiprolu
2013-09-14 0:49 ` [PATCH 1/8][v4] powerpc/perf: Rename Power8 macros to start with PME Sukadev Bhattiprolu
` (7 more replies)
0 siblings, 8 replies; 14+ messages in thread
From: Sukadev Bhattiprolu @ 2013-09-14 0:49 UTC (permalink / raw)
To: linux-kernel
Cc: linuxppc-dev, Paul Mackerras, Michael Ellerman, Stephane Eranian,
Anshuman Khandual
Power7 and Power8 processors save the memory hierarchy level (eg: L2, L3)
from which a load or store instruction was satisfied. Export this hierarchy
information to the user via the perf_mem_data_src object.
Thanks to input from Stephane Eranian, Michael Ellerman, Michael Neuling.
Sukadev Bhattiprolu (8):
powerpc/perf: Rename Power8 macros to start with PME
powerpc/perf: Export Power8 generic events in sysfs
powerpc/perf: Add PM_MRK_GRP_CMPL event to sysfs.
powerpc/perf: Define big-endian version of perf_mem_data_src
powerpc/perf: Export Power8 memory hierarchy info to user space.
powerpc: Rename branch_opcode() to instr_opcode()
power: implement is_instr_load_store().
powerpc/perf: Export Power7 memory hierarchy info to user space.
arch/powerpc/include/asm/code-patching.h | 1 +
arch/powerpc/include/asm/perf_event_server.h | 2 +
arch/powerpc/lib/code-patching.c | 96 ++++++++++++++++++++++-
arch/powerpc/perf/core-book3s.c | 11 +++
arch/powerpc/perf/power7-pmu.c | 94 +++++++++++++++++++++++
arch/powerpc/perf/power8-pmu.c | 105 +++++++++++++++++++++++---
include/uapi/linux/perf_event.h | 58 ++++++++++++++
7 files changed, 352 insertions(+), 15 deletions(-)
--
1.7.9.5
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/8][v4] powerpc/perf: Rename Power8 macros to start with PME
2013-09-14 0:49 [PATCH 0/8][v4] powerpc/perf: Export memory hierarchy level in Power7/8 Sukadev Bhattiprolu
@ 2013-09-14 0:49 ` Sukadev Bhattiprolu
2013-09-18 5:24 ` Anshuman Khandual
2013-09-14 0:49 ` [PATCH 2/8][v4] powerpc/perf: Export Power8 generic events in sysfs Sukadev Bhattiprolu
` (6 subsequent siblings)
7 siblings, 1 reply; 14+ messages in thread
From: Sukadev Bhattiprolu @ 2013-09-14 0:49 UTC (permalink / raw)
To: linux-kernel
Cc: linuxppc-dev, Paul Mackerras, Michael Ellerman, Stephane Eranian,
Anshuman Khandual
We use helpers like GENERIC_EVENT_ATTR() to list the generic events in
sysfs. To avoid name collisions, GENERIC_EVENT_ATTR() requires the perf
event macros to start with PME.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
arch/powerpc/perf/power8-pmu.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 96a64d6..30c6b12 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -18,12 +18,12 @@
/*
* Some power8 event codes.
*/
-#define PM_CYC 0x0001e
-#define PM_GCT_NOSLOT_CYC 0x100f8
-#define PM_CMPLU_STALL 0x4000a
-#define PM_INST_CMPL 0x00002
-#define PM_BRU_FIN 0x10068
-#define PM_BR_MPRED_CMPL 0x400f6
+#define PME_PM_CYC 0x0001e
+#define PME_PM_GCT_NOSLOT_CYC 0x100f8
+#define PME_PM_CMPLU_STALL 0x4000a
+#define PME_PM_INST_CMPL 0x00002
+#define PME_PM_BRU_FIN 0x10068
+#define PME_PM_BR_MPRED_CMPL 0x400f6
/*
@@ -550,12 +550,12 @@ static const struct attribute_group *power8_pmu_attr_groups[] = {
};
static int power8_generic_events[] = {
- [PERF_COUNT_HW_CPU_CYCLES] = PM_CYC,
- [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = PM_GCT_NOSLOT_CYC,
- [PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = PM_CMPLU_STALL,
- [PERF_COUNT_HW_INSTRUCTIONS] = PM_INST_CMPL,
- [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = PM_BRU_FIN,
- [PERF_COUNT_HW_BRANCH_MISSES] = PM_BR_MPRED_CMPL,
+ [PERF_COUNT_HW_CPU_CYCLES] = PME_PM_CYC,
+ [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = PME_PM_GCT_NOSLOT_CYC,
+ [PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = PME_PM_CMPLU_STALL,
+ [PERF_COUNT_HW_INSTRUCTIONS] = PME_PM_INST_CMPL,
+ [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = PME_PM_BRU_FIN,
+ [PERF_COUNT_HW_BRANCH_MISSES] = PME_PM_BR_MPRED_CMPL,
};
static u64 power8_bhrb_filter_map(u64 branch_sample_type)
--
1.7.9.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/8][v4] powerpc/perf: Export Power8 generic events in sysfs
2013-09-14 0:49 [PATCH 0/8][v4] powerpc/perf: Export memory hierarchy level in Power7/8 Sukadev Bhattiprolu
2013-09-14 0:49 ` [PATCH 1/8][v4] powerpc/perf: Rename Power8 macros to start with PME Sukadev Bhattiprolu
@ 2013-09-14 0:49 ` Sukadev Bhattiprolu
2013-09-14 0:49 ` [PATCH 3/8][v4] powerpc/perf: Add PM_MRK_GRP_CMPL event to sysfs Sukadev Bhattiprolu
` (5 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Sukadev Bhattiprolu @ 2013-09-14 0:49 UTC (permalink / raw)
To: linux-kernel
Cc: linuxppc-dev, Paul Mackerras, Michael Ellerman, Stephane Eranian,
Anshuman Khandual
Export generic perf events for Power8 in sysfs.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
arch/powerpc/perf/power8-pmu.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 30c6b12..ff98fb8 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -510,6 +510,28 @@ static void power8_disable_pmc(unsigned int pmc, unsigned long mmcr[])
mmcr[1] &= ~(0xffUL << MMCR1_PMCSEL_SHIFT(pmc + 1));
}
+GENERIC_EVENT_ATTR(cpu-cyles, PM_CYC);
+GENERIC_EVENT_ATTR(stalled-cycles-frontend, PM_GCT_NOSLOT_CYC);
+GENERIC_EVENT_ATTR(stalled-cycles-backend, PM_CMPLU_STALL);
+GENERIC_EVENT_ATTR(instructions, PM_INST_CMPL);
+GENERIC_EVENT_ATTR(branch-instructions, PM_BRU_FIN);
+GENERIC_EVENT_ATTR(branch-misses, PM_BR_MPRED_CMPL);
+
+static struct attribute *power8_events_attr[] = {
+ GENERIC_EVENT_PTR(PM_CYC),
+ GENERIC_EVENT_PTR(PM_GCT_NOSLOT_CYC),
+ GENERIC_EVENT_PTR(PM_CMPLU_STALL),
+ GENERIC_EVENT_PTR(PM_INST_CMPL),
+ GENERIC_EVENT_PTR(PM_BRU_FIN),
+ GENERIC_EVENT_PTR(PM_BR_MPRED_CMPL),
+ NULL
+};
+
+static struct attribute_group power8_pmu_events_group = {
+ .name = "events",
+ .attrs = power8_events_attr,
+};
+
PMU_FORMAT_ATTR(event, "config:0-49");
PMU_FORMAT_ATTR(pmcxsel, "config:0-7");
PMU_FORMAT_ATTR(mark, "config:8");
@@ -546,6 +568,7 @@ struct attribute_group power8_pmu_format_group = {
static const struct attribute_group *power8_pmu_attr_groups[] = {
&power8_pmu_format_group,
+ &power8_pmu_events_group,
NULL,
};
--
1.7.9.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 3/8][v4] powerpc/perf: Add PM_MRK_GRP_CMPL event to sysfs.
2013-09-14 0:49 [PATCH 0/8][v4] powerpc/perf: Export memory hierarchy level in Power7/8 Sukadev Bhattiprolu
2013-09-14 0:49 ` [PATCH 1/8][v4] powerpc/perf: Rename Power8 macros to start with PME Sukadev Bhattiprolu
2013-09-14 0:49 ` [PATCH 2/8][v4] powerpc/perf: Export Power8 generic events in sysfs Sukadev Bhattiprolu
@ 2013-09-14 0:49 ` Sukadev Bhattiprolu
2013-09-14 0:49 ` [PATCH 4/8][v4] powerpc/perf: Define big-endian version of perf_mem_data_src Sukadev Bhattiprolu
` (4 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Sukadev Bhattiprolu @ 2013-09-14 0:49 UTC (permalink / raw)
To: linux-kernel
Cc: linuxppc-dev, Paul Mackerras, Michael Ellerman, Stephane Eranian,
Anshuman Khandual
The perf event PM_MRK_GRP_CMPL is useful in analyzing memory hierarchy
of applications.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
arch/powerpc/perf/power8-pmu.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index ff98fb8..5c61e59 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -24,6 +24,7 @@
#define PME_PM_INST_CMPL 0x00002
#define PME_PM_BRU_FIN 0x10068
#define PME_PM_BR_MPRED_CMPL 0x400f6
+#define PME_PM_MRK_GRP_CMPL 0x40130
/*
@@ -517,6 +518,8 @@ GENERIC_EVENT_ATTR(instructions, PM_INST_CMPL);
GENERIC_EVENT_ATTR(branch-instructions, PM_BRU_FIN);
GENERIC_EVENT_ATTR(branch-misses, PM_BR_MPRED_CMPL);
+POWER_EVENT_ATTR(PM_MRK_GRP_CMPL, PM_MRK_GRP_CMPL);
+
static struct attribute *power8_events_attr[] = {
GENERIC_EVENT_PTR(PM_CYC),
GENERIC_EVENT_PTR(PM_GCT_NOSLOT_CYC),
@@ -524,6 +527,8 @@ static struct attribute *power8_events_attr[] = {
GENERIC_EVENT_PTR(PM_INST_CMPL),
GENERIC_EVENT_PTR(PM_BRU_FIN),
GENERIC_EVENT_PTR(PM_BR_MPRED_CMPL),
+
+ POWER_EVENT_PTR(PM_MRK_GRP_CMPL),
NULL
};
--
1.7.9.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 4/8][v4] powerpc/perf: Define big-endian version of perf_mem_data_src
2013-09-14 0:49 [PATCH 0/8][v4] powerpc/perf: Export memory hierarchy level in Power7/8 Sukadev Bhattiprolu
` (2 preceding siblings ...)
2013-09-14 0:49 ` [PATCH 3/8][v4] powerpc/perf: Add PM_MRK_GRP_CMPL event to sysfs Sukadev Bhattiprolu
@ 2013-09-14 0:49 ` Sukadev Bhattiprolu
2013-09-14 0:49 ` [PATCH 5/8][v4] powerpc/perf: Export Power8 memory hierarchy info to user space Sukadev Bhattiprolu
` (3 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Sukadev Bhattiprolu @ 2013-09-14 0:49 UTC (permalink / raw)
To: linux-kernel
Cc: linuxppc-dev, Paul Mackerras, Michael Ellerman, Stephane Eranian,
Anshuman Khandual
perf_mem_data_src is an union that is initialized via the ->val field
and accessed via the bitmap fields. For this to work on big endian
platforms, we also need a big-endian represenation of perf_mem_data_src.
Cc: Stephane Eranian <eranian@google.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
Changelog [v2]:
- [Vince Weaver, Michael Ellerman] No __KERNEL__ in uapi headers.
include/uapi/linux/perf_event.h | 58 +++++++++++++++++++++++++++++++++++++++
1 file changed, 58 insertions(+)
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 62c25a2..7a4f9bb 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -19,6 +19,50 @@
#include <asm/byteorder.h>
/*
+ * Kernel and userspace check for endianness in incompatible ways.
+ * In user space, <endian.h> defines both __BIG_ENDIAN and __LITTLE_ENDIAN
+ * but sets __BYTE_ORDER to one or the other. So user space uses checks are:
+ *
+ * #if __BYTE_ORDER == __LITTLE_ENDIAN
+ *
+ * In the kernel, __BYTE_ORDER is undefined, so using the above check doesn't
+ * work. Further, kernel code assumes that exactly one of __BIG_ENDIAN and
+ * __LITTLE_ENDIAN is defined. So the kernel checks are like:
+ *
+ * #if defined(__LITTLE_ENDIAN)
+ *
+ * But we can't use that check in user space since __LITTLE_ENDIAN (and
+ * __BIG_ENDIAN) are always defined.
+ *
+ * Since some perf data structures depend on endianness _and_ are shared
+ * between kernel and user, perf needs its own notion of endian macros (at
+ * least until user and kernel endian checks converge).
+ */
+#define __PERF_LE 1234
+#define __PERF_BE 4321
+
+#if defined(__BYTE_ORDER)
+
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+#define __PERF_BYTE_ORDER __PERF_LE
+#elif __BYTE_ORDER == __BIG_ENDIAN
+#define __PERF_BYTE_ORDER __PERF_BE
+#endif
+
+#else /* __BYTE_ORDER */
+
+#if defined(__LITTLE_ENDIAN) && defined(__BIG_ENDIAN)
+#error "Cannot determine endianness"
+#elif defined(__LITTLE_ENDIAN)
+#define __PERF_BYTE_ORDER __PERF_LE
+#elif defined(__BIG_ENDIAN)
+#define __PERF_BYTE_ORDER __PERF_BE
+#endif
+
+
+#endif /* __BYTE_ORDER */
+
+/*
* User-space ABI bits:
*/
@@ -659,6 +703,7 @@ enum perf_callchain_context {
#define PERF_FLAG_FD_OUTPUT (1U << 1)
#define PERF_FLAG_PID_CGROUP (1U << 2) /* pid=cgroup id, per-cpu mode only */
+#if __PERF_BYTE_ORDER == __PERF_LE
union perf_mem_data_src {
__u64 val;
struct {
@@ -670,6 +715,19 @@ union perf_mem_data_src {
mem_rsvd:31;
};
};
+#elif __PERF_BYTE_ORDER == __PERF_BE
+union perf_mem_data_src {
+ __u64 val;
+ struct {
+ __u64 mem_rsvd:31,
+ mem_dtlb:7, /* tlb access */
+ mem_lock:2, /* lock instr */
+ mem_snoop:5, /* snoop mode */
+ mem_lvl:14, /* memory hierarchy level */
+ mem_op:5; /* type of opcode */
+ };
+};
+#endif
/* type of opcode (load/store/prefetch,code) */
#define PERF_MEM_OP_NA 0x01 /* not available */
--
1.7.9.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 5/8][v4] powerpc/perf: Export Power8 memory hierarchy info to user space.
2013-09-14 0:49 [PATCH 0/8][v4] powerpc/perf: Export memory hierarchy level in Power7/8 Sukadev Bhattiprolu
` (3 preceding siblings ...)
2013-09-14 0:49 ` [PATCH 4/8][v4] powerpc/perf: Define big-endian version of perf_mem_data_src Sukadev Bhattiprolu
@ 2013-09-14 0:49 ` Sukadev Bhattiprolu
2013-09-14 0:49 ` [PATCH 6/8][v4] powerpc: Rename branch_opcode() to instr_opcode() Sukadev Bhattiprolu
` (2 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Sukadev Bhattiprolu @ 2013-09-14 0:49 UTC (permalink / raw)
To: linux-kernel
Cc: linuxppc-dev, Paul Mackerras, Michael Ellerman, Stephane Eranian,
Anshuman Khandual
On Power8, the LDST field in SIER identifies the memory hierarchy level
(eg: L1, L2 etc), from which a data-cache miss for a marked instruction
was satisfied.
Use the 'perf_mem_data_src' object to export this hierarchy level to user
space. Fortunately, the memory hierarchy levels in Power8 map fairly easily
into the arch-neutral levels as described by the ldst_src_map[] table.
Usage:
perf record -d -e 'cpu/PM_MRK_GRP_CMPL/' <application>
perf report -n --mem-mode --sort=mem,sym,dso,symbol_daddr,dso_daddr"
For samples involving load/store instructions, the memory
hierarchy level is shown as "L1 hit", "Remote RAM hit" etc.
# or
perf record --data <application>
perf report -D
Sample records contain a 'data_src' field which encodes the
memory hierarchy level: Eg: data_src 0x442 indicates
MEM_OP_LOAD, MEM_LVL_HIT, MEM_LVL_L2 (i.e load hit L2).
Note that the PMU event PM_MRK_GRP_CMPL tracks all marked group completions
events. While some of these are loads and stores, others like 'add'
instructions may also be sampled. One alternative of sampling on
PM_MRK_GRP_CMPL and throwing away non-loads and non-store samples could
yield an inconsistent profile of the application.
As the precise semantics of 'perf mem -t load' or 'perf mem -t store' (which
require sampling only loads or only stores) cannot be implemented on Power,
we don't implement 'perf mem' on Power for now.
Thanks to input from Stephane Eranian, Michael Ellerman and Michael Neuling.
Cc: Stephane Eranian <eranian@google.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
Changelog[v4]:
Drop support for 'perf mem' for Power (use perf-record and perf-report
directly)
arch/powerpc/include/asm/perf_event_server.h | 2 +
arch/powerpc/perf/core-book3s.c | 11 ++++++
arch/powerpc/perf/power8-pmu.c | 53 ++++++++++++++++++++++++++
3 files changed, 66 insertions(+)
diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index cc5f45b..2252798 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -37,6 +37,8 @@ struct power_pmu {
void (*config_bhrb)(u64 pmu_bhrb_filter);
void (*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
int (*limited_pmc_event)(u64 event_id);
+ void (*get_mem_data_src)(union perf_mem_data_src *dsrc,
+ struct pt_regs *regs);
u32 flags;
const struct attribute_group **attr_groups;
int n_generic;
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index a3985ae..e61fd05 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1693,6 +1693,13 @@ ssize_t power_events_sysfs_show(struct device *dev,
return sprintf(page, "event=0x%02llx\n", pmu_attr->id);
}
+static inline void power_get_mem_data_src(union perf_mem_data_src *dsrc,
+ struct pt_regs *regs)
+{
+ if (ppmu->get_mem_data_src)
+ ppmu->get_mem_data_src(dsrc, regs);
+}
+
struct pmu power_pmu = {
.pmu_enable = power_pmu_enable,
.pmu_disable = power_pmu_disable,
@@ -1774,6 +1781,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
data.br_stack = &cpuhw->bhrb_stack;
}
+ if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC &&
+ ppmu->get_mem_data_src)
+ ppmu->get_mem_data_src(&data.data_src, regs);
+
if (perf_event_overflow(event, &data, regs))
power_pmu_stop(event, 0);
}
diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 5c61e59..4ecf903 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -537,6 +537,58 @@ static struct attribute_group power8_pmu_events_group = {
.attrs = power8_events_attr,
};
+#define POWER8_SIER_TYPE_SHIFT 15
+#define POWER8_SIER_TYPE_MASK (0x7LL << POWER8_SIER_TYPE_SHIFT)
+
+#define POWER8_SIER_LDST_SHIFT 1
+#define POWER8_SIER_LDST_MASK (0x7LL << POWER8_SIER_LDST_SHIFT)
+
+#define P(a, b) PERF_MEM_S(a, b)
+#define PLH(a, b) (P(OP, LOAD) | P(LVL, HIT) | P(a, b))
+#define PSM(a, b) (P(OP, STORE) | P(LVL, MISS) | P(a, b))
+
+/*
+ * Power8 interpretations:
+ * REM_CCE1: 1-hop indicates L2/L3 cache of a different core on same chip
+ * REM_CCE2: 2-hop indicates different chip or different node.
+ */
+static u64 ldst_src_map[] = {
+ /* 000 */ P(LVL, NA),
+
+ /* 001 */ PLH(LVL, L1),
+ /* 010 */ PLH(LVL, L2),
+ /* 011 */ PLH(LVL, L3),
+ /* 100 */ PLH(LVL, LOC_RAM),
+ /* 101 */ PLH(LVL, REM_CCE1),
+ /* 110 */ PLH(LVL, REM_CCE2),
+
+ /* 111 */ PSM(LVL, L1),
+};
+
+static inline bool is_load_store_inst(u64 sier)
+{
+ u64 val;
+ val = (sier & POWER8_SIER_TYPE_MASK) >> POWER8_SIER_TYPE_SHIFT;
+
+ /* 1 = load, 2 = store */
+ return val == 1 || val == 2;
+}
+
+static void power8_get_mem_data_src(union perf_mem_data_src *dsrc,
+ struct pt_regs *regs)
+{
+ u64 idx;
+ u64 sier;
+
+ sier = mfspr(SPRN_SIER);
+
+ if (is_load_store_inst(sier)) {
+ idx = (sier & POWER8_SIER_LDST_MASK) >> POWER8_SIER_LDST_SHIFT;
+
+ dsrc->val |= ldst_src_map[idx];
+ }
+}
+
PMU_FORMAT_ATTR(event, "config:0-49");
PMU_FORMAT_ATTR(pmcxsel, "config:0-7");
PMU_FORMAT_ATTR(mark, "config:8");
@@ -640,6 +692,7 @@ static struct power_pmu power8_pmu = {
.get_constraint = power8_get_constraint,
.get_alternatives = power8_get_alternatives,
.disable_pmc = power8_disable_pmc,
+ .get_mem_data_src = power8_get_mem_data_src,
.flags = PPMU_HAS_SSLOT | PPMU_HAS_SIER | PPMU_BHRB | PPMU_EBB,
.n_generic = ARRAY_SIZE(power8_generic_events),
.generic_events = power8_generic_events,
--
1.7.9.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 6/8][v4] powerpc: Rename branch_opcode() to instr_opcode()
2013-09-14 0:49 [PATCH 0/8][v4] powerpc/perf: Export memory hierarchy level in Power7/8 Sukadev Bhattiprolu
` (4 preceding siblings ...)
2013-09-14 0:49 ` [PATCH 5/8][v4] powerpc/perf: Export Power8 memory hierarchy info to user space Sukadev Bhattiprolu
@ 2013-09-14 0:49 ` Sukadev Bhattiprolu
2013-09-14 0:49 ` [PATCH 7/8][v4] power: implement is_instr_load_store() Sukadev Bhattiprolu
2013-09-14 0:49 ` [PATCH 8/8][v4] powerpc/perf: Export Power7 memory hierarchy info to user space Sukadev Bhattiprolu
7 siblings, 0 replies; 14+ messages in thread
From: Sukadev Bhattiprolu @ 2013-09-14 0:49 UTC (permalink / raw)
To: linux-kernel
Cc: linuxppc-dev, Paul Mackerras, Michael Ellerman, Stephane Eranian,
Anshuman Khandual
The logic used in branch_opcode() to extract the opcode for an instruction
applies to non branch instructions also. So rename to instr_opcode().
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
arch/powerpc/lib/code-patching.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 17e5b23..2bc9db3 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -72,19 +72,19 @@ unsigned int create_cond_branch(const unsigned int *addr,
return instruction;
}
-static unsigned int branch_opcode(unsigned int instr)
+static unsigned int instr_opcode(unsigned int instr)
{
return (instr >> 26) & 0x3F;
}
static int instr_is_branch_iform(unsigned int instr)
{
- return branch_opcode(instr) == 18;
+ return instr_opcode(instr) == 18;
}
static int instr_is_branch_bform(unsigned int instr)
{
- return branch_opcode(instr) == 16;
+ return instr_opcode(instr) == 16;
}
int instr_is_relative_branch(unsigned int instr)
--
1.7.9.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 7/8][v4] power: implement is_instr_load_store().
2013-09-14 0:49 [PATCH 0/8][v4] powerpc/perf: Export memory hierarchy level in Power7/8 Sukadev Bhattiprolu
` (5 preceding siblings ...)
2013-09-14 0:49 ` [PATCH 6/8][v4] powerpc: Rename branch_opcode() to instr_opcode() Sukadev Bhattiprolu
@ 2013-09-14 0:49 ` Sukadev Bhattiprolu
2013-09-16 12:22 ` Tom Musta
2013-09-14 0:49 ` [PATCH 8/8][v4] powerpc/perf: Export Power7 memory hierarchy info to user space Sukadev Bhattiprolu
7 siblings, 1 reply; 14+ messages in thread
From: Sukadev Bhattiprolu @ 2013-09-14 0:49 UTC (permalink / raw)
To: linux-kernel
Cc: linuxppc-dev, Paul Mackerras, Michael Ellerman, Stephane Eranian,
Anshuman Khandual
Implement is_instr_load_store() to detect whether a given instruction
is one of the fixed-point or floating-point load/store instructions.
This function will be used in a follow-on patch to save memory hierarchy
information of the load/store.
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/code-patching.h | 1 +
arch/powerpc/lib/code-patching.c | 90 ++++++++++++++++++++++++++++++
2 files changed, 91 insertions(+)
diff --git a/arch/powerpc/include/asm/code-patching.h b/arch/powerpc/include/asm/code-patching.h
index a6f8c7a..3e47fe0 100644
--- a/arch/powerpc/include/asm/code-patching.h
+++ b/arch/powerpc/include/asm/code-patching.h
@@ -34,6 +34,7 @@ int instr_is_branch_to_addr(const unsigned int *instr, unsigned long addr);
unsigned long branch_target(const unsigned int *instr);
unsigned int translate_branch(const unsigned int *dest,
const unsigned int *src);
+int instr_is_load_store(const unsigned int *instr);
static inline unsigned long ppc_function_entry(void *func)
{
diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 2bc9db3..7e5dc6f 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -159,6 +159,96 @@ unsigned int translate_branch(const unsigned int *dest, const unsigned int *src)
return 0;
}
+static unsigned int load_store_xval(const unsigned int instr)
+{
+ return (instr >> 1) & 0x3FF; /* bits 21..30 */
+}
+
+/*
+ * Values of bits 21:30 of Fixed-point and Floating-point Load and Store
+ * instructions.
+ *
+ * Reference: PowerISA_V2.06B_Public.pdf, Sections 3.3.2 through 3.3.6 and
+ * 4.6.2 through 4.6.4.
+ */
+#define x_lbzx 87
+#define x_lbzux 119
+#define x_lhzx 279
+#define x_lhzux 311
+#define x_lhax 343
+#define x_lhaux 375
+#define x_lwzx 23
+#define x_lwzux 55
+#define x_lwax 341
+#define x_lwaux 373
+#define x_ldx 21
+#define x_ldux 53
+#define x_stbx 215
+#define x_stbux 247
+#define x_sthx 407
+#define x_sthux 439
+#define x_stwx 151
+#define x_stwux 183
+#define x_stdx 149
+#define x_stdux 181
+#define x_lhbrx 790
+#define x_lwbrx 534
+#define x_sthbrx 918
+#define x_stwbrx 662
+#define x_ldbrx 532
+#define x_stdbrx 660
+#define x_lswi 597
+#define x_lswx 533
+#define x_stswi 725
+#define x_stswx 661
+#define x_lfsx 535
+#define x_lfsux 567
+#define x_lfdx 599
+#define x_lfdux 631
+#define x_lfiwax 855
+#define x_lfiwzx 887
+#define x_stfsx 663
+#define x_stfsux 695
+#define x_stfdx 727
+#define x_stfdux 759
+#define x_stfiwax 983
+#define x_lfdpx 791
+#define x_stfdpx 919
+
+static unsigned int x_form_load_store[] = {
+ x_lbzx, x_lbzux, x_lhzx, x_lhzux, x_lhax,
+ x_lhaux, x_lwzx, x_lwzux, x_lwax, x_lwaux,
+ x_ldx, x_ldux, x_stbx, x_stbux, x_sthx,
+ x_sthux, x_stwx, x_stwux, x_stdx, x_stdux,
+ x_lhbrx, x_lwbrx, x_sthbrx, x_stwbrx, x_ldbrx,
+ x_stdbrx, x_lswi, x_lswx, x_stswi, x_stswx,
+ x_lfsx, x_lfsux, x_lfdx, x_lfdux, x_lfiwax,
+ x_lfiwzx, x_stfsx, x_stfsux, x_stfdx, x_stfdux,
+ x_stfiwax, x_lfdpx, x_stfdpx
+};
+
+int instr_is_load_store(const unsigned int *instr)
+{
+ unsigned int op;
+ int i, n;
+
+ op = instr_opcode(*instr);
+
+ if ((op >= 32 && op <= 58) || (op == 61 || op == 62))
+ return 1;
+
+ if (op == 31) {
+ n = sizeof(x_form_load_store) / sizeof(int);
+
+ for (i = 0; i < n; i++) {
+ if (x_form_load_store[i] == load_store_xval(*instr))
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
#ifdef CONFIG_CODE_PATCHING_SELFTEST
--
1.7.9.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 8/8][v4] powerpc/perf: Export Power7 memory hierarchy info to user space.
2013-09-14 0:49 [PATCH 0/8][v4] powerpc/perf: Export memory hierarchy level in Power7/8 Sukadev Bhattiprolu
` (6 preceding siblings ...)
2013-09-14 0:49 ` [PATCH 7/8][v4] power: implement is_instr_load_store() Sukadev Bhattiprolu
@ 2013-09-14 0:49 ` Sukadev Bhattiprolu
2013-09-18 10:47 ` Anshuman Khandual
2013-09-19 8:41 ` Anshuman Khandual
7 siblings, 2 replies; 14+ messages in thread
From: Sukadev Bhattiprolu @ 2013-09-14 0:49 UTC (permalink / raw)
To: linux-kernel
Cc: linuxppc-dev, Paul Mackerras, Michael Ellerman, Stephane Eranian,
Anshuman Khandual
On Power7, the DCACHE_SRC field in MMCRA register identifies the memory
hierarchy level (eg: L2, L3 etc) from which a data-cache miss for a
marked instruction was satisfied.
Use the 'perf_mem_data_src' object to export this hierarchy level to user
space. Some memory hierarchy levels in Power7 don't map into the arch-neutral
levels. However, since newer generation of the processor (i.e. Power8) uses
fewer levels than in Power7, we don't really need to define new hierarchy
levels just for Power7.
We instead, map as many levels as possible and approximate the rest. See
comments near dcache-src_map[] in the patch.
Usage:
perf record -d -e 'cpu/PM_MRK_GRP_CMPL/' <application>
perf report -n --mem-mode --sort=mem,sym,dso,symbol_daddr,dso_daddr"
For samples involving load/store instructions, the memory
hierarchy level is shown as "L1 hit", "Remote RAM hit" etc.
# or
perf record --data <application>
perf report -D
Sample records contain a 'data_src' field which encodes the
memory hierarchy level: Eg: data_src 0x442 indicates
MEM_OP_LOAD, MEM_LVL_HIT, MEM_LVL_L2 (i.e load hit L2).
Note that the PMU event PM_MRK_GRP_CMPL tracks all marked group completions
events. While some of these are loads and stores, others like 'add'
instructions may also be sampled.
As such, the precise semantics of 'perf mem -t load' or 'perf mem -t store'
(which require sampling only loads or only stores cannot be implemented on
Power. (Sampling on PM_MRK_GRP_CMPL and throwing away non-loads and non-store
samples could yield an inconsistent profile of the application).
Thanks to input from Stephane Eranian, Michael Ellerman and Michael Neuling.
Cc: Stephane Eranian <eranian@google.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
Changelog[v4]:
Drop support for 'perf mem' for Power (use perf-record and perf-report
directly)
Changelog[v3]:
[Michael Ellerman] If newer levels that we defined in [v2] are not
needed for Power8, ignore the new levels for Power7 also, and
approximate them.
Separate the TLB level mapping to a separate patchset.
Changelog[v2]:
[Stephane Eranian] Define new levels rather than ORing the L2 and L3
with REM_CCE1 and REM_CCE2.
[Stephane Eranian] allocate a bit PERF_MEM_XLVL_NA for architectures
that don't use the ->mem_xlvl field.
Insert the TLB patch ahead so the new TLB bits are contigous with
existing TLB bits.
arch/powerpc/perf/power7-pmu.c | 94 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 94 insertions(+)
diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index 56c67bc..ddfa548 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -11,8 +11,10 @@
#include <linux/kernel.h>
#include <linux/perf_event.h>
#include <linux/string.h>
+#include <linux/uaccess.h>
#include <asm/reg.h>
#include <asm/cputable.h>
+#include <asm/code-patching.h>
/*
* Bits in event code for POWER7
@@ -317,6 +319,97 @@ static void power7_disable_pmc(unsigned int pmc, unsigned long mmcr[])
mmcr[1] &= ~(0xffUL << MMCR1_PMCSEL_SH(pmc));
}
+#define POWER7_MMCRA_DCACHE_MISS (0x1LL << 55)
+#define POWER7_MMCRA_DCACHE_SRC_SHIFT 51
+#define POWER7_MMCRA_DCACHE_SRC_MASK (0xFLL << POWER7_MMCRA_DCACHE_SRC_SHIFT)
+
+#define P(a, b) PERF_MEM_S(a, b)
+#define PLH(a, b) (P(OP, LOAD) | P(LVL, HIT) | P(a, b))
+/*
+ * Map the Power7 DCACHE_SRC field (bits 9..12) in MMCRA register to the
+ * architecture-neutral memory hierarchy levels. For the levels in Power7
+ * that don't map to the arch-neutral levels, approximate to nearest
+ * level.
+ *
+ * 1-hop: indicates another core on the same chip (2.1 and 3.1 levels).
+ * 2-hops: indicates a different chip on same or different node (remote
+ * and distant levels).
+ *
+ * For consistency with this interpretation of the hops, we dont use
+ * the REM_RAM1 level below.
+ *
+ * The *SHR and *MOD states of the cache are ignored/not exported to user.
+ *
+ * ### Levels marked with ### in comments below are approximated
+ */
+static u64 dcache_src_map[] = {
+ PLH(LVL, L2), /* 00: FROM_L2 */
+ PLH(LVL, L3), /* 01: FROM_L3 */
+
+ P(LVL, NA), /* 02: Reserved */
+ P(LVL, NA), /* 03: Reserved */
+
+ PLH(LVL, REM_CCE1), /* 04: FROM_L2.1_SHR ### */
+ PLH(LVL, REM_CCE1), /* 05: FROM_L2.1_MOD ### */
+
+ PLH(LVL, REM_CCE1), /* 06: FROM_L3.1_SHR ### */
+ PLH(LVL, REM_CCE1), /* 07: FROM_L3.1_MOD ### */
+
+ PLH(LVL, REM_CCE2), /* 08: FROM_RL2L3_SHR ### */
+ PLH(LVL, REM_CCE2), /* 09: FROM_RL2L3_MOD ### */
+
+ PLH(LVL, REM_CCE2), /* 10: FROM_DL2L3_SHR ### */
+ PLH(LVL, REM_CCE2), /* 11: FROM_DL2L3_MOD ### */
+
+ PLH(LVL, LOC_RAM), /* 12: FROM_LMEM */
+ PLH(LVL, REM_RAM2), /* 13: FROM_RMEM ### */
+ PLH(LVL, REM_RAM2), /* 14: FROM_DMEM */
+
+ P(LVL, NA), /* 15: Reserved */
+};
+
+/*
+ * Determine the memory-hierarchy information (if applicable) for the
+ * instruction/address we are sampling. If we encountered a DCACHE_MISS,
+ * mmcra[DCACHE_SRC_MASK] specifies the memory level from which the operand
+ * was loaded.
+ *
+ * Otherwise, it is an L1-hit, provided the instruction was a load/store.
+ */
+static void power7_get_mem_data_src(union perf_mem_data_src *dsrc,
+ struct pt_regs *regs)
+{
+ u64 idx;
+ u64 mmcra = regs->dsisr;
+ u64 addr;
+ int ret;
+ unsigned int instr;
+
+ if (mmcra & POWER7_MMCRA_DCACHE_MISS) {
+ idx = mmcra & POWER7_MMCRA_DCACHE_SRC_MASK;
+ idx >>= POWER7_MMCRA_DCACHE_SRC_SHIFT;
+
+ dsrc->val |= dcache_src_map[idx];
+ return;
+ }
+
+ instr = 0;
+ addr = perf_instruction_pointer(regs);
+
+ if (is_kernel_addr(addr))
+ instr = *(unsigned int *)addr;
+ else {
+ pagefault_disable();
+ ret = __get_user_inatomic(instr, (unsigned int __user *)addr);
+ pagefault_enable();
+ if (ret)
+ instr = 0;
+ }
+ if (instr && instr_is_load_store(&instr))
+ dsrc->val |= PLH(LVL, L1);
+}
+
+
static int power7_generic_events[] = {
[PERF_COUNT_HW_CPU_CYCLES] = PME_PM_CYC,
[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = PME_PM_GCT_NOSLOT_CYC,
@@ -437,6 +530,7 @@ static struct power_pmu power7_pmu = {
.get_constraint = power7_get_constraint,
.get_alternatives = power7_get_alternatives,
.disable_pmc = power7_disable_pmc,
+ .get_mem_data_src = power7_get_mem_data_src,
.flags = PPMU_ALT_SIPR,
.attr_groups = power7_pmu_attr_groups,
.n_generic = ARRAY_SIZE(power7_generic_events),
--
1.7.9.5
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 7/8][v4] power: implement is_instr_load_store().
2013-09-14 0:49 ` [PATCH 7/8][v4] power: implement is_instr_load_store() Sukadev Bhattiprolu
@ 2013-09-16 12:22 ` Tom Musta
0 siblings, 0 replies; 14+ messages in thread
From: Tom Musta @ 2013-09-16 12:22 UTC (permalink / raw)
To: Sukadev Bhattiprolu, linuxppc-dev
On 9/13/2013 7:49 PM, Sukadev Bhattiprolu wrote:
> Implement is_instr_load_store() to detect whether a given instruction
> is one of the fixed-point or floating-point load/store instructions.
> This function will be used in a follow-on patch to save memory hierarchy
> information of the load/store.
>
>
> +/*
> + * Values of bits 21:30 of Fixed-point and Floating-point Load and Store
> + * instructions.
> + *
> + * Reference: PowerISA_V2.06B_Public.pdf, Sections 3.3.2 through 3.3.6 and
> + * 4.6.2 through 4.6.4.
> + */
> +#define x_lbzx 87
> +#define x_lbzux 119
> +#define x_lhzx 279
> <snip>
> +
> +static unsigned int x_form_load_store[] = {
> + x_lbzx, x_lbzux, x_lhzx, x_lhzux, x_lhax,
> + x_lhaux, x_lwzx, x_lwzux, x_lwax, x_lwaux,
> + x_ldx, x_ldux, x_stbx, x_stbux, x_sthx,
> + x_sthux, x_stwx, x_stwux, x_stdx, x_stdux,
> + x_lhbrx, x_lwbrx, x_sthbrx, x_stwbrx, x_ldbrx,
> + x_stdbrx, x_lswi, x_lswx, x_stswi, x_stswx,
> + x_lfsx, x_lfsux, x_lfdx, x_lfdux, x_lfiwax,
> + x_lfiwzx, x_stfsx, x_stfsux, x_stfdx, x_stfdux,
> + x_stfiwax, x_lfdpx, x_stfdpx
> +};
> +
> <snip>
> +
>
>
Can you explain why this function only covers fixed point and floating
point instructions? I.e., why did you skip Altivec and VSX?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/8][v4] powerpc/perf: Rename Power8 macros to start with PME
2013-09-14 0:49 ` [PATCH 1/8][v4] powerpc/perf: Rename Power8 macros to start with PME Sukadev Bhattiprolu
@ 2013-09-18 5:24 ` Anshuman Khandual
0 siblings, 0 replies; 14+ messages in thread
From: Anshuman Khandual @ 2013-09-18 5:24 UTC (permalink / raw)
To: Sukadev Bhattiprolu
Cc: linuxppc-dev, Michael Ellerman, Paul Mackerras, linux-kernel,
Stephane Eranian
On 09/14/2013 06:19 AM, Sukadev Bhattiprolu wrote:
> We use helpers like GENERIC_EVENT_ATTR() to list the generic events in
> sysfs. To avoid name collisions, GENERIC_EVENT_ATTR() requires the perf
> event macros to start with PME.
We got all the raw event codes covered for P7 with the help of power7-events-list.h
enumeration.
/*
* Power7 event codes.
*/
#define EVENT(_name, _code) \
PME_##_name = _code,
enum {
#include "power7-events-list.h"
};
#undef EVENT
Just wondering if its a good idea to name change these selected macros to be consumed
by GENERIC_EVENT_ATTR() right here for this purpose or we need to get the comprehensive
list of raw events for P8 first. Just an idea.
Regards
Anshuman
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 8/8][v4] powerpc/perf: Export Power7 memory hierarchy info to user space.
2013-09-14 0:49 ` [PATCH 8/8][v4] powerpc/perf: Export Power7 memory hierarchy info to user space Sukadev Bhattiprolu
@ 2013-09-18 10:47 ` Anshuman Khandual
2013-09-19 8:41 ` Anshuman Khandual
1 sibling, 0 replies; 14+ messages in thread
From: Anshuman Khandual @ 2013-09-18 10:47 UTC (permalink / raw)
To: Sukadev Bhattiprolu
Cc: Stephane Eranian, linuxppc-dev, Paul Mackerras, linux-kernel,
Michael Ellerman
On 09/14/2013 06:19 AM, Sukadev Bhattiprolu wrote:
> On Power7, the DCACHE_SRC field in MMCRA register identifies the memory
> hierarchy level (eg: L2, L3 etc) from which a data-cache miss for a
> marked instruction was satisfied.
>
> Use the 'perf_mem_data_src' object to export this hierarchy level to user
> space. Some memory hierarchy levels in Power7 don't map into the arch-neutral
> levels. However, since newer generation of the processor (i.e. Power8) uses
> fewer levels than in Power7, we don't really need to define new hierarchy
> levels just for Power7.
>
> We instead, map as many levels as possible and approximate the rest. See
> comments near dcache-src_map[] in the patch.
>
> Usage:
>
> perf record -d -e 'cpu/PM_MRK_GRP_CMPL/' <application>
> perf report -n --mem-mode --sort=mem,sym,dso,symbol_daddr,dso_daddr"
>
> For samples involving load/store instructions, the memory
> hierarchy level is shown as "L1 hit", "Remote RAM hit" etc.
> # or
>
> perf record --data <application>
> perf report -D
>
> Sample records contain a 'data_src' field which encodes the
> memory hierarchy level: Eg: data_src 0x442 indicates
> MEM_OP_LOAD, MEM_LVL_HIT, MEM_LVL_L2 (i.e load hit L2).
Successfully built and boot tested this entire patchset both on a P7 and P8 system.
Running some sample tests with ebizzy micro benchmark. Till now got only 0x142 and
0x0 values for data_src object for the sample records. Will experiment around bit
more on P7 and P8 systems and post the results.
Regards
Anshuman
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 8/8][v4] powerpc/perf: Export Power7 memory hierarchy info to user space.
2013-09-14 0:49 ` [PATCH 8/8][v4] powerpc/perf: Export Power7 memory hierarchy info to user space Sukadev Bhattiprolu
2013-09-18 10:47 ` Anshuman Khandual
@ 2013-09-19 8:41 ` Anshuman Khandual
2013-09-24 22:30 ` Sukadev Bhattiprolu
1 sibling, 1 reply; 14+ messages in thread
From: Anshuman Khandual @ 2013-09-19 8:41 UTC (permalink / raw)
To: Sukadev Bhattiprolu
Cc: linuxppc-dev, Michael Ellerman, Paul Mackerras, linux-kernel,
Stephane Eranian
On 09/14/2013 06:19 AM, Sukadev Bhattiprolu wrote:
> +static void power7_get_mem_data_src(union perf_mem_data_src *dsrc,
> + struct pt_regs *regs)
> +{
> + u64 idx;
> + u64 mmcra = regs->dsisr;
> + u64 addr;
> + int ret;
> + unsigned int instr;
> +
> + if (mmcra & POWER7_MMCRA_DCACHE_MISS) {
> + idx = mmcra & POWER7_MMCRA_DCACHE_SRC_MASK;
> + idx >>= POWER7_MMCRA_DCACHE_SRC_SHIFT;
> +
> + dsrc->val |= dcache_src_map[idx];
> + return;
> + }
> +
> + instr = 0;
> + addr = perf_instruction_pointer(regs);
> +
> + if (is_kernel_addr(addr))
> + instr = *(unsigned int *)addr;
> + else {
> + pagefault_disable();
> + ret = __get_user_inatomic(instr, (unsigned int __user *)addr);
> + pagefault_enable();
> + if (ret)
> + instr = 0;
> + }
> + if (instr && instr_is_load_store(&instr))
Wondering if there is any possibility of getting positive values for
"(mmcra & POWER7_MMCRA_DCACHE_SRC_MASK) >> POWER7_MMCRA_DCACHE_SRC_SHIFT"
when the marked instruction did not have MMCRA[POWER7_MMCRA_DCACHE_MISS]
bit set. In that case we should actually compute dsrc->val as in the previous
case. I did couple of experiments on a P7 box, but was not able to find a
instance for a marked instruction whose MMCRA[POWER7_MMCRA_DCACHE_MISS] bit
not set and have a positive value POWER7_MMCRA_DCACHE_SRC field.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 8/8][v4] powerpc/perf: Export Power7 memory hierarchy info to user space.
2013-09-19 8:41 ` Anshuman Khandual
@ 2013-09-24 22:30 ` Sukadev Bhattiprolu
0 siblings, 0 replies; 14+ messages in thread
From: Sukadev Bhattiprolu @ 2013-09-24 22:30 UTC (permalink / raw)
To: Anshuman Khandual
Cc: linuxppc-dev, Michael Ellerman, Paul Mackerras, linux-kernel,
Stephane Eranian
Anshuman Khandual [khandual@linux.vnet.ibm.com] wrote:
| On 09/14/2013 06:19 AM, Sukadev Bhattiprolu wrote:
| > +static void power7_get_mem_data_src(union perf_mem_data_src *dsrc,
| > + struct pt_regs *regs)
| > +{
| > + u64 idx;
| > + u64 mmcra = regs->dsisr;
| > + u64 addr;
| > + int ret;
| > + unsigned int instr;
| > +
| > + if (mmcra & POWER7_MMCRA_DCACHE_MISS) {
| > + idx = mmcra & POWER7_MMCRA_DCACHE_SRC_MASK;
| > + idx >>= POWER7_MMCRA_DCACHE_SRC_SHIFT;
| > +
| > + dsrc->val |= dcache_src_map[idx];
| > + return;
| > + }
| > +
| > + instr = 0;
| > + addr = perf_instruction_pointer(regs);
| > +
| > + if (is_kernel_addr(addr))
| > + instr = *(unsigned int *)addr;
| > + else {
| > + pagefault_disable();
| > + ret = __get_user_inatomic(instr, (unsigned int __user *)addr);
| > + pagefault_enable();
| > + if (ret)
| > + instr = 0;
| > + }
| > + if (instr && instr_is_load_store(&instr))
|
|
| Wondering if there is any possibility of getting positive values for
| "(mmcra & POWER7_MMCRA_DCACHE_SRC_MASK) >> POWER7_MMCRA_DCACHE_SRC_SHIFT"
| when the marked instruction did not have MMCRA[POWER7_MMCRA_DCACHE_MISS]
| bit set. In that case we should actually compute dsrc->val as in the previous
| case. I did couple of experiments on a P7 box, but was not able to find a
| instance for a marked instruction whose MMCRA[POWER7_MMCRA_DCACHE_MISS] bit
| not set and have a positive value POWER7_MMCRA_DCACHE_SRC field.
Confirmed again with the hardware team that if there was no DCACHE_MISS,
the DCACHE_SRC field will be clear.
Thanks,
Sukadev
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2013-09-24 22:30 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-14 0:49 [PATCH 0/8][v4] powerpc/perf: Export memory hierarchy level in Power7/8 Sukadev Bhattiprolu
2013-09-14 0:49 ` [PATCH 1/8][v4] powerpc/perf: Rename Power8 macros to start with PME Sukadev Bhattiprolu
2013-09-18 5:24 ` Anshuman Khandual
2013-09-14 0:49 ` [PATCH 2/8][v4] powerpc/perf: Export Power8 generic events in sysfs Sukadev Bhattiprolu
2013-09-14 0:49 ` [PATCH 3/8][v4] powerpc/perf: Add PM_MRK_GRP_CMPL event to sysfs Sukadev Bhattiprolu
2013-09-14 0:49 ` [PATCH 4/8][v4] powerpc/perf: Define big-endian version of perf_mem_data_src Sukadev Bhattiprolu
2013-09-14 0:49 ` [PATCH 5/8][v4] powerpc/perf: Export Power8 memory hierarchy info to user space Sukadev Bhattiprolu
2013-09-14 0:49 ` [PATCH 6/8][v4] powerpc: Rename branch_opcode() to instr_opcode() Sukadev Bhattiprolu
2013-09-14 0:49 ` [PATCH 7/8][v4] power: implement is_instr_load_store() Sukadev Bhattiprolu
2013-09-16 12:22 ` Tom Musta
2013-09-14 0:49 ` [PATCH 8/8][v4] powerpc/perf: Export Power7 memory hierarchy info to user space Sukadev Bhattiprolu
2013-09-18 10:47 ` Anshuman Khandual
2013-09-19 8:41 ` Anshuman Khandual
2013-09-24 22:30 ` Sukadev Bhattiprolu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).