* [PATCH 0/4] powerpc/perf: Fixes for power10 PMU
@ 2020-11-11 4:33 Athira Rajeev
2020-11-11 4:33 ` [PATCH 1/4] powerpc/perf: Fix to update radix_scope_qual in power10 Athira Rajeev
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Athira Rajeev @ 2020-11-11 4:33 UTC (permalink / raw)
To: mpe; +Cc: mikey, maddy, linuxppc-dev
This patchset contains four PMU fixes for power10.
Patch 1 fixes the handling of the radix_scope_qual bit in the
power10 event code.
Patch 2 updates the event group constraints for L2/L3 and threshold
events in power10.
Patch 3 updates the event codes for l2/l3 events and
some of the generic events.
Patch 4 adds handling of the PMCCEXT bit in power10.
Athira Rajeev (4):
powerpc/perf: Fix to update radix_scope_qual in power10
powerpc/perf: Update the PMU group constraints for l2l3 and threshold
events in power10
powerpc/perf: Fix to update l2l3 events and generic event codes for
power10
powerpc/perf: MMCR0 control for PMU registers under PMCC=00
arch/powerpc/include/asm/reg.h | 1 +
arch/powerpc/kernel/cpu_setup_power.S | 2 +
arch/powerpc/kernel/dt_cpu_ftrs.c | 1 +
arch/powerpc/perf/core-book3s.c | 16 +++
arch/powerpc/perf/isa207-common.c | 27 ++++-
arch/powerpc/perf/isa207-common.h | 16 ++-
arch/powerpc/perf/power10-events-list.h | 9 ++
arch/powerpc/perf/power10-pmu.c | 177 ++++++++++++++++++++++++++++++--
8 files changed, 236 insertions(+), 13 deletions(-)
--
1.8.3.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/4] powerpc/perf: Fix to update radix_scope_qual in power10
2020-11-11 4:33 [PATCH 0/4] powerpc/perf: Fixes for power10 PMU Athira Rajeev
@ 2020-11-11 4:33 ` Athira Rajeev
2020-11-11 4:33 ` [PATCH 2/4] powerpc/perf: Update the PMU group constraints for l2l3 and threshold events " Athira Rajeev
` (2 subsequent siblings)
3 siblings, 0 replies; 9+ messages in thread
From: Athira Rajeev @ 2020-11-11 4:33 UTC (permalink / raw)
To: mpe; +Cc: mikey, maddy, linuxppc-dev
power10 uses bit 9 of the raw event code as RADIX_SCOPE_QUAL.
This bit is used for enabling radix process events.
Fix the PMU counter support functions to program bit 18 of
MMCR1 (Monitor Mode Control Register 1) with the
RADIX_SCOPE_QUAL bit value. Since this field is not per-PMC,
add it to the PMU group constraints to make sure all events in
a group have the same value for this field, using bit 21 as the
constraint bit for radix_scope_qual. Also update the power10
raw event encoding layout, the format fields and the constraint
bit layout to include the radix_scope_qual bit.
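The bit plumbing described above can be sketched outside the kernel as
follows. The constants are taken from the patch (isa207-common.h); the
helper names are illustrative, not the kernel's:

```c
#include <stdint.h>

/* Constants from the patch (isa207-common.h) */
#define p10_EVENT_RADIX_SCOPE_QUAL_SHIFT 9
#define p10_EVENT_RADIX_SCOPE_QUAL_MASK  0x1ull
#define p10_MMCR1_RADIX_SCOPE_QUAL_SHIFT 45 /* MMCR1[18] in IBM numbering */

/* Extract bit 9 of the raw event code. */
uint64_t radix_scope_qual(uint64_t event)
{
	return (event >> p10_EVENT_RADIX_SCOPE_QUAL_SHIFT) &
	       p10_EVENT_RADIX_SCOPE_QUAL_MASK;
}

/* Program the extracted value into MMCR1 bit 18, i.e. shift 45 from the LSB. */
uint64_t mmcr1_set_radix_scope(uint64_t mmcr1, uint64_t event)
{
	return mmcr1 | (radix_scope_qual(event) <<
			p10_MMCR1_RADIX_SCOPE_QUAL_SHIFT);
}
```

Note that MMCR1[18] maps to shift 45 because the ISA numbers register
bits from the most significant end (63 - 18 = 45).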
Fixes: a64e697cef23 ("powerpc/perf: power10 Performance Monitoring support")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
arch/powerpc/perf/isa207-common.c | 12 ++++++++++++
arch/powerpc/perf/isa207-common.h | 13 ++++++++++---
arch/powerpc/perf/power10-pmu.c | 11 +++++++----
3 files changed, 29 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index 2848904..f57f54f 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -339,6 +339,11 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
value |= CNST_L1_QUAL_VAL(cache);
}
+ if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+ mask |= CNST_RADIX_SCOPE_GROUP_MASK;
+ value |= CNST_RADIX_SCOPE_GROUP_VAL(event >> p10_EVENT_RADIX_SCOPE_QUAL_SHIFT);
+ }
+
if (is_event_marked(event)) {
mask |= CNST_SAMPLE_MASK;
value |= CNST_SAMPLE_VAL(event >> EVENT_SAMPLE_SHIFT);
@@ -456,6 +461,13 @@ int isa207_compute_mmcr(u64 event[], int n_ev,
}
}
+ /* Set RADIX_SCOPE_QUAL bit */
+ if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+ val = (event[i] >> p10_EVENT_RADIX_SCOPE_QUAL_SHIFT) &
+ p10_EVENT_RADIX_SCOPE_QUAL_MASK;
+ mmcr1 |= val << p10_MMCR1_RADIX_SCOPE_QUAL_SHIFT;
+ }
+
if (is_event_marked(event[i])) {
mmcra |= MMCRA_SAMPLE_ENABLE;
diff --git a/arch/powerpc/perf/isa207-common.h b/arch/powerpc/perf/isa207-common.h
index 7025de5..dc9c3d2 100644
--- a/arch/powerpc/perf/isa207-common.h
+++ b/arch/powerpc/perf/isa207-common.h
@@ -101,6 +101,9 @@
#define p10_EVENT_CACHE_SEL_MASK 0x3ull
#define p10_EVENT_MMCR3_MASK 0x7fffull
#define p10_EVENT_MMCR3_SHIFT 45
+#define p10_EVENT_RADIX_SCOPE_QUAL_SHIFT 9
+#define p10_EVENT_RADIX_SCOPE_QUAL_MASK 0x1
+#define p10_MMCR1_RADIX_SCOPE_QUAL_SHIFT 45
#define p10_EVENT_VALID_MASK \
((p10_SDAR_MODE_MASK << p10_SDAR_MODE_SHIFT | \
@@ -112,6 +115,7 @@
(p9_EVENT_COMBINE_MASK << p9_EVENT_COMBINE_SHIFT) | \
(p10_EVENT_MMCR3_MASK << p10_EVENT_MMCR3_SHIFT) | \
(EVENT_MARKED_MASK << EVENT_MARKED_SHIFT) | \
+ (p10_EVENT_RADIX_SCOPE_QUAL_MASK << p10_EVENT_RADIX_SCOPE_QUAL_SHIFT) | \
EVENT_LINUX_MASK | \
EVENT_PSEL_MASK))
/*
@@ -125,9 +129,9 @@
*
* 28 24 20 16 12 8 4 0
* | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
- * [ ] | [ ] [ sample ] [ ] [6] [5] [4] [3] [2] [1]
- * | | | |
- * BHRB IFM -* | | | Count of events for each PMC.
+ * [ ] | [ ] | [ sample ] [ ] [6] [5] [4] [3] [2] [1]
+ * | | | | |
+ * BHRB IFM -* | | |*radix_scope | Count of events for each PMC.
* EBB -* | | p1, p2, p3, p4, p5, p6.
* L1 I/D qualifier -* |
* nc - number of counters -*
@@ -165,6 +169,9 @@
#define CNST_L2L3_GROUP_VAL(v) (((v) & 0x1full) << 55)
#define CNST_L2L3_GROUP_MASK CNST_L2L3_GROUP_VAL(0x1f)
+#define CNST_RADIX_SCOPE_GROUP_VAL(v) (((v) & 0x1ull) << 21)
+#define CNST_RADIX_SCOPE_GROUP_MASK CNST_RADIX_SCOPE_GROUP_VAL(1)
+
/*
* For NC we are counting up to 4 events. This requires three bits, and we need
* the fifth event to overflow and set the 4th bit. To achieve that we bias the
diff --git a/arch/powerpc/perf/power10-pmu.c b/arch/powerpc/perf/power10-pmu.c
index 9dbe8f9..cf44fb7 100644
--- a/arch/powerpc/perf/power10-pmu.c
+++ b/arch/powerpc/perf/power10-pmu.c
@@ -23,10 +23,10 @@
*
* 28 24 20 16 12 8 4 0
* | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
- * [ ] [ sample ] [ ] [ ] [ pmc ] [unit ] [ ] m [ pmcxsel ]
- * | | | | | |
- * | | | | | *- mark
- * | | | *- L1/L2/L3 cache_sel |
+ * [ ] [ sample ] [ ] [ ] [ pmc ] [unit ] [ ] | m [ pmcxsel ]
+ * | | | | | | |
+ * | | | | | | *- mark
+ * | | | *- L1/L2/L3 cache_sel | |*-radix_scope_qual
* | | sdar_mode |
* | *- sampling mode for marked events *- combine
* |
@@ -59,6 +59,7 @@
*
* MMCR1[16] = cache_sel[0]
* MMCR1[17] = cache_sel[1]
+ * MMCR1[18] = radix_scope_qual
*
* if mark:
* MMCRA[63] = 1 (SAMPLE_ENABLE)
@@ -175,6 +176,7 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[])
PMU_FORMAT_ATTR(invert_bit, "config:47");
PMU_FORMAT_ATTR(src_mask, "config:48-53");
PMU_FORMAT_ATTR(src_match, "config:54-59");
+PMU_FORMAT_ATTR(radix_scope, "config:9");
static struct attribute *power10_pmu_format_attr[] = {
&format_attr_event.attr,
@@ -194,6 +196,7 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[])
&format_attr_invert_bit.attr,
&format_attr_src_mask.attr,
&format_attr_src_match.attr,
+ &format_attr_radix_scope.attr,
NULL,
};
--
1.8.3.1
* [PATCH 2/4] powerpc/perf: Update the PMU group constraints for l2l3 and threshold events in power10
2020-11-11 4:33 [PATCH 0/4] powerpc/perf: Fixes for power10 PMU Athira Rajeev
2020-11-11 4:33 ` [PATCH 1/4] powerpc/perf: Fix to update radix_scope_qual in power10 Athira Rajeev
@ 2020-11-11 4:33 ` Athira Rajeev
2020-11-18 4:32 ` Michael Ellerman
2020-11-11 4:33 ` [PATCH 3/4] powerpc/perf: Fix to update l2l3 events and generic event codes for power10 Athira Rajeev
2020-11-11 4:33 ` [PATCH 4/4] powerpc/perf: MMCR0 control for PMU registers under PMCC=00 Athira Rajeev
3 siblings, 1 reply; 9+ messages in thread
From: Athira Rajeev @ 2020-11-11 4:33 UTC (permalink / raw)
To: mpe; +Cc: mikey, maddy, linuxppc-dev
In Power9, L2/L3 bus events are always available as a
"bank" of 4 events. To obtain the counts for any of the
l2/l3 bus events in a given bank, the user has to
program PMC4 with the corresponding l2/l3 bus event for
that bank.
Commit 59029136d750 ("powerpc/perf: Add constraints for power9 l2/l3 bus events")
enforced this rule for Power9. But the rule does not apply to
Power10, where Monitor Mode Control Register 2
(MMCR2) has bits to configure the l2/l3 event selection. Hence
remove this PMC4 constraint check for power10.
Since the l2/l3 bits in MMCR2 are not per-PMC, add group
constraint checks for the l2/l3 bits in MMCR2.
Also update the constraints for threshold events in power10.
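The (mask, value) group-constraint scheme the patch extends can be
modelled as below. The compatibility rule is a deliberately simplified
sketch of what the event scheduler enforces, not the kernel's actual
code; the CNST_THRESH_CTL_SEL macros are the ones added by this patch:

```c
#include <stdint.h>

/* Constraint encoding added by the patch for the MMCR2 thresh_ctl field */
#define CNST_THRESH_CTL_SEL_VAL(v)  (((uint64_t)(v) & 0x7ffull) << 32)
#define CNST_THRESH_CTL_SEL_MASK    CNST_THRESH_CTL_SEL_VAL(0x7ff)

/*
 * Simplified model: two events may share a group only if, wherever
 * their constraint masks overlap, their constraint values agree.
 */
int constraints_compatible(uint64_t m1, uint64_t v1,
			   uint64_t m2, uint64_t v2)
{
	return ((v1 ^ v2) & (m1 & m2)) == 0;
}
```

Under this model, two events requesting different thresh_ctl values in
the shared MMCR2 field would be rejected from the same group, which is
the effect the new constraint bits achieve.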
Fixes: a64e697cef23 ("powerpc/perf: power10 Performance Monitoring support")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
arch/powerpc/perf/isa207-common.c | 15 +++++++++++----
arch/powerpc/perf/isa207-common.h | 3 +++
2 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index f57f54f..0f4983e 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -311,9 +311,11 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
}
if (unit >= 6 && unit <= 9) {
- if (cpu_has_feature(CPU_FTR_ARCH_31) && (unit == 6)) {
- mask |= CNST_L2L3_GROUP_MASK;
- value |= CNST_L2L3_GROUP_VAL(event >> p10_L2L3_EVENT_SHIFT);
+ if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+ if (unit == 6) {
+ mask |= CNST_L2L3_GROUP_MASK;
+ value |= CNST_L2L3_GROUP_VAL(event >> p10_L2L3_EVENT_SHIFT);
+ }
} else if (cpu_has_feature(CPU_FTR_ARCH_300)) {
mask |= CNST_CACHE_GROUP_MASK;
value |= CNST_CACHE_GROUP_VAL(event & 0xff);
@@ -349,7 +351,12 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
value |= CNST_SAMPLE_VAL(event >> EVENT_SAMPLE_SHIFT);
}
- if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+ if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+ if (event_is_threshold(event)) {
+ mask |= CNST_THRESH_CTL_SEL_MASK;
+ value |= CNST_THRESH_CTL_SEL_VAL(event >> EVENT_THRESH_SHIFT);
+ }
+ } else if (cpu_has_feature(CPU_FTR_ARCH_300)) {
if (event_is_threshold(event) && is_thresh_cmp_valid(event)) {
mask |= CNST_THRESH_MASK;
value |= CNST_THRESH_VAL(event >> EVENT_THRESH_SHIFT);
diff --git a/arch/powerpc/perf/isa207-common.h b/arch/powerpc/perf/isa207-common.h
index dc9c3d2..4208764 100644
--- a/arch/powerpc/perf/isa207-common.h
+++ b/arch/powerpc/perf/isa207-common.h
@@ -149,6 +149,9 @@
#define CNST_THRESH_VAL(v) (((v) & EVENT_THRESH_MASK) << 32)
#define CNST_THRESH_MASK CNST_THRESH_VAL(EVENT_THRESH_MASK)
+#define CNST_THRESH_CTL_SEL_VAL(v) (((v) & 0x7ffull) << 32)
+#define CNST_THRESH_CTL_SEL_MASK CNST_THRESH_CTL_SEL_VAL(0x7ff)
+
#define CNST_EBB_VAL(v) (((v) & EVENT_EBB_MASK) << 24)
#define CNST_EBB_MASK CNST_EBB_VAL(EVENT_EBB_MASK)
--
1.8.3.1
* [PATCH 3/4] powerpc/perf: Fix to update l2l3 events and generic event codes for power10
2020-11-11 4:33 [PATCH 0/4] powerpc/perf: Fixes for power10 PMU Athira Rajeev
2020-11-11 4:33 ` [PATCH 1/4] powerpc/perf: Fix to update radix_scope_qual in power10 Athira Rajeev
2020-11-11 4:33 ` [PATCH 2/4] powerpc/perf: Update the PMU group constraints for l2l3 and threshold events " Athira Rajeev
@ 2020-11-11 4:33 ` Athira Rajeev
2020-11-18 4:36 ` Michael Ellerman
2020-11-11 4:33 ` [PATCH 4/4] powerpc/perf: MMCR0 control for PMU registers under PMCC=00 Athira Rajeev
3 siblings, 1 reply; 9+ messages in thread
From: Athira Rajeev @ 2020-11-11 4:33 UTC (permalink / raw)
To: mpe; +Cc: mikey, maddy, linuxppc-dev
Fix the event codes for the power10 PMU generic events
branch-instructions (to PM_BR_FIN), branch-misses (to
PM_BR_MPRED_FIN) and cache-misses (to PM_LD_DEMAND_MISS_L1_FIN),
and update the list of generic events with the modified codes.
Export l2l3 events (PM_L2_ST_MISS and PM_L2_ST) and LLC-prefetches
(PM_L3_PF_MISS_L3) via sysfs, and also add these to cache_events.
To keep the current event codes working on DD1, rename the
existing generic_events, cache_events and pmu_attr_groups arrays
with a _dd1 suffix. Update the power10 PMU init code to pick the
DD1 lists when registering the PMU, based on the PVR
(Processor Version Register) value.
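The PVR-based table selection can be sketched as below. The bit
position of the major-revision field in pvr_maj() is an assumption made
for this standalone demonstration; the kernel uses its own PVR_MAJ()
macro, and the table names here stand in for the real arrays:

```c
#include <string.h>

/* Assumed layout for illustration: major revision in PVR bits 8-11. */
unsigned int pvr_maj(unsigned int pvr)
{
	return (pvr >> 8) & 0xf;
}

/* Pick the DD1 tables for major revision 1, the fixed tables otherwise. */
const char *pick_generic_events(unsigned int pvr)
{
	if (pvr_maj(pvr) == 1)
		return "power10_generic_events_dd1";
	return "power10_generic_events";
}
```

The patch applies the same selection to attr_groups and cache_events in
init_power10_pmu() before register_power_pmu() is called.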
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
arch/powerpc/perf/power10-events-list.h | 9 ++
arch/powerpc/perf/power10-pmu.c | 166 +++++++++++++++++++++++++++++++-
2 files changed, 173 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/perf/power10-events-list.h b/arch/powerpc/perf/power10-events-list.h
index 60c1b81..9e0b3c9 100644
--- a/arch/powerpc/perf/power10-events-list.h
+++ b/arch/powerpc/perf/power10-events-list.h
@@ -15,6 +15,9 @@
EVENT(PM_RUN_INST_CMPL, 0x500fa);
EVENT(PM_BR_CMPL, 0x4d05e);
EVENT(PM_BR_MPRED_CMPL, 0x400f6);
+EVENT(PM_BR_FIN, 0x2f04a);
+EVENT(PM_BR_MPRED_FIN, 0x35884);
+EVENT(PM_LD_DEMAND_MISS_L1_FIN, 0x400f0);
/* All L1 D cache load references counted at finish, gated by reject */
EVENT(PM_LD_REF_L1, 0x100fc);
@@ -36,6 +39,12 @@
EVENT(PM_DATA_FROM_L3, 0x01340000001c040);
/* Demand LD - L3 Miss (not L2 hit and not L3 hit) */
EVENT(PM_DATA_FROM_L3MISS, 0x300fe);
+/* All successful D-side store dispatches for this thread */
+EVENT(PM_L2_ST, 0x010000046080);
+/* All successful D-side store dispatches for this thread that were L2 Miss */
+EVENT(PM_L2_ST_MISS, 0x26880);
+/* Total HW L3 prefetches(Load+store) */
+EVENT(PM_L3_PF_MISS_L3, 0x100000016080);
/* Data PTEG reload */
EVENT(PM_DTLB_MISS, 0x300fc);
/* ITLB Reloaded */
diff --git a/arch/powerpc/perf/power10-pmu.c b/arch/powerpc/perf/power10-pmu.c
index cf44fb7..86665ad 100644
--- a/arch/powerpc/perf/power10-pmu.c
+++ b/arch/powerpc/perf/power10-pmu.c
@@ -114,6 +114,9 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[])
GENERIC_EVENT_ATTR(cache-misses, PM_LD_MISS_L1);
GENERIC_EVENT_ATTR(mem-loads, MEM_LOADS);
GENERIC_EVENT_ATTR(mem-stores, MEM_STORES);
+GENERIC_EVENT_ATTR(branch-instructions, PM_BR_FIN);
+GENERIC_EVENT_ATTR(branch-misses, PM_BR_MPRED_FIN);
+GENERIC_EVENT_ATTR(cache-misses, PM_LD_DEMAND_MISS_L1_FIN);
CACHE_EVENT_ATTR(L1-dcache-load-misses, PM_LD_MISS_L1);
CACHE_EVENT_ATTR(L1-dcache-loads, PM_LD_REF_L1);
@@ -124,12 +127,15 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[])
CACHE_EVENT_ATTR(L1-icache-prefetches, PM_IC_PREF_REQ);
CACHE_EVENT_ATTR(LLC-load-misses, PM_DATA_FROM_L3MISS);
CACHE_EVENT_ATTR(LLC-loads, PM_DATA_FROM_L3);
+CACHE_EVENT_ATTR(LLC-prefetches, PM_L3_PF_MISS_L3);
+CACHE_EVENT_ATTR(LLC-store-misses, PM_L2_ST_MISS);
+CACHE_EVENT_ATTR(LLC-stores, PM_L2_ST);
CACHE_EVENT_ATTR(branch-load-misses, PM_BR_MPRED_CMPL);
CACHE_EVENT_ATTR(branch-loads, PM_BR_CMPL);
CACHE_EVENT_ATTR(dTLB-load-misses, PM_DTLB_MISS);
CACHE_EVENT_ATTR(iTLB-load-misses, PM_ITLB_MISS);
-static struct attribute *power10_events_attr[] = {
+static struct attribute *power10_events_attr_dd1[] = {
GENERIC_EVENT_PTR(PM_RUN_CYC),
GENERIC_EVENT_PTR(PM_RUN_INST_CMPL),
GENERIC_EVENT_PTR(PM_BR_CMPL),
@@ -154,11 +160,44 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[])
NULL
};
+static struct attribute *power10_events_attr[] = {
+ GENERIC_EVENT_PTR(PM_RUN_CYC),
+ GENERIC_EVENT_PTR(PM_RUN_INST_CMPL),
+ GENERIC_EVENT_PTR(PM_BR_FIN),
+ GENERIC_EVENT_PTR(PM_BR_MPRED_FIN),
+ GENERIC_EVENT_PTR(PM_LD_REF_L1),
+ GENERIC_EVENT_PTR(PM_LD_DEMAND_MISS_L1_FIN),
+ GENERIC_EVENT_PTR(MEM_LOADS),
+ GENERIC_EVENT_PTR(MEM_STORES),
+ CACHE_EVENT_PTR(PM_LD_MISS_L1),
+ CACHE_EVENT_PTR(PM_LD_REF_L1),
+ CACHE_EVENT_PTR(PM_LD_PREFETCH_CACHE_LINE_MISS),
+ CACHE_EVENT_PTR(PM_ST_MISS_L1),
+ CACHE_EVENT_PTR(PM_L1_ICACHE_MISS),
+ CACHE_EVENT_PTR(PM_INST_FROM_L1),
+ CACHE_EVENT_PTR(PM_IC_PREF_REQ),
+ CACHE_EVENT_PTR(PM_DATA_FROM_L3MISS),
+ CACHE_EVENT_PTR(PM_DATA_FROM_L3),
+ CACHE_EVENT_PTR(PM_L3_PF_MISS_L3),
+ CACHE_EVENT_PTR(PM_L2_ST_MISS),
+ CACHE_EVENT_PTR(PM_L2_ST),
+ CACHE_EVENT_PTR(PM_BR_MPRED_CMPL),
+ CACHE_EVENT_PTR(PM_BR_CMPL),
+ CACHE_EVENT_PTR(PM_DTLB_MISS),
+ CACHE_EVENT_PTR(PM_ITLB_MISS),
+ NULL
+};
+
static struct attribute_group power10_pmu_events_group = {
.name = "events",
.attrs = power10_events_attr,
};
+static struct attribute_group power10_pmu_events_group_dd1 = {
+ .name = "events",
+ .attrs = power10_events_attr_dd1,
+};
+
PMU_FORMAT_ATTR(event, "config:0-59");
PMU_FORMAT_ATTR(pmcxsel, "config:0-7");
PMU_FORMAT_ATTR(mark, "config:8");
@@ -211,7 +250,13 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[])
NULL,
};
-static int power10_generic_events[] = {
+static const struct attribute_group *power10_pmu_attr_groups_dd1[] = {
+ &power10_pmu_format_group,
+ &power10_pmu_events_group_dd1,
+ NULL,
+};
+
+static int power10_generic_events_dd1[] = {
[PERF_COUNT_HW_CPU_CYCLES] = PM_RUN_CYC,
[PERF_COUNT_HW_INSTRUCTIONS] = PM_RUN_INST_CMPL,
[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = PM_BR_CMPL,
@@ -220,6 +265,15 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[])
[PERF_COUNT_HW_CACHE_MISSES] = PM_LD_MISS_L1,
};
+static int power10_generic_events[] = {
+ [PERF_COUNT_HW_CPU_CYCLES] = PM_RUN_CYC,
+ [PERF_COUNT_HW_INSTRUCTIONS] = PM_RUN_INST_CMPL,
+ [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = PM_BR_FIN,
+ [PERF_COUNT_HW_BRANCH_MISSES] = PM_BR_MPRED_FIN,
+ [PERF_COUNT_HW_CACHE_REFERENCES] = PM_LD_REF_L1,
+ [PERF_COUNT_HW_CACHE_MISSES] = PM_LD_DEMAND_MISS_L1_FIN,
+};
+
static u64 power10_bhrb_filter_map(u64 branch_sample_type)
{
u64 pmu_bhrb_filter = 0;
@@ -311,6 +365,107 @@ static void power10_config_bhrb(u64 pmu_bhrb_filter)
[C(RESULT_MISS)] = PM_DATA_FROM_L3MISS,
},
[C(OP_WRITE)] = {
+ [C(RESULT_ACCESS)] = PM_L2_ST,
+ [C(RESULT_MISS)] = PM_L2_ST_MISS,
+ },
+ [C(OP_PREFETCH)] = {
+ [C(RESULT_ACCESS)] = PM_L3_PF_MISS_L3,
+ [C(RESULT_MISS)] = 0,
+ },
+ },
+ [C(DTLB)] = {
+ [C(OP_READ)] = {
+ [C(RESULT_ACCESS)] = 0,
+ [C(RESULT_MISS)] = PM_DTLB_MISS,
+ },
+ [C(OP_WRITE)] = {
+ [C(RESULT_ACCESS)] = -1,
+ [C(RESULT_MISS)] = -1,
+ },
+ [C(OP_PREFETCH)] = {
+ [C(RESULT_ACCESS)] = -1,
+ [C(RESULT_MISS)] = -1,
+ },
+ },
+ [C(ITLB)] = {
+ [C(OP_READ)] = {
+ [C(RESULT_ACCESS)] = 0,
+ [C(RESULT_MISS)] = PM_ITLB_MISS,
+ },
+ [C(OP_WRITE)] = {
+ [C(RESULT_ACCESS)] = -1,
+ [C(RESULT_MISS)] = -1,
+ },
+ [C(OP_PREFETCH)] = {
+ [C(RESULT_ACCESS)] = -1,
+ [C(RESULT_MISS)] = -1,
+ },
+ },
+ [C(BPU)] = {
+ [C(OP_READ)] = {
+ [C(RESULT_ACCESS)] = PM_BR_CMPL,
+ [C(RESULT_MISS)] = PM_BR_MPRED_CMPL,
+ },
+ [C(OP_WRITE)] = {
+ [C(RESULT_ACCESS)] = -1,
+ [C(RESULT_MISS)] = -1,
+ },
+ [C(OP_PREFETCH)] = {
+ [C(RESULT_ACCESS)] = -1,
+ [C(RESULT_MISS)] = -1,
+ },
+ },
+ [C(NODE)] = {
+ [C(OP_READ)] = {
+ [C(RESULT_ACCESS)] = -1,
+ [C(RESULT_MISS)] = -1,
+ },
+ [C(OP_WRITE)] = {
+ [C(RESULT_ACCESS)] = -1,
+ [C(RESULT_MISS)] = -1,
+ },
+ [C(OP_PREFETCH)] = {
+ [C(RESULT_ACCESS)] = -1,
+ [C(RESULT_MISS)] = -1,
+ },
+ },
+};
+
+static u64 power10_cache_events_dd1[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
+ [C(L1D)] = {
+ [C(OP_READ)] = {
+ [C(RESULT_ACCESS)] = PM_LD_REF_L1,
+ [C(RESULT_MISS)] = PM_LD_MISS_L1,
+ },
+ [C(OP_WRITE)] = {
+ [C(RESULT_ACCESS)] = 0,
+ [C(RESULT_MISS)] = PM_ST_MISS_L1,
+ },
+ [C(OP_PREFETCH)] = {
+ [C(RESULT_ACCESS)] = PM_LD_PREFETCH_CACHE_LINE_MISS,
+ [C(RESULT_MISS)] = 0,
+ },
+ },
+ [C(L1I)] = {
+ [C(OP_READ)] = {
+ [C(RESULT_ACCESS)] = PM_INST_FROM_L1,
+ [C(RESULT_MISS)] = PM_L1_ICACHE_MISS,
+ },
+ [C(OP_WRITE)] = {
+ [C(RESULT_ACCESS)] = PM_INST_FROM_L1MISS,
+ [C(RESULT_MISS)] = -1,
+ },
+ [C(OP_PREFETCH)] = {
+ [C(RESULT_ACCESS)] = PM_IC_PREF_REQ,
+ [C(RESULT_MISS)] = 0,
+ },
+ },
+ [C(LL)] = {
+ [C(OP_READ)] = {
+ [C(RESULT_ACCESS)] = PM_DATA_FROM_L3,
+ [C(RESULT_MISS)] = PM_DATA_FROM_L3MISS,
+ },
+ [C(OP_WRITE)] = {
[C(RESULT_ACCESS)] = -1,
[C(RESULT_MISS)] = -1,
},
@@ -407,6 +562,7 @@ static void power10_config_bhrb(u64 pmu_bhrb_filter)
int init_power10_pmu(void)
{
int rc;
+ unsigned int pvr = mfspr(SPRN_PVR);
/* Comes from cpu_specs[] */
if (!cur_cpu_spec->oprofile_cpu_type ||
@@ -416,6 +572,12 @@ int init_power10_pmu(void)
/* Set the PERF_REG_EXTENDED_MASK here */
PERF_REG_EXTENDED_MASK = PERF_REG_PMU_MASK_31;
+ if ((PVR_MAJ(pvr) == 1)) {
+ power10_pmu.generic_events = power10_generic_events_dd1;
+ power10_pmu.attr_groups = power10_pmu_attr_groups_dd1;
+ power10_pmu.cache_events = &power10_cache_events_dd1;
+ }
+
rc = register_power_pmu(&power10_pmu);
if (rc)
return rc;
--
1.8.3.1
* [PATCH 4/4] powerpc/perf: MMCR0 control for PMU registers under PMCC=00
2020-11-11 4:33 [PATCH 0/4] powerpc/perf: Fixes for power10 PMU Athira Rajeev
` (2 preceding siblings ...)
2020-11-11 4:33 ` [PATCH 3/4] powerpc/perf: Fix to update l2l3 events and generic event codes for power10 Athira Rajeev
@ 2020-11-11 4:33 ` Athira Rajeev
3 siblings, 0 replies; 9+ messages in thread
From: Athira Rajeev @ 2020-11-11 4:33 UTC (permalink / raw)
To: mpe; +Cc: mikey, maddy, linuxppc-dev
PowerISA v3.1 introduces a new control bit (PMCCEXT) for enabling
secure access to group B PMU registers in problem state when
MMCR0 PMCC=0b00. Add support for the MMCR0 PMCCEXT bit
in power10 by enabling this bit during boot and during the PMU
event enable/disable operations when MMCR0 PMCC=0b00.
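The enable-path logic described above amounts to the following check.
The MMCR0 bit values are the ones used in the patch; the helper name is
illustrative:

```c
#include <stdint.h>

#define MMCR0_PMCC     0x000c0000ull /* PMC control field, 2 bits */
#define MMCR0_PMCCEXT  0x00000200ull /* PMCCEXT control (ISA v3.1) */

/*
 * Sketch: when the PMCC field is 0b00, also set PMCCEXT so that
 * problem-state access to group B PMU registers is controlled as
 * the ISA describes. Otherwise leave MMCR0 unchanged.
 */
uint64_t apply_pmccext(uint64_t mmcr0)
{
	if (!(mmcr0 & MMCR0_PMCC))
		mmcr0 |= MMCR0_PMCCEXT;
	return mmcr0;
}
```

This mirrors the conditional added to power_pmu_enable(), while the
disable path and the boot-time init set PMCCEXT unconditionally.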
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
arch/powerpc/include/asm/reg.h | 1 +
arch/powerpc/kernel/cpu_setup_power.S | 2 ++
arch/powerpc/kernel/dt_cpu_ftrs.c | 1 +
arch/powerpc/perf/core-book3s.c | 16 ++++++++++++++++
4 files changed, 20 insertions(+)
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index f877a57..cba9965 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -864,6 +864,7 @@
#define MMCR0_BHRBA 0x00200000UL /* BHRB Access allowed in userspace */
#define MMCR0_EBE 0x00100000UL /* Event based branch enable */
#define MMCR0_PMCC 0x000c0000UL /* PMC control */
+#define MMCR0_PMCCEXT ASM_CONST(0x00000200) /* PMCCEXT control */
#define MMCR0_PMCC_U6 0x00080000UL /* PMC1-6 are R/W by user (PR) */
#define MMCR0_PMC1CE 0x00008000UL /* PMC1 count enable*/
#define MMCR0_PMCjCE ASM_CONST(0x00004000) /* PMCj count enable*/
diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
index 704e8b9..8fc8b72 100644
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ b/arch/powerpc/kernel/cpu_setup_power.S
@@ -249,4 +249,6 @@ __init_PMU_ISA31:
mtspr SPRN_MMCR3,r5
LOAD_REG_IMMEDIATE(r5, MMCRA_BHRB_DISABLE)
mtspr SPRN_MMCRA,r5
+ LOAD_REG_IMMEDIATE(r5, MMCR0_PMCCEXT)
+ mtspr SPRN_MMCR0,r5
blr
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 1098863..9d07965 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -454,6 +454,7 @@ static void init_pmu_power10(void)
mtspr(SPRN_MMCR3, 0);
mtspr(SPRN_MMCRA, MMCRA_BHRB_DISABLE);
+ mtspr(SPRN_MMCR0, MMCR0_PMCCEXT);
}
static int __init feat_enable_pmu_power10(struct dt_cpu_feature *f)
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 08643cb..f328bc0 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -95,6 +95,7 @@ struct cpu_hw_events {
#define SPRN_SIER3 0
#define MMCRA_SAMPLE_ENABLE 0
#define MMCRA_BHRB_DISABLE 0
+#define MMCR0_PMCCEXT 0
static inline unsigned long perf_ip_adjust(struct pt_regs *regs)
{
@@ -1242,6 +1243,9 @@ static void power_pmu_disable(struct pmu *pmu)
val |= MMCR0_FC;
val &= ~(MMCR0_EBE | MMCR0_BHRBA | MMCR0_PMCC | MMCR0_PMAO |
MMCR0_FC56);
+ /* Set mmcr0 PMCCEXT for p10 */
+ if (ppmu->flags & PPMU_ARCH_31)
+ val |= MMCR0_PMCCEXT;
/*
* The barrier is to make sure the mtspr has been
@@ -1449,6 +1453,18 @@ static void power_pmu_enable(struct pmu *pmu)
mmcr0 = ebb_switch_in(ebb, cpuhw);
+ /*
+ * Set mmcr0 (PMCCEXT) for p10
+ * if mmcr0 PMCC=0b00 to allow secure
+ * mode of access to group B registers.
+ */
+ if (ppmu->flags & PPMU_ARCH_31) {
+ if (!(mmcr0 & MMCR0_PMCC)) {
+ cpuhw->mmcr.mmcr0 |= MMCR0_PMCCEXT;
+ mmcr0 |= MMCR0_PMCCEXT;
+ }
+ }
+
mb();
if (cpuhw->bhrb_users)
ppmu->config_bhrb(cpuhw->bhrb_filter);
--
1.8.3.1
* Re: [PATCH 2/4] powerpc/perf: Update the PMU group constraints for l2l3 and threshold events in power10
2020-11-11 4:33 ` [PATCH 2/4] powerpc/perf: Update the PMU group constraints for l2l3 and threshold events " Athira Rajeev
@ 2020-11-18 4:32 ` Michael Ellerman
2020-11-18 5:21 ` Athira Rajeev
0 siblings, 1 reply; 9+ messages in thread
From: Michael Ellerman @ 2020-11-18 4:32 UTC (permalink / raw)
To: Athira Rajeev; +Cc: mikey, maddy, linuxppc-dev
Athira Rajeev <atrajeev@linux.vnet.ibm.com> writes:
> In Power9, L2/L3 bus events are always available as a
> "bank" of 4 events. To obtain the counts for any of the
> l2/l3 bus events in a given bank, the user will have to
> program PMC4 with corresponding l2/l3 bus event for that
> bank.
>
> Commit 59029136d750 ("powerpc/perf: Add constraints for power9 l2/l3 bus events")
> enforced this rule in Power9. But this is not valid for
> Power10, since in Power10 Monitor Mode Control Register2
> (MMCR2) has bits to configure l2/l3 event bits. Hence remove
> this PMC4 constraint check from power10.
>
> Since the l2/l3 bits in MMCR2 are not per-pmc, patch handles
> group constraints checks for l2/l3 bits in MMCR2.
> Patch also updates constraints for threshold events in power10.
That should be done in a separate patch please.
cheers
* Re: [PATCH 3/4] powerpc/perf: Fix to update l2l3 events and generic event codes for power10
2020-11-11 4:33 ` [PATCH 3/4] powerpc/perf: Fix to update l2l3 events and generic event codes for power10 Athira Rajeev
@ 2020-11-18 4:36 ` Michael Ellerman
2020-11-18 5:23 ` Athira Rajeev
0 siblings, 1 reply; 9+ messages in thread
From: Michael Ellerman @ 2020-11-18 4:36 UTC (permalink / raw)
To: Athira Rajeev; +Cc: mikey, maddy, linuxppc-dev
Athira Rajeev <atrajeev@linux.vnet.ibm.com> writes:
> Fix the event code for events: branch-instructions (to PM_BR_FIN),
> branch-misses (to PM_BR_MPRED_FIN) and cache-misses (to
> PM_LD_DEMAND_MISS_L1_FIN) for power10 PMU. Update the
> list of generic events with this modified event code.
That should be one patch.
> Export l2l3 events (PM_L2_ST_MISS and PM_L2_ST) and LLC-prefetches
> (PM_L3_PF_MISS_L3) via sysfs, and also add these to cache_events.
That should be another patch.
> To maintain the current event code work with DD1, rename
> existing array of generic_events, cache_events and pmu_attr_groups
> with suffix _dd1. Update the power10 pmu init code to pick the
> dd1 list while registering the power PMU, based on the pvr
> (Processor Version Register) value.
And that should be a third patch.
cheers
> diff --git a/arch/powerpc/perf/power10-events-list.h b/arch/powerpc/perf/power10-events-list.h
> index 60c1b81..9e0b3c9 100644
> --- a/arch/powerpc/perf/power10-events-list.h
> +++ b/arch/powerpc/perf/power10-events-list.h
> @@ -15,6 +15,9 @@
> EVENT(PM_RUN_INST_CMPL, 0x500fa);
> EVENT(PM_BR_CMPL, 0x4d05e);
> EVENT(PM_BR_MPRED_CMPL, 0x400f6);
> +EVENT(PM_BR_FIN, 0x2f04a);
> +EVENT(PM_BR_MPRED_FIN, 0x35884);
> +EVENT(PM_LD_DEMAND_MISS_L1_FIN, 0x400f0);
>
> /* All L1 D cache load references counted at finish, gated by reject */
> EVENT(PM_LD_REF_L1, 0x100fc);
> @@ -36,6 +39,12 @@
> EVENT(PM_DATA_FROM_L3, 0x01340000001c040);
> /* Demand LD - L3 Miss (not L2 hit and not L3 hit) */
> EVENT(PM_DATA_FROM_L3MISS, 0x300fe);
> +/* All successful D-side store dispatches for this thread */
> +EVENT(PM_L2_ST, 0x010000046080);
> +/* All successful D-side store dispatches for this thread that were L2 Miss */
> +EVENT(PM_L2_ST_MISS, 0x26880);
> +/* Total HW L3 prefetches(Load+store) */
> +EVENT(PM_L3_PF_MISS_L3, 0x100000016080);
> /* Data PTEG reload */
> EVENT(PM_DTLB_MISS, 0x300fc);
> /* ITLB Reloaded */
> diff --git a/arch/powerpc/perf/power10-pmu.c b/arch/powerpc/perf/power10-pmu.c
> index cf44fb7..86665ad 100644
> --- a/arch/powerpc/perf/power10-pmu.c
> +++ b/arch/powerpc/perf/power10-pmu.c
> @@ -114,6 +114,9 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[])
> GENERIC_EVENT_ATTR(cache-misses, PM_LD_MISS_L1);
> GENERIC_EVENT_ATTR(mem-loads, MEM_LOADS);
> GENERIC_EVENT_ATTR(mem-stores, MEM_STORES);
> +GENERIC_EVENT_ATTR(branch-instructions, PM_BR_FIN);
> +GENERIC_EVENT_ATTR(branch-misses, PM_BR_MPRED_FIN);
> +GENERIC_EVENT_ATTR(cache-misses, PM_LD_DEMAND_MISS_L1_FIN);
>
> CACHE_EVENT_ATTR(L1-dcache-load-misses, PM_LD_MISS_L1);
> CACHE_EVENT_ATTR(L1-dcache-loads, PM_LD_REF_L1);
> @@ -124,12 +127,15 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[])
> CACHE_EVENT_ATTR(L1-icache-prefetches, PM_IC_PREF_REQ);
> CACHE_EVENT_ATTR(LLC-load-misses, PM_DATA_FROM_L3MISS);
> CACHE_EVENT_ATTR(LLC-loads, PM_DATA_FROM_L3);
> +CACHE_EVENT_ATTR(LLC-prefetches, PM_L3_PF_MISS_L3);
> +CACHE_EVENT_ATTR(LLC-store-misses, PM_L2_ST_MISS);
> +CACHE_EVENT_ATTR(LLC-stores, PM_L2_ST);
> CACHE_EVENT_ATTR(branch-load-misses, PM_BR_MPRED_CMPL);
> CACHE_EVENT_ATTR(branch-loads, PM_BR_CMPL);
> CACHE_EVENT_ATTR(dTLB-load-misses, PM_DTLB_MISS);
> CACHE_EVENT_ATTR(iTLB-load-misses, PM_ITLB_MISS);
>
> -static struct attribute *power10_events_attr[] = {
> +static struct attribute *power10_events_attr_dd1[] = {
> GENERIC_EVENT_PTR(PM_RUN_CYC),
> GENERIC_EVENT_PTR(PM_RUN_INST_CMPL),
> GENERIC_EVENT_PTR(PM_BR_CMPL),
> @@ -154,11 +160,44 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[])
> NULL
> };
>
> +static struct attribute *power10_events_attr[] = {
> + GENERIC_EVENT_PTR(PM_RUN_CYC),
> + GENERIC_EVENT_PTR(PM_RUN_INST_CMPL),
> + GENERIC_EVENT_PTR(PM_BR_FIN),
> + GENERIC_EVENT_PTR(PM_BR_MPRED_FIN),
> + GENERIC_EVENT_PTR(PM_LD_REF_L1),
> + GENERIC_EVENT_PTR(PM_LD_DEMAND_MISS_L1_FIN),
> + GENERIC_EVENT_PTR(MEM_LOADS),
> + GENERIC_EVENT_PTR(MEM_STORES),
> + CACHE_EVENT_PTR(PM_LD_MISS_L1),
> + CACHE_EVENT_PTR(PM_LD_REF_L1),
> + CACHE_EVENT_PTR(PM_LD_PREFETCH_CACHE_LINE_MISS),
> + CACHE_EVENT_PTR(PM_ST_MISS_L1),
> + CACHE_EVENT_PTR(PM_L1_ICACHE_MISS),
> + CACHE_EVENT_PTR(PM_INST_FROM_L1),
> + CACHE_EVENT_PTR(PM_IC_PREF_REQ),
> + CACHE_EVENT_PTR(PM_DATA_FROM_L3MISS),
> + CACHE_EVENT_PTR(PM_DATA_FROM_L3),
> + CACHE_EVENT_PTR(PM_L3_PF_MISS_L3),
> + CACHE_EVENT_PTR(PM_L2_ST_MISS),
> + CACHE_EVENT_PTR(PM_L2_ST),
> + CACHE_EVENT_PTR(PM_BR_MPRED_CMPL),
> + CACHE_EVENT_PTR(PM_BR_CMPL),
> + CACHE_EVENT_PTR(PM_DTLB_MISS),
> + CACHE_EVENT_PTR(PM_ITLB_MISS),
> + NULL
> +};
> +
> static struct attribute_group power10_pmu_events_group = {
> .name = "events",
> .attrs = power10_events_attr,
> };
>
> +static struct attribute_group power10_pmu_events_group_dd1 = {
> + .name = "events",
> + .attrs = power10_events_attr_dd1,
> +};
> +
> PMU_FORMAT_ATTR(event, "config:0-59");
> PMU_FORMAT_ATTR(pmcxsel, "config:0-7");
> PMU_FORMAT_ATTR(mark, "config:8");
> @@ -211,7 +250,13 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[])
> NULL,
> };
>
> -static int power10_generic_events[] = {
> +static const struct attribute_group *power10_pmu_attr_groups_dd1[] = {
> + &power10_pmu_format_group,
> + &power10_pmu_events_group_dd1,
> + NULL,
> +};
> +
> +static int power10_generic_events_dd1[] = {
> [PERF_COUNT_HW_CPU_CYCLES] = PM_RUN_CYC,
> [PERF_COUNT_HW_INSTRUCTIONS] = PM_RUN_INST_CMPL,
> [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = PM_BR_CMPL,
> @@ -220,6 +265,15 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[])
> [PERF_COUNT_HW_CACHE_MISSES] = PM_LD_MISS_L1,
> };
>
> +static int power10_generic_events[] = {
> + [PERF_COUNT_HW_CPU_CYCLES] = PM_RUN_CYC,
> + [PERF_COUNT_HW_INSTRUCTIONS] = PM_RUN_INST_CMPL,
> + [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = PM_BR_FIN,
> + [PERF_COUNT_HW_BRANCH_MISSES] = PM_BR_MPRED_FIN,
> + [PERF_COUNT_HW_CACHE_REFERENCES] = PM_LD_REF_L1,
> + [PERF_COUNT_HW_CACHE_MISSES] = PM_LD_DEMAND_MISS_L1_FIN,
> +};
> +
> static u64 power10_bhrb_filter_map(u64 branch_sample_type)
> {
> u64 pmu_bhrb_filter = 0;
> @@ -311,6 +365,107 @@ static void power10_config_bhrb(u64 pmu_bhrb_filter)
> [C(RESULT_MISS)] = PM_DATA_FROM_L3MISS,
> },
> [C(OP_WRITE)] = {
> + [C(RESULT_ACCESS)] = PM_L2_ST,
> + [C(RESULT_MISS)] = PM_L2_ST_MISS,
> + },
> + [C(OP_PREFETCH)] = {
> + [C(RESULT_ACCESS)] = PM_L3_PF_MISS_L3,
> + [C(RESULT_MISS)] = 0,
> + },
> + },
> + [C(DTLB)] = {
> + [C(OP_READ)] = {
> + [C(RESULT_ACCESS)] = 0,
> + [C(RESULT_MISS)] = PM_DTLB_MISS,
> + },
> + [C(OP_WRITE)] = {
> + [C(RESULT_ACCESS)] = -1,
> + [C(RESULT_MISS)] = -1,
> + },
> + [C(OP_PREFETCH)] = {
> + [C(RESULT_ACCESS)] = -1,
> + [C(RESULT_MISS)] = -1,
> + },
> + },
> + [C(ITLB)] = {
> + [C(OP_READ)] = {
> + [C(RESULT_ACCESS)] = 0,
> + [C(RESULT_MISS)] = PM_ITLB_MISS,
> + },
> + [C(OP_WRITE)] = {
> + [C(RESULT_ACCESS)] = -1,
> + [C(RESULT_MISS)] = -1,
> + },
> + [C(OP_PREFETCH)] = {
> + [C(RESULT_ACCESS)] = -1,
> + [C(RESULT_MISS)] = -1,
> + },
> + },
> + [C(BPU)] = {
> + [C(OP_READ)] = {
> + [C(RESULT_ACCESS)] = PM_BR_CMPL,
> + [C(RESULT_MISS)] = PM_BR_MPRED_CMPL,
> + },
> + [C(OP_WRITE)] = {
> + [C(RESULT_ACCESS)] = -1,
> + [C(RESULT_MISS)] = -1,
> + },
> + [C(OP_PREFETCH)] = {
> + [C(RESULT_ACCESS)] = -1,
> + [C(RESULT_MISS)] = -1,
> + },
> + },
> + [C(NODE)] = {
> + [C(OP_READ)] = {
> + [C(RESULT_ACCESS)] = -1,
> + [C(RESULT_MISS)] = -1,
> + },
> + [C(OP_WRITE)] = {
> + [C(RESULT_ACCESS)] = -1,
> + [C(RESULT_MISS)] = -1,
> + },
> + [C(OP_PREFETCH)] = {
> + [C(RESULT_ACCESS)] = -1,
> + [C(RESULT_MISS)] = -1,
> + },
> + },
> +};
> +
> +static u64 power10_cache_events_dd1[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
> + [C(L1D)] = {
> + [C(OP_READ)] = {
> + [C(RESULT_ACCESS)] = PM_LD_REF_L1,
> + [C(RESULT_MISS)] = PM_LD_MISS_L1,
> + },
> + [C(OP_WRITE)] = {
> + [C(RESULT_ACCESS)] = 0,
> + [C(RESULT_MISS)] = PM_ST_MISS_L1,
> + },
> + [C(OP_PREFETCH)] = {
> + [C(RESULT_ACCESS)] = PM_LD_PREFETCH_CACHE_LINE_MISS,
> + [C(RESULT_MISS)] = 0,
> + },
> + },
> + [C(L1I)] = {
> + [C(OP_READ)] = {
> + [C(RESULT_ACCESS)] = PM_INST_FROM_L1,
> + [C(RESULT_MISS)] = PM_L1_ICACHE_MISS,
> + },
> + [C(OP_WRITE)] = {
> + [C(RESULT_ACCESS)] = PM_INST_FROM_L1MISS,
> + [C(RESULT_MISS)] = -1,
> + },
> + [C(OP_PREFETCH)] = {
> + [C(RESULT_ACCESS)] = PM_IC_PREF_REQ,
> + [C(RESULT_MISS)] = 0,
> + },
> + },
> + [C(LL)] = {
> + [C(OP_READ)] = {
> + [C(RESULT_ACCESS)] = PM_DATA_FROM_L3,
> + [C(RESULT_MISS)] = PM_DATA_FROM_L3MISS,
> + },
> + [C(OP_WRITE)] = {
> [C(RESULT_ACCESS)] = -1,
> [C(RESULT_MISS)] = -1,
> },
> @@ -407,6 +562,7 @@ static void power10_config_bhrb(u64 pmu_bhrb_filter)
> int init_power10_pmu(void)
> {
> int rc;
> + unsigned int pvr = mfspr(SPRN_PVR);
>
> /* Comes from cpu_specs[] */
> if (!cur_cpu_spec->oprofile_cpu_type ||
> @@ -416,6 +572,12 @@ int init_power10_pmu(void)
> /* Set the PERF_REG_EXTENDED_MASK here */
> PERF_REG_EXTENDED_MASK = PERF_REG_PMU_MASK_31;
>
> + if ((PVR_MAJ(pvr) == 1)) {
> + power10_pmu.generic_events = power10_generic_events_dd1;
> + power10_pmu.attr_groups = power10_pmu_attr_groups_dd1;
> + power10_pmu.cache_events = &power10_cache_events_dd1;
> + }
> +
> rc = register_power_pmu(&power10_pmu);
> if (rc)
> return rc;
> --
> 1.8.3.1
* Re: [PATCH 2/4] powerpc/perf: Update the PMU group constraints for l2l3 and threshold events in power10
2020-11-18 4:32 ` Michael Ellerman
@ 2020-11-18 5:21 ` Athira Rajeev
0 siblings, 0 replies; 9+ messages in thread
From: Athira Rajeev @ 2020-11-18 5:21 UTC (permalink / raw)
To: Michael Ellerman; +Cc: Michael Neuling, Madhavan Srinivasan, linuxppc-dev
> On 18-Nov-2020, at 10:02 AM, Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Athira Rajeev <atrajeev@linux.vnet.ibm.com> writes:
>> In Power9, L2/L3 bus events are always available as a
>> "bank" of 4 events. To obtain the counts for any of the
>> l2/l3 bus events in a given bank, the user will have to
>> program PMC4 with corresponding l2/l3 bus event for that
>> bank.
>>
>> Commit 59029136d750 ("powerpc/perf: Add constraints for power9 l2/l3 bus events")
>> enforced this rule in Power9. But this is not valid for
>> Power10, since in Power10 Monitor Mode Control Register2
>> (MMCR2) has bits to configure l2/l3 event bits. Hence remove
>> this PMC4 constraint check from power10.
>>
>> Since the l2/l3 bits in MMCR2 are not per-pmc, the patch handles
>> group constraint checks for the l2/l3 bits in MMCR2.
>
>> Patch also updates constraints for threshold events in power10.
>
> That should be done in a separate patch please.
Thanks mpe for checking the patch set.
Sure,
I will make the threshold constraint changes a separate patch and send the next version.
>
> cheers
* Re: [PATCH 3/4] powerpc/perf: Fix to update l2l3 events and generic event codes for power10
2020-11-18 4:36 ` Michael Ellerman
@ 2020-11-18 5:23 ` Athira Rajeev
0 siblings, 0 replies; 9+ messages in thread
From: Athira Rajeev @ 2020-11-18 5:23 UTC (permalink / raw)
To: Michael Ellerman; +Cc: Michael Neuling, Madhavan Srinivasan, linuxppc-dev
> On 18-Nov-2020, at 10:06 AM, Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Athira Rajeev <atrajeev@linux.vnet.ibm.com> writes:
>> Fix the event code for events: branch-instructions (to PM_BR_FIN),
>> branch-misses (to PM_BR_MPRED_FIN) and cache-misses (to
>> PM_LD_DEMAND_MISS_L1_FIN) for power10 PMU. Update the
>> list of generic events with this modified event code.
>
> That should be one patch.
Ok,
>
>> Export l2l3 events (PM_L2_ST_MISS and PM_L2_ST) and LLC-prefetches
>> (PM_L3_PF_MISS_L3) via sysfs, and also add these to cache_events.
>
> That should be another patch.
Ok,
>
>> To keep the current event codes working on DD1, rename the
>> existing array of generic_events, cache_events and pmu_attr_groups
>> with suffix _dd1. Update the power10 pmu init code to pick the
>> dd1 list while registering the power PMU, based on the pvr
>> (Processor Version Register) value.
>
> And that should be a third patch.
>
Ok, I will make these changes in the next version
Thanks
Athira
> cheers
>
>> diff --git a/arch/powerpc/perf/power10-events-list.h b/arch/powerpc/perf/power10-events-list.h
>> index 60c1b81..9e0b3c9 100644
>> --- a/arch/powerpc/perf/power10-events-list.h
>> +++ b/arch/powerpc/perf/power10-events-list.h
>> @@ -15,6 +15,9 @@
>> EVENT(PM_RUN_INST_CMPL, 0x500fa);
>> EVENT(PM_BR_CMPL, 0x4d05e);
>> EVENT(PM_BR_MPRED_CMPL, 0x400f6);
>> +EVENT(PM_BR_FIN, 0x2f04a);
>> +EVENT(PM_BR_MPRED_FIN, 0x35884);
>> +EVENT(PM_LD_DEMAND_MISS_L1_FIN, 0x400f0);
>>
>> /* All L1 D cache load references counted at finish, gated by reject */
>> EVENT(PM_LD_REF_L1, 0x100fc);
>> @@ -36,6 +39,12 @@
>> EVENT(PM_DATA_FROM_L3, 0x01340000001c040);
>> /* Demand LD - L3 Miss (not L2 hit and not L3 hit) */
>> EVENT(PM_DATA_FROM_L3MISS, 0x300fe);
>> +/* All successful D-side store dispatches for this thread */
>> +EVENT(PM_L2_ST, 0x010000046080);
>> +/* All successful D-side store dispatches for this thread that were L2 Miss */
>> +EVENT(PM_L2_ST_MISS, 0x26880);
>> +/* Total HW L3 prefetches(Load+store) */
>> +EVENT(PM_L3_PF_MISS_L3, 0x100000016080);
>> /* Data PTEG reload */
>> EVENT(PM_DTLB_MISS, 0x300fc);
>> /* ITLB Reloaded */
>> diff --git a/arch/powerpc/perf/power10-pmu.c b/arch/powerpc/perf/power10-pmu.c
>> index cf44fb7..86665ad 100644
>> --- a/arch/powerpc/perf/power10-pmu.c
>> +++ b/arch/powerpc/perf/power10-pmu.c
>> @@ -114,6 +114,9 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[])
>> GENERIC_EVENT_ATTR(cache-misses, PM_LD_MISS_L1);
>> GENERIC_EVENT_ATTR(mem-loads, MEM_LOADS);
>> GENERIC_EVENT_ATTR(mem-stores, MEM_STORES);
>> +GENERIC_EVENT_ATTR(branch-instructions, PM_BR_FIN);
>> +GENERIC_EVENT_ATTR(branch-misses, PM_BR_MPRED_FIN);
>> +GENERIC_EVENT_ATTR(cache-misses, PM_LD_DEMAND_MISS_L1_FIN);
>>
>> CACHE_EVENT_ATTR(L1-dcache-load-misses, PM_LD_MISS_L1);
>> CACHE_EVENT_ATTR(L1-dcache-loads, PM_LD_REF_L1);
>> @@ -124,12 +127,15 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[])
>> CACHE_EVENT_ATTR(L1-icache-prefetches, PM_IC_PREF_REQ);
>> CACHE_EVENT_ATTR(LLC-load-misses, PM_DATA_FROM_L3MISS);
>> CACHE_EVENT_ATTR(LLC-loads, PM_DATA_FROM_L3);
>> +CACHE_EVENT_ATTR(LLC-prefetches, PM_L3_PF_MISS_L3);
>> +CACHE_EVENT_ATTR(LLC-store-misses, PM_L2_ST_MISS);
>> +CACHE_EVENT_ATTR(LLC-stores, PM_L2_ST);
>> CACHE_EVENT_ATTR(branch-load-misses, PM_BR_MPRED_CMPL);
>> CACHE_EVENT_ATTR(branch-loads, PM_BR_CMPL);
>> CACHE_EVENT_ATTR(dTLB-load-misses, PM_DTLB_MISS);
>> CACHE_EVENT_ATTR(iTLB-load-misses, PM_ITLB_MISS);
>>
>> -static struct attribute *power10_events_attr[] = {
>> +static struct attribute *power10_events_attr_dd1[] = {
>> GENERIC_EVENT_PTR(PM_RUN_CYC),
>> GENERIC_EVENT_PTR(PM_RUN_INST_CMPL),
>> GENERIC_EVENT_PTR(PM_BR_CMPL),
>> @@ -154,11 +160,44 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[])
>> NULL
>> };
>>
>> +static struct attribute *power10_events_attr[] = {
>> + GENERIC_EVENT_PTR(PM_RUN_CYC),
>> + GENERIC_EVENT_PTR(PM_RUN_INST_CMPL),
>> + GENERIC_EVENT_PTR(PM_BR_FIN),
>> + GENERIC_EVENT_PTR(PM_BR_MPRED_FIN),
>> + GENERIC_EVENT_PTR(PM_LD_REF_L1),
>> + GENERIC_EVENT_PTR(PM_LD_DEMAND_MISS_L1_FIN),
>> + GENERIC_EVENT_PTR(MEM_LOADS),
>> + GENERIC_EVENT_PTR(MEM_STORES),
>> + CACHE_EVENT_PTR(PM_LD_MISS_L1),
>> + CACHE_EVENT_PTR(PM_LD_REF_L1),
>> + CACHE_EVENT_PTR(PM_LD_PREFETCH_CACHE_LINE_MISS),
>> + CACHE_EVENT_PTR(PM_ST_MISS_L1),
>> + CACHE_EVENT_PTR(PM_L1_ICACHE_MISS),
>> + CACHE_EVENT_PTR(PM_INST_FROM_L1),
>> + CACHE_EVENT_PTR(PM_IC_PREF_REQ),
>> + CACHE_EVENT_PTR(PM_DATA_FROM_L3MISS),
>> + CACHE_EVENT_PTR(PM_DATA_FROM_L3),
>> + CACHE_EVENT_PTR(PM_L3_PF_MISS_L3),
>> + CACHE_EVENT_PTR(PM_L2_ST_MISS),
>> + CACHE_EVENT_PTR(PM_L2_ST),
>> + CACHE_EVENT_PTR(PM_BR_MPRED_CMPL),
>> + CACHE_EVENT_PTR(PM_BR_CMPL),
>> + CACHE_EVENT_PTR(PM_DTLB_MISS),
>> + CACHE_EVENT_PTR(PM_ITLB_MISS),
>> + NULL
>> +};
>> +
>> static struct attribute_group power10_pmu_events_group = {
>> .name = "events",
>> .attrs = power10_events_attr,
>> };
>>
>> +static struct attribute_group power10_pmu_events_group_dd1 = {
>> + .name = "events",
>> + .attrs = power10_events_attr_dd1,
>> +};
>> +
>> PMU_FORMAT_ATTR(event, "config:0-59");
>> PMU_FORMAT_ATTR(pmcxsel, "config:0-7");
>> PMU_FORMAT_ATTR(mark, "config:8");
>> @@ -211,7 +250,13 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[])
>> NULL,
>> };
>>
>> -static int power10_generic_events[] = {
>> +static const struct attribute_group *power10_pmu_attr_groups_dd1[] = {
>> + &power10_pmu_format_group,
>> + &power10_pmu_events_group_dd1,
>> + NULL,
>> +};
>> +
>> +static int power10_generic_events_dd1[] = {
>> [PERF_COUNT_HW_CPU_CYCLES] = PM_RUN_CYC,
>> [PERF_COUNT_HW_INSTRUCTIONS] = PM_RUN_INST_CMPL,
>> [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = PM_BR_CMPL,
>> @@ -220,6 +265,15 @@ static int power10_get_alternatives(u64 event, unsigned int flags, u64 alt[])
>> [PERF_COUNT_HW_CACHE_MISSES] = PM_LD_MISS_L1,
>> };
>>
>> +static int power10_generic_events[] = {
>> + [PERF_COUNT_HW_CPU_CYCLES] = PM_RUN_CYC,
>> + [PERF_COUNT_HW_INSTRUCTIONS] = PM_RUN_INST_CMPL,
>> + [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = PM_BR_FIN,
>> + [PERF_COUNT_HW_BRANCH_MISSES] = PM_BR_MPRED_FIN,
>> + [PERF_COUNT_HW_CACHE_REFERENCES] = PM_LD_REF_L1,
>> + [PERF_COUNT_HW_CACHE_MISSES] = PM_LD_DEMAND_MISS_L1_FIN,
>> +};
>> +
>> static u64 power10_bhrb_filter_map(u64 branch_sample_type)
>> {
>> u64 pmu_bhrb_filter = 0;
>> @@ -311,6 +365,107 @@ static void power10_config_bhrb(u64 pmu_bhrb_filter)
>> [C(RESULT_MISS)] = PM_DATA_FROM_L3MISS,
>> },
>> [C(OP_WRITE)] = {
>> + [C(RESULT_ACCESS)] = PM_L2_ST,
>> + [C(RESULT_MISS)] = PM_L2_ST_MISS,
>> + },
>> + [C(OP_PREFETCH)] = {
>> + [C(RESULT_ACCESS)] = PM_L3_PF_MISS_L3,
>> + [C(RESULT_MISS)] = 0,
>> + },
>> + },
>> + [C(DTLB)] = {
>> + [C(OP_READ)] = {
>> + [C(RESULT_ACCESS)] = 0,
>> + [C(RESULT_MISS)] = PM_DTLB_MISS,
>> + },
>> + [C(OP_WRITE)] = {
>> + [C(RESULT_ACCESS)] = -1,
>> + [C(RESULT_MISS)] = -1,
>> + },
>> + [C(OP_PREFETCH)] = {
>> + [C(RESULT_ACCESS)] = -1,
>> + [C(RESULT_MISS)] = -1,
>> + },
>> + },
>> + [C(ITLB)] = {
>> + [C(OP_READ)] = {
>> + [C(RESULT_ACCESS)] = 0,
>> + [C(RESULT_MISS)] = PM_ITLB_MISS,
>> + },
>> + [C(OP_WRITE)] = {
>> + [C(RESULT_ACCESS)] = -1,
>> + [C(RESULT_MISS)] = -1,
>> + },
>> + [C(OP_PREFETCH)] = {
>> + [C(RESULT_ACCESS)] = -1,
>> + [C(RESULT_MISS)] = -1,
>> + },
>> + },
>> + [C(BPU)] = {
>> + [C(OP_READ)] = {
>> + [C(RESULT_ACCESS)] = PM_BR_CMPL,
>> + [C(RESULT_MISS)] = PM_BR_MPRED_CMPL,
>> + },
>> + [C(OP_WRITE)] = {
>> + [C(RESULT_ACCESS)] = -1,
>> + [C(RESULT_MISS)] = -1,
>> + },
>> + [C(OP_PREFETCH)] = {
>> + [C(RESULT_ACCESS)] = -1,
>> + [C(RESULT_MISS)] = -1,
>> + },
>> + },
>> + [C(NODE)] = {
>> + [C(OP_READ)] = {
>> + [C(RESULT_ACCESS)] = -1,
>> + [C(RESULT_MISS)] = -1,
>> + },
>> + [C(OP_WRITE)] = {
>> + [C(RESULT_ACCESS)] = -1,
>> + [C(RESULT_MISS)] = -1,
>> + },
>> + [C(OP_PREFETCH)] = {
>> + [C(RESULT_ACCESS)] = -1,
>> + [C(RESULT_MISS)] = -1,
>> + },
>> + },
>> +};
>> +
>> +static u64 power10_cache_events_dd1[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
>> + [C(L1D)] = {
>> + [C(OP_READ)] = {
>> + [C(RESULT_ACCESS)] = PM_LD_REF_L1,
>> + [C(RESULT_MISS)] = PM_LD_MISS_L1,
>> + },
>> + [C(OP_WRITE)] = {
>> + [C(RESULT_ACCESS)] = 0,
>> + [C(RESULT_MISS)] = PM_ST_MISS_L1,
>> + },
>> + [C(OP_PREFETCH)] = {
>> + [C(RESULT_ACCESS)] = PM_LD_PREFETCH_CACHE_LINE_MISS,
>> + [C(RESULT_MISS)] = 0,
>> + },
>> + },
>> + [C(L1I)] = {
>> + [C(OP_READ)] = {
>> + [C(RESULT_ACCESS)] = PM_INST_FROM_L1,
>> + [C(RESULT_MISS)] = PM_L1_ICACHE_MISS,
>> + },
>> + [C(OP_WRITE)] = {
>> + [C(RESULT_ACCESS)] = PM_INST_FROM_L1MISS,
>> + [C(RESULT_MISS)] = -1,
>> + },
>> + [C(OP_PREFETCH)] = {
>> + [C(RESULT_ACCESS)] = PM_IC_PREF_REQ,
>> + [C(RESULT_MISS)] = 0,
>> + },
>> + },
>> + [C(LL)] = {
>> + [C(OP_READ)] = {
>> + [C(RESULT_ACCESS)] = PM_DATA_FROM_L3,
>> + [C(RESULT_MISS)] = PM_DATA_FROM_L3MISS,
>> + },
>> + [C(OP_WRITE)] = {
>> [C(RESULT_ACCESS)] = -1,
>> [C(RESULT_MISS)] = -1,
>> },
>> @@ -407,6 +562,7 @@ static void power10_config_bhrb(u64 pmu_bhrb_filter)
>> int init_power10_pmu(void)
>> {
>> int rc;
>> + unsigned int pvr = mfspr(SPRN_PVR);
>>
>> /* Comes from cpu_specs[] */
>> if (!cur_cpu_spec->oprofile_cpu_type ||
>> @@ -416,6 +572,12 @@ int init_power10_pmu(void)
>> /* Set the PERF_REG_EXTENDED_MASK here */
>> PERF_REG_EXTENDED_MASK = PERF_REG_PMU_MASK_31;
>>
>> + if ((PVR_MAJ(pvr) == 1)) {
>> + power10_pmu.generic_events = power10_generic_events_dd1;
>> + power10_pmu.attr_groups = power10_pmu_attr_groups_dd1;
>> + power10_pmu.cache_events = &power10_cache_events_dd1;
>> + }
>> +
>> rc = register_power_pmu(&power10_pmu);
>> if (rc)
>> return rc;
>> --
>> 1.8.3.1
Thread overview: 9+ messages
2020-11-11 4:33 [PATCH 0/4] powerpc/perf: Fixes for power10 PMU Athira Rajeev
2020-11-11 4:33 ` [PATCH 1/4] powerpc/perf: Fix to update radix_scope_qual in power10 Athira Rajeev
2020-11-11 4:33 ` [PATCH 2/4] powerpc/perf: Update the PMU group constraints for l2l3 and threshold events " Athira Rajeev
2020-11-18 4:32 ` Michael Ellerman
2020-11-18 5:21 ` Athira Rajeev
2020-11-11 4:33 ` [PATCH 3/4] powerpc/perf: Fix to update l2l3 events and generic event codes for power10 Athira Rajeev
2020-11-18 4:36 ` Michael Ellerman
2020-11-18 5:23 ` Athira Rajeev
2020-11-11 4:33 ` [PATCH 4/4] powerpc/perf: MMCR0 control for PMU registers under PMCC=00 Athira Rajeev