* Updated Broadwell perf patchkit @ 2014-08-14 1:17 Andi Kleen 2014-08-14 1:17 ` [PATCH 1/5] perf, x86: Remove incorrect model number from Haswell perf Andi Kleen ` (4 more replies) 0 siblings, 5 replies; 25+ messages in thread From: Andi Kleen @ 2014-08-14 1:17 UTC (permalink / raw) To: peterz; +Cc: linux-kernel, mingo, eranian Addressed the earlier review feedback. Also changed Haswell to use the Broadwell cache event list, which is far more correct for it than Sandy Bridge's. -Andi ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 1/5] perf, x86: Remove incorrect model number from Haswell perf 2014-08-14 1:17 Updated Broadwell perf patchkit Andi Kleen @ 2014-08-14 1:17 ` Andi Kleen 2014-08-14 1:17 ` [PATCH 2/5] perf, x86: Document all Haswell models Andi Kleen ` (3 subsequent siblings) 4 siblings, 0 replies; 25+ messages in thread From: Andi Kleen @ 2014-08-14 1:17 UTC (permalink / raw) To: peterz; +Cc: linux-kernel, mingo, eranian, Andi Kleen From: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> --- arch/x86/kernel/cpu/perf_event_intel.c | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index 2502d0d..ef6c8b7 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -2541,7 +2541,6 @@ __init int intel_pmu_init(void) case 60: /* Haswell Client */ case 70: - case 71: case 63: case 69: x86_pmu.late_ack = true; -- 1.9.3 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 2/5] perf, x86: Document all Haswell models 2014-08-14 1:17 Updated Broadwell perf patchkit Andi Kleen 2014-08-14 1:17 ` [PATCH 1/5] perf, x86: Remove incorrect model number from Haswell perf Andi Kleen @ 2014-08-14 1:17 ` Andi Kleen 2014-08-14 7:01 ` Peter Zijlstra 2014-08-14 1:17 ` [PATCH 3/5] perf, x86: Add Broadwell core support Andi Kleen ` (2 subsequent siblings) 4 siblings, 1 reply; 25+ messages in thread From: Andi Kleen @ 2014-08-14 1:17 UTC (permalink / raw) To: peterz; +Cc: linux-kernel, mingo, eranian, Andi Kleen From: Andi Kleen <ak@linux.intel.com> Add names for each Haswell model as requested by Peter. Signed-off-by: Andi Kleen <ak@linux.intel.com> --- arch/x86/kernel/cpu/perf_event_intel.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index ef6c8b7..03befdd 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -2540,9 +2540,9 @@ __init int intel_pmu_init(void) case 60: /* Haswell Client */ - case 70: - case 63: - case 69: + case 70: /* Crystall Well */ + case 63: /* Haswell Server */ + case 69: /* Haswell ULT */ x86_pmu.late_ack = true; memcpy(hw_cache_event_ids, snb_hw_cache_event_ids, sizeof(hw_cache_event_ids)); memcpy(hw_cache_extra_regs, snb_hw_cache_extra_regs, sizeof(hw_cache_extra_regs)); -- 1.9.3 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 2/5] perf, x86: Document all Haswell models 2014-08-14 1:17 ` [PATCH 2/5] perf, x86: Document all Haswell models Andi Kleen @ 2014-08-14 7:01 ` Peter Zijlstra 2014-08-14 15:00 ` Andi Kleen 0 siblings, 1 reply; 25+ messages in thread From: Peter Zijlstra @ 2014-08-14 7:01 UTC (permalink / raw) To: Andi Kleen; +Cc: linux-kernel, mingo, eranian, Andi Kleen [-- Attachment #1: Type: text/plain, Size: 1385 bytes --] On Wed, Aug 13, 2014 at 06:17:46PM -0700, Andi Kleen wrote: > From: Andi Kleen <ak@linux.intel.com> > > Add names for each Haswell model as requested by Peter. > > Signed-off-by: Andi Kleen <ak@linux.intel.com> > --- > arch/x86/kernel/cpu/perf_event_intel.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c > index ef6c8b7..03befdd 100644 > --- a/arch/x86/kernel/cpu/perf_event_intel.c > +++ b/arch/x86/kernel/cpu/perf_event_intel.c > @@ -2540,9 +2540,9 @@ __init int intel_pmu_init(void) > > > case 60: /* Haswell Client */ > + case 70: /* Crystall Well */ > + case 63: /* Haswell Server */ > + case 69: /* Haswell ULT */ So I googled Crystalwell, and I'm not sure I understand; is that Haswell-H + GT3e or is that Haswell + GT3e, the distinction being that there are also Desktop parts with Iris Pro 5200, such as the Haswell-R. Haswell Server, is that the single socket one, or is it like IVB both the EP and EX parts? And is 69 only the ULT or also the ULX parts? Would something like this be accurate? case 60: /* 22nm Haswell */ case 70: /* 22nm Haswell + GT3e (Iris Pro 5200) */ case 69: /* 22nm Haswell ULT */ case 63: /* 22nm Haswell-EP/EX */ [-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/5] perf, x86: Document all Haswell models 2014-08-14 7:01 ` Peter Zijlstra @ 2014-08-14 15:00 ` Andi Kleen 0 siblings, 0 replies; 25+ messages in thread From: Andi Kleen @ 2014-08-14 15:00 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Andi Kleen, linux-kernel, mingo, eranian > Would something like this be accurate:? I believe my version is more accurate, but feel free to use yours. -Andi -- ak@linux.intel.com -- Speaking for myself only ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 3/5] perf, x86: Add Broadwell core support 2014-08-14 1:17 Updated Broadwell perf patchkit Andi Kleen 2014-08-14 1:17 ` [PATCH 1/5] perf, x86: Remove incorrect model number from Haswell perf Andi Kleen 2014-08-14 1:17 ` [PATCH 2/5] perf, x86: Document all Haswell models Andi Kleen @ 2014-08-14 1:17 ` Andi Kleen 2014-08-14 7:07 ` Peter Zijlstra 2014-08-14 1:17 ` [PATCH 4/5] perf, x86: Add INST_RETIRED.ALL workarounds Andi Kleen 2014-08-14 1:17 ` [PATCH 5/5] perf, x86: Use Broadwell cache event list for Haswell Andi Kleen 4 siblings, 1 reply; 25+ messages in thread From: Andi Kleen @ 2014-08-14 1:17 UTC (permalink / raw) To: peterz; +Cc: linux-kernel, mingo, eranian, Andi Kleen From: Andi Kleen <ak@linux.intel.com> Add Broadwell support for Broadwell Client to perf. This is very similar to Haswell. It uses a new cache event table, because there were various changes there. The constraint list has one new event that needs to be handled over Haswell. The PEBS event list is the same, so we reuse Haswell's. v2: Remove unnamed model numbers. Signed-off-by: Andi Kleen <ak@linux.intel.com> --- arch/x86/kernel/cpu/perf_event_intel.c | 150 +++++++++++++++++++++++++++++++++ 1 file changed, 150 insertions(+) diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index 03befdd..4bfb0ec 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -220,6 +220,15 @@ static struct event_constraint intel_hsw_event_constraints[] = { EVENT_CONSTRAINT_END }; +struct event_constraint intel_bdw_event_constraints[] = { + FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */ + FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */ + FIXED_EVENT_CONSTRAINT(0x0300, 2), /* CPU_CLK_UNHALTED.REF */ + INTEL_UEVENT_CONSTRAINT(0x148, 0x4), /* L1D_PEND_MISS.PENDING */ + INTEL_EVENT_CONSTRAINT(0xa3, 0x4), /* CYCLE_ACTIVITY.* */ + EVENT_CONSTRAINT_END +}; + static u64 intel_pmu_event_map(int hw_event) { return intel_perfmon_event_map[hw_event]; @@ -415,6 +424,126 @@ static __initconst const u64 snb_hw_cache_event_ids }; +static __initconst const u64 bdw_hw_cache_event_ids + [PERF_COUNT_HW_CACHE_MAX] + [PERF_COUNT_HW_CACHE_OP_MAX] + [PERF_COUNT_HW_CACHE_RESULT_MAX] = +{ + [ C(L1D ) ] = { + [ C(OP_READ) ] = { + [ C(RESULT_ACCESS) ] = 0x81d0, /* MEM_UOPS_RETIRED.ALL_LOADS */ + [ C(RESULT_MISS) ] = 0x151, /* L1D.REPLACEMENT */ + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = 0x82d0, /* MEM_UOPS_RETIRED.ALL_STORES */ + [ C(RESULT_MISS) ] = 0x0, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x0, + }, + }, + [ C(L1I ) ] = { + [ C(OP_READ) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x280, /* ICACHE.MISSES */ + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = -1, + [ C(RESULT_MISS) ] = -1, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x0, + }, + }, + [ C(LL ) ] = { + [ C(OP_READ) ] = { + /* OFFCORE_RESPONSE:ALL_DATA_RD|ALL_CODE_RD */ + [ C(RESULT_ACCESS) ] = 0x1b7, + /* OFFCORE_RESPONSE:ALL_DATA_RD|ALL_CODE_RD|SUPPLIER_NONE| + L3_MISS|ANY_SNOOP */ + [ C(RESULT_MISS) ] = 0x1b7, + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = 0x1b7, /* OFFCORE_RESPONSE:ALL_RFO */ + /* OFFCORE_RESPONSE:ALL_RFO|SUPPLIER_NONE|L3_MISS|ANY_SNOOP */ + [ C(RESULT_MISS) ] = 0x1b7, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x0, + }, + }, + [ C(DTLB) ] = { + [ C(OP_READ) ] = { + [ C(RESULT_ACCESS) ] = 0x81d0, /* 
MEM_UOPS_RETIRED.ALL_LOADS */ + [ C(RESULT_MISS) ] = 0x108, /* DTLB_LOAD_MISSES.MISS_CAUSES_A_WALK */ + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = 0x82d0, /* MEM_UOPS_RETIRED.ALL_STORES */ + [ C(RESULT_MISS) ] = 0x149, /* DTLB_STORE_MISSES.MISS_CAUSES_A_WALK */ + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x0, + }, + }, + [ C(ITLB) ] = { + [ C(OP_READ) ] = { + [ C(RESULT_ACCESS) ] = 0x6085, /* ITLB_MISSES.STLB_HIT */ + [ C(RESULT_MISS) ] = 0x185, /* ITLB_MISSES.MISS_CAUSES_A_WALK */ + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = -1, + [ C(RESULT_MISS) ] = -1, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = -1, + [ C(RESULT_MISS) ] = -1, + }, + }, + [ C(BPU ) ] = { + [ C(OP_READ) ] = { + [ C(RESULT_ACCESS) ] = 0xc4, /* BR_INST_RETIRED.ALL_BRANCHES */ + [ C(RESULT_MISS) ] = 0xc5, /* BR_MISP_RETIRED.ALL_BRANCHES */ + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = -1, + [ C(RESULT_MISS) ] = -1, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = -1, + [ C(RESULT_MISS) ] = -1, + }, + }, +}; + +static __initconst const u64 bdw_hw_cache_extra_regs + [PERF_COUNT_HW_CACHE_MAX] + [PERF_COUNT_HW_CACHE_OP_MAX] + [PERF_COUNT_HW_CACHE_RESULT_MAX] = +{ + [ C(LL ) ] = { + [ C(OP_READ) ] = { + /* OFFCORE_RESPONSE:ALL_DATA_RD|ALL_CODE_RD */ + [ C(RESULT_ACCESS) ] = 0x2d5, + /* OFFCORE_RESPONSE:ALL_DATA_RD|ALL_CODE_RD|SUPPLIER_NONE| + L3_MISS|ANY_SNOOP */ + [ C(RESULT_MISS) ] = 0x3fbc0202d5, + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = 0x122, /* OFFCORE_RESPONSE:ALL_RFO */ + /* OFFCORE_RESPONSE:ALL_RFO|SUPPLIER_NONE|L3_MISS|ANY_SNOOP */ + [ C(RESULT_MISS) ] = 0x3fbc020122, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x0, + }, + }, +}; + static __initconst const u64 westmere_hw_cache_event_ids [PERF_COUNT_HW_CACHE_MAX] [PERF_COUNT_HW_CACHE_OP_MAX] @@ -2564,6 +2693,27 @@ __init int intel_pmu_init(void) pr_cont("Haswell events, "); break; + case 61: /* Broadwell Client */ + x86_pmu.late_ack = true; + memcpy(hw_cache_event_ids, bdw_hw_cache_event_ids, sizeof(hw_cache_event_ids)); + memcpy(hw_cache_extra_regs, bdw_hw_cache_extra_regs, sizeof(hw_cache_extra_regs)); + + intel_pmu_lbr_init_snb(); + + x86_pmu.event_constraints = intel_bdw_event_constraints; + x86_pmu.pebs_constraints = intel_hsw_pebs_event_constraints; + x86_pmu.extra_regs = intel_snbep_extra_regs; + x86_pmu.pebs_aliases = intel_pebs_aliases_snb; + /* all extra regs are per-cpu when HT is on */ + x86_pmu.er_flags |= ERF_HAS_RSP_1; + x86_pmu.er_flags |= ERF_NO_HT_SHARING; + + x86_pmu.hw_config = hsw_hw_config; + x86_pmu.get_event_constraints = hsw_get_event_constraints; + x86_pmu.cpu_events = hsw_events_attrs; + pr_cont("Broadwell events, "); + break; + default: switch (x86_pmu.version) { case 1: -- 1.9.3 ^ permalink raw reply related [flat|nested] 25+ messages in thread
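To see how such a cache-event table is consumed: when a generic PERF_TYPE_HW_CACHE event is opened, the x86 code indexes the table by cache unit, operation and result to obtain the raw event encoding to program, with 0 conventionally meaning "not counted on this CPU" and -1 meaning "combination not supported". The following is a minimal stand-alone sketch of that lookup under those assumptions, not the kernel code itself; the function name and the miniature table (which carries only the Broadwell L1D entries from the patch above) are made up for illustration.

/*
 * Illustrative sketch (not kernel code): how a generic cache event such as
 * "L1-dcache-load-misses" resolves through a hw_cache_event_ids-style table.
 * Assumed convention: 0 = not counted on this CPU, -1 = not supported.
 */
#include <stdio.h>
#include <stdint.h>

enum { L1D, L1I, LL, DTLB, ITLB, BPU, CACHE_MAX };
enum { OP_READ, OP_WRITE, OP_PREFETCH, OP_MAX };
enum { RESULT_ACCESS, RESULT_MISS, RESULT_MAX };

/* A few Broadwell L1D entries copied from the patch above. */
static const int64_t cache_event_ids[CACHE_MAX][OP_MAX][RESULT_MAX] = {
	[L1D] = {
		[OP_READ]  = { [RESULT_ACCESS] = 0x81d0,   /* MEM_UOPS_RETIRED.ALL_LOADS */
			       [RESULT_MISS]   = 0x151 },  /* L1D.REPLACEMENT */
		[OP_WRITE] = { [RESULT_ACCESS] = 0x82d0,   /* MEM_UOPS_RETIRED.ALL_STORES */
			       [RESULT_MISS]   = 0x0 },
	},
};

/* Resolve (cache, op, result) to a raw event encoding, or report why not. */
static int resolve(int cache, int op, int result)
{
	int64_t val = cache_event_ids[cache][op][result];

	if (val == 0) {
		printf("not counted on this CPU\n");
		return -1;
	}
	if (val == -1) {
		printf("combination not supported\n");
		return -1;
	}
	/* Low byte = event select, next byte = unit mask, as in PERFEVTSEL. */
	printf("raw config 0x%llx (event=0x%02llx umask=0x%02llx)\n",
	       (long long)val, (long long)(val & 0xff),
	       (long long)((val >> 8) & 0xff));
	return 0;
}

int main(void)
{
	resolve(L1D, OP_READ, RESULT_MISS);	/* -> 0x151, L1D.REPLACEMENT */
	resolve(L1D, OP_WRITE, RESULT_MISS);	/* -> 0, not counted */
	return 0;
}

The three-dimensional table is what lets the generic cache-event ABI stay stable across CPU generations: only the per-model encodings change, which is why a new table is needed here even though the rest of the Broadwell setup reuses Haswell code.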
* Re: [PATCH 3/5] perf, x86: Add Broadwell core support 2014-08-14 1:17 ` [PATCH 3/5] perf, x86: Add Broadwell core support Andi Kleen @ 2014-08-14 7:07 ` Peter Zijlstra 2014-08-14 7:32 ` Peter Zijlstra 0 siblings, 1 reply; 25+ messages in thread From: Peter Zijlstra @ 2014-08-14 7:07 UTC (permalink / raw) To: Andi Kleen; +Cc: linux-kernel, mingo, eranian, Andi Kleen [-- Attachment #1: Type: text/plain, Size: 206 bytes --] On Wed, Aug 13, 2014 at 06:17:47PM -0700, Andi Kleen wrote: > v2: Remove unnamed model numbers. > + case 61: /* Broadwell Client */ Seriously? So 71 and 79 are fine as a number but you cannot name them? [-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/5] perf, x86: Add Broadwell core support 2014-08-14 7:07 ` Peter Zijlstra @ 2014-08-14 7:32 ` Peter Zijlstra 2014-08-14 14:58 ` Andi Kleen 0 siblings, 1 reply; 25+ messages in thread From: Peter Zijlstra @ 2014-08-14 7:32 UTC (permalink / raw) To: Andi Kleen; +Cc: linux-kernel, mingo, eranian, Andi Kleen [-- Attachment #1: Type: text/plain, Size: 433 bytes --] On Thu, Aug 14, 2014 at 09:07:22AM +0200, Peter Zijlstra wrote: > On Wed, Aug 13, 2014 at 06:17:47PM -0700, Andi Kleen wrote: > > > v2: Remove unnamed model numbers. > > > + case 61: /* Broadwell Client */ > > Seriously? So 71 and 79 are fine as a number but you cannot name them? would something like: case 71: /* 14nm Broadwell TBA */ work? Where you then will update the description at the appropriate time? [-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/5] perf, x86: Add Broadwell core support 2014-08-14 7:32 ` Peter Zijlstra @ 2014-08-14 14:58 ` Andi Kleen 2014-08-14 15:10 ` Peter Zijlstra 0 siblings, 1 reply; 25+ messages in thread From: Andi Kleen @ 2014-08-14 14:58 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Andi Kleen, linux-kernel, mingo, eranian On Thu, Aug 14, 2014 at 09:32:10AM +0200, Peter Zijlstra wrote: > On Thu, Aug 14, 2014 at 09:07:22AM +0200, Peter Zijlstra wrote: > > On Wed, Aug 13, 2014 at 06:17:47PM -0700, Andi Kleen wrote: > > > > > v2: Remove unnamed model numbers. > > > > > + case 61: /* Broadwell Client */ > > > > Seriously? So 71 and 79 are fine as a number but you cannot name them? > > would something like: > > case 71: /* 14nm Broadwell TBA */ > > work? Where you then will update the description at the appropriate > time? There won't be a single description, it will turn into all kinds of products. The names I used are the ones we use. Really, Peter, it's futile to try to reproduce http://ark.intel.com in perf comments. If you need information on specific Intel products please look it up there. -Andi -- ak@linux.intel.com -- Speaking for myself only ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/5] perf, x86: Add Broadwell core support 2014-08-14 14:58 ` Andi Kleen @ 2014-08-14 15:10 ` Peter Zijlstra 0 siblings, 0 replies; 25+ messages in thread From: Peter Zijlstra @ 2014-08-14 15:10 UTC (permalink / raw) To: Andi Kleen; +Cc: Andi Kleen, linux-kernel, mingo, eranian [-- Attachment #1: Type: text/plain, Size: 373 bytes --] On Thu, Aug 14, 2014 at 07:58:54AM -0700, Andi Kleen wrote: > Really, Peter, it's futile to try to reproduce http://ark.intel.com > in perf comments. If you need information on specific Intel > products please look it up there. ark.intel.com is useless, I never can find anything there. The best I can find is wikipedia and they stopped listing model numbers long ago :/ [-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 4/5] perf, x86: Add INST_RETIRED.ALL workarounds 2014-08-14 1:17 Updated Broadwell perf patchkit Andi Kleen ` (2 preceding siblings ...) 2014-08-14 1:17 ` [PATCH 3/5] perf, x86: Add Broadwell core support Andi Kleen @ 2014-08-14 1:17 ` Andi Kleen 2014-08-14 4:46 ` Stephane Eranian 2014-08-14 7:10 ` Peter Zijlstra 2014-08-14 1:17 ` [PATCH 5/5] perf, x86: Use Broadwell cache event list for Haswell Andi Kleen 4 siblings, 2 replies; 25+ messages in thread From: Andi Kleen @ 2014-08-14 1:17 UTC (permalink / raw) To: peterz; +Cc: linux-kernel, mingo, eranian, Andi Kleen From: Andi Kleen <ak@linux.intel.com> On Broadwell INST_RETIRED.ALL cannot be used with any period that doesn't have the lowest 6 bits cleared. And the period should not be smaller than 128. Add a new callback to enforce this, and set it for Broadwell. This is erratum BDM57 and BDM11. v2: Use correct event name in description. Use EVENT() macro. Signed-off-by: Andi Kleen <ak@linux.intel.com> --- arch/x86/kernel/cpu/perf_event.c | 3 +++ arch/x86/kernel/cpu/perf_event.h | 1 + arch/x86/kernel/cpu/perf_event_intel.c | 19 +++++++++++++++++++ 3 files changed, 23 insertions(+) diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c index 0adc5e3..a0adf58 100644 --- a/arch/x86/kernel/cpu/perf_event.c +++ b/arch/x86/kernel/cpu/perf_event.c @@ -980,6 +980,9 @@ int x86_perf_event_set_period(struct perf_event *event) if (left > x86_pmu.max_period) left = x86_pmu.max_period; + if (x86_pmu.limit_period) + left = x86_pmu.limit_period(event, left); + per_cpu(pmc_prev_left[idx], smp_processor_id()) = left; /* diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h index de81627..a46e391 100644 --- a/arch/x86/kernel/cpu/perf_event.h +++ b/arch/x86/kernel/cpu/perf_event.h @@ -456,6 +456,7 @@ struct x86_pmu { struct x86_pmu_quirk *quirks; int perfctr_second_write; bool late_ack; + unsigned (*limit_period)(struct perf_event *event, unsigned l); /* * sysfs attrs diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index 4bfb0ec..66260e1 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -2034,6 +2034,24 @@ hsw_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event) return c; } +/* + * Broadwell: + * The INST_RETIRED.ALL period always needs to have lowest + * 6bits cleared (BDM57). It shall not use a period smaller + * than 100 (BDM11). We combine the two to enforce + * a min-period of 128. + */ +static unsigned bdw_limit_period(struct perf_event *event, unsigned left) +{ + if ((event->hw.config & 0xffff) == + X86_CONFIG(.event=0xc0, .umask=0x01)) { + if (left < 128) + left = 128; + left &= ~0x3fu; + } + return left; +} + PMU_FORMAT_ATTR(event, "config:0-7" ); PMU_FORMAT_ATTR(umask, "config:8-15" ); PMU_FORMAT_ATTR(edge, "config:18" ); @@ -2711,6 +2729,7 @@ __init int intel_pmu_init(void) x86_pmu.hw_config = hsw_hw_config; x86_pmu.get_event_constraints = hsw_get_event_constraints; x86_pmu.cpu_events = hsw_events_attrs; + x86_pmu.limit_period = bdw_limit_period; pr_cont("Broadwell events, "); break; -- 1.9.3 ^ permalink raw reply related [flat|nested] 25+ messages in thread
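For illustration, here is a small stand-alone sketch (not the kernel code) that applies the same rounding rule as bdw_limit_period() to a handful of requested periods; the helper name and the sample values are made up for the example.

/* Stand-alone illustration of the BDM57/BDM11 period rounding above. */
#include <stdio.h>

/* Same rule as bdw_limit_period(): raise to at least 128, then clear the low 6 bits. */
static unsigned int limit_inst_retired_period(unsigned int left)
{
	if (left < 128)
		left = 128;
	return left & ~0x3fu;
}

int main(void)
{
	/* Requested periods chosen for illustration only. */
	unsigned int requested[] = { 1, 100, 128, 192, 200, 10007, 2000003 };
	unsigned int i;

	for (i = 0; i < sizeof(requested) / sizeof(requested[0]); i++)
		printf("requested %7u -> programmed %7u\n",
		       requested[i], limit_inst_retired_period(requested[i]));
	return 0;
}

Periods below 128 are raised to 128, and anything larger just loses its low 6 bits, so with realistic periods the programmed value stays within 63 of the request.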
* Re: [PATCH 4/5] perf, x86: Add INST_RETIRED.ALL workarounds 2014-08-14 1:17 ` [PATCH 4/5] perf, x86: Add INST_RETIRED.ALL workarounds Andi Kleen @ 2014-08-14 4:46 ` Stephane Eranian 2014-08-14 14:30 ` Andi Kleen 2014-08-14 7:10 ` Peter Zijlstra 1 sibling, 1 reply; 25+ messages in thread From: Stephane Eranian @ 2014-08-14 4:46 UTC (permalink / raw) To: Andi Kleen; +Cc: Peter Zijlstra, LKML, Ingo Molnar, Andi Kleen On Thu, Aug 14, 2014 at 3:17 AM, Andi Kleen <andi@firstfloor.org> wrote: > From: Andi Kleen <ak@linux.intel.com> > > On Broadwell INST_RETIRED.ALL cannot be used with any period > that doesn't have the lowest 6 bits cleared. And the period > should not be smaller than 128. > If you have frequency mode enabled, then I suspect this works okay. It may be that the kernel will keep trying to set a period lower than 128 but you will correct it. I am more worried about the case where the user sets a fixed period with some of the bottom 6 bits set. The apps thinks it is sampling at X occurences per sample, when it is in fact at X - 63 (worst case). I think this would be okay, if there was a trace of this in the sampling buffer, i.e., PERF_SAMPLE_PERIOD. But I recall that perf record does not request this flag when using a fixed period. So no way of knowing what the actual period was, at least when using the perf tool. I understand also that for this event, 64 occurrences may not matter as much, maybe except if you use some filters such as cmask. > Add a new callback to enforce this, and set it for Broadwell. > > This is erratum BDM57 and BDM11. > > v2: Use correct event name in description. Use EVENT() macro. > Signed-off-by: Andi Kleen <ak@linux.intel.com> > --- > arch/x86/kernel/cpu/perf_event.c | 3 +++ > arch/x86/kernel/cpu/perf_event.h | 1 + > arch/x86/kernel/cpu/perf_event_intel.c | 19 +++++++++++++++++++ > 3 files changed, 23 insertions(+) > > diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c > index 0adc5e3..a0adf58 100644 > --- a/arch/x86/kernel/cpu/perf_event.c > +++ b/arch/x86/kernel/cpu/perf_event.c > @@ -980,6 +980,9 @@ int x86_perf_event_set_period(struct perf_event *event) > if (left > x86_pmu.max_period) > left = x86_pmu.max_period; > > + if (x86_pmu.limit_period) > + left = x86_pmu.limit_period(event, left); > + > per_cpu(pmc_prev_left[idx], smp_processor_id()) = left; > > /* > diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h > index de81627..a46e391 100644 > --- a/arch/x86/kernel/cpu/perf_event.h > +++ b/arch/x86/kernel/cpu/perf_event.h > @@ -456,6 +456,7 @@ struct x86_pmu { > struct x86_pmu_quirk *quirks; > int perfctr_second_write; > bool late_ack; > + unsigned (*limit_period)(struct perf_event *event, unsigned l); > > /* > * sysfs attrs > diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c > index 4bfb0ec..66260e1 100644 > --- a/arch/x86/kernel/cpu/perf_event_intel.c > +++ b/arch/x86/kernel/cpu/perf_event_intel.c > @@ -2034,6 +2034,24 @@ hsw_get_event_constraints(struct cpu_hw_events *cpuc, struct perf_event *event) > return c; > } > > +/* > + * Broadwell: > + * The INST_RETIRED.ALL period always needs to have lowest > + * 6bits cleared (BDM57). It shall not use a period smaller > + * than 100 (BDM11). We combine the two to enforce > + * a min-period of 128. 
> + */ > +static unsigned bdw_limit_period(struct perf_event *event, unsigned left) > +{ > + if ((event->hw.config & 0xffff) == > + X86_CONFIG(.event=0xc0, .umask=0x01)) { > + if (left < 128) > + left = 128; > + left &= ~0x3fu; > + } > + return left; > +} > + > PMU_FORMAT_ATTR(event, "config:0-7" ); > PMU_FORMAT_ATTR(umask, "config:8-15" ); > PMU_FORMAT_ATTR(edge, "config:18" ); > @@ -2711,6 +2729,7 @@ __init int intel_pmu_init(void) > x86_pmu.hw_config = hsw_hw_config; > x86_pmu.get_event_constraints = hsw_get_event_constraints; > x86_pmu.cpu_events = hsw_events_attrs; > + x86_pmu.limit_period = bdw_limit_period; > pr_cont("Broadwell events, "); > break; > > -- > 1.9.3 > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/5] perf, x86: Add INST_RETIRED.ALL workarounds 2014-08-14 4:46 ` Stephane Eranian @ 2014-08-14 14:30 ` Andi Kleen 2014-08-14 17:47 ` Stephane Eranian 0 siblings, 1 reply; 25+ messages in thread From: Andi Kleen @ 2014-08-14 14:30 UTC (permalink / raw) To: Stephane Eranian; +Cc: Andi Kleen, Peter Zijlstra, LKML, Ingo Molnar I understand all your points, but there's no alternative. The only other way would be to disable INST_RETIRED.ALL. -Andi ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/5] perf, x86: Add INST_RETIRED.ALL workarounds 2014-08-14 14:30 ` Andi Kleen @ 2014-08-14 17:47 ` Stephane Eranian 2014-08-14 18:41 ` Andi Kleen 2014-08-15 14:31 ` Peter Zijlstra 0 siblings, 2 replies; 25+ messages in thread From: Stephane Eranian @ 2014-08-14 17:47 UTC (permalink / raw) To: Andi Kleen, Namhyung Kim, Jiri Olsa, Arnaldo Carvalho de Melo Cc: Andi Kleen, Peter Zijlstra, LKML, Ingo Molnar [+perf tool maintainers] On Thu, Aug 14, 2014 at 4:30 PM, Andi Kleen <ak@linux.intel.com> wrote: > > I understand all your points, but there's no alternative. > The only other way would be to disable INST_RETIRED.ALL. > You cannot do that either. INST_RETIRED:ALL is important. I assume the bug applies whether or not the event is used with a filter. I think we need to ensure that by looking at the perf.data file, one can reconstruct the total number of inst_Retired:all occurrences for the run. With a fixed period, one would do num_samples * fixed_period. I know the Gooda tool does that. It is used to estimate the number of events captured vs. the number of events occurring. So what I am saying is that on BDW, we need to have perf force PERF_SAMPLE_PERIOD even when a fixed period is passed. The kernel cannot simply add PERF_SAMPLE_PERIOD to the event sample_format because the tool would get confused by the sample record size otherwise. I understand this is not pretty. We'd need to have a callback on the evsel just before making the syscall. That callback would be for x86 and it would check the CPUID to force PERF_SAMPLE_PERIOD. That's my thinking but there may be a better way of doing this. ^ permalink raw reply [flat|nested] 25+ messages in thread
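To spell out the reconstruction Stephane describes (all numbers hypothetical): without PERF_SAMPLE_PERIOD in the sample format, a tool can only multiply the sample count by the period it asked for, so if the kernel silently programs a different period the estimate drifts. A minimal sketch of that bookkeeping, assuming the worst case where every sample really did use the truncated period:

/* Sketch of the event-count reconstruction done by tools such as Gooda. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t requested_period = 200000;	/* hypothetical period asked for by the tool */
	uint64_t num_samples = 12345;		/* hypothetical number of samples in perf.data */

	/* Without PERF_SAMPLE_PERIOD the tool must assume every sample
	 * used the requested period. */
	uint64_t assumed = num_samples * requested_period;

	/* Worst case being discussed: the kernel silently used the
	 * truncated period for every sample instead. */
	uint64_t programmed = requested_period & ~0x3fULL;	/* 199936 */
	uint64_t actual = num_samples * programmed;

	printf("assumed %llu, actual %llu, drift %.3f%%\n",
	       (unsigned long long)assumed, (unsigned long long)actual,
	       100.0 * (double)(assumed - actual) / (double)actual);
	return 0;
}

Peter's reply further down argues that the kernel's reload accounting keeps the long-run average at the requested period, which bounds this drift much more tightly than the worst case shown here.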
* Re: [PATCH 4/5] perf, x86: Add INST_RETIRED.ALL workarounds 2014-08-14 17:47 ` Stephane Eranian @ 2014-08-14 18:41 ` Andi Kleen 0 siblings, 0 replies; 25+ messages in thread From: Andi Kleen @ 2014-08-14 18:41 UTC (permalink / raw) To: Stephane Eranian Cc: Namhyung Kim, Jiri Olsa, Arnaldo Carvalho de Melo, Andi Kleen, Peter Zijlstra, LKML, Ingo Molnar On Thu, Aug 14, 2014 at 07:47:56PM +0200, Stephane Eranian wrote: > [+perf tool maintainers] > > On Thu, Aug 14, 2014 at 4:30 PM, Andi Kleen <ak@linux.intel.com> wrote: > > > > I understand all your points, but there's no alternative. > > The only other way would be to disable INST_RETIRED.ALL. > > > You cannot do that either. INST_RETIRED:ALL is important. > I assume the bug applies whether or not the event is used > with a filter. > > I think we need to ensure that by looking at the perf.data file, > one can reconstruct the total number of inst_Retired:all > occurrences for the run. With a fixed period, one would do > num_samples * fixed_period. I know the Gooda tool does > that. It is used to estimate the number of events captured > vs. the number of events occurring. Is that really a problem? Normally periods are not that small, especially not for instruction retired. I don't think you can run such a small period on instruction retired for any significant time without throttling. With sensible periods, let's say >10k, the error from losing a few bits is very small. It would surprise me if you can actually measure it. There will always be much more jitter just from standard system noise. -Andi ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/5] perf, x86: Add INST_RETIRED.ALL workarounds 2014-08-14 17:47 ` Stephane Eranian 2014-08-14 18:41 ` Andi Kleen @ 2014-08-15 14:31 ` Peter Zijlstra 2014-08-15 17:21 ` Andi Kleen 1 sibling, 1 reply; 25+ messages in thread From: Peter Zijlstra @ 2014-08-15 14:31 UTC (permalink / raw) To: Stephane Eranian Cc: Andi Kleen, Namhyung Kim, Jiri Olsa, Arnaldo Carvalho de Melo, Andi Kleen, LKML, Ingo Molnar [-- Attachment #1: Type: text/plain, Size: 1861 bytes --] On Thu, Aug 14, 2014 at 07:47:56PM +0200, Stephane Eranian wrote: > [+perf tool maintainers] > > On Thu, Aug 14, 2014 at 4:30 PM, Andi Kleen <ak@linux.intel.com> wrote: > > > > I understand all your points, but there's no alternative. > > The only other way would be to disable INST_RETIRED.ALL. > > > You cannot do that either. INST_RETIRED:ALL is important. I assume > the bug applies whether or not the event is used with a filter. > > I think we need to ensure that by looking at the perf.data file, one > can reconstruct the total number of inst_Retired:all occurrences for > the run. With a fixed period, one would do num_samples * fixed_period. > I know the Gooda tool does that. It is used to estimate the number of > events captured vs. the number of events occurring. OK, I think we can make that work; IFF we guarantee perf_event_attr::sample_period >= 128. Suppose we start out with sample_period=192; then we'll set period_left to 192, we'll end up with left = 128 (we truncate the lower bits). We get an interrupt, find that period_left = 64 (>0 so we return 0 and don't get an overflow handler), up that to 128. Then we trigger again, at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get an overflow). We increment with sample_period so we get left = 128. We fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an overflow). And on and on. So while the individual interrupts are 'wrong' we get them with interval=256,128 in exactly the right ratio to average out at 192. And this works for everything >=128. So the num_samples*fixed_period thing is still entirely correct +- 127, which is good enough I'd say, as you already have that error anyhow. So no need to 'fix' the tools, all we need to do is refuse to create INST_RETIRED:ALL events with sample_period < 128. [-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
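Peter's carrying-over argument is easy to check with a toy model of the period_left bookkeeping. The sketch below is illustrative only (it is not x86_perf_event_set_period()), and it uses a requested period of 200, an example value that is not a multiple of 64 so the truncation actually changes what gets programmed:

/* Toy model of period_left reload bookkeeping with the Broadwell truncation. */
#include <stdio.h>

/* Same rounding as the workaround: at least 128, low 6 bits cleared. */
static long limit(long left)
{
	if (left < 128)
		left = 128;
	return left & ~0x3fL;
}

int main(void)
{
	const long sample_period = 200;	/* illustrative requested period */
	long left = sample_period;	/* models period_left */
	long total = 0;			/* events counted so far */
	long prev = 0, samples = 0;
	int i;

	for (i = 0; i < 12; i++) {
		long programmed = limit(left);

		total += programmed;	/* counter overflows after this many events */
		left -= programmed;
		if (left <= 0) {	/* user-visible overflow: take a sample */
			samples++;
			printf("sample %ld at n=%ld (interval %ld)\n",
			       samples, total, total - prev);
			prev = total;
			left += sample_period;
		}
	}
	printf("%ld samples, samples*period=%ld, true count=%ld\n",
	       samples, samples * sample_period, total);
	return 0;
}

The printed intervals bounce between roughly 128 and 256, but because the deficit is carried into the next reload, samples * 200 stays within one period of the true count, which is why rejecting only sample_period < 128 is enough.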
* Re: [PATCH 4/5] perf, x86: Add INST_RETIRED.ALL workarounds 2014-08-15 14:31 ` Peter Zijlstra @ 2014-08-15 17:21 ` Andi Kleen 0 siblings, 0 replies; 25+ messages in thread From: Andi Kleen @ 2014-08-15 17:21 UTC (permalink / raw) To: Peter Zijlstra Cc: Stephane Eranian, Andi Kleen, Namhyung Kim, Jiri Olsa, Arnaldo Carvalho de Melo, Andi Kleen, LKML, Ingo Molnar > So no need to 'fix' the tools, al we need to do is refuse to create > INST_RETIRED:ALL events with sample_period < 128. Like this? diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c index a0adf58..c592d4d 100644 --- a/arch/x86/kernel/cpu/perf_event.c +++ b/arch/x86/kernel/cpu/perf_event.c @@ -443,6 +443,12 @@ int x86_pmu_hw_config(struct perf_event *event) if (event->attr.type == PERF_TYPE_RAW) event->hw.config |= event->attr.config & X86_RAW_EVENT_MASK; + if (event->attr.sample_period && x86_pmu.limit_period) { + if (x86_pmu.limit_period(event, event->attr.sample_period) < + event->attr.sample_period) + return -EINVAL; + } + return x86_setup_perfctr(event); } -Andi ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 4/5] perf, x86: Add INST_RETIRED.ALL workarounds 2014-08-14 1:17 ` [PATCH 4/5] perf, x86: Add INST_RETIRED.ALL workarounds Andi Kleen 2014-08-14 4:46 ` Stephane Eranian @ 2014-08-14 7:10 ` Peter Zijlstra 1 sibling, 0 replies; 25+ messages in thread From: Peter Zijlstra @ 2014-08-14 7:10 UTC (permalink / raw) To: Andi Kleen; +Cc: linux-kernel, mingo, eranian, Andi Kleen [-- Attachment #1: Type: text/plain, Size: 524 bytes --] On Wed, Aug 13, 2014 at 06:17:48PM -0700, Andi Kleen wrote: > v2: Use correct event name in description. Use EVENT() macro. > +static unsigned bdw_limit_period(struct perf_event *event, unsigned left) > +{ > + if ((event->hw.config & 0xffff) == I was thinking you should use INTEL_ARCH_EVENT_MASK or something instead of the raw 0xFFFF there, but that X86_CONFIG() usage is nice too :-) > + X86_CONFIG(.event=0xc0, .umask=0x01)) { > + if (left < 128) > + left = 128; > + left &= ~0x3fu; > + } > + return left; > +} [-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
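For reference, both spellings pick out the same two fields of the architectural PERFEVTSEL layout: bits 0-7 are the event select and bits 8-15 the unit mask, so X86_CONFIG(.event=0xc0, .umask=0x01) evaluates to 0x1c0 and masking hw.config with 0xffff (or an equivalent named mask) compares exactly those fields. Below is a small stand-alone sketch of that matching; the config() helper is only a rough stand-in for the kernel macro and the hw_config value is hypothetical.

/* Illustration of the event/umask field matching discussed above. */
#include <stdio.h>
#include <stdint.h>

/* Architectural PERFEVTSEL layout (low bits): bits 0-7 event select, bits 8-15 unit mask. */
#define EVENTSEL_EVENT	0x000000ffULL
#define EVENTSEL_UMASK	0x0000ff00ULL

/* Rough stand-in for X86_CONFIG(.event=e, .umask=u), restricted to these two fields. */
static uint64_t config(uint64_t event, uint64_t umask)
{
	return (event & 0xff) | ((umask & 0xff) << 8);
}

int main(void)
{
	/* Hypothetical hw.config: INST_RETIRED.ALL with USR|OS|INT|EN flag bits set. */
	uint64_t hw_config = 0x5301c0;
	uint64_t mask = EVENTSEL_EVENT | EVENTSEL_UMASK;	/* == 0xffff */

	printf("X86_CONFIG(.event=0xc0, .umask=0x01) -> 0x%llx\n",
	       (unsigned long long)config(0xc0, 0x01));
	printf("event/umask match: %s\n",
	       (hw_config & mask) == config(0xc0, 0x01) ? "yes" : "no");
	return 0;
}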
* [PATCH 5/5] perf, x86: Use Broadwell cache event list for Haswell 2014-08-14 1:17 Updated Broadwell perf patchkit Andi Kleen ` (3 preceding siblings ...) 2014-08-14 1:17 ` [PATCH 4/5] perf, x86: Add INST_RETIRED.ALL workarounds Andi Kleen @ 2014-08-14 1:17 ` Andi Kleen 4 siblings, 0 replies; 25+ messages in thread From: Andi Kleen @ 2014-08-14 1:17 UTC (permalink / raw) To: peterz; +Cc: linux-kernel, mingo, eranian, Andi Kleen From: Andi Kleen <ak@linux.intel.com> Use the newly added Broadwell cache event list for Haswell too. They are identical, but Haswell is very different from the Sandy Bridge list that was used previously. That fixes a wide range of mis-counting cache events. The prefetch events are gone now. They way the hardware counts them is very misleading (some prefetches included, others not), so it seemed best to leave them out. Signed-off-by: Andi Kleen <ak@linux.intel.com> --- arch/x86/kernel/cpu/perf_event_intel.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index 66260e1..178ddc0 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -2691,8 +2691,8 @@ __init int intel_pmu_init(void) case 63: /* Haswell Server */ case 69: /* Haswell ULT */ x86_pmu.late_ack = true; - memcpy(hw_cache_event_ids, snb_hw_cache_event_ids, sizeof(hw_cache_event_ids)); - memcpy(hw_cache_extra_regs, snb_hw_cache_extra_regs, sizeof(hw_cache_extra_regs)); + memcpy(hw_cache_event_ids, bdw_hw_cache_event_ids, sizeof(hw_cache_event_ids)); + memcpy(hw_cache_extra_regs, bdw_hw_cache_extra_regs, sizeof(hw_cache_extra_regs)); intel_pmu_lbr_init_snb(); -- 1.9.3 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Broadwell perf support @ 2014-08-25 22:43 Andi Kleen 2014-08-25 22:43 ` [PATCH 3/5] perf, x86: Add Broadwell core support Andi Kleen 0 siblings, 1 reply; 25+ messages in thread From: Andi Kleen @ 2014-08-25 22:43 UTC (permalink / raw) To: peterz; +Cc: linux-kernel, mingo, eranian Updated version of the perf Broadwell patchkit. This also has some fixes for Haswell. This addresses all earlier feedback. Too low user specified periods on INST_RETIRED.ALL are now rejected with an error. The Haswell models are documented. The event matches use the perf macros. -Andi ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 3/5] perf, x86: Add Broadwell core support 2014-08-25 22:43 Broadwell perf support Andi Kleen @ 2014-08-25 22:43 ` Andi Kleen 2014-09-01 13:51 ` Peter Zijlstra 0 siblings, 1 reply; 25+ messages in thread From: Andi Kleen @ 2014-08-25 22:43 UTC (permalink / raw) To: peterz; +Cc: linux-kernel, mingo, eranian, Andi Kleen From: Andi Kleen <ak@linux.intel.com> Add Broadwell support for Broadwell Client to perf. This is very similar to Haswell. It uses a new cache event table, because there were various changes there. The constraint list has one new event that needs to be handled over Haswell. The PEBS event list is the same, so we reuse Haswell's. v2: Remove unnamed model numbers. Signed-off-by: Andi Kleen <ak@linux.intel.com> --- arch/x86/kernel/cpu/perf_event_intel.c | 150 +++++++++++++++++++++++++++++++++ 1 file changed, 150 insertions(+) diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index 0e9c272..1cca4ae 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -220,6 +220,15 @@ static struct event_constraint intel_hsw_event_constraints[] = { EVENT_CONSTRAINT_END }; +struct event_constraint intel_bdw_event_constraints[] = { + FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */ + FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */ + FIXED_EVENT_CONSTRAINT(0x0300, 2), /* CPU_CLK_UNHALTED.REF */ + INTEL_UEVENT_CONSTRAINT(0x148, 0x4), /* L1D_PEND_MISS.PENDING */ + INTEL_EVENT_CONSTRAINT(0xa3, 0x4), /* CYCLE_ACTIVITY.* */ + EVENT_CONSTRAINT_END +}; + static u64 intel_pmu_event_map(int hw_event) { return intel_perfmon_event_map[hw_event]; @@ -415,6 +424,126 @@ static __initconst const u64 snb_hw_cache_event_ids }; +static __initconst const u64 bdw_hw_cache_event_ids + [PERF_COUNT_HW_CACHE_MAX] + [PERF_COUNT_HW_CACHE_OP_MAX] + [PERF_COUNT_HW_CACHE_RESULT_MAX] = +{ + [ C(L1D ) ] = { + [ C(OP_READ) ] = { + [ C(RESULT_ACCESS) ] = 0x81d0, /* MEM_UOPS_RETIRED.ALL_LOADS */ + [ C(RESULT_MISS) ] = 0x151, /* L1D.REPLACEMENT */ + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = 0x82d0, /* MEM_UOPS_RETIRED.ALL_STORES */ + [ C(RESULT_MISS) ] = 0x0, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x0, + }, + }, + [ C(L1I ) ] = { + [ C(OP_READ) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x280, /* ICACHE.MISSES */ + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = -1, + [ C(RESULT_MISS) ] = -1, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x0, + }, + }, + [ C(LL ) ] = { + [ C(OP_READ) ] = { + /* OFFCORE_RESPONSE:ALL_DATA_RD|ALL_CODE_RD */ + [ C(RESULT_ACCESS) ] = 0x1b7, + /* OFFCORE_RESPONSE:ALL_DATA_RD|ALL_CODE_RD|SUPPLIER_NONE| + L3_MISS|ANY_SNOOP */ + [ C(RESULT_MISS) ] = 0x1b7, + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = 0x1b7, /* OFFCORE_RESPONSE:ALL_RFO */ + /* OFFCORE_RESPONSE:ALL_RFO|SUPPLIER_NONE|L3_MISS|ANY_SNOOP */ + [ C(RESULT_MISS) ] = 0x1b7, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x0, + }, + }, + [ C(DTLB) ] = { + [ C(OP_READ) ] = { + [ C(RESULT_ACCESS) ] = 0x81d0, /* MEM_UOPS_RETIRED.ALL_LOADS */ + [ C(RESULT_MISS) ] = 0x108, /* DTLB_LOAD_MISSES.MISS_CAUSES_A_WALK */ + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = 0x82d0, /* MEM_UOPS_RETIRED.ALL_STORES */ + [ C(RESULT_MISS) ] = 0x149, /* DTLB_STORE_MISSES.MISS_CAUSES_A_WALK */ + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x0, + }, + }, + [ C(ITLB) ] = 
{ + [ C(OP_READ) ] = { + [ C(RESULT_ACCESS) ] = 0x6085, /* ITLB_MISSES.STLB_HIT */ + [ C(RESULT_MISS) ] = 0x185, /* ITLB_MISSES.MISS_CAUSES_A_WALK */ + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = -1, + [ C(RESULT_MISS) ] = -1, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = -1, + [ C(RESULT_MISS) ] = -1, + }, + }, + [ C(BPU ) ] = { + [ C(OP_READ) ] = { + [ C(RESULT_ACCESS) ] = 0xc4, /* BR_INST_RETIRED.ALL_BRANCHES */ + [ C(RESULT_MISS) ] = 0xc5, /* BR_MISP_RETIRED.ALL_BRANCHES */ + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = -1, + [ C(RESULT_MISS) ] = -1, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = -1, + [ C(RESULT_MISS) ] = -1, + }, + }, +}; + +static __initconst const u64 bdw_hw_cache_extra_regs + [PERF_COUNT_HW_CACHE_MAX] + [PERF_COUNT_HW_CACHE_OP_MAX] + [PERF_COUNT_HW_CACHE_RESULT_MAX] = +{ + [ C(LL ) ] = { + [ C(OP_READ) ] = { + /* OFFCORE_RESPONSE:ALL_DATA_RD|ALL_CODE_RD */ + [ C(RESULT_ACCESS) ] = 0x2d5, + /* OFFCORE_RESPONSE:ALL_DATA_RD|ALL_CODE_RD|SUPPLIER_NONE| + L3_MISS|ANY_SNOOP */ + [ C(RESULT_MISS) ] = 0x3fbc0202d5, + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = 0x122, /* OFFCORE_RESPONSE:ALL_RFO */ + /* OFFCORE_RESPONSE:ALL_RFO|SUPPLIER_NONE|L3_MISS|ANY_SNOOP */ + [ C(RESULT_MISS) ] = 0x3fbc020122, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x0, + }, + }, +}; + static __initconst const u64 westmere_hw_cache_event_ids [PERF_COUNT_HW_CACHE_MAX] [PERF_COUNT_HW_CACHE_OP_MAX] @@ -2565,6 +2694,27 @@ __init int intel_pmu_init(void) pr_cont("Haswell events, "); break; + case 61: /* Broadwell Client */ + x86_pmu.late_ack = true; + memcpy(hw_cache_event_ids, bdw_hw_cache_event_ids, sizeof(hw_cache_event_ids)); + memcpy(hw_cache_extra_regs, bdw_hw_cache_extra_regs, sizeof(hw_cache_extra_regs)); + + intel_pmu_lbr_init_snb(); + + x86_pmu.event_constraints = intel_bdw_event_constraints; + x86_pmu.pebs_constraints = intel_hsw_pebs_event_constraints; + x86_pmu.extra_regs = intel_snbep_extra_regs; + x86_pmu.pebs_aliases = intel_pebs_aliases_snb; + /* all extra regs are per-cpu when HT is on */ + x86_pmu.er_flags |= ERF_HAS_RSP_1; + x86_pmu.er_flags |= ERF_NO_HT_SHARING; + + x86_pmu.hw_config = hsw_hw_config; + x86_pmu.get_event_constraints = hsw_get_event_constraints; + x86_pmu.cpu_events = hsw_events_attrs; + pr_cont("Broadwell events, "); + break; + default: switch (x86_pmu.version) { case 1: -- 1.9.3 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 3/5] perf, x86: Add Broadwell core support 2014-08-25 22:43 ` [PATCH 3/5] perf, x86: Add Broadwell core support Andi Kleen @ 2014-09-01 13:51 ` Peter Zijlstra 2014-11-04 13:14 ` Stephane Eranian 0 siblings, 1 reply; 25+ messages in thread From: Peter Zijlstra @ 2014-09-01 13:51 UTC (permalink / raw) To: Andi Kleen; +Cc: linux-kernel, mingo, eranian, Andi Kleen On Mon, Aug 25, 2014 at 03:43:29PM -0700, Andi Kleen wrote: > + case 61: /* Broadwell Client */ Again, no other description has "Client" in. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/5] perf, x86: Add Broadwell core support 2014-09-01 13:51 ` Peter Zijlstra @ 2014-11-04 13:14 ` Stephane Eranian 2014-11-04 13:20 ` Peter Zijlstra 0 siblings, 1 reply; 25+ messages in thread From: Stephane Eranian @ 2014-11-04 13:14 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Andi Kleen, LKML, Ingo Molnar, Andi Kleen Hi, What's happening with Broadwell support? I don't see any support in 3.18-rc2. We need this ASAP now. On Mon, Sep 1, 2014 at 3:51 PM, Peter Zijlstra <peterz@infradead.org> wrote: > On Mon, Aug 25, 2014 at 03:43:29PM -0700, Andi Kleen wrote: > >> + case 61: /* Broadwell Client */ > > Again, no other description has "Client" in. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/5] perf, x86: Add Broadwell core support 2014-11-04 13:14 ` Stephane Eranian @ 2014-11-04 13:20 ` Peter Zijlstra 0 siblings, 0 replies; 25+ messages in thread From: Peter Zijlstra @ 2014-11-04 13:20 UTC (permalink / raw) To: Stephane Eranian; +Cc: Andi Kleen, LKML, Ingo Molnar, Andi Kleen On Tue, Nov 04, 2014 at 02:14:59PM +0100, Stephane Eranian wrote: > What's happening with Broadwell support? > I don't see any support in 3.18-rc2. We need > this ASAP now. lkml.kernel.org/r/tip-1776b10627e486dd431fe72d8d47e5a865cf65d1@git.kernel.org ^ permalink raw reply [flat|nested] 25+ messages in thread
* perf, x86: Updated Broadwell patchkit @ 2014-08-27 21:03 Andi Kleen 2014-08-27 21:03 ` [PATCH 3/5] perf, x86: Add Broadwell core support Andi Kleen 0 siblings, 1 reply; 25+ messages in thread From: Andi Kleen @ 2014-08-27 21:03 UTC (permalink / raw) To: peterz; +Cc: linux-kernel, mingo, eranian, tglx Only minor changes to the last version. The Broadwell model number patch now has a more expansive change log and does not reorder case statements anymore. Some minor updates to commit logs. ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 3/5] perf, x86: Add Broadwell core support 2014-08-27 21:03 perf, x86: Updated Broadwell patchkit Andi Kleen @ 2014-08-27 21:03 ` Andi Kleen 0 siblings, 0 replies; 25+ messages in thread From: Andi Kleen @ 2014-08-27 21:03 UTC (permalink / raw) To: peterz; +Cc: linux-kernel, mingo, eranian, tglx, Andi Kleen From: Andi Kleen <ak@linux.intel.com> Add Broadwell support for Broadwell Client to perf. This is very similar to Haswell. It uses a new cache event table, because there were various changes there. The constraint list has one new event that needs to be handled over Haswell. The PEBS event list is the same, so we reuse Haswell's. v2: Remove unnamed model numbers. Signed-off-by: Andi Kleen <ak@linux.intel.com> --- arch/x86/kernel/cpu/perf_event_intel.c | 150 +++++++++++++++++++++++++++++++++ 1 file changed, 150 insertions(+) diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index 71caf22..7930b56 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -220,6 +220,15 @@ static struct event_constraint intel_hsw_event_constraints[] = { EVENT_CONSTRAINT_END }; +struct event_constraint intel_bdw_event_constraints[] = { + FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */ + FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */ + FIXED_EVENT_CONSTRAINT(0x0300, 2), /* CPU_CLK_UNHALTED.REF */ + INTEL_UEVENT_CONSTRAINT(0x148, 0x4), /* L1D_PEND_MISS.PENDING */ + INTEL_EVENT_CONSTRAINT(0xa3, 0x4), /* CYCLE_ACTIVITY.* */ + EVENT_CONSTRAINT_END +}; + static u64 intel_pmu_event_map(int hw_event) { return intel_perfmon_event_map[hw_event]; @@ -415,6 +424,126 @@ static __initconst const u64 snb_hw_cache_event_ids }; +static __initconst const u64 bdw_hw_cache_event_ids + [PERF_COUNT_HW_CACHE_MAX] + [PERF_COUNT_HW_CACHE_OP_MAX] + [PERF_COUNT_HW_CACHE_RESULT_MAX] = +{ + [ C(L1D ) ] = { + [ C(OP_READ) ] = { + [ C(RESULT_ACCESS) ] = 0x81d0, /* MEM_UOPS_RETIRED.ALL_LOADS */ + [ C(RESULT_MISS) ] = 0x151, /* L1D.REPLACEMENT */ + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = 0x82d0, /* MEM_UOPS_RETIRED.ALL_STORES */ + [ C(RESULT_MISS) ] = 0x0, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x0, + }, + }, + [ C(L1I ) ] = { + [ C(OP_READ) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x280, /* ICACHE.MISSES */ + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = -1, + [ C(RESULT_MISS) ] = -1, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x0, + }, + }, + [ C(LL ) ] = { + [ C(OP_READ) ] = { + /* OFFCORE_RESPONSE:ALL_DATA_RD|ALL_CODE_RD */ + [ C(RESULT_ACCESS) ] = 0x1b7, + /* OFFCORE_RESPONSE:ALL_DATA_RD|ALL_CODE_RD|SUPPLIER_NONE| + L3_MISS|ANY_SNOOP */ + [ C(RESULT_MISS) ] = 0x1b7, + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = 0x1b7, /* OFFCORE_RESPONSE:ALL_RFO */ + /* OFFCORE_RESPONSE:ALL_RFO|SUPPLIER_NONE|L3_MISS|ANY_SNOOP */ + [ C(RESULT_MISS) ] = 0x1b7, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x0, + }, + }, + [ C(DTLB) ] = { + [ C(OP_READ) ] = { + [ C(RESULT_ACCESS) ] = 0x81d0, /* MEM_UOPS_RETIRED.ALL_LOADS */ + [ C(RESULT_MISS) ] = 0x108, /* DTLB_LOAD_MISSES.MISS_CAUSES_A_WALK */ + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = 0x82d0, /* MEM_UOPS_RETIRED.ALL_STORES */ + [ C(RESULT_MISS) ] = 0x149, /* DTLB_STORE_MISSES.MISS_CAUSES_A_WALK */ + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x0, + }, + }, + [ C(ITLB) ] = { + [ 
C(OP_READ) ] = { + [ C(RESULT_ACCESS) ] = 0x6085, /* ITLB_MISSES.STLB_HIT */ + [ C(RESULT_MISS) ] = 0x185, /* ITLB_MISSES.MISS_CAUSES_A_WALK */ + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = -1, + [ C(RESULT_MISS) ] = -1, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = -1, + [ C(RESULT_MISS) ] = -1, + }, + }, + [ C(BPU ) ] = { + [ C(OP_READ) ] = { + [ C(RESULT_ACCESS) ] = 0xc4, /* BR_INST_RETIRED.ALL_BRANCHES */ + [ C(RESULT_MISS) ] = 0xc5, /* BR_MISP_RETIRED.ALL_BRANCHES */ + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = -1, + [ C(RESULT_MISS) ] = -1, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = -1, + [ C(RESULT_MISS) ] = -1, + }, + }, +}; + +static __initconst const u64 bdw_hw_cache_extra_regs + [PERF_COUNT_HW_CACHE_MAX] + [PERF_COUNT_HW_CACHE_OP_MAX] + [PERF_COUNT_HW_CACHE_RESULT_MAX] = +{ + [ C(LL ) ] = { + [ C(OP_READ) ] = { + /* OFFCORE_RESPONSE:ALL_DATA_RD|ALL_CODE_RD */ + [ C(RESULT_ACCESS) ] = 0x2d5, + /* OFFCORE_RESPONSE:ALL_DATA_RD|ALL_CODE_RD|SUPPLIER_NONE| + L3_MISS|ANY_SNOOP */ + [ C(RESULT_MISS) ] = 0x3fbc0202d5, + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = 0x122, /* OFFCORE_RESPONSE:ALL_RFO */ + /* OFFCORE_RESPONSE:ALL_RFO|SUPPLIER_NONE|L3_MISS|ANY_SNOOP */ + [ C(RESULT_MISS) ] = 0x3fbc020122, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x0, + }, + }, +}; + static __initconst const u64 westmere_hw_cache_event_ids [PERF_COUNT_HW_CACHE_MAX] [PERF_COUNT_HW_CACHE_OP_MAX] @@ -2565,6 +2694,27 @@ __init int intel_pmu_init(void) pr_cont("Haswell events, "); break; + case 61: /* Broadwell Client */ + x86_pmu.late_ack = true; + memcpy(hw_cache_event_ids, bdw_hw_cache_event_ids, sizeof(hw_cache_event_ids)); + memcpy(hw_cache_extra_regs, bdw_hw_cache_extra_regs, sizeof(hw_cache_extra_regs)); + + intel_pmu_lbr_init_snb(); + + x86_pmu.event_constraints = intel_bdw_event_constraints; + x86_pmu.pebs_constraints = intel_hsw_pebs_event_constraints; + x86_pmu.extra_regs = intel_snbep_extra_regs; + x86_pmu.pebs_aliases = intel_pebs_aliases_snb; + /* all extra regs are per-cpu when HT is on */ + x86_pmu.er_flags |= ERF_HAS_RSP_1; + x86_pmu.er_flags |= ERF_NO_HT_SHARING; + + x86_pmu.hw_config = hsw_hw_config; + x86_pmu.get_event_constraints = hsw_get_event_constraints; + x86_pmu.cpu_events = hsw_events_attrs; + pr_cont("Broadwell events, "); + break; + default: switch (x86_pmu.version) { case 1: -- 1.9.3 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 1/5] perf, x86: Remove incorrect model number from Haswell perf @ 2014-09-02 18:44 Andi Kleen 2014-09-02 18:44 ` [PATCH 3/5] perf, x86: Add Broadwell core support Andi Kleen 0 siblings, 1 reply; 25+ messages in thread From: Andi Kleen @ 2014-09-02 18:44 UTC (permalink / raw) To: peterz; +Cc: linux-kernel, mingo, eranian, tglx, Andi Kleen From: Andi Kleen <ak@linux.intel.com> 71 is a Broadwell, not a Haswell. The model number was added by mistake earlier. Remove it for now, until it can be re-added later with real Broadwell support. In practice it does not cause a lot of issues because the Broadwell PMU is very similar to Haswell, but some details were wrong, and it's better to handle it correctly. Signed-off-by: Andi Kleen <ak@linux.intel.com> --- arch/x86/kernel/cpu/perf_event_intel.c | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index 89bc750..f962e26 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -2544,7 +2544,6 @@ __init int intel_pmu_init(void) case 63: case 69: case 70: - case 71: x86_pmu.late_ack = true; memcpy(hw_cache_event_ids, snb_hw_cache_event_ids, sizeof(hw_cache_event_ids)); memcpy(hw_cache_extra_regs, snb_hw_cache_extra_regs, sizeof(hw_cache_extra_regs)); -- 1.9.3 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 3/5] perf, x86: Add Broadwell core support 2014-09-02 18:44 [PATCH 1/5] perf, x86: Remove incorrect model number from Haswell perf Andi Kleen @ 2014-09-02 18:44 ` Andi Kleen 0 siblings, 0 replies; 25+ messages in thread From: Andi Kleen @ 2014-09-02 18:44 UTC (permalink / raw) To: peterz; +Cc: linux-kernel, mingo, eranian, tglx, Andi Kleen From: Andi Kleen <ak@linux.intel.com> Add Broadwell support for Broadwell Client to perf. This is very similar to Haswell. It uses a new cache event table, because there were various changes there. The constraint list has one new event that needs to be handled over Haswell. The PEBS event list is the same, so we reuse Haswell's. v2: Remove unnamed model numbers. v3: Rename cache event list to hsw_*. Change names. Signed-off-by: Andi Kleen <ak@linux.intel.com> --- arch/x86/kernel/cpu/perf_event_intel.c | 150 +++++++++++++++++++++++++++++++++ 1 file changed, 150 insertions(+) diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index 7c9f78e..61b5404 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -220,6 +220,15 @@ static struct event_constraint intel_hsw_event_constraints[] = { EVENT_CONSTRAINT_END }; +struct event_constraint intel_bdw_event_constraints[] = { + FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */ + FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */ + FIXED_EVENT_CONSTRAINT(0x0300, 2), /* CPU_CLK_UNHALTED.REF */ + INTEL_UEVENT_CONSTRAINT(0x148, 0x4), /* L1D_PEND_MISS.PENDING */ + INTEL_EVENT_CONSTRAINT(0xa3, 0x4), /* CYCLE_ACTIVITY.* */ + EVENT_CONSTRAINT_END +}; + static u64 intel_pmu_event_map(int hw_event) { return intel_perfmon_event_map[hw_event]; @@ -415,6 +424,126 @@ static __initconst const u64 snb_hw_cache_event_ids }; +static __initconst const u64 hsw_hw_cache_event_ids + [PERF_COUNT_HW_CACHE_MAX] + [PERF_COUNT_HW_CACHE_OP_MAX] + [PERF_COUNT_HW_CACHE_RESULT_MAX] = +{ + [ C(L1D ) ] = { + [ C(OP_READ) ] = { + [ C(RESULT_ACCESS) ] = 0x81d0, /* MEM_UOPS_RETIRED.ALL_LOADS */ + [ C(RESULT_MISS) ] = 0x151, /* L1D.REPLACEMENT */ + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = 0x82d0, /* MEM_UOPS_RETIRED.ALL_STORES */ + [ C(RESULT_MISS) ] = 0x0, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x0, + }, + }, + [ C(L1I ) ] = { + [ C(OP_READ) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x280, /* ICACHE.MISSES */ + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = -1, + [ C(RESULT_MISS) ] = -1, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x0, + }, + }, + [ C(LL ) ] = { + [ C(OP_READ) ] = { + /* OFFCORE_RESPONSE:ALL_DATA_RD|ALL_CODE_RD */ + [ C(RESULT_ACCESS) ] = 0x1b7, + /* OFFCORE_RESPONSE:ALL_DATA_RD|ALL_CODE_RD|SUPPLIER_NONE| + L3_MISS|ANY_SNOOP */ + [ C(RESULT_MISS) ] = 0x1b7, + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = 0x1b7, /* OFFCORE_RESPONSE:ALL_RFO */ + /* OFFCORE_RESPONSE:ALL_RFO|SUPPLIER_NONE|L3_MISS|ANY_SNOOP */ + [ C(RESULT_MISS) ] = 0x1b7, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x0, + }, + }, + [ C(DTLB) ] = { + [ C(OP_READ) ] = { + [ C(RESULT_ACCESS) ] = 0x81d0, /* MEM_UOPS_RETIRED.ALL_LOADS */ + [ C(RESULT_MISS) ] = 0x108, /* DTLB_LOAD_MISSES.MISS_CAUSES_A_WALK */ + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = 0x82d0, /* MEM_UOPS_RETIRED.ALL_STORES */ + [ C(RESULT_MISS) ] = 0x149, /* DTLB_STORE_MISSES.MISS_CAUSES_A_WALK */ + }, + [ C(OP_PREFETCH) ] = { + [ 
C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x0, + }, + }, + [ C(ITLB) ] = { + [ C(OP_READ) ] = { + [ C(RESULT_ACCESS) ] = 0x6085, /* ITLB_MISSES.STLB_HIT */ + [ C(RESULT_MISS) ] = 0x185, /* ITLB_MISSES.MISS_CAUSES_A_WALK */ + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = -1, + [ C(RESULT_MISS) ] = -1, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = -1, + [ C(RESULT_MISS) ] = -1, + }, + }, + [ C(BPU ) ] = { + [ C(OP_READ) ] = { + [ C(RESULT_ACCESS) ] = 0xc4, /* BR_INST_RETIRED.ALL_BRANCHES */ + [ C(RESULT_MISS) ] = 0xc5, /* BR_MISP_RETIRED.ALL_BRANCHES */ + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = -1, + [ C(RESULT_MISS) ] = -1, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = -1, + [ C(RESULT_MISS) ] = -1, + }, + }, +}; + +static __initconst const u64 hsw_hw_cache_extra_regs + [PERF_COUNT_HW_CACHE_MAX] + [PERF_COUNT_HW_CACHE_OP_MAX] + [PERF_COUNT_HW_CACHE_RESULT_MAX] = +{ + [ C(LL ) ] = { + [ C(OP_READ) ] = { + /* OFFCORE_RESPONSE:ALL_DATA_RD|ALL_CODE_RD */ + [ C(RESULT_ACCESS) ] = 0x2d5, + /* OFFCORE_RESPONSE:ALL_DATA_RD|ALL_CODE_RD|SUPPLIER_NONE| + L3_MISS|ANY_SNOOP */ + [ C(RESULT_MISS) ] = 0x3fbc0202d5, + }, + [ C(OP_WRITE) ] = { + [ C(RESULT_ACCESS) ] = 0x122, /* OFFCORE_RESPONSE:ALL_RFO */ + /* OFFCORE_RESPONSE:ALL_RFO|SUPPLIER_NONE|L3_MISS|ANY_SNOOP */ + [ C(RESULT_MISS) ] = 0x3fbc020122, + }, + [ C(OP_PREFETCH) ] = { + [ C(RESULT_ACCESS) ] = 0x0, + [ C(RESULT_MISS) ] = 0x0, + }, + }, +}; + static __initconst const u64 westmere_hw_cache_event_ids [PERF_COUNT_HW_CACHE_MAX] [PERF_COUNT_HW_CACHE_OP_MAX] @@ -2565,6 +2694,27 @@ __init int intel_pmu_init(void) pr_cont("Haswell events, "); break; + case 61: /* 14nm Broadwell Core-M */ + x86_pmu.late_ack = true; + memcpy(hw_cache_event_ids, hsw_hw_cache_event_ids, sizeof(hw_cache_event_ids)); + memcpy(hw_cache_extra_regs, hsw_hw_cache_extra_regs, sizeof(hw_cache_extra_regs)); + + intel_pmu_lbr_init_snb(); + + x86_pmu.event_constraints = intel_bdw_event_constraints; + x86_pmu.pebs_constraints = intel_hsw_pebs_event_constraints; + x86_pmu.extra_regs = intel_snbep_extra_regs; + x86_pmu.pebs_aliases = intel_pebs_aliases_snb; + /* all extra regs are per-cpu when HT is on */ + x86_pmu.er_flags |= ERF_HAS_RSP_1; + x86_pmu.er_flags |= ERF_NO_HT_SHARING; + + x86_pmu.hw_config = hsw_hw_config; + x86_pmu.get_event_constraints = hsw_get_event_constraints; + x86_pmu.cpu_events = hsw_events_attrs; + pr_cont("Broadwell events, "); + break; + default: switch (x86_pmu.version) { case 1: -- 1.9.3 ^ permalink raw reply related [flat|nested] 25+ messages in thread
end of thread, other threads:[~2014-11-04 13:20 UTC | newest] Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-08-14 1:17 Updated Broadwell perf patchkit Andi Kleen 2014-08-14 1:17 ` [PATCH 1/5] perf, x86: Remove incorrect model number from Haswell perf Andi Kleen 2014-08-14 1:17 ` [PATCH 2/5] perf, x86: Document all Haswell models Andi Kleen 2014-08-14 7:01 ` Peter Zijlstra 2014-08-14 15:00 ` Andi Kleen 2014-08-14 1:17 ` [PATCH 3/5] perf, x86: Add Broadwell core support Andi Kleen 2014-08-14 7:07 ` Peter Zijlstra 2014-08-14 7:32 ` Peter Zijlstra 2014-08-14 14:58 ` Andi Kleen 2014-08-14 15:10 ` Peter Zijlstra 2014-08-14 1:17 ` [PATCH 4/5] perf, x86: Add INST_RETIRED.ALL workarounds Andi Kleen 2014-08-14 4:46 ` Stephane Eranian 2014-08-14 14:30 ` Andi Kleen 2014-08-14 17:47 ` Stephane Eranian 2014-08-14 18:41 ` Andi Kleen 2014-08-15 14:31 ` Peter Zijlstra 2014-08-15 17:21 ` Andi Kleen 2014-08-14 7:10 ` Peter Zijlstra 2014-08-14 1:17 ` [PATCH 5/5] perf, x86: Use Broadwell cache event list for Haswell Andi Kleen 2014-08-25 22:43 Broadwell perf support Andi Kleen 2014-08-25 22:43 ` [PATCH 3/5] perf, x86: Add Broadwell core support Andi Kleen 2014-09-01 13:51 ` Peter Zijlstra 2014-11-04 13:14 ` Stephane Eranian 2014-11-04 13:20 ` Peter Zijlstra 2014-08-27 21:03 perf, x86: Updated Broadwell patchkit Andi Kleen 2014-08-27 21:03 ` [PATCH 3/5] perf, x86: Add Broadwell core support Andi Kleen 2014-09-02 18:44 [PATCH 1/5] perf, x86: Remove incorrect model number from Haswell perf Andi Kleen 2014-09-02 18:44 ` [PATCH 3/5] perf, x86: Add Broadwell core support Andi Kleen