* Fix Skylake PEBS data source for perf v5
From: Andi Kleen @ 2017-08-16 22:21 UTC
To: peterz, acme; +Cc: jolsa, linux-kernel

Fix data source reporting for Skylake and Skylake Server. The encodings
have changed to express support for L4 and persistent memory.

The first patch is an (independent) cleanup. The second is for the kernel
and the third/fourth for perf/tools. The kernel part and perf tools will
compile independently.

Also available in
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git perf/skx-data-src-7

v1: Initial post
v2: Merged some patches. Change encoding to use special bit for each
    combination instead of modifiers.
v3: Switch to new generic lvlnum indication
v4: Repost. No changes.
v5: ported to latest tree. Retested remote HITM.

* [PATCH v5 1/4] perf/x86: Move Nehalem PEBS code to flag
From: Andi Kleen @ 2017-08-16 22:21 UTC
To: peterz, acme; +Cc: jolsa, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Minor cleanup: use an explicit x86_pmu flag to handle the
missing Lock / TLB information on Nehalem, instead of always
checking the model number for each PEBS sample.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/intel/core.c | 1 +
 arch/x86/events/intel/ds.c   | 5 +----
 arch/x86/events/perf_event.h | 3 ++-
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 98b0f0729527..c3439a36dcf9 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3905,6 +3905,7 @@ __init int intel_pmu_init(void)
 
 		intel_pmu_pebs_data_source_nhm();
 		x86_add_quirk(intel_nehalem_quirk);
+		x86_pmu.pebs_no_tlb = 1;
 
 		pr_cont("Nehalem events, ");
 		break;
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index a322fed5f8ed..3ccdf8cb4495 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -149,8 +149,6 @@ static u64 load_latency_data(u64 status)
 {
 	union intel_x86_pebs_dse dse;
 	u64 val;
-	int model = boot_cpu_data.x86_model;
-	int fam = boot_cpu_data.x86;
 
 	dse.val = status;
 
@@ -162,8 +160,7 @@ static u64 load_latency_data(u64 status)
 	/*
 	 * Nehalem models do not support TLB, Lock infos
 	 */
-	if (fam == 0x6 && (model == 26 || model == 30
-	    || model == 31 || model == 46)) {
+	if (x86_pmu.pebs_no_tlb) {
 		val |= P(TLB, NA) | P(LOCK, NA);
 		return val;
 	}
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 476aec3a4cab..2e9636e4068f 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -591,7 +591,8 @@ struct x86_pmu {
 			pebs		:1,
 			pebs_active	:1,
 			pebs_broken	:1,
-			pebs_prec_dist	:1;
+			pebs_prec_dist	:1,
+			pebs_no_tlb	:1;
 	int		pebs_record_size;
 	int		pebs_buffer_size;
 	void		(*drain_pebs)(struct pt_regs *regs);
-- 
2.9.4

* [tip:perf/core] perf/x86: Move Nehalem PEBS code to flag
From: tip-bot for Andi Kleen @ 2017-08-25 11:53 UTC
To: linux-tip-commits; +Cc: mingo, tglx, linux-kernel, peterz, hpa, ak, torvalds

Commit-ID:  95298355143f9765f0d40ed57dce7fa6571cc623
Gitweb:     http://git.kernel.org/tip/95298355143f9765f0d40ed57dce7fa6571cc623
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Wed, 16 Aug 2017 15:21:53 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 25 Aug 2017 11:04:16 +0200

perf/x86: Move Nehalem PEBS code to flag

Minor cleanup: use an explicit x86_pmu flag to handle the
missing Lock / TLB information on Nehalem, instead of always
checking the model number for each PEBS sample.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Link: http://lkml.kernel.org/r/20170816222156.19953-2-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/events/intel/core.c | 1 +
 arch/x86/events/intel/ds.c   | 5 +----
 arch/x86/events/perf_event.h | 3 ++-
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 98b0f07..c3439a3 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3905,6 +3905,7 @@ __init int intel_pmu_init(void)
 
 		intel_pmu_pebs_data_source_nhm();
 		x86_add_quirk(intel_nehalem_quirk);
+		x86_pmu.pebs_no_tlb = 1;
 
 		pr_cont("Nehalem events, ");
 		break;
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index a322fed..3ccdf8c 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -149,8 +149,6 @@ static u64 load_latency_data(u64 status)
 {
 	union intel_x86_pebs_dse dse;
 	u64 val;
-	int model = boot_cpu_data.x86_model;
-	int fam = boot_cpu_data.x86;
 
 	dse.val = status;
 
@@ -162,8 +160,7 @@ static u64 load_latency_data(u64 status)
 	/*
 	 * Nehalem models do not support TLB, Lock infos
 	 */
-	if (fam == 0x6 && (model == 26 || model == 30
-	    || model == 31 || model == 46)) {
+	if (x86_pmu.pebs_no_tlb) {
 		val |= P(TLB, NA) | P(LOCK, NA);
 		return val;
 	}
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 476aec3..2e9636e 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -591,7 +591,8 @@ struct x86_pmu {
 			pebs		:1,
 			pebs_active	:1,
 			pebs_broken	:1,
-			pebs_prec_dist	:1;
+			pebs_prec_dist	:1,
+			pebs_no_tlb	:1;
 	int		pebs_record_size;
 	int		pebs_buffer_size;
 	void		(*drain_pebs)(struct pt_regs *regs);

* [PATCH v5 2/4] perf/x86: Fix data source decoding for Skylake
From: Andi Kleen @ 2017-08-16 22:21 UTC
To: peterz, acme; +Cc: jolsa, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Skylake changed the encoding of the PEBS data source field.
Some combinations are not available anymore, but some new cases
e.g. for L4 cache hit are added.

Fix up the conversion table for Skylake, similar as had been done
for Nehalem.

On Skylake server the encoding for L4 actually means persistent
memory. Handle this case too.

To properly describe it in the abstracted perf format I had to add
some new fields. Since a hit can have only one level add a new
field that is an enumeration, not a bit field to describe
the level. It can describe any level. Some numbers are also
used to describe PMEM and LFB.

Also add a new generic remote flag that can be combined with
the generic level to signify a remote cache.

And there is an extension field for the snoop indication to handle
the Forward state.

I didn't add a generic flag for hops because it's not needed
for Skylake.

I changed the existing encodings for older CPUs to also fill in the
new level and remote fields.

v2: Merge with persistent memory patch. Add explicit bit for each
    case instead of using generic modifier.
v3: Rework with new lvlnum and remote fields. Change older CPUs to
    report the new fields too.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/intel/core.c    |  2 ++
 arch/x86/events/intel/ds.c      | 51 ++++++++++++++++++++++++++---------------
 arch/x86/events/perf_event.h    |  2 ++
 include/uapi/linux/perf_event.h | 30 ++++++++++++++++++++++--
 4 files changed, 64 insertions(+), 21 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index c3439a36dcf9..6f342001ec6a 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4208,6 +4208,8 @@ __init int intel_pmu_init(void)
 						  skl_format_attr);
 		WARN_ON(!x86_pmu.format_attrs);
 		x86_pmu.cpu_events = hsw_events_attrs;
+		intel_pmu_pebs_data_source_skl(
+			boot_cpu_data.x86_model == INTEL_FAM6_SKYLAKE_X);
 		pr_cont("Skylake events, ");
 		break;
 
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 3ccdf8cb4495..98e36e0c791c 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -49,34 +49,47 @@ union intel_x86_pebs_dse {
  */
 #define P(a, b) PERF_MEM_S(a, b)
 #define OP_LH (P(OP, LOAD) | P(LVL, HIT))
+#define LEVEL(x) P(LVLNUM, x)
+#define REM P(REMOTE, REMOTE)
 #define SNOOP_NONE_MISS (P(SNOOP, NONE) | P(SNOOP, MISS))
 
 /* Version for Sandy Bridge and later */
 static u64 pebs_data_source[] = {
-	P(OP, LOAD) | P(LVL, MISS) | P(LVL, L3) | P(SNOOP, NA),/* 0x00:ukn L3 */
-	OP_LH | P(LVL, L1) | P(SNOOP, NONE), /* 0x01: L1 local */
-	OP_LH | P(LVL, LFB) | P(SNOOP, NONE), /* 0x02: LFB hit */
-	OP_LH | P(LVL, L2) | P(SNOOP, NONE), /* 0x03: L2 hit */
-	OP_LH | P(LVL, L3) | P(SNOOP, NONE), /* 0x04: L3 hit */
-	OP_LH | P(LVL, L3) | P(SNOOP, MISS), /* 0x05: L3 hit, snoop miss */
-	OP_LH | P(LVL, L3) | P(SNOOP, HIT), /* 0x06: L3 hit, snoop hit */
-	OP_LH | P(LVL, L3) | P(SNOOP, HITM), /* 0x07: L3 hit, snoop hitm */
-	OP_LH | P(LVL, REM_CCE1) | P(SNOOP, HIT), /* 0x08: L3 miss snoop hit */
-	OP_LH | P(LVL, REM_CCE1) | P(SNOOP, HITM), /* 0x09: L3 miss snoop hitm*/
-	OP_LH | P(LVL, LOC_RAM) | P(SNOOP, HIT), /* 0x0a: L3 miss, shared */
-	OP_LH | P(LVL, REM_RAM1) | P(SNOOP, HIT), /* 0x0b: L3 miss, shared */
-	OP_LH | P(LVL, LOC_RAM) | SNOOP_NONE_MISS,/* 0x0c: L3 miss, excl */
-	OP_LH | P(LVL, REM_RAM1) | SNOOP_NONE_MISS,/* 0x0d: L3 miss, excl */
-	OP_LH | P(LVL, IO) | P(SNOOP, NONE), /* 0x0e: I/O */
-	OP_LH | P(LVL, UNC) | P(SNOOP, NONE), /* 0x0f: uncached */
+	P(OP, LOAD) | P(LVL, MISS) | LEVEL(L3) | P(SNOOP, NA),/* 0x00:ukn L3 */
+	OP_LH | P(LVL, L1) | LEVEL(L1) | P(SNOOP, NONE), /* 0x01: L1 local */
+	OP_LH | P(LVL, LFB) | LEVEL(LFB) | P(SNOOP, NONE), /* 0x02: LFB hit */
+	OP_LH | P(LVL, L2) | LEVEL(L2) | P(SNOOP, NONE), /* 0x03: L2 hit */
+	OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, NONE), /* 0x04: L3 hit */
+	OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, MISS), /* 0x05: L3 hit, snoop miss */
+	OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, HIT), /* 0x06: L3 hit, snoop hit */
+	OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, HITM), /* 0x07: L3 hit, snoop hitm */
+	OP_LH | P(LVL, REM_CCE1) | REM | LEVEL(L3) | P(SNOOP, HIT), /* 0x08: L3 miss snoop hit */
+	OP_LH | P(LVL, REM_CCE1) | REM | LEVEL(L3) | P(SNOOP, HITM), /* 0x09: L3 miss snoop hitm*/
+	OP_LH | P(LVL, LOC_RAM) | LEVEL(RAM) | P(SNOOP, HIT), /* 0x0a: L3 miss, shared */
+	OP_LH | P(LVL, REM_RAM1) | REM | LEVEL(L3) | P(SNOOP, HIT), /* 0x0b: L3 miss, shared */
+	OP_LH | P(LVL, LOC_RAM) | LEVEL(RAM) | SNOOP_NONE_MISS, /* 0x0c: L3 miss, excl */
+	OP_LH | P(LVL, REM_RAM1) | LEVEL(RAM) | REM | SNOOP_NONE_MISS, /* 0x0d: L3 miss, excl */
+	OP_LH | P(LVL, IO) | LEVEL(NA) | P(SNOOP, NONE), /* 0x0e: I/O */
+	OP_LH | P(LVL, UNC) | LEVEL(NA) | P(SNOOP, NONE), /* 0x0f: uncached */
 };
 
 /* Patch up minor differences in the bits */
 void __init intel_pmu_pebs_data_source_nhm(void)
 {
-	pebs_data_source[0x05] = OP_LH | P(LVL, L3) | P(SNOOP, HIT);
-	pebs_data_source[0x06] = OP_LH | P(LVL, L3) | P(SNOOP, HITM);
-	pebs_data_source[0x07] = OP_LH | P(LVL, L3) | P(SNOOP, HITM);
+	pebs_data_source[0x05] = OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, HIT);
+	pebs_data_source[0x06] = OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, HITM);
+	pebs_data_source[0x07] = OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, HITM);
+}
+
+void __init intel_pmu_pebs_data_source_skl(bool pmem)
+{
+	u64 pmem_or_l4 = pmem ? LEVEL(PMEM) : LEVEL(L4);
+
+	pebs_data_source[0x08] = OP_LH | pmem_or_l4 | P(SNOOP, HIT);
+	pebs_data_source[0x09] = OP_LH | pmem_or_l4 | REM | P(SNOOP, HIT);
+	pebs_data_source[0x0b] = OP_LH | LEVEL(RAM) | REM | P(SNOOP, NONE);
+	pebs_data_source[0x0c] = OP_LH | LEVEL(ANY_CACHE) | REM | P(SNOOPX, FWD);
+	pebs_data_source[0x0d] = OP_LH | LEVEL(ANY_CACHE) | REM | P(SNOOP, HITM);
 }
 
 static u64 precise_store_data(u64 status)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 2e9636e4068f..0f7dad8bd358 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -948,6 +948,8 @@ void intel_pmu_lbr_init_knl(void);
 
 void intel_pmu_pebs_data_source_nhm(void);
 
+void intel_pmu_pebs_data_source_skl(bool pmem);
+
 int intel_pmu_setup_lbr_filter(struct perf_event *event);
 
 void intel_pt_interrupt(void);
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 642db5fa3286..2a37ae925d85 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -954,14 +954,20 @@ union perf_mem_data_src {
 			mem_snoop:5,	/* snoop mode */
 			mem_lock:2,	/* lock instr */
 			mem_dtlb:7,	/* tlb access */
-			mem_rsvd:31;
+			mem_lvl_num:4,	/* memory hierarchy level number */
+			mem_remote:1,	/* remote */
+			mem_snoopx:2,	/* snoop mode, ext */
+			mem_rsvd:24;
 	};
 };
 #elif defined(__BIG_ENDIAN_BITFIELD)
 union perf_mem_data_src {
 	__u64 val;
 	struct {
-		__u64	mem_rsvd:31,
+		__u64	mem_rsvd:24,
+			mem_snoopx:2,	/* snoop mode, ext */
+			mem_remote:1,	/* remote */
+			mem_lvl_num:4,	/* memory hierarchy level number */
 			mem_dtlb:7,	/* tlb access */
 			mem_lock:2,	/* lock instr */
 			mem_snoop:5,	/* snoop mode */
@@ -998,6 +1004,22 @@ union perf_mem_data_src {
 #define PERF_MEM_LVL_UNC	0x2000 /* Uncached memory */
 #define PERF_MEM_LVL_SHIFT	5
 
+#define PERF_MEM_REMOTE_REMOTE	0x01  /* Remote */
+#define PERF_MEM_REMOTE_SHIFT	37
+
+#define PERF_MEM_LVLNUM_L1	0x01 /* L1 */
+#define PERF_MEM_LVLNUM_L2	0x02 /* L2 */
+#define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
+#define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
+/* 5-0xa available */
+#define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
+#define PERF_MEM_LVLNUM_LFB	0x0c /* LFB */
+#define PERF_MEM_LVLNUM_RAM	0x0d /* RAM */
+#define PERF_MEM_LVLNUM_PMEM	0x0e /* PMEM */
+#define PERF_MEM_LVLNUM_NA	0x0f /* N/A */
+
+#define PERF_MEM_LVLNUM_SHIFT	33
+
 /* snoop mode */
 #define PERF_MEM_SNOOP_NA	0x01 /* not available */
 #define PERF_MEM_SNOOP_NONE	0x02 /* no snoop */
@@ -1006,6 +1028,10 @@ union perf_mem_data_src {
 #define PERF_MEM_SNOOP_HITM	0x10 /* snoop hit modified */
 #define PERF_MEM_SNOOP_SHIFT	19
 
+#define PERF_MEM_SNOOPX_FWD	0x01 /* forward */
+/* 1 free */
+#define PERF_MEM_SNOOPX_SHIFT	37
+
 /* locked instruction */
 #define PERF_MEM_LOCK_NA	0x01 /* not available */
 #define PERF_MEM_LOCK_LOCKED	0x02 /* locked transaction */
-- 
2.9.4

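The shift constants in the uapi hunk above can be cross-checked against the bitfield widths. The following is a minimal sketch, not kernel code: `mem_field_shift()` and the `MEM_*` enum are hypothetical names, and the widths of mem_op (5) and mem_lvl (14) are inferred from PERF_MEM_LVL_SHIFT 5 and PERF_MEM_SNOOP_SHIFT 19 rather than shown in the diff.

```c
#include <assert.h>

/*
 * Fields of union perf_mem_data_src after this patch, lowest bits first
 * (the little-endian bitfield order). Hypothetical names for illustration.
 */
enum mem_field {
	MEM_OP, MEM_LVL, MEM_SNOOP, MEM_LOCK, MEM_DTLB,
	MEM_LVL_NUM, MEM_REMOTE, MEM_SNOOPX, MEM_RSVD, MEM_NFIELDS
};

/* Widths in bits; mem_op and mem_lvl are inferred from the existing shifts. */
static const unsigned int mem_field_width[MEM_NFIELDS] = {
	[MEM_OP] = 5, [MEM_LVL] = 14, [MEM_SNOOP] = 5, [MEM_LOCK] = 2,
	[MEM_DTLB] = 7, [MEM_LVL_NUM] = 4, [MEM_REMOTE] = 1,
	[MEM_SNOOPX] = 2, [MEM_RSVD] = 24,
};

/* Bit offset of a field = sum of the widths of all fields below it. */
static unsigned int mem_field_shift(enum mem_field f)
{
	unsigned int shift = 0;

	assert(f < MEM_NFIELDS);
	for (int i = 0; i < (int)f; i++)
		shift += mem_field_width[i];
	return shift;
}
```

By this arithmetic mem_lvl_num starts at bit 33 and mem_remote at bit 37, matching PERF_MEM_LVLNUM_SHIFT and PERF_MEM_REMOTE_SHIFT, while mem_snoopx starts at bit 38. The patch as posted defines PERF_MEM_SNOOPX_SHIFT as 37, the same value as PERF_MEM_REMOTE_SHIFT, so, assuming PERF_MEM_S() shifts the value by the matching _SHIFT constant, P(SNOOPX, FWD) would land on the mem_remote bit.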
* [tip:perf/core] perf/x86: Fix data source decoding for Skylake
From: tip-bot for Andi Kleen @ 2017-08-25 11:53 UTC
To: linux-tip-commits
Cc: mpe, mingo, peterz, tglx, ak, hpa, torvalds, linux-kernel, maddy

Commit-ID:  6ae5fa61d27dcb055f4198bcf6c8dbbf1bb33f52
Gitweb:     http://git.kernel.org/tip/6ae5fa61d27dcb055f4198bcf6c8dbbf1bb33f52
Author:     Andi Kleen <ak@linux.intel.com>
AuthorDate: Wed, 16 Aug 2017 15:21:54 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 25 Aug 2017 11:04:17 +0200

perf/x86: Fix data source decoding for Skylake

Skylake changed the encoding of the PEBS data source field.
Some combinations are not available anymore, but some new cases
e.g. for L4 cache hit are added.

Fix up the conversion table for Skylake, similar as had been done
for Nehalem.

On Skylake server the encoding for L4 actually means persistent
memory. Handle this case too.

To properly describe it in the abstracted perf format I had to add
some new fields. Since a hit can have only one level add a new
field that is an enumeration, not a bit field to describe
the level. It can describe any level. Some numbers are also
used to describe PMEM and LFB.

Also add a new generic remote flag that can be combined with
the generic level to signify a remote cache.

And there is an extension field for the snoop indication to handle
the Forward state.

I didn't add a generic flag for hops because it's not needed
for Skylake.

I changed the existing encodings for older CPUs to also fill in the
new level and remote fields.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: jolsa@kernel.org
Link: http://lkml.kernel.org/r/20170816222156.19953-3-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/events/intel/core.c    |  2 ++
 arch/x86/events/intel/ds.c      | 51 ++++++++++++++++++++++++++---------------
 arch/x86/events/perf_event.h    |  2 ++
 include/uapi/linux/perf_event.h | 30 ++++++++++++++++++++++--
 4 files changed, 64 insertions(+), 21 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index c3439a3..6f34200 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4208,6 +4208,8 @@ __init int intel_pmu_init(void)
 						  skl_format_attr);
 		WARN_ON(!x86_pmu.format_attrs);
 		x86_pmu.cpu_events = hsw_events_attrs;
+		intel_pmu_pebs_data_source_skl(
+			boot_cpu_data.x86_model == INTEL_FAM6_SKYLAKE_X);
 		pr_cont("Skylake events, ");
 		break;
 
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 3ccdf8c..98e36e0 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -49,34 +49,47 @@ union intel_x86_pebs_dse {
  */
 #define P(a, b) PERF_MEM_S(a, b)
 #define OP_LH (P(OP, LOAD) | P(LVL, HIT))
+#define LEVEL(x) P(LVLNUM, x)
+#define REM P(REMOTE, REMOTE)
 #define SNOOP_NONE_MISS (P(SNOOP, NONE) | P(SNOOP, MISS))
 
 /* Version for Sandy Bridge and later */
 static u64 pebs_data_source[] = {
-	P(OP, LOAD) | P(LVL, MISS) | P(LVL, L3) | P(SNOOP, NA),/* 0x00:ukn L3 */
-	OP_LH | P(LVL, L1) | P(SNOOP, NONE), /* 0x01: L1 local */
-	OP_LH | P(LVL, LFB) | P(SNOOP, NONE), /* 0x02: LFB hit */
-	OP_LH | P(LVL, L2) | P(SNOOP, NONE), /* 0x03: L2 hit */
-	OP_LH | P(LVL, L3) | P(SNOOP, NONE), /* 0x04: L3 hit */
-	OP_LH | P(LVL, L3) | P(SNOOP, MISS), /* 0x05: L3 hit, snoop miss */
-	OP_LH | P(LVL, L3) | P(SNOOP, HIT), /* 0x06: L3 hit, snoop hit */
-	OP_LH | P(LVL, L3) | P(SNOOP, HITM), /* 0x07: L3 hit, snoop hitm */
-	OP_LH | P(LVL, REM_CCE1) | P(SNOOP, HIT), /* 0x08: L3 miss snoop hit */
-	OP_LH | P(LVL, REM_CCE1) | P(SNOOP, HITM), /* 0x09: L3 miss snoop hitm*/
-	OP_LH | P(LVL, LOC_RAM) | P(SNOOP, HIT), /* 0x0a: L3 miss, shared */
-	OP_LH | P(LVL, REM_RAM1) | P(SNOOP, HIT), /* 0x0b: L3 miss, shared */
-	OP_LH | P(LVL, LOC_RAM) | SNOOP_NONE_MISS,/* 0x0c: L3 miss, excl */
-	OP_LH | P(LVL, REM_RAM1) | SNOOP_NONE_MISS,/* 0x0d: L3 miss, excl */
-	OP_LH | P(LVL, IO) | P(SNOOP, NONE), /* 0x0e: I/O */
-	OP_LH | P(LVL, UNC) | P(SNOOP, NONE), /* 0x0f: uncached */
+	P(OP, LOAD) | P(LVL, MISS) | LEVEL(L3) | P(SNOOP, NA),/* 0x00:ukn L3 */
+	OP_LH | P(LVL, L1) | LEVEL(L1) | P(SNOOP, NONE), /* 0x01: L1 local */
+	OP_LH | P(LVL, LFB) | LEVEL(LFB) | P(SNOOP, NONE), /* 0x02: LFB hit */
+	OP_LH | P(LVL, L2) | LEVEL(L2) | P(SNOOP, NONE), /* 0x03: L2 hit */
+	OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, NONE), /* 0x04: L3 hit */
+	OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, MISS), /* 0x05: L3 hit, snoop miss */
+	OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, HIT), /* 0x06: L3 hit, snoop hit */
+	OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, HITM), /* 0x07: L3 hit, snoop hitm */
+	OP_LH | P(LVL, REM_CCE1) | REM | LEVEL(L3) | P(SNOOP, HIT), /* 0x08: L3 miss snoop hit */
+	OP_LH | P(LVL, REM_CCE1) | REM | LEVEL(L3) | P(SNOOP, HITM), /* 0x09: L3 miss snoop hitm*/
+	OP_LH | P(LVL, LOC_RAM) | LEVEL(RAM) | P(SNOOP, HIT), /* 0x0a: L3 miss, shared */
+	OP_LH | P(LVL, REM_RAM1) | REM | LEVEL(L3) | P(SNOOP, HIT), /* 0x0b: L3 miss, shared */
+	OP_LH | P(LVL, LOC_RAM) | LEVEL(RAM) | SNOOP_NONE_MISS, /* 0x0c: L3 miss, excl */
+	OP_LH | P(LVL, REM_RAM1) | LEVEL(RAM) | REM | SNOOP_NONE_MISS, /* 0x0d: L3 miss, excl */
+	OP_LH | P(LVL, IO) | LEVEL(NA) | P(SNOOP, NONE), /* 0x0e: I/O */
+	OP_LH | P(LVL, UNC) | LEVEL(NA) | P(SNOOP, NONE), /* 0x0f: uncached */
 };
 
 /* Patch up minor differences in the bits */
 void __init intel_pmu_pebs_data_source_nhm(void)
 {
-	pebs_data_source[0x05] = OP_LH | P(LVL, L3) | P(SNOOP, HIT);
-	pebs_data_source[0x06] = OP_LH | P(LVL, L3) | P(SNOOP, HITM);
-	pebs_data_source[0x07] = OP_LH | P(LVL, L3) | P(SNOOP, HITM);
+	pebs_data_source[0x05] = OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, HIT);
+	pebs_data_source[0x06] = OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, HITM);
+	pebs_data_source[0x07] = OP_LH | P(LVL, L3) | LEVEL(L3) | P(SNOOP, HITM);
+}
+
+void __init intel_pmu_pebs_data_source_skl(bool pmem)
+{
+	u64 pmem_or_l4 = pmem ? LEVEL(PMEM) : LEVEL(L4);
+
+	pebs_data_source[0x08] = OP_LH | pmem_or_l4 | P(SNOOP, HIT);
+	pebs_data_source[0x09] = OP_LH | pmem_or_l4 | REM | P(SNOOP, HIT);
+	pebs_data_source[0x0b] = OP_LH | LEVEL(RAM) | REM | P(SNOOP, NONE);
+	pebs_data_source[0x0c] = OP_LH | LEVEL(ANY_CACHE) | REM | P(SNOOPX, FWD);
+	pebs_data_source[0x0d] = OP_LH | LEVEL(ANY_CACHE) | REM | P(SNOOP, HITM);
 }
 
 static u64 precise_store_data(u64 status)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 2e9636e..0f7dad8 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -948,6 +948,8 @@ void intel_pmu_lbr_init_knl(void);
 
 void intel_pmu_pebs_data_source_nhm(void);
 
+void intel_pmu_pebs_data_source_skl(bool pmem);
+
 int intel_pmu_setup_lbr_filter(struct perf_event *event);
 
 void intel_pt_interrupt(void);
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 642db5f..2a37ae9 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -954,14 +954,20 @@ union perf_mem_data_src {
 			mem_snoop:5,	/* snoop mode */
 			mem_lock:2,	/* lock instr */
 			mem_dtlb:7,	/* tlb access */
-			mem_rsvd:31;
+			mem_lvl_num:4,	/* memory hierarchy level number */
+			mem_remote:1,	/* remote */
+			mem_snoopx:2,	/* snoop mode, ext */
+			mem_rsvd:24;
 	};
 };
 #elif defined(__BIG_ENDIAN_BITFIELD)
 union perf_mem_data_src {
 	__u64 val;
 	struct {
-		__u64	mem_rsvd:31,
+		__u64	mem_rsvd:24,
+			mem_snoopx:2,	/* snoop mode, ext */
+			mem_remote:1,	/* remote */
+			mem_lvl_num:4,	/* memory hierarchy level number */
 			mem_dtlb:7,	/* tlb access */
 			mem_lock:2,	/* lock instr */
 			mem_snoop:5,	/* snoop mode */
@@ -998,6 +1004,22 @@ union perf_mem_data_src {
 #define PERF_MEM_LVL_UNC	0x2000 /* Uncached memory */
 #define PERF_MEM_LVL_SHIFT	5
 
+#define PERF_MEM_REMOTE_REMOTE	0x01  /* Remote */
+#define PERF_MEM_REMOTE_SHIFT	37
+
+#define PERF_MEM_LVLNUM_L1	0x01 /* L1 */
+#define PERF_MEM_LVLNUM_L2	0x02 /* L2 */
+#define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
+#define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
+/* 5-0xa available */
+#define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
+#define PERF_MEM_LVLNUM_LFB	0x0c /* LFB */
+#define PERF_MEM_LVLNUM_RAM	0x0d /* RAM */
+#define PERF_MEM_LVLNUM_PMEM	0x0e /* PMEM */
+#define PERF_MEM_LVLNUM_NA	0x0f /* N/A */
+
+#define PERF_MEM_LVLNUM_SHIFT	33
+
 /* snoop mode */
 #define PERF_MEM_SNOOP_NA	0x01 /* not available */
 #define PERF_MEM_SNOOP_NONE	0x02 /* no snoop */
@@ -1006,6 +1028,10 @@ union perf_mem_data_src {
 #define PERF_MEM_SNOOP_HITM	0x10 /* snoop hit modified */
 #define PERF_MEM_SNOOP_SHIFT	19
 
+#define PERF_MEM_SNOOPX_FWD	0x01 /* forward */
+/* 1 free */
+#define PERF_MEM_SNOOPX_SHIFT	37
+
 /* locked instruction */
 #define PERF_MEM_LOCK_NA	0x01 /* not available */
 #define PERF_MEM_LOCK_LOCKED	0x02 /* locked transaction */

* [PATCH v5 3/4] perf, tools: Add support for printing new mem_info encodings
From: Andi Kleen @ 2017-08-16 22:21 UTC
To: peterz, acme; +Cc: jolsa, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add decoding for the new lvlnum and snoopx mem_info fields
added earlier to the kernel so that "perf mem report" and other
tools can print them properly.

v2: Merge with persistent memory patch. Switch to new bit encoding
    for each combination.
v3: Switch to generic lvlnum field.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/include/uapi/linux/perf_event.h | 30 ++++++++++++++++++++++--
 tools/perf/util/mem-events.c          | 43 ++++++++++++++++++++++++++++++++---
 2 files changed, 68 insertions(+), 5 deletions(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 642db5fa3286..2a37ae925d85 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -954,14 +954,20 @@ union perf_mem_data_src {
 			mem_snoop:5,	/* snoop mode */
 			mem_lock:2,	/* lock instr */
 			mem_dtlb:7,	/* tlb access */
-			mem_rsvd:31;
+			mem_lvl_num:4,	/* memory hierarchy level number */
+			mem_remote:1,	/* remote */
+			mem_snoopx:2,	/* snoop mode, ext */
+			mem_rsvd:24;
 	};
 };
 #elif defined(__BIG_ENDIAN_BITFIELD)
 union perf_mem_data_src {
 	__u64 val;
 	struct {
-		__u64	mem_rsvd:31,
+		__u64	mem_rsvd:24,
+			mem_snoopx:2,	/* snoop mode, ext */
+			mem_remote:1,	/* remote */
+			mem_lvl_num:4,	/* memory hierarchy level number */
 			mem_dtlb:7,	/* tlb access */
 			mem_lock:2,	/* lock instr */
 			mem_snoop:5,	/* snoop mode */
@@ -998,6 +1004,22 @@ union perf_mem_data_src {
 #define PERF_MEM_LVL_UNC	0x2000 /* Uncached memory */
 #define PERF_MEM_LVL_SHIFT	5
 
+#define PERF_MEM_REMOTE_REMOTE	0x01  /* Remote */
+#define PERF_MEM_REMOTE_SHIFT	37
+
+#define PERF_MEM_LVLNUM_L1	0x01 /* L1 */
+#define PERF_MEM_LVLNUM_L2	0x02 /* L2 */
+#define PERF_MEM_LVLNUM_L3	0x03 /* L3 */
+#define PERF_MEM_LVLNUM_L4	0x04 /* L4 */
+/* 5-0xa available */
+#define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */
+#define PERF_MEM_LVLNUM_LFB	0x0c /* LFB */
+#define PERF_MEM_LVLNUM_RAM	0x0d /* RAM */
+#define PERF_MEM_LVLNUM_PMEM	0x0e /* PMEM */
+#define PERF_MEM_LVLNUM_NA	0x0f /* N/A */
+
+#define PERF_MEM_LVLNUM_SHIFT	33
+
 /* snoop mode */
 #define PERF_MEM_SNOOP_NA	0x01 /* not available */
 #define PERF_MEM_SNOOP_NONE	0x02 /* no snoop */
@@ -1006,6 +1028,10 @@ union perf_mem_data_src {
 #define PERF_MEM_SNOOP_HITM	0x10 /* snoop hit modified */
 #define PERF_MEM_SNOOP_SHIFT	19
 
+#define PERF_MEM_SNOOPX_FWD	0x01 /* forward */
+/* 1 free */
+#define PERF_MEM_SNOOPX_SHIFT	37
+
 /* locked instruction */
 #define PERF_MEM_LOCK_NA	0x01 /* not available */
 #define PERF_MEM_LOCK_LOCKED	0x02 /* locked transaction */
diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index 06f5a3a4295c..ced4f3fff035 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -166,11 +166,20 @@ static const char * const mem_lvl[] = {
 	"Uncached",
 };
 
+static const char * const mem_lvlnum[] = {
+	[PERF_MEM_LVLNUM_ANY_CACHE] = "Any cache",
+	[PERF_MEM_LVLNUM_LFB] = "LFB",
+	[PERF_MEM_LVLNUM_RAM] = "RAM",
+	[PERF_MEM_LVLNUM_PMEM] = "PMEM",
+	[PERF_MEM_LVLNUM_NA] = "N/A",
+};
+
 int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
 {
 	size_t i, l = 0;
 	u64 m = PERF_MEM_LVL_NA;
 	u64 hit, miss;
+	int printed;
 
 	if (mem_info)
 		m = mem_info->data_src.mem_lvl;
@@ -184,17 +193,37 @@ int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
 	/* already taken care of */
 	m &= ~(PERF_MEM_LVL_HIT|PERF_MEM_LVL_MISS);
 
+
+	if (mem_info && mem_info->data_src.mem_remote) {
+		strcat(out, "Remote ");
+		l += 7;
+	}
+
+	printed = 0;
 	for (i = 0; m && i < ARRAY_SIZE(mem_lvl); i++, m >>= 1) {
 		if (!(m & 0x1))
 			continue;
-		if (l) {
+		if (printed++) {
 			strcat(out, " or ");
 			l += 4;
 		}
 		l += scnprintf(out + l, sz - l, mem_lvl[i]);
 	}
-	if (*out == '\0')
-		l += scnprintf(out, sz - l, "N/A");
+
+	if (mem_info && mem_info->data_src.mem_lvl_num) {
+		int lvl = mem_info->data_src.mem_lvl_num;
+		if (printed++) {
+			strcat(out, " or ");
+			l += 4;
+		}
+		if (mem_lvlnum[lvl])
+			l += scnprintf(out + l, sz - l, mem_lvlnum[lvl]);
+		else
+			l += scnprintf(out + l, sz - l, "L%d", lvl);
+	}
+
+	if (l == 0)
+		l += scnprintf(out + l, sz - l, "N/A");
 	if (hit)
 		l += scnprintf(out + l, sz - l, " hit");
 	if (miss)
@@ -231,6 +260,14 @@ int perf_mem__snp_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
 		}
 		l += scnprintf(out + l, sz - l, snoop_access[i]);
 	}
+	if (mem_info &&
+	     (mem_info->data_src.mem_snoopx & PERF_MEM_SNOOPX_FWD)) {
+		if (l) {
+			strcat(out, " or ");
+			l += 4;
+		}
+		l += scnprintf(out + l, sz - l, "Fwd");
+	}
 
 	if (*out == '\0')
 		l += scnprintf(out, sz - l, "N/A");
-- 
2.9.4

* Re: [PATCH v5 3/4] perf, tools: Add support for printing new mem_info encodings
From: Jiri Olsa @ 2017-08-23 13:01 UTC
To: Andi Kleen; +Cc: peterz, acme, jolsa, linux-kernel, Andi Kleen, Joe Mario

On Wed, Aug 16, 2017 at 03:21:55PM -0700, Andi Kleen wrote:

SNIP

>  int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
>  {
>  	size_t i, l = 0;
>  	u64 m = PERF_MEM_LVL_NA;
>  	u64 hit, miss;
> +	int printed;
>  
>  	if (mem_info)
>  		m = mem_info->data_src.mem_lvl;
> @@ -184,17 +193,37 @@ int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info)
>  	/* already taken care of */
>  	m &= ~(PERF_MEM_LVL_HIT|PERF_MEM_LVL_MISS);
>  
> +
> +	if (mem_info && mem_info->data_src.mem_remote) {
> +		strcat(out, "Remote ");
> +		l += 7;
> +	}

Andi,
how is this 'Remote' different from the remote levels in mem_lvl?

	"Remote RAM (1 hop)",
	"Remote RAM (2 hops)",
	"Remote Cache (1 hop)",
	"Remote Cache (2 hops)",

thanks,
jirka

> +
> +	printed = 0;
>  	for (i = 0; m && i < ARRAY_SIZE(mem_lvl); i++, m >>= 1) {
>  		if (!(m & 0x1))
>  			continue;
> -		if (l) {
> +		if (printed++) {
>  			strcat(out, " or ");
>  			l += 4;
>  		}
>  		l += scnprintf(out + l, sz - l, mem_lvl[i]);
>  	}
> -	if (*out == '\0')
> -		l += scnprintf(out, sz - l, "N/A");
> +
> +	if (mem_info && mem_info->data_src.mem_lvl_num) {
> +		int lvl = mem_info->data_src.mem_lvl_num;
> +		if (printed++) {
> +			strcat(out, " or ");
> +			l += 4;
> +		}
> +		if (mem_lvlnum[lvl])
> +			l += scnprintf(out + l, sz - l, mem_lvlnum[lvl]);
> +		else
> +			l += scnprintf(out + l, sz - l, "L%d", lvl);
> +	}

* Re: [PATCH v5 3/4] perf, tools: Add support for printing new mem_info encodings
From: Andi Kleen @ 2017-08-23 14:00 UTC
To: Jiri Olsa; +Cc: Andi Kleen, peterz, acme, jolsa, linux-kernel, Joe Mario

> Andi,
> how is this 'Remote' different from the remote levels in mem_lvl?
> 
> 	"Remote RAM (1 hop)",
> 	"Remote RAM (2 hops)",
> 	"Remote Cache (1 hop)",
> 	"Remote Cache (2 hops)",

It applies to any other level. This is needed to express
"Remote unknown level", as is reported by Skylake.

-Andi

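To make the "remote flag plus generic level" combination concrete, here is a minimal standalone sketch of how a tool might render it. It is not the perf implementation: `format_level()` is a hypothetical helper, the level values are copied from the PERF_MEM_LVLNUM_* defines in patch 2/4, and printing "N/A" for a zero level number is a choice made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Values copied from the PERF_MEM_LVLNUM_* defines in the patch. */
enum {
	LVLNUM_L1 = 0x01, LVLNUM_L2 = 0x02, LVLNUM_L3 = 0x03, LVLNUM_L4 = 0x04,
	LVLNUM_ANY_CACHE = 0x0b, LVLNUM_LFB = 0x0c, LVLNUM_RAM = 0x0d,
	LVLNUM_PMEM = 0x0e, LVLNUM_NA = 0x0f,
};

/* Named levels; unnamed entries (L1..L4 and the free range) stay NULL. */
static const char * const lvlnum_name[16] = {
	[LVLNUM_ANY_CACHE] = "Any cache",
	[LVLNUM_LFB] = "LFB",
	[LVLNUM_RAM] = "RAM",
	[LVLNUM_PMEM] = "PMEM",
	[LVLNUM_NA] = "N/A",
};

/*
 * Hypothetical helper: format the 4-bit level number with an optional
 * "Remote " prefix. Returns what snprintf() returns.
 */
static int format_level(char *out, size_t sz, unsigned int lvl_num, int remote)
{
	const char *prefix = remote ? "Remote " : "";

	assert(out && sz > 0);
	lvl_num &= 0xf;			/* mem_lvl_num is a 4-bit field */
	if (!lvl_num)			/* sketch-only choice for "not set" */
		return snprintf(out, sz, "%sN/A", prefix);
	if (lvlnum_name[lvl_num])
		return snprintf(out, sz, "%s%s", prefix, lvlnum_name[lvl_num]);
	return snprintf(out, sz, "%sL%u", prefix, lvl_num);
}
```

With these assumptions, the Skylake 0x0d entry from patch 2/4 (LEVEL(ANY_CACHE) | REM) would render as "Remote Any cache", which is the "remote, unknown cache level" case that the per-hop mem_lvl bits cannot express.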
* Re: [PATCH v5 3/4] perf, tools: Add support for printing new mem_info encodings 2017-08-23 14:00 ` Andi Kleen @ 2017-08-23 14:14 ` Jiri Olsa 2017-08-23 15:59 ` Andi Kleen 0 siblings, 1 reply; 15+ messages in thread From: Jiri Olsa @ 2017-08-23 14:14 UTC (permalink / raw) To: Andi Kleen; +Cc: Andi Kleen, peterz, acme, jolsa, linux-kernel, Joe Mario On Wed, Aug 23, 2017 at 07:00:32AM -0700, Andi Kleen wrote: > > Andi, > > how is this 'Remote' different from the remote levels in mem_lvl? > > > > "Remote RAM (1 hop)", > > "Remote RAM (2 hops)", > > "Remote Cache (1 hop)", > > "Remote Cache (2 hops)", > > It applies to any other level. This is needed to express > "Remote unknown level", as is reported by Skylake. so if I find HITM with this flag set I should count it as remote HITM then? something like attached.. untested thanks, jirka --- diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c index 06f5a3a4295c..65e22b9e59f9 100644 --- a/tools/perf/util/mem-events.c +++ b/tools/perf/util/mem-events.c @@ -279,6 +279,7 @@ int c2c_decode_stats(struct c2c_stats *stats, struct mem_info *mi) u64 lvl = data_src->mem_lvl; u64 snoop = data_src->mem_snoop; u64 lock = data_src->mem_lock; + bool mr = data_src->mem_remote; int err = 0; #define HITM_INC(__f) \ @@ -324,7 +325,7 @@ do { \ } if ((lvl & P(LVL, REM_RAM1)) || - (lvl & P(LVL, REM_RAM2))) { + (lvl & P(LVL, REM_RAM2)) || mr) { stats->rmt_dram++; if (snoop & P(SNOOP, HIT)) stats->ld_shared++; @@ -334,7 +335,7 @@ do { \ } if ((lvl & P(LVL, REM_CCE1)) || - (lvl & P(LVL, REM_CCE2))) { + (lvl & P(LVL, REM_CCE2)) || mr) { if (snoop & P(SNOOP, HIT)) stats->rmt_hit++; else if (snoop & P(SNOOP, HITM)) ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v5 3/4] perf, tools: Add support for printing new mem_info encodings 2017-08-23 14:14 ` Jiri Olsa @ 2017-08-23 15:59 ` Andi Kleen 2017-08-24 8:23 ` Jiri Olsa 0 siblings, 1 reply; 15+ messages in thread From: Andi Kleen @ 2017-08-23 15:59 UTC (permalink / raw) To: Jiri Olsa Cc: Andi Kleen, Andi Kleen, peterz, acme, jolsa, linux-kernel, Joe Mario > so if I find HITM with this flag set I should count it > as remote HITM then? something like attached.. untested You mean for c2c? Yes looks reasonable. -Andi > > thanks, > jirka > > --- > diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c > index 06f5a3a4295c..65e22b9e59f9 100644 > --- a/tools/perf/util/mem-events.c > +++ b/tools/perf/util/mem-events.c > @@ -279,6 +279,7 @@ int c2c_decode_stats(struct c2c_stats *stats, struct mem_info *mi) > u64 lvl = data_src->mem_lvl; > u64 snoop = data_src->mem_snoop; > u64 lock = data_src->mem_lock; > + bool mr = data_src->mem_remote; > int err = 0; > > #define HITM_INC(__f) \ > @@ -324,7 +325,7 @@ do { \ > } > > if ((lvl & P(LVL, REM_RAM1)) || > - (lvl & P(LVL, REM_RAM2))) { > + (lvl & P(LVL, REM_RAM2)) || mr) { > stats->rmt_dram++; > if (snoop & P(SNOOP, HIT)) > stats->ld_shared++; > @@ -334,7 +335,7 @@ do { \ > } > > if ((lvl & P(LVL, REM_CCE1)) || > - (lvl & P(LVL, REM_CCE2))) { > + (lvl & P(LVL, REM_CCE2)) || mr) { > if (snoop & P(SNOOP, HIT)) > stats->rmt_hit++; > else if (snoop & P(SNOOP, HITM)) > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v5 3/4] perf, tools: Add support for printing new mem_info encodings
  2017-08-23 15:59 ` Andi Kleen
@ 2017-08-24 8:23 ` Jiri Olsa
  0 siblings, 0 replies; 15+ messages in thread
From: Jiri Olsa @ 2017-08-24 8:23 UTC (permalink / raw)
To: Andi Kleen; +Cc: Andi Kleen, peterz, acme, jolsa, linux-kernel, Joe Mario

On Wed, Aug 23, 2017 at 08:59:00AM -0700, Andi Kleen wrote:
> > so if I find HITM with this flag set I should count it
> > as remote HITM then? something like attached.. untested
>
> You mean for c2c? Yes looks reasonable.

yes, it seems to fix c2c to find remote HITMs again on skylake..
I'll post the full patch soon

thanks,
jirka

^ permalink raw reply [flat|nested] 15+ messages in thread
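The rule agreed on in this sub-thread can be summarized as: a sample counts as remote if either one of the legacy remote bits is set in mem_lvl or the new mem_remote flag is set, and a remote HITM is a remote sample whose snoop field reports HITM. Below is a minimal standalone sketch of that rule, not the c2c code itself. The struct `sample_src`, the helpers `sample_is_remote()` and `sample_is_remote_hitm()`, and the `EX_` constants are hypothetical names used only for illustration; the constant values are assumed to mirror the corresponding PERF_MEM_LVL_REM_* and PERF_MEM_SNOOP_HITM defines in perf_event.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins, assumed to match the PERF_MEM_LVL_REM_* bits. */
#define EX_LVL_REM_RAM1  0x100u  /* Remote RAM (1 hop) */
#define EX_LVL_REM_RAM2  0x200u  /* Remote RAM (2 hops) */
#define EX_LVL_REM_CCE1  0x400u  /* Remote Cache (1 hop) */
#define EX_LVL_REM_CCE2  0x800u  /* Remote Cache (2 hops) */
#define EX_SNOOP_HITM    0x10u   /* snoop hit modified */

/* Hypothetical, simplified view of the fields discussed in the thread. */
struct sample_src {
	uint32_t mem_lvl;    /* legacy level bitmask */
	uint32_t mem_snoop;  /* snoop bitmask */
	bool     mem_remote; /* new "remote" flag, valid for any level */
};

/* Remote if a legacy remote level bit is set OR the new flag is set. */
static bool sample_is_remote(const struct sample_src *s)
{
	const uint32_t legacy_remote = EX_LVL_REM_RAM1 | EX_LVL_REM_RAM2 |
				       EX_LVL_REM_CCE1 | EX_LVL_REM_CCE2;

	return (s->mem_lvl & legacy_remote) != 0 || s->mem_remote;
}

/* Remote HITM: a remote sample whose snoop result is "hit modified". */
static bool sample_is_remote_hitm(const struct sample_src *s)
{
	return sample_is_remote(s) && (s->mem_snoop & EX_SNOOP_HITM) != 0;
}
```

The sketch deliberately keeps one predicate for "remote" so that the flag cannot be counted once as remote DRAM and once as remote cache; how the real c2c statistics split those cases is left to the full patch mentioned above.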
* [tip:perf/core] perf tools: Add support for printing new mem_info encodings 2017-08-16 22:21 ` [PATCH v5 3/4] perf, tools: Add support for printing new mem_info encodings Andi Kleen 2017-08-23 13:01 ` Jiri Olsa @ 2017-08-24 8:23 ` tip-bot for Andi Kleen 1 sibling, 0 replies; 15+ messages in thread From: tip-bot for Andi Kleen @ 2017-08-24 8:23 UTC (permalink / raw) To: linux-tip-commits; +Cc: linux-kernel, acme, mingo, peterz, hpa, ak, tglx, jolsa Commit-ID: 52839e653b5629bd237ad2ecdd8fdd897fcd5712 Gitweb: http://git.kernel.org/tip/52839e653b5629bd237ad2ecdd8fdd897fcd5712 Author: Andi Kleen <ak@linux.intel.com> AuthorDate: Wed, 16 Aug 2017 15:21:55 -0700 Committer: Arnaldo Carvalho de Melo <acme@redhat.com> CommitDate: Tue, 22 Aug 2017 12:30:25 -0300 perf tools: Add support for printing new mem_info encodings Add decoding for the new "lvlx" and "snoopx" meminfo fields added earlier to the kernel so that "perf mem report" and other tools can print it properly. v2: Merge with persistent memory patch. Switch to new bit encoding for each combination. v3: Switch to generic lvlnum field. 
Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170816222156.19953-4-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/include/uapi/linux/perf_event.h | 30 ++++++++++++++++++++++-- tools/perf/util/mem-events.c | 43 ++++++++++++++++++++++++++++++++--- 2 files changed, 68 insertions(+), 5 deletions(-) diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h index 642db5f..2a37ae9 100644 --- a/tools/include/uapi/linux/perf_event.h +++ b/tools/include/uapi/linux/perf_event.h @@ -954,14 +954,20 @@ union perf_mem_data_src { mem_snoop:5, /* snoop mode */ mem_lock:2, /* lock instr */ mem_dtlb:7, /* tlb access */ - mem_rsvd:31; + mem_lvl_num:4, /* memory hierarchy level number */ + mem_remote:1, /* remote */ + mem_snoopx:2, /* snoop mode, ext */ + mem_rsvd:24; }; }; #elif defined(__BIG_ENDIAN_BITFIELD) union perf_mem_data_src { __u64 val; struct { - __u64 mem_rsvd:31, + __u64 mem_rsvd:24, + mem_snoopx:2, /* snoop mode, ext */ + mem_remote:1, /* remote */ + mem_lvl_num:4, /* memory hierarchy level number */ mem_dtlb:7, /* tlb access */ mem_lock:2, /* lock instr */ mem_snoop:5, /* snoop mode */ @@ -998,6 +1004,22 @@ union perf_mem_data_src { #define PERF_MEM_LVL_UNC 0x2000 /* Uncached memory */ #define PERF_MEM_LVL_SHIFT 5 +#define PERF_MEM_REMOTE_REMOTE 0x01 /* Remote */ +#define PERF_MEM_REMOTE_SHIFT 37 + +#define PERF_MEM_LVLNUM_L1 0x01 /* L1 */ +#define PERF_MEM_LVLNUM_L2 0x02 /* L2 */ +#define PERF_MEM_LVLNUM_L3 0x03 /* L3 */ +#define PERF_MEM_LVLNUM_L4 0x04 /* L4 */ +/* 5-0xa available */ +#define PERF_MEM_LVLNUM_ANY_CACHE 0x0b /* Any cache */ +#define PERF_MEM_LVLNUM_LFB 0x0c /* LFB */ +#define PERF_MEM_LVLNUM_RAM 0x0d /* RAM */ +#define PERF_MEM_LVLNUM_PMEM 0x0e /* PMEM */ +#define PERF_MEM_LVLNUM_NA 0x0f /* N/A */ + +#define PERF_MEM_LVLNUM_SHIFT 33 + /* snoop mode */ #define 
PERF_MEM_SNOOP_NA 0x01 /* not available */ #define PERF_MEM_SNOOP_NONE 0x02 /* no snoop */ @@ -1006,6 +1028,10 @@ union perf_mem_data_src { #define PERF_MEM_SNOOP_HITM 0x10 /* snoop hit modified */ #define PERF_MEM_SNOOP_SHIFT 19 +#define PERF_MEM_SNOOPX_FWD 0x01 /* forward */ +/* 1 free */ +#define PERF_MEM_SNOOPX_SHIFT 37 + /* locked instruction */ #define PERF_MEM_LOCK_NA 0x01 /* not available */ #define PERF_MEM_LOCK_LOCKED 0x02 /* locked transaction */ diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c index 06f5a3a..ced4f3f 100644 --- a/tools/perf/util/mem-events.c +++ b/tools/perf/util/mem-events.c @@ -166,11 +166,20 @@ static const char * const mem_lvl[] = { "Uncached", }; +static const char * const mem_lvlnum[] = { + [PERF_MEM_LVLNUM_ANY_CACHE] = "Any cache", + [PERF_MEM_LVLNUM_LFB] = "LFB", + [PERF_MEM_LVLNUM_RAM] = "RAM", + [PERF_MEM_LVLNUM_PMEM] = "PMEM", + [PERF_MEM_LVLNUM_NA] = "N/A", +}; + int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info) { size_t i, l = 0; u64 m = PERF_MEM_LVL_NA; u64 hit, miss; + int printed; if (mem_info) m = mem_info->data_src.mem_lvl; @@ -184,17 +193,37 @@ int perf_mem__lvl_scnprintf(char *out, size_t sz, struct mem_info *mem_info) /* already taken care of */ m &= ~(PERF_MEM_LVL_HIT|PERF_MEM_LVL_MISS); + + if (mem_info && mem_info->data_src.mem_remote) { + strcat(out, "Remote "); + l += 7; + } + + printed = 0; for (i = 0; m && i < ARRAY_SIZE(mem_lvl); i++, m >>= 1) { if (!(m & 0x1)) continue; - if (l) { + if (printed++) { strcat(out, " or "); l += 4; } l += scnprintf(out + l, sz - l, mem_lvl[i]); } - if (*out == '\0') - l += scnprintf(out, sz - l, "N/A"); + + if (mem_info && mem_info->data_src.mem_lvl_num) { + int lvl = mem_info->data_src.mem_lvl_num; + if (printed++) { + strcat(out, " or "); + l += 4; + } + if (mem_lvlnum[lvl]) + l += scnprintf(out + l, sz - l, mem_lvlnum[lvl]); + else + l += scnprintf(out + l, sz - l, "L%d", lvl); + } + + if (l == 0) + l += scnprintf(out + l, sz 
- l, "N/A"); if (hit) l += scnprintf(out + l, sz - l, " hit"); if (miss) @@ -231,6 +260,14 @@ int perf_mem__snp_scnprintf(char *out, size_t sz, struct mem_info *mem_info) } l += scnprintf(out + l, sz - l, snoop_access[i]); } + if (mem_info && + (mem_info->data_src.mem_snoopx & PERF_MEM_SNOOPX_FWD)) { + if (l) { + strcat(out, " or "); + l += 4; + } + l += scnprintf(out + l, sz - l, "Fwd"); + } if (*out == '\0') l += scnprintf(out, sz - l, "N/A"); ^ permalink raw reply related [flat|nested] 15+ messages in thread
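To make the naming rule in the patch above easier to follow: the 4-bit mem_lvl_num value is looked up in a sparse name table, and any value without a table entry falls back to "L<n>", so 4 prints as "L4" while 0x0e prints as "PMEM". The following is a simplified standalone sketch of just that lookup plus the "Remote " prefix. The function `ex_format_level()` and the `EX_` names are hypothetical; the numeric values are copied from the PERF_MEM_LVLNUM_* defines in the patch. Unlike perf_mem__lvl_scnprintf(), the sketch ignores the legacy mem_lvl bits and the hit/miss suffix, and its handling of an unset level number (0) is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Values copied from the PERF_MEM_LVLNUM_* defines in the patch. */
enum {
	EX_LVLNUM_ANY_CACHE = 0x0b,
	EX_LVLNUM_LFB       = 0x0c,
	EX_LVLNUM_RAM       = 0x0d,
	EX_LVLNUM_PMEM      = 0x0e,
	EX_LVLNUM_NA        = 0x0f,
};

/* Sparse table: entries left NULL fall back to "L<n>". */
static const char *const ex_lvlnum_name[16] = {
	[EX_LVLNUM_ANY_CACHE] = "Any cache",
	[EX_LVLNUM_LFB]       = "LFB",
	[EX_LVLNUM_RAM]       = "RAM",
	[EX_LVLNUM_PMEM]      = "PMEM",
	[EX_LVLNUM_NA]        = "N/A",
};

/*
 * Format a level number into out, optionally prefixed with "Remote ".
 * Returns the snprintf() result. A level number of 0 is treated as
 * "field not set" (an assumption of this sketch).
 */
static int ex_format_level(char *out, size_t sz, unsigned int lvl_num, int remote)
{
	const char *prefix = remote ? "Remote " : "";

	assert(out != NULL && sz > 0);
	lvl_num &= 0xf; /* the field is 4 bits wide, keeps the index in range */

	if (lvl_num == 0)
		return snprintf(out, sz, "%s", remote ? "Remote" : "N/A");
	if (ex_lvlnum_name[lvl_num])
		return snprintf(out, sz, "%s%s", prefix, ex_lvlnum_name[lvl_num]);
	return snprintf(out, sz, "%sL%u", prefix, lvl_num);
}
```

Sizing the table to 16 entries and masking the input means every possible field value is a valid index, which avoids a bounds check at each call site.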
* [PATCH v5 4/4] perf, tools: Add test cases for new data source encoding 2017-08-16 22:21 Fix Skylake PEBS data source for perf v5 Andi Kleen ` (2 preceding siblings ...) 2017-08-16 22:21 ` [PATCH v5 3/4] perf, tools: Add support for printing new mem_info encodings Andi Kleen @ 2017-08-16 22:21 ` Andi Kleen 2017-08-24 8:23 ` [tip:perf/core] perf test: " tip-bot for Andi Kleen 2017-08-22 15:37 ` Fix Skylake PEBS data source for perf v5 Arnaldo Carvalho de Melo 4 siblings, 1 reply; 15+ messages in thread From: Andi Kleen @ 2017-08-16 22:21 UTC (permalink / raw) To: peterz, acme; +Cc: jolsa, linux-kernel, Andi Kleen From: Andi Kleen <ak@linux.intel.com> Add some simple tests to perf test to test data source printing. v2: Make the tests actually checked for the correct name of Forward v3: Adjust to new encoding Signed-off-by: Andi Kleen <ak@linux.intel.com> --- tools/perf/tests/Build | 1 + tools/perf/tests/builtin-test.c | 4 ++++ tools/perf/tests/mem.c | 53 +++++++++++++++++++++++++++++++++++++++++ tools/perf/tests/tests.h | 1 + 4 files changed, 59 insertions(+) create mode 100644 tools/perf/tests/mem.c diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build index 84222bdb8689..87bf3edb037c 100644 --- a/tools/perf/tests/Build +++ b/tools/perf/tests/Build @@ -34,6 +34,7 @@ perf-y += thread-map.o perf-y += llvm.o llvm-src-base.o llvm-src-kbuild.o llvm-src-prologue.o llvm-src-relocation.o perf-y += bpf.o perf-y += topology.o +perf-y += mem.o perf-y += cpumap.o perf-y += stat.o perf-y += event_update.o diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c index 9ecc44e68990..377bea009163 100644 --- a/tools/perf/tests/builtin-test.c +++ b/tools/perf/tests/builtin-test.c @@ -48,6 +48,10 @@ static struct test generic_tests[] = { .func = test__basic_mmap, }, { + .desc = "Test data source output", + .func = test__mem, + }, + { .desc = "Parse event definition strings", .func = test__parse_events, }, diff --git a/tools/perf/tests/mem.c 
b/tools/perf/tests/mem.c new file mode 100644 index 000000000000..ee203b9ff44c --- /dev/null +++ b/tools/perf/tests/mem.c @@ -0,0 +1,53 @@ +#include "util/mem-events.h" +#include "util/symbol.h" +#include "linux/perf_event.h" +#include "util/debug.h" +#include "tests.h" +#include <string.h> + +static int check(union perf_mem_data_src data_src, + const char *string) +{ + char out[100]; + char failure[100]; + struct mem_info mi = { .data_src = data_src }; + + int n; + + n = perf_mem__snp_scnprintf(out, sizeof out, &mi); + n += perf_mem__lvl_scnprintf(out + n, sizeof out - n, &mi); + snprintf(failure, sizeof failure, "unexpected %s", out); + TEST_ASSERT_VAL(failure, !strcmp(string, out)); + return 0; +} + +int test__mem(struct test *text __maybe_unused, int subtest __maybe_unused) +{ + int ret = 0; + + ret |= check(((union perf_mem_data_src) { + .mem_lvl = PERF_MEM_LVL_HIT, + .mem_lvl_num = 4 }), "N/AL4 hit"); + + ret |= check(((union perf_mem_data_src) { + .mem_lvl = PERF_MEM_LVL_HIT, + .mem_lvl_num = 4, + .mem_remote = 1 }), "N/ARemote L4 hit"); + + ret |= check(((union perf_mem_data_src) { + .mem_lvl = PERF_MEM_LVL_MISS, + .mem_lvl_num = PERF_MEM_LVLNUM_PMEM }), "N/APMEM miss"); + + ret |= check(((union perf_mem_data_src) { + .mem_lvl = PERF_MEM_LVL_MISS, + .mem_lvl_num = PERF_MEM_LVLNUM_PMEM, + .mem_remote =1 }), "N/ARemote PMEM miss"); + + ret |= check(((union perf_mem_data_src) { + .mem_snoopx = PERF_MEM_SNOOPX_FWD, + .mem_lvl = PERF_MEM_LVL_MISS, + .mem_lvl_num = PERF_MEM_LVLNUM_RAM, + .mem_remote = 1 }), "FwdRemote RAM miss"); + + return ret; +} diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h index c46ae818aac8..921412a6a880 100644 --- a/tools/perf/tests/tests.h +++ b/tools/perf/tests/tests.h @@ -58,6 +58,7 @@ int test__python_use(struct test *test, int subtest); int test__bp_signal(struct test *test, int subtest); int test__bp_signal_overflow(struct test *test, int subtest); int test__task_exit(struct test *test, int subtest); +int 
test__mem(struct test *test, int subtest); int test__sw_clock_freq(struct test *test, int subtest); int test__code_reading(struct test *test, int subtest); int test__sample_parsing(struct test *test, int subtest); -- 2.9.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [tip:perf/core] perf test: Add test cases for new data source encoding 2017-08-16 22:21 ` [PATCH v5 4/4] perf, tools: Add test cases for new data source encoding Andi Kleen @ 2017-08-24 8:23 ` tip-bot for Andi Kleen 0 siblings, 0 replies; 15+ messages in thread From: tip-bot for Andi Kleen @ 2017-08-24 8:23 UTC (permalink / raw) To: linux-tip-commits; +Cc: tglx, ak, jolsa, hpa, acme, mingo, peterz, linux-kernel Commit-ID: 3067eaa7ce2dbcde89d87277cdbc91c211480060 Gitweb: http://git.kernel.org/tip/3067eaa7ce2dbcde89d87277cdbc91c211480060 Author: Andi Kleen <ak@linux.intel.com> AuthorDate: Wed, 16 Aug 2017 15:21:56 -0700 Committer: Arnaldo Carvalho de Melo <acme@redhat.com> CommitDate: Tue, 22 Aug 2017 13:23:10 -0300 perf test: Add test cases for new data source encoding Add some simple tests to perf test to test data source printing. v2: Make the tests actually checked for the correct name of Forward v3: Adjust to new encoding Committer notes: Avoid the in place declaration to make this build with older compilers, for instance, in Debian 7 we get: tests/mem.c: In function 'test__mem': tests/mem.c:30:5: error: missing initializer [-Werror=missing-field-initializers] tests/mem.c:30:5: error: (near initialization for '(anonymous).<anonymous>.mem_snoop') [-Werror=missing-field-initializers] So just zero a struct, then go on building the unions as needed, reusing settings from the previous test, i.e. local -> remote, etc. 
Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170816222156.19953-5-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> --- tools/perf/tests/Build | 1 + tools/perf/tests/builtin-test.c | 4 +++ tools/perf/tests/mem.c | 56 +++++++++++++++++++++++++++++++++++++++++ tools/perf/tests/tests.h | 1 + 4 files changed, 62 insertions(+) diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build index 84222bd..87bf3ed 100644 --- a/tools/perf/tests/Build +++ b/tools/perf/tests/Build @@ -34,6 +34,7 @@ perf-y += thread-map.o perf-y += llvm.o llvm-src-base.o llvm-src-kbuild.o llvm-src-prologue.o llvm-src-relocation.o perf-y += bpf.o perf-y += topology.o +perf-y += mem.o perf-y += cpumap.o perf-y += stat.o perf-y += event_update.o diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c index 9ecc44e..377bea0 100644 --- a/tools/perf/tests/builtin-test.c +++ b/tools/perf/tests/builtin-test.c @@ -48,6 +48,10 @@ static struct test generic_tests[] = { .func = test__basic_mmap, }, { + .desc = "Test data source output", + .func = test__mem, + }, + { .desc = "Parse event definition strings", .func = test__parse_events, }, diff --git a/tools/perf/tests/mem.c b/tools/perf/tests/mem.c new file mode 100644 index 0000000..21952e1 --- /dev/null +++ b/tools/perf/tests/mem.c @@ -0,0 +1,56 @@ +#include "util/mem-events.h" +#include "util/symbol.h" +#include "linux/perf_event.h" +#include "util/debug.h" +#include "tests.h" +#include <string.h> + +static int check(union perf_mem_data_src data_src, + const char *string) +{ + char out[100]; + char failure[100]; + struct mem_info mi = { .data_src = data_src }; + + int n; + + n = perf_mem__snp_scnprintf(out, sizeof out, &mi); + n += perf_mem__lvl_scnprintf(out + n, sizeof out - n, &mi); + snprintf(failure, sizeof failure, "unexpected %s", out); + TEST_ASSERT_VAL(failure, !strcmp(string, 
out)); + return 0; +} + +int test__mem(struct test *text __maybe_unused, int subtest __maybe_unused) +{ + int ret = 0; + union perf_mem_data_src src; + + memset(&src, 0, sizeof(src)); + + src.mem_lvl = PERF_MEM_LVL_HIT; + src.mem_lvl_num = 4; + + ret |= check(src, "N/AL4 hit"); + + src.mem_remote = 1; + + ret |= check(src, "N/ARemote L4 hit"); + + src.mem_lvl = PERF_MEM_LVL_MISS; + src.mem_lvl_num = PERF_MEM_LVLNUM_PMEM; + src.mem_remote = 0; + + ret |= check(src, "N/APMEM miss"); + + src.mem_remote = 1; + + ret |= check(src, "N/ARemote PMEM miss"); + + src.mem_snoopx = PERF_MEM_SNOOPX_FWD; + src.mem_lvl_num = PERF_MEM_LVLNUM_RAM; + + ret |= check(src , "FwdRemote RAM miss"); + + return ret; +} diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h index c46ae81..921412a 100644 --- a/tools/perf/tests/tests.h +++ b/tools/perf/tests/tests.h @@ -58,6 +58,7 @@ int test__python_use(struct test *test, int subtest); int test__bp_signal(struct test *test, int subtest); int test__bp_signal_overflow(struct test *test, int subtest); int test__task_exit(struct test *test, int subtest); +int test__mem(struct test *test, int subtest); int test__sw_clock_freq(struct test *test, int subtest); int test__code_reading(struct test *test, int subtest); int test__sample_parsing(struct test *test, int subtest); ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: Fix Skylake PEBS data source for perf v5
  2017-08-16 22:21 Fix Skylake PEBS data source for perf v5 Andi Kleen
  ` (3 preceding siblings ...)
  2017-08-16 22:21 ` [PATCH v5 4/4] perf, tools: Add test cases for new data source encoding Andi Kleen
@ 2017-08-22 15:37 ` Arnaldo Carvalho de Melo
  4 siblings, 0 replies; 15+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-08-22 15:37 UTC (permalink / raw)
To: Andi Kleen; +Cc: peterz, jolsa, linux-kernel

On Wed, Aug 16, 2017 at 03:21:52PM -0700, Andi Kleen wrote:
> Fix data source reporting for Skylake and Skylake Server.
> The encodings have changed to express support for L4 and persistent
> memory.
>
> The first patch is a (independent) cleanup.
>
> The second is for the kernel and the third/fourth for perf/tools.
> The kernel part and perf tools will compile independently.

I got the tools part, Peter had some trouble applying it, will get all
merged soon in tip, thanks,

- Arnaldo

> Also available in
> git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git perf/skx-data-src-7
>
> v1:
> Initial post
> v2:
> Merged some patches.
> Change encoding to use special bit for each combination instead
> of modifiers.
> v3:
> Switch to new generic lvlnum indication
> v4:
> Repost. No changes.
> v5:
> ported to latest tree. Retested remote HITM.
>

^ permalink raw reply [flat|nested] 15+ messages in thread