* [PATCH v4 0/5] perf report: Show branch type
@ 2017-04-11 22:21 Jin Yao
  2017-04-11 22:21 ` [PATCH v4 1/5] perf/core: Define the common branch type classification Jin Yao
  ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Jin Yao @ 2017-04-11 22:21 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev, Jin Yao

v4:
---
1. Describe the major changes in the patch descriptions. Thanks to
   Peter Zijlstra for the reminder.

2. Initialize the branch type to 0 in intel_pmu_lbr_read_32 and
   intel_pmu_lbr_read_64. Remove the invalid else code in
   intel_pmu_lbr_filter.

v3:
---
1. Move the JCC forward/backward and cross-page computation from the
   kernel to userspace.

2. Use a lookup table to replace the original switch/case processing.

Changed:
  perf/core: Define the common branch type classification
  perf/x86/intel: Record branch type
  perf report: Show branch type statistics for stdio mode
  perf report: Show branch type in callchain entry

Not changed:
  perf record: Create a new option save_type in --branch-filter

v2:
---
1. Use 4 bits in perf_branch_entry to record the branch type.

2. Pull out some common branch types from FAR_BRANCH. The branch types
   now defined in perf_event.h:

   PERF_BR_NONE      : unknown
   PERF_BR_JCC_FWD   : conditional forward jump
   PERF_BR_JCC_BWD   : conditional backward jump
   PERF_BR_JMP       : jump
   PERF_BR_IND_JMP   : indirect jump
   PERF_BR_CALL      : call
   PERF_BR_IND_CALL  : indirect call
   PERF_BR_RET       : return
   PERF_BR_SYSCALL   : syscall
   PERF_BR_SYSRET    : syscall return
   PERF_BR_IRQ       : hw interrupt/trap/fault
   PERF_BR_INT       : sw interrupt
   PERF_BR_IRET      : return from interrupt
   PERF_BR_FAR_BRANCH: other non-generic far branch types

3. Use 2 bits in perf_branch_entry for a "cross" metric that checks
   whether a branch crosses a 4K or 2MB area. This is an approximate
   computation that checks if the branch crosses a 4K page or a 2MB
   page.
For example:

perf record -g --branch-filter any,save_type <command>

perf report --stdio

 JCC forward:  27.7%
 JCC backward:  9.8%
 JMP:           0.0%
 IND_JMP:       6.5%
 CALL:         26.6%
 IND_CALL:      0.0%
 RET:          29.3%
 IRET:          0.0%
 CROSS_4K:      0.0%
 CROSS_2M:     14.3%

perf report --branch-history --stdio --no-children

-23.60%--main div.c:42 (RET cycles:2)
          compute_flag div.c:28 (RET cycles:2)
          compute_flag div.c:27 (RET CROSS_2M cycles:1)
          rand rand.c:28 (RET CROSS_2M cycles:1)
          rand rand.c:28 (RET cycles:1)
          __random random.c:298 (RET cycles:1)
          __random random.c:297 (JCC forward cycles:1)
          __random random.c:295 (JCC forward cycles:1)
          __random random.c:295 (JCC forward cycles:1)
          __random random.c:295 (JCC forward cycles:1)
          __random random.c:295 (RET cycles:9)

Changed:
  perf/core: Define the common branch type classification
  perf/x86/intel: Record branch type
  perf report: Show branch type statistics for stdio mode
  perf report: Show branch type in callchain entry

Not changed:
  perf record: Create a new option save_type in --branch-filter

v1:
---
It is often useful to know the branch types while analyzing branch
data. For example, a call is very different from a conditional branch.

Currently we have to look up the branch type in the binary, but the
binary may not be available later, and even when it is available the
lookup takes time. It is much more convenient for the user to see the
branch type directly in perf report.

Perf already supports disassembling the branch instruction to get the
branch type. This patch series records the branch type and shows it
with the other LBR information in the callchain entry via perf report.
The series also adds a branch type summary at the end of
perf report --stdio.

To keep the kernel and userspace consistent and make the
classification more common, the series adds the common branch type
classification to perf_event.h.
The common branch types are:

 JCC forward : conditional forward jump
 JCC backward: conditional backward jump
 JMP         : jump imm
 IND_JMP     : jump reg/mem
 CALL        : call imm
 IND_CALL    : call reg/mem
 RET         : ret
 FAR_BRANCH  : SYSCALL/SYSRET, IRQ, IRET, TSX abort

An example:

1. Record branch types (new option "save_type"):

   perf record -g --branch-filter any,save_type <command>

2. Show the branch type statistics at the end of perf report --stdio:

   perf report --stdio

    JCC forward:  34.0%
    JCC backward:  3.6%
    JMP:           0.0%
    IND_JMP:       6.5%
    CALL:         26.6%
    IND_CALL:      0.0%
    RET:          29.3%
    FAR_BRANCH:    0.0%

3. Show the branch type in the callchain entry:

   perf report --branch-history --stdio --no-children

   --23.91%--main div.c:42 (RET cycles:2)
             compute_flag div.c:28 (RET cycles:2)
             compute_flag div.c:27 (RET cycles:1)
             rand rand.c:28 (RET cycles:1)
             rand rand.c:28 (RET cycles:1)
             __random random.c:298 (RET cycles:1)
             __random random.c:297 (JCC forward cycles:1)
             __random random.c:295 (JCC forward cycles:1)
             __random random.c:295 (JCC forward cycles:1)
             __random random.c:295 (JCC forward cycles:1)
             __random random.c:295 (RET cycles:9)

Jin Yao (5):
  perf/core: Define the common branch type classification
  perf/x86/intel: Record branch type
  perf record: Create a new option save_type in --branch-filter
  perf report: Show branch type statistics for stdio mode
  perf report: Show branch type in callchain entry

 arch/x86/events/intel/lbr.c              |  53 ++++++++-
 include/uapi/linux/perf_event.h          |  29 ++++-
 tools/include/uapi/linux/perf_event.h    |  29 ++++-
 tools/perf/Documentation/perf-record.txt |   1 +
 tools/perf/builtin-report.c              |  70 +++++++++++
 tools/perf/util/callchain.c              | 195 +++++++++++++++++++++----------
 tools/perf/util/callchain.h              |   4 +-
 tools/perf/util/event.h                  |   3 +-
 tools/perf/util/hist.c                   |   5 +-
 tools/perf/util/machine.c                |  26 +++--
 tools/perf/util/parse-branch-options.c   |   1 +
 tools/perf/util/util.c                   |  59 ++++++++++
 tools/perf/util/util.h                   |  17 +++
 13 files changed, 411 insertions(+), 81 deletions(-)

--
2.7.4

^ permalink raw reply	[flat|nested] 23+ messages in thread
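The forward/backward classification that v3 moved to userspace reduces to a single address comparison per conditional branch. A minimal standalone sketch of that rule (illustrative only, not the actual perf code):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch: a conditional branch whose target address is
 * greater than its source address counts as "JCC forward", otherwise
 * as "JCC backward", matching the statistics shown above. */
static int jcc_is_forward(uint64_t from, uint64_t to)
{
	return to > from;
}
```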
* [PATCH v4 1/5] perf/core: Define the common branch type classification
  2017-04-11 22:21 [PATCH v4 0/5] perf report: Show branch type Jin Yao
@ 2017-04-11 22:21 ` Jin Yao
  2017-04-11 22:21 ` [PATCH v4 2/5] perf/x86/intel: Record branch type Jin Yao
  ` (4 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Jin Yao @ 2017-04-11 22:21 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev, Jin Yao

It is often useful to know the branch types while analyzing branch
data. For example, a call is very different from a conditional branch.

Currently we have to look up the branch type in the binary, but the
binary may not be available later, and even when it is available the
lookup takes time. It is much more convenient for the user to see the
branch type directly in perf report.

Perf already supports disassembling the branch instruction to get the
x86 branch type.

To keep the kernel and userspace consistent and make the
classification more common, this patch adds the common branch type
classification to perf_event.h:

  PERF_BR_NONE      : unknown
  PERF_BR_JCC       : conditional jump
  PERF_BR_JMP       : jump
  PERF_BR_IND_JMP   : indirect jump
  PERF_BR_CALL      : call
  PERF_BR_IND_CALL  : indirect call
  PERF_BR_RET       : return
  PERF_BR_SYSCALL   : syscall
  PERF_BR_SYSRET    : syscall return
  PERF_BR_IRQ       : hw interrupt/trap/fault
  PERF_BR_INT       : sw interrupt
  PERF_BR_IRET      : return from interrupt
  PERF_BR_FAR_BRANCH: non-generic far branch type

The patch also adds a new 4-bit "type" field in perf_branch_entry to
record the branch type.

Since disassembling the branch instruction adds some overhead, a new
PERF_SAMPLE_BRANCH_TYPE_SAVE flag is introduced to indicate whether
the branch instruction should be disassembled and the branch type
recorded.

Compared to the previous version, the major changes are:

1. Remove PERF_BR_JCC_FWD/PERF_BR_JCC_BWD; they will be computed
   later in userspace.

2. Remove the "cross" field in perf_branch_entry.
The cross page computing will be done later in userspace. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> --- include/uapi/linux/perf_event.h | 29 ++++++++++++++++++++++++++++- tools/include/uapi/linux/perf_event.h | 29 ++++++++++++++++++++++++++++- 2 files changed, 56 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index d09a9cd..69af012 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift { PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT = 14, /* no flags */ PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT = 15, /* no cycles */ + PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT = 16, /* save branch type */ + PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */ }; @@ -198,9 +200,32 @@ enum perf_branch_sample_type { PERF_SAMPLE_BRANCH_NO_FLAGS = 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT, PERF_SAMPLE_BRANCH_NO_CYCLES = 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT, + PERF_SAMPLE_BRANCH_TYPE_SAVE = + 1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT, + PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT, }; +/* + * Common flow change classification + */ +enum { + PERF_BR_NONE = 0, /* unknown */ + PERF_BR_JCC = 1, /* conditional jump */ + PERF_BR_JMP = 2, /* jump */ + PERF_BR_IND_JMP = 3, /* indirect jump */ + PERF_BR_CALL = 4, /* call */ + PERF_BR_IND_CALL = 5, /* indirect call */ + PERF_BR_RET = 6, /* return */ + PERF_BR_SYSCALL = 7, /* syscall */ + PERF_BR_SYSRET = 8, /* syscall return */ + PERF_BR_IRQ = 9, /* hw interrupt/trap/fault */ + PERF_BR_INT = 10, /* sw interrupt */ + PERF_BR_IRET = 11, /* return from interrupt */ + PERF_BR_FAR_BRANCH = 12, /* not generic far branch type */ + PERF_BR_MAX, +}; + #define PERF_SAMPLE_BRANCH_PLM_ALL \ (PERF_SAMPLE_BRANCH_USER|\ PERF_SAMPLE_BRANCH_KERNEL|\ @@ -999,6 +1024,7 @@ union perf_mem_data_src { * in_tx: running in a hardware transaction * abort: aborting a hardware transaction * cycles: cycles from last branch (or 0 if not supported) 
+ * type: branch type */ struct perf_branch_entry { __u64 from; @@ -1008,7 +1034,8 @@ struct perf_branch_entry { in_tx:1, /* in transaction */ abort:1, /* transaction abort */ cycles:16, /* cycle count to last branch */ - reserved:44; + type:4, /* branch type */ + reserved:40; }; #endif /* _UAPI_LINUX_PERF_EVENT_H */ diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h index d09a9cd..69af012 100644 --- a/tools/include/uapi/linux/perf_event.h +++ b/tools/include/uapi/linux/perf_event.h @@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift { PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT = 14, /* no flags */ PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT = 15, /* no cycles */ + PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT = 16, /* save branch type */ + PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */ }; @@ -198,9 +200,32 @@ enum perf_branch_sample_type { PERF_SAMPLE_BRANCH_NO_FLAGS = 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT, PERF_SAMPLE_BRANCH_NO_CYCLES = 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT, + PERF_SAMPLE_BRANCH_TYPE_SAVE = + 1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT, + PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT, }; +/* + * Common flow change classification + */ +enum { + PERF_BR_NONE = 0, /* unknown */ + PERF_BR_JCC = 1, /* conditional jump */ + PERF_BR_JMP = 2, /* jump */ + PERF_BR_IND_JMP = 3, /* indirect jump */ + PERF_BR_CALL = 4, /* call */ + PERF_BR_IND_CALL = 5, /* indirect call */ + PERF_BR_RET = 6, /* return */ + PERF_BR_SYSCALL = 7, /* syscall */ + PERF_BR_SYSRET = 8, /* syscall return */ + PERF_BR_IRQ = 9, /* hw interrupt/trap/fault */ + PERF_BR_INT = 10, /* sw interrupt */ + PERF_BR_IRET = 11, /* return from interrupt */ + PERF_BR_FAR_BRANCH = 12, /* not generic far branch type */ + PERF_BR_MAX, +}; + #define PERF_SAMPLE_BRANCH_PLM_ALL \ (PERF_SAMPLE_BRANCH_USER|\ PERF_SAMPLE_BRANCH_KERNEL|\ @@ -999,6 +1024,7 @@ union perf_mem_data_src { * in_tx: running in a hardware transaction * abort: aborting a hardware transaction * cycles: cycles 
from last branch (or 0 if not supported) + * type: branch type */ struct perf_branch_entry { __u64 from; @@ -1008,7 +1034,8 @@ struct perf_branch_entry { in_tx:1, /* in transaction */ abort:1, /* transaction abort */ cycles:16, /* cycle count to last branch */ - reserved:44; + type:4, /* branch type */ + reserved:40; }; #endif /* _UAPI_LINUX_PERF_EVENT_H */ -- 2.7.4 ^ permalink raw reply related [flat|nested] 23+ messages in thread
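As a sanity check on the layout above: the new 4-bit type field is carved out of the former 44-bit reserved area, so the entry remains three 64-bit words and all existing fields keep their positions. A standalone sketch of the amended bitfield (field names mirror the patch, but this is not the UAPI header itself; exact bitfield packing is compiler-specific, though GCC and Clang pack these 64 bits as shown):

```c
#include <assert.h>
#include <stdint.h>

/* Standalone copy of the amended perf_branch_entry flag layout:
 * 1+1+1+1+16+4+40 = 64 bits of flags after the two addresses. */
struct branch_entry_sketch {
	uint64_t from;
	uint64_t to;
	uint64_t mispred:1,	/* expected mispredict */
		 predicted:1,	/* expected correct predict */
		 in_tx:1,	/* in transaction */
		 abort:1,	/* transaction abort */
		 cycles:16,	/* cycle count to last branch */
		 type:4,	/* branch type */
		 reserved:40;
};
```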
* [PATCH v4 2/5] perf/x86/intel: Record branch type
  2017-04-11 22:21 [PATCH v4 0/5] perf report: Show branch type Jin Yao
  2017-04-11 22:21 ` [PATCH v4 1/5] perf/core: Define the common branch type classification Jin Yao
@ 2017-04-11 22:21 ` Jin Yao
  2017-04-11 22:21 ` [PATCH v4 3/5] perf record: Create a new option save_type in --branch-filter Jin Yao
  ` (3 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Jin Yao @ 2017-04-11 22:21 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev, Jin Yao

Perf already supports disassembling the branch instruction and using
the branch type for filtering. This patch just records the branch type
in perf_branch_entry. Before recording, it converts the x86 branch
type to the common branch type.

Compared to the previous version, the major changes are:

1. Use a lookup table to convert the x86 branch type to the common
   branch type.

2. Move the JCC forward/JCC backward and cross-page computation to
   user space.

3.
Initialize branch type to 0 in intel_pmu_lbr_read_32 and intel_pmu_lbr_read_64 Signed-off-by: Jin Yao <yao.jin@linux.intel.com> --- arch/x86/events/intel/lbr.c | 53 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 52 insertions(+), 1 deletion(-) diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c index 81b321a..d3b1dd6 100644 --- a/arch/x86/events/intel/lbr.c +++ b/arch/x86/events/intel/lbr.c @@ -109,6 +109,9 @@ enum { X86_BR_ZERO_CALL = 1 << 15,/* zero length call */ X86_BR_CALL_STACK = 1 << 16,/* call stack */ X86_BR_IND_JMP = 1 << 17,/* indirect jump */ + + X86_BR_TYPE_SAVE = 1 << 18,/* indicate to save branch type */ + }; #define X86_BR_PLM (X86_BR_USER | X86_BR_KERNEL) @@ -507,6 +510,7 @@ static void intel_pmu_lbr_read_32(struct cpu_hw_events *cpuc) cpuc->lbr_entries[i].to = msr_lastbranch.to; cpuc->lbr_entries[i].mispred = 0; cpuc->lbr_entries[i].predicted = 0; + cpuc->lbr_entries[i].type = 0; cpuc->lbr_entries[i].reserved = 0; } cpuc->lbr_stack.nr = i; @@ -593,6 +597,7 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc) cpuc->lbr_entries[out].in_tx = in_tx; cpuc->lbr_entries[out].abort = abort; cpuc->lbr_entries[out].cycles = cycles; + cpuc->lbr_entries[out].type = 0; cpuc->lbr_entries[out].reserved = 0; out++; } @@ -670,6 +675,10 @@ static int intel_pmu_setup_sw_lbr_filter(struct perf_event *event) if (br_type & PERF_SAMPLE_BRANCH_CALL) mask |= X86_BR_CALL | X86_BR_ZERO_CALL; + + if (br_type & PERF_SAMPLE_BRANCH_TYPE_SAVE) + mask |= X86_BR_TYPE_SAVE; + /* * stash actual user request into reg, it may * be used by fixup code for some CPU @@ -923,6 +932,44 @@ static int branch_type(unsigned long from, unsigned long to, int abort) return ret; } +#define X86_BR_TYPE_MAP_MAX 16 + +static int +common_branch_type(int type) +{ + int i, mask; + const int branch_map[X86_BR_TYPE_MAP_MAX] = { + PERF_BR_CALL, /* X86_BR_CALL */ + PERF_BR_RET, /* X86_BR_RET */ + PERF_BR_SYSCALL, /* X86_BR_SYSCALL */ + PERF_BR_SYSRET, /* 
X86_BR_SYSRET */ + PERF_BR_INT, /* X86_BR_INT */ + PERF_BR_IRET, /* X86_BR_IRET */ + PERF_BR_JCC, /* X86_BR_JCC */ + PERF_BR_JMP, /* X86_BR_JMP */ + PERF_BR_IRQ, /* X86_BR_IRQ */ + PERF_BR_IND_CALL, /* X86_BR_IND_CALL */ + PERF_BR_NONE, /* X86_BR_ABORT */ + PERF_BR_NONE, /* X86_BR_IN_TX */ + PERF_BR_NONE, /* X86_BR_NO_TX */ + PERF_BR_CALL, /* X86_BR_ZERO_CALL */ + PERF_BR_NONE, /* X86_BR_CALL_STACK */ + PERF_BR_IND_JMP, /* X86_BR_IND_JMP */ + }; + + type >>= 2; /* skip X86_BR_USER and X86_BR_KERNEL */ + mask = ~(~0 << 1); + + for (i = 0; i < X86_BR_TYPE_MAP_MAX; i++) { + if (type & mask) + return branch_map[i]; + + type >>= 1; + } + + return PERF_BR_NONE; +} + /* * implement actual branch filter based on user demand. * Hardware may not exactly satisfy that request, thus @@ -939,7 +986,8 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc) bool compress = false; /* if sampling all branches, then nothing to filter */ - if ((br_sel & X86_BR_ALL) == X86_BR_ALL) + if (((br_sel & X86_BR_ALL) == X86_BR_ALL) && + ((br_sel & X86_BR_TYPE_SAVE) != X86_BR_TYPE_SAVE)) return; for (i = 0; i < cpuc->lbr_stack.nr; i++) { @@ -960,6 +1008,9 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc) cpuc->lbr_entries[i].from = 0; compress = true; } + + if ((br_sel & X86_BR_TYPE_SAVE) == X86_BR_TYPE_SAVE) + cpuc->lbr_entries[i].type = common_branch_type(type); } if (!compress) -- 2.7.4 ^ permalink raw reply related [flat|nested] 23+ messages in thread
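The table lookup in common_branch_type() works because the x86 branch type is a one-hot bitmask once the two privilege bits (X86_BR_USER/X86_BR_KERNEL) are shifted off with `type >>= 2`; the loop then walks the mask one bit at a time until it finds the set bit. A self-contained sketch of that walk, with the table shortened to its first four entries (the PERF_BR_* values match the patch's enum; this is not the kernel code itself):

```c
#include <assert.h>

/* Values from the patch's perf_event.h additions. */
enum {
	PERF_BR_NONE    = 0,
	PERF_BR_CALL    = 4,
	PERF_BR_RET     = 6,
	PERF_BR_SYSCALL = 7,
	PERF_BR_SYSRET  = 8,
};

/* Index i of the map corresponds to bit i of the already-shifted
 * one-hot x86 branch type, as in common_branch_type(). */
static int lookup_branch_type(int type)
{
	const int branch_map[4] = {
		PERF_BR_CALL,		/* X86_BR_CALL    (bit 0) */
		PERF_BR_RET,		/* X86_BR_RET     (bit 1) */
		PERF_BR_SYSCALL,	/* X86_BR_SYSCALL (bit 2) */
		PERF_BR_SYSRET,		/* X86_BR_SYSRET  (bit 3) */
	};
	int i;

	for (i = 0; i < 4; i++) {
		if (type & 1)
			return branch_map[i];
		type >>= 1;
	}
	return PERF_BR_NONE;
}
```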
* [PATCH v4 3/5] perf record: Create a new option save_type in --branch-filter
  2017-04-11 22:21 [PATCH v4 0/5] perf report: Show branch type Jin Yao
  2017-04-11 22:21 ` [PATCH v4 1/5] perf/core: Define the common branch type classification Jin Yao
  2017-04-11 22:21 ` [PATCH v4 2/5] perf/x86/intel: Record branch type Jin Yao
@ 2017-04-11 22:21 ` Jin Yao
  2017-04-11 22:21 ` [PATCH v4 4/5] perf report: Show branch type statistics for stdio mode Jin Yao
  ` (2 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Jin Yao @ 2017-04-11 22:21 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev, Jin Yao

The option tells the kernel to save the branch type during sampling.

One example:

perf record -g --branch-filter any,save_type <command>

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/Documentation/perf-record.txt | 1 +
 tools/perf/util/parse-branch-options.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index ea3789d..e2f5a4f 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -332,6 +332,7 @@ following filters are defined:
 	- no_tx: only when the target is not in a hardware transaction
 	- abort_tx: only when the target is a hardware transaction abort
 	- cond: conditional branches
+	- save_type: save branch type during sampling in case binary is not available later

+
The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
diff --git a/tools/perf/util/parse-branch-options.c b/tools/perf/util/parse-branch-options.c index 38fd115..e71fb5f 100644 --- a/tools/perf/util/parse-branch-options.c +++ b/tools/perf/util/parse-branch-options.c @@ -28,6 +28,7 @@ static const struct branch_mode branch_modes[] = { BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_COND), BRANCH_OPT("ind_jmp", PERF_SAMPLE_BRANCH_IND_JUMP), BRANCH_OPT("call", PERF_SAMPLE_BRANCH_CALL), + BRANCH_OPT("save_type", PERF_SAMPLE_BRANCH_TYPE_SAVE), BRANCH_END }; -- 2.7.4 ^ permalink raw reply related [flat|nested] 23+ messages in thread
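The BRANCH_OPT addition above follows the existing pattern in parse-branch-options.c: the option string is matched against a name table, and the matching entry's sample-type bit is ORed into the branch mask. A self-contained sketch of that mapping, with only the two relevant entries and shift values taken from the patched perf_event.h (ANY = 3, TYPE_SAVE = 16); the table and helper are illustrative, not the real perf code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative mirror of the branch_modes[] table: option name to
 * PERF_SAMPLE_BRANCH_* bit (shifts per the patched perf_event.h). */
struct branch_mode {
	const char *name;
	uint64_t    mode;
};

static const struct branch_mode branch_modes[] = {
	{ "any",       1ULL << 3  },	/* PERF_SAMPLE_BRANCH_ANY */
	{ "save_type", 1ULL << 16 },	/* PERF_SAMPLE_BRANCH_TYPE_SAVE */
	{ NULL, 0 },
};

/* Linear scan of the table, as parse_branch_str() effectively does. */
static uint64_t branch_mode_lookup(const char *name)
{
	const struct branch_mode *bm;

	for (bm = branch_modes; bm->name; bm++)
		if (!strcmp(bm->name, name))
			return bm->mode;
	return 0;
}
```

With this, the filter string "any,save_type" resolves to the two bits ORed together, which is what the kernel side checks against X86_BR_TYPE_SAVE.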
* [PATCH v4 4/5] perf report: Show branch type statistics for stdio mode
  2017-04-11 22:21 [PATCH v4 0/5] perf report: Show branch type Jin Yao
  ` (2 preceding siblings ...)
  2017-04-11 22:21 ` [PATCH v4 3/5] perf record: Create a new option save_type in --branch-filter Jin Yao
@ 2017-04-11 22:21 ` Jin Yao
  2017-04-18 18:53 ` Jiri Olsa
  2017-04-18 18:53 ` Jiri Olsa
  2017-04-11 22:21 ` [PATCH v4 5/5] perf report: Show branch type in callchain entry Jin Yao
  2017-04-12 10:58 ` [PATCH v4 0/5] perf report: Show branch type Jiri Olsa
  5 siblings, 2 replies; 23+ messages in thread
From: Jin Yao @ 2017-04-11 22:21 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev, Jin Yao

Show the branch type statistics at the end of perf report --stdio.

For example:

perf report --stdio

 JCC forward:  27.8%
 JCC backward:  9.7%
 CROSS_4K:      0.0%
 CROSS_2M:     14.3%
 JCC:          37.6%
 JMP:           0.0%
 IND_JMP:       6.5%
 CALL:         26.6%
 RET:          29.3%
 IRET:          0.0%

The branch types are:
---------------------
 JCC forward : conditional forward jump
 JCC backward: conditional backward jump
 JMP         : jump imm
 IND_JMP     : jump reg/mem
 CALL        : call imm
 IND_CALL    : call reg/mem
 RET         : ret
 SYSCALL     : syscall
 SYSRET      : syscall return
 IRQ         : hw interrupt/trap/fault
 INT         : sw interrupt
 IRET        : return from interrupt
 FAR_BRANCH  : other non-generic branch types

CROSS_4K and CROSS_2M:
----------------------
These metrics check whether a branch crosses a 4K or 2MB area. The
computation is approximate: we do not know whether the area is a 4K or
a 2MB one, so both are always computed. To keep the output simple, if
a branch crosses a 2M area, CROSS_4K is not incremented.

Compared to the previous version, the major change is:

Add the computation of JCC forward/JCC backward and the cross-page
check using the from and to addresses.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com> --- tools/perf/builtin-report.c | 70 +++++++++++++++++++++++++++++++++++++++++++++ tools/perf/util/event.h | 3 +- tools/perf/util/hist.c | 5 +--- tools/perf/util/util.c | 59 ++++++++++++++++++++++++++++++++++++++ tools/perf/util/util.h | 17 +++++++++++ 5 files changed, 149 insertions(+), 5 deletions(-) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index c18158b..c2889eb 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -66,6 +66,7 @@ struct report { u64 queue_size; int socket_filter; DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS); + struct branch_type_stat brtype_stat; }; static int report__config(const char *var, const char *value, void *cb) @@ -144,6 +145,24 @@ static int hist_iter__report_callback(struct hist_entry_iter *iter, return err; } +static int hist_iter__branch_callback(struct hist_entry_iter *iter, + struct addr_location *al __maybe_unused, + bool single __maybe_unused, + void *arg) +{ + struct hist_entry *he = iter->he; + struct report *rep = arg; + struct branch_info *bi; + + if (sort__mode == SORT_MODE__BRANCH) { + bi = he->branch_info; + branch_type_count(&rep->brtype_stat, &bi->flags, + bi->from.addr, bi->to.addr); + } + + return 0; +} + static int process_sample_event(struct perf_tool *tool, union perf_event *event, struct perf_sample *sample, @@ -182,6 +201,8 @@ static int process_sample_event(struct perf_tool *tool, */ if (!sample->branch_stack) goto out_put; + + iter.add_entry_cb = hist_iter__branch_callback; iter.ops = &hist_iter_branch; } else if (rep->mem_mode) { iter.ops = &hist_iter_mem; @@ -369,6 +390,50 @@ static size_t hists__fprintf_nr_sample_events(struct hists *hists, struct report return ret + fprintf(fp, "\n#\n"); } +static void branch_type_stat_display(FILE *fp, struct branch_type_stat *stat) +{ + u64 total = 0; + int i; + + for (i = 0; i < PERF_BR_MAX; i++) + total += stat->counts[i]; + + if (total == 0) + return; + + fprintf(fp, 
"\n#"); + fprintf(fp, "\n# Branch Statistics:"); + fprintf(fp, "\n#"); + + if (stat->jcc_fwd > 0) + fprintf(fp, "\n%12s: %5.1f%%", + "JCC forward", + 100.0 * (double)stat->jcc_fwd / (double)total); + + if (stat->jcc_bwd > 0) + fprintf(fp, "\n%12s: %5.1f%%", + "JCC backward", + 100.0 * (double)stat->jcc_bwd / (double)total); + + if (stat->cross_4k > 0) + fprintf(fp, "\n%12s: %5.1f%%", + "CROSS_4K", + 100.0 * (double)stat->cross_4k / (double)total); + + if (stat->cross_2m > 0) + fprintf(fp, "\n%12s: %5.1f%%", + "CROSS_2M", + 100.0 * (double)stat->cross_2m / (double)total); + + for (i = 0; i < PERF_BR_MAX; i++) { + if (stat->counts[i] > 0) + fprintf(fp, "\n%12s: %5.1f%%", + branch_type_name(i), + 100.0 * + (double)stat->counts[i] / (double)total); + } +} + static int perf_evlist__tty_browse_hists(struct perf_evlist *evlist, struct report *rep, const char *help) @@ -404,6 +469,9 @@ static int perf_evlist__tty_browse_hists(struct perf_evlist *evlist, perf_read_values_destroy(&rep->show_threads_values); } + if (sort__mode == SORT_MODE__BRANCH) + branch_type_stat_display(stdout, &rep->brtype_stat); + return 0; } @@ -936,6 +1004,8 @@ int cmd_report(int argc, const char **argv) if (has_br_stack && branch_call_mode) symbol_conf.show_branchflag_count = true; + memset(&report.brtype_stat, 0, sizeof(struct branch_type_stat)); + /* * Branch mode is a tristate: * -1 means default, so decide based on the file having branch data. 
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h index eb7a7b2..26b4c2e 100644 --- a/tools/perf/util/event.h +++ b/tools/perf/util/event.h @@ -142,7 +142,8 @@ struct branch_flags { u64 in_tx:1; u64 abort:1; u64 cycles:16; - u64 reserved:44; + u64 type:4; + u64 reserved:40; }; struct branch_entry { diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c index 61bf304..c8aee25 100644 --- a/tools/perf/util/hist.c +++ b/tools/perf/util/hist.c @@ -745,12 +745,9 @@ iter_prepare_branch_entry(struct hist_entry_iter *iter, struct addr_location *al } static int -iter_add_single_branch_entry(struct hist_entry_iter *iter, +iter_add_single_branch_entry(struct hist_entry_iter *iter __maybe_unused, struct addr_location *al __maybe_unused) { - /* to avoid calling callback function */ - iter->he = NULL; - return 0; } diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c index d8b45ce..a4b54a9 100644 --- a/tools/perf/util/util.c +++ b/tools/perf/util/util.c @@ -802,3 +802,62 @@ int unit_number__scnprintf(char *buf, size_t size, u64 n) return scnprintf(buf, size, "%" PRIu64 "%c", n, unit[i]); } + +static bool cross_area(u64 addr1, u64 addr2, int size) +{ + u64 align1, align2; + + align1 = addr1 & ~(size - 1); + align2 = addr2 & ~(size - 1); + + return (align1 != align2) ? 
true : false; +} + +#define AREA_4K 4096 +#define AREA_2M (2 * 1024 * 1024) + +void branch_type_count(struct branch_type_stat *stat, + struct branch_flags *flags, + u64 from, u64 to) +{ + if ((flags->type == PERF_BR_NONE) || (from == 0)) + return; + + stat->counts[flags->type]++; + + if (flags->type == PERF_BR_JCC) { + if (to > from) + stat->jcc_fwd++; + else + stat->jcc_bwd++; + } + + if (cross_area(from, to, AREA_2M)) + stat->cross_2m++; + else if (cross_area(from, to, AREA_4K)) + stat->cross_4k++; +} + +const char *branch_type_name(int type) +{ + const char *branch_names[PERF_BR_MAX] = { + "N/A", + "JCC", + "JMP", + "IND_JMP", + "CALL", + "IND_CALL", + "RET", + "SYSCALL", + "SYSRET", + "IRQ", + "INT", + "IRET", + "FAR_BRANCH", + }; + + if ((type >= 0) && (type < PERF_BR_MAX)) + return branch_names[type]; + + return NULL; +} diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h index 7cf5752..0a5bbcc 100644 --- a/tools/perf/util/util.h +++ b/tools/perf/util/util.h @@ -79,6 +79,7 @@ #include <linux/bitops.h> #include <termios.h> #include "strlist.h" +#include "../perf.h" extern const char *graph_line; extern const char *graph_dotted_line; @@ -380,4 +381,20 @@ struct inline_node { struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr); void inline_node__delete(struct inline_node *node); +struct branch_type_stat { + u64 counts[PERF_BR_MAX]; + u64 jcc_fwd; + u64 jcc_bwd; + u64 cross_4k; + u64 cross_2m; +}; + +struct branch_flags; + +void branch_type_count(struct branch_type_stat *stat, + struct branch_flags *flags, + u64 from, u64 to); + +const char *branch_type_name(int type); + #endif /* GIT_COMPAT_UTIL_H */ -- 2.7.4 ^ permalink raw reply related [flat|nested] 23+ messages in thread
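The cross_area() check above reduces to comparing the naturally aligned bases of the two addresses, and branch_type_count() gives CROSS_2M precedence over CROSS_4K as described in the commit message. A standalone sketch of the alignment test (same logic as the patch, extracted from the perf tree for illustration):

```c
#include <assert.h>
#include <stdint.h>

#define AREA_4K 4096
#define AREA_2M (2 * 1024 * 1024)

/* Two addresses cross an area of the given power-of-two size exactly
 * when their size-aligned bases differ, as in the patch's cross_area(). */
static int cross_area(uint64_t addr1, uint64_t addr2, uint64_t size)
{
	return (addr1 & ~(size - 1)) != (addr2 & ~(size - 1));
}
```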
* Re: [PATCH v4 4/5] perf report: Show branch type statistics for stdio mode 2017-04-11 22:21 ` [PATCH v4 4/5] perf report: Show branch type statistics for stdio mode Jin Yao @ 2017-04-18 18:53 ` Jiri Olsa 2017-04-19 0:53 ` Jin, Yao 2017-04-18 18:53 ` Jiri Olsa 1 sibling, 1 reply; 23+ messages in thread From: Jiri Olsa @ 2017-04-18 18:53 UTC (permalink / raw) To: Jin Yao Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev On Wed, Apr 12, 2017 at 06:21:05AM +0800, Jin Yao wrote: SNIP > +const char *branch_type_name(int type) > +{ > + const char *branch_names[PERF_BR_MAX] = { > + "N/A", > + "JCC", > + "JMP", > + "IND_JMP", > + "CALL", > + "IND_CALL", > + "RET", > + "SYSCALL", > + "SYSRET", > + "IRQ", > + "INT", > + "IRET", > + "FAR_BRANCH", > + }; > + > + if ((type >= 0) && (type < PERF_BR_MAX)) > + return branch_names[type]; > + > + return NULL; looks like we should add util/branch.c with above functions and merge it with util/parse-branch-options.c we create new file even for less code ;-) thanks, jirka ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 4/5] perf report: Show branch type statistics for stdio mode 2017-04-18 18:53 ` Jiri Olsa @ 2017-04-19 0:53 ` Jin, Yao 2017-04-19 4:11 ` Jin, Yao 0 siblings, 1 reply; 23+ messages in thread From: Jin, Yao @ 2017-04-19 0:53 UTC (permalink / raw) To: Jiri Olsa Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev On 4/19/2017 2:53 AM, Jiri Olsa wrote: > On Wed, Apr 12, 2017 at 06:21:05AM +0800, Jin Yao wrote: > > SNIP > >> +const char *branch_type_name(int type) >> +{ >> + const char *branch_names[PERF_BR_MAX] = { >> + "N/A", >> + "JCC", >> + "JMP", >> + "IND_JMP", >> + "CALL", >> + "IND_CALL", >> + "RET", >> + "SYSCALL", >> + "SYSRET", >> + "IRQ", >> + "INT", >> + "IRET", >> + "FAR_BRANCH", >> + }; >> + >> + if ((type >= 0) && (type < PERF_BR_MAX)) >> + return branch_names[type]; >> + >> + return NULL; > looks like we should add util/branch.c with above functions > and merge it with util/parse-branch-options.c > > we create new file even for less code ;-) > > thanks, > jirka Could we directly add branch_type_name() in util/parse-branch-options.c? I just feel it's a bit waste of creating a new file for less code. :) Thanks Jin Yao ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 4/5] perf report: Show branch type statistics for stdio mode 2017-04-19 0:53 ` Jin, Yao @ 2017-04-19 4:11 ` Jin, Yao 0 siblings, 0 replies; 23+ messages in thread From: Jin, Yao @ 2017-04-19 4:11 UTC (permalink / raw) To: Jiri Olsa Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev On 4/19/2017 8:53 AM, Jin, Yao wrote: > > > On 4/19/2017 2:53 AM, Jiri Olsa wrote: >> On Wed, Apr 12, 2017 at 06:21:05AM +0800, Jin Yao wrote: >> >> SNIP >> >>> +const char *branch_type_name(int type) >>> +{ >>> + const char *branch_names[PERF_BR_MAX] = { >>> + "N/A", >>> + "JCC", >>> + "JMP", >>> + "IND_JMP", >>> + "CALL", >>> + "IND_CALL", >>> + "RET", >>> + "SYSCALL", >>> + "SYSRET", >>> + "IRQ", >>> + "INT", >>> + "IRET", >>> + "FAR_BRANCH", >>> + }; >>> + >>> + if ((type >= 0) && (type < PERF_BR_MAX)) >>> + return branch_names[type]; >>> + >>> + return NULL; >> looks like we should add util/branch.c with above functions >> and merge it with util/parse-branch-options.c >> >> we create new file even for less code ;-) >> >> thanks, >> jirka > > Could we directly add branch_type_name() in util/parse-branch-options.c? > > I just feel it's a bit waste of creating a new file for less code. :) > > Thanks > Jin Yao After considering again, yes, creating util/branch.c should be better. I will do that. Thanks Jin Yao ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 4/5] perf report: Show branch type statistics for stdio mode 2017-04-11 22:21 ` [PATCH v4 4/5] perf report: Show branch type statistics for stdio mode Jin Yao 2017-04-18 18:53 ` Jiri Olsa @ 2017-04-18 18:53 ` Jiri Olsa 2017-04-19 0:41 ` Jin, Yao 1 sibling, 1 reply; 23+ messages in thread From: Jiri Olsa @ 2017-04-18 18:53 UTC (permalink / raw) To: Jin Yao Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev On Wed, Apr 12, 2017 at 06:21:05AM +0800, Jin Yao wrote: SNIP > +static int hist_iter__branch_callback(struct hist_entry_iter *iter, > + struct addr_location *al __maybe_unused, > + bool single __maybe_unused, > + void *arg) > +{ > + struct hist_entry *he = iter->he; > + struct report *rep = arg; > + struct branch_info *bi; > + > + if (sort__mode == SORT_MODE__BRANCH) { is this check necessary? the hist_iter__branch_callback was set based on this check jirka ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 4/5] perf report: Show branch type statistics for stdio mode 2017-04-18 18:53 ` Jiri Olsa @ 2017-04-19 0:41 ` Jin, Yao 0 siblings, 0 replies; 23+ messages in thread From: Jin, Yao @ 2017-04-19 0:41 UTC (permalink / raw) To: Jiri Olsa Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev On 4/19/2017 2:53 AM, Jiri Olsa wrote: > On Wed, Apr 12, 2017 at 06:21:05AM +0800, Jin Yao wrote: > > SNIP > >> +static int hist_iter__branch_callback(struct hist_entry_iter *iter, >> + struct addr_location *al __maybe_unused, >> + bool single __maybe_unused, >> + void *arg) >> +{ >> + struct hist_entry *he = iter->he; >> + struct report *rep = arg; >> + struct branch_info *bi; >> + >> + if (sort__mode == SORT_MODE__BRANCH) { > is this check necessary? the hist_iter__branch_callback > was set based on this check > > jirka Let me double check. Thanks Jin Yao ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH v4 5/5] perf report: Show branch type in callchain entry
  2017-04-11 22:21 [PATCH v4 0/5] perf report: Show branch type Jin Yao
  ` (3 preceding siblings ...)
  2017-04-11 22:21 ` [PATCH v4 4/5] perf report: Show branch type statistics for stdio mode Jin Yao
@ 2017-04-11 22:21 ` Jin Yao
  2017-04-18 18:53   ` Jiri Olsa
  2017-04-18 18:53   ` Jiri Olsa
  2017-04-12 10:58 ` [PATCH v4 0/5] perf report: Show branch type Jiri Olsa
  5 siblings, 2 replies; 23+ messages in thread
From: Jin Yao @ 2017-04-11 22:21 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev, Jin Yao

Show the branch type in the callchain entry. The branch type is printed
together with the other LBR information (such as cycles/abort/...).

One example:

perf report --branch-history --stdio --no-children

--23.54%--main div.c:42 (CROSS_2M RET cycles:2)
          compute_flag div.c:28 (RET cycles:2)
          compute_flag div.c:27 (CROSS_2M RET cycles:1)
          rand rand.c:28 (CROSS_4K RET cycles:1)
          rand rand.c:28 (CROSS_2M RET cycles:1)
          __random random.c:298 (CROSS_4K RET cycles:1)
          __random random.c:297 (JCC backward CROSS_2M cycles:1)
          __random random.c:295 (JCC forward CROSS_4K cycles:1)
          __random random.c:295 (JCC backward CROSS_2M cycles:1)
          __random random.c:295 (JCC forward CROSS_4K cycles:1)
          __random random.c:295 (CROSS_2M RET cycles:9)

Compared to the previous version, the major change is that the JCC
forward/backward and cross-page checks are now computed in user space
from the branch's from and to addresses. Since each callchain entry
contains only one ip (either from or to), this patch appends the
branch's from address to the callchain entry that contains only the
to ip.
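The user-space classification described above can be sketched as follows. This is a hedged illustration of the idea only — the helper names are invented, and the real patch works on perf_branch_entry from/to fields: a conditional branch is "forward" when its target lies after the branch instruction, and the CROSS_4K/CROSS_2M flags approximately test whether from and to fall in different naturally aligned areas.

```c
#include <stdint.h>
#include <stdbool.h>

/* Forward conditional jump: the target address is numerically
 * greater than the branch instruction's address. */
static bool jcc_is_forward(uint64_t from, uint64_t to)
{
	return to > from;
}

/* Approximate "cross" check: do from and to land in different
 * naturally aligned areas (e.g. 4KB pages or 2MB regions)? */
static bool branch_crosses(uint64_t from, uint64_t to, uint64_t area)
{
	return (from / area) != (to / area);
}
```

Both checks need from *and* to, which is exactly why the patch has to carry the branch's from address alongside the single ip stored in each callchain entry.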
Signed-off-by: Jin Yao <yao.jin@linux.intel.com> --- tools/perf/util/callchain.c | 195 ++++++++++++++++++++++++++++++-------------- tools/perf/util/callchain.h | 4 +- tools/perf/util/machine.c | 26 ++++-- 3 files changed, 152 insertions(+), 73 deletions(-) diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c index 2e5eff5..3c875b1 100644 --- a/tools/perf/util/callchain.c +++ b/tools/perf/util/callchain.c @@ -467,6 +467,11 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor) call->cycles_count = cursor_node->branch_flags.cycles; call->iter_count = cursor_node->nr_loop_iter; call->samples_count = cursor_node->samples; + + branch_type_count(&call->brtype_stat, + &cursor_node->branch_flags, + cursor_node->branch_from, + cursor_node->ip); } list_add_tail(&call->list, &node->val); @@ -579,6 +584,11 @@ static enum match_result match_chain(struct callchain_cursor_node *node, cnode->cycles_count += node->branch_flags.cycles; cnode->iter_count += node->nr_loop_iter; cnode->samples_count += node->samples; + + branch_type_count(&cnode->brtype_stat, + &node->branch_flags, + node->branch_from, + node->ip); } return MATCH_EQ; @@ -813,7 +823,7 @@ merge_chain_branch(struct callchain_cursor *cursor, list_for_each_entry_safe(list, next_list, &src->val, list) { callchain_cursor_append(cursor, list->ip, list->ms.map, list->ms.sym, - false, NULL, 0, 0); + false, NULL, 0, 0, 0); list_del(&list->list); map__zput(list->ms.map); free(list); @@ -853,7 +863,7 @@ int callchain_merge(struct callchain_cursor *cursor, int callchain_cursor_append(struct callchain_cursor *cursor, u64 ip, struct map *map, struct symbol *sym, bool branch, struct branch_flags *flags, - int nr_loop_iter, int samples) + int nr_loop_iter, int samples, u64 branch_from) { struct callchain_cursor_node *node = *cursor->last; @@ -877,6 +887,7 @@ int callchain_cursor_append(struct callchain_cursor *cursor, memcpy(&node->branch_flags, flags, sizeof(struct branch_flags)); + 
node->branch_from = branch_from; cursor->nr++; cursor->last = &node->next; @@ -1105,95 +1116,151 @@ int callchain_branch_counts(struct callchain_root *root, cycles_count); } +static int branch_type_str(struct branch_type_stat *stat, + char *bf, int bfsize) +{ + int i, j = 0, printed = 0; + u64 total = 0; + + for (i = 0; i < PERF_BR_MAX; i++) + total += stat->counts[i]; + + if (total == 0) + return 0; + + printed += scnprintf(bf + printed, bfsize - printed, " ("); + + if (stat->jcc_fwd > 0) { + j++; + printed += scnprintf(bf + printed, bfsize - printed, + "JCC forward"); + } + + if (stat->jcc_bwd > 0) { + if (j++) + printed += scnprintf(bf + printed, bfsize - printed, + " JCC backward"); + else + printed += scnprintf(bf + printed, bfsize - printed, + "JCC backward"); + } + + if (stat->cross_4k > 0) { + if (j++) + printed += scnprintf(bf + printed, bfsize - printed, + " CROSS_4K"); + else + printed += scnprintf(bf + printed, bfsize - printed, + "CROSS_4K"); + } + + if (stat->cross_2m > 0) { + if (j++) + printed += scnprintf(bf + printed, bfsize - printed, + " CROSS_2M"); + else + printed += scnprintf(bf + printed, bfsize - printed, + "CROSS_2M"); + } + + for (i = 0; i < PERF_BR_MAX; i++) { + if (i == PERF_BR_JCC) + continue; + + if (stat->counts[i] > 0) { + if (j++) + printed += scnprintf(bf + printed, + bfsize - printed, + " %s", + branch_type_name(i)); + else + printed += scnprintf(bf + printed, + bfsize - printed, + "%s", + branch_type_name(i)); + } + } + + return printed; +} + static int counts_str_build(char *bf, int bfsize, u64 branch_count, u64 predicted_count, u64 abort_count, u64 cycles_count, - u64 iter_count, u64 samples_count) + u64 iter_count, u64 samples_count, + struct branch_type_stat *brtype_stat) { - double predicted_percent = 0.0; - const char *null_str = ""; - char iter_str[32]; - char cycle_str[32]; - char *istr, *cstr; u64 cycles; + int printed, i = 0; if (branch_count == 0) return scnprintf(bf, bfsize, " (calltrace)"); + printed = 
branch_type_str(brtype_stat, bf, bfsize); + if (printed) + i++; + cycles = cycles_count / branch_count; + if (cycles) { + if (i++) + printed += scnprintf(bf + printed, bfsize - printed, + " cycles:%" PRId64 "", cycles); + else + printed += scnprintf(bf + printed, bfsize - printed, + " (cycles:%" PRId64 "", cycles); + } if (iter_count && samples_count) { - if (cycles > 0) - scnprintf(iter_str, sizeof(iter_str), - " iterations:%" PRId64 "", - iter_count / samples_count); + if (i++) + printed += scnprintf(bf + printed, bfsize - printed, + " iterations:%" PRId64 "", + iter_count / samples_count); else - scnprintf(iter_str, sizeof(iter_str), - "iterations:%" PRId64 "", - iter_count / samples_count); - istr = iter_str; - } else - istr = (char *)null_str; - - if (cycles > 0) { - scnprintf(cycle_str, sizeof(cycle_str), - "cycles:%" PRId64 "", cycles); - cstr = cycle_str; - } else - cstr = (char *)null_str; - - predicted_percent = predicted_count * 100.0 / branch_count; + printed += scnprintf(bf + printed, bfsize - printed, + " (iterations:%" PRId64 "", + iter_count / samples_count); + } - if ((predicted_count == branch_count) && (abort_count == 0)) { - if ((cycles > 0) || (istr != (char *)null_str)) - return scnprintf(bf, bfsize, " (%s%s)", cstr, istr); + if (predicted_count < branch_count) { + if (i++) + printed += scnprintf(bf + printed, bfsize - printed, + " predicted:%.1f%%", + predicted_count * 100.0 / branch_count); else - return scnprintf(bf, bfsize, "%s", (char *)null_str); - } - - if ((predicted_count < branch_count) && (abort_count == 0)) { - if ((cycles > 0) || (istr != (char *)null_str)) - return scnprintf(bf, bfsize, - " (predicted:%.1f%% %s%s)", - predicted_percent, cstr, istr); - else { - return scnprintf(bf, bfsize, - " (predicted:%.1f%%)", - predicted_percent); - } + printed += scnprintf(bf + printed, bfsize - printed, + " (predicted:%.1f%%", + predicted_count * 100.0 / branch_count); } - if ((predicted_count == branch_count) && (abort_count > 0)) { - if 
((cycles > 0) || (istr != (char *)null_str)) - return scnprintf(bf, bfsize, - " (abort:%" PRId64 " %s%s)", - abort_count, cstr, istr); + if (abort_count) { + if (i++) + printed += scnprintf(bf + printed, bfsize - printed, + " abort:%.1f%%", + abort_count * 100.0 / branch_count); else - return scnprintf(bf, bfsize, - " (abort:%" PRId64 ")", - abort_count); + printed += scnprintf(bf + printed, bfsize - printed, + " (abort:%.1f%%", + abort_count * 100.0 / branch_count); } - if ((cycles > 0) || (istr != (char *)null_str)) - return scnprintf(bf, bfsize, - " (predicted:%.1f%% abort:%" PRId64 " %s%s)", - predicted_percent, abort_count, cstr, istr); + if (i) + return scnprintf(bf + printed, bfsize - printed, ")"); - return scnprintf(bf, bfsize, - " (predicted:%.1f%% abort:%" PRId64 ")", - predicted_percent, abort_count); + bf[0] = 0; + return 0; } static int callchain_counts_printf(FILE *fp, char *bf, int bfsize, u64 branch_count, u64 predicted_count, u64 abort_count, u64 cycles_count, - u64 iter_count, u64 samples_count) + u64 iter_count, u64 samples_count, + struct branch_type_stat *brtype_stat) { - char str[128]; + char str[256]; counts_str_build(str, sizeof(str), branch_count, predicted_count, abort_count, cycles_count, - iter_count, samples_count); + iter_count, samples_count, brtype_stat); if (fp) return fprintf(fp, "%s", str); @@ -1225,7 +1292,8 @@ int callchain_list_counts__printf_value(struct callchain_node *node, return callchain_counts_printf(fp, bf, bfsize, branch_count, predicted_count, abort_count, - cycles_count, iter_count, samples_count); + cycles_count, iter_count, samples_count, + &clist->brtype_stat); } static void free_callchain_node(struct callchain_node *node) @@ -1350,7 +1418,8 @@ int callchain_cursor__copy(struct callchain_cursor *dst, rc = callchain_cursor_append(dst, node->ip, node->map, node->sym, node->branch, &node->branch_flags, - node->nr_loop_iter, node->samples); + node->nr_loop_iter, node->samples, + node->branch_from); if (rc) break; 
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h index c56c23d..b93897a 100644 --- a/tools/perf/util/callchain.h +++ b/tools/perf/util/callchain.h @@ -119,6 +119,7 @@ struct callchain_list { u64 cycles_count; u64 iter_count; u64 samples_count; + struct branch_type_stat brtype_stat; char *srcline; struct list_head list; }; @@ -135,6 +136,7 @@ struct callchain_cursor_node { struct symbol *sym; bool branch; struct branch_flags branch_flags; + u64 branch_from; int nr_loop_iter; int samples; struct callchain_cursor_node *next; @@ -198,7 +200,7 @@ static inline void callchain_cursor_reset(struct callchain_cursor *cursor) int callchain_cursor_append(struct callchain_cursor *cursor, u64 ip, struct map *map, struct symbol *sym, bool branch, struct branch_flags *flags, - int nr_loop_iter, int samples); + int nr_loop_iter, int samples, u64 branch_from); /* Close a cursor writing session. Initialize for the reader */ static inline void callchain_cursor_commit(struct callchain_cursor *cursor) diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index dfc6004..2309614 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -1673,7 +1673,8 @@ static int add_callchain_ip(struct thread *thread, bool branch, struct branch_flags *flags, int nr_loop_iter, - int samples) + int samples, + u64 branch_from) { struct addr_location al; @@ -1726,7 +1727,8 @@ static int add_callchain_ip(struct thread *thread, if (symbol_conf.hide_unresolved && al.sym == NULL) return 0; return callchain_cursor_append(cursor, al.addr, al.map, al.sym, - branch, flags, nr_loop_iter, samples); + branch, flags, nr_loop_iter, samples, + branch_from); } struct branch_info *sample__resolve_bstack(struct perf_sample *sample, @@ -1805,7 +1807,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread, struct ip_callchain *chain = sample->callchain; int chain_nr = min(max_stack, (int)chain->nr), i; u8 cpumode = PERF_RECORD_MISC_USER; - u64 ip; + u64 ip, 
branch_from = 0; for (i = 0; i < chain_nr; i++) { if (chain->ips[i] == PERF_CONTEXT_USER) @@ -1847,6 +1849,8 @@ static int resolve_lbr_callchain_sample(struct thread *thread, ip = lbr_stack->entries[0].to; branch = true; flags = &lbr_stack->entries[0].flags; + branch_from = + lbr_stack->entries[0].from; } } else { if (j < lbr_nr) { @@ -1861,12 +1865,15 @@ static int resolve_lbr_callchain_sample(struct thread *thread, ip = lbr_stack->entries[0].to; branch = true; flags = &lbr_stack->entries[0].flags; + branch_from = + lbr_stack->entries[0].from; } } err = add_callchain_ip(thread, cursor, parent, root_al, &cpumode, ip, - branch, flags, 0, 0); + branch, flags, 0, 0, + branch_from); if (err) return (err < 0) ? err : 0; } @@ -1965,19 +1972,20 @@ static int thread__resolve_callchain_sample(struct thread *thread, root_al, NULL, be[i].to, true, &be[i].flags, - nr_loop_iter, 1); + nr_loop_iter, 1, + be[i].from); else err = add_callchain_ip(thread, cursor, parent, root_al, NULL, be[i].to, true, &be[i].flags, - 0, 0); + 0, 0, be[i].from); if (!err) err = add_callchain_ip(thread, cursor, parent, root_al, NULL, be[i].from, true, &be[i].flags, - 0, 0); + 0, 0, 0); if (err == -EINVAL) break; if (err) @@ -2007,7 +2015,7 @@ static int thread__resolve_callchain_sample(struct thread *thread, err = add_callchain_ip(thread, cursor, parent, root_al, &cpumode, ip, - false, NULL, 0, 0); + false, NULL, 0, 0, 0); if (err) return (err < 0) ? err : 0; @@ -2024,7 +2032,7 @@ static int unwind_entry(struct unwind_entry *entry, void *arg) return 0; return callchain_cursor_append(cursor, entry->ip, entry->map, entry->sym, - false, NULL, 0, 0); + false, NULL, 0, 0, 0); } static int thread__resolve_callchain_unwind(struct thread *thread, -- 2.7.4 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH v4 5/5] perf report: Show branch type in callchain entry 2017-04-11 22:21 ` [PATCH v4 5/5] perf report: Show branch type in callchain entry Jin Yao @ 2017-04-18 18:53 ` Jiri Olsa 2017-04-19 0:33 ` Jin, Yao 2017-04-18 18:53 ` Jiri Olsa 1 sibling, 1 reply; 23+ messages in thread From: Jiri Olsa @ 2017-04-18 18:53 UTC (permalink / raw) To: Jin Yao Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev On Wed, Apr 12, 2017 at 06:21:06AM +0800, Jin Yao wrote: SNIP > +static int branch_type_str(struct branch_type_stat *stat, > + char *bf, int bfsize) > +{ > + int i, j = 0, printed = 0; > + u64 total = 0; > + > + for (i = 0; i < PERF_BR_MAX; i++) > + total += stat->counts[i]; > + > + if (total == 0) > + return 0; > + > + printed += scnprintf(bf + printed, bfsize - printed, " ("); > + > + if (stat->jcc_fwd > 0) { > + j++; > + printed += scnprintf(bf + printed, bfsize - printed, > + "JCC forward"); > + } > + > + if (stat->jcc_bwd > 0) { > + if (j++) > + printed += scnprintf(bf + printed, bfsize - printed, > + " JCC backward"); > + else > + printed += scnprintf(bf + printed, bfsize - printed, > + "JCC backward"); > + } > + > + if (stat->cross_4k > 0) { > + if (j++) > + printed += scnprintf(bf + printed, bfsize - printed, > + " CROSS_4K"); > + else > + printed += scnprintf(bf + printed, bfsize - printed, > + "CROSS_4K"); > + } could that 2 legs if be shortened to just one scnprintf like (untested): printed += scnprintf(bf + printed, bfsize - printed, "%s%s", j++ ? " " : "", "CROSS_4K"); I'd also probably use some kind of macro or function with all that similar code, but I dont insist ;-) thanks, jirka ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 5/5] perf report: Show branch type in callchain entry 2017-04-18 18:53 ` Jiri Olsa @ 2017-04-19 0:33 ` Jin, Yao 0 siblings, 0 replies; 23+ messages in thread From: Jin, Yao @ 2017-04-19 0:33 UTC (permalink / raw) To: Jiri Olsa Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev On 4/19/2017 2:53 AM, Jiri Olsa wrote: > On Wed, Apr 12, 2017 at 06:21:06AM +0800, Jin Yao wrote: > > SNIP > >> +static int branch_type_str(struct branch_type_stat *stat, >> + char *bf, int bfsize) >> +{ >> + int i, j = 0, printed = 0; >> + u64 total = 0; >> + >> + for (i = 0; i < PERF_BR_MAX; i++) >> + total += stat->counts[i]; >> + >> + if (total == 0) >> + return 0; >> + >> + printed += scnprintf(bf + printed, bfsize - printed, " ("); >> + >> + if (stat->jcc_fwd > 0) { >> + j++; >> + printed += scnprintf(bf + printed, bfsize - printed, >> + "JCC forward"); >> + } >> + >> + if (stat->jcc_bwd > 0) { >> + if (j++) >> + printed += scnprintf(bf + printed, bfsize - printed, >> + " JCC backward"); >> + else >> + printed += scnprintf(bf + printed, bfsize - printed, >> + "JCC backward"); >> + } >> + >> + if (stat->cross_4k > 0) { >> + if (j++) >> + printed += scnprintf(bf + printed, bfsize - printed, >> + " CROSS_4K"); >> + else >> + printed += scnprintf(bf + printed, bfsize - printed, >> + "CROSS_4K"); >> + } > could that 2 legs if be shortened to just one scnprintf like (untested): > > printed += scnprintf(bf + printed, bfsize - printed, "%s%s", j++ ? " " : "", "CROSS_4K"); > > I'd also probably use some kind of macro or function > with all that similar code, but I dont insist ;-) > > thanks, > jirka Thanks for this suggestion. I will use this kind of code. Of course, I will test. :) Thanks Jin Yao ^ permalink raw reply [flat|nested] 23+ messages in thread
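Jiri's one-scnprintf suggestion above can also be wrapped in a small helper, as he hints. The sketch below is hypothetical: perf's scnprintf() (a bounded snprintf that returns the number of characters actually written) is emulated here with vsnprintf, and the helper name is invented.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Local stand-in for perf's scnprintf(): like snprintf(), but returns
 * how many characters were actually written, never more than size-1. */
static int scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int i;

	va_start(args, fmt);
	i = vsnprintf(buf, size, fmt, args);
	va_end(args);

	return (i >= (int)size) ? (int)size - 1 : i;
}

/* One call per branch-type name: the separating space is emitted only
 * when a previous name was already printed (j > 0), collapsing each
 * two-leg if/else from the patch into a single statement. */
static int branch_name_printf(char *bf, int bfsize, int *printed, int *j,
			      const char *name)
{
	*printed += scnprintf(bf + *printed, bfsize - *printed, "%s%s",
			      (*j)++ ? " " : "", name);
	return *printed;
}
```

With this helper, branch_type_str() reduces to one branch_name_printf() call per flag, and the separator bookkeeping lives in exactly one place.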
* Re: [PATCH v4 5/5] perf report: Show branch type in callchain entry 2017-04-11 22:21 ` [PATCH v4 5/5] perf report: Show branch type in callchain entry Jin Yao 2017-04-18 18:53 ` Jiri Olsa @ 2017-04-18 18:53 ` Jiri Olsa 2017-04-19 0:32 ` Jin, Yao 1 sibling, 1 reply; 23+ messages in thread From: Jiri Olsa @ 2017-04-18 18:53 UTC (permalink / raw) To: Jin Yao Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev On Wed, Apr 12, 2017 at 06:21:06AM +0800, Jin Yao wrote: SNIP > static int counts_str_build(char *bf, int bfsize, > u64 branch_count, u64 predicted_count, > u64 abort_count, u64 cycles_count, > - u64 iter_count, u64 samples_count) > + u64 iter_count, u64 samples_count, > + struct branch_type_stat *brtype_stat) > { > - double predicted_percent = 0.0; > - const char *null_str = ""; > - char iter_str[32]; > - char cycle_str[32]; > - char *istr, *cstr; > u64 cycles; > + int printed, i = 0; > > if (branch_count == 0) > return scnprintf(bf, bfsize, " (calltrace)"); > > + printed = branch_type_str(brtype_stat, bf, bfsize); > + if (printed) > + i++; > + > cycles = cycles_count / branch_count; > + if (cycles) { > + if (i++) > + printed += scnprintf(bf + printed, bfsize - printed, > + " cycles:%" PRId64 "", cycles); > + else > + printed += scnprintf(bf + printed, bfsize - printed, > + " (cycles:%" PRId64 "", cycles); > + } > > if (iter_count && samples_count) { > - if (cycles > 0) > - scnprintf(iter_str, sizeof(iter_str), > - " iterations:%" PRId64 "", > - iter_count / samples_count); > + if (i++) > + printed += scnprintf(bf + printed, bfsize - printed, > + " iterations:%" PRId64 "", > + iter_count / samples_count); > else > - scnprintf(iter_str, sizeof(iter_str), > - "iterations:%" PRId64 "", > - iter_count / samples_count); > - istr = iter_str; could you please put the change from using iter_str to bf into separate patch before the actual branch display change? 
it's hard to see if anything is broken ;-)

thanks,
jirka

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH v4 5/5] perf report: Show branch type in callchain entry 2017-04-18 18:53 ` Jiri Olsa @ 2017-04-19 0:32 ` Jin, Yao 0 siblings, 0 replies; 23+ messages in thread From: Jin, Yao @ 2017-04-19 0:32 UTC (permalink / raw) To: Jiri Olsa Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev On 4/19/2017 2:53 AM, Jiri Olsa wrote: > On Wed, Apr 12, 2017 at 06:21:06AM +0800, Jin Yao wrote: > > SNIP > >> static int counts_str_build(char *bf, int bfsize, >> u64 branch_count, u64 predicted_count, >> u64 abort_count, u64 cycles_count, >> - u64 iter_count, u64 samples_count) >> + u64 iter_count, u64 samples_count, >> + struct branch_type_stat *brtype_stat) >> { >> - double predicted_percent = 0.0; >> - const char *null_str = ""; >> - char iter_str[32]; >> - char cycle_str[32]; >> - char *istr, *cstr; >> u64 cycles; >> + int printed, i = 0; >> >> if (branch_count == 0) >> return scnprintf(bf, bfsize, " (calltrace)"); >> >> + printed = branch_type_str(brtype_stat, bf, bfsize); >> + if (printed) >> + i++; >> + >> cycles = cycles_count / branch_count; >> + if (cycles) { >> + if (i++) >> + printed += scnprintf(bf + printed, bfsize - printed, >> + " cycles:%" PRId64 "", cycles); >> + else >> + printed += scnprintf(bf + printed, bfsize - printed, >> + " (cycles:%" PRId64 "", cycles); >> + } >> >> if (iter_count && samples_count) { >> - if (cycles > 0) >> - scnprintf(iter_str, sizeof(iter_str), >> - " iterations:%" PRId64 "", >> - iter_count / samples_count); >> + if (i++) >> + printed += scnprintf(bf + printed, bfsize - printed, >> + " iterations:%" PRId64 "", >> + iter_count / samples_count); >> else >> - scnprintf(iter_str, sizeof(iter_str), >> - "iterations:%" PRId64 "", >> - iter_count / samples_count); >> - istr = iter_str; > could you please put the change from using iter_str > to bf into separate patch before the actual branch > display change? 
>
> it's hard to see if anything is broken ;-)
>
> thanks,
> jirka

Got it, I will separate the patches.

Thanks
Jin Yao

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH v4 0/5] perf report: Show branch type 2017-04-11 22:21 [PATCH v4 0/5] perf report: Show branch type Jin Yao ` (4 preceding siblings ...) 2017-04-11 22:21 ` [PATCH v4 5/5] perf report: Show branch type in callchain entry Jin Yao @ 2017-04-12 10:58 ` Jiri Olsa 2017-04-12 12:25 ` Jin, Yao 2017-04-13 2:00 ` Jin, Yao 5 siblings, 2 replies; 23+ messages in thread From: Jiri Olsa @ 2017-04-12 10:58 UTC (permalink / raw) To: Jin Yao Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev On Wed, Apr 12, 2017 at 06:21:01AM +0800, Jin Yao wrote: SNIP > > 3. Use 2 bits in perf_branch_entry for a "cross" metrics checking > for branch cross 4K or 2M area. It's an approximate computing > for checking if the branch cross 4K page or 2MB page. > > For example: > > perf record -g --branch-filter any,save_type <command> > > perf report --stdio > > JCC forward: 27.7% > JCC backward: 9.8% > JMP: 0.0% > IND_JMP: 6.5% > CALL: 26.6% > IND_CALL: 0.0% > RET: 29.3% > IRET: 0.0% > CROSS_4K: 0.0% > CROSS_2M: 14.3% got mangled perf report --stdio output for: [root@ibm-x3650m4-02 perf]# ./perf record -j any,save_type kill kill: not enough arguments [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.013 MB perf.data (18 samples) ] [root@ibm-x3650m4-02 perf]# ./perf report --stdio -f | head -30 # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 253 of event 'cycles' # Event count (approx.): 253 # # Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles # ........ ....... .................... ....................................... ....................................... .................. 
# 8.30% perf Um [kernel.vmlinux] [k] __intel_pmu_enable_all.constprop.17 [k] native_write_msr - 7.91% perf Um [kernel.vmlinux] [k] intel_pmu_lbr_enable_all [k] __intel_pmu_enable_all.constprop.17 - 7.91% perf Um [kernel.vmlinux] [k] native_write_msr [k] intel_pmu_lbr_enable_all - 6.32% kill libc-2.24.so [.] _dl_addr [.] _dl_addr - 5.93% perf Um [kernel.vmlinux] [k] perf_iterate_ctx [k] perf_iterate_ctx - 2.77% kill libc-2.24.so [.] malloc [.] malloc - 1.98% kill libc-2.24.so [.] _int_malloc [.] _int_malloc - 1.58% kill [kernel.vmlinux] [k] __rb_insert_augmented [k] __rb_insert_augmented - 1.58% perf Um [kernel.vmlinux] [k] perf_event_exec [k] perf_event_exec - 1.19% kill [kernel.vmlinux] [k] anon_vma_interval_tree_insert [k] anon_vma_interval_tree_insert - 1.19% kill [kernel.vmlinux] [k] free_pgd_range [k] free_pgd_range - 1.19% kill [kernel.vmlinux] [k] n_tty_write [k] n_tty_write - 1.19% perf Um [kernel.vmlinux] [k] native_sched_clock [k] sched_clock - ... SNIP jirka ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 0/5] perf report: Show branch type 2017-04-12 10:58 ` [PATCH v4 0/5] perf report: Show branch type Jiri Olsa @ 2017-04-12 12:25 ` Jin, Yao 2017-04-12 14:26 ` Jiri Olsa 2017-04-13 2:00 ` Jin, Yao 1 sibling, 1 reply; 23+ messages in thread From: Jin, Yao @ 2017-04-12 12:25 UTC (permalink / raw) To: Jiri Olsa Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev On 4/12/2017 6:58 PM, Jiri Olsa wrote: > On Wed, Apr 12, 2017 at 06:21:01AM +0800, Jin Yao wrote: > > SNIP > >> 3. Use 2 bits in perf_branch_entry for a "cross" metrics checking >> for branch cross 4K or 2M area. It's an approximate computing >> for checking if the branch cross 4K page or 2MB page. >> >> For example: >> >> perf record -g --branch-filter any,save_type <command> >> >> perf report --stdio >> >> JCC forward: 27.7% >> JCC backward: 9.8% >> JMP: 0.0% >> IND_JMP: 6.5% >> CALL: 26.6% >> IND_CALL: 0.0% >> RET: 29.3% >> IRET: 0.0% >> CROSS_4K: 0.0% >> CROSS_2M: 14.3% > got mangled perf report --stdio output for: > > > [root@ibm-x3650m4-02 perf]# ./perf record -j any,save_type kill > kill: not enough arguments > [ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.013 MB perf.data (18 samples) ] > > [root@ibm-x3650m4-02 perf]# ./perf report --stdio -f | head -30 > # To display the perf.data header info, please use --header/--header-only options. > # > # > # Total Lost Samples: 0 > # > # Samples: 253 of event 'cycles' > # Event count (approx.): 253 > # > # Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles > # ........ ....... .................... ....................................... ....................................... .................. 
> # > 8.30% perf > Um [kernel.vmlinux] [k] __intel_pmu_enable_all.constprop.17 [k] native_write_msr - > 7.91% perf > Um [kernel.vmlinux] [k] intel_pmu_lbr_enable_all [k] __intel_pmu_enable_all.constprop.17 - > 7.91% perf > Um [kernel.vmlinux] [k] native_write_msr [k] intel_pmu_lbr_enable_all - > 6.32% kill libc-2.24.so [.] _dl_addr [.] _dl_addr - > 5.93% perf > Um [kernel.vmlinux] [k] perf_iterate_ctx [k] perf_iterate_ctx - > 2.77% kill libc-2.24.so [.] malloc [.] malloc - > 1.98% kill libc-2.24.so [.] _int_malloc [.] _int_malloc - > 1.58% kill [kernel.vmlinux] [k] __rb_insert_augmented [k] __rb_insert_augmented - > 1.58% perf > Um [kernel.vmlinux] [k] perf_event_exec [k] perf_event_exec - > 1.19% kill [kernel.vmlinux] [k] anon_vma_interval_tree_insert [k] anon_vma_interval_tree_insert - > 1.19% kill [kernel.vmlinux] [k] free_pgd_range [k] free_pgd_range - > 1.19% kill [kernel.vmlinux] [k] n_tty_write [k] n_tty_write - > 1.19% perf > Um [kernel.vmlinux] [k] native_sched_clock [k] sched_clock - > ... > SNIP > > > jirka Hi, Thanks so much for trying this patch. The branch statistics is printed at the end of perf report --stdio. For example, on my machine, root@skl:/tmp# perf record -j any,save_type kill . . . . . . For more details see kill(1). [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.011 MB perf.data (1 samples) ] root@skl:/tmp# perf report --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 3 of event 'cycles' # Event count (approx.): 3 # # Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles # ........ ....... .................... ............................ ............................ .................. 
# 33.33% perf [kernel.vmlinux] [k] __intel_pmu_enable_all [k] native_write_msr 10 33.33% perf [kernel.vmlinux] [k] intel_pmu_lbr_enable_all [k] __intel_pmu_enable_all 4 33.33% perf [kernel.vmlinux] [k] native_write_msr [k] intel_pmu_lbr_enable_all - # # (Tip: Show current config key-value pairs: perf config --list) # # # Branch Statistics: # CROSS_4K: 100.0% CALL: 33.3% RET: 66.7% Thanks Jin Yao ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 0/5] perf report: Show branch type 2017-04-12 12:25 ` Jin, Yao @ 2017-04-12 14:26 ` Jiri Olsa 2017-04-12 15:42 ` Jin, Yao 0 siblings, 1 reply; 23+ messages in thread From: Jiri Olsa @ 2017-04-12 14:26 UTC (permalink / raw) To: Jin, Yao Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev On Wed, Apr 12, 2017 at 08:25:34PM +0800, Jin, Yao wrote: SNIP > > # Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles > > # ........ ....... .................... ....................................... ....................................... .................. > > # > > 8.30% perf > > Um [kernel.vmlinux] [k] __intel_pmu_enable_all.constprop.17 [k] native_write_msr - > > 7.91% perf > > Um [kernel.vmlinux] [k] intel_pmu_lbr_enable_all [k] __intel_pmu_enable_all.constprop.17 - > > 7.91% perf > > Um [kernel.vmlinux] [k] native_write_msr [k] intel_pmu_lbr_enable_all - > > 6.32% kill libc-2.24.so [.] _dl_addr [.] _dl_addr - > > 5.93% perf > > Um [kernel.vmlinux] [k] perf_iterate_ctx [k] perf_iterate_ctx - > > 2.77% kill libc-2.24.so [.] malloc [.] malloc - > > 1.98% kill libc-2.24.so [.] _int_malloc [.] _int_malloc - > > 1.58% kill [kernel.vmlinux] [k] __rb_insert_augmented [k] __rb_insert_augmented - > > 1.58% perf > > Um [kernel.vmlinux] [k] perf_event_exec [k] perf_event_exec - > > 1.19% kill [kernel.vmlinux] [k] anon_vma_interval_tree_insert [k] anon_vma_interval_tree_insert - > > 1.19% kill [kernel.vmlinux] [k] free_pgd_range [k] free_pgd_range - > > 1.19% kill [kernel.vmlinux] [k] n_tty_write [k] n_tty_write - > > 1.19% perf > > Um [kernel.vmlinux] [k] native_sched_clock [k] sched_clock - > > ... > > SNIP > > > > > > jirka > > Hi, > > Thanks so much for trying this patch. > > The branch statistics is printed at the end of perf report --stdio. yep, but for some reason with your changes the head report got changed as well, I haven't checked the details yet.. 
jirka

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH v4 0/5] perf report: Show branch type 2017-04-12 14:26 ` Jiri Olsa @ 2017-04-12 15:42 ` Jin, Yao 2017-04-12 15:46 ` Jiri Olsa 0 siblings, 1 reply; 23+ messages in thread From: Jin, Yao @ 2017-04-12 15:42 UTC (permalink / raw) To: Jiri Olsa Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev On 4/12/2017 10:26 PM, Jiri Olsa wrote: > On Wed, Apr 12, 2017 at 08:25:34PM +0800, Jin, Yao wrote: > > SNIP > >>> # Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles >>> # ........ ....... .................... ....................................... ....................................... .................. >>> # >>> 8.30% perf >>> Um [kernel.vmlinux] [k] __intel_pmu_enable_all.constprop.17 [k] native_write_msr - >>> 7.91% perf >>> Um [kernel.vmlinux] [k] intel_pmu_lbr_enable_all [k] __intel_pmu_enable_all.constprop.17 - >>> 7.91% perf >>> Um [kernel.vmlinux] [k] native_write_msr [k] intel_pmu_lbr_enable_all - >>> 6.32% kill libc-2.24.so [.] _dl_addr [.] _dl_addr - >>> 5.93% perf >>> Um [kernel.vmlinux] [k] perf_iterate_ctx [k] perf_iterate_ctx - >>> 2.77% kill libc-2.24.so [.] malloc [.] malloc - >>> 1.98% kill libc-2.24.so [.] _int_malloc [.] _int_malloc - >>> 1.58% kill [kernel.vmlinux] [k] __rb_insert_augmented [k] __rb_insert_augmented - >>> 1.58% perf >>> Um [kernel.vmlinux] [k] perf_event_exec [k] perf_event_exec - >>> 1.19% kill [kernel.vmlinux] [k] anon_vma_interval_tree_insert [k] anon_vma_interval_tree_insert - >>> 1.19% kill [kernel.vmlinux] [k] free_pgd_range [k] free_pgd_range - >>> 1.19% kill [kernel.vmlinux] [k] n_tty_write [k] n_tty_write - >>> 1.19% perf >>> Um [kernel.vmlinux] [k] native_sched_clock [k] sched_clock - >>> ... >>> SNIP >>> >>> >>> jirka >> Hi, >> >> Thanks so much for trying this patch. >> >> The branch statistics is printed at the end of perf report --stdio. 
> yep, but for some reason with your changes the head report
> got changed as well, I haven't checked the details yet..
>
> jirka

With no arguments, kill errors out and returns immediately, so the
record is very short. Could you try an application which runs for a
while?

For example:
perf record -j any,save_type top

Thanks
Jin Yao

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [PATCH v4 0/5] perf report: Show branch type 2017-04-12 15:42 ` Jin, Yao @ 2017-04-12 15:46 ` Jiri Olsa 0 siblings, 0 replies; 23+ messages in thread From: Jiri Olsa @ 2017-04-12 15:46 UTC (permalink / raw) To: Jin, Yao Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev On Wed, Apr 12, 2017 at 11:42:44PM +0800, Jin, Yao wrote: > > > On 4/12/2017 10:26 PM, Jiri Olsa wrote: > > On Wed, Apr 12, 2017 at 08:25:34PM +0800, Jin, Yao wrote: > > > > SNIP > > > > > > # Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles > > > > # ........ ....... .................... ....................................... ....................................... .................. > > > > # > > > > 8.30% perf > > > > Um [kernel.vmlinux] [k] __intel_pmu_enable_all.constprop.17 [k] native_write_msr - > > > > 7.91% perf > > > > Um [kernel.vmlinux] [k] intel_pmu_lbr_enable_all [k] __intel_pmu_enable_all.constprop.17 - > > > > 7.91% perf > > > > Um [kernel.vmlinux] [k] native_write_msr [k] intel_pmu_lbr_enable_all - > > > > 6.32% kill libc-2.24.so [.] _dl_addr [.] _dl_addr - > > > > 5.93% perf > > > > Um [kernel.vmlinux] [k] perf_iterate_ctx [k] perf_iterate_ctx - > > > > 2.77% kill libc-2.24.so [.] malloc [.] malloc - > > > > 1.98% kill libc-2.24.so [.] _int_malloc [.] _int_malloc - > > > > 1.58% kill [kernel.vmlinux] [k] __rb_insert_augmented [k] __rb_insert_augmented - > > > > 1.58% perf > > > > Um [kernel.vmlinux] [k] perf_event_exec [k] perf_event_exec - > > > > 1.19% kill [kernel.vmlinux] [k] anon_vma_interval_tree_insert [k] anon_vma_interval_tree_insert - > > > > 1.19% kill [kernel.vmlinux] [k] free_pgd_range [k] free_pgd_range - > > > > 1.19% kill [kernel.vmlinux] [k] n_tty_write [k] n_tty_write - > > > > 1.19% perf > > > > Um [kernel.vmlinux] [k] native_sched_clock [k] sched_clock - > > > > ... > > > > SNIP > > > > > > > > > > > > jirka > > > Hi, > > > > > > Thanks so much for trying this patch. 
> > > > > > The branch statistics is printed at the end of perf report --stdio. > > yep, but for some reason with your changes the head report > > got changed as well, I haven't checked the details yet.. > > > > jirka > > The kill returns immediately with no parameter error. Could you try an > application which can run for a while? > > For example: > perf record -j any,save_type top sure, but it does not change the fact that the report output is broken, we need to fix it even for the 'kill' record case jirka ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH v4 0/5] perf report: Show branch type 2017-04-12 10:58 ` [PATCH v4 0/5] perf report: Show branch type Jiri Olsa 2017-04-12 12:25 ` Jin, Yao @ 2017-04-13 2:00 ` Jin, Yao 2017-04-13 3:25 ` Jin, Yao 1 sibling, 1 reply; 23+ messages in thread From: Jin, Yao @ 2017-04-13 2:00 UTC (permalink / raw) To: Jiri Olsa Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev On 4/12/2017 6:58 PM, Jiri Olsa wrote: > On Wed, Apr 12, 2017 at 06:21:01AM +0800, Jin Yao wrote: > > SNIP > >> 3. Use 2 bits in perf_branch_entry for a "cross" metrics checking >> for branch cross 4K or 2M area. It's an approximate computing >> for checking if the branch cross 4K page or 2MB page. >> >> For example: >> >> perf record -g --branch-filter any,save_type <command> >> >> perf report --stdio >> >> JCC forward: 27.7% >> JCC backward: 9.8% >> JMP: 0.0% >> IND_JMP: 6.5% >> CALL: 26.6% >> IND_CALL: 0.0% >> RET: 29.3% >> IRET: 0.0% >> CROSS_4K: 0.0% >> CROSS_2M: 14.3% > got mangled perf report --stdio output for: > > > [root@ibm-x3650m4-02 perf]# ./perf record -j any,save_type kill > kill: not enough arguments > [ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.013 MB perf.data (18 samples) ] > > [root@ibm-x3650m4-02 perf]# ./perf report --stdio -f | head -30 > # To display the perf.data header info, please use --header/--header-only options. > # > # > # Total Lost Samples: 0 > # > # Samples: 253 of event 'cycles' > # Event count (approx.): 253 > # > # Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles > # ........ ....... .................... ....................................... ....................................... .................. 
> # > 8.30% perf > Um [kernel.vmlinux] [k] __intel_pmu_enable_all.constprop.17 [k] native_write_msr - > 7.91% perf > Um [kernel.vmlinux] [k] intel_pmu_lbr_enable_all [k] __intel_pmu_enable_all.constprop.17 - > 7.91% perf > Um [kernel.vmlinux] [k] native_write_msr [k] intel_pmu_lbr_enable_all - > 6.32% kill libc-2.24.so [.] _dl_addr [.] _dl_addr - > 5.93% perf > Um [kernel.vmlinux] [k] perf_iterate_ctx [k] perf_iterate_ctx - > 2.77% kill libc-2.24.so [.] malloc [.] malloc - > 1.98% kill libc-2.24.so [.] _int_malloc [.] _int_malloc - > 1.58% kill [kernel.vmlinux] [k] __rb_insert_augmented [k] __rb_insert_augmented - > 1.58% perf > Um [kernel.vmlinux] [k] perf_event_exec [k] perf_event_exec - > 1.19% kill [kernel.vmlinux] [k] anon_vma_interval_tree_insert [k] anon_vma_interval_tree_insert - > 1.19% kill [kernel.vmlinux] [k] free_pgd_range [k] free_pgd_range - > 1.19% kill [kernel.vmlinux] [k] n_tty_write [k] n_tty_write - > 1.19% perf > Um [kernel.vmlinux] [k] native_sched_clock [k] sched_clock - > ... > SNIP > > > jirka Sorry, I look at this issue at midnight in Shanghai. I misunderstood that the above output was only a mail format issue. Sorry about that. Now I recheck the output, and yes, the perf report output is mangled. But my patch doesn't touch the associated code. Anyway I remove my patches, pull the latest update from perf/core branch and run tests to check if its a regression issue. I test on HSW and SKL both. 1. On HSW. root@hsw:/tmp# perf record -j any kill ...... /* SNIP */ For more details see kill(1). [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.014 MB perf.data (9 samples) ] root@hsw:/tmp# perf report --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 144 of event 'cycles' # Event count (approx.): 144 # # Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles # ........ ....... .................... 
............................... ............................... .................. # 10.42% kill libc-2.23.so [.] read_alias_file [.] read_alias_file - 9.72% kill [kernel.vmlinux] [k] update_load_avg [k] update_load_avg - 9.03% perf Um [unknown] [k] 0000000000000000 [k] 0000000000000000 - 8.33% kill libc-2.23.so [.] _int_malloc [.] _int_malloc - ...... /* SNIP */ 0.69% kill [kernel.vmlinux] [k] _raw_spin_lock [k] unmap_page_range - 0.69% perf Um [kernel.vmlinux] [k] __intel_pmu_enable_all [k] native_write_msr - 0.69% perf Um [kernel.vmlinux] [k] intel_pmu_lbr_enable_all [k] __intel_pmu_enable_all - 0.69% perf Um [kernel.vmlinux] [k] native_write_msr [k] intel_pmu_lbr_enable_all - The issue is still there. 2. On SKL root@skl:/tmp# perf record -j any kill ...... /* SNIP */ For more details see kill(1). [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.012 MB perf.data (1 samples) ] root@skl:/tmp# perf report --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 32 of event 'cycles' # Event count (approx.): 32 # # Overhead Command Source Shared Object Source Symbol Target Symbol Basic Block Cycles # ........ ....... .................... ............................ ............................ .................. # 90.62% perf Um [unknown] [k] 0000000000000000 [k] 0000000000000000 - 3.12% perf Um [kernel.vmlinux] [k] __intel_pmu_enable_all [k] native_write_msr 11 3.12% perf Um [kernel.vmlinux] [k] intel_pmu_lbr_enable_all [k] __intel_pmu_enable_all 4 3.12% perf Um [kernel.vmlinux] [k] native_write_msr [k] intel_pmu_lbr_enable_all - The issue is there too. Now it works without my patch and it runs with latest perf/core branch. So it looks like a regression issue. Thanks Jin Yao ^ permalink raw reply [flat|nested] 23+ messages in thread
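As an aside, the approximate "cross" metric described in the quoted cover letter (CROSS_4K / CROSS_2M) amounts to comparing the page numbers of branch source and target. A minimal sketch, assuming 4KB and 2MB page sizes — the function names are illustrative, not the patch's actual code:

```c
#include <stdbool.h>
#include <stdint.h>

/* A branch "crosses" a page when source and target addresses fall in
 * different pages, i.e. their page numbers differ.  4KB pages use a
 * 12-bit page offset, 2MB pages a 21-bit page offset. */
bool branch_cross_4k(uint64_t from, uint64_t to)
{
	return (from >> 12) != (to >> 12);
}

bool branch_cross_2m(uint64_t from, uint64_t to)
{
	return (from >> 21) != (to >> 21);
}
```

This is why the cover letter calls it "approximate": it only detects that the two endpoints sit in different pages, not how far apart they are.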
* Re: [PATCH v4 0/5] perf report: Show branch type 2017-04-13 2:00 ` Jin, Yao @ 2017-04-13 3:25 ` Jin, Yao 2017-04-13 8:26 ` Jiri Olsa 0 siblings, 1 reply; 23+ messages in thread From: Jin, Yao @ 2017-04-13 3:25 UTC (permalink / raw) To: Jiri Olsa Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev, treeze.taeung On 4/13/2017 10:00 AM, Jin, Yao wrote: > > > On 4/12/2017 6:58 PM, Jiri Olsa wrote: >> On Wed, Apr 12, 2017 at 06:21:01AM +0800, Jin Yao wrote: >> >> SNIP >> >>> 3. Use 2 bits in perf_branch_entry for a "cross" metrics checking >>> for branch cross 4K or 2M area. It's an approximate computing >>> for checking if the branch cross 4K page or 2MB page. >>> >>> For example: >>> >>> perf record -g --branch-filter any,save_type <command> >>> >>> perf report --stdio >>> >>> JCC forward: 27.7% >>> JCC backward: 9.8% >>> JMP: 0.0% >>> IND_JMP: 6.5% >>> CALL: 26.6% >>> IND_CALL: 0.0% >>> RET: 29.3% >>> IRET: 0.0% >>> CROSS_4K: 0.0% >>> CROSS_2M: 14.3% >> got mangled perf report --stdio output for: >> >> >> [root@ibm-x3650m4-02 perf]# ./perf record -j any,save_type kill >> kill: not enough arguments >> [ perf record: Woken up 1 times to write data ] >> [ perf record: Captured and wrote 0.013 MB perf.data (18 samples) ] >> >> [root@ibm-x3650m4-02 perf]# ./perf report --stdio -f | head -30 >> # To display the perf.data header info, please use >> --header/--header-only options. >> # >> # >> # Total Lost Samples: 0 >> # >> # Samples: 253 of event 'cycles' >> # Event count (approx.): 253 >> # >> # Overhead Command Source Shared Object Source >> Symbol Target >> Symbol Basic Block Cycles >> # ........ ....... .................... >> ....................................... >> ....................................... .................. 
>> # >> 8.30% perf >> Um [kernel.vmlinux] [k] __intel_pmu_enable_all.constprop.17 >> [k] native_write_msr - >> 7.91% perf >> Um [kernel.vmlinux] [k] intel_pmu_lbr_enable_all >> [k] __intel_pmu_enable_all.constprop.17 - >> 7.91% perf >> Um [kernel.vmlinux] [k] native_write_msr >> [k] intel_pmu_lbr_enable_all - >> 6.32% kill libc-2.24.so [.] >> _dl_addr [.] >> _dl_addr - >> 5.93% perf >> Um [kernel.vmlinux] [k] perf_iterate_ctx >> [k] perf_iterate_ctx - >> 2.77% kill libc-2.24.so [.] >> malloc [.] >> malloc - >> 1.98% kill libc-2.24.so [.] >> _int_malloc [.] >> _int_malloc - >> 1.58% kill [kernel.vmlinux] [k] >> __rb_insert_augmented [k] >> __rb_insert_augmented - >> 1.58% perf >> Um [kernel.vmlinux] [k] perf_event_exec >> [k] perf_event_exec - >> 1.19% kill [kernel.vmlinux] [k] >> anon_vma_interval_tree_insert [k] >> anon_vma_interval_tree_insert - >> 1.19% kill [kernel.vmlinux] [k] >> free_pgd_range [k] >> free_pgd_range - >> 1.19% kill [kernel.vmlinux] [k] >> n_tty_write [k] >> n_tty_write - >> 1.19% perf >> Um [kernel.vmlinux] [k] native_sched_clock >> [k] sched_clock - >> ... >> SNIP >> >> >> jirka > > Sorry, I look at this issue at midnight in Shanghai. I misunderstood > that the above output was only a mail format issue. Sorry about that. > > Now I recheck the output, and yes, the perf report output is mangled. > But my patch doesn't touch the associated code. > > Anyway I remove my patches, pull the latest update from perf/core > branch and run tests to check if its a regression issue. I test on HSW > and SKL both. > > 1. On HSW. > > root@hsw:/tmp# perf record -j any kill > ...... /* SNIP */ > For more details see kill(1). > [ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.014 MB perf.data (9 samples) ] > > root@hsw:/tmp# perf report --stdio > # To display the perf.data header info, please use > --header/--header-only options. 
> # > # > # Total Lost Samples: 0 > # > # Samples: 144 of event 'cycles' > # Event count (approx.): 144 > # > # Overhead Command Source Shared Object Source > Symbol Target Symbol Basic Block > Cycles > # ........ ....... .................... > ............................... ............................... > .................. > # > 10.42% kill libc-2.23.so [.] > read_alias_file [.] read_alias_file - > 9.72% kill [kernel.vmlinux] [k] > update_load_avg [k] update_load_avg - > 9.03% perf > Um [unknown] [k] 0000000000000000 [k] > 0000000000000000 - > 8.33% kill libc-2.23.so [.] > _int_malloc [.] _int_malloc - > ...... /* SNIP */ > 0.69% kill [kernel.vmlinux] [k] > _raw_spin_lock [k] unmap_page_range - > 0.69% perf > Um [kernel.vmlinux] [k] __intel_pmu_enable_all [k] > native_write_msr - > 0.69% perf > Um [kernel.vmlinux] [k] intel_pmu_lbr_enable_all [k] > __intel_pmu_enable_all - > 0.69% perf > Um [kernel.vmlinux] [k] native_write_msr [k] > intel_pmu_lbr_enable_all - > > The issue is still there. > > 2. On SKL > > root@skl:/tmp# perf record -j any kill > ...... /* SNIP */ > For more details see kill(1). > [ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.012 MB perf.data (1 samples) ] > > root@skl:/tmp# perf report --stdio > > # To display the perf.data header info, please use > --header/--header-only options. > # > # > # Total Lost Samples: 0 > # > # Samples: 32 of event 'cycles' > # Event count (approx.): 32 > # > # Overhead Command Source Shared Object Source > Symbol Target Symbol Basic Block Cycles > # ........ ....... .................... > ............................ ............................ > .................. 
> #
>  90.62%  perf
> Um  [unknown]         [k] 0000000000000000        [k] 0000000000000000        -
>   3.12%  perf
> Um  [kernel.vmlinux]  [k] __intel_pmu_enable_all  [k] native_write_msr        11
>   3.12%  perf
> Um  [kernel.vmlinux]  [k] intel_pmu_lbr_enable_all  [k] __intel_pmu_enable_all  4
>   3.12%  perf
> Um  [kernel.vmlinux]  [k] native_write_msr        [k] intel_pmu_lbr_enable_all  -
>
> The issue is there too.
>
> Now it works without my patch and it runs with latest perf/core
> branch. So it looks like a regression issue.
>
> Thanks
> Jin Yao

I have tested; the regression happens after this commit:

bdd97ca perf tools: Refactor the code to strip command name with {l,r}trim()

CC to the author for double checking.

Thanks
Jin Yao

^ permalink raw reply	[flat|nested] 23+ messages in thread
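For context, the ltrim()/rtrim() helpers named in the suspect commit can be sketched as below. This is an illustrative reconstruction in their spirit, not the actual tools/ code; the point is that stripping a command name read from /proc/<pid>/comm must remove the trailing newline, or it leaks into the report and breaks the column layout exactly as in the pasted output:

```c
#include <ctype.h>
#include <string.h>

/* ltrim() skips leading whitespace; rtrim() chops trailing whitespace
 * (including a trailing '\n') in place.  Names and signatures are
 * illustrative, not the exact tools/ helpers. */
char *ltrim(char *s)
{
	while (isspace((unsigned char)*s))
		s++;
	return s;
}

char *rtrim(char *s)
{
	size_t len = strlen(s);

	while (len > 0 && isspace((unsigned char)s[len - 1]))
		s[--len] = '\0';
	return s;
}
```

With helpers like these, a comm buffer such as "  perf\n" strips to "perf"; if one code path skips the rtrim() step, the newline survives and splits the Command column across two lines, as seen above.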
* Re: [PATCH v4 0/5] perf report: Show branch type
  2017-04-13  2:00 ` Jin, Yao
@ 2017-04-13  3:25 ` Jin, Yao
  2017-04-13  8:26 ` Jiri Olsa
  0 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2017-04-13 8:26 UTC (permalink / raw)
To: Jin, Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin, linuxppc-dev, treeze.taeung

On Thu, Apr 13, 2017 at 11:25:39AM +0800, Jin, Yao wrote:

SNIP

> > Now it works without my patch and it runs with latest perf/core branch.
> > So it looks like a regression issue.
> >
> > Thanks
> > Jin Yao
> >
>
> I have tested, the regression issue is happened after this commit:
>
> bdd97ca perf tools: Refactor the code to strip command name with {l,r}trim()
>
> CC to the author for double checking.

cool, thanks

jirka

^ permalink raw reply	[flat|nested] 23+ messages in thread
end of thread, other threads:[~2017-04-19  4:11 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-11 22:21 [PATCH v4 0/5] perf report: Show branch type Jin Yao
2017-04-11 22:21 ` [PATCH v4 1/5] perf/core: Define the common branch type classification Jin Yao
2017-04-11 22:21 ` [PATCH v4 2/5] perf/x86/intel: Record branch type Jin Yao
2017-04-11 22:21 ` [PATCH v4 3/5] perf record: Create a new option save_type in --branch-filter Jin Yao
2017-04-11 22:21 ` [PATCH v4 4/5] perf report: Show branch type statistics for stdio mode Jin Yao
2017-04-18 18:53   ` Jiri Olsa
2017-04-19  0:53     ` Jin, Yao
2017-04-19  4:11       ` Jin, Yao
2017-04-18 18:53   ` Jiri Olsa
2017-04-19  0:41     ` Jin, Yao
2017-04-11 22:21 ` [PATCH v4 5/5] perf report: Show branch type in callchain entry Jin Yao
2017-04-18 18:53   ` Jiri Olsa
2017-04-19  0:33     ` Jin, Yao
2017-04-18 18:53   ` Jiri Olsa
2017-04-19  0:32     ` Jin, Yao
2017-04-12 10:58 ` [PATCH v4 0/5] perf report: Show branch type Jiri Olsa
2017-04-12 12:25   ` Jin, Yao
2017-04-12 14:26     ` Jiri Olsa
2017-04-12 15:42       ` Jin, Yao
2017-04-12 15:46         ` Jiri Olsa
2017-04-13  2:00       ` Jin, Yao
2017-04-13  3:25         ` Jin, Yao
2017-04-13  8:26           ` Jiri Olsa