* [PATCH v1 0/5] perf report: Show branch type
@ 2017-03-31 15:18 Jin Yao
2017-03-31 15:18 ` [PATCH v1 1/5] perf/core: Define the common branch type classification Jin Yao
` (4 more replies)
0 siblings, 5 replies; 16+ messages in thread
From: Jin Yao @ 2017-03-31 15:18 UTC
To: acme, jolsa; +Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
It is often useful to know the branch types while analyzing branch
data. For example, a call is very different from a conditional branch.
Currently we have to look the branch up in the binary, but the binary
may no longer be available later, and even when it is available the
lookup takes the user some time. It is very useful for the user to be
able to check the branch type directly in perf report.
Perf already has support for disassembling the branch instruction
to get the branch type.
This patch series records the branch type and shows it with the
other LBR information in the callchain entry in perf report. The
series also adds a branch type summary at the end of the
perf report --stdio output.
To keep the kernel and userspace consistent and to make the
classification generic, the patch adds a common branch type
classification to perf_event.h.
The common branch types are:
JCC forward: Conditional forward jump
JCC backward: Conditional backward jump
JMP: Jump imm
IND_JMP: Jump reg/mem
CALL: Call imm
IND_CALL: Call reg/mem
RET: Ret
FAR_BRANCH: SYSCALL/SYSRET, IRQ, IRET, TSX Abort
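As a rough illustration of the list above, the sketch below maps each
common branch type value to the label shown in the report output. The
numeric values mirror the bitmask enum proposed in patch 1/5; the
SKETCH_ prefix and the sketch_br_type_name() helper are made up for
this example and are not part of the series.

```c
/* Values mirror the enum proposed in patch 1/5 (one bit per type). */
enum {
	SKETCH_BR_NONE       = 0,
	SKETCH_BR_JCC_FWD    = 1 << 1,
	SKETCH_BR_JCC_BWD    = 1 << 2,
	SKETCH_BR_JMP        = 1 << 3,
	SKETCH_BR_IND_JMP    = 1 << 4,
	SKETCH_BR_CALL       = 1 << 5,
	SKETCH_BR_IND_CALL   = 1 << 6,
	SKETCH_BR_RET        = 1 << 7,
	SKETCH_BR_FAR_BRANCH = 1 << 8,
};

/* Hypothetical helper: return the display label for one branch type. */
static const char *sketch_br_type_name(int type)
{
	switch (type) {
	case SKETCH_BR_JCC_FWD:    return "JCC forward";
	case SKETCH_BR_JCC_BWD:    return "JCC backward";
	case SKETCH_BR_JMP:        return "JMP";
	case SKETCH_BR_IND_JMP:    return "IND_JMP";
	case SKETCH_BR_CALL:       return "CALL";
	case SKETCH_BR_IND_CALL:   return "IND_CALL";
	case SKETCH_BR_RET:        return "RET";
	case SKETCH_BR_FAR_BRANCH: return "FAR_BRANCH";
	default:                   return "unknown";
	}
}
```

Calling sketch_br_type_name(SKETCH_BR_RET) yields "RET"; any value
that is not exactly one of the listed types falls through to
"unknown".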
An example:
1. Record branch type (new option "save_type")
perf record -g --branch-filter any,save_type <command>
2. Show the branch type statistics at the end of perf report --stdio
perf report --stdio
JCC forward: 34.0%
JCC backward: 3.6%
JMP: 0.0%
IND_JMP: 6.5%
CALL: 26.6%
IND_CALL: 0.0%
RET: 29.3%
FAR_BRANCH: 0.0%
3. Show branch type in callchain entry
perf report --branch-history --stdio --no-children
--23.91%--main div.c:42 (RET cycles:2)
compute_flag div.c:28 (RET cycles:2)
compute_flag div.c:27 (RET cycles:1)
rand rand.c:28 (RET cycles:1)
rand rand.c:28 (RET cycles:1)
__random random.c:298 (RET cycles:1)
__random random.c:297 (JCC forward cycles:1)
__random random.c:295 (JCC forward cycles:1)
__random random.c:295 (JCC forward cycles:1)
__random random.c:295 (JCC forward cycles:1)
__random random.c:295 (RET cycles:9)
Jin Yao (5):
perf/core: Define the common branch type classification
perf/x86/intel: Record branch type
perf record: Create a new option save_type in --branch-filter
perf report: Show branch type statistics for stdio mode
perf report: Show branch type in callchain entry
arch/x86/events/intel/lbr.c | 69 ++++++++++++-
include/uapi/linux/perf_event.h | 24 ++++-
tools/include/uapi/linux/perf_event.h | 24 ++++-
tools/perf/Documentation/perf-record.txt | 1 +
tools/perf/builtin-report.c | 140 ++++++++++++++++++++++++++
tools/perf/util/callchain.c | 168 ++++++++++++++++++++-----------
tools/perf/util/callchain.h | 13 +++
tools/perf/util/event.h | 3 +-
tools/perf/util/hist.c | 5 +-
tools/perf/util/parse-branch-options.c | 1 +
10 files changed, 381 insertions(+), 67 deletions(-)
--
2.7.4
* [PATCH v1 1/5] perf/core: Define the common branch type classification
From: Jin Yao @ 2017-03-31 15:18 UTC
To: acme, jolsa; +Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
It is often useful to know the branch types while analyzing branch
data. For example, a call is very different from a conditional branch.
Currently we have to look the branch up in the binary, but the binary
may no longer be available later, and even when it is available the
lookup takes the user some time. It is very useful for the user to be
able to check the branch type directly in perf report.
Perf already has support for disassembling the branch instruction
to get the branch type. The branch type is defined in lbr.c.
To keep the kernel and userspace consistent and to make the
classification generic, the patch adds a common branch type
classification to perf_event.h.
Since disassembling the branch instruction adds some overhead,
a new PERF_SAMPLE_BRANCH_TYPE_SAVE flag is introduced to indicate
whether the kernel needs to disassemble the branch instruction and
record the branch type.
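A minimal sketch of how the opt-in flag could be tested against a
branch_sample_type bitmask is shown here. The constants are local
copies for illustration only (bit 3 for "any" as in perf_event.h,
bit 16 for the save flag as proposed by this patch), and
sketch_wants_branch_type() is a hypothetical helper, not code from
the series.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative copies of the uapi bits: ANY is bit 3 in perf_event.h,
 * TYPE_SAVE is bit 16 as proposed by this patch.
 */
#define SKETCH_BRANCH_ANY        (1U << 3)
#define SKETCH_BRANCH_TYPE_SAVE  (1U << 16)

/* Hypothetical helper: does this request ask for the branch type? */
static bool sketch_wants_branch_type(uint64_t branch_sample_type)
{
	return (branch_sample_type & SKETCH_BRANCH_TYPE_SAVE) != 0;
}
```

A request built as SKETCH_BRANCH_ANY | SKETCH_BRANCH_TYPE_SAVE makes
the helper return true, while SKETCH_BRANCH_ANY alone does not, so the
disassembly cost is only paid when the user asks for it.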
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
include/uapi/linux/perf_event.h | 24 +++++++++++++++++++++++-
tools/include/uapi/linux/perf_event.h | 24 +++++++++++++++++++++++-
2 files changed, 46 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index d09a9cd..4d731fd 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT = 14, /* no flags */
PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT = 15, /* no cycles */
+ PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT = 16, /* save branch type */
+
PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
};
@@ -198,9 +200,27 @@ enum perf_branch_sample_type {
PERF_SAMPLE_BRANCH_NO_FLAGS = 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
PERF_SAMPLE_BRANCH_NO_CYCLES = 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
+ PERF_SAMPLE_BRANCH_TYPE_SAVE =
+ 1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
+
PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
};
+/*
+ * Common flow change classification
+ */
+enum {
+ PERF_BR_NONE = 0, /* unknown */
+ PERF_BR_JCC_FWD = 1 << 1, /* conditional forward jump */
+ PERF_BR_JCC_BWD = 1 << 2, /* conditional backward jump */
+ PERF_BR_JMP = 1 << 3, /* jump */
+ PERF_BR_IND_JMP = 1 << 4, /* indirect jump */
+ PERF_BR_CALL = 1 << 5, /* call */
+ PERF_BR_IND_CALL = 1 << 6, /* indirect call */
+ PERF_BR_RET = 1 << 7, /* return */
+ PERF_BR_FAR_BRANCH = 1 << 8, /* SYSCALL,SYSRET,IRQ,... */
+};
+
#define PERF_SAMPLE_BRANCH_PLM_ALL \
(PERF_SAMPLE_BRANCH_USER|\
PERF_SAMPLE_BRANCH_KERNEL|\
@@ -999,6 +1019,7 @@ union perf_mem_data_src {
* in_tx: running in a hardware transaction
* abort: aborting a hardware transaction
* cycles: cycles from last branch (or 0 if not supported)
+ * type: branch type
*/
struct perf_branch_entry {
__u64 from;
@@ -1008,7 +1029,8 @@ struct perf_branch_entry {
in_tx:1, /* in transaction */
abort:1, /* transaction abort */
cycles:16, /* cycle count to last branch */
- reserved:44;
+ type:9, /* branch type */
+ reserved:35;
};
#endif /* _UAPI_LINUX_PERF_EVENT_H */
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index d09a9cd..4d731fd 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT = 14, /* no flags */
PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT = 15, /* no cycles */
+ PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT = 16, /* save branch type */
+
PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
};
@@ -198,9 +200,27 @@ enum perf_branch_sample_type {
PERF_SAMPLE_BRANCH_NO_FLAGS = 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
PERF_SAMPLE_BRANCH_NO_CYCLES = 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
+ PERF_SAMPLE_BRANCH_TYPE_SAVE =
+ 1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
+
PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
};
+/*
+ * Common flow change classification
+ */
+enum {
+ PERF_BR_NONE = 0, /* unknown */
+ PERF_BR_JCC_FWD = 1 << 1, /* conditional forward jump */
+ PERF_BR_JCC_BWD = 1 << 2, /* conditional backward jump */
+ PERF_BR_JMP = 1 << 3, /* jump */
+ PERF_BR_IND_JMP = 1 << 4, /* indirect jump */
+ PERF_BR_CALL = 1 << 5, /* call */
+ PERF_BR_IND_CALL = 1 << 6, /* indirect call */
+ PERF_BR_RET = 1 << 7, /* return */
+ PERF_BR_FAR_BRANCH = 1 << 8, /* SYSCALL,SYSRET,IRQ,... */
+};
+
#define PERF_SAMPLE_BRANCH_PLM_ALL \
(PERF_SAMPLE_BRANCH_USER|\
PERF_SAMPLE_BRANCH_KERNEL|\
@@ -999,6 +1019,7 @@ union perf_mem_data_src {
* in_tx: running in a hardware transaction
* abort: aborting a hardware transaction
* cycles: cycles from last branch (or 0 if not supported)
+ * type: branch type
*/
struct perf_branch_entry {
__u64 from;
@@ -1008,7 +1029,8 @@ struct perf_branch_entry {
in_tx:1, /* in transaction */
abort:1, /* transaction abort */
cycles:16, /* cycle count to last branch */
- reserved:44;
+ type:9, /* branch type */
+ reserved:35;
};
#endif /* _UAPI_LINUX_PERF_EVENT_H */
--
2.7.4
* [PATCH v1 2/5] perf/x86/intel: Record branch type
From: Jin Yao @ 2017-03-31 15:18 UTC
To: acme, jolsa; +Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
Perf already has support for disassembling the branch instruction
and using the branch type for filtering. The patch just records
the branch type in perf_branch_entry.
Before recording, the patch converts the x86 branch classification
to the common branch classification.
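The only conversion step that needs more than a one-to-one mapping is
the conditional jump, which is split by direction. A small sketch of
that rule follows; the SKETCH_ names and sketch_jcc_direction() are
invented for this example, and the values copy the enum from
patch 1/5.

```c
#include <stdint.h>

/* Values copy the common classification proposed in patch 1/5. */
enum {
	SKETCH_BR_JCC_FWD = 1 << 1,	/* conditional forward jump */
	SKETCH_BR_JCC_BWD = 1 << 2,	/* conditional backward jump */
};

/*
 * Hypothetical helper: a conditional jump is "forward" only when the
 * target address is strictly above the source address.
 */
static int sketch_jcc_direction(uint64_t from, uint64_t to)
{
	return to > from ? SKETCH_BR_JCC_FWD : SKETCH_BR_JCC_BWD;
}
```

Under this rule a jump whose target equals its source address is
counted as backward, which matches the "to > from" comparison used in
the patch.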
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
arch/x86/events/intel/lbr.c | 69 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 68 insertions(+), 1 deletion(-)
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 81b321a..57d17a4 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -109,6 +109,9 @@ enum {
X86_BR_ZERO_CALL = 1 << 15,/* zero length call */
X86_BR_CALL_STACK = 1 << 16,/* call stack */
X86_BR_IND_JMP = 1 << 17,/* indirect jump */
+
+ X86_BR_TYPE_SAVE = 1 << 18,/* indicate to save branch type */
+
};
#define X86_BR_PLM (X86_BR_USER | X86_BR_KERNEL)
@@ -670,6 +673,10 @@ static int intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
if (br_type & PERF_SAMPLE_BRANCH_CALL)
mask |= X86_BR_CALL | X86_BR_ZERO_CALL;
+
+ if (br_type & PERF_SAMPLE_BRANCH_TYPE_SAVE)
+ mask |= X86_BR_TYPE_SAVE;
+
/*
* stash actual user request into reg, it may
* be used by fixup code for some CPU
@@ -923,6 +930,58 @@ static int branch_type(unsigned long from, unsigned long to, int abort)
return ret;
}
+static int
+common_branch_type(int type, u64 from, u64 to)
+{
+ int ret;
+
+ type = type & (~(X86_BR_KERNEL | X86_BR_USER));
+
+ switch (type) {
+ case X86_BR_CALL:
+ case X86_BR_ZERO_CALL:
+ ret = PERF_BR_CALL;
+ break;
+
+ case X86_BR_RET:
+ ret = PERF_BR_RET;
+ break;
+
+ case X86_BR_SYSCALL:
+ case X86_BR_SYSRET:
+ case X86_BR_INT:
+ case X86_BR_IRET:
+ case X86_BR_IRQ:
+ case X86_BR_ABORT:
+ ret = PERF_BR_FAR_BRANCH;
+ break;
+
+ case X86_BR_JCC:
+ if (to > from)
+ ret = PERF_BR_JCC_FWD;
+ else
+ ret = PERF_BR_JCC_BWD;
+ break;
+
+ case X86_BR_JMP:
+ ret = PERF_BR_JMP;
+ break;
+
+ case X86_BR_IND_CALL:
+ ret = PERF_BR_IND_CALL;
+ break;
+
+ case X86_BR_IND_JMP:
+ ret = PERF_BR_IND_JMP;
+ break;
+
+ default:
+ ret = PERF_BR_NONE;
+ }
+
+ return ret;
+}
+
/*
* implement actual branch filter based on user demand.
* Hardware may not exactly satisfy that request, thus
@@ -939,7 +998,8 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
bool compress = false;
/* if sampling all branches, then nothing to filter */
- if ((br_sel & X86_BR_ALL) == X86_BR_ALL)
+ if (((br_sel & X86_BR_ALL) == X86_BR_ALL) &&
+ ((br_sel & X86_BR_TYPE_SAVE) != X86_BR_TYPE_SAVE))
return;
for (i = 0; i < cpuc->lbr_stack.nr; i++) {
@@ -960,6 +1020,13 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
cpuc->lbr_entries[i].from = 0;
compress = true;
}
+
+ if ((br_sel & X86_BR_TYPE_SAVE) == X86_BR_TYPE_SAVE)
+ cpuc->lbr_entries[i].type = common_branch_type(type,
+ from,
+ to);
+ else
+ cpuc->lbr_entries[i].type = PERF_BR_NONE;
}
if (!compress)
--
2.7.4
* [PATCH v1 3/5] perf record: Create a new option save_type in --branch-filter
From: Jin Yao @ 2017-03-31 15:18 UTC
To: acme, jolsa; +Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
The option tells the kernel to save the branch type during sampling.
One example:
perf record -g --branch-filter any,save_type <command>
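To make the effect of the comma-separated filter string concrete,
here is a simplified sketch of turning "any,save_type" into a bitmask.
It is not the perf parser: the table holds only two entries, the bit
values are local copies (bit 3 for "any", bit 16 for "save_type" as
proposed in this series), and all sketch_ names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

struct sketch_branch_mode {
	const char *name;
	uint64_t mode;
};

/* Reduced, illustrative table; the real one has many more entries. */
static const struct sketch_branch_mode sketch_modes[] = {
	{ "any",       1U << 3 },
	{ "save_type", 1U << 16 },
	{ NULL, 0 },
};

/* Returns 0 on success, -1 on an unknown or empty filter name. */
static int sketch_parse_branch_filter(const char *str, uint64_t *mask)
{
	*mask = 0;
	while (*str) {
		size_t len = strcspn(str, ",");	/* length of this token */
		const struct sketch_branch_mode *m;

		for (m = sketch_modes; m->name; m++) {
			if (strlen(m->name) == len &&
			    !strncmp(str, m->name, len))
				break;
		}
		if (!m->name)
			return -1;
		*mask |= m->mode;

		str += len;
		if (*str == ',')
			str++;	/* skip the separator */
	}
	return 0;
}
```

Parsing "any,save_type" with this sketch sets both bits in the mask,
while an unrecognized name such as "bogus" is rejected.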
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
tools/perf/Documentation/perf-record.txt | 1 +
tools/perf/util/parse-branch-options.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index ea3789d..e2f5a4f 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -332,6 +332,7 @@ following filters are defined:
- no_tx: only when the target is not in a hardware transaction
- abort_tx: only when the target is a hardware transaction abort
- cond: conditional branches
+ - save_type: save branch type during sampling in case binary is not available later
+
The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
diff --git a/tools/perf/util/parse-branch-options.c b/tools/perf/util/parse-branch-options.c
index 38fd115..e71fb5f 100644
--- a/tools/perf/util/parse-branch-options.c
+++ b/tools/perf/util/parse-branch-options.c
@@ -28,6 +28,7 @@ static const struct branch_mode branch_modes[] = {
BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_COND),
BRANCH_OPT("ind_jmp", PERF_SAMPLE_BRANCH_IND_JUMP),
BRANCH_OPT("call", PERF_SAMPLE_BRANCH_CALL),
+ BRANCH_OPT("save_type", PERF_SAMPLE_BRANCH_TYPE_SAVE),
BRANCH_END
};
--
2.7.4
* [PATCH v1 4/5] perf report: Show branch type statistics for stdio mode
From: Jin Yao @ 2017-03-31 15:18 UTC
To: acme, jolsa; +Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
Show the branch type statistics at the end of perf report --stdio.
For example:
perf report --stdio
JCC forward: 34.0%
JCC backward: 3.6%
JMP: 0.0%
IND_JMP: 6.5%
CALL: 26.6%
IND_CALL: 0.0%
RET: 29.3%
FAR_BRANCH: 0.0%
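Each line in the summary is one type's share of all counted branches.
The sketch below shows that arithmetic and the "%12s: %5.1f%%" layout
used by the patch; sketch_percent() and sketch_format_stat() are
hypothetical helpers written for this example.

```c
#include <stdint.h>
#include <stdio.h>

/* Share of 'count' in 'total' as a percentage; 0.0 if nothing counted. */
static double sketch_percent(uint64_t count, uint64_t total)
{
	if (total == 0)
		return 0.0;
	return 100.0 * (double)count / (double)total;
}

/* Format one summary line with the same layout the patch prints. */
static int sketch_format_stat(char *buf, size_t size, const char *name,
			      uint64_t count, uint64_t total)
{
	return snprintf(buf, size, "%12s: %5.1f%%",
			name, sketch_percent(count, total));
}
```

For instance, 1 return out of 4 sampled branches gives 25.0, and the
guard on a zero total avoids a division by zero when no branch types
were recorded.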
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
tools/perf/builtin-report.c | 140 ++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/hist.c | 5 +-
2 files changed, 141 insertions(+), 4 deletions(-)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index c18158b..fb26513 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -43,6 +43,17 @@
#include <linux/bitmap.h>
#include <linux/stringify.h>
+struct branch_type_stat {
+ u64 jcc_fwd;
+ u64 jcc_bwd;
+ u64 jmp;
+ u64 ind_jmp;
+ u64 call;
+ u64 ind_call;
+ u64 ret;
+ u64 far_branch;
+};
+
struct report {
struct perf_tool tool;
struct perf_session *session;
@@ -66,6 +77,7 @@ struct report {
u64 queue_size;
int socket_filter;
DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
+ struct branch_type_stat brtype_stat;
};
static int report__config(const char *var, const char *value, void *cb)
@@ -144,6 +156,66 @@ static int hist_iter__report_callback(struct hist_entry_iter *iter,
return err;
}
+static void branch_type_count(struct report *rep, struct branch_info *bi)
+{
+ struct branch_type_stat *stat = &rep->brtype_stat;
+ struct branch_flags *flags = &bi->flags;
+
+ switch (flags->type) {
+ case PERF_BR_JCC_FWD:
+ stat->jcc_fwd++;
+ break;
+
+ case PERF_BR_JCC_BWD:
+ stat->jcc_bwd++;
+ break;
+
+ case PERF_BR_JMP:
+ stat->jmp++;
+ break;
+
+ case PERF_BR_IND_JMP:
+ stat->ind_jmp++;
+ break;
+
+ case PERF_BR_CALL:
+ stat->call++;
+ break;
+
+ case PERF_BR_IND_CALL:
+ stat->ind_call++;
+ break;
+
+ case PERF_BR_RET:
+ stat->ret++;
+ break;
+
+ case PERF_BR_FAR_BRANCH:
+ stat->far_branch++;
+ break;
+
+ default:
+ break;
+ }
+}
+
+static int hist_iter__branch_callback(struct hist_entry_iter *iter,
+ struct addr_location *al __maybe_unused,
+ bool single __maybe_unused,
+ void *arg)
+{
+ struct hist_entry *he = iter->he;
+ struct report *rep = arg;
+ struct branch_info *bi;
+
+ if (sort__mode == SORT_MODE__BRANCH) {
+ bi = he->branch_info;
+ branch_type_count(rep, bi);
+ }
+
+ return 0;
+}
+
static int process_sample_event(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
@@ -182,6 +254,8 @@ static int process_sample_event(struct perf_tool *tool,
*/
if (!sample->branch_stack)
goto out_put;
+
+ iter.add_entry_cb = hist_iter__branch_callback;
iter.ops = &hist_iter_branch;
} else if (rep->mem_mode) {
iter.ops = &hist_iter_mem;
@@ -369,6 +443,67 @@ static size_t hists__fprintf_nr_sample_events(struct hists *hists, struct report
return ret + fprintf(fp, "\n#\n");
}
+static void branch_type_stat_display(FILE *fp, struct branch_type_stat *stat)
+{
+ u64 total = 0;
+
+ total += stat->jcc_fwd;
+ total += stat->jcc_bwd;
+ total += stat->jmp;
+ total += stat->ind_jmp;
+ total += stat->call;
+ total += stat->ind_call;
+ total += stat->ret;
+ total += stat->far_branch;
+
+ if (total == 0)
+ return;
+
+ fprintf(fp, "\n#");
+ fprintf(fp, "\n# Branch Statistics:");
+ fprintf(fp, "\n#");
+
+ if (stat->jcc_fwd > 0)
+ fprintf(fp, "\n%12s: %5.1f%%",
+ "JCC forward",
+ 100.0 * (double)stat->jcc_fwd / (double)total);
+
+ if (stat->jcc_bwd > 0)
+ fprintf(fp, "\n%12s: %5.1f%%",
+ "JCC backward",
+ 100.0 * (double)stat->jcc_bwd / (double)total);
+
+ if (stat->jmp > 0)
+ fprintf(fp, "\n%12s: %5.1f%%",
+ "JMP",
+ 100.0 * (double)stat->jmp / (double)total);
+
+ if (stat->ind_jmp > 0)
+ fprintf(fp, "\n%12s: %5.1f%%",
+ "IND_JMP",
+ 100.0 * (double)stat->ind_jmp / (double)total);
+
+ if (stat->call > 0)
+ fprintf(fp, "\n%12s: %5.1f%%",
+ "CALL",
+ 100.0 * (double)stat->call / (double)total);
+
+ if (stat->ind_call > 0)
+ fprintf(fp, "\n%12s: %5.1f%%",
+ "IND_CALL",
+ 100.0 * (double)stat->ind_call / (double)total);
+
+ if (stat->ret > 0)
+ fprintf(fp, "\n%12s: %5.1f%%",
+ "RET",
+ 100.0 * (double)stat->ret / (double)total);
+
+ if (stat->far_branch > 0)
+ fprintf(fp, "\n%12s: %5.1f%%",
+ "FAR_BRANCH",
+ 100.0 * (double)stat->far_branch / (double)total);
+}
+
static int perf_evlist__tty_browse_hists(struct perf_evlist *evlist,
struct report *rep,
const char *help)
@@ -404,6 +539,9 @@ static int perf_evlist__tty_browse_hists(struct perf_evlist *evlist,
perf_read_values_destroy(&rep->show_threads_values);
}
+ if (sort__mode == SORT_MODE__BRANCH)
+ branch_type_stat_display(stdout, &rep->brtype_stat);
+
return 0;
}
@@ -936,6 +1074,8 @@ int cmd_report(int argc, const char **argv)
if (has_br_stack && branch_call_mode)
symbol_conf.show_branchflag_count = true;
+ memset(&report.brtype_stat, 0, sizeof(struct branch_type_stat));
+
/*
* Branch mode is a tristate:
* -1 means default, so decide based on the file having branch data.
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 61bf304..c8aee25 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -745,12 +745,9 @@ iter_prepare_branch_entry(struct hist_entry_iter *iter, struct addr_location *al
}
static int
-iter_add_single_branch_entry(struct hist_entry_iter *iter,
+iter_add_single_branch_entry(struct hist_entry_iter *iter __maybe_unused,
struct addr_location *al __maybe_unused)
{
- /* to avoid calling callback function */
- iter->he = NULL;
-
return 0;
}
--
2.7.4
* [PATCH v1 5/5] perf report: Show branch type in callchain entry
From: Jin Yao @ 2017-03-31 15:18 UTC
To: acme, jolsa; +Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
Show branch type in callchain entry. The branch type is printed
with other LBR information (such as cycles/abort/...).
The branch types are:
JCC forward: Conditional forward jump
JCC backward: Conditional backward jump
JMP: Jump imm
IND_JMP: Jump reg/mem
CALL: Call imm
IND_CALL: Call reg/mem
RET: Ret
FAR_BRANCH: SYSCALL/SYSRET, IRQ, IRET, TSX Abort
One example:
perf report --branch-history --stdio --no-children
--23.91%--main div.c:42 (RET cycles:2)
compute_flag div.c:28 (RET cycles:2)
compute_flag div.c:27 (RET cycles:1)
rand rand.c:28 (RET cycles:1)
rand rand.c:28 (RET cycles:1)
__random random.c:298 (RET cycles:1)
__random random.c:297 (JCC forward cycles:1)
__random random.c:295 (JCC forward cycles:1)
__random random.c:295 (JCC forward cycles:1)
__random random.c:295 (JCC forward cycles:1)
__random random.c:295 (RET cycles:9)
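The annotation at the end of each entry in the example can be pictured
as the branch type label followed by the optional cycle count inside a
single pair of parentheses. The helper below is a simplified sketch of
that format only; sketch_annotate() is a hypothetical name and it
ignores the iterations, predicted, and abort fields that the patch
also handles.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build " (TYPE cycles:N)" or " (TYPE)" when the
 * cycle count is zero. Returns what snprintf() returns.
 */
static int sketch_annotate(char *buf, size_t size,
			   const char *type_name,
			   unsigned long long cycles)
{
	if (cycles)
		return snprintf(buf, size, " (%s cycles:%llu)",
				type_name, cycles);
	return snprintf(buf, size, " (%s)", type_name);
}
```

With type_name "RET" and cycles 2 the buffer holds " (RET cycles:2)",
the same shape as the entries shown in the example output.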
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
tools/perf/util/callchain.c | 168 ++++++++++++++++++++++++++++----------------
tools/perf/util/callchain.h | 13 ++++
tools/perf/util/event.h | 3 +-
3 files changed, 124 insertions(+), 60 deletions(-)
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 3cea1fb..f8f4c26 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -428,6 +428,44 @@ create_child(struct callchain_node *parent, bool inherit_children)
return new;
}
+static const char *br_type_name[BR_IDX_MAX] = {
+ "JCC forward",
+ "JCC backward",
+ "JMP",
+ "IND_JMP",
+ "CALL",
+ "IND_CALL",
+ "RET",
+ "FAR_BRANCH",
+};
+
+static void
+branch_type_count(int *counts, struct branch_flags *flags)
+{
+ if ((flags->type & PERF_BR_CALL) == PERF_BR_CALL)
+ counts[BR_IDX_CALL]++;
+
+ if ((flags->type & PERF_BR_RET) == PERF_BR_RET)
+ counts[BR_IDX_RET]++;
+
+ if ((flags->type & PERF_BR_FAR_BRANCH) == PERF_BR_FAR_BRANCH)
+ counts[BR_IDX_FAR_BRANCH]++;
+
+ if ((flags->type & PERF_BR_JCC_FWD) == PERF_BR_JCC_FWD)
+ counts[BR_IDX_JCC_FWD]++;
+
+ if ((flags->type & PERF_BR_JCC_BWD) == PERF_BR_JCC_BWD)
+ counts[BR_IDX_JCC_BWD]++;
+
+ if ((flags->type & PERF_BR_JMP) == PERF_BR_JMP)
+ counts[BR_IDX_JMP]++;
+
+ if ((flags->type & PERF_BR_IND_CALL) == PERF_BR_IND_CALL)
+ counts[BR_IDX_IND_CALL]++;
+
+ if ((flags->type & PERF_BR_IND_JMP) == PERF_BR_IND_JMP)
+ counts[BR_IDX_IND_JMP]++;
+}
/*
* Fill the node with callchain values
@@ -467,6 +505,9 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor)
call->cycles_count = cursor_node->branch_flags.cycles;
call->iter_count = cursor_node->nr_loop_iter;
call->samples_count = cursor_node->samples;
+
+ branch_type_count(call->brtype_count,
+ &cursor_node->branch_flags);
}
list_add_tail(&call->list, &node->val);
@@ -579,6 +620,9 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
cnode->cycles_count += node->branch_flags.cycles;
cnode->iter_count += node->nr_loop_iter;
cnode->samples_count += node->samples;
+
+ branch_type_count(cnode->brtype_count,
+ &node->branch_flags);
}
return MATCH_EQ;
@@ -1105,95 +1149,100 @@ int callchain_branch_counts(struct callchain_root *root,
cycles_count);
}
+static int branch_type_str(int *counts, char *bf, int bfsize)
+{
+ int i, printed = 0;
+
+ for (i = 0; i < BR_IDX_MAX; i++) {
+ if (printed == bfsize - 1)
+ return printed;
+
+ if (counts[i] > 0) {
+ printed += scnprintf(bf + printed, bfsize - printed,
+ " (%s", br_type_name[i]);
+ }
+ }
+
+ return printed;
+}
+
static int counts_str_build(char *bf, int bfsize,
u64 branch_count, u64 predicted_count,
u64 abort_count, u64 cycles_count,
- u64 iter_count, u64 samples_count)
+ u64 iter_count, u64 samples_count,
+ int *brtype_count)
{
- double predicted_percent = 0.0;
- const char *null_str = "";
- char iter_str[32];
- char cycle_str[32];
- char *istr, *cstr;
u64 cycles;
+ int printed, i = 0;
if (branch_count == 0)
return scnprintf(bf, bfsize, " (calltrace)");
- cycles = cycles_count / branch_count;
+ printed = branch_type_str(brtype_count, bf, bfsize);
+ if (printed)
+ i++;
- if (iter_count && samples_count) {
- if (cycles > 0)
- scnprintf(iter_str, sizeof(iter_str),
- " iterations:%" PRId64 "",
- iter_count / samples_count);
+ cycles = cycles_count / branch_count;
+ if (cycles) {
+ if (i++)
+ printed += scnprintf(bf + printed, bfsize - printed,
+ " cycles:%" PRId64 "", cycles);
else
- scnprintf(iter_str, sizeof(iter_str),
- "iterations:%" PRId64 "",
- iter_count / samples_count);
- istr = iter_str;
- } else
- istr = (char *)null_str;
-
- if (cycles > 0) {
- scnprintf(cycle_str, sizeof(cycle_str),
- "cycles:%" PRId64 "", cycles);
- cstr = cycle_str;
- } else
- cstr = (char *)null_str;
-
- predicted_percent = predicted_count * 100.0 / branch_count;
+ printed += scnprintf(bf + printed, bfsize - printed,
+ " (cycles:%" PRId64 "", cycles);
+ }
- if ((predicted_count == branch_count) && (abort_count == 0)) {
- if ((cycles > 0) || (istr != (char *)null_str))
- return scnprintf(bf, bfsize, " (%s%s)", cstr, istr);
+ if (iter_count && samples_count) {
+ if (i++)
+ printed += scnprintf(bf + printed, bfsize - printed,
+ " iterations:%" PRId64 "",
+ iter_count / samples_count);
else
- return scnprintf(bf, bfsize, "%s", (char *)null_str);
+ printed += scnprintf(bf + printed, bfsize - printed,
+ " (iterations:%" PRId64 "",
+ iter_count / samples_count);
}
- if ((predicted_count < branch_count) && (abort_count == 0)) {
- if ((cycles > 0) || (istr != (char *)null_str))
- return scnprintf(bf, bfsize,
- " (predicted:%.1f%% %s%s)",
- predicted_percent, cstr, istr);
- else {
- return scnprintf(bf, bfsize,
- " (predicted:%.1f%%)",
- predicted_percent);
- }
+ if (predicted_count < branch_count) {
+ if (i++)
+ printed += scnprintf(bf + printed, bfsize - printed,
+ " predicted:%.1f%%",
+ predicted_count * 100.0 / branch_count);
+ else
+ printed += scnprintf(bf + printed, bfsize - printed,
+ " (predicted:%.1f%%",
+ predicted_count * 100.0 / branch_count);
}
- if ((predicted_count == branch_count) && (abort_count > 0)) {
- if ((cycles > 0) || (istr != (char *)null_str))
- return scnprintf(bf, bfsize,
- " (abort:%" PRId64 " %s%s)",
- abort_count, cstr, istr);
+ if (abort_count) {
+ if (i++)
+ printed += scnprintf(bf + printed, bfsize - printed,
+ " abort:%.1f%%",
+ abort_count * 100.0 / branch_count);
else
- return scnprintf(bf, bfsize,
- " (abort:%" PRId64 ")",
- abort_count);
+ printed += scnprintf(bf + printed, bfsize - printed,
+ " (abort:%.1f%%",
+ abort_count * 100.0 / branch_count);
}
- if ((cycles > 0) || (istr != (char *)null_str))
- return scnprintf(bf, bfsize,
- " (predicted:%.1f%% abort:%" PRId64 " %s%s)",
- predicted_percent, abort_count, cstr, istr);
+ if (i)
+ return scnprintf(bf + printed, bfsize - printed, ")");
- return scnprintf(bf, bfsize,
- " (predicted:%.1f%% abort:%" PRId64 ")",
- predicted_percent, abort_count);
+ bf[0] = 0;
+ return 0;
}
static int callchain_counts_printf(FILE *fp, char *bf, int bfsize,
u64 branch_count, u64 predicted_count,
u64 abort_count, u64 cycles_count,
- u64 iter_count, u64 samples_count)
+ u64 iter_count, u64 samples_count,
+ int *brtype_count)
{
char str[128];
counts_str_build(str, sizeof(str), branch_count,
predicted_count, abort_count, cycles_count,
- iter_count, samples_count);
+ iter_count, samples_count, brtype_count);
if (fp)
return fprintf(fp, "%s", str);
@@ -1225,7 +1274,8 @@ int callchain_list_counts__printf_value(struct callchain_node *node,
return callchain_counts_printf(fp, bf, bfsize, branch_count,
predicted_count, abort_count,
- cycles_count, iter_count, samples_count);
+ cycles_count, iter_count, samples_count,
+ clist->brtype_count);
}
static void free_callchain_node(struct callchain_node *node)
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index c56c23d..994aa5a 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -106,6 +106,18 @@ struct callchain_param {
extern struct callchain_param callchain_param;
extern struct callchain_param callchain_param_default;
+enum {
+ BR_IDX_JCC_FWD = 0,
+ BR_IDX_JCC_BWD = 1,
+ BR_IDX_JMP = 2,
+ BR_IDX_IND_JMP = 3,
+ BR_IDX_CALL = 4,
+ BR_IDX_IND_CALL = 5,
+ BR_IDX_RET = 6,
+ BR_IDX_FAR_BRANCH = 7,
+ BR_IDX_MAX,
+};
+
struct callchain_list {
u64 ip;
struct map_symbol ms;
@@ -119,6 +131,7 @@ struct callchain_list {
u64 cycles_count;
u64 iter_count;
u64 samples_count;
+ int brtype_count[BR_IDX_MAX];
char *srcline;
struct list_head list;
};
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index eb7a7b2..4c1a6da 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -142,7 +142,8 @@ struct branch_flags {
u64 in_tx:1;
u64 abort:1;
u64 cycles:16;
- u64 reserved:44;
+ u64 type:9;
+ u64 reserved:35;
};
struct branch_entry {
--
2.7.4
* Re: [PATCH v1 1/5] perf/core: Define the common branch type classification
From: Arnaldo Carvalho de Melo @ 2017-04-04 14:18 UTC
To: Jin Yao
Cc: Jiri Olsa, linux-kernel, ak, kan.liang, yao.jin, Peter Zijlstra,
Alexander Shishkin, Ingo Molnar
Adding the perf kernel maintainers to the CC list.
On Fri, Mar 31, 2017 at 11:18:38PM +0800, Jin Yao wrote:
> It is often useful to know the branch types while analyzing branch
> data. For example, a call is very different from a conditional branch.
>
> Currently we have to look it up in binary while the binary may later
> not be available and even the binary is available but user has to take
> some time. It is very useful for user to check it directly in perf
> report.
>
> Perf already has support for disassembling the branch instruction
> to get the branch type. The branch type is defined in lbr.c.
>
> To keep consistent on kernel and userspace and make the classification
> more common, the patch adds the common branch type classification
> in perf_event.h.
>
> Since the disassembling of branch instruction needs some overhead,
> a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
> needs to disassemble the branch instruction and record the branch
> type.
>
> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
> ---
> include/uapi/linux/perf_event.h | 24 +++++++++++++++++++++++-
> tools/include/uapi/linux/perf_event.h | 24 +++++++++++++++++++++++-
> 2 files changed, 46 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index d09a9cd..4d731fd 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
> PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT = 14, /* no flags */
> PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT = 15, /* no cycles */
>
> + PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT = 16, /* save branch type */
> +
> PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
> };
>
> @@ -198,9 +200,27 @@ enum perf_branch_sample_type {
> PERF_SAMPLE_BRANCH_NO_FLAGS = 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
> PERF_SAMPLE_BRANCH_NO_CYCLES = 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
>
> + PERF_SAMPLE_BRANCH_TYPE_SAVE =
> + 1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
> +
> PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
> };
>
> +/*
> + * Common flow change classification
> + */
> +enum {
> + PERF_BR_NONE = 0, /* unknown */
> + PERF_BR_JCC_FWD = 1 << 1, /* conditional forward jump */
> + PERF_BR_JCC_BWD = 1 << 2, /* conditional backward jump */
> + PERF_BR_JMP = 1 << 3, /* jump */
> + PERF_BR_IND_JMP = 1 << 4, /* indirect jump */
> + PERF_BR_CALL = 1 << 5, /* call */
> + PERF_BR_IND_CALL = 1 << 6, /* indirect call */
> + PERF_BR_RET = 1 << 7, /* return */
> + PERF_BR_FAR_BRANCH = 1 << 8, /* SYSCALL,SYSRET,IRQ,... */
Humm, wouldn't it be better to have those in separate buckets? I.e.
PERF_BR_SYSCALL
PERF_BR_SYSRET,
PERF_BR_IRQ
etc?
And why a bitmask? /me reads a bit more... couldn't find a reason for
this:
+ type:9, /* branch type */
Do you have a reason to use 9 bits? Why not just:
enum {
PERF_BR_NONE = 0, /* unknown */
PERF_BR_JCC_FWD = 1, /* conditional forward jump */
PERF_BR_JCC_BWD = 2, /* conditional backward jump */
PERF_BR_JMP = 3, /* jump */
PERF_BR_IND_JMP = 4, /* indirect jump */
PERF_BR_CALL = 5, /* call */
PERF_BR_IND_CALL = 6, /* indirect call */
PERF_BR_RET = 7, /* return */
PERF_BR_FAR_BRANCH = 8, /* SYSCALL,SYSRET,IRQ,... */
And then use, say, 4 or 5 bits for that type field?
I must be missing something trivial ;-\
- Arnaldo
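For illustration, a minimal sketch (not part of the patch) of what the question above amounts to: with sequential values the largest type is 8, which fits in 4 bits, while the bitmask encoding needs 9. The helper bits_needed() and the struct name branch_entry_sketch are hypothetical, and the 4-bit type / 40-bit reserved split is only one possible layout, assuming the other fields stay as in the patch.

```c
#include <stdint.h>

/* Sequential branch type values, as suggested in the reply above. */
enum {
	PERF_BR_NONE		= 0,	/* unknown */
	PERF_BR_JCC_FWD		= 1,	/* conditional forward jump */
	PERF_BR_JCC_BWD		= 2,	/* conditional backward jump */
	PERF_BR_JMP		= 3,	/* jump */
	PERF_BR_IND_JMP		= 4,	/* indirect jump */
	PERF_BR_CALL		= 5,	/* call */
	PERF_BR_IND_CALL	= 6,	/* indirect call */
	PERF_BR_RET		= 7,	/* return */
	PERF_BR_FAR_BRANCH	= 8,	/* SYSCALL,SYSRET,IRQ,... */
};

/* Hypothetical helper: number of bits needed to store 0..max_value. */
static unsigned int bits_needed(unsigned int max_value)
{
	unsigned int bits = 0;

	while (max_value) {
		bits++;
		max_value >>= 1;
	}
	return bits;
}

/*
 * Hypothetical layout: same fields as the patch, but with a 4-bit
 * type field, so reserved grows from 35 to 40 bits (20 + 4 + 40 = 64).
 * 64-bit bit-fields rely on compiler support (gcc and clang accept them).
 */
struct branch_entry_sketch {
	uint64_t from;
	uint64_t to;
	uint64_t mispred:1,	/* target mispredicted */
		 predicted:1,	/* target predicted */
		 in_tx:1,	/* in transaction */
		 abort:1,	/* transaction abort */
		 cycles:16,	/* cycle count to last branch */
		 type:4,	/* branch type */
		 reserved:40;
};
```

bits_needed(PERF_BR_FAR_BRANCH) gives the width for the sequential encoding, and bits_needed(1 << 8) gives the width the bitmask encoding in the patch requires.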
> +};
> +
> #define PERF_SAMPLE_BRANCH_PLM_ALL \
> (PERF_SAMPLE_BRANCH_USER|\
> PERF_SAMPLE_BRANCH_KERNEL|\
> @@ -999,6 +1019,7 @@ union perf_mem_data_src {
> * in_tx: running in a hardware transaction
> * abort: aborting a hardware transaction
> * cycles: cycles from last branch (or 0 if not supported)
> + * type: branch type
> */
> struct perf_branch_entry {
> __u64 from;
> @@ -1008,7 +1029,8 @@ struct perf_branch_entry {
> in_tx:1, /* in transaction */
> abort:1, /* transaction abort */
> cycles:16, /* cycle count to last branch */
> - reserved:44;
> + type:9, /* branch type */
> + reserved:35;
> };
>
> #endif /* _UAPI_LINUX_PERF_EVENT_H */
> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
> index d09a9cd..4d731fd 100644
> --- a/tools/include/uapi/linux/perf_event.h
> +++ b/tools/include/uapi/linux/perf_event.h
> @@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
> PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT = 14, /* no flags */
> PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT = 15, /* no cycles */
>
> + PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT = 16, /* save branch type */
> +
> PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
> };
>
> @@ -198,9 +200,27 @@ enum perf_branch_sample_type {
> PERF_SAMPLE_BRANCH_NO_FLAGS = 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
> PERF_SAMPLE_BRANCH_NO_CYCLES = 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
>
> + PERF_SAMPLE_BRANCH_TYPE_SAVE =
> + 1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
> +
> PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
> };
>
> +/*
> + * Common flow change classification
> + */
> +enum {
> + PERF_BR_NONE = 0, /* unknown */
> + PERF_BR_JCC_FWD = 1 << 1, /* conditional forward jump */
> + PERF_BR_JCC_BWD = 1 << 2, /* conditional backward jump */
> + PERF_BR_JMP = 1 << 3, /* jump */
> + PERF_BR_IND_JMP = 1 << 4, /* indirect jump */
> + PERF_BR_CALL = 1 << 5, /* call */
> + PERF_BR_IND_CALL = 1 << 6, /* indirect call */
> + PERF_BR_RET = 1 << 7, /* return */
> + PERF_BR_FAR_BRANCH = 1 << 8, /* SYSCALL,SYSRET,IRQ,... */
> +};
> +
> #define PERF_SAMPLE_BRANCH_PLM_ALL \
> (PERF_SAMPLE_BRANCH_USER|\
> PERF_SAMPLE_BRANCH_KERNEL|\
> @@ -999,6 +1019,7 @@ union perf_mem_data_src {
> * in_tx: running in a hardware transaction
> * abort: aborting a hardware transaction
> * cycles: cycles from last branch (or 0 if not supported)
> + * type: branch type
> */
> struct perf_branch_entry {
> __u64 from;
> @@ -1008,7 +1029,8 @@ struct perf_branch_entry {
> in_tx:1, /* in transaction */
> abort:1, /* transaction abort */
> cycles:16, /* cycle count to last branch */
> - reserved:44;
> + type:9, /* branch type */
> + reserved:35;
> };
>
> #endif /* _UAPI_LINUX_PERF_EVENT_H */
> --
> 2.7.4
* Re: [PATCH v1 1/5] perf/core: Define the common branch type classification
2017-04-04 14:18 ` Arnaldo Carvalho de Melo
@ 2017-04-04 15:52 ` Jin, Yao
2017-04-04 16:09 ` Arnaldo Carvalho de Melo
2017-04-06 6:58 ` Peter Zijlstra
1 sibling, 1 reply; 16+ messages in thread
From: Jin, Yao @ 2017-04-04 15:52 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Jiri Olsa, linux-kernel, ak, kan.liang, yao.jin, Peter Zijlstra,
Alexander Shishkin, Ingo Molnar
On 4/4/2017 10:18 PM, Arnaldo Carvalho de Melo wrote:
> Adding the perf kernel maintainers to the CC list.
>
> On Fri, Mar 31, 2017 at 11:18:38PM +0800, Jin Yao wrote:
>> It is often useful to know the branch types while analyzing branch
>> data. For example, a call is very different from a conditional branch.
>>
>> Currently we have to look it up in binary while the binary may later
>> not be available and even the binary is available but user has to take
>> some time. It is very useful for user to check it directly in perf
>> report.
>>
>> Perf already has support for disassembling the branch instruction
>> to get the branch type. The branch type is defined in lbr.c.
>>
>> To keep consistent on kernel and userspace and make the classification
>> more common, the patch adds the common branch type classification
>> in perf_event.h.
>>
>> Since the disassembling of branch instruction needs some overhead,
>> a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
>> needs to disassemble the branch instruction and record the branch
>> type.
>>
>> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
>> ---
>> include/uapi/linux/perf_event.h | 24 +++++++++++++++++++++++-
>> tools/include/uapi/linux/perf_event.h | 24 +++++++++++++++++++++++-
>> 2 files changed, 46 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>> index d09a9cd..4d731fd 100644
>> --- a/include/uapi/linux/perf_event.h
>> +++ b/include/uapi/linux/perf_event.h
>> @@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
>> PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT = 14, /* no flags */
>> PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT = 15, /* no cycles */
>>
>> + PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT = 16, /* save branch type */
>> +
>> PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
>> };
>>
>> @@ -198,9 +200,27 @@ enum perf_branch_sample_type {
>> PERF_SAMPLE_BRANCH_NO_FLAGS = 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
>> PERF_SAMPLE_BRANCH_NO_CYCLES = 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
>>
>> + PERF_SAMPLE_BRANCH_TYPE_SAVE =
>> + 1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
>> +
>> PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
>> };
>>
>> +/*
>> + * Common flow change classification
>> + */
>> +enum {
>> + PERF_BR_NONE = 0, /* unknown */
>> + PERF_BR_JCC_FWD = 1 << 1, /* conditional forward jump */
>> + PERF_BR_JCC_BWD = 1 << 2, /* conditional backward jump */
>> + PERF_BR_JMP = 1 << 3, /* jump */
>> + PERF_BR_IND_JMP = 1 << 4, /* indirect jump */
>> + PERF_BR_CALL = 1 << 5, /* call */
>> + PERF_BR_IND_CALL = 1 << 6, /* indirect call */
>> + PERF_BR_RET = 1 << 7, /* return */
>> + PERF_BR_FAR_BRANCH = 1 << 8, /* SYSCALL,SYSRET,IRQ,... */
> Humm, wouldn't be better to have those in separate buckets? I.e.
>
> PERF_BR_SYSCALL
> PERF_BR_SYSRET,
> PERF_BR_IRQ
>
> etc?
There are also other types, e.g. abort. I used FAR_BRANCH because I
just wanted to differentiate between the basic branch types and the
others (the others may be too numerous and platform specific).
>
> And why a bitmask? /me reads a bit more... couldn't find a reason for
> this:
>
> + type:9, /* branch type */
>
> Do you have a reason to use 9 bits? Why not just:
>
> enum {
> PERF_BR_NONE = 0, /* unknown */
> PERF_BR_JCC_FWD = 1, /* conditional forward jump */
> PERF_BR_JCC_BWD = 2, /* conditional backward jump */
> PERF_BR_JMP = 3, /* jump */
> PERF_BR_IND_JMP = 4, /* indirect jump */
> PERF_BR_CALL = 5, /* call */
> PERF_BR_IND_CALL = 6, /* indirect call */
> PERF_BR_RET = 7, /* return */
> PERF_BR_FAR_BRANCH = 8, /* SYSCALL,SYSRET,IRQ,... */
>
> And then use, say, 4 or 5 bits for that type field?
>
> I must be missing something trivial ;-\
>
> - Arnaldo
You are right. I made things more complicated. Yes, the definitions
should be clear and simple. I will redefine them in v2.
Thanks
Jin Yao
>
>> +};
>> +
>> #define PERF_SAMPLE_BRANCH_PLM_ALL \
>> (PERF_SAMPLE_BRANCH_USER|\
>> PERF_SAMPLE_BRANCH_KERNEL|\
>> @@ -999,6 +1019,7 @@ union perf_mem_data_src {
>> * in_tx: running in a hardware transaction
>> * abort: aborting a hardware transaction
>> * cycles: cycles from last branch (or 0 if not supported)
>> + * type: branch type
>> */
>> struct perf_branch_entry {
>> __u64 from;
>> @@ -1008,7 +1029,8 @@ struct perf_branch_entry {
>> in_tx:1, /* in transaction */
>> abort:1, /* transaction abort */
>> cycles:16, /* cycle count to last branch */
>> - reserved:44;
>> + type:9, /* branch type */
>> + reserved:35;
>> };
>>
>> #endif /* _UAPI_LINUX_PERF_EVENT_H */
>> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
>> index d09a9cd..4d731fd 100644
>> --- a/tools/include/uapi/linux/perf_event.h
>> +++ b/tools/include/uapi/linux/perf_event.h
>> @@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
>> PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT = 14, /* no flags */
>> PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT = 15, /* no cycles */
>>
>> + PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT = 16, /* save branch type */
>> +
>> PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
>> };
>>
>> @@ -198,9 +200,27 @@ enum perf_branch_sample_type {
>> PERF_SAMPLE_BRANCH_NO_FLAGS = 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
>> PERF_SAMPLE_BRANCH_NO_CYCLES = 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
>>
>> + PERF_SAMPLE_BRANCH_TYPE_SAVE =
>> + 1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
>> +
>> PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
>> };
>>
>> +/*
>> + * Common flow change classification
>> + */
>> +enum {
>> + PERF_BR_NONE = 0, /* unknown */
>> + PERF_BR_JCC_FWD = 1 << 1, /* conditional forward jump */
>> + PERF_BR_JCC_BWD = 1 << 2, /* conditional backward jump */
>> + PERF_BR_JMP = 1 << 3, /* jump */
>> + PERF_BR_IND_JMP = 1 << 4, /* indirect jump */
>> + PERF_BR_CALL = 1 << 5, /* call */
>> + PERF_BR_IND_CALL = 1 << 6, /* indirect call */
>> + PERF_BR_RET = 1 << 7, /* return */
>> + PERF_BR_FAR_BRANCH = 1 << 8, /* SYSCALL,SYSRET,IRQ,... */
>> +};
>> +
>> #define PERF_SAMPLE_BRANCH_PLM_ALL \
>> (PERF_SAMPLE_BRANCH_USER|\
>> PERF_SAMPLE_BRANCH_KERNEL|\
>> @@ -999,6 +1019,7 @@ union perf_mem_data_src {
>> * in_tx: running in a hardware transaction
>> * abort: aborting a hardware transaction
>> * cycles: cycles from last branch (or 0 if not supported)
>> + * type: branch type
>> */
>> struct perf_branch_entry {
>> __u64 from;
>> @@ -1008,7 +1029,8 @@ struct perf_branch_entry {
>> in_tx:1, /* in transaction */
>> abort:1, /* transaction abort */
>> cycles:16, /* cycle count to last branch */
>> - reserved:44;
>> + type:9, /* branch type */
>> + reserved:35;
>> };
>>
>> #endif /* _UAPI_LINUX_PERF_EVENT_H */
>> --
>> 2.7.4
* Re: [PATCH v1 1/5] perf/core: Define the common branch type classification
2017-04-04 15:52 ` Jin, Yao
@ 2017-04-04 16:09 ` Arnaldo Carvalho de Melo
2017-04-06 0:09 ` Jin, Yao
0 siblings, 1 reply; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-04-04 16:09 UTC (permalink / raw)
To: Jin, Yao
Cc: Jiri Olsa, linux-kernel, ak, kan.liang, yao.jin, Peter Zijlstra,
Alexander Shishkin, Ingo Molnar
On Tue, Apr 04, 2017 at 11:52:53PM +0800, Jin, Yao wrote:
>
>
> On 4/4/2017 10:18 PM, Arnaldo Carvalho de Melo wrote:
> > Adding the perf kernel maintainers to the CC list.
> >
> > On Fri, Mar 31, 2017 at 11:18:38PM +0800, Jin Yao wrote:
> > > It is often useful to know the branch types while analyzing branch
> > > data. For example, a call is very different from a conditional branch.
> > >
> > > Currently we have to look it up in binary while the binary may later
> > > not be available and even the binary is available but user has to take
> > > some time. It is very useful for user to check it directly in perf
> > > report.
> > >
> > > Perf already has support for disassembling the branch instruction
> > > to get the branch type. The branch type is defined in lbr.c.
> > >
> > > To keep consistent on kernel and userspace and make the classification
> > > more common, the patch adds the common branch type classification
> > > in perf_event.h.
> > >
> > > Since the disassembling of branch instruction needs some overhead,
> > > a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
> > > needs to disassemble the branch instruction and record the branch
> > > type.
> > >
> > > Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
> > > ---
> > > include/uapi/linux/perf_event.h | 24 +++++++++++++++++++++++-
> > > tools/include/uapi/linux/perf_event.h | 24 +++++++++++++++++++++++-
> > > 2 files changed, 46 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> > > index d09a9cd..4d731fd 100644
> > > --- a/include/uapi/linux/perf_event.h
> > > +++ b/include/uapi/linux/perf_event.h
> > > @@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
> > > PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT = 14, /* no flags */
> > > PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT = 15, /* no cycles */
> > > + PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT = 16, /* save branch type */
> > > +
> > > PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
> > > };
> > > @@ -198,9 +200,27 @@ enum perf_branch_sample_type {
> > > PERF_SAMPLE_BRANCH_NO_FLAGS = 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
> > > PERF_SAMPLE_BRANCH_NO_CYCLES = 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
> > > + PERF_SAMPLE_BRANCH_TYPE_SAVE =
> > > + 1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
> > > +
> > > PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
> > > };
> > > +/*
> > > + * Common flow change classification
> > > + */
> > > +enum {
> > > + PERF_BR_NONE = 0, /* unknown */
> > > + PERF_BR_JCC_FWD = 1 << 1, /* conditional forward jump */
> > > + PERF_BR_JCC_BWD = 1 << 2, /* conditional backward jump */
> > > + PERF_BR_JMP = 1 << 3, /* jump */
> > > + PERF_BR_IND_JMP = 1 << 4, /* indirect jump */
> > > + PERF_BR_CALL = 1 << 5, /* call */
> > > + PERF_BR_IND_CALL = 1 << 6, /* indirect call */
> > > + PERF_BR_RET = 1 << 7, /* return */
> > > + PERF_BR_FAR_BRANCH = 1 << 8, /* SYSCALL,SYSRET,IRQ,... */
> > Humm, wouldn't be better to have those in separate buckets? I.e.
> >
> > PERF_BR_SYSCALL
> > PERF_BR_SYSRET,
> > PERF_BR_IRQ
> >
> > etc?
>
> There are also other types. I.e. abort, ..... I use FAR_BRANCH is because I
> just want to differentiate between basic branch types and others. (others
> may be too much and platform specific).
I understand that this is what you need right now, but "syscall",
"sysret", "irq", look generic enough, no?
Really, really arch specific stuff could indeed be lumped together in a
FAR_BRANCH, but those used as an example above (/*
SYSCALL,SYSRET,IRQ,... */) seem potentially useful to have untangled?
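As a rough sketch of the untangled variant (not from the patch): the names PERF_BR_SYSCALL, PERF_BR_SYSRET and PERF_BR_IRQ are the ones suggested above, while their numeric values and the helper br_is_far() are assumptions made up for this example. It also shows that a consumer who only wants the old lumped view can still rebuild it from the finer types.

```c
#include <stdbool.h>

/* Sketch only: the values of the split-out types are assumptions. */
enum {
	PERF_BR_NONE		= 0,	/* unknown */
	PERF_BR_JCC_FWD		= 1,	/* conditional forward jump */
	PERF_BR_JCC_BWD		= 2,	/* conditional backward jump */
	PERF_BR_JMP		= 3,	/* jump */
	PERF_BR_IND_JMP		= 4,	/* indirect jump */
	PERF_BR_CALL		= 5,	/* call */
	PERF_BR_IND_CALL	= 6,	/* indirect call */
	PERF_BR_RET		= 7,	/* return */
	PERF_BR_SYSCALL		= 8,	/* syscall */
	PERF_BR_SYSRET		= 9,	/* syscall return */
	PERF_BR_IRQ		= 10,	/* interrupt */
	PERF_BR_FAR_BRANCH	= 11,	/* remaining arch specific cases */
};

/* Hypothetical helper: recover the lumped "far branch" view. */
static bool br_is_far(unsigned int type)
{
	switch (type) {
	case PERF_BR_SYSCALL:
	case PERF_BR_SYSRET:
	case PERF_BR_IRQ:
	case PERF_BR_FAR_BRANCH:
		return true;
	default:
		return false;
	}
}
```

Even with the three extra buckets the largest value here is 11, so a 4-bit type field would still be wide enough.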
> > And why a bitmask? /me reads a bit more... couldn't find a reason for
> > this:
> >
> > + type:9, /* branch type */
> >
> > Do you have a reason to use 9 bits? Why not just:
> >
> > enum {
> > PERF_BR_NONE = 0, /* unknown */
> > PERF_BR_JCC_FWD = 1, /* conditional forward jump */
> > PERF_BR_JCC_BWD = 2, /* conditional backward jump */
> > PERF_BR_JMP = 3, /* jump */
> > PERF_BR_IND_JMP = 4, /* indirect jump */
> > PERF_BR_CALL = 5, /* call */
> > PERF_BR_IND_CALL = 6, /* indirect call */
> > PERF_BR_RET = 7, /* return */
> > PERF_BR_FAR_BRANCH = 8, /* SYSCALL,SYSRET,IRQ,... */
> >
> > And then use, say, 4 or 5 bits for that type field?
> >
> > I must be missing something trivial ;-\
> >
> > - Arnaldo
>
> You are right. I made things more complicated. Yes, the definitions should
> be clear and simple. I will redefine them in v2.
Thanks, I wasn't missing anything, uff :-)
> Thanks
> Jin Yao
>
> >
> > > +};
> > > +
> > > #define PERF_SAMPLE_BRANCH_PLM_ALL \
> > > (PERF_SAMPLE_BRANCH_USER|\
> > > PERF_SAMPLE_BRANCH_KERNEL|\
> > > @@ -999,6 +1019,7 @@ union perf_mem_data_src {
> > > * in_tx: running in a hardware transaction
> > > * abort: aborting a hardware transaction
> > > * cycles: cycles from last branch (or 0 if not supported)
> > > + * type: branch type
> > > */
> > > struct perf_branch_entry {
> > > __u64 from;
> > > @@ -1008,7 +1029,8 @@ struct perf_branch_entry {
> > > in_tx:1, /* in transaction */
> > > abort:1, /* transaction abort */
> > > cycles:16, /* cycle count to last branch */
> > > - reserved:44;
> > > + type:9, /* branch type */
> > > + reserved:35;
> > > };
> > > #endif /* _UAPI_LINUX_PERF_EVENT_H */
> > > diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
> > > index d09a9cd..4d731fd 100644
> > > --- a/tools/include/uapi/linux/perf_event.h
> > > +++ b/tools/include/uapi/linux/perf_event.h
> > > @@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
> > > PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT = 14, /* no flags */
> > > PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT = 15, /* no cycles */
> > > + PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT = 16, /* save branch type */
> > > +
> > > PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
> > > };
> > > @@ -198,9 +200,27 @@ enum perf_branch_sample_type {
> > > PERF_SAMPLE_BRANCH_NO_FLAGS = 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
> > > PERF_SAMPLE_BRANCH_NO_CYCLES = 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
> > > + PERF_SAMPLE_BRANCH_TYPE_SAVE =
> > > + 1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
> > > +
> > > PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
> > > };
> > > +/*
> > > + * Common flow change classification
> > > + */
> > > +enum {
> > > + PERF_BR_NONE = 0, /* unknown */
> > > + PERF_BR_JCC_FWD = 1 << 1, /* conditional forward jump */
> > > + PERF_BR_JCC_BWD = 1 << 2, /* conditional backward jump */
> > > + PERF_BR_JMP = 1 << 3, /* jump */
> > > + PERF_BR_IND_JMP = 1 << 4, /* indirect jump */
> > > + PERF_BR_CALL = 1 << 5, /* call */
> > > + PERF_BR_IND_CALL = 1 << 6, /* indirect call */
> > > + PERF_BR_RET = 1 << 7, /* return */
> > > + PERF_BR_FAR_BRANCH = 1 << 8, /* SYSCALL,SYSRET,IRQ,... */
> > > +};
> > > +
> > > #define PERF_SAMPLE_BRANCH_PLM_ALL \
> > > (PERF_SAMPLE_BRANCH_USER|\
> > > PERF_SAMPLE_BRANCH_KERNEL|\
> > > @@ -999,6 +1019,7 @@ union perf_mem_data_src {
> > > * in_tx: running in a hardware transaction
> > > * abort: aborting a hardware transaction
> > > * cycles: cycles from last branch (or 0 if not supported)
> > > + * type: branch type
> > > */
> > > struct perf_branch_entry {
> > > __u64 from;
> > > @@ -1008,7 +1029,8 @@ struct perf_branch_entry {
> > > in_tx:1, /* in transaction */
> > > abort:1, /* transaction abort */
> > > cycles:16, /* cycle count to last branch */
> > > - reserved:44;
> > > + type:9, /* branch type */
> > > + reserved:35;
> > > };
> > > #endif /* _UAPI_LINUX_PERF_EVENT_H */
> > > --
> > > 2.7.4
* Re: [PATCH v1 1/5] perf/core: Define the common branch type classification
2017-04-04 16:09 ` Arnaldo Carvalho de Melo
@ 2017-04-06 0:09 ` Jin, Yao
0 siblings, 0 replies; 16+ messages in thread
From: Jin, Yao @ 2017-04-06 0:09 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Jiri Olsa, linux-kernel, ak, kan.liang, yao.jin, Peter Zijlstra,
Alexander Shishkin, Ingo Molnar
On 4/5/2017 12:09 AM, Arnaldo Carvalho de Melo wrote:
> On Tue, Apr 04, 2017 at 11:52:53PM +0800, Jin, Yao wrote:
>>
>> On 4/4/2017 10:18 PM, Arnaldo Carvalho de Melo wrote:
>>> Adding the perf kernel maintainers to the CC list.
>>>
>>> On Fri, Mar 31, 2017 at 11:18:38PM +0800, Jin Yao wrote:
>>>> It is often useful to know the branch types while analyzing branch
>>>> data. For example, a call is very different from a conditional branch.
>>>>
>>>> Currently we have to look it up in binary while the binary may later
>>>> not be available and even the binary is available but user has to take
>>>> some time. It is very useful for user to check it directly in perf
>>>> report.
>>>>
>>>> Perf already has support for disassembling the branch instruction
>>>> to get the branch type. The branch type is defined in lbr.c.
>>>>
>>>> To keep consistent on kernel and userspace and make the classification
>>>> more common, the patch adds the common branch type classification
>>>> in perf_event.h.
>>>>
>>>> Since the disassembling of branch instruction needs some overhead,
>>>> a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
>>>> needs to disassemble the branch instruction and record the branch
>>>> type.
>>>>
>>>> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
>>>> ---
>>>> include/uapi/linux/perf_event.h | 24 +++++++++++++++++++++++-
>>>> tools/include/uapi/linux/perf_event.h | 24 +++++++++++++++++++++++-
>>>> 2 files changed, 46 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>>>> index d09a9cd..4d731fd 100644
>>>> --- a/include/uapi/linux/perf_event.h
>>>> +++ b/include/uapi/linux/perf_event.h
>>>> @@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
>>>> PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT = 14, /* no flags */
>>>> PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT = 15, /* no cycles */
>>>> + PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT = 16, /* save branch type */
>>>> +
>>>> PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
>>>> };
>>>> @@ -198,9 +200,27 @@ enum perf_branch_sample_type {
>>>> PERF_SAMPLE_BRANCH_NO_FLAGS = 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
>>>> PERF_SAMPLE_BRANCH_NO_CYCLES = 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
>>>> + PERF_SAMPLE_BRANCH_TYPE_SAVE =
>>>> + 1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
>>>> +
>>>> PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
>>>> };
>>>> +/*
>>>> + * Common flow change classification
>>>> + */
>>>> +enum {
>>>> + PERF_BR_NONE = 0, /* unknown */
>>>> + PERF_BR_JCC_FWD = 1 << 1, /* conditional forward jump */
>>>> + PERF_BR_JCC_BWD = 1 << 2, /* conditional backward jump */
>>>> + PERF_BR_JMP = 1 << 3, /* jump */
>>>> + PERF_BR_IND_JMP = 1 << 4, /* indirect jump */
>>>> + PERF_BR_CALL = 1 << 5, /* call */
>>>> + PERF_BR_IND_CALL = 1 << 6, /* indirect call */
>>>> + PERF_BR_RET = 1 << 7, /* return */
>>>> + PERF_BR_FAR_BRANCH = 1 << 8, /* SYSCALL,SYSRET,IRQ,... */
>>> Humm, wouldn't be better to have those in separate buckets? I.e.
>>>
>>> PERF_BR_SYSCALL
>>> PERF_BR_SYSRET,
>>> PERF_BR_IRQ
>>>
>>> etc?
>> There are also other types. I.e. abort, ..... I use FAR_BRANCH is because I
>> just want to differentiate between basic branch types and others. (others
>> may be too much and platform specific).
> I understand that this is what you need right now, but "syscall",
> "sysret", "irq", look generic enough, no?
>
> Really, really arch specific stuff could indeed be lumped together in a
> FAR_BRANCH, but those used as an example, above (/*
> SYSCALL,SYSRET,IRQ,... */) seems potentially useful to have untangled?
After consideration, yes, you are right. I will fix this in v2.
Thanks
Jin Yao
>>> And why a bitmask? /me reads a bit more... couldn't find a reason for
>>> this:
>>>
>>> + type:9, /* branch type */
>>>
>>> Do you have a reason to use 9 bits? Why not just:
>>>
>>> enum {
>>> PERF_BR_NONE = 0, /* unknown */
>>> PERF_BR_JCC_FWD = 1, /* conditional forward jump */
>>> PERF_BR_JCC_BWD = 2, /* conditional backward jump */
>>> PERF_BR_JMP = 3, /* jump */
>>> PERF_BR_IND_JMP = 4, /* indirect jump */
>>> PERF_BR_CALL = 5, /* call */
>>> PERF_BR_IND_CALL = 6, /* indirect call */
>>> PERF_BR_RET = 7, /* return */
>>> PERF_BR_FAR_BRANCH = 8, /* SYSCALL,SYSRET,IRQ,... */
>>>
>>> And then use, say, 4 or 5 bits for that type field?
>>>
>>> I must be missing something trivial ;-\
>>>
>>> - Arnaldo
>> You are right. I made things more complicated. Yes, the definitions should
>> be clear and simple. I will redefine them in v2.
> Thanks, I wasn't missing anything, uff :-)
>
>> Thanks
>> Jin Yao
>>
>>>> +};
>>>> +
>>>> #define PERF_SAMPLE_BRANCH_PLM_ALL \
>>>> (PERF_SAMPLE_BRANCH_USER|\
>>>> PERF_SAMPLE_BRANCH_KERNEL|\
>>>> @@ -999,6 +1019,7 @@ union perf_mem_data_src {
>>>> * in_tx: running in a hardware transaction
>>>> * abort: aborting a hardware transaction
>>>> * cycles: cycles from last branch (or 0 if not supported)
>>>> + * type: branch type
>>>> */
>>>> struct perf_branch_entry {
>>>> __u64 from;
>>>> @@ -1008,7 +1029,8 @@ struct perf_branch_entry {
>>>> in_tx:1, /* in transaction */
>>>> abort:1, /* transaction abort */
>>>> cycles:16, /* cycle count to last branch */
>>>> - reserved:44;
>>>> + type:9, /* branch type */
>>>> + reserved:35;
>>>> };
>>>> #endif /* _UAPI_LINUX_PERF_EVENT_H */
>>>> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
>>>> index d09a9cd..4d731fd 100644
>>>> --- a/tools/include/uapi/linux/perf_event.h
>>>> +++ b/tools/include/uapi/linux/perf_event.h
>>>> @@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
>>>> PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT = 14, /* no flags */
>>>> PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT = 15, /* no cycles */
>>>> + PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT = 16, /* save branch type */
>>>> +
>>>> PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
>>>> };
>>>> @@ -198,9 +200,27 @@ enum perf_branch_sample_type {
>>>> PERF_SAMPLE_BRANCH_NO_FLAGS = 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
>>>> PERF_SAMPLE_BRANCH_NO_CYCLES = 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
>>>> + PERF_SAMPLE_BRANCH_TYPE_SAVE =
>>>> + 1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
>>>> +
>>>> PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
>>>> };
>>>> +/*
>>>> + * Common flow change classification
>>>> + */
>>>> +enum {
>>>> + PERF_BR_NONE = 0, /* unknown */
>>>> + PERF_BR_JCC_FWD = 1 << 1, /* conditional forward jump */
>>>> + PERF_BR_JCC_BWD = 1 << 2, /* conditional backward jump */
>>>> + PERF_BR_JMP = 1 << 3, /* jump */
>>>> + PERF_BR_IND_JMP = 1 << 4, /* indirect jump */
>>>> + PERF_BR_CALL = 1 << 5, /* call */
>>>> + PERF_BR_IND_CALL = 1 << 6, /* indirect call */
>>>> + PERF_BR_RET = 1 << 7, /* return */
>>>> + PERF_BR_FAR_BRANCH = 1 << 8, /* SYSCALL,SYSRET,IRQ,... */
>>>> +};
>>>> +
>>>> #define PERF_SAMPLE_BRANCH_PLM_ALL \
>>>> (PERF_SAMPLE_BRANCH_USER|\
>>>> PERF_SAMPLE_BRANCH_KERNEL|\
>>>> @@ -999,6 +1019,7 @@ union perf_mem_data_src {
>>>> * in_tx: running in a hardware transaction
>>>> * abort: aborting a hardware transaction
>>>> * cycles: cycles from last branch (or 0 if not supported)
>>>> + * type: branch type
>>>> */
>>>> struct perf_branch_entry {
>>>> __u64 from;
>>>> @@ -1008,7 +1029,8 @@ struct perf_branch_entry {
>>>> in_tx:1, /* in transaction */
>>>> abort:1, /* transaction abort */
>>>> cycles:16, /* cycle count to last branch */
>>>> - reserved:44;
>>>> + type:9, /* branch type */
>>>> + reserved:35;
>>>> };
>>>> #endif /* _UAPI_LINUX_PERF_EVENT_H */
>>>> --
>>>> 2.7.4
* Re: [PATCH v1 1/5] perf/core: Define the common branch type classification
2017-04-04 14:18 ` Arnaldo Carvalho de Melo
2017-04-04 15:52 ` Jin, Yao
@ 2017-04-06 6:58 ` Peter Zijlstra
2017-04-06 8:21 ` Jin, Yao
1 sibling, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2017-04-06 6:58 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Jin Yao, Jiri Olsa, linux-kernel, ak, kan.liang, yao.jin,
Alexander Shishkin, Ingo Molnar
On Tue, Apr 04, 2017 at 11:18:05AM -0300, Arnaldo Carvalho de Melo wrote:
> Adding the perf kernel maintainers to the CC list.
Thanks.
> On Fri, Mar 31, 2017 at 11:18:38PM +0800, Jin Yao wrote:
> > It is often useful to know the branch types while analyzing branch
> > data. For example, a call is very different from a conditional branch.
> >
> > Currently we have to look it up in binary while the binary may later
> > not be available and even the binary is available but user has to take
> > some time. It is very useful for user to check it directly in perf
> > report.
> >
> > Perf already has support for disassembling the branch instruction
> > to get the branch type. The branch type is defined in lbr.c.
> >
> > To keep consistent on kernel and userspace and make the classification
> > more common, the patch adds the common branch type classification
> > in perf_event.h.
> >
> > Since the disassembling of branch instruction needs some overhead,
> > a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
> > needs to disassemble the branch instruction and record the branch
> > type.
I don't get it. Why is the kernel interface mucked with for a user-space
feature?
That's wrong.
* Re: [PATCH v1 1/5] perf/core: Define the common branch type classification
2017-04-06 6:58 ` Peter Zijlstra
@ 2017-04-06 8:21 ` Jin, Yao
2017-04-06 9:25 ` Peter Zijlstra
0 siblings, 1 reply; 16+ messages in thread
From: Jin, Yao @ 2017-04-06 8:21 UTC (permalink / raw)
To: Peter Zijlstra, Arnaldo Carvalho de Melo
Cc: Jiri Olsa, linux-kernel, ak, kan.liang, yao.jin,
Alexander Shishkin, Ingo Molnar
On 4/6/2017 2:58 PM, Peter Zijlstra wrote:
> On Tue, Apr 04, 2017 at 11:18:05AM -0300, Arnaldo Carvalho de Melo wrote:
>> Adding the perf kernel maintainers to the CC list.
> Thanks.
>
>> On Fri, Mar 31, 2017 at 11:18:38PM +0800, Jin Yao wrote:
>>> It is often useful to know the branch types while analyzing branch
>>> data. For example, a call is very different from a conditional branch.
>>>
>>> Currently we have to look it up in binary while the binary may later
>>> not be available and even the binary is available but user has to take
>>> some time. It is very useful for user to check it directly in perf
>>> report.
>>>
>>> Perf already has support for disassembling the branch instruction
>>> to get the branch type. The branch type is defined in lbr.c.
>>>
>>> To keep consistent on kernel and userspace and make the classification
>>> more common, the patch adds the common branch type classification
>>> in perf_event.h.
>>>
>>> Since the disassembling of branch instruction needs some overhead,
>>> a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
>>> needs to disassemble the branch instruction and record the branch
>>> type.
> I don't get it. Why is the kernel interface mucked with for a user-space
> feature?
>
> That's wrong.
Hi, otherwise we have to maintain two copies of the branch type
definitions, one in the kernel and one in user-space.
For example, X86_BR_* are currently defined in lbr.c. To display the
branch type, user-space would have to maintain its own copy of
X86_BR_*. I couldn't come up with a better idea.
Thanks
Jin Yao
* Re: [PATCH v1 1/5] perf/core: Define the common branch type classification
2017-04-06 8:21 ` Jin, Yao
@ 2017-04-06 9:25 ` Peter Zijlstra
2017-04-06 14:43 ` Jin, Yao
0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2017-04-06 9:25 UTC (permalink / raw)
To: Jin, Yao
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, ak, kan.liang,
yao.jin, Alexander Shishkin, Ingo Molnar
On Thu, Apr 06, 2017 at 04:21:06PM +0800, Jin, Yao wrote:
> Hi, otherwise we have to maintain 2 branch type copies between kernel and
> user-space.
>
> For example, currently X86_BR_* are defined in lbr.c. To display the branch
> type in user-space, the user-space has to maintain the same copy for
> X86_BR_*. I didn't get a better idea.
I still don't understand what you want; or why it would matter.
Those specific macros are for hardware LBR filter emulation/fixup. What
does that have to do with any userspace crud?
* Re: [PATCH v1 1/5] perf/core: Define the common branch type classification
2017-04-06 9:25 ` Peter Zijlstra
@ 2017-04-06 14:43 ` Jin, Yao
2017-04-06 16:56 ` Peter Zijlstra
0 siblings, 1 reply; 16+ messages in thread
From: Jin, Yao @ 2017-04-06 14:43 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, ak, kan.liang,
yao.jin, Alexander Shishkin, Ingo Molnar
On 4/6/2017 5:25 PM, Peter Zijlstra wrote:
> On Thu, Apr 06, 2017 at 04:21:06PM +0800, Jin, Yao wrote:
>> Hi, otherwise we have to maintain 2 branch type copies between kernel and
>> user-space.
>>
>> For example, currently X86_BR_* are defined in lbr.c. To display the branch
>> type in user-space, the user-space has to maintain the same copy for
>> X86_BR_*. I didn't get a better idea.
> I still don't understand what you want; or why it would matter.
>
> Those specific macros are for hardware LBR filter emulation/fixup. What
> does that have to do with any userspace crud?
I just want to provide a new feature that the user can directly check
branch type in perf report, instead of looking it up in the binary.
Binary could be not available later, so it's possible that userspace
can't get the branch type.

The X86_BR are generated when disassembling the branch instruction in
kernel. They can be considered as the x86 branch types.

It's easy to let kernel return the x86 branch types to userspace, and
then userspace shows the branch type in perf report.

While kernel and userspace have to maintain the X86_BR definitions.
One copy is in kernel and the other copy is in userspace. To avoid the
duplicate definitions, I define the common branch type in
perf_event.h to share between kernel and userspace. That's why I do
that.
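
To illustrate the shared-definition idea, here is a minimal sketch, not
the code from the patch. It assumes a single enum that both kernel and
userspace would include from one header, plus a userspace helper that
maps each value to the label shown in perf report. All DEMO_* and demo_*
identifiers are hypothetical; only the type labels come from the cover
letter.

```c
/*
 * Hypothetical shared classification. In the proposal this kind of
 * definition would live in one header visible to kernel and userspace,
 * so neither side keeps a private copy. Names here are illustrative.
 */
enum demo_branch_type {
	DEMO_BR_NONE = 0,	/* type not recorded */
	DEMO_BR_JCC_FWD,	/* conditional forward jump */
	DEMO_BR_JCC_BWD,	/* conditional backward jump */
	DEMO_BR_JMP,		/* jump imm */
	DEMO_BR_IND_JMP,	/* jump reg/mem */
	DEMO_BR_CALL,		/* call imm */
	DEMO_BR_IND_CALL,	/* call reg/mem */
	DEMO_BR_RET,		/* ret */
	DEMO_BR_FAR_BRANCH,	/* SYSCALL/SYSRET, IRQ, IRET, TSX abort */
	DEMO_BR_MAX,
};

/* Userspace side: translate a recorded type into its display label. */
static const char *demo_branch_type_name(int type)
{
	static const char * const names[DEMO_BR_MAX] = {
		[DEMO_BR_NONE]		= "NONE",
		[DEMO_BR_JCC_FWD]	= "JCC forward",
		[DEMO_BR_JCC_BWD]	= "JCC backward",
		[DEMO_BR_JMP]		= "JMP",
		[DEMO_BR_IND_JMP]	= "IND_JMP",
		[DEMO_BR_CALL]		= "CALL",
		[DEMO_BR_IND_CALL]	= "IND_CALL",
		[DEMO_BR_RET]		= "RET",
		[DEMO_BR_FAR_BRANCH]	= "FAR_BRANCH",
	};

	/* Values outside the known range may come from a newer kernel. */
	if (type < 0 || type >= DEMO_BR_MAX)
		return "UNKNOWN";
	return names[type];
}
```

With one definition, the value the kernel records and the label the tool
prints cannot drift apart; for instance, demo_branch_type_name(DEMO_BR_RET)
yields "RET".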
Thanks
Jin Yao
* Re: [PATCH v1 1/5] perf/core: Define the common branch type classification
2017-04-06 14:43 ` Jin, Yao
@ 2017-04-06 16:56 ` Peter Zijlstra
2017-04-07 2:14 ` Jin, Yao
0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2017-04-06 16:56 UTC (permalink / raw)
To: Jin, Yao
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, ak, kan.liang,
yao.jin, Alexander Shishkin, Ingo Molnar
On Thu, Apr 06, 2017 at 10:43:19PM +0800, Jin, Yao wrote:
>
>
> On 4/6/2017 5:25 PM, Peter Zijlstra wrote:
> > On Thu, Apr 06, 2017 at 04:21:06PM +0800, Jin, Yao wrote:
> > > Hi, otherwise we have to maintain 2 branch type copies between kernel and
> > > user-space.
> > >
> > > For example, currently X86_BR_* are defined in lbr.c. To display the branch
> > > type in user-space, the user-space has to maintain the same copy for
> > > X86_BR_*. I didn't get a better idea.
> > I still don't understand what you want; or why it would matter.
> >
> > Those specific macros are for hardware LBR filter emulation/fixup. What
> > does that have to do with any userspace crud?
>
> I just want to provide a new feature that the user can directly check branch
> type
> in perf report, instead of looking it up in the binary. Binary could be not
> available
> later, so it's possible that userspace can't get the branch type.
>
> The X86_BR are generated when disassembling the branch instruction in
> kernel.
> They can be considered as the x86 branch types.
>
> It's easy to let kernel return the x86 branch types to userspace, and then
> userspace
> shows the branch type in perf report.
>
> While kernel and userspace have to maintain the X86_BR definitions. One copy
> is in
> kernel and the other copy is in userspace. To avoid the duplicate
> definitions , I define
> the common branch type in perf_event.h to share between kernel and
> userspace.
> That's why I do that.
Argh, fix your mailer. That is unreadable.
/me reflows...
> I just want to provide a new feature that the user can directly check
> branch type in perf report, instead of looking it up in the binary.
> Binary could be not available later, so it's possible that userspace
> can't get the branch type.
>
> The X86_BR are generated when disassembling the branch instruction in
> kernel. They can be considered as the x86 branch types.
>
> It's easy to let kernel return the x86 branch types to userspace, and
> then userspace shows the branch type in perf report.
>
> While kernel and userspace have to maintain the X86_BR definitions.
> One copy is in kernel and the other copy is in userspace. To avoid the
> duplicate definitions , I define the common branch type in
> perf_event.h to share between kernel and userspace. That's why I do
> that.
See, that's so much better..
Oh, so you _ARE_ adding a kernel feature? I understood you only wanted
to change perf-report.
WTH didn't you Cc the maintainers?
Also, if you do this, you need to Cc the PowerPC people, since they too
implement PERF_SAMPLE_BRANCH_ bits.
* Re: [PATCH v1 1/5] perf/core: Define the common branch type classification
2017-04-06 16:56 ` Peter Zijlstra
@ 2017-04-07 2:14 ` Jin, Yao
0 siblings, 0 replies; 16+ messages in thread
From: Jin, Yao @ 2017-04-07 2:14 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo, Jiri Olsa, linux-kernel, Andi Kleen,
kan.liang, Alexander Shishkin, Ingo Molnar, yao.jin
> Argh, fix your mailer. That is unreadable.
>
> /me reflows...
Sorry about that. I have reconfigured the mail editor by applying the
"Preformat" and "Fixed Width" settings in the Thunderbird client. I hope
it is better now.
> See, that's so much better..
>
> Oh, so you _ARE_ adding a kernel feature? I understood you only wanted
> to change perf-report.
Honestly it's a perf-report feature. But it needs the kernel to record
the branch type in perf_event_entry, so there is a kernel patch for that
in the patch series.
>
> WTH didn't you Cc the maintainers?
Very sorry for not Cc'ing all the maintainers in v1. I will be careful
when sending the v2 patch series.
> Also, if you do this, you need to Cc the PowerPC people, since they too
> implement PERF_SAMPLE_BRANCH_ bits.
I will cc linuxppc-dev@lists.ozlabs.org when sending v2.
Thanks
Jin Yao
Thread overview: 16+ messages
2017-03-31 15:18 [PATCH v1 0/5] perf report: Show branch type Jin Yao
2017-03-31 15:18 ` [PATCH v1 1/5] perf/core: Define the common branch type classification Jin Yao
2017-04-04 14:18 ` Arnaldo Carvalho de Melo
2017-04-04 15:52 ` Jin, Yao
2017-04-04 16:09 ` Arnaldo Carvalho de Melo
2017-04-06 0:09 ` Jin, Yao
2017-04-06 6:58 ` Peter Zijlstra
2017-04-06 8:21 ` Jin, Yao
2017-04-06 9:25 ` Peter Zijlstra
2017-04-06 14:43 ` Jin, Yao
2017-04-06 16:56 ` Peter Zijlstra
2017-04-07 2:14 ` Jin, Yao
2017-03-31 15:18 ` [PATCH v1 2/5] perf/x86/intel: Record branch type Jin Yao
2017-03-31 15:18 ` [PATCH v1 3/5] perf record: Create a new option save_type in --branch-filter Jin Yao
2017-03-31 15:18 ` [PATCH v1 4/5] perf report: Show branch type statistics for stdio mode Jin Yao
2017-03-31 15:18 ` [PATCH v1 5/5] perf report: Show branch type in callchain entry Jin Yao