* Re: [PATCH v10 1/7] perf/core: Define the common branch type classification
2017-07-18 12:13 ` [PATCH v10 1/7] perf/core: Define the common branch type classification Jin Yao
@ 2017-07-18 9:31 ` Michael Ellerman
2017-07-18 15:54 ` Arnaldo Carvalho de Melo
2017-07-20 9:03 ` [tip:perf/core] " tip-bot for Jin Yao
1 sibling, 1 reply; 17+ messages in thread
From: Michael Ellerman @ 2017-07-18 9:31 UTC (permalink / raw)
To: Jin Yao, acme, jolsa, peterz, mingo, alexander.shishkin
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
Jin Yao <yao.jin@linux.intel.com> writes:
> It is often useful to know the branch types while analyzing branch
> data. For example, a call is very different from a conditional branch.
>
> Currently we have to look it up in binary while the binary may later
> not be available and even the binary is available but user has to take
> some time. It is very useful for user to check it directly in perf
> report.
>
> Perf already has support for disassembling the branch instruction
> to get the x86 branch type.
>
> To keep consistent on kernel and userspace and make the classification
> more common, the patch adds the common branch type classification
> in perf_event.h.
>
> The patch only defines a minimum but most common set of branch types.
>
> PERF_BR_UNKNOWN : unknown
> PERF_BR_COND :conditional
> PERF_BR_UNCOND : unconditional
> PERF_BR_IND : indirect
> PERF_BR_CALL : function call
> PERF_BR_IND_CALL : indirect function call
> PERF_BR_RET : function return
> PERF_BR_SYSCALL : syscall
> PERF_BR_SYSRET : syscall return
> PERF_BR_COND_CALL : conditional function call
> PERF_BR_COND_RET : conditional function return
>
> The patch also adds a new field type (4 bits) in perf_branch_entry
> to record the branch type.
>
> Since the disassembling of branch instruction needs some overhead,
> a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
> needs to disassemble the branch instruction and record the branch
> type.
>
> Change log
> ----------
> v10: Not changed.
>
> v9: Not changed.
>
> v8: Change PERF_BR_NONE to PERF_BR_UNKNOWN.
> No other change.
I acked v8, so you could have retained my ack. Here it is again:
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
cheers
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v10 0/7] perf report: Show branch type
@ 2017-07-18 12:13 Jin Yao
2017-07-18 12:13 ` [PATCH v10 1/7] perf/core: Define the common branch type classification Jin Yao
` (6 more replies)
0 siblings, 7 replies; 17+ messages in thread
From: Jin Yao @ 2017-07-18 12:13 UTC (permalink / raw)
To: acme, jolsa, peterz, mingo, alexander.shishkin, mpe
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
v10:
----
According to Jiri Olsa's comment, update the patch
"perf/x86/intel: Record branch type". Set the branch_map
array to be static. The previous version has it on stack
then makes the compiler to create it every time when the
function gets called.
Other patches have been acked by Jiri Olsa and Michael Ellerman.
v9:
---
It only changes the patch "perf/x86/intel: Record branch type".
Peter suggests to use __ffs() to find first bit.
Yes, with this change, the code is simpler and clearer.
No other functional changes.
v8:
---
Change PERF_BR_NONE to PERF_BR_UNKNOWN according to Peter's comments.
No other functional changes.
v7:
---
Redefine the common branch types according to review comments from
Michael Ellerman <mpe@ellerman.id.au>
Now the patch series just defines a minimum but more common set of
branch types.
PERF_BR_NONE : unknown
PERF_BR_COND :conditional
PERF_BR_UNCOND : unconditional
PERF_BR_IND : indirect
PERF_BR_CALL : function call
PERF_BR_IND_CALL : indirect function call
PERF_BR_RET : function return
PERF_BR_SYSCALL : syscall
PERF_BR_SYSRET : syscall return
PERF_BR_COND_CALL : conditional function call
PERF_BR_COND_RET : conditional function return
v6:
---
Update according to the review comments from
Jiri Olsa <jolsa@redhat.com>. Major modifications are:
1. Move that multiline conditional code inside {} brackets.
2. Move branch_type_stat_display() from builtin-report.c to
branch.c. Move branch_type_str() from callchain.c to
branch.c.
3. Keep the original branch info display order, that is:
predicted, abort, cycles, iterations
v5:
---
Mainly the v5 patch series are updated according to
comments from Jiri Olsa <jolsa@redhat.com>.
The kernel part doesn't have functional change. It just
solve the merge issue.
In userspace, the functions of branch type counting and
branch type name resolving are moved to the new files:
util/branch.c, util/branch.h.
And refactor the branch info printing code for better
maintenance.
Not changed (or just fix merge issue):
perf/core: Define the common branch type classification
perf/x86/intel: Record branch type
perf record: Create a new option save_type in --branch-filter
New patches:
perf report: Refactor the branch info printing code
perf util: Create branch.c/.h for common branch functions
Changed:
perf report: Show branch type statistics for stdio mode
perf report: Show branch type in callchain entry
v4:
---
1. Describe the major changes in patch description.
Thanks for Peter Zijlstra's reminding.
2. Initialize branch type to 0 in intel_pmu_lbr_read_32 and
intel_pmu_lbr_read_64. Remove the invalid else code in
intel_pmu_lbr_filter.
v3:
---
1. Move the JCC forward/backward and cross page computing from
kernel to userspace.
2. Use lookup table to replace original switch/case processing.
Changed:
perf/core: Define the common branch type classification
perf/x86/intel: Record branch type
perf report: Show branch type statistics for stdio mode
perf report: Show branch type in callchain entry
Not changed:
perf record: Create a new option save_type in --branch-filter
v2:
---
1. Use 4 bits in perf_branch_entry to record branch type.
2. Pull out some common branch types from FAR_BRANCH. Now the branch
types defined in perf_event.h:
Jin Yao (7):
perf/core: Define the common branch type classification
perf/x86/intel: Record branch type
perf record: Create a new option save_type in --branch-filter
perf report: Refactor the branch info printing code
perf util: Create branch.c/.h for common branch functions
perf report: Show branch type statistics for stdio mode
perf report: Show branch type in callchain entry
arch/x86/events/intel/lbr.c | 52 +++++++++-
include/uapi/linux/perf_event.h | 27 ++++-
tools/include/uapi/linux/perf_event.h | 27 ++++-
tools/perf/Documentation/perf-record.txt | 1 +
tools/perf/builtin-report.c | 25 +++++
tools/perf/util/Build | 1 +
tools/perf/util/branch.c | 166 +++++++++++++++++++++++++++++++
tools/perf/util/branch.h | 25 +++++
tools/perf/util/callchain.c | 140 ++++++++++++++------------
tools/perf/util/callchain.h | 5 +-
tools/perf/util/event.h | 3 +-
tools/perf/util/hist.c | 5 +-
tools/perf/util/machine.c | 26 +++--
tools/perf/util/parse-branch-options.c | 1 +
14 files changed, 420 insertions(+), 84 deletions(-)
create mode 100644 tools/perf/util/branch.c
create mode 100644 tools/perf/util/branch.h
--
2.7.4
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v10 1/7] perf/core: Define the common branch type classification
2017-07-18 12:13 [PATCH v10 0/7] perf report: Show branch type Jin Yao
@ 2017-07-18 12:13 ` Jin Yao
2017-07-18 9:31 ` Michael Ellerman
2017-07-20 9:03 ` [tip:perf/core] " tip-bot for Jin Yao
2017-07-18 12:13 ` [PATCH v10 2/7] perf/x86/intel: Record branch type Jin Yao
` (5 subsequent siblings)
6 siblings, 2 replies; 17+ messages in thread
From: Jin Yao @ 2017-07-18 12:13 UTC (permalink / raw)
To: acme, jolsa, peterz, mingo, alexander.shishkin, mpe
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
It is often useful to know the branch types while analyzing branch
data. For example, a call is very different from a conditional branch.
Currently we have to look it up in binary while the binary may later
not be available and even the binary is available but user has to take
some time. It is very useful for user to check it directly in perf
report.
Perf already has support for disassembling the branch instruction
to get the x86 branch type.
To keep consistent on kernel and userspace and make the classification
more common, the patch adds the common branch type classification
in perf_event.h.
The patch only defines a minimum but most common set of branch types.
PERF_BR_UNKNOWN : unknown
PERF_BR_COND :conditional
PERF_BR_UNCOND : unconditional
PERF_BR_IND : indirect
PERF_BR_CALL : function call
PERF_BR_IND_CALL : indirect function call
PERF_BR_RET : function return
PERF_BR_SYSCALL : syscall
PERF_BR_SYSRET : syscall return
PERF_BR_COND_CALL : conditional function call
PERF_BR_COND_RET : conditional function return
The patch also adds a new field type (4 bits) in perf_branch_entry
to record the branch type.
Since the disassembling of branch instruction needs some overhead,
a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
needs to disassemble the branch instruction and record the branch
type.
Change log
----------
v10: Not changed.
v9: Not changed.
v8: Change PERF_BR_NONE to PERF_BR_UNKNOWN.
No other change.
v7: Just keep the most common branch types.
Others are removed.
v6: Not changed.
v5: Not changed. The v5 patch series just change the userspace.
v4: Comparing to previous version, the major changes are:
1. Remove the PERF_BR_JCC_FWD/PERF_BR_JCC_BWD, they will be
computed later in userspace.
2. Remove the "cross" field in perf_branch_entry. The cross page
computing will be done later in userspace.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
include/uapi/linux/perf_event.h | 27 ++++++++++++++++++++++++++-
tools/include/uapi/linux/perf_event.h | 27 ++++++++++++++++++++++++++-
2 files changed, 52 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index b1c0b18..642db5f 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT = 14, /* no flags */
PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT = 15, /* no cycles */
+ PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT = 16, /* save branch type */
+
PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
};
@@ -198,9 +200,30 @@ enum perf_branch_sample_type {
PERF_SAMPLE_BRANCH_NO_FLAGS = 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
PERF_SAMPLE_BRANCH_NO_CYCLES = 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
+ PERF_SAMPLE_BRANCH_TYPE_SAVE =
+ 1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
+
PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
};
+/*
+ * Common flow change classification
+ */
+enum {
+ PERF_BR_UNKNOWN = 0, /* unknown */
+ PERF_BR_COND = 1, /* conditional */
+ PERF_BR_UNCOND = 2, /* unconditional */
+ PERF_BR_IND = 3, /* indirect */
+ PERF_BR_CALL = 4, /* function call */
+ PERF_BR_IND_CALL = 5, /* indirect function call */
+ PERF_BR_RET = 6, /* function return */
+ PERF_BR_SYSCALL = 7, /* syscall */
+ PERF_BR_SYSRET = 8, /* syscall return */
+ PERF_BR_COND_CALL = 9, /* conditional function call */
+ PERF_BR_COND_RET = 10, /* conditional function return */
+ PERF_BR_MAX,
+};
+
#define PERF_SAMPLE_BRANCH_PLM_ALL \
(PERF_SAMPLE_BRANCH_USER|\
PERF_SAMPLE_BRANCH_KERNEL|\
@@ -1015,6 +1038,7 @@ union perf_mem_data_src {
* in_tx: running in a hardware transaction
* abort: aborting a hardware transaction
* cycles: cycles from last branch (or 0 if not supported)
+ * type: branch type
*/
struct perf_branch_entry {
__u64 from;
@@ -1024,7 +1048,8 @@ struct perf_branch_entry {
in_tx:1, /* in transaction */
abort:1, /* transaction abort */
cycles:16, /* cycle count to last branch */
- reserved:44;
+ type:4, /* branch type */
+ reserved:40;
};
#endif /* _UAPI_LINUX_PERF_EVENT_H */
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index b1c0b18..642db5f 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT = 14, /* no flags */
PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT = 15, /* no cycles */
+ PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT = 16, /* save branch type */
+
PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
};
@@ -198,9 +200,30 @@ enum perf_branch_sample_type {
PERF_SAMPLE_BRANCH_NO_FLAGS = 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
PERF_SAMPLE_BRANCH_NO_CYCLES = 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
+ PERF_SAMPLE_BRANCH_TYPE_SAVE =
+ 1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
+
PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
};
+/*
+ * Common flow change classification
+ */
+enum {
+ PERF_BR_UNKNOWN = 0, /* unknown */
+ PERF_BR_COND = 1, /* conditional */
+ PERF_BR_UNCOND = 2, /* unconditional */
+ PERF_BR_IND = 3, /* indirect */
+ PERF_BR_CALL = 4, /* function call */
+ PERF_BR_IND_CALL = 5, /* indirect function call */
+ PERF_BR_RET = 6, /* function return */
+ PERF_BR_SYSCALL = 7, /* syscall */
+ PERF_BR_SYSRET = 8, /* syscall return */
+ PERF_BR_COND_CALL = 9, /* conditional function call */
+ PERF_BR_COND_RET = 10, /* conditional function return */
+ PERF_BR_MAX,
+};
+
#define PERF_SAMPLE_BRANCH_PLM_ALL \
(PERF_SAMPLE_BRANCH_USER|\
PERF_SAMPLE_BRANCH_KERNEL|\
@@ -1015,6 +1038,7 @@ union perf_mem_data_src {
* in_tx: running in a hardware transaction
* abort: aborting a hardware transaction
* cycles: cycles from last branch (or 0 if not supported)
+ * type: branch type
*/
struct perf_branch_entry {
__u64 from;
@@ -1024,7 +1048,8 @@ struct perf_branch_entry {
in_tx:1, /* in transaction */
abort:1, /* transaction abort */
cycles:16, /* cycle count to last branch */
- reserved:44;
+ type:4, /* branch type */
+ reserved:40;
};
#endif /* _UAPI_LINUX_PERF_EVENT_H */
--
2.7.4
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v10 2/7] perf/x86/intel: Record branch type
2017-07-18 12:13 [PATCH v10 0/7] perf report: Show branch type Jin Yao
2017-07-18 12:13 ` [PATCH v10 1/7] perf/core: Define the common branch type classification Jin Yao
@ 2017-07-18 12:13 ` Jin Yao
2017-07-20 9:04 ` [tip:perf/core] " tip-bot for Jin Yao
2017-07-18 12:13 ` [PATCH v10 3/7] perf record: Create a new option save_type in --branch-filter Jin Yao
` (4 subsequent siblings)
6 siblings, 1 reply; 17+ messages in thread
From: Jin Yao @ 2017-07-18 12:13 UTC (permalink / raw)
To: acme, jolsa, peterz, mingo, alexander.shishkin, mpe
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
Perf already has support for disassembling the branch instruction
and using the branch type for filtering. The patch just records
the branch type in perf_branch_entry.
Before recording, the patch converts the x86 branch type to
common branch type.
Change log
----------
v10: Set the branch_map array to be static. The previous version
has it on stack then makes the compiler to create it every
time when the function gets called.
v9: Use __ffs() to find first bit in type in common_branch_type().
It lets the code be clear.
v8: Change PERF_BR_NONE to PERF_BR_UNKNOWN.
v7: Just convert following x86 branch types to common branch types.
X86_BR_CALL -> PERF_BR_CALL
X86_BR_RET -> PERF_BR_RET
X86_BR_JCC -> PERF_BR_COND
X86_BR_JMP -> PERF_BR_UNCOND
X86_BR_IND_CALL -> PERF_BR_IND_CALL
X86_BR_ZERO_CALL -> PERF_BR_CALL
X86_BR_IND_JMP -> PERF_BR_IND
X86_BR_SYSCALL -> PERF_BR_SYSCALL
X86_BR_SYSRET -> PERF_BR_SYSRET
Others are set to PERF_BR_NONE
v6: Not changed.
v5: Just fix the merge error. No other update.
v4: Comparing to previous version, the major changes are:
1. Uses a lookup table to convert x86 branch type to common branch
type.
2. Move the JCC forward/JCC backward and cross page computing to
user space.
3. Initialize branch type to 0 in intel_pmu_lbr_read_32 and
intel_pmu_lbr_read_64
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
arch/x86/events/intel/lbr.c | 52 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 51 insertions(+), 1 deletion(-)
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index eb26165..0edda48 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -109,6 +109,9 @@ enum {
X86_BR_ZERO_CALL = 1 << 15,/* zero length call */
X86_BR_CALL_STACK = 1 << 16,/* call stack */
X86_BR_IND_JMP = 1 << 17,/* indirect jump */
+
+ X86_BR_TYPE_SAVE = 1 << 18,/* indicate to save branch type */
+
};
#define X86_BR_PLM (X86_BR_USER | X86_BR_KERNEL)
@@ -510,6 +513,7 @@ static void intel_pmu_lbr_read_32(struct cpu_hw_events *cpuc)
cpuc->lbr_entries[i].in_tx = 0;
cpuc->lbr_entries[i].abort = 0;
cpuc->lbr_entries[i].cycles = 0;
+ cpuc->lbr_entries[i].type = 0;
cpuc->lbr_entries[i].reserved = 0;
}
cpuc->lbr_stack.nr = i;
@@ -596,6 +600,7 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
cpuc->lbr_entries[out].in_tx = in_tx;
cpuc->lbr_entries[out].abort = abort;
cpuc->lbr_entries[out].cycles = cycles;
+ cpuc->lbr_entries[out].type = 0;
cpuc->lbr_entries[out].reserved = 0;
out++;
}
@@ -673,6 +678,10 @@ static int intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
if (br_type & PERF_SAMPLE_BRANCH_CALL)
mask |= X86_BR_CALL | X86_BR_ZERO_CALL;
+
+ if (br_type & PERF_SAMPLE_BRANCH_TYPE_SAVE)
+ mask |= X86_BR_TYPE_SAVE;
+
/*
* stash actual user request into reg, it may
* be used by fixup code for some CPU
@@ -926,6 +935,43 @@ static int branch_type(unsigned long from, unsigned long to, int abort)
return ret;
}
+#define X86_BR_TYPE_MAP_MAX 16
+
+static int branch_map[X86_BR_TYPE_MAP_MAX] = {
+ PERF_BR_CALL, /* X86_BR_CALL */
+ PERF_BR_RET, /* X86_BR_RET */
+ PERF_BR_SYSCALL, /* X86_BR_SYSCALL */
+ PERF_BR_SYSRET, /* X86_BR_SYSRET */
+ PERF_BR_UNKNOWN, /* X86_BR_INT */
+ PERF_BR_UNKNOWN, /* X86_BR_IRET */
+ PERF_BR_COND, /* X86_BR_JCC */
+ PERF_BR_UNCOND, /* X86_BR_JMP */
+ PERF_BR_UNKNOWN, /* X86_BR_IRQ */
+ PERF_BR_IND_CALL, /* X86_BR_IND_CALL */
+ PERF_BR_UNKNOWN, /* X86_BR_ABORT */
+ PERF_BR_UNKNOWN, /* X86_BR_IN_TX */
+ PERF_BR_UNKNOWN, /* X86_BR_NO_TX */
+ PERF_BR_CALL, /* X86_BR_ZERO_CALL */
+ PERF_BR_UNKNOWN, /* X86_BR_CALL_STACK */
+ PERF_BR_IND, /* X86_BR_IND_JMP */
+};
+
+static int
+common_branch_type(int type)
+{
+ int i;
+
+ type >>= 2; /* skip X86_BR_USER and X86_BR_KERNEL */
+
+ if (type) {
+ i = __ffs(type);
+ if (i < X86_BR_TYPE_MAP_MAX)
+ return branch_map[i];
+ }
+
+ return PERF_BR_UNKNOWN;
+}
+
/*
* implement actual branch filter based on user demand.
* Hardware may not exactly satisfy that request, thus
@@ -942,7 +988,8 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
bool compress = false;
/* if sampling all branches, then nothing to filter */
- if ((br_sel & X86_BR_ALL) == X86_BR_ALL)
+ if (((br_sel & X86_BR_ALL) == X86_BR_ALL) &&
+ ((br_sel & X86_BR_TYPE_SAVE) != X86_BR_TYPE_SAVE))
return;
for (i = 0; i < cpuc->lbr_stack.nr; i++) {
@@ -963,6 +1010,9 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
cpuc->lbr_entries[i].from = 0;
compress = true;
}
+
+ if ((br_sel & X86_BR_TYPE_SAVE) == X86_BR_TYPE_SAVE)
+ cpuc->lbr_entries[i].type = common_branch_type(type);
}
if (!compress)
--
2.7.4
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v10 3/7] perf record: Create a new option save_type in --branch-filter
2017-07-18 12:13 [PATCH v10 0/7] perf report: Show branch type Jin Yao
2017-07-18 12:13 ` [PATCH v10 1/7] perf/core: Define the common branch type classification Jin Yao
2017-07-18 12:13 ` [PATCH v10 2/7] perf/x86/intel: Record branch type Jin Yao
@ 2017-07-18 12:13 ` Jin Yao
2017-07-20 9:04 ` [tip:perf/core] " tip-bot for Jin Yao
2017-07-18 12:13 ` [PATCH v10 4/7] perf report: Refactor the branch info printing code Jin Yao
` (3 subsequent siblings)
6 siblings, 1 reply; 17+ messages in thread
From: Jin Yao @ 2017-07-18 12:13 UTC (permalink / raw)
To: acme, jolsa, peterz, mingo, alexander.shishkin, mpe
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
The option indicates the kernel to save branch type during sampling.
One example:
perf record -g --branch-filter any,save_type <command>
Change log
----------
v10: Not changed.
v9: Not changed.
v8: Not changed.
v7: Not changed.
v6: Not changed.
v5: Not changed.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
tools/perf/Documentation/perf-record.txt | 1 +
tools/perf/util/parse-branch-options.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index b0e9e92..9bdea04 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -332,6 +332,7 @@ following filters are defined:
- no_tx: only when the target is not in a hardware transaction
- abort_tx: only when the target is a hardware transaction abort
- cond: conditional branches
+ - save_type: save branch type during sampling in case binary is not available later
+
The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
diff --git a/tools/perf/util/parse-branch-options.c b/tools/perf/util/parse-branch-options.c
index 38fd115..e71fb5f 100644
--- a/tools/perf/util/parse-branch-options.c
+++ b/tools/perf/util/parse-branch-options.c
@@ -28,6 +28,7 @@ static const struct branch_mode branch_modes[] = {
BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_COND),
BRANCH_OPT("ind_jmp", PERF_SAMPLE_BRANCH_IND_JUMP),
BRANCH_OPT("call", PERF_SAMPLE_BRANCH_CALL),
+ BRANCH_OPT("save_type", PERF_SAMPLE_BRANCH_TYPE_SAVE),
BRANCH_END
};
--
2.7.4
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v10 4/7] perf report: Refactor the branch info printing code
2017-07-18 12:13 [PATCH v10 0/7] perf report: Show branch type Jin Yao
` (2 preceding siblings ...)
2017-07-18 12:13 ` [PATCH v10 3/7] perf record: Create a new option save_type in --branch-filter Jin Yao
@ 2017-07-18 12:13 ` Jin Yao
2017-07-20 9:04 ` [tip:perf/core] " tip-bot for Jin Yao
2017-07-18 12:13 ` [PATCH v10 5/7] perf util: Create branch.c/.h for common branch functions Jin Yao
` (2 subsequent siblings)
6 siblings, 1 reply; 17+ messages in thread
From: Jin Yao @ 2017-07-18 12:13 UTC (permalink / raw)
To: acme, jolsa, peterz, mingo, alexander.shishkin, mpe
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
The branch info such as predicted/cycles/... are printed at the
callchain entries.
For example: perf report --branch-history --no-children --stdio
--1.07%--main div.c:39 (predicted:52.4% cycles:1 iterations:17)
main div.c:44 (predicted:52.4% cycles:1)
main div.c:42 (cycles:2)
compute_flag div.c:28 (cycles:2)
compute_flag div.c:27 (cycles:1)
rand rand.c:28 (cycles:1)
rand rand.c:28 (cycles:1)
__random random.c:298 (cycles:1)
__random random.c:297 (cycles:1)
__random random.c:295 (cycles:1)
__random random.c:295 (cycles:1)
__random random.c:295 (cycles:1)
But the current code is difficult to maintain and extend. This patch
refactors the code for easy maintenance.
Change log
----------
v10: Not changed
v9: Not changed
v8: Not changed
v7: Not changed
v6: 1. Put the multiline condition code into {} brackets in
counts_str_build()
2. Keep the original display order, that is:
predicted, abort, cycles, iterations
v5: It's a new patch in v5 patch series.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
tools/perf/util/callchain.c | 106 ++++++++++++++++++++------------------------
1 file changed, 47 insertions(+), 59 deletions(-)
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index b4204b4..abc6908 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1214,83 +1214,71 @@ int callchain_branch_counts(struct callchain_root *root,
cycles_count);
}
+static int count_pri64_printf(int index, const char *str, u64 value,
+ char *bf, int bfsize)
+{
+ int printed;
+
+ printed = scnprintf(bf, bfsize,
+ "%s%s:%" PRId64 "",
+ (index) ? " " : " (", str, value);
+
+ return printed;
+}
+
+static int count_float_printf(int index, const char *str, float value,
+ char *bf, int bfsize)
+{
+ int printed;
+
+ printed = scnprintf(bf, bfsize,
+ "%s%s:%.1f%%",
+ (index) ? " " : " (", str, value);
+
+ return printed;
+}
+
static int counts_str_build(char *bf, int bfsize,
u64 branch_count, u64 predicted_count,
u64 abort_count, u64 cycles_count,
u64 iter_count, u64 samples_count)
{
- double predicted_percent = 0.0;
- const char *null_str = "";
- char iter_str[32];
- char cycle_str[32];
- char *istr, *cstr;
u64 cycles;
+ int printed = 0, i = 0;
if (branch_count == 0)
return scnprintf(bf, bfsize, " (calltrace)");
- cycles = cycles_count / branch_count;
-
- if (iter_count && samples_count) {
- if (cycles > 0)
- scnprintf(iter_str, sizeof(iter_str),
- " iterations:%" PRId64 "",
- iter_count / samples_count);
- else
- scnprintf(iter_str, sizeof(iter_str),
- "iterations:%" PRId64 "",
- iter_count / samples_count);
- istr = iter_str;
- } else
- istr = (char *)null_str;
-
- if (cycles > 0) {
- scnprintf(cycle_str, sizeof(cycle_str),
- "cycles:%" PRId64 "", cycles);
- cstr = cycle_str;
- } else
- cstr = (char *)null_str;
-
- predicted_percent = predicted_count * 100.0 / branch_count;
+ if (predicted_count < branch_count) {
+ printed += count_float_printf(i++, "predicted",
+ predicted_count * 100.0 / branch_count,
+ bf + printed, bfsize - printed);
+ }
- if ((predicted_count == branch_count) && (abort_count == 0)) {
- if ((cycles > 0) || (istr != (char *)null_str))
- return scnprintf(bf, bfsize, " (%s%s)", cstr, istr);
- else
- return scnprintf(bf, bfsize, "%s", (char *)null_str);
+ if (abort_count) {
+ printed += count_float_printf(i++, "abort",
+ abort_count * 100.0 / branch_count,
+ bf + printed, bfsize - printed);
}
- if ((predicted_count < branch_count) && (abort_count == 0)) {
- if ((cycles > 0) || (istr != (char *)null_str))
- return scnprintf(bf, bfsize,
- " (predicted:%.1f%% %s%s)",
- predicted_percent, cstr, istr);
- else {
- return scnprintf(bf, bfsize,
- " (predicted:%.1f%%)",
- predicted_percent);
- }
+ cycles = cycles_count / branch_count;
+ if (cycles) {
+ printed += count_pri64_printf(i++, "cycles",
+ cycles,
+ bf + printed, bfsize - printed);
}
- if ((predicted_count == branch_count) && (abort_count > 0)) {
- if ((cycles > 0) || (istr != (char *)null_str))
- return scnprintf(bf, bfsize,
- " (abort:%" PRId64 " %s%s)",
- abort_count, cstr, istr);
- else
- return scnprintf(bf, bfsize,
- " (abort:%" PRId64 ")",
- abort_count);
+ if (iter_count && samples_count) {
+ printed += count_pri64_printf(i++, "iterations",
+ iter_count / samples_count,
+ bf + printed, bfsize - printed);
}
- if ((cycles > 0) || (istr != (char *)null_str))
- return scnprintf(bf, bfsize,
- " (predicted:%.1f%% abort:%" PRId64 " %s%s)",
- predicted_percent, abort_count, cstr, istr);
+ if (i)
+ return scnprintf(bf + printed, bfsize - printed, ")");
- return scnprintf(bf, bfsize,
- " (predicted:%.1f%% abort:%" PRId64 ")",
- predicted_percent, abort_count);
+ bf[0] = 0;
+ return 0;
}
static int callchain_counts_printf(FILE *fp, char *bf, int bfsize,
--
2.7.4
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v10 5/7] perf util: Create branch.c/.h for common branch functions
2017-07-18 12:13 [PATCH v10 0/7] perf report: Show branch type Jin Yao
` (3 preceding siblings ...)
2017-07-18 12:13 ` [PATCH v10 4/7] perf report: Refactor the branch info printing code Jin Yao
@ 2017-07-18 12:13 ` Jin Yao
2017-07-20 9:05 ` [tip:perf/core] " tip-bot for Jin Yao
2017-07-18 12:13 ` [PATCH v10 6/7] perf report: Show branch type statistics for stdio mode Jin Yao
2017-07-18 12:13 ` [PATCH v10 7/7] perf report: Show branch type in callchain entry Jin Yao
6 siblings, 1 reply; 17+ messages in thread
From: Jin Yao @ 2017-07-18 12:13 UTC (permalink / raw)
To: acme, jolsa, peterz, mingo, alexander.shishkin, mpe
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
Create new util/branch.c and util/branch.h to contain the common
branch functions. Such as:
branch_type_count(): Count the numbers of branch types
branch_type_name() : Return the name of branch type
branch_type_stat_display(): Display branch type statistics info
branch_type_str(): Construct the branch type string.
The branch type is saved in branch_flags.
Change log
----------
v10: Not changed
v9: Not changed
v8: Change PERF_BR_NONE to PERF_BR_UNKNOWN.
v7: Since the common branch type name is changed (e.g. JCC->COND),
this patch is performed the modification accordingly.
v6: Move that multiline conditional code inside {} brackets.
Move branch_type_stat_display() from builtin-report.c to
branch.c.
Move branch_type_str() from callchain.c to branch.c.
v5: It's a new patch in v5 patch series.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
tools/perf/util/Build | 1 +
tools/perf/util/branch.c | 166 +++++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/branch.h | 25 +++++++
tools/perf/util/event.h | 3 +-
4 files changed, 194 insertions(+), 1 deletion(-)
create mode 100644 tools/perf/util/branch.c
create mode 100644 tools/perf/util/branch.h
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 79dea95..9857c38 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -93,6 +93,7 @@ libperf-y += drv_configs.o
libperf-y += units.o
libperf-y += time-utils.o
libperf-y += expr-bison.o
+libperf-y += branch.o
libperf-$(CONFIG_LIBBPF) += bpf-loader.o
libperf-$(CONFIG_BPF_PROLOGUE) += bpf-prologue.o
diff --git a/tools/perf/util/branch.c b/tools/perf/util/branch.c
new file mode 100644
index 0000000..953f3ff
--- /dev/null
+++ b/tools/perf/util/branch.c
@@ -0,0 +1,166 @@
+#include "perf.h"
+#include "util/util.h"
+#include "util/debug.h"
+#include "util/branch.h"
+
+static bool cross_area(u64 addr1, u64 addr2, int size)
+{
+ u64 align1, align2;
+
+ align1 = addr1 & ~(size - 1);
+ align2 = addr2 & ~(size - 1);
+
+ return (align1 != align2) ? true : false;
+}
+
+#define AREA_4K 4096
+#define AREA_2M (2 * 1024 * 1024)
+
+void branch_type_count(struct branch_type_stat *stat,
+ struct branch_flags *flags,
+ u64 from, u64 to)
+{
+ if (flags->type == PERF_BR_UNKNOWN || from == 0)
+ return;
+
+ stat->counts[flags->type]++;
+
+ if (flags->type == PERF_BR_COND) {
+ if (to > from)
+ stat->cond_fwd++;
+ else
+ stat->cond_bwd++;
+ }
+
+ if (cross_area(from, to, AREA_2M))
+ stat->cross_2m++;
+ else if (cross_area(from, to, AREA_4K))
+ stat->cross_4k++;
+}
+
+const char *branch_type_name(int type)
+{
+ const char *branch_names[PERF_BR_MAX] = {
+ "N/A",
+ "COND",
+ "UNCOND",
+ "IND",
+ "CALL",
+ "IND_CALL",
+ "RET",
+ "SYSCALL",
+ "SYSRET",
+ "COND_CALL",
+ "COND_RET"
+ };
+
+ if (type >= 0 && type < PERF_BR_MAX)
+ return branch_names[type];
+
+ return NULL;
+}
+
+void branch_type_stat_display(FILE *fp, struct branch_type_stat *stat)
+{
+ u64 total = 0;
+ int i;
+
+ for (i = 0; i < PERF_BR_MAX; i++)
+ total += stat->counts[i];
+
+ if (total == 0)
+ return;
+
+ fprintf(fp, "\n#");
+ fprintf(fp, "\n# Branch Statistics:");
+ fprintf(fp, "\n#");
+
+ if (stat->cond_fwd > 0) {
+ fprintf(fp, "\n%8s: %5.1f%%",
+ "COND_FWD",
+ 100.0 * (double)stat->cond_fwd / (double)total);
+ }
+
+ if (stat->cond_bwd > 0) {
+ fprintf(fp, "\n%8s: %5.1f%%",
+ "COND_BWD",
+ 100.0 * (double)stat->cond_bwd / (double)total);
+ }
+
+ if (stat->cross_4k > 0) {
+ fprintf(fp, "\n%8s: %5.1f%%",
+ "CROSS_4K",
+ 100.0 * (double)stat->cross_4k / (double)total);
+ }
+
+ if (stat->cross_2m > 0) {
+ fprintf(fp, "\n%8s: %5.1f%%",
+ "CROSS_2M",
+ 100.0 * (double)stat->cross_2m / (double)total);
+ }
+
+ for (i = 0; i < PERF_BR_MAX; i++) {
+ if (stat->counts[i] > 0)
+ fprintf(fp, "\n%8s: %5.1f%%",
+ branch_type_name(i),
+ 100.0 *
+ (double)stat->counts[i] / (double)total);
+ }
+}
+
+static int count_str_printf(int index, const char *str,
+ char *bf, int bfsize)
+{
+ int printed;
+
+ printed = scnprintf(bf, bfsize,
+ "%s%s",
+ (index) ? " " : " (", str);
+
+ return printed;
+}
+
+int branch_type_str(struct branch_type_stat *stat,
+ char *bf, int bfsize)
+{
+ int i, j = 0, printed = 0;
+ u64 total = 0;
+
+ for (i = 0; i < PERF_BR_MAX; i++)
+ total += stat->counts[i];
+
+ if (total == 0)
+ return 0;
+
+ if (stat->cond_fwd > 0) {
+ printed += count_str_printf(j++, "COND_FWD",
+ bf + printed, bfsize - printed);
+ }
+
+ if (stat->cond_bwd > 0) {
+ printed += count_str_printf(j++, "COND_BWD",
+ bf + printed, bfsize - printed);
+ }
+
+ for (i = 0; i < PERF_BR_MAX; i++) {
+ if (i == PERF_BR_COND)
+ continue;
+
+ if (stat->counts[i] > 0) {
+ printed += count_str_printf(j++, branch_type_name(i),
+ bf + printed, bfsize - printed);
+ }
+ }
+
+ if (stat->cross_4k > 0) {
+ printed += count_str_printf(j++, "CROSS_4K",
+ bf + printed, bfsize - printed);
+ }
+
+ if (stat->cross_2m > 0) {
+ printed += count_str_printf(j++, "CROSS_2M",
+ bf + printed, bfsize - printed);
+ }
+
+ return printed;
+}
diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h
new file mode 100644
index 0000000..0d702cb
--- /dev/null
+++ b/tools/perf/util/branch.h
@@ -0,0 +1,25 @@
+#ifndef _PERF_BRANCH_H
+#define _PERF_BRANCH_H 1
+
+#include <stdint.h>
+#include "../perf.h"
+
+struct branch_type_stat {
+ u64 counts[PERF_BR_MAX];
+ u64 cond_fwd;
+ u64 cond_bwd;
+ u64 cross_4k;
+ u64 cross_2m;
+};
+
+struct branch_flags;
+
+void branch_type_count(struct branch_type_stat *stat,
+ struct branch_flags *flags,
+ u64 from, u64 to);
+
+const char *branch_type_name(int type);
+void branch_type_stat_display(FILE *fp, struct branch_type_stat *stat);
+int branch_type_str(struct branch_type_stat *stat, char *bf, int bfsize);
+
+#endif /* _PERF_BRANCH_H */
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 9967c87..e2605c9 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -142,7 +142,8 @@ struct branch_flags {
u64 in_tx:1;
u64 abort:1;
u64 cycles:16;
- u64 reserved:44;
+ u64 type:4;
+ u64 reserved:40;
};
struct branch_entry {
--
2.7.4
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v10 6/7] perf report: Show branch type statistics for stdio mode
2017-07-18 12:13 [PATCH v10 0/7] perf report: Show branch type Jin Yao
` (4 preceding siblings ...)
2017-07-18 12:13 ` [PATCH v10 5/7] perf util: Create branch.c/.h for common branch functions Jin Yao
@ 2017-07-18 12:13 ` Jin Yao
2017-07-20 9:05 ` [tip:perf/core] " tip-bot for Jin Yao
2017-07-18 12:13 ` [PATCH v10 7/7] perf report: Show branch type in callchain entry Jin Yao
6 siblings, 1 reply; 17+ messages in thread
From: Jin Yao @ 2017-07-18 12:13 UTC (permalink / raw)
To: acme, jolsa, peterz, mingo, alexander.shishkin, mpe
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
Show the branch type statistics at the end of perf report --stdio.
For example:
perf report --stdio
COND_FWD: 28.5%
COND_BWD: 9.4%
CROSS_4K: 0.7%
CROSS_2M: 14.1%
COND: 37.9%
UNCOND: 0.2%
IND: 6.7%
CALL: 26.5%
RET: 28.7%
SYSRET: 0.0%
The branch types are:
---------------------
COND_FWD: conditional forward
COND_BWD: conditional backward
COND: conditional branch
UNCOND: unconditional branch
IND: indirect
CALL: function call
IND_CALL: indirect function call
RET: function return
SYSCALL: syscall
SYSRET: syscall return
COND_CALL: conditional function call
COND_RET: conditional function return
CROSS_4K and CROSS_2M:
----------------------
They are the metrics checking for branches cross 4K or 2MB pages.
It's an approximate computing. We don't know if the area is 4K or
2MB, so always compute both.
To make the output simple, if a branch crosses 2M area, CROSS_4K
will not be incremented.
Change log
----------
v10: Not changed.
v9: Not changed.
v8: Not changed.
v7: Since the common branch type definitions are changed, some
tags/strings are updated accordingly.
v6: Remove branch_type_stat_display() since it's moved to branch.c.
v5: Remove the unnecessary sort__mode checking in
hist_iter__branch_callback().
v4: Comparing to previous version, the major changes are:
Add the computing of JCC forward/JCC backward and cross page checking
by using the from and to addresses.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
tools/perf/builtin-report.c | 25 +++++++++++++++++++++++++
tools/perf/util/hist.c | 5 +----
2 files changed, 26 insertions(+), 4 deletions(-)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 79a33eb..9b7f107 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -38,6 +38,7 @@
#include "util/time-utils.h"
#include "util/auxtrace.h"
#include "util/units.h"
+#include "util/branch.h"
#include <dlfcn.h>
#include <errno.h>
@@ -73,6 +74,7 @@ struct report {
u64 queue_size;
int socket_filter;
DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
+ struct branch_type_stat brtype_stat;
};
static int report__config(const char *var, const char *value, void *cb)
@@ -150,6 +152,22 @@ static int hist_iter__report_callback(struct hist_entry_iter *iter,
return err;
}
+static int hist_iter__branch_callback(struct hist_entry_iter *iter,
+ struct addr_location *al __maybe_unused,
+ bool single __maybe_unused,
+ void *arg)
+{
+ struct hist_entry *he = iter->he;
+ struct report *rep = arg;
+ struct branch_info *bi;
+
+ bi = he->branch_info;
+ branch_type_count(&rep->brtype_stat, &bi->flags,
+ bi->from.addr, bi->to.addr);
+
+ return 0;
+}
+
static int process_sample_event(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
@@ -188,6 +206,8 @@ static int process_sample_event(struct perf_tool *tool,
*/
if (!sample->branch_stack)
goto out_put;
+
+ iter.add_entry_cb = hist_iter__branch_callback;
iter.ops = &hist_iter_branch;
} else if (rep->mem_mode) {
iter.ops = &hist_iter_mem;
@@ -410,6 +430,9 @@ static int perf_evlist__tty_browse_hists(struct perf_evlist *evlist,
perf_read_values_destroy(&rep->show_threads_values);
}
+ if (sort__mode == SORT_MODE__BRANCH)
+ branch_type_stat_display(stdout, &rep->brtype_stat);
+
return 0;
}
@@ -943,6 +966,8 @@ int cmd_report(int argc, const char **argv)
if (has_br_stack && branch_call_mode)
symbol_conf.show_branchflag_count = true;
+ memset(&report.brtype_stat, 0, sizeof(struct branch_type_stat));
+
/*
* Branch mode is a tristate:
* -1 means default, so decide based on the file having branch data.
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index cf0186a..2f6c5e6 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -749,12 +749,9 @@ iter_prepare_branch_entry(struct hist_entry_iter *iter, struct addr_location *al
}
static int
-iter_add_single_branch_entry(struct hist_entry_iter *iter,
+iter_add_single_branch_entry(struct hist_entry_iter *iter __maybe_unused,
struct addr_location *al __maybe_unused)
{
- /* to avoid calling callback function */
- iter->he = NULL;
-
return 0;
}
--
2.7.4
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v10 7/7] perf report: Show branch type in callchain entry
2017-07-18 12:13 [PATCH v10 0/7] perf report: Show branch type Jin Yao
` (5 preceding siblings ...)
2017-07-18 12:13 ` [PATCH v10 6/7] perf report: Show branch type statistics for stdio mode Jin Yao
@ 2017-07-18 12:13 ` Jin Yao
2017-07-20 9:05 ` [tip:perf/core] " tip-bot for Jin Yao
6 siblings, 1 reply; 17+ messages in thread
From: Jin Yao @ 2017-07-18 12:13 UTC (permalink / raw)
To: acme, jolsa, peterz, mingo, alexander.shishkin, mpe
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
Show branch type in callchain entry. The branch type is printed
with other LBR information (such as cycles/abort/...).
For example:
perf record -g -j any,save_type
perf report --branch-history --stdio --no-children
38.50% div.c:45 [.] main div
|
---main div.c:42 (RET CROSS_2M cycles:2)
compute_flag div.c:28 (cycles:2)
compute_flag div.c:27 (RET CROSS_2M cycles:1)
rand rand.c:28 (cycles:1)
rand rand.c:28 (RET CROSS_2M cycles:1)
__random random.c:298 (cycles:1)
__random random.c:297 (COND_BWD CROSS_2M cycles:1)
__random random.c:295 (cycles:1)
__random random.c:295 (COND_BWD CROSS_2M cycles:1)
__random random.c:295 (cycles:1)
__random random.c:295 (RET CROSS_2M cycles:9)
Change log
----------
v10: Not changed.
v9: Not changed.
v8: Not changed.
v7: Not changed.
v6: Remove the branch_type_str() since it's moved to branch.c.
v5: Rewrite the branch info print code in util/callchain.c.
v4: Comparing to previous version, the major changes are:
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
tools/perf/util/callchain.c | 38 +++++++++++++++++++++++++++++---------
tools/perf/util/callchain.h | 5 ++++-
tools/perf/util/machine.c | 26 +++++++++++++++++---------
3 files changed, 50 insertions(+), 19 deletions(-)
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index abc6908..4fff550 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -23,6 +23,7 @@
#include "sort.h"
#include "machine.h"
#include "callchain.h"
+#include "branch.h"
#define CALLCHAIN_PARAM_DEFAULT \
.mode = CHAIN_GRAPH_ABS, \
@@ -571,6 +572,11 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor)
call->cycles_count = cursor_node->branch_flags.cycles;
call->iter_count = cursor_node->nr_loop_iter;
call->samples_count = cursor_node->samples;
+
+ branch_type_count(&call->brtype_stat,
+ &cursor_node->branch_flags,
+ cursor_node->branch_from,
+ cursor_node->ip);
}
list_add_tail(&call->list, &node->val);
@@ -688,6 +694,11 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
cnode->cycles_count += node->branch_flags.cycles;
cnode->iter_count += node->nr_loop_iter;
cnode->samples_count += node->samples;
+
+ branch_type_count(&cnode->brtype_stat,
+ &node->branch_flags,
+ node->branch_from,
+ node->ip);
}
return MATCH_EQ;
@@ -922,7 +933,7 @@ merge_chain_branch(struct callchain_cursor *cursor,
list_for_each_entry_safe(list, next_list, &src->val, list) {
callchain_cursor_append(cursor, list->ip,
list->ms.map, list->ms.sym,
- false, NULL, 0, 0);
+ false, NULL, 0, 0, 0);
list_del(&list->list);
map__zput(list->ms.map);
free(list);
@@ -962,7 +973,7 @@ int callchain_merge(struct callchain_cursor *cursor,
int callchain_cursor_append(struct callchain_cursor *cursor,
u64 ip, struct map *map, struct symbol *sym,
bool branch, struct branch_flags *flags,
- int nr_loop_iter, int samples)
+ int nr_loop_iter, int samples, u64 branch_from)
{
struct callchain_cursor_node *node = *cursor->last;
@@ -986,6 +997,7 @@ int callchain_cursor_append(struct callchain_cursor *cursor,
memcpy(&node->branch_flags, flags,
sizeof(struct branch_flags));
+ node->branch_from = branch_from;
cursor->nr++;
cursor->last = &node->next;
@@ -1241,14 +1253,19 @@ static int count_float_printf(int index, const char *str, float value,
static int counts_str_build(char *bf, int bfsize,
u64 branch_count, u64 predicted_count,
u64 abort_count, u64 cycles_count,
- u64 iter_count, u64 samples_count)
+ u64 iter_count, u64 samples_count,
+ struct branch_type_stat *brtype_stat)
{
u64 cycles;
- int printed = 0, i = 0;
+ int printed, i = 0;
if (branch_count == 0)
return scnprintf(bf, bfsize, " (calltrace)");
+ printed = branch_type_str(brtype_stat, bf, bfsize);
+ if (printed)
+ i++;
+
if (predicted_count < branch_count) {
printed += count_float_printf(i++, "predicted",
predicted_count * 100.0 / branch_count,
@@ -1284,13 +1301,14 @@ static int counts_str_build(char *bf, int bfsize,
static int callchain_counts_printf(FILE *fp, char *bf, int bfsize,
u64 branch_count, u64 predicted_count,
u64 abort_count, u64 cycles_count,
- u64 iter_count, u64 samples_count)
+ u64 iter_count, u64 samples_count,
+ struct branch_type_stat *brtype_stat)
{
- char str[128];
+ char str[256];
counts_str_build(str, sizeof(str), branch_count,
predicted_count, abort_count, cycles_count,
- iter_count, samples_count);
+ iter_count, samples_count, brtype_stat);
if (fp)
return fprintf(fp, "%s", str);
@@ -1322,7 +1340,8 @@ int callchain_list_counts__printf_value(struct callchain_node *node,
return callchain_counts_printf(fp, bf, bfsize, branch_count,
predicted_count, abort_count,
- cycles_count, iter_count, samples_count);
+ cycles_count, iter_count, samples_count,
+ &clist->brtype_stat);
}
static void free_callchain_node(struct callchain_node *node)
@@ -1447,7 +1466,8 @@ int callchain_cursor__copy(struct callchain_cursor *dst,
rc = callchain_cursor_append(dst, node->ip, node->map, node->sym,
node->branch, &node->branch_flags,
- node->nr_loop_iter, node->samples);
+ node->nr_loop_iter, node->samples,
+ node->branch_from);
if (rc)
break;
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index c56c23d..9773820 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -7,6 +7,7 @@
#include "event.h"
#include "map.h"
#include "symbol.h"
+#include "branch.h"
#define HELP_PAD "\t\t\t\t"
@@ -119,6 +120,7 @@ struct callchain_list {
u64 cycles_count;
u64 iter_count;
u64 samples_count;
+ struct branch_type_stat brtype_stat;
char *srcline;
struct list_head list;
};
@@ -135,6 +137,7 @@ struct callchain_cursor_node {
struct symbol *sym;
bool branch;
struct branch_flags branch_flags;
+ u64 branch_from;
int nr_loop_iter;
int samples;
struct callchain_cursor_node *next;
@@ -198,7 +201,7 @@ static inline void callchain_cursor_reset(struct callchain_cursor *cursor)
int callchain_cursor_append(struct callchain_cursor *cursor, u64 ip,
struct map *map, struct symbol *sym,
bool branch, struct branch_flags *flags,
- int nr_loop_iter, int samples);
+ int nr_loop_iter, int samples, u64 branch_from);
/* Close a cursor writing session. Initialize for the reader */
static inline void callchain_cursor_commit(struct callchain_cursor *cursor)
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index a54a2be..79d08ea 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1682,7 +1682,8 @@ static int add_callchain_ip(struct thread *thread,
bool branch,
struct branch_flags *flags,
int nr_loop_iter,
- int samples)
+ int samples,
+ u64 branch_from)
{
struct addr_location al;
@@ -1735,7 +1736,8 @@ static int add_callchain_ip(struct thread *thread,
if (symbol_conf.hide_unresolved && al.sym == NULL)
return 0;
return callchain_cursor_append(cursor, al.addr, al.map, al.sym,
- branch, flags, nr_loop_iter, samples);
+ branch, flags, nr_loop_iter, samples,
+ branch_from);
}
struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
@@ -1814,7 +1816,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
struct ip_callchain *chain = sample->callchain;
int chain_nr = min(max_stack, (int)chain->nr), i;
u8 cpumode = PERF_RECORD_MISC_USER;
- u64 ip;
+ u64 ip, branch_from = 0;
for (i = 0; i < chain_nr; i++) {
if (chain->ips[i] == PERF_CONTEXT_USER)
@@ -1856,6 +1858,8 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
ip = lbr_stack->entries[0].to;
branch = true;
flags = &lbr_stack->entries[0].flags;
+ branch_from =
+ lbr_stack->entries[0].from;
}
} else {
if (j < lbr_nr) {
@@ -1870,12 +1874,15 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
ip = lbr_stack->entries[0].to;
branch = true;
flags = &lbr_stack->entries[0].flags;
+ branch_from =
+ lbr_stack->entries[0].from;
}
}
err = add_callchain_ip(thread, cursor, parent,
root_al, &cpumode, ip,
- branch, flags, 0, 0);
+ branch, flags, 0, 0,
+ branch_from);
if (err)
return (err < 0) ? err : 0;
}
@@ -1974,19 +1981,20 @@ static int thread__resolve_callchain_sample(struct thread *thread,
root_al,
NULL, be[i].to,
true, &be[i].flags,
- nr_loop_iter, 1);
+ nr_loop_iter, 1,
+ be[i].from);
else
err = add_callchain_ip(thread, cursor, parent,
root_al,
NULL, be[i].to,
true, &be[i].flags,
- 0, 0);
+ 0, 0, be[i].from);
if (!err)
err = add_callchain_ip(thread, cursor, parent, root_al,
NULL, be[i].from,
true, &be[i].flags,
- 0, 0);
+ 0, 0, 0);
if (err == -EINVAL)
break;
if (err)
@@ -2016,7 +2024,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
err = add_callchain_ip(thread, cursor, parent,
root_al, &cpumode, ip,
- false, NULL, 0, 0);
+ false, NULL, 0, 0, 0);
if (err)
return (err < 0) ? err : 0;
@@ -2033,7 +2041,7 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
return 0;
return callchain_cursor_append(cursor, entry->ip,
entry->map, entry->sym,
- false, NULL, 0, 0);
+ false, NULL, 0, 0, 0);
}
static int thread__resolve_callchain_unwind(struct thread *thread,
--
2.7.4
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v10 1/7] perf/core: Define the common branch type classification
2017-07-18 9:31 ` Michael Ellerman
@ 2017-07-18 15:54 ` Arnaldo Carvalho de Melo
0 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-07-18 15:54 UTC (permalink / raw)
To: Michael Ellerman
Cc: Jin Yao, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel,
ak, kan.liang, yao.jin
Em Tue, Jul 18, 2017 at 07:31:37PM +1000, Michael Ellerman escreveu:
> Jin Yao <yao.jin@linux.intel.com> writes:
>
> > It is often useful to know the branch types while analyzing branch
> > data. For example, a call is very different from a conditional branch.
> >
> > Currently we have to look it up in binary while the binary may later
> > not be available and even the binary is available but user has to take
> > some time. It is very useful for user to check it directly in perf
> > report.
> >
> > Perf already has support for disassembling the branch instruction
> > to get the x86 branch type.
> >
> > To keep consistent on kernel and userspace and make the classification
> > more common, the patch adds the common branch type classification
> > in perf_event.h.
> >
> > The patch only defines a minimum but most common set of branch types.
> >
> > PERF_BR_UNKNOWN : unknown
> > PERF_BR_COND :conditional
> > PERF_BR_UNCOND : unconditional
> > PERF_BR_IND : indirect
> > PERF_BR_CALL : function call
> > PERF_BR_IND_CALL : indirect function call
> > PERF_BR_RET : function return
> > PERF_BR_SYSCALL : syscall
> > PERF_BR_SYSRET : syscall return
> > PERF_BR_COND_CALL : conditional function call
> > PERF_BR_COND_RET : conditional function return
> >
> > The patch also adds a new field type (4 bits) in perf_branch_entry
> > to record the branch type.
> >
> > Since the disassembling of branch instruction needs some overhead,
> > a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
> > needs to disassemble the branch instruction and record the branch
> > type.
> >
> > Change log
> > ----------
> > v10: Not changed.
> >
> > v9: Not changed.
> >
> > v8: Change PERF_BR_NONE to PERF_BR_UNKNOWN.
> > No other change.
>
> I acked v8, so you could have retained my ack. Here it is again:
>
> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Ok, collecting acks from you, Jiri and PeterZ, will try to get this
upstreamed soon.
- Arnaldo
^ permalink raw reply [flat|nested] 17+ messages in thread
* [tip:perf/core] perf/core: Define the common branch type classification
2017-07-18 12:13 ` [PATCH v10 1/7] perf/core: Define the common branch type classification Jin Yao
2017-07-18 9:31 ` Michael Ellerman
@ 2017-07-20 9:03 ` tip-bot for Jin Yao
1 sibling, 0 replies; 17+ messages in thread
From: tip-bot for Jin Yao @ 2017-07-20 9:03 UTC (permalink / raw)
To: linux-tip-commits
Cc: alexander.shishkin, linux-kernel, tglx, acme, peterz, yao.jin,
ak, jolsa, kan.liang, mpe, hpa, mingo
Commit-ID: eb0baf8a0d9259d168523b8e7c436b55ade7c546
Gitweb: http://git.kernel.org/tip/eb0baf8a0d9259d168523b8e7c436b55ade7c546
Author: Jin Yao <yao.jin@linux.intel.com>
AuthorDate: Tue, 18 Jul 2017 20:13:09 +0800
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 18 Jul 2017 23:14:38 -0300
perf/core: Define the common branch type classification
It is often useful to know the branch types while analyzing branch data.
For example, a call is very different from a conditional branch.
Currently we have to look it up in binary while the binary may later not
be available and even the binary is available but user has to take some
time. It is very useful for user to check it directly in perf report.
Perf already has support for disassembling the branch instruction to get
the x86 branch type.
To keep consistent on kernel and userspace and make the classification
more common, the patch adds the common branch type classification
in perf_event.h.
The patch only defines a minimum but most common set of branch types.
PERF_BR_UNKNOWN : unknown
PERF_BR_COND :conditional
PERF_BR_UNCOND : unconditional
PERF_BR_IND : indirect
PERF_BR_CALL : function call
PERF_BR_IND_CALL : indirect function call
PERF_BR_RET : function return
PERF_BR_SYSCALL : syscall
PERF_BR_SYSRET : syscall return
PERF_BR_COND_CALL : conditional function call
PERF_BR_COND_RET : conditional function return
The patch also adds a new field type (4 bits) in perf_branch_entry
to record the branch type.
Since the disassembling of branch instruction needs some overhead,
a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
needs to disassemble the branch instruction and record the branch
type.
Change log:
v10: Not changed.
v9: Not changed.
v8: Change PERF_BR_NONE to PERF_BR_UNKNOWN.
No other change.
v7: Just keep the most common branch types.
Others are removed.
v6: Not changed.
v5: Not changed. The v5 patch series just change the userspace.
v4: Comparing to previous version, the major changes are:
1. Remove the PERF_BR_JCC_FWD/PERF_BR_JCC_BWD, they will be
computed later in userspace.
2. Remove the "cross" field in perf_branch_entry. The cross page
computing will be done later in userspace.
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1500379995-6449-2-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
include/uapi/linux/perf_event.h | 27 ++++++++++++++++++++++++++-
tools/include/uapi/linux/perf_event.h | 27 ++++++++++++++++++++++++++-
2 files changed, 52 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index b1c0b18..642db5f 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT = 14, /* no flags */
PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT = 15, /* no cycles */
+ PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT = 16, /* save branch type */
+
PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
};
@@ -198,9 +200,30 @@ enum perf_branch_sample_type {
PERF_SAMPLE_BRANCH_NO_FLAGS = 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
PERF_SAMPLE_BRANCH_NO_CYCLES = 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
+ PERF_SAMPLE_BRANCH_TYPE_SAVE =
+ 1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
+
PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
};
+/*
+ * Common flow change classification
+ */
+enum {
+ PERF_BR_UNKNOWN = 0, /* unknown */
+ PERF_BR_COND = 1, /* conditional */
+ PERF_BR_UNCOND = 2, /* unconditional */
+ PERF_BR_IND = 3, /* indirect */
+ PERF_BR_CALL = 4, /* function call */
+ PERF_BR_IND_CALL = 5, /* indirect function call */
+ PERF_BR_RET = 6, /* function return */
+ PERF_BR_SYSCALL = 7, /* syscall */
+ PERF_BR_SYSRET = 8, /* syscall return */
+ PERF_BR_COND_CALL = 9, /* conditional function call */
+ PERF_BR_COND_RET = 10, /* conditional function return */
+ PERF_BR_MAX,
+};
+
#define PERF_SAMPLE_BRANCH_PLM_ALL \
(PERF_SAMPLE_BRANCH_USER|\
PERF_SAMPLE_BRANCH_KERNEL|\
@@ -1015,6 +1038,7 @@ union perf_mem_data_src {
* in_tx: running in a hardware transaction
* abort: aborting a hardware transaction
* cycles: cycles from last branch (or 0 if not supported)
+ * type: branch type
*/
struct perf_branch_entry {
__u64 from;
@@ -1024,7 +1048,8 @@ struct perf_branch_entry {
in_tx:1, /* in transaction */
abort:1, /* transaction abort */
cycles:16, /* cycle count to last branch */
- reserved:44;
+ type:4, /* branch type */
+ reserved:40;
};
#endif /* _UAPI_LINUX_PERF_EVENT_H */
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index b1c0b18..642db5f 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT = 14, /* no flags */
PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT = 15, /* no cycles */
+ PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT = 16, /* save branch type */
+
PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
};
@@ -198,9 +200,30 @@ enum perf_branch_sample_type {
PERF_SAMPLE_BRANCH_NO_FLAGS = 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
PERF_SAMPLE_BRANCH_NO_CYCLES = 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
+ PERF_SAMPLE_BRANCH_TYPE_SAVE =
+ 1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
+
PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
};
+/*
+ * Common flow change classification
+ */
+enum {
+ PERF_BR_UNKNOWN = 0, /* unknown */
+ PERF_BR_COND = 1, /* conditional */
+ PERF_BR_UNCOND = 2, /* unconditional */
+ PERF_BR_IND = 3, /* indirect */
+ PERF_BR_CALL = 4, /* function call */
+ PERF_BR_IND_CALL = 5, /* indirect function call */
+ PERF_BR_RET = 6, /* function return */
+ PERF_BR_SYSCALL = 7, /* syscall */
+ PERF_BR_SYSRET = 8, /* syscall return */
+ PERF_BR_COND_CALL = 9, /* conditional function call */
+ PERF_BR_COND_RET = 10, /* conditional function return */
+ PERF_BR_MAX,
+};
+
#define PERF_SAMPLE_BRANCH_PLM_ALL \
(PERF_SAMPLE_BRANCH_USER|\
PERF_SAMPLE_BRANCH_KERNEL|\
@@ -1015,6 +1038,7 @@ union perf_mem_data_src {
* in_tx: running in a hardware transaction
* abort: aborting a hardware transaction
* cycles: cycles from last branch (or 0 if not supported)
+ * type: branch type
*/
struct perf_branch_entry {
__u64 from;
@@ -1024,7 +1048,8 @@ struct perf_branch_entry {
in_tx:1, /* in transaction */
abort:1, /* transaction abort */
cycles:16, /* cycle count to last branch */
- reserved:44;
+ type:4, /* branch type */
+ reserved:40;
};
#endif /* _UAPI_LINUX_PERF_EVENT_H */
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [tip:perf/core] perf/x86/intel: Record branch type
2017-07-18 12:13 ` [PATCH v10 2/7] perf/x86/intel: Record branch type Jin Yao
@ 2017-07-20 9:04 ` tip-bot for Jin Yao
0 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Jin Yao @ 2017-07-20 9:04 UTC (permalink / raw)
To: linux-tip-commits
Cc: jolsa, mingo, acme, alexander.shishkin, hpa, tglx, mpe, yao.jin,
kan.liang, peterz, linux-kernel, ak
Commit-ID: d5c7f9dc58edcfb6b45f557bb0023173a0dabde6
Gitweb: http://git.kernel.org/tip/d5c7f9dc58edcfb6b45f557bb0023173a0dabde6
Author: Jin Yao <yao.jin@linux.intel.com>
AuthorDate: Tue, 18 Jul 2017 20:13:10 +0800
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 18 Jul 2017 23:14:38 -0300
perf/x86/intel: Record branch type
Perf already has support for disassembling the branch instruction
and using the branch type for filtering. The patch just records
the branch type in perf_branch_entry.
Before recording, the patch converts the x86 branch type to
common branch type.
Change log:
v10: Set the branch_map array to be static. The previous version
has it on stack then makes the compiler to create it every
time when the function gets called.
v9: Use __ffs() to find first bit in type in common_branch_type().
It lets the code be clear.
v8: Change PERF_BR_NONE to PERF_BR_UNKNOWN.
v7: Just convert following x86 branch types to common branch types.
X86_BR_CALL -> PERF_BR_CALL
X86_BR_RET -> PERF_BR_RET
X86_BR_JCC -> PERF_BR_COND
X86_BR_JMP -> PERF_BR_UNCOND
X86_BR_IND_CALL -> PERF_BR_IND_CALL
X86_BR_ZERO_CALL -> PERF_BR_CALL
X86_BR_IND_JMP -> PERF_BR_IND
X86_BR_SYSCALL -> PERF_BR_SYSCALL
X86_BR_SYSRET -> PERF_BR_SYSRET
Others are set to PERF_BR_NONE
v6: Not changed.
v5: Just fix the merge error. No other update.
v4: Comparing to previous version, the major changes are:
1. Uses a lookup table to convert x86 branch type to common branch
type.
2. Move the JCC forward/JCC backward and cross page computing to
user space.
3. Initialize branch type to 0 in intel_pmu_lbr_read_32 and
intel_pmu_lbr_read_64
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Link: http://lkml.kernel.org/r/1500379995-6449-3-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
arch/x86/events/intel/lbr.c | 52 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 51 insertions(+), 1 deletion(-)
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index eb26165..0edda48 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -109,6 +109,9 @@ enum {
X86_BR_ZERO_CALL = 1 << 15,/* zero length call */
X86_BR_CALL_STACK = 1 << 16,/* call stack */
X86_BR_IND_JMP = 1 << 17,/* indirect jump */
+
+ X86_BR_TYPE_SAVE = 1 << 18,/* indicate to save branch type */
+
};
#define X86_BR_PLM (X86_BR_USER | X86_BR_KERNEL)
@@ -510,6 +513,7 @@ static void intel_pmu_lbr_read_32(struct cpu_hw_events *cpuc)
cpuc->lbr_entries[i].in_tx = 0;
cpuc->lbr_entries[i].abort = 0;
cpuc->lbr_entries[i].cycles = 0;
+ cpuc->lbr_entries[i].type = 0;
cpuc->lbr_entries[i].reserved = 0;
}
cpuc->lbr_stack.nr = i;
@@ -596,6 +600,7 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
cpuc->lbr_entries[out].in_tx = in_tx;
cpuc->lbr_entries[out].abort = abort;
cpuc->lbr_entries[out].cycles = cycles;
+ cpuc->lbr_entries[out].type = 0;
cpuc->lbr_entries[out].reserved = 0;
out++;
}
@@ -673,6 +678,10 @@ static int intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
if (br_type & PERF_SAMPLE_BRANCH_CALL)
mask |= X86_BR_CALL | X86_BR_ZERO_CALL;
+
+ if (br_type & PERF_SAMPLE_BRANCH_TYPE_SAVE)
+ mask |= X86_BR_TYPE_SAVE;
+
/*
* stash actual user request into reg, it may
* be used by fixup code for some CPU
@@ -926,6 +935,43 @@ static int branch_type(unsigned long from, unsigned long to, int abort)
return ret;
}
+#define X86_BR_TYPE_MAP_MAX 16
+
+static int branch_map[X86_BR_TYPE_MAP_MAX] = {
+ PERF_BR_CALL, /* X86_BR_CALL */
+ PERF_BR_RET, /* X86_BR_RET */
+ PERF_BR_SYSCALL, /* X86_BR_SYSCALL */
+ PERF_BR_SYSRET, /* X86_BR_SYSRET */
+ PERF_BR_UNKNOWN, /* X86_BR_INT */
+ PERF_BR_UNKNOWN, /* X86_BR_IRET */
+ PERF_BR_COND, /* X86_BR_JCC */
+ PERF_BR_UNCOND, /* X86_BR_JMP */
+ PERF_BR_UNKNOWN, /* X86_BR_IRQ */
+ PERF_BR_IND_CALL, /* X86_BR_IND_CALL */
+ PERF_BR_UNKNOWN, /* X86_BR_ABORT */
+ PERF_BR_UNKNOWN, /* X86_BR_IN_TX */
+ PERF_BR_UNKNOWN, /* X86_BR_NO_TX */
+ PERF_BR_CALL, /* X86_BR_ZERO_CALL */
+ PERF_BR_UNKNOWN, /* X86_BR_CALL_STACK */
+ PERF_BR_IND, /* X86_BR_IND_JMP */
+};
+
+static int
+common_branch_type(int type)
+{
+ int i;
+
+ type >>= 2; /* skip X86_BR_USER and X86_BR_KERNEL */
+
+ if (type) {
+ i = __ffs(type);
+ if (i < X86_BR_TYPE_MAP_MAX)
+ return branch_map[i];
+ }
+
+ return PERF_BR_UNKNOWN;
+}
+
/*
* implement actual branch filter based on user demand.
* Hardware may not exactly satisfy that request, thus
@@ -942,7 +988,8 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
bool compress = false;
/* if sampling all branches, then nothing to filter */
- if ((br_sel & X86_BR_ALL) == X86_BR_ALL)
+ if (((br_sel & X86_BR_ALL) == X86_BR_ALL) &&
+ ((br_sel & X86_BR_TYPE_SAVE) != X86_BR_TYPE_SAVE))
return;
for (i = 0; i < cpuc->lbr_stack.nr; i++) {
@@ -963,6 +1010,9 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
cpuc->lbr_entries[i].from = 0;
compress = true;
}
+
+ if ((br_sel & X86_BR_TYPE_SAVE) == X86_BR_TYPE_SAVE)
+ cpuc->lbr_entries[i].type = common_branch_type(type);
}
if (!compress)
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [tip:perf/core] perf record: Create a new option save_type in --branch-filter
2017-07-18 12:13 ` [PATCH v10 3/7] perf record: Create a new option save_type in --branch-filter Jin Yao
@ 2017-07-20 9:04 ` tip-bot for Jin Yao
0 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Jin Yao @ 2017-07-20 9:04 UTC (permalink / raw)
To: linux-tip-commits
Cc: yao.jin, jolsa, mingo, tglx, ak, alexander.shishkin, acme,
linux-kernel, mpe, hpa, peterz, kan.liang
Commit-ID: 60f83fa6341dab4aec01cee354ea902771473adb
Gitweb: http://git.kernel.org/tip/60f83fa6341dab4aec01cee354ea902771473adb
Author: Jin Yao <yao.jin@linux.intel.com>
AuthorDate: Tue, 18 Jul 2017 20:13:11 +0800
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 18 Jul 2017 23:14:39 -0300
perf record: Create a new option save_type in --branch-filter
The option indicates the kernel to save branch type during sampling.
One example:
perf record -g --branch-filter any,save_type <command>
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1500379995-6449-4-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/Documentation/perf-record.txt | 1 +
tools/perf/util/parse-branch-options.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index b0e9e92..9bdea04 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -332,6 +332,7 @@ following filters are defined:
- no_tx: only when the target is not in a hardware transaction
- abort_tx: only when the target is a hardware transaction abort
- cond: conditional branches
+ - save_type: save branch type during sampling in case binary is not available later
+
The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
diff --git a/tools/perf/util/parse-branch-options.c b/tools/perf/util/parse-branch-options.c
index 38fd115..e71fb5f 100644
--- a/tools/perf/util/parse-branch-options.c
+++ b/tools/perf/util/parse-branch-options.c
@@ -28,6 +28,7 @@ static const struct branch_mode branch_modes[] = {
BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_COND),
BRANCH_OPT("ind_jmp", PERF_SAMPLE_BRANCH_IND_JUMP),
BRANCH_OPT("call", PERF_SAMPLE_BRANCH_CALL),
+ BRANCH_OPT("save_type", PERF_SAMPLE_BRANCH_TYPE_SAVE),
BRANCH_END
};
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [tip:perf/core] perf report: Refactor the branch info printing code
2017-07-18 12:13 ` [PATCH v10 4/7] perf report: Refactor the branch info printing code Jin Yao
@ 2017-07-20 9:04 ` tip-bot for Jin Yao
0 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Jin Yao @ 2017-07-20 9:04 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, yao.jin, acme, alexander.shishkin, jolsa, hpa, mpe,
mingo, peterz, tglx, ak, kan.liang
Commit-ID: 8d51735fcd2be1791c0bfe2d581d9063281fe7fb
Gitweb: http://git.kernel.org/tip/8d51735fcd2be1791c0bfe2d581d9063281fe7fb
Author: Jin Yao <yao.jin@linux.intel.com>
AuthorDate: Tue, 18 Jul 2017 20:13:12 +0800
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 18 Jul 2017 23:14:40 -0300
perf report: Refactor the branch info printing code
The branch info such as predicted/cycles/... are printed at the
callchain entries.
For example: perf report --branch-history --no-children --stdio
--1.07%--main div.c:39 (predicted:52.4% cycles:1 iterations:17)
main div.c:44 (predicted:52.4% cycles:1)
main div.c:42 (cycles:2)
compute_flag div.c:28 (cycles:2)
compute_flag div.c:27 (cycles:1)
rand rand.c:28 (cycles:1)
rand rand.c:28 (cycles:1)
__random random.c:298 (cycles:1)
__random random.c:297 (cycles:1)
__random random.c:295 (cycles:1)
__random random.c:295 (cycles:1)
__random random.c:295 (cycles:1)
But the current code is difficult to maintain and extend. This patch
refactors the code for easy maintenance.
Change log:
v6: 1. Put the multiline condition code into {} brackets in
counts_str_build()
2. Keep the original display order, that is:
predicted, abort, cycles, iterations
v5: It's a new patch in v5 patch series.
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1500379995-6449-5-git-send-email-yao.jin@linux.intel.com
[ Don't use 'index' as a name for a variable, it shadows a globa decl in older distros ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/callchain.c | 100 ++++++++++++++++++--------------------------
1 file changed, 41 insertions(+), 59 deletions(-)
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index b4204b4..917f4d6 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1214,83 +1214,65 @@ int callchain_branch_counts(struct callchain_root *root,
cycles_count);
}
+static int count_pri64_printf(int idx, const char *str, u64 value, char *bf, int bfsize)
+{
+ int printed;
+
+ printed = scnprintf(bf, bfsize, "%s%s:%" PRId64 "", (idx) ? " " : " (", str, value);
+
+ return printed;
+}
+
+static int count_float_printf(int idx, const char *str, float value, char *bf, int bfsize)
+{
+ int printed;
+
+ printed = scnprintf(bf, bfsize, "%s%s:%.1f%%", (idx) ? " " : " (", str, value);
+
+ return printed;
+}
+
static int counts_str_build(char *bf, int bfsize,
u64 branch_count, u64 predicted_count,
u64 abort_count, u64 cycles_count,
u64 iter_count, u64 samples_count)
{
- double predicted_percent = 0.0;
- const char *null_str = "";
- char iter_str[32];
- char cycle_str[32];
- char *istr, *cstr;
u64 cycles;
+ int printed = 0, i = 0;
if (branch_count == 0)
return scnprintf(bf, bfsize, " (calltrace)");
- cycles = cycles_count / branch_count;
-
- if (iter_count && samples_count) {
- if (cycles > 0)
- scnprintf(iter_str, sizeof(iter_str),
- " iterations:%" PRId64 "",
- iter_count / samples_count);
- else
- scnprintf(iter_str, sizeof(iter_str),
- "iterations:%" PRId64 "",
- iter_count / samples_count);
- istr = iter_str;
- } else
- istr = (char *)null_str;
-
- if (cycles > 0) {
- scnprintf(cycle_str, sizeof(cycle_str),
- "cycles:%" PRId64 "", cycles);
- cstr = cycle_str;
- } else
- cstr = (char *)null_str;
-
- predicted_percent = predicted_count * 100.0 / branch_count;
+ if (predicted_count < branch_count) {
+ printed += count_float_printf(i++, "predicted",
+ predicted_count * 100.0 / branch_count,
+ bf + printed, bfsize - printed);
+ }
- if ((predicted_count == branch_count) && (abort_count == 0)) {
- if ((cycles > 0) || (istr != (char *)null_str))
- return scnprintf(bf, bfsize, " (%s%s)", cstr, istr);
- else
- return scnprintf(bf, bfsize, "%s", (char *)null_str);
+ if (abort_count) {
+ printed += count_float_printf(i++, "abort",
+ abort_count * 100.0 / branch_count,
+ bf + printed, bfsize - printed);
}
- if ((predicted_count < branch_count) && (abort_count == 0)) {
- if ((cycles > 0) || (istr != (char *)null_str))
- return scnprintf(bf, bfsize,
- " (predicted:%.1f%% %s%s)",
- predicted_percent, cstr, istr);
- else {
- return scnprintf(bf, bfsize,
- " (predicted:%.1f%%)",
- predicted_percent);
- }
+ cycles = cycles_count / branch_count;
+ if (cycles) {
+ printed += count_pri64_printf(i++, "cycles",
+ cycles,
+ bf + printed, bfsize - printed);
}
- if ((predicted_count == branch_count) && (abort_count > 0)) {
- if ((cycles > 0) || (istr != (char *)null_str))
- return scnprintf(bf, bfsize,
- " (abort:%" PRId64 " %s%s)",
- abort_count, cstr, istr);
- else
- return scnprintf(bf, bfsize,
- " (abort:%" PRId64 ")",
- abort_count);
+ if (iter_count && samples_count) {
+ printed += count_pri64_printf(i++, "iterations",
+ iter_count / samples_count,
+ bf + printed, bfsize - printed);
}
- if ((cycles > 0) || (istr != (char *)null_str))
- return scnprintf(bf, bfsize,
- " (predicted:%.1f%% abort:%" PRId64 " %s%s)",
- predicted_percent, abort_count, cstr, istr);
+ if (i)
+ return scnprintf(bf + printed, bfsize - printed, ")");
- return scnprintf(bf, bfsize,
- " (predicted:%.1f%% abort:%" PRId64 ")",
- predicted_percent, abort_count);
+ bf[0] = 0;
+ return 0;
}
static int callchain_counts_printf(FILE *fp, char *bf, int bfsize,
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [tip:perf/core] perf util: Create branch.c/.h for common branch functions
2017-07-18 12:13 ` [PATCH v10 5/7] perf util: Create branch.c/.h for common branch functions Jin Yao
@ 2017-07-20 9:05 ` tip-bot for Jin Yao
0 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Jin Yao @ 2017-07-20 9:05 UTC (permalink / raw)
To: linux-tip-commits
Cc: acme, ak, mpe, alexander.shishkin, jolsa, mingo, yao.jin, peterz,
linux-kernel, kan.liang, tglx, hpa
Commit-ID: 992c7e9267c12a8e301152c5569028ff8d535322
Gitweb: http://git.kernel.org/tip/992c7e9267c12a8e301152c5569028ff8d535322
Author: Jin Yao <yao.jin@linux.intel.com>
AuthorDate: Tue, 18 Jul 2017 20:13:13 +0800
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 18 Jul 2017 23:14:40 -0300
perf util: Create branch.c/.h for common branch functions
Create new util/branch.c and util/branch.h to contain the common branch
functions. Such as:
branch_type_count(): Count the numbers of branch types
branch_type_name() : Return the name of branch type
branch_type_stat_display(): Display branch type statistics info
branch_type_str(): Construct the branch type string.
The branch type is saved in branch_flags.
Change log:
v8: Change PERF_BR_NONE to PERF_BR_UNKNOWN.
v7: Since the common branch type name is changed (e.g. JCC->COND),
this patch is performed the modification accordingly.
v6: Move that multiline conditional code inside {} brackets.
Move branch_type_stat_display() from builtin-report.c to
branch.c.
Move branch_type_str() from callchain.c to branch.c.
v5: It's a new patch in v5 patch series.
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1500379995-6449-6-git-send-email-yao.jin@linux.intel.com
[ Don't use 'index' and 'stat' as names for variables, it shadows global decls in older distros ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/Build | 1 +
tools/perf/util/branch.c | 147 +++++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/branch.h | 24 ++++++++
tools/perf/util/event.h | 3 +-
4 files changed, 174 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 7580fe4..8d49a98 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -93,6 +93,7 @@ libperf-y += drv_configs.o
libperf-y += units.o
libperf-y += time-utils.o
libperf-y += expr-bison.o
+libperf-y += branch.o
libperf-$(CONFIG_LIBBPF) += bpf-loader.o
libperf-$(CONFIG_BPF_PROLOGUE) += bpf-prologue.o
diff --git a/tools/perf/util/branch.c b/tools/perf/util/branch.c
new file mode 100644
index 0000000..a4fce27
--- /dev/null
+++ b/tools/perf/util/branch.c
@@ -0,0 +1,147 @@
+#include "perf.h"
+#include "util/util.h"
+#include "util/debug.h"
+#include "util/branch.h"
+
+static bool cross_area(u64 addr1, u64 addr2, int size)
+{
+ u64 align1, align2;
+
+ align1 = addr1 & ~(size - 1);
+ align2 = addr2 & ~(size - 1);
+
+ return (align1 != align2) ? true : false;
+}
+
+#define AREA_4K 4096
+#define AREA_2M (2 * 1024 * 1024)
+
+void branch_type_count(struct branch_type_stat *st, struct branch_flags *flags,
+ u64 from, u64 to)
+{
+ if (flags->type == PERF_BR_UNKNOWN || from == 0)
+ return;
+
+ st->counts[flags->type]++;
+
+ if (flags->type == PERF_BR_COND) {
+ if (to > from)
+ st->cond_fwd++;
+ else
+ st->cond_bwd++;
+ }
+
+ if (cross_area(from, to, AREA_2M))
+ st->cross_2m++;
+ else if (cross_area(from, to, AREA_4K))
+ st->cross_4k++;
+}
+
+const char *branch_type_name(int type)
+{
+ const char *branch_names[PERF_BR_MAX] = {
+ "N/A",
+ "COND",
+ "UNCOND",
+ "IND",
+ "CALL",
+ "IND_CALL",
+ "RET",
+ "SYSCALL",
+ "SYSRET",
+ "COND_CALL",
+ "COND_RET"
+ };
+
+ if (type >= 0 && type < PERF_BR_MAX)
+ return branch_names[type];
+
+ return NULL;
+}
+
+void branch_type_stat_display(FILE *fp, struct branch_type_stat *st)
+{
+ u64 total = 0;
+ int i;
+
+ for (i = 0; i < PERF_BR_MAX; i++)
+ total += st->counts[i];
+
+ if (total == 0)
+ return;
+
+ fprintf(fp, "\n#");
+ fprintf(fp, "\n# Branch Statistics:");
+ fprintf(fp, "\n#");
+
+ if (st->cond_fwd > 0) {
+ fprintf(fp, "\n%8s: %5.1f%%",
+ "COND_FWD",
+ 100.0 * (double)st->cond_fwd / (double)total);
+ }
+
+ if (st->cond_bwd > 0) {
+ fprintf(fp, "\n%8s: %5.1f%%",
+ "COND_BWD",
+ 100.0 * (double)st->cond_bwd / (double)total);
+ }
+
+ if (st->cross_4k > 0) {
+ fprintf(fp, "\n%8s: %5.1f%%",
+ "CROSS_4K",
+ 100.0 * (double)st->cross_4k / (double)total);
+ }
+
+ if (st->cross_2m > 0) {
+ fprintf(fp, "\n%8s: %5.1f%%",
+ "CROSS_2M",
+ 100.0 * (double)st->cross_2m / (double)total);
+ }
+
+ for (i = 0; i < PERF_BR_MAX; i++) {
+ if (st->counts[i] > 0)
+ fprintf(fp, "\n%8s: %5.1f%%",
+ branch_type_name(i),
+ 100.0 *
+ (double)st->counts[i] / (double)total);
+ }
+}
+
+static int count_str_scnprintf(int idx, const char *str, char *bf, int size)
+{
+ return scnprintf(bf, size, "%s%s", (idx) ? " " : " (", str);
+}
+
+int branch_type_str(struct branch_type_stat *st, char *bf, int size)
+{
+ int i, j = 0, printed = 0;
+ u64 total = 0;
+
+ for (i = 0; i < PERF_BR_MAX; i++)
+ total += st->counts[i];
+
+ if (total == 0)
+ return 0;
+
+ if (st->cond_fwd > 0)
+ printed += count_str_scnprintf(j++, "COND_FWD", bf + printed, size - printed);
+
+ if (st->cond_bwd > 0)
+ printed += count_str_scnprintf(j++, "COND_BWD", bf + printed, size - printed);
+
+ for (i = 0; i < PERF_BR_MAX; i++) {
+ if (i == PERF_BR_COND)
+ continue;
+
+ if (st->counts[i] > 0)
+ printed += count_str_scnprintf(j++, branch_type_name(i), bf + printed, size - printed);
+ }
+
+ if (st->cross_4k > 0)
+ printed += count_str_scnprintf(j++, "CROSS_4K", bf + printed, size - printed);
+
+ if (st->cross_2m > 0)
+ printed += count_str_scnprintf(j++, "CROSS_2M", bf + printed, size - printed);
+
+ return printed;
+}
diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h
new file mode 100644
index 0000000..686f2b6
--- /dev/null
+++ b/tools/perf/util/branch.h
@@ -0,0 +1,24 @@
+#ifndef _PERF_BRANCH_H
+#define _PERF_BRANCH_H 1
+
+#include <stdint.h>
+#include "../perf.h"
+
+struct branch_type_stat {
+ u64 counts[PERF_BR_MAX];
+ u64 cond_fwd;
+ u64 cond_bwd;
+ u64 cross_4k;
+ u64 cross_2m;
+};
+
+struct branch_flags;
+
+void branch_type_count(struct branch_type_stat *st, struct branch_flags *flags,
+ u64 from, u64 to);
+
+const char *branch_type_name(int type);
+void branch_type_stat_display(FILE *fp, struct branch_type_stat *st);
+int branch_type_str(struct branch_type_stat *st, char *bf, int bfsize);
+
+#endif /* _PERF_BRANCH_H */
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 37c5faf..423ac82 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -142,7 +142,8 @@ struct branch_flags {
u64 in_tx:1;
u64 abort:1;
u64 cycles:16;
- u64 reserved:44;
+ u64 type:4;
+ u64 reserved:40;
};
struct branch_entry {
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [tip:perf/core] perf report: Show branch type statistics for stdio mode
2017-07-18 12:13 ` [PATCH v10 6/7] perf report: Show branch type statistics for stdio mode Jin Yao
@ 2017-07-20 9:05 ` tip-bot for Jin Yao
0 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Jin Yao @ 2017-07-20 9:05 UTC (permalink / raw)
To: linux-tip-commits
Cc: acme, tglx, mingo, kan.liang, ak, alexander.shishkin, hpa,
peterz, yao.jin, mpe, jolsa, linux-kernel
Commit-ID: 2d78b18952a1bdf125d13fa6bb68fbc5c1b0aed9
Gitweb: http://git.kernel.org/tip/2d78b18952a1bdf125d13fa6bb68fbc5c1b0aed9
Author: Jin Yao <yao.jin@linux.intel.com>
AuthorDate: Tue, 18 Jul 2017 20:13:14 +0800
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 18 Jul 2017 23:14:41 -0300
perf report: Show branch type statistics for stdio mode
Show the branch type statistics at the end of perf report --stdio.
For example:
perf report --stdio
COND_FWD: 28.5%
COND_BWD: 9.4%
CROSS_4K: 0.7%
CROSS_2M: 14.1%
COND: 37.9%
UNCOND: 0.2%
IND: 6.7%
CALL: 26.5%
RET: 28.7%
SYSRET: 0.0%
The branch types are:
COND_FWD: conditional forward
COND_BWD: conditional backward
COND: conditional branch
UNCOND: unconditional branch
IND: indirect
CALL: function call
IND_CALL: indirect function call
RET: function return
SYSCALL: syscall
SYSRET: syscall return
COND_CALL: conditional function call
COND_RET: conditional function return
CROSS_4K and CROSS_2M:
They are the metrics checking for branches cross 4K or 2MB pages.
It's an approximate computing. We don't know if the area is 4K or
2MB, so always compute both.
To make the output simple, if a branch crosses 2M area, CROSS_4K
will not be incremented.
Change log
v7: Since the common branch type definitions are changed, some
tags/strings are updated accordingly.
v6: Remove branch_type_stat_display() since it's moved to branch.c.
v5: Remove the unnecessary sort__mode checking in
hist_iter__branch_callback().
v4: Comparing to previous version, the major changes are:
Add the computing of JCC forward/JCC backward and cross page checking
by using the from and to addresses.
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1500379995-6449-7-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/builtin-report.c | 25 +++++++++++++++++++++++++
tools/perf/util/hist.c | 5 +----
2 files changed, 26 insertions(+), 4 deletions(-)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 8e752ba..cea25d0 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -38,6 +38,7 @@
#include "util/time-utils.h"
#include "util/auxtrace.h"
#include "util/units.h"
+#include "util/branch.h"
#include <dlfcn.h>
#include <errno.h>
@@ -73,6 +74,7 @@ struct report {
u64 queue_size;
int socket_filter;
DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
+ struct branch_type_stat brtype_stat;
};
static int report__config(const char *var, const char *value, void *cb)
@@ -150,6 +152,22 @@ out:
return err;
}
+static int hist_iter__branch_callback(struct hist_entry_iter *iter,
+ struct addr_location *al __maybe_unused,
+ bool single __maybe_unused,
+ void *arg)
+{
+ struct hist_entry *he = iter->he;
+ struct report *rep = arg;
+ struct branch_info *bi;
+
+ bi = he->branch_info;
+ branch_type_count(&rep->brtype_stat, &bi->flags,
+ bi->from.addr, bi->to.addr);
+
+ return 0;
+}
+
static int process_sample_event(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
@@ -188,6 +206,8 @@ static int process_sample_event(struct perf_tool *tool,
*/
if (!sample->branch_stack)
goto out_put;
+
+ iter.add_entry_cb = hist_iter__branch_callback;
iter.ops = &hist_iter_branch;
} else if (rep->mem_mode) {
iter.ops = &hist_iter_mem;
@@ -410,6 +430,9 @@ static int perf_evlist__tty_browse_hists(struct perf_evlist *evlist,
perf_read_values_destroy(&rep->show_threads_values);
}
+ if (sort__mode == SORT_MODE__BRANCH)
+ branch_type_stat_display(stdout, &rep->brtype_stat);
+
return 0;
}
@@ -944,6 +967,8 @@ repeat:
if (has_br_stack && branch_call_mode)
symbol_conf.show_branchflag_count = true;
+ memset(&report.brtype_stat, 0, sizeof(struct branch_type_stat));
+
/*
* Branch mode is a tristate:
* -1 means default, so decide based on the file having branch data.
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index cf0186a..2f6c5e6 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -749,12 +749,9 @@ iter_prepare_branch_entry(struct hist_entry_iter *iter, struct addr_location *al
}
static int
-iter_add_single_branch_entry(struct hist_entry_iter *iter,
+iter_add_single_branch_entry(struct hist_entry_iter *iter __maybe_unused,
struct addr_location *al __maybe_unused)
{
- /* to avoid calling callback function */
- iter->he = NULL;
-
return 0;
}
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [tip:perf/core] perf report: Show branch type in callchain entry
2017-07-18 12:13 ` [PATCH v10 7/7] perf report: Show branch type in callchain entry Jin Yao
@ 2017-07-20 9:05 ` tip-bot for Jin Yao
0 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Jin Yao @ 2017-07-20 9:05 UTC (permalink / raw)
To: linux-tip-commits
Cc: acme, ak, alexander.shishkin, hpa, jolsa, mingo, kan.liang, mpe,
yao.jin, tglx, linux-kernel, peterz
Commit-ID: b851dd49868e295e18c5d72fc3bad85ff1c444b1
Gitweb: http://git.kernel.org/tip/b851dd49868e295e18c5d72fc3bad85ff1c444b1
Author: Jin Yao <yao.jin@linux.intel.com>
AuthorDate: Tue, 18 Jul 2017 20:13:15 +0800
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 18 Jul 2017 23:14:42 -0300
perf report: Show branch type in callchain entry
Show branch type in callchain entry. The branch type is printed
with other LBR information (such as cycles/abort/...).
For example:
perf record -g -j any,save_type
perf report --branch-history --stdio --no-children
38.50% div.c:45 [.] main div
|
---main div.c:42 (RET CROSS_2M cycles:2)
compute_flag div.c:28 (cycles:2)
compute_flag div.c:27 (RET CROSS_2M cycles:1)
rand rand.c:28 (cycles:1)
rand rand.c:28 (RET CROSS_2M cycles:1)
__random random.c:298 (cycles:1)
__random random.c:297 (COND_BWD CROSS_2M cycles:1)
__random random.c:295 (cycles:1)
__random random.c:295 (COND_BWD CROSS_2M cycles:1)
__random random.c:295 (cycles:1)
__random random.c:295 (RET CROSS_2M cycles:9)
Change log
v6: Remove the branch_type_str() since it's moved to branch.c.
v5: Rewrite the branch info print code in util/callchain.c.
v4: Comparing to previous version, the major changes are:
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1500379995-6449-8-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/util/callchain.c | 38 +++++++++++++++++++++++++++++---------
tools/perf/util/callchain.h | 5 ++++-
tools/perf/util/machine.c | 26 +++++++++++++++++---------
3 files changed, 50 insertions(+), 19 deletions(-)
diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 917f4d6..22d413a 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -23,6 +23,7 @@
#include "sort.h"
#include "machine.h"
#include "callchain.h"
+#include "branch.h"
#define CALLCHAIN_PARAM_DEFAULT \
.mode = CHAIN_GRAPH_ABS, \
@@ -571,6 +572,11 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor)
call->cycles_count = cursor_node->branch_flags.cycles;
call->iter_count = cursor_node->nr_loop_iter;
call->samples_count = cursor_node->samples;
+
+ branch_type_count(&call->brtype_stat,
+ &cursor_node->branch_flags,
+ cursor_node->branch_from,
+ cursor_node->ip);
}
list_add_tail(&call->list, &node->val);
@@ -688,6 +694,11 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
cnode->cycles_count += node->branch_flags.cycles;
cnode->iter_count += node->nr_loop_iter;
cnode->samples_count += node->samples;
+
+ branch_type_count(&cnode->brtype_stat,
+ &node->branch_flags,
+ node->branch_from,
+ node->ip);
}
return MATCH_EQ;
@@ -922,7 +933,7 @@ merge_chain_branch(struct callchain_cursor *cursor,
list_for_each_entry_safe(list, next_list, &src->val, list) {
callchain_cursor_append(cursor, list->ip,
list->ms.map, list->ms.sym,
- false, NULL, 0, 0);
+ false, NULL, 0, 0, 0);
list_del(&list->list);
map__zput(list->ms.map);
free(list);
@@ -962,7 +973,7 @@ int callchain_merge(struct callchain_cursor *cursor,
int callchain_cursor_append(struct callchain_cursor *cursor,
u64 ip, struct map *map, struct symbol *sym,
bool branch, struct branch_flags *flags,
- int nr_loop_iter, int samples)
+ int nr_loop_iter, int samples, u64 branch_from)
{
struct callchain_cursor_node *node = *cursor->last;
@@ -986,6 +997,7 @@ int callchain_cursor_append(struct callchain_cursor *cursor,
memcpy(&node->branch_flags, flags,
sizeof(struct branch_flags));
+ node->branch_from = branch_from;
cursor->nr++;
cursor->last = &node->next;
@@ -1235,14 +1247,19 @@ static int count_float_printf(int idx, const char *str, float value, char *bf, i
static int counts_str_build(char *bf, int bfsize,
u64 branch_count, u64 predicted_count,
u64 abort_count, u64 cycles_count,
- u64 iter_count, u64 samples_count)
+ u64 iter_count, u64 samples_count,
+ struct branch_type_stat *brtype_stat)
{
u64 cycles;
- int printed = 0, i = 0;
+ int printed, i = 0;
if (branch_count == 0)
return scnprintf(bf, bfsize, " (calltrace)");
+ printed = branch_type_str(brtype_stat, bf, bfsize);
+ if (printed)
+ i++;
+
if (predicted_count < branch_count) {
printed += count_float_printf(i++, "predicted",
predicted_count * 100.0 / branch_count,
@@ -1278,13 +1295,14 @@ static int counts_str_build(char *bf, int bfsize,
static int callchain_counts_printf(FILE *fp, char *bf, int bfsize,
u64 branch_count, u64 predicted_count,
u64 abort_count, u64 cycles_count,
- u64 iter_count, u64 samples_count)
+ u64 iter_count, u64 samples_count,
+ struct branch_type_stat *brtype_stat)
{
- char str[128];
+ char str[256];
counts_str_build(str, sizeof(str), branch_count,
predicted_count, abort_count, cycles_count,
- iter_count, samples_count);
+ iter_count, samples_count, brtype_stat);
if (fp)
return fprintf(fp, "%s", str);
@@ -1316,7 +1334,8 @@ int callchain_list_counts__printf_value(struct callchain_node *node,
return callchain_counts_printf(fp, bf, bfsize, branch_count,
predicted_count, abort_count,
- cycles_count, iter_count, samples_count);
+ cycles_count, iter_count, samples_count,
+ &clist->brtype_stat);
}
static void free_callchain_node(struct callchain_node *node)
@@ -1441,7 +1460,8 @@ int callchain_cursor__copy(struct callchain_cursor *dst,
rc = callchain_cursor_append(dst, node->ip, node->map, node->sym,
node->branch, &node->branch_flags,
- node->nr_loop_iter, node->samples);
+ node->nr_loop_iter, node->samples,
+ node->branch_from);
if (rc)
break;
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index c56c23d..9773820 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -7,6 +7,7 @@
#include "event.h"
#include "map.h"
#include "symbol.h"
+#include "branch.h"
#define HELP_PAD "\t\t\t\t"
@@ -119,6 +120,7 @@ struct callchain_list {
u64 cycles_count;
u64 iter_count;
u64 samples_count;
+ struct branch_type_stat brtype_stat;
char *srcline;
struct list_head list;
};
@@ -135,6 +137,7 @@ struct callchain_cursor_node {
struct symbol *sym;
bool branch;
struct branch_flags branch_flags;
+ u64 branch_from;
int nr_loop_iter;
int samples;
struct callchain_cursor_node *next;
@@ -198,7 +201,7 @@ static inline void callchain_cursor_reset(struct callchain_cursor *cursor)
int callchain_cursor_append(struct callchain_cursor *cursor, u64 ip,
struct map *map, struct symbol *sym,
bool branch, struct branch_flags *flags,
- int nr_loop_iter, int samples);
+ int nr_loop_iter, int samples, u64 branch_from);
/* Close a cursor writing session. Initialize for the reader */
static inline void callchain_cursor_commit(struct callchain_cursor *cursor)
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index a54a2be..79d08ea 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1682,7 +1682,8 @@ static int add_callchain_ip(struct thread *thread,
bool branch,
struct branch_flags *flags,
int nr_loop_iter,
- int samples)
+ int samples,
+ u64 branch_from)
{
struct addr_location al;
@@ -1735,7 +1736,8 @@ static int add_callchain_ip(struct thread *thread,
if (symbol_conf.hide_unresolved && al.sym == NULL)
return 0;
return callchain_cursor_append(cursor, al.addr, al.map, al.sym,
- branch, flags, nr_loop_iter, samples);
+ branch, flags, nr_loop_iter, samples,
+ branch_from);
}
struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
@@ -1814,7 +1816,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
struct ip_callchain *chain = sample->callchain;
int chain_nr = min(max_stack, (int)chain->nr), i;
u8 cpumode = PERF_RECORD_MISC_USER;
- u64 ip;
+ u64 ip, branch_from = 0;
for (i = 0; i < chain_nr; i++) {
if (chain->ips[i] == PERF_CONTEXT_USER)
@@ -1856,6 +1858,8 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
ip = lbr_stack->entries[0].to;
branch = true;
flags = &lbr_stack->entries[0].flags;
+ branch_from =
+ lbr_stack->entries[0].from;
}
} else {
if (j < lbr_nr) {
@@ -1870,12 +1874,15 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
ip = lbr_stack->entries[0].to;
branch = true;
flags = &lbr_stack->entries[0].flags;
+ branch_from =
+ lbr_stack->entries[0].from;
}
}
err = add_callchain_ip(thread, cursor, parent,
root_al, &cpumode, ip,
- branch, flags, 0, 0);
+ branch, flags, 0, 0,
+ branch_from);
if (err)
return (err < 0) ? err : 0;
}
@@ -1974,19 +1981,20 @@ static int thread__resolve_callchain_sample(struct thread *thread,
root_al,
NULL, be[i].to,
true, &be[i].flags,
- nr_loop_iter, 1);
+ nr_loop_iter, 1,
+ be[i].from);
else
err = add_callchain_ip(thread, cursor, parent,
root_al,
NULL, be[i].to,
true, &be[i].flags,
- 0, 0);
+ 0, 0, be[i].from);
if (!err)
err = add_callchain_ip(thread, cursor, parent, root_al,
NULL, be[i].from,
true, &be[i].flags,
- 0, 0);
+ 0, 0, 0);
if (err == -EINVAL)
break;
if (err)
@@ -2016,7 +2024,7 @@ check_calls:
err = add_callchain_ip(thread, cursor, parent,
root_al, &cpumode, ip,
- false, NULL, 0, 0);
+ false, NULL, 0, 0, 0);
if (err)
return (err < 0) ? err : 0;
@@ -2033,7 +2041,7 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
return 0;
return callchain_cursor_append(cursor, entry->ip,
entry->map, entry->sym,
- false, NULL, 0, 0);
+ false, NULL, 0, 0, 0);
}
static int thread__resolve_callchain_unwind(struct thread *thread,
^ permalink raw reply related [flat|nested] 17+ messages in thread
end of thread, other threads:[~2017-07-20 9:12 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-18 12:13 [PATCH v10 0/7] perf report: Show branch type Jin Yao
2017-07-18 12:13 ` [PATCH v10 1/7] perf/core: Define the common branch type classification Jin Yao
2017-07-18 9:31 ` Michael Ellerman
2017-07-18 15:54 ` Arnaldo Carvalho de Melo
2017-07-20 9:03 ` [tip:perf/core] " tip-bot for Jin Yao
2017-07-18 12:13 ` [PATCH v10 2/7] perf/x86/intel: Record branch type Jin Yao
2017-07-20 9:04 ` [tip:perf/core] " tip-bot for Jin Yao
2017-07-18 12:13 ` [PATCH v10 3/7] perf record: Create a new option save_type in --branch-filter Jin Yao
2017-07-20 9:04 ` [tip:perf/core] " tip-bot for Jin Yao
2017-07-18 12:13 ` [PATCH v10 4/7] perf report: Refactor the branch info printing code Jin Yao
2017-07-20 9:04 ` [tip:perf/core] " tip-bot for Jin Yao
2017-07-18 12:13 ` [PATCH v10 5/7] perf util: Create branch.c/.h for common branch functions Jin Yao
2017-07-20 9:05 ` [tip:perf/core] " tip-bot for Jin Yao
2017-07-18 12:13 ` [PATCH v10 6/7] perf report: Show branch type statistics for stdio mode Jin Yao
2017-07-20 9:05 ` [tip:perf/core] " tip-bot for Jin Yao
2017-07-18 12:13 ` [PATCH v10 7/7] perf report: Show branch type in callchain entry Jin Yao
2017-07-20 9:05 ` [tip:perf/core] " tip-bot for Jin Yao
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.