All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH v6 0/7] perf report: Show branch type
  2017-04-20 12:07 [PATCH v6 0/7] perf report: Show branch type Jin Yao
@ 2017-04-20  9:36 ` Jiri Olsa
  2017-04-23  8:36   ` Jin, Yao
  2017-06-02  8:02   ` Jin, Yao
  2017-04-20 12:07 ` [PATCH v6 1/7] perf/core: Define the common branch type classification Jin Yao
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 35+ messages in thread
From: Jiri Olsa @ 2017-04-20  9:36 UTC (permalink / raw)
  To: Jin Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin, linuxppc-dev

On Thu, Apr 20, 2017 at 08:07:48PM +0800, Jin Yao wrote:
> v6:
>    Update according to the review comments from
>    Jiri Olsa <jolsa@redhat.com>. Major modifications are: 
> 
>    1. Move that multiline conditional code inside {} brackets.
> 
>    2. Move branch_type_stat_display() from builtin-report.c to
>       branch.c. Move branch_type_str() from callchain.c to
>       branch.c.
> 
>    3. Keep the original branch info display order, that is:
>       predicted, abort, cycles, iterations

for the tools part

Acked-by: Jiri Olsa <jolsa@kernel.org>

thanks,
jirka

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v6 0/7] perf report: Show branch type
@ 2017-04-20 12:07 Jin Yao
  2017-04-20  9:36 ` Jiri Olsa
                   ` (8 more replies)
  0 siblings, 9 replies; 35+ messages in thread
From: Jin Yao @ 2017-04-20 12:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev, Jin Yao

v6:
   Update according to the review comments from
   Jiri Olsa <jolsa@redhat.com>. Major modifications are: 

   1. Move that multiline conditional code inside {} brackets.

   2. Move branch_type_stat_display() from builtin-report.c to
      branch.c. Move branch_type_str() from callchain.c to
      branch.c.

   3. Keep the original branch info display order, that is:
      predicted, abort, cycles, iterations

v5:
---
   Mainly the v5 patch series are updated according to
   comments from Jiri Olsa <jolsa@redhat.com>.

   The kernel part doesn't have functional change. It just
   solve the merge issue.

   In userspace, the functions of branch type counting and
   branch type name resolving are moved to the new files: 
   util/branch.c, util/branch.h.

   And refactor the branch info printing code for better
   maintenance.

Not changed (or just fix merge issue):
  perf/core: Define the common branch type classification
  perf/x86/intel: Record branch type
  perf record: Create a new option save_type in --branch-filter

New patches:
  perf report: Refactor the branch info printing code
  perf util: Create branch.c/.h for common branch functions

Changed:
  perf report: Show branch type statistics for stdio mode
  perf report: Show branch type in callchain entry

v4:
---
1. Describe the major changes in patch description.
   Thanks for Peter Zijlstra's reminding. 

2. Initialize branch type to 0 in intel_pmu_lbr_read_32 and
   intel_pmu_lbr_read_64. Remove the invalid else code in
   intel_pmu_lbr_filter. 

v3:
---
1. Move the JCC forward/backward and cross page computing from
   kernel to userspace.

2. Use lookup table to replace original switch/case processing.

Changed:
  perf/core: Define the common branch type classification
  perf/x86/intel: Record branch type
  perf report: Show branch type statistics for stdio mode
  perf report: Show branch type in callchain entry

Not changed:
  perf record: Create a new option save_type in --branch-filter

v2:
---
1. Use 4 bits in perf_branch_entry to record branch type.

2. Pull out some common branch types from FAR_BRANCH. Now the branch
   types defined in perf_event.h:

Jin Yao (7):
  perf/core: Define the common branch type classification
  perf/x86/intel: Record branch type
  perf record: Create a new option save_type in --branch-filter
  perf report: Refactor the branch info printing code
  perf util: Create branch.c/.h for common branch functions
  perf report: Show branch type statistics for stdio mode
  perf report: Show branch type in callchain entry

 arch/x86/events/intel/lbr.c              |  53 +++++++++-
 include/uapi/linux/perf_event.h          |  29 +++++-
 tools/include/uapi/linux/perf_event.h    |  29 +++++-
 tools/perf/Documentation/perf-record.txt |   1 +
 tools/perf/builtin-report.c              |  25 +++++
 tools/perf/util/Build                    |   1 +
 tools/perf/util/branch.c                 | 168 +++++++++++++++++++++++++++++++
 tools/perf/util/branch.h                 |  25 +++++
 tools/perf/util/callchain.c              | 140 ++++++++++++++------------
 tools/perf/util/callchain.h              |   5 +-
 tools/perf/util/event.h                  |   3 +-
 tools/perf/util/hist.c                   |   5 +-
 tools/perf/util/machine.c                |  26 +++--
 tools/perf/util/parse-branch-options.c   |   1 +
 14 files changed, 427 insertions(+), 84 deletions(-)
 create mode 100644 tools/perf/util/branch.c
 create mode 100644 tools/perf/util/branch.h

-- 
2.7.4

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v6 1/7] perf/core: Define the common branch type classification
  2017-04-20 12:07 [PATCH v6 0/7] perf report: Show branch type Jin Yao
  2017-04-20  9:36 ` Jiri Olsa
@ 2017-04-20 12:07 ` Jin Yao
  2017-07-07  8:42   ` Peter Zijlstra
  2017-07-10  6:05   ` Michael Ellerman
  2017-04-20 12:07 ` [PATCH v6 2/7] perf/x86/intel: Record branch type Jin Yao
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 35+ messages in thread
From: Jin Yao @ 2017-04-20 12:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev, Jin Yao

It is often useful to know the branch types while analyzing branch
data. For example, a call is very different from a conditional branch.

Currently we have to look it up in binary while the binary may later
not be available and even the binary is available but user has to take
some time. It is very useful for user to check it directly in perf
report.

Perf already has support for disassembling the branch instruction
to get the x86 branch type.

To keep consistent on kernel and userspace and make the classification
more common, the patch adds the common branch type classification
in perf_event.h.

PERF_BR_NONE      : unknown
PERF_BR_JCC       : conditional jump
PERF_BR_JMP       : jump
PERF_BR_IND_JMP   : indirect jump
PERF_BR_CALL      : call
PERF_BR_IND_CALL  : indirect call
PERF_BR_RET       : return
PERF_BR_SYSCALL   : syscall
PERF_BR_SYSRET    : syscall return
PERF_BR_IRQ       : hw interrupt/trap/fault
PERF_BR_INT       : sw interrupt
PERF_BR_IRET      : return from interrupt
PERF_BR_FAR_BRANCH: not generic far branch type

The patch also adds a new field type (4 bits) in perf_branch_entry
to record the branch type.

Since the disassembling of branch instruction needs some overhead,
a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
needs to disassemble the branch instruction and record the branch
type.

Change log
----------

v6: Not changed.

v5: Not changed. The v5 patch series just change the userspace.

v4: Comparing to previous version, the major changes are:

1. Remove the PERF_BR_JCC_FWD/PERF_BR_JCC_BWD, they will be
   computed later in userspace.

2. Remove the "cross" field in perf_branch_entry. The cross page
   computing will be done later in userspace.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 include/uapi/linux/perf_event.h       | 29 ++++++++++++++++++++++++++++-
 tools/include/uapi/linux/perf_event.h | 29 ++++++++++++++++++++++++++++-
 2 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index d09a9cd..69af012 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
 	PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT	= 14, /* no flags */
 	PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT	= 15, /* no cycles */
 
+	PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT	= 16, /* save branch type */
+
 	PERF_SAMPLE_BRANCH_MAX_SHIFT		/* non-ABI */
 };
 
@@ -198,9 +200,32 @@ enum perf_branch_sample_type {
 	PERF_SAMPLE_BRANCH_NO_FLAGS	= 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
 	PERF_SAMPLE_BRANCH_NO_CYCLES	= 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
 
+	PERF_SAMPLE_BRANCH_TYPE_SAVE	=
+		1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
+
 	PERF_SAMPLE_BRANCH_MAX		= 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
 };
 
+/*
+ * Common flow change classification
+ */
+enum {
+	PERF_BR_NONE		= 0,	/* unknown */
+	PERF_BR_JCC		= 1,	/* conditional jump */
+	PERF_BR_JMP		= 2,	/* jump */
+	PERF_BR_IND_JMP		= 3,	/* indirect jump */
+	PERF_BR_CALL		= 4,	/* call */
+	PERF_BR_IND_CALL	= 5,	/* indirect call */
+	PERF_BR_RET		= 6,	/* return */
+	PERF_BR_SYSCALL		= 7,	/* syscall */
+	PERF_BR_SYSRET		= 8,	/* syscall return */
+	PERF_BR_IRQ		= 9,	/* hw interrupt/trap/fault */
+	PERF_BR_INT		= 10,	/* sw interrupt */
+	PERF_BR_IRET		= 11,	/* return from interrupt */
+	PERF_BR_FAR_BRANCH	= 12,	/* not generic far branch type */
+	PERF_BR_MAX,
+};
+
 #define PERF_SAMPLE_BRANCH_PLM_ALL \
 	(PERF_SAMPLE_BRANCH_USER|\
 	 PERF_SAMPLE_BRANCH_KERNEL|\
@@ -999,6 +1024,7 @@ union perf_mem_data_src {
  *     in_tx: running in a hardware transaction
  *     abort: aborting a hardware transaction
  *    cycles: cycles from last branch (or 0 if not supported)
+ *      type: branch type
  */
 struct perf_branch_entry {
 	__u64	from;
@@ -1008,7 +1034,8 @@ struct perf_branch_entry {
 		in_tx:1,    /* in transaction */
 		abort:1,    /* transaction abort */
 		cycles:16,  /* cycle count to last branch */
-		reserved:44;
+		type:4,     /* branch type */
+		reserved:40;
 };
 
 #endif /* _UAPI_LINUX_PERF_EVENT_H */
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index d09a9cd..69af012 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
 	PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT	= 14, /* no flags */
 	PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT	= 15, /* no cycles */
 
+	PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT	= 16, /* save branch type */
+
 	PERF_SAMPLE_BRANCH_MAX_SHIFT		/* non-ABI */
 };
 
@@ -198,9 +200,32 @@ enum perf_branch_sample_type {
 	PERF_SAMPLE_BRANCH_NO_FLAGS	= 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
 	PERF_SAMPLE_BRANCH_NO_CYCLES	= 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
 
+	PERF_SAMPLE_BRANCH_TYPE_SAVE	=
+		1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
+
 	PERF_SAMPLE_BRANCH_MAX		= 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
 };
 
+/*
+ * Common flow change classification
+ */
+enum {
+	PERF_BR_NONE		= 0,	/* unknown */
+	PERF_BR_JCC		= 1,	/* conditional jump */
+	PERF_BR_JMP		= 2,	/* jump */
+	PERF_BR_IND_JMP		= 3,	/* indirect jump */
+	PERF_BR_CALL		= 4,	/* call */
+	PERF_BR_IND_CALL	= 5,	/* indirect call */
+	PERF_BR_RET		= 6,	/* return */
+	PERF_BR_SYSCALL		= 7,	/* syscall */
+	PERF_BR_SYSRET		= 8,	/* syscall return */
+	PERF_BR_IRQ		= 9,	/* hw interrupt/trap/fault */
+	PERF_BR_INT		= 10,	/* sw interrupt */
+	PERF_BR_IRET		= 11,	/* return from interrupt */
+	PERF_BR_FAR_BRANCH	= 12,	/* not generic far branch type */
+	PERF_BR_MAX,
+};
+
 #define PERF_SAMPLE_BRANCH_PLM_ALL \
 	(PERF_SAMPLE_BRANCH_USER|\
 	 PERF_SAMPLE_BRANCH_KERNEL|\
@@ -999,6 +1024,7 @@ union perf_mem_data_src {
  *     in_tx: running in a hardware transaction
  *     abort: aborting a hardware transaction
  *    cycles: cycles from last branch (or 0 if not supported)
+ *      type: branch type
  */
 struct perf_branch_entry {
 	__u64	from;
@@ -1008,7 +1034,8 @@ struct perf_branch_entry {
 		in_tx:1,    /* in transaction */
 		abort:1,    /* transaction abort */
 		cycles:16,  /* cycle count to last branch */
-		reserved:44;
+		type:4,     /* branch type */
+		reserved:40;
 };
 
 #endif /* _UAPI_LINUX_PERF_EVENT_H */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v6 2/7] perf/x86/intel: Record branch type
  2017-04-20 12:07 [PATCH v6 0/7] perf report: Show branch type Jin Yao
  2017-04-20  9:36 ` Jiri Olsa
  2017-04-20 12:07 ` [PATCH v6 1/7] perf/core: Define the common branch type classification Jin Yao
@ 2017-04-20 12:07 ` Jin Yao
  2017-04-23 13:55   ` Jiri Olsa
  2017-04-20 12:07 ` [PATCH v6 3/7] perf record: Create a new option save_type in --branch-filter Jin Yao
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Jin Yao @ 2017-04-20 12:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev, Jin Yao

Perf already has support for disassembling the branch instruction
and using the branch type for filtering. The patch just records
the branch type in perf_branch_entry.

Before recording, the patch converts the x86 branch type to
common branch type.

Change log
----------

v6: Not changed.

v5: Just fix the merge error. No other update.

v4: Comparing to previous version, the major changes are:

1. Uses a lookup table to convert x86 branch type to common branch
   type.

2. Move the JCC forward/JCC backward and cross page computing to
   user space.

3. Initialize branch type to 0 in intel_pmu_lbr_read_32 and
   intel_pmu_lbr_read_64

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 arch/x86/events/intel/lbr.c | 53 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 52 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index f924629..f10a7ed 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -109,6 +109,9 @@ enum {
 	X86_BR_ZERO_CALL	= 1 << 15,/* zero length call */
 	X86_BR_CALL_STACK	= 1 << 16,/* call stack */
 	X86_BR_IND_JMP		= 1 << 17,/* indirect jump */
+
+	X86_BR_TYPE_SAVE	= 1 << 18,/* indicate to save branch type */
+
 };
 
 #define X86_BR_PLM (X86_BR_USER | X86_BR_KERNEL)
@@ -510,6 +513,7 @@ static void intel_pmu_lbr_read_32(struct cpu_hw_events *cpuc)
 		cpuc->lbr_entries[i].in_tx	= 0;
 		cpuc->lbr_entries[i].abort	= 0;
 		cpuc->lbr_entries[i].cycles	= 0;
+		cpuc->lbr_entries[i].type	= 0;
 		cpuc->lbr_entries[i].reserved	= 0;
 	}
 	cpuc->lbr_stack.nr = i;
@@ -596,6 +600,7 @@ static void intel_pmu_lbr_read_64(struct cpu_hw_events *cpuc)
 		cpuc->lbr_entries[out].in_tx	 = in_tx;
 		cpuc->lbr_entries[out].abort	 = abort;
 		cpuc->lbr_entries[out].cycles	 = cycles;
+		cpuc->lbr_entries[out].type	 = 0;
 		cpuc->lbr_entries[out].reserved	 = 0;
 		out++;
 	}
@@ -673,6 +678,10 @@ static int intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
 
 	if (br_type & PERF_SAMPLE_BRANCH_CALL)
 		mask |= X86_BR_CALL | X86_BR_ZERO_CALL;
+
+	if (br_type & PERF_SAMPLE_BRANCH_TYPE_SAVE)
+		mask |= X86_BR_TYPE_SAVE;
+
 	/*
 	 * stash actual user request into reg, it may
 	 * be used by fixup code for some CPU
@@ -926,6 +935,44 @@ static int branch_type(unsigned long from, unsigned long to, int abort)
 	return ret;
 }
 
+#define X86_BR_TYPE_MAP_MAX	16
+
+static int
+common_branch_type(int type)
+{
+	int i, mask;
+	const int branch_map[X86_BR_TYPE_MAP_MAX] = {
+		PERF_BR_CALL,		/* X86_BR_CALL */
+		PERF_BR_RET,		/* X86_BR_RET */
+		PERF_BR_SYSCALL,	/* X86_BR_SYSCALL */
+		PERF_BR_SYSRET,		/* X86_BR_SYSRET */
+		PERF_BR_INT,		/* X86_BR_INT */
+		PERF_BR_IRET,		/* X86_BR_IRET */
+		PERF_BR_JCC,		/* X86_BR_JCC */
+		PERF_BR_JMP,		/* X86_BR_JMP */
+		PERF_BR_IRQ,		/* X86_BR_IRQ */
+		PERF_BR_IND_CALL,	/* X86_BR_IND_CALL */
+		PERF_BR_NONE,		/* X86_BR_ABORT */
+		PERF_BR_NONE,		/* X86_BR_IN_TX */
+		PERF_BR_NONE,		/* X86_BR_NO_TX */
+		PERF_BR_CALL,		/* X86_BR_ZERO_CALL */
+		PERF_BR_NONE,		/* X86_BR_CALL_STACK */
+		PERF_BR_IND_JMP,	/* X86_BR_IND_JMP */
+	};
+
+	type >>= 2; /* skip X86_BR_USER and X86_BR_KERNEL */
+	mask = ~(~0 << 1);
+
+	for (i = 0; i < X86_BR_TYPE_MAP_MAX; i++) {
+		if (type & mask)
+			return branch_map[i];
+
+		type >>= 1;
+	}
+
+	return PERF_BR_NONE;
+}
+
 /*
  * implement actual branch filter based on user demand.
  * Hardware may not exactly satisfy that request, thus
@@ -942,7 +989,8 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
 	bool compress = false;
 
 	/* if sampling all branches, then nothing to filter */
-	if ((br_sel & X86_BR_ALL) == X86_BR_ALL)
+	if (((br_sel & X86_BR_ALL) == X86_BR_ALL) &&
+	    ((br_sel & X86_BR_TYPE_SAVE) != X86_BR_TYPE_SAVE))
 		return;
 
 	for (i = 0; i < cpuc->lbr_stack.nr; i++) {
@@ -963,6 +1011,9 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
 			cpuc->lbr_entries[i].from = 0;
 			compress = true;
 		}
+
+		if ((br_sel & X86_BR_TYPE_SAVE) == X86_BR_TYPE_SAVE)
+			cpuc->lbr_entries[i].type = common_branch_type(type);
 	}
 
 	if (!compress)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v6 3/7] perf record: Create a new option save_type in --branch-filter
  2017-04-20 12:07 [PATCH v6 0/7] perf report: Show branch type Jin Yao
                   ` (2 preceding siblings ...)
  2017-04-20 12:07 ` [PATCH v6 2/7] perf/x86/intel: Record branch type Jin Yao
@ 2017-04-20 12:07 ` Jin Yao
  2017-04-20 12:07 ` [PATCH v6 4/7] perf report: Refactor the branch info printing code Jin Yao
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Jin Yao @ 2017-04-20 12:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev, Jin Yao

The option indicates the kernel to save branch type during sampling.

One example:
perf record -g --branch-filter any,save_type <command>

Change log
----------

v6: Not changed.

v5: Not changed.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/Documentation/perf-record.txt | 1 +
 tools/perf/util/parse-branch-options.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index ea3789d..e2f5a4f 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -332,6 +332,7 @@ following filters are defined:
 	- no_tx: only when the target is not in a hardware transaction
 	- abort_tx: only when the target is a hardware transaction abort
 	- cond: conditional branches
+	- save_type: save branch type during sampling in case binary is not available later
 
 +
 The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
diff --git a/tools/perf/util/parse-branch-options.c b/tools/perf/util/parse-branch-options.c
index 38fd115..e71fb5f 100644
--- a/tools/perf/util/parse-branch-options.c
+++ b/tools/perf/util/parse-branch-options.c
@@ -28,6 +28,7 @@ static const struct branch_mode branch_modes[] = {
 	BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_COND),
 	BRANCH_OPT("ind_jmp", PERF_SAMPLE_BRANCH_IND_JUMP),
 	BRANCH_OPT("call", PERF_SAMPLE_BRANCH_CALL),
+	BRANCH_OPT("save_type", PERF_SAMPLE_BRANCH_TYPE_SAVE),
 	BRANCH_END
 };
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v6 4/7] perf report: Refactor the branch info printing code
  2017-04-20 12:07 [PATCH v6 0/7] perf report: Show branch type Jin Yao
                   ` (3 preceding siblings ...)
  2017-04-20 12:07 ` [PATCH v6 3/7] perf record: Create a new option save_type in --branch-filter Jin Yao
@ 2017-04-20 12:07 ` Jin Yao
  2017-04-20 12:07 ` [PATCH v6 5/7] perf util: Create branch.c/.h for common branch functions Jin Yao
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Jin Yao @ 2017-04-20 12:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev, Jin Yao

The branch info such as predicted/cycles/... are printed at the
callchain entries.

For example: perf report --branch-history --no-children --stdio

    --1.07%--main div.c:39 (predicted:52.4% cycles:1 iterations:17)
              main div.c:44 (predicted:52.4% cycles:1)
              main div.c:42 (cycles:2)
              compute_flag div.c:28 (cycles:2)
              compute_flag div.c:27 (cycles:1)
              rand rand.c:28 (cycles:1)
              rand rand.c:28 (cycles:1)
              __random random.c:298 (cycles:1)
              __random random.c:297 (cycles:1)
              __random random.c:295 (cycles:1)
              __random random.c:295 (cycles:1)
              __random random.c:295 (cycles:1)

But the current code is difficult to maintain and extend. This patch
refactors the code for easy maintenance.

Change log
----------

v6: 1. Put the multiline condition code into {} brackets in
       counts_str_build()

    2. Keep the original display order, that is:
       predicted, abort, cycles, iterations

v5: It's a new patch in v5 patch series.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/util/callchain.c | 106 ++++++++++++++++++++------------------------
 1 file changed, 47 insertions(+), 59 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 0096d45..d44b5ed 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -1106,83 +1106,71 @@ int callchain_branch_counts(struct callchain_root *root,
 						  cycles_count);
 }
 
+static int count_pri64_printf(int index, const char *str, u64 value,
+	char *bf, int bfsize)
+{
+	int printed;
+
+	printed = scnprintf(bf, bfsize,
+		"%s%s:%" PRId64 "",
+		(index) ? " " : " (", str, value);
+
+	return printed;
+}
+
+static int count_float_printf(int index, const char *str, float value,
+	char *bf, int bfsize)
+{
+	int printed;
+
+	printed = scnprintf(bf, bfsize,
+		"%s%s:%.1f%%",
+		(index) ? " " : " (", str, value);
+
+	return printed;
+}
+
 static int counts_str_build(char *bf, int bfsize,
 			     u64 branch_count, u64 predicted_count,
 			     u64 abort_count, u64 cycles_count,
 			     u64 iter_count, u64 samples_count)
 {
-	double predicted_percent = 0.0;
-	const char *null_str = "";
-	char iter_str[32];
-	char cycle_str[32];
-	char *istr, *cstr;
 	u64 cycles;
+	int printed = 0, i = 0;
 
 	if (branch_count == 0)
 		return scnprintf(bf, bfsize, " (calltrace)");
 
-	cycles = cycles_count / branch_count;
-
-	if (iter_count && samples_count) {
-		if (cycles > 0)
-			scnprintf(iter_str, sizeof(iter_str),
-				 " iterations:%" PRId64 "",
-				 iter_count / samples_count);
-		else
-			scnprintf(iter_str, sizeof(iter_str),
-				 "iterations:%" PRId64 "",
-				 iter_count / samples_count);
-		istr = iter_str;
-	} else
-		istr = (char *)null_str;
-
-	if (cycles > 0) {
-		scnprintf(cycle_str, sizeof(cycle_str),
-			  "cycles:%" PRId64 "", cycles);
-		cstr = cycle_str;
-	} else
-		cstr = (char *)null_str;
-
-	predicted_percent = predicted_count * 100.0 / branch_count;
+	if (predicted_count < branch_count) {
+		printed += count_float_printf(i++, "predicted",
+				predicted_count * 100.0 / branch_count,
+				bf + printed, bfsize - printed);
+	}
 
-	if ((predicted_count == branch_count) && (abort_count == 0)) {
-		if ((cycles > 0) || (istr != (char *)null_str))
-			return scnprintf(bf, bfsize, " (%s%s)", cstr, istr);
-		else
-			return scnprintf(bf, bfsize, "%s", (char *)null_str);
+	if (abort_count) {
+		printed += count_float_printf(i++, "abort",
+				abort_count * 100.0 / branch_count,
+				bf + printed, bfsize - printed);
 	}
 
-	if ((predicted_count < branch_count) && (abort_count == 0)) {
-		if ((cycles > 0) || (istr != (char *)null_str))
-			return scnprintf(bf, bfsize,
-				" (predicted:%.1f%% %s%s)",
-				predicted_percent, cstr, istr);
-		else {
-			return scnprintf(bf, bfsize,
-				" (predicted:%.1f%%)",
-				predicted_percent);
-		}
+	cycles = cycles_count / branch_count;
+	if (cycles) {
+		printed += count_pri64_printf(i++, "cycles",
+				cycles,
+				bf + printed, bfsize - printed);
 	}
 
-	if ((predicted_count == branch_count) && (abort_count > 0)) {
-		if ((cycles > 0) || (istr != (char *)null_str))
-			return scnprintf(bf, bfsize,
-				" (abort:%" PRId64 " %s%s)",
-				abort_count, cstr, istr);
-		else
-			return scnprintf(bf, bfsize,
-				" (abort:%" PRId64 ")",
-				abort_count);
+	if (iter_count && samples_count) {
+		printed += count_pri64_printf(i++, "iterations",
+				iter_count / samples_count,
+				bf + printed, bfsize - printed);
 	}
 
-	if ((cycles > 0) || (istr != (char *)null_str))
-		return scnprintf(bf, bfsize,
-			" (predicted:%.1f%% abort:%" PRId64 " %s%s)",
-			predicted_percent, abort_count, cstr, istr);
+	if (i)
+		return scnprintf(bf + printed, bfsize - printed, ")");
 
-	return scnprintf(bf, bfsize,
-			" (predicted:%.1f%% abort:%" PRId64 ")",
-			predicted_percent, abort_count);
+	bf[0] = 0;
+	return 0;
 }
 
 static int callchain_counts_printf(FILE *fp, char *bf, int bfsize,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v6 5/7] perf util: Create branch.c/.h for common branch functions
  2017-04-20 12:07 [PATCH v6 0/7] perf report: Show branch type Jin Yao
                   ` (4 preceding siblings ...)
  2017-04-20 12:07 ` [PATCH v6 4/7] perf report: Refactor the branch info printing code Jin Yao
@ 2017-04-20 12:07 ` Jin Yao
  2017-04-20 12:07 ` [PATCH v6 6/7] perf report: Show branch type statistics for stdio mode Jin Yao
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Jin Yao @ 2017-04-20 12:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev, Jin Yao

Create new util/branch.c and util/branch.h to contain the common
branch functions. Such as:

branch_type_count(): Count the numbers of branch types
branch_type_name() : Return the name of branch type
branch_type_stat_display(): Display branch type statistics info
branch_type_str(): Construct the branch type string.

The branch type is saved in branch_flags.

Change log
----------

v6: Move that multiline conditional code inside {} brackets.
    Move branch_type_stat_display() from builtin-report.c to
      branch.c.
    Move branch_type_str() from callchain.c to branch.c.

v5: It's a new patch in v5 patch series.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/util/Build    |   1 +
 tools/perf/util/branch.c | 168 +++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/branch.h |  25 +++++++
 tools/perf/util/event.h  |   3 +-
 4 files changed, 196 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/util/branch.c
 create mode 100644 tools/perf/util/branch.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index f0b9e5d..391bf85 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -91,6 +91,7 @@ libperf-y += vsprintf.o
 libperf-y += drv_configs.o
 libperf-y += time-utils.o
 libperf-y += expr-bison.o
+libperf-y += branch.o
 
 libperf-$(CONFIG_LIBBPF) += bpf-loader.o
 libperf-$(CONFIG_BPF_PROLOGUE) += bpf-prologue.o
diff --git a/tools/perf/util/branch.c b/tools/perf/util/branch.c
new file mode 100644
index 0000000..4aa56ad
--- /dev/null
+++ b/tools/perf/util/branch.c
@@ -0,0 +1,168 @@
+#include "perf.h"
+#include "util/util.h"
+#include "util/debug.h"
+#include "util/branch.h"
+
+static bool cross_area(u64 addr1, u64 addr2, int size)
+{
+	u64 align1, align2;
+
+	align1 = addr1 & ~(size - 1);
+	align2 = addr2 & ~(size - 1);
+
+	return (align1 != align2) ? true : false;
+}
+
+#define AREA_4K		4096
+#define AREA_2M		(2 * 1024 * 1024)
+
+void branch_type_count(struct branch_type_stat *stat,
+		       struct branch_flags *flags,
+		       u64 from, u64 to)
+{
+	if (flags->type == PERF_BR_NONE || from == 0)
+		return;
+
+	stat->counts[flags->type]++;
+
+	if (flags->type == PERF_BR_JCC) {
+		if (to > from)
+			stat->jcc_fwd++;
+		else
+			stat->jcc_bwd++;
+	}
+
+	if (cross_area(from, to, AREA_2M))
+		stat->cross_2m++;
+	else if (cross_area(from, to, AREA_4K))
+		stat->cross_4k++;
+}
+
+const char *branch_type_name(int type)
+{
+	const char *branch_names[PERF_BR_MAX] = {
+		"N/A",
+		"JCC",
+		"JMP",
+		"IND_JMP",
+		"CALL",
+		"IND_CALL",
+		"RET",
+		"SYSCALL",
+		"SYSRET",
+		"IRQ",
+		"INT",
+		"IRET",
+		"FAR_BRANCH",
+	};
+
+	if (type >= 0 && type < PERF_BR_MAX)
+		return branch_names[type];
+
+	return NULL;
+}
+
+void branch_type_stat_display(FILE *fp, struct branch_type_stat *stat)
+{
+	u64 total = 0;
+	int i;
+
+	for (i = 0; i < PERF_BR_MAX; i++)
+		total += stat->counts[i];
+
+	if (total == 0)
+		return;
+
+	fprintf(fp, "\n#");
+	fprintf(fp, "\n# Branch Statistics:");
+	fprintf(fp, "\n#");
+
+	if (stat->jcc_fwd > 0) {
+		fprintf(fp, "\n%12s: %5.1f%%",
+			"JCC forward",
+			100.0 * (double)stat->jcc_fwd / (double)total);
+	}
+
+	if (stat->jcc_bwd > 0) {
+		fprintf(fp, "\n%12s: %5.1f%%",
+			"JCC backward",
+			100.0 * (double)stat->jcc_bwd / (double)total);
+	}
+
+	if (stat->cross_4k > 0) {
+		fprintf(fp, "\n%12s: %5.1f%%",
+			"CROSS_4K",
+			100.0 * (double)stat->cross_4k / (double)total);
+	}
+
+	if (stat->cross_2m > 0) {
+		fprintf(fp, "\n%12s: %5.1f%%",
+			"CROSS_2M",
+			100.0 * (double)stat->cross_2m / (double)total);
+	}
+
+	for (i = 0; i < PERF_BR_MAX; i++) {
+		if (stat->counts[i] > 0)
+			fprintf(fp, "\n%12s: %5.1f%%",
+				branch_type_name(i),
+				100.0 *
+				(double)stat->counts[i] / (double)total);
+	}
+}
+
+static int count_str_printf(int index, const char *str,
+	char *bf, int bfsize)
+{
+	int printed;
+
+	printed = scnprintf(bf, bfsize,
+		"%s%s",
+		(index) ? " " : " (", str);
+
+	return printed;
+}
+
+int branch_type_str(struct branch_type_stat *stat,
+		    char *bf, int bfsize)
+{
+	int i, j = 0, printed = 0;
+	u64 total = 0;
+
+	for (i = 0; i < PERF_BR_MAX; i++)
+		total += stat->counts[i];
+
+	if (total == 0)
+		return 0;
+
+	if (stat->jcc_fwd > 0) {
+		printed += count_str_printf(j++, "JCC forward",
+				bf + printed, bfsize - printed);
+	}
+
+	if (stat->jcc_bwd > 0) {
+		printed += count_str_printf(j++, "JCC backward",
+				bf + printed, bfsize - printed);
+	}
+
+	for (i = 0; i < PERF_BR_MAX; i++) {
+		if (i == PERF_BR_JCC)
+			continue;
+
+		if (stat->counts[i] > 0) {
+			printed += count_str_printf(j++, branch_type_name(i),
+					bf + printed, bfsize - printed);
+		}
+	}
+
+	if (stat->cross_4k > 0) {
+		printed += count_str_printf(j++, "CROSS_4K",
+				bf + printed, bfsize - printed);
+	}
+
+	if (stat->cross_2m > 0) {
+		printed += count_str_printf(j++, "CROSS_2M",
+				bf + printed, bfsize - printed);
+	}
+
+	return printed;
+}
diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h
new file mode 100644
index 0000000..4fb3d50
--- /dev/null
+++ b/tools/perf/util/branch.h
@@ -0,0 +1,25 @@
+#ifndef _PERF_BRANCH_H
+#define _PERF_BRANCH_H 1
+
+#include <stdint.h>
+#include "../perf.h"
+
+struct branch_type_stat {
+	u64 counts[PERF_BR_MAX];
+	u64 jcc_fwd;
+	u64 jcc_bwd;
+	u64 cross_4k;
+	u64 cross_2m;
+};
+
+struct branch_flags;
+
+void branch_type_count(struct branch_type_stat *stat,
+		       struct branch_flags *flags,
+		       u64 from, u64 to);
+
+const char *branch_type_name(int type);
+void branch_type_stat_display(FILE *fp, struct branch_type_stat *stat);
+int branch_type_str(struct branch_type_stat *stat, char *bf, int bfsize);
+
+#endif /* _PERF_BRANCH_H */
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index eb7a7b2..26b4c2e 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -142,7 +142,8 @@ struct branch_flags {
 	u64 in_tx:1;
 	u64 abort:1;
 	u64 cycles:16;
-	u64 reserved:44;
+	u64 type:4;
+	u64 reserved:40;
 };
 
 struct branch_entry {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v6 6/7] perf report: Show branch type statistics for stdio mode
  2017-04-20 12:07 [PATCH v6 0/7] perf report: Show branch type Jin Yao
                   ` (5 preceding siblings ...)
  2017-04-20 12:07 ` [PATCH v6 5/7] perf util: Create branch.c/.h for common branch functions Jin Yao
@ 2017-04-20 12:07 ` Jin Yao
  2017-04-20 12:07 ` [PATCH v6 7/7] perf report: Show branch type in callchain entry Jin Yao
  2017-07-07  8:09 ` [PATCH v6 0/7] perf report: Show branch type Jiri Olsa
  8 siblings, 0 replies; 35+ messages in thread
From: Jin Yao @ 2017-04-20 12:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev, Jin Yao

Show the branch type statistics at the end of perf report --stdio.

For example:
perf report --stdio

 JCC forward:  27.6%
JCC backward:  10.0%
    CROSS_4K:   0.0%
    CROSS_2M:  14.3%
         JCC:  37.6%
         JMP:   0.0%
     IND_JMP:   6.5%
        CALL:  26.6%
    IND_CALL:   0.0%
         RET:  29.3%

The branch types are:
---------------------
 JCC forward: Conditional forward jump
JCC backward: Conditional backward jump
         JMP: Jump imm
     IND_JMP: Jump reg/mem
        CALL: Call imm
    IND_CALL: Call reg/mem
         RET: Ret
     SYSCALL: Syscall
      SYSRET: Syscall return
         IRQ: HW interrupt/trap/fault
         INT: SW interrupt
        IRET: Return from interrupt
  FAR_BRANCH: Others not generic branch type

CROSS_4K and CROSS_2M:
----------------------
They are the metrics checking for branches cross 4K or 2MB pages.
It's an approximate computing. We don't know if the area is 4K or
2MB, so always compute both.

To make the output simple, if a branch crosses 2M area, CROSS_4K
will not be incremented.

Change log
----------

v6: Remove branch_type_stat_display() since it's moved to branch.c.

v5: Remove the unnecessary sort__mode checking in
    hist_iter__branch_callback().

v4: Comparing to previous version, the major changes are:

Add the computing of JCC forward/JCC backward and cross page checking
by using the from and to addresses.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/builtin-report.c | 25 +++++++++++++++++++++++++
 tools/perf/util/hist.c      |  5 +----
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 5bbd4b2..ba5026a 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -37,6 +37,7 @@
 #include "arch/common.h"
 #include "util/time-utils.h"
 #include "util/auxtrace.h"
+#include "util/branch.h"
 
 #include <dlfcn.h>
 #include <errno.h>
@@ -68,6 +69,7 @@ struct report {
 	u64			queue_size;
 	int			socket_filter;
 	DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
+	struct branch_type_stat	brtype_stat;
 };
 
 static int report__config(const char *var, const char *value, void *cb)
@@ -146,6 +148,22 @@ static int hist_iter__report_callback(struct hist_entry_iter *iter,
 	return err;
 }
 
+static int hist_iter__branch_callback(struct hist_entry_iter *iter,
+				      struct addr_location *al __maybe_unused,
+				      bool single __maybe_unused,
+				      void *arg)
+{
+	struct hist_entry *he = iter->he;
+	struct report *rep = arg;
+	struct branch_info *bi;
+
+	bi = he->branch_info;
+	branch_type_count(&rep->brtype_stat, &bi->flags,
+			  bi->from.addr, bi->to.addr);
+
+	return 0;
+}
+
 static int process_sample_event(struct perf_tool *tool,
 				union perf_event *event,
 				struct perf_sample *sample,
@@ -184,6 +202,8 @@ static int process_sample_event(struct perf_tool *tool,
 		 */
 		if (!sample->branch_stack)
 			goto out_put;
+
+		iter.add_entry_cb = hist_iter__branch_callback;
 		iter.ops = &hist_iter_branch;
 	} else if (rep->mem_mode) {
 		iter.ops = &hist_iter_mem;
@@ -406,6 +426,9 @@ static int perf_evlist__tty_browse_hists(struct perf_evlist *evlist,
 		perf_read_values_destroy(&rep->show_threads_values);
 	}
 
+	if (sort__mode == SORT_MODE__BRANCH)
+		branch_type_stat_display(stdout, &rep->brtype_stat);
+
 	return 0;
 }
 
@@ -938,6 +961,8 @@ int cmd_report(int argc, const char **argv)
 	if (has_br_stack && branch_call_mode)
 		symbol_conf.show_branchflag_count = true;
 
+	memset(&report.brtype_stat, 0, sizeof(struct branch_type_stat));
+
 	/*
 	 * Branch mode is a tristate:
 	 * -1 means default, so decide based on the file having branch data.
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 65d4275..f3a3be5 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -747,12 +747,9 @@ iter_prepare_branch_entry(struct hist_entry_iter *iter, struct addr_location *al
 }
 
 static int
-iter_add_single_branch_entry(struct hist_entry_iter *iter,
+iter_add_single_branch_entry(struct hist_entry_iter *iter __maybe_unused,
 			     struct addr_location *al __maybe_unused)
 {
-	/* to avoid calling callback function */
-	iter->he = NULL;
-
 	return 0;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v6 7/7] perf report: Show branch type in callchain entry
  2017-04-20 12:07 [PATCH v6 0/7] perf report: Show branch type Jin Yao
                   ` (6 preceding siblings ...)
  2017-04-20 12:07 ` [PATCH v6 6/7] perf report: Show branch type statistics for stdio mode Jin Yao
@ 2017-04-20 12:07 ` Jin Yao
  2017-07-07  8:09 ` [PATCH v6 0/7] perf report: Show branch type Jiri Olsa
  8 siblings, 0 replies; 35+ messages in thread
From: Jin Yao @ 2017-04-20 12:07 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev, Jin Yao

Show branch type in callchain entry. The branch type is printed
with other LBR information (such as cycles/abort/...).

For example:
perf report --branch-history --stdio --no-children

--24.21%--main div.c:42 (RET CROSS_2M cycles:2)
          compute_flag div.c:28 (cycles:2)
          compute_flag div.c:27 (RET CROSS_2M cycles:1)
          rand rand.c:28 (cycles:1)
          rand rand.c:28 (RET CROSS_2M cycles:1)
          __random random.c:298 (cycles:1)
          __random random.c:297 (JCC backward CROSS_2M cycles:1)
          __random random.c:295 (cycles:1)
          __random random.c:295 (JCC backward CROSS_2M cycles:1)
          __random random.c:295 (cycles:1)
          __random random.c:295 (RET CROSS_2M cycles:9)

Change log
----------

v6: Remove the branch_type_str() since it's moved to branch.c.

v5: Rewrite the branch info print code in util/callchain.c.

v4: Comparing to previous version, the major changes are:

Since we have to compute the JCC forward/JCC backward and cross
page checking in user space by from and to addresses, while each
callchain entry only contains one ip (either from or to), so
this patch will append a branch from address to the callchain
entry which just contains the to ip.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/util/callchain.c | 38 +++++++++++++++++++++++++++++---------
 tools/perf/util/callchain.h |  5 ++++-
 tools/perf/util/machine.c   | 26 +++++++++++++++++---------
 3 files changed, 50 insertions(+), 19 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index d44b5ed..cfae50d 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -23,6 +23,7 @@
 #include "sort.h"
 #include "machine.h"
 #include "callchain.h"
+#include "branch.h"
 
 __thread struct callchain_cursor callchain_cursor;
 
@@ -468,6 +469,11 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor)
 			call->cycles_count = cursor_node->branch_flags.cycles;
 			call->iter_count = cursor_node->nr_loop_iter;
 			call->samples_count = cursor_node->samples;
+
+			branch_type_count(&call->brtype_stat,
+					  &cursor_node->branch_flags,
+					  cursor_node->branch_from,
+					  cursor_node->ip);
 		}
 
 		list_add_tail(&call->list, &node->val);
@@ -580,6 +586,11 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
 			cnode->cycles_count += node->branch_flags.cycles;
 			cnode->iter_count += node->nr_loop_iter;
 			cnode->samples_count += node->samples;
+
+			branch_type_count(&cnode->brtype_stat,
+					  &node->branch_flags,
+					  node->branch_from,
+					  node->ip);
 		}
 
 		return MATCH_EQ;
@@ -814,7 +825,7 @@ merge_chain_branch(struct callchain_cursor *cursor,
 	list_for_each_entry_safe(list, next_list, &src->val, list) {
 		callchain_cursor_append(cursor, list->ip,
 					list->ms.map, list->ms.sym,
-					false, NULL, 0, 0);
+					false, NULL, 0, 0, 0);
 		list_del(&list->list);
 		map__zput(list->ms.map);
 		free(list);
@@ -854,7 +865,7 @@ int callchain_merge(struct callchain_cursor *cursor,
 int callchain_cursor_append(struct callchain_cursor *cursor,
 			    u64 ip, struct map *map, struct symbol *sym,
 			    bool branch, struct branch_flags *flags,
-			    int nr_loop_iter, int samples)
+			    int nr_loop_iter, int samples, u64 branch_from)
 {
 	struct callchain_cursor_node *node = *cursor->last;
 
@@ -878,6 +889,7 @@ int callchain_cursor_append(struct callchain_cursor *cursor,
 		memcpy(&node->branch_flags, flags,
 			sizeof(struct branch_flags));
 
+	node->branch_from = branch_from;
 	cursor->nr++;
 
 	cursor->last = &node->next;
@@ -1133,14 +1145,19 @@ static int count_float_printf(int index, const char *str, float value,
 static int counts_str_build(char *bf, int bfsize,
 			     u64 branch_count, u64 predicted_count,
 			     u64 abort_count, u64 cycles_count,
-			     u64 iter_count, u64 samples_count)
+			     u64 iter_count, u64 samples_count,
+			     struct branch_type_stat *brtype_stat)
 {
 	u64 cycles;
-	int printed = 0, i = 0;
+	int printed, i = 0;
 
 	if (branch_count == 0)
 		return scnprintf(bf, bfsize, " (calltrace)");
 
+	printed = branch_type_str(brtype_stat, bf, bfsize);
+	if (printed)
+		i++;
+
 	if (predicted_count < branch_count) {
 		printed += count_float_printf(i++, "predicted",
 				predicted_count * 100.0 / branch_count,
@@ -1176,13 +1193,14 @@ static int counts_str_build(char *bf, int bfsize,
 static int callchain_counts_printf(FILE *fp, char *bf, int bfsize,
 				   u64 branch_count, u64 predicted_count,
 				   u64 abort_count, u64 cycles_count,
-				   u64 iter_count, u64 samples_count)
+				   u64 iter_count, u64 samples_count,
+				   struct branch_type_stat *brtype_stat)
 {
-	char str[128];
+	char str[256];
 
 	counts_str_build(str, sizeof(str), branch_count,
 			 predicted_count, abort_count, cycles_count,
-			 iter_count, samples_count);
+			 iter_count, samples_count, brtype_stat);
 
 	if (fp)
 		return fprintf(fp, "%s", str);
@@ -1214,7 +1232,8 @@ int callchain_list_counts__printf_value(struct callchain_node *node,
 
 	return callchain_counts_printf(fp, bf, bfsize, branch_count,
 				       predicted_count, abort_count,
-				       cycles_count, iter_count, samples_count);
+				       cycles_count, iter_count, samples_count,
+				       &clist->brtype_stat);
 }
 
 static void free_callchain_node(struct callchain_node *node)
@@ -1339,7 +1358,8 @@ int callchain_cursor__copy(struct callchain_cursor *dst,
 
 		rc = callchain_cursor_append(dst, node->ip, node->map, node->sym,
 					     node->branch, &node->branch_flags,
-					     node->nr_loop_iter, node->samples);
+					     node->nr_loop_iter, node->samples,
+					     node->branch_from);
 		if (rc)
 			break;
 
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index c56c23d..9773820 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -7,6 +7,7 @@
 #include "event.h"
 #include "map.h"
 #include "symbol.h"
+#include "branch.h"
 
 #define HELP_PAD "\t\t\t\t"
 
@@ -119,6 +120,7 @@ struct callchain_list {
 	u64			cycles_count;
 	u64			iter_count;
 	u64			samples_count;
+	struct branch_type_stat brtype_stat;
 	char		       *srcline;
 	struct list_head	list;
 };
@@ -135,6 +137,7 @@ struct callchain_cursor_node {
 	struct symbol			*sym;
 	bool				branch;
 	struct branch_flags		branch_flags;
+	u64				branch_from;
 	int				nr_loop_iter;
 	int				samples;
 	struct callchain_cursor_node	*next;
@@ -198,7 +201,7 @@ static inline void callchain_cursor_reset(struct callchain_cursor *cursor)
 int callchain_cursor_append(struct callchain_cursor *cursor, u64 ip,
 			    struct map *map, struct symbol *sym,
 			    bool branch, struct branch_flags *flags,
-			    int nr_loop_iter, int samples);
+			    int nr_loop_iter, int samples, u64 branch_from);
 
 /* Close a cursor writing session. Initialize for the reader */
 static inline void callchain_cursor_commit(struct callchain_cursor *cursor)
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 988e84c..096451f 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1679,7 +1679,8 @@ static int add_callchain_ip(struct thread *thread,
 			    bool branch,
 			    struct branch_flags *flags,
 			    int nr_loop_iter,
-			    int samples)
+			    int samples,
+			    u64 branch_from)
 {
 	struct addr_location al;
 
@@ -1732,7 +1733,8 @@ static int add_callchain_ip(struct thread *thread,
 	if (symbol_conf.hide_unresolved && al.sym == NULL)
 		return 0;
 	return callchain_cursor_append(cursor, al.addr, al.map, al.sym,
-				       branch, flags, nr_loop_iter, samples);
+				       branch, flags, nr_loop_iter, samples,
+				       branch_from);
 }
 
 struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
@@ -1811,7 +1813,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 	struct ip_callchain *chain = sample->callchain;
 	int chain_nr = min(max_stack, (int)chain->nr), i;
 	u8 cpumode = PERF_RECORD_MISC_USER;
-	u64 ip;
+	u64 ip, branch_from = 0;
 
 	for (i = 0; i < chain_nr; i++) {
 		if (chain->ips[i] == PERF_CONTEXT_USER)
@@ -1853,6 +1855,8 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 					ip = lbr_stack->entries[0].to;
 					branch = true;
 					flags = &lbr_stack->entries[0].flags;
+					branch_from =
+						lbr_stack->entries[0].from;
 				}
 			} else {
 				if (j < lbr_nr) {
@@ -1867,12 +1871,15 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 					ip = lbr_stack->entries[0].to;
 					branch = true;
 					flags = &lbr_stack->entries[0].flags;
+					branch_from =
+						lbr_stack->entries[0].from;
 				}
 			}
 
 			err = add_callchain_ip(thread, cursor, parent,
 					       root_al, &cpumode, ip,
-					       branch, flags, 0, 0);
+					       branch, flags, 0, 0,
+					       branch_from);
 			if (err)
 				return (err < 0) ? err : 0;
 		}
@@ -1971,19 +1978,20 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 						       root_al,
 						       NULL, be[i].to,
 						       true, &be[i].flags,
-						       nr_loop_iter, 1);
+						       nr_loop_iter, 1,
+						       be[i].from);
 			else
 				err = add_callchain_ip(thread, cursor, parent,
 						       root_al,
 						       NULL, be[i].to,
 						       true, &be[i].flags,
-						       0, 0);
+						       0, 0, be[i].from);
 
 			if (!err)
 				err = add_callchain_ip(thread, cursor, parent, root_al,
 						       NULL, be[i].from,
 						       true, &be[i].flags,
-						       0, 0);
+						       0, 0, 0);
 			if (err == -EINVAL)
 				break;
 			if (err)
@@ -2013,7 +2021,7 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 
 		err = add_callchain_ip(thread, cursor, parent,
 				       root_al, &cpumode, ip,
-				       false, NULL, 0, 0);
+				       false, NULL, 0, 0, 0);
 
 		if (err)
 			return (err < 0) ? err : 0;
@@ -2030,7 +2038,7 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 		return 0;
 	return callchain_cursor_append(cursor, entry->ip,
 				       entry->map, entry->sym,
-				       false, NULL, 0, 0);
+				       false, NULL, 0, 0, 0);
 }
 
 static int thread__resolve_callchain_unwind(struct thread *thread,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 0/7] perf report: Show branch type
  2017-04-20  9:36 ` Jiri Olsa
@ 2017-04-23  8:36   ` Jin, Yao
  2017-06-02  8:02   ` Jin, Yao
  1 sibling, 0 replies; 35+ messages in thread
From: Jin, Yao @ 2017-04-23  8:36 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin, linuxppc-dev



On 4/20/2017 5:36 PM, Jiri Olsa wrote:
> On Thu, Apr 20, 2017 at 08:07:48PM +0800, Jin Yao wrote:
>> v6:
>>     Update according to the review comments from
>>     Jiri Olsa <jolsa@redhat.com>. Major modifications are:
>>
>>     1. Move that multiline conditional code inside {} brackets.
>>
>>     2. Move branch_type_stat_display() from builtin-report.c to
>>        branch.c. Move branch_type_str() from callchain.c to
>>        branch.c.
>>
>>     3. Keep the original branch info display order, that is:
>>        predicted, abort, cycles, iterations
> for the tools part
>
> Acked-by: Jiri Olsa <jolsa@kernel.org>
>
> thanks,
> jirka

Thanks so much!

Is the kernel part OK?

Thanks
Jin Yao

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 2/7] perf/x86/intel: Record branch type
  2017-04-20 12:07 ` [PATCH v6 2/7] perf/x86/intel: Record branch type Jin Yao
@ 2017-04-23 13:55   ` Jiri Olsa
  2017-04-24  0:47     ` Jin, Yao
  0 siblings, 1 reply; 35+ messages in thread
From: Jiri Olsa @ 2017-04-23 13:55 UTC (permalink / raw)
  To: Jin Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin, linuxppc-dev

On Thu, Apr 20, 2017 at 08:07:50PM +0800, Jin Yao wrote:

SNIP

>  
> +#define X86_BR_TYPE_MAP_MAX	16
> +
> +static int
> +common_branch_type(int type)
> +{
> +	int i, mask;
> +	const int branch_map[X86_BR_TYPE_MAP_MAX] = {
> +		PERF_BR_CALL,		/* X86_BR_CALL */
> +		PERF_BR_RET,		/* X86_BR_RET */
> +		PERF_BR_SYSCALL,	/* X86_BR_SYSCALL */
> +		PERF_BR_SYSRET,		/* X86_BR_SYSRET */
> +		PERF_BR_INT,		/* X86_BR_INT */
> +		PERF_BR_IRET,		/* X86_BR_IRET */
> +		PERF_BR_JCC,		/* X86_BR_JCC */
> +		PERF_BR_JMP,		/* X86_BR_JMP */
> +		PERF_BR_IRQ,		/* X86_BR_IRQ */
> +		PERF_BR_IND_CALL,	/* X86_BR_IND_CALL */
> +		PERF_BR_NONE,		/* X86_BR_ABORT */
> +		PERF_BR_NONE,		/* X86_BR_IN_TX */
> +		PERF_BR_NONE,		/* X86_BR_NO_TX */
> +		PERF_BR_CALL,		/* X86_BR_ZERO_CALL */
> +		PERF_BR_NONE,		/* X86_BR_CALL_STACK */
> +		PERF_BR_IND_JMP,	/* X86_BR_IND_JMP */
> +	};
> +
> +	type >>= 2; /* skip X86_BR_USER and X86_BR_KERNEL */
> +	mask = ~(~0 << 1);

is that a fancy way to get 1 into the mask? what do I miss?

> +
> +	for (i = 0; i < X86_BR_TYPE_MAP_MAX; i++) {
> +		if (type & mask)
> +			return branch_map[i];

I wonder some bit search would be faster in here, but maybe not big deal

jirka

> +
> +		type >>= 1;
> +	}
> +
> +	return PERF_BR_NONE;
> +}
> +
>  /*

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 2/7] perf/x86/intel: Record branch type
  2017-04-23 13:55   ` Jiri Olsa
@ 2017-04-24  0:47     ` Jin, Yao
  2017-05-08  0:49       ` Jin, Yao
  2017-05-09  8:26       ` Jiri Olsa
  0 siblings, 2 replies; 35+ messages in thread
From: Jin, Yao @ 2017-04-24  0:47 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin, linuxppc-dev



On 4/23/2017 9:55 PM, Jiri Olsa wrote:
> On Thu, Apr 20, 2017 at 08:07:50PM +0800, Jin Yao wrote:
>
> SNIP
>
>>   
>> +#define X86_BR_TYPE_MAP_MAX	16
>> +
>> +static int
>> +common_branch_type(int type)
>> +{
>> +	int i, mask;
>> +	const int branch_map[X86_BR_TYPE_MAP_MAX] = {
>> +		PERF_BR_CALL,		/* X86_BR_CALL */
>> +		PERF_BR_RET,		/* X86_BR_RET */
>> +		PERF_BR_SYSCALL,	/* X86_BR_SYSCALL */
>> +		PERF_BR_SYSRET,		/* X86_BR_SYSRET */
>> +		PERF_BR_INT,		/* X86_BR_INT */
>> +		PERF_BR_IRET,		/* X86_BR_IRET */
>> +		PERF_BR_JCC,		/* X86_BR_JCC */
>> +		PERF_BR_JMP,		/* X86_BR_JMP */
>> +		PERF_BR_IRQ,		/* X86_BR_IRQ */
>> +		PERF_BR_IND_CALL,	/* X86_BR_IND_CALL */
>> +		PERF_BR_NONE,		/* X86_BR_ABORT */
>> +		PERF_BR_NONE,		/* X86_BR_IN_TX */
>> +		PERF_BR_NONE,		/* X86_BR_NO_TX */
>> +		PERF_BR_CALL,		/* X86_BR_ZERO_CALL */
>> +		PERF_BR_NONE,		/* X86_BR_CALL_STACK */
>> +		PERF_BR_IND_JMP,	/* X86_BR_IND_JMP */
>> +	};
>> +
>> +	type >>= 2; /* skip X86_BR_USER and X86_BR_KERNEL */
>> +	mask = ~(~0 << 1);
> is that a fancy way to get 1 into the mask? what do I miss?
>
>> +
>> +	for (i = 0; i < X86_BR_TYPE_MAP_MAX; i++) {
>> +		if (type & mask)
>> +			return branch_map[i];
> I wonder some bit search would be faster in here, but maybe not big deal
>
> jirka

I just think the branch_map[] doesn't contain many entries (16 entries 
here), so maybe checking 1 bit one time should be acceptable. I just 
want to keep the code simple.

But if the number of entries is more (e.g. 64), maybe it'd better check 
2 or 4 bits one time.

Thanks
Jin Yao

>
>> +
>> +		type >>= 1;
>> +	}
>> +
>> +	return PERF_BR_NONE;
>> +}
>> +
>>   /*

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 2/7] perf/x86/intel: Record branch type
  2017-04-24  0:47     ` Jin, Yao
@ 2017-05-08  0:49       ` Jin, Yao
  2017-05-09  8:26       ` Jiri Olsa
  1 sibling, 0 replies; 35+ messages in thread
From: Jin, Yao @ 2017-05-08  0:49 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin, linuxppc-dev



On 4/24/2017 8:47 AM, Jin, Yao wrote:
>
>
> On 4/23/2017 9:55 PM, Jiri Olsa wrote:
>> On Thu, Apr 20, 2017 at 08:07:50PM +0800, Jin Yao wrote:
>>
>> SNIP
>>
>>>   +#define X86_BR_TYPE_MAP_MAX    16
>>> +
>>> +static int
>>> +common_branch_type(int type)
>>> +{
>>> +    int i, mask;
>>> +    const int branch_map[X86_BR_TYPE_MAP_MAX] = {
>>> +        PERF_BR_CALL,        /* X86_BR_CALL */
>>> +        PERF_BR_RET,        /* X86_BR_RET */
>>> +        PERF_BR_SYSCALL,    /* X86_BR_SYSCALL */
>>> +        PERF_BR_SYSRET,        /* X86_BR_SYSRET */
>>> +        PERF_BR_INT,        /* X86_BR_INT */
>>> +        PERF_BR_IRET,        /* X86_BR_IRET */
>>> +        PERF_BR_JCC,        /* X86_BR_JCC */
>>> +        PERF_BR_JMP,        /* X86_BR_JMP */
>>> +        PERF_BR_IRQ,        /* X86_BR_IRQ */
>>> +        PERF_BR_IND_CALL,    /* X86_BR_IND_CALL */
>>> +        PERF_BR_NONE,        /* X86_BR_ABORT */
>>> +        PERF_BR_NONE,        /* X86_BR_IN_TX */
>>> +        PERF_BR_NONE,        /* X86_BR_NO_TX */
>>> +        PERF_BR_CALL,        /* X86_BR_ZERO_CALL */
>>> +        PERF_BR_NONE,        /* X86_BR_CALL_STACK */
>>> +        PERF_BR_IND_JMP,    /* X86_BR_IND_JMP */
>>> +    };
>>> +
>>> +    type >>= 2; /* skip X86_BR_USER and X86_BR_KERNEL */
>>> +    mask = ~(~0 << 1);
>> is that a fancy way to get 1 into the mask? what do I miss?
>>
>>> +
>>> +    for (i = 0; i < X86_BR_TYPE_MAP_MAX; i++) {
>>> +        if (type & mask)
>>> +            return branch_map[i];
>> I wonder some bit search would be faster in here, but maybe not big deal
>>
>> jirka
>
> I just think the branch_map[] doesn't contain many entries (16 entries 
> here), so maybe checking 1 bit one time should be acceptable. I just 
> want to keep the code simple.
>
> But if the number of entries is more (e.g. 64), maybe it'd better 
> check 2 or 4 bits one time.
>
> Thanks
> Jin Yao
>

Hi,

Is this explanation OK? Since for tools part, it's Acked-by: Jiri Olsa. 
I just want to know  if the kernel part is OK either?

Thanks
Jin Yao

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 2/7] perf/x86/intel: Record branch type
  2017-04-24  0:47     ` Jin, Yao
  2017-05-08  0:49       ` Jin, Yao
@ 2017-05-09  8:26       ` Jiri Olsa
  2017-05-09 11:57         ` Jin, Yao
  1 sibling, 1 reply; 35+ messages in thread
From: Jiri Olsa @ 2017-05-09  8:26 UTC (permalink / raw)
  To: Jin, Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin, linuxppc-dev

On Mon, Apr 24, 2017 at 08:47:14AM +0800, Jin, Yao wrote:
> 
> 
> On 4/23/2017 9:55 PM, Jiri Olsa wrote:
> > On Thu, Apr 20, 2017 at 08:07:50PM +0800, Jin Yao wrote:
> > 
> > SNIP
> > 
> > > +#define X86_BR_TYPE_MAP_MAX	16
> > > +
> > > +static int
> > > +common_branch_type(int type)
> > > +{
> > > +	int i, mask;
> > > +	const int branch_map[X86_BR_TYPE_MAP_MAX] = {
> > > +		PERF_BR_CALL,		/* X86_BR_CALL */
> > > +		PERF_BR_RET,		/* X86_BR_RET */
> > > +		PERF_BR_SYSCALL,	/* X86_BR_SYSCALL */
> > > +		PERF_BR_SYSRET,		/* X86_BR_SYSRET */
> > > +		PERF_BR_INT,		/* X86_BR_INT */
> > > +		PERF_BR_IRET,		/* X86_BR_IRET */
> > > +		PERF_BR_JCC,		/* X86_BR_JCC */
> > > +		PERF_BR_JMP,		/* X86_BR_JMP */
> > > +		PERF_BR_IRQ,		/* X86_BR_IRQ */
> > > +		PERF_BR_IND_CALL,	/* X86_BR_IND_CALL */
> > > +		PERF_BR_NONE,		/* X86_BR_ABORT */
> > > +		PERF_BR_NONE,		/* X86_BR_IN_TX */
> > > +		PERF_BR_NONE,		/* X86_BR_NO_TX */
> > > +		PERF_BR_CALL,		/* X86_BR_ZERO_CALL */
> > > +		PERF_BR_NONE,		/* X86_BR_CALL_STACK */
> > > +		PERF_BR_IND_JMP,	/* X86_BR_IND_JMP */
> > > +	};
> > > +
> > > +	type >>= 2; /* skip X86_BR_USER and X86_BR_KERNEL */
> > > +	mask = ~(~0 << 1);
> > is that a fancy way to get 1 into the mask? what do I miss?

you did not comment on this one

> > 
> > > +
> > > +	for (i = 0; i < X86_BR_TYPE_MAP_MAX; i++) {
> > > +		if (type & mask)
> > > +			return branch_map[i];
> > I wonder some bit search would be faster in here, but maybe not big deal
> > 
> > jirka
> 
> I just think the branch_map[] doesn't contain many entries (16 entries
> here), so maybe checking 1 bit one time should be acceptable. I just want to
> keep the code simple.
> 
> But if the number of entries is more (e.g. 64), maybe it'd better check 2 or
> 4 bits one time.

ook

jirka

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 2/7] perf/x86/intel: Record branch type
  2017-05-09  8:26       ` Jiri Olsa
@ 2017-05-09 11:57         ` Jin, Yao
  2017-05-09 12:39           ` Jiri Olsa
  0 siblings, 1 reply; 35+ messages in thread
From: Jin, Yao @ 2017-05-09 11:57 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin, linuxppc-dev



On 5/9/2017 4:26 PM, Jiri Olsa wrote:
> On Mon, Apr 24, 2017 at 08:47:14AM +0800, Jin, Yao wrote:
>>
>> On 4/23/2017 9:55 PM, Jiri Olsa wrote:
>>> On Thu, Apr 20, 2017 at 08:07:50PM +0800, Jin Yao wrote:
>>>
>>> SNIP
>>>
>>>> +#define X86_BR_TYPE_MAP_MAX	16
>>>> +
>>>> +static int
>>>> +common_branch_type(int type)
>>>> +{
>>>> +	int i, mask;
>>>> +	const int branch_map[X86_BR_TYPE_MAP_MAX] = {
>>>> +		PERF_BR_CALL,		/* X86_BR_CALL */
>>>> +		PERF_BR_RET,		/* X86_BR_RET */
>>>> +		PERF_BR_SYSCALL,	/* X86_BR_SYSCALL */
>>>> +		PERF_BR_SYSRET,		/* X86_BR_SYSRET */
>>>> +		PERF_BR_INT,		/* X86_BR_INT */
>>>> +		PERF_BR_IRET,		/* X86_BR_IRET */
>>>> +		PERF_BR_JCC,		/* X86_BR_JCC */
>>>> +		PERF_BR_JMP,		/* X86_BR_JMP */
>>>> +		PERF_BR_IRQ,		/* X86_BR_IRQ */
>>>> +		PERF_BR_IND_CALL,	/* X86_BR_IND_CALL */
>>>> +		PERF_BR_NONE,		/* X86_BR_ABORT */
>>>> +		PERF_BR_NONE,		/* X86_BR_IN_TX */
>>>> +		PERF_BR_NONE,		/* X86_BR_NO_TX */
>>>> +		PERF_BR_CALL,		/* X86_BR_ZERO_CALL */
>>>> +		PERF_BR_NONE,		/* X86_BR_CALL_STACK */
>>>> +		PERF_BR_IND_JMP,	/* X86_BR_IND_JMP */
>>>> +	};
>>>> +
>>>> +	type >>= 2; /* skip X86_BR_USER and X86_BR_KERNEL */
>>>> +	mask = ~(~0 << 1);
>>> is that a fancy way to get 1 into the mask? what do I miss?
> you did not comment on this one

Sorry, I misunderstood that this comment and the next comment had the 
same meaning.

In the previous version, I used the switch/case to convert from X86_BR 
to PERF_BR. I got a comment from community that it'd better use a lookup 
table for conversion.

Since each bit in type represents a X86_BR type so I use a mask (0x1) to 
filter the bit. Yes, it looks I can also directly set 0x1 to mask.

I write the code "mask = ~(~0 << 1)" according to my coding habits. If 
you think I should change the code to "mask = 0x1", that's OK  :)

>>>> +
>>>> +	for (i = 0; i < X86_BR_TYPE_MAP_MAX; i++) {
>>>> +		if (type & mask)
>>>> +			return branch_map[i];
>>> I wonder some bit search would be faster in here, but maybe not big deal
>>>
>>> jirka
>> I just think the branch_map[] doesn't contain many entries (16 entries
>> here), so maybe checking 1 bit one time should be acceptable. I just want to
>> keep the code simple.
>>
>> But if the number of entries is more (e.g. 64), maybe it'd better check 2 or
>> 4 bits one time.
> ook
>
> jirka
Sorry, what's the meaning of ook? Does it mean "OK"?

Thanks
Jin Yao

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 2/7] perf/x86/intel: Record branch type
  2017-05-09 11:57         ` Jin, Yao
@ 2017-05-09 12:39           ` Jiri Olsa
  2017-05-10  0:18             ` Jin, Yao
  0 siblings, 1 reply; 35+ messages in thread
From: Jiri Olsa @ 2017-05-09 12:39 UTC (permalink / raw)
  To: Jin, Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin, linuxppc-dev

On Tue, May 09, 2017 at 07:57:11PM +0800, Jin, Yao wrote:

SNIP

> > > > > +
> > > > > +	type >>= 2; /* skip X86_BR_USER and X86_BR_KERNEL */
> > > > > +	mask = ~(~0 << 1);
> > > > is that a fancy way to get 1 into the mask? what do I miss?
> > you did not comment on this one
> 
> Sorry, I misunderstood that this comment and the next comment had the same
> meaning.
> 
> In the previous version, I used the switch/case to convert from X86_BR to
> PERF_BR. I got a comment from community that it'd better use a lookup table
> for conversion.
> 
> Since each bit in type represents a X86_BR type so I use a mask (0x1) to
> filter the bit. Yes, it looks I can also directly set 0x1 to mask.
> 
> I write the code "mask = ~(~0 << 1)" according to my coding habits. If you
> think I should change the code to "mask = 0x1", that's OK  :)

im ok with that.. was just wondering for the reason
I guess compiler will make it single constant assignment anyway

> 
> > > > > +
> > > > > +	for (i = 0; i < X86_BR_TYPE_MAP_MAX; i++) {
> > > > > +		if (type & mask)
> > > > > +			return branch_map[i];
> > > > I wonder some bit search would be faster in here, but maybe not big deal
> > > > 
> > > > jirka
> > > I just think the branch_map[] doesn't contain many entries (16 entries
> > > here), so maybe checking 1 bit one time should be acceptable. I just want to
> > > keep the code simple.
> > > 
> > > But if the number of entries is more (e.g. 64), maybe it'd better check 2 or
> > > 4 bits one time.
> > ook
> > 
> > jirka
> Sorry, what's the meaning of ook? Does it mean "OK"?

just means ok ;-)

thanks,
jirka

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 2/7] perf/x86/intel: Record branch type
  2017-05-09 12:39           ` Jiri Olsa
@ 2017-05-10  0:18             ` Jin, Yao
  0 siblings, 0 replies; 35+ messages in thread
From: Jin, Yao @ 2017-05-10  0:18 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin, linuxppc-dev



On 5/9/2017 8:39 PM, Jiri Olsa wrote:
> On Tue, May 09, 2017 at 07:57:11PM +0800, Jin, Yao wrote:
>
> SNIP
>
>>>>>> +
>>>>>> +	type >>= 2; /* skip X86_BR_USER and X86_BR_KERNEL */
>>>>>> +	mask = ~(~0 << 1);
>>>>> is that a fancy way to get 1 into the mask? what do I miss?
>>> you did not comment on this one
>> Sorry, I misunderstood that this comment and the next comment had the same
>> meaning.
>>
>> In the previous version, I used the switch/case to convert from X86_BR to
>> PERF_BR. I got a comment from community that it'd better use a lookup table
>> for conversion.
>>
>> Since each bit in type represents a X86_BR type so I use a mask (0x1) to
>> filter the bit. Yes, it looks I can also directly set 0x1 to mask.
>>
>> I write the code "mask = ~(~0 << 1)" according to my coding habits. If you
>> think I should change the code to "mask = 0x1", that's OK  :)
> im ok with that.. was just wondering for the reason
> I guess compiler will make it single constant assignment anyway

I think so.  The compiler should be clever enough for this optimization.
>>>>>> +
>>>>>> +	for (i = 0; i < X86_BR_TYPE_MAP_MAX; i++) {
>>>>>> +		if (type & mask)
>>>>>> +			return branch_map[i];
>>>>> I wonder some bit search would be faster in here, but maybe not big deal
>>>>>
>>>>> jirka
>>>> I just think the branch_map[] doesn't contain many entries (16 entries
>>>> here), so maybe checking 1 bit one time should be acceptable. I just want to
>>>> keep the code simple.
>>>>
>>>> But if the number of entries is more (e.g. 64), maybe it'd better check 2 or
>>>> 4 bits one time.
>>> ook
>>>
>>> jirka
>> Sorry, what's the meaning of ook? Does it mean "OK"?
> just means ok ;-)
>
> thanks,
> jirka

Thanks so much!

Jin Yao

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 0/7] perf report: Show branch type
  2017-04-20  9:36 ` Jiri Olsa
  2017-04-23  8:36   ` Jin, Yao
@ 2017-06-02  8:02   ` Jin, Yao
  2017-06-26  6:24     ` Jin, Yao
  1 sibling, 1 reply; 35+ messages in thread
From: Jin, Yao @ 2017-06-02  8:02 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin, linuxppc-dev

Hi maintainers,

Is this patch series (v6) OK for merging?

Thanks

Jin Yao


On 4/20/2017 5:36 PM, Jiri Olsa wrote:
> On Thu, Apr 20, 2017 at 08:07:48PM +0800, Jin Yao wrote:
>> v6:
>>     Update according to the review comments from
>>     Jiri Olsa <jolsa@redhat.com>. Major modifications are:
>>
>>     1. Move that multiline conditional code inside {} brackets.
>>
>>     2. Move branch_type_stat_display() from builtin-report.c to
>>        branch.c. Move branch_type_str() from callchain.c to
>>        branch.c.
>>
>>     3. Keep the original branch info display order, that is:
>>        predicted, abort, cycles, iterations
> for the tools part
>
> Acked-by: Jiri Olsa <jolsa@kernel.org>
>
> thanks,
> jirka

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 0/7] perf report: Show branch type
  2017-06-02  8:02   ` Jin, Yao
@ 2017-06-26  6:24     ` Jin, Yao
  2017-07-06  1:47       ` Jin, Yao
  0 siblings, 1 reply; 35+ messages in thread
From: Jin, Yao @ 2017-06-26  6:24 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin, linuxppc-dev

Hi maintainers,

Is this patch series OK or anything I should update?

Thanks

Jin Yao


On 6/2/2017 4:02 PM, Jin, Yao wrote:
> Hi maintainers,
>
> Is this patch series (v6) OK for merging?
>
> Thanks
>
> Jin Yao
>
>
> On 4/20/2017 5:36 PM, Jiri Olsa wrote:
>> On Thu, Apr 20, 2017 at 08:07:48PM +0800, Jin Yao wrote:
>>> v6:
>>>     Update according to the review comments from
>>>     Jiri Olsa <jolsa@redhat.com>. Major modifications are:
>>>
>>>     1. Move that multiline conditional code inside {} brackets.
>>>
>>>     2. Move branch_type_stat_display() from builtin-report.c to
>>>        branch.c. Move branch_type_str() from callchain.c to
>>>        branch.c.
>>>
>>>     3. Keep the original branch info display order, that is:
>>>        predicted, abort, cycles, iterations
>> for the tools part
>>
>> Acked-by: Jiri Olsa <jolsa@kernel.org>
>>
>> thanks,
>> jirka
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 0/7] perf report: Show branch type
  2017-06-26  6:24     ` Jin, Yao
@ 2017-07-06  1:47       ` Jin, Yao
  0 siblings, 0 replies; 35+ messages in thread
From: Jin, Yao @ 2017-07-06  1:47 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin, linuxppc-dev

Hi Arnaldo,

Could this series be merged? It's more than 2 months since the last time 
Jiri Olsa gave the ack.

Thanks

Jin Yao


On 6/26/2017 2:24 PM, Jin, Yao wrote:
> Hi maintainers,
>
> Is this patch series OK or anything I should update?
>
> Thanks
>
> Jin Yao
>
>
> On 6/2/2017 4:02 PM, Jin, Yao wrote:
>> Hi maintainers,
>>
>> Is this patch series (v6) OK for merging?
>>
>> Thanks
>>
>> Jin Yao
>>
>>
>> On 4/20/2017 5:36 PM, Jiri Olsa wrote:
>>> On Thu, Apr 20, 2017 at 08:07:48PM +0800, Jin Yao wrote:
>>>> v6:
>>>>     Update according to the review comments from
>>>>     Jiri Olsa <jolsa@redhat.com>. Major modifications are:
>>>>
>>>>     1. Move that multiline conditional code inside {} brackets.
>>>>
>>>>     2. Move branch_type_stat_display() from builtin-report.c to
>>>>        branch.c. Move branch_type_str() from callchain.c to
>>>>        branch.c.
>>>>
>>>>     3. Keep the original branch info display order, that is:
>>>>        predicted, abort, cycles, iterations
>>> for the tools part
>>>
>>> Acked-by: Jiri Olsa <jolsa@kernel.org>
>>>
>>> thanks,
>>> jirka
>>
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 0/7] perf report: Show branch type
  2017-04-20 12:07 [PATCH v6 0/7] perf report: Show branch type Jin Yao
                   ` (7 preceding siblings ...)
  2017-04-20 12:07 ` [PATCH v6 7/7] perf report: Show branch type in callchain entry Jin Yao
@ 2017-07-07  8:09 ` Jiri Olsa
  8 siblings, 0 replies; 35+ messages in thread
From: Jiri Olsa @ 2017-07-07  8:09 UTC (permalink / raw)
  To: Jin Yao, peterz
  Cc: acme, jolsa, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin, linuxppc-dev

On Thu, Apr 20, 2017 at 08:07:48PM +0800, Jin Yao wrote:
> v6:
>    Update according to the review comments from
>    Jiri Olsa <jolsa@redhat.com>. Major modifications are: 
> 
>    1. Move that multiline conditional code inside {} brackets.
> 
>    2. Move branch_type_stat_display() from builtin-report.c to
>       branch.c. Move branch_type_str() from callchain.c to
>       branch.c.
> 
>    3. Keep the original branch info display order, that is:
>       predicted, abort, cycles, iterations

Peter,
are you ok with the kernel side of this?

thanks,
jirka

> 
> v5:
> ---
>    Mainly the v5 patch series are updated according to
>    comments from Jiri Olsa <jolsa@redhat.com>.
> 
>    The kernel part doesn't have functional change. It just
>    solve the merge issue.
> 
>    In userspace, the functions of branch type counting and
>    branch type name resolving are moved to the new files: 
>    util/branch.c, util/branch.h.
> 
>    And refactor the branch info printing code for better
>    maintenance.
> 
> Not changed (or just fix merge issue):
>   perf/core: Define the common branch type classification
>   perf/x86/intel: Record branch type
>   perf record: Create a new option save_type in --branch-filter
> 
> New patches:
>   perf report: Refactor the branch info printing code
>   perf util: Create branch.c/.h for common branch functions
> 
> Changed:
>   perf report: Show branch type statistics for stdio mode
>   perf report: Show branch type in callchain entry
> 
> v4:
> ---
> 1. Describe the major changes in patch description.
>    Thanks for Peter Zijlstra's reminding. 
> 
> 2. Initialize branch type to 0 in intel_pmu_lbr_read_32 and
>    intel_pmu_lbr_read_64. Remove the invalid else code in
>    intel_pmu_lbr_filter. 
> 
> v3:
> ---
> 1. Move the JCC forward/backward and cross page computing from
>    kernel to userspace.
> 
> 2. Use lookup table to replace original switch/case processing.
> 
> Changed:
>   perf/core: Define the common branch type classification
>   perf/x86/intel: Record branch type
>   perf report: Show branch type statistics for stdio mode
>   perf report: Show branch type in callchain entry
> 
> Not changed:
>   perf record: Create a new option save_type in --branch-filter
> 
> v2:
> ---
> 1. Use 4 bits in perf_branch_entry to record branch type.
> 
> 2. Pull out some common branch types from FAR_BRANCH. Now the branch
>    types defined in perf_event.h:
> 
> Jin Yao (7):
>   perf/core: Define the common branch type classification
>   perf/x86/intel: Record branch type
>   perf record: Create a new option save_type in --branch-filter
>   perf report: Refactor the branch info printing code
>   perf util: Create branch.c/.h for common branch functions
>   perf report: Show branch type statistics for stdio mode
>   perf report: Show branch type in callchain entry
> 
>  arch/x86/events/intel/lbr.c              |  53 +++++++++-
>  include/uapi/linux/perf_event.h          |  29 +++++-
>  tools/include/uapi/linux/perf_event.h    |  29 +++++-
>  tools/perf/Documentation/perf-record.txt |   1 +
>  tools/perf/builtin-report.c              |  25 +++++
>  tools/perf/util/Build                    |   1 +
>  tools/perf/util/branch.c                 | 168 +++++++++++++++++++++++++++++++
>  tools/perf/util/branch.h                 |  25 +++++
>  tools/perf/util/callchain.c              | 140 ++++++++++++++------------
>  tools/perf/util/callchain.h              |   5 +-
>  tools/perf/util/event.h                  |   3 +-
>  tools/perf/util/hist.c                   |   5 +-
>  tools/perf/util/machine.c                |  26 +++--
>  tools/perf/util/parse-branch-options.c   |   1 +
>  14 files changed, 427 insertions(+), 84 deletions(-)
>  create mode 100644 tools/perf/util/branch.c
>  create mode 100644 tools/perf/util/branch.h
> 
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
  2017-04-20 12:07 ` [PATCH v6 1/7] perf/core: Define the common branch type classification Jin Yao
@ 2017-07-07  8:42   ` Peter Zijlstra
  2017-07-10  5:19     ` Michael Ellerman
  2017-07-10  6:05   ` Michael Ellerman
  1 sibling, 1 reply; 35+ messages in thread
From: Peter Zijlstra @ 2017-07-07  8:42 UTC (permalink / raw)
  To: Jin Yao
  Cc: acme, jolsa, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin, linuxppc-dev, maddy


PPC folks, maddy, does this work for you guys?

On Thu, Apr 20, 2017 at 08:07:49PM +0800, Jin Yao wrote:
> It is often useful to know the branch types while analyzing branch
> data. For example, a call is very different from a conditional branch.
> 
> Currently we have to look it up in binary while the binary may later
> not be available and even the binary is available but user has to take
> some time. It is very useful for user to check it directly in perf
> report.
> 
> Perf already has support for disassembling the branch instruction
> to get the x86 branch type.
> 
> To keep consistent on kernel and userspace and make the classification
> more common, the patch adds the common branch type classification
> in perf_event.h.
> 
> PERF_BR_NONE      : unknown
> PERF_BR_JCC       : conditional jump
> PERF_BR_JMP       : jump
> PERF_BR_IND_JMP   : indirect jump
> PERF_BR_CALL      : call
> PERF_BR_IND_CALL  : indirect call
> PERF_BR_RET       : return
> PERF_BR_SYSCALL   : syscall
> PERF_BR_SYSRET    : syscall return
> PERF_BR_IRQ       : hw interrupt/trap/fault
> PERF_BR_INT       : sw interrupt
> PERF_BR_IRET      : return from interrupt
> PERF_BR_FAR_BRANCH: not generic far branch type
> 
> The patch also adds a new field type (4 bits) in perf_branch_entry
> to record the branch type.
> 
> Since the disassembling of branch instruction needs some overhead,
> a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
> needs to disassemble the branch instruction and record the branch
> type.
> 
> Change log
> ----------
> 
> v6: Not changed.
> 
> v5: Not changed. The v5 patch series just change the userspace.
> 
> v4: Comparing to previous version, the major changes are:
> 
> 1. Remove the PERF_BR_JCC_FWD/PERF_BR_JCC_BWD, they will be
>    computed later in userspace.
> 
> 2. Remove the "cross" field in perf_branch_entry. The cross page
>    computing will be done later in userspace.
> 
> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
> ---
>  include/uapi/linux/perf_event.h       | 29 ++++++++++++++++++++++++++++-
>  tools/include/uapi/linux/perf_event.h | 29 ++++++++++++++++++++++++++++-
>  2 files changed, 56 insertions(+), 2 deletions(-)
> 
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index d09a9cd..69af012 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
>  	PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT	= 14, /* no flags */
>  	PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT	= 15, /* no cycles */
>  
> +	PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT	= 16, /* save branch type */
> +
>  	PERF_SAMPLE_BRANCH_MAX_SHIFT		/* non-ABI */
>  };
>  
> @@ -198,9 +200,32 @@ enum perf_branch_sample_type {
>  	PERF_SAMPLE_BRANCH_NO_FLAGS	= 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
>  	PERF_SAMPLE_BRANCH_NO_CYCLES	= 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
>  
> +	PERF_SAMPLE_BRANCH_TYPE_SAVE	=
> +		1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
> +
>  	PERF_SAMPLE_BRANCH_MAX		= 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
>  };
>  
> +/*
> + * Common flow change classification
> + */
> +enum {
> +	PERF_BR_NONE		= 0,	/* unknown */
> +	PERF_BR_JCC		= 1,	/* conditional jump */
> +	PERF_BR_JMP		= 2,	/* jump */
> +	PERF_BR_IND_JMP		= 3,	/* indirect jump */
> +	PERF_BR_CALL		= 4,	/* call */
> +	PERF_BR_IND_CALL	= 5,	/* indirect call */
> +	PERF_BR_RET		= 6,	/* return */
> +	PERF_BR_SYSCALL		= 7,	/* syscall */
> +	PERF_BR_SYSRET		= 8,	/* syscall return */
> +	PERF_BR_IRQ		= 9,	/* hw interrupt/trap/fault */
> +	PERF_BR_INT		= 10,	/* sw interrupt */
> +	PERF_BR_IRET		= 11,	/* return from interrupt */
> +	PERF_BR_FAR_BRANCH	= 12,	/* not generic far branch type */
> +	PERF_BR_MAX,
> +};
> +
>  #define PERF_SAMPLE_BRANCH_PLM_ALL \
>  	(PERF_SAMPLE_BRANCH_USER|\
>  	 PERF_SAMPLE_BRANCH_KERNEL|\
> @@ -999,6 +1024,7 @@ union perf_mem_data_src {
>   *     in_tx: running in a hardware transaction
>   *     abort: aborting a hardware transaction
>   *    cycles: cycles from last branch (or 0 if not supported)
> + *      type: branch type
>   */
>  struct perf_branch_entry {
>  	__u64	from;
> @@ -1008,7 +1034,8 @@ struct perf_branch_entry {
>  		in_tx:1,    /* in transaction */
>  		abort:1,    /* transaction abort */
>  		cycles:16,  /* cycle count to last branch */
> -		reserved:44;
> +		type:4,     /* branch type */
> +		reserved:40;
>  };
>  
>  #endif /* _UAPI_LINUX_PERF_EVENT_H */
> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
> index d09a9cd..69af012 100644
> --- a/tools/include/uapi/linux/perf_event.h
> +++ b/tools/include/uapi/linux/perf_event.h
> @@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
>  	PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT	= 14, /* no flags */
>  	PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT	= 15, /* no cycles */
>  
> +	PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT	= 16, /* save branch type */
> +
>  	PERF_SAMPLE_BRANCH_MAX_SHIFT		/* non-ABI */
>  };
>  
> @@ -198,9 +200,32 @@ enum perf_branch_sample_type {
>  	PERF_SAMPLE_BRANCH_NO_FLAGS	= 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
>  	PERF_SAMPLE_BRANCH_NO_CYCLES	= 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
>  
> +	PERF_SAMPLE_BRANCH_TYPE_SAVE	=
> +		1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
> +
>  	PERF_SAMPLE_BRANCH_MAX		= 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
>  };
>  
> +/*
> + * Common flow change classification
> + */
> +enum {
> +	PERF_BR_NONE		= 0,	/* unknown */
> +	PERF_BR_JCC		= 1,	/* conditional jump */
> +	PERF_BR_JMP		= 2,	/* jump */
> +	PERF_BR_IND_JMP		= 3,	/* indirect jump */
> +	PERF_BR_CALL		= 4,	/* call */
> +	PERF_BR_IND_CALL	= 5,	/* indirect call */
> +	PERF_BR_RET		= 6,	/* return */
> +	PERF_BR_SYSCALL		= 7,	/* syscall */
> +	PERF_BR_SYSRET		= 8,	/* syscall return */
> +	PERF_BR_IRQ		= 9,	/* hw interrupt/trap/fault */
> +	PERF_BR_INT		= 10,	/* sw interrupt */
> +	PERF_BR_IRET		= 11,	/* return from interrupt */
> +	PERF_BR_FAR_BRANCH	= 12,	/* not generic far branch type */
> +	PERF_BR_MAX,
> +};
> +
>  #define PERF_SAMPLE_BRANCH_PLM_ALL \
>  	(PERF_SAMPLE_BRANCH_USER|\
>  	 PERF_SAMPLE_BRANCH_KERNEL|\
> @@ -999,6 +1024,7 @@ union perf_mem_data_src {
>   *     in_tx: running in a hardware transaction
>   *     abort: aborting a hardware transaction
>   *    cycles: cycles from last branch (or 0 if not supported)
> + *      type: branch type
>   */
>  struct perf_branch_entry {
>  	__u64	from;
> @@ -1008,7 +1034,8 @@ struct perf_branch_entry {
>  		in_tx:1,    /* in transaction */
>  		abort:1,    /* transaction abort */
>  		cycles:16,  /* cycle count to last branch */
> -		reserved:44;
> +		type:4,     /* branch type */
> +		reserved:40;
>  };
>  
>  #endif /* _UAPI_LINUX_PERF_EVENT_H */
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
  2017-07-07  8:42   ` Peter Zijlstra
@ 2017-07-10  5:19     ` Michael Ellerman
  0 siblings, 0 replies; 35+ messages in thread
From: Michael Ellerman @ 2017-07-10  5:19 UTC (permalink / raw)
  To: Peter Zijlstra, Jin Yao
  Cc: ak, maddy, alexander.shishkin, kan.liang, linuxppc-dev,
	Linux-kernel, acme, mingo, jolsa, yao.jin

Peter Zijlstra <peterz@infradead.org> writes:

> PPC folks, maddy, does this work for you guys?

It think it works for us, but I have some comments, I'll reply to the original.

cheers

> On Thu, Apr 20, 2017 at 08:07:49PM +0800, Jin Yao wrote:
>> It is often useful to know the branch types while analyzing branch
>> data. For example, a call is very different from a conditional branch.
>> 
>> Currently we have to look it up in binary while the binary may later
>> not be available and even the binary is available but user has to take
>> some time. It is very useful for user to check it directly in perf
>> report.
>> 
>> Perf already has support for disassembling the branch instruction
>> to get the x86 branch type.
>> 
>> To keep consistent on kernel and userspace and make the classification
>> more common, the patch adds the common branch type classification
>> in perf_event.h.
>> 
>> PERF_BR_NONE      : unknown
>> PERF_BR_JCC       : conditional jump
>> PERF_BR_JMP       : jump
>> PERF_BR_IND_JMP   : indirect jump
>> PERF_BR_CALL      : call
>> PERF_BR_IND_CALL  : indirect call
>> PERF_BR_RET       : return
>> PERF_BR_SYSCALL   : syscall
>> PERF_BR_SYSRET    : syscall return
>> PERF_BR_IRQ       : hw interrupt/trap/fault
>> PERF_BR_INT       : sw interrupt
>> PERF_BR_IRET      : return from interrupt
>> PERF_BR_FAR_BRANCH: not generic far branch type
>> 
>> The patch also adds a new field type (4 bits) in perf_branch_entry
>> to record the branch type.
>> 
>> Since the disassembling of branch instruction needs some overhead,
>> a new PERF_SAMPLE_BRANCH_TYPE_SAVE is introduced to indicate if it
>> needs to disassemble the branch instruction and record the branch
>> type.
>> 
>> Change log
>> ----------
>> 
>> v6: Not changed.
>> 
>> v5: Not changed. The v5 patch series just change the userspace.
>> 
>> v4: Comparing to previous version, the major changes are:
>> 
>> 1. Remove the PERF_BR_JCC_FWD/PERF_BR_JCC_BWD, they will be
>>    computed later in userspace.
>> 
>> 2. Remove the "cross" field in perf_branch_entry. The cross page
>>    computing will be done later in userspace.
>> 
>> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
>> ---
>>  include/uapi/linux/perf_event.h       | 29 ++++++++++++++++++++++++++++-
>>  tools/include/uapi/linux/perf_event.h | 29 ++++++++++++++++++++++++++++-
>>  2 files changed, 56 insertions(+), 2 deletions(-)
>> 
>> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
>> index d09a9cd..69af012 100644
>> --- a/include/uapi/linux/perf_event.h
>> +++ b/include/uapi/linux/perf_event.h
>> @@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
>>  	PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT	= 14, /* no flags */
>>  	PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT	= 15, /* no cycles */
>>  
>> +	PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT	= 16, /* save branch type */
>> +
>>  	PERF_SAMPLE_BRANCH_MAX_SHIFT		/* non-ABI */
>>  };
>>  
>> @@ -198,9 +200,32 @@ enum perf_branch_sample_type {
>>  	PERF_SAMPLE_BRANCH_NO_FLAGS	= 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
>>  	PERF_SAMPLE_BRANCH_NO_CYCLES	= 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
>>  
>> +	PERF_SAMPLE_BRANCH_TYPE_SAVE	=
>> +		1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
>> +
>>  	PERF_SAMPLE_BRANCH_MAX		= 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
>>  };
>>  
>> +/*
>> + * Common flow change classification
>> + */
>> +enum {
>> +	PERF_BR_NONE		= 0,	/* unknown */
>> +	PERF_BR_JCC		= 1,	/* conditional jump */
>> +	PERF_BR_JMP		= 2,	/* jump */
>> +	PERF_BR_IND_JMP		= 3,	/* indirect jump */
>> +	PERF_BR_CALL		= 4,	/* call */
>> +	PERF_BR_IND_CALL	= 5,	/* indirect call */
>> +	PERF_BR_RET		= 6,	/* return */
>> +	PERF_BR_SYSCALL		= 7,	/* syscall */
>> +	PERF_BR_SYSRET		= 8,	/* syscall return */
>> +	PERF_BR_IRQ		= 9,	/* hw interrupt/trap/fault */
>> +	PERF_BR_INT		= 10,	/* sw interrupt */
>> +	PERF_BR_IRET		= 11,	/* return from interrupt */
>> +	PERF_BR_FAR_BRANCH	= 12,	/* not generic far branch type */
>> +	PERF_BR_MAX,
>> +};
>> +
>>  #define PERF_SAMPLE_BRANCH_PLM_ALL \
>>  	(PERF_SAMPLE_BRANCH_USER|\
>>  	 PERF_SAMPLE_BRANCH_KERNEL|\
>> @@ -999,6 +1024,7 @@ union perf_mem_data_src {
>>   *     in_tx: running in a hardware transaction
>>   *     abort: aborting a hardware transaction
>>   *    cycles: cycles from last branch (or 0 if not supported)
>> + *      type: branch type
>>   */
>>  struct perf_branch_entry {
>>  	__u64	from;
>> @@ -1008,7 +1034,8 @@ struct perf_branch_entry {
>>  		in_tx:1,    /* in transaction */
>>  		abort:1,    /* transaction abort */
>>  		cycles:16,  /* cycle count to last branch */
>> -		reserved:44;
>> +		type:4,     /* branch type */
>> +		reserved:40;
>>  };
>>  
>>  #endif /* _UAPI_LINUX_PERF_EVENT_H */
>> diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
>> index d09a9cd..69af012 100644
>> --- a/tools/include/uapi/linux/perf_event.h
>> +++ b/tools/include/uapi/linux/perf_event.h
>> @@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
>>  	PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT	= 14, /* no flags */
>>  	PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT	= 15, /* no cycles */
>>  
>> +	PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT	= 16, /* save branch type */
>> +
>>  	PERF_SAMPLE_BRANCH_MAX_SHIFT		/* non-ABI */
>>  };
>>  
>> @@ -198,9 +200,32 @@ enum perf_branch_sample_type {
>>  	PERF_SAMPLE_BRANCH_NO_FLAGS	= 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
>>  	PERF_SAMPLE_BRANCH_NO_CYCLES	= 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
>>  
>> +	PERF_SAMPLE_BRANCH_TYPE_SAVE	=
>> +		1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
>> +
>>  	PERF_SAMPLE_BRANCH_MAX		= 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
>>  };
>>  
>> +/*
>> + * Common flow change classification
>> + */
>> +enum {
>> +	PERF_BR_NONE		= 0,	/* unknown */
>> +	PERF_BR_JCC		= 1,	/* conditional jump */
>> +	PERF_BR_JMP		= 2,	/* jump */
>> +	PERF_BR_IND_JMP		= 3,	/* indirect jump */
>> +	PERF_BR_CALL		= 4,	/* call */
>> +	PERF_BR_IND_CALL	= 5,	/* indirect call */
>> +	PERF_BR_RET		= 6,	/* return */
>> +	PERF_BR_SYSCALL		= 7,	/* syscall */
>> +	PERF_BR_SYSRET		= 8,	/* syscall return */
>> +	PERF_BR_IRQ		= 9,	/* hw interrupt/trap/fault */
>> +	PERF_BR_INT		= 10,	/* sw interrupt */
>> +	PERF_BR_IRET		= 11,	/* return from interrupt */
>> +	PERF_BR_FAR_BRANCH	= 12,	/* not generic far branch type */
>> +	PERF_BR_MAX,
>> +};
>> +
>>  #define PERF_SAMPLE_BRANCH_PLM_ALL \
>>  	(PERF_SAMPLE_BRANCH_USER|\
>>  	 PERF_SAMPLE_BRANCH_KERNEL|\
>> @@ -999,6 +1024,7 @@ union perf_mem_data_src {
>>   *     in_tx: running in a hardware transaction
>>   *     abort: aborting a hardware transaction
>>   *    cycles: cycles from last branch (or 0 if not supported)
>> + *      type: branch type
>>   */
>>  struct perf_branch_entry {
>>  	__u64	from;
>> @@ -1008,7 +1034,8 @@ struct perf_branch_entry {
>>  		in_tx:1,    /* in transaction */
>>  		abort:1,    /* transaction abort */
>>  		cycles:16,  /* cycle count to last branch */
>> -		reserved:44;
>> +		type:4,     /* branch type */
>> +		reserved:40;
>>  };
>>  
>>  #endif /* _UAPI_LINUX_PERF_EVENT_H */
>> -- 
>> 2.7.4
>> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
  2017-04-20 12:07 ` [PATCH v6 1/7] perf/core: Define the common branch type classification Jin Yao
  2017-07-07  8:42   ` Peter Zijlstra
@ 2017-07-10  6:05   ` Michael Ellerman
  2017-07-10  8:16     ` Jin, Yao
  1 sibling, 1 reply; 35+ messages in thread
From: Michael Ellerman @ 2017-07-10  6:05 UTC (permalink / raw)
  To: Jin Yao, acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: ak, kan.liang, linuxppc-dev, Linux-kernel, Jin Yao, yao.jin

Hi Jin Yao,

Sorry I haven't commented until now, but it got lost in the flood of
patches.

Just a few nit-picks below ...

Jin Yao <yao.jin@linux.intel.com> writes:

> It is often useful to know the branch types while analyzing branch
> data. For example, a call is very different from a conditional branch.
>
> Currently we have to look it up in binary while the binary may later
> not be available and even the binary is available but user has to take
> some time. It is very useful for user to check it directly in perf
> report.
>
> Perf already has support for disassembling the branch instruction
> to get the x86 branch type.
>
> To keep consistent on kernel and userspace and make the classification
> more common, the patch adds the common branch type classification
> in perf_event.h.

Most of the code and doc uses "branch" but then a few these are called
"jump". Can we just stick with "branch"?

> PERF_BR_NONE      : unknown
> PERF_BR_JCC       : conditional jump
> PERF_BR_JMP       : jump
> PERF_BR_IND_JMP   : indirect jump

eg:

PERF_BR_COND    : conditional branch
PERF_BR_UNCOND  : unconditional branch
PERF_BR_IND     : indirect branch

> PERF_BR_CALL      : call
> PERF_BR_IND_CALL  : indirect call
> PERF_BR_RET       : return
> PERF_BR_SYSCALL   : syscall
> PERF_BR_SYSRET    : syscall return
> PERF_BR_IRQ       : hw interrupt/trap/fault
> PERF_BR_INT       : sw interrupt

I'm not sure what that means, I'm guessing on x86 it means someone
executed "int" ?

Is that sufficiently useful to use up a bit? I think we only have 3
free?

> PERF_BR_IRET      : return from interrupt
> PERF_BR_FAR_BRANCH: not generic far branch type

What is a "not generic far branch" ?

I don't know what that would mean on powerpc for example.


I think the only thing we have on powerpc that's commonly used and that
isn't covered above is branches that decrement a loop counter and then
branch based on the result.

It might be nice if we could separate those out from other conditional
branches. Whether it's worth using a bit for I'm not sure. Do other
arches have something similar?

Those branches do tend to be "backward conditional", so that may be
sufficient. But backward conditional also includes if bodies that have
been moved out of line and then branch back to the main body of the
function.

cheers

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
  2017-07-10  6:05   ` Michael Ellerman
@ 2017-07-10  8:16     ` Jin, Yao
  2017-07-10 10:32       ` Michael Ellerman
  0 siblings, 1 reply; 35+ messages in thread
From: Jin, Yao @ 2017-07-10  8:16 UTC (permalink / raw)
  To: Michael Ellerman, acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: ak, kan.liang, linuxppc-dev, Linux-kernel, yao.jin



On 7/10/2017 2:05 PM, Michael Ellerman wrote:
> Hi Jin Yao,
>
> Sorry I haven't commented until now, but it got lost in the flood of
> patches.

Never mind, it's no problem. :)

> Just a few nit-picks below ...
> Jin Yao <yao.jin@linux.intel.com> writes:
>
>> It is often useful to know the branch types while analyzing branch
>> data. For example, a call is very different from a conditional branch.
>>
>> Currently we have to look it up in binary while the binary may later
>> not be available and even the binary is available but user has to take
>> some time. It is very useful for user to check it directly in perf
>> report.
>>
>> Perf already has support for disassembling the branch instruction
>> to get the x86 branch type.
>>
>> To keep consistent on kernel and userspace and make the classification
>> more common, the patch adds the common branch type classification
>> in perf_event.h.
> Most of the code and doc uses "branch" but then a few these are called
> "jump". Can we just stick with "branch"?
>
>> PERF_BR_NONE      : unknown
>> PERF_BR_JCC       : conditional jump
>> PERF_BR_JMP       : jump
>> PERF_BR_IND_JMP   : indirect jump
> eg:
>
> PERF_BR_COND    : conditional branch
> PERF_BR_UNCOND  : unconditional branch
> PERF_BR_IND     : indirect branch

Call and jump are all branches. If we want to figure out which one is 
jump and which one is call, we need the detail branch type definitions.

For example,  if we only say "PERF_BR_IND", we could not know if it's an 
indirect jump or indirect call.
>> PERF_BR_CALL      : call
>> PERF_BR_IND_CALL  : indirect call
>> PERF_BR_RET       : return
>> PERF_BR_SYSCALL   : syscall
>> PERF_BR_SYSRET    : syscall return
>> PERF_BR_IRQ       : hw interrupt/trap/fault
>> PERF_BR_INT       : sw interrupt
> I'm not sure what that means, I'm guessing on x86 it means someone
> executed "int" ?

PERF_BR_IRQ is for hw interrupt and PERF_BR_INT is for sw interrupt.

PERF_BR_CALL/PERF_BR_IND_CALL and PERF_BR_RET are for function call 
(direct call and indirect call) and return.

PERF_BR_SYSCALL/PERF_BR_SYSRET are for syscall and syscall return.

> Is that sufficiently useful to use up a bit? I think we only have 3
> free?

Do you means 3 bits? Each bit stands for one branch type? I guess what 
you mean is:

PERF_BR_COND    : conditional branch
PERF_BR_UNCOND  : unconditional branch
PERF_BR_IND     : indirect branch

But 3 branch types are not enough for us.

>> PERF_BR_IRET      : return from interrupt
>> PERF_BR_FAR_BRANCH: not generic far branch type
> What is a "not generic far branch" ?
>
> I don't know what that would mean on powerpc for example.

It's reserved for future using I think.

>
> I think the only thing we have on powerpc that's commonly used and that
> isn't covered above is branches that decrement a loop counter and then
> branch based on the result.
>
> It might be nice if we could separate those out from other conditional
> branches. Whether it's worth using a bit for I'm not sure. Do other
> arches have something similar?
>
> Those branches do tend to be "backward conditional", so that may be
> sufficient. But backward conditional also includes if bodies that have
> been moved out of line and then branch back to the main body of the
> function.
>
> cheers

Sorry, I'm not familiar with powerpc arch. Or could you add the branch 
type which powerpc needs?

For backward conditional and forward conditional, we compute them in 
userspace according to the from/to addresses.

Thanks
Jin Yao

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
  2017-07-10  8:16     ` Jin, Yao
@ 2017-07-10 10:32       ` Michael Ellerman
  2017-07-10 11:46         ` Jin, Yao
  0 siblings, 1 reply; 35+ messages in thread
From: Michael Ellerman @ 2017-07-10 10:32 UTC (permalink / raw)
  To: Jin, Yao, acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: ak, kan.liang, linuxppc-dev, Linux-kernel, yao.jin

"Jin, Yao" <yao.jin@linux.intel.com> writes:
> On 7/10/2017 2:05 PM, Michael Ellerman wrote:
>> Jin Yao <yao.jin@linux.intel.com> writes:
>>
>>> It is often useful to know the branch types while analyzing branch
>>> data. For example, a call is very different from a conditional branch.
>>>
...
>>> To keep consistent on kernel and userspace and make the classification
>>> more common, the patch adds the common branch type classification
>>> in perf_event.h.
>>
>> Most of the code and doc uses "branch" but then a few these are called
>> "jump". Can we just stick with "branch"?
>>
>>> PERF_BR_NONE      : unknown
>>> PERF_BR_JCC       : conditional jump
>>> PERF_BR_JMP       : jump
>>> PERF_BR_IND_JMP   : indirect jump
>> eg:
>>
>> PERF_BR_COND    : conditional branch
>> PERF_BR_UNCOND  : unconditional branch
>> PERF_BR_IND     : indirect branch
>
> Call and jump are all branches. If we want to figure out which one is 
> jump and which one is call, we need the detail branch type definitions.

Yeah I'm not saying we don't need the different types, I'm saying I'd
rather we just called them "branch" not "jump". Just because "jump" can
mean different things on different arches.

> For example,  if we only say "PERF_BR_IND", we could not know if it's an 
> indirect jump or indirect call.

Yes we can, PERF_BR_IND is an indirect branch, which is not a call,
because if it was a call then it would be PERF_BR_IND_CALL.

>>> PERF_BR_CALL      : call
>>> PERF_BR_IND_CALL  : indirect call
>>> PERF_BR_RET       : return
>>> PERF_BR_SYSCALL   : syscall
>>> PERF_BR_SYSRET    : syscall return
>>> PERF_BR_IRQ       : hw interrupt/trap/fault
>>> PERF_BR_INT       : sw interrupt
>> I'm not sure what that means, I'm guessing on x86 it means someone
>> executed "int" ?
>
> PERF_BR_IRQ is for hw interrupt and PERF_BR_INT is for sw interrupt.

OK, but I still don't know what that means :)

What's an example of an instruction that is PERF_BR_IRQ and PERF_BR_INT ?

> PERF_BR_CALL/PERF_BR_IND_CALL and PERF_BR_RET are for function call 
> (direct call and indirect call) and return.

Yep makes sense.

> PERF_BR_SYSCALL/PERF_BR_SYSRET are for syscall and syscall return.

Yep OK.

>> Is that sufficiently useful to use up a bit? I think we only have 3
>> free?
>
> Do you means 3 bits? Each bit stands for one branch type? I guess what 
> you mean is:
>
> PERF_BR_COND    : conditional branch
> PERF_BR_UNCOND  : unconditional branch
> PERF_BR_IND     : indirect branch
>
> But 3 branch types are not enough for us.

What I meant was you're using 4 bits for the type, so you have 16
possible values, and you've defined 13 of them. Meaning there are only 3
types free.

So we should try to only define branch types that are really useful, and
keep some free for future use.

Maybe PERF_BR_INT is really common on x86 and so it's important to count
it, but like I said above I don't know what it is.

>>> PERF_BR_IRET      : return from interrupt
>>> PERF_BR_FAR_BRANCH: not generic far branch type
>> What is a "not generic far branch" ?
>>
>> I don't know what that would mean on powerpc for example.
>
> It's reserved for future using I think.

OK so let's not put it in the Linux API until it's defined?

>> I think the only thing we have on powerpc that's commonly used and that
>> isn't covered above is branches that decrement a loop counter and then
>> branch based on the result.
...
>
> Sorry, I'm not familiar with powerpc arch. Or could you add the branch 
> type which powerpc needs?

These are good:

+	PERF_BR_COND		= 1,	/* conditional */
+	PERF_BR_UNCOND		= 2,	/* unconditional */
+	PERF_BR_IND		= 3,	/* indirect */
+	PERF_BR_CALL		= 4,	/* call */
+	PERF_BR_IND_CALL	= 5,	/* indirect call */
+	PERF_BR_RET		= 6,	/* return */

These we wouldn't use currently, but make sense:

+	PERF_BR_SYSCALL		= 7,	/* syscall */
+	PERF_BR_SYSRET		= 8,	/* syscall return */
+	PERF_BR_IRET		= 11,	/* return from interrupt */

These I'm not so sure about, I don't really know what they would map to
for us:

+	PERF_BR_IRQ		= 9,	/* hw interrupt/trap/fault */
+	PERF_BR_INT		= 10,	/* sw interrupt */

And sounds like this should be dropped for now:

+	PERF_BR_FAR_BRANCH	= 12,	/* not generic far branch type */

The branch types you haven't covered which might be useful for us are:

	PERF_BR_COND_CALL	/* Conditional call */
	PERF_BR_COND_RET	/* Condition return */


cheers

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
  2017-07-10 10:32       ` Michael Ellerman
@ 2017-07-10 11:46         ` Jin, Yao
  2017-07-10 13:10           ` Segher Boessenkool
  0 siblings, 1 reply; 35+ messages in thread
From: Jin, Yao @ 2017-07-10 11:46 UTC (permalink / raw)
  To: Michael Ellerman, acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: ak, kan.liang, linuxppc-dev, Linux-kernel, yao.jin

Hi Michael,

Please let me summarize for the new branch type definitions.

1. We all agree these definitions:

+	PERF_BR_COND		= 1,	/* conditional */
+	PERF_BR_UNCOND		= 2,	/* unconditional */
+	PERF_BR_IND		= 3,	/* indirect */
+	PERF_BR_CALL		= 4,	/* call */
+	PERF_BR_IND_CALL	= 5,	/* indirect call */
+	PERF_BR_RET		= 6,	/* return */
+	PERF_BR_SYSCALL		= 7,	/* syscall */
+	PERF_BR_SYSRET		= 8,	/* syscall return */
+	PERF_BR_IRET		= 11,	/* return from interrupt */

2. I wish to keep following definitions for x86.

+	PERF_BR_IRQ		= 9,	/* hw interrupt/trap/fault */
+	PERF_BR_INT		= 10,	/* sw interrupt */

PERF_BR_INT is triggered by instruction "int" .
PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3 
transition).

3. I can drop PERF_BR_FAR_BRANCH

4. I'd like to add following types for powerpc.

	PERF_BR_COND_CALL	/* Conditional call */
	PERF_BR_COND_RET	/* Condition return */

If you agree these new definitions, I will prepare the new patch.

Thanks
Jin Yao

On 7/10/2017 6:32 PM, Michael Ellerman wrote:
> "Jin, Yao" <yao.jin@linux.intel.com> writes:
>> On 7/10/2017 2:05 PM, Michael Ellerman wrote:
>>> Jin Yao <yao.jin@linux.intel.com> writes:
>>>
>>>> It is often useful to know the branch types while analyzing branch
>>>> data. For example, a call is very different from a conditional branch.
>>>>
> ...
>>>> To keep consistent on kernel and userspace and make the classification
>>>> more common, the patch adds the common branch type classification
>>>> in perf_event.h.
>>> Most of the code and doc uses "branch" but then a few these are called
>>> "jump". Can we just stick with "branch"?
>>>
>>>> PERF_BR_NONE      : unknown
>>>> PERF_BR_JCC       : conditional jump
>>>> PERF_BR_JMP       : jump
>>>> PERF_BR_IND_JMP   : indirect jump
>>> eg:
>>>
>>> PERF_BR_COND    : conditional branch
>>> PERF_BR_UNCOND  : unconditional branch
>>> PERF_BR_IND     : indirect branch
>> Call and jump are all branches. If we want to figure out which one is
>> jump and which one is call, we need the detail branch type definitions.
> Yeah I'm not saying we don't need the different types, I'm saying I'd
> rather we just called them "branch" not "jump". Just because "jump" can
> mean different things on different arches.
>
>> For example,  if we only say "PERF_BR_IND", we could not know if it's an
>> indirect jump or indirect call.
> Yes we can, PERF_BR_IND is an indirect branch, which is not a call,
> because if it was a call then it would be PERF_BR_IND_CALL.
>
>>>> PERF_BR_CALL      : call
>>>> PERF_BR_IND_CALL  : indirect call
>>>> PERF_BR_RET       : return
>>>> PERF_BR_SYSCALL   : syscall
>>>> PERF_BR_SYSRET    : syscall return
>>>> PERF_BR_IRQ       : hw interrupt/trap/fault
>>>> PERF_BR_INT       : sw interrupt
>>> I'm not sure what that means, I'm guessing on x86 it means someone
>>> executed "int" ?
>> PERF_BR_IRQ is for hw interrupt and PERF_BR_INT is for sw interrupt.
> OK, but I still don't know what that means :)
>
> What's an example of an instruction that is PERF_BR_IRQ and PERF_BR_INT ?
>
>> PERF_BR_CALL/PERF_BR_IND_CALL and PERF_BR_RET are for function call
>> (direct call and indirect call) and return.
> Yep makes sense.
>
>> PERF_BR_SYSCALL/PERF_BR_SYSRET are for syscall and syscall return.
> Yep OK.
>
>>> Is that sufficiently useful to use up a bit? I think we only have 3
>>> free?
>> Do you means 3 bits? Each bit stands for one branch type? I guess what
>> you mean is:
>>
>> PERF_BR_COND    : conditional branch
>> PERF_BR_UNCOND  : unconditional branch
>> PERF_BR_IND     : indirect branch
>>
>> But 3 branch types are not enough for us.
> What I meant was you're using 4 bits for the type, so you have 16
> possible values, and you've defined 13 of them. Meaning there are only 3
> types free.
>
> So we should try to only define branch types that are really useful, and
> keep some free for future use.
>
> Maybe PERF_BR_INT is really common on x86 and so it's important to count
> it, but like I said above I don't know what it is.
>
>>>> PERF_BR_IRET      : return from interrupt
>>>> PERF_BR_FAR_BRANCH: not generic far branch type
>>> What is a "not generic far branch" ?
>>>
>>> I don't know what that would mean on powerpc for example.
>> It's reserved for future using I think.
> OK so let's not put it in the Linux API until it's defined?
>
>>> I think the only thing we have on powerpc that's commonly used and that
>>> isn't covered above is branches that decrement a loop counter and then
>>> branch based on the result.
> ...
>> Sorry, I'm not familiar with powerpc arch. Or could you add the branch
>> type which powerpc needs?
> These are good:
>
> +	PERF_BR_COND		= 1,	/* conditional */
> +	PERF_BR_UNCOND		= 2,	/* unconditional */
> +	PERF_BR_IND		= 3,	/* indirect */
> +	PERF_BR_CALL		= 4,	/* call */
> +	PERF_BR_IND_CALL	= 5,	/* indirect call */
> +	PERF_BR_RET		= 6,	/* return */
>
> These we wouldn't use currently, but make sense:
>
> +	PERF_BR_SYSCALL		= 7,	/* syscall */
> +	PERF_BR_SYSRET		= 8,	/* syscall return */
> +	PERF_BR_IRET		= 11,	/* return from interrupt */
>
> These I'm not so sure about, I don't really know what they would map to
> for us:
>
> +	PERF_BR_IRQ		= 9,	/* hw interrupt/trap/fault */
> +	PERF_BR_INT		= 10,	/* sw interrupt */
>
> And sounds like this should be dropped for now:
>
> +	PERF_BR_FAR_BRANCH	= 12,	/* not generic far branch type */
>
> The branch types you haven't covered which might be useful for us are:
>
> 	PERF_BR_COND_CALL	/* Conditional call */
> 	PERF_BR_COND_RET	/* Condition return */
>
>
> cheers

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
  2017-07-10 11:46         ` Jin, Yao
@ 2017-07-10 13:10           ` Segher Boessenkool
  2017-07-10 13:28             ` Jin, Yao
                               ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Segher Boessenkool @ 2017-07-10 13:10 UTC (permalink / raw)
  To: Jin, Yao
  Cc: Michael Ellerman, acme, jolsa, peterz, mingo, alexander.shishkin,
	kan.liang, ak, linuxppc-dev, Linux-kernel, yao.jin

Hi!

On Mon, Jul 10, 2017 at 07:46:17PM +0800, Jin, Yao wrote:
> 1. We all agree these definitions:
> 
> +	PERF_BR_COND		= 1,	/* conditional */
> +	PERF_BR_UNCOND		= 2,	/* unconditional */
> +	PERF_BR_IND		= 3,	/* indirect */
> +	PERF_BR_CALL		= 4,	/* call */
> +	PERF_BR_IND_CALL	= 5,	/* indirect call */
> +	PERF_BR_RET		= 6,	/* return */
> +	PERF_BR_SYSCALL		= 7,	/* syscall */
> +	PERF_BR_SYSRET		= 8,	/* syscall return */
> +	PERF_BR_IRET		= 11,	/* return from interrupt */

Do we?  It does not map very well to PowerPC branch types.

> 2. I wish to keep following definitions for x86.
> 
> +	PERF_BR_IRQ		= 9,	/* hw interrupt/trap/fault */
> +	PERF_BR_INT		= 10,	/* sw interrupt */
> 
> PERF_BR_INT is triggered by instruction "int" .
> PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3 
> transition).

So your "PERF_BR_INT" is a system call?  And PERF_BR_IRQ is not an
interrupt request (as its name suggests), not what we call an "external
interrupt" either; instead it is every interrupt that is not a system
call?

It also does not follow the lines of "software caused interrupt" vs.
the rest.

> 4. I'd like to add following types for powerpc.
> 
> 	PERF_BR_COND_CALL	/* Conditional call */
> 	PERF_BR_COND_RET	/* Condition return */

Almost all PowerPC branches have a "conditional" version (only "syscall"
and "sysret/iret" do not -- and those last two are the same, just like
PERF_BR_INT seems to be the same as PERF_BR_SYSCALL).

So how should those PERF_BR_* be used?  It cannot be used in an
architecture-neutral interface the way you define it now.


Segher

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
  2017-07-10 13:10           ` Segher Boessenkool
@ 2017-07-10 13:28             ` Jin, Yao
  2017-07-10 13:46             ` Peter Zijlstra
  2017-07-11  2:13             ` Michael Ellerman
  2 siblings, 0 replies; 35+ messages in thread
From: Jin, Yao @ 2017-07-10 13:28 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Ellerman, acme, jolsa, peterz, mingo, alexander.shishkin,
	kan.liang, ak, linuxppc-dev, Linux-kernel, yao.jin

Hi,

Following branch types should be common enough, right?

+	PERF_BR_COND		= 1,	/* conditional */
+	PERF_BR_UNCOND		= 2,	/* unconditional */
+	PERF_BR_IND		= 3,	/* indirect */
+	PERF_BR_CALL		= 4,	/* call */
+	PERF_BR_IND_CALL	= 5,	/* indirect call */
+	PERF_BR_RET		= 6,	/* return */

I decide to only define these types in this patch set. For other more 
arch-related branch type, we can add it in future.

Is this OK?

Thanks
Jin Yao

On 7/10/2017 9:10 PM, Segher Boessenkool wrote:
> Hi!
>
> On Mon, Jul 10, 2017 at 07:46:17PM +0800, Jin, Yao wrote:
>> 1. We all agree these definitions:
>>
>> +	PERF_BR_COND		= 1,	/* conditional */
>> +	PERF_BR_UNCOND		= 2,	/* unconditional */
>> +	PERF_BR_IND		= 3,	/* indirect */
>> +	PERF_BR_CALL		= 4,	/* call */
>> +	PERF_BR_IND_CALL	= 5,	/* indirect call */
>> +	PERF_BR_RET		= 6,	/* return */
>> +	PERF_BR_SYSCALL		= 7,	/* syscall */
>> +	PERF_BR_SYSRET		= 8,	/* syscall return */
>> +	PERF_BR_IRET		= 11,	/* return from interrupt */
> Do we?  It does not map very well to PowerPC branch types.
>
>> 2. I wish to keep following definitions for x86.
>>
>> +	PERF_BR_IRQ		= 9,	/* hw interrupt/trap/fault */
>> +	PERF_BR_INT		= 10,	/* sw interrupt */
>>
>> PERF_BR_INT is triggered by instruction "int" .
>> PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3
>> transition).
> So your "PERF_BR_INT" is a system call?  And PERF_BR_IRQ is not an
> interrupt request (as its name suggests), not what we call an "external
> interrupt" either; instead it is every interrupt that is not a system
> call?
>
> It also does not follow the lines of "software caused interrupt" vs.
> the rest.
>
>> 4. I'd like to add following types for powerpc.
>>
>> 	PERF_BR_COND_CALL	/* Conditional call */
>> 	PERF_BR_COND_RET	/* Condition return */
> Almost all PowerPC branches have a "conditional" version (only "syscall"
> and "sysret/iret" do not -- and those last two are the same, just like
> PERF_BR_INT seems to be the same as PERF_BR_SYSCALL).
>
> So how should those PERF_BR_* be used?  It cannot be used in an
> architecture-neutral interface the way you define it now.
>
>
> Segher

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
  2017-07-10 13:10           ` Segher Boessenkool
  2017-07-10 13:28             ` Jin, Yao
@ 2017-07-10 13:46             ` Peter Zijlstra
  2017-07-10 14:06               ` Jin, Yao
  2017-07-10 14:37               ` Segher Boessenkool
  2017-07-11  2:13             ` Michael Ellerman
  2 siblings, 2 replies; 35+ messages in thread
From: Peter Zijlstra @ 2017-07-10 13:46 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Jin, Yao, Michael Ellerman, acme, jolsa, mingo,
	alexander.shishkin, kan.liang, ak, linuxppc-dev, Linux-kernel,
	yao.jin

On Mon, Jul 10, 2017 at 08:10:50AM -0500, Segher Boessenkool wrote:

> > PERF_BR_INT is triggered by instruction "int" .
> > PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3 
> > transition).
> 
> So your "PERF_BR_INT" is a system call? 

The "INT" thing has indeed been used as system call mechanism (typically
INT 80). But these days we have special purpose syscall instructions.

It could maybe be compared to the PPC "Unconditional TRAP with
immediate" where you use the immediate value as an index into a handler
vector.

> And PERF_BR_IRQ is not an interrupt request (as its name suggests),
> not what we call an "external interrupt" either; instead it is every
> interrupt that is not a system call?

It is actual interrupts, but also faults, traps and all the other
exceptions not caused by "INT" I think.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
  2017-07-10 13:46             ` Peter Zijlstra
@ 2017-07-10 14:06               ` Jin, Yao
  2017-07-11  2:28                 ` Michael Ellerman
  2017-07-10 14:37               ` Segher Boessenkool
  1 sibling, 1 reply; 35+ messages in thread
From: Jin, Yao @ 2017-07-10 14:06 UTC (permalink / raw)
  To: Peter Zijlstra, Segher Boessenkool
  Cc: Michael Ellerman, acme, jolsa, mingo, alexander.shishkin,
	kan.liang, ak, linuxppc-dev, Linux-kernel, yao.jin



On 7/10/2017 9:46 PM, Peter Zijlstra wrote:
> On Mon, Jul 10, 2017 at 08:10:50AM -0500, Segher Boessenkool wrote:
>
>>> PERF_BR_INT is triggered by instruction "int" .
>>> PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3
>>> transition).
>> So your "PERF_BR_INT" is a system call?
> The "INT" thing has indeed been used as system call mechanism (typically
> INT 80). But these days we have special purpose syscall instructions.
>
> It could maybe be compared to the PPC "Unconditional TRAP with
> immediate" where you use the immediate value as an index into a handler
> vector.
>
>> And PERF_BR_IRQ is not an interrupt request (as its name suggests),
>> not what we call an "external interrupt" either; instead it is every
>> interrupt that is not a system call?
> It is actual interrupts, but also faults, traps and all the other
> exceptions not caused by "INT" I think.
>
Yes. It's interrupt, traps, faults. If from is in the user space and to 
is in the kernel, it indicates the ring3 -> ring0 transition.

If the from instruction is not syscall or other ring transition 
instruction, it should be interrupt, traps and faults. That's how we get 
the PERF_BR_IRQ on x86.

Anyway, maybe we just use a minimum but the most common set of branch 
types now, it could be a good start and acceptable on all architectures.

PERF_BR_COND        = 1,    /* conditional */
PERF_BR_UNCOND        = 2,    /* unconditional */
PERF_BR_IND        = 3,    /* indirect */
PERF_BR_CALL        = 4,    /* call */
PERF_BR_IND_CALL    = 5,    /* indirect call */
PERF_BR_RET        = 6,    /* return */

Thanks
Jin Yao

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
  2017-07-10 13:46             ` Peter Zijlstra
  2017-07-10 14:06               ` Jin, Yao
@ 2017-07-10 14:37               ` Segher Boessenkool
  1 sibling, 0 replies; 35+ messages in thread
From: Segher Boessenkool @ 2017-07-10 14:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jin, Yao, Michael Ellerman, acme, jolsa, mingo,
	alexander.shishkin, kan.liang, ak, linuxppc-dev, Linux-kernel,
	yao.jin

Hi Peter,

On Mon, Jul 10, 2017 at 03:46:58PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 10, 2017 at 08:10:50AM -0500, Segher Boessenkool wrote:
> 
> > > PERF_BR_INT is triggered by instruction "int" .
> > > PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3 
> > > transition).
> > 
> > So your "PERF_BR_INT" is a system call? 
> 
> The "INT" thing has indeed been used as system call mechanism (typically
> INT 80). But these days we have special purpose syscall instructions.
> 
> It could maybe be compared to the PPC "Unconditional TRAP with
> immediate" where you use the immediate value as an index into a handler
> vector.

If we would do that, yes :-)  (We just generate a SIGTRAP instead).

> > And PERF_BR_IRQ is not an interrupt request (as its name suggests),
> > not what we call an "external interrupt" either; instead it is every
> > interrupt that is not a system call?
> 
> It is actual interrupts, but also faults, traps and all the other
> exceptions not caused by "INT" I think.

Ah, right, exceptions == interrupts for PowerPC, more terminological
confusion :-)


Segher

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
  2017-07-10 13:10           ` Segher Boessenkool
  2017-07-10 13:28             ` Jin, Yao
  2017-07-10 13:46             ` Peter Zijlstra
@ 2017-07-11  2:13             ` Michael Ellerman
  2 siblings, 0 replies; 35+ messages in thread
From: Michael Ellerman @ 2017-07-11  2:13 UTC (permalink / raw)
  To: Segher Boessenkool, Jin, Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, kan.liang, ak,
	linuxppc-dev, Linux-kernel, yao.jin

Segher Boessenkool <segher@kernel.crashing.org> writes:
> Hi!
>
> On Mon, Jul 10, 2017 at 07:46:17PM +0800, Jin, Yao wrote:
>> 1. We all agree these definitions:
>> 
>> +	PERF_BR_COND		= 1,	/* conditional */
>> +	PERF_BR_UNCOND		= 2,	/* unconditional */
>> +	PERF_BR_IND		= 3,	/* indirect */
>> +	PERF_BR_CALL		= 4,	/* call */
>> +	PERF_BR_IND_CALL	= 5,	/* indirect call */
>> +	PERF_BR_RET		= 6,	/* return */
>> +	PERF_BR_SYSCALL		= 7,	/* syscall */
>> +	PERF_BR_SYSRET		= 8,	/* syscall return */
>> +	PERF_BR_IRET		= 11,	/* return from interrupt */
>
> Do we?  It does not map very well to PowerPC branch types.

I think they map well enough to the types of branches that are actually
used in practice.

To represent the full range of possibilities we'd need to switch to a
bitmap of flags, ie. COND, IND, CALL, RET, SYSCALL, INT, etc. But it
would need more than 4 bits and I don't think there's that much added
value in being able to represent all the bizarre combinations.

But maybe that is the best option as it makes the API more flexible and
means we don't have to get the list of branches correct up front?


I ran some quick numbers on a kernel I had here (powernv w/gcc 7):

  Type      Percent
  -----------------
  cond      40.92%	beq (79166) bne (57379) ble (10411) bgt (9587) blt (6248) bge (3704) bdnz (1251) bdz (353) bns (30) bdnzf (2) bdnzt (1)
  uncond    14.89%	b (61182) 
  indirect  0.10%	bctr (418)
  call      33.33%	bl (136926)
  ind call  1.44%	bctrl (5912)
  return    9.23%	blr (37943)
        =   99.91%

If we add cond call/return that covers another 0.08% taking us to 99.99%
of branches.

I know future compilers and or different code might use a different
distribution, but I doubt it will change all that much.

Maybe cond could be broken down further, but the only really meaningful
sub category I can think of is the decrementing type, and those are
quite rare.

cheers

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
  2017-07-10 14:06               ` Jin, Yao
@ 2017-07-11  2:28                 ` Michael Ellerman
  2017-07-11  3:00                   ` Jin, Yao
  0 siblings, 1 reply; 35+ messages in thread
From: Michael Ellerman @ 2017-07-11  2:28 UTC (permalink / raw)
  To: Jin, Yao, Peter Zijlstra, Segher Boessenkool
  Cc: acme, jolsa, mingo, alexander.shishkin, kan.liang, ak,
	linuxppc-dev, Linux-kernel, yao.jin

"Jin, Yao" <yao.jin@linux.intel.com> writes:

> On 7/10/2017 9:46 PM, Peter Zijlstra wrote:
>> On Mon, Jul 10, 2017 at 08:10:50AM -0500, Segher Boessenkool wrote:
>>
>>>> PERF_BR_INT is triggered by instruction "int" .
>>>> PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3
>>>> transition).
>>> So your "PERF_BR_INT" is a system call?
>> The "INT" thing has indeed been used as system call mechanism (typically
>> INT 80). But these days we have special purpose syscall instructions.
>>
>> It could maybe be compared to the PPC "Unconditional TRAP with
>> immediate" where you use the immediate value as an index into a handler
>> vector.
>>
>>> And PERF_BR_IRQ is not an interrupt request (as its name suggests),
>>> not what we call an "external interrupt" either; instead it is every
>>> interrupt that is not a system call?
>> It is actual interrupts, but also faults, traps and all the other
>> exceptions not caused by "INT" I think.
>>
> Yes. It's interrupt, traps, faults. If from is in the user space and to 
> is in the kernel, it indicates the ring3 -> ring0 transition.
>
> If the from instruction is not syscall or other ring transition 
> instruction, it should be interrupt, traps and faults. That's how we get 
> the PERF_BR_IRQ on x86.
>
> Anyway, maybe we just use a minimum but the most common set of branch 
> types now, it could be a good start and acceptable on all architectures.
>
> PERF_BR_COND        = 1,    /* conditional */
> PERF_BR_UNCOND        = 2,    /* unconditional */
> PERF_BR_IND        = 3,    /* indirect */
> PERF_BR_CALL        = 4,    /* call */
> PERF_BR_IND_CALL    = 5,    /* indirect call */
> PERF_BR_RET        = 6,    /* return */

That would be fine by me, if you're sick of talking about it and just
want to get it merged :)

I think you could expand it a bit, this list would cover the vast bulk
of branch types for us:

  PERF_BR_COND		/* Conditional */
  PERF_BR_UNCOND	/* Unconditional */
  PERF_BR_IND		/* Indirect */
  PERF_BR_CALL		/* Function call */
  PERF_BR_IND_CALL	/* Indirect function call */
  PERF_BR_RET		/* Function return */
  PERF_BR_SYSCALL	/* Syscall */
  PERF_BR_SYSRET	/* Syscall return */
  PERF_BR_COND_CALL	/* Conditional function call */
  PERF_BR_COND_RET	/* Conditional function return */

cheers

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v6 1/7] perf/core: Define the common branch type classification
  2017-07-11  2:28                 ` Michael Ellerman
@ 2017-07-11  3:00                   ` Jin, Yao
  0 siblings, 0 replies; 35+ messages in thread
From: Jin, Yao @ 2017-07-11  3:00 UTC (permalink / raw)
  To: Michael Ellerman, Peter Zijlstra, Segher Boessenkool
  Cc: acme, jolsa, mingo, alexander.shishkin, kan.liang, ak,
	linuxppc-dev, Linux-kernel, yao.jin



On 7/11/2017 10:28 AM, Michael Ellerman wrote:
> "Jin, Yao" <yao.jin@linux.intel.com> writes:
>
>> On 7/10/2017 9:46 PM, Peter Zijlstra wrote:
>>> On Mon, Jul 10, 2017 at 08:10:50AM -0500, Segher Boessenkool wrote:
>>>
>>>>> PERF_BR_INT is triggered by instruction "int" .
>>>>> PERF_BR_IRQ is triggered by interrupts, traps, faults (the ring 0,3
>>>>> transition).
>>>> So your "PERF_BR_INT" is a system call?
>>> The "INT" thing has indeed been used as system call mechanism (typically
>>> INT 80). But these days we have special purpose syscall instructions.
>>>
>>> It could maybe be compared to the PPC "Unconditional TRAP with
>>> immediate" where you use the immediate value as an index into a handler
>>> vector.
>>>
>>>> And PERF_BR_IRQ is not an interrupt request (as its name suggests),
>>>> not what we call an "external interrupt" either; instead it is every
>>>> interrupt that is not a system call?
>>> It is actual interrupts, but also faults, traps and all the other
>>> exceptions not caused by "INT" I think.
>>>
>> Yes. It's interrupt, traps, faults. If from is in the user space and to
>> is in the kernel, it indicates the ring3 -> ring0 transition.
>>
>> If the from instruction is not syscall or other ring transition
>> instruction, it should be interrupt, traps and faults. That's how we get
>> the PERF_BR_IRQ on x86.
>>
>> Anyway, maybe we just use a minimum but the most common set of branch
>> types now, it could be a good start and acceptable on all architectures.
>>
>> PERF_BR_COND        = 1,    /* conditional */
>> PERF_BR_UNCOND        = 2,    /* unconditional */
>> PERF_BR_IND        = 3,    /* indirect */
>> PERF_BR_CALL        = 4,    /* call */
>> PERF_BR_IND_CALL    = 5,    /* indirect call */
>> PERF_BR_RET        = 6,    /* return */
> That would be fine by me, if you're sick of talking about it and just
> want to get it merged :)
:)
>
> I think you could expand it a bit, this list would cover the vast bulk
> of branch types for us:
>
>    PERF_BR_COND		/* Conditional */
>    PERF_BR_UNCOND	/* Unconditional */
>    PERF_BR_IND		/* Indirect */
>    PERF_BR_CALL		/* Function call */
>    PERF_BR_IND_CALL	/* Indirect function call */
>    PERF_BR_RET		/* Function return */
>    PERF_BR_SYSCALL	/* Syscall */
>    PERF_BR_SYSRET	/* Syscall return */
>    PERF_BR_COND_CALL	/* Conditional function call */
>    PERF_BR_COND_RET	/* Conditional function return */
>
> cheers

OK, accept! Use 4 bits for above branch types and we can reserve 5 for 
potential future types.

Thanks
Jin Yao

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2017-07-11  3:01 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-20 12:07 [PATCH v6 0/7] perf report: Show branch type Jin Yao
2017-04-20  9:36 ` Jiri Olsa
2017-04-23  8:36   ` Jin, Yao
2017-06-02  8:02   ` Jin, Yao
2017-06-26  6:24     ` Jin, Yao
2017-07-06  1:47       ` Jin, Yao
2017-04-20 12:07 ` [PATCH v6 1/7] perf/core: Define the common branch type classification Jin Yao
2017-07-07  8:42   ` Peter Zijlstra
2017-07-10  5:19     ` Michael Ellerman
2017-07-10  6:05   ` Michael Ellerman
2017-07-10  8:16     ` Jin, Yao
2017-07-10 10:32       ` Michael Ellerman
2017-07-10 11:46         ` Jin, Yao
2017-07-10 13:10           ` Segher Boessenkool
2017-07-10 13:28             ` Jin, Yao
2017-07-10 13:46             ` Peter Zijlstra
2017-07-10 14:06               ` Jin, Yao
2017-07-11  2:28                 ` Michael Ellerman
2017-07-11  3:00                   ` Jin, Yao
2017-07-10 14:37               ` Segher Boessenkool
2017-07-11  2:13             ` Michael Ellerman
2017-04-20 12:07 ` [PATCH v6 2/7] perf/x86/intel: Record branch type Jin Yao
2017-04-23 13:55   ` Jiri Olsa
2017-04-24  0:47     ` Jin, Yao
2017-05-08  0:49       ` Jin, Yao
2017-05-09  8:26       ` Jiri Olsa
2017-05-09 11:57         ` Jin, Yao
2017-05-09 12:39           ` Jiri Olsa
2017-05-10  0:18             ` Jin, Yao
2017-04-20 12:07 ` [PATCH v6 3/7] perf record: Create a new option save_type in --branch-filter Jin Yao
2017-04-20 12:07 ` [PATCH v6 4/7] perf report: Refactor the branch info printing code Jin Yao
2017-04-20 12:07 ` [PATCH v6 5/7] perf util: Create branch.c/.h for common branch functions Jin Yao
2017-04-20 12:07 ` [PATCH v6 6/7] perf report: Show branch type statistics for stdio mode Jin Yao
2017-04-20 12:07 ` [PATCH v6 7/7] perf report: Show branch type in callchain entry Jin Yao
2017-07-07  8:09 ` [PATCH v6 0/7] perf report: Show branch type Jiri Olsa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.