* [PATCH v2 0/5] perf report: Show branch type
@ 2017-04-07 10:47 Jin Yao
  2017-04-07 10:47 ` [PATCH v2 1/5] perf/core: Define the common branch type classification Jin Yao
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Jin Yao @ 2017-04-07 10:47 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev, Jin Yao

v2:
---
1. Use 4 bits in perf_branch_entry to record branch type.

2. Pull out some common branch types from FAR_BRANCH. The branch
   types now defined in perf_event.h are:

PERF_BR_NONE      : unknown
PERF_BR_JCC_FWD   : conditional forward jump
PERF_BR_JCC_BWD   : conditional backward jump
PERF_BR_JMP       : jump
PERF_BR_IND_JMP   : indirect jump
PERF_BR_CALL      : call
PERF_BR_IND_CALL  : indirect call
PERF_BR_RET       : return
PERF_BR_SYSCALL   : syscall
PERF_BR_SYSRET    : syscall return
PERF_BR_IRQ       : hw interrupt/trap/fault
PERF_BR_INT       : sw interrupt
PERF_BR_IRET      : return from interrupt
PERF_BR_FAR_BRANCH: other far branches with no generic type

3. Use 2 bits in perf_branch_entry for a "cross" metric that records
   whether a branch crosses a 4K or 2MB area. This is an approximate
   check for whether the branch crosses a 4K page or a 2MB page.

For example:

perf record -g --branch-filter any,save_type <command>

perf report --stdio

     JCC forward:  27.7%
    JCC backward:   9.8%
             JMP:   0.0%
         IND_JMP:   6.5%
            CALL:  26.6%
        IND_CALL:   0.0%
             RET:  29.3%
            IRET:   0.0%
        CROSS_4K:   0.0%
        CROSS_2M:  14.3%

perf report --branch-history --stdio --no-children

    -23.60%--main div.c:42 (RET cycles:2)
             compute_flag div.c:28 (RET cycles:2)
             compute_flag div.c:27 (RET CROSS_2M cycles:1)
             rand rand.c:28 (RET CROSS_2M cycles:1)
             rand rand.c:28 (RET cycles:1)
             __random random.c:298 (RET cycles:1)
             __random random.c:297 (JCC forward cycles:1)
             __random random.c:295 (JCC forward cycles:1)
             __random random.c:295 (JCC forward cycles:1)
             __random random.c:295 (JCC forward cycles:1)
             __random random.c:295 (RET cycles:9)

Changed:
  perf/core: Define the common branch type classification
  perf/x86/intel: Record branch type
  perf report: Show branch type statistics for stdio mode
  perf report: Show branch type in callchain entry

Not changed:
  perf record: Create a new option save_type in --branch-filter

v1:
---
It is often useful to know the branch types while analyzing branch
data. For example, a call is very different from a conditional branch.

Currently we have to look it up in the binary, but the binary may not
be available later, and even when it is available the lookup takes the
user some time. It is very useful for the user to check the branch
type directly in perf report.

Perf already has support for disassembling the branch instruction
to get the branch type.

The patch series records the branch type and shows it with the other
LBR information in the callchain entry via perf report. The patch
series also adds a branch type summary at the end of
perf report --stdio.

To keep the kernel and userspace consistent and to make the
classification more generic, the patch adds the common branch type
classification to perf_event.h.

The common branch types are:

 JCC forward: Conditional forward jump
JCC backward: Conditional backward jump
         JMP: Jump imm
     IND_JMP: Jump reg/mem
        CALL: Call imm
    IND_CALL: Call reg/mem
         RET: Ret
  FAR_BRANCH: SYSCALL/SYSRET, IRQ, IRET, TSX Abort

An example:

1. Record branch type (new option "save_type")

perf record -g --branch-filter any,save_type <command>

2. Show the branch type statistics at the end of perf report --stdio

perf report --stdio

     JCC forward:  34.0%
    JCC backward:   3.6%
             JMP:   0.0%
         IND_JMP:   6.5%
            CALL:  26.6%
        IND_CALL:   0.0%
             RET:  29.3%
      FAR_BRANCH:   0.0%

3. Show branch type in callchain entry

perf report --branch-history --stdio --no-children

    --23.91%--main div.c:42 (RET cycles:2)
              compute_flag div.c:28 (RET cycles:2)
              compute_flag div.c:27 (RET cycles:1)
              rand rand.c:28 (RET cycles:1)
              rand rand.c:28 (RET cycles:1)
              __random random.c:298 (RET cycles:1)
              __random random.c:297 (JCC forward cycles:1)
              __random random.c:295 (JCC forward cycles:1)
              __random random.c:295 (JCC forward cycles:1)
              __random random.c:295 (JCC forward cycles:1)
              __random random.c:295 (RET cycles:9)

Jin Yao (5):
  perf/core: Define the common branch type classification
  perf/x86/intel: Record branch type
  perf record: Create a new option save_type in --branch-filter
  perf report: Show branch type statistics for stdio mode
  perf report: Show branch type in callchain entry

 arch/x86/events/intel/lbr.c              | 106 ++++++++++++++-
 include/uapi/linux/perf_event.h          |  37 +++++-
 tools/include/uapi/linux/perf_event.h    |  37 +++++-
 tools/perf/Documentation/perf-record.txt |   1 +
 tools/perf/builtin-report.c              | 212 +++++++++++++++++++++++++++++
 tools/perf/util/callchain.c              | 221 ++++++++++++++++++++++---------
 tools/perf/util/callchain.h              |  20 +++
 tools/perf/util/event.h                  |   4 +-
 tools/perf/util/hist.c                   |   5 +-
 tools/perf/util/parse-branch-options.c   |   1 +
 10 files changed, 577 insertions(+), 67 deletions(-)

-- 
2.7.4

* [PATCH v2 1/5] perf/core: Define the common branch type classification
  2017-04-07 10:47 [PATCH v2 0/5] perf report: Show branch type Jin Yao
@ 2017-04-07 10:47 ` Jin Yao
  2017-04-07 10:47 ` [PATCH v2 2/5] perf/x86/intel: Record branch type Jin Yao
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Jin Yao @ 2017-04-07 10:47 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev, Jin Yao

It is often useful to know the branch types while analyzing branch
data. For example, a call is very different from a conditional branch.

Currently we have to look it up in the binary, but the binary may not
be available later, and even when it is available the lookup takes the
user some time. It is very useful for the user to check the branch
type directly in perf report.

Perf already has support for disassembling the branch instruction
to get the x86 branch type.

To keep the kernel and userspace consistent and to make the
classification more generic, the patch adds the common branch type
classification to perf_event.h.

PERF_BR_NONE      : unknown
PERF_BR_JCC_FWD   : conditional forward jump
PERF_BR_JCC_BWD   : conditional backward jump
PERF_BR_JMP       : jump
PERF_BR_IND_JMP   : indirect jump
PERF_BR_CALL      : call
PERF_BR_IND_CALL  : indirect call
PERF_BR_RET       : return
PERF_BR_SYSCALL   : syscall
PERF_BR_SYSRET    : syscall return
PERF_BR_IRQ       : hw interrupt/trap/fault
PERF_BR_INT       : sw interrupt
PERF_BR_IRET      : return from interrupt
PERF_BR_FAR_BRANCH: other branches with no generic type

The patch adds the following metrics for checking whether a branch
crosses a 4K or 2MB area.

PERF_BR_CROSS_NONE: branch does not cross an area
PERF_BR_CROSS_4K  : branch crosses a 4K area
PERF_BR_CROSS_2M  : branch crosses a 2MB area

Since disassembling the branch instruction adds some overhead, a new
PERF_SAMPLE_BRANCH_TYPE_SAVE flag is introduced to indicate whether
the kernel should disassemble the branch instruction and record the
branch type.
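
The bit budget of the new fields can be checked with a small
standalone sketch. It is illustrative only: the struct and helper
names are hypothetical, the mispred and predicted bits are not visible
in the hunk below and are assumed from the bit arithmetic (44 reserved
bits before the patch, 38 after), and 64-bit bitfields rely on
GCC/Clang behaviour.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the flag word of perf_branch_entry after
 * this patch. Widths of in_tx, abort, cycles, type, cross and reserved
 * follow the diff; mispred and predicted are assumed.
 */
struct branch_flags_sketch {
	uint64_t mispred:1,   /* assumed: target mispredicted */
		 predicted:1, /* assumed: target predicted */
		 in_tx:1,     /* in transaction */
		 abort:1,     /* transaction abort */
		 cycles:16,   /* cycle count to last branch */
		 type:4,      /* branch type, values 0..13 are defined */
		 cross:2,     /* 0 = none, 1 = 4K, 2 = 2MB */
		 reserved:38;
};

/* 1 + 1 + 1 + 1 + 16 + 4 + 2 + 38 = 64, so the word stays 8 bytes. */
_Static_assert(sizeof(struct branch_flags_sketch) == 8,
	       "flag word must stay 64 bits");

/* Build a flag word, rejecting values that do not fit their field. */
static struct branch_flags_sketch
make_flags(unsigned int type, unsigned int cross, unsigned int cycles)
{
	struct branch_flags_sketch f = {0};

	assert(type < 16 && cross < 4 && cycles < 65536);
	f.type = type;
	f.cross = cross;
	f.cycles = cycles;
	return f;
}
```

Four bits leave room for two more branch types beyond the 14 defined
here, and two bits leave one spare "cross" value.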

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 include/uapi/linux/perf_event.h       | 37 ++++++++++++++++++++++++++++++++++-
 tools/include/uapi/linux/perf_event.h | 37 ++++++++++++++++++++++++++++++++++-
 2 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index d09a9cd..e2fcd53 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
 	PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT	= 14, /* no flags */
 	PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT	= 15, /* no cycles */
 
+	PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT	= 16, /* save branch type */
+
 	PERF_SAMPLE_BRANCH_MAX_SHIFT		/* non-ABI */
 };
 
@@ -198,9 +200,38 @@ enum perf_branch_sample_type {
 	PERF_SAMPLE_BRANCH_NO_FLAGS	= 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
 	PERF_SAMPLE_BRANCH_NO_CYCLES	= 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
 
+	PERF_SAMPLE_BRANCH_TYPE_SAVE	=
+		1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
+
 	PERF_SAMPLE_BRANCH_MAX		= 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
 };
 
+/*
+ * Common flow change classification
+ */
+enum {
+	PERF_BR_NONE		= 0,	/* unknown */
+	PERF_BR_JCC_FWD		= 1,	/* conditional forward jump */
+	PERF_BR_JCC_BWD		= 2,	/* conditional backward jump */
+	PERF_BR_JMP		= 3,	/* jump */
+	PERF_BR_IND_JMP		= 4,	/* indirect jump */
+	PERF_BR_CALL		= 5,	/* call */
+	PERF_BR_IND_CALL	= 6,	/* indirect call */
+	PERF_BR_RET		= 7,	/* return */
+	PERF_BR_SYSCALL		= 8,	/* syscall */
+	PERF_BR_SYSRET		= 9,	/* syscall return */
+	PERF_BR_IRQ		= 10,	/* hw interrupt/trap/fault */
+	PERF_BR_INT		= 11,	/* sw interrupt */
+	PERF_BR_IRET		= 12,	/* return from interrupt */
+	PERF_BR_FAR_BRANCH	= 13,	/* others not generic branch type */
+};
+
+enum {
+	PERF_BR_CROSS_NONE	= 0,	/* branch not cross an area */
+	PERF_BR_CROSS_4K	= 1,	/* branch cross 4K */
+	PERF_BR_CROSS_2M	= 2,	/* branch cross 2MB */
+};
+
 #define PERF_SAMPLE_BRANCH_PLM_ALL \
 	(PERF_SAMPLE_BRANCH_USER|\
 	 PERF_SAMPLE_BRANCH_KERNEL|\
@@ -999,6 +1030,8 @@ union perf_mem_data_src {
  *     in_tx: running in a hardware transaction
  *     abort: aborting a hardware transaction
  *    cycles: cycles from last branch (or 0 if not supported)
+ *      type: branch type
+ *     cross: branch cross 4K or 2MB area
  */
 struct perf_branch_entry {
 	__u64	from;
@@ -1008,7 +1041,9 @@ struct perf_branch_entry {
 		in_tx:1,    /* in transaction */
 		abort:1,    /* transaction abort */
 		cycles:16,  /* cycle count to last branch */
-		reserved:44;
+		type:4,     /* branch type */
+		cross:2,    /* branch cross 4K or 2MB area */
+		reserved:38;
 };
 
 #endif /* _UAPI_LINUX_PERF_EVENT_H */
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index d09a9cd..e2fcd53 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -174,6 +174,8 @@ enum perf_branch_sample_type_shift {
 	PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT	= 14, /* no flags */
 	PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT	= 15, /* no cycles */
 
+	PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT	= 16, /* save branch type */
+
 	PERF_SAMPLE_BRANCH_MAX_SHIFT		/* non-ABI */
 };
 
@@ -198,9 +200,38 @@ enum perf_branch_sample_type {
 	PERF_SAMPLE_BRANCH_NO_FLAGS	= 1U << PERF_SAMPLE_BRANCH_NO_FLAGS_SHIFT,
 	PERF_SAMPLE_BRANCH_NO_CYCLES	= 1U << PERF_SAMPLE_BRANCH_NO_CYCLES_SHIFT,
 
+	PERF_SAMPLE_BRANCH_TYPE_SAVE	=
+		1U << PERF_SAMPLE_BRANCH_TYPE_SAVE_SHIFT,
+
 	PERF_SAMPLE_BRANCH_MAX		= 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
 };
 
+/*
+ * Common flow change classification
+ */
+enum {
+	PERF_BR_NONE		= 0,	/* unknown */
+	PERF_BR_JCC_FWD		= 1,	/* conditional forward jump */
+	PERF_BR_JCC_BWD		= 2,	/* conditional backward jump */
+	PERF_BR_JMP		= 3,	/* jump */
+	PERF_BR_IND_JMP		= 4,	/* indirect jump */
+	PERF_BR_CALL		= 5,	/* call */
+	PERF_BR_IND_CALL	= 6,	/* indirect call */
+	PERF_BR_RET		= 7,	/* return */
+	PERF_BR_SYSCALL		= 8,	/* syscall */
+	PERF_BR_SYSRET		= 9,	/* syscall return */
+	PERF_BR_IRQ		= 10,	/* hw interrupt/trap/fault */
+	PERF_BR_INT		= 11,	/* sw interrupt */
+	PERF_BR_IRET		= 12,	/* return from interrupt */
+	PERF_BR_FAR_BRANCH	= 13,	/* others not generic branch type */
+};
+
+enum {
+	PERF_BR_CROSS_NONE	= 0,	/* branch not cross an area */
+	PERF_BR_CROSS_4K	= 1,	/* branch cross 4K */
+	PERF_BR_CROSS_2M	= 2,	/* branch cross 2MB */
+};
+
 #define PERF_SAMPLE_BRANCH_PLM_ALL \
 	(PERF_SAMPLE_BRANCH_USER|\
 	 PERF_SAMPLE_BRANCH_KERNEL|\
@@ -999,6 +1030,8 @@ union perf_mem_data_src {
  *     in_tx: running in a hardware transaction
  *     abort: aborting a hardware transaction
  *    cycles: cycles from last branch (or 0 if not supported)
+ *      type: branch type
+ *     cross: branch cross 4K or 2MB area
  */
 struct perf_branch_entry {
 	__u64	from;
@@ -1008,7 +1041,9 @@ struct perf_branch_entry {
 		in_tx:1,    /* in transaction */
 		abort:1,    /* transaction abort */
 		cycles:16,  /* cycle count to last branch */
-		reserved:44;
+		type:4,     /* branch type */
+		cross:2,    /* branch cross 4K or 2MB area */
+		reserved:38;
 };
 
 #endif /* _UAPI_LINUX_PERF_EVENT_H */
-- 
2.7.4

* [PATCH v2 2/5] perf/x86/intel: Record branch type
  2017-04-07 10:47 [PATCH v2 0/5] perf report: Show branch type Jin Yao
  2017-04-07 10:47 ` [PATCH v2 1/5] perf/core: Define the common branch type classification Jin Yao
@ 2017-04-07 10:47 ` Jin Yao
  2017-04-07 15:20   ` Peter Zijlstra
  2017-04-07 10:47 ` [PATCH v2 3/5] perf record: Create a new option save_type in --branch-filter Jin Yao
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Jin Yao @ 2017-04-07 10:47 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev, Jin Yao

Perf already has support for disassembling the branch instruction
and using the branch type for filtering. The patch just records
the branch type in perf_branch_entry.

Before recording, the patch converts the x86 branch classification
to the common branch classification and checks whether the branch
crosses a 4K or 2MB area. This is an approximate check for crossing
a 4K page or a 2MB page.
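
The two address-based decisions described above can be sketched in
userspace as follows. This is a minimal illustration, not the kernel
code: the helper names cross_kind() and jcc_direction() and the local
enum names are hypothetical, and the mask is computed in 64 bits here
rather than from an int size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the values defined in patch 1 (names shortened). */
enum { CROSS_NONE = 0, CROSS_4K = 1, CROSS_2M = 2 };
enum { JCC_FWD = 1, JCC_BWD = 2 };

#define AREA_4K UINT64_C(4096)
#define AREA_2M (UINT64_C(2) * 1024 * 1024)

/* True when a and b lie in different size-aligned areas. */
static bool cross_area(uint64_t a, uint64_t b, uint64_t size)
{
	/* size must be a non-zero power of two for the mask to work */
	assert(size != 0 && (size & (size - 1)) == 0);
	return (a & ~(size - 1)) != (b & ~(size - 1));
}

/* Report the largest area crossed; a 2MB crossing hides the 4K one. */
static int cross_kind(uint64_t from, uint64_t to)
{
	if (cross_area(from, to, AREA_2M))
		return CROSS_2M;
	if (cross_area(from, to, AREA_4K))
		return CROSS_4K;
	return CROSS_NONE;
}

/* A conditional jump is "forward" when the target is above the source. */
static int jcc_direction(uint64_t from, uint64_t to)
{
	return to > from ? JCC_FWD : JCC_BWD;
}
```

Because only the aligned addresses are compared, the result says
nothing about the real page size backing the mapping, which is why the
check is approximate.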

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 arch/x86/events/intel/lbr.c | 106 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 105 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 81b321a..635a0fb 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -109,6 +109,9 @@ enum {
 	X86_BR_ZERO_CALL	= 1 << 15,/* zero length call */
 	X86_BR_CALL_STACK	= 1 << 16,/* call stack */
 	X86_BR_IND_JMP		= 1 << 17,/* indirect jump */
+
+	X86_BR_TYPE_SAVE	= 1 << 18,/* indicate to save branch type */
+
 };
 
 #define X86_BR_PLM (X86_BR_USER | X86_BR_KERNEL)
@@ -139,6 +142,9 @@ enum {
 	 X86_BR_IRQ		|\
 	 X86_BR_INT)
 
+#define AREA_4K		4096
+#define AREA_2M		(2 * 1024 * 1024)
+
 static void intel_pmu_lbr_filter(struct cpu_hw_events *cpuc);
 
 /*
@@ -670,6 +676,10 @@ static int intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
 
 	if (br_type & PERF_SAMPLE_BRANCH_CALL)
 		mask |= X86_BR_CALL | X86_BR_ZERO_CALL;
+
+	if (br_type & PERF_SAMPLE_BRANCH_TYPE_SAVE)
+		mask |= X86_BR_TYPE_SAVE;
+
 	/*
 	 * stash actual user request into reg, it may
 	 * be used by fixup code for some CPU
@@ -923,6 +933,84 @@ static int branch_type(unsigned long from, unsigned long to, int abort)
 	return ret;
 }
 
+static int
+common_branch_type(int type, u64 from, u64 to)
+{
+	int ret;
+
+	type = type & (~(X86_BR_KERNEL | X86_BR_USER));
+
+	switch (type) {
+	case X86_BR_CALL:
+	case X86_BR_ZERO_CALL:
+		ret = PERF_BR_CALL;
+		break;
+
+	case X86_BR_RET:
+		ret = PERF_BR_RET;
+		break;
+
+	case X86_BR_SYSCALL:
+		ret = PERF_BR_SYSCALL;
+		break;
+
+	case X86_BR_SYSRET:
+		ret = PERF_BR_SYSRET;
+		break;
+
+	case X86_BR_INT:
+		ret = PERF_BR_INT;
+		break;
+
+	case X86_BR_IRET:
+		ret = PERF_BR_IRET;
+		break;
+
+	case X86_BR_IRQ:
+		ret = PERF_BR_IRQ;
+		break;
+
+	case X86_BR_ABORT:
+		ret = PERF_BR_FAR_BRANCH;
+		break;
+
+	case X86_BR_JCC:
+		if (to > from)
+			ret = PERF_BR_JCC_FWD;
+		else
+			ret = PERF_BR_JCC_BWD;
+		break;
+
+	case X86_BR_JMP:
+		ret = PERF_BR_JMP;
+		break;
+
+	case X86_BR_IND_CALL:
+		ret = PERF_BR_IND_CALL;
+		break;
+
+	case X86_BR_IND_JMP:
+		ret = PERF_BR_IND_JMP;
+		break;
+
+	default:
+		ret = PERF_BR_NONE;
+	}
+
+	return ret;
+}
+
+static bool
+cross_area(u64 addr1, u64 addr2, int size)
+{
+	u64 align1, align2;
+
+	align1 = addr1 & ~(size - 1);
+	align2 = addr2 & ~(size - 1);
+
+	return (align1 != align2) ? true : false;
+}
+
 /*
  * implement actual branch filter based on user demand.
  * Hardware may not exactly satisfy that request, thus
@@ -939,7 +1027,8 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
 	bool compress = false;
 
 	/* if sampling all branches, then nothing to filter */
-	if ((br_sel & X86_BR_ALL) == X86_BR_ALL)
+	if (((br_sel & X86_BR_ALL) == X86_BR_ALL) &&
+	    ((br_sel & X86_BR_TYPE_SAVE) != X86_BR_TYPE_SAVE))
 		return;
 
 	for (i = 0; i < cpuc->lbr_stack.nr; i++) {
@@ -960,6 +1049,21 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
 			cpuc->lbr_entries[i].from = 0;
 			compress = true;
 		}
+
+		if ((br_sel & X86_BR_TYPE_SAVE) == X86_BR_TYPE_SAVE) {
+			cpuc->lbr_entries[i].type = common_branch_type(type,
+								       from,
+								       to);
+			if (cross_area(from, to, AREA_2M))
+				cpuc->lbr_entries[i].cross = PERF_BR_CROSS_2M;
+			else if (cross_area(from, to, AREA_4K))
+				cpuc->lbr_entries[i].cross = PERF_BR_CROSS_4K;
+			else
+				cpuc->lbr_entries[i].cross = PERF_BR_CROSS_NONE;
+		} else {
+			cpuc->lbr_entries[i].type = PERF_BR_NONE;
+			cpuc->lbr_entries[i].cross = PERF_BR_CROSS_NONE;
+		}
 	}
 
 	if (!compress)
-- 
2.7.4

* [PATCH v2 3/5] perf record: Create a new option save_type in --branch-filter
  2017-04-07 10:47 [PATCH v2 0/5] perf report: Show branch type Jin Yao
  2017-04-07 10:47 ` [PATCH v2 1/5] perf/core: Define the common branch type classification Jin Yao
  2017-04-07 10:47 ` [PATCH v2 2/5] perf/x86/intel: Record branch type Jin Yao
@ 2017-04-07 10:47 ` Jin Yao
  2017-04-07 10:47 ` [PATCH v2 4/5] perf report: Show branch type statistics for stdio mode Jin Yao
  2017-04-07 10:47 ` [PATCH v2 5/5] perf report: Show branch type in callchain entry Jin Yao
  4 siblings, 0 replies; 11+ messages in thread
From: Jin Yao @ 2017-04-07 10:47 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev, Jin Yao

The option tells the kernel to save the branch type during sampling.

One example:
perf record -g --branch-filter any,save_type <command>
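
How a filter string such as "any,save_type" ends up as a bit mask can
be sketched like this. It is a simplified illustration, not the code
in parse-branch-options.c: the table, struct and function names are
hypothetical, only two options are listed, and the value used for
"any" is an assumption (only the shift of 16 for save_type comes from
patch 1).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BRANCH_ANY       (1U << 3)  /* assumed, not shown in this thread */
#define SKETCH_BRANCH_TYPE_SAVE (1U << 16) /* shift 16 as defined in patch 1 */

struct branch_mode_sketch {
	const char *name;
	uint32_t mode;
};

static const struct branch_mode_sketch modes[] = {
	{ "any",       SKETCH_BRANCH_ANY },
	{ "save_type", SKETCH_BRANCH_TYPE_SAVE },
};

/*
 * Parse a comma-separated filter list into a mask.
 * Returns 0 on success and -1 on an unknown or empty name.
 */
static int parse_branch_filter(const char *str, uint32_t *mask)
{
	const size_t n = sizeof(modes) / sizeof(modes[0]);

	assert(str && mask);
	*mask = 0;

	while (*str) {
		size_t len = strcspn(str, ","); /* length of this token */
		size_t i;

		for (i = 0; i < n; i++) {
			if (strlen(modes[i].name) == len &&
			    strncmp(str, modes[i].name, len) == 0) {
				*mask |= modes[i].mode;
				break;
			}
		}
		if (i == n)
			return -1;

		str += len;
		if (*str == ',')
			str++;
	}
	return 0;
}
```

save_type only adds a bit on top of a real filter such as any, so it
does not change which branches are sampled.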

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/Documentation/perf-record.txt | 1 +
 tools/perf/util/parse-branch-options.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index ea3789d..e2f5a4f 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -332,6 +332,7 @@ following filters are defined:
 	- no_tx: only when the target is not in a hardware transaction
 	- abort_tx: only when the target is a hardware transaction abort
 	- cond: conditional branches
+	- save_type: save branch type during sampling in case binary is not available later
 
 +
 The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
diff --git a/tools/perf/util/parse-branch-options.c b/tools/perf/util/parse-branch-options.c
index 38fd115..e71fb5f 100644
--- a/tools/perf/util/parse-branch-options.c
+++ b/tools/perf/util/parse-branch-options.c
@@ -28,6 +28,7 @@ static const struct branch_mode branch_modes[] = {
 	BRANCH_OPT("cond", PERF_SAMPLE_BRANCH_COND),
 	BRANCH_OPT("ind_jmp", PERF_SAMPLE_BRANCH_IND_JUMP),
 	BRANCH_OPT("call", PERF_SAMPLE_BRANCH_CALL),
+	BRANCH_OPT("save_type", PERF_SAMPLE_BRANCH_TYPE_SAVE),
 	BRANCH_END
 };
 
-- 
2.7.4

* [PATCH v2 4/5] perf report: Show branch type statistics for stdio mode
  2017-04-07 10:47 [PATCH v2 0/5] perf report: Show branch type Jin Yao
                   ` (2 preceding siblings ...)
  2017-04-07 10:47 ` [PATCH v2 3/5] perf record: Create a new option save_type in --branch-filter Jin Yao
@ 2017-04-07 10:47 ` Jin Yao
  2017-04-07 10:47 ` [PATCH v2 5/5] perf report: Show branch type in callchain entry Jin Yao
  4 siblings, 0 replies; 11+ messages in thread
From: Jin Yao @ 2017-04-07 10:47 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev, Jin Yao

Show the branch type statistics at the end of perf report --stdio.

For example:
perf report --stdio

 JCC forward:  27.7%
JCC backward:   9.8%
         JMP:   0.0%
     IND_JMP:   6.5%
        CALL:  26.6%
    IND_CALL:   0.0%
         RET:  29.3%
        IRET:   0.0%
    CROSS_4K:   0.0%
    CROSS_2M:  14.3%

The branch types are:
---------------------
 JCC forward: Conditional forward jump
JCC backward: Conditional backward jump
         JMP: Jump imm
     IND_JMP: Jump reg/mem
        CALL: Call imm
    IND_CALL: Call reg/mem
         RET: Ret
     SYSCALL: Syscall
      SYSRET: Syscall return
         IRQ: HW interrupt/trap/fault
         INT: SW interrupt
        IRET: Return from interrupt
  FAR_BRANCH: Other branches with no generic type

CROSS_4K and CROSS_2M:
----------------------
These metrics check whether a branch crosses a 4K or 2MB page.
The check is approximate. We don't know whether the area is backed by
4K or 2MB pages, so both are always computed.

To keep the output simple, if a branch crosses a 2MB area, CROSS_4K
is not incremented.
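
The counting rule and the percentage base can be sketched as below.
This is a simplified illustration: the patch keeps one named counter
per type, while this sketch uses an array indexed by the type value,
and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { BR_NONE = 0, BR_TYPE_MAX = 14 };            /* types 0..13 from patch 1 */
enum { CROSS_NONE = 0, CROSS_4K = 1, CROSS_2M = 2 };

struct stat_sketch {
	uint64_t by_type[BR_TYPE_MAX]; /* index 0 (unknown) is never counted */
	uint64_t cross_4k;
	uint64_t cross_2m;
};

/* Count one branch; a 2MB crossing is not also counted as a 4K one. */
static void stat_count(struct stat_sketch *st, unsigned int type,
		       unsigned int cross)
{
	if (type > BR_NONE && type < BR_TYPE_MAX)
		st->by_type[type]++;

	if (cross == CROSS_2M)
		st->cross_2m++;
	else if (cross == CROSS_4K)
		st->cross_4k++;
}

/* The total covers the type counters only, not the cross counters. */
static uint64_t stat_total(const struct stat_sketch *st)
{
	uint64_t total = 0;
	int i;

	for (i = BR_NONE + 1; i < BR_TYPE_MAX; i++)
		total += st->by_type[i];
	return total;
}

/* Percentage of n relative to total, 0.0 when nothing was counted. */
static double stat_percent(uint64_t n, uint64_t total)
{
	return total ? 100.0 * (double)n / (double)total : 0.0;
}
```

Since CROSS_4K and CROSS_2M are divided by the same total as the type
counters, the type percentages sum to about 100% and the cross
percentages are reported on top of that.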

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/builtin-report.c | 212 ++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/event.h     |   4 +-
 tools/perf/util/hist.c      |   5 +-
 3 files changed, 216 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index c18158b..1dc1058 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -43,6 +43,24 @@
 #include <linux/bitmap.h>
 #include <linux/stringify.h>
 
+struct branch_type_stat {
+	u64	jcc_fwd;
+	u64	jcc_bwd;
+	u64	jmp;
+	u64	ind_jmp;
+	u64	call;
+	u64	ind_call;
+	u64	ret;
+	u64	syscall;
+	u64	sysret;
+	u64	irq;
+	u64	intr;
+	u64	iret;
+	u64	far_branch;
+	u64	cross_4k;
+	u64	cross_2m;
+};
+
 struct report {
 	struct perf_tool	tool;
 	struct perf_session	*session;
@@ -66,6 +84,7 @@ struct report {
 	u64			queue_size;
 	int			socket_filter;
 	DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
+	struct branch_type_stat	brtype_stat;
 };
 
 static int report__config(const char *var, const char *value, void *cb)
@@ -144,6 +163,91 @@ static int hist_iter__report_callback(struct hist_entry_iter *iter,
 	return err;
 }
 
+static void branch_type_count(struct report *rep, struct branch_info *bi)
+{
+	struct branch_type_stat	*stat = &rep->brtype_stat;
+	struct branch_flags *flags = &bi->flags;
+
+	switch (flags->type) {
+	case PERF_BR_JCC_FWD:
+		stat->jcc_fwd++;
+		break;
+
+	case PERF_BR_JCC_BWD:
+		stat->jcc_bwd++;
+		break;
+
+	case PERF_BR_JMP:
+		stat->jmp++;
+		break;
+
+	case PERF_BR_IND_JMP:
+		stat->ind_jmp++;
+		break;
+
+	case PERF_BR_CALL:
+		stat->call++;
+		break;
+
+	case PERF_BR_IND_CALL:
+		stat->ind_call++;
+		break;
+
+	case PERF_BR_RET:
+		stat->ret++;
+		break;
+
+	case PERF_BR_SYSCALL:
+		stat->syscall++;
+		break;
+
+	case PERF_BR_SYSRET:
+		stat->sysret++;
+		break;
+
+	case PERF_BR_IRQ:
+		stat->irq++;
+		break;
+
+	case PERF_BR_INT:
+		stat->intr++;
+		break;
+
+	case PERF_BR_IRET:
+		stat->iret++;
+		break;
+
+	case PERF_BR_FAR_BRANCH:
+		stat->far_branch++;
+		break;
+
+	default:
+		break;
+	}
+
+	if (flags->cross == PERF_BR_CROSS_2M)
+		stat->cross_2m++;
+	else if (flags->cross == PERF_BR_CROSS_4K)
+		stat->cross_4k++;
+}
+
+static int hist_iter__branch_callback(struct hist_entry_iter *iter,
+				      struct addr_location *al __maybe_unused,
+				      bool single __maybe_unused,
+				      void *arg)
+{
+	struct hist_entry *he = iter->he;
+	struct report *rep = arg;
+	struct branch_info *bi;
+
+	if (sort__mode == SORT_MODE__BRANCH) {
+		bi = he->branch_info;
+		branch_type_count(rep, bi);
+	}
+
+	return 0;
+}
+
 static int process_sample_event(struct perf_tool *tool,
 				union perf_event *event,
 				struct perf_sample *sample,
@@ -182,6 +286,8 @@ static int process_sample_event(struct perf_tool *tool,
 		 */
 		if (!sample->branch_stack)
 			goto out_put;
+
+		iter.add_entry_cb = hist_iter__branch_callback;
 		iter.ops = &hist_iter_branch;
 	} else if (rep->mem_mode) {
 		iter.ops = &hist_iter_mem;
@@ -369,6 +475,107 @@ static size_t hists__fprintf_nr_sample_events(struct hists *hists, struct report
 	return ret + fprintf(fp, "\n#\n");
 }
 
+static void branch_type_stat_display(FILE *fp, struct branch_type_stat *stat)
+{
+	u64 total = 0;
+
+	total += stat->jcc_fwd;
+	total += stat->jcc_bwd;
+	total += stat->jmp;
+	total += stat->ind_jmp;
+	total += stat->call;
+	total += stat->ind_call;
+	total += stat->ret;
+	total += stat->syscall;
+	total += stat->sysret;
+	total += stat->irq;
+	total += stat->intr;
+	total += stat->iret;
+	total += stat->far_branch;
+
+	if (total == 0)
+		return;
+
+	fprintf(fp, "\n#");
+	fprintf(fp, "\n# Branch Statistics:");
+	fprintf(fp, "\n#");
+
+	if (stat->jcc_fwd > 0)
+		fprintf(fp, "\n%12s: %5.1f%%",
+			"JCC forward",
+			100.0 * (double)stat->jcc_fwd / (double)total);
+
+	if (stat->jcc_bwd > 0)
+		fprintf(fp, "\n%12s: %5.1f%%",
+			"JCC backward",
+			100.0 * (double)stat->jcc_bwd / (double)total);
+
+	if (stat->jmp > 0)
+		fprintf(fp, "\n%12s: %5.1f%%",
+			"JMP",
+			100.0 * (double)stat->jmp / (double)total);
+
+	if (stat->ind_jmp > 0)
+		fprintf(fp, "\n%12s: %5.1f%%",
+			"IND_JMP",
+			100.0 * (double)stat->ind_jmp / (double)total);
+
+	if (stat->call > 0)
+		fprintf(fp, "\n%12s: %5.1f%%",
+			"CALL",
+			100.0 * (double)stat->call / (double)total);
+
+	if (stat->ind_call > 0)
+		fprintf(fp, "\n%12s: %5.1f%%",
+			"IND_CALL",
+			100.0 * (double)stat->ind_call / (double)total);
+
+	if (stat->ret > 0)
+		fprintf(fp, "\n%12s: %5.1f%%",
+			"RET",
+			100.0 * (double)stat->ret / (double)total);
+
+	if (stat->syscall > 0)
+		fprintf(fp, "\n%12s: %5.1f%%",
+			"SYSCALL",
+			100.0 * (double)stat->syscall / (double)total);
+
+	if (stat->sysret > 0)
+		fprintf(fp, "\n%12s: %5.1f%%",
+			"SYSRET",
+			100.0 * (double)stat->sysret / (double)total);
+
+	if (stat->irq > 0)
+		fprintf(fp, "\n%12s: %5.1f%%",
+			"IRQ",
+			100.0 * (double)stat->irq / (double)total);
+
+	if (stat->intr > 0)
+		fprintf(fp, "\n%12s: %5.1f%%",
+			"INT",
+			100.0 * (double)stat->intr / (double)total);
+
+	if (stat->iret > 0)
+		fprintf(fp, "\n%12s: %5.1f%%",
+			"IRET",
+			100.0 * (double)stat->iret / (double)total);
+
+	if (stat->far_branch > 0)
+		fprintf(fp, "\n%12s: %5.1f%%",
+			"FAR_BRANCH",
+			100.0 * (double)stat->far_branch / (double)total);
+
+	if (stat->cross_4k > 0)
+		fprintf(fp, "\n%12s: %5.1f%%",
+			"CROSS_4K",
+			100.0 * (double)stat->cross_4k / (double)total);
+
+	if (stat->cross_2m > 0)
+		fprintf(fp, "\n%12s: %5.1f%%",
+			"CROSS_2M",
+			100.0 * (double)stat->cross_2m / (double)total);
+}
+
 static int perf_evlist__tty_browse_hists(struct perf_evlist *evlist,
 					 struct report *rep,
 					 const char *help)
@@ -404,6 +611,9 @@ static int perf_evlist__tty_browse_hists(struct perf_evlist *evlist,
 		perf_read_values_destroy(&rep->show_threads_values);
 	}
 
+	if (sort__mode == SORT_MODE__BRANCH)
+		branch_type_stat_display(stdout, &rep->brtype_stat);
+
 	return 0;
 }
 
@@ -936,6 +1146,8 @@ int cmd_report(int argc, const char **argv)
 	if (has_br_stack && branch_call_mode)
 		symbol_conf.show_branchflag_count = true;
 
+	memset(&report.brtype_stat, 0, sizeof(struct branch_type_stat));
+
 	/*
 	 * Branch mode is a tristate:
 	 * -1 means default, so decide based on the file having branch data.
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index eb7a7b2..b192a10 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -142,7 +142,9 @@ struct branch_flags {
 	u64 in_tx:1;
 	u64 abort:1;
 	u64 cycles:16;
-	u64 reserved:44;
+	u64 type:4;
+	u64 cross:2;
+	u64 reserved:38;
 };
 
 struct branch_entry {
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 61bf304..c8aee25 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -745,12 +745,9 @@ iter_prepare_branch_entry(struct hist_entry_iter *iter, struct addr_location *al
 }
 
 static int
-iter_add_single_branch_entry(struct hist_entry_iter *iter,
+iter_add_single_branch_entry(struct hist_entry_iter *iter __maybe_unused,
 			     struct addr_location *al __maybe_unused)
 {
-	/* to avoid calling callback function */
-	iter->he = NULL;
-
 	return 0;
 }
 
-- 
2.7.4

* [PATCH v2 5/5] perf report: Show branch type in callchain entry
  2017-04-07 10:47 [PATCH v2 0/5] perf report: Show branch type Jin Yao
                   ` (3 preceding siblings ...)
  2017-04-07 10:47 ` [PATCH v2 4/5] perf report: Show branch type statistics for stdio mode Jin Yao
@ 2017-04-07 10:47 ` Jin Yao
  4 siblings, 0 replies; 11+ messages in thread
From: Jin Yao @ 2017-04-07 10:47 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, linuxppc-dev, Jin Yao

Show branch type in callchain entry. The branch type is printed
with other LBR information (such as cycles/abort/...).

One example:
perf report --branch-history --stdio --no-children

-23.60%--main div.c:42 (RET cycles:2)
         compute_flag div.c:28 (RET cycles:2)
         compute_flag div.c:27 (RET CROSS_2M cycles:1)
         rand rand.c:28 (RET CROSS_2M cycles:1)
         rand rand.c:28 (RET cycles:1)
         __random random.c:298 (RET cycles:1)
         __random random.c:297 (JCC forward cycles:1)
         __random random.c:295 (JCC forward cycles:1)
         __random random.c:295 (JCC forward cycles:1)
         __random random.c:295 (JCC forward cycles:1)
         __random random.c:295 (RET cycles:9)
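
How the tag part of an entry such as " (RET CROSS_2M" is assembled
from the per-entry counters can be sketched in userspace as follows.
This is an illustration only: tag_str() and TAG_MAX are hypothetical
names, plain integer indexes replace the BR_IDX_* enum, and standard
snprintf() is used instead of the kernel-style scnprintf(), so
truncation is handled explicitly.

```c
#include <assert.h>
#include <stdio.h>

#define TAG_MAX 15

/* Same order as the br_tag[] table in the diff below. */
static const char *const tags[TAG_MAX] = {
	"JCC forward", "JCC backward", "JMP", "IND_JMP", "CALL",
	"IND_CALL", "RET", "SYSCALL", "SYSRET", "IRQ", "INT", "IRET",
	"FAR_BRANCH", "CROSS_4K", "CROSS_2M",
};

/*
 * Append the tag of every non-zero counter to buf. The first tag is
 * prefixed with " (" and later ones with " "; the caller appends the
 * remaining fields and the closing parenthesis.
 * Returns the number of characters stored, excluding the NUL.
 */
static int tag_str(const int *counts, char *buf, size_t size)
{
	size_t used = 0;
	int first = 1;
	int i;

	assert(counts && buf && size > 0);
	buf[0] = '\0';

	for (i = 0; i < TAG_MAX; i++) {
		int n;

		if (counts[i] <= 0)
			continue;

		n = snprintf(buf + used, size - used, "%s%s",
			     first ? " (" : " ", tags[i]);
		if (n < 0)
			break;
		if ((size_t)n >= size - used) {
			/* output was truncated; buf is full */
			used = size - 1;
			break;
		}
		used += (size_t)n;
		first = 0;
	}
	return (int)used;
}
```

With a non-zero RET counter and a non-zero CROSS_2M counter this
yields " (RET CROSS_2M", matching the third line of the example
output above once " cycles:1)" is appended.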

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
 tools/perf/util/callchain.c | 221 ++++++++++++++++++++++++++++++++------------
 tools/perf/util/callchain.h |  20 ++++
 2 files changed, 182 insertions(+), 59 deletions(-)

diff --git a/tools/perf/util/callchain.c b/tools/perf/util/callchain.c
index 3cea1fb..ca040a0 100644
--- a/tools/perf/util/callchain.c
+++ b/tools/perf/util/callchain.c
@@ -428,6 +428,89 @@ create_child(struct callchain_node *parent, bool inherit_children)
 	return new;
 }
 
+static const char *br_tag[BR_IDX_MAX] = {
+	"JCC forward",
+	"JCC backward",
+	"JMP",
+	"IND_JMP",
+	"CALL",
+	"IND_CALL",
+	"RET",
+	"SYSCALL",
+	"SYSRET",
+	"IRQ",
+	"INT",
+	"IRET",
+	"FAR_BRANCH",
+	"CROSS_4K",
+	"CROSS_2M",
+};
+
+static void
+branch_type_count(int *counts, struct branch_flags *flags)
+{
+	switch (flags->type) {
+	case PERF_BR_JCC_FWD:
+		counts[BR_IDX_JCC_FWD]++;
+		break;
+
+	case PERF_BR_JCC_BWD:
+		counts[BR_IDX_JCC_BWD]++;
+		break;
+
+	case PERF_BR_JMP:
+		counts[BR_IDX_JMP]++;
+		break;
+
+	case PERF_BR_IND_JMP:
+		counts[BR_IDX_IND_JMP]++;
+		break;
+
+	case PERF_BR_CALL:
+		counts[BR_IDX_CALL]++;
+		break;
+
+	case PERF_BR_IND_CALL:
+		counts[BR_IDX_IND_CALL]++;
+		break;
+
+	case PERF_BR_RET:
+		counts[BR_IDX_RET]++;
+		break;
+
+	case PERF_BR_SYSCALL:
+		counts[BR_IDX_SYSCALL]++;
+		break;
+
+	case PERF_BR_SYSRET:
+		counts[BR_IDX_SYSRET]++;
+		break;
+
+	case PERF_BR_IRQ:
+		counts[BR_IDX_IRQ]++;
+		break;
+
+	case PERF_BR_INT:
+		counts[BR_IDX_INT]++;
+		break;
+
+	case PERF_BR_IRET:
+		counts[BR_IDX_IRET]++;
+		break;
+
+	case PERF_BR_FAR_BRANCH:
+		counts[BR_IDX_FAR_BRANCH]++;
+		break;
+
+	default:
+		break;
+	}
+
+	if (flags->cross == PERF_BR_CROSS_2M)
+		counts[BR_IDX_CROSS_2M]++;
+	else if (flags->cross == PERF_BR_CROSS_4K)
+		counts[BR_IDX_CROSS_4K]++;
+}
 
 /*
  * Fill the node with callchain values
@@ -467,6 +550,9 @@ fill_node(struct callchain_node *node, struct callchain_cursor *cursor)
 			call->cycles_count = cursor_node->branch_flags.cycles;
 			call->iter_count = cursor_node->nr_loop_iter;
 			call->samples_count = cursor_node->samples;
+
+			branch_type_count(call->brtype_count,
+					  &cursor_node->branch_flags);
 		}
 
 		list_add_tail(&call->list, &node->val);
@@ -579,6 +665,9 @@ static enum match_result match_chain(struct callchain_cursor_node *node,
 			cnode->cycles_count += node->branch_flags.cycles;
 			cnode->iter_count += node->nr_loop_iter;
 			cnode->samples_count += node->samples;
+
+			branch_type_count(cnode->brtype_count,
+					  &node->branch_flags);
 		}
 
 		return MATCH_EQ;
@@ -1105,95 +1194,108 @@ int callchain_branch_counts(struct callchain_root *root,
 						  cycles_count);
 }
 
+static int branch_type_str(int *counts, char *bf, int bfsize)
+{
+	int i, printed = 0;
+	bool brace = false;
+
+	for (i = 0; i < BR_IDX_MAX; i++) {
+		if (printed == bfsize - 1)
+			return printed;
+
+		if (counts[i] > 0) {
+			if (!brace) {
+				brace = true;
+				printed += scnprintf(bf + printed,
+						bfsize - printed,
+						" (%s", br_tag[i]);
+			} else
+				printed += scnprintf(bf + printed,
+						bfsize - printed,
+						" %s", br_tag[i]);
+		}
+	}
+
+	return printed;
+}
+
 static int counts_str_build(char *bf, int bfsize,
 			     u64 branch_count, u64 predicted_count,
 			     u64 abort_count, u64 cycles_count,
-			     u64 iter_count, u64 samples_count)
+			     u64 iter_count, u64 samples_count,
+			     int *brtype_count)
 {
-	double predicted_percent = 0.0;
-	const char *null_str = "";
-	char iter_str[32];
-	char cycle_str[32];
-	char *istr, *cstr;
 	u64 cycles;
+	int printed, i = 0;
 
 	if (branch_count == 0)
 		return scnprintf(bf, bfsize, " (calltrace)");
 
+	printed = branch_type_str(brtype_count, bf, bfsize);
+	if (printed)
+		i++;
+
 	cycles = cycles_count / branch_count;
+	if (cycles) {
+		if (i++)
+			printed += scnprintf(bf + printed, bfsize - printed,
+				" cycles:%" PRId64 "", cycles);
+		else
+			printed += scnprintf(bf + printed, bfsize - printed,
+				" (cycles:%" PRId64 "", cycles);
+	}
 
 	if (iter_count && samples_count) {
-		if (cycles > 0)
-			scnprintf(iter_str, sizeof(iter_str),
-				 " iterations:%" PRId64 "",
-				 iter_count / samples_count);
+		if (i++)
+			printed += scnprintf(bf + printed, bfsize - printed,
+				" iterations:%" PRId64 "",
+				iter_count / samples_count);
 		else
-			scnprintf(iter_str, sizeof(iter_str),
-				 "iterations:%" PRId64 "",
-				 iter_count / samples_count);
-		istr = iter_str;
-	} else
-		istr = (char *)null_str;
-
-	if (cycles > 0) {
-		scnprintf(cycle_str, sizeof(cycle_str),
-			  "cycles:%" PRId64 "", cycles);
-		cstr = cycle_str;
-	} else
-		cstr = (char *)null_str;
-
-	predicted_percent = predicted_count * 100.0 / branch_count;
+			printed += scnprintf(bf + printed, bfsize - printed,
+				" (iterations:%" PRId64 "",
+				iter_count / samples_count);
+	}
 
-	if ((predicted_count == branch_count) && (abort_count == 0)) {
-		if ((cycles > 0) || (istr != (char *)null_str))
-			return scnprintf(bf, bfsize, " (%s%s)", cstr, istr);
+	if (predicted_count < branch_count) {
+		if (i++)
+			printed += scnprintf(bf + printed, bfsize - printed,
+				" predicted:%.1f%%",
+				predicted_count * 100.0 / branch_count);
 		else
-			return scnprintf(bf, bfsize, "%s", (char *)null_str);
-	}
-
-	if ((predicted_count < branch_count) && (abort_count == 0)) {
-		if ((cycles > 0) || (istr != (char *)null_str))
-			return scnprintf(bf, bfsize,
-				" (predicted:%.1f%% %s%s)",
-				predicted_percent, cstr, istr);
-		else {
-			return scnprintf(bf, bfsize,
-				" (predicted:%.1f%%)",
-				predicted_percent);
-		}
+			printed += scnprintf(bf + printed, bfsize - printed,
+				" (predicted:%.1f%%",
+				predicted_count * 100.0 / branch_count);
 	}
 
-	if ((predicted_count == branch_count) && (abort_count > 0)) {
-		if ((cycles > 0) || (istr != (char *)null_str))
-			return scnprintf(bf, bfsize,
-				" (abort:%" PRId64 " %s%s)",
-				abort_count, cstr, istr);
+	if (abort_count) {
+		if (i++)
+			printed += scnprintf(bf + printed, bfsize - printed,
+				" abort:%.1f%%",
+				abort_count * 100.0 / branch_count);
 		else
-			return scnprintf(bf, bfsize,
-				" (abort:%" PRId64 ")",
-				abort_count);
+			printed += scnprintf(bf + printed, bfsize - printed,
+				" (abort:%.1f%%",
+				abort_count * 100.0 / branch_count);
 	}
 
-	if ((cycles > 0) || (istr != (char *)null_str))
-		return scnprintf(bf, bfsize,
-			" (predicted:%.1f%% abort:%" PRId64 " %s%s)",
-			predicted_percent, abort_count, cstr, istr);
+	if (i)
+		return printed + scnprintf(bf + printed, bfsize - printed, ")");
 
-	return scnprintf(bf, bfsize,
-			" (predicted:%.1f%% abort:%" PRId64 ")",
-			predicted_percent, abort_count);
+	bf[0] = 0;
+	return 0;
 }
 
 static int callchain_counts_printf(FILE *fp, char *bf, int bfsize,
 				   u64 branch_count, u64 predicted_count,
 				   u64 abort_count, u64 cycles_count,
-				   u64 iter_count, u64 samples_count)
+				   u64 iter_count, u64 samples_count,
+				   int *brtype_count)
 {
 	char str[128];
 
 	counts_str_build(str, sizeof(str), branch_count,
 			 predicted_count, abort_count, cycles_count,
-			 iter_count, samples_count);
+			 iter_count, samples_count, brtype_count);
 
 	if (fp)
 		return fprintf(fp, "%s", str);
@@ -1225,7 +1327,8 @@ int callchain_list_counts__printf_value(struct callchain_node *node,
 
 	return callchain_counts_printf(fp, bf, bfsize, branch_count,
 				       predicted_count, abort_count,
-				       cycles_count, iter_count, samples_count);
+				       cycles_count, iter_count, samples_count,
+				       clist->brtype_count);
 }
 
 static void free_callchain_node(struct callchain_node *node)
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index c56c23d..564a485 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -106,6 +106,25 @@ struct callchain_param {
 extern struct callchain_param callchain_param;
 extern struct callchain_param callchain_param_default;
 
+enum {
+	BR_IDX_JCC_FWD		= 0,
+	BR_IDX_JCC_BWD		= 1,
+	BR_IDX_JMP		= 2,
+	BR_IDX_IND_JMP		= 3,
+	BR_IDX_CALL		= 4,
+	BR_IDX_IND_CALL		= 5,
+	BR_IDX_RET		= 6,
+	BR_IDX_SYSCALL		= 7,
+	BR_IDX_SYSRET		= 8,
+	BR_IDX_IRQ		= 9,
+	BR_IDX_INT		= 10,
+	BR_IDX_IRET		= 11,
+	BR_IDX_FAR_BRANCH	= 12,
+	BR_IDX_CROSS_4K		= 13,
+	BR_IDX_CROSS_2M		= 14,
+	BR_IDX_MAX,
+};
+
 struct callchain_list {
 	u64			ip;
 	struct map_symbol	ms;
@@ -119,6 +138,7 @@ struct callchain_list {
 	u64			cycles_count;
 	u64			iter_count;
 	u64			samples_count;
+	int			brtype_count[BR_IDX_MAX];
 	char		       *srcline;
 	struct list_head	list;
 };
-- 
2.7.4

* Re: [PATCH v2 2/5] perf/x86/intel: Record branch type
  2017-04-07 10:47 ` [PATCH v2 2/5] perf/x86/intel: Record branch type Jin Yao
@ 2017-04-07 15:20   ` Peter Zijlstra
  2017-04-07 16:48     ` Andi Kleen
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2017-04-07 15:20 UTC (permalink / raw)
  To: Jin Yao
  Cc: acme, jolsa, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin, linuxppc-dev

On Fri, Apr 07, 2017 at 06:47:43PM +0800, Jin Yao wrote:
> Perf already has support for disassembling the branch instruction
> and using the branch type for filtering. The patch just records
> the branch type in perf_branch_entry.
> 
> Before recording, the patch converts the x86 branch classification
> to common branch classification and compute for checking if the
> branches cross 4K or 2MB areas. It's an approximate computing for
> crossing 4K page or 2MB page.

The changelog is completely empty of rationale. Why do we care?

Not having the binary is a very bad reason; you can't do much of
anything if that's missing.


> @@ -923,6 +933,84 @@ static int branch_type(unsigned long from, unsigned long to, int abort)
>  	return ret;
>  }
>  
> +static int
> +common_branch_type(int type, u64 from, u64 to)
> +{
> +	int ret;
> +
> +	type = type & (~(X86_BR_KERNEL | X86_BR_USER));
> +
> +	switch (type) {
> +	case X86_BR_CALL:
> +	case X86_BR_ZERO_CALL:
> +		ret = PERF_BR_CALL;
> +		break;
> +
> +	case X86_BR_RET:
> +		ret = PERF_BR_RET;
> +		break;
> +
> +	case X86_BR_SYSCALL:
> +		ret = PERF_BR_SYSCALL;
> +		break;
> +
> +	case X86_BR_SYSRET:
> +		ret = PERF_BR_SYSRET;
> +		break;
> +
> +	case X86_BR_INT:
> +		ret = PERF_BR_INT;
> +		break;
> +
> +	case X86_BR_IRET:
> +		ret = PERF_BR_IRET;
> +		break;
> +
> +	case X86_BR_IRQ:
> +		ret = PERF_BR_IRQ;
> +		break;
> +
> +	case X86_BR_ABORT:
> +		ret = PERF_BR_FAR_BRANCH;
> +		break;
> +
> +	case X86_BR_JCC:
> +		if (to > from)
> +			ret = PERF_BR_JCC_FWD;
> +		else
> +			ret = PERF_BR_JCC_BWD;
> +		break;

This seems like superfluous information; we already get to and from, so
this comparison is pointless.

The rest looks like something you could implement more simply with a
lookup table.
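
A minimal sketch of what such a lookup table might look like, for
illustration only. The X86_BR_* and PERF_BR_* numeric values below are
stand-ins (the real definitions live in the kernel sources and in patch
1/5), common_branch_type_lut() is a hypothetical name, and the caller is
assumed to have already masked off the X86_BR_KERNEL/X86_BR_USER bits.
The conditional case is kept separate because it is the only one that
needs from/to:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in one-hot flags for illustration; not the kernel's real values. */
enum {
	X86_BR_CALL      = 1 << 0,
	X86_BR_ZERO_CALL = 1 << 1,
	X86_BR_RET       = 1 << 2,
	X86_BR_SYSCALL   = 1 << 3,
	X86_BR_SYSRET    = 1 << 4,
	X86_BR_INT       = 1 << 5,
	X86_BR_IRET      = 1 << 6,
	X86_BR_IRQ       = 1 << 7,
	X86_BR_ABORT     = 1 << 8,
	X86_BR_JCC       = 1 << 9,
	X86_BR_JMP       = 1 << 10,
	X86_BR_IND_CALL  = 1 << 11,
	X86_BR_IND_JMP   = 1 << 12,
};

/* Generic types in the order of the cover letter; values are assumed. */
enum {
	PERF_BR_NONE, PERF_BR_JCC_FWD, PERF_BR_JCC_BWD, PERF_BR_JMP,
	PERF_BR_IND_JMP, PERF_BR_CALL, PERF_BR_IND_CALL, PERF_BR_RET,
	PERF_BR_SYSCALL, PERF_BR_SYSRET, PERF_BR_IRQ, PERF_BR_INT,
	PERF_BR_IRET, PERF_BR_FAR_BRANCH,
};

/* Indexed by the bit position of the single x86 type flag that is set. */
static const int branch_map[] = {
	PERF_BR_CALL,		/* X86_BR_CALL */
	PERF_BR_CALL,		/* X86_BR_ZERO_CALL */
	PERF_BR_RET,		/* X86_BR_RET */
	PERF_BR_SYSCALL,	/* X86_BR_SYSCALL */
	PERF_BR_SYSRET,		/* X86_BR_SYSRET */
	PERF_BR_INT,		/* X86_BR_INT */
	PERF_BR_IRET,		/* X86_BR_IRET */
	PERF_BR_IRQ,		/* X86_BR_IRQ */
	PERF_BR_FAR_BRANCH,	/* X86_BR_ABORT */
	PERF_BR_NONE,		/* X86_BR_JCC: needs from/to, handled below */
	PERF_BR_JMP,		/* X86_BR_JMP */
	PERF_BR_IND_CALL,	/* X86_BR_IND_CALL */
	PERF_BR_IND_JMP,	/* X86_BR_IND_JMP */
};

/* Hypothetical table-driven variant of common_branch_type(). */
static int common_branch_type_lut(int type, uint64_t from, uint64_t to)
{
	size_t i;

	if (type == X86_BR_JCC)
		return to > from ? PERF_BR_JCC_FWD : PERF_BR_JCC_BWD;

	/* Exactly one flag must be set; anything else maps to NONE. */
	for (i = 0; i < sizeof(branch_map) / sizeof(branch_map[0]); i++) {
		if (type == (1 << i))
			return branch_map[i];
	}

	return PERF_BR_NONE;
}
```

Like the switch in the patch, this returns PERF_BR_NONE for any value
that is not exactly one known flag; a real implementation would more
likely find the bit index directly instead of looping.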

> +
> +	case X86_BR_JMP:
> +		ret = PERF_BR_JMP;
> +		break;
> +
> +	case X86_BR_IND_CALL:
> +		ret = PERF_BR_IND_CALL;
> +		break;
> +
> +	case X86_BR_IND_JMP:
> +		ret = PERF_BR_IND_JMP;
> +		break;
> +
> +	default:
> +		ret = PERF_BR_NONE;
> +	}
> +
> +	return ret;
> +}
> +
> +static bool
> +cross_area(u64 addr1, u64 addr2, int size)
> +{
> +	u64 align1, align2;
> +
> +	align1 = addr1 & ~(size - 1);
> +	align2 = addr2 & ~(size - 1);
> +
> +	return (align1 != align2) ? true : false;
> +}
> +
>  /*
>   * implement actual branch filter based on user demand.
>   * Hardware may not exactly satisfy that request, thus
> @@ -939,7 +1027,8 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
>  	bool compress = false;
>  
>  	/* if sampling all branches, then nothing to filter */
> -	if ((br_sel & X86_BR_ALL) == X86_BR_ALL)
> +	if (((br_sel & X86_BR_ALL) == X86_BR_ALL) &&
> +	    ((br_sel & X86_BR_TYPE_SAVE) != X86_BR_TYPE_SAVE))
>  		return;
>  
>  	for (i = 0; i < cpuc->lbr_stack.nr; i++) {
> @@ -960,6 +1049,21 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
>  			cpuc->lbr_entries[i].from = 0;
>  			compress = true;
>  		}
> +
> +		if ((br_sel & X86_BR_TYPE_SAVE) == X86_BR_TYPE_SAVE) {
> +			cpuc->lbr_entries[i].type = common_branch_type(type,
> +								       from,
> +								       to);
> +			if (cross_area(from, to, AREA_2M))
> +				cpuc->lbr_entries[i].cross = PERF_BR_CROSS_2M;
> +			else if (cross_area(from, to, AREA_4K))
> +				cpuc->lbr_entries[i].cross = PERF_BR_CROSS_4K;
> +			else
> +				cpuc->lbr_entries[i].cross = PERF_BR_CROSS_NONE;

This again is superfluous information; it is already fully contained in
to and from, which we have.

> +		} else {
> +			cpuc->lbr_entries[i].type = PERF_BR_NONE;
> +			cpuc->lbr_entries[i].cross = PERF_BR_CROSS_NONE;
> +		}
>  	}
>  
>  	if (!compress)
> -- 
> 2.7.4
> 

* Re: [PATCH v2 2/5] perf/x86/intel: Record branch type
  2017-04-07 15:20   ` Peter Zijlstra
@ 2017-04-07 16:48     ` Andi Kleen
  2017-04-07 17:20       ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2017-04-07 16:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jin Yao, acme, jolsa, mingo, alexander.shishkin, Linux-kernel,
	kan.liang, yao.jin, linuxppc-dev

On Fri, Apr 07, 2017 at 05:20:31PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 07, 2017 at 06:47:43PM +0800, Jin Yao wrote:
> > Perf already has support for disassembling the branch instruction
> > and using the branch type for filtering. The patch just records
> > the branch type in perf_branch_entry.
> > 
> > Before recording, the patch converts the x86 branch classification
> > to common branch classification and compute for checking if the
> > branches cross 4K or 2MB areas. It's an approximate computing for
> > crossing 4K page or 2MB page.
> 
> The changelog is completely empty of rationale. Why do we care?
> 
> Not having the binary is a very bad reason; you can't do much of
> anything if that's missing.

It's a somewhat common situation with partially JITed code, if you
don't have an agent. You can still do a lot of useful things.

We found it useful to have this extra information during workload
analysis. Forward conditionals and page crossing jumps
are indications of frontend problems.

-Andi

* Re: [PATCH v2 2/5] perf/x86/intel: Record branch type
  2017-04-07 16:48     ` Andi Kleen
@ 2017-04-07 17:20       ` Peter Zijlstra
  2017-04-07 17:50         ` Andi Kleen
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2017-04-07 17:20 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jin Yao, acme, jolsa, mingo, alexander.shishkin, Linux-kernel,
	kan.liang, yao.jin, linuxppc-dev

On Fri, Apr 07, 2017 at 09:48:34AM -0700, Andi Kleen wrote:
> On Fri, Apr 07, 2017 at 05:20:31PM +0200, Peter Zijlstra wrote:
> > On Fri, Apr 07, 2017 at 06:47:43PM +0800, Jin Yao wrote:
> > > Perf already has support for disassembling the branch instruction
> > > and using the branch type for filtering. The patch just records
> > > the branch type in perf_branch_entry.
> > > 
> > > Before recording, the patch converts the x86 branch classification
> > > to common branch classification and compute for checking if the
> > > branches cross 4K or 2MB areas. It's an approximate computing for
> > > crossing 4K page or 2MB page.
> > 
> > The changelog is completely empty of rationale. Why do we care?
> > 
> > Not having the binary is a very bad reason; you can't do much of
> > anything if that's missing.
> 
> It's a somewhat common situation with partially JITed code, if you
> don't have an agent. You can still do a lot of useful things.

Like what? How can you say anything about code you don't have?

> We found it useful to have this extra information during workload
> analysis. Forward conditionals and page crossing jumps
> are indications of frontend problems.

But you already have the exact same information in {to,from}, why would
you need to repackage information already contained?

* Re: [PATCH v2 2/5] perf/x86/intel: Record branch type
  2017-04-07 17:20       ` Peter Zijlstra
@ 2017-04-07 17:50         ` Andi Kleen
  2017-04-08  8:46           ` Jin, Yao
  0 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2017-04-07 17:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jin Yao, acme, jolsa, mingo, alexander.shishkin, Linux-kernel,
	kan.liang, yao.jin, linuxppc-dev

> > It's a somewhat common situation with partially JITed code, if you
> > don't have an agent. You can still do a lot of useful things.
> 
> Like what? How can you say anything about code you don't have?

For example, if you take the PMU topdown measurement and see that the
code is frontend bound, and then see that it has lots of forward
conditionals, then dynamic basic block reordering will help. If you have
lots of cross-page jumps, then function reordering will help, etc.

> > We found it useful to have this extra information during workload
> > analysis. Forward conditionals and page crossing jumps
> > are indications of frontend problems.
> 
> But you already have the exact same information in {to,from}, why would
> you need to repackage information already contained?

Without this patch, we don't know if it's conditional or something else.
And the kernel already knows this for its filtering, so it can as well
report it.

Right, the CROSS_* and forward/backward information could be computed
later.

-Andi

* Re: [PATCH v2 2/5] perf/x86/intel: Record branch type
  2017-04-07 17:50         ` Andi Kleen
@ 2017-04-08  8:46           ` Jin, Yao
  0 siblings, 0 replies; 11+ messages in thread
From: Jin, Yao @ 2017-04-08  8:46 UTC (permalink / raw)
  To: Andi Kleen, Peter Zijlstra
  Cc: acme, jolsa, mingo, alexander.shishkin, Linux-kernel, kan.liang,
	yao.jin, linuxppc-dev

> Without this patch, we don't know if it's conditional or something else.
> And the kernel already knows this for its filtering, so it can as well
> report it.
>
> Right, the CROSS_* and forward/backward information could be computed
> later.
>
> -Andi
>
>
OK, I will move the CROSS_* and JCC forward/backward computation to user
space, though it makes the user-space code more complicated.

Thanks
Jin Yao
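
As a rough illustration of how the CROSS_* and forward/backward
information could be derived in user space from the from/to addresses
alone, here is a minimal sketch. It reuses the masking approach of
cross_area() from the patch; the AREA_4K/AREA_2M values (4 KiB and
2 MiB), the br_cross enum, and the helper names branch_cross() and
branch_is_forward() are assumptions made for this example, not part of
the posted series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed area sizes; both must be powers of two for the masking. */
#define AREA_4K	4096ULL
#define AREA_2M	(2048ULL * 1024)

/* Hypothetical user-space classification of a branch's "cross" state. */
enum br_cross { BR_CROSS_NONE, BR_CROSS_4K, BR_CROSS_2M };

/* True if the two addresses fall into different size-aligned areas. */
static bool cross_area(uint64_t addr1, uint64_t addr2, uint64_t size)
{
	return (addr1 & ~(size - 1)) != (addr2 & ~(size - 1));
}

/* Check the larger area first: crossing 2M implies crossing 4K. */
static enum br_cross branch_cross(uint64_t from, uint64_t to)
{
	if (cross_area(from, to, AREA_2M))
		return BR_CROSS_2M;
	if (cross_area(from, to, AREA_4K))
		return BR_CROSS_4K;
	return BR_CROSS_NONE;
}

/* Same comparison the kernel patch used for JCC forward/backward. */
static bool branch_is_forward(uint64_t from, uint64_t to)
{
	return to > from;
}
```

For example, a branch from 0x1ff0 to 0x2010 would be classified as
crossing a 4K boundary but not a 2M one, and as a forward branch.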

Thread overview: 11+ messages
2017-04-07 10:47 [PATCH v2 0/5] perf report: Show branch type Jin Yao
2017-04-07 10:47 ` [PATCH v2 1/5] perf/core: Define the common branch type classification Jin Yao
2017-04-07 10:47 ` [PATCH v2 2/5] perf/x86/intel: Record branch type Jin Yao
2017-04-07 15:20   ` Peter Zijlstra
2017-04-07 16:48     ` Andi Kleen
2017-04-07 17:20       ` Peter Zijlstra
2017-04-07 17:50         ` Andi Kleen
2017-04-08  8:46           ` Jin, Yao
2017-04-07 10:47 ` [PATCH v2 3/5] perf record: Create a new option save_type in --branch-filter Jin Yao
2017-04-07 10:47 ` [PATCH v2 4/5] perf report: Show branch type statistics for stdio mode Jin Yao
2017-04-07 10:47 ` [PATCH v2 5/5] perf report: Show branch type in callchain entry Jin Yao
