linux-kernel.vger.kernel.org archive mirror
* [PATCH V4 00/17] Stitch LBR call stack (Perf Tools)
@ 2020-03-19 20:25 kan.liang
  2020-03-19 20:25 ` [PATCH V4 01/17] perf pmu: Add support for PMU capabilities kan.liang
                   ` (18 more replies)
  0 siblings, 19 replies; 46+ messages in thread
From: kan.liang @ 2020-03-19 20:25 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Changes since V3:
- There is no dependency among the 'capabilities'. If perf fails to read
  one, it should not impact others. Continue to parse the rest of caps.
  (Patch 1)
- Use list_for_each_entry() to replace perf_pmu__scan_caps() (Patch 1 &
  2)
- Combine the declaration plus assignment when possible (Patch 1 & 2)
- Add check for script/report/c2c.. (Patch 13, 14 & 16)

Changes since V2:
- Check strdup() in Patch 1
- Split several patches into smaller patches

Changes since V1:
- Rebase on top of commit 5100c2b77049 ("perf header: Add check for
  unexpected use of reserved membrs in event attr")
- Fix a compile error with GCC 9 in Patch 1.


The kernel patches have been merged into linux-next.
  commit bbfd5e4fab63 ("perf/core: Add new branch sample type for HW
index of raw branch records")
  commit db278b90c326 ("perf/x86/intel: Output LBR TOS information
correctly")

Starting from Haswell, Linux perf can utilize the existing Last Branch
Record (LBR) facility to record call stacks. However, the depth of the
reconstructed LBR call stack is limited to the number of LBR registers.
E.g. on Skylake, the depth of the reconstructed LBR call stack is <= 32.
That's because HW overwrites the oldest LBR registers when the LBR
stack is full.

However, the overwritten LBRs may still be retrievable from the previous
sample, because at that moment HW hadn't overwritten those LBR registers
yet. Perf tools can stitch those overwritten LBRs onto the current call
stack to get a more complete call stack.
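
The idea can be illustrated with a deliberately simplified sketch. It
assumes each stack is an array of addresses ordered callee first, and it
matches on a single entry only. The function name and the matching rule
are hypothetical; the matching actually implemented by this series
relies on the HW index and lives in later patches, so this is not that
code.

```c
#include <stdint.h>

/*
 * Hypothetical illustration: append the outer frames of the previous
 * sample to a truncated current stack. Both stacks are ordered callee
 * first, so the last element is the oldest (outermost) frame.
 * Returns the number of entries written to out.
 */
static int stitch_stack(const uint64_t *cur, int cur_nr,
			const uint64_t *prev, int prev_nr,
			uint64_t *out, int out_max)
{
	int n = 0, p;

	/* Start with the current (possibly truncated) stack. */
	for (int i = 0; i < cur_nr && n < out_max; i++)
		out[n++] = cur[i];
	if (!cur_nr)
		return n;

	/* Look for the oldest retained current frame in the previous stack. */
	for (p = 0; p < prev_nr; p++)
		if (prev[p] == cur[cur_nr - 1])
			break;

	/* On a match, everything older than it in prev is stitched on. */
	for (p++; p < prev_nr && n < out_max; p++)
		out[n++] = prev[p];

	return n;
}
```

With prev = {4, 3, 2, 1} and a current stack truncated to {6, 5, 4, 3},
the stitched result is {6, 5, 4, 3, 2, 1}. When nothing matches, the
current stack is returned unchanged. A single-entry match like this can
easily be wrong, which is one reason stitching is opt-in.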

To determine if LBRs can be stitched, the maximum number of LBRs is
required. Patches 1-4 retrieve the capabilities information from sysfs
and save it in the perf header.

Patches 5-12 implement the LBR stitching approach.

Users can use the options introduced in patches 13-16 to enable the LBR
stitching approach for perf report, script, top and c2c.

Patch 17 adds a fast path for the duplicate entries check. It benefits
all call stack parsing, not just the stitched LBR call stack, and can be
merged independently.

The stitching approach is based on LBR call stack technology. The known
limitations of LBR call stack technology still apply to the approach,
e.g. exception handling such as setjmp/longjmp leaves calls/returns
unmatched.
This approach is not foolproof. There can be cases where it creates
incorrect call stacks from incorrect matches. There is no attempt
to validate any matches in another way, so it is not enabled by default.
However, in many common cases with call stack overflows it can recreate
better call stacks than the default LBR call stack output. So if there
are problems with LBR overflows, this is a possible workaround.

Regression:
Users may collect LBR call stacks on a machine with a new perf tool and
a new kernel (with LBR TOS support), but then parse the perf.data with
an old perf tool (without LBR TOS support). The old tool doesn't check
attr.branch_sample_type, so users will probably get incorrect
information without any warning.

Performance impact:
The processing time may increase with the LBR stitching approach
enabled. The impact depends on the increased depth of call stacks.

For a simple test case, tchain_edit, with a call stack depth of 43:
perf record --call-graph lbr -- ./tchain_edit
perf report --stitch-lbr

Without --stitch-lbr, perf report only displays call stacks 32 deep.
With --stitch-lbr, perf report can display the full 43-deep call stacks.
The depth of the call stacks increases by 34.4% (from 32 to 43).

Correspondingly, the processing time of perf report increases by 39%:
Without --stitch-lbr:                           11.0 sec
With --stitch-lbr:                              15.3 sec
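
The two percentages follow from the usual relative-increase formula.
A small sketch for checking them (the helper name is hypothetical):

```c
/* Relative increase from 'before' to 'after', in percent. */
static double pct_increase(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

pct_increase(32, 43) gives the depth increase (roughly 34.4%) and
pct_increase(11.0, 15.3) gives the processing time increase (roughly
39.1%).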

The source code of tchain_edit.c is similar to the following.
noinline void f43(void)
{
        int i;
        for (i = 0; i < 10000;) {

                if(i%2)
                        i++;
                else
                        i++;
        }
}

noinline void f42(void)
{
        int i;
        for (i = 0; i < 100; i++) {
                f43();
                f43();
                f43();
        }
}

noinline void f41(void)
{
        int i;
        for (i = 0; i < 100; i++) {
                f42();
                f42();
                f42();
        }
}
noinline void f40(void)
{
        f41();
}

... ...

noinline void f32(void)
{
        f33();
}

noinline void f31(void)
{
        int i;

        for (i = 0; i < 10000; i++) {
                if(i%2)
                        i++;
                else
                        i++;
        }

        f32();
}

noinline void f30(void)
{
        f31();
}

... ...

noinline void f1(void)
{
        f2();
}

int main()
{
        f1();
}

Kan Liang (17):
  perf pmu: Add support for PMU capabilities
  perf header: Support CPU PMU capabilities
  perf record: Clear HEADER_CPU_PMU_CAPS for non LBR call stack mode
  perf stat: Clear HEADER_CPU_PMU_CAPS
  perf machine: Remove the indent in resolve_lbr_callchain_sample
  perf machine: Refine the function for LBR call stack reconstruction
  perf machine: Factor out lbr_callchain_add_kernel_ip()
  perf machine: Factor out lbr_callchain_add_lbr_ip()
  perf thread: Add a knob for LBR stitch approach
  perf tools: Save previous sample for LBR stitching approach
  perf tools: Save previous cursor nodes for LBR stitching approach
  perf tools: Stitch LBR call stack
  perf report: Add option to enable the LBR stitching approach
  perf script: Add option to enable the LBR stitching approach
  perf top: Add option to enable the LBR stitching approach
  perf c2c: Add option to enable the LBR stitching approach
  perf hist: Add fast path for duplicate entries check

 tools/perf/Documentation/perf-c2c.txt         |  11 +
 tools/perf/Documentation/perf-report.txt      |  11 +
 tools/perf/Documentation/perf-script.txt      |  11 +
 tools/perf/Documentation/perf-top.txt         |   9 +
 .../Documentation/perf.data-file-format.txt   |  16 +
 tools/perf/builtin-c2c.c                      |  12 +
 tools/perf/builtin-record.c                   |   3 +
 tools/perf/builtin-report.c                   |  12 +
 tools/perf/builtin-script.c                   |  12 +
 tools/perf/builtin-stat.c                     |   1 +
 tools/perf/builtin-top.c                      |  11 +
 tools/perf/util/branch.h                      |  19 +-
 tools/perf/util/callchain.h                   |   8 +
 tools/perf/util/env.h                         |   3 +
 tools/perf/util/header.c                      | 108 +++++
 tools/perf/util/header.h                      |   1 +
 tools/perf/util/hist.c                        |  23 +
 tools/perf/util/machine.c                     | 423 +++++++++++++++---
 tools/perf/util/pmu.c                         |  82 ++++
 tools/perf/util/pmu.h                         |   9 +
 tools/perf/util/sort.c                        |   2 +-
 tools/perf/util/sort.h                        |   2 +
 tools/perf/util/thread.c                      |   2 +
 tools/perf/util/thread.h                      |  35 ++
 tools/perf/util/top.h                         |   1 +
 25 files changed, 757 insertions(+), 70 deletions(-)

-- 
2.17.1



* [PATCH V4 01/17] perf pmu: Add support for PMU capabilities
  2020-03-19 20:25 [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) kan.liang
@ 2020-03-19 20:25 ` kan.liang
  2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2020-03-19 20:25 ` [PATCH V4 02/17] perf header: Support CPU " kan.liang
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: kan.liang @ 2020-03-19 20:25 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The PMU capabilities information, which is located at
/sys/bus/event_source/devices/<dev>/caps, is required by the perf tool.
For example, the max LBR information is required to stitch the LBR call
stack.

Add perf_pmu__caps_parse() to parse the PMU capabilities information.
The information is stored in a list.

The following patch will store the capabilities information in perf
header.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/util/pmu.c | 82 +++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/pmu.h |  9 +++++
 2 files changed, 91 insertions(+)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 8b99fd312aae..b3b975ad2ea2 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -844,6 +844,7 @@ static struct perf_pmu *pmu_lookup(const char *name)
 
 	INIT_LIST_HEAD(&pmu->format);
 	INIT_LIST_HEAD(&pmu->aliases);
+	INIT_LIST_HEAD(&pmu->caps);
 	list_splice(&format, &pmu->format);
 	list_splice(&aliases, &pmu->aliases);
 	list_add_tail(&pmu->list, &pmus);
@@ -1565,3 +1566,84 @@ int perf_pmu__scan_file(struct perf_pmu *pmu, const char *name, const char *fmt,
 	va_end(args);
 	return ret;
 }
+
+static int perf_pmu__new_caps(struct list_head *list, char *name, char *value)
+{
+	struct perf_pmu_caps *caps = zalloc(sizeof(*caps));
+
+	if (!caps)
+		return -ENOMEM;
+
+	caps->name = strdup(name);
+	if (!caps->name)
+		goto free_caps;
+	caps->value = strndup(value, strlen(value) - 1);
+	if (!caps->value)
+		goto free_name;
+	list_add_tail(&caps->list, list);
+	return 0;
+
+free_name:
+	zfree(&caps->name);
+free_caps:
+	free(caps);
+
+	return -ENOMEM;
+}
+
+/*
+ * Reading/parsing the given pmu capabilities, which should be located at:
+ * /sys/bus/event_source/devices/<dev>/caps as sysfs group attributes.
+ * Return the number of capabilities
+ */
+int perf_pmu__caps_parse(struct perf_pmu *pmu)
+{
+	struct stat st;
+	char caps_path[PATH_MAX];
+	const char *sysfs = sysfs__mountpoint();
+	DIR *caps_dir;
+	struct dirent *evt_ent;
+	int nr_caps = 0;
+
+	if (!sysfs)
+		return -1;
+
+	snprintf(caps_path, PATH_MAX,
+		 "%s" EVENT_SOURCE_DEVICE_PATH "%s/caps", sysfs, pmu->name);
+
+	if (stat(caps_path, &st) < 0)
+		return 0;	/* no error if caps does not exist */
+
+	caps_dir = opendir(caps_path);
+	if (!caps_dir)
+		return -EINVAL;
+
+	while ((evt_ent = readdir(caps_dir)) != NULL) {
+		char path[PATH_MAX + NAME_MAX + 1];
+		char *name = evt_ent->d_name;
+		char value[128];
+		FILE *file;
+
+		if (!strcmp(name, ".") || !strcmp(name, ".."))
+			continue;
+
+		snprintf(path, sizeof(path), "%s/%s", caps_path, name);
+
+		file = fopen(path, "r");
+		if (!file)
+			continue;
+
+		if (!fgets(value, sizeof(value), file) ||
+		    (perf_pmu__new_caps(&pmu->caps, name, value) < 0)) {
+			fclose(file);
+			continue;
+		}
+
+		nr_caps++;
+		fclose(file);
+	}
+
+	closedir(caps_dir);
+
+	return nr_caps;
+}
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 6737e3d5d568..854e1b8de746 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -21,6 +21,12 @@ enum {
 
 struct perf_event_attr;
 
+struct perf_pmu_caps {
+	char *name;
+	char *value;
+	struct list_head list;
+};
+
 struct perf_pmu {
 	char *name;
 	__u32 type;
@@ -32,6 +38,7 @@ struct perf_pmu {
 	struct perf_cpu_map *cpus;
 	struct list_head format;  /* HEAD struct perf_pmu_format -> list */
 	struct list_head aliases; /* HEAD struct perf_pmu_alias -> list */
+	struct list_head caps;    /* HEAD struct perf_pmu_caps -> list */
 	struct list_head list;    /* ELEM */
 };
 
@@ -102,4 +109,6 @@ struct pmu_events_map *perf_pmu__find_map(struct perf_pmu *pmu);
 
 int perf_pmu__convert_scale(const char *scale, char **end, double *sval);
 
+int perf_pmu__caps_parse(struct perf_pmu *pmu);
+
 #endif /* __PMU_H */
-- 
2.17.1
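
One detail in perf_pmu__new_caps() worth noting: the value is copied
with strndup(value, strlen(value) - 1), which assumes that the line read
by fgets() always ends with a newline. A minimal sketch of a variant
that does not depend on that assumption is shown below. The helper name
cap_value_dup() is hypothetical and is not part of the patch.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: duplicate a sysfs value,
 * keeping only the text up to the first newline, if there is one.
 * An input without a newline (or an empty input) is copied as is.
 */
static char *cap_value_dup(const char *raw)
{
	size_t len = strcspn(raw, "\n");
	char *value = malloc(len + 1);

	if (!value)
		return NULL;

	memcpy(value, raw, len);
	value[len] = '\0';
	return value;
}
```

For "32\n" this returns "32", and for "icelake" (no trailing newline)
it returns "icelake" rather than dropping the last character. The
caller owns the returned buffer and must free() it.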



* [PATCH V4 02/17] perf header: Support CPU PMU capabilities
  2020-03-19 20:25 [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) kan.liang
  2020-03-19 20:25 ` [PATCH V4 01/17] perf pmu: Add support for PMU capabilities kan.liang
@ 2020-03-19 20:25 ` kan.liang
  2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2020-03-19 20:25 ` [PATCH V4 03/17] perf record: Clear HEADER_CPU_PMU_CAPS for non LBR call stack mode kan.liang
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: kan.liang @ 2020-03-19 20:25 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

To stitch LBR call stack, the max LBR information is required. So the
CPU PMU capabilities information has to be stored in perf header.

Add a new feature HEADER_CPU_PMU_CAPS for CPU PMU capabilities.
Retrieve all CPU PMU capabilities, not just max LBR information.

Add variable max_branches to facilitate future usage.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 .../Documentation/perf.data-file-format.txt   |  16 +++
 tools/perf/util/env.h                         |   3 +
 tools/perf/util/header.c                      | 108 ++++++++++++++++++
 tools/perf/util/header.h                      |   1 +
 4 files changed, 128 insertions(+)

diff --git a/tools/perf/Documentation/perf.data-file-format.txt b/tools/perf/Documentation/perf.data-file-format.txt
index b0152e1095c5..b6472e463284 100644
--- a/tools/perf/Documentation/perf.data-file-format.txt
+++ b/tools/perf/Documentation/perf.data-file-format.txt
@@ -373,6 +373,22 @@ struct {
 Indicates that trace contains records of PERF_RECORD_COMPRESSED type
 that have perf_events records in compressed form.
 
+	HEADER_CPU_PMU_CAPS = 28,
+
+	A list of cpu PMU capabilities. The format of data is as below.
+
+struct {
+	u32 nr_cpu_pmu_caps;
+	{
+		char	name[];
+		char	value[];
+	} [nr_cpu_pmu_caps]
+};
+
+
+Example:
+ cpu pmu capabilities: branches=32, max_precise=3, pmu_name=icelake
+
 	other bits are reserved and should ignored for now
 	HEADER_FEAT_BITS	= 256,
 
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index 11d05ae3606a..d286d478b4d8 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -48,6 +48,7 @@ struct perf_env {
 	char			*cpuid;
 	unsigned long long	total_mem;
 	unsigned int		msr_pmu_type;
+	unsigned int		max_branches;
 
 	int			nr_cmdline;
 	int			nr_sibling_cores;
@@ -57,12 +58,14 @@ struct perf_env {
 	int			nr_memory_nodes;
 	int			nr_pmu_mappings;
 	int			nr_groups;
+	int			nr_cpu_pmu_caps;
 	char			*cmdline;
 	const char		**cmdline_argv;
 	char			*sibling_cores;
 	char			*sibling_dies;
 	char			*sibling_threads;
 	char			*pmu_mappings;
+	char			*cpu_pmu_caps;
 	struct cpu_topology_map	*cpu;
 	struct cpu_cache_level	*caches;
 	int			 caches_cnt;
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index acbd046bf95c..28e82da04b7a 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -1395,6 +1395,38 @@ static int write_compressed(struct feat_fd *ff __maybe_unused,
 	return do_write(ff, &(ff->ph->env.comp_mmap_len), sizeof(ff->ph->env.comp_mmap_len));
 }
 
+static int write_cpu_pmu_caps(struct feat_fd *ff,
+			      struct evlist *evlist __maybe_unused)
+{
+	struct perf_pmu *cpu_pmu = perf_pmu__find("cpu");
+	struct perf_pmu_caps *caps = NULL;
+	int nr_caps;
+	int ret;
+
+	if (!cpu_pmu)
+		return -ENOENT;
+
+	nr_caps = perf_pmu__caps_parse(cpu_pmu);
+	if (nr_caps < 0)
+		return nr_caps;
+
+	ret = do_write(ff, &nr_caps, sizeof(nr_caps));
+	if (ret < 0)
+		return ret;
+
+	list_for_each_entry(caps, &cpu_pmu->caps, list) {
+		ret = do_write_string(ff, caps->name);
+		if (ret < 0)
+			return ret;
+
+		ret = do_write_string(ff, caps->value);
+		if (ret < 0)
+			return ret;
+	}
+
+	return ret;
+}
+
 static void print_hostname(struct feat_fd *ff, FILE *fp)
 {
 	fprintf(fp, "# hostname : %s\n", ff->ph->env.hostname);
@@ -1809,6 +1841,27 @@ static void print_compressed(struct feat_fd *ff, FILE *fp)
 		ff->ph->env.comp_level, ff->ph->env.comp_ratio);
 }
 
+static void print_cpu_pmu_caps(struct feat_fd *ff, FILE *fp)
+{
+	const char *delimiter = "# cpu pmu capabilities: ";
+	u32 nr_caps = ff->ph->env.nr_cpu_pmu_caps;
+	char *str;
+
+	if (!nr_caps) {
+		fprintf(fp, "# cpu pmu capabilities: not available\n");
+		return;
+	}
+
+	str = ff->ph->env.cpu_pmu_caps;
+	while (nr_caps--) {
+		fprintf(fp, "%s%s", delimiter, str);
+		delimiter = ", ";
+		str += strlen(str) + 1;
+	}
+
+	fprintf(fp, "\n");
+}
+
 static void print_pmu_mappings(struct feat_fd *ff, FILE *fp)
 {
 	const char *delimiter = "# pmu mappings: ";
@@ -2846,6 +2899,60 @@ static int process_compressed(struct feat_fd *ff,
 	return 0;
 }
 
+static int process_cpu_pmu_caps(struct feat_fd *ff,
+				void *data __maybe_unused)
+{
+	char *name, *value;
+	struct strbuf sb;
+	u32 nr_caps;
+
+	if (do_read_u32(ff, &nr_caps))
+		return -1;
+
+	if (!nr_caps) {
+		pr_debug("cpu pmu capabilities not available\n");
+		return 0;
+	}
+
+	ff->ph->env.nr_cpu_pmu_caps = nr_caps;
+
+	if (strbuf_init(&sb, 128) < 0)
+		return -1;
+
+	while (nr_caps--) {
+		name = do_read_string(ff);
+		if (!name)
+			goto error;
+
+		value = do_read_string(ff);
+		if (!value)
+			goto free_name;
+
+		if (strbuf_addf(&sb, "%s=%s", name, value) < 0)
+			goto free_value;
+
+		/* include a NULL character at the end */
+		if (strbuf_add(&sb, "", 1) < 0)
+			goto free_value;
+
+		if (!strcmp(name, "branches"))
+			ff->ph->env.max_branches = atoi(value);
+
+		free(value);
+		free(name);
+	}
+	ff->ph->env.cpu_pmu_caps = strbuf_detach(&sb, NULL);
+	return 0;
+
+free_value:
+	free(value);
+free_name:
+	free(name);
+error:
+	strbuf_release(&sb);
+	return -1;
+}
+
 #define FEAT_OPR(n, func, __full_only) \
 	[HEADER_##n] = {					\
 		.name	    = __stringify(n),			\
@@ -2903,6 +3010,7 @@ const struct perf_header_feature_ops feat_ops[HEADER_LAST_FEATURE] = {
 	FEAT_OPR(BPF_PROG_INFO, bpf_prog_info,  false),
 	FEAT_OPR(BPF_BTF,       bpf_btf,        false),
 	FEAT_OPR(COMPRESSED,	compressed,	false),
+	FEAT_OPR(CPU_PMU_CAPS,	cpu_pmu_caps,	false),
 };
 
 struct header_print_data {
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index 840f95cee349..650bd1c7a99b 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -43,6 +43,7 @@ enum {
 	HEADER_BPF_PROG_INFO,
 	HEADER_BPF_BTF,
 	HEADER_COMPRESSED,
+	HEADER_CPU_PMU_CAPS,
 	HEADER_LAST_FEATURE,
 	HEADER_FEAT_BITS	= 256,
 };
-- 
2.17.1
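
After process_cpu_pmu_caps() runs, env.cpu_pmu_caps holds
nr_cpu_pmu_caps strings of the form "name=value" stored back to back,
each terminated by a NUL byte. print_cpu_pmu_caps() walks that buffer
with str += strlen(str) + 1. The same walk can be used to look up one
capability, as in the sketch below. The helper find_cap() is
hypothetical and does not exist in the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical lookup over nr "name=value" strings stored back to
 * back, each terminated by a NUL byte. Returns a pointer to the value
 * of the first entry whose name matches exactly, or NULL.
 */
static const char *find_cap(const char *caps, unsigned int nr,
			    const char *name)
{
	size_t len = strlen(name);

	while (nr--) {
		/* The '=' check rejects names that only share a prefix. */
		if (!strncmp(caps, name, len) && caps[len] == '=')
			return caps + len + 1;
		caps += strlen(caps) + 1;	/* skip to the next entry */
	}

	return NULL;
}
```

Using the example from the documentation hunk (branches=32,
max_precise=3, pmu_name=icelake), find_cap(buf, 3, "branches") yields
"32", while a lookup for "max" yields NULL because no entry has exactly
that name.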



* [PATCH V4 03/17] perf record: Clear HEADER_CPU_PMU_CAPS for non LBR call stack mode
  2020-03-19 20:25 [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) kan.liang
  2020-03-19 20:25 ` [PATCH V4 01/17] perf pmu: Add support for PMU capabilities kan.liang
  2020-03-19 20:25 ` [PATCH V4 02/17] perf header: Support CPU " kan.liang
@ 2020-03-19 20:25 ` kan.liang
  2020-04-17 14:42   ` Arnaldo Carvalho de Melo
  2020-03-19 20:25 ` [PATCH V4 04/17] perf stat: Clear HEADER_CPU_PMU_CAPS kan.liang
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: kan.liang @ 2020-03-19 20:25 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The CPU PMU capabilities information is only useful for LBR call stack.
Clear the feature for the other perf record modes.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/builtin-record.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4c301466101b..428f7f5b8e48 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1120,6 +1120,9 @@ static void record__init_features(struct record *rec)
 	if (!record__comp_enabled(rec))
 		perf_header__clear_feat(&session->header, HEADER_COMPRESSED);
 
+	if (!callchain_param.enabled || (callchain_param.record_mode != CALLCHAIN_LBR))
+		perf_header__clear_feat(&session->header, HEADER_CPU_PMU_CAPS);
+
 	perf_header__clear_feat(&session->header, HEADER_STAT);
 }
 
-- 
2.17.1
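
The effect of clearing a header feature can be pictured as clearing one
bit in a 256-bit feature mask (HEADER_FEAT_BITS = 256). The sketch below
is a standalone illustration with hypothetical names. It does not use
perf's own bitmap helpers, and the bit numbers are assumptions taken
from the enum order (HEADER_CPU_PMU_CAPS = 28, directly after
HEADER_COMPRESSED).

```c
#include <limits.h>
#include <stdbool.h>

#define FEAT_BITS		256
#define FEAT_COMPRESSED		27	/* assumed from the enum order */
#define FEAT_CPU_PMU_CAPS	28	/* per perf.data-file-format.txt */
#define BITS_PER_WORD		(sizeof(unsigned long) * CHAR_BIT)
#define FEAT_WORDS		((FEAT_BITS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical stand-in for the header's feature bitmap. */
struct feat_mask {
	unsigned long bits[FEAT_WORDS];
};

static void feat_set(struct feat_mask *m, unsigned int feat)
{
	m->bits[feat / BITS_PER_WORD] |= 1UL << (feat % BITS_PER_WORD);
}

static void feat_clear(struct feat_mask *m, unsigned int feat)
{
	m->bits[feat / BITS_PER_WORD] &= ~(1UL << (feat % BITS_PER_WORD));
}

static bool feat_has(const struct feat_mask *m, unsigned int feat)
{
	return m->bits[feat / BITS_PER_WORD] & (1UL << (feat % BITS_PER_WORD));
}

/* Mirrors the condition added to record__init_features(). */
static void init_caps_feature(struct feat_mask *m, bool callchain_enabled,
			      bool lbr_mode)
{
	if (!callchain_enabled || !lbr_mode)
		feat_clear(m, FEAT_CPU_PMU_CAPS);
}
```

Only the CPU_PMU_CAPS bit is affected: with call chains disabled, or
with a record mode other than LBR, the bit is cleared and the
capabilities section is not written, while the other feature bits stay
as they were.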



* [PATCH V4 04/17] perf stat: Clear HEADER_CPU_PMU_CAPS
  2020-03-19 20:25 [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (2 preceding siblings ...)
  2020-03-19 20:25 ` [PATCH V4 03/17] perf record: Clear HEADER_CPU_PMU_CAPS for non LBR call stack mode kan.liang
@ 2020-03-19 20:25 ` kan.liang
  2020-04-17 14:42   ` Arnaldo Carvalho de Melo
  2020-03-19 20:25 ` [PATCH V4 05/17] perf machine: Remove the indent in resolve_lbr_callchain_sample kan.liang
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: kan.liang @ 2020-03-19 20:25 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The CPU PMU capabilities information is only useful for perf record with
LBR call stack.

Clear the header for perf stat.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/builtin-stat.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index ec053dc1e35c..b5c8a5ab5e75 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1595,6 +1595,7 @@ static void init_features(struct perf_session *session)
 	perf_header__clear_feat(&session->header, HEADER_TRACING_DATA);
 	perf_header__clear_feat(&session->header, HEADER_BRANCH_STACK);
 	perf_header__clear_feat(&session->header, HEADER_AUXTRACE);
+	perf_header__clear_feat(&session->header, HEADER_CPU_PMU_CAPS);
 }
 
 static int __cmd_record(int argc, const char **argv)
-- 
2.17.1



* [PATCH V4 05/17] perf machine: Remove the indent in resolve_lbr_callchain_sample
  2020-03-19 20:25 [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (3 preceding siblings ...)
  2020-03-19 20:25 ` [PATCH V4 04/17] perf stat: Clear HEADER_CPU_PMU_CAPS kan.liang
@ 2020-03-19 20:25 ` kan.liang
  2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2020-03-19 20:25 ` [PATCH V4 06/17] perf machine: Refine the function for LBR call stack reconstruction kan.liang
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: kan.liang @ 2020-03-19 20:25 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The indent is unnecessary in resolve_lbr_callchain_sample.
Removing it will make the following patch simpler.

Current code path for resolve_lbr_callchain_sample()

        /* LBR only affects the user callchain */
        if (i != chain_nr) {
                body of the function
                ....
                return 1;
        }

        return 0;

With the patch,

        /* LBR only affects the user callchain */
        if (i == chain_nr)
                return 0;

        body of the function
        ...
        return 1;

No functional changes.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/util/machine.c | 123 +++++++++++++++++++-------------------
 1 file changed, 63 insertions(+), 60 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index fd14f1489802..9021e5b6a2a9 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2177,6 +2177,12 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 	int chain_nr = min(max_stack, (int)chain->nr), i;
 	u8 cpumode = PERF_RECORD_MISC_USER;
 	u64 ip, branch_from = 0;
+	struct branch_stack *lbr_stack;
+	struct branch_entry *entries;
+	int lbr_nr, j, k;
+	bool branch;
+	struct branch_flags *flags;
+	int mix_chain_nr;
 
 	for (i = 0; i < chain_nr; i++) {
 		if (chain->ips[i] == PERF_CONTEXT_USER)
@@ -2184,71 +2190,68 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 	}
 
 	/* LBR only affects the user callchain */
-	if (i != chain_nr) {
-		struct branch_stack *lbr_stack = sample->branch_stack;
-		struct branch_entry *entries = perf_sample__branch_entries(sample);
-		int lbr_nr = lbr_stack->nr, j, k;
-		bool branch;
-		struct branch_flags *flags;
-		/*
-		 * LBR callstack can only get user call chain.
-		 * The mix_chain_nr is kernel call chain
-		 * number plus LBR user call chain number.
-		 * i is kernel call chain number,
-		 * 1 is PERF_CONTEXT_USER,
-		 * lbr_nr + 1 is the user call chain number.
-		 * For details, please refer to the comments
-		 * in callchain__printf
-		 */
-		int mix_chain_nr = i + 1 + lbr_nr + 1;
-
-		for (j = 0; j < mix_chain_nr; j++) {
-			int err;
-			branch = false;
-			flags = NULL;
+	if (i == chain_nr)
+		return 0;
 
-			if (callchain_param.order == ORDER_CALLEE) {
-				if (j < i + 1)
-					ip = chain->ips[j];
-				else if (j > i + 1) {
-					k = j - i - 2;
-					ip = entries[k].from;
-					branch = true;
-					flags = &entries[k].flags;
-				} else {
-					ip = entries[0].to;
-					branch = true;
-					flags = &entries[0].flags;
-					branch_from = entries[0].from;
-				}
+	lbr_stack = sample->branch_stack;
+	entries = perf_sample__branch_entries(sample);
+	lbr_nr = lbr_stack->nr;
+	/*
+	 * LBR callstack can only get user call chain.
+	 * The mix_chain_nr is kernel call chain
+	 * number plus LBR user call chain number.
+	 * i is kernel call chain number,
+	 * 1 is PERF_CONTEXT_USER,
+	 * lbr_nr + 1 is the user call chain number.
+	 * For details, please refer to the comments
+	 * in callchain__printf
+	 */
+	mix_chain_nr = i + 1 + lbr_nr + 1;
+
+	for (j = 0; j < mix_chain_nr; j++) {
+		int err;
+
+		branch = false;
+		flags = NULL;
+
+		if (callchain_param.order == ORDER_CALLEE) {
+			if (j < i + 1)
+				ip = chain->ips[j];
+			else if (j > i + 1) {
+				k = j - i - 2;
+				ip = entries[k].from;
+				branch = true;
+				flags = &entries[k].flags;
 			} else {
-				if (j < lbr_nr) {
-					k = lbr_nr - j - 1;
-					ip = entries[k].from;
-					branch = true;
-					flags = &entries[k].flags;
-				}
-				else if (j > lbr_nr)
-					ip = chain->ips[i + 1 - (j - lbr_nr)];
-				else {
-					ip = entries[0].to;
-					branch = true;
-					flags = &entries[0].flags;
-					branch_from = entries[0].from;
-				}
+				ip = entries[0].to;
+				branch = true;
+				flags = &entries[0].flags;
+				branch_from = entries[0].from;
+			}
+		} else {
+			if (j < lbr_nr) {
+				k = lbr_nr - j - 1;
+				ip = entries[k].from;
+				branch = true;
+				flags = &entries[k].flags;
+			} else if (j > lbr_nr)
+				ip = chain->ips[i + 1 - (j - lbr_nr)];
+			else {
+				ip = entries[0].to;
+				branch = true;
+				flags = &entries[0].flags;
+				branch_from = entries[0].from;
 			}
-
-			err = add_callchain_ip(thread, cursor, parent,
-					       root_al, &cpumode, ip,
-					       branch, flags, NULL,
-					       branch_from);
-			if (err)
-				return (err < 0) ? err : 0;
 		}
-		return 1;
-	}
 
-	return 0;
+		err = add_callchain_ip(thread, cursor, parent,
+				       root_al, &cpumode, ip,
+				       branch, flags, NULL,
+				       branch_from);
+		if (err)
+			return (err < 0) ? err : 0;
+	}
+	return 1;
 }
 
 static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,
-- 
2.17.1



* [PATCH V4 06/17] perf machine: Refine the function for LBR call stack reconstruction
  2020-03-19 20:25 [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (4 preceding siblings ...)
  2020-03-19 20:25 ` [PATCH V4 05/17] perf machine: Remove the indent in resolve_lbr_callchain_sample kan.liang
@ 2020-03-19 20:25 ` kan.liang
  2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2020-03-19 20:25 ` [PATCH V4 07/17] perf machine: Factor out lbr_callchain_add_kernel_ip() kan.liang
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: kan.liang @ 2020-03-19 20:25 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

LBR only collects the user call stack. To reconstruct a call stack, both
the kernel call stack and the user call stack are required. The function
resolve_lbr_callchain_sample() mixes the kernel call stack and the user
call stack.
Now, with the help of the HW idx, the perf tool can reconstruct a more
complete call stack by adding some user call stack from the previous
sample. However, the current implementation is hard to extend to
support it.

Current code path for resolve_lbr_callchain_sample()

  for (j = 0; j < mix_chain_nr; j++) {
       if (ORDER_CALLEE) {
             if (kernel callchain)
                  Fill callchain info
             else if (LBR callchain)
                  Fill callchain info
       } else {
             if (LBR callchain)
                  Fill callchain info
             else if (kernel callchain)
                  Fill callchain info
       }
       add_callchain_ip();
  }

With the patch,

  if (ORDER_CALLEE) {
       for (j = 0; j < NUM of kernel callchain) {
             Fill callchain info
             add_callchain_ip();
       }
       for (; j < mix_chain_nr) {
             Fill callchain info
             add_callchain_ip();
       }
  } else {
       for (j = 0; j < NUM of LBR callchain) {
             Fill callchain info
             add_callchain_ip();
       }
       for (; j < mix_chain_nr) {
             Fill callchain info
             add_callchain_ip();
       }
  }

No functional changes.
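
The ordering that both the old and the new code produce can be shown
with a standalone sketch. It only models the sequence of ips, not
add_callchain_ip() or the branch flags. The names mix_chain() and
struct lbr_entry are hypothetical, and nr_kernel stands for i + 1 in
the function (the kernel entries plus the PERF_CONTEXT_USER marker).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the from/to pair of a branch entry. */
struct lbr_entry {
	uint64_t from;
	uint64_t to;
};

/*
 * Fill out[] with the mixed chain and return its length, which is
 * nr_kernel + 1 + lbr_nr. Requires lbr_nr >= 1 and room in out[].
 */
static int mix_chain(const uint64_t *kernel_ips, int nr_kernel,
		     const struct lbr_entry *entries, int lbr_nr,
		     bool callee_order, uint64_t *out)
{
	int n = 0, k;

	if (callee_order) {
		/* Kernel ips first, then entries[0].to, then every .from. */
		for (k = 0; k < nr_kernel; k++)
			out[n++] = kernel_ips[k];
		out[n++] = entries[0].to;
		for (k = 0; k < lbr_nr; k++)
			out[n++] = entries[k].from;
	} else {
		/* Caller order is the exact reverse of the sequence above. */
		for (k = lbr_nr - 1; k >= 0; k--)
			out[n++] = entries[k].from;
		out[n++] = entries[0].to;
		for (k = nr_kernel - 1; k >= 0; k--)
			out[n++] = kernel_ips[k];
	}

	return n;
}
```

With kernel ips {10, 11} and LBR entries {from 1, to 2} and
{from 3, to 4}, callee order yields 10, 11, 2, 1, 3 and caller order
yields 3, 1, 2, 11, 10.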

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/util/machine.c | 111 ++++++++++++++++++++++++++------------
 1 file changed, 76 insertions(+), 35 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 9021e5b6a2a9..cf2c97a6ef81 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2183,6 +2183,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 	bool branch;
 	struct branch_flags *flags;
 	int mix_chain_nr;
+	int err;
 
 	for (i = 0; i < chain_nr; i++) {
 		if (chain->ips[i] == PERF_CONTEXT_USER)
@@ -2208,50 +2209,90 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 	 */
 	mix_chain_nr = i + 1 + lbr_nr + 1;
 
-	for (j = 0; j < mix_chain_nr; j++) {
-		int err;
-
-		branch = false;
-		flags = NULL;
-
-		if (callchain_param.order == ORDER_CALLEE) {
-			if (j < i + 1)
-				ip = chain->ips[j];
-			else if (j > i + 1) {
-				k = j - i - 2;
-				ip = entries[k].from;
-				branch = true;
-				flags = &entries[k].flags;
-			} else {
-				ip = entries[0].to;
-				branch = true;
-				flags = &entries[0].flags;
-				branch_from = entries[0].from;
-			}
-		} else {
-			if (j < lbr_nr) {
-				k = lbr_nr - j - 1;
-				ip = entries[k].from;
-				branch = true;
-				flags = &entries[k].flags;
-			} else if (j > lbr_nr)
-				ip = chain->ips[i + 1 - (j - lbr_nr)];
-			else {
-				ip = entries[0].to;
-				branch = true;
-				flags = &entries[0].flags;
-				branch_from = entries[0].from;
-			}
+	if (callchain_param.order == ORDER_CALLEE) {
+		/* Add kernel ip */
+		for (j = 0; j < i + 1; j++) {
+			ip = chain->ips[j];
+			branch = false;
+			flags = NULL;
+			err = add_callchain_ip(thread, cursor, parent,
+					       root_al, &cpumode, ip,
+					       branch, flags, NULL,
+					       branch_from);
+			if (err)
+				goto error;
 		}
+		/* Add LBR ip from first entries.to */
+		ip = entries[0].to;
+		branch = true;
+		flags = &entries[0].flags;
+		branch_from = entries[0].from;
+		err = add_callchain_ip(thread, cursor, parent,
+				       root_al, &cpumode, ip,
+				       branch, flags, NULL,
+				       branch_from);
+		if (err)
+			goto error;
 
+		/* Add LBR ip from entries.from one by one. */
+		for (j = i + 2; j < mix_chain_nr; j++) {
+			k = j - i - 2;
+			ip = entries[k].from;
+			branch = true;
+			flags = &entries[k].flags;
+
+			err = add_callchain_ip(thread, cursor, parent,
+					       root_al, &cpumode, ip,
+					       branch, flags, NULL,
+					       branch_from);
+			if (err)
+				goto error;
+		}
+	} else {
+		/* Add LBR ip from entries.from one by one. */
+		for (j = 0; j < lbr_nr; j++) {
+			k = lbr_nr - j - 1;
+			ip = entries[k].from;
+			branch = true;
+			flags = &entries[k].flags;
+
+			err = add_callchain_ip(thread, cursor, parent,
+					       root_al, &cpumode, ip,
+					       branch, flags, NULL,
+					       branch_from);
+			if (err)
+				goto error;
+		}
+
+		/* Add LBR ip from first entries.to */
+		ip = entries[0].to;
+		branch = true;
+		flags = &entries[0].flags;
+		branch_from = entries[0].from;
 		err = add_callchain_ip(thread, cursor, parent,
 				       root_al, &cpumode, ip,
 				       branch, flags, NULL,
 				       branch_from);
 		if (err)
-			return (err < 0) ? err : 0;
+			goto error;
+
+		/* Add kernel ip */
+		for (j = lbr_nr + 1; j < mix_chain_nr; j++) {
+			ip = chain->ips[i + 1 - (j - lbr_nr)];
+			branch = false;
+			flags = NULL;
+			err = add_callchain_ip(thread, cursor, parent,
+					       root_al, &cpumode, ip,
+					       branch, flags, NULL,
+					       branch_from);
+			if (err)
+				goto error;
+		}
 	}
 	return 1;
+
+error:
+	return (err < 0) ? err : 0;
 }
 
 static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,
-- 
2.17.1



* [PATCH V4 07/17] perf machine: Factor out lbr_callchain_add_kernel_ip()
  2020-03-19 20:25 [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (5 preceding siblings ...)
  2020-03-19 20:25 ` [PATCH V4 06/17] perf machine: Refine the function for LBR call stack reconstruction kan.liang
@ 2020-03-19 20:25 ` kan.liang
  2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2020-03-19 20:25 ` [PATCH V4 08/17] perf machine: Factor out lbr_callchain_add_lbr_ip() kan.liang
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: kan.liang @ 2020-03-19 20:25 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Both the caller and callee orders need to add kernel ips to the callchain.
Factor out lbr_callchain_add_kernel_ip() to improve code readability.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/util/machine.c | 67 ++++++++++++++++++++++++++-------------
 1 file changed, 45 insertions(+), 22 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index cf2c97a6ef81..ef26f28eacea 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2159,6 +2159,40 @@ static int remove_loops(struct branch_entry *l, int nr,
 	return nr;
 }
 
+static int lbr_callchain_add_kernel_ip(struct thread *thread,
+				       struct callchain_cursor *cursor,
+				       struct perf_sample *sample,
+				       struct symbol **parent,
+				       struct addr_location *root_al,
+				       u64 branch_from,
+				       bool callee, int end)
+{
+	struct ip_callchain *chain = sample->callchain;
+	u8 cpumode = PERF_RECORD_MISC_USER;
+	int err, i;
+
+	if (callee) {
+		for (i = 0; i < end + 1; i++) {
+			err = add_callchain_ip(thread, cursor, parent,
+					       root_al, &cpumode, chain->ips[i],
+					       false, NULL, NULL, branch_from);
+			if (err)
+				return err;
+		}
+		return 0;
+	}
+
+	for (i = end; i >= 0; i--) {
+		err = add_callchain_ip(thread, cursor, parent,
+				       root_al, &cpumode, chain->ips[i],
+				       false, NULL, NULL, branch_from);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 /*
  * Recolve LBR callstack chain sample
  * Return:
@@ -2211,17 +2245,12 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 
 	if (callchain_param.order == ORDER_CALLEE) {
 		/* Add kernel ip */
-		for (j = 0; j < i + 1; j++) {
-			ip = chain->ips[j];
-			branch = false;
-			flags = NULL;
-			err = add_callchain_ip(thread, cursor, parent,
-					       root_al, &cpumode, ip,
-					       branch, flags, NULL,
-					       branch_from);
-			if (err)
-				goto error;
-		}
+		err = lbr_callchain_add_kernel_ip(thread, cursor, sample,
+						  parent, root_al, branch_from,
+						  true, i);
+		if (err)
+			goto error;
+
 		/* Add LBR ip from first entries.to */
 		ip = entries[0].to;
 		branch = true;
@@ -2277,17 +2306,11 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 			goto error;
 
 		/* Add kernel ip */
-		for (j = lbr_nr + 1; j < mix_chain_nr; j++) {
-			ip = chain->ips[i + 1 - (j - lbr_nr)];
-			branch = false;
-			flags = NULL;
-			err = add_callchain_ip(thread, cursor, parent,
-					       root_al, &cpumode, ip,
-					       branch, flags, NULL,
-					       branch_from);
-			if (err)
-				goto error;
-		}
+		err = lbr_callchain_add_kernel_ip(thread, cursor, sample,
+						  parent, root_al, branch_from,
+						  false, i);
+		if (err)
+			goto error;
 	}
 	return 1;
 
-- 
2.17.1



* [PATCH V4 08/17] perf machine: Factor out lbr_callchain_add_lbr_ip()
  2020-03-19 20:25 [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (6 preceding siblings ...)
  2020-03-19 20:25 ` [PATCH V4 07/17] perf machine: Factor out lbr_callchain_add_kernel_ip() kan.liang
@ 2020-03-19 20:25 ` kan.liang
  2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2020-03-19 20:25 ` [PATCH V4 09/17] perf thread: Add a knob for LBR stitch approach kan.liang
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: kan.liang @ 2020-03-19 20:25 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Both the caller and callee orders need to add ips from LBR to the callchain.
Factor out lbr_callchain_add_lbr_ip() to improve code readability.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/util/machine.c | 143 +++++++++++++++++++-------------------
 1 file changed, 73 insertions(+), 70 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index ef26f28eacea..f1661dd3ca69 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2193,6 +2193,74 @@ static int lbr_callchain_add_kernel_ip(struct thread *thread,
 	return 0;
 }
 
+static int lbr_callchain_add_lbr_ip(struct thread *thread,
+				    struct callchain_cursor *cursor,
+				    struct perf_sample *sample,
+				    struct symbol **parent,
+				    struct addr_location *root_al,
+				    u64 *branch_from,
+				    bool callee)
+{
+	struct branch_stack *lbr_stack = sample->branch_stack;
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
+	u8 cpumode = PERF_RECORD_MISC_USER;
+	int lbr_nr = lbr_stack->nr;
+	struct branch_flags *flags;
+	int err, i;
+	u64 ip;
+
+	if (callee) {
+		/* Add LBR ip from first entries.to */
+		ip = entries[0].to;
+		flags = &entries[0].flags;
+		*branch_from = entries[0].from;
+		err = add_callchain_ip(thread, cursor, parent,
+				       root_al, &cpumode, ip,
+				       true, flags, NULL,
+				       *branch_from);
+		if (err)
+			return err;
+
+		/* Add LBR ip from entries.from one by one. */
+		for (i = 0; i < lbr_nr; i++) {
+			ip = entries[i].from;
+			flags = &entries[i].flags;
+			err = add_callchain_ip(thread, cursor, parent,
+					       root_al, &cpumode, ip,
+					       true, flags, NULL,
+					       *branch_from);
+			if (err)
+				return err;
+		}
+		return 0;
+	}
+
+	/* Add LBR ip from entries.from one by one. */
+	for (i = lbr_nr - 1; i >= 0; i--) {
+		ip = entries[i].from;
+		flags = &entries[i].flags;
+		err = add_callchain_ip(thread, cursor, parent,
+				       root_al, &cpumode, ip,
+				       true, flags, NULL,
+				       *branch_from);
+		if (err)
+			return err;
+	}
+
+	/* Add LBR ip from first entries.to */
+	ip = entries[0].to;
+	flags = &entries[0].flags;
+	*branch_from = entries[0].from;
+	err = add_callchain_ip(thread, cursor, parent,
+			       root_al, &cpumode, ip,
+			       true, flags, NULL,
+			       *branch_from);
+	if (err)
+		return err;
+
+	return 0;
+}
+
 /*
  * Recolve LBR callstack chain sample
  * Return:
@@ -2209,14 +2277,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 {
 	struct ip_callchain *chain = sample->callchain;
 	int chain_nr = min(max_stack, (int)chain->nr), i;
-	u8 cpumode = PERF_RECORD_MISC_USER;
-	u64 ip, branch_from = 0;
-	struct branch_stack *lbr_stack;
-	struct branch_entry *entries;
-	int lbr_nr, j, k;
-	bool branch;
-	struct branch_flags *flags;
-	int mix_chain_nr;
+	u64 branch_from = 0;
 	int err;
 
 	for (i = 0; i < chain_nr; i++) {
@@ -2228,21 +2289,6 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 	if (i == chain_nr)
 		return 0;
 
-	lbr_stack = sample->branch_stack;
-	entries = perf_sample__branch_entries(sample);
-	lbr_nr = lbr_stack->nr;
-	/*
-	 * LBR callstack can only get user call chain.
-	 * The mix_chain_nr is kernel call chain
-	 * number plus LBR user call chain number.
-	 * i is kernel call chain number,
-	 * 1 is PERF_CONTEXT_USER,
-	 * lbr_nr + 1 is the user call chain number.
-	 * For details, please refer to the comments
-	 * in callchain__printf
-	 */
-	mix_chain_nr = i + 1 + lbr_nr + 1;
-
 	if (callchain_param.order == ORDER_CALLEE) {
 		/* Add kernel ip */
 		err = lbr_callchain_add_kernel_ip(thread, cursor, sample,
@@ -2251,57 +2297,14 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 		if (err)
 			goto error;
 
-		/* Add LBR ip from first entries.to */
-		ip = entries[0].to;
-		branch = true;
-		flags = &entries[0].flags;
-		branch_from = entries[0].from;
-		err = add_callchain_ip(thread, cursor, parent,
-				       root_al, &cpumode, ip,
-				       branch, flags, NULL,
-				       branch_from);
+		err = lbr_callchain_add_lbr_ip(thread, cursor, sample, parent,
+					       root_al, &branch_from, true);
 		if (err)
 			goto error;
 
-		/* Add LBR ip from entries.from one by one. */
-		for (j = i + 2; j < mix_chain_nr; j++) {
-			k = j - i - 2;
-			ip = entries[k].from;
-			branch = true;
-			flags = &entries[k].flags;
-
-			err = add_callchain_ip(thread, cursor, parent,
-					       root_al, &cpumode, ip,
-					       branch, flags, NULL,
-					       branch_from);
-			if (err)
-				goto error;
-		}
 	} else {
-		/* Add LBR ip from entries.from one by one. */
-		for (j = 0; j < lbr_nr; j++) {
-			k = lbr_nr - j - 1;
-			ip = entries[k].from;
-			branch = true;
-			flags = &entries[k].flags;
-
-			err = add_callchain_ip(thread, cursor, parent,
-					       root_al, &cpumode, ip,
-					       branch, flags, NULL,
-					       branch_from);
-			if (err)
-				goto error;
-		}
-
-		/* Add LBR ip from first entries.to */
-		ip = entries[0].to;
-		branch = true;
-		flags = &entries[0].flags;
-		branch_from = entries[0].from;
-		err = add_callchain_ip(thread, cursor, parent,
-				       root_al, &cpumode, ip,
-				       branch, flags, NULL,
-				       branch_from);
+		err = lbr_callchain_add_lbr_ip(thread, cursor, sample, parent,
+					       root_al, &branch_from, false);
 		if (err)
 			goto error;
 
-- 
2.17.1



* [PATCH V4 09/17] perf thread: Add a knob for LBR stitch approach
  2020-03-19 20:25 [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (7 preceding siblings ...)
  2020-03-19 20:25 ` [PATCH V4 08/17] perf machine: Factor out lbr_callchain_add_lbr_ip() kan.liang
@ 2020-03-19 20:25 ` kan.liang
  2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2020-03-19 20:25 ` [PATCH V4 10/17] perf tools: Save previous sample for LBR stitching approach kan.liang
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: kan.liang @ 2020-03-19 20:25 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The LBR stitch approach should be disabled by default, because:
- The stitching approach is based on LBR call stack technology. The known
  limitations of LBR call stack technology still apply to the approach,
  e.g. exception handling such as setjmp/longjmp will cause calls and
  returns not to match.
- This approach is not foolproof. There can be cases where it creates
  incorrect call stacks from incorrect matches. There is no attempt to
  validate any matches in another way.

The 'lbr_stitch_enable' field indicates whether the LBR stitch approach
is enabled. It is disabled by default. The following patches will
introduce a new option for each tool to enable the LBR stitch approach.
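
To make the setjmp/longjmp limitation concrete, here is a small
standalone sketch (not part of perf; all names are made up for the
example). Two functions are called, but longjmp() leaves both of them
without executing their returns, so anything that pairs calls with
returns, as the LBR call stack mode is described to do, sees unmatched
calls.

```c
#include <setjmp.h>

static jmp_buf env;
static int calls, returns;

static void leaf(void)
{
	calls++;
	longjmp(env, 1);	/* leaves leaf() and middle() without returning */
}

static void middle(void)
{
	calls++;
	leaf();
	returns++;		/* never reached */
}

/* Returns the number of calls that were never paired with a return. */
static int unmatched_calls(void)
{
	calls = returns = 0;
	if (setjmp(env) == 0)
		middle();
	return calls - returns;
}
```

Here unmatched_calls() counts the two calls (middle and leaf) that have
no matching return.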

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/util/thread.c | 1 +
 tools/perf/util/thread.h | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 28b719388028..1f080db23615 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -47,6 +47,7 @@ struct thread *thread__new(pid_t pid, pid_t tid)
 		thread->tid = tid;
 		thread->ppid = -1;
 		thread->cpu = -1;
+		thread->lbr_stitch_enable = false;
 		INIT_LIST_HEAD(&thread->namespaces_list);
 		INIT_LIST_HEAD(&thread->comm_list);
 		init_rwsem(&thread->namespaces_lock);
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 20b96b5d1f15..95294050cff2 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -46,6 +46,9 @@ struct thread {
 	struct srccode_state	srccode_state;
 	bool			filter;
 	int			filter_entry_depth;
+
+	/* LBR call stack stitch */
+	bool			lbr_stitch_enable;
 };
 
 struct machine;
-- 
2.17.1



* [PATCH V4 10/17] perf tools: Save previous sample for LBR stitching approach
  2020-03-19 20:25 [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (8 preceding siblings ...)
  2020-03-19 20:25 ` [PATCH V4 09/17] perf thread: Add a knob for LBR stitch approach kan.liang
@ 2020-03-19 20:25 ` kan.liang
  2020-04-17 15:02   ` Arnaldo Carvalho de Melo
  2020-04-22 12:17   ` [tip: perf/core] perf thread: " tip-bot2 for Kan Liang
  2020-03-19 20:25 ` [PATCH V4 11/17] perf tools: Save previous cursor nodes " kan.liang
                   ` (8 subsequent siblings)
  18 siblings, 2 replies; 46+ messages in thread
From: kan.liang @ 2020-03-19 20:25 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

To retrieve the overwritten LBRs from the previous sample for the LBR
stitching approach, perf has to save the previous sample.

Only allocate the struct lbr_stitch once, when the LBR stitching approach
is enabled and the kernel supports hw_idx.
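
The allocate-once behavior can be sketched as below. This is only an
illustration with hypothetical names (struct worker, struct stitch_state,
ensure_stitch_state()), not the perf implementation. Note that the
success path returns before the failure handling; in the diff of this
patch, alloc_lbr_stitch() appears to fall through to the err label after
a successful calloc(), and the next patch in the series adds an explicit
'return true'.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-thread state, standing in for struct lbr_stitch. */
struct stitch_state {
	int placeholder;
};

struct worker {
	bool stitch_enable;
	struct stitch_state *stitch;
};

static bool ensure_stitch_state(struct worker *w)
{
	if (w->stitch)
		return true;	/* already allocated for an earlier sample */

	w->stitch = calloc(1, sizeof(*w->stitch));
	if (w->stitch)
		return true;	/* success returns before the failure path */

	/* Allocation failed: disable the feature so later samples skip it. */
	w->stitch_enable = false;
	return false;
}
```

Calling ensure_stitch_state() for every sample is cheap after the first
call, since it only tests the pointer.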

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/util/machine.c | 23 +++++++++++++++++++++++
 tools/perf/util/thread.c  |  1 +
 tools/perf/util/thread.h  | 11 +++++++++++
 3 files changed, 35 insertions(+)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index f1661dd3ca69..d91e11bfc8ca 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2261,6 +2261,21 @@ static int lbr_callchain_add_lbr_ip(struct thread *thread,
 	return 0;
 }
 
+static bool alloc_lbr_stitch(struct thread *thread)
+{
+	if (thread->lbr_stitch)
+		return true;
+
+	thread->lbr_stitch = calloc(1, sizeof(struct lbr_stitch));
+	if (!thread->lbr_stitch)
+		goto err;
+
+err:
+	pr_warning("Failed to allocate space for stitched LBRs. Disable LBR stitch\n");
+	thread->lbr_stitch_enable = false;
+	return false;
+}
+
 /*
  * Recolve LBR callstack chain sample
  * Return:
@@ -2277,6 +2292,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 {
 	struct ip_callchain *chain = sample->callchain;
 	int chain_nr = min(max_stack, (int)chain->nr), i;
+	struct lbr_stitch *lbr_stitch;
 	u64 branch_from = 0;
 	int err;
 
@@ -2289,6 +2305,13 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 	if (i == chain_nr)
 		return 0;
 
+	if (thread->lbr_stitch_enable && !sample->no_hw_idx &&
+	    alloc_lbr_stitch(thread)) {
+		lbr_stitch = thread->lbr_stitch;
+
+		memcpy(&lbr_stitch->prev_sample, sample, sizeof(*sample));
+	}
+
 	if (callchain_param.order == ORDER_CALLEE) {
 		/* Add kernel ip */
 		err = lbr_callchain_add_kernel_ip(thread, cursor, sample,
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 1f080db23615..8d0da260c84c 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -111,6 +111,7 @@ void thread__delete(struct thread *thread)
 
 	exit_rwsem(&thread->namespaces_lock);
 	exit_rwsem(&thread->comm_lock);
+	thread__free_stitch_list(thread);
 	free(thread);
 }
 
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 95294050cff2..f65a84a25f93 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -13,6 +13,7 @@
 #include <strlist.h>
 #include <intlist.h>
 #include "rwsem.h"
+#include "event.h"
 
 struct addr_location;
 struct map;
@@ -20,6 +21,10 @@ struct perf_record_namespaces;
 struct thread_stack;
 struct unwind_libunwind_ops;
 
+struct lbr_stitch {
+	struct perf_sample		prev_sample;
+};
+
 struct thread {
 	union {
 		struct rb_node	 rb_node;
@@ -49,6 +54,7 @@ struct thread {
 
 	/* LBR call stack stitch */
 	bool			lbr_stitch_enable;
+	struct lbr_stitch	*lbr_stitch;
 };
 
 struct machine;
@@ -145,4 +151,9 @@ static inline bool thread__is_filtered(struct thread *thread)
 	return false;
 }
 
+static inline void thread__free_stitch_list(struct thread *thread)
+{
+	free(thread->lbr_stitch);
+}
+
 #endif	/* __PERF_THREAD_H */
-- 
2.17.1



* [PATCH V4 11/17] perf tools: Save previous cursor nodes for LBR stitching approach
  2020-03-19 20:25 [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (9 preceding siblings ...)
  2020-03-19 20:25 ` [PATCH V4 10/17] perf tools: Save previous sample for LBR stitching approach kan.liang
@ 2020-03-19 20:25 ` kan.liang
  2020-04-17 16:53   ` Arnaldo Carvalho de Melo
  2020-04-22 12:17   ` [tip: perf/core] perf callchain: " tip-bot2 for Kan Liang
  2020-03-19 20:25 ` [PATCH V4 12/17] perf tools: Stitch LBR call stack kan.liang
                   ` (7 subsequent siblings)
  18 siblings, 2 replies; 46+ messages in thread
From: kan.liang @ 2020-03-19 20:25 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The cursor nodes generated from a sample are eventually added to the
callchain. To avoid generating cursor nodes from previous samples again,
the previous cursor nodes are also saved for the LBR stitching approach.

Some options, e.g. hide-unresolved, may hide some LBRs.
Add a variable 'valid' to struct callchain_cursor_node to indicate this
case. The LBR stitching approach will only append the valid cursor nodes
from previous samples later.
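
The role of the 'valid' flag can be sketched as follows. The code is a
simplified model with hypothetical names (struct saved_node, save_node(),
append_valid()), not the perf data structures: each LBR slot records
whether a cursor node was really appended for it, and a later pass copies
only the slots that were.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for a saved callchain cursor node. */
struct saved_node {
	uint64_t ip;
	bool valid;
};

/*
 * Record the outcome for LBR slot idx. 'appended' is false when no node
 * was added for this entry, e.g. because an option hid it.
 */
static void save_node(struct saved_node *saved, size_t idx,
		      uint64_t ip, bool appended)
{
	saved[idx].valid = appended;
	if (appended)
		saved[idx].ip = ip;
}

/* Copy only the valid saved nodes to out[]; returns how many were copied. */
static size_t append_valid(const struct saved_node *saved, size_t nr,
			   uint64_t *out)
{
	size_t n = 0;

	for (size_t i = 0; i < nr; i++) {
		if (saved[i].valid)
			out[n++] = saved[i].ip;
	}
	return n;
}
```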

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/util/callchain.h |  3 ++
 tools/perf/util/machine.c   | 77 +++++++++++++++++++++++++++++++++++--
 tools/perf/util/thread.h    |  8 ++++
 3 files changed, 84 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 706bb7bbe1e1..cb33cd42ff43 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -143,6 +143,9 @@ struct callchain_cursor_node {
 	u64				ip;
 	struct map_symbol		ms;
 	const char			*srcline;
+	/* Indicate valid cursor node for LBR stitch */
+	bool				valid;
+
 	bool				branch;
 	struct branch_flags		branch_flags;
 	u64				branch_from;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index d91e11bfc8ca..f190265a1f26 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2193,6 +2193,31 @@ static int lbr_callchain_add_kernel_ip(struct thread *thread,
 	return 0;
 }
 
+static void save_lbr_cursor_node(struct thread *thread,
+				 struct callchain_cursor *cursor,
+				 int idx)
+{
+	struct lbr_stitch *lbr_stitch = thread->lbr_stitch;
+
+	if (!lbr_stitch)
+		return;
+
+	if (cursor->pos == cursor->nr) {
+		lbr_stitch->prev_lbr_cursor[idx].valid = false;
+		return;
+	}
+
+	if (!cursor->curr)
+		cursor->curr = cursor->first;
+	else
+		cursor->curr = cursor->curr->next;
+	memcpy(&lbr_stitch->prev_lbr_cursor[idx], cursor->curr,
+	       sizeof(struct callchain_cursor_node));
+
+	lbr_stitch->prev_lbr_cursor[idx].valid = true;
+	cursor->pos++;
+}
+
 static int lbr_callchain_add_lbr_ip(struct thread *thread,
 				    struct callchain_cursor *cursor,
 				    struct perf_sample *sample,
@@ -2209,6 +2234,21 @@ static int lbr_callchain_add_lbr_ip(struct thread *thread,
 	int err, i;
 	u64 ip;
 
+	/*
+	 * The curr and pos are not used in writing session. They are cleared
+	 * in callchain_cursor_commit() when the writing session is closed.
+	 * Using curr and pos to track the current cursor node.
+	 */
+	if (thread->lbr_stitch) {
+		cursor->curr = NULL;
+		cursor->pos = cursor->nr;
+		if (cursor->nr) {
+			cursor->curr = cursor->first;
+			for (i = 0; i < (int)(cursor->nr - 1); i++)
+				cursor->curr = cursor->curr->next;
+		}
+	}
+
 	if (callee) {
 		/* Add LBR ip from first entries.to */
 		ip = entries[0].to;
@@ -2221,6 +2261,20 @@ static int lbr_callchain_add_lbr_ip(struct thread *thread,
 		if (err)
 			return err;
 
+		/*
+		 * The number of cursor node increases.
+		 * Move the current cursor node.
+		 * But does not need to save current cursor node for entry 0.
+		 * It's impossible to stitch the whole LBRs of previous sample.
+		 */
+		if (thread->lbr_stitch && (cursor->pos != cursor->nr)) {
+			if (!cursor->curr)
+				cursor->curr = cursor->first;
+			else
+				cursor->curr = cursor->curr->next;
+			cursor->pos++;
+		}
+
 		/* Add LBR ip from entries.from one by one. */
 		for (i = 0; i < lbr_nr; i++) {
 			ip = entries[i].from;
@@ -2231,6 +2285,7 @@ static int lbr_callchain_add_lbr_ip(struct thread *thread,
 					       *branch_from);
 			if (err)
 				return err;
+			save_lbr_cursor_node(thread, cursor, i);
 		}
 		return 0;
 	}
@@ -2245,6 +2300,7 @@ static int lbr_callchain_add_lbr_ip(struct thread *thread,
 				       *branch_from);
 		if (err)
 			return err;
+		save_lbr_cursor_node(thread, cursor, i);
 	}
 
 	/* Add LBR ip from first entries.to */
@@ -2261,7 +2317,7 @@ static int lbr_callchain_add_lbr_ip(struct thread *thread,
 	return 0;
 }
 
-static bool alloc_lbr_stitch(struct thread *thread)
+static bool alloc_lbr_stitch(struct thread *thread, unsigned int max_lbr)
 {
 	if (thread->lbr_stitch)
 		return true;
@@ -2270,6 +2326,15 @@ static bool alloc_lbr_stitch(struct thread *thread)
 	if (!thread->lbr_stitch)
 		goto err;
 
+	thread->lbr_stitch->prev_lbr_cursor = calloc(max_lbr + 1, sizeof(struct callchain_cursor_node));
+	if (!thread->lbr_stitch->prev_lbr_cursor)
+		goto free_lbr_stitch;
+
+	return true;
+
+free_lbr_stitch:
+	free(thread->lbr_stitch);
+	thread->lbr_stitch = NULL;
 err:
 	pr_warning("Failed to allocate space for stitched LBRs. Disable LBR stitch\n");
 	thread->lbr_stitch_enable = false;
@@ -2288,7 +2353,8 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 					struct perf_sample *sample,
 					struct symbol **parent,
 					struct addr_location *root_al,
-					int max_stack)
+					int max_stack,
+					unsigned int max_lbr)
 {
 	struct ip_callchain *chain = sample->callchain;
 	int chain_nr = min(max_stack, (int)chain->nr), i;
@@ -2306,7 +2372,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 		return 0;
 
 	if (thread->lbr_stitch_enable && !sample->no_hw_idx &&
-	    alloc_lbr_stitch(thread)) {
+	    (max_lbr > 0) && alloc_lbr_stitch(thread, max_lbr)) {
 		lbr_stitch = thread->lbr_stitch;
 
 		memcpy(&lbr_stitch->prev_sample, sample, sizeof(*sample));
@@ -2386,8 +2452,11 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 		chain_nr = chain->nr;
 
 	if (perf_evsel__has_branch_callstack(evsel)) {
+		struct perf_env *env = perf_evsel__env(evsel);
+
 		err = resolve_lbr_callchain_sample(thread, cursor, sample, parent,
-						   root_al, max_stack);
+						   root_al, max_stack,
+						   !env ? 0 : env->max_branches);
 		if (err)
 			return (err < 0) ? err : 0;
 	}
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index f65a84a25f93..477c669cdbfa 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -14,6 +14,7 @@
 #include <intlist.h>
 #include "rwsem.h"
 #include "event.h"
+#include "callchain.h"
 
 struct addr_location;
 struct map;
@@ -23,6 +24,7 @@ struct unwind_libunwind_ops;
 
 struct lbr_stitch {
 	struct perf_sample		prev_sample;
+	struct callchain_cursor_node	*prev_lbr_cursor;
 };
 
 struct thread {
@@ -153,6 +155,12 @@ static inline bool thread__is_filtered(struct thread *thread)
 
 static inline void thread__free_stitch_list(struct thread *thread)
 {
+	struct lbr_stitch *lbr_stitch = thread->lbr_stitch;
+
+	if (!lbr_stitch)
+		return;
+
+	free(lbr_stitch->prev_lbr_cursor);
 	free(thread->lbr_stitch);
 }
 
-- 
2.17.1



* [PATCH V4 12/17] perf tools: Stitch LBR call stack
  2020-03-19 20:25 [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (10 preceding siblings ...)
  2020-03-19 20:25 ` [PATCH V4 11/17] perf tools: Save previous cursor nodes " kan.liang
@ 2020-03-19 20:25 ` kan.liang
  2020-04-22 12:17   ` [tip: perf/core] perf callchain: " tip-bot2 for Kan Liang
  2020-03-19 20:25 ` [PATCH V4 13/17] perf report: Add option to enable the LBR stitching approach kan.liang
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: kan.liang @ 2020-03-19 20:25 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

In LBR call stack mode, the depth of the reconstructed LBR call stack is
limited to the number of LBR registers.

  For example, on Skylake, the depth of the reconstructed LBR call stack
  is always <= 32.

  # To display the perf.data header info, please use
  # --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 6K of event 'cycles'
  # Event count (approx.): 6487119731
  #
  # Children      Self  Command          Shared Object       Symbol
  # ........  ........  ...............  ..................
  # ................................

    99.97%    99.97%  tchain_edit      tchain_edit        [.] f43
            |
             --99.64%--f11
                       f12
                       f13
                       f14
                       f15
                       f16
                       f17
                       f18
                       f19
                       f20
                       f21
                       f22
                       f23
                       f24
                       f25
                       f26
                       f27
                       f28
                       f29
                       f30
                       f31
                       f32
                       f33
                       f34
                       f35
                       f36
                       f37
                       f38
                       f39
                       f40
                       f41
                       f42
                       f43

For a call stack which is deeper than the LBR limit, HW will overwrite
the LBR register that holds the oldest branch. Only partial call stacks
can be reconstructed.

However, the overwritten LBRs may still be retrieved from the previous
sample. At that moment, HW hasn't overwritten the LBR registers yet.
Perf tools can stitch those overwritten LBRs onto the current call stack
to get a more complete call stack.

To determine if LBRs can be stitched, perf tools need to compare the
current sample with the previous sample.
- They should have identical LBR records (same from, to and flags
  values, and the same physical index of LBR registers).
- The search starts from the base-of-stack of the current sample.
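
The comparison described above can be sketched roughly as below. This is
a simplified model, not the code of this patch: struct lbr_entry and
count_identical() are hypothetical names, entries are assumed to be
ordered newest first (so index nr - 1 is the base of the stack), and the
physical register index check is reduced to a 'distance' parameter that
is assumed to give the index in prev[] lining up with the base-of-stack
entry of cur[].

```c
#include <stdint.h>

/* Hypothetical, reduced LBR record; flags are compared as one value. */
struct lbr_entry {
	uint64_t from;
	uint64_t to;
	uint64_t flags;
};

/*
 * Walk from the base of the current stack towards newer entries and
 * count how many records are identical in the previous sample.
 * Returns 0 when nothing matches, i.e. nothing can be stitched.
 */
static int count_identical(const struct lbr_entry *prev, int prev_nr,
			   const struct lbr_entry *cur, int cur_nr,
			   int distance)
{
	int i, j, n = 0;

	if (distance < 0 || distance >= prev_nr)
		return 0;

	for (i = distance, j = cur_nr - 1; i >= 0 && j >= 0; i--, j--) {
		if (prev[i].from != cur[j].from ||
		    prev[i].to != cur[j].to ||
		    prev[i].flags != cur[j].flags)
			break;
		n++;
	}
	return n;
}
```

In this model, when the count is non-zero, the entries of prev[] with an
index greater than distance are the older branches that the current
sample has lost, and they are the candidates for stitching.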

Once perf decides to stitch the previous LBRs, the corresponding LBR
cursor nodes are copied to 'lists', which tracks the LBR cursor nodes
that are going to be stitched.
When the stitching is over, the nodes are not freed immediately.
They are moved to 'free_lists', so that the next stitching may reuse the
space.
Both 'lists' and 'free_lists' are freed when all samples are processed.
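
The reuse scheme behind 'lists' and 'free_lists' can be illustrated with
a small free-list sketch. It uses plain singly linked lists and
hypothetical names (struct node, struct pool, pool_get(), pool_recycle(),
pool_destroy()) rather than the kernel-style list_head used in perf.

```c
#include <stdlib.h>

struct node {
	struct node *next;
	unsigned long ip;
};

/* 'active' plays the role of 'lists', 'free' the role of 'free_lists'. */
struct pool {
	struct node *active;
	struct node *free;
};

/* Take a node from the free list if possible, otherwise allocate one. */
static struct node *pool_get(struct pool *p)
{
	struct node *n = p->free;

	if (n)
		p->free = n->next;
	else
		n = malloc(sizeof(*n));
	if (n) {
		n->next = p->active;
		p->active = n;
	}
	return n;
}

/* Stitching is over: keep the memory, move every node to the free list. */
static void pool_recycle(struct pool *p)
{
	while (p->active) {
		struct node *n = p->active;

		p->active = n->next;
		n->next = p->free;
		p->free = n;
	}
}

/* All samples are processed: release both lists. */
static void pool_destroy(struct pool *p)
{
	pool_recycle(p);
	while (p->free) {
		struct node *n = p->free;

		p->free = n->next;
		free(n);
	}
}
```

The trade-off is the usual one for a free list: memory stays allocated
between samples, in exchange for avoiding a malloc()/free() pair for
every stitched node.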

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/util/branch.h    |  19 +++--
 tools/perf/util/callchain.h |   5 ++
 tools/perf/util/machine.c   | 139 +++++++++++++++++++++++++++++++++++-
 tools/perf/util/thread.h    |  13 ++++
 4 files changed, 168 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h
index 154a05cd03af..4d3f02fa223d 100644
--- a/tools/perf/util/branch.h
+++ b/tools/perf/util/branch.h
@@ -15,13 +15,18 @@
 #include "event.h"
 
 struct branch_flags {
-	u64 mispred:1;
-	u64 predicted:1;
-	u64 in_tx:1;
-	u64 abort:1;
-	u64 cycles:16;
-	u64 type:4;
-	u64 reserved:40;
+	union {
+		u64 value;
+		struct {
+			u64 mispred:1;
+			u64 predicted:1;
+			u64 in_tx:1;
+			u64 abort:1;
+			u64 cycles:16;
+			u64 type:4;
+			u64 reserved:40;
+		};
+	};
 };
 
 struct branch_info {
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index cb33cd42ff43..8f668ee29f25 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -154,6 +154,11 @@ struct callchain_cursor_node {
 	struct callchain_cursor_node	*next;
 };
 
+struct stitch_list {
+	struct list_head		node;
+	struct callchain_cursor_node	cursor;
+};
+
 struct callchain_cursor {
 	u64				nr;
 	struct callchain_cursor_node	*first;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index f190265a1f26..4d4df1f3c066 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2317,6 +2317,119 @@ static int lbr_callchain_add_lbr_ip(struct thread *thread,
 	return 0;
 }
 
+static int lbr_callchain_add_stitched_lbr_ip(struct thread *thread,
+					     struct callchain_cursor *cursor)
+{
+	struct lbr_stitch *lbr_stitch = thread->lbr_stitch;
+	struct callchain_cursor_node *cnode;
+	struct stitch_list *stitch_node;
+	int err;
+
+	list_for_each_entry(stitch_node, &lbr_stitch->lists, node) {
+		cnode = &stitch_node->cursor;
+
+		err = callchain_cursor_append(cursor, cnode->ip,
+					      &cnode->ms,
+					      cnode->branch,
+					      &cnode->branch_flags,
+					      cnode->nr_loop_iter,
+					      cnode->iter_cycles,
+					      cnode->branch_from,
+					      cnode->srcline);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+static struct stitch_list *get_stitch_node(struct thread *thread)
+{
+	struct lbr_stitch *lbr_stitch = thread->lbr_stitch;
+	struct stitch_list *stitch_node;
+
+	if (!list_empty(&lbr_stitch->free_lists)) {
+		stitch_node = list_first_entry(&lbr_stitch->free_lists,
+					       struct stitch_list, node);
+		list_del(&stitch_node->node);
+
+		return stitch_node;
+	}
+
+	return malloc(sizeof(struct stitch_list));
+}
+
+static bool has_stitched_lbr(struct thread *thread,
+			     struct perf_sample *cur,
+			     struct perf_sample *prev,
+			     unsigned int max_lbr,
+			     bool callee)
+{
+	struct branch_stack *cur_stack = cur->branch_stack;
+	struct branch_entry *cur_entries = perf_sample__branch_entries(cur);
+	struct branch_stack *prev_stack = prev->branch_stack;
+	struct branch_entry *prev_entries = perf_sample__branch_entries(prev);
+	struct lbr_stitch *lbr_stitch = thread->lbr_stitch;
+	int i, j, nr_identical_branches = 0;
+	struct stitch_list *stitch_node;
+	u64 cur_base, distance;
+
+	if (!cur_stack || !prev_stack)
+		return false;
+
+	/* Find the physical index of the base-of-stack for current sample. */
+	cur_base = max_lbr - cur_stack->nr + cur_stack->hw_idx + 1;
+
+	distance = (prev_stack->hw_idx > cur_base) ? (prev_stack->hw_idx - cur_base) :
+						     (max_lbr + prev_stack->hw_idx - cur_base);
+	/* Previous sample has shorter stack. Nothing can be stitched. */
+	if (distance + 1 > prev_stack->nr)
+		return false;
+
+	/*
+	 * Check if there are identical LBRs between two samples.
+	 * Identical LBRs must have the same from, to and flags values. Also,
+	 * they have to be saved in the same LBR registers (same physical
+	 * index).
+	 *
+	 * Starts from the base-of-stack of current sample.
+	 */
+	for (i = distance, j = cur_stack->nr - 1; (i >= 0) && (j >= 0); i--, j--) {
+		if ((prev_entries[i].from != cur_entries[j].from) ||
+		    (prev_entries[i].to != cur_entries[j].to) ||
+		    (prev_entries[i].flags.value != cur_entries[j].flags.value))
+			break;
+		nr_identical_branches++;
+	}
+
+	if (!nr_identical_branches)
+		return false;
+
+	/*
+	 * Save the LBRs between the base-of-stack of previous sample
+	 * and the base-of-stack of current sample into lbr_stitch->lists.
+	 * These LBRs will be stitched later.
+	 */
+	for (i = prev_stack->nr - 1; i > (int)distance; i--) {
+
+		if (!lbr_stitch->prev_lbr_cursor[i].valid)
+			continue;
+
+		stitch_node = get_stitch_node(thread);
+		if (!stitch_node)
+			return false;
+
+		memcpy(&stitch_node->cursor, &lbr_stitch->prev_lbr_cursor[i],
+		       sizeof(struct callchain_cursor_node));
+
+		if (callee)
+			list_add(&stitch_node->node, &lbr_stitch->lists);
+		else
+			list_add_tail(&stitch_node->node, &lbr_stitch->lists);
+	}
+
+	return true;
+}
+
 static bool alloc_lbr_stitch(struct thread *thread, unsigned int max_lbr)
 {
 	if (thread->lbr_stitch)
@@ -2330,6 +2443,9 @@ static bool alloc_lbr_stitch(struct thread *thread, unsigned int max_lbr)
 	if (!thread->lbr_stitch->prev_lbr_cursor)
 		goto free_lbr_stitch;
 
+	INIT_LIST_HEAD(&thread->lbr_stitch->lists);
+	INIT_LIST_HEAD(&thread->lbr_stitch->free_lists);
+
 	return true;
 
 free_lbr_stitch:
@@ -2356,9 +2472,11 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 					int max_stack,
 					unsigned int max_lbr)
 {
+	bool callee = (callchain_param.order == ORDER_CALLEE);
 	struct ip_callchain *chain = sample->callchain;
 	int chain_nr = min(max_stack, (int)chain->nr), i;
 	struct lbr_stitch *lbr_stitch;
+	bool stitched_lbr = false;
 	u64 branch_from = 0;
 	int err;
 
@@ -2375,10 +2493,18 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 	    (max_lbr > 0) && alloc_lbr_stitch(thread, max_lbr)) {
 		lbr_stitch = thread->lbr_stitch;
 
+		stitched_lbr = has_stitched_lbr(thread, sample,
+						&lbr_stitch->prev_sample,
+						max_lbr, callee);
+
+		if (!stitched_lbr && !list_empty(&lbr_stitch->lists)) {
+			list_replace_init(&lbr_stitch->lists,
+					  &lbr_stitch->free_lists);
+		}
 		memcpy(&lbr_stitch->prev_sample, sample, sizeof(*sample));
 	}
 
-	if (callchain_param.order == ORDER_CALLEE) {
+	if (callee) {
 		/* Add kernel ip */
 		err = lbr_callchain_add_kernel_ip(thread, cursor, sample,
 						  parent, root_al, branch_from,
@@ -2391,7 +2517,18 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 		if (err)
 			goto error;
 
+		if (stitched_lbr) {
+			err = lbr_callchain_add_stitched_lbr_ip(thread, cursor);
+			if (err)
+				goto error;
+		}
+
 	} else {
+		if (stitched_lbr) {
+			err = lbr_callchain_add_stitched_lbr_ip(thread, cursor);
+			if (err)
+				goto error;
+		}
 		err = lbr_callchain_add_lbr_ip(thread, cursor, sample, parent,
 					       root_al, &branch_from, false);
 		if (err)
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 477c669cdbfa..12dd7288a14d 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -23,6 +23,8 @@ struct thread_stack;
 struct unwind_libunwind_ops;
 
 struct lbr_stitch {
+	struct list_head		lists;
+	struct list_head		free_lists;
 	struct perf_sample		prev_sample;
 	struct callchain_cursor_node	*prev_lbr_cursor;
 };
@@ -156,10 +158,21 @@ static inline bool thread__is_filtered(struct thread *thread)
 static inline void thread__free_stitch_list(struct thread *thread)
 {
 	struct lbr_stitch *lbr_stitch = thread->lbr_stitch;
+	struct stitch_list *pos, *tmp;
 
 	if (!lbr_stitch)
 		return;
 
+	list_for_each_entry_safe(pos, tmp, &lbr_stitch->lists, node) {
+		list_del_init(&pos->node);
+		free(pos);
+	}
+
+	list_for_each_entry_safe(pos, tmp, &lbr_stitch->free_lists, node) {
+		list_del_init(&pos->node);
+		free(pos);
+	}
+
 	free(lbr_stitch->prev_lbr_cursor);
 	free(thread->lbr_stitch);
 }
-- 
2.17.1



* [PATCH V4 13/17] perf report: Add option to enable the LBR stitching approach
  2020-03-19 20:25 [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (11 preceding siblings ...)
  2020-03-19 20:25 ` [PATCH V4 12/17] perf tools: Stitch LBR call stack kan.liang
@ 2020-03-19 20:25 ` kan.liang
  2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2020-03-19 20:25 ` [PATCH V4 14/17] perf script: " kan.liang
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: kan.liang @ 2020-03-19 20:25 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

With the LBR stitching approach, the reconstructed LBR call stack
can exceed the HW limit. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may increase the processing time, especially when the number of
samples with stitched LBRs is huge.

Add an option to enable the approach.

  # To display the perf.data header info, please use
  # --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 6K of event 'cycles'
  # Event count (approx.): 6492797701
  #
  # Children      Self  Command          Shared Object       Symbol
  # ........  ........  ...............  ..................
  # .................................
  #
    99.99%    99.99%  tchain_edit      tchain_edit        [.] f43
            |
            ---main
               f1
               f2
               f3
               f4
               f5
               f6
               f7
               f8
               f9
               f10
               f11
               f12
               f13
               f14
               f15
               f16
               f17
               f18
               f19
               f20
               f21
               f22
               f23
               f24
               f25
               f26
               f27
               f28
               f29
               f30
               f31
               |
                --99.65%--f32
                          f33
                          f34
                          f35
                          f36
                          f37
                          f38
                          f39
                          f40
                          f41
                          f42
                          f43

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/Documentation/perf-report.txt | 11 +++++++++++
 tools/perf/builtin-report.c              | 12 ++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index e56e3f1344a7..cfeab01054e4 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -487,6 +487,17 @@ include::itrace.txt[]
 	This option extends the perf report to show reference callgraphs,
 	which collected by reference event, in no callgraph event.
 
+--stitch-lbr::
+	Show callgraph with stitched LBRs, which may have more complete
+	callgraph. The perf.data file must have been obtained using
+	perf record --call-graph lbr.
+	Disabled by default. In common cases with call stack overflows,
+	it can recreate better call stacks than the default lbr call stack
+	output. But this approach is not foolproof. There can be cases
+	where it creates incorrect call stacks from incorrect matches.
+	The known limitations include exception handling such as
+	setjmp/longjmp, where calls and returns will not match.
+
 --socket-filter::
 	Only report the samples on the processor socket that match with this filter
 
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index ea673b7eb3f4..f44fc45c32dd 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -84,6 +84,7 @@ struct report {
 	bool			header_only;
 	bool			nonany_branch_mode;
 	bool			group_set;
+	bool			stitch_lbr;
 	int			max_stack;
 	struct perf_read_values	show_threads_values;
 	struct annotation_options annotation_opts;
@@ -267,6 +268,9 @@ static int process_sample_event(struct perf_tool *tool,
 		return -1;
 	}
 
+	if (rep->stitch_lbr)
+		al.thread->lbr_stitch_enable = true;
+
 	if (symbol_conf.hide_unresolved && al.sym == NULL)
 		goto out_put;
 
@@ -407,6 +411,12 @@ static int report__setup_sample_type(struct report *rep)
 			callchain_param.record_mode = CALLCHAIN_FP;
 	}
 
+	if (rep->stitch_lbr && (callchain_param.record_mode != CALLCHAIN_LBR)) {
+		ui__warning("Can't find LBR callchain. Switch off --stitch-lbr.\n"
+			    "Please apply --call-graph lbr when recording.\n");
+		rep->stitch_lbr = false;
+	}
+
 	/* ??? handle more cases than just ANY? */
 	if (!(perf_evlist__combined_branch_type(session->evlist) &
 				PERF_SAMPLE_BRANCH_ANY))
@@ -1256,6 +1266,8 @@ int cmd_report(int argc, const char **argv)
 			"Show full source file name path for source lines"),
 	OPT_BOOLEAN(0, "show-ref-call-graph", &symbol_conf.show_ref_callgraph,
 		    "Show callgraph from reference event"),
+	OPT_BOOLEAN(0, "stitch-lbr", &report.stitch_lbr,
+		    "Enable LBR callgraph stitching approach"),
 	OPT_INTEGER(0, "socket-filter", &report.socket_filter,
 		    "only show processor socket that match with this filter"),
 	OPT_BOOLEAN(0, "raw-trace", &symbol_conf.raw_trace,
-- 
2.17.1



* [PATCH V4 14/17] perf script: Add option to enable the LBR stitching approach
  2020-03-19 20:25 [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (12 preceding siblings ...)
  2020-03-19 20:25 ` [PATCH V4 13/17] perf report: Add option to enable the LBR stitching approach kan.liang
@ 2020-03-19 20:25 ` kan.liang
  2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2020-03-19 20:25 ` [PATCH V4 15/17] perf top: " kan.liang
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: kan.liang @ 2020-03-19 20:25 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

With the LBR stitching approach, the reconstructed LBR call stack
can exceed the HW limit. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may increase the processing time, especially when the number of
samples with stitched LBRs is huge.

Add an option to enable the approach.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/Documentation/perf-script.txt | 11 +++++++++++
 tools/perf/builtin-script.c              | 12 ++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index db6a36aac47e..67d5a2c7065b 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -426,6 +426,17 @@ include::itrace.txt[]
 --show-on-off-events::
 	Show the --switch-on/off events too.
 
+--stitch-lbr::
+	Show callgraph with stitched LBRs, which may have more complete
+	callgraph. The perf.data file must have been obtained using
+	perf record --call-graph lbr.
+	Disabled by default. In common cases with call stack overflows,
+	it can recreate better call stacks than the default lbr call stack
+	output. But this approach is not foolproof. There can be cases
+	where it creates incorrect call stacks from incorrect matches.
+	The known limitations include exception handling such as
+	setjmp/longjmp, where calls and returns will not match.
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-script-perl[1],
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 656b347f6dd8..e0ec4a0e7b35 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1687,6 +1687,7 @@ struct perf_script {
 	bool			show_bpf_events;
 	bool			allocated;
 	bool			per_event_dump;
+	bool			stitch_lbr;
 	struct evswitch		evswitch;
 	struct perf_cpu_map	*cpus;
 	struct perf_thread_map *threads;
@@ -1913,6 +1914,9 @@ static void process_event(struct perf_script *script,
 	if (PRINT_FIELD(IP)) {
 		struct callchain_cursor *cursor = NULL;
 
+		if (script->stitch_lbr)
+			al->thread->lbr_stitch_enable = true;
+
 		if (symbol_conf.use_callchain && sample->callchain &&
 		    thread__resolve_callchain(al->thread, &callchain_cursor, evsel,
 					      sample, NULL, NULL, scripting_max_stack) == 0)
@@ -3295,6 +3299,12 @@ static void script__setup_sample_type(struct perf_script *script)
 		else
 			callchain_param.record_mode = CALLCHAIN_FP;
 	}
+
+	if (script->stitch_lbr && (callchain_param.record_mode != CALLCHAIN_LBR)) {
+		pr_warning("Can't find LBR callchain. Switch off --stitch-lbr.\n"
+			   "Please apply --call-graph lbr when recording.\n");
+		script->stitch_lbr = false;
+	}
 }
 
 static int process_stat_round_event(struct perf_session *session,
@@ -3602,6 +3612,8 @@ int cmd_script(int argc, const char **argv)
 		   "file", "file saving guest os /proc/kallsyms"),
 	OPT_STRING(0, "guestmodules", &symbol_conf.default_guest_modules,
 		   "file", "file saving guest os /proc/modules"),
+	OPT_BOOLEAN('\0', "stitch-lbr", &script.stitch_lbr,
+		    "Enable LBR callgraph stitching approach"),
 	OPTS_EVSWITCH(&script.evswitch),
 	OPT_END()
 	};
-- 
2.17.1



* [PATCH V4 15/17] perf top: Add option to enable the LBR stitching approach
  2020-03-19 20:25 [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (13 preceding siblings ...)
  2020-03-19 20:25 ` [PATCH V4 14/17] perf script: " kan.liang
@ 2020-03-19 20:25 ` kan.liang
  2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2020-03-19 20:25 ` [PATCH V4 16/17] perf c2c: " kan.liang
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: kan.liang @ 2020-03-19 20:25 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

With the LBR stitching approach, the reconstructed LBR call stack
can exceed the HW limit. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may increase the processing time, especially when the number of
samples with stitched LBRs is huge.

Add an option to enable the approach.
The option must be used with --call-graph lbr.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/Documentation/perf-top.txt |  9 +++++++++
 tools/perf/builtin-top.c              | 11 +++++++++++
 tools/perf/util/top.h                 |  1 +
 3 files changed, 21 insertions(+)

diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
index 324b6b53c86b..0648d96981fe 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -310,6 +310,15 @@ Default is to monitor all CPUS.
 	go straight to the histogram browser, just like 'perf top' with no events
 	explicitely specified does.
 
+--stitch-lbr::
+	Show callgraph with stitched LBRs, which may have more complete
+	callgraph. The option must be used with --call-graph lbr recording.
+	Disabled by default. In common cases with call stack overflows,
+	it can recreate better call stacks than the default lbr call stack
+	output. But this approach is not foolproof. There can be cases
+	where it creates incorrect call stacks from incorrect matches.
+	The known limitations include exception handling such as
+	setjmp/longjmp, where calls and returns will not match.
 
 INTERACTIVE PROMPTING KEYS
 --------------------------
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index f6dd1a63f159..aae8282b1fac 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -33,6 +33,7 @@
 #include "util/map.h"
 #include "util/mmap.h"
 #include "util/session.h"
+#include "util/thread.h"
 #include "util/symbol.h"
 #include "util/synthetic-events.h"
 #include "util/top.h"
@@ -766,6 +767,9 @@ static void perf_event__process_sample(struct perf_tool *tool,
 	if (machine__resolve(machine, &al, sample) < 0)
 		return;
 
+	if (top->stitch_lbr)
+		al.thread->lbr_stitch_enable = true;
+
 	if (!machine->kptr_restrict_warned &&
 	    symbol_conf.kptr_restrict &&
 	    al.cpumode == PERF_RECORD_MISC_KERNEL) {
@@ -1543,6 +1547,8 @@ int cmd_top(int argc, const char **argv)
 			"number of thread to run event synthesize"),
 	OPT_BOOLEAN(0, "namespaces", &opts->record_namespaces,
 		    "Record namespaces events"),
+	OPT_BOOLEAN(0, "stitch-lbr", &top.stitch_lbr,
+		    "Enable LBR callgraph stitching approach"),
 	OPTS_EVSWITCH(&top.evswitch),
 	OPT_END()
 	};
@@ -1612,6 +1618,11 @@ int cmd_top(int argc, const char **argv)
 		}
 	}
 
+	if (top.stitch_lbr && !(callchain_param.record_mode == CALLCHAIN_LBR)) {
+		pr_err("Error: --stitch-lbr must be used with --call-graph lbr\n");
+		goto out_delete_evlist;
+	}
+
 	if (opts->branch_stack && callchain_param.enabled)
 		symbol_conf.show_branchflag_count = true;
 
diff --git a/tools/perf/util/top.h b/tools/perf/util/top.h
index f117d4f4821e..45dc84ddff37 100644
--- a/tools/perf/util/top.h
+++ b/tools/perf/util/top.h
@@ -36,6 +36,7 @@ struct perf_top {
 	bool		   use_tui, use_stdio;
 	bool		   vmlinux_warned;
 	bool		   dump_symtab;
+	bool		   stitch_lbr;
 	struct hist_entry  *sym_filter_entry;
 	struct evsel 	   *sym_evsel;
 	struct perf_session *session;
-- 
2.17.1



* [PATCH V4 16/17] perf c2c: Add option to enable the LBR stitching approach
  2020-03-19 20:25 [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (14 preceding siblings ...)
  2020-03-19 20:25 ` [PATCH V4 15/17] perf top: " kan.liang
@ 2020-03-19 20:25 ` kan.liang
  2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2020-03-19 20:25 ` [PATCH V4 17/17] perf hist: Add fast path for duplicate entries check kan.liang
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 46+ messages in thread
From: kan.liang @ 2020-03-19 20:25 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

With the LBR stitching approach, the reconstructed LBR call stack
can exceed the HW limit. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may increase the processing time, especially when the number of
samples with stitched LBRs is huge.

Add an option to enable the approach.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/Documentation/perf-c2c.txt | 11 +++++++++++
 tools/perf/builtin-c2c.c              | 12 ++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
index e6150f21267d..2133eb320cb0 100644
--- a/tools/perf/Documentation/perf-c2c.txt
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -111,6 +111,17 @@ REPORT OPTIONS
 --display::
 	Switch to HITM type (rmt, lcl) to display and sort on. Total HITMs as default.
 
+--stitch-lbr::
+	Show callgraph with stitched LBRs, which may have more complete
+	callgraph. The perf.data file must have been obtained using
+	perf c2c record --call-graph lbr.
+	Disabled by default. In common cases with call stack overflows,
+	it can recreate better call stacks than the default lbr call stack
+	output. But this approach is not foolproof. There can be cases
+	where it creates incorrect call stacks from incorrect matches.
+	The known limitations include exception handling such as
+	setjmp/longjmp, where calls and returns will not match.
+
 C2C RECORD
 ----------
 The perf c2c record command setup options related to HITM cacheline analysis
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 246ac0b4d54f..0d544c4fb4be 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -95,6 +95,7 @@ struct perf_c2c {
 	bool			 use_stdio;
 	bool			 stats_only;
 	bool			 symbol_full;
+	bool			 stitch_lbr;
 
 	/* HITM shared clines stats */
 	struct c2c_stats	hitm_stats;
@@ -273,6 +274,9 @@ static int process_sample_event(struct perf_tool *tool __maybe_unused,
 		return -1;
 	}
 
+	if (c2c.stitch_lbr)
+		al.thread->lbr_stitch_enable = true;
+
 	ret = sample__resolve_callchain(sample, &callchain_cursor, NULL,
 					evsel, &al, sysctl_perf_event_max_stack);
 	if (ret)
@@ -2601,6 +2605,12 @@ static int setup_callchain(struct evlist *evlist)
 		}
 	}
 
+	if (c2c.stitch_lbr && (mode != CALLCHAIN_LBR)) {
+		ui__warning("Can't find LBR callchain. Switch off --stitch-lbr.\n"
+			    "Please apply --call-graph lbr when recording.\n");
+		c2c.stitch_lbr = false;
+	}
+
 	callchain_param.record_mode = mode;
 	callchain_param.min_percent = 0;
 	return 0;
@@ -2752,6 +2762,8 @@ static int perf_c2c__report(int argc, const char **argv)
 	OPT_STRING('c', "coalesce", &coalesce, "coalesce fields",
 		   "coalesce fields: pid,tid,iaddr,dso"),
 	OPT_BOOLEAN('f', "force", &symbol_conf.force, "don't complain, do it"),
+	OPT_BOOLEAN(0, "stitch-lbr", &c2c.stitch_lbr,
+		    "Enable LBR callgraph stitching approach"),
 	OPT_PARENT(c2c_options),
 	OPT_END()
 	};
-- 
2.17.1



* [PATCH V4 17/17] perf hist: Add fast path for duplicate entries check
  2020-03-19 20:25 [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (15 preceding siblings ...)
  2020-03-19 20:25 ` [PATCH V4 16/17] perf c2c: " kan.liang
@ 2020-03-19 20:25 ` kan.liang
  2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2020-03-23 11:13 ` [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) Jiri Olsa
  2020-04-17 17:48 ` Arnaldo Carvalho de Melo
  18 siblings, 1 reply; 46+ messages in thread
From: kan.liang @ 2020-03-19 20:25 UTC (permalink / raw)
  To: acme, jolsa, peterz, mingo, linux-kernel
  Cc: namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Perf checks for duplicate entries in a callchain before adding an entry.
However, the check is very slow, especially with deeper call stacks.
Almost 50% of the elapsed time of perf report is spent on the check when
the call stack depth is always 32.

hist_entry__cmp() is used to compare the new entry with the old
entries. It goes through all the available sorts in the sort_list
and calls the specific cmp of each sort, which is very slow.
In most cases, there are no duplicate entries in a callchain.
The symbols are usually different. It's much faster to do a quick check
for symbols first, and only do the full cmp when the symbols are exactly
the same.
The quick check only compares symbols, not dso. Export
_sort__sym_cmp for this purpose.
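
The idea can be sketched outside of perf as a cheap pre-filter in front of
an expensive comparison. The following is only an illustration under
simplifying assumptions: struct sym, struct entry, full_cmp() and
is_duplicate() are hypothetical, and full_cmp() merely stands in for the
much heavier hist_entry__cmp(). The call counter is there to make the
saving observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for perf's symbol and hist_entry. */
struct sym {
	uint64_t start, end;
};

struct entry {
	const struct sym *sym;	/* NULL when the address is unresolved */
	uint64_t ip;
	int dso_id;
	int thread_id;
};

static unsigned long full_cmp_calls;	/* counts expensive comparisons */

/* Stand-in for the full comparison: 0 when equal, nonzero otherwise. */
static int full_cmp(const struct entry *l, const struct entry *r)
{
	full_cmp_calls++;
	if (l->thread_id != r->thread_id || l->dso_id != r->dso_id)
		return 1;
	if (!l->sym && !r->sym)
		return l->ip != r->ip;
	if (!l->sym || !r->sym)
		return 1;
	return l->sym->start != r->sym->start || l->sym->end != r->sym->end;
}

/* Cheap check: true when the two entries certainly differ by symbol. */
static bool fast_sym_diff(const struct entry *l, const struct entry *r)
{
	if (!l->sym && !r->sym)
		return l->ip != r->ip;
	if (!l->sym || !r->sym)
		return true;
	return l->sym->start != r->sym->start || l->sym->end != r->sym->end;
}

/* Only fall back to the full comparison when the symbols look the same. */
static bool is_duplicate(const struct entry *cache, size_t n,
			 const struct entry *e)
{
	size_t i;

	assert(cache != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (fast_sym_diff(&cache[i], e))
			continue;
		if (full_cmp(&cache[i], e) == 0)
			return true;
	}
	return false;
}
```

With a cache whose entries all resolve to different symbols, a new entry
with yet another symbol never reaches full_cmp(), while an entry that
repeats a cached symbol triggers exactly one full comparison.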

 $perf record --call-graph lbr ./tchain_edit_64

 Without the patch
 $time perf report --stdio
 real    0m21.142s
 user    0m21.110s
 sys     0m0.033s

 With the patch
 $time perf report --stdio
 real    0m10.977s
 user    0m10.948s
 sys     0m0.027s

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/hist.c | 23 +++++++++++++++++++++++
 tools/perf/util/sort.c |  2 +-
 tools/perf/util/sort.h |  2 ++
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index e74a5acf66d9..311d6d119f3c 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -1057,6 +1057,20 @@ iter_next_cumulative_entry(struct hist_entry_iter *iter,
 	return fill_callchain_info(al, node, iter->hide_unresolved);
 }
 
+static bool
+hist_entry__fast__sym_diff(struct hist_entry *left,
+			   struct hist_entry *right)
+{
+	struct symbol *sym_l = left->ms.sym;
+	struct symbol *sym_r = right->ms.sym;
+
+	if (!sym_l && !sym_r)
+		return left->ip != right->ip;
+
+	return !!_sort__sym_cmp(sym_l, sym_r);
+}
+
+
 static int
 iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 			       struct addr_location *al)
@@ -1083,6 +1097,7 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 	};
 	int i;
 	struct callchain_cursor cursor;
+	bool fast = hists__has(he_tmp.hists, sym);
 
 	callchain_cursor_snapshot(&cursor, &callchain_cursor);
 
@@ -1093,6 +1108,14 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 	 * It's possible that it has cycles or recursive calls.
 	 */
 	for (i = 0; i < iter->curr; i++) {
+		/*
+		 * For most cases, there are no duplicate entries in callchain.
+		 * The symbols are usually different. Do a quick check for
+		 * symbols first.
+		 */
+		if (fast && hist_entry__fast__sym_diff(he_cache[i], &he_tmp))
+			continue;
+
 		if (hist_entry__cmp(he_cache[i], &he_tmp) == 0) {
 			/* to avoid calling callback function */
 			iter->he = NULL;
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index e860595576c2..96fd5fb42116 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -234,7 +234,7 @@ static int64_t _sort__addr_cmp(u64 left_ip, u64 right_ip)
 	return (int64_t)(right_ip - left_ip);
 }
 
-static int64_t _sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r)
+int64_t _sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r)
 {
 	if (!sym_l || !sym_r)
 		return cmp_null(sym_l, sym_r);
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 6c862d62d052..c3c3c68cbfdd 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -309,5 +309,7 @@ int64_t
 sort__daddr_cmp(struct hist_entry *left, struct hist_entry *right);
 int64_t
 sort__dcacheline_cmp(struct hist_entry *left, struct hist_entry *right);
+int64_t
+_sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r);
 char *hist_entry__srcline(struct hist_entry *he);
 #endif	/* __PERF_SORT_H */
-- 
2.17.1



* Re: [PATCH V4 00/17] Stitch LBR call stack (Perf Tools)
  2020-03-19 20:25 [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (16 preceding siblings ...)
  2020-03-19 20:25 ` [PATCH V4 17/17] perf hist: Add fast path for duplicate entries check kan.liang
@ 2020-03-23 11:13 ` Jiri Olsa
  2020-04-02 15:34   ` Liang, Kan
  2020-04-17 17:48 ` Arnaldo Carvalho de Melo
  18 siblings, 1 reply; 46+ messages in thread
From: Jiri Olsa @ 2020-03-23 11:13 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak

On Thu, Mar 19, 2020 at 01:25:00PM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> Changes since V3:
> - There is no dependency among the 'capabilities'. If perf fails to read
>   one, it should not impact others. Continue to parse the rest of caps.
>   (Patch 1)
> - Use list_for_each_entry() to replace perf_pmu__scan_caps() (Patch 1 &
>   2)
> - Combine the declaration plus assignment when possible (Patch 1 & 2)
> - Add check for script/report/c2c.. (Patch 13, 14 & 16)

it's all black magic to me, but looks ok ;-)

Acked-by: Jiri Olsa <jolsa@redhat.com>

thanks,
jirka



* Re: [PATCH V4 00/17] Stitch LBR call stack (Perf Tools)
  2020-03-23 11:13 ` [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) Jiri Olsa
@ 2020-04-02 15:34   ` Liang, Kan
  2020-04-02 16:00     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 46+ messages in thread
From: Liang, Kan @ 2020-04-02 15:34 UTC (permalink / raw)
  To: Jiri Olsa, acme
  Cc: peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak



On 3/23/2020 7:13 AM, Jiri Olsa wrote:
> On Thu, Mar 19, 2020 at 01:25:00PM -0700, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> Changes since V3:
>> - There is no dependency among the 'capabilities'. If perf fails to read
>>    one, it should not impact others. Continue to parse the rest of caps.
>>    (Patch 1)
>> - Use list_for_each_entry() to replace perf_pmu__scan_caps() (Patch 1 &
>>    2)
>> - Combine the declaration plus assignment when possible (Patch 1 & 2)
>> - Add check for script/report/c2c.. (Patch 13, 14 & 16)
> 
> it's all black magic to me, but looks ok ;-)
> 
> Acked-by: Jiri Olsa <jolsa@redhat.com>
>

Thanks Jirka.

Hi Arnaldo,

Any comments for the series?

Thanks,
Kan

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V4 00/17] Stitch LBR call stack (Perf Tools)
  2020-04-02 15:34   ` Liang, Kan
@ 2020-04-02 16:00     ` Arnaldo Carvalho de Melo
  2020-04-02 17:02       ` Liang, Kan
  0 siblings, 1 reply; 46+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-04-02 16:00 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Jiri Olsa, peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak

On Thu, Apr 02, 2020 at 11:34:18AM -0400, Liang, Kan wrote:
> 
> 
> On 3/23/2020 7:13 AM, Jiri Olsa wrote:
> > On Thu, Mar 19, 2020 at 01:25:00PM -0700, kan.liang@linux.intel.com wrote:
> > > From: Kan Liang <kan.liang@linux.intel.com>
> > > 
> > > Changes since V3:
> > > - There is no dependency among the 'capabilities'. If perf fails to read
> > >    one, it should not impact others. Continue to parse the rest of caps.
> > >    (Patch 1)
> > > - Use list_for_each_entry() to replace perf_pmu__scan_caps() (Patch 1 &
> > >    2)
> > > - Combine the declaration plus assignment when possible (Patch 1 & 2)
> > > - Add check for script/report/c2c.. (Patch 13, 14 & 16)
> > 
> > it's all black magic to me, but looks ok ;-)
> > 
> > Acked-by: Jiri Olsa <jolsa@redhat.com>
> > 
> 
> Thanks Jirka.
> 
> Hi Arnaldo,
> 
> Any comments for the series?

I need to test it, hope to do it soon, but I'm a bit backlogged, sorry.

- Arnaldo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V4 00/17] Stitch LBR call stack (Perf Tools)
  2020-04-02 16:00     ` Arnaldo Carvalho de Melo
@ 2020-04-02 17:02       ` Liang, Kan
  0 siblings, 0 replies; 46+ messages in thread
From: Liang, Kan @ 2020-04-02 17:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak



On 4/2/2020 12:00 PM, Arnaldo Carvalho de Melo wrote:
> On Thu, Apr 02, 2020 at 11:34:18AM -0400, Liang, Kan wrote:
>>
>>
>> On 3/23/2020 7:13 AM, Jiri Olsa wrote:
>>> On Thu, Mar 19, 2020 at 01:25:00PM -0700, kan.liang@linux.intel.com wrote:
>>>> From: Kan Liang <kan.liang@linux.intel.com>
>>>>
>>>> Changes since V3:
>>>> - There is no dependency among the 'capabilities'. If perf fails to read
>>>>     one, it should not impact others. Continue to parse the rest of caps.
>>>>     (Patch 1)
>>>> - Use list_for_each_entry() to replace perf_pmu__scan_caps() (Patch 1 &
>>>>     2)
>>>> - Combine the declaration plus assignment when possible (Patch 1 & 2)
>>>> - Add check for script/report/c2c.. (Patch 13, 14 & 16)
>>>
>>> it's all black magic to me, but looks ok ;-)
>>>
>>> Acked-by: Jiri Olsa <jolsa@redhat.com>
>>>
>>
>> Thanks Jirka.
>>
>> Hi Arnaldo,
>>
>> Any comments for the series?
> 
> I need to test it, 

Sure. Thanks for the update.

Kan

> hope to do it soon, but I'm a bit backlogged, sorry.
> 
> - Arnaldo
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V4 03/17] perf record: Clear HEADER_CPU_PMU_CAPS for non LBR call stack mode
  2020-03-19 20:25 ` [PATCH V4 03/17] perf record: Clear HEADER_CPU_PMU_CAPS for non LBR call stack mode kan.liang
@ 2020-04-17 14:42   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 46+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-04-17 14:42 UTC (permalink / raw)
  To: kan.liang
  Cc: jolsa, peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak

On Thu, Mar 19, 2020 at 01:25:03PM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> The CPU PMU capabilities information is only useful for LBR call stack.
> Clear the feature for other perf record modes.

Humm, I think it is useful to have this extra piece of info in the
header in general, i.e. some user may want to know about these
capabilities when investigating some perf.data file on another machine,
etc, so I'll skip this patch for now.

- Arnaldo
 
> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
>  tools/perf/builtin-record.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 4c301466101b..428f7f5b8e48 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1120,6 +1120,9 @@ static void record__init_features(struct record *rec)
>  	if (!record__comp_enabled(rec))
>  		perf_header__clear_feat(&session->header, HEADER_COMPRESSED);
>  
> +	if (!callchain_param.enabled || (callchain_param.record_mode != CALLCHAIN_LBR))
> +		perf_header__clear_feat(&session->header, HEADER_CPU_PMU_CAPS);
> +
>  	perf_header__clear_feat(&session->header, HEADER_STAT);
>  }
>  
> -- 
> 2.17.1
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V4 04/17] perf stat: Clear HEADER_CPU_PMU_CAPS
  2020-03-19 20:25 ` [PATCH V4 04/17] perf stat: Clear HEADER_CPU_PMU_CAPS kan.liang
@ 2020-04-17 14:42   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 46+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-04-17 14:42 UTC (permalink / raw)
  To: kan.liang
  Cc: jolsa, peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak

On Thu, Mar 19, 2020 at 01:25:04PM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> The CPU PMU capabilities information is only useful for perf record with
> LBR call stack.
> 
> Clear the header for perf stat.

Ditto.
 
> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
>  tools/perf/builtin-stat.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index ec053dc1e35c..b5c8a5ab5e75 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1595,6 +1595,7 @@ static void init_features(struct perf_session *session)
>  	perf_header__clear_feat(&session->header, HEADER_TRACING_DATA);
>  	perf_header__clear_feat(&session->header, HEADER_BRANCH_STACK);
>  	perf_header__clear_feat(&session->header, HEADER_AUXTRACE);
> +	perf_header__clear_feat(&session->header, HEADER_CPU_PMU_CAPS);
>  }
>  
>  static int __cmd_record(int argc, const char **argv)
> -- 
> 2.17.1
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V4 10/17] perf tools: Save previous sample for LBR stitching approach
  2020-03-19 20:25 ` [PATCH V4 10/17] perf tools: Save previous sample for LBR stitching approach kan.liang
@ 2020-04-17 15:02   ` Arnaldo Carvalho de Melo
  2020-04-22 12:17   ` [tip: perf/core] perf thread: " tip-bot2 for Kan Liang
  1 sibling, 0 replies; 46+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-04-17 15:02 UTC (permalink / raw)
  To: kan.liang
  Cc: jolsa, peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak

On Thu, Mar 19, 2020 at 01:25:10PM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> To retrieve the overwritten LBRs from the previous sample for the LBR
> stitching approach, perf has to save the previous sample.
> 
> Only allocate the struct lbr_stitch once, when the LBR stitching approach
> is enabled and the kernel supports hw_idx.

Applied + this one on top:

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 59778f5aec2a..a54ca09a1d00 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2297,7 +2297,7 @@ static bool alloc_lbr_stitch(struct thread *thread)
 	if (thread->lbr_stitch)
 		return true;
 
-	thread->lbr_stitch = calloc(1, sizeof(struct lbr_stitch));
+	thread->lbr_stitch = zalloc(sizeof(*thread->lbr_stitch));
 	if (!thread->lbr_stitch)
 		goto err;
 
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index f65a84a25f93..34eb61cee6a4 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -5,6 +5,7 @@
 #include <linux/refcount.h>
 #include <linux/rbtree.h>
 #include <linux/list.h>
+#include <linux/zalloc.h>
 #include <stdio.h>
 #include <unistd.h>
 #include <sys/types.h>
@@ -153,7 +154,7 @@ static inline bool thread__is_filtered(struct thread *thread)
 
 static inline void thread__free_stitch_list(struct thread *thread)
 {
-	free(thread->lbr_stitch);
+	zfree(&thread->lbr_stitch);
 }
 
 #endif	/* __PERF_THREAD_H */
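
For readers not familiar with the helpers used in the fixup above, here is
a minimal standalone sketch of the allocate-once / free-and-clear pattern.
All demo_* names and both structs are hypothetical stand-ins, not the perf
code; the sketch assumes that zalloc() behaves like calloc(1, size) and
that zfree() frees the pointee and then resets the pointer to NULL. The
explicit success return is an assumption of the sketch as well.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct lbr_stitch (the real one holds a perf_sample). */
struct demo_lbr_stitch {
	int prev_sample_placeholder;
};

/* Hypothetical stand-in for the two struct thread fields used here. */
struct demo_thread {
	int lbr_stitch_enable;
	struct demo_lbr_stitch *lbr_stitch;
};

/* Zero-filled allocation of one object, assumed equivalent to zalloc(). */
static void *demo_zalloc(size_t size)
{
	return calloc(1, size);
}

/* Free *ptr and reset it to NULL, so a repeated free or a later test is harmless. */
static void demo_zfree(void **ptr)
{
	free(*ptr);
	*ptr = NULL;
}

/* Allocate the per-thread state on first use; later calls reuse it. */
static int demo_alloc_lbr_stitch(struct demo_thread *thread)
{
	if (thread->lbr_stitch)
		return 1;

	thread->lbr_stitch = demo_zalloc(sizeof(*thread->lbr_stitch));
	if (!thread->lbr_stitch) {
		/* Allocation failed: turn the feature off for this thread. */
		thread->lbr_stitch_enable = 0;
		return 0;
	}

	return 1;
}

/* Release the per-thread state; safe to call more than once. */
static void demo_free_stitch(struct demo_thread *thread)
{
	demo_zfree((void **)&thread->lbr_stitch);
}
```

Using sizeof(*thread->lbr_stitch) keeps the allocation size tied to the
pointer's type, and clearing the pointer on free means a second call to
demo_free_stitch() is a no-op rather than a double free.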
 
> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
>  tools/perf/util/machine.c | 23 +++++++++++++++++++++++
>  tools/perf/util/thread.c  |  1 +
>  tools/perf/util/thread.h  | 11 +++++++++++
>  3 files changed, 35 insertions(+)
> 
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index f1661dd3ca69..d91e11bfc8ca 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -2261,6 +2261,21 @@ static int lbr_callchain_add_lbr_ip(struct thread *thread,
>  	return 0;
>  }
>  
> +static bool alloc_lbr_stitch(struct thread *thread)
> +{
> +	if (thread->lbr_stitch)
> +		return true;
> +
> +	thread->lbr_stitch = calloc(1, sizeof(struct lbr_stitch));
> +	if (!thread->lbr_stitch)
> +		goto err;
> +
> +err:
> +	pr_warning("Failed to allocate space for stitched LBRs. Disable LBR stitch\n");
> +	thread->lbr_stitch_enable = false;
> +	return false;
> +}
> +
>  /*
>   * Recolve LBR callstack chain sample
>   * Return:
> @@ -2277,6 +2292,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
>  {
>  	struct ip_callchain *chain = sample->callchain;
>  	int chain_nr = min(max_stack, (int)chain->nr), i;
> +	struct lbr_stitch *lbr_stitch;
>  	u64 branch_from = 0;
>  	int err;
>  
> @@ -2289,6 +2305,13 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
>  	if (i == chain_nr)
>  		return 0;
>  
> +	if (thread->lbr_stitch_enable && !sample->no_hw_idx &&
> +	    alloc_lbr_stitch(thread)) {
> +		lbr_stitch = thread->lbr_stitch;
> +
> +		memcpy(&lbr_stitch->prev_sample, sample, sizeof(*sample));
> +	}
> +
>  	if (callchain_param.order == ORDER_CALLEE) {
>  		/* Add kernel ip */
>  		err = lbr_callchain_add_kernel_ip(thread, cursor, sample,
> diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
> index 1f080db23615..8d0da260c84c 100644
> --- a/tools/perf/util/thread.c
> +++ b/tools/perf/util/thread.c
> @@ -111,6 +111,7 @@ void thread__delete(struct thread *thread)
>  
>  	exit_rwsem(&thread->namespaces_lock);
>  	exit_rwsem(&thread->comm_lock);
> +	thread__free_stitch_list(thread);
>  	free(thread);
>  }
>  
> diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
> index 95294050cff2..f65a84a25f93 100644
> --- a/tools/perf/util/thread.h
> +++ b/tools/perf/util/thread.h
> @@ -13,6 +13,7 @@
>  #include <strlist.h>
>  #include <intlist.h>
>  #include "rwsem.h"
> +#include "event.h"
>  
>  struct addr_location;
>  struct map;
> @@ -20,6 +21,10 @@ struct perf_record_namespaces;
>  struct thread_stack;
>  struct unwind_libunwind_ops;
>  
> +struct lbr_stitch {
> +	struct perf_sample		prev_sample;
> +};
> +
>  struct thread {
>  	union {
>  		struct rb_node	 rb_node;
> @@ -49,6 +54,7 @@ struct thread {
>  
>  	/* LBR call stack stitch */
>  	bool			lbr_stitch_enable;
> +	struct lbr_stitch	*lbr_stitch;
>  };
>  
>  struct machine;
> @@ -145,4 +151,9 @@ static inline bool thread__is_filtered(struct thread *thread)
>  	return false;
>  }
>  
> +static inline void thread__free_stitch_list(struct thread *thread)
> +{
> +	free(thread->lbr_stitch);
> +}
> +
>  #endif	/* __PERF_THREAD_H */
> -- 
> 2.17.1
> 

-- 

- Arnaldo

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH V4 11/17] perf tools: Save previous cursor nodes for LBR stitching approach
  2020-03-19 20:25 ` [PATCH V4 11/17] perf tools: Save previous cursor nodes " kan.liang
@ 2020-04-17 16:53   ` Arnaldo Carvalho de Melo
  2020-04-22 12:17   ` [tip: perf/core] perf callchain: " tip-bot2 for Kan Liang
  1 sibling, 0 replies; 46+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-04-17 16:53 UTC (permalink / raw)
  To: kan.liang
  Cc: jolsa, peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak

On Thu, Mar 19, 2020 at 01:25:11PM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> The cursor nodes generated from a sample are eventually added to the
> callchain. To avoid generating cursor nodes from previous samples again,
> the previous cursor nodes are also saved for the LBR stitching approach.
> 
> Some options, e.g. hide-unresolved, may hide some LBRs.
> Add a variable 'valid' in struct callchain_cursor_node to indicate this
> case. The LBR stitching approach will only append the valid cursor nodes
> from previous samples later.
> 
> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>

Applied this on top:


diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 6e7f15b45389..737dee723a57 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2364,8 +2364,7 @@ static bool alloc_lbr_stitch(struct thread *thread, unsigned int max_lbr)
 	return true;
 
 free_lbr_stitch:
-	free(thread->lbr_stitch);
-	thread->lbr_stitch = NULL;
+	zfree(&thread->lbr_stitch);
 err:
 	pr_warning("Failed to allocate space for stitched LBRs. Disable LBR stitch\n");
 	thread->lbr_stitch_enable = false;
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index c2eb3f943724..8456174a52c5 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -161,7 +161,7 @@ static inline void thread__free_stitch_list(struct thread *thread)
 	if (!lbr_stitch)
 		return;
 
-	free(lbr_stitch->prev_lbr_cursor);
+	zfree(&lbr_stitch->prev_lbr_cursor);
 	zfree(&thread->lbr_stitch);
 }
 

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH V4 00/17] Stitch LBR call stack (Perf Tools)
  2020-03-19 20:25 [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) kan.liang
                   ` (17 preceding siblings ...)
  2020-03-23 11:13 ` [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) Jiri Olsa
@ 2020-04-17 17:48 ` Arnaldo Carvalho de Melo
  2020-04-17 21:47   ` Liang, Kan
  18 siblings, 1 reply; 46+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-04-17 17:48 UTC (permalink / raw)
  To: kan.liang
  Cc: jolsa, peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak

On Thu, Mar 19, 2020 at 01:25:00PM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> Changes since V3:
> - There is no dependency among the 'capabilities'. If perf fails to read
>   one, it should not impact others. Continue to parse the rest of caps.
>   (Patch 1)
> - Use list_for_each_entry() to replace perf_pmu__scan_caps() (Patch 1 &
>   2)
> - Combine the declaration plus assignment when possible (Patch 1 & 2)
> - Add check for script/report/c2c.. (Patch 13, 14 & 16)
> 
> Changes since V2:
> - Check strdup() in Patch 1
> - Split several patches into smaller patches
> 
> Changes since V1:
> - Rebase on top of commit 5100c2b77049 ("perf header: Add check for
>   unexpected use of reserved membrs in event attr")
> - Fix compiling error with GCC9 in patch 1.
> 
> 
> The kernel patches have been merged into linux-next.
>   commit bbfd5e4fab63 ("perf/core: Add new branch sample type for HW
> index of raw branch records")
>   commit db278b90c326 ("perf/x86/intel: Output LBR TOS information
> correctly")
> 
> Starting with Haswell, Linux perf can use the existing Last Branch
> Record (LBR) facility to record call stacks. However, the depth of the
> reconstructed LBR call stack is limited to the number of LBR registers.
> E.g. on Skylake, the depth of the reconstructed LBR call stack is <= 32.
> That's because the HW overwrites the oldest LBR registers when they are
> full.
> 
> However, the overwritten LBRs may still be retrievable from the previous
> sample, taken at a moment when the HW hadn't overwritten those LBR
> registers yet. Perf tools can stitch those overwritten LBRs onto the
> current call stack to get a more complete call stack.
> 
> To determine if LBRs can be stitched, the maximum number of LBRs is
> required. Patches 1 - 4 retrieve the capabilities information from sysfs
> and save it in the perf header.
> 
> Patches 5 - 12 implement the LBR stitching approach.
> 
> Users can use the options introduced in patches 13 - 16 to enable the
> LBR stitching approach for perf report, script, top and c2c.
> 
> Patch 17 adds a fast path for the duplicate entries check. It benefits
> all call stack parsing, not just the stitched LBR call stack. It can be
> merged independently.
> 
> The stitching approach is based on LBR call stack technology. The known
> limitations of LBR call stack technology still apply to the approach,
> e.g. exception handling such as setjmp/longjmp will leave calls/returns
> unmatched.
> This approach is not foolproof. There can be cases where it creates
> incorrect call stacks from incorrect matches. There is no attempt
> to validate any matches in another way. So it is not enabled by default.
> However, in many common cases with call stack overflows it can recreate
> better call stacks than the default LBR call stack output. So if there
> are problems with LBR overflows, this is a possible workaround.
> 
> Regression:
> Users may collect LBR call stacks on a machine with a new perf tool and
> a new kernel (with LBR TOS support). However, they may parse the
> perf.data with an old perf tool (without LBR TOS support). The old tool
> doesn't check attr.branch_sample_type, so users will probably get
> incorrect information without any warning.
> 
> Performance impact:
> The processing time may increase with the LBR stitching approach
> enabled. The impact depends on the increased depth of call stacks.
> 
> For a simple test case, tchain_edit, with a call stack depth of 43:
> perf record --call-graph lbr -- ./tchain_edit
> perf report --stitch-lbr
> 
> Without --stitch-lbr, perf report displays only 32 levels of the call
> stack. With --stitch-lbr, perf report can display all 43 levels.
> The depth of the call stacks increases by 34.3%.
> 
> Correspondingly, the processing time of perf report increases by 39%:
> Without --stitch-lbr:                           11.0 sec
> With --stitch-lbr:                              15.3 sec

Next time provide the full test proggie, I had to expand those ... to
reproduce your results, all I have is in perf/core, some patches are
still to be processed, will continue later, have to stop now, see:

https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/commit/?id=13cfba6b741ff

For my testing, looks really great!

- Arnaldo
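
To make the stitching idea from the overview quoted above concrete, here
is a toy sketch in C. It is not the logic in tools/perf/util/machine.c:
demo_stitch(), its parameters and the use of plain integers as stand-ins
for branch records are all assumptions made for illustration, and the real
code compares full branch entries and relies on hw_idx. The sketch only
shows the principle: when the current sample fills the LBR buffer and its
oldest entries overlap the newest entries of the previous sample, the older
entries of the previous sample can be appended.

```c
#include <stddef.h>

/*
 * Toy model: entries are ordered newest-first and plain integers stand in
 * for branch records. Copies the current sample to out and, if it can be
 * stitched, appends the older entries of the previous sample.
 * Returns the number of entries written to out (at most out_len).
 */
static size_t demo_stitch(const int *prev, size_t n_prev,
			  const int *cur, size_t n_cur,
			  size_t max_lbr, int *out, size_t out_len)
{
	size_t n = 0, i, j, k;

	for (i = 0; i < n_cur && n < out_len; i++)
		out[n++] = cur[i];

	/* Only a completely full buffer can have lost older entries. */
	if (n_cur == 0 || n_cur != max_lbr)
		return n;

	/* Look for the oldest current entry inside the previous sample. */
	for (j = 0; j < n_prev && j < n_cur; j++) {
		if (prev[j] != cur[n_cur - 1])
			continue;

		/* prev[0..j] has to equal the oldest j + 1 entries of cur. */
		for (k = 0; k <= j; k++) {
			if (prev[j - k] != cur[n_cur - 1 - k])
				break;
		}
		if (k <= j)
			continue;	/* mismatch, try the next candidate */

		/* Append what the hardware has overwritten since then. */
		for (i = j + 1; i < n_prev && n < out_len; i++)
			out[n++] = prev[i];
		break;
	}

	return n;
}
```

With max_lbr = 4, a previous sample {3, 2, 1} and a current sample
{6, 5, 4, 3} overlap in entry 3, so the stitched result is
{6, 5, 4, 3, 2, 1}. As the overview says, a match like this is not
validated in any other way, so a coincidental overlap would produce a
wrong stack.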
 
> The source code of tchain_edit.c is similar to the following.
> noinline void f43(void)
> {
>         int i;
>         for (i = 0; i < 10000;) {
> 
>                 if(i%2)
>                         i++;
>                 else
>                         i++;
>         }
> }
> 
> noinline void f42(void)
> {
>         int i;
>         for (i = 0; i < 100; i++) {
>                 f43();
>                 f43();
>                 f43();
>         }
> }
> 
> noinline void f41(void)
> {
>         int i;
>         for (i = 0; i < 100; i++) {
>                 f42();
>                 f42();
>                 f42();
>         }
> }
> noinline void f40(void)
> {
>         f41();
> }
> 
> ... ...
> 
> noinline void f32(void)
> {
>         f33();
> }
> 
> noinline void f31(void)
> {
>         int i;
> 
>         for (i = 0; i < 10000; i++) {
>                 if(i%2)
>                         i++;
>                 else
>                         i++;
>         }
> 
>         f32();
> }
> 
> noinline void f30(void)
> {
>         f31();
> }
> 
> ... ...
> 
> noinline void f1(void)
> {
>         f2();
> }
> 
> int main()
> {
>         f1();
> }
> 
> Kan Liang (17):
>   perf pmu: Add support for PMU capabilities
>   perf header: Support CPU PMU capabilities
>   perf record: Clear HEADER_CPU_PMU_CAPS for non LBR call stack mode
>   perf stat: Clear HEADER_CPU_PMU_CAPS
>   perf machine: Remove the indent in resolve_lbr_callchain_sample
>   perf machine: Refine the function for LBR call stack reconstruction
>   perf machine: Factor out lbr_callchain_add_kernel_ip()
>   perf machine: Factor out lbr_callchain_add_lbr_ip()
>   perf thread: Add a knob for LBR stitch approach
>   perf tools: Save previous sample for LBR stitching approach
>   perf tools: Save previous cursor nodes for LBR stitching approach
>   perf tools: Stitch LBR call stack
>   perf report: Add option to enable the LBR stitching approach
>   perf script: Add option to enable the LBR stitching approach
>   perf top: Add option to enable the LBR stitching approach
>   perf c2c: Add option to enable the LBR stitching approach
>   perf hist: Add fast path for duplicate entries check
> 
>  tools/perf/Documentation/perf-c2c.txt         |  11 +
>  tools/perf/Documentation/perf-report.txt      |  11 +
>  tools/perf/Documentation/perf-script.txt      |  11 +
>  tools/perf/Documentation/perf-top.txt         |   9 +
>  .../Documentation/perf.data-file-format.txt   |  16 +
>  tools/perf/builtin-c2c.c                      |  12 +
>  tools/perf/builtin-record.c                   |   3 +
>  tools/perf/builtin-report.c                   |  12 +
>  tools/perf/builtin-script.c                   |  12 +
>  tools/perf/builtin-stat.c                     |   1 +
>  tools/perf/builtin-top.c                      |  11 +
>  tools/perf/util/branch.h                      |  19 +-
>  tools/perf/util/callchain.h                   |   8 +
>  tools/perf/util/env.h                         |   3 +
>  tools/perf/util/header.c                      | 108 +++++
>  tools/perf/util/header.h                      |   1 +
>  tools/perf/util/hist.c                        |  23 +
>  tools/perf/util/machine.c                     | 423 +++++++++++++++---
>  tools/perf/util/pmu.c                         |  82 ++++
>  tools/perf/util/pmu.h                         |   9 +
>  tools/perf/util/sort.c                        |   2 +-
>  tools/perf/util/sort.h                        |   2 +
>  tools/perf/util/thread.c                      |   2 +
>  tools/perf/util/thread.h                      |  35 ++
>  tools/perf/util/top.h                         |   1 +
>  25 files changed, 757 insertions(+), 70 deletions(-)
> 
> -- 
> 2.17.1
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V4 00/17] Stitch LBR call stack (Perf Tools)
  2020-04-17 17:48 ` Arnaldo Carvalho de Melo
@ 2020-04-17 21:47   ` Liang, Kan
  2020-04-17 21:54     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 46+ messages in thread
From: Liang, Kan @ 2020-04-17 21:47 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: jolsa, peterz, mingo, linux-kernel, namhyung, adrian.hunter,
	mathieu.poirier, ravi.bangoria, alexey.budankov,
	vitaly.slobodskoy, pavel.gerasimov, mpe, eranian, ak



On 4/17/2020 1:48 PM, Arnaldo Carvalho de Melo wrote:
> On Thu, Mar 19, 2020 at 01:25:00PM -0700, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> Changes since V3:
>> - There is no dependency among the 'capabilities'. If perf fails to read
>>    one, it should not impact others. Continue to parse the rest of caps.
>>    (Patch 1)
>> - Use list_for_each_entry() to replace perf_pmu__scan_caps() (Patch 1 &
>>    2)
>> - Combine the declaration plus assignment when possible (Patch 1 & 2)
>> - Add check for script/report/c2c.. (Patch 13, 14 & 16)
>>
>> Changes since V2:
>> - Check strdup() in Patch 1
>> - Split several patches into smaller patches
>>
>> Changes since V1:
>> - Rebase on top of commit 5100c2b77049 ("perf header: Add check for
>>    unexpected use of reserved membrs in event attr")
>> - Fix compiling error with GCC9 in patch 1.
>>
>>
>> The kernel patches have been merged into linux-next.
>>    commit bbfd5e4fab63 ("perf/core: Add new branch sample type for HW
>> index of raw branch records")
>>    commit db278b90c326 ("perf/x86/intel: Output LBR TOS information
>> correctly")
>>
>> Starting with Haswell, Linux perf can use the existing Last Branch
>> Record (LBR) facility to record call stacks. However, the depth of the
>> reconstructed LBR call stack is limited to the number of LBR registers.
>> E.g. on Skylake, the depth of the reconstructed LBR call stack is <= 32.
>> That's because the HW overwrites the oldest LBR registers when they are
>> full.
>>
>> However, the overwritten LBRs may still be retrievable from the previous
>> sample, taken at a moment when the HW hadn't overwritten those LBR
>> registers yet. Perf tools can stitch those overwritten LBRs onto the
>> current call stack to get a more complete call stack.
>>
>> To determine if LBRs can be stitched, the maximum number of LBRs is
>> required. Patches 1 - 4 retrieve the capabilities information from sysfs
>> and save it in the perf header.
>>
>> Patches 5 - 12 implement the LBR stitching approach.
>>
>> Users can use the options introduced in patches 13 - 16 to enable the
>> LBR stitching approach for perf report, script, top and c2c.
>>
>> Patch 17 adds a fast path for the duplicate entries check. It benefits
>> all call stack parsing, not just the stitched LBR call stack. It can be
>> merged independently.
>>
>> The stitching approach is based on LBR call stack technology. The known
>> limitations of LBR call stack technology still apply to the approach,
>> e.g. exception handling such as setjmp/longjmp will leave calls/returns
>> unmatched.
>> This approach is not foolproof. There can be cases where it creates
>> incorrect call stacks from incorrect matches. There is no attempt
>> to validate any matches in another way. So it is not enabled by default.
>> However, in many common cases with call stack overflows it can recreate
>> better call stacks than the default LBR call stack output. So if there
>> are problems with LBR overflows, this is a possible workaround.
>>
>> Regression:
>> Users may collect LBR call stacks on a machine with a new perf tool and
>> a new kernel (with LBR TOS support). However, they may parse the
>> perf.data with an old perf tool (without LBR TOS support). The old tool
>> doesn't check attr.branch_sample_type, so users will probably get
>> incorrect information without any warning.
>>
>> Performance impact:
>> The processing time may increase with the LBR stitching approach
>> enabled. The impact depends on the increased depth of call stacks.
>>
>> For a simple test case, tchain_edit, with a call stack depth of 43:
>> perf record --call-graph lbr -- ./tchain_edit
>> perf report --stitch-lbr
>>
>> Without --stitch-lbr, perf report displays only 32 levels of the call
>> stack. With --stitch-lbr, perf report can display all 43 levels.
>> The depth of the call stacks increases by 34.3%.
>>
>> Correspondingly, the processing time of perf report increases by 39%:
>> Without --stitch-lbr:                           11.0 sec
>> With --stitch-lbr:                              15.3 sec
> 
> Next time provide the full test proggie, I had to expand those ... to
> reproduce your results,

Sure, I will do so in the future.

> all I have is in perf/core, some patches are
> still to be processed, will continue later, have to stop now, see:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/commit/?id=13cfba6b741ff
> 
> For my testing, looks really great!

Thanks for the testing. :)

Kan

> 
> - Arnaldo
>   
>> The source code of tchain_edit.c is similar to the following.
>> noinline void f43(void)
>> {
>>          int i;
>>          for (i = 0; i < 10000;) {
>>
>>                  if(i%2)
>>                          i++;
>>                  else
>>                          i++;
>>          }
>> }
>>
>> noinline void f42(void)
>> {
>>          int i;
>>          for (i = 0; i < 100; i++) {
>>                  f43();
>>                  f43();
>>                  f43();
>>          }
>> }
>>
>> noinline void f41(void)
>> {
>>          int i;
>>          for (i = 0; i < 100; i++) {
>>                  f42();
>>                  f42();
>>                  f42();
>>          }
>> }
>> noinline void f40(void)
>> {
>>          f41();
>> }
>>
>> ... ...
>>
>> noinline void f32(void)
>> {
>>          f33();
>> }
>>
>> noinline void f31(void)
>> {
>>          int i;
>>
>>          for (i = 0; i < 10000; i++) {
>>                  if(i%2)
>>                          i++;
>>                  else
>>                          i++;
>>          }
>>
>>          f32();
>> }
>>
>> noinline void f30(void)
>> {
>>          f31();
>> }
>>
>> ... ...
>>
>> noinline void f1(void)
>> {
>>          f2();
>> }
>>
>> int main()
>> {
>>          f1();
>> }
>>
>> Kan Liang (17):
>>    perf pmu: Add support for PMU capabilities
>>    perf header: Support CPU PMU capabilities
>>    perf record: Clear HEADER_CPU_PMU_CAPS for non LBR call stack mode
>>    perf stat: Clear HEADER_CPU_PMU_CAPS
>>    perf machine: Remove the indent in resolve_lbr_callchain_sample
>>    perf machine: Refine the function for LBR call stack reconstruction
>>    perf machine: Factor out lbr_callchain_add_kernel_ip()
>>    perf machine: Factor out lbr_callchain_add_lbr_ip()
>>    perf thread: Add a knob for LBR stitch approach
>>    perf tools: Save previous sample for LBR stitching approach
>>    perf tools: Save previous cursor nodes for LBR stitching approach
>>    perf tools: Stitch LBR call stack
>>    perf report: Add option to enable the LBR stitching approach
>>    perf script: Add option to enable the LBR stitching approach
>>    perf top: Add option to enable the LBR stitching approach
>>    perf c2c: Add option to enable the LBR stitching approach
>>    perf hist: Add fast path for duplicate entries check
>>
>>   tools/perf/Documentation/perf-c2c.txt         |  11 +
>>   tools/perf/Documentation/perf-report.txt      |  11 +
>>   tools/perf/Documentation/perf-script.txt      |  11 +
>>   tools/perf/Documentation/perf-top.txt         |   9 +
>>   .../Documentation/perf.data-file-format.txt   |  16 +
>>   tools/perf/builtin-c2c.c                      |  12 +
>>   tools/perf/builtin-record.c                   |   3 +
>>   tools/perf/builtin-report.c                   |  12 +
>>   tools/perf/builtin-script.c                   |  12 +
>>   tools/perf/builtin-stat.c                     |   1 +
>>   tools/perf/builtin-top.c                      |  11 +
>>   tools/perf/util/branch.h                      |  19 +-
>>   tools/perf/util/callchain.h                   |   8 +
>>   tools/perf/util/env.h                         |   3 +
>>   tools/perf/util/header.c                      | 108 +++++
>>   tools/perf/util/header.h                      |   1 +
>>   tools/perf/util/hist.c                        |  23 +
>>   tools/perf/util/machine.c                     | 423 +++++++++++++++---
>>   tools/perf/util/pmu.c                         |  82 ++++
>>   tools/perf/util/pmu.h                         |   9 +
>>   tools/perf/util/sort.c                        |   2 +-
>>   tools/perf/util/sort.h                        |   2 +
>>   tools/perf/util/thread.c                      |   2 +
>>   tools/perf/util/thread.h                      |  35 ++
>>   tools/perf/util/top.h                         |   1 +
>>   25 files changed, 757 insertions(+), 70 deletions(-)
>>
>> -- 
>> 2.17.1
>>
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V4 00/17] Stitch LBR call stack (Perf Tools)
  2020-04-17 21:47   ` Liang, Kan
@ 2020-04-17 21:54     ` Arnaldo Carvalho de Melo
  2020-04-17 21:55       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 46+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-04-17 21:54 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Arnaldo Carvalho de Melo, jolsa, peterz, mingo, linux-kernel,
	namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak

On Fri, Apr 17, 2020 at 05:47:49PM -0400, Liang, Kan wrote:
> 
> 
> On 4/17/2020 1:48 PM, Arnaldo Carvalho de Melo wrote:
> > Em Thu, Mar 19, 2020 at 01:25:00PM -0700, kan.liang@linux.intel.com escreveu:
> > > For a simple test case tchain_edit with 43 depth of call stacks.
> > > perf record --call-graph lbr -- ./tchain_edit
> > > perf report --stitch-lbr

> > > Without --stitch-lbr, perf report only display 32 depth of call stacks.
> > > With --stitch-lbr, perf report can display all 43 depth of call stacks.
> > > The depth of call stacks increase 34.3%.

> > > Correspondingly, the processing time of perf report increases 39%,
> > > Without --stitch-lbr:                           11.0 sec
> > > With --stitch-lbr:                              15.3 sec

> > Next time provide the full test proggie, I had to expand those ... to
> > reproduce your results,

> Sure, I will do so in the future.

> > all I have is in perf/core, some patches are
> > still to be processed, will continue later, have to stop now, see:

> > https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/commit/?id=13cfba6b741ff

> > For my testing, looks really great!

> Thanks for the testing. :)

My pleasure.

BTW everything is in there by now.

- Arnaldo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V4 00/17] Stitch LBR call stack (Perf Tools)
  2020-04-17 21:54     ` Arnaldo Carvalho de Melo
@ 2020-04-17 21:55       ` Arnaldo Carvalho de Melo
  2020-04-17 21:55         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 46+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-04-17 21:55 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Arnaldo Carvalho de Melo, jolsa, peterz, mingo, linux-kernel,
	namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak

On Fri, Apr 17, 2020 at 06:54:02PM -0300, Arnaldo Carvalho de Melo wrote:
> My pleasure.
> 
> BTW everything is in there by now.

And I had to do some more fixes to get it building in my container build
test suite. I've restarted it and it's now at this point:

Fri 17 Apr 2020 06:18:18 PM -03
$ export PERF_TARBALL=http://192.168.122.1/perf/perf-5.7.0-rc1.tar.xz
$ dm
   1   125.76 alpine:3.4                    : Ok   gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final)
   2   123.16 alpine:3.5                    : Ok   gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final)
   3   127.96 alpine:3.6                    : Ok   gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final)
   4   138.24 alpine:3.7                    : Ok   gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0)
   5   147.38 alpine:3.8                    : Ok   gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1)
   6   160.57 alpine:3.9                    : Ok   gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1)
   7   186.38 alpine:3.10                   : Ok   gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0)
   8   208.66 alpine:3.11                   : Ok   gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 9.0.0 (https://git.alpinelinux.org/aports f7f0d2c2b8bcd6a5843401a9a702029556492689) (based on LLVM 9.0.0)
   9   176.99 alpine:edge                   : Ok   gcc (Alpine 9.2.0) 9.2.0, Alpine clang version 9.0.1 (git://git.alpinelinux.org/aports 6c34b9a10bcdcdac04a11569c50b61fb50c4ea6e) (based on LLVM 9.0.1)
  10    86.14 alt:p8                        : Ok   x86_64-alt-linux-gcc (GCC) 5.3.1 20151207 (ALT p8 5.3.1-alt3.M80P.1), clang version 3.8.0 (tags/RELEASE_380/final)
  11    99.43 alt:p9                        : Ok   x86_64-alt-linux-gcc (GCC) 8.3.1 20190507 (ALT p9 8.3.1-alt5), clang version 7.0.1
  12   104.78 alt:sisyphus                  : Ok   x86_64-alt-linux-gcc (GCC) 9.2.1 20200123 (ALT Sisyphus 9.2.1-alt3), clang version 9.0.1
  13    77.72 amazonlinux:1                 : Ok   gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final)
  14   120.69 amazonlinux:2                 : Ok   gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2)
  15    24.96 android-ndk:r12b-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
  16    25.73 android-ndk:r15c-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
  17    22.34 centos:5                      : Ok   gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)
  18    31.91 centos:6                      : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
  19    36.18 centos:7                      : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)
  20   129.17 centos:8                      : Ok   gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4), clang version 8.0.1 (Red Hat 8.0.1-1.module_el8.1.0+215+a01033fb)
  21: clearlinux:latest

- Arnaldo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH V4 00/17] Stitch LBR call stack (Perf Tools)
  2020-04-17 21:55       ` Arnaldo Carvalho de Melo
@ 2020-04-17 21:55         ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 46+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-04-17 21:55 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Arnaldo Carvalho de Melo, jolsa, peterz, mingo, linux-kernel,
	namhyung, adrian.hunter, mathieu.poirier, ravi.bangoria,
	alexey.budankov, vitaly.slobodskoy, pavel.gerasimov, mpe,
	eranian, ak

On Fri, Apr 17, 2020 at 06:55:17PM -0300, Arnaldo Carvalho de Melo wrote:
> And I had to do some more fixes to get it building in my container build
> test suite, I've restarted it and now its at this point:
> 
> Fri 17 Apr 2020 06:18:18 PM -03
> $ export PERF_TARBALL=http://192.168.122.1/perf/perf-5.7.0-rc1.tar.xz
> $ dm
>   21: clearlinux:latest

Just as I pressed the send hotkey:

  21    45.71 clearlinux:latest             : Ok   gcc (Clear Linux OS for Intel Architecture) 9.3.1 20200325 releases/gcc-9.3.0-55-gdff885cdc0, clang version 9.0.1


:-)

- Arnaldo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [tip: perf/core] perf hist: Add fast path for duplicate entries check
  2020-03-19 20:25 ` [PATCH V4 17/17] perf hist: Add fast path for duplicate entries check kan.liang
@ 2020-04-22 12:17   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 46+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-04-22 12:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kan Liang, Jiri Olsa, Adrian Hunter, Alexey Budankov, Andi Kleen,
	Mathieu Poirier, Michael Ellerman, Namhyung Kim, Pavel Gerasimov,
	Peter Zijlstra, Ravi Bangoria, Stephane Eranian,
	Vitaly Slobodskoy, Arnaldo Carvalho de Melo, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     12e89e65f446476951f42aedeef56b6bd6f7f1e6
Gitweb:        https://git.kernel.org/tip/12e89e65f446476951f42aedeef56b6bd6f7f1e6
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 19 Mar 2020 13:25:17 -07:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Sat, 18 Apr 2020 09:05:01 -03:00

perf hist: Add fast path for duplicate entries check

Perf checks for duplicate entries in a callchain before adding an entry.
However, the check is very slow, especially with deeper call stacks.
Almost 50% of the elapsed time of perf report is spent on the check when
the call stack always has a depth of 32.

hist_entry__cmp() is used to compare the new entry with the old
entries. It goes through all the available sorts in the sort_list
and calls the specific cmp of each sort, which is very slow.

In most cases there are no duplicate entries in a callchain, because
the symbols are usually different. It's much faster to do a quick check
of the symbols first, and only do the full cmp when the symbols are
exactly the same.

The quick check only compares symbols, not DSOs. Export
_sort__sym_cmp.
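
The idea can be sketched outside of perf as follows. This is only a
minimal illustration under stated assumptions: struct toy_symbol,
struct toy_entry, toy_full_cmp() and the call counter are hypothetical
stand-ins, not perf's struct symbol, struct hist_entry or
hist_entry__cmp(), and identifying a symbol by its start address is an
assumption made for the sketch. The 'fast' flag models the
hists__has(he_tmp.hists, sym) condition from the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not perf's real structures. */
struct toy_symbol {
	uint64_t start;
};

struct toy_entry {
	const struct toy_symbol *sym;	/* NULL when the address did not resolve */
	uint64_t ip;
	const char *comm;		/* extra sort key, only used by the full compare */
};

/* Counts how often the expensive comparison ran (illustration only). */
static unsigned long toy_full_cmp_calls;

/* Cheap check: returns true when the two entries cannot be duplicates. */
static bool toy_fast_sym_diff(const struct toy_entry *l,
			      const struct toy_entry *r)
{
	if (!l->sym && !r->sym)
		return l->ip != r->ip;
	if (!l->sym || !r->sym)
		return true;
	/* Assumption: a symbol is identified by its start address. */
	return l->sym->start != r->sym->start;
}

/* Stand-in for the full compare that walks every configured sort key. */
static int toy_full_cmp(const struct toy_entry *l, const struct toy_entry *r)
{
	toy_full_cmp_calls++;
	if (toy_fast_sym_diff(l, r))
		return 1;
	return strcmp(l->comm, r->comm) != 0;
}

/* Returns true if 'cand' duplicates one of the first 'n' cached entries. */
static bool toy_is_duplicate(const struct toy_entry *cache, size_t n,
			     const struct toy_entry *cand, bool fast)
{
	assert(cache != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		/* Fast path: entries with different symbols never compare equal. */
		if (fast && toy_fast_sym_diff(&cache[i], cand))
			continue;
		if (toy_full_cmp(&cache[i], cand) == 0)
			return true;
	}
	return false;
}
```

With a cache of N entries that all have different symbols, the slow
path runs the full compare up to N times per candidate, while the fast
path runs it only for the entries that share the candidate's symbol.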

  $ perf record --call-graph lbr ./tchain_edit_64

  Without the patch
  $ time perf report --stdio
  real    0m21.142s
  user    0m21.110s
  sys     0m0.033s

  With the patch
  $ time perf report --stdio
  real    0m10.977s
  user    0m10.948s
  sys     0m0.027s

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Link: http://lore.kernel.org/lkml/20200319202517.23423-18-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/hist.c | 23 +++++++++++++++++++++++
 tools/perf/util/sort.c |  2 +-
 tools/perf/util/sort.h |  2 ++
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 283a69f..c2550db 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -1070,6 +1070,20 @@ iter_next_cumulative_entry(struct hist_entry_iter *iter,
 	return fill_callchain_info(al, node, iter->hide_unresolved);
 }
 
+static bool
+hist_entry__fast__sym_diff(struct hist_entry *left,
+			   struct hist_entry *right)
+{
+	struct symbol *sym_l = left->ms.sym;
+	struct symbol *sym_r = right->ms.sym;
+
+	if (!sym_l && !sym_r)
+		return left->ip != right->ip;
+
+	return !!_sort__sym_cmp(sym_l, sym_r);
+}
+
+
 static int
 iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 			       struct addr_location *al)
@@ -1096,6 +1110,7 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 	};
 	int i;
 	struct callchain_cursor cursor;
+	bool fast = hists__has(he_tmp.hists, sym);
 
 	callchain_cursor_snapshot(&cursor, &callchain_cursor);
 
@@ -1106,6 +1121,14 @@ iter_add_next_cumulative_entry(struct hist_entry_iter *iter,
 	 * It's possible that it has cycles or recursive calls.
 	 */
 	for (i = 0; i < iter->curr; i++) {
+		/*
+		 * For most cases, there are no duplicate entries in callchain.
+		 * The symbols are usually different. Do a quick check for
+		 * symbols first.
+		 */
+		if (fast && hist_entry__fast__sym_diff(he_cache[i], &he_tmp))
+			continue;
+
 		if (hist_entry__cmp(he_cache[i], &he_tmp) == 0) {
 			/* to avoid calling callback function */
 			iter->he = NULL;
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index f14cc72..dc15ddc 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -237,7 +237,7 @@ static int64_t _sort__addr_cmp(u64 left_ip, u64 right_ip)
 	return (int64_t)(right_ip - left_ip);
 }
 
-static int64_t _sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r)
+int64_t _sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r)
 {
 	if (!sym_l || !sym_r)
 		return cmp_null(sym_l, sym_r);
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index cfa6ac6..66d39c4 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -311,5 +311,7 @@ int64_t
 sort__daddr_cmp(struct hist_entry *left, struct hist_entry *right);
 int64_t
 sort__dcacheline_cmp(struct hist_entry *left, struct hist_entry *right);
+int64_t
+_sort__sym_cmp(struct symbol *sym_l, struct symbol *sym_r);
 char *hist_entry__srcline(struct hist_entry *he);
 #endif	/* __PERF_SORT_H */

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [tip: perf/core] perf c2c: Add option to enable the LBR stitching approach
  2020-03-19 20:25 ` [PATCH V4 16/17] perf c2c: " kan.liang
@ 2020-04-22 12:17   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 46+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-04-22 12:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kan Liang, Andi Kleen, Jiri Olsa, Adrian Hunter, Alexey Budankov,
	Mathieu Poirier, Michael Ellerman, Namhyung Kim, Pavel Gerasimov,
	Peter Zijlstra, Ravi Bangoria, Stephane Eranian,
	Vitaly Slobodskoy, Arnaldo Carvalho de Melo, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     d80da766d181555d0c846298b8c619c384c7d179
Gitweb:        https://git.kernel.org/tip/d80da766d181555d0c846298b8c619c384c7d179
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 19 Mar 2020 13:25:16 -07:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Sat, 18 Apr 2020 09:05:01 -03:00

perf c2c: Add option to enable the LBR stitching approach

With the LBR stitching approach, the reconstructed LBR call stack can
go beyond the HW limitation. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may impact the processing time, especially when the number of
samples with stitched LBRs is huge.

Add an option to enable the approach.
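
To illustrate why stitching can go beyond the HW depth, and why it can
also go wrong, here is a toy sketch. It is not perf's implementation:
toy_stitch() and its flat array-of-addresses model are assumptions made
for illustration, whereas real LBR entries are branch from/to pairs and
the real matching logic lives in tools/perf/util/machine.c. The sketch
lines up the outermost frames of the current (truncated) stack with the
stack kept from the previous sample and appends the callers that only
the previous sample still has.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model: a call stack is an array of frame addresses, innermost
 * frame first. 'cur' was truncated by the hardware, 'prev' is the stack
 * kept from the previous sample. Returns the number of frames in 'out'.
 */
static size_t toy_stitch(const uint64_t *cur, size_t cur_n,
			 const uint64_t *prev, size_t prev_n,
			 uint64_t *out, size_t out_cap)
{
	size_t best_j = 0, best_len = 0, n = 0;

	assert(out_cap >= cur_n + prev_n);

	/* Find where the outermost frames of cur line up best inside prev. */
	for (size_t j = 0; cur_n && j < prev_n; j++) {
		size_t len = 0;

		while (len < cur_n && len <= j &&
		       cur[cur_n - 1 - len] == prev[j - len])
			len++;
		if (len > best_len) {
			best_len = len;
			best_j = j;
		}
	}

	for (size_t i = 0; i < cur_n; i++)
		out[n++] = cur[i];

	/* No overlap found: keep the truncated stack unchanged. */
	if (!best_len)
		return n;

	/* Append the callers that only the previous sample still has. */
	for (size_t j = best_j + 1; j < prev_n; j++)
		out[n++] = prev[j];

	return n;
}
```

Because the match is made purely on addresses, a coincidental overlap
(for instance after a longjmp unwinds several frames at once) would
stitch callers that are no longer on the stack, which is the kind of
incorrect match the option text warns about.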

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Link: http://lore.kernel.org/lkml/20200319202517.23423-17-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-c2c.txt | 11 +++++++++++
 tools/perf/builtin-c2c.c              | 12 ++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
index e6150f2..2133eb3 100644
--- a/tools/perf/Documentation/perf-c2c.txt
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -111,6 +111,17 @@ REPORT OPTIONS
 --display::
 	Switch to HITM type (rmt, lcl) to display and sort on. Total HITMs as default.
 
+--stitch-lbr::
+	Show callgraph with stitched LBRs, which may have more complete
+	callgraph. The perf.data file must have been obtained using
+	perf c2c record --call-graph lbr.
+	Disabled by default. In common cases with call stack overflows,
+	it can recreate better call stacks than the default lbr call stack
+	output. But this approach is not full proof. There can be cases
+	where it creates incorrect call stacks from incorrect matches.
+	The known limitations include exception handing such as
+	setjmp/longjmp will have calls/returns not match.
+
 C2C RECORD
 ----------
 The perf c2c record command setup options related to HITM cacheline analysis
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 246ac0b..0d544c4 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -95,6 +95,7 @@ struct perf_c2c {
 	bool			 use_stdio;
 	bool			 stats_only;
 	bool			 symbol_full;
+	bool			 stitch_lbr;
 
 	/* HITM shared clines stats */
 	struct c2c_stats	hitm_stats;
@@ -273,6 +274,9 @@ static int process_sample_event(struct perf_tool *tool __maybe_unused,
 		return -1;
 	}
 
+	if (c2c.stitch_lbr)
+		al.thread->lbr_stitch_enable = true;
+
 	ret = sample__resolve_callchain(sample, &callchain_cursor, NULL,
 					evsel, &al, sysctl_perf_event_max_stack);
 	if (ret)
@@ -2601,6 +2605,12 @@ static int setup_callchain(struct evlist *evlist)
 		}
 	}
 
+	if (c2c.stitch_lbr && (mode != CALLCHAIN_LBR)) {
+		ui__warning("Can't find LBR callchain. Switch off --stitch-lbr.\n"
+			    "Please apply --call-graph lbr when recording.\n");
+		c2c.stitch_lbr = false;
+	}
+
 	callchain_param.record_mode = mode;
 	callchain_param.min_percent = 0;
 	return 0;
@@ -2752,6 +2762,8 @@ static int perf_c2c__report(int argc, const char **argv)
 	OPT_STRING('c', "coalesce", &coalesce, "coalesce fields",
 		   "coalesce fields: pid,tid,iaddr,dso"),
 	OPT_BOOLEAN('f', "force", &symbol_conf.force, "don't complain, do it"),
+	OPT_BOOLEAN(0, "stitch-lbr", &c2c.stitch_lbr,
+		    "Enable LBR callgraph stitching approach"),
 	OPT_PARENT(c2c_options),
 	OPT_END()
 	};

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [tip: perf/core] perf top: Add option to enable the LBR stitching approach
  2020-03-19 20:25 ` [PATCH V4 15/17] perf top: " kan.liang
@ 2020-04-22 12:17   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 46+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-04-22 12:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kan Liang, Andi Kleen, Jiri Olsa, Arnaldo Carvalho de Melo,
	Adrian Hunter, Alexey Budankov, Mathieu Poirier,
	Michael Ellerman, Namhyung Kim, Pavel Gerasimov, Peter Zijlstra,
	Ravi Bangoria, Stephane Eranian, Vitaly Slobodskoy, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     13e0c844fa097f657bd8204fd574477c34f47a0c
Gitweb:        https://git.kernel.org/tip/13e0c844fa097f657bd8204fd574477c34f47a0c
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 19 Mar 2020 13:25:15 -07:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Sat, 18 Apr 2020 09:05:01 -03:00

perf top: Add option to enable the LBR stitching approach

With the LBR stitching approach, the reconstructed LBR call stack
can go beyond the HW limitation. However, it may reconstruct invalid
call stacks in some cases, e.g. exception handling such as
setjmp/longjmp. Also, it may impact the processing time, especially
when the number of samples with stitched LBRs is huge.

Add an option to enable the approach.
The option must be used with --call-graph lbr.
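
The two ways this series handles a --stitch-lbr request without LBR
callchains can be sketched side by side. This is a standalone
illustration, not the perf code itself: enum toy_record_mode and both
helper names are hypothetical, and only the two messages are taken from
the diffs in this thread. perf c2c warns and switches the option off,
while perf top treats the mismatch as an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for perf's callchain record modes. */
enum toy_record_mode {
	TOY_CALLCHAIN_NONE,
	TOY_CALLCHAIN_FP,
	TOY_CALLCHAIN_DWARF,
	TOY_CALLCHAIN_LBR,
};

/* c2c-style policy: warn and continue with stitching switched off. */
static bool toy_stitch_or_disable(bool requested, enum toy_record_mode mode)
{
	assert(mode >= TOY_CALLCHAIN_NONE && mode <= TOY_CALLCHAIN_LBR);

	if (requested && mode != TOY_CALLCHAIN_LBR) {
		fprintf(stderr,
			"Can't find LBR callchain. Switch off --stitch-lbr.\n"
			"Please apply --call-graph lbr when recording.\n");
		return false;
	}
	return requested;
}

/* top-style policy: refuse to start, returning -1 to the caller. */
static int toy_stitch_or_fail(bool requested, enum toy_record_mode mode)
{
	assert(mode >= TOY_CALLCHAIN_NONE && mode <= TOY_CALLCHAIN_LBR);

	if (requested && mode != TOY_CALLCHAIN_LBR) {
		fprintf(stderr,
			"Error: --stitch-lbr must be used with --call-graph lbr\n");
		return -1;
	}
	return 0;
}
```

A plausible reading of the difference is that perf top records and
reports in one step, so the user can simply rerun it with the right
option, whereas perf c2c report works on an already recorded perf.data
file and can still produce useful output without stitching.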

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Link: http://lore.kernel.org/lkml/20200319202517.23423-16-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-top.txt |  9 +++++++++
 tools/perf/builtin-top.c              | 11 +++++++++++
 tools/perf/util/top.h                 |  1 +
 3 files changed, 21 insertions(+)

diff --git a/tools/perf/Documentation/perf-top.txt b/tools/perf/Documentation/perf-top.txt
index 487737a..20227da 100644
--- a/tools/perf/Documentation/perf-top.txt
+++ b/tools/perf/Documentation/perf-top.txt
@@ -319,6 +319,15 @@ Default is to monitor all CPUS.
 	go straight to the histogram browser, just like 'perf top' with no events
 	explicitely specified does.
 
+--stitch-lbr::
+	Show callgraph with stitched LBRs, which may have more complete
+	callgraph. The option must be used with --call-graph lbr recording.
+	Disabled by default. In common cases with call stack overflows,
+	it can recreate better call stacks than the default lbr call stack
+	output. But this approach is not full proof. There can be cases
+	where it creates incorrect call stacks from incorrect matches.
+	The known limitations include exception handing such as
+	setjmp/longjmp will have calls/returns not match.
 
 INTERACTIVE PROMPTING KEYS
 --------------------------
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 289cf83..6b067a5 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -33,6 +33,7 @@
 #include "util/map.h"
 #include "util/mmap.h"
 #include "util/session.h"
+#include "util/thread.h"
 #include "util/symbol.h"
 #include "util/synthetic-events.h"
 #include "util/top.h"
@@ -775,6 +776,9 @@ static void perf_event__process_sample(struct perf_tool *tool,
 	if (machine__resolve(machine, &al, sample) < 0)
 		return;
 
+	if (top->stitch_lbr)
+		al.thread->lbr_stitch_enable = true;
+
 	if (!machine->kptr_restrict_warned &&
 	    symbol_conf.kptr_restrict &&
 	    al.cpumode == PERF_RECORD_MISC_KERNEL) {
@@ -1571,6 +1575,8 @@ int cmd_top(int argc, const char **argv)
 		    "Sort the output by the event at the index n in group. "
 		    "If n is invalid, sort by the first event. "
 		    "WARNING: should be used on grouped events."),
+	OPT_BOOLEAN(0, "stitch-lbr", &top.stitch_lbr,
+		    "Enable LBR callgraph stitching approach"),
 	OPTS_EVSWITCH(&top.evswitch),
 	OPT_END()
 	};
@@ -1640,6 +1646,11 @@ int cmd_top(int argc, const char **argv)
 		}
 	}
 
+	if (top.stitch_lbr && !(callchain_param.record_mode == CALLCHAIN_LBR)) {
+		pr_err("Error: --stitch-lbr must be used with --call-graph lbr\n");
+		goto out_delete_evlist;
+	}
+
 	if (opts->branch_stack && callchain_param.enabled)
 		symbol_conf.show_branchflag_count = true;
 
diff --git a/tools/perf/util/top.h b/tools/perf/util/top.h
index f117d4f..45dc84d 100644
--- a/tools/perf/util/top.h
+++ b/tools/perf/util/top.h
@@ -36,6 +36,7 @@ struct perf_top {
 	bool		   use_tui, use_stdio;
 	bool		   vmlinux_warned;
 	bool		   dump_symtab;
+	bool		   stitch_lbr;
 	struct hist_entry  *sym_filter_entry;
 	struct evsel 	   *sym_evsel;
 	struct perf_session *session;

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [tip: perf/core] perf script: Add option to enable the LBR stitching approach
  2020-03-19 20:25 ` [PATCH V4 14/17] perf script: " kan.liang
@ 2020-04-22 12:17   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 46+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-04-22 12:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kan Liang, Andi Kleen, Jiri Olsa, Arnaldo Carvalho de Melo,
	Adrian Hunter, Alexey Budankov, Mathieu Poirier,
	Michael Ellerman, Namhyung Kim, Pavel Gerasimov, Peter Zijlstra,
	Ravi Bangoria, Stephane Eranian, Vitaly Slobodskoy, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     680d125cd522d460b24ccc8b29f03cdb62dea23e
Gitweb:        https://git.kernel.org/tip/680d125cd522d460b24ccc8b29f03cdb62dea23e
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 19 Mar 2020 13:25:14 -07:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Sat, 18 Apr 2020 09:05:01 -03:00

perf script: Add option to enable the LBR stitching approach

With the LBR stitching approach, the reconstructed LBR call stack can
go beyond the HW limitation. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may impact the processing time, especially when the number of
samples with stitched LBRs is huge.

Add an option to enable the approach.

Committer testing:

Using the same perf.data as with the latest cset committer testing
section:

  $ perf script --stitch-lbr
  <SNIP>
  tchain_edit 11131 15164.984292:     437491 cycles:u:
                    401106 f43+0x0 (/wb/tchain_edit)
                    40114c f42+0x18 (/wb/tchain_edit)
                    401172 f41+0xe (/wb/tchain_edit)
                    401194 f40+0x0 (/wb/tchain_edit)
                    40119b f39+0x0 (/wb/tchain_edit)
                    4011a2 f38+0x0 (/wb/tchain_edit)
                    4011a9 f37+0x0 (/wb/tchain_edit)
                    4011b0 f36+0x0 (/wb/tchain_edit)
                    4011b7 f35+0x0 (/wb/tchain_edit)
                    4011be f34+0x0 (/wb/tchain_edit)
                    4011c5 f33+0x0 (/wb/tchain_edit)
                    4011cc f32+0x0 (/wb/tchain_edit)
                    401207 f31+0x34 (/wb/tchain_edit)
                    401212 f30+0x0 (/wb/tchain_edit)
                    401219 f29+0x0 (/wb/tchain_edit)
                    401220 f28+0x0 (/wb/tchain_edit)
                    401227 f27+0x0 (/wb/tchain_edit)
                    40122e f26+0x0 (/wb/tchain_edit)
                    401235 f25+0x0 (/wb/tchain_edit)
                    40123c f24+0x0 (/wb/tchain_edit)
                    401243 f23+0x0 (/wb/tchain_edit)
                    40124a f22+0x0 (/wb/tchain_edit)
                    401251 f21+0x0 (/wb/tchain_edit)
                    401258 f20+0x0 (/wb/tchain_edit)
                    40125f f19+0x0 (/wb/tchain_edit)
                    401266 f18+0x0 (/wb/tchain_edit)
                    40126d f17+0x0 (/wb/tchain_edit)
                    401274 f16+0x0 (/wb/tchain_edit)
                    40127b f15+0x0 (/wb/tchain_edit)
                    401282 f14+0x0 (/wb/tchain_edit)
                    401289 f13+0x0 (/wb/tchain_edit)
                    401290 f12+0x0 (/wb/tchain_edit)
                    401297 f11+0x0 (/wb/tchain_edit)
                    40129e f10+0x0 (/wb/tchain_edit)
                    4012a5 f9+0x0 (/wb/tchain_edit)
                    4012ac f8+0x0 (/wb/tchain_edit)
                    4012b3 f7+0x0 (/wb/tchain_edit)
                    4012ba f6+0x0 (/wb/tchain_edit)
                    4012c1 f5+0x0 (/wb/tchain_edit)
                    4012c8 f4+0x0 (/wb/tchain_edit)
                    4012cf f3+0x0 (/wb/tchain_edit)
                    4012d6 f2+0x0 (/wb/tchain_edit)
                    4012dd f1+0x0 (/wb/tchain_edit)
                    4012e4 main+0x0 (/wb/tchain_edit)
              7f41a5016f41 __libc_start_main+0xf1 (/usr/lib64/libc-2.29.so)
  <SNIP>
  $

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Link: http://lore.kernel.org/lkml/20200319202517.23423-15-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-script.txt | 11 +++++++++++
 tools/perf/builtin-script.c              | 12 ++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 963487e..372dfd1 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -440,6 +440,17 @@ include::itrace.txt[]
 --show-on-off-events::
 	Show the --switch-on/off events too.
 
+--stitch-lbr::
+	Show callgraph with stitched LBRs, which may give a more complete
+	callgraph. The perf.data file must have been obtained using
+	perf record --call-graph lbr.
+	Disabled by default. In common cases with call stack overflows,
+	it can recreate better call stacks than the default lbr call stack
+	output. But this approach is not foolproof. There can be cases
+	where it creates incorrect call stacks from incorrect matches.
+	Known limitations include exception handling such as
+	setjmp/longjmp, where calls and returns do not match.
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-script-perl[1],
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 06b511c..a223654 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1697,6 +1697,7 @@ struct perf_script {
 	bool			show_cgroup_events;
 	bool			allocated;
 	bool			per_event_dump;
+	bool			stitch_lbr;
 	struct evswitch		evswitch;
 	struct perf_cpu_map	*cpus;
 	struct perf_thread_map *threads;
@@ -1923,6 +1924,9 @@ static void process_event(struct perf_script *script,
 	if (PRINT_FIELD(IP)) {
 		struct callchain_cursor *cursor = NULL;
 
+		if (script->stitch_lbr)
+			al->thread->lbr_stitch_enable = true;
+
 		if (symbol_conf.use_callchain && sample->callchain &&
 		    thread__resolve_callchain(al->thread, &callchain_cursor, evsel,
 					      sample, NULL, NULL, scripting_max_stack) == 0)
@@ -3170,6 +3174,12 @@ static void script__setup_sample_type(struct perf_script *script)
 		else
 			callchain_param.record_mode = CALLCHAIN_FP;
 	}
+
+	if (script->stitch_lbr && (callchain_param.record_mode != CALLCHAIN_LBR)) {
+		pr_warning("Can't find LBR callchain. Switch off --stitch-lbr.\n"
+			   "Please apply --call-graph lbr when recording.\n");
+		script->stitch_lbr = false;
+	}
 }
 
 static int process_stat_round_event(struct perf_session *session,
@@ -3481,6 +3491,8 @@ int cmd_script(int argc, const char **argv)
 		   "file", "file saving guest os /proc/kallsyms"),
 	OPT_STRING(0, "guestmodules", &symbol_conf.default_guest_modules,
 		   "file", "file saving guest os /proc/modules"),
+	OPT_BOOLEAN('\0', "stitch-lbr", &script.stitch_lbr,
+		    "Enable LBR callgraph stitching approach"),
 	OPTS_EVSWITCH(&script.evswitch),
 	OPT_END()
 	};

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [tip: perf/core] perf report: Add option to enable the LBR stitching approach
  2020-03-19 20:25 ` [PATCH V4 13/17] perf report: Add option to enable the LBR stitching approach kan.liang
@ 2020-04-22 12:17   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 46+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-04-22 12:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kan Liang, Andi Kleen, Jiri Olsa, Arnaldo Carvalho de Melo,
	Adrian Hunter, Alexey Budankov, Mathieu Poirier,
	Michael Ellerman, Namhyung Kim, Pavel Gerasimov, Peter Zijlstra,
	Ravi Bangoria, Stephane Eranian, Vitaly Slobodskoy, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     b1d1429b1820e1587d8588fc05b28ef9af42cfc6
Gitweb:        https://git.kernel.org/tip/b1d1429b1820e1587d8588fc05b28ef9af42cfc6
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 19 Mar 2020 13:25:13 -07:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Sat, 18 Apr 2020 09:05:01 -03:00

perf report: Add option to enable the LBR stitching approach

With the LBR stitching approach, the reconstructed LBR call stack can
exceed the HW limit on depth. However, it may reconstruct invalid call
stacks in some cases, e.g. exception handling such as setjmp/longjmp.
Also, it may increase the processing time, especially when the number of
samples with stitched LBRs is huge.

Add an option to enable the approach.

  # To display the perf.data header info, please use
  # --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 6K of event 'cycles'
  # Event count (approx.): 6492797701
  #
  # Children      Self  Command          Shared Object       Symbol
  # ........  ........  ...............  ..................
  # .................................
  #
    99.99%    99.99%  tchain_edit      tchain_edit        [.] f43
            |
            ---main
               f1
               f2
               f3
               f4
               f5
               f6
               f7
               f8
               f9
               f10
               f11
               f12
               f13
               f14
               f15
               f16
               f17
               f18
               f19
               f20
               f21
               f22
               f23
               f24
               f25
               f26
               f27
               f28
               f29
               f30
               f31
               |
                --99.65%--f32
                          f33
                          f34
                          f35
                          f36
                          f37
                          f38
                          f39
                          f40
                          f41
                          f42
                          f43

Committer testing:

  $ perf record --call-graph lbr /wb/tchain_edit
  [ perf record: Woken up 23 times to write data ]
  [ perf record: Captured and wrote 5.578 MB perf.data (6839 samples) ]
  $ perf report --header-only | egrep 'cpu(desc|.*capabilities)'
  # cpudesc : Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  # cpu pmu capabilities: branches=32, max_precise=3, pmu_name=skylake
  $

Before:

  $ perf report --no-children --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 6K of event 'cycles:u'
  # Event count (approx.): 6459523879
  #
  # Overhead  Command      Shared Object     Symbol
  # ........  ...........  ................  .......................
  #
      99.95%  tchain_edit  tchain_edit       [.] f43
              |
               --99.92%--f43
                         f42
                         f41
                         f40
                         f39
                         f38
                         f37
                         f36
                         f35
                         f34
                         f33
                         f32
                         f31
                         f30
                         f29
                         f28
                         f27
                         f26
                         f25
                         f24
                         f23
                         f22
                         f21
                         f20
                         f19
                         f18
                         f17
                         f16
                         f15
                         f14
                         f13
                         f12
                         f11

       0.03%  tchain_edit  tchain_edit       [.] f42
       0.01%  tchain_edit  tchain_edit       [.] f41
       0.00%  tchain_edit  tchain_edit       [.] f31
       0.00%  tchain_edit  ld-2.29.so        [.] _dl_relocate_object
       0.00%  tchain_edit  ld-2.29.so        [.] memmove
       0.00%  tchain_edit  [unknown]         [k] 0xffffffff93a00b17

After:

  $ perf report --stitch-lbr --no-children --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 6K of event 'cycles:u'
  # Event count (approx.): 6459496645
  #
  # Overhead  Command      Shared Object     Symbol
  # ........  ...........  ................  ........................
  #
      99.97%  tchain_edit  tchain_edit       [.] f43
              |
               --99.93%--f43
                         f42
                         f41
                         f40
                         f39
                         f38
                         f37
                         f36
                         f35
                         f34
                         f33
                         f32
                         f31
                         f30
                         f29
                         f28
                         f27
                         f26
                         f25
                         f24
                         f23
                         f22
                         f21
                         f20
                         f19
                         f18
                         f17
                         f16
                         f15
                         f14
                         f13
                         f12
                         f11
                         f10
                         f9
                         f8
                         f7
                         f6
                         f5
                         f4
                         f3
                         f2
                         f1
                         main
                         __libc_start_main

       0.02%  tchain_edit  [unknown]         [k] 0xffffffff93a00b17
       0.01%  tchain_edit  tchain_edit       [.] f31
       0.00%  tchain_edit  ld-2.29.so        [.] _dl_important_hwcaps

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Link: http://lore.kernel.org/lkml/20200319202517.23423-14-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-report.txt | 11 +++++++++++
 tools/perf/builtin-report.c              | 12 ++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index f569b9e..d068103 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -488,6 +488,17 @@ include::itrace.txt[]
 	This option extends the perf report to show reference callgraphs,
 	which collected by reference event, in no callgraph event.
 
+--stitch-lbr::
+	Show callgraph with stitched LBRs, which may give a more complete
+	callgraph. The perf.data file must have been obtained using
+	perf record --call-graph lbr.
+	Disabled by default. In common cases with call stack overflows,
+	it can recreate better call stacks than the default lbr call stack
+	output. But this approach is not foolproof. There can be cases
+	where it creates incorrect call stacks from incorrect matches.
+	Known limitations include exception handling such as
+	setjmp/longjmp, where calls and returns do not match.
+
 --socket-filter::
 	Only report the samples on the processor socket that match with this filter
 
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index c0cebd5..0c32767 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -84,6 +84,7 @@ struct report {
 	bool			header_only;
 	bool			nonany_branch_mode;
 	bool			group_set;
+	bool			stitch_lbr;
 	int			max_stack;
 	struct perf_read_values	show_threads_values;
 	struct annotation_options annotation_opts;
@@ -267,6 +268,9 @@ static int process_sample_event(struct perf_tool *tool,
 		return -1;
 	}
 
+	if (rep->stitch_lbr)
+		al.thread->lbr_stitch_enable = true;
+
 	if (symbol_conf.hide_unresolved && al.sym == NULL)
 		goto out_put;
 
@@ -408,6 +412,12 @@ static int report__setup_sample_type(struct report *rep)
 			callchain_param.record_mode = CALLCHAIN_FP;
 	}
 
+	if (rep->stitch_lbr && (callchain_param.record_mode != CALLCHAIN_LBR)) {
+		ui__warning("Can't find LBR callchain. Switch off --stitch-lbr.\n"
+			    "Please apply --call-graph lbr when recording.\n");
+		rep->stitch_lbr = false;
+	}
+
 	/* ??? handle more cases than just ANY? */
 	if (!(perf_evlist__combined_branch_type(session->evlist) &
 				PERF_SAMPLE_BRANCH_ANY))
@@ -1258,6 +1268,8 @@ int cmd_report(int argc, const char **argv)
 			"Show full source file name path for source lines"),
 	OPT_BOOLEAN(0, "show-ref-call-graph", &symbol_conf.show_ref_callgraph,
 		    "Show callgraph from reference event"),
+	OPT_BOOLEAN(0, "stitch-lbr", &report.stitch_lbr,
+		    "Enable LBR callgraph stitching approach"),
 	OPT_INTEGER(0, "socket-filter", &report.socket_filter,
 		    "only show processor socket that match with this filter"),
 	OPT_BOOLEAN(0, "raw-trace", &symbol_conf.raw_trace,

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [tip: perf/core] perf callchain: Stitch LBR call stack
  2020-03-19 20:25 ` [PATCH V4 12/17] perf tools: Stitch LBR call stack kan.liang
@ 2020-04-22 12:17   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 46+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-04-22 12:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kan Liang, Andi Kleen, Jiri Olsa, Adrian Hunter, Alexey Budankov,
	Mathieu Poirier, Michael Ellerman, Namhyung Kim, Pavel Gerasimov,
	Peter Zijlstra, Ravi Bangoria, Stephane Eranian,
	Vitaly Slobodskoy, Arnaldo Carvalho de Melo, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     ff165628d72644e37674c5485658e8bd9f4a348b
Gitweb:        https://git.kernel.org/tip/ff165628d72644e37674c5485658e8bd9f4a348b
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 19 Mar 2020 13:25:12 -07:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Sat, 18 Apr 2020 09:05:01 -03:00

perf callchain: Stitch LBR call stack

In LBR call stack mode, the depth of the reconstructed LBR call stack is
limited to the number of LBR registers.

  For example, on skylake, the depth of reconstructed LBR call stack is
  always <= 32.

  # To display the perf.data header info, please use
  # --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 6K of event 'cycles'
  # Event count (approx.): 6487119731
  #
  # Children      Self  Command          Shared Object       Symbol
  # ........  ........  ...............  ..................
  # ................................

    99.97%    99.97%  tchain_edit      tchain_edit        [.] f43
            |
             --99.64%--f11
                       f12
                       f13
                       f14
                       f15
                       f16
                       f17
                       f18
                       f19
                       f20
                       f21
                       f22
                       f23
                       f24
                       f25
                       f26
                       f27
                       f28
                       f29
                       f30
                       f31
                       f32
                       f33
                       f34
                       f35
                       f36
                       f37
                       f38
                       f39
                       f40
                       f41
                       f42
                       f43

For a call stack which is deeper than the LBR limit, HW will overwrite
the LBR registers holding the oldest branches. Only partial call stacks
can be reconstructed.

However, the overwritten LBRs may still be retrieved from the previous
sample, because at that moment HW had not yet overwritten those LBR
registers. Perf tools can stitch those overwritten LBRs onto the current
call stack to get a more complete call stack.

To determine if LBRs can be stitched, perf tools need to compare the
current sample with the previous sample.

- They should have identical LBR records (same from, to and flags
  values, and the same physical index of LBR registers).

- The search starts from the base-of-stack of the current sample.
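
As a rough illustration of the two criteria above, the sketch below
models the comparison on a simplified ring of LBR registers. It is not
the perf implementation: struct lbr_entry, struct lbr_sample,
phys_index() and count_identical() are hypothetical names, and the code
assumes hw_idx < max_lbr and nr <= max_lbr.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified LBR record: one branch plus its flags. */
struct lbr_entry {
	uint64_t from, to, flags;
};

/* Hypothetical sample: entries[0] is the most recent branch. */
struct lbr_sample {
	unsigned int nr;		/* number of valid entries */
	unsigned int hw_idx;		/* physical register of entries[0] */
	const struct lbr_entry *entries;
};

/* Physical register that holds logical entry k of a sample. */
static unsigned int phys_index(const struct lbr_sample *s, unsigned int k,
			       unsigned int max_lbr)
{
	return (s->hw_idx + max_lbr - k % max_lbr) % max_lbr;
}

/*
 * Count identical records, walking from the base-of-stack of the current
 * sample towards its top. *overlap receives the index in prev that sits
 * in the same physical register as the current base-of-stack.
 */
static int count_identical(const struct lbr_sample *prev,
			   const struct lbr_sample *cur,
			   unsigned int max_lbr, unsigned int *overlap)
{
	unsigned int cur_base, dist;
	int i, j, n = 0;

	assert(max_lbr > 0);
	if (!prev->nr || !cur->nr)
		return 0;

	cur_base = phys_index(cur, cur->nr - 1, max_lbr);
	dist = (prev->hw_idx + max_lbr - cur_base) % max_lbr;
	/* Previous stack is too short to reach that register. */
	if (dist >= prev->nr)
		return 0;

	for (i = (int)dist, j = (int)cur->nr - 1; i >= 0 && j >= 0; i--, j--) {
		if (prev->entries[i].from != cur->entries[j].from ||
		    prev->entries[i].to != cur->entries[j].to ||
		    prev->entries[i].flags != cur->entries[j].flags)
			break;
		n++;
	}
	*overlap = dist;
	return n;
}
```

With four registers, a previous sample taken in f4 (main->f1, f1->f2,
f2->f3, f3->f4) and a current sample taken in f5, where f4->f5 has
overwritten main->f1, share three identical records. The previous
entries older than the overlap point (here only main->f1) are the
candidates for stitching.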

Once perf decides to stitch the previous LBRs, the corresponding LBR
cursor nodes are copied to 'lists', which tracks the LBR cursor nodes
that are going to be stitched.

When the stitching is over, the nodes are not freed immediately. They
are moved to 'free_lists' so that the next stitching can reuse the
space. Both 'lists' and 'free_lists' are freed when all samples have
been processed.
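
The recycling scheme described above can be sketched with a pair of
plain singly linked lists. This is only an illustration of the idea, not
the code in this patch (which uses struct list_head): struct node,
struct pool and the pool_*() helpers are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node; 'payload' stands in for the copied cursor node. */
struct node {
	struct node *next;
	int payload;
};

/* 'active' plays the role of 'lists', 'free' the role of 'free_lists'. */
struct pool {
	struct node *active;
	struct node *free;
};

/* Reuse a retired node when one is available, otherwise allocate. */
static struct node *pool_get(struct pool *p)
{
	struct node *n = p->free;

	if (n) {
		p->free = n->next;
		n->next = NULL;
		return n;
	}
	return calloc(1, sizeof(*n));
}

/* Track a node that is going to be stitched. */
static void pool_push_active(struct pool *p, struct node *n)
{
	assert(n != NULL);
	n->next = p->active;
	p->active = n;
}

/* Stitching is over: keep the memory around instead of freeing it. */
static void pool_retire(struct pool *p)
{
	while (p->active) {
		struct node *n = p->active;

		p->active = n->next;
		n->next = p->free;
		p->free = n;
	}
}

/* All samples processed: release both lists. */
static void pool_destroy(struct pool *p)
{
	pool_retire(p);
	while (p->free) {
		struct node *n = p->free;

		p->free = n->next;
		free(n);
	}
}
```

The point of the design is that a node retired after one stitching pass
is handed back by the next pool_get() call, so a long run of samples
does not allocate and free a node for every stitched entry.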

Committer notes:

Fix the intel-pt.c initialization of the union with 'struct
branch_flags', that breaks the build with its unnamed union on older gcc
versions.
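
A minimal sketch of the two points involved, assuming a compiler that
accepts anonymous structs and unions (C11 or the GNU extension): the
'value' member allows all flag bits to be compared in one step, and
member-by-member assignment avoids a designated initializer reaching
into the unnamed union. struct flags_sketch, make_flags() and
flags_equal() are hypothetical names that only mirror the layout in this
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout idea as struct branch_flags after this patch. */
struct flags_sketch {
	union {
		uint64_t value;
		struct {
			uint64_t mispred:1;
			uint64_t predicted:1;
			uint64_t in_tx:1;
			uint64_t abort:1;
			uint64_t cycles:16;
			uint64_t type:4;
			uint64_t reserved:40;
		};
	};
};

/* Zero the whole word first, then assign fields one by one. */
static struct flags_sketch make_flags(int mispred, int in_tx,
				      unsigned int cycles)
{
	struct flags_sketch f;

	f.value = 0;
	f.mispred = !!mispred;
	f.predicted = !mispred;
	f.in_tx = !!in_tx;
	f.cycles = cycles & 0xffff;
	return f;
}

/* One 64-bit comparison covers every bit-field at once. */
static int flags_equal(struct flags_sketch a, struct flags_sketch b)
{
	return a.value == b.value;
}
```

Clearing 'value' before assigning the fields matters: otherwise the
unassigned bits are indeterminate and the whole-word comparison is
unreliable.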

Uninline thread__free_stitch_list(), as it grew big and started dragging
includes to thread.h, so move it to thread.c where what it needs in
terms of headers are already there.

This fixes the build in several systems such as debian:experimental when
cross building to the MIPS32 architecture, i.e. in the other cases what
was needed was being included by sheer luck.

  In file included from builtin-sched.c:11:
  util/thread.h: In function 'thread__free_stitch_list':
  util/thread.h:169:3: error: implicit declaration of function 'free' [-Werror=implicit-function-declaration]
    169 |   free(pos);
        |   ^~~~
  util/thread.h:169:3: error: incompatible implicit declaration of built-in function 'free' [-Werror]
  util/thread.h:19:1: note: include '<stdlib.h>' or provide a declaration of 'free'
     18 | #include "callchain.h"
    +++ |+#include <stdlib.h>
     19 |
  util/thread.h:174:3: error: incompatible implicit declaration of built-in function 'free' [-Werror]
    174 |   free(pos);
        |   ^~~~
  util/thread.h:174:3: note: include '<stdlib.h>' or provide a declaration of 'free'

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Link: http://lore.kernel.org/lkml/20200319202517.23423-13-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/branch.h    |  19 +++--
 tools/perf/util/callchain.h |   5 +-
 tools/perf/util/intel-pt.c  |  17 +---
 tools/perf/util/machine.c   | 139 ++++++++++++++++++++++++++++++++++-
 tools/perf/util/thread.c    |  22 ++++++-
 tools/perf/util/thread.h    |  14 +----
 6 files changed, 188 insertions(+), 28 deletions(-)

diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h
index 154a05c..4d3f02f 100644
--- a/tools/perf/util/branch.h
+++ b/tools/perf/util/branch.h
@@ -15,13 +15,18 @@
 #include "event.h"
 
 struct branch_flags {
-	u64 mispred:1;
-	u64 predicted:1;
-	u64 in_tx:1;
-	u64 abort:1;
-	u64 cycles:16;
-	u64 type:4;
-	u64 reserved:40;
+	union {
+		u64 value;
+		struct {
+			u64 mispred:1;
+			u64 predicted:1;
+			u64 in_tx:1;
+			u64 abort:1;
+			u64 cycles:16;
+			u64 type:4;
+			u64 reserved:40;
+		};
+	};
 };
 
 struct branch_info {
diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index cb33cd4..8f668ee 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -154,6 +154,11 @@ struct callchain_cursor_node {
 	struct callchain_cursor_node	*next;
 };
 
+struct stitch_list {
+	struct list_head		node;
+	struct callchain_cursor_node	cursor;
+};
+
 struct callchain_cursor {
 	u64				nr;
 	struct callchain_cursor_node	*first;
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index a659b4a..4be7634 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -1717,15 +1717,14 @@ static u64 intel_pt_lbr_flags(u64 info)
 	union {
 		struct branch_flags flags;
 		u64 result;
-	} u = {
-		.flags = {
-			.mispred	= !!(info & LBR_INFO_MISPRED),
-			.predicted	= !(info & LBR_INFO_MISPRED),
-			.in_tx		= !!(info & LBR_INFO_IN_TX),
-			.abort		= !!(info & LBR_INFO_ABORT),
-			.cycles		= info & LBR_INFO_CYCLES,
-		}
-	};
+	} u;
+
+	u.result	  = 0;
+	u.flags.mispred	  = !!(info & LBR_INFO_MISPRED);
+	u.flags.predicted = !(info & LBR_INFO_MISPRED);
+	u.flags.in_tx	  = !!(info & LBR_INFO_IN_TX);
+	u.flags.abort	  = !!(info & LBR_INFO_ABORT);
+	u.flags.cycles	  = info & LBR_INFO_CYCLES;
 
 	return u.result;
 }
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 737dee7..5ac32ca 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2348,6 +2348,119 @@ static int lbr_callchain_add_lbr_ip(struct thread *thread,
 	return 0;
 }
 
+static int lbr_callchain_add_stitched_lbr_ip(struct thread *thread,
+					     struct callchain_cursor *cursor)
+{
+	struct lbr_stitch *lbr_stitch = thread->lbr_stitch;
+	struct callchain_cursor_node *cnode;
+	struct stitch_list *stitch_node;
+	int err;
+
+	list_for_each_entry(stitch_node, &lbr_stitch->lists, node) {
+		cnode = &stitch_node->cursor;
+
+		err = callchain_cursor_append(cursor, cnode->ip,
+					      &cnode->ms,
+					      cnode->branch,
+					      &cnode->branch_flags,
+					      cnode->nr_loop_iter,
+					      cnode->iter_cycles,
+					      cnode->branch_from,
+					      cnode->srcline);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+static struct stitch_list *get_stitch_node(struct thread *thread)
+{
+	struct lbr_stitch *lbr_stitch = thread->lbr_stitch;
+	struct stitch_list *stitch_node;
+
+	if (!list_empty(&lbr_stitch->free_lists)) {
+		stitch_node = list_first_entry(&lbr_stitch->free_lists,
+					       struct stitch_list, node);
+		list_del(&stitch_node->node);
+
+		return stitch_node;
+	}
+
+	return malloc(sizeof(struct stitch_list));
+}
+
+static bool has_stitched_lbr(struct thread *thread,
+			     struct perf_sample *cur,
+			     struct perf_sample *prev,
+			     unsigned int max_lbr,
+			     bool callee)
+{
+	struct branch_stack *cur_stack = cur->branch_stack;
+	struct branch_entry *cur_entries = perf_sample__branch_entries(cur);
+	struct branch_stack *prev_stack = prev->branch_stack;
+	struct branch_entry *prev_entries = perf_sample__branch_entries(prev);
+	struct lbr_stitch *lbr_stitch = thread->lbr_stitch;
+	int i, j, nr_identical_branches = 0;
+	struct stitch_list *stitch_node;
+	u64 cur_base, distance;
+
+	if (!cur_stack || !prev_stack)
+		return false;
+
+	/* Find the physical index of the base-of-stack for current sample. */
+	cur_base = max_lbr - cur_stack->nr + cur_stack->hw_idx + 1;
+
+	distance = (prev_stack->hw_idx > cur_base) ? (prev_stack->hw_idx - cur_base) :
+						     (max_lbr + prev_stack->hw_idx - cur_base);
+	/* Previous sample has shorter stack. Nothing can be stitched. */
+	if (distance + 1 > prev_stack->nr)
+		return false;
+
+	/*
+	 * Check if there are identical LBRs between two samples.
+	 * Identical LBRs must have same from, to and flags values. Also,
+	 * they have to be saved in the same LBR registers (same physical
+	 * index).
+	 *
+	 * Starts from the base-of-stack of current sample.
+	 */
+	for (i = distance, j = cur_stack->nr - 1; (i >= 0) && (j >= 0); i--, j--) {
+		if ((prev_entries[i].from != cur_entries[j].from) ||
+		    (prev_entries[i].to != cur_entries[j].to) ||
+		    (prev_entries[i].flags.value != cur_entries[j].flags.value))
+			break;
+		nr_identical_branches++;
+	}
+
+	if (!nr_identical_branches)
+		return false;
+
+	/*
+	 * Save the LBRs between the base-of-stack of previous sample
+	 * and the base-of-stack of current sample into lbr_stitch->lists.
+	 * These LBRs will be stitched later.
+	 */
+	for (i = prev_stack->nr - 1; i > (int)distance; i--) {
+
+		if (!lbr_stitch->prev_lbr_cursor[i].valid)
+			continue;
+
+		stitch_node = get_stitch_node(thread);
+		if (!stitch_node)
+			return false;
+
+		memcpy(&stitch_node->cursor, &lbr_stitch->prev_lbr_cursor[i],
+		       sizeof(struct callchain_cursor_node));
+
+		if (callee)
+			list_add(&stitch_node->node, &lbr_stitch->lists);
+		else
+			list_add_tail(&stitch_node->node, &lbr_stitch->lists);
+	}
+
+	return true;
+}
+
 static bool alloc_lbr_stitch(struct thread *thread, unsigned int max_lbr)
 {
 	if (thread->lbr_stitch)
@@ -2361,6 +2474,9 @@ static bool alloc_lbr_stitch(struct thread *thread, unsigned int max_lbr)
 	if (!thread->lbr_stitch->prev_lbr_cursor)
 		goto free_lbr_stitch;
 
+	INIT_LIST_HEAD(&thread->lbr_stitch->lists);
+	INIT_LIST_HEAD(&thread->lbr_stitch->free_lists);
+
 	return true;
 
 free_lbr_stitch:
@@ -2386,9 +2502,11 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 					int max_stack,
 					unsigned int max_lbr)
 {
+	bool callee = (callchain_param.order == ORDER_CALLEE);
 	struct ip_callchain *chain = sample->callchain;
 	int chain_nr = min(max_stack, (int)chain->nr), i;
 	struct lbr_stitch *lbr_stitch;
+	bool stitched_lbr = false;
 	u64 branch_from = 0;
 	int err;
 
@@ -2405,10 +2523,18 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 	    (max_lbr > 0) && alloc_lbr_stitch(thread, max_lbr)) {
 		lbr_stitch = thread->lbr_stitch;
 
+		stitched_lbr = has_stitched_lbr(thread, sample,
+						&lbr_stitch->prev_sample,
+						max_lbr, callee);
+
+		if (!stitched_lbr && !list_empty(&lbr_stitch->lists)) {
+			list_replace_init(&lbr_stitch->lists,
+					  &lbr_stitch->free_lists);
+		}
 		memcpy(&lbr_stitch->prev_sample, sample, sizeof(*sample));
 	}
 
-	if (callchain_param.order == ORDER_CALLEE) {
+	if (callee) {
 		/* Add kernel ip */
 		err = lbr_callchain_add_kernel_ip(thread, cursor, sample,
 						  parent, root_al, branch_from,
@@ -2421,7 +2547,18 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 		if (err)
 			goto error;
 
+		if (stitched_lbr) {
+			err = lbr_callchain_add_stitched_lbr_ip(thread, cursor);
+			if (err)
+				goto error;
+		}
+
 	} else {
+		if (stitched_lbr) {
+			err = lbr_callchain_add_stitched_lbr_ip(thread, cursor);
+			if (err)
+				goto error;
+		}
 		err = lbr_callchain_add_lbr_ip(thread, cursor, sample, parent,
 					       root_al, &branch_from, false);
 		if (err)
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 8d0da26..665e5c0 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -454,3 +454,25 @@ int thread__memcpy(struct thread *thread, struct machine *machine,
 
        return dso__data_read_offset(al.map->dso, machine, offset, buf, len);
 }
+
+void thread__free_stitch_list(struct thread *thread)
+{
+	struct lbr_stitch *lbr_stitch = thread->lbr_stitch;
+	struct stitch_list *pos, *tmp;
+
+	if (!lbr_stitch)
+		return;
+
+	list_for_each_entry_safe(pos, tmp, &lbr_stitch->lists, node) {
+		list_del_init(&pos->node);
+		free(pos);
+	}
+
+	list_for_each_entry_safe(pos, tmp, &lbr_stitch->free_lists, node) {
+		list_del_init(&pos->node);
+		free(pos);
+	}
+
+	zfree(&lbr_stitch->prev_lbr_cursor);
+	zfree(&thread->lbr_stitch);
+}
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 8456174..b066fb3 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -5,7 +5,6 @@
 #include <linux/refcount.h>
 #include <linux/rbtree.h>
 #include <linux/list.h>
-#include <linux/zalloc.h>
 #include <stdio.h>
 #include <unistd.h>
 #include <sys/types.h>
@@ -24,6 +23,8 @@ struct thread_stack;
 struct unwind_libunwind_ops;
 
 struct lbr_stitch {
+	struct list_head		lists;
+	struct list_head		free_lists;
 	struct perf_sample		prev_sample;
 	struct callchain_cursor_node	*prev_lbr_cursor;
 };
@@ -154,15 +155,6 @@ static inline bool thread__is_filtered(struct thread *thread)
 	return false;
 }
 
-static inline void thread__free_stitch_list(struct thread *thread)
-{
-	struct lbr_stitch *lbr_stitch = thread->lbr_stitch;
-
-	if (!lbr_stitch)
-		return;
-
-	zfree(&lbr_stitch->prev_lbr_cursor);
-	zfree(&thread->lbr_stitch);
-}
+void thread__free_stitch_list(struct thread *thread);
 
 #endif	/* __PERF_THREAD_H */

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [tip: perf/core] perf callchain: Save previous cursor nodes for LBR stitching approach
  2020-03-19 20:25 ` [PATCH V4 11/17] perf tools: Save previous cursor nodes " kan.liang
  2020-04-17 16:53   ` Arnaldo Carvalho de Melo
@ 2020-04-22 12:17   ` tip-bot2 for Kan Liang
  1 sibling, 0 replies; 46+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-04-22 12:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kan Liang, Andi Kleen, Jiri Olsa, Adrian Hunter, Alexey Budankov,
	Mathieu Poirier, Michael Ellerman, Namhyung Kim, Pavel Gerasimov,
	Peter Zijlstra, Ravi Bangoria, Stephane Eranian,
	Vitaly Slobodskoy, Arnaldo Carvalho de Melo, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     7f1d39317c071268b4204175df7cfbb2187acb72
Gitweb:        https://git.kernel.org/tip/7f1d39317c071268b4204175df7cfbb2187acb72
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 19 Mar 2020 13:25:11 -07:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Sat, 18 Apr 2020 09:05:01 -03:00

perf callchain: Save previous cursor nodes for LBR stitching approach

The cursor nodes generated from a sample are eventually added into the
callchain. To avoid generating cursor nodes from previous samples again,
the previous cursor nodes are also saved for the LBR stitching approach.

Some options, e.g. hide-unresolved, may hide some LBRs.  Add a variable
'valid' in struct callchain_cursor_node to indicate this case. The LBR
stitching approach will only append the valid cursor nodes from previous
samples later.
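
The effect of the 'valid' flag can be sketched as a filter applied when
the saved nodes are appended later. This is an illustration under
simplified assumptions, not the perf code: struct saved_node and
append_valid() are hypothetical, and a node is reduced to its ip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a saved callchain cursor node. */
struct saved_node {
	uint64_t ip;
	bool valid;	/* false when the LBR was hidden, e.g. unresolved */
};

/*
 * Copy the ip of every valid saved node into 'out', skipping hidden
 * ones. Returns the number of entries written (at most out_cap).
 */
static size_t append_valid(const struct saved_node *saved, size_t nr_saved,
			   uint64_t *out, size_t out_cap)
{
	size_t i, n = 0;

	assert(out != NULL);
	for (i = 0; i < nr_saved && n < out_cap; i++) {
		if (!saved[i].valid)
			continue;
		out[n++] = saved[i].ip;
	}
	return n;
}
```

Keeping a slot for every LBR, hidden or not, preserves the mapping
between slot index and LBR position, while the flag prevents hidden
entries from reappearing in a stitched call stack.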

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Link: http://lore.kernel.org/lkml/20200319202517.23423-12-kan.liang@linux.intel.com
[ Use zfree() instead of open coded equivalent, and use it when freeing members of structs ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/callchain.h |  3 +-
 tools/perf/util/machine.c   | 76 ++++++++++++++++++++++++++++++++++--
 tools/perf/util/thread.h    |  8 ++++-
 3 files changed, 83 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/callchain.h b/tools/perf/util/callchain.h
index 706bb7b..cb33cd4 100644
--- a/tools/perf/util/callchain.h
+++ b/tools/perf/util/callchain.h
@@ -143,6 +143,9 @@ struct callchain_cursor_node {
 	u64				ip;
 	struct map_symbol		ms;
 	const char			*srcline;
+	/* Indicate valid cursor node for LBR stitch */
+	bool				valid;
+
 	bool				branch;
 	struct branch_flags		branch_flags;
 	u64				branch_from;
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index a54ca09..737dee7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2224,6 +2224,31 @@ static int lbr_callchain_add_kernel_ip(struct thread *thread,
 	return 0;
 }
 
+static void save_lbr_cursor_node(struct thread *thread,
+				 struct callchain_cursor *cursor,
+				 int idx)
+{
+	struct lbr_stitch *lbr_stitch = thread->lbr_stitch;
+
+	if (!lbr_stitch)
+		return;
+
+	if (cursor->pos == cursor->nr) {
+		lbr_stitch->prev_lbr_cursor[idx].valid = false;
+		return;
+	}
+
+	if (!cursor->curr)
+		cursor->curr = cursor->first;
+	else
+		cursor->curr = cursor->curr->next;
+	memcpy(&lbr_stitch->prev_lbr_cursor[idx], cursor->curr,
+	       sizeof(struct callchain_cursor_node));
+
+	lbr_stitch->prev_lbr_cursor[idx].valid = true;
+	cursor->pos++;
+}
+
 static int lbr_callchain_add_lbr_ip(struct thread *thread,
 				    struct callchain_cursor *cursor,
 				    struct perf_sample *sample,
@@ -2240,6 +2265,21 @@ static int lbr_callchain_add_lbr_ip(struct thread *thread,
 	int err, i;
 	u64 ip;
 
+	/*
+	 * The curr and pos are not used in writing session. They are cleared
+	 * in callchain_cursor_commit() when the writing session is closed.
+	 * Using curr and pos to track the current cursor node.
+	 */
+	if (thread->lbr_stitch) {
+		cursor->curr = NULL;
+		cursor->pos = cursor->nr;
+		if (cursor->nr) {
+			cursor->curr = cursor->first;
+			for (i = 0; i < (int)(cursor->nr - 1); i++)
+				cursor->curr = cursor->curr->next;
+		}
+	}
+
 	if (callee) {
 		/* Add LBR ip from first entries.to */
 		ip = entries[0].to;
@@ -2252,6 +2292,20 @@ static int lbr_callchain_add_lbr_ip(struct thread *thread,
 		if (err)
 			return err;
 
+		/*
+		 * The number of cursor node increases.
+		 * Move the current cursor node.
+		 * But does not need to save current cursor node for entry 0.
+		 * It's impossible to stitch the whole LBRs of previous sample.
+		 */
+		if (thread->lbr_stitch && (cursor->pos != cursor->nr)) {
+			if (!cursor->curr)
+				cursor->curr = cursor->first;
+			else
+				cursor->curr = cursor->curr->next;
+			cursor->pos++;
+		}
+
 		/* Add LBR ip from entries.from one by one. */
 		for (i = 0; i < lbr_nr; i++) {
 			ip = entries[i].from;
@@ -2262,6 +2316,7 @@ static int lbr_callchain_add_lbr_ip(struct thread *thread,
 					       *branch_from);
 			if (err)
 				return err;
+			save_lbr_cursor_node(thread, cursor, i);
 		}
 		return 0;
 	}
@@ -2276,6 +2331,7 @@ static int lbr_callchain_add_lbr_ip(struct thread *thread,
 				       *branch_from);
 		if (err)
 			return err;
+		save_lbr_cursor_node(thread, cursor, i);
 	}
 
 	/* Add LBR ip from first entries.to */
@@ -2292,7 +2348,7 @@ static int lbr_callchain_add_lbr_ip(struct thread *thread,
 	return 0;
 }
 
-static bool alloc_lbr_stitch(struct thread *thread)
+static bool alloc_lbr_stitch(struct thread *thread, unsigned int max_lbr)
 {
 	if (thread->lbr_stitch)
 		return true;
@@ -2301,6 +2357,14 @@ static bool alloc_lbr_stitch(struct thread *thread)
 	if (!thread->lbr_stitch)
 		goto err;
 
+	thread->lbr_stitch->prev_lbr_cursor = calloc(max_lbr + 1, sizeof(struct callchain_cursor_node));
+	if (!thread->lbr_stitch->prev_lbr_cursor)
+		goto free_lbr_stitch;
+
+	return true;
+
+free_lbr_stitch:
+	zfree(&thread->lbr_stitch);
 err:
 	pr_warning("Failed to allocate space for stitched LBRs. Disable LBR stitch\n");
 	thread->lbr_stitch_enable = false;
@@ -2319,7 +2383,8 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 					struct perf_sample *sample,
 					struct symbol **parent,
 					struct addr_location *root_al,
-					int max_stack)
+					int max_stack,
+					unsigned int max_lbr)
 {
 	struct ip_callchain *chain = sample->callchain;
 	int chain_nr = min(max_stack, (int)chain->nr), i;
@@ -2337,7 +2402,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 		return 0;
 
 	if (thread->lbr_stitch_enable && !sample->no_hw_idx &&
-	    alloc_lbr_stitch(thread)) {
+	    (max_lbr > 0) && alloc_lbr_stitch(thread, max_lbr)) {
 		lbr_stitch = thread->lbr_stitch;
 
 		memcpy(&lbr_stitch->prev_sample, sample, sizeof(*sample));
@@ -2417,8 +2482,11 @@ static int thread__resolve_callchain_sample(struct thread *thread,
 		chain_nr = chain->nr;
 
 	if (perf_evsel__has_branch_callstack(evsel)) {
+		struct perf_env *env = perf_evsel__env(evsel);
+
 		err = resolve_lbr_callchain_sample(thread, cursor, sample, parent,
-						   root_al, max_stack);
+						   root_al, max_stack,
+						   !env ? 0 : env->max_branches);
 		if (err)
 			return (err < 0) ? err : 0;
 	}
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 34eb61c..8456174 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -15,6 +15,7 @@
 #include <intlist.h>
 #include "rwsem.h"
 #include "event.h"
+#include "callchain.h"
 
 struct addr_location;
 struct map;
@@ -24,6 +25,7 @@ struct unwind_libunwind_ops;
 
 struct lbr_stitch {
 	struct perf_sample		prev_sample;
+	struct callchain_cursor_node	*prev_lbr_cursor;
 };
 
 struct thread {
@@ -154,6 +156,12 @@ static inline bool thread__is_filtered(struct thread *thread)
 
 static inline void thread__free_stitch_list(struct thread *thread)
 {
+	struct lbr_stitch *lbr_stitch = thread->lbr_stitch;
+
+	if (!lbr_stitch)
+		return;
+
+	zfree(&lbr_stitch->prev_lbr_cursor);
 	zfree(&thread->lbr_stitch);
 }
 

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [tip: perf/core] perf thread: Save previous sample for LBR stitching approach
  2020-03-19 20:25 ` [PATCH V4 10/17] perf tools: Save previous sample for LBR stitching approach kan.liang
  2020-04-17 15:02   ` Arnaldo Carvalho de Melo
@ 2020-04-22 12:17   ` tip-bot2 for Kan Liang
  1 sibling, 0 replies; 46+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-04-22 12:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kan Liang, Andi Kleen, Jiri Olsa, Adrian Hunter, Alexey Budankov,
	Mathieu Poirier, Michael Ellerman, Namhyung Kim, Pavel Gerasimov,
	Peter Zijlstra, Ravi Bangoria, Stephane Eranian,
	Vitaly Slobodskoy, Arnaldo Carvalho de Melo, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     9c6c3f471d85a9b0bcda3ce6fc1e2646685e3f60
Gitweb:        https://git.kernel.org/tip/9c6c3f471d85a9b0bcda3ce6fc1e2646685e3f60
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 19 Mar 2020 13:25:10 -07:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Sat, 18 Apr 2020 09:05:01 -03:00

perf thread: Save previous sample for LBR stitching approach

To retrieve the overwritten LBRs from the previous sample for the LBR
stitching approach, perf has to save the previous sample.

Allocate the struct lbr_stitch only once, when the LBR stitching approach
is enabled and the kernel supports hw_idx.
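
The allocate-once behaviour can be sketched as follows. This is a minimal
illustration under stated assumptions, not the code from the patch:
struct sample, struct stitch and struct thr are hypothetical stand-ins
for struct perf_sample, struct lbr_stitch and struct thread, and calloc()
and free() are used in place of zalloc() and zfree(). The sketch returns
true explicitly on the success path.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct perf_sample. */
struct sample {
	unsigned long long ip;
	int nr;
};

/* Hypothetical stand-in for struct lbr_stitch. */
struct stitch {
	struct sample prev_sample;
};

/* Hypothetical stand-in for the relevant fields of struct thread. */
struct thr {
	bool stitch_enable;
	struct stitch *stitch;
};

/* Allocate the stitch state on first use; disable stitching on failure. */
static bool alloc_stitch(struct thr *t)
{
	if (t->stitch)
		return true;

	t->stitch = calloc(1, sizeof(*t->stitch));
	if (!t->stitch) {
		t->stitch_enable = false;
		return false;
	}
	return true;
}

/* Save the sample only if stitching is on and the sample carries hw_idx. */
static void remember_sample(struct thr *t, const struct sample *s,
			    bool no_hw_idx)
{
	if (t->stitch_enable && !no_hw_idx && alloc_stitch(t))
		memcpy(&t->stitch->prev_sample, s, sizeof(*s));
}

/* Release the stitch state and clear the pointer. */
static void free_stitch(struct thr *t)
{
	free(t->stitch);
	t->stitch = NULL;
}
```

Because alloc_stitch() is the last operand of the && chain, threads that
never enable stitching, or whose samples lack hw_idx, never pay for the
allocation.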

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Link: http://lore.kernel.org/lkml/20200319202517.23423-11-kan.liang@linux.intel.com
[ Use zalloc()/zfree() for thread->lbr_stitch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/machine.c | 23 +++++++++++++++++++++++
 tools/perf/util/thread.c  |  1 +
 tools/perf/util/thread.h  | 12 ++++++++++++
 3 files changed, 36 insertions(+)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index f9d69fc..a54ca09 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2292,6 +2292,21 @@ static int lbr_callchain_add_lbr_ip(struct thread *thread,
 	return 0;
 }
 
+static bool alloc_lbr_stitch(struct thread *thread)
+{
+	if (thread->lbr_stitch)
+		return true;
+
+	thread->lbr_stitch = zalloc(sizeof(*thread->lbr_stitch));
+	if (!thread->lbr_stitch)
+		goto err;
+
+err:
+	pr_warning("Failed to allocate space for stitched LBRs. Disable LBR stitch\n");
+	thread->lbr_stitch_enable = false;
+	return false;
+}
+
 /*
  * Recolve LBR callstack chain sample
  * Return:
@@ -2308,6 +2323,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 {
 	struct ip_callchain *chain = sample->callchain;
 	int chain_nr = min(max_stack, (int)chain->nr), i;
+	struct lbr_stitch *lbr_stitch;
 	u64 branch_from = 0;
 	int err;
 
@@ -2320,6 +2336,13 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 	if (i == chain_nr)
 		return 0;
 
+	if (thread->lbr_stitch_enable && !sample->no_hw_idx &&
+	    alloc_lbr_stitch(thread)) {
+		lbr_stitch = thread->lbr_stitch;
+
+		memcpy(&lbr_stitch->prev_sample, sample, sizeof(*sample));
+	}
+
 	if (callchain_param.order == ORDER_CALLEE) {
 		/* Add kernel ip */
 		err = lbr_callchain_add_kernel_ip(thread, cursor, sample,
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 1f080db..8d0da26 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -111,6 +111,7 @@ void thread__delete(struct thread *thread)
 
 	exit_rwsem(&thread->namespaces_lock);
 	exit_rwsem(&thread->comm_lock);
+	thread__free_stitch_list(thread);
 	free(thread);
 }
 
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 9529405..34eb61c 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -5,6 +5,7 @@
 #include <linux/refcount.h>
 #include <linux/rbtree.h>
 #include <linux/list.h>
+#include <linux/zalloc.h>
 #include <stdio.h>
 #include <unistd.h>
 #include <sys/types.h>
@@ -13,6 +14,7 @@
 #include <strlist.h>
 #include <intlist.h>
 #include "rwsem.h"
+#include "event.h"
 
 struct addr_location;
 struct map;
@@ -20,6 +22,10 @@ struct perf_record_namespaces;
 struct thread_stack;
 struct unwind_libunwind_ops;
 
+struct lbr_stitch {
+	struct perf_sample		prev_sample;
+};
+
 struct thread {
 	union {
 		struct rb_node	 rb_node;
@@ -49,6 +55,7 @@ struct thread {
 
 	/* LBR call stack stitch */
 	bool			lbr_stitch_enable;
+	struct lbr_stitch	*lbr_stitch;
 };
 
 struct machine;
@@ -145,4 +152,9 @@ static inline bool thread__is_filtered(struct thread *thread)
 	return false;
 }
 
+static inline void thread__free_stitch_list(struct thread *thread)
+{
+	zfree(&thread->lbr_stitch);
+}
+
 #endif	/* __PERF_THREAD_H */

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [tip: perf/core] perf thread: Add a knob for LBR stitch approach
  2020-03-19 20:25 ` [PATCH V4 09/17] perf thread: Add a knob for LBR stitch approach kan.liang
@ 2020-04-22 12:17   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 46+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-04-22 12:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kan Liang, Andi Kleen, Jiri Olsa, Adrian Hunter, Alexey Budankov,
	Mathieu Poirier, Michael Ellerman, Namhyung Kim, Pavel Gerasimov,
	Peter Zijlstra, Ravi Bangoria, Stephane Eranian,
	Vitaly Slobodskoy, Arnaldo Carvalho de Melo, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     771fd155dfaa5332da69d606db16fe27bd9d388d
Gitweb:        https://git.kernel.org/tip/771fd155dfaa5332da69d606db16fe27bd9d388d
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 19 Mar 2020 13:25:09 -07:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Sat, 18 Apr 2020 09:05:01 -03:00

perf thread: Add a knob for LBR stitch approach

The LBR stitch approach should be disabled by default, because:

- The stitching approach is based on LBR call stack technology. The known
  limitations of LBR call stack technology still apply to the approach,
  e.g. exception handling such as setjmp/longjmp will leave calls and
  returns unmatched.

- This approach is not foolproof. There can be cases where it creates
  incorrect call stacks from incorrect matches. There is no attempt to
  validate any matches in another way.

The 'lbr_stitch_enable' flag indicates whether the LBR stitch approach is
enabled; it is disabled by default. A following patch will introduce a
new option for each tool to enable the LBR stitch approach.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Link: http://lore.kernel.org/lkml/20200319202517.23423-10-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/thread.c | 1 +
 tools/perf/util/thread.h | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 28b7193..1f080db 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -47,6 +47,7 @@ struct thread *thread__new(pid_t pid, pid_t tid)
 		thread->tid = tid;
 		thread->ppid = -1;
 		thread->cpu = -1;
+		thread->lbr_stitch_enable = false;
 		INIT_LIST_HEAD(&thread->namespaces_list);
 		INIT_LIST_HEAD(&thread->comm_list);
 		init_rwsem(&thread->namespaces_lock);
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 20b96b5..9529405 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -46,6 +46,9 @@ struct thread {
 	struct srccode_state	srccode_state;
 	bool			filter;
 	int			filter_entry_depth;
+
+	/* LBR call stack stitch */
+	bool			lbr_stitch_enable;
 };
 
 struct machine;

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [tip: perf/core] perf machine: Factor out lbr_callchain_add_lbr_ip()
  2020-03-19 20:25 ` [PATCH V4 08/17] perf machine: Factor out lbr_callchain_add_lbr_ip() kan.liang
@ 2020-04-22 12:17   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 46+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-04-22 12:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kan Liang, Andi Kleen, Jiri Olsa, Adrian Hunter, Alexey Budankov,
	Mathieu Poirier, Michael Ellerman, Namhyung Kim, Pavel Gerasimov,
	Peter Zijlstra, Ravi Bangoria, Stephane Eranian,
	Vitaly Slobodskoy, Arnaldo Carvalho de Melo, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     e2b23483eb1d851b4c48935a995f79b2de41c3ed
Gitweb:        https://git.kernel.org/tip/e2b23483eb1d851b4c48935a995f79b2de41c3ed
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 19 Mar 2020 13:25:08 -07:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Sat, 18 Apr 2020 09:05:01 -03:00

perf machine: Factor out lbr_callchain_add_lbr_ip()

Both caller and callee need to add the ip from the LBR to the callchain.
Factor out lbr_callchain_add_lbr_ip() to improve code readability.
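
The two orderings handled by the factored-out function can be sketched on
plain arrays. This is a minimal illustration, not the code from the
patch: struct branch is a hypothetical reduction of struct branch_entry,
lbr_order() is a made-up name, and the nr == 0 guard is an assumption
added only for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduction of struct branch_entry; entry 0 is the newest. */
struct branch {
	uint64_t from;
	uint64_t to;
};

/*
 * Write the addresses for one LBR stack into out, which must have room
 * for nr + 1 values. Returns the number of addresses written.
 */
static size_t lbr_order(const struct branch *e, size_t nr, bool callee,
			uint64_t *out)
{
	size_t n = 0, i;

	if (nr == 0)	/* guard added for the sketch only */
		return 0;

	if (callee) {
		/* Newest first: entries[0].to, then every .from in order. */
		out[n++] = e[0].to;
		for (i = 0; i < nr; i++)
			out[n++] = e[i].from;
		return n;
	}

	/* Oldest first: every .from in reverse, then entries[0].to. */
	for (i = nr; i-- > 0; )
		out[n++] = e[i].from;
	out[n++] = e[0].to;
	return n;
}
```

The caller order is the callee order reversed, which is why a single
helper with a bool argument can serve both paths.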

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Link: http://lore.kernel.org/lkml/20200319202517.23423-9-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/machine.c | 143 ++++++++++++++++++-------------------
 1 file changed, 73 insertions(+), 70 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index a7f75fd..f9d69fc 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2224,6 +2224,74 @@ static int lbr_callchain_add_kernel_ip(struct thread *thread,
 	return 0;
 }
 
+static int lbr_callchain_add_lbr_ip(struct thread *thread,
+				    struct callchain_cursor *cursor,
+				    struct perf_sample *sample,
+				    struct symbol **parent,
+				    struct addr_location *root_al,
+				    u64 *branch_from,
+				    bool callee)
+{
+	struct branch_stack *lbr_stack = sample->branch_stack;
+	struct branch_entry *entries = perf_sample__branch_entries(sample);
+	u8 cpumode = PERF_RECORD_MISC_USER;
+	int lbr_nr = lbr_stack->nr;
+	struct branch_flags *flags;
+	int err, i;
+	u64 ip;
+
+	if (callee) {
+		/* Add LBR ip from first entries.to */
+		ip = entries[0].to;
+		flags = &entries[0].flags;
+		*branch_from = entries[0].from;
+		err = add_callchain_ip(thread, cursor, parent,
+				       root_al, &cpumode, ip,
+				       true, flags, NULL,
+				       *branch_from);
+		if (err)
+			return err;
+
+		/* Add LBR ip from entries.from one by one. */
+		for (i = 0; i < lbr_nr; i++) {
+			ip = entries[i].from;
+			flags = &entries[i].flags;
+			err = add_callchain_ip(thread, cursor, parent,
+					       root_al, &cpumode, ip,
+					       true, flags, NULL,
+					       *branch_from);
+			if (err)
+				return err;
+		}
+		return 0;
+	}
+
+	/* Add LBR ip from entries.from one by one. */
+	for (i = lbr_nr - 1; i >= 0; i--) {
+		ip = entries[i].from;
+		flags = &entries[i].flags;
+		err = add_callchain_ip(thread, cursor, parent,
+				       root_al, &cpumode, ip,
+				       true, flags, NULL,
+				       *branch_from);
+		if (err)
+			return err;
+	}
+
+	/* Add LBR ip from first entries.to */
+	ip = entries[0].to;
+	flags = &entries[0].flags;
+	*branch_from = entries[0].from;
+	err = add_callchain_ip(thread, cursor, parent,
+			       root_al, &cpumode, ip,
+			       true, flags, NULL,
+			       *branch_from);
+	if (err)
+		return err;
+
+	return 0;
+}
+
 /*
  * Recolve LBR callstack chain sample
  * Return:
@@ -2240,14 +2308,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 {
 	struct ip_callchain *chain = sample->callchain;
 	int chain_nr = min(max_stack, (int)chain->nr), i;
-	u8 cpumode = PERF_RECORD_MISC_USER;
-	u64 ip, branch_from = 0;
-	struct branch_stack *lbr_stack;
-	struct branch_entry *entries;
-	int lbr_nr, j, k;
-	bool branch;
-	struct branch_flags *flags;
-	int mix_chain_nr;
+	u64 branch_from = 0;
 	int err;
 
 	for (i = 0; i < chain_nr; i++) {
@@ -2259,21 +2320,6 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 	if (i == chain_nr)
 		return 0;
 
-	lbr_stack = sample->branch_stack;
-	entries = perf_sample__branch_entries(sample);
-	lbr_nr = lbr_stack->nr;
-	/*
-	 * LBR callstack can only get user call chain.
-	 * The mix_chain_nr is kernel call chain
-	 * number plus LBR user call chain number.
-	 * i is kernel call chain number,
-	 * 1 is PERF_CONTEXT_USER,
-	 * lbr_nr + 1 is the user call chain number.
-	 * For details, please refer to the comments
-	 * in callchain__printf
-	 */
-	mix_chain_nr = i + 1 + lbr_nr + 1;
-
 	if (callchain_param.order == ORDER_CALLEE) {
 		/* Add kernel ip */
 		err = lbr_callchain_add_kernel_ip(thread, cursor, sample,
@@ -2282,57 +2328,14 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 		if (err)
 			goto error;
 
-		/* Add LBR ip from first entries.to */
-		ip = entries[0].to;
-		branch = true;
-		flags = &entries[0].flags;
-		branch_from = entries[0].from;
-		err = add_callchain_ip(thread, cursor, parent,
-				       root_al, &cpumode, ip,
-				       branch, flags, NULL,
-				       branch_from);
+		err = lbr_callchain_add_lbr_ip(thread, cursor, sample, parent,
+					       root_al, &branch_from, true);
 		if (err)
 			goto error;
 
-		/* Add LBR ip from entries.from one by one. */
-		for (j = i + 2; j < mix_chain_nr; j++) {
-			k = j - i - 2;
-			ip = entries[k].from;
-			branch = true;
-			flags = &entries[k].flags;
-
-			err = add_callchain_ip(thread, cursor, parent,
-					       root_al, &cpumode, ip,
-					       branch, flags, NULL,
-					       branch_from);
-			if (err)
-				goto error;
-		}
 	} else {
-		/* Add LBR ip from entries.from one by one. */
-		for (j = 0; j < lbr_nr; j++) {
-			k = lbr_nr - j - 1;
-			ip = entries[k].from;
-			branch = true;
-			flags = &entries[k].flags;
-
-			err = add_callchain_ip(thread, cursor, parent,
-					       root_al, &cpumode, ip,
-					       branch, flags, NULL,
-					       branch_from);
-			if (err)
-				goto error;
-		}
-
-		/* Add LBR ip from first entries.to */
-		ip = entries[0].to;
-		branch = true;
-		flags = &entries[0].flags;
-		branch_from = entries[0].from;
-		err = add_callchain_ip(thread, cursor, parent,
-				       root_al, &cpumode, ip,
-				       branch, flags, NULL,
-				       branch_from);
+		err = lbr_callchain_add_lbr_ip(thread, cursor, sample, parent,
+					       root_al, &branch_from, false);
 		if (err)
 			goto error;
 

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [tip: perf/core] perf machine: Factor out lbr_callchain_add_kernel_ip()
  2020-03-19 20:25 ` [PATCH V4 07/17] perf machine: Factor out lbr_callchain_add_kernel_ip() kan.liang
@ 2020-04-22 12:17   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 46+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-04-22 12:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kan Liang, Andi Kleen, Jiri Olsa, Adrian Hunter, Alexey Budankov,
	Mathieu Poirier, Michael Ellerman, Namhyung Kim, Pavel Gerasimov,
	Peter Zijlstra, Ravi Bangoria, Stephane Eranian,
	Vitaly Slobodskoy, Arnaldo Carvalho de Melo, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     dd3e249a0c0ad88098922803b149c788bb364c23
Gitweb:        https://git.kernel.org/tip/dd3e249a0c0ad88098922803b149c788bb364c23
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 19 Mar 2020 13:25:07 -07:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Sat, 18 Apr 2020 09:05:00 -03:00

perf machine: Factor out lbr_callchain_add_kernel_ip()

Both caller and callee need to add the kernel ip to the callchain. Factor out
lbr_callchain_add_kernel_ip() to improve code readability.
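
The kernel part of the chain is walked up to and including the
user-context marker, forwards for callee order and backwards for caller
order. A minimal sketch of that walk, not the code from the patch:
CTX_USER is a stand-in whose value is assumed to mirror
PERF_CONTEXT_USER, and find_user_marker() and kernel_order() are made-up
names. The sketch copies the marker like any other value and does not
model how add_callchain_ip() treats context markers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in marker; the value is assumed to mirror PERF_CONTEXT_USER. */
#define CTX_USER ((uint64_t)-512)

/* Return the index of the user-context marker, or nr if there is none. */
static int find_user_marker(const uint64_t *ips, int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		if (ips[i] == CTX_USER)
			break;
	return i;
}

/*
 * Copy ips[0..end] into out, forwards for callee order and backwards
 * for caller order. out must have room for end + 1 values.
 */
static size_t kernel_order(const uint64_t *ips, int end, bool callee,
			   uint64_t *out)
{
	size_t n = 0;
	int i;

	if (callee) {
		for (i = 0; i <= end; i++)
			out[n++] = ips[i];
		return n;
	}

	for (i = end; i >= 0; i--)
		out[n++] = ips[i];
	return n;
}
```

When find_user_marker() returns nr, the chain has no user part, which
corresponds to the early "return 0" in resolve_lbr_callchain_sample().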

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Link: http://lore.kernel.org/lkml/20200319202517.23423-8-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/machine.c | 67 +++++++++++++++++++++++++-------------
 1 file changed, 45 insertions(+), 22 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 0da540e..a7f75fd 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2190,6 +2190,40 @@ static int remove_loops(struct branch_entry *l, int nr,
 	return nr;
 }
 
+static int lbr_callchain_add_kernel_ip(struct thread *thread,
+				       struct callchain_cursor *cursor,
+				       struct perf_sample *sample,
+				       struct symbol **parent,
+				       struct addr_location *root_al,
+				       u64 branch_from,
+				       bool callee, int end)
+{
+	struct ip_callchain *chain = sample->callchain;
+	u8 cpumode = PERF_RECORD_MISC_USER;
+	int err, i;
+
+	if (callee) {
+		for (i = 0; i < end + 1; i++) {
+			err = add_callchain_ip(thread, cursor, parent,
+					       root_al, &cpumode, chain->ips[i],
+					       false, NULL, NULL, branch_from);
+			if (err)
+				return err;
+		}
+		return 0;
+	}
+
+	for (i = end; i >= 0; i--) {
+		err = add_callchain_ip(thread, cursor, parent,
+				       root_al, &cpumode, chain->ips[i],
+				       false, NULL, NULL, branch_from);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 /*
  * Recolve LBR callstack chain sample
  * Return:
@@ -2242,17 +2276,12 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 
 	if (callchain_param.order == ORDER_CALLEE) {
 		/* Add kernel ip */
-		for (j = 0; j < i + 1; j++) {
-			ip = chain->ips[j];
-			branch = false;
-			flags = NULL;
-			err = add_callchain_ip(thread, cursor, parent,
-					       root_al, &cpumode, ip,
-					       branch, flags, NULL,
-					       branch_from);
-			if (err)
-				goto error;
-		}
+		err = lbr_callchain_add_kernel_ip(thread, cursor, sample,
+						  parent, root_al, branch_from,
+						  true, i);
+		if (err)
+			goto error;
+
 		/* Add LBR ip from first entries.to */
 		ip = entries[0].to;
 		branch = true;
@@ -2308,17 +2337,11 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 			goto error;
 
 		/* Add kernel ip */
-		for (j = lbr_nr + 1; j < mix_chain_nr; j++) {
-			ip = chain->ips[i + 1 - (j - lbr_nr)];
-			branch = false;
-			flags = NULL;
-			err = add_callchain_ip(thread, cursor, parent,
-					       root_al, &cpumode, ip,
-					       branch, flags, NULL,
-					       branch_from);
-			if (err)
-				goto error;
-		}
+		err = lbr_callchain_add_kernel_ip(thread, cursor, sample,
+						  parent, root_al, branch_from,
+						  false, i);
+		if (err)
+			goto error;
 	}
 	return 1;
 

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [tip: perf/core] perf machine: Refine the function for LBR call stack reconstruction
  2020-03-19 20:25 ` [PATCH V4 06/17] perf machine: Refine the function for LBR call stack reconstruction kan.liang
@ 2020-04-22 12:17   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 46+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-04-22 12:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kan Liang, Andi Kleen, Jiri Olsa, Adrian Hunter, Alexey Budankov,
	Mathieu Poirier, Michael Ellerman, Namhyung Kim, Pavel Gerasimov,
	Peter Zijlstra, Ravi Bangoria, Stephane Eranian,
	Vitaly Slobodskoy, Arnaldo Carvalho de Melo, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     e48b8311ca4538ec716196a1625812b045999f21
Gitweb:        https://git.kernel.org/tip/e48b8311ca4538ec716196a1625812b045999f21
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 19 Mar 2020 13:25:06 -07:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Sat, 18 Apr 2020 09:05:00 -03:00

perf machine: Refine the function for LBR call stack reconstruction

LBR only collects the user call stack. To reconstruct a call stack, both
the kernel call stack and the user call stack are required. The function
resolve_lbr_callchain_sample() mixes the kernel call stack and the user
call stack.

Now, with the help of HW idx, the perf tool can reconstruct a more
complete call stack by adding some user call stack from the previous
sample. However, the current implementation is hard to extend to support
it.

Current code path for resolve_lbr_callchain_sample():

  for (j = 0; j < mix_chain_nr; j++) {
       if (ORDER_CALLEE) {
             if (kernel callchain)
                  Fill callchain info
             else if (LBR callchain)
                  Fill callchain info
       } else {
             if (LBR callchain)
                  Fill callchain info
             else if (kernel callchain)
                  Fill callchain info
       }
       add_callchain_ip();
  }

With the patch,

  if (ORDER_CALLEE) {
       for (j = 0; j < NUM of kernel callchain) {
             Fill callchain info
             add_callchain_ip();
       }
       for (; j < mix_chain_nr) {
             Fill callchain info
             add_callchain_ip();
       }
  } else {
       for (j = 0; j < NUM of LBR callchain) {
             Fill callchain info
             add_callchain_ip();
       }
       for (; j < mix_chain_nr) {
             Fill callchain info
             add_callchain_ip();
       }
  }

No functional changes.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Link: http://lore.kernel.org/lkml/20200319202517.23423-7-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/machine.c | 111 +++++++++++++++++++++++++------------
 1 file changed, 76 insertions(+), 35 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index be1bd92..0da540e 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2214,6 +2214,7 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 	bool branch;
 	struct branch_flags *flags;
 	int mix_chain_nr;
+	int err;
 
 	for (i = 0; i < chain_nr; i++) {
 		if (chain->ips[i] == PERF_CONTEXT_USER)
@@ -2239,50 +2240,90 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 	 */
 	mix_chain_nr = i + 1 + lbr_nr + 1;
 
-	for (j = 0; j < mix_chain_nr; j++) {
-		int err;
-
-		branch = false;
-		flags = NULL;
-
-		if (callchain_param.order == ORDER_CALLEE) {
-			if (j < i + 1)
-				ip = chain->ips[j];
-			else if (j > i + 1) {
-				k = j - i - 2;
-				ip = entries[k].from;
-				branch = true;
-				flags = &entries[k].flags;
-			} else {
-				ip = entries[0].to;
-				branch = true;
-				flags = &entries[0].flags;
-				branch_from = entries[0].from;
-			}
-		} else {
-			if (j < lbr_nr) {
-				k = lbr_nr - j - 1;
-				ip = entries[k].from;
-				branch = true;
-				flags = &entries[k].flags;
-			} else if (j > lbr_nr)
-				ip = chain->ips[i + 1 - (j - lbr_nr)];
-			else {
-				ip = entries[0].to;
-				branch = true;
-				flags = &entries[0].flags;
-				branch_from = entries[0].from;
-			}
+	if (callchain_param.order == ORDER_CALLEE) {
+		/* Add kernel ip */
+		for (j = 0; j < i + 1; j++) {
+			ip = chain->ips[j];
+			branch = false;
+			flags = NULL;
+			err = add_callchain_ip(thread, cursor, parent,
+					       root_al, &cpumode, ip,
+					       branch, flags, NULL,
+					       branch_from);
+			if (err)
+				goto error;
 		}
+		/* Add LBR ip from first entries.to */
+		ip = entries[0].to;
+		branch = true;
+		flags = &entries[0].flags;
+		branch_from = entries[0].from;
+		err = add_callchain_ip(thread, cursor, parent,
+				       root_al, &cpumode, ip,
+				       branch, flags, NULL,
+				       branch_from);
+		if (err)
+			goto error;
 
+		/* Add LBR ip from entries.from one by one. */
+		for (j = i + 2; j < mix_chain_nr; j++) {
+			k = j - i - 2;
+			ip = entries[k].from;
+			branch = true;
+			flags = &entries[k].flags;
+
+			err = add_callchain_ip(thread, cursor, parent,
+					       root_al, &cpumode, ip,
+					       branch, flags, NULL,
+					       branch_from);
+			if (err)
+				goto error;
+		}
+	} else {
+		/* Add LBR ip from entries.from one by one. */
+		for (j = 0; j < lbr_nr; j++) {
+			k = lbr_nr - j - 1;
+			ip = entries[k].from;
+			branch = true;
+			flags = &entries[k].flags;
+
+			err = add_callchain_ip(thread, cursor, parent,
+					       root_al, &cpumode, ip,
+					       branch, flags, NULL,
+					       branch_from);
+			if (err)
+				goto error;
+		}
+
+		/* Add LBR ip from first entries.to */
+		ip = entries[0].to;
+		branch = true;
+		flags = &entries[0].flags;
+		branch_from = entries[0].from;
 		err = add_callchain_ip(thread, cursor, parent,
 				       root_al, &cpumode, ip,
 				       branch, flags, NULL,
 				       branch_from);
 		if (err)
-			return (err < 0) ? err : 0;
+			goto error;
+
+		/* Add kernel ip */
+		for (j = lbr_nr + 1; j < mix_chain_nr; j++) {
+			ip = chain->ips[i + 1 - (j - lbr_nr)];
+			branch = false;
+			flags = NULL;
+			err = add_callchain_ip(thread, cursor, parent,
+					       root_al, &cpumode, ip,
+					       branch, flags, NULL,
+					       branch_from);
+			if (err)
+				goto error;
+		}
 	}
 	return 1;
+
+error:
+	return (err < 0) ? err : 0;
 }
 
 static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,


* [tip: perf/core] perf machine: Remove the indent in resolve_lbr_callchain_sample
  2020-03-19 20:25 ` [PATCH V4 05/17] perf machine: Remove the indent in resolve_lbr_callchain_sample kan.liang
@ 2020-04-22 12:17   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 46+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-04-22 12:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kan Liang, Andi Kleen, Jiri Olsa, Adrian Hunter, Alexey Budankov,
	Mathieu Poirier, Michael Ellerman, Namhyung Kim, Pavel Gerasimov,
	Peter Zijlstra, Ravi Bangoria, Stephane Eranian,
	Vitaly Slobodskoy, Arnaldo Carvalho de Melo, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     f8603267bf8589f2a6a3e0a7de0a8dc6b6bd3c7d
Gitweb:        https://git.kernel.org/tip/f8603267bf8589f2a6a3e0a7de0a8dc6b6bd3c7d
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 19 Mar 2020 13:25:05 -07:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Sat, 18 Apr 2020 09:05:00 -03:00

perf machine: Remove the indent in resolve_lbr_callchain_sample

The extra indentation level in resolve_lbr_callchain_sample() is
unnecessary.  Removing it makes the following patch simpler.

Current code path for resolve_lbr_callchain_sample():

        /* LBR only affects the user callchain */
        if (i != chain_nr) {
                body of the function
                ....
                return 1;
        }

        return 0;

With the patch,

        /* LBR only affects the user callchain */
        if (i == chain_nr)
                return 0;

        body of the function
        ...
        return 1;

No functional changes.
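
As a rough illustration of why the two shapes are equivalent, here is a
toy sketch. It is not the perf code: MARKER, find_marker(),
resolve_nested() and resolve_early_return() are hypothetical names, and
the "body of the function" is reduced to a comment.

```c
#include <assert.h>
#include <stddef.h>

#define MARKER (-1)	/* hypothetical stand-in for PERF_CONTEXT_USER */

/* Return the index of the first MARKER, or nr if there is none. */
static size_t find_marker(const int *ips, size_t nr)
{
	size_t i;

	assert(ips != NULL);

	for (i = 0; i < nr; i++) {
		if (ips[i] == MARKER)
			break;
	}
	return i;
}

/* Old shape: the whole body is nested under the condition. */
static int resolve_nested(const int *ips, size_t nr)
{
	size_t i = find_marker(ips, nr);

	if (i != nr) {
		/* body of the function */
		return 1;
	}

	return 0;
}

/* New shape: bail out early, the body loses one indentation level. */
static int resolve_early_return(const int *ips, size_t nr)
{
	size_t i = find_marker(ips, nr);

	if (i == nr)
		return 0;

	/* body of the function */
	return 1;
}
```

Both functions return 0 when no marker is found and 1 otherwise, so
callers cannot tell them apart.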

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Link: http://lore.kernel.org/lkml/20200319202517.23423-6-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/machine.c | 123 ++++++++++++++++++-------------------
 1 file changed, 63 insertions(+), 60 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 09845ea..be1bd92 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2208,6 +2208,12 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 	int chain_nr = min(max_stack, (int)chain->nr), i;
 	u8 cpumode = PERF_RECORD_MISC_USER;
 	u64 ip, branch_from = 0;
+	struct branch_stack *lbr_stack;
+	struct branch_entry *entries;
+	int lbr_nr, j, k;
+	bool branch;
+	struct branch_flags *flags;
+	int mix_chain_nr;
 
 	for (i = 0; i < chain_nr; i++) {
 		if (chain->ips[i] == PERF_CONTEXT_USER)
@@ -2215,71 +2221,68 @@ static int resolve_lbr_callchain_sample(struct thread *thread,
 	}
 
 	/* LBR only affects the user callchain */
-	if (i != chain_nr) {
-		struct branch_stack *lbr_stack = sample->branch_stack;
-		struct branch_entry *entries = perf_sample__branch_entries(sample);
-		int lbr_nr = lbr_stack->nr, j, k;
-		bool branch;
-		struct branch_flags *flags;
-		/*
-		 * LBR callstack can only get user call chain.
-		 * The mix_chain_nr is kernel call chain
-		 * number plus LBR user call chain number.
-		 * i is kernel call chain number,
-		 * 1 is PERF_CONTEXT_USER,
-		 * lbr_nr + 1 is the user call chain number.
-		 * For details, please refer to the comments
-		 * in callchain__printf
-		 */
-		int mix_chain_nr = i + 1 + lbr_nr + 1;
-
-		for (j = 0; j < mix_chain_nr; j++) {
-			int err;
-			branch = false;
-			flags = NULL;
+	if (i == chain_nr)
+		return 0;
 
-			if (callchain_param.order == ORDER_CALLEE) {
-				if (j < i + 1)
-					ip = chain->ips[j];
-				else if (j > i + 1) {
-					k = j - i - 2;
-					ip = entries[k].from;
-					branch = true;
-					flags = &entries[k].flags;
-				} else {
-					ip = entries[0].to;
-					branch = true;
-					flags = &entries[0].flags;
-					branch_from = entries[0].from;
-				}
+	lbr_stack = sample->branch_stack;
+	entries = perf_sample__branch_entries(sample);
+	lbr_nr = lbr_stack->nr;
+	/*
+	 * LBR callstack can only get user call chain.
+	 * The mix_chain_nr is kernel call chain
+	 * number plus LBR user call chain number.
+	 * i is kernel call chain number,
+	 * 1 is PERF_CONTEXT_USER,
+	 * lbr_nr + 1 is the user call chain number.
+	 * For details, please refer to the comments
+	 * in callchain__printf
+	 */
+	mix_chain_nr = i + 1 + lbr_nr + 1;
+
+	for (j = 0; j < mix_chain_nr; j++) {
+		int err;
+
+		branch = false;
+		flags = NULL;
+
+		if (callchain_param.order == ORDER_CALLEE) {
+			if (j < i + 1)
+				ip = chain->ips[j];
+			else if (j > i + 1) {
+				k = j - i - 2;
+				ip = entries[k].from;
+				branch = true;
+				flags = &entries[k].flags;
 			} else {
-				if (j < lbr_nr) {
-					k = lbr_nr - j - 1;
-					ip = entries[k].from;
-					branch = true;
-					flags = &entries[k].flags;
-				}
-				else if (j > lbr_nr)
-					ip = chain->ips[i + 1 - (j - lbr_nr)];
-				else {
-					ip = entries[0].to;
-					branch = true;
-					flags = &entries[0].flags;
-					branch_from = entries[0].from;
-				}
+				ip = entries[0].to;
+				branch = true;
+				flags = &entries[0].flags;
+				branch_from = entries[0].from;
+			}
+		} else {
+			if (j < lbr_nr) {
+				k = lbr_nr - j - 1;
+				ip = entries[k].from;
+				branch = true;
+				flags = &entries[k].flags;
+			} else if (j > lbr_nr)
+				ip = chain->ips[i + 1 - (j - lbr_nr)];
+			else {
+				ip = entries[0].to;
+				branch = true;
+				flags = &entries[0].flags;
+				branch_from = entries[0].from;
 			}
-
-			err = add_callchain_ip(thread, cursor, parent,
-					       root_al, &cpumode, ip,
-					       branch, flags, NULL,
-					       branch_from);
-			if (err)
-				return (err < 0) ? err : 0;
 		}
-		return 1;
-	}
 
-	return 0;
+		err = add_callchain_ip(thread, cursor, parent,
+				       root_al, &cpumode, ip,
+				       branch, flags, NULL,
+				       branch_from);
+		if (err)
+			return (err < 0) ? err : 0;
+	}
+	return 1;
 }
 
 static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,


* [tip: perf/core] perf header: Support CPU PMU capabilities
  2020-03-19 20:25 ` [PATCH V4 02/17] perf header: Support CPU " kan.liang
@ 2020-04-22 12:17   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 46+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-04-22 12:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kan Liang, Andi Kleen, Jiri Olsa, Arnaldo Carvalho de Melo,
	Adrian Hunter, Alexey Budankov, Mathieu Poirier,
	Michael Ellerman, Namhyung Kim, Pavel Gerasimov, Peter Zijlstra,
	Ravi Bangoria, Stephane Eranian, Vitaly Slobodskoy, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     6f91ea283a1ed23e4a548ddd62db6deb2c707f82
Gitweb:        https://git.kernel.org/tip/6f91ea283a1ed23e4a548ddd62db6deb2c707f82
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 19 Mar 2020 13:25:02 -07:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Sat, 18 Apr 2020 09:05:00 -03:00

perf header: Support CPU PMU capabilities

Stitching the LBR call stack requires the max LBR information (the
"branches" capability), so the CPU PMU capabilities have to be stored in
the perf header.

Add a new feature, HEADER_CPU_PMU_CAPS, for the CPU PMU capabilities.  It
stores all of the CPU PMU capabilities, not just the max LBR information.

Add the max_branches variable to struct perf_env to facilitate future
usage.
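
For readers who want to see how such a capability list can be consumed,
here is a minimal sketch. It assumes the capabilities are held in memory
as nr_caps consecutive NUL-terminated "name=value" strings, which is how
the diff below builds env.cpu_pmu_caps. The helpers caps_max_branches()
and caps_format() are hypothetical and are not part of perf.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: walk nr_caps consecutive NUL-terminated
 * "name=value" strings and return the value of "branches",
 * or 0 if that capability is absent.
 */
static unsigned int caps_max_branches(const char *caps, unsigned int nr_caps)
{
	const char *str = caps;

	assert(caps != NULL || nr_caps == 0);

	while (nr_caps--) {
		if (!strncmp(str, "branches=", 9))
			return (unsigned int)strtoul(str + 9, NULL, 10);
		str += strlen(str) + 1;	/* step over the terminating NUL */
	}
	return 0;
}

/*
 * Hypothetical helper: join the same strings with ", " into out.
 * Returns the number of bytes written (without the final NUL),
 * or -1 if out is too small.
 */
static int caps_format(const char *caps, unsigned int nr_caps,
		       char *out, size_t len)
{
	const char *delimiter = "";
	size_t off = 0;

	if (len == 0)
		return -1;
	out[0] = '\0';

	while (nr_caps--) {
		int n = snprintf(out + off, len - off, "%s%s", delimiter, caps);

		if (n < 0 || (size_t)n >= len - off)
			return -1;	/* output would be truncated */
		off += (size_t)n;
		delimiter = ", ";
		caps += strlen(caps) + 1;
	}
	return (int)off;
}
```

Keeping every capability as a string leaves the format open to new
entries; only the consumer that needs a number, such as the max LBR
depth, has to convert it.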

Committer testing:

  # ls -la /sys/devices/cpu/caps/
  total 0
  drwxr-xr-x. 2 root root    0 Apr 17 10:53 .
  drwxr-xr-x. 6 root root    0 Apr 17 07:02 ..
  -r--r--r--. 1 root root 4096 Apr 17 10:53 max_precise
  #
  # cat /sys/devices/cpu/caps/max_precise
  0
  # perf record sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.033 MB perf.data (7 samples) ]
  #
  # perf report --header-only | egrep 'cpu(desc|.*capabilities)'
  # cpudesc : AMD Ryzen 5 3600X 6-Core Processor
  # cpu pmu capabilities: max_precise=0
  #

And then on an Intel machine:

  $ ls -la /sys/devices/cpu/caps/
  total 0
  drwxr-xr-x. 2 root root    0 Apr 17 10:51 .
  drwxr-xr-x. 6 root root    0 Apr 17 10:04 ..
  -r--r--r--. 1 root root 4096 Apr 17 11:37 branches
  -r--r--r--. 1 root root 4096 Apr 17 10:51 max_precise
  -r--r--r--. 1 root root 4096 Apr 17 11:37 pmu_name
  $ cat /sys/devices/cpu/caps/max_precise
  3
  $ cat /sys/devices/cpu/caps/branches
  32
  $ cat /sys/devices/cpu/caps/pmu_name
  skylake
  $ perf record sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.001 MB perf.data (8 samples) ]
  $ perf report --header-only | egrep 'cpu(desc|.*capabilities)'
  # cpudesc : Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  # cpu pmu capabilities: branches=32, max_precise=3, pmu_name=skylake
  $

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Link: http://lore.kernel.org/lkml/20200319202517.23423-3-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf.data-file-format.txt |  16 ++-
 tools/perf/util/env.h                              |   3 +-
 tools/perf/util/header.c                           | 108 ++++++++++++-
 tools/perf/util/header.h                           |   1 +-
 4 files changed, 128 insertions(+)

diff --git a/tools/perf/Documentation/perf.data-file-format.txt b/tools/perf/Documentation/perf.data-file-format.txt
index b0152e1..b6472e4 100644
--- a/tools/perf/Documentation/perf.data-file-format.txt
+++ b/tools/perf/Documentation/perf.data-file-format.txt
@@ -373,6 +373,22 @@ struct {
 Indicates that trace contains records of PERF_RECORD_COMPRESSED type
 that have perf_events records in compressed form.
 
+	HEADER_CPU_PMU_CAPS = 28,
+
+	A list of cpu PMU capabilities. The format of data is as below.
+
+struct {
+	u32 nr_cpu_pmu_caps;
+	{
+		char	name[];
+		char	value[];
+	} [nr_cpu_pmu_caps]
+};
+
+
+Example:
+ cpu pmu capabilities: branches=32, max_precise=3, pmu_name=icelake
+
 	other bits are reserved and should ignored for now
 	HEADER_FEAT_BITS	= 256,
 
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index 7632075..1ab2682 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -48,6 +48,7 @@ struct perf_env {
 	char			*cpuid;
 	unsigned long long	total_mem;
 	unsigned int		msr_pmu_type;
+	unsigned int		max_branches;
 
 	int			nr_cmdline;
 	int			nr_sibling_cores;
@@ -57,12 +58,14 @@ struct perf_env {
 	int			nr_memory_nodes;
 	int			nr_pmu_mappings;
 	int			nr_groups;
+	int			nr_cpu_pmu_caps;
 	char			*cmdline;
 	const char		**cmdline_argv;
 	char			*sibling_cores;
 	char			*sibling_dies;
 	char			*sibling_threads;
 	char			*pmu_mappings;
+	char			*cpu_pmu_caps;
 	struct cpu_topology_map	*cpu;
 	struct cpu_cache_level	*caches;
 	int			 caches_cnt;
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index acbd046..28e82da 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -1395,6 +1395,38 @@ static int write_compressed(struct feat_fd *ff __maybe_unused,
 	return do_write(ff, &(ff->ph->env.comp_mmap_len), sizeof(ff->ph->env.comp_mmap_len));
 }
 
+static int write_cpu_pmu_caps(struct feat_fd *ff,
+			      struct evlist *evlist __maybe_unused)
+{
+	struct perf_pmu *cpu_pmu = perf_pmu__find("cpu");
+	struct perf_pmu_caps *caps = NULL;
+	int nr_caps;
+	int ret;
+
+	if (!cpu_pmu)
+		return -ENOENT;
+
+	nr_caps = perf_pmu__caps_parse(cpu_pmu);
+	if (nr_caps < 0)
+		return nr_caps;
+
+	ret = do_write(ff, &nr_caps, sizeof(nr_caps));
+	if (ret < 0)
+		return ret;
+
+	list_for_each_entry(caps, &cpu_pmu->caps, list) {
+		ret = do_write_string(ff, caps->name);
+		if (ret < 0)
+			return ret;
+
+		ret = do_write_string(ff, caps->value);
+		if (ret < 0)
+			return ret;
+	}
+
+	return ret;
+}
+
 static void print_hostname(struct feat_fd *ff, FILE *fp)
 {
 	fprintf(fp, "# hostname : %s\n", ff->ph->env.hostname);
@@ -1809,6 +1841,27 @@ static void print_compressed(struct feat_fd *ff, FILE *fp)
 		ff->ph->env.comp_level, ff->ph->env.comp_ratio);
 }
 
+static void print_cpu_pmu_caps(struct feat_fd *ff, FILE *fp)
+{
+	const char *delimiter = "# cpu pmu capabilities: ";
+	u32 nr_caps = ff->ph->env.nr_cpu_pmu_caps;
+	char *str;
+
+	if (!nr_caps) {
+		fprintf(fp, "# cpu pmu capabilities: not available\n");
+		return;
+	}
+
+	str = ff->ph->env.cpu_pmu_caps;
+	while (nr_caps--) {
+		fprintf(fp, "%s%s", delimiter, str);
+		delimiter = ", ";
+		str += strlen(str) + 1;
+	}
+
+	fprintf(fp, "\n");
+}
+
 static void print_pmu_mappings(struct feat_fd *ff, FILE *fp)
 {
 	const char *delimiter = "# pmu mappings: ";
@@ -2846,6 +2899,60 @@ static int process_compressed(struct feat_fd *ff,
 	return 0;
 }
 
+static int process_cpu_pmu_caps(struct feat_fd *ff,
+				void *data __maybe_unused)
+{
+	char *name, *value;
+	struct strbuf sb;
+	u32 nr_caps;
+
+	if (do_read_u32(ff, &nr_caps))
+		return -1;
+
+	if (!nr_caps) {
+		pr_debug("cpu pmu capabilities not available\n");
+		return 0;
+	}
+
+	ff->ph->env.nr_cpu_pmu_caps = nr_caps;
+
+	if (strbuf_init(&sb, 128) < 0)
+		return -1;
+
+	while (nr_caps--) {
+		name = do_read_string(ff);
+		if (!name)
+			goto error;
+
+		value = do_read_string(ff);
+		if (!value)
+			goto free_name;
+
+		if (strbuf_addf(&sb, "%s=%s", name, value) < 0)
+			goto free_value;
+
+		/* include a NULL character at the end */
+		if (strbuf_add(&sb, "", 1) < 0)
+			goto free_value;
+
+		if (!strcmp(name, "branches"))
+			ff->ph->env.max_branches = atoi(value);
+
+		free(value);
+		free(name);
+	}
+	ff->ph->env.cpu_pmu_caps = strbuf_detach(&sb, NULL);
+	return 0;
+
+free_value:
+	free(value);
+free_name:
+	free(name);
+error:
+	strbuf_release(&sb);
+	return -1;
+}
+
 #define FEAT_OPR(n, func, __full_only) \
 	[HEADER_##n] = {					\
 		.name	    = __stringify(n),			\
@@ -2903,6 +3010,7 @@ const struct perf_header_feature_ops feat_ops[HEADER_LAST_FEATURE] = {
 	FEAT_OPR(BPF_PROG_INFO, bpf_prog_info,  false),
 	FEAT_OPR(BPF_BTF,       bpf_btf,        false),
 	FEAT_OPR(COMPRESSED,	compressed,	false),
+	FEAT_OPR(CPU_PMU_CAPS,	cpu_pmu_caps,	false),
 };
 
 struct header_print_data {
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index 840f95c..650bd1c 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -43,6 +43,7 @@ enum {
 	HEADER_BPF_PROG_INFO,
 	HEADER_BPF_BTF,
 	HEADER_COMPRESSED,
+	HEADER_CPU_PMU_CAPS,
 	HEADER_LAST_FEATURE,
 	HEADER_FEAT_BITS	= 256,
 };


* [tip: perf/core] perf pmu: Add support for PMU capabilities
  2020-03-19 20:25 ` [PATCH V4 01/17] perf pmu: Add support for PMU capabilities kan.liang
@ 2020-04-22 12:17   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 46+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-04-22 12:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Kan Liang, Andi Kleen, Jiri Olsa, Adrian Hunter, Alexey Budankov,
	Mathieu Poirier, Michael Ellerman, Namhyung Kim, Pavel Gerasimov,
	Peter Zijlstra, Ravi Bangoria, Stephane Eranian,
	Vitaly Slobodskoy, Arnaldo Carvalho de Melo, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     9fbc61f832ebf432326a90e28184dade05ee34a8
Gitweb:        https://git.kernel.org/tip/9fbc61f832ebf432326a90e28184dade05ee34a8
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 19 Mar 2020 13:25:01 -07:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Sat, 18 Apr 2020 09:05:00 -03:00

perf pmu: Add support for PMU capabilities

The perf tool needs the PMU capabilities information, which is located at
/sys/bus/event_source/devices/<dev>/caps.  For example, the max LBR
information is required to stitch the LBR call stack.

Add perf_pmu__caps_parse() to parse the PMU capabilities information and
store it in a list.

The following patch will store the capabilities information in the perf
header.

Committer notes:

Here's an example of such directories and their files on a 7th gen i5
machine:

  [root@seventh ~]# ls -lad /sys/bus/event_source/devices/*/caps
  drwxr-xr-x. 2 root root 0 Apr 14 13:33 /sys/bus/event_source/devices/cpu/caps
  drwxr-xr-x. 2 root root 0 Apr 14 13:33 /sys/bus/event_source/devices/intel_pt/caps
  [root@seventh ~]# ls -la /sys/bus/event_source/devices/intel_pt/caps
  total 0
  drwxr-xr-x. 2 root root    0 Apr 14 13:33 .
  drwxr-xr-x. 5 root root    0 Apr 14 13:12 ..
  -r--r--r--. 1 root root 4096 Apr 16 13:10 cr3_filtering
  -r--r--r--. 1 root root 4096 Apr 16 11:42 cycle_thresholds
  -r--r--r--. 1 root root 4096 Apr 16 13:10 ip_filtering
  -r--r--r--. 1 root root 4096 Apr 16 13:10 max_subleaf
  -r--r--r--. 1 root root 4096 Apr 14 13:33 mtc
  -r--r--r--. 1 root root 4096 Apr 14 13:33 mtc_periods
  -r--r--r--. 1 root root 4096 Apr 16 13:10 num_address_ranges
  -r--r--r--. 1 root root 4096 Apr 16 13:10 output_subsys
  -r--r--r--. 1 root root 4096 Apr 16 13:10 payloads_lip
  -r--r--r--. 1 root root 4096 Apr 16 13:10 power_event_trace
  -r--r--r--. 1 root root 4096 Apr 14 13:33 psb_cyc
  -r--r--r--. 1 root root 4096 Apr 14 13:33 psb_periods
  -r--r--r--. 1 root root 4096 Apr 16 13:10 ptwrite
  -r--r--r--. 1 root root 4096 Apr 16 13:10 single_range_output
  -r--r--r--. 1 root root 4096 Apr 16 12:03 topa_multiple_entries
  -r--r--r--. 1 root root 4096 Apr 16 13:10 topa_output
  [root@seventh ~]# cat /sys/bus/event_source/devices/intel_pt/caps/topa_output
  1
  [root@seventh ~]# cat /sys/bus/event_source/devices/intel_pt/caps/topa_multiple_entries
  1
  [root@seventh ~]# cat /sys/bus/event_source/devices/intel_pt/caps/mtc
  1
  [root@seventh ~]# cat /sys/bus/event_source/devices/intel_pt/caps/power_event_trace
  0
  [root@seventh ~]#

  [root@seventh ~]# ls -la /sys/bus/event_source/devices/cpu/caps/
  total 0
  drwxr-xr-x. 2 root root    0 Apr 14 13:33 .
  drwxr-xr-x. 6 root root    0 Apr 14 13:12 ..
  -r--r--r--. 1 root root 4096 Apr 16 13:10 branches
  -r--r--r--. 1 root root 4096 Apr 14 13:33 max_precise
  -r--r--r--. 1 root root 4096 Apr 16 13:10 pmu_name
  [root@seventh ~]# cat /sys/bus/event_source/devices/cpu/caps/max_precise
  3
  [root@seventh ~]# cat /sys/bus/event_source/devices/cpu/caps/branches
  32
  [root@seventh ~]# cat /sys/bus/event_source/devices/cpu/caps/pmu_name
  skylake
  [root@seventh ~]#

Wow, first time I've heard about
/sys/bus/event_source/devices/cpu/caps/max_precise, I think I'll use it!
:-)

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
Link: http://lore.kernel.org/lkml/20200319202517.23423-2-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/pmu.c | 82 ++++++++++++++++++++++++++++++++++++++++++-
 tools/perf/util/pmu.h |  9 +++++-
 2 files changed, 91 insertions(+)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index bc912a8..d9f89ed 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -850,6 +850,7 @@ static struct perf_pmu *pmu_lookup(const char *name)
 
 	INIT_LIST_HEAD(&pmu->format);
 	INIT_LIST_HEAD(&pmu->aliases);
+	INIT_LIST_HEAD(&pmu->caps);
 	list_splice(&format, &pmu->format);
 	list_splice(&aliases, &pmu->aliases);
 	list_add_tail(&pmu->list, &pmus);
@@ -1594,3 +1595,84 @@ int perf_pmu__scan_file(struct perf_pmu *pmu, const char *name, const char *fmt,
 	va_end(args);
 	return ret;
 }
+
+static int perf_pmu__new_caps(struct list_head *list, char *name, char *value)
+{
+	struct perf_pmu_caps *caps = zalloc(sizeof(*caps));
+
+	if (!caps)
+		return -ENOMEM;
+
+	caps->name = strdup(name);
+	if (!caps->name)
+		goto free_caps;
+	caps->value = strndup(value, strlen(value) - 1);
+	if (!caps->value)
+		goto free_name;
+	list_add_tail(&caps->list, list);
+	return 0;
+
+free_name:
+	zfree(caps->name);
+free_caps:
+	free(caps);
+
+	return -ENOMEM;
+}
+
+/*
+ * Reading/parsing the given pmu capabilities, which should be located at:
+ * /sys/bus/event_source/devices/<dev>/caps as sysfs group attributes.
+ * Return the number of capabilities
+ */
+int perf_pmu__caps_parse(struct perf_pmu *pmu)
+{
+	struct stat st;
+	char caps_path[PATH_MAX];
+	const char *sysfs = sysfs__mountpoint();
+	DIR *caps_dir;
+	struct dirent *evt_ent;
+	int nr_caps = 0;
+
+	if (!sysfs)
+		return -1;
+
+	snprintf(caps_path, PATH_MAX,
+		 "%s" EVENT_SOURCE_DEVICE_PATH "%s/caps", sysfs, pmu->name);
+
+	if (stat(caps_path, &st) < 0)
+		return 0;	/* no error if caps does not exist */
+
+	caps_dir = opendir(caps_path);
+	if (!caps_dir)
+		return -EINVAL;
+
+	while ((evt_ent = readdir(caps_dir)) != NULL) {
+		char path[PATH_MAX + NAME_MAX + 1];
+		char *name = evt_ent->d_name;
+		char value[128];
+		FILE *file;
+
+		if (!strcmp(name, ".") || !strcmp(name, ".."))
+			continue;
+
+		snprintf(path, sizeof(path), "%s/%s", caps_path, name);
+
+		file = fopen(path, "r");
+		if (!file)
+			continue;
+
+		if (!fgets(value, sizeof(value), file) ||
+		    (perf_pmu__new_caps(&pmu->caps, name, value) < 0)) {
+			fclose(file);
+			continue;
+		}
+
+		nr_caps++;
+		fclose(file);
+	}
+
+	closedir(caps_dir);
+
+	return nr_caps;
+}
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 5fb3f16..1edd214 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -22,6 +22,12 @@ enum {
 
 struct perf_event_attr;
 
+struct perf_pmu_caps {
+	char *name;
+	char *value;
+	struct list_head list;
+};
+
 struct perf_pmu {
 	char *name;
 	__u32 type;
@@ -33,6 +39,7 @@ struct perf_pmu {
 	struct perf_cpu_map *cpus;
 	struct list_head format;  /* HEAD struct perf_pmu_format -> list */
 	struct list_head aliases; /* HEAD struct perf_pmu_alias -> list */
+	struct list_head caps;    /* HEAD struct perf_pmu_caps -> list */
 	struct list_head list;    /* ELEM */
 };
 
@@ -107,4 +114,6 @@ bool pmu_uncore_alias_match(const char *pmu_name, const char *name);
 
 int perf_pmu__convert_scale(const char *scale, char **end, double *sval);
 
+int perf_pmu__caps_parse(struct perf_pmu *pmu);
+
 #endif /* __PMU_H */


end of thread, other threads:[~2020-04-22 12:23 UTC | newest]

Thread overview: 46+ messages
2020-03-19 20:25 [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) kan.liang
2020-03-19 20:25 ` [PATCH V4 01/17] perf pmu: Add support for PMU capabilities kan.liang
2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-03-19 20:25 ` [PATCH V4 02/17] perf header: Support CPU " kan.liang
2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-03-19 20:25 ` [PATCH V4 03/17] perf record: Clear HEADER_CPU_PMU_CAPS for non LBR call stack mode kan.liang
2020-04-17 14:42   ` Arnaldo Carvalho de Melo
2020-03-19 20:25 ` [PATCH V4 04/17] perf stat: Clear HEADER_CPU_PMU_CAPS kan.liang
2020-04-17 14:42   ` Arnaldo Carvalho de Melo
2020-03-19 20:25 ` [PATCH V4 05/17] perf machine: Remove the indent in resolve_lbr_callchain_sample kan.liang
2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-03-19 20:25 ` [PATCH V4 06/17] perf machine: Refine the function for LBR call stack reconstruction kan.liang
2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-03-19 20:25 ` [PATCH V4 07/17] perf machine: Factor out lbr_callchain_add_kernel_ip() kan.liang
2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-03-19 20:25 ` [PATCH V4 08/17] perf machine: Factor out lbr_callchain_add_lbr_ip() kan.liang
2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-03-19 20:25 ` [PATCH V4 09/17] perf thread: Add a knob for LBR stitch approach kan.liang
2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-03-19 20:25 ` [PATCH V4 10/17] perf tools: Save previous sample for LBR stitching approach kan.liang
2020-04-17 15:02   ` Arnaldo Carvalho de Melo
2020-04-22 12:17   ` [tip: perf/core] perf thread: " tip-bot2 for Kan Liang
2020-03-19 20:25 ` [PATCH V4 11/17] perf tools: Save previous cursor nodes " kan.liang
2020-04-17 16:53   ` Arnaldo Carvalho de Melo
2020-04-22 12:17   ` [tip: perf/core] perf callchain: " tip-bot2 for Kan Liang
2020-03-19 20:25 ` [PATCH V4 12/17] perf tools: Stitch LBR call stack kan.liang
2020-04-22 12:17   ` [tip: perf/core] perf callchain: " tip-bot2 for Kan Liang
2020-03-19 20:25 ` [PATCH V4 13/17] perf report: Add option to enable the LBR stitching approach kan.liang
2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-03-19 20:25 ` [PATCH V4 14/17] perf script: " kan.liang
2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-03-19 20:25 ` [PATCH V4 15/17] perf top: " kan.liang
2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-03-19 20:25 ` [PATCH V4 16/17] perf c2c: " kan.liang
2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-03-19 20:25 ` [PATCH V4 17/17] perf hist: Add fast path for duplicate entries check kan.liang
2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-03-23 11:13 ` [PATCH V4 00/17] Stitch LBR call stack (Perf Tools) Jiri Olsa
2020-04-02 15:34   ` Liang, Kan
2020-04-02 16:00     ` Arnaldo Carvalho de Melo
2020-04-02 17:02       ` Liang, Kan
2020-04-17 17:48 ` Arnaldo Carvalho de Melo
2020-04-17 21:47   ` Liang, Kan
2020-04-17 21:54     ` Arnaldo Carvalho de Melo
2020-04-17 21:55       ` Arnaldo Carvalho de Melo
2020-04-17 21:55         ` Arnaldo Carvalho de Melo
