All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET v2 0/7] perf inject: Speed build-id injection
@ 2020-09-23  8:05 Namhyung Kim
  2020-09-23  8:05 ` [PATCH 1/7] perf bench: Add build-id injection benchmark Namhyung Kim
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Namhyung Kim @ 2020-09-23  8:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Mark Rutland, Alexander Shishkin,
	LKML, Stephane Eranian, Ian Rogers

Hello,

This is my second attempt to speed up build-id injection.  As this is
to improve performance, I've added a benchmark for it.  Please look at
the usage in the first commit.

By default, it measures average processing time of 100 MMAP2 events
and 10000 SAMPLE events.  Below is the current result on my laptop.

  $ perf bench internals inject-build-id
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 22.997 msec (+- 0.067 msec)
    Average time per event: 2.255 usec (+- 0.007 usec)

With this patchset applied, it got this:

  $ perf bench internals inject-build-id
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 18.441 msec (+- 0.106 msec)
    Average time per event: 1.808 usec (+- 0.010 usec)
    Average build-id-all injection took: 13.451 msec (+- 0.132 msec)
    Average time per event: 1.319 usec (+- 0.013 usec)

Real usecases might be different as it depends on the number of
mmap/sample events as well as how many DSOs are actually hit.

Also please take a look at the last patch which I removed the old
build-id inject logic which seems not used anymore.

This code is available at 'perf/inject-speedup-v2' branch on

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git


Thanks
Namhyung


Namhyung Kim (7):
  perf bench: Add build-id injection benchmark
  perf inject: Add missing callbacks in perf_tool
  perf inject: Enter namespace when reading build-id
  perf inject: Do not load map/dso when injecting build-id
  perf inject: Add --buildid-all option
  perf bench: Run inject-build-id with --buildid-all option too
  perf inject: Remove stale build-id processing

 tools/perf/Documentation/perf-inject.txt |   6 +-
 tools/perf/bench/Build                   |   1 +
 tools/perf/bench/bench.h                 |   1 +
 tools/perf/bench/inject-buildid.c        | 432 +++++++++++++++++++++++
 tools/perf/builtin-bench.c               |   1 +
 tools/perf/builtin-inject.c              | 207 ++++++++---
 tools/perf/util/build-id.h               |   4 +
 tools/perf/util/map.c                    |  17 +-
 tools/perf/util/map.h                    |  14 +
 9 files changed, 617 insertions(+), 66 deletions(-)
 create mode 100644 tools/perf/bench/inject-buildid.c

-- 
2.28.0.681.g6f77f65b4e-goog


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/7] perf bench: Add build-id injection benchmark
  2020-09-23  8:05 [PATCHSET v2 0/7] perf inject: Speed build-id injection Namhyung Kim
@ 2020-09-23  8:05 ` Namhyung Kim
  2020-09-23 22:13   ` Ian Rogers
  2020-09-23  8:05 ` [PATCH 2/7] perf inject: Add missing callbacks in perf_tool Namhyung Kim
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Namhyung Kim @ 2020-09-23  8:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Mark Rutland, Alexander Shishkin,
	LKML, Stephane Eranian, Ian Rogers

Sometimes I can see perf record piped with perf inject take long time
processing build-id.  So add inject-build-id benchmark to the
internals benchmark suite to measure its overhead regularly.

It runs perf inject command internally and feeds the given number of
synthesized events (MMAP2 + SAMPLE basically).

  Usage: perf bench internals inject-build-id <options>

    -i, --iterations <n>  Number of iterations used to compute average (default: 100)
    -m, --nr-mmaps <n>    Number of mmap events for each iteration (default: 100)
    -n, --nr-samples <n>  Number of sample events per mmap event (default: 100)
    -v, --verbose         be more verbose (show iteration count, DSO name, etc)

By default, it measures average processing time of 100 MMAP2 events
and 10000 SAMPLE events.  Below is a result on my laptop.

  $ perf bench internals inject-build-id
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 22.997 msec (+- 0.067 msec)
    Average time per event: 2.255 usec (+- 0.007 usec)

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/bench/Build            |   1 +
 tools/perf/bench/bench.h          |   1 +
 tools/perf/bench/inject-buildid.c | 417 ++++++++++++++++++++++++++++++
 tools/perf/builtin-bench.c        |   1 +
 tools/perf/builtin-inject.c       |   9 +-
 tools/perf/util/build-id.h        |   4 +
 6 files changed, 428 insertions(+), 5 deletions(-)
 create mode 100644 tools/perf/bench/inject-buildid.c

diff --git a/tools/perf/bench/Build b/tools/perf/bench/Build
index dd68a40a790c..8b52591338d6 100644
--- a/tools/perf/bench/Build
+++ b/tools/perf/bench/Build
@@ -12,6 +12,7 @@ perf-y += epoll-ctl.o
 perf-y += synthesize.o
 perf-y += kallsyms-parse.o
 perf-y += find-bit-bench.o
+perf-y += inject-buildid.o
 
 perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-lib.o
 perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-asm.o
diff --git a/tools/perf/bench/bench.h b/tools/perf/bench/bench.h
index 2804812d4154..eac36afab2b3 100644
--- a/tools/perf/bench/bench.h
+++ b/tools/perf/bench/bench.h
@@ -47,6 +47,7 @@ int bench_epoll_wait(int argc, const char **argv);
 int bench_epoll_ctl(int argc, const char **argv);
 int bench_synthesize(int argc, const char **argv);
 int bench_kallsyms_parse(int argc, const char **argv);
+int bench_inject_build_id(int argc, const char **argv);
 
 #define BENCH_FORMAT_DEFAULT_STR	"default"
 #define BENCH_FORMAT_DEFAULT		0
diff --git a/tools/perf/bench/inject-buildid.c b/tools/perf/bench/inject-buildid.c
new file mode 100644
index 000000000000..e5144a85d689
--- /dev/null
+++ b/tools/perf/bench/inject-buildid.c
@@ -0,0 +1,417 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <stdlib.h>
+#include <stddef.h>
+#include <ftw.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <sys/wait.h>
+#include <linux/kernel.h>
+#include <linux/time64.h>
+#include <linux/list.h>
+#include <linux/err.h>
+#include <internal/lib.h>
+#include <subcmd/parse-options.h>
+
+#include "bench.h"
+#include "util/data.h"
+#include "util/stat.h"
+#include "util/debug.h"
+#include "util/event.h"
+#include "util/symbol.h"
+#include "util/session.h"
+#include "util/build-id.h"
+#include "util/synthetic-events.h"
+
+#define MMAP_DEV_MAJOR  8
+
+static unsigned int iterations = 100;
+static unsigned int nr_mmaps   = 100;
+static unsigned int nr_samples = 100;  /* samples per mmap */
+
+static u64 bench_sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID;
+static u16 bench_id_hdr_size = 8;  /* only for pid/tid */
+
+struct bench_data {
+	int			pid;
+	int			input_pipe[2];
+	int			output_pipe[2];
+};
+
+struct bench_dso {
+	struct list_head	list;
+	char			*name;
+	int			ino;
+};
+
+static int nr_dsos;
+static LIST_HEAD(dso_list);
+
+extern int cmd_inject(int argc, const char *argv[]);
+
+static const struct option options[] = {
+	OPT_UINTEGER('i', "iterations", &iterations,
+		     "Number of iterations used to compute average (default: 100)"),
+	OPT_UINTEGER('m', "nr-mmaps", &nr_mmaps,
+		     "Number of mmap events for each iteration (default: 100)"),
+	OPT_UINTEGER('n', "nr-samples", &nr_samples,
+		     "Number of sample events per mmap event (default: 100)"),
+	OPT_INCR('v', "verbose", &verbose,
+		 "be more verbose (show iteration count, DSO name, etc)"),
+	OPT_END()
+};
+
+static const char *const bench_usage[] = {
+	"perf bench internals inject-build-id <options>",
+	NULL
+};
+
+static int add_dso(const char *fpath, const struct stat *sb __maybe_unused,
+		   int typeflag, struct FTW *ftwbuf __maybe_unused)
+{
+	struct bench_dso *dso;
+	unsigned char build_id[BUILD_ID_SIZE];
+
+	if (typeflag == FTW_D || typeflag == FTW_SL) {
+		return 0;
+	}
+
+	if (filename__read_build_id(fpath, build_id, BUILD_ID_SIZE) < 0)
+		return 0;
+
+	dso = malloc(sizeof(*dso));
+	if (dso == NULL)
+		return -1;
+
+	dso->name = realpath(fpath, NULL);
+	if (dso->name == NULL) {
+		free(dso);
+		return -1;
+	}
+
+	dso->ino = nr_dsos++;
+	list_add(&dso->list, &dso_list);
+	pr_debug2("  Adding DSO: %s\n", fpath);
+
+	/* stop if we collected 4x DSOs than needed */
+	if ((unsigned)nr_dsos > 4 * nr_mmaps)
+		return 1;
+
+	return 0;
+}
+
+static void collect_dso(void)
+{
+	if (nftw("/usr/lib/", add_dso, 10, FTW_PHYS) < 0)
+		return;
+
+	pr_debug("  Collected %d DSOs\n", nr_dsos);
+}
+
+static void release_dso(void)
+{
+	struct bench_dso *dso;
+
+	while (!list_empty(&dso_list)) {
+		dso = list_first_entry(&dso_list, struct bench_dso, list);
+		list_del(&dso->list);
+		free(dso->name);
+		free(dso);
+	}
+}
+
+static u64 dso_map_addr(struct bench_dso *dso)
+{
+	return 0x400000ULL + dso->ino * 8192ULL;
+}
+
+static u32 synthesize_attr(struct bench_data *data)
+{
+	union perf_event event;
+
+	memset(&event, 0, sizeof(event.attr) + sizeof(u64));
+
+	event.header.type = PERF_RECORD_HEADER_ATTR;
+	event.header.size = sizeof(event.attr) + sizeof(u64);
+
+	event.attr.attr.type = PERF_TYPE_SOFTWARE;
+	event.attr.attr.config = PERF_COUNT_SW_TASK_CLOCK;
+	event.attr.attr.exclude_kernel = 1;
+	event.attr.attr.sample_id_all = 1;
+	event.attr.attr.sample_type = bench_sample_type;
+
+	return writen(data->input_pipe[1], &event, event.header.size);
+}
+
+static u32 synthesize_fork(struct bench_data *data)
+{
+	union perf_event event;
+
+	memset(&event, 0, sizeof(event.fork) + bench_id_hdr_size);
+
+	event.header.type = PERF_RECORD_FORK;
+	event.header.misc = PERF_RECORD_MISC_FORK_EXEC;
+	event.header.size = sizeof(event.fork) + bench_id_hdr_size;
+
+	event.fork.ppid = 1;
+	event.fork.ptid = 1;
+	event.fork.pid = data->pid;
+	event.fork.tid = data->pid;
+
+	return writen(data->input_pipe[1], &event, event.header.size);
+}
+
+static u32 synthesize_mmap(struct bench_data *data, struct bench_dso *dso)
+{
+	union perf_event event;
+	size_t len = offsetof(struct perf_record_mmap2, filename);
+
+	len += roundup(strlen(dso->name) + 1, 8) + bench_id_hdr_size;
+
+	memset(&event, 0, min(len, sizeof(event.mmap2)));
+
+	event.header.type = PERF_RECORD_MMAP2;
+	event.header.misc = PERF_RECORD_MISC_USER;
+	event.header.size = len;
+
+	event.mmap2.pid = data->pid;
+	event.mmap2.tid = data->pid;
+	event.mmap2.maj = MMAP_DEV_MAJOR;
+	event.mmap2.ino = dso->ino;
+
+	strcpy(event.mmap2.filename, dso->name);
+
+	event.mmap2.start = dso_map_addr(dso);
+	event.mmap2.len = 4096;
+	event.mmap2.prot = PROT_EXEC;
+
+	if (len > sizeof(event.mmap2)) {
+		/* write mmap2 event first */
+		writen(data->input_pipe[1], &event, len - bench_id_hdr_size);
+		/* write zero-filled sample id header */
+		memset(&event, 0, bench_id_hdr_size);
+		writen(data->input_pipe[1], &event, bench_id_hdr_size);
+	} else {
+		writen(data->input_pipe[1], &event, len);
+	}
+	return len;
+}
+
+static u32 synthesize_sample(struct bench_data *data, struct bench_dso *dso)
+{
+	union perf_event event;
+	struct perf_sample sample = {
+		.tid = data->pid,
+		.pid = data->pid,
+		.ip = dso_map_addr(dso),
+	};
+
+	event.header.type = PERF_RECORD_SAMPLE;
+	event.header.misc = PERF_RECORD_MISC_USER;
+	event.header.size = perf_event__sample_event_size(&sample, bench_sample_type, 0);
+
+	perf_event__synthesize_sample(&event, bench_sample_type, 0, &sample);
+
+	return writen(data->input_pipe[1], &event, event.header.size);
+}
+
+static void sigpipe_handler(int sig __maybe_unused)
+{
+	/* child exited */
+}
+
+static int setup_injection(struct bench_data *data)
+{
+	int ready_pipe[2];
+	int dev_null_fd;
+	char buf;
+
+	if (pipe(ready_pipe) < 0)
+		return -1;
+
+	if (pipe(data->input_pipe) < 0)
+		return -1;
+
+	if (pipe(data->output_pipe) < 0)
+		return -1;
+
+	data->pid = fork();
+	if (data->pid < 0)
+		return -1;
+
+	if (data->pid == 0) {
+		const char **inject_argv;
+
+		close(data->input_pipe[1]);
+		close(data->output_pipe[0]);
+		close(ready_pipe[0]);
+
+		dup2(data->input_pipe[0], STDIN_FILENO);
+		close(data->input_pipe[0]);
+		dup2(data->output_pipe[1], STDOUT_FILENO);
+		close(data->output_pipe[1]);
+
+		dev_null_fd = open("/dev/null", O_WRONLY);
+		if (dev_null_fd < 0)
+			exit(1);
+
+		dup2(dev_null_fd, STDERR_FILENO);
+
+		inject_argv = calloc(3, sizeof(*inject_argv));
+		if (inject_argv == NULL)
+			exit(1);
+
+		inject_argv[0] = strdup("inject");
+		inject_argv[1] = strdup("-b");
+
+		/* signal that we're ready to go */
+		close(ready_pipe[1]);
+
+		cmd_inject(2, inject_argv);
+
+		exit(0);
+	}
+
+	signal(SIGPIPE, sigpipe_handler);
+
+	close(ready_pipe[1]);
+	close(data->input_pipe[0]);
+	close(data->output_pipe[1]);
+
+	/* wait for child ready */
+	if (read(ready_pipe[0], &buf, 1) < 0)
+		return -1;
+	close(ready_pipe[0]);
+
+	return 0;
+}
+
+static int inject_build_id(struct bench_data *data)
+{
+	int flag, status;
+	unsigned int i, k;
+	char buf[8192];
+	u64 nread = 0;
+	u64 len = nr_mmaps / 2 * sizeof(struct perf_record_header_build_id);
+
+	flag = fcntl(data->output_pipe[0], F_GETFL, 0);
+	if (fcntl(data->output_pipe[0], F_SETFL, flag | O_NONBLOCK) < 0)
+		return -1;
+
+	/* this makes the child to run */
+	if (perf_header__write_pipe(data->input_pipe[1]) < 0)
+		return -1;
+
+	len += synthesize_attr(data);
+	len += synthesize_fork(data);
+
+	for (i = 0; i < nr_mmaps; i++) {
+		struct bench_dso *dso;
+		int idx = rand() % (nr_dsos - 1);
+
+		dso = list_first_entry(&dso_list, struct bench_dso, list);
+		while (idx--)
+			dso = list_next_entry(dso, list);
+
+		pr_debug("   [%2d] injecting: %s\n", i+1, dso->name);
+		len += synthesize_mmap(data, dso);
+
+		for (k = 0; k < nr_samples; k++)
+			len += synthesize_sample(data, dso);
+
+		/* read out data from child */
+		while (true) {
+			int n; 
+
+			n = read(data->output_pipe[0], buf, sizeof(buf));
+			if (n <= 0)
+				break;
+			nread += n;
+		}
+	}
+
+	/* wait to read data at least as we wrote + some build-ids */
+	while (nread < len) {
+		int n;
+
+		n = read(data->output_pipe[0], buf, sizeof(buf));
+		if (n < 0)
+			break;
+		nread += n;
+	}
+	close(data->input_pipe[1]);
+	close(data->output_pipe[0]);
+
+	wait(&status);
+	pr_debug("   Child %d exited with %d\n", data->pid, status);
+
+	return 0;
+}
+
+static int do_inject_loop(struct bench_data *data)
+{
+	unsigned int i;
+	struct stats time_stats;
+	double time_average, time_stddev;
+
+	srand(time(NULL));
+	init_stats(&time_stats);
+	symbol__init(NULL);
+
+	collect_dso();
+	if (nr_dsos == 0) {
+		printf("  Cannot collect DSOs for injection\n");
+		return -1;
+	}
+
+	for (i = 0; i < iterations; i++) {
+		struct timeval start, end, diff;
+		u64 runtime_us;
+
+		pr_debug("  Iteration #%d\n", i+1);
+
+		if (setup_injection(data) < 0) {
+			printf("  Build-id injection setup failed\n");
+			break;
+		}
+
+		gettimeofday(&start, NULL);
+		if (inject_build_id(data) < 0) {
+			printf("  Build-id injection failed\n");
+			break;
+		}
+
+		gettimeofday(&end, NULL);
+		timersub(&end, &start, &diff);
+		runtime_us = diff.tv_sec * USEC_PER_SEC + diff.tv_usec;
+		update_stats(&time_stats, runtime_us);
+	}
+
+	time_average = avg_stats(&time_stats) / USEC_PER_MSEC;
+	time_stddev = stddev_stats(&time_stats) / USEC_PER_MSEC;
+	printf("  Average build-id injection took: %.3f msec (+- %.3f msec)\n",
+		time_average, time_stddev);
+
+	/* each iteration, it processes MMAP2 + BUILD_ID + nr_samples * SAMPLE */
+	time_average = avg_stats(&time_stats) / (nr_mmaps * (nr_samples + 2));
+	time_stddev = stddev_stats(&time_stats) / (nr_mmaps * (nr_samples + 2));
+	printf("  Average time per event: %.3f usec (+- %.3f usec)\n",
+		time_average, time_stddev);
+
+	release_dso();
+	return 0;
+}
+
+int bench_inject_build_id(int argc, const char **argv)
+{
+	struct bench_data data;
+
+	argc = parse_options(argc, argv, options, bench_usage, 0);
+	if (argc) {
+		usage_with_options(bench_usage, options);
+		exit(EXIT_FAILURE);
+	}
+
+	return do_inject_loop(&data);
+}
+
diff --git a/tools/perf/builtin-bench.c b/tools/perf/builtin-bench.c
index 4f176039fc8f..62a7b7420a44 100644
--- a/tools/perf/builtin-bench.c
+++ b/tools/perf/builtin-bench.c
@@ -87,6 +87,7 @@ static struct bench epoll_benchmarks[] = {
 static struct bench internals_benchmarks[] = {
 	{ "synthesize", "Benchmark perf event synthesis",	bench_synthesize	},
 	{ "kallsyms-parse", "Benchmark kallsyms parsing",	bench_kallsyms_parse	},
+	{ "inject-build-id", "Benchmark build-id injection",	bench_inject_build_id	},
 	{ NULL,		NULL,					NULL			}
 };
 
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 6d2f410d773a..e4d78f11494e 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -441,11 +441,10 @@ static int dso__inject_build_id(struct dso *dso, struct perf_tool *tool,
 	return 0;
 }
 
-static int perf_event__inject_buildid(struct perf_tool *tool,
-				      union perf_event *event,
-				      struct perf_sample *sample,
-				      struct evsel *evsel __maybe_unused,
-				      struct machine *machine)
+int perf_event__inject_buildid(struct perf_tool *tool, union perf_event *event,
+			       struct perf_sample *sample,
+			       struct evsel *evsel __maybe_unused,
+			       struct machine *machine)
 {
 	struct addr_location al;
 	struct thread *thread;
diff --git a/tools/perf/util/build-id.h b/tools/perf/util/build-id.h
index aad419bb165c..949f7e54c9cb 100644
--- a/tools/perf/util/build-id.h
+++ b/tools/perf/util/build-id.h
@@ -29,6 +29,10 @@ int build_id__mark_dso_hit(struct perf_tool *tool, union perf_event *event,
 
 int dsos__hit_all(struct perf_session *session);
 
+int perf_event__inject_buildid(struct perf_tool *tool, union perf_event *event,
+			       struct perf_sample *sample, struct evsel *evsel,
+			       struct machine *machine);
+
 bool perf_session__read_build_ids(struct perf_session *session, bool with_hits);
 int perf_session__write_buildid_table(struct perf_session *session,
 				      struct feat_fd *fd);
-- 
2.28.0.681.g6f77f65b4e-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/7] perf inject: Add missing callbacks in perf_tool
  2020-09-23  8:05 [PATCHSET v2 0/7] perf inject: Speed build-id injection Namhyung Kim
  2020-09-23  8:05 ` [PATCH 1/7] perf bench: Add build-id injection benchmark Namhyung Kim
@ 2020-09-23  8:05 ` Namhyung Kim
  2020-09-23  8:05 ` [PATCH 3/7] perf inject: Enter namespace when reading build-id Namhyung Kim
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Namhyung Kim @ 2020-09-23  8:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Mark Rutland, Alexander Shishkin,
	LKML, Stephane Eranian, Ian Rogers

I found some events (like PERF_RECORD_CGROUP) are not copied by perf
inject due to the missing callbacks.  Let's add them.

While at it, I've changed the order of the callbacks to match with
struct perf_tool so that we can compare them easily.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-inject.c | 36 +++++++++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index e4d78f11494e..2c5e23d73a8a 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -97,6 +97,13 @@ static int perf_event__repipe_op2_synth(struct perf_session *session,
 	return perf_event__repipe_synth(session->tool, event);
 }
 
+static int perf_event__repipe_op4_synth(struct perf_session *session,
+					union perf_event *event,
+					u64 data __maybe_unused)
+{
+	return perf_event__repipe_synth(session->tool, event);
+}
+
 static int perf_event__repipe_attr(struct perf_tool *tool,
 				   union perf_event *event,
 				   struct evlist **pevlist)
@@ -115,6 +122,13 @@ static int perf_event__repipe_attr(struct perf_tool *tool,
 	return perf_event__repipe_synth(tool, event);
 }
 
+static int perf_event__repipe_event_update(struct perf_tool *tool,
+					   union perf_event *event,
+					   struct evlist **pevlist __maybe_unused)
+{
+	return perf_event__repipe_synth(tool, event);
+}
+
 #ifdef HAVE_AUXTRACE_SUPPORT
 
 static int copy_bytes(struct perf_inject *inject, int fd, off_t size)
@@ -707,9 +721,12 @@ int cmd_inject(int argc, const char **argv)
 	struct perf_inject inject = {
 		.tool = {
 			.sample		= perf_event__repipe_sample,
+			.read		= perf_event__repipe_sample,
 			.mmap		= perf_event__repipe,
 			.mmap2		= perf_event__repipe,
 			.comm		= perf_event__repipe,
+			.namespaces	= perf_event__repipe,
+			.cgroup		= perf_event__repipe,
 			.fork		= perf_event__repipe,
 			.exit		= perf_event__repipe,
 			.lost		= perf_event__repipe,
@@ -717,19 +734,28 @@ int cmd_inject(int argc, const char **argv)
 			.aux		= perf_event__repipe,
 			.itrace_start	= perf_event__repipe,
 			.context_switch	= perf_event__repipe,
-			.read		= perf_event__repipe_sample,
 			.throttle	= perf_event__repipe,
 			.unthrottle	= perf_event__repipe,
+			.ksymbol	= perf_event__repipe,
+			.bpf		= perf_event__repipe,
+			.text_poke	= perf_event__repipe,
 			.attr		= perf_event__repipe_attr,
+			.event_update	= perf_event__repipe_event_update,
 			.tracing_data	= perf_event__repipe_op2_synth,
-			.auxtrace_info	= perf_event__repipe_op2_synth,
-			.auxtrace	= perf_event__repipe_auxtrace,
-			.auxtrace_error	= perf_event__repipe_op2_synth,
-			.time_conv	= perf_event__repipe_op2_synth,
 			.finished_round	= perf_event__repipe_oe_synth,
 			.build_id	= perf_event__repipe_op2_synth,
 			.id_index	= perf_event__repipe_op2_synth,
+			.auxtrace_info	= perf_event__repipe_op2_synth,
+			.auxtrace_error	= perf_event__repipe_op2_synth,
+			.time_conv	= perf_event__repipe_op2_synth,
+			.thread_map	= perf_event__repipe_op2_synth,
+			.cpu_map	= perf_event__repipe_op2_synth,
+			.stat_config	= perf_event__repipe_op2_synth,
+			.stat		= perf_event__repipe_op2_synth,
+			.stat_round	= perf_event__repipe_op2_synth,
 			.feature	= perf_event__repipe_op2_synth,
+			.compressed	= perf_event__repipe_op4_synth,
+			.auxtrace	= perf_event__repipe_auxtrace,
 		},
 		.input_name  = "-",
 		.samples = LIST_HEAD_INIT(inject.samples),
-- 
2.28.0.681.g6f77f65b4e-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/7] perf inject: Enter namespace when reading build-id
  2020-09-23  8:05 [PATCHSET v2 0/7] perf inject: Speed build-id injection Namhyung Kim
  2020-09-23  8:05 ` [PATCH 1/7] perf bench: Add build-id injection benchmark Namhyung Kim
  2020-09-23  8:05 ` [PATCH 2/7] perf inject: Add missing callbacks in perf_tool Namhyung Kim
@ 2020-09-23  8:05 ` Namhyung Kim
  2020-09-23  8:05 ` [PATCH 4/7] perf inject: Do not load map/dso when injecting build-id Namhyung Kim
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Namhyung Kim @ 2020-09-23  8:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Mark Rutland, Alexander Shishkin,
	LKML, Stephane Eranian, Ian Rogers

It should be in a proper mnt namespace when accessing the file.

I think this had no problem since the build-id was actually read from
map__load() -> dso__load() already.  But I'd like to change it in the
following commit.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-inject.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 2c5e23d73a8a..670157db2502 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -23,6 +23,7 @@
 #include "util/symbol.h"
 #include "util/synthetic-events.h"
 #include "util/thread.h"
+#include "util/namespaces.h"
 #include <linux/err.h>
 
 #include <subcmd/parse-options.h>
@@ -419,16 +420,19 @@ static int perf_event__repipe_tracing_data(struct perf_session *session,
 
 static int dso__read_build_id(struct dso *dso)
 {
+	struct nscookie nsc;
+
 	if (dso->has_build_id)
 		return 0;
 
+	nsinfo__mountns_enter(dso->nsinfo, &nsc);
 	if (filename__read_build_id(dso->long_name, dso->build_id,
 				    sizeof(dso->build_id)) > 0) {
 		dso->has_build_id = true;
-		return 0;
 	}
+	nsinfo__mountns_exit(&nsc);
 
-	return -1;
+	return dso->has_build_id ? 0 : -1;
 }
 
 static int dso__inject_build_id(struct dso *dso, struct perf_tool *tool,
-- 
2.28.0.681.g6f77f65b4e-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/7] perf inject: Do not load map/dso when injecting build-id
  2020-09-23  8:05 [PATCHSET v2 0/7] perf inject: Speed build-id injection Namhyung Kim
                   ` (2 preceding siblings ...)
  2020-09-23  8:05 ` [PATCH 3/7] perf inject: Enter namespace when reading build-id Namhyung Kim
@ 2020-09-23  8:05 ` Namhyung Kim
  2020-09-24 13:09   ` Jiri Olsa
  2020-09-23  8:05 ` [PATCH 5/7] perf inject: Add --buildid-all option Namhyung Kim
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Namhyung Kim @ 2020-09-23  8:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Mark Rutland, Alexander Shishkin,
	LKML, Stephane Eranian, Ian Rogers

No need to load symbols in a DSO when injecting build-id.  I guess the
reason was to check the DSO is a special file like anon files.  Use
some helper functions in map.c to check them before reading build-id.
Also pass sample event's cpumode to a new build-id event.

It brought a speedup in the benchmark of 22 -> 18 msec on my laptop.

  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 18.259 msec (+- 0.051 msec)
    Average time per event: 1.790 usec (+- 0.005 usec)

Also note that, it reduces memory footprint of perf inject.  I saw
that max RSS went down by ~27MB when processing ~300MB data file.

Original-patch-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-inject.c | 30 ++++++++++--------------------
 tools/perf/util/map.c       | 17 +----------------
 tools/perf/util/map.h       | 14 ++++++++++++++
 3 files changed, 25 insertions(+), 36 deletions(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 670157db2502..d0aa365e7294 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -436,21 +436,22 @@ static int dso__read_build_id(struct dso *dso)
 }
 
 static int dso__inject_build_id(struct dso *dso, struct perf_tool *tool,
-				struct machine *machine)
+				struct machine *machine, u8 cpumode)
 {
-	u16 misc = PERF_RECORD_MISC_USER;
 	int err;
 
+	if (is_anon_memory(dso->long_name))
+		return 0;
+	if (is_no_dso_memory(dso->long_name))
+		return 0;
+
 	if (dso__read_build_id(dso) < 0) {
 		pr_debug("no build_id found for %s\n", dso->long_name);
 		return -1;
 	}
 
-	if (dso->kernel)
-		misc = PERF_RECORD_MISC_KERNEL;
-
-	err = perf_event__synthesize_build_id(tool, dso, misc, perf_event__repipe,
-					      machine);
+	err = perf_event__synthesize_build_id(tool, dso, cpumode,
+					      perf_event__repipe, machine);
 	if (err) {
 		pr_err("Can't synthesize build_id event for %s\n", dso->long_name);
 		return -1;
@@ -477,19 +478,8 @@ int perf_event__inject_buildid(struct perf_tool *tool, union perf_event *event,
 	if (thread__find_map(thread, sample->cpumode, sample->ip, &al)) {
 		if (!al.map->dso->hit) {
 			al.map->dso->hit = 1;
-			if (map__load(al.map) >= 0) {
-				dso__inject_build_id(al.map->dso, tool, machine);
-				/*
-				 * If this fails, too bad, let the other side
-				 * account this as unresolved.
-				 */
-			} else {
-#ifdef HAVE_LIBELF_SUPPORT
-				pr_warning("no symbols found in %s, maybe "
-					   "install a debug package?\n",
-					   al.map->dso->long_name);
-#endif
-			}
+			dso__inject_build_id(al.map->dso, tool, machine,
+					     sample->cpumode);
 		}
 	}
 
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index cc0faf8f1321..8b305e624124 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -27,21 +27,6 @@
 
 static void __maps__insert(struct maps *maps, struct map *map);
 
-static inline int is_anon_memory(const char *filename, u32 flags)
-{
-	return flags & MAP_HUGETLB ||
-	       !strcmp(filename, "//anon") ||
-	       !strncmp(filename, "/dev/zero", sizeof("/dev/zero") - 1) ||
-	       !strncmp(filename, "/anon_hugepage", sizeof("/anon_hugepage") - 1);
-}
-
-static inline int is_no_dso_memory(const char *filename)
-{
-	return !strncmp(filename, "[stack", 6) ||
-	       !strncmp(filename, "/SYSV",5)   ||
-	       !strcmp(filename, "[heap]");
-}
-
 static inline int is_android_lib(const char *filename)
 {
 	return strstarts(filename, "/data/app-lib/") ||
@@ -158,7 +143,7 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
 		int anon, no_dso, vdso, android;
 
 		android = is_android_lib(filename);
-		anon = is_anon_memory(filename, flags);
+		anon = is_anon_memory(filename) || flags & MAP_HUGETLB;
 		vdso = is_vdso_map(filename);
 		no_dso = is_no_dso_memory(filename);
 		map->prot = prot;
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index c2f5d28fe73a..b1c0686db1b7 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -171,4 +171,18 @@ static inline bool is_bpf_image(const char *name)
 	return strncmp(name, "bpf_trampoline_", sizeof("bpf_trampoline_") - 1) == 0 ||
 	       strncmp(name, "bpf_dispatcher_", sizeof("bpf_dispatcher_") - 1) == 0;
 }
+
+static inline int is_anon_memory(const char *filename)
+{
+	return !strcmp(filename, "//anon") ||
+	       !strncmp(filename, "/dev/zero", sizeof("/dev/zero") - 1) ||
+	       !strncmp(filename, "/anon_hugepage", sizeof("/anon_hugepage") - 1);
+}
+
+static inline int is_no_dso_memory(const char *filename)
+{
+	return !strncmp(filename, "[stack", 6) ||
+	       !strncmp(filename, "/SYSV", 5)  ||
+	       !strcmp(filename, "[heap]");
+}
 #endif /* __PERF_MAP_H */
-- 
2.28.0.681.g6f77f65b4e-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 5/7] perf inject: Add --buildid-all option
  2020-09-23  8:05 [PATCHSET v2 0/7] perf inject: Speed build-id injection Namhyung Kim
                   ` (3 preceding siblings ...)
  2020-09-23  8:05 ` [PATCH 4/7] perf inject: Do not load map/dso when injecting build-id Namhyung Kim
@ 2020-09-23  8:05 ` Namhyung Kim
  2020-09-23 22:16   ` Ian Rogers
  2020-09-23  8:05 ` [PATCH 6/7] perf bench: Run inject-build-id with --buildid-all option too Namhyung Kim
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Namhyung Kim @ 2020-09-23  8:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Mark Rutland, Alexander Shishkin,
	LKML, Stephane Eranian, Ian Rogers

Like perf record, we can even more speedup build-id processing by just
using all DSOs.  Then we don't need to look at all the sample events
anymore.  The following patch will update perf bench to show the
result of the --buildid-all option too.

Original-patch-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-inject.txt |   6 +-
 tools/perf/builtin-inject.c              | 112 ++++++++++++++++++++++-
 2 files changed, 112 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/perf-inject.txt b/tools/perf/Documentation/perf-inject.txt
index 70969ea73e01..a8eccff21281 100644
--- a/tools/perf/Documentation/perf-inject.txt
+++ b/tools/perf/Documentation/perf-inject.txt
@@ -24,8 +24,12 @@ information could make use of this facility.
 OPTIONS
 -------
 -b::
---build-ids=::
+--build-ids::
         Inject build-ids into the output stream
+
+--buildid-all:
+	Inject build-ids of all DSOs into the output stream
+
 -v::
 --verbose::
 	Be more verbose.
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index d0aa365e7294..500428aaa576 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -10,6 +10,7 @@
 
 #include "util/color.h"
 #include "util/dso.h"
+#include "util/vdso.h"
 #include "util/evlist.h"
 #include "util/evsel.h"
 #include "util/map.h"
@@ -36,6 +37,7 @@ struct perf_inject {
 	struct perf_tool	tool;
 	struct perf_session	*session;
 	bool			build_ids;
+	bool			build_id_all;
 	bool			sched_stat;
 	bool			have_auxtrace;
 	bool			strip;
@@ -55,6 +57,9 @@ struct event_entry {
 	union perf_event event[];
 };
 
+static int dso__inject_build_id(struct dso *dso, struct perf_tool *tool,
+				struct machine *machine, u8 cpumode);
+
 static int output_bytes(struct perf_inject *inject, void *buf, size_t sz)
 {
 	ssize_t size;
@@ -318,6 +323,68 @@ static int perf_event__jit_repipe_mmap(struct perf_tool *tool,
 }
 #endif
 
+static struct dso *findnew_dso(int pid, int tid, const char *filename,
+			       struct dso_id *id, struct machine *machine)
+{
+	struct thread *thread;
+	struct nsinfo *nsi = NULL;
+	struct nsinfo *nnsi;
+	struct dso *dso;
+	bool vdso;
+
+	thread = machine__findnew_thread(machine, pid, tid);
+	if (thread == NULL) {
+		pr_err("cannot find or create a task %d/%d.\n", tid, pid);
+		return NULL;
+	}
+
+	vdso = is_vdso_map(filename);
+	nsi = nsinfo__get(thread->nsinfo);
+
+	if (vdso) {
+		/* The vdso maps are always on the host and not the
+		 * container.  Ensure that we don't use setns to look
+		 * them up.
+		 */
+		nnsi = nsinfo__copy(nsi);
+		if (nnsi) {
+			nsinfo__put(nsi);
+			nnsi->need_setns = false;
+			nsi = nnsi;
+		}
+		dso = machine__findnew_vdso(machine, thread);
+	} else {
+		dso = machine__findnew_dso_id(machine, filename, id);
+	}
+
+	if (dso)
+		dso->nsinfo = nsi;
+	else
+		nsinfo__put(nsi);
+
+	thread__put(thread);
+	return dso;
+}
+
+static int perf_event__repipe_buildid_mmap(struct perf_tool *tool,
+					   union perf_event *event,
+					   struct perf_sample *sample,
+					   struct machine *machine)
+{
+	struct dso *dso;
+
+	dso = findnew_dso(event->mmap.pid, event->mmap.tid,
+			  event->mmap.filename, NULL, machine);
+
+	if (dso && !dso->hit) {
+		dso->hit = 1;
+		dso__inject_build_id(dso, tool, machine, sample->cpumode);
+		dso__put(dso);
+	}
+
+	return perf_event__repipe(tool, event, sample, machine);
+}
+
 static int perf_event__repipe_mmap2(struct perf_tool *tool,
 				   union perf_event *event,
 				   struct perf_sample *sample,
@@ -356,6 +423,33 @@ static int perf_event__jit_repipe_mmap2(struct perf_tool *tool,
 }
 #endif
 
+static int perf_event__repipe_buildid_mmap2(struct perf_tool *tool,
+					    union perf_event *event,
+					    struct perf_sample *sample,
+					    struct machine *machine)
+{
+	struct dso_id dso_id = {
+		.maj = event->mmap2.maj,
+		.min = event->mmap2.min,
+		.ino = event->mmap2.ino,
+		.ino_generation = event->mmap2.ino_generation,
+	};
+	struct dso *dso;
+
+	dso = findnew_dso(event->mmap2.pid, event->mmap2.tid,
+			  event->mmap2.filename, &dso_id, machine);
+
+	if (dso && !dso->hit) {
+		dso->hit = 1;
+		dso__inject_build_id(dso, tool, machine, sample->cpumode);
+		dso__put(dso);
+	}
+
+	perf_event__repipe(tool, event, sample, machine);
+
+	return 0;
+}
+
 static int perf_event__repipe_fork(struct perf_tool *tool,
 				   union perf_event *event,
 				   struct perf_sample *sample,
@@ -613,7 +707,7 @@ static int __cmd_inject(struct perf_inject *inject)
 	signal(SIGINT, sig_handler);
 
 	if (inject->build_ids || inject->sched_stat ||
-	    inject->itrace_synth_opts.set) {
+	    inject->itrace_synth_opts.set || inject->build_id_all) {
 		inject->tool.mmap	  = perf_event__repipe_mmap;
 		inject->tool.mmap2	  = perf_event__repipe_mmap2;
 		inject->tool.fork	  = perf_event__repipe_fork;
@@ -622,7 +716,10 @@ static int __cmd_inject(struct perf_inject *inject)
 
 	output_data_offset = session->header.data_offset;
 
-	if (inject->build_ids) {
+	if (inject->build_id_all) {
+		inject->tool.mmap	  = perf_event__repipe_buildid_mmap;
+		inject->tool.mmap2	  = perf_event__repipe_buildid_mmap2;
+	} else if (inject->build_ids) {
 		inject->tool.sample = perf_event__inject_buildid;
 	} else if (inject->sched_stat) {
 		struct evsel *evsel;
@@ -766,6 +863,8 @@ int cmd_inject(int argc, const char **argv)
 	struct option options[] = {
 		OPT_BOOLEAN('b', "build-ids", &inject.build_ids,
 			    "Inject build-ids into the output stream"),
+		OPT_BOOLEAN(0, "buildid-all", &inject.build_id_all,
+			    "Inject build-ids of all DSOs into the output stream"),
 		OPT_STRING('i', "input", &inject.input_name, "file",
 			   "input file name"),
 		OPT_STRING('o', "output", &inject.output.path, "file",
@@ -814,8 +913,6 @@ int cmd_inject(int argc, const char **argv)
 		return -1;
 	}
 
-	inject.tool.ordered_events = inject.sched_stat;
-
 	data.path = inject.input_name;
 	inject.session = perf_session__new(&data, true, &inject.tool);
 	if (IS_ERR(inject.session))
@@ -824,7 +921,7 @@ int cmd_inject(int argc, const char **argv)
 	if (zstd_init(&(inject.session->zstd_data), 0) < 0)
 		pr_warning("Decompression initialization failed.\n");
 
-	if (inject.build_ids) {
+	if (inject.build_ids && !inject.build_id_all) {
 		/*
 		 * to make sure the mmap records are ordered correctly
 		 * and so that the correct especially due to jitted code
@@ -834,6 +931,11 @@ int cmd_inject(int argc, const char **argv)
 		inject.tool.ordered_events = true;
 		inject.tool.ordering_requires_timestamps = true;
 	}
+
+	if (inject.sched_stat) {
+		inject.tool.ordered_events = true;
+	}
+
 #ifdef HAVE_JITDUMP
 	if (inject.jit_mode) {
 		inject.tool.mmap2	   = perf_event__jit_repipe_mmap2;
-- 
2.28.0.681.g6f77f65b4e-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 6/7] perf bench: Run inject-build-id with --buildid-all option too
  2020-09-23  8:05 [PATCHSET v2 0/7] perf inject: Speed build-id injection Namhyung Kim
                   ` (4 preceding siblings ...)
  2020-09-23  8:05 ` [PATCH 5/7] perf inject: Add --buildid-all option Namhyung Kim
@ 2020-09-23  8:05 ` Namhyung Kim
  2020-09-23 22:17   ` Ian Rogers
  2020-09-23  8:05 ` [PATCH 7/7] perf inject: Remove stale build-id processing Namhyung Kim
  2020-09-24 13:35 ` [PATCHSET v2 0/7] perf inject: Speed build-id injection Jiri Olsa
  7 siblings, 1 reply; 23+ messages in thread
From: Namhyung Kim @ 2020-09-23  8:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Mark Rutland, Alexander Shishkin,
	LKML, Stephane Eranian, Ian Rogers

For comparison, it now runs the benchmark twice - one if regular -b
and another for --buildid-all.

  $ perf bench internals inject-build-id
  # Running 'internals/inject-build-id' benchmark:
    Average build-id injection took: 18.441 msec (+- 0.106 msec)
    Average time per event: 1.808 usec (+- 0.010 usec)
    Average build-id-all injection took: 13.451 msec (+- 0.132 msec)
    Average time per event: 1.319 usec (+- 0.013 usec)

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/bench/inject-buildid.c | 47 ++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 16 deletions(-)

diff --git a/tools/perf/bench/inject-buildid.c b/tools/perf/bench/inject-buildid.c
index e5144a85d689..7c9f2baecef2 100644
--- a/tools/perf/bench/inject-buildid.c
+++ b/tools/perf/bench/inject-buildid.c
@@ -220,7 +220,7 @@ static void sigpipe_handler(int sig __maybe_unused)
 	/* child exited */
 }
 
-static int setup_injection(struct bench_data *data)
+static int setup_injection(struct bench_data *data, bool build_id_all)
 {
 	int ready_pipe[2];
 	int dev_null_fd;
@@ -241,6 +241,7 @@ static int setup_injection(struct bench_data *data)
 
 	if (data->pid == 0) {
 		const char **inject_argv;
+		int inject_argc = 2;
 
 		close(data->input_pipe[1]);
 		close(data->output_pipe[0]);
@@ -257,17 +258,22 @@ static int setup_injection(struct bench_data *data)
 
 		dup2(dev_null_fd, STDERR_FILENO);
 
-		inject_argv = calloc(3, sizeof(*inject_argv));
+		if (build_id_all)
+			inject_argc++;
+
+		inject_argv = calloc(inject_argc + 1, sizeof(*inject_argv));
 		if (inject_argv == NULL)
 			exit(1);
 
 		inject_argv[0] = strdup("inject");
 		inject_argv[1] = strdup("-b");
+		if (build_id_all)
+			inject_argv[2] = strdup("--buildid-all");
 
 		/* signal that we're ready to go */
 		close(ready_pipe[1]);
 
-		cmd_inject(2, inject_argv);
+		cmd_inject(inject_argc, inject_argv);
 
 		exit(0);
 	}
@@ -348,21 +354,14 @@ static int inject_build_id(struct bench_data *data)
 	return 0;
 }
 
-static int do_inject_loop(struct bench_data *data)
+static void do_inject_loop(struct bench_data *data, bool build_id_all)
 {
 	unsigned int i;
 	struct stats time_stats;
 	double time_average, time_stddev;
 
-	srand(time(NULL));
 	init_stats(&time_stats);
-	symbol__init(NULL);
-
-	collect_dso();
-	if (nr_dsos == 0) {
-		printf("  Cannot collect DSOs for injection\n");
-		return -1;
-	}
+	pr_debug("  Build-id%s injection benchmark\n", build_id_all ? "-all" : "");
 
 	for (i = 0; i < iterations; i++) {
 		struct timeval start, end, diff;
@@ -370,7 +369,7 @@ static int do_inject_loop(struct bench_data *data)
 
 		pr_debug("  Iteration #%d\n", i+1);
 
-		if (setup_injection(data) < 0) {
+		if (setup_injection(data, build_id_all) < 0) {
 			printf("  Build-id injection setup failed\n");
 			break;
 		}
@@ -389,14 +388,30 @@ static int do_inject_loop(struct bench_data *data)
 
 	time_average = avg_stats(&time_stats) / USEC_PER_MSEC;
 	time_stddev = stddev_stats(&time_stats) / USEC_PER_MSEC;
-	printf("  Average build-id injection took: %.3f msec (+- %.3f msec)\n",
-		time_average, time_stddev);
+	printf("  Average build-id%s injection took: %.3f msec (+- %.3f msec)\n",
+	       build_id_all ? "-all" : "", time_average, time_stddev);
 
 	/* each iteration, it processes MMAP2 + BUILD_ID + nr_samples * SAMPLE */
 	time_average = avg_stats(&time_stats) / (nr_mmaps * (nr_samples + 2));
 	time_stddev = stddev_stats(&time_stats) / (nr_mmaps * (nr_samples + 2));
 	printf("  Average time per event: %.3f usec (+- %.3f usec)\n",
 		time_average, time_stddev);
+}
+
+static int do_inject_loops(struct bench_data *data)
+{
+
+	srand(time(NULL));
+	symbol__init(NULL);
+
+	collect_dso();
+	if (nr_dsos == 0) {
+		printf("  Cannot collect DSOs for injection\n");
+		return -1;
+	}
+
+	do_inject_loop(data, false);
+	do_inject_loop(data, true);
 
 	release_dso();
 	return 0;
@@ -412,6 +427,6 @@ int bench_inject_build_id(int argc, const char **argv)
 		exit(EXIT_FAILURE);
 	}
 
-	return do_inject_loop(&data);
+	return do_inject_loops(&data);
 }
 
-- 
2.28.0.681.g6f77f65b4e-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 7/7] perf inject: Remove stale build-id processing
  2020-09-23  8:05 [PATCHSET v2 0/7] perf inject: Speed build-id injection Namhyung Kim
                   ` (5 preceding siblings ...)
  2020-09-23  8:05 ` [PATCH 6/7] perf bench: Run inject-build-id with --buildid-all option too Namhyung Kim
@ 2020-09-23  8:05 ` Namhyung Kim
  2020-09-23 14:36   ` Adrian Hunter
  2020-09-24 13:33   ` Jiri Olsa
  2020-09-24 13:35 ` [PATCHSET v2 0/7] perf inject: Speed build-id injection Jiri Olsa
  7 siblings, 2 replies; 23+ messages in thread
From: Namhyung Kim @ 2020-09-23  8:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Mark Rutland, Alexander Shishkin,
	LKML, Stephane Eranian, Ian Rogers, Adrian Hunter

I think we don't need to call build_id__mark_dso_hit() in the
perf_event__repipe_sample() as it's not used by -b option.  In case of
the -b option is used, it uses perf_event__inject_buildid() instead.
This can remove unnecessary overhead of finding thread/map for each
sample event.

Also I suspect HEADER_BUILD_ID feature bit setting since we already
generated/injected BUILD_ID event into the output stream.  So this
header information seems redundant.  I'm not 100% sure about the
auxtrace usage, but it looks like not related to this directly.

And we now have --buildid-all so users can get the same behavior if
they want.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-inject.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 500428aaa576..0191d72be7c4 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -277,8 +277,6 @@ static int perf_event__repipe_sample(struct perf_tool *tool,
 		return f(tool, event, sample, evsel, machine);
 	}
 
-	build_id__mark_dso_hit(tool, event, sample, evsel, machine);
-
 	if (inject->itrace_synth_opts.set && sample->aux_sample.size)
 		event = perf_inject__cut_auxtrace_sample(inject, event, sample);
 
@@ -767,16 +765,6 @@ static int __cmd_inject(struct perf_inject *inject)
 		return ret;
 
 	if (!data_out->is_pipe) {
-		if (inject->build_ids)
-			perf_header__set_feat(&session->header,
-					      HEADER_BUILD_ID);
-		/*
-		 * Keep all buildids when there is unprocessed AUX data because
-		 * it is not known which ones the AUX trace hits.
-		 */
-		if (perf_header__has_feat(&session->header, HEADER_BUILD_ID) &&
-		    inject->have_auxtrace && !inject->itrace_synth_opts.set)
-			dsos__hit_all(session);
 		/*
 		 * The AUX areas have been removed and replaced with
 		 * synthesized hardware events, so clear the feature flag and
-- 
2.28.0.681.g6f77f65b4e-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/7] perf inject: Remove stale build-id processing
  2020-09-23  8:05 ` [PATCH 7/7] perf inject: Remove stale build-id processing Namhyung Kim
@ 2020-09-23 14:36   ` Adrian Hunter
  2020-09-24  3:51     ` Namhyung Kim
  2020-09-24 13:33   ` Jiri Olsa
  1 sibling, 1 reply; 23+ messages in thread
From: Adrian Hunter @ 2020-09-23 14:36 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Mark Rutland, Alexander Shishkin,
	LKML, Stephane Eranian, Ian Rogers

On 23/09/20 11:05 am, Namhyung Kim wrote:
> I think we don't need to call build_id__mark_dso_hit() in the
> perf_event__repipe_sample() as it's not used by -b option.  In case of
> the -b option is used, it uses perf_event__inject_buildid() instead.
> This can remove unnecessary overhead of finding thread/map for each
> sample event.
> 
> Also I suspect HEADER_BUILD_ID feature bit setting since we already
> generated/injected BUILD_ID event into the output stream.  So this
> header information seems redundant.  I'm not 100% sure about the
> auxtrace usage, but it looks like not related to this directly.
> 
> And we now have --buildid-all so users can get the same behavior if
> they want.

For a perf.data file, don't buildids get written to the HEADER_BUILD_ID
feature section by perf_session__write_header() if the feature flag is set
and if they are hit?

So, unless -b is used, anything you don't hit you lose i.e. a buildid in the
HEADER_BUILD_ID feature section of the input file, will not be written to
the output file.

> 
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-inject.c | 12 ------------
>  1 file changed, 12 deletions(-)
> 
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index 500428aaa576..0191d72be7c4 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -277,8 +277,6 @@ static int perf_event__repipe_sample(struct perf_tool *tool,
>  		return f(tool, event, sample, evsel, machine);
>  	}
>  
> -	build_id__mark_dso_hit(tool, event, sample, evsel, machine);
> -

I guess that chunk would prevent losing a buildid in a perf.data file?

>  	if (inject->itrace_synth_opts.set && sample->aux_sample.size)
>  		event = perf_inject__cut_auxtrace_sample(inject, event, sample);
>  
> @@ -767,16 +765,6 @@ static int __cmd_inject(struct perf_inject *inject)
>  		return ret;
>  
>  	if (!data_out->is_pipe) {
> -		if (inject->build_ids)
> -			perf_header__set_feat(&session->header,
> -					      HEADER_BUILD_ID);

That could be due to confusion with 'perf buildid-list' which will not show
any buildids from synthesized buildid events unless "with hits" is selected,
so then it looks like there are no buildids.

It could be an advantage to have the buildids also in the HEADER_BUILD_ID
feature section, because then then build-list can list them quickly.

> -		/*
> -		 * Keep all buildids when there is unprocessed AUX data because
> -		 * it is not known which ones the AUX trace hits.
> -		 */
> -		if (perf_header__has_feat(&session->header, HEADER_BUILD_ID) &&
> -		    inject->have_auxtrace && !inject->itrace_synth_opts.set)
> -			dsos__hit_all(session);

I expect that is definitely needed.

>  		/*
>  		 * The AUX areas have been removed and replaced with
>  		 * synthesized hardware events, so clear the feature flag and
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/7] perf bench: Add build-id injection benchmark
  2020-09-23  8:05 ` [PATCH 1/7] perf bench: Add build-id injection benchmark Namhyung Kim
@ 2020-09-23 22:13   ` Ian Rogers
  2020-09-24  6:23     ` Namhyung Kim
  0 siblings, 1 reply; 23+ messages in thread
From: Ian Rogers @ 2020-09-23 22:13 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	Mark Rutland, Alexander Shishkin, LKML, Stephane Eranian

On Wed, Sep 23, 2020 at 1:05 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Sometimes I can see perf record piped with perf inject take long time
> processing build-id.  So add inject-build-id benchmark to the
> internals benchmark suite to measure its overhead regularly.
>
> It runs perf inject command internally and feeds the given number of
> synthesized events (MMAP2 + SAMPLE basically).
>
>   Usage: perf bench internals inject-build-id <options>
>
>     -i, --iterations <n>  Number of iterations used to compute average (default: 100)
>     -m, --nr-mmaps <n>    Number of mmap events for each iteration (default: 100)
>     -n, --nr-samples <n>  Number of sample events per mmap event (default: 100)
>     -v, --verbose         be more verbose (show iteration count, DSO name, etc)
>
> By default, it measures average processing time of 100 MMAP2 events
> and 10000 SAMPLE events.  Below is a result on my laptop.
>
>   $ perf bench internals inject-build-id
>   # Running 'internals/inject-build-id' benchmark:
>     Average build-id injection took: 22.997 msec (+- 0.067 msec)
>     Average time per event: 2.255 usec (+- 0.007 usec)

This is great! Some suggestions below.

> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/bench/Build            |   1 +
>  tools/perf/bench/bench.h          |   1 +
>  tools/perf/bench/inject-buildid.c | 417 ++++++++++++++++++++++++++++++
>  tools/perf/builtin-bench.c        |   1 +
>  tools/perf/builtin-inject.c       |   9 +-
>  tools/perf/util/build-id.h        |   4 +
>  6 files changed, 428 insertions(+), 5 deletions(-)
>  create mode 100644 tools/perf/bench/inject-buildid.c
>
> diff --git a/tools/perf/bench/Build b/tools/perf/bench/Build
> index dd68a40a790c..8b52591338d6 100644
> --- a/tools/perf/bench/Build
> +++ b/tools/perf/bench/Build
> @@ -12,6 +12,7 @@ perf-y += epoll-ctl.o
>  perf-y += synthesize.o
>  perf-y += kallsyms-parse.o
>  perf-y += find-bit-bench.o
> +perf-y += inject-buildid.o
>
>  perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-lib.o
>  perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-asm.o
> diff --git a/tools/perf/bench/bench.h b/tools/perf/bench/bench.h
> index 2804812d4154..eac36afab2b3 100644
> --- a/tools/perf/bench/bench.h
> +++ b/tools/perf/bench/bench.h
> @@ -47,6 +47,7 @@ int bench_epoll_wait(int argc, const char **argv);
>  int bench_epoll_ctl(int argc, const char **argv);
>  int bench_synthesize(int argc, const char **argv);
>  int bench_kallsyms_parse(int argc, const char **argv);
> +int bench_inject_build_id(int argc, const char **argv);
>
>  #define BENCH_FORMAT_DEFAULT_STR       "default"
>  #define BENCH_FORMAT_DEFAULT           0
> diff --git a/tools/perf/bench/inject-buildid.c b/tools/perf/bench/inject-buildid.c
> new file mode 100644
> index 000000000000..e5144a85d689
> --- /dev/null
> +++ b/tools/perf/bench/inject-buildid.c
> @@ -0,0 +1,417 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <stdlib.h>
> +#include <stddef.h>
> +#include <ftw.h>
> +#include <fcntl.h>
> +#include <unistd.h>
> +#include <sys/mman.h>
> +#include <sys/wait.h>
> +#include <linux/kernel.h>
> +#include <linux/time64.h>
> +#include <linux/list.h>
> +#include <linux/err.h>
> +#include <internal/lib.h>
> +#include <subcmd/parse-options.h>
> +
> +#include "bench.h"
> +#include "util/data.h"
> +#include "util/stat.h"
> +#include "util/debug.h"
> +#include "util/event.h"
> +#include "util/symbol.h"
> +#include "util/session.h"
> +#include "util/build-id.h"
> +#include "util/synthetic-events.h"
> +
> +#define MMAP_DEV_MAJOR  8
> +
> +static unsigned int iterations = 100;
> +static unsigned int nr_mmaps   = 100;
> +static unsigned int nr_samples = 100;  /* samples per mmap */
> +
> +static u64 bench_sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID;
> +static u16 bench_id_hdr_size = 8;  /* only for pid/tid */
> +
> +struct bench_data {
> +       int                     pid;
> +       int                     input_pipe[2];
> +       int                     output_pipe[2];
> +};
> +
> +struct bench_dso {
> +       struct list_head        list;
> +       char                    *name;
> +       int                     ino;
> +};
> +
> +static int nr_dsos;
> +static LIST_HEAD(dso_list);
> +
> +extern int cmd_inject(int argc, const char *argv[]);
> +
> +static const struct option options[] = {
> +       OPT_UINTEGER('i', "iterations", &iterations,
> +                    "Number of iterations used to compute average (default: 100)"),
> +       OPT_UINTEGER('m', "nr-mmaps", &nr_mmaps,
> +                    "Number of mmap events for each iteration (default: 100)"),
> +       OPT_UINTEGER('n', "nr-samples", &nr_samples,
> +                    "Number of sample events per mmap event (default: 100)"),
> +       OPT_INCR('v', "verbose", &verbose,
> +                "be more verbose (show iteration count, DSO name, etc)"),
> +       OPT_END()
> +};
> +
> +static const char *const bench_usage[] = {
> +       "perf bench internals inject-build-id <options>",
> +       NULL
> +};
> +

Perhaps a comment:
/* Helper for collect_dso that adds the given file as a dso to
dso_list if it contains a buildid. Stops after 4 such dsos.*/

> +static int add_dso(const char *fpath, const struct stat *sb __maybe_unused,
> +                  int typeflag, struct FTW *ftwbuf __maybe_unused)
> +{
> +       struct bench_dso *dso;
> +       unsigned char build_id[BUILD_ID_SIZE];
> +
> +       if (typeflag == FTW_D || typeflag == FTW_SL) {
> +               return 0;
> +       }
> +
> +       if (filename__read_build_id(fpath, build_id, BUILD_ID_SIZE) < 0)
> +               return 0;
> +
> +       dso = malloc(sizeof(*dso));
> +       if (dso == NULL)
> +               return -1;
> +
> +       dso->name = realpath(fpath, NULL);
> +       if (dso->name == NULL) {
> +               free(dso);
> +               return -1;
> +       }
> +
> +       dso->ino = nr_dsos++;
> +       list_add(&dso->list, &dso_list);
> +       pr_debug2("  Adding DSO: %s\n", fpath);
> +
> +       /* stop if we collected 4x DSOs than needed */
> +       if ((unsigned)nr_dsos > 4 * nr_mmaps)
> +               return 1;
> +
> +       return 0;
> +}
> +
> +static void collect_dso(void)
> +{
> +       if (nftw("/usr/lib/", add_dso, 10, FTW_PHYS) < 0)
> +               return;
> +
> +       pr_debug("  Collected %d DSOs\n", nr_dsos);

Should this fail if the count isn't 4?

> +}
> +
> +static void release_dso(void)
> +{
> +       struct bench_dso *dso;
> +
> +       while (!list_empty(&dso_list)) {
> +               dso = list_first_entry(&dso_list, struct bench_dso, list);
> +               list_del(&dso->list);
> +               free(dso->name);
> +               free(dso);
> +       }
> +}
> +

Perhaps a comment and move next to synthesize_mmap.
/* Fake address used by mmap events. */

> +static u64 dso_map_addr(struct bench_dso *dso)
> +{
> +       return 0x400000ULL + dso->ino * 8192ULL;
> +}
> +
> +static u32 synthesize_attr(struct bench_data *data)
> +{
> +       union perf_event event;
> +
> +       memset(&event, 0, sizeof(event.attr) + sizeof(u64));
> +
> +       event.header.type = PERF_RECORD_HEADER_ATTR;
> +       event.header.size = sizeof(event.attr) + sizeof(u64);
> +
> +       event.attr.attr.type = PERF_TYPE_SOFTWARE;
> +       event.attr.attr.config = PERF_COUNT_SW_TASK_CLOCK;
> +       event.attr.attr.exclude_kernel = 1;
> +       event.attr.attr.sample_id_all = 1;
> +       event.attr.attr.sample_type = bench_sample_type;
> +
> +       return writen(data->input_pipe[1], &event, event.header.size);
> +}
> +
> +static u32 synthesize_fork(struct bench_data *data)
> +{
> +       union perf_event event;
> +
> +       memset(&event, 0, sizeof(event.fork) + bench_id_hdr_size);
> +
> +       event.header.type = PERF_RECORD_FORK;
> +       event.header.misc = PERF_RECORD_MISC_FORK_EXEC;
> +       event.header.size = sizeof(event.fork) + bench_id_hdr_size;
> +
> +       event.fork.ppid = 1;
> +       event.fork.ptid = 1;
> +       event.fork.pid = data->pid;
> +       event.fork.tid = data->pid;
> +
> +       return writen(data->input_pipe[1], &event, event.header.size);
> +}
> +
> +static u32 synthesize_mmap(struct bench_data *data, struct bench_dso *dso)
> +{
> +       union perf_event event;
> +       size_t len = offsetof(struct perf_record_mmap2, filename);
> +
> +       len += roundup(strlen(dso->name) + 1, 8) + bench_id_hdr_size;
> +
> +       memset(&event, 0, min(len, sizeof(event.mmap2)));
> +
> +       event.header.type = PERF_RECORD_MMAP2;
> +       event.header.misc = PERF_RECORD_MISC_USER;
> +       event.header.size = len;
> +
> +       event.mmap2.pid = data->pid;
> +       event.mmap2.tid = data->pid;
> +       event.mmap2.maj = MMAP_DEV_MAJOR;
> +       event.mmap2.ino = dso->ino;
> +
> +       strcpy(event.mmap2.filename, dso->name);
> +
> +       event.mmap2.start = dso_map_addr(dso);
> +       event.mmap2.len = 4096;
> +       event.mmap2.prot = PROT_EXEC;
> +
> +       if (len > sizeof(event.mmap2)) {
> +               /* write mmap2 event first */
> +               writen(data->input_pipe[1], &event, len - bench_id_hdr_size);
> +               /* write zero-filled sample id header */
> +               memset(&event, 0, bench_id_hdr_size);
> +               writen(data->input_pipe[1], &event, bench_id_hdr_size);
> +       } else {
> +               writen(data->input_pipe[1], &event, len);
> +       }
> +       return len;
> +}
> +
> +static u32 synthesize_sample(struct bench_data *data, struct bench_dso *dso)
> +{
> +       union perf_event event;
> +       struct perf_sample sample = {
> +               .tid = data->pid,
> +               .pid = data->pid,
> +               .ip = dso_map_addr(dso),
> +       };
> +
> +       event.header.type = PERF_RECORD_SAMPLE;
> +       event.header.misc = PERF_RECORD_MISC_USER;
> +       event.header.size = perf_event__sample_event_size(&sample, bench_sample_type, 0);
> +
> +       perf_event__synthesize_sample(&event, bench_sample_type, 0, &sample);
> +
> +       return writen(data->input_pipe[1], &event, event.header.size);
> +}
> +
> +static void sigpipe_handler(int sig __maybe_unused)
> +{
> +       /* child exited */
> +}
> +
> +static int setup_injection(struct bench_data *data)
> +{
> +       int ready_pipe[2];
> +       int dev_null_fd;
> +       char buf;
> +
> +       if (pipe(ready_pipe) < 0)
> +               return -1;
> +
> +       if (pipe(data->input_pipe) < 0)
> +               return -1;
> +
> +       if (pipe(data->output_pipe) < 0)
> +               return -1;
> +
> +       data->pid = fork();
> +       if (data->pid < 0)
> +               return -1;
> +
> +       if (data->pid == 0) {
> +               const char **inject_argv;
> +
> +               close(data->input_pipe[1]);
> +               close(data->output_pipe[0]);
> +               close(ready_pipe[0]);
> +
> +               dup2(data->input_pipe[0], STDIN_FILENO);
> +               close(data->input_pipe[0]);
> +               dup2(data->output_pipe[1], STDOUT_FILENO);
> +               close(data->output_pipe[1]);
> +
> +               dev_null_fd = open("/dev/null", O_WRONLY);
> +               if (dev_null_fd < 0)
> +                       exit(1);
> +
> +               dup2(dev_null_fd, STDERR_FILENO);
> +
> +               inject_argv = calloc(3, sizeof(*inject_argv));
> +               if (inject_argv == NULL)
> +                       exit(1);
> +
> +               inject_argv[0] = strdup("inject");
> +               inject_argv[1] = strdup("-b");
> +
> +               /* signal that we're ready to go */
> +               close(ready_pipe[1]);
> +
> +               cmd_inject(2, inject_argv);
> +
> +               exit(0);
> +       }
> +
> +       signal(SIGPIPE, sigpipe_handler);
> +
> +       close(ready_pipe[1]);
> +       close(data->input_pipe[0]);
> +       close(data->output_pipe[1]);
> +
> +       /* wait for child ready */
> +       if (read(ready_pipe[0], &buf, 1) < 0)
> +               return -1;
> +       close(ready_pipe[0]);
> +
> +       return 0;
> +}

This feels like generic scaffolding that could be shared by other perf
command benchmarks.

> +
> +static int inject_build_id(struct bench_data *data)
> +{
> +       int flag, status;
> +       unsigned int i, k;
> +       char buf[8192];
> +       u64 nread = 0;
> +       u64 len = nr_mmaps / 2 * sizeof(struct perf_record_header_build_id);
> +
> +       flag = fcntl(data->output_pipe[0], F_GETFL, 0);
> +       if (fcntl(data->output_pipe[0], F_SETFL, flag | O_NONBLOCK) < 0)
> +               return -1;
> +
> +       /* this makes the child to run */
> +       if (perf_header__write_pipe(data->input_pipe[1]) < 0)
> +               return -1;
> +
> +       len += synthesize_attr(data);
> +       len += synthesize_fork(data);
> +
> +       for (i = 0; i < nr_mmaps; i++) {
> +               struct bench_dso *dso;
> +               int idx = rand() % (nr_dsos - 1);
> +
> +               dso = list_first_entry(&dso_list, struct bench_dso, list);
> +               while (idx--)
> +                       dso = list_next_entry(dso, list);
> +
> +               pr_debug("   [%2d] injecting: %s\n", i+1, dso->name);
> +               len += synthesize_mmap(data, dso);
> +
> +               for (k = 0; k < nr_samples; k++)
> +                       len += synthesize_sample(data, dso);
> +
> +               /* read out data from child */
> +               while (true) {
> +                       int n;
> +
> +                       n = read(data->output_pipe[0], buf, sizeof(buf));
> +                       if (n <= 0)
> +                               break;
> +                       nread += n;
> +               }
> +       }
> +
> +       /* wait to read data at least as we wrote + some build-ids */
> +       while (nread < len) {
> +               int n;
> +
> +               n = read(data->output_pipe[0], buf, sizeof(buf));
> +               if (n < 0)
> +                       break;
> +               nread += n;
> +       }
> +       close(data->input_pipe[1]);
> +       close(data->output_pipe[0]);
> +
> +       wait(&status);
> +       pr_debug("   Child %d exited with %d\n", data->pid, status);
> +
> +       return 0;
> +}
>

Perhaps we can read the highwater mark (VmHWM) from /proc/[pid]/status
as this would capture cases like buildid injection doing unnecessary
symbol generation.

> +static int do_inject_loop(struct bench_data *data)
> +{
> +       unsigned int i;
> +       struct stats time_stats;
> +       double time_average, time_stddev;
> +
> +       srand(time(NULL));
> +       init_stats(&time_stats);
> +       symbol__init(NULL);
> +
> +       collect_dso();
> +       if (nr_dsos == 0) {
> +               printf("  Cannot collect DSOs for injection\n");
> +               return -1;
> +       }
> +
> +       for (i = 0; i < iterations; i++) {
> +               struct timeval start, end, diff;
> +               u64 runtime_us;
> +
> +               pr_debug("  Iteration #%d\n", i+1);
> +
> +               if (setup_injection(data) < 0) {
> +                       printf("  Build-id injection setup failed\n");
> +                       break;
> +               }
> +
> +               gettimeofday(&start, NULL);
> +               if (inject_build_id(data) < 0) {
> +                       printf("  Build-id injection failed\n");
> +                       break;
> +               }
> +
> +               gettimeofday(&end, NULL);
> +               timersub(&end, &start, &diff);
> +               runtime_us = diff.tv_sec * USEC_PER_SEC + diff.tv_usec;
> +               update_stats(&time_stats, runtime_us);
> +       }
> +
> +       time_average = avg_stats(&time_stats) / USEC_PER_MSEC;
> +       time_stddev = stddev_stats(&time_stats) / USEC_PER_MSEC;
> +       printf("  Average build-id injection took: %.3f msec (+- %.3f msec)\n",
> +               time_average, time_stddev);
> +
> +       /* each iteration, it processes MMAP2 + BUILD_ID + nr_samples * SAMPLE */
> +       time_average = avg_stats(&time_stats) / (nr_mmaps * (nr_samples + 2));
> +       time_stddev = stddev_stats(&time_stats) / (nr_mmaps * (nr_samples + 2));
> +       printf("  Average time per event: %.3f usec (+- %.3f usec)\n",
> +               time_average, time_stddev);
> +       release_dso();
> +       return 0;
> +}
> +
> +int bench_inject_build_id(int argc, const char **argv)
> +{
> +       struct bench_data data;
> +
> +       argc = parse_options(argc, argv, options, bench_usage, 0);
> +       if (argc) {
> +               usage_with_options(bench_usage, options);
> +               exit(EXIT_FAILURE);
> +       }
> +
> +       return do_inject_loop(&data);
> +}
> +
> diff --git a/tools/perf/builtin-bench.c b/tools/perf/builtin-bench.c
> index 4f176039fc8f..62a7b7420a44 100644
> --- a/tools/perf/builtin-bench.c
> +++ b/tools/perf/builtin-bench.c
> @@ -87,6 +87,7 @@ static struct bench epoll_benchmarks[] = {
>  static struct bench internals_benchmarks[] = {
>         { "synthesize", "Benchmark perf event synthesis",       bench_synthesize        },
>         { "kallsyms-parse", "Benchmark kallsyms parsing",       bench_kallsyms_parse    },
> +       { "inject-build-id", "Benchmark build-id injection",    bench_inject_build_id   },
>         { NULL,         NULL,                                   NULL                    }
>  };
>
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index 6d2f410d773a..e4d78f11494e 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -441,11 +441,10 @@ static int dso__inject_build_id(struct dso *dso, struct perf_tool *tool,
>         return 0;
>  }
>
> -static int perf_event__inject_buildid(struct perf_tool *tool,
> -                                     union perf_event *event,
> -                                     struct perf_sample *sample,
> -                                     struct evsel *evsel __maybe_unused,
> -                                     struct machine *machine)
> +int perf_event__inject_buildid(struct perf_tool *tool, union perf_event *event,
> +                              struct perf_sample *sample,
> +                              struct evsel *evsel __maybe_unused,
> +                              struct machine *machine)
>  {
>         struct addr_location al;
>         struct thread *thread;
> diff --git a/tools/perf/util/build-id.h b/tools/perf/util/build-id.h
> index aad419bb165c..949f7e54c9cb 100644
> --- a/tools/perf/util/build-id.h
> +++ b/tools/perf/util/build-id.h
> @@ -29,6 +29,10 @@ int build_id__mark_dso_hit(struct perf_tool *tool, union perf_event *event,
>
>  int dsos__hit_all(struct perf_session *session);
>
> +int perf_event__inject_buildid(struct perf_tool *tool, union perf_event *event,
> +                              struct perf_sample *sample, struct evsel *evsel,
> +                              struct machine *machine);
> +
>  bool perf_session__read_build_ids(struct perf_session *session, bool with_hits);
>  int perf_session__write_buildid_table(struct perf_session *session,
>                                       struct feat_fd *fd);
> --
> 2.28.0.681.g6f77f65b4e-goog
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/7] perf inject: Add --buildid-all option
  2020-09-23  8:05 ` [PATCH 5/7] perf inject: Add --buildid-all option Namhyung Kim
@ 2020-09-23 22:16   ` Ian Rogers
  0 siblings, 0 replies; 23+ messages in thread
From: Ian Rogers @ 2020-09-23 22:16 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	Mark Rutland, Alexander Shishkin, LKML, Stephane Eranian

On Wed, Sep 23, 2020 at 1:06 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Like perf record, we can even more speedup build-id processing by just
> using all DSOs.  Then we don't need to look at all the sample events
> anymore.  The following patch will update perf bench to show the
> result of the --buildid-all option too.
>
> Original-patch-by: Stephane Eranian <eranian@google.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/Documentation/perf-inject.txt |   6 +-
>  tools/perf/builtin-inject.c              | 112 ++++++++++++++++++++++-
>  2 files changed, 112 insertions(+), 6 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-inject.txt b/tools/perf/Documentation/perf-inject.txt
> index 70969ea73e01..a8eccff21281 100644
> --- a/tools/perf/Documentation/perf-inject.txt
> +++ b/tools/perf/Documentation/perf-inject.txt
> @@ -24,8 +24,12 @@ information could make use of this facility.
>  OPTIONS
>  -------
>  -b::
> ---build-ids=::
> +--build-ids::
>          Inject build-ids into the output stream
> +
> +--buildid-all:
> +       Inject build-ids of all DSOs into the output stream
> +
>  -v::
>  --verbose::
>         Be more verbose.
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index d0aa365e7294..500428aaa576 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -10,6 +10,7 @@
>
>  #include "util/color.h"
>  #include "util/dso.h"
> +#include "util/vdso.h"
>  #include "util/evlist.h"
>  #include "util/evsel.h"
>  #include "util/map.h"
> @@ -36,6 +37,7 @@ struct perf_inject {
>         struct perf_tool        tool;
>         struct perf_session     *session;
>         bool                    build_ids;
> +       bool                    build_id_all;
>         bool                    sched_stat;
>         bool                    have_auxtrace;
>         bool                    strip;
> @@ -55,6 +57,9 @@ struct event_entry {
>         union perf_event event[];
>  };
>
> +static int dso__inject_build_id(struct dso *dso, struct perf_tool *tool,
> +                               struct machine *machine, u8 cpumode);
> +
>  static int output_bytes(struct perf_inject *inject, void *buf, size_t sz)
>  {
>         ssize_t size;
> @@ -318,6 +323,68 @@ static int perf_event__jit_repipe_mmap(struct perf_tool *tool,
>  }
>  #endif
>
> +static struct dso *findnew_dso(int pid, int tid, const char *filename,
> +                              struct dso_id *id, struct machine *machine)
> +{
> +       struct thread *thread;
> +       struct nsinfo *nsi = NULL;
> +       struct nsinfo *nnsi;
> +       struct dso *dso;
> +       bool vdso;
> +
> +       thread = machine__findnew_thread(machine, pid, tid);
> +       if (thread == NULL) {
> +               pr_err("cannot find or create a task %d/%d.\n", tid, pid);
> +               return NULL;
> +       }
> +
> +       vdso = is_vdso_map(filename);
> +       nsi = nsinfo__get(thread->nsinfo);
> +
> +       if (vdso) {
> +               /* The vdso maps are always on the host and not the
> +                * container.  Ensure that we don't use setns to look
> +                * them up.
> +                */
> +               nnsi = nsinfo__copy(nsi);
> +               if (nnsi) {
> +                       nsinfo__put(nsi);
> +                       nnsi->need_setns = false;
> +                       nsi = nnsi;
> +               }
> +               dso = machine__findnew_vdso(machine, thread);
> +       } else {
> +               dso = machine__findnew_dso_id(machine, filename, id);
> +       }
> +
> +       if (dso)
> +               dso->nsinfo = nsi;
> +       else
> +               nsinfo__put(nsi);
> +
> +       thread__put(thread);
> +       return dso;
> +}
> +
> +static int perf_event__repipe_buildid_mmap(struct perf_tool *tool,
> +                                          union perf_event *event,
> +                                          struct perf_sample *sample,
> +                                          struct machine *machine)
> +{
> +       struct dso *dso;
> +
> +       dso = findnew_dso(event->mmap.pid, event->mmap.tid,
> +                         event->mmap.filename, NULL, machine);
> +
> +       if (dso && !dso->hit) {
> +               dso->hit = 1;
> +               dso__inject_build_id(dso, tool, machine, sample->cpumode);
> +               dso__put(dso);
> +       }
> +
> +       return perf_event__repipe(tool, event, sample, machine);
> +}
> +
>  static int perf_event__repipe_mmap2(struct perf_tool *tool,
>                                    union perf_event *event,
>                                    struct perf_sample *sample,
> @@ -356,6 +423,33 @@ static int perf_event__jit_repipe_mmap2(struct perf_tool *tool,
>  }
>  #endif
>
> +static int perf_event__repipe_buildid_mmap2(struct perf_tool *tool,
> +                                           union perf_event *event,
> +                                           struct perf_sample *sample,
> +                                           struct machine *machine)
> +{
> +       struct dso_id dso_id = {
> +               .maj = event->mmap2.maj,
> +               .min = event->mmap2.min,
> +               .ino = event->mmap2.ino,
> +               .ino_generation = event->mmap2.ino_generation,
> +       };
> +       struct dso *dso;
> +
> +       dso = findnew_dso(event->mmap2.pid, event->mmap2.tid,
> +                         event->mmap2.filename, &dso_id, machine);
> +
> +       if (dso && !dso->hit) {
> +               dso->hit = 1;
> +               dso__inject_build_id(dso, tool, machine, sample->cpumode);
> +               dso__put(dso);
> +       }
> +
> +       perf_event__repipe(tool, event, sample, machine);
> +
> +       return 0;
> +}
> +
>  static int perf_event__repipe_fork(struct perf_tool *tool,
>                                    union perf_event *event,
>                                    struct perf_sample *sample,
> @@ -613,7 +707,7 @@ static int __cmd_inject(struct perf_inject *inject)
>         signal(SIGINT, sig_handler);
>
>         if (inject->build_ids || inject->sched_stat ||
> -           inject->itrace_synth_opts.set) {
> +           inject->itrace_synth_opts.set || inject->build_id_all) {
>                 inject->tool.mmap         = perf_event__repipe_mmap;
>                 inject->tool.mmap2        = perf_event__repipe_mmap2;
>                 inject->tool.fork         = perf_event__repipe_fork;
> @@ -622,7 +716,10 @@ static int __cmd_inject(struct perf_inject *inject)
>
>         output_data_offset = session->header.data_offset;
>
> -       if (inject->build_ids) {
> +       if (inject->build_id_all) {
> +               inject->tool.mmap         = perf_event__repipe_buildid_mmap;
> +               inject->tool.mmap2        = perf_event__repipe_buildid_mmap2;
> +       } else if (inject->build_ids) {
>                 inject->tool.sample = perf_event__inject_buildid;
>         } else if (inject->sched_stat) {
>                 struct evsel *evsel;
> @@ -766,6 +863,8 @@ int cmd_inject(int argc, const char **argv)
>         struct option options[] = {
>                 OPT_BOOLEAN('b', "build-ids", &inject.build_ids,
>                             "Inject build-ids into the output stream"),
> +               OPT_BOOLEAN(0, "buildid-all", &inject.build_id_all,
> +                           "Inject build-ids of all DSOs into the output stream"),
>                 OPT_STRING('i', "input", &inject.input_name, "file",
>                            "input file name"),
>                 OPT_STRING('o', "output", &inject.output.path, "file",
> @@ -814,8 +913,6 @@ int cmd_inject(int argc, const char **argv)
>                 return -1;
>         }
>
> -       inject.tool.ordered_events = inject.sched_stat;
> -
>         data.path = inject.input_name;
>         inject.session = perf_session__new(&data, true, &inject.tool);
>         if (IS_ERR(inject.session))
> @@ -824,7 +921,7 @@ int cmd_inject(int argc, const char **argv)
>         if (zstd_init(&(inject.session->zstd_data), 0) < 0)
>                 pr_warning("Decompression initialization failed.\n");
>
> -       if (inject.build_ids) {
> +       if (inject.build_ids && !inject.build_id_all) {
>                 /*
>                  * to make sure the mmap records are ordered correctly
>                  * and so that the correct especially due to jitted code
> @@ -834,6 +931,11 @@ int cmd_inject(int argc, const char **argv)
>                 inject.tool.ordered_events = true;
>                 inject.tool.ordering_requires_timestamps = true;
>         }
> +
> +       if (inject.sched_stat) {
> +               inject.tool.ordered_events = true;
> +       }
> +
>  #ifdef HAVE_JITDUMP
>         if (inject.jit_mode) {
>                 inject.tool.mmap2          = perf_event__jit_repipe_mmap2;
> --
> 2.28.0.681.g6f77f65b4e-goog
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 6/7] perf bench: Run inject-build-id with --buildid-all option too
  2020-09-23  8:05 ` [PATCH 6/7] perf bench: Run inject-build-id with --buildid-all option too Namhyung Kim
@ 2020-09-23 22:17   ` Ian Rogers
  0 siblings, 0 replies; 23+ messages in thread
From: Ian Rogers @ 2020-09-23 22:17 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	Mark Rutland, Alexander Shishkin, LKML, Stephane Eranian

On Wed, Sep 23, 2020 at 1:06 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> For comparison, it now runs the benchmark twice - one if regular -b
> and another for --buildid-all.
>
>   $ perf bench internals inject-build-id
>   # Running 'internals/inject-build-id' benchmark:
>     Average build-id injection took: 18.441 msec (+- 0.106 msec)
>     Average time per event: 1.808 usec (+- 0.010 usec)
>     Average build-id-all injection took: 13.451 msec (+- 0.132 msec)
>     Average time per event: 1.319 usec (+- 0.013 usec)
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/bench/inject-buildid.c | 47 ++++++++++++++++++++-----------
>  1 file changed, 31 insertions(+), 16 deletions(-)
>
> diff --git a/tools/perf/bench/inject-buildid.c b/tools/perf/bench/inject-buildid.c
> index e5144a85d689..7c9f2baecef2 100644
> --- a/tools/perf/bench/inject-buildid.c
> +++ b/tools/perf/bench/inject-buildid.c
> @@ -220,7 +220,7 @@ static void sigpipe_handler(int sig __maybe_unused)
>         /* child exited */
>  }
>
> -static int setup_injection(struct bench_data *data)
> +static int setup_injection(struct bench_data *data, bool build_id_all)
>  {
>         int ready_pipe[2];
>         int dev_null_fd;
> @@ -241,6 +241,7 @@ static int setup_injection(struct bench_data *data)
>
>         if (data->pid == 0) {
>                 const char **inject_argv;
> +               int inject_argc = 2;
>
>                 close(data->input_pipe[1]);
>                 close(data->output_pipe[0]);
> @@ -257,17 +258,22 @@ static int setup_injection(struct bench_data *data)
>
>                 dup2(dev_null_fd, STDERR_FILENO);
>
> -               inject_argv = calloc(3, sizeof(*inject_argv));
> +               if (build_id_all)
> +                       inject_argc++;
> +
> +               inject_argv = calloc(inject_argc + 1, sizeof(*inject_argv));
>                 if (inject_argv == NULL)
>                         exit(1);
>
>                 inject_argv[0] = strdup("inject");
>                 inject_argv[1] = strdup("-b");
> +               if (build_id_all)
> +                       inject_argv[2] = strdup("--buildid-all");
>
>                 /* signal that we're ready to go */
>                 close(ready_pipe[1]);
>
> -               cmd_inject(2, inject_argv);
> +               cmd_inject(inject_argc, inject_argv);
>
>                 exit(0);
>         }
> @@ -348,21 +354,14 @@ static int inject_build_id(struct bench_data *data)
>         return 0;
>  }
>
> -static int do_inject_loop(struct bench_data *data)
> +static void do_inject_loop(struct bench_data *data, bool build_id_all)
>  {
>         unsigned int i;
>         struct stats time_stats;
>         double time_average, time_stddev;
>
> -       srand(time(NULL));
>         init_stats(&time_stats);
> -       symbol__init(NULL);
> -
> -       collect_dso();
> -       if (nr_dsos == 0) {
> -               printf("  Cannot collect DSOs for injection\n");
> -               return -1;
> -       }
> +       pr_debug("  Build-id%s injection benchmark\n", build_id_all ? "-all" : "");
>
>         for (i = 0; i < iterations; i++) {
>                 struct timeval start, end, diff;
> @@ -370,7 +369,7 @@ static int do_inject_loop(struct bench_data *data)
>
>                 pr_debug("  Iteration #%d\n", i+1);
>
> -               if (setup_injection(data) < 0) {
> +               if (setup_injection(data, build_id_all) < 0) {
>                         printf("  Build-id injection setup failed\n");
>                         break;
>                 }
> @@ -389,14 +388,30 @@ static int do_inject_loop(struct bench_data *data)
>
>         time_average = avg_stats(&time_stats) / USEC_PER_MSEC;
>         time_stddev = stddev_stats(&time_stats) / USEC_PER_MSEC;
> -       printf("  Average build-id injection took: %.3f msec (+- %.3f msec)\n",
> -               time_average, time_stddev);
> +       printf("  Average build-id%s injection took: %.3f msec (+- %.3f msec)\n",
> +              build_id_all ? "-all" : "", time_average, time_stddev);
>
>         /* each iteration, it processes MMAP2 + BUILD_ID + nr_samples * SAMPLE */
>         time_average = avg_stats(&time_stats) / (nr_mmaps * (nr_samples + 2));
>         time_stddev = stddev_stats(&time_stats) / (nr_mmaps * (nr_samples + 2));
>         printf("  Average time per event: %.3f usec (+- %.3f usec)\n",
>                 time_average, time_stddev);
> +}
> +
> +static int do_inject_loops(struct bench_data *data)
> +{
> +
> +       srand(time(NULL));
> +       symbol__init(NULL);
> +
> +       collect_dso();
> +       if (nr_dsos == 0) {
> +               printf("  Cannot collect DSOs for injection\n");
> +               return -1;
> +       }
> +
> +       do_inject_loop(data, false);
> +       do_inject_loop(data, true);
>
>         release_dso();
>         return 0;
> @@ -412,6 +427,6 @@ int bench_inject_build_id(int argc, const char **argv)
>                 exit(EXIT_FAILURE);
>         }
>
> -       return do_inject_loop(&data);
> +       return do_inject_loops(&data);
>  }
>
> --
> 2.28.0.681.g6f77f65b4e-goog
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/7] perf inject: Remove stale build-id processing
  2020-09-23 14:36   ` Adrian Hunter
@ 2020-09-24  3:51     ` Namhyung Kim
  0 siblings, 0 replies; 23+ messages in thread
From: Namhyung Kim @ 2020-09-24  3:51 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	Mark Rutland, Alexander Shishkin, LKML, Stephane Eranian,
	Ian Rogers

Hi Adrian,

Thanks for your review!

On Wed, Sep 23, 2020 at 11:36 PM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> On 23/09/20 11:05 am, Namhyung Kim wrote:
> > I think we don't need to call build_id__mark_dso_hit() in the
> > perf_event__repipe_sample() as it's not used by -b option.  In case of
> > the -b option is used, it uses perf_event__inject_buildid() instead.
> > This can remove unnecessary overhead of finding thread/map for each
> > sample event.
> >
> > Also I suspect HEADER_BUILD_ID feature bit setting since we already
> > generated/injected BUILD_ID event into the output stream.  So this
> > header information seems redundant.  I'm not 100% sure about the
> > auxtrace usage, but it looks like not related to this directly.
> >
> > And we now have --buildid-all so users can get the same behavior if
> > they want.
>
> For a perf.data file, don't buildids get written to the HEADER_BUILD_ID
> feature section by perf_session__write_header() if the feature flag is set
> and if they are hit?

Right, that's what perf record does unless -B option is used.
But I'm more concerned about the pipe usage which doesn't use the header.

>
> So, unless -b is used, anything you don't hit you lose i.e. a buildid in the
> HEADER_BUILD_ID feature section of the input file, will not be written to
> the output file.

But perf inject generates PERF_RECORD_HEADER_BUILD_ID events
and puts them into the data stream when -b option is used.

Do you say perf inject should process build-id even when -b is not used?

> >
> > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/builtin-inject.c | 12 ------------
> >  1 file changed, 12 deletions(-)
> >
> > diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> > index 500428aaa576..0191d72be7c4 100644
> > --- a/tools/perf/builtin-inject.c
> > +++ b/tools/perf/builtin-inject.c
> > @@ -277,8 +277,6 @@ static int perf_event__repipe_sample(struct perf_tool *tool,
> >               return f(tool, event, sample, evsel, machine);
> >       }
> >
> > -     build_id__mark_dso_hit(tool, event, sample, evsel, machine);
> > -
>
> I guess that chunk would prevent losing a buildid in a perf.data file?
>
> >       if (inject->itrace_synth_opts.set && sample->aux_sample.size)
> >               event = perf_inject__cut_auxtrace_sample(inject, event, sample);
> >
> > @@ -767,16 +765,6 @@ static int __cmd_inject(struct perf_inject *inject)
> >               return ret;
> >
> >       if (!data_out->is_pipe) {
> > -             if (inject->build_ids)
> > -                     perf_header__set_feat(&session->header,
> > -                                           HEADER_BUILD_ID);
>
> That could be due to confusion with 'perf buildid-list' which will not show
> any buildids from synthesized buildid events unless "with hits" is selected,
> so then it looks like there are no buildids.

Yeah, it's confusing.. I think we should change the behavior to handle
the pipe case properly.

>
> It could be an advantage to have the buildids also in the HEADER_BUILD_ID
> feature section, because then then build-list can list them quickly.

I'm not sure it's good to have duplicate build-id info.
We may add an option just to update the header section and
not inject BUILD_ID events.

>
> > -             /*
> > -              * Keep all buildids when there is unprocessed AUX data because
> > -              * it is not known which ones the AUX trace hits.
> > -              */
> > -             if (perf_header__has_feat(&session->header, HEADER_BUILD_ID) &&
> > -                 inject->have_auxtrace && !inject->itrace_synth_opts.set)
> > -                     dsos__hit_all(session);
>
> I expect that is definitely needed.

OK.

Thanks
Namhyung

>
> >               /*
> >                * The AUX areas have been removed and replaced with
> >                * synthesized hardware events, so clear the feature flag and
> >
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/7] perf bench: Add build-id injection benchmark
  2020-09-23 22:13   ` Ian Rogers
@ 2020-09-24  6:23     ` Namhyung Kim
  0 siblings, 0 replies; 23+ messages in thread
From: Namhyung Kim @ 2020-09-24  6:23 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	Mark Rutland, Alexander Shishkin, LKML, Stephane Eranian

Hi Ian,

On Thu, Sep 24, 2020 at 7:13 AM Ian Rogers <irogers@google.com> wrote:
>
> On Wed, Sep 23, 2020 at 1:05 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Sometimes I can see perf record piped with perf inject take long time
> > processing build-id.  So add inject-build-id benchmark to the
> > internals benchmark suite to measure its overhead regularly.
> >
> > It runs perf inject command internally and feeds the given number of
> > synthesized events (MMAP2 + SAMPLE basically).
> >
> >   Usage: perf bench internals inject-build-id <options>
> >
> >     -i, --iterations <n>  Number of iterations used to compute average (default: 100)
> >     -m, --nr-mmaps <n>    Number of mmap events for each iteration (default: 100)
> >     -n, --nr-samples <n>  Number of sample events per mmap event (default: 100)
> >     -v, --verbose         be more verbose (show iteration count, DSO name, etc)
> >
> > By default, it measures average processing time of 100 MMAP2 events
> > and 10000 SAMPLE events.  Below is a result on my laptop.
> >
> >   $ perf bench internals inject-build-id
> >   # Running 'internals/inject-build-id' benchmark:
> >     Average build-id injection took: 22.997 msec (+- 0.067 msec)
> >     Average time per event: 2.255 usec (+- 0.007 usec)
>
> This is great! Some suggestions below.

Thanks!

>
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
[SNIP]
> > +
> > +static const char *const bench_usage[] = {
> > +       "perf bench internals inject-build-id <options>",
> > +       NULL
> > +};
> > +
>
> Perhaps a comment:
> /* Helper for collect_dso that adds the given file as a dso to
> dso_list if it contains a buildid. Stops after 4 such dsos.*/

Will add.. please see below.

>
> > +static int add_dso(const char *fpath, const struct stat *sb __maybe_unused,
> > +                  int typeflag, struct FTW *ftwbuf __maybe_unused)
> > +{
> > +       struct bench_dso *dso;
> > +       unsigned char build_id[BUILD_ID_SIZE];
> > +
> > +       if (typeflag == FTW_D || typeflag == FTW_SL) {
> > +               return 0;
> > +       }
> > +
> > +       if (filename__read_build_id(fpath, build_id, BUILD_ID_SIZE) < 0)
> > +               return 0;
> > +
> > +       dso = malloc(sizeof(*dso));
> > +       if (dso == NULL)
> > +               return -1;
> > +
> > +       dso->name = realpath(fpath, NULL);
> > +       if (dso->name == NULL) {
> > +               free(dso);
> > +               return -1;
> > +       }
> > +
> > +       dso->ino = nr_dsos++;
> > +       list_add(&dso->list, &dso_list);
> > +       pr_debug2("  Adding DSO: %s\n", fpath);
> > +
> > +       /* stop if we collected 4x DSOs than needed */
> > +       if ((unsigned)nr_dsos > 4 * nr_mmaps)
> > +               return 1;
> > +
> > +       return 0;
> > +}
> > +
> > +static void collect_dso(void)
> > +{
> > +       if (nftw("/usr/lib/", add_dso, 10, FTW_PHYS) < 0)
> > +               return;
> > +
> > +       pr_debug("  Collected %d DSOs\n", nr_dsos);
>
> Should this fail if the count isn't 4?

The add_dso would stop if it collected enough DSOs.
I chose it as 4 x nr_mmaps (default: 100).

It's gonna pick a DSO in the list randomly during benchmark
and I want to reduce the chance it selects the same one in the
same iteration. So instead of having nr_mmaps DSOs, it keeps
4 times more DSOs than needed.

>
> > +}
> > +
> > +static void release_dso(void)
> > +{
> > +       struct bench_dso *dso;
> > +
> > +       while (!list_empty(&dso_list)) {
> > +               dso = list_first_entry(&dso_list, struct bench_dso, list);
> > +               list_del(&dso->list);
> > +               free(dso->name);
> > +               free(dso);
> > +       }
> > +}
> > +
>
> Perhaps a comment and move next to synthesize_mmap.
> /* Fake address used by mmap events. */

OK, will do.  (and it's used by sample events too)

>
> > +static u64 dso_map_addr(struct bench_dso *dso)
> > +{
> > +       return 0x400000ULL + dso->ino * 8192ULL;
> > +}
[SNIP]

> > +static int setup_injection(struct bench_data *data)
> > +{
> > +       int ready_pipe[2];
> > +       int dev_null_fd;
> > +       char buf;
> > +
> > +       if (pipe(ready_pipe) < 0)
> > +               return -1;
> > +
> > +       if (pipe(data->input_pipe) < 0)
> > +               return -1;
> > +
> > +       if (pipe(data->output_pipe) < 0)
> > +               return -1;
> > +
> > +       data->pid = fork();
> > +       if (data->pid < 0)
> > +               return -1;
> > +
> > +       if (data->pid == 0) {
> > +               const char **inject_argv;
> > +
> > +               close(data->input_pipe[1]);
> > +               close(data->output_pipe[0]);
> > +               close(ready_pipe[0]);
> > +
> > +               dup2(data->input_pipe[0], STDIN_FILENO);
> > +               close(data->input_pipe[0]);
> > +               dup2(data->output_pipe[1], STDOUT_FILENO);
> > +               close(data->output_pipe[1]);
> > +
> > +               dev_null_fd = open("/dev/null", O_WRONLY);
> > +               if (dev_null_fd < 0)
> > +                       exit(1);
> > +
> > +               dup2(dev_null_fd, STDERR_FILENO);
> > +
> > +               inject_argv = calloc(3, sizeof(*inject_argv));
> > +               if (inject_argv == NULL)
> > +                       exit(1);
> > +
> > +               inject_argv[0] = strdup("inject");
> > +               inject_argv[1] = strdup("-b");
> > +
> > +               /* signal that we're ready to go */
> > +               close(ready_pipe[1]);
> > +
> > +               cmd_inject(2, inject_argv);
> > +
> > +               exit(0);
> > +       }
> > +
> > +       signal(SIGPIPE, sigpipe_handler);
> > +
> > +       close(ready_pipe[1]);
> > +       close(data->input_pipe[0]);
> > +       close(data->output_pipe[1]);
> > +
> > +       /* wait for child ready */
> > +       if (read(ready_pipe[0], &buf, 1) < 0)
> > +               return -1;
> > +       close(ready_pipe[0]);
> > +
> > +       return 0;
> > +}
>
> This feels like generic scaffolding that could be shared by other perf
> command benchmarks.

Maybe.. the thing is perf inject usually works on pipes so it needed
a new process to run the test.  Probably we can simply run others
in the same process.

>
> > +
> > +static int inject_build_id(struct bench_data *data)
> > +{
> > +       int flag, status;
> > +       unsigned int i, k;
> > +       char buf[8192];
> > +       u64 nread = 0;
> > +       u64 len = nr_mmaps / 2 * sizeof(struct perf_record_header_build_id);
> > +
> > +       flag = fcntl(data->output_pipe[0], F_GETFL, 0);
> > +       if (fcntl(data->output_pipe[0], F_SETFL, flag | O_NONBLOCK) < 0)
> > +               return -1;
> > +
> > +       /* this makes the child to run */
> > +       if (perf_header__write_pipe(data->input_pipe[1]) < 0)
> > +               return -1;
> > +
> > +       len += synthesize_attr(data);
> > +       len += synthesize_fork(data);
> > +
> > +       for (i = 0; i < nr_mmaps; i++) {
> > +               struct bench_dso *dso;
> > +               int idx = rand() % (nr_dsos - 1);
> > +
> > +               dso = list_first_entry(&dso_list, struct bench_dso, list);
> > +               while (idx--)
> > +                       dso = list_next_entry(dso, list);
> > +
> > +               pr_debug("   [%2d] injecting: %s\n", i+1, dso->name);
> > +               len += synthesize_mmap(data, dso);
> > +
> > +               for (k = 0; k < nr_samples; k++)
> > +                       len += synthesize_sample(data, dso);
> > +
> > +               /* read out data from child */
> > +               while (true) {
> > +                       int n;
> > +
> > +                       n = read(data->output_pipe[0], buf, sizeof(buf));
> > +                       if (n <= 0)
> > +                               break;
> > +                       nread += n;
> > +               }
> > +       }
> > +
> > +       /* wait to read data at least as we wrote + some build-ids */
> > +       while (nread < len) {
> > +               int n;
> > +
> > +               n = read(data->output_pipe[0], buf, sizeof(buf));
> > +               if (n < 0)
> > +                       break;
> > +               nread += n;
> > +       }
> > +       close(data->input_pipe[1]);
> > +       close(data->output_pipe[0]);
> > +
> > +       wait(&status);
> > +       pr_debug("   Child %d exited with %d\n", data->pid, status);
> > +
> > +       return 0;
> > +}
> >
>
> Perhaps we can read the highwater mark (VmHWM) from /proc/[pid]/status
> as this would capture cases like buildid injection doing unnecessary
> symbol generation.

Good idea!  I'll add it and check we can see the difference.

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/7] perf inject: Do not load map/dso when injecting build-id
  2020-09-23  8:05 ` [PATCH 4/7] perf inject: Do not load map/dso when injecting build-id Namhyung Kim
@ 2020-09-24 13:09   ` Jiri Olsa
  2020-09-24 13:20     ` Namhyung Kim
  0 siblings, 1 reply; 23+ messages in thread
From: Jiri Olsa @ 2020-09-24 13:09 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Mark Rutland, Alexander Shishkin, LKML, Stephane Eranian,
	Ian Rogers

On Wed, Sep 23, 2020 at 05:05:34PM +0900, Namhyung Kim wrote:

SNIP

> -static inline int is_no_dso_memory(const char *filename)
> -{
> -	return !strncmp(filename, "[stack", 6) ||
> -	       !strncmp(filename, "/SYSV",5)   ||
> -	       !strcmp(filename, "[heap]");
> -}
> -
>  static inline int is_android_lib(const char *filename)
>  {
>  	return strstarts(filename, "/data/app-lib/") ||
> @@ -158,7 +143,7 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
>  		int anon, no_dso, vdso, android;
>  
>  		android = is_android_lib(filename);
> -		anon = is_anon_memory(filename, flags);
> +		anon = is_anon_memory(filename) || flags & MAP_HUGETLB;

what's the reason to take 'flags & MAP_HUGETLB' out of is_anon_memory?

jirka

>  		vdso = is_vdso_map(filename);
>  		no_dso = is_no_dso_memory(filename);
>  		map->prot = prot;
> diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
> index c2f5d28fe73a..b1c0686db1b7 100644
> --- a/tools/perf/util/map.h
> +++ b/tools/perf/util/map.h
> @@ -171,4 +171,18 @@ static inline bool is_bpf_image(const char *name)
>  	return strncmp(name, "bpf_trampoline_", sizeof("bpf_trampoline_") - 1) == 0 ||
>  	       strncmp(name, "bpf_dispatcher_", sizeof("bpf_dispatcher_") - 1) == 0;
>  }
> +
> +static inline int is_anon_memory(const char *filename)
> +{
> +	return !strcmp(filename, "//anon") ||
> +	       !strncmp(filename, "/dev/zero", sizeof("/dev/zero") - 1) ||
> +	       !strncmp(filename, "/anon_hugepage", sizeof("/anon_hugepage") - 1);
> +}
> +
> +static inline int is_no_dso_memory(const char *filename)
> +{
> +	return !strncmp(filename, "[stack", 6) ||
> +	       !strncmp(filename, "/SYSV", 5)  ||
> +	       !strcmp(filename, "[heap]");
> +}
>  #endif /* __PERF_MAP_H */
> -- 
> 2.28.0.681.g6f77f65b4e-goog
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/7] perf inject: Do not load map/dso when injecting build-id
  2020-09-24 13:09   ` Jiri Olsa
@ 2020-09-24 13:20     ` Namhyung Kim
  2020-09-24 13:44       ` Jiri Olsa
  0 siblings, 1 reply; 23+ messages in thread
From: Namhyung Kim @ 2020-09-24 13:20 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Mark Rutland, Alexander Shishkin, LKML, Stephane Eranian,
	Ian Rogers

On Thu, Sep 24, 2020 at 10:09 PM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Wed, Sep 23, 2020 at 05:05:34PM +0900, Namhyung Kim wrote:
>
> SNIP
>
> > -static inline int is_no_dso_memory(const char *filename)
> > -{
> > -     return !strncmp(filename, "[stack", 6) ||
> > -            !strncmp(filename, "/SYSV",5)   ||
> > -            !strcmp(filename, "[heap]");
> > -}
> > -
> >  static inline int is_android_lib(const char *filename)
> >  {
> >       return strstarts(filename, "/data/app-lib/") ||
> > @@ -158,7 +143,7 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
> >               int anon, no_dso, vdso, android;
> >
> >               android = is_android_lib(filename);
> > -             anon = is_anon_memory(filename, flags);
> > +             anon = is_anon_memory(filename) || flags & MAP_HUGETLB;
>
> what's the reason to take 'flags & MAP_HUGETLB' out of is_anon_memory?

The MAP_HUGETLB is defined in uapi/linux/mman.h and I had trouble
when including the header in the map.h file.

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/7] perf inject: Remove stale build-id processing
  2020-09-23  8:05 ` [PATCH 7/7] perf inject: Remove stale build-id processing Namhyung Kim
  2020-09-23 14:36   ` Adrian Hunter
@ 2020-09-24 13:33   ` Jiri Olsa
  2020-09-24 14:23     ` Namhyung Kim
  1 sibling, 1 reply; 23+ messages in thread
From: Jiri Olsa @ 2020-09-24 13:33 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Mark Rutland, Alexander Shishkin, LKML, Stephane Eranian,
	Ian Rogers, Adrian Hunter

On Wed, Sep 23, 2020 at 05:05:37PM +0900, Namhyung Kim wrote:
> I think we don't need to call build_id__mark_dso_hit() in the
> perf_event__repipe_sample() as it's not used by -b option.  In case of
> the -b option is used, it uses perf_event__inject_buildid() instead.
> This can remove unnecessary overhead of finding thread/map for each
> sample event.
> 
> Also I suspect HEADER_BUILD_ID feature bit setting since we already
> generated/injected BUILD_ID event into the output stream.  So this
> header information seems redundant.  I'm not 100% sure about the
> auxtrace usage, but it looks like not related to this directly.
> 
> And we now have --buildid-all so users can get the same behavior if
> they want.
> 
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-inject.c | 12 ------------
>  1 file changed, 12 deletions(-)
> 
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index 500428aaa576..0191d72be7c4 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -277,8 +277,6 @@ static int perf_event__repipe_sample(struct perf_tool *tool,
>  		return f(tool, event, sample, evsel, machine);
>  	}
>  
> -	build_id__mark_dso_hit(tool, event, sample, evsel, machine);
> -

I recalled using simple 'perf inject -i .. -o .. ' to get uncompressed data
from 'perf record -z' and I though this change will force inject not to store
all build ids ... but it's happening even without your change ;-)

	$ ./perf record ls
	...
	[ perf record: Woken up 1 times to write data ]
	[ perf record: Captured and wrote 0.016 MB perf.data (15 samples) ]

	$ ./perf inject -o perf.data.new -i perf.data

	$ ./perf buildid-list

	17f4e448cc746582ea1881528deb549f7fdb3fd5 [kernel.kallsyms]
	b516839521ded07bb1fbd0a0276be9820ee8908e /usr/bin/ls
	1805c738c8f3ec0f47b7ea09080c28f34d18a82b /usr/lib64/ld-2.31.so
	f22785ea7e42e8aa9097a567a3cc8ae214cae4b6 [vdso]
	d278249792061c6b74d1693ca59513be1def13f2 /usr/lib64/libc-2.31.so

	$ ./perf buildid-list -i perf.data.new
	f22785ea7e42e8aa9097a567a3cc8ae214cae4b6 [vdso]

jirka


>  	if (inject->itrace_synth_opts.set && sample->aux_sample.size)
>  		event = perf_inject__cut_auxtrace_sample(inject, event, sample);
>  
> @@ -767,16 +765,6 @@ static int __cmd_inject(struct perf_inject *inject)
>  		return ret;
>  
>  	if (!data_out->is_pipe) {
> -		if (inject->build_ids)
> -			perf_header__set_feat(&session->header,
> -					      HEADER_BUILD_ID);
> -		/*
> -		 * Keep all buildids when there is unprocessed AUX data because
> -		 * it is not known which ones the AUX trace hits.
> -		 */
> -		if (perf_header__has_feat(&session->header, HEADER_BUILD_ID) &&
> -		    inject->have_auxtrace && !inject->itrace_synth_opts.set)
> -			dsos__hit_all(session);
>  		/*
>  		 * The AUX areas have been removed and replaced with
>  		 * synthesized hardware events, so clear the feature flag and
> -- 
> 2.28.0.681.g6f77f65b4e-goog
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCHSET v2 0/7] perf inject: Speed build-id injection
  2020-09-23  8:05 [PATCHSET v2 0/7] perf inject: Speed build-id injection Namhyung Kim
                   ` (6 preceding siblings ...)
  2020-09-23  8:05 ` [PATCH 7/7] perf inject: Remove stale build-id processing Namhyung Kim
@ 2020-09-24 13:35 ` Jiri Olsa
  7 siblings, 0 replies; 23+ messages in thread
From: Jiri Olsa @ 2020-09-24 13:35 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Mark Rutland, Alexander Shishkin, LKML, Stephane Eranian,
	Ian Rogers

On Wed, Sep 23, 2020 at 05:05:30PM +0900, Namhyung Kim wrote:
> Hello,
> 
> This is my second attempt to speed up build-id injection.  As this is
> to improve performance, I've added a benchmark for it.  Please look at
> the usage in the first commit.
> 
> By default, it measures average processing time of 100 MMAP2 events
> and 10000 SAMPLE events.  Below is the current result on my laptop.
> 
>   $ perf bench internals inject-build-id
>   # Running 'internals/inject-build-id' benchmark:
>     Average build-id injection took: 22.997 msec (+- 0.067 msec)
>     Average time per event: 2.255 usec (+- 0.007 usec)
> 
> With this patchset applied, it got this:
> 
>   $ perf bench internals inject-build-id
>   # Running 'internals/inject-build-id' benchmark:
>     Average build-id injection took: 18.441 msec (+- 0.106 msec)
>     Average time per event: 1.808 usec (+- 0.010 usec)
>     Average build-id-all injection took: 13.451 msec (+- 0.132 msec)
>     Average time per event: 1.319 usec (+- 0.013 usec)

nice, I can see that speed up as well

Acked-by: Jiri Olsa <jolsa@redhat.com>

thanks,
jirka


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/7] perf inject: Do not load map/dso when injecting build-id
  2020-09-24 13:20     ` Namhyung Kim
@ 2020-09-24 13:44       ` Jiri Olsa
  2020-09-24 14:46         ` Namhyung Kim
  0 siblings, 1 reply; 23+ messages in thread
From: Jiri Olsa @ 2020-09-24 13:44 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Mark Rutland, Alexander Shishkin, LKML, Stephane Eranian,
	Ian Rogers

On Thu, Sep 24, 2020 at 10:20:51PM +0900, Namhyung Kim wrote:
> On Thu, Sep 24, 2020 at 10:09 PM Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > On Wed, Sep 23, 2020 at 05:05:34PM +0900, Namhyung Kim wrote:
> >
> > SNIP
> >
> > > -static inline int is_no_dso_memory(const char *filename)
> > > -{
> > > -     return !strncmp(filename, "[stack", 6) ||
> > > -            !strncmp(filename, "/SYSV",5)   ||
> > > -            !strcmp(filename, "[heap]");
> > > -}
> > > -
> > >  static inline int is_android_lib(const char *filename)
> > >  {
> > >       return strstarts(filename, "/data/app-lib/") ||
> > > @@ -158,7 +143,7 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
> > >               int anon, no_dso, vdso, android;
> > >
> > >               android = is_android_lib(filename);
> > > -             anon = is_anon_memory(filename, flags);
> > > +             anon = is_anon_memory(filename) || flags & MAP_HUGETLB;
> >
> > what's the reason to take 'flags & MAP_HUGETLB' out of is_anon_memory?
> 
> The MAP_HUGETLB is defined in uapi/linux/mman.h and I had trouble
> when including the header in the map.h file.

could you share the error? it might be corner case, but it
could bite us in future

also flags are stored just in map not dso so you'd need to
add that as arg to dso__inject_build_id

thanks,
jirka

> 
> Thanks
> Namhyung
> 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/7] perf inject: Remove stale build-id processing
  2020-09-24 13:33   ` Jiri Olsa
@ 2020-09-24 14:23     ` Namhyung Kim
  0 siblings, 0 replies; 23+ messages in thread
From: Namhyung Kim @ 2020-09-24 14:23 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Mark Rutland, Alexander Shishkin, LKML, Stephane Eranian,
	Ian Rogers, Adrian Hunter

On Thu, Sep 24, 2020 at 10:34 PM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Wed, Sep 23, 2020 at 05:05:37PM +0900, Namhyung Kim wrote:
> > I think we don't need to call build_id__mark_dso_hit() in the
> > perf_event__repipe_sample() as it's not used by -b option.  In case of
> > the -b option is used, it uses perf_event__inject_buildid() instead.
> > This can remove unnecessary overhead of finding thread/map for each
> > sample event.
> >
> > Also I suspect HEADER_BUILD_ID feature bit setting since we already
> > generated/injected BUILD_ID event into the output stream.  So this
> > header information seems redundant.  I'm not 100% sure about the
> > auxtrace usage, but it looks like not related to this directly.
> >
> > And we now have --buildid-all so users can get the same behavior if
> > they want.
> >
> > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/builtin-inject.c | 12 ------------
> >  1 file changed, 12 deletions(-)
> >
> > diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> > index 500428aaa576..0191d72be7c4 100644
> > --- a/tools/perf/builtin-inject.c
> > +++ b/tools/perf/builtin-inject.c
> > @@ -277,8 +277,6 @@ static int perf_event__repipe_sample(struct perf_tool *tool,
> >               return f(tool, event, sample, evsel, machine);
> >       }
> >
> > -     build_id__mark_dso_hit(tool, event, sample, evsel, machine);
> > -
>
> I recalled using simple 'perf inject -i .. -o .. ' to get uncompressed data
> from 'perf record -z' and I though this change will force inject not to store
> all build ids ... but it's happening even without your change ;-)

I wasn't aware of that use case.  Will check..

Anyway, basically perf record saves build-id in the file header unless -B
option is used.  I think we don't need to update feature data if user didn't
request something explicit (maybe like when -b or --itrace is used).

Now I've realized that current code always updates the feature data
including build-id so it needs to mark dso.  But I think it's meaningless
because 1) if the file already has build-id feature no need to do the
marking again, or 2) if the input is a pipe we need -b option and
then the marking code won't run.

Maybe the only case we are concerned about is that the data file
doesn't have a build-id feature and we don't want to inject build-id
events but want to update the build-id feature data in the header.
I'm not sure if it's a good practice, but if we really want to support
it, I think we should add a dedicated option.

>
>         $ ./perf record ls
>         ...
>         [ perf record: Woken up 1 times to write data ]
>         [ perf record: Captured and wrote 0.016 MB perf.data (15 samples) ]
>
>         $ ./perf inject -o perf.data.new -i perf.data
>
>         $ ./perf buildid-list
>
>         17f4e448cc746582ea1881528deb549f7fdb3fd5 [kernel.kallsyms]
>         b516839521ded07bb1fbd0a0276be9820ee8908e /usr/bin/ls
>         1805c738c8f3ec0f47b7ea09080c28f34d18a82b /usr/lib64/ld-2.31.so
>         f22785ea7e42e8aa9097a567a3cc8ae214cae4b6 [vdso]
>         d278249792061c6b74d1693ca59513be1def13f2 /usr/lib64/libc-2.31.so
>
>         $ ./perf buildid-list -i perf.data.new
>         f22785ea7e42e8aa9097a567a3cc8ae214cae4b6 [vdso]

Oh... I found that the current code doesn't handle mmap events
unless the -b (or other) option is used.  So even if it tries to
mark dso it couldn't find one.  But vdso is created separately
so you can see it only IMHO.

Yes, we need to fix this.. but I'm not sure what's the meaning
of running perf inject without any option. :)

Thanks
Namhyung

>
> jirka
>
>
> >       if (inject->itrace_synth_opts.set && sample->aux_sample.size)
> >               event = perf_inject__cut_auxtrace_sample(inject, event, sample);
> >
> > @@ -767,16 +765,6 @@ static int __cmd_inject(struct perf_inject *inject)
> >               return ret;
> >
> >       if (!data_out->is_pipe) {
> > -             if (inject->build_ids)
> > -                     perf_header__set_feat(&session->header,
> > -                                           HEADER_BUILD_ID);
> > -             /*
> > -              * Keep all buildids when there is unprocessed AUX data because
> > -              * it is not known which ones the AUX trace hits.
> > -              */
> > -             if (perf_header__has_feat(&session->header, HEADER_BUILD_ID) &&
> > -                 inject->have_auxtrace && !inject->itrace_synth_opts.set)
> > -                     dsos__hit_all(session);
> >               /*
> >                * The AUX areas have been removed and replaced with
> >                * synthesized hardware events, so clear the feature flag and
> > --
> > 2.28.0.681.g6f77f65b4e-goog
> >
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/7] perf inject: Do not load map/dso when injecting build-id
  2020-09-24 13:44       ` Jiri Olsa
@ 2020-09-24 14:46         ` Namhyung Kim
  2020-09-25 14:26           ` Jiri Olsa
  0 siblings, 1 reply; 23+ messages in thread
From: Namhyung Kim @ 2020-09-24 14:46 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra,
	Mark Rutland, Alexander Shishkin, LKML, Stephane Eranian,
	Ian Rogers

On Thu, Sep 24, 2020 at 03:44:44PM +0200, Jiri Olsa wrote:
> On Thu, Sep 24, 2020 at 10:20:51PM +0900, Namhyung Kim wrote:
> > On Thu, Sep 24, 2020 at 10:09 PM Jiri Olsa <jolsa@redhat.com> wrote:
> > >
> > > On Wed, Sep 23, 2020 at 05:05:34PM +0900, Namhyung Kim wrote:
> > >
> > > SNIP
> > >
> > > > -static inline int is_no_dso_memory(const char *filename)
> > > > -{
> > > > -     return !strncmp(filename, "[stack", 6) ||
> > > > -            !strncmp(filename, "/SYSV",5)   ||
> > > > -            !strcmp(filename, "[heap]");
> > > > -}
> > > > -
> > > >  static inline int is_android_lib(const char *filename)
> > > >  {
> > > >       return strstarts(filename, "/data/app-lib/") ||
> > > > @@ -158,7 +143,7 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
> > > >               int anon, no_dso, vdso, android;
> > > >
> > > >               android = is_android_lib(filename);
> > > > -             anon = is_anon_memory(filename, flags);
> > > > +             anon = is_anon_memory(filename) || flags & MAP_HUGETLB;
> > >
> > > what's the reason to take 'flags & MAP_HUGETLB' out of is_anon_memory?
> > 
> > The MAP_HUGETLB is defined in uapi/linux/mman.h and I had trouble
> > when including the header in the map.h file.
> 
> could you share the error? it might be corner case, but it
> could bite us in future

Sure.

  CC       util/session.o
In file included from /home/namhyung/project/linux/tools/include/uapi/asm-generic/mman-common-tools.h:5,
                 from /home/namhyung/project/linux/tools/include/uapi/asm-generic/mman.h:5,
                 from /home/namhyung/project/linux/tools/arch/x86/include/uapi/asm/mman.h:5,
                 from /home/namhyung/project/linux/tools/include/uapi/linux/mman.h:5,
                 from util/map.h:13,
                 from util/session.c:21:
/home/namhyung/project/linux/tools/include/uapi/asm-generic/mman-common.h:26: error: "MAP_POPULATE" redefined [-Werror]
   26 | #define MAP_POPULATE  0x008000 /* populate (prefault) pagetables */
      | 
In file included from /usr/include/x86_64-linux-gnu/bits/mman.h:31,
                 from /usr/include/x86_64-linux-gnu/sys/mman.h:41,
                 from util/session.c:12:
/usr/include/x86_64-linux-gnu/bits/mman-map-flags-generic.h:34: note: this is the location of the previous definition
   34 | # define MAP_POPULATE 0x08000  /* Populate (prefault) pagetables.  */
      | 

This is repeated for each macro definitions..


> 
> also flags are stored just in map not dso so you'd need to
> add that as arg to dso__inject_build_id

Will do.

Thanks
Namhyung

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/7] perf inject: Do not load map/dso when injecting build-id
  2020-09-24 14:46         ` Namhyung Kim
@ 2020-09-25 14:26           ` Jiri Olsa
  2020-09-28 12:31             ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 23+ messages in thread
From: Jiri Olsa @ 2020-09-25 14:26 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, Peter Zijlstra, Mark Rutland, Alexander Shishkin,
	LKML, Stephane Eranian, Ian Rogers

On Thu, Sep 24, 2020 at 11:46:32PM +0900, Namhyung Kim wrote:
> On Thu, Sep 24, 2020 at 03:44:44PM +0200, Jiri Olsa wrote:
> > On Thu, Sep 24, 2020 at 10:20:51PM +0900, Namhyung Kim wrote:
> > > On Thu, Sep 24, 2020 at 10:09 PM Jiri Olsa <jolsa@redhat.com> wrote:
> > > >
> > > > On Wed, Sep 23, 2020 at 05:05:34PM +0900, Namhyung Kim wrote:
> > > >
> > > > SNIP
> > > >
> > > > > -static inline int is_no_dso_memory(const char *filename)
> > > > > -{
> > > > > -     return !strncmp(filename, "[stack", 6) ||
> > > > > -            !strncmp(filename, "/SYSV",5)   ||
> > > > > -            !strcmp(filename, "[heap]");
> > > > > -}
> > > > > -
> > > > >  static inline int is_android_lib(const char *filename)
> > > > >  {
> > > > >       return strstarts(filename, "/data/app-lib/") ||
> > > > > @@ -158,7 +143,7 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
> > > > >               int anon, no_dso, vdso, android;
> > > > >
> > > > >               android = is_android_lib(filename);
> > > > > -             anon = is_anon_memory(filename, flags);
> > > > > +             anon = is_anon_memory(filename) || flags & MAP_HUGETLB;
> > > >
> > > > what's the reason to take 'flags & MAP_HUGETLB' out of is_anon_memory?
> > > 
> > > The MAP_HUGETLB is defined in uapi/linux/mman.h and I had trouble
> > > when including the header in the map.h file.
> > 
> > could you share the error? it might be corner case, but it
> > could bite us in future
> 
> Sure.
> 
>   CC       util/session.o
> In file included from /home/namhyung/project/linux/tools/include/uapi/asm-generic/mman-common-tools.h:5,
>                  from /home/namhyung/project/linux/tools/include/uapi/asm-generic/mman.h:5,
>                  from /home/namhyung/project/linux/tools/arch/x86/include/uapi/asm/mman.h:5,
>                  from /home/namhyung/project/linux/tools/include/uapi/linux/mman.h:5,
>                  from util/map.h:13,
>                  from util/session.c:21:
> /home/namhyung/project/linux/tools/include/uapi/asm-generic/mman-common.h:26: error: "MAP_POPULATE" redefined [-Werror]
>    26 | #define MAP_POPULATE  0x008000 /* populate (prefault) pagetables */
>       | 
> In file included from /usr/include/x86_64-linux-gnu/bits/mman.h:31,
>                  from /usr/include/x86_64-linux-gnu/sys/mman.h:41,
>                  from util/session.c:12:
> /usr/include/x86_64-linux-gnu/bits/mman-map-flags-generic.h:34: note: this is the location of the previous definition
>    34 | # define MAP_POPULATE 0x08000  /* Populate (prefault) pagetables.  */
>       | 
> 
> This is repeated for each macro definitions..

hm, some black magic happened in the past and it looks like now we
can't have <sys/mman.h> and <linux/mman.h> includes together

it looks related to this commit:
  be709d48329a tools headers uapi: Sync asm-generic/mman-common.h and linux/mman.h

I'm not sure I understand the purpose of asm-generic/mman-common-tools.h file

Arnaldo,
any chance you might have some quick solution before I dive in?
you can reproduce the issue with change below

thanks,
jirka


---
diff --git a/tools/perf/util/map.h b/tools/perf/util/map.h
index b1c0686db1b7..a0b0281d8eab 100644
--- a/tools/perf/util/map.h
+++ b/tools/perf/util/map.h
@@ -10,6 +10,7 @@
 #include <string.h>
 #include <stdbool.h>
 #include <linux/types.h>
+#include <sys/mman.h>


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 4/7] perf inject: Do not load map/dso when injecting build-id
  2020-09-25 14:26           ` Jiri Olsa
@ 2020-09-28 12:31             ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 23+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-09-28 12:31 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Namhyung Kim, Ingo Molnar, Peter Zijlstra, Mark Rutland,
	Alexander Shishkin, LKML, Stephane Eranian, Ian Rogers

Em Fri, Sep 25, 2020 at 04:26:19PM +0200, Jiri Olsa escreveu:
> On Thu, Sep 24, 2020 at 11:46:32PM +0900, Namhyung Kim wrote:
> > On Thu, Sep 24, 2020 at 03:44:44PM +0200, Jiri Olsa wrote:
> > > On Thu, Sep 24, 2020 at 10:20:51PM +0900, Namhyung Kim wrote:
> > > > On Thu, Sep 24, 2020 at 10:09 PM Jiri Olsa <jolsa@redhat.com> wrote:
> > > > >
> > > > > On Wed, Sep 23, 2020 at 05:05:34PM +0900, Namhyung Kim wrote:
> > > > >
> > > > > SNIP
> > > > >
> > > > > > -static inline int is_no_dso_memory(const char *filename)
> > > > > > -{
> > > > > > -     return !strncmp(filename, "[stack", 6) ||
> > > > > > -            !strncmp(filename, "/SYSV",5)   ||
> > > > > > -            !strcmp(filename, "[heap]");
> > > > > > -}
> > > > > > -
> > > > > >  static inline int is_android_lib(const char *filename)
> > > > > >  {
> > > > > >       return strstarts(filename, "/data/app-lib/") ||
> > > > > > @@ -158,7 +143,7 @@ struct map *map__new(struct machine *machine, u64 start, u64 len,
> > > > > >               int anon, no_dso, vdso, android;
> > > > > >
> > > > > >               android = is_android_lib(filename);
> > > > > > -             anon = is_anon_memory(filename, flags);
> > > > > > +             anon = is_anon_memory(filename) || flags & MAP_HUGETLB;
> > > > >
> > > > > what's the reason to take 'flags & MAP_HUGETLB' out of is_anon_memory?
> > > > 
> > > > The MAP_HUGETLB is defined in uapi/linux/mman.h and I had trouble
> > > > when including the header in the map.h file.
> > > 
> > > could you share the error? it might be corner case, but it
> > > could bite us in future
> > 
> > Sure.
> > 
> >   CC       util/session.o
> > In file included from /home/namhyung/project/linux/tools/include/uapi/asm-generic/mman-common-tools.h:5,
> >                  from /home/namhyung/project/linux/tools/include/uapi/asm-generic/mman.h:5,
> >                  from /home/namhyung/project/linux/tools/arch/x86/include/uapi/asm/mman.h:5,
> >                  from /home/namhyung/project/linux/tools/include/uapi/linux/mman.h:5,
> >                  from util/map.h:13,
> >                  from util/session.c:21:
> > /home/namhyung/project/linux/tools/include/uapi/asm-generic/mman-common.h:26: error: "MAP_POPULATE" redefined [-Werror]
> >    26 | #define MAP_POPULATE  0x008000 /* populate (prefault) pagetables */
> >       | 
> > In file included from /usr/include/x86_64-linux-gnu/bits/mman.h:31,
> >                  from /usr/include/x86_64-linux-gnu/sys/mman.h:41,
> >                  from util/session.c:12:
> > /usr/include/x86_64-linux-gnu/bits/mman-map-flags-generic.h:34: note: this is the location of the previous definition
> >    34 | # define MAP_POPULATE 0x08000  /* Populate (prefault) pagetables.  */
> >       | 
> > 
> > This is repeated for each macro definitions..
> 
> hm, some black magic happened in the past and it looks like now we
> can't have <sys/mman.h> and <linux/mman.h> includes together
> 
> it looks related to this commit:
>   be709d48329a tools headers uapi: Sync asm-generic/mman-common.h and linux/mman.h
> 
> I'm not sure I understand the purpose of asm-generic/mman-common-tools.h file
> 
> Arnaldo,
> any chance you might have some quick solution before I dive in?
> you can reproduce the issue with change below

So, some of the headers in tools/include/ are for
tools/perf/trace/beauty/ to auto-generate tables, like described in the
be709d48329a cset.

This causes problems like the one you're describing, that is why I'm now
using tools/perf/trace/beauty/include/ for these purposes, and that is
not used in building source code, just in autogenerating such tables and
helping us to notice when changes were made in the kernel that may break
those autogeneration scripts, all under that tools/perf/check_patch.sh
build checks.

I'll look into this mmap case and try to get this resolved and then
build it on all the containers to make sure everything continues to work
as expected.

- Arnaldo

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2020-09-28 12:31 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-23  8:05 [PATCHSET v2 0/7] perf inject: Speed build-id injection Namhyung Kim
2020-09-23  8:05 ` [PATCH 1/7] perf bench: Add build-id injection benchmark Namhyung Kim
2020-09-23 22:13   ` Ian Rogers
2020-09-24  6:23     ` Namhyung Kim
2020-09-23  8:05 ` [PATCH 2/7] perf inject: Add missing callbacks in perf_tool Namhyung Kim
2020-09-23  8:05 ` [PATCH 3/7] perf inject: Enter namespace when reading build-id Namhyung Kim
2020-09-23  8:05 ` [PATCH 4/7] perf inject: Do not load map/dso when injecting build-id Namhyung Kim
2020-09-24 13:09   ` Jiri Olsa
2020-09-24 13:20     ` Namhyung Kim
2020-09-24 13:44       ` Jiri Olsa
2020-09-24 14:46         ` Namhyung Kim
2020-09-25 14:26           ` Jiri Olsa
2020-09-28 12:31             ` Arnaldo Carvalho de Melo
2020-09-23  8:05 ` [PATCH 5/7] perf inject: Add --buildid-all option Namhyung Kim
2020-09-23 22:16   ` Ian Rogers
2020-09-23  8:05 ` [PATCH 6/7] perf bench: Run inject-build-id with --buildid-all option too Namhyung Kim
2020-09-23 22:17   ` Ian Rogers
2020-09-23  8:05 ` [PATCH 7/7] perf inject: Remove stale build-id processing Namhyung Kim
2020-09-23 14:36   ` Adrian Hunter
2020-09-24  3:51     ` Namhyung Kim
2020-09-24 13:33   ` Jiri Olsa
2020-09-24 14:23     ` Namhyung Kim
2020-09-24 13:35 ` [PATCHSET v2 0/7] perf inject: Speed build-id injection Jiri Olsa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.