* [PATCH 0/5] Benchmark and improve event synthesis performance
@ 2020-04-01 23:39 Ian Rogers
  2020-04-01 23:39 ` [PATCH 1/5] perf bench: add event synthesis benchmark Ian Rogers
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Ian Rogers @ 2020-04-01 23:39 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Petr Mladek, Andrey Zhizhikin, Kefeng Wang, Thomas Gleixner,
	Kan Liang, linux-kernel, linux-perf-users
  Cc: Stephane Eranian, Ian Rogers


Event synthesis is performance critical in common uses of perf. For
example, when perf record starts in system-wide mode, the /proc file
system is scanned and events are synthesized for each process and all
executable mmaps. On large machines with many processes, we have seen
O(seconds) of wall clock time spent on synthesis.

This patch set adds a benchmark for synthesis performance in a new
benchmark collection called 'internals'. The benchmark runs
machine__synthesize_threads(), single threaded on the perf process
itself with a 'tool' that just drops the events, and measures how long
synthesis takes.

Profiling this benchmark identified two performance bottlenecks:
hugetlbfs__mountpoint() and stdio. The impact of these fixes is:

Before:
Average synthesis took: 167.616800 usec
Average data synthesis took: 208.655600 usec

After the hugetlbfs__mountpoint() scalability fix:
Average synthesis took: 120.195100 usec
Average data synthesis took: 156.582300 usec

After removal of stdio in /proc/pid/maps code:
Average synthesis took: 67.189100 usec
Average data synthesis took: 102.451600 usec

Times were measured on an Intel Xeon 6154, with perf compiled using
Debian gcc 9.2.1.

Two patches in the set were sent to LKML previously but are included
here for context around the benchmark performance impact:
https://lore.kernel.org/lkml/20200327172914.28603-1-irogers@google.com/T/#u
https://lore.kernel.org/lkml/20200328014221.168130-1-irogers@google.com/T/#u

A future area of improvement could be to expose the perf top
num-thread-synthesize option more widely in other perf commands, and
also to benchmark its effectiveness, as sketched below.
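
A minimal sketch of how the new synthesize benchmark could grow such an
option, reusing its existing option table (the -t/--threads option below
is hypothetical and not part of this series):

  static unsigned int nr_threads_synthesize = 1;

  static const struct option options[] = {
          OPT_UINTEGER('i', "iterations", &iterations,
                  "Number of iterations used to compute average"),
          OPT_UINTEGER('t', "threads", &nr_threads_synthesize,
                  "Number of threads to use for event synthesis"),
          OPT_END()
  };

with nr_threads_synthesize then passed to machine__synthesize_threads()
in place of the hard-coded value of 1.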

Ian Rogers (4):
  perf bench: add event synthesis benchmark
  perf synthetic-events: save 4kb from 2 stack frames
  tools api: add a lightweight buffered reading api
  perf synthetic events: Remove use of sscanf from /proc reading

Stephane Eranian (1):
  tools api fs: make xxx__mountpoint() more scalable

 tools/lib/api/fs/fs.c              |  17 +++
 tools/lib/api/fs/fs.h              |  12 ++
 tools/lib/api/io.h                 | 103 +++++++++++++++++
 tools/perf/bench/Build             |   2 +-
 tools/perf/bench/bench.h           |   2 +-
 tools/perf/bench/synthesize.c      | 101 ++++++++++++++++
 tools/perf/builtin-bench.c         |   6 +
 tools/perf/util/synthetic-events.c | 177 +++++++++++++++++++----------
 8 files changed, 355 insertions(+), 65 deletions(-)
 create mode 100644 tools/lib/api/io.h
 create mode 100644 tools/perf/bench/synthesize.c

-- 
2.26.0.rc2.310.g2932bb562d-goog



* [PATCH 1/5] perf bench: add event synthesis benchmark
  2020-04-01 23:39 [PATCH 0/5] Benchmark and improve event synthesis performance Ian Rogers
@ 2020-04-01 23:39 ` Ian Rogers
  2020-04-02 13:41   ` Jiri Olsa
  2020-04-01 23:39 ` [PATCH 2/5] tools api fs: make xxx__mountpoint() more scalable Ian Rogers
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Ian Rogers @ 2020-04-01 23:39 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Petr Mladek, Andrey Zhizhikin, Kefeng Wang, Thomas Gleixner,
	Kan Liang, linux-kernel, linux-perf-users
  Cc: Stephane Eranian, Ian Rogers

Event synthesis may occur at the start or end (tail) of a perf command.
In system-wide mode it can scan every process in /proc, which may add
seconds of latency before event recording. Add a new benchmark that
times how long event synthesis takes with and without data synthesis.
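
For example (the timing values below are illustrative, reusing the
cover letter's 'before' numbers rather than a new measurement):

  $ perf bench internals synthesize
  Average synthesis took: 167.616800 usec
  Average data synthesis took: 208.655600 usec

The number of iterations averaged over defaults to 10000 and can be
changed with -i/--iterations.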

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/bench/Build        |   2 +-
 tools/perf/bench/bench.h      |   2 +-
 tools/perf/bench/synthesize.c | 101 ++++++++++++++++++++++++++++++++++
 tools/perf/builtin-bench.c    |   6 ++
 4 files changed, 109 insertions(+), 2 deletions(-)
 create mode 100644 tools/perf/bench/synthesize.c

diff --git a/tools/perf/bench/Build b/tools/perf/bench/Build
index e4e321b6f883..042827385c87 100644
--- a/tools/perf/bench/Build
+++ b/tools/perf/bench/Build
@@ -6,9 +6,9 @@ perf-y += futex-wake.o
 perf-y += futex-wake-parallel.o
 perf-y += futex-requeue.o
 perf-y += futex-lock-pi.o
-
 perf-y += epoll-wait.o
 perf-y += epoll-ctl.o
+perf-y += synthesize.o
 
 perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-lib.o
 perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-asm.o
diff --git a/tools/perf/bench/bench.h b/tools/perf/bench/bench.h
index 4aa6de1aa67d..4d669c803237 100644
--- a/tools/perf/bench/bench.h
+++ b/tools/perf/bench/bench.h
@@ -41,9 +41,9 @@ int bench_futex_wake_parallel(int argc, const char **argv);
 int bench_futex_requeue(int argc, const char **argv);
 /* pi futexes */
 int bench_futex_lock_pi(int argc, const char **argv);
-
 int bench_epoll_wait(int argc, const char **argv);
 int bench_epoll_ctl(int argc, const char **argv);
+int bench_synthesize(int argc, const char **argv);
 
 #define BENCH_FORMAT_DEFAULT_STR	"default"
 #define BENCH_FORMAT_DEFAULT		0
diff --git a/tools/perf/bench/synthesize.c b/tools/perf/bench/synthesize.c
new file mode 100644
index 000000000000..6291257bc9c9
--- /dev/null
+++ b/tools/perf/bench/synthesize.c
@@ -0,0 +1,101 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Benchmark synthesis of perf events such as at the start of a 'perf
+ * record'. Synthesis is done on the current process, invoking 'dummy' event
+ * handlers that support dump_trace but otherwise do nothing.
+ *
+ * Copyright 2019 Google LLC.
+ */
+#include <stdio.h>
+#include "bench.h"
+#include "../util/debug.h"
+#include "../util/session.h"
+#include "../util/synthetic-events.h"
+#include "../util/target.h"
+#include "../util/thread_map.h"
+#include "../util/tool.h"
+#include <linux/err.h>
+#include <linux/time64.h>
+#include <subcmd/parse-options.h>
+
+static unsigned int iterations = 10000;
+
+static const struct option options[] = {
+	OPT_UINTEGER('i', "iterations", &iterations,
+		"Number of iterations used to compute average"),
+	OPT_END()
+};
+
+static const char *const usage[] = {
+	"perf bench internals synthesize <options>",
+	NULL
+};
+
+
+static int do_synthesize(struct perf_session *session,
+			struct perf_thread_map *threads,
+			struct target *target, bool data_mmap)
+{
+	const unsigned int nr_threads_synthesize = 1;
+	struct timeval start, end, diff;
+	u64 runtime_us;
+	unsigned int i;
+	double average;
+	int err;
+
+	gettimeofday(&start, NULL);
+	for (i = 0; i < iterations; i++) {
+		err = machine__synthesize_threads(&session->machines.host,
+						target, threads, data_mmap,
+						nr_threads_synthesize);
+		if (err)
+			return err;
+	}
+
+	gettimeofday(&end, NULL);
+	timersub(&end, &start, &diff);
+	runtime_us = diff.tv_sec * USEC_PER_SEC + diff.tv_usec;
+	average = (double)runtime_us/(double)iterations;
+	printf("Average %ssynthesis took: %f usec\n",
+		data_mmap ? "data " : "", average);
+	return 0;
+}
+
+int bench_synthesize(int argc, const char **argv)
+{
+	struct perf_tool tool;
+	struct perf_session *session;
+	struct target target = {
+		.pid = "self",
+	};
+	struct perf_thread_map *threads;
+	int err;
+
+	argc = parse_options(argc, argv, options, usage, 0);
+
+	session = perf_session__new(NULL, false, NULL);
+	if (IS_ERR(session)) {
+		pr_err("Session creation failed.\n");
+		return PTR_ERR(session);
+	}
+	threads = thread_map__new_by_pid(getpid());
+	if (!threads) {
+		pr_err("Thread map creation failed.\n");
+		err = -ENOMEM;
+		goto err_out;
+	}
+	perf_tool__fill_defaults(&tool);
+
+	err = do_synthesize(session, threads, &target, false);
+	if (err)
+		goto err_out;
+
+	err = do_synthesize(session, threads, &target, true);
+
+err_out:
+	if (threads)
+		perf_thread_map__put(threads);
+
+	perf_session__delete(session);
+	return err;
+}
diff --git a/tools/perf/builtin-bench.c b/tools/perf/builtin-bench.c
index c06fe21c8613..11c79a8d85d6 100644
--- a/tools/perf/builtin-bench.c
+++ b/tools/perf/builtin-bench.c
@@ -76,6 +76,11 @@ static struct bench epoll_benchmarks[] = {
 };
 #endif // HAVE_EVENTFD
 
+static struct bench internals_benchmarks[] = {
+	{ "synthesize", "Benchmark perf event synthesis",	bench_synthesize	},
+	{ NULL,		NULL,					NULL			}
+};
+
 struct collection {
 	const char	*name;
 	const char	*summary;
@@ -92,6 +97,7 @@ static struct collection collections[] = {
 #ifdef HAVE_EVENTFD
 	{"epoll",       "Epoll stressing benchmarks",                   epoll_benchmarks        },
 #endif
+	{ "internals",	"Perf-internals benchmarks",			internals_benchmarks	},
 	{ "all",	"All benchmarks",				NULL			},
 	{ NULL,		NULL,						NULL			}
 };
-- 
2.26.0.rc2.310.g2932bb562d-goog



* [PATCH 2/5] tools api fs: make xxx__mountpoint() more scalable
  2020-04-01 23:39 [PATCH 0/5] Benchmark and improve event synthesis performance Ian Rogers
  2020-04-01 23:39 ` [PATCH 1/5] perf bench: add event synthesis benchmark Ian Rogers
@ 2020-04-01 23:39 ` Ian Rogers
  2020-04-01 23:39 ` [PATCH 3/5] perf synthetic-events: save 4kb from 2 stack frames Ian Rogers
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Ian Rogers @ 2020-04-01 23:39 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Petr Mladek, Andrey Zhizhikin, Kefeng Wang, Thomas Gleixner,
	Kan Liang, linux-kernel, linux-perf-users
  Cc: Stephane Eranian, Ian Rogers

From: Stephane Eranian <eranian@google.com>

The xxx__mountpoint() interface provided by fs.c finds
mount points for common pseudo filesystems. The first
time xxx__mountpoint() is invoked, it scans the mount
table (/proc/mounts) looking for a match. If one is
found, it is cached and the price of scanning
/proc/mounts is paid only once.

When the mount point is not found, however, subsequent
calls to xxx__mountpoint() scan /proc/mounts over and
over again. There is no caching of the negative result.

This causes a scaling issue in perf record with hugetlbfs__mountpoint().
The function is called from synthesize__mmap_events() for each process found.
If the machine has thousands of processes and /proc/mounts has many entries,
this can cause major overhead in perf record. We have observed multi-second
slowdowns on some configurations.
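
As a rough illustration of the calling pattern (a self-contained sketch
assuming the tools/lib/api include path, not the actual perf code; the
loop bound merely stands in for the number of /proc entries), before
this patch every iteration below rescans /proc/mounts when hugetlbfs is
not mounted, while afterwards only the first call does:

  #include <stdio.h>
  #include <api/fs/fs.h>  /* declares hugetlbfs__mountpoint() */

  int main(void)
  {
          int i;

          for (i = 0; i < 1000; i++) {
                  const char *mnt = hugetlbfs__mountpoint();

                  if (mnt)
                          printf("hugetlbfs mounted at %s\n", mnt);
          }
          return 0;
  }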

As an example on a laptop:

Before:
$ sudo umount /dev/hugepages
$ strace -e trace=openat -o /tmp/tt perf record -a ls
$ fgrep mounts /tmp/tt | wc -l
285

After:
$ sudo umount /dev/hugepages
$ strace -e trace=openat -o /tmp/tt perf record -a ls
$ fgrep mounts /tmp/tt | wc -l
1

One could argue that not caching the not-found case is intentional: that
way subsequent calls may discover a mount point if the sysadmin mounts
the filesystem. But the same argument could be made against caching a
found mount point, since it could later be unmounted, causing errors.
It all depends on the intent of the interface. This patch assumes it
is expected to scan /proc/mounts once. The patch documents the caching
behavior in the fs.h header file.

An alternative would be to just fix perf record, but that would only
solve the problem for hugetlbfs__mountpoint(); similar issues could
arise (possibly down the line) with other xxx__mountpoint() calls
in perf or other tools.

Signed-off-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/lib/api/fs/fs.c | 17 +++++++++++++++++
 tools/lib/api/fs/fs.h | 12 ++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c
index 027b18f7ed8c..82f53d81a7a7 100644
--- a/tools/lib/api/fs/fs.c
+++ b/tools/lib/api/fs/fs.c
@@ -90,6 +90,7 @@ struct fs {
 	const char * const	*mounts;
 	char			 path[PATH_MAX];
 	bool			 found;
+	bool			 checked;
 	long			 magic;
 };
 
@@ -111,31 +112,37 @@ static struct fs fs__entries[] = {
 		.name	= "sysfs",
 		.mounts	= sysfs__fs_known_mountpoints,
 		.magic	= SYSFS_MAGIC,
+		.checked = false,
 	},
 	[FS__PROCFS] = {
 		.name	= "proc",
 		.mounts	= procfs__known_mountpoints,
 		.magic	= PROC_SUPER_MAGIC,
+		.checked = false,
 	},
 	[FS__DEBUGFS] = {
 		.name	= "debugfs",
 		.mounts	= debugfs__known_mountpoints,
 		.magic	= DEBUGFS_MAGIC,
+		.checked = false,
 	},
 	[FS__TRACEFS] = {
 		.name	= "tracefs",
 		.mounts	= tracefs__known_mountpoints,
 		.magic	= TRACEFS_MAGIC,
+		.checked = false,
 	},
 	[FS__HUGETLBFS] = {
 		.name	= "hugetlbfs",
 		.mounts = hugetlbfs__known_mountpoints,
 		.magic	= HUGETLBFS_MAGIC,
+		.checked = false,
 	},
 	[FS__BPF_FS] = {
 		.name	= "bpf",
 		.mounts = bpf_fs__known_mountpoints,
 		.magic	= BPF_FS_MAGIC,
+		.checked = false,
 	},
 };
 
@@ -158,6 +165,7 @@ static bool fs__read_mounts(struct fs *fs)
 	}
 
 	fclose(fp);
+	fs->checked = true;
 	return fs->found = found;
 }
 
@@ -220,6 +228,7 @@ static bool fs__env_override(struct fs *fs)
 		return false;
 
 	fs->found = true;
+	fs->checked = true;
 	strncpy(fs->path, override_path, sizeof(fs->path) - 1);
 	fs->path[sizeof(fs->path) - 1] = '\0';
 	return true;
@@ -246,6 +255,14 @@ static const char *fs__mountpoint(int idx)
 	if (fs->found)
 		return (const char *)fs->path;
 
+	/* The mount point was already looked up but did not exist,
+	 * so return NULL to avoid scanning again.
+	 * This makes the found and not found paths cost equivalent
+	 * in case of multiple calls.
+	 */
+	if (fs->checked)
+		return NULL;
+
 	return fs__get_mountpoint(fs);
 }
 
diff --git a/tools/lib/api/fs/fs.h b/tools/lib/api/fs/fs.h
index 936edb95e1f3..aa222ca30311 100644
--- a/tools/lib/api/fs/fs.h
+++ b/tools/lib/api/fs/fs.h
@@ -18,6 +18,18 @@
 	const char *name##__mount(void);	\
 	bool name##__configured(void);		\
 
+/*
+ * The xxxx__mountpoint() entry points find the first matching mount point for
+ * each filesystem listed below, where xxxx is the filesystem type.
+ *
+ * The interface is as follows:
+ *
+ * - If a mount point is found on first call, it is cached and used for all
+ *   subsequent calls.
+ *
+ * - If a mount point is not found, NULL is returned on first call and all
+ *   subsequent calls.
+ */
 FS(sysfs)
 FS(procfs)
 FS(debugfs)
-- 
2.26.0.rc2.310.g2932bb562d-goog



* [PATCH 3/5] perf synthetic-events: save 4kb from 2 stack frames
  2020-04-01 23:39 [PATCH 0/5] Benchmark and improve event synthesis performance Ian Rogers
  2020-04-01 23:39 ` [PATCH 1/5] perf bench: add event synthesis benchmark Ian Rogers
  2020-04-01 23:39 ` [PATCH 2/5] tools api fs: make xxx__mountpoint() more scalable Ian Rogers
@ 2020-04-01 23:39 ` Ian Rogers
  2020-04-01 23:39 ` [PATCH 4/5] tools api: add a lightweight buffered reading api Ian Rogers
  2020-04-01 23:39 ` [PATCH 5/5] perf synthetic events: Remove use of sscanf from /proc reading Ian Rogers
  4 siblings, 0 replies; 10+ messages in thread
From: Ian Rogers @ 2020-04-01 23:39 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Petr Mladek, Andrey Zhizhikin, Kefeng Wang, Thomas Gleixner,
	Kan Liang, linux-kernel, linux-perf-users
  Cc: Stephane Eranian, Ian Rogers

Reuse an existing char buffer to avoid two PATH_MAX sized char buffers.
Reduces stack frame sizes by 4kb.

perf_event__synthesize_mmap_events: before 'sub $0x45b8,%rsp', after
'sub $0x35b8,%rsp'.

perf_event__get_comm_ids: before 'sub $0x2028,%rsp', after
'sub $0x1028,%rsp'.

The performance impact of this change is negligible.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/synthetic-events.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 3f28af39f9c6..1f3d8d4bb879 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -70,7 +70,6 @@ int perf_tool__process_synth_event(struct perf_tool *tool,
 static int perf_event__get_comm_ids(pid_t pid, char *comm, size_t len,
 				    pid_t *tgid, pid_t *ppid)
 {
-	char filename[PATH_MAX];
 	char bf[4096];
 	int fd;
 	size_t size = 0;
@@ -80,11 +79,11 @@ static int perf_event__get_comm_ids(pid_t pid, char *comm, size_t len,
 	*tgid = -1;
 	*ppid = -1;
 
-	snprintf(filename, sizeof(filename), "/proc/%d/status", pid);
+	snprintf(bf, sizeof(bf), "/proc/%d/status", pid);
 
-	fd = open(filename, O_RDONLY);
+	fd = open(bf, O_RDONLY);
 	if (fd < 0) {
-		pr_debug("couldn't open %s\n", filename);
+		pr_debug("couldn't open %s\n", bf);
 		return -1;
 	}
 
@@ -280,9 +279,9 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 				       struct machine *machine,
 				       bool mmap_data)
 {
-	char filename[PATH_MAX];
 	FILE *fp;
 	unsigned long long t;
+	char bf[BUFSIZ];
 	bool truncation = false;
 	unsigned long long timeout = proc_map_timeout * 1000000ULL;
 	int rc = 0;
@@ -292,15 +291,15 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 	if (machine__is_default_guest(machine))
 		return 0;
 
-	snprintf(filename, sizeof(filename), "%s/proc/%d/task/%d/maps",
-		 machine->root_dir, pid, pid);
+	snprintf(bf, sizeof(bf), "%s/proc/%d/task/%d/maps",
+		machine->root_dir, pid, pid);
 
-	fp = fopen(filename, "r");
+	fp = fopen(bf, "r");
 	if (fp == NULL) {
 		/*
 		 * We raced with a task exiting - just return:
 		 */
-		pr_debug("couldn't open %s\n", filename);
+		pr_debug("couldn't open %s\n", bf);
 		return -1;
 	}
 
@@ -308,7 +307,6 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 	t = rdclock();
 
 	while (1) {
-		char bf[BUFSIZ];
 		char prot[5];
 		char execname[PATH_MAX];
 		char anonstr[] = "//anon";
@@ -320,10 +318,10 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 			break;
 
 		if ((rdclock() - t) > timeout) {
-			pr_warning("Reading %s time out. "
+			pr_warning("Reading %s/proc/%d/task/%d/maps time out. "
 				   "You may want to increase "
 				   "the time limit by --proc-map-timeout\n",
-				   filename);
+				   machine->root_dir, pid, pid);
 			truncation = true;
 			goto out;
 		}
-- 
2.26.0.rc2.310.g2932bb562d-goog



* [PATCH 4/5] tools api: add a lightweight buffered reading api
  2020-04-01 23:39 [PATCH 0/5] Benchmark and improve event synthesis performance Ian Rogers
                   ` (2 preceding siblings ...)
  2020-04-01 23:39 ` [PATCH 3/5] perf synthetic-events: save 4kb from 2 stack frames Ian Rogers
@ 2020-04-01 23:39 ` Ian Rogers
  2020-04-02 13:41   ` Jiri Olsa
  2020-04-01 23:39 ` [PATCH 5/5] perf synthetic events: Remove use of sscanf from /proc reading Ian Rogers
  4 siblings, 1 reply; 10+ messages in thread
From: Ian Rogers @ 2020-04-01 23:39 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Petr Mladek, Andrey Zhizhikin, Kefeng Wang, Thomas Gleixner,
	Kan Liang, linux-kernel, linux-perf-users
  Cc: Stephane Eranian, Ian Rogers

The synthesize benchmark shows the majority of execution time going to
fgets and sscanf, which are needed to parse /proc/pid/maps. Add a new
buffered reading library that will be used to replace these calls in a
follow-up patch.
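
As a rough usage sketch (a minimal example assuming the tools/lib/api
include path; the choice of /proc/self/maps and the buffer size are
illustrative), this uses the new helpers to parse the leading
"start-end" address range of the first maps line, similar to what the
follow-up patch does:

  #include <fcntl.h>
  #include <stdio.h>
  #include <stdbool.h>
  #include <unistd.h>
  #include <linux/types.h>
  #include <api/io.h>

  int main(void)
  {
          char buf[4096];
          struct io io;
          __u64 start, end;

          io.fd = open("/proc/self/maps", O_RDONLY);
          if (io.fd < 0)
                  return 1;
          init_io(&io, io.fd, buf, sizeof(buf));

          /* get_hex() returns the delimiter found after the value. */
          if (get_hex(&io, &start) == '-' && get_hex(&io, &end) == ' ')
                  printf("first mapping: %llx-%llx\n",
                         (unsigned long long)start, (unsigned long long)end);

          close(io.fd);
          return 0;
  }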

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/lib/api/io.h | 103 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 103 insertions(+)
 create mode 100644 tools/lib/api/io.h

diff --git a/tools/lib/api/io.h b/tools/lib/api/io.h
new file mode 100644
index 000000000000..e88d76c59440
--- /dev/null
+++ b/tools/lib/api/io.h
@@ -0,0 +1,103 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Lightweight buffered reading library.
+ *
+ * Copyright 2019 Google LLC.
+ */
+
+struct io {
+	/* File descriptor being read. */
+	int fd;
+	/* Size of the read buffer. */
+	unsigned int buf_len;
+	/* Pointer to storage for buffering read. */
+	char *buf;
+	/* End of the storage. */
+	char *end;
+	/* Currently accessed data pointer. */
+	char *data;
+	/* Set true when the end of file is reached or a read error occurs. */
+	bool eof;
+};
+
+static inline void init_io(struct io *io, int fd,
+			   char *buf, unsigned int buf_len)
+{
+	io->fd = fd;
+	io->buf_len = buf_len;
+	io->buf = buf;
+	io->end = buf;
+	io->data = buf;
+	io->eof = false;
+}
+
+/* Reads one character from the "io" file with similar semantics to fgetc. */
+static inline int get_char(struct io *io)
+{
+	char *ptr = io->data;
+
+	if (ptr == io->end) {
+		ssize_t n = read(io->fd, io->buf, io->buf_len);
+
+		if (n <= 0) {
+			io->eof = true;
+			return -1;
+		}
+		ptr = &io->buf[0];
+		io->end = &io->buf[n];
+	}
+	io->data = ptr + 1;
+	return *ptr;
+}
+
+/* Read a hexadecimal value with no 0x prefix into the out argument hex.
+ * Returns -1 on error or if nothing is read, otherwise returns the character
+ * after the hexadecimal value.
+ */
+static inline int get_hex(struct io *io, __u64 *hex)
+{
+	bool first_read = true;
+
+	*hex = 0;
+	while (true) {
+		char ch = get_char(io);
+
+		if (ch < 0)
+			return ch;
+		if (ch >= '0' && ch <= '9')
+			*hex = (*hex << 4) | (ch - '0');
+		else if (ch >= 'a' && ch <= 'f')
+			*hex = (*hex << 4) | (ch - 'a' + 10);
+		else if (ch >= 'A' && ch <= 'F')
+			*hex = (*hex << 4) | (ch - 'A' + 10);
+		else if (first_read)
+			return -1;
+		else
+			return ch;
+		first_read = false;
+	}
+}
+
+/* Read a decimal value into the out argument dec.
+ * Returns -1 on error or if nothing is read, otherwise returns the character
+ * after the decimal value.
+ */
+static inline int get_dec(struct io *io, __u64 *dec)
+{
+	bool first_read = true;
+
+	*dec = 0;
+	while (true) {
+		char ch = get_char(io);
+
+		if (ch < 0)
+			return ch;
+		if (ch >= '0' && ch <= '9')
+			*dec = (*dec * 10) + ch - '0';
+		else if (first_read)
+			return -1;
+		else
+			return ch;
+		first_read = false;
+	}
+}
-- 
2.26.0.rc2.310.g2932bb562d-goog



* [PATCH 5/5] perf synthetic events: Remove use of sscanf from /proc reading
  2020-04-01 23:39 [PATCH 0/5] Benchmark and improve event synthesis performance Ian Rogers
                   ` (3 preceding siblings ...)
  2020-04-01 23:39 ` [PATCH 4/5] tools api: add a lightweight buffered reading api Ian Rogers
@ 2020-04-01 23:39 ` Ian Rogers
  2020-04-02 13:40   ` Jiri Olsa
  2020-04-02 13:41   ` Jiri Olsa
  4 siblings, 2 replies; 10+ messages in thread
From: Ian Rogers @ 2020-04-01 23:39 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Petr Mladek, Andrey Zhizhikin, Kefeng Wang, Thomas Gleixner,
	Kan Liang, linux-kernel, linux-perf-users
  Cc: Stephane Eranian, Ian Rogers

The synthesize benchmark, run on a single process and thread, shows
perf_event__synthesize_mmap_events as the hottest function, with fgets
and sscanf taking the majority of execution time; switching to fscanf
performs similarly. Replace the scanf call with manual reading of each
field of the /proc/pid/maps line, and remove some unnecessary buffering.
This change also addresses potential, but unlikely, buffer overruns for
the string values read by scanf.

Performance before is:
Average synthesis took: 120.195100 usec
Average data synthesis took: 156.582300 usec

And after is:
Average synthesis took: 67.189100 usec
Average data synthesis took: 102.451600 usec

On an Intel Xeon 6154, compiled with Debian gcc 9.2.1.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/synthetic-events.c | 157 +++++++++++++++++++----------
 1 file changed, 105 insertions(+), 52 deletions(-)

diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 1f3d8d4bb879..64e010689a5f 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -36,6 +36,7 @@
 #include <string.h>
 #include <uapi/linux/mman.h> /* To get things like MAP_HUGETLB even on older libc headers */
 #include <api/fs/fs.h>
+#include <api/io.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
@@ -272,6 +273,79 @@ static int perf_event__synthesize_fork(struct perf_tool *tool,
 	return 0;
 }
 
+static bool read_proc_maps_line(struct io *io, __u64 *start, __u64 *end,
+				u32 *prot, u32 *flags, __u64 *offset,
+				u32 *maj, u32 *min,
+				__u64 *inode,
+				ssize_t pathname_size, char *pathname)
+{
+	__u64 temp;
+	int ch;
+	char *start_pathname = pathname;
+
+	if (get_hex(io, start) != '-')
+		return false;
+	if (get_hex(io, end) != ' ')
+		return false;
+
+	/* map protection and flags bits */
+	*prot = 0;
+	ch = get_char(io);
+	if (ch == 'r')
+		*prot |= PROT_READ;
+	else if (ch != '-')
+		return false;
+	ch = get_char(io);
+	if (ch == 'w')
+		*prot |= PROT_WRITE;
+	else if (ch != '-')
+		return false;
+	ch = get_char(io);
+	if (ch == 'x')
+		*prot |= PROT_EXEC;
+	else if (ch != '-')
+		return false;
+	ch = get_char(io);
+	if (ch == 's')
+		*flags = MAP_SHARED;
+	else if (ch == 'p')
+		*flags = MAP_PRIVATE;
+	else
+		return false;
+	if (get_char(io) != ' ')
+		return false;
+
+	if (get_hex(io, offset) != ' ')
+		return false;
+
+	if (get_hex(io, &temp) != ':')
+		return false;
+	*maj = temp;
+	if (get_hex(io, &temp) != ' ')
+		return false;
+	*min = temp;
+
+	ch = get_dec(io, inode);
+	if (ch != ' ') {
+		*pathname = '\0';
+		return ch == '\n';
+	}
+	do {
+		ch = get_char(io);
+	} while (ch == ' ');
+	while (true) {
+		if (ch < 0)
+			return false;
+		if (ch == '\0' || ch == '\n' ||
+		    (pathname + 1 - start_pathname) >= pathname_size) {
+			*pathname = '\0';
+			return true;
+		}
+		*pathname++ = ch;
+		ch = get_char(io);
+	}
+}
+
 int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 				       union perf_event *event,
 				       pid_t pid, pid_t tgid,
@@ -279,9 +353,9 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 				       struct machine *machine,
 				       bool mmap_data)
 {
-	FILE *fp;
 	unsigned long long t;
 	char bf[BUFSIZ];
+	struct io io;
 	bool truncation = false;
 	unsigned long long timeout = proc_map_timeout * 1000000ULL;
 	int rc = 0;
@@ -294,28 +368,39 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 	snprintf(bf, sizeof(bf), "%s/proc/%d/task/%d/maps",
 		machine->root_dir, pid, pid);
 
-	fp = fopen(bf, "r");
-	if (fp == NULL) {
+	io.fd = open(bf, O_RDONLY, 0);
+	if (io.fd < 0) {
 		/*
 		 * We raced with a task exiting - just return:
 		 */
 		pr_debug("couldn't open %s\n", bf);
 		return -1;
 	}
+	init_io(&io, io.fd, bf, sizeof(bf));
 
 	event->header.type = PERF_RECORD_MMAP2;
 	t = rdclock();
 
-	while (1) {
-		char prot[5];
-		char execname[PATH_MAX];
-		char anonstr[] = "//anon";
-		unsigned int ino;
+	while (!io.eof) {
+		static const char anonstr[] = "//anon";
 		size_t size;
-		ssize_t n;
 
-		if (fgets(bf, sizeof(bf), fp) == NULL)
-			break;
+		/* ensure null termination since stack will be reused. */
+		strcpy(event->mmap2.filename, "");
+
+		/* 00400000-0040c000 r-xp 00000000 fd:01 41038  /bin/cat */
+		if (!read_proc_maps_line(&io,
+					&event->mmap2.start,
+					&event->mmap2.len,
+					&event->mmap2.prot,
+					&event->mmap2.flags,
+					&event->mmap2.pgoff,
+					&event->mmap2.maj,
+					&event->mmap2.min,
+					&event->mmap2.ino,
+					sizeof(event->mmap2.filename),
+					event->mmap2.filename))
+			continue;
 
 		if ((rdclock() - t) > timeout) {
 			pr_warning("Reading %s/proc/%d/task/%d/maps time out. "
@@ -326,23 +411,6 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 			goto out;
 		}
 
-		/* ensure null termination since stack will be reused. */
-		strcpy(execname, "");
-
-		/* 00400000-0040c000 r-xp 00000000 fd:01 41038  /bin/cat */
-		n = sscanf(bf, "%"PRI_lx64"-%"PRI_lx64" %s %"PRI_lx64" %x:%x %u %[^\n]\n",
-		       &event->mmap2.start, &event->mmap2.len, prot,
-		       &event->mmap2.pgoff, &event->mmap2.maj,
-		       &event->mmap2.min,
-		       &ino, execname);
-
-		/*
- 		 * Anon maps don't have the execname.
- 		 */
-		if (n < 7)
-			continue;
-
-		event->mmap2.ino = (u64)ino;
 		event->mmap2.ino_generation = 0;
 
 		/*
@@ -353,23 +421,8 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 		else
 			event->header.misc = PERF_RECORD_MISC_GUEST_USER;
 
-		/* map protection and flags bits */
-		event->mmap2.prot = 0;
-		event->mmap2.flags = 0;
-		if (prot[0] == 'r')
-			event->mmap2.prot |= PROT_READ;
-		if (prot[1] == 'w')
-			event->mmap2.prot |= PROT_WRITE;
-		if (prot[2] == 'x')
-			event->mmap2.prot |= PROT_EXEC;
-
-		if (prot[3] == 's')
-			event->mmap2.flags |= MAP_SHARED;
-		else
-			event->mmap2.flags |= MAP_PRIVATE;
-
-		if (prot[2] != 'x') {
-			if (!mmap_data || prot[0] != 'r')
+		if ((event->mmap2.prot & PROT_EXEC) == 0) {
+			if (!mmap_data || (event->mmap2.prot & PROT_READ) == 0)
 				continue;
 
 			event->header.misc |= PERF_RECORD_MISC_MMAP_DATA;
@@ -379,17 +432,17 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 		if (truncation)
 			event->header.misc |= PERF_RECORD_MISC_PROC_MAP_PARSE_TIMEOUT;
 
-		if (!strcmp(execname, ""))
-			strcpy(execname, anonstr);
+		if (!strcmp(event->mmap2.filename, ""))
+			strcpy(event->mmap2.filename, anonstr);
 
 		if (hugetlbfs_mnt_len &&
-		    !strncmp(execname, hugetlbfs_mnt, hugetlbfs_mnt_len)) {
-			strcpy(execname, anonstr);
+		    !strncmp(event->mmap2.filename, hugetlbfs_mnt,
+			     hugetlbfs_mnt_len)) {
+			strcpy(event->mmap2.filename, anonstr);
 			event->mmap2.flags |= MAP_HUGETLB;
 		}
 
-		size = strlen(execname) + 1;
-		memcpy(event->mmap2.filename, execname, size);
+		size = strlen(event->mmap2.filename) + 1;
 		size = PERF_ALIGN(size, sizeof(u64));
 		event->mmap2.len -= event->mmap.start;
 		event->mmap2.header.size = (sizeof(event->mmap2) -
@@ -408,7 +461,7 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 			break;
 	}
 
-	fclose(fp);
+	close(io.fd);
 	return rc;
 }
 
-- 
2.26.0.rc2.310.g2932bb562d-goog



* Re: [PATCH 5/5] perf synthetic events: Remove use of sscanf from /proc reading
  2020-04-01 23:39 ` [PATCH 5/5] perf synthetic events: Remove use of sscanf from /proc reading Ian Rogers
@ 2020-04-02 13:40   ` Jiri Olsa
  2020-04-02 13:41   ` Jiri Olsa
  1 sibling, 0 replies; 10+ messages in thread
From: Jiri Olsa @ 2020-04-02 13:40 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Petr Mladek,
	Andrey Zhizhikin, Kefeng Wang, Thomas Gleixner, Kan Liang,
	linux-kernel, linux-perf-users, Stephane Eranian

On Wed, Apr 01, 2020 at 04:39:45PM -0700, Ian Rogers wrote:
> The synthesize benchmark, run on a single process and thread, shows
> perf_event__synthesize_mmap_events as the hottest function with fgets
> and sscanf taking the majority of execution time. fscanf performs
> similarly well. Replace the scanf call with manual reading of each field
> of the /proc/pid/maps line, and remove some unnecessary buffering.
> This change also addresses potential, but unlikely, buffer overruns for
> the string values read by scanf.
> 
> Performance before is:
> Average synthesis took: 120.195100 usec
> Average data synthesis took: 156.582300 usec
> 
> And after is:
> Average synthesis took: 67.189100 usec
> Average data synthesis took: 102.451600 usec
> 
> On a Intel Xeon 6154 compiling with Debian gcc 9.2.1.

I can see the speedup as well, thanks!

jirka



* Re: [PATCH 5/5] perf synthetic events: Remove use of sscanf from /proc reading
  2020-04-01 23:39 ` [PATCH 5/5] perf synthetic events: Remove use of sscanf from /proc reading Ian Rogers
  2020-04-02 13:40   ` Jiri Olsa
@ 2020-04-02 13:41   ` Jiri Olsa
  1 sibling, 0 replies; 10+ messages in thread
From: Jiri Olsa @ 2020-04-02 13:41 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Petr Mladek,
	Andrey Zhizhikin, Kefeng Wang, Thomas Gleixner, Kan Liang,
	linux-kernel, linux-perf-users, Stephane Eranian

On Wed, Apr 01, 2020 at 04:39:45PM -0700, Ian Rogers wrote:

SNIP

> @@ -279,9 +353,9 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
>  				       struct machine *machine,
>  				       bool mmap_data)
>  {
> -	FILE *fp;
>  	unsigned long long t;
>  	char bf[BUFSIZ];
> +	struct io io;
>  	bool truncation = false;
>  	unsigned long long timeout = proc_map_timeout * 1000000ULL;
>  	int rc = 0;
> @@ -294,28 +368,39 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
>  	snprintf(bf, sizeof(bf), "%s/proc/%d/task/%d/maps",
>  		machine->root_dir, pid, pid);
>  
> -	fp = fopen(bf, "r");
> -	if (fp == NULL) {
> +	io.fd = open(bf, O_RDONLY, 0);
> +	if (io.fd < 0) {
>  		/*
>  		 * We raced with a task exiting - just return:
>  		 */
>  		pr_debug("couldn't open %s\n", bf);
>  		return -1;
>  	}
> +	init_io(&io, io.fd, bf, sizeof(bf));
>  
>  	event->header.type = PERF_RECORD_MMAP2;
>  	t = rdclock();
>  
> -	while (1) {
> -		char prot[5];
> -		char execname[PATH_MAX];
> -		char anonstr[] = "//anon";
> -		unsigned int ino;
> +	while (!io.eof) {
> +		static const char anonstr[] = "//anon";
>  		size_t size;
> -		ssize_t n;
>  
> -		if (fgets(bf, sizeof(bf), fp) == NULL)
> -			break;
> +		/* ensure null termination since stack will be reused. */
> +		strcpy(event->mmap2.filename, "");

could you just do 'event->mmap2.filename[0] = 0x0' instad ?

jirka

> +
> +		/* 00400000-0040c000 r-xp 00000000 fd:01 41038  /bin/cat */
> +		if (!read_proc_maps_line(&io,
> +					&event->mmap2.start,
> +					&event->mmap2.len,
> +					&event->mmap2.prot,
> +					&event->mmap2.flags,
> +					&event->mmap2.pgoff,
> +					&event->mmap2.maj,
> +					&event->mmap2.min,
> +					&event->mmap2.ino,
> +					sizeof(event->mmap2.filename),
> +					event->mmap2.filename))
> +			continue;
>  

SNIP



* Re: [PATCH 1/5] perf bench: add event synthesis benchmark
  2020-04-01 23:39 ` [PATCH 1/5] perf bench: add event synthesis benchmark Ian Rogers
@ 2020-04-02 13:41   ` Jiri Olsa
  0 siblings, 0 replies; 10+ messages in thread
From: Jiri Olsa @ 2020-04-02 13:41 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Petr Mladek,
	Andrey Zhizhikin, Kefeng Wang, Thomas Gleixner, Kan Liang,
	linux-kernel, linux-perf-users, Stephane Eranian

On Wed, Apr 01, 2020 at 04:39:41PM -0700, Ian Rogers wrote:
> Event synthesis may occur at the start or end (tail) of a perf command.
> In system-wide mode it can scan every process in /proc, which may add
> seconds of latency before event recording. Add a new benchmark that
> times how long event synthesis takes with and without data synthesis.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>

awesome, please put the command line example in changelog

jirka



* Re: [PATCH 4/5] tools api: add a lightweight buffered reading api
  2020-04-01 23:39 ` [PATCH 4/5] tools api: add a lightweight buffered reading api Ian Rogers
@ 2020-04-02 13:41   ` Jiri Olsa
  0 siblings, 0 replies; 10+ messages in thread
From: Jiri Olsa @ 2020-04-02 13:41 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Petr Mladek,
	Andrey Zhizhikin, Kefeng Wang, Thomas Gleixner, Kan Liang,
	linux-kernel, linux-perf-users, Stephane Eranian

On Wed, Apr 01, 2020 at 04:39:44PM -0700, Ian Rogers wrote:
> The synthesize benchmark shows the majority of execution time going to
> fgets and sscanf, necessary to parse /proc/pid/maps. Add a new buffered
> reading library that will be used to replace these calls in a follow-up
> CL.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/lib/api/io.h | 103 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 103 insertions(+)
>  create mode 100644 tools/lib/api/io.h
> 
> diff --git a/tools/lib/api/io.h b/tools/lib/api/io.h
> new file mode 100644
> index 000000000000..e88d76c59440
> --- /dev/null
> +++ b/tools/lib/api/io.h
> @@ -0,0 +1,103 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Lightweight buffered reading library.
> + *
> + * Copyright 2019 Google LLC.
> + */

missing the recursion header check

#ifndef __PERF_IO_H
#define __PERF_IO_H 

> +
> +struct io {
> +	/* File descriptor being read/ */
> +	int fd;
> +	/* Size of the read buffer. */
> +	unsigned int buf_len;
> +	/* Pointer to storage for buffering read. */
> +	char *buf;
> +	/* End of the storage. */
> +	char *end;
> +	/* Currently accessed data pointer. */
> +	char *data;
> +	/* Set true on when the end of file on read error. */
> +	bool eof;
> +};
> +
> +static inline void init_io(struct io *io, int fd,
> +			   char *buf, unsigned int buf_len)
> +{
> +	io->fd = fd;
> +	io->buf_len = buf_len;
> +	io->buf = buf;
> +	io->end = buf;
> +	io->data = buf;
> +	io->eof = false;
> +}

looks good, could you please add io__ prefix for all functions?

thanks,
jirka

> +
> +/* Reads one character from the "io" file with similar semantics to fgetc. */
> +static inline int get_char(struct io *io)
> +{
> +	char *ptr = io->data;
> +
> +	if (ptr == io->end) {
> +		ssize_t n = read(io->fd, io->buf, io->buf_len);
> +
> +		if (n <= 0) {
> +			io->eof = true;
> +			return -1;
> +		}
> +		ptr = &io->buf[0];
> +		io->end = &io->buf[n];
> +	}
> +	io->data = ptr + 1;
> +	return *ptr;
> +}
> +
> +/* Read a hexadecimal value with no 0x prefix into the out argument hex.
> + * Returns -1 on error or if nothing is read, otherwise returns the character
> + * after the hexadecimal value.
> + */
> +static inline int get_hex(struct io *io, __u64 *hex)
> +{
> +	bool first_read = true;
> +
> +	*hex = 0;
> +	while (true) {
> +		char ch = get_char(io);
> +
> +		if (ch < 0)
> +			return ch;
> +		if (ch >= '0' && ch <= '9')
> +			*hex = (*hex << 4) | (ch - '0');
> +		else if (ch >= 'a' && ch <= 'f')
> +			*hex = (*hex << 4) | (ch - 'a' + 10);
> +		else if (ch >= 'A' && ch <= 'F')
> +			*hex = (*hex << 4) | (ch - 'A' + 10);
> +		else if (first_read)
> +			return -1;
> +		else
> +			return ch;
> +		first_read = false;
> +	}
> +}
> +
> +/* Read a decimal value into the out argument dec.
> + * Returns -1 on error or if nothing is read, otherwise returns the character
> + * after the decimal value.
> + */
> +static inline int get_dec(struct io *io, __u64 *dec)
> +{
> +	bool first_read = true;
> +
> +	*dec = 0;
> +	while (true) {
> +		char ch = get_char(io);
> +
> +		if (ch < 0)
> +			return ch;
> +		if (ch >= '0' && ch <= '9')
> +			*dec = (*dec * 10) + ch - '0';
> +		else if (first_read)
> +			return -1;
> +		else
> +			return ch;
> +		first_read = false;
> +	}
> +}
> -- 
> 2.26.0.rc2.310.g2932bb562d-goog
> 



