linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/5] Benchmark and improve event synthesis performance
@ 2020-04-02 15:43 Ian Rogers
  2020-04-02 15:43 ` [PATCH v2 1/5] perf bench: add event synthesis benchmark Ian Rogers
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Ian Rogers @ 2020-04-02 15:43 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Petr Mladek, Andrey Zhizhikin, Kefeng Wang, Thomas Gleixner,
	Kan Liang, linux-kernel, linux-perf-users
  Cc: Stephane Eranian, Ian Rogers

Event synthesis is performance critical in common tasks using perf. For
example, when perf record starts in system wide mode the /proc file
system is scanned with events synthesized for each process and all
executable mmaps. With large machines and lots of processes, we have seen
O(seconds) of wall clock time while synthesis is occurring.

This patch set adds a benchmark for synthesis performance in a new
benchmark collection called 'internals'. The benchmark uses the
machine__synthesize_threads function, single threaded on the perf process
with a 'tool' that just drops the events, to measure how long synthesis
takes.

By profiling this benchmark 2 performance bottlenecks were identified,
hugetlbfs_mountpoint and stdio. The impact of theses changes are:

Before:
Average synthesis took: 167.616800 usec
Average data synthesis took: 208.655600 usec

After hugetlbfs_mountpoint scalability fix:
Average synthesis took: 120.195100 usec
Average data synthesis took: 156.582300 usec

After removal of stdio in /proc/pid/maps code:
Average synthesis took: 67.189100 usec
Average data synthesis took: 102.451600 usec

Time was measured on an Intel Xeon 6154 compiling with Debian gcc 9.2.1.

v2 of this patch set adds the new benchmark to the perf-bench man page
and addresses review comments from Jiri Olsa, thanks!

Two patches in the set were sent to LKML previously but are included
here for context around the benchmark performance impact:
https://lore.kernel.org/lkml/20200327172914.28603-1-irogers@google.com/T/#u
https://lore.kernel.org/lkml/20200328014221.168130-1-irogers@google.com/T/#u

A future area of improvement could be to add the perf top
num-thread-synthesize option more widely to other perf commands, and
also to benchmark its effectiveness.

Ian Rogers (4):
  perf bench: add event synthesis benchmark
  perf synthetic-events: save 4kb from 2 stack frames
  tools api: add a lightweight buffered reading api
  perf synthetic events: Remove use of sscanf from /proc reading

Stephane Eranian (1):
  tools api fs: make xxx__mountpoint() more scalable

 tools/lib/api/fs/fs.c                   |  17 +++
 tools/lib/api/fs/fs.h                   |  12 ++
 tools/lib/api/io.h                      | 107 ++++++++++++++
 tools/perf/Documentation/perf-bench.txt |   8 ++
 tools/perf/bench/Build                  |   2 +-
 tools/perf/bench/bench.h                |   2 +-
 tools/perf/bench/synthesize.c           | 101 ++++++++++++++
 tools/perf/builtin-bench.c              |   6 +
 tools/perf/util/synthetic-events.c      | 177 +++++++++++++++---------
 9 files changed, 367 insertions(+), 65 deletions(-)
 create mode 100644 tools/lib/api/io.h
 create mode 100644 tools/perf/bench/synthesize.c

-- 
2.26.0.rc2.310.g2932bb562d-goog


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 1/5] perf bench: add event synthesis benchmark
  2020-04-02 15:43 [PATCH v2 0/5] Benchmark and improve event synthesis performance Ian Rogers
@ 2020-04-02 15:43 ` Ian Rogers
  2020-04-06 14:07   ` Arnaldo Carvalho de Melo
  2020-04-22 12:17   ` [tip: perf/core] perf bench: Add " tip-bot2 for Ian Rogers
  2020-04-02 15:43 ` [PATCH v2 2/5] tools api fs: make xxx__mountpoint() more scalable Ian Rogers
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Ian Rogers @ 2020-04-02 15:43 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Petr Mladek, Andrey Zhizhikin, Kefeng Wang, Thomas Gleixner,
	Kan Liang, linux-kernel, linux-perf-users
  Cc: Stephane Eranian, Ian Rogers

Event synthesis may occur at the start or end (tail) of a perf command.
In system-wide mode it can scan every process in /proc, which may add
seconds of latency before event recording. Add a new benchmark that
times how long event synthesis takes with and without data synthesis.

An example execution looks like:
 $ perf bench internals synthesize
 # Running 'internals/synthesize' benchmark:
 Average synthesis took: 168.253800 usec
 Average data synthesis took: 208.104700 usec

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/Documentation/perf-bench.txt |   8 ++
 tools/perf/bench/Build                  |   2 +-
 tools/perf/bench/bench.h                |   2 +-
 tools/perf/bench/synthesize.c           | 101 ++++++++++++++++++++++++
 tools/perf/builtin-bench.c              |   6 ++
 5 files changed, 117 insertions(+), 2 deletions(-)
 create mode 100644 tools/perf/bench/synthesize.c

diff --git a/tools/perf/Documentation/perf-bench.txt b/tools/perf/Documentation/perf-bench.txt
index 0921a3c67381..bad16512c48d 100644
--- a/tools/perf/Documentation/perf-bench.txt
+++ b/tools/perf/Documentation/perf-bench.txt
@@ -61,6 +61,9 @@ SUBSYSTEM
 'epoll'::
 	Eventpoll (epoll) stressing benchmarks.
 
+'internals'::
+	Benchmark internal perf functionality.
+
 'all'::
 	All benchmark subsystems.
 
@@ -214,6 +217,11 @@ Suite for evaluating concurrent epoll_wait calls.
 *ctl*::
 Suite for evaluating multiple epoll_ctl calls.
 
+SUITES FOR 'internals'
+~~~~~~~~~~~~~~~~~~~~~~
+*synthesize*::
+Suite for evaluating perf's event synthesis performance.
+
 SEE ALSO
 --------
 linkperf:perf[1]
diff --git a/tools/perf/bench/Build b/tools/perf/bench/Build
index e4e321b6f883..042827385c87 100644
--- a/tools/perf/bench/Build
+++ b/tools/perf/bench/Build
@@ -6,9 +6,9 @@ perf-y += futex-wake.o
 perf-y += futex-wake-parallel.o
 perf-y += futex-requeue.o
 perf-y += futex-lock-pi.o
-
 perf-y += epoll-wait.o
 perf-y += epoll-ctl.o
+perf-y += synthesize.o
 
 perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-lib.o
 perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-asm.o
diff --git a/tools/perf/bench/bench.h b/tools/perf/bench/bench.h
index 4aa6de1aa67d..4d669c803237 100644
--- a/tools/perf/bench/bench.h
+++ b/tools/perf/bench/bench.h
@@ -41,9 +41,9 @@ int bench_futex_wake_parallel(int argc, const char **argv);
 int bench_futex_requeue(int argc, const char **argv);
 /* pi futexes */
 int bench_futex_lock_pi(int argc, const char **argv);
-
 int bench_epoll_wait(int argc, const char **argv);
 int bench_epoll_ctl(int argc, const char **argv);
+int bench_synthesize(int argc, const char **argv);
 
 #define BENCH_FORMAT_DEFAULT_STR	"default"
 #define BENCH_FORMAT_DEFAULT		0
diff --git a/tools/perf/bench/synthesize.c b/tools/perf/bench/synthesize.c
new file mode 100644
index 000000000000..6291257bc9c9
--- /dev/null
+++ b/tools/perf/bench/synthesize.c
@@ -0,0 +1,101 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Benchmark synthesis of perf events such as at the start of a 'perf
+ * record'. Synthesis is done on the current process and the 'dummy' event
+ * handlers are invoked that support dump_trace but otherwise do nothing.
+ *
+ * Copyright 2019 Google LLC.
+ */
+#include <stdio.h>
+#include "bench.h"
+#include "../util/debug.h"
+#include "../util/session.h"
+#include "../util/synthetic-events.h"
+#include "../util/target.h"
+#include "../util/thread_map.h"
+#include "../util/tool.h"
+#include <linux/err.h>
+#include <linux/time64.h>
+#include <subcmd/parse-options.h>
+
+static unsigned int iterations = 10000;
+
+static const struct option options[] = {
+	OPT_UINTEGER('i', "iterations", &iterations,
+		"Number of iterations used to compute average"),
+	OPT_END()
+};
+
+static const char *const usage[] = {
+	"perf bench internals synthesize <options>",
+	NULL
+};
+
+
+static int do_synthesize(struct perf_session *session,
+			struct perf_thread_map *threads,
+			struct target *target, bool data_mmap)
+{
+	const unsigned int nr_threads_synthesize = 1;
+	struct timeval start, end, diff;
+	u64 runtime_us;
+	unsigned int i;
+	double average;
+	int err;
+
+	gettimeofday(&start, NULL);
+	for (i = 0; i < iterations; i++) {
+		err = machine__synthesize_threads(&session->machines.host,
+						target, threads, data_mmap,
+						nr_threads_synthesize);
+		if (err)
+			return err;
+	}
+
+	gettimeofday(&end, NULL);
+	timersub(&end, &start, &diff);
+	runtime_us = diff.tv_sec * USEC_PER_SEC + diff.tv_usec;
+	average = (double)runtime_us/(double)iterations;
+	printf("Average %ssynthesis took: %f usec\n",
+		data_mmap ? "data " : "", average);
+	return 0;
+}
+
+int bench_synthesize(int argc, const char **argv)
+{
+	struct perf_tool tool;
+	struct perf_session *session;
+	struct target target = {
+		.pid = "self",
+	};
+	struct perf_thread_map *threads;
+	int err;
+
+	argc = parse_options(argc, argv, options, usage, 0);
+
+	session = perf_session__new(NULL, false, NULL);
+	if (IS_ERR(session)) {
+		pr_err("Session creation failed.\n");
+		return PTR_ERR(session);
+	}
+	threads = thread_map__new_by_pid(getpid());
+	if (!threads) {
+		pr_err("Thread map creation failed.\n");
+		err = -ENOMEM;
+		goto err_out;
+	}
+	perf_tool__fill_defaults(&tool);
+
+	err = do_synthesize(session, threads, &target, false);
+	if (err)
+		goto err_out;
+
+	err = do_synthesize(session, threads, &target, true);
+
+err_out:
+	if (threads)
+		perf_thread_map__put(threads);
+
+	perf_session__delete(session);
+	return err;
+}
diff --git a/tools/perf/builtin-bench.c b/tools/perf/builtin-bench.c
index c06fe21c8613..11c79a8d85d6 100644
--- a/tools/perf/builtin-bench.c
+++ b/tools/perf/builtin-bench.c
@@ -76,6 +76,11 @@ static struct bench epoll_benchmarks[] = {
 };
 #endif // HAVE_EVENTFD
 
+static struct bench internals_benchmarks[] = {
+	{ "synthesize", "Benchmark perf event synthesis",	bench_synthesize	},
+	{ NULL,		NULL,					NULL			}
+};
+
 struct collection {
 	const char	*name;
 	const char	*summary;
@@ -92,6 +97,7 @@ static struct collection collections[] = {
 #ifdef HAVE_EVENTFD
 	{"epoll",       "Epoll stressing benchmarks",                   epoll_benchmarks        },
 #endif
+	{ "internals",	"Perf-internals benchmarks",			internals_benchmarks	},
 	{ "all",	"All benchmarks",				NULL			},
 	{ NULL,		NULL,						NULL			}
 };
-- 
2.26.0.rc2.310.g2932bb562d-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 2/5] tools api fs: make xxx__mountpoint() more scalable
  2020-04-02 15:43 [PATCH v2 0/5] Benchmark and improve event synthesis performance Ian Rogers
  2020-04-02 15:43 ` [PATCH v2 1/5] perf bench: add event synthesis benchmark Ian Rogers
@ 2020-04-02 15:43 ` Ian Rogers
  2020-04-06 14:07   ` Arnaldo Carvalho de Melo
  2020-04-22 12:17   ` [tip: perf/core] tools api fs: Make " tip-bot2 for Stephane Eranian
  2020-04-02 15:43 ` [PATCH v2 3/5] perf synthetic-events: save 4kb from 2 stack frames Ian Rogers
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Ian Rogers @ 2020-04-02 15:43 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Petr Mladek, Andrey Zhizhikin, Kefeng Wang, Thomas Gleixner,
	Kan Liang, linux-kernel, linux-perf-users
  Cc: Stephane Eranian, Ian Rogers

From: Stephane Eranian <eranian@google.com>

The xxx_mountpoint() interface provided by fs.c finds
mount points for common pseudo filesystems. The first
time xxx_mountpoint() is invoked, it scans the mount
table (/proc/mounts) looking for a match. If found, it
is cached. The price to scan /proc/mounts is paid once
if the mount is found.

When the mount point is not found, subsequent calls to
xxx_mountpoint() scan /proc/mounts over and over again.
There is no caching.

This causes a scaling issue in perf record with hugeltbfs__mountpoint().
The function is called for each process found in synthesize__mmap_events().
If the machine has thousands of processes and if the /proc/mounts has many
entries this could cause major overhead in perf record. We have observed
multi-second slowdowns on some configurations.

As an example on a laptop:

Before:
$ sudo umount /dev/hugepages
$ strace -e trace=openat -o /tmp/tt perf record -a ls
$ fgrep mounts /tmp/tt
285

After:
$ sudo umount /dev/hugepages
$ strace -e trace=openat -o /tmp/tt perf record -a ls
$ fgrep mounts /tmp/tt
1

One could argue that the non-caching in case the moint point is not found
is intentional. That way subsequent calls may discover a moint point if
the sysadmin mounts the filesystem. But the same argument could be made
against caching the mount point. It could be unmounted causing errors.
It all depends on the intent of the interface. This patch assumes it
is expected to scan /proc/mounts once. The patch documents the caching
behavior in the fs.h header file.

An alternative would be to just fix perf record. But it would solve
the problem with hugetlbs__mountpoint() but there could be similar
issues (possibly down the line) with other xxx_mountpoint() calls
in perf or other tools.

Signed-off-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/lib/api/fs/fs.c | 17 +++++++++++++++++
 tools/lib/api/fs/fs.h | 12 ++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c
index 027b18f7ed8c..82f53d81a7a7 100644
--- a/tools/lib/api/fs/fs.c
+++ b/tools/lib/api/fs/fs.c
@@ -90,6 +90,7 @@ struct fs {
 	const char * const	*mounts;
 	char			 path[PATH_MAX];
 	bool			 found;
+	bool			 checked;
 	long			 magic;
 };
 
@@ -111,31 +112,37 @@ static struct fs fs__entries[] = {
 		.name	= "sysfs",
 		.mounts	= sysfs__fs_known_mountpoints,
 		.magic	= SYSFS_MAGIC,
+		.checked = false,
 	},
 	[FS__PROCFS] = {
 		.name	= "proc",
 		.mounts	= procfs__known_mountpoints,
 		.magic	= PROC_SUPER_MAGIC,
+		.checked = false,
 	},
 	[FS__DEBUGFS] = {
 		.name	= "debugfs",
 		.mounts	= debugfs__known_mountpoints,
 		.magic	= DEBUGFS_MAGIC,
+		.checked = false,
 	},
 	[FS__TRACEFS] = {
 		.name	= "tracefs",
 		.mounts	= tracefs__known_mountpoints,
 		.magic	= TRACEFS_MAGIC,
+		.checked = false,
 	},
 	[FS__HUGETLBFS] = {
 		.name	= "hugetlbfs",
 		.mounts = hugetlbfs__known_mountpoints,
 		.magic	= HUGETLBFS_MAGIC,
+		.checked = false,
 	},
 	[FS__BPF_FS] = {
 		.name	= "bpf",
 		.mounts = bpf_fs__known_mountpoints,
 		.magic	= BPF_FS_MAGIC,
+		.checked = false,
 	},
 };
 
@@ -158,6 +165,7 @@ static bool fs__read_mounts(struct fs *fs)
 	}
 
 	fclose(fp);
+	fs->checked = true;
 	return fs->found = found;
 }
 
@@ -220,6 +228,7 @@ static bool fs__env_override(struct fs *fs)
 		return false;
 
 	fs->found = true;
+	fs->checked = true;
 	strncpy(fs->path, override_path, sizeof(fs->path) - 1);
 	fs->path[sizeof(fs->path) - 1] = '\0';
 	return true;
@@ -246,6 +255,14 @@ static const char *fs__mountpoint(int idx)
 	if (fs->found)
 		return (const char *)fs->path;
 
+	/* the mount point was already checked for the mount point
+	 * but and did not exist, so return NULL to avoid scanning again.
+	 * This makes the found and not found paths cost equivalent
+	 * in case of multiple calls.
+	 */
+	if (fs->checked)
+		return NULL;
+
 	return fs__get_mountpoint(fs);
 }
 
diff --git a/tools/lib/api/fs/fs.h b/tools/lib/api/fs/fs.h
index 936edb95e1f3..aa222ca30311 100644
--- a/tools/lib/api/fs/fs.h
+++ b/tools/lib/api/fs/fs.h
@@ -18,6 +18,18 @@
 	const char *name##__mount(void);	\
 	bool name##__configured(void);		\
 
+/*
+ * The xxxx__mountpoint() entry points find the first match mount point for each
+ * filesystems listed below, where xxxx is the filesystem type.
+ *
+ * The interface is as follows:
+ *
+ * - If a mount point is found on first call, it is cached and used for all
+ *   subsequent calls.
+ *
+ * - If a mount point is not found, NULL is returned on first call and all
+ *   subsequent calls.
+ */
 FS(sysfs)
 FS(procfs)
 FS(debugfs)
-- 
2.26.0.rc2.310.g2932bb562d-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 3/5] perf synthetic-events: save 4kb from 2 stack frames
  2020-04-02 15:43 [PATCH v2 0/5] Benchmark and improve event synthesis performance Ian Rogers
  2020-04-02 15:43 ` [PATCH v2 1/5] perf bench: add event synthesis benchmark Ian Rogers
  2020-04-02 15:43 ` [PATCH v2 2/5] tools api fs: make xxx__mountpoint() more scalable Ian Rogers
@ 2020-04-02 15:43 ` Ian Rogers
  2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Ian Rogers
  2020-04-02 15:43 ` [PATCH v2 4/5] tools api: add a lightweight buffered reading api Ian Rogers
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Ian Rogers @ 2020-04-02 15:43 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Petr Mladek, Andrey Zhizhikin, Kefeng Wang, Thomas Gleixner,
	Kan Liang, linux-kernel, linux-perf-users
  Cc: Stephane Eranian, Ian Rogers

Reuse an existing char buffer to avoid two PATH_MAX sized char buffers.
Reduces stack frame sizes by 4kb.

perf_event__synthesize_mmap_events before 'sub $0x45b8,%rsp' after
'sub $0x35b8,%rsp'.

perf_event__get_comm_ids before 'sub $0x2028,%rsp' after
'sub $0x1028,%rsp'.

The performance impact of this change is negligible.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/synthetic-events.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 3f28af39f9c6..1f3d8d4bb879 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -70,7 +70,6 @@ int perf_tool__process_synth_event(struct perf_tool *tool,
 static int perf_event__get_comm_ids(pid_t pid, char *comm, size_t len,
 				    pid_t *tgid, pid_t *ppid)
 {
-	char filename[PATH_MAX];
 	char bf[4096];
 	int fd;
 	size_t size = 0;
@@ -80,11 +79,11 @@ static int perf_event__get_comm_ids(pid_t pid, char *comm, size_t len,
 	*tgid = -1;
 	*ppid = -1;
 
-	snprintf(filename, sizeof(filename), "/proc/%d/status", pid);
+	snprintf(bf, sizeof(bf), "/proc/%d/status", pid);
 
-	fd = open(filename, O_RDONLY);
+	fd = open(bf, O_RDONLY);
 	if (fd < 0) {
-		pr_debug("couldn't open %s\n", filename);
+		pr_debug("couldn't open %s\n", bf);
 		return -1;
 	}
 
@@ -280,9 +279,9 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 				       struct machine *machine,
 				       bool mmap_data)
 {
-	char filename[PATH_MAX];
 	FILE *fp;
 	unsigned long long t;
+	char bf[BUFSIZ];
 	bool truncation = false;
 	unsigned long long timeout = proc_map_timeout * 1000000ULL;
 	int rc = 0;
@@ -292,15 +291,15 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 	if (machine__is_default_guest(machine))
 		return 0;
 
-	snprintf(filename, sizeof(filename), "%s/proc/%d/task/%d/maps",
-		 machine->root_dir, pid, pid);
+	snprintf(bf, sizeof(bf), "%s/proc/%d/task/%d/maps",
+		machine->root_dir, pid, pid);
 
-	fp = fopen(filename, "r");
+	fp = fopen(bf, "r");
 	if (fp == NULL) {
 		/*
 		 * We raced with a task exiting - just return:
 		 */
-		pr_debug("couldn't open %s\n", filename);
+		pr_debug("couldn't open %s\n", bf);
 		return -1;
 	}
 
@@ -308,7 +307,6 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 	t = rdclock();
 
 	while (1) {
-		char bf[BUFSIZ];
 		char prot[5];
 		char execname[PATH_MAX];
 		char anonstr[] = "//anon";
@@ -320,10 +318,10 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 			break;
 
 		if ((rdclock() - t) > timeout) {
-			pr_warning("Reading %s time out. "
+			pr_warning("Reading %s/proc/%d/task/%d/maps time out. "
 				   "You may want to increase "
 				   "the time limit by --proc-map-timeout\n",
-				   filename);
+				   machine->root_dir, pid, pid);
 			truncation = true;
 			goto out;
 		}
-- 
2.26.0.rc2.310.g2932bb562d-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 4/5] tools api: add a lightweight buffered reading api
  2020-04-02 15:43 [PATCH v2 0/5] Benchmark and improve event synthesis performance Ian Rogers
                   ` (2 preceding siblings ...)
  2020-04-02 15:43 ` [PATCH v2 3/5] perf synthetic-events: save 4kb from 2 stack frames Ian Rogers
@ 2020-04-02 15:43 ` Ian Rogers
  2020-04-04  3:06   ` Namhyung Kim
  2020-04-02 15:43 ` [PATCH v2 5/5] perf synthetic events: Remove use of sscanf from /proc reading Ian Rogers
  2020-04-03 11:01 ` [PATCH v2 0/5] Benchmark and improve event synthesis performance Jiri Olsa
  5 siblings, 1 reply; 17+ messages in thread
From: Ian Rogers @ 2020-04-02 15:43 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Petr Mladek, Andrey Zhizhikin, Kefeng Wang, Thomas Gleixner,
	Kan Liang, linux-kernel, linux-perf-users
  Cc: Stephane Eranian, Ian Rogers

The synthesize benchmark shows the majority of execution time going to
fgets and sscanf, necessary to parse /proc/pid/maps. Add a new buffered
reading library that will be used to replace these calls in a follow-up
CL.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/lib/api/io.h | 107 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 107 insertions(+)
 create mode 100644 tools/lib/api/io.h

diff --git a/tools/lib/api/io.h b/tools/lib/api/io.h
new file mode 100644
index 000000000000..5aa5b0e26a7a
--- /dev/null
+++ b/tools/lib/api/io.h
@@ -0,0 +1,107 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Lightweight buffered reading library.
+ *
+ * Copyright 2019 Google LLC.
+ */
+#ifndef __API_IO__
+#define __API_IO__
+
+struct io {
+	/* File descriptor being read/ */
+	int fd;
+	/* Size of the read buffer. */
+	unsigned int buf_len;
+	/* Pointer to storage for buffering read. */
+	char *buf;
+	/* End of the storage. */
+	char *end;
+	/* Currently accessed data pointer. */
+	char *data;
+	/* Set true on when the end of file on read error. */
+	bool eof;
+};
+
+static inline void io__init(struct io *io, int fd,
+			    char *buf, unsigned int buf_len)
+{
+	io->fd = fd;
+	io->buf_len = buf_len;
+	io->buf = buf;
+	io->end = buf;
+	io->data = buf;
+	io->eof = false;
+}
+
+/* Reads one character from the "io" file with similar semantics to fgetc. */
+static inline int io__get_char(struct io *io)
+{
+	char *ptr = io->data;
+
+	if (ptr == io->end) {
+		ssize_t n = read(io->fd, io->buf, io->buf_len);
+
+		if (n <= 0) {
+			io->eof = true;
+			return -1;
+		}
+		ptr = &io->buf[0];
+		io->end = &io->buf[n];
+	}
+	io->data = ptr + 1;
+	return *ptr;
+}
+
+/* Read a hexadecimal value with no 0x prefix into the out argument hex.
+ * Returns -1 on error or if nothing is read, otherwise returns the character
+ * after the hexadecimal value.
+ */
+static inline int io__get_hex(struct io *io, __u64 *hex)
+{
+	bool first_read = true;
+
+	*hex = 0;
+	while (true) {
+		char ch = io__get_char(io);
+
+		if (ch < 0)
+			return ch;
+		if (ch >= '0' && ch <= '9')
+			*hex = (*hex << 4) | (ch - '0');
+		else if (ch >= 'a' && ch <= 'f')
+			*hex = (*hex << 4) | (ch - 'a' + 10);
+		else if (ch >= 'A' && ch <= 'F')
+			*hex = (*hex << 4) | (ch - 'A' + 10);
+		else if (first_read)
+			return -1;
+		else
+			return ch;
+		first_read = false;
+	}
+}
+
+/* Read a decimal value into the out argument dec.
+ * Returns -1 on error or if nothing is read, otherwise returns the character
+ * after the decimal value.
+ */
+static inline int io__get_dec(struct io *io, __u64 *dec)
+{
+	bool first_read = true;
+
+	*dec = 0;
+	while (true) {
+		char ch = io__get_char(io);
+
+		if (ch < 0)
+			return ch;
+		if (ch >= '0' && ch <= '9')
+			*dec = (*dec * 10) + ch - '0';
+		else if (first_read)
+			return -1;
+		else
+			return ch;
+		first_read = false;
+	}
+}
+
+#endif /* __API_IO__ */
-- 
2.26.0.rc2.310.g2932bb562d-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 5/5] perf synthetic events: Remove use of sscanf from /proc reading
  2020-04-02 15:43 [PATCH v2 0/5] Benchmark and improve event synthesis performance Ian Rogers
                   ` (3 preceding siblings ...)
  2020-04-02 15:43 ` [PATCH v2 4/5] tools api: add a lightweight buffered reading api Ian Rogers
@ 2020-04-02 15:43 ` Ian Rogers
  2020-04-03 11:01 ` [PATCH v2 0/5] Benchmark and improve event synthesis performance Jiri Olsa
  5 siblings, 0 replies; 17+ messages in thread
From: Ian Rogers @ 2020-04-02 15:43 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Petr Mladek, Andrey Zhizhikin, Kefeng Wang, Thomas Gleixner,
	Kan Liang, linux-kernel, linux-perf-users
  Cc: Stephane Eranian, Ian Rogers

The synthesize benchmark, run on a single process and thread, shows
perf_event__synthesize_mmap_events as the hottest function with fgets
and sscanf taking the majority of execution time. fscanf performs
similarly well. Replace the scanf call with manual reading of each field
of the /proc/pid/maps line, and remove some unnecessary buffering.
This change also addresses potential, but unlikely, buffer overruns for
the string values read by scanf.

Performance before is:
Average synthesis took: 120.195100 usec
Average data synthesis took: 156.582300 usec

And after is:
Average synthesis took: 67.189100 usec
Average data synthesis took: 102.451600 usec

On a Intel Xeon 6154 compiling with Debian gcc 9.2.1.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/synthetic-events.c | 157 +++++++++++++++++++----------
 1 file changed, 105 insertions(+), 52 deletions(-)

diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 1f3d8d4bb879..ebe48e35499d 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -36,6 +36,7 @@
 #include <string.h>
 #include <uapi/linux/mman.h> /* To get things like MAP_HUGETLB even on older libc headers */
 #include <api/fs/fs.h>
+#include <api/io.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
@@ -272,6 +273,79 @@ static int perf_event__synthesize_fork(struct perf_tool *tool,
 	return 0;
 }
 
+static bool read_proc_maps_line(struct io *io, __u64 *start, __u64 *end,
+				u32 *prot, u32 *flags, __u64 *offset,
+				u32 *maj, u32 *min,
+				__u64 *inode,
+				ssize_t pathname_size, char *pathname)
+{
+	__u64 temp;
+	int ch;
+	char *start_pathname = pathname;
+
+	if (io__get_hex(io, start) != '-')
+		return false;
+	if (io__get_hex(io, end) != ' ')
+		return false;
+
+	/* map protection and flags bits */
+	*prot = 0;
+	ch = io__get_char(io);
+	if (ch == 'r')
+		*prot |= PROT_READ;
+	else if (ch != '-')
+		return false;
+	ch = io__get_char(io);
+	if (ch == 'w')
+		*prot |= PROT_WRITE;
+	else if (ch != '-')
+		return false;
+	ch = io__get_char(io);
+	if (ch == 'x')
+		*prot |= PROT_EXEC;
+	else if (ch != '-')
+		return false;
+	ch = io__get_char(io);
+	if (ch == 's')
+		*flags = MAP_SHARED;
+	else if (ch == 'p')
+		*flags = MAP_PRIVATE;
+	else
+		return false;
+	if (io__get_char(io) != ' ')
+		return false;
+
+	if (io__get_hex(io, offset) != ' ')
+		return false;
+
+	if (io__get_hex(io, &temp) != ':')
+		return false;
+	*maj = temp;
+	if (io__get_hex(io, &temp) != ' ')
+		return false;
+	*min = temp;
+
+	ch = io__get_dec(io, inode);
+	if (ch != ' ') {
+		*pathname = '\0';
+		return ch == '\n';
+	}
+	do {
+		ch = io__get_char(io);
+	} while (ch == ' ');
+	while (true) {
+		if (ch < 0)
+			return false;
+		if (ch == '\0' || ch == '\n' ||
+		    (pathname + 1 - start_pathname) >= pathname_size) {
+			*pathname = '\0';
+			return true;
+		}
+		*pathname++ = ch;
+		ch = io__get_char(io);
+	}
+}
+
 int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 				       union perf_event *event,
 				       pid_t pid, pid_t tgid,
@@ -279,9 +353,9 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 				       struct machine *machine,
 				       bool mmap_data)
 {
-	FILE *fp;
 	unsigned long long t;
 	char bf[BUFSIZ];
+	struct io io;
 	bool truncation = false;
 	unsigned long long timeout = proc_map_timeout * 1000000ULL;
 	int rc = 0;
@@ -294,28 +368,39 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 	snprintf(bf, sizeof(bf), "%s/proc/%d/task/%d/maps",
 		machine->root_dir, pid, pid);
 
-	fp = fopen(bf, "r");
-	if (fp == NULL) {
+	io.fd = open(bf, O_RDONLY, 0);
+	if (io.fd < 0) {
 		/*
 		 * We raced with a task exiting - just return:
 		 */
 		pr_debug("couldn't open %s\n", bf);
 		return -1;
 	}
+	io__init(&io, io.fd, bf, sizeof(bf));
 
 	event->header.type = PERF_RECORD_MMAP2;
 	t = rdclock();
 
-	while (1) {
-		char prot[5];
-		char execname[PATH_MAX];
-		char anonstr[] = "//anon";
-		unsigned int ino;
+	while (!io.eof) {
+		static const char anonstr[] = "//anon";
 		size_t size;
-		ssize_t n;
 
-		if (fgets(bf, sizeof(bf), fp) == NULL)
-			break;
+		/* ensure null termination since stack will be reused. */
+		event->mmap2.filename[0] = '\0';
+
+		/* 00400000-0040c000 r-xp 00000000 fd:01 41038  /bin/cat */
+		if (!read_proc_maps_line(&io,
+					&event->mmap2.start,
+					&event->mmap2.len,
+					&event->mmap2.prot,
+					&event->mmap2.flags,
+					&event->mmap2.pgoff,
+					&event->mmap2.maj,
+					&event->mmap2.min,
+					&event->mmap2.ino,
+					sizeof(event->mmap2.filename),
+					event->mmap2.filename))
+			continue;
 
 		if ((rdclock() - t) > timeout) {
 			pr_warning("Reading %s/proc/%d/task/%d/maps time out. "
@@ -326,23 +411,6 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 			goto out;
 		}
 
-		/* ensure null termination since stack will be reused. */
-		strcpy(execname, "");
-
-		/* 00400000-0040c000 r-xp 00000000 fd:01 41038  /bin/cat */
-		n = sscanf(bf, "%"PRI_lx64"-%"PRI_lx64" %s %"PRI_lx64" %x:%x %u %[^\n]\n",
-		       &event->mmap2.start, &event->mmap2.len, prot,
-		       &event->mmap2.pgoff, &event->mmap2.maj,
-		       &event->mmap2.min,
-		       &ino, execname);
-
-		/*
- 		 * Anon maps don't have the execname.
- 		 */
-		if (n < 7)
-			continue;
-
-		event->mmap2.ino = (u64)ino;
 		event->mmap2.ino_generation = 0;
 
 		/*
@@ -353,23 +421,8 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 		else
 			event->header.misc = PERF_RECORD_MISC_GUEST_USER;
 
-		/* map protection and flags bits */
-		event->mmap2.prot = 0;
-		event->mmap2.flags = 0;
-		if (prot[0] == 'r')
-			event->mmap2.prot |= PROT_READ;
-		if (prot[1] == 'w')
-			event->mmap2.prot |= PROT_WRITE;
-		if (prot[2] == 'x')
-			event->mmap2.prot |= PROT_EXEC;
-
-		if (prot[3] == 's')
-			event->mmap2.flags |= MAP_SHARED;
-		else
-			event->mmap2.flags |= MAP_PRIVATE;
-
-		if (prot[2] != 'x') {
-			if (!mmap_data || prot[0] != 'r')
+		if ((event->mmap2.prot & PROT_EXEC) == 0) {
+			if (!mmap_data || (event->mmap2.prot & PROT_READ) == 0)
 				continue;
 
 			event->header.misc |= PERF_RECORD_MISC_MMAP_DATA;
@@ -379,17 +432,17 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 		if (truncation)
 			event->header.misc |= PERF_RECORD_MISC_PROC_MAP_PARSE_TIMEOUT;
 
-		if (!strcmp(execname, ""))
-			strcpy(execname, anonstr);
+		if (!strcmp(event->mmap2.filename, ""))
+			strcpy(event->mmap2.filename, anonstr);
 
 		if (hugetlbfs_mnt_len &&
-		    !strncmp(execname, hugetlbfs_mnt, hugetlbfs_mnt_len)) {
-			strcpy(execname, anonstr);
+		    !strncmp(event->mmap2.filename, hugetlbfs_mnt,
+			     hugetlbfs_mnt_len)) {
+			strcpy(event->mmap2.filename, anonstr);
 			event->mmap2.flags |= MAP_HUGETLB;
 		}
 
-		size = strlen(execname) + 1;
-		memcpy(event->mmap2.filename, execname, size);
+		size = strlen(event->mmap2.filename) + 1;
 		size = PERF_ALIGN(size, sizeof(u64));
 		event->mmap2.len -= event->mmap.start;
 		event->mmap2.header.size = (sizeof(event->mmap2) -
@@ -408,7 +461,7 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 			break;
 	}
 
-	fclose(fp);
+	close(io.fd);
 	return rc;
 }
 
-- 
2.26.0.rc2.310.g2932bb562d-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/5] Benchmark and improve event synthesis performance
  2020-04-02 15:43 [PATCH v2 0/5] Benchmark and improve event synthesis performance Ian Rogers
                   ` (4 preceding siblings ...)
  2020-04-02 15:43 ` [PATCH v2 5/5] perf synthetic events: Remove use of sscanf from /proc reading Ian Rogers
@ 2020-04-03 11:01 ` Jiri Olsa
  5 siblings, 0 replies; 17+ messages in thread
From: Jiri Olsa @ 2020-04-03 11:01 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Namhyung Kim, Petr Mladek,
	Andrey Zhizhikin, Kefeng Wang, Thomas Gleixner, Kan Liang,
	linux-kernel, linux-perf-users, Stephane Eranian

On Thu, Apr 02, 2020 at 08:43:52AM -0700, Ian Rogers wrote:
> Event synthesis is performance critical in common tasks using perf. For
> example, when perf record starts in system wide mode the /proc file
> system is scanned with events synthesized for each process and all
> executable mmaps. With large machines and lots of processes, we have seen
> O(seconds) of wall clock time while synthesis is occurring.
> 
> This patch set adds a benchmark for synthesis performance in a new
> benchmark collection called 'internals'. The benchmark uses the
> machine__synthesize_threads function, single threaded on the perf process
> with a 'tool' that just drops the events, to measure how long synthesis
> takes.
> 
> By profiling this benchmark 2 performance bottlenecks were identified,
> hugetlbfs_mountpoint and stdio. The impact of theses changes are:
> 
> Before:
> Average synthesis took: 167.616800 usec
> Average data synthesis took: 208.655600 usec
> 
> After hugetlbfs_mountpoint scalability fix:
> Average synthesis took: 120.195100 usec
> Average data synthesis took: 156.582300 usec
> 
> After removal of stdio in /proc/pid/maps code:
> Average synthesis took: 67.189100 usec
> Average data synthesis took: 102.451600 usec
> 
> Time was measured on an Intel Xeon 6154 compiling with Debian gcc 9.2.1.
> 
> v2 of this patch set adds the new benchmark to the perf-bench man page
> and addresses review comments from Jiri Olsa, thanks!

Acked-by: Jiri Olsa <jolsa@redhat.com>

thanks,
jirka

> 
> Two patches in the set were sent to LKML previously but are included
> here for context around the benchmark performance impact:
> https://lore.kernel.org/lkml/20200327172914.28603-1-irogers@google.com/T/#u
> https://lore.kernel.org/lkml/20200328014221.168130-1-irogers@google.com/T/#u
> 
> A future area of improvement could be to add the perf top
> num-thread-synthesize option more widely to other perf commands, and
> also to benchmark its effectiveness.
> 
> Ian Rogers (4):
>   perf bench: add event synthesis benchmark
>   perf synthetic-events: save 4kb from 2 stack frames
>   tools api: add a lightweight buffered reading api
>   perf synthetic events: Remove use of sscanf from /proc reading
> 
> Stephane Eranian (1):
>   tools api fs: make xxx__mountpoint() more scalable
> 
>  tools/lib/api/fs/fs.c                   |  17 +++
>  tools/lib/api/fs/fs.h                   |  12 ++
>  tools/lib/api/io.h                      | 107 ++++++++++++++
>  tools/perf/Documentation/perf-bench.txt |   8 ++
>  tools/perf/bench/Build                  |   2 +-
>  tools/perf/bench/bench.h                |   2 +-
>  tools/perf/bench/synthesize.c           | 101 ++++++++++++++
>  tools/perf/builtin-bench.c              |   6 +
>  tools/perf/util/synthetic-events.c      | 177 +++++++++++++++---------
>  9 files changed, 367 insertions(+), 65 deletions(-)
>  create mode 100644 tools/lib/api/io.h
>  create mode 100644 tools/perf/bench/synthesize.c
> 
> -- 
> 2.26.0.rc2.310.g2932bb562d-goog
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/5] tools api: add a lightweight buffered reading api
  2020-04-02 15:43 ` [PATCH v2 4/5] tools api: add a lightweight buffered reading api Ian Rogers
@ 2020-04-04  3:06   ` Namhyung Kim
  2020-04-06 14:09     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 17+ messages in thread
From: Namhyung Kim @ 2020-04-04  3:06 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Petr Mladek,
	Andrey Zhizhikin, Kefeng Wang, Thomas Gleixner, Kan Liang,
	linux-kernel, linux-perf-users, Stephane Eranian

Hello,

On Fri, Apr 3, 2020 at 12:44 AM Ian Rogers <irogers@google.com> wrote:
>
> The synthesize benchmark shows the majority of execution time going to
> fgets and sscanf, necessary to parse /proc/pid/maps. Add a new buffered
> reading library that will be used to replace these calls in a follow-up
> CL.
>
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/lib/api/io.h | 107 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 107 insertions(+)
>  create mode 100644 tools/lib/api/io.h
>
> diff --git a/tools/lib/api/io.h b/tools/lib/api/io.h
> new file mode 100644
> index 000000000000..5aa5b0e26a7a
> --- /dev/null
> +++ b/tools/lib/api/io.h
> @@ -0,0 +1,107 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Lightweight buffered reading library.
> + *
> + * Copyright 2019 Google LLC.
> + */
> +#ifndef __API_IO__
> +#define __API_IO__
> +
> +struct io {
> +       /* File descriptor being read/ */
> +       int fd;
> +       /* Size of the read buffer. */
> +       unsigned int buf_len;
> +       /* Pointer to storage for buffering read. */
> +       char *buf;
> +       /* End of the storage. */
> +       char *end;
> +       /* Currently accessed data pointer. */
> +       char *data;
> +       /* Set true on when the end of file on read error. */
> +       bool eof;
> +};
> +
> +static inline void io__init(struct io *io, int fd,
> +                           char *buf, unsigned int buf_len)
> +{
> +       io->fd = fd;
> +       io->buf_len = buf_len;
> +       io->buf = buf;
> +       io->end = buf;
> +       io->data = buf;
> +       io->eof = false;
> +}
> +
> +/* Reads one character from the "io" file with similar semantics to fgetc. */
> +static inline int io__get_char(struct io *io)
> +{
> +       char *ptr = io->data;
> +
> +       if (ptr == io->end) {
> +               ssize_t n = read(io->fd, io->buf, io->buf_len);
> +
> +               if (n <= 0) {
> +                       io->eof = true;
> +                       return -1;
> +               }
> +               ptr = &io->buf[0];
> +               io->end = &io->buf[n];
> +       }
> +       io->data = ptr + 1;
> +       return *ptr;
> +}
> +
> +/* Read a hexadecimal value with no 0x prefix into the out argument hex.
> + * Returns -1 on error or if nothing is read, otherwise returns the character
> + * after the hexadecimal value.
> + */
> +static inline int io__get_hex(struct io *io, __u64 *hex)
> +{
> +       bool first_read = true;
> +
> +       *hex = 0;
> +       while (true) {
> +               char ch = io__get_char(io);
> +

Maybe you can add this

    if (io->eof)
        return 0;

Please see below


> +               if (ch < 0)
> +                       return ch;
> +               if (ch >= '0' && ch <= '9')
> +                       *hex = (*hex << 4) | (ch - '0');
> +               else if (ch >= 'a' && ch <= 'f')
> +                       *hex = (*hex << 4) | (ch - 'a' + 10);
> +               else if (ch >= 'A' && ch <= 'F')
> +                       *hex = (*hex << 4) | (ch - 'A' + 10);
> +               else if (first_read)
> +                       return -1;
> +               else
> +                       return ch;
> +               first_read = false;
> +       }
> +}

What if a file contains hex digits at the end (without trailing spaces)?
I guess it'd see EOF and return -1, right?

And it'd better to be clear when it sees a big hex numbers -
it could have a comment that it'd simply discard upper bits
or return an error.

> +
> +/* Read a decimal value into the out argument dec.
> + * Returns -1 on error or if nothing is read, otherwise returns the character
> + * after the decimal value.
> + */
> +static inline int io__get_dec(struct io *io, __u64 *dec)
> +{
> +       bool first_read = true;
> +
> +       *dec = 0;
> +       while (true) {
> +               char ch = io__get_char(io);
> +
> +               if (ch < 0)
> +                       return ch;
> +               if (ch >= '0' && ch <= '9')
> +                       *dec = (*dec * 10) + ch - '0';
> +               else if (first_read)
> +                       return -1;
> +               else
> +                       return ch;
> +               first_read = false;
> +       }
> +}

Ditto.

Thanks
Namhyung


> +
> +#endif /* __API_IO__ */
> --
> 2.26.0.rc2.310.g2932bb562d-goog
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 1/5] perf bench: add event synthesis benchmark
  2020-04-02 15:43 ` [PATCH v2 1/5] perf bench: add event synthesis benchmark Ian Rogers
@ 2020-04-06 14:07   ` Arnaldo Carvalho de Melo
  2020-04-22 12:17   ` [tip: perf/core] perf bench: Add " tip-bot2 for Ian Rogers
  1 sibling, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-04-06 14:07 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Petr Mladek, Andrey Zhizhikin,
	Kefeng Wang, Thomas Gleixner, Kan Liang, linux-kernel,
	linux-perf-users, Stephane Eranian

Em Thu, Apr 02, 2020 at 08:43:53AM -0700, Ian Rogers escreveu:
> Event synthesis may occur at the start or end (tail) of a perf command.
> In system-wide mode it can scan every process in /proc, which may add
> seconds of latency before event recording. Add a new benchmark that
> times how long event synthesis takes with and without data synthesis.

Thanks, applied,

- Arnaldo
 
> An example execution looks like:
>  $ perf bench internals synthesize
>  # Running 'internals/synthesize' benchmark:
>  Average synthesis took: 168.253800 usec
>  Average data synthesis took: 208.104700 usec
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/Documentation/perf-bench.txt |   8 ++
>  tools/perf/bench/Build                  |   2 +-
>  tools/perf/bench/bench.h                |   2 +-
>  tools/perf/bench/synthesize.c           | 101 ++++++++++++++++++++++++
>  tools/perf/builtin-bench.c              |   6 ++
>  5 files changed, 117 insertions(+), 2 deletions(-)
>  create mode 100644 tools/perf/bench/synthesize.c
> 
> diff --git a/tools/perf/Documentation/perf-bench.txt b/tools/perf/Documentation/perf-bench.txt
> index 0921a3c67381..bad16512c48d 100644
> --- a/tools/perf/Documentation/perf-bench.txt
> +++ b/tools/perf/Documentation/perf-bench.txt
> @@ -61,6 +61,9 @@ SUBSYSTEM
>  'epoll'::
>  	Eventpoll (epoll) stressing benchmarks.
>  
> +'internals'::
> +	Benchmark internal perf functionality.
> +
>  'all'::
>  	All benchmark subsystems.
>  
> @@ -214,6 +217,11 @@ Suite for evaluating concurrent epoll_wait calls.
>  *ctl*::
>  Suite for evaluating multiple epoll_ctl calls.
>  
> +SUITES FOR 'internals'
> +~~~~~~~~~~~~~~~~~~~~~~
> +*synthesize*::
> +Suite for evaluating perf's event synthesis performance.
> +
>  SEE ALSO
>  --------
>  linkperf:perf[1]
> diff --git a/tools/perf/bench/Build b/tools/perf/bench/Build
> index e4e321b6f883..042827385c87 100644
> --- a/tools/perf/bench/Build
> +++ b/tools/perf/bench/Build
> @@ -6,9 +6,9 @@ perf-y += futex-wake.o
>  perf-y += futex-wake-parallel.o
>  perf-y += futex-requeue.o
>  perf-y += futex-lock-pi.o
> -
>  perf-y += epoll-wait.o
>  perf-y += epoll-ctl.o
> +perf-y += synthesize.o
>  
>  perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-lib.o
>  perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-asm.o
> diff --git a/tools/perf/bench/bench.h b/tools/perf/bench/bench.h
> index 4aa6de1aa67d..4d669c803237 100644
> --- a/tools/perf/bench/bench.h
> +++ b/tools/perf/bench/bench.h
> @@ -41,9 +41,9 @@ int bench_futex_wake_parallel(int argc, const char **argv);
>  int bench_futex_requeue(int argc, const char **argv);
>  /* pi futexes */
>  int bench_futex_lock_pi(int argc, const char **argv);
> -
>  int bench_epoll_wait(int argc, const char **argv);
>  int bench_epoll_ctl(int argc, const char **argv);
> +int bench_synthesize(int argc, const char **argv);
>  
>  #define BENCH_FORMAT_DEFAULT_STR	"default"
>  #define BENCH_FORMAT_DEFAULT		0
> diff --git a/tools/perf/bench/synthesize.c b/tools/perf/bench/synthesize.c
> new file mode 100644
> index 000000000000..6291257bc9c9
> --- /dev/null
> +++ b/tools/perf/bench/synthesize.c
> @@ -0,0 +1,101 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Benchmark synthesis of perf events such as at the start of a 'perf
> + * record'. Synthesis is done on the current process and the 'dummy' event
> + * handlers are invoked that support dump_trace but otherwise do nothing.
> + *
> + * Copyright 2019 Google LLC.
> + */
> +#include <stdio.h>
> +#include "bench.h"
> +#include "../util/debug.h"
> +#include "../util/session.h"
> +#include "../util/synthetic-events.h"
> +#include "../util/target.h"
> +#include "../util/thread_map.h"
> +#include "../util/tool.h"
> +#include <linux/err.h>
> +#include <linux/time64.h>
> +#include <subcmd/parse-options.h>
> +
> +static unsigned int iterations = 10000;
> +
> +static const struct option options[] = {
> +	OPT_UINTEGER('i', "iterations", &iterations,
> +		"Number of iterations used to compute average"),
> +	OPT_END()
> +};
> +
> +static const char *const usage[] = {
> +	"perf bench internals synthesize <options>",
> +	NULL
> +};
> +
> +
> +static int do_synthesize(struct perf_session *session,
> +			struct perf_thread_map *threads,
> +			struct target *target, bool data_mmap)
> +{
> +	const unsigned int nr_threads_synthesize = 1;
> +	struct timeval start, end, diff;
> +	u64 runtime_us;
> +	unsigned int i;
> +	double average;
> +	int err;
> +
> +	gettimeofday(&start, NULL);
> +	for (i = 0; i < iterations; i++) {
> +		err = machine__synthesize_threads(&session->machines.host,
> +						target, threads, data_mmap,
> +						nr_threads_synthesize);
> +		if (err)
> +			return err;
> +	}
> +
> +	gettimeofday(&end, NULL);
> +	timersub(&end, &start, &diff);
> +	runtime_us = diff.tv_sec * USEC_PER_SEC + diff.tv_usec;
> +	average = (double)runtime_us/(double)iterations;
> +	printf("Average %ssynthesis took: %f usec\n",
> +		data_mmap ? "data " : "", average);
> +	return 0;
> +}
> +
> +int bench_synthesize(int argc, const char **argv)
> +{
> +	struct perf_tool tool;
> +	struct perf_session *session;
> +	struct target target = {
> +		.pid = "self",
> +	};
> +	struct perf_thread_map *threads;
> +	int err;
> +
> +	argc = parse_options(argc, argv, options, usage, 0);
> +
> +	session = perf_session__new(NULL, false, NULL);
> +	if (IS_ERR(session)) {
> +		pr_err("Session creation failed.\n");
> +		return PTR_ERR(session);
> +	}
> +	threads = thread_map__new_by_pid(getpid());
> +	if (!threads) {
> +		pr_err("Thread map creation failed.\n");
> +		err = -ENOMEM;
> +		goto err_out;
> +	}
> +	perf_tool__fill_defaults(&tool);
> +
> +	err = do_synthesize(session, threads, &target, false);
> +	if (err)
> +		goto err_out;
> +
> +	err = do_synthesize(session, threads, &target, true);
> +
> +err_out:
> +	if (threads)
> +		perf_thread_map__put(threads);
> +
> +	perf_session__delete(session);
> +	return err;
> +}
> diff --git a/tools/perf/builtin-bench.c b/tools/perf/builtin-bench.c
> index c06fe21c8613..11c79a8d85d6 100644
> --- a/tools/perf/builtin-bench.c
> +++ b/tools/perf/builtin-bench.c
> @@ -76,6 +76,11 @@ static struct bench epoll_benchmarks[] = {
>  };
>  #endif // HAVE_EVENTFD
>  
> +static struct bench internals_benchmarks[] = {
> +	{ "synthesize", "Benchmark perf event synthesis",	bench_synthesize	},
> +	{ NULL,		NULL,					NULL			}
> +};
> +
>  struct collection {
>  	const char	*name;
>  	const char	*summary;
> @@ -92,6 +97,7 @@ static struct collection collections[] = {
>  #ifdef HAVE_EVENTFD
>  	{"epoll",       "Epoll stressing benchmarks",                   epoll_benchmarks        },
>  #endif
> +	{ "internals",	"Perf-internals benchmarks",			internals_benchmarks	},
>  	{ "all",	"All benchmarks",				NULL			},
>  	{ NULL,		NULL,						NULL			}
>  };
> -- 
> 2.26.0.rc2.310.g2932bb562d-goog
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/5] tools api fs: make xxx__mountpoint() more scalable
  2020-04-02 15:43 ` [PATCH v2 2/5] tools api fs: make xxx__mountpoint() more scalable Ian Rogers
@ 2020-04-06 14:07   ` Arnaldo Carvalho de Melo
  2020-04-22 12:17   ` [tip: perf/core] tools api fs: Make " tip-bot2 for Stephane Eranian
  1 sibling, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-04-06 14:07 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Petr Mladek, Andrey Zhizhikin,
	Kefeng Wang, Thomas Gleixner, Kan Liang, linux-kernel,
	linux-perf-users, Stephane Eranian

Em Thu, Apr 02, 2020 at 08:43:54AM -0700, Ian Rogers escreveu:
> From: Stephane Eranian <eranian@google.com>
> 
> The xxx_mountpoint() interface provided by fs.c finds
> mount points for common pseudo filesystems. The first
> time xxx_mountpoint() is invoked, it scans the mount
> table (/proc/mounts) looking for a match. If found, it
> is cached. The price to scan /proc/mounts is paid once
> if the mount is found.
> 
> When the mount point is not found, subsequent calls to
> xxx_mountpoint() scan /proc/mounts over and over again.
> There is no caching.

Thanks, applied, lets see if someone screams...

- Arnaldo
 
> This causes a scaling issue in perf record with hugeltbfs__mountpoint().
> The function is called for each process found in synthesize__mmap_events().
> If the machine has thousands of processes and if the /proc/mounts has many
> entries this could cause major overhead in perf record. We have observed
> multi-second slowdowns on some configurations.
> 
> As an example on a laptop:
> 
> Before:
> $ sudo umount /dev/hugepages
> $ strace -e trace=openat -o /tmp/tt perf record -a ls
> $ fgrep mounts /tmp/tt
> 285
> 
> After:
> $ sudo umount /dev/hugepages
> $ strace -e trace=openat -o /tmp/tt perf record -a ls
> $ fgrep mounts /tmp/tt
> 1
> 
> One could argue that the non-caching in case the moint point is not found
> is intentional. That way subsequent calls may discover a moint point if
> the sysadmin mounts the filesystem. But the same argument could be made
> against caching the mount point. It could be unmounted causing errors.
> It all depends on the intent of the interface. This patch assumes it
> is expected to scan /proc/mounts once. The patch documents the caching
> behavior in the fs.h header file.
> 
> An alternative would be to just fix perf record. But it would solve
> the problem with hugetlbs__mountpoint() but there could be similar
> issues (possibly down the line) with other xxx_mountpoint() calls
> in perf or other tools.
> 
> Signed-off-by: Stephane Eranian <eranian@google.com>
> Reviewed-by: Ian Rogers <irogers@google.com>
> Acked-by: Jiri Olsa <jolsa@redhat.com>
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/lib/api/fs/fs.c | 17 +++++++++++++++++
>  tools/lib/api/fs/fs.h | 12 ++++++++++++
>  2 files changed, 29 insertions(+)
> 
> diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c
> index 027b18f7ed8c..82f53d81a7a7 100644
> --- a/tools/lib/api/fs/fs.c
> +++ b/tools/lib/api/fs/fs.c
> @@ -90,6 +90,7 @@ struct fs {
>  	const char * const	*mounts;
>  	char			 path[PATH_MAX];
>  	bool			 found;
> +	bool			 checked;
>  	long			 magic;
>  };
>  
> @@ -111,31 +112,37 @@ static struct fs fs__entries[] = {
>  		.name	= "sysfs",
>  		.mounts	= sysfs__fs_known_mountpoints,
>  		.magic	= SYSFS_MAGIC,
> +		.checked = false,
>  	},
>  	[FS__PROCFS] = {
>  		.name	= "proc",
>  		.mounts	= procfs__known_mountpoints,
>  		.magic	= PROC_SUPER_MAGIC,
> +		.checked = false,
>  	},
>  	[FS__DEBUGFS] = {
>  		.name	= "debugfs",
>  		.mounts	= debugfs__known_mountpoints,
>  		.magic	= DEBUGFS_MAGIC,
> +		.checked = false,
>  	},
>  	[FS__TRACEFS] = {
>  		.name	= "tracefs",
>  		.mounts	= tracefs__known_mountpoints,
>  		.magic	= TRACEFS_MAGIC,
> +		.checked = false,
>  	},
>  	[FS__HUGETLBFS] = {
>  		.name	= "hugetlbfs",
>  		.mounts = hugetlbfs__known_mountpoints,
>  		.magic	= HUGETLBFS_MAGIC,
> +		.checked = false,
>  	},
>  	[FS__BPF_FS] = {
>  		.name	= "bpf",
>  		.mounts = bpf_fs__known_mountpoints,
>  		.magic	= BPF_FS_MAGIC,
> +		.checked = false,
>  	},
>  };
>  
> @@ -158,6 +165,7 @@ static bool fs__read_mounts(struct fs *fs)
>  	}
>  
>  	fclose(fp);
> +	fs->checked = true;
>  	return fs->found = found;
>  }
>  
> @@ -220,6 +228,7 @@ static bool fs__env_override(struct fs *fs)
>  		return false;
>  
>  	fs->found = true;
> +	fs->checked = true;
>  	strncpy(fs->path, override_path, sizeof(fs->path) - 1);
>  	fs->path[sizeof(fs->path) - 1] = '\0';
>  	return true;
> @@ -246,6 +255,14 @@ static const char *fs__mountpoint(int idx)
>  	if (fs->found)
>  		return (const char *)fs->path;
>  
> +	/* the mount point was already checked for the mount point
> +	 * but and did not exist, so return NULL to avoid scanning again.
> +	 * This makes the found and not found paths cost equivalent
> +	 * in case of multiple calls.
> +	 */
> +	if (fs->checked)
> +		return NULL;
> +
>  	return fs__get_mountpoint(fs);
>  }
>  
> diff --git a/tools/lib/api/fs/fs.h b/tools/lib/api/fs/fs.h
> index 936edb95e1f3..aa222ca30311 100644
> --- a/tools/lib/api/fs/fs.h
> +++ b/tools/lib/api/fs/fs.h
> @@ -18,6 +18,18 @@
>  	const char *name##__mount(void);	\
>  	bool name##__configured(void);		\
>  
> +/*
> + * The xxxx__mountpoint() entry points find the first match mount point for each
> + * filesystems listed below, where xxxx is the filesystem type.
> + *
> + * The interface is as follows:
> + *
> + * - If a mount point is found on first call, it is cached and used for all
> + *   subsequent calls.
> + *
> + * - If a mount point is not found, NULL is returned on first call and all
> + *   subsequent calls.
> + */
>  FS(sysfs)
>  FS(procfs)
>  FS(debugfs)
> -- 
> 2.26.0.rc2.310.g2932bb562d-goog
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/5] tools api: add a lightweight buffered reading api
  2020-04-04  3:06   ` Namhyung Kim
@ 2020-04-06 14:09     ` Arnaldo Carvalho de Melo
  2020-04-06 16:15       ` Ian Rogers
  0 siblings, 1 reply; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-04-06 14:09 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ian Rogers, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Petr Mladek, Andrey Zhizhikin,
	Kefeng Wang, Thomas Gleixner, Kan Liang, linux-kernel,
	linux-perf-users, Stephane Eranian

Em Sat, Apr 04, 2020 at 12:06:45PM +0900, Namhyung Kim escreveu:
> Hello,
> 
> On Fri, Apr 3, 2020 at 12:44 AM Ian Rogers <irogers@google.com> wrote:
> >
> > The synthesize benchmark shows the majority of execution time going to
> > fgets and sscanf, necessary to parse /proc/pid/maps. Add a new buffered
> > reading library that will be used to replace these calls in a follow-up
> > CL.

waiting for some conclusion to this thread,

- Arnaldo

> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/lib/api/io.h | 107 +++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 107 insertions(+)
> >  create mode 100644 tools/lib/api/io.h
> >
> > diff --git a/tools/lib/api/io.h b/tools/lib/api/io.h
> > new file mode 100644
> > index 000000000000..5aa5b0e26a7a
> > --- /dev/null
> > +++ b/tools/lib/api/io.h
> > @@ -0,0 +1,107 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Lightweight buffered reading library.
> > + *
> > + * Copyright 2019 Google LLC.
> > + */
> > +#ifndef __API_IO__
> > +#define __API_IO__
> > +
> > +struct io {
> > +       /* File descriptor being read/ */
> > +       int fd;
> > +       /* Size of the read buffer. */
> > +       unsigned int buf_len;
> > +       /* Pointer to storage for buffering read. */
> > +       char *buf;
> > +       /* End of the storage. */
> > +       char *end;
> > +       /* Currently accessed data pointer. */
> > +       char *data;
> > +       /* Set true on when the end of file on read error. */
> > +       bool eof;
> > +};
> > +
> > +static inline void io__init(struct io *io, int fd,
> > +                           char *buf, unsigned int buf_len)
> > +{
> > +       io->fd = fd;
> > +       io->buf_len = buf_len;
> > +       io->buf = buf;
> > +       io->end = buf;
> > +       io->data = buf;
> > +       io->eof = false;
> > +}
> > +
> > +/* Reads one character from the "io" file with similar semantics to fgetc. */
> > +static inline int io__get_char(struct io *io)
> > +{
> > +       char *ptr = io->data;
> > +
> > +       if (ptr == io->end) {
> > +               ssize_t n = read(io->fd, io->buf, io->buf_len);
> > +
> > +               if (n <= 0) {
> > +                       io->eof = true;
> > +                       return -1;
> > +               }
> > +               ptr = &io->buf[0];
> > +               io->end = &io->buf[n];
> > +       }
> > +       io->data = ptr + 1;
> > +       return *ptr;
> > +}
> > +
> > +/* Read a hexadecimal value with no 0x prefix into the out argument hex.
> > + * Returns -1 on error or if nothing is read, otherwise returns the character
> > + * after the hexadecimal value.
> > + */
> > +static inline int io__get_hex(struct io *io, __u64 *hex)
> > +{
> > +       bool first_read = true;
> > +
> > +       *hex = 0;
> > +       while (true) {
> > +               char ch = io__get_char(io);
> > +
> 
> Maybe you can add this
> 
>     if (io->eof)
>         return 0;
> 
> Please see below
> 
> 
> > +               if (ch < 0)
> > +                       return ch;
> > +               if (ch >= '0' && ch <= '9')
> > +                       *hex = (*hex << 4) | (ch - '0');
> > +               else if (ch >= 'a' && ch <= 'f')
> > +                       *hex = (*hex << 4) | (ch - 'a' + 10);
> > +               else if (ch >= 'A' && ch <= 'F')
> > +                       *hex = (*hex << 4) | (ch - 'A' + 10);
> > +               else if (first_read)
> > +                       return -1;
> > +               else
> > +                       return ch;
> > +               first_read = false;
> > +       }
> > +}
> 
> What if a file contains hex digits at the end (without trailing spaces)?
> I guess it'd see EOF and return -1, right?
> 
> And it'd better to be clear when it sees a big hex numbers -
> it could have a comment that it'd simply discard upper bits
> or return an error.
> 
> > +
> > +/* Read a decimal value into the out argument dec.
> > + * Returns -1 on error or if nothing is read, otherwise returns the character
> > + * after the decimal value.
> > + */
> > +static inline int io__get_dec(struct io *io, __u64 *dec)
> > +{
> > +       bool first_read = true;
> > +
> > +       *dec = 0;
> > +       while (true) {
> > +               char ch = io__get_char(io);
> > +
> > +               if (ch < 0)
> > +                       return ch;
> > +               if (ch >= '0' && ch <= '9')
> > +                       *dec = (*dec * 10) + ch - '0';
> > +               else if (first_read)
> > +                       return -1;
> > +               else
> > +                       return ch;
> > +               first_read = false;
> > +       }
> > +}
> 
> Ditto.
> 
> Thanks
> Namhyung
> 
> 
> > +
> > +#endif /* __API_IO__ */
> > --
> > 2.26.0.rc2.310.g2932bb562d-goog
> >

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/5] tools api: add a lightweight buffered reading api
  2020-04-06 14:09     ` Arnaldo Carvalho de Melo
@ 2020-04-06 16:15       ` Ian Rogers
  2020-04-07 12:32         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 17+ messages in thread
From: Ian Rogers @ 2020-04-06 16:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Namhyung Kim, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Petr Mladek, Andrey Zhizhikin,
	Kefeng Wang, Thomas Gleixner, Kan Liang, linux-kernel,
	linux-perf-users, Stephane Eranian

On Mon, Apr 6, 2020 at 7:09 AM Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
>
> Em Sat, Apr 04, 2020 at 12:06:45PM +0900, Namhyung Kim escreveu:
> > Hello,
> >
> > On Fri, Apr 3, 2020 at 12:44 AM Ian Rogers <irogers@google.com> wrote:
> > >
> > > The synthesize benchmark shows the majority of execution time going to
> > > fgets and sscanf, necessary to parse /proc/pid/maps. Add a new buffered
> > > reading library that will be used to replace these calls in a follow-up
> > > CL.
>
> waiting for some conclusion to this thread,
>
> - Arnaldo

Thanks, sorry I was busy at the weekend. I agree with Namhyung's
comments, nice catch! Fwiw, it comes from my refactoring this api out
of a specific /proc/pid/maps reader. I'll work to address the issue
and ideally stick some tests of the corner cases somewhere - any
suggestions? This doesn't feel like a perf test, nor is it a kernel
side test.

Thanks,
Ian

> > > Signed-off-by: Ian Rogers <irogers@google.com>
> > > ---
> > >  tools/lib/api/io.h | 107 +++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 107 insertions(+)
> > >  create mode 100644 tools/lib/api/io.h
> > >
> > > diff --git a/tools/lib/api/io.h b/tools/lib/api/io.h
> > > new file mode 100644
> > > index 000000000000..5aa5b0e26a7a
> > > --- /dev/null
> > > +++ b/tools/lib/api/io.h
> > > @@ -0,0 +1,107 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +/*
> > > + * Lightweight buffered reading library.
> > > + *
> > > + * Copyright 2019 Google LLC.
> > > + */
> > > +#ifndef __API_IO__
> > > +#define __API_IO__
> > > +
> > > +struct io {
> > > +       /* File descriptor being read/ */
> > > +       int fd;
> > > +       /* Size of the read buffer. */
> > > +       unsigned int buf_len;
> > > +       /* Pointer to storage for buffering read. */
> > > +       char *buf;
> > > +       /* End of the storage. */
> > > +       char *end;
> > > +       /* Currently accessed data pointer. */
> > > +       char *data;
> > > +       /* Set true on when the end of file on read error. */
> > > +       bool eof;
> > > +};
> > > +
> > > +static inline void io__init(struct io *io, int fd,
> > > +                           char *buf, unsigned int buf_len)
> > > +{
> > > +       io->fd = fd;
> > > +       io->buf_len = buf_len;
> > > +       io->buf = buf;
> > > +       io->end = buf;
> > > +       io->data = buf;
> > > +       io->eof = false;
> > > +}
> > > +
> > > +/* Reads one character from the "io" file with similar semantics to fgetc. */
> > > +static inline int io__get_char(struct io *io)
> > > +{
> > > +       char *ptr = io->data;
> > > +
> > > +       if (ptr == io->end) {
> > > +               ssize_t n = read(io->fd, io->buf, io->buf_len);
> > > +
> > > +               if (n <= 0) {
> > > +                       io->eof = true;
> > > +                       return -1;
> > > +               }
> > > +               ptr = &io->buf[0];
> > > +               io->end = &io->buf[n];
> > > +       }
> > > +       io->data = ptr + 1;
> > > +       return *ptr;
> > > +}
> > > +
> > > +/* Read a hexadecimal value with no 0x prefix into the out argument hex.
> > > + * Returns -1 on error or if nothing is read, otherwise returns the character
> > > + * after the hexadecimal value.
> > > + */
> > > +static inline int io__get_hex(struct io *io, __u64 *hex)
> > > +{
> > > +       bool first_read = true;
> > > +
> > > +       *hex = 0;
> > > +       while (true) {
> > > +               char ch = io__get_char(io);
> > > +
> >
> > Maybe you can add this
> >
> >     if (io->eof)
> >         return 0;
> >
> > Please see below
> >
> >
> > > +               if (ch < 0)
> > > +                       return ch;
> > > +               if (ch >= '0' && ch <= '9')
> > > +                       *hex = (*hex << 4) | (ch - '0');
> > > +               else if (ch >= 'a' && ch <= 'f')
> > > +                       *hex = (*hex << 4) | (ch - 'a' + 10);
> > > +               else if (ch >= 'A' && ch <= 'F')
> > > +                       *hex = (*hex << 4) | (ch - 'A' + 10);
> > > +               else if (first_read)
> > > +                       return -1;
> > > +               else
> > > +                       return ch;
> > > +               first_read = false;
> > > +       }
> > > +}
> >
> > What if a file contains hex digits at the end (without trailing spaces)?
> > I guess it'd see EOF and return -1, right?
> >
> > And it'd better to be clear when it sees a big hex numbers -
> > it could have a comment that it'd simply discard upper bits
> > or return an error.
> >
> > > +
> > > +/* Read a decimal value into the out argument dec.
> > > + * Returns -1 on error or if nothing is read, otherwise returns the character
> > > + * after the decimal value.
> > > + */
> > > +static inline int io__get_dec(struct io *io, __u64 *dec)
> > > +{
> > > +       bool first_read = true;
> > > +
> > > +       *dec = 0;
> > > +       while (true) {
> > > +               char ch = io__get_char(io);
> > > +
> > > +               if (ch < 0)
> > > +                       return ch;
> > > +               if (ch >= '0' && ch <= '9')
> > > +                       *dec = (*dec * 10) + ch - '0';
> > > +               else if (first_read)
> > > +                       return -1;
> > > +               else
> > > +                       return ch;
> > > +               first_read = false;
> > > +       }
> > > +}
> >
> > Ditto.
> >
> > Thanks
> > Namhyung
> >
> >
> > > +
> > > +#endif /* __API_IO__ */
> > > --
> > > 2.26.0.rc2.310.g2932bb562d-goog
> > >
>
> --
>
> - Arnaldo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/5] tools api: add a lightweight buffered reading api
  2020-04-06 16:15       ` Ian Rogers
@ 2020-04-07 12:32         ` Arnaldo Carvalho de Melo
  2020-04-10  3:43           ` Ian Rogers
  0 siblings, 1 reply; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-04-07 12:32 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Namhyung Kim, Peter Zijlstra,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Petr Mladek, Andrey Zhizhikin, Kefeng Wang, Thomas Gleixner,
	Kan Liang, linux-kernel, linux-perf-users, Stephane Eranian

Em Mon, Apr 06, 2020 at 09:15:51AM -0700, Ian Rogers escreveu:
> On Mon, Apr 6, 2020 at 7:09 AM Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> wrote:
> > Em Sat, Apr 04, 2020 at 12:06:45PM +0900, Namhyung Kim escreveu:
> > > On Fri, Apr 3, 2020 at 12:44 AM Ian Rogers <irogers@google.com> wrote:
> > > >
> > > > The synthesize benchmark shows the majority of execution time going to
> > > > fgets and sscanf, necessary to parse /proc/pid/maps. Add a new buffered
> > > > reading library that will be used to replace these calls in a follow-up
> > > > CL.

> > waiting for some conclusion to this thread,
 
> Thanks, sorry I was busy at the weekend. I agree with Namhyung's
> comments, nice catch! Fwiw, it comes from my refactoring this api out
> of a specific /proc/pid/maps reader. I'll work to address the issue
> and ideally stick some tests of the corner cases somewhere - any
> suggestions? This doesn't feel like a perf test, nor is it a kernel

It may not be uniquely useful for perf, but I'd start by adding an entry
to 'perf test' anyway, as its used by perf, after all what we want is
that code gets tested regularly, adding it to 'perf test' achieves that.

Thanks,

- Arnaldo

> side test.
> 
> Thanks,
> Ian
> 
> > > > Signed-off-by: Ian Rogers <irogers@google.com>
> > > > ---
> > > >  tools/lib/api/io.h | 107 +++++++++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 107 insertions(+)
> > > >  create mode 100644 tools/lib/api/io.h
> > > >
> > > > diff --git a/tools/lib/api/io.h b/tools/lib/api/io.h
> > > > new file mode 100644
> > > > index 000000000000..5aa5b0e26a7a
> > > > --- /dev/null
> > > > +++ b/tools/lib/api/io.h
> > > > @@ -0,0 +1,107 @@
> > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > +/*
> > > > + * Lightweight buffered reading library.
> > > > + *
> > > > + * Copyright 2019 Google LLC.
> > > > + */
> > > > +#ifndef __API_IO__
> > > > +#define __API_IO__
> > > > +
> > > > +struct io {
> > > > +       /* File descriptor being read/ */
> > > > +       int fd;
> > > > +       /* Size of the read buffer. */
> > > > +       unsigned int buf_len;
> > > > +       /* Pointer to storage for buffering read. */
> > > > +       char *buf;
> > > > +       /* End of the storage. */
> > > > +       char *end;
> > > > +       /* Currently accessed data pointer. */
> > > > +       char *data;
> > > > +       /* Set true on when the end of file on read error. */
> > > > +       bool eof;
> > > > +};
> > > > +
> > > > +static inline void io__init(struct io *io, int fd,
> > > > +                           char *buf, unsigned int buf_len)
> > > > +{
> > > > +       io->fd = fd;
> > > > +       io->buf_len = buf_len;
> > > > +       io->buf = buf;
> > > > +       io->end = buf;
> > > > +       io->data = buf;
> > > > +       io->eof = false;
> > > > +}
> > > > +
> > > > +/* Reads one character from the "io" file with similar semantics to fgetc. */
> > > > +static inline int io__get_char(struct io *io)
> > > > +{
> > > > +       char *ptr = io->data;
> > > > +
> > > > +       if (ptr == io->end) {
> > > > +               ssize_t n = read(io->fd, io->buf, io->buf_len);
> > > > +
> > > > +               if (n <= 0) {
> > > > +                       io->eof = true;
> > > > +                       return -1;
> > > > +               }
> > > > +               ptr = &io->buf[0];
> > > > +               io->end = &io->buf[n];
> > > > +       }
> > > > +       io->data = ptr + 1;
> > > > +       return *ptr;
> > > > +}
> > > > +
> > > > +/* Read a hexadecimal value with no 0x prefix into the out argument hex.
> > > > + * Returns -1 on error or if nothing is read, otherwise returns the character
> > > > + * after the hexadecimal value.
> > > > + */
> > > > +static inline int io__get_hex(struct io *io, __u64 *hex)
> > > > +{
> > > > +       bool first_read = true;
> > > > +
> > > > +       *hex = 0;
> > > > +       while (true) {
> > > > +               char ch = io__get_char(io);
> > > > +
> > >
> > > Maybe you can add this
> > >
> > >     if (io->eof)
> > >         return 0;
> > >
> > > Please see below
> > >
> > >
> > > > +               if (ch < 0)
> > > > +                       return ch;
> > > > +               if (ch >= '0' && ch <= '9')
> > > > +                       *hex = (*hex << 4) | (ch - '0');
> > > > +               else if (ch >= 'a' && ch <= 'f')
> > > > +                       *hex = (*hex << 4) | (ch - 'a' + 10);
> > > > +               else if (ch >= 'A' && ch <= 'F')
> > > > +                       *hex = (*hex << 4) | (ch - 'A' + 10);
> > > > +               else if (first_read)
> > > > +                       return -1;
> > > > +               else
> > > > +                       return ch;
> > > > +               first_read = false;
> > > > +       }
> > > > +}
> > >
> > > What if a file contains hex digits at the end (without trailing spaces)?
> > > I guess it'd see EOF and return -1, right?
> > >
> > > And it'd better to be clear when it sees a big hex numbers -
> > > it could have a comment that it'd simply discard upper bits
> > > or return an error.
> > >
> > > > +
> > > > +/* Read a decimal value into the out argument dec.
> > > > + * Returns -1 on error or if nothing is read, otherwise returns the character
> > > > + * after the decimal value.
> > > > + */
> > > > +static inline int io__get_dec(struct io *io, __u64 *dec)
> > > > +{
> > > > +       bool first_read = true;
> > > > +
> > > > +       *dec = 0;
> > > > +       while (true) {
> > > > +               char ch = io__get_char(io);
> > > > +
> > > > +               if (ch < 0)
> > > > +                       return ch;
> > > > +               if (ch >= '0' && ch <= '9')
> > > > +                       *dec = (*dec * 10) + ch - '0';
> > > > +               else if (first_read)
> > > > +                       return -1;
> > > > +               else
> > > > +                       return ch;
> > > > +               first_read = false;
> > > > +       }
> > > > +}
> > >
> > > Ditto.
> > >
> > > Thanks
> > > Namhyung
> > >
> > >
> > > > +
> > > > +#endif /* __API_IO__ */
> > > > --
> > > > 2.26.0.rc2.310.g2932bb562d-goog
> > > >
> >
> > --
> >
> > - Arnaldo

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/5] tools api: add a lightweight buffered reading api
  2020-04-07 12:32         ` Arnaldo Carvalho de Melo
@ 2020-04-10  3:43           ` Ian Rogers
  0 siblings, 0 replies; 17+ messages in thread
From: Ian Rogers @ 2020-04-10  3:43 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Namhyung Kim, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Petr Mladek, Andrey Zhizhikin,
	Kefeng Wang, Thomas Gleixner, Kan Liang, linux-kernel,
	linux-perf-users, Stephane Eranian

On Tue, Apr 7, 2020 at 5:32 AM Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
>
> Em Mon, Apr 06, 2020 at 09:15:51AM -0700, Ian Rogers escreveu:
> > On Mon, Apr 6, 2020 at 7:09 AM Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com> wrote:
> > > Em Sat, Apr 04, 2020 at 12:06:45PM +0900, Namhyung Kim escreveu:
> > > > On Fri, Apr 3, 2020 at 12:44 AM Ian Rogers <irogers@google.com> wrote:
> > > > >
> > > > > The synthesize benchmark shows the majority of execution time going to
> > > > > fgets and sscanf, necessary to parse /proc/pid/maps. Add a new buffered
> > > > > reading library that will be used to replace these calls in a follow-up
> > > > > CL.
>
> > > waiting for some conclusion to this thread,
>
> > Thanks, sorry I was busy at the weekend. I agree with Namhyung's
> > comments, nice catch! Fwiw, it comes from my refactoring this api out
> > of a specific /proc/pid/maps reader. I'll work to address the issue
> > and ideally stick some tests of the corner cases somewhere - any
> > suggestions? This doesn't feel like a perf test, nor is it a kernel
>
> It may not be uniquely useful for perf, but I'd start by adding an entry
> to 'perf test' anyway, as its used by perf, after all what we want is
> that code gets tested regularly, adding it to 'perf test' achieves that.

Thanks, this is hopefully done in:
https://lore.kernel.org/lkml/20200410034048.144150-1-irogers@google.com/T/#t
which also hopefully addresses Namhyung's comments around corner cases
and better documentation.

Thanks,
Ian

> Thanks,
>
> - Arnaldo
>
> > side test.
> >
> > Thanks,
> > Ian
> >
> > > > > Signed-off-by: Ian Rogers <irogers@google.com>
> > > > > ---
> > > > >  tools/lib/api/io.h | 107 +++++++++++++++++++++++++++++++++++++++++++++
> > > > >  1 file changed, 107 insertions(+)
> > > > >  create mode 100644 tools/lib/api/io.h
> > > > >
> > > > > diff --git a/tools/lib/api/io.h b/tools/lib/api/io.h
> > > > > new file mode 100644
> > > > > index 000000000000..5aa5b0e26a7a
> > > > > --- /dev/null
> > > > > +++ b/tools/lib/api/io.h
> > > > > @@ -0,0 +1,107 @@
> > > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > > +/*
> > > > > + * Lightweight buffered reading library.
> > > > > + *
> > > > > + * Copyright 2019 Google LLC.
> > > > > + */
> > > > > +#ifndef __API_IO__
> > > > > +#define __API_IO__
> > > > > +
> > > > > +struct io {
> > > > > +       /* File descriptor being read/ */
> > > > > +       int fd;
> > > > > +       /* Size of the read buffer. */
> > > > > +       unsigned int buf_len;
> > > > > +       /* Pointer to storage for buffering read. */
> > > > > +       char *buf;
> > > > > +       /* End of the storage. */
> > > > > +       char *end;
> > > > > +       /* Currently accessed data pointer. */
> > > > > +       char *data;
> > > > > +       /* Set true on when the end of file on read error. */
> > > > > +       bool eof;
> > > > > +};
> > > > > +
> > > > > +static inline void io__init(struct io *io, int fd,
> > > > > +                           char *buf, unsigned int buf_len)
> > > > > +{
> > > > > +       io->fd = fd;
> > > > > +       io->buf_len = buf_len;
> > > > > +       io->buf = buf;
> > > > > +       io->end = buf;
> > > > > +       io->data = buf;
> > > > > +       io->eof = false;
> > > > > +}
> > > > > +
> > > > > +/* Reads one character from the "io" file with similar semantics to fgetc. */
> > > > > +static inline int io__get_char(struct io *io)
> > > > > +{
> > > > > +       char *ptr = io->data;
> > > > > +
> > > > > +       if (ptr == io->end) {
> > > > > +               ssize_t n = read(io->fd, io->buf, io->buf_len);
> > > > > +
> > > > > +               if (n <= 0) {
> > > > > +                       io->eof = true;
> > > > > +                       return -1;
> > > > > +               }
> > > > > +               ptr = &io->buf[0];
> > > > > +               io->end = &io->buf[n];
> > > > > +       }
> > > > > +       io->data = ptr + 1;
> > > > > +       return *ptr;
> > > > > +}
> > > > > +
> > > > > +/* Read a hexadecimal value with no 0x prefix into the out argument hex.
> > > > > + * Returns -1 on error or if nothing is read, otherwise returns the character
> > > > > + * after the hexadecimal value.
> > > > > + */
> > > > > +static inline int io__get_hex(struct io *io, __u64 *hex)
> > > > > +{
> > > > > +       bool first_read = true;
> > > > > +
> > > > > +       *hex = 0;
> > > > > +       while (true) {
> > > > > +               char ch = io__get_char(io);
> > > > > +
> > > >
> > > > Maybe you can add this
> > > >
> > > >     if (io->eof)
> > > >         return 0;
> > > >
> > > > Please see below
> > > >
> > > >
> > > > > +               if (ch < 0)
> > > > > +                       return ch;
> > > > > +               if (ch >= '0' && ch <= '9')
> > > > > +                       *hex = (*hex << 4) | (ch - '0');
> > > > > +               else if (ch >= 'a' && ch <= 'f')
> > > > > +                       *hex = (*hex << 4) | (ch - 'a' + 10);
> > > > > +               else if (ch >= 'A' && ch <= 'F')
> > > > > +                       *hex = (*hex << 4) | (ch - 'A' + 10);
> > > > > +               else if (first_read)
> > > > > +                       return -1;
> > > > > +               else
> > > > > +                       return ch;
> > > > > +               first_read = false;
> > > > > +       }
> > > > > +}
> > > >
> > > > What if a file contains hex digits at the end (without trailing spaces)?
> > > > I guess it'd see EOF and return -1, right?
> > > >
> > > > And it'd better to be clear when it sees a big hex numbers -
> > > > it could have a comment that it'd simply discard upper bits
> > > > or return an error.
> > > >
> > > > > +
> > > > > +/* Read a decimal value into the out argument dec.
> > > > > + * Returns -1 on error or if nothing is read, otherwise returns the character
> > > > > + * after the decimal value.
> > > > > + */
> > > > > +static inline int io__get_dec(struct io *io, __u64 *dec)
> > > > > +{
> > > > > +       bool first_read = true;
> > > > > +
> > > > > +       *dec = 0;
> > > > > +       while (true) {
> > > > > +               char ch = io__get_char(io);
> > > > > +
> > > > > +               if (ch < 0)
> > > > > +                       return ch;
> > > > > +               if (ch >= '0' && ch <= '9')
> > > > > +                       *dec = (*dec * 10) + ch - '0';
> > > > > +               else if (first_read)
> > > > > +                       return -1;
> > > > > +               else
> > > > > +                       return ch;
> > > > > +               first_read = false;
> > > > > +       }
> > > > > +}
> > > >
> > > > Ditto.
> > > >
> > > > Thanks
> > > > Namhyung
> > > >
> > > >
> > > > > +
> > > > > +#endif /* __API_IO__ */
> > > > > --
> > > > > 2.26.0.rc2.310.g2932bb562d-goog
> > > > >
> > >
> > > --
> > >
> > > - Arnaldo
>
> --
>
> - Arnaldo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [tip: perf/core] perf synthetic-events: save 4kb from 2 stack frames
  2020-04-02 15:43 ` [PATCH v2 3/5] perf synthetic-events: save 4kb from 2 stack frames Ian Rogers
@ 2020-04-22 12:17   ` tip-bot2 for Ian Rogers
  0 siblings, 0 replies; 17+ messages in thread
From: tip-bot2 for Ian Rogers @ 2020-04-22 12:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ian Rogers, Alexander Shishkin, Andrey Zhizhikin, Jiri Olsa,
	Kan Liang, Kefeng Wang, Mark Rutland, Namhyung Kim,
	Peter Zijlstra, Petr Mladek, Stephane Eranian, Thomas Gleixner,
	Arnaldo Carvalho de Melo, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     04ed4ccb9c07868bc0cb41f699391332bf62220c
Gitweb:        https://git.kernel.org/tip/04ed4ccb9c07868bc0cb41f699391332bf62220c
Author:        Ian Rogers <irogers@google.com>
AuthorDate:    Thu, 02 Apr 2020 08:43:55 -07:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Thu, 16 Apr 2020 12:19:13 -03:00

perf synthetic-events: save 4kb from 2 stack frames

Reuse an existing char buffer to avoid two PATH_MAX sized char buffers.

Reduces stack frame sizes by 4kb.

perf_event__synthesize_mmap_events before 'sub $0x45b8,%rsp' after
'sub $0x35b8,%rsp'.

perf_event__get_comm_ids before 'sub $0x2028,%rsp' after
'sub $0x1028,%rsp'.

The performance impact of this change is negligible.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrey Zhizhikin <andrey.z@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200402154357.107873-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/synthetic-events.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index a661b12..9d4aa95 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -71,7 +71,6 @@ int perf_tool__process_synth_event(struct perf_tool *tool,
 static int perf_event__get_comm_ids(pid_t pid, char *comm, size_t len,
 				    pid_t *tgid, pid_t *ppid)
 {
-	char filename[PATH_MAX];
 	char bf[4096];
 	int fd;
 	size_t size = 0;
@@ -81,11 +80,11 @@ static int perf_event__get_comm_ids(pid_t pid, char *comm, size_t len,
 	*tgid = -1;
 	*ppid = -1;
 
-	snprintf(filename, sizeof(filename), "/proc/%d/status", pid);
+	snprintf(bf, sizeof(bf), "/proc/%d/status", pid);
 
-	fd = open(filename, O_RDONLY);
+	fd = open(bf, O_RDONLY);
 	if (fd < 0) {
-		pr_debug("couldn't open %s\n", filename);
+		pr_debug("couldn't open %s\n", bf);
 		return -1;
 	}
 
@@ -281,9 +280,9 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 				       struct machine *machine,
 				       bool mmap_data)
 {
-	char filename[PATH_MAX];
 	FILE *fp;
 	unsigned long long t;
+	char bf[BUFSIZ];
 	bool truncation = false;
 	unsigned long long timeout = proc_map_timeout * 1000000ULL;
 	int rc = 0;
@@ -293,15 +292,15 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 	if (machine__is_default_guest(machine))
 		return 0;
 
-	snprintf(filename, sizeof(filename), "%s/proc/%d/task/%d/maps",
-		 machine->root_dir, pid, pid);
+	snprintf(bf, sizeof(bf), "%s/proc/%d/task/%d/maps",
+		machine->root_dir, pid, pid);
 
-	fp = fopen(filename, "r");
+	fp = fopen(bf, "r");
 	if (fp == NULL) {
 		/*
 		 * We raced with a task exiting - just return:
 		 */
-		pr_debug("couldn't open %s\n", filename);
+		pr_debug("couldn't open %s\n", bf);
 		return -1;
 	}
 
@@ -309,7 +308,6 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 	t = rdclock();
 
 	while (1) {
-		char bf[BUFSIZ];
 		char prot[5];
 		char execname[PATH_MAX];
 		char anonstr[] = "//anon";
@@ -321,10 +319,10 @@ int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 			break;
 
 		if ((rdclock() - t) > timeout) {
-			pr_warning("Reading %s time out. "
+			pr_warning("Reading %s/proc/%d/task/%d/maps time out. "
 				   "You may want to increase "
 				   "the time limit by --proc-map-timeout\n",
-				   filename);
+				   machine->root_dir, pid, pid);
 			truncation = true;
 			goto out;
 		}

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [tip: perf/core] perf bench: Add event synthesis benchmark
  2020-04-02 15:43 ` [PATCH v2 1/5] perf bench: add event synthesis benchmark Ian Rogers
  2020-04-06 14:07   ` Arnaldo Carvalho de Melo
@ 2020-04-22 12:17   ` tip-bot2 for Ian Rogers
  1 sibling, 0 replies; 17+ messages in thread
From: tip-bot2 for Ian Rogers @ 2020-04-22 12:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Ian Rogers, Jiri Olsa, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Andrey Zhizhikin, Kan Liang, Kefeng Wang,
	Mark Rutland, Namhyung Kim, Peter Zijlstra, Petr Mladek,
	Stephane Eranian, Thomas Gleixner, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     2a4b51666af8bf0b67ccc2e53120bad27351917c
Gitweb:        https://git.kernel.org/tip/2a4b51666af8bf0b67ccc2e53120bad27351917c
Author:        Ian Rogers <irogers@google.com>
AuthorDate:    Thu, 02 Apr 2020 08:43:53 -07:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Thu, 16 Apr 2020 12:19:12 -03:00

perf bench: Add event synthesis benchmark

Event synthesis may occur at the start or end (tail) of a perf command.
In system-wide mode it can scan every process in /proc, which may add
seconds of latency before event recording. Add a new benchmark that
times how long event synthesis takes with and without data synthesis.

An example execution looks like:

 $ perf bench internals synthesize
 # Running 'internals/synthesize' benchmark:
 Average synthesis took: 168.253800 usec
 Average data synthesis took: 208.104700 usec

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrey Zhizhikin <andrey.z@gmail.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200402154357.107873-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-bench.txt |   8 ++-
 tools/perf/bench/Build                  |   2 +-
 tools/perf/bench/bench.h                |   2 +-
 tools/perf/bench/synthesize.c           | 101 +++++++++++++++++++++++-
 tools/perf/builtin-bench.c              |   6 +-
 5 files changed, 117 insertions(+), 2 deletions(-)
 create mode 100644 tools/perf/bench/synthesize.c

diff --git a/tools/perf/Documentation/perf-bench.txt b/tools/perf/Documentation/perf-bench.txt
index 0921a3c..bad1651 100644
--- a/tools/perf/Documentation/perf-bench.txt
+++ b/tools/perf/Documentation/perf-bench.txt
@@ -61,6 +61,9 @@ SUBSYSTEM
 'epoll'::
 	Eventpoll (epoll) stressing benchmarks.
 
+'internals'::
+	Benchmark internal perf functionality.
+
 'all'::
 	All benchmark subsystems.
 
@@ -214,6 +217,11 @@ Suite for evaluating concurrent epoll_wait calls.
 *ctl*::
 Suite for evaluating multiple epoll_ctl calls.
 
+SUITES FOR 'internals'
+~~~~~~~~~~~~~~~~~~~~~~
+*synthesize*::
+Suite for evaluating perf's event synthesis performance.
+
 SEE ALSO
 --------
 linkperf:perf[1]
diff --git a/tools/perf/bench/Build b/tools/perf/bench/Build
index e4e321b..0428273 100644
--- a/tools/perf/bench/Build
+++ b/tools/perf/bench/Build
@@ -6,9 +6,9 @@ perf-y += futex-wake.o
 perf-y += futex-wake-parallel.o
 perf-y += futex-requeue.o
 perf-y += futex-lock-pi.o
-
 perf-y += epoll-wait.o
 perf-y += epoll-ctl.o
+perf-y += synthesize.o
 
 perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-lib.o
 perf-$(CONFIG_X86_64) += mem-memcpy-x86-64-asm.o
diff --git a/tools/perf/bench/bench.h b/tools/perf/bench/bench.h
index 4aa6de1..4d669c8 100644
--- a/tools/perf/bench/bench.h
+++ b/tools/perf/bench/bench.h
@@ -41,9 +41,9 @@ int bench_futex_wake_parallel(int argc, const char **argv);
 int bench_futex_requeue(int argc, const char **argv);
 /* pi futexes */
 int bench_futex_lock_pi(int argc, const char **argv);
-
 int bench_epoll_wait(int argc, const char **argv);
 int bench_epoll_ctl(int argc, const char **argv);
+int bench_synthesize(int argc, const char **argv);
 
 #define BENCH_FORMAT_DEFAULT_STR	"default"
 #define BENCH_FORMAT_DEFAULT		0
diff --git a/tools/perf/bench/synthesize.c b/tools/perf/bench/synthesize.c
new file mode 100644
index 0000000..6291257
--- /dev/null
+++ b/tools/perf/bench/synthesize.c
@@ -0,0 +1,101 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Benchmark synthesis of perf events such as at the start of a 'perf
+ * record'. Synthesis is done on the current process and the 'dummy' event
+ * handlers are invoked that support dump_trace but otherwise do nothing.
+ *
+ * Copyright 2019 Google LLC.
+ */
+#include <stdio.h>
+#include "bench.h"
+#include "../util/debug.h"
+#include "../util/session.h"
+#include "../util/synthetic-events.h"
+#include "../util/target.h"
+#include "../util/thread_map.h"
+#include "../util/tool.h"
+#include <linux/err.h>
+#include <linux/time64.h>
+#include <subcmd/parse-options.h>
+
+static unsigned int iterations = 10000;
+
+static const struct option options[] = {
+	OPT_UINTEGER('i', "iterations", &iterations,
+		"Number of iterations used to compute average"),
+	OPT_END()
+};
+
+static const char *const usage[] = {
+	"perf bench internals synthesize <options>",
+	NULL
+};
+
+
+static int do_synthesize(struct perf_session *session,
+			struct perf_thread_map *threads,
+			struct target *target, bool data_mmap)
+{
+	const unsigned int nr_threads_synthesize = 1;
+	struct timeval start, end, diff;
+	u64 runtime_us;
+	unsigned int i;
+	double average;
+	int err;
+
+	gettimeofday(&start, NULL);
+	for (i = 0; i < iterations; i++) {
+		err = machine__synthesize_threads(&session->machines.host,
+						target, threads, data_mmap,
+						nr_threads_synthesize);
+		if (err)
+			return err;
+	}
+
+	gettimeofday(&end, NULL);
+	timersub(&end, &start, &diff);
+	runtime_us = diff.tv_sec * USEC_PER_SEC + diff.tv_usec;
+	average = (double)runtime_us/(double)iterations;
+	printf("Average %ssynthesis took: %f usec\n",
+		data_mmap ? "data " : "", average);
+	return 0;
+}
+
+int bench_synthesize(int argc, const char **argv)
+{
+	struct perf_tool tool;
+	struct perf_session *session;
+	struct target target = {
+		.pid = "self",
+	};
+	struct perf_thread_map *threads;
+	int err;
+
+	argc = parse_options(argc, argv, options, usage, 0);
+
+	session = perf_session__new(NULL, false, NULL);
+	if (IS_ERR(session)) {
+		pr_err("Session creation failed.\n");
+		return PTR_ERR(session);
+	}
+	threads = thread_map__new_by_pid(getpid());
+	if (!threads) {
+		pr_err("Thread map creation failed.\n");
+		err = -ENOMEM;
+		goto err_out;
+	}
+	perf_tool__fill_defaults(&tool);
+
+	err = do_synthesize(session, threads, &target, false);
+	if (err)
+		goto err_out;
+
+	err = do_synthesize(session, threads, &target, true);
+
+err_out:
+	if (threads)
+		perf_thread_map__put(threads);
+
+	perf_session__delete(session);
+	return err;
+}
diff --git a/tools/perf/builtin-bench.c b/tools/perf/builtin-bench.c
index c06fe21..11c79a8 100644
--- a/tools/perf/builtin-bench.c
+++ b/tools/perf/builtin-bench.c
@@ -76,6 +76,11 @@ static struct bench epoll_benchmarks[] = {
 };
 #endif // HAVE_EVENTFD
 
+static struct bench internals_benchmarks[] = {
+	{ "synthesize", "Benchmark perf event synthesis",	bench_synthesize	},
+	{ NULL,		NULL,					NULL			}
+};
+
 struct collection {
 	const char	*name;
 	const char	*summary;
@@ -92,6 +97,7 @@ static struct collection collections[] = {
 #ifdef HAVE_EVENTFD
 	{"epoll",       "Epoll stressing benchmarks",                   epoll_benchmarks        },
 #endif
+	{ "internals",	"Perf-internals benchmarks",			internals_benchmarks	},
 	{ "all",	"All benchmarks",				NULL			},
 	{ NULL,		NULL,						NULL			}
 };

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [tip: perf/core] tools api fs: Make xxx__mountpoint() more scalable
  2020-04-02 15:43 ` [PATCH v2 2/5] tools api fs: make xxx__mountpoint() more scalable Ian Rogers
  2020-04-06 14:07   ` Arnaldo Carvalho de Melo
@ 2020-04-22 12:17   ` tip-bot2 for Stephane Eranian
  1 sibling, 0 replies; 17+ messages in thread
From: tip-bot2 for Stephane Eranian @ 2020-04-22 12:17 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Stephane Eranian, Ian Rogers, Jiri Olsa, Alexander Shishkin,
	Andrey Zhizhikin, Kan Liang, Kefeng Wang, Mark Rutland,
	Namhyung Kim, Peter Zijlstra, Petr Mladek, Thomas Gleixner,
	Arnaldo Carvalho de Melo, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     c6fddb28bad26e5472cb7acf7b04cd5126f1a4ab
Gitweb:        https://git.kernel.org/tip/c6fddb28bad26e5472cb7acf7b04cd5126f1a4ab
Author:        Stephane Eranian <eranian@google.com>
AuthorDate:    Thu, 02 Apr 2020 08:43:54 -07:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Thu, 16 Apr 2020 12:19:12 -03:00

tools api fs: Make xxx__mountpoint() more scalable

The xxx_mountpoint() interface provided by fs.c finds mount points for
common pseudo filesystems. The first time xxx_mountpoint() is invoked,
it scans the mount table (/proc/mounts) looking for a match. If found,
it is cached. The price to scan /proc/mounts is paid once if the mount
is found.

When the mount point is not found, subsequent calls to xxx_mountpoint()
scan /proc/mounts over and over again.  There is no caching.

This causes a scaling issue in perf record with hugeltbfs__mountpoint().
The function is called for each process found in
synthesize__mmap_events().  If the machine has thousands of processes
and if the /proc/mounts has many entries this could cause major overhead
in perf record. We have observed multi-second slowdowns on some
configurations.

As an example on a laptop:

Before:

  $ sudo umount /dev/hugepages
  $ strace -e trace=openat -o /tmp/tt perf record -a ls
  $ fgrep mounts /tmp/tt
  285

After:

  $ sudo umount /dev/hugepages
  $ strace -e trace=openat -o /tmp/tt perf record -a ls
  $ fgrep mounts /tmp/tt
  1

One could argue that the non-caching in case the moint point is not
found is intentional. That way subsequent calls may discover a moint
point if the sysadmin mounts the filesystem. But the same argument could
be made against caching the mount point. It could be unmounted causing
errors.  It all depends on the intent of the interface. This patch
assumes it is expected to scan /proc/mounts once. The patch documents
the caching behavior in the fs.h header file.

An alternative would be to just fix perf record. But it would solve the
problem with hugetlbs__mountpoint() but there could be similar issues
(possibly down the line) with other xxx_mountpoint() calls in perf or
other tools.

Signed-off-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrey Zhizhikin <andrey.z@gmail.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lore.kernel.org/lkml/20200402154357.107873-3-irogers@google.com
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/lib/api/fs/fs.c | 17 +++++++++++++++++
 tools/lib/api/fs/fs.h | 12 ++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c
index 027b18f..82f53d8 100644
--- a/tools/lib/api/fs/fs.c
+++ b/tools/lib/api/fs/fs.c
@@ -90,6 +90,7 @@ struct fs {
 	const char * const	*mounts;
 	char			 path[PATH_MAX];
 	bool			 found;
+	bool			 checked;
 	long			 magic;
 };
 
@@ -111,31 +112,37 @@ static struct fs fs__entries[] = {
 		.name	= "sysfs",
 		.mounts	= sysfs__fs_known_mountpoints,
 		.magic	= SYSFS_MAGIC,
+		.checked = false,
 	},
 	[FS__PROCFS] = {
 		.name	= "proc",
 		.mounts	= procfs__known_mountpoints,
 		.magic	= PROC_SUPER_MAGIC,
+		.checked = false,
 	},
 	[FS__DEBUGFS] = {
 		.name	= "debugfs",
 		.mounts	= debugfs__known_mountpoints,
 		.magic	= DEBUGFS_MAGIC,
+		.checked = false,
 	},
 	[FS__TRACEFS] = {
 		.name	= "tracefs",
 		.mounts	= tracefs__known_mountpoints,
 		.magic	= TRACEFS_MAGIC,
+		.checked = false,
 	},
 	[FS__HUGETLBFS] = {
 		.name	= "hugetlbfs",
 		.mounts = hugetlbfs__known_mountpoints,
 		.magic	= HUGETLBFS_MAGIC,
+		.checked = false,
 	},
 	[FS__BPF_FS] = {
 		.name	= "bpf",
 		.mounts = bpf_fs__known_mountpoints,
 		.magic	= BPF_FS_MAGIC,
+		.checked = false,
 	},
 };
 
@@ -158,6 +165,7 @@ static bool fs__read_mounts(struct fs *fs)
 	}
 
 	fclose(fp);
+	fs->checked = true;
 	return fs->found = found;
 }
 
@@ -220,6 +228,7 @@ static bool fs__env_override(struct fs *fs)
 		return false;
 
 	fs->found = true;
+	fs->checked = true;
 	strncpy(fs->path, override_path, sizeof(fs->path) - 1);
 	fs->path[sizeof(fs->path) - 1] = '\0';
 	return true;
@@ -246,6 +255,14 @@ static const char *fs__mountpoint(int idx)
 	if (fs->found)
 		return (const char *)fs->path;
 
+	/* the mount point was already checked for the mount point
+	 * but and did not exist, so return NULL to avoid scanning again.
+	 * This makes the found and not found paths cost equivalent
+	 * in case of multiple calls.
+	 */
+	if (fs->checked)
+		return NULL;
+
 	return fs__get_mountpoint(fs);
 }
 
diff --git a/tools/lib/api/fs/fs.h b/tools/lib/api/fs/fs.h
index 936edb9..aa222ca 100644
--- a/tools/lib/api/fs/fs.h
+++ b/tools/lib/api/fs/fs.h
@@ -18,6 +18,18 @@
 	const char *name##__mount(void);	\
 	bool name##__configured(void);		\
 
+/*
+ * The xxxx__mountpoint() entry points find the first match mount point for each
+ * filesystems listed below, where xxxx is the filesystem type.
+ *
+ * The interface is as follows:
+ *
+ * - If a mount point is found on first call, it is cached and used for all
+ *   subsequent calls.
+ *
+ * - If a mount point is not found, NULL is returned on first call and all
+ *   subsequent calls.
+ */
 FS(sysfs)
 FS(procfs)
 FS(debugfs)

^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-04-22 12:18 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-02 15:43 [PATCH v2 0/5] Benchmark and improve event synthesis performance Ian Rogers
2020-04-02 15:43 ` [PATCH v2 1/5] perf bench: add event synthesis benchmark Ian Rogers
2020-04-06 14:07   ` Arnaldo Carvalho de Melo
2020-04-22 12:17   ` [tip: perf/core] perf bench: Add " tip-bot2 for Ian Rogers
2020-04-02 15:43 ` [PATCH v2 2/5] tools api fs: make xxx__mountpoint() more scalable Ian Rogers
2020-04-06 14:07   ` Arnaldo Carvalho de Melo
2020-04-22 12:17   ` [tip: perf/core] tools api fs: Make " tip-bot2 for Stephane Eranian
2020-04-02 15:43 ` [PATCH v2 3/5] perf synthetic-events: save 4kb from 2 stack frames Ian Rogers
2020-04-22 12:17   ` [tip: perf/core] " tip-bot2 for Ian Rogers
2020-04-02 15:43 ` [PATCH v2 4/5] tools api: add a lightweight buffered reading api Ian Rogers
2020-04-04  3:06   ` Namhyung Kim
2020-04-06 14:09     ` Arnaldo Carvalho de Melo
2020-04-06 16:15       ` Ian Rogers
2020-04-07 12:32         ` Arnaldo Carvalho de Melo
2020-04-10  3:43           ` Ian Rogers
2020-04-02 15:43 ` [PATCH v2 5/5] perf synthetic events: Remove use of sscanf from /proc reading Ian Rogers
2020-04-03 11:01 ` [PATCH v2 0/5] Benchmark and improve event synthesis performance Jiri Olsa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).