* [PATCH 0/5] perf record: mmap output file - v5
@ 2013-11-12 14:46 David Ahern
  2013-11-12 14:46 ` [PATCH 1/5] perf record: Fix segfault with --no-mmap-pages David Ahern
                   ` (4 more replies)
  0 siblings, 5 replies; 42+ messages in thread
From: David Ahern @ 2013-11-12 14:46 UTC (permalink / raw)
  To: acme, linux-kernel; +Cc: mingo, jolsa, David Ahern

Next round of mmap-based output. This round addresses re-use of the
mmap-pages argument parsing, adds the -O shortcut for --out-pages,
allows -O 0 to use write() instead of mmap, and handles out-of-space
failures.

David Ahern (5):
  perf record: Fix segfault with --no-mmap-pages
  perf tool: Round mmap pages to power 2 - v2
  perf tool: Refactor mmap_pages parsing
  perf record: mmap output file - v5
  perf record: Handle out of space failures writing data with mmap

 tools/perf/Documentation/perf-record.txt |   7 ++
 tools/perf/builtin-record.c              | 174 +++++++++++++++++++++++++++++++
 tools/perf/util/evlist.c                 |  75 +++++++++----
 tools/perf/util/evlist.h                 |   3 +
 4 files changed, 240 insertions(+), 19 deletions(-)

-- 
1.8.3.4 (Apple Git-47)


* [PATCH 1/5] perf record: Fix segfault with --no-mmap-pages
  2013-11-12 14:46 [PATCH 0/5] perf record: mmap output file - v5 David Ahern
@ 2013-11-12 14:46 ` David Ahern
  2013-11-12 21:57   ` [tip:perf/urgent] " tip-bot for David Ahern
  2013-11-12 14:46 ` [PATCH 2/5] perf tool: Round mmap pages to power 2 - v2 David Ahern
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 42+ messages in thread
From: David Ahern @ 2013-11-12 14:46 UTC (permalink / raw)
  To: acme, linux-kernel; +Cc: mingo, jolsa, David Ahern, Adrian Hunter

Adrian reported a segfault when using --no-out-pages:
$ tools/perf/perf record -vv --no-out-pages uname
Segmentation fault (core dumped)

The same occurs with --no-mmap-pages. Fix by checking that str is non-NULL
before parsing it.

Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/util/evlist.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 99dc58e5dcc3..3960560f873a 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -719,6 +719,9 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
 		{ .tag  = 0 },
 	};
 
+	if (str == NULL)
+		return -1;
+
 	val = parse_tag_value(str, tags);
 	if (val != (unsigned long) -1) {
 		/* we got file size value */
-- 
1.8.3.4 (Apple Git-47)


* [PATCH 2/5] perf tool: Round mmap pages to power 2 - v2
  2013-11-12 14:46 [PATCH 0/5] perf record: mmap output file - v5 David Ahern
  2013-11-12 14:46 ` [PATCH 1/5] perf record: Fix segfault with --no-mmap-pages David Ahern
@ 2013-11-12 14:46 ` David Ahern
  2013-11-12 21:57   ` [tip:perf/urgent] perf evlist: " tip-bot for David Ahern
  2013-11-12 14:46 ` [PATCH 3/5] perf tool: Refactor mmap_pages parsing David Ahern
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 42+ messages in thread
From: David Ahern @ 2013-11-12 14:46 UTC (permalink / raw)
  To: acme, linux-kernel; +Cc: mingo, jolsa, David Ahern

Currently perf requires the -m / --mmap_pages option to be a power of 2.
To be more user friendly perf should automatically round this up to the
next power of 2.

Currently:
  $ perf record -m 3 -a -- sleep 1
  --mmap_pages/-m value must be a power of two.sleep: Terminated

With patch:
  $ perf record -m 3 -a -- sleep 1
  rounding mmap pages size to 16384 (4 pages)
  ...
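
For illustration, the rounding boils down to something like the sketch
below (illustrative only; the patch itself relies on the existing
next_pow2() helper in tools/perf rather than this function):

    /* Illustrative: round a page count up to the next power of two.
     * Assumes 0 < pages < 2^31, matching the check in the patch. */
    static unsigned long round_up_pow2(unsigned long pages)
    {
            unsigned long p = 1;

            while (p < pages)
                    p <<= 1;
            return p;
    }

    /* e.g. -m 3 -> 4 pages -> 4 * 4096 bytes = 16384, as shown above */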

v2: Add bytes units to rounding message per Ingo's request. Other
    suggestions (e.g., prefixing INFO) should be addressed by wrapping
    pr_info to catch all instances.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
---
 tools/perf/util/evlist.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 3960560f873a..fb4727d3df85 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -710,7 +710,6 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
 {
 	unsigned int *mmap_pages = opt->value;
 	unsigned long pages, val;
-	size_t size;
 	static struct parse_tag tags[] = {
 		{ .tag  = 'B', .mult = 1       },
 		{ .tag  = 'K', .mult = 1 << 10 },
@@ -726,11 +725,6 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
 	if (val != (unsigned long) -1) {
 		/* we got file size value */
 		pages = PERF_ALIGN(val, page_size) / page_size;
-		if (pages < (1UL << 31) && !is_power_of_2(pages)) {
-			pages = next_pow2(pages);
-			pr_info("rounding mmap pages size to %lu (%lu pages)\n",
-				pages * page_size, pages);
-		}
 	} else {
 		/* we got pages count value */
 		char *eptr;
@@ -741,14 +735,14 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
 		}
 	}
 
-	if (pages > UINT_MAX || pages > SIZE_MAX / page_size) {
-		pr_err("--mmap_pages/-m value too big\n");
-		return -1;
+	if (pages < (1UL << 31) && !is_power_of_2(pages)) {
+		pages = next_pow2(pages);
+		pr_info("rounding mmap pages size to %lu bytes (%lu pages)\n",
+			pages * page_size, pages);
 	}
 
-	size = perf_evlist__mmap_size(pages);
-	if (!size) {
-		pr_err("--mmap_pages/-m value must be a power of two.");
+	if (pages > UINT_MAX || pages > SIZE_MAX / page_size) {
+		pr_err("--mmap_pages/-m value too big\n");
 		return -1;
 	}
 
-- 
1.8.3.4 (Apple Git-47)


* [PATCH 3/5] perf tool: Refactor mmap_pages parsing
  2013-11-12 14:46 [PATCH 0/5] perf record: mmap output file - v5 David Ahern
  2013-11-12 14:46 ` [PATCH 1/5] perf record: Fix segfault with --no-mmap-pages David Ahern
  2013-11-12 14:46 ` [PATCH 2/5] perf tool: Round mmap pages to power 2 - v2 David Ahern
@ 2013-11-12 14:46 ` David Ahern
  2013-11-12 21:57   ` [tip:perf/urgent] perf evlist: " tip-bot for David Ahern
  2013-11-12 14:46 ` [PATCH 4/5] perf record: mmap output file - v5 David Ahern
  2013-11-12 14:46 ` [PATCH 5/5] perf record: Handle out of space failures writing data with mmap David Ahern
  4 siblings, 1 reply; 42+ messages in thread
From: David Ahern @ 2013-11-12 14:46 UTC (permalink / raw)
  To: acme, linux-kernel; +Cc: mingo, jolsa, David Ahern, Adrian Hunter

Logic will be re-used for the out-pages argument for mmap based
writes in perf-record.
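
A hedged sketch of the resulting call pattern (names taken from the diff
below; the min argument is what distinguishes the two users):

    /* -m/--mmap_pages: at least one page is required */
    pages = parse_pages_arg(str, 1, max);

    /* -O/--out-pages (added later in this series): 0 is allowed and
     * means "fall back to write() instead of mmap for the output" */
    pages = parse_pages_arg(str, 0, max);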

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/evlist.c | 39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index fb4727d3df85..cb19044601bb 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -705,10 +705,9 @@ static size_t perf_evlist__mmap_size(unsigned long pages)
 	return (pages + 1) * page_size;
 }
 
-int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
-				  int unset __maybe_unused)
+static long parse_pages_arg(const char *str, unsigned long min,
+			    unsigned long max)
 {
-	unsigned int *mmap_pages = opt->value;
 	unsigned long pages, val;
 	static struct parse_tag tags[] = {
 		{ .tag  = 'B', .mult = 1       },
@@ -719,7 +718,7 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
 	};
 
 	if (str == NULL)
-		return -1;
+		return -EINVAL;
 
 	val = parse_tag_value(str, tags);
 	if (val != (unsigned long) -1) {
@@ -729,20 +728,38 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
 		/* we got pages count value */
 		char *eptr;
 		pages = strtoul(str, &eptr, 10);
-		if (*eptr != '\0') {
-			pr_err("failed to parse --mmap_pages/-m value\n");
-			return -1;
-		}
+		if (*eptr != '\0')
+			return -EINVAL;
 	}
 
-	if (pages < (1UL << 31) && !is_power_of_2(pages)) {
+	if ((pages == 0) && (min == 0)) {
+		/* leave number of pages at 0 */
+	} else if (pages < (1UL << 31) && !is_power_of_2(pages)) {
+		/* round pages up to next power of 2 */
 		pages = next_pow2(pages);
 		pr_info("rounding mmap pages size to %lu bytes (%lu pages)\n",
 			pages * page_size, pages);
 	}
 
-	if (pages > UINT_MAX || pages > SIZE_MAX / page_size) {
-		pr_err("--mmap_pages/-m value too big\n");
+	if (pages > max)
+		return -EINVAL;
+
+	return pages;
+}
+
+int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
+				  int unset __maybe_unused)
+{
+	unsigned int *mmap_pages = opt->value;
+	unsigned long max = UINT_MAX;
+	long pages;
+
+	if (max < SIZE_MAX / page_size)
+		max = SIZE_MAX / page_size;
+
+	pages = parse_pages_arg(str, 1, max);
+	if (pages < 0) {
+		pr_err("Invalid argument for --mmap_pages/-m\n");
 		return -1;
 	}
 
-- 
1.8.3.4 (Apple Git-47)


* [PATCH 4/5] perf record: mmap output file - v5
  2013-11-12 14:46 [PATCH 0/5] perf record: mmap output file - v5 David Ahern
                   ` (2 preceding siblings ...)
  2013-11-12 14:46 ` [PATCH 3/5] perf tool: Refactor mmap_pages parsing David Ahern
@ 2013-11-12 14:46 ` David Ahern
  2013-11-12 14:57   ` Peter Zijlstra
  2013-11-12 14:46 ` [PATCH 5/5] perf record: Handle out of space failures writing data with mmap David Ahern
  4 siblings, 1 reply; 42+ messages in thread
From: David Ahern @ 2013-11-12 14:46 UTC (permalink / raw)
  To: acme, linux-kernel
  Cc: mingo, jolsa, David Ahern, Frederic Weisbecker, Peter Zijlstra,
	Namhyung Kim, Mike Galbraith, Stephane Eranian

When recording raw_syscalls for the entire system, e.g.,
    perf record -e raw_syscalls:*,sched:sched_switch -a -- sleep 1

you end up with a negative feedback loop as perf itself calls write() fairly
often. This patch handles the problem by mmap'ing the file in chunks of 64M at
a time and copies events from the event buffers to the file avoiding write
system calls.

Before (with write syscall):

    perf record -o /tmp/perf.data -e raw_syscalls:*,sched:sched_switch -a -- sleep 1
    [ perf record: Woken up 0 times to write data ]
    [ perf record: Captured and wrote 81.843 MB /tmp/perf.data (~3575786 samples) ]

After (using mmap):

    perf record -o /tmp/perf.data -e raw_syscalls:*,sched:sched_switch -a -- sleep 1
    [ perf record: Woken up 31 times to write data ]
    [ perf record: Captured and wrote 8.203 MB /tmp/perf.data (~358388 samples) ]

In addition to the perf-trace benefits, using mmap lowers the overhead of
perf-record. For example,

  perf stat -i -- perf record -g -o /tmp/perf.data openssl speed aes

shows that time, CPU cycles, and instructions all drop by more than a
factor of 3. Jiri also ran a test that showed a big improvement.
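
The core of the approach: extend the file with ftruncate(), mmap() a
window of it, memcpy() events into the window, then munmap() and map the
next window when it fills. A minimal sketch of that pattern (simplified
relative to the patch below, which also tracks the offset within a
window and handles errors; assumes offset is page-aligned and len fits
in one window):

    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define OUT_SIZE (64 * 1024 * 1024)    /* one output window */

    static int copy_chunk(int fd, off_t offset, const void *buf, size_t len)
    {
            void *dst;

            if (ftruncate(fd, offset + OUT_SIZE) != 0)  /* grow the file */
                    return -1;

            dst = mmap(NULL, OUT_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
                       fd, offset);
            if (dst == MAP_FAILED)
                    return -1;

            memcpy(dst, buf, len);   /* data reaches the file with no write() */
            munmap(dst, OUT_SIZE);
            return 0;
    }

Per the documentation added below, the default window size is 64M, and
-O 0 (--out-pages=0) falls back to plain write().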

v5: Addressed misc comments from Jiri, Adrian and Arnaldo. Added -O shortcut
    for --out-pages. Added -O 0 as a means to fall back to write

v4: Refactoring per Ingo's comments

v3: Removed use of bytes_at_mmap_start and the stat() that set it
    Added user option to control the size of the mmap for writing file.

v2: Removed msync call before munmap per Jiri's suggestion

Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Stephane Eranian <eranian@google.com>
---
 tools/perf/Documentation/perf-record.txt |   7 ++
 tools/perf/builtin-record.c              | 164 +++++++++++++++++++++++++++++++
 tools/perf/util/evlist.c                 |  23 +++++
 tools/perf/util/evlist.h                 |   3 +
 4 files changed, 197 insertions(+)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 052f7c4dc00c..7c67dad9e341 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -201,6 +201,13 @@ abort events and some memory events in precise mode on modern Intel CPUs.
 --transaction::
 Record transaction flags for transaction related events.
 
+-O::
+--out-pages=::
+Number of pages to mmap for writing data to file or size specification
+with appended unit character - B/K/M/G. The size is rounded up to have nearest
+pages power of two value. 0 falls back to write instead of mmap. Default size
+is 64M.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 880227eae20f..1a4fa5df215b 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -30,6 +30,9 @@
 #include <sched.h>
 #include <sys/mman.h>
 
+/* output file mmap'ed N chunks at a time */
+#define MMAP_OUTPUT_SIZE   (64*1024*1024)
+
 #ifndef HAVE_ON_EXIT_SUPPORT
 #ifndef ATEXIT_MAX
 #define ATEXIT_MAX 32
@@ -65,6 +68,16 @@ static void __handle_on_exit_funcs(void)
 struct perf_record {
 	struct perf_tool	tool;
 	struct perf_record_opts	opts;
+
+	/* for MMAP based file writes */
+	struct {
+		void		*addr;
+		u64		offset;     /* current location within mmap */
+		unsigned int	out_pages;  /* user configurable option */
+		size_t		out_size;   /* size of mmap segments */
+		bool		use;
+	} mmap;
+
 	u64			bytes_written;
 	struct perf_data_file	file;
 	struct perf_evlist	*evlist;
@@ -76,6 +89,95 @@ struct perf_record {
 	long			samples;
 };
 
+static int mmap_next_segment(struct perf_record *rec, off_t offset)
+{
+	struct perf_data_file *file = &rec->file;
+
+	/* extend file to include a new mmap segment */
+	if (ftruncate(file->fd, offset + rec->mmap.out_size) != 0) {
+		pr_err("ftruncate failed\n");
+		return -1;
+	}
+
+	rec->mmap.addr = mmap(NULL, rec->mmap.out_size,
+			      PROT_WRITE | PROT_READ, MAP_SHARED,
+			      file->fd, offset);
+
+	if (rec->mmap.addr == MAP_FAILED) {
+		pr_err("mmap failed: %d: %s\n", errno, strerror(errno));
+
+		/* reset file size */
+		if (ftruncate(file->fd, offset) != 0)
+			pr_err("ftruncate failed too. Is it Halloween?\n");
+
+		return -1;
+	}
+
+	return 0;
+}
+
+static off_t next_mmap_offset(struct perf_record *rec)
+{
+	off_t offset;
+
+	/*
+	 * for first segment, mmap offset is current amount of data
+	 * already written to file. For follow on segments the output
+	 * starts at 0.
+	 */
+	offset = rec->session->header.data_offset + rec->bytes_written;
+	if (offset < (ssize_t) rec->mmap.out_size) {
+		rec->mmap.offset = offset;
+		offset = 0;
+	} else {
+		rec->mmap.offset = 0;
+	}
+
+	/* returning offset within file - used for mmap of next segment */
+	return offset;
+}
+
+static int do_mmap_output(struct perf_record *rec, void *buf, size_t size)
+{
+	u64 remaining;
+	off_t offset;
+
+	if (rec->mmap.addr == NULL) {
+next_segment:
+		offset = next_mmap_offset(rec);
+		if (mmap_next_segment(rec, offset) != 0)
+			return -1;
+	}
+
+	/* amount of space in current mmap segment */
+	remaining = rec->mmap.out_size - rec->mmap.offset;
+
+	/*
+	 * if current size to write is more than the available
+	 * space write what we can then go back and create the
+	 * next segment
+	 */
+	if (size > remaining) {
+		memcpy(rec->mmap.addr + rec->mmap.offset, buf, remaining);
+		rec->bytes_written += remaining;
+
+		size -= remaining;
+		buf  += remaining;
+
+		munmap(rec->mmap.addr, rec->mmap.out_size);
+		goto next_segment;
+	}
+
+	/* more data to copy and it fits in the current segment */
+	if (size) {
+		memcpy(rec->mmap.addr + rec->mmap.offset, buf, size);
+		rec->bytes_written += size;
+		rec->mmap.offset += size;
+	}
+
+	return 0;
+}
+
 static int do_write_output(struct perf_record *rec, void *buf, size_t size)
 {
 	struct perf_data_file *file = &rec->file;
@@ -99,6 +201,9 @@ static int do_write_output(struct perf_record *rec, void *buf, size_t size)
 
 static int write_output(struct perf_record *rec, void *buf, size_t size)
 {
+	if (rec->mmap.use)
+		return do_mmap_output(rec, buf, size);
+
 	return do_write_output(rec, buf, size);
 }
 
@@ -361,6 +466,52 @@ static void perf_record__init_features(struct perf_record *rec)
 		perf_header__clear_feat(&session->header, HEADER_BRANCH_STACK);
 }
 
+static int mmap_output_fini(struct perf_record *rec)
+{
+	off_t len;
+	int fd;
+
+	if (!rec->mmap.use)
+		return 0;
+
+	rec->mmap.use = false;
+
+	len = rec->session->header.data_offset + rec->bytes_written;
+	fd = rec->file.fd;
+
+	munmap(rec->mmap.addr, rec->mmap.out_size);
+	rec->mmap.addr = NULL;
+
+	if (ftruncate(fd, len) != 0) {
+		pr_err("ftruncate failed\n");
+		return -1;
+	}
+
+	/*
+	 * Set output pointer to end of file
+	 * eg., needed for buildid processing
+	 */
+	if (lseek(fd, 0, SEEK_END) == (off_t) -1) {
+		pr_err("ftruncate failed\n");
+		return -1;
+	}
+
+	return 0;
+}
+
+static void mmap_output_init(struct perf_record *rec)
+{
+	struct perf_data_file *file = &rec->file;
+
+	if (file->is_pipe)
+		return;
+
+	rec->mmap.out_size = rec->mmap.out_pages * page_size;
+
+	if (rec->mmap.out_size)
+		rec->mmap.use = true;
+}
+
 static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
 {
 	int err;
@@ -434,6 +585,8 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
 		goto out_delete_session;
 	}
 
+	mmap_output_init(rec);
+
 	machine = &session->machines.host;
 
 	if (file->is_pipe) {
@@ -541,6 +694,11 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
 		}
 	}
 
+	if (mmap_output_fini(rec) != 0) {
+		err = -1;
+		goto out_delete_session;
+	}
+
 	if (quiet || signr == SIGUSR1)
 		return 0;
 
@@ -802,6 +960,9 @@ static struct perf_record record = {
 			.uses_mmap   = true,
 		},
 	},
+	.mmap = {
+		.out_size = MMAP_OUTPUT_SIZE,
+	},
 };
 
 #define CALLCHAIN_HELP "setup and enables call-graph (stack chain/backtrace) recording: "
@@ -888,6 +1049,9 @@ const struct option record_options[] = {
 		    "sample by weight (on special events only)"),
 	OPT_BOOLEAN(0, "transaction", &record.opts.sample_transaction,
 		    "sample transaction flags (special events only)"),
+	OPT_CALLBACK('O', "out-pages", &record.mmap.out_pages, "pages",
+		     "Number of pages or size with units to use for output (default 64M)",
+		     perf_evlist__parse_out_pages),
 	OPT_END()
 };
 
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index cb19044601bb..3d1f7faa30d7 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -767,6 +767,29 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
 	return 0;
 }
 
+int perf_evlist__parse_out_pages(const struct option *opt, const char *str,
+				  int unset __maybe_unused)
+{
+	unsigned int *out_pages = opt->value;
+	unsigned long max = UINT_MAX;
+	long pages;
+
+	if (max < SIZE_MAX / page_size)
+		max = SIZE_MAX / page_size;
+
+	pages = parse_pages_arg(str, 0, max);
+	if (pages < 0) {
+		pr_err("Invalid argument for --out-pages/-O\n");
+		return -1;
+	}
+
+	if (pages == 0)
+		pr_debug("Reverting to write instead of mmap for output file\n");
+
+	*out_pages = pages;
+	return 0;
+}
+
 /**
  * perf_evlist__mmap - Create mmaps to receive events.
  * @evlist: list of events
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index ecaa582f40e2..749488147276 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -107,6 +107,9 @@ int perf_evlist__prepare_workload(struct perf_evlist *evlist,
 				  bool want_signal);
 int perf_evlist__start_workload(struct perf_evlist *evlist);
 
+int perf_evlist__parse_out_pages(const struct option *opt,
+				  const char *str, int unset);
+
 int perf_evlist__parse_mmap_pages(const struct option *opt,
 				  const char *str,
 				  int unset);
-- 
1.8.3.4 (Apple Git-47)


* [PATCH 5/5] perf record: Handle out of space failures writing data with mmap
  2013-11-12 14:46 [PATCH 0/5] perf record: mmap output file - v5 David Ahern
                   ` (3 preceding siblings ...)
  2013-11-12 14:46 ` [PATCH 4/5] perf record: mmap output file - v5 David Ahern
@ 2013-11-12 14:46 ` David Ahern
  2013-11-12 21:19   ` Ingo Molnar
  4 siblings, 1 reply; 42+ messages in thread
From: David Ahern @ 2013-11-12 14:46 UTC (permalink / raw)
  To: acme, linux-kernel
  Cc: mingo, jolsa, David Ahern, Frederic Weisbecker, Peter Zijlstra,
	Namhyung Kim, Mike Galbraith, Stephane Eranian

If the filesystem where a file is written using mmap fills up, perf record
gets a SIGBUS and is terminated. Handle the SIGBUS by using longjmp to
bounce out of the memcpy and fail the write.
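
One C detail worth noting for this pattern: automatic variables that are
modified between setjmp() and the matching longjmp() must be
volatile-qualified to have defined values after the jump, which is why
total_len below is declared volatile. In outline (a sketch, not the
exact patch code; dst, off, buf and chunk are illustrative names):

    volatile size_t total_len = 0;  /* must survive the longjmp */

    if (setjmp(mmap_jmp) != 0)
            return -1;              /* the SIGBUS handler jumped back here */

    memcpy(dst + off, buf + total_len, chunk);
    total_len += chunk;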

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Stephane Eranian <eranian@google.com>
---
 tools/perf/builtin-record.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 1a4fa5df215b..48d6535d144f 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -29,9 +29,11 @@
 #include <unistd.h>
 #include <sched.h>
 #include <sys/mman.h>
+#include <setjmp.h>
 
 /* output file mmap'ed N chunks at a time */
 #define MMAP_OUTPUT_SIZE   (64*1024*1024)
+sigjmp_buf mmap_jmp;
 
 #ifndef HAVE_ON_EXIT_SUPPORT
 #ifndef ATEXIT_MAX
@@ -141,6 +143,7 @@ static int do_mmap_output(struct perf_record *rec, void *buf, size_t size)
 {
 	u64 remaining;
 	off_t offset;
+	volatile size_t total_len = 0;
 
 	if (rec->mmap.addr == NULL) {
 next_segment:
@@ -157,20 +160,23 @@ next_segment:
 	 * space write what we can then go back and create the
 	 * next segment
 	 */
-	if (size > remaining) {
-		memcpy(rec->mmap.addr + rec->mmap.offset, buf, remaining);
+	if (setjmp(mmap_jmp) != 0) {
+		pr_err("mmap copy failed.\n");
+		return -1;
+	}
+	if (size-total_len > remaining) {
+		memcpy(rec->mmap.addr + rec->mmap.offset, buf+total_len, remaining);
 		rec->bytes_written += remaining;
 
-		size -= remaining;
-		buf  += remaining;
+		total_len += remaining;
 
 		munmap(rec->mmap.addr, rec->mmap.out_size);
 		goto next_segment;
 	}
 
 	/* more data to copy and it fits in the current segment */
-	if (size) {
-		memcpy(rec->mmap.addr + rec->mmap.offset, buf, size);
+	if (size - total_len) {
+		memcpy(rec->mmap.addr + rec->mmap.offset, buf+total_len, size-total_len);
 		rec->bytes_written += size;
 		rec->mmap.offset += size;
 	}
@@ -272,6 +278,9 @@ static void sig_handler(int sig)
 	if (sig == SIGCHLD)
 		child_finished = 1;
 
+	if (sig == SIGBUS)
+		longjmp(mmap_jmp, 1);
+
 	done = 1;
 	signr = sig;
 }
@@ -532,6 +541,7 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
 	signal(SIGINT, sig_handler);
 	signal(SIGUSR1, sig_handler);
 	signal(SIGTERM, sig_handler);
+	signal(SIGBUS, sig_handler);
 
 	session = perf_session__new(file, false, NULL);
 	if (session == NULL) {
-- 
1.8.3.4 (Apple Git-47)


* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-12 14:46 ` [PATCH 4/5] perf record: mmap output file - v5 David Ahern
@ 2013-11-12 14:57   ` Peter Zijlstra
  2013-11-12 15:07     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2013-11-12 14:57 UTC (permalink / raw)
  To: David Ahern
  Cc: acme, linux-kernel, mingo, jolsa, Frederic Weisbecker,
	Namhyung Kim, Mike Galbraith, Stephane Eranian

On Tue, Nov 12, 2013 at 07:46:56AM -0700, David Ahern wrote:
> When recording raw_syscalls for the entire system, e.g.,
>     perf record -e raw_syscalls:*,sched:sched_switch -a -- sleep 1
> 
> you end up with a negative feedback loop as perf itself calls write() fairly
> often. This patch handles the problem by mmap'ing the file in chunks of 64M at
> a time and copies events from the event buffers to the file avoiding write
> system calls.

You know this completely fails the moment you trace faults, because
every new access to one of those pages (to mark it dirty) will trigger a
fault. And we'll take a bunch more faults -- one for each page -- than
we ever did write() syscalls.

Anyway the over all performance improvements still make it worth it. But
the above seems like a false argument in favour of this.

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-12 14:57   ` Peter Zijlstra
@ 2013-11-12 15:07     ` Arnaldo Carvalho de Melo
  2013-11-12 15:19       ` Peter Zijlstra
  0 siblings, 1 reply; 42+ messages in thread
From: Arnaldo Carvalho de Melo @ 2013-11-12 15:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Ahern, linux-kernel, mingo, jolsa, Frederic Weisbecker,
	Namhyung Kim, Mike Galbraith, Stephane Eranian

Em Tue, Nov 12, 2013 at 03:57:07PM +0100, Peter Zijlstra escreveu:
> On Tue, Nov 12, 2013 at 07:46:56AM -0700, David Ahern wrote:
> > When recording raw_syscalls for the entire system, e.g.,
> >     perf record -e raw_syscalls:*,sched:sched_switch -a -- sleep 1
> > 
> > you end up with a negative feedback loop as perf itself calls write() fairly
> > often. This patch handles the problem by mmap'ing the file in chunks of 64M at
> > a time and copies events from the event buffers to the file avoiding write
> > system calls.
> 
> You know this completely fails the moment you trace faults, because
> every new access to one of those pages (to mark it dirty) will trigger a
> fault. And we'll take a bunch more faults -- one for each page -- than
> we ever did write() syscalls.

So we should provide a neon lettered warning when doing that, no? :-)
 
> Anyway the over all performance improvements still make it worth it. But
> the above seems like a false argument in favour of this.

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-12 15:07     ` Arnaldo Carvalho de Melo
@ 2013-11-12 15:19       ` Peter Zijlstra
  2013-11-12 15:36         ` David Ahern
  0 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2013-11-12 15:19 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: David Ahern, linux-kernel, mingo, jolsa, Frederic Weisbecker,
	Namhyung Kim, Mike Galbraith, Stephane Eranian

On Tue, Nov 12, 2013 at 12:07:51PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Nov 12, 2013 at 03:57:07PM +0100, Peter Zijlstra escreveu:
> > On Tue, Nov 12, 2013 at 07:46:56AM -0700, David Ahern wrote:
> > > When recording raw_syscalls for the entire system, e.g.,
> > >     perf record -e raw_syscalls:*,sched:sched_switch -a -- sleep 1
> > > 
> > > you end up with a negative feedback loop as perf itself calls write() fairly
> > > often. This patch handles the problem by mmap'ing the file in chunks of 64M at
> > > a time and copies events from the event buffers to the file avoiding write
> > > system calls.
> > 
> > You know this completely fails the moment you trace faults, because
> > every new access to one of those pages (to mark it dirty) will trigger a
> > fault. And we'll take a bunch more faults -- one for each page -- than
> > we ever did write() syscalls.
> 
> So we should provide a neon lettered warning when doing that, no? :-)

Dunno.. it _should_ all work. Try it and see what it does.. Once the
events are bigger than a page things might get 'interesting' though.

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-12 15:19       ` Peter Zijlstra
@ 2013-11-12 15:36         ` David Ahern
  2013-11-12 21:11           ` Ingo Molnar
  0 siblings, 1 reply; 42+ messages in thread
From: David Ahern @ 2013-11-12 15:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: linux-kernel, mingo, jolsa, Frederic Weisbecker, Namhyung Kim,
	Mike Galbraith, Stephane Eranian

On 11/12/13, 8:19 AM, Peter Zijlstra wrote:
> On Tue, Nov 12, 2013 at 12:07:51PM -0300, Arnaldo Carvalho de Melo wrote:
>> Em Tue, Nov 12, 2013 at 03:57:07PM +0100, Peter Zijlstra escreveu:
>>> On Tue, Nov 12, 2013 at 07:46:56AM -0700, David Ahern wrote:
>>>> When recording raw_syscalls for the entire system, e.g.,
>>>>      perf record -e raw_syscalls:*,sched:sched_switch -a -- sleep 1
>>>>
>>>> you end up with a negative feedback loop as perf itself calls write() fairly
>>>> often. This patch handles the problem by mmap'ing the file in chunks of 64M at
>>>> a time and copies events from the event buffers to the file avoiding write
>>>> system calls.
>>>
>>> You know this completely fails the moment you trace faults, because
>>> every new access to one of those pages (to mark it dirty) will trigger a
>>> fault. And we'll take a bunch more faults -- one for each page -- than
>>> we ever did write() syscalls.
>>
>> So we should provide a neon lettered warning when doing that, no? :-)
>
> Dunno.. it _should_ all work. Try it and see what it does.. Once the
> events are bigger than a page things might get 'interesting' though.
>

one option here is not allow page faults and system wide system calls. 
system wide tracing needs mmap; page faults for a task can use write(). 
I left that option in case something like this came up.

David

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-12 15:36         ` David Ahern
@ 2013-11-12 21:11           ` Ingo Molnar
  2013-11-13 11:34             ` Peter Zijlstra
  0 siblings, 1 reply; 42+ messages in thread
From: Ingo Molnar @ 2013-11-12 21:11 UTC (permalink / raw)
  To: David Ahern
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, linux-kernel, jolsa,
	Frederic Weisbecker, Namhyung Kim, Mike Galbraith,
	Stephane Eranian


* David Ahern <dsahern@gmail.com> wrote:

> > Dunno.. it _should_ all work. Try it and see what it does.. Once the 
> > events are bigger than a page things might get 'interesting' though.

Which could be the case with call-graph recording, right?

> one option here is not allow page faults and system wide system calls. 
> system wide tracing needs mmap; page faults for a task can use write(). 
> I left that option in case something like this came up.

So maybe splice() sounds like the right long term solution after all? :-/

Thanks,

	Ingo

* Re: [PATCH 5/5] perf record: Handle out of space failures writing data with mmap
  2013-11-12 14:46 ` [PATCH 5/5] perf record: Handle out of space failures writing data with mmap David Ahern
@ 2013-11-12 21:19   ` Ingo Molnar
  2013-11-13 14:33     ` David Ahern
  0 siblings, 1 reply; 42+ messages in thread
From: Ingo Molnar @ 2013-11-12 21:19 UTC (permalink / raw)
  To: David Ahern
  Cc: acme, linux-kernel, jolsa, Frederic Weisbecker, Peter Zijlstra,
	Namhyung Kim, Mike Galbraith, Stephane Eranian


* David Ahern <dsahern@gmail.com> wrote:

> If the filesystem where a file is written using mmap fills perf record
> gets a SIGBUS and terminated. Handle the SIGBUS by using longjmp to
> bounce out of the memcpy and fail the write.
> 
> Signed-off-by: David Ahern <dsahern@gmail.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Stephane Eranian <eranian@google.com>
> ---
>  tools/perf/builtin-record.c | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 1a4fa5df215b..48d6535d144f 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -29,9 +29,11 @@
>  #include <unistd.h>
>  #include <sched.h>
>  #include <sys/mman.h>
> +#include <setjmp.h>
>  
>  /* output file mmap'ed N chunks at a time */
>  #define MMAP_OUTPUT_SIZE   (64*1024*1024)
> +sigjmp_buf mmap_jmp;
>  
>  #ifndef HAVE_ON_EXIT_SUPPORT
>  #ifndef ATEXIT_MAX
> @@ -141,6 +143,7 @@ static int do_mmap_output(struct perf_record *rec, void *buf, size_t size)
>  {
>  	u64 remaining;
>  	off_t offset;
> +	volatile size_t total_len = 0;
>  
>  	if (rec->mmap.addr == NULL) {
>  next_segment:
> @@ -157,20 +160,23 @@ next_segment:
>  	 * space write what we can then go back and create the
>  	 * next segment
>  	 */
> -	if (size > remaining) {
> -		memcpy(rec->mmap.addr + rec->mmap.offset, buf, remaining);
> +	if (setjmp(mmap_jmp) != 0) {
> +		pr_err("mmap copy failed.\n");
> +		return -1;
> +	}
> +	if (size-total_len > remaining) {
> +		memcpy(rec->mmap.addr + rec->mmap.offset, buf+total_len, remaining);
>  		rec->bytes_written += remaining;
>  
> -		size -= remaining;
> -		buf  += remaining;
> +		total_len += remaining;
>  
>  		munmap(rec->mmap.addr, rec->mmap.out_size);
>  		goto next_segment;
>  	}
>  
>  	/* more data to copy and it fits in the current segment */
> -	if (size) {
> -		memcpy(rec->mmap.addr + rec->mmap.offset, buf, size);
> +	if (size - total_len) {
> +		memcpy(rec->mmap.addr + rec->mmap.offset, buf+total_len, size-total_len);
>  		rec->bytes_written += size;
>  		rec->mmap.offset += size;
>  	}
> @@ -272,6 +278,9 @@ static void sig_handler(int sig)
>  	if (sig == SIGCHLD)
>  		child_finished = 1;
>  
> +	if (sig == SIGBUS)
> +		longjmp(mmap_jmp, 1);

So this isn't very robust, because it assumes that all sources of SIGBUS 
are due to that memcpy() hitting -ENOSPC...

There are several failure modes:

 - If mmap_jmp is not set yet and we get a SIGBUS is some other place, 
   then the longjmp() result will be undefined.

 - If mmap_jmp environment is set, but we've returned from 
   do_mmap_output() already, then the result will be undefined - likely a 
   non-obvious crash.

So at minimum we need a flag that tells us whether the jump environment is 
valid or not - i.e. whether we are executing inside the protected region 
or not - and only do the longjmp() if that flag is set.

Is there really no other way to handle the -ENOSPC case robustly? I guess 
not because the memcpy() really needs memory to write to, but I thought 
I'd ask ...

Thanks,

	Ingo

* [tip:perf/urgent] perf record: Fix segfault with --no-mmap-pages
  2013-11-12 14:46 ` [PATCH 1/5] perf record: Fix segfault with --no-mmap-pages David Ahern
@ 2013-11-12 21:57   ` tip-bot for David Ahern
  0 siblings, 0 replies; 42+ messages in thread
From: tip-bot for David Ahern @ 2013-11-12 21:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, linux-kernel, hpa, mingo, jolsa, dsahern, adrian.hunter, tglx

Commit-ID:  8973504be70b2986a2081eeff7d9a4210dec295d
Gitweb:     http://git.kernel.org/tip/8973504be70b2986a2081eeff7d9a4210dec295d
Author:     David Ahern <dsahern@gmail.com>
AuthorDate: Tue, 12 Nov 2013 07:46:53 -0700
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 12 Nov 2013 16:30:54 -0300

perf record: Fix segfault with --no-mmap-pages

Adrian reported a segfault when using --no-out-pages:

$ tools/perf/perf record -vv --no-out-pages uname
Segmentation fault (core dumped)

The same occurs with --no-mmap-pages. Fix by checking that str is
non-NULL before parsing it.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1384267617-3446-2-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evlist.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 99dc58e..3960560 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -719,6 +719,9 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
 		{ .tag  = 0 },
 	};
 
+	if (str == NULL)
+		return -1;
+
 	val = parse_tag_value(str, tags);
 	if (val != (unsigned long) -1) {
 		/* we got file size value */

* [tip:perf/urgent] perf evlist: Round mmap pages to power 2 - v2
  2013-11-12 14:46 ` [PATCH 2/5] perf tool: Round mmap pages to power 2 - v2 David Ahern
@ 2013-11-12 21:57   ` tip-bot for David Ahern
  0 siblings, 0 replies; 42+ messages in thread
From: tip-bot for David Ahern @ 2013-11-12 21:57 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: acme, linux-kernel, hpa, mingo, dsahern, tglx, jolsa

Commit-ID:  9639837e95db90d056f4683c911717921519320e
Gitweb:     http://git.kernel.org/tip/9639837e95db90d056f4683c911717921519320e
Author:     David Ahern <dsahern@gmail.com>
AuthorDate: Tue, 12 Nov 2013 07:46:54 -0700
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 12 Nov 2013 16:31:53 -0300

perf evlist: Round mmap pages to power 2 - v2

Currently perf requires the -m / --mmap_pages option to be a power of 2.

To be more user friendly perf should automatically round this up to the
next power of 2.

Currently:
  $ perf record -m 3 -a -- sleep 1
  --mmap_pages/-m value must be a power of two.sleep: Terminated

With patch:
  $ perf record -m 3 -a -- sleep 1
  rounding mmap pages size to 16384 (4 pages)
  ...

v2: Add bytes units to rounding message per Ingo's request. Other
    suggestions (e.g., prefixing INFO) should be addressed by wrapping
    pr_info to catch all instances.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1384267617-3446-3-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evlist.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 3960560..fb4727d 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -710,7 +710,6 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
 {
 	unsigned int *mmap_pages = opt->value;
 	unsigned long pages, val;
-	size_t size;
 	static struct parse_tag tags[] = {
 		{ .tag  = 'B', .mult = 1       },
 		{ .tag  = 'K', .mult = 1 << 10 },
@@ -726,11 +725,6 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
 	if (val != (unsigned long) -1) {
 		/* we got file size value */
 		pages = PERF_ALIGN(val, page_size) / page_size;
-		if (pages < (1UL << 31) && !is_power_of_2(pages)) {
-			pages = next_pow2(pages);
-			pr_info("rounding mmap pages size to %lu (%lu pages)\n",
-				pages * page_size, pages);
-		}
 	} else {
 		/* we got pages count value */
 		char *eptr;
@@ -741,14 +735,14 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
 		}
 	}
 
-	if (pages > UINT_MAX || pages > SIZE_MAX / page_size) {
-		pr_err("--mmap_pages/-m value too big\n");
-		return -1;
+	if (pages < (1UL << 31) && !is_power_of_2(pages)) {
+		pages = next_pow2(pages);
+		pr_info("rounding mmap pages size to %lu bytes (%lu pages)\n",
+			pages * page_size, pages);
 	}
 
-	size = perf_evlist__mmap_size(pages);
-	if (!size) {
-		pr_err("--mmap_pages/-m value must be a power of two.");
+	if (pages > UINT_MAX || pages > SIZE_MAX / page_size) {
+		pr_err("--mmap_pages/-m value too big\n");
 		return -1;
 	}
 

* [tip:perf/urgent] perf evlist: Refactor mmap_pages parsing
  2013-11-12 14:46 ` [PATCH 3/5] perf tool: Refactor mmap_pages parsing David Ahern
@ 2013-11-12 21:57   ` tip-bot for David Ahern
  0 siblings, 0 replies; 42+ messages in thread
From: tip-bot for David Ahern @ 2013-11-12 21:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, linux-kernel, hpa, mingo, jolsa, dsahern, adrian.hunter, tglx

Commit-ID:  33c2dcfdfe7f114cc656bcb4c839f5939d5e60ba
Gitweb:     http://git.kernel.org/tip/33c2dcfdfe7f114cc656bcb4c839f5939d5e60ba
Author:     David Ahern <dsahern@gmail.com>
AuthorDate: Tue, 12 Nov 2013 07:46:55 -0700
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 12 Nov 2013 16:33:22 -0300

perf evlist: Refactor mmap_pages parsing

Logic will be re-used for the out-pages argument for mmap based writes
in perf-record.

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1384267617-3446-4-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evlist.c | 39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index fb4727d..cb19044 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -705,10 +705,9 @@ static size_t perf_evlist__mmap_size(unsigned long pages)
 	return (pages + 1) * page_size;
 }
 
-int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
-				  int unset __maybe_unused)
+static long parse_pages_arg(const char *str, unsigned long min,
+			    unsigned long max)
 {
-	unsigned int *mmap_pages = opt->value;
 	unsigned long pages, val;
 	static struct parse_tag tags[] = {
 		{ .tag  = 'B', .mult = 1       },
@@ -719,7 +718,7 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
 	};
 
 	if (str == NULL)
-		return -1;
+		return -EINVAL;
 
 	val = parse_tag_value(str, tags);
 	if (val != (unsigned long) -1) {
@@ -729,20 +728,38 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
 		/* we got pages count value */
 		char *eptr;
 		pages = strtoul(str, &eptr, 10);
-		if (*eptr != '\0') {
-			pr_err("failed to parse --mmap_pages/-m value\n");
-			return -1;
-		}
+		if (*eptr != '\0')
+			return -EINVAL;
 	}
 
-	if (pages < (1UL << 31) && !is_power_of_2(pages)) {
+	if ((pages == 0) && (min == 0)) {
+		/* leave number of pages at 0 */
+	} else if (pages < (1UL << 31) && !is_power_of_2(pages)) {
+		/* round pages up to next power of 2 */
 		pages = next_pow2(pages);
 		pr_info("rounding mmap pages size to %lu bytes (%lu pages)\n",
 			pages * page_size, pages);
 	}
 
-	if (pages > UINT_MAX || pages > SIZE_MAX / page_size) {
-		pr_err("--mmap_pages/-m value too big\n");
+	if (pages > max)
+		return -EINVAL;
+
+	return pages;
+}
+
+int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
+				  int unset __maybe_unused)
+{
+	unsigned int *mmap_pages = opt->value;
+	unsigned long max = UINT_MAX;
+	long pages;
+
+	if (max < SIZE_MAX / page_size)
+		max = SIZE_MAX / page_size;
+
+	pages = parse_pages_arg(str, 1, max);
+	if (pages < 0) {
+		pr_err("Invalid argument for --mmap_pages/-m\n");
 		return -1;
 	}
 

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-12 21:11           ` Ingo Molnar
@ 2013-11-13 11:34             ` Peter Zijlstra
  2013-11-13 11:50               ` Ingo Molnar
  2013-11-15 16:41               ` David Ahern
  0 siblings, 2 replies; 42+ messages in thread
From: Peter Zijlstra @ 2013-11-13 11:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Ahern, Arnaldo Carvalho de Melo, linux-kernel, jolsa,
	Frederic Weisbecker, Namhyung Kim, Mike Galbraith,
	Stephane Eranian

On Tue, Nov 12, 2013 at 10:11:21PM +0100, Ingo Molnar wrote:
> 
> * David Ahern <dsahern@gmail.com> wrote:
> 
> > > Dunno.. it _should_ all work. Try it and see what it does.. Once the 
> > > events are bigger than a page things might get 'interesting' though.
> 
> Which could be the case with call-graph recording, right?

Not typically, I think we're limiting call graphs to 127 u64, which is
~1k. Maybe you can blow the single page if you also do a large
top-of-stack copy for dwarf/unwind nonsense.

> > one option here is not allow page faults and system wide system calls. 
> > system wide tracing needs mmap; page faults for a task can use write(). 
> > I left that option in case something like this came up.
> 
> So maybe splice() sounds like the right long term solution after all? :-/

Right until you put a tracepoint (kprobe) somewhere in whatever function
is used to transfer a single page into/from a splice pipe.

You can always screw yourself over using this stuff, no exceptions.

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-13 11:34             ` Peter Zijlstra
@ 2013-11-13 11:50               ` Ingo Molnar
  2013-11-13 12:16                 ` Peter Zijlstra
  2013-11-13 14:29                 ` David Ahern
  2013-11-15 16:41               ` David Ahern
  1 sibling, 2 replies; 42+ messages in thread
From: Ingo Molnar @ 2013-11-13 11:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Ahern, Arnaldo Carvalho de Melo, linux-kernel, jolsa,
	Frederic Weisbecker, Namhyung Kim, Mike Galbraith,
	Stephane Eranian


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Nov 12, 2013 at 10:11:21PM +0100, Ingo Molnar wrote:
> > 
> > * David Ahern <dsahern@gmail.com> wrote:
> > 
> > > > Dunno.. it _should_ all work. Try it and see what it does.. Once the 
> > > > events are bigger than a page things might get 'interesting' though.
> > 
> > Which could be the case with call-graph recording, right?
> 
> Not typically, I think we're limiting call graphs to 127 u64, which is 
> ~1k. Maybe you can blow the single page if you also do a large 
> top-of-stack copy for dwarf/unwind nonsense.

What I meant was dwarf style call graph recording:

 tools/perf/builtin-record.c:                    const unsigned long default_stack_dump_size = 8192;

Doesn't that mean 8K+ events?

> > > one option here is not allow page faults and system wide system 
> > > calls. system wide tracing needs mmap; page faults for a task can 
> > > use write(). I left that option in case something like this came up.
> > 
> > So maybe splice() sounds like the right long term solution after all?
> > :-/
> 
> Right until you put a tracepoint (kprobe) somewhere in whatever function 
> is used to transfer a single page into/from a splice pipe.

That ought to be a far less common occurance than tracing page faults 
though.

> You can always screw yourself over using this stuff, no exceptions.

Granted, as the many notrace markings demonstrate this stuff really wants 
to observe itself observing itself all the time! :)

Thanks,

	Ingo

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-13 11:50               ` Ingo Molnar
@ 2013-11-13 12:16                 ` Peter Zijlstra
  2013-11-13 14:29                 ` David Ahern
  1 sibling, 0 replies; 42+ messages in thread
From: Peter Zijlstra @ 2013-11-13 12:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Ahern, Arnaldo Carvalho de Melo, linux-kernel, jolsa,
	Frederic Weisbecker, Namhyung Kim, Mike Galbraith,
	Stephane Eranian

On Wed, Nov 13, 2013 at 12:50:21PM +0100, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Tue, Nov 12, 2013 at 10:11:21PM +0100, Ingo Molnar wrote:
> > > 
> > > * David Ahern <dsahern@gmail.com> wrote:
> > > 
> > > > > Dunno.. it _should_ all work. Try it and see what it does.. Once the 
> > > > > events are bigger than a page things might get 'interesting' though.
> > > 
> > > Which could be the case with call-graph recording, right?
> > 
> > Not typically, I think we're limiting call graphs to 127 u64, which is 
> > ~1k. Maybe you can blow the single page if you also do a large 
> > top-of-stack copy for dwarf/unwind nonsense.
> 
> What I meant was dwarf style call graph recording:

Ah, one of those things I've never yet used ;-)

>  tools/perf/builtin-record.c:                    const unsigned long default_stack_dump_size = 8192;
> 
> Doesn't that mean 8K+ events?

Oh yes.. that's ideal to trigger this.

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-13 11:50               ` Ingo Molnar
  2013-11-13 12:16                 ` Peter Zijlstra
@ 2013-11-13 14:29                 ` David Ahern
  1 sibling, 0 replies; 42+ messages in thread
From: David Ahern @ 2013-11-13 14:29 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, linux-kernel, jolsa,
	Frederic Weisbecker, Namhyung Kim, Mike Galbraith,
	Stephane Eranian

On 11/13/13, 4:50 AM, Ingo Molnar wrote:
>>>> one option here is not allow page faults and system wide system
>>>> calls. system wide tracing needs mmap; page faults for a task can
>>>> use write(). I left that option in case something like this came up.
>>>
>>> So maybe splice() sounds like the right long term solution after all?
>>> :-/
>>
>> Right until you put a tracepoint (kprobe) somewhere in whatever function
>> is used to transfer a single page into/from a splice pipe.
>
> That ought to be a far less common occurance than tracing page faults
> though.
>
>> You can always screw yourself over using this stuff, no exceptions.
>
> Granted, as the many notrace markings demonstrate this stuff really wants
> to observe itself observing itself all the time! :)

perf can have options for output -- write(), mmap(), splice() -- and 
some logic for when not to use each:

1. don't use write for system wide tracing

2. don't use mmap for page fault tracing

3. don't use splice for .... (need a splice implementation)

and allow the user to override.

David

* Re: [PATCH 5/5] perf record: Handle out of space failures writing data with mmap
  2013-11-12 21:19   ` Ingo Molnar
@ 2013-11-13 14:33     ` David Ahern
  0 siblings, 0 replies; 42+ messages in thread
From: David Ahern @ 2013-11-13 14:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: acme, linux-kernel, jolsa, Frederic Weisbecker, Peter Zijlstra,
	Namhyung Kim, Mike Galbraith, Stephane Eranian

On 11/12/13, 2:19 PM, Ingo Molnar wrote:

> So this isn't very robust, because it assumes that all sources of SIGBUS
> are due to that memcpy() hitting -ENOSPC...
>
> There are several failure modes:
>
>   - If mmap_jmp is not set yet and we get a SIGBUS is some other place,
>     then the longjmp() result will be undefined.
>
>   - If mmap_jmp environment is set, but we've returned from
>     do_mmap_output() already, then the result will be undefined - likely a
>     non-obvious crash.
>
> So at minimum we need a flag that tells us whether the jump environment is
> valid or not - i.e. whether we are executing inside the protected region
> or not - and only do the longjmp() if that flag is set.

Right I meant to add that -- a flag to know when the jmp should be used. 
Got distracted.

>
> Is there really no other way to handle the -ENOSPC case robustly? I guess
> not because the memcpy() really needs memory to write to, but I thought
> I'd ask ...

You need some means to interrupt memcpy and bounce out of it. longjmp is 
the only option I know of.

David
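
For reference, a minimal sketch of the flag-guarded variant discussed
above (illustrative only; the identifiers are hypothetical and not taken
from the posted patch):

    #include <setjmp.h>
    #include <signal.h>

    static sigjmp_buf mmap_jmp;
    static volatile sig_atomic_t mmap_jmp_armed;

    static void sigbus_handler(int sig)
    {
            if (sig == SIGBUS && mmap_jmp_armed)
                    siglongjmp(mmap_jmp, 1);
            /* SIGBUS outside the protected copy: leave default handling */
    }

    /* around the copy into the mmap'ed output window: */
    if (sigsetjmp(mmap_jmp, 1) != 0) {
            mmap_jmp_armed = 0;
            return -1;              /* out of space mid-copy */
    }
    mmap_jmp_armed = 1;
    memcpy(dst, buf, len);
    mmap_jmp_armed = 0;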

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-13 11:34             ` Peter Zijlstra
  2013-11-13 11:50               ` Ingo Molnar
@ 2013-11-15 16:41               ` David Ahern
  2013-11-18  9:01                 ` Peter Zijlstra
  1 sibling, 1 reply; 42+ messages in thread
From: David Ahern @ 2013-11-15 16:41 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, jolsa,
	Frederic Weisbecker, Namhyung Kim, Mike Galbraith,
	Stephane Eranian

On 11/13/13, 4:34 AM, Peter Zijlstra wrote:
>>> one option here is not allow page faults and system wide system calls.
>>> system wide tracing needs mmap; page faults for a task can use write().
>>> I left that option in case something like this came up.
>>
>> So maybe splice() sounds like the right long term solution after all? :-/
>
> Right until you put a tracepoint (kprobe) somewhere in whatever function
> is used to transfer a single page into/from a splice pipe.
>
> You can always screw yourself over using this stuff, no exceptions.
>

What now? Can we add the mmap path as an option? Leave write by default 
and users can select mmap if they want? e.g., have out_size default to 0 
and 'perf trace record' can set --out-pages to 64M. It bypasses all 
system calls at the expense of more page faults.

David

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-15 16:41               ` David Ahern
@ 2013-11-18  9:01                 ` Peter Zijlstra
  2013-11-18  9:40                   ` Ingo Molnar
  0 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2013-11-18  9:01 UTC (permalink / raw)
  To: David Ahern
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel, jolsa,
	Frederic Weisbecker, Namhyung Kim, Mike Galbraith,
	Stephane Eranian

On Fri, Nov 15, 2013 at 09:41:53AM -0700, David Ahern wrote:
> On 11/13/13, 4:34 AM, Peter Zijlstra wrote:
> >>>one option here is not allow page faults and system wide system calls.
> >>>system wide tracing needs mmap; page faults for a task can use write().
> >>>I left that option in case something like this came up.
> >>
> >>So maybe splice() sounds like the right long term solution after all? :-/
> >
> >Right until you put a tracepoint (kprobe) somewhere in whatever function
> >is used to transfer a single page into/from a splice pipe.
> >
> >You can always screw yourself over using this stuff, no exceptions.
> >
> 
> What now? Can we add the mmap path as an option?

I'd say an option is always a possibility, but someone please try what
happens if you use stupid large events (dwarf stack copies) on
PERF_COUNT_SW_PAGE_FAULTS (.period=1) while recording with mmap().

The other option is to simply disallow PERF_SAMPLE_STACK_USER for that
event.

Personally I think 8k copies for every event are way stupid anyway,
that's a metric ton of data at a huge cost.

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-18  9:01                 ` Peter Zijlstra
@ 2013-11-18  9:40                   ` Ingo Molnar
  2013-11-19  0:24                     ` Namhyung Kim
  0 siblings, 1 reply; 42+ messages in thread
From: Ingo Molnar @ 2013-11-18  9:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Ahern, Arnaldo Carvalho de Melo, linux-kernel, jolsa,
	Frederic Weisbecker, Namhyung Kim, Mike Galbraith,
	Stephane Eranian


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Nov 15, 2013 at 09:41:53AM -0700, David Ahern wrote:
> > On 11/13/13, 4:34 AM, Peter Zijlstra wrote:
> > >>>one option here is not allow page faults and system wide system calls.
> > >>>system wide tracing needs mmap; page faults for a task can use write().
> > >>>I left that option in case something like this came up.
> > >>
> > >>So maybe splice() sounds like the right long term solution after all? :-/
> > >
> > >Right until you put a tracepoint (kprobe) somewhere in whatever function
> > >is used to transfer a single page into/from a splice pipe.
> > >
> > >You can always screw yourself over using this stuff, no exceptions.
> > >
> > 
> > What now? Can we add the mmap path as an option?
> 
> I'd say an option is always a possibility, but someone please try 
> what happens if you use stupid large events (dwarf stack copies) on 
> PERF_COUNT_SW_PAGE_FAULTS (.period=1) while recording with mmap().
> 
> The other option is to simply disallow PERF_SAMPLE_STACK_USER for 
> that event.
> 
> Personally I think 8k copies for every event are way stupid anyway, 
> that's a metric ton of data at a huge cost.

Well, with 1 khz sampling of a single threaded workload it's 8MB per 
second - that's 80 MB for 10 seconds profiling - not the end of the 
world.

Thanks,

	Ingo

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-18  9:40                   ` Ingo Molnar
@ 2013-11-19  0:24                     ` Namhyung Kim
  2013-11-19  0:34                       ` David Ahern
  2013-11-19  6:54                       ` Ingo Molnar
  0 siblings, 2 replies; 42+ messages in thread
From: Namhyung Kim @ 2013-11-19  0:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, David Ahern, Arnaldo Carvalho de Melo,
	linux-kernel, jolsa, Frederic Weisbecker, Mike Galbraith,
	Stephane Eranian

Hi Ingo,

On Mon, 18 Nov 2013 10:40:36 +0100, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
>
>> On Fri, Nov 15, 2013 at 09:41:53AM -0700, David Ahern wrote:
>> > On 11/13/13, 4:34 AM, Peter Zijlstra wrote:
>> > >>>one option here is not allow page faults and system wide system calls.
>> > >>>system wide tracing needs mmap; page faults for a task can use write().
>> > >>>I left that option in case something like this came up.
>> > >>
>> > >>So maybe splice() sounds like the right long term solution after all? :-/
>> > >
>> > >Right until you put a tracepoint (kprobe) somewhere in whatever function
>> > >is used to transfer a single page into/from a splice pipe.
>> > >
>> > >You can always screw yourself over using this stuff, no exceptions.
>> > >
>> > 
>> > What now? Can we add the mmap path as an option?
>> 
>> I'd say an option is always a possibility, but someone please try 
>> what happens if you use stupid large events (dwarf stack copies) on 
>> PERF_COUNT_SW_PAGE_FAULTS (.period=1) while recording with mmap().
>> 
>> The other option is to simply disallow PERF_SAMPLE_STACK_USER for 
>> that event.
>> 
>> Personally I think 8k copies for every event are way stupid anyway, 
>> that's a metric ton of data at a huge cost.
>
> Well, with 1 khz sampling of a single threaded workload it's 8MB per 
> second - that's 80 MB for 10 seconds profiling - not the end of the 
> world.

We now use 4 khz sampling frequency by default, just FYI. :)

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-19  0:24                     ` Namhyung Kim
@ 2013-11-19  0:34                       ` David Ahern
  2013-11-19  1:48                         ` Namhyung Kim
                                           ` (3 more replies)
  2013-11-19  6:54                       ` Ingo Molnar
  1 sibling, 4 replies; 42+ messages in thread
From: David Ahern @ 2013-11-19  0:34 UTC (permalink / raw)
  To: Namhyung Kim, Ingo Molnar
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, linux-kernel, jolsa,
	Frederic Weisbecker, Mike Galbraith, Stephane Eranian

On 11/18/13, 5:24 PM, Namhyung Kim wrote:
>>>> What now? Can we add the mmap path as an option?
>>>
>>> I'd say an option is always a possibility, but someone please try
>>> what happens if you use stupid large events (dwarf stack copies) on
>>> PERF_COUNT_SW_PAGE_FAULTS (.period=1) while recording with mmap().
>>>
>>> The other option is to simply disallow PERF_SAMPLE_STACK_USER for
>>> that event.
>>>
>>> Personally I think 8k copies for every event are way stupid anyway,
>>> that's a metric ton of data at a huge cost.
>>
>> Well, with 1 khz sampling of a single threaded workload it's 8MB per
>> second - that's 80 MB for 10 seconds profiling - not the end of the
>> world.
>
> We now use 4 khz sampling frequency by default, just FYI. :)

I think Peter is asking about:
     perf record -e faults -c 1 --call-graph dwarf,8192 -a -- sleep 1

And as expected it is a massive feedback spiraling out of control.

David

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-19  0:34                       ` David Ahern
@ 2013-11-19  1:48                         ` Namhyung Kim
  2013-11-19  2:02                         ` Namhyung Kim
                                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 42+ messages in thread
From: Namhyung Kim @ 2013-11-19  1:48 UTC (permalink / raw)
  To: David Ahern
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	linux-kernel, Jiri Olsa, Frederic Weisbecker, Mike Galbraith,
	Stephane Eranian

Hi David,

On Tue, Nov 19, 2013 at 12:34 AM, David Ahern <dsahern@gmail.com> wrote:
> On 11/18/13, 5:24 PM, Namhyung Kim wrote:
>>>>>
>>>>> What now? Can we add the mmap path as an option?
>>>>
>>>>
>>>> I'd say an option is always a possibility, but someone please try
>>>> what happens if you use stupid large events (dwarf stack copies) on
>>>> PERF_COUNT_SW_PAGE_FAULTS (.period=1) while recording with mmap().
>>>>
>>>> The other option is to simply disallow PERF_SAMPLE_STACK_USER for
>>>> that event.
>>>>
>>>> Personally I think 8k copies for every event are way stupid anyway,
>>>> that's a metric ton of data at a huge cost.
>>>
>>>
>>> Well, with 1 khz sampling of a single threaded workload it's 8MB per
>>> second - that's 80 MB for 10 seconds profiling - not the end of the
>>> world.
>>
>>
>> We now use 4 khz sampling frequency by default, just FYI. :)
>
>
> I think Peter is asking about:
>     perf record -e faults -c 1 --call-graph dwarf,8192 -a -- sleep 1
>
> And as expected it is a massive feedback spiraling out of control.

Ah, I missed that part - I just blindly answered about the freq,
thinking he was talking about the default freq of perf record/top.

Anyway, for the above case, I guess it won't affect much, as the stack
is usually already in memory, so no page fault will occur even when
recording with mmap unless the system is under high memory pressure,
right?

But I agree that copying 8KB for each sample seems too large.


-- 
Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-19  0:34                       ` David Ahern
  2013-11-19  1:48                         ` Namhyung Kim
@ 2013-11-19  2:02                         ` Namhyung Kim
  2013-11-19  2:13                         ` Namhyung Kim
  2013-11-19 12:08                         ` Peter Zijlstra
  3 siblings, 0 replies; 42+ messages in thread
From: Namhyung Kim @ 2013-11-19  2:02 UTC (permalink / raw)
  To: David Ahern
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	linux-kernel, jolsa, Frederic Weisbecker, Mike Galbraith,
	Stephane Eranian

On Mon, 18 Nov 2013 17:34:49 -0700, David Ahern wrote:
> On 11/18/13, 5:24 PM, Namhyung Kim wrote:
>>>>> What now? Can we add the mmap path as an option?
>>>>
>>>> I'd say an option is always a possibility, but someone please try
>>>> what happens if you use stupid large events (dwarf stack copies) on
>>>> PERF_COUNT_SW_PAGE_FAULTS (.period=1) while recording with mmap().
>>>>
>>>> The other option is to simply disallow PERF_SAMPLE_STACK_USER for
>>>> that event.
>>>>
>>>> Personally I think 8k copies for every event are way stupid anyway,
>>>> that's a metric ton of data at a huge cost.
>>>
>>> Well, with 1 khz sampling of a single threaded workload it's 8MB per
>>> second - that's 80 MB for 10 seconds profiling - not the end of the
>>> world.
>>
>> We now use 4 khz sampling frequency by default, just FYI. :)
>
> I think Peter is asking about:
>     perf record -e faults -c 1 --call-graph dwarf,8192 -a -- sleep 1
>
> And as expected it is a massive feedback spiraling out of control.

How about adding an option to exclude the perf tool itself from recording
in a system-wide (or cpu-wide) session?

This way, we could prevent the feedback loops for the page-fault or
syscall events you mentioned, IMHO.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-19  0:34                       ` David Ahern
  2013-11-19  1:48                         ` Namhyung Kim
  2013-11-19  2:02                         ` Namhyung Kim
@ 2013-11-19  2:13                         ` Namhyung Kim
  2013-11-19  2:17                           ` David Ahern
  2013-11-19 12:08                         ` Peter Zijlstra
  3 siblings, 1 reply; 42+ messages in thread
From: Namhyung Kim @ 2013-11-19  2:13 UTC (permalink / raw)
  To: David Ahern
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	linux-kernel, jolsa, Frederic Weisbecker, Mike Galbraith,
	Stephane Eranian

On Mon, 18 Nov 2013 17:34:49 -0700, David Ahern wrote:
> On 11/18/13, 5:24 PM, Namhyung Kim wrote:
>>>>> What now? Can we add the mmap path as an option?
>>>>
>>>> I'd say an option is always a possibility, but someone please try
>>>> what happens if you use stupid large events (dwarf stack copies) on
>>>> PERF_COUNT_SW_PAGE_FAULTS (.period=1) while recording with mmap().
>>>>
>>>> The other option is to simply disallow PERF_SAMPLE_STACK_USER for
>>>> that event.
>>>>
>>>> Personally I think 8k copies for every event are way stupid anyway,
>>>> that's a metric ton of data at a huge cost.
>>>
>>> Well, with 1 khz sampling of a single threaded workload it's 8MB per
>>> second - that's 80 MB for 10 seconds profiling - not the end of the
>>> world.
>>
>> We now use 4 khz sampling frequency by default, just FYI. :)
>
> I think Peter is asking about:
>     perf record -e faults -c 1 --call-graph dwarf,8192 -a -- sleep 1

I think it should be

  perf record -e cycles -F 4000 -e faults -c 1 --call-graph dwarf,8192 -a -- sleep 1

(at least to generate the feedback spiral more efficiently..)

Well, I know that we don't support this now.  But wouldn't it make sense
to support this kind of thing?

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-19  2:13                         ` Namhyung Kim
@ 2013-11-19  2:17                           ` David Ahern
  2013-11-19  2:30                             ` Namhyung Kim
  0 siblings, 1 reply; 42+ messages in thread
From: David Ahern @ 2013-11-19  2:17 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	linux-kernel, jolsa, Frederic Weisbecker, Mike Galbraith,
	Stephane Eranian

On 11/18/13, 7:13 PM, Namhyung Kim wrote:
> I think it should be
>
>    perf record -e cycles -F 4000 -e faults -c 1 --call-graph dwarf,8192 -a -- sleep 1
>
> (at least to generate the feedback spiral more efficiently..)

you don't need the cycles. faults by itself works. Each event contains more
than 2 pages of data in the sample. With mmap-based output a single sample
(1 page fault in any process) generates 2-3 page faults by perf, which cause
2-3 >8k samples to be generated, which generate more faults, ....
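
(Roughly, assuming 4 KB pages and the >8 KB dwarf samples above, the
amplification looks like:

    1 fault anywhere -> 1 sample spanning >2 output pages
                     -> 2-3 new output-file pages touched by perf
                     -> 2-3 faults in perf itself
                     -> 2-3 more >8k samples -> ... )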

David



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-19  2:17                           ` David Ahern
@ 2013-11-19  2:30                             ` Namhyung Kim
  2013-11-19  2:33                               ` David Ahern
  0 siblings, 1 reply; 42+ messages in thread
From: Namhyung Kim @ 2013-11-19  2:30 UTC (permalink / raw)
  To: David Ahern
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	linux-kernel, jolsa, Frederic Weisbecker, Mike Galbraith,
	Stephane Eranian

On Mon, 18 Nov 2013 19:17:37 -0700, David Ahern wrote:
> On 11/18/13, 7:13 PM, Namhyung Kim wrote:
>> I think it should be
>>
>>    perf record -e cycles -F 4000 -e faults -c 1 --call-graph dwarf,8192 -a -- sleep 1
>>
>> (at least to generate the feedback spiral more efficiently..)
>
> you don't need the cycles. faults by itself works. Each event contains
> 2 pages of data in the sample. With mmap-based output a single
> sample (1 page fault in any process) generates 2-3 page faults by perf
> which cause 2-3 >8k samples to be generated, which generates faults,
> ....

But after perf touches all pages in the ring-buffer and its stack, it
won't generate page-faults for itself anymore, right?

Hmm.. thinking about it again, perf has all ring-buffer pages in memory
when mmap() is called, right?  If so, why not do something like
MAP_POPULATE so that it doesn't need to generate minor-faults?

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-19  2:30                             ` Namhyung Kim
@ 2013-11-19  2:33                               ` David Ahern
  2013-11-19  2:36                                 ` Namhyung Kim
  2013-11-19  6:58                                 ` Ingo Molnar
  0 siblings, 2 replies; 42+ messages in thread
From: David Ahern @ 2013-11-19  2:33 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	linux-kernel, jolsa, Frederic Weisbecker, Mike Galbraith,
	Stephane Eranian

On 11/18/13, 7:30 PM, Namhyung Kim wrote:
> On Mon, 18 Nov 2013 19:17:37 -0700, David Ahern wrote:
>> On 11/18/13, 7:13 PM, Namhyung Kim wrote:
>>> I think it should be
>>>
>>>     perf record -e cycles -F 4000 -e faults -c 1 --call-graph dwarf,8192 -a -- sleep 1
>>>
>>> (at least to generate the feedback spiral more efficiently..)
>>
>> you don't need the cycles. faults by itself works. Each event contains
>> 2 pages of data in the sample. With mmap-based output a single
>> sample (1 page fault in any process) generates 2-3 page faults by perf
>> which cause 2-3 >8k samples to be generated, which generates faults,
>> ....
>
> But after perf touches all pages in ring-buffer and stack, it won't
> generate page-faults for itself anymore, right?
>
> Hmm.. thinking it again, perf has all ring-buffer pages in memory when
> mmap() called, right?  If so why not doing something like MAP_POPULATE
> so that it doesn't need to generate minor-faults?

This is mmap'ed output, not the ring buffers or its stack. As the output 
file grows, new pages are needed and those are allocated on access via 
page faults. The ftruncate only extends the file size, it does not 
allocate pages at that time.
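
A minimal sketch of that pattern (not the actual patch; the function
name, the parameter names and the error handling here are made up):

    #include <sys/types.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /*
     * Grow the output file and map the next output window.  ftruncate()
     * only extends the file size; the pages behind the new window are
     * allocated lazily, when the caller first stores into them while
     * copying events out of the ring buffer -- i.e. via page faults.
     * 'offset' is assumed to be page aligned.
     */
    static void *map_next_window(int fd, off_t offset, size_t win_sz)
    {
            if (ftruncate(fd, offset + win_sz) < 0)
                    return MAP_FAILED;      /* e.g. out of space */

            return mmap(NULL, win_sz, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, offset);
    }

When a window fills up, the writer would munmap() it and map the next one
at the new offset; presumably the file is ftruncate()d back down to the
bytes actually written at exit.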

David

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-19  2:33                               ` David Ahern
@ 2013-11-19  2:36                                 ` Namhyung Kim
  2013-11-19  6:58                                 ` Ingo Molnar
  1 sibling, 0 replies; 42+ messages in thread
From: Namhyung Kim @ 2013-11-19  2:36 UTC (permalink / raw)
  To: David Ahern
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	linux-kernel, Jiri Olsa, Frederic Weisbecker, Mike Galbraith,
	Stephane Eranian

On Tue, Nov 19, 2013 at 2:33 AM, David Ahern <dsahern@gmail.com> wrote:
> On 11/18/13, 7:30 PM, Namhyung Kim wrote:
>>
>> On Mon, 18 Nov 2013 19:17:37 -0700, David Ahern wrote:
>>>
>>> On 11/18/13, 7:13 PM, Namhyung Kim wrote:
>>>>
>>>> I think it should be
>>>>
>>>>     perf record -e cycles -F 4000 -e faults -c 1 --call-graph dwarf,8192
>>>> -a -- sleep 1
>>>>
>>>> (at least to generate the feedback spiral more efficiently..)
>>>
>>>
>>> you don't need the cycles. faults by itself works. Each event contains
>>> 2 pages of data in the sample. With mmap-based output a single
>>> sample (1 page fault in any process) generates 2-3 page faults by perf
>>> which cause 2-3 >8k samples to be generated, which generates faults,
>>> ....
>>
>>
>> But after perf touches all pages in ring-buffer and stack, it won't
>> generate page-faults for itself anymore, right?
>>
>> Hmm.. thinking it again, perf has all ring-buffer pages in memory when
>> mmap() called, right?  If so why not doing something like MAP_POPULATE
>> so that it doesn't need to generate minor-faults?
>
>
> This is mmap'ed output, not the ring buffers or its stack. As the output
> file grows, new pages are needed and those are allocated on access via page
> faults. The ftruncate only extends the file size, it does not allocate pages
> at that time.

Argh, I was completely confused! ;-)

Please feel free to ignore what I said here..

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-19  0:24                     ` Namhyung Kim
  2013-11-19  0:34                       ` David Ahern
@ 2013-11-19  6:54                       ` Ingo Molnar
  1 sibling, 0 replies; 42+ messages in thread
From: Ingo Molnar @ 2013-11-19  6:54 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, David Ahern, Arnaldo Carvalho de Melo,
	linux-kernel, jolsa, Frederic Weisbecker, Mike Galbraith,
	Stephane Eranian


* Namhyung Kim <namhyung@kernel.org> wrote:

> > Well, with 1 khz sampling of a single threaded workload it's 8MB 
> > per second - that's 80 MB for 10 seconds profiling - not the end 
> > of the world.
> 
> We now use 4 khz sampling frequency by default, just FYI. :)

Yes, but if someone samples at 1 kHz that's still plenty for a 
reasonably long run.

With 4 kHz sampling you could do 2.5 seconds of profiling and still 
have 80 MB of data :-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-19  2:33                               ` David Ahern
  2013-11-19  2:36                                 ` Namhyung Kim
@ 2013-11-19  6:58                                 ` Ingo Molnar
  2013-11-19 11:48                                   ` Peter Zijlstra
  1 sibling, 1 reply; 42+ messages in thread
From: Ingo Molnar @ 2013-11-19  6:58 UTC (permalink / raw)
  To: David Ahern
  Cc: Namhyung Kim, Peter Zijlstra, Arnaldo Carvalho de Melo,
	linux-kernel, jolsa, Frederic Weisbecker, Mike Galbraith,
	Stephane Eranian


* David Ahern <dsahern@gmail.com> wrote:

> This is mmap'ed output, not the ring buffers or its stack. As the 
> output file grows, new pages are needed and those are allocated on 
> access via page faults. The ftruncate only extends the file size, it 
> does not allocate pages at that time.

Hm, doesn't MAP_POPULATE prefault pages in this case as well? 
Prefaulting would avoid the most obvious page-fault-driven feedback 
loops, and it would probably be faster too, because it avoids all 
the pagefaults ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-19  6:58                                 ` Ingo Molnar
@ 2013-11-19 11:48                                   ` Peter Zijlstra
  2013-11-19 11:49                                     ` Peter Zijlstra
  0 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2013-11-19 11:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Ahern, Namhyung Kim, Arnaldo Carvalho de Melo,
	linux-kernel, jolsa, Frederic Weisbecker, Mike Galbraith,
	Stephane Eranian

On Tue, Nov 19, 2013 at 07:58:35AM +0100, Ingo Molnar wrote:
> 
> * David Ahern <dsahern@gmail.com> wrote:
> 
> > This is mmap'ed output, not the ring buffers or its stack. As the 
> > output file grows, new pages are needed and those are allocated on 
> > access via page faults. The ftruncate only extends the file size, it 
> > does not allocate pages at that time.
> 
> Hm, doesn't MAP_POPULATE prefault pages in this case as well? 
> Prefaulting would avoid the most obvious page fault driven feedback 
> loops and it would probably be faster as well, because it avoids all 
> the pagefaults ...

MAP_POPULATE does indeed seem to pre-fault for write on writable maps
(mm/mlock.c:__mlock_vma_pages_range).

And that does indeed seem to side-step the perf sw pagefault event, but
that is arguably a perf bug.

If we were to fix this, MAP_POPULATE would simply generate a metric ton
of events, and we'd have to write those out with yet more MAP_POPULATE.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-19 11:48                                   ` Peter Zijlstra
@ 2013-11-19 11:49                                     ` Peter Zijlstra
  2013-11-19 13:13                                       ` Ingo Molnar
  0 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2013-11-19 11:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Ahern, Namhyung Kim, Arnaldo Carvalho de Melo,
	linux-kernel, jolsa, Frederic Weisbecker, Mike Galbraith,
	Stephane Eranian

On Tue, Nov 19, 2013 at 12:48:10PM +0100, Peter Zijlstra wrote:
> And that does indeed seem to side-step the perf sw pagefault event, but
> that is arguably a perf bug.

To clarify; mm/memory.c:handle_mm_fault() is where the VM counts its
generic PGFAULT event, but our perf sw event is in the arch fault
handler.

So they count different but related things.
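
(Roughly, from the 3.12-era sources -- quoted from memory, so treat the
exact lines as approximate:

    /* arch/x86/mm/fault.c: the perf software event */
    perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address);

    /* mm/memory.c:handle_mm_fault(): the generic VM counter */
    count_vm_event(PGFAULT);

so faults taken via get_user_pages()/mlock() paths bump the latter but
never reach the former.)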

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-19  0:34                       ` David Ahern
                                           ` (2 preceding siblings ...)
  2013-11-19  2:13                         ` Namhyung Kim
@ 2013-11-19 12:08                         ` Peter Zijlstra
  3 siblings, 0 replies; 42+ messages in thread
From: Peter Zijlstra @ 2013-11-19 12:08 UTC (permalink / raw)
  To: David Ahern
  Cc: Namhyung Kim, Ingo Molnar, Arnaldo Carvalho de Melo,
	linux-kernel, jolsa, Frederic Weisbecker, Mike Galbraith,
	Stephane Eranian

On Mon, Nov 18, 2013 at 05:34:49PM -0700, David Ahern wrote:
> I think Peter is asking about:
>     perf record -e faults -c 1 --call-graph dwarf,8192 -a -- sleep 1
> 
> And as expected it is a massive feedback spiraling out of control.

I was and good to know, thanks for testing.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-19 11:49                                     ` Peter Zijlstra
@ 2013-11-19 13:13                                       ` Ingo Molnar
  2013-11-19 13:45                                         ` Peter Zijlstra
  0 siblings, 1 reply; 42+ messages in thread
From: Ingo Molnar @ 2013-11-19 13:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Ahern, Namhyung Kim, Arnaldo Carvalho de Melo,
	linux-kernel, jolsa, Frederic Weisbecker, Mike Galbraith,
	Stephane Eranian


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Nov 19, 2013 at 12:48:10PM +0100, Peter Zijlstra wrote:
> > And that does indeed seem to side-step the perf sw pagefault event, but
> > that is arguably a perf bug.
> 
> To clarify; mm/memory.c:handle_mm_fault() is where the VM counts its 
> generic PGFAULT event, but our perf sw event is in the arch fault 
> handler.
> 
> So they count different but related things.

I think that asymmetry was intended: we didn't want to count 
'synchronous' pagefaults like get_user_pages() or mlock() bringing in 
pages, only asynchronous/real ones, or so.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-19 13:13                                       ` Ingo Molnar
@ 2013-11-19 13:45                                         ` Peter Zijlstra
  2013-11-19 15:31                                           ` Ingo Molnar
  0 siblings, 1 reply; 42+ messages in thread
From: Peter Zijlstra @ 2013-11-19 13:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Ahern, Namhyung Kim, Arnaldo Carvalho de Melo,
	linux-kernel, jolsa, Frederic Weisbecker, Mike Galbraith,
	Stephane Eranian

On Tue, Nov 19, 2013 at 02:13:04PM +0100, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Tue, Nov 19, 2013 at 12:48:10PM +0100, Peter Zijlstra wrote:
> > > And that does indeed seem to side-step the perf sw pagefault event, but
> > > that is arguably a perf bug.
> > 
> > To clarify; mm/memory.c:handle_mm_fault() is where the VM counts its 
> > generic PGFAULT event, but our perf sw event is in the arch fault 
> > handler.
> > 
> > So they count different but related things.
> 
> I think that assymetry was intended: we didn't want to count 
> 'synchronous' pagefaults like get_user_pages() or mlock() bringing in 
> pages, only asynchronous/real ones, or so.

OK, I couldn't remember.

Anyway, I don't want to hold up this patch set, and the speedup in the
'normal' case is nice.

The only reason I reacted was because the changelog mentioned avoiding a
feedback loop -- so I obviously had to point out that it didn't do such
a thing, it only changed the details of the loop.

I'm fairly certain this particular problem is unavoidable, no matter
what the mechanism used, you can always create feedback.

At some point people will just have to know wtf they're doing. Also, I
don't think anything really bad happens, at worst we'll just drop a
bunch of events because the output cannot keep up for obvious reasons.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-19 13:45                                         ` Peter Zijlstra
@ 2013-11-19 15:31                                           ` Ingo Molnar
  2013-11-19 16:09                                             ` David Ahern
  0 siblings, 1 reply; 42+ messages in thread
From: Ingo Molnar @ 2013-11-19 15:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Ahern, Namhyung Kim, Arnaldo Carvalho de Melo,
	linux-kernel, jolsa, Frederic Weisbecker, Mike Galbraith,
	Stephane Eranian


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Nov 19, 2013 at 02:13:04PM +0100, Ingo Molnar wrote:
> > 
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > On Tue, Nov 19, 2013 at 12:48:10PM +0100, Peter Zijlstra wrote:
> > > > And that does indeed seem to side-step the perf sw pagefault event, but
> > > > that is arguably a perf bug.
> > > 
> > > To clarify; mm/memory.c:handle_mm_fault() is where the VM counts its 
> > > generic PGFAULT event, but our perf sw event is in the arch fault 
> > > handler.
> > > 
> > > So they count different but related things.
> > 
> > I think that assymetry was intended: we didn't want to count 
> > 'synchronous' pagefaults like get_user_pages() or mlock() bringing 
> > in pages, only asynchronous/real ones, or so.
> 
> OK, I couldn't remember.
> 
> Anyway, I don't want to hold up this patch set, and the speedup in 
> the 'normal' case is nice.
> 
> The only reason I reacted was because the changelog mentioned 
> avoiding a feedback loop -- so I obviously had to point out that it 
> didn't do such a thing, it only changed the details of the loop.

So with MAP_POPULATE the 'feedback window' is moved entirely into the 
kernel (to within a single syscall) and is also reduced significantly, 
compared to a write() loop.

While you are right that it is not an elimination of the problem, it 
is still a significant reduction of its cross-section in practice.

> I'm fairly certain this particular problem is unavoidable, no matter 
> what the mechanism used, you can always create feedback.

Well, we could exclude the profiling task itself from profiling events 
(just like ftrace and core bits of perf does it out of necessity), but 
I intentionally wanted to avoid that, to make sure we are honest and 
to make sure people don't tolerate profiling overhead that disturbs 
other workloads.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-19 15:31                                           ` Ingo Molnar
@ 2013-11-19 16:09                                             ` David Ahern
  2013-11-19 16:14                                               ` Ingo Molnar
  0 siblings, 1 reply; 42+ messages in thread
From: David Ahern @ 2013-11-19 16:09 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: Namhyung Kim, Arnaldo Carvalho de Melo, linux-kernel, jolsa,
	Frederic Weisbecker, Mike Galbraith, Stephane Eranian

On 11/19/13, 8:31 AM, Ingo Molnar wrote:
>> The only reason I reacted was because the changelog mentioned
>> avoiding a feedback loop -- so I obviously had to point out that it
>> didn't do such a thing, it only changed the details of the loop.
>
> So with MAP_POPULATE the 'feedback window' is moved entirely into the
> kernel (to within a single syscall) and is also reduced significantly,
> compared to a write() loop.

As I understand it we have to use MAP_SHARED, not MAP_PRIVATE for files. 
So MAP_POPULATE does not work here. (And I tried to verify -- 
MAP_PRIVATE | MAP_POPULATE drops the feedback loop, but the file is 0's 
after the header).
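
(To make the flags explicit -- hypothetical snippets, just illustrating
the two variants tried above:

    /* writes reach the file, but new pages still fault in on first touch */
    buf = mmap(NULL, win_sz, PROT_READ | PROT_WRITE,
               MAP_SHARED, fd, offset);

    /* prefaulted, so no feedback loop -- but copy-on-write means the
     * stores land in private anonymous pages and never reach the file */
    buf = mmap(NULL, win_sz, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_POPULATE, fd, offset);
)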

>> I'm fairly certain this particular problem is unavoidable, no matter
>> what the mechanism used, you can always create feedback.
>
> Well, we could exclude the profiling task itself from profiling events
> (just like ftrace and core bits of perf does it out of necessity), but
> I intentionally wanted to avoid that, to make sure we are honest and
> to make sure people don't tolerate profiling overhead that disturbs
> other workloads.

Samples generated by perf itself need to be observable -- e.g. for 
process scheduling I want to see the time consumed by the data collector 
itself, and there are times when 'perf trace -- perf ...' is useful.

perf just needs options to do the right thing and stay out of its own 
way. Having a restriction that you can't do system-wide collection of 
system calls AND faults does not seem all that limiting.

David


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/5] perf record: mmap output file - v5
  2013-11-19 16:09                                             ` David Ahern
@ 2013-11-19 16:14                                               ` Ingo Molnar
  0 siblings, 0 replies; 42+ messages in thread
From: Ingo Molnar @ 2013-11-19 16:14 UTC (permalink / raw)
  To: David Ahern
  Cc: Peter Zijlstra, Namhyung Kim, Arnaldo Carvalho de Melo,
	linux-kernel, jolsa, Frederic Weisbecker, Mike Galbraith,
	Stephane Eranian


* David Ahern <dsahern@gmail.com> wrote:

> > Well, we could exclude the profiling task itself from profiling 
> > events (just like ftrace and core bits of perf does it out of 
> > necessity), but I intentionally wanted to avoid that, to make sure 
> > we are honest and to make sure people don't tolerate profiling 
> > overhead that disturbs other workloads.
> 
> Samples generated by perf itself need to be observable -- e.g. 
> process scheduling I want to see the time consumed by the data 
> collector itself and there are times when 'perf trace -- perf ...' 
> is useful.

Absolutely agreed - a measurement instrument affects the measurement, 
and we must not try to hide that.

Still, we can try to make the disturbance smaller and more manageable.

For example, if I have enough RAM it should be possible to run perf 
record with a 1 GB ring-buffer, and in that case, as long as the 
perf.data is smaller than 1 GB, there should be no writeout or any 
other IO activity until the measurement ends.
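
(E.g. something like the line below -- assuming -m accepts a size suffix
as in the mmap-pages parsing reworked earlier in this series, and that
the locked-memory limits allow a buffer that large:

    perf record -m 1G -e cycles -a -- sleep 10 )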

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2013-11-19 16:14 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-12 14:46 [PATCH 0/5] perf record: mmap output file - v5 David Ahern
2013-11-12 14:46 ` [PATCH 1/5] perf record: Fix segfault with --no-mmap-pages David Ahern
2013-11-12 21:57   ` [tip:perf/urgent] " tip-bot for David Ahern
2013-11-12 14:46 ` [PATCH 2/5] perf tool: Round mmap pages to power 2 - v2 David Ahern
2013-11-12 21:57   ` [tip:perf/urgent] perf evlist: " tip-bot for David Ahern
2013-11-12 14:46 ` [PATCH 3/5] perf tool: Refactor mmap_pages parsing David Ahern
2013-11-12 21:57   ` [tip:perf/urgent] perf evlist: " tip-bot for David Ahern
2013-11-12 14:46 ` [PATCH 4/5] perf record: mmap output file - v5 David Ahern
2013-11-12 14:57   ` Peter Zijlstra
2013-11-12 15:07     ` Arnaldo Carvalho de Melo
2013-11-12 15:19       ` Peter Zijlstra
2013-11-12 15:36         ` David Ahern
2013-11-12 21:11           ` Ingo Molnar
2013-11-13 11:34             ` Peter Zijlstra
2013-11-13 11:50               ` Ingo Molnar
2013-11-13 12:16                 ` Peter Zijlstra
2013-11-13 14:29                 ` David Ahern
2013-11-15 16:41               ` David Ahern
2013-11-18  9:01                 ` Peter Zijlstra
2013-11-18  9:40                   ` Ingo Molnar
2013-11-19  0:24                     ` Namhyung Kim
2013-11-19  0:34                       ` David Ahern
2013-11-19  1:48                         ` Namhyung Kim
2013-11-19  2:02                         ` Namhyung Kim
2013-11-19  2:13                         ` Namhyung Kim
2013-11-19  2:17                           ` David Ahern
2013-11-19  2:30                             ` Namhyung Kim
2013-11-19  2:33                               ` David Ahern
2013-11-19  2:36                                 ` Namhyung Kim
2013-11-19  6:58                                 ` Ingo Molnar
2013-11-19 11:48                                   ` Peter Zijlstra
2013-11-19 11:49                                     ` Peter Zijlstra
2013-11-19 13:13                                       ` Ingo Molnar
2013-11-19 13:45                                         ` Peter Zijlstra
2013-11-19 15:31                                           ` Ingo Molnar
2013-11-19 16:09                                             ` David Ahern
2013-11-19 16:14                                               ` Ingo Molnar
2013-11-19 12:08                         ` Peter Zijlstra
2013-11-19  6:54                       ` Ingo Molnar
2013-11-12 14:46 ` [PATCH 5/5] perf record: Handle out of space failures writing data with mmap David Ahern
2013-11-12 21:19   ` Ingo Molnar
2013-11-13 14:33     ` David Ahern
