linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: tip-bot for Wang Nan <tipbot@zytor.com>
To: linux-tip-commits@vger.kernel.org
Cc: jolsa@kernel.org, namhyung@kernel.org, mhiramat@kernel.org,
	nilayvaish@gmail.com, linux-kernel@vger.kernel.org,
	mingo@kernel.org, lizefan@huawei.com, wangnan0@huawei.com,
	hekuang@huawei.com, acme@redhat.com, tglx@linutronix.de,
	hpa@zytor.com
Subject: [tip:perf/core] perf record: Add --tail-synthesize option
Date: Sat, 16 Jul 2016 13:51:43 -0700	[thread overview]
Message-ID: <tip-4ea648aec01982d5a57816a95c4665d6081e78f9@git.kernel.org> (raw)
In-Reply-To: <1468485287-33422-16-git-send-email-wangnan0@huawei.com>

Commit-ID:  4ea648aec01982d5a57816a95c4665d6081e78f9
Gitweb:     http://git.kernel.org/tip/4ea648aec01982d5a57816a95c4665d6081e78f9
Author:     Wang Nan <wangnan0@huawei.com>
AuthorDate: Thu, 14 Jul 2016 08:34:47 +0000
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 15 Jul 2016 17:27:52 -0300

perf record: Add --tail-synthesize option

When working with overwritable ring buffer there's a inconvenience
problem: if perf dumps data after a long period after it starts,
non-sample events may lost, which makes following 'perf report' unable
to identify proc name and mmap layout. For example:

 # perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output \
        dd if=/dev/zero of=/dev/null

send SIGUSR2 after dd runs long enough. The resuling perf.data lost
correct comm and mmap events:

 # perf script -i perf.data.2016061522374354
 perf 24478 [004] 2581325.601789:  raw_syscalls:sys_exit: NR 0 = 512
 ^^^^
 Should be 'dd'
                   27b2e8 syscall_slow_exit_work+0xfe2000e3 (/lib/modules/4.6.0-rc3+/build/vmlinux)
                   203cc7 do_syscall_64+0xfe200117 (/lib/modules/4.6.0-rc3+/build/vmlinux)
                   b18d83 return_from_SYSCALL_64+0xfe200000 (/lib/modules/4.6.0-rc3+/build/vmlinux)
             7f47c417edf0 [unknown] ([unknown])
             ^^^^^^^^^^^^
             Fail to unwind

This patch provides a '--tail-synthesize' option, allows perf to collect
system status when finalizing output file. In resuling output file, the
non-sample events reflect system status when dumping data.

After this patch:
 # perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output --tail-synthesize \
        dd if=/dev/zero of=/dev/null

 # perf script -i perf.data.2016061600544998
 dd 27364 [004] 2583244.994464: raw_syscalls:sys_enter: NR 1 (1, ...
 ^^
 Correct comm
                   203a18 syscall_trace_enter_phase2+0xfe2001a8 ([kernel.kallsyms])
                   203aa5 syscall_trace_enter+0xfe200055 ([kernel.kallsyms])
                   203caa do_syscall_64+0xfe2000fa ([kernel.kallsyms])
                   b18d83 return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms])
                    d8e50 __GI___libc_write+0xffff01d9639f4010 (/tmp/oxygen_root-w00229757/lib64/libc-2.18.so)
                    ^^^^^
                    Correct unwind

This option doesn't aim to solve this problem completely. If a process
terminates before SIGUSR2, we still lost its COMM and MMAP events. For
example, we can't unwind correctly from the final perf.data we get from
the previous example, because when perf collects the final output file
(when we press C-c), 'dd' has been terminated so its '/proc/<pid>/mmap'
becomes empty.

However, this is a cheaper choice. To completely solve this problem we
need to continously output non-sample events. To satisify the
requirement of daemonization, we need to merge them periodically. It is
possible but requires much more code and cycles.

Automatically select --tail-synthesize when --overwrite is provided.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nilay Vaish <nilayvaish@gmail.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1468485287-33422-16-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-record.txt |  8 ++++++++
 tools/perf/builtin-record.c              | 31 +++++++++++++++++++++++++------
 tools/perf/perf.h                        |  1 +
 3 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 384c630..69966ab 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -367,6 +367,12 @@ options.
 'perf record --dry-run -e' can act as a BPF script compiler if llvm.dump-obj
 in config file is set to true.
 
+--tail-synthesize::
+Instead of collecting non-sample events (for example, fork, comm, mmap) at
+the beginning of record, collect them during finalizing an output file.
+The collected non-sample events reflects the status of the system when
+record is finished.
+
 --overwrite::
 Makes all events use an overwritable ring buffer. An overwritable ring
 buffer works like a flight recorder: when it gets full, the kernel will
@@ -381,6 +387,8 @@ those fitting in the ring buffer at that moment.
 'overwrite' attribute can also be set or canceled for an event using
 config terms. For example: 'cycles/overwrite/' and 'instructions/no-overwrite/'.
 
+Implies --tail-synthesize.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 39c7486..8f2c16d 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -604,13 +604,16 @@ record__finish_output(struct record *rec)
 	return;
 }
 
-static int record__synthesize_workload(struct record *rec)
+static int record__synthesize_workload(struct record *rec, bool tail)
 {
 	struct {
 		struct thread_map map;
 		struct thread_map_data map_data;
 	} thread_map;
 
+	if (rec->opts.tail_synthesize != tail)
+		return 0;
+
 	thread_map.map.nr = 1;
 	thread_map.map.map[0].pid = rec->evlist->workload.pid;
 	thread_map.map.map[0].comm = NULL;
@@ -621,7 +624,7 @@ static int record__synthesize_workload(struct record *rec)
 						 rec->opts.proc_map_timeout);
 }
 
-static int record__synthesize(struct record *rec);
+static int record__synthesize(struct record *rec, bool tail);
 
 static int
 record__switch_output(struct record *rec, bool at_exit)
@@ -632,6 +635,10 @@ record__switch_output(struct record *rec, bool at_exit)
 	/* Same Size:      "2015122520103046"*/
 	char timestamp[] = "InvalidTimestamp";
 
+	record__synthesize(rec, true);
+	if (target__none(&rec->opts.target))
+		record__synthesize_workload(rec, true);
+
 	rec->samples = 0;
 	record__finish_output(rec);
 	err = fetch_current_timestamp(timestamp, sizeof(timestamp));
@@ -654,7 +661,7 @@ record__switch_output(struct record *rec, bool at_exit)
 
 	/* Output tracking events */
 	if (!at_exit) {
-		record__synthesize(rec);
+		record__synthesize(rec, false);
 
 		/*
 		 * In 'perf record --switch-output' without -a,
@@ -666,7 +673,7 @@ record__switch_output(struct record *rec, bool at_exit)
 		 * perf_event__synthesize_thread_map() for those events.
 		 */
 		if (target__none(&rec->opts.target))
-			record__synthesize_workload(rec);
+			record__synthesize_workload(rec, false);
 	}
 	return fd;
 }
@@ -720,7 +727,7 @@ static const struct perf_event_mmap_page *record__pick_pc(struct record *rec)
 	return NULL;
 }
 
-static int record__synthesize(struct record *rec)
+static int record__synthesize(struct record *rec, bool tail)
 {
 	struct perf_session *session = rec->session;
 	struct machine *machine = &session->machines.host;
@@ -730,6 +737,9 @@ static int record__synthesize(struct record *rec)
 	int fd = perf_data_file__fd(file);
 	int err = 0;
 
+	if (rec->opts.tail_synthesize != tail)
+		return 0;
+
 	if (file->is_pipe) {
 		err = perf_event__synthesize_attrs(tool, session,
 						   process_synthesized_event);
@@ -893,7 +903,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 
 	machine = &session->machines.host;
 
-	err = record__synthesize(rec);
+	err = record__synthesize(rec, false);
 	if (err < 0)
 		goto out_child;
 
@@ -1057,6 +1067,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	if (!quiet)
 		fprintf(stderr, "[ perf record: Woken up %ld times to write data ]\n", waking);
 
+	if (target__none(&rec->opts.target))
+		record__synthesize_workload(rec, true);
+
 out_child:
 	if (forks) {
 		int exit_status;
@@ -1075,6 +1088,7 @@ out_child:
 	} else
 		status = err;
 
+	record__synthesize(rec, true);
 	/* this will be recalculated during process_buildids() */
 	rec->samples = 0;
 
@@ -1399,6 +1413,8 @@ struct option __record_options[] = {
 	OPT_BOOLEAN_SET('i', "no-inherit", &record.opts.no_inherit,
 			&record.opts.no_inherit_set,
 			"child tasks do not inherit counters"),
+	OPT_BOOLEAN(0, "tail-synthesize", &record.opts.tail_synthesize,
+		    "synthesize non-sample events at the end of output"),
 	OPT_BOOLEAN(0, "overwrite", &record.opts.overwrite, "use overwrite mode"),
 	OPT_UINTEGER('F', "freq", &record.opts.user_freq, "profile at this frequency"),
 	OPT_CALLBACK('m', "mmap-pages", &record.opts, "pages[,pages]",
@@ -1610,6 +1626,9 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
 		}
 	}
 
+	if (record.opts.overwrite)
+		record.opts.tail_synthesize = true;
+
 	if (rec->evlist->nr_entries == 0 &&
 	    perf_evlist__add_default(rec->evlist) < 0) {
 		pr_err("Not enough memory for event selector list\n");
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 608b42b..a7e0f14 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -59,6 +59,7 @@ struct record_opts {
 	bool	     record_switch_events;
 	bool	     all_kernel;
 	bool	     all_user;
+	bool	     tail_synthesize;
 	bool	     overwrite;
 	unsigned int freq;
 	unsigned int mmap_pages;

  reply	other threads:[~2016-07-16 20:51 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-14  8:34 [PATCH v16 00/15] perf tools: Support overwritable ring buffer Wang Nan
2016-07-14  8:34 ` [PATCH v16 01/15] perf tools: Drop redundant evsel->overwrite indicator Wang Nan
2016-07-16 20:45   ` [tip:perf/core] perf evlist: " tip-bot for Arnaldo Carvalho de Melo
2016-07-14  8:34 ` [PATCH v16 02/15] tools lib fd array: Allow associating a pointer cookie with each entry Wang Nan
2016-07-16 20:46   ` [tip:perf/core] " tip-bot for Wang Nan
2016-07-14  8:34 ` [PATCH v16 03/15] perf tools: Update perf evlist mmap related APIs and helpers Wang Nan
2016-07-16 20:46   ` [tip:perf/core] perf evlist: Update " tip-bot for Wang Nan
2016-07-14  8:34 ` [PATCH v16 04/15] perf tools: Decouple record__mmap_read() and evlist Wang Nan
2016-07-16 20:47   ` [tip:perf/core] perf record: " tip-bot for Wang Nan
2016-07-14  8:34 ` [PATCH v16 05/15] perf tools: Record mmap cookie into fdarray private field Wang Nan
2016-07-15 14:59   ` Jiri Olsa
2016-07-16 20:47   ` [tip:perf/core] perf evlist: " tip-bot for Wang Nan
2016-07-14  8:34 ` [PATCH v16 06/15] perf tools: Extract common code in mmap failure processing Wang Nan
2016-07-16 20:48   ` [tip:perf/core] perf evlist: " tip-bot for Wang Nan
2016-07-14  8:34 ` [PATCH v16 07/15] perf tools: Introduce backward_mmap array for evlist Wang Nan
2016-07-16 20:48   ` [tip:perf/core] perf evlist: " tip-bot for Wang Nan
2016-07-14  8:34 ` [PATCH v16 08/15] perf tools: Map backward events to backward_mmap Wang Nan
2016-07-16 20:48   ` [tip:perf/core] perf evlist: " tip-bot for Wang Nan
2016-07-14  8:34 ` [PATCH v16 09/15] perf tools: Drop evlist->backward Wang Nan
2016-07-16 20:49   ` [tip:perf/core] perf evlist: " tip-bot for Wang Nan
2016-07-14  8:34 ` [PATCH v16 10/15] perf tools: Setup backward mmap state machine Wang Nan
2016-07-16 20:49   ` [tip:perf/core] perf evlist: " tip-bot for Wang Nan
2016-07-14  8:34 ` [PATCH v16 11/15] perf record: Read from overwritable ring buffer Wang Nan
2016-07-16 20:50   ` [tip:perf/core] " tip-bot for Wang Nan
2016-07-14  8:34 ` [PATCH v16 12/15] perf tools: Make perf_evlist__{pause,resume} internal helpers Wang Nan
2016-07-16 20:50   ` [tip:perf/core] perf evlist: Make {pause,resume} " tip-bot for Wang Nan
2016-07-14  8:34 ` [PATCH v16 13/15] perf tools: Enable overwrite settings Wang Nan
2016-07-16 20:50   ` [tip:perf/core] " tip-bot for Wang Nan
2016-07-14  8:34 ` [PATCH v16 14/15] perf tools: Don't warn about out of order event if write_backward is used Wang Nan
2016-07-15 14:59   ` Jiri Olsa
2016-07-16 20:51   ` [tip:perf/core] perf session: " tip-bot for Wang Nan
2016-07-14  8:34 ` [PATCH v16 15/15] perf tools: Add --tail-synthesize option Wang Nan
2016-07-16 20:51   ` tip-bot for Wang Nan [this message]
2016-07-15 15:00 ` [PATCH v16 00/15] perf tools: Support overwritable ring buffer Jiri Olsa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=tip-4ea648aec01982d5a57816a95c4665d6081e78f9@git.kernel.org \
    --to=tipbot@zytor.com \
    --cc=acme@redhat.com \
    --cc=hekuang@huawei.com \
    --cc=hpa@zytor.com \
    --cc=jolsa@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-tip-commits@vger.kernel.org \
    --cc=lizefan@huawei.com \
    --cc=mhiramat@kernel.org \
    --cc=mingo@kernel.org \
    --cc=namhyung@kernel.org \
    --cc=nilayvaish@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=wangnan0@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).