linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 0/8] perf tools: Support overwritable ring buffer
@ 2016-06-15  2:23 Wang Nan
  2016-06-15  2:23 ` [PATCH v7 1/8] perf evlist: Introduce aux evlist Wang Nan
                   ` (7 more replies)
  0 siblings, 8 replies; 29+ messages in thread
From: Wang Nan @ 2016-06-15  2:23 UTC (permalink / raw)
  To: acme; +Cc: pi3orama, linux-kernel, Wang Nan

This patch set enables daemonized perf recording by utilizing
overwritable backward ring buffer. With this feature one can
put perf background, and dump ring buffer records by a SIGUSR2
when he/she find something unusual. For example, following
command record system calls, schedule events and samples on cpu cycles
continously:

 # perf record -g -e cycles -e raw_syscalls:*/call-graph=no/ \
                  -e sched:sched_switch/call-graph=no/ \
                  --switch-output --overwrite -a

Then by sending SIGUSR2 to perf when lagging is happen, we get multiple
perf.data output, each of them correspond a abnormal event, and the data
size is reasonable:

 # ls -l ./perf.data*
 -rw------- 1 root root 5122165 May 13 23:51 ./perf.data.2016051323511683
 -rw------- 1 root root 5135093 May 13 23:51 ./perf.data.2016051323512107
 -rw------- 1 root root 5135213 May 13 23:51 ./perf.data.2016051323512215
 -rw------- 1 root root 5135157 May 13 23:51 ./perf.data.2016051323512387

v1 -> v2: Totally redesign: drop the principle of 'channal', use
          auxiliary evlist instead. Fix missing documentation.

v2 -> v3: Rename perf_evlist__toggle_paused() to perf_evlist__pause/resume.

v3 -> v4: Update commit message to describe auxiliary evlist more clearly.

v4 -> v5: Reorder commits, ensure '--overwrite' works right after perf
          support the option.
          Add test cases for auxiliary evlist.
          Avoid bug if main evlist is empty.

v5 -> v6: Improve filter pollfd related code.

v6 -> v7: Rebase to newest perf/core.

Wang Nan (8):
  perf evlist: Introduce aux evlist
  perf tests: Add testcase for auxiliary evlist
  perf record: Introduce rec->overwrite_evlist for overwritable events
  perf record: Toggle overwrite ring buffer for reading
  perf tools: Enable overwrite settings
  perf tools: Don't warn about out of order event if write_backward is
    used
  perf tools: Check write_backward during evlist config
  perf record: Unmap overwrite evlist when event terminate

 tools/perf/Documentation/perf-record.txt |  14 ++
 tools/perf/builtin-record.c              | 301 +++++++++++++++++++++++++++----
 tools/perf/perf.h                        |   1 +
 tools/perf/tests/backward-ring-buffer.c  |  86 +++++++--
 tools/perf/util/evlist.c                 |  34 ++--
 tools/perf/util/evlist.h                 |   5 +-
 tools/perf/util/evsel.c                  |  27 +--
 tools/perf/util/evsel.h                  |  15 ++
 tools/perf/util/parse-events.c           |  20 +-
 tools/perf/util/parse-events.h           |   2 +
 tools/perf/util/parse-events.l           |   2 +
 tools/perf/util/record.c                 |  17 ++
 tools/perf/util/session.c                |  22 ++-
 13 files changed, 463 insertions(+), 83 deletions(-)

-- 
1.8.3.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v7 1/8] perf evlist: Introduce aux evlist
  2016-06-15  2:23 [PATCH v7 0/8] perf tools: Support overwritable ring buffer Wang Nan
@ 2016-06-15  2:23 ` Wang Nan
  2016-06-15  2:23 ` [PATCH v7 2/8] perf tests: Add testcase for auxiliary evlist Wang Nan
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Wang Nan @ 2016-06-15  2:23 UTC (permalink / raw)
  To: acme
  Cc: pi3orama, linux-kernel, Wang Nan, He Kuang, Jiri Olsa,
	Masami Hiramatsu, Namhyung Kim, Zefan Li,
	Arnaldo Carvalho de Melo

An auxiliary evlist is created by perf_evlist__new_aux() using an
existing evlist as its parent. An auxiliary evlist can have its own
'struct perf_mmap', but can't have any other data. User should use its
parent instead when accessing other data.

Auxiliary evlists are containers of 'struct perf_mmap'. It is introduced
to allow its parent evlist to map different events into separated mmaps.

The following commit creates an auxiliary evlist for overwritable
events, because overwritable events need a read only and backwards ring
buffer, which is different from normal events.

To achieve this goal, this patch carefully changes 'evlist' to
'evlist->parent' in all functions in the path of 'perf_evlist__mmap_ex',
except 'evlist->mmap' related operations, to make sure all evlist
modifications (like pollfd and event id hash tables) goes to original
evlist.

A 'evlist->parent' pointer is added to 'struct perf_evlist' and points to
the evlist itself for normal evlists.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1464056944-166978-2-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evlist.c | 31 +++++++++++++++++++++----------
 tools/perf/util/evlist.h |  3 +++
 2 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 1b918aa..e8fcb22 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -45,6 +45,7 @@ void perf_evlist__init(struct perf_evlist *evlist, struct cpu_map *cpus,
 	fdarray__init(&evlist->pollfd, 64);
 	evlist->workload.pid = -1;
 	evlist->backward = false;
+	evlist->parent = evlist;
 }
 
 struct perf_evlist *perf_evlist__new(void)
@@ -1012,7 +1013,7 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
 	struct perf_evsel *evsel;
 	int revent;
 
-	evlist__for_each(evlist, evsel) {
+	evlist__for_each(evlist->parent, evsel) {
 		int fd;
 
 		if (evsel->overwrite != (evlist->overwrite && evlist->backward))
@@ -1044,16 +1045,16 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
 		 * Therefore don't add it for polling.
 		 */
 		if (!evsel->system_wide &&
-		    __perf_evlist__add_pollfd(evlist, fd, idx, revent) < 0) {
+		    __perf_evlist__add_pollfd(evlist->parent, fd, idx, revent) < 0) {
 			perf_evlist__mmap_put(evlist, idx);
 			return -1;
 		}
 
 		if (evsel->attr.read_format & PERF_FORMAT_ID) {
-			if (perf_evlist__id_add_fd(evlist, evsel, cpu, thread,
+			if (perf_evlist__id_add_fd(evlist->parent, evsel, cpu, thread,
 						   fd) < 0)
 				return -1;
-			perf_evlist__set_sid_idx(evlist, evsel, idx, cpu,
+			perf_evlist__set_sid_idx(evlist->parent, evsel, idx, cpu,
 						 thread);
 		}
 	}
@@ -1094,13 +1095,13 @@ static int perf_evlist__mmap_per_thread(struct perf_evlist *evlist,
 					struct mmap_params *mp)
 {
 	int thread;
-	int nr_threads = thread_map__nr(evlist->threads);
+	int nr_threads = thread_map__nr(evlist->parent->threads);
 
 	pr_debug2("perf event ring buffer mmapped per thread\n");
 	for (thread = 0; thread < nr_threads; thread++) {
 		int output = -1;
 
-		auxtrace_mmap_params__set_idx(&mp->auxtrace_mp, evlist, thread,
+		auxtrace_mmap_params__set_idx(&mp->auxtrace_mp, evlist->parent, thread,
 					      false);
 
 		if (perf_evlist__mmap_per_evsel(evlist, thread, mp, 0, thread,
@@ -1239,8 +1240,8 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 			 bool auxtrace_overwrite)
 {
 	struct perf_evsel *evsel;
-	const struct cpu_map *cpus = evlist->cpus;
-	const struct thread_map *threads = evlist->threads;
+	const struct cpu_map *cpus = evlist->parent->cpus;
+	const struct thread_map *threads = evlist->parent->threads;
 	struct mmap_params mp = {
 		.prot = PROT_READ | (overwrite ? 0 : PROT_WRITE),
 	};
@@ -1248,7 +1249,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 	if (evlist->mmap == NULL && perf_evlist__alloc_mmap(evlist) < 0)
 		return -ENOMEM;
 
-	if (evlist->pollfd.entries == NULL && perf_evlist__alloc_pollfd(evlist) < 0)
+	if (evlist->parent->pollfd.entries == NULL && perf_evlist__alloc_pollfd(evlist->parent) < 0)
 		return -ENOMEM;
 
 	evlist->overwrite = overwrite;
@@ -1259,7 +1260,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 	auxtrace_mmap_params__init(&mp.auxtrace_mp, evlist->mmap_len,
 				   auxtrace_pages, auxtrace_overwrite);
 
-	evlist__for_each(evlist, evsel) {
+	evlist__for_each(evlist->parent, evsel) {
 		if ((evsel->attr.read_format & PERF_FORMAT_ID) &&
 		    evsel->sample_id == NULL &&
 		    perf_evsel__alloc_id(evsel, cpu_map__nr(cpus), threads->nr) < 0)
@@ -1916,3 +1917,13 @@ perf_evlist__find_evsel_by_str(struct perf_evlist *evlist,
 
 	return NULL;
 }
+
+struct perf_evlist *perf_evlist__new_aux(struct perf_evlist *parent)
+{
+	struct perf_evlist *evlist = zalloc(sizeof(*evlist));
+
+	if (evlist != NULL)
+		perf_evlist__init(evlist, parent->cpus, parent->threads);
+	evlist->parent = parent->parent;
+	return evlist;
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 68cb136..41e65ac 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -60,6 +60,7 @@ struct perf_evlist {
 	struct perf_evsel *selected;
 	struct events_stats stats;
 	struct perf_env	*env;
+	struct perf_evlist *parent;
 };
 
 struct perf_evsel_str_handler {
@@ -70,6 +71,8 @@ struct perf_evsel_str_handler {
 struct perf_evlist *perf_evlist__new(void);
 struct perf_evlist *perf_evlist__new_default(void);
 struct perf_evlist *perf_evlist__new_dummy(void);
+struct perf_evlist *perf_evlist__new_aux(struct perf_evlist *);
+
 void perf_evlist__init(struct perf_evlist *evlist, struct cpu_map *cpus,
 		       struct thread_map *threads);
 void perf_evlist__exit(struct perf_evlist *evlist);
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v7 2/8] perf tests: Add testcase for auxiliary evlist
  2016-06-15  2:23 [PATCH v7 0/8] perf tools: Support overwritable ring buffer Wang Nan
  2016-06-15  2:23 ` [PATCH v7 1/8] perf evlist: Introduce aux evlist Wang Nan
@ 2016-06-15  2:23 ` Wang Nan
  2016-06-15  2:23 ` [PATCH v7 3/8] perf record: Introduce rec->overwrite_evlist for overwritable events Wang Nan
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Wang Nan @ 2016-06-15  2:23 UTC (permalink / raw)
  To: acme
  Cc: pi3orama, linux-kernel, Wang Nan, Arnaldo Carvalho de Melo,
	Jiri Olsa, Masami Hiramatsu, Namhyung Kim, Zefan Li, He Kuang

Improve test backward-ring-buffer, trace both enter and exit event of
prctl() syscall, utilize auxiliary evlist to mmap enter and exit event
into separated mmaps.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
---
 tools/perf/tests/backward-ring-buffer.c | 87 ++++++++++++++++++++++++++-------
 1 file changed, 68 insertions(+), 19 deletions(-)

diff --git a/tools/perf/tests/backward-ring-buffer.c b/tools/perf/tests/backward-ring-buffer.c
index d9ba991..76e34c0 100644
--- a/tools/perf/tests/backward-ring-buffer.c
+++ b/tools/perf/tests/backward-ring-buffer.c
@@ -31,16 +31,19 @@ static int count_samples(struct perf_evlist *evlist, int *sample_count,
 	for (i = 0; i < evlist->nr_mmaps; i++) {
 		union perf_event *event;
 
-		perf_evlist__mmap_read_catchup(evlist, i);
-		while ((event = perf_evlist__mmap_read_backward(evlist, i)) != NULL) {
+		if (evlist->backward)
+			perf_evlist__mmap_read_catchup(evlist, i);
+		while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) {
 			const u32 type = event->header.type;
 
 			switch (type) {
 			case PERF_RECORD_SAMPLE:
-				(*sample_count)++;
+				if (sample_count)
+					(*sample_count)++;
 				break;
 			case PERF_RECORD_COMM:
-				(*comm_count)++;
+				if (comm_count)
+					(*comm_count)++;
 				break;
 			default:
 				pr_err("Unexpected record of type %d\n", type);
@@ -51,34 +54,53 @@ static int count_samples(struct perf_evlist *evlist, int *sample_count,
 	return TEST_OK;
 }
 
-static int do_test(struct perf_evlist *evlist, int mmap_pages,
-		   int *sample_count, int *comm_count)
+static int do_test(struct perf_evlist *evlist,
+		   struct perf_evlist *aux_evlist,
+		   int mmap_pages,
+		   int *enter_sample_count,
+		   int *exit_sample_count,
+		   int *comm_count)
 {
 	int err;
 	char sbuf[STRERR_BUFSIZE];
 
-	err = perf_evlist__mmap(evlist, mmap_pages, true);
+	err = perf_evlist__mmap(evlist, mmap_pages, false);
 	if (err < 0) {
 		pr_debug("perf_evlist__mmap: %s\n",
 			 strerror_r(errno, sbuf, sizeof(sbuf)));
 		return TEST_FAIL;
 	}
 
+	err = perf_evlist__mmap(aux_evlist, mmap_pages, true);
+	if (err < 0) {
+		pr_debug("perf_evlist__mmap for aux_evlist: %s\n",
+			 strerror_r(errno, sbuf, sizeof(sbuf)));
+		return TEST_FAIL;
+	}
+
 	perf_evlist__enable(evlist);
 	testcase();
 	perf_evlist__disable(evlist);
 
-	err = count_samples(evlist, sample_count, comm_count);
+	err = count_samples(aux_evlist, exit_sample_count, comm_count);
+	if (err)
+		goto errout;
+	err = count_samples(evlist, enter_sample_count, NULL);
+	if (err)
+		goto errout;
+errout:
 	perf_evlist__munmap(evlist);
+	perf_evlist__munmap(aux_evlist);
 	return err;
 }
 
 
 int test__backward_ring_buffer(int subtest __maybe_unused)
 {
-	int ret = TEST_SKIP, err, sample_count = 0, comm_count = 0;
+	int ret = TEST_SKIP, err;
+	int enter_sample_count = 0, exit_sample_count = 0, comm_count = 0;
 	char pid[16], sbuf[STRERR_BUFSIZE];
-	struct perf_evlist *evlist;
+	struct perf_evlist *evlist, *aux_evlist = NULL;
 	struct perf_evsel *evsel __maybe_unused;
 	struct parse_events_error parse_error;
 	struct record_opts opts = {
@@ -115,11 +137,22 @@ int test__backward_ring_buffer(int subtest __maybe_unused)
 		goto out_delete_evlist;
 	}
 
-	perf_evlist__config(evlist, &opts, NULL);
+	/*
+	 * Set backward bit, ring buffer should be writing from end. Record
+	 * it in aux evlist
+	 */
+	perf_evlist__last(evlist)->overwrite = true;
+	perf_evlist__last(evlist)->attr.write_backward = 1;
 
-	/* Set backward bit, ring buffer should be writing from end */
-	evlist__for_each(evlist, evsel)
-		evsel->attr.write_backward = 1;
+	err = parse_events(evlist, "syscalls:sys_exit_prctl", &parse_error);
+	if (err) {
+		pr_debug("Failed to parse tracepoint event, try use root\n");
+		ret = TEST_SKIP;
+		goto out_delete_evlist;
+	}
+	/* Don't set backward bit for exit event. Record it in main evlist */
+
+	perf_evlist__config(evlist, &opts, NULL);
 
 	err = perf_evlist__open(evlist);
 	if (err < 0) {
@@ -128,24 +161,40 @@ int test__backward_ring_buffer(int subtest __maybe_unused)
 		goto out_delete_evlist;
 	}
 
+	aux_evlist = perf_evlist__new_aux(evlist);
+	if (!aux_evlist) {
+		pr_debug("perf_evlist__new_aux failed\n");
+		goto out_delete_evlist;
+	}
+	aux_evlist->backward = true;
+
 	ret = TEST_FAIL;
-	err = do_test(evlist, opts.mmap_pages, &sample_count,
+	err = do_test(evlist, aux_evlist, opts.mmap_pages,
+		      &enter_sample_count, &exit_sample_count,
 		      &comm_count);
 	if (err != TEST_OK)
 		goto out_delete_evlist;
 
-	if ((sample_count != NR_ITERS) || (comm_count != NR_ITERS)) {
-		pr_err("Unexpected counter: sample_count=%d, comm_count=%d\n",
-		       sample_count, comm_count);
+	if (enter_sample_count != exit_sample_count) {
+		pr_err("Unexpected counter: enter_sample_count=%d, exit_sample_count=%d\n",
+		       enter_sample_count, exit_sample_count);
+		goto out_delete_evlist;
+	}
+
+	if ((exit_sample_count != NR_ITERS) || (comm_count != NR_ITERS)) {
+		pr_err("Unexpected counter: exit_sample_count=%d, comm_count=%d\n",
+		       exit_sample_count, comm_count);
 		goto out_delete_evlist;
 	}
 
-	err = do_test(evlist, 1, &sample_count, &comm_count);
+	err = do_test(evlist, aux_evlist, 1, NULL, NULL, NULL);
 	if (err != TEST_OK)
 		goto out_delete_evlist;
 
 	ret = TEST_OK;
 out_delete_evlist:
+	if (aux_evlist)
+		perf_evlist__delete(aux_evlist);
 	perf_evlist__delete(evlist);
 	return ret;
 }
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v7 3/8] perf record: Introduce rec->overwrite_evlist for overwritable events
  2016-06-15  2:23 [PATCH v7 0/8] perf tools: Support overwritable ring buffer Wang Nan
  2016-06-15  2:23 ` [PATCH v7 1/8] perf evlist: Introduce aux evlist Wang Nan
  2016-06-15  2:23 ` [PATCH v7 2/8] perf tests: Add testcase for auxiliary evlist Wang Nan
@ 2016-06-15  2:23 ` Wang Nan
  2016-06-15  2:23 ` [PATCH v7 4/8] perf record: Toggle overwrite ring buffer for reading Wang Nan
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Wang Nan @ 2016-06-15  2:23 UTC (permalink / raw)
  To: acme
  Cc: pi3orama, linux-kernel, Wang Nan, He Kuang,
	Arnaldo Carvalho de Melo, Jiri Olsa, Masami Hiramatsu,
	Namhyung Kim, Zefan Li

Create an auxiliary evlist for overwritable events.

Before mmap, build this evlist and set 'overwrite' and 'backward'
attribute. Since perf_evlist__mmap_ex() only maps events when
evsel->overwrite matches evlist's corresponding attributes, with
these two evlists an event goes to either rec->evlist or
rec->overwrite_evlist.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-record.c | 140 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 117 insertions(+), 23 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index d4cf1b0..8108332 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -50,6 +50,7 @@ struct record {
 	struct perf_data_file	file;
 	struct auxtrace_record	*itr;
 	struct perf_evlist	*evlist;
+	struct perf_evlist	*overwrite_evlist;
 	struct perf_session	*session;
 	const char		*progname;
 	int			realtime_prio;
@@ -341,6 +342,84 @@ int auxtrace_record__snapshot_start(struct auxtrace_record *itr __maybe_unused)
 
 #endif
 
+static int record__create_overwrite_evlist(struct record *rec)
+{
+	struct perf_evlist *evlist = rec->evlist;
+	struct perf_evsel *pos;
+
+	evlist__for_each(evlist, pos) {
+		if (!pos->overwrite)
+			continue;
+
+		if (!rec->overwrite_evlist) {
+			rec->overwrite_evlist = perf_evlist__new_aux(evlist);
+			if (rec->overwrite_evlist) {
+				rec->overwrite_evlist->backward = true;
+				rec->overwrite_evlist->overwrite = true;
+				return 0;
+			} else
+				return -ENOMEM;
+		}
+	}
+	return 0;
+}
+
+static int record__mmap_evlist(struct record *rec,
+			       struct perf_evlist *evlist,
+			       bool overwrite)
+{
+	struct record_opts *opts = &rec->opts;
+	char msg[512];
+
+	/*
+	 * Don't use evlist->overwrite because it is logically an
+	 * internal attribute and is set by perf_evlist__mmap_ex().
+	 * Avoid circular dependency.
+	 */
+	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages, overwrite,
+				 opts->auxtrace_mmap_pages,
+				 opts->auxtrace_snapshot_mode) < 0) {
+		if (errno == EPERM) {
+			pr_err("Permission error mapping pages.\n"
+			       "Consider increasing "
+			       "/proc/sys/kernel/perf_event_mlock_kb,\n"
+			       "or try again with a smaller value of -m/--mmap_pages.\n"
+			       "(current value: %u,%u)\n",
+			       opts->mmap_pages, opts->auxtrace_mmap_pages);
+			return -errno;
+		} else {
+			pr_err("failed to mmap with %d (%s)\n", errno,
+				strerror_r(errno, msg, sizeof(msg)));
+			if (errno)
+				return -errno;
+			else
+				return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+static int record__mmap(struct record *rec)
+{
+	int err;
+
+	err = record__create_overwrite_evlist(rec);
+	if (err)
+		return err;
+
+	err = record__mmap_evlist(rec, rec->evlist, false);
+	if (err)
+		return err;
+
+	if (!rec->overwrite_evlist)
+		return 0;
+
+	err = record__mmap_evlist(rec, rec->overwrite_evlist, true);
+	if (err)
+		return err;
+	return 0;
+}
+
 static int record__open(struct record *rec)
 {
 	char msg[512];
@@ -353,6 +432,13 @@ static int record__open(struct record *rec)
 	perf_evlist__config(evlist, opts, &callchain_param);
 
 	evlist__for_each(evlist, pos) {
+		if (pos->overwrite) {
+			if (!pos->attr.write_backward) {
+				ui__warning("Unable to read from overwrite ring buffer\n\n");
+				rc = -ENOSYS;
+				goto out;
+			}
+		}
 try_again:
 		if (perf_evsel__open(pos, pos->cpus, pos->threads) < 0) {
 			if (perf_evsel__fallback(pos, errno, msg, sizeof(msg))) {
@@ -377,28 +463,9 @@ try_again:
 		goto out;
 	}
 
-	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages, false,
-				 opts->auxtrace_mmap_pages,
-				 opts->auxtrace_snapshot_mode) < 0) {
-		if (errno == EPERM) {
-			pr_err("Permission error mapping pages.\n"
-			       "Consider increasing "
-			       "/proc/sys/kernel/perf_event_mlock_kb,\n"
-			       "or try again with a smaller value of -m/--mmap_pages.\n"
-			       "(current value: %u,%u)\n",
-			       opts->mmap_pages, opts->auxtrace_mmap_pages);
-			rc = -errno;
-		} else {
-			pr_err("failed to mmap with %d (%s)\n", errno,
-				strerror_r(errno, msg, sizeof(msg)));
-			if (errno)
-				rc = -errno;
-			else
-				rc = -EINVAL;
-		}
+	rc = record__mmap(rec);
+	if (rc)
 		goto out;
-	}
-
 	session->evlist = evlist;
 	perf_session__set_id_hdr_size(session);
 out:
@@ -655,10 +722,26 @@ perf_event__synth_time_conv(const struct perf_event_mmap_page *pc __maybe_unused
 	return 0;
 }
 
+static const struct perf_event_mmap_page *
+perf_evlist__pick_pc(struct perf_evlist *evlist)
+{
+	if (evlist && evlist->mmap && evlist->mmap[0].base)
+		return evlist->mmap[0].base;
+	return NULL;
+}
+
 static const struct perf_event_mmap_page *record__pick_pc(struct record *rec)
 {
-	if (rec->evlist && rec->evlist->mmap && rec->evlist->mmap[0].base)
-		return rec->evlist->mmap[0].base;
+	const struct perf_event_mmap_page *pc;
+
+	/* Change it to a loop if a new aux evlist is added */
+	pc = perf_evlist__pick_pc(rec->evlist);
+	if (pc)
+		return pc;
+	pc = perf_evlist__pick_pc(rec->overwrite_evlist);
+	if (pc)
+		return pc;
+
 	return NULL;
 }
 
@@ -941,6 +1024,14 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 				err = 0;
 			waking++;
 
+			/*
+			 * Although we may have auxiliray evlist, there is
+			 * only one pollfd, so we don't need to filter pollfd
+			 * for auxiliray evlist.
+			 *
+			 * TODO: if an event is terminated (POLLERR | POLLHUP),
+			 * unmap mmaps for auxiliray evlist too.
+			 */
 			if (perf_evlist__filter_pollfd(rec->evlist, POLLERR | POLLHUP) == 0)
 				draining = true;
 		}
@@ -1269,6 +1360,7 @@ static struct record record = {
 		.mmap2		= perf_event__process_mmap2,
 		.ordered_events	= true,
 	},
+	.overwrite_evlist = NULL,
 };
 
 const char record_callchain_help[] = CALLCHAIN_RECORD_HELP
@@ -1565,6 +1657,8 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
 	err = __cmd_record(&record, argc, argv);
 out_symbol_exit:
 	perf_evlist__delete(rec->evlist);
+	if (rec->overwrite_evlist)
+		perf_evlist__delete(rec->overwrite_evlist);
 	symbol__exit();
 	auxtrace_record__free(rec->itr);
 	return err;
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v7 4/8] perf record: Toggle overwrite ring buffer for reading
  2016-06-15  2:23 [PATCH v7 0/8] perf tools: Support overwritable ring buffer Wang Nan
                   ` (2 preceding siblings ...)
  2016-06-15  2:23 ` [PATCH v7 3/8] perf record: Introduce rec->overwrite_evlist for overwritable events Wang Nan
@ 2016-06-15  2:23 ` Wang Nan
  2016-06-15  2:23 ` [PATCH v7 5/8] perf tools: Enable overwrite settings Wang Nan
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Wang Nan @ 2016-06-15  2:23 UTC (permalink / raw)
  To: acme
  Cc: pi3orama, linux-kernel, Wang Nan, He Kuang,
	Arnaldo Carvalho de Melo, Jiri Olsa, Masami Hiramatsu,
	Namhyung Kim, Zefan Li

overwrite_evt_state is introduced to reflect the state of overwritable
ring buffers. It is a state machine with 3 states:

 RUNNING --(1)--> DATA_PENDING --(2)--> EMPTY
    ^                  ^                 |
    |                  |___(disallow)___/|
    |                                    |
     \_________________(3)______________/

 RUNNING      : Overwritable ring buffers are recording
 DATA_PENDING : We are required to collect overwritable ring buffers
 EMPTY        : We have collected data from those ring buffers.

 (1): Pause ring buffers for reading
 (2): Read from ring buffers
 (3): Resume ring buffers for recording

We can't avoid this complexity. Because we deliberately drop records from
overwritable ring buffer, we can't detect remaining data by checking head
and old pointers. Therefore, DATA_PENDING state is mandatory.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-record.c | 146 +++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 136 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8108332..02444e9 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -42,6 +42,28 @@
 #include <sys/mman.h>
 #include <asm/bug.h>
 
+/*
+ * State machine of overwrite_evt_state:
+ *
+ * RUNNING --(1)--> DATA_PENDING --(2)--> EMPTY
+ *    ^                  ^                 |
+ *    |                  |___(disallow)___/|
+ *    |                                    |
+ *     \_________________(3)______________/
+ *
+ * RUNNING      : Overwritable ring buffers are recording
+ * DATA_PENDING : We are required to collect overwritable ring buffers
+ * EMPTY        : We have collected data from those ring buffers.
+ *
+ * (1): Pause ring buffers for reading
+ * (2): Read from ring buffers
+ * (3): Resume ring buffers for recording
+ */
+enum overwrite_evt_state {
+	OVERWRITE_EVT_RUNNING,
+	OVERWRITE_EVT_DATA_PENDING,
+	OVERWRITE_EVT_EMPTY,
+};
 
 struct record {
 	struct perf_tool	tool;
@@ -61,6 +83,7 @@ struct record {
 	bool			buildid_all;
 	bool			timestamp_filename;
 	bool			switch_output;
+	enum overwrite_evt_state overwrite_evt_state;
 	unsigned long long	samples;
 };
 
@@ -132,9 +155,9 @@ rb_find_range(struct perf_evlist *evlist,
 	return backward_rb_find_range(data, mask, head, start, end);
 }
 
-static int record__mmap_read(struct record *rec, int idx)
+static int record__mmap_read(struct record *rec, struct perf_evlist *evlist, int idx)
 {
-	struct perf_mmap *md = &rec->evlist->mmap[idx];
+	struct perf_mmap *md = &evlist->mmap[idx];
 	u64 head = perf_mmap__read_head(md);
 	u64 old = md->prev;
 	u64 end = head, start = old;
@@ -143,7 +166,7 @@ static int record__mmap_read(struct record *rec, int idx)
 	void *buf;
 	int rc = 0;
 
-	if (rb_find_range(rec->evlist, data, md->mask, head,
+	if (rb_find_range(evlist, data, md->mask, head,
 			  old, &start, &end))
 		return -1;
 
@@ -157,7 +180,7 @@ static int record__mmap_read(struct record *rec, int idx)
 		WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
 
 		md->prev = head;
-		perf_evlist__mmap_consume(rec->evlist, idx);
+		perf_evlist__mmap_consume(evlist, idx);
 		return 0;
 	}
 
@@ -182,7 +205,7 @@ static int record__mmap_read(struct record *rec, int idx)
 	}
 
 	md->prev = head;
-	perf_evlist__mmap_consume(rec->evlist, idx);
+	perf_evlist__mmap_consume(evlist, idx);
 out:
 	return rc;
 }
@@ -468,6 +491,7 @@ try_again:
 		goto out;
 	session->evlist = evlist;
 	perf_session__set_id_hdr_size(session);
+	rec->overwrite_evt_state = OVERWRITE_EVT_RUNNING;
 out:
 	return rc;
 }
@@ -548,17 +572,72 @@ static struct perf_event_header finished_round_event = {
 	.type = PERF_RECORD_FINISHED_ROUND,
 };
 
-static int record__mmap_read_all(struct record *rec)
+static void
+record__toggle_overwrite_evsels(struct record *rec,
+				enum overwrite_evt_state state)
+{
+	struct perf_evlist *evlist = rec->overwrite_evlist;
+	enum overwrite_evt_state old_state = rec->overwrite_evt_state;
+	enum action {
+		NONE,
+		PAUSE,
+		RESUME,
+	} action = NONE;
+
+	switch (old_state) {
+	case OVERWRITE_EVT_RUNNING:
+		if (state != OVERWRITE_EVT_RUNNING)
+			action = PAUSE;
+		break;
+	case OVERWRITE_EVT_DATA_PENDING:
+		if (state == OVERWRITE_EVT_RUNNING)
+			action = RESUME;
+		break;
+	case OVERWRITE_EVT_EMPTY:
+		if (state == OVERWRITE_EVT_RUNNING)
+			action = RESUME;
+		if (state == OVERWRITE_EVT_DATA_PENDING)
+			state = OVERWRITE_EVT_EMPTY;
+		break;
+	default:
+		WARN_ONCE(1, "Shouldn't get there\n");
+	}
+
+	rec->overwrite_evt_state = state;
+
+	if (action == NONE)
+		return;
+
+	if (!evlist)
+		return;
+
+	switch (action) {
+	case PAUSE:
+		perf_evlist__pause(evlist);
+		break;
+	case RESUME:
+		perf_evlist__resume(evlist);
+		break;
+	case NONE:
+	default:
+		break;
+	}
+}
+
+static int __record__mmap_read_evlist(struct record *rec, struct perf_evlist *evlist)
 {
 	u64 bytes_written = rec->bytes_written;
 	int i;
 	int rc = 0;
 
-	for (i = 0; i < rec->evlist->nr_mmaps; i++) {
-		struct auxtrace_mmap *mm = &rec->evlist->mmap[i].auxtrace_mmap;
+	if (!evlist)
+		return 0;
 
-		if (rec->evlist->mmap[i].base) {
-			if (record__mmap_read(rec, i) != 0) {
+	for (i = 0; i < evlist->nr_mmaps; i++) {
+		struct auxtrace_mmap *mm = &evlist->mmap[i].auxtrace_mmap;
+
+		if (evlist->mmap[i].base) {
+			if (record__mmap_read(rec, evlist, i) != 0) {
 				rc = -1;
 				goto out;
 			}
@@ -582,6 +661,23 @@ out:
 	return rc;
 }
 
+static int record__mmap_read_all(struct record *rec)
+{
+	int err;
+
+	err = __record__mmap_read_evlist(rec, rec->evlist);
+	if (err)
+		return err;
+
+	if (rec->overwrite_evt_state == OVERWRITE_EVT_DATA_PENDING) {
+		err = __record__mmap_read_evlist(rec, rec->overwrite_evlist);
+		if (err)
+			return err;
+		record__toggle_overwrite_evsels(rec, OVERWRITE_EVT_EMPTY);
+	}
+	return 0;
+}
+
 static void record__init_features(struct record *rec)
 {
 	struct perf_session *session = rec->session;
@@ -978,6 +1074,17 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	for (;;) {
 		unsigned long long hits = rec->samples;
 
+		/*
+		 * rec->overwrite_evt_state is possible to be
+		 * OVERWRITE_EVT_EMPTY here: when done == true and
+		 * hits != rec->samples after previous reading.
+		 *
+		 * record__toggle_overwrite_evsels ensure we never
+		 * convert OVERWRITE_EVT_EMPTY to OVERWRITE_EVT_DATA_PENDING.
+		 */
+		if (trigger_is_hit(&switch_output_trigger) || done || draining)
+			record__toggle_overwrite_evsels(rec, OVERWRITE_EVT_DATA_PENDING);
+
 		if (record__mmap_read_all(rec) < 0) {
 			trigger_error(&auxtrace_snapshot_trigger);
 			trigger_error(&switch_output_trigger);
@@ -997,8 +1104,27 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		}
 
 		if (trigger_is_hit(&switch_output_trigger)) {
+			/*
+			 * If switch_output_trigger is hit, the data in
+			 * overwritable ring buffer should have been collected,
+			 * so overwrite_evt_state should be set to
+			 * OVERWRITE_EVT_EMPTY.
+			 *
+			 * If SIGUSR2 raise after or during record__mmap_read_all(),
+			 * record__mmap_read_all() didn't collect data from
+			 * overwritable ring buffer. Read again.
+			 */
+			if (rec->overwrite_evt_state == OVERWRITE_EVT_RUNNING)
+				continue;
 			trigger_ready(&switch_output_trigger);
 
+			/*
+			 * Reenable events in overwrite ring buffer after
+			 * record__mmap_read_all(): we should have collected
+			 * data from it.
+			 */
+			record__toggle_overwrite_evsels(rec, OVERWRITE_EVT_RUNNING);
+
 			if (!quiet)
 				fprintf(stderr, "[ perf record: dump data: Woken up %ld times ]\n",
 					waking);
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v7 5/8] perf tools: Enable overwrite settings
  2016-06-15  2:23 [PATCH v7 0/8] perf tools: Support overwritable ring buffer Wang Nan
                   ` (3 preceding siblings ...)
  2016-06-15  2:23 ` [PATCH v7 4/8] perf record: Toggle overwrite ring buffer for reading Wang Nan
@ 2016-06-15  2:23 ` Wang Nan
  2016-06-15  2:23 ` [PATCH v7 6/8] perf tools: Don't warn about out of order event if write_backward is used Wang Nan
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Wang Nan @ 2016-06-15  2:23 UTC (permalink / raw)
  To: acme
  Cc: pi3orama, linux-kernel, Wang Nan, He Kuang,
	Arnaldo Carvalho de Melo, Jiri Olsa, Masami Hiramatsu,
	Namhyung Kim, Zefan Li

This patch allows following config terms and option:

Globally setting events to overwrite;

 # perf record --overwrite ...

Set specific events to be overwrite or no-overwrite.

 # perf record --event cycles/overwrite/ ...
 # perf record --event cycles/no-overwrite/ ...

Add missing config terms and update config term array size because the
longest string length is changed.

For overwritable events, automatically select attr.write_backward since
perf requires it to be backward for reading.

Test result:
 # perf record --overwrite -e syscalls:*enter_nanosleep* usleep 1
 [ perf record: Woken up 2 times to write data ]
 [ perf record: Captured and wrote 0.011 MB perf.data (1 samples) ]
 # perf evlist -v
 syscalls:sys_enter_nanosleep: type: 2, size: 112, config: 0x134, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW, disabled: 1, inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, write_backward: 1
 # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/Documentation/perf-record.txt | 14 ++++++++++++++
 tools/perf/builtin-record.c              |  1 +
 tools/perf/perf.h                        |  1 +
 tools/perf/tests/backward-ring-buffer.c  | 15 ++++++---------
 tools/perf/util/evsel.c                  | 12 ++++++++++++
 tools/perf/util/evsel.h                  |  2 ++
 tools/perf/util/parse-events.c           | 20 ++++++++++++++++++--
 tools/perf/util/parse-events.h           |  2 ++
 tools/perf/util/parse-events.l           |  2 ++
 9 files changed, 58 insertions(+), 11 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 8dbee83..f5cb932 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -360,6 +360,20 @@ particular perf.data snapshot should be kept or not.
 
 Implies --timestamp-filename, --no-buildid and --no-buildid-cache.
 
+--overwrite::
+Makes all events use overwritable ring buffer. Event with overwritable ring
+buffer works like a flight recorder: when buffer gets full, instead of dumping
+records into output file, kernel overwrites old records silently. Perf dumps
+data from overwritable ring buffer when switching output (see --switch-output)
+and before terminate.
+
+Perf behaves like a daemon when '--overwrite' and '--switch-output' are
+provided. It record and drop events in background, and dumps data when
+something unusual is detected.
+
+'overwrite' attribute can also be set or canceled for specific event using
+config terms like 'cycles/overwrite/' and 'instructions/no-overwrite/'.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 02444e9..b9094f0 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1528,6 +1528,7 @@ struct option __record_options[] = {
 	OPT_BOOLEAN_SET('i', "no-inherit", &record.opts.no_inherit,
 			&record.opts.no_inherit_set,
 			"child tasks do not inherit counters"),
+	OPT_BOOLEAN(0, "overwrite", &record.opts.overwrite, "use overwrite mode"),
 	OPT_UINTEGER('F', "freq", &record.opts.user_freq, "profile at this frequency"),
 	OPT_CALLBACK('m', "mmap-pages", &record.opts, "pages[,pages]",
 		     "number of mmap data pages and AUX area tracing mmap pages",
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index cd8f1b1..608b42b 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -59,6 +59,7 @@ struct record_opts {
 	bool	     record_switch_events;
 	bool	     all_kernel;
 	bool	     all_user;
+	bool	     overwrite;
 	unsigned int freq;
 	unsigned int mmap_pages;
 	unsigned int auxtrace_mmap_pages;
diff --git a/tools/perf/tests/backward-ring-buffer.c b/tools/perf/tests/backward-ring-buffer.c
index 76e34c0..bba0b83 100644
--- a/tools/perf/tests/backward-ring-buffer.c
+++ b/tools/perf/tests/backward-ring-buffer.c
@@ -130,27 +130,24 @@ int test__backward_ring_buffer(int subtest __maybe_unused)
 	}
 
 	bzero(&parse_error, sizeof(parse_error));
-	err = parse_events(evlist, "syscalls:sys_enter_prctl", &parse_error);
+	/*
+	 * Set backward bit, ring buffer should be writing from end. Record
+	 * it in aux evlist
+	 */
+	err = parse_events(evlist, "syscalls:sys_enter_prctl/overwrite/", &parse_error);
 	if (err) {
 		pr_debug("Failed to parse tracepoint event, try use root\n");
 		ret = TEST_SKIP;
 		goto out_delete_evlist;
 	}
 
-	/*
-	 * Set backward bit, ring buffer should be writing from end. Record
-	 * it in aux evlist
-	 */
-	perf_evlist__last(evlist)->overwrite = true;
-	perf_evlist__last(evlist)->attr.write_backward = 1;
-
+	/* Don't set backward bit for exit event. Record it in main evlist */
 	err = parse_events(evlist, "syscalls:sys_exit_prctl", &parse_error);
 	if (err) {
 		pr_debug("Failed to parse tracepoint event, try use root\n");
 		ret = TEST_SKIP;
 		goto out_delete_evlist;
 	}
-	/* Don't set backward bit for exit event. Record it in main evlist */
 
 	perf_evlist__config(evlist, &opts, NULL);
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 9b2e3e6..ef223c7 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -677,11 +677,22 @@ static void apply_config_terms(struct perf_evsel *evsel,
 			 */
 			attr->inherit = term->val.inherit ? 1 : 0;
 			break;
+		case PERF_EVSEL__CONFIG_TERM_OVERWRITE:
+			evsel->overwrite = term->val.overwrite ? 1 : 0;
+			break;
 		default:
 			break;
 		}
 	}
 
+	/*
+	 * Set backward after config term processing because it is
+	 * possible to set overwrite globally, without config
+	 * terms.
+	 */
+	if (evsel->overwrite)
+		attr->write_backward = 1;
+
 	/* User explicitly set per-event callgraph, clear the old setting and reset. */
 	if ((callgraph_buf != NULL) || (dump_size > 0) || max_stack) {
 		if (max_stack) {
@@ -758,6 +769,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts,
 
 	attr->sample_id_all = perf_missing_features.sample_id_all ? 0 : 1;
 	attr->inherit	    = !opts->no_inherit;
+	evsel->overwrite    = opts->overwrite;
 
 	perf_evsel__set_sample_bit(evsel, IP);
 	perf_evsel__set_sample_bit(evsel, TID);
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 828ddd1..55ff958 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -45,6 +45,7 @@ enum {
 	PERF_EVSEL__CONFIG_TERM_STACK_USER,
 	PERF_EVSEL__CONFIG_TERM_INHERIT,
 	PERF_EVSEL__CONFIG_TERM_MAX_STACK,
+	PERF_EVSEL__CONFIG_TERM_OVERWRITE,
 	PERF_EVSEL__CONFIG_TERM_MAX,
 };
 
@@ -59,6 +60,7 @@ struct perf_evsel_config_term {
 		u64	stack_user;
 		int	max_stack;
 		bool	inherit;
+		bool	overwrite;
 	} val;
 };
 
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index d15e335..b641f75 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -901,6 +901,8 @@ static const char *config_term_names[__PARSE_EVENTS__TERM_TYPE_NR] = {
 	[PARSE_EVENTS__TERM_TYPE_NOINHERIT]		= "no-inherit",
 	[PARSE_EVENTS__TERM_TYPE_INHERIT]		= "inherit",
 	[PARSE_EVENTS__TERM_TYPE_MAX_STACK]		= "max-stack",
+	[PARSE_EVENTS__TERM_TYPE_OVERWRITE]		= "overwrite",
+	[PARSE_EVENTS__TERM_TYPE_NOOVERWRITE]		= "no-overwrite",
 };
 
 static bool config_term_shrinked;
@@ -993,6 +995,12 @@ do {									   \
 	case PARSE_EVENTS__TERM_TYPE_NOINHERIT:
 		CHECK_TYPE_VAL(NUM);
 		break;
+	case PARSE_EVENTS__TERM_TYPE_OVERWRITE:
+		CHECK_TYPE_VAL(NUM);
+		break;
+	case PARSE_EVENTS__TERM_TYPE_NOOVERWRITE:
+		CHECK_TYPE_VAL(NUM);
+		break;
 	case PARSE_EVENTS__TERM_TYPE_NAME:
 		CHECK_TYPE_VAL(STR);
 		break;
@@ -1045,6 +1053,8 @@ static int config_term_tracepoint(struct perf_event_attr *attr,
 	case PARSE_EVENTS__TERM_TYPE_INHERIT:
 	case PARSE_EVENTS__TERM_TYPE_NOINHERIT:
 	case PARSE_EVENTS__TERM_TYPE_MAX_STACK:
+	case PARSE_EVENTS__TERM_TYPE_OVERWRITE:
+	case PARSE_EVENTS__TERM_TYPE_NOOVERWRITE:
 		return config_term_common(attr, term, err);
 	default:
 		if (err) {
@@ -1117,6 +1127,12 @@ do {								\
 		case PARSE_EVENTS__TERM_TYPE_MAX_STACK:
 			ADD_CONFIG_TERM(MAX_STACK, max_stack, term->val.num);
 			break;
+		case PARSE_EVENTS__TERM_TYPE_OVERWRITE:
+			ADD_CONFIG_TERM(OVERWRITE, overwrite, term->val.num ? 1 : 0);
+			break;
+		case PARSE_EVENTS__TERM_TYPE_NOOVERWRITE:
+			ADD_CONFIG_TERM(OVERWRITE, overwrite, term->val.num ? 0 : 1);
+			break;
 		default:
 			break;
 		}
@@ -2330,9 +2346,9 @@ static void config_terms_list(char *buf, size_t buf_sz)
 char *parse_events_formats_error_string(char *additional_terms)
 {
 	char *str;
-	/* "branch_type" is the longest name */
+	/* "no-overwrite" is the longest name */
 	char static_terms[__PARSE_EVENTS__TERM_TYPE_NR *
-			  (sizeof("branch_type") - 1)];
+			  (sizeof("no-overwrite") - 1)];
 
 	config_terms_list(static_terms, sizeof(static_terms));
 	/* valid terms */
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 46c05cc..1b04d82 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -69,6 +69,8 @@ enum {
 	PARSE_EVENTS__TERM_TYPE_NOINHERIT,
 	PARSE_EVENTS__TERM_TYPE_INHERIT,
 	PARSE_EVENTS__TERM_TYPE_MAX_STACK,
+	PARSE_EVENTS__TERM_TYPE_NOOVERWRITE,
+	PARSE_EVENTS__TERM_TYPE_OVERWRITE,
 	__PARSE_EVENTS__TERM_TYPE_NR,
 };
 
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 3c15b33..7a25194 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -202,6 +202,8 @@ stack-size		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_STACKSIZE); }
 max-stack		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_MAX_STACK); }
 inherit			{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_INHERIT); }
 no-inherit		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NOINHERIT); }
+overwrite		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_OVERWRITE); }
+no-overwrite		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NOOVERWRITE); }
 ,			{ return ','; }
 "/"			{ BEGIN(INITIAL); return '/'; }
 {name_minus}		{ return str(yyscanner, PE_NAME); }
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v7 6/8] perf tools: Don't warn about out of order event if write_backward is used
  2016-06-15  2:23 [PATCH v7 0/8] perf tools: Support overwritable ring buffer Wang Nan
                   ` (4 preceding siblings ...)
  2016-06-15  2:23 ` [PATCH v7 5/8] perf tools: Enable overwrite settings Wang Nan
@ 2016-06-15  2:23 ` Wang Nan
  2016-06-15  2:23 ` [PATCH v7 7/8] perf tools: Check write_backward during evlist config Wang Nan
  2016-06-15  2:23 ` [PATCH v7 8/8] perf record: Unmap overwrite evlist when event terminate Wang Nan
  7 siblings, 0 replies; 29+ messages in thread
From: Wang Nan @ 2016-06-15  2:23 UTC (permalink / raw)
  To: acme
  Cc: pi3orama, linux-kernel, Wang Nan, He Kuang,
	Arnaldo Carvalho de Melo, Jiri Olsa, Masami Hiramatsu,
	Namhyung Kim, Zefan Li

If write_backward attribute is set, records are written into kernel
ring buffer from end to beginning, but read from beginning to end.
To avoid 'XX out of order events recorded' warning message (timestamps
of records is in reverse order when using write_backward), suppress the
warning message if write_backward is selected by at lease one event.

Result:

Before this patch:
 # perf record -m 1 -e raw_syscalls:sys_exit/overwrite/ \
                    -e raw_syscalls:sys_enter \
                    dd if=/dev/zero of=/dev/null count=300
 300+0 records in
 300+0 records out
 153600 bytes (154 kB) copied, 0.000601617 s, 255 MB/s
 [ perf record: Woken up 5 times to write data ]
 Warning:
 40 out of order events recorded.
 [ perf record: Captured and wrote 0.096 MB perf.data (696 samples) ]

After this patch:
 # perf record -m 1 -e raw_syscalls:sys_exit/overwrite/ \
                    -e raw_syscalls:sys_enter \
                    dd if=/dev/zero of=/dev/null count=300
 300+0 records in
 300+0 records out
 153600 bytes (154 kB) copied, 0.000644873 s, 238 MB/s
 [ perf record: Woken up 5 times to write data ]
 [ perf record: Captured and wrote 0.096 MB perf.data (696 samples) ]

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/util/session.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index dfedf09..561441e 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1497,10 +1497,27 @@ int perf_session__register_idle_thread(struct perf_session *session)
 	return err;
 }
 
+static void
+perf_session__warn_order(const struct perf_session *session)
+{
+	const struct ordered_events *oe = &session->ordered_events;
+	struct perf_evsel *evsel;
+	bool should_warn = true;
+
+	evlist__for_each(session->evlist, evsel) {
+		if (evsel->attr.write_backward)
+			should_warn = false;
+	}
+
+	if (!should_warn)
+		return;
+	if (oe->nr_unordered_events != 0)
+		ui__warning("%u out of order events recorded.\n", oe->nr_unordered_events);
+}
+
 static void perf_session__warn_about_errors(const struct perf_session *session)
 {
 	const struct events_stats *stats = &session->evlist->stats;
-	const struct ordered_events *oe = &session->ordered_events;
 
 	if (session->tool->lost == perf_event__process_lost &&
 	    stats->nr_events[PERF_RECORD_LOST] != 0) {
@@ -1557,8 +1574,7 @@ static void perf_session__warn_about_errors(const struct perf_session *session)
 			    stats->nr_unprocessable_samples);
 	}
 
-	if (oe->nr_unordered_events != 0)
-		ui__warning("%u out of order events recorded.\n", oe->nr_unordered_events);
+	perf_session__warn_order(session);
 
 	events_stats__auxtrace_error_warn(stats);
 
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v7 7/8] perf tools: Check write_backward during evlist config
  2016-06-15  2:23 [PATCH v7 0/8] perf tools: Support overwritable ring buffer Wang Nan
                   ` (5 preceding siblings ...)
  2016-06-15  2:23 ` [PATCH v7 6/8] perf tools: Don't warn about out of order event if write_backward is used Wang Nan
@ 2016-06-15  2:23 ` Wang Nan
  2016-06-16 21:47   ` Arnaldo Carvalho de Melo
  2016-06-15  2:23 ` [PATCH v7 8/8] perf record: Unmap overwrite evlist when event terminate Wang Nan
  7 siblings, 1 reply; 29+ messages in thread
From: Wang Nan @ 2016-06-15  2:23 UTC (permalink / raw)
  To: acme
  Cc: pi3orama, linux-kernel, Wang Nan, He Kuang,
	Arnaldo Carvalho de Melo, Jiri Olsa, Masami Hiramatsu,
	Namhyung Kim, Zefan Li

Before this patch, when using overwritable ring buffer on an old
kernel, error message is misleading:

 # ~/perf record -m 1 -e raw_syscalls:*/overwrite/ -a
 Error:
 The raw_syscalls:sys_enter event is not supported.

This patch output clear error message to tell user his/her kernel
is too old:

 # ~/perf record -m 1 -e raw_syscalls:*/overwrite/ -a
 Reading from overwrite event is not supported by this kernel
 Error:
 The raw_syscalls:sys_enter event is not supported.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/util/evsel.c  | 17 +++++------------
 tools/perf/util/evsel.h  | 13 +++++++++++++
 tools/perf/util/record.c | 17 +++++++++++++++++
 3 files changed, 35 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index ef223c7..59a69bb 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -29,17 +29,7 @@
 #include "trace-event.h"
 #include "stat.h"
 
-static struct {
-	bool sample_id_all;
-	bool exclude_guest;
-	bool mmap2;
-	bool cloexec;
-	bool clockid;
-	bool clockid_wrong;
-	bool lbr_flags;
-	bool write_backward;
-} perf_missing_features;
-
+struct perf_missing_features perf_missing_features;
 static clockid_t clockid;
 
 static int perf_evsel__no_extra_init(struct perf_evsel *evsel __maybe_unused)
@@ -690,8 +680,11 @@ static void apply_config_terms(struct perf_evsel *evsel,
 	 * possible to set overwrite globally, without config
 	 * terms.
 	 */
-	if (evsel->overwrite)
+	if (evsel->overwrite) {
+		WARN_ONCE(perf_missing_features.write_backward,
+			  "Reading from overwrite event is not supported by this kernel\n");
 		attr->write_backward = 1;
+	}
 
 	/* User explicitly set per-event callgraph, clear the old setting and reset. */
 	if ((callgraph_buf != NULL) || (dump_size > 0) || max_stack) {
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 55ff958..17e506a 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -11,6 +11,19 @@
 #include "cpumap.h"
 #include "counts.h"
 
+struct perf_missing_features {
+	bool sample_id_all;
+	bool exclude_guest;
+	bool mmap2;
+	bool cloexec;
+	bool clockid;
+	bool clockid_wrong;
+	bool lbr_flags;
+	bool write_backward;
+};
+
+extern struct perf_missing_features perf_missing_features;
+
 struct perf_evsel;
 
 /*
diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
index 481792c..e3ab812 100644
--- a/tools/perf/util/record.c
+++ b/tools/perf/util/record.c
@@ -90,6 +90,11 @@ static void perf_probe_context_switch(struct perf_evsel *evsel)
 	evsel->attr.context_switch = 1;
 }
 
+static void perf_probe_write_backward(struct perf_evsel *evsel)
+{
+	evsel->attr.write_backward = 1;
+}
+
 bool perf_can_sample_identifier(void)
 {
 	return perf_probe_api(perf_probe_sample_identifier);
@@ -129,6 +134,17 @@ bool perf_can_record_cpu_wide(void)
 	return true;
 }
 
+static void perf_check_write_backward(void)
+{
+	static bool checked = false;
+
+	if (!checked) {
+		perf_missing_features.write_backward =
+			!perf_probe_api(perf_probe_write_backward);
+		checked = true;
+	}
+}
+
 void perf_evlist__config(struct perf_evlist *evlist, struct record_opts *opts,
 			 struct callchain_param *callchain)
 {
@@ -136,6 +152,7 @@ void perf_evlist__config(struct perf_evlist *evlist, struct record_opts *opts,
 	bool use_sample_identifier = false;
 	bool use_comm_exec;
 
+	perf_check_write_backward();
 	/*
 	 * Set the evsel leader links before we configure attributes,
 	 * since some might depend on this info.
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v7 8/8] perf record: Unmap overwrite evlist when event terminate
  2016-06-15  2:23 [PATCH v7 0/8] perf tools: Support overwritable ring buffer Wang Nan
                   ` (6 preceding siblings ...)
  2016-06-15  2:23 ` [PATCH v7 7/8] perf tools: Check write_backward during evlist config Wang Nan
@ 2016-06-15  2:23 ` Wang Nan
  2016-06-16 20:59   ` Arnaldo Carvalho de Melo
  7 siblings, 1 reply; 29+ messages in thread
From: Wang Nan @ 2016-06-15  2:23 UTC (permalink / raw)
  To: acme
  Cc: pi3orama, linux-kernel, Wang Nan, He Kuang,
	Arnaldo Carvalho de Melo, Jiri Olsa, Masami Hiramatsu,
	Namhyung Kim, Zefan Li

When see POLLERR or POLLHUP, unmap ring buffer from both the main
evlist and overwrite evlist.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-record.c | 30 +++++++++++++++++++++---------
 tools/perf/util/evlist.c    |  3 +--
 tools/perf/util/evlist.h    |  2 +-
 3 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index b9094f0..ca6376c 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -914,6 +914,26 @@ out:
 	return err;
 }
 
+static void record__munmap_filtered(struct fdarray *fda, int fd, void *arg)
+{
+	struct record *rec = (struct record *)arg;
+
+	perf_evlist__mmap_put(rec->evlist, fda->priv[fd].idx);
+	if (rec->overwrite_evlist)
+		perf_evlist__mmap_put(rec->overwrite_evlist, fda->priv[fd].idx);
+}
+
+static int record__filter_pollfd(struct record *rec)
+{
+	/*
+	 * Although we may have auxiliray evlist, there is
+	 * only one pollfd, so we don't need to filter pollfd
+	 * for auxiliray evlist.
+	 */
+	return fdarray__filter(&rec->evlist->pollfd, POLLERR | POLLHUP,
+			       record__munmap_filtered, rec);
+}
+
 static int __cmd_record(struct record *rec, int argc, const char **argv)
 {
 	int err;
@@ -1150,15 +1170,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 				err = 0;
 			waking++;
 
-			/*
-			 * Although we may have auxiliray evlist, there is
-			 * only one pollfd, so we don't need to filter pollfd
-			 * for auxiliray evlist.
-			 *
-			 * TODO: if an event is terminated (POLLERR | POLLHUP),
-			 * unmap mmaps for auxiliray evlist too.
-			 */
-			if (perf_evlist__filter_pollfd(rec->evlist, POLLERR | POLLHUP) == 0)
+			if (record__filter_pollfd(rec) == 0)
 				draining = true;
 		}
 
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index e8fcb22..d43ee81 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -27,7 +27,6 @@
 #include <linux/log2.h>
 #include <linux/err.h>
 
-static void perf_evlist__mmap_put(struct perf_evlist *evlist, int idx);
 static void __perf_evlist__munmap(struct perf_evlist *evlist, int idx);
 
 #define FD(e, x, y) (*(int *)xyarray__entry(e->fd, x, y))
@@ -863,7 +862,7 @@ static void perf_evlist__mmap_get(struct perf_evlist *evlist, int idx)
 	atomic_inc(&evlist->mmap[idx].refcnt);
 }
 
-static void perf_evlist__mmap_put(struct perf_evlist *evlist, int idx)
+void perf_evlist__mmap_put(struct perf_evlist *evlist, int idx)
 {
 	struct perf_mmap *md = &evlist->mmap[idx];
 
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 41e65ac..ba5e006 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -141,7 +141,7 @@ union perf_event *perf_evlist__mmap_read_backward(struct perf_evlist *evlist,
 void perf_evlist__mmap_read_catchup(struct perf_evlist *evlist, int idx);
 
 void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx);
-
+void perf_evlist__mmap_put(struct perf_evlist *evlist, int idx);
 int perf_evlist__pause(struct perf_evlist *evlist);
 int perf_evlist__resume(struct perf_evlist *evlist);
 int perf_evlist__open(struct perf_evlist *evlist);
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v7 8/8] perf record: Unmap overwrite evlist when event terminate
  2016-06-15  2:23 ` [PATCH v7 8/8] perf record: Unmap overwrite evlist when event terminate Wang Nan
@ 2016-06-16 20:59   ` Arnaldo Carvalho de Melo
  2016-06-20  8:04     ` Wangnan (F)
  0 siblings, 1 reply; 29+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-06-16 20:59 UTC (permalink / raw)
  To: Wang Nan
  Cc: acme, pi3orama, linux-kernel, He Kuang, Jiri Olsa,
	Masami Hiramatsu, Namhyung Kim, Zefan Li

Em Wed, Jun 15, 2016 at 02:23:35AM +0000, Wang Nan escreveu:
> When see POLLERR or POLLHUP, unmap ring buffer from both the main
> evlist and overwrite evlist.

When you use an auxiliary evlist this makes evlist->parent be different
than evlist, right? So can't we just hide this from tools and do it all
in perf_evlist__filter_pollfd?

This way we will not need to expose perf_evlist__mmap_put(), etc.

I haven't tried to _actually_ do this, just asking from browsing the
patchkit and with the goal of hiding the internals of this thing inside
evlist.c as much as possible.

- Arnaldo
 
> Signed-off-by: Wang Nan <wangnan0@huawei.com>
> Cc: He Kuang <hekuang@huawei.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Zefan Li <lizefan@huawei.com>
> Cc: pi3orama@163.com
> ---
>  tools/perf/builtin-record.c | 30 +++++++++++++++++++++---------
>  tools/perf/util/evlist.c    |  3 +--
>  tools/perf/util/evlist.h    |  2 +-
>  3 files changed, 23 insertions(+), 12 deletions(-)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index b9094f0..ca6376c 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -914,6 +914,26 @@ out:
>  	return err;
>  }
>  
> +static void record__munmap_filtered(struct fdarray *fda, int fd, void *arg)
> +{
> +	struct record *rec = (struct record *)arg;
> +
> +	perf_evlist__mmap_put(rec->evlist, fda->priv[fd].idx);
> +	if (rec->overwrite_evlist)
> +		perf_evlist__mmap_put(rec->overwrite_evlist, fda->priv[fd].idx);
> +}
> +
> +static int record__filter_pollfd(struct record *rec)
> +{
> +	/*
> +	 * Although we may have auxiliray evlist, there is
> +	 * only one pollfd, so we don't need to filter pollfd
> +	 * for auxiliray evlist.
> +	 */
> +	return fdarray__filter(&rec->evlist->pollfd, POLLERR | POLLHUP,
> +			       record__munmap_filtered, rec);
> +}
> +
>  static int __cmd_record(struct record *rec, int argc, const char **argv)
>  {
>  	int err;
> @@ -1150,15 +1170,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>  				err = 0;
>  			waking++;
>  
> -			/*
> -			 * Although we may have auxiliray evlist, there is
> -			 * only one pollfd, so we don't need to filter pollfd
> -			 * for auxiliray evlist.
> -			 *
> -			 * TODO: if an event is terminated (POLLERR | POLLHUP),
> -			 * unmap mmaps for auxiliray evlist too.
> -			 */
> -			if (perf_evlist__filter_pollfd(rec->evlist, POLLERR | POLLHUP) == 0)
> +			if (record__filter_pollfd(rec) == 0)
>  				draining = true;
>  		}
>  
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index e8fcb22..d43ee81 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -27,7 +27,6 @@
>  #include <linux/log2.h>
>  #include <linux/err.h>
>  
> -static void perf_evlist__mmap_put(struct perf_evlist *evlist, int idx);
>  static void __perf_evlist__munmap(struct perf_evlist *evlist, int idx);
>  
>  #define FD(e, x, y) (*(int *)xyarray__entry(e->fd, x, y))
> @@ -863,7 +862,7 @@ static void perf_evlist__mmap_get(struct perf_evlist *evlist, int idx)
>  	atomic_inc(&evlist->mmap[idx].refcnt);
>  }
>  
> -static void perf_evlist__mmap_put(struct perf_evlist *evlist, int idx)
> +void perf_evlist__mmap_put(struct perf_evlist *evlist, int idx)
>  {
>  	struct perf_mmap *md = &evlist->mmap[idx];
>  
> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> index 41e65ac..ba5e006 100644
> --- a/tools/perf/util/evlist.h
> +++ b/tools/perf/util/evlist.h
> @@ -141,7 +141,7 @@ union perf_event *perf_evlist__mmap_read_backward(struct perf_evlist *evlist,
>  void perf_evlist__mmap_read_catchup(struct perf_evlist *evlist, int idx);
>  
>  void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx);
> -
> +void perf_evlist__mmap_put(struct perf_evlist *evlist, int idx);
>  int perf_evlist__pause(struct perf_evlist *evlist);
>  int perf_evlist__resume(struct perf_evlist *evlist);
>  int perf_evlist__open(struct perf_evlist *evlist);
> -- 
> 1.8.3.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v7 7/8] perf tools: Check write_backward during evlist config
  2016-06-15  2:23 ` [PATCH v7 7/8] perf tools: Check write_backward during evlist config Wang Nan
@ 2016-06-16 21:47   ` Arnaldo Carvalho de Melo
  2016-06-20  4:09     ` Wangnan (F)
  2016-06-22  7:43     ` [tip:perf/core] perf evsel: Fix write_backwards fallback tip-bot for Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 29+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-06-16 21:47 UTC (permalink / raw)
  To: Wang Nan
  Cc: pi3orama, linux-kernel, He Kuang, Arnaldo Carvalho de Melo,
	Jiri Olsa, Masami Hiramatsu, Namhyung Kim, Zefan Li

Em Wed, Jun 15, 2016 at 02:23:34AM +0000, Wang Nan escreveu:
> Before this patch, when using overwritable ring buffer on an old
> kernel, error message is misleading:
> 
>  # ~/perf record -m 1 -e raw_syscalls:*/overwrite/ -a
>  Error:
>  The raw_syscalls:sys_enter event is not supported.
> 
> This patch output clear error message to tell user his/her kernel
> is too old:
> 
>  # ~/perf record -m 1 -e raw_syscalls:*/overwrite/ -a
>  Reading from overwrite event is not supported by this kernel
>  Error:
>  The raw_syscalls:sys_enter event is not supported.

So I went to see if exposing that missing_features struct outside
evsel.c was strictly needed and found that we already have fallbacking
for this feature (attr.write_backwards) i.e. if we set it and
sys_perf_event_open() fails, we will check if we are asking the kernel
for some attr. field that it doesn't supports, set that missing_features
and try again.

But the way this was done for attr.write_backwards was buggy, as we need
to check features in the inverse order of their introduction to the
kernel, so that a newer tool checks first the newest perf_event_attr
fields, detecting that the older kernel doesn't have support for them.
The patch that introduced write_backwards support ([1]) in perf_evsel__open()
did this checking after all the other older attributes, wrongly.

[1]: b90dc17a5d14 ("perf evsel: Add overwrite attribute and check write_backward")

Also we shouldn't even try to call sys_perf_event_open if
perf_missing_features.write_backward is true and evsel->overwrite is
also true, the old code would check this only after successfully opening
the fd, do it before the open loop.

Please take a look at the following patch, see if it is sufficient for
handling older kernels, probably we need to emit a message to the user,
but that has to be done at the builtin- level, i.e. at the tool, i.e.
perf_evsel_open__strerror() should have what it takes to figure out this
extra error and provide/ a proper string, lemme add this to the patch...
done, please check:

write_backwards_fallback.patch:

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 9b2e3e624efe..4e8e6d8795d0 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1389,8 +1389,11 @@ fallback_missing_features:
 	if (perf_missing_features.lbr_flags)
 		evsel->attr.branch_sample_type &= ~(PERF_SAMPLE_BRANCH_NO_FLAGS |
 				     PERF_SAMPLE_BRANCH_NO_CYCLES);
-	if (perf_missing_features.write_backward)
+	if (perf_missing_features.write_backward) {
+		if (evsel->overwrite)
+			return -EINVAL;
 		evsel->attr.write_backward = false;
+	}
 retry_sample_id:
 	if (perf_missing_features.sample_id_all)
 		evsel->attr.sample_id_all = 0;
@@ -1453,12 +1456,6 @@ retry_open:
 				err = -EINVAL;
 				goto out_close;
 			}
-
-			if (evsel->overwrite &&
-			    perf_missing_features.write_backward) {
-				err = -EINVAL;
-				goto out_close;
-			}
 		}
 	}
 
@@ -1496,7 +1493,10 @@ try_fallback:
 	 * Must probe features in the order they were added to the
 	 * perf_event_attr interface.
 	 */
-	if (!perf_missing_features.clockid_wrong && evsel->attr.use_clockid) {
+	if (!perf_missing_features.write_backward && evsel->attr.write_backward) {
+		perf_missing_features.write_backward = true;
+		goto fallback_missing_features;
+	} else if (!perf_missing_features.clockid_wrong && evsel->attr.use_clockid) {
 		perf_missing_features.clockid_wrong = true;
 		goto fallback_missing_features;
 	} else if (!perf_missing_features.clockid && evsel->attr.use_clockid) {
@@ -1521,10 +1521,6 @@ try_fallback:
 			  PERF_SAMPLE_BRANCH_NO_FLAGS))) {
 		perf_missing_features.lbr_flags = true;
 		goto fallback_missing_features;
-	} else if (!perf_missing_features.write_backward &&
-			evsel->attr.write_backward) {
-		perf_missing_features.write_backward = true;
-		goto fallback_missing_features;
 	}
 
 out_close:
@@ -2409,6 +2405,8 @@ int perf_evsel__open_strerror(struct perf_evsel *evsel, struct target *target,
 	"We found oprofile daemon running, please stop it and try again.");
 		break;
 	case EINVAL:
+		if (evsel->overwrite && perf_missing_features.write_backward)
+			return scnprintf(msg, size, "Reading from overwrite event is not supported by this kernel.");
 		if (perf_missing_features.clockid)
 			return scnprintf(msg, size, "clockid feature not supported.");
 		if (perf_missing_features.clockid_wrong)

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v7 7/8] perf tools: Check write_backward during evlist config
  2016-06-16 21:47   ` Arnaldo Carvalho de Melo
@ 2016-06-20  4:09     ` Wangnan (F)
  2016-06-22  7:43     ` [tip:perf/core] perf evsel: Fix write_backwards fallback tip-bot for Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 29+ messages in thread
From: Wangnan (F) @ 2016-06-20  4:09 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: pi3orama, linux-kernel, He Kuang, Arnaldo Carvalho de Melo,
	Jiri Olsa, Masami Hiramatsu, Namhyung Kim, Zefan Li



On 2016/6/17 5:47, Arnaldo Carvalho de Melo wrote:
> Em Wed, Jun 15, 2016 at 02:23:34AM +0000, Wang Nan escreveu:
>> Before this patch, when using overwritable ring buffer on an old
>> kernel, error message is misleading:
>>
>>   # ~/perf record -m 1 -e raw_syscalls:*/overwrite/ -a
>>   Error:
>>   The raw_syscalls:sys_enter event is not supported.
>>
>> This patch output clear error message to tell user his/her kernel
>> is too old:
>>
>>   # ~/perf record -m 1 -e raw_syscalls:*/overwrite/ -a
>>   Reading from overwrite event is not supported by this kernel
>>   Error:
>>   The raw_syscalls:sys_enter event is not supported.
> So I went to see if exposing that missing_features struct outside
> evsel.c was strictly needed and found that we already have fallbacking
> for this feature (attr.write_backwards) i.e. if we set it and
> sys_perf_event_open() fails, we will check if we are asking the kernel
> for some attr. field that it doesn't supports, set that missing_features
> and try again.
>
> But the way this was done for attr.write_backwards was buggy, as we need
> to check features in the inverse order of their introduction to the
> kernel, so that a newer tool checks first the newest perf_event_attr
> fields, detecting that the older kernel doesn't have support for them.
> The patch that introduced write_backwards support ([1]) in perf_evsel__open()
> did this checking after all the other older attributes, wrongly.
>
> [1]: b90dc17a5d14 ("perf evsel: Add overwrite attribute and check write_backward")
>
> Also we shouldn't even try to call sys_perf_event_open if
> perf_missing_features.write_backward is true and evsel->overwrite is
> also true, the old code would check this only after successfully opening
> the fd, do it before the open loop.
>
> Please take a look at the following patch, see if it is sufficient for
> handling older kernels, probably we need to emit a message to the user,
> but that has to be done at the builtin- level, i.e. at the tool, i.e.
> perf_evsel_open__strerror() should have what it takes to figure out this
> extra error and provide/ a proper string, lemme add this to the patch...
> done, please check:
>
> write_backwards_fallback.patch:

[SNIP]

>   
> @@ -1496,7 +1493,10 @@ try_fallback:
>   	 * Must probe features in the order they were added to the
>   	 * perf_event_attr interface.
>   	 */

I read this comment but misunderstand. I thought 'order' means newest last.

Will try your patch. Thank you.

> -	if (!perf_missing_features.clockid_wrong && evsel->attr.use_clockid) {
> +	if (!perf_missing_features.write_backward && evsel->attr.write_backward) {
> +		perf_missing_features.write_backward = true;
> +		goto fallback_missing_features;
> +	} else if (!perf_missing_features.clockid_wrong && evsel->attr.use_clockid) {
>   		perf_missing_features.clockid_wrong = true;
>   		goto fallback_missing_features;
>   	} else if (!perf_missing_features.clockid && evsel->attr.use_clockid) {
> @@ -1521,10 +1521,6 @@ try_fallback:
>   			  PERF_SAMPLE_BRANCH_NO_FLAGS))) {
>   		perf_missing_features.lbr_flags = true;
>   		goto fallback_missing_features;
> -	} else if (!perf_missing_features.write_backward &&
> -			evsel->attr.write_backward) {
> -		perf_missing_features.write_backward = true;
> -		goto fallback_missing_features;
>   	}
>   
>   out_close:
> @@ -2409,6 +2405,8 @@ int perf_evsel__open_strerror(struct perf_evsel *evsel, struct target *target,
>   	"We found oprofile daemon running, please stop it and try again.");
>   		break;
>   	case EINVAL:
> +		if (evsel->overwrite && perf_missing_features.write_backward)
> +			return scnprintf(msg, size, "Reading from overwrite event is not supported by this kernel.");
>   		if (perf_missing_features.clockid)
>   			return scnprintf(msg, size, "clockid feature not supported.");
>   		if (perf_missing_features.clockid_wrong)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v7 8/8] perf record: Unmap overwrite evlist when event terminate
  2016-06-16 20:59   ` Arnaldo Carvalho de Melo
@ 2016-06-20  8:04     ` Wangnan (F)
  0 siblings, 0 replies; 29+ messages in thread
From: Wangnan (F) @ 2016-06-20  8:04 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: acme, pi3orama, linux-kernel, He Kuang, Jiri Olsa,
	Masami Hiramatsu, Namhyung Kim, Zefan Li



On 2016/6/17 4:59, Arnaldo Carvalho de Melo wrote:
> Em Wed, Jun 15, 2016 at 02:23:35AM +0000, Wang Nan escreveu:
>> When see POLLERR or POLLHUP, unmap ring buffer from both the main
>> evlist and overwrite evlist.
> When you use an auxiliary evlist this makes evlist->parent be different
> than evlist, right? So can't we just hide this from tools and do it all
> in perf_evlist__filter_pollfd?

So we need to find all children from parent. Currently we can only track
parent from children. This is not enough. Although currently we can pass
rec->overwrite_evlist to the filter so perf_evlist__filter_pollfd knows
it should release mmaps from both evlists, if in future we have more than
one aux_evlists it doesn't work.

Thank you.


> This way we will not need to expose perf_evlist__mmap_put(), etc.
>
> I haven't tried to _actually_ do this, just asking from browsing the
> patchkit and with the goal of hiding the internals of this thing inside
> evlist.c as much as possible.
>
> - Arnaldo
>   
>> Signed-off-by: Wang Nan <wangnan0@huawei.com>
>> Cc: He Kuang <hekuang@huawei.com>
>> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
>> Cc: Jiri Olsa <jolsa@kernel.org>
>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
>> Cc: Namhyung Kim <namhyung@kernel.org>
>> Cc: Zefan Li <lizefan@huawei.com>
>> Cc: pi3orama@163.com
>> ---
>>   tools/perf/builtin-record.c | 30 +++++++++++++++++++++---------
>>   tools/perf/util/evlist.c    |  3 +--
>>   tools/perf/util/evlist.h    |  2 +-
>>   3 files changed, 23 insertions(+), 12 deletions(-)
>>
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index b9094f0..ca6376c 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -914,6 +914,26 @@ out:
>>   	return err;
>>   }
>>   
>> +static void record__munmap_filtered(struct fdarray *fda, int fd, void *arg)
>> +{
>> +	struct record *rec = (struct record *)arg;
>> +
>> +	perf_evlist__mmap_put(rec->evlist, fda->priv[fd].idx);
>> +	if (rec->overwrite_evlist)
>> +		perf_evlist__mmap_put(rec->overwrite_evlist, fda->priv[fd].idx);
>> +}
>> +
>> +static int record__filter_pollfd(struct record *rec)
>> +{
>> +	/*
>> +	 * Although we may have auxiliray evlist, there is
>> +	 * only one pollfd, so we don't need to filter pollfd
>> +	 * for auxiliray evlist.
>> +	 */
>> +	return fdarray__filter(&rec->evlist->pollfd, POLLERR | POLLHUP,
>> +			       record__munmap_filtered, rec);
>> +}
>> +
>>   static int __cmd_record(struct record *rec, int argc, const char **argv)
>>   {
>>   	int err;
>> @@ -1150,15 +1170,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>>   				err = 0;
>>   			waking++;
>>   
>> -			/*
>> -			 * Although we may have auxiliray evlist, there is
>> -			 * only one pollfd, so we don't need to filter pollfd
>> -			 * for auxiliray evlist.
>> -			 *
>> -			 * TODO: if an event is terminated (POLLERR | POLLHUP),
>> -			 * unmap mmaps for auxiliray evlist too.
>> -			 */
>> -			if (perf_evlist__filter_pollfd(rec->evlist, POLLERR | POLLHUP) == 0)
>> +			if (record__filter_pollfd(rec) == 0)
>>   				draining = true;
>>   		}
>>   
>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
>> index e8fcb22..d43ee81 100644
>> --- a/tools/perf/util/evlist.c
>> +++ b/tools/perf/util/evlist.c
>> @@ -27,7 +27,6 @@
>>   #include <linux/log2.h>
>>   #include <linux/err.h>
>>   
>> -static void perf_evlist__mmap_put(struct perf_evlist *evlist, int idx);
>>   static void __perf_evlist__munmap(struct perf_evlist *evlist, int idx);
>>   
>>   #define FD(e, x, y) (*(int *)xyarray__entry(e->fd, x, y))
>> @@ -863,7 +862,7 @@ static void perf_evlist__mmap_get(struct perf_evlist *evlist, int idx)
>>   	atomic_inc(&evlist->mmap[idx].refcnt);
>>   }
>>   
>> -static void perf_evlist__mmap_put(struct perf_evlist *evlist, int idx)
>> +void perf_evlist__mmap_put(struct perf_evlist *evlist, int idx)
>>   {
>>   	struct perf_mmap *md = &evlist->mmap[idx];
>>   
>> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
>> index 41e65ac..ba5e006 100644
>> --- a/tools/perf/util/evlist.h
>> +++ b/tools/perf/util/evlist.h
>> @@ -141,7 +141,7 @@ union perf_event *perf_evlist__mmap_read_backward(struct perf_evlist *evlist,
>>   void perf_evlist__mmap_read_catchup(struct perf_evlist *evlist, int idx);
>>   
>>   void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx);
>> -
>> +void perf_evlist__mmap_put(struct perf_evlist *evlist, int idx);
>>   int perf_evlist__pause(struct perf_evlist *evlist);
>>   int perf_evlist__resume(struct perf_evlist *evlist);
>>   int perf_evlist__open(struct perf_evlist *evlist);
>> -- 
>> 1.8.3.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v8 0/8] perf tools: Support overwritable ring buffer
@ 2016-06-20 10:47 Wang Nan
  2016-06-20 10:47 ` [PATCH v8 1/8] perf tools: Fix write_backwards fallback Wang Nan
                   ` (7 more replies)
  0 siblings, 8 replies; 29+ messages in thread
From: Wang Nan @ 2016-06-20 10:47 UTC (permalink / raw)
  To: acme; +Cc: linux-kernel, pi3orama, Wang Nan

This patch set enables daemonized perf recording by utilizing
overwritable backward ring buffer. With this feature one can
put perf background, and dump ring buffer records by a SIGUSR2
when he/she find something unusual. For example, following
command record system calls, schedule events and samples on cpu cycles
continously:

 # perf record -g -e cycles -e raw_syscalls:*/call-graph=no/ \
                  -e sched:sched_switch/call-graph=no/ \
                  --switch-output --overwrite -a

Then by sending SIGUSR2 to perf when lagging is happen, we get multiple
perf.data output, each of them correspond a abnormal event, and the data
size is reasonable:

 # ls -l ./perf.data*
 -rw------- 1 root root 5122165 May 13 23:51 ./perf.data.2016051323511683
 -rw------- 1 root root 5135093 May 13 23:51 ./perf.data.2016051323512107
 -rw------- 1 root root 5135213 May 13 23:51 ./perf.data.2016051323512215
 -rw------- 1 root root 5135157 May 13 23:51 ./perf.data.2016051323512387

v1 -> v2: Totally redesign: drop the principle of 'channal', use
          auxiliary evlist instead. Fix missing documentation.

v2 -> v3: Rename perf_evlist__toggle_paused() to perf_evlist__pause/resume.

v3 -> v4: Update commit message to describe auxiliary evlist more clearly.

v4 -> v5: Reorder commits, ensure '--overwrite' works right after perf
          support the option.
          Add test cases for auxiliary evlist.
          Avoid bug if main evlist is empty.

v5 -> v6: Improve filter pollfd related code.

v6 -> v7: Rebase to newest perf/core.

v7 -> v8: Unmap mmaps from parent and children in
          perf_evlist__munmap_filtered(), hide more detail of aux evlist.
          Add --tail-synthesize, do synthesize at the end of perf.data.

Wang Nan (8):
  perf tools: Fix write_backwards fallback
  perf evlist: Introduce aux evlist
  perf tests: Add testcase for auxiliary evlist
  perf record: Introduce rec->overwrite_evlist for overwritable events
  perf record: Toggle overwrite ring buffer for reading
  perf tools: Enable overwrite settings
  perf tools: Don't warn about out of order event if write_backward is
    used
  perf tools: Add --tail-synthesize option

 tools/perf/Documentation/perf-record.txt |  22 +++
 tools/perf/builtin-record.c              | 310 +++++++++++++++++++++++++++----
 tools/perf/perf.h                        |   2 +
 tools/perf/tests/backward-ring-buffer.c  |  86 +++++++--
 tools/perf/util/evlist.c                 |  49 +++--
 tools/perf/util/evlist.h                 |  12 ++
 tools/perf/util/evsel.c                  |  35 ++--
 tools/perf/util/evsel.h                  |   2 +
 tools/perf/util/parse-events.c           |  20 +-
 tools/perf/util/parse-events.h           |   2 +
 tools/perf/util/parse-events.l           |   2 +
 tools/perf/util/session.c                |  22 ++-
 12 files changed, 476 insertions(+), 88 deletions(-)

-- 
1.8.3.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v8 1/8] perf tools: Fix write_backwards fallback
  2016-06-20 10:47 [PATCH v8 0/8] perf tools: Support overwritable ring buffer Wang Nan
@ 2016-06-20 10:47 ` Wang Nan
  2016-06-20 10:47 ` [PATCH v8 2/8] perf evlist: Introduce aux evlist Wang Nan
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Wang Nan @ 2016-06-20 10:47 UTC (permalink / raw)
  To: acme
  Cc: linux-kernel, pi3orama, Wang Nan, He Kuang, Jiri Olsa,
	Masami Hiramatsu, Namhyung Kim, Zefan Li

Commit b90dc17a5d14 "perf evsel: Add overwrite attribute and check
write_backward" misunderstood the 'order' should be obeyed in
__perf_evsel__open.

Arnaldo in [1]:
"But the way this was done for attr.write_backwards was buggy, as we need
to check features in the inverse order of their introduction to the
kernel, so that a newer tool checks first the newest perf_event_attr
fields, detecting that the older kernel doesn't have support for them."

Also, we can avoid calling sys_perf_event_open() if we have already
detected the missing of write_backward.

[1] http://lkml.kernel.org/r/20160616214724.GI13337@kernel.org

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/util/evsel.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 9b2e3e6..1d8f2bb 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1389,8 +1389,11 @@ fallback_missing_features:
 	if (perf_missing_features.lbr_flags)
 		evsel->attr.branch_sample_type &= ~(PERF_SAMPLE_BRANCH_NO_FLAGS |
 				     PERF_SAMPLE_BRANCH_NO_CYCLES);
-	if (perf_missing_features.write_backward)
+	if (perf_missing_features.write_backward) {
+		if (evsel->overwrite)
+			return -EINVAL;
 		evsel->attr.write_backward = false;
+	}
 retry_sample_id:
 	if (perf_missing_features.sample_id_all)
 		evsel->attr.sample_id_all = 0;
@@ -1453,12 +1456,6 @@ retry_open:
 				err = -EINVAL;
 				goto out_close;
 			}
-
-			if (evsel->overwrite &&
-			    perf_missing_features.write_backward) {
-				err = -EINVAL;
-				goto out_close;
-			}
 		}
 	}
 
@@ -1496,7 +1493,10 @@ try_fallback:
 	 * Must probe features in the order they were added to the
 	 * perf_event_attr interface.
 	 */
-	if (!perf_missing_features.clockid_wrong && evsel->attr.use_clockid) {
+	if (!perf_missing_features.write_backward && evsel->attr.write_backward) {
+		perf_missing_features.write_backward = true;
+		goto fallback_missing_features;
+	} else if (!perf_missing_features.clockid_wrong && evsel->attr.use_clockid) {
 		perf_missing_features.clockid_wrong = true;
 		goto fallback_missing_features;
 	} else if (!perf_missing_features.clockid && evsel->attr.use_clockid) {
@@ -1521,12 +1521,7 @@ try_fallback:
 			  PERF_SAMPLE_BRANCH_NO_FLAGS))) {
 		perf_missing_features.lbr_flags = true;
 		goto fallback_missing_features;
-	} else if (!perf_missing_features.write_backward &&
-			evsel->attr.write_backward) {
-		perf_missing_features.write_backward = true;
-		goto fallback_missing_features;
 	}
-
 out_close:
 	do {
 		while (--thread >= 0) {
@@ -2409,6 +2404,8 @@ int perf_evsel__open_strerror(struct perf_evsel *evsel, struct target *target,
 	"We found oprofile daemon running, please stop it and try again.");
 		break;
 	case EINVAL:
+		if (evsel->overwrite && perf_missing_features.write_backward)
+			return scnprintf(msg, size, "Reading from overwrite event is not supported by this kernel.");
 		if (perf_missing_features.clockid)
 			return scnprintf(msg, size, "clockid feature not supported.");
 		if (perf_missing_features.clockid_wrong)
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v8 2/8] perf evlist: Introduce aux evlist
  2016-06-20 10:47 [PATCH v8 0/8] perf tools: Support overwritable ring buffer Wang Nan
  2016-06-20 10:47 ` [PATCH v8 1/8] perf tools: Fix write_backwards fallback Wang Nan
@ 2016-06-20 10:47 ` Wang Nan
  2016-06-20 20:36   ` Arnaldo Carvalho de Melo
  2016-06-20 10:47 ` [PATCH v8 3/8] perf tests: Add testcase for auxiliary evlist Wang Nan
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: Wang Nan @ 2016-06-20 10:47 UTC (permalink / raw)
  To: acme
  Cc: linux-kernel, pi3orama, Wang Nan, He Kuang, Jiri Olsa,
	Masami Hiramatsu, Namhyung Kim, Zefan Li

An auxiliary evlist is created by perf_evlist__new_aux() using an
existing evlist as its parent. An auxiliary evlist can have its own
'struct perf_mmap', but can't have any other data. User should use its
parent instead when accessing other data.

Auxiliary evlists are containers of 'struct perf_mmap'. It is introduced
to allow its parent evlist to map different events into separated mmaps.

Following commits create an auxiliary evlist for overwritable
events, because overwritable events need a read only and backwards ring
buffer, which is different from normal events.

To achieve this goal, this patch carefully changes 'evlist' to
'evlist->parent' in all functions in the path of 'perf_evlist__mmap_ex',
except 'evlist->mmap' related operations, to make sure all evlist
modifications (like pollfd and event id hash tables) goes to original
evlist.

A 'evlist->parent' pointer is added to 'struct perf_evlist' and points to
the evlist itself for normal evlists.

Children of one evlist are linked into it so one can find all children
from its parent.

To avoid potential complexity, forbid creating aux evlist from another
aux evlist.

Improve perf_evlist__munmap_filtered(), so when recording, if an event
is terminated, unmap mmaps, from parent and children.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/util/evlist.c | 49 +++++++++++++++++++++++++++++++++++++-----------
 tools/perf/util/evlist.h | 12 ++++++++++++
 2 files changed, 50 insertions(+), 11 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 1b918aa..b03cb3c 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -41,10 +41,12 @@ void perf_evlist__init(struct perf_evlist *evlist, struct cpu_map *cpus,
 	for (i = 0; i < PERF_EVLIST__HLIST_SIZE; ++i)
 		INIT_HLIST_HEAD(&evlist->heads[i]);
 	INIT_LIST_HEAD(&evlist->entries);
+	INIT_LIST_HEAD(&evlist->children);
 	perf_evlist__set_maps(evlist, cpus, threads);
 	fdarray__init(&evlist->pollfd, 64);
 	evlist->workload.pid = -1;
 	evlist->backward = false;
+	evlist->parent = evlist;
 }
 
 struct perf_evlist *perf_evlist__new(void)
@@ -487,13 +489,17 @@ static void perf_evlist__munmap_filtered(struct fdarray *fda, int fd,
 					 void *arg __maybe_unused)
 {
 	struct perf_evlist *evlist = container_of(fda, struct perf_evlist, pollfd);
+	struct perf_evlist *child;
 
 	perf_evlist__mmap_put(evlist, fda->priv[fd].idx);
+	list_for_each_entry(child, &evlist->children, list)
+		perf_evlist__mmap_put(child, fda->priv[fd].idx);
+
 }
 
 int perf_evlist__filter_pollfd(struct perf_evlist *evlist, short revents_and_mask)
 {
-	return fdarray__filter(&evlist->pollfd, revents_and_mask,
+	return fdarray__filter(&evlist->parent->pollfd, revents_and_mask,
 			       perf_evlist__munmap_filtered, NULL);
 }
 
@@ -1012,7 +1018,7 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
 	struct perf_evsel *evsel;
 	int revent;
 
-	evlist__for_each(evlist, evsel) {
+	evlist__for_each(evlist->parent, evsel) {
 		int fd;
 
 		if (evsel->overwrite != (evlist->overwrite && evlist->backward))
@@ -1044,16 +1050,16 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
 		 * Therefore don't add it for polling.
 		 */
 		if (!evsel->system_wide &&
-		    __perf_evlist__add_pollfd(evlist, fd, idx, revent) < 0) {
+		    __perf_evlist__add_pollfd(evlist->parent, fd, idx, revent) < 0) {
 			perf_evlist__mmap_put(evlist, idx);
 			return -1;
 		}
 
 		if (evsel->attr.read_format & PERF_FORMAT_ID) {
-			if (perf_evlist__id_add_fd(evlist, evsel, cpu, thread,
+			if (perf_evlist__id_add_fd(evlist->parent, evsel, cpu, thread,
 						   fd) < 0)
 				return -1;
-			perf_evlist__set_sid_idx(evlist, evsel, idx, cpu,
+			perf_evlist__set_sid_idx(evlist->parent, evsel, idx, cpu,
 						 thread);
 		}
 	}
@@ -1094,13 +1100,13 @@ static int perf_evlist__mmap_per_thread(struct perf_evlist *evlist,
 					struct mmap_params *mp)
 {
 	int thread;
-	int nr_threads = thread_map__nr(evlist->threads);
+	int nr_threads = thread_map__nr(evlist->parent->threads);
 
 	pr_debug2("perf event ring buffer mmapped per thread\n");
 	for (thread = 0; thread < nr_threads; thread++) {
 		int output = -1;
 
-		auxtrace_mmap_params__set_idx(&mp->auxtrace_mp, evlist, thread,
+		auxtrace_mmap_params__set_idx(&mp->auxtrace_mp, evlist->parent, thread,
 					      false);
 
 		if (perf_evlist__mmap_per_evsel(evlist, thread, mp, 0, thread,
@@ -1239,8 +1245,8 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 			 bool auxtrace_overwrite)
 {
 	struct perf_evsel *evsel;
-	const struct cpu_map *cpus = evlist->cpus;
-	const struct thread_map *threads = evlist->threads;
+	const struct cpu_map *cpus = evlist->parent->cpus;
+	const struct thread_map *threads = evlist->parent->threads;
 	struct mmap_params mp = {
 		.prot = PROT_READ | (overwrite ? 0 : PROT_WRITE),
 	};
@@ -1248,7 +1254,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 	if (evlist->mmap == NULL && perf_evlist__alloc_mmap(evlist) < 0)
 		return -ENOMEM;
 
-	if (evlist->pollfd.entries == NULL && perf_evlist__alloc_pollfd(evlist) < 0)
+	if (evlist->parent->pollfd.entries == NULL && perf_evlist__alloc_pollfd(evlist->parent) < 0)
 		return -ENOMEM;
 
 	evlist->overwrite = overwrite;
@@ -1259,7 +1265,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
 	auxtrace_mmap_params__init(&mp.auxtrace_mp, evlist->mmap_len,
 				   auxtrace_pages, auxtrace_overwrite);
 
-	evlist__for_each(evlist, evsel) {
+	evlist__for_each(evlist->parent, evsel) {
 		if ((evsel->attr.read_format & PERF_FORMAT_ID) &&
 		    evsel->sample_id == NULL &&
 		    perf_evsel__alloc_id(evsel, cpu_map__nr(cpus), threads->nr) < 0)
@@ -1916,3 +1922,24 @@ perf_evlist__find_evsel_by_str(struct perf_evlist *evlist,
 
 	return NULL;
 }
+
+struct perf_evlist *perf_evlist__new_aux(struct perf_evlist *parent)
+{
+	struct perf_evlist *evlist;
+
+	if (perf_evlist__is_aux(parent)) {
+		pr_err("Internal error: create aux evlist from another aux evlist\n");
+		return NULL;
+	}
+
+	evlist = zalloc(sizeof(*evlist));
+	if (!evlist)
+		return NULL;
+
+	perf_evlist__init(evlist, parent->cpus, parent->threads);
+	evlist->parent = parent->parent;
+	INIT_LIST_HEAD(&evlist->list);
+	list_add(&evlist->list, &parent->children);
+
+	return evlist;
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 68cb136..5b50692 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -37,6 +37,10 @@ struct perf_mmap {
 
 struct perf_evlist {
 	struct list_head entries;
+	union {
+		struct list_head children;
+		struct list_head list;
+	};
 	struct hlist_head heads[PERF_EVLIST__HLIST_SIZE];
 	int		 nr_entries;
 	int		 nr_groups;
@@ -60,8 +64,14 @@ struct perf_evlist {
 	struct perf_evsel *selected;
 	struct events_stats stats;
 	struct perf_env	*env;
+	struct perf_evlist *parent;
 };
 
+static inline bool perf_evlist__is_aux(struct perf_evlist *evlist)
+{
+	return evlist->parent != evlist;
+}
+
 struct perf_evsel_str_handler {
 	const char *name;
 	void	   *handler;
@@ -70,6 +80,8 @@ struct perf_evsel_str_handler {
 struct perf_evlist *perf_evlist__new(void);
 struct perf_evlist *perf_evlist__new_default(void);
 struct perf_evlist *perf_evlist__new_dummy(void);
+struct perf_evlist *perf_evlist__new_aux(struct perf_evlist *);
+
 void perf_evlist__init(struct perf_evlist *evlist, struct cpu_map *cpus,
 		       struct thread_map *threads);
 void perf_evlist__exit(struct perf_evlist *evlist);
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v8 3/8] perf tests: Add testcase for auxiliary evlist
  2016-06-20 10:47 [PATCH v8 0/8] perf tools: Support overwritable ring buffer Wang Nan
  2016-06-20 10:47 ` [PATCH v8 1/8] perf tools: Fix write_backwards fallback Wang Nan
  2016-06-20 10:47 ` [PATCH v8 2/8] perf evlist: Introduce aux evlist Wang Nan
@ 2016-06-20 10:47 ` Wang Nan
  2016-06-21 21:05   ` Arnaldo Carvalho de Melo
  2016-06-20 10:47 ` [PATCH v8 4/8] perf record: Introduce rec->overwrite_evlist for overwritable events Wang Nan
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: Wang Nan @ 2016-06-20 10:47 UTC (permalink / raw)
  To: acme
  Cc: linux-kernel, pi3orama, Wang Nan, Arnaldo Carvalho de Melo,
	Jiri Olsa, Masami Hiramatsu, Namhyung Kim, Zefan Li, He Kuang

Improve test backward-ring-buffer, trace both enter and exit event of
prctl() syscall, utilize auxiliary evlist to mmap enter and exit event
into separated mmaps.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
---
 tools/perf/tests/backward-ring-buffer.c | 87 ++++++++++++++++++++++++++-------
 1 file changed, 68 insertions(+), 19 deletions(-)

diff --git a/tools/perf/tests/backward-ring-buffer.c b/tools/perf/tests/backward-ring-buffer.c
index d9ba991..76e34c0 100644
--- a/tools/perf/tests/backward-ring-buffer.c
+++ b/tools/perf/tests/backward-ring-buffer.c
@@ -31,16 +31,19 @@ static int count_samples(struct perf_evlist *evlist, int *sample_count,
 	for (i = 0; i < evlist->nr_mmaps; i++) {
 		union perf_event *event;
 
-		perf_evlist__mmap_read_catchup(evlist, i);
-		while ((event = perf_evlist__mmap_read_backward(evlist, i)) != NULL) {
+		if (evlist->backward)
+			perf_evlist__mmap_read_catchup(evlist, i);
+		while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) {
 			const u32 type = event->header.type;
 
 			switch (type) {
 			case PERF_RECORD_SAMPLE:
-				(*sample_count)++;
+				if (sample_count)
+					(*sample_count)++;
 				break;
 			case PERF_RECORD_COMM:
-				(*comm_count)++;
+				if (comm_count)
+					(*comm_count)++;
 				break;
 			default:
 				pr_err("Unexpected record of type %d\n", type);
@@ -51,34 +54,53 @@ static int count_samples(struct perf_evlist *evlist, int *sample_count,
 	return TEST_OK;
 }
 
-static int do_test(struct perf_evlist *evlist, int mmap_pages,
-		   int *sample_count, int *comm_count)
+static int do_test(struct perf_evlist *evlist,
+		   struct perf_evlist *aux_evlist,
+		   int mmap_pages,
+		   int *enter_sample_count,
+		   int *exit_sample_count,
+		   int *comm_count)
 {
 	int err;
 	char sbuf[STRERR_BUFSIZE];
 
-	err = perf_evlist__mmap(evlist, mmap_pages, true);
+	err = perf_evlist__mmap(evlist, mmap_pages, false);
 	if (err < 0) {
 		pr_debug("perf_evlist__mmap: %s\n",
 			 strerror_r(errno, sbuf, sizeof(sbuf)));
 		return TEST_FAIL;
 	}
 
+	err = perf_evlist__mmap(aux_evlist, mmap_pages, true);
+	if (err < 0) {
+		pr_debug("perf_evlist__mmap for aux_evlist: %s\n",
+			 strerror_r(errno, sbuf, sizeof(sbuf)));
+		return TEST_FAIL;
+	}
+
 	perf_evlist__enable(evlist);
 	testcase();
 	perf_evlist__disable(evlist);
 
-	err = count_samples(evlist, sample_count, comm_count);
+	err = count_samples(aux_evlist, exit_sample_count, comm_count);
+	if (err)
+		goto errout;
+	err = count_samples(evlist, enter_sample_count, NULL);
+	if (err)
+		goto errout;
+errout:
 	perf_evlist__munmap(evlist);
+	perf_evlist__munmap(aux_evlist);
 	return err;
 }
 
 
 int test__backward_ring_buffer(int subtest __maybe_unused)
 {
-	int ret = TEST_SKIP, err, sample_count = 0, comm_count = 0;
+	int ret = TEST_SKIP, err;
+	int enter_sample_count = 0, exit_sample_count = 0, comm_count = 0;
 	char pid[16], sbuf[STRERR_BUFSIZE];
-	struct perf_evlist *evlist;
+	struct perf_evlist *evlist, *aux_evlist = NULL;
 	struct perf_evsel *evsel __maybe_unused;
 	struct parse_events_error parse_error;
 	struct record_opts opts = {
@@ -115,11 +137,22 @@ int test__backward_ring_buffer(int subtest __maybe_unused)
 		goto out_delete_evlist;
 	}
 
-	perf_evlist__config(evlist, &opts, NULL);
+	/*
+	 * Set backward bit, ring buffer should be writing from end. Record
+	 * it in aux evlist
+	 */
+	perf_evlist__last(evlist)->overwrite = true;
+	perf_evlist__last(evlist)->attr.write_backward = 1;
 
-	/* Set backward bit, ring buffer should be writing from end */
-	evlist__for_each(evlist, evsel)
-		evsel->attr.write_backward = 1;
+	err = parse_events(evlist, "syscalls:sys_exit_prctl", &parse_error);
+	if (err) {
+		pr_debug("Failed to parse tracepoint event, try use root\n");
+		ret = TEST_SKIP;
+		goto out_delete_evlist;
+	}
+	/* Don't set backward bit for exit event. Record it in main evlist */
+
+	perf_evlist__config(evlist, &opts, NULL);
 
 	err = perf_evlist__open(evlist);
 	if (err < 0) {
@@ -128,24 +161,40 @@ int test__backward_ring_buffer(int subtest __maybe_unused)
 		goto out_delete_evlist;
 	}
 
+	aux_evlist = perf_evlist__new_aux(evlist);
+	if (!aux_evlist) {
+		pr_debug("perf_evlist__new_aux failed\n");
+		goto out_delete_evlist;
+	}
+	aux_evlist->backward = true;
+
 	ret = TEST_FAIL;
-	err = do_test(evlist, opts.mmap_pages, &sample_count,
+	err = do_test(evlist, aux_evlist, opts.mmap_pages,
+		      &enter_sample_count, &exit_sample_count,
 		      &comm_count);
 	if (err != TEST_OK)
 		goto out_delete_evlist;
 
-	if ((sample_count != NR_ITERS) || (comm_count != NR_ITERS)) {
-		pr_err("Unexpected counter: sample_count=%d, comm_count=%d\n",
-		       sample_count, comm_count);
+	if (enter_sample_count != exit_sample_count) {
+		pr_err("Unexpected counter: enter_sample_count=%d, exit_sample_count=%d\n",
+		       enter_sample_count, exit_sample_count);
+		goto out_delete_evlist;
+	}
+
+	if ((exit_sample_count != NR_ITERS) || (comm_count != NR_ITERS)) {
+		pr_err("Unexpected counter: exit_sample_count=%d, comm_count=%d\n",
+		       exit_sample_count, comm_count);
 		goto out_delete_evlist;
 	}
 
-	err = do_test(evlist, 1, &sample_count, &comm_count);
+	err = do_test(evlist, aux_evlist, 1, NULL, NULL, NULL);
 	if (err != TEST_OK)
 		goto out_delete_evlist;
 
 	ret = TEST_OK;
 out_delete_evlist:
+	if (aux_evlist)
+		perf_evlist__delete(aux_evlist);
 	perf_evlist__delete(evlist);
 	return ret;
 }
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v8 4/8] perf record: Introduce rec->overwrite_evlist for overwritable events
  2016-06-20 10:47 [PATCH v8 0/8] perf tools: Support overwritable ring buffer Wang Nan
                   ` (2 preceding siblings ...)
  2016-06-20 10:47 ` [PATCH v8 3/8] perf tests: Add testcase for auxiliary evlist Wang Nan
@ 2016-06-20 10:47 ` Wang Nan
  2016-06-21 21:30   ` Arnaldo Carvalho de Melo
  2016-06-20 10:47 ` [PATCH v8 5/8] perf record: Toggle overwrite ring buffer for reading Wang Nan
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: Wang Nan @ 2016-06-20 10:47 UTC (permalink / raw)
  To: acme
  Cc: linux-kernel, pi3orama, Wang Nan, He Kuang,
	Arnaldo Carvalho de Melo, Jiri Olsa, Masami Hiramatsu,
	Namhyung Kim, Zefan Li

Create an auxiliary evlist for overwritable events.

Before mmap, build this evlist and set 'overwrite' and 'backward'
attribute. Since perf_evlist__mmap_ex() only maps events when
evsel->overwrite matches evlist's corresponding attributes, with
these two evlists an event goes to either rec->evlist or
rec->overwrite_evlist.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-record.c | 132 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 109 insertions(+), 23 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index d4cf1b0..dbbb3c0 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -50,6 +50,7 @@ struct record {
 	struct perf_data_file	file;
 	struct auxtrace_record	*itr;
 	struct perf_evlist	*evlist;
+	struct perf_evlist	*overwrite_evlist;
 	struct perf_session	*session;
 	const char		*progname;
 	int			realtime_prio;
@@ -341,6 +342,84 @@ int auxtrace_record__snapshot_start(struct auxtrace_record *itr __maybe_unused)
 
 #endif
 
+static int record__create_overwrite_evlist(struct record *rec)
+{
+	struct perf_evlist *evlist = rec->evlist;
+	struct perf_evsel *pos;
+
+	evlist__for_each(evlist, pos) {
+		if (!pos->overwrite)
+			continue;
+
+		if (!rec->overwrite_evlist) {
+			rec->overwrite_evlist = perf_evlist__new_aux(evlist);
+			if (rec->overwrite_evlist) {
+				rec->overwrite_evlist->backward = true;
+				rec->overwrite_evlist->overwrite = true;
+				return 0;
+			} else
+				return -ENOMEM;
+		}
+	}
+	return 0;
+}
+
+static int record__mmap_evlist(struct record *rec,
+			       struct perf_evlist *evlist,
+			       bool overwrite)
+{
+	struct record_opts *opts = &rec->opts;
+	char msg[512];
+
+	/*
+	 * Don't use evlist->overwrite because it is logically an
+	 * internal attribute and is set by perf_evlist__mmap_ex().
+	 * Avoid circular dependency.
+	 */
+	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages, overwrite,
+				 opts->auxtrace_mmap_pages,
+				 opts->auxtrace_snapshot_mode) < 0) {
+		if (errno == EPERM) {
+			pr_err("Permission error mapping pages.\n"
+			       "Consider increasing "
+			       "/proc/sys/kernel/perf_event_mlock_kb,\n"
+			       "or try again with a smaller value of -m/--mmap_pages.\n"
+			       "(current value: %u,%u)\n",
+			       opts->mmap_pages, opts->auxtrace_mmap_pages);
+			return -errno;
+		} else {
+			pr_err("failed to mmap with %d (%s)\n", errno,
+				strerror_r(errno, msg, sizeof(msg)));
+			if (errno)
+				return -errno;
+			else
+				return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+static int record__mmap(struct record *rec)
+{
+	int err;
+
+	err = record__create_overwrite_evlist(rec);
+	if (err)
+		return err;
+
+	err = record__mmap_evlist(rec, rec->evlist, false);
+	if (err)
+		return err;
+
+	if (!rec->overwrite_evlist)
+		return 0;
+
+	err = record__mmap_evlist(rec, rec->overwrite_evlist, true);
+	if (err)
+		return err;
+	return 0;
+}
+
 static int record__open(struct record *rec)
 {
 	char msg[512];
@@ -353,6 +432,13 @@ static int record__open(struct record *rec)
 	perf_evlist__config(evlist, opts, &callchain_param);
 
 	evlist__for_each(evlist, pos) {
+		if (pos->overwrite) {
+			if (!pos->attr.write_backward) {
+				ui__warning("Unable to read from overwrite ring buffer\n\n");
+				rc = -ENOSYS;
+				goto out;
+			}
+		}
 try_again:
 		if (perf_evsel__open(pos, pos->cpus, pos->threads) < 0) {
 			if (perf_evsel__fallback(pos, errno, msg, sizeof(msg))) {
@@ -377,28 +463,9 @@ try_again:
 		goto out;
 	}
 
-	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages, false,
-				 opts->auxtrace_mmap_pages,
-				 opts->auxtrace_snapshot_mode) < 0) {
-		if (errno == EPERM) {
-			pr_err("Permission error mapping pages.\n"
-			       "Consider increasing "
-			       "/proc/sys/kernel/perf_event_mlock_kb,\n"
-			       "or try again with a smaller value of -m/--mmap_pages.\n"
-			       "(current value: %u,%u)\n",
-			       opts->mmap_pages, opts->auxtrace_mmap_pages);
-			rc = -errno;
-		} else {
-			pr_err("failed to mmap with %d (%s)\n", errno,
-				strerror_r(errno, msg, sizeof(msg)));
-			if (errno)
-				rc = -errno;
-			else
-				rc = -EINVAL;
-		}
+	rc = record__mmap(rec);
+	if (rc)
 		goto out;
-	}
-
 	session->evlist = evlist;
 	perf_session__set_id_hdr_size(session);
 out:
@@ -655,10 +722,26 @@ perf_event__synth_time_conv(const struct perf_event_mmap_page *pc __maybe_unused
 	return 0;
 }
 
+static const struct perf_event_mmap_page *
+perf_evlist__pick_pc(struct perf_evlist *evlist)
+{
+	if (evlist && evlist->mmap && evlist->mmap[0].base)
+		return evlist->mmap[0].base;
+	return NULL;
+}
+
 static const struct perf_event_mmap_page *record__pick_pc(struct record *rec)
 {
-	if (rec->evlist && rec->evlist->mmap && rec->evlist->mmap[0].base)
-		return rec->evlist->mmap[0].base;
+	const struct perf_event_mmap_page *pc;
+
+	/* Change it to a loop if a new aux evlist is added */
+	pc = perf_evlist__pick_pc(rec->evlist);
+	if (pc)
+		return pc;
+	pc = perf_evlist__pick_pc(rec->overwrite_evlist);
+	if (pc)
+		return pc;
+
 	return NULL;
 }
 
@@ -1269,6 +1352,7 @@ static struct record record = {
 		.mmap2		= perf_event__process_mmap2,
 		.ordered_events	= true,
 	},
+	.overwrite_evlist = NULL,
 };
 
 const char record_callchain_help[] = CALLCHAIN_RECORD_HELP
@@ -1565,6 +1649,8 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
 	err = __cmd_record(&record, argc, argv);
 out_symbol_exit:
 	perf_evlist__delete(rec->evlist);
+	if (rec->overwrite_evlist)
+		perf_evlist__delete(rec->overwrite_evlist);
 	symbol__exit();
 	auxtrace_record__free(rec->itr);
 	return err;
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v8 5/8] perf record: Toggle overwrite ring buffer for reading
  2016-06-20 10:47 [PATCH v8 0/8] perf tools: Support overwritable ring buffer Wang Nan
                   ` (3 preceding siblings ...)
  2016-06-20 10:47 ` [PATCH v8 4/8] perf record: Introduce rec->overwrite_evlist for overwritable events Wang Nan
@ 2016-06-20 10:47 ` Wang Nan
  2016-06-20 10:47 ` [PATCH v8 6/8] perf tools: Enable overwrite settings Wang Nan
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Wang Nan @ 2016-06-20 10:47 UTC (permalink / raw)
  To: acme
  Cc: linux-kernel, pi3orama, Wang Nan, He Kuang,
	Arnaldo Carvalho de Melo, Jiri Olsa, Masami Hiramatsu,
	Namhyung Kim, Zefan Li

overwrite_evt_state is introduced to reflect the state of overwritable
ring buffers. It is a state machine with 3 states:

 RUNNING --(1)--> DATA_PENDING --(2)--> EMPTY
    ^                  ^                 |
    |                  |___(disallow)___/|
    |                                    |
     \_________________(3)______________/

 RUNNING      : Overwritable ring buffers are recording
 DATA_PENDING : We are required to collect overwritable ring buffers
 EMPTY        : We have collected data from those ring buffers.

 (1): Pause ring buffers for reading
 (2): Read from ring buffers
 (3): Resume ring buffers for recording

We can't avoid this complexity. Since we deliberately drop records from
overwritable ring buffer, there's no mean for us to check remaining from
ring buffer itself (by checking head and old pointers). Therefore, we
need DATA_PENDING and EMPTY state to help us recording what we have done
to the ring buffer.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/builtin-record.c | 146 +++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 136 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index dbbb3c0..48c0051 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -42,6 +42,28 @@
 #include <sys/mman.h>
 #include <asm/bug.h>
 
+/*
+ * State machine of overwrite_evt_state:
+ *
+ * RUNNING --(1)--> DATA_PENDING --(2)--> EMPTY
+ *    ^                  ^                 |
+ *    |                  |___(disallow)___/|
+ *    |                                    |
+ *     \_________________(3)______________/
+ *
+ * RUNNING      : Overwritable ring buffers are recording
+ * DATA_PENDING : We are required to collect overwritable ring buffers
+ * EMPTY        : We have collected data from those ring buffers.
+ *
+ * (1): Pause ring buffers for reading
+ * (2): Read from ring buffers
+ * (3): Resume ring buffers for recording
+ */
+enum overwrite_evt_state {
+	OVERWRITE_EVT_RUNNING,
+	OVERWRITE_EVT_DATA_PENDING,
+	OVERWRITE_EVT_EMPTY,
+};
 
 struct record {
 	struct perf_tool	tool;
@@ -61,6 +83,7 @@ struct record {
 	bool			buildid_all;
 	bool			timestamp_filename;
 	bool			switch_output;
+	enum overwrite_evt_state overwrite_evt_state;
 	unsigned long long	samples;
 };
 
@@ -132,9 +155,9 @@ rb_find_range(struct perf_evlist *evlist,
 	return backward_rb_find_range(data, mask, head, start, end);
 }
 
-static int record__mmap_read(struct record *rec, int idx)
+static int record__mmap_read(struct record *rec, struct perf_evlist *evlist, int idx)
 {
-	struct perf_mmap *md = &rec->evlist->mmap[idx];
+	struct perf_mmap *md = &evlist->mmap[idx];
 	u64 head = perf_mmap__read_head(md);
 	u64 old = md->prev;
 	u64 end = head, start = old;
@@ -143,7 +166,7 @@ static int record__mmap_read(struct record *rec, int idx)
 	void *buf;
 	int rc = 0;
 
-	if (rb_find_range(rec->evlist, data, md->mask, head,
+	if (rb_find_range(evlist, data, md->mask, head,
 			  old, &start, &end))
 		return -1;
 
@@ -157,7 +180,7 @@ static int record__mmap_read(struct record *rec, int idx)
 		WARN_ONCE(1, "failed to keep up with mmap data. (warn only once)\n");
 
 		md->prev = head;
-		perf_evlist__mmap_consume(rec->evlist, idx);
+		perf_evlist__mmap_consume(evlist, idx);
 		return 0;
 	}
 
@@ -182,7 +205,7 @@ static int record__mmap_read(struct record *rec, int idx)
 	}
 
 	md->prev = head;
-	perf_evlist__mmap_consume(rec->evlist, idx);
+	perf_evlist__mmap_consume(evlist, idx);
 out:
 	return rc;
 }
@@ -468,6 +491,7 @@ try_again:
 		goto out;
 	session->evlist = evlist;
 	perf_session__set_id_hdr_size(session);
+	rec->overwrite_evt_state = OVERWRITE_EVT_RUNNING;
 out:
 	return rc;
 }
@@ -548,17 +572,72 @@ static struct perf_event_header finished_round_event = {
 	.type = PERF_RECORD_FINISHED_ROUND,
 };
 
-static int record__mmap_read_all(struct record *rec)
+static void
+record__toggle_overwrite_evsels(struct record *rec,
+				enum overwrite_evt_state state)
+{
+	struct perf_evlist *evlist = rec->overwrite_evlist;
+	enum overwrite_evt_state old_state = rec->overwrite_evt_state;
+	enum action {
+		NONE,
+		PAUSE,
+		RESUME,
+	} action = NONE;
+
+	switch (old_state) {
+	case OVERWRITE_EVT_RUNNING:
+		if (state != OVERWRITE_EVT_RUNNING)
+			action = PAUSE;
+		break;
+	case OVERWRITE_EVT_DATA_PENDING:
+		if (state == OVERWRITE_EVT_RUNNING)
+			action = RESUME;
+		break;
+	case OVERWRITE_EVT_EMPTY:
+		if (state == OVERWRITE_EVT_RUNNING)
+			action = RESUME;
+		if (state == OVERWRITE_EVT_DATA_PENDING)
+			state = OVERWRITE_EVT_EMPTY;
+		break;
+	default:
+		WARN_ONCE(1, "Shouldn't get there\n");
+	}
+
+	rec->overwrite_evt_state = state;
+
+	if (action == NONE)
+		return;
+
+	if (!evlist)
+		return;
+
+	switch (action) {
+	case PAUSE:
+		perf_evlist__pause(evlist);
+		break;
+	case RESUME:
+		perf_evlist__resume(evlist);
+		break;
+	case NONE:
+	default:
+		break;
+	}
+}
+
+static int __record__mmap_read_evlist(struct record *rec, struct perf_evlist *evlist)
 {
 	u64 bytes_written = rec->bytes_written;
 	int i;
 	int rc = 0;
 
-	for (i = 0; i < rec->evlist->nr_mmaps; i++) {
-		struct auxtrace_mmap *mm = &rec->evlist->mmap[i].auxtrace_mmap;
+	if (!evlist)
+		return 0;
+
+	for (i = 0; i < evlist->nr_mmaps; i++) {
+		struct auxtrace_mmap *mm = &evlist->mmap[i].auxtrace_mmap;
 
-		if (rec->evlist->mmap[i].base) {
-			if (record__mmap_read(rec, i) != 0) {
+		if (evlist->mmap[i].base) {
+			if (record__mmap_read(rec, evlist, i) != 0) {
 				rc = -1;
 				goto out;
 			}
@@ -582,6 +661,23 @@ out:
 	return rc;
 }
 
+static int record__mmap_read_all(struct record *rec)
+{
+	int err;
+
+	err = __record__mmap_read_evlist(rec, rec->evlist);
+	if (err)
+		return err;
+
+	if (rec->overwrite_evt_state == OVERWRITE_EVT_DATA_PENDING) {
+		err = __record__mmap_read_evlist(rec, rec->overwrite_evlist);
+		if (err)
+			return err;
+		record__toggle_overwrite_evsels(rec, OVERWRITE_EVT_EMPTY);
+	}
+	return 0;
+}
+
 static void record__init_features(struct record *rec)
 {
 	struct perf_session *session = rec->session;
@@ -978,6 +1074,17 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	for (;;) {
 		unsigned long long hits = rec->samples;
 
+		/*
+		 * rec->overwrite_evt_state is possible to be
+		 * OVERWRITE_EVT_EMPTY here: when done == true and
+		 * hits != rec->samples after previous reading.
+		 *
+		 * record__toggle_overwrite_evsels ensure we never
+		 * convert OVERWRITE_EVT_EMPTY to OVERWRITE_EVT_DATA_PENDING.
+		 */
+		if (trigger_is_hit(&switch_output_trigger) || done || draining)
+			record__toggle_overwrite_evsels(rec, OVERWRITE_EVT_DATA_PENDING);
+
 		if (record__mmap_read_all(rec) < 0) {
 			trigger_error(&auxtrace_snapshot_trigger);
 			trigger_error(&switch_output_trigger);
@@ -997,8 +1104,27 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		}
 
 		if (trigger_is_hit(&switch_output_trigger)) {
+			/*
+			 * If switch_output_trigger is hit, the data in
+			 * overwritable ring buffer should have been collected,
+			 * so overwrite_evt_state should be set to
+			 * OVERWRITE_EVT_EMPTY.
+			 *
+			 * If SIGUSR2 raise after or during record__mmap_read_all(),
+			 * record__mmap_read_all() didn't collect data from
+			 * overwritable ring buffer. Read again.
+			 */
+			if (rec->overwrite_evt_state == OVERWRITE_EVT_RUNNING)
+				continue;
 			trigger_ready(&switch_output_trigger);
 
+			/*
+			 * Reenable events in overwrite ring buffer after
+			 * record__mmap_read_all(): we should have collected
+			 * data from it.
+			 */
+			record__toggle_overwrite_evsels(rec, OVERWRITE_EVT_RUNNING);
+
 			if (!quiet)
 				fprintf(stderr, "[ perf record: dump data: Woken up %ld times ]\n",
 					waking);
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v8 6/8] perf tools: Enable overwrite settings
  2016-06-20 10:47 [PATCH v8 0/8] perf tools: Support overwritable ring buffer Wang Nan
                   ` (4 preceding siblings ...)
  2016-06-20 10:47 ` [PATCH v8 5/8] perf record: Toggle overwrite ring buffer for reading Wang Nan
@ 2016-06-20 10:47 ` Wang Nan
  2016-06-21 21:49   ` Arnaldo Carvalho de Melo
  2016-06-20 10:47 ` [PATCH v8 7/8] perf tools: Don't warn about out of order event if write_backward is used Wang Nan
  2016-06-20 10:47 ` [PATCH v8 8/8] perf tools: Add --tail-synthesize option Wang Nan
  7 siblings, 1 reply; 29+ messages in thread
From: Wang Nan @ 2016-06-20 10:47 UTC (permalink / raw)
  To: acme
  Cc: linux-kernel, pi3orama, Wang Nan, He Kuang,
	Arnaldo Carvalho de Melo, Jiri Olsa, Masami Hiramatsu,
	Namhyung Kim, Zefan Li

This patch allows following config terms and option:

Globally setting events to overwrite;

 # perf record --overwrite ...

Set specific events to be overwrite or no-overwrite.

 # perf record --event cycles/overwrite/ ...
 # perf record --event cycles/no-overwrite/ ...

Add missing config terms and update config term array size because the
longest string length is changed.

For overwritable events, automatically select attr.write_backward since
perf requires it to be backward for reading.

Test result:
 # perf record --overwrite -e syscalls:*enter_nanosleep* usleep 1
 [ perf record: Woken up 2 times to write data ]
 [ perf record: Captured and wrote 0.011 MB perf.data (1 samples) ]
 # perf evlist -v
 syscalls:sys_enter_nanosleep: type: 2, size: 112, config: 0x134, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW, disabled: 1, inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, write_backward: 1
 # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/Documentation/perf-record.txt | 14 ++++++++++++++
 tools/perf/builtin-record.c              |  1 +
 tools/perf/perf.h                        |  1 +
 tools/perf/tests/backward-ring-buffer.c  | 15 ++++++---------
 tools/perf/util/evsel.c                  | 12 ++++++++++++
 tools/perf/util/evsel.h                  |  2 ++
 tools/perf/util/parse-events.c           | 20 ++++++++++++++++++--
 tools/perf/util/parse-events.h           |  2 ++
 tools/perf/util/parse-events.l           |  2 ++
 9 files changed, 58 insertions(+), 11 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 8dbee83..f5cb932 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -360,6 +360,20 @@ particular perf.data snapshot should be kept or not.
 
 Implies --timestamp-filename, --no-buildid and --no-buildid-cache.
 
+--overwrite::
+Makes all events use overwritable ring buffer. Event with overwritable ring
+buffer works like a flight recorder: when buffer gets full, instead of dumping
+records into output file, kernel overwrites old records silently. Perf dumps
+data from overwritable ring buffer when switching output (see --switch-output)
+and before terminate.
+
+Perf behaves like a daemon when '--overwrite' and '--switch-output' are
+provided. It record and drop events in background, and dumps data when
+something unusual is detected.
+
+'overwrite' attribute can also be set or canceled for specific event using
+config terms like 'cycles/overwrite/' and 'instructions/no-overwrite/'.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 48c0051..bb62882 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1520,6 +1520,7 @@ struct option __record_options[] = {
 	OPT_BOOLEAN_SET('i', "no-inherit", &record.opts.no_inherit,
 			&record.opts.no_inherit_set,
 			"child tasks do not inherit counters"),
+	OPT_BOOLEAN(0, "overwrite", &record.opts.overwrite, "use overwrite mode"),
 	OPT_UINTEGER('F', "freq", &record.opts.user_freq, "profile at this frequency"),
 	OPT_CALLBACK('m', "mmap-pages", &record.opts, "pages[,pages]",
 		     "number of mmap data pages and AUX area tracing mmap pages",
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index cd8f1b1..608b42b 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -59,6 +59,7 @@ struct record_opts {
 	bool	     record_switch_events;
 	bool	     all_kernel;
 	bool	     all_user;
+	bool	     overwrite;
 	unsigned int freq;
 	unsigned int mmap_pages;
 	unsigned int auxtrace_mmap_pages;
diff --git a/tools/perf/tests/backward-ring-buffer.c b/tools/perf/tests/backward-ring-buffer.c
index 76e34c0..bba0b83 100644
--- a/tools/perf/tests/backward-ring-buffer.c
+++ b/tools/perf/tests/backward-ring-buffer.c
@@ -130,27 +130,24 @@ int test__backward_ring_buffer(int subtest __maybe_unused)
 	}
 
 	bzero(&parse_error, sizeof(parse_error));
-	err = parse_events(evlist, "syscalls:sys_enter_prctl", &parse_error);
+	/*
+	 * Set backward bit, ring buffer should be writing from end. Record
+	 * it in aux evlist
+	 */
+	err = parse_events(evlist, "syscalls:sys_enter_prctl/overwrite/", &parse_error);
 	if (err) {
 		pr_debug("Failed to parse tracepoint event, try use root\n");
 		ret = TEST_SKIP;
 		goto out_delete_evlist;
 	}
 
-	/*
-	 * Set backward bit, ring buffer should be writing from end. Record
-	 * it in aux evlist
-	 */
-	perf_evlist__last(evlist)->overwrite = true;
-	perf_evlist__last(evlist)->attr.write_backward = 1;
-
+	/* Don't set backward bit for exit event. Record it in main evlist */
 	err = parse_events(evlist, "syscalls:sys_exit_prctl", &parse_error);
 	if (err) {
 		pr_debug("Failed to parse tracepoint event, try use root\n");
 		ret = TEST_SKIP;
 		goto out_delete_evlist;
 	}
-	/* Don't set backward bit for exit event. Record it in main evlist */
 
 	perf_evlist__config(evlist, &opts, NULL);
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 1d8f2bb..3629520 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -677,11 +677,22 @@ static void apply_config_terms(struct perf_evsel *evsel,
 			 */
 			attr->inherit = term->val.inherit ? 1 : 0;
 			break;
+		case PERF_EVSEL__CONFIG_TERM_OVERWRITE:
+			evsel->overwrite = term->val.overwrite ? 1 : 0;
+			break;
 		default:
 			break;
 		}
 	}
 
+	/*
+	 * Set backward after config term processing because it is
+	 * possible to set overwrite globally, without config
+	 * terms.
+	 */
+	if (evsel->overwrite)
+		attr->write_backward = 1;
+
 	/* User explicitly set per-event callgraph, clear the old setting and reset. */
 	if ((callgraph_buf != NULL) || (dump_size > 0) || max_stack) {
 		if (max_stack) {
@@ -758,6 +769,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts,
 
 	attr->sample_id_all = perf_missing_features.sample_id_all ? 0 : 1;
 	attr->inherit	    = !opts->no_inherit;
+	evsel->overwrite    = opts->overwrite;
 
 	perf_evsel__set_sample_bit(evsel, IP);
 	perf_evsel__set_sample_bit(evsel, TID);
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 828ddd1..55ff958 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -45,6 +45,7 @@ enum {
 	PERF_EVSEL__CONFIG_TERM_STACK_USER,
 	PERF_EVSEL__CONFIG_TERM_INHERIT,
 	PERF_EVSEL__CONFIG_TERM_MAX_STACK,
+	PERF_EVSEL__CONFIG_TERM_OVERWRITE,
 	PERF_EVSEL__CONFIG_TERM_MAX,
 };
 
@@ -59,6 +60,7 @@ struct perf_evsel_config_term {
 		u64	stack_user;
 		int	max_stack;
 		bool	inherit;
+		bool	overwrite;
 	} val;
 };
 
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index d15e335..b641f75 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -901,6 +901,8 @@ static const char *config_term_names[__PARSE_EVENTS__TERM_TYPE_NR] = {
 	[PARSE_EVENTS__TERM_TYPE_NOINHERIT]		= "no-inherit",
 	[PARSE_EVENTS__TERM_TYPE_INHERIT]		= "inherit",
 	[PARSE_EVENTS__TERM_TYPE_MAX_STACK]		= "max-stack",
+	[PARSE_EVENTS__TERM_TYPE_OVERWRITE]		= "overwrite",
+	[PARSE_EVENTS__TERM_TYPE_NOOVERWRITE]		= "no-overwrite",
 };
 
 static bool config_term_shrinked;
@@ -993,6 +995,12 @@ do {									   \
 	case PARSE_EVENTS__TERM_TYPE_NOINHERIT:
 		CHECK_TYPE_VAL(NUM);
 		break;
+	case PARSE_EVENTS__TERM_TYPE_OVERWRITE:
+		CHECK_TYPE_VAL(NUM);
+		break;
+	case PARSE_EVENTS__TERM_TYPE_NOOVERWRITE:
+		CHECK_TYPE_VAL(NUM);
+		break;
 	case PARSE_EVENTS__TERM_TYPE_NAME:
 		CHECK_TYPE_VAL(STR);
 		break;
@@ -1045,6 +1053,8 @@ static int config_term_tracepoint(struct perf_event_attr *attr,
 	case PARSE_EVENTS__TERM_TYPE_INHERIT:
 	case PARSE_EVENTS__TERM_TYPE_NOINHERIT:
 	case PARSE_EVENTS__TERM_TYPE_MAX_STACK:
+	case PARSE_EVENTS__TERM_TYPE_OVERWRITE:
+	case PARSE_EVENTS__TERM_TYPE_NOOVERWRITE:
 		return config_term_common(attr, term, err);
 	default:
 		if (err) {
@@ -1117,6 +1127,12 @@ do {								\
 		case PARSE_EVENTS__TERM_TYPE_MAX_STACK:
 			ADD_CONFIG_TERM(MAX_STACK, max_stack, term->val.num);
 			break;
+		case PARSE_EVENTS__TERM_TYPE_OVERWRITE:
+			ADD_CONFIG_TERM(OVERWRITE, overwrite, term->val.num ? 1 : 0);
+			break;
+		case PARSE_EVENTS__TERM_TYPE_NOOVERWRITE:
+			ADD_CONFIG_TERM(OVERWRITE, overwrite, term->val.num ? 0 : 1);
+			break;
 		default:
 			break;
 		}
@@ -2330,9 +2346,9 @@ static void config_terms_list(char *buf, size_t buf_sz)
 char *parse_events_formats_error_string(char *additional_terms)
 {
 	char *str;
-	/* "branch_type" is the longest name */
+	/* "no-overwrite" is the longest name */
 	char static_terms[__PARSE_EVENTS__TERM_TYPE_NR *
-			  (sizeof("branch_type") - 1)];
+			  (sizeof("no-overwrite") - 1)];
 
 	config_terms_list(static_terms, sizeof(static_terms));
 	/* valid terms */
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 46c05cc..1b04d82 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -69,6 +69,8 @@ enum {
 	PARSE_EVENTS__TERM_TYPE_NOINHERIT,
 	PARSE_EVENTS__TERM_TYPE_INHERIT,
 	PARSE_EVENTS__TERM_TYPE_MAX_STACK,
+	PARSE_EVENTS__TERM_TYPE_NOOVERWRITE,
+	PARSE_EVENTS__TERM_TYPE_OVERWRITE,
 	__PARSE_EVENTS__TERM_TYPE_NR,
 };
 
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 3c15b33..7a25194 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -202,6 +202,8 @@ stack-size		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_STACKSIZE); }
 max-stack		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_MAX_STACK); }
 inherit			{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_INHERIT); }
 no-inherit		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NOINHERIT); }
+overwrite		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_OVERWRITE); }
+no-overwrite		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NOOVERWRITE); }
 ,			{ return ','; }
 "/"			{ BEGIN(INITIAL); return '/'; }
 {name_minus}		{ return str(yyscanner, PE_NAME); }
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v8 7/8] perf tools: Don't warn about out of order event if write_backward is used
  2016-06-20 10:47 [PATCH v8 0/8] perf tools: Support overwritable ring buffer Wang Nan
                   ` (5 preceding siblings ...)
  2016-06-20 10:47 ` [PATCH v8 6/8] perf tools: Enable overwrite settings Wang Nan
@ 2016-06-20 10:47 ` Wang Nan
  2016-06-20 10:47 ` [PATCH v8 8/8] perf tools: Add --tail-synthesize option Wang Nan
  7 siblings, 0 replies; 29+ messages in thread
From: Wang Nan @ 2016-06-20 10:47 UTC (permalink / raw)
  To: acme
  Cc: linux-kernel, pi3orama, Wang Nan, He Kuang,
	Arnaldo Carvalho de Melo, Jiri Olsa, Masami Hiramatsu,
	Namhyung Kim, Zefan Li

If write_backward attribute is set, records are written into kernel
ring buffer from end to beginning, but read from beginning to end.
To avoid 'XX out of order events recorded' warning message (timestamps
of records is in reverse order when using write_backward), suppress the
warning message if write_backward is selected by at lease one event.

Result:

Before this patch:
 # perf record -m 1 -e raw_syscalls:sys_exit/overwrite/ \
                    -e raw_syscalls:sys_enter \
                    dd if=/dev/zero of=/dev/null count=300
 300+0 records in
 300+0 records out
 153600 bytes (154 kB) copied, 0.000601617 s, 255 MB/s
 [ perf record: Woken up 5 times to write data ]
 Warning:
 40 out of order events recorded.
 [ perf record: Captured and wrote 0.096 MB perf.data (696 samples) ]

After this patch:
 # perf record -m 1 -e raw_syscalls:sys_exit/overwrite/ \
                    -e raw_syscalls:sys_enter \
                    dd if=/dev/zero of=/dev/null count=300
 300+0 records in
 300+0 records out
 153600 bytes (154 kB) copied, 0.000644873 s, 238 MB/s
 [ perf record: Woken up 5 times to write data ]
 [ perf record: Captured and wrote 0.096 MB perf.data (696 samples) ]

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/util/session.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index dfedf09..561441e 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1497,10 +1497,27 @@ int perf_session__register_idle_thread(struct perf_session *session)
 	return err;
 }
 
+static void
+perf_session__warn_order(const struct perf_session *session)
+{
+	const struct ordered_events *oe = &session->ordered_events;
+	struct perf_evsel *evsel;
+	bool should_warn = true;
+
+	evlist__for_each(session->evlist, evsel) {
+		if (evsel->attr.write_backward)
+			should_warn = false;
+	}
+
+	if (!should_warn)
+		return;
+	if (oe->nr_unordered_events != 0)
+		ui__warning("%u out of order events recorded.\n", oe->nr_unordered_events);
+}
+
 static void perf_session__warn_about_errors(const struct perf_session *session)
 {
 	const struct events_stats *stats = &session->evlist->stats;
-	const struct ordered_events *oe = &session->ordered_events;
 
 	if (session->tool->lost == perf_event__process_lost &&
 	    stats->nr_events[PERF_RECORD_LOST] != 0) {
@@ -1557,8 +1574,7 @@ static void perf_session__warn_about_errors(const struct perf_session *session)
 			    stats->nr_unprocessable_samples);
 	}
 
-	if (oe->nr_unordered_events != 0)
-		ui__warning("%u out of order events recorded.\n", oe->nr_unordered_events);
+	perf_session__warn_order(session);
 
 	events_stats__auxtrace_error_warn(stats);
 
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v8 8/8] perf tools: Add --tail-synthesize option
  2016-06-20 10:47 [PATCH v8 0/8] perf tools: Support overwritable ring buffer Wang Nan
                   ` (6 preceding siblings ...)
  2016-06-20 10:47 ` [PATCH v8 7/8] perf tools: Don't warn about out of order event if write_backward is used Wang Nan
@ 2016-06-20 10:47 ` Wang Nan
  7 siblings, 0 replies; 29+ messages in thread
From: Wang Nan @ 2016-06-20 10:47 UTC (permalink / raw)
  To: acme
  Cc: linux-kernel, pi3orama, Wang Nan, He Kuang,
	Arnaldo Carvalho de Melo, Jiri Olsa, Masami Hiramatsu,
	Namhyung Kim, Zefan Li

When working with overwritable ring buffer there's a inconvenience
problem: if perf dumps data after a long period after it starts, non-sample
events may lost, which makes following 'perf report' unable to identify
proc name and mmap layout. For example:

 # perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output \
        dd if=/dev/zero of=/dev/null

send SIGUSR2 after dd runs long enough. The resuling perf.data lost
correct comm and mmap events:

 # perf script -i perf.data.2016061522374354
 perf 24478 [004] 2581325.601789:  raw_syscalls:sys_exit: NR 0 = 512
 ^^^^
 Should be 'dd'
                   27b2e8 syscall_slow_exit_work+0xfe2000e3 (/lib/modules/4.6.0-rc3+/build/vmlinux)
                   203cc7 do_syscall_64+0xfe200117 (/lib/modules/4.6.0-rc3+/build/vmlinux)
                   b18d83 return_from_SYSCALL_64+0xfe200000 (/lib/modules/4.6.0-rc3+/build/vmlinux)
             7f47c417edf0 [unknown] ([unknown])
             ^^^^^^^^^^^^
             Fail to unwind

This patch provides a '--tail-synthesize' option, allows perf to collect
system status when finalizing output file. In resuling output file, the
non-sample events reflect system status when dumping data.

After this patch:
 # perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output --tail-synthesize \
        dd if=/dev/zero of=/dev/null

 # perf script -i perf.data.2016061600544998
 dd 27364 [004] 2583244.994464: raw_syscalls:sys_enter: NR 1 (1, ...
 ^^
 Correct comm
                   203a18 syscall_trace_enter_phase2+0xfe2001a8 ([kernel.kallsyms])
                   203aa5 syscall_trace_enter+0xfe200055 ([kernel.kallsyms])
                   203caa do_syscall_64+0xfe2000fa ([kernel.kallsyms])
                   b18d83 return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms])
                    d8e50 __GI___libc_write+0xffff01d9639f4010 (/tmp/oxygen_root-w00229757/lib64/libc-2.18.so)
                    ^^^^^
                    Correct unwind

This option doesn't aim to solve this problem completely. If a process
terminates before SIGUSR2, we still lost its COMM and MMAP events. For
example, we can't unwind correctly from the final perf.data we get from
the previous example, because when perf collects the final output file
(when we press C-c), the 'dd' is already terminated its
'/proc/<pid>/mmap' becomes empty. However this is a cheaper choice. To
complete solve this problem we need to continously output non-sample
events. To satisify the requirement of daemonization, we need to merge
them periodically. It is possible but requires much more code and cycles.

Automatically select --tail-synthesize when --overwrite is provided.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
---
 tools/perf/Documentation/perf-record.txt |  8 ++++++++
 tools/perf/builtin-record.c              | 31 +++++++++++++++++++++++++------
 tools/perf/perf.h                        |  1 +
 3 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index f5cb932..d552baf 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -360,6 +360,12 @@ particular perf.data snapshot should be kept or not.
 
 Implies --timestamp-filename, --no-buildid and --no-buildid-cache.
 
+--tail-synthesize::
+Instead of collecting non-sample events (for example, fork, comm, mmap) at
+the beginning of record, collect them during finalizing an output file.
+The collected non-sample events reflects the status of the system when
+record is finished.
+
 --overwrite::
 Makes all events use overwritable ring buffer. Event with overwritable ring
 buffer works like a flight recorder: when buffer gets full, instead of dumping
@@ -374,6 +380,8 @@ something unusual is detected.
 'overwrite' attribute can also be set or canceled for specific event using
 config terms like 'cycles/overwrite/' and 'instructions/no-overwrite/'.
 
+Implies --tail-synthesize.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index bb62882..7b95444 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -724,13 +724,16 @@ record__finish_output(struct record *rec)
 	return;
 }
 
-static int record__synthesize_workload(struct record *rec)
+static int record__synthesize_workload(struct record *rec, bool tail)
 {
 	struct {
 		struct thread_map map;
 		struct thread_map_data map_data;
 	} thread_map;
 
+	if (rec->opts.tail_synthesize != tail)
+		return 0;
+
 	thread_map.map.nr = 1;
 	thread_map.map.map[0].pid = rec->evlist->workload.pid;
 	thread_map.map.map[0].comm = NULL;
@@ -741,7 +744,7 @@ static int record__synthesize_workload(struct record *rec)
 						 rec->opts.proc_map_timeout);
 }
 
-static int record__synthesize(struct record *rec);
+static int record__synthesize(struct record *rec, bool tail);
 
 static int
 record__switch_output(struct record *rec, bool at_exit)
@@ -752,6 +755,10 @@ record__switch_output(struct record *rec, bool at_exit)
 	/* Same Size:      "2015122520103046"*/
 	char timestamp[] = "InvalidTimestamp";
 
+	record__synthesize(rec, true);
+	if (target__none(&rec->opts.target))
+		record__synthesize_workload(rec, true);
+
 	rec->samples = 0;
 	record__finish_output(rec);
 	err = fetch_current_timestamp(timestamp, sizeof(timestamp));
@@ -774,7 +781,7 @@ record__switch_output(struct record *rec, bool at_exit)
 
 	/* Output tracking events */
 	if (!at_exit) {
-		record__synthesize(rec);
+		record__synthesize(rec, false);
 
 		/*
 		 * In 'perf record --switch-output' without -a,
@@ -786,7 +793,7 @@ record__switch_output(struct record *rec, bool at_exit)
 		 * perf_event__synthesize_thread_map() for those events.
 		 */
 		if (target__none(&rec->opts.target))
-			record__synthesize_workload(rec);
+			record__synthesize_workload(rec, false);
 	}
 	return fd;
 }
@@ -841,7 +848,7 @@ static const struct perf_event_mmap_page *record__pick_pc(struct record *rec)
 	return NULL;
 }
 
-static int record__synthesize(struct record *rec)
+static int record__synthesize(struct record *rec, bool tail)
 {
 	struct perf_session *session = rec->session;
 	struct machine *machine = &session->machines.host;
@@ -851,6 +858,9 @@ static int record__synthesize(struct record *rec)
 	int fd = perf_data_file__fd(file);
 	int err = 0;
 
+	if (rec->opts.tail_synthesize != tail)
+		return 0;
+
 	if (file->is_pipe) {
 		err = perf_event__synthesize_attrs(tool, session,
 						   process_synthesized_event);
@@ -1014,7 +1024,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 
 	machine = &session->machines.host;
 
-	err = record__synthesize(rec);
+	err = record__synthesize(rec, false);
 	if (err < 0)
 		goto out_child;
 
@@ -1179,6 +1189,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	if (!quiet)
 		fprintf(stderr, "[ perf record: Woken up %ld times to write data ]\n", waking);
 
+	if (target__none(&rec->opts.target))
+		record__synthesize_workload(rec, true);
+
 out_child:
 	if (forks) {
 		int exit_status;
@@ -1197,6 +1210,7 @@ out_child:
 	} else
 		status = err;
 
+	record__synthesize(rec, true);
 	/* this will be recalculated during process_buildids() */
 	rec->samples = 0;
 
@@ -1520,6 +1534,8 @@ struct option __record_options[] = {
 	OPT_BOOLEAN_SET('i', "no-inherit", &record.opts.no_inherit,
 			&record.opts.no_inherit_set,
 			"child tasks do not inherit counters"),
+	OPT_BOOLEAN(0, "tail-synthesize", &record.opts.tail_synthesize,
+		    "synthesize non-sample events at the end of output"),
 	OPT_BOOLEAN(0, "overwrite", &record.opts.overwrite, "use overwrite mode"),
 	OPT_UINTEGER('F', "freq", &record.opts.user_freq, "profile at this frequency"),
 	OPT_CALLBACK('m', "mmap-pages", &record.opts, "pages[,pages]",
@@ -1726,6 +1742,9 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
 		}
 	}
 
+	if (record.opts.overwrite)
+		record.opts.tail_synthesize = true;
+
 	if (rec->evlist->nr_entries == 0 &&
 	    perf_evlist__add_default(rec->evlist) < 0) {
 		pr_err("Not enough memory for event selector list\n");
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 608b42b..a7e0f14 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -59,6 +59,7 @@ struct record_opts {
 	bool	     record_switch_events;
 	bool	     all_kernel;
 	bool	     all_user;
+	bool	     tail_synthesize;
 	bool	     overwrite;
 	unsigned int freq;
 	unsigned int mmap_pages;
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 2/8] perf evlist: Introduce aux evlist
  2016-06-20 10:47 ` [PATCH v8 2/8] perf evlist: Introduce aux evlist Wang Nan
@ 2016-06-20 20:36   ` Arnaldo Carvalho de Melo
  2016-06-21  1:31     ` Wangnan (F)
  0 siblings, 1 reply; 29+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-06-20 20:36 UTC (permalink / raw)
  To: Wang Nan
  Cc: linux-kernel, pi3orama, He Kuang, Jiri Olsa, Masami Hiramatsu,
	Namhyung Kim, Zefan Li

Em Mon, Jun 20, 2016 at 10:47:19AM +0000, Wang Nan escreveu:
> An auxiliary evlist is created by perf_evlist__new_aux() using an
> existing evlist as its parent. An auxiliary evlist can have its own
> 'struct perf_mmap', but can't have any other data. User should use its
> parent instead when accessing other data.
> 
> Auxiliary evlists are containers of 'struct perf_mmap'. It is introduced
> to allow its parent evlist to map different events into separated mmaps.
> 
> Following commits create an auxiliary evlist for overwritable
> events, because overwritable events need a read only and backwards ring
> buffer, which is different from normal events.
> 
> To achieve this goal, this patch carefully changes 'evlist' to
> 'evlist->parent' in all functions in the path of 'perf_evlist__mmap_ex',
> except 'evlist->mmap' related operations, to make sure all evlist
> modifications (like pollfd and event id hash tables) goes to original
> evlist.
> 
> A 'evlist->parent' pointer is added to 'struct perf_evlist' and points to
> the evlist itself for normal evlists.
> 
> Children of one evlist are linked into it so one can find all children
> from its parent.

> To avoid potential complexity, forbid creating aux evlist from another
> aux evlist.
> 
> Improve perf_evlist__munmap_filtered(), so when recording, if an event
> is terminated, unmap mmaps, from parent and children.
 
> Signed-off-by: Wang Nan <wangnan0@huawei.com>
> Cc: He Kuang <hekuang@huawei.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Zefan Li <lizefan@huawei.com>
> Cc: pi3orama@163.com
> ---
>  tools/perf/util/evlist.c | 49 +++++++++++++++++++++++++++++++++++++-----------
>  tools/perf/util/evlist.h | 12 ++++++++++++
>  2 files changed, 50 insertions(+), 11 deletions(-)
> 
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 1b918aa..b03cb3c 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -41,10 +41,12 @@ void perf_evlist__init(struct perf_evlist *evlist, struct cpu_map *cpus,
>  	for (i = 0; i < PERF_EVLIST__HLIST_SIZE; ++i)
>  		INIT_HLIST_HEAD(&evlist->heads[i]);
>  	INIT_LIST_HEAD(&evlist->entries);
> +	INIT_LIST_HEAD(&evlist->children);
>  	perf_evlist__set_maps(evlist, cpus, threads);
>  	fdarray__init(&evlist->pollfd, 64);
>  	evlist->workload.pid = -1;
>  	evlist->backward = false;
> +	evlist->parent = evlist;
>  }
>  
>  struct perf_evlist *perf_evlist__new(void)
> @@ -487,13 +489,17 @@ static void perf_evlist__munmap_filtered(struct fdarray *fda, int fd,
>  					 void *arg __maybe_unused)
>  {
>  	struct perf_evlist *evlist = container_of(fda, struct perf_evlist, pollfd);
> +	struct perf_evlist *child;
>  
>  	perf_evlist__mmap_put(evlist, fda->priv[fd].idx);
> +	list_for_each_entry(child, &evlist->children, list)
> +		perf_evlist__mmap_put(child, fda->priv[fd].idx);
> +
>  }
>  
>  int perf_evlist__filter_pollfd(struct perf_evlist *evlist, short revents_and_mask)
>  {
> -	return fdarray__filter(&evlist->pollfd, revents_and_mask,
> +	return fdarray__filter(&evlist->parent->pollfd, revents_and_mask,
>  			       perf_evlist__munmap_filtered, NULL);
>  }
>  
> @@ -1012,7 +1018,7 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
>  	struct perf_evsel *evsel;
>  	int revent;
>  
> -	evlist__for_each(evlist, evsel) {
> +	evlist__for_each(evlist->parent, evsel) {
>  		int fd;
>  
>  		if (evsel->overwrite != (evlist->overwrite && evlist->backward))
> @@ -1044,16 +1050,16 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
>  		 * Therefore don't add it for polling.
>  		 */
>  		if (!evsel->system_wide &&
> -		    __perf_evlist__add_pollfd(evlist, fd, idx, revent) < 0) {
> +		    __perf_evlist__add_pollfd(evlist->parent, fd, idx, revent) < 0) {
>  			perf_evlist__mmap_put(evlist, idx);
>  			return -1;
>  		}
>  
>  		if (evsel->attr.read_format & PERF_FORMAT_ID) {
> -			if (perf_evlist__id_add_fd(evlist, evsel, cpu, thread,
> +			if (perf_evlist__id_add_fd(evlist->parent, evsel, cpu, thread,
>  						   fd) < 0)
>  				return -1;
> -			perf_evlist__set_sid_idx(evlist, evsel, idx, cpu,
> +			perf_evlist__set_sid_idx(evlist->parent, evsel, idx, cpu,
>  						 thread);
>  		}
>  	}
> @@ -1094,13 +1100,13 @@ static int perf_evlist__mmap_per_thread(struct perf_evlist *evlist,
>  					struct mmap_params *mp)
>  {
>  	int thread;
> -	int nr_threads = thread_map__nr(evlist->threads);
> +	int nr_threads = thread_map__nr(evlist->parent->threads);
>  
>  	pr_debug2("perf event ring buffer mmapped per thread\n");
>  	for (thread = 0; thread < nr_threads; thread++) {
>  		int output = -1;
>  
> -		auxtrace_mmap_params__set_idx(&mp->auxtrace_mp, evlist, thread,
> +		auxtrace_mmap_params__set_idx(&mp->auxtrace_mp, evlist->parent, thread,
>  					      false);
>  
>  		if (perf_evlist__mmap_per_evsel(evlist, thread, mp, 0, thread,
> @@ -1239,8 +1245,8 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
>  			 bool auxtrace_overwrite)
>  {
>  	struct perf_evsel *evsel;
> -	const struct cpu_map *cpus = evlist->cpus;
> -	const struct thread_map *threads = evlist->threads;
> +	const struct cpu_map *cpus = evlist->parent->cpus;
> +	const struct thread_map *threads = evlist->parent->threads;
>  	struct mmap_params mp = {
>  		.prot = PROT_READ | (overwrite ? 0 : PROT_WRITE),
>  	};
> @@ -1248,7 +1254,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
>  	if (evlist->mmap == NULL && perf_evlist__alloc_mmap(evlist) < 0)
>  		return -ENOMEM;
>  
> -	if (evlist->pollfd.entries == NULL && perf_evlist__alloc_pollfd(evlist) < 0)
> +	if (evlist->parent->pollfd.entries == NULL && perf_evlist__alloc_pollfd(evlist->parent) < 0)
>  		return -ENOMEM;
>  
>  	evlist->overwrite = overwrite;
> @@ -1259,7 +1265,7 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
>  	auxtrace_mmap_params__init(&mp.auxtrace_mp, evlist->mmap_len,
>  				   auxtrace_pages, auxtrace_overwrite);
>  
> -	evlist__for_each(evlist, evsel) {
> +	evlist__for_each(evlist->parent, evsel) {
>  		if ((evsel->attr.read_format & PERF_FORMAT_ID) &&
>  		    evsel->sample_id == NULL &&
>  		    perf_evsel__alloc_id(evsel, cpu_map__nr(cpus), threads->nr) < 0)
> @@ -1916,3 +1922,24 @@ perf_evlist__find_evsel_by_str(struct perf_evlist *evlist,
>  
>  	return NULL;
>  }
> +
> +struct perf_evlist *perf_evlist__new_aux(struct perf_evlist *parent)
> +{
> +	struct perf_evlist *evlist;
> +
> +	if (perf_evlist__is_aux(parent)) {
> +		pr_err("Internal error: create aux evlist from another aux evlist\n");
> +		return NULL;
> +	}
> +
> +	evlist = zalloc(sizeof(*evlist));
> +	if (!evlist)
> +		return NULL;
> +
> +	perf_evlist__init(evlist, parent->cpus, parent->threads);
> +	evlist->parent = parent->parent;
> +	INIT_LIST_HEAD(&evlist->list);
> +	list_add(&evlist->list, &parent->children);
> +
> +	return evlist;
> +}
> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> index 68cb136..5b50692 100644
> --- a/tools/perf/util/evlist.h
> +++ b/tools/perf/util/evlist.h
> @@ -37,6 +37,10 @@ struct perf_mmap {
>  
>  struct perf_evlist {
>  	struct list_head entries;
> +	union {
> +		struct list_head children;
> +		struct list_head list;

This is something new, right, i.e. having a list of evlists from a
parent evlist, one where there can be at most one level, i.e. no
grandchildrens...

Is this strictly needed? I.e. the existing code calling
perf_evlist__munmap_filtered() has just the parent pointer and thus, to
not change it we need to have a way to go the children?

Also, can you se a parent with more than one child?

- Arnaldo

> +	};
>  	struct hlist_head heads[PERF_EVLIST__HLIST_SIZE];
>  	int		 nr_entries;
>  	int		 nr_groups;
> @@ -60,8 +64,14 @@ struct perf_evlist {
>  	struct perf_evsel *selected;
>  	struct events_stats stats;
>  	struct perf_env	*env;
> +	struct perf_evlist *parent;
>  };
>  
> +static inline bool perf_evlist__is_aux(struct perf_evlist *evlist)
> +{
> +	return evlist->parent != evlist;
> +}
> +
>  struct perf_evsel_str_handler {
>  	const char *name;
>  	void	   *handler;
> @@ -70,6 +80,8 @@ struct perf_evsel_str_handler {
>  struct perf_evlist *perf_evlist__new(void);
>  struct perf_evlist *perf_evlist__new_default(void);
>  struct perf_evlist *perf_evlist__new_dummy(void);
> +struct perf_evlist *perf_evlist__new_aux(struct perf_evlist *);
> +
>  void perf_evlist__init(struct perf_evlist *evlist, struct cpu_map *cpus,
>  		       struct thread_map *threads);
>  void perf_evlist__exit(struct perf_evlist *evlist);
> -- 
> 1.8.3.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 2/8] perf evlist: Introduce aux evlist
  2016-06-20 20:36   ` Arnaldo Carvalho de Melo
@ 2016-06-21  1:31     ` Wangnan (F)
  0 siblings, 0 replies; 29+ messages in thread
From: Wangnan (F) @ 2016-06-21  1:31 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, pi3orama, He Kuang, Jiri Olsa, Masami Hiramatsu,
	Namhyung Kim, Zefan Li



On 2016/6/21 4:36, Arnaldo Carvalho de Melo wrote:
> Em Mon, Jun 20, 2016 at 10:47:19AM +0000, Wang Nan escreveu:
>> An auxiliary evlist is created by perf_evlist__new_aux() using an
>> existing evlist as its parent. An auxiliary evlist can have its own
>> 'struct perf_mmap', but can't have any other data. User should use its
>> parent instead when accessing other data.
>>
>> Auxiliary evlists are containers of 'struct perf_mmap'. It is introduced
>> to allow its parent evlist to map different events into separated mmaps.
>>
>> Following commits create an auxiliary evlist for overwritable
>> events, because overwritable events need a read only and backwards ring
>> buffer, which is different from normal events.
>>
>> To achieve this goal, this patch carefully changes 'evlist' to
>> 'evlist->parent' in all functions in the path of 'perf_evlist__mmap_ex',
>> except 'evlist->mmap' related operations, to make sure all evlist
>> modifications (like pollfd and event id hash tables) goes to original
>> evlist.
>>
>> A 'evlist->parent' pointer is added to 'struct perf_evlist' and points to
>> the evlist itself for normal evlists.
>>
>> Children of one evlist are linked into it so one can find all children
>> from its parent.
>> To avoid potential complexity, forbid creating aux evlist from another
>> aux evlist.
>>
>> Improve perf_evlist__munmap_filtered(), so when recording, if an event
>> is terminated, unmap mmaps, from parent and children.
>   
>> Signed-off-by: Wang Nan <wangnan0@huawei.com>
>> Cc: He Kuang <hekuang@huawei.com>
>> Cc: Jiri Olsa <jolsa@kernel.org>
>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
>> Cc: Namhyung Kim <namhyung@kernel.org>
>> Cc: Zefan Li <lizefan@huawei.com>
>> Cc: pi3orama@163.com
>> ---
>>   tools/perf/util/evlist.c | 49 +++++++++++++++++++++++++++++++++++++-----------
>>   tools/perf/util/evlist.h | 12 ++++++++++++
>>   2 files changed, 50 insertions(+), 11 deletions(-)

[SNIP]

>> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
>> index 68cb136..5b50692 100644
>> --- a/tools/perf/util/evlist.h
>> +++ b/tools/perf/util/evlist.h
>> @@ -37,6 +37,10 @@ struct perf_mmap {
>>   
>>   struct perf_evlist {
>>   	struct list_head entries;
>> +	union {
>> +		struct list_head children;
>> +		struct list_head list;
> This is something new, right, i.e. having a list of evlists from a
> parent evlist, one where there can be at most one level, i.e. no
> grandchildrens...
>
> Is this strictly needed? I.e. the existing code calling
> perf_evlist__munmap_filtered() has just the parent pointer and thus, to
> not change it we need to have a way to go the children?
>
> Also, can you se a parent with more than one child?

If we further restrict aux evlist and parent list, allow only one aux
evlist for a parent, we can avoid the linked list. Simply linking parent
and child using pointers would be enough.

However, I plan to introduce multiple children for a parent. There are
two possible aux evlists:

   1. Trigger aux evlist: we can use BPF script to measure events, and 
do something
      (dumpping, printing, computing, reporting...) when it detect something
      (for example, FPS drop to 10 frames/sec). We can create a bpf-output
      event and put it in a separated aux evlist with buffering turned off,
      so once BPF script generate a event perf receives it immediately.

   2. Non-sampling aux evlist: we can use a dummy event to collect fork, 
comm
      and mmap{,2} events in a separated aux evlist. We can use this 
evlist to
      maintain system state during recording, so we can totally solve 
the problem
      describe in patch 8/8.

Thank you.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 3/8] perf tests: Add testcase for auxiliary evlist
  2016-06-20 10:47 ` [PATCH v8 3/8] perf tests: Add testcase for auxiliary evlist Wang Nan
@ 2016-06-21 21:05   ` Arnaldo Carvalho de Melo
  2016-06-22  4:10     ` Wangnan (F)
  0 siblings, 1 reply; 29+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-06-21 21:05 UTC (permalink / raw)
  To: Wang Nan
  Cc: linux-kernel, pi3orama, Arnaldo Carvalho de Melo, Jiri Olsa,
	Masami Hiramatsu, Namhyung Kim, Zefan Li, He Kuang

Em Mon, Jun 20, 2016 at 10:47:20AM +0000, Wang Nan escreveu:
> Improve test backward-ring-buffer, trace both enter and exit event of
> prctl() syscall, utilize auxiliary evlist to mmap enter and exit event
> into separated mmaps.
> 
> Signed-off-by: Wang Nan <wangnan0@huawei.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Zefan Li <lizefan@huawei.com>
> Cc: He Kuang <hekuang@huawei.com>
> ---
>  tools/perf/tests/backward-ring-buffer.c | 87 ++++++++++++++++++++++++++-------
>  1 file changed, 68 insertions(+), 19 deletions(-)
> 
> diff --git a/tools/perf/tests/backward-ring-buffer.c b/tools/perf/tests/backward-ring-buffer.c
> index d9ba991..76e34c0 100644
> --- a/tools/perf/tests/backward-ring-buffer.c
> +++ b/tools/perf/tests/backward-ring-buffer.c
> @@ -31,16 +31,19 @@ static int count_samples(struct perf_evlist *evlist, int *sample_count,
>  	for (i = 0; i < evlist->nr_mmaps; i++) {
>  		union perf_event *event;
>  
> -		perf_evlist__mmap_read_catchup(evlist, i);
> -		while ((event = perf_evlist__mmap_read_backward(evlist, i)) != NULL) {
> +		if (evlist->backward)
> +			perf_evlist__mmap_read_catchup(evlist, i);

So, shouldn't this be done at perf_evlist__mmap_read_catchup()? I.e. you
will use this only when you know that one of the evlists count_samples()
will deal with can be a backwards one...

I.e. do with perf_evlist__mmap_read_catchup the same thing you did in
perf_evlist__mmap_read, checking there this evlist->backward. 

This can be done on top, so I'll continue tentatively merging this.

> +		while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) {
>  			const u32 type = event->header.type;
>  
>  			switch (type) {
>  			case PERF_RECORD_SAMPLE:
> -				(*sample_count)++;
> +				if (sample_count)
> +					(*sample_count)++;
>  				break;
>  			case PERF_RECORD_COMM:
> -				(*comm_count)++;
> +				if (comm_count)
> +					(*comm_count)++;

You could've avoided all this by passing some dummy integer pointer for
the enter_sample_count case. Making the patch smaller helps reviewing
:-)

>  				break;
>  			default:
>  				pr_err("Unexpected record of type %d\n", type);
> @@ -51,34 +54,53 @@ static int count_samples(struct perf_evlist *evlist, int *sample_count,
>  	return TEST_OK;
>  }
>  
> -static int do_test(struct perf_evlist *evlist, int mmap_pages,
> -		   int *sample_count, int *comm_count)
> +static int do_test(struct perf_evlist *evlist,
> +		   struct perf_evlist *aux_evlist,
> +		   int mmap_pages,
> +		   int *enter_sample_count,
> +		   int *exit_sample_count,
> +		   int *comm_count)
>  {
>  	int err;
>  	char sbuf[STRERR_BUFSIZE];
>  
> -	err = perf_evlist__mmap(evlist, mmap_pages, true);
> +	err = perf_evlist__mmap(evlist, mmap_pages, false);
>  	if (err < 0) {
>  		pr_debug("perf_evlist__mmap: %s\n",
>  			 strerror_r(errno, sbuf, sizeof(sbuf)));
>  		return TEST_FAIL;
>  	}
>  
> +	err = perf_evlist__mmap(aux_evlist, mmap_pages, true);
> +	if (err < 0) {
> +		pr_debug("perf_evlist__mmap for aux_evlist: %s\n",
> +			 strerror_r(errno, sbuf, sizeof(sbuf)));
> +		return TEST_FAIL;
> +	}
> +
>  	perf_evlist__enable(evlist);
>  	testcase();
>  	perf_evlist__disable(evlist);
>  
> -	err = count_samples(evlist, sample_count, comm_count);
> +	err = count_samples(aux_evlist, exit_sample_count, comm_count);
> +	if (err)
> +		goto errout;
> +	err = count_samples(evlist, enter_sample_count, NULL);
> +	if (err)
> +		goto errout;
> +errout:
>  	perf_evlist__munmap(evlist);
> +	perf_evlist__munmap(aux_evlist);
>  	return err;
>  }
>  
>  
>  int test__backward_ring_buffer(int subtest __maybe_unused)
>  {
> -	int ret = TEST_SKIP, err, sample_count = 0, comm_count = 0;
> +	int ret = TEST_SKIP, err;
> +	int enter_sample_count = 0, exit_sample_count = 0, comm_count = 0;
>  	char pid[16], sbuf[STRERR_BUFSIZE];
> -	struct perf_evlist *evlist;
> +	struct perf_evlist *evlist, *aux_evlist = NULL;
>  	struct perf_evsel *evsel __maybe_unused;
>  	struct parse_events_error parse_error;
>  	struct record_opts opts = {
> @@ -115,11 +137,22 @@ int test__backward_ring_buffer(int subtest __maybe_unused)
>  		goto out_delete_evlist;
>  	}
>  
> -	perf_evlist__config(evlist, &opts, NULL);
> +	/*
> +	 * Set backward bit, ring buffer should be writing from end. Record
> +	 * it in aux evlist
> +	 */
> +	perf_evlist__last(evlist)->overwrite = true;
> +	perf_evlist__last(evlist)->attr.write_backward = 1;
>  
> -	/* Set backward bit, ring buffer should be writing from end */
> -	evlist__for_each(evlist, evsel)
> -		evsel->attr.write_backward = 1;
> +	err = parse_events(evlist, "syscalls:sys_exit_prctl", &parse_error);
> +	if (err) {
> +		pr_debug("Failed to parse tracepoint event, try use root\n");
> +		ret = TEST_SKIP;
> +		goto out_delete_evlist;
> +	}
> +	/* Don't set backward bit for exit event. Record it in main evlist */
> +
> +	perf_evlist__config(evlist, &opts, NULL);
>  
>  	err = perf_evlist__open(evlist);
>  	if (err < 0) {
> @@ -128,24 +161,40 @@ int test__backward_ring_buffer(int subtest __maybe_unused)
>  		goto out_delete_evlist;
>  	}
>  
> +	aux_evlist = perf_evlist__new_aux(evlist);
> +	if (!aux_evlist) {
> +		pr_debug("perf_evlist__new_aux failed\n");
> +		goto out_delete_evlist;
> +	}
> +	aux_evlist->backward = true;
> +
>  	ret = TEST_FAIL;
> -	err = do_test(evlist, opts.mmap_pages, &sample_count,
> +	err = do_test(evlist, aux_evlist, opts.mmap_pages,
> +		      &enter_sample_count, &exit_sample_count,
>  		      &comm_count);
>  	if (err != TEST_OK)
>  		goto out_delete_evlist;
>  
> -	if ((sample_count != NR_ITERS) || (comm_count != NR_ITERS)) {
> -		pr_err("Unexpected counter: sample_count=%d, comm_count=%d\n",
> -		       sample_count, comm_count);
> +	if (enter_sample_count != exit_sample_count) {
> +		pr_err("Unexpected counter: enter_sample_count=%d, exit_sample_count=%d\n",
> +		       enter_sample_count, exit_sample_count);
> +		goto out_delete_evlist;
> +	}
> +
> +	if ((exit_sample_count != NR_ITERS) || (comm_count != NR_ITERS)) {
> +		pr_err("Unexpected counter: exit_sample_count=%d, comm_count=%d\n",
> +		       exit_sample_count, comm_count);
>  		goto out_delete_evlist;
>  	}
>  
> -	err = do_test(evlist, 1, &sample_count, &comm_count);
> +	err = do_test(evlist, aux_evlist, 1, NULL, NULL, NULL);
>  	if (err != TEST_OK)
>  		goto out_delete_evlist;
>  
>  	ret = TEST_OK;
>  out_delete_evlist:
> +	if (aux_evlist)
> +		perf_evlist__delete(aux_evlist);
>  	perf_evlist__delete(evlist);
>  	return ret;
>  }
> -- 
> 1.8.3.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 4/8] perf record: Introduce rec->overwrite_evlist for overwritable events
  2016-06-20 10:47 ` [PATCH v8 4/8] perf record: Introduce rec->overwrite_evlist for overwritable events Wang Nan
@ 2016-06-21 21:30   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 29+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-06-21 21:30 UTC (permalink / raw)
  To: Wang Nan
  Cc: linux-kernel, pi3orama, He Kuang, Arnaldo Carvalho de Melo,
	Jiri Olsa, Masami Hiramatsu, Namhyung Kim, Zefan Li

Em Mon, Jun 20, 2016 at 10:47:21AM +0000, Wang Nan escreveu:
> Create an auxiliary evlist for overwritable events.
> 
> Before mmap, build this evlist and set 'overwrite' and 'backward'
> attribute. Since perf_evlist__mmap_ex() only maps events when
> evsel->overwrite matches evlist's corresponding attributes, with
> these two evlists an event goes to either rec->evlist or
> rec->overwrite_evlist.

Please try to make your patches more granular, for instance, in this
case you moved the call to perf_evlist__mmap_ex and all that block to a
function called record__mmap_evlist(), I had to check if you had
introduced any change in that block of code.

Instead you could've made a prep change, one that introduced
record__mmap_evlist() with that block, making clear in the changeset
that no change was introduced, I would check that anyway, but being in a
separate patch, would move this out of the way for the meat in this
patch, which is to introduce the rec->overwrite_evlist when backward
events were asked for.

Doing it this way will speed up processing of your patches and even more
importantly, will make our lives easier in the future when looking for
bugs :-\

I am breaking this down as described, please take a look at the result.

Thanks,

- Arnaldo
 
> Signed-off-by: Wang Nan <wangnan0@huawei.com>
> Cc: He Kuang <hekuang@huawei.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Zefan Li <lizefan@huawei.com>
> Cc: pi3orama@163.com
> ---
>  tools/perf/builtin-record.c | 132 ++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 109 insertions(+), 23 deletions(-)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index d4cf1b0..dbbb3c0 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -50,6 +50,7 @@ struct record {
>  	struct perf_data_file	file;
>  	struct auxtrace_record	*itr;
>  	struct perf_evlist	*evlist;
> +	struct perf_evlist	*overwrite_evlist;
>  	struct perf_session	*session;
>  	const char		*progname;
>  	int			realtime_prio;
> @@ -341,6 +342,84 @@ int auxtrace_record__snapshot_start(struct auxtrace_record *itr __maybe_unused)
>  
>  #endif
>  
> +static int record__create_overwrite_evlist(struct record *rec)
> +{
> +	struct perf_evlist *evlist = rec->evlist;
> +	struct perf_evsel *pos;
> +
> +	evlist__for_each(evlist, pos) {
> +		if (!pos->overwrite)
> +			continue;
> +
> +		if (!rec->overwrite_evlist) {
> +			rec->overwrite_evlist = perf_evlist__new_aux(evlist);
> +			if (rec->overwrite_evlist) {
> +				rec->overwrite_evlist->backward = true;
> +				rec->overwrite_evlist->overwrite = true;
> +				return 0;
> +			} else
> +				return -ENOMEM;
> +		}
> +	}
> +	return 0;
> +}
> +
> +static int record__mmap_evlist(struct record *rec,
> +			       struct perf_evlist *evlist,
> +			       bool overwrite)
> +{
> +	struct record_opts *opts = &rec->opts;
> +	char msg[512];
> +
> +	/*
> +	 * Don't use evlist->overwrite because it is logically an
> +	 * internal attribute and is set by perf_evlist__mmap_ex().
> +	 * Avoid circular dependency.
> +	 */
> +	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages, overwrite,
> +				 opts->auxtrace_mmap_pages,
> +				 opts->auxtrace_snapshot_mode) < 0) {
> +		if (errno == EPERM) {
> +			pr_err("Permission error mapping pages.\n"
> +			       "Consider increasing "
> +			       "/proc/sys/kernel/perf_event_mlock_kb,\n"
> +			       "or try again with a smaller value of -m/--mmap_pages.\n"
> +			       "(current value: %u,%u)\n",
> +			       opts->mmap_pages, opts->auxtrace_mmap_pages);
> +			return -errno;
> +		} else {
> +			pr_err("failed to mmap with %d (%s)\n", errno,
> +				strerror_r(errno, msg, sizeof(msg)));
> +			if (errno)
> +				return -errno;
> +			else
> +				return -EINVAL;
> +		}
> +	}
> +	return 0;
> +}
> +
> +static int record__mmap(struct record *rec)
> +{
> +	int err;
> +
> +	err = record__create_overwrite_evlist(rec);
> +	if (err)
> +		return err;
> +
> +	err = record__mmap_evlist(rec, rec->evlist, false);
> +	if (err)
> +		return err;
> +
> +	if (!rec->overwrite_evlist)
> +		return 0;
> +
> +	err = record__mmap_evlist(rec, rec->overwrite_evlist, true);
> +	if (err)
> +		return err;
> +	return 0;
> +}
> +
>  static int record__open(struct record *rec)
>  {
>  	char msg[512];
> @@ -353,6 +432,13 @@ static int record__open(struct record *rec)
>  	perf_evlist__config(evlist, opts, &callchain_param);
>  
>  	evlist__for_each(evlist, pos) {
> +		if (pos->overwrite) {
> +			if (!pos->attr.write_backward) {
> +				ui__warning("Unable to read from overwrite ring buffer\n\n");
> +				rc = -ENOSYS;
> +				goto out;
> +			}
> +		}
>  try_again:
>  		if (perf_evsel__open(pos, pos->cpus, pos->threads) < 0) {
>  			if (perf_evsel__fallback(pos, errno, msg, sizeof(msg))) {
> @@ -377,28 +463,9 @@ try_again:
>  		goto out;
>  	}
>  
> -	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages, false,
> -				 opts->auxtrace_mmap_pages,
> -				 opts->auxtrace_snapshot_mode) < 0) {
> -		if (errno == EPERM) {
> -			pr_err("Permission error mapping pages.\n"
> -			       "Consider increasing "
> -			       "/proc/sys/kernel/perf_event_mlock_kb,\n"
> -			       "or try again with a smaller value of -m/--mmap_pages.\n"
> -			       "(current value: %u,%u)\n",
> -			       opts->mmap_pages, opts->auxtrace_mmap_pages);
> -			rc = -errno;
> -		} else {
> -			pr_err("failed to mmap with %d (%s)\n", errno,
> -				strerror_r(errno, msg, sizeof(msg)));
> -			if (errno)
> -				rc = -errno;
> -			else
> -				rc = -EINVAL;
> -		}
> +	rc = record__mmap(rec);
> +	if (rc)
>  		goto out;
> -	}
> -
>  	session->evlist = evlist;
>  	perf_session__set_id_hdr_size(session);
>  out:
> @@ -655,10 +722,26 @@ perf_event__synth_time_conv(const struct perf_event_mmap_page *pc __maybe_unused
>  	return 0;
>  }
>  
> +static const struct perf_event_mmap_page *
> +perf_evlist__pick_pc(struct perf_evlist *evlist)
> +{
> +	if (evlist && evlist->mmap && evlist->mmap[0].base)
> +		return evlist->mmap[0].base;
> +	return NULL;
> +}
> +
>  static const struct perf_event_mmap_page *record__pick_pc(struct record *rec)
>  {
> -	if (rec->evlist && rec->evlist->mmap && rec->evlist->mmap[0].base)
> -		return rec->evlist->mmap[0].base;
> +	const struct perf_event_mmap_page *pc;
> +
> +	/* Change it to a loop if a new aux evlist is added */
> +	pc = perf_evlist__pick_pc(rec->evlist);
> +	if (pc)
> +		return pc;
> +	pc = perf_evlist__pick_pc(rec->overwrite_evlist);
> +	if (pc)
> +		return pc;
> +
>  	return NULL;
>  }
>  
> @@ -1269,6 +1352,7 @@ static struct record record = {
>  		.mmap2		= perf_event__process_mmap2,
>  		.ordered_events	= true,
>  	},
> +	.overwrite_evlist = NULL,
>  };
>  
>  const char record_callchain_help[] = CALLCHAIN_RECORD_HELP
> @@ -1565,6 +1649,8 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
>  	err = __cmd_record(&record, argc, argv);
>  out_symbol_exit:
>  	perf_evlist__delete(rec->evlist);
> +	if (rec->overwrite_evlist)
> +		perf_evlist__delete(rec->overwrite_evlist);
>  	symbol__exit();
>  	auxtrace_record__free(rec->itr);
>  	return err;
> -- 
> 1.8.3.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 6/8] perf tools: Enable overwrite settings
  2016-06-20 10:47 ` [PATCH v8 6/8] perf tools: Enable overwrite settings Wang Nan
@ 2016-06-21 21:49   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 29+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-06-21 21:49 UTC (permalink / raw)
  To: Wang Nan
  Cc: linux-kernel, pi3orama, He Kuang, Arnaldo Carvalho de Melo,
	Jiri Olsa, Masami Hiramatsu, Namhyung Kim, Zefan Li

Em Mon, Jun 20, 2016 at 10:47:23AM +0000, Wang Nan escreveu:
> This patch allows following config terms and option:
> 
> Globally setting events to overwrite;
> 
>  # perf record --overwrite ...
> 
> Set specific events to be overwrite or no-overwrite.
> 
>  # perf record --event cycles/overwrite/ ...
>  # perf record --event cycles/no-overwrite/ ...
> 
> Add missing config terms and update config term array size because the
> longest string length is changed.
> 
> For overwritable events, automatically select attr.write_backward since
> perf requires it to be backward for reading.
> 
> Test result:
>  # perf record --overwrite -e syscalls:*enter_nanosleep* usleep 1
>  [ perf record: Woken up 2 times to write data ]
>  [ perf record: Captured and wrote 0.011 MB perf.data (1 samples) ]
>  # perf evlist -v
>  syscalls:sys_enter_nanosleep: type: 2, size: 112, config: 0x134, { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|RAW, disabled: 1, inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, write_backward: 1
>  # Tip: use 'perf evlist --trace-fields' to show fields for tracepoint events
> 
> Signed-off-by: Wang Nan <wangnan0@huawei.com>
> Signed-off-by: He Kuang <hekuang@huawei.com>
> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Zefan Li <lizefan@huawei.com>
> Cc: pi3orama@163.com
> ---
>  tools/perf/Documentation/perf-record.txt | 14 ++++++++++++++
>  tools/perf/builtin-record.c              |  1 +
>  tools/perf/perf.h                        |  1 +
>  tools/perf/tests/backward-ring-buffer.c  | 15 ++++++---------
>  tools/perf/util/evsel.c                  | 12 ++++++++++++
>  tools/perf/util/evsel.h                  |  2 ++
>  tools/perf/util/parse-events.c           | 20 ++++++++++++++++++--
>  tools/perf/util/parse-events.h           |  2 ++
>  tools/perf/util/parse-events.l           |  2 ++
>  9 files changed, 58 insertions(+), 11 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> index 8dbee83..f5cb932 100644
> --- a/tools/perf/Documentation/perf-record.txt
> +++ b/tools/perf/Documentation/perf-record.txt
> @@ -360,6 +360,20 @@ particular perf.data snapshot should be kept or not.
>  
>  Implies --timestamp-filename, --no-buildid and --no-buildid-cache.
>  
> +--overwrite::
> +Makes all events use overwritable ring buffer. Event with overwritable ring
> +buffer works like a flight recorder: when buffer gets full, instead of dumping
> +records into output file, kernel overwrites old records silently. Perf dumps
> +data from overwritable ring buffer when switching output (see --switch-output)
> +and before terminate.

What about: (some other alternative paragraphs below)

---------------

Makes all events use an overwritable ring buffer. An overwritable ring
buffer works like a flight recorder: when it gets full, the kernel will
overwrite the oldest records, that thus will never make it to the
perf.data file.

> +
> +Perf behaves like a daemon when '--overwrite' and '--switch-output' are
> +provided. It record and drop events in background, and dumps data when
> +something unusual is detected.

When '--overwrite' and '--switch-output' are used perf records and drops
events until it receives a signal, meaning that something unusual was
detected that warrants taking a snapshot of the most current events,
those fitting in the ring buffer at that moment.

> +
> +'overwrite' attribute can also be set or canceled for specific event using
                                                        an
> +config terms like 'cycles/overwrite/' and 'instructions/no-overwrite/'.
                    in
> +
>  SEE ALSO
>  --------
>  linkperf:perf-stat[1], linkperf:perf-list[1]
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 48c0051..bb62882 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1520,6 +1520,7 @@ struct option __record_options[] = {
>  	OPT_BOOLEAN_SET('i', "no-inherit", &record.opts.no_inherit,
>  			&record.opts.no_inherit_set,
>  			"child tasks do not inherit counters"),
> +	OPT_BOOLEAN(0, "overwrite", &record.opts.overwrite, "use overwrite mode"),
>  	OPT_UINTEGER('F', "freq", &record.opts.user_freq, "profile at this frequency"),
>  	OPT_CALLBACK('m', "mmap-pages", &record.opts, "pages[,pages]",
>  		     "number of mmap data pages and AUX area tracing mmap pages",
> diff --git a/tools/perf/perf.h b/tools/perf/perf.h
> index cd8f1b1..608b42b 100644
> --- a/tools/perf/perf.h
> +++ b/tools/perf/perf.h
> @@ -59,6 +59,7 @@ struct record_opts {
>  	bool	     record_switch_events;
>  	bool	     all_kernel;
>  	bool	     all_user;
> +	bool	     overwrite;
>  	unsigned int freq;
>  	unsigned int mmap_pages;
>  	unsigned int auxtrace_mmap_pages;
> diff --git a/tools/perf/tests/backward-ring-buffer.c b/tools/perf/tests/backward-ring-buffer.c
> index 76e34c0..bba0b83 100644
> --- a/tools/perf/tests/backward-ring-buffer.c
> +++ b/tools/perf/tests/backward-ring-buffer.c
> @@ -130,27 +130,24 @@ int test__backward_ring_buffer(int subtest __maybe_unused)
>  	}
>  
>  	bzero(&parse_error, sizeof(parse_error));
> -	err = parse_events(evlist, "syscalls:sys_enter_prctl", &parse_error);
> +	/*
> +	 * Set backward bit, ring buffer should be writing from end. Record
> +	 * it in aux evlist
> +	 */
> +	err = parse_events(evlist, "syscalls:sys_enter_prctl/overwrite/", &parse_error);
>  	if (err) {
>  		pr_debug("Failed to parse tracepoint event, try use root\n");
>  		ret = TEST_SKIP;
>  		goto out_delete_evlist;
>  	}
>  
> -	/*
> -	 * Set backward bit, ring buffer should be writing from end. Record
> -	 * it in aux evlist
> -	 */
> -	perf_evlist__last(evlist)->overwrite = true;
> -	perf_evlist__last(evlist)->attr.write_backward = 1;
> -
> +	/* Don't set backward bit for exit event. Record it in main evlist */
>  	err = parse_events(evlist, "syscalls:sys_exit_prctl", &parse_error);
>  	if (err) {
>  		pr_debug("Failed to parse tracepoint event, try use root\n");
>  		ret = TEST_SKIP;
>  		goto out_delete_evlist;
>  	}
> -	/* Don't set backward bit for exit event. Record it in main evlist */
>  
>  	perf_evlist__config(evlist, &opts, NULL);
>  
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 1d8f2bb..3629520 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -677,11 +677,22 @@ static void apply_config_terms(struct perf_evsel *evsel,
>  			 */
>  			attr->inherit = term->val.inherit ? 1 : 0;
>  			break;
> +		case PERF_EVSEL__CONFIG_TERM_OVERWRITE:
> +			evsel->overwrite = term->val.overwrite ? 1 : 0;
> +			break;
>  		default:
>  			break;
>  		}
>  	}
>  
> +	/*
> +	 * Set backward after config term processing because it is
> +	 * possible to set overwrite globally, without config
> +	 * terms.
> +	 */
> +	if (evsel->overwrite)
> +		attr->write_backward = 1;
> +
>  	/* User explicitly set per-event callgraph, clear the old setting and reset. */
>  	if ((callgraph_buf != NULL) || (dump_size > 0) || max_stack) {
>  		if (max_stack) {
> @@ -758,6 +769,7 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts,
>  
>  	attr->sample_id_all = perf_missing_features.sample_id_all ? 0 : 1;
>  	attr->inherit	    = !opts->no_inherit;
> +	evsel->overwrite    = opts->overwrite;
>  
>  	perf_evsel__set_sample_bit(evsel, IP);
>  	perf_evsel__set_sample_bit(evsel, TID);
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index 828ddd1..55ff958 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -45,6 +45,7 @@ enum {
>  	PERF_EVSEL__CONFIG_TERM_STACK_USER,
>  	PERF_EVSEL__CONFIG_TERM_INHERIT,
>  	PERF_EVSEL__CONFIG_TERM_MAX_STACK,
> +	PERF_EVSEL__CONFIG_TERM_OVERWRITE,
>  	PERF_EVSEL__CONFIG_TERM_MAX,
>  };
>  
> @@ -59,6 +60,7 @@ struct perf_evsel_config_term {
>  		u64	stack_user;
>  		int	max_stack;
>  		bool	inherit;
> +		bool	overwrite;
>  	} val;
>  };
>  
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index d15e335..b641f75 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -901,6 +901,8 @@ static const char *config_term_names[__PARSE_EVENTS__TERM_TYPE_NR] = {
>  	[PARSE_EVENTS__TERM_TYPE_NOINHERIT]		= "no-inherit",
>  	[PARSE_EVENTS__TERM_TYPE_INHERIT]		= "inherit",
>  	[PARSE_EVENTS__TERM_TYPE_MAX_STACK]		= "max-stack",
> +	[PARSE_EVENTS__TERM_TYPE_OVERWRITE]		= "overwrite",
> +	[PARSE_EVENTS__TERM_TYPE_NOOVERWRITE]		= "no-overwrite",
>  };
>  
>  static bool config_term_shrinked;
> @@ -993,6 +995,12 @@ do {									   \
>  	case PARSE_EVENTS__TERM_TYPE_NOINHERIT:
>  		CHECK_TYPE_VAL(NUM);
>  		break;
> +	case PARSE_EVENTS__TERM_TYPE_OVERWRITE:
> +		CHECK_TYPE_VAL(NUM);
> +		break;
> +	case PARSE_EVENTS__TERM_TYPE_NOOVERWRITE:
> +		CHECK_TYPE_VAL(NUM);
> +		break;
>  	case PARSE_EVENTS__TERM_TYPE_NAME:
>  		CHECK_TYPE_VAL(STR);
>  		break;
> @@ -1045,6 +1053,8 @@ static int config_term_tracepoint(struct perf_event_attr *attr,
>  	case PARSE_EVENTS__TERM_TYPE_INHERIT:
>  	case PARSE_EVENTS__TERM_TYPE_NOINHERIT:
>  	case PARSE_EVENTS__TERM_TYPE_MAX_STACK:
> +	case PARSE_EVENTS__TERM_TYPE_OVERWRITE:
> +	case PARSE_EVENTS__TERM_TYPE_NOOVERWRITE:
>  		return config_term_common(attr, term, err);
>  	default:
>  		if (err) {
> @@ -1117,6 +1127,12 @@ do {								\
>  		case PARSE_EVENTS__TERM_TYPE_MAX_STACK:
>  			ADD_CONFIG_TERM(MAX_STACK, max_stack, term->val.num);
>  			break;
> +		case PARSE_EVENTS__TERM_TYPE_OVERWRITE:
> +			ADD_CONFIG_TERM(OVERWRITE, overwrite, term->val.num ? 1 : 0);
> +			break;
> +		case PARSE_EVENTS__TERM_TYPE_NOOVERWRITE:
> +			ADD_CONFIG_TERM(OVERWRITE, overwrite, term->val.num ? 0 : 1);
> +			break;
>  		default:
>  			break;
>  		}
> @@ -2330,9 +2346,9 @@ static void config_terms_list(char *buf, size_t buf_sz)
>  char *parse_events_formats_error_string(char *additional_terms)
>  {
>  	char *str;
> -	/* "branch_type" is the longest name */
> +	/* "no-overwrite" is the longest name */
>  	char static_terms[__PARSE_EVENTS__TERM_TYPE_NR *
> -			  (sizeof("branch_type") - 1)];
> +			  (sizeof("no-overwrite") - 1)];
>  
>  	config_terms_list(static_terms, sizeof(static_terms));
>  	/* valid terms */
> diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
> index 46c05cc..1b04d82 100644
> --- a/tools/perf/util/parse-events.h
> +++ b/tools/perf/util/parse-events.h
> @@ -69,6 +69,8 @@ enum {
>  	PARSE_EVENTS__TERM_TYPE_NOINHERIT,
>  	PARSE_EVENTS__TERM_TYPE_INHERIT,
>  	PARSE_EVENTS__TERM_TYPE_MAX_STACK,
> +	PARSE_EVENTS__TERM_TYPE_NOOVERWRITE,
> +	PARSE_EVENTS__TERM_TYPE_OVERWRITE,
>  	__PARSE_EVENTS__TERM_TYPE_NR,
>  };
>  
> diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
> index 3c15b33..7a25194 100644
> --- a/tools/perf/util/parse-events.l
> +++ b/tools/perf/util/parse-events.l
> @@ -202,6 +202,8 @@ stack-size		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_STACKSIZE); }
>  max-stack		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_MAX_STACK); }
>  inherit			{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_INHERIT); }
>  no-inherit		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NOINHERIT); }
> +overwrite		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_OVERWRITE); }
> +no-overwrite		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NOOVERWRITE); }
>  ,			{ return ','; }
>  "/"			{ BEGIN(INITIAL); return '/'; }
>  {name_minus}		{ return str(yyscanner, PE_NAME); }
> -- 
> 1.8.3.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v8 3/8] perf tests: Add testcase for auxiliary evlist
  2016-06-21 21:05   ` Arnaldo Carvalho de Melo
@ 2016-06-22  4:10     ` Wangnan (F)
  0 siblings, 0 replies; 29+ messages in thread
From: Wangnan (F) @ 2016-06-22  4:10 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, pi3orama, Arnaldo Carvalho de Melo, Jiri Olsa,
	Masami Hiramatsu, Namhyung Kim, Zefan Li, He Kuang



On 2016/6/22 5:05, Arnaldo Carvalho de Melo wrote:
> Em Mon, Jun 20, 2016 at 10:47:20AM +0000, Wang Nan escreveu:
>> Improve test backward

[SNIP]

>>
>> diff --git a/tools/perf/tests/backward-ring-buffer.c b/tools/perf/tests/backward-ring-buffer.c
>> index d9ba991..76e34c0 100644
>> --- a/tools/perf/tests/backward-ring-buffer.c
>> +++ b/tools/perf/tests/backward-ring-buffer.c
>> @@ -31,16 +31,19 @@ static int count_samples(struct perf_evlist *evlist, int *sample_count,
>>   	for (i = 0; i < evlist->nr_mmaps; i++) {
>>   		union perf_event *event;
>>   
>> -		perf_evlist__mmap_read_catchup(evlist, i);
>> -		while ((event = perf_evlist__mmap_read_backward(evlist, i)) != NULL) {
>> +		if (evlist->backward)
>> +			perf_evlist__mmap_read_catchup(evlist, i);
> So, shouldn't this be done at perf_evlist__mmap_read_catchup()? I.e. you
> will use this only when you know that one of the evlists count_samples()
> will deal with can be a backwards one...
>
> I.e. do with perf_evlist__mmap_read_catchup the same thing you did in
> perf_evlist__mmap_read, checking there this evlist->backward.

I can make the code clearer, but I don't agree hiding evlist->backward 
checker in
perf_evlist__mmap_read_catchup():

1. If we make perf_evlist__mmap_read_catchup() implicitly ignore 
non-backward evlist,
    then we are creating a new rule for reading from mmaps that, before 
calling
    perf_evlist__mmap_read() we need to call 
perf_evlist__mmap_read_catchup() first.
    Theoretically, existing code should be adjusted to satisify this new 
rule, but
    actually most of catchup does nothing.

    If we don't require existing code be adjusted, then we are still 
required to
    clarify when catchup() is required, so evlist->backward is still 
exposed.

2. I think we don't need to restrict perf_evlist__mmap_read_catchup() 
for backward
    ring buffer. It is a generic operations, can be used for a normal 
evlist to
    consume existing data in ring buffer.


> This can be done on top, so I'll continue tentatively merging this.
>
>> +		while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) {
>>   			const u32 type = event->header.type;
>>   
>>   			switch (type) {
>>   			case PERF_RECORD_SAMPLE:
>> -				(*sample_count)++;
>> +				if (sample_count)
>> +					(*sample_count)++;
>>   				break;
>>   			case PERF_RECORD_COMM:
>> -				(*comm_count)++;
>> +				if (comm_count)
>> +					(*comm_count)++;
> You could've avoided all this by passing some dummy integer pointer for
> the enter_sample_count case. Making the patch smaller helps reviewing
> :-)

Will do.

Thank you.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [tip:perf/core] perf evsel: Fix write_backwards fallback
  2016-06-16 21:47   ` Arnaldo Carvalho de Melo
  2016-06-20  4:09     ` Wangnan (F)
@ 2016-06-22  7:43     ` tip-bot for Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 29+ messages in thread
From: tip-bot for Arnaldo Carvalho de Melo @ 2016-06-22  7:43 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, hpa, hekuang, linux-kernel, lizefan, wangnan0, namhyung,
	jolsa, mhiramat, mingo, acme, acme

Commit-ID:  7da36e94e7fad768ca8640b61ed1f49b284e1dc5
Gitweb:     http://git.kernel.org/tip/7da36e94e7fad768ca8640b61ed1f49b284e1dc5
Author:     Arnaldo Carvalho de Melo <acme@kernel.org>
AuthorDate: Mon, 20 Jun 2016 10:47:18 +0000
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 21 Jun 2016 13:18:35 -0300

perf evsel: Fix write_backwards fallback

Commit b90dc17a5d14 "perf evsel: Add overwrite attribute and check
write_backward" misunderstood the 'order' should be obeyed in
__perf_evsel__open.

But the way this was done for attr.write_backwards was buggy, as we need
to check features in the inverse order of their introduction to the
kernel, so that a newer tool checks first the newest perf_event_attr
fields, detecting that the older kernel doesn't have support for them.

Also, we can avoid calling sys_perf_event_open() if we have already
detected the missing of write_backward.

Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Fixes: b90dc17a5d14 ("perf evsel: Add overwrite attribute and check write_backward")
Link: http://lkml.kernel.org/r/1466419645-75551-2-git-send-email-wangnan0@huawei.com
Link: http://lkml.kernel.org/r/20160616214724.GI13337@kernel.org
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evsel.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 9b2e3e6..1d8f2bb 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1389,8 +1389,11 @@ fallback_missing_features:
 	if (perf_missing_features.lbr_flags)
 		evsel->attr.branch_sample_type &= ~(PERF_SAMPLE_BRANCH_NO_FLAGS |
 				     PERF_SAMPLE_BRANCH_NO_CYCLES);
-	if (perf_missing_features.write_backward)
+	if (perf_missing_features.write_backward) {
+		if (evsel->overwrite)
+			return -EINVAL;
 		evsel->attr.write_backward = false;
+	}
 retry_sample_id:
 	if (perf_missing_features.sample_id_all)
 		evsel->attr.sample_id_all = 0;
@@ -1453,12 +1456,6 @@ retry_open:
 				err = -EINVAL;
 				goto out_close;
 			}
-
-			if (evsel->overwrite &&
-			    perf_missing_features.write_backward) {
-				err = -EINVAL;
-				goto out_close;
-			}
 		}
 	}
 
@@ -1496,7 +1493,10 @@ try_fallback:
 	 * Must probe features in the order they were added to the
 	 * perf_event_attr interface.
 	 */
-	if (!perf_missing_features.clockid_wrong && evsel->attr.use_clockid) {
+	if (!perf_missing_features.write_backward && evsel->attr.write_backward) {
+		perf_missing_features.write_backward = true;
+		goto fallback_missing_features;
+	} else if (!perf_missing_features.clockid_wrong && evsel->attr.use_clockid) {
 		perf_missing_features.clockid_wrong = true;
 		goto fallback_missing_features;
 	} else if (!perf_missing_features.clockid && evsel->attr.use_clockid) {
@@ -1521,12 +1521,7 @@ try_fallback:
 			  PERF_SAMPLE_BRANCH_NO_FLAGS))) {
 		perf_missing_features.lbr_flags = true;
 		goto fallback_missing_features;
-	} else if (!perf_missing_features.write_backward &&
-			evsel->attr.write_backward) {
-		perf_missing_features.write_backward = true;
-		goto fallback_missing_features;
 	}
-
 out_close:
 	do {
 		while (--thread >= 0) {
@@ -2409,6 +2404,8 @@ int perf_evsel__open_strerror(struct perf_evsel *evsel, struct target *target,
 	"We found oprofile daemon running, please stop it and try again.");
 		break;
 	case EINVAL:
+		if (evsel->overwrite && perf_missing_features.write_backward)
+			return scnprintf(msg, size, "Reading from overwrite event is not supported by this kernel.");
 		if (perf_missing_features.clockid)
 			return scnprintf(msg, size, "clockid feature not supported.");
 		if (perf_missing_features.clockid_wrong)

^ permalink raw reply related	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2016-06-22  7:44 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-15  2:23 [PATCH v7 0/8] perf tools: Support overwritable ring buffer Wang Nan
2016-06-15  2:23 ` [PATCH v7 1/8] perf evlist: Introduce aux evlist Wang Nan
2016-06-15  2:23 ` [PATCH v7 2/8] perf tests: Add testcase for auxiliary evlist Wang Nan
2016-06-15  2:23 ` [PATCH v7 3/8] perf record: Introduce rec->overwrite_evlist for overwritable events Wang Nan
2016-06-15  2:23 ` [PATCH v7 4/8] perf record: Toggle overwrite ring buffer for reading Wang Nan
2016-06-15  2:23 ` [PATCH v7 5/8] perf tools: Enable overwrite settings Wang Nan
2016-06-15  2:23 ` [PATCH v7 6/8] perf tools: Don't warn about out of order event if write_backward is used Wang Nan
2016-06-15  2:23 ` [PATCH v7 7/8] perf tools: Check write_backward during evlist config Wang Nan
2016-06-16 21:47   ` Arnaldo Carvalho de Melo
2016-06-20  4:09     ` Wangnan (F)
2016-06-22  7:43     ` [tip:perf/core] perf evsel: Fix write_backwards fallback tip-bot for Arnaldo Carvalho de Melo
2016-06-15  2:23 ` [PATCH v7 8/8] perf record: Unmap overwrite evlist when event terminate Wang Nan
2016-06-16 20:59   ` Arnaldo Carvalho de Melo
2016-06-20  8:04     ` Wangnan (F)
2016-06-20 10:47 [PATCH v8 0/8] perf tools: Support overwritable ring buffer Wang Nan
2016-06-20 10:47 ` [PATCH v8 1/8] perf tools: Fix write_backwards fallback Wang Nan
2016-06-20 10:47 ` [PATCH v8 2/8] perf evlist: Introduce aux evlist Wang Nan
2016-06-20 20:36   ` Arnaldo Carvalho de Melo
2016-06-21  1:31     ` Wangnan (F)
2016-06-20 10:47 ` [PATCH v8 3/8] perf tests: Add testcase for auxiliary evlist Wang Nan
2016-06-21 21:05   ` Arnaldo Carvalho de Melo
2016-06-22  4:10     ` Wangnan (F)
2016-06-20 10:47 ` [PATCH v8 4/8] perf record: Introduce rec->overwrite_evlist for overwritable events Wang Nan
2016-06-21 21:30   ` Arnaldo Carvalho de Melo
2016-06-20 10:47 ` [PATCH v8 5/8] perf record: Toggle overwrite ring buffer for reading Wang Nan
2016-06-20 10:47 ` [PATCH v8 6/8] perf tools: Enable overwrite settings Wang Nan
2016-06-21 21:49   ` Arnaldo Carvalho de Melo
2016-06-20 10:47 ` [PATCH v8 7/8] perf tools: Don't warn about out of order event if write_backward is used Wang Nan
2016-06-20 10:47 ` [PATCH v8 8/8] perf tools: Add --tail-synthesize option Wang Nan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).