All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] perf record: Allow multiple recording time ranges
@ 2022-08-24  7:28 Adrian Hunter
  2022-08-24  7:28 ` [PATCH 1/5] perf record: Fix way of handling non-perf-event pollfds Adrian Hunter
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Adrian Hunter @ 2022-08-24  7:28 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen,
	Alexey Bayduraev, linux-kernel

Hi

This patch set extends perf record -D/--delay option to accept time ranges
for when events are enabled, for instance:

    perf record -e intel_pt// -D 10-20,30-40

to record 10ms to 20ms into the trace and 30ms to 40ms.  Refer patch 5 for
more details.

This ran into an issue with the handling of polling file descriptors.
Essentially, calling perf_evlist__add_pollfd() would not actually result
in the file descriptor being polled.  Patches 1, 2 and 3 deal with that.

Patch 4 deals with the issue that when disabling events, we actually do not
want to disable events collecting sideband information.

Patch 5, the last patch, actually makes the desired enhancement to
perf record.


Adrian Hunter (5):
      perf record: Fix way of handling non-perf-event pollfds
      perf record: Fix done_fd wakeup event
      perf record: Change evlist->ctl_fd to use fdarray_flag__non_perf_event
      perf evlist: Add evlist__{en/dis}able_non_dummy()
      perf record: Allow multiple recording time ranges

 tools/lib/api/fd/array.h                 |   5 +-
 tools/perf/Documentation/perf-record.txt |   6 +-
 tools/perf/builtin-record.c              | 117 ++++++++++++--
 tools/perf/util/evlist.c                 | 270 +++++++++++++++++++++++++++++--
 tools/perf/util/evlist.h                 |  12 +-
 5 files changed, 374 insertions(+), 36 deletions(-)


Regards
Adrian

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/5] perf record: Fix way of handling non-perf-event pollfds
  2022-08-24  7:28 [PATCH 0/5] perf record: Allow multiple recording time ranges Adrian Hunter
@ 2022-08-24  7:28 ` Adrian Hunter
  2022-08-24 15:40   ` Ian Rogers
  2022-08-24  7:28 ` [PATCH 2/5] perf record: Fix done_fd wakeup event Adrian Hunter
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Adrian Hunter @ 2022-08-24  7:28 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen,
	Alexey Bayduraev, linux-kernel

perf record __cmd_record() does not poll evlist pollfds. Instead it polls
thread_data[0].pollfd. That happens whether or not threads are being used.

perf record duplicates evlist mmap pollfds as needed for separate threads.
The non-perf-event represented by evlist->ctl_fd has to handled separately,
which is done explicitly, duplicating it into the thread_data[0] pollfds.
That approach neglects any other non-perf-event file descriptors. Currently
there is also done_fd which needs the same handling.

Add a new generalized approach.

Add fdarray_flag__non_perf_event to identify the file descriptors that
need the special handling. For those cases, also keep a mapping of the
evlist pollfd index and thread pollfd index, so that the evlist revents
can be updated.

Although this patch adds the new handling, it does not take it into use.
There is no functional change, but it is the precursor to a fix, so is
marked as a fix.

Fixes: 415ccb58f68a ("perf record: Introduce thread specific data array")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/lib/api/fd/array.h    |  5 ++-
 tools/perf/builtin-record.c | 80 +++++++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+), 2 deletions(-)

diff --git a/tools/lib/api/fd/array.h b/tools/lib/api/fd/array.h
index 60ad197c8ee9..5c01f7b05dfb 100644
--- a/tools/lib/api/fd/array.h
+++ b/tools/lib/api/fd/array.h
@@ -31,8 +31,9 @@ struct fdarray {
 };
 
 enum fdarray_flags {
-	fdarray_flag__default	    = 0x00000000,
-	fdarray_flag__nonfilterable = 0x00000001
+	fdarray_flag__default		= 0x00000000,
+	fdarray_flag__nonfilterable	= 0x00000001,
+	fdarray_flag__non_perf_event	= 0x00000002,
 };
 
 void fdarray__init(struct fdarray *fda, int nr_autogrow);
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4713f0f3a6cf..e0be48c47f65 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -143,6 +143,11 @@ static const char *thread_spec_tags[THREAD_SPEC__MAX] = {
 	"undefined", "cpu", "core", "package", "numa", "user"
 };
 
+struct pollfd_index_map {
+	int evlist_pollfd_index;
+	int thread_pollfd_index;
+};
+
 struct record {
 	struct perf_tool	tool;
 	struct record_opts	opts;
@@ -171,6 +176,9 @@ struct record {
 	int			nr_threads;
 	struct thread_mask	*thread_masks;
 	struct record_thread	*thread_data;
+	struct pollfd_index_map	*index_map;
+	size_t			index_map_sz;
+	size_t			index_map_cnt;
 };
 
 static volatile int done;
@@ -1074,6 +1082,70 @@ static void record__free_thread_data(struct record *rec)
 	zfree(&rec->thread_data);
 }
 
+static int record__map_thread_evlist_pollfd_indexes(struct record *rec,
+						    int evlist_pollfd_index,
+						    int thread_pollfd_index)
+{
+	size_t x = rec->index_map_cnt;
+
+	if (realloc_array_as_needed(rec->index_map, rec->index_map_sz, x, NULL))
+		return -ENOMEM;
+	rec->index_map[x].evlist_pollfd_index = evlist_pollfd_index;
+	rec->index_map[x].thread_pollfd_index = thread_pollfd_index;
+	rec->index_map_cnt += 1;
+	return 0;
+}
+
+static int record__update_evlist_pollfd_from_thread(struct record *rec,
+						    struct evlist *evlist,
+						    struct record_thread *thread_data)
+{
+	struct pollfd *e_entries = evlist->core.pollfd.entries;
+	struct pollfd *t_entries = thread_data->pollfd.entries;
+	int err = 0;
+	size_t i;
+
+	for (i = 0; i < rec->index_map_cnt; i++) {
+		int e_pos = rec->index_map[i].evlist_pollfd_index;
+		int t_pos = rec->index_map[i].thread_pollfd_index;
+
+		if (e_entries[e_pos].fd != t_entries[t_pos].fd ||
+		    e_entries[e_pos].events != t_entries[t_pos].events) {
+			pr_err("Thread and evlist pollfd index mismatch\n");
+			err = -EINVAL;
+			continue;
+		}
+		e_entries[e_pos].revents = t_entries[t_pos].revents;
+	}
+	return err;
+}
+
+static int record__dup_non_perf_events(struct record *rec,
+				       struct evlist *evlist,
+				       struct record_thread *thread_data)
+{
+	struct fdarray *fda = &evlist->core.pollfd;
+	int i, ret;
+
+	for (i = 0; i < fda->nr; i++) {
+		if (!(fda->priv[i].flags & fdarray_flag__non_perf_event))
+			continue;
+		ret = fdarray__dup_entry_from(&thread_data->pollfd, i, fda);
+		if (ret < 0) {
+			pr_err("Failed to duplicate descriptor in main thread pollfd\n");
+			return ret;
+		}
+		pr_debug2("thread_data[%p]: pollfd[%d] <- non_perf_event fd=%d\n",
+			  thread_data, ret, fda->entries[i].fd);
+		ret = record__map_thread_evlist_pollfd_indexes(rec, i, ret);
+		if (ret < 0) {
+			pr_err("Failed to map thread and evlist pollfd indexes\n");
+			return ret;
+		}
+	}
+	return 0;
+}
+
 static int record__alloc_thread_data(struct record *rec, struct evlist *evlist)
 {
 	int t, ret;
@@ -1121,6 +1193,11 @@ static int record__alloc_thread_data(struct record *rec, struct evlist *evlist)
 				 thread_data[t].pipes.msg[0]);
 		} else {
 			thread_data[t].tid = gettid();
+
+			ret = record__dup_non_perf_events(rec, evlist, &thread_data[t]);
+			if (ret < 0)
+				goto out_free;
+
 			if (evlist->ctl_fd.pos == -1)
 				continue;
 			ret = fdarray__dup_entry_from(&thread_data[t].pollfd, evlist->ctl_fd.pos,
@@ -2530,6 +2607,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 					    record__thread_munmap_filtered, NULL) == 0)
 				draining = true;
 
+			err = record__update_evlist_pollfd_from_thread(rec, rec->evlist, thread);
+			if (err)
+				goto out_child;
 			evlist__ctlfd_update(rec->evlist,
 				&thread->pollfd.entries[thread->ctlfd_pos]);
 		}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/5] perf record: Fix done_fd wakeup event
  2022-08-24  7:28 [PATCH 0/5] perf record: Allow multiple recording time ranges Adrian Hunter
  2022-08-24  7:28 ` [PATCH 1/5] perf record: Fix way of handling non-perf-event pollfds Adrian Hunter
@ 2022-08-24  7:28 ` Adrian Hunter
  2022-08-24 15:41   ` Ian Rogers
  2022-08-24  7:28 ` [PATCH 3/5] perf record: Change evlist->ctl_fd to use fdarray_flag__non_perf_event Adrian Hunter
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Adrian Hunter @ 2022-08-24  7:28 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen,
	Alexey Bayduraev, linux-kernel

evlist__add_wakeup_eventfd() calls perf_evlist__add_pollfd() to add a
non-perf-event to the evlist pollfds. Since commit 415ccb58f68a
("perf record: Introduce thread specific data array") that doesn't work
because evlist pollfs is not polled and done_fd is not duplicated into
thread-data.

Patch "perf record: Fix way of handling non-perf-event pollfds" added a new
approach that ensures file descriptors like done_fd are handled correctly
by flagging them as fdarray_flag__non_perf_event.

Fix by flagging done_fd as fdarray_flag__non_perf_event.

Example:

 Before:

  $ sleep 3 & perf record -vv -p $!
  ...
  thread_data[0x55f44bd34140]: pollfd[0] <- event_fd=5
  thread_data[0x55f44bd34140]: pollfd[1] <- event_fd=6
  thread_data[0x55f44bd34140]: pollfd[2] <- event_fd=7
  thread_data[0x55f44bd34140]: pollfd[3] <- event_fd=8
  thread_data[0x55f44bd34140]: pollfd[4] <- event_fd=9
  thread_data[0x55f44bd34140]: pollfd[5] <- event_fd=10
  thread_data[0x55f44bd34140]: pollfd[6] <- event_fd=11
  thread_data[0x55f44bd34140]: pollfd[7] <- event_fd=12
  ...

 After:

  $ sleep 3 & perf record -vv -p $!
  ...
  thread_data[0x55a8ded89140]: pollfd[0] <- event_fd=5
  thread_data[0x55a8ded89140]: pollfd[1] <- event_fd=6
  thread_data[0x55a8ded89140]: pollfd[2] <- event_fd=7
  thread_data[0x55a8ded89140]: pollfd[3] <- event_fd=8
  thread_data[0x55a8ded89140]: pollfd[4] <- event_fd=9
  thread_data[0x55a8ded89140]: pollfd[5] <- event_fd=10
  thread_data[0x55a8ded89140]: pollfd[6] <- event_fd=11
  thread_data[0x55a8ded89140]: pollfd[7] <- event_fd=12
  thread_data[0x55a8ded89140]: pollfd[8] <- non_perf_event fd=4
  ...

This patch depends on "perf record: Fix way of handling non-perf-event
pollfds".

Fixes: 415ccb58f68a ("perf record: Introduce thread specific data array")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/evlist.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 48167f3941a6..0b2222d05577 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -608,7 +608,8 @@ int evlist__filter_pollfd(struct evlist *evlist, short revents_and_mask)
 int evlist__add_wakeup_eventfd(struct evlist *evlist, int fd)
 {
 	return perf_evlist__add_pollfd(&evlist->core, fd, NULL, POLLIN,
-				       fdarray_flag__nonfilterable);
+				       fdarray_flag__nonfilterable |
+				       fdarray_flag__non_perf_event);
 }
 #endif
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/5] perf record: Change evlist->ctl_fd to use fdarray_flag__non_perf_event
  2022-08-24  7:28 [PATCH 0/5] perf record: Allow multiple recording time ranges Adrian Hunter
  2022-08-24  7:28 ` [PATCH 1/5] perf record: Fix way of handling non-perf-event pollfds Adrian Hunter
  2022-08-24  7:28 ` [PATCH 2/5] perf record: Fix done_fd wakeup event Adrian Hunter
@ 2022-08-24  7:28 ` Adrian Hunter
  2022-08-24 15:42   ` Ian Rogers
  2022-08-24  7:28 ` [PATCH 4/5] perf evlist: Add evlist__{en/dis}able_non_dummy() Adrian Hunter
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Adrian Hunter @ 2022-08-24  7:28 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen,
	Alexey Bayduraev, linux-kernel

Patch "perf record: Fix way of handling non-perf-event pollfds" added a
generic way to handle non-perf-event file descriptors like evlist->ctl_fd.
Use it instead of handling evlist->ctl_fd separately.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-record.c | 15 +--------------
 tools/perf/util/evlist.c    | 19 ++-----------------
 tools/perf/util/evlist.h    |  1 -
 3 files changed, 3 insertions(+), 32 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index e0be48c47f65..cefb3028f565 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1198,18 +1198,7 @@ static int record__alloc_thread_data(struct record *rec, struct evlist *evlist)
 			if (ret < 0)
 				goto out_free;
 
-			if (evlist->ctl_fd.pos == -1)
-				continue;
-			ret = fdarray__dup_entry_from(&thread_data[t].pollfd, evlist->ctl_fd.pos,
-						      &evlist->core.pollfd);
-			if (ret < 0) {
-				pr_err("Failed to duplicate descriptor in main thread pollfd\n");
-				goto out_free;
-			}
-			thread_data[t].ctlfd_pos = ret;
-			pr_debug2("thread_data[%p]: pollfd[%d] <- ctl_fd=%d\n",
-				 thread_data, thread_data[t].ctlfd_pos,
-				 evlist->core.pollfd.entries[evlist->ctl_fd.pos].fd);
+			thread_data[t].ctlfd_pos = -1; /* Not used */
 		}
 	}
 
@@ -2610,8 +2599,6 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 			err = record__update_evlist_pollfd_from_thread(rec, rec->evlist, thread);
 			if (err)
 				goto out_child;
-			evlist__ctlfd_update(rec->evlist,
-				&thread->pollfd.entries[thread->ctlfd_pos]);
 		}
 
 		if (evlist__ctlfd_process(rec->evlist, &cmd) > 0) {
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 0b2222d05577..4c5e6e9f8d11 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1898,7 +1898,8 @@ int evlist__initialize_ctlfd(struct evlist *evlist, int fd, int ack)
 	}
 
 	evlist->ctl_fd.pos = perf_evlist__add_pollfd(&evlist->core, fd, NULL, POLLIN,
-						     fdarray_flag__nonfilterable);
+						     fdarray_flag__nonfilterable |
+						     fdarray_flag__non_perf_event);
 	if (evlist->ctl_fd.pos < 0) {
 		evlist->ctl_fd.pos = -1;
 		pr_err("Failed to add ctl fd entry: %m\n");
@@ -2148,22 +2149,6 @@ int evlist__ctlfd_process(struct evlist *evlist, enum evlist_ctl_cmd *cmd)
 	return err;
 }
 
-int evlist__ctlfd_update(struct evlist *evlist, struct pollfd *update)
-{
-	int ctlfd_pos = evlist->ctl_fd.pos;
-	struct pollfd *entries = evlist->core.pollfd.entries;
-
-	if (!evlist__ctlfd_initialized(evlist))
-		return 0;
-
-	if (entries[ctlfd_pos].fd != update->fd ||
-	    entries[ctlfd_pos].events != update->events)
-		return -1;
-
-	entries[ctlfd_pos].revents = update->revents;
-	return 0;
-}
-
 struct evsel *evlist__find_evsel(struct evlist *evlist, int idx)
 {
 	struct evsel *evsel;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 351ba2887a79..3a464585d397 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -418,7 +418,6 @@ void evlist__close_control(int ctl_fd, int ctl_fd_ack, bool *ctl_fd_close);
 int evlist__initialize_ctlfd(struct evlist *evlist, int ctl_fd, int ctl_fd_ack);
 int evlist__finalize_ctlfd(struct evlist *evlist);
 bool evlist__ctlfd_initialized(struct evlist *evlist);
-int evlist__ctlfd_update(struct evlist *evlist, struct pollfd *update);
 int evlist__ctlfd_process(struct evlist *evlist, enum evlist_ctl_cmd *cmd);
 int evlist__ctlfd_ack(struct evlist *evlist);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/5] perf evlist: Add evlist__{en/dis}able_non_dummy()
  2022-08-24  7:28 [PATCH 0/5] perf record: Allow multiple recording time ranges Adrian Hunter
                   ` (2 preceding siblings ...)
  2022-08-24  7:28 ` [PATCH 3/5] perf record: Change evlist->ctl_fd to use fdarray_flag__non_perf_event Adrian Hunter
@ 2022-08-24  7:28 ` Adrian Hunter
  2022-08-24 15:45   ` Ian Rogers
  2022-08-24  7:28 ` [PATCH 5/5] perf record: Allow multiple recording time ranges Adrian Hunter
  2022-08-24 16:56 ` [PATCH 0/5] " Andi Kleen
  5 siblings, 1 reply; 13+ messages in thread
From: Adrian Hunter @ 2022-08-24  7:28 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen,
	Alexey Bayduraev, linux-kernel

Dummy events are used to provide sideband information like MMAP events that
are always needed even when main events are disabled. Add functions that
take that into account.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/evlist.c | 30 ++++++++++++++++++++++++------
 tools/perf/util/evlist.h |  2 ++
 2 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 4c5e6e9f8d11..3cfe730c12b8 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -480,7 +480,7 @@ static int evlist__is_enabled(struct evlist *evlist)
 	return false;
 }
 
-static void __evlist__disable(struct evlist *evlist, char *evsel_name)
+static void __evlist__disable(struct evlist *evlist, char *evsel_name, bool excl_dummy)
 {
 	struct evsel *pos;
 	struct evlist_cpu_iterator evlist_cpu_itr;
@@ -502,6 +502,8 @@ static void __evlist__disable(struct evlist *evlist, char *evsel_name)
 				continue;
 			if (pos->disabled || !evsel__is_group_leader(pos) || !pos->core.fd)
 				continue;
+			if (excl_dummy && evsel__is_dummy_event(pos))
+				continue;
 			if (pos->immediate)
 				has_imm = true;
 			if (pos->immediate != imm)
@@ -518,6 +520,8 @@ static void __evlist__disable(struct evlist *evlist, char *evsel_name)
 			continue;
 		if (!evsel__is_group_leader(pos) || !pos->core.fd)
 			continue;
+		if (excl_dummy && evsel__is_dummy_event(pos))
+			continue;
 		pos->disabled = true;
 	}
 
@@ -533,15 +537,20 @@ static void __evlist__disable(struct evlist *evlist, char *evsel_name)
 
 void evlist__disable(struct evlist *evlist)
 {
-	__evlist__disable(evlist, NULL);
+	__evlist__disable(evlist, NULL, false);
+}
+
+void evlist__disable_non_dummy(struct evlist *evlist)
+{
+	__evlist__disable(evlist, NULL, true);
 }
 
 void evlist__disable_evsel(struct evlist *evlist, char *evsel_name)
 {
-	__evlist__disable(evlist, evsel_name);
+	__evlist__disable(evlist, evsel_name, false);
 }
 
-static void __evlist__enable(struct evlist *evlist, char *evsel_name)
+static void __evlist__enable(struct evlist *evlist, char *evsel_name, bool excl_dummy)
 {
 	struct evsel *pos;
 	struct evlist_cpu_iterator evlist_cpu_itr;
@@ -560,6 +569,8 @@ static void __evlist__enable(struct evlist *evlist, char *evsel_name)
 			continue;
 		if (!evsel__is_group_leader(pos) || !pos->core.fd)
 			continue;
+		if (excl_dummy && evsel__is_dummy_event(pos))
+			continue;
 		evsel__enable_cpu(pos, evlist_cpu_itr.cpu_map_idx);
 	}
 	affinity__cleanup(affinity);
@@ -568,6 +579,8 @@ static void __evlist__enable(struct evlist *evlist, char *evsel_name)
 			continue;
 		if (!evsel__is_group_leader(pos) || !pos->core.fd)
 			continue;
+		if (excl_dummy && evsel__is_dummy_event(pos))
+			continue;
 		pos->disabled = false;
 	}
 
@@ -581,12 +594,17 @@ static void __evlist__enable(struct evlist *evlist, char *evsel_name)
 
 void evlist__enable(struct evlist *evlist)
 {
-	__evlist__enable(evlist, NULL);
+	__evlist__enable(evlist, NULL, false);
+}
+
+void evlist__enable_non_dummy(struct evlist *evlist)
+{
+	__evlist__enable(evlist, NULL, true);
 }
 
 void evlist__enable_evsel(struct evlist *evlist, char *evsel_name)
 {
-	__evlist__enable(evlist, evsel_name);
+	__evlist__enable(evlist, evsel_name, false);
 }
 
 void evlist__toggle_enable(struct evlist *evlist)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 3a464585d397..3a8474406738 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -205,6 +205,8 @@ void evlist__enable(struct evlist *evlist);
 void evlist__toggle_enable(struct evlist *evlist);
 void evlist__disable_evsel(struct evlist *evlist, char *evsel_name);
 void evlist__enable_evsel(struct evlist *evlist, char *evsel_name);
+void evlist__disable_non_dummy(struct evlist *evlist);
+void evlist__enable_non_dummy(struct evlist *evlist);
 
 void evlist__set_selected(struct evlist *evlist, struct evsel *evsel);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/5] perf record: Allow multiple recording time ranges
  2022-08-24  7:28 [PATCH 0/5] perf record: Allow multiple recording time ranges Adrian Hunter
                   ` (3 preceding siblings ...)
  2022-08-24  7:28 ` [PATCH 4/5] perf evlist: Add evlist__{en/dis}able_non_dummy() Adrian Hunter
@ 2022-08-24  7:28 ` Adrian Hunter
  2022-08-24 15:52   ` Ian Rogers
  2022-08-24 16:56 ` [PATCH 0/5] " Andi Kleen
  5 siblings, 1 reply; 13+ messages in thread
From: Adrian Hunter @ 2022-08-24  7:28 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Andi Kleen,
	Alexey Bayduraev, linux-kernel

AUX area traces can produce too much data to record successfully or
analyze subsequently. Add another means to reduce data collection by
allowing multiple recording time ranges.

This is useful, for instance, in cases where a workload produces
predictably reproducible events in specific time ranges.

Today we only have perf record -D <msecs> to start at a specific region, or
some complicated approach using snapshot mode and external scripts sending
signals or using the fifos. But these approaches are difficult to set up
compared with simply having perf do it.

Extend perf record option -D/--delay option to specifying relative time
stamps for start stop controlled by perf with the right time offset, for
instance:

    perf record -e intel_pt// -D 10-20,30-40

to record 10ms to 20ms into the trace and 30ms to 40ms.

Example:

 The example workload is:

 $ cat repeat-usleep.c

 int usleep(useconds_t usec);

 int usage(int ret, const char *msg)
 {
         if (msg)
                 fprintf(stderr, "%s\n", msg);

         fprintf(stderr, "Usage is: repeat-usleep <microseconds>\n");

         return ret;
 }

 int main(int argc, char *argv[])
 {
         unsigned long usecs;
         char *end_ptr;

         if (argc != 2)
                 return usage(1, "Error: Wrong number of arguments!");

         errno = 0;
         usecs = strtoul(argv[1], &end_ptr, 0);
         if (errno || *end_ptr || usecs > UINT_MAX)
                 return usage(1, "Error: Invalid argument!");

         while (1) {
                 int ret = usleep(usecs);

                 if (ret & errno != EINTR)
                         return usage(1, "Error: usleep() failed!");
         }

         return 0;
 }

 $ perf record -e intel_pt//u --delay 10-20,40-70,110-160 -- ./repeat-usleep 500
 Events disabled
 Events enabled
 Events disabled
 Events enabled
 Events disabled
 Events enabled
 Events disabled
 [ perf record: Woken up 5 times to write data ]
 [ perf record: Captured and wrote 0.204 MB perf.data ]
 Terminated

 A dlfilter is used to determine continuous data collection (timestamps
 less than 1ms apart):

 $ cat dlfilter-show-delays.c

 static __u64 start_time;
 static __u64 last_time;

 int start(void **data, void *ctx)
 {
         printf("%-17s\t%-9s\t%-6s\n", " Time", " Duration", " Delay");
         return 0;
 }

 int filter_event_early(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
 {
         __u64 delta;

         if (!sample->time)
                 return 1;
         if (!last_time)
                 goto out;
         delta = sample->time - last_time;
         if (delta < 1000000)
                 goto out2;;
         printf("%17.9f\t%9.1f\t%6.1f\n", start_time / 1000000000.0, (last_time - start_time) / 1000000.0, delta / 1000000.0);
 out:
         start_time = sample->time;
 out2:
         last_time = sample->time;
         return 1;
 }

 int stop(void *data, void *ctx)
 {
         printf("%17.9f\t%9.1f\n", start_time / 1000000000.0, (last_time - start_time) / 1000000.0);
         return 0;
 }

 The result shows the times roughly match the --delay option:

 $ perf script --itrace=qb --dlfilter dlfilter-show-delays.so
  Time                    Duration        Delay
   39215.302317300             9.7         20.5
   39215.332480217            30.4         40.9
   39215.403837717            49.8

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-record.txt |   6 +-
 tools/perf/builtin-record.c              |  24 ++-
 tools/perf/util/evlist.c                 | 234 +++++++++++++++++++++++
 tools/perf/util/evlist.h                 |   9 +
 4 files changed, 269 insertions(+), 4 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 099817ef5150..953b9663522a 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -430,8 +430,10 @@ if combined with -a or -C options.
 -D::
 --delay=::
 After starting the program, wait msecs before measuring (-1: start with events
-disabled). This is useful to filter out the startup phase of the program, which
-is often very different.
+disabled), or enable events only for specified ranges of msecs (e.g.
+-D 10-20,30-40 means wait 10 msecs, enable for 10 msecs, wait 10 msecs, enable
+for 10 msecs, then stop). Note, delaying enabling of events is useful to filter
+out the startup phase of the program, which is often very different.
 
 -I::
 --intr-regs::
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index cefb3028f565..7fdb1dd9a0a8 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -2498,6 +2498,10 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		}
 	}
 
+	err = event_enable_timer__start(rec->evlist->eet);
+	if (err)
+		goto out_child;
+
 	trigger_ready(&auxtrace_snapshot_trigger);
 	trigger_ready(&switch_output_trigger);
 	perf_hooks__invoke_record_start();
@@ -2621,6 +2625,14 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 			}
 		}
 
+		err = event_enable_timer__process(rec->evlist->eet);
+		if (err < 0)
+			goto out_child;
+		if (err) {
+			err = 0;
+			done = 1;
+		}
+
 		/*
 		 * When perf is starting the traced process, at the end events
 		 * die with the process and we wait for that. Thus no need to
@@ -2842,6 +2854,12 @@ static int perf_record_config(const char *var, const char *value, void *cb)
 	return 0;
 }
 
+static int record__parse_event_enable_time(const struct option *opt, const char *str, int unset)
+{
+	struct record *rec = (struct record *)opt->value;
+
+	return evlist__parse_event_enable_time(rec->evlist, &rec->opts, str, unset);
+}
 
 static int record__parse_affinity(const struct option *opt, const char *str, int unset)
 {
@@ -3303,8 +3321,10 @@ static struct option __record_options[] = {
 	OPT_CALLBACK('G', "cgroup", &record.evlist, "name",
 		     "monitor event in cgroup name only",
 		     parse_cgroups),
-	OPT_INTEGER('D', "delay", &record.opts.initial_delay,
-		  "ms to wait before starting measurement after program start (-1: start with events disabled)"),
+	OPT_CALLBACK('D', "delay", &record, "ms",
+		     "ms to wait before starting measurement after program start (-1: start with events disabled), "
+		     "or ranges of time to enable events e.g. '-D 10-20,30-40'",
+		     record__parse_event_enable_time),
 	OPT_BOOLEAN(0, "kcore", &record.opts.kcore, "copy /proc/kcore"),
 	OPT_STRING('u', "uid", &record.opts.target.uid_str, "user",
 		   "user to profile"),
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 3cfe730c12b8..fcfe5bcc0bcf 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -15,6 +15,7 @@
 #include "target.h"
 #include "evlist.h"
 #include "evsel.h"
+#include "record.h"
 #include "debug.h"
 #include "units.h"
 #include "bpf_counter.h"
@@ -40,12 +41,14 @@
 #include <sys/ioctl.h>
 #include <sys/mman.h>
 #include <sys/prctl.h>
+#include <sys/timerfd.h>
 
 #include <linux/bitops.h>
 #include <linux/hash.h>
 #include <linux/log2.h>
 #include <linux/err.h>
 #include <linux/string.h>
+#include <linux/time64.h>
 #include <linux/zalloc.h>
 #include <perf/evlist.h>
 #include <perf/evsel.h>
@@ -147,6 +150,7 @@ static void evlist__purge(struct evlist *evlist)
 
 void evlist__exit(struct evlist *evlist)
 {
+	event_enable_timer__exit(&evlist->eet);
 	zfree(&evlist->mmap);
 	zfree(&evlist->overwrite_mmap);
 	perf_evlist__exit(&evlist->core);
@@ -2167,6 +2171,236 @@ int evlist__ctlfd_process(struct evlist *evlist, enum evlist_ctl_cmd *cmd)
 	return err;
 }
 
+/**
+ * struct event_enable_time - perf record -D/--delay single time range.
+ * @start: start of time range to enable events in milliseconds
+ * @end: end of time range to enable events in milliseconds
+ *
+ * N.B. this structure is also accessed as an array of int.
+ */
+struct event_enable_time {
+	int	start;
+	int	end;
+};
+
+static int parse_event_enable_time(const char *str, struct event_enable_time *range, bool first)
+{
+	const char *fmt = first ? "%u - %u %n" : " , %u - %u %n";
+	int ret, start, end, n;
+
+	ret = sscanf(str, fmt, &start, &end, &n);
+	if (ret != 2 || end <= start)
+		return -EINVAL;
+	if (range) {
+		range->start = start;
+		range->end = end;
+	}
+	return n;
+}
+
+static ssize_t parse_event_enable_times(const char *str, struct event_enable_time *range)
+{
+	int incr = !!range;
+	bool first = true;
+	ssize_t ret, cnt;
+
+	for (cnt = 0; *str; cnt++) {
+		ret = parse_event_enable_time(str, range, first);
+		if (ret < 0)
+			return ret;
+		/* Check no overlap */
+		if (!first && range && range->start <= range[-1].end)
+			return -EINVAL;
+		str += ret;
+		range += incr;
+		first = false;
+	}
+	return cnt;
+}
+
+/**
+ * struct event_enable_timer - control structure for perf record -D/--delay.
+ * @evlist: event list
+ * @times: time ranges that events are enabled (N.B. this is also accessed as an
+ *         array of int)
+ * @times_cnt: number of time ranges
+ * @timerfd: timer file descriptor
+ * @pollfd_pos: position in @evlist array of file descriptors to poll (fdarray)
+ * @times_step: current position in (int *)@times)[],
+ *              refer event_enable_timer__process()
+ *
+ * Note, this structure is only used when there are time ranges, not when there
+ * is only an initial delay.
+ */
+struct event_enable_timer {
+	struct evlist *evlist;
+	struct event_enable_time *times;
+	size_t	times_cnt;
+	int	timerfd;
+	int	pollfd_pos;
+	size_t	times_step;
+};
+
+static int str_to_delay(const char *str)
+{
+	char *endptr;
+	long d;
+
+	d = strtol(str, &endptr, 10);
+	if (*endptr || d > INT_MAX || d < -1)
+		return 0;
+	return d;
+}
+
+int evlist__parse_event_enable_time(struct evlist *evlist, struct record_opts *opts,
+				    const char *str, int unset)
+{
+	enum fdarray_flags flags = fdarray_flag__nonfilterable | fdarray_flag__non_perf_event;
+	struct event_enable_timer *eet;
+	ssize_t times_cnt;
+	ssize_t ret;
+	int err;
+
+	if (unset)
+		return 0;
+
+	opts->initial_delay = str_to_delay(str);
+	if (opts->initial_delay)
+		return 0;
+
+	ret = parse_event_enable_times(str, NULL);
+	if (ret < 0)
+		return ret;
+
+	times_cnt = ret;
+	if (times_cnt == 0)
+		return -EINVAL;
+
+	eet = zalloc(sizeof(*eet));
+	if (!eet)
+		return -ENOMEM;
+
+	eet->times = calloc(times_cnt, sizeof(*eet->times));
+	if (!eet->times) {
+		err = -ENOMEM;
+		goto free_eet;
+	}
+
+	if (parse_event_enable_times(str, eet->times) != times_cnt) {
+		err = -EINVAL;
+		goto free_eet_times;
+	}
+
+	eet->times_cnt = times_cnt;
+
+	eet->timerfd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC);
+	if (eet->timerfd == -1) {
+		err = -errno;
+		pr_err("timerfd_create failed: %s\n", strerror(errno));
+		goto free_eet_times;
+	}
+
+	eet->pollfd_pos = perf_evlist__add_pollfd(&evlist->core, eet->timerfd, NULL, POLLIN, flags);
+	if (eet->pollfd_pos < 0) {
+		err = eet->pollfd_pos;
+		goto close_timerfd;
+	}
+
+	eet->evlist = evlist;
+	evlist->eet = eet;
+	opts->initial_delay = eet->times[0].start;
+
+	return 0;
+
+close_timerfd:
+	close(eet->timerfd);
+free_eet_times:
+	free(eet->times);
+free_eet:
+	free(eet);
+	return err;
+}
+
+static int event_enable_timer__set_timer(struct event_enable_timer *eet, int ms)
+{
+	struct itimerspec its = {
+		.it_value.tv_sec = ms / MSEC_PER_SEC,
+		.it_value.tv_nsec = (ms % MSEC_PER_SEC) * NSEC_PER_MSEC,
+	};
+	int err = 0;
+
+	if (timerfd_settime(eet->timerfd, 0, &its, NULL) < 0) {
+		err = -errno;
+		pr_err("timerfd_settime failed: %s\n", strerror(errno));
+	}
+	return err;
+}
+
+int event_enable_timer__start(struct event_enable_timer *eet)
+{
+	int ms;
+
+	if (!eet)
+		return 0;
+
+	ms = eet->times[0].end - eet->times[0].start;
+	eet->times_step = 1;
+
+	return event_enable_timer__set_timer(eet, ms);
+}
+
+int event_enable_timer__process(struct event_enable_timer *eet)
+{
+	struct pollfd *entries;
+	short revents;
+
+	if (!eet)
+		return 0;
+
+	entries = eet->evlist->core.pollfd.entries;
+	revents = entries[eet->pollfd_pos].revents;
+	entries[eet->pollfd_pos].revents = 0;
+
+	if (revents & POLLIN) {
+		size_t step = eet->times_step;
+		size_t pos = step / 2;
+
+		if (step & 1) {
+			evlist__disable_non_dummy(eet->evlist);
+			pr_info(EVLIST_DISABLED_MSG);
+			if (pos >= eet->times_cnt - 1) {
+				/* Disarm timer */
+				event_enable_timer__set_timer(eet, 0);
+				return 1; /* Stop */
+			}
+		} else {
+			evlist__enable_non_dummy(eet->evlist);
+			pr_info(EVLIST_ENABLED_MSG);
+		}
+
+		step += 1;
+		pos = step / 2;
+
+		if (pos < eet->times_cnt) {
+			int *times = (int *)eet->times; /* Accessing 'times' as array of int */
+			int ms = times[step] - times[step - 1];
+
+			eet->times_step = step;
+			return event_enable_timer__set_timer(eet, ms);
+		}
+	}
+
+	return 0;
+}
+
+void event_enable_timer__exit(struct event_enable_timer **ep)
+{
+	if (!ep || !*ep)
+		return;
+	free((*ep)->times);
+	zfree(ep);
+}
+
 struct evsel *evlist__find_evsel(struct evlist *evlist, int idx)
 {
 	struct evsel *evsel;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 3a8474406738..9d967fe3953a 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -48,6 +48,8 @@ enum bkw_mmap_state {
 	BKW_MMAP_EMPTY,
 };
 
+struct event_enable_timer;
+
 struct evlist {
 	struct perf_evlist core;
 	bool		 enabled;
@@ -79,6 +81,7 @@ struct evlist {
 		int	ack;	/* ack file descriptor for control commands */
 		int	pos;	/* index at evlist core object to check signals */
 	} ctl_fd;
+	struct event_enable_timer *eet;
 };
 
 struct evsel_str_handler {
@@ -426,6 +429,12 @@ int evlist__ctlfd_ack(struct evlist *evlist);
 #define EVLIST_ENABLED_MSG "Events enabled\n"
 #define EVLIST_DISABLED_MSG "Events disabled\n"
 
+int evlist__parse_event_enable_time(struct evlist *evlist, struct record_opts *opts,
+				    const char *str, int unset);
+int event_enable_timer__start(struct event_enable_timer *eet);
+void event_enable_timer__exit(struct event_enable_timer **ep);
+int event_enable_timer__process(struct event_enable_timer *eet);
+
 struct evsel *evlist__find_evsel(struct evlist *evlist, int idx);
 
 int evlist__scnprintf_evsels(struct evlist *evlist, size_t size, char *bf);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/5] perf record: Fix way of handling non-perf-event pollfds
  2022-08-24  7:28 ` [PATCH 1/5] perf record: Fix way of handling non-perf-event pollfds Adrian Hunter
@ 2022-08-24 15:40   ` Ian Rogers
  0 siblings, 0 replies; 13+ messages in thread
From: Ian Rogers @ 2022-08-24 15:40 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	Alexey Bayduraev, linux-kernel

On Wed, Aug 24, 2022 at 12:28 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> perf record __cmd_record() does not poll evlist pollfds. Instead it polls
> thread_data[0].pollfd. That happens whether or not threads are being used.
>
> perf record duplicates evlist mmap pollfds as needed for separate threads.
> The non-perf-event represented by evlist->ctl_fd has to handled separately,
> which is done explicitly, duplicating it into the thread_data[0] pollfds.
> That approach neglects any other non-perf-event file descriptors. Currently
> there is also done_fd which needs the same handling.
>
> Add a new generalized approach.
>
> Add fdarray_flag__non_perf_event to identify the file descriptors that
> need the special handling. For those cases, also keep a mapping of the
> evlist pollfd index and thread pollfd index, so that the evlist revents
> can be updated.
>
> Although this patch adds the new handling, it does not take it into use.
> There is no functional change, but it is the precursor to a fix, so is
> marked as a fix.
>
> Fixes: 415ccb58f68a ("perf record: Introduce thread specific data array")
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/lib/api/fd/array.h    |  5 ++-
>  tools/perf/builtin-record.c | 80 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 83 insertions(+), 2 deletions(-)
>
> diff --git a/tools/lib/api/fd/array.h b/tools/lib/api/fd/array.h
> index 60ad197c8ee9..5c01f7b05dfb 100644
> --- a/tools/lib/api/fd/array.h
> +++ b/tools/lib/api/fd/array.h
> @@ -31,8 +31,9 @@ struct fdarray {
>  };
>
>  enum fdarray_flags {
> -       fdarray_flag__default       = 0x00000000,
> -       fdarray_flag__nonfilterable = 0x00000001
> +       fdarray_flag__default           = 0x00000000,
> +       fdarray_flag__nonfilterable     = 0x00000001,
> +       fdarray_flag__non_perf_event    = 0x00000002,
>  };
>
>  void fdarray__init(struct fdarray *fda, int nr_autogrow);
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 4713f0f3a6cf..e0be48c47f65 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -143,6 +143,11 @@ static const char *thread_spec_tags[THREAD_SPEC__MAX] = {
>         "undefined", "cpu", "core", "package", "numa", "user"
>  };
>
> +struct pollfd_index_map {
> +       int evlist_pollfd_index;
> +       int thread_pollfd_index;
> +};
> +
>  struct record {
>         struct perf_tool        tool;
>         struct record_opts      opts;
> @@ -171,6 +176,9 @@ struct record {
>         int                     nr_threads;
>         struct thread_mask      *thread_masks;
>         struct record_thread    *thread_data;
> +       struct pollfd_index_map *index_map;
> +       size_t                  index_map_sz;
> +       size_t                  index_map_cnt;
>  };
>
>  static volatile int done;
> @@ -1074,6 +1082,70 @@ static void record__free_thread_data(struct record *rec)
>         zfree(&rec->thread_data);
>  }
>
> +static int record__map_thread_evlist_pollfd_indexes(struct record *rec,
> +                                                   int evlist_pollfd_index,
> +                                                   int thread_pollfd_index)
> +{
> +       size_t x = rec->index_map_cnt;
> +
> +       if (realloc_array_as_needed(rec->index_map, rec->index_map_sz, x, NULL))
> +               return -ENOMEM;
> +       rec->index_map[x].evlist_pollfd_index = evlist_pollfd_index;
> +       rec->index_map[x].thread_pollfd_index = thread_pollfd_index;
> +       rec->index_map_cnt += 1;
> +       return 0;
> +}
> +
> +static int record__update_evlist_pollfd_from_thread(struct record *rec,
> +                                                   struct evlist *evlist,
> +                                                   struct record_thread *thread_data)
> +{
> +       struct pollfd *e_entries = evlist->core.pollfd.entries;
> +       struct pollfd *t_entries = thread_data->pollfd.entries;
> +       int err = 0;
> +       size_t i;
> +
> +       for (i = 0; i < rec->index_map_cnt; i++) {
> +               int e_pos = rec->index_map[i].evlist_pollfd_index;
> +               int t_pos = rec->index_map[i].thread_pollfd_index;
> +
> +               if (e_entries[e_pos].fd != t_entries[t_pos].fd ||
> +                   e_entries[e_pos].events != t_entries[t_pos].events) {
> +                       pr_err("Thread and evlist pollfd index mismatch\n");
> +                       err = -EINVAL;
> +                       continue;
> +               }
> +               e_entries[e_pos].revents = t_entries[t_pos].revents;
> +       }
> +       return err;
> +}
> +
> +static int record__dup_non_perf_events(struct record *rec,
> +                                      struct evlist *evlist,
> +                                      struct record_thread *thread_data)
> +{
> +       struct fdarray *fda = &evlist->core.pollfd;
> +       int i, ret;
> +
> +       for (i = 0; i < fda->nr; i++) {
> +               if (!(fda->priv[i].flags & fdarray_flag__non_perf_event))
> +                       continue;
> +               ret = fdarray__dup_entry_from(&thread_data->pollfd, i, fda);
> +               if (ret < 0) {
> +                       pr_err("Failed to duplicate descriptor in main thread pollfd\n");
> +                       return ret;
> +               }
> +               pr_debug2("thread_data[%p]: pollfd[%d] <- non_perf_event fd=%d\n",
> +                         thread_data, ret, fda->entries[i].fd);
> +               ret = record__map_thread_evlist_pollfd_indexes(rec, i, ret);
> +               if (ret < 0) {
> +                       pr_err("Failed to map thread and evlist pollfd indexes\n");
> +                       return ret;
> +               }
> +       }
> +       return 0;
> +}
> +
>  static int record__alloc_thread_data(struct record *rec, struct evlist *evlist)
>  {
>         int t, ret;
> @@ -1121,6 +1193,11 @@ static int record__alloc_thread_data(struct record *rec, struct evlist *evlist)
>                                  thread_data[t].pipes.msg[0]);
>                 } else {
>                         thread_data[t].tid = gettid();
> +
> +                       ret = record__dup_non_perf_events(rec, evlist, &thread_data[t]);
> +                       if (ret < 0)
> +                               goto out_free;
> +
>                         if (evlist->ctl_fd.pos == -1)
>                                 continue;
>                         ret = fdarray__dup_entry_from(&thread_data[t].pollfd, evlist->ctl_fd.pos,
> @@ -2530,6 +2607,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>                                             record__thread_munmap_filtered, NULL) == 0)
>                                 draining = true;
>
> +                       err = record__update_evlist_pollfd_from_thread(rec, rec->evlist, thread);
> +                       if (err)
> +                               goto out_child;
>                         evlist__ctlfd_update(rec->evlist,
>                                 &thread->pollfd.entries[thread->ctlfd_pos]);
>                 }
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/5] perf record: Fix done_fd wakeup event
  2022-08-24  7:28 ` [PATCH 2/5] perf record: Fix done_fd wakeup event Adrian Hunter
@ 2022-08-24 15:41   ` Ian Rogers
  0 siblings, 0 replies; 13+ messages in thread
From: Ian Rogers @ 2022-08-24 15:41 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	Alexey Bayduraev, linux-kernel

On Wed, Aug 24, 2022 at 12:28 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> evlist__add_wakeup_eventfd() calls perf_evlist__add_pollfd() to add a
> non-perf-event to the evlist pollfds. Since commit 415ccb58f68a
> ("perf record: Introduce thread specific data array") that doesn't work
> because evlist pollfs is not polled and done_fd is not duplicated into
> thread-data.
>
> Patch "perf record: Fix way of handling non-perf-event pollfds" added a new
> approach that ensures file descriptors like done_fd are handled correctly
> by flagging them as fdarray_flag__non_perf_event.
>
> Fix by flagging done_fd as fdarray_flag__non_perf_event.
>
> Example:
>
>  Before:
>
>   $ sleep 3 & perf record -vv -p $!
>   ...
>   thread_data[0x55f44bd34140]: pollfd[0] <- event_fd=5
>   thread_data[0x55f44bd34140]: pollfd[1] <- event_fd=6
>   thread_data[0x55f44bd34140]: pollfd[2] <- event_fd=7
>   thread_data[0x55f44bd34140]: pollfd[3] <- event_fd=8
>   thread_data[0x55f44bd34140]: pollfd[4] <- event_fd=9
>   thread_data[0x55f44bd34140]: pollfd[5] <- event_fd=10
>   thread_data[0x55f44bd34140]: pollfd[6] <- event_fd=11
>   thread_data[0x55f44bd34140]: pollfd[7] <- event_fd=12
>   ...
>
>  After:
>
>   $ sleep 3 & perf record -vv -p $!
>   ...
>   thread_data[0x55a8ded89140]: pollfd[0] <- event_fd=5
>   thread_data[0x55a8ded89140]: pollfd[1] <- event_fd=6
>   thread_data[0x55a8ded89140]: pollfd[2] <- event_fd=7
>   thread_data[0x55a8ded89140]: pollfd[3] <- event_fd=8
>   thread_data[0x55a8ded89140]: pollfd[4] <- event_fd=9
>   thread_data[0x55a8ded89140]: pollfd[5] <- event_fd=10
>   thread_data[0x55a8ded89140]: pollfd[6] <- event_fd=11
>   thread_data[0x55a8ded89140]: pollfd[7] <- event_fd=12
>   thread_data[0x55a8ded89140]: pollfd[8] <- non_perf_event fd=4
>   ...
>
> This patch depends on "perf record: Fix way of handling non-perf-event
> pollfds".
>
> Fixes: 415ccb58f68a ("perf record: Introduce thread specific data array")
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/util/evlist.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 48167f3941a6..0b2222d05577 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -608,7 +608,8 @@ int evlist__filter_pollfd(struct evlist *evlist, short revents_and_mask)
>  int evlist__add_wakeup_eventfd(struct evlist *evlist, int fd)
>  {
>         return perf_evlist__add_pollfd(&evlist->core, fd, NULL, POLLIN,
> -                                      fdarray_flag__nonfilterable);
> +                                      fdarray_flag__nonfilterable |
> +                                      fdarray_flag__non_perf_event);
>  }
>  #endif
>
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/5] perf record: Change evlist->ctl_fd to use fdarray_flag__non_perf_event
  2022-08-24  7:28 ` [PATCH 3/5] perf record: Change evlist->ctl_fd to use fdarray_flag__non_perf_event Adrian Hunter
@ 2022-08-24 15:42   ` Ian Rogers
  0 siblings, 0 replies; 13+ messages in thread
From: Ian Rogers @ 2022-08-24 15:42 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	Alexey Bayduraev, linux-kernel

On Wed, Aug 24, 2022 at 12:28 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Patch "perf record: Fix way of handling non-perf-event pollfds" added a
> generic way to handle non-perf-event file descriptors like evlist->ctl_fd.
> Use it instead of handling evlist->ctl_fd separately.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/builtin-record.c | 15 +--------------
>  tools/perf/util/evlist.c    | 19 ++-----------------
>  tools/perf/util/evlist.h    |  1 -
>  3 files changed, 3 insertions(+), 32 deletions(-)
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index e0be48c47f65..cefb3028f565 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1198,18 +1198,7 @@ static int record__alloc_thread_data(struct record *rec, struct evlist *evlist)
>                         if (ret < 0)
>                                 goto out_free;
>
> -                       if (evlist->ctl_fd.pos == -1)
> -                               continue;
> -                       ret = fdarray__dup_entry_from(&thread_data[t].pollfd, evlist->ctl_fd.pos,
> -                                                     &evlist->core.pollfd);
> -                       if (ret < 0) {
> -                               pr_err("Failed to duplicate descriptor in main thread pollfd\n");
> -                               goto out_free;
> -                       }
> -                       thread_data[t].ctlfd_pos = ret;
> -                       pr_debug2("thread_data[%p]: pollfd[%d] <- ctl_fd=%d\n",
> -                                thread_data, thread_data[t].ctlfd_pos,
> -                                evlist->core.pollfd.entries[evlist->ctl_fd.pos].fd);
> +                       thread_data[t].ctlfd_pos = -1; /* Not used */
>                 }
>         }
>
> @@ -2610,8 +2599,6 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>                         err = record__update_evlist_pollfd_from_thread(rec, rec->evlist, thread);
>                         if (err)
>                                 goto out_child;
> -                       evlist__ctlfd_update(rec->evlist,
> -                               &thread->pollfd.entries[thread->ctlfd_pos]);
>                 }
>
>                 if (evlist__ctlfd_process(rec->evlist, &cmd) > 0) {
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 0b2222d05577..4c5e6e9f8d11 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -1898,7 +1898,8 @@ int evlist__initialize_ctlfd(struct evlist *evlist, int fd, int ack)
>         }
>
>         evlist->ctl_fd.pos = perf_evlist__add_pollfd(&evlist->core, fd, NULL, POLLIN,
> -                                                    fdarray_flag__nonfilterable);
> +                                                    fdarray_flag__nonfilterable |
> +                                                    fdarray_flag__non_perf_event);
>         if (evlist->ctl_fd.pos < 0) {
>                 evlist->ctl_fd.pos = -1;
>                 pr_err("Failed to add ctl fd entry: %m\n");
> @@ -2148,22 +2149,6 @@ int evlist__ctlfd_process(struct evlist *evlist, enum evlist_ctl_cmd *cmd)
>         return err;
>  }
>
> -int evlist__ctlfd_update(struct evlist *evlist, struct pollfd *update)
> -{
> -       int ctlfd_pos = evlist->ctl_fd.pos;
> -       struct pollfd *entries = evlist->core.pollfd.entries;
> -
> -       if (!evlist__ctlfd_initialized(evlist))
> -               return 0;
> -
> -       if (entries[ctlfd_pos].fd != update->fd ||
> -           entries[ctlfd_pos].events != update->events)
> -               return -1;
> -
> -       entries[ctlfd_pos].revents = update->revents;
> -       return 0;
> -}
> -
>  struct evsel *evlist__find_evsel(struct evlist *evlist, int idx)
>  {
>         struct evsel *evsel;
> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> index 351ba2887a79..3a464585d397 100644
> --- a/tools/perf/util/evlist.h
> +++ b/tools/perf/util/evlist.h
> @@ -418,7 +418,6 @@ void evlist__close_control(int ctl_fd, int ctl_fd_ack, bool *ctl_fd_close);
>  int evlist__initialize_ctlfd(struct evlist *evlist, int ctl_fd, int ctl_fd_ack);
>  int evlist__finalize_ctlfd(struct evlist *evlist);
>  bool evlist__ctlfd_initialized(struct evlist *evlist);
> -int evlist__ctlfd_update(struct evlist *evlist, struct pollfd *update);
>  int evlist__ctlfd_process(struct evlist *evlist, enum evlist_ctl_cmd *cmd);
>  int evlist__ctlfd_ack(struct evlist *evlist);
>
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 4/5] perf evlist: Add evlist__{en/dis}able_non_dummy()
  2022-08-24  7:28 ` [PATCH 4/5] perf evlist: Add evlist__{en/dis}able_non_dummy() Adrian Hunter
@ 2022-08-24 15:45   ` Ian Rogers
  0 siblings, 0 replies; 13+ messages in thread
From: Ian Rogers @ 2022-08-24 15:45 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	Alexey Bayduraev, linux-kernel

On Wed, Aug 24, 2022 at 12:28 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> Dummy events are used to provide sideband information like MMAP events that
> are always needed even when main events are disabled. Add functions that
> take that into account.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/util/evlist.c | 30 ++++++++++++++++++++++++------
>  tools/perf/util/evlist.h |  2 ++
>  2 files changed, 26 insertions(+), 6 deletions(-)
>
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 4c5e6e9f8d11..3cfe730c12b8 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -480,7 +480,7 @@ static int evlist__is_enabled(struct evlist *evlist)
>         return false;
>  }
>
> -static void __evlist__disable(struct evlist *evlist, char *evsel_name)
> +static void __evlist__disable(struct evlist *evlist, char *evsel_name, bool excl_dummy)
>  {
>         struct evsel *pos;
>         struct evlist_cpu_iterator evlist_cpu_itr;
> @@ -502,6 +502,8 @@ static void __evlist__disable(struct evlist *evlist, char *evsel_name)
>                                 continue;
>                         if (pos->disabled || !evsel__is_group_leader(pos) || !pos->core.fd)
>                                 continue;
> +                       if (excl_dummy && evsel__is_dummy_event(pos))
> +                               continue;
>                         if (pos->immediate)
>                                 has_imm = true;
>                         if (pos->immediate != imm)
> @@ -518,6 +520,8 @@ static void __evlist__disable(struct evlist *evlist, char *evsel_name)
>                         continue;
>                 if (!evsel__is_group_leader(pos) || !pos->core.fd)
>                         continue;
> +               if (excl_dummy && evsel__is_dummy_event(pos))
> +                       continue;
>                 pos->disabled = true;
>         }
>
> @@ -533,15 +537,20 @@ static void __evlist__disable(struct evlist *evlist, char *evsel_name)
>
>  void evlist__disable(struct evlist *evlist)
>  {
> -       __evlist__disable(evlist, NULL);
> +       __evlist__disable(evlist, NULL, false);
> +}
> +
> +void evlist__disable_non_dummy(struct evlist *evlist)
> +{
> +       __evlist__disable(evlist, NULL, true);

nit: I think it is better to  prefer to name the argument when passing
constants, it can avoid errors if later constants are introduced, etc.
It can also be checked by compiler warnings. As this is limited to the
same file it is not hugely critical.

>  }
>
>  void evlist__disable_evsel(struct evlist *evlist, char *evsel_name)
>  {
> -       __evlist__disable(evlist, evsel_name);
> +       __evlist__disable(evlist, evsel_name, false);
>  }
>
> -static void __evlist__enable(struct evlist *evlist, char *evsel_name)
> +static void __evlist__enable(struct evlist *evlist, char *evsel_name, bool excl_dummy)
>  {
>         struct evsel *pos;
>         struct evlist_cpu_iterator evlist_cpu_itr;
> @@ -560,6 +569,8 @@ static void __evlist__enable(struct evlist *evlist, char *evsel_name)
>                         continue;
>                 if (!evsel__is_group_leader(pos) || !pos->core.fd)
>                         continue;
> +               if (excl_dummy && evsel__is_dummy_event(pos))
> +                       continue;
>                 evsel__enable_cpu(pos, evlist_cpu_itr.cpu_map_idx);
>         }
>         affinity__cleanup(affinity);
> @@ -568,6 +579,8 @@ static void __evlist__enable(struct evlist *evlist, char *evsel_name)
>                         continue;
>                 if (!evsel__is_group_leader(pos) || !pos->core.fd)
>                         continue;
> +               if (excl_dummy && evsel__is_dummy_event(pos))
> +                       continue;
>                 pos->disabled = false;
>         }
>
> @@ -581,12 +594,17 @@ static void __evlist__enable(struct evlist *evlist, char *evsel_name)
>
>  void evlist__enable(struct evlist *evlist)
>  {
> -       __evlist__enable(evlist, NULL);
> +       __evlist__enable(evlist, NULL, false);
> +}
> +
> +void evlist__enable_non_dummy(struct evlist *evlist)
> +{
> +       __evlist__enable(evlist, NULL, true);
>  }
>
>  void evlist__enable_evsel(struct evlist *evlist, char *evsel_name)
>  {
> -       __evlist__enable(evlist, evsel_name);
> +       __evlist__enable(evlist, evsel_name, false);
>  }
>
>  void evlist__toggle_enable(struct evlist *evlist)
> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> index 3a464585d397..3a8474406738 100644
> --- a/tools/perf/util/evlist.h
> +++ b/tools/perf/util/evlist.h
> @@ -205,6 +205,8 @@ void evlist__enable(struct evlist *evlist);
>  void evlist__toggle_enable(struct evlist *evlist);
>  void evlist__disable_evsel(struct evlist *evlist, char *evsel_name);
>  void evlist__enable_evsel(struct evlist *evlist, char *evsel_name);
> +void evlist__disable_non_dummy(struct evlist *evlist);
> +void evlist__enable_non_dummy(struct evlist *evlist);
>
>  void evlist__set_selected(struct evlist *evlist, struct evsel *evsel);
>
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/5] perf record: Allow multiple recording time ranges
  2022-08-24  7:28 ` [PATCH 5/5] perf record: Allow multiple recording time ranges Adrian Hunter
@ 2022-08-24 15:52   ` Ian Rogers
  2022-08-26  6:38     ` Adrian Hunter
  0 siblings, 1 reply; 13+ messages in thread
From: Ian Rogers @ 2022-08-24 15:52 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	Alexey Bayduraev, linux-kernel

On Wed, Aug 24, 2022 at 12:28 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> AUX area traces can produce too much data to record successfully or
> analyze subsequently. Add another means to reduce data collection by
> allowing multiple recording time ranges.
>
> This is useful, for instance, in cases where a workload produces
> predictably reproducible events in specific time ranges.
>
> Today we only have perf record -D <msecs> to start at a specific region, or
> some complicated approach using snapshot mode and external scripts sending
> signals or using the fifos. But these approaches are difficult to set up
> compared with simply having perf do it.
>
> Extend perf record option -D/--delay option to specifying relative time
> stamps for start stop controlled by perf with the right time offset, for
> instance:
>
>     perf record -e intel_pt// -D 10-20,30-40
>
> to record 10ms to 20ms into the trace and 30ms to 40ms.

Could you add a test like this in:
https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/tests/shell/record.sh?h=perf/core
If for no other reason than code coverage. It would also be
interesting to see what happens if floating point values are passed,
ranges are incomplete, etc. As the functionality is generic the event
could be cycles or instructions rather than intel_pt.

Thanks,
Ian

> Example:
>
>  The example workload is:
>
>  $ cat repeat-usleep.c
>
>  int usleep(useconds_t usec);
>
>  int usage(int ret, const char *msg)
>  {
>          if (msg)
>                  fprintf(stderr, "%s\n", msg);
>
>          fprintf(stderr, "Usage is: repeat-usleep <microseconds>\n");
>
>          return ret;
>  }
>
>  int main(int argc, char *argv[])
>  {
>          unsigned long usecs;
>          char *end_ptr;
>
>          if (argc != 2)
>                  return usage(1, "Error: Wrong number of arguments!");
>
>          errno = 0;
>          usecs = strtoul(argv[1], &end_ptr, 0);
>          if (errno || *end_ptr || usecs > UINT_MAX)
>                  return usage(1, "Error: Invalid argument!");
>
>          while (1) {
>                  int ret = usleep(usecs);
>
>                  if (ret & errno != EINTR)
>                          return usage(1, "Error: usleep() failed!");
>          }
>
>          return 0;
>  }
>
>  $ perf record -e intel_pt//u --delay 10-20,40-70,110-160 -- ./repeat-usleep 500
>  Events disabled
>  Events enabled
>  Events disabled
>  Events enabled
>  Events disabled
>  Events enabled
>  Events disabled
>  [ perf record: Woken up 5 times to write data ]
>  [ perf record: Captured and wrote 0.204 MB perf.data ]
>  Terminated
>
>  A dlfilter is used to determine continuous data collection (timestamps
>  less than 1ms apart):
>
>  $ cat dlfilter-show-delays.c
>
>  static __u64 start_time;
>  static __u64 last_time;
>
>  int start(void **data, void *ctx)
>  {
>          printf("%-17s\t%-9s\t%-6s\n", " Time", " Duration", " Delay");
>          return 0;
>  }
>
>  int filter_event_early(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
>  {
>          __u64 delta;
>
>          if (!sample->time)
>                  return 1;
>          if (!last_time)
>                  goto out;
>          delta = sample->time - last_time;
>          if (delta < 1000000)
>                  goto out2;;
>          printf("%17.9f\t%9.1f\t%6.1f\n", start_time / 1000000000.0, (last_time - start_time) / 1000000.0, delta / 1000000.0);
>  out:
>          start_time = sample->time;
>  out2:
>          last_time = sample->time;
>          return 1;
>  }
>
>  int stop(void *data, void *ctx)
>  {
>          printf("%17.9f\t%9.1f\n", start_time / 1000000000.0, (last_time - start_time) / 1000000.0);
>          return 0;
>  }
>
>  The result shows the times roughly match the --delay option:
>
>  $ perf script --itrace=qb --dlfilter dlfilter-show-delays.so
>   Time                    Duration        Delay
>    39215.302317300             9.7         20.5
>    39215.332480217            30.4         40.9
>    39215.403837717            49.8
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/Documentation/perf-record.txt |   6 +-
>  tools/perf/builtin-record.c              |  24 ++-
>  tools/perf/util/evlist.c                 | 234 +++++++++++++++++++++++
>  tools/perf/util/evlist.h                 |   9 +
>  4 files changed, 269 insertions(+), 4 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> index 099817ef5150..953b9663522a 100644
> --- a/tools/perf/Documentation/perf-record.txt
> +++ b/tools/perf/Documentation/perf-record.txt
> @@ -430,8 +430,10 @@ if combined with -a or -C options.
>  -D::
>  --delay=::
>  After starting the program, wait msecs before measuring (-1: start with events
> -disabled). This is useful to filter out the startup phase of the program, which
> -is often very different.
> +disabled), or enable events only for specified ranges of msecs (e.g.
> +-D 10-20,30-40 means wait 10 msecs, enable for 10 msecs, wait 10 msecs, enable
> +for 10 msecs, then stop). Note, delaying enabling of events is useful to filter
> +out the startup phase of the program, which is often very different.
>
>  -I::
>  --intr-regs::
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index cefb3028f565..7fdb1dd9a0a8 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -2498,6 +2498,10 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>                 }
>         }
>
> +       err = event_enable_timer__start(rec->evlist->eet);
> +       if (err)
> +               goto out_child;
> +
>         trigger_ready(&auxtrace_snapshot_trigger);
>         trigger_ready(&switch_output_trigger);
>         perf_hooks__invoke_record_start();
> @@ -2621,6 +2625,14 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>                         }
>                 }
>
> +               err = event_enable_timer__process(rec->evlist->eet);
> +               if (err < 0)
> +                       goto out_child;
> +               if (err) {
> +                       err = 0;
> +                       done = 1;
> +               }
> +
>                 /*
>                  * When perf is starting the traced process, at the end events
>                  * die with the process and we wait for that. Thus no need to
> @@ -2842,6 +2854,12 @@ static int perf_record_config(const char *var, const char *value, void *cb)
>         return 0;
>  }
>
> +static int record__parse_event_enable_time(const struct option *opt, const char *str, int unset)
> +{
> +       struct record *rec = (struct record *)opt->value;
> +
> +       return evlist__parse_event_enable_time(rec->evlist, &rec->opts, str, unset);
> +}
>
>  static int record__parse_affinity(const struct option *opt, const char *str, int unset)
>  {
> @@ -3303,8 +3321,10 @@ static struct option __record_options[] = {
>         OPT_CALLBACK('G', "cgroup", &record.evlist, "name",
>                      "monitor event in cgroup name only",
>                      parse_cgroups),
> -       OPT_INTEGER('D', "delay", &record.opts.initial_delay,
> -                 "ms to wait before starting measurement after program start (-1: start with events disabled)"),
> +       OPT_CALLBACK('D', "delay", &record, "ms",
> +                    "ms to wait before starting measurement after program start (-1: start with events disabled), "
> +                    "or ranges of time to enable events e.g. '-D 10-20,30-40'",
> +                    record__parse_event_enable_time),
>         OPT_BOOLEAN(0, "kcore", &record.opts.kcore, "copy /proc/kcore"),
>         OPT_STRING('u', "uid", &record.opts.target.uid_str, "user",
>                    "user to profile"),
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 3cfe730c12b8..fcfe5bcc0bcf 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -15,6 +15,7 @@
>  #include "target.h"
>  #include "evlist.h"
>  #include "evsel.h"
> +#include "record.h"
>  #include "debug.h"
>  #include "units.h"
>  #include "bpf_counter.h"
> @@ -40,12 +41,14 @@
>  #include <sys/ioctl.h>
>  #include <sys/mman.h>
>  #include <sys/prctl.h>
> +#include <sys/timerfd.h>
>
>  #include <linux/bitops.h>
>  #include <linux/hash.h>
>  #include <linux/log2.h>
>  #include <linux/err.h>
>  #include <linux/string.h>
> +#include <linux/time64.h>
>  #include <linux/zalloc.h>
>  #include <perf/evlist.h>
>  #include <perf/evsel.h>
> @@ -147,6 +150,7 @@ static void evlist__purge(struct evlist *evlist)
>
>  void evlist__exit(struct evlist *evlist)
>  {
> +       event_enable_timer__exit(&evlist->eet);
>         zfree(&evlist->mmap);
>         zfree(&evlist->overwrite_mmap);
>         perf_evlist__exit(&evlist->core);
> @@ -2167,6 +2171,236 @@ int evlist__ctlfd_process(struct evlist *evlist, enum evlist_ctl_cmd *cmd)
>         return err;
>  }
>
> +/**
> + * struct event_enable_time - perf record -D/--delay single time range.
> + * @start: start of time range to enable events in milliseconds
> + * @end: end of time range to enable events in milliseconds
> + *
> + * N.B. this structure is also accessed as an array of int.
> + */
> +struct event_enable_time {
> +       int     start;
> +       int     end;
> +};
> +
> +static int parse_event_enable_time(const char *str, struct event_enable_time *range, bool first)
> +{
> +       const char *fmt = first ? "%u - %u %n" : " , %u - %u %n";
> +       int ret, start, end, n;
> +
> +       ret = sscanf(str, fmt, &start, &end, &n);
> +       if (ret != 2 || end <= start)
> +               return -EINVAL;
> +       if (range) {
> +               range->start = start;
> +               range->end = end;
> +       }
> +       return n;
> +}
> +
> +static ssize_t parse_event_enable_times(const char *str, struct event_enable_time *range)
> +{
> +       int incr = !!range;
> +       bool first = true;
> +       ssize_t ret, cnt;
> +
> +       for (cnt = 0; *str; cnt++) {
> +               ret = parse_event_enable_time(str, range, first);
> +               if (ret < 0)
> +                       return ret;
> +               /* Check no overlap */
> +               if (!first && range && range->start <= range[-1].end)
> +                       return -EINVAL;
> +               str += ret;
> +               range += incr;
> +               first = false;
> +       }
> +       return cnt;
> +}
> +
> +/**
> + * struct event_enable_timer - control structure for perf record -D/--delay.
> + * @evlist: event list
> + * @times: time ranges that events are enabled (N.B. this is also accessed as an
> + *         array of int)
> + * @times_cnt: number of time ranges
> + * @timerfd: timer file descriptor
> + * @pollfd_pos: position in @evlist array of file descriptors to poll (fdarray)
> + * @times_step: current position in (int *)@times)[],
> + *              refer event_enable_timer__process()
> + *
> + * Note, this structure is only used when there are time ranges, not when there
> + * is only an initial delay.
> + */
> +struct event_enable_timer {
> +       struct evlist *evlist;
> +       struct event_enable_time *times;
> +       size_t  times_cnt;
> +       int     timerfd;
> +       int     pollfd_pos;
> +       size_t  times_step;
> +};
> +
> +static int str_to_delay(const char *str)
> +{
> +       char *endptr;
> +       long d;
> +
> +       d = strtol(str, &endptr, 10);
> +       if (*endptr || d > INT_MAX || d < -1)
> +               return 0;
> +       return d;
> +}
> +
> +int evlist__parse_event_enable_time(struct evlist *evlist, struct record_opts *opts,
> +                                   const char *str, int unset)
> +{
> +       enum fdarray_flags flags = fdarray_flag__nonfilterable | fdarray_flag__non_perf_event;
> +       struct event_enable_timer *eet;
> +       ssize_t times_cnt;
> +       ssize_t ret;
> +       int err;
> +
> +       if (unset)
> +               return 0;
> +
> +       opts->initial_delay = str_to_delay(str);
> +       if (opts->initial_delay)
> +               return 0;
> +
> +       ret = parse_event_enable_times(str, NULL);
> +       if (ret < 0)
> +               return ret;
> +
> +       times_cnt = ret;
> +       if (times_cnt == 0)
> +               return -EINVAL;
> +
> +       eet = zalloc(sizeof(*eet));
> +       if (!eet)
> +               return -ENOMEM;
> +
> +       eet->times = calloc(times_cnt, sizeof(*eet->times));
> +       if (!eet->times) {
> +               err = -ENOMEM;
> +               goto free_eet;
> +       }
> +
> +       if (parse_event_enable_times(str, eet->times) != times_cnt) {
> +               err = -EINVAL;
> +               goto free_eet_times;
> +       }
> +
> +       eet->times_cnt = times_cnt;
> +
> +       eet->timerfd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC);
> +       if (eet->timerfd == -1) {
> +               err = -errno;
> +               pr_err("timerfd_create failed: %s\n", strerror(errno));
> +               goto free_eet_times;
> +       }
> +
> +       eet->pollfd_pos = perf_evlist__add_pollfd(&evlist->core, eet->timerfd, NULL, POLLIN, flags);
> +       if (eet->pollfd_pos < 0) {
> +               err = eet->pollfd_pos;
> +               goto close_timerfd;
> +       }
> +
> +       eet->evlist = evlist;
> +       evlist->eet = eet;
> +       opts->initial_delay = eet->times[0].start;
> +
> +       return 0;
> +
> +close_timerfd:
> +       close(eet->timerfd);
> +free_eet_times:
> +       free(eet->times);
> +free_eet:
> +       free(eet);
> +       return err;
> +}
> +
> +static int event_enable_timer__set_timer(struct event_enable_timer *eet, int ms)
> +{
> +       struct itimerspec its = {
> +               .it_value.tv_sec = ms / MSEC_PER_SEC,
> +               .it_value.tv_nsec = (ms % MSEC_PER_SEC) * NSEC_PER_MSEC,
> +       };
> +       int err = 0;
> +
> +       if (timerfd_settime(eet->timerfd, 0, &its, NULL) < 0) {
> +               err = -errno;
> +               pr_err("timerfd_settime failed: %s\n", strerror(errno));
> +       }
> +       return err;
> +}
> +
> +int event_enable_timer__start(struct event_enable_timer *eet)
> +{
> +       int ms;
> +
> +       if (!eet)
> +               return 0;
> +
> +       ms = eet->times[0].end - eet->times[0].start;
> +       eet->times_step = 1;
> +
> +       return event_enable_timer__set_timer(eet, ms);
> +}
> +
> +int event_enable_timer__process(struct event_enable_timer *eet)
> +{
> +       struct pollfd *entries;
> +       short revents;
> +
> +       if (!eet)
> +               return 0;
> +
> +       entries = eet->evlist->core.pollfd.entries;
> +       revents = entries[eet->pollfd_pos].revents;
> +       entries[eet->pollfd_pos].revents = 0;
> +
> +       if (revents & POLLIN) {
> +               size_t step = eet->times_step;
> +               size_t pos = step / 2;
> +
> +               if (step & 1) {
> +                       evlist__disable_non_dummy(eet->evlist);
> +                       pr_info(EVLIST_DISABLED_MSG);
> +                       if (pos >= eet->times_cnt - 1) {
> +                               /* Disarm timer */
> +                               event_enable_timer__set_timer(eet, 0);
> +                               return 1; /* Stop */
> +                       }
> +               } else {
> +                       evlist__enable_non_dummy(eet->evlist);
> +                       pr_info(EVLIST_ENABLED_MSG);
> +               }
> +
> +               step += 1;
> +               pos = step / 2;
> +
> +               if (pos < eet->times_cnt) {
> +                       int *times = (int *)eet->times; /* Accessing 'times' as array of int */
> +                       int ms = times[step] - times[step - 1];
> +
> +                       eet->times_step = step;
> +                       return event_enable_timer__set_timer(eet, ms);
> +               }
> +       }
> +
> +       return 0;
> +}
> +
> +void event_enable_timer__exit(struct event_enable_timer **ep)
> +{
> +       if (!ep || !*ep)
> +               return;
> +       free((*ep)->times);
> +       zfree(ep);
> +}
> +
>  struct evsel *evlist__find_evsel(struct evlist *evlist, int idx)
>  {
>         struct evsel *evsel;
> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> index 3a8474406738..9d967fe3953a 100644
> --- a/tools/perf/util/evlist.h
> +++ b/tools/perf/util/evlist.h
> @@ -48,6 +48,8 @@ enum bkw_mmap_state {
>         BKW_MMAP_EMPTY,
>  };
>
> +struct event_enable_timer;
> +
>  struct evlist {
>         struct perf_evlist core;
>         bool             enabled;
> @@ -79,6 +81,7 @@ struct evlist {
>                 int     ack;    /* ack file descriptor for control commands */
>                 int     pos;    /* index at evlist core object to check signals */
>         } ctl_fd;
> +       struct event_enable_timer *eet;
>  };
>
>  struct evsel_str_handler {
> @@ -426,6 +429,12 @@ int evlist__ctlfd_ack(struct evlist *evlist);
>  #define EVLIST_ENABLED_MSG "Events enabled\n"
>  #define EVLIST_DISABLED_MSG "Events disabled\n"
>
> +int evlist__parse_event_enable_time(struct evlist *evlist, struct record_opts *opts,
> +                                   const char *str, int unset);
> +int event_enable_timer__start(struct event_enable_timer *eet);
> +void event_enable_timer__exit(struct event_enable_timer **ep);
> +int event_enable_timer__process(struct event_enable_timer *eet);
> +
>  struct evsel *evlist__find_evsel(struct evlist *evlist, int idx);
>
>  int evlist__scnprintf_evsels(struct evlist *evlist, size_t size, char *bf);
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/5] perf record: Allow multiple recording time ranges
  2022-08-24  7:28 [PATCH 0/5] perf record: Allow multiple recording time ranges Adrian Hunter
                   ` (4 preceding siblings ...)
  2022-08-24  7:28 ` [PATCH 5/5] perf record: Allow multiple recording time ranges Adrian Hunter
@ 2022-08-24 16:56 ` Andi Kleen
  5 siblings, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2022-08-24 16:56 UTC (permalink / raw)
  To: Adrian Hunter, Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, Namhyung Kim, Ian Rogers, Alexey Bayduraev, linux-kernel


On 8/24/2022 12:28 AM, Adrian Hunter wrote:
> Hi
>
> This patch set extends perf record -D/--delay option to accept time ranges
> for when events are enabled, for instance:
>
>      perf record -e intel_pt// -D 10-20,30-40
>
> to record 10ms to 20ms into the trace and 30ms to 40ms.  Refer patch 5 for
> more details.


Great! I just needed that for something.


-Andi



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 5/5] perf record: Allow multiple recording time ranges
  2022-08-24 15:52   ` Ian Rogers
@ 2022-08-26  6:38     ` Adrian Hunter
  0 siblings, 0 replies; 13+ messages in thread
From: Adrian Hunter @ 2022-08-26  6:38 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Namhyung Kim, Andi Kleen,
	Alexey Bayduraev, linux-kernel

On 24/08/22 18:52, Ian Rogers wrote:
> On Wed, Aug 24, 2022 at 12:28 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>>
>> AUX area traces can produce too much data to record successfully or
>> analyze subsequently. Add another means to reduce data collection by
>> allowing multiple recording time ranges.
>>
>> This is useful, for instance, in cases where a workload produces
>> predictably reproducible events in specific time ranges.
>>
>> Today we only have perf record -D <msecs> to start at a specific region, or
>> some complicated approach using snapshot mode and external scripts sending
>> signals or using the fifos. But these approaches are difficult to set up
>> compared with simply having perf do it.
>>
>> Extend perf record option -D/--delay option to specifying relative time
>> stamps for start stop controlled by perf with the right time offset, for
>> instance:
>>
>>     perf record -e intel_pt// -D 10-20,30-40
>>
>> to record 10ms to 20ms into the trace and 30ms to 40ms.
> 
> Could you add a test like this in:
> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/perf/tests/shell/record.sh?h=perf/core
> If for no other reason than code coverage. It would also be
> interesting to see what happens if floating point values are passed,
> ranges are incomplete, etc. As the functionality is generic the event
> could be cycles or instructions rather than intel_pt.

Just checking arguments does not add much value. Really it needs to
get information about what perf is doing. A simple way would be to
add debug messages specifically formatted to give that information.
Then a test could capture them and check they match what is expected.

For example, to test the -D option, print debug messages with
timestamps when events are enabled / disabled, when the workload
starts etc.

Maybe want a new pr_... for such messages like pr_test()
that print at, say, verbose level 3 or 4 (or 5)?  Or a new
"verbose" option just for test messages?

> 
> Thanks,
> Ian
> 
>> Example:
>>
>>  The example workload is:
>>
>>  $ cat repeat-usleep.c
>>
>>  int usleep(useconds_t usec);
>>
>>  int usage(int ret, const char *msg)
>>  {
>>          if (msg)
>>                  fprintf(stderr, "%s\n", msg);
>>
>>          fprintf(stderr, "Usage is: repeat-usleep <microseconds>\n");
>>
>>          return ret;
>>  }
>>
>>  int main(int argc, char *argv[])
>>  {
>>          unsigned long usecs;
>>          char *end_ptr;
>>
>>          if (argc != 2)
>>                  return usage(1, "Error: Wrong number of arguments!");
>>
>>          errno = 0;
>>          usecs = strtoul(argv[1], &end_ptr, 0);
>>          if (errno || *end_ptr || usecs > UINT_MAX)
>>                  return usage(1, "Error: Invalid argument!");
>>
>>          while (1) {
>>                  int ret = usleep(usecs);
>>
>>                  if (ret & errno != EINTR)
>>                          return usage(1, "Error: usleep() failed!");
>>          }
>>
>>          return 0;
>>  }
>>
>>  $ perf record -e intel_pt//u --delay 10-20,40-70,110-160 -- ./repeat-usleep 500
>>  Events disabled
>>  Events enabled
>>  Events disabled
>>  Events enabled
>>  Events disabled
>>  Events enabled
>>  Events disabled
>>  [ perf record: Woken up 5 times to write data ]
>>  [ perf record: Captured and wrote 0.204 MB perf.data ]
>>  Terminated
>>
>>  A dlfilter is used to determine continuous data collection (timestamps
>>  less than 1ms apart):
>>
>>  $ cat dlfilter-show-delays.c
>>
>>  static __u64 start_time;
>>  static __u64 last_time;
>>
>>  int start(void **data, void *ctx)
>>  {
>>          printf("%-17s\t%-9s\t%-6s\n", " Time", " Duration", " Delay");
>>          return 0;
>>  }
>>
>>  int filter_event_early(void *data, const struct perf_dlfilter_sample *sample, void *ctx)
>>  {
>>          __u64 delta;
>>
>>          if (!sample->time)
>>                  return 1;
>>          if (!last_time)
>>                  goto out;
>>          delta = sample->time - last_time;
>>          if (delta < 1000000)
>>                  goto out2;;
>>          printf("%17.9f\t%9.1f\t%6.1f\n", start_time / 1000000000.0, (last_time - start_time) / 1000000.0, delta / 1000000.0);
>>  out:
>>          start_time = sample->time;
>>  out2:
>>          last_time = sample->time;
>>          return 1;
>>  }
>>
>>  int stop(void *data, void *ctx)
>>  {
>>          printf("%17.9f\t%9.1f\n", start_time / 1000000000.0, (last_time - start_time) / 1000000.0);
>>          return 0;
>>  }
>>
>>  The result shows the times roughly match the --delay option:
>>
>>  $ perf script --itrace=qb --dlfilter dlfilter-show-delays.so
>>   Time                    Duration        Delay
>>    39215.302317300             9.7         20.5
>>    39215.332480217            30.4         40.9
>>    39215.403837717            49.8
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>  tools/perf/Documentation/perf-record.txt |   6 +-
>>  tools/perf/builtin-record.c              |  24 ++-
>>  tools/perf/util/evlist.c                 | 234 +++++++++++++++++++++++
>>  tools/perf/util/evlist.h                 |   9 +
>>  4 files changed, 269 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
>> index 099817ef5150..953b9663522a 100644
>> --- a/tools/perf/Documentation/perf-record.txt
>> +++ b/tools/perf/Documentation/perf-record.txt
>> @@ -430,8 +430,10 @@ if combined with -a or -C options.
>>  -D::
>>  --delay=::
>>  After starting the program, wait msecs before measuring (-1: start with events
>> -disabled). This is useful to filter out the startup phase of the program, which
>> -is often very different.
>> +disabled), or enable events only for specified ranges of msecs (e.g.
>> +-D 10-20,30-40 means wait 10 msecs, enable for 10 msecs, wait 10 msecs, enable
>> +for 10 msecs, then stop). Note, delaying enabling of events is useful to filter
>> +out the startup phase of the program, which is often very different.
>>
>>  -I::
>>  --intr-regs::
>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>> index cefb3028f565..7fdb1dd9a0a8 100644
>> --- a/tools/perf/builtin-record.c
>> +++ b/tools/perf/builtin-record.c
>> @@ -2498,6 +2498,10 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>>                 }
>>         }
>>
>> +       err = event_enable_timer__start(rec->evlist->eet);
>> +       if (err)
>> +               goto out_child;
>> +
>>         trigger_ready(&auxtrace_snapshot_trigger);
>>         trigger_ready(&switch_output_trigger);
>>         perf_hooks__invoke_record_start();
>> @@ -2621,6 +2625,14 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>>                         }
>>                 }
>>
>> +               err = event_enable_timer__process(rec->evlist->eet);
>> +               if (err < 0)
>> +                       goto out_child;
>> +               if (err) {
>> +                       err = 0;
>> +                       done = 1;
>> +               }
>> +
>>                 /*
>>                  * When perf is starting the traced process, at the end events
>>                  * die with the process and we wait for that. Thus no need to
>> @@ -2842,6 +2854,12 @@ static int perf_record_config(const char *var, const char *value, void *cb)
>>         return 0;
>>  }
>>
>> +static int record__parse_event_enable_time(const struct option *opt, const char *str, int unset)
>> +{
>> +       struct record *rec = (struct record *)opt->value;
>> +
>> +       return evlist__parse_event_enable_time(rec->evlist, &rec->opts, str, unset);
>> +}
>>
>>  static int record__parse_affinity(const struct option *opt, const char *str, int unset)
>>  {
>> @@ -3303,8 +3321,10 @@ static struct option __record_options[] = {
>>         OPT_CALLBACK('G', "cgroup", &record.evlist, "name",
>>                      "monitor event in cgroup name only",
>>                      parse_cgroups),
>> -       OPT_INTEGER('D', "delay", &record.opts.initial_delay,
>> -                 "ms to wait before starting measurement after program start (-1: start with events disabled)"),
>> +       OPT_CALLBACK('D', "delay", &record, "ms",
>> +                    "ms to wait before starting measurement after program start (-1: start with events disabled), "
>> +                    "or ranges of time to enable events e.g. '-D 10-20,30-40'",
>> +                    record__parse_event_enable_time),
>>         OPT_BOOLEAN(0, "kcore", &record.opts.kcore, "copy /proc/kcore"),
>>         OPT_STRING('u', "uid", &record.opts.target.uid_str, "user",
>>                    "user to profile"),
>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
>> index 3cfe730c12b8..fcfe5bcc0bcf 100644
>> --- a/tools/perf/util/evlist.c
>> +++ b/tools/perf/util/evlist.c
>> @@ -15,6 +15,7 @@
>>  #include "target.h"
>>  #include "evlist.h"
>>  #include "evsel.h"
>> +#include "record.h"
>>  #include "debug.h"
>>  #include "units.h"
>>  #include "bpf_counter.h"
>> @@ -40,12 +41,14 @@
>>  #include <sys/ioctl.h>
>>  #include <sys/mman.h>
>>  #include <sys/prctl.h>
>> +#include <sys/timerfd.h>
>>
>>  #include <linux/bitops.h>
>>  #include <linux/hash.h>
>>  #include <linux/log2.h>
>>  #include <linux/err.h>
>>  #include <linux/string.h>
>> +#include <linux/time64.h>
>>  #include <linux/zalloc.h>
>>  #include <perf/evlist.h>
>>  #include <perf/evsel.h>
>> @@ -147,6 +150,7 @@ static void evlist__purge(struct evlist *evlist)
>>
>>  void evlist__exit(struct evlist *evlist)
>>  {
>> +       event_enable_timer__exit(&evlist->eet);
>>         zfree(&evlist->mmap);
>>         zfree(&evlist->overwrite_mmap);
>>         perf_evlist__exit(&evlist->core);
>> @@ -2167,6 +2171,236 @@ int evlist__ctlfd_process(struct evlist *evlist, enum evlist_ctl_cmd *cmd)
>>         return err;
>>  }
>>
>> +/**
>> + * struct event_enable_time - perf record -D/--delay single time range.
>> + * @start: start of time range to enable events in milliseconds
>> + * @end: end of time range to enable events in milliseconds
>> + *
>> + * N.B. this structure is also accessed as an array of int.
>> + */
>> +struct event_enable_time {
>> +       int     start;
>> +       int     end;
>> +};
>> +
>> +static int parse_event_enable_time(const char *str, struct event_enable_time *range, bool first)
>> +{
>> +       const char *fmt = first ? "%u - %u %n" : " , %u - %u %n";
>> +       int ret, start, end, n;
>> +
>> +       ret = sscanf(str, fmt, &start, &end, &n);
>> +       if (ret != 2 || end <= start)
>> +               return -EINVAL;
>> +       if (range) {
>> +               range->start = start;
>> +               range->end = end;
>> +       }
>> +       return n;
>> +}
>> +
>> +static ssize_t parse_event_enable_times(const char *str, struct event_enable_time *range)
>> +{
>> +       int incr = !!range;
>> +       bool first = true;
>> +       ssize_t ret, cnt;
>> +
>> +       for (cnt = 0; *str; cnt++) {
>> +               ret = parse_event_enable_time(str, range, first);
>> +               if (ret < 0)
>> +                       return ret;
>> +               /* Check no overlap */
>> +               if (!first && range && range->start <= range[-1].end)
>> +                       return -EINVAL;
>> +               str += ret;
>> +               range += incr;
>> +               first = false;
>> +       }
>> +       return cnt;
>> +}
>> +
>> +/**
>> + * struct event_enable_timer - control structure for perf record -D/--delay.
>> + * @evlist: event list
>> + * @times: time ranges that events are enabled (N.B. this is also accessed as an
>> + *         array of int)
>> + * @times_cnt: number of time ranges
>> + * @timerfd: timer file descriptor
>> + * @pollfd_pos: position in @evlist array of file descriptors to poll (fdarray)
>> + * @times_step: current position in (int *)@times)[],
>> + *              refer event_enable_timer__process()
>> + *
>> + * Note, this structure is only used when there are time ranges, not when there
>> + * is only an initial delay.
>> + */
>> +struct event_enable_timer {
>> +       struct evlist *evlist;
>> +       struct event_enable_time *times;
>> +       size_t  times_cnt;
>> +       int     timerfd;
>> +       int     pollfd_pos;
>> +       size_t  times_step;
>> +};
>> +
>> +static int str_to_delay(const char *str)
>> +{
>> +       char *endptr;
>> +       long d;
>> +
>> +       d = strtol(str, &endptr, 10);
>> +       if (*endptr || d > INT_MAX || d < -1)
>> +               return 0;
>> +       return d;
>> +}
>> +
>> +int evlist__parse_event_enable_time(struct evlist *evlist, struct record_opts *opts,
>> +                                   const char *str, int unset)
>> +{
>> +       enum fdarray_flags flags = fdarray_flag__nonfilterable | fdarray_flag__non_perf_event;
>> +       struct event_enable_timer *eet;
>> +       ssize_t times_cnt;
>> +       ssize_t ret;
>> +       int err;
>> +
>> +       if (unset)
>> +               return 0;
>> +
>> +       opts->initial_delay = str_to_delay(str);
>> +       if (opts->initial_delay)
>> +               return 0;
>> +
>> +       ret = parse_event_enable_times(str, NULL);
>> +       if (ret < 0)
>> +               return ret;
>> +
>> +       times_cnt = ret;
>> +       if (times_cnt == 0)
>> +               return -EINVAL;
>> +
>> +       eet = zalloc(sizeof(*eet));
>> +       if (!eet)
>> +               return -ENOMEM;
>> +
>> +       eet->times = calloc(times_cnt, sizeof(*eet->times));
>> +       if (!eet->times) {
>> +               err = -ENOMEM;
>> +               goto free_eet;
>> +       }
>> +
>> +       if (parse_event_enable_times(str, eet->times) != times_cnt) {
>> +               err = -EINVAL;
>> +               goto free_eet_times;
>> +       }
>> +
>> +       eet->times_cnt = times_cnt;
>> +
>> +       eet->timerfd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC);
>> +       if (eet->timerfd == -1) {
>> +               err = -errno;
>> +               pr_err("timerfd_create failed: %s\n", strerror(errno));
>> +               goto free_eet_times;
>> +       }
>> +
>> +       eet->pollfd_pos = perf_evlist__add_pollfd(&evlist->core, eet->timerfd, NULL, POLLIN, flags);
>> +       if (eet->pollfd_pos < 0) {
>> +               err = eet->pollfd_pos;
>> +               goto close_timerfd;
>> +       }
>> +
>> +       eet->evlist = evlist;
>> +       evlist->eet = eet;
>> +       opts->initial_delay = eet->times[0].start;
>> +
>> +       return 0;
>> +
>> +close_timerfd:
>> +       close(eet->timerfd);
>> +free_eet_times:
>> +       free(eet->times);
>> +free_eet:
>> +       free(eet);
>> +       return err;
>> +}
>> +
>> +static int event_enable_timer__set_timer(struct event_enable_timer *eet, int ms)
>> +{
>> +       struct itimerspec its = {
>> +               .it_value.tv_sec = ms / MSEC_PER_SEC,
>> +               .it_value.tv_nsec = (ms % MSEC_PER_SEC) * NSEC_PER_MSEC,
>> +       };
>> +       int err = 0;
>> +
>> +       if (timerfd_settime(eet->timerfd, 0, &its, NULL) < 0) {
>> +               err = -errno;
>> +               pr_err("timerfd_settime failed: %s\n", strerror(errno));
>> +       }
>> +       return err;
>> +}
>> +
>> +int event_enable_timer__start(struct event_enable_timer *eet)
>> +{
>> +       int ms;
>> +
>> +       if (!eet)
>> +               return 0;
>> +
>> +       ms = eet->times[0].end - eet->times[0].start;
>> +       eet->times_step = 1;
>> +
>> +       return event_enable_timer__set_timer(eet, ms);
>> +}
>> +
>> +int event_enable_timer__process(struct event_enable_timer *eet)
>> +{
>> +       struct pollfd *entries;
>> +       short revents;
>> +
>> +       if (!eet)
>> +               return 0;
>> +
>> +       entries = eet->evlist->core.pollfd.entries;
>> +       revents = entries[eet->pollfd_pos].revents;
>> +       entries[eet->pollfd_pos].revents = 0;
>> +
>> +       if (revents & POLLIN) {
>> +               size_t step = eet->times_step;
>> +               size_t pos = step / 2;
>> +
>> +               if (step & 1) {
>> +                       evlist__disable_non_dummy(eet->evlist);
>> +                       pr_info(EVLIST_DISABLED_MSG);
>> +                       if (pos >= eet->times_cnt - 1) {
>> +                               /* Disarm timer */
>> +                               event_enable_timer__set_timer(eet, 0);
>> +                               return 1; /* Stop */
>> +                       }
>> +               } else {
>> +                       evlist__enable_non_dummy(eet->evlist);
>> +                       pr_info(EVLIST_ENABLED_MSG);
>> +               }
>> +
>> +               step += 1;
>> +               pos = step / 2;
>> +
>> +               if (pos < eet->times_cnt) {
>> +                       int *times = (int *)eet->times; /* Accessing 'times' as array of int */
>> +                       int ms = times[step] - times[step - 1];
>> +
>> +                       eet->times_step = step;
>> +                       return event_enable_timer__set_timer(eet, ms);
>> +               }
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>> +void event_enable_timer__exit(struct event_enable_timer **ep)
>> +{
>> +       if (!ep || !*ep)
>> +               return;
>> +       free((*ep)->times);
>> +       zfree(ep);
>> +}
>> +
>>  struct evsel *evlist__find_evsel(struct evlist *evlist, int idx)
>>  {
>>         struct evsel *evsel;
>> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
>> index 3a8474406738..9d967fe3953a 100644
>> --- a/tools/perf/util/evlist.h
>> +++ b/tools/perf/util/evlist.h
>> @@ -48,6 +48,8 @@ enum bkw_mmap_state {
>>         BKW_MMAP_EMPTY,
>>  };
>>
>> +struct event_enable_timer;
>> +
>>  struct evlist {
>>         struct perf_evlist core;
>>         bool             enabled;
>> @@ -79,6 +81,7 @@ struct evlist {
>>                 int     ack;    /* ack file descriptor for control commands */
>>                 int     pos;    /* index at evlist core object to check signals */
>>         } ctl_fd;
>> +       struct event_enable_timer *eet;
>>  };
>>
>>  struct evsel_str_handler {
>> @@ -426,6 +429,12 @@ int evlist__ctlfd_ack(struct evlist *evlist);
>>  #define EVLIST_ENABLED_MSG "Events enabled\n"
>>  #define EVLIST_DISABLED_MSG "Events disabled\n"
>>
>> +int evlist__parse_event_enable_time(struct evlist *evlist, struct record_opts *opts,
>> +                                   const char *str, int unset);
>> +int event_enable_timer__start(struct event_enable_timer *eet);
>> +void event_enable_timer__exit(struct event_enable_timer **ep);
>> +int event_enable_timer__process(struct event_enable_timer *eet);
>> +
>>  struct evsel *evlist__find_evsel(struct evlist *evlist, int idx);
>>
>>  int evlist__scnprintf_evsels(struct evlist *evlist, size_t size, char *bf);
>> --
>> 2.25.1
>>


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-08-26  6:39 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-24  7:28 [PATCH 0/5] perf record: Allow multiple recording time ranges Adrian Hunter
2022-08-24  7:28 ` [PATCH 1/5] perf record: Fix way of handling non-perf-event pollfds Adrian Hunter
2022-08-24 15:40   ` Ian Rogers
2022-08-24  7:28 ` [PATCH 2/5] perf record: Fix done_fd wakeup event Adrian Hunter
2022-08-24 15:41   ` Ian Rogers
2022-08-24  7:28 ` [PATCH 3/5] perf record: Change evlist->ctl_fd to use fdarray_flag__non_perf_event Adrian Hunter
2022-08-24 15:42   ` Ian Rogers
2022-08-24  7:28 ` [PATCH 4/5] perf evlist: Add evlist__{en/dis}able_non_dummy() Adrian Hunter
2022-08-24 15:45   ` Ian Rogers
2022-08-24  7:28 ` [PATCH 5/5] perf record: Allow multiple recording time ranges Adrian Hunter
2022-08-24 15:52   ` Ian Rogers
2022-08-26  6:38     ` Adrian Hunter
2022-08-24 16:56 ` [PATCH 0/5] " Andi Kleen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.