All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 00/10] perf pollfd series v2
@ 2014-09-03 21:59 Arnaldo Carvalho de Melo
  2014-09-03 21:59 ` [PATCH 01/10] perf evlist: Introduce perf_evlist__filter_pollfd method Arnaldo Carvalho de Melo
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-03 21:59 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Adrian Hunter,
	Borislav Petkov, Corey Ashford, David Ahern, Don Zickus,
	Frederic Weisbecker, Ingo Molnar, Jean Pihet, Mike Galbraith,
	Namhyung Kim, Peter Zijlstra, Stephane Eranian

Hi,

   	This is an alternative series to the one Jiri Olsa posted to use the
fixes he made to the kernel side to allow tooling to notice that a thread had
exited by looking at the pollfd.revents looking for POLLHUP notifications.

        Once all event file descriptors are removed from the evlist pollfd array,
tools can make a decision about exiting or telling the user about what happened,
asking to guidance on what to do next.

        The main difference in this approach is that a new class, which Jiri
called 'poller' and I called 'fdarray', grew up from what was in evlist->pollfd
and associated operations, while Jiri first introduced a new class and then
made tooling use it.

        The details of the implementation should be clear on the changelog
comments, please let me know if you see any problems, and if I can have your
acked-by/tested-by/whatever-else tags to get this moving forward.

        Ah, there is still one missing thing which is to make the hists browser
in live mode be notified that all monitored events are POLLHUP'ed, will get
to that in followup patches.

        The kernel bits were sent together with my latest pull req to Ingo,
this series is on top of that branch and is available at:

 git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git perf/fdarray

--------------------------------------------------------------------

v2: The kernel bits were already merged by Ingo, this is should address the
comments made for the first RFC, and has a hopefully improved/clearer
fdarray__filter() method, with changes made from thinking about comments made
by Adrian.

I made fdarray__init() receive the fdarray__add() autogrow hint, as suggested
by Namhyung.

It also makes fdarray__add() receive the revents mask instead of using what
makes sense to perf_evlist__add_pollfd(), also suggested by Namhyung.

I renamed the fd/poll.[ch] files to fd/array.[ch], from the discussion had
about naming with Jiri.

I added some Acked-by and Reviewed-by tags, please let me know if i should
stick some more, I really appreciate those tags!

Ah, in this branch there are some more patches, as it was made on top of my
perf/core branch, below are the stats for just the changesets related to this
RFC.

Thanks!

- Arnaldo

Arnaldo Carvalho de Melo (10):
  perf evlist: Introduce perf_evlist__filter_pollfd method
  perf tests: Add test for perf_evlist__filter_pollfd()
  perf evlist: Monitor POLLERR and POLLHUP events too
  perf record: Filter out POLLHUP'ed file descriptors
  perf trace: Filter out POLLHUP'ed file descriptors
  perf evlist: Allow growing pollfd on add method
  perf tests: Add pollfd growing test
  perf kvm stat live: Use perf_evlist__add_pollfd() instead of local
    equivalent
  perf evlist: Introduce poll method for common code idiom
  tools lib api: Adopt fdarray class from perf's evlist

 tools/lib/api/Makefile                    |   7 +-
 tools/lib/api/fd/array.c                  | 108 +++++++++++++++
 tools/lib/api/fd/array.h                  |  32 +++++
 tools/perf/Makefile.perf                  |   1 +
 tools/perf/builtin-kvm.c                  |  24 ++--
 tools/perf/builtin-record.c               |   5 +-
 tools/perf/builtin-top.c                  |   4 +-
 tools/perf/builtin-trace.c                |   3 +-
 tools/perf/tests/builtin-test.c           |   8 ++
 tools/perf/tests/evlist.c                 | 217 ++++++++++++++++++++++++++++++
 tools/perf/tests/open-syscall-tp-fields.c |   2 +-
 tools/perf/tests/perf-record.c            |   2 +-
 tools/perf/tests/task-exit.c              |   2 +-
 tools/perf/tests/tests.h                  |   2 +
 tools/perf/util/evlist.c                  |  37 +++--
 tools/perf/util/evlist.h                  |  10 +-
 tools/perf/util/python.c                  |   6 +-
 17 files changed, 431 insertions(+), 39 deletions(-)
 create mode 100644 tools/lib/api/fd/array.c
 create mode 100644 tools/lib/api/fd/array.h
 create mode 100644 tools/perf/tests/evlist.c

-- 
1.9.3

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 01/10] perf evlist: Introduce perf_evlist__filter_pollfd method
  2014-09-03 21:59 [RFC 00/10] perf pollfd series v2 Arnaldo Carvalho de Melo
@ 2014-09-03 21:59 ` Arnaldo Carvalho de Melo
  2014-09-03 21:59 ` [PATCH 02/10] perf tests: Add test for perf_evlist__filter_pollfd() Arnaldo Carvalho de Melo
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-03 21:59 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Adrian Hunter,
	David Ahern, Don Zickus, Frederic Weisbecker, Mike Galbraith,
	Namhyung Kim, Paul Mackerras, Peter Zijlstra, Stephane Eranian

From: Arnaldo Carvalho de Melo <acme@redhat.com>

To remove all entries in evlist->pollfd[] that have revents matching at
least one of the bits in the specified mask.

It'll adjust evlist->nr_fds to the number of unfiltered fds and will
return this value, as a convenience and to avoid requiring direct access
to internal state of perf_evlist objects.

This will be used after polling the evlist fds so that we remove fds
that were closed by the kernel.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-y2sca7z3wicvvy40a50lozwm@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evlist.c | 21 +++++++++++++++++++++
 tools/perf/util/evlist.h |  2 ++
 2 files changed, 23 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index a3e28b49128a..023bc3873ae9 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -428,6 +428,27 @@ void perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd)
 	evlist->nr_fds++;
 }
 
+int perf_evlist__filter_pollfd(struct perf_evlist *evlist, short revents_and_mask)
+{
+	int fd, nr_fds = 0;
+
+	if (evlist->nr_fds == 0)
+		return 0;
+
+	for (fd = 0; fd < evlist->nr_fds; ++fd) {
+		if (evlist->pollfd[fd].revents & revents_and_mask)
+			continue;
+
+		if (fd != nr_fds)
+			evlist->pollfd[nr_fds] = evlist->pollfd[fd];
+
+		++nr_fds;
+	}
+
+	evlist->nr_fds = nr_fds;
+	return nr_fds;
+}
+
 static void perf_evlist__id_hash(struct perf_evlist *evlist,
 				 struct perf_evsel *evsel,
 				 int cpu, int thread, u64 id)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 106de53a6a74..1082420951f9 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -84,6 +84,8 @@ void perf_evlist__id_add(struct perf_evlist *evlist, struct perf_evsel *evsel,
 
 void perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd);
 
+int perf_evlist__filter_pollfd(struct perf_evlist *evlist, short revents_and_mask);
+
 struct perf_evsel *perf_evlist__id2evsel(struct perf_evlist *evlist, u64 id);
 
 struct perf_sample_id *perf_evlist__id2sid(struct perf_evlist *evlist, u64 id);
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 02/10] perf tests: Add test for perf_evlist__filter_pollfd()
  2014-09-03 21:59 [RFC 00/10] perf pollfd series v2 Arnaldo Carvalho de Melo
  2014-09-03 21:59 ` [PATCH 01/10] perf evlist: Introduce perf_evlist__filter_pollfd method Arnaldo Carvalho de Melo
@ 2014-09-03 21:59 ` Arnaldo Carvalho de Melo
  2014-09-03 21:59 ` [PATCH 03/10] perf evlist: Monitor POLLERR and POLLHUP events too Arnaldo Carvalho de Melo
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-03 21:59 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Adrian Hunter,
	David Ahern, Don Zickus, Frederic Weisbecker, Mike Galbraith,
	Namhyung Kim, Paul Mackerras, Peter Zijlstra, Stephane Eranian

From: Arnaldo Carvalho de Melo <acme@redhat.com>

That will use a synthetic evlist with just what is touched by this new
method to check that it works as expected.

Output in verbose mode:

  $ perf test -v pollfd
  33: Filter fds with revents mask in a pollfd array         :
  --- start ---
  filtering all but pollfd[2]:
  before:   5 [ 5, 4, 3, 2, 1 ]
   after:   1 [ 3 ]
  filtering all but (pollfd[0], pollfd[3]):
  before:   5 [ 5, 4, 3, 2, 1 ]
   after:   2 [ 5, 2 ]
  test child finished with 0
  ---- end ----
  Filter fds with revents mask in a pollfd array: Ok
  $

Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-x7c8liszdvc3ocmanf2cet8p@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Makefile.perf        |   1 +
 tools/perf/tests/builtin-test.c |   4 ++
 tools/perf/tests/evlist.c       | 103 ++++++++++++++++++++++++++++++++++++++++
 tools/perf/tests/tests.h        |   1 +
 4 files changed, 109 insertions(+)
 create mode 100644 tools/perf/tests/evlist.c

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 95e832b1bc3c..9ce194fc00a0 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -399,6 +399,7 @@ LIB_OBJS += $(OUTPUT)tests/open-syscall-tp-fields.o
 LIB_OBJS += $(OUTPUT)tests/mmap-basic.o
 LIB_OBJS += $(OUTPUT)tests/perf-record.o
 LIB_OBJS += $(OUTPUT)tests/rdpmc.o
+LIB_OBJS += $(OUTPUT)tests/evlist.o
 LIB_OBJS += $(OUTPUT)tests/evsel-roundtrip-name.o
 LIB_OBJS += $(OUTPUT)tests/evsel-tp-sched.o
 LIB_OBJS += $(OUTPUT)tests/pmu.o
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 6a4145e5ad2c..41e556edbe02 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -158,6 +158,10 @@ static struct test {
 		.func = test__switch_tracking,
 	},
 	{
+		.desc = "Filter fds with revents mask in a pollfd array",
+		.func = test__perf_evlist__filter_pollfd,
+	},
+	{
 		.func = NULL,
 	},
 };
diff --git a/tools/perf/tests/evlist.c b/tools/perf/tests/evlist.c
new file mode 100644
index 000000000000..77579158f4d9
--- /dev/null
+++ b/tools/perf/tests/evlist.c
@@ -0,0 +1,103 @@
+#include "util/evlist.h"
+#include "util/debug.h"
+#include "tests/tests.h"
+
+static void perf_evlist__init_pollfd(struct perf_evlist *evlist,
+				     int nr_fds_alloc, short revents)
+{
+	int fd;
+
+	evlist->nr_fds = nr_fds_alloc;
+
+	for (fd = 0; fd < nr_fds_alloc; ++fd) {
+		evlist->pollfd[fd].fd	   = nr_fds_alloc - fd;
+		evlist->pollfd[fd].revents = revents;
+	}
+}
+
+static int perf_evlist__fprintf_pollfd(struct perf_evlist *evlist,
+				       const char *prefix, FILE *fp)
+{
+	int printed = 0, fd;
+
+	if (!verbose)
+		return 0;
+
+	printed += fprintf(fp, "\n%s: %3d [ ", prefix, evlist->nr_fds);
+	for (fd = 0; fd < evlist->nr_fds; ++fd)
+		printed += fprintf(fp, "%s%d", fd ? ", " : "", evlist->pollfd[fd].fd);
+	printed += fprintf(fp, " ]");
+	return printed;
+}
+
+int test__perf_evlist__filter_pollfd(void)
+{
+	const int nr_fds_alloc = 5;
+	int nr_fds, expected_fd[2], fd;
+	struct pollfd pollfd[nr_fds_alloc];
+	struct perf_evlist evlist_alloc = {
+		.pollfd	= pollfd,
+	}, *evlist = &evlist_alloc;
+
+	perf_evlist__init_pollfd(evlist, nr_fds_alloc, POLLIN);
+	nr_fds = perf_evlist__filter_pollfd(evlist, POLLHUP);
+	if (nr_fds != nr_fds_alloc) {
+		pr_debug("\nperf_evlist__filter_pollfd()=%d != %d shouldn't have filtered anything",
+			 nr_fds, nr_fds_alloc);
+		return TEST_FAIL;
+	}
+
+	perf_evlist__init_pollfd(evlist, nr_fds_alloc, POLLHUP);
+	nr_fds = perf_evlist__filter_pollfd(evlist, POLLHUP);
+	if (nr_fds != 0) {
+		pr_debug("\nperf_evlist__filter_pollfd()=%d != %d, should have filtered all fds",
+			 nr_fds, nr_fds_alloc);
+		return TEST_FAIL;
+	}
+
+	perf_evlist__init_pollfd(evlist, nr_fds_alloc, POLLHUP);
+	pollfd[2].revents = POLLIN;
+	expected_fd[0] = pollfd[2].fd;
+
+	pr_debug("\nfiltering all but pollfd[2]:");
+	perf_evlist__fprintf_pollfd(evlist, "before", stderr);
+	nr_fds = perf_evlist__filter_pollfd(evlist, POLLHUP);
+	perf_evlist__fprintf_pollfd(evlist, " after", stderr);
+	if (nr_fds != 1) {
+		pr_debug("\nperf_evlist__filter_pollfd()=%d != 1, should have left just one event",
+			 nr_fds);
+		return TEST_FAIL;
+	}
+
+	if (pollfd[0].fd != expected_fd[0]) {
+		pr_debug("\npollfd[0].fd=%d != %d\n", pollfd[0].fd, expected_fd[0]);
+		return TEST_FAIL;
+	}
+
+	perf_evlist__init_pollfd(evlist, nr_fds_alloc, POLLHUP);
+	pollfd[0].revents = POLLIN;
+	expected_fd[0] = pollfd[0].fd;
+	pollfd[3].revents = POLLIN;
+	expected_fd[1] = pollfd[3].fd;
+
+	pr_debug("\nfiltering all but (pollfd[0], pollfd[3]):");
+	perf_evlist__fprintf_pollfd(evlist, "before", stderr);
+	nr_fds = perf_evlist__filter_pollfd(evlist, POLLHUP);
+	perf_evlist__fprintf_pollfd(evlist, " after", stderr);
+	if (nr_fds != 2) {
+		pr_debug("\nperf_evlist__filter_pollfd()=%d != 2, should have left just two events",
+			 nr_fds);
+		return TEST_FAIL;
+	}
+
+	for (fd = 0; fd < 2; ++fd) {
+		if (pollfd[fd].fd != expected_fd[fd]) {
+			pr_debug("\npollfd[%d].fd=%d != %d\n", fd, pollfd[fd].fd, expected_fd[fd]);
+			return TEST_FAIL;
+		}
+	}
+
+	pr_debug("\n");
+
+	return 0;
+}
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index be8be10e3957..72c4c039bd80 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -49,6 +49,7 @@ int test__thread_mg_share(void);
 int test__hists_output(void);
 int test__hists_cumulate(void);
 int test__switch_tracking(void);
+int test__perf_evlist__filter_pollfd(void);
 
 #if defined(__x86_64__) || defined(__i386__) || defined(__arm__)
 #ifdef HAVE_DWARF_UNWIND_SUPPORT
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 03/10] perf evlist: Monitor POLLERR and POLLHUP events too
  2014-09-03 21:59 [RFC 00/10] perf pollfd series v2 Arnaldo Carvalho de Melo
  2014-09-03 21:59 ` [PATCH 01/10] perf evlist: Introduce perf_evlist__filter_pollfd method Arnaldo Carvalho de Melo
  2014-09-03 21:59 ` [PATCH 02/10] perf tests: Add test for perf_evlist__filter_pollfd() Arnaldo Carvalho de Melo
@ 2014-09-03 21:59 ` Arnaldo Carvalho de Melo
  2014-09-03 21:59 ` [PATCH 04/10] perf record: Filter out POLLHUP'ed file descriptors Arnaldo Carvalho de Melo
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-03 21:59 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Adrian Hunter,
	David Ahern, Don Zickus, Frederic Weisbecker, Mike Galbraith,
	Namhyung Kim, Paul Mackerras, Peter Zijlstra, Stephane Eranian

From: Arnaldo Carvalho de Melo <acme@redhat.com>

We want to know when the fd went away, like when a monitored thread
exits.

If we do not monitor such events, then the tools will wait forever on
events from a vanished thread, like when running:

 $ sleep 5s &
 $ perf record -p `pidof sleep`

This builds upon the kernel patch by Jiri Olsa that actually makes a
poll on those file descriptors to return POLLHUP.

It is also needed to change the tools to use
perf_evlist__filter_pollfd() to check if there are remainings fds to
monitor or if all are gone, in which case they will exit the
poll/mmap/read loop.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-a4fslwspov0bs69nj825hqpq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evlist.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 023bc3873ae9..502cd11ab17e 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -424,7 +424,7 @@ void perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd)
 {
 	fcntl(fd, F_SETFL, O_NONBLOCK);
 	evlist->pollfd[evlist->nr_fds].fd = fd;
-	evlist->pollfd[evlist->nr_fds].events = POLLIN;
+	evlist->pollfd[evlist->nr_fds].events = POLLIN | POLLERR | POLLHUP;
 	evlist->nr_fds++;
 }
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 04/10] perf record: Filter out POLLHUP'ed file descriptors
  2014-09-03 21:59 [RFC 00/10] perf pollfd series v2 Arnaldo Carvalho de Melo
                   ` (2 preceding siblings ...)
  2014-09-03 21:59 ` [PATCH 03/10] perf evlist: Monitor POLLERR and POLLHUP events too Arnaldo Carvalho de Melo
@ 2014-09-03 21:59 ` Arnaldo Carvalho de Melo
  2014-09-04 12:32   ` Adrian Hunter
  2014-09-03 21:59 ` [PATCH 05/10] perf trace: Filter out POLLHUP'ed " Arnaldo Carvalho de Melo
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-03 21:59 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Adrian Hunter,
	David Ahern, Don Zickus, Frederic Weisbecker, Mike Galbraith,
	Namhyung Kim, Paul Mackerras, Peter Zijlstra, Stephane Eranian

From: Arnaldo Carvalho de Melo <acme@redhat.com>

So that we don't continue polling on vanished file descriptors, i.e.
file descriptors for events monitoring threads that exited.

I.e. the following 'perf record' command now exits as expected, instead
of staying in an eternal loop:

  $ sleep 5s &
  $ perf record -p `pidof sleep`

Reported-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-ftg46awsyfjc3axb0xm63oig@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-record.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 87e28a4e33ba..b87708ce09c9 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -467,6 +467,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 			if (err > 0 || (err < 0 && errno == EINTR))
 				err = 0;
 			waking++;
+
+			if (perf_evlist__filter_pollfd(rec->evlist, POLLERR | POLLHUP) == 0)
+				done = 1;
 		}
 
 		/*
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 05/10] perf trace: Filter out POLLHUP'ed file descriptors
  2014-09-03 21:59 [RFC 00/10] perf pollfd series v2 Arnaldo Carvalho de Melo
                   ` (3 preceding siblings ...)
  2014-09-03 21:59 ` [PATCH 04/10] perf record: Filter out POLLHUP'ed file descriptors Arnaldo Carvalho de Melo
@ 2014-09-03 21:59 ` Arnaldo Carvalho de Melo
  2014-09-03 22:00 ` [PATCH 06/10] perf evlist: Allow growing pollfd on add method Arnaldo Carvalho de Melo
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-03 21:59 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Adrian Hunter,
	David Ahern, Don Zickus, Frederic Weisbecker, Mike Galbraith,
	Namhyung Kim, Paul Mackerras, Peter Zijlstra, Stephane Eranian

From: Arnaldo Carvalho de Melo <acme@redhat.com>

So that we don't continue polling on vanished file descriptors, i.e.
file descriptors for events monitoring threads that exited.

I.e. the following 'trace' command now exits as expected, instead
of staying in an eternal loop:

      $ sleep 5s &
      $ trace -p `pidof sleep`

Reported-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-fhn2pyf8sqhoiwu42hxq9yq2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-trace.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index a9e96ff49c7f..09d4bc44215e 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -2171,7 +2171,8 @@ next_event:
 	if (trace->nr_events == before) {
 		int timeout = done ? 100 : -1;
 
-		if (poll(evlist->pollfd, evlist->nr_fds, timeout) > 0)
+		if (poll(evlist->pollfd, evlist->nr_fds, timeout) > 0 &&
+		    perf_evlist__filter_pollfd(evlist, POLLERR | POLLHUP) > 0)
 			goto again;
 	} else {
 		goto again;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 06/10] perf evlist: Allow growing pollfd on add method
  2014-09-03 21:59 [RFC 00/10] perf pollfd series v2 Arnaldo Carvalho de Melo
                   ` (4 preceding siblings ...)
  2014-09-03 21:59 ` [PATCH 05/10] perf trace: Filter out POLLHUP'ed " Arnaldo Carvalho de Melo
@ 2014-09-03 22:00 ` Arnaldo Carvalho de Melo
  2014-09-03 22:00 ` [PATCH 07/10] perf tests: Add pollfd growing test Arnaldo Carvalho de Melo
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-03 22:00 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Adrian Hunter,
	Corey Ashford, David Ahern, Frederic Weisbecker, Ingo Molnar,
	Jean Pihet, Jiri Olsa, Namhyung Kim, Paul Mackerras,
	Peter Zijlstra

From: Arnaldo Carvalho de Melo <acme@redhat.com>

This way we will be able to add more file descriptors to be polled,
like stdin or some timer fd.

At this point we might as well yank the pollfd class from evlist so that
it can be used in other places.

Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-8udu11ke3ev1yklprevldzyl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evlist.c | 40 +++++++++++++++++++++++++++++++++++-----
 tools/perf/util/evlist.h |  5 +++--
 2 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 502cd11ab17e..f8064f3bb594 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -402,7 +402,21 @@ int perf_evlist__enable_event_idx(struct perf_evlist *evlist,
 		return perf_evlist__enable_event_thread(evlist, evsel, idx);
 }
 
-static int perf_evlist__alloc_pollfd(struct perf_evlist *evlist)
+static int perf_evlist__grow_pollfd(struct perf_evlist *evlist, int hint)
+{
+	int nr_fds_alloc = evlist->nr_fds_alloc + hint;
+	size_t size = sizeof(struct pollfd) * nr_fds_alloc;
+	struct pollfd *pollfd = realloc(evlist->pollfd, size);
+
+	if (pollfd == NULL)
+		return -ENOMEM;
+
+	evlist->nr_fds_alloc = nr_fds_alloc;
+	evlist->pollfd	     = pollfd;
+	return 0;
+}
+
+int perf_evlist__alloc_pollfd(struct perf_evlist *evlist)
 {
 	int nr_cpus = cpu_map__nr(evlist->cpus);
 	int nr_threads = thread_map__nr(evlist->threads);
@@ -416,16 +430,28 @@ static int perf_evlist__alloc_pollfd(struct perf_evlist *evlist)
 			nfds += nr_cpus * nr_threads;
 	}
 
-	evlist->pollfd = malloc(sizeof(struct pollfd) * nfds);
-	return evlist->pollfd != NULL ? 0 : -ENOMEM;
+	if (evlist->nr_fds_alloc - evlist->nr_fds < nfds &&
+	    perf_evlist__grow_pollfd(evlist, nfds) < 0)
+		return -ENOMEM;
+
+	return 0;
 }
 
-void perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd)
+int perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd)
 {
+	/*
+	 * XXX: 64 is arbitrary, just not to call realloc at each fd.
+	 *	Find a better autogrowing heuristic
+	 */
+	if (evlist->nr_fds == evlist->nr_fds_alloc &&
+	    perf_evlist__grow_pollfd(evlist, 64) < 0)
+		return -ENOMEM;
+
 	fcntl(fd, F_SETFL, O_NONBLOCK);
 	evlist->pollfd[evlist->nr_fds].fd = fd;
 	evlist->pollfd[evlist->nr_fds].events = POLLIN | POLLERR | POLLHUP;
 	evlist->nr_fds++;
+	return 0;
 }
 
 int perf_evlist__filter_pollfd(struct perf_evlist *evlist, short revents_and_mask)
@@ -718,7 +744,11 @@ static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
 		return -1;
 	}
 
-	perf_evlist__add_pollfd(evlist, fd);
+	if (perf_evlist__add_pollfd(evlist, fd) < 0) {
+		__perf_evlist__munmap(evlist->mmap[idx].base, idx);
+		return -1;
+	}
+
 	return 0;
 }
 
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 1082420951f9..bbc2fd01b5c5 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -30,6 +30,7 @@ struct perf_evlist {
 	int		 nr_entries;
 	int		 nr_groups;
 	int		 nr_fds;
+	int		 nr_fds_alloc;
 	int		 nr_mmaps;
 	size_t		 mmap_len;
 	int		 id_pos;
@@ -82,8 +83,8 @@ perf_evlist__find_tracepoint_by_name(struct perf_evlist *evlist,
 void perf_evlist__id_add(struct perf_evlist *evlist, struct perf_evsel *evsel,
 			 int cpu, int thread, u64 id);
 
-void perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd);
-
+int perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd);
+int perf_evlist__alloc_pollfd(struct perf_evlist *evlist);
 int perf_evlist__filter_pollfd(struct perf_evlist *evlist, short revents_and_mask);
 
 struct perf_evsel *perf_evlist__id2evsel(struct perf_evlist *evlist, u64 id);
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 07/10] perf tests: Add pollfd growing test
  2014-09-03 21:59 [RFC 00/10] perf pollfd series v2 Arnaldo Carvalho de Melo
                   ` (5 preceding siblings ...)
  2014-09-03 22:00 ` [PATCH 06/10] perf evlist: Allow growing pollfd on add method Arnaldo Carvalho de Melo
@ 2014-09-03 22:00 ` Arnaldo Carvalho de Melo
  2014-09-03 22:00 ` [PATCH 08/10] perf kvm stat live: Use perf_evlist__add_pollfd() instead of local equivalent Arnaldo Carvalho de Melo
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-03 22:00 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Adrian Hunter,
	Corey Ashford, David Ahern, Frederic Weisbecker, Ingo Molnar,
	Jean Pihet, Jiri Olsa, Namhyung Kim, Paul Mackerras,
	Peter Zijlstra

From: Arnaldo Carvalho de Melo <acme@redhat.com>

  [acme@ssdandy linux]$ perf test "Add fd"
  34: Add fd to pollfd array, making it autogrow             : Ok
  [acme@ssdandy linux]$ perf test -v "Add fd"
  34: Add fd to pollfd array, making it autogrow             :
  --- start ---
  test child forked, pid 19817

  before growing array:   2 [ 1, 2 ]
  after 3rd add_pollfd:   3 [ 1, 2, 35 ]
  after 4th add_pollfd:   4 [ 1, 2, 35, 88 ]
  test child finished with 0
  ---- end ----
  Add fd to pollfd array, making it autogrow: Ok
  [acme@ssdandy linux]$

Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-smflpyta146bzog7z0effjss@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/tests/builtin-test.c |   4 ++
 tools/perf/tests/evlist.c       | 111 ++++++++++++++++++++++++++++++++++++++++
 tools/perf/tests/tests.h        |   1 +
 3 files changed, 116 insertions(+)

diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 41e556edbe02..174c3ffc5713 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -162,6 +162,10 @@ static struct test {
 		.func = test__perf_evlist__filter_pollfd,
 	},
 	{
+		.desc = "Add fd to pollfd array, making it autogrow",
+		.func = test__perf_evlist__add_pollfd,
+	},
+	{
 		.func = NULL,
 	},
 };
diff --git a/tools/perf/tests/evlist.c b/tools/perf/tests/evlist.c
index 77579158f4d9..99d7dfd4e20a 100644
--- a/tools/perf/tests/evlist.c
+++ b/tools/perf/tests/evlist.c
@@ -1,5 +1,7 @@
 #include "util/evlist.h"
 #include "util/debug.h"
+#include "util/thread_map.h"
+#include "util/cpumap.h"
 #include "tests/tests.h"
 
 static void perf_evlist__init_pollfd(struct perf_evlist *evlist,
@@ -101,3 +103,112 @@ int test__perf_evlist__filter_pollfd(void)
 
 	return 0;
 }
+
+int test__perf_evlist__add_pollfd(void)
+{
+	struct perf_evsel evsel = {
+		.system_wide = false,
+	};
+	struct thread_map threads = {
+		.nr = 2,
+	};
+	struct perf_evlist evlist_alloc = {
+		.pollfd	 = NULL,
+		.threads = &threads,
+	}, *evlist = &evlist_alloc;
+
+	INIT_LIST_HEAD(&evlist->entries);
+	list_add(&evsel.node, &evlist->entries);
+
+	if (perf_evlist__alloc_pollfd(evlist) < 0) {
+		pr_debug("\nperf_evlist__alloc_pollfd(evlist) failed!");
+		return TEST_FAIL;
+	}
+
+	if (evlist->nr_fds_alloc != threads.nr) {
+		pr_debug("\n_evlist__alloc_pollfd: nr_fds_alloc=%d != (threads->nr(%d) * cpu_map->nr(%d))=%d",
+			 evlist->nr_fds_alloc, thread_map__nr(evlist->threads), cpu_map__nr(evlist->cpus),
+			 thread_map__nr(evlist->threads) * cpu_map__nr(evlist->cpus));
+		return TEST_FAIL;
+	}
+
+	if (perf_evlist__add_pollfd(evlist, 1) < 0) {
+		pr_debug("\nperf_evlist__add_pollfd(evlist, 1) failed!");
+		return TEST_FAIL;
+	}
+
+	if (evlist->nr_fds != 1) {
+		pr_debug("\nperf_evlist__add_pollfd(evlist, 1)=%d != 1", evlist->nr_fds);
+		return TEST_FAIL;
+	}
+
+	if (perf_evlist__add_pollfd(evlist, 2) < 0) {
+		pr_debug("\nperf_evlist__add_pollfd(evlist, 2) failed!");
+		return TEST_FAIL;
+	}
+
+	if (evlist->nr_fds != 2) {
+		pr_debug("\nperf_evlist__add_pollfd(evlist, 2)=%d != 2", evlist->nr_fds);
+		return TEST_FAIL;
+	}
+
+	perf_evlist__fprintf_pollfd(evlist, "before growing array", stderr);
+
+	if (perf_evlist__add_pollfd(evlist, 35) < 0) {
+		pr_debug("\nperf_evlist__add_pollfd(evlist, 35) failed!");
+		return TEST_FAIL;
+	}
+
+	if (evlist->nr_fds != 3) {
+		pr_debug("\nperf_evlist__add_pollfd(evlist, 35)=%d != 3", evlist->nr_fds);
+		return TEST_FAIL;
+	}
+
+	if (evlist->pollfd == NULL) {
+		pr_debug("\nperf_evlist__add_pollfd(evlist, 35) should have allocated evlist->pollfd!");
+		return TEST_FAIL;
+	}
+
+	perf_evlist__fprintf_pollfd(evlist, "after 3rd add_pollfd", stderr);
+
+	if (evlist->pollfd[2].fd != 35) {
+		pr_debug("\nevlist->pollfd[2](%d) != 35!", evlist->pollfd[2].fd);
+		return TEST_FAIL;
+	}
+
+	if (perf_evlist__add_pollfd(evlist, 88) < 0) {
+		pr_debug("\nperf_evlist__add_pollfd(evlist, 88) failed!");
+		return TEST_FAIL;
+	}
+
+	if (evlist->nr_fds != 4) {
+		pr_debug("\nperf_evlist__add_pollfd(evlist, 88)=%d != 2", evlist->nr_fds);
+		return TEST_FAIL;
+	}
+
+	perf_evlist__fprintf_pollfd(evlist, "after 4th add_pollfd", stderr);
+
+	if (evlist->pollfd[0].fd != 1) {
+		pr_debug("\nevlist->pollfd[0](%d) != 1!", evlist->pollfd[0].fd);
+		return TEST_FAIL;
+	}
+
+	if (evlist->pollfd[1].fd != 2) {
+		pr_debug("\nevlist->pollfd[1](%d) != 2!", evlist->pollfd[1].fd);
+		return TEST_FAIL;
+	}
+
+	if (evlist->pollfd[2].fd != 35) {
+		pr_debug("\nevlist->pollfd[2](%d) != 35!", evlist->pollfd[2].fd);
+		return TEST_FAIL;
+	}
+
+	if (evlist->pollfd[3].fd != 88) {
+		pr_debug("\nevlist->pollfd[3](%d) != 88!", evlist->pollfd[3].fd);
+		return TEST_FAIL;
+	}
+
+	pr_debug("\n");
+
+	return 0;
+}
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index 72c4c039bd80..699d4bb61dc7 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -50,6 +50,7 @@ int test__hists_output(void);
 int test__hists_cumulate(void);
 int test__switch_tracking(void);
 int test__perf_evlist__filter_pollfd(void);
+int test__perf_evlist__add_pollfd(void);
 
 #if defined(__x86_64__) || defined(__i386__) || defined(__arm__)
 #ifdef HAVE_DWARF_UNWIND_SUPPORT
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 08/10] perf kvm stat live: Use perf_evlist__add_pollfd() instead of local equivalent
  2014-09-03 21:59 [RFC 00/10] perf pollfd series v2 Arnaldo Carvalho de Melo
                   ` (6 preceding siblings ...)
  2014-09-03 22:00 ` [PATCH 07/10] perf tests: Add pollfd growing test Arnaldo Carvalho de Melo
@ 2014-09-03 22:00 ` Arnaldo Carvalho de Melo
  2014-09-03 22:00 ` [PATCH 09/10] perf evlist: Introduce poll method for common code idiom Arnaldo Carvalho de Melo
  2014-09-03 22:00 ` [PATCH 10/10] tools lib api: Adopt fdarray class from perf's evlist Arnaldo Carvalho de Melo
  9 siblings, 0 replies; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-03 22:00 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Adrian Hunter,
	Corey Ashford, David Ahern, Frederic Weisbecker, Ingo Molnar,
	Jean Pihet, Jiri Olsa, Namhyung Kim, Paul Mackerras,
	Peter Zijlstra

From: Arnaldo Carvalho de Melo <acme@redhat.com>

Since we can add file descriptors to the evlist pollfd and it will
autogrow, no need to copy all events to a local pollfd array, just add
the timer and stdin file descriptors.

Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-2hvp9iromiheh6rl4oaa08x5@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-kvm.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index f5d3ae483110..a440219b0be0 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -919,15 +919,8 @@ static int kvm_events_live_report(struct perf_kvm_stat *kvm)
 	signal(SIGINT, sig_handler);
 	signal(SIGTERM, sig_handler);
 
-	/* copy pollfds -- need to add timerfd and stdin */
+	/* use pollfds -- need to add timerfd and stdin */
 	nr_fds = kvm->evlist->nr_fds;
-	pollfds = zalloc(sizeof(struct pollfd) * (nr_fds + 2));
-	if (!pollfds) {
-		err = -ENOMEM;
-		goto out;
-	}
-	memcpy(pollfds, kvm->evlist->pollfd,
-		sizeof(struct pollfd) * kvm->evlist->nr_fds);
 
 	/* add timer fd */
 	if (perf_kvm__timerfd_create(kvm) < 0) {
@@ -935,17 +928,21 @@ static int kvm_events_live_report(struct perf_kvm_stat *kvm)
 		goto out;
 	}
 
-	pollfds[nr_fds].fd = kvm->timerfd;
-	pollfds[nr_fds].events = POLLIN;
+	if (perf_evlist__add_pollfd(kvm->evlist, kvm->timerfd))
+		goto out;
+
 	nr_fds++;
 
-	pollfds[nr_fds].fd = fileno(stdin);
-	pollfds[nr_fds].events = POLLIN;
+	if (perf_evlist__add_pollfd(kvm->evlist, fileno(stdin)))
+		goto out;
+
 	nr_stdin = nr_fds;
 	nr_fds++;
 	if (fd_set_nonblock(fileno(stdin)) != 0)
 		goto out;
 
+	pollfds	 = kvm->evlist->pollfd;
+
 	/* everything is good - enable the events and process */
 	perf_evlist__enable(kvm->evlist);
 
@@ -979,7 +976,6 @@ out:
 		close(kvm->timerfd);
 
 	tcsetattr(0, TCSAFLUSH, &save);
-	free(pollfds);
 	return err;
 }
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 09/10] perf evlist: Introduce poll method for common code idiom
  2014-09-03 21:59 [RFC 00/10] perf pollfd series v2 Arnaldo Carvalho de Melo
                   ` (7 preceding siblings ...)
  2014-09-03 22:00 ` [PATCH 08/10] perf kvm stat live: Use perf_evlist__add_pollfd() instead of local equivalent Arnaldo Carvalho de Melo
@ 2014-09-03 22:00 ` Arnaldo Carvalho de Melo
  2014-09-03 22:00 ` [PATCH 10/10] tools lib api: Adopt fdarray class from perf's evlist Arnaldo Carvalho de Melo
  9 siblings, 0 replies; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-03 22:00 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Adrian Hunter,
	Corey Ashford, David Ahern, Frederic Weisbecker, Ingo Molnar,
	Jean Pihet, Jiri Olsa, Namhyung Kim, Paul Mackerras,
	Peter Zijlstra

From: Arnaldo Carvalho de Melo <acme@redhat.com>

Since we have access two evlist members in all these poll calls, provide
a helper.

This will also help to make the patch introducing the pollfd class more
clear, as the evlist specific uses will be hiden away
perf_evlist__poll().

Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-hizx410uqouefj52jujubv2y@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-record.c               | 2 +-
 tools/perf/builtin-top.c                  | 4 ++--
 tools/perf/builtin-trace.c                | 2 +-
 tools/perf/tests/open-syscall-tp-fields.c | 2 +-
 tools/perf/tests/perf-record.c            | 2 +-
 tools/perf/tests/task-exit.c              | 2 +-
 tools/perf/util/evlist.c                  | 5 +++++
 tools/perf/util/evlist.h                  | 2 ++
 tools/perf/util/python.c                  | 2 +-
 9 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index b87708ce09c9..aac86600b89e 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -459,7 +459,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		if (hits == rec->samples) {
 			if (done)
 				break;
-			err = poll(rec->evlist->pollfd, rec->evlist->nr_fds, -1);
+			err = perf_evlist__poll(rec->evlist, -1);
 			/*
 			 * Propagate error, only if there's any. Ignore positive
 			 * number of returned events and interrupt error.
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 9848e270b92c..ef7c808c8ce2 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -964,7 +964,7 @@ static int __cmd_top(struct perf_top *top)
                 perf_evlist__enable(top->evlist);
 
 	/* Wait for a minimal set of events before starting the snapshot */
-	poll(top->evlist->pollfd, top->evlist->nr_fds, 100);
+	perf_evlist__poll(top->evlist, 100);
 
 	perf_top__mmap_read(top);
 
@@ -991,7 +991,7 @@ static int __cmd_top(struct perf_top *top)
 		perf_top__mmap_read(top);
 
 		if (hits == top->samples)
-			ret = poll(top->evlist->pollfd, top->evlist->nr_fds, 100);
+			ret = perf_evlist__poll(top->evlist, 100);
 	}
 
 	ret = 0;
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 09d4bc44215e..970aa0297697 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -2171,7 +2171,7 @@ next_event:
 	if (trace->nr_events == before) {
 		int timeout = done ? 100 : -1;
 
-		if (poll(evlist->pollfd, evlist->nr_fds, timeout) > 0 &&
+		if (perf_evlist__poll(evlist, timeout) > 0 &&
 		    perf_evlist__filter_pollfd(evlist, POLLERR | POLLHUP) > 0)
 			goto again;
 	} else {
diff --git a/tools/perf/tests/open-syscall-tp-fields.c b/tools/perf/tests/open-syscall-tp-fields.c
index 922bdb627950..127dcae0b760 100644
--- a/tools/perf/tests/open-syscall-tp-fields.c
+++ b/tools/perf/tests/open-syscall-tp-fields.c
@@ -105,7 +105,7 @@ int test__syscall_open_tp_fields(void)
 		}
 
 		if (nr_events == before)
-			poll(evlist->pollfd, evlist->nr_fds, 10);
+			perf_evlist__poll(evlist, 10);
 
 		if (++nr_polls > 5) {
 			pr_debug("%s: no events!\n", __func__);
diff --git a/tools/perf/tests/perf-record.c b/tools/perf/tests/perf-record.c
index 2ce753c1db63..7a228a2a070b 100644
--- a/tools/perf/tests/perf-record.c
+++ b/tools/perf/tests/perf-record.c
@@ -268,7 +268,7 @@ int test__PERF_RECORD(void)
 		 * perf_event_attr.wakeup_events, just PERF_EVENT_SAMPLE does.
 		 */
 		if (total_events == before && false)
-			poll(evlist->pollfd, evlist->nr_fds, -1);
+			perf_evlist__poll(evlist, -1);
 
 		sleep(1);
 		if (++wakeups > 5) {
diff --git a/tools/perf/tests/task-exit.c b/tools/perf/tests/task-exit.c
index 87522f01c7ad..3a8fedef83bc 100644
--- a/tools/perf/tests/task-exit.c
+++ b/tools/perf/tests/task-exit.c
@@ -105,7 +105,7 @@ retry:
 	}
 
 	if (!exited || !nr_exit) {
-		poll(evlist->pollfd, evlist->nr_fds, -1);
+		perf_evlist__poll(evlist, -1);
 		goto retry;
 	}
 
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index f8064f3bb594..686f4a687493 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -475,6 +475,11 @@ int perf_evlist__filter_pollfd(struct perf_evlist *evlist, short revents_and_mas
 	return nr_fds;
 }
 
+int perf_evlist__poll(struct perf_evlist *evlist, int timeout)
+{
+	return poll(evlist->pollfd, evlist->nr_fds, timeout);
+}
+
 static void perf_evlist__id_hash(struct perf_evlist *evlist,
 				 struct perf_evsel *evsel,
 				 int cpu, int thread, u64 id)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index bbc2fd01b5c5..d7e99b67c94f 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -87,6 +87,8 @@ int perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd);
 int perf_evlist__alloc_pollfd(struct perf_evlist *evlist);
 int perf_evlist__filter_pollfd(struct perf_evlist *evlist, short revents_and_mask);
 
+int perf_evlist__poll(struct perf_evlist *evlist, int timeout);
+
 struct perf_evsel *perf_evlist__id2evsel(struct perf_evlist *evlist, u64 id);
 
 struct perf_sample_id *perf_evlist__id2sid(struct perf_evlist *evlist, u64 id);
diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
index 12aa9b0d0ba1..4472f8be8e35 100644
--- a/tools/perf/util/python.c
+++ b/tools/perf/util/python.c
@@ -736,7 +736,7 @@ static PyObject *pyrf_evlist__poll(struct pyrf_evlist *pevlist,
 	if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|i", kwlist, &timeout))
 		return NULL;
 
-	n = poll(evlist->pollfd, evlist->nr_fds, timeout);
+	n = perf_evlist__poll(evlist, timeout);
 	if (n < 0) {
 		PyErr_SetFromErrno(PyExc_OSError);
 		return NULL;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 10/10] tools lib api: Adopt fdarray class from perf's evlist
  2014-09-03 21:59 [RFC 00/10] perf pollfd series v2 Arnaldo Carvalho de Melo
                   ` (8 preceding siblings ...)
  2014-09-03 22:00 ` [PATCH 09/10] perf evlist: Introduce poll method for common code idiom Arnaldo Carvalho de Melo
@ 2014-09-03 22:00 ` Arnaldo Carvalho de Melo
  9 siblings, 0 replies; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-03 22:00 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Adrian Hunter,
	Borislav Petkov, Corey Ashford, David Ahern, Frederic Weisbecker,
	Ingo Molnar, Jean Pihet, Jiri Olsa, Namhyung Kim, Paul Mackerras,
	Peter Zijlstra

From: Arnaldo Carvalho de Melo <acme@redhat.com>

The extensible file description array that grew in the perf_evlist class
can be useful for other tools, as it is not something that only evlists
need, so move it to tools/lib/api/fd to ease sharing it.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-uqa1qyxyppfxepzjrmqrvmb6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/lib/api/Makefile    |   7 ++-
 tools/lib/api/fd/array.c  | 108 ++++++++++++++++++++++++++++++++++++++++++++++
 tools/lib/api/fd/array.h  |  32 ++++++++++++++
 tools/perf/builtin-kvm.c  |   4 +-
 tools/perf/tests/evlist.c |  67 ++++++++++++++--------------
 tools/perf/util/evlist.c  |  57 ++++--------------------
 tools/perf/util/evlist.h  |   5 +--
 tools/perf/util/python.c  |   4 +-
 8 files changed, 195 insertions(+), 89 deletions(-)
 create mode 100644 tools/lib/api/fd/array.c
 create mode 100644 tools/lib/api/fd/array.h

diff --git a/tools/lib/api/Makefile b/tools/lib/api/Makefile
index ce00f7ee6455..0d46df1dcbe1 100644
--- a/tools/lib/api/Makefile
+++ b/tools/lib/api/Makefile
@@ -10,9 +10,14 @@ LIB_OBJS=
 
 LIB_H += fs/debugfs.h
 LIB_H += fs/fs.h
+# See comment below about piggybacking...
+LIB_H += fd/array.h
 
 LIB_OBJS += $(OUTPUT)fs/debugfs.o
 LIB_OBJS += $(OUTPUT)fs/fs.o
+# XXX piggybacking here, need to introduce libapikfd, or rename this
+# to plain libapik.a and make it have it all api goodies
+LIB_OBJS += $(OUTPUT)fd/array.o
 
 LIBFILE = libapikfs.a
 
@@ -29,7 +34,7 @@ $(LIBFILE): $(LIB_OBJS)
 $(LIB_OBJS): $(LIB_H)
 
 libapi_dirs:
-	$(QUIET_MKDIR)mkdir -p $(OUTPUT)fs/
+	$(QUIET_MKDIR)mkdir -p $(OUTPUT){fs,fd}/
 
 $(OUTPUT)%.o: %.c libapi_dirs
 	$(QUIET_CC)$(CC) -o $@ -c $(ALL_CFLAGS) $<
diff --git a/tools/lib/api/fd/array.c b/tools/lib/api/fd/array.c
new file mode 100644
index 000000000000..bd923d41b450
--- /dev/null
+++ b/tools/lib/api/fd/array.c
@@ -0,0 +1,108 @@
+/*
+ * Copyright (C) 2014, Red Hat Inc, Arnaldo Carvalho de Melo <acme@redhat.com>
+ *
+ * Released under the GPL v2. (and only v2, not any later version)
+ */
+#include "array.h"
+#include <errno.h>
+#include <fcntl.h>
+#include <poll.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+void fdarray__init(struct fdarray *fda, int nr_autogrow)
+{
+	fda->entries	 = NULL;
+	fda->nr		 = fda->nr_alloc = 0;
+	fda->nr_autogrow = nr_autogrow;
+}
+
+int fdarray__grow(struct fdarray *fda, int nr)
+{
+	int nr_alloc = fda->nr_alloc + nr;
+	size_t size  = sizeof(struct pollfd) * nr_alloc;
+	struct pollfd *entries = realloc(fda->entries, size);
+
+	if (entries == NULL)
+		return -ENOMEM;
+
+	fda->nr_alloc = nr_alloc;
+	fda->entries  = entries;
+	return 0;
+}
+
+struct fdarray *fdarray__new(int nr_alloc, int nr_autogrow)
+{
+	struct fdarray *fda = calloc(1, sizeof(*fda));
+
+	if (fda != NULL) {
+		if (fdarray__grow(fda, nr_alloc)) {
+			free(fda);
+			fda = NULL;
+		} else {
+			fda->nr_autogrow = nr_autogrow;
+		}
+	}
+
+	return fda;
+}
+
+void fdarray__exit(struct fdarray *fda)
+{
+	free(fda->entries);
+	fdarray__init(fda, 0);
+}
+
+void fdarray__delete(struct fdarray *fda)
+{
+	fdarray__exit(fda);
+	free(fda);
+}
+
+int fdarray__add(struct fdarray *fda, int fd, short revents)
+{
+	if (fda->nr == fda->nr_alloc &&
+	    fdarray__grow(fda, fda->nr_autogrow) < 0)
+		return -ENOMEM;
+
+	fcntl(fd, F_SETFL, O_NONBLOCK);
+	fda->entries[fda->nr].fd     = fd;
+	fda->entries[fda->nr].events = revents;
+	fda->nr++;
+	return 0;
+}
+
+int fdarray__filter(struct fdarray *fda, short revents)
+{
+	int fd, nr = 0;
+
+	if (fda->nr == 0)
+		return 0;
+
+	for (fd = 0; fd < fda->nr; ++fd) {
+		if (fda->entries[fd].revents & revents)
+			continue;
+
+		if (fd != nr)
+			fda->entries[nr] = fda->entries[fd];
+
+		++nr;
+	}
+
+	return fda->nr = nr;
+}
+
+int fdarray__poll(struct fdarray *fda, int timeout)
+{
+	return poll(fda->entries, fda->nr, timeout);
+}
+
+int fdarray__fprintf(struct fdarray *fda, FILE *fp)
+{
+	int fd, printed = fprintf(fp, "%d [ ", fda->nr);
+
+	for (fd = 0; fd < fda->nr; ++fd)
+		printed += fprintf(fp, "%s%d", fd ? ", " : "", fda->entries[fd].fd);
+
+	return printed + fprintf(fp, " ]");
+}
diff --git a/tools/lib/api/fd/array.h b/tools/lib/api/fd/array.h
new file mode 100644
index 000000000000..de38361ba69e
--- /dev/null
+++ b/tools/lib/api/fd/array.h
@@ -0,0 +1,32 @@
+#ifndef __API_FD_ARRAY__
+#define __API_FD_ARRAY__
+
+#include <stdio.h>
+
+struct pollfd;
+
+struct fdarray {
+	int	       nr;
+	int	       nr_alloc;
+	int	       nr_autogrow;
+	struct pollfd *entries;
+};
+
+void fdarray__init(struct fdarray *fda, int nr_autogrow);
+void fdarray__exit(struct fdarray *fda);
+
+struct fdarray *fdarray__new(int nr_alloc, int nr_autogrow);
+void fdarray__delete(struct fdarray *fda);
+
+int fdarray__add(struct fdarray *fda, int fd, short revents);
+int fdarray__poll(struct fdarray *fda, int timeout);
+int fdarray__filter(struct fdarray *fda, short revents);
+int fdarray__grow(struct fdarray *fda, int extra);
+int fdarray__fprintf(struct fdarray *fda, FILE *fp);
+
+static inline int fdarray__available_entries(struct fdarray *fda)
+{
+	return fda->nr_alloc - fda->nr;
+}
+
+#endif /* __API_FD_ARRAY__ */
diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index a440219b0be0..1e639d6265cc 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -920,7 +920,7 @@ static int kvm_events_live_report(struct perf_kvm_stat *kvm)
 	signal(SIGTERM, sig_handler);
 
 	/* use pollfds -- need to add timerfd and stdin */
-	nr_fds = kvm->evlist->nr_fds;
+	nr_fds = kvm->evlist->pollfd.nr;
 
 	/* add timer fd */
 	if (perf_kvm__timerfd_create(kvm) < 0) {
@@ -941,7 +941,7 @@ static int kvm_events_live_report(struct perf_kvm_stat *kvm)
 	if (fd_set_nonblock(fileno(stdin)) != 0)
 		goto out;
 
-	pollfds	 = kvm->evlist->pollfd;
+	pollfds	 = kvm->evlist->pollfd.entries;
 
 	/* everything is good - enable the events and process */
 	perf_evlist__enable(kvm->evlist);
diff --git a/tools/perf/tests/evlist.c b/tools/perf/tests/evlist.c
index 99d7dfd4e20a..fe3e118b25bd 100644
--- a/tools/perf/tests/evlist.c
+++ b/tools/perf/tests/evlist.c
@@ -8,28 +8,26 @@ static void perf_evlist__init_pollfd(struct perf_evlist *evlist,
 				     int nr_fds_alloc, short revents)
 {
 	int fd;
+	struct fdarray *fda = &evlist->pollfd;
 
-	evlist->nr_fds = nr_fds_alloc;
+	fda->nr = nr_fds_alloc;
 
 	for (fd = 0; fd < nr_fds_alloc; ++fd) {
-		evlist->pollfd[fd].fd	   = nr_fds_alloc - fd;
-		evlist->pollfd[fd].revents = revents;
+		fda->entries[fd].fd	 = nr_fds_alloc - fd;
+		fda->entries[fd].revents = revents;
 	}
 }
 
 static int perf_evlist__fprintf_pollfd(struct perf_evlist *evlist,
 				       const char *prefix, FILE *fp)
 {
-	int printed = 0, fd;
+	int printed = 0;
 
 	if (!verbose)
 		return 0;
 
-	printed += fprintf(fp, "\n%s: %3d [ ", prefix, evlist->nr_fds);
-	for (fd = 0; fd < evlist->nr_fds; ++fd)
-		printed += fprintf(fp, "%s%d", fd ? ", " : "", evlist->pollfd[fd].fd);
-	printed += fprintf(fp, " ]");
-	return printed;
+	printed += fprintf(fp, "\n%s: ", prefix);
+	return fdarray__fprintf(&evlist->pollfd, fp);
 }
 
 int test__perf_evlist__filter_pollfd(void)
@@ -38,7 +36,9 @@ int test__perf_evlist__filter_pollfd(void)
 	int nr_fds, expected_fd[2], fd;
 	struct pollfd pollfd[nr_fds_alloc];
 	struct perf_evlist evlist_alloc = {
-		.pollfd	= pollfd,
+		.pollfd	= {
+			.entries = pollfd,
+		},
 	}, *evlist = &evlist_alloc;
 
 	perf_evlist__init_pollfd(evlist, nr_fds_alloc, POLLIN);
@@ -113,9 +113,12 @@ int test__perf_evlist__add_pollfd(void)
 		.nr = 2,
 	};
 	struct perf_evlist evlist_alloc = {
-		.pollfd	 = NULL,
+		.pollfd	 = {
+			.entries = NULL,
+		},
 		.threads = &threads,
 	}, *evlist = &evlist_alloc;
+	struct fdarray *fda = &evlist->pollfd;
 
 	INIT_LIST_HEAD(&evlist->entries);
 	list_add(&evsel.node, &evlist->entries);
@@ -125,9 +128,9 @@ int test__perf_evlist__add_pollfd(void)
 		return TEST_FAIL;
 	}
 
-	if (evlist->nr_fds_alloc != threads.nr) {
+	if (fda->nr_alloc != threads.nr) {
 		pr_debug("\n_evlist__alloc_pollfd: nr_fds_alloc=%d != (threads->nr(%d) * cpu_map->nr(%d))=%d",
-			 evlist->nr_fds_alloc, thread_map__nr(evlist->threads), cpu_map__nr(evlist->cpus),
+			 fda->nr_alloc, thread_map__nr(evlist->threads), cpu_map__nr(evlist->cpus),
 			 thread_map__nr(evlist->threads) * cpu_map__nr(evlist->cpus));
 		return TEST_FAIL;
 	}
@@ -137,8 +140,8 @@ int test__perf_evlist__add_pollfd(void)
 		return TEST_FAIL;
 	}
 
-	if (evlist->nr_fds != 1) {
-		pr_debug("\nperf_evlist__add_pollfd(evlist, 1)=%d != 1", evlist->nr_fds);
+	if (fda->nr != 1) {
+		pr_debug("\nperf_evlist__add_pollfd(evlist, 1)=%d != 1", fda->nr);
 		return TEST_FAIL;
 	}
 
@@ -147,8 +150,8 @@ int test__perf_evlist__add_pollfd(void)
 		return TEST_FAIL;
 	}
 
-	if (evlist->nr_fds != 2) {
-		pr_debug("\nperf_evlist__add_pollfd(evlist, 2)=%d != 2", evlist->nr_fds);
+	if (fda->nr != 2) {
+		pr_debug("\nperf_evlist__add_pollfd(evlist, 2)=%d != 2", fda->nr);
 		return TEST_FAIL;
 	}
 
@@ -159,20 +162,20 @@ int test__perf_evlist__add_pollfd(void)
 		return TEST_FAIL;
 	}
 
-	if (evlist->nr_fds != 3) {
-		pr_debug("\nperf_evlist__add_pollfd(evlist, 35)=%d != 3", evlist->nr_fds);
+	if (fda->nr != 3) {
+		pr_debug("\nperf_evlist__add_pollfd(evlist, 35)=%d != 3", fda->nr);
 		return TEST_FAIL;
 	}
 
-	if (evlist->pollfd == NULL) {
+	if (fda->entries == NULL) {
 		pr_debug("\nperf_evlist__add_pollfd(evlist, 35) should have allocated evlist->pollfd!");
 		return TEST_FAIL;
 	}
 
 	perf_evlist__fprintf_pollfd(evlist, "after 3rd add_pollfd", stderr);
 
-	if (evlist->pollfd[2].fd != 35) {
-		pr_debug("\nevlist->pollfd[2](%d) != 35!", evlist->pollfd[2].fd);
+	if (fda->entries[2].fd != 35) {
+		pr_debug("\nfda->entries[2](%d) != 35!", fda->entries[2].fd);
 		return TEST_FAIL;
 	}
 
@@ -181,30 +184,30 @@ int test__perf_evlist__add_pollfd(void)
 		return TEST_FAIL;
 	}
 
-	if (evlist->nr_fds != 4) {
-		pr_debug("\nperf_evlist__add_pollfd(evlist, 88)=%d != 2", evlist->nr_fds);
+	if (fda->nr != 4) {
+		pr_debug("\nperf_evlist__add_pollfd(evlist, 88)=%d != 2", fda->nr);
 		return TEST_FAIL;
 	}
 
 	perf_evlist__fprintf_pollfd(evlist, "after 4th add_pollfd", stderr);
 
-	if (evlist->pollfd[0].fd != 1) {
-		pr_debug("\nevlist->pollfd[0](%d) != 1!", evlist->pollfd[0].fd);
+	if (fda->entries[0].fd != 1) {
+		pr_debug("\nfda->entries[0](%d) != 1!", fda->entries[0].fd);
 		return TEST_FAIL;
 	}
 
-	if (evlist->pollfd[1].fd != 2) {
-		pr_debug("\nevlist->pollfd[1](%d) != 2!", evlist->pollfd[1].fd);
+	if (fda->entries[1].fd != 2) {
+		pr_debug("\nfda->entries[1](%d) != 2!", fda->entries[1].fd);
 		return TEST_FAIL;
 	}
 
-	if (evlist->pollfd[2].fd != 35) {
-		pr_debug("\nevlist->pollfd[2](%d) != 35!", evlist->pollfd[2].fd);
+	if (fda->entries[2].fd != 35) {
+		pr_debug("\nfda->entries[2](%d) != 35!", fda->entries[2].fd);
 		return TEST_FAIL;
 	}
 
-	if (evlist->pollfd[3].fd != 88) {
-		pr_debug("\nevlist->pollfd[3](%d) != 88!", evlist->pollfd[3].fd);
+	if (fda->entries[3].fd != 88) {
+		pr_debug("\nfda->entries[3](%d) != 88!", fda->entries[3].fd);
 		return TEST_FAIL;
 	}
 
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 686f4a687493..8b386775f069 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -37,6 +37,7 @@ void perf_evlist__init(struct perf_evlist *evlist, struct cpu_map *cpus,
 		INIT_HLIST_HEAD(&evlist->heads[i]);
 	INIT_LIST_HEAD(&evlist->entries);
 	perf_evlist__set_maps(evlist, cpus, threads);
+	fdarray__init(&evlist->pollfd, 64);
 	evlist->workload.pid = -1;
 }
 
@@ -102,7 +103,7 @@ static void perf_evlist__purge(struct perf_evlist *evlist)
 void perf_evlist__exit(struct perf_evlist *evlist)
 {
 	zfree(&evlist->mmap);
-	zfree(&evlist->pollfd);
+	fdarray__exit(&evlist->pollfd);
 }
 
 void perf_evlist__delete(struct perf_evlist *evlist)
@@ -402,20 +403,6 @@ int perf_evlist__enable_event_idx(struct perf_evlist *evlist,
 		return perf_evlist__enable_event_thread(evlist, evsel, idx);
 }
 
-static int perf_evlist__grow_pollfd(struct perf_evlist *evlist, int hint)
-{
-	int nr_fds_alloc = evlist->nr_fds_alloc + hint;
-	size_t size = sizeof(struct pollfd) * nr_fds_alloc;
-	struct pollfd *pollfd = realloc(evlist->pollfd, size);
-
-	if (pollfd == NULL)
-		return -ENOMEM;
-
-	evlist->nr_fds_alloc = nr_fds_alloc;
-	evlist->pollfd	     = pollfd;
-	return 0;
-}
-
 int perf_evlist__alloc_pollfd(struct perf_evlist *evlist)
 {
 	int nr_cpus = cpu_map__nr(evlist->cpus);
@@ -430,8 +417,8 @@ int perf_evlist__alloc_pollfd(struct perf_evlist *evlist)
 			nfds += nr_cpus * nr_threads;
 	}
 
-	if (evlist->nr_fds_alloc - evlist->nr_fds < nfds &&
-	    perf_evlist__grow_pollfd(evlist, nfds) < 0)
+	if (fdarray__available_entries(&evlist->pollfd) < nfds &&
+	    fdarray__grow(&evlist->pollfd, nfds) < 0)
 		return -ENOMEM;
 
 	return 0;
@@ -439,45 +426,17 @@ int perf_evlist__alloc_pollfd(struct perf_evlist *evlist)
 
 int perf_evlist__add_pollfd(struct perf_evlist *evlist, int fd)
 {
-	/*
-	 * XXX: 64 is arbitrary, just not to call realloc at each fd.
-	 *	Find a better autogrowing heuristic
-	 */
-	if (evlist->nr_fds == evlist->nr_fds_alloc &&
-	    perf_evlist__grow_pollfd(evlist, 64) < 0)
-		return -ENOMEM;
-
-	fcntl(fd, F_SETFL, O_NONBLOCK);
-	evlist->pollfd[evlist->nr_fds].fd = fd;
-	evlist->pollfd[evlist->nr_fds].events = POLLIN | POLLERR | POLLHUP;
-	evlist->nr_fds++;
-	return 0;
+	return fdarray__add(&evlist->pollfd, fd, POLLIN | POLLERR | POLLHUP);
 }
 
 int perf_evlist__filter_pollfd(struct perf_evlist *evlist, short revents_and_mask)
 {
-	int fd, nr_fds = 0;
-
-	if (evlist->nr_fds == 0)
-		return 0;
-
-	for (fd = 0; fd < evlist->nr_fds; ++fd) {
-		if (evlist->pollfd[fd].revents & revents_and_mask)
-			continue;
-
-		if (fd != nr_fds)
-			evlist->pollfd[nr_fds] = evlist->pollfd[fd];
-
-		++nr_fds;
-	}
-
-	evlist->nr_fds = nr_fds;
-	return nr_fds;
+	return fdarray__filter(&evlist->pollfd, revents_and_mask);
 }
 
 int perf_evlist__poll(struct perf_evlist *evlist, int timeout)
 {
-	return poll(evlist->pollfd, evlist->nr_fds, timeout);
+	return fdarray__poll(&evlist->pollfd, timeout);
 }
 
 static void perf_evlist__id_hash(struct perf_evlist *evlist,
@@ -937,7 +896,7 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
 	if (evlist->mmap == NULL && perf_evlist__alloc_mmap(evlist) < 0)
 		return -ENOMEM;
 
-	if (evlist->pollfd == NULL && perf_evlist__alloc_pollfd(evlist) < 0)
+	if (evlist->pollfd.entries == NULL && perf_evlist__alloc_pollfd(evlist) < 0)
 		return -ENOMEM;
 
 	evlist->overwrite = overwrite;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index d7e99b67c94f..fc013704d903 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -2,6 +2,7 @@
 #define __PERF_EVLIST_H 1
 
 #include <linux/list.h>
+#include <api/fd/array.h>
 #include <stdio.h>
 #include "../perf.h"
 #include "event.h"
@@ -29,8 +30,6 @@ struct perf_evlist {
 	struct hlist_head heads[PERF_EVLIST__HLIST_SIZE];
 	int		 nr_entries;
 	int		 nr_groups;
-	int		 nr_fds;
-	int		 nr_fds_alloc;
 	int		 nr_mmaps;
 	size_t		 mmap_len;
 	int		 id_pos;
@@ -41,8 +40,8 @@ struct perf_evlist {
 		pid_t	pid;
 	} workload;
 	bool		 overwrite;
+	struct fdarray	 pollfd;
 	struct perf_mmap *mmap;
-	struct pollfd	 *pollfd;
 	struct thread_map *threads;
 	struct cpu_map	  *cpus;
 	struct perf_evsel *selected;
diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
index 4472f8be8e35..3dda85ca50c1 100644
--- a/tools/perf/util/python.c
+++ b/tools/perf/util/python.c
@@ -753,9 +753,9 @@ static PyObject *pyrf_evlist__get_pollfd(struct pyrf_evlist *pevlist,
         PyObject *list = PyList_New(0);
 	int i;
 
-	for (i = 0; i < evlist->nr_fds; ++i) {
+	for (i = 0; i < evlist->pollfd.nr; ++i) {
 		PyObject *file;
-		FILE *fp = fdopen(evlist->pollfd[i].fd, "r");
+		FILE *fp = fdopen(evlist->pollfd.entries[i].fd, "r");
 
 		if (fp == NULL)
 			goto free_list;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 04/10] perf record: Filter out POLLHUP'ed file descriptors
  2014-09-03 21:59 ` [PATCH 04/10] perf record: Filter out POLLHUP'ed file descriptors Arnaldo Carvalho de Melo
@ 2014-09-04 12:32   ` Adrian Hunter
  2014-09-04 15:19     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 25+ messages in thread
From: Adrian Hunter @ 2014-09-04 12:32 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: linux-kernel, Arnaldo Carvalho de Melo, David Ahern, Don Zickus,
	Frederic Weisbecker, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Stephane Eranian

On 09/04/2014 12:59 AM, Arnaldo Carvalho de Melo wrote:
> From: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> So that we don't continue polling on vanished file descriptors, i.e.
> file descriptors for events monitoring threads that exited.
> 
> I.e. the following 'perf record' command now exits as expected, instead
> of staying in an eternal loop:
> 
>   $ sleep 5s &
>   $ perf record -p `pidof sleep`
> 
> Reported-by: Jiri Olsa <jolsa@redhat.com>
> Acked-by: Jiri Olsa <jolsa@kernel.org>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Don Zickus <dzickus@redhat.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Mike Galbraith <efault@gmx.de>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Stephane Eranian <eranian@google.com>
> Link: http://lkml.kernel.org/n/tip-ftg46awsyfjc3axb0xm63oig@git.kernel.org
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> ---
>  tools/perf/builtin-record.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 87e28a4e33ba..b87708ce09c9 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -467,6 +467,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>  			if (err > 0 || (err < 0 && errno == EINTR))
>  				err = 0;
>  			waking++;
> +
> +			if (perf_evlist__filter_pollfd(rec->evlist, POLLERR | POLLHUP) == 0)

If the poll fds only include the ones mmapped, then mightn't it filter out
ones still in use? e.g. what if you record two processes and one exits?

> +				done = 1;
>  		}
>  
>  		/*
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 04/10] perf record: Filter out POLLHUP'ed file descriptors
  2014-09-04 12:32   ` Adrian Hunter
@ 2014-09-04 15:19     ` Arnaldo Carvalho de Melo
  2014-09-05  8:42       ` Adrian Hunter
  0 siblings, 1 reply; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-04 15:19 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Jiri Olsa, linux-kernel, David Ahern, Don Zickus,
	Frederic Weisbecker, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Stephane Eranian

Em Thu, Sep 04, 2014 at 03:32:08PM +0300, Adrian Hunter escreveu:
> On 09/04/2014 12:59 AM, Arnaldo Carvalho de Melo wrote:
> > So that we don't continue polling on vanished file descriptors, i.e.
> > file descriptors for events monitoring threads that exited.

> > I.e. the following 'perf record' command now exits as expected, instead
> > of staying in an eternal loop:

> >   $ sleep 5s &
> >   $ perf record -p `pidof sleep`

> > Reported-by: Jiri Olsa <jolsa@redhat.com>
<SNIP>
> > +++ b/tools/perf/builtin-record.c
> > @@ -467,6 +467,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
> >  			if (err > 0 || (err < 0 && errno == EINTR))
> >  				err = 0;
> >  			waking++;
> > +			if (perf_evlist__filter_pollfd(rec->evlist, POLLERR | POLLHUP) == 0)
> 
> If the poll fds only include the ones mmapped, then mightn't it filter out
> ones still in use? e.g. what if you record two processes and one exits?

Humm, without looking at the code, we would be needlessly looking at the
mmaps for the threads that exited.

I.e. we need to associate at the fdarray level the mmaps for the
descriptors, so that we can, at fdarray__filter() time, munmap the
POLLHUPed descriptors, right?

I'll take a look at the code to see what is needed.

But then this makes the patch set good for a class of problems while
maintaining existing behaviour for the case you outlined, i.e. this is a
partial fix and should go now, with further patches on top of it?

- Arnaldo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 04/10] perf record: Filter out POLLHUP'ed file descriptors
  2014-09-04 15:19     ` Arnaldo Carvalho de Melo
@ 2014-09-05  8:42       ` Adrian Hunter
  2014-09-05 14:07         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 25+ messages in thread
From: Adrian Hunter @ 2014-09-05  8:42 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Jiri Olsa, linux-kernel, David Ahern, Don Zickus,
	Frederic Weisbecker, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Stephane Eranian

On 09/04/2014 06:19 PM, Arnaldo Carvalho de Melo wrote:
> Em Thu, Sep 04, 2014 at 03:32:08PM +0300, Adrian Hunter escreveu:
>> On 09/04/2014 12:59 AM, Arnaldo Carvalho de Melo wrote:
>>> So that we don't continue polling on vanished file descriptors, i.e.
>>> file descriptors for events monitoring threads that exited.
> 
>>> I.e. the following 'perf record' command now exits as expected, instead
>>> of staying in an eternal loop:
> 
>>>   $ sleep 5s &
>>>   $ perf record -p `pidof sleep`
> 
>>> Reported-by: Jiri Olsa <jolsa@redhat.com>
> <SNIP>
>>> +++ b/tools/perf/builtin-record.c
>>> @@ -467,6 +467,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
>>>  			if (err > 0 || (err < 0 && errno == EINTR))
>>>  				err = 0;
>>>  			waking++;
>>> +			if (perf_evlist__filter_pollfd(rec->evlist, POLLERR | POLLHUP) == 0)
>>
>> If the poll fds only include the ones mmapped, then mightn't it filter out
>> ones still in use? e.g. what if you record two processes and one exits?
> 
> Humm, without looking at the code, we would be needlessly looking at the
> mmaps for the threads that exited.
> 
> I.e. we need to associate at the fdarray level the mmaps for the
> descriptors, so that we can, at fdarray__filter() time, munmap the
> POLLHUPed descriptors, right?
> 
> I'll take a look at the code to see what is needed.
> 
> But then this makes the patch set good for a class of problems while
> maintaining existing behaviour for the case you outlined, i.e. this is a
> partial fix and should go now, with further patches on top of it?

No I was meaning something different. For example, 'perf record' opens an
event for 2 processes per-cpu and gets 4 file descriptors:

	task1	task2
cpu0	fd0	fd1
cpu1	fd2	fd3

Now, perf record will mmap fd0 and fd2 and set-output fd1->fd0
and fd3->fd2.

pollfds includes only fd0 and fd2.

But if task2 exits, the POLLHUP will appear on fd1 and fd3.

I think Jiri's patchset changed pollfds to include all fds for that reason.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 04/10] perf record: Filter out POLLHUP'ed file descriptors
  2014-09-05  8:42       ` Adrian Hunter
@ 2014-09-05 14:07         ` Arnaldo Carvalho de Melo
  2014-09-06 20:39           ` Jiri Olsa
  0 siblings, 1 reply; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-05 14:07 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Jiri Olsa, linux-kernel, David Ahern, Don Zickus,
	Frederic Weisbecker, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Stephane Eranian

Em Fri, Sep 05, 2014 at 11:42:59AM +0300, Adrian Hunter escreveu:
> On 09/04/2014 06:19 PM, Arnaldo Carvalho de Melo wrote:
> > Em Thu, Sep 04, 2014 at 03:32:08PM +0300, Adrian Hunter escreveu:
> No I was meaning something different. For example, 'perf record' opens an
> event for 2 processes per-cpu and gets 4 file descriptors:
 
> 	task1	task2
> cpu0	fd0	fd1
> cpu1	fd2	fd3
 
> Now, perf record will mmap fd0 and fd2 and set-output fd1->fd0
> and fd3->fd2.
 
> pollfds includes only fd0 and fd2.
 
> But if task2 exits, the POLLHUP will appear on fd1 and fd3.

So? We are not interested in fd1 and fd3, since all our reading is done
on fd0 and fd2 mmaps, no?

I.e. when we ask the kernel to point fd B to fd A's mmap (what you
called set-output) and fd B inserts an event into fd A's mmap ring
buffer, we get fd A poll return as POLLRD, no?

Have to check... Otherwise we would have to poll all fds all the time,
not just the ones mmaping, right?

> I think Jiri's patchset changed pollfds to include all fds for that reason.

It did? I have to look again, probably went together with other changes,
has it?

- Arnaldo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 04/10] perf record: Filter out POLLHUP'ed file descriptors
  2014-09-05 14:07         ` Arnaldo Carvalho de Melo
@ 2014-09-06 20:39           ` Jiri Olsa
  2014-09-08 13:46             ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2014-09-06 20:39 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Adrian Hunter, linux-kernel, David Ahern, Don Zickus,
	Frederic Weisbecker, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Stephane Eranian

On Fri, Sep 05, 2014 at 11:07:56AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Fri, Sep 05, 2014 at 11:42:59AM +0300, Adrian Hunter escreveu:
> > On 09/04/2014 06:19 PM, Arnaldo Carvalho de Melo wrote:
> > > Em Thu, Sep 04, 2014 at 03:32:08PM +0300, Adrian Hunter escreveu:
> > No I was meaning something different. For example, 'perf record' opens an
> > event for 2 processes per-cpu and gets 4 file descriptors:
>  
> > 	task1	task2
> > cpu0	fd0	fd1
> > cpu1	fd2	fd3
>  
> > Now, perf record will mmap fd0 and fd2 and set-output fd1->fd0
> > and fd3->fd2.
>  
> > pollfds includes only fd0 and fd2.
>  
> > But if task2 exits, the POLLHUP will appear on fd1 and fd3.
> 
> So? We are not interested in fd1 and fd3, since all our reading is done
> on fd0 and fd2 mmaps, no?

hm, what if task1 (fd0, fd2) exits first.. perf record will exit,
but it still has to read task2..?

> 
> I.e. when we ask the kernel to point fd B to fd A's mmap (what you
> called set-output) and fd B inserts an event into fd A's mmap ring
> buffer, we get fd A poll return as POLLRD, no?
> 
> Have to check... Otherwise we would have to poll all fds all the time,
> not just the ones mmaping, right?
> 
> > I think Jiri's patchset changed pollfds to include all fds for that reason.

hm, I did not think of that.. ;-) I needed more grained feedback
for future features like cpu hotplug

> 
> It did? I have to look again, probably went together with other changes,
> has it?

it was done by replacing 'int' with 'struct poll_item' for evsel::fd xyarray

jirka

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 04/10] perf record: Filter out POLLHUP'ed file descriptors
  2014-09-06 20:39           ` Jiri Olsa
@ 2014-09-08 13:46             ` Arnaldo Carvalho de Melo
  2014-09-08 14:04               ` Jiri Olsa
  0 siblings, 1 reply; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-08 13:46 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Adrian Hunter, linux-kernel, David Ahern, Don Zickus,
	Frederic Weisbecker, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Stephane Eranian

Em Sat, Sep 06, 2014 at 10:39:15PM +0200, Jiri Olsa escreveu:
> On Fri, Sep 05, 2014 at 11:07:56AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Fri, Sep 05, 2014 at 11:42:59AM +0300, Adrian Hunter escreveu:
> > > On 09/04/2014 06:19 PM, Arnaldo Carvalho de Melo wrote:
> > > > Em Thu, Sep 04, 2014 at 03:32:08PM +0300, Adrian Hunter escreveu:
> > > No I was meaning something different. For example, 'perf record' opens an
> > > event for 2 processes per-cpu and gets 4 file descriptors:

> > > 	task1	task2
> > > cpu0	fd0	fd1
> > > cpu1	fd2	fd3

> > > Now, perf record will mmap fd0 and fd2 and set-output fd1->fd0
> > > and fd3->fd2.

> > > pollfds includes only fd0 and fd2.

> > > But if task2 exits, the POLLHUP will appear on fd1 and fd3.

> > So? We are not interested in fd1 and fd3, since all our reading is done
> > on fd0 and fd2 mmaps, no?

> hm, what if task1 (fd0, fd2) exits first.. perf record will exit,
> but it still has to read task2..?

Ok, what happens in that case, i.e. when the fds that were set to be the
ones to be polled, gets nuked, does the set-output command gets just
undone? Or does the mmap stands, receiving the events from the remaining
fds and the polling notifications get sent to, in this case, fd3 and
fd1?

I'll look at the kernel code for that...
 
> > I.e. when we ask the kernel to point fd B to fd A's mmap (what you
> > called set-output) and fd B inserts an event into fd A's mmap ring
> > buffer, we get fd A poll return as POLLRD, no?

> > Have to check... Otherwise we would have to poll all fds all the time,
> > not just the ones mmaping, right?

> > > I think Jiri's patchset changed pollfds to include all fds for that reason.

> hm, I did not think of that.. ;-) I needed more grained feedback
> for future features like cpu hotplug

So this is good for something you didn't tried to fix (and document) but
good for something that may be nice in the future? Grumpf, we have
already way too much stuff that will be eventually used but is not used
right now :-\
 
> > It did? I have to look again, probably went together with other changes,
> > has it?
> 
> it was done by replacing 'int' with 'struct poll_item' for evsel::fd xyarray

Well, will look into that as soon as I see the need for it. I.e. after I
clarify the discussion above.

- Arnaldo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 04/10] perf record: Filter out POLLHUP'ed file descriptors
  2014-09-08 13:46             ` Arnaldo Carvalho de Melo
@ 2014-09-08 14:04               ` Jiri Olsa
  2014-09-08 14:33                 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 25+ messages in thread
From: Jiri Olsa @ 2014-09-08 14:04 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Adrian Hunter, linux-kernel, David Ahern, Don Zickus,
	Frederic Weisbecker, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Stephane Eranian

On Mon, Sep 08, 2014 at 10:46:16AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Sat, Sep 06, 2014 at 10:39:15PM +0200, Jiri Olsa escreveu:
> > On Fri, Sep 05, 2014 at 11:07:56AM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Fri, Sep 05, 2014 at 11:42:59AM +0300, Adrian Hunter escreveu:
> > > > On 09/04/2014 06:19 PM, Arnaldo Carvalho de Melo wrote:
> > > > > Em Thu, Sep 04, 2014 at 03:32:08PM +0300, Adrian Hunter escreveu:
> > > > No I was meaning something different. For example, 'perf record' opens an
> > > > event for 2 processes per-cpu and gets 4 file descriptors:
> 
> > > > 	task1	task2
> > > > cpu0	fd0	fd1
> > > > cpu1	fd2	fd3
> 
> > > > Now, perf record will mmap fd0 and fd2 and set-output fd1->fd0
> > > > and fd3->fd2.
> 
> > > > pollfds includes only fd0 and fd2.
> 
> > > > But if task2 exits, the POLLHUP will appear on fd1 and fd3.
> 
> > > So? We are not interested in fd1 and fd3, since all our reading is done
> > > on fd0 and fd2 mmaps, no?
> 
> > hm, what if task1 (fd0, fd2) exits first.. perf record will exit,
> > but it still has to read task2..?
> 
> Ok, what happens in that case, i.e. when the fds that were set to be the
> ones to be polled, gets nuked, does the set-output command gets just
> undone? Or does the mmap stands, receiving the events from the remaining
> fds and the polling notifications get sent to, in this case, fd3 and
> fd1?

mmaps stays for fd1 and fd3.. and they get poll notifications as well,
we just do not check/poll them now

> 
> I'll look at the kernel code for that...
>  
> > > I.e. when we ask the kernel to point fd B to fd A's mmap (what you
> > > called set-output) and fd B inserts an event into fd A's mmap ring
> > > buffer, we get fd A poll return as POLLRD, no?
> 
> > > Have to check... Otherwise we would have to poll all fds all the time,
> > > not just the ones mmaping, right?
> 
> > > > I think Jiri's patchset changed pollfds to include all fds for that reason.
> 
> > hm, I did not think of that.. ;-) I needed more grained feedback
> > for future features like cpu hotplug
> 
> So this is good for something you didn't tried to fix (and document) but
> good for something that may be nice in the future? Grumpf, we have
> already way too much stuff that will be eventually used but is not used
> right now :-\
>  

IMO it's more clear to poll pm all event FDs.. and now with the
case Adrian described it seems necessary anyway

jirka

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 04/10] perf record: Filter out POLLHUP'ed file descriptors
  2014-09-08 14:04               ` Jiri Olsa
@ 2014-09-08 14:33                 ` Arnaldo Carvalho de Melo
  2014-09-08 15:10                   ` Jiri Olsa
  0 siblings, 1 reply; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-08 14:33 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Adrian Hunter, linux-kernel, David Ahern, Don Zickus,
	Frederic Weisbecker, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Stephane Eranian

Em Mon, Sep 08, 2014 at 04:04:54PM +0200, Jiri Olsa escreveu:
> On Mon, Sep 08, 2014 at 10:46:16AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Sat, Sep 06, 2014 at 10:39:15PM +0200, Jiri Olsa escreveu:
> > > On Fri, Sep 05, 2014 at 11:07:56AM -0300, Arnaldo Carvalho de Melo wrote:
> > > > Em Fri, Sep 05, 2014 at 11:42:59AM +0300, Adrian Hunter escreveu:
> > > > > On 09/04/2014 06:19 PM, Arnaldo Carvalho de Melo wrote:
> > > > > > Em Thu, Sep 04, 2014 at 03:32:08PM +0300, Adrian Hunter escreveu:
> > > > > No I was meaning something different. For example, 'perf record' opens an
> > > > > event for 2 processes per-cpu and gets 4 file descriptors:

> > > > > 	task1	task2
> > > > > cpu0	fd0	fd1
> > > > > cpu1	fd2	fd3

> > > > > Now, perf record will mmap fd0 and fd2 and set-output fd1->fd0
> > > > > and fd3->fd2.

> > > > > pollfds includes only fd0 and fd2.

> > > > > But if task2 exits, the POLLHUP will appear on fd1 and fd3.

> > > > So? We are not interested in fd1 and fd3, since all our reading is done
> > > > on fd0 and fd2 mmaps, no?

> > > hm, what if task1 (fd0, fd2) exits first.. perf record will exit,
> > > but it still has to read task2..?

> > Ok, what happens in that case, i.e. when the fds that were set to be the
> > ones to be polled, gets nuked, does the set-output command gets just
> > undone? Or does the mmap stands, receiving the events from the remaining
> > fds and the polling notifications get sent to, in this case, fd3 and
> > fd1?
 
> mmaps stays for fd1 and fd3.. and they get poll notifications as well,
> we just do not check/poll them now

So what you're saying is that we should have been polling all the fds
all the time?

Because after all we end up trying to consume everything in all the ring
buffers when just one of them gets a POLLRD anyway...

> > I'll look at the kernel code for that...
> >  
> > > > I.e. when we ask the kernel to point fd B to fd A's mmap (what you
> > > > called set-output) and fd B inserts an event into fd A's mmap ring
> > > > buffer, we get fd A poll return as POLLRD, no?
> > 
> > > > Have to check... Otherwise we would have to poll all fds all the time,
> > > > not just the ones mmaping, right?
> > 
> > > > > I think Jiri's patchset changed pollfds to include all fds for that reason.
> > 
> > > hm, I did not think of that.. ;-) I needed more grained feedback
> > > for future features like cpu hotplug
> > 
> > So this is good for something you didn't tried to fix (and document) but
> > good for something that may be nice in the future? Grumpf, we have
> > already way too much stuff that will be eventually used but is not used
> > right now :-\
 
> IMO it's more clear to poll pm all event FDs.. and now with the
> case Adrian described it seems necessary anyway

I would have to check why was that we were polling just the one where
the mmap is done, I don't recall being the one to do it, probably who
did it thought that since the ring buffer is there, it was enough (and
possibly scaled better, dunno) to do the polling in just one of them.

- Arnaldo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 04/10] perf record: Filter out POLLHUP'ed file descriptors
  2014-09-08 14:33                 ` Arnaldo Carvalho de Melo
@ 2014-09-08 15:10                   ` Jiri Olsa
  2014-09-08 15:38                     ` Arnaldo Carvalho de Melo
  2014-09-26  9:20                     ` [tip:perf/core] perf evlist: We need to poll all event file descriptors tip-bot for Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 25+ messages in thread
From: Jiri Olsa @ 2014-09-08 15:10 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Adrian Hunter, linux-kernel, David Ahern, Don Zickus,
	Frederic Weisbecker, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Stephane Eranian

On Mon, Sep 08, 2014 at 11:33:17AM -0300, Arnaldo Carvalho de Melo wrote:

SNIP

> 
> So what you're saying is that we should have been polling all the fds
> all the time?
> 
> Because after all we end up trying to consume everything in all the ring
> buffers when just one of them gets a POLLRD anyway...
> 
> > > I'll look at the kernel code for that...
> > >  
> > > > > I.e. when we ask the kernel to point fd B to fd A's mmap (what you
> > > > > called set-output) and fd B inserts an event into fd A's mmap ring
> > > > > buffer, we get fd A poll return as POLLRD, no?
> > > 
> > > > > Have to check... Otherwise we would have to poll all fds all the time,
> > > > > not just the ones mmaping, right?
> > > 
> > > > > > I think Jiri's patchset changed pollfds to include all fds for that reason.
> > > 
> > > > hm, I did not think of that.. ;-) I needed more grained feedback
> > > > for future features like cpu hotplug
> > > 
> > > So this is good for something you didn't tried to fix (and document) but
> > > good for something that may be nice in the future? Grumpf, we have
> > > already way too much stuff that will be eventually used but is not used
> > > right now :-\
>  
> > IMO it's more clear to poll pm all event FDs.. and now with the
> > case Adrian described it seems necessary anyway
> 
> I would have to check why was that we were polling just the one where
> the mmap is done, I don't recall being the one to do it, probably who
> did it thought that since the ring buffer is there, it was enough (and
> possibly scaled better, dunno) to do the polling in just one of them.

for read notification it's ok to poll just for one event, because they
all share same ringbuffer and perf_poll checks if there's ANY new data

for the hup notification I think we need to poll all of them

jirka

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 04/10] perf record: Filter out POLLHUP'ed file descriptors
  2014-09-08 15:10                   ` Jiri Olsa
@ 2014-09-08 15:38                     ` Arnaldo Carvalho de Melo
  2014-09-08 17:07                       ` Arnaldo Carvalho de Melo
  2014-09-26  9:21                       ` [tip:perf/core] perf evlist: Refcount mmaps tip-bot for Arnaldo Carvalho de Melo
  2014-09-26  9:20                     ` [tip:perf/core] perf evlist: We need to poll all event file descriptors tip-bot for Arnaldo Carvalho de Melo
  1 sibling, 2 replies; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-08 15:38 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Adrian Hunter, linux-kernel, David Ahern, Don Zickus,
	Frederic Weisbecker, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Stephane Eranian

Em Mon, Sep 08, 2014 at 05:10:17PM +0200, Jiri Olsa escreveu:
> On Mon, Sep 08, 2014 at 11:33:17AM -0300, Arnaldo Carvalho de Melo wrote:
> > > IMO it's more clear to poll pm all event FDs.. and now with the
> > > case Adrian described it seems necessary anyway

> > I would have to check why was that we were polling just the one where
> > the mmap is done, I don't recall being the one to do it, probably who
> > did it thought that since the ring buffer is there, it was enough (and
> > possibly scaled better, dunno) to do the polling in just one of them.
 
> for read notification it's ok to poll just for one event, because they
> all share same ringbuffer and perf_poll checks if there's ANY new data
 
> for the hup notification I think we need to poll all of them

Yeah, I'm convinced of this, I'm working on a patch to make it look at
all file descriptors at poll time.

I'm doing it on top of a patch that will close the mmap when it gets a
POLLHUP, as discussed recently on this thread, in a response I gave to
Adrian.

But since multiple fds share an mmap, we can only close that mmap when
all fds are HUPed, i.e. we need to reference count struct perf_mmap,
which is what I am doing.

I.e. the fist mmaps and sets perf_mmap.nfds to 1, the next one, just
after doing that PERF_EVENT_IOC_SET_OUTPUT will bump perf_mmap.nfds,
unmap gets replaced by mmap_put, that decs and calls munmap when it
hits zero, yadda, yadda.

- Arnaldo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 04/10] perf record: Filter out POLLHUP'ed file descriptors
  2014-09-08 15:38                     ` Arnaldo Carvalho de Melo
@ 2014-09-08 17:07                       ` Arnaldo Carvalho de Melo
  2014-09-26  9:21                       ` [tip:perf/core] perf evlist: Refcount mmaps tip-bot for Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-09-08 17:07 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Adrian Hunter, linux-kernel, David Ahern, Don Zickus,
	Frederic Weisbecker, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Peter Zijlstra, Stephane Eranian

Em Mon, Sep 08, 2014 at 12:38:24PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Mon, Sep 08, 2014 at 05:10:17PM +0200, Jiri Olsa escreveu:
> > On Mon, Sep 08, 2014 at 11:33:17AM -0300, Arnaldo Carvalho de Melo wrote:
> > > > IMO it's more clear to poll pm all event FDs.. and now with the
> > > > case Adrian described it seems necessary anyway
> 
> > > I would have to check why was that we were polling just the one where
> > > the mmap is done, I don't recall being the one to do it, probably who
> > > did it thought that since the ring buffer is there, it was enough (and
> > > possibly scaled better, dunno) to do the polling in just one of them.
>  
> > for read notification it's ok to poll just for one event, because they
> > all share same ringbuffer and perf_poll checks if there's ANY new data
>  
> > for the hup notification I think we need to poll all of them
> 
> Yeah, I'm convinced of this, I'm working on a patch to make it look at
> all file descriptors at poll time.

Done, its at:

https://git.kernel.org/cgit/linux/kernel/git/acme/linux.git/commit/?h=perf/fdarray.v2&id=46eb798f12d41093deb743b9df10fec23ffae719

this is in a refreshed branch so that it gets in the right time in the
patch kit

Comment:

-----
  perf evlist: We need to poll all event file descriptors
    
  Because we want to notice when they get POLLHUP'ed, so that we can
  figure out when all threads exited in a workload being monitored.
    
  We can't just monitor the fds that were mmaped, we need to notice when
  all the fds that were PERF_EVENT_IOC_SET_OUTPUT'ed too, because the mmap
  stays even after the fd that originally was used to do the mmap call
  went away, its only when all the set-output fds for a mmap are gone that
  the mmap is.
-----

Full branch is available at:

git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git perf/fdarray.v2

Web interface:

https://git.kernel.org/cgit/linux/kernel/git/acme/linux.git/log/?h=perf/fdarray.v2
 
> I'm doing it on top of a patch that will close the mmap when it gets a
> POLLHUP, as discussed recently on this thread, in a response I gave to
> Adrian.

> But since multiple fds share an mmap, we can only close that mmap when
> all fds are HUPed, i.e. we need to reference count struct perf_mmap,
> which is what I am doing.
 
> I.e. the fist mmaps and sets perf_mmap.nfds to 1, the next one, just
> after doing that PERF_EVENT_IOC_SET_OUTPUT will bump perf_mmap.nfds,
> unmap gets replaced by mmap_put, that decs and calls munmap when it
> hits zero, yadda, yadda.

Should be ok, now, with the above patch (kinda one liner, just moving
the add_pollfd call) + the patches at the top of the branch that
introduce the mmap refcounting and the fdarray facility to store some
info (what I think you called poll_item in your patchset) associated to
each entry in the file descriptor array).

Please let me know if you can take a look at the branch via the web
interface to see if that solves the problems you reported before I
repost them or if you prefer that I repost so that you can check via
e-mail.

I will repost it one final time, after lunch ;-)

- Arnaldo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [tip:perf/core] perf evlist: We need to poll all event file descriptors
  2014-09-08 15:10                   ` Jiri Olsa
  2014-09-08 15:38                     ` Arnaldo Carvalho de Melo
@ 2014-09-26  9:20                     ` tip-bot for Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 25+ messages in thread
From: tip-bot for Arnaldo Carvalho de Melo @ 2014-09-26  9:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, eranian, paulus, acme, hpa, mingo, peterz, efault,
	namhyung, jolsa, fweisbec, adrian.hunter, dsahern, tglx, dzickus

Commit-ID:  033fa713db66b96d5779e6a93d50ff821bc1abd2
Gitweb:     http://git.kernel.org/tip/033fa713db66b96d5779e6a93d50ff821bc1abd2
Author:     Arnaldo Carvalho de Melo <acme@redhat.com>
AuthorDate: Mon, 8 Sep 2014 12:55:12 -0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 25 Sep 2014 16:46:54 -0300

perf evlist: We need to poll all event file descriptors

Because we want to notice when they get POLLHUP'ed, so that we can
figure out when all threads exited in a workload being monitored.

We can't just monitor the fds that were mmaped, we need to notice when
all the fds that were PERF_EVENT_IOC_SET_OUTPUT'ed too, because the mmap
stays even after the fd that originally was used to do the mmap call
went away, its only when all the set-output fds for a mmap are gone that
the mmap is.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20140908151016.GH17728@krava.brq.redhat.com
Link: http://lkml.kernel.org/n/tip-24omlq5asrfg4uo3muuzn2bl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evlist.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 502cd11..6b13bfa 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -717,8 +717,6 @@ static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
 		evlist->mmap[idx].base = NULL;
 		return -1;
 	}
-
-	perf_evlist__add_pollfd(evlist, fd);
 	return 0;
 }
 
@@ -745,6 +743,8 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
 				return -1;
 		}
 
+		perf_evlist__add_pollfd(evlist, fd);
+
 		if ((evsel->attr.read_format & PERF_FORMAT_ID) &&
 		    perf_evlist__id_add_fd(evlist, evsel, cpu, thread, fd) < 0)
 			return -1;

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [tip:perf/core] perf evlist: Refcount mmaps
  2014-09-08 15:38                     ` Arnaldo Carvalho de Melo
  2014-09-08 17:07                       ` Arnaldo Carvalho de Melo
@ 2014-09-26  9:21                       ` tip-bot for Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 25+ messages in thread
From: tip-bot for Arnaldo Carvalho de Melo @ 2014-09-26  9:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, paulus, acme, hpa, mingo, jolsa, a.p.zijlstra,
	jean.pihet, namhyung, fweisbec, adrian.hunter, dsahern, tglx, bp,
	cjashfor

Commit-ID:  82396986032915c1572bfb74b224fcc2e4e8ba7c
Gitweb:     http://git.kernel.org/tip/82396986032915c1572bfb74b224fcc2e4e8ba7c
Author:     Arnaldo Carvalho de Melo <acme@redhat.com>
AuthorDate: Mon, 8 Sep 2014 13:26:35 -0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 25 Sep 2014 16:46:55 -0300

perf evlist: Refcount mmaps

We need to know how many fds are using a perf mmap via
PERF_EVENT_IOC_SET_OUTPUT, so that we can know when to ditch an mmap,
refcount it.

v2: Automatically unmap it when the refcount hits one, which will happen
when all fds are filtered by perf_evlist__filter_pollfd(), in later
patches.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20140908153824.GG2773@kernel.org
Link: http://lkml.kernel.org/n/tip-cpv7v2lw0g74ucmxa39xdpms@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evlist.c | 47 +++++++++++++++++++++++++++++++++++++++++++++--
 tools/perf/util/evlist.h |  6 ++++++
 2 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 398dab1..efddee5 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -28,6 +28,8 @@
 #define FD(e, x, y) (*(int *)xyarray__entry(e->fd, x, y))
 #define SID(e, x, y) xyarray__entry(e->sample_id, x, y)
 
+static void __perf_evlist__munmap(struct perf_evlist *evlist, int idx);
+
 void perf_evlist__init(struct perf_evlist *evlist, struct cpu_map *cpus,
 		       struct thread_map *threads)
 {
@@ -651,14 +653,36 @@ union perf_event *perf_evlist__mmap_read(struct perf_evlist *evlist, int idx)
 	return event;
 }
 
+static bool perf_mmap__empty(struct perf_mmap *md)
+{
+	return perf_mmap__read_head(md) != md->prev;
+}
+
+static void perf_evlist__mmap_get(struct perf_evlist *evlist, int idx)
+{
+	++evlist->mmap[idx].refcnt;
+}
+
+static void perf_evlist__mmap_put(struct perf_evlist *evlist, int idx)
+{
+	BUG_ON(evlist->mmap[idx].refcnt == 0);
+
+	if (--evlist->mmap[idx].refcnt == 0)
+		__perf_evlist__munmap(evlist, idx);
+}
+
 void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx)
 {
+	struct perf_mmap *md = &evlist->mmap[idx];
+
 	if (!evlist->overwrite) {
-		struct perf_mmap *md = &evlist->mmap[idx];
 		unsigned int old = md->prev;
 
 		perf_mmap__write_tail(md, old);
 	}
+
+	if (md->refcnt == 1 && perf_mmap__empty(md))
+		perf_evlist__mmap_put(evlist, idx);
 }
 
 static void __perf_evlist__munmap(struct perf_evlist *evlist, int idx)
@@ -666,6 +690,7 @@ static void __perf_evlist__munmap(struct perf_evlist *evlist, int idx)
 	if (evlist->mmap[idx].base != NULL) {
 		munmap(evlist->mmap[idx].base, evlist->mmap_len);
 		evlist->mmap[idx].base = NULL;
+		evlist->mmap[idx].refcnt = 0;
 	}
 }
 
@@ -699,6 +724,20 @@ struct mmap_params {
 static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
 			       struct mmap_params *mp, int fd)
 {
+	/*
+	 * The last one will be done at perf_evlist__mmap_consume(), so that we
+	 * make sure we don't prevent tools from consuming every last event in
+	 * the ring buffer.
+	 *
+	 * I.e. we can get the POLLHUP meaning that the fd doesn't exist
+	 * anymore, but the last events for it are still in the ring buffer,
+	 * waiting to be consumed.
+	 *
+	 * Tools can chose to ignore this at their own discretion, but the
+	 * evlist layer can't just drop it when filtering events in
+	 * perf_evlist__filter_pollfd().
+	 */
+	evlist->mmap[idx].refcnt = 2;
 	evlist->mmap[idx].prev = 0;
 	evlist->mmap[idx].mask = mp->mask;
 	evlist->mmap[idx].base = mmap(NULL, evlist->mmap_len, mp->prot,
@@ -734,10 +773,14 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
 		} else {
 			if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, *output) != 0)
 				return -1;
+
+			perf_evlist__mmap_get(evlist, idx);
 		}
 
-		if (perf_evlist__add_pollfd(evlist, fd) < 0)
+		if (perf_evlist__add_pollfd(evlist, fd) < 0) {
+			perf_evlist__mmap_put(evlist, idx);
 			return -1;
+		}
 
 		if ((evsel->attr.read_format & PERF_FORMAT_ID) &&
 		    perf_evlist__id_add_fd(evlist, evsel, cpu, thread, fd) < 0)
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index fc01370..bd312b0 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -18,9 +18,15 @@ struct record_opts;
 #define PERF_EVLIST__HLIST_BITS 8
 #define PERF_EVLIST__HLIST_SIZE (1 << PERF_EVLIST__HLIST_BITS)
 
+/**
+ * struct perf_mmap - perf's ring buffer mmap details
+ *
+ * @refcnt - e.g. code using PERF_EVENT_IOC_SET_OUTPUT to share this
+ */
 struct perf_mmap {
 	void		 *base;
 	int		 mask;
+	int		 refcnt;
 	unsigned int	 prev;
 	char		 event_copy[PERF_SAMPLE_MAX_SIZE];
 };

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 04/10] perf record: Filter out POLLHUP'ed file descriptors
  2014-08-22 20:59 [RFC 00/10] perf pollfd series Arnaldo Carvalho de Melo
@ 2014-08-22 20:59 ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 25+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-08-22 20:59 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Adrian Hunter,
	David Ahern, Don Zickus, Frederic Weisbecker, Ingo Molnar,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Peter Zijlstra,
	Stephane Eranian

From: Arnaldo Carvalho de Melo <acme@redhat.com>

So that we don't continue polling on vanished file descriptors, i.e.
file descriptors for events monitoring threads that exited.

I.e. the following 'perf record' command now exits as expected, instead
of staying in an eternal loop:

  $ sleep 5s &
  $ perf record -p `pidof sleep`

Reported-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-ftg46awsyfjc3axb0xm63oig@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-record.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 87e28a4e33ba..b87708ce09c9 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -467,6 +467,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 			if (err > 0 || (err < 0 && errno == EINTR))
 				err = 0;
 			waking++;
+
+			if (perf_evlist__filter_pollfd(rec->evlist, POLLERR | POLLHUP) == 0)
+				done = 1;
 		}
 
 		/*
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2014-09-26  9:22 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-03 21:59 [RFC 00/10] perf pollfd series v2 Arnaldo Carvalho de Melo
2014-09-03 21:59 ` [PATCH 01/10] perf evlist: Introduce perf_evlist__filter_pollfd method Arnaldo Carvalho de Melo
2014-09-03 21:59 ` [PATCH 02/10] perf tests: Add test for perf_evlist__filter_pollfd() Arnaldo Carvalho de Melo
2014-09-03 21:59 ` [PATCH 03/10] perf evlist: Monitor POLLERR and POLLHUP events too Arnaldo Carvalho de Melo
2014-09-03 21:59 ` [PATCH 04/10] perf record: Filter out POLLHUP'ed file descriptors Arnaldo Carvalho de Melo
2014-09-04 12:32   ` Adrian Hunter
2014-09-04 15:19     ` Arnaldo Carvalho de Melo
2014-09-05  8:42       ` Adrian Hunter
2014-09-05 14:07         ` Arnaldo Carvalho de Melo
2014-09-06 20:39           ` Jiri Olsa
2014-09-08 13:46             ` Arnaldo Carvalho de Melo
2014-09-08 14:04               ` Jiri Olsa
2014-09-08 14:33                 ` Arnaldo Carvalho de Melo
2014-09-08 15:10                   ` Jiri Olsa
2014-09-08 15:38                     ` Arnaldo Carvalho de Melo
2014-09-08 17:07                       ` Arnaldo Carvalho de Melo
2014-09-26  9:21                       ` [tip:perf/core] perf evlist: Refcount mmaps tip-bot for Arnaldo Carvalho de Melo
2014-09-26  9:20                     ` [tip:perf/core] perf evlist: We need to poll all event file descriptors tip-bot for Arnaldo Carvalho de Melo
2014-09-03 21:59 ` [PATCH 05/10] perf trace: Filter out POLLHUP'ed " Arnaldo Carvalho de Melo
2014-09-03 22:00 ` [PATCH 06/10] perf evlist: Allow growing pollfd on add method Arnaldo Carvalho de Melo
2014-09-03 22:00 ` [PATCH 07/10] perf tests: Add pollfd growing test Arnaldo Carvalho de Melo
2014-09-03 22:00 ` [PATCH 08/10] perf kvm stat live: Use perf_evlist__add_pollfd() instead of local equivalent Arnaldo Carvalho de Melo
2014-09-03 22:00 ` [PATCH 09/10] perf evlist: Introduce poll method for common code idiom Arnaldo Carvalho de Melo
2014-09-03 22:00 ` [PATCH 10/10] tools lib api: Adopt fdarray class from perf's evlist Arnaldo Carvalho de Melo
  -- strict thread matches above, loose matches on Subject: below --
2014-08-22 20:59 [RFC 00/10] perf pollfd series Arnaldo Carvalho de Melo
2014-08-22 20:59 ` [PATCH 04/10] perf record: Filter out POLLHUP'ed file descriptors Arnaldo Carvalho de Melo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.