linux-kernel.vger.kernel.org archive mirror
* Optimize perf stat for large number of events/cpus v3
@ 2019-10-25 18:14 Andi Kleen
  2019-10-25 18:14 ` [PATCH v3 1/7] perf pmu: Use file system cache to optimize sysfs access Andi Kleen
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Andi Kleen @ 2019-10-25 18:14 UTC
  To: acme; +Cc: jolsa, eranian, linux-kernel

[v3: Address review feedback. Fix minor bug and rebase 
to latest tip/perf/core avoiding one reject.]

This patch kit optimizes perf stat for a large number of events 
on systems with many CPUs and PMUs.

Profiling shows that most of the overhead comes from IPIs to
all the target CPUs. We can avoid this by using sched_setaffinity
to set the affinity to a target CPU once and then doing
the perf operation for all events on that CPU. This requires
some restructuring, but cuts the setup time quite a bit.

In theory we could go further by parallelizing these setups
too, but that would be much more complicated and for now just batching it
per CPU seems to be sufficient. At some point with many more cores 
parallelization or a better bulk perf setup API might be needed though.
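As a rough illustration of the batching idea (not the code from this
patch kit), the sketch below pins the calling thread to each CPU once and
then runs a per-CPU operation for every event. The names
for_each_cpu_batched, cpu_op_fn and count_op are hypothetical, and a
fixed-size cpu_set_t is assumed to be large enough for the machine.

```c
#define _GNU_SOURCE 1
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one per-CPU perf operation (open, read, close). */
typedef void (*cpu_op_fn)(int cpu, int event, void *arg);

/* Example operation: just counts how often it was called. */
static void count_op(int cpu, int event, void *arg)
{
	(void)cpu;
	(void)event;
	(*(int *)arg)++;
}

/*
 * Run op for every (cpu, event) pair, migrating at most once per CPU
 * instead of paying for a cross-CPU call on every operation.
 * Returns the number of CPUs the thread could actually be pinned to.
 */
static int for_each_cpu_batched(const int *cpus, int nr_cpus, int nr_events,
				cpu_op_fn op, void *arg)
{
	cpu_set_t orig, one;
	bool have_orig;
	int pinned = 0;

	assert(op != NULL);
	/* Assumption: the kernel's CPU mask fits in a fixed-size cpu_set_t. */
	have_orig = sched_getaffinity(0, sizeof(orig), &orig) == 0;

	for (int i = 0; i < nr_cpus; i++) {
		CPU_ZERO(&one);
		CPU_SET(cpus[i], &one);
		/* Failure is not fatal: the work still gets done, only slower. */
		if (sched_setaffinity(0, sizeof(one), &one) == 0)
			pinned++;
		for (int ev = 0; ev < nr_events; ev++)
			op(cpus[i], ev, arg);
	}

	/* Put back the mask we started with. */
	if (have_orig)
		sched_setaffinity(0, sizeof(orig), &orig);
	return pinned;
}
```

A caller would pass the list of target CPUs and a callback that performs
the real per-CPU work. Pinning is treated purely as an optimization here,
so a failed sched_setaffinity call is ignored.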

In addition, perf does a lot of redundant /sys accesses with
many PMUs, which can also be expensive. This is optimized too.

On a large test case (>700 events with many weak groups) on a 94 CPU
system I go from

real	0m8.607s
user	0m0.550s
sys	0m8.041s

to 

real	0m3.269s
user	0m0.760s
sys	0m1.694s

so this shaves ~6 seconds of system time, at slightly more cost
in perf stat itself. On a 4 socket system the savings
are more dramatic:

real	0m15.641s
user	0m0.873s
sys	0m14.729s

to 

real	0m4.493s
user	0m1.578s
sys	0m2.444s

so an 11s difference in the user-visible setup time.

Also available in 

git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/stat-scale-5

v1: Initial post.
v2: Rebase. Fix some minor issues.
v3: Rebase. Address review feedback. Fix one minor issue.

-Andi



* [PATCH v3 1/7] perf pmu: Use file system cache to optimize sysfs access
  2019-10-25 18:14 Optimize perf stat for large number of events/cpus v3 Andi Kleen
@ 2019-10-25 18:14 ` Andi Kleen
  2019-10-28 22:01   ` Jiri Olsa
  2019-10-25 18:14 ` [PATCH v3 2/7] perf affinity: Add infrastructure to save/restore affinity Andi Kleen
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2019-10-25 18:14 UTC
  To: acme; +Cc: jolsa, eranian, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

pmu.c does a lot of redundant /sys accesses while parsing aliases
and probing for PMUs. On large systems with a lot of PMUs this
can get expensive (>2s):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 27.25    1.227847           8    160888     16976 openat
 26.42    1.190481           7    164224    164077 stat

Add a cache to remember if specific file names exist or don't
exist, which eliminates most of this overhead.
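The caching idea can be sketched in a self-contained form as follows.
This is a simplified illustration, not the fncache.c from the diff: it
uses a plain singly linked chain instead of the hlist helpers, and the
names cached_exists, hash_name and nr_probes are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NBUCKETS 61

struct entry {
	struct entry *next;
	bool exists;
	char name[];		/* path, stored inline after the header */
};

static struct entry *buckets[NBUCKETS];
static unsigned long nr_probes;	/* times we really asked the file system */

static unsigned hash_name(const char *s)
{
	unsigned h = 0;

	while (*s)
		h = 65599 * h + (unsigned char)*s++;
	return h ^ (h >> 16);
}

/* Remember both positive and negative answers; entries are never evicted. */
static bool cached_exists(const char *name)
{
	struct entry *e;
	unsigned h;
	bool exists;

	assert(name != NULL);
	h = hash_name(name) % NBUCKETS;
	for (e = buckets[h]; e; e = e->next)
		if (!strcmp(e->name, name))
			return e->exists;

	nr_probes++;
	exists = access(name, R_OK) == 0;

	/* If allocation fails we simply skip caching this answer. */
	e = malloc(sizeof(*e) + strlen(name) + 1);
	if (e) {
		strcpy(e->name, name);
		e->exists = exists;
		e->next = buckets[h];
		buckets[h] = e;
	}
	return exists;
}
```

Because nothing is ever evicted, a cache like this only suits a bounded
set of paths, such as the sysfs files of the PMUs on one machine, and it
will not notice files that appear or disappear later.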

Also convert some stat() calls to the slightly cheaper access().

Resulting in:

  0.18    0.004166           2      1851       305 open
  0.08    0.001970           2       829       622 access

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---

v2: Use single lookup function as API (Jiri)
---
 tools/perf/util/Build     |  1 +
 tools/perf/util/fncache.c | 63 +++++++++++++++++++++++++++++++++++++++
 tools/perf/util/fncache.h |  7 +++++
 tools/perf/util/pmu.c     | 34 +++++++--------------
 tools/perf/util/srccode.c |  9 +-----
 5 files changed, 83 insertions(+), 31 deletions(-)
 create mode 100644 tools/perf/util/fncache.c
 create mode 100644 tools/perf/util/fncache.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 39814b1806a6..2c1504fe924c 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -48,6 +48,7 @@ perf-y += header.o
 perf-y += callchain.o
 perf-y += values.o
 perf-y += debug.o
+perf-y += fncache.o
 perf-y += machine.o
 perf-y += map.o
 perf-y += pstack.o
diff --git a/tools/perf/util/fncache.c b/tools/perf/util/fncache.c
new file mode 100644
index 000000000000..5afcd7edbe7a
--- /dev/null
+++ b/tools/perf/util/fncache.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Manage a cache of file names' existence */
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <linux/list.h>
+#include "fncache.h"
+
+struct fncache {
+	struct hlist_node nd;
+	bool res;
+	char name[];
+};
+
+#define FNHSIZE 61
+
+static struct hlist_head fncache_hash[FNHSIZE];
+
+unsigned shash(const unsigned char *s)
+{
+	unsigned h = 0;
+	while (*s)
+		h = 65599 * h + *s++;
+	return h ^ (h >> 16);
+}
+
+static bool lookup_fncache(const char *name, bool *res)
+{
+	int h = shash((const unsigned char *)name) % FNHSIZE;
+	struct fncache *n;
+
+	hlist_for_each_entry (n, &fncache_hash[h], nd) {
+		if (!strcmp(n->name, name)) {
+			*res = n->res;
+			return true;
+		}
+	}
+	return false;
+}
+
+static void update_fncache(const char *name, bool res)
+{
+	struct fncache *n = malloc(sizeof(struct fncache) + strlen(name) + 1);
+	int h = shash((const unsigned char *)name) % FNHSIZE;
+
+	if (!n)
+		return;
+	strcpy(n->name, name);
+	n->res = res;
+	hlist_add_head(&n->nd, &fncache_hash[h]);
+}
+
+/* No LRU, only use when bounded in some other way. */
+bool file_available(const char *name)
+{
+	bool res;
+
+	if (lookup_fncache(name, &res))
+		return res;
+	res = access(name, R_OK) == 0;
+	update_fncache(name, res);
+	return res;
+}
diff --git a/tools/perf/util/fncache.h b/tools/perf/util/fncache.h
new file mode 100644
index 000000000000..fe020beaefb1
--- /dev/null
+++ b/tools/perf/util/fncache.h
@@ -0,0 +1,7 @@
+#ifndef _FCACHE_H
+#define _FCACHE_H 1
+
+unsigned shash(const unsigned char *s);
+bool file_available(const char *name);
+
+#endif
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index adbe97e941dd..81357cc3d59a 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -24,6 +24,7 @@
 #include "pmu-events/pmu-events.h"
 #include "string2.h"
 #include "strbuf.h"
+#include "fncache.h"
 
 struct perf_pmu_format {
 	char *name;
@@ -82,7 +83,6 @@ int perf_pmu__format_parse(char *dir, struct list_head *head)
  */
 static int pmu_format(const char *name, struct list_head *format)
 {
-	struct stat st;
 	char path[PATH_MAX];
 	const char *sysfs = sysfs__mountpoint();
 
@@ -92,8 +92,8 @@ static int pmu_format(const char *name, struct list_head *format)
 	snprintf(path, PATH_MAX,
 		 "%s" EVENT_SOURCE_DEVICE_PATH "%s/format", sysfs, name);
 
-	if (stat(path, &st) < 0)
-		return 0;	/* no error if format does not exist */
+	if (!file_available(path))
+		return 0;
 
 	if (perf_pmu__format_parse(path, format))
 		return -1;
@@ -475,7 +475,6 @@ static int pmu_aliases_parse(char *dir, struct list_head *head)
  */
 static int pmu_aliases(const char *name, struct list_head *head)
 {
-	struct stat st;
 	char path[PATH_MAX];
 	const char *sysfs = sysfs__mountpoint();
 
@@ -485,8 +484,8 @@ static int pmu_aliases(const char *name, struct list_head *head)
 	snprintf(path, PATH_MAX,
 		 "%s/bus/event_source/devices/%s/events", sysfs, name);
 
-	if (stat(path, &st) < 0)
-		return 0;	 /* no error if 'events' does not exist */
+	if (!file_available(path))
+		return 0;
 
 	if (pmu_aliases_parse(path, head))
 		return -1;
@@ -525,7 +524,6 @@ static int pmu_alias_terms(struct perf_pmu_alias *alias,
  */
 static int pmu_type(const char *name, __u32 *type)
 {
-	struct stat st;
 	char path[PATH_MAX];
 	FILE *file;
 	int ret = 0;
@@ -537,7 +535,7 @@ static int pmu_type(const char *name, __u32 *type)
 	snprintf(path, PATH_MAX,
 		 "%s" EVENT_SOURCE_DEVICE_PATH "%s/type", sysfs, name);
 
-	if (stat(path, &st) < 0)
+	if (access(path, R_OK) < 0)
 		return -1;
 
 	file = fopen(path, "r");
@@ -628,14 +626,11 @@ static struct perf_cpu_map *pmu_cpumask(const char *name)
 static bool pmu_is_uncore(const char *name)
 {
 	char path[PATH_MAX];
-	struct perf_cpu_map *cpus;
-	const char *sysfs = sysfs__mountpoint();
+	const char *sysfs;
 
+	sysfs = sysfs__mountpoint();
 	snprintf(path, PATH_MAX, CPUS_TEMPLATE_UNCORE, sysfs, name);
-	cpus = __pmu_cpumask(path);
-	perf_cpu_map__put(cpus);
-
-	return !!cpus;
+	return file_available(path);
 }
 
 /*
@@ -645,7 +640,6 @@ static bool pmu_is_uncore(const char *name)
  */
 static int is_arm_pmu_core(const char *name)
 {
-	struct stat st;
 	char path[PATH_MAX];
 	const char *sysfs = sysfs__mountpoint();
 
@@ -655,10 +649,7 @@ static int is_arm_pmu_core(const char *name)
 	/* Look for cpu sysfs (specific to arm) */
 	scnprintf(path, PATH_MAX, "%s/bus/event_source/devices/%s/cpus",
 				sysfs, name);
-	if (stat(path, &st) == 0)
-		return 1;
-
-	return 0;
+	return file_available(path);
 }
 
 static char *perf_pmu__getcpuid(struct perf_pmu *pmu)
@@ -1528,7 +1519,6 @@ bool pmu_have_event(const char *pname, const char *name)
 
 static FILE *perf_pmu__open_file(struct perf_pmu *pmu, const char *name)
 {
-	struct stat st;
 	char path[PATH_MAX];
 	const char *sysfs;
 
@@ -1538,10 +1528,8 @@ static FILE *perf_pmu__open_file(struct perf_pmu *pmu, const char *name)
 
 	snprintf(path, PATH_MAX,
 		 "%s" EVENT_SOURCE_DEVICE_PATH "%s/%s", sysfs, pmu->name, name);
-
-	if (stat(path, &st) < 0)
+	if (!file_available(path))
 		return NULL;
-
 	return fopen(path, "r");
 }
 
diff --git a/tools/perf/util/srccode.c b/tools/perf/util/srccode.c
index d84ed8b6caaa..c29edaaca863 100644
--- a/tools/perf/util/srccode.c
+++ b/tools/perf/util/srccode.c
@@ -16,6 +16,7 @@
 #include "srccode.h"
 #include "debug.h"
 #include <internal/lib.h> // page_size
+#include "fncache.h"
 
 #define MAXSRCCACHE (32*1024*1024)
 #define MAXSRCFILES     64
@@ -36,14 +37,6 @@ static LIST_HEAD(srcfile_list);
 static long map_total_sz;
 static int num_srcfiles;
 
-static unsigned shash(unsigned char *s)
-{
-	unsigned h = 0;
-	while (*s)
-		h = 65599 * h + *s++;
-	return h ^ (h >> 16);
-}
-
 static int countlines(char *map, int maplen)
 {
 	int numl;
-- 
2.21.0



* [PATCH v3 2/7] perf affinity: Add infrastructure to save/restore affinity
  2019-10-25 18:14 Optimize perf stat for large number of events/cpus v3 Andi Kleen
  2019-10-25 18:14 ` [PATCH v3 1/7] perf pmu: Use file system cache to optimize sysfs access Andi Kleen
@ 2019-10-25 18:14 ` Andi Kleen
  2019-10-25 18:14 ` [PATCH v3 3/7] perf evsel: Add iterator to iterate over events ordered by CPU Andi Kleen
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Andi Kleen @ 2019-10-25 18:14 UTC
  To: acme; +Cc: jolsa, eranian, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

The kernel perf subsystem has to IPI to the target CPU for many
operations. On systems with many CPUs and when managing many events the
overhead can be dominated by lots of IPIs.

An alternative is to set up CPU affinity in the perf tool, then set up
all the events for that CPU, and then move on to the next CPU.

Add some affinity management infrastructure to enable such a model.
Used in follow-on patches.
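For readers unfamiliar with the pattern, here is a minimal save, pin and
restore sketch using glibc's dynamically sized CPU sets. It is an
illustration under assumptions, not the affinity.c below: the
saved_affinity names and the 4096-CPU bound (SKETCH_MAX_CPUS) are chosen
for the example only.

```c
#define _GNU_SOURCE 1
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAX_CPUS 4096	/* assumed upper bound, example only */

struct saved_affinity {
	cpu_set_t *orig;	/* mask to restore at the end */
	cpu_set_t *scratch;	/* reused single-CPU mask */
	size_t size;		/* size of both masks in bytes */
	bool changed;
};

static int saved_affinity_init(struct saved_affinity *a)
{
	assert(a != NULL);
	a->size = CPU_ALLOC_SIZE(SKETCH_MAX_CPUS);
	a->orig = CPU_ALLOC(SKETCH_MAX_CPUS);
	a->scratch = CPU_ALLOC(SKETCH_MAX_CPUS);
	a->changed = false;
	if (!a->orig || !a->scratch ||
	    sched_getaffinity(0, a->size, a->orig) < 0) {
		CPU_FREE(a->orig);
		CPU_FREE(a->scratch);
		a->orig = a->scratch = NULL;
		return -1;
	}
	return 0;
}

static void saved_affinity_pin(struct saved_affinity *a, int cpu)
{
	if (cpu < 0 || cpu >= SKETCH_MAX_CPUS)
		return;
	CPU_ZERO_S(a->size, a->scratch);
	CPU_SET_S(cpu, a->size, a->scratch);
	/* Pinning is only an optimization, so a failure is ignored. */
	if (sched_setaffinity(0, a->size, a->scratch) == 0)
		a->changed = true;
}

static void saved_affinity_restore(struct saved_affinity *a)
{
	if (a->changed)
		sched_setaffinity(0, a->size, a->orig);
	CPU_FREE(a->orig);
	CPU_FREE(a->scratch);
	a->orig = a->scratch = NULL;
}
```

The intended call order is init once, pin once per target CPU, and
restore once at the end, so the caller's original mask is back in place
before it continues.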

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---

v2: Use linux/bitmap.h functions.
---
 tools/perf/util/Build      |  1 +
 tools/perf/util/affinity.c | 72 ++++++++++++++++++++++++++++++++++++++
 tools/perf/util/affinity.h | 15 ++++++++
 3 files changed, 88 insertions(+)
 create mode 100644 tools/perf/util/affinity.c
 create mode 100644 tools/perf/util/affinity.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 2c1504fe924c..c7d4eab017e5 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -76,6 +76,7 @@ perf-y += sort.o
 perf-y += hist.o
 perf-y += util.o
 perf-y += cpumap.o
+perf-y += affinity.o
 perf-y += cputopo.o
 perf-y += cgroup.o
 perf-y += target.o
diff --git a/tools/perf/util/affinity.c b/tools/perf/util/affinity.c
new file mode 100644
index 000000000000..e197b0416f56
--- /dev/null
+++ b/tools/perf/util/affinity.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Manage affinity to optimize IPIs inside the kernel perf API. */
+#define _GNU_SOURCE 1
+#include <sched.h>
+#include <stdlib.h>
+#include <linux/bitmap.h>
+#include "perf.h"
+#include "cpumap.h"
+#include "affinity.h"
+
+static int get_cpu_set_size(void)
+{
+	int sz = cpu__max_cpu() + 8 - 1;
+	/*
+	 * sched_getaffinity doesn't like masks smaller than the kernel.
+	 * Hopefully that's big enough.
+	 */
+	if (sz < 4096)
+		sz = 4096;
+	return sz/8;
+}
+
+int affinity__setup(struct affinity *a)
+{
+	int cpu_set_size = get_cpu_set_size();
+
+	a->orig_cpus = bitmap_alloc(cpu_set_size*8);
+	if (!a->orig_cpus)
+		return -1;
+	sched_getaffinity(0, cpu_set_size, (cpu_set_t *)a->orig_cpus);
+	a->sched_cpus = bitmap_alloc(cpu_set_size*8);
+	if (!a->sched_cpus) {
+		free(a->orig_cpus);
+		return -1;
+	}
+	bitmap_zero((unsigned long *)a->sched_cpus, cpu_set_size);
+	a->changed = false;
+	return 0;
+}
+
+/*
+ * perf_event_open does an IPI internally to the target CPU.
+ * It is more efficient to change perf's affinity to the target
+ * CPU and then set up all events on that CPU, so we amortize
+ * CPU communication.
+ */
+void affinity__set(struct affinity *a, int cpu)
+{
+	int cpu_set_size = get_cpu_set_size();
+
+	if (cpu == -1)
+		return;
+	a->changed = true;
+	set_bit(cpu, a->sched_cpus);
+	/*
+	 * We ignore errors because affinity is just an optimization.
+	 * This could happen for example with isolated CPUs or cpusets.
+	 * In this case the IPIs inside the kernel's perf API still work.
+	 */
+	sched_setaffinity(0, cpu_set_size, (cpu_set_t *)a->sched_cpus);
+	clear_bit(cpu, a->sched_cpus);
+}
+
+void affinity__cleanup(struct affinity *a)
+{
+	int cpu_set_size = get_cpu_set_size();
+
+	if (a->changed)
+		sched_setaffinity(0, cpu_set_size, (cpu_set_t *)a->orig_cpus);
+	free(a->sched_cpus);
+	free(a->orig_cpus);
+}
diff --git a/tools/perf/util/affinity.h b/tools/perf/util/affinity.h
new file mode 100644
index 000000000000..008e2c3995b9
--- /dev/null
+++ b/tools/perf/util/affinity.h
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef AFFINITY_H
+#define AFFINITY_H 1
+
+struct affinity {
+	unsigned long *orig_cpus;
+	unsigned long *sched_cpus;
+	bool changed;
+};
+
+void affinity__cleanup(struct affinity *a);
+void affinity__set(struct affinity *a, int cpu);
+int affinity__setup(struct affinity *a);
+
+#endif
-- 
2.21.0



* [PATCH v3 3/7] perf evsel: Add iterator to iterate over events ordered by CPU
  2019-10-25 18:14 Optimize perf stat for large number of events/cpus v3 Andi Kleen
  2019-10-25 18:14 ` [PATCH v3 1/7] perf pmu: Use file system cache to optimize sysfs access Andi Kleen
  2019-10-25 18:14 ` [PATCH v3 2/7] perf affinity: Add infrastructure to save/restore affinity Andi Kleen
@ 2019-10-25 18:14 ` Andi Kleen
  2019-10-30 10:05   ` Jiri Olsa
  2019-10-30 10:06   ` Jiri Olsa
  2019-10-25 18:14 ` [PATCH v3 4/7] perf stat: Use affinity for closing file descriptors Andi Kleen
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 19+ messages in thread
From: Andi Kleen @ 2019-10-25 18:14 UTC
  To: acme; +Cc: jolsa, eranian, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Add some common code that is needed to iterate over all events
in CPU order. Used in follow-on patches.
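The iteration scheme can be pictured with a toy model: the CPU is the
outer loop, and each event carries a cursor into its own sorted CPU list
that advances only when the current CPU matches. The toy_event, toy_skip
and toy_visit names are hypothetical and the structures are much simpler
than the real evsel and perf_cpu_map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_event {
	const int *cpus;	/* sorted CPU numbers this event runs on */
	int nr;
	int cursor;		/* index of the next CPU not yet visited */
};

/* True if this event has nothing to do on the given CPU. */
static bool toy_skip(const struct toy_event *ev, int cpu)
{
	return ev->cursor >= ev->nr || ev->cpus[ev->cursor] != cpu;
}

/*
 * Visit every (cpu, event) pair with the CPU as the outer loop.
 * Assumes all_cpus is sorted and is a superset of every event's list.
 * Records up to max_out pairs and returns the total number visited.
 */
static int toy_visit(const int *all_cpus, int nr_cpus,
		     struct toy_event *evs, int nr_evs,
		     int *out_cpu, int *out_ev, int max_out)
{
	int n = 0;

	assert(all_cpus != NULL && evs != NULL);
	for (int e = 0; e < nr_evs; e++)
		evs[e].cursor = 0;

	for (int c = 0; c < nr_cpus; c++) {
		for (int e = 0; e < nr_evs; e++) {
			if (toy_skip(&evs[e], all_cpus[c]))
				continue;
			if (n < max_out) {
				out_cpu[n] = all_cpus[c];
				out_ev[n] = e;
			}
			n++;
			evs[e].cursor++;	/* done with this CPU */
		}
	}
	return n;
}
```

With one event on CPUs 0-3 and another on CPUs 1 and 3 only, the visit
order is (0,0), (1,0), (1,1), (2,0), (3,0), (3,1), so all work for one
CPU is grouped together.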

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---

v2: Add cpumap__for_each_cpu macro to factor out some common code
---
 tools/perf/util/cpumap.h |  8 ++++++++
 tools/perf/util/evlist.c | 33 +++++++++++++++++++++++++++++++++
 tools/perf/util/evlist.h |  4 ++++
 tools/perf/util/evsel.h  |  1 +
 4 files changed, 46 insertions(+)

diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
index 2553bef1279d..a9b13d72fd29 100644
--- a/tools/perf/util/cpumap.h
+++ b/tools/perf/util/cpumap.h
@@ -60,4 +60,12 @@ int cpu_map__build_map(struct perf_cpu_map *cpus, struct perf_cpu_map **res,
 
 int cpu_map__cpu(struct perf_cpu_map *cpus, int idx);
 bool cpu_map__has(struct perf_cpu_map *cpus, int cpu);
+
+#define __cpumap__for_each_cpu(cpus, index, cpu, maxcpu)\
+	for ((index) = 0; 				\
+	     (cpu) = (index) < (maxcpu) ? (cpus)->map[index] : -1, (index) < (maxcpu); \
+	     (index)++)
+#define cpumap__for_each_cpu(cpus, index, cpu) \
+	__cpumap__for_each_cpu(cpus, index, cpu, (cpus)->nr)
+
 #endif /* __PERF_CPUMAP_H */
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index fdce590d2278..da3c8f8ef68e 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -342,6 +342,39 @@ static int perf_evlist__nr_threads(struct evlist *evlist,
 		return perf_thread_map__nr(evlist->core.threads);
 }
 
+struct perf_cpu_map *evlist__cpu_iter_start(struct evlist *evlist)
+{
+	struct perf_cpu_map *cpus;
+	struct evsel *pos;
+
+	/*
+	 * evlist->cpus is not necessarily a superset of all the
+	 * event's cpus, so compute our own super set. This
+	 * assume that there is a super set
+	 */
+	cpus = evlist->core.cpus;
+	evlist__for_each_entry(evlist, pos) {
+		pos->cpu_index = 0;
+		if (pos->core.cpus->nr > cpus->nr)
+			cpus = pos->core.cpus;
+	}
+	return cpus;
+}
+
+bool evlist__cpu_iter_skip(struct evsel *ev, int cpu)
+{
+	if (ev->cpu_index >= ev->core.cpus->nr)
+		return true;
+	if (cpu >= 0 && ev->core.cpus->map[ev->cpu_index] != cpu)
+		return true;
+	return false;
+}
+
+void evlist__cpu_iter_next(struct evsel *ev)
+{
+	ev->cpu_index++;
+}
+
 void evlist__disable(struct evlist *evlist)
 {
 	struct evsel *pos;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 13051409fd22..c1deb8ebdcea 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -336,6 +336,10 @@ void perf_evlist__to_front(struct evlist *evlist,
 void perf_evlist__set_tracking_event(struct evlist *evlist,
 				     struct evsel *tracking_evsel);
 
+struct perf_cpu_map *evlist__cpu_iter_start(struct evlist *evlist);
+bool evlist__cpu_iter_skip(struct evsel *ev, int cpu);
+void evlist__cpu_iter_next(struct evsel *ev);
+
 struct evsel *
 perf_evlist__find_evsel_by_str(struct evlist *evlist, const char *str);
 
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index ddc5ee6f6592..cf90019ae744 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -95,6 +95,7 @@ struct evsel {
 	bool			collect_stat;
 	bool			weak_group;
 	bool			percore;
+	int			cpu_index;
 	const char		*pmu_name;
 	struct {
 		perf_evsel__sb_cb_t	*cb;
-- 
2.21.0



* [PATCH v3 4/7] perf stat: Use affinity for closing file descriptors
  2019-10-25 18:14 Optimize perf stat for large number of events/cpus v3 Andi Kleen
                   ` (2 preceding siblings ...)
  2019-10-25 18:14 ` [PATCH v3 3/7] perf evsel: Add iterator to iterate over events ordered by CPU Andi Kleen
@ 2019-10-25 18:14 ` Andi Kleen
  2019-10-30 10:05   ` Jiri Olsa
  2019-10-25 18:14 ` [PATCH v3 5/7] perf stat: Use affinity for opening events Andi Kleen
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2019-10-25 18:14 UTC
  To: acme; +Cc: jolsa, eranian, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Closing a perf fd can also trigger an IPI to the target CPU.
Use the same affinity technique for closing that we use for
reading/enabling events, to reduce the CPU transitions.

Before on a large test case with 94 CPUs:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 32.56    3.085463          50     61483           close

After:

 10.54    0.735704          11     61485           close

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---

v2: Use new iterator macros
---
 tools/perf/lib/evsel.c              | 27 +++++++++++++++++++++------
 tools/perf/lib/include/perf/evsel.h |  1 +
 tools/perf/util/evlist.c            | 29 +++++++++++++++++++++++++++--
 tools/perf/util/evsel.h             |  1 +
 4 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/tools/perf/lib/evsel.c b/tools/perf/lib/evsel.c
index 5a89857b0381..ea775dacbd2d 100644
--- a/tools/perf/lib/evsel.c
+++ b/tools/perf/lib/evsel.c
@@ -114,16 +114,23 @@ int perf_evsel__open(struct perf_evsel *evsel, struct perf_cpu_map *cpus,
 	return err;
 }
 
+static void perf_evsel__close_fd_cpu(struct perf_evsel *evsel, int cpu)
+{
+	int thread;
+
+	for (thread = 0; thread < xyarray__max_y(evsel->fd); ++thread) {
+		if (FD(evsel, cpu, thread) >= 0)
+			close(FD(evsel, cpu, thread));
+		FD(evsel, cpu, thread) = -1;
+	}
+}
+
 void perf_evsel__close_fd(struct perf_evsel *evsel)
 {
-	int cpu, thread;
+	int cpu;
 
 	for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++)
-		for (thread = 0; thread < xyarray__max_y(evsel->fd); ++thread) {
-			if (FD(evsel, cpu, thread) >= 0)
-				close(FD(evsel, cpu, thread));
-			FD(evsel, cpu, thread) = -1;
-		}
+		perf_evsel__close_fd_cpu(evsel, cpu);
 }
 
 void perf_evsel__free_fd(struct perf_evsel *evsel)
@@ -141,6 +148,14 @@ void perf_evsel__close(struct perf_evsel *evsel)
 	perf_evsel__free_fd(evsel);
 }
 
+void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu)
+{
+	if (evsel->fd == NULL)
+		return;
+
+	perf_evsel__close_fd_cpu(evsel, cpu);
+}
+
 int perf_evsel__read_size(struct perf_evsel *evsel)
 {
 	u64 read_format = evsel->attr.read_format;
diff --git a/tools/perf/lib/include/perf/evsel.h b/tools/perf/lib/include/perf/evsel.h
index 557f5815a9c9..e7add554f861 100644
--- a/tools/perf/lib/include/perf/evsel.h
+++ b/tools/perf/lib/include/perf/evsel.h
@@ -26,6 +26,7 @@ LIBPERF_API void perf_evsel__delete(struct perf_evsel *evsel);
 LIBPERF_API int perf_evsel__open(struct perf_evsel *evsel, struct perf_cpu_map *cpus,
 				 struct perf_thread_map *threads);
 LIBPERF_API void perf_evsel__close(struct perf_evsel *evsel);
+LIBPERF_API void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu);
 LIBPERF_API int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
 				 struct perf_counts_values *count);
 LIBPERF_API int perf_evsel__enable(struct perf_evsel *evsel);
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index da3c8f8ef68e..aeb82de36043 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -18,6 +18,7 @@
 #include "debug.h"
 #include "units.h"
 #include <internal/lib.h> // page_size
+#include "affinity.h"
 #include "../perf.h"
 #include "asm/bug.h"
 #include "bpf-event.h"
@@ -1170,9 +1171,33 @@ void perf_evlist__set_selected(struct evlist *evlist,
 void evlist__close(struct evlist *evlist)
 {
 	struct evsel *evsel;
+	struct affinity affinity;
+	struct perf_cpu_map *cpus;
+	int i, cpu;
+
+	if (!evlist->core.cpus) {
+		evlist__for_each_entry_reverse(evlist, evsel)
+			evsel__close(evsel);
+		return;
+	}
 
-	evlist__for_each_entry_reverse(evlist, evsel)
-		evsel__close(evsel);
+	if (affinity__setup(&affinity) < 0)
+		return;
+	cpus = evlist__cpu_iter_start(evlist);
+	cpumap__for_each_cpu (cpus, i, cpu) {
+		affinity__set(&affinity, cpu);
+
+		evlist__for_each_entry_reverse(evlist, evsel) {
+			if (evlist__cpu_iter_skip(evsel, cpu))
+			    continue;
+			perf_evsel__close_cpu(&evsel->core, evsel->cpu_index);
+			evlist__cpu_iter_next(evsel);
+		}
+	}
+	evlist__for_each_entry_reverse(evlist, evsel) {
+		perf_evsel__free_fd(&evsel->core);
+		perf_evsel__free_id(&evsel->core);
+	}
 }
 
 static int perf_evlist__create_syswide_maps(struct evlist *evlist)
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index cf90019ae744..2e3b011ed09e 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -391,4 +391,5 @@ static inline bool evsel__has_callchain(const struct evsel *evsel)
 struct perf_env *perf_evsel__env(struct evsel *evsel);
 
 int perf_evsel__store_ids(struct evsel *evsel, struct evlist *evlist);
+
 #endif /* __PERF_EVSEL_H */
-- 
2.21.0



* [PATCH v3 5/7] perf stat: Use affinity for opening events
  2019-10-25 18:14 Optimize perf stat for large number of events/cpus v3 Andi Kleen
                   ` (3 preceding siblings ...)
  2019-10-25 18:14 ` [PATCH v3 4/7] perf stat: Use affinity for closing file descriptors Andi Kleen
@ 2019-10-25 18:14 ` Andi Kleen
  2019-10-30 10:06   ` Jiri Olsa
  2019-10-25 18:14 ` [PATCH v3 6/7] perf stat: Use affinity for reading Andi Kleen
  2019-10-25 18:14 ` [PATCH v3 7/7] perf stat: Use affinity for enabling/disabling events Andi Kleen
  6 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2019-10-25 18:14 UTC
  To: acme; +Cc: jolsa, eranian, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Restructure the event opening in perf stat to cycle through
the events by CPU after setting affinity to that CPU.
This eliminates IPI overhead in the perf API.

We have to loop through the CPUs in the outer builtin-stat
code instead of leaving that to low level functions.

This also changes the weak group fallback strategy slightly.
Since we cannot easily undo the opens for other CPUs,
move the weak group retry to a separate loop.

Before with a large test case with 94 CPUs:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 42.75    4.050910          67     60046       110 perf_event_open

After:

 26.86    0.944396          16     58069       110 perf_event_open

(the number changes slightly because the weak group retries
work differently and the test case relies on weak groups)

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---

v2: Use new iterator macros.
Fix bug that caused unnecessary retry for errored events.
Add extra assert to check assumption that cpumaps are always subsets
---
 tools/perf/builtin-record.c    |   2 +-
 tools/perf/builtin-stat.c      | 193 +++++++++++++++++++++++++--------
 tools/perf/tests/event-times.c |   4 +-
 tools/perf/util/evlist.c       |   6 +-
 tools/perf/util/evlist.h       |   3 +-
 tools/perf/util/evsel.c        |  18 ++-
 tools/perf/util/evsel.h        |   5 +-
 tools/perf/util/stat.c         |   5 +-
 tools/perf/util/stat.h         |   3 +-
 9 files changed, 180 insertions(+), 59 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 2fb83aabbef5..9f8a9393ce4a 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -776,7 +776,7 @@ static int record__open(struct record *rec)
 			if ((errno == EINVAL || errno == EBADF) &&
 			    pos->leader != pos &&
 			    pos->weak_group) {
-			        pos = perf_evlist__reset_weak_group(evlist, pos);
+			        pos = perf_evlist__reset_weak_group(evlist, pos, true);
 				goto try_again;
 			}
 			rc = -errno;
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index c88d4e118409..e4ad3a29adff 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -65,6 +65,7 @@
 #include "util/target.h"
 #include "util/time-utils.h"
 #include "util/top.h"
+#include "util/affinity.h"
 #include "asm/bug.h"
 
 #include <linux/time64.h>
@@ -420,6 +421,58 @@ static bool is_target_alive(struct target *_target,
 	return false;
 }
 
+enum counter_recovery {
+	COUNTER_SKIP,
+	COUNTER_RETRY,
+	COUNTER_FATAL,
+};
+
+static enum counter_recovery stat_handle_error(struct evsel *counter)
+{
+	char msg[BUFSIZ];
+	/*
+	 * PPC returns ENXIO for HW counters until 2.6.37
+	 * (behavior changed with commit b0a873e).
+	 */
+	if (errno == EINVAL || errno == ENOSYS ||
+	    errno == ENOENT || errno == EOPNOTSUPP ||
+	    errno == ENXIO) {
+		if (verbose > 0)
+			ui__warning("%s event is not supported by the kernel.\n",
+				    perf_evsel__name(counter));
+		counter->supported = false;
+		counter->errored = true;
+
+		if ((counter->leader != counter) ||
+		    !(counter->leader->core.nr_members > 1))
+			return COUNTER_SKIP;
+	} else if (perf_evsel__fallback(counter, errno, msg, sizeof(msg))) {
+		if (verbose > 0)
+			ui__warning("%s\n", msg);
+		return COUNTER_RETRY;
+	} else if (target__has_per_thread(&target) &&
+		   evsel_list->core.threads &&
+		   evsel_list->core.threads->err_thread != -1) {
+		/*
+		 * For global --per-thread case, skip current
+		 * error thread.
+		 */
+		if (!thread_map__remove(evsel_list->core.threads,
+					evsel_list->core.threads->err_thread)) {
+			evsel_list->core.threads->err_thread = -1;
+			return COUNTER_RETRY;
+		}
+	}
+
+	perf_evsel__open_strerror(counter, &target,
+				  errno, msg, sizeof(msg));
+	ui__error("%s\n", msg);
+
+	if (child_pid != -1)
+		kill(child_pid, SIGTERM);
+	return COUNTER_FATAL;
+}
+
 static int __run_perf_stat(int argc, const char **argv, int run_idx)
 {
 	int interval = stat_config.interval;
@@ -428,11 +481,15 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	char msg[BUFSIZ];
 	unsigned long long t0, t1;
 	struct evsel *counter;
+	struct perf_cpu_map *cpus;
 	struct timespec ts;
 	size_t l;
 	int status = 0;
 	const bool forks = (argc > 0);
 	bool is_pipe = STAT_RECORD ? perf_stat.data.is_pipe : false;
+	struct affinity affinity;
+	int i, cpu;
+	bool second_pass = false;
 
 	if (interval) {
 		ts.tv_sec  = interval / USEC_PER_MSEC;
@@ -457,61 +514,109 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	if (group)
 		perf_evlist__set_leader(evsel_list);
 
-	evlist__for_each_entry(evsel_list, counter) {
+	if (affinity__setup(&affinity) < 0)
+		return -1;
+
+	cpus = evlist__cpu_iter_start(evsel_list);
+	cpumap__for_each_cpu (cpus, i, cpu) {
+		affinity__set(&affinity, cpu);
+
+		evlist__for_each_entry(evsel_list, counter) {
+			if (evlist__cpu_iter_skip(counter, cpu))
+				continue;
+			if (counter->reset_group || counter->errored)
+				continue;
+			evlist__cpu_iter_next(counter);
 try_again:
-		if (create_perf_stat_counter(counter, &stat_config, &target) < 0) {
-
-			/* Weak group failed. Reset the group. */
-			if ((errno == EINVAL || errno == EBADF) &&
-			    counter->leader != counter &&
-			    counter->weak_group) {
-				counter = perf_evlist__reset_weak_group(evsel_list, counter);
-				goto try_again;
-			}
+			if (create_perf_stat_counter(counter, &stat_config, &target,
+						     counter->cpu_index - 1) < 0) {
 
-			/*
-			 * PPC returns ENXIO for HW counters until 2.6.37
-			 * (behavior changed with commit b0a873e).
-			 */
-			if (errno == EINVAL || errno == ENOSYS ||
-			    errno == ENOENT || errno == EOPNOTSUPP ||
-			    errno == ENXIO) {
-				if (verbose > 0)
-					ui__warning("%s event is not supported by the kernel.\n",
-						    perf_evsel__name(counter));
-				counter->supported = false;
-
-				if ((counter->leader != counter) ||
-				    !(counter->leader->core.nr_members > 1))
-					continue;
-			} else if (perf_evsel__fallback(counter, errno, msg, sizeof(msg))) {
-                                if (verbose > 0)
-                                        ui__warning("%s\n", msg);
-                                goto try_again;
-			} else if (target__has_per_thread(&target) &&
-				   evsel_list->core.threads &&
-				   evsel_list->core.threads->err_thread != -1) {
 				/*
-				 * For global --per-thread case, skip current
-				 * error thread.
+				 * Weak group failed. We cannot just undo this here
+				 * because earlier CPUs might be in group mode, and the kernel
+				 * doesn't support mixing group and non group reads. Defer
+				 * it to later.
+				 * Don't close here because we're in the wrong affinity.
 				 */
-				if (!thread_map__remove(evsel_list->core.threads,
-							evsel_list->core.threads->err_thread)) {
-					evsel_list->core.threads->err_thread = -1;
+				if ((errno == EINVAL || errno == EBADF) &&
+				    counter->leader != counter &&
+				    counter->weak_group) {
+					perf_evlist__reset_weak_group(evsel_list, counter, false);
+					assert(counter->reset_group);
+					second_pass = true;
+					continue;
+				}
+
+				switch (stat_handle_error(counter)) {
+				case COUNTER_FATAL:
+					return -1;
+				case COUNTER_RETRY:
 					goto try_again;
+				case COUNTER_SKIP:
+					continue;
+				default:
+					break;
 				}
+
 			}
+			counter->supported = true;
+		}
+	}
 
-			perf_evsel__open_strerror(counter, &target,
-						  errno, msg, sizeof(msg));
-			ui__error("%s\n", msg);
+	if (second_pass) {
+		/*
+		 * Now redo all the weak group after closing them,
+		 * and also close errored counters.
+		 */
 
-			if (child_pid != -1)
-				kill(child_pid, SIGTERM);
+		cpus = evlist__cpu_iter_start(evsel_list);
+		cpumap__for_each_cpu (cpus, i, cpu) {
+			affinity__set(&affinity, cpu);
+			/* First close errored or weak retry */
+			evlist__for_each_entry(evsel_list, counter) {
+				if (!counter->reset_group && !counter->errored)
+					continue;
+				if (evlist__cpu_iter_skip(counter, cpu))
+					continue;
+				perf_evsel__close_cpu(&counter->core, counter->cpu_index);
+			}
+			/* Now reopen weak */
+			evlist__for_each_entry(evsel_list, counter) {
+				if (!counter->reset_group)
+					continue;
+				if (evlist__cpu_iter_skip(counter, cpu))
+					continue;
+				evlist__cpu_iter_next(counter);
+try_again_reset:
+				pr_debug2("reopening weak %s\n", perf_evsel__name(counter));
+				if (create_perf_stat_counter(counter, &stat_config, &target,
+							     counter->cpu_index - 1) < 0) {
+
+					switch (stat_handle_error(counter)) {
+					case COUNTER_FATAL:
+						return -1;
+					case COUNTER_RETRY:
+						goto try_again_reset;
+					case COUNTER_SKIP:
+						continue;
+					default:
+						break;
+					}
+				}
+				counter->supported = true;
+			}
+		}
+	}
+	affinity__cleanup(&affinity);
 
-			return -1;
+	evlist__for_each_entry(evsel_list, counter) {
+		if (!counter->supported) {
+			perf_evsel__free_fd(&counter->core);
+			continue;
 		}
-		counter->supported = true;
+		/* Must have consumed all map indexes */
+		assert(!counter->errored &&
+			counter->cpu_index == counter->core.cpus->nr);
 
 		l = strlen(counter->unit);
 		if (l > stat_config.unit_width)
diff --git a/tools/perf/tests/event-times.c b/tools/perf/tests/event-times.c
index 1ee8704e2284..1e8a9f5c356d 100644
--- a/tools/perf/tests/event-times.c
+++ b/tools/perf/tests/event-times.c
@@ -125,7 +125,7 @@ static int attach__cpu_disabled(struct evlist *evlist)
 
 	evsel->core.attr.disabled = 1;
 
-	err = perf_evsel__open_per_cpu(evsel, cpus);
+	err = perf_evsel__open_per_cpu(evsel, cpus, -1);
 	if (err) {
 		if (err == -EACCES)
 			return TEST_SKIP;
@@ -152,7 +152,7 @@ static int attach__cpu_enabled(struct evlist *evlist)
 		return -1;
 	}
 
-	err = perf_evsel__open_per_cpu(evsel, cpus);
+	err = perf_evsel__open_per_cpu(evsel, cpus, -1);
 	if (err == -EACCES)
 		return TEST_SKIP;
 
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index aeb82de36043..ca9b06979fc0 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1635,7 +1635,8 @@ void perf_evlist__force_leader(struct evlist *evlist)
 }
 
 struct evsel *perf_evlist__reset_weak_group(struct evlist *evsel_list,
-						 struct evsel *evsel)
+						 struct evsel *evsel,
+						bool close)
 {
 	struct evsel *c2, *leader;
 	bool is_open = true;
@@ -1652,10 +1653,11 @@ struct evsel *perf_evlist__reset_weak_group(struct evlist *evsel_list,
 		if (c2 == evsel)
 			is_open = false;
 		if (c2->leader == leader) {
-			if (is_open)
+			if (is_open && close)
 				perf_evsel__close(&c2->core);
 			c2->leader = c2;
 			c2->core.nr_members = 0;
+			c2->reset_group = true;
 		}
 	}
 	return leader;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index c1deb8ebdcea..d9174d565db3 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -351,5 +351,6 @@ bool perf_evlist__exclude_kernel(struct evlist *evlist);
 void perf_evlist__force_leader(struct evlist *evlist);
 
 struct evsel *perf_evlist__reset_weak_group(struct evlist *evlist,
-						 struct evsel *evsel);
+						 struct evsel *evsel,
+						bool close);
 #endif /* __PERF_EVLIST_H */
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index d4451846af93..7106f9a067df 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1569,8 +1569,9 @@ static int perf_event_open(struct evsel *evsel,
 	return fd;
 }
 
-int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus,
-		struct perf_thread_map *threads)
+static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
+		struct perf_thread_map *threads,
+		int start_cpu, int end_cpu)
 {
 	int cpu, thread, nthreads;
 	unsigned long flags = PERF_FLAG_FD_CLOEXEC;
@@ -1647,7 +1648,7 @@ int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus,
 
 	display_attr(&evsel->core.attr);
 
-	for (cpu = 0; cpu < cpus->nr; cpu++) {
+	for (cpu = start_cpu; cpu < end_cpu; cpu++) {
 
 		for (thread = 0; thread < nthreads; thread++) {
 			int fd, group_fd;
@@ -1825,6 +1826,12 @@ int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus,
 	return err;
 }
 
+int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus,
+		struct perf_thread_map *threads)
+{
+	return evsel__open_cpu(evsel, cpus, threads, 0, cpus ? cpus->nr : 1);
+}
+
 void evsel__close(struct evsel *evsel)
 {
 	perf_evsel__close(&evsel->core);
@@ -1832,9 +1839,10 @@ void evsel__close(struct evsel *evsel)
 }
 
 int perf_evsel__open_per_cpu(struct evsel *evsel,
-			     struct perf_cpu_map *cpus)
+			     struct perf_cpu_map *cpus,
+			     int cpu)
 {
-	return evsel__open(evsel, cpus, NULL);
+	return evsel__open_cpu(evsel, cpus, NULL, cpu, cpu + 1);
 }
 
 int perf_evsel__open_per_thread(struct evsel *evsel,
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 2e3b011ed09e..d5440a928745 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -94,6 +94,8 @@ struct evsel {
 	struct evsel		*metric_leader;
 	bool			collect_stat;
 	bool			weak_group;
+	bool			reset_group;
+	bool			errored;
 	bool			percore;
 	int			cpu_index;
 	const char		*pmu_name;
@@ -223,7 +225,8 @@ int evsel__enable(struct evsel *evsel);
 int evsel__disable(struct evsel *evsel);
 
 int perf_evsel__open_per_cpu(struct evsel *evsel,
-			     struct perf_cpu_map *cpus);
+			     struct perf_cpu_map *cpus,
+			     int cpu);
 int perf_evsel__open_per_thread(struct evsel *evsel,
 				struct perf_thread_map *threads);
 int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus,
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 6822e4ffe224..3aebe732e886 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -463,7 +463,8 @@ size_t perf_event__fprintf_stat_config(union perf_event *event, FILE *fp)
 
 int create_perf_stat_counter(struct evsel *evsel,
 			     struct perf_stat_config *config,
-			     struct target *target)
+			     struct target *target,
+			     int cpu)
 {
 	struct perf_event_attr *attr = &evsel->core.attr;
 	struct evsel *leader = evsel->leader;
@@ -517,7 +518,7 @@ int create_perf_stat_counter(struct evsel *evsel,
 	}
 
 	if (target__has_cpu(target) && !target__has_per_thread(target))
-		return perf_evsel__open_per_cpu(evsel, evsel__cpus(evsel));
+		return perf_evsel__open_per_cpu(evsel, evsel__cpus(evsel), cpu);
 
 	return perf_evsel__open_per_thread(evsel, evsel->core.threads);
 }
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 081c4a5113c6..4c9a7b68c3e7 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -213,7 +213,8 @@ size_t perf_event__fprintf_stat_config(union perf_event *event, FILE *fp);
 
 int create_perf_stat_counter(struct evsel *evsel,
 			     struct perf_stat_config *config,
-			     struct target *target);
+			     struct target *target,
+			     int cpu);
 void
 perf_evlist__print_counters(struct evlist *evlist,
 			    struct perf_stat_config *config,
-- 
2.21.0
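
Note on the new cpu argument: perf_evsel__open_per_cpu() now forwards to
evsel__open_cpu() with the half-open index range [cpu, cpu + 1), while the
event-times tests pass -1. Read literally from the hunks in this mail, -1
gives the range [-1, 0), that is, one loop iteration at map index -1, unless
code outside the quoted context handles that case. The sketch below is a toy
model of that arithmetic, not perf code. The names open_range(),
open_per_cpu_as_posted() and open_per_cpu_guarded() are hypothetical, and the
guard that maps -1 to "all CPUs" is only an assumption about the intended
meaning.

```c
/*
 * Toy stand-in for the loop in evsel__open_cpu(): returns how many map
 * indexes in [start_cpu, end_cpu) would be opened, or -1 if the range
 * would touch an index outside the map.
 */
static int open_range(int nr_cpus, int start_cpu, int end_cpu)
{
	int cpu, opened = 0;

	for (cpu = start_cpu; cpu < end_cpu; cpu++) {
		if (cpu < 0 || cpu >= nr_cpus)
			return -1;	/* out-of-bounds map index */
		opened++;
	}
	return opened;
}

/* Forwarding as in the posted hunk: always [cpu, cpu + 1). */
static int open_per_cpu_as_posted(int nr_cpus, int cpu)
{
	return open_range(nr_cpus, cpu, cpu + 1);
}

/* Hypothetical guard (an assumption): treat -1 as "every CPU in the map". */
static int open_per_cpu_guarded(int nr_cpus, int cpu)
{
	if (cpu == -1)
		return open_range(nr_cpus, 0, nr_cpus);
	return open_range(nr_cpus, cpu, cpu + 1);
}
```

With a four-entry map, the as-posted variant opens exactly one index for
cpu = 2 and reports an out-of-bounds index for cpu = -1, while the guarded
variant opens all four.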


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 6/7] perf stat: Use affinity for reading
  2019-10-25 18:14 Optimize perf stat for large number of events/cpus v3 Andi Kleen
                   ` (4 preceding siblings ...)
  2019-10-25 18:14 ` [PATCH v3 5/7] perf stat: Use affinity for opening events Andi Kleen
@ 2019-10-25 18:14 ` Andi Kleen
  2019-10-25 18:14 ` [PATCH v3 7/7] perf stat: Use affinity for enabling/disabling events Andi Kleen
  6 siblings, 0 replies; 19+ messages in thread
From: Andi Kleen @ 2019-10-25 18:14 UTC (permalink / raw)
  To: acme; +Cc: jolsa, eranian, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Restructure event reading to use affinity to minimize the number
of IPIs needed.

Before on a large test case with 94 CPUs:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  3.16    0.106079           4     22082           read

After:

  3.43    0.081295           3     22082           read

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---

v2: Use new iterator macros
---
 tools/perf/builtin-stat.c | 96 ++++++++++++++++++++++-----------------
 tools/perf/util/evsel.h   |  1 +
 2 files changed, 56 insertions(+), 41 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index e4ad3a29adff..828f84b11299 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -266,15 +266,10 @@ static int read_single_counter(struct evsel *counter, int cpu,
  * Read out the results of a single counter:
  * do not aggregate counts across CPUs in system-wide mode
  */
-static int read_counter(struct evsel *counter, struct timespec *rs)
+static int read_counter(struct evsel *counter, struct timespec *rs, int cpu)
 {
 	int nthreads = perf_thread_map__nr(evsel_list->core.threads);
-	int ncpus, cpu, thread;
-
-	if (target__has_cpu(&target) && !target__has_per_thread(&target))
-		ncpus = perf_evsel__nr_cpus(counter);
-	else
-		ncpus = 1;
+	int thread;
 
 	if (!counter->supported)
 		return -ENOENT;
@@ -283,40 +278,38 @@ static int read_counter(struct evsel *counter, struct timespec *rs)
 		nthreads = 1;
 
 	for (thread = 0; thread < nthreads; thread++) {
-		for (cpu = 0; cpu < ncpus; cpu++) {
-			struct perf_counts_values *count;
-
-			count = perf_counts(counter->counts, cpu, thread);
-
-			/*
-			 * The leader's group read loads data into its group members
-			 * (via perf_evsel__read_counter) and sets threir count->loaded.
-			 */
-			if (!perf_counts__is_loaded(counter->counts, cpu, thread) &&
-			    read_single_counter(counter, cpu, thread, rs)) {
-				counter->counts->scaled = -1;
-				perf_counts(counter->counts, cpu, thread)->ena = 0;
-				perf_counts(counter->counts, cpu, thread)->run = 0;
-				return -1;
-			}
+		struct perf_counts_values *count;
 
-			perf_counts__set_loaded(counter->counts, cpu, thread, false);
+		count = perf_counts(counter->counts, cpu, thread);
 
-			if (STAT_RECORD) {
-				if (perf_evsel__write_stat_event(counter, cpu, thread, count)) {
-					pr_err("failed to write stat event\n");
-					return -1;
-				}
-			}
+		/*
+		 * The leader's group read loads data into its group members
+		 * (via perf_evsel__read_counter) and sets their count->loaded.
+		 */
+		if (!perf_counts__is_loaded(counter->counts, cpu, thread) &&
+		    read_single_counter(counter, cpu, thread, rs)) {
+			counter->counts->scaled = -1;
+			perf_counts(counter->counts, cpu, thread)->ena = 0;
+			perf_counts(counter->counts, cpu, thread)->run = 0;
+			return -1;
+		}
+
+		perf_counts__set_loaded(counter->counts, cpu, thread, false);
 
-			if (verbose > 1) {
-				fprintf(stat_config.output,
-					"%s: %d: %" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
-						perf_evsel__name(counter),
-						cpu,
-						count->val, count->ena, count->run);
+		if (STAT_RECORD) {
+			if (perf_evsel__write_stat_event(counter, cpu, thread, count)) {
+				pr_err("failed to write stat event\n");
+				return -1;
 			}
 		}
+
+		if (verbose > 1) {
+			fprintf(stat_config.output,
+				"%s: %d: %" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
+					perf_evsel__name(counter),
+					cpu,
+					count->val, count->ena, count->run);
+		}
 	}
 
 	return 0;
@@ -325,15 +318,36 @@ static int read_counter(struct evsel *counter, struct timespec *rs)
 static void read_counters(struct timespec *rs)
 {
 	struct evsel *counter;
-	int ret;
+	struct affinity affinity;
+	int i, ncpus, cpu;
+	struct perf_cpu_map *cpus;
+
+	if (affinity__setup(&affinity) < 0)
+		return;
+
+	cpus = evlist__cpu_iter_start(evsel_list);
+
+	ncpus = cpus->nr;
+	if (!(target__has_cpu(&target) && !target__has_per_thread(&target)))
+		ncpus = 1;
+	__cpumap__for_each_cpu (cpus, i, cpu, ncpus) {
+		affinity__set(&affinity, cpu);
+
+		evlist__for_each_entry(evsel_list, counter) {
+			if (evlist__cpu_iter_skip(counter, cpu))
+				continue;
+			counter->err = read_counter(counter, rs, counter->cpu_index);
+			evlist__cpu_iter_next(counter);
+		}
+	}
+	affinity__cleanup(&affinity);
 
 	evlist__for_each_entry(evsel_list, counter) {
-		ret = read_counter(counter, rs);
-		if (ret)
+		if (counter->err)
 			pr_debug("failed to read counter %s\n", counter->name);
-
-		if (ret == 0 && perf_stat_process_counter(&stat_config, counter))
+		if (counter->err == 0 && perf_stat_process_counter(&stat_config, counter))
 			pr_warning("failed to process counter %s\n", counter->name);
+		counter->err = 0;
 	}
 }
 
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index d5440a928745..9fc9f6698aa4 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -86,6 +86,7 @@ struct evsel {
 	struct list_head	config_terms;
 	struct bpf_object	*bpf_obj;
 	int			bpf_fd;
+	int			err;
 	bool			auto_merge_stats;
 	bool			merged_stat;
 	const char *		metric_expr;
-- 
2.21.0
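
The loop structure in read_counters() above (CPU as the outer loop, events as
the inner loop, and a per-event cursor that advances only when the event
actually covers the current CPU) can be modelled in isolation. This is a
sketch under assumptions: struct toy_event, toy_iter_skip() and
toy_read_all() are hypothetical stand-ins, and the skip rule shown is a guess
at what evlist__cpu_iter_skip() does, since that helper is defined in another
patch of this series.

```c
#include <stdbool.h>

#define TOY_MAX_CPUS 8

/* Toy event: a sorted list of CPUs plus a cursor into it (cf. cpu_index). */
struct toy_event {
	int nr;
	int map[TOY_MAX_CPUS];
	int cpu_index;
	int reads;
};

/* Skip unless the event's next unvisited CPU is the one we are pinned to. */
static bool toy_iter_skip(const struct toy_event *ev, int cpu)
{
	return ev->cpu_index >= ev->nr || ev->map[ev->cpu_index] != cpu;
}

/*
 * Visit every (cpu, event) pair with the CPU as the outer loop.
 * Returns the number of affinity switches, which is one per CPU
 * rather than one per (event, cpu) pair.
 */
static int toy_read_all(const int *cpus, int ncpus,
			struct toy_event *evs, int nevs)
{
	int i, j, switches = 0;

	for (j = 0; j < nevs; j++)
		evs[j].cpu_index = 0;

	for (i = 0; i < ncpus; i++) {
		switches++;	/* stands in for affinity__set(&affinity, cpu) */
		for (j = 0; j < nevs; j++) {
			if (toy_iter_skip(&evs[j], cpus[i]))
				continue;
			evs[j].reads++;		/* stands in for read_counter() */
			evs[j].cpu_index++;	/* stands in for evlist__cpu_iter_next() */
		}
	}
	return switches;
}
```

After a full pass, each event's cursor equals the size of its own CPU list.
That is the same invariant the "Must have consumed all map indexes" assert in
patch 5/7 checks.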


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3 7/7] perf stat: Use affinity for enabling/disabling events
  2019-10-25 18:14 Optimize perf stat for large number of events/cpus v3 Andi Kleen
                   ` (5 preceding siblings ...)
  2019-10-25 18:14 ` [PATCH v3 6/7] perf stat: Use affinity for reading Andi Kleen
@ 2019-10-25 18:14 ` Andi Kleen
  6 siblings, 0 replies; 19+ messages in thread
From: Andi Kleen @ 2019-10-25 18:14 UTC (permalink / raw)
  To: acme; +Cc: jolsa, eranian, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Restructure event enabling/disabling to use affinity, which
minimizes the number of IPIs needed.

Before on a large test case with 94 CPUs:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 54.65    1.899986          22     84812       660 ioctl

After:

 39.21    0.930451          10     84796       644 ioctl

Signed-off-by: Andi Kleen <ak@linux.intel.com>

---

v2: Use new iterator macros
---
 tools/perf/lib/evsel.c              | 49 +++++++++++++++++++++--------
 tools/perf/lib/include/perf/evsel.h |  2 ++
 tools/perf/util/evlist.c            | 48 +++++++++++++++++++++++++---
 tools/perf/util/evsel.c             | 13 ++++++++
 tools/perf/util/evsel.h             |  2 ++
 5 files changed, 96 insertions(+), 18 deletions(-)

diff --git a/tools/perf/lib/evsel.c b/tools/perf/lib/evsel.c
index ea775dacbd2d..89ddfade0b96 100644
--- a/tools/perf/lib/evsel.c
+++ b/tools/perf/lib/evsel.c
@@ -198,38 +198,61 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
 }
 
 static int perf_evsel__run_ioctl(struct perf_evsel *evsel,
-				 int ioc,  void *arg)
+				 int ioc,  void *arg,
+				 int cpu)
 {
-	int cpu, thread;
+	int thread;
 
-	for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++) {
-		for (thread = 0; thread < xyarray__max_y(evsel->fd); thread++) {
-			int fd = FD(evsel, cpu, thread),
-			    err = ioctl(fd, ioc, arg);
+	for (thread = 0; thread < xyarray__max_y(evsel->fd); thread++) {
+		int fd = FD(evsel, cpu, thread),
+		    err = ioctl(fd, ioc, arg);
 
-			if (err)
-				return err;
-		}
+		if (err)
+			return err;
 	}
 
 	return 0;
 }
 
+int perf_evsel__enable_cpu(struct perf_evsel *evsel, int cpu)
+{
+	return perf_evsel__run_ioctl(evsel, PERF_EVENT_IOC_ENABLE, 0, cpu);
+}
+
 int perf_evsel__enable(struct perf_evsel *evsel)
 {
-	return perf_evsel__run_ioctl(evsel, PERF_EVENT_IOC_ENABLE, 0);
+	int i;
+	int err = 0;
+
+	for (i = 0; i < evsel->cpus->nr && !err; i++)
+		err = perf_evsel__run_ioctl(evsel, PERF_EVENT_IOC_ENABLE, 0, i);
+	return err;
+}
+
+int perf_evsel__disable_cpu(struct perf_evsel *evsel, int cpu)
+{
+	return perf_evsel__run_ioctl(evsel, PERF_EVENT_IOC_DISABLE, 0, cpu);
 }
 
 int perf_evsel__disable(struct perf_evsel *evsel)
 {
-	return perf_evsel__run_ioctl(evsel, PERF_EVENT_IOC_DISABLE, 0);
+	int i;
+	int err = 0;
+
+	for (i = 0; i < evsel->cpus->nr && !err; i++)
+		err = perf_evsel__run_ioctl(evsel, PERF_EVENT_IOC_DISABLE, 0, i);
+	return err;
 }
 
 int perf_evsel__apply_filter(struct perf_evsel *evsel, const char *filter)
 {
-	return perf_evsel__run_ioctl(evsel,
+	int err = 0, i;
+
+	for (i = 0; i < evsel->cpus->nr && !err; i++)
+		err = perf_evsel__run_ioctl(evsel,
 				     PERF_EVENT_IOC_SET_FILTER,
-				     (void *)filter);
+				     (void *)filter, i);
+	return err;
 }
 
 struct perf_cpu_map *perf_evsel__cpus(struct perf_evsel *evsel)
diff --git a/tools/perf/lib/include/perf/evsel.h b/tools/perf/lib/include/perf/evsel.h
index e7add554f861..c82ec39a4ad0 100644
--- a/tools/perf/lib/include/perf/evsel.h
+++ b/tools/perf/lib/include/perf/evsel.h
@@ -30,7 +30,9 @@ LIBPERF_API void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu);
 LIBPERF_API int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
 				 struct perf_counts_values *count);
 LIBPERF_API int perf_evsel__enable(struct perf_evsel *evsel);
+LIBPERF_API int perf_evsel__enable_cpu(struct perf_evsel *evsel, int cpu);
 LIBPERF_API int perf_evsel__disable(struct perf_evsel *evsel);
+LIBPERF_API int perf_evsel__disable_cpu(struct perf_evsel *evsel, int cpu);
 LIBPERF_API struct perf_cpu_map *perf_evsel__cpus(struct perf_evsel *evsel);
 LIBPERF_API struct perf_thread_map *perf_evsel__threads(struct perf_evsel *evsel);
 LIBPERF_API struct perf_event_attr *perf_evsel__attr(struct perf_evsel *evsel);
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index ca9b06979fc0..e3e4c2fedc8a 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -379,26 +379,64 @@ void evlist__cpu_iter_next(struct evsel *ev)
 void evlist__disable(struct evlist *evlist)
 {
 	struct evsel *pos;
+	struct affinity affinity;
+	struct perf_cpu_map *cpus;
+	int i, cpu;
+
+	if (affinity__setup(&affinity) < 0)
+		return;
+
+	cpus = evlist__cpu_iter_start(evlist);
+	cpumap__for_each_cpu (cpus, i, cpu) {
+		affinity__set(&affinity, cpu);
 
+		evlist__for_each_entry(evlist, pos) {
+			if (evlist__cpu_iter_skip(pos, cpu))
+				continue;
+			if (pos->disabled || !perf_evsel__is_group_leader(pos) || !pos->core.fd)
+				continue;
+			evsel__disable_cpu(pos, pos->cpu_index);
+			evlist__cpu_iter_next(pos);
+		}
+	}
+	affinity__cleanup(&affinity);
 	evlist__for_each_entry(evlist, pos) {
-		if (pos->disabled || !perf_evsel__is_group_leader(pos) || !pos->core.fd)
+		if (!perf_evsel__is_group_leader(pos) || !pos->core.fd)
 			continue;
-		evsel__disable(pos);
+		pos->disabled = true;
 	}
-
 	evlist->enabled = false;
 }
 
 void evlist__enable(struct evlist *evlist)
 {
 	struct evsel *pos;
+	struct affinity affinity;
+	struct perf_cpu_map *cpus;
+	int i, cpu;
+
+	if (affinity__setup(&affinity) < 0)
+		return;
 
+	cpus = evlist__cpu_iter_start(evlist);
+	cpumap__for_each_cpu (cpus, i, cpu) {
+		affinity__set(&affinity, cpu);
+
+		evlist__for_each_entry(evlist, pos) {
+			if (evlist__cpu_iter_skip(pos, cpu))
+				continue;
+			if (!perf_evsel__is_group_leader(pos) || !pos->core.fd)
+				continue;
+			evsel__enable_cpu(pos, pos->cpu_index);
+			evlist__cpu_iter_next(pos);
+		}
+	}
+	affinity__cleanup(&affinity);
 	evlist__for_each_entry(evlist, pos) {
 		if (!perf_evsel__is_group_leader(pos) || !pos->core.fd)
 			continue;
-		evsel__enable(pos);
+		pos->disabled = false;
 	}
-
 	evlist->enabled = true;
 }
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 7106f9a067df..79050a6f4991 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1205,13 +1205,26 @@ int perf_evsel__append_addr_filter(struct evsel *evsel, const char *filter)
 	return perf_evsel__append_filter(evsel, "%s,%s", filter);
 }
 
+/* Caller has to clear disabled after going through all CPUs. */
+int evsel__enable_cpu(struct evsel *evsel, int cpu)
+{
+	int err = perf_evsel__enable_cpu(&evsel->core, cpu);
+	return err;
+}
+
 int evsel__enable(struct evsel *evsel)
 {
 	int err = perf_evsel__enable(&evsel->core);
 
 	if (!err)
 		evsel->disabled = false;
+	return err;
+}
 
+/* Caller has to set disabled after going through all CPUs. */
+int evsel__disable_cpu(struct evsel *evsel, int cpu)
+{
+	int err = perf_evsel__disable_cpu(&evsel->core, cpu);
 	return err;
 }
 
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 9fc9f6698aa4..15977bbe7b63 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -222,8 +222,10 @@ int perf_evsel__set_filter(struct evsel *evsel, const char *filter);
 int perf_evsel__append_tp_filter(struct evsel *evsel, const char *filter);
 int perf_evsel__append_addr_filter(struct evsel *evsel,
 				   const char *filter);
+int evsel__enable_cpu(struct evsel *evsel, int cpu);
 int evsel__enable(struct evsel *evsel);
 int evsel__disable(struct evsel *evsel);
+int evsel__disable_cpu(struct evsel *evsel, int cpu);
 
 int perf_evsel__open_per_cpu(struct evsel *evsel,
 			     struct perf_cpu_map *cpus,
-- 
2.21.0
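
The comments on evsel__enable_cpu() and evsel__disable_cpu() above say the
caller has to update the disabled flag after going through all CPUs. A
minimal model of that split is sketched here. struct toy_evsel,
toy_enable_cpu() and toy_enable() are hypothetical names, and the per-CPU
array merely stands in for the effect of the ioctl.

```c
#include <stdbool.h>

#define TOY_NR_SLOTS 4

struct toy_evsel {
	int nr_cpus;
	bool enabled_on[TOY_NR_SLOTS];	/* stands in for per-CPU kernel state */
	bool disabled;			/* summary flag for the whole event */
};

/* Per-CPU step: flips one CPU only and leaves the summary flag alone. */
static void toy_enable_cpu(struct toy_evsel *ev, int idx)
{
	ev->enabled_on[idx] = true;
}

/* Caller side: walk all CPU indexes first, then update the flag once. */
static void toy_enable(struct toy_evsel *ev)
{
	int idx;

	for (idx = 0; idx < ev->nr_cpus; idx++)
		toy_enable_cpu(ev, idx);
	ev->disabled = false;
}
```

Updating the flag only after the last CPU means it cannot report "enabled"
while some CPUs are still pending. That is one plausible reading of why the
patch moves the assignment out of the per-CPU helpers.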


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/7] perf pmu: Use file system cache to optimize sysfs access
  2019-10-25 18:14 ` [PATCH v3 1/7] perf pmu: Use file system cache to optimize sysfs access Andi Kleen
@ 2019-10-28 22:01   ` Jiri Olsa
  2019-10-29  2:14     ` Andi Kleen
  0 siblings, 1 reply; 19+ messages in thread
From: Jiri Olsa @ 2019-10-28 22:01 UTC (permalink / raw)
  To: Andi Kleen; +Cc: acme, jolsa, eranian, linux-kernel, Andi Kleen

On Fri, Oct 25, 2019 at 11:14:11AM -0700, Andi Kleen wrote:

SNIP

>  	if (pmu_aliases_parse(path, head))
>  		return -1;
> @@ -525,7 +524,6 @@ static int pmu_alias_terms(struct perf_pmu_alias *alias,
>   */
>  static int pmu_type(const char *name, __u32 *type)
>  {
> -	struct stat st;
>  	char path[PATH_MAX];
>  	FILE *file;
>  	int ret = 0;
> @@ -537,7 +535,7 @@ static int pmu_type(const char *name, __u32 *type)
>  	snprintf(path, PATH_MAX,
>  		 "%s" EVENT_SOURCE_DEVICE_PATH "%s/type", sysfs, name);
>  
> -	if (stat(path, &st) < 0)
> +	if (access(path, R_OK) < 0)

why not file_available call in here?

jirka

>  		return -1;
>  
>  	file = fopen(path, "r");
> @@ -628,14 +626,11 @@ static struct perf_cpu_map *pmu_cpumask(const char *name)
>  static bool pmu_is_uncore(const char *name)
>  {
>  	char path[PATH_MAX];
> -	struct perf_cpu_map *cpus;
> -	const char *sysfs = sysfs__mountpoint();
> +	const char *sysfs;
>  
> +	sysfs = sysfs__mountpoint();
>  	snprintf(path, PATH_MAX, CPUS_TEMPLATE_UNCORE, sysfs, name);

SNIP


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 1/7] perf pmu: Use file system cache to optimize sysfs access
  2019-10-28 22:01   ` Jiri Olsa
@ 2019-10-29  2:14     ` Andi Kleen
  0 siblings, 0 replies; 19+ messages in thread
From: Andi Kleen @ 2019-10-29  2:14 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Andi Kleen, acme, jolsa, eranian, linux-kernel

On Mon, Oct 28, 2019 at 11:01:37PM +0100, Jiri Olsa wrote:
> On Fri, Oct 25, 2019 at 11:14:11AM -0700, Andi Kleen wrote:
> 
> SNIP
> 
> >  	if (pmu_aliases_parse(path, head))
> >  		return -1;
> > @@ -525,7 +524,6 @@ static int pmu_alias_terms(struct perf_pmu_alias *alias,
> >   */
> >  static int pmu_type(const char *name, __u32 *type)
> >  {
> > -	struct stat st;
> >  	char path[PATH_MAX];
> >  	FILE *file;
> >  	int ret = 0;
> > @@ -537,7 +535,7 @@ static int pmu_type(const char *name, __u32 *type)
> >  	snprintf(path, PATH_MAX,
> >  		 "%s" EVENT_SOURCE_DEVICE_PATH "%s/type", sysfs, name);
> >  
> > -	if (stat(path, &st) < 0)
> > +	if (access(path, R_OK) < 0)
> 
> why not file_available call in here?

iirc it doesn't do any redundant accesses.

-Andi


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 3/7] perf evsel: Add iterator to iterate over events ordered by CPU
  2019-10-25 18:14 ` [PATCH v3 3/7] perf evsel: Add iterator to iterate over events ordered by CPU Andi Kleen
@ 2019-10-30 10:05   ` Jiri Olsa
  2019-10-30 10:06   ` Jiri Olsa
  1 sibling, 0 replies; 19+ messages in thread
From: Jiri Olsa @ 2019-10-30 10:05 UTC (permalink / raw)
  To: Andi Kleen; +Cc: acme, jolsa, eranian, linux-kernel, Andi Kleen

On Fri, Oct 25, 2019 at 11:14:13AM -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Add some common code that is needed to iterate over all events
> in CPU order. Used in followon patches
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> 
> ---
> 
> v2: Add cpumap__for_each_cpu macro to factor out some common code
> ---
>  tools/perf/util/cpumap.h |  8 ++++++++
>  tools/perf/util/evlist.c | 33 +++++++++++++++++++++++++++++++++
>  tools/perf/util/evlist.h |  4 ++++
>  tools/perf/util/evsel.h  |  1 +
>  4 files changed, 46 insertions(+)
> 
> diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
> index 2553bef1279d..a9b13d72fd29 100644
> --- a/tools/perf/util/cpumap.h
> +++ b/tools/perf/util/cpumap.h
> @@ -60,4 +60,12 @@ int cpu_map__build_map(struct perf_cpu_map *cpus, struct perf_cpu_map **res,
>  
>  int cpu_map__cpu(struct perf_cpu_map *cpus, int idx);
>  bool cpu_map__has(struct perf_cpu_map *cpus, int cpu);
> +
> +#define __cpumap__for_each_cpu(cpus, index, cpu, maxcpu)\
> +	for ((index) = 0; 				\
> +	     (cpu) = (index) < (maxcpu) ? (cpus)->map[index] : -1, (index) < (maxcpu); \
> +	     (index)++)
> +#define cpumap__for_each_cpu(cpus, index, cpu) \
> +	__cpumap__for_each_cpu(cpus, index, cpu, (cpus)->nr)

there's perf_cpu_map__for_each_cpu macro in libperf

jirka
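
For readers unfamiliar with the comma operator in the loop condition of the
quoted macro, here is a self-contained model. It is a sketch only: struct
toy_cpu_map and the toy_* names are hypothetical, and the macro body simply
mirrors the shape of the quoted hunk rather than the libperf macro mentioned
above.

```c
/* Toy CPU map with the two fields the macro touches. */
struct toy_cpu_map {
	int nr;
	int map[8];
};

/*
 * Same shape as the macro in the quoted hunk. The comma operator first
 * assigns cpu (or -1 once index is past the end) and then tests the bound,
 * so map[] is never read out of range.
 */
#define toy_for_each_cpu_max(cpus, index, cpu, maxcpu)			\
	for ((index) = 0;						\
	     (cpu) = (index) < (maxcpu) ? (cpus)->map[index] : -1,	\
	     (index) < (maxcpu);					\
	     (index)++)

#define toy_for_each_cpu(cpus, index, cpu) \
	toy_for_each_cpu_max(cpus, index, cpu, (cpus)->nr)

/* Sum the CPU numbers visited, limited to the first maxcpu entries. */
static int toy_sum_cpus(const struct toy_cpu_map *cpus, int maxcpu)
{
	int i, cpu, sum = 0;

	toy_for_each_cpu_max(cpus, i, cpu, maxcpu)
		sum += cpu;
	return sum;
}

/* Value left in cpu once the loop has run to completion. */
static int toy_cpu_after_loop(const struct toy_cpu_map *cpus)
{
	int i, cpu;

	toy_for_each_cpu(cpus, i, cpu) {
	}
	return cpu;
}
```

The maxcpu parameter is what read_counters() in patch 6/7 uses to clamp the
walk to a single entry when the target is not per-CPU.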


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 4/7] perf stat: Use affinity for closing file descriptors
  2019-10-25 18:14 ` [PATCH v3 4/7] perf stat: Use affinity for closing file descriptors Andi Kleen
@ 2019-10-30 10:05   ` Jiri Olsa
  2019-11-04 23:35     ` Andi Kleen
  0 siblings, 1 reply; 19+ messages in thread
From: Jiri Olsa @ 2019-10-30 10:05 UTC (permalink / raw)
  To: Andi Kleen; +Cc: acme, jolsa, eranian, linux-kernel, Andi Kleen

On Fri, Oct 25, 2019 at 11:14:14AM -0700, Andi Kleen wrote:

SNIP

> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index da3c8f8ef68e..aeb82de36043 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -18,6 +18,7 @@
>  #include "debug.h"
>  #include "units.h"
>  #include <internal/lib.h> // page_size
> +#include "affinity.h"
>  #include "../perf.h"
>  #include "asm/bug.h"
>  #include "bpf-event.h"
> @@ -1170,9 +1171,33 @@ void perf_evlist__set_selected(struct evlist *evlist,
>  void evlist__close(struct evlist *evlist)
>  {
>  	struct evsel *evsel;
> +	struct affinity affinity;
> +	struct perf_cpu_map *cpus;
> +	int i, cpu;
> +
> +	if (!evlist->core.cpus) {
> +		evlist__for_each_entry_reverse(evlist, evsel)
> +			evsel__close(evsel);
> +		return;
> +	}
>  
> -	evlist__for_each_entry_reverse(evlist, evsel)
> -		evsel__close(evsel);
> +	if (affinity__setup(&affinity) < 0)
> +		return;
> +	cpus = evlist__cpu_iter_start(evlist);
> +	cpumap__for_each_cpu (cpus, i, cpu) {
> +		affinity__set(&affinity, cpu);

what's the point of the affinity->changed flag when we call
affinity__set unconditionally? I think we can do without
it, because we'll always end up calling affinity__set

also, you're missing the affinity__cleanup call here

however, it seems superfluous to always allocate those
bitmaps, while we need just the current cpus that we
run on and also that is probably questionable

could we put 'struct affinity' to 'struct evlist'
and get rid of all affinity__setup/cleanup calls?
(apart from those in evlist__init and evlist__delete)

thanks,
jirka
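
The setup/set/cleanup lifecycle under discussion can be sketched with the
plain glibc affinity calls. This is an illustration under assumptions, not
the helper from this series: struct toy_affinity and the toy_affinity_*
functions are hypothetical, and the sketch uses a fixed-size cpu_set_t (so it
only covers systems with up to CPU_SETSIZE CPUs) rather than dynamically
sized bitmaps.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdbool.h>

/* Hypothetical miniature of an affinity helper. */
struct toy_affinity {
	cpu_set_t orig;		/* mask to restore on cleanup */
	bool changed;		/* set once the thread has been moved */
};

/* Remember the mask the thread started with. */
static int toy_affinity_setup(struct toy_affinity *a)
{
	a->changed = false;
	return sched_getaffinity(0, sizeof(a->orig), &a->orig);
}

/* Pin the calling thread to one CPU so later syscalls run there. */
static int toy_affinity_set(struct toy_affinity *a, int cpu)
{
	cpu_set_t one;

	CPU_ZERO(&one);
	CPU_SET(cpu, &one);
	if (sched_setaffinity(0, sizeof(one), &one) < 0)
		return -1;
	a->changed = true;
	return 0;
}

/* Restore the original mask, but only if it was ever changed. */
static int toy_affinity_cleanup(struct toy_affinity *a)
{
	if (!a->changed)
		return 0;
	a->changed = false;
	return sched_setaffinity(0, sizeof(a->orig), &a->orig);
}
```

In this sketch the changed flag only saves the restoring syscall when no CPU
was ever selected. If every caller sets the affinity at least once, as
argued above, the flag carries no information and could be dropped.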


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3 5/7] perf stat: Use affinity for opening events
  2019-10-25 18:14 ` [PATCH v3 5/7] perf stat: Use affinity for opening events Andi Kleen
@ 2019-10-30 10:06   ` Jiri Olsa
  0 siblings, 0 replies; 19+ messages in thread
From: Jiri Olsa @ 2019-10-30 10:06 UTC (permalink / raw)
  To: Andi Kleen; +Cc: acme, jolsa, eranian, linux-kernel, Andi Kleen

On Fri, Oct 25, 2019 at 11:14:15AM -0700, Andi Kleen wrote:

SNIP

>  
> +enum counter_recovery {
> +	COUNTER_SKIP,
> +	COUNTER_RETRY,
> +	COUNTER_FATAL,
> +};
> +
> +static enum counter_recovery stat_handle_error(struct evsel *counter)
> +{
> +	char msg[BUFSIZ];
> +	/*
> +	 * PPC returns ENXIO for HW counters until 2.6.37
> +	 * (behavior changed with commit b0a873e).
> +	 */
> +	if (errno == EINVAL || errno == ENOSYS ||
> +	    errno == ENOENT || errno == EOPNOTSUPP ||
> +	    errno == ENXIO) {
> +		if (verbose > 0)
> +			ui__warning("%s event is not supported by the kernel.\n",
> +				    perf_evsel__name(counter));
> +		counter->supported = false;
> +		counter->errored = true;
> +
> +		if ((counter->leader != counter) ||
> +		    !(counter->leader->core.nr_members > 1))
> +			return COUNTER_SKIP;
> +	} else if (perf_evsel__fallback(counter, errno, msg, sizeof(msg))) {
> +		if (verbose > 0)
> +			ui__warning("%s\n", msg);
> +		return COUNTER_RETRY;
> +	} else if (target__has_per_thread(&target) &&
> +		   evsel_list->core.threads &&
> +		   evsel_list->core.threads->err_thread != -1) {
> +		/*
> +		 * For global --per-thread case, skip current
> +		 * error thread.
> +		 */
> +		if (!thread_map__remove(evsel_list->core.threads,
> +					evsel_list->core.threads->err_thread)) {
> +			evsel_list->core.threads->err_thread = -1;
> +			return COUNTER_RETRY;
> +		}
> +	}
> +
> +	perf_evsel__open_strerror(counter, &target,
> +				  errno, msg, sizeof(msg));
> +	ui__error("%s\n", msg);
> +
> +	if (child_pid != -1)
> +		kill(child_pid, SIGTERM);
> +	return COUNTER_FATAL;
> +}

there's a lot of code movement and other refactoring mixed in with the
affinity changes, please split those out into separate patches

thanks,
jirka

> +
>  static int __run_perf_stat(int argc, const char **argv, int run_idx)
>  {
>  	int interval = stat_config.interval;
> @@ -428,11 +481,15 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>  	char msg[BUFSIZ];
>  	unsigned long long t0, t1;
>  	struct evsel *counter;
> +	struct perf_cpu_map *cpus;
>  	struct timespec ts;
>  	size_t l;
>  	int status = 0;
>  	const bool forks = (argc > 0);
>  	bool is_pipe = STAT_RECORD ? perf_stat.data.is_pipe : false;
> +	struct affinity affinity;
> +	int i, cpu;
> +	bool second_pass = false;
>  
>  	if (interval) {
>  		ts.tv_sec  = interval / USEC_PER_MSEC;
> @@ -457,61 +514,109 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>  	if (group)
>  		perf_evlist__set_leader(evsel_list);
>  
> -	evlist__for_each_entry(evsel_list, counter) {
> +	if (affinity__setup(&affinity) < 0)
> +		return -1;
> +
> +	cpus = evlist__cpu_iter_start(evsel_list);
> +	cpumap__for_each_cpu (cpus, i, cpu) {
> +		affinity__set(&affinity, cpu);
> +
> +		evlist__for_each_entry(evsel_list, counter) {
> +			if (evlist__cpu_iter_skip(counter, cpu))
> +				continue;
> +			if (counter->reset_group || counter->errored)
> +				continue;
> +			evlist__cpu_iter_next(counter);
>  try_again:
> -		if (create_perf_stat_counter(counter, &stat_config, &target) < 0) {
> -
> -			/* Weak group failed. Reset the group. */
> -			if ((errno == EINVAL || errno == EBADF) &&
> -			    counter->leader != counter &&
> -			    counter->weak_group) {
> -				counter = perf_evlist__reset_weak_group(evsel_list, counter);
> -				goto try_again;
> -			}
> +			if (create_perf_stat_counter(counter, &stat_config, &target,
> +						     counter->cpu_index - 1) < 0) {
>  
> -			/*
> -			 * PPC returns ENXIO for HW counters until 2.6.37
> -			 * (behavior changed with commit b0a873e).
> -			 */
> -			if (errno == EINVAL || errno == ENOSYS ||
> -			    errno == ENOENT || errno == EOPNOTSUPP ||
> -			    errno == ENXIO) {
> -				if (verbose > 0)
> -					ui__warning("%s event is not supported by the kernel.\n",
> -						    perf_evsel__name(counter));
> -				counter->supported = false;
> -
> -				if ((counter->leader != counter) ||
> -				    !(counter->leader->core.nr_members > 1))
> -					continue;
> -			} else if (perf_evsel__fallback(counter, errno, msg, sizeof(msg))) {
> -                                if (verbose > 0)
> -                                        ui__warning("%s\n", msg);
> -                                goto try_again;
> -			} else if (target__has_per_thread(&target) &&
> -				   evsel_list->core.threads &&
> -				   evsel_list->core.threads->err_thread != -1) {
>  				/*
> -				 * For global --per-thread case, skip current
> -				 * error thread.
> +				 * Weak group failed. We cannot just undo this here
> +				 * because earlier CPUs might be in group mode, and the kernel
> +				 * doesn't support mixing group and non group reads. Defer
> +				 * it to later.
> +				 * Don't close here because we're in the wrong affinity.
>  				 */
> -				if (!thread_map__remove(evsel_list->core.threads,
> -							evsel_list->core.threads->err_thread)) {
> -					evsel_list->core.threads->err_thread = -1;
> +				if ((errno == EINVAL || errno == EBADF) &&
> +				    counter->leader != counter &&
> +				    counter->weak_group) {
> +					perf_evlist__reset_weak_group(evsel_list, counter, false);
> +					assert(counter->reset_group);
> +					second_pass = true;
> +					continue;
> +				}
> +
> +				switch (stat_handle_error(counter)) {
> +				case COUNTER_FATAL:
> +					return -1;
> +				case COUNTER_RETRY:
>  					goto try_again;
> +				case COUNTER_SKIP:
> +					continue;
> +				default:
> +					break;
>  				}
> +
>  			}
> +			counter->supported = true;
> +		}
> +	}
>  
> -			perf_evsel__open_strerror(counter, &target,
> -						  errno, msg, sizeof(msg));
> -			ui__error("%s\n", msg);
> +	if (second_pass) {
> +		/*
> +		 * Now redo all the weak group after closing them,
> +		 * and also close errored counters.
> +		 */
>  
> -			if (child_pid != -1)
> -				kill(child_pid, SIGTERM);
> +		cpus = evlist__cpu_iter_start(evsel_list);
> +		cpumap__for_each_cpu (cpus, i, cpu) {
> +			affinity__set(&affinity, cpu);
> +			/* First close errored or weak retry */
> +			evlist__for_each_entry(evsel_list, counter) {
> +				if (!counter->reset_group && !counter->errored)
> +					continue;
> +				if (evlist__cpu_iter_skip(counter, cpu))
> +					continue;
> +				perf_evsel__close_cpu(&counter->core, counter->cpu_index);
> +			}
> +			/* Now reopen weak */
> +			evlist__for_each_entry(evsel_list, counter) {
> +				if (!counter->reset_group)
> +					continue;
> +				if (evlist__cpu_iter_skip(counter, cpu))
> +					continue;
> +				evlist__cpu_iter_next(counter);
> +try_again_reset:
> +				pr_debug2("reopening weak %s\n", perf_evsel__name(counter));
> +				if (create_perf_stat_counter(counter, &stat_config, &target,
> +							     counter->cpu_index - 1) < 0) {
> +
> +					switch (stat_handle_error(counter)) {
> +					case COUNTER_FATAL:
> +						return -1;
> +					case COUNTER_RETRY:
> +						goto try_again_reset;
> +					case COUNTER_SKIP:
> +						continue;
> +					default:
> +						break;
> +					}
> +				}
> +				counter->supported = true;
> +			}
> +		}
> +	}
> +	affinity__cleanup(&affinity);
>  
> -			return -1;
> +	evlist__for_each_entry(evsel_list, counter) {
> +		if (!counter->supported) {
> +			perf_evsel__free_fd(&counter->core);
> +			continue;
>  		}
> -		counter->supported = true;
> +		/* Must have consumed all map indexes */
> +		assert(!counter->errored &&
> +			counter->cpu_index == counter->core.cpus->nr);
>  
>  		l = strlen(counter->unit);
>  		if (l > stat_config.unit_width)
> diff --git a/tools/perf/tests/event-times.c b/tools/perf/tests/event-times.c
> index 1ee8704e2284..1e8a9f5c356d 100644
> --- a/tools/perf/tests/event-times.c
> +++ b/tools/perf/tests/event-times.c
> @@ -125,7 +125,7 @@ static int attach__cpu_disabled(struct evlist *evlist)
>  
>  	evsel->core.attr.disabled = 1;
>  
> -	err = perf_evsel__open_per_cpu(evsel, cpus);
> +	err = perf_evsel__open_per_cpu(evsel, cpus, -1);
>  	if (err) {
>  		if (err == -EACCES)
>  			return TEST_SKIP;
> @@ -152,7 +152,7 @@ static int attach__cpu_enabled(struct evlist *evlist)
>  		return -1;
>  	}
>  
> -	err = perf_evsel__open_per_cpu(evsel, cpus);
> +	err = perf_evsel__open_per_cpu(evsel, cpus, -1);
>  	if (err == -EACCES)
>  		return TEST_SKIP;
>  
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index aeb82de36043..ca9b06979fc0 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -1635,7 +1635,8 @@ void perf_evlist__force_leader(struct evlist *evlist)
>  }
>  
>  struct evsel *perf_evlist__reset_weak_group(struct evlist *evsel_list,
> -						 struct evsel *evsel)
> +						 struct evsel *evsel,
> +						bool close)
>  {
>  	struct evsel *c2, *leader;
>  	bool is_open = true;
> @@ -1652,10 +1653,11 @@ struct evsel *perf_evlist__reset_weak_group(struct evlist *evsel_list,
>  		if (c2 == evsel)
>  			is_open = false;
>  		if (c2->leader == leader) {
> -			if (is_open)
> +			if (is_open && close)
>  				perf_evsel__close(&c2->core);
>  			c2->leader = c2;
>  			c2->core.nr_members = 0;
> +			c2->reset_group = true;
>  		}
>  	}
>  	return leader;
> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> index c1deb8ebdcea..d9174d565db3 100644
> --- a/tools/perf/util/evlist.h
> +++ b/tools/perf/util/evlist.h
> @@ -351,5 +351,6 @@ bool perf_evlist__exclude_kernel(struct evlist *evlist);
>  void perf_evlist__force_leader(struct evlist *evlist);
>  
>  struct evsel *perf_evlist__reset_weak_group(struct evlist *evlist,
> -						 struct evsel *evsel);
> +						 struct evsel *evsel,
> +						bool close);
>  #endif /* __PERF_EVLIST_H */
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index d4451846af93..7106f9a067df 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -1569,8 +1569,9 @@ static int perf_event_open(struct evsel *evsel,
>  	return fd;
>  }
>  
> -int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus,
> -		struct perf_thread_map *threads)
> +static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
> +		struct perf_thread_map *threads,
> +		int start_cpu, int end_cpu)
>  {
>  	int cpu, thread, nthreads;
>  	unsigned long flags = PERF_FLAG_FD_CLOEXEC;
> @@ -1647,7 +1648,7 @@ int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus,
>  
>  	display_attr(&evsel->core.attr);
>  
> -	for (cpu = 0; cpu < cpus->nr; cpu++) {
> +	for (cpu = start_cpu; cpu < end_cpu; cpu++) {
>  
>  		for (thread = 0; thread < nthreads; thread++) {
>  			int fd, group_fd;
> @@ -1825,6 +1826,12 @@ int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus,
>  	return err;
>  }
>  
> +int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus,
> +		struct perf_thread_map *threads)
> +{
> +	return evsel__open_cpu(evsel, cpus, threads, 0, cpus ? cpus->nr : 1);
> +}
> +
>  void evsel__close(struct evsel *evsel)
>  {
>  	perf_evsel__close(&evsel->core);
> @@ -1832,9 +1839,10 @@ void evsel__close(struct evsel *evsel)
>  }
>  
>  int perf_evsel__open_per_cpu(struct evsel *evsel,
> -			     struct perf_cpu_map *cpus)
> +			     struct perf_cpu_map *cpus,
> +			     int cpu)
>  {
> -	return evsel__open(evsel, cpus, NULL);
> +	return evsel__open_cpu(evsel, cpus, NULL, cpu, cpu + 1);
>  }
>  
>  int perf_evsel__open_per_thread(struct evsel *evsel,
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index 2e3b011ed09e..d5440a928745 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -94,6 +94,8 @@ struct evsel {
>  	struct evsel		*metric_leader;
>  	bool			collect_stat;
>  	bool			weak_group;
> +	bool			reset_group;
> +	bool			errored;
>  	bool			percore;
>  	int			cpu_index;
>  	const char		*pmu_name;
> @@ -223,7 +225,8 @@ int evsel__enable(struct evsel *evsel);
>  int evsel__disable(struct evsel *evsel);
>  
>  int perf_evsel__open_per_cpu(struct evsel *evsel,
> -			     struct perf_cpu_map *cpus);
> +			     struct perf_cpu_map *cpus,
> +			     int cpu);
>  int perf_evsel__open_per_thread(struct evsel *evsel,
>  				struct perf_thread_map *threads);
>  int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus,
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 6822e4ffe224..3aebe732e886 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -463,7 +463,8 @@ size_t perf_event__fprintf_stat_config(union perf_event *event, FILE *fp)
>  
>  int create_perf_stat_counter(struct evsel *evsel,
>  			     struct perf_stat_config *config,
> -			     struct target *target)
> +			     struct target *target,
> +			     int cpu)
>  {
>  	struct perf_event_attr *attr = &evsel->core.attr;
>  	struct evsel *leader = evsel->leader;
> @@ -517,7 +518,7 @@ int create_perf_stat_counter(struct evsel *evsel,
>  	}
>  
>  	if (target__has_cpu(target) && !target__has_per_thread(target))
> -		return perf_evsel__open_per_cpu(evsel, evsel__cpus(evsel));
> +		return perf_evsel__open_per_cpu(evsel, evsel__cpus(evsel), cpu);
>  
>  	return perf_evsel__open_per_thread(evsel, evsel->core.threads);
>  }
> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> index 081c4a5113c6..4c9a7b68c3e7 100644
> --- a/tools/perf/util/stat.h
> +++ b/tools/perf/util/stat.h
> @@ -213,7 +213,8 @@ size_t perf_event__fprintf_stat_config(union perf_event *event, FILE *fp);
>  
>  int create_perf_stat_counter(struct evsel *evsel,
>  			     struct perf_stat_config *config,
> -			     struct target *target);
> +			     struct target *target,
> +			     int cpu);
>  void
>  perf_evlist__print_counters(struct evlist *evlist,
>  			    struct perf_stat_config *config,
> -- 
> 2.21.0
> 



* Re: [PATCH v3 3/7] perf evsel: Add iterator to iterate over events ordered by CPU
  2019-10-25 18:14 ` [PATCH v3 3/7] perf evsel: Add iterator to iterate over events ordered by CPU Andi Kleen
  2019-10-30 10:05   ` Jiri Olsa
@ 2019-10-30 10:06   ` Jiri Olsa
  2019-10-30 15:51     ` Andi Kleen
  1 sibling, 1 reply; 19+ messages in thread
From: Jiri Olsa @ 2019-10-30 10:06 UTC (permalink / raw)
  To: Andi Kleen; +Cc: acme, jolsa, eranian, linux-kernel, Andi Kleen

On Fri, Oct 25, 2019 at 11:14:13AM -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Add some common code that is needed to iterate over all events
> in CPU order. Used in followon patches
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> 
> ---
> 
> v2: Add cpumap__for_each_cpu macro to factor out some common code
> ---
>  tools/perf/util/cpumap.h |  8 ++++++++
>  tools/perf/util/evlist.c | 33 +++++++++++++++++++++++++++++++++
>  tools/perf/util/evlist.h |  4 ++++
>  tools/perf/util/evsel.h  |  1 +
>  4 files changed, 46 insertions(+)
> 
> diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
> index 2553bef1279d..a9b13d72fd29 100644
> --- a/tools/perf/util/cpumap.h
> +++ b/tools/perf/util/cpumap.h
> @@ -60,4 +60,12 @@ int cpu_map__build_map(struct perf_cpu_map *cpus, struct perf_cpu_map **res,
>  
>  int cpu_map__cpu(struct perf_cpu_map *cpus, int idx);
>  bool cpu_map__has(struct perf_cpu_map *cpus, int cpu);
> +
> +#define __cpumap__for_each_cpu(cpus, index, cpu, maxcpu)\
> +	for ((index) = 0; 				\
> +	     (cpu) = (index) < (maxcpu) ? (cpus)->map[index] : -1, (index) < (maxcpu); \
> +	     (index)++)
> +#define cpumap__for_each_cpu(cpus, index, cpu) \
> +	__cpumap__for_each_cpu(cpus, index, cpu, (cpus)->nr)
> +
>  #endif /* __PERF_CPUMAP_H */
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index fdce590d2278..da3c8f8ef68e 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -342,6 +342,39 @@ static int perf_evlist__nr_threads(struct evlist *evlist,
>  		return perf_thread_map__nr(evlist->core.threads);
>  }
>  
> +struct perf_cpu_map *evlist__cpu_iter_start(struct evlist *evlist)
> +{
> +	struct perf_cpu_map *cpus;
> +	struct evsel *pos;
> +
> +	/*
> +	 * evlist->cpus is not necessarily a superset of all the
> +	 * event's cpus, so compute our own super set. This
> +	 * assume that there is a super set
> +	 */
> +	cpus = evlist->core.cpus;
> +	evlist__for_each_entry(evlist, pos) {
> +		pos->cpu_index = 0;
> +		if (pos->core.cpus->nr > cpus->nr)
> +			cpus = pos->core.cpus;
> +	}
> +	return cpus;

I might not understand the reason for cpu_index, but I'd
imagine something like the below should be enough, no?

	make evlist->all_cpus that contains all events' cpus + evlist->core.cpus,
	and iterate it via:

	evlist__for_each_cpu(evlist, cpu) {
		affinity__set(&affinity, cpu);

		evlist__for_each_entry(evlist, evsel) {
			if (!cpu_map__has(perf_evsel__cpus(&evsel->core), cpu))
				continue;

			// here we have evsel with its cpu running on given cpu
		}
	}
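
A self-contained sketch of that loop shape, for illustration only. The
struct cpu_map and struct event types, map_has() and visit_by_cpu() are
hypothetical stand-ins, not the perf structures or helpers, and
evlist->all_cpus is assumed to have been built already:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for perf's cpu map and evsel; not the real structs. */
struct cpu_map {
	int nr;
	const int *map;
};

struct event {
	const char *name;
	const struct cpu_map *cpus;
	int visits;	/* how often the inner loop reached this event */
};

/* Linear membership test, similar in spirit to cpu_map__has(). */
static bool map_has(const struct cpu_map *m, int cpu)
{
	for (int i = 0; i < m->nr; i++)
		if (m->map[i] == cpu)
			return true;
	return false;
}

/*
 * Outer loop over every CPU in all_cpus, inner loop over every event
 * whose own map contains that CPU. Returns the number of (cpu, event)
 * pairs that were visited.
 */
static int visit_by_cpu(const struct cpu_map *all_cpus,
			struct event *events, size_t nr_events)
{
	int pairs = 0;

	assert(all_cpus && events);
	for (int i = 0; i < all_cpus->nr; i++) {
		int cpu = all_cpus->map[i];

		/* Real code would pin the thread to this CPU here. */
		for (size_t e = 0; e < nr_events; e++) {
			if (!map_has(events[e].cpus, cpu))
				continue;
			/* Here the event has work to do on the current CPU. */
			events[e].visits++;
			pairs++;
		}
	}
	return pairs;
}
```

The linear map_has() makes the walk O(cpus * events * map length), which
is the cost this shape pays for not keeping a per-event cursor.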

jirka



* Re: [PATCH v3 3/7] perf evsel: Add iterator to iterate over events ordered by CPU
  2019-10-30 10:06   ` Jiri Olsa
@ 2019-10-30 15:51     ` Andi Kleen
  2019-10-30 18:15       ` Jiri Olsa
  0 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2019-10-30 15:51 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Andi Kleen, acme, jolsa, eranian, linux-kernel, Andi Kleen

On Wed, Oct 30, 2019 at 11:06:06AM +0100, Jiri Olsa wrote:
> > +struct perf_cpu_map *evlist__cpu_iter_start(struct evlist *evlist)
> > +{
> > +	struct perf_cpu_map *cpus;
> > +	struct evsel *pos;
> > +
> > +	/*
> > +	 * evlist->cpus is not necessarily a superset of all the
> > +	 * event's cpus, so compute our own super set. This
> > +	 * assume that there is a super set
> > +	 */
> > +	cpus = evlist->core.cpus;
> > +	evlist__for_each_entry(evlist, pos) {
> > +		pos->cpu_index = 0;
> > +		if (pos->core.cpus->nr > cpus->nr)
> > +			cpus = pos->core.cpus;
> > +	}
> > +	return cpus;
> 
> I might not understand the reason for cpu_index, but 

This is just so that we can iterate each event's map
independently.

> imagine something like below should be enough, no?

Well it's more complicated because evlist->all_cpus doesn't exist.
The existing evlist->cpus cannot be used (I tried that).
I also don't think we have an existing function to merge
two maps, so that would need to be added to create it.
Just using ->cpu_index is a much simpler change.
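
To illustrate the per-event cursor idea, here is a minimal sketch. The
types and the iter_skip(), iter_next() and walk() helpers are
hypothetical simplifications, not the code from the patch. Note the
assumption baked into this sketch: the cursor only advances when the
outer CPU equals the event's next map entry, so both arrays must list
their CPUs in the same relative order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the perf structures. */
struct cpu_map {
	int nr;
	const int *map;
};

struct event {
	const struct cpu_map *cpus;
	int cpu_index;	/* cursor into this event's own map */
};

/* True if the event has nothing to do for this CPU at this point. */
static bool iter_skip(const struct event *ev, int cpu)
{
	if (ev->cpu_index >= ev->cpus->nr)
		return true;	/* map already fully consumed */
	return ev->cpus->map[ev->cpu_index] != cpu;
}

/* Move the cursor to the event's next map entry. */
static void iter_next(struct event *ev)
{
	ev->cpu_index++;
}

/*
 * Walk the outer CPU list once and consume the event's map as matching
 * CPUs come by. Returns how many map entries were handled.
 */
static int walk(const struct cpu_map *outer, struct event *ev)
{
	int handled = 0;

	assert(outer && ev);
	ev->cpu_index = 0;
	for (int i = 0; i < outer->nr; i++) {
		if (iter_skip(ev, outer->map[i]))
			continue;
		/* ev->cpu_index is the map index to operate on here. */
		iter_next(ev);
		handled++;
	}
	return handled;
}
```

After a full walk the cursor should equal the map length, which is the
kind of invariant the "Must have consumed all map indexes" assert in
patch 5 checks.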


-Andi


* Re: [PATCH v3 3/7] perf evsel: Add iterator to iterate over events ordered by CPU
  2019-10-30 15:51     ` Andi Kleen
@ 2019-10-30 18:15       ` Jiri Olsa
  2019-10-30 19:03         ` Andi Kleen
  0 siblings, 1 reply; 19+ messages in thread
From: Jiri Olsa @ 2019-10-30 18:15 UTC (permalink / raw)
  To: Andi Kleen; +Cc: acme, jolsa, eranian, linux-kernel, Andi Kleen

On Wed, Oct 30, 2019 at 08:51:08AM -0700, Andi Kleen wrote:
> On Wed, Oct 30, 2019 at 11:06:06AM +0100, Jiri Olsa wrote:
> > > +struct perf_cpu_map *evlist__cpu_iter_start(struct evlist *evlist)
> > > +{
> > > +	struct perf_cpu_map *cpus;
> > > +	struct evsel *pos;
> > > +
> > > +	/*
> > > +	 * evlist->cpus is not necessarily a superset of all the
> > > +	 * event's cpus, so compute our own super set. This
> > > +	 * assume that there is a super set
> > > +	 */
> > > +	cpus = evlist->core.cpus;
> > > +	evlist__for_each_entry(evlist, pos) {
> > > +		pos->cpu_index = 0;
> > > +		if (pos->core.cpus->nr > cpus->nr)
> > > +			cpus = pos->core.cpus;
> > > +	}
> > > +	return cpus;
> > 
> > I might not understand the reason for cpu_index, but 
> 
> This is just so that we can iterate each event's map
> independently.
> 
> > imagine something like below should be enough, no?
> 
> Well it's more complicated because evlist->all_cpus doesn't exist.

yes, I suggest creating it

> The exists evlist->cpus cannot be used (I tried that)
> I also don't think we have an existing function to merge
> two maps, so that would need to be added to create it.
> Just using ->cpu_index is a much simpler change.

I don't think that would be a lot of code,
and it would simplify this one

jirka



* Re: [PATCH v3 3/7] perf evsel: Add iterator to iterate over events ordered by CPU
  2019-10-30 18:15       ` Jiri Olsa
@ 2019-10-30 19:03         ` Andi Kleen
  2019-11-01  8:38           ` Jiri Olsa
  0 siblings, 1 reply; 19+ messages in thread
From: Andi Kleen @ 2019-10-30 19:03 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Andi Kleen, acme, jolsa, eranian, linux-kernel, Andi Kleen

> 
> > The exists evlist->cpus cannot be used (I tried that)
> > I also don't think we have an existing function to merge
> > two maps, so that would need to be added to create it.
> > Just using ->cpu_index is a much simpler change.
> 
> I dont think that would be lot of code
> and it would simplify this one

AFAIK they're not guaranteed to be sorted, which makes merging
complicated. I'm not sure it's safe to just sort existing maps
because someone might have an index.

So you'll need to create temporary maps, sort them and then 
merge. Won't be simple.
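
For reference, a rough sketch of what "temporary copy, sort, merge"
could look like on plain int arrays. cpu_union() and cmp_int() are
hypothetical names and the arrays stand in for the maps; this is not an
existing perf helper. The inputs are left untouched, so any index
someone holds into them stays valid.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for ints that avoids overflow from subtraction. */
static int cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	return (x > y) - (x < y);
}

/*
 * Return a malloc'ed, sorted, duplicate-free union of a[] and b[] and
 * store its length in *nr_out. Returns NULL if both inputs are empty
 * or the allocation fails. The caller frees the result.
 */
static int *cpu_union(const int *a, int na, const int *b, int nb, int *nr_out)
{
	int total = na + nb, n = 0;
	int *tmp;

	assert(nr_out && na >= 0 && nb >= 0);
	*nr_out = 0;
	if (total == 0)
		return NULL;

	tmp = malloc((size_t)total * sizeof(*tmp));
	if (!tmp)
		return NULL;

	/* Work on a temporary copy so the original maps keep their order. */
	if (na)
		memcpy(tmp, a, (size_t)na * sizeof(*tmp));
	if (nb)
		memcpy(tmp + na, b, (size_t)nb * sizeof(*tmp));
	qsort(tmp, (size_t)total, sizeof(*tmp), cmp_int);

	/* Squeeze out duplicates in place; equal values are now adjacent. */
	for (int i = 0; i < total; i++)
		if (n == 0 || tmp[n - 1] != tmp[i])
			tmp[n++] = tmp[i];

	*nr_out = n;
	return tmp;
}
```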

-Andi


* Re: [PATCH v3 3/7] perf evsel: Add iterator to iterate over events ordered by CPU
  2019-10-30 19:03         ` Andi Kleen
@ 2019-11-01  8:38           ` Jiri Olsa
  0 siblings, 0 replies; 19+ messages in thread
From: Jiri Olsa @ 2019-11-01  8:38 UTC (permalink / raw)
  To: Andi Kleen; +Cc: acme, jolsa, eranian, linux-kernel, Andi Kleen

On Wed, Oct 30, 2019 at 12:03:28PM -0700, Andi Kleen wrote:
> > 
> > > The exists evlist->cpus cannot be used (I tried that)
> > > I also don't think we have an existing function to merge
> > > two maps, so that would need to be added to create it.
> > > Just using ->cpu_index is a much simpler change.
> > 
> > I dont think that would be lot of code
> > and it would simplify this one
> 
> AFAIK they're not guaranteed to be sorted, which makes merging
> complicated. I'm not sure it's safe to just sort existing maps
> because someone might have a index.

we could add a bitmap to the maps, then combining them
would be just a matter of or-ing them
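
A minimal sketch of the bitmap idea, assuming a fixed upper bound on
CPU numbers. MAX_CPUS, struct cpu_bitmap and the bitmap_*() helpers are
made up for this example and are not perf's bitmap API:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUS	1024	/* assumed bound, chosen for the example */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS	((MAX_CPUS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per possible CPU number. */
struct cpu_bitmap {
	unsigned long bits[NR_WORDS];
};

static void bitmap_set(struct cpu_bitmap *bm, int cpu)
{
	assert(cpu >= 0 && cpu < MAX_CPUS);
	bm->bits[cpu / BITS_PER_WORD] |= 1UL << (cpu % BITS_PER_WORD);
}

static bool bitmap_test(const struct cpu_bitmap *bm, int cpu)
{
	if (cpu < 0 || cpu >= MAX_CPUS)
		return false;
	return bm->bits[cpu / BITS_PER_WORD] & (1UL << (cpu % BITS_PER_WORD));
}

/* dst |= src: the union of two CPU sets, independent of any ordering. */
static void bitmap_or(struct cpu_bitmap *dst, const struct cpu_bitmap *src)
{
	for (size_t i = 0; i < NR_WORDS; i++)
		dst->bits[i] |= src->bits[i];
}
```

Because a bitmap has no notion of element order, the union needs no
sorting step, and walking the set bits from low to high yields the CPUs
in ascending order.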

> 
> So you'll need to create temporary maps, sort them and then 
> merge. Won't be simple.

it's also not simple to read the formerly simple event close code now

jirka



* Re: [PATCH v3 4/7] perf stat: Use affinity for closing file descriptors
  2019-10-30 10:05   ` Jiri Olsa
@ 2019-11-04 23:35     ` Andi Kleen
  0 siblings, 0 replies; 19+ messages in thread
From: Andi Kleen @ 2019-11-04 23:35 UTC (permalink / raw)
  To: Jiri Olsa; +Cc: Andi Kleen, acme, jolsa, eranian, linux-kernel, Andi Kleen

> >  
> > -	evlist__for_each_entry_reverse(evlist, evsel)
> > -		evsel__close(evsel);
> > +	if (affinity__setup(&affinity) < 0)
> > +		return;
> > +	cpus = evlist__cpu_iter_start(evlist);
> > +	cpumap__for_each_cpu (cpus, i, cpu) {
> > +		affinity__set(&affinity, cpu);
> 
> whats the point of affinity->changed flags when we call
> affinity__set unconditionaly? I think we can do without
> it, becase we'll always endup calling affinity__set

No, we don't in the per-thread case (without -a).

In this case affinity is never set because there is no cpu.

I added it just to make the strace look nicer in this case.

> 
> however, it seems superfluous to always allocate those
> bitmaps, while we need just the current cpus that we
> run on and also that is probably questionable
> 
> could we put 'struct affinity' to 'struct evlist'
> and get rid of all affinity__setup/cleanup calls?
> (apart from those in evlist__init and evlist__delete)

affinity setup/cleanup is essentially push/pop for the affinity
state. For setup, while it could in theory be moved,
it would be a bad API because if someone sets up an evlist
inside an affinity region it would save the wrong state.

For cleanup there's nothing that would call it to reset
the affinity.

They could be made global, but that's somewhat ugly
and might also break with threading.
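
To make the push/pop pattern concrete, here is a simplified sketch on
top of sched_getaffinity()/sched_setaffinity(). struct saved_affinity
and the affinity_save/pin/restore() names are hypothetical and this is
not the code from patch 2. It uses a fixed-size cpu_set_t, so it
assumes the machine has no more than CPU_SETSIZE CPUs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/* Saved state: the mask to go back to, and whether we ever left it. */
struct saved_affinity {
	cpu_set_t orig;
	bool changed;
};

/* "push": remember the calling thread's current mask. */
static int affinity_save(struct saved_affinity *a)
{
	assert(a);
	a->changed = false;
	return sched_getaffinity(0, sizeof(a->orig), &a->orig);
}

/* Pin the calling thread to one CPU. cpu < 0 means "no CPU": do nothing. */
static int affinity_pin(struct saved_affinity *a, int cpu)
{
	cpu_set_t one;

	assert(a);
	if (cpu < 0)
		return 0;	/* per-thread case, no syscall issued */
	if (cpu >= CPU_SETSIZE)
		return -1;

	CPU_ZERO(&one);
	CPU_SET(cpu, &one);
	if (sched_setaffinity(0, sizeof(one), &one) < 0)
		return -1;
	a->changed = true;
	return 0;
}

/* "pop": restore the saved mask, but only if it was actually changed. */
static int affinity_restore(struct saved_affinity *a)
{
	assert(a);
	if (!a->changed)
		return 0;
	a->changed = false;
	return sched_setaffinity(0, sizeof(a->orig), &a->orig);
}
```

Keeping the saved state in a caller-owned struct rather than in a
global means nested or threaded users each restore their own mask, and
the changed flag keeps the restore syscall out of the trace when no CPU
was ever pinned.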


-Andi


Thread overview: 19+ messages
2019-10-25 18:14 Optimize perf stat for large number of events/cpus v3 Andi Kleen
2019-10-25 18:14 ` [PATCH v3 1/7] perf pmu: Use file system cache to optimize sysfs access Andi Kleen
2019-10-28 22:01   ` Jiri Olsa
2019-10-29  2:14     ` Andi Kleen
2019-10-25 18:14 ` [PATCH v3 2/7] perf affinity: Add infrastructure to save/restore affinity Andi Kleen
2019-10-25 18:14 ` [PATCH v3 3/7] perf evsel: Add iterator to iterate over events ordered by CPU Andi Kleen
2019-10-30 10:05   ` Jiri Olsa
2019-10-30 10:06   ` Jiri Olsa
2019-10-30 15:51     ` Andi Kleen
2019-10-30 18:15       ` Jiri Olsa
2019-10-30 19:03         ` Andi Kleen
2019-11-01  8:38           ` Jiri Olsa
2019-10-25 18:14 ` [PATCH v3 4/7] perf stat: Use affinity for closing file descriptors Andi Kleen
2019-10-30 10:05   ` Jiri Olsa
2019-11-04 23:35     ` Andi Kleen
2019-10-25 18:14 ` [PATCH v3 5/7] perf stat: Use affinity for opening events Andi Kleen
2019-10-30 10:06   ` Jiri Olsa
2019-10-25 18:14 ` [PATCH v3 6/7] perf stat: Use affinity for reading Andi Kleen
2019-10-25 18:14 ` [PATCH v3 7/7] perf stat: Use affinity for enabling/disabling events Andi Kleen
