linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/6] Make evlist CPUs more accurate
@ 2022-04-30  6:23 Ian Rogers
  2022-04-30  6:23 ` [PATCH v4 1/6] perf cpumap: Switch to using perf_cpu_map API Ian Rogers
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Ian Rogers @ 2022-04-30  6:23 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Adrian Hunter, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel
  Cc: Stephane Eranian, Ian Rogers

Based on the thread:
https://lore.kernel.org/linux-perf-users/CAP-5=fVMHzTfKdpWMXtbtx7t14u2f4WzNak+F0Q93cQ7CZfhbg@mail.gmail.com/

First patch is a cherry-pick to avoid a conflict of:
https://lore.kernel.org/linux-perf-users/20220414014642.3308206-2-irogers@google.com/
Second patch makes all_cpus more accurate when there are command line CPUs.
The third patch fixes perf stat metric-only output for uncore metrics.
The fourth patch makes cleans up merging of dummy CPU maps.
The fifth and sixth patch try to make user_requested_cpus and all_cpus
clearer with documentation and by renaming all_cpus.

The code no longer needs to add an intersect function and so the API
is removed and the merged API left unchangaged.

Ian Rogers (6):
  perf cpumap: Switch to using perf_cpu_map API
  perf evlist: Clear all_cpus before propagating
  perf stat: Avoid printing cpus with no counters
  perf cpumap: Handle dummy maps as empty in subset
  perf evlist: Add to user_requested_cpus documentation
  perf evlist: Rename all_cpus

 tools/lib/perf/cpumap.c                  |  4 +--
 tools/lib/perf/evlist.c                  | 14 ++++++----
 tools/lib/perf/include/internal/evlist.h |  5 ++--
 tools/perf/builtin-record.c              | 13 +++++----
 tools/perf/tests/cpumap.c                | 10 ++++++-
 tools/perf/util/bpf_counter_cgroup.c     | 35 ++++++++++++------------
 tools/perf/util/evlist.c                 |  6 ++--
 tools/perf/util/evlist.h                 |  4 +--
 tools/perf/util/stat-display.c           |  7 +++--
 9 files changed, 56 insertions(+), 42 deletions(-)

-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v4 1/6] perf cpumap: Switch to using perf_cpu_map API
  2022-04-30  6:23 [PATCH v4 0/6] Make evlist CPUs more accurate Ian Rogers
@ 2022-04-30  6:23 ` Ian Rogers
  2022-05-02 23:51   ` Namhyung Kim
  2022-04-30  6:23 ` [PATCH v4 2/6] perf evlist: Clear all_cpus before propagating Ian Rogers
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Ian Rogers @ 2022-04-30  6:23 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Adrian Hunter, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel
  Cc: Stephane Eranian, Ian Rogers

Switch some raw accesses to the cpu map to using the library API. This
can help with reference count checking.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-record.c          | 13 ++++++------
 tools/perf/util/bpf_counter_cgroup.c | 31 ++++++++++++++--------------
 2 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 069825c48d40..a5cf6a99d67f 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1011,7 +1011,7 @@ static int record__thread_data_init_maps(struct record_thread *thread_data, stru
 
 	for (m = 0, tm = 0; m < nr_mmaps && tm < thread_data->nr_mmaps; m++) {
 		if (cpu_map__is_dummy(cpus) ||
-		    test_bit(cpus->map[m].cpu, thread_data->mask->maps.bits)) {
+		    test_bit(perf_cpu_map__cpu(cpus, m).cpu, thread_data->mask->maps.bits)) {
 			if (thread_data->maps) {
 				thread_data->maps[tm] = &mmap[m];
 				pr_debug2("thread_data[%p]: cpu%d: maps[%d] -> mmap[%d]\n",
@@ -3331,13 +3331,14 @@ struct option *record_options = __record_options;
 
 static void record__mmap_cpu_mask_init(struct mmap_cpu_mask *mask, struct perf_cpu_map *cpus)
 {
-	int c;
+	struct perf_cpu cpu;
+	int idx;
 
 	if (cpu_map__is_dummy(cpus))
 		return;
 
-	for (c = 0; c < cpus->nr; c++)
-		set_bit(cpus->map[c].cpu, mask->bits);
+	perf_cpu_map__for_each_cpu(cpu, idx, cpus)
+		set_bit(cpu.cpu, mask->bits);
 }
 
 static int record__mmap_cpu_mask_init_spec(struct mmap_cpu_mask *mask, const char *mask_spec)
@@ -3404,8 +3405,8 @@ static int record__init_thread_cpu_masks(struct record *rec, struct perf_cpu_map
 	pr_debug("nr_threads: %d\n", rec->nr_threads);
 
 	for (t = 0; t < rec->nr_threads; t++) {
-		set_bit(cpus->map[t].cpu, rec->thread_masks[t].maps.bits);
-		set_bit(cpus->map[t].cpu, rec->thread_masks[t].affinity.bits);
+		set_bit(perf_cpu_map__cpu(cpus, t).cpu, rec->thread_masks[t].maps.bits);
+		set_bit(perf_cpu_map__cpu(cpus, t).cpu, rec->thread_masks[t].affinity.bits);
 		if (verbose) {
 			pr_debug("thread_masks[%d]: ", t);
 			mmap_cpu_mask__scnprintf(&rec->thread_masks[t].maps, "maps");
diff --git a/tools/perf/util/bpf_counter_cgroup.c b/tools/perf/util/bpf_counter_cgroup.c
index ac60c08e8e2a..a4b676920da0 100644
--- a/tools/perf/util/bpf_counter_cgroup.c
+++ b/tools/perf/util/bpf_counter_cgroup.c
@@ -47,7 +47,7 @@ static int bperf_load_program(struct evlist *evlist)
 	struct evsel *evsel;
 	struct cgroup *cgrp, *leader_cgrp;
 	__u32 i, cpu;
-	__u32 nr_cpus = evlist->core.all_cpus->nr;
+	__u32 nr_cpus = perf_cpu_map__nr(evlist->core.all_cpus);
 	int total_cpus = cpu__max_cpu().cpu;
 	int map_size, map_fd;
 	int prog_fd, err;
@@ -125,7 +125,7 @@ static int bperf_load_program(struct evlist *evlist)
 			for (cpu = 0; cpu < nr_cpus; cpu++) {
 				int fd = FD(evsel, cpu);
 				__u32 idx = evsel->core.idx * total_cpus +
-					evlist->core.all_cpus->map[cpu].cpu;
+					perf_cpu_map__cpu(evlist->core.all_cpus, cpu).cpu;
 
 				err = bpf_map_update_elem(map_fd, &idx, &fd,
 							  BPF_ANY);
@@ -207,13 +207,13 @@ static int bperf_cgrp__install_pe(struct evsel *evsel __maybe_unused,
  */
 static int bperf_cgrp__sync_counters(struct evlist *evlist)
 {
-	int i, cpu;
-	int nr_cpus = evlist->core.all_cpus->nr;
+	struct perf_cpu cpu;
+	int idx;
 	int prog_fd = bpf_program__fd(skel->progs.trigger_read);
 
-	for (i = 0; i < nr_cpus; i++) {
-		cpu = evlist->core.all_cpus->map[i].cpu;
-		bperf_trigger_reading(prog_fd, cpu);
+	perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
+		cpu = perf_cpu_map__cpu(evlist->core.all_cpus, i);
+		bperf_trigger_reading(prog_fd, cpu.cpu);
 	}
 
 	return 0;
@@ -244,12 +244,10 @@ static int bperf_cgrp__disable(struct evsel *evsel)
 static int bperf_cgrp__read(struct evsel *evsel)
 {
 	struct evlist *evlist = evsel->evlist;
-	int i, cpu, nr_cpus = evlist->core.all_cpus->nr;
 	int total_cpus = cpu__max_cpu().cpu;
 	struct perf_counts_values *counts;
 	struct bpf_perf_event_value *values;
 	int reading_map_fd, err = 0;
-	__u32 idx;
 
 	if (evsel->core.idx)
 		return 0;
@@ -263,7 +261,10 @@ static int bperf_cgrp__read(struct evsel *evsel)
 	reading_map_fd = bpf_map__fd(skel->maps.cgrp_readings);
 
 	evlist__for_each_entry(evlist, evsel) {
-		idx = evsel->core.idx;
+		__u32 idx = evsel->core.idx;
+		int i;
+		struct perf_cpu_map cpu;
+
 		err = bpf_map_lookup_elem(reading_map_fd, &idx, values);
 		if (err) {
 			pr_err("bpf map lookup failed: idx=%u, event=%s, cgrp=%s\n",
@@ -271,13 +272,11 @@ static int bperf_cgrp__read(struct evsel *evsel)
 			goto out;
 		}
 
-		for (i = 0; i < nr_cpus; i++) {
-			cpu = evlist->core.all_cpus->map[i].cpu;
-
+		perf_cpu_map__for_each_cpu(cpu, i, evlist->core.all_cpu) {
 			counts = perf_counts(evsel->counts, i, 0);
-			counts->val = values[cpu].counter;
-			counts->ena = values[cpu].enabled;
-			counts->run = values[cpu].running;
+			counts->val = values[cpu.cpu].counter;
+			counts->ena = values[cpu.cpu].enabled;
+			counts->run = values[cpu.cpu].running;
 		}
 	}
 
-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 2/6] perf evlist: Clear all_cpus before propagating
  2022-04-30  6:23 [PATCH v4 0/6] Make evlist CPUs more accurate Ian Rogers
  2022-04-30  6:23 ` [PATCH v4 1/6] perf cpumap: Switch to using perf_cpu_map API Ian Rogers
@ 2022-04-30  6:23 ` Ian Rogers
  2022-04-30  6:23 ` [PATCH v4 3/6] perf stat: Avoid printing cpus with no counters Ian Rogers
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Ian Rogers @ 2022-04-30  6:23 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Adrian Hunter, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel
  Cc: Stephane Eranian, Ian Rogers

all_cpus is merged into during propagation. Initially all_cpus is set
from PMU sysfs. perf_evlist__set_maps will recompute it and change
evsel->cpus to user_requested_cpus if they are given. If all_cpus isn't
cleared then the union of the user_requested_cpus and PMU sysfs values
is set to all_cpus, whereas just user_requested_cpus is necessary. To
avoid this make all_cpus empty prior to propagation.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/lib/perf/evlist.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index a09315538a30..974b4585f93e 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -59,6 +59,10 @@ static void perf_evlist__propagate_maps(struct perf_evlist *evlist)
 {
 	struct perf_evsel *evsel;
 
+	/* Recomputing all_cpus, so start with a blank slate. */
+	perf_cpu_map__put(evlist->all_cpus);
+	evlist->all_cpus = NULL;
+
 	perf_evlist__for_each_evsel(evlist, evsel)
 		__perf_evlist__propagate_maps(evlist, evsel);
 }
-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 3/6] perf stat: Avoid printing cpus with no counters
  2022-04-30  6:23 [PATCH v4 0/6] Make evlist CPUs more accurate Ian Rogers
  2022-04-30  6:23 ` [PATCH v4 1/6] perf cpumap: Switch to using perf_cpu_map API Ian Rogers
  2022-04-30  6:23 ` [PATCH v4 2/6] perf evlist: Clear all_cpus before propagating Ian Rogers
@ 2022-04-30  6:23 ` Ian Rogers
  2022-05-02 16:09   ` Adrian Hunter
  2022-04-30  6:23 ` [PATCH v4 4/6] perf cpumap: Handle dummy maps as empty in subset Ian Rogers
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Ian Rogers @ 2022-04-30  6:23 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Adrian Hunter, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel
  Cc: Stephane Eranian, Ian Rogers

perf_evlist's user_requested_cpus can contain CPUs not present in any
evsel's cpus, for example uncore counters. Avoid printing the prefix and
trailing \n until the first valid counter is encountered.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/stat-display.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index d9629a83aa78..13f705737367 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -948,8 +948,6 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
 		struct evsel *counter;
 		bool first = true;
 
-		if (prefix)
-			fputs(prefix, config->output);
 		evlist__for_each_entry(evlist, counter) {
 			u64 ena, run, val;
 			double uval;
@@ -961,6 +959,8 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
 
 			id = aggr_cpu_id__cpu(cpu, /*data=*/NULL);
 			if (first) {
+				if (prefix)
+					fputs(prefix, config->output);
 				aggr_printout(config, counter, id, 0);
 				first = false;
 			}
@@ -972,7 +972,8 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
 			printout(config, id, 0, counter, uval, prefix,
 				 run, ena, 1.0, &rt_stat);
 		}
-		fputc('\n', config->output);
+		if (!first)
+			fputc('\n', config->output);
 	}
 }
 
-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 4/6] perf cpumap: Handle dummy maps as empty in subset
  2022-04-30  6:23 [PATCH v4 0/6] Make evlist CPUs more accurate Ian Rogers
                   ` (2 preceding siblings ...)
  2022-04-30  6:23 ` [PATCH v4 3/6] perf stat: Avoid printing cpus with no counters Ian Rogers
@ 2022-04-30  6:23 ` Ian Rogers
  2022-05-02 16:13   ` Adrian Hunter
  2022-04-30  6:23 ` [PATCH v4 5/6] perf evlist: Add to user_requested_cpus documentation Ian Rogers
  2022-04-30  6:23 ` [PATCH v4 6/6] perf evlist: Rename all_cpus Ian Rogers
  5 siblings, 1 reply; 12+ messages in thread
From: Ian Rogers @ 2022-04-30  6:23 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Adrian Hunter, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel
  Cc: Stephane Eranian, Ian Rogers

perf_cpu_map__empty is true for empty and dummy maps. Make is_subset
respect that.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/lib/perf/cpumap.c   |  4 ++--
 tools/perf/tests/cpumap.c | 10 +++++++++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
index 384d5e076ee4..9c83675788c2 100644
--- a/tools/lib/perf/cpumap.c
+++ b/tools/lib/perf/cpumap.c
@@ -322,9 +322,9 @@ struct perf_cpu perf_cpu_map__max(struct perf_cpu_map *map)
 /** Is 'b' a subset of 'a'. */
 bool perf_cpu_map__is_subset(const struct perf_cpu_map *a, const struct perf_cpu_map *b)
 {
-	if (a == b || !b)
+	if (a == b || perf_cpu_map__empty(b))
 		return true;
-	if (!a || b->nr > a->nr)
+	if (perf_cpu_map__empty(a) || b->nr > a->nr)
 		return false;
 
 	for (int i = 0, j = 0; i < a->nr; i++) {
diff --git a/tools/perf/tests/cpumap.c b/tools/perf/tests/cpumap.c
index f94929ebb54b..d52b58395385 100644
--- a/tools/perf/tests/cpumap.c
+++ b/tools/perf/tests/cpumap.c
@@ -128,13 +128,21 @@ static int test__cpu_map_merge(struct test_suite *test __maybe_unused, int subte
 	struct perf_cpu_map *a = perf_cpu_map__new("4,2,1");
 	struct perf_cpu_map *b = perf_cpu_map__new("4,5,7");
 	struct perf_cpu_map *c = perf_cpu_map__merge(a, b);
+	struct perf_cpu_map *d = perf_cpu_map__dummy_new();
+	struct perf_cpu_map *e = perf_cpu_map__merge(b, d);
 	char buf[100];
 
 	TEST_ASSERT_VAL("failed to merge map: bad nr", perf_cpu_map__nr(c) == 5);
 	cpu_map__snprint(c, buf, sizeof(buf));
 	TEST_ASSERT_VAL("failed to merge map: bad result", !strcmp(buf, "1-2,4-5,7"));
-	perf_cpu_map__put(b);
+
+	TEST_ASSERT_VAL("failed to merge map: bad nr", perf_cpu_map__nr(e) == 3);
+	cpu_map__snprint(e, buf, sizeof(buf));
+	TEST_ASSERT_VAL("failed to merge map: bad result", !strcmp(buf, "4-5,7"));
+
 	perf_cpu_map__put(c);
+	perf_cpu_map__put(d);
+	perf_cpu_map__put(e);
 	return 0;
 }
 
-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 5/6] perf evlist: Add to user_requested_cpus documentation
  2022-04-30  6:23 [PATCH v4 0/6] Make evlist CPUs more accurate Ian Rogers
                   ` (3 preceding siblings ...)
  2022-04-30  6:23 ` [PATCH v4 4/6] perf cpumap: Handle dummy maps as empty in subset Ian Rogers
@ 2022-04-30  6:23 ` Ian Rogers
  2022-04-30  6:23 ` [PATCH v4 6/6] perf evlist: Rename all_cpus Ian Rogers
  5 siblings, 0 replies; 12+ messages in thread
From: Ian Rogers @ 2022-04-30  6:23 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Adrian Hunter, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel
  Cc: Stephane Eranian, Ian Rogers

Document a key use-case in propagation.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/lib/perf/include/internal/evlist.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/lib/perf/include/internal/evlist.h b/tools/lib/perf/include/internal/evlist.h
index e3e64f37db7b..74541bd87aa9 100644
--- a/tools/lib/perf/include/internal/evlist.h
+++ b/tools/lib/perf/include/internal/evlist.h
@@ -21,7 +21,8 @@ struct perf_evlist {
 	bool			 has_user_cpus;
 	/**
 	 * The cpus passed from the command line or all online CPUs by
-	 * default.
+	 * default. For evsels with no or dummy cpu maps, this cpu map replaces
+	 * their cpus during propagation.
 	 */
 	struct perf_cpu_map	*user_requested_cpus;
 	/** The union of all evsel cpu maps. */
-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 6/6] perf evlist: Rename all_cpus
  2022-04-30  6:23 [PATCH v4 0/6] Make evlist CPUs more accurate Ian Rogers
                   ` (4 preceding siblings ...)
  2022-04-30  6:23 ` [PATCH v4 5/6] perf evlist: Add to user_requested_cpus documentation Ian Rogers
@ 2022-04-30  6:23 ` Ian Rogers
  5 siblings, 0 replies; 12+ messages in thread
From: Ian Rogers @ 2022-04-30  6:23 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Adrian Hunter, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel
  Cc: Stephane Eranian, Ian Rogers

Try to make the struct variable clearer by renaming to
merged_evsel_cpus.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/lib/perf/evlist.c                  | 16 ++++++++--------
 tools/lib/perf/include/internal/evlist.h |  2 +-
 tools/perf/util/bpf_counter_cgroup.c     | 10 +++++-----
 tools/perf/util/evlist.c                 |  6 +++---
 tools/perf/util/evlist.h                 |  4 ++--
 5 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index 974b4585f93e..5840a9377494 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -52,16 +52,16 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
 
 	perf_thread_map__put(evsel->threads);
 	evsel->threads = perf_thread_map__get(evlist->threads);
-	evlist->all_cpus = perf_cpu_map__merge(evlist->all_cpus, evsel->cpus);
+	evlist->merged_evsel_cpus = perf_cpu_map__merge(evlist->merged_evsel_cpus, evsel->cpus);
 }
 
 static void perf_evlist__propagate_maps(struct perf_evlist *evlist)
 {
 	struct perf_evsel *evsel;
 
-	/* Recomputing all_cpus, so start with a blank slate. */
-	perf_cpu_map__put(evlist->all_cpus);
-	evlist->all_cpus = NULL;
+	/* Recomputing merged_evsel_cpus, so start with a blank slate. */
+	perf_cpu_map__put(evlist->merged_evsel_cpus);
+	evlist->merged_evsel_cpus = NULL;
 
 	perf_evlist__for_each_evsel(evlist, evsel)
 		__perf_evlist__propagate_maps(evlist, evsel);
@@ -128,10 +128,10 @@ static void perf_evlist__purge(struct perf_evlist *evlist)
 void perf_evlist__exit(struct perf_evlist *evlist)
 {
 	perf_cpu_map__put(evlist->user_requested_cpus);
-	perf_cpu_map__put(evlist->all_cpus);
+	perf_cpu_map__put(evlist->merged_evsel_cpus);
 	perf_thread_map__put(evlist->threads);
 	evlist->user_requested_cpus = NULL;
-	evlist->all_cpus = NULL;
+	evlist->merged_evsel_cpus = NULL;
 	evlist->threads = NULL;
 	fdarray__exit(&evlist->pollfd);
 }
@@ -169,8 +169,8 @@ void perf_evlist__set_maps(struct perf_evlist *evlist,
 		evlist->threads = perf_thread_map__get(threads);
 	}
 
-	if (!evlist->all_cpus && cpus)
-		evlist->all_cpus = perf_cpu_map__get(cpus);
+	if (!evlist->merged_evsel_cpus && cpus)
+		evlist->merged_evsel_cpus = perf_cpu_map__get(cpus);
 
 	perf_evlist__propagate_maps(evlist);
 }
diff --git a/tools/lib/perf/include/internal/evlist.h b/tools/lib/perf/include/internal/evlist.h
index 74541bd87aa9..3a0a47ba8c57 100644
--- a/tools/lib/perf/include/internal/evlist.h
+++ b/tools/lib/perf/include/internal/evlist.h
@@ -26,7 +26,7 @@ struct perf_evlist {
 	 */
 	struct perf_cpu_map	*user_requested_cpus;
 	/** The union of all evsel cpu maps. */
-	struct perf_cpu_map	*all_cpus;
+	struct perf_cpu_map	*merged_evsel_cpus;
 	struct perf_thread_map	*threads;
 	int			 nr_mmaps;
 	size_t			 mmap_len;
diff --git a/tools/perf/util/bpf_counter_cgroup.c b/tools/perf/util/bpf_counter_cgroup.c
index a4b676920da0..521f1e2b1bce 100644
--- a/tools/perf/util/bpf_counter_cgroup.c
+++ b/tools/perf/util/bpf_counter_cgroup.c
@@ -47,7 +47,7 @@ static int bperf_load_program(struct evlist *evlist)
 	struct evsel *evsel;
 	struct cgroup *cgrp, *leader_cgrp;
 	__u32 i, cpu;
-	__u32 nr_cpus = perf_cpu_map__nr(evlist->core.all_cpus);
+	__u32 nr_cpus = perf_cpu_map__nr(evlist->core.merged_evsel_cpus);
 	int total_cpus = cpu__max_cpu().cpu;
 	int map_size, map_fd;
 	int prog_fd, err;
@@ -88,7 +88,7 @@ static int bperf_load_program(struct evlist *evlist)
 	err = -1;
 
 	cgrp_switch = evsel__new(&cgrp_switch_attr);
-	if (evsel__open_per_cpu(cgrp_switch, evlist->core.all_cpus, -1) < 0) {
+	if (evsel__open_per_cpu(cgrp_switch, evlist->core.merged_evsel_cpus, -1) < 0) {
 		pr_err("Failed to open cgroup switches event\n");
 		goto out;
 	}
@@ -115,7 +115,7 @@ static int bperf_load_program(struct evlist *evlist)
 			evsel->cgrp = NULL;
 
 			/* open single copy of the events w/o cgroup */
-			err = evsel__open_per_cpu(evsel, evlist->core.all_cpus, -1);
+			err = evsel__open_per_cpu(evsel, evlist->core.merged_evsel_cpus, -1);
 			if (err) {
 				pr_err("Failed to open first cgroup events\n");
 				goto out;
@@ -125,7 +125,7 @@ static int bperf_load_program(struct evlist *evlist)
 			for (cpu = 0; cpu < nr_cpus; cpu++) {
 				int fd = FD(evsel, cpu);
 				__u32 idx = evsel->core.idx * total_cpus +
-					perf_cpu_map__cpu(evlist->core.all_cpus, cpu).cpu;
+					perf_cpu_map__cpu(evlist->core.merged_evsel_cpus, cpu).cpu;
 
 				err = bpf_map_update_elem(map_fd, &idx, &fd,
 							  BPF_ANY);
@@ -212,7 +212,7 @@ static int bperf_cgrp__sync_counters(struct evlist *evlist)
 	int prog_fd = bpf_program__fd(skel->progs.trigger_read);
 
 	perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
-		cpu = perf_cpu_map__cpu(evlist->core.all_cpus, i);
+		cpu = perf_cpu_map__cpu(evlist->core.merged_evsel_cpus, i);
 		bperf_trigger_reading(prog_fd, cpu.cpu);
 	}
 
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 52ea004ba01e..57ecd94e6f9e 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -349,7 +349,7 @@ struct evlist_cpu_iterator evlist__cpu_begin(struct evlist *evlist, struct affin
 		.evsel = NULL,
 		.cpu_map_idx = 0,
 		.evlist_cpu_map_idx = 0,
-		.evlist_cpu_map_nr = perf_cpu_map__nr(evlist->core.all_cpus),
+		.evlist_cpu_map_nr = perf_cpu_map__nr(evlist->core.merged_evsel_cpus),
 		.cpu = (struct perf_cpu){ .cpu = -1},
 		.affinity = affinity,
 	};
@@ -360,7 +360,7 @@ struct evlist_cpu_iterator evlist__cpu_begin(struct evlist *evlist, struct affin
 	} else {
 		itr.evsel = evlist__first(evlist);
 		if (itr.affinity) {
-			itr.cpu = perf_cpu_map__cpu(evlist->core.all_cpus, 0);
+			itr.cpu = perf_cpu_map__cpu(evlist->core.merged_evsel_cpus, 0);
 			affinity__set(itr.affinity, itr.cpu.cpu);
 			itr.cpu_map_idx = perf_cpu_map__idx(itr.evsel->core.cpus, itr.cpu);
 			/*
@@ -388,7 +388,7 @@ void evlist_cpu_iterator__next(struct evlist_cpu_iterator *evlist_cpu_itr)
 	if (evlist_cpu_itr->evlist_cpu_map_idx < evlist_cpu_itr->evlist_cpu_map_nr) {
 		evlist_cpu_itr->evsel = evlist__first(evlist_cpu_itr->container);
 		evlist_cpu_itr->cpu =
-			perf_cpu_map__cpu(evlist_cpu_itr->container->core.all_cpus,
+			perf_cpu_map__cpu(evlist_cpu_itr->container->core.merged_evsel_cpus,
 					  evlist_cpu_itr->evlist_cpu_map_idx);
 		if (evlist_cpu_itr->affinity)
 			affinity__set(evlist_cpu_itr->affinity, evlist_cpu_itr->cpu.cpu);
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index a21daaa5fc1b..e2e68f988d26 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -336,12 +336,12 @@ struct evlist_cpu_iterator {
 	/** The CPU map index corresponding to the evsel->core.cpus for the current CPU. */
 	int cpu_map_idx;
 	/**
-	 * The CPU map index corresponding to evlist->core.all_cpus for the
+	 * The CPU map index corresponding to evlist->core.merged_evsel_cpus for the
 	 * current CPU.  Distinct from cpu_map_idx as the evsel's cpu map may
 	 * contain fewer entries.
 	 */
 	int evlist_cpu_map_idx;
-	/** The number of CPU map entries in evlist->core.all_cpus. */
+	/** The number of CPU map entries in evlist->core.merged_evsel_cpus. */
 	int evlist_cpu_map_nr;
 	/** The current CPU of the iterator. */
 	struct perf_cpu cpu;
-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 3/6] perf stat: Avoid printing cpus with no counters
  2022-04-30  6:23 ` [PATCH v4 3/6] perf stat: Avoid printing cpus with no counters Ian Rogers
@ 2022-05-02 16:09   ` Adrian Hunter
  0 siblings, 0 replies; 12+ messages in thread
From: Adrian Hunter @ 2022-05-02 16:09 UTC (permalink / raw)
  To: Ian Rogers, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Leo Yan, John Garry, Will Deacon, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Kajol Jain, James Clark,
	German Gomez, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel
  Cc: Stephane Eranian

On 30/04/22 09:23, Ian Rogers wrote:
> perf_evlist's user_requested_cpus can contain CPUs not present in any
> evsel's cpus, for example uncore counters. Avoid printing the prefix and
> trailing \n until the first valid counter is encountered.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>

> ---
>  tools/perf/util/stat-display.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index d9629a83aa78..13f705737367 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -948,8 +948,6 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
>  		struct evsel *counter;
>  		bool first = true;
>  
> -		if (prefix)
> -			fputs(prefix, config->output);
>  		evlist__for_each_entry(evlist, counter) {
>  			u64 ena, run, val;
>  			double uval;
> @@ -961,6 +959,8 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
>  
>  			id = aggr_cpu_id__cpu(cpu, /*data=*/NULL);
>  			if (first) {
> +				if (prefix)
> +					fputs(prefix, config->output);
>  				aggr_printout(config, counter, id, 0);
>  				first = false;
>  			}
> @@ -972,7 +972,8 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
>  			printout(config, id, 0, counter, uval, prefix,
>  				 run, ena, 1.0, &rt_stat);
>  		}
> -		fputc('\n', config->output);
> +		if (!first)
> +			fputc('\n', config->output);
>  	}
>  }
>  


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 4/6] perf cpumap: Handle dummy maps as empty in subset
  2022-04-30  6:23 ` [PATCH v4 4/6] perf cpumap: Handle dummy maps as empty in subset Ian Rogers
@ 2022-05-02 16:13   ` Adrian Hunter
  2022-05-02 17:04     ` Ian Rogers
  0 siblings, 1 reply; 12+ messages in thread
From: Adrian Hunter @ 2022-05-02 16:13 UTC (permalink / raw)
  To: Ian Rogers, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Leo Yan, John Garry, Will Deacon, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Kajol Jain, James Clark,
	German Gomez, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel
  Cc: Stephane Eranian

On 30/04/22 09:23, Ian Rogers wrote:
> perf_cpu_map__empty is true for empty and dummy maps. Make is_subset
> respect that.

I think this might be the opposite of what I am trying to do, which is
enable all_cpus to represent all the "cpu" values (3rd parameter of
perf_event_open()) to iterate over including -1 so that per-thread and
per-cpu events can be mixed.

> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/lib/perf/cpumap.c   |  4 ++--
>  tools/perf/tests/cpumap.c | 10 +++++++++-
>  2 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
> index 384d5e076ee4..9c83675788c2 100644
> --- a/tools/lib/perf/cpumap.c
> +++ b/tools/lib/perf/cpumap.c
> @@ -322,9 +322,9 @@ struct perf_cpu perf_cpu_map__max(struct perf_cpu_map *map)
>  /** Is 'b' a subset of 'a'. */
>  bool perf_cpu_map__is_subset(const struct perf_cpu_map *a, const struct perf_cpu_map *b)
>  {
> -	if (a == b || !b)
> +	if (a == b || perf_cpu_map__empty(b))
>  		return true;
> -	if (!a || b->nr > a->nr)
> +	if (perf_cpu_map__empty(a) || b->nr > a->nr)
>  		return false;
>  
>  	for (int i = 0, j = 0; i < a->nr; i++) {
> diff --git a/tools/perf/tests/cpumap.c b/tools/perf/tests/cpumap.c
> index f94929ebb54b..d52b58395385 100644
> --- a/tools/perf/tests/cpumap.c
> +++ b/tools/perf/tests/cpumap.c
> @@ -128,13 +128,21 @@ static int test__cpu_map_merge(struct test_suite *test __maybe_unused, int subte
>  	struct perf_cpu_map *a = perf_cpu_map__new("4,2,1");
>  	struct perf_cpu_map *b = perf_cpu_map__new("4,5,7");
>  	struct perf_cpu_map *c = perf_cpu_map__merge(a, b);
> +	struct perf_cpu_map *d = perf_cpu_map__dummy_new();
> +	struct perf_cpu_map *e = perf_cpu_map__merge(b, d);
>  	char buf[100];
>  
>  	TEST_ASSERT_VAL("failed to merge map: bad nr", perf_cpu_map__nr(c) == 5);
>  	cpu_map__snprint(c, buf, sizeof(buf));
>  	TEST_ASSERT_VAL("failed to merge map: bad result", !strcmp(buf, "1-2,4-5,7"));
> -	perf_cpu_map__put(b);
> +
> +	TEST_ASSERT_VAL("failed to merge map: bad nr", perf_cpu_map__nr(e) == 3);
> +	cpu_map__snprint(e, buf, sizeof(buf));
> +	TEST_ASSERT_VAL("failed to merge map: bad result", !strcmp(buf, "4-5,7"));
> +
>  	perf_cpu_map__put(c);
> +	perf_cpu_map__put(d);
> +	perf_cpu_map__put(e);
>  	return 0;
>  }
>  


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 4/6] perf cpumap: Handle dummy maps as empty in subset
  2022-05-02 16:13   ` Adrian Hunter
@ 2022-05-02 17:04     ` Ian Rogers
  2022-05-03  4:51       ` Adrian Hunter
  0 siblings, 1 reply; 12+ messages in thread
From: Ian Rogers @ 2022-05-02 17:04 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel,
	Stephane Eranian

On Mon, May 2, 2022 at 9:13 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> On 30/04/22 09:23, Ian Rogers wrote:
> > perf_cpu_map__empty is true for empty and dummy maps. Make is_subset
> > respect that.
>
> I think this might be the opposite of what I am trying to do, which is
> enable all_cpus to represent all the "cpu" values (3rd parameter of
> perf_event_open()) to iterate over including -1 so that per-thread and
> per-cpu events can be mixed.

Wouldn't you iterate over the cpus of the evsel? I'm not sure using
all_cpus in that way makes sense, it also violates the definition of
empty.

Thanks,
Ian

> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/lib/perf/cpumap.c   |  4 ++--
> >  tools/perf/tests/cpumap.c | 10 +++++++++-
> >  2 files changed, 11 insertions(+), 3 deletions(-)
> >
> > diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
> > index 384d5e076ee4..9c83675788c2 100644
> > --- a/tools/lib/perf/cpumap.c
> > +++ b/tools/lib/perf/cpumap.c
> > @@ -322,9 +322,9 @@ struct perf_cpu perf_cpu_map__max(struct perf_cpu_map *map)
> >  /** Is 'b' a subset of 'a'. */
> >  bool perf_cpu_map__is_subset(const struct perf_cpu_map *a, const struct perf_cpu_map *b)
> >  {
> > -     if (a == b || !b)
> > +     if (a == b || perf_cpu_map__empty(b))
> >               return true;
> > -     if (!a || b->nr > a->nr)
> > +     if (perf_cpu_map__empty(a) || b->nr > a->nr)
> >               return false;
> >
> >       for (int i = 0, j = 0; i < a->nr; i++) {
> > diff --git a/tools/perf/tests/cpumap.c b/tools/perf/tests/cpumap.c
> > index f94929ebb54b..d52b58395385 100644
> > --- a/tools/perf/tests/cpumap.c
> > +++ b/tools/perf/tests/cpumap.c
> > @@ -128,13 +128,21 @@ static int test__cpu_map_merge(struct test_suite *test __maybe_unused, int subte
> >       struct perf_cpu_map *a = perf_cpu_map__new("4,2,1");
> >       struct perf_cpu_map *b = perf_cpu_map__new("4,5,7");
> >       struct perf_cpu_map *c = perf_cpu_map__merge(a, b);
> > +     struct perf_cpu_map *d = perf_cpu_map__dummy_new();
> > +     struct perf_cpu_map *e = perf_cpu_map__merge(b, d);
> >       char buf[100];
> >
> >       TEST_ASSERT_VAL("failed to merge map: bad nr", perf_cpu_map__nr(c) == 5);
> >       cpu_map__snprint(c, buf, sizeof(buf));
> >       TEST_ASSERT_VAL("failed to merge map: bad result", !strcmp(buf, "1-2,4-5,7"));
> > -     perf_cpu_map__put(b);
> > +
> > +     TEST_ASSERT_VAL("failed to merge map: bad nr", perf_cpu_map__nr(e) == 3);
> > +     cpu_map__snprint(e, buf, sizeof(buf));
> > +     TEST_ASSERT_VAL("failed to merge map: bad result", !strcmp(buf, "4-5,7"));
> > +
> >       perf_cpu_map__put(c);
> > +     perf_cpu_map__put(d);
> > +     perf_cpu_map__put(e);
> >       return 0;
> >  }
> >
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 1/6] perf cpumap: Switch to using perf_cpu_map API
  2022-04-30  6:23 ` [PATCH v4 1/6] perf cpumap: Switch to using perf_cpu_map API Ian Rogers
@ 2022-05-02 23:51   ` Namhyung Kim
  0 siblings, 0 replies; 12+ messages in thread
From: Namhyung Kim @ 2022-05-02 23:51 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Mathieu Poirier,
	Suzuki K Poulose, Mike Leach, Leo Yan, John Garry, Will Deacon,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Kajol Jain, James Clark, German Gomez, Adrian Hunter,
	Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel,
	Stephane Eranian

Hi Ian,

On Fri, Apr 29, 2022 at 11:23 PM Ian Rogers <irogers@google.com> wrote:
>
> Switch some raw accesses to the cpu map to using the library API. This
> can help with reference count checking.
>
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
[...]
> diff --git a/tools/perf/util/bpf_counter_cgroup.c b/tools/perf/util/bpf_counter_cgroup.c
> index ac60c08e8e2a..a4b676920da0 100644
> --- a/tools/perf/util/bpf_counter_cgroup.c
> +++ b/tools/perf/util/bpf_counter_cgroup.c
> @@ -47,7 +47,7 @@ static int bperf_load_program(struct evlist *evlist)
>         struct evsel *evsel;
>         struct cgroup *cgrp, *leader_cgrp;
>         __u32 i, cpu;
> -       __u32 nr_cpus = evlist->core.all_cpus->nr;
> +       __u32 nr_cpus = perf_cpu_map__nr(evlist->core.all_cpus);
>         int total_cpus = cpu__max_cpu().cpu;
>         int map_size, map_fd;
>         int prog_fd, err;
> @@ -125,7 +125,7 @@ static int bperf_load_program(struct evlist *evlist)
>                         for (cpu = 0; cpu < nr_cpus; cpu++) {
>                                 int fd = FD(evsel, cpu);
>                                 __u32 idx = evsel->core.idx * total_cpus +
> -                                       evlist->core.all_cpus->map[cpu].cpu;
> +                                       perf_cpu_map__cpu(evlist->core.all_cpus, cpu).cpu;
>
>                                 err = bpf_map_update_elem(map_fd, &idx, &fd,
>                                                           BPF_ANY);
> @@ -207,13 +207,13 @@ static int bperf_cgrp__install_pe(struct evsel *evsel __maybe_unused,
>   */
>  static int bperf_cgrp__sync_counters(struct evlist *evlist)
>  {
> -       int i, cpu;
> -       int nr_cpus = evlist->core.all_cpus->nr;
> +       struct perf_cpu cpu;
> +       int idx;
>         int prog_fd = bpf_program__fd(skel->progs.trigger_read);
>
> -       for (i = 0; i < nr_cpus; i++) {
> -               cpu = evlist->core.all_cpus->map[i].cpu;
> -               bperf_trigger_reading(prog_fd, cpu);
> +       perf_cpu_map__for_each_cpu(cpu, idx, cpus) {

s/cpus/evlist->core.all_cpus/?

> +               cpu = perf_cpu_map__cpu(evlist->core.all_cpus, i);

I don't think we need this line anymore.

> +               bperf_trigger_reading(prog_fd, cpu.cpu);
>         }
>
>         return 0;
> @@ -244,12 +244,10 @@ static int bperf_cgrp__disable(struct evsel *evsel)
>  static int bperf_cgrp__read(struct evsel *evsel)
>  {
>         struct evlist *evlist = evsel->evlist;
> -       int i, cpu, nr_cpus = evlist->core.all_cpus->nr;
>         int total_cpus = cpu__max_cpu().cpu;
>         struct perf_counts_values *counts;
>         struct bpf_perf_event_value *values;
>         int reading_map_fd, err = 0;
> -       __u32 idx;
>
>         if (evsel->core.idx)
>                 return 0;
> @@ -263,7 +261,10 @@ static int bperf_cgrp__read(struct evsel *evsel)
>         reading_map_fd = bpf_map__fd(skel->maps.cgrp_readings);
>
>         evlist__for_each_entry(evlist, evsel) {
> -               idx = evsel->core.idx;
> +               __u32 idx = evsel->core.idx;
> +               int i;
> +               struct perf_cpu_map cpu;
> +
>                 err = bpf_map_lookup_elem(reading_map_fd, &idx, values);
>                 if (err) {
>                         pr_err("bpf map lookup failed: idx=%u, event=%s, cgrp=%s\n",
> @@ -271,13 +272,11 @@ static int bperf_cgrp__read(struct evsel *evsel)
>                         goto out;
>                 }
>
> -               for (i = 0; i < nr_cpus; i++) {
> -                       cpu = evlist->core.all_cpus->map[i].cpu;
> -
> +               perf_cpu_map__for_each_cpu(cpu, i, evlist->core.all_cpu) {

evlist->core.all_cpus ?

Thanks,
Namhyung


>                         counts = perf_counts(evsel->counts, i, 0);
> -                       counts->val = values[cpu].counter;
> -                       counts->ena = values[cpu].enabled;
> -                       counts->run = values[cpu].running;
> +                       counts->val = values[cpu.cpu].counter;
> +                       counts->ena = values[cpu.cpu].enabled;
> +                       counts->run = values[cpu.cpu].running;
>                 }
>         }
>
> --
> 2.36.0.464.gb9c8b46e94-goog
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 4/6] perf cpumap: Handle dummy maps as empty in subset
  2022-05-02 17:04     ` Ian Rogers
@ 2022-05-03  4:51       ` Adrian Hunter
  0 siblings, 0 replies; 12+ messages in thread
From: Adrian Hunter @ 2022-05-03  4:51 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel,
	Stephane Eranian

On 2/05/22 20:04, Ian Rogers wrote:
> On Mon, May 2, 2022 at 9:13 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>>
>> On 30/04/22 09:23, Ian Rogers wrote:
>>> perf_cpu_map__empty is true for empty and dummy maps. Make is_subset
>>> respect that.
>>
>> I think this might be the opposite of what I am trying to do, which is
>> enable all_cpus to represent all the "cpu" values (3rd parameter of
>> perf_event_open()) to iterate over including -1 so that per-thread and
>> per-cpu events can be mixed.
> 
> Wouldn't you iterate over the cpus of the evsel?

When mmapping perf events, it is necessary to iterate all (user_requested)
cpus / threads in order to conveniently set-output events on the same
cpu / thread.

>                                                  I'm not sure using
> all_cpus in that way makes sense

It does if you want to mix per-cpu and per-thread (cpu == -1) events.
As I wrote previously, the cpus is then seen as a list of values for
parameter 3 of perf_event_open(), and therefore includes -1.

>                                   it also violates the definition of
> empty.

That can be renamed, since it is typically being used to mean
per-thread (i.e. per-task context)

> 
> Thanks,
> Ian
> 
>>>
>>> Signed-off-by: Ian Rogers <irogers@google.com>
>>> ---
>>>  tools/lib/perf/cpumap.c   |  4 ++--
>>>  tools/perf/tests/cpumap.c | 10 +++++++++-
>>>  2 files changed, 11 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
>>> index 384d5e076ee4..9c83675788c2 100644
>>> --- a/tools/lib/perf/cpumap.c
>>> +++ b/tools/lib/perf/cpumap.c
>>> @@ -322,9 +322,9 @@ struct perf_cpu perf_cpu_map__max(struct perf_cpu_map *map)
>>>  /** Is 'b' a subset of 'a'. */
>>>  bool perf_cpu_map__is_subset(const struct perf_cpu_map *a, const struct perf_cpu_map *b)
>>>  {
>>> -     if (a == b || !b)
>>> +     if (a == b || perf_cpu_map__empty(b))
>>>               return true;
>>> -     if (!a || b->nr > a->nr)
>>> +     if (perf_cpu_map__empty(a) || b->nr > a->nr)
>>>               return false;
>>>
>>>       for (int i = 0, j = 0; i < a->nr; i++) {
>>> diff --git a/tools/perf/tests/cpumap.c b/tools/perf/tests/cpumap.c
>>> index f94929ebb54b..d52b58395385 100644
>>> --- a/tools/perf/tests/cpumap.c
>>> +++ b/tools/perf/tests/cpumap.c
>>> @@ -128,13 +128,21 @@ static int test__cpu_map_merge(struct test_suite *test __maybe_unused, int subte
>>>       struct perf_cpu_map *a = perf_cpu_map__new("4,2,1");
>>>       struct perf_cpu_map *b = perf_cpu_map__new("4,5,7");
>>>       struct perf_cpu_map *c = perf_cpu_map__merge(a, b);
>>> +     struct perf_cpu_map *d = perf_cpu_map__dummy_new();
>>> +     struct perf_cpu_map *e = perf_cpu_map__merge(b, d);
>>>       char buf[100];
>>>
>>>       TEST_ASSERT_VAL("failed to merge map: bad nr", perf_cpu_map__nr(c) == 5);
>>>       cpu_map__snprint(c, buf, sizeof(buf));
>>>       TEST_ASSERT_VAL("failed to merge map: bad result", !strcmp(buf, "1-2,4-5,7"));
>>> -     perf_cpu_map__put(b);
>>> +
>>> +     TEST_ASSERT_VAL("failed to merge map: bad nr", perf_cpu_map__nr(e) == 3);
>>> +     cpu_map__snprint(e, buf, sizeof(buf));
>>> +     TEST_ASSERT_VAL("failed to merge map: bad result", !strcmp(buf, "4-5,7"));
>>> +
>>>       perf_cpu_map__put(c);
>>> +     perf_cpu_map__put(d);
>>> +     perf_cpu_map__put(e);
>>>       return 0;
>>>  }
>>>
>>


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-05-03  4:52 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-30  6:23 [PATCH v4 0/6] Make evlist CPUs more accurate Ian Rogers
2022-04-30  6:23 ` [PATCH v4 1/6] perf cpumap: Switch to using perf_cpu_map API Ian Rogers
2022-05-02 23:51   ` Namhyung Kim
2022-04-30  6:23 ` [PATCH v4 2/6] perf evlist: Clear all_cpus before propagating Ian Rogers
2022-04-30  6:23 ` [PATCH v4 3/6] perf stat: Avoid printing cpus with no counters Ian Rogers
2022-05-02 16:09   ` Adrian Hunter
2022-04-30  6:23 ` [PATCH v4 4/6] perf cpumap: Handle dummy maps as empty in subset Ian Rogers
2022-05-02 16:13   ` Adrian Hunter
2022-05-02 17:04     ` Ian Rogers
2022-05-03  4:51       ` Adrian Hunter
2022-04-30  6:23 ` [PATCH v4 5/6] perf evlist: Add to user_requested_cpus documentation Ian Rogers
2022-04-30  6:23 ` [PATCH v4 6/6] perf evlist: Rename all_cpus Ian Rogers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).