linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/6] Make evlist CPUs more accurate
@ 2022-05-03  4:17 Ian Rogers
  2022-05-03  4:17 ` [PATCH v5 1/6] perf cpumap: Switch to using perf_cpu_map API Ian Rogers
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Ian Rogers @ 2022-05-03  4:17 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Adrian Hunter, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel
  Cc: Stephane Eranian, Ian Rogers

Based on the thread:
https://lore.kernel.org/linux-perf-users/CAP-5=fVMHzTfKdpWMXtbtx7t14u2f4WzNak+F0Q93cQ7CZfhbg@mail.gmail.com/

First patch is a cherry-pick to avoid a conflict of:
https://lore.kernel.org/linux-perf-users/20220414014642.3308206-2-irogers@google.com/
Second patch makes all_cpus more accurate when there are command line CPUs.
The third patch fixes perf stat metric-only output for uncore metrics.
The fourth patch makes cleans up merging of dummy CPU maps.
The fifth and sixth patch try to make user_requested_cpus and all_cpus
clearer with documentation and by renaming all_cpus.

The code no longer needs to add an intersect function and so the API
is removed and the merged API left unchanged.

v5. Fixes the 1st patch as pointed out by Namhyung Kim
    <namhyung@kernel.org>, adds some further clean-up to it and adds
    reviewed-by Adrian Hunter <adrian.hunter@intel.com> to patch 3.

Ian Rogers (6):
  perf cpumap: Switch to using perf_cpu_map API
  perf evlist: Clear all_cpus before propagating
  perf stat: Avoid printing cpus with no counters
  perf cpumap: Handle dummy maps as empty in subset
  perf evlist: Add to user_requested_cpus documentation
  perf evlist: Rename all_cpus

 tools/lib/perf/cpumap.c                  |  4 +--
 tools/lib/perf/evlist.c                  | 14 +++++---
 tools/lib/perf/include/internal/evlist.h |  5 +--
 tools/perf/builtin-record.c              | 13 +++----
 tools/perf/tests/cpumap.c                | 10 +++++-
 tools/perf/util/bpf_counter_cgroup.c     | 46 +++++++++++-------------
 tools/perf/util/evlist.c                 |  6 ++--
 tools/perf/util/evlist.h                 |  4 +--
 tools/perf/util/stat-display.c           |  7 ++--
 9 files changed, 60 insertions(+), 49 deletions(-)

-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v5 1/6] perf cpumap: Switch to using perf_cpu_map API
  2022-05-03  4:17 [PATCH v5 0/6] Make evlist CPUs more accurate Ian Rogers
@ 2022-05-03  4:17 ` Ian Rogers
  2022-05-04 17:44   ` Namhyung Kim
  2022-05-03  4:17 ` [PATCH v5 2/6] perf evlist: Clear all_cpus before propagating Ian Rogers
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Ian Rogers @ 2022-05-03  4:17 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Adrian Hunter, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel
  Cc: Stephane Eranian, Ian Rogers

Switch some raw accesses to the cpu map to using the library API. This
can help with reference count checking. Some BPF cases switch from index
to CPU for consistency, this shouldn't matter as the CPU map is full.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-record.c          | 13 +++++----
 tools/perf/util/bpf_counter_cgroup.c | 42 +++++++++++++---------------
 2 files changed, 26 insertions(+), 29 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 069825c48d40..a5cf6a99d67f 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1011,7 +1011,7 @@ static int record__thread_data_init_maps(struct record_thread *thread_data, stru
 
 	for (m = 0, tm = 0; m < nr_mmaps && tm < thread_data->nr_mmaps; m++) {
 		if (cpu_map__is_dummy(cpus) ||
-		    test_bit(cpus->map[m].cpu, thread_data->mask->maps.bits)) {
+		    test_bit(perf_cpu_map__cpu(cpus, m).cpu, thread_data->mask->maps.bits)) {
 			if (thread_data->maps) {
 				thread_data->maps[tm] = &mmap[m];
 				pr_debug2("thread_data[%p]: cpu%d: maps[%d] -> mmap[%d]\n",
@@ -3331,13 +3331,14 @@ struct option *record_options = __record_options;
 
 static void record__mmap_cpu_mask_init(struct mmap_cpu_mask *mask, struct perf_cpu_map *cpus)
 {
-	int c;
+	struct perf_cpu cpu;
+	int idx;
 
 	if (cpu_map__is_dummy(cpus))
 		return;
 
-	for (c = 0; c < cpus->nr; c++)
-		set_bit(cpus->map[c].cpu, mask->bits);
+	perf_cpu_map__for_each_cpu(cpu, idx, cpus)
+		set_bit(cpu.cpu, mask->bits);
 }
 
 static int record__mmap_cpu_mask_init_spec(struct mmap_cpu_mask *mask, const char *mask_spec)
@@ -3404,8 +3405,8 @@ static int record__init_thread_cpu_masks(struct record *rec, struct perf_cpu_map
 	pr_debug("nr_threads: %d\n", rec->nr_threads);
 
 	for (t = 0; t < rec->nr_threads; t++) {
-		set_bit(cpus->map[t].cpu, rec->thread_masks[t].maps.bits);
-		set_bit(cpus->map[t].cpu, rec->thread_masks[t].affinity.bits);
+		set_bit(perf_cpu_map__cpu(cpus, t).cpu, rec->thread_masks[t].maps.bits);
+		set_bit(perf_cpu_map__cpu(cpus, t).cpu, rec->thread_masks[t].affinity.bits);
 		if (verbose) {
 			pr_debug("thread_masks[%d]: ", t);
 			mmap_cpu_mask__scnprintf(&rec->thread_masks[t].maps, "maps");
diff --git a/tools/perf/util/bpf_counter_cgroup.c b/tools/perf/util/bpf_counter_cgroup.c
index ac60c08e8e2a..63b9db657442 100644
--- a/tools/perf/util/bpf_counter_cgroup.c
+++ b/tools/perf/util/bpf_counter_cgroup.c
@@ -46,8 +46,8 @@ static int bperf_load_program(struct evlist *evlist)
 	struct bpf_link *link;
 	struct evsel *evsel;
 	struct cgroup *cgrp, *leader_cgrp;
-	__u32 i, cpu;
-	__u32 nr_cpus = evlist->core.all_cpus->nr;
+	int i, j;
+	struct perf_cpu cpu;
 	int total_cpus = cpu__max_cpu().cpu;
 	int map_size, map_fd;
 	int prog_fd, err;
@@ -93,9 +93,9 @@ static int bperf_load_program(struct evlist *evlist)
 		goto out;
 	}
 
-	for (i = 0; i < nr_cpus; i++) {
+	perf_cpu_map__for_each_cpu(cpu, i, evlist->core.all_cpus) {
 		link = bpf_program__attach_perf_event(skel->progs.on_cgrp_switch,
-						      FD(cgrp_switch, i));
+						      FD(cgrp_switch, cpu.cpu));
 		if (IS_ERR(link)) {
 			pr_err("Failed to attach cgroup program\n");
 			err = PTR_ERR(link);
@@ -122,10 +122,9 @@ static int bperf_load_program(struct evlist *evlist)
 			}
 
 			map_fd = bpf_map__fd(skel->maps.events);
-			for (cpu = 0; cpu < nr_cpus; cpu++) {
-				int fd = FD(evsel, cpu);
-				__u32 idx = evsel->core.idx * total_cpus +
-					evlist->core.all_cpus->map[cpu].cpu;
+			perf_cpu_map__for_each_cpu(cpu, j, evlist->core.all_cpus) {
+				int fd = FD(evsel, cpu.cpu);
+				__u32 idx = evsel->core.idx * total_cpus + cpu.cpu;
 
 				err = bpf_map_update_elem(map_fd, &idx, &fd,
 							  BPF_ANY);
@@ -207,14 +206,12 @@ static int bperf_cgrp__install_pe(struct evsel *evsel __maybe_unused,
  */
 static int bperf_cgrp__sync_counters(struct evlist *evlist)
 {
-	int i, cpu;
-	int nr_cpus = evlist->core.all_cpus->nr;
+	struct perf_cpu cpu;
+	int idx;
 	int prog_fd = bpf_program__fd(skel->progs.trigger_read);
 
-	for (i = 0; i < nr_cpus; i++) {
-		cpu = evlist->core.all_cpus->map[i].cpu;
-		bperf_trigger_reading(prog_fd, cpu);
-	}
+	perf_cpu_map__for_each_cpu(cpu, idx, evlist->core.all_cpus)
+		bperf_trigger_reading(prog_fd, cpu.cpu);
 
 	return 0;
 }
@@ -244,12 +241,10 @@ static int bperf_cgrp__disable(struct evsel *evsel)
 static int bperf_cgrp__read(struct evsel *evsel)
 {
 	struct evlist *evlist = evsel->evlist;
-	int i, cpu, nr_cpus = evlist->core.all_cpus->nr;
 	int total_cpus = cpu__max_cpu().cpu;
 	struct perf_counts_values *counts;
 	struct bpf_perf_event_value *values;
 	int reading_map_fd, err = 0;
-	__u32 idx;
 
 	if (evsel->core.idx)
 		return 0;
@@ -263,7 +258,10 @@ static int bperf_cgrp__read(struct evsel *evsel)
 	reading_map_fd = bpf_map__fd(skel->maps.cgrp_readings);
 
 	evlist__for_each_entry(evlist, evsel) {
-		idx = evsel->core.idx;
+		__u32 idx = evsel->core.idx;
+		int i;
+		struct perf_cpu cpu;
+
 		err = bpf_map_lookup_elem(reading_map_fd, &idx, values);
 		if (err) {
 			pr_err("bpf map lookup failed: idx=%u, event=%s, cgrp=%s\n",
@@ -271,13 +269,11 @@ static int bperf_cgrp__read(struct evsel *evsel)
 			goto out;
 		}
 
-		for (i = 0; i < nr_cpus; i++) {
-			cpu = evlist->core.all_cpus->map[i].cpu;
-
+		perf_cpu_map__for_each_cpu(cpu, i, evlist->core.all_cpus) {
 			counts = perf_counts(evsel->counts, i, 0);
-			counts->val = values[cpu].counter;
-			counts->ena = values[cpu].enabled;
-			counts->run = values[cpu].running;
+			counts->val = values[cpu.cpu].counter;
+			counts->ena = values[cpu.cpu].enabled;
+			counts->run = values[cpu.cpu].running;
 		}
 	}
 
-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 2/6] perf evlist: Clear all_cpus before propagating
  2022-05-03  4:17 [PATCH v5 0/6] Make evlist CPUs more accurate Ian Rogers
  2022-05-03  4:17 ` [PATCH v5 1/6] perf cpumap: Switch to using perf_cpu_map API Ian Rogers
@ 2022-05-03  4:17 ` Ian Rogers
  2022-05-04 14:15   ` Adrian Hunter
  2022-05-03  4:17 ` [PATCH v5 3/6] perf stat: Avoid printing cpus with no counters Ian Rogers
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Ian Rogers @ 2022-05-03  4:17 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Adrian Hunter, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel
  Cc: Stephane Eranian, Ian Rogers

all_cpus is merged into during propagation. Initially all_cpus is set
from PMU sysfs. perf_evlist__set_maps will recompute it and change
evsel->cpus to user_requested_cpus if they are given. If all_cpus isn't
cleared then the union of the user_requested_cpus and PMU sysfs values
is set to all_cpus, whereas just user_requested_cpus is necessary. To
avoid this make all_cpus empty prior to propagation.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/lib/perf/evlist.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index a09315538a30..974b4585f93e 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -59,6 +59,10 @@ static void perf_evlist__propagate_maps(struct perf_evlist *evlist)
 {
 	struct perf_evsel *evsel;
 
+	/* Recomputing all_cpus, so start with a blank slate. */
+	perf_cpu_map__put(evlist->all_cpus);
+	evlist->all_cpus = NULL;
+
 	perf_evlist__for_each_evsel(evlist, evsel)
 		__perf_evlist__propagate_maps(evlist, evsel);
 }
-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 3/6] perf stat: Avoid printing cpus with no counters
  2022-05-03  4:17 [PATCH v5 0/6] Make evlist CPUs more accurate Ian Rogers
  2022-05-03  4:17 ` [PATCH v5 1/6] perf cpumap: Switch to using perf_cpu_map API Ian Rogers
  2022-05-03  4:17 ` [PATCH v5 2/6] perf evlist: Clear all_cpus before propagating Ian Rogers
@ 2022-05-03  4:17 ` Ian Rogers
  2022-05-03 14:19   ` Arnaldo Carvalho de Melo
  2022-05-03  4:17 ` [PATCH v5 4/6] perf cpumap: Handle dummy maps as empty in subset Ian Rogers
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Ian Rogers @ 2022-05-03  4:17 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Adrian Hunter, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel
  Cc: Stephane Eranian, Ian Rogers

perf_evlist's user_requested_cpus can contain CPUs not present in any
evsel's cpus, for example uncore counters. Avoid printing the prefix and
trailing \n until the first valid counter is encountered.

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/stat-display.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index d9629a83aa78..13f705737367 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -948,8 +948,6 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
 		struct evsel *counter;
 		bool first = true;
 
-		if (prefix)
-			fputs(prefix, config->output);
 		evlist__for_each_entry(evlist, counter) {
 			u64 ena, run, val;
 			double uval;
@@ -961,6 +959,8 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
 
 			id = aggr_cpu_id__cpu(cpu, /*data=*/NULL);
 			if (first) {
+				if (prefix)
+					fputs(prefix, config->output);
 				aggr_printout(config, counter, id, 0);
 				first = false;
 			}
@@ -972,7 +972,8 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
 			printout(config, id, 0, counter, uval, prefix,
 				 run, ena, 1.0, &rt_stat);
 		}
-		fputc('\n', config->output);
+		if (!first)
+			fputc('\n', config->output);
 	}
 }
 
-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 4/6] perf cpumap: Handle dummy maps as empty in subset
  2022-05-03  4:17 [PATCH v5 0/6] Make evlist CPUs more accurate Ian Rogers
                   ` (2 preceding siblings ...)
  2022-05-03  4:17 ` [PATCH v5 3/6] perf stat: Avoid printing cpus with no counters Ian Rogers
@ 2022-05-03  4:17 ` Ian Rogers
  2022-05-03  7:43   ` Adrian Hunter
  2022-05-03  4:17 ` [PATCH v5 5/6] perf evlist: Add to user_requested_cpus documentation Ian Rogers
  2022-05-03  4:17 ` [PATCH v5 6/6] perf evlist: Rename all_cpus Ian Rogers
  5 siblings, 1 reply; 17+ messages in thread
From: Ian Rogers @ 2022-05-03  4:17 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Adrian Hunter, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel
  Cc: Stephane Eranian, Ian Rogers

perf_cpu_map__empty is true for empty and dummy maps. Make is_subset
respect that.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/lib/perf/cpumap.c   |  4 ++--
 tools/perf/tests/cpumap.c | 10 +++++++++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
index 384d5e076ee4..9c83675788c2 100644
--- a/tools/lib/perf/cpumap.c
+++ b/tools/lib/perf/cpumap.c
@@ -322,9 +322,9 @@ struct perf_cpu perf_cpu_map__max(struct perf_cpu_map *map)
 /** Is 'b' a subset of 'a'. */
 bool perf_cpu_map__is_subset(const struct perf_cpu_map *a, const struct perf_cpu_map *b)
 {
-	if (a == b || !b)
+	if (a == b || perf_cpu_map__empty(b))
 		return true;
-	if (!a || b->nr > a->nr)
+	if (perf_cpu_map__empty(a) || b->nr > a->nr)
 		return false;
 
 	for (int i = 0, j = 0; i < a->nr; i++) {
diff --git a/tools/perf/tests/cpumap.c b/tools/perf/tests/cpumap.c
index f94929ebb54b..d52b58395385 100644
--- a/tools/perf/tests/cpumap.c
+++ b/tools/perf/tests/cpumap.c
@@ -128,13 +128,21 @@ static int test__cpu_map_merge(struct test_suite *test __maybe_unused, int subte
 	struct perf_cpu_map *a = perf_cpu_map__new("4,2,1");
 	struct perf_cpu_map *b = perf_cpu_map__new("4,5,7");
 	struct perf_cpu_map *c = perf_cpu_map__merge(a, b);
+	struct perf_cpu_map *d = perf_cpu_map__dummy_new();
+	struct perf_cpu_map *e = perf_cpu_map__merge(b, d);
 	char buf[100];
 
 	TEST_ASSERT_VAL("failed to merge map: bad nr", perf_cpu_map__nr(c) == 5);
 	cpu_map__snprint(c, buf, sizeof(buf));
 	TEST_ASSERT_VAL("failed to merge map: bad result", !strcmp(buf, "1-2,4-5,7"));
-	perf_cpu_map__put(b);
+
+	TEST_ASSERT_VAL("failed to merge map: bad nr", perf_cpu_map__nr(e) == 3);
+	cpu_map__snprint(e, buf, sizeof(buf));
+	TEST_ASSERT_VAL("failed to merge map: bad result", !strcmp(buf, "4-5,7"));
+
 	perf_cpu_map__put(c);
+	perf_cpu_map__put(d);
+	perf_cpu_map__put(e);
 	return 0;
 }
 
-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 5/6] perf evlist: Add to user_requested_cpus documentation
  2022-05-03  4:17 [PATCH v5 0/6] Make evlist CPUs more accurate Ian Rogers
                   ` (3 preceding siblings ...)
  2022-05-03  4:17 ` [PATCH v5 4/6] perf cpumap: Handle dummy maps as empty in subset Ian Rogers
@ 2022-05-03  4:17 ` Ian Rogers
  2022-05-03  4:17 ` [PATCH v5 6/6] perf evlist: Rename all_cpus Ian Rogers
  5 siblings, 0 replies; 17+ messages in thread
From: Ian Rogers @ 2022-05-03  4:17 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Adrian Hunter, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel
  Cc: Stephane Eranian, Ian Rogers

Document a key use-case in propagation.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/lib/perf/include/internal/evlist.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/lib/perf/include/internal/evlist.h b/tools/lib/perf/include/internal/evlist.h
index e3e64f37db7b..74541bd87aa9 100644
--- a/tools/lib/perf/include/internal/evlist.h
+++ b/tools/lib/perf/include/internal/evlist.h
@@ -21,7 +21,8 @@ struct perf_evlist {
 	bool			 has_user_cpus;
 	/**
 	 * The cpus passed from the command line or all online CPUs by
-	 * default.
+	 * default. For evsels with no or dummy cpu maps, this cpu map replaces
+	 * their cpus during propagation.
 	 */
 	struct perf_cpu_map	*user_requested_cpus;
 	/** The union of all evsel cpu maps. */
-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v5 6/6] perf evlist: Rename all_cpus
  2022-05-03  4:17 [PATCH v5 0/6] Make evlist CPUs more accurate Ian Rogers
                   ` (4 preceding siblings ...)
  2022-05-03  4:17 ` [PATCH v5 5/6] perf evlist: Add to user_requested_cpus documentation Ian Rogers
@ 2022-05-03  4:17 ` Ian Rogers
  5 siblings, 0 replies; 17+ messages in thread
From: Ian Rogers @ 2022-05-03  4:17 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Adrian Hunter, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel
  Cc: Stephane Eranian, Ian Rogers

Try to make the struct variable clearer by renaming to
merged_evsel_cpus.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/lib/perf/evlist.c                  | 16 ++++++++--------
 tools/lib/perf/include/internal/evlist.h |  2 +-
 tools/perf/util/bpf_counter_cgroup.c     | 12 ++++++------
 tools/perf/util/evlist.c                 |  6 +++---
 tools/perf/util/evlist.h                 |  4 ++--
 5 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index 974b4585f93e..5840a9377494 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -52,16 +52,16 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
 
 	perf_thread_map__put(evsel->threads);
 	evsel->threads = perf_thread_map__get(evlist->threads);
-	evlist->all_cpus = perf_cpu_map__merge(evlist->all_cpus, evsel->cpus);
+	evlist->merged_evsel_cpus = perf_cpu_map__merge(evlist->merged_evsel_cpus, evsel->cpus);
 }
 
 static void perf_evlist__propagate_maps(struct perf_evlist *evlist)
 {
 	struct perf_evsel *evsel;
 
-	/* Recomputing all_cpus, so start with a blank slate. */
-	perf_cpu_map__put(evlist->all_cpus);
-	evlist->all_cpus = NULL;
+	/* Recomputing merged_evsel_cpus, so start with a blank slate. */
+	perf_cpu_map__put(evlist->merged_evsel_cpus);
+	evlist->merged_evsel_cpus = NULL;
 
 	perf_evlist__for_each_evsel(evlist, evsel)
 		__perf_evlist__propagate_maps(evlist, evsel);
@@ -128,10 +128,10 @@ static void perf_evlist__purge(struct perf_evlist *evlist)
 void perf_evlist__exit(struct perf_evlist *evlist)
 {
 	perf_cpu_map__put(evlist->user_requested_cpus);
-	perf_cpu_map__put(evlist->all_cpus);
+	perf_cpu_map__put(evlist->merged_evsel_cpus);
 	perf_thread_map__put(evlist->threads);
 	evlist->user_requested_cpus = NULL;
-	evlist->all_cpus = NULL;
+	evlist->merged_evsel_cpus = NULL;
 	evlist->threads = NULL;
 	fdarray__exit(&evlist->pollfd);
 }
@@ -169,8 +169,8 @@ void perf_evlist__set_maps(struct perf_evlist *evlist,
 		evlist->threads = perf_thread_map__get(threads);
 	}
 
-	if (!evlist->all_cpus && cpus)
-		evlist->all_cpus = perf_cpu_map__get(cpus);
+	if (!evlist->merged_evsel_cpus && cpus)
+		evlist->merged_evsel_cpus = perf_cpu_map__get(cpus);
 
 	perf_evlist__propagate_maps(evlist);
 }
diff --git a/tools/lib/perf/include/internal/evlist.h b/tools/lib/perf/include/internal/evlist.h
index 74541bd87aa9..3a0a47ba8c57 100644
--- a/tools/lib/perf/include/internal/evlist.h
+++ b/tools/lib/perf/include/internal/evlist.h
@@ -26,7 +26,7 @@ struct perf_evlist {
 	 */
 	struct perf_cpu_map	*user_requested_cpus;
 	/** The union of all evsel cpu maps. */
-	struct perf_cpu_map	*all_cpus;
+	struct perf_cpu_map	*merged_evsel_cpus;
 	struct perf_thread_map	*threads;
 	int			 nr_mmaps;
 	size_t			 mmap_len;
diff --git a/tools/perf/util/bpf_counter_cgroup.c b/tools/perf/util/bpf_counter_cgroup.c
index 63b9db657442..35e92b889c3b 100644
--- a/tools/perf/util/bpf_counter_cgroup.c
+++ b/tools/perf/util/bpf_counter_cgroup.c
@@ -88,12 +88,12 @@ static int bperf_load_program(struct evlist *evlist)
 	err = -1;
 
 	cgrp_switch = evsel__new(&cgrp_switch_attr);
-	if (evsel__open_per_cpu(cgrp_switch, evlist->core.all_cpus, -1) < 0) {
+	if (evsel__open_per_cpu(cgrp_switch, evlist->core.merged_evsel_cpus, -1) < 0) {
 		pr_err("Failed to open cgroup switches event\n");
 		goto out;
 	}
 
-	perf_cpu_map__for_each_cpu(cpu, i, evlist->core.all_cpus) {
+	perf_cpu_map__for_each_cpu(cpu, i, evlist->core.merged_evsel_cpus) {
 		link = bpf_program__attach_perf_event(skel->progs.on_cgrp_switch,
 						      FD(cgrp_switch, cpu.cpu));
 		if (IS_ERR(link)) {
@@ -115,14 +115,14 @@ static int bperf_load_program(struct evlist *evlist)
 			evsel->cgrp = NULL;
 
 			/* open single copy of the events w/o cgroup */
-			err = evsel__open_per_cpu(evsel, evlist->core.all_cpus, -1);
+			err = evsel__open_per_cpu(evsel, evlist->core.merged_evsel_cpus, -1);
 			if (err) {
 				pr_err("Failed to open first cgroup events\n");
 				goto out;
 			}
 
 			map_fd = bpf_map__fd(skel->maps.events);
-			perf_cpu_map__for_each_cpu(cpu, j, evlist->core.all_cpus) {
+			perf_cpu_map__for_each_cpu(cpu, j, evlist->core.merged_evsel_cpus) {
 				int fd = FD(evsel, cpu.cpu);
 				__u32 idx = evsel->core.idx * total_cpus + cpu.cpu;
 
@@ -210,7 +210,7 @@ static int bperf_cgrp__sync_counters(struct evlist *evlist)
 	int idx;
 	int prog_fd = bpf_program__fd(skel->progs.trigger_read);
 
-	perf_cpu_map__for_each_cpu(cpu, idx, evlist->core.all_cpus)
+	perf_cpu_map__for_each_cpu(cpu, idx, evlist->core.merged_evsel_cpus)
 		bperf_trigger_reading(prog_fd, cpu.cpu);
 
 	return 0;
@@ -269,7 +269,7 @@ static int bperf_cgrp__read(struct evsel *evsel)
 			goto out;
 		}
 
-		perf_cpu_map__for_each_cpu(cpu, i, evlist->core.all_cpus) {
+		perf_cpu_map__for_each_cpu(cpu, i, evlist->core.merged_evsel_cpus) {
 			counts = perf_counts(evsel->counts, i, 0);
 			counts->val = values[cpu.cpu].counter;
 			counts->ena = values[cpu.cpu].enabled;
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 52ea004ba01e..57ecd94e6f9e 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -349,7 +349,7 @@ struct evlist_cpu_iterator evlist__cpu_begin(struct evlist *evlist, struct affin
 		.evsel = NULL,
 		.cpu_map_idx = 0,
 		.evlist_cpu_map_idx = 0,
-		.evlist_cpu_map_nr = perf_cpu_map__nr(evlist->core.all_cpus),
+		.evlist_cpu_map_nr = perf_cpu_map__nr(evlist->core.merged_evsel_cpus),
 		.cpu = (struct perf_cpu){ .cpu = -1},
 		.affinity = affinity,
 	};
@@ -360,7 +360,7 @@ struct evlist_cpu_iterator evlist__cpu_begin(struct evlist *evlist, struct affin
 	} else {
 		itr.evsel = evlist__first(evlist);
 		if (itr.affinity) {
-			itr.cpu = perf_cpu_map__cpu(evlist->core.all_cpus, 0);
+			itr.cpu = perf_cpu_map__cpu(evlist->core.merged_evsel_cpus, 0);
 			affinity__set(itr.affinity, itr.cpu.cpu);
 			itr.cpu_map_idx = perf_cpu_map__idx(itr.evsel->core.cpus, itr.cpu);
 			/*
@@ -388,7 +388,7 @@ void evlist_cpu_iterator__next(struct evlist_cpu_iterator *evlist_cpu_itr)
 	if (evlist_cpu_itr->evlist_cpu_map_idx < evlist_cpu_itr->evlist_cpu_map_nr) {
 		evlist_cpu_itr->evsel = evlist__first(evlist_cpu_itr->container);
 		evlist_cpu_itr->cpu =
-			perf_cpu_map__cpu(evlist_cpu_itr->container->core.all_cpus,
+			perf_cpu_map__cpu(evlist_cpu_itr->container->core.merged_evsel_cpus,
 					  evlist_cpu_itr->evlist_cpu_map_idx);
 		if (evlist_cpu_itr->affinity)
 			affinity__set(evlist_cpu_itr->affinity, evlist_cpu_itr->cpu.cpu);
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index a21daaa5fc1b..e2e68f988d26 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -336,12 +336,12 @@ struct evlist_cpu_iterator {
 	/** The CPU map index corresponding to the evsel->core.cpus for the current CPU. */
 	int cpu_map_idx;
 	/**
-	 * The CPU map index corresponding to evlist->core.all_cpus for the
+	 * The CPU map index corresponding to evlist->core.merged_evsel_cpus for the
 	 * current CPU.  Distinct from cpu_map_idx as the evsel's cpu map may
 	 * contain fewer entries.
 	 */
 	int evlist_cpu_map_idx;
-	/** The number of CPU map entries in evlist->core.all_cpus. */
+	/** The number of CPU map entries in evlist->core.merged_evsel_cpus. */
 	int evlist_cpu_map_nr;
 	/** The current CPU of the iterator. */
 	struct perf_cpu cpu;
-- 
2.36.0.464.gb9c8b46e94-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 4/6] perf cpumap: Handle dummy maps as empty in subset
  2022-05-03  4:17 ` [PATCH v5 4/6] perf cpumap: Handle dummy maps as empty in subset Ian Rogers
@ 2022-05-03  7:43   ` Adrian Hunter
  2022-05-03 14:03     ` Ian Rogers
  0 siblings, 1 reply; 17+ messages in thread
From: Adrian Hunter @ 2022-05-03  7:43 UTC (permalink / raw)
  To: Ian Rogers, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Leo Yan, John Garry, Will Deacon, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Kajol Jain, James Clark,
	German Gomez, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel
  Cc: Stephane Eranian

On 3/05/22 07:17, Ian Rogers wrote:
> perf_cpu_map__empty is true for empty and dummy maps. Make is_subset
> respect that.

As I wrote before, I am not keen on this because it prevents -1, as a
valid 3rd parameter to perf_event_open(), from being represented
in merged evsel cpu maps.

Why do you want this?

> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/lib/perf/cpumap.c   |  4 ++--
>  tools/perf/tests/cpumap.c | 10 +++++++++-
>  2 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
> index 384d5e076ee4..9c83675788c2 100644
> --- a/tools/lib/perf/cpumap.c
> +++ b/tools/lib/perf/cpumap.c
> @@ -322,9 +322,9 @@ struct perf_cpu perf_cpu_map__max(struct perf_cpu_map *map)
>  /** Is 'b' a subset of 'a'. */
>  bool perf_cpu_map__is_subset(const struct perf_cpu_map *a, const struct perf_cpu_map *b)
>  {
> -	if (a == b || !b)
> +	if (a == b || perf_cpu_map__empty(b))
>  		return true;
> -	if (!a || b->nr > a->nr)
> +	if (perf_cpu_map__empty(a) || b->nr > a->nr)
>  		return false;
>  
>  	for (int i = 0, j = 0; i < a->nr; i++) {
> diff --git a/tools/perf/tests/cpumap.c b/tools/perf/tests/cpumap.c
> index f94929ebb54b..d52b58395385 100644
> --- a/tools/perf/tests/cpumap.c
> +++ b/tools/perf/tests/cpumap.c
> @@ -128,13 +128,21 @@ static int test__cpu_map_merge(struct test_suite *test __maybe_unused, int subte
>  	struct perf_cpu_map *a = perf_cpu_map__new("4,2,1");
>  	struct perf_cpu_map *b = perf_cpu_map__new("4,5,7");
>  	struct perf_cpu_map *c = perf_cpu_map__merge(a, b);
> +	struct perf_cpu_map *d = perf_cpu_map__dummy_new();
> +	struct perf_cpu_map *e = perf_cpu_map__merge(b, d);
>  	char buf[100];
>  
>  	TEST_ASSERT_VAL("failed to merge map: bad nr", perf_cpu_map__nr(c) == 5);
>  	cpu_map__snprint(c, buf, sizeof(buf));
>  	TEST_ASSERT_VAL("failed to merge map: bad result", !strcmp(buf, "1-2,4-5,7"));
> -	perf_cpu_map__put(b);
> +
> +	TEST_ASSERT_VAL("failed to merge map: bad nr", perf_cpu_map__nr(e) == 3);
> +	cpu_map__snprint(e, buf, sizeof(buf));
> +	TEST_ASSERT_VAL("failed to merge map: bad result", !strcmp(buf, "4-5,7"));
> +
>  	perf_cpu_map__put(c);
> +	perf_cpu_map__put(d);
> +	perf_cpu_map__put(e);
>  	return 0;
>  }
>  


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 4/6] perf cpumap: Handle dummy maps as empty in subset
  2022-05-03  7:43   ` Adrian Hunter
@ 2022-05-03 14:03     ` Ian Rogers
  2022-05-04 12:54       ` Adrian Hunter
  0 siblings, 1 reply; 17+ messages in thread
From: Ian Rogers @ 2022-05-03 14:03 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel,
	Stephane Eranian

On Tue, May 3, 2022 at 12:43 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> On 3/05/22 07:17, Ian Rogers wrote:
> > perf_cpu_map__empty is true for empty and dummy maps. Make is_subset
> > respect that.
>
> As I wrote before, I am not keen on this because it prevents -1, as a
> valid 3rd parameter to perf_event_open(), from being represented
> in merged evsel cpu maps.
>
> Why do you want this?

Thanks Adrian, could you give me a test case (command line) where the
differing dummy and empty behavior matters? Normally cpus/own_cpus are
set to null during parsing. They may get replaced with
user_requested_cpus:
https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/lib/perf/evlist.c?h=perf/core#n44
https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/lib/perf/evlist.c?h=perf/core#n45
(should it be on line 45 that !empty is expected?)

During merge the null/empty all_cpus drops this value, which doesn't
matter as the behavior with empty is the same as dummy:
https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/lib/perf/evsel.c?h=perf/core#n119

What's concerning me is the definition of empty:
https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/lib/perf/cpumap.c?h=perf/core#n279
```
return map ? map->map[0].cpu == -1 : true;
```
If the first entry can be -1 and there can be other CPUs merged after
then that cpu map will be empty by the definition above. Perhaps it
should be:
```
return map ? (map->nr == 1 && map->map[0].cpu == -1) : true;
```
but it seems you prefer:
```
return (map == NULL) ? true : false;
```

You'd asked what the behavior with a dummy is and clearly it is
somewhat muddy. That is what this patch and unit test is trying to
clean up.

Thanks,
Ian

> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/lib/perf/cpumap.c   |  4 ++--
> >  tools/perf/tests/cpumap.c | 10 +++++++++-
> >  2 files changed, 11 insertions(+), 3 deletions(-)
> >
> > diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
> > index 384d5e076ee4..9c83675788c2 100644
> > --- a/tools/lib/perf/cpumap.c
> > +++ b/tools/lib/perf/cpumap.c
> > @@ -322,9 +322,9 @@ struct perf_cpu perf_cpu_map__max(struct perf_cpu_map *map)
> >  /** Is 'b' a subset of 'a'. */
> >  bool perf_cpu_map__is_subset(const struct perf_cpu_map *a, const struct perf_cpu_map *b)
> >  {
> > -     if (a == b || !b)
> > +     if (a == b || perf_cpu_map__empty(b))
> >               return true;
> > -     if (!a || b->nr > a->nr)
> > +     if (perf_cpu_map__empty(a) || b->nr > a->nr)
> >               return false;
> >
> >       for (int i = 0, j = 0; i < a->nr; i++) {
> > diff --git a/tools/perf/tests/cpumap.c b/tools/perf/tests/cpumap.c
> > index f94929ebb54b..d52b58395385 100644
> > --- a/tools/perf/tests/cpumap.c
> > +++ b/tools/perf/tests/cpumap.c
> > @@ -128,13 +128,21 @@ static int test__cpu_map_merge(struct test_suite *test __maybe_unused, int subte
> >       struct perf_cpu_map *a = perf_cpu_map__new("4,2,1");
> >       struct perf_cpu_map *b = perf_cpu_map__new("4,5,7");
> >       struct perf_cpu_map *c = perf_cpu_map__merge(a, b);
> > +     struct perf_cpu_map *d = perf_cpu_map__dummy_new();
> > +     struct perf_cpu_map *e = perf_cpu_map__merge(b, d);
> >       char buf[100];
> >
> >       TEST_ASSERT_VAL("failed to merge map: bad nr", perf_cpu_map__nr(c) == 5);
> >       cpu_map__snprint(c, buf, sizeof(buf));
> >       TEST_ASSERT_VAL("failed to merge map: bad result", !strcmp(buf, "1-2,4-5,7"));
> > -     perf_cpu_map__put(b);
> > +
> > +     TEST_ASSERT_VAL("failed to merge map: bad nr", perf_cpu_map__nr(e) == 3);
> > +     cpu_map__snprint(e, buf, sizeof(buf));
> > +     TEST_ASSERT_VAL("failed to merge map: bad result", !strcmp(buf, "4-5,7"));
> > +
> >       perf_cpu_map__put(c);
> > +     perf_cpu_map__put(d);
> > +     perf_cpu_map__put(e);
> >       return 0;
> >  }
> >
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 3/6] perf stat: Avoid printing cpus with no counters
  2022-05-03  4:17 ` [PATCH v5 3/6] perf stat: Avoid printing cpus with no counters Ian Rogers
@ 2022-05-03 14:19   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-05-03 14:19 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Leo Yan, John Garry, Will Deacon, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Kajol Jain, James Clark,
	German Gomez, Adrian Hunter, Riccardo Mancini, Andi Kleen,
	Alexey Bayduraev, Alexander Antonov, linux-perf-users,
	linux-kernel, Stephane Eranian

Em Mon, May 02, 2022 at 09:17:54PM -0700, Ian Rogers escreveu:
> perf_evlist's user_requested_cpus can contain CPUs not present in any
> evsel's cpus, for example uncore counters. Avoid printing the prefix and
> trailing \n until the first valid counter is encountered.
> 
> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>


Thanks, applied.

- Arnaldo


> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/util/stat-display.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index d9629a83aa78..13f705737367 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -948,8 +948,6 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
>  		struct evsel *counter;
>  		bool first = true;
>  
> -		if (prefix)
> -			fputs(prefix, config->output);
>  		evlist__for_each_entry(evlist, counter) {
>  			u64 ena, run, val;
>  			double uval;
> @@ -961,6 +959,8 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
>  
>  			id = aggr_cpu_id__cpu(cpu, /*data=*/NULL);
>  			if (first) {
> +				if (prefix)
> +					fputs(prefix, config->output);
>  				aggr_printout(config, counter, id, 0);
>  				first = false;
>  			}
> @@ -972,7 +972,8 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
>  			printout(config, id, 0, counter, uval, prefix,
>  				 run, ena, 1.0, &rt_stat);
>  		}
> -		fputc('\n', config->output);
> +		if (!first)
> +			fputc('\n', config->output);
>  	}
>  }
>  
> -- 
> 2.36.0.464.gb9c8b46e94-goog

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 4/6] perf cpumap: Handle dummy maps as empty in subset
  2022-05-03 14:03     ` Ian Rogers
@ 2022-05-04 12:54       ` Adrian Hunter
  2022-05-04 13:59         ` Ian Rogers
  0 siblings, 1 reply; 17+ messages in thread
From: Adrian Hunter @ 2022-05-04 12:54 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel,
	Stephane Eranian

On 3/05/22 17:03, Ian Rogers wrote:
> On Tue, May 3, 2022 at 12:43 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>>
>> On 3/05/22 07:17, Ian Rogers wrote:
>>> perf_cpu_map__empty is true for empty and dummy maps. Make is_subset
>>> respect that.
>>
>> As I wrote before, I am not keen on this because it prevents -1, as a
>> valid 3rd parameter to perf_event_open(), from being represented
>> in merged evsel cpu maps.
>>
>> Why do you want this?
> 
> Thanks Adrian, could you give me a test case (command line) where the
> differing dummy and empty behavior matters?

perf record --per-thread -e intel_pt// uname

With patchset "perf intel-pt: Better support for perf record --cpu"
the above will have (assuming 8-CPUs):
	user_requested_cpus = {-1}
	intel_pt evsel->cpus = {-1}
	text_poke dummy evsel->cpus = {0-7}
which when merged would result in:
	before this patch: all_cpus = {-1-7}
	after this patch:  all_cpus = {0-7}

The absence of -1 will mean that the intel_pt event does not get
mmapped.

>                                             Normally cpus/own_cpus are
> set to null during parsing. They may get replaced with
> user_requested_cpus:
> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/lib/perf/evlist.c?h=perf/core#n44
> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/lib/perf/evlist.c?h=perf/core#n45
> (should it be on line 45 that !empty is expected?)
> 
> During merge the null/empty all_cpus drops this value, which doesn't
> matter as the behavior with empty is the same as dummy:
> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/lib/perf/evsel.c?h=perf/core#n119
> 
> What's concerning me is the definition of empty:
> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/lib/perf/cpumap.c?h=perf/core#n279
> ```
> return map ? map->map[0].cpu == -1 : true;
> ```
> If the first entry can be -1 and there can be other CPUs merged after
> then that cpu map will be empty by the definition above. Perhaps it
> should be:
> ```
> return map ? (map->nr == 1 && map->map[0].cpu == -1) : true;
> ```
> but it seems you prefer:
> ```
> return (map == NULL) ? true : false;
> ```
> 
> You'd asked what the behavior with a dummy is and clearly it is
> somewhat muddy. That is what this patch and unit test is trying to
> clean up.
> 
> Thanks,
> Ian
> 
>>>
>>> Signed-off-by: Ian Rogers <irogers@google.com>
>>> ---
>>>  tools/lib/perf/cpumap.c   |  4 ++--
>>>  tools/perf/tests/cpumap.c | 10 +++++++++-
>>>  2 files changed, 11 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
>>> index 384d5e076ee4..9c83675788c2 100644
>>> --- a/tools/lib/perf/cpumap.c
>>> +++ b/tools/lib/perf/cpumap.c
>>> @@ -322,9 +322,9 @@ struct perf_cpu perf_cpu_map__max(struct perf_cpu_map *map)
>>>  /** Is 'b' a subset of 'a'. */
>>>  bool perf_cpu_map__is_subset(const struct perf_cpu_map *a, const struct perf_cpu_map *b)
>>>  {
>>> -     if (a == b || !b)
>>> +     if (a == b || perf_cpu_map__empty(b))
>>>               return true;
>>> -     if (!a || b->nr > a->nr)
>>> +     if (perf_cpu_map__empty(a) || b->nr > a->nr)
>>>               return false;
>>>
>>>       for (int i = 0, j = 0; i < a->nr; i++) {
>>> diff --git a/tools/perf/tests/cpumap.c b/tools/perf/tests/cpumap.c
>>> index f94929ebb54b..d52b58395385 100644
>>> --- a/tools/perf/tests/cpumap.c
>>> +++ b/tools/perf/tests/cpumap.c
>>> @@ -128,13 +128,21 @@ static int test__cpu_map_merge(struct test_suite *test __maybe_unused, int subte
>>>       struct perf_cpu_map *a = perf_cpu_map__new("4,2,1");
>>>       struct perf_cpu_map *b = perf_cpu_map__new("4,5,7");
>>>       struct perf_cpu_map *c = perf_cpu_map__merge(a, b);
>>> +     struct perf_cpu_map *d = perf_cpu_map__dummy_new();
>>> +     struct perf_cpu_map *e = perf_cpu_map__merge(b, d);
>>>       char buf[100];
>>>
>>>       TEST_ASSERT_VAL("failed to merge map: bad nr", perf_cpu_map__nr(c) == 5);
>>>       cpu_map__snprint(c, buf, sizeof(buf));
>>>       TEST_ASSERT_VAL("failed to merge map: bad result", !strcmp(buf, "1-2,4-5,7"));
>>> -     perf_cpu_map__put(b);
>>> +
>>> +     TEST_ASSERT_VAL("failed to merge map: bad nr", perf_cpu_map__nr(e) == 3);
>>> +     cpu_map__snprint(e, buf, sizeof(buf));
>>> +     TEST_ASSERT_VAL("failed to merge map: bad result", !strcmp(buf, "4-5,7"));
>>> +
>>>       perf_cpu_map__put(c);
>>> +     perf_cpu_map__put(d);
>>> +     perf_cpu_map__put(e);
>>>       return 0;
>>>  }
>>>
>>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 4/6] perf cpumap: Handle dummy maps as empty in subset
  2022-05-04 12:54       ` Adrian Hunter
@ 2022-05-04 13:59         ` Ian Rogers
  2022-05-04 16:33           ` Ian Rogers
  0 siblings, 1 reply; 17+ messages in thread
From: Ian Rogers @ 2022-05-04 13:59 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel,
	Stephane Eranian

On Wed, May 4, 2022 at 5:54 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> On 3/05/22 17:03, Ian Rogers wrote:
> > On Tue, May 3, 2022 at 12:43 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
> >>
> >> On 3/05/22 07:17, Ian Rogers wrote:
> >>> perf_cpu_map__empty is true for empty and dummy maps. Make is_subset
> >>> respect that.
> >>
> >> As I wrote before, I am not keen on this because it prevents -1, as a
> >> valid 3rd parameter to perf_event_open(), from being represented
> >> in merged evsel cpu maps.
> >>
> >> Why do you want this?
> >
> > Thanks Adrian, could you give me a test case (command line) where the
> > differing dummy and empty behavior matters?
>
> perf record --per-thread -e intel_pt// uname
>
> With patchset "perf intel-pt: Better support for perf record --cpu"
> the above will have (assuming 8-CPUs):
>         user_requested_cpus = {-1}
>         intel_pt evsel->cpus = {-1}
>         text_poke dummy evsel->cpus = {0-7}
> which when merged would result in:
>         before this patch: all_cpus = {-1-7}
>         after this patch:  all_cpus = {0-7}
>
> The absence of -1 will mean that the intel_pt event does not get
> mmapped.

Thanks, so what's the right fix? To make this work we should:
 - remove language of dummy being a singular cpu_map
 - change perf_cpu_map__empty to be something like
perf_cpu_map__empty_or_just_dummy
 - change cpu_map__is_dummy to be something llike cpu_map__singular_dummy
 - add tests on cpu map code for things like the evlist affiinity
iterator as I'm not clear what will happen when it encounters a -1 CPU
Note, I'm proposing changing function names rather than implementation
behavior, as we don't have enough tests to give me confidence that
changing the existing behavior wouldn't break something. For example,
--per-thread mode was recently broken:
https://lore.kernel.org/lkml/e1ce0d93-88cc-af79-e67e-d3c79d166ca6@gmail.com/
Do we also need to fix up parse events for software events (e.g.
faults) where the cpu map is empty but really should be dummy? This
will likely need a propagate fix as during the parsing propagation
user_requested_cpus is empty and we want to keep dummy cpu maps, not
overwrite them with empty.

I see a fair amount of clean up here, which isn't bad. My assumed
alternative was that the intel_pt code could look for dummy cpu maps
on the evsels, but then why have dummy cpu maps at all and just use
empty throughout the code base? We could also add a flag to the evlist
to say whether any evsel cpu maps contain a dummy/empty map. API wise
I'm tempted to say that the dummy CPU map should be removed and empty
just used in its place (less is more, keep it simple).

Something that would help with clarity, I think, would be to land the fix in:
https://lore.kernel.org/lkml/20220503041757.2365696-3-irogers@google.com/
as currently all_cpus contains cpus not in the evsel cpu maps, but are
residue from when the evsels were parsed.

Thanks,
Ian

> >                                             Normally cpus/own_cpus are
> > set to null during parsing. They may get replaced with
> > user_requested_cpus:
> > https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/lib/perf/evlist.c?h=perf/core#n44
> > https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/lib/perf/evlist.c?h=perf/core#n45
> > (should it be on line 45 that !empty is expected?)
> >
> > During merge the null/empty all_cpus drops this value, which doesn't
> > matter as the behavior with empty is the same as dummy:
> > https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/lib/perf/evsel.c?h=perf/core#n119
> >
> > What's concerning me is the definition of empty:
> > https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/lib/perf/cpumap.c?h=perf/core#n279
> > ```
> > return map ? map->map[0].cpu == -1 : true;
> > ```
> > If the first entry can be -1 and there can be other CPUs merged after
> > then that cpu map will be empty by the definition above. Perhaps it
> > should be:
> > ```
> > return map ? (map->nr == 1 && map->map[0].cpu == -1) : true;
> > ```
> > but it seems you prefer:
> > ```
> > return (map == NULL) ? true : false;
> > ```
> >
> > You'd asked what the behavior with a dummy is and clearly it is
> > somewhat muddy. That is what this patch and unit test is trying to
> > clean up.
> >
> > Thanks,
> > Ian
> >
> >>>
> >>> Signed-off-by: Ian Rogers <irogers@google.com>
> >>> ---
> >>>  tools/lib/perf/cpumap.c   |  4 ++--
> >>>  tools/perf/tests/cpumap.c | 10 +++++++++-
> >>>  2 files changed, 11 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
> >>> index 384d5e076ee4..9c83675788c2 100644
> >>> --- a/tools/lib/perf/cpumap.c
> >>> +++ b/tools/lib/perf/cpumap.c
> >>> @@ -322,9 +322,9 @@ struct perf_cpu perf_cpu_map__max(struct perf_cpu_map *map)
> >>>  /** Is 'b' a subset of 'a'. */
> >>>  bool perf_cpu_map__is_subset(const struct perf_cpu_map *a, const struct perf_cpu_map *b)
> >>>  {
> >>> -     if (a == b || !b)
> >>> +     if (a == b || perf_cpu_map__empty(b))
> >>>               return true;
> >>> -     if (!a || b->nr > a->nr)
> >>> +     if (perf_cpu_map__empty(a) || b->nr > a->nr)
> >>>               return false;
> >>>
> >>>       for (int i = 0, j = 0; i < a->nr; i++) {
> >>> diff --git a/tools/perf/tests/cpumap.c b/tools/perf/tests/cpumap.c
> >>> index f94929ebb54b..d52b58395385 100644
> >>> --- a/tools/perf/tests/cpumap.c
> >>> +++ b/tools/perf/tests/cpumap.c
> >>> @@ -128,13 +128,21 @@ static int test__cpu_map_merge(struct test_suite *test __maybe_unused, int subte
> >>>       struct perf_cpu_map *a = perf_cpu_map__new("4,2,1");
> >>>       struct perf_cpu_map *b = perf_cpu_map__new("4,5,7");
> >>>       struct perf_cpu_map *c = perf_cpu_map__merge(a, b);
> >>> +     struct perf_cpu_map *d = perf_cpu_map__dummy_new();
> >>> +     struct perf_cpu_map *e = perf_cpu_map__merge(b, d);
> >>>       char buf[100];
> >>>
> >>>       TEST_ASSERT_VAL("failed to merge map: bad nr", perf_cpu_map__nr(c) == 5);
> >>>       cpu_map__snprint(c, buf, sizeof(buf));
> >>>       TEST_ASSERT_VAL("failed to merge map: bad result", !strcmp(buf, "1-2,4-5,7"));
> >>> -     perf_cpu_map__put(b);
> >>> +
> >>> +     TEST_ASSERT_VAL("failed to merge map: bad nr", perf_cpu_map__nr(e) == 3);
> >>> +     cpu_map__snprint(e, buf, sizeof(buf));
> >>> +     TEST_ASSERT_VAL("failed to merge map: bad result", !strcmp(buf, "4-5,7"));
> >>> +
> >>>       perf_cpu_map__put(c);
> >>> +     perf_cpu_map__put(d);
> >>> +     perf_cpu_map__put(e);
> >>>       return 0;
> >>>  }
> >>>
> >>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 2/6] perf evlist: Clear all_cpus before propagating
  2022-05-03  4:17 ` [PATCH v5 2/6] perf evlist: Clear all_cpus before propagating Ian Rogers
@ 2022-05-04 14:15   ` Adrian Hunter
  2022-05-05 17:39     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 17+ messages in thread
From: Adrian Hunter @ 2022-05-04 14:15 UTC (permalink / raw)
  To: Ian Rogers, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Leo Yan, John Garry, Will Deacon, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Kajol Jain, James Clark,
	German Gomez, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel
  Cc: Stephane Eranian

On 3/05/22 07:17, Ian Rogers wrote:
> all_cpus is merged into during propagation. Initially all_cpus is set
> from PMU sysfs. perf_evlist__set_maps will recompute it and change
> evsel->cpus to user_requested_cpus if they are given. If all_cpus isn't
> cleared then the union of the user_requested_cpus and PMU sysfs values
> is set to all_cpus, whereas just user_requested_cpus is necessary. To
> avoid this make all_cpus empty prior to propagation.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>

> ---
>  tools/lib/perf/evlist.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> index a09315538a30..974b4585f93e 100644
> --- a/tools/lib/perf/evlist.c
> +++ b/tools/lib/perf/evlist.c
> @@ -59,6 +59,10 @@ static void perf_evlist__propagate_maps(struct perf_evlist *evlist)
>  {
>  	struct perf_evsel *evsel;
>  
> +	/* Recomputing all_cpus, so start with a blank slate. */
> +	perf_cpu_map__put(evlist->all_cpus);
> +	evlist->all_cpus = NULL;
> +
>  	perf_evlist__for_each_evsel(evlist, evsel)
>  		__perf_evlist__propagate_maps(evlist, evsel);
>  }


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 4/6] perf cpumap: Handle dummy maps as empty in subset
  2022-05-04 13:59         ` Ian Rogers
@ 2022-05-04 16:33           ` Ian Rogers
  0 siblings, 0 replies; 17+ messages in thread
From: Ian Rogers @ 2022-05-04 16:33 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel,
	Stephane Eranian

On Wed, May 4, 2022 at 6:59 AM Ian Rogers <irogers@google.com> wrote:
>
> On Wed, May 4, 2022 at 5:54 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
> >
> > On 3/05/22 17:03, Ian Rogers wrote:
> > > On Tue, May 3, 2022 at 12:43 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
> > >>
> > >> On 3/05/22 07:17, Ian Rogers wrote:
> > >>> perf_cpu_map__empty is true for empty and dummy maps. Make is_subset
> > >>> respect that.
> > >>
> > >> As I wrote before, I am not keen on this because it prevents -1, as a
> > >> valid 3rd parameter to perf_event_open(), from being represented
> > >> in merged evsel cpu maps.
> > >>
> > >> Why do you want this?
> > >
> > > Thanks Adrian, could you give me a test case (command line) where the
> > > differing dummy and empty behavior matters?
> >
> > perf record --per-thread -e intel_pt// uname
> >
> > With patchset "perf intel-pt: Better support for perf record --cpu"
> > the above will have (assuming 8-CPUs):
> >         user_requested_cpus = {-1}
> >         intel_pt evsel->cpus = {-1}
> >         text_poke dummy evsel->cpus = {0-7}
> > which when merged would result in:
> >         before this patch: all_cpus = {-1-7}
> >         after this patch:  all_cpus = {0-7}
> >
> > The absence of -1 will mean that the intel_pt event does not get
> > mmapped.
>
> Thanks, so what's the right fix? To make this work we should:
>  - remove language of dummy being a singular cpu_map
>  - change perf_cpu_map__empty to be something like
> perf_cpu_map__empty_or_just_dummy
>  - change cpu_map__is_dummy to be something llike cpu_map__singular_dummy
>  - add tests on cpu map code for things like the evlist affiinity
> iterator as I'm not clear what will happen when it encounters a -1 CPU
> Note, I'm proposing changing function names rather than implementation
> behavior, as we don't have enough tests to give me confidence that
> changing the existing behavior wouldn't break something. For example,
> --per-thread mode was recently broken:
> https://lore.kernel.org/lkml/e1ce0d93-88cc-af79-e67e-d3c79d166ca6@gmail.com/
> Do we also need to fix up parse events for software events (e.g.
> faults) where the cpu map is empty but really should be dummy? This
> will likely need a propagate fix as during the parsing propagation
> user_requested_cpus is empty and we want to keep dummy cpu maps, not
> overwrite them with empty.
>
> I see a fair amount of clean up here, which isn't bad. My assumed
> alternative was that the intel_pt code could look for dummy cpu maps
> on the evsels, but then why have dummy cpu maps at all and just use
> empty throughout the code base? We could also add a flag to the evlist
> to say whether any evsel cpu maps contain a dummy/empty map. API wise
> I'm tempted to say that the dummy CPU map should be removed and empty
> just used in its place (less is more, keep it simple).
>
> Something that would help with clarity, I think, would be to land the fix in:
> https://lore.kernel.org/lkml/20220503041757.2365696-3-irogers@google.com/
> as currently all_cpus contains cpus not in the evsel cpu maps, but are
> residue from when the evsels were parsed.

Just to think out loud on this a bit more. If we imagine the CPU in
the CPU map is going to be consumed by perf_event_open then the
definition of CPU there is:
cpu == -1: Any CPU
cpu >= 0: Specified CPU
A CPU map of {-1-7} would mean any CPU and CPUs 0-7, which for an
evsel would be something of a contradiction. We also have the
contradiction that no CPUs is turned into -1, which means any CPU.
This leads me to think:
 - let's rename dummy to any CPU, this is more intention revealing and
may avoid confusion with the dummy event;
 - let's make empty really mean empty, no specified or any CPU values.
If an evsel is opened, .. with an empty CPU map it is a failure (this
will need parse-event fixes, etc.);
 - let's allow {-1-7} as a CPU map, ie allow any with specified CPUS,
and document that it comes about from merging CPU maps;
 - let's try to push the cleaned up API into libperf and get rid of
leftover bits in tools/perf/util/cpumap.h.

In terms of scheduling I think we can land what's in this series
except this patch and the rename of all_cpus in patch 6/6. This avoids
merge conflicts to more easily land the intel_pt --cpu fixes in:
https://lore.kernel.org/lkml/20220422162402.147958-1-adrian.hunter@intel.com/
With that landed then I can do the function rename and cleanup work.
Things like making sure the evlist affinity iterator works an any CPU
in the CPU map.

What do you think?

Thanks,
Ian

> Thanks,
> Ian
>
> > >                                             Normally cpus/own_cpus are
> > > set to null during parsing. They may get replaced with
> > > user_requested_cpus:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/lib/perf/evlist.c?h=perf/core#n44
> > > https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/lib/perf/evlist.c?h=perf/core#n45
> > > (should it be on line 45 that !empty is expected?)
> > >
> > > During merge the null/empty all_cpus drops this value, which doesn't
> > > matter as the behavior with empty is the same as dummy:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/lib/perf/evsel.c?h=perf/core#n119
> > >
> > > What's concerning me is the definition of empty:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tree/tools/lib/perf/cpumap.c?h=perf/core#n279
> > > ```
> > > return map ? map->map[0].cpu == -1 : true;
> > > ```
> > > If the first entry can be -1 and there can be other CPUs merged after
> > > then that cpu map will be empty by the definition above. Perhaps it
> > > should be:
> > > ```
> > > return map ? (map->nr == 1 && map->map[0].cpu == -1) : true;
> > > ```
> > > but it seems you prefer:
> > > ```
> > > return (map == NULL) ? true : false;
> > > ```
> > >
> > > You'd asked what the behavior with a dummy is and clearly it is
> > > somewhat muddy. That is what this patch and unit test is trying to
> > > clean up.
> > >
> > > Thanks,
> > > Ian
> > >
> > >>>
> > >>> Signed-off-by: Ian Rogers <irogers@google.com>
> > >>> ---
> > >>>  tools/lib/perf/cpumap.c   |  4 ++--
> > >>>  tools/perf/tests/cpumap.c | 10 +++++++++-
> > >>>  2 files changed, 11 insertions(+), 3 deletions(-)
> > >>>
> > >>> diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
> > >>> index 384d5e076ee4..9c83675788c2 100644
> > >>> --- a/tools/lib/perf/cpumap.c
> > >>> +++ b/tools/lib/perf/cpumap.c
> > >>> @@ -322,9 +322,9 @@ struct perf_cpu perf_cpu_map__max(struct perf_cpu_map *map)
> > >>>  /** Is 'b' a subset of 'a'. */
> > >>>  bool perf_cpu_map__is_subset(const struct perf_cpu_map *a, const struct perf_cpu_map *b)
> > >>>  {
> > >>> -     if (a == b || !b)
> > >>> +     if (a == b || perf_cpu_map__empty(b))
> > >>>               return true;
> > >>> -     if (!a || b->nr > a->nr)
> > >>> +     if (perf_cpu_map__empty(a) || b->nr > a->nr)
> > >>>               return false;
> > >>>
> > >>>       for (int i = 0, j = 0; i < a->nr; i++) {
> > >>> diff --git a/tools/perf/tests/cpumap.c b/tools/perf/tests/cpumap.c
> > >>> index f94929ebb54b..d52b58395385 100644
> > >>> --- a/tools/perf/tests/cpumap.c
> > >>> +++ b/tools/perf/tests/cpumap.c
> > >>> @@ -128,13 +128,21 @@ static int test__cpu_map_merge(struct test_suite *test __maybe_unused, int subte
> > >>>       struct perf_cpu_map *a = perf_cpu_map__new("4,2,1");
> > >>>       struct perf_cpu_map *b = perf_cpu_map__new("4,5,7");
> > >>>       struct perf_cpu_map *c = perf_cpu_map__merge(a, b);
> > >>> +     struct perf_cpu_map *d = perf_cpu_map__dummy_new();
> > >>> +     struct perf_cpu_map *e = perf_cpu_map__merge(b, d);
> > >>>       char buf[100];
> > >>>
> > >>>       TEST_ASSERT_VAL("failed to merge map: bad nr", perf_cpu_map__nr(c) == 5);
> > >>>       cpu_map__snprint(c, buf, sizeof(buf));
> > >>>       TEST_ASSERT_VAL("failed to merge map: bad result", !strcmp(buf, "1-2,4-5,7"));
> > >>> -     perf_cpu_map__put(b);
> > >>> +
> > >>> +     TEST_ASSERT_VAL("failed to merge map: bad nr", perf_cpu_map__nr(e) == 3);
> > >>> +     cpu_map__snprint(e, buf, sizeof(buf));
> > >>> +     TEST_ASSERT_VAL("failed to merge map: bad result", !strcmp(buf, "4-5,7"));
> > >>> +
> > >>>       perf_cpu_map__put(c);
> > >>> +     perf_cpu_map__put(d);
> > >>> +     perf_cpu_map__put(e);
> > >>>       return 0;
> > >>>  }
> > >>>
> > >>
> >

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 1/6] perf cpumap: Switch to using perf_cpu_map API
  2022-05-03  4:17 ` [PATCH v5 1/6] perf cpumap: Switch to using perf_cpu_map API Ian Rogers
@ 2022-05-04 17:44   ` Namhyung Kim
  2022-05-05 17:39     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 17+ messages in thread
From: Namhyung Kim @ 2022-05-04 17:44 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Mathieu Poirier,
	Suzuki K Poulose, Mike Leach, Leo Yan, John Garry, Will Deacon,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Kajol Jain, James Clark, German Gomez, Adrian Hunter,
	Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel,
	Stephane Eranian

Hi Ian,

On Mon, May 2, 2022 at 9:18 PM Ian Rogers <irogers@google.com> wrote:
>
> Switch some raw accesses to the cpu map to using the library API. This
> can help with reference count checking. Some BPF cases switch from index
> to CPU for consistency, this shouldn't matter as the CPU map is full.
>
> Signed-off-by: Ian Rogers <irogers@google.com>

Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung


> ---
>  tools/perf/builtin-record.c          | 13 +++++----
>  tools/perf/util/bpf_counter_cgroup.c | 42 +++++++++++++---------------
>  2 files changed, 26 insertions(+), 29 deletions(-)
>
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 069825c48d40..a5cf6a99d67f 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1011,7 +1011,7 @@ static int record__thread_data_init_maps(struct record_thread *thread_data, stru
>
>         for (m = 0, tm = 0; m < nr_mmaps && tm < thread_data->nr_mmaps; m++) {
>                 if (cpu_map__is_dummy(cpus) ||
> -                   test_bit(cpus->map[m].cpu, thread_data->mask->maps.bits)) {
> +                   test_bit(perf_cpu_map__cpu(cpus, m).cpu, thread_data->mask->maps.bits)) {
>                         if (thread_data->maps) {
>                                 thread_data->maps[tm] = &mmap[m];
>                                 pr_debug2("thread_data[%p]: cpu%d: maps[%d] -> mmap[%d]\n",
> @@ -3331,13 +3331,14 @@ struct option *record_options = __record_options;
>
>  static void record__mmap_cpu_mask_init(struct mmap_cpu_mask *mask, struct perf_cpu_map *cpus)
>  {
> -       int c;
> +       struct perf_cpu cpu;
> +       int idx;
>
>         if (cpu_map__is_dummy(cpus))
>                 return;
>
> -       for (c = 0; c < cpus->nr; c++)
> -               set_bit(cpus->map[c].cpu, mask->bits);
> +       perf_cpu_map__for_each_cpu(cpu, idx, cpus)
> +               set_bit(cpu.cpu, mask->bits);
>  }
>
>  static int record__mmap_cpu_mask_init_spec(struct mmap_cpu_mask *mask, const char *mask_spec)
> @@ -3404,8 +3405,8 @@ static int record__init_thread_cpu_masks(struct record *rec, struct perf_cpu_map
>         pr_debug("nr_threads: %d\n", rec->nr_threads);
>
>         for (t = 0; t < rec->nr_threads; t++) {
> -               set_bit(cpus->map[t].cpu, rec->thread_masks[t].maps.bits);
> -               set_bit(cpus->map[t].cpu, rec->thread_masks[t].affinity.bits);
> +               set_bit(perf_cpu_map__cpu(cpus, t).cpu, rec->thread_masks[t].maps.bits);
> +               set_bit(perf_cpu_map__cpu(cpus, t).cpu, rec->thread_masks[t].affinity.bits);
>                 if (verbose) {
>                         pr_debug("thread_masks[%d]: ", t);
>                         mmap_cpu_mask__scnprintf(&rec->thread_masks[t].maps, "maps");
> diff --git a/tools/perf/util/bpf_counter_cgroup.c b/tools/perf/util/bpf_counter_cgroup.c
> index ac60c08e8e2a..63b9db657442 100644
> --- a/tools/perf/util/bpf_counter_cgroup.c
> +++ b/tools/perf/util/bpf_counter_cgroup.c
> @@ -46,8 +46,8 @@ static int bperf_load_program(struct evlist *evlist)
>         struct bpf_link *link;
>         struct evsel *evsel;
>         struct cgroup *cgrp, *leader_cgrp;
> -       __u32 i, cpu;
> -       __u32 nr_cpus = evlist->core.all_cpus->nr;
> +       int i, j;
> +       struct perf_cpu cpu;
>         int total_cpus = cpu__max_cpu().cpu;
>         int map_size, map_fd;
>         int prog_fd, err;
> @@ -93,9 +93,9 @@ static int bperf_load_program(struct evlist *evlist)
>                 goto out;
>         }
>
> -       for (i = 0; i < nr_cpus; i++) {
> +       perf_cpu_map__for_each_cpu(cpu, i, evlist->core.all_cpus) {
>                 link = bpf_program__attach_perf_event(skel->progs.on_cgrp_switch,
> -                                                     FD(cgrp_switch, i));
> +                                                     FD(cgrp_switch, cpu.cpu));
>                 if (IS_ERR(link)) {
>                         pr_err("Failed to attach cgroup program\n");
>                         err = PTR_ERR(link);
> @@ -122,10 +122,9 @@ static int bperf_load_program(struct evlist *evlist)
>                         }
>
>                         map_fd = bpf_map__fd(skel->maps.events);
> -                       for (cpu = 0; cpu < nr_cpus; cpu++) {
> -                               int fd = FD(evsel, cpu);
> -                               __u32 idx = evsel->core.idx * total_cpus +
> -                                       evlist->core.all_cpus->map[cpu].cpu;
> +                       perf_cpu_map__for_each_cpu(cpu, j, evlist->core.all_cpus) {
> +                               int fd = FD(evsel, cpu.cpu);
> +                               __u32 idx = evsel->core.idx * total_cpus + cpu.cpu;
>
>                                 err = bpf_map_update_elem(map_fd, &idx, &fd,
>                                                           BPF_ANY);
> @@ -207,14 +206,12 @@ static int bperf_cgrp__install_pe(struct evsel *evsel __maybe_unused,
>   */
>  static int bperf_cgrp__sync_counters(struct evlist *evlist)
>  {
> -       int i, cpu;
> -       int nr_cpus = evlist->core.all_cpus->nr;
> +       struct perf_cpu cpu;
> +       int idx;
>         int prog_fd = bpf_program__fd(skel->progs.trigger_read);
>
> -       for (i = 0; i < nr_cpus; i++) {
> -               cpu = evlist->core.all_cpus->map[i].cpu;
> -               bperf_trigger_reading(prog_fd, cpu);
> -       }
> +       perf_cpu_map__for_each_cpu(cpu, idx, evlist->core.all_cpus)
> +               bperf_trigger_reading(prog_fd, cpu.cpu);
>
>         return 0;
>  }
> @@ -244,12 +241,10 @@ static int bperf_cgrp__disable(struct evsel *evsel)
>  static int bperf_cgrp__read(struct evsel *evsel)
>  {
>         struct evlist *evlist = evsel->evlist;
> -       int i, cpu, nr_cpus = evlist->core.all_cpus->nr;
>         int total_cpus = cpu__max_cpu().cpu;
>         struct perf_counts_values *counts;
>         struct bpf_perf_event_value *values;
>         int reading_map_fd, err = 0;
> -       __u32 idx;
>
>         if (evsel->core.idx)
>                 return 0;
> @@ -263,7 +258,10 @@ static int bperf_cgrp__read(struct evsel *evsel)
>         reading_map_fd = bpf_map__fd(skel->maps.cgrp_readings);
>
>         evlist__for_each_entry(evlist, evsel) {
> -               idx = evsel->core.idx;
> +               __u32 idx = evsel->core.idx;
> +               int i;
> +               struct perf_cpu cpu;
> +
>                 err = bpf_map_lookup_elem(reading_map_fd, &idx, values);
>                 if (err) {
>                         pr_err("bpf map lookup failed: idx=%u, event=%s, cgrp=%s\n",
> @@ -271,13 +269,11 @@ static int bperf_cgrp__read(struct evsel *evsel)
>                         goto out;
>                 }
>
> -               for (i = 0; i < nr_cpus; i++) {
> -                       cpu = evlist->core.all_cpus->map[i].cpu;
> -
> +               perf_cpu_map__for_each_cpu(cpu, i, evlist->core.all_cpus) {
>                         counts = perf_counts(evsel->counts, i, 0);
> -                       counts->val = values[cpu].counter;
> -                       counts->ena = values[cpu].enabled;
> -                       counts->run = values[cpu].running;
> +                       counts->val = values[cpu.cpu].counter;
> +                       counts->ena = values[cpu.cpu].enabled;
> +                       counts->run = values[cpu.cpu].running;
>                 }
>         }
>
> --
> 2.36.0.464.gb9c8b46e94-goog
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 1/6] perf cpumap: Switch to using perf_cpu_map API
  2022-05-04 17:44   ` Namhyung Kim
@ 2022-05-05 17:39     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-05-05 17:39 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ian Rogers, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Leo Yan, John Garry, Will Deacon, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Kajol Jain, James Clark,
	German Gomez, Adrian Hunter, Riccardo Mancini, Andi Kleen,
	Alexey Bayduraev, Alexander Antonov, linux-perf-users,
	linux-kernel, Stephane Eranian

Em Wed, May 04, 2022 at 10:44:42AM -0700, Namhyung Kim escreveu:
> Hi Ian,
> 
> On Mon, May 2, 2022 at 9:18 PM Ian Rogers <irogers@google.com> wrote:
> >
> > Switch some raw accesses to the cpu map to using the library API. This
> > can help with reference count checking. Some BPF cases switch from index
> > to CPU for consistency, this shouldn't matter as the CPU map is full.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> 
> Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks, applied.

- Arnaldo

 
> Thanks,
> Namhyung
> 
> 
> > ---
> >  tools/perf/builtin-record.c          | 13 +++++----
> >  tools/perf/util/bpf_counter_cgroup.c | 42 +++++++++++++---------------
> >  2 files changed, 26 insertions(+), 29 deletions(-)
> >
> > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > index 069825c48d40..a5cf6a99d67f 100644
> > --- a/tools/perf/builtin-record.c
> > +++ b/tools/perf/builtin-record.c
> > @@ -1011,7 +1011,7 @@ static int record__thread_data_init_maps(struct record_thread *thread_data, stru
> >
> >         for (m = 0, tm = 0; m < nr_mmaps && tm < thread_data->nr_mmaps; m++) {
> >                 if (cpu_map__is_dummy(cpus) ||
> > -                   test_bit(cpus->map[m].cpu, thread_data->mask->maps.bits)) {
> > +                   test_bit(perf_cpu_map__cpu(cpus, m).cpu, thread_data->mask->maps.bits)) {
> >                         if (thread_data->maps) {
> >                                 thread_data->maps[tm] = &mmap[m];
> >                                 pr_debug2("thread_data[%p]: cpu%d: maps[%d] -> mmap[%d]\n",
> > @@ -3331,13 +3331,14 @@ struct option *record_options = __record_options;
> >
> >  static void record__mmap_cpu_mask_init(struct mmap_cpu_mask *mask, struct perf_cpu_map *cpus)
> >  {
> > -       int c;
> > +       struct perf_cpu cpu;
> > +       int idx;
> >
> >         if (cpu_map__is_dummy(cpus))
> >                 return;
> >
> > -       for (c = 0; c < cpus->nr; c++)
> > -               set_bit(cpus->map[c].cpu, mask->bits);
> > +       perf_cpu_map__for_each_cpu(cpu, idx, cpus)
> > +               set_bit(cpu.cpu, mask->bits);
> >  }
> >
> >  static int record__mmap_cpu_mask_init_spec(struct mmap_cpu_mask *mask, const char *mask_spec)
> > @@ -3404,8 +3405,8 @@ static int record__init_thread_cpu_masks(struct record *rec, struct perf_cpu_map
> >         pr_debug("nr_threads: %d\n", rec->nr_threads);
> >
> >         for (t = 0; t < rec->nr_threads; t++) {
> > -               set_bit(cpus->map[t].cpu, rec->thread_masks[t].maps.bits);
> > -               set_bit(cpus->map[t].cpu, rec->thread_masks[t].affinity.bits);
> > +               set_bit(perf_cpu_map__cpu(cpus, t).cpu, rec->thread_masks[t].maps.bits);
> > +               set_bit(perf_cpu_map__cpu(cpus, t).cpu, rec->thread_masks[t].affinity.bits);
> >                 if (verbose) {
> >                         pr_debug("thread_masks[%d]: ", t);
> >                         mmap_cpu_mask__scnprintf(&rec->thread_masks[t].maps, "maps");
> > diff --git a/tools/perf/util/bpf_counter_cgroup.c b/tools/perf/util/bpf_counter_cgroup.c
> > index ac60c08e8e2a..63b9db657442 100644
> > --- a/tools/perf/util/bpf_counter_cgroup.c
> > +++ b/tools/perf/util/bpf_counter_cgroup.c
> > @@ -46,8 +46,8 @@ static int bperf_load_program(struct evlist *evlist)
> >         struct bpf_link *link;
> >         struct evsel *evsel;
> >         struct cgroup *cgrp, *leader_cgrp;
> > -       __u32 i, cpu;
> > -       __u32 nr_cpus = evlist->core.all_cpus->nr;
> > +       int i, j;
> > +       struct perf_cpu cpu;
> >         int total_cpus = cpu__max_cpu().cpu;
> >         int map_size, map_fd;
> >         int prog_fd, err;
> > @@ -93,9 +93,9 @@ static int bperf_load_program(struct evlist *evlist)
> >                 goto out;
> >         }
> >
> > -       for (i = 0; i < nr_cpus; i++) {
> > +       perf_cpu_map__for_each_cpu(cpu, i, evlist->core.all_cpus) {
> >                 link = bpf_program__attach_perf_event(skel->progs.on_cgrp_switch,
> > -                                                     FD(cgrp_switch, i));
> > +                                                     FD(cgrp_switch, cpu.cpu));
> >                 if (IS_ERR(link)) {
> >                         pr_err("Failed to attach cgroup program\n");
> >                         err = PTR_ERR(link);
> > @@ -122,10 +122,9 @@ static int bperf_load_program(struct evlist *evlist)
> >                         }
> >
> >                         map_fd = bpf_map__fd(skel->maps.events);
> > -                       for (cpu = 0; cpu < nr_cpus; cpu++) {
> > -                               int fd = FD(evsel, cpu);
> > -                               __u32 idx = evsel->core.idx * total_cpus +
> > -                                       evlist->core.all_cpus->map[cpu].cpu;
> > +                       perf_cpu_map__for_each_cpu(cpu, j, evlist->core.all_cpus) {
> > +                               int fd = FD(evsel, cpu.cpu);
> > +                               __u32 idx = evsel->core.idx * total_cpus + cpu.cpu;
> >
> >                                 err = bpf_map_update_elem(map_fd, &idx, &fd,
> >                                                           BPF_ANY);
> > @@ -207,14 +206,12 @@ static int bperf_cgrp__install_pe(struct evsel *evsel __maybe_unused,
> >   */
> >  static int bperf_cgrp__sync_counters(struct evlist *evlist)
> >  {
> > -       int i, cpu;
> > -       int nr_cpus = evlist->core.all_cpus->nr;
> > +       struct perf_cpu cpu;
> > +       int idx;
> >         int prog_fd = bpf_program__fd(skel->progs.trigger_read);
> >
> > -       for (i = 0; i < nr_cpus; i++) {
> > -               cpu = evlist->core.all_cpus->map[i].cpu;
> > -               bperf_trigger_reading(prog_fd, cpu);
> > -       }
> > +       perf_cpu_map__for_each_cpu(cpu, idx, evlist->core.all_cpus)
> > +               bperf_trigger_reading(prog_fd, cpu.cpu);
> >
> >         return 0;
> >  }
> > @@ -244,12 +241,10 @@ static int bperf_cgrp__disable(struct evsel *evsel)
> >  static int bperf_cgrp__read(struct evsel *evsel)
> >  {
> >         struct evlist *evlist = evsel->evlist;
> > -       int i, cpu, nr_cpus = evlist->core.all_cpus->nr;
> >         int total_cpus = cpu__max_cpu().cpu;
> >         struct perf_counts_values *counts;
> >         struct bpf_perf_event_value *values;
> >         int reading_map_fd, err = 0;
> > -       __u32 idx;
> >
> >         if (evsel->core.idx)
> >                 return 0;
> > @@ -263,7 +258,10 @@ static int bperf_cgrp__read(struct evsel *evsel)
> >         reading_map_fd = bpf_map__fd(skel->maps.cgrp_readings);
> >
> >         evlist__for_each_entry(evlist, evsel) {
> > -               idx = evsel->core.idx;
> > +               __u32 idx = evsel->core.idx;
> > +               int i;
> > +               struct perf_cpu cpu;
> > +
> >                 err = bpf_map_lookup_elem(reading_map_fd, &idx, values);
> >                 if (err) {
> >                         pr_err("bpf map lookup failed: idx=%u, event=%s, cgrp=%s\n",
> > @@ -271,13 +269,11 @@ static int bperf_cgrp__read(struct evsel *evsel)
> >                         goto out;
> >                 }
> >
> > -               for (i = 0; i < nr_cpus; i++) {
> > -                       cpu = evlist->core.all_cpus->map[i].cpu;
> > -
> > +               perf_cpu_map__for_each_cpu(cpu, i, evlist->core.all_cpus) {
> >                         counts = perf_counts(evsel->counts, i, 0);
> > -                       counts->val = values[cpu].counter;
> > -                       counts->ena = values[cpu].enabled;
> > -                       counts->run = values[cpu].running;
> > +                       counts->val = values[cpu.cpu].counter;
> > +                       counts->ena = values[cpu.cpu].enabled;
> > +                       counts->run = values[cpu.cpu].running;
> >                 }
> >         }
> >
> > --
> > 2.36.0.464.gb9c8b46e94-goog
> >

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v5 2/6] perf evlist: Clear all_cpus before propagating
  2022-05-04 14:15   ` Adrian Hunter
@ 2022-05-05 17:39     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-05-05 17:39 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ian Rogers, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Mathieu Poirier,
	Suzuki K Poulose, Mike Leach, Leo Yan, John Garry, Will Deacon,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Kajol Jain, James Clark, German Gomez,
	Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel,
	Stephane Eranian

Em Wed, May 04, 2022 at 05:15:57PM +0300, Adrian Hunter escreveu:
> On 3/05/22 07:17, Ian Rogers wrote:
> > all_cpus is merged into during propagation. Initially all_cpus is set
> > from PMU sysfs. perf_evlist__set_maps will recompute it and change
> > evsel->cpus to user_requested_cpus if they are given. If all_cpus isn't
> > cleared then the union of the user_requested_cpus and PMU sysfs values
> > is set to all_cpus, whereas just user_requested_cpus is necessary. To
> > avoid this make all_cpus empty prior to propagation.
> > 
> > Signed-off-by: Ian Rogers <irogers@google.com>
> 
> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>

Thanks, applied.

- Arnaldo

 
> > ---
> >  tools/lib/perf/evlist.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> > index a09315538a30..974b4585f93e 100644
> > --- a/tools/lib/perf/evlist.c
> > +++ b/tools/lib/perf/evlist.c
> > @@ -59,6 +59,10 @@ static void perf_evlist__propagate_maps(struct perf_evlist *evlist)
> >  {
> >  	struct perf_evsel *evsel;
> >  
> > +	/* Recomputing all_cpus, so start with a blank slate. */
> > +	perf_cpu_map__put(evlist->all_cpus);
> > +	evlist->all_cpus = NULL;
> > +
> >  	perf_evlist__for_each_evsel(evlist, evsel)
> >  		__perf_evlist__propagate_maps(evlist, evsel);
> >  }

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-05-05 17:39 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-03  4:17 [PATCH v5 0/6] Make evlist CPUs more accurate Ian Rogers
2022-05-03  4:17 ` [PATCH v5 1/6] perf cpumap: Switch to using perf_cpu_map API Ian Rogers
2022-05-04 17:44   ` Namhyung Kim
2022-05-05 17:39     ` Arnaldo Carvalho de Melo
2022-05-03  4:17 ` [PATCH v5 2/6] perf evlist: Clear all_cpus before propagating Ian Rogers
2022-05-04 14:15   ` Adrian Hunter
2022-05-05 17:39     ` Arnaldo Carvalho de Melo
2022-05-03  4:17 ` [PATCH v5 3/6] perf stat: Avoid printing cpus with no counters Ian Rogers
2022-05-03 14:19   ` Arnaldo Carvalho de Melo
2022-05-03  4:17 ` [PATCH v5 4/6] perf cpumap: Handle dummy maps as empty in subset Ian Rogers
2022-05-03  7:43   ` Adrian Hunter
2022-05-03 14:03     ` Ian Rogers
2022-05-04 12:54       ` Adrian Hunter
2022-05-04 13:59         ` Ian Rogers
2022-05-04 16:33           ` Ian Rogers
2022-05-03  4:17 ` [PATCH v5 5/6] perf evlist: Add to user_requested_cpus documentation Ian Rogers
2022-05-03  4:17 ` [PATCH v5 6/6] perf evlist: Rename all_cpus Ian Rogers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).