* [PATCH 0/5] Make evlist CPUs more accurate
@ 2022-03-28  6:24 ` Ian Rogers
  0 siblings, 0 replies; 34+ messages in thread
From: Ian Rogers @ 2022-03-28  6:24 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Adrian Hunter, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel, coresight,
	linux-arm-kernel, netdev, bpf
  Cc: Stephane Eranian, Ian Rogers

evlist has two CPU maps: all_cpus, computed as the union of all evsel CPU
maps, and cpus. cpus may contain more CPUs than all_cpus, as by default
cpus holds all online CPUs whilst all_cpus holds only the union from the
evsels. For an uncore event there may be just one CPU per socket, a far
smaller CPU map than all online CPUs.

These patches rename cpus to user_cpus, to reflect its potentially
user-specified nature. user_cpus is set to its current value intersected
with all_cpus, so that user_cpus is always a subset of all_cpus. This
fixes the metric printing code so that unnecessary blank lines aren't
printed.

To make the intersect function perform well, a perf_cpu_map__is_subset
function is added. While adding this function, also use it in
perf_cpu_map__merge to avoid creating a new CPU map for some currently
missed patterns.
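
The set semantics described above can be sketched in plain C. This is a
hypothetical, simplified model (a CPU map as a sorted array of CPU
numbers, no refcounting), not the actual libperf implementation; the
function names mirror the perf_cpu_map__is_subset/intersect idea only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Is every CPU in 'sub' also present in 'super'? Both arrays sorted. */
static bool cpu_map_is_subset(const int *super, size_t n_super,
			      const int *sub, size_t n_sub)
{
	size_t i = 0, j = 0;

	while (j < n_sub) {
		/* Advance through 'super' until it could match sub[j]. */
		while (i < n_super && super[i] < sub[j])
			i++;
		if (i == n_super || super[i] != sub[j])
			return false;
		j++;
	}
	return true;
}

/* Write the intersection of two sorted maps into 'out'; return its size. */
static size_t cpu_map_intersect(const int *a, size_t na,
				const int *b, size_t nb, int *out)
{
	size_t i = 0, j = 0, k = 0;

	while (i < na && j < nb) {
		if (a[i] < b[j])
			i++;
		else if (a[i] > b[j])
			j++;
		else {
			/* CPU present in both maps. */
			out[k++] = a[i];
			i++;
			j++;
		}
	}
	return k;
}
```

With e.g. all online CPUs {0,1,2,3} and an uncore map {0,2}, the
intersection is {0,2}, which is by construction a subset of both inputs;
the subset check lets merge/intersect return an existing map instead of
allocating a new one.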

Ian Rogers (5):
  perf evlist: Rename cpus to user_cpus
  perf cpumap: More cpu map reuse by merge.
  perf cpumap: Add intersect function.
  perf stat: Avoid segv if core.user_cpus isn't set.
  perf evlist: Respect all_cpus when setting user_cpus

 tools/lib/perf/cpumap.c                  | 76 ++++++++++++++++++++----
 tools/lib/perf/evlist.c                  | 28 ++++-----
 tools/lib/perf/include/internal/evlist.h |  4 +-
 tools/lib/perf/include/perf/cpumap.h     |  2 +
 tools/perf/arch/arm/util/cs-etm.c        |  8 +--
 tools/perf/arch/arm64/util/arm-spe.c     |  2 +-
 tools/perf/arch/x86/util/intel-bts.c     |  2 +-
 tools/perf/arch/x86/util/intel-pt.c      |  4 +-
 tools/perf/bench/evlist-open-close.c     |  2 +-
 tools/perf/builtin-ftrace.c              |  2 +-
 tools/perf/builtin-record.c              |  6 +-
 tools/perf/builtin-stat.c                | 11 ++--
 tools/perf/builtin-top.c                 |  2 +-
 tools/perf/util/auxtrace.c               |  2 +-
 tools/perf/util/bpf_ftrace.c             |  4 +-
 tools/perf/util/evlist.c                 | 16 ++---
 tools/perf/util/record.c                 |  6 +-
 tools/perf/util/sideband_evlist.c        |  2 +-
 tools/perf/util/stat-display.c           |  2 +-
 tools/perf/util/synthetic-events.c       |  2 +-
 tools/perf/util/top.c                    |  6 +-
 21 files changed, 127 insertions(+), 62 deletions(-)

-- 
2.35.1.1021.g381101b075-goog



* [PATCH 1/5] perf evlist: Rename cpus to user_cpus
  2022-03-28  6:24 ` Ian Rogers
@ 2022-03-28  6:24   ` Ian Rogers
  -1 siblings, 0 replies; 34+ messages in thread
From: Ian Rogers @ 2022-03-28  6:24 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Adrian Hunter, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel, coresight,
	linux-arm-kernel, netdev, bpf
  Cc: Stephane Eranian, Ian Rogers

evlist contains cpus and all_cpus. all_cpus is the union of the CPU maps
of all evsels. cpus is set to the CPUs requested on the command line,
defaulting to all online CPUs if none are specified. For something like
an uncore event, all_cpus may be just CPU 0 while cpus is every online
CPU. This causes all_cpus to have fewer values than the cpus variable,
which is confusing given the 'all' in the name. To make the behavior
clearer, rename cpus to user_cpus and add comments on the two struct
variables.
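
The decision the rename clarifies can be sketched with plain booleans.
This is a hypothetical condensation of the branch structure in
__perf_evlist__propagate_maps() (the enum and helper name are invented
for illustration), showing which map an evsel ends up using:

```c
#include <assert.h>
#include <stdbool.h>

enum map_choice { USE_USER_CPUS, USE_OWN_CPUS };

static enum map_choice choose_cpus(bool has_own_cpus, bool has_user_cpus,
				   bool system_wide, bool user_cpus_empty)
{
	/* No PMU-specific map, or the user explicitly requested CPUs:
	 * the user/default CPU list wins. */
	if (!has_own_cpus || has_user_cpus)
		return USE_USER_CPUS;
	/* Per-thread (empty) default map and not system wide: keep it. */
	if (!system_wide && user_cpus_empty)
		return USE_USER_CPUS;
	/* Otherwise the PMU's own map (e.g. one CPU per socket) wins. */
	return USE_OWN_CPUS;
}
```

So an uncore evsel with its own one-CPU-per-socket map only keeps that
map when the user did not pin CPUs on the command line.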

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/lib/perf/evlist.c                  | 28 ++++++++++++------------
 tools/lib/perf/include/internal/evlist.h |  4 +++-
 tools/perf/arch/arm/util/cs-etm.c        |  8 +++----
 tools/perf/arch/arm64/util/arm-spe.c     |  2 +-
 tools/perf/arch/x86/util/intel-bts.c     |  2 +-
 tools/perf/arch/x86/util/intel-pt.c      |  4 ++--
 tools/perf/bench/evlist-open-close.c     |  2 +-
 tools/perf/builtin-ftrace.c              |  2 +-
 tools/perf/builtin-record.c              |  6 ++---
 tools/perf/builtin-stat.c                |  8 +++----
 tools/perf/builtin-top.c                 |  2 +-
 tools/perf/util/auxtrace.c               |  2 +-
 tools/perf/util/bpf_ftrace.c             |  4 ++--
 tools/perf/util/evlist.c                 | 14 ++++++------
 tools/perf/util/record.c                 |  6 ++---
 tools/perf/util/sideband_evlist.c        |  2 +-
 tools/perf/util/stat-display.c           |  2 +-
 tools/perf/util/synthetic-events.c       |  2 +-
 tools/perf/util/top.c                    |  7 +++---
 19 files changed, 55 insertions(+), 52 deletions(-)

diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index 9a770bfdc804..e29dc229768a 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -41,10 +41,10 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
 	 */
 	if (!evsel->own_cpus || evlist->has_user_cpus) {
 		perf_cpu_map__put(evsel->cpus);
-		evsel->cpus = perf_cpu_map__get(evlist->cpus);
-	} else if (!evsel->system_wide && perf_cpu_map__empty(evlist->cpus)) {
+		evsel->cpus = perf_cpu_map__get(evlist->user_cpus);
+	} else if (!evsel->system_wide && perf_cpu_map__empty(evlist->user_cpus)) {
 		perf_cpu_map__put(evsel->cpus);
-		evsel->cpus = perf_cpu_map__get(evlist->cpus);
+		evsel->cpus = perf_cpu_map__get(evlist->user_cpus);
 	} else if (evsel->cpus != evsel->own_cpus) {
 		perf_cpu_map__put(evsel->cpus);
 		evsel->cpus = perf_cpu_map__get(evsel->own_cpus);
@@ -123,10 +123,10 @@ static void perf_evlist__purge(struct perf_evlist *evlist)
 
 void perf_evlist__exit(struct perf_evlist *evlist)
 {
-	perf_cpu_map__put(evlist->cpus);
+	perf_cpu_map__put(evlist->user_cpus);
 	perf_cpu_map__put(evlist->all_cpus);
 	perf_thread_map__put(evlist->threads);
-	evlist->cpus = NULL;
+	evlist->user_cpus = NULL;
 	evlist->all_cpus = NULL;
 	evlist->threads = NULL;
 	fdarray__exit(&evlist->pollfd);
@@ -155,9 +155,9 @@ void perf_evlist__set_maps(struct perf_evlist *evlist,
 	 * original reference count of 1.  If that is not the case it is up to
 	 * the caller to increase the reference count.
 	 */
-	if (cpus != evlist->cpus) {
-		perf_cpu_map__put(evlist->cpus);
-		evlist->cpus = perf_cpu_map__get(cpus);
+	if (cpus != evlist->user_cpus) {
+		perf_cpu_map__put(evlist->user_cpus);
+		evlist->user_cpus = perf_cpu_map__get(cpus);
 	}
 
 	if (threads != evlist->threads) {
@@ -294,7 +294,7 @@ int perf_evlist__id_add_fd(struct perf_evlist *evlist,
 
 int perf_evlist__alloc_pollfd(struct perf_evlist *evlist)
 {
-	int nr_cpus = perf_cpu_map__nr(evlist->cpus);
+	int nr_cpus = perf_cpu_map__nr(evlist->user_cpus);
 	int nr_threads = perf_thread_map__nr(evlist->threads);
 	int nfds = 0;
 	struct perf_evsel *evsel;
@@ -426,7 +426,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
 	       int idx, struct perf_mmap_param *mp, int cpu_idx,
 	       int thread, int *_output, int *_output_overwrite)
 {
-	struct perf_cpu evlist_cpu = perf_cpu_map__cpu(evlist->cpus, cpu_idx);
+	struct perf_cpu evlist_cpu = perf_cpu_map__cpu(evlist->user_cpus, cpu_idx);
 	struct perf_evsel *evsel;
 	int revent;
 
@@ -536,7 +536,7 @@ mmap_per_cpu(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
 	     struct perf_mmap_param *mp)
 {
 	int nr_threads = perf_thread_map__nr(evlist->threads);
-	int nr_cpus    = perf_cpu_map__nr(evlist->cpus);
+	int nr_cpus    = perf_cpu_map__nr(evlist->user_cpus);
 	int cpu, thread;
 
 	for (cpu = 0; cpu < nr_cpus; cpu++) {
@@ -564,8 +564,8 @@ static int perf_evlist__nr_mmaps(struct perf_evlist *evlist)
 {
 	int nr_mmaps;
 
-	nr_mmaps = perf_cpu_map__nr(evlist->cpus);
-	if (perf_cpu_map__empty(evlist->cpus))
+	nr_mmaps = perf_cpu_map__nr(evlist->user_cpus);
+	if (perf_cpu_map__empty(evlist->user_cpus))
 		nr_mmaps = perf_thread_map__nr(evlist->threads);
 
 	return nr_mmaps;
@@ -576,7 +576,7 @@ int perf_evlist__mmap_ops(struct perf_evlist *evlist,
 			  struct perf_mmap_param *mp)
 {
 	struct perf_evsel *evsel;
-	const struct perf_cpu_map *cpus = evlist->cpus;
+	const struct perf_cpu_map *cpus = evlist->user_cpus;
 	const struct perf_thread_map *threads = evlist->threads;
 
 	if (!ops || !ops->get || !ops->mmap)
diff --git a/tools/lib/perf/include/internal/evlist.h b/tools/lib/perf/include/internal/evlist.h
index 4cefade540bd..5f95672662ae 100644
--- a/tools/lib/perf/include/internal/evlist.h
+++ b/tools/lib/perf/include/internal/evlist.h
@@ -19,7 +19,9 @@ struct perf_evlist {
 	int			 nr_entries;
 	int			 nr_groups;
 	bool			 has_user_cpus;
-	struct perf_cpu_map	*cpus;
+	/** The list of cpus passed from the command line. */
+	struct perf_cpu_map	*user_cpus;
+	/** The union of all evsel cpu maps. */
 	struct perf_cpu_map	*all_cpus;
 	struct perf_thread_map	*threads;
 	int			 nr_mmaps;
diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index cbc555245959..405d58903d84 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -199,7 +199,7 @@ static int cs_etm_set_option(struct auxtrace_record *itr,
 			     struct evsel *evsel, u32 option)
 {
 	int i, err = -EINVAL;
-	struct perf_cpu_map *event_cpus = evsel->evlist->core.cpus;
+	struct perf_cpu_map *event_cpus = evsel->evlist->core.user_cpus;
 	struct perf_cpu_map *online_cpus = perf_cpu_map__new(NULL);
 
 	/* Set option of each CPU we have */
@@ -299,7 +299,7 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
 				container_of(itr, struct cs_etm_recording, itr);
 	struct perf_pmu *cs_etm_pmu = ptr->cs_etm_pmu;
 	struct evsel *evsel, *cs_etm_evsel = NULL;
-	struct perf_cpu_map *cpus = evlist->core.cpus;
+	struct perf_cpu_map *cpus = evlist->core.user_cpus;
 	bool privileged = perf_event_paranoid_check(-1);
 	int err = 0;
 
@@ -522,7 +522,7 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
 {
 	int i;
 	int etmv3 = 0, etmv4 = 0, ete = 0;
-	struct perf_cpu_map *event_cpus = evlist->core.cpus;
+	struct perf_cpu_map *event_cpus = evlist->core.user_cpus;
 	struct perf_cpu_map *online_cpus = perf_cpu_map__new(NULL);
 
 	/* cpu map is not empty, we have specific CPUs to work with */
@@ -713,7 +713,7 @@ static int cs_etm_info_fill(struct auxtrace_record *itr,
 	u32 offset;
 	u64 nr_cpu, type;
 	struct perf_cpu_map *cpu_map;
-	struct perf_cpu_map *event_cpus = session->evlist->core.cpus;
+	struct perf_cpu_map *event_cpus = session->evlist->core.user_cpus;
 	struct perf_cpu_map *online_cpus = perf_cpu_map__new(NULL);
 	struct cs_etm_recording *ptr =
 			container_of(itr, struct cs_etm_recording, itr);
diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
index 5860bbaea95a..83ad05613321 100644
--- a/tools/perf/arch/arm64/util/arm-spe.c
+++ b/tools/perf/arch/arm64/util/arm-spe.c
@@ -144,7 +144,7 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
 			container_of(itr, struct arm_spe_recording, itr);
 	struct perf_pmu *arm_spe_pmu = sper->arm_spe_pmu;
 	struct evsel *evsel, *arm_spe_evsel = NULL;
-	struct perf_cpu_map *cpus = evlist->core.cpus;
+	struct perf_cpu_map *cpus = evlist->core.user_cpus;
 	bool privileged = perf_event_paranoid_check(-1);
 	struct evsel *tracking_evsel;
 	int err;
diff --git a/tools/perf/arch/x86/util/intel-bts.c b/tools/perf/arch/x86/util/intel-bts.c
index 4a76d49d25d6..c9d73ecfd795 100644
--- a/tools/perf/arch/x86/util/intel-bts.c
+++ b/tools/perf/arch/x86/util/intel-bts.c
@@ -110,7 +110,7 @@ static int intel_bts_recording_options(struct auxtrace_record *itr,
 			container_of(itr, struct intel_bts_recording, itr);
 	struct perf_pmu *intel_bts_pmu = btsr->intel_bts_pmu;
 	struct evsel *evsel, *intel_bts_evsel = NULL;
-	const struct perf_cpu_map *cpus = evlist->core.cpus;
+	const struct perf_cpu_map *cpus = evlist->core.user_cpus;
 	bool privileged = perf_event_paranoid_check(-1);
 
 	if (opts->auxtrace_sample_mode) {
diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
index 8c31578d6f4a..58bf24960273 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -382,7 +382,7 @@ static int intel_pt_info_fill(struct auxtrace_record *itr,
 			ui__warning("Intel Processor Trace: TSC not available\n");
 	}
 
-	per_cpu_mmaps = !perf_cpu_map__empty(session->evlist->core.cpus);
+	per_cpu_mmaps = !perf_cpu_map__empty(session->evlist->core.user_cpus);
 
 	auxtrace_info->type = PERF_AUXTRACE_INTEL_PT;
 	auxtrace_info->priv[INTEL_PT_PMU_TYPE] = intel_pt_pmu->type;
@@ -632,7 +632,7 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
 	struct perf_pmu *intel_pt_pmu = ptr->intel_pt_pmu;
 	bool have_timing_info, need_immediate = false;
 	struct evsel *evsel, *intel_pt_evsel = NULL;
-	const struct perf_cpu_map *cpus = evlist->core.cpus;
+	const struct perf_cpu_map *cpus = evlist->core.user_cpus;
 	bool privileged = perf_event_paranoid_check(-1);
 	u64 tsc_bit;
 	int err;
diff --git a/tools/perf/bench/evlist-open-close.c b/tools/perf/bench/evlist-open-close.c
index de56601f69ee..5bdc6b476a4d 100644
--- a/tools/perf/bench/evlist-open-close.c
+++ b/tools/perf/bench/evlist-open-close.c
@@ -151,7 +151,7 @@ static int bench_evlist_open_close__run(char *evstr)
 
 	init_stats(&time_stats);
 
-	printf("  Number of cpus:\t%d\n", perf_cpu_map__nr(evlist->core.cpus));
+	printf("  Number of cpus:\t%d\n", perf_cpu_map__nr(evlist->core.user_cpus));
 	printf("  Number of threads:\t%d\n", evlist->core.threads->nr);
 	printf("  Number of events:\t%d (%d fds)\n",
 		evlist->core.nr_entries, evlist__count_evsel_fds(evlist));
diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index ad9ce1bfffa1..642cbc6fdfc5 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -301,7 +301,7 @@ static int set_tracing_cpumask(struct perf_cpu_map *cpumap)
 
 static int set_tracing_cpu(struct perf_ftrace *ftrace)
 {
-	struct perf_cpu_map *cpumap = ftrace->evlist->core.cpus;
+	struct perf_cpu_map *cpumap = ftrace->evlist->core.user_cpus;
 
 	if (!target__has_cpu(&ftrace->target))
 		return 0;
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 0b4abed555d8..28ab3866802c 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -987,7 +987,7 @@ static int record__thread_data_init_maps(struct record_thread *thread_data, stru
 	int m, tm, nr_mmaps = evlist->core.nr_mmaps;
 	struct mmap *mmap = evlist->mmap;
 	struct mmap *overwrite_mmap = evlist->overwrite_mmap;
-	struct perf_cpu_map *cpus = evlist->core.cpus;
+	struct perf_cpu_map *cpus = evlist->core.user_cpus;
 
 	thread_data->nr_mmaps = bitmap_weight(thread_data->mask->maps.bits,
 					      thread_data->mask->maps.nbits);
@@ -1881,7 +1881,7 @@ static int record__synthesize(struct record *rec, bool tail)
 		return err;
 	}
 
-	err = perf_event__synthesize_cpu_map(&rec->tool, rec->evlist->core.cpus,
+	err = perf_event__synthesize_cpu_map(&rec->tool, rec->evlist->core.user_cpus,
 					     process_synthesized_event, NULL);
 	if (err < 0) {
 		pr_err("Couldn't synthesize cpu map.\n");
@@ -3675,7 +3675,7 @@ static int record__init_thread_default_masks(struct record *rec, struct perf_cpu
 static int record__init_thread_masks(struct record *rec)
 {
 	int ret = 0;
-	struct perf_cpu_map *cpus = rec->evlist->core.cpus;
+	struct perf_cpu_map *cpus = rec->evlist->core.user_cpus;
 
 	if (!record__threads_enabled(rec))
 		return record__init_thread_default_masks(rec, cpus);
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 4ee40de698a4..5bee529f7656 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -804,7 +804,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	if (group)
 		evlist__set_leader(evsel_list);
 
-	if (!cpu_map__is_dummy(evsel_list->core.cpus)) {
+	if (!cpu_map__is_dummy(evsel_list->core.user_cpus)) {
 		if (affinity__setup(&saved_affinity) < 0)
 			return -1;
 		affinity = &saved_affinity;
@@ -1458,7 +1458,7 @@ static int perf_stat_init_aggr_mode(void)
 	aggr_cpu_id_get_t get_id = aggr_mode__get_aggr(stat_config.aggr_mode);
 
 	if (get_id) {
-		stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.cpus,
+		stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_cpus,
 							 get_id, /*data=*/NULL);
 		if (!stat_config.aggr_map) {
 			pr_err("cannot build %s map", aggr_mode__string[stat_config.aggr_mode]);
@@ -1472,7 +1472,7 @@ static int perf_stat_init_aggr_mode(void)
 	 * taking the highest cpu number to be the size of
 	 * the aggregation translate cpumap.
 	 */
-	nr = perf_cpu_map__max(evsel_list->core.cpus).cpu;
+	nr = perf_cpu_map__max(evsel_list->core.user_cpus).cpu;
 	stat_config.cpus_aggr_map = cpu_aggr_map__empty_new(nr + 1);
 	return stat_config.cpus_aggr_map ? 0 : -ENOMEM;
 }
@@ -1627,7 +1627,7 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
 	if (!get_id)
 		return 0;
 
-	stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.cpus, get_id, env);
+	stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_cpus, get_id, env);
 	if (!stat_config.aggr_map) {
 		pr_err("cannot build %s map", aggr_mode__string[stat_config.aggr_mode]);
 		return -1;
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 9b08e44a31d9..4cfa112292d0 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1021,7 +1021,7 @@ static int perf_top__start_counters(struct perf_top *top)
 
 	evlist__for_each_entry(evlist, counter) {
 try_again:
-		if (evsel__open(counter, top->evlist->core.cpus,
+		if (evsel__open(counter, top->evlist->core.user_cpus,
 				     top->evlist->core.threads) < 0) {
 
 			/*
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 9e48652662d4..b138dd6bdefc 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -174,7 +174,7 @@ void auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp,
 	mp->idx = idx;
 
 	if (per_cpu) {
-		mp->cpu = perf_cpu_map__cpu(evlist->core.cpus, idx);
+		mp->cpu = perf_cpu_map__cpu(evlist->core.user_cpus, idx);
 		if (evlist->core.threads)
 			mp->tid = perf_thread_map__pid(evlist->core.threads, 0);
 		else
diff --git a/tools/perf/util/bpf_ftrace.c b/tools/perf/util/bpf_ftrace.c
index 4f4d3aaff37c..69481b28b885 100644
--- a/tools/perf/util/bpf_ftrace.c
+++ b/tools/perf/util/bpf_ftrace.c
@@ -38,7 +38,7 @@ int perf_ftrace__latency_prepare_bpf(struct perf_ftrace *ftrace)
 
 	/* don't need to set cpu filter for system-wide mode */
 	if (ftrace->target.cpu_list) {
-		ncpus = perf_cpu_map__nr(ftrace->evlist->core.cpus);
+		ncpus = perf_cpu_map__nr(ftrace->evlist->core.user_cpus);
 		bpf_map__set_max_entries(skel->maps.cpu_filter, ncpus);
 	}
 
@@ -63,7 +63,7 @@ int perf_ftrace__latency_prepare_bpf(struct perf_ftrace *ftrace)
 		fd = bpf_map__fd(skel->maps.cpu_filter);
 
 		for (i = 0; i < ncpus; i++) {
-			cpu = perf_cpu_map__cpu(ftrace->evlist->core.cpus, i).cpu;
+			cpu = perf_cpu_map__cpu(ftrace->evlist->core.user_cpus, i).cpu;
 			bpf_map_update_elem(fd, &cpu, &val, BPF_ANY);
 		}
 	}
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 9bb79e049957..d335fb713f5e 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -440,7 +440,7 @@ static void __evlist__disable(struct evlist *evlist, char *evsel_name)
 	bool has_imm = false;
 
 	// See explanation in evlist__close()
-	if (!cpu_map__is_dummy(evlist->core.cpus)) {
+	if (!cpu_map__is_dummy(evlist->core.user_cpus)) {
 		if (affinity__setup(&saved_affinity) < 0)
 			return;
 		affinity = &saved_affinity;
@@ -500,7 +500,7 @@ static void __evlist__enable(struct evlist *evlist, char *evsel_name)
 	struct affinity saved_affinity, *affinity = NULL;
 
 	// See explanation in evlist__close()
-	if (!cpu_map__is_dummy(evlist->core.cpus)) {
+	if (!cpu_map__is_dummy(evlist->core.user_cpus)) {
 		if (affinity__setup(&saved_affinity) < 0)
 			return;
 		affinity = &saved_affinity;
@@ -565,7 +565,7 @@ static int evlist__enable_event_cpu(struct evlist *evlist, struct evsel *evsel,
 static int evlist__enable_event_thread(struct evlist *evlist, struct evsel *evsel, int thread)
 {
 	int cpu;
-	int nr_cpus = perf_cpu_map__nr(evlist->core.cpus);
+	int nr_cpus = perf_cpu_map__nr(evlist->core.user_cpus);
 
 	if (!evsel->core.fd)
 		return -EINVAL;
@@ -580,7 +580,7 @@ static int evlist__enable_event_thread(struct evlist *evlist, struct evsel *evse
 
 int evlist__enable_event_idx(struct evlist *evlist, struct evsel *evsel, int idx)
 {
-	bool per_cpu_mmaps = !perf_cpu_map__empty(evlist->core.cpus);
+	bool per_cpu_mmaps = !perf_cpu_map__empty(evlist->core.user_cpus);
 
 	if (per_cpu_mmaps)
 		return evlist__enable_event_cpu(evlist, evsel, idx);
@@ -1301,10 +1301,10 @@ void evlist__close(struct evlist *evlist)
 	struct affinity affinity;
 
 	/*
-	 * With perf record core.cpus is usually NULL.
+	 * With perf record core.user_cpus is usually NULL.
 	 * Use the old method to handle this for now.
 	 */
-	if (!evlist->core.cpus || cpu_map__is_dummy(evlist->core.cpus)) {
+	if (!evlist->core.user_cpus || cpu_map__is_dummy(evlist->core.user_cpus)) {
 		evlist__for_each_entry_reverse(evlist, evsel)
 			evsel__close(evsel);
 		return;
@@ -1367,7 +1367,7 @@ int evlist__open(struct evlist *evlist)
 	 * Default: one fd per CPU, all threads, aka systemwide
 	 * as sys_perf_event_open(cpu = -1, thread = -1) is EINVAL
 	 */
-	if (evlist->core.threads == NULL && evlist->core.cpus == NULL) {
+	if (evlist->core.threads == NULL && evlist->core.user_cpus == NULL) {
 		err = evlist__create_syswide_maps(evlist);
 		if (err < 0)
 			goto out_err;
diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
index 007a64681416..ff326eba084f 100644
--- a/tools/perf/util/record.c
+++ b/tools/perf/util/record.c
@@ -106,7 +106,7 @@ void evlist__config(struct evlist *evlist, struct record_opts *opts, struct call
 	if (opts->group)
 		evlist__set_leader(evlist);
 
-	if (perf_cpu_map__cpu(evlist->core.cpus, 0).cpu < 0)
+	if (perf_cpu_map__cpu(evlist->core.user_cpus, 0).cpu < 0)
 		opts->no_inherit = true;
 
 	use_comm_exec = perf_can_comm_exec();
@@ -244,7 +244,7 @@ bool evlist__can_select_event(struct evlist *evlist, const char *str)
 
 	evsel = evlist__last(temp_evlist);
 
-	if (!evlist || perf_cpu_map__empty(evlist->core.cpus)) {
+	if (!evlist || perf_cpu_map__empty(evlist->core.user_cpus)) {
 		struct perf_cpu_map *cpus = perf_cpu_map__new(NULL);
 
 		if (cpus)
@@ -252,7 +252,7 @@ bool evlist__can_select_event(struct evlist *evlist, const char *str)
 
 		perf_cpu_map__put(cpus);
 	} else {
-		cpu = perf_cpu_map__cpu(evlist->core.cpus, 0);
+		cpu = perf_cpu_map__cpu(evlist->core.user_cpus, 0);
 	}
 
 	while (1) {
diff --git a/tools/perf/util/sideband_evlist.c b/tools/perf/util/sideband_evlist.c
index 748371ac22be..9f58c68a25f7 100644
--- a/tools/perf/util/sideband_evlist.c
+++ b/tools/perf/util/sideband_evlist.c
@@ -114,7 +114,7 @@ int evlist__start_sb_thread(struct evlist *evlist, struct target *target)
 	}
 
 	evlist__for_each_entry(evlist, counter) {
-		if (evsel__open(counter, evlist->core.cpus, evlist->core.threads) < 0)
+		if (evsel__open(counter, evlist->core.user_cpus, evlist->core.threads) < 0)
 			goto out_delete_evlist;
 	}
 
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 9cbe351b141f..634dd9ea2b35 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -929,7 +929,7 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
 	int all_idx;
 	struct perf_cpu cpu;
 
-	perf_cpu_map__for_each_cpu(cpu, all_idx, evlist->core.cpus) {
+	perf_cpu_map__for_each_cpu(cpu, all_idx, evlist->core.user_cpus) {
 		struct evsel *counter;
 		bool first = true;
 
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index b654de0841f8..591afc6c607b 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -2127,7 +2127,7 @@ int perf_event__synthesize_stat_events(struct perf_stat_config *config, struct p
 		return err;
 	}
 
-	err = perf_event__synthesize_cpu_map(tool, evlist->core.cpus, process, NULL);
+	err = perf_event__synthesize_cpu_map(tool, evlist->core.user_cpus, process, NULL);
 	if (err < 0) {
 		pr_err("Couldn't synthesize thread map.\n");
 		return err;
diff --git a/tools/perf/util/top.c b/tools/perf/util/top.c
index c1ebfc5d2e0c..e98422f3ff17 100644
--- a/tools/perf/util/top.c
+++ b/tools/perf/util/top.c
@@ -95,15 +95,16 @@ size_t perf_top__header_snprintf(struct perf_top *top, char *bf, size_t size)
 
 	if (target->cpu_list)
 		ret += SNPRINTF(bf + ret, size - ret, ", CPU%s: %s)",
-				perf_cpu_map__nr(top->evlist->core.cpus) > 1 ? "s" : "",
+				perf_cpu_map__nr(top->evlist->core.user_cpus) > 1 ? "s" : "",
 				target->cpu_list);
 	else {
 		if (target->tid)
 			ret += SNPRINTF(bf + ret, size - ret, ")");
 		else
 			ret += SNPRINTF(bf + ret, size - ret, ", %d CPU%s)",
-					perf_cpu_map__nr(top->evlist->core.cpus),
-					perf_cpu_map__nr(top->evlist->core.cpus) > 1 ? "s" : "");
+					perf_cpu_map__nr(top->evlist->core.user_cpus),
+					perf_cpu_map__nr(top->evlist->core.user_cpus) > 1
+					? "s" : "");
 	}
 
 	perf_top__reset_sample_counters(top);
-- 
2.35.1.1021.g381101b075-goog


 	int nfds = 0;
 	struct perf_evsel *evsel;
@@ -426,7 +426,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
 	       int idx, struct perf_mmap_param *mp, int cpu_idx,
 	       int thread, int *_output, int *_output_overwrite)
 {
-	struct perf_cpu evlist_cpu = perf_cpu_map__cpu(evlist->cpus, cpu_idx);
+	struct perf_cpu evlist_cpu = perf_cpu_map__cpu(evlist->user_cpus, cpu_idx);
 	struct perf_evsel *evsel;
 	int revent;
 
@@ -536,7 +536,7 @@ mmap_per_cpu(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
 	     struct perf_mmap_param *mp)
 {
 	int nr_threads = perf_thread_map__nr(evlist->threads);
-	int nr_cpus    = perf_cpu_map__nr(evlist->cpus);
+	int nr_cpus    = perf_cpu_map__nr(evlist->user_cpus);
 	int cpu, thread;
 
 	for (cpu = 0; cpu < nr_cpus; cpu++) {
@@ -564,8 +564,8 @@ static int perf_evlist__nr_mmaps(struct perf_evlist *evlist)
 {
 	int nr_mmaps;
 
-	nr_mmaps = perf_cpu_map__nr(evlist->cpus);
-	if (perf_cpu_map__empty(evlist->cpus))
+	nr_mmaps = perf_cpu_map__nr(evlist->user_cpus);
+	if (perf_cpu_map__empty(evlist->user_cpus))
 		nr_mmaps = perf_thread_map__nr(evlist->threads);
 
 	return nr_mmaps;
@@ -576,7 +576,7 @@ int perf_evlist__mmap_ops(struct perf_evlist *evlist,
 			  struct perf_mmap_param *mp)
 {
 	struct perf_evsel *evsel;
-	const struct perf_cpu_map *cpus = evlist->cpus;
+	const struct perf_cpu_map *cpus = evlist->user_cpus;
 	const struct perf_thread_map *threads = evlist->threads;
 
 	if (!ops || !ops->get || !ops->mmap)
diff --git a/tools/lib/perf/include/internal/evlist.h b/tools/lib/perf/include/internal/evlist.h
index 4cefade540bd..5f95672662ae 100644
--- a/tools/lib/perf/include/internal/evlist.h
+++ b/tools/lib/perf/include/internal/evlist.h
@@ -19,7 +19,9 @@ struct perf_evlist {
 	int			 nr_entries;
 	int			 nr_groups;
 	bool			 has_user_cpus;
-	struct perf_cpu_map	*cpus;
+	/** The list of cpus passed from the command line. */
+	struct perf_cpu_map	*user_cpus;
+	/** The union of all evsel cpu maps. */
 	struct perf_cpu_map	*all_cpus;
 	struct perf_thread_map	*threads;
 	int			 nr_mmaps;
diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index cbc555245959..405d58903d84 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -199,7 +199,7 @@ static int cs_etm_set_option(struct auxtrace_record *itr,
 			     struct evsel *evsel, u32 option)
 {
 	int i, err = -EINVAL;
-	struct perf_cpu_map *event_cpus = evsel->evlist->core.cpus;
+	struct perf_cpu_map *event_cpus = evsel->evlist->core.user_cpus;
 	struct perf_cpu_map *online_cpus = perf_cpu_map__new(NULL);
 
 	/* Set option of each CPU we have */
@@ -299,7 +299,7 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
 				container_of(itr, struct cs_etm_recording, itr);
 	struct perf_pmu *cs_etm_pmu = ptr->cs_etm_pmu;
 	struct evsel *evsel, *cs_etm_evsel = NULL;
-	struct perf_cpu_map *cpus = evlist->core.cpus;
+	struct perf_cpu_map *cpus = evlist->core.user_cpus;
 	bool privileged = perf_event_paranoid_check(-1);
 	int err = 0;
 
@@ -522,7 +522,7 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
 {
 	int i;
 	int etmv3 = 0, etmv4 = 0, ete = 0;
-	struct perf_cpu_map *event_cpus = evlist->core.cpus;
+	struct perf_cpu_map *event_cpus = evlist->core.user_cpus;
 	struct perf_cpu_map *online_cpus = perf_cpu_map__new(NULL);
 
 	/* cpu map is not empty, we have specific CPUs to work with */
@@ -713,7 +713,7 @@ static int cs_etm_info_fill(struct auxtrace_record *itr,
 	u32 offset;
 	u64 nr_cpu, type;
 	struct perf_cpu_map *cpu_map;
-	struct perf_cpu_map *event_cpus = session->evlist->core.cpus;
+	struct perf_cpu_map *event_cpus = session->evlist->core.user_cpus;
 	struct perf_cpu_map *online_cpus = perf_cpu_map__new(NULL);
 	struct cs_etm_recording *ptr =
 			container_of(itr, struct cs_etm_recording, itr);
diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
index 5860bbaea95a..83ad05613321 100644
--- a/tools/perf/arch/arm64/util/arm-spe.c
+++ b/tools/perf/arch/arm64/util/arm-spe.c
@@ -144,7 +144,7 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
 			container_of(itr, struct arm_spe_recording, itr);
 	struct perf_pmu *arm_spe_pmu = sper->arm_spe_pmu;
 	struct evsel *evsel, *arm_spe_evsel = NULL;
-	struct perf_cpu_map *cpus = evlist->core.cpus;
+	struct perf_cpu_map *cpus = evlist->core.user_cpus;
 	bool privileged = perf_event_paranoid_check(-1);
 	struct evsel *tracking_evsel;
 	int err;
diff --git a/tools/perf/arch/x86/util/intel-bts.c b/tools/perf/arch/x86/util/intel-bts.c
index 4a76d49d25d6..c9d73ecfd795 100644
--- a/tools/perf/arch/x86/util/intel-bts.c
+++ b/tools/perf/arch/x86/util/intel-bts.c
@@ -110,7 +110,7 @@ static int intel_bts_recording_options(struct auxtrace_record *itr,
 			container_of(itr, struct intel_bts_recording, itr);
 	struct perf_pmu *intel_bts_pmu = btsr->intel_bts_pmu;
 	struct evsel *evsel, *intel_bts_evsel = NULL;
-	const struct perf_cpu_map *cpus = evlist->core.cpus;
+	const struct perf_cpu_map *cpus = evlist->core.user_cpus;
 	bool privileged = perf_event_paranoid_check(-1);
 
 	if (opts->auxtrace_sample_mode) {
diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
index 8c31578d6f4a..58bf24960273 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -382,7 +382,7 @@ static int intel_pt_info_fill(struct auxtrace_record *itr,
 			ui__warning("Intel Processor Trace: TSC not available\n");
 	}
 
-	per_cpu_mmaps = !perf_cpu_map__empty(session->evlist->core.cpus);
+	per_cpu_mmaps = !perf_cpu_map__empty(session->evlist->core.user_cpus);
 
 	auxtrace_info->type = PERF_AUXTRACE_INTEL_PT;
 	auxtrace_info->priv[INTEL_PT_PMU_TYPE] = intel_pt_pmu->type;
@@ -632,7 +632,7 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
 	struct perf_pmu *intel_pt_pmu = ptr->intel_pt_pmu;
 	bool have_timing_info, need_immediate = false;
 	struct evsel *evsel, *intel_pt_evsel = NULL;
-	const struct perf_cpu_map *cpus = evlist->core.cpus;
+	const struct perf_cpu_map *cpus = evlist->core.user_cpus;
 	bool privileged = perf_event_paranoid_check(-1);
 	u64 tsc_bit;
 	int err;
diff --git a/tools/perf/bench/evlist-open-close.c b/tools/perf/bench/evlist-open-close.c
index de56601f69ee..5bdc6b476a4d 100644
--- a/tools/perf/bench/evlist-open-close.c
+++ b/tools/perf/bench/evlist-open-close.c
@@ -151,7 +151,7 @@ static int bench_evlist_open_close__run(char *evstr)
 
 	init_stats(&time_stats);
 
-	printf("  Number of cpus:\t%d\n", perf_cpu_map__nr(evlist->core.cpus));
+	printf("  Number of cpus:\t%d\n", perf_cpu_map__nr(evlist->core.user_cpus));
 	printf("  Number of threads:\t%d\n", evlist->core.threads->nr);
 	printf("  Number of events:\t%d (%d fds)\n",
 		evlist->core.nr_entries, evlist__count_evsel_fds(evlist));
diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
index ad9ce1bfffa1..642cbc6fdfc5 100644
--- a/tools/perf/builtin-ftrace.c
+++ b/tools/perf/builtin-ftrace.c
@@ -301,7 +301,7 @@ static int set_tracing_cpumask(struct perf_cpu_map *cpumap)
 
 static int set_tracing_cpu(struct perf_ftrace *ftrace)
 {
-	struct perf_cpu_map *cpumap = ftrace->evlist->core.cpus;
+	struct perf_cpu_map *cpumap = ftrace->evlist->core.user_cpus;
 
 	if (!target__has_cpu(&ftrace->target))
 		return 0;
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 0b4abed555d8..28ab3866802c 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -987,7 +987,7 @@ static int record__thread_data_init_maps(struct record_thread *thread_data, stru
 	int m, tm, nr_mmaps = evlist->core.nr_mmaps;
 	struct mmap *mmap = evlist->mmap;
 	struct mmap *overwrite_mmap = evlist->overwrite_mmap;
-	struct perf_cpu_map *cpus = evlist->core.cpus;
+	struct perf_cpu_map *cpus = evlist->core.user_cpus;
 
 	thread_data->nr_mmaps = bitmap_weight(thread_data->mask->maps.bits,
 					      thread_data->mask->maps.nbits);
@@ -1881,7 +1881,7 @@ static int record__synthesize(struct record *rec, bool tail)
 		return err;
 	}
 
-	err = perf_event__synthesize_cpu_map(&rec->tool, rec->evlist->core.cpus,
+	err = perf_event__synthesize_cpu_map(&rec->tool, rec->evlist->core.user_cpus,
 					     process_synthesized_event, NULL);
 	if (err < 0) {
 		pr_err("Couldn't synthesize cpu map.\n");
@@ -3675,7 +3675,7 @@ static int record__init_thread_default_masks(struct record *rec, struct perf_cpu
 static int record__init_thread_masks(struct record *rec)
 {
 	int ret = 0;
-	struct perf_cpu_map *cpus = rec->evlist->core.cpus;
+	struct perf_cpu_map *cpus = rec->evlist->core.user_cpus;
 
 	if (!record__threads_enabled(rec))
 		return record__init_thread_default_masks(rec, cpus);
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 4ee40de698a4..5bee529f7656 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -804,7 +804,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	if (group)
 		evlist__set_leader(evsel_list);
 
-	if (!cpu_map__is_dummy(evsel_list->core.cpus)) {
+	if (!cpu_map__is_dummy(evsel_list->core.user_cpus)) {
 		if (affinity__setup(&saved_affinity) < 0)
 			return -1;
 		affinity = &saved_affinity;
@@ -1458,7 +1458,7 @@ static int perf_stat_init_aggr_mode(void)
 	aggr_cpu_id_get_t get_id = aggr_mode__get_aggr(stat_config.aggr_mode);
 
 	if (get_id) {
-		stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.cpus,
+		stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_cpus,
 							 get_id, /*data=*/NULL);
 		if (!stat_config.aggr_map) {
 			pr_err("cannot build %s map", aggr_mode__string[stat_config.aggr_mode]);
@@ -1472,7 +1472,7 @@ static int perf_stat_init_aggr_mode(void)
 	 * taking the highest cpu number to be the size of
 	 * the aggregation translate cpumap.
 	 */
-	nr = perf_cpu_map__max(evsel_list->core.cpus).cpu;
+	nr = perf_cpu_map__max(evsel_list->core.user_cpus).cpu;
 	stat_config.cpus_aggr_map = cpu_aggr_map__empty_new(nr + 1);
 	return stat_config.cpus_aggr_map ? 0 : -ENOMEM;
 }
@@ -1627,7 +1627,7 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
 	if (!get_id)
 		return 0;
 
-	stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.cpus, get_id, env);
+	stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_cpus, get_id, env);
 	if (!stat_config.aggr_map) {
 		pr_err("cannot build %s map", aggr_mode__string[stat_config.aggr_mode]);
 		return -1;
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 9b08e44a31d9..4cfa112292d0 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1021,7 +1021,7 @@ static int perf_top__start_counters(struct perf_top *top)
 
 	evlist__for_each_entry(evlist, counter) {
 try_again:
-		if (evsel__open(counter, top->evlist->core.cpus,
+		if (evsel__open(counter, top->evlist->core.user_cpus,
 				     top->evlist->core.threads) < 0) {
 
 			/*
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 9e48652662d4..b138dd6bdefc 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -174,7 +174,7 @@ void auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp,
 	mp->idx = idx;
 
 	if (per_cpu) {
-		mp->cpu = perf_cpu_map__cpu(evlist->core.cpus, idx);
+		mp->cpu = perf_cpu_map__cpu(evlist->core.user_cpus, idx);
 		if (evlist->core.threads)
 			mp->tid = perf_thread_map__pid(evlist->core.threads, 0);
 		else
diff --git a/tools/perf/util/bpf_ftrace.c b/tools/perf/util/bpf_ftrace.c
index 4f4d3aaff37c..69481b28b885 100644
--- a/tools/perf/util/bpf_ftrace.c
+++ b/tools/perf/util/bpf_ftrace.c
@@ -38,7 +38,7 @@ int perf_ftrace__latency_prepare_bpf(struct perf_ftrace *ftrace)
 
 	/* don't need to set cpu filter for system-wide mode */
 	if (ftrace->target.cpu_list) {
-		ncpus = perf_cpu_map__nr(ftrace->evlist->core.cpus);
+		ncpus = perf_cpu_map__nr(ftrace->evlist->core.user_cpus);
 		bpf_map__set_max_entries(skel->maps.cpu_filter, ncpus);
 	}
 
@@ -63,7 +63,7 @@ int perf_ftrace__latency_prepare_bpf(struct perf_ftrace *ftrace)
 		fd = bpf_map__fd(skel->maps.cpu_filter);
 
 		for (i = 0; i < ncpus; i++) {
-			cpu = perf_cpu_map__cpu(ftrace->evlist->core.cpus, i).cpu;
+			cpu = perf_cpu_map__cpu(ftrace->evlist->core.user_cpus, i).cpu;
 			bpf_map_update_elem(fd, &cpu, &val, BPF_ANY);
 		}
 	}
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 9bb79e049957..d335fb713f5e 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -440,7 +440,7 @@ static void __evlist__disable(struct evlist *evlist, char *evsel_name)
 	bool has_imm = false;
 
 	// See explanation in evlist__close()
-	if (!cpu_map__is_dummy(evlist->core.cpus)) {
+	if (!cpu_map__is_dummy(evlist->core.user_cpus)) {
 		if (affinity__setup(&saved_affinity) < 0)
 			return;
 		affinity = &saved_affinity;
@@ -500,7 +500,7 @@ static void __evlist__enable(struct evlist *evlist, char *evsel_name)
 	struct affinity saved_affinity, *affinity = NULL;
 
 	// See explanation in evlist__close()
-	if (!cpu_map__is_dummy(evlist->core.cpus)) {
+	if (!cpu_map__is_dummy(evlist->core.user_cpus)) {
 		if (affinity__setup(&saved_affinity) < 0)
 			return;
 		affinity = &saved_affinity;
@@ -565,7 +565,7 @@ static int evlist__enable_event_cpu(struct evlist *evlist, struct evsel *evsel,
 static int evlist__enable_event_thread(struct evlist *evlist, struct evsel *evsel, int thread)
 {
 	int cpu;
-	int nr_cpus = perf_cpu_map__nr(evlist->core.cpus);
+	int nr_cpus = perf_cpu_map__nr(evlist->core.user_cpus);
 
 	if (!evsel->core.fd)
 		return -EINVAL;
@@ -580,7 +580,7 @@ static int evlist__enable_event_thread(struct evlist *evlist, struct evsel *evse
 
 int evlist__enable_event_idx(struct evlist *evlist, struct evsel *evsel, int idx)
 {
-	bool per_cpu_mmaps = !perf_cpu_map__empty(evlist->core.cpus);
+	bool per_cpu_mmaps = !perf_cpu_map__empty(evlist->core.user_cpus);
 
 	if (per_cpu_mmaps)
 		return evlist__enable_event_cpu(evlist, evsel, idx);
@@ -1301,10 +1301,10 @@ void evlist__close(struct evlist *evlist)
 	struct affinity affinity;
 
 	/*
-	 * With perf record core.cpus is usually NULL.
+	 * With perf record core.user_cpus is usually NULL.
 	 * Use the old method to handle this for now.
 	 */
-	if (!evlist->core.cpus || cpu_map__is_dummy(evlist->core.cpus)) {
+	if (!evlist->core.user_cpus || cpu_map__is_dummy(evlist->core.user_cpus)) {
 		evlist__for_each_entry_reverse(evlist, evsel)
 			evsel__close(evsel);
 		return;
@@ -1367,7 +1367,7 @@ int evlist__open(struct evlist *evlist)
 	 * Default: one fd per CPU, all threads, aka systemwide
 	 * as sys_perf_event_open(cpu = -1, thread = -1) is EINVAL
 	 */
-	if (evlist->core.threads == NULL && evlist->core.cpus == NULL) {
+	if (evlist->core.threads == NULL && evlist->core.user_cpus == NULL) {
 		err = evlist__create_syswide_maps(evlist);
 		if (err < 0)
 			goto out_err;
diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
index 007a64681416..ff326eba084f 100644
--- a/tools/perf/util/record.c
+++ b/tools/perf/util/record.c
@@ -106,7 +106,7 @@ void evlist__config(struct evlist *evlist, struct record_opts *opts, struct call
 	if (opts->group)
 		evlist__set_leader(evlist);
 
-	if (perf_cpu_map__cpu(evlist->core.cpus, 0).cpu < 0)
+	if (perf_cpu_map__cpu(evlist->core.user_cpus, 0).cpu < 0)
 		opts->no_inherit = true;
 
 	use_comm_exec = perf_can_comm_exec();
@@ -244,7 +244,7 @@ bool evlist__can_select_event(struct evlist *evlist, const char *str)
 
 	evsel = evlist__last(temp_evlist);
 
-	if (!evlist || perf_cpu_map__empty(evlist->core.cpus)) {
+	if (!evlist || perf_cpu_map__empty(evlist->core.user_cpus)) {
 		struct perf_cpu_map *cpus = perf_cpu_map__new(NULL);
 
 		if (cpus)
@@ -252,7 +252,7 @@ bool evlist__can_select_event(struct evlist *evlist, const char *str)
 
 		perf_cpu_map__put(cpus);
 	} else {
-		cpu = perf_cpu_map__cpu(evlist->core.cpus, 0);
+		cpu = perf_cpu_map__cpu(evlist->core.user_cpus, 0);
 	}
 
 	while (1) {
diff --git a/tools/perf/util/sideband_evlist.c b/tools/perf/util/sideband_evlist.c
index 748371ac22be..9f58c68a25f7 100644
--- a/tools/perf/util/sideband_evlist.c
+++ b/tools/perf/util/sideband_evlist.c
@@ -114,7 +114,7 @@ int evlist__start_sb_thread(struct evlist *evlist, struct target *target)
 	}
 
 	evlist__for_each_entry(evlist, counter) {
-		if (evsel__open(counter, evlist->core.cpus, evlist->core.threads) < 0)
+		if (evsel__open(counter, evlist->core.user_cpus, evlist->core.threads) < 0)
 			goto out_delete_evlist;
 	}
 
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 9cbe351b141f..634dd9ea2b35 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -929,7 +929,7 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
 	int all_idx;
 	struct perf_cpu cpu;
 
-	perf_cpu_map__for_each_cpu(cpu, all_idx, evlist->core.cpus) {
+	perf_cpu_map__for_each_cpu(cpu, all_idx, evlist->core.user_cpus) {
 		struct evsel *counter;
 		bool first = true;
 
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index b654de0841f8..591afc6c607b 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -2127,7 +2127,7 @@ int perf_event__synthesize_stat_events(struct perf_stat_config *config, struct p
 		return err;
 	}
 
-	err = perf_event__synthesize_cpu_map(tool, evlist->core.cpus, process, NULL);
+	err = perf_event__synthesize_cpu_map(tool, evlist->core.user_cpus, process, NULL);
 	if (err < 0) {
 		pr_err("Couldn't synthesize thread map.\n");
 		return err;
diff --git a/tools/perf/util/top.c b/tools/perf/util/top.c
index c1ebfc5d2e0c..e98422f3ff17 100644
--- a/tools/perf/util/top.c
+++ b/tools/perf/util/top.c
@@ -95,15 +95,16 @@ size_t perf_top__header_snprintf(struct perf_top *top, char *bf, size_t size)
 
 	if (target->cpu_list)
 		ret += SNPRINTF(bf + ret, size - ret, ", CPU%s: %s)",
-				perf_cpu_map__nr(top->evlist->core.cpus) > 1 ? "s" : "",
+				perf_cpu_map__nr(top->evlist->core.user_cpus) > 1 ? "s" : "",
 				target->cpu_list);
 	else {
 		if (target->tid)
 			ret += SNPRINTF(bf + ret, size - ret, ")");
 		else
 			ret += SNPRINTF(bf + ret, size - ret, ", %d CPU%s)",
-					perf_cpu_map__nr(top->evlist->core.cpus),
-					perf_cpu_map__nr(top->evlist->core.cpus) > 1 ? "s" : "");
+					perf_cpu_map__nr(top->evlist->core.user_cpus),
+					perf_cpu_map__nr(top->evlist->core.user_cpus) > 1
+					? "s" : "");
 	}
 
 	perf_top__reset_sample_counters(top);
-- 
2.35.1.1021.g381101b075-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 2/5] perf cpumap: More cpu map reuse by merge.
  2022-03-28  6:24 ` Ian Rogers
@ 2022-03-28  6:24   ` Ian Rogers
  -1 siblings, 0 replies; 34+ messages in thread
From: Ian Rogers @ 2022-03-28  6:24 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Adrian Hunter, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel, coresight,
	linux-arm-kernel, netdev, bpf
  Cc: Stephane Eranian, Ian Rogers

perf_cpu_map__merge will reuse one of its arguments if the two maps are
equal or the other argument is NULL. The arguments can also be reused
when one map is known to be a subset of the other. For example, merging
a map of CPUs 0-1 with a map of just CPU 0 yields the map of 0-1, yet
currently a new map is created rather than taking a reference on the
original 0-1 map.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/lib/perf/cpumap.c | 38 ++++++++++++++++++++++++++++----------
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
index ee66760f1e63..953bc50b0e41 100644
--- a/tools/lib/perf/cpumap.c
+++ b/tools/lib/perf/cpumap.c
@@ -319,6 +319,29 @@ struct perf_cpu perf_cpu_map__max(struct perf_cpu_map *map)
 	return map->nr > 0 ? map->map[map->nr - 1] : result;
 }
 
+/** Is 'b' a subset of 'a'. */
+static bool perf_cpu_map__is_subset(const struct perf_cpu_map *a,
+				    const struct perf_cpu_map *b)
+{
+	int i, j;
+
+	if (a == b || !b)
+		return true;
+	if (!a || b->nr > a->nr)
+		return false;
+	j = 0;
+	for (i = 0; i < a->nr; i++) {
+		if (a->map[i].cpu > b->map[j].cpu)
+			return false;
+		if (a->map[i].cpu == b->map[j].cpu) {
+			j++;
+			if (j == b->nr)
+				return true;
+		}
+	}
+	return false;
+}
+
 /*
  * Merge two cpumaps
  *
@@ -335,17 +358,12 @@ struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
 	int i, j, k;
 	struct perf_cpu_map *merged;
 
-	if (!orig && !other)
-		return NULL;
-	if (!orig) {
-		perf_cpu_map__get(other);
-		return other;
-	}
-	if (!other)
-		return orig;
-	if (orig->nr == other->nr &&
-	    !memcmp(orig->map, other->map, orig->nr * sizeof(struct perf_cpu)))
+	if (perf_cpu_map__is_subset(orig, other))
 		return orig;
+	if (perf_cpu_map__is_subset(other, orig)) {
+		perf_cpu_map__put(orig);
+		return perf_cpu_map__get(other);
+	}
 
 	tmp_len = orig->nr + other->nr;
 	tmp_cpus = malloc(tmp_len * sizeof(struct perf_cpu));
-- 
2.35.1.1021.g381101b075-goog


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 3/5] perf cpumap: Add intersect function.
  2022-03-28  6:24 ` Ian Rogers
@ 2022-03-28  6:24   ` Ian Rogers
  -1 siblings, 0 replies; 34+ messages in thread
From: Ian Rogers @ 2022-03-28  6:24 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Adrian Hunter, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel, coresight,
	linux-arm-kernel, netdev, bpf
  Cc: Stephane Eranian, Ian Rogers

The merge function gives the union of two cpu maps. Add an intersect
function which will be used in the next change.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/lib/perf/cpumap.c              | 38 ++++++++++++++++++++++++++++
 tools/lib/perf/include/perf/cpumap.h |  2 ++
 2 files changed, 40 insertions(+)

diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
index 953bc50b0e41..56b4d213039f 100644
--- a/tools/lib/perf/cpumap.c
+++ b/tools/lib/perf/cpumap.c
@@ -393,3 +393,41 @@ struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
 	perf_cpu_map__put(orig);
 	return merged;
 }
+
+struct perf_cpu_map *perf_cpu_map__intersect(struct perf_cpu_map *orig,
+					     struct perf_cpu_map *other)
+{
+	struct perf_cpu *tmp_cpus;
+	int tmp_len;
+	int i, j, k;
+	struct perf_cpu_map *merged = NULL;
+
+	if (perf_cpu_map__is_subset(other, orig))
+		return orig;
+	if (perf_cpu_map__is_subset(orig, other)) {
+		perf_cpu_map__put(orig);
+		return perf_cpu_map__get(other);
+	}
+
+	tmp_len = max(orig->nr, other->nr);
+	tmp_cpus = malloc(tmp_len * sizeof(struct perf_cpu));
+	if (!tmp_cpus)
+		return NULL;
+
+	i = j = k = 0;
+	while (i < orig->nr && j < other->nr) {
+		if (orig->map[i].cpu < other->map[j].cpu)
+			i++;
+		else if (orig->map[i].cpu > other->map[j].cpu)
+			j++;
+		else {
+			j++;
+			tmp_cpus[k++] = orig->map[i++];
+		}
+	}
+	if (k)
+		merged = cpu_map__trim_new(k, tmp_cpus);
+	free(tmp_cpus);
+	perf_cpu_map__put(orig);
+	return merged;
+}
diff --git a/tools/lib/perf/include/perf/cpumap.h b/tools/lib/perf/include/perf/cpumap.h
index 4a2edbdb5e2b..a2a7216c0b78 100644
--- a/tools/lib/perf/include/perf/cpumap.h
+++ b/tools/lib/perf/include/perf/cpumap.h
@@ -19,6 +19,8 @@ LIBPERF_API struct perf_cpu_map *perf_cpu_map__read(FILE *file);
 LIBPERF_API struct perf_cpu_map *perf_cpu_map__get(struct perf_cpu_map *map);
 LIBPERF_API struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
 						     struct perf_cpu_map *other);
+LIBPERF_API struct perf_cpu_map *perf_cpu_map__intersect(struct perf_cpu_map *orig,
+							 struct perf_cpu_map *other);
 LIBPERF_API void perf_cpu_map__put(struct perf_cpu_map *map);
 LIBPERF_API struct perf_cpu perf_cpu_map__cpu(const struct perf_cpu_map *cpus, int idx);
 LIBPERF_API int perf_cpu_map__nr(const struct perf_cpu_map *cpus);
-- 
2.35.1.1021.g381101b075-goog


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 4/5] perf stat: Avoid segv if core.user_cpus isn't set.
  2022-03-28  6:24 ` Ian Rogers
@ 2022-03-28  6:24   ` Ian Rogers
  -1 siblings, 0 replies; 34+ messages in thread
From: Ian Rogers @ 2022-03-28  6:24 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Adrian Hunter, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel, coresight,
	linux-arm-kernel, netdev, bpf
  Cc: Stephane Eranian, Ian Rogers

Passing NULL to perf_cpu_map__max() doesn't make sense, as there is no
valid maximum. Avoid this problem by NULL-checking the map in
perf_stat_init_aggr_mode.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-stat.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 5bee529f7656..ecd5cf4fd872 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1472,7 +1472,10 @@ static int perf_stat_init_aggr_mode(void)
 	 * taking the highest cpu number to be the size of
 	 * the aggregation translate cpumap.
 	 */
-	nr = perf_cpu_map__max(evsel_list->core.user_cpus).cpu;
+	if (evsel_list->core.user_cpus)
+		nr = perf_cpu_map__max(evsel_list->core.user_cpus).cpu;
+	else
+		nr = 0;
 	stat_config.cpus_aggr_map = cpu_aggr_map__empty_new(nr + 1);
 	return stat_config.cpus_aggr_map ? 0 : -ENOMEM;
 }
-- 
2.35.1.1021.g381101b075-goog


^ permalink raw reply related	[flat|nested] 34+ messages in thread


* [PATCH 5/5] perf evlist: Respect all_cpus when setting user_cpus
  2022-03-28  6:24 ` Ian Rogers
@ 2022-03-28  6:24   ` Ian Rogers
  -1 siblings, 0 replies; 34+ messages in thread
From: Ian Rogers @ 2022-03-28  6:24 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Adrian Hunter, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel, coresight,
	linux-arm-kernel, netdev, bpf
  Cc: Stephane Eranian, Ian Rogers

If computed, all_cpus represents the merge/union of all evsel CPU
maps. By default user_cpus is set to all online CPUs. For uncore
events it is currently often the case that all_cpus is a proper
subset of user_cpus. Metrics printed without aggregation and with
metric-only, in print_no_aggr_metric, iterate over user_cpus
assuming every CPU has a metric to print. The prefix is printed
for each CPU, but if the evsel's CPU map doesn't contain that CPU
you get an empty line, like the following on a SkylakeX:

```
$ perf stat -A -M DRAM_BW_Use -a --metric-only -I 1000
     1.000453137 CPU0                       0.00
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137 CPU18                      0.00
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     1.000453137
     2.003717143 CPU0                       0.00
...
```

While it is possible to be lazier about printing the prefix and
trailing newline, making user_cpus a subset of all_cpus is
preferable so that wasted work isn't done elsewhere that
user_cpus is used. This change sets user_cpus to the intersection
of the user-specified CPUs (defaulting to all online CPUs) with
the CPUs computed through the merge of all evsel CPU maps.

New behavior:
```
$ perf stat -A -M DRAM_BW_Use -a --metric-only -I 1000
     1.001086325 CPU0                       0.00
     1.001086325 CPU18                      0.00
     2.003671291 CPU0                       0.00
     2.003671291 CPU18                      0.00
...
```

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/evlist.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index d335fb713f5e..91bbb66b7e9a 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1036,6 +1036,8 @@ int evlist__create_maps(struct evlist *evlist, struct target *target)
 	if (!cpus)
 		goto out_delete_threads;
 
+	if (evlist->core.all_cpus)
+		cpus = perf_cpu_map__intersect(cpus, evlist->core.all_cpus);
 	evlist->core.has_user_cpus = !!target->cpu_list && !target->hybrid;
 
 	perf_evlist__set_maps(&evlist->core, cpus, threads);
-- 
2.35.1.1021.g381101b075-goog


^ permalink raw reply related	[flat|nested] 34+ messages in thread


* Re: [PATCH 1/5] perf evlist: Rename cpus to user_cpus
  2022-03-28  6:24   ` Ian Rogers
@ 2022-03-28 20:18     ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 34+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-03-28 20:18 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Leo Yan, John Garry, Will Deacon, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Kajol Jain, James Clark,
	German Gomez, Adrian Hunter, Riccardo Mancini, Andi Kleen,
	Alexey Bayduraev, Alexander Antonov, linux-perf-users,
	linux-kernel, coresight, linux-arm-kernel, netdev, bpf,
	Stephane Eranian

Em Sun, Mar 27, 2022 at 11:24:10PM -0700, Ian Rogers escreveu:
> evlist contains cpus and all_cpus. all_cpus is the union of the cpu maps
> of all evsels. cpus is set to be cpus required from the command line,

Can we replace "required" with "requested"?

> defaulting to all online cpus if no cpus are specified.

> For something like an uncore event, all_cpus may just be CPU 0,
> however, all_cpus may be every online CPU.

Can this be rephrased as:

"For an uncore event, all_cpus may be just CPU 0 or every online CPU."

?

> This causes all_cpus to have fewer values than the cpus variable which
> is confusing given the 'all' in the name. To try to make the behavior
> clearer, rename cpus to user_cpus and add comments on the two struct
> variables.

"user_cpus" can as well mean CPUs where we should only sample user level
events, so perhaps bite the bullet and rename it to the longer

  'evlist->user_requested_cpus'

?

- Arnaldo
 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/lib/perf/evlist.c                  | 28 ++++++++++++------------
>  tools/lib/perf/include/internal/evlist.h |  4 +++-
>  tools/perf/arch/arm/util/cs-etm.c        |  8 +++----
>  tools/perf/arch/arm64/util/arm-spe.c     |  2 +-
>  tools/perf/arch/x86/util/intel-bts.c     |  2 +-
>  tools/perf/arch/x86/util/intel-pt.c      |  4 ++--
>  tools/perf/bench/evlist-open-close.c     |  2 +-
>  tools/perf/builtin-ftrace.c              |  2 +-
>  tools/perf/builtin-record.c              |  6 ++---
>  tools/perf/builtin-stat.c                |  8 +++----
>  tools/perf/builtin-top.c                 |  2 +-
>  tools/perf/util/auxtrace.c               |  2 +-
>  tools/perf/util/bpf_ftrace.c             |  4 ++--
>  tools/perf/util/evlist.c                 | 14 ++++++------
>  tools/perf/util/record.c                 |  6 ++---
>  tools/perf/util/sideband_evlist.c        |  2 +-
>  tools/perf/util/stat-display.c           |  2 +-
>  tools/perf/util/synthetic-events.c       |  2 +-
>  tools/perf/util/top.c                    |  7 +++---
>  19 files changed, 55 insertions(+), 52 deletions(-)
> 
> diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> index 9a770bfdc804..e29dc229768a 100644
> --- a/tools/lib/perf/evlist.c
> +++ b/tools/lib/perf/evlist.c
> @@ -41,10 +41,10 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
>  	 */
>  	if (!evsel->own_cpus || evlist->has_user_cpus) {
>  		perf_cpu_map__put(evsel->cpus);
> -		evsel->cpus = perf_cpu_map__get(evlist->cpus);
> -	} else if (!evsel->system_wide && perf_cpu_map__empty(evlist->cpus)) {
> +		evsel->cpus = perf_cpu_map__get(evlist->user_cpus);
> +	} else if (!evsel->system_wide && perf_cpu_map__empty(evlist->user_cpus)) {
>  		perf_cpu_map__put(evsel->cpus);
> -		evsel->cpus = perf_cpu_map__get(evlist->cpus);
> +		evsel->cpus = perf_cpu_map__get(evlist->user_cpus);
>  	} else if (evsel->cpus != evsel->own_cpus) {
>  		perf_cpu_map__put(evsel->cpus);
>  		evsel->cpus = perf_cpu_map__get(evsel->own_cpus);
> @@ -123,10 +123,10 @@ static void perf_evlist__purge(struct perf_evlist *evlist)
>  
>  void perf_evlist__exit(struct perf_evlist *evlist)
>  {
> -	perf_cpu_map__put(evlist->cpus);
> +	perf_cpu_map__put(evlist->user_cpus);
>  	perf_cpu_map__put(evlist->all_cpus);
>  	perf_thread_map__put(evlist->threads);
> -	evlist->cpus = NULL;
> +	evlist->user_cpus = NULL;
>  	evlist->all_cpus = NULL;
>  	evlist->threads = NULL;
>  	fdarray__exit(&evlist->pollfd);
> @@ -155,9 +155,9 @@ void perf_evlist__set_maps(struct perf_evlist *evlist,
>  	 * original reference count of 1.  If that is not the case it is up to
>  	 * the caller to increase the reference count.
>  	 */
> -	if (cpus != evlist->cpus) {
> -		perf_cpu_map__put(evlist->cpus);
> -		evlist->cpus = perf_cpu_map__get(cpus);
> +	if (cpus != evlist->user_cpus) {
> +		perf_cpu_map__put(evlist->user_cpus);
> +		evlist->user_cpus = perf_cpu_map__get(cpus);
>  	}
>  
>  	if (threads != evlist->threads) {
> @@ -294,7 +294,7 @@ int perf_evlist__id_add_fd(struct perf_evlist *evlist,
>  
>  int perf_evlist__alloc_pollfd(struct perf_evlist *evlist)
>  {
> -	int nr_cpus = perf_cpu_map__nr(evlist->cpus);
> +	int nr_cpus = perf_cpu_map__nr(evlist->user_cpus);
>  	int nr_threads = perf_thread_map__nr(evlist->threads);
>  	int nfds = 0;
>  	struct perf_evsel *evsel;
> @@ -426,7 +426,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
>  	       int idx, struct perf_mmap_param *mp, int cpu_idx,
>  	       int thread, int *_output, int *_output_overwrite)
>  {
> -	struct perf_cpu evlist_cpu = perf_cpu_map__cpu(evlist->cpus, cpu_idx);
> +	struct perf_cpu evlist_cpu = perf_cpu_map__cpu(evlist->user_cpus, cpu_idx);
>  	struct perf_evsel *evsel;
>  	int revent;
>  
> @@ -536,7 +536,7 @@ mmap_per_cpu(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
>  	     struct perf_mmap_param *mp)
>  {
>  	int nr_threads = perf_thread_map__nr(evlist->threads);
> -	int nr_cpus    = perf_cpu_map__nr(evlist->cpus);
> +	int nr_cpus    = perf_cpu_map__nr(evlist->user_cpus);
>  	int cpu, thread;
>  
>  	for (cpu = 0; cpu < nr_cpus; cpu++) {
> @@ -564,8 +564,8 @@ static int perf_evlist__nr_mmaps(struct perf_evlist *evlist)
>  {
>  	int nr_mmaps;
>  
> -	nr_mmaps = perf_cpu_map__nr(evlist->cpus);
> -	if (perf_cpu_map__empty(evlist->cpus))
> +	nr_mmaps = perf_cpu_map__nr(evlist->user_cpus);
> +	if (perf_cpu_map__empty(evlist->user_cpus))
>  		nr_mmaps = perf_thread_map__nr(evlist->threads);
>  
>  	return nr_mmaps;
> @@ -576,7 +576,7 @@ int perf_evlist__mmap_ops(struct perf_evlist *evlist,
>  			  struct perf_mmap_param *mp)
>  {
>  	struct perf_evsel *evsel;
> -	const struct perf_cpu_map *cpus = evlist->cpus;
> +	const struct perf_cpu_map *cpus = evlist->user_cpus;
>  	const struct perf_thread_map *threads = evlist->threads;
>  
>  	if (!ops || !ops->get || !ops->mmap)
> diff --git a/tools/lib/perf/include/internal/evlist.h b/tools/lib/perf/include/internal/evlist.h
> index 4cefade540bd..5f95672662ae 100644
> --- a/tools/lib/perf/include/internal/evlist.h
> +++ b/tools/lib/perf/include/internal/evlist.h
> @@ -19,7 +19,9 @@ struct perf_evlist {
>  	int			 nr_entries;
>  	int			 nr_groups;
>  	bool			 has_user_cpus;
> -	struct perf_cpu_map	*cpus;
> +	/** The list of cpus passed from the command line. */
> +	struct perf_cpu_map	*user_cpus;
> +	/** The union of all evsel cpu maps. */
>  	struct perf_cpu_map	*all_cpus;
>  	struct perf_thread_map	*threads;
>  	int			 nr_mmaps;
> diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
> index cbc555245959..405d58903d84 100644
> --- a/tools/perf/arch/arm/util/cs-etm.c
> +++ b/tools/perf/arch/arm/util/cs-etm.c
> @@ -199,7 +199,7 @@ static int cs_etm_set_option(struct auxtrace_record *itr,
>  			     struct evsel *evsel, u32 option)
>  {
>  	int i, err = -EINVAL;
> -	struct perf_cpu_map *event_cpus = evsel->evlist->core.cpus;
> +	struct perf_cpu_map *event_cpus = evsel->evlist->core.user_cpus;
>  	struct perf_cpu_map *online_cpus = perf_cpu_map__new(NULL);
>  
>  	/* Set option of each CPU we have */
> @@ -299,7 +299,7 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
>  				container_of(itr, struct cs_etm_recording, itr);
>  	struct perf_pmu *cs_etm_pmu = ptr->cs_etm_pmu;
>  	struct evsel *evsel, *cs_etm_evsel = NULL;
> -	struct perf_cpu_map *cpus = evlist->core.cpus;
> +	struct perf_cpu_map *cpus = evlist->core.user_cpus;
>  	bool privileged = perf_event_paranoid_check(-1);
>  	int err = 0;
>  
> @@ -522,7 +522,7 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
>  {
>  	int i;
>  	int etmv3 = 0, etmv4 = 0, ete = 0;
> -	struct perf_cpu_map *event_cpus = evlist->core.cpus;
> +	struct perf_cpu_map *event_cpus = evlist->core.user_cpus;
>  	struct perf_cpu_map *online_cpus = perf_cpu_map__new(NULL);
>  
>  	/* cpu map is not empty, we have specific CPUs to work with */
> @@ -713,7 +713,7 @@ static int cs_etm_info_fill(struct auxtrace_record *itr,
>  	u32 offset;
>  	u64 nr_cpu, type;
>  	struct perf_cpu_map *cpu_map;
> -	struct perf_cpu_map *event_cpus = session->evlist->core.cpus;
> +	struct perf_cpu_map *event_cpus = session->evlist->core.user_cpus;
>  	struct perf_cpu_map *online_cpus = perf_cpu_map__new(NULL);
>  	struct cs_etm_recording *ptr =
>  			container_of(itr, struct cs_etm_recording, itr);
> diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
> index 5860bbaea95a..83ad05613321 100644
> --- a/tools/perf/arch/arm64/util/arm-spe.c
> +++ b/tools/perf/arch/arm64/util/arm-spe.c
> @@ -144,7 +144,7 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
>  			container_of(itr, struct arm_spe_recording, itr);
>  	struct perf_pmu *arm_spe_pmu = sper->arm_spe_pmu;
>  	struct evsel *evsel, *arm_spe_evsel = NULL;
> -	struct perf_cpu_map *cpus = evlist->core.cpus;
> +	struct perf_cpu_map *cpus = evlist->core.user_cpus;
>  	bool privileged = perf_event_paranoid_check(-1);
>  	struct evsel *tracking_evsel;
>  	int err;
> diff --git a/tools/perf/arch/x86/util/intel-bts.c b/tools/perf/arch/x86/util/intel-bts.c
> index 4a76d49d25d6..c9d73ecfd795 100644
> --- a/tools/perf/arch/x86/util/intel-bts.c
> +++ b/tools/perf/arch/x86/util/intel-bts.c
> @@ -110,7 +110,7 @@ static int intel_bts_recording_options(struct auxtrace_record *itr,
>  			container_of(itr, struct intel_bts_recording, itr);
>  	struct perf_pmu *intel_bts_pmu = btsr->intel_bts_pmu;
>  	struct evsel *evsel, *intel_bts_evsel = NULL;
> -	const struct perf_cpu_map *cpus = evlist->core.cpus;
> +	const struct perf_cpu_map *cpus = evlist->core.user_cpus;
>  	bool privileged = perf_event_paranoid_check(-1);
>  
>  	if (opts->auxtrace_sample_mode) {
> diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
> index 8c31578d6f4a..58bf24960273 100644
> --- a/tools/perf/arch/x86/util/intel-pt.c
> +++ b/tools/perf/arch/x86/util/intel-pt.c
> @@ -382,7 +382,7 @@ static int intel_pt_info_fill(struct auxtrace_record *itr,
>  			ui__warning("Intel Processor Trace: TSC not available\n");
>  	}
>  
> -	per_cpu_mmaps = !perf_cpu_map__empty(session->evlist->core.cpus);
> +	per_cpu_mmaps = !perf_cpu_map__empty(session->evlist->core.user_cpus);
>  
>  	auxtrace_info->type = PERF_AUXTRACE_INTEL_PT;
>  	auxtrace_info->priv[INTEL_PT_PMU_TYPE] = intel_pt_pmu->type;
> @@ -632,7 +632,7 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
>  	struct perf_pmu *intel_pt_pmu = ptr->intel_pt_pmu;
>  	bool have_timing_info, need_immediate = false;
>  	struct evsel *evsel, *intel_pt_evsel = NULL;
> -	const struct perf_cpu_map *cpus = evlist->core.cpus;
> +	const struct perf_cpu_map *cpus = evlist->core.user_cpus;
>  	bool privileged = perf_event_paranoid_check(-1);
>  	u64 tsc_bit;
>  	int err;
> diff --git a/tools/perf/bench/evlist-open-close.c b/tools/perf/bench/evlist-open-close.c
> index de56601f69ee..5bdc6b476a4d 100644
> --- a/tools/perf/bench/evlist-open-close.c
> +++ b/tools/perf/bench/evlist-open-close.c
> @@ -151,7 +151,7 @@ static int bench_evlist_open_close__run(char *evstr)
>  
>  	init_stats(&time_stats);
>  
> -	printf("  Number of cpus:\t%d\n", perf_cpu_map__nr(evlist->core.cpus));
> +	printf("  Number of cpus:\t%d\n", perf_cpu_map__nr(evlist->core.user_cpus));
>  	printf("  Number of threads:\t%d\n", evlist->core.threads->nr);
>  	printf("  Number of events:\t%d (%d fds)\n",
>  		evlist->core.nr_entries, evlist__count_evsel_fds(evlist));
> diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
> index ad9ce1bfffa1..642cbc6fdfc5 100644
> --- a/tools/perf/builtin-ftrace.c
> +++ b/tools/perf/builtin-ftrace.c
> @@ -301,7 +301,7 @@ static int set_tracing_cpumask(struct perf_cpu_map *cpumap)
>  
>  static int set_tracing_cpu(struct perf_ftrace *ftrace)
>  {
> -	struct perf_cpu_map *cpumap = ftrace->evlist->core.cpus;
> +	struct perf_cpu_map *cpumap = ftrace->evlist->core.user_cpus;
>  
>  	if (!target__has_cpu(&ftrace->target))
>  		return 0;
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 0b4abed555d8..28ab3866802c 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -987,7 +987,7 @@ static int record__thread_data_init_maps(struct record_thread *thread_data, stru
>  	int m, tm, nr_mmaps = evlist->core.nr_mmaps;
>  	struct mmap *mmap = evlist->mmap;
>  	struct mmap *overwrite_mmap = evlist->overwrite_mmap;
> -	struct perf_cpu_map *cpus = evlist->core.cpus;
> +	struct perf_cpu_map *cpus = evlist->core.user_cpus;
>  
>  	thread_data->nr_mmaps = bitmap_weight(thread_data->mask->maps.bits,
>  					      thread_data->mask->maps.nbits);
> @@ -1881,7 +1881,7 @@ static int record__synthesize(struct record *rec, bool tail)
>  		return err;
>  	}
>  
> -	err = perf_event__synthesize_cpu_map(&rec->tool, rec->evlist->core.cpus,
> +	err = perf_event__synthesize_cpu_map(&rec->tool, rec->evlist->core.user_cpus,
>  					     process_synthesized_event, NULL);
>  	if (err < 0) {
>  		pr_err("Couldn't synthesize cpu map.\n");
> @@ -3675,7 +3675,7 @@ static int record__init_thread_default_masks(struct record *rec, struct perf_cpu
>  static int record__init_thread_masks(struct record *rec)
>  {
>  	int ret = 0;
> -	struct perf_cpu_map *cpus = rec->evlist->core.cpus;
> +	struct perf_cpu_map *cpus = rec->evlist->core.user_cpus;
>  
>  	if (!record__threads_enabled(rec))
>  		return record__init_thread_default_masks(rec, cpus);
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 4ee40de698a4..5bee529f7656 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -804,7 +804,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>  	if (group)
>  		evlist__set_leader(evsel_list);
>  
> -	if (!cpu_map__is_dummy(evsel_list->core.cpus)) {
> +	if (!cpu_map__is_dummy(evsel_list->core.user_cpus)) {
>  		if (affinity__setup(&saved_affinity) < 0)
>  			return -1;
>  		affinity = &saved_affinity;
> @@ -1458,7 +1458,7 @@ static int perf_stat_init_aggr_mode(void)
>  	aggr_cpu_id_get_t get_id = aggr_mode__get_aggr(stat_config.aggr_mode);
>  
>  	if (get_id) {
> -		stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.cpus,
> +		stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_cpus,
>  							 get_id, /*data=*/NULL);
>  		if (!stat_config.aggr_map) {
>  			pr_err("cannot build %s map", aggr_mode__string[stat_config.aggr_mode]);
> @@ -1472,7 +1472,7 @@ static int perf_stat_init_aggr_mode(void)
>  	 * taking the highest cpu number to be the size of
>  	 * the aggregation translate cpumap.
>  	 */
> -	nr = perf_cpu_map__max(evsel_list->core.cpus).cpu;
> +	nr = perf_cpu_map__max(evsel_list->core.user_cpus).cpu;
>  	stat_config.cpus_aggr_map = cpu_aggr_map__empty_new(nr + 1);
>  	return stat_config.cpus_aggr_map ? 0 : -ENOMEM;
>  }
> @@ -1627,7 +1627,7 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
>  	if (!get_id)
>  		return 0;
>  
> -	stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.cpus, get_id, env);
> +	stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_cpus, get_id, env);
>  	if (!stat_config.aggr_map) {
>  		pr_err("cannot build %s map", aggr_mode__string[stat_config.aggr_mode]);
>  		return -1;
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index 9b08e44a31d9..4cfa112292d0 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -1021,7 +1021,7 @@ static int perf_top__start_counters(struct perf_top *top)
>  
>  	evlist__for_each_entry(evlist, counter) {
>  try_again:
> -		if (evsel__open(counter, top->evlist->core.cpus,
> +		if (evsel__open(counter, top->evlist->core.user_cpus,
>  				     top->evlist->core.threads) < 0) {
>  
>  			/*
> diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
> index 9e48652662d4..b138dd6bdefc 100644
> --- a/tools/perf/util/auxtrace.c
> +++ b/tools/perf/util/auxtrace.c
> @@ -174,7 +174,7 @@ void auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp,
>  	mp->idx = idx;
>  
>  	if (per_cpu) {
> -		mp->cpu = perf_cpu_map__cpu(evlist->core.cpus, idx);
> +		mp->cpu = perf_cpu_map__cpu(evlist->core.user_cpus, idx);
>  		if (evlist->core.threads)
>  			mp->tid = perf_thread_map__pid(evlist->core.threads, 0);
>  		else
> diff --git a/tools/perf/util/bpf_ftrace.c b/tools/perf/util/bpf_ftrace.c
> index 4f4d3aaff37c..69481b28b885 100644
> --- a/tools/perf/util/bpf_ftrace.c
> +++ b/tools/perf/util/bpf_ftrace.c
> @@ -38,7 +38,7 @@ int perf_ftrace__latency_prepare_bpf(struct perf_ftrace *ftrace)
>  
>  	/* don't need to set cpu filter for system-wide mode */
>  	if (ftrace->target.cpu_list) {
> -		ncpus = perf_cpu_map__nr(ftrace->evlist->core.cpus);
> +		ncpus = perf_cpu_map__nr(ftrace->evlist->core.user_cpus);
>  		bpf_map__set_max_entries(skel->maps.cpu_filter, ncpus);
>  	}
>  
> @@ -63,7 +63,7 @@ int perf_ftrace__latency_prepare_bpf(struct perf_ftrace *ftrace)
>  		fd = bpf_map__fd(skel->maps.cpu_filter);
>  
>  		for (i = 0; i < ncpus; i++) {
> -			cpu = perf_cpu_map__cpu(ftrace->evlist->core.cpus, i).cpu;
> +			cpu = perf_cpu_map__cpu(ftrace->evlist->core.user_cpus, i).cpu;
>  			bpf_map_update_elem(fd, &cpu, &val, BPF_ANY);
>  		}
>  	}
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 9bb79e049957..d335fb713f5e 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -440,7 +440,7 @@ static void __evlist__disable(struct evlist *evlist, char *evsel_name)
>  	bool has_imm = false;
>  
>  	// See explanation in evlist__close()
> -	if (!cpu_map__is_dummy(evlist->core.cpus)) {
> +	if (!cpu_map__is_dummy(evlist->core.user_cpus)) {
>  		if (affinity__setup(&saved_affinity) < 0)
>  			return;
>  		affinity = &saved_affinity;
> @@ -500,7 +500,7 @@ static void __evlist__enable(struct evlist *evlist, char *evsel_name)
>  	struct affinity saved_affinity, *affinity = NULL;
>  
>  	// See explanation in evlist__close()
> -	if (!cpu_map__is_dummy(evlist->core.cpus)) {
> +	if (!cpu_map__is_dummy(evlist->core.user_cpus)) {
>  		if (affinity__setup(&saved_affinity) < 0)
>  			return;
>  		affinity = &saved_affinity;
> @@ -565,7 +565,7 @@ static int evlist__enable_event_cpu(struct evlist *evlist, struct evsel *evsel,
>  static int evlist__enable_event_thread(struct evlist *evlist, struct evsel *evsel, int thread)
>  {
>  	int cpu;
> -	int nr_cpus = perf_cpu_map__nr(evlist->core.cpus);
> +	int nr_cpus = perf_cpu_map__nr(evlist->core.user_cpus);
>  
>  	if (!evsel->core.fd)
>  		return -EINVAL;
> @@ -580,7 +580,7 @@ static int evlist__enable_event_thread(struct evlist *evlist, struct evsel *evse
>  
>  int evlist__enable_event_idx(struct evlist *evlist, struct evsel *evsel, int idx)
>  {
> -	bool per_cpu_mmaps = !perf_cpu_map__empty(evlist->core.cpus);
> +	bool per_cpu_mmaps = !perf_cpu_map__empty(evlist->core.user_cpus);
>  
>  	if (per_cpu_mmaps)
>  		return evlist__enable_event_cpu(evlist, evsel, idx);
> @@ -1301,10 +1301,10 @@ void evlist__close(struct evlist *evlist)
>  	struct affinity affinity;
>  
>  	/*
> -	 * With perf record core.cpus is usually NULL.
> +	 * With perf record core.user_cpus is usually NULL.
>  	 * Use the old method to handle this for now.
>  	 */
> -	if (!evlist->core.cpus || cpu_map__is_dummy(evlist->core.cpus)) {
> +	if (!evlist->core.user_cpus || cpu_map__is_dummy(evlist->core.user_cpus)) {
>  		evlist__for_each_entry_reverse(evlist, evsel)
>  			evsel__close(evsel);
>  		return;
> @@ -1367,7 +1367,7 @@ int evlist__open(struct evlist *evlist)
>  	 * Default: one fd per CPU, all threads, aka systemwide
>  	 * as sys_perf_event_open(cpu = -1, thread = -1) is EINVAL
>  	 */
> -	if (evlist->core.threads == NULL && evlist->core.cpus == NULL) {
> +	if (evlist->core.threads == NULL && evlist->core.user_cpus == NULL) {
>  		err = evlist__create_syswide_maps(evlist);
>  		if (err < 0)
>  			goto out_err;
> diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
> index 007a64681416..ff326eba084f 100644
> --- a/tools/perf/util/record.c
> +++ b/tools/perf/util/record.c
> @@ -106,7 +106,7 @@ void evlist__config(struct evlist *evlist, struct record_opts *opts, struct call
>  	if (opts->group)
>  		evlist__set_leader(evlist);
>  
> -	if (perf_cpu_map__cpu(evlist->core.cpus, 0).cpu < 0)
> +	if (perf_cpu_map__cpu(evlist->core.user_cpus, 0).cpu < 0)
>  		opts->no_inherit = true;
>  
>  	use_comm_exec = perf_can_comm_exec();
> @@ -244,7 +244,7 @@ bool evlist__can_select_event(struct evlist *evlist, const char *str)
>  
>  	evsel = evlist__last(temp_evlist);
>  
> -	if (!evlist || perf_cpu_map__empty(evlist->core.cpus)) {
> +	if (!evlist || perf_cpu_map__empty(evlist->core.user_cpus)) {
>  		struct perf_cpu_map *cpus = perf_cpu_map__new(NULL);
>  
>  		if (cpus)
> @@ -252,7 +252,7 @@ bool evlist__can_select_event(struct evlist *evlist, const char *str)
>  
>  		perf_cpu_map__put(cpus);
>  	} else {
> -		cpu = perf_cpu_map__cpu(evlist->core.cpus, 0);
> +		cpu = perf_cpu_map__cpu(evlist->core.user_cpus, 0);
>  	}
>  
>  	while (1) {
> diff --git a/tools/perf/util/sideband_evlist.c b/tools/perf/util/sideband_evlist.c
> index 748371ac22be..9f58c68a25f7 100644
> --- a/tools/perf/util/sideband_evlist.c
> +++ b/tools/perf/util/sideband_evlist.c
> @@ -114,7 +114,7 @@ int evlist__start_sb_thread(struct evlist *evlist, struct target *target)
>  	}
>  
>  	evlist__for_each_entry(evlist, counter) {
> -		if (evsel__open(counter, evlist->core.cpus, evlist->core.threads) < 0)
> +		if (evsel__open(counter, evlist->core.user_cpus, evlist->core.threads) < 0)
>  			goto out_delete_evlist;
>  	}
>  
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index 9cbe351b141f..634dd9ea2b35 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -929,7 +929,7 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
>  	int all_idx;
>  	struct perf_cpu cpu;
>  
> -	perf_cpu_map__for_each_cpu(cpu, all_idx, evlist->core.cpus) {
> +	perf_cpu_map__for_each_cpu(cpu, all_idx, evlist->core.user_cpus) {
>  		struct evsel *counter;
>  		bool first = true;
>  
> diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> index b654de0841f8..591afc6c607b 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -2127,7 +2127,7 @@ int perf_event__synthesize_stat_events(struct perf_stat_config *config, struct p
>  		return err;
>  	}
>  
> -	err = perf_event__synthesize_cpu_map(tool, evlist->core.cpus, process, NULL);
> +	err = perf_event__synthesize_cpu_map(tool, evlist->core.user_cpus, process, NULL);
>  	if (err < 0) {
>  		pr_err("Couldn't synthesize thread map.\n");
>  		return err;
> diff --git a/tools/perf/util/top.c b/tools/perf/util/top.c
> index c1ebfc5d2e0c..e98422f3ff17 100644
> --- a/tools/perf/util/top.c
> +++ b/tools/perf/util/top.c
> @@ -95,15 +95,16 @@ size_t perf_top__header_snprintf(struct perf_top *top, char *bf, size_t size)
>  
>  	if (target->cpu_list)
>  		ret += SNPRINTF(bf + ret, size - ret, ", CPU%s: %s)",
> -				perf_cpu_map__nr(top->evlist->core.cpus) > 1 ? "s" : "",
> +				perf_cpu_map__nr(top->evlist->core.user_cpus) > 1 ? "s" : "",
>  				target->cpu_list);
>  	else {
>  		if (target->tid)
>  			ret += SNPRINTF(bf + ret, size - ret, ")");
>  		else
>  			ret += SNPRINTF(bf + ret, size - ret, ", %d CPU%s)",
> -					perf_cpu_map__nr(top->evlist->core.cpus),
> -					perf_cpu_map__nr(top->evlist->core.cpus) > 1 ? "s" : "");
> +					perf_cpu_map__nr(top->evlist->core.user_cpus),
> +					perf_cpu_map__nr(top->evlist->core.user_cpus) > 1
> +					? "s" : "");
>  	}
>  
>  	perf_top__reset_sample_counters(top);
> -- 
> 2.35.1.1021.g381101b075-goog

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/5] perf evlist: Rename cpus to user_cpus
@ 2022-03-28 20:18     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 34+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-03-28 20:18 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Leo Yan, John Garry, Will Deacon, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Kajol Jain, James Clark,
	German Gomez, Adrian Hunter, Riccardo Mancini, Andi Kleen,
	Alexey Bayduraev, Alexander Antonov, linux-perf-users,
	linux-kernel, coresight, linux-arm-kernel, netdev, bpf,
	Stephane Eranian

Em Sun, Mar 27, 2022 at 11:24:10PM -0700, Ian Rogers escreveu:
> evlist contains cpus and all_cpus. all_cpus is the union of the cpu maps
> of all evsels. cpus is set to be cpus required from the command line,

Can we replace "required" with "requested"?

> defaulting to all online cpus if no cpus are specified.

> For something like an uncore event, all_cpus may just be CPU 0,
> however, all_cpus may be every online CPU.

Can this be rephrased as:

"For an uncore event, all_cpus may be just CPU 0 or every online CPU."

?

> This causes all_cpus to have fewer values than the cpus variable which
> is confusing given the 'all' in the name. To try to make the behavior
> clearer, rename cpus to user_cpus and add comments on the two struct
> variables.

"user_cpus" can as well mean CPUs where we should only sample user level
events, so perhaps bite the bullet and rename it to the longer

  'evlist->user_requested_cpus'

?

- Arnaldo
 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/lib/perf/evlist.c                  | 28 ++++++++++++------------
>  tools/lib/perf/include/internal/evlist.h |  4 +++-
>  tools/perf/arch/arm/util/cs-etm.c        |  8 +++----
>  tools/perf/arch/arm64/util/arm-spe.c     |  2 +-
>  tools/perf/arch/x86/util/intel-bts.c     |  2 +-
>  tools/perf/arch/x86/util/intel-pt.c      |  4 ++--
>  tools/perf/bench/evlist-open-close.c     |  2 +-
>  tools/perf/builtin-ftrace.c              |  2 +-
>  tools/perf/builtin-record.c              |  6 ++---
>  tools/perf/builtin-stat.c                |  8 +++----
>  tools/perf/builtin-top.c                 |  2 +-
>  tools/perf/util/auxtrace.c               |  2 +-
>  tools/perf/util/bpf_ftrace.c             |  4 ++--
>  tools/perf/util/evlist.c                 | 14 ++++++------
>  tools/perf/util/record.c                 |  6 ++---
>  tools/perf/util/sideband_evlist.c        |  2 +-
>  tools/perf/util/stat-display.c           |  2 +-
>  tools/perf/util/synthetic-events.c       |  2 +-
>  tools/perf/util/top.c                    |  7 +++---
>  19 files changed, 55 insertions(+), 52 deletions(-)
> 
> diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> index 9a770bfdc804..e29dc229768a 100644
> --- a/tools/lib/perf/evlist.c
> +++ b/tools/lib/perf/evlist.c
> @@ -41,10 +41,10 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
>  	 */
>  	if (!evsel->own_cpus || evlist->has_user_cpus) {
>  		perf_cpu_map__put(evsel->cpus);
> -		evsel->cpus = perf_cpu_map__get(evlist->cpus);
> -	} else if (!evsel->system_wide && perf_cpu_map__empty(evlist->cpus)) {
> +		evsel->cpus = perf_cpu_map__get(evlist->user_cpus);
> +	} else if (!evsel->system_wide && perf_cpu_map__empty(evlist->user_cpus)) {
>  		perf_cpu_map__put(evsel->cpus);
> -		evsel->cpus = perf_cpu_map__get(evlist->cpus);
> +		evsel->cpus = perf_cpu_map__get(evlist->user_cpus);
>  	} else if (evsel->cpus != evsel->own_cpus) {
>  		perf_cpu_map__put(evsel->cpus);
>  		evsel->cpus = perf_cpu_map__get(evsel->own_cpus);
> @@ -123,10 +123,10 @@ static void perf_evlist__purge(struct perf_evlist *evlist)
>  
>  void perf_evlist__exit(struct perf_evlist *evlist)
>  {
> -	perf_cpu_map__put(evlist->cpus);
> +	perf_cpu_map__put(evlist->user_cpus);
>  	perf_cpu_map__put(evlist->all_cpus);
>  	perf_thread_map__put(evlist->threads);
> -	evlist->cpus = NULL;
> +	evlist->user_cpus = NULL;
>  	evlist->all_cpus = NULL;
>  	evlist->threads = NULL;
>  	fdarray__exit(&evlist->pollfd);
> @@ -155,9 +155,9 @@ void perf_evlist__set_maps(struct perf_evlist *evlist,
>  	 * original reference count of 1.  If that is not the case it is up to
>  	 * the caller to increase the reference count.
>  	 */
> -	if (cpus != evlist->cpus) {
> -		perf_cpu_map__put(evlist->cpus);
> -		evlist->cpus = perf_cpu_map__get(cpus);
> +	if (cpus != evlist->user_cpus) {
> +		perf_cpu_map__put(evlist->user_cpus);
> +		evlist->user_cpus = perf_cpu_map__get(cpus);
>  	}
>  
>  	if (threads != evlist->threads) {
> @@ -294,7 +294,7 @@ int perf_evlist__id_add_fd(struct perf_evlist *evlist,
>  
>  int perf_evlist__alloc_pollfd(struct perf_evlist *evlist)
>  {
> -	int nr_cpus = perf_cpu_map__nr(evlist->cpus);
> +	int nr_cpus = perf_cpu_map__nr(evlist->user_cpus);
>  	int nr_threads = perf_thread_map__nr(evlist->threads);
>  	int nfds = 0;
>  	struct perf_evsel *evsel;
> @@ -426,7 +426,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
>  	       int idx, struct perf_mmap_param *mp, int cpu_idx,
>  	       int thread, int *_output, int *_output_overwrite)
>  {
> -	struct perf_cpu evlist_cpu = perf_cpu_map__cpu(evlist->cpus, cpu_idx);
> +	struct perf_cpu evlist_cpu = perf_cpu_map__cpu(evlist->user_cpus, cpu_idx);
>  	struct perf_evsel *evsel;
>  	int revent;
>  
> @@ -536,7 +536,7 @@ mmap_per_cpu(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
>  	     struct perf_mmap_param *mp)
>  {
>  	int nr_threads = perf_thread_map__nr(evlist->threads);
> -	int nr_cpus    = perf_cpu_map__nr(evlist->cpus);
> +	int nr_cpus    = perf_cpu_map__nr(evlist->user_cpus);
>  	int cpu, thread;
>  
>  	for (cpu = 0; cpu < nr_cpus; cpu++) {
> @@ -564,8 +564,8 @@ static int perf_evlist__nr_mmaps(struct perf_evlist *evlist)
>  {
>  	int nr_mmaps;
>  
> -	nr_mmaps = perf_cpu_map__nr(evlist->cpus);
> -	if (perf_cpu_map__empty(evlist->cpus))
> +	nr_mmaps = perf_cpu_map__nr(evlist->user_cpus);
> +	if (perf_cpu_map__empty(evlist->user_cpus))
>  		nr_mmaps = perf_thread_map__nr(evlist->threads);
>  
>  	return nr_mmaps;
> @@ -576,7 +576,7 @@ int perf_evlist__mmap_ops(struct perf_evlist *evlist,
>  			  struct perf_mmap_param *mp)
>  {
>  	struct perf_evsel *evsel;
> -	const struct perf_cpu_map *cpus = evlist->cpus;
> +	const struct perf_cpu_map *cpus = evlist->user_cpus;
>  	const struct perf_thread_map *threads = evlist->threads;
>  
>  	if (!ops || !ops->get || !ops->mmap)
> diff --git a/tools/lib/perf/include/internal/evlist.h b/tools/lib/perf/include/internal/evlist.h
> index 4cefade540bd..5f95672662ae 100644
> --- a/tools/lib/perf/include/internal/evlist.h
> +++ b/tools/lib/perf/include/internal/evlist.h
> @@ -19,7 +19,9 @@ struct perf_evlist {
>  	int			 nr_entries;
>  	int			 nr_groups;
>  	bool			 has_user_cpus;
> -	struct perf_cpu_map	*cpus;
> +	/** The list of cpus passed from the command line. */
> +	struct perf_cpu_map	*user_cpus;
> +	/** The union of all evsel cpu maps. */
>  	struct perf_cpu_map	*all_cpus;
>  	struct perf_thread_map	*threads;
>  	int			 nr_mmaps;
> diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
> index cbc555245959..405d58903d84 100644
> --- a/tools/perf/arch/arm/util/cs-etm.c
> +++ b/tools/perf/arch/arm/util/cs-etm.c
> @@ -199,7 +199,7 @@ static int cs_etm_set_option(struct auxtrace_record *itr,
>  			     struct evsel *evsel, u32 option)
>  {
>  	int i, err = -EINVAL;
> -	struct perf_cpu_map *event_cpus = evsel->evlist->core.cpus;
> +	struct perf_cpu_map *event_cpus = evsel->evlist->core.user_cpus;
>  	struct perf_cpu_map *online_cpus = perf_cpu_map__new(NULL);
>  
>  	/* Set option of each CPU we have */
> @@ -299,7 +299,7 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
>  				container_of(itr, struct cs_etm_recording, itr);
>  	struct perf_pmu *cs_etm_pmu = ptr->cs_etm_pmu;
>  	struct evsel *evsel, *cs_etm_evsel = NULL;
> -	struct perf_cpu_map *cpus = evlist->core.cpus;
> +	struct perf_cpu_map *cpus = evlist->core.user_cpus;
>  	bool privileged = perf_event_paranoid_check(-1);
>  	int err = 0;
>  
> @@ -522,7 +522,7 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
>  {
>  	int i;
>  	int etmv3 = 0, etmv4 = 0, ete = 0;
> -	struct perf_cpu_map *event_cpus = evlist->core.cpus;
> +	struct perf_cpu_map *event_cpus = evlist->core.user_cpus;
>  	struct perf_cpu_map *online_cpus = perf_cpu_map__new(NULL);
>  
>  	/* cpu map is not empty, we have specific CPUs to work with */
> @@ -713,7 +713,7 @@ static int cs_etm_info_fill(struct auxtrace_record *itr,
>  	u32 offset;
>  	u64 nr_cpu, type;
>  	struct perf_cpu_map *cpu_map;
> -	struct perf_cpu_map *event_cpus = session->evlist->core.cpus;
> +	struct perf_cpu_map *event_cpus = session->evlist->core.user_cpus;
>  	struct perf_cpu_map *online_cpus = perf_cpu_map__new(NULL);
>  	struct cs_etm_recording *ptr =
>  			container_of(itr, struct cs_etm_recording, itr);
> diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
> index 5860bbaea95a..83ad05613321 100644
> --- a/tools/perf/arch/arm64/util/arm-spe.c
> +++ b/tools/perf/arch/arm64/util/arm-spe.c
> @@ -144,7 +144,7 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
>  			container_of(itr, struct arm_spe_recording, itr);
>  	struct perf_pmu *arm_spe_pmu = sper->arm_spe_pmu;
>  	struct evsel *evsel, *arm_spe_evsel = NULL;
> -	struct perf_cpu_map *cpus = evlist->core.cpus;
> +	struct perf_cpu_map *cpus = evlist->core.user_cpus;
>  	bool privileged = perf_event_paranoid_check(-1);
>  	struct evsel *tracking_evsel;
>  	int err;
> diff --git a/tools/perf/arch/x86/util/intel-bts.c b/tools/perf/arch/x86/util/intel-bts.c
> index 4a76d49d25d6..c9d73ecfd795 100644
> --- a/tools/perf/arch/x86/util/intel-bts.c
> +++ b/tools/perf/arch/x86/util/intel-bts.c
> @@ -110,7 +110,7 @@ static int intel_bts_recording_options(struct auxtrace_record *itr,
>  			container_of(itr, struct intel_bts_recording, itr);
>  	struct perf_pmu *intel_bts_pmu = btsr->intel_bts_pmu;
>  	struct evsel *evsel, *intel_bts_evsel = NULL;
> -	const struct perf_cpu_map *cpus = evlist->core.cpus;
> +	const struct perf_cpu_map *cpus = evlist->core.user_cpus;
>  	bool privileged = perf_event_paranoid_check(-1);
>  
>  	if (opts->auxtrace_sample_mode) {
> diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
> index 8c31578d6f4a..58bf24960273 100644
> --- a/tools/perf/arch/x86/util/intel-pt.c
> +++ b/tools/perf/arch/x86/util/intel-pt.c
> @@ -382,7 +382,7 @@ static int intel_pt_info_fill(struct auxtrace_record *itr,
>  			ui__warning("Intel Processor Trace: TSC not available\n");
>  	}
>  
> -	per_cpu_mmaps = !perf_cpu_map__empty(session->evlist->core.cpus);
> +	per_cpu_mmaps = !perf_cpu_map__empty(session->evlist->core.user_cpus);
>  
>  	auxtrace_info->type = PERF_AUXTRACE_INTEL_PT;
>  	auxtrace_info->priv[INTEL_PT_PMU_TYPE] = intel_pt_pmu->type;
> @@ -632,7 +632,7 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
>  	struct perf_pmu *intel_pt_pmu = ptr->intel_pt_pmu;
>  	bool have_timing_info, need_immediate = false;
>  	struct evsel *evsel, *intel_pt_evsel = NULL;
> -	const struct perf_cpu_map *cpus = evlist->core.cpus;
> +	const struct perf_cpu_map *cpus = evlist->core.user_cpus;
>  	bool privileged = perf_event_paranoid_check(-1);
>  	u64 tsc_bit;
>  	int err;
> diff --git a/tools/perf/bench/evlist-open-close.c b/tools/perf/bench/evlist-open-close.c
> index de56601f69ee..5bdc6b476a4d 100644
> --- a/tools/perf/bench/evlist-open-close.c
> +++ b/tools/perf/bench/evlist-open-close.c
> @@ -151,7 +151,7 @@ static int bench_evlist_open_close__run(char *evstr)
>  
>  	init_stats(&time_stats);
>  
> -	printf("  Number of cpus:\t%d\n", perf_cpu_map__nr(evlist->core.cpus));
> +	printf("  Number of cpus:\t%d\n", perf_cpu_map__nr(evlist->core.user_cpus));
>  	printf("  Number of threads:\t%d\n", evlist->core.threads->nr);
>  	printf("  Number of events:\t%d (%d fds)\n",
>  		evlist->core.nr_entries, evlist__count_evsel_fds(evlist));
> diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
> index ad9ce1bfffa1..642cbc6fdfc5 100644
> --- a/tools/perf/builtin-ftrace.c
> +++ b/tools/perf/builtin-ftrace.c
> @@ -301,7 +301,7 @@ static int set_tracing_cpumask(struct perf_cpu_map *cpumap)
>  
>  static int set_tracing_cpu(struct perf_ftrace *ftrace)
>  {
> -	struct perf_cpu_map *cpumap = ftrace->evlist->core.cpus;
> +	struct perf_cpu_map *cpumap = ftrace->evlist->core.user_cpus;
>  
>  	if (!target__has_cpu(&ftrace->target))
>  		return 0;
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 0b4abed555d8..28ab3866802c 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -987,7 +987,7 @@ static int record__thread_data_init_maps(struct record_thread *thread_data, stru
>  	int m, tm, nr_mmaps = evlist->core.nr_mmaps;
>  	struct mmap *mmap = evlist->mmap;
>  	struct mmap *overwrite_mmap = evlist->overwrite_mmap;
> -	struct perf_cpu_map *cpus = evlist->core.cpus;
> +	struct perf_cpu_map *cpus = evlist->core.user_cpus;
>  
>  	thread_data->nr_mmaps = bitmap_weight(thread_data->mask->maps.bits,
>  					      thread_data->mask->maps.nbits);
> @@ -1881,7 +1881,7 @@ static int record__synthesize(struct record *rec, bool tail)
>  		return err;
>  	}
>  
> -	err = perf_event__synthesize_cpu_map(&rec->tool, rec->evlist->core.cpus,
> +	err = perf_event__synthesize_cpu_map(&rec->tool, rec->evlist->core.user_cpus,
>  					     process_synthesized_event, NULL);
>  	if (err < 0) {
>  		pr_err("Couldn't synthesize cpu map.\n");
> @@ -3675,7 +3675,7 @@ static int record__init_thread_default_masks(struct record *rec, struct perf_cpu
>  static int record__init_thread_masks(struct record *rec)
>  {
>  	int ret = 0;
> -	struct perf_cpu_map *cpus = rec->evlist->core.cpus;
> +	struct perf_cpu_map *cpus = rec->evlist->core.user_cpus;
>  
>  	if (!record__threads_enabled(rec))
>  		return record__init_thread_default_masks(rec, cpus);
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 4ee40de698a4..5bee529f7656 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -804,7 +804,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>  	if (group)
>  		evlist__set_leader(evsel_list);
>  
> -	if (!cpu_map__is_dummy(evsel_list->core.cpus)) {
> +	if (!cpu_map__is_dummy(evsel_list->core.user_cpus)) {
>  		if (affinity__setup(&saved_affinity) < 0)
>  			return -1;
>  		affinity = &saved_affinity;
> @@ -1458,7 +1458,7 @@ static int perf_stat_init_aggr_mode(void)
>  	aggr_cpu_id_get_t get_id = aggr_mode__get_aggr(stat_config.aggr_mode);
>  
>  	if (get_id) {
> -		stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.cpus,
> +		stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_cpus,
>  							 get_id, /*data=*/NULL);
>  		if (!stat_config.aggr_map) {
>  			pr_err("cannot build %s map", aggr_mode__string[stat_config.aggr_mode]);
> @@ -1472,7 +1472,7 @@ static int perf_stat_init_aggr_mode(void)
>  	 * taking the highest cpu number to be the size of
>  	 * the aggregation translate cpumap.
>  	 */
> -	nr = perf_cpu_map__max(evsel_list->core.cpus).cpu;
> +	nr = perf_cpu_map__max(evsel_list->core.user_cpus).cpu;
>  	stat_config.cpus_aggr_map = cpu_aggr_map__empty_new(nr + 1);
>  	return stat_config.cpus_aggr_map ? 0 : -ENOMEM;
>  }
> @@ -1627,7 +1627,7 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
>  	if (!get_id)
>  		return 0;
>  
> -	stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.cpus, get_id, env);
> +	stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_cpus, get_id, env);
>  	if (!stat_config.aggr_map) {
>  		pr_err("cannot build %s map", aggr_mode__string[stat_config.aggr_mode]);
>  		return -1;
> diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> index 9b08e44a31d9..4cfa112292d0 100644
> --- a/tools/perf/builtin-top.c
> +++ b/tools/perf/builtin-top.c
> @@ -1021,7 +1021,7 @@ static int perf_top__start_counters(struct perf_top *top)
>  
>  	evlist__for_each_entry(evlist, counter) {
>  try_again:
> -		if (evsel__open(counter, top->evlist->core.cpus,
> +		if (evsel__open(counter, top->evlist->core.user_cpus,
>  				     top->evlist->core.threads) < 0) {
>  
>  			/*
> diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
> index 9e48652662d4..b138dd6bdefc 100644
> --- a/tools/perf/util/auxtrace.c
> +++ b/tools/perf/util/auxtrace.c
> @@ -174,7 +174,7 @@ void auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp,
>  	mp->idx = idx;
>  
>  	if (per_cpu) {
> -		mp->cpu = perf_cpu_map__cpu(evlist->core.cpus, idx);
> +		mp->cpu = perf_cpu_map__cpu(evlist->core.user_cpus, idx);
>  		if (evlist->core.threads)
>  			mp->tid = perf_thread_map__pid(evlist->core.threads, 0);
>  		else
> diff --git a/tools/perf/util/bpf_ftrace.c b/tools/perf/util/bpf_ftrace.c
> index 4f4d3aaff37c..69481b28b885 100644
> --- a/tools/perf/util/bpf_ftrace.c
> +++ b/tools/perf/util/bpf_ftrace.c
> @@ -38,7 +38,7 @@ int perf_ftrace__latency_prepare_bpf(struct perf_ftrace *ftrace)
>  
>  	/* don't need to set cpu filter for system-wide mode */
>  	if (ftrace->target.cpu_list) {
> -		ncpus = perf_cpu_map__nr(ftrace->evlist->core.cpus);
> +		ncpus = perf_cpu_map__nr(ftrace->evlist->core.user_cpus);
>  		bpf_map__set_max_entries(skel->maps.cpu_filter, ncpus);
>  	}
>  
> @@ -63,7 +63,7 @@ int perf_ftrace__latency_prepare_bpf(struct perf_ftrace *ftrace)
>  		fd = bpf_map__fd(skel->maps.cpu_filter);
>  
>  		for (i = 0; i < ncpus; i++) {
> -			cpu = perf_cpu_map__cpu(ftrace->evlist->core.cpus, i).cpu;
> +			cpu = perf_cpu_map__cpu(ftrace->evlist->core.user_cpus, i).cpu;
>  			bpf_map_update_elem(fd, &cpu, &val, BPF_ANY);
>  		}
>  	}
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 9bb79e049957..d335fb713f5e 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -440,7 +440,7 @@ static void __evlist__disable(struct evlist *evlist, char *evsel_name)
>  	bool has_imm = false;
>  
>  	// See explanation in evlist__close()
> -	if (!cpu_map__is_dummy(evlist->core.cpus)) {
> +	if (!cpu_map__is_dummy(evlist->core.user_cpus)) {
>  		if (affinity__setup(&saved_affinity) < 0)
>  			return;
>  		affinity = &saved_affinity;
> @@ -500,7 +500,7 @@ static void __evlist__enable(struct evlist *evlist, char *evsel_name)
>  	struct affinity saved_affinity, *affinity = NULL;
>  
>  	// See explanation in evlist__close()
> -	if (!cpu_map__is_dummy(evlist->core.cpus)) {
> +	if (!cpu_map__is_dummy(evlist->core.user_cpus)) {
>  		if (affinity__setup(&saved_affinity) < 0)
>  			return;
>  		affinity = &saved_affinity;
> @@ -565,7 +565,7 @@ static int evlist__enable_event_cpu(struct evlist *evlist, struct evsel *evsel,
>  static int evlist__enable_event_thread(struct evlist *evlist, struct evsel *evsel, int thread)
>  {
>  	int cpu;
> -	int nr_cpus = perf_cpu_map__nr(evlist->core.cpus);
> +	int nr_cpus = perf_cpu_map__nr(evlist->core.user_cpus);
>  
>  	if (!evsel->core.fd)
>  		return -EINVAL;
> @@ -580,7 +580,7 @@ static int evlist__enable_event_thread(struct evlist *evlist, struct evsel *evse
>  
>  int evlist__enable_event_idx(struct evlist *evlist, struct evsel *evsel, int idx)
>  {
> -	bool per_cpu_mmaps = !perf_cpu_map__empty(evlist->core.cpus);
> +	bool per_cpu_mmaps = !perf_cpu_map__empty(evlist->core.user_cpus);
>  
>  	if (per_cpu_mmaps)
>  		return evlist__enable_event_cpu(evlist, evsel, idx);
> @@ -1301,10 +1301,10 @@ void evlist__close(struct evlist *evlist)
>  	struct affinity affinity;
>  
>  	/*
> -	 * With perf record core.cpus is usually NULL.
> +	 * With perf record core.user_cpus is usually NULL.
>  	 * Use the old method to handle this for now.
>  	 */
> -	if (!evlist->core.cpus || cpu_map__is_dummy(evlist->core.cpus)) {
> +	if (!evlist->core.user_cpus || cpu_map__is_dummy(evlist->core.user_cpus)) {
>  		evlist__for_each_entry_reverse(evlist, evsel)
>  			evsel__close(evsel);
>  		return;
> @@ -1367,7 +1367,7 @@ int evlist__open(struct evlist *evlist)
>  	 * Default: one fd per CPU, all threads, aka systemwide
>  	 * as sys_perf_event_open(cpu = -1, thread = -1) is EINVAL
>  	 */
> -	if (evlist->core.threads == NULL && evlist->core.cpus == NULL) {
> +	if (evlist->core.threads == NULL && evlist->core.user_cpus == NULL) {
>  		err = evlist__create_syswide_maps(evlist);
>  		if (err < 0)
>  			goto out_err;
> diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
> index 007a64681416..ff326eba084f 100644
> --- a/tools/perf/util/record.c
> +++ b/tools/perf/util/record.c
> @@ -106,7 +106,7 @@ void evlist__config(struct evlist *evlist, struct record_opts *opts, struct call
>  	if (opts->group)
>  		evlist__set_leader(evlist);
>  
> -	if (perf_cpu_map__cpu(evlist->core.cpus, 0).cpu < 0)
> +	if (perf_cpu_map__cpu(evlist->core.user_cpus, 0).cpu < 0)
>  		opts->no_inherit = true;
>  
>  	use_comm_exec = perf_can_comm_exec();
> @@ -244,7 +244,7 @@ bool evlist__can_select_event(struct evlist *evlist, const char *str)
>  
>  	evsel = evlist__last(temp_evlist);
>  
> -	if (!evlist || perf_cpu_map__empty(evlist->core.cpus)) {
> +	if (!evlist || perf_cpu_map__empty(evlist->core.user_cpus)) {
>  		struct perf_cpu_map *cpus = perf_cpu_map__new(NULL);
>  
>  		if (cpus)
> @@ -252,7 +252,7 @@ bool evlist__can_select_event(struct evlist *evlist, const char *str)
>  
>  		perf_cpu_map__put(cpus);
>  	} else {
> -		cpu = perf_cpu_map__cpu(evlist->core.cpus, 0);
> +		cpu = perf_cpu_map__cpu(evlist->core.user_cpus, 0);
>  	}
>  
>  	while (1) {
> diff --git a/tools/perf/util/sideband_evlist.c b/tools/perf/util/sideband_evlist.c
> index 748371ac22be..9f58c68a25f7 100644
> --- a/tools/perf/util/sideband_evlist.c
> +++ b/tools/perf/util/sideband_evlist.c
> @@ -114,7 +114,7 @@ int evlist__start_sb_thread(struct evlist *evlist, struct target *target)
>  	}
>  
>  	evlist__for_each_entry(evlist, counter) {
> -		if (evsel__open(counter, evlist->core.cpus, evlist->core.threads) < 0)
> +		if (evsel__open(counter, evlist->core.user_cpus, evlist->core.threads) < 0)
>  			goto out_delete_evlist;
>  	}
>  
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index 9cbe351b141f..634dd9ea2b35 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -929,7 +929,7 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
>  	int all_idx;
>  	struct perf_cpu cpu;
>  
> -	perf_cpu_map__for_each_cpu(cpu, all_idx, evlist->core.cpus) {
> +	perf_cpu_map__for_each_cpu(cpu, all_idx, evlist->core.user_cpus) {
>  		struct evsel *counter;
>  		bool first = true;
>  
> diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> index b654de0841f8..591afc6c607b 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -2127,7 +2127,7 @@ int perf_event__synthesize_stat_events(struct perf_stat_config *config, struct p
>  		return err;
>  	}
>  
> -	err = perf_event__synthesize_cpu_map(tool, evlist->core.cpus, process, NULL);
> +	err = perf_event__synthesize_cpu_map(tool, evlist->core.user_cpus, process, NULL);
>  	if (err < 0) {
>  		pr_err("Couldn't synthesize thread map.\n");
>  		return err;
> diff --git a/tools/perf/util/top.c b/tools/perf/util/top.c
> index c1ebfc5d2e0c..e98422f3ff17 100644
> --- a/tools/perf/util/top.c
> +++ b/tools/perf/util/top.c
> @@ -95,15 +95,16 @@ size_t perf_top__header_snprintf(struct perf_top *top, char *bf, size_t size)
>  
>  	if (target->cpu_list)
>  		ret += SNPRINTF(bf + ret, size - ret, ", CPU%s: %s)",
> -				perf_cpu_map__nr(top->evlist->core.cpus) > 1 ? "s" : "",
> +				perf_cpu_map__nr(top->evlist->core.user_cpus) > 1 ? "s" : "",
>  				target->cpu_list);
>  	else {
>  		if (target->tid)
>  			ret += SNPRINTF(bf + ret, size - ret, ")");
>  		else
>  			ret += SNPRINTF(bf + ret, size - ret, ", %d CPU%s)",
> -					perf_cpu_map__nr(top->evlist->core.cpus),
> -					perf_cpu_map__nr(top->evlist->core.cpus) > 1 ? "s" : "");
> +					perf_cpu_map__nr(top->evlist->core.user_cpus),
> +					perf_cpu_map__nr(top->evlist->core.user_cpus) > 1
> +					? "s" : "");
>  	}
>  
>  	perf_top__reset_sample_counters(top);
> -- 
> 2.35.1.1021.g381101b075-goog

-- 

- Arnaldo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/5] perf cpumap: More cpu map reuse by merge.
  2022-03-28  6:24   ` Ian Rogers
@ 2022-03-28 20:26     ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 34+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-03-28 20:26 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Leo Yan, John Garry, Will Deacon, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Kajol Jain, James Clark,
	German Gomez, Adrian Hunter, Riccardo Mancini, Andi Kleen,
	Alexey Bayduraev, Alexander Antonov, linux-perf-users,
	linux-kernel, coresight, linux-arm-kernel, netdev, bpf,
	Stephane Eranian

Em Sun, Mar 27, 2022 at 11:24:11PM -0700, Ian Rogers escreveu:
> perf_cpu_map__merge will reuse one of its arguments if they are equal or
> the other argument is NULL. The arguments could be reused if it is known
> one set of values is a subset of the other. For example, a map of 0-1
> and a map of just 0 when merged yields the map of 0-1. Currently a new
> map is created rather than adding a reference count to the original 0-1
> map.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/lib/perf/cpumap.c | 38 ++++++++++++++++++++++++++++----------
>  1 file changed, 28 insertions(+), 10 deletions(-)
> 
> diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
> index ee66760f1e63..953bc50b0e41 100644
> --- a/tools/lib/perf/cpumap.c
> +++ b/tools/lib/perf/cpumap.c
> @@ -319,6 +319,29 @@ struct perf_cpu perf_cpu_map__max(struct perf_cpu_map *map)
>  	return map->nr > 0 ? map->map[map->nr - 1] : result;
>  }
>  
> +/** Is 'b' a subset of 'a'. */
> +static bool perf_cpu_map__is_subset(const struct perf_cpu_map *a,
> +				    const struct perf_cpu_map *b)
> +{
> +	int i, j;
> +
> +	if (a == b || !b)
> +		return true;
> +	if (!a || b->nr > a->nr)
> +		return false;
> +	j = 0;
> +	for (i = 0; i < a->nr; i++) {

Since the kernel bumped the minimum gcc version to one that supports
declaring loop variables inside the for statement, and perf has been
using that for a long time:

⬢[acme@toolbox perf]$ grep -r '(int [[:alpha:]] = 0;' tools/perf
tools/perf/util/block-info.c:	for (int i = 0; i < nr_hpps; i++)
tools/perf/util/block-info.c:	for (int i = 0; i < nr_hpps; i++) {
tools/perf/util/block-info.c:	for (int i = 0; i < nr_reps; i++)
tools/perf/util/stream.c:	for (int i = 0; i < nr_evsel; i++)
tools/perf/util/stream.c:	for (int i = 0; i < nr_evsel; i++) {
tools/perf/util/stream.c:	for (int i = 0; i < els->nr_evsel; i++) {
tools/perf/util/stream.c:	for (int i = 0; i < es_pair->nr_streams; i++) {
tools/perf/util/stream.c:	for (int i = 0; i < es_base->nr_streams; i++) {
tools/perf/util/cpumap.c:		for (int j = 0; j < c->nr; j++) {
tools/perf/util/mem-events.c:	for (int j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
tools/perf/util/header.c:	for (int i = 0; i < ff->ph->env.nr_hybrid_cpc_nodes; i++) {
tools/perf/builtin-diff.c:	for (int i = 0; i < num; i++)
tools/perf/builtin-diff.c:		for (int i = 0; i < pair->block_info->num; i++) {
tools/perf/builtin-stat.c:	for (int i = 0; i < perf_cpu_map__nr(a->core.cpus); i++) {
⬢[acme@toolbox perf]$

And this builds on all my test containers, please use:

	for (int i = 0, j = 0; i < a->nr; i++)

In this case to make the source code more compact.

> +		if (a->map[i].cpu > b->map[j].cpu)
> +			return false;
> +		if (a->map[i].cpu == b->map[j].cpu) {
> +			j++;
> +			if (j == b->nr)
> +				return true;

Ok, as it's guaranteed that cpu_maps are ordered.

> +		}
> +	}
> +	return false;
> +}
> +
>  /*
>   * Merge two cpumaps
>   *
> @@ -335,17 +358,12 @@ struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
>  	int i, j, k;
>  	struct perf_cpu_map *merged;
>  
> -	if (!orig && !other)
> -		return NULL;
> -	if (!orig) {
> -		perf_cpu_map__get(other);
> -		return other;
> -	}
> -	if (!other)
> -		return orig;
> -	if (orig->nr == other->nr &&
> -	    !memcmp(orig->map, other->map, orig->nr * sizeof(struct perf_cpu)))
> +	if (perf_cpu_map__is_subset(orig, other))
>  		return orig;

Can't we have first the introduction of perf_cpu_map__is_subset() and
then another patch that gets the refcount, i.e. the four lines below?

> +	if (perf_cpu_map__is_subset(other, orig)) {
> +		perf_cpu_map__put(orig);
> +		return perf_cpu_map__get(other);
> +	}
>  
>  	tmp_len = orig->nr + other->nr;
>  	tmp_cpus = malloc(tmp_len * sizeof(struct perf_cpu));
> -- 
> 2.35.1.1021.g381101b075-goog

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/5] perf cpumap: Add intersect function.
  2022-03-28  6:24   ` Ian Rogers
@ 2022-03-28 20:28     ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 34+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-03-28 20:28 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Leo Yan, John Garry, Will Deacon, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Kajol Jain, James Clark,
	German Gomez, Adrian Hunter, Riccardo Mancini, Andi Kleen,
	Alexey Bayduraev, Alexander Antonov, linux-perf-users,
	linux-kernel, coresight, linux-arm-kernel, netdev, bpf,
	Stephane Eranian

Em Sun, Mar 27, 2022 at 11:24:12PM -0700, Ian Rogers escreveu:
> The merge function gives the union of two cpu maps. Add an intersect
> function which will be used in the next change.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/lib/perf/cpumap.c              | 38 ++++++++++++++++++++++++++++
>  tools/lib/perf/include/perf/cpumap.h |  2 ++
>  2 files changed, 40 insertions(+)
> 
> diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
> index 953bc50b0e41..56b4d213039f 100644
> --- a/tools/lib/perf/cpumap.c
> +++ b/tools/lib/perf/cpumap.c
> @@ -393,3 +393,41 @@ struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
>  	perf_cpu_map__put(orig);
>  	return merged;
>  }
> +
> +struct perf_cpu_map *perf_cpu_map__intersect(struct perf_cpu_map *orig,
> +					     struct perf_cpu_map *other)
> +{
> +	struct perf_cpu *tmp_cpus;
> +	int tmp_len;
> +	int i, j, k;
> +	struct perf_cpu_map *merged = NULL;
> +
> +	if (perf_cpu_map__is_subset(other, orig))
> +		return orig;
> +	if (perf_cpu_map__is_subset(orig, other)) {
> +		perf_cpu_map__put(orig);

Why this put(orig)?

> +		return perf_cpu_map__get(other);

And why the get here and not on the first if?

> +	}
> +
> +	tmp_len = max(orig->nr, other->nr);
> +	tmp_cpus = malloc(tmp_len * sizeof(struct perf_cpu));
> +	if (!tmp_cpus)
> +		return NULL;
> +
> +	i = j = k = 0;
> +	while (i < orig->nr && j < other->nr) {
> +		if (orig->map[i].cpu < other->map[j].cpu)
> +			i++;
> +		else if (orig->map[i].cpu > other->map[j].cpu)
> +			j++;
> +		else {
> +			j++;
> +			tmp_cpus[k++] = orig->map[i++];
> +		}
> +	}
> +	if (k)
> +		merged = cpu_map__trim_new(k, tmp_cpus);
> +	free(tmp_cpus);
> +	perf_cpu_map__put(orig);
> +	return merged;
> +}
> diff --git a/tools/lib/perf/include/perf/cpumap.h b/tools/lib/perf/include/perf/cpumap.h
> index 4a2edbdb5e2b..a2a7216c0b78 100644
> --- a/tools/lib/perf/include/perf/cpumap.h
> +++ b/tools/lib/perf/include/perf/cpumap.h
> @@ -19,6 +19,8 @@ LIBPERF_API struct perf_cpu_map *perf_cpu_map__read(FILE *file);
>  LIBPERF_API struct perf_cpu_map *perf_cpu_map__get(struct perf_cpu_map *map);
>  LIBPERF_API struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
>  						     struct perf_cpu_map *other);
> +LIBPERF_API struct perf_cpu_map *perf_cpu_map__intersect(struct perf_cpu_map *orig,
> +							 struct perf_cpu_map *other);
>  LIBPERF_API void perf_cpu_map__put(struct perf_cpu_map *map);
>  LIBPERF_API struct perf_cpu perf_cpu_map__cpu(const struct perf_cpu_map *cpus, int idx);
>  LIBPERF_API int perf_cpu_map__nr(const struct perf_cpu_map *cpus);
> -- 
> 2.35.1.1021.g381101b075-goog

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] perf stat: Avoid segv if core.user_cpus isn't set.
  2022-03-28  6:24   ` Ian Rogers
@ 2022-03-28 20:32     ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 34+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-03-28 20:32 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Leo Yan, John Garry, Will Deacon, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Kajol Jain, James Clark,
	German Gomez, Adrian Hunter, Riccardo Mancini, Andi Kleen,
	Alexey Bayduraev, Alexander Antonov, linux-perf-users,
	linux-kernel, coresight, linux-arm-kernel, netdev, bpf,
	Stephane Eranian

Em Sun, Mar 27, 2022 at 11:24:13PM -0700, Ian Rogers escreveu:
> Passing null to perf_cpu_map__max doesn't make sense as there is no
> valid max. Avoid this problem by null checking in
> perf_stat_init_aggr_mode.

Applying this one after changing user_cpus back to cpus as this is a fix
independent of this patchset.

In the future, please try to have such patches at the beginning of the
series, so that  they can get cherry-picked more easily.

- Arnaldo
 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/builtin-stat.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 5bee529f7656..ecd5cf4fd872 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1472,7 +1472,10 @@ static int perf_stat_init_aggr_mode(void)
>  	 * taking the highest cpu number to be the size of
>  	 * the aggregation translate cpumap.
>  	 */
> -	nr = perf_cpu_map__max(evsel_list->core.user_cpus).cpu;
> +	if (evsel_list->core.user_cpus)
> +		nr = perf_cpu_map__max(evsel_list->core.user_cpus).cpu;
> +	else
> +		nr = 0;
>  	stat_config.cpus_aggr_map = cpu_aggr_map__empty_new(nr + 1);
>  	return stat_config.cpus_aggr_map ? 0 : -ENOMEM;
>  }
> -- 
> 2.35.1.1021.g381101b075-goog

-- 

- Arnaldo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] perf stat: Avoid segv if core.user_cpus isn't set.
  2022-03-28 20:32     ` Arnaldo Carvalho de Melo
@ 2022-03-28 20:46       ` Ian Rogers
  -1 siblings, 0 replies; 34+ messages in thread
From: Ian Rogers @ 2022-03-28 20:46 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Leo Yan, John Garry, Will Deacon, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Kajol Jain, James Clark,
	German Gomez, Adrian Hunter, Riccardo Mancini, Andi Kleen,
	Alexey Bayduraev, Alexander Antonov, linux-perf-users,
	linux-kernel, coresight, linux-arm-kernel, netdev, bpf,
	Stephane Eranian

On Mon, Mar 28, 2022 at 1:32 PM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Sun, Mar 27, 2022 at 11:24:13PM -0700, Ian Rogers escreveu:
> > Passing null to perf_cpu_map__max doesn't make sense as there is no
> > valid max. Avoid this problem by null checking in
> > perf_stat_init_aggr_mode.
>
> Applying this one after changing user_cpus back to cpus as this is a fix
> independent of this patchset.
>
> In the future, please try to have such patches at the beginning of the
> series, so that  they can get cherry-picked more easily.

Ack. The problem is best exhibited when the intersect happens; without
it, getting a reproducer wasn't something I was able to do.

Thanks,
Ian

> - Arnaldo
>
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/perf/builtin-stat.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > index 5bee529f7656..ecd5cf4fd872 100644
> > --- a/tools/perf/builtin-stat.c
> > +++ b/tools/perf/builtin-stat.c
> > @@ -1472,7 +1472,10 @@ static int perf_stat_init_aggr_mode(void)
> >        * taking the highest cpu number to be the size of
> >        * the aggregation translate cpumap.
> >        */
> > -     nr = perf_cpu_map__max(evsel_list->core.user_cpus).cpu;
> > +     if (evsel_list->core.user_cpus)
> > +             nr = perf_cpu_map__max(evsel_list->core.user_cpus).cpu;
> > +     else
> > +             nr = 0;
> >       stat_config.cpus_aggr_map = cpu_aggr_map__empty_new(nr + 1);
> >       return stat_config.cpus_aggr_map ? 0 : -ENOMEM;
> >  }
> > --
> > 2.35.1.1021.g381101b075-goog
>
> --
>
> - Arnaldo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/5] perf cpumap: More cpu map reuse by merge.
  2022-03-28 20:26     ` Arnaldo Carvalho de Melo
@ 2022-03-28 20:50       ` Ian Rogers
  -1 siblings, 0 replies; 34+ messages in thread
From: Ian Rogers @ 2022-03-28 20:50 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Leo Yan, John Garry, Will Deacon, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Kajol Jain, James Clark,
	German Gomez, Adrian Hunter, Riccardo Mancini, Andi Kleen,
	Alexey Bayduraev, Alexander Antonov, linux-perf-users,
	linux-kernel, coresight, linux-arm-kernel, netdev, bpf,
	Stephane Eranian

On Mon, Mar 28, 2022 at 1:26 PM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Sun, Mar 27, 2022 at 11:24:11PM -0700, Ian Rogers escreveu:
> > perf_cpu_map__merge will reuse one of its arguments if they are equal or
> > the other argument is NULL. The arguments could be reused if it is known
> > one set of values is a subset of the other. For example, a map of 0-1
> > and a map of just 0 when merged yields the map of 0-1. Currently a new
> > map is created rather than adding a reference count to the original 0-1
> > map.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/lib/perf/cpumap.c | 38 ++++++++++++++++++++++++++++----------
> >  1 file changed, 28 insertions(+), 10 deletions(-)
> >
> > diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
> > index ee66760f1e63..953bc50b0e41 100644
> > --- a/tools/lib/perf/cpumap.c
> > +++ b/tools/lib/perf/cpumap.c
> > @@ -319,6 +319,29 @@ struct perf_cpu perf_cpu_map__max(struct perf_cpu_map *map)
> >       return map->nr > 0 ? map->map[map->nr - 1] : result;
> >  }
> >
> > +/** Is 'b' a subset of 'a'. */
> > +static bool perf_cpu_map__is_subset(const struct perf_cpu_map *a,
> > +                                 const struct perf_cpu_map *b)
> > +{
> > +     int i, j;
> > +
> > +     if (a == b || !b)
> > +             return true;
> > +     if (!a || b->nr > a->nr)
> > +             return false;
> > +     j = 0;
> > +     for (i = 0; i < a->nr; i++) {
>
> Since the kernel bumped the minimum gcc version to one that supports
> declaring loop variables locally and that perf has been using this since
> forever:
>
> ⬢[acme@toolbox perf]$ grep -r '(int [[:alpha:]] = 0;' tools/perf
> tools/perf/util/block-info.c:   for (int i = 0; i < nr_hpps; i++)
> tools/perf/util/block-info.c:   for (int i = 0; i < nr_hpps; i++) {
> tools/perf/util/block-info.c:   for (int i = 0; i < nr_reps; i++)
> tools/perf/util/stream.c:       for (int i = 0; i < nr_evsel; i++)
> tools/perf/util/stream.c:       for (int i = 0; i < nr_evsel; i++) {
> tools/perf/util/stream.c:       for (int i = 0; i < els->nr_evsel; i++) {
> tools/perf/util/stream.c:       for (int i = 0; i < es_pair->nr_streams; i++) {
> tools/perf/util/stream.c:       for (int i = 0; i < es_base->nr_streams; i++) {
> tools/perf/util/cpumap.c:               for (int j = 0; j < c->nr; j++) {
> tools/perf/util/mem-events.c:   for (int j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
> tools/perf/util/header.c:       for (int i = 0; i < ff->ph->env.nr_hybrid_cpc_nodes; i++) {
> tools/perf/builtin-diff.c:      for (int i = 0; i < num; i++)
> tools/perf/builtin-diff.c:              for (int i = 0; i < pair->block_info->num; i++) {
> tools/perf/builtin-stat.c:      for (int i = 0; i < perf_cpu_map__nr(a->core.cpus); i++) {
> ⬢[acme@toolbox perf]$
>
> And this builds on all my test containers, please use:
>
>         for (int i = 0, j = 0; i < a->nr; i++)
>
> In this case to make the source code more compact.

Ack. We still need to declare 'j', and it is a bit weird to declare j
before i. Fwiw, Makefile.config has the CORE_CFLAGS set to gnu99, but
declaring in the loop is clearly valid in c99.

> > +             if (a->map[i].cpu > b->map[j].cpu)
> > +                     return false;
> > +             if (a->map[i].cpu == b->map[j].cpu) {
> > +                     j++;
> > +                     if (j == b->nr)
> > +                             return true;
>
> Ok, as it's guaranteed that cpu_maps are ordered.
>
> > +             }
> > +     }
> > +     return false;
> > +}
> > +
> >  /*
> >   * Merge two cpumaps
> >   *
> > @@ -335,17 +358,12 @@ struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
> >       int i, j, k;
> >       struct perf_cpu_map *merged;
> >
> > -     if (!orig && !other)
> > -             return NULL;
> > -     if (!orig) {
> > -             perf_cpu_map__get(other);
> > -             return other;
> > -     }
> > -     if (!other)
> > -             return orig;
> > -     if (orig->nr == other->nr &&
> > -         !memcmp(orig->map, other->map, orig->nr * sizeof(struct perf_cpu)))
> > +     if (perf_cpu_map__is_subset(orig, other))
> >               return orig;
>
> Can't we have first the introduction of perf_cpu_map__is_subset() and
> then another patch that gets the refcount, i.e. the four lines below?

I believe that will fail as it'd be an unused static function warning
and werror.

Thanks,
Ian

> > +     if (perf_cpu_map__is_subset(other, orig)) {
> > +             perf_cpu_map__put(orig);
> > +             return perf_cpu_map__get(other);
> > +     }
> >
> >       tmp_len = orig->nr + other->nr;
> >       tmp_cpus = malloc(tmp_len * sizeof(struct perf_cpu));
> > --
> > 2.35.1.1021.g381101b075-goog
>
> --
>
> - Arnaldo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/5] perf cpumap: More cpu map reuse by merge.
@ 2022-03-28 20:50       ` Ian Rogers
  0 siblings, 0 replies; 34+ messages in thread
From: Ian Rogers @ 2022-03-28 20:50 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Leo Yan, John Garry, Will Deacon, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Kajol Jain, James Clark,
	German Gomez, Adrian Hunter, Riccardo Mancini, Andi Kleen,
	Alexey Bayduraev, Alexander Antonov, linux-perf-users,
	linux-kernel, coresight, linux-arm-kernel, netdev, bpf,
	Stephane Eranian

On Mon, Mar 28, 2022 at 1:26 PM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Sun, Mar 27, 2022 at 11:24:11PM -0700, Ian Rogers escreveu:
> > perf_cpu_map__merge will reuse one of its arguments if they are equal or
> > the other argument is NULL. The arguments could be reused if it is known
> > one set of values is a subset of the other. For example, a map of 0-1
> > and a map of just 0 when merged yields the map of 0-1. Currently a new
> > map is created rather than adding a reference count to the original 0-1
> > map.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/lib/perf/cpumap.c | 38 ++++++++++++++++++++++++++++----------
> >  1 file changed, 28 insertions(+), 10 deletions(-)
> >
> > diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
> > index ee66760f1e63..953bc50b0e41 100644
> > --- a/tools/lib/perf/cpumap.c
> > +++ b/tools/lib/perf/cpumap.c
> > @@ -319,6 +319,29 @@ struct perf_cpu perf_cpu_map__max(struct perf_cpu_map *map)
> >       return map->nr > 0 ? map->map[map->nr - 1] : result;
> >  }
> >
> > +/** Is 'b' a subset of 'a'. */
> > +static bool perf_cpu_map__is_subset(const struct perf_cpu_map *a,
> > +                                 const struct perf_cpu_map *b)
> > +{
> > +     int i, j;
> > +
> > +     if (a == b || !b)
> > +             return true;
> > +     if (!a || b->nr > a->nr)
> > +             return false;
> > +     j = 0;
> > +     for (i = 0; i < a->nr; i++) {
>
> Since the kernel bumped the minimum gcc version to one that supports
> declaring loop variables locally and that perf has been using this since
> forever:
>
> ⬢[acme@toolbox perf]$ grep -r '(int [[:alpha:]] = 0;' tools/perf
> tools/perf/util/block-info.c:   for (int i = 0; i < nr_hpps; i++)
> tools/perf/util/block-info.c:   for (int i = 0; i < nr_hpps; i++) {
> tools/perf/util/block-info.c:   for (int i = 0; i < nr_reps; i++)
> tools/perf/util/stream.c:       for (int i = 0; i < nr_evsel; i++)
> tools/perf/util/stream.c:       for (int i = 0; i < nr_evsel; i++) {
> tools/perf/util/stream.c:       for (int i = 0; i < els->nr_evsel; i++) {
> tools/perf/util/stream.c:       for (int i = 0; i < es_pair->nr_streams; i++) {
> tools/perf/util/stream.c:       for (int i = 0; i < es_base->nr_streams; i++) {
> tools/perf/util/cpumap.c:               for (int j = 0; j < c->nr; j++) {
> tools/perf/util/mem-events.c:   for (int j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
> tools/perf/util/header.c:       for (int i = 0; i < ff->ph->env.nr_hybrid_cpc_nodes; i++) {
> tools/perf/builtin-diff.c:      for (int i = 0; i < num; i++)
> tools/perf/builtin-diff.c:              for (int i = 0; i < pair->block_info->num; i++) {
> tools/perf/builtin-stat.c:      for (int i = 0; i < perf_cpu_map__nr(a->core.cpus); i++) {
> ⬢[acme@toolbox perf]$
>
> And this builds on all my test containers, please use:
>
>         for (int i = 0, j = 0; i < a->nr; i++)
>
> In this case to make the source code more compact.

Ack. We still need to declare 'j', and it is a bit weird to declare 'j'
before 'i'. FWIW, Makefile.config has the CORE_CFLAGS set to gnu99, but
declaring in the loop is clearly valid in C99.

> > +             if (a->map[i].cpu > b->map[j].cpu)
> > +                     return false;
> > +             if (a->map[i].cpu == b->map[j].cpu) {
> > +                     j++;
> > +                     if (j == b->nr)
> > +                             return true;
>
> Ok, as its guaranteed that cpu_maps are ordered.
>
> > +             }
> > +     }
> > +     return false;
> > +}
> > +
> >  /*
> >   * Merge two cpumaps
> >   *
> > @@ -335,17 +358,12 @@ struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
> >       int i, j, k;
> >       struct perf_cpu_map *merged;
> >
> > -     if (!orig && !other)
> > -             return NULL;
> > -     if (!orig) {
> > -             perf_cpu_map__get(other);
> > -             return other;
> > -     }
> > -     if (!other)
> > -             return orig;
> > -     if (orig->nr == other->nr &&
> > -         !memcmp(orig->map, other->map, orig->nr * sizeof(struct perf_cpu)))
> > +     if (perf_cpu_map__is_subset(orig, other))
> >               return orig;
>
> Can't we have first the introduction of perf_cpu_map__is_subset() and
> then another patch that gets the refcount, i.e. the four lines below?

I believe that will fail as it'd be an unused static function warning,
which -Werror turns into a build error.

Thanks,
Ian

> > +     if (perf_cpu_map__is_subset(other, orig)) {
> > +             perf_cpu_map__put(orig);
> > +             return perf_cpu_map__get(other);
> > +     }
> >
> >       tmp_len = orig->nr + other->nr;
> >       tmp_cpus = malloc(tmp_len * sizeof(struct perf_cpu));
> > --
> > 2.35.1.1021.g381101b075-goog
>
> --
>
> - Arnaldo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/5] perf cpumap: Add intersect function.
  2022-03-28 20:28     ` Arnaldo Carvalho de Melo
@ 2022-03-28 20:54       ` Ian Rogers
  -1 siblings, 0 replies; 34+ messages in thread
From: Ian Rogers @ 2022-03-28 20:54 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Leo Yan, John Garry, Will Deacon, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Kajol Jain, James Clark,
	German Gomez, Adrian Hunter, Riccardo Mancini, Andi Kleen,
	Alexey Bayduraev, Alexander Antonov, linux-perf-users,
	linux-kernel, coresight, linux-arm-kernel, netdev, bpf,
	Stephane Eranian

On Mon, Mar 28, 2022 at 1:28 PM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Sun, Mar 27, 2022 at 11:24:12PM -0700, Ian Rogers escreveu:
> > The merge function gives the union of two cpu maps. Add an intersect
> > function which will be used in the next change.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/lib/perf/cpumap.c              | 38 ++++++++++++++++++++++++++++
> >  tools/lib/perf/include/perf/cpumap.h |  2 ++
> >  2 files changed, 40 insertions(+)
> >
> > diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
> > index 953bc50b0e41..56b4d213039f 100644
> > --- a/tools/lib/perf/cpumap.c
> > +++ b/tools/lib/perf/cpumap.c
> > @@ -393,3 +393,41 @@ struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
> >       perf_cpu_map__put(orig);
> >       return merged;
> >  }
> > +
> > +struct perf_cpu_map *perf_cpu_map__intersect(struct perf_cpu_map *orig,
> > +                                          struct perf_cpu_map *other)
> > +{
> > +     struct perf_cpu *tmp_cpus;
> > +     int tmp_len;
> > +     int i, j, k;
> > +     struct perf_cpu_map *merged = NULL;
> > +
> > +     if (perf_cpu_map__is_subset(other, orig))
> > +             return orig;
> > +     if (perf_cpu_map__is_subset(orig, other)) {
> > +             perf_cpu_map__put(orig);
>
> Why this put(orig)?

As with merge, if orig isn't returned then it is put.

> > +             return perf_cpu_map__get(other);
>
> And why the get here and not on the first if?

The first argument orig is either put or returned while the second may
be returned only if the reference count is incremented. We could
change the API for merge and intersect to put both arguments, or to
not put either argument.

Thanks,
Ian

> > +     }
> > +
> > +     tmp_len = max(orig->nr, other->nr);
> > +     tmp_cpus = malloc(tmp_len * sizeof(struct perf_cpu));
> > +     if (!tmp_cpus)
> > +             return NULL;
> > +
> > +     i = j = k = 0;
> > +     while (i < orig->nr && j < other->nr) {
> > +             if (orig->map[i].cpu < other->map[j].cpu)
> > +                     i++;
> > +             else if (orig->map[i].cpu > other->map[j].cpu)
> > +                     j++;
> > +             else {
> > +                     j++;
> > +                     tmp_cpus[k++] = orig->map[i++];
> > +             }
> > +     }
> > +     if (k)
> > +             merged = cpu_map__trim_new(k, tmp_cpus);
> > +     free(tmp_cpus);
> > +     perf_cpu_map__put(orig);
> > +     return merged;
> > +}
> > diff --git a/tools/lib/perf/include/perf/cpumap.h b/tools/lib/perf/include/perf/cpumap.h
> > index 4a2edbdb5e2b..a2a7216c0b78 100644
> > --- a/tools/lib/perf/include/perf/cpumap.h
> > +++ b/tools/lib/perf/include/perf/cpumap.h
> > @@ -19,6 +19,8 @@ LIBPERF_API struct perf_cpu_map *perf_cpu_map__read(FILE *file);
> >  LIBPERF_API struct perf_cpu_map *perf_cpu_map__get(struct perf_cpu_map *map);
> >  LIBPERF_API struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
> >                                                    struct perf_cpu_map *other);
> > +LIBPERF_API struct perf_cpu_map *perf_cpu_map__intersect(struct perf_cpu_map *orig,
> > +                                                      struct perf_cpu_map *other);
> >  LIBPERF_API void perf_cpu_map__put(struct perf_cpu_map *map);
> >  LIBPERF_API struct perf_cpu perf_cpu_map__cpu(const struct perf_cpu_map *cpus, int idx);
> >  LIBPERF_API int perf_cpu_map__nr(const struct perf_cpu_map *cpus);
> > --
> > 2.35.1.1021.g381101b075-goog
>
> --
>
> - Arnaldo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/5] perf cpumap: More cpu map reuse by merge.
  2022-03-28 20:50       ` Ian Rogers
@ 2022-03-28 20:56         ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 34+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-03-28 20:56 UTC (permalink / raw)
  To: Ian Rogers, Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Leo Yan, John Garry, Will Deacon, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Kajol Jain, James Clark,
	German Gomez, Adrian Hunter, Riccardo Mancini, Andi Kleen,
	Alexey Bayduraev, Alexander Antonov, linux-perf-users,
	linux-kernel, coresight, linux-arm-kernel, netdev, bpf,
	Stephane Eranian



On March 28, 2022 5:50:21 PM GMT-03:00, Ian Rogers <irogers@google.com> wrote:
>On Mon, Mar 28, 2022 at 1:26 PM Arnaldo Carvalho de Melo
><acme@kernel.org> wrote:
>>
>> Em Sun, Mar 27, 2022 at 11:24:11PM -0700, Ian Rogers escreveu:
>> > perf_cpu_map__merge will reuse one of its arguments if they are equal or
>> > the other argument is NULL. The arguments could be reused if it is known
>> > one set of values is a subset of the other. For example, a map of 0-1
>> > and a map of just 0 when merged yields the map of 0-1. Currently a new
>> > map is created rather than adding a reference count to the original 0-1
>> > map.
>> >
>> > Signed-off-by: Ian Rogers <irogers@google.com>
>> > ---
>> >  tools/lib/perf/cpumap.c | 38 ++++++++++++++++++++++++++++----------
>> >  1 file changed, 28 insertions(+), 10 deletions(-)
>> >
>> > diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
>> > index ee66760f1e63..953bc50b0e41 100644
>> > --- a/tools/lib/perf/cpumap.c
>> > +++ b/tools/lib/perf/cpumap.c
>> > @@ -319,6 +319,29 @@ struct perf_cpu perf_cpu_map__max(struct perf_cpu_map *map)
>> >       return map->nr > 0 ? map->map[map->nr - 1] : result;
>> >  }
>> >
>> > +/** Is 'b' a subset of 'a'. */
>> > +static bool perf_cpu_map__is_subset(const struct perf_cpu_map *a,
>> > +                                 const struct perf_cpu_map *b)
>> > +{
>> > +     int i, j;
>> > +
>> > +     if (a == b || !b)
>> > +             return true;
>> > +     if (!a || b->nr > a->nr)
>> > +             return false;
>> > +     j = 0;
>> > +     for (i = 0; i < a->nr; i++) {
>>
>> Since the kernel bumped the minimum gcc version to one that supports
>> declaring loop variables locally and that perf has been using this since
>> forever:
>>
>> ⬢[acme@toolbox perf]$ grep -r '(int [[:alpha:]] = 0;' tools/perf
>> tools/perf/util/block-info.c:   for (int i = 0; i < nr_hpps; i++)
>> tools/perf/util/block-info.c:   for (int i = 0; i < nr_hpps; i++) {
>> tools/perf/util/block-info.c:   for (int i = 0; i < nr_reps; i++)
>> tools/perf/util/stream.c:       for (int i = 0; i < nr_evsel; i++)
>> tools/perf/util/stream.c:       for (int i = 0; i < nr_evsel; i++) {
>> tools/perf/util/stream.c:       for (int i = 0; i < els->nr_evsel; i++) {
>> tools/perf/util/stream.c:       for (int i = 0; i < es_pair->nr_streams; i++) {
>> tools/perf/util/stream.c:       for (int i = 0; i < es_base->nr_streams; i++) {
>> tools/perf/util/cpumap.c:               for (int j = 0; j < c->nr; j++) {
>> tools/perf/util/mem-events.c:   for (int j = 0; j < PERF_MEM_EVENTS__MAX; j++) {
>> tools/perf/util/header.c:       for (int i = 0; i < ff->ph->env.nr_hybrid_cpc_nodes; i++) {
>> tools/perf/builtin-diff.c:      for (int i = 0; i < num; i++)
>> tools/perf/builtin-diff.c:              for (int i = 0; i < pair->block_info->num; i++) {
>> tools/perf/builtin-stat.c:      for (int i = 0; i < perf_cpu_map__nr(a->core.cpus); i++) {
>> ⬢[acme@toolbox perf]$
>>
>> And this builds on all my test containers, please use:
>>
>>         for (int i = 0, j = 0; i < a->nr; i++)
>>
>> In this case to make the source code more compact.
>
>Ack. We still need to declare 'j' and it is a bit weird to declare 'j'
>before 'i'. FWIW, Makefile.config has the CORE_CFLAGS set to gnu99, but
>declaring in the loop is clearly valid in C99.
>
>> > +             if (a->map[i].cpu > b->map[j].cpu)
>> > +                     return false;
>> > +             if (a->map[i].cpu == b->map[j].cpu) {
>> > +                     j++;
>> > +                     if (j == b->nr)
>> > +                             return true;
>>
>> Ok, as its guaranteed that cpu_maps are ordered.
>>
>> > +             }
>> > +     }
>> > +     return false;
>> > +}
>> > +
>> >  /*
>> >   * Merge two cpumaps
>> >   *
>> > @@ -335,17 +358,12 @@ struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
>> >       int i, j, k;
>> >       struct perf_cpu_map *merged;
>> >
>> > -     if (!orig && !other)
>> > -             return NULL;
>> > -     if (!orig) {
>> > -             perf_cpu_map__get(other);
>> > -             return other;
>> > -     }
>> > -     if (!other)
>> > -             return orig;
>> > -     if (orig->nr == other->nr &&
>> > -         !memcmp(orig->map, other->map, orig->nr * sizeof(struct perf_cpu)))
>> > +     if (perf_cpu_map__is_subset(orig, other))
>> >               return orig;
>>
>> Can't we have first the introduction of perf_cpu_map__is_subset() and
>> then another patch that gets the refcount, i.e. the four lines below?
>
>I believe that will fail as it'd be an unused static function warning,
>which -Werror turns into a build error.

I thought it seemed useful enough not to be static, even if we don't export it at first, i.e. keep it internal to libperf.

>
>Thanks,
>Ian
>
>> > +     if (perf_cpu_map__is_subset(other, orig)) {
>> > +             perf_cpu_map__put(orig);
>> > +             return perf_cpu_map__get(other);
>> > +     }
>> >
>> >       tmp_len = orig->nr + other->nr;
>> >       tmp_cpus = malloc(tmp_len * sizeof(struct perf_cpu));
>> > --
>> > 2.35.1.1021.g381101b075-goog
>>
>> --
>>
>> - Arnaldo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/5] perf evlist: Rename cpus to user_cpus
  2022-03-28 20:18     ` Arnaldo Carvalho de Melo
@ 2022-03-28 20:58       ` Ian Rogers
  -1 siblings, 0 replies; 34+ messages in thread
From: Ian Rogers @ 2022-03-28 20:58 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Leo Yan, John Garry, Will Deacon, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Kajol Jain, James Clark,
	German Gomez, Adrian Hunter, Riccardo Mancini, Andi Kleen,
	Alexey Bayduraev, Alexander Antonov, linux-perf-users,
	linux-kernel, coresight, linux-arm-kernel, netdev, bpf,
	Stephane Eranian

On Mon, Mar 28, 2022 at 1:19 PM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Sun, Mar 27, 2022 at 11:24:10PM -0700, Ian Rogers escreveu:
> > evlist contains cpus and all_cpus. all_cpus is the union of the cpu maps
> > of all evsels. cpus is set to be cpus required from the command line,
>
> Can we replace "required" with "requested"?

Sure. This statement also isn't true for profiling tasks.

> > defaulting to all online cpus if no cpus are specified.
>
> > For something like an uncore event, all_cpus may just be CPU 0,
> > however, all_cpus may be every online CPU.
>
> Can this be rephrased as:
>
> "For an uncore event, all_cpus may be just CPU 0 or every online CPU."
>
> ?

The rephrasing is fine. duration_time also has a cpu map of just "0".

> > This causes all_cpus to have fewer values than the cpus variable which
> > is confusing given the 'all' in the name. To try to make the behavior
> > clearer, rename cpus to user_cpus and add comments on the two struct
> > variables.
>
> "user_cpus" can as well mean CPUs where we should only sample user level
> events, so perhaps bite the bullet and rename it to the longer
>
>   'evlist->user_requested_cpus'
>
> ?

I can do that.

Thanks,
Ian

> - Arnaldo
>
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/lib/perf/evlist.c                  | 28 ++++++++++++------------
> >  tools/lib/perf/include/internal/evlist.h |  4 +++-
> >  tools/perf/arch/arm/util/cs-etm.c        |  8 +++----
> >  tools/perf/arch/arm64/util/arm-spe.c     |  2 +-
> >  tools/perf/arch/x86/util/intel-bts.c     |  2 +-
> >  tools/perf/arch/x86/util/intel-pt.c      |  4 ++--
> >  tools/perf/bench/evlist-open-close.c     |  2 +-
> >  tools/perf/builtin-ftrace.c              |  2 +-
> >  tools/perf/builtin-record.c              |  6 ++---
> >  tools/perf/builtin-stat.c                |  8 +++----
> >  tools/perf/builtin-top.c                 |  2 +-
> >  tools/perf/util/auxtrace.c               |  2 +-
> >  tools/perf/util/bpf_ftrace.c             |  4 ++--
> >  tools/perf/util/evlist.c                 | 14 ++++++------
> >  tools/perf/util/record.c                 |  6 ++---
> >  tools/perf/util/sideband_evlist.c        |  2 +-
> >  tools/perf/util/stat-display.c           |  2 +-
> >  tools/perf/util/synthetic-events.c       |  2 +-
> >  tools/perf/util/top.c                    |  7 +++---
> >  19 files changed, 55 insertions(+), 52 deletions(-)
> >
> > diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> > index 9a770bfdc804..e29dc229768a 100644
> > --- a/tools/lib/perf/evlist.c
> > +++ b/tools/lib/perf/evlist.c
> > @@ -41,10 +41,10 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
> >        */
> >       if (!evsel->own_cpus || evlist->has_user_cpus) {
> >               perf_cpu_map__put(evsel->cpus);
> > -             evsel->cpus = perf_cpu_map__get(evlist->cpus);
> > -     } else if (!evsel->system_wide && perf_cpu_map__empty(evlist->cpus)) {
> > +             evsel->cpus = perf_cpu_map__get(evlist->user_cpus);
> > +     } else if (!evsel->system_wide && perf_cpu_map__empty(evlist->user_cpus)) {
> >               perf_cpu_map__put(evsel->cpus);
> > -             evsel->cpus = perf_cpu_map__get(evlist->cpus);
> > +             evsel->cpus = perf_cpu_map__get(evlist->user_cpus);
> >       } else if (evsel->cpus != evsel->own_cpus) {
> >               perf_cpu_map__put(evsel->cpus);
> >               evsel->cpus = perf_cpu_map__get(evsel->own_cpus);
> > @@ -123,10 +123,10 @@ static void perf_evlist__purge(struct perf_evlist *evlist)
> >
> >  void perf_evlist__exit(struct perf_evlist *evlist)
> >  {
> > -     perf_cpu_map__put(evlist->cpus);
> > +     perf_cpu_map__put(evlist->user_cpus);
> >       perf_cpu_map__put(evlist->all_cpus);
> >       perf_thread_map__put(evlist->threads);
> > -     evlist->cpus = NULL;
> > +     evlist->user_cpus = NULL;
> >       evlist->all_cpus = NULL;
> >       evlist->threads = NULL;
> >       fdarray__exit(&evlist->pollfd);
> > @@ -155,9 +155,9 @@ void perf_evlist__set_maps(struct perf_evlist *evlist,
> >        * original reference count of 1.  If that is not the case it is up to
> >        * the caller to increase the reference count.
> >        */
> > -     if (cpus != evlist->cpus) {
> > -             perf_cpu_map__put(evlist->cpus);
> > -             evlist->cpus = perf_cpu_map__get(cpus);
> > +     if (cpus != evlist->user_cpus) {
> > +             perf_cpu_map__put(evlist->user_cpus);
> > +             evlist->user_cpus = perf_cpu_map__get(cpus);
> >       }
> >
> >       if (threads != evlist->threads) {
> > @@ -294,7 +294,7 @@ int perf_evlist__id_add_fd(struct perf_evlist *evlist,
> >
> >  int perf_evlist__alloc_pollfd(struct perf_evlist *evlist)
> >  {
> > -     int nr_cpus = perf_cpu_map__nr(evlist->cpus);
> > +     int nr_cpus = perf_cpu_map__nr(evlist->user_cpus);
> >       int nr_threads = perf_thread_map__nr(evlist->threads);
> >       int nfds = 0;
> >       struct perf_evsel *evsel;
> > @@ -426,7 +426,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> >              int idx, struct perf_mmap_param *mp, int cpu_idx,
> >              int thread, int *_output, int *_output_overwrite)
> >  {
> > -     struct perf_cpu evlist_cpu = perf_cpu_map__cpu(evlist->cpus, cpu_idx);
> > +     struct perf_cpu evlist_cpu = perf_cpu_map__cpu(evlist->user_cpus, cpu_idx);
> >       struct perf_evsel *evsel;
> >       int revent;
> >
> > @@ -536,7 +536,7 @@ mmap_per_cpu(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> >            struct perf_mmap_param *mp)
> >  {
> >       int nr_threads = perf_thread_map__nr(evlist->threads);
> > -     int nr_cpus    = perf_cpu_map__nr(evlist->cpus);
> > +     int nr_cpus    = perf_cpu_map__nr(evlist->user_cpus);
> >       int cpu, thread;
> >
> >       for (cpu = 0; cpu < nr_cpus; cpu++) {
> > @@ -564,8 +564,8 @@ static int perf_evlist__nr_mmaps(struct perf_evlist *evlist)
> >  {
> >       int nr_mmaps;
> >
> > -     nr_mmaps = perf_cpu_map__nr(evlist->cpus);
> > -     if (perf_cpu_map__empty(evlist->cpus))
> > +     nr_mmaps = perf_cpu_map__nr(evlist->user_cpus);
> > +     if (perf_cpu_map__empty(evlist->user_cpus))
> >               nr_mmaps = perf_thread_map__nr(evlist->threads);
> >
> >       return nr_mmaps;
> > @@ -576,7 +576,7 @@ int perf_evlist__mmap_ops(struct perf_evlist *evlist,
> >                         struct perf_mmap_param *mp)
> >  {
> >       struct perf_evsel *evsel;
> > -     const struct perf_cpu_map *cpus = evlist->cpus;
> > +     const struct perf_cpu_map *cpus = evlist->user_cpus;
> >       const struct perf_thread_map *threads = evlist->threads;
> >
> >       if (!ops || !ops->get || !ops->mmap)
> > diff --git a/tools/lib/perf/include/internal/evlist.h b/tools/lib/perf/include/internal/evlist.h
> > index 4cefade540bd..5f95672662ae 100644
> > --- a/tools/lib/perf/include/internal/evlist.h
> > +++ b/tools/lib/perf/include/internal/evlist.h
> > @@ -19,7 +19,9 @@ struct perf_evlist {
> >       int                      nr_entries;
> >       int                      nr_groups;
> >       bool                     has_user_cpus;
> > -     struct perf_cpu_map     *cpus;
> > +     /** The list of cpus passed from the command line. */
> > +     struct perf_cpu_map     *user_cpus;
> > +     /** The union of all evsel cpu maps. */
> >       struct perf_cpu_map     *all_cpus;
> >       struct perf_thread_map  *threads;
> >       int                      nr_mmaps;
> > diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
> > index cbc555245959..405d58903d84 100644
> > --- a/tools/perf/arch/arm/util/cs-etm.c
> > +++ b/tools/perf/arch/arm/util/cs-etm.c
> > @@ -199,7 +199,7 @@ static int cs_etm_set_option(struct auxtrace_record *itr,
> >                            struct evsel *evsel, u32 option)
> >  {
> >       int i, err = -EINVAL;
> > -     struct perf_cpu_map *event_cpus = evsel->evlist->core.cpus;
> > +     struct perf_cpu_map *event_cpus = evsel->evlist->core.user_cpus;
> >       struct perf_cpu_map *online_cpus = perf_cpu_map__new(NULL);
> >
> >       /* Set option of each CPU we have */
> > @@ -299,7 +299,7 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
> >                               container_of(itr, struct cs_etm_recording, itr);
> >       struct perf_pmu *cs_etm_pmu = ptr->cs_etm_pmu;
> >       struct evsel *evsel, *cs_etm_evsel = NULL;
> > -     struct perf_cpu_map *cpus = evlist->core.cpus;
> > +     struct perf_cpu_map *cpus = evlist->core.user_cpus;
> >       bool privileged = perf_event_paranoid_check(-1);
> >       int err = 0;
> >
> > @@ -522,7 +522,7 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
> >  {
> >       int i;
> >       int etmv3 = 0, etmv4 = 0, ete = 0;
> > -     struct perf_cpu_map *event_cpus = evlist->core.cpus;
> > +     struct perf_cpu_map *event_cpus = evlist->core.user_cpus;
> >       struct perf_cpu_map *online_cpus = perf_cpu_map__new(NULL);
> >
> >       /* cpu map is not empty, we have specific CPUs to work with */
> > @@ -713,7 +713,7 @@ static int cs_etm_info_fill(struct auxtrace_record *itr,
> >       u32 offset;
> >       u64 nr_cpu, type;
> >       struct perf_cpu_map *cpu_map;
> > -     struct perf_cpu_map *event_cpus = session->evlist->core.cpus;
> > +     struct perf_cpu_map *event_cpus = session->evlist->core.user_cpus;
> >       struct perf_cpu_map *online_cpus = perf_cpu_map__new(NULL);
> >       struct cs_etm_recording *ptr =
> >                       container_of(itr, struct cs_etm_recording, itr);
> > diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
> > index 5860bbaea95a..83ad05613321 100644
> > --- a/tools/perf/arch/arm64/util/arm-spe.c
> > +++ b/tools/perf/arch/arm64/util/arm-spe.c
> > @@ -144,7 +144,7 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
> >                       container_of(itr, struct arm_spe_recording, itr);
> >       struct perf_pmu *arm_spe_pmu = sper->arm_spe_pmu;
> >       struct evsel *evsel, *arm_spe_evsel = NULL;
> > -     struct perf_cpu_map *cpus = evlist->core.cpus;
> > +     struct perf_cpu_map *cpus = evlist->core.user_cpus;
> >       bool privileged = perf_event_paranoid_check(-1);
> >       struct evsel *tracking_evsel;
> >       int err;
> > diff --git a/tools/perf/arch/x86/util/intel-bts.c b/tools/perf/arch/x86/util/intel-bts.c
> > index 4a76d49d25d6..c9d73ecfd795 100644
> > --- a/tools/perf/arch/x86/util/intel-bts.c
> > +++ b/tools/perf/arch/x86/util/intel-bts.c
> > @@ -110,7 +110,7 @@ static int intel_bts_recording_options(struct auxtrace_record *itr,
> >                       container_of(itr, struct intel_bts_recording, itr);
> >       struct perf_pmu *intel_bts_pmu = btsr->intel_bts_pmu;
> >       struct evsel *evsel, *intel_bts_evsel = NULL;
> > -     const struct perf_cpu_map *cpus = evlist->core.cpus;
> > +     const struct perf_cpu_map *cpus = evlist->core.user_cpus;
> >       bool privileged = perf_event_paranoid_check(-1);
> >
> >       if (opts->auxtrace_sample_mode) {
> > diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
> > index 8c31578d6f4a..58bf24960273 100644
> > --- a/tools/perf/arch/x86/util/intel-pt.c
> > +++ b/tools/perf/arch/x86/util/intel-pt.c
> > @@ -382,7 +382,7 @@ static int intel_pt_info_fill(struct auxtrace_record *itr,
> >                       ui__warning("Intel Processor Trace: TSC not available\n");
> >       }
> >
> > -     per_cpu_mmaps = !perf_cpu_map__empty(session->evlist->core.cpus);
> > +     per_cpu_mmaps = !perf_cpu_map__empty(session->evlist->core.user_cpus);
> >
> >       auxtrace_info->type = PERF_AUXTRACE_INTEL_PT;
> >       auxtrace_info->priv[INTEL_PT_PMU_TYPE] = intel_pt_pmu->type;
> > @@ -632,7 +632,7 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
> >       struct perf_pmu *intel_pt_pmu = ptr->intel_pt_pmu;
> >       bool have_timing_info, need_immediate = false;
> >       struct evsel *evsel, *intel_pt_evsel = NULL;
> > -     const struct perf_cpu_map *cpus = evlist->core.cpus;
> > +     const struct perf_cpu_map *cpus = evlist->core.user_cpus;
> >       bool privileged = perf_event_paranoid_check(-1);
> >       u64 tsc_bit;
> >       int err;
> > diff --git a/tools/perf/bench/evlist-open-close.c b/tools/perf/bench/evlist-open-close.c
> > index de56601f69ee..5bdc6b476a4d 100644
> > --- a/tools/perf/bench/evlist-open-close.c
> > +++ b/tools/perf/bench/evlist-open-close.c
> > @@ -151,7 +151,7 @@ static int bench_evlist_open_close__run(char *evstr)
> >
> >       init_stats(&time_stats);
> >
> > -     printf("  Number of cpus:\t%d\n", perf_cpu_map__nr(evlist->core.cpus));
> > +     printf("  Number of cpus:\t%d\n", perf_cpu_map__nr(evlist->core.user_cpus));
> >       printf("  Number of threads:\t%d\n", evlist->core.threads->nr);
> >       printf("  Number of events:\t%d (%d fds)\n",
> >               evlist->core.nr_entries, evlist__count_evsel_fds(evlist));
> > diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
> > index ad9ce1bfffa1..642cbc6fdfc5 100644
> > --- a/tools/perf/builtin-ftrace.c
> > +++ b/tools/perf/builtin-ftrace.c
> > @@ -301,7 +301,7 @@ static int set_tracing_cpumask(struct perf_cpu_map *cpumap)
> >
> >  static int set_tracing_cpu(struct perf_ftrace *ftrace)
> >  {
> > -     struct perf_cpu_map *cpumap = ftrace->evlist->core.cpus;
> > +     struct perf_cpu_map *cpumap = ftrace->evlist->core.user_cpus;
> >
> >       if (!target__has_cpu(&ftrace->target))
> >               return 0;
> > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > index 0b4abed555d8..28ab3866802c 100644
> > --- a/tools/perf/builtin-record.c
> > +++ b/tools/perf/builtin-record.c
> > @@ -987,7 +987,7 @@ static int record__thread_data_init_maps(struct record_thread *thread_data, stru
> >       int m, tm, nr_mmaps = evlist->core.nr_mmaps;
> >       struct mmap *mmap = evlist->mmap;
> >       struct mmap *overwrite_mmap = evlist->overwrite_mmap;
> > -     struct perf_cpu_map *cpus = evlist->core.cpus;
> > +     struct perf_cpu_map *cpus = evlist->core.user_cpus;
> >
> >       thread_data->nr_mmaps = bitmap_weight(thread_data->mask->maps.bits,
> >                                             thread_data->mask->maps.nbits);
> > @@ -1881,7 +1881,7 @@ static int record__synthesize(struct record *rec, bool tail)
> >               return err;
> >       }
> >
> > -     err = perf_event__synthesize_cpu_map(&rec->tool, rec->evlist->core.cpus,
> > +     err = perf_event__synthesize_cpu_map(&rec->tool, rec->evlist->core.user_cpus,
> >                                            process_synthesized_event, NULL);
> >       if (err < 0) {
> >               pr_err("Couldn't synthesize cpu map.\n");
> > @@ -3675,7 +3675,7 @@ static int record__init_thread_default_masks(struct record *rec, struct perf_cpu
> >  static int record__init_thread_masks(struct record *rec)
> >  {
> >       int ret = 0;
> > -     struct perf_cpu_map *cpus = rec->evlist->core.cpus;
> > +     struct perf_cpu_map *cpus = rec->evlist->core.user_cpus;
> >
> >       if (!record__threads_enabled(rec))
> >               return record__init_thread_default_masks(rec, cpus);
> > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > index 4ee40de698a4..5bee529f7656 100644
> > --- a/tools/perf/builtin-stat.c
> > +++ b/tools/perf/builtin-stat.c
> > @@ -804,7 +804,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> >       if (group)
> >               evlist__set_leader(evsel_list);
> >
> > -     if (!cpu_map__is_dummy(evsel_list->core.cpus)) {
> > +     if (!cpu_map__is_dummy(evsel_list->core.user_cpus)) {
> >               if (affinity__setup(&saved_affinity) < 0)
> >                       return -1;
> >               affinity = &saved_affinity;
> > @@ -1458,7 +1458,7 @@ static int perf_stat_init_aggr_mode(void)
> >       aggr_cpu_id_get_t get_id = aggr_mode__get_aggr(stat_config.aggr_mode);
> >
> >       if (get_id) {
> > -             stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.cpus,
> > +             stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_cpus,
> >                                                        get_id, /*data=*/NULL);
> >               if (!stat_config.aggr_map) {
> >                       pr_err("cannot build %s map", aggr_mode__string[stat_config.aggr_mode]);
> > @@ -1472,7 +1472,7 @@ static int perf_stat_init_aggr_mode(void)
> >        * taking the highest cpu number to be the size of
> >        * the aggregation translate cpumap.
> >        */
> > -     nr = perf_cpu_map__max(evsel_list->core.cpus).cpu;
> > +     nr = perf_cpu_map__max(evsel_list->core.user_cpus).cpu;
> >       stat_config.cpus_aggr_map = cpu_aggr_map__empty_new(nr + 1);
> >       return stat_config.cpus_aggr_map ? 0 : -ENOMEM;
> >  }
> > @@ -1627,7 +1627,7 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
> >       if (!get_id)
> >               return 0;
> >
> > -     stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.cpus, get_id, env);
> > +     stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_cpus, get_id, env);
> >       if (!stat_config.aggr_map) {
> >               pr_err("cannot build %s map", aggr_mode__string[stat_config.aggr_mode]);
> >               return -1;
> > diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> > index 9b08e44a31d9..4cfa112292d0 100644
> > --- a/tools/perf/builtin-top.c
> > +++ b/tools/perf/builtin-top.c
> > @@ -1021,7 +1021,7 @@ static int perf_top__start_counters(struct perf_top *top)
> >
> >       evlist__for_each_entry(evlist, counter) {
> >  try_again:
> > -             if (evsel__open(counter, top->evlist->core.cpus,
> > +             if (evsel__open(counter, top->evlist->core.user_cpus,
> >                                    top->evlist->core.threads) < 0) {
> >
> >                       /*
> > diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
> > index 9e48652662d4..b138dd6bdefc 100644
> > --- a/tools/perf/util/auxtrace.c
> > +++ b/tools/perf/util/auxtrace.c
> > @@ -174,7 +174,7 @@ void auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp,
> >       mp->idx = idx;
> >
> >       if (per_cpu) {
> > -             mp->cpu = perf_cpu_map__cpu(evlist->core.cpus, idx);
> > +             mp->cpu = perf_cpu_map__cpu(evlist->core.user_cpus, idx);
> >               if (evlist->core.threads)
> >                       mp->tid = perf_thread_map__pid(evlist->core.threads, 0);
> >               else
> > diff --git a/tools/perf/util/bpf_ftrace.c b/tools/perf/util/bpf_ftrace.c
> > index 4f4d3aaff37c..69481b28b885 100644
> > --- a/tools/perf/util/bpf_ftrace.c
> > +++ b/tools/perf/util/bpf_ftrace.c
> > @@ -38,7 +38,7 @@ int perf_ftrace__latency_prepare_bpf(struct perf_ftrace *ftrace)
> >
> >       /* don't need to set cpu filter for system-wide mode */
> >       if (ftrace->target.cpu_list) {
> > -             ncpus = perf_cpu_map__nr(ftrace->evlist->core.cpus);
> > +             ncpus = perf_cpu_map__nr(ftrace->evlist->core.user_cpus);
> >               bpf_map__set_max_entries(skel->maps.cpu_filter, ncpus);
> >       }
> >
> > @@ -63,7 +63,7 @@ int perf_ftrace__latency_prepare_bpf(struct perf_ftrace *ftrace)
> >               fd = bpf_map__fd(skel->maps.cpu_filter);
> >
> >               for (i = 0; i < ncpus; i++) {
> > -                     cpu = perf_cpu_map__cpu(ftrace->evlist->core.cpus, i).cpu;
> > +                     cpu = perf_cpu_map__cpu(ftrace->evlist->core.user_cpus, i).cpu;
> >                       bpf_map_update_elem(fd, &cpu, &val, BPF_ANY);
> >               }
> >       }
> > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> > index 9bb79e049957..d335fb713f5e 100644
> > --- a/tools/perf/util/evlist.c
> > +++ b/tools/perf/util/evlist.c
> > @@ -440,7 +440,7 @@ static void __evlist__disable(struct evlist *evlist, char *evsel_name)
> >       bool has_imm = false;
> >
> >       // See explanation in evlist__close()
> > -     if (!cpu_map__is_dummy(evlist->core.cpus)) {
> > +     if (!cpu_map__is_dummy(evlist->core.user_cpus)) {
> >               if (affinity__setup(&saved_affinity) < 0)
> >                       return;
> >               affinity = &saved_affinity;
> > @@ -500,7 +500,7 @@ static void __evlist__enable(struct evlist *evlist, char *evsel_name)
> >       struct affinity saved_affinity, *affinity = NULL;
> >
> >       // See explanation in evlist__close()
> > -     if (!cpu_map__is_dummy(evlist->core.cpus)) {
> > +     if (!cpu_map__is_dummy(evlist->core.user_cpus)) {
> >               if (affinity__setup(&saved_affinity) < 0)
> >                       return;
> >               affinity = &saved_affinity;
> > @@ -565,7 +565,7 @@ static int evlist__enable_event_cpu(struct evlist *evlist, struct evsel *evsel,
> >  static int evlist__enable_event_thread(struct evlist *evlist, struct evsel *evsel, int thread)
> >  {
> >       int cpu;
> > -     int nr_cpus = perf_cpu_map__nr(evlist->core.cpus);
> > +     int nr_cpus = perf_cpu_map__nr(evlist->core.user_cpus);
> >
> >       if (!evsel->core.fd)
> >               return -EINVAL;
> > @@ -580,7 +580,7 @@ static int evlist__enable_event_thread(struct evlist *evlist, struct evsel *evse
> >
> >  int evlist__enable_event_idx(struct evlist *evlist, struct evsel *evsel, int idx)
> >  {
> > -     bool per_cpu_mmaps = !perf_cpu_map__empty(evlist->core.cpus);
> > +     bool per_cpu_mmaps = !perf_cpu_map__empty(evlist->core.user_cpus);
> >
> >       if (per_cpu_mmaps)
> >               return evlist__enable_event_cpu(evlist, evsel, idx);
> > @@ -1301,10 +1301,10 @@ void evlist__close(struct evlist *evlist)
> >       struct affinity affinity;
> >
> >       /*
> > -      * With perf record core.cpus is usually NULL.
> > +      * With perf record core.user_cpus is usually NULL.
> >        * Use the old method to handle this for now.
> >        */
> > -     if (!evlist->core.cpus || cpu_map__is_dummy(evlist->core.cpus)) {
> > +     if (!evlist->core.user_cpus || cpu_map__is_dummy(evlist->core.user_cpus)) {
> >               evlist__for_each_entry_reverse(evlist, evsel)
> >                       evsel__close(evsel);
> >               return;
> > @@ -1367,7 +1367,7 @@ int evlist__open(struct evlist *evlist)
> >        * Default: one fd per CPU, all threads, aka systemwide
> >        * as sys_perf_event_open(cpu = -1, thread = -1) is EINVAL
> >        */
> > -     if (evlist->core.threads == NULL && evlist->core.cpus == NULL) {
> > +     if (evlist->core.threads == NULL && evlist->core.user_cpus == NULL) {
> >               err = evlist__create_syswide_maps(evlist);
> >               if (err < 0)
> >                       goto out_err;
> > diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
> > index 007a64681416..ff326eba084f 100644
> > --- a/tools/perf/util/record.c
> > +++ b/tools/perf/util/record.c
> > @@ -106,7 +106,7 @@ void evlist__config(struct evlist *evlist, struct record_opts *opts, struct call
> >       if (opts->group)
> >               evlist__set_leader(evlist);
> >
> > -     if (perf_cpu_map__cpu(evlist->core.cpus, 0).cpu < 0)
> > +     if (perf_cpu_map__cpu(evlist->core.user_cpus, 0).cpu < 0)
> >               opts->no_inherit = true;
> >
> >       use_comm_exec = perf_can_comm_exec();
> > @@ -244,7 +244,7 @@ bool evlist__can_select_event(struct evlist *evlist, const char *str)
> >
> >       evsel = evlist__last(temp_evlist);
> >
> > -     if (!evlist || perf_cpu_map__empty(evlist->core.cpus)) {
> > +     if (!evlist || perf_cpu_map__empty(evlist->core.user_cpus)) {
> >               struct perf_cpu_map *cpus = perf_cpu_map__new(NULL);
> >
> >               if (cpus)
> > @@ -252,7 +252,7 @@ bool evlist__can_select_event(struct evlist *evlist, const char *str)
> >
> >               perf_cpu_map__put(cpus);
> >       } else {
> > -             cpu = perf_cpu_map__cpu(evlist->core.cpus, 0);
> > +             cpu = perf_cpu_map__cpu(evlist->core.user_cpus, 0);
> >       }
> >
> >       while (1) {
> > diff --git a/tools/perf/util/sideband_evlist.c b/tools/perf/util/sideband_evlist.c
> > index 748371ac22be..9f58c68a25f7 100644
> > --- a/tools/perf/util/sideband_evlist.c
> > +++ b/tools/perf/util/sideband_evlist.c
> > @@ -114,7 +114,7 @@ int evlist__start_sb_thread(struct evlist *evlist, struct target *target)
> >       }
> >
> >       evlist__for_each_entry(evlist, counter) {
> > -             if (evsel__open(counter, evlist->core.cpus, evlist->core.threads) < 0)
> > +             if (evsel__open(counter, evlist->core.user_cpus, evlist->core.threads) < 0)
> >                       goto out_delete_evlist;
> >       }
> >
> > diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> > index 9cbe351b141f..634dd9ea2b35 100644
> > --- a/tools/perf/util/stat-display.c
> > +++ b/tools/perf/util/stat-display.c
> > @@ -929,7 +929,7 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
> >       int all_idx;
> >       struct perf_cpu cpu;
> >
> > -     perf_cpu_map__for_each_cpu(cpu, all_idx, evlist->core.cpus) {
> > +     perf_cpu_map__for_each_cpu(cpu, all_idx, evlist->core.user_cpus) {
> >               struct evsel *counter;
> >               bool first = true;
> >
> > diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> > index b654de0841f8..591afc6c607b 100644
> > --- a/tools/perf/util/synthetic-events.c
> > +++ b/tools/perf/util/synthetic-events.c
> > @@ -2127,7 +2127,7 @@ int perf_event__synthesize_stat_events(struct perf_stat_config *config, struct p
> >               return err;
> >       }
> >
> > -     err = perf_event__synthesize_cpu_map(tool, evlist->core.cpus, process, NULL);
> > +     err = perf_event__synthesize_cpu_map(tool, evlist->core.user_cpus, process, NULL);
> >       if (err < 0) {
> >               pr_err("Couldn't synthesize thread map.\n");
> >               return err;
> > diff --git a/tools/perf/util/top.c b/tools/perf/util/top.c
> > index c1ebfc5d2e0c..e98422f3ff17 100644
> > --- a/tools/perf/util/top.c
> > +++ b/tools/perf/util/top.c
> > @@ -95,15 +95,16 @@ size_t perf_top__header_snprintf(struct perf_top *top, char *bf, size_t size)
> >
> >       if (target->cpu_list)
> >               ret += SNPRINTF(bf + ret, size - ret, ", CPU%s: %s)",
> > -                             perf_cpu_map__nr(top->evlist->core.cpus) > 1 ? "s" : "",
> > +                             perf_cpu_map__nr(top->evlist->core.user_cpus) > 1 ? "s" : "",
> >                               target->cpu_list);
> >       else {
> >               if (target->tid)
> >                       ret += SNPRINTF(bf + ret, size - ret, ")");
> >               else
> >                       ret += SNPRINTF(bf + ret, size - ret, ", %d CPU%s)",
> > -                                     perf_cpu_map__nr(top->evlist->core.cpus),
> > -                                     perf_cpu_map__nr(top->evlist->core.cpus) > 1 ? "s" : "");
> > +                                     perf_cpu_map__nr(top->evlist->core.user_cpus),
> > +                                     perf_cpu_map__nr(top->evlist->core.user_cpus) > 1
> > +                                     ? "s" : "");
> >       }
> >
> >       perf_top__reset_sample_counters(top);
> > --
> > 2.35.1.1021.g381101b075-goog
>
> --
>
> - Arnaldo


* Re: [PATCH 1/5] perf evlist: Rename cpus to user_cpus
@ 2022-03-28 20:58       ` Ian Rogers
  0 siblings, 0 replies; 34+ messages in thread
From: Ian Rogers @ 2022-03-28 20:58 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Leo Yan, John Garry, Will Deacon, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Kajol Jain, James Clark,
	German Gomez, Adrian Hunter, Riccardo Mancini, Andi Kleen,
	Alexey Bayduraev, Alexander Antonov, linux-perf-users,
	linux-kernel, coresight, linux-arm-kernel, netdev, bpf,
	Stephane Eranian

On Mon, Mar 28, 2022 at 1:19 PM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Em Sun, Mar 27, 2022 at 11:24:10PM -0700, Ian Rogers escreveu:
> > evlist contains cpus and all_cpus. all_cpus is the union of the cpu maps
> > of all evsels. cpus is set to be cpus required from the command line,
>
> Can we replace "required" with "requested"?

Sure. This statement also isn't true for profiling tasks.

> > defaulting to all online cpus if no cpus are specified.
>
> > For something like an uncore event, all_cpus may just be CPU 0,
> > however, all_cpus may be every online CPU.
>
> Can this be rephrased as:
>
> "For an uncore event, all_cpus may be just CPU 0 or every online CPU."
>
> ?

The rephrasing is fine. duration_time also has a cpu map of just "0".

> > This causes all_cpus to have fewer values than the cpus variable which
> > is confusing given the 'all' in the name. To try to make the behavior
> > clearer, rename cpus to user_cpus and add comments on the two struct
> > variables.
>
> "user_cpus" can as well mean CPUs where we should only sample user level
> events, so perhaps bite the bullet and rename it to the longer
>
>   'evlist->user_requested_cpus'
>
> ?

I can do that.

Thanks,
Ian

> - Arnaldo
>
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >  tools/lib/perf/evlist.c                  | 28 ++++++++++++------------
> >  tools/lib/perf/include/internal/evlist.h |  4 +++-
> >  tools/perf/arch/arm/util/cs-etm.c        |  8 +++----
> >  tools/perf/arch/arm64/util/arm-spe.c     |  2 +-
> >  tools/perf/arch/x86/util/intel-bts.c     |  2 +-
> >  tools/perf/arch/x86/util/intel-pt.c      |  4 ++--
> >  tools/perf/bench/evlist-open-close.c     |  2 +-
> >  tools/perf/builtin-ftrace.c              |  2 +-
> >  tools/perf/builtin-record.c              |  6 ++---
> >  tools/perf/builtin-stat.c                |  8 +++----
> >  tools/perf/builtin-top.c                 |  2 +-
> >  tools/perf/util/auxtrace.c               |  2 +-
> >  tools/perf/util/bpf_ftrace.c             |  4 ++--
> >  tools/perf/util/evlist.c                 | 14 ++++++------
> >  tools/perf/util/record.c                 |  6 ++---
> >  tools/perf/util/sideband_evlist.c        |  2 +-
> >  tools/perf/util/stat-display.c           |  2 +-
> >  tools/perf/util/synthetic-events.c       |  2 +-
> >  tools/perf/util/top.c                    |  7 +++---
> >  19 files changed, 55 insertions(+), 52 deletions(-)
> >
> > diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> > index 9a770bfdc804..e29dc229768a 100644
> > --- a/tools/lib/perf/evlist.c
> > +++ b/tools/lib/perf/evlist.c
> > @@ -41,10 +41,10 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
> >        */
> >       if (!evsel->own_cpus || evlist->has_user_cpus) {
> >               perf_cpu_map__put(evsel->cpus);
> > -             evsel->cpus = perf_cpu_map__get(evlist->cpus);
> > -     } else if (!evsel->system_wide && perf_cpu_map__empty(evlist->cpus)) {
> > +             evsel->cpus = perf_cpu_map__get(evlist->user_cpus);
> > +     } else if (!evsel->system_wide && perf_cpu_map__empty(evlist->user_cpus)) {
> >               perf_cpu_map__put(evsel->cpus);
> > -             evsel->cpus = perf_cpu_map__get(evlist->cpus);
> > +             evsel->cpus = perf_cpu_map__get(evlist->user_cpus);
> >       } else if (evsel->cpus != evsel->own_cpus) {
> >               perf_cpu_map__put(evsel->cpus);
> >               evsel->cpus = perf_cpu_map__get(evsel->own_cpus);
> > @@ -123,10 +123,10 @@ static void perf_evlist__purge(struct perf_evlist *evlist)
> >
> >  void perf_evlist__exit(struct perf_evlist *evlist)
> >  {
> > -     perf_cpu_map__put(evlist->cpus);
> > +     perf_cpu_map__put(evlist->user_cpus);
> >       perf_cpu_map__put(evlist->all_cpus);
> >       perf_thread_map__put(evlist->threads);
> > -     evlist->cpus = NULL;
> > +     evlist->user_cpus = NULL;
> >       evlist->all_cpus = NULL;
> >       evlist->threads = NULL;
> >       fdarray__exit(&evlist->pollfd);
> > @@ -155,9 +155,9 @@ void perf_evlist__set_maps(struct perf_evlist *evlist,
> >        * original reference count of 1.  If that is not the case it is up to
> >        * the caller to increase the reference count.
> >        */
> > -     if (cpus != evlist->cpus) {
> > -             perf_cpu_map__put(evlist->cpus);
> > -             evlist->cpus = perf_cpu_map__get(cpus);
> > +     if (cpus != evlist->user_cpus) {
> > +             perf_cpu_map__put(evlist->user_cpus);
> > +             evlist->user_cpus = perf_cpu_map__get(cpus);
> >       }
> >
> >       if (threads != evlist->threads) {
> > @@ -294,7 +294,7 @@ int perf_evlist__id_add_fd(struct perf_evlist *evlist,
> >
> >  int perf_evlist__alloc_pollfd(struct perf_evlist *evlist)
> >  {
> > -     int nr_cpus = perf_cpu_map__nr(evlist->cpus);
> > +     int nr_cpus = perf_cpu_map__nr(evlist->user_cpus);
> >       int nr_threads = perf_thread_map__nr(evlist->threads);
> >       int nfds = 0;
> >       struct perf_evsel *evsel;
> > @@ -426,7 +426,7 @@ mmap_per_evsel(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> >              int idx, struct perf_mmap_param *mp, int cpu_idx,
> >              int thread, int *_output, int *_output_overwrite)
> >  {
> > -     struct perf_cpu evlist_cpu = perf_cpu_map__cpu(evlist->cpus, cpu_idx);
> > +     struct perf_cpu evlist_cpu = perf_cpu_map__cpu(evlist->user_cpus, cpu_idx);
> >       struct perf_evsel *evsel;
> >       int revent;
> >
> > @@ -536,7 +536,7 @@ mmap_per_cpu(struct perf_evlist *evlist, struct perf_evlist_mmap_ops *ops,
> >            struct perf_mmap_param *mp)
> >  {
> >       int nr_threads = perf_thread_map__nr(evlist->threads);
> > -     int nr_cpus    = perf_cpu_map__nr(evlist->cpus);
> > +     int nr_cpus    = perf_cpu_map__nr(evlist->user_cpus);
> >       int cpu, thread;
> >
> >       for (cpu = 0; cpu < nr_cpus; cpu++) {
> > @@ -564,8 +564,8 @@ static int perf_evlist__nr_mmaps(struct perf_evlist *evlist)
> >  {
> >       int nr_mmaps;
> >
> > -     nr_mmaps = perf_cpu_map__nr(evlist->cpus);
> > -     if (perf_cpu_map__empty(evlist->cpus))
> > +     nr_mmaps = perf_cpu_map__nr(evlist->user_cpus);
> > +     if (perf_cpu_map__empty(evlist->user_cpus))
> >               nr_mmaps = perf_thread_map__nr(evlist->threads);
> >
> >       return nr_mmaps;
> > @@ -576,7 +576,7 @@ int perf_evlist__mmap_ops(struct perf_evlist *evlist,
> >                         struct perf_mmap_param *mp)
> >  {
> >       struct perf_evsel *evsel;
> > -     const struct perf_cpu_map *cpus = evlist->cpus;
> > +     const struct perf_cpu_map *cpus = evlist->user_cpus;
> >       const struct perf_thread_map *threads = evlist->threads;
> >
> >       if (!ops || !ops->get || !ops->mmap)
> > diff --git a/tools/lib/perf/include/internal/evlist.h b/tools/lib/perf/include/internal/evlist.h
> > index 4cefade540bd..5f95672662ae 100644
> > --- a/tools/lib/perf/include/internal/evlist.h
> > +++ b/tools/lib/perf/include/internal/evlist.h
> > @@ -19,7 +19,9 @@ struct perf_evlist {
> >       int                      nr_entries;
> >       int                      nr_groups;
> >       bool                     has_user_cpus;
> > -     struct perf_cpu_map     *cpus;
> > +     /** The list of cpus passed from the command line. */
> > +     struct perf_cpu_map     *user_cpus;
> > +     /** The union of all evsel cpu maps. */
> >       struct perf_cpu_map     *all_cpus;
> >       struct perf_thread_map  *threads;
> >       int                      nr_mmaps;
> > diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
> > index cbc555245959..405d58903d84 100644
> > --- a/tools/perf/arch/arm/util/cs-etm.c
> > +++ b/tools/perf/arch/arm/util/cs-etm.c
> > @@ -199,7 +199,7 @@ static int cs_etm_set_option(struct auxtrace_record *itr,
> >                            struct evsel *evsel, u32 option)
> >  {
> >       int i, err = -EINVAL;
> > -     struct perf_cpu_map *event_cpus = evsel->evlist->core.cpus;
> > +     struct perf_cpu_map *event_cpus = evsel->evlist->core.user_cpus;
> >       struct perf_cpu_map *online_cpus = perf_cpu_map__new(NULL);
> >
> >       /* Set option of each CPU we have */
> > @@ -299,7 +299,7 @@ static int cs_etm_recording_options(struct auxtrace_record *itr,
> >                               container_of(itr, struct cs_etm_recording, itr);
> >       struct perf_pmu *cs_etm_pmu = ptr->cs_etm_pmu;
> >       struct evsel *evsel, *cs_etm_evsel = NULL;
> > -     struct perf_cpu_map *cpus = evlist->core.cpus;
> > +     struct perf_cpu_map *cpus = evlist->core.user_cpus;
> >       bool privileged = perf_event_paranoid_check(-1);
> >       int err = 0;
> >
> > @@ -522,7 +522,7 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
> >  {
> >       int i;
> >       int etmv3 = 0, etmv4 = 0, ete = 0;
> > -     struct perf_cpu_map *event_cpus = evlist->core.cpus;
> > +     struct perf_cpu_map *event_cpus = evlist->core.user_cpus;
> >       struct perf_cpu_map *online_cpus = perf_cpu_map__new(NULL);
> >
> >       /* cpu map is not empty, we have specific CPUs to work with */
> > @@ -713,7 +713,7 @@ static int cs_etm_info_fill(struct auxtrace_record *itr,
> >       u32 offset;
> >       u64 nr_cpu, type;
> >       struct perf_cpu_map *cpu_map;
> > -     struct perf_cpu_map *event_cpus = session->evlist->core.cpus;
> > +     struct perf_cpu_map *event_cpus = session->evlist->core.user_cpus;
> >       struct perf_cpu_map *online_cpus = perf_cpu_map__new(NULL);
> >       struct cs_etm_recording *ptr =
> >                       container_of(itr, struct cs_etm_recording, itr);
> > diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
> > index 5860bbaea95a..83ad05613321 100644
> > --- a/tools/perf/arch/arm64/util/arm-spe.c
> > +++ b/tools/perf/arch/arm64/util/arm-spe.c
> > @@ -144,7 +144,7 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
> >                       container_of(itr, struct arm_spe_recording, itr);
> >       struct perf_pmu *arm_spe_pmu = sper->arm_spe_pmu;
> >       struct evsel *evsel, *arm_spe_evsel = NULL;
> > -     struct perf_cpu_map *cpus = evlist->core.cpus;
> > +     struct perf_cpu_map *cpus = evlist->core.user_cpus;
> >       bool privileged = perf_event_paranoid_check(-1);
> >       struct evsel *tracking_evsel;
> >       int err;
> > diff --git a/tools/perf/arch/x86/util/intel-bts.c b/tools/perf/arch/x86/util/intel-bts.c
> > index 4a76d49d25d6..c9d73ecfd795 100644
> > --- a/tools/perf/arch/x86/util/intel-bts.c
> > +++ b/tools/perf/arch/x86/util/intel-bts.c
> > @@ -110,7 +110,7 @@ static int intel_bts_recording_options(struct auxtrace_record *itr,
> >                       container_of(itr, struct intel_bts_recording, itr);
> >       struct perf_pmu *intel_bts_pmu = btsr->intel_bts_pmu;
> >       struct evsel *evsel, *intel_bts_evsel = NULL;
> > -     const struct perf_cpu_map *cpus = evlist->core.cpus;
> > +     const struct perf_cpu_map *cpus = evlist->core.user_cpus;
> >       bool privileged = perf_event_paranoid_check(-1);
> >
> >       if (opts->auxtrace_sample_mode) {
> > diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
> > index 8c31578d6f4a..58bf24960273 100644
> > --- a/tools/perf/arch/x86/util/intel-pt.c
> > +++ b/tools/perf/arch/x86/util/intel-pt.c
> > @@ -382,7 +382,7 @@ static int intel_pt_info_fill(struct auxtrace_record *itr,
> >                       ui__warning("Intel Processor Trace: TSC not available\n");
> >       }
> >
> > -     per_cpu_mmaps = !perf_cpu_map__empty(session->evlist->core.cpus);
> > +     per_cpu_mmaps = !perf_cpu_map__empty(session->evlist->core.user_cpus);
> >
> >       auxtrace_info->type = PERF_AUXTRACE_INTEL_PT;
> >       auxtrace_info->priv[INTEL_PT_PMU_TYPE] = intel_pt_pmu->type;
> > @@ -632,7 +632,7 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
> >       struct perf_pmu *intel_pt_pmu = ptr->intel_pt_pmu;
> >       bool have_timing_info, need_immediate = false;
> >       struct evsel *evsel, *intel_pt_evsel = NULL;
> > -     const struct perf_cpu_map *cpus = evlist->core.cpus;
> > +     const struct perf_cpu_map *cpus = evlist->core.user_cpus;
> >       bool privileged = perf_event_paranoid_check(-1);
> >       u64 tsc_bit;
> >       int err;
> > diff --git a/tools/perf/bench/evlist-open-close.c b/tools/perf/bench/evlist-open-close.c
> > index de56601f69ee..5bdc6b476a4d 100644
> > --- a/tools/perf/bench/evlist-open-close.c
> > +++ b/tools/perf/bench/evlist-open-close.c
> > @@ -151,7 +151,7 @@ static int bench_evlist_open_close__run(char *evstr)
> >
> >       init_stats(&time_stats);
> >
> > -     printf("  Number of cpus:\t%d\n", perf_cpu_map__nr(evlist->core.cpus));
> > +     printf("  Number of cpus:\t%d\n", perf_cpu_map__nr(evlist->core.user_cpus));
> >       printf("  Number of threads:\t%d\n", evlist->core.threads->nr);
> >       printf("  Number of events:\t%d (%d fds)\n",
> >               evlist->core.nr_entries, evlist__count_evsel_fds(evlist));
> > diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
> > index ad9ce1bfffa1..642cbc6fdfc5 100644
> > --- a/tools/perf/builtin-ftrace.c
> > +++ b/tools/perf/builtin-ftrace.c
> > @@ -301,7 +301,7 @@ static int set_tracing_cpumask(struct perf_cpu_map *cpumap)
> >
> >  static int set_tracing_cpu(struct perf_ftrace *ftrace)
> >  {
> > -     struct perf_cpu_map *cpumap = ftrace->evlist->core.cpus;
> > +     struct perf_cpu_map *cpumap = ftrace->evlist->core.user_cpus;
> >
> >       if (!target__has_cpu(&ftrace->target))
> >               return 0;
> > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > index 0b4abed555d8..28ab3866802c 100644
> > --- a/tools/perf/builtin-record.c
> > +++ b/tools/perf/builtin-record.c
> > @@ -987,7 +987,7 @@ static int record__thread_data_init_maps(struct record_thread *thread_data, stru
> >       int m, tm, nr_mmaps = evlist->core.nr_mmaps;
> >       struct mmap *mmap = evlist->mmap;
> >       struct mmap *overwrite_mmap = evlist->overwrite_mmap;
> > -     struct perf_cpu_map *cpus = evlist->core.cpus;
> > +     struct perf_cpu_map *cpus = evlist->core.user_cpus;
> >
> >       thread_data->nr_mmaps = bitmap_weight(thread_data->mask->maps.bits,
> >                                             thread_data->mask->maps.nbits);
> > @@ -1881,7 +1881,7 @@ static int record__synthesize(struct record *rec, bool tail)
> >               return err;
> >       }
> >
> > -     err = perf_event__synthesize_cpu_map(&rec->tool, rec->evlist->core.cpus,
> > +     err = perf_event__synthesize_cpu_map(&rec->tool, rec->evlist->core.user_cpus,
> >                                            process_synthesized_event, NULL);
> >       if (err < 0) {
> >               pr_err("Couldn't synthesize cpu map.\n");
> > @@ -3675,7 +3675,7 @@ static int record__init_thread_default_masks(struct record *rec, struct perf_cpu
> >  static int record__init_thread_masks(struct record *rec)
> >  {
> >       int ret = 0;
> > -     struct perf_cpu_map *cpus = rec->evlist->core.cpus;
> > +     struct perf_cpu_map *cpus = rec->evlist->core.user_cpus;
> >
> >       if (!record__threads_enabled(rec))
> >               return record__init_thread_default_masks(rec, cpus);
> > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > index 4ee40de698a4..5bee529f7656 100644
> > --- a/tools/perf/builtin-stat.c
> > +++ b/tools/perf/builtin-stat.c
> > @@ -804,7 +804,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> >       if (group)
> >               evlist__set_leader(evsel_list);
> >
> > -     if (!cpu_map__is_dummy(evsel_list->core.cpus)) {
> > +     if (!cpu_map__is_dummy(evsel_list->core.user_cpus)) {
> >               if (affinity__setup(&saved_affinity) < 0)
> >                       return -1;
> >               affinity = &saved_affinity;
> > @@ -1458,7 +1458,7 @@ static int perf_stat_init_aggr_mode(void)
> >       aggr_cpu_id_get_t get_id = aggr_mode__get_aggr(stat_config.aggr_mode);
> >
> >       if (get_id) {
> > -             stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.cpus,
> > +             stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_cpus,
> >                                                        get_id, /*data=*/NULL);
> >               if (!stat_config.aggr_map) {
> >                       pr_err("cannot build %s map", aggr_mode__string[stat_config.aggr_mode]);
> > @@ -1472,7 +1472,7 @@ static int perf_stat_init_aggr_mode(void)
> >        * taking the highest cpu number to be the size of
> >        * the aggregation translate cpumap.
> >        */
> > -     nr = perf_cpu_map__max(evsel_list->core.cpus).cpu;
> > +     nr = perf_cpu_map__max(evsel_list->core.user_cpus).cpu;
> >       stat_config.cpus_aggr_map = cpu_aggr_map__empty_new(nr + 1);
> >       return stat_config.cpus_aggr_map ? 0 : -ENOMEM;
> >  }
> > @@ -1627,7 +1627,7 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
> >       if (!get_id)
> >               return 0;
> >
> > -     stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.cpus, get_id, env);
> > +     stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_cpus, get_id, env);
> >       if (!stat_config.aggr_map) {
> >               pr_err("cannot build %s map", aggr_mode__string[stat_config.aggr_mode]);
> >               return -1;
> > diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
> > index 9b08e44a31d9..4cfa112292d0 100644
> > --- a/tools/perf/builtin-top.c
> > +++ b/tools/perf/builtin-top.c
> > @@ -1021,7 +1021,7 @@ static int perf_top__start_counters(struct perf_top *top)
> >
> >       evlist__for_each_entry(evlist, counter) {
> >  try_again:
> > -             if (evsel__open(counter, top->evlist->core.cpus,
> > +             if (evsel__open(counter, top->evlist->core.user_cpus,
> >                                    top->evlist->core.threads) < 0) {
> >
> >                       /*
> > diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
> > index 9e48652662d4..b138dd6bdefc 100644
> > --- a/tools/perf/util/auxtrace.c
> > +++ b/tools/perf/util/auxtrace.c
> > @@ -174,7 +174,7 @@ void auxtrace_mmap_params__set_idx(struct auxtrace_mmap_params *mp,
> >       mp->idx = idx;
> >
> >       if (per_cpu) {
> > -             mp->cpu = perf_cpu_map__cpu(evlist->core.cpus, idx);
> > +             mp->cpu = perf_cpu_map__cpu(evlist->core.user_cpus, idx);
> >               if (evlist->core.threads)
> >                       mp->tid = perf_thread_map__pid(evlist->core.threads, 0);
> >               else
> > diff --git a/tools/perf/util/bpf_ftrace.c b/tools/perf/util/bpf_ftrace.c
> > index 4f4d3aaff37c..69481b28b885 100644
> > --- a/tools/perf/util/bpf_ftrace.c
> > +++ b/tools/perf/util/bpf_ftrace.c
> > @@ -38,7 +38,7 @@ int perf_ftrace__latency_prepare_bpf(struct perf_ftrace *ftrace)
> >
> >       /* don't need to set cpu filter for system-wide mode */
> >       if (ftrace->target.cpu_list) {
> > -             ncpus = perf_cpu_map__nr(ftrace->evlist->core.cpus);
> > +             ncpus = perf_cpu_map__nr(ftrace->evlist->core.user_cpus);
> >               bpf_map__set_max_entries(skel->maps.cpu_filter, ncpus);
> >       }
> >
> > @@ -63,7 +63,7 @@ int perf_ftrace__latency_prepare_bpf(struct perf_ftrace *ftrace)
> >               fd = bpf_map__fd(skel->maps.cpu_filter);
> >
> >               for (i = 0; i < ncpus; i++) {
> > -                     cpu = perf_cpu_map__cpu(ftrace->evlist->core.cpus, i).cpu;
> > +                     cpu = perf_cpu_map__cpu(ftrace->evlist->core.user_cpus, i).cpu;
> >                       bpf_map_update_elem(fd, &cpu, &val, BPF_ANY);
> >               }
> >       }
> > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> > index 9bb79e049957..d335fb713f5e 100644
> > --- a/tools/perf/util/evlist.c
> > +++ b/tools/perf/util/evlist.c
> > @@ -440,7 +440,7 @@ static void __evlist__disable(struct evlist *evlist, char *evsel_name)
> >       bool has_imm = false;
> >
> >       // See explanation in evlist__close()
> > -     if (!cpu_map__is_dummy(evlist->core.cpus)) {
> > +     if (!cpu_map__is_dummy(evlist->core.user_cpus)) {
> >               if (affinity__setup(&saved_affinity) < 0)
> >                       return;
> >               affinity = &saved_affinity;
> > @@ -500,7 +500,7 @@ static void __evlist__enable(struct evlist *evlist, char *evsel_name)
> >       struct affinity saved_affinity, *affinity = NULL;
> >
> >       // See explanation in evlist__close()
> > -     if (!cpu_map__is_dummy(evlist->core.cpus)) {
> > +     if (!cpu_map__is_dummy(evlist->core.user_cpus)) {
> >               if (affinity__setup(&saved_affinity) < 0)
> >                       return;
> >               affinity = &saved_affinity;
> > @@ -565,7 +565,7 @@ static int evlist__enable_event_cpu(struct evlist *evlist, struct evsel *evsel,
> >  static int evlist__enable_event_thread(struct evlist *evlist, struct evsel *evsel, int thread)
> >  {
> >       int cpu;
> > -     int nr_cpus = perf_cpu_map__nr(evlist->core.cpus);
> > +     int nr_cpus = perf_cpu_map__nr(evlist->core.user_cpus);
> >
> >       if (!evsel->core.fd)
> >               return -EINVAL;
> > @@ -580,7 +580,7 @@ static int evlist__enable_event_thread(struct evlist *evlist, struct evsel *evse
> >
> >  int evlist__enable_event_idx(struct evlist *evlist, struct evsel *evsel, int idx)
> >  {
> > -     bool per_cpu_mmaps = !perf_cpu_map__empty(evlist->core.cpus);
> > +     bool per_cpu_mmaps = !perf_cpu_map__empty(evlist->core.user_cpus);
> >
> >       if (per_cpu_mmaps)
> >               return evlist__enable_event_cpu(evlist, evsel, idx);
> > @@ -1301,10 +1301,10 @@ void evlist__close(struct evlist *evlist)
> >       struct affinity affinity;
> >
> >       /*
> > -      * With perf record core.cpus is usually NULL.
> > +      * With perf record core.user_cpus is usually NULL.
> >        * Use the old method to handle this for now.
> >        */
> > -     if (!evlist->core.cpus || cpu_map__is_dummy(evlist->core.cpus)) {
> > +     if (!evlist->core.user_cpus || cpu_map__is_dummy(evlist->core.user_cpus)) {
> >               evlist__for_each_entry_reverse(evlist, evsel)
> >                       evsel__close(evsel);
> >               return;
> > @@ -1367,7 +1367,7 @@ int evlist__open(struct evlist *evlist)
> >        * Default: one fd per CPU, all threads, aka systemwide
> >        * as sys_perf_event_open(cpu = -1, thread = -1) is EINVAL
> >        */
> > -     if (evlist->core.threads == NULL && evlist->core.cpus == NULL) {
> > +     if (evlist->core.threads == NULL && evlist->core.user_cpus == NULL) {
> >               err = evlist__create_syswide_maps(evlist);
> >               if (err < 0)
> >                       goto out_err;
> > diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
> > index 007a64681416..ff326eba084f 100644
> > --- a/tools/perf/util/record.c
> > +++ b/tools/perf/util/record.c
> > @@ -106,7 +106,7 @@ void evlist__config(struct evlist *evlist, struct record_opts *opts, struct call
> >       if (opts->group)
> >               evlist__set_leader(evlist);
> >
> > -     if (perf_cpu_map__cpu(evlist->core.cpus, 0).cpu < 0)
> > +     if (perf_cpu_map__cpu(evlist->core.user_cpus, 0).cpu < 0)
> >               opts->no_inherit = true;
> >
> >       use_comm_exec = perf_can_comm_exec();
> > @@ -244,7 +244,7 @@ bool evlist__can_select_event(struct evlist *evlist, const char *str)
> >
> >       evsel = evlist__last(temp_evlist);
> >
> > -     if (!evlist || perf_cpu_map__empty(evlist->core.cpus)) {
> > +     if (!evlist || perf_cpu_map__empty(evlist->core.user_cpus)) {
> >               struct perf_cpu_map *cpus = perf_cpu_map__new(NULL);
> >
> >               if (cpus)
> > @@ -252,7 +252,7 @@ bool evlist__can_select_event(struct evlist *evlist, const char *str)
> >
> >               perf_cpu_map__put(cpus);
> >       } else {
> > -             cpu = perf_cpu_map__cpu(evlist->core.cpus, 0);
> > +             cpu = perf_cpu_map__cpu(evlist->core.user_cpus, 0);
> >       }
> >
> >       while (1) {
> > diff --git a/tools/perf/util/sideband_evlist.c b/tools/perf/util/sideband_evlist.c
> > index 748371ac22be..9f58c68a25f7 100644
> > --- a/tools/perf/util/sideband_evlist.c
> > +++ b/tools/perf/util/sideband_evlist.c
> > @@ -114,7 +114,7 @@ int evlist__start_sb_thread(struct evlist *evlist, struct target *target)
> >       }
> >
> >       evlist__for_each_entry(evlist, counter) {
> > -             if (evsel__open(counter, evlist->core.cpus, evlist->core.threads) < 0)
> > +             if (evsel__open(counter, evlist->core.user_cpus, evlist->core.threads) < 0)
> >                       goto out_delete_evlist;
> >       }
> >
> > diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> > index 9cbe351b141f..634dd9ea2b35 100644
> > --- a/tools/perf/util/stat-display.c
> > +++ b/tools/perf/util/stat-display.c
> > @@ -929,7 +929,7 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
> >       int all_idx;
> >       struct perf_cpu cpu;
> >
> > -     perf_cpu_map__for_each_cpu(cpu, all_idx, evlist->core.cpus) {
> > +     perf_cpu_map__for_each_cpu(cpu, all_idx, evlist->core.user_cpus) {
> >               struct evsel *counter;
> >               bool first = true;
> >
> > diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
> > index b654de0841f8..591afc6c607b 100644
> > --- a/tools/perf/util/synthetic-events.c
> > +++ b/tools/perf/util/synthetic-events.c
> > @@ -2127,7 +2127,7 @@ int perf_event__synthesize_stat_events(struct perf_stat_config *config, struct p
> >               return err;
> >       }
> >
> > -     err = perf_event__synthesize_cpu_map(tool, evlist->core.cpus, process, NULL);
> > +     err = perf_event__synthesize_cpu_map(tool, evlist->core.user_cpus, process, NULL);
> >       if (err < 0) {
> >               pr_err("Couldn't synthesize thread map.\n");
> >               return err;
> > diff --git a/tools/perf/util/top.c b/tools/perf/util/top.c
> > index c1ebfc5d2e0c..e98422f3ff17 100644
> > --- a/tools/perf/util/top.c
> > +++ b/tools/perf/util/top.c
> > @@ -95,15 +95,16 @@ size_t perf_top__header_snprintf(struct perf_top *top, char *bf, size_t size)
> >
> >       if (target->cpu_list)
> >               ret += SNPRINTF(bf + ret, size - ret, ", CPU%s: %s)",
> > -                             perf_cpu_map__nr(top->evlist->core.cpus) > 1 ? "s" : "",
> > +                             perf_cpu_map__nr(top->evlist->core.user_cpus) > 1 ? "s" : "",
> >                               target->cpu_list);
> >       else {
> >               if (target->tid)
> >                       ret += SNPRINTF(bf + ret, size - ret, ")");
> >               else
> >                       ret += SNPRINTF(bf + ret, size - ret, ", %d CPU%s)",
> > -                                     perf_cpu_map__nr(top->evlist->core.cpus),
> > -                                     perf_cpu_map__nr(top->evlist->core.cpus) > 1 ? "s" : "");
> > +                                     perf_cpu_map__nr(top->evlist->core.user_cpus),
> > +                                     perf_cpu_map__nr(top->evlist->core.user_cpus) > 1
> > +                                     ? "s" : "");
> >       }
> >
> >       perf_top__reset_sample_counters(top);
> > --
> > 2.35.1.1021.g381101b075-goog
>
> --
>
> - Arnaldo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/5] perf cpumap: Add intersect function.
  2022-03-28 20:54       ` Ian Rogers
@ 2022-03-28 20:59         ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 34+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-03-28 20:59 UTC (permalink / raw)
  To: Ian Rogers, Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Leo Yan, John Garry, Will Deacon, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Kajol Jain, James Clark,
	German Gomez, Adrian Hunter, Riccardo Mancini, Andi Kleen,
	Alexey Bayduraev, Alexander Antonov, linux-perf-users,
	linux-kernel, coresight, linux-arm-kernel, netdev, bpf,
	Stephane Eranian



On March 28, 2022 5:54:06 PM GMT-03:00, Ian Rogers <irogers@google.com> wrote:
>On Mon, Mar 28, 2022 at 1:28 PM Arnaldo Carvalho de Melo
><acme@kernel.org> wrote:
>>
>> Em Sun, Mar 27, 2022 at 11:24:12PM -0700, Ian Rogers escreveu:
>> > The merge function gives the union of two cpu maps. Add an intersect
>> > function which will be used in the next change.
>> >
>> > Signed-off-by: Ian Rogers <irogers@google.com>
>> > ---
>> >  tools/lib/perf/cpumap.c              | 38 ++++++++++++++++++++++++++++
>> >  tools/lib/perf/include/perf/cpumap.h |  2 ++
>> >  2 files changed, 40 insertions(+)
>> >
>> > diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
>> > index 953bc50b0e41..56b4d213039f 100644
>> > --- a/tools/lib/perf/cpumap.c
>> > +++ b/tools/lib/perf/cpumap.c
>> > @@ -393,3 +393,41 @@ struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
>> >       perf_cpu_map__put(orig);
>> >       return merged;
>> >  }
>> > +
>> > +struct perf_cpu_map *perf_cpu_map__intersect(struct perf_cpu_map *orig,
>> > +                                          struct perf_cpu_map *other)
>> > +{
>> > +     struct perf_cpu *tmp_cpus;
>> > +     int tmp_len;
>> > +     int i, j, k;
>> > +     struct perf_cpu_map *merged = NULL;
>> > +
>> > +     if (perf_cpu_map__is_subset(other, orig))
>> > +             return orig;
>> > +     if (perf_cpu_map__is_subset(orig, other)) {
>> > +             perf_cpu_map__put(orig);
>>
>> Why this put(orig)?
>
>As with merge, if orig isn't returned then it is put.

For merge I can see it dropping a reference, i.e. get b and merge it into a; after that, b has been "consumed"

But for intersect?


>
>> > +             return perf_cpu_map__get(other);
>>
>> And why the get here and not on the first if?
>
>The first argument orig is either put or returned while the second may
>be returned only if the reference count is incremented. We could
>change the API for merge and intersect to put both arguments, or to
>not put either argument.
>
>Thanks,
>Ian
>
>> > +     }
>> > +
>> > +     tmp_len = max(orig->nr, other->nr);
>> > +     tmp_cpus = malloc(tmp_len * sizeof(struct perf_cpu));
>> > +     if (!tmp_cpus)
>> > +             return NULL;
>> > +
>> > +     i = j = k = 0;
>> > +     while (i < orig->nr && j < other->nr) {
>> > +             if (orig->map[i].cpu < other->map[j].cpu)
>> > +                     i++;
>> > +             else if (orig->map[i].cpu > other->map[j].cpu)
>> > +                     j++;
>> > +             else {
>> > +                     j++;
>> > +                     tmp_cpus[k++] = orig->map[i++];
>> > +             }
>> > +     }
>> > +     if (k)
>> > +             merged = cpu_map__trim_new(k, tmp_cpus);
>> > +     free(tmp_cpus);
>> > +     perf_cpu_map__put(orig);
>> > +     return merged;
>> > +}
>> > diff --git a/tools/lib/perf/include/perf/cpumap.h b/tools/lib/perf/include/perf/cpumap.h
>> > index 4a2edbdb5e2b..a2a7216c0b78 100644
>> > --- a/tools/lib/perf/include/perf/cpumap.h
>> > +++ b/tools/lib/perf/include/perf/cpumap.h
>> > @@ -19,6 +19,8 @@ LIBPERF_API struct perf_cpu_map *perf_cpu_map__read(FILE *file);
>> >  LIBPERF_API struct perf_cpu_map *perf_cpu_map__get(struct perf_cpu_map *map);
>> >  LIBPERF_API struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
>> >                                                    struct perf_cpu_map *other);
>> > +LIBPERF_API struct perf_cpu_map *perf_cpu_map__intersect(struct perf_cpu_map *orig,
>> > +                                                      struct perf_cpu_map *other);
>> >  LIBPERF_API void perf_cpu_map__put(struct perf_cpu_map *map);
>> >  LIBPERF_API struct perf_cpu perf_cpu_map__cpu(const struct perf_cpu_map *cpus, int idx);
>> >  LIBPERF_API int perf_cpu_map__nr(const struct perf_cpu_map *cpus);
>> > --
>> > 2.35.1.1021.g381101b075-goog
>>
>> --
>>
>> - Arnaldo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/5] perf cpumap: Add intersect function.
  2022-03-28 20:59         ` Arnaldo Carvalho de Melo
@ 2022-03-28 21:25           ` Ian Rogers
  -1 siblings, 0 replies; 34+ messages in thread
From: Ian Rogers @ 2022-03-28 21:25 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Adrian Hunter, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel, coresight,
	linux-arm-kernel, netdev, bpf, Stephane Eranian

On Mon, Mar 28, 2022 at 2:00 PM Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
>
>
>
> On March 28, 2022 5:54:06 PM GMT-03:00, Ian Rogers <irogers@google.com> wrote:
> >On Mon, Mar 28, 2022 at 1:28 PM Arnaldo Carvalho de Melo
> ><acme@kernel.org> wrote:
> >>
> >> Em Sun, Mar 27, 2022 at 11:24:12PM -0700, Ian Rogers escreveu:
> >> > The merge function gives the union of two cpu maps. Add an intersect
> >> > function which will be used in the next change.
> >> >
> >> > Signed-off-by: Ian Rogers <irogers@google.com>
> >> > ---
> >> >  tools/lib/perf/cpumap.c              | 38 ++++++++++++++++++++++++++++
> >> >  tools/lib/perf/include/perf/cpumap.h |  2 ++
> >> >  2 files changed, 40 insertions(+)
> >> >
> >> > diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
> >> > index 953bc50b0e41..56b4d213039f 100644
> >> > --- a/tools/lib/perf/cpumap.c
> >> > +++ b/tools/lib/perf/cpumap.c
> >> > @@ -393,3 +393,41 @@ struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
> >> >       perf_cpu_map__put(orig);
> >> >       return merged;
> >> >  }
> >> > +
> >> > +struct perf_cpu_map *perf_cpu_map__intersect(struct perf_cpu_map *orig,
> >> > +                                          struct perf_cpu_map *other)
> >> > +{
> >> > +     struct perf_cpu *tmp_cpus;
> >> > +     int tmp_len;
> >> > +     int i, j, k;
> >> > +     struct perf_cpu_map *merged = NULL;
> >> > +
> >> > +     if (perf_cpu_map__is_subset(other, orig))
> >> > +             return orig;
> >> > +     if (perf_cpu_map__is_subset(orig, other)) {
> >> > +             perf_cpu_map__put(orig);
> >>
> >> Why this put(orig)?
> >
> >As with merge, if orig isn't returned then it is put.
>
> For merge I can see it dropping a reference, i.e. get b and merge it into a, after that b was "consumed"
>
> But for intersect?

The current use case is intersecting all online CPUs with the merge of
all CPU maps from the evsels. So we can generally just reuse all_cpus,
or hit the common case where both maps contain every CPU. I think the
pattern makes code like:

evlist->cpus = perf_cpu_map__intersect(evlist->cpus, other);

not quite as messy, as without the put you need:

tmp = perf_cpu_map__intersect(evlist->cpus, other);
perf_cpu_map__put(evlist->cpus);
evlist->cpus = tmp;

I'm somewhat agnostic on what the API should be, but it'd be nice if
merge and intersect behaved in a similar way.
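To make that convention concrete, here is a minimal, self-contained
sketch (toy types and names, not the real libperf API, and without the
subset fast paths) of an intersect that consumes its first argument's
reference and only borrows the second, so `a = intersect(a, b);` does
not leak:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for perf_cpu_map: a sorted array of CPU ids plus a
 * reference count.  All names here are hypothetical. */
struct toy_map {
	int refcnt;
	int nr;
	int cpu[16];
};

static struct toy_map *toy_map__new(const int *cpus, int nr)
{
	struct toy_map *m = calloc(1, sizeof(*m));

	m->refcnt = 1;
	m->nr = nr;
	memcpy(m->cpu, cpus, nr * sizeof(int));
	return m;
}

static struct toy_map *toy_map__get(struct toy_map *m)
{
	if (m)
		m->refcnt++;
	return m;
}

static void toy_map__put(struct toy_map *m)
{
	if (m && --m->refcnt == 0)
		free(m);
}

/* Consumes orig's reference (either returns it or puts it); other is
 * only borrowed, so the caller's reference to it stays valid. */
static struct toy_map *toy_map__intersect(struct toy_map *orig,
					  struct toy_map *other)
{
	int i = 0, j = 0, k = 0;
	int tmp[16];

	/* Walk both sorted arrays, keeping only common entries. */
	while (i < orig->nr && j < other->nr) {
		if (orig->cpu[i] < other->cpu[j])
			i++;
		else if (orig->cpu[i] > other->cpu[j])
			j++;
		else {
			j++;
			tmp[k++] = orig->cpu[i++];
		}
	}
	toy_map__put(orig);
	return k ? toy_map__new(tmp, k) : NULL;
}
```

With this shape, `a = toy_map__intersect(a, b);` replaces the caller's
reference in one step, matching the merge-style usage above.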

Thanks,
Ian

> >
> >> > +             return perf_cpu_map__get(other);
> >>
> >> And why the get here and not on the first if?
> >
> >The first argument orig is either put or returned while the second may
> >be returned only if the reference count is incremented. We could
> >change the API for merge and intersect to put both arguments, or to
> >not put either argument.
> >
> >Thanks,
> >Ian
> >
> >> > +     }
> >> > +
> >> > +     tmp_len = max(orig->nr, other->nr);
> >> > +     tmp_cpus = malloc(tmp_len * sizeof(struct perf_cpu));
> >> > +     if (!tmp_cpus)
> >> > +             return NULL;
> >> > +
> >> > +     i = j = k = 0;
> >> > +     while (i < orig->nr && j < other->nr) {
> >> > +             if (orig->map[i].cpu < other->map[j].cpu)
> >> > +                     i++;
> >> > +             else if (orig->map[i].cpu > other->map[j].cpu)
> >> > +                     j++;
> >> > +             else {
> >> > +                     j++;
> >> > +                     tmp_cpus[k++] = orig->map[i++];
> >> > +             }
> >> > +     }
> >> > +     if (k)
> >> > +             merged = cpu_map__trim_new(k, tmp_cpus);
> >> > +     free(tmp_cpus);
> >> > +     perf_cpu_map__put(orig);
> >> > +     return merged;
> >> > +}
> >> > diff --git a/tools/lib/perf/include/perf/cpumap.h b/tools/lib/perf/include/perf/cpumap.h
> >> > index 4a2edbdb5e2b..a2a7216c0b78 100644
> >> > --- a/tools/lib/perf/include/perf/cpumap.h
> >> > +++ b/tools/lib/perf/include/perf/cpumap.h
> >> > @@ -19,6 +19,8 @@ LIBPERF_API struct perf_cpu_map *perf_cpu_map__read(FILE *file);
> >> >  LIBPERF_API struct perf_cpu_map *perf_cpu_map__get(struct perf_cpu_map *map);
> >> >  LIBPERF_API struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
> >> >                                                    struct perf_cpu_map *other);
> >> > +LIBPERF_API struct perf_cpu_map *perf_cpu_map__intersect(struct perf_cpu_map *orig,
> >> > +                                                      struct perf_cpu_map *other);
> >> >  LIBPERF_API void perf_cpu_map__put(struct perf_cpu_map *map);
> >> >  LIBPERF_API struct perf_cpu perf_cpu_map__cpu(const struct perf_cpu_map *cpus, int idx);
> >> >  LIBPERF_API int perf_cpu_map__nr(const struct perf_cpu_map *cpus);
> >> > --
> >> > 2.35.1.1021.g381101b075-goog
> >>
> >> --
> >>
> >> - Arnaldo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/5] perf cpumap: Add intersect function.
@ 2022-03-28 21:25           ` Ian Rogers
  0 siblings, 0 replies; 34+ messages in thread
From: Ian Rogers @ 2022-03-28 21:25 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach, Leo Yan,
	John Garry, Will Deacon, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Kajol Jain, James Clark, German Gomez,
	Adrian Hunter, Riccardo Mancini, Andi Kleen, Alexey Bayduraev,
	Alexander Antonov, linux-perf-users, linux-kernel, coresight,
	linux-arm-kernel, netdev, bpf, Stephane Eranian

On Mon, Mar 28, 2022 at 2:00 PM Arnaldo Carvalho de Melo
<arnaldo.melo@gmail.com> wrote:
>
>
>
> On March 28, 2022 5:54:06 PM GMT-03:00, Ian Rogers <irogers@google.com> wrote:
> >On Mon, Mar 28, 2022 at 1:28 PM Arnaldo Carvalho de Melo
> ><acme@kernel.org> wrote:
> >>
> >> Em Sun, Mar 27, 2022 at 11:24:12PM -0700, Ian Rogers escreveu:
> >> > The merge function gives the union of two cpu maps. Add an intersect
> >> > function which will be used in the next change.
> >> >
> >> > Signed-off-by: Ian Rogers <irogers@google.com>
> >> > ---
> >> >  tools/lib/perf/cpumap.c              | 38 ++++++++++++++++++++++++++++
> >> >  tools/lib/perf/include/perf/cpumap.h |  2 ++
> >> >  2 files changed, 40 insertions(+)
> >> >
> >> > diff --git a/tools/lib/perf/cpumap.c b/tools/lib/perf/cpumap.c
> >> > index 953bc50b0e41..56b4d213039f 100644
> >> > --- a/tools/lib/perf/cpumap.c
> >> > +++ b/tools/lib/perf/cpumap.c
> >> > @@ -393,3 +393,41 @@ struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
> >> >       perf_cpu_map__put(orig);
> >> >       return merged;
> >> >  }
> >> > +
> >> > +struct perf_cpu_map *perf_cpu_map__intersect(struct perf_cpu_map *orig,
> >> > +                                          struct perf_cpu_map *other)
> >> > +{
> >> > +     struct perf_cpu *tmp_cpus;
> >> > +     int tmp_len;
> >> > +     int i, j, k;
> >> > +     struct perf_cpu_map *merged = NULL;
> >> > +
> >> > +     if (perf_cpu_map__is_subset(other, orig))
> >> > +             return orig;
> >> > +     if (perf_cpu_map__is_subset(orig, other)) {
> >> > +             perf_cpu_map__put(orig);
> >>
> >> Why this put(orig)?
> >
> >As with merge, if orig isn't returned then it is put.
>
> For merge I can see it dropping a reference, i.e. get b and merge it into a, after that b was "consumed"
>
> But for intersect?

The current use case is intersecting all online CPUs with the
merge of all CPU maps from the evsels. So we can generally just reuse
all_cpus, as in the common case both maps contain every CPU. I think
the pattern makes code like:

evlist->cpus = perf_cpu_map__intersect(evlist->cpus, other);

not quite as messy, as without the put you need:

tmp = perf_cpu_map__intersect(evlist->cpus, other);
perf_cpu_map__put(evlist->cpus);
evlist->cpus = tmp;

I'm somewhat agnostic on what the API should be, but it'd be nice if
merge and intersect behaved in a similar way.

Thanks,
Ian

> >
> >> > +             return perf_cpu_map__get(other);
> >>
> >> And why the get here and not on the first if?
> >
> >The first argument orig is either put or returned while the second may
> >be returned only if the reference count is incremented. We could
> >change the API for merge and intersect to put both arguments, or to
> >not put either argument.
> >
> >Thanks,
> >Ian
> >
> >> > +     }
> >> > +
> >> > +     tmp_len = max(orig->nr, other->nr);
> >> > +     tmp_cpus = malloc(tmp_len * sizeof(struct perf_cpu));
> >> > +     if (!tmp_cpus)
> >> > +             return NULL;
> >> > +
> >> > +     i = j = k = 0;
> >> > +     while (i < orig->nr && j < other->nr) {
> >> > +             if (orig->map[i].cpu < other->map[j].cpu)
> >> > +                     i++;
> >> > +             else if (orig->map[i].cpu > other->map[j].cpu)
> >> > +                     j++;
> >> > +             else {
> >> > +                     j++;
> >> > +                     tmp_cpus[k++] = orig->map[i++];
> >> > +             }
> >> > +     }
> >> > +     if (k)
> >> > +             merged = cpu_map__trim_new(k, tmp_cpus);
> >> > +     free(tmp_cpus);
> >> > +     perf_cpu_map__put(orig);
> >> > +     return merged;
> >> > +}
> >> > diff --git a/tools/lib/perf/include/perf/cpumap.h b/tools/lib/perf/include/perf/cpumap.h
> >> > index 4a2edbdb5e2b..a2a7216c0b78 100644
> >> > --- a/tools/lib/perf/include/perf/cpumap.h
> >> > +++ b/tools/lib/perf/include/perf/cpumap.h
> >> > @@ -19,6 +19,8 @@ LIBPERF_API struct perf_cpu_map *perf_cpu_map__read(FILE *file);
> >> >  LIBPERF_API struct perf_cpu_map *perf_cpu_map__get(struct perf_cpu_map *map);
> >> >  LIBPERF_API struct perf_cpu_map *perf_cpu_map__merge(struct perf_cpu_map *orig,
> >> >                                                    struct perf_cpu_map *other);
> >> > +LIBPERF_API struct perf_cpu_map *perf_cpu_map__intersect(struct perf_cpu_map *orig,
> >> > +                                                      struct perf_cpu_map *other);
> >> >  LIBPERF_API void perf_cpu_map__put(struct perf_cpu_map *map);
> >> >  LIBPERF_API struct perf_cpu perf_cpu_map__cpu(const struct perf_cpu_map *cpus, int idx);
> >> >  LIBPERF_API int perf_cpu_map__nr(const struct perf_cpu_map *cpus);
> >> > --
> >> > 2.35.1.1021.g381101b075-goog
> >>
> >> --
> >>
> >> - Arnaldo


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2022-03-28 21:27 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-28  6:24 [PATCH 0/5] Make evlist CPUs more accurate Ian Rogers
2022-03-28  6:24 ` [PATCH 1/5] perf evlist: Rename cpus to user_cpus Ian Rogers
2022-03-28 20:18   ` Arnaldo Carvalho de Melo
2022-03-28 20:58     ` Ian Rogers
2022-03-28  6:24 ` [PATCH 2/5] perf cpumap: More cpu map reuse by merge Ian Rogers
2022-03-28 20:26   ` Arnaldo Carvalho de Melo
2022-03-28 20:50     ` Ian Rogers
2022-03-28 20:56       ` Arnaldo Carvalho de Melo
2022-03-28  6:24 ` [PATCH 3/5] perf cpumap: Add intersect function Ian Rogers
2022-03-28 20:28   ` Arnaldo Carvalho de Melo
2022-03-28 20:54     ` Ian Rogers
2022-03-28 20:59       ` Arnaldo Carvalho de Melo
2022-03-28 21:25         ` Ian Rogers
2022-03-28  6:24 ` [PATCH 4/5] perf stat: Avoid segv if core.user_cpus isn't set Ian Rogers
2022-03-28 20:32   ` Arnaldo Carvalho de Melo
2022-03-28 20:46     ` Ian Rogers
2022-03-28  6:24 ` [PATCH 5/5] perf evlist: Respect all_cpus when setting user_cpus Ian Rogers
