All of lore.kernel.org
 help / color / mirror / Atom feed
From: Adrian Hunter <adrian.hunter@intel.com>
To: Ian Rogers <irogers@google.com>
Cc: Stephane Eranian <eranian@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
	Mathieu Poirier <mathieu.poirier@linaro.org>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Mike Leach <mike.leach@linaro.org>, Leo Yan <leo.yan@linaro.org>,
	John Garry <john.garry@huawei.com>, Will Deacon <will@kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>, Kajol Jain <kjain@linux.ibm.com>,
	James Clark <james.clark@arm.com>,
	German Gomez <german.gomez@arm.com>,
	Riccardo Mancini <rickyman7@gmail.com>,
	Andi Kleen <ak@linux.intel.com>,
	Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>,
	Alexander Antonov <alexander.antonov@linux.intel.com>,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	coresight@lists.linaro.org, linux-arm-kernel@lists.infradead.org,
	netdev@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH v3 4/5] perf evlist: Respect all_cpus when setting user_requested_cpus
Date: Thu, 28 Apr 2022 23:15:42 +0300	[thread overview]
Message-ID: <c9205f19-52bf-43fe-b1ab-b599d5e2cc7a@intel.com> (raw)
In-Reply-To: <20220408035616.1356953-5-irogers@google.com>

On 8/04/22 06:56, Ian Rogers wrote:
> If all_cpus is calculated it represents the merge/union of all
> evsel cpu maps. By default user_requested_cpus is computed to be
> the online CPUs. For uncore events, it is often the case currently
> that all_cpus is a subset of user_requested_cpus. Metrics printed
> without aggregation and with metric-only, in print_no_aggr_metric,
> iterate over user_requested_cpus assuming every CPU has a metric to
> print. For each CPU the prefix is printed, but then if the
> evsel's cpus doesn't contain anything you get an empty line like
> the following on a 2 socket 36 core SkylakeX:
> 
> ```
> $ perf stat -A -M DRAM_BW_Use -a --metric-only -I 1000
>      1.000453137 CPU0                       0.00
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137 CPU18                      0.00
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      2.003717143 CPU0                       0.00
> ...
> ```
> 
> While it is possible to be lazier in printing the prefix and
> trailing newline, having user_requested_cpus not be a subset of
> all_cpus is preferential so that wasted work isn't done elsewhere
> user_requested_cpus is used. The change modifies user_requested_cpus
> to be the intersection of user specified CPUs, or default all online
> CPUs, with the CPUs computed through the merge of all evsel cpu maps.
> 
> New behavior:
> ```
> $ perf stat -A -M DRAM_BW_Use -a --metric-only -I 1000
>      1.001086325 CPU0                       0.00
>      1.001086325 CPU18                      0.00
>      2.003671291 CPU0                       0.00
>      2.003671291 CPU18                      0.00
> ...
> ```
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/util/evlist.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 52ea004ba01e..196d57b905a0 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -1036,6 +1036,13 @@ int evlist__create_maps(struct evlist *evlist, struct target *target)
>  	if (!cpus)
>  		goto out_delete_threads;
>  
> +	if (evlist->core.all_cpus) {
> +		struct perf_cpu_map *tmp;
> +
> +		tmp = perf_cpu_map__intersect(cpus, evlist->core.all_cpus);

Isn't an uncore PMU represented as being on CPU0 actually
collecting data that can be due to any CPU.

Or for an uncore PMU represented as being on CPU0-CPU4 on a
4 core 8 hyperthread processor, actually 1 PMU per core.

So I am not sure intersection makes sense.

Also it is not obvious what happens with hybrid CPUs or
per thread recording.

> +		perf_cpu_map__put(cpus);
> +		cpus = tmp;
> +	}
>  	evlist->core.has_user_cpus = !!target->cpu_list && !target->hybrid;
>  
>  	perf_evlist__set_maps(&evlist->core, cpus, threads);


WARNING: multiple messages have this Message-ID (diff)
From: Adrian Hunter <adrian.hunter@intel.com>
To: Ian Rogers <irogers@google.com>
Cc: Stephane Eranian <eranian@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
	Mathieu Poirier <mathieu.poirier@linaro.org>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Mike Leach <mike.leach@linaro.org>, Leo Yan <leo.yan@linaro.org>,
	John Garry <john.garry@huawei.com>, Will Deacon <will@kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>, Kajol Jain <kjain@linux.ibm.com>,
	James Clark <james.clark@arm.com>,
	German Gomez <german.gomez@arm.com>,
	Riccardo Mancini <rickyman7@gmail.com>,
	Andi Kleen <ak@linux.intel.com>,
	Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>,
	Alexander Antonov <alexander.antonov@linux.intel.com>,
	linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
	coresight@lists.linaro.org, linux-arm-kernel@lists.infradead.org,
	netdev@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH v3 4/5] perf evlist: Respect all_cpus when setting user_requested_cpus
Date: Thu, 28 Apr 2022 23:15:42 +0300	[thread overview]
Message-ID: <c9205f19-52bf-43fe-b1ab-b599d5e2cc7a@intel.com> (raw)
In-Reply-To: <20220408035616.1356953-5-irogers@google.com>

On 8/04/22 06:56, Ian Rogers wrote:
> If all_cpus is calculated it represents the merge/union of all
> evsel cpu maps. By default user_requested_cpus is computed to be
> the online CPUs. For uncore events, it is often the case currently
> that all_cpus is a subset of user_requested_cpus. Metrics printed
> without aggregation and with metric-only, in print_no_aggr_metric,
> iterate over user_requested_cpus assuming every CPU has a metric to
> print. For each CPU the prefix is printed, but then if the
> evsel's cpus doesn't contain anything you get an empty line like
> the following on a 2 socket 36 core SkylakeX:
> 
> ```
> $ perf stat -A -M DRAM_BW_Use -a --metric-only -I 1000
>      1.000453137 CPU0                       0.00
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137 CPU18                      0.00
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      1.000453137
>      2.003717143 CPU0                       0.00
> ...
> ```
> 
> While it is possible to be lazier in printing the prefix and
> trailing newline, having user_requested_cpus not be a subset of
> all_cpus is preferential so that wasted work isn't done elsewhere
> user_requested_cpus is used. The change modifies user_requested_cpus
> to be the intersection of user specified CPUs, or default all online
> CPUs, with the CPUs computed through the merge of all evsel cpu maps.
> 
> New behavior:
> ```
> $ perf stat -A -M DRAM_BW_Use -a --metric-only -I 1000
>      1.001086325 CPU0                       0.00
>      1.001086325 CPU18                      0.00
>      2.003671291 CPU0                       0.00
>      2.003671291 CPU18                      0.00
> ...
> ```
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>  tools/perf/util/evlist.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 52ea004ba01e..196d57b905a0 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -1036,6 +1036,13 @@ int evlist__create_maps(struct evlist *evlist, struct target *target)
>  	if (!cpus)
>  		goto out_delete_threads;
>  
> +	if (evlist->core.all_cpus) {
> +		struct perf_cpu_map *tmp;
> +
> +		tmp = perf_cpu_map__intersect(cpus, evlist->core.all_cpus);

Isn't an uncore PMU represented as being on CPU0 actually
collecting data that can be due to any CPU.

Or for an uncore PMU represented as being on CPU0-CPU4 on a
4 core 8 hyperthread processor, actually 1 PMU per core.

So I am not sure intersection makes sense.

Also it is not obvious what happens with hybrid CPUs or
per thread recording.

> +		perf_cpu_map__put(cpus);
> +		cpus = tmp;
> +	}
>  	evlist->core.has_user_cpus = !!target->cpu_list && !target->hybrid;
>  
>  	perf_evlist__set_maps(&evlist->core, cpus, threads);


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2022-04-28 20:16 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-08  3:56 [PATCH v3 0/5] Make evlist CPUs more accurate Ian Rogers
2022-04-08  3:56 ` Ian Rogers
2022-04-08  3:56 ` [PATCH v3 1/5] perf cpumap: Don't decrement refcnt on args to merge Ian Rogers
2022-04-08  3:56   ` Ian Rogers
2022-04-08  3:56 ` [PATCH v3 2/5] perf tests: Additional cpumap merge tests Ian Rogers
2022-04-08  3:56   ` Ian Rogers
2022-04-08  3:56 ` [PATCH v3 3/5] perf cpumap: Add intersect function Ian Rogers
2022-04-08  3:56   ` Ian Rogers
2022-04-08  3:56 ` [PATCH v3 4/5] perf evlist: Respect all_cpus when setting user_requested_cpus Ian Rogers
2022-04-08  3:56   ` Ian Rogers
2022-04-28 20:15   ` Adrian Hunter [this message]
2022-04-28 20:15     ` Adrian Hunter
     [not found]     ` <CAP-5=fVNuQDW+yge897RjaWfE3cfQTD4ufFws6PS2k99Qe05Uw@mail.gmail.com>
2022-04-29 11:34       ` Adrian Hunter
2022-04-29 11:34         ` Adrian Hunter
2022-04-30  1:06         ` Ian Rogers
2022-04-30  1:06           ` Ian Rogers
2022-04-08  3:56 ` [PATCH v3 5/5] perf test: Combine cpu map tests into 1 suite Ian Rogers
2022-04-08  3:56   ` Ian Rogers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c9205f19-52bf-43fe-b1ab-b599d5e2cc7a@intel.com \
    --to=adrian.hunter@intel.com \
    --cc=acme@kernel.org \
    --cc=ak@linux.intel.com \
    --cc=alexander.antonov@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=alexey.v.bayduraev@linux.intel.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=coresight@lists.linaro.org \
    --cc=daniel@iogearbox.net \
    --cc=eranian@google.com \
    --cc=german.gomez@arm.com \
    --cc=irogers@google.com \
    --cc=james.clark@arm.com \
    --cc=john.fastabend@gmail.com \
    --cc=john.garry@huawei.com \
    --cc=jolsa@kernel.org \
    --cc=kafai@fb.com \
    --cc=kjain@linux.ibm.com \
    --cc=kpsingh@kernel.org \
    --cc=leo.yan@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mathieu.poirier@linaro.org \
    --cc=mike.leach@linaro.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rickyman7@gmail.com \
    --cc=songliubraving@fb.com \
    --cc=suzuki.poulose@arm.com \
    --cc=will@kernel.org \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.