* [PATCH v4] perf stat: Show percore counts in per CPU output
@ 2020-02-14 8:04 Jin Yao
2020-02-16 22:54 ` Jiri Olsa
2020-03-19 14:10 ` [tip: perf/core] " tip-bot2 for Jin Yao
0 siblings, 2 replies; 8+ messages in thread
From: Jin Yao @ 2020-02-14 8:04 UTC (permalink / raw)
To: acme, jolsa, peterz, mingo, alexander.shishkin
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
We have supported the event modifier "percore", which sums up the
event counts for all hardware threads in a core and shows the counts
per core.
For example,
# perf stat -e cpu/event=cpu-cycles,percore/ -a -A -- sleep 1
Performance counter stats for 'system wide':
S0-D0-C0 395,072 cpu/event=cpu-cycles,percore/
S0-D0-C1 851,248 cpu/event=cpu-cycles,percore/
S0-D0-C2 954,226 cpu/event=cpu-cycles,percore/
S0-D0-C3 1,233,659 cpu/event=cpu-cycles,percore/
This patch provides a new option "--percore-show-thread". It is used
together with the event modifier "percore" to sum up the event counts
for all hardware threads in a core but show the counts per hardware
thread.
This is essentially a replacement for the any bit (which is gone in
Icelake). Per core counts are useful for some formulas, e.g. CoreIPC.
The original percore version was inconvenient to post process. This
variant matches the output of the any bit.
With this patch, for example,
# perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -- sleep 1
Performance counter stats for 'system wide':
CPU0 2,453,061 cpu/event=cpu-cycles,percore/
CPU1 1,823,921 cpu/event=cpu-cycles,percore/
CPU2 1,383,166 cpu/event=cpu-cycles,percore/
CPU3 1,102,652 cpu/event=cpu-cycles,percore/
CPU4 2,453,061 cpu/event=cpu-cycles,percore/
CPU5 1,823,921 cpu/event=cpu-cycles,percore/
CPU6 1,383,166 cpu/event=cpu-cycles,percore/
CPU7 1,102,652 cpu/event=cpu-cycles,percore/
We can see counts are duplicated in CPU pairs
(CPU0/CPU4, CPU1/CPU5, CPU2/CPU6, CPU3/CPU7).
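The pairing follows from this machine's SMT layout (4 cores, 8 hardware threads, with CPU n and CPU n+4 as siblings). A minimal illustrative sketch of that mapping (an assumption about this example box, not code from perf itself):

```c
/* Illustrative only: on the example machine above, CPU n and CPU n+4
 * are hardware threads of the same physical core, so the percore sum
 * is identical for both. This helper just encodes that assumption. */
int example_core_of(int cpu)
{
	return cpu % 4;	/* CPU0/CPU4 -> core 0, CPU1/CPU5 -> core 1, ... */
}
```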
The interval mode also works. For example,
# perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -I 1000
# time CPU counts unit events
1.000425421 CPU0 925,032 cpu/event=cpu-cycles,percore/
1.000425421 CPU1 430,202 cpu/event=cpu-cycles,percore/
1.000425421 CPU2 436,843 cpu/event=cpu-cycles,percore/
1.000425421 CPU3 1,192,504 cpu/event=cpu-cycles,percore/
1.000425421 CPU4 925,032 cpu/event=cpu-cycles,percore/
1.000425421 CPU5 430,202 cpu/event=cpu-cycles,percore/
1.000425421 CPU6 436,843 cpu/event=cpu-cycles,percore/
1.000425421 CPU7 1,192,504 cpu/event=cpu-cycles,percore/
If we offline CPU5, the result is:
# perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -- sleep 1
Performance counter stats for 'system wide':
CPU0 2,752,148 cpu/event=cpu-cycles,percore/
CPU1 1,009,312 cpu/event=cpu-cycles,percore/
CPU2 2,784,072 cpu/event=cpu-cycles,percore/
CPU3 2,427,922 cpu/event=cpu-cycles,percore/
CPU4 2,752,148 cpu/event=cpu-cycles,percore/
CPU6 2,784,072 cpu/event=cpu-cycles,percore/
CPU7 2,427,922 cpu/event=cpu-cycles,percore/
1.001416041 seconds time elapsed
v4:
---
Ravi Bangoria reported an issue in v3: once we offline a CPU,
the output is not correct. The fix is to use the cpu idx in
print_percore_thread rather than the cpu value.
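The distinction matters because, with a CPU offline, the evsel cpu map covers only online CPUs, so the array index (idx) no longer equals the CPU number (value). A small illustrative sketch of that divergence (the array contents are an assumption matching the CPU5-offline example above, not perf's actual cpu_map structure):

```c
/* Illustrative only: with CPU5 offline, the cpu map holds just the
 * online CPU numbers, so idx and value diverge after the hole. */
static const int example_cpu_map[] = { 0, 1, 2, 3, 4, 6, 7 };

int example_cpu_value(int idx)
{
	return example_cpu_map[idx];	/* idx 5 yields CPU 6, not CPU 5 */
}
```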
v3:
---
1. Fix the interval mode output error
2. Use cpu value (not cpu index) in config->aggr_get_id().
3. Refine the code according to Jiri's comments.
v2:
---
Add the explanation in change log. This is essentially a replacement
for the any bit. No code change.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
tools/perf/Documentation/perf-stat.txt | 9 +++++++
tools/perf/builtin-stat.c | 4 ++++
tools/perf/util/stat-display.c | 33 ++++++++++++++++++++++----
tools/perf/util/stat.h | 1 +
4 files changed, 42 insertions(+), 5 deletions(-)
diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 9431b8066fb4..4d56586b2fb9 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -334,6 +334,15 @@ Configure all used events to run in kernel space.
--all-user::
Configure all used events to run in user space.
+--percore-show-thread::
+The event modifier "percore" has supported to sum up the event counts
+for all hardware threads in a core and show the counts per core.
+
+This option with event modifier "percore" enabled also sums up the event
+counts for all hardware threads in a core but show the sum counts per
+hardware thread. This is essentially a replacement for the any bit and
+convenient for post processing.
+
EXAMPLES
--------
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index a098c2ebf4ea..ec053dc1e35c 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -929,6 +929,10 @@ static struct option stat_options[] = {
OPT_BOOLEAN_FLAG(0, "all-user", &stat_config.all_user,
"Configure all used events to run in user space.",
PARSE_OPT_EXCLUSIVE),
+ OPT_BOOLEAN(0, "percore-show-thread", &stat_config.percore_show_thread,
+ "Use with 'percore' event qualifier to show the event "
+ "counts of one hardware thread by sum up total hardware "
+ "threads of same physical core"),
OPT_END()
};
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index bc31fccc0057..d89cb0da90f8 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -110,7 +110,7 @@ static void aggr_printout(struct perf_stat_config *config,
config->csv_sep);
break;
case AGGR_NONE:
- if (evsel->percore) {
+ if (evsel->percore && !config->percore_show_thread) {
fprintf(config->output, "S%d-D%d-C%*d%s",
cpu_map__id_to_socket(id),
cpu_map__id_to_die(id),
@@ -628,7 +628,7 @@ static void aggr_cb(struct perf_stat_config *config,
static void print_counter_aggrdata(struct perf_stat_config *config,
struct evsel *counter, int s,
char *prefix, bool metric_only,
- bool *first)
+ bool *first, int cpu)
{
struct aggr_data ad;
FILE *output = config->output;
@@ -654,7 +654,7 @@ static void print_counter_aggrdata(struct perf_stat_config *config,
fprintf(output, "%s", prefix);
uval = val * counter->scale;
- printout(config, id, nr, counter, uval, prefix,
+ printout(config, cpu != -1 ? cpu : id, nr, counter, uval, prefix,
run, ena, 1.0, &rt_stat);
if (!metric_only)
fputc('\n', output);
@@ -687,7 +687,7 @@ static void print_aggr(struct perf_stat_config *config,
evlist__for_each_entry(evlist, counter) {
print_counter_aggrdata(config, counter, s,
prefix, metric_only,
- &first);
+ &first, -1);
}
if (metric_only)
fputc('\n', output);
@@ -1146,6 +1146,26 @@ static void print_footer(struct perf_stat_config *config)
"the same PMU. Try reorganizing the group.\n");
}
+static void print_percore_thread(struct perf_stat_config *config,
+ struct evsel *counter, char *prefix)
+{
+ int s, s2, id;
+ bool first = true;
+
+ for (int i = 0; i < perf_evsel__nr_cpus(counter); i++) {
+ s2 = config->aggr_get_id(config, evsel__cpus(counter), i);
+ for (s = 0; s < config->aggr_map->nr; s++) {
+ id = config->aggr_map->map[s];
+ if (s2 == id)
+ break;
+ }
+
+ print_counter_aggrdata(config, counter, s,
+ prefix, false,
+ &first, i);
+ }
+}
+
static void print_percore(struct perf_stat_config *config,
struct evsel *counter, char *prefix)
{
@@ -1157,13 +1177,16 @@ static void print_percore(struct perf_stat_config *config,
if (!(config->aggr_map || config->aggr_get_id))
return;
+ if (config->percore_show_thread)
+ return print_percore_thread(config, counter, prefix);
+
for (s = 0; s < config->aggr_map->nr; s++) {
if (prefix && metric_only)
fprintf(output, "%s", prefix);
print_counter_aggrdata(config, counter, s,
prefix, metric_only,
- &first);
+ &first, -1);
}
if (metric_only)
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index fb990efa54a8..b4fdfaa7f2c0 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -109,6 +109,7 @@ struct perf_stat_config {
bool walltime_run_table;
bool all_kernel;
bool all_user;
+ bool percore_show_thread;
FILE *output;
unsigned int interval;
unsigned int timeout;
--
2.17.1
* Re: [PATCH v4] perf stat: Show percore counts in per CPU output
2020-02-14 8:04 [PATCH v4] perf stat: Show percore counts in per CPU output Jin Yao
@ 2020-02-16 22:54 ` Jiri Olsa
2020-02-17 1:22 ` Jin, Yao
2020-02-17 14:45 ` Arnaldo Carvalho de Melo
2020-03-19 14:10 ` [tip: perf/core] " tip-bot2 for Jin Yao
1 sibling, 2 replies; 8+ messages in thread
From: Jiri Olsa @ 2020-02-16 22:54 UTC (permalink / raw)
To: Jin Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Fri, Feb 14, 2020 at 04:04:52PM +0800, Jin Yao wrote:
SNIP
> CPU1 1,009,312 cpu/event=cpu-cycles,percore/
> CPU2 2,784,072 cpu/event=cpu-cycles,percore/
> CPU3 2,427,922 cpu/event=cpu-cycles,percore/
> CPU4 2,752,148 cpu/event=cpu-cycles,percore/
> CPU6 2,784,072 cpu/event=cpu-cycles,percore/
> CPU7 2,427,922 cpu/event=cpu-cycles,percore/
>
> 1.001416041 seconds time elapsed
>
> v4:
> ---
> Ravi Bangoria reports an issue in v3. Once we offline a CPU,
> the output is not correct. The issue is we should use the cpu
> idx in print_percore_thread rather than using the cpu value.
Acked-by: Jiri Olsa <jolsa@kernel.org>
btw, there's a slight misalignment in the -I output, but not due
to your change; it's been there for some time now, and probably
in other aggregation outputs as well:
$ sudo ./perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000
# time CPU counts unit events
1.000224464 CPU0 7,251,151 cpu/event=cpu-cycles/
1.000224464 CPU1 21,614,946 cpu/event=cpu-cycles/
1.000224464 CPU2 30,812,097 cpu/event=cpu-cycles/
should be (extra space after CPUX):
1.000224464 CPU2 30,812,097 cpu/event=cpu-cycles/
I'll put it on my TODO, but you're welcome to check on it ;-)
thanks,
jirka
* Re: [PATCH v4] perf stat: Show percore counts in per CPU output
2020-02-16 22:54 ` Jiri Olsa
@ 2020-02-17 1:22 ` Jin, Yao
2020-02-17 11:06 ` Jiri Olsa
2020-02-17 14:45 ` Arnaldo Carvalho de Melo
1 sibling, 1 reply; 8+ messages in thread
From: Jin, Yao @ 2020-02-17 1:22 UTC (permalink / raw)
To: Jiri Olsa
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On 2/17/2020 6:54 AM, Jiri Olsa wrote:
> On Fri, Feb 14, 2020 at 04:04:52PM +0800, Jin Yao wrote:
>
> SNIP
>
>> CPU1 1,009,312 cpu/event=cpu-cycles,percore/
>> CPU2 2,784,072 cpu/event=cpu-cycles,percore/
>> CPU3 2,427,922 cpu/event=cpu-cycles,percore/
>> CPU4 2,752,148 cpu/event=cpu-cycles,percore/
>> CPU6 2,784,072 cpu/event=cpu-cycles,percore/
>> CPU7 2,427,922 cpu/event=cpu-cycles,percore/
>>
>> 1.001416041 seconds time elapsed
>>
>> v4:
>> ---
>> Ravi Bangoria reports an issue in v3. Once we offline a CPU,
>> the output is not correct. The issue is we should use the cpu
>> idx in print_percore_thread rather than using the cpu value.
>
> Acked-by: Jiri Olsa <jolsa@kernel.org>
>
Thanks so much for ACKing this patch. :)
> btw, there's slight misalignment in -I output, but not due
> to your change, it's there for some time now, and probably
> in other agregation outputs as well:
>
>
> $ sudo ./perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000
> # time CPU counts unit events
> 1.000224464 CPU0 7,251,151 cpu/event=cpu-cycles/
> 1.000224464 CPU1 21,614,946 cpu/event=cpu-cycles/
> 1.000224464 CPU2 30,812,097 cpu/event=cpu-cycles/
>
> should be (extra space after CPUX):
>
> 1.000224464 CPU2 30,812,097 cpu/event=cpu-cycles/
>
> I'll put it on my TODO, but if you're welcome to check on it ;-)
>
> thanks,
> jirka
>
I have a simple fix for this misalignment issue.
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index bc31fccc0057..95b29c9cba36 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -114,11 +114,11 @@ static void aggr_printout(struct perf_stat_config
*config,
fprintf(config->output, "S%d-D%d-C%*d%s",
cpu_map__id_to_socket(id),
cpu_map__id_to_die(id),
- config->csv_output ? 0 : -5,
+ config->csv_output ? 0 : -3,
cpu_map__id_to_cpu(id), config->csv_sep);
} else {
- fprintf(config->output, "CPU%*d%s ",
- config->csv_output ? 0 : -5,
+ fprintf(config->output, "CPU%*d%s",
+ config->csv_output ? 0 : -7,
evsel__cpus(evsel)->map[id],
config->csv_sep);
}
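The widths above can be sanity-checked with a quick sketch. The helper below formats the same "CPU%*d" label with the proposed -7 width and returns its length; it is just a demonstration of the printf field width, not the perf code:

```c
#include <stdio.h>

/* Sketch: the proposed "CPU%*d" with width -7 left-justifies the CPU
 * number in a 7-character field, so the label is a constant 10 chars
 * for any CPU number up to 7 digits, keeping the columns aligned. */
int cpu_label_len(int cpu)
{
	char buf[32];

	return snprintf(buf, sizeof(buf), "CPU%*d", -7, cpu);
}
```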
The following command lines were tested OK.
perf stat -e cpu/event=cpu-cycles/ -I 1000
perf stat -e cpu/event=cpu-cycles/ -a -I 1000
perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000
perf stat -e cpu/event=cpu-cycles,percore/ -a -A -I 1000
perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread
-I 1000
Could you help take a look?
Thanks
Jin Yao
* Re: [PATCH v4] perf stat: Show percore counts in per CPU output
2020-02-17 1:22 ` Jin, Yao
@ 2020-02-17 11:06 ` Jiri Olsa
2020-02-18 1:02 ` Jin, Yao
0 siblings, 1 reply; 8+ messages in thread
From: Jiri Olsa @ 2020-02-17 11:06 UTC (permalink / raw)
To: Jin, Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Mon, Feb 17, 2020 at 09:22:57AM +0800, Jin, Yao wrote:
>
>
> On 2/17/2020 6:54 AM, Jiri Olsa wrote:
> > On Fri, Feb 14, 2020 at 04:04:52PM +0800, Jin Yao wrote:
> >
> > SNIP
> >
> > > CPU1 1,009,312 cpu/event=cpu-cycles,percore/
> > > CPU2 2,784,072 cpu/event=cpu-cycles,percore/
> > > CPU3 2,427,922 cpu/event=cpu-cycles,percore/
> > > CPU4 2,752,148 cpu/event=cpu-cycles,percore/
> > > CPU6 2,784,072 cpu/event=cpu-cycles,percore/
> > > CPU7 2,427,922 cpu/event=cpu-cycles,percore/
> > >
> > > 1.001416041 seconds time elapsed
> > >
> > > v4:
> > > ---
> > > Ravi Bangoria reports an issue in v3. Once we offline a CPU,
> > > the output is not correct. The issue is we should use the cpu
> > > idx in print_percore_thread rather than using the cpu value.
> >
> > Acked-by: Jiri Olsa <jolsa@kernel.org>
> >
>
> Thanks so much for ACK this patch. :)
>
> > btw, there's slight misalignment in -I output, but not due
> > to your change, it's there for some time now, and probably
> > in other agregation outputs as well:
> >
> >
> > $ sudo ./perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000
> > # time CPU counts unit events
> > 1.000224464 CPU0 7,251,151 cpu/event=cpu-cycles/
> > 1.000224464 CPU1 21,614,946 cpu/event=cpu-cycles/
> > 1.000224464 CPU2 30,812,097 cpu/event=cpu-cycles/
> >
> > should be (extra space after CPUX):
> >
> > 1.000224464 CPU2 30,812,097 cpu/event=cpu-cycles/
> >
> > I'll put it on my TODO, but if you're welcome to check on it ;-)
> >
> > thanks,
> > jirka
> >
>
> I have a simple fix for this misalignment issue.
>
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index bc31fccc0057..95b29c9cba36 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -114,11 +114,11 @@ static void aggr_printout(struct perf_stat_config
> *config,
> fprintf(config->output, "S%d-D%d-C%*d%s",
> cpu_map__id_to_socket(id),
> cpu_map__id_to_die(id),
> - config->csv_output ? 0 : -5,
> + config->csv_output ? 0 : -3,
> cpu_map__id_to_cpu(id), config->csv_sep);
> } else {
> - fprintf(config->output, "CPU%*d%s ",
> - config->csv_output ? 0 : -5,
> + fprintf(config->output, "CPU%*d%s",
> + config->csv_output ? 0 : -7,
> evsel__cpus(evsel)->map[id],
> config->csv_sep);
I guess that's ok; will that work with higher (3-digit) cpu numbers?
jirka
> }
>
> Following command lines are tested OK.
>
> perf stat -e cpu/event=cpu-cycles/ -I 1000
> perf stat -e cpu/event=cpu-cycles/ -a -I 1000
> perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000
> perf stat -e cpu/event=cpu-cycles,percore/ -a -A -I 1000
> perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -I
> 1000
>
> Could you help to look at that?
>
> Thanks
> Jin Yao
>
* Re: [PATCH v4] perf stat: Show percore counts in per CPU output
2020-02-16 22:54 ` Jiri Olsa
2020-02-17 1:22 ` Jin, Yao
@ 2020-02-17 14:45 ` Arnaldo Carvalho de Melo
1 sibling, 0 replies; 8+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-02-17 14:45 UTC (permalink / raw)
To: Jiri Olsa
Cc: Jin Yao, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel,
ak, kan.liang, yao.jin
Em Sun, Feb 16, 2020 at 11:54:07PM +0100, Jiri Olsa escreveu:
> On Fri, Feb 14, 2020 at 04:04:52PM +0800, Jin Yao wrote:
>
> SNIP
>
> > CPU1 1,009,312 cpu/event=cpu-cycles,percore/
> > CPU2 2,784,072 cpu/event=cpu-cycles,percore/
> > CPU3 2,427,922 cpu/event=cpu-cycles,percore/
> > CPU4 2,752,148 cpu/event=cpu-cycles,percore/
> > CPU6 2,784,072 cpu/event=cpu-cycles,percore/
> > CPU7 2,427,922 cpu/event=cpu-cycles,percore/
> >
> > 1.001416041 seconds time elapsed
> >
> > v4:
> > ---
> > Ravi Bangoria reports an issue in v3. Once we offline a CPU,
> > the output is not correct. The issue is we should use the cpu
> > idx in print_percore_thread rather than using the cpu value.
>
> Acked-by: Jiri Olsa <jolsa@kernel.org>
Applied to perf/core
- Arnaldo
> btw, there's slight misalignment in -I output, but not due
> to your change, it's there for some time now, and probably
> in other agregation outputs as well:
>
>
> $ sudo ./perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000
> # time CPU counts unit events
> 1.000224464 CPU0 7,251,151 cpu/event=cpu-cycles/
> 1.000224464 CPU1 21,614,946 cpu/event=cpu-cycles/
> 1.000224464 CPU2 30,812,097 cpu/event=cpu-cycles/
>
> should be (extra space after CPUX):
>
> 1.000224464 CPU2 30,812,097 cpu/event=cpu-cycles/
>
> I'll put it on my TODO, but if you're welcome to check on it ;-)
>
> thanks,
> jirka
>
--
- Arnaldo
* Re: [PATCH v4] perf stat: Show percore counts in per CPU output
2020-02-17 11:06 ` Jiri Olsa
@ 2020-02-18 1:02 ` Jin, Yao
2020-02-18 6:14 ` Jiri Olsa
0 siblings, 1 reply; 8+ messages in thread
From: Jin, Yao @ 2020-02-18 1:02 UTC (permalink / raw)
To: Jiri Olsa
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On 2/17/2020 7:06 PM, Jiri Olsa wrote:
> On Mon, Feb 17, 2020 at 09:22:57AM +0800, Jin, Yao wrote:
>>
>>
>> On 2/17/2020 6:54 AM, Jiri Olsa wrote:
>>> On Fri, Feb 14, 2020 at 04:04:52PM +0800, Jin Yao wrote:
>>>
>>> SNIP
>>>
>>>> CPU1 1,009,312 cpu/event=cpu-cycles,percore/
>>>> CPU2 2,784,072 cpu/event=cpu-cycles,percore/
>>>> CPU3 2,427,922 cpu/event=cpu-cycles,percore/
>>>> CPU4 2,752,148 cpu/event=cpu-cycles,percore/
>>>> CPU6 2,784,072 cpu/event=cpu-cycles,percore/
>>>> CPU7 2,427,922 cpu/event=cpu-cycles,percore/
>>>>
>>>> 1.001416041 seconds time elapsed
>>>>
>>>> v4:
>>>> ---
>>>> Ravi Bangoria reports an issue in v3. Once we offline a CPU,
>>>> the output is not correct. The issue is we should use the cpu
>>>> idx in print_percore_thread rather than using the cpu value.
>>>
>>> Acked-by: Jiri Olsa <jolsa@kernel.org>
>>>
>>
>> Thanks so much for ACK this patch. :)
>>
>>> btw, there's slight misalignment in -I output, but not due
>>> to your change, it's there for some time now, and probably
>>> in other agregation outputs as well:
>>>
>>>
>>> $ sudo ./perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000
>>> # time CPU counts unit events
>>> 1.000224464 CPU0 7,251,151 cpu/event=cpu-cycles/
>>> 1.000224464 CPU1 21,614,946 cpu/event=cpu-cycles/
>>> 1.000224464 CPU2 30,812,097 cpu/event=cpu-cycles/
>>>
>>> should be (extra space after CPUX):
>>>
>>> 1.000224464 CPU2 30,812,097 cpu/event=cpu-cycles/
>>>
>>> I'll put it on my TODO, but if you're welcome to check on it ;-)
>>>
>>> thanks,
>>> jirka
>>>
>>
>> I have a simple fix for this misalignment issue.
>>
>> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
>> index bc31fccc0057..95b29c9cba36 100644
>> --- a/tools/perf/util/stat-display.c
>> +++ b/tools/perf/util/stat-display.c
>> @@ -114,11 +114,11 @@ static void aggr_printout(struct perf_stat_config
>> *config,
>> fprintf(config->output, "S%d-D%d-C%*d%s",
>> cpu_map__id_to_socket(id),
>> cpu_map__id_to_die(id),
>> - config->csv_output ? 0 : -5,
>> + config->csv_output ? 0 : -3,
>> cpu_map__id_to_cpu(id), config->csv_sep);
>> } else {
>> - fprintf(config->output, "CPU%*d%s ",
>> - config->csv_output ? 0 : -5,
>> + fprintf(config->output, "CPU%*d%s",
>> + config->csv_output ? 0 : -7,
>> evsel__cpus(evsel)->map[id],
>> config->csv_sep);
>
> I guess that's ok, will that work with higher (3 digit) cpu numbers?
>
> jirka
>
Yes, it works with hundreds of CPUs. I have tested that case.
BTW, do you need me to post a separate patch, or will you add this fix
to your patch series?
Thanks
Jin Yao
>> }
>>
>> Following command lines are tested OK.
>>
>> perf stat -e cpu/event=cpu-cycles/ -I 1000
>> perf stat -e cpu/event=cpu-cycles/ -a -I 1000
>> perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000
>> perf stat -e cpu/event=cpu-cycles,percore/ -a -A -I 1000
>> perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -I
>> 1000
>>
>> Could you help to look at that?
>>
>> Thanks
>> Jin Yao
>>
>
* Re: [PATCH v4] perf stat: Show percore counts in per CPU output
2020-02-18 1:02 ` Jin, Yao
@ 2020-02-18 6:14 ` Jiri Olsa
0 siblings, 0 replies; 8+ messages in thread
From: Jiri Olsa @ 2020-02-18 6:14 UTC (permalink / raw)
To: Jin, Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Tue, Feb 18, 2020 at 09:02:52AM +0800, Jin, Yao wrote:
SNIP
> > > >
> > > > thanks,
> > > > jirka
> > > >
> > >
> > > I have a simple fix for this misalignment issue.
> > >
> > > diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> > > index bc31fccc0057..95b29c9cba36 100644
> > > --- a/tools/perf/util/stat-display.c
> > > +++ b/tools/perf/util/stat-display.c
> > > @@ -114,11 +114,11 @@ static void aggr_printout(struct perf_stat_config
> > > *config,
> > > fprintf(config->output, "S%d-D%d-C%*d%s",
> > > cpu_map__id_to_socket(id),
> > > cpu_map__id_to_die(id),
> > > - config->csv_output ? 0 : -5,
> > > + config->csv_output ? 0 : -3,
> > > cpu_map__id_to_cpu(id), config->csv_sep);
> > > } else {
> > > - fprintf(config->output, "CPU%*d%s ",
> > > - config->csv_output ? 0 : -5,
> > > + fprintf(config->output, "CPU%*d%s",
> > > + config->csv_output ? 0 : -7,
> > > evsel__cpus(evsel)->map[id],
> > > config->csv_sep);
> >
> > I guess that's ok, will that work with higher (3 digit) cpu numbers?
> >
> > jirka
> >
>
> Yes, it works with hundreds of CPU. I have tested with that case.
>
> BTW, do you need me to post a separate patch or you will add this fix in
> your patch series?
please send separate patch
thanks,
jirka
* [tip: perf/core] perf stat: Show percore counts in per CPU output
2020-02-14 8:04 [PATCH v4] perf stat: Show percore counts in per CPU output Jin Yao
2020-02-16 22:54 ` Jiri Olsa
@ 2020-03-19 14:10 ` tip-bot2 for Jin Yao
1 sibling, 0 replies; 8+ messages in thread
From: tip-bot2 for Jin Yao @ 2020-03-19 14:10 UTC (permalink / raw)
To: linux-tip-commits
Cc: Jin Yao, Ravi Bangoria, Jiri Olsa, Alexander Shishkin,
Andi Kleen, Kan Liang, Peter Zijlstra, Arnaldo Carvalho de Melo,
x86, LKML
The following commit has been merged into the perf/core branch of tip:
Commit-ID: 1af62ce61cd80a25590d5abeccfb5c3a666527dd
Gitweb: https://git.kernel.org/tip/1af62ce61cd80a25590d5abeccfb5c3a666527dd
Author: Jin Yao <yao.jin@linux.intel.com>
AuthorDate: Fri, 14 Feb 2020 16:04:52 +08:00
Committer: Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Wed, 04 Mar 2020 10:34:09 -03:00
perf stat: Show percore counts in per CPU output
We have supported the event modifier "percore", which sums up the event
counts for all hardware threads in a core and shows the counts per core.
For example,
# perf stat -e cpu/event=cpu-cycles,percore/ -a -A -- sleep 1
Performance counter stats for 'system wide':
S0-D0-C0 395,072 cpu/event=cpu-cycles,percore/
S0-D0-C1 851,248 cpu/event=cpu-cycles,percore/
S0-D0-C2 954,226 cpu/event=cpu-cycles,percore/
S0-D0-C3 1,233,659 cpu/event=cpu-cycles,percore/
This patch provides a new option "--percore-show-thread". It is used
together with the event modifier "percore" to sum up the event counts for
all hardware threads in a core but show the counts per hardware thread.
This is essentially a replacement for the any bit (which is gone in
Icelake). Per core counts are useful for some formulas, e.g. CoreIPC.
The original percore version was inconvenient to post process. This
variant matches the output of the any bit.
With this patch, for example,
# perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -- sleep 1
Performance counter stats for 'system wide':
CPU0 2,453,061 cpu/event=cpu-cycles,percore/
CPU1 1,823,921 cpu/event=cpu-cycles,percore/
CPU2 1,383,166 cpu/event=cpu-cycles,percore/
CPU3 1,102,652 cpu/event=cpu-cycles,percore/
CPU4 2,453,061 cpu/event=cpu-cycles,percore/
CPU5 1,823,921 cpu/event=cpu-cycles,percore/
CPU6 1,383,166 cpu/event=cpu-cycles,percore/
CPU7 1,102,652 cpu/event=cpu-cycles,percore/
We can see counts are duplicated in CPU pairs (CPU0/CPU4, CPU1/CPU5,
CPU2/CPU6, CPU3/CPU7).
The interval mode also works. For example,
# perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -I 1000
# time CPU counts unit events
1.000425421 CPU0 925,032 cpu/event=cpu-cycles,percore/
1.000425421 CPU1 430,202 cpu/event=cpu-cycles,percore/
1.000425421 CPU2 436,843 cpu/event=cpu-cycles,percore/
1.000425421 CPU3 1,192,504 cpu/event=cpu-cycles,percore/
1.000425421 CPU4 925,032 cpu/event=cpu-cycles,percore/
1.000425421 CPU5 430,202 cpu/event=cpu-cycles,percore/
1.000425421 CPU6 436,843 cpu/event=cpu-cycles,percore/
1.000425421 CPU7 1,192,504 cpu/event=cpu-cycles,percore/
If we offline CPU5, the result is:
# perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -- sleep 1
Performance counter stats for 'system wide':
CPU0 2,752,148 cpu/event=cpu-cycles,percore/
CPU1 1,009,312 cpu/event=cpu-cycles,percore/
CPU2 2,784,072 cpu/event=cpu-cycles,percore/
CPU3 2,427,922 cpu/event=cpu-cycles,percore/
CPU4 2,752,148 cpu/event=cpu-cycles,percore/
CPU6 2,784,072 cpu/event=cpu-cycles,percore/
CPU7 2,427,922 cpu/event=cpu-cycles,percore/
1.001416041 seconds time elapsed
v4:
---
Ravi Bangoria reported an issue in v3: once we offline a CPU,
the output is not correct. The fix is to use the cpu idx in
print_percore_thread rather than the cpu value.
v3:
---
1. Fix the interval mode output error
2. Use cpu value (not cpu index) in config->aggr_get_id().
3. Refine the code according to Jiri's comments.
v2:
---
Add the explanation in change log. This is essentially a replacement
for the any bit. No code change.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200214080452.26402-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
tools/perf/Documentation/perf-stat.txt | 9 +++++++-
tools/perf/builtin-stat.c | 4 +++-
tools/perf/util/stat-display.c | 33 +++++++++++++++++++++----
tools/perf/util/stat.h | 1 +-
4 files changed, 42 insertions(+), 5 deletions(-)
diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 9431b80..4d56586 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -334,6 +334,15 @@ Configure all used events to run in kernel space.
--all-user::
Configure all used events to run in user space.
+--percore-show-thread::
+The event modifier "percore" has supported to sum up the event counts
+for all hardware threads in a core and show the counts per core.
+
+This option with event modifier "percore" enabled also sums up the event
+counts for all hardware threads in a core but show the sum counts per
+hardware thread. This is essentially a replacement for the any bit and
+convenient for post processing.
+
EXAMPLES
--------
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index a098c2e..ec053dc 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -929,6 +929,10 @@ static struct option stat_options[] = {
OPT_BOOLEAN_FLAG(0, "all-user", &stat_config.all_user,
"Configure all used events to run in user space.",
PARSE_OPT_EXCLUSIVE),
+ OPT_BOOLEAN(0, "percore-show-thread", &stat_config.percore_show_thread,
+ "Use with 'percore' event qualifier to show the event "
+ "counts of one hardware thread by sum up total hardware "
+ "threads of same physical core"),
OPT_END()
};
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index bc31fcc..d89cb0d 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -110,7 +110,7 @@ static void aggr_printout(struct perf_stat_config *config,
config->csv_sep);
break;
case AGGR_NONE:
- if (evsel->percore) {
+ if (evsel->percore && !config->percore_show_thread) {
fprintf(config->output, "S%d-D%d-C%*d%s",
cpu_map__id_to_socket(id),
cpu_map__id_to_die(id),
@@ -628,7 +628,7 @@ static void aggr_cb(struct perf_stat_config *config,
static void print_counter_aggrdata(struct perf_stat_config *config,
struct evsel *counter, int s,
char *prefix, bool metric_only,
- bool *first)
+ bool *first, int cpu)
{
struct aggr_data ad;
FILE *output = config->output;
@@ -654,7 +654,7 @@ static void print_counter_aggrdata(struct perf_stat_config *config,
fprintf(output, "%s", prefix);
uval = val * counter->scale;
- printout(config, id, nr, counter, uval, prefix,
+ printout(config, cpu != -1 ? cpu : id, nr, counter, uval, prefix,
run, ena, 1.0, &rt_stat);
if (!metric_only)
fputc('\n', output);
@@ -687,7 +687,7 @@ static void print_aggr(struct perf_stat_config *config,
evlist__for_each_entry(evlist, counter) {
print_counter_aggrdata(config, counter, s,
prefix, metric_only,
- &first);
+ &first, -1);
}
if (metric_only)
fputc('\n', output);
@@ -1146,6 +1146,26 @@ static void print_footer(struct perf_stat_config *config)
"the same PMU. Try reorganizing the group.\n");
}
+static void print_percore_thread(struct perf_stat_config *config,
+ struct evsel *counter, char *prefix)
+{
+ int s, s2, id;
+ bool first = true;
+
+ for (int i = 0; i < perf_evsel__nr_cpus(counter); i++) {
+ s2 = config->aggr_get_id(config, evsel__cpus(counter), i);
+ for (s = 0; s < config->aggr_map->nr; s++) {
+ id = config->aggr_map->map[s];
+ if (s2 == id)
+ break;
+ }
+
+ print_counter_aggrdata(config, counter, s,
+ prefix, false,
+ &first, i);
+ }
+}
+
static void print_percore(struct perf_stat_config *config,
struct evsel *counter, char *prefix)
{
@@ -1157,13 +1177,16 @@ static void print_percore(struct perf_stat_config *config,
if (!(config->aggr_map || config->aggr_get_id))
return;
+ if (config->percore_show_thread)
+ return print_percore_thread(config, counter, prefix);
+
for (s = 0; s < config->aggr_map->nr; s++) {
if (prefix && metric_only)
fprintf(output, "%s", prefix);
print_counter_aggrdata(config, counter, s,
prefix, metric_only,
- &first);
+ &first, -1);
}
if (metric_only)
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index fb990ef..b4fdfaa 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -109,6 +109,7 @@ struct perf_stat_config {
bool walltime_run_table;
bool all_kernel;
bool all_user;
+ bool percore_show_thread;
FILE *output;
unsigned int interval;
unsigned int timeout;