linux-kernel.vger.kernel.org archive mirror
* [PATCH v4] perf stat: Show percore counts in per CPU output
@ 2020-02-14  8:04 Jin Yao
From: Jin Yao @ 2020-02-14  8:04 UTC
  To: acme, jolsa, peterz, mingo, alexander.shishkin
  Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao

We already support the event modifier "percore", which sums up the
event counts for all hardware threads in a core and shows the counts
per core.

For example,

 # perf stat -e cpu/event=cpu-cycles,percore/ -a -A -- sleep 1

  Performance counter stats for 'system wide':

 S0-D0-C0                395,072      cpu/event=cpu-cycles,percore/
 S0-D0-C1                851,248      cpu/event=cpu-cycles,percore/
 S0-D0-C2                954,226      cpu/event=cpu-cycles,percore/
 S0-D0-C3              1,233,659      cpu/event=cpu-cycles,percore/

This patch provides a new option "--percore-show-thread". It is used
together with the event modifier "percore" to sum up the event counts
for all hardware threads in a core but show the counts per hardware
thread.

This is essentially a replacement for the any bit (which is gone in
Icelake). Per core counts are useful for some formulas, e.g. CoreIPC.
The original percore output was inconvenient to post-process. This
variant matches the output of the any bit.
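
As an aside for the CoreIPC use case: with the per-thread output every
row already carries the core-wide sum, so a metric like
instructions/cycles divides row by row. A minimal sketch with made-up
instruction counts (the cycle counts mirror the example below); an
illustration, not code from the patch:

 #include <stdio.h>

 /* Each row of --percore-show-thread output already carries the
  * core-wide sum, so instructions/cycles divides row by row with no
  * CPU-to-core grouping step. Counts are made up for this sketch.
  */
 int main(void)
 {
	long cycles[] = { 2453061, 1823921, 1383166, 1102652,
			  2453061, 1823921, 1383166, 1102652 };
	long insns[]  = { 3100000, 2200000, 1500000, 1300000,
			  3100000, 2200000, 1500000, 1300000 };

	for (int cpu = 0; cpu < 8; cpu++)
		printf("CPU%-7d CoreIPC %.2f\n", cpu,
		       (double)insns[cpu] / cycles[cpu]);
	return 0;
 }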

With this patch, for example,

 # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread  -- sleep 1

  Performance counter stats for 'system wide':

 CPU0               2,453,061      cpu/event=cpu-cycles,percore/
 CPU1               1,823,921      cpu/event=cpu-cycles,percore/
 CPU2               1,383,166      cpu/event=cpu-cycles,percore/
 CPU3               1,102,652      cpu/event=cpu-cycles,percore/
 CPU4               2,453,061      cpu/event=cpu-cycles,percore/
 CPU5               1,823,921      cpu/event=cpu-cycles,percore/
 CPU6               1,383,166      cpu/event=cpu-cycles,percore/
 CPU7               1,102,652      cpu/event=cpu-cycles,percore/

We can see that the counts are duplicated across CPU pairs
(CPU0/CPU4, CPU1/CPU5, CPU2/CPU6, CPU3/CPU7): the SMT siblings of a
core report the same core-wide sum.
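
The pairing follows from the CPU topology. A standalone sketch
(assuming the usual sysfs layout and 8 CPUs) that prints each CPU's
core id, making the sibling pairs visible:

 #include <stdio.h>

 /* Print each CPU's physical core id from sysfs; CPUs that report
  * the same core id are SMT siblings, which is why
  * --percore-show-thread repeats one core-wide count per sibling.
  */
 int main(void)
 {
	for (int cpu = 0; cpu < 8; cpu++) {
		char path[64];
		FILE *f;
		int core;

		snprintf(path, sizeof(path),
			 "/sys/devices/system/cpu/cpu%d/topology/core_id",
			 cpu);
		f = fopen(path, "r");
		if (!f)
			continue;	/* offline or nonexistent CPU */
		if (fscanf(f, "%d", &core) == 1)
			printf("CPU%d -> core %d\n", cpu, core);
		fclose(f);
	}
	return 0;
 }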

The interval mode also works. For example,

 # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread  -I 1000
 #           time CPU                    counts unit events
      1.000425421 CPU0                 925,032      cpu/event=cpu-cycles,percore/
      1.000425421 CPU1                 430,202      cpu/event=cpu-cycles,percore/
      1.000425421 CPU2                 436,843      cpu/event=cpu-cycles,percore/
      1.000425421 CPU3               1,192,504      cpu/event=cpu-cycles,percore/
      1.000425421 CPU4                 925,032      cpu/event=cpu-cycles,percore/
      1.000425421 CPU5                 430,202      cpu/event=cpu-cycles,percore/
      1.000425421 CPU6                 436,843      cpu/event=cpu-cycles,percore/
      1.000425421 CPU7               1,192,504      cpu/event=cpu-cycles,percore/

If we offline CPU5, the result is:

 # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -- sleep 1

  Performance counter stats for 'system wide':

 CPU0               2,752,148      cpu/event=cpu-cycles,percore/
 CPU1               1,009,312      cpu/event=cpu-cycles,percore/
 CPU2               2,784,072      cpu/event=cpu-cycles,percore/
 CPU3               2,427,922      cpu/event=cpu-cycles,percore/
 CPU4               2,752,148      cpu/event=cpu-cycles,percore/
 CPU6               2,784,072      cpu/event=cpu-cycles,percore/
 CPU7               2,427,922      cpu/event=cpu-cycles,percore/

        1.001416041 seconds time elapsed
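
The hole left by the offline CPU is what the v4 note below addresses:
the evsel cpu map then holds only online cpu values, so the map index
and the cpu value diverge past the hole. A tiny standalone
illustration with a made-up cpu map:

 #include <stdio.h>

 /* With CPU5 offline the cpu map holds only online cpu values, so
  * past the hole the index no longer equals the value; the printing
  * code must therefore iterate by index, not by value.
  */
 int main(void)
 {
	int cpu_map[] = { 0, 1, 2, 3, 4, 6, 7 };	/* CPU5 offline */
	int nr = sizeof(cpu_map) / sizeof(cpu_map[0]);

	for (int idx = 0; idx < nr; idx++)
		printf("idx %d -> cpu %d\n", idx, cpu_map[idx]);
	return 0;
 }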

 v4:
 ---
 Ravi Bangoria reported an issue in v3: once we offline a CPU,
 the output is not correct. The fix is to use the cpu idx in
 print_percore_thread rather than the cpu value.

 v3:
 ---
 1. Fix the interval mode output error
 2. Use cpu value (not cpu index) in config->aggr_get_id().
 3. Refine the code according to Jiri's comments.

 v2:
 ---
 Add the explanation in the change log. This is essentially a
 replacement for the any bit. No code change.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 tools/perf/Documentation/perf-stat.txt |  9 +++++++
 tools/perf/builtin-stat.c              |  4 ++++
 tools/perf/util/stat-display.c         | 33 ++++++++++++++++++++++----
 tools/perf/util/stat.h                 |  1 +
 4 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 9431b8066fb4..4d56586b2fb9 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -334,6 +334,15 @@ Configure all used events to run in kernel space.
 --all-user::
 Configure all used events to run in user space.
 
+--percore-show-thread::
+The event modifier "percore" sums up the event counts for all hardware
+threads in a core and shows the counts per core.
+
+This option, used together with the event modifier "percore", also sums
+up the event counts for all hardware threads in a core but shows the
+sum counts per hardware thread. This is essentially a replacement for
+the any bit and is convenient for post-processing.
+
 EXAMPLES
 --------
 
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index a098c2ebf4ea..ec053dc1e35c 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -929,6 +929,10 @@ static struct option stat_options[] = {
 	OPT_BOOLEAN_FLAG(0, "all-user", &stat_config.all_user,
 			 "Configure all used events to run in user space.",
 			 PARSE_OPT_EXCLUSIVE),
+	OPT_BOOLEAN(0, "percore-show-thread", &stat_config.percore_show_thread,
+		    "Use with the 'percore' event qualifier to show the "
+		    "event counts of one hardware thread summed up over all "
+		    "hardware threads of the same physical core"),
 	OPT_END()
 };
 
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index bc31fccc0057..d89cb0da90f8 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -110,7 +110,7 @@ static void aggr_printout(struct perf_stat_config *config,
 			config->csv_sep);
 			break;
 	case AGGR_NONE:
-		if (evsel->percore) {
+		if (evsel->percore && !config->percore_show_thread) {
 			fprintf(config->output, "S%d-D%d-C%*d%s",
 				cpu_map__id_to_socket(id),
 				cpu_map__id_to_die(id),
@@ -628,7 +628,7 @@ static void aggr_cb(struct perf_stat_config *config,
 static void print_counter_aggrdata(struct perf_stat_config *config,
 				   struct evsel *counter, int s,
 				   char *prefix, bool metric_only,
-				   bool *first)
+				   bool *first, int cpu)
 {
 	struct aggr_data ad;
 	FILE *output = config->output;
@@ -654,7 +654,7 @@ static void print_counter_aggrdata(struct perf_stat_config *config,
 		fprintf(output, "%s", prefix);
 
 	uval = val * counter->scale;
-	printout(config, id, nr, counter, uval, prefix,
+	printout(config, cpu != -1 ? cpu : id, nr, counter, uval, prefix,
 		 run, ena, 1.0, &rt_stat);
 	if (!metric_only)
 		fputc('\n', output);
@@ -687,7 +687,7 @@ static void print_aggr(struct perf_stat_config *config,
 		evlist__for_each_entry(evlist, counter) {
 			print_counter_aggrdata(config, counter, s,
 					       prefix, metric_only,
-					       &first);
+					       &first, -1);
 		}
 		if (metric_only)
 			fputc('\n', output);
@@ -1146,6 +1146,26 @@ static void print_footer(struct perf_stat_config *config)
 			"the same PMU. Try reorganizing the group.\n");
 }
 
+static void print_percore_thread(struct perf_stat_config *config,
+				 struct evsel *counter, char *prefix)
+{
+	int s, s2, id;
+	bool first = true;
+
+	for (int i = 0; i < perf_evsel__nr_cpus(counter); i++) {
+		s2 = config->aggr_get_id(config, evsel__cpus(counter), i);
+		for (s = 0; s < config->aggr_map->nr; s++) {
+			id = config->aggr_map->map[s];
+			if (s2 == id)
+				break;
+		}
+
+		print_counter_aggrdata(config, counter, s,
+				       prefix, false,
+				       &first, i);
+	}
+}
+
 static void print_percore(struct perf_stat_config *config,
 			  struct evsel *counter, char *prefix)
 {
@@ -1157,13 +1177,16 @@ static void print_percore(struct perf_stat_config *config,
 	if (!(config->aggr_map || config->aggr_get_id))
 		return;
 
+	if (config->percore_show_thread)
+		return print_percore_thread(config, counter, prefix);
+
 	for (s = 0; s < config->aggr_map->nr; s++) {
 		if (prefix && metric_only)
 			fprintf(output, "%s", prefix);
 
 		print_counter_aggrdata(config, counter, s,
 				       prefix, metric_only,
-				       &first);
+				       &first, -1);
 	}
 
 	if (metric_only)
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index fb990efa54a8..b4fdfaa7f2c0 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -109,6 +109,7 @@ struct perf_stat_config {
 	bool			 walltime_run_table;
 	bool			 all_kernel;
 	bool			 all_user;
+	bool			 percore_show_thread;
 	FILE			*output;
 	unsigned int		 interval;
 	unsigned int		 timeout;
-- 
2.17.1
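
To restate the heart of the patch outside the diff:
print_percore_thread() maps each cpu index to its core's aggregation
slot and prints that slot's summed count once per hardware thread. A
simplified self-contained sketch, with stub types and a fake
4-core/8-thread topology instead of the real perf structures:

 #include <stdio.h>

 struct aggr_map { int nr; int *map; };

 /* Stand-in for config->aggr_get_id(): pretend CPUn and CPUn+4 are
  * siblings of the same core, as in the example output above. */
 static int aggr_get_id(int cpu)
 {
	return cpu % 4;
 }

 int main(void)
 {
	int ids[] = { 0, 1, 2, 3 };
	struct aggr_map am = { 4, ids };
	long core_sum[] = { 2453061, 1823921, 1383166, 1102652 };

	for (int i = 0; i < 8; i++) {	/* one pass per cpu index */
		int s, s2 = aggr_get_id(i);

		/* find the aggregation slot whose id matches this cpu */
		for (s = 0; s < am.nr; s++)
			if (am.map[s] == s2)
				break;
		printf("CPU%-7d %10ld\n", i, core_sum[s]);
	}
	return 0;
 }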



* Re: [PATCH v4] perf stat: Show percore counts in per CPU output
From: Jiri Olsa @ 2020-02-16 22:54 UTC
  To: Jin Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Fri, Feb 14, 2020 at 04:04:52PM +0800, Jin Yao wrote:

SNIP

>  CPU1               1,009,312      cpu/event=cpu-cycles,percore/
>  CPU2               2,784,072      cpu/event=cpu-cycles,percore/
>  CPU3               2,427,922      cpu/event=cpu-cycles,percore/
>  CPU4               2,752,148      cpu/event=cpu-cycles,percore/
>  CPU6               2,784,072      cpu/event=cpu-cycles,percore/
>  CPU7               2,427,922      cpu/event=cpu-cycles,percore/
> 
>         1.001416041 seconds time elapsed
> 
>  v4:
>  ---
>  Ravi Bangoria reported an issue in v3: once we offline a CPU,
>  the output is not correct. The fix is to use the cpu idx in
>  print_percore_thread rather than the cpu value.

Acked-by: Jiri Olsa <jolsa@kernel.org>

btw, there's a slight misalignment in the -I output, but it's not due
to your change; it's been there for some time now, and probably in
other aggregation outputs as well:


  $ sudo ./perf stat -e cpu/event=cpu-cycles/ -a -A  -I 1000
  #           time CPU                    counts unit events
       1.000224464 CPU0               7,251,151      cpu/event=cpu-cycles/                                       
       1.000224464 CPU1              21,614,946      cpu/event=cpu-cycles/                                       
       1.000224464 CPU2              30,812,097      cpu/event=cpu-cycles/                                       

should be (extra space after CPUX):

       1.000224464 CPU2               30,812,097      cpu/event=cpu-cycles/                                       

I'll put it on my TODO, but you're welcome to check on it ;-)

thanks,
jirka



* Re: [PATCH v4] perf stat: Show percore counts in per CPU output
From: Jin, Yao @ 2020-02-17  1:22 UTC
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin



On 2/17/2020 6:54 AM, Jiri Olsa wrote:
> On Fri, Feb 14, 2020 at 04:04:52PM +0800, Jin Yao wrote:
> 
> SNIP
> 
>>   CPU1               1,009,312      cpu/event=cpu-cycles,percore/
>>   CPU2               2,784,072      cpu/event=cpu-cycles,percore/
>>   CPU3               2,427,922      cpu/event=cpu-cycles,percore/
>>   CPU4               2,752,148      cpu/event=cpu-cycles,percore/
>>   CPU6               2,784,072      cpu/event=cpu-cycles,percore/
>>   CPU7               2,427,922      cpu/event=cpu-cycles,percore/
>>
>>          1.001416041 seconds time elapsed
>>
>>   v4:
>>   ---
>>   Ravi Bangoria reported an issue in v3: once we offline a CPU,
>>   the output is not correct. The fix is to use the cpu idx in
>>   print_percore_thread rather than the cpu value.
> 
> Acked-by: Jiri Olsa <jolsa@kernel.org>
> 

Thanks so much for acking this patch. :)

> btw, there's a slight misalignment in the -I output, but it's not due
> to your change; it's been there for some time now, and probably in
> other aggregation outputs as well:
> 
> 
>    $ sudo ./perf stat -e cpu/event=cpu-cycles/ -a -A  -I 1000
>    #           time CPU                    counts unit events
>         1.000224464 CPU0               7,251,151      cpu/event=cpu-cycles/
>         1.000224464 CPU1              21,614,946      cpu/event=cpu-cycles/
>         1.000224464 CPU2              30,812,097      cpu/event=cpu-cycles/
> 
> should be (extra space after CPUX):
> 
>         1.000224464 CPU2               30,812,097      cpu/event=cpu-cycles/
> 
> I'll put it on my TODO, but you're welcome to check on it ;-)
> 
> thanks,
> jirka
> 

I have a simple fix for this misalignment issue.

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index bc31fccc0057..95b29c9cba36 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -114,11 +114,11 @@ static void aggr_printout(struct perf_stat_config *config,
                         fprintf(config->output, "S%d-D%d-C%*d%s",
                                 cpu_map__id_to_socket(id),
                                 cpu_map__id_to_die(id),
-                               config->csv_output ? 0 : -5,
+                               config->csv_output ? 0 : -3,
                                 cpu_map__id_to_cpu(id), config->csv_sep);
                 } else {
-                       fprintf(config->output, "CPU%*d%s ",
-                               config->csv_output ? 0 : -5,
+                       fprintf(config->output, "CPU%*d%s",
+                               config->csv_output ? 0 : -7,
                                 evsel__cpus(evsel)->map[id],
                                 config->csv_sep);
                 }
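
A standalone check of the width arithmetic (plain printf, independent
of perf): "CPU%-7d" yields a fixed 10-column prefix (3 + 7), matching
"S%d-D%d-C%-3d" (7 + 3 for one-digit socket/die ids), and the width
holds up to 7-digit cpu numbers:

 #include <stdio.h>

 int main(void)
 {
	printf("CPU%-7d|\n", 2);		/* 10 columns before '|' */
	printf("CPU%-7d|\n", 128);		/* still 10 columns */
	printf("S%d-D%d-C%-3d|\n", 0, 0, 2);	/* also 10 columns */
	return 0;
 }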

The following command lines are tested OK.

perf stat -e cpu/event=cpu-cycles/ -I 1000
perf stat -e cpu/event=cpu-cycles/ -a -I 1000
perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000
perf stat -e cpu/event=cpu-cycles,percore/ -a -A -I 1000
perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -I 1000

Could you help to look at that?

Thanks
Jin Yao


* Re: [PATCH v4] perf stat: Show percore counts in per CPU output
From: Jiri Olsa @ 2020-02-17 11:06 UTC
  To: Jin, Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Mon, Feb 17, 2020 at 09:22:57AM +0800, Jin, Yao wrote:
> 
> 
> On 2/17/2020 6:54 AM, Jiri Olsa wrote:
> > On Fri, Feb 14, 2020 at 04:04:52PM +0800, Jin Yao wrote:
> > 
> > SNIP
> > 
> > >   CPU1               1,009,312      cpu/event=cpu-cycles,percore/
> > >   CPU2               2,784,072      cpu/event=cpu-cycles,percore/
> > >   CPU3               2,427,922      cpu/event=cpu-cycles,percore/
> > >   CPU4               2,752,148      cpu/event=cpu-cycles,percore/
> > >   CPU6               2,784,072      cpu/event=cpu-cycles,percore/
> > >   CPU7               2,427,922      cpu/event=cpu-cycles,percore/
> > > 
> > >          1.001416041 seconds time elapsed
> > > 
> > >   v4:
> > >   ---
> > >   Ravi Bangoria reported an issue in v3: once we offline a CPU,
> > >   the output is not correct. The fix is to use the cpu idx in
> > >   print_percore_thread rather than the cpu value.
> > 
> > Acked-by: Jiri Olsa <jolsa@kernel.org>
> > 
> 
> Thanks so much for acking this patch. :)
> 
> > btw, there's a slight misalignment in the -I output, but it's not due
> > to your change; it's been there for some time now, and probably in
> > other aggregation outputs as well:
> > 
> > 
> >    $ sudo ./perf stat -e cpu/event=cpu-cycles/ -a -A  -I 1000
> >    #           time CPU                    counts unit events
> >         1.000224464 CPU0               7,251,151      cpu/event=cpu-cycles/
> >         1.000224464 CPU1              21,614,946      cpu/event=cpu-cycles/
> >         1.000224464 CPU2              30,812,097      cpu/event=cpu-cycles/
> > 
> > should be (extra space after CPUX):
> > 
> >         1.000224464 CPU2               30,812,097      cpu/event=cpu-cycles/
> > 
> > I'll put it on my TODO, but you're welcome to check on it ;-)
> > 
> > thanks,
> > jirka
> > 
> 
> I have a simple fix for this misalignment issue.
> 
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index bc31fccc0057..95b29c9cba36 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -114,11 +114,11 @@ static void aggr_printout(struct perf_stat_config *config,
>                         fprintf(config->output, "S%d-D%d-C%*d%s",
>                                 cpu_map__id_to_socket(id),
>                                 cpu_map__id_to_die(id),
> -                               config->csv_output ? 0 : -5,
> +                               config->csv_output ? 0 : -3,
>                                 cpu_map__id_to_cpu(id), config->csv_sep);
>                 } else {
> -                       fprintf(config->output, "CPU%*d%s ",
> -                               config->csv_output ? 0 : -5,
> +                       fprintf(config->output, "CPU%*d%s",
> +                               config->csv_output ? 0 : -7,
>                                 evsel__cpus(evsel)->map[id],
>                                 config->csv_sep);

I guess that's ok, but will that work with higher (3-digit) cpu numbers?

jirka

>                 }
> 
> The following command lines are tested OK.
> 
> perf stat -e cpu/event=cpu-cycles/ -I 1000
> perf stat -e cpu/event=cpu-cycles/ -a -I 1000
> perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000
> perf stat -e cpu/event=cpu-cycles,percore/ -a -A -I 1000
> perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -I 1000
> 
> Could you help to look at that?
> 
> Thanks
> Jin Yao
> 



* Re: [PATCH v4] perf stat: Show percore counts in per CPU output
From: Arnaldo Carvalho de Melo @ 2020-02-17 14:45 UTC
  To: Jiri Olsa
  Cc: Jin Yao, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel,
	ak, kan.liang, yao.jin

On Sun, Feb 16, 2020 at 11:54:07PM +0100, Jiri Olsa wrote:
> On Fri, Feb 14, 2020 at 04:04:52PM +0800, Jin Yao wrote:
> 
> SNIP
> 
> >  CPU1               1,009,312      cpu/event=cpu-cycles,percore/
> >  CPU2               2,784,072      cpu/event=cpu-cycles,percore/
> >  CPU3               2,427,922      cpu/event=cpu-cycles,percore/
> >  CPU4               2,752,148      cpu/event=cpu-cycles,percore/
> >  CPU6               2,784,072      cpu/event=cpu-cycles,percore/
> >  CPU7               2,427,922      cpu/event=cpu-cycles,percore/
> > 
> >         1.001416041 seconds time elapsed
> > 
> >  v4:
> >  ---
> >  Ravi Bangoria reported an issue in v3: once we offline a CPU,
> >  the output is not correct. The fix is to use the cpu idx in
> >  print_percore_thread rather than the cpu value.
> 
> Acked-by: Jiri Olsa <jolsa@kernel.org>

Applied to perf/core

- Arnaldo
 
> btw, there's a slight misalignment in the -I output, but it's not due
> to your change; it's been there for some time now, and probably in
> other aggregation outputs as well:
> 
> 
>   $ sudo ./perf stat -e cpu/event=cpu-cycles/ -a -A  -I 1000
>   #           time CPU                    counts unit events
>        1.000224464 CPU0               7,251,151      cpu/event=cpu-cycles/                                       
>        1.000224464 CPU1              21,614,946      cpu/event=cpu-cycles/                                       
>        1.000224464 CPU2              30,812,097      cpu/event=cpu-cycles/                                       
> 
> should be (extra space after CPUX):
> 
>        1.000224464 CPU2               30,812,097      cpu/event=cpu-cycles/                                       
> 
> I'll put it on my TODO, but you're welcome to check on it ;-)
> 
> thanks,
> jirka
> 

-- 

- Arnaldo


* Re: [PATCH v4] perf stat: Show percore counts in per CPU output
From: Jin, Yao @ 2020-02-18  1:02 UTC
  To: Jiri Olsa
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin



On 2/17/2020 7:06 PM, Jiri Olsa wrote:
> On Mon, Feb 17, 2020 at 09:22:57AM +0800, Jin, Yao wrote:
>>
>>
>> On 2/17/2020 6:54 AM, Jiri Olsa wrote:
>>> On Fri, Feb 14, 2020 at 04:04:52PM +0800, Jin Yao wrote:
>>>
>>> SNIP
>>>
>>>>    CPU1               1,009,312      cpu/event=cpu-cycles,percore/
>>>>    CPU2               2,784,072      cpu/event=cpu-cycles,percore/
>>>>    CPU3               2,427,922      cpu/event=cpu-cycles,percore/
>>>>    CPU4               2,752,148      cpu/event=cpu-cycles,percore/
>>>>    CPU6               2,784,072      cpu/event=cpu-cycles,percore/
>>>>    CPU7               2,427,922      cpu/event=cpu-cycles,percore/
>>>>
>>>>           1.001416041 seconds time elapsed
>>>>
>>>>    v4:
>>>>    ---
>>>>    Ravi Bangoria reported an issue in v3: once we offline a CPU,
>>>>    the output is not correct. The fix is to use the cpu idx in
>>>>    print_percore_thread rather than the cpu value.
>>>
>>> Acked-by: Jiri Olsa <jolsa@kernel.org>
>>>
>>
>> Thanks so much for acking this patch. :)
>>
>>> btw, there's a slight misalignment in the -I output, but it's not due
>>> to your change; it's been there for some time now, and probably in
>>> other aggregation outputs as well:
>>>
>>>
>>>     $ sudo ./perf stat -e cpu/event=cpu-cycles/ -a -A  -I 1000
>>>     #           time CPU                    counts unit events
>>>          1.000224464 CPU0               7,251,151      cpu/event=cpu-cycles/
>>>          1.000224464 CPU1              21,614,946      cpu/event=cpu-cycles/
>>>          1.000224464 CPU2              30,812,097      cpu/event=cpu-cycles/
>>>
>>> should be (extra space after CPUX):
>>>
>>>          1.000224464 CPU2               30,812,097      cpu/event=cpu-cycles/
>>>
>>> I'll put it on my TODO, but you're welcome to check on it ;-)
>>>
>>> thanks,
>>> jirka
>>>
>>
>> I have a simple fix for this misalignment issue.
>>
>> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
>> index bc31fccc0057..95b29c9cba36 100644
>> --- a/tools/perf/util/stat-display.c
>> +++ b/tools/perf/util/stat-display.c
>> @@ -114,11 +114,11 @@ static void aggr_printout(struct perf_stat_config *config,
>>                          fprintf(config->output, "S%d-D%d-C%*d%s",
>>                                  cpu_map__id_to_socket(id),
>>                                  cpu_map__id_to_die(id),
>> -                               config->csv_output ? 0 : -5,
>> +                               config->csv_output ? 0 : -3,
>>                                  cpu_map__id_to_cpu(id), config->csv_sep);
>>                  } else {
>> -                       fprintf(config->output, "CPU%*d%s ",
>> -                               config->csv_output ? 0 : -5,
>> +                       fprintf(config->output, "CPU%*d%s",
>> +                               config->csv_output ? 0 : -7,
>>                                  evsel__cpus(evsel)->map[id],
>>                                  config->csv_sep);
> 
> I guess that's ok, but will that work with higher (3-digit) cpu numbers?
> 
> jirka
> 

Yes, it works with hundreds of CPUs. I have tested that case.

BTW, do you need me to post a separate patch, or will you add this fix
to your patch series?

Thanks
Jin Yao

>>                  }
>>
>> The following command lines are tested OK.
>>
>> perf stat -e cpu/event=cpu-cycles/ -I 1000
>> perf stat -e cpu/event=cpu-cycles/ -a -I 1000
>> perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000
>> perf stat -e cpu/event=cpu-cycles,percore/ -a -A -I 1000
>> perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -I 1000
>>
>> Could you help to look at that?
>>
>> Thanks
>> Jin Yao
>>
> 


* Re: [PATCH v4] perf stat: Show percore counts in per CPU output
From: Jiri Olsa @ 2020-02-18  6:14 UTC
  To: Jin, Yao
  Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
	kan.liang, yao.jin

On Tue, Feb 18, 2020 at 09:02:52AM +0800, Jin, Yao wrote:

SNIP

> > > > 
> > > > thanks,
> > > > jirka
> > > > 
> > > 
> > > I have a simple fix for this misalignment issue.
> > > 
> > > diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> > > index bc31fccc0057..95b29c9cba36 100644
> > > --- a/tools/perf/util/stat-display.c
> > > +++ b/tools/perf/util/stat-display.c
> > > @@ -114,11 +114,11 @@ static void aggr_printout(struct perf_stat_config *config,
> > >                          fprintf(config->output, "S%d-D%d-C%*d%s",
> > >                                  cpu_map__id_to_socket(id),
> > >                                  cpu_map__id_to_die(id),
> > > -                               config->csv_output ? 0 : -5,
> > > +                               config->csv_output ? 0 : -3,
> > >                                  cpu_map__id_to_cpu(id), config->csv_sep);
> > >                  } else {
> > > -                       fprintf(config->output, "CPU%*d%s ",
> > > -                               config->csv_output ? 0 : -5,
> > > +                       fprintf(config->output, "CPU%*d%s",
> > > +                               config->csv_output ? 0 : -7,
> > >                                  evsel__cpus(evsel)->map[id],
> > >                                  config->csv_sep);
> > 
> > I guess that's ok, but will that work with higher (3-digit) cpu numbers?
> > 
> > jirka
> > 
> 
> Yes, it works with hundreds of CPU. I have tested with that case.
> 
> BTW, do you need me to post a separate patch, or will you add this fix
> to your patch series?

please send a separate patch

thanks,
jirka



* [tip: perf/core] perf stat: Show percore counts in per CPU output
From: tip-bot2 for Jin Yao @ 2020-03-19 14:10 UTC
  To: linux-tip-commits
  Cc: Jin Yao, Ravi Bangoria, Jiri Olsa, Alexander Shishkin,
	Andi Kleen, Kan Liang, Peter Zijlstra, Arnaldo Carvalho de Melo,
	x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     1af62ce61cd80a25590d5abeccfb5c3a666527dd
Gitweb:        https://git.kernel.org/tip/1af62ce61cd80a25590d5abeccfb5c3a666527dd
Author:        Jin Yao <yao.jin@linux.intel.com>
AuthorDate:    Fri, 14 Feb 2020 16:04:52 +08:00
Committer:     Arnaldo Carvalho de Melo <acme@redhat.com>
CommitterDate: Wed, 04 Mar 2020 10:34:09 -03:00

perf stat: Show percore counts in per CPU output

We already support the event modifier "percore", which sums up the event
counts for all hardware threads in a core and shows the counts per core.

For example,

 # perf stat -e cpu/event=cpu-cycles,percore/ -a -A -- sleep 1

  Performance counter stats for 'system wide':

 S0-D0-C0                395,072      cpu/event=cpu-cycles,percore/
 S0-D0-C1                851,248      cpu/event=cpu-cycles,percore/
 S0-D0-C2                954,226      cpu/event=cpu-cycles,percore/
 S0-D0-C3              1,233,659      cpu/event=cpu-cycles,percore/

This patch provides a new option "--percore-show-thread". It is used
together with the event modifier "percore" to sum up the event counts
for all hardware threads in a core but show the counts per hardware thread.

This is essentially a replacement for the any bit (which is gone in
Icelake). Per core counts are useful for some formulas, e.g. CoreIPC.
The original percore output was inconvenient to post-process. This
variant matches the output of the any bit.

With this patch, for example,

 # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread  -- sleep 1

  Performance counter stats for 'system wide':

 CPU0               2,453,061      cpu/event=cpu-cycles,percore/
 CPU1               1,823,921      cpu/event=cpu-cycles,percore/
 CPU2               1,383,166      cpu/event=cpu-cycles,percore/
 CPU3               1,102,652      cpu/event=cpu-cycles,percore/
 CPU4               2,453,061      cpu/event=cpu-cycles,percore/
 CPU5               1,823,921      cpu/event=cpu-cycles,percore/
 CPU6               1,383,166      cpu/event=cpu-cycles,percore/
 CPU7               1,102,652      cpu/event=cpu-cycles,percore/

We can see that the counts are duplicated across CPU pairs (CPU0/CPU4,
CPU1/CPU5, CPU2/CPU6, CPU3/CPU7): the SMT siblings of a core report the
same core-wide sum.

The interval mode also works. For example,

 # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread  -I 1000
 #           time CPU                    counts unit events
      1.000425421 CPU0                 925,032      cpu/event=cpu-cycles,percore/
      1.000425421 CPU1                 430,202      cpu/event=cpu-cycles,percore/
      1.000425421 CPU2                 436,843      cpu/event=cpu-cycles,percore/
      1.000425421 CPU3               1,192,504      cpu/event=cpu-cycles,percore/
      1.000425421 CPU4                 925,032      cpu/event=cpu-cycles,percore/
      1.000425421 CPU5                 430,202      cpu/event=cpu-cycles,percore/
      1.000425421 CPU6                 436,843      cpu/event=cpu-cycles,percore/
      1.000425421 CPU7               1,192,504      cpu/event=cpu-cycles,percore/

If we offline CPU5, the result is:

 # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -- sleep 1

  Performance counter stats for 'system wide':

 CPU0               2,752,148      cpu/event=cpu-cycles,percore/
 CPU1               1,009,312      cpu/event=cpu-cycles,percore/
 CPU2               2,784,072      cpu/event=cpu-cycles,percore/
 CPU3               2,427,922      cpu/event=cpu-cycles,percore/
 CPU4               2,752,148      cpu/event=cpu-cycles,percore/
 CPU6               2,784,072      cpu/event=cpu-cycles,percore/
 CPU7               2,427,922      cpu/event=cpu-cycles,percore/

        1.001416041 seconds time elapsed

 v4:
 ---
 Ravi Bangoria reported an issue in v3: once we offline a CPU,
 the output is not correct. The fix is to use the cpu idx in
 print_percore_thread rather than the cpu value.

 v3:
 ---
 1. Fix the interval mode output error
 2. Use cpu value (not cpu index) in config->aggr_get_id().
 3. Refine the code according to Jiri's comments.

 v2:
 ---
 Add the explanation in the change log. This is essentially a
 replacement for the any bit. No code change.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20200214080452.26402-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-stat.txt |  9 +++++++-
 tools/perf/builtin-stat.c              |  4 +++-
 tools/perf/util/stat-display.c         | 33 +++++++++++++++++++++----
 tools/perf/util/stat.h                 |  1 +-
 4 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 9431b80..4d56586 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -334,6 +334,15 @@ Configure all used events to run in kernel space.
 --all-user::
 Configure all used events to run in user space.
 
+--percore-show-thread::
+The event modifier "percore" sums up the event counts for all hardware
+threads in a core and shows the counts per core.
+
+This option, used together with the event modifier "percore", also sums
+up the event counts for all hardware threads in a core but shows the
+sum counts per hardware thread. This is essentially a replacement for
+the any bit and is convenient for post-processing.
+
 EXAMPLES
 --------
 
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index a098c2e..ec053dc 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -929,6 +929,10 @@ static struct option stat_options[] = {
 	OPT_BOOLEAN_FLAG(0, "all-user", &stat_config.all_user,
 			 "Configure all used events to run in user space.",
 			 PARSE_OPT_EXCLUSIVE),
+	OPT_BOOLEAN(0, "percore-show-thread", &stat_config.percore_show_thread,
+		    "Use with the 'percore' event qualifier to show the "
+		    "event counts of one hardware thread summed up over all "
+		    "hardware threads of the same physical core"),
 	OPT_END()
 };
 
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index bc31fcc..d89cb0d 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -110,7 +110,7 @@ static void aggr_printout(struct perf_stat_config *config,
 			config->csv_sep);
 			break;
 	case AGGR_NONE:
-		if (evsel->percore) {
+		if (evsel->percore && !config->percore_show_thread) {
 			fprintf(config->output, "S%d-D%d-C%*d%s",
 				cpu_map__id_to_socket(id),
 				cpu_map__id_to_die(id),
@@ -628,7 +628,7 @@ static void aggr_cb(struct perf_stat_config *config,
 static void print_counter_aggrdata(struct perf_stat_config *config,
 				   struct evsel *counter, int s,
 				   char *prefix, bool metric_only,
-				   bool *first)
+				   bool *first, int cpu)
 {
 	struct aggr_data ad;
 	FILE *output = config->output;
@@ -654,7 +654,7 @@ static void print_counter_aggrdata(struct perf_stat_config *config,
 		fprintf(output, "%s", prefix);
 
 	uval = val * counter->scale;
-	printout(config, id, nr, counter, uval, prefix,
+	printout(config, cpu != -1 ? cpu : id, nr, counter, uval, prefix,
 		 run, ena, 1.0, &rt_stat);
 	if (!metric_only)
 		fputc('\n', output);
@@ -687,7 +687,7 @@ static void print_aggr(struct perf_stat_config *config,
 		evlist__for_each_entry(evlist, counter) {
 			print_counter_aggrdata(config, counter, s,
 					       prefix, metric_only,
-					       &first);
+					       &first, -1);
 		}
 		if (metric_only)
 			fputc('\n', output);
@@ -1146,6 +1146,26 @@ static void print_footer(struct perf_stat_config *config)
 			"the same PMU. Try reorganizing the group.\n");
 }
 
+static void print_percore_thread(struct perf_stat_config *config,
+				 struct evsel *counter, char *prefix)
+{
+	int s, s2, id;
+	bool first = true;
+
+	for (int i = 0; i < perf_evsel__nr_cpus(counter); i++) {
+		s2 = config->aggr_get_id(config, evsel__cpus(counter), i);
+		for (s = 0; s < config->aggr_map->nr; s++) {
+			id = config->aggr_map->map[s];
+			if (s2 == id)
+				break;
+		}
+
+		print_counter_aggrdata(config, counter, s,
+				       prefix, false,
+				       &first, i);
+	}
+}
+
 static void print_percore(struct perf_stat_config *config,
 			  struct evsel *counter, char *prefix)
 {
@@ -1157,13 +1177,16 @@ static void print_percore(struct perf_stat_config *config,
 	if (!(config->aggr_map || config->aggr_get_id))
 		return;
 
+	if (config->percore_show_thread)
+		return print_percore_thread(config, counter, prefix);
+
 	for (s = 0; s < config->aggr_map->nr; s++) {
 		if (prefix && metric_only)
 			fprintf(output, "%s", prefix);
 
 		print_counter_aggrdata(config, counter, s,
 				       prefix, metric_only,
-				       &first);
+				       &first, -1);
 	}
 
 	if (metric_only)
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index fb990ef..b4fdfaa 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -109,6 +109,7 @@ struct perf_stat_config {
 	bool			 walltime_run_table;
 	bool			 all_kernel;
 	bool			 all_user;
+	bool			 percore_show_thread;
 	FILE			*output;
 	unsigned int		 interval;
 	unsigned int		 timeout;

