All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 1/2] perf stat: Use proper cpu for shadow stats
@ 2020-11-27  4:14 Namhyung Kim
  2020-11-27  4:14 ` [PATCH v2 2/2] perf test: Add shadow stat test Namhyung Kim
  2020-11-27 17:32 ` [PATCH v2 1/2] perf stat: Use proper cpu for shadow stats Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 4+ messages in thread
From: Namhyung Kim @ 2020-11-27  4:14 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Mark Rutland, Alexander Shishkin,
	LKML, Stephane Eranian, Ian Rogers, Sam Xi, Andi Kleen

Currently perf stat shows some metrics (like IPC) for defined events.
But when no aggregation mode is used (-A option), it shows incorrect
values since it used a value from a different cpu.

Before:

  $ perf stat -aA -e cycles,instructions sleep 1

   Performance counter stats for 'system wide':

  CPU0      116,057,380      cycles
  CPU1       86,084,722      cycles
  CPU2       99,423,125      cycles
  CPU3       98,272,994      cycles
  CPU0       53,369,217      instructions      #    0.46  insn per cycle
  CPU1       33,378,058      instructions      #    0.29  insn per cycle
  CPU2       58,150,086      instructions      #    0.50  insn per cycle
  CPU3       40,029,703      instructions      #    0.34  insn per cycle

       1.001816971 seconds time elapsed

So the IPC for CPU1 should be 0.38 (= 33,378,058 / 86,084,722)
but it was 0.29 (= 33,378,058 / 116,057,380) and so on.

After:

  $ perf stat -aA -e cycles,instructions sleep 1

   Performance counter stats for 'system wide':

  CPU0      109,621,384      cycles
  CPU1      159,026,454      cycles
  CPU2       99,460,366      cycles
  CPU3      124,144,142      cycles
  CPU0       44,396,706      instructions      #    0.41  insn per cycle
  CPU1      120,195,425      instructions      #    0.76  insn per cycle
  CPU2       44,763,978      instructions      #    0.45  insn per cycle
  CPU3       69,049,079      instructions      #    0.56  insn per cycle

       1.001910444 seconds time elapsed

Reported-by: Sam Xi <xyzsam@google.com>
Fixes: 44d49a600259 ("perf stat: Support metrics in --per-core/socket mode")
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/stat-display.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 4b57c0c07632..a963b5b8eb72 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -324,13 +324,10 @@ static int first_shadow_cpu(struct perf_stat_config *config,
 	struct evlist *evlist = evsel->evlist;
 	int i;
 
-	if (!config->aggr_get_id)
-		return 0;
-
 	if (config->aggr_mode == AGGR_NONE)
 		return id;
 
-	if (config->aggr_mode == AGGR_GLOBAL)
+	if (!config->aggr_get_id)
 		return 0;
 
 	for (i = 0; i < evsel__nr_cpus(evsel); i++) {
-- 
2.29.2.454.gaff20da3a2-goog


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v2 2/2] perf test: Add shadow stat test
  2020-11-27  4:14 [PATCH v2 1/2] perf stat: Use proper cpu for shadow stats Namhyung Kim
@ 2020-11-27  4:14 ` Namhyung Kim
  2020-11-30 11:59   ` Arnaldo Carvalho de Melo
  2020-11-27 17:32 ` [PATCH v2 1/2] perf stat: Use proper cpu for shadow stats Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 4+ messages in thread
From: Namhyung Kim @ 2020-11-27  4:14 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, Mark Rutland, Alexander Shishkin,
	LKML, Stephane Eranian, Ian Rogers

It calculates IPC from the cycles and instruction counts and compares
it with the shadow stat for both global aggregation (default) and no
aggregation mode.

 $ perf stat -a -A -e cycles,instructions sleep 1

   Performance counter stats for 'system wide':

  CPU0   39,580,880      cycles
  CPU1   45,426,945      cycles
  CPU2   31,151,685      cycles
  CPU3   55,167,421      cycles
  CPU0   17,073,564      instructions      #    0.43  insn per cycle
  CPU1   34,955,764      instructions      #    0.77  insn per cycle
  CPU2   15,688,459      instructions      #    0.50  insn per cycle
  CPU3   34,699,217      instructions      #    0.63  insn per cycle

       1.003275495 seconds time elapsed

In this example, the 'insn per cycle' should be matched to the number
for each cpu.  For CPU2, 0.50 = 15,688,459 / 31,151,685 .

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/tests/shell/stat+shadow_stat.sh | 80 ++++++++++++++++++++++
 1 file changed, 80 insertions(+)
 create mode 100755 tools/perf/tests/shell/stat+shadow_stat.sh

diff --git a/tools/perf/tests/shell/stat+shadow_stat.sh b/tools/perf/tests/shell/stat+shadow_stat.sh
new file mode 100755
index 000000000000..249dfe48cf6a
--- /dev/null
+++ b/tools/perf/tests/shell/stat+shadow_stat.sh
@@ -0,0 +1,80 @@
+#!/bin/sh
+# perf stat metrics (shadow stat) test
+# SPDX-License-Identifier: GPL-2.0
+
+set -e
+
+# skip if system-wide mode is forbidden
+perf stat -a true > /dev/null 2>&1 || exit 2
+
+test_global_aggr()
+{
+	local cyc
+
+	perf stat -a --no-big-num -e cycles,instructions sleep 1  2>&1 | \
+	grep -e cycles -e instructions | \
+	while read num evt hash ipc rest
+	do
+		# skip not counted events
+		if [[ $num == "<not" ]]; then
+			continue
+		fi
+
+		# save cycles count
+		if [[ $evt == "cycles" ]]; then
+			cyc=$num
+			continue
+		fi
+
+		# skip if no cycles
+		if [[ -z $cyc ]]; then
+			continue
+		fi
+
+		# use printf for rounding and a leading zero
+		local res=`printf "%.2f" $(echo "scale=6; $num / $cyc" | bc -q)`
+		if [[ $ipc != $res ]]; then
+			echo "IPC is different: $res != $ipc  ($num / $cyc)"
+			exit 1
+		fi
+	done
+}
+
+test_no_aggr()
+{
+	declare -A results
+
+	perf stat -a -A --no-big-num -e cycles,instructions sleep 1  2>&1 | \
+	grep ^CPU | \
+	while read cpu num evt hash ipc rest
+	do
+		# skip not counted events
+		if [[ $num == "<not" ]]; then
+			continue
+		fi
+
+		# save cycles count
+		if [[ $evt == "cycles" ]]; then
+			results[$cpu]=$num
+			continue
+		fi
+
+		# skip if no cycles
+		local cyc=${results[$cpu]}
+		if [[ -z $cyc ]]; then
+			continue
+		fi
+
+		# use printf for rounding and a leading zero
+		local res=`printf "%.2f" $(echo "scale=6; $num / $cyc" | bc -q)`
+		if [[ $ipc != $res ]]; then
+			echo "IPC is different for $cpu: $res != $ipc  ($num / $cyc)"
+			exit 1
+		fi
+	done
+}
+
+test_global_aggr
+test_no_aggr
+
+exit 0
-- 
2.29.2.454.gaff20da3a2-goog


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 1/2] perf stat: Use proper cpu for shadow stats
  2020-11-27  4:14 [PATCH v2 1/2] perf stat: Use proper cpu for shadow stats Namhyung Kim
  2020-11-27  4:14 ` [PATCH v2 2/2] perf test: Add shadow stat test Namhyung Kim
@ 2020-11-27 17:32 ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 4+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-11-27 17:32 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, Mark Rutland,
	Alexander Shishkin, LKML, Stephane Eranian, Ian Rogers, Sam Xi,
	Andi Kleen

Em Fri, Nov 27, 2020 at 01:14:03PM +0900, Namhyung Kim escreveu:
> Currently perf stat shows some metrics (like IPC) for defined events.
> But when no aggregation mode is used (-A option), it shows incorrect
> values since it used a value from a different cpu.
> 
> Before:
> 
>   $ perf stat -aA -e cycles,instructions sleep 1
> 
>    Performance counter stats for 'system wide':
> 
>   CPU0      116,057,380      cycles
>   CPU1       86,084,722      cycles
>   CPU2       99,423,125      cycles
>   CPU3       98,272,994      cycles
>   CPU0       53,369,217      instructions      #    0.46  insn per cycle
>   CPU1       33,378,058      instructions      #    0.29  insn per cycle
>   CPU2       58,150,086      instructions      #    0.50  insn per cycle
>   CPU3       40,029,703      instructions      #    0.34  insn per cycle
> 
>        1.001816971 seconds time elapsed
> 
> So the IPC for CPU1 should be 0.38 (= 33,378,058 / 86,084,722)
> but it was 0.29 (= 33,378,058 / 116,057,380) and so on.
> 
> After:
> 
>   $ perf stat -aA -e cycles,instructions sleep 1
> 
>    Performance counter stats for 'system wide':
> 
>   CPU0      109,621,384      cycles
>   CPU1      159,026,454      cycles
>   CPU2       99,460,366      cycles
>   CPU3      124,144,142      cycles
>   CPU0       44,396,706      instructions      #    0.41  insn per cycle
>   CPU1      120,195,425      instructions      #    0.76  insn per cycle
>   CPU2       44,763,978      instructions      #    0.45  insn per cycle
>   CPU3       69,049,079      instructions      #    0.56  insn per cycle
> 
>        1.001910444 seconds time elapsed

Thanks, applied, the new 'perf test' entry in 2/2 will be merged into
perf/core, as it isn't purely a fix,

- Arnaldo
 
> Reported-by: Sam Xi <xyzsam@google.com>
> Fixes: 44d49a600259 ("perf stat: Support metrics in --per-core/socket mode")
> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> Acked-by: Jiri Olsa <jolsa@redhat.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/util/stat-display.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index 4b57c0c07632..a963b5b8eb72 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -324,13 +324,10 @@ static int first_shadow_cpu(struct perf_stat_config *config,
>  	struct evlist *evlist = evsel->evlist;
>  	int i;
>  
> -	if (!config->aggr_get_id)
> -		return 0;
> -
>  	if (config->aggr_mode == AGGR_NONE)
>  		return id;
>  
> -	if (config->aggr_mode == AGGR_GLOBAL)
> +	if (!config->aggr_get_id)
>  		return 0;
>  
>  	for (i = 0; i < evsel__nr_cpus(evsel); i++) {
> -- 
> 2.29.2.454.gaff20da3a2-goog
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 2/2] perf test: Add shadow stat test
  2020-11-27  4:14 ` [PATCH v2 2/2] perf test: Add shadow stat test Namhyung Kim
@ 2020-11-30 11:59   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 4+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-11-30 11:59 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Ingo Molnar, Peter Zijlstra, Mark Rutland,
	Alexander Shishkin, LKML, Stephane Eranian, Ian Rogers

Em Fri, Nov 27, 2020 at 01:14:04PM +0900, Namhyung Kim escreveu:
> It calculates IPC from the cycles and instruction counts and compares
> it with the shadow stat for both global aggregation (default) and no
> aggregation mode.
> 
>  $ perf stat -a -A -e cycles,instructions sleep 1
> 
>    Performance counter stats for 'system wide':
> 
>   CPU0   39,580,880      cycles
>   CPU1   45,426,945      cycles
>   CPU2   31,151,685      cycles
>   CPU3   55,167,421      cycles
>   CPU0   17,073,564      instructions      #    0.43  insn per cycle
>   CPU1   34,955,764      instructions      #    0.77  insn per cycle
>   CPU2   15,688,459      instructions      #    0.50  insn per cycle
>   CPU3   34,699,217      instructions      #    0.63  insn per cycle
> 
>        1.003275495 seconds time elapsed
> 
> In this example, the 'insn per cycle' should be matched to the number
> for each cpu.  For CPU2, 0.50 = 15,688,459 / 31,151,685 .

Merged torvalds/master into perf/core and applied this patch,

Thanks,

- Arnaldo
 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/tests/shell/stat+shadow_stat.sh | 80 ++++++++++++++++++++++
>  1 file changed, 80 insertions(+)
>  create mode 100755 tools/perf/tests/shell/stat+shadow_stat.sh
> 
> diff --git a/tools/perf/tests/shell/stat+shadow_stat.sh b/tools/perf/tests/shell/stat+shadow_stat.sh
> new file mode 100755
> index 000000000000..249dfe48cf6a
> --- /dev/null
> +++ b/tools/perf/tests/shell/stat+shadow_stat.sh
> @@ -0,0 +1,80 @@
> +#!/bin/sh
> +# perf stat metrics (shadow stat) test
> +# SPDX-License-Identifier: GPL-2.0
> +
> +set -e
> +
> +# skip if system-wide mode is forbidden
> +perf stat -a true > /dev/null 2>&1 || exit 2
> +
> +test_global_aggr()
> +{
> +	local cyc
> +
> +	perf stat -a --no-big-num -e cycles,instructions sleep 1  2>&1 | \
> +	grep -e cycles -e instructions | \
> +	while read num evt hash ipc rest
> +	do
> +		# skip not counted events
> +		if [[ $num == "<not" ]]; then
> +			continue
> +		fi
> +
> +		# save cycles count
> +		if [[ $evt == "cycles" ]]; then
> +			cyc=$num
> +			continue
> +		fi
> +
> +		# skip if no cycles
> +		if [[ -z $cyc ]]; then
> +			continue
> +		fi
> +
> +		# use printf for rounding and a leading zero
> +		local res=`printf "%.2f" $(echo "scale=6; $num / $cyc" | bc -q)`
> +		if [[ $ipc != $res ]]; then
> +			echo "IPC is different: $res != $ipc  ($num / $cyc)"
> +			exit 1
> +		fi
> +	done
> +}
> +
> +test_no_aggr()
> +{
> +	declare -A results
> +
> +	perf stat -a -A --no-big-num -e cycles,instructions sleep 1  2>&1 | \
> +	grep ^CPU | \
> +	while read cpu num evt hash ipc rest
> +	do
> +		# skip not counted events
> +		if [[ $num == "<not" ]]; then
> +			continue
> +		fi
> +
> +		# save cycles count
> +		if [[ $evt == "cycles" ]]; then
> +			results[$cpu]=$num
> +			continue
> +		fi
> +
> +		# skip if no cycles
> +		local cyc=${results[$cpu]}
> +		if [[ -z $cyc ]]; then
> +			continue
> +		fi
> +
> +		# use printf for rounding and a leading zero
> +		local res=`printf "%.2f" $(echo "scale=6; $num / $cyc" | bc -q)`
> +		if [[ $ipc != $res ]]; then
> +			echo "IPC is different for $cpu: $res != $ipc  ($num / $cyc)"
> +			exit 1
> +		fi
> +	done
> +}
> +
> +test_global_aggr
> +test_no_aggr
> +
> +exit 0
> -- 
> 2.29.2.454.gaff20da3a2-goog
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-11-30 12:00 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-27  4:14 [PATCH v2 1/2] perf stat: Use proper cpu for shadow stats Namhyung Kim
2020-11-27  4:14 ` [PATCH v2 2/2] perf test: Add shadow stat test Namhyung Kim
2020-11-30 11:59   ` Arnaldo Carvalho de Melo
2020-11-27 17:32 ` [PATCH v2 1/2] perf stat: Use proper cpu for shadow stats Arnaldo Carvalho de Melo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.