* [PATCH v5 1/5] perf stat: Fix wrong per-thread runtime stat for interval mode
2020-05-14 5:36 [PATCH v5 0/5] perf stat: Support overall statistics for interval mode Jin Yao
@ 2020-05-14 5:36 ` Jin Yao
2020-05-14 5:36 ` [PATCH v5 2/5] perf counts: Reset prev_raw_counts counts Jin Yao
` (4 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Jin Yao @ 2020-05-14 5:36 UTC
To: acme, jolsa, peterz, mingo, alexander.shishkin
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
root@kbl-ppc:~# perf stat --per-thread -e cycles,instructions -I1000 --interval-count 2
1.004171683 perf-3696 8,747,311 cycles
...
1.004171683 perf-3696 691,730 instructions # 0.08 insn per cycle
...
2.006490373 perf-3696 1,749,936 cycles
...
2.006490373 perf-3696 1,484,582 instructions # 0.28 insn per cycle
...
Let's see interval 2.006490373
perf-3696 1,749,936 cycles
perf-3696 1,484,582 instructions # 0.28 insn per cycle
insn per cycle = 1,484,582 / 1,749,936 = 0.85.
But 0.28 is reported, which is not correct.
stat_config.stats[] records the per-thread runtime stat. But for interval
mode, it should be reset for each interval.
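The stale 0.28 is consistent with the cycles stat being kept as a running mean across intervals rather than per interval. Below is a minimal sketch of that effect, assuming a simplified accumulator; struct avg_stat and its helpers are hypothetical stand-ins, not the perf structures.

```c
#include <assert.h>

/* Hypothetical, simplified running-average accumulator (not perf's type). */
struct avg_stat {
	double n;
	double mean;
};

static void avg_stat_reset(struct avg_stat *s)
{
	s->n = 0.0;
	s->mean = 0.0;
}

/* Fold one sample into the running mean. */
static void avg_stat_update(struct avg_stat *s, double val)
{
	s->n += 1.0;
	s->mean += (val - s->mean) / s->n;
}

/*
 * Return insn per cycle for the second interval of the first example.
 * With reset_between == 0 the cycles accumulator still holds the first
 * interval's sample, which is the situation before the fix.
 */
double second_interval_ipc(int reset_between)
{
	struct avg_stat cycles;

	avg_stat_reset(&cycles);
	avg_stat_update(&cycles, 8747311.0);	/* interval 1 cycles */

	if (reset_between)
		avg_stat_reset(&cycles);

	avg_stat_update(&cycles, 1749936.0);	/* interval 2 cycles */
	assert(cycles.mean > 0.0);

	return 1484582.0 / cycles.mean;		/* interval 2 instructions */
}
```

Without the reset the divisor is the mean of both intervals' cycles, so the ratio lands near 0.28; with the reset it is the expected 0.85.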
So now, with this patch,
root@kbl-ppc:~# perf stat --per-thread -e cycles,instructions -I1000 --interval-count 2
1.005818121 perf-8633 9,898,045 cycles
...
1.005818121 perf-8633 693,298 instructions # 0.07 insn per cycle
...
2.007863743 perf-8633 1,551,619 cycles
...
2.007863743 perf-8633 1,317,514 instructions # 0.85 insn per cycle
...
Let's check interval 2.007863743.
insn per cycle = 1,317,514 / 1,551,619 = 0.85. It's correct.
This patch creates runtime_stat_reset, places it next to
runtime_stat_new/runtime_stat_delete and moves all runtime_stat
functions before process_interval.
v4:
---
Create runtime_stat_reset.
Fixes: 14e72a21c783 ("perf stat: Update or print per-thread stats")
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
tools/perf/builtin-stat.c | 70 +++++++++++++++++++++++----------------
1 file changed, 41 insertions(+), 29 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index e0c1ad23c768..f3b3a59ac7d2 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -351,6 +351,46 @@ static void read_counters(struct timespec *rs)
}
}
+static int runtime_stat_new(struct perf_stat_config *config, int nthreads)
+{
+ int i;
+
+ config->stats = calloc(nthreads, sizeof(struct runtime_stat));
+ if (!config->stats)
+ return -1;
+
+ config->stats_num = nthreads;
+
+ for (i = 0; i < nthreads; i++)
+ runtime_stat__init(&config->stats[i]);
+
+ return 0;
+}
+
+static void runtime_stat_delete(struct perf_stat_config *config)
+{
+ int i;
+
+ if (!config->stats)
+ return;
+
+ for (i = 0; i < config->stats_num; i++)
+ runtime_stat__exit(&config->stats[i]);
+
+ zfree(&config->stats);
+}
+
+static void runtime_stat_reset(struct perf_stat_config *config)
+{
+ int i;
+
+ if (!config->stats)
+ return;
+
+ for (i = 0; i < config->stats_num; i++)
+ perf_stat__reset_shadow_per_stat(&config->stats[i]);
+}
+
static void process_interval(void)
{
struct timespec ts, rs;
@@ -359,6 +399,7 @@ static void process_interval(void)
diff_timespec(&rs, &ts, &ref_time);
perf_stat__reset_shadow_per_stat(&rt_stat);
+ runtime_stat_reset(&stat_config);
read_counters(&rs);
if (STAT_RECORD) {
@@ -1737,35 +1778,6 @@ int process_cpu_map_event(struct perf_session *session,
return set_maps(st);
}
-static int runtime_stat_new(struct perf_stat_config *config, int nthreads)
-{
- int i;
-
- config->stats = calloc(nthreads, sizeof(struct runtime_stat));
- if (!config->stats)
- return -1;
-
- config->stats_num = nthreads;
-
- for (i = 0; i < nthreads; i++)
- runtime_stat__init(&config->stats[i]);
-
- return 0;
-}
-
-static void runtime_stat_delete(struct perf_stat_config *config)
-{
- int i;
-
- if (!config->stats)
- return;
-
- for (i = 0; i < config->stats_num; i++)
- runtime_stat__exit(&config->stats[i]);
-
- zfree(&config->stats);
-}
-
static const char * const stat_report_usage[] = {
"perf stat report [<options>]",
NULL,
--
2.17.1
* [PATCH v5 2/5] perf counts: Reset prev_raw_counts counts
2020-05-14 5:36 [PATCH v5 0/5] perf stat: Support overall statistics for interval mode Jin Yao
2020-05-14 5:36 ` [PATCH v5 1/5] perf stat: Fix wrong per-thread runtime stat " Jin Yao
@ 2020-05-14 5:36 ` Jin Yao
2020-05-14 5:36 ` [PATCH v5 3/5] perf stat: Copy counts from prev_raw_counts to evsel->counts Jin Yao
` (3 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Jin Yao @ 2020-05-14 5:36 UTC
To: acme, jolsa, peterz, mingo, alexander.shishkin
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
When we want to reset evsel->prev_raw_counts, zeroing the aggr
is not enough; we need to reset the perf_counts too.
perf_counts__reset zeros the perf_counts, and it should zero
the aggr too. This patch makes perf_counts__reset non-static
and calls it in evsel__reset_prev_raw_counts to reset
prev_raw_counts.
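To illustrate why zeroing only the aggr leaves stale data behind, here is a minimal sketch, assuming simplified fixed-size types; struct counts, struct counts_values and both helpers are hypothetical stand-ins, and the real perf_counts uses xyarrays.

```c
#include <assert.h>
#include <string.h>

#define NCPUS    2	/* illustrative sizes, not taken from perf */
#define NTHREADS 2

/* Hypothetical, simplified stand-ins for perf_counts_values / perf_counts. */
struct counts_values {
	unsigned long long val, ena, run;
};

struct counts {
	struct counts_values aggr;
	struct counts_values values[NCPUS][NTHREADS];
};

/* Old behaviour: only the aggregate is cleared; per-cpu entries stay stale. */
void counts_reset_aggr_only(struct counts *c)
{
	assert(c != NULL);
	c->aggr.val = 0;
	c->aggr.ena = 0;
	c->aggr.run = 0;
}

/* New behaviour: clear the per-cpu/thread entries and the aggregate together. */
void counts_reset_all(struct counts *c)
{
	assert(c != NULL);
	memset(c->values, 0, sizeof(c->values));
	memset(&c->aggr, 0, sizeof(c->aggr));
}
```

After counts_reset_aggr_only() a per-cpu entry still carries its previous value, so any later delta computed against it would be wrong; counts_reset_all() avoids that.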
v4:
---
Zero the aggr in perf_counts__reset and use it to reset
prev_raw_counts.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
tools/perf/util/counts.c | 4 +++-
tools/perf/util/counts.h | 1 +
tools/perf/util/stat.c | 7 ++-----
3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/tools/perf/util/counts.c b/tools/perf/util/counts.c
index 615c9f3e95cb..582f3aeaf5e4 100644
--- a/tools/perf/util/counts.c
+++ b/tools/perf/util/counts.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include <errno.h>
#include <stdlib.h>
+#include <string.h>
#include "evsel.h"
#include "counts.h"
#include <linux/zalloc.h>
@@ -42,10 +43,11 @@ void perf_counts__delete(struct perf_counts *counts)
}
}
-static void perf_counts__reset(struct perf_counts *counts)
+void perf_counts__reset(struct perf_counts *counts)
{
xyarray__reset(counts->loaded);
xyarray__reset(counts->values);
+ memset(&counts->aggr, 0, sizeof(struct perf_counts_values));
}
void evsel__reset_counts(struct evsel *evsel)
diff --git a/tools/perf/util/counts.h b/tools/perf/util/counts.h
index 8f556c6d98fa..7ff36bf6d644 100644
--- a/tools/perf/util/counts.h
+++ b/tools/perf/util/counts.h
@@ -37,6 +37,7 @@ perf_counts__set_loaded(struct perf_counts *counts, int cpu, int thread, bool lo
struct perf_counts *perf_counts__new(int ncpus, int nthreads);
void perf_counts__delete(struct perf_counts *counts);
+void perf_counts__reset(struct perf_counts *counts);
void evsel__reset_counts(struct evsel *evsel);
int evsel__alloc_counts(struct evsel *evsel, int ncpus, int nthreads);
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index f4a44df9b221..e397815f0dfb 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -163,11 +163,8 @@ static void evsel__free_prev_raw_counts(struct evsel *evsel)
static void evsel__reset_prev_raw_counts(struct evsel *evsel)
{
- if (evsel->prev_raw_counts) {
- evsel->prev_raw_counts->aggr.val = 0;
- evsel->prev_raw_counts->aggr.ena = 0;
- evsel->prev_raw_counts->aggr.run = 0;
- }
+ if (evsel->prev_raw_counts)
+ perf_counts__reset(evsel->prev_raw_counts);
}
static int evsel__alloc_stats(struct evsel *evsel, bool alloc_raw)
--
2.17.1
* [PATCH v5 3/5] perf stat: Copy counts from prev_raw_counts to evsel->counts
2020-05-14 5:36 [PATCH v5 0/5] perf stat: Support overall statistics for interval mode Jin Yao
2020-05-14 5:36 ` [PATCH v5 1/5] perf stat: Fix wrong per-thread runtime stat " Jin Yao
2020-05-14 5:36 ` [PATCH v5 2/5] perf counts: Reset prev_raw_counts counts Jin Yao
@ 2020-05-14 5:36 ` Jin Yao
2020-05-14 5:36 ` [PATCH v5 4/5] perf stat: Save aggr value to first member of prev_raw_counts Jin Yao
` (2 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Jin Yao @ 2020-05-14 5:36 UTC
To: acme, jolsa, peterz, mingo, alexander.shishkin
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
It would be useful to support the overall statistics for perf-stat
interval mode. For example, report the summary at the end of
"perf-stat -I" output.
But since perf-stat supports many aggregation modes, such as
--per-thread, --per-socket, -M, etc., we need a solution which
doesn't add much complexity.
The idea is to use 'evsel->prev_raw_counts', which is updated in
each interval and always holds the latest counts. Before reporting
the summary, we copy the counts from evsel->prev_raw_counts to
evsel->counts, and then we just follow the non-interval processing.
v5:
---
Don't save the previous aggr value to the member of [cpu0,thread0]
in perf_counts. Originally that was a trick because
perf_stat_process_counter would create aggr values from per-cpu
values. But we don't need to do that all the time. We will
handle it in the next patch.
v4:
---
Change the commit message.
No functional change.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
tools/perf/util/stat.c | 24 ++++++++++++++++++++++++
tools/perf/util/stat.h | 1 +
2 files changed, 25 insertions(+)
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index e397815f0dfb..aadc723ce871 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -225,6 +225,30 @@ void perf_evlist__reset_prev_raw_counts(struct evlist *evlist)
evsel__reset_prev_raw_counts(evsel);
}
+static void perf_evsel__copy_prev_raw_counts(struct evsel *evsel)
+{
+ int ncpus = evsel__nr_cpus(evsel);
+ int nthreads = perf_thread_map__nr(evsel->core.threads);
+
+ for (int thread = 0; thread < nthreads; thread++) {
+ for (int cpu = 0; cpu < ncpus; cpu++) {
+ *perf_counts(evsel->counts, cpu, thread) =
+ *perf_counts(evsel->prev_raw_counts, cpu,
+ thread);
+ }
+ }
+
+ evsel->counts->aggr = evsel->prev_raw_counts->aggr;
+}
+
+void perf_evlist__copy_prev_raw_counts(struct evlist *evlist)
+{
+ struct evsel *evsel;
+
+ evlist__for_each_entry(evlist, evsel)
+ perf_evsel__copy_prev_raw_counts(evsel);
+}
+
static void zero_per_pkg(struct evsel *counter)
{
if (counter->per_pkg_mask)
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index b4fdfaa7f2c0..62cf72c71869 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -198,6 +198,7 @@ int perf_evlist__alloc_stats(struct evlist *evlist, bool alloc_raw);
void perf_evlist__free_stats(struct evlist *evlist);
void perf_evlist__reset_stats(struct evlist *evlist);
void perf_evlist__reset_prev_raw_counts(struct evlist *evlist);
+void perf_evlist__copy_prev_raw_counts(struct evlist *evlist);
int perf_stat_process_counter(struct perf_stat_config *config,
struct evsel *counter);
--
2.17.1
* [PATCH v5 4/5] perf stat: Save aggr value to first member of prev_raw_counts
2020-05-14 5:36 [PATCH v5 0/5] perf stat: Support overall statistics for interval mode Jin Yao
` (2 preceding siblings ...)
2020-05-14 5:36 ` [PATCH v5 3/5] perf stat: Copy counts from prev_raw_counts to evsel->counts Jin Yao
@ 2020-05-14 5:36 ` Jin Yao
2020-05-18 12:48 ` Jiri Olsa
2020-05-14 5:36 ` [PATCH v5 5/5] perf stat: Report summary for interval mode Jin Yao
2020-05-14 9:53 ` [PATCH v5 0/5] perf stat: Support overall statistics " kajoljain
5 siblings, 1 reply; 12+ messages in thread
From: Jin Yao @ 2020-05-14 5:36 UTC
To: acme, jolsa, peterz, mingo, alexander.shishkin
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
To collect the overall statistics for interval mode, we copy the
counts from evsel->prev_raw_counts to evsel->counts.
In AGGR_GLOBAL mode, perf_stat_process_counter creates the aggr
values from the per-cpu values, but the per-cpu values are 0,
so the calculated aggr values will always be 0.
This patch uses a trick: it saves the previous aggr value to
the first member of perf_counts, so that the aggr calculation in
process_counter_values works correctly for AGGR_GLOBAL.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
tools/perf/util/stat.c | 10 ++++++++++
tools/perf/util/stat.h | 1 +
2 files changed, 11 insertions(+)
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index aadc723ce871..fbabdd5b9b62 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -249,6 +249,16 @@ void perf_evlist__copy_prev_raw_counts(struct evlist *evlist)
perf_evsel__copy_prev_raw_counts(evsel);
}
+void perf_evlist__save_aggr_prev_raw_counts(struct evlist *evlist)
+{
+ struct evsel *evsel;
+
+ evlist__for_each_entry(evlist, evsel) {
+ *perf_counts(evsel->prev_raw_counts, 0, 0) =
+ evsel->prev_raw_counts->aggr;
+ }
+}
+
static void zero_per_pkg(struct evsel *counter)
{
if (counter->per_pkg_mask)
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 62cf72c71869..18ead55756cc 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -199,6 +199,7 @@ void perf_evlist__free_stats(struct evlist *evlist);
void perf_evlist__reset_stats(struct evlist *evlist);
void perf_evlist__reset_prev_raw_counts(struct evlist *evlist);
void perf_evlist__copy_prev_raw_counts(struct evlist *evlist);
+void perf_evlist__save_aggr_prev_raw_counts(struct evlist *evlist);
int perf_stat_process_counter(struct perf_stat_config *config,
struct evsel *counter);
--
2.17.1
* Re: [PATCH v5 4/5] perf stat: Save aggr value to first member of prev_raw_counts
2020-05-14 5:36 ` [PATCH v5 4/5] perf stat: Save aggr value to first member of prev_raw_counts Jin Yao
@ 2020-05-18 12:48 ` Jiri Olsa
2020-05-18 14:44 ` Jin, Yao
0 siblings, 1 reply; 12+ messages in thread
From: Jiri Olsa @ 2020-05-18 12:48 UTC
To: Jin Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Thu, May 14, 2020 at 01:36:37PM +0800, Jin Yao wrote:
> To collect the overall statistics for interval mode, we copy the
> counts from evsel->prev_raw_counts to evsel->counts.
>
> For AGGR_GLOBAL mode, because the perf_stat_process_counter creates
> aggr values from per cpu values, but the per cpu values are 0,
> so the calculated aggr values will be always 0.
>
> This patch uses a trick that saves the previous aggr value to
> the first member of perf_counts, then aggr calculation in
> process_counter_values can work correctly for AGGR_GLOBAL.
>
> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
> ---
> tools/perf/util/stat.c | 10 ++++++++++
> tools/perf/util/stat.h | 1 +
> 2 files changed, 11 insertions(+)
>
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index aadc723ce871..fbabdd5b9b62 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -249,6 +249,16 @@ void perf_evlist__copy_prev_raw_counts(struct evlist *evlist)
> perf_evsel__copy_prev_raw_counts(evsel);
> }
>
much better, please put some comments in here explaining what
this is for, because it's not obvious ;-)
thanks,
jirka
> +void perf_evlist__save_aggr_prev_raw_counts(struct evlist *evlist)
> +{
> + struct evsel *evsel;
> +
> + evlist__for_each_entry(evlist, evsel) {
> + *perf_counts(evsel->prev_raw_counts, 0, 0) =
> + evsel->prev_raw_counts->aggr;
> + }
> +}
> +
> static void zero_per_pkg(struct evsel *counter)
> {
> if (counter->per_pkg_mask)
> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> index 62cf72c71869..18ead55756cc 100644
> --- a/tools/perf/util/stat.h
> +++ b/tools/perf/util/stat.h
> @@ -199,6 +199,7 @@ void perf_evlist__free_stats(struct evlist *evlist);
> void perf_evlist__reset_stats(struct evlist *evlist);
> void perf_evlist__reset_prev_raw_counts(struct evlist *evlist);
> void perf_evlist__copy_prev_raw_counts(struct evlist *evlist);
> +void perf_evlist__save_aggr_prev_raw_counts(struct evlist *evlist);
>
> int perf_stat_process_counter(struct perf_stat_config *config,
> struct evsel *counter);
> --
> 2.17.1
>
* Re: [PATCH v5 4/5] perf stat: Save aggr value to first member of prev_raw_counts
2020-05-18 12:48 ` Jiri Olsa
@ 2020-05-18 14:44 ` Jin, Yao
0 siblings, 0 replies; 12+ messages in thread
From: Jin, Yao @ 2020-05-18 14:44 UTC
To: Jiri Olsa
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On 5/18/2020 8:48 PM, Jiri Olsa wrote:
> On Thu, May 14, 2020 at 01:36:37PM +0800, Jin Yao wrote:
>> To collect the overall statistics for interval mode, we copy the
>> counts from evsel->prev_raw_counts to evsel->counts.
>>
>> For AGGR_GLOBAL mode, because the perf_stat_process_counter creates
>> aggr values from per cpu values, but the per cpu values are 0,
>> so the calculated aggr values will be always 0.
>>
>> This patch uses a trick that saves the previous aggr value to
>> the first member of perf_counts, then aggr calculation in
>> process_counter_values can work correctly for AGGR_GLOBAL.
>>
>> Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
>> ---
>> tools/perf/util/stat.c | 10 ++++++++++
>> tools/perf/util/stat.h | 1 +
>> 2 files changed, 11 insertions(+)
>>
>> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
>> index aadc723ce871..fbabdd5b9b62 100644
>> --- a/tools/perf/util/stat.c
>> +++ b/tools/perf/util/stat.c
>> @@ -249,6 +249,16 @@ void perf_evlist__copy_prev_raw_counts(struct evlist *evlist)
>> perf_evsel__copy_prev_raw_counts(evsel);
>> }
>>
>
> much better, please put some comments in here explaining what
> this is for, because it's not obvious ;-)
>
> thanks,
> jirka
>
Thanks, I will put some comments in v6.
Thanks
Jin Yao
>> +void perf_evlist__save_aggr_prev_raw_counts(struct evlist *evlist)
>> +{
>> + struct evsel *evsel;
>> +
>> + evlist__for_each_entry(evlist, evsel) {
>> + *perf_counts(evsel->prev_raw_counts, 0, 0) =
>> + evsel->prev_raw_counts->aggr;
>> + }
>> +}
>> +
>> static void zero_per_pkg(struct evsel *counter)
>> {
>> if (counter->per_pkg_mask)
>> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
>> index 62cf72c71869..18ead55756cc 100644
>> --- a/tools/perf/util/stat.h
>> +++ b/tools/perf/util/stat.h
>> @@ -199,6 +199,7 @@ void perf_evlist__free_stats(struct evlist *evlist);
>> void perf_evlist__reset_stats(struct evlist *evlist);
>> void perf_evlist__reset_prev_raw_counts(struct evlist *evlist);
>> void perf_evlist__copy_prev_raw_counts(struct evlist *evlist);
>> +void perf_evlist__save_aggr_prev_raw_counts(struct evlist *evlist);
>>
>> int perf_stat_process_counter(struct perf_stat_config *config,
>> struct evsel *counter);
>> --
>> 2.17.1
>>
>
* [PATCH v5 5/5] perf stat: Report summary for interval mode
2020-05-14 5:36 [PATCH v5 0/5] perf stat: Support overall statistics for interval mode Jin Yao
` (3 preceding siblings ...)
2020-05-14 5:36 ` [PATCH v5 4/5] perf stat: Save aggr value to first member of prev_raw_counts Jin Yao
@ 2020-05-14 5:36 ` Jin Yao
2020-05-18 12:47 ` Jiri Olsa
2020-05-14 9:53 ` [PATCH v5 0/5] perf stat: Support overall statistics " kajoljain
5 siblings, 1 reply; 12+ messages in thread
From: Jin Yao @ 2020-05-14 5:36 UTC
To: acme, jolsa, peterz, mingo, alexander.shishkin
Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao
Currently perf-stat can print counts at a regular interval (-I),
but it is not easy for the user to get the overall statistics.
The patch uses 'evsel->prev_raw_counts' to get the counts for the summary.
It copies the counts to 'evsel->counts' after printing the interval results.
Next, we just follow the non-interval processing.
Let's see some examples,
root@kbl-ppc:~# perf stat -e cycles -I1000 --interval-count 2
# time counts unit events
1.000412064 2,281,114 cycles
2.001383658 2,547,880 cycles
Performance counter stats for 'system wide':
4,828,994 cycles
2.002860349 seconds time elapsed
root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2
# time counts unit events
1.000389902 1,536,093 cycles
1.000389902 420,226 instructions # 0.27 insn per cycle
2.001433453 2,213,952 cycles
2.001433453 735,465 instructions # 0.33 insn per cycle
Performance counter stats for 'system wide':
3,750,045 cycles
1,155,691 instructions # 0.31 insn per cycle
2.003023361 seconds time elapsed
root@kbl-ppc:~# perf stat -M CPI,IPC -I1000 --interval-count 2
# time counts unit events
1.000435121 905,303 inst_retired.any # 2.9 CPI
1.000435121 2,663,333 cycles
1.000435121 914,702 inst_retired.any # 0.3 IPC
1.000435121 2,676,559 cpu_clk_unhalted.thread
2.001615941 1,951,092 inst_retired.any # 1.8 CPI
2.001615941 3,551,357 cycles
2.001615941 1,950,837 inst_retired.any # 0.5 IPC
2.001615941 3,551,044 cpu_clk_unhalted.thread
Performance counter stats for 'system wide':
2,856,395 inst_retired.any # 2.2 CPI
6,214,690 cycles
2,865,539 inst_retired.any # 0.5 IPC
6,227,603 cpu_clk_unhalted.thread
2.003403078 seconds time elapsed
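As a quick cross-check of the examples above, the summary lines should equal the sum of the interval lines. A small sketch, assuming nothing beyond the numbers printed above; sum_intervals() and ratio() are hypothetical helpers written only for this check.

```c
#include <assert.h>
#include <stddef.h>

/* Add up the per-interval counts of one event. */
unsigned long long sum_intervals(const unsigned long long *counts, size_t n)
{
	unsigned long long total = 0;

	assert(counts != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		total += counts[i];

	return total;
}

/* Derived metric such as insn per cycle, computed from the two totals. */
double ratio(unsigned long long num, unsigned long long den)
{
	assert(den != 0);
	return (double)num / (double)den;
}
```

For the second example, 1,536,093 + 2,213,952 cycles and 420,226 + 735,465 instructions give the totals 3,750,045 and 1,155,691 shown in the summary, and their ratio rounds to the reported 0.31 insn per cycle.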
v5:
---
Call perf_evlist__save_aggr_prev_raw_counts to save aggr value
to first member of prev_raw_counts for AGGR_GLOBAL. Then next,
perf_stat_process_counter can create aggr values from per cpu
values.
v4:
---
Move affinity setup and read_counter_cpu to a new function
read_affinity_counters. It's only called when stat_config.summary
is not set.
v3:
---
Use evsel->prev_raw_counts for summary counts
v2:
---
Rebase to perf/core branch
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
---
tools/perf/builtin-stat.c | 31 ++++++++++++++++++++++++++++---
tools/perf/util/stat.c | 2 +-
tools/perf/util/stat.h | 1 +
3 files changed, 30 insertions(+), 4 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index f3b3a59ac7d2..24deed746325 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -314,14 +314,14 @@ static int read_counter_cpu(struct evsel *counter, struct timespec *rs, int cpu)
return 0;
}
-static void read_counters(struct timespec *rs)
+static int read_affinity_counters(struct timespec *rs)
{
struct evsel *counter;
struct affinity affinity;
int i, ncpus, cpu;
if (affinity__setup(&affinity) < 0)
- return;
+ return -1;
ncpus = perf_cpu_map__nr(evsel_list->core.all_cpus);
if (!target__has_cpu(&target) || target__has_per_thread(&target))
@@ -341,6 +341,15 @@ static void read_counters(struct timespec *rs)
}
}
affinity__cleanup(&affinity);
+ return 0;
+}
+
+static void read_counters(struct timespec *rs)
+{
+ struct evsel *counter;
+
+ if (!stat_config.summary && (read_affinity_counters(rs) < 0))
+ return;
evlist__for_each_entry(evsel_list, counter) {
if (counter->err)
@@ -394,6 +403,7 @@ static void runtime_stat_reset(struct perf_stat_config *config)
static void process_interval(void)
{
struct timespec ts, rs;
+ struct stats walltime_nsecs_stats_bak;
clock_gettime(CLOCK_MONOTONIC, &ts);
diff_timespec(&rs, &ts, &ref_time);
@@ -407,9 +417,11 @@ static void process_interval(void)
pr_err("failed to write stat round event\n");
}
+ walltime_nsecs_stats_bak = walltime_nsecs_stats;
init_stats(&walltime_nsecs_stats);
update_stats(&walltime_nsecs_stats, stat_config.interval * 1000000);
print_counters(&rs, 0, NULL);
+ walltime_nsecs_stats = walltime_nsecs_stats_bak;
}
static void enable_counters(void)
@@ -765,6 +777,19 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
update_stats(&walltime_nsecs_stats, t1 - t0);
+ if (interval) {
+ stat_config.interval = 0;
+ stat_config.summary = true;
+
+ if (stat_config.aggr_mode == AGGR_GLOBAL)
+ perf_evlist__save_aggr_prev_raw_counts(evsel_list);
+
+ perf_evlist__copy_prev_raw_counts(evsel_list);
+ perf_evlist__reset_prev_raw_counts(evsel_list);
+ runtime_stat_reset(&stat_config);
+ perf_stat__reset_shadow_per_stat(&rt_stat);
+ }
+
/*
* Closing a group leader splits the group, and as we only disable
* group leaders, results in remaining events becoming enabled. To
@@ -2159,7 +2184,7 @@ int cmd_stat(int argc, const char **argv)
}
}
- if (!forever && status != -1 && !interval)
+ if (!forever && status != -1 && (!interval || stat_config.summary))
print_counters(NULL, argc, argv);
if (STAT_RECORD) {
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index fbabdd5b9b62..481543c422a7 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -398,7 +398,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
* interval mode, otherwise overall avg running
* averages will be shown for each interval.
*/
- if (config->interval) {
+ if (config->interval || config->summary) {
for (i = 0; i < 3; i++)
init_stats(&ps->res_stats[i]);
}
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 18ead55756cc..a5604a20bdca 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -110,6 +110,7 @@ struct perf_stat_config {
bool all_kernel;
bool all_user;
bool percore_show_thread;
+ bool summary;
FILE *output;
unsigned int interval;
unsigned int timeout;
--
2.17.1
* Re: [PATCH v5 5/5] perf stat: Report summary for interval mode
2020-05-14 5:36 ` [PATCH v5 5/5] perf stat: Report summary for interval mode Jin Yao
@ 2020-05-18 12:47 ` Jiri Olsa
2020-05-19 2:51 ` Jin, Yao
0 siblings, 1 reply; 12+ messages in thread
From: Jiri Olsa @ 2020-05-18 12:47 UTC
To: Jin Yao
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
On Thu, May 14, 2020 at 01:36:38PM +0800, Jin Yao wrote:
SNIP
>
> evlist__for_each_entry(evsel_list, counter) {
> if (counter->err)
> @@ -394,6 +403,7 @@ static void runtime_stat_reset(struct perf_stat_config *config)
> static void process_interval(void)
> {
> struct timespec ts, rs;
> + struct stats walltime_nsecs_stats_bak;
>
> clock_gettime(CLOCK_MONOTONIC, &ts);
> diff_timespec(&rs, &ts, &ref_time);
> @@ -407,9 +417,11 @@ static void process_interval(void)
> pr_err("failed to write stat round event\n");
> }
>
> + walltime_nsecs_stats_bak = walltime_nsecs_stats;
> init_stats(&walltime_nsecs_stats);
> update_stats(&walltime_nsecs_stats, stat_config.interval * 1000000);
> print_counters(&rs, 0, NULL);
> + walltime_nsecs_stats = walltime_nsecs_stats_bak;
could we instead of above initialize walltime_nsecs_stats
in the condition below, like:
init_stats(&walltime_nsecs_stats);
update_stats(&walltime_nsecs_stats, stat_config.interval * 1000000);
jirka
> }
>
> static void enable_counters(void)
> @@ -765,6 +777,19 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>
> update_stats(&walltime_nsecs_stats, t1 - t0);
>
> + if (interval) {
> + stat_config.interval = 0;
> + stat_config.summary = true;
> +
> + if (stat_config.aggr_mode == AGGR_GLOBAL)
> + perf_evlist__save_aggr_prev_raw_counts(evsel_list);
> +
> + perf_evlist__copy_prev_raw_counts(evsel_list);
> + perf_evlist__reset_prev_raw_counts(evsel_list);
> + runtime_stat_reset(&stat_config);
> + perf_stat__reset_shadow_per_stat(&rt_stat);
> + }
> +
SNIP
* Re: [PATCH v5 5/5] perf stat: Report summary for interval mode
2020-05-18 12:47 ` Jiri Olsa
@ 2020-05-19 2:51 ` Jin, Yao
0 siblings, 0 replies; 12+ messages in thread
From: Jin, Yao @ 2020-05-19 2:51 UTC
To: Jiri Olsa
Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak,
kan.liang, yao.jin
Hi Jiri,
On 5/18/2020 8:47 PM, Jiri Olsa wrote:
> On Thu, May 14, 2020 at 01:36:38PM +0800, Jin Yao wrote:
>
> SNIP
>
>>
>> evlist__for_each_entry(evsel_list, counter) {
>> if (counter->err)
>> @@ -394,6 +403,7 @@ static void runtime_stat_reset(struct perf_stat_config *config)
>> static void process_interval(void)
>> {
>> struct timespec ts, rs;
>> + struct stats walltime_nsecs_stats_bak;
>>
>> clock_gettime(CLOCK_MONOTONIC, &ts);
>> diff_timespec(&rs, &ts, &ref_time);
>> @@ -407,9 +417,11 @@ static void process_interval(void)
>> pr_err("failed to write stat round event\n");
>> }
>>
>> + walltime_nsecs_stats_bak = walltime_nsecs_stats;
>> init_stats(&walltime_nsecs_stats);
>> update_stats(&walltime_nsecs_stats, stat_config.interval * 1000000);
>> print_counters(&rs, 0, NULL);
>> + walltime_nsecs_stats = walltime_nsecs_stats_bak;
>
> could we instead of above initialize walltime_nsecs_stats
> in the condition below, like:
>
> init_stats(&walltime_nsecs_stats);
> update_stats(&walltime_nsecs_stats, stat_config.interval * 1000000);
>
> jirka
>
Yes, I think that's OK and better. My fix is:
@@ -775,11 +772,11 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
if (stat_config.walltime_run_table)
stat_config.walltime_run[run_idx] = t1 - t0;
- update_stats(&walltime_nsecs_stats, t1 - t0);
-
if (interval) {
stat_config.interval = 0;
stat_config.summary = true;
+ init_stats(&walltime_nsecs_stats);
+ update_stats(&walltime_nsecs_stats, t1 - t0);
if (stat_config.aggr_mode == AGGR_GLOBAL)
perf_evlist__save_aggr_prev_raw_counts(evsel_list);
@@ -788,7 +785,8 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
perf_evlist__reset_prev_raw_counts(evsel_list);
runtime_stat_reset(&stat_config);
perf_stat__reset_shadow_per_stat(&rt_stat);
- }
+ } else
+ update_stats(&walltime_nsecs_stats, t1 - t0);
/*
* Closing a group leader splits the group, and as we only disable
Thanks
Jin Yao
>> }
>>
>> static void enable_counters(void)
>> @@ -765,6 +777,19 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>>
>> update_stats(&walltime_nsecs_stats, t1 - t0);
>>
>> + if (interval) {
>> + stat_config.interval = 0;
>> + stat_config.summary = true;
>> +
>> + if (stat_config.aggr_mode == AGGR_GLOBAL)
>> + perf_evlist__save_aggr_prev_raw_counts(evsel_list);
>> +
>> + perf_evlist__copy_prev_raw_counts(evsel_list);
>> + perf_evlist__reset_prev_raw_counts(evsel_list);
>> + runtime_stat_reset(&stat_config);
>> + perf_stat__reset_shadow_per_stat(&rt_stat);
>> + }
>> +
>
> SNIP
>
* Re: [PATCH v5 0/5] perf stat: Support overall statistics for interval mode
2020-05-14 5:36 [PATCH v5 0/5] perf stat: Support overall statistics for interval mode Jin Yao
` (4 preceding siblings ...)
2020-05-14 5:36 ` [PATCH v5 5/5] perf stat: Report summary for interval mode Jin Yao
@ 2020-05-14 9:53 ` kajoljain
2020-05-14 13:44 ` Jin, Yao
5 siblings, 1 reply; 12+ messages in thread
From: kajoljain @ 2020-05-14 9:53 UTC
To: Jin Yao, acme, jolsa, peterz, mingo, alexander.shishkin
Cc: Linux-kernel, ak, kan.liang, yao.jin
On 5/14/20 11:06 AM, Jin Yao wrote:
> Currently perf-stat supports to print counts at regular interval (-I),
> but it's not very easy for user to get the overall statistics.
>
> With this patchset, it supports to report the summary at the end of
> interval output.
>
> For example,
>
> root@kbl-ppc:~# perf stat -e cycles -I1000 --interval-count 2
> # time counts unit events
> 1.000412064 2,281,114 cycles
> 2.001383658 2,547,880 cycles
>
> Performance counter stats for 'system wide':
>
> 4,828,994 cycles
>
> 2.002860349 seconds time elapsed
>
> root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2
> # time counts unit events
> 1.000389902 1,536,093 cycles
> 1.000389902 420,226 instructions # 0.27 insn per cycle
> 2.001433453 2,213,952 cycles
> 2.001433453 735,465 instructions # 0.33 insn per cycle
>
> Performance counter stats for 'system wide':
>
> 3,750,045 cycles
> 1,155,691 instructions # 0.31 insn per cycle
>
> 2.003023361 seconds time elapsed
>
> root@kbl-ppc:~# perf stat -M CPI,IPC -I1000 --interval-count 2
> # time counts unit events
> 1.000435121 905,303 inst_retired.any # 2.9 CPI
> 1.000435121 2,663,333 cycles
> 1.000435121 914,702 inst_retired.any # 0.3 IPC
> 1.000435121 2,676,559 cpu_clk_unhalted.thread
> 2.001615941 1,951,092 inst_retired.any # 1.8 CPI
> 2.001615941 3,551,357 cycles
> 2.001615941 1,950,837 inst_retired.any # 0.5 IPC
> 2.001615941 3,551,044 cpu_clk_unhalted.thread
>
> Performance counter stats for 'system wide':
>
> 2,856,395 inst_retired.any # 2.2 CPI
> 6,214,690 cycles
> 2,865,539 inst_retired.any # 0.5 IPC
> 6,227,603 cpu_clk_unhalted.thread
>
> 2.003403078 seconds time elapsed
Hi Jin,
Reporting the summary will be great for understanding the overall stats. Before the
patch that resets rt_stat before read_counters (to make sure that whatever is printed
in the final aggregate reflects the counts of that interval),
we used to update stats->means and other info, as described in the
RFC: https://lkml.org/lkml/2020/3/24/158
Now stats->means is the same as the counts we use in the generic_metric function. Is this the expected behavior?
I am not sure whether data like stats->means is supposed to be updated per interval or whether we use it somewhere else.
Since we call update_stats for each event and for each interval, could we somehow use that
to print the overall stats, maybe by adding a variable to `struct stats` that keeps the total count for that event?
Please let me know if my understanding is fine.
Thanks,
Kajol Jain
>
> v5:
> ---
> 1. Create new patch "perf stat: Save aggr value to first member
> of prev_raw_counts".
>
> 2. Call perf_evlist__save_aggr_prev_raw_counts to save aggr value
> to first member of prev_raw_counts for AGGR_GLOBAL. Then next,
> perf_stat_process_counter can create aggr values from per cpu
> values.
>
> Following patches are impacted in v5:
> perf stat: Copy counts from prev_raw_counts to evsel->counts
> perf stat: Save aggr value to first member of prev_raw_counts
> perf stat: Report summary for interval mode
>
> v4:
> ---
> 1. Create runtime_stat_reset.
>
> 2. Zero the aggr in perf_counts__reset and use it to reset
> prev_raw_counts.
>
> 3. Move affinity setup and read_counter_cpu to a new function
> read_affinity_counters. It's only called when stat_config.summary
> is not set.
>
> v3:
> ---
> 1. 'perf stat: Fix wrong per-thread runtime stat for interval mode'
> is a new patch which fixes an existing issue found in test.
>
> 2. We use the prev_raw_counts for summary counts. Drop the summary_counts in v2.
>
> 3. Fix some issues.
>
> v2:
> ---
> Rebase to perf/core branch
>
> Jin Yao (5):
> perf stat: Fix wrong per-thread runtime stat for interval mode
> perf counts: Reset prev_raw_counts counts
> perf stat: Copy counts from prev_raw_counts to evsel->counts
> perf stat: Save aggr value to first member of prev_raw_counts
> perf stat: Report summary for interval mode
>
> tools/perf/builtin-stat.c | 101 ++++++++++++++++++++++++++------------
> tools/perf/util/counts.c | 4 +-
> tools/perf/util/counts.h | 1 +
> tools/perf/util/stat.c | 43 +++++++++++++---
> tools/perf/util/stat.h | 3 ++
> 5 files changed, 113 insertions(+), 39 deletions(-)
>
* Re: [PATCH v5 0/5] perf stat: Support overall statistics for interval mode
2020-05-14 9:53 ` [PATCH v5 0/5] perf stat: Support overall statistics " kajoljain
@ 2020-05-14 13:44 ` Jin, Yao
0 siblings, 0 replies; 12+ messages in thread
From: Jin, Yao @ 2020-05-14 13:44 UTC (permalink / raw)
To: kajoljain, acme, jolsa, peterz, mingo, alexander.shishkin
Cc: Linux-kernel, ak, kan.liang, yao.jin
Hi Kajoljain,
On 5/14/2020 5:53 PM, kajoljain wrote:
>
>
> On 5/14/20 11:06 AM, Jin Yao wrote:
>> Currently perf-stat supports to print counts at regular interval (-I),
>> but it's not very easy for user to get the overall statistics.
>>
>> With this patchset, it supports to report the summary at the end of
>> interval output.
>>
>> For example,
>>
>> root@kbl-ppc:~# perf stat -e cycles -I1000 --interval-count 2
>> # time counts unit events
>> 1.000412064 2,281,114 cycles
>> 2.001383658 2,547,880 cycles
>>
>> Performance counter stats for 'system wide':
>>
>> 4,828,994 cycles
>>
>> 2.002860349 seconds time elapsed
>>
>> root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2
>> # time counts unit events
>> 1.000389902 1,536,093 cycles
>> 1.000389902 420,226 instructions # 0.27 insn per cycle
>> 2.001433453 2,213,952 cycles
>> 2.001433453 735,465 instructions # 0.33 insn per cycle
>>
>> Performance counter stats for 'system wide':
>>
>> 3,750,045 cycles
>> 1,155,691 instructions # 0.31 insn per cycle
>>
>> 2.003023361 seconds time elapsed
>>
>> root@kbl-ppc:~# perf stat -M CPI,IPC -I1000 --interval-count 2
>> # time counts unit events
>> 1.000435121 905,303 inst_retired.any # 2.9 CPI
>> 1.000435121 2,663,333 cycles
>> 1.000435121 914,702 inst_retired.any # 0.3 IPC
>> 1.000435121 2,676,559 cpu_clk_unhalted.thread
>> 2.001615941 1,951,092 inst_retired.any # 1.8 CPI
>> 2.001615941 3,551,357 cycles
>> 2.001615941 1,950,837 inst_retired.any # 0.5 IPC
>> 2.001615941 3,551,044 cpu_clk_unhalted.thread
>>
>> Performance counter stats for 'system wide':
>>
>> 2,856,395 inst_retired.any # 2.2 CPI
>> 6,214,690 cycles
>> 2,865,539 inst_retired.any # 0.5 IPC
>> 6,227,603 cpu_clk_unhalted.thread
>>
>> 2.003403078 seconds time elapsed
>
> Hi Jin,
> Reporting the summary will be great for understanding overall stats. So, Before the
> patch where we are reseting rt_stat before read_counters to make sure, whatever printing
> in final aggregate is as per counts on that interval,
>
Yes, I had similar thoughts, so I posted the following patch:
https://lore.kernel.org/lkml/20200420145417.6864-1-yao.jin@linux.intel.com/
> we used to update stats->means and other info as described in
>
> RFC: https://lkml.org/lkml/2020/3/24/158
>
I've checked your patch, but sorry, I'm also not sure whether it's the
expected behavior.
> Now, stats->means is same as counts which we are using in generic_metric function. Is this expected behavior?
> I am not sure, if data like stats->means and all suppose to update per interval or we are using it somewhere else.
>
I just think it's easy to understand: the metric is calculated from the
counts of each interval.
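As a worked check of that reading, here is a small sketch that derives the
metric from the counts of a single interval. The helper name `insn_per_cycle`
is made up for this example and is not a perf function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Ratio of two counts taken from the same interval.
 * The caller must pass a non-zero cycle count.
 */
static double insn_per_cycle(uint64_t instructions, uint64_t cycles)
{
	assert(cycles != 0);
	return (double)instructions / (double)cycles;
}
```

With the numbers from the cover letter, 420,226 / 1,536,093 rounds to 0.27 for
the first interval and 735,465 / 2,213,952 rounds to 0.33 for the second; the
summary line uses the totals, 1,155,691 / 3,750,045, which rounds to 0.31.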
> So, As we call update_stats for each event and for each interval, can we somehow use that
> to print overall stats maybe by adding some var in `struct stats` to keep count of total counts for that event.
> Please let me know if my understanding is fine.
>
Adding a variable to 'struct stats' doesn't look sufficient (or would be more
complicated), because perf-stat also needs to report counts according to the
different aggregation modes (not only the metric). I just think copying the
total counts to the current counts is an easy way, because we can reuse most
of the existing non-interval processing code.
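A rough sketch of that approach, under stated assumptions: all names here
(`counts_sketch`, `process_interval_sketch`, `copy_totals_sketch`,
`aggregate_sketch`, `NR_CPUS_SKETCH`) are hypothetical, and the real code keeps
these values in evsel->counts and prev_raw_counts. It assumes the raw readings
are cumulative, so the last raw reading is also the running total.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS_SKETCH 2

struct counts_sketch {
	uint64_t cur[NR_CPUS_SKETCH];      /* what the print path reads */
	uint64_t prev_raw[NR_CPUS_SKETCH]; /* last raw cumulative reading */
};

/* Interval mode: turn a cumulative reading into a per-interval delta. */
static void process_interval_sketch(struct counts_sketch *c,
				    const uint64_t *raw)
{
	assert(c != NULL && raw != NULL);

	for (int i = 0; i < NR_CPUS_SKETCH; i++) {
		c->cur[i] = raw[i] - c->prev_raw[i];
		c->prev_raw[i] = raw[i];
	}
}

/* Summary: copy the totals into the current counts before printing. */
static void copy_totals_sketch(struct counts_sketch *c)
{
	assert(c != NULL);
	memcpy(c->cur, c->prev_raw, sizeof(c->cur));
}

/* Stand-in for the ordinary print path: aggregate across CPUs. */
static uint64_t aggregate_sketch(const struct counts_sketch *c)
{
	uint64_t sum = 0;

	assert(c != NULL);
	for (int i = 0; i < NR_CPUS_SKETCH; i++)
		sum += c->cur[i];
	return sum;
}
```

The point of the design is that `aggregate_sketch` never needs to know whether
it is printing one interval or the summary; only the contents of `cur` differ.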
Thanks
Jin Yao
> Thanks,
> Kajol Jain
>