linux-kernel.vger.kernel.org archive mirror
* [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1)
@ 2022-10-10  5:35 Namhyung Kim
  2022-10-10  5:35 ` [PATCH 01/19] perf tools: Save evsel->pmu in parse_events() Namhyung Kim
                   ` (19 more replies)
  0 siblings, 20 replies; 63+ messages in thread
From: Namhyung Kim @ 2022-10-10  5:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun

Hello,

Current perf stat code is somewhat hard to follow since it handles
many combinations of PMUs/events for given display and aggregation
options.  This is my attempt to clean it up a little. ;-)

My first concern is that aggregation and display routines are intermixed
and processed differently depending on the aggregation mode.  I'd like to
separate them and make the logic clearer.

To do that, I added struct perf_stat_aggr to save the aggregated counter
values and other info.  It'll be allocated and processed according to
the aggr_mode and display logic will use it.

The code is available in the 'perf/stat-aggr-v1' branch at

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung


Namhyung Kim (19):
  perf tools: Save evsel->pmu in parse_events()
  perf tools: Use pmu info in evsel__is_hybrid()
  perf stat: Use evsel__is_hybrid() more
  perf stat: Add aggr id for global mode
  perf stat: Add cpu aggr id for no aggregation mode
  perf stat: Add 'needs_sort' argument to cpu_aggr_map__new()
  perf stat: Add struct perf_stat_aggr to perf_stat_evsel
  perf stat: Allocate evsel->stats->aggr properly
  perf stat: Aggregate events using evsel->stats->aggr
  perf stat: Aggregate per-thread stats using evsel->stats->aggr
  perf stat: Allocate aggr counts for recorded data
  perf stat: Reset aggr counts for each interval
  perf stat: Split process_counters()
  perf stat: Add perf_stat_merge_counters()
  perf stat: Add perf_stat_process_percore()
  perf stat: Add perf_stat_process_shadow_stats()
  perf stat: Display event stats using aggr counts
  perf stat: Display percore events properly
  perf stat: Remove unused perf_counts.aggr field

 tools/perf/builtin-script.c                   |   4 +-
 tools/perf/builtin-stat.c                     | 177 +++++--
 tools/perf/tests/parse-metric.c               |   2 +-
 tools/perf/tests/pmu-events.c                 |   2 +-
 tools/perf/util/counts.c                      |   1 -
 tools/perf/util/counts.h                      |   1 -
 tools/perf/util/cpumap.c                      |  16 +-
 tools/perf/util/cpumap.h                      |   8 +-
 tools/perf/util/evsel.c                       |  13 +-
 tools/perf/util/parse-events.c                |   1 +
 tools/perf/util/pmu.c                         |   4 +
 .../scripting-engines/trace-event-python.c    |   6 -
 tools/perf/util/stat-display.c                | 472 +++---------------
 tools/perf/util/stat.c                        | 383 +++++++++++---
 tools/perf/util/stat.h                        |  29 +-
 15 files changed, 590 insertions(+), 529 deletions(-)


base-commit: d79310700590b8b40d8c867012d6c899ea6fd505
-- 
2.38.0.rc1.362.ged0d419d3c-goog



* [PATCH 01/19] perf tools: Save evsel->pmu in parse_events()
  2022-10-10  5:35 [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Namhyung Kim
@ 2022-10-10  5:35 ` Namhyung Kim
  2022-10-10 22:21   ` Ian Rogers
  2022-10-10  5:35 ` [PATCH 02/19] perf tools: Use pmu info in evsel__is_hybrid() Namhyung Kim
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Namhyung Kim @ 2022-10-10  5:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun

Now that evsel has a pmu pointer, let's save the info and use it in
places like evsel__find_pmu().

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/evsel.c        | 1 +
 tools/perf/util/parse-events.c | 1 +
 tools/perf/util/pmu.c          | 4 ++++
 3 files changed, 6 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 76605fde3507..196f8e4859d7 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -467,6 +467,7 @@ struct evsel *evsel__clone(struct evsel *orig)
 	evsel->collect_stat = orig->collect_stat;
 	evsel->weak_group = orig->weak_group;
 	evsel->use_config_name = orig->use_config_name;
+	evsel->pmu = orig->pmu;
 
 	if (evsel__copy_config_terms(evsel, orig) < 0)
 		goto out_err;
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 437389dacf48..9e704841273d 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -263,6 +263,7 @@ __add_event(struct list_head *list, int *idx,
 	evsel->core.own_cpus = perf_cpu_map__get(cpus);
 	evsel->core.requires_cpu = pmu ? pmu->is_uncore : false;
 	evsel->auto_merge_stats = auto_merge_stats;
+	evsel->pmu = pmu;
 
 	if (name)
 		evsel->name = strdup(name);
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 74a2cafb4e8d..15bf5943083a 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1048,11 +1048,15 @@ struct perf_pmu *evsel__find_pmu(struct evsel *evsel)
 {
 	struct perf_pmu *pmu = NULL;
 
+	if (evsel->pmu)
+		return evsel->pmu;
+
 	while ((pmu = perf_pmu__scan(pmu)) != NULL) {
 		if (pmu->type == evsel->core.attr.type)
 			break;
 	}
 
+	evsel->pmu = pmu;
 	return pmu;
 }
 
-- 
2.38.0.rc1.362.ged0d419d3c-goog
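
The lookup-then-cache pattern that this patch adds to evsel__find_pmu()
can be sketched in isolation.  This is only an illustration under
assumptions, not the perf code: struct pmu_sketch, struct evsel_sketch,
pmu_table, nr_scans and find_pmu_sketch() are hypothetical stand-ins, and
the type numbers are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct perf_pmu and struct evsel. */
struct pmu_sketch {
	unsigned int type;
	const char *name;
};

struct evsel_sketch {
	unsigned int attr_type;		/* plays the role of evsel->core.attr.type */
	struct pmu_sketch *pmu;		/* cached lookup result, NULL until resolved */
};

/* Hypothetical PMU table; the type values are invented for this sketch. */
static struct pmu_sketch pmu_table[] = {
	{ .type = 4, .name = "cpu" },
	{ .type = 8, .name = "uncore" },
};

static int nr_scans;	/* counts slow-path lookups, for illustration only */

static struct pmu_sketch *find_pmu_sketch(struct evsel_sketch *evsel)
{
	struct pmu_sketch *pmu = NULL;
	size_t i;

	assert(evsel != NULL);

	/* Fast path: reuse the pointer saved earlier. */
	if (evsel->pmu)
		return evsel->pmu;

	/* Slow path: linear scan over all known PMUs. */
	nr_scans++;
	for (i = 0; i < sizeof(pmu_table) / sizeof(pmu_table[0]); i++) {
		if (pmu_table[i].type == evsel->attr_type) {
			pmu = &pmu_table[i];
			break;
		}
	}

	/* Cache the result; a miss stays NULL and is searched again next time. */
	evsel->pmu = pmu;
	return pmu;
}
```

With this shape a second call for the same evsel returns the cached
pointer without rescanning, while a failed lookup is simply retried.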



* [PATCH 02/19] perf tools: Use pmu info in evsel__is_hybrid()
  2022-10-10  5:35 [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Namhyung Kim
  2022-10-10  5:35 ` [PATCH 01/19] perf tools: Save evsel->pmu in parse_events() Namhyung Kim
@ 2022-10-10  5:35 ` Namhyung Kim
  2022-10-10 22:31   ` Ian Rogers
  2022-10-10  5:35 ` [PATCH 03/19] perf stat: Use evsel__is_hybrid() more Namhyung Kim
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Namhyung Kim @ 2022-10-10  5:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun

If the evsel has a pmu, it can use pmu->is_hybrid directly.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/evsel.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 196f8e4859d7..a6ea91c72659 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -3132,6 +3132,9 @@ void evsel__zero_per_pkg(struct evsel *evsel)
 
 bool evsel__is_hybrid(struct evsel *evsel)
 {
+	if (evsel->pmu)
+		return evsel->pmu->is_hybrid;
+
 	return evsel->pmu_name && perf_pmu__is_hybrid(evsel->pmu_name);
 }
 
-- 
2.38.0.rc1.362.ged0d419d3c-goog



* [PATCH 03/19] perf stat: Use evsel__is_hybrid() more
  2022-10-10  5:35 [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Namhyung Kim
  2022-10-10  5:35 ` [PATCH 01/19] perf tools: Save evsel->pmu in parse_events() Namhyung Kim
  2022-10-10  5:35 ` [PATCH 02/19] perf tools: Use pmu info in evsel__is_hybrid() Namhyung Kim
@ 2022-10-10  5:35 ` Namhyung Kim
  2022-10-10 22:32   ` Ian Rogers
  2022-10-10  5:35 ` [PATCH 04/19] perf stat: Add aggr id for global mode Namhyung Kim
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Namhyung Kim @ 2022-10-10  5:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun

The stat-display code needs to check whether the current evsel is
hybrid, but it uses perf_pmu__has_hybrid(), which can return true for
non-hybrid events too.  I think it's better to use evsel__is_hybrid().

Also remove a NULL check for the 'config' parameter in hybrid_merge()
since it's called after the config->no_merge check.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/stat-display.c | 20 ++++----------------
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 5c47ee9963a7..4113aa86772f 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -704,7 +704,7 @@ static void uniquify_event_name(struct evsel *counter)
 			counter->name = new_name;
 		}
 	} else {
-		if (perf_pmu__has_hybrid()) {
+		if (evsel__is_hybrid(counter)) {
 			ret = asprintf(&new_name, "%s/%s/",
 				       counter->pmu_name, counter->name);
 		} else {
@@ -744,26 +744,14 @@ static void collect_all_aliases(struct perf_stat_config *config, struct evsel *c
 	}
 }
 
-static bool is_uncore(struct evsel *evsel)
-{
-	struct perf_pmu *pmu = evsel__find_pmu(evsel);
-
-	return pmu && pmu->is_uncore;
-}
-
-static bool hybrid_uniquify(struct evsel *evsel)
-{
-	return perf_pmu__has_hybrid() && !is_uncore(evsel);
-}
-
 static bool hybrid_merge(struct evsel *counter, struct perf_stat_config *config,
 			 bool check)
 {
-	if (hybrid_uniquify(counter)) {
+	if (evsel__is_hybrid(counter)) {
 		if (check)
-			return config && config->hybrid_merge;
+			return config->hybrid_merge;
 		else
-			return config && !config->hybrid_merge;
+			return !config->hybrid_merge;
 	}
 
 	return false;
-- 
2.38.0.rc1.362.ged0d419d3c-goog



* [PATCH 04/19] perf stat: Add aggr id for global mode
  2022-10-10  5:35 [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Namhyung Kim
                   ` (2 preceding siblings ...)
  2022-10-10  5:35 ` [PATCH 03/19] perf stat: Use evsel__is_hybrid() more Namhyung Kim
@ 2022-10-10  5:35 ` Namhyung Kim
  2022-10-10 22:46   ` Ian Rogers
  2022-10-12 10:55   ` Jiri Olsa
  2022-10-10  5:35 ` [PATCH 05/19] perf stat: Add cpu aggr id for no aggregation mode Namhyung Kim
                   ` (15 subsequent siblings)
  19 siblings, 2 replies; 63+ messages in thread
From: Namhyung Kim @ 2022-10-10  5:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun

To make the code simpler, I'd like to use the same aggregation code for
the global mode.  We can simply add an id function to return cpu 0 and
use print_aggr().

No functional change intended.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-stat.c      | 39 ++++++++++++++++++++++++++++++++--
 tools/perf/util/cpumap.c       | 10 +++++++++
 tools/perf/util/cpumap.h       |  6 +++++-
 tools/perf/util/stat-display.c |  9 ++------
 4 files changed, 54 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 265b05157972..144bb3a657f2 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1330,6 +1330,15 @@ static struct aggr_cpu_id perf_stat__get_node(struct perf_stat_config *config __
 	return aggr_cpu_id__node(cpu, /*data=*/NULL);
 }
 
+static struct aggr_cpu_id perf_stat__get_global(struct perf_stat_config *config __maybe_unused,
+						struct perf_cpu cpu __maybe_unused)
+{
+	struct aggr_cpu_id id = aggr_cpu_id__empty();
+
+	id.cpu = (struct perf_cpu){ .cpu = 0 };
+	return id;
+}
+
 static struct aggr_cpu_id perf_stat__get_aggr(struct perf_stat_config *config,
 					      aggr_get_id_t get_id, struct perf_cpu cpu)
 {
@@ -1366,6 +1375,12 @@ static struct aggr_cpu_id perf_stat__get_node_cached(struct perf_stat_config *co
 	return perf_stat__get_aggr(config, perf_stat__get_node, cpu);
 }
 
+static struct aggr_cpu_id perf_stat__get_global_cached(struct perf_stat_config *config,
+						       struct perf_cpu cpu)
+{
+	return perf_stat__get_aggr(config, perf_stat__get_global, cpu);
+}
+
 static bool term_percore_set(void)
 {
 	struct evsel *counter;
@@ -1395,6 +1410,7 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr(enum aggr_mode aggr_mode)
 
 		return NULL;
 	case AGGR_GLOBAL:
+		return aggr_cpu_id__global;
 	case AGGR_THREAD:
 	case AGGR_UNSET:
 	case AGGR_MAX:
@@ -1420,6 +1436,7 @@ static aggr_get_id_t aggr_mode__get_id(enum aggr_mode aggr_mode)
 		}
 		return NULL;
 	case AGGR_GLOBAL:
+		return perf_stat__get_global_cached;
 	case AGGR_THREAD:
 	case AGGR_UNSET:
 	case AGGR_MAX:
@@ -1535,6 +1552,16 @@ static struct aggr_cpu_id perf_env__get_node_aggr_by_cpu(struct perf_cpu cpu, vo
 	return id;
 }
 
+static struct aggr_cpu_id perf_env__get_global_aggr_by_cpu(struct perf_cpu cpu __maybe_unused,
+							   void *data __maybe_unused)
+{
+	struct aggr_cpu_id id = aggr_cpu_id__empty();
+
+	/* it always aggregates to the cpu 0 */
+	id.cpu = (struct perf_cpu){ .cpu = 0 };
+	return id;
+}
+
 static struct aggr_cpu_id perf_stat__get_socket_file(struct perf_stat_config *config __maybe_unused,
 						     struct perf_cpu cpu)
 {
@@ -1558,6 +1585,12 @@ static struct aggr_cpu_id perf_stat__get_node_file(struct perf_stat_config *conf
 	return perf_env__get_node_aggr_by_cpu(cpu, &perf_stat.session->header.env);
 }
 
+static struct aggr_cpu_id perf_stat__get_global_file(struct perf_stat_config *config __maybe_unused,
+						     struct perf_cpu cpu)
+{
+	return perf_env__get_global_aggr_by_cpu(cpu, &perf_stat.session->header.env);
+}
+
 static aggr_cpu_id_get_t aggr_mode__get_aggr_file(enum aggr_mode aggr_mode)
 {
 	switch (aggr_mode) {
@@ -1569,8 +1602,9 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr_file(enum aggr_mode aggr_mode)
 		return perf_env__get_core_aggr_by_cpu;
 	case AGGR_NODE:
 		return perf_env__get_node_aggr_by_cpu;
-	case AGGR_NONE:
 	case AGGR_GLOBAL:
+		return perf_env__get_global_aggr_by_cpu;
+	case AGGR_NONE:
 	case AGGR_THREAD:
 	case AGGR_UNSET:
 	case AGGR_MAX:
@@ -1590,8 +1624,9 @@ static aggr_get_id_t aggr_mode__get_id_file(enum aggr_mode aggr_mode)
 		return perf_stat__get_core_file;
 	case AGGR_NODE:
 		return perf_stat__get_node_file;
-	case AGGR_NONE:
 	case AGGR_GLOBAL:
+		return perf_stat__get_global_file;
+	case AGGR_NONE:
 	case AGGR_THREAD:
 	case AGGR_UNSET:
 	case AGGR_MAX:
diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
index 8486ca3bec75..60209fe87456 100644
--- a/tools/perf/util/cpumap.c
+++ b/tools/perf/util/cpumap.c
@@ -354,6 +354,16 @@ struct aggr_cpu_id aggr_cpu_id__node(struct perf_cpu cpu, void *data __maybe_unu
 	return id;
 }
 
+struct aggr_cpu_id aggr_cpu_id__global(struct perf_cpu cpu, void *data __maybe_unused)
+{
+	struct aggr_cpu_id id = aggr_cpu_id__empty();
+
+	/* it always aggregates to the cpu 0 */
+	cpu.cpu = 0;
+	id.cpu = cpu;
+	return id;
+}
+
 /* setup simple routines to easily access node numbers given a cpu number */
 static int get_max_num(char *path, int *max)
 {
diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
index 4a6d029576ee..b2ff648bc417 100644
--- a/tools/perf/util/cpumap.h
+++ b/tools/perf/util/cpumap.h
@@ -133,5 +133,9 @@ struct aggr_cpu_id aggr_cpu_id__cpu(struct perf_cpu cpu, void *data);
  * cpu. The function signature is compatible with aggr_cpu_id_get_t.
  */
 struct aggr_cpu_id aggr_cpu_id__node(struct perf_cpu cpu, void *data);
-
+/**
+ * aggr_cpu_id__global - Create an aggr_cpu_id for global aggregation.
+ * The function signature is compatible with aggr_cpu_id_get_t.
+ */
+struct aggr_cpu_id aggr_cpu_id__global(struct perf_cpu cpu, void *data);
 #endif /* __PERF_CPUMAP_H */
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 4113aa86772f..1d8e585df4ad 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -1477,13 +1477,8 @@ void evlist__print_counters(struct evlist *evlist, struct perf_stat_config *conf
 		if (config->iostat_run)
 			iostat_print_counters(evlist, config, ts, prefix = buf,
 					      print_counter_aggr);
-		else {
-			evlist__for_each_entry(evlist, counter) {
-				print_counter_aggr(config, counter, prefix);
-			}
-			if (metric_only)
-				fputc('\n', config->output);
-		}
+		else
+			print_aggr(config, evlist, prefix);
 		break;
 	case AGGR_NONE:
 		if (metric_only)
-- 
2.38.0.rc1.362.ged0d419d3c-goog
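
As a rough illustration of why an id function that always returns cpu 0
lets global mode go through the generic aggregation path, the sketch
below builds a map of distinct ids.  It assumes the aggr map keeps only
unique ids; struct aggr_id_sketch, get_global_sketch() and
build_map_sketch() are hypothetical names, and the struct is a
simplified stand-in for struct aggr_cpu_id.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified version of struct aggr_cpu_id. */
struct aggr_id_sketch {
	int socket;
	int core;
	int cpu;
};

static struct aggr_id_sketch aggr_id_sketch__empty(void)
{
	struct aggr_id_sketch id = { .socket = -1, .core = -1, .cpu = -1 };

	return id;
}

/* Global mode: every CPU maps to the same id, whose cpu field is 0. */
static struct aggr_id_sketch get_global_sketch(int cpu)
{
	struct aggr_id_sketch id = aggr_id_sketch__empty();

	(void)cpu;	/* the input CPU is deliberately ignored */
	id.cpu = 0;
	return id;
}

static bool aggr_id_sketch__equal(const struct aggr_id_sketch *a,
				  const struct aggr_id_sketch *b)
{
	return a->socket == b->socket && a->core == b->core && a->cpu == b->cpu;
}

/* Collect the distinct ids for CPUs 0..nr_cpus-1; returns the entry count. */
static int build_map_sketch(int nr_cpus, struct aggr_id_sketch *map)
{
	int nr = 0;

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		struct aggr_id_sketch id = get_global_sketch(cpu);
		bool found = false;

		for (int i = 0; i < nr; i++) {
			if (aggr_id_sketch__equal(&map[i], &id)) {
				found = true;
				break;
			}
		}
		if (!found)
			map[nr++] = id;
	}
	return nr;
}
```

However many CPUs are passed in, the resulting map has a single entry, so
a per-id print loop emits exactly one aggregated line.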



* [PATCH 05/19] perf stat: Add cpu aggr id for no aggregation mode
  2022-10-10  5:35 [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Namhyung Kim
                   ` (3 preceding siblings ...)
  2022-10-10  5:35 ` [PATCH 04/19] perf stat: Add aggr id for global mode Namhyung Kim
@ 2022-10-10  5:35 ` Namhyung Kim
  2022-10-10 22:49   ` Ian Rogers
  2022-10-12 10:40   ` Jiri Olsa
  2022-10-10  5:35 ` [PATCH 06/19] perf stat: Add 'needs_sort' argument to cpu_aggr_map__new() Namhyung Kim
                   ` (14 subsequent siblings)
  19 siblings, 2 replies; 63+ messages in thread
From: Namhyung Kim @ 2022-10-10  5:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun

Likewise, add a cpu aggr_id for no aggregation mode.  It is not actually
used yet, but later code will use it to unify the aggregation code.

No functional change intended.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-stat.c | 48 +++++++++++++++++++++++++++++++++++----
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 144bb3a657f2..b00ef20aef5b 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1339,6 +1339,12 @@ static struct aggr_cpu_id perf_stat__get_global(struct perf_stat_config *config
 	return id;
 }
 
+static struct aggr_cpu_id perf_stat__get_cpu(struct perf_stat_config *config __maybe_unused,
+					     struct perf_cpu cpu)
+{
+	return aggr_cpu_id__cpu(cpu, /*data=*/NULL);
+}
+
 static struct aggr_cpu_id perf_stat__get_aggr(struct perf_stat_config *config,
 					      aggr_get_id_t get_id, struct perf_cpu cpu)
 {
@@ -1381,6 +1387,12 @@ static struct aggr_cpu_id perf_stat__get_global_cached(struct perf_stat_config *
 	return perf_stat__get_aggr(config, perf_stat__get_global, cpu);
 }
 
+static struct aggr_cpu_id perf_stat__get_cpu_cached(struct perf_stat_config *config,
+						    struct perf_cpu cpu)
+{
+	return perf_stat__get_aggr(config, perf_stat__get_cpu, cpu);
+}
+
 static bool term_percore_set(void)
 {
 	struct evsel *counter;
@@ -1407,8 +1419,7 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr(enum aggr_mode aggr_mode)
 	case AGGR_NONE:
 		if (term_percore_set())
 			return aggr_cpu_id__core;
-
-		return NULL;
+		return aggr_cpu_id__cpu;
 	case AGGR_GLOBAL:
 		return aggr_cpu_id__global;
 	case AGGR_THREAD:
@@ -1431,10 +1442,9 @@ static aggr_get_id_t aggr_mode__get_id(enum aggr_mode aggr_mode)
 	case AGGR_NODE:
 		return perf_stat__get_node_cached;
 	case AGGR_NONE:
-		if (term_percore_set()) {
+		if (term_percore_set())
 			return perf_stat__get_core_cached;
-		}
-		return NULL;
+		return perf_stat__get_cpu_cached;
 	case AGGR_GLOBAL:
 		return perf_stat__get_global_cached;
 	case AGGR_THREAD:
@@ -1544,6 +1554,26 @@ static struct aggr_cpu_id perf_env__get_core_aggr_by_cpu(struct perf_cpu cpu, vo
 	return id;
 }
 
+static struct aggr_cpu_id perf_env__get_cpu_aggr_by_cpu(struct perf_cpu cpu, void *data)
+{
+	struct perf_env *env = data;
+	struct aggr_cpu_id id = aggr_cpu_id__empty();
+
+	if (cpu.cpu != -1) {
+		/*
+		 * core_id is relative to socket and die,
+		 * we need a global id. So we set
+		 * socket, die id and core id
+		 */
+		id.socket = env->cpu[cpu.cpu].socket_id;
+		id.die = env->cpu[cpu.cpu].die_id;
+		id.core = env->cpu[cpu.cpu].core_id;
+		id.cpu = cpu;
+	}
+
+	return id;
+}
+
 static struct aggr_cpu_id perf_env__get_node_aggr_by_cpu(struct perf_cpu cpu, void *data)
 {
 	struct aggr_cpu_id id = aggr_cpu_id__empty();
@@ -1579,6 +1609,12 @@ static struct aggr_cpu_id perf_stat__get_core_file(struct perf_stat_config *conf
 	return perf_env__get_core_aggr_by_cpu(cpu, &perf_stat.session->header.env);
 }
 
+static struct aggr_cpu_id perf_stat__get_cpu_file(struct perf_stat_config *config __maybe_unused,
+						  struct perf_cpu cpu)
+{
+	return perf_env__get_cpu_aggr_by_cpu(cpu, &perf_stat.session->header.env);
+}
+
 static struct aggr_cpu_id perf_stat__get_node_file(struct perf_stat_config *config __maybe_unused,
 						   struct perf_cpu cpu)
 {
@@ -1605,6 +1641,7 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr_file(enum aggr_mode aggr_mode)
 	case AGGR_GLOBAL:
 		return perf_env__get_global_aggr_by_cpu;
 	case AGGR_NONE:
+		return perf_env__get_cpu_aggr_by_cpu;
 	case AGGR_THREAD:
 	case AGGR_UNSET:
 	case AGGR_MAX:
@@ -1627,6 +1664,7 @@ static aggr_get_id_t aggr_mode__get_id_file(enum aggr_mode aggr_mode)
 	case AGGR_GLOBAL:
 		return perf_stat__get_global_file;
 	case AGGR_NONE:
+		return perf_stat__get_cpu_file;
 	case AGGR_THREAD:
 	case AGGR_UNSET:
 	case AGGR_MAX:
-- 
2.38.0.rc1.362.ged0d419d3c-goog



* [PATCH 06/19] perf stat: Add 'needs_sort' argument to cpu_aggr_map__new()
  2022-10-10  5:35 [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Namhyung Kim
                   ` (4 preceding siblings ...)
  2022-10-10  5:35 ` [PATCH 05/19] perf stat: Add cpu aggr id for no aggregation mode Namhyung Kim
@ 2022-10-10  5:35 ` Namhyung Kim
  2022-10-10 22:53   ` Ian Rogers
  2022-10-10  5:35 ` [PATCH 07/19] perf stat: Add struct perf_stat_aggr to perf_stat_evsel Namhyung Kim
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Namhyung Kim @ 2022-10-10  5:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun

In case of no aggregation, it needs to keep the original (cpu) ordering
in the aggr_map so that it stays in sync with the cpu map.  This makes
it easier for the code to handle AGGR_NONE like the other modes.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-stat.c | 7 +++++--
 tools/perf/util/cpumap.c  | 6 ++++--
 tools/perf/util/cpumap.h  | 2 +-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index b00ef20aef5b..e5ddf60ab31d 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1461,8 +1461,9 @@ static int perf_stat_init_aggr_mode(void)
 	aggr_cpu_id_get_t get_id = aggr_mode__get_aggr(stat_config.aggr_mode);
 
 	if (get_id) {
+		bool needs_sort = stat_config.aggr_mode != AGGR_NONE;
 		stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_requested_cpus,
-							 get_id, /*data=*/NULL);
+							 get_id, /*data=*/NULL, needs_sort);
 		if (!stat_config.aggr_map) {
 			pr_err("cannot build %s map", aggr_mode__string[stat_config.aggr_mode]);
 			return -1;
@@ -1677,11 +1678,13 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
 {
 	struct perf_env *env = &st->session->header.env;
 	aggr_cpu_id_get_t get_id = aggr_mode__get_aggr_file(stat_config.aggr_mode);
+	bool needs_sort = stat_config.aggr_mode != AGGR_NONE;
 
 	if (!get_id)
 		return 0;
 
-	stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_requested_cpus, get_id, env);
+	stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_requested_cpus,
+						 get_id, env, needs_sort);
 	if (!stat_config.aggr_map) {
 		pr_err("cannot build %s map", aggr_mode__string[stat_config.aggr_mode]);
 		return -1;
diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
index 60209fe87456..6e3fcf523de9 100644
--- a/tools/perf/util/cpumap.c
+++ b/tools/perf/util/cpumap.c
@@ -234,7 +234,7 @@ static int aggr_cpu_id__cmp(const void *a_pointer, const void *b_pointer)
 
 struct cpu_aggr_map *cpu_aggr_map__new(const struct perf_cpu_map *cpus,
 				       aggr_cpu_id_get_t get_id,
-				       void *data)
+				       void *data, bool needs_sort)
 {
 	int idx;
 	struct perf_cpu cpu;
@@ -270,8 +270,10 @@ struct cpu_aggr_map *cpu_aggr_map__new(const struct perf_cpu_map *cpus,
 		if (trimmed_c)
 			c = trimmed_c;
 	}
+
 	/* ensure we process id in increasing order */
-	qsort(c->map, c->nr, sizeof(struct aggr_cpu_id), aggr_cpu_id__cmp);
+	if (needs_sort)
+		qsort(c->map, c->nr, sizeof(struct aggr_cpu_id), aggr_cpu_id__cmp);
 
 	return c;
 
diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
index b2ff648bc417..da28b3146ef9 100644
--- a/tools/perf/util/cpumap.h
+++ b/tools/perf/util/cpumap.h
@@ -97,7 +97,7 @@ typedef struct aggr_cpu_id (*aggr_cpu_id_get_t)(struct perf_cpu cpu, void *data)
  */
 struct cpu_aggr_map *cpu_aggr_map__new(const struct perf_cpu_map *cpus,
 				       aggr_cpu_id_get_t get_id,
-				       void *data);
+				       void *data, bool needs_sort);
 
 bool aggr_cpu_id__equal(const struct aggr_cpu_id *a, const struct aggr_cpu_id *b);
 bool aggr_cpu_id__is_empty(const struct aggr_cpu_id *a);
-- 
2.38.0.rc1.362.ged0d419d3c-goog
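
The effect of the new flag can be shown with a small stand-alone sketch.
It uses plain ints instead of struct aggr_cpu_id, and
build_id_map_sketch() and cmp_int() are hypothetical helpers, not perf
functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Three-way comparison for qsort(); avoids overflow from subtraction. */
static int cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a;
	int y = *(const int *)b;

	return (x > y) - (x < y);
}

/*
 * Copy one id per CPU into 'map'.  With needs_sort == false the map keeps
 * the order of 'cpus', so map[i] still corresponds to cpus[i].
 */
static void build_id_map_sketch(const int *cpus, int nr, int *map, bool needs_sort)
{
	for (int i = 0; i < nr; i++)
		map[i] = cpus[i];

	if (needs_sort)
		qsort(map, nr, sizeof(map[0]), cmp_int);
}
```

For a cpu list of 3, 1, 2 the unsorted map stays 3, 1, 2 and can be
indexed in step with the cpu list, whereas the sorted map becomes 1, 2, 3
and loses that correspondence.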



* [PATCH 07/19] perf stat: Add struct perf_stat_aggr to perf_stat_evsel
  2022-10-10  5:35 [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Namhyung Kim
                   ` (5 preceding siblings ...)
  2022-10-10  5:35 ` [PATCH 06/19] perf stat: Add 'needs_sort' argument to cpu_aggr_map__new() Namhyung Kim
@ 2022-10-10  5:35 ` Namhyung Kim
  2022-10-10 23:00   ` Ian Rogers
  2022-10-10  5:35 ` [PATCH 08/19] perf stat: Allocate evsel->stats->aggr properly Namhyung Kim
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Namhyung Kim @ 2022-10-10  5:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun

The perf_stat_aggr struct keeps aggregated counter values and the
states according to the aggregation mode.  The number of entries
depends on the mode, and this is a preparation for later use.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/stat.c | 34 +++++++++++++++++++++++++++-------
 tools/perf/util/stat.h |  9 +++++++++
 2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 8ec8bb4a9912..c9d5aa295b54 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -133,15 +133,33 @@ static void perf_stat_evsel_id_init(struct evsel *evsel)
 static void evsel__reset_stat_priv(struct evsel *evsel)
 {
 	struct perf_stat_evsel *ps = evsel->stats;
+	struct perf_stat_aggr *aggr = ps->aggr;
 
 	init_stats(&ps->res_stats);
+
+	if (aggr)
+		memset(aggr, 0, sizeof(*aggr) * ps->nr_aggr);
 }
 
-static int evsel__alloc_stat_priv(struct evsel *evsel)
+
+static int evsel__alloc_stat_priv(struct evsel *evsel, int nr_aggr)
 {
-	evsel->stats = zalloc(sizeof(struct perf_stat_evsel));
-	if (evsel->stats == NULL)
+	struct perf_stat_evsel *ps;
+
+	ps = zalloc(sizeof(*ps));
+	if (ps == NULL)
 		return -ENOMEM;
+
+	if (nr_aggr) {
+		ps->nr_aggr = nr_aggr;
+		ps->aggr = calloc(nr_aggr, sizeof(*ps->aggr));
+		if (ps->aggr == NULL) {
+			free(ps);
+			return -ENOMEM;
+		}
+	}
+
+	evsel->stats = ps;
 	perf_stat_evsel_id_init(evsel);
 	evsel__reset_stat_priv(evsel);
 	return 0;
@@ -151,8 +169,10 @@ static void evsel__free_stat_priv(struct evsel *evsel)
 {
 	struct perf_stat_evsel *ps = evsel->stats;
 
-	if (ps)
+	if (ps) {
+		zfree(&ps->aggr);
 		zfree(&ps->group_data);
+	}
 	zfree(&evsel->stats);
 }
 
@@ -181,9 +201,9 @@ static void evsel__reset_prev_raw_counts(struct evsel *evsel)
 		perf_counts__reset(evsel->prev_raw_counts);
 }
 
-static int evsel__alloc_stats(struct evsel *evsel, bool alloc_raw)
+static int evsel__alloc_stats(struct evsel *evsel, int nr_aggr, bool alloc_raw)
 {
-	if (evsel__alloc_stat_priv(evsel) < 0 ||
+	if (evsel__alloc_stat_priv(evsel, nr_aggr) < 0 ||
 	    evsel__alloc_counts(evsel) < 0 ||
 	    (alloc_raw && evsel__alloc_prev_raw_counts(evsel) < 0))
 		return -ENOMEM;
@@ -196,7 +216,7 @@ int evlist__alloc_stats(struct evlist *evlist, bool alloc_raw)
 	struct evsel *evsel;
 
 	evlist__for_each_entry(evlist, evsel) {
-		if (evsel__alloc_stats(evsel, alloc_raw))
+		if (evsel__alloc_stats(evsel, 0, alloc_raw))
 			goto out_free;
 	}
 
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index b0899c6e002f..ea356e5aa351 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -8,6 +8,7 @@
 #include <sys/resource.h>
 #include "cpumap.h"
 #include "rblist.h"
+#include "counts.h"
 
 struct perf_cpu_map;
 struct perf_stat_config;
@@ -42,9 +43,17 @@ enum perf_stat_evsel_id {
 	PERF_STAT_EVSEL_ID__MAX,
 };
 
+struct perf_stat_aggr {
+	struct perf_counts_values	counts;
+	int				nr;
+	bool				failed;
+};
+
 struct perf_stat_evsel {
 	struct stats		 res_stats;
 	enum perf_stat_evsel_id	 id;
+	int			 nr_aggr;
+	struct perf_stat_aggr	*aggr;
 	u64			*group_data;
 };
 
-- 
2.38.0.rc1.362.ged0d419d3c-goog
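
The allocation scheme in this patch (one container plus an optional
array of nr_aggr entries, with the container freed if the array
allocation fails) can be sketched on its own.  The names below
(struct aggr_sketch, struct stats_sketch and the stats_sketch__*()
helpers) are hypothetical and use plain libc calls rather than perf's
zalloc()/zfree().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced versions of perf_stat_aggr and perf_stat_evsel. */
struct aggr_sketch {
	unsigned long long val;
	int nr;
};

struct stats_sketch {
	int nr_aggr;
	struct aggr_sketch *aggr;	/* NULL when nr_aggr is 0 */
};

static int stats_sketch__alloc(struct stats_sketch **out, int nr_aggr)
{
	struct stats_sketch *ps = calloc(1, sizeof(*ps));

	if (ps == NULL)
		return -ENOMEM;

	if (nr_aggr) {
		ps->nr_aggr = nr_aggr;
		ps->aggr = calloc(nr_aggr, sizeof(*ps->aggr));
		if (ps->aggr == NULL) {
			free(ps);	/* do not leak the container */
			return -ENOMEM;
		}
	}

	*out = ps;
	return 0;
}

/* Clear every aggregation slot; a no-op when no array was allocated. */
static void stats_sketch__reset(struct stats_sketch *ps)
{
	if (ps->aggr)
		memset(ps->aggr, 0, sizeof(*ps->aggr) * ps->nr_aggr);
}

static void stats_sketch__free(struct stats_sketch **ps)
{
	if (*ps)
		free((*ps)->aggr);
	free(*ps);
	*ps = NULL;
}
```

Passing nr_aggr == 0 leaves the array pointer NULL, which mirrors how the
callers in this patch still pass 0 until a later patch supplies the real
size.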



* [PATCH 08/19] perf stat: Allocate evsel->stats->aggr properly
  2022-10-10  5:35 [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Namhyung Kim
                   ` (6 preceding siblings ...)
  2022-10-10  5:35 ` [PATCH 07/19] perf stat: Add struct perf_stat_aggr to perf_stat_evsel Namhyung Kim
@ 2022-10-10  5:35 ` Namhyung Kim
  2022-10-10 23:03   ` Ian Rogers
  2022-10-10  5:35 ` [PATCH 09/19] perf stat: Aggregate events using evsel->stats->aggr Namhyung Kim
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Namhyung Kim @ 2022-10-10  5:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun

The perf_stat_config.aggr_map should have the correct size of the
aggregation map.  Use it to allocate aggr_counts.

Also, AGGR_NONE with per-core events can be tricky because it basically
doesn't aggregate, but it needs to do so for per-core events only.
So only per-core evsels will have stats->aggr data.

Note that other callers of evlist__alloc_stats() might not have
stat_config or aggr_map.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-script.c     | 4 ++--
 tools/perf/builtin-stat.c       | 6 +++---
 tools/perf/tests/parse-metric.c | 2 +-
 tools/perf/tests/pmu-events.c   | 2 +-
 tools/perf/util/stat.c          | 9 +++++++--
 tools/perf/util/stat.h          | 3 ++-
 6 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 7ca238277d83..691915a71c86 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2049,7 +2049,7 @@ static void perf_sample__fprint_metric(struct perf_script *script,
 	u64 val;
 
 	if (!evsel->stats)
-		evlist__alloc_stats(script->session->evlist, false);
+		evlist__alloc_stats(&stat_config, script->session->evlist, false);
 	if (evsel_script(leader)->gnum++ == 0)
 		perf_stat__reset_shadow_stats();
 	val = sample->period * evsel->scale;
@@ -3632,7 +3632,7 @@ static int set_maps(struct perf_script *script)
 
 	perf_evlist__set_maps(&evlist->core, script->cpus, script->threads);
 
-	if (evlist__alloc_stats(evlist, true))
+	if (evlist__alloc_stats(&stat_config, evlist, true))
 		return -ENOMEM;
 
 	script->allocated = true;
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index e5ddf60ab31d..eaddafbd7ff2 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -2124,7 +2124,7 @@ static int set_maps(struct perf_stat *st)
 
 	perf_evlist__set_maps(&evsel_list->core, st->cpus, st->threads);
 
-	if (evlist__alloc_stats(evsel_list, true))
+	if (evlist__alloc_stats(&stat_config, evsel_list, true))
 		return -ENOMEM;
 
 	st->maps_allocated = true;
@@ -2571,10 +2571,10 @@ int cmd_stat(int argc, const char **argv)
 		goto out;
 	}
 
-	if (evlist__alloc_stats(evsel_list, interval))
+	if (perf_stat_init_aggr_mode())
 		goto out;
 
-	if (perf_stat_init_aggr_mode())
+	if (evlist__alloc_stats(&stat_config, evsel_list, interval))
 		goto out;
 
 	/*
diff --git a/tools/perf/tests/parse-metric.c b/tools/perf/tests/parse-metric.c
index 68f5a2a03242..cb3a9b795c0f 100644
--- a/tools/perf/tests/parse-metric.c
+++ b/tools/perf/tests/parse-metric.c
@@ -103,7 +103,7 @@ static int __compute_metric(const char *name, struct value *vals,
 	if (err)
 		goto out;
 
-	err = evlist__alloc_stats(evlist, false);
+	err = evlist__alloc_stats(NULL, evlist, false);
 	if (err)
 		goto out;
 
diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index 097e05c796ab..a5e1028dacfc 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -889,7 +889,7 @@ static int test__parsing_callback(const struct pmu_event *pe, const struct pmu_e
 		goto out_err;
 	}
 
-	err = evlist__alloc_stats(evlist, false);
+	err = evlist__alloc_stats(NULL, evlist, false);
 	if (err)
 		goto out_err;
 	/*
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index c9d5aa295b54..374149628507 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -211,12 +211,17 @@ static int evsel__alloc_stats(struct evsel *evsel, int nr_aggr, bool alloc_raw)
 	return 0;
 }
 
-int evlist__alloc_stats(struct evlist *evlist, bool alloc_raw)
+int evlist__alloc_stats(struct perf_stat_config *config,
+			struct evlist *evlist, bool alloc_raw)
 {
 	struct evsel *evsel;
+	int nr_aggr = 0;
+
+	if (config && config->aggr_map)
+		nr_aggr = config->aggr_map->nr;
 
 	evlist__for_each_entry(evlist, evsel) {
-		if (evsel__alloc_stats(evsel, 0, alloc_raw))
+		if (evsel__alloc_stats(evsel, nr_aggr, alloc_raw))
 			goto out_free;
 	}
 
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index ea356e5aa351..74bd51a3cb36 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -257,7 +257,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
 				   struct runtime_stat *st);
 void perf_stat__collect_metric_expr(struct evlist *);
 
-int evlist__alloc_stats(struct evlist *evlist, bool alloc_raw);
+int evlist__alloc_stats(struct perf_stat_config *config,
+			struct evlist *evlist, bool alloc_raw);
 void evlist__free_stats(struct evlist *evlist);
 void evlist__reset_stats(struct evlist *evlist);
 void evlist__reset_prev_raw_counts(struct evlist *evlist);
-- 
2.38.0.rc1.362.ged0d419d3c-goog



* [PATCH 09/19] perf stat: Aggregate events using evsel->stats->aggr
  2022-10-10  5:35 [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Namhyung Kim
                   ` (7 preceding siblings ...)
  2022-10-10  5:35 ` [PATCH 08/19] perf stat: Allocate evsel->stats->aggr properly Namhyung Kim
@ 2022-10-10  5:35 ` Namhyung Kim
  2022-10-10 23:11   ` Ian Rogers
  2022-10-10  5:35 ` [PATCH 10/19] perf stat: Aggregate per-thread stats " Namhyung Kim
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Namhyung Kim @ 2022-10-10  5:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun

Add logic to aggregate counter values into the new evsel->stats->aggr.
This is not used yet, so shadow stats are not updated, but a later patch
will convert the existing code to use it.

With that, we don't need to handle AGGR_GLOBAL specially anymore.  It
can use the same logic with counts, prev_counts and aggr_counts.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
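A reading aid for the aggregation rule, not part of the patch.  This is
a minimal sketch with hypothetical toy_* types; it leaves out the
aggr_map lookup and the evsel->counts->scaled check that the real code
performs.  It shows only how one bad reading zeroes a slot and keeps it
zeroed for the rest of the interval.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins; the real perf structures carry more fields. */
struct toy_counts { uint64_t val, ena, run; };
struct toy_aggr { struct toy_counts counts; int nr; bool failed; };

/* Fold one per-CPU reading into its aggregation slot. */
void toy_aggr_add(struct toy_aggr *slot, const struct toy_counts *count)
{
	slot->nr++;

	/* A reading that was never enabled or never ran poisons the slot. */
	if (count->ena == 0 || count->run == 0) {
		slot->counts = (struct toy_counts){ 0, 0, 0 };
		slot->failed = true;
	}

	/* Once failed, later good readings are ignored as well. */
	if (!slot->failed) {
		slot->counts.val += count->val;
		slot->counts.ena += count->ena;
		slot->counts.run += count->run;
	}
}
```

The failed flag is sticky on purpose, so a slot never mixes valid and
invalid readings within one interval.
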
 tools/perf/builtin-stat.c                     |  3 --
 tools/perf/util/evsel.c                       |  9 +---
 .../scripting-engines/trace-event-python.c    |  6 ---
 tools/perf/util/stat.c                        | 46 ++++++++++++++++---
 4 files changed, 41 insertions(+), 23 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index eaddafbd7ff2..139e35ed68d3 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -963,9 +963,6 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 		init_stats(&walltime_nsecs_stats);
 		update_stats(&walltime_nsecs_stats, t1 - t0);
 
-		if (stat_config.aggr_mode == AGGR_GLOBAL)
-			evlist__save_aggr_prev_raw_counts(evsel_list);
-
 		evlist__copy_prev_raw_counts(evsel_list);
 		evlist__reset_prev_raw_counts(evsel_list);
 		perf_stat__reset_shadow_per_stat(&rt_stat);
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index a6ea91c72659..a1fcb3166149 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1526,13 +1526,8 @@ void evsel__compute_deltas(struct evsel *evsel, int cpu_map_idx, int thread,
 	if (!evsel->prev_raw_counts)
 		return;
 
-	if (cpu_map_idx == -1) {
-		tmp = evsel->prev_raw_counts->aggr;
-		evsel->prev_raw_counts->aggr = *count;
-	} else {
-		tmp = *perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread);
-		*perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread) = *count;
-	}
+	tmp = *perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread);
+	*perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread) = *count;
 
 	count->val = count->val - tmp.val;
 	count->ena = count->ena - tmp.ena;
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 1f2040f36d4e..7bc8559dce6a 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -1653,12 +1653,6 @@ static void python_process_stat(struct perf_stat_config *config,
 	struct perf_cpu_map *cpus = counter->core.cpus;
 	int cpu, thread;
 
-	if (config->aggr_mode == AGGR_GLOBAL) {
-		process_stat(counter, (struct perf_cpu){ .cpu = -1 }, -1, tstamp,
-			     &counter->counts->aggr);
-		return;
-	}
-
 	for (thread = 0; thread < threads->nr; thread++) {
 		for (cpu = 0; cpu < perf_cpu_map__nr(cpus); cpu++) {
 			process_stat(counter, perf_cpu_map__cpu(cpus, cpu),
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 374149628507..99874254809d 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -387,6 +387,7 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
 		       struct perf_counts_values *count)
 {
 	struct perf_counts_values *aggr = &evsel->counts->aggr;
+	struct perf_stat_evsel *ps = evsel->stats;
 	static struct perf_counts_values zero;
 	bool skip = false;
 
@@ -398,6 +399,44 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
 	if (skip)
 		count = &zero;
 
+	if (!evsel->snapshot)
+		evsel__compute_deltas(evsel, cpu_map_idx, thread, count);
+	perf_counts_values__scale(count, config->scale, NULL);
+
+	if (ps->aggr) {
+		struct perf_cpu cpu = perf_cpu_map__cpu(evsel->core.cpus, cpu_map_idx);
+		struct aggr_cpu_id aggr_id = config->aggr_get_id(config, cpu);
+		struct perf_stat_aggr *ps_aggr;
+		int i;
+
+		for (i = 0; i < ps->nr_aggr; i++) {
+			if (!aggr_cpu_id__equal(&aggr_id, &config->aggr_map->map[i]))
+				continue;
+
+			ps_aggr = &ps->aggr[i];
+			ps_aggr->nr++;
+
+			/*
+			 * When any result is bad, make them all to give
+			 * consistent output in interval mode.
+			 */
+			if (count->ena == 0 || count->run == 0 ||
+			    evsel->counts->scaled == -1) {
+				ps_aggr->counts.val = 0;
+				ps_aggr->counts.ena = 0;
+				ps_aggr->counts.run = 0;
+				ps_aggr->failed = true;
+			}
+
+			if (!ps_aggr->failed) {
+				ps_aggr->counts.val += count->val;
+				ps_aggr->counts.ena += count->ena;
+				ps_aggr->counts.run += count->run;
+			}
+			break;
+		}
+	}
+
 	switch (config->aggr_mode) {
 	case AGGR_THREAD:
 	case AGGR_CORE:
@@ -405,9 +444,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
 	case AGGR_SOCKET:
 	case AGGR_NODE:
 	case AGGR_NONE:
-		if (!evsel->snapshot)
-			evsel__compute_deltas(evsel, cpu_map_idx, thread, count);
-		perf_counts_values__scale(count, config->scale, NULL);
 		if ((config->aggr_mode == AGGR_NONE) && (!evsel->percore)) {
 			perf_stat__update_shadow_stats(evsel, count->val,
 						       cpu_map_idx, &rt_stat);
@@ -469,10 +505,6 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 	if (config->aggr_mode != AGGR_GLOBAL)
 		return 0;
 
-	if (!counter->snapshot)
-		evsel__compute_deltas(counter, -1, -1, aggr);
-	perf_counts_values__scale(aggr, config->scale, &counter->counts->scaled);
-
 	update_stats(&ps->res_stats, *count);
 
 	if (verbose > 0) {
-- 
2.38.0.rc1.362.ged0d419d3c-goog



* [PATCH 10/19] perf stat: Aggregate per-thread stats using evsel->stats->aggr
  2022-10-10  5:35 [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Namhyung Kim
                   ` (8 preceding siblings ...)
  2022-10-10  5:35 ` [PATCH 09/19] perf stat: Aggregate events using evsel->stats->aggr Namhyung Kim
@ 2022-10-10  5:35 ` Namhyung Kim
  2022-10-10 23:17   ` Ian Rogers
  2022-10-10  5:35 ` [PATCH 11/19] perf stat: Allocate aggr counts for recorded data Namhyung Kim
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Namhyung Kim @ 2022-10-10  5:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun

Per-thread aggregation doesn't use CPU numbers, but the logic should
be the same.  Initialize cpu_aggr_map separately for AGGR_THREAD and use
the thread map index to aggregate counter values.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
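A reading aid for the per-thread map, not part of the patch.  This is a
minimal sketch with hypothetical toy_* types: each slot's id carries only
a thread index and the CPU field stays "empty" (-1 here, an assumption of
this sketch), so slot s can be addressed directly by thread map index s.

```c
#include <stdlib.h>

/* Simplified stand-ins for aggr_cpu_id and cpu_aggr_map. */
struct toy_id { int thread_idx; int cpu; };
struct toy_map { int nr; struct toy_id map[]; };

/* Build a map with one slot per thread; returns NULL on allocation failure. */
struct toy_map *toy_thread_map_new(int nr_threads)
{
	struct toy_map *m;

	m = malloc(sizeof(*m) + (size_t)nr_threads * sizeof(m->map[0]));
	if (m == NULL)
		return NULL;

	m->nr = nr_threads;
	for (int s = 0; s < nr_threads; s++) {
		m->map[s].thread_idx = s;	/* slot index == thread index */
		m->map[s].cpu = -1;		/* no CPU identity in this mode */
	}
	return m;
}
```

Because the slot index equals the thread index, no id comparison loop is
needed when adding a value for a given thread.
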
 tools/perf/builtin-stat.c | 31 +++++++++++++++++++++++++++++++
 tools/perf/util/stat.c    | 19 +++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 139e35ed68d3..c76240cfc635 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1468,6 +1468,21 @@ static int perf_stat_init_aggr_mode(void)
 		stat_config.aggr_get_id = aggr_mode__get_id(stat_config.aggr_mode);
 	}
 
+	if (stat_config.aggr_mode == AGGR_THREAD) {
+		nr = perf_thread_map__nr(evsel_list->core.threads);
+		stat_config.aggr_map = cpu_aggr_map__empty_new(nr);
+		if (stat_config.aggr_map == NULL)
+			return -ENOMEM;
+
+		for (int s = 0; s < nr; s++) {
+			struct aggr_cpu_id id = aggr_cpu_id__empty();
+
+			id.thread_idx = s;
+			stat_config.aggr_map->map[s] = id;
+		}
+		return 0;
+	}
+
 	/*
 	 * The evsel_list->cpus is the base we operate on,
 	 * taking the highest cpu number to be the size of
@@ -1677,6 +1692,22 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
 	aggr_cpu_id_get_t get_id = aggr_mode__get_aggr_file(stat_config.aggr_mode);
 	bool needs_sort = stat_config.aggr_mode != AGGR_NONE;
 
+	if (stat_config.aggr_mode == AGGR_THREAD) {
+		int nr = perf_thread_map__nr(evsel_list->core.threads);
+
+		stat_config.aggr_map = cpu_aggr_map__empty_new(nr);
+		if (stat_config.aggr_map == NULL)
+			return -ENOMEM;
+
+		for (int s = 0; s < nr; s++) {
+			struct aggr_cpu_id id = aggr_cpu_id__empty();
+
+			id.thread_idx = s;
+			stat_config.aggr_map->map[s] = id;
+		}
+		return 0;
+	}
+
 	if (!get_id)
 		return 0;
 
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 99874254809d..013dbe1c5d28 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -403,6 +403,24 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
 		evsel__compute_deltas(evsel, cpu_map_idx, thread, count);
 	perf_counts_values__scale(count, config->scale, NULL);
 
+	if (config->aggr_mode == AGGR_THREAD) {
+		struct perf_counts_values *aggr_counts = &ps->aggr[thread].counts;
+
+		/*
+		 * Skip value 0 when enabling --per-thread globally,
+		 * otherwise too many 0 output.
+		 */
+		if (count->val == 0 && config->system_wide)
+			return 0;
+
+		ps->aggr[thread].nr++;
+
+		aggr_counts->val += count->val;
+		aggr_counts->ena += count->ena;
+		aggr_counts->run += count->run;
+		goto update;
+	}
+
 	if (ps->aggr) {
 		struct perf_cpu cpu = perf_cpu_map__cpu(evsel->core.cpus, cpu_map_idx);
 		struct aggr_cpu_id aggr_id = config->aggr_get_id(config, cpu);
@@ -437,6 +455,7 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
 		}
 	}
 
+update:
 	switch (config->aggr_mode) {
 	case AGGR_THREAD:
 	case AGGR_CORE:
-- 
2.38.0.rc1.362.ged0d419d3c-goog



* [PATCH 11/19] perf stat: Allocate aggr counts for recorded data
  2022-10-10  5:35 [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Namhyung Kim
                   ` (9 preceding siblings ...)
  2022-10-10  5:35 ` [PATCH 10/19] perf stat: Aggregate per-thread stats " Namhyung Kim
@ 2022-10-10  5:35 ` Namhyung Kim
  2022-10-10 23:18   ` Ian Rogers
  2022-10-10  5:35 ` [PATCH 12/19] perf stat: Reset aggr counts for each interval Namhyung Kim
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Namhyung Kim @ 2022-10-10  5:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun

process_stat_config_event() is where the aggr_mode gets set, which means
the earlier evlist__alloc_stats() cannot allocate the aggr counts because
the aggr_mode is not known yet.

Do it after setting the aggr_map using evlist__alloc_aggr_stats().

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
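A reading aid for the two-phase allocation, not part of the patch.  This
is a minimal sketch with hypothetical toy_* names: the per-event stats
are created first with no slots, and the slots are sized later once the
aggregation map is known.  The nr_aggr <= 0 guard is an addition of this
sketch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-ins for perf_stat_aggr and perf_stat_evsel. */
struct toy_aggr { uint64_t val, ena, run; int nr; };
struct toy_stats { int nr_aggr; struct toy_aggr *aggr; };

/* Phase 1: per-event stats exist, but no aggregation slots yet. */
struct toy_stats *toy_stats_new(void)
{
	return calloc(1, sizeof(struct toy_stats));
}

/* Phase 2: size the slots once the aggregation map is known. */
int toy_stats_alloc_aggr(struct toy_stats *ps, int nr_aggr)
{
	if (ps == NULL || nr_aggr <= 0)
		return 0;

	ps->aggr = calloc((size_t)nr_aggr, sizeof(*ps->aggr));
	if (ps->aggr == NULL)
		return -1;

	ps->nr_aggr = nr_aggr;
	return 0;
}
```
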
 tools/perf/builtin-stat.c |  8 ++++++++
 tools/perf/util/stat.c    | 39 +++++++++++++++++++++++++++++++--------
 tools/perf/util/stat.h    |  2 ++
 3 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index c76240cfc635..983f38cd4caa 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -2139,6 +2139,14 @@ int process_stat_config_event(struct perf_session *session,
 	else
 		perf_stat_init_aggr_mode_file(st);
 
+	if (stat_config.aggr_map) {
+		int nr_aggr = stat_config.aggr_map->nr;
+
+		if (evlist__alloc_aggr_stats(session->evlist, nr_aggr) < 0) {
+			pr_err("cannot allocate aggr counts\n");
+			return -1;
+		}
+	}
 	return 0;
 }
 
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 013dbe1c5d28..279aa4ea342d 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -141,6 +141,31 @@ static void evsel__reset_stat_priv(struct evsel *evsel)
 		memset(aggr, 0, sizeof(*aggr) * ps->nr_aggr);
 }
 
+static int evsel__alloc_aggr_stats(struct evsel *evsel, int nr_aggr)
+{
+	struct perf_stat_evsel *ps = evsel->stats;
+
+	if (ps == NULL)
+		return 0;
+
+	ps->nr_aggr = nr_aggr;
+	ps->aggr = calloc(nr_aggr, sizeof(*ps->aggr));
+	if (ps->aggr == NULL)
+		return -ENOMEM;
+
+	return 0;
+}
+
+int evlist__alloc_aggr_stats(struct evlist *evlist, int nr_aggr)
+{
+	struct evsel *evsel;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel__alloc_aggr_stats(evsel, nr_aggr) < 0)
+			return -1;
+	}
+	return 0;
+}
 
 static int evsel__alloc_stat_priv(struct evsel *evsel, int nr_aggr)
 {
@@ -150,16 +175,14 @@ static int evsel__alloc_stat_priv(struct evsel *evsel, int nr_aggr)
 	if (ps == NULL)
 		return -ENOMEM;
 
-	if (nr_aggr) {
-		ps->nr_aggr = nr_aggr;
-		ps->aggr = calloc(nr_aggr, sizeof(*ps->aggr));
-		if (ps->aggr == NULL) {
-			free(ps);
-			return -ENOMEM;
-		}
+	evsel->stats = ps;
+
+	if (nr_aggr && evsel__alloc_aggr_stats(evsel, nr_aggr) < 0) {
+		evsel->stats = NULL;
+		free(ps);
+		return -ENOMEM;
 	}
 
-	evsel->stats = ps;
 	perf_stat_evsel_id_init(evsel);
 	evsel__reset_stat_priv(evsel);
 	return 0;
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 74bd51a3cb36..936c0709ce0d 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -265,6 +265,8 @@ void evlist__reset_prev_raw_counts(struct evlist *evlist);
 void evlist__copy_prev_raw_counts(struct evlist *evlist);
 void evlist__save_aggr_prev_raw_counts(struct evlist *evlist);
 
+int evlist__alloc_aggr_stats(struct evlist *evlist, int nr_aggr);
+
 int perf_stat_process_counter(struct perf_stat_config *config,
 			      struct evsel *counter);
 struct perf_tool;
-- 
2.38.0.rc1.362.ged0d419d3c-goog



* [PATCH 12/19] perf stat: Reset aggr counts for each interval
  2022-10-10  5:35 [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Namhyung Kim
                   ` (10 preceding siblings ...)
  2022-10-10  5:35 ` [PATCH 11/19] perf stat: Allocate aggr counts for recorded data Namhyung Kim
@ 2022-10-10  5:35 ` Namhyung Kim
  2022-10-10 23:20   ` Ian Rogers
  2022-10-10  5:35 ` [PATCH 13/19] perf stat: Split process_counters() Namhyung Kim
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Namhyung Kim @ 2022-10-10  5:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun

The evsel->stats->aggr->counts should be reset for each interval
since we want to use the values directly for display.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
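A reading aid for the per-interval reset, not part of the patch.  This is
a minimal sketch with hypothetical toy_* types; the NULL check on the
stats pointer is an extra guard added in this sketch.  Clearing the whole
slot array also clears the bookkeeping fields, so each interval starts
from zero.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for perf_stat_aggr and perf_stat_evsel. */
struct toy_aggr { uint64_t val, ena, run; int nr; bool failed; };
struct toy_stats { int nr_aggr; struct toy_aggr *aggr; };

/* Zero every aggregation slot so the next interval starts clean. */
void toy_reset_aggr(struct toy_stats *ps)
{
	if (ps && ps->aggr)
		memset(ps->aggr, 0, sizeof(*ps->aggr) * (size_t)ps->nr_aggr);
}
```
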
 tools/perf/builtin-stat.c |  3 +++
 tools/perf/util/stat.c    | 13 +++++++++++++
 tools/perf/util/stat.h    |  1 +
 3 files changed, 17 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 983f38cd4caa..38036f40e993 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -492,6 +492,8 @@ static void process_interval(void)
 	diff_timespec(&rs, &ts, &ref_time);
 
 	perf_stat__reset_shadow_per_stat(&rt_stat);
+	evlist__reset_aggr_stats(evsel_list);
+
 	read_counters(&rs);
 
 	if (STAT_RECORD) {
@@ -965,6 +967,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 
 		evlist__copy_prev_raw_counts(evsel_list);
 		evlist__reset_prev_raw_counts(evsel_list);
+		evlist__reset_aggr_stats(evsel_list);
 		perf_stat__reset_shadow_per_stat(&rt_stat);
 	} else {
 		update_stats(&walltime_nsecs_stats, t1 - t0);
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 279aa4ea342d..4edfc1c5dc07 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -276,6 +276,19 @@ void evlist__reset_stats(struct evlist *evlist)
 	}
 }
 
+void evlist__reset_aggr_stats(struct evlist *evlist)
+{
+	struct evsel *evsel;
+
+	evlist__for_each_entry(evlist, evsel) {
+		struct perf_stat_evsel *ps = evsel->stats;
+		struct perf_stat_aggr *aggr = ps->aggr;
+
+		if (aggr)
+			memset(aggr, 0, sizeof(*aggr) * ps->nr_aggr);
+	}
+}
+
 void evlist__reset_prev_raw_counts(struct evlist *evlist)
 {
 	struct evsel *evsel;
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 936c0709ce0d..3a876ad2870b 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -266,6 +266,7 @@ void evlist__copy_prev_raw_counts(struct evlist *evlist);
 void evlist__save_aggr_prev_raw_counts(struct evlist *evlist);
 
 int evlist__alloc_aggr_stats(struct evlist *evlist, int nr_aggr);
+void evlist__reset_aggr_stats(struct evlist *evlist);
 
 int perf_stat_process_counter(struct perf_stat_config *config,
 			      struct evsel *counter);
-- 
2.38.0.rc1.362.ged0d419d3c-goog



* [PATCH 13/19] perf stat: Split process_counters()
  2022-10-10  5:35 [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Namhyung Kim
                   ` (11 preceding siblings ...)
  2022-10-10  5:35 ` [PATCH 12/19] perf stat: Reset aggr counts for each interval Namhyung Kim
@ 2022-10-10  5:35 ` Namhyung Kim
  2022-10-10 23:21   ` Ian Rogers
  2022-10-10  5:35 ` [PATCH 14/19] perf stat: Add perf_stat_merge_counters() Namhyung Kim
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Namhyung Kim @ 2022-10-10  5:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun

Counter processing will do more work with aggregation.  Split it out of
read_counters() so that it can be shared with process_stat_round_event() too.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-stat.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 38036f40e993..49a7e290d778 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -465,15 +465,19 @@ static int read_bpf_map_counters(void)
 	return 0;
 }
 
-static void read_counters(struct timespec *rs)
+static int read_counters(struct timespec *rs)
 {
-	struct evsel *counter;
-
 	if (!stat_config.stop_read_counter) {
 		if (read_bpf_map_counters() ||
 		    read_affinity_counters(rs))
-			return;
+			return -1;
 	}
+	return 0;
+}
+
+static void process_counters(void)
+{
+	struct evsel *counter;
 
 	evlist__for_each_entry(evsel_list, counter) {
 		if (counter->err)
@@ -494,7 +498,8 @@ static void process_interval(void)
 	perf_stat__reset_shadow_per_stat(&rt_stat);
 	evlist__reset_aggr_stats(evsel_list);
 
-	read_counters(&rs);
+	if (read_counters(&rs) == 0)
+		process_counters();
 
 	if (STAT_RECORD) {
 		if (WRITE_STAT_ROUND_EVENT(rs.tv_sec * NSEC_PER_SEC + rs.tv_nsec, INTERVAL))
@@ -980,7 +985,8 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	 * avoid arbitrary skew, we must read all counters before closing any
 	 * group leaders.
 	 */
-	read_counters(&(struct timespec) { .tv_nsec = t1-t0 });
+	if (read_counters(&(struct timespec) { .tv_nsec = t1-t0 }) == 0)
+		process_counters();
 
 	/*
 	 * We need to keep evsel_list alive, because it's processed
@@ -2098,13 +2104,11 @@ static int process_stat_round_event(struct perf_session *session,
 				    union perf_event *event)
 {
 	struct perf_record_stat_round *stat_round = &event->stat_round;
-	struct evsel *counter;
 	struct timespec tsh, *ts = NULL;
 	const char **argv = session->header.env.cmdline_argv;
 	int argc = session->header.env.nr_cmdline;
 
-	evlist__for_each_entry(evsel_list, counter)
-		perf_stat_process_counter(&stat_config, counter);
+	process_counters();
 
 	if (stat_round->type == PERF_STAT_ROUND_TYPE__FINAL)
 		update_stats(&walltime_nsecs_stats, stat_round->time);
-- 
2.38.0.rc1.362.ged0d419d3c-goog



* [PATCH 14/19] perf stat: Add perf_stat_merge_counters()
  2022-10-10  5:35 [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Namhyung Kim
                   ` (12 preceding siblings ...)
  2022-10-10  5:35 ` [PATCH 13/19] perf stat: Split process_counters() Namhyung Kim
@ 2022-10-10  5:35 ` Namhyung Kim
  2022-10-10 23:31   ` Ian Rogers
  2022-10-10  5:35 ` [PATCH 15/19] perf stat: Add perf_stat_process_percore() Namhyung Kim
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Namhyung Kim @ 2022-10-10  5:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun

perf_stat_merge_counters() aggregates the same events on different PMUs,
as in the uncore or hybrid case.  The same logic is in the stat-display
routines, but I think it should be handled when the event counters are
processed.

As it works on the aggr counts, it doesn't change the output yet.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
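A reading aid for the alias merge, not part of the patch.  This is a
minimal sketch with hypothetical toy_* types and made-up event and PMU
names; it compares only the event name and PMU name, while the real
check also compares scale, unit, cgroup and clock type.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for an evsel with a single aggregated value. */
struct toy_event {
	const char *name;
	const char *pmu_name;
	uint64_t val;
	bool merged;
};

/* Same event name on a different PMU counts as an alias in this sketch. */
bool toy_is_alias(const struct toy_event *a, const struct toy_event *b)
{
	return strcmp(a->name, b->name) == 0 &&
	       strcmp(a->pmu_name, b->pmu_name) != 0;
}

/* Fold every later alias into the first occurrence and mark it merged. */
void toy_merge_aliases(struct toy_event *ev, int nr)
{
	for (int i = 0; i < nr; i++) {
		if (ev[i].merged)
			continue;

		for (int j = i + 1; j < nr; j++) {
			if (toy_is_alias(&ev[i], &ev[j])) {
				ev[i].val += ev[j].val;
				ev[j].merged = true;
			}
		}
	}
}
```

Only the first occurrence keeps a meaningful total; merged entries are
flagged so that display code can skip them.
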
 tools/perf/builtin-stat.c |  2 +
 tools/perf/util/stat.c    | 96 +++++++++++++++++++++++++++++++++++++++
 tools/perf/util/stat.h    |  2 +
 3 files changed, 100 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 49a7e290d778..f90e8f29cb23 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -486,6 +486,8 @@ static void process_counters(void)
 			pr_warning("failed to process counter %s\n", counter->name);
 		counter->err = 0;
 	}
+
+	perf_stat_merge_counters(&stat_config, evsel_list);
 }
 
 static void process_interval(void)
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 4edfc1c5dc07..1bb197782a34 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -575,6 +575,102 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 	return 0;
 }
 
+static int evsel__merge_aggr_counters(struct evsel *evsel, struct evsel *alias)
+{
+	struct perf_stat_evsel *ps_a = evsel->stats;
+	struct perf_stat_evsel *ps_b = alias->stats;
+	int i;
+
+	if (ps_a->aggr == NULL && ps_b->aggr == NULL)
+		return 0;
+
+	if (ps_a->nr_aggr != ps_b->nr_aggr) {
+		pr_err("Unmatched aggregation mode between aliases\n");
+		return -1;
+	}
+
+	for (i = 0; i < ps_a->nr_aggr; i++) {
+		struct perf_counts_values *aggr_counts_a = &ps_a->aggr[i].counts;
+		struct perf_counts_values *aggr_counts_b = &ps_b->aggr[i].counts;
+
+		/* NB: don't increase aggr.nr for aliases */
+
+		aggr_counts_a->val += aggr_counts_b->val;
+		aggr_counts_a->ena += aggr_counts_b->ena;
+		aggr_counts_a->run += aggr_counts_b->run;
+	}
+
+	return 0;
+}
+/* events should have the same name, scale, unit, cgroup but on different PMUs */
+static bool evsel__is_alias(struct evsel *evsel_a, struct evsel *evsel_b)
+{
+	if (strcmp(evsel__name(evsel_a), evsel__name(evsel_b)))
+		return false;
+
+	if (evsel_a->scale != evsel_b->scale)
+		return false;
+
+	if (evsel_a->cgrp != evsel_b->cgrp)
+		return false;
+
+	if (strcmp(evsel_a->unit, evsel_b->unit))
+		return false;
+
+	if (evsel__is_clock(evsel_a) != evsel__is_clock(evsel_b))
+		return false;
+
+	return !!strcmp(evsel_a->pmu_name, evsel_b->pmu_name);
+}
+
+static void evsel__merge_aliases(struct evsel *evsel)
+{
+	struct evlist *evlist = evsel->evlist;
+	struct evsel *alias;
+
+	alias = list_prepare_entry(evsel, &(evlist->core.entries), core.node);
+	list_for_each_entry_continue(alias, &evlist->core.entries, core.node) {
+		/* Merge the same events on different PMUs. */
+		if (evsel__is_alias(evsel, alias)) {
+			evsel__merge_aggr_counters(evsel, alias);
+			alias->merged_stat = true;
+		}
+	}
+}
+
+static bool evsel__should_merge_hybrid(struct evsel *evsel, struct perf_stat_config *config)
+{
+	struct perf_pmu *pmu;
+
+	if (!config->hybrid_merge)
+		return false;
+
+	pmu = evsel__find_pmu(evsel);
+	return pmu && pmu->is_hybrid;
+}
+
+static void evsel__merge_stats(struct evsel *evsel, struct perf_stat_config *config)
+{
+	/* this evsel is already merged */
+	if (evsel->merged_stat)
+		return;
+
+	if (evsel->auto_merge_stats || evsel__should_merge_hybrid(evsel, config))
+		evsel__merge_aliases(evsel);
+}
+
+/* merge the same uncore and hybrid events if requested */
+void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *evlist)
+{
+	struct evsel *evsel;
+
+	if (config->no_merge)
+		return;
+
+	evlist__for_each_entry(evlist, evsel)
+		evsel__merge_stats(evsel, config);
+}
+
 int perf_event__process_stat_event(struct perf_session *session,
 				   union perf_event *event)
 {
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 3a876ad2870b..12cc60ab04e4 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -270,6 +270,8 @@ void evlist__reset_aggr_stats(struct evlist *evlist);
 
 int perf_stat_process_counter(struct perf_stat_config *config,
 			      struct evsel *counter);
+void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *evlist);
+
 struct perf_tool;
 union perf_event;
 struct perf_session;
-- 
2.38.0.rc1.362.ged0d419d3c-goog



* [PATCH 15/19] perf stat: Add perf_stat_process_percore()
  2022-10-10  5:35 [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Namhyung Kim
                   ` (13 preceding siblings ...)
  2022-10-10  5:35 ` [PATCH 14/19] perf stat: Add perf_stat_merge_counters() Namhyung Kim
@ 2022-10-10  5:35 ` Namhyung Kim
  2022-10-10 23:32   ` Ian Rogers
  2022-10-10  5:35 ` [PATCH 16/19] perf stat: Add perf_stat_process_shadow_stats() Namhyung Kim
                   ` (4 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Namhyung Kim @ 2022-10-10  5:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun

perf_stat_process_percore() aggregates counts for an event per core
even if the aggr_mode is AGGR_NONE.  This is enabled when the user
requests it on the command line.

To handle that, it keeps the per-cpu counts at first.  Then it aggregates
the counts that have the same core id in the aggr->counts and writes the
aggregated values back for each cpu.

Later, per-core events will skip one of the CPUs unless the
percore-show-thread option is given.  In that case, it can simply print
all cpu stats with the updated (per-core) values.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
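A reading aid for the two-pass per-core update, not part of the patch.
This is a minimal sketch with a hypothetical toy_cpu_stat type in which
the core id is stored directly in each entry; the real code derives it
from the CPU number.  The first pass sums the values of all CPUs on a
core, the second writes that sum back to each of them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified per-CPU slot; "used" marks CPUs whose core is already done. */
struct toy_cpu_stat { int core_id; uint64_t val; bool used; };

void toy_process_percore(struct toy_cpu_stat *cpus, int nr)
{
	for (int i = 0; i < nr; i++) {
		uint64_t sum = 0;

		if (cpus[i].used)
			continue;

		/* First pass: collect the total for this core. */
		for (int j = 0; j < nr; j++) {
			if (cpus[j].core_id == cpus[i].core_id)
				sum += cpus[j].val;
		}

		/* Second pass: write it back to every CPU on this core. */
		for (int j = 0; j < nr; j++) {
			if (cpus[j].core_id != cpus[i].core_id)
				continue;
			cpus[j].val = sum;
			cpus[j].used = true;
		}
	}
}
```

The used flag keeps a core from being summed twice, which would otherwise
double the already updated values.
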
 tools/perf/builtin-stat.c |  1 +
 tools/perf/util/stat.c    | 71 +++++++++++++++++++++++++++++++++++++++
 tools/perf/util/stat.h    |  2 ++
 3 files changed, 74 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index f90e8f29cb23..c127e784a7be 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -488,6 +488,7 @@ static void process_counters(void)
 	}
 
 	perf_stat_merge_counters(&stat_config, evsel_list);
+	perf_stat_process_percore(&stat_config, evsel_list);
 }
 
 static void process_interval(void)
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 1bb197782a34..d788d0e85204 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -671,6 +671,77 @@ void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *ev
 		evsel__merge_stats(evsel, config);
 }
 
+static void evsel__update_percore_stats(struct evsel *evsel, struct aggr_cpu_id *core_id)
+{
+	struct perf_stat_evsel *ps = evsel->stats;
+	struct perf_counts_values counts = { 0, };
+	struct aggr_cpu_id id;
+	struct perf_cpu cpu;
+	int idx;
+
+	/* collect per-core counts */
+	perf_cpu_map__for_each_cpu(cpu, idx, evsel->core.cpus) {
+		struct perf_stat_aggr *aggr = &ps->aggr[idx];
+
+		id = aggr_cpu_id__core(cpu, NULL);
+		if (!aggr_cpu_id__equal(core_id, &id))
+			continue;
+
+		counts.val += aggr->counts.val;
+		counts.ena += aggr->counts.ena;
+		counts.run += aggr->counts.run;
+	}
+
+	/* update aggregated per-core counts for each CPU */
+	perf_cpu_map__for_each_cpu(cpu, idx, evsel->core.cpus) {
+		struct perf_stat_aggr *aggr = &ps->aggr[idx];
+
+		id = aggr_cpu_id__core(cpu, NULL);
+		if (!aggr_cpu_id__equal(core_id, &id))
+			continue;
+
+		aggr->counts.val = counts.val;
+		aggr->counts.ena = counts.ena;
+		aggr->counts.run = counts.run;
+
+		aggr->used = true;
+	}
+}
+
+/* we have an aggr_map for cpu, but want to aggregate the counters per-core */
+static void evsel__process_percore(struct evsel *evsel)
+{
+	struct perf_stat_evsel *ps = evsel->stats;
+	struct aggr_cpu_id core_id;
+	struct perf_cpu cpu;
+	int idx;
+
+	if (!evsel->percore)
+		return;
+
+	perf_cpu_map__for_each_cpu(cpu, idx, evsel->core.cpus) {
+		struct perf_stat_aggr *aggr = &ps->aggr[idx];
+
+		if (aggr->used)
+			continue;
+
+		core_id = aggr_cpu_id__core(cpu, NULL);
+		evsel__update_percore_stats(evsel, &core_id);
+	}
+}
+
+/* process cpu stats on per-core events */
+void perf_stat_process_percore(struct perf_stat_config *config, struct evlist *evlist)
+{
+	struct evsel *evsel;
+
+	if (config->aggr_mode != AGGR_NONE)
+		return;
+
+	evlist__for_each_entry(evlist, evsel)
+		evsel__process_percore(evsel);
+}
+
 int perf_event__process_stat_event(struct perf_session *session,
 				   union perf_event *event)
 {
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 12cc60ab04e4..ac85ed46aa59 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -46,6 +46,7 @@ enum perf_stat_evsel_id {
 struct perf_stat_aggr {
 	struct perf_counts_values	counts;
 	int				nr;
+	bool				used;
 	bool				failed;
 };
 
@@ -271,6 +272,7 @@ void evlist__reset_aggr_stats(struct evlist *evlist);
 int perf_stat_process_counter(struct perf_stat_config *config,
 			      struct evsel *counter);
 void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *evlist);
+void perf_stat_process_percore(struct perf_stat_config *config, struct evlist *evlist);
 
 struct perf_tool;
 union perf_event;
-- 
2.38.0.rc1.362.ged0d419d3c-goog



* [PATCH 16/19] perf stat: Add perf_stat_process_shadow_stats()
  2022-10-10  5:35 [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Namhyung Kim
                   ` (14 preceding siblings ...)
  2022-10-10  5:35 ` [PATCH 15/19] perf stat: Add perf_stat_process_percore() Namhyung Kim
@ 2022-10-10  5:35 ` Namhyung Kim
  2022-10-10 23:36   ` Ian Rogers
  2022-10-10  5:35 ` [PATCH 17/19] perf stat: Display event stats using aggr counts Namhyung Kim
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Namhyung Kim @ 2022-10-10  5:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun

This function updates the shadow stats uniformly from the aggregated
counts, since it uses the aggr counts for every aggr mode.

For now, each item gets duplicate shadow stats because the display
routines update them once again.  That is fine since they show average
values, and the duplication will be gone eventually.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-stat.c |  1 +
 tools/perf/util/stat.c    | 50 ++++++++++++++++++++-------------------
 tools/perf/util/stat.h    |  1 +
 3 files changed, 28 insertions(+), 24 deletions(-)
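
To make the "duplicates are fine" argument concrete, here is a small
standalone sketch.  It assumes a shadow stat entry can be modelled as a
running mean, which is a simplification; struct running_stats and both
functions are hypothetical names and not the perf API.  Recording the same
aggregated value a second time bumps the sample count but leaves the mean
unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical running statistics; a stand-in for one shadow stat entry. */
struct running_stats {
	double n;	/* number of samples recorded so far */
	double mean;	/* running mean of the samples */
};

/* Incremental mean update: mean += (val - mean) / n */
static void running_stats_update(struct running_stats *st, uint64_t val)
{
	st->n += 1.0;
	st->mean += ((double)val - st->mean) / st->n;
}

/* Update one shadow entry per aggr index, the same way for any aggr mode. */
static void update_shadow_stats(struct running_stats *shadow,
				const uint64_t *aggr_val, size_t nr_aggr)
{
	size_t i;

	assert(shadow != NULL && aggr_val != NULL);

	for (i = 0; i < nr_aggr; i++)
		running_stats_update(&shadow[i], aggr_val[i]);
}
```

Calling update_shadow_stats() twice with the same values models the
temporary double update: the sample count becomes 2 while each mean stays at
the aggregated value.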

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index c127e784a7be..d92815f4eae0 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -489,6 +489,7 @@ static void process_counters(void)
 
 	perf_stat_merge_counters(&stat_config, evsel_list);
 	perf_stat_process_percore(&stat_config, evsel_list);
+	perf_stat_process_shadow_stats(&stat_config, evsel_list);
 }
 
 static void process_interval(void)
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index d788d0e85204..f2a3761dacff 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -454,7 +454,7 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
 		aggr_counts->val += count->val;
 		aggr_counts->ena += count->ena;
 		aggr_counts->run += count->run;
-		goto update;
+		return 0;
 	}
 
 	if (ps->aggr) {
@@ -491,32 +491,10 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
 		}
 	}
 
-update:
-	switch (config->aggr_mode) {
-	case AGGR_THREAD:
-	case AGGR_CORE:
-	case AGGR_DIE:
-	case AGGR_SOCKET:
-	case AGGR_NODE:
-	case AGGR_NONE:
-		if ((config->aggr_mode == AGGR_NONE) && (!evsel->percore)) {
-			perf_stat__update_shadow_stats(evsel, count->val,
-						       cpu_map_idx, &rt_stat);
-		}
-
-		if (config->aggr_mode == AGGR_THREAD) {
-			perf_stat__update_shadow_stats(evsel, count->val,
-						       thread, &rt_stat);
-		}
-		break;
-	case AGGR_GLOBAL:
+	if (config->aggr_mode == AGGR_GLOBAL) {
 		aggr->val += count->val;
 		aggr->ena += count->ena;
 		aggr->run += count->run;
-	case AGGR_UNSET:
-	case AGGR_MAX:
-	default:
-		break;
 	}
 
 	return 0;
@@ -742,6 +720,30 @@ void perf_stat_process_percore(struct perf_stat_config *config, struct evlist *e
 		evsel__process_percore(evsel);
 }
 
+static void evsel__update_shadow_stats(struct evsel *evsel)
+{
+	struct perf_stat_evsel *ps = evsel->stats;
+	int i;
+
+	if (ps->aggr == NULL)
+		return;
+
+	for (i = 0; i < ps->nr_aggr; i++) {
+		struct perf_counts_values *aggr_counts = &ps->aggr[i].counts;
+
+		perf_stat__update_shadow_stats(evsel, aggr_counts->val, i, &rt_stat);
+	}
+}
+
+void perf_stat_process_shadow_stats(struct perf_stat_config *config __maybe_unused,
+				    struct evlist *evlist)
+{
+	struct evsel *evsel;
+
+	evlist__for_each_entry(evlist, evsel)
+		evsel__update_shadow_stats(evsel);
+}
+
 int perf_event__process_stat_event(struct perf_session *session,
 				   union perf_event *event)
 {
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index ac85ed46aa59..e51214918c7f 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -273,6 +273,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 			      struct evsel *counter);
 void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *evlist);
 void perf_stat_process_percore(struct perf_stat_config *config, struct evlist *evlist);
+void perf_stat_process_shadow_stats(struct perf_stat_config *config, struct evlist *evlist);
 
 struct perf_tool;
 union perf_event;
-- 
2.38.0.rc1.362.ged0d419d3c-goog



* [PATCH 17/19] perf stat: Display event stats using aggr counts
  2022-10-10  5:35 [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Namhyung Kim
                   ` (15 preceding siblings ...)
  2022-10-10  5:35 ` [PATCH 16/19] perf stat: Add perf_stat_process_shadow_stats() Namhyung Kim
@ 2022-10-10  5:35 ` Namhyung Kim
  2022-10-10 23:38   ` Ian Rogers
  2022-10-10  5:35 ` [PATCH 18/19] perf stat: Display percore events properly Namhyung Kim
                   ` (2 subsequent siblings)
  19 siblings, 1 reply; 63+ messages in thread
From: Namhyung Kim @ 2022-10-10  5:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun

Now the aggr counts are ready for use.  Convert the display routines to use
the aggr counts and update the shadow stats with them.  The display no
longer needs to aggregate counts or collect aliases.  Get rid of the now
unused struct perf_aggr_thread_value.

Note that the display order differs among the aggr modes.  For
per-core/die/socket/node aggregation, it shows the relevant events in the
same unit together, whereas for global/thread/no aggregation it shows the
same event for different units together.  So it still uses separate code
paths to display them due to the ordering.

One more thing to note is that this breaks the per-core event display for
now.  The next patch will fix it so that the output is identical to the
current one.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/stat-display.c | 428 +++++----------------------------
 tools/perf/util/stat.c         |   5 -
 tools/perf/util/stat.h         |   9 -
 3 files changed, 55 insertions(+), 387 deletions(-)
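
The ordering difference mentioned above comes down to which loop is the
outer one.  A minimal standalone sketch follows; the names (struct cell,
order_by_unit, order_by_event) are hypothetical, and units and events are
reduced to plain indexes.

```c
#include <assert.h>
#include <stddef.h>

/* One printed line: which aggregation unit and which event it belongs to. */
struct cell {
	int unit;
	int event;
};

/* per-core/die/socket/node: all events of one unit are printed together */
static size_t order_by_unit(struct cell *out, int nr_units, int nr_events)
{
	size_t n = 0;
	int u, e;

	assert(out != NULL);

	for (u = 0; u < nr_units; u++) {
		for (e = 0; e < nr_events; e++) {
			out[n].unit = u;
			out[n].event = e;
			n++;
		}
	}
	return n;
}

/* global/thread/no aggregation: one event is printed for all units together */
static size_t order_by_event(struct cell *out, int nr_units, int nr_events)
{
	size_t n = 0;
	int u, e;

	assert(out != NULL);

	for (e = 0; e < nr_events; e++) {
		for (u = 0; u < nr_units; u++) {
			out[n].unit = u;
			out[n].event = e;
			n++;
		}
	}
	return n;
}
```

Both functions visit the same (unit, event) pairs; only the nesting differs,
which is why the two groups of aggr modes keep separate print loops.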

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 1d8e585df4ad..0c0e22c175a1 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -442,31 +442,6 @@ static void print_metric_header(struct perf_stat_config *config,
 		fprintf(os->fh, "%*s ", config->metric_only_len, unit);
 }
 
-static int first_shadow_map_idx(struct perf_stat_config *config,
-				struct evsel *evsel, const struct aggr_cpu_id *id)
-{
-	struct perf_cpu_map *cpus = evsel__cpus(evsel);
-	struct perf_cpu cpu;
-	int idx;
-
-	if (config->aggr_mode == AGGR_NONE)
-		return perf_cpu_map__idx(cpus, id->cpu);
-
-	if (config->aggr_mode == AGGR_THREAD)
-		return id->thread_idx;
-
-	if (!config->aggr_get_id)
-		return 0;
-
-	perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
-		struct aggr_cpu_id cpu_id = config->aggr_get_id(config, cpu);
-
-		if (aggr_cpu_id__equal(&cpu_id, id))
-			return idx;
-	}
-	return 0;
-}
-
 static void abs_printout(struct perf_stat_config *config,
 			 struct aggr_cpu_id id, int nr, struct evsel *evsel, double avg)
 {
@@ -537,7 +512,7 @@ static bool is_mixed_hw_group(struct evsel *counter)
 static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int nr,
 		     struct evsel *counter, double uval,
 		     char *prefix, u64 run, u64 ena, double noise,
-		     struct runtime_stat *st)
+		     struct runtime_stat *st, int map_idx)
 {
 	struct perf_stat_output_ctx out;
 	struct outstate os = {
@@ -648,8 +623,7 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int
 		print_running(config, run, ena);
 	}
 
-	perf_stat__print_shadow_stats(config, counter, uval,
-				first_shadow_map_idx(config, counter, &id),
+	perf_stat__print_shadow_stats(config, counter, uval, map_idx,
 				&out, &config->metric_events, st);
 	if (!config->csv_output && !config->metric_only && !config->json_output) {
 		print_noise(config, counter, noise);
@@ -657,34 +631,6 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int
 	}
 }
 
-static void aggr_update_shadow(struct perf_stat_config *config,
-			       struct evlist *evlist)
-{
-	int idx, s;
-	struct perf_cpu cpu;
-	struct aggr_cpu_id s2, id;
-	u64 val;
-	struct evsel *counter;
-	struct perf_cpu_map *cpus;
-
-	for (s = 0; s < config->aggr_map->nr; s++) {
-		id = config->aggr_map->map[s];
-		evlist__for_each_entry(evlist, counter) {
-			cpus = evsel__cpus(counter);
-			val = 0;
-			perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
-				s2 = config->aggr_get_id(config, cpu);
-				if (!aggr_cpu_id__equal(&s2, &id))
-					continue;
-				val += perf_counts(counter->counts, idx, 0)->val;
-			}
-			perf_stat__update_shadow_stats(counter, val,
-					first_shadow_map_idx(config, counter, &id),
-					&rt_stat);
-		}
-	}
-}
-
 static void uniquify_event_name(struct evsel *counter)
 {
 	char *new_name;
@@ -721,137 +667,51 @@ static void uniquify_event_name(struct evsel *counter)
 	counter->uniquified_name = true;
 }
 
-static void collect_all_aliases(struct perf_stat_config *config, struct evsel *counter,
-			    void (*cb)(struct perf_stat_config *config, struct evsel *counter, void *data,
-				       bool first),
-			    void *data)
-{
-	struct evlist *evlist = counter->evlist;
-	struct evsel *alias;
-
-	alias = list_prepare_entry(counter, &(evlist->core.entries), core.node);
-	list_for_each_entry_continue (alias, &evlist->core.entries, core.node) {
-		/* Merge events with the same name, etc. but on different PMUs. */
-		if (!strcmp(evsel__name(alias), evsel__name(counter)) &&
-			alias->scale == counter->scale &&
-			alias->cgrp == counter->cgrp &&
-			!strcmp(alias->unit, counter->unit) &&
-			evsel__is_clock(alias) == evsel__is_clock(counter) &&
-			strcmp(alias->pmu_name, counter->pmu_name)) {
-			alias->merged_stat = true;
-			cb(config, alias, data, false);
-		}
-	}
-}
-
-static bool hybrid_merge(struct evsel *counter, struct perf_stat_config *config,
-			 bool check)
+static bool hybrid_uniquify(struct evsel *evsel, struct perf_stat_config *config)
 {
-	if (evsel__is_hybrid(counter)) {
-		if (check)
-			return config->hybrid_merge;
-		else
-			return !config->hybrid_merge;
-	}
-
-	return false;
+	return evsel__is_hybrid(evsel) && !config->hybrid_merge;
 }
 
-static bool collect_data(struct perf_stat_config *config, struct evsel *counter,
-			    void (*cb)(struct perf_stat_config *config, struct evsel *counter, void *data,
-				       bool first),
-			    void *data)
+static void uniquify_counter(struct perf_stat_config *config, struct evsel *counter)
 {
-	if (counter->merged_stat)
-		return false;
-	cb(config, counter, data, true);
-	if (config->no_merge || hybrid_merge(counter, config, false))
+	if (config->no_merge || hybrid_uniquify(counter, config))
 		uniquify_event_name(counter);
-	else if (counter->auto_merge_stats || hybrid_merge(counter, config, true))
-		collect_all_aliases(config, counter, cb, data);
-	return true;
-}
-
-struct aggr_data {
-	u64 ena, run, val;
-	struct aggr_cpu_id id;
-	int nr;
-	int cpu_map_idx;
-};
-
-static void aggr_cb(struct perf_stat_config *config,
-		    struct evsel *counter, void *data, bool first)
-{
-	struct aggr_data *ad = data;
-	int idx;
-	struct perf_cpu cpu;
-	struct perf_cpu_map *cpus;
-	struct aggr_cpu_id s2;
-
-	cpus = evsel__cpus(counter);
-	perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
-		struct perf_counts_values *counts;
-
-		s2 = config->aggr_get_id(config, cpu);
-		if (!aggr_cpu_id__equal(&s2, &ad->id))
-			continue;
-		if (first)
-			ad->nr++;
-		counts = perf_counts(counter->counts, idx, 0);
-		/*
-		 * When any result is bad, make them all to give
-		 * consistent output in interval mode.
-		 */
-		if (counts->ena == 0 || counts->run == 0 ||
-		    counter->counts->scaled == -1) {
-			ad->ena = 0;
-			ad->run = 0;
-			break;
-		}
-		ad->val += counts->val;
-		ad->ena += counts->ena;
-		ad->run += counts->run;
-	}
 }
 
 static void print_counter_aggrdata(struct perf_stat_config *config,
 				   struct evsel *counter, int s,
 				   char *prefix, bool metric_only,
-				   bool *first, struct perf_cpu cpu)
+				   bool *first)
 {
-	struct aggr_data ad;
 	FILE *output = config->output;
 	u64 ena, run, val;
-	int nr;
-	struct aggr_cpu_id id;
 	double uval;
+	struct perf_stat_evsel *ps = counter->stats;
+	struct perf_stat_aggr *aggr = &ps->aggr[s];
+	struct aggr_cpu_id id = config->aggr_map->map[s];
+	double avg = aggr->counts.val;
 
-	ad.id = id = config->aggr_map->map[s];
-	ad.val = ad.ena = ad.run = 0;
-	ad.nr = 0;
-	if (!collect_data(config, counter, aggr_cb, &ad))
+	if (aggr->nr == 0)
 		return;
 
-	if (perf_pmu__has_hybrid() && ad.ena == 0)
-		return;
+	uniquify_counter(config, counter);
+
+	val = aggr->counts.val;
+	ena = aggr->counts.ena;
+	run = aggr->counts.run;
 
-	nr = ad.nr;
-	ena = ad.ena;
-	run = ad.run;
-	val = ad.val;
 	if (*first && metric_only) {
 		*first = false;
-		aggr_printout(config, counter, id, nr);
+		aggr_printout(config, counter, id, aggr->nr);
 	}
 	if (prefix && !metric_only)
 		fprintf(output, "%s", prefix);
 
 	uval = val * counter->scale;
-	if (cpu.cpu != -1)
-		id = aggr_cpu_id__cpu(cpu, /*data=*/NULL);
 
-	printout(config, id, nr, counter, uval,
-		 prefix, run, ena, 1.0, &rt_stat);
+	printout(config, id, aggr->nr, counter, uval,
+		 prefix, run, ena, avg, &rt_stat, s);
+
 	if (!metric_only)
 		fputc('\n', output);
 }
@@ -869,8 +729,6 @@ static void print_aggr(struct perf_stat_config *config,
 	if (!config->aggr_map || !config->aggr_get_id)
 		return;
 
-	aggr_update_shadow(config, evlist);
-
 	/*
 	 * With metric_only everything is on a single line.
 	 * Without each counter has its own line.
@@ -881,188 +739,39 @@ static void print_aggr(struct perf_stat_config *config,
 
 		first = true;
 		evlist__for_each_entry(evlist, counter) {
+			if (counter->merged_stat)
+				continue;
+
 			print_counter_aggrdata(config, counter, s,
-					prefix, metric_only,
-					&first, (struct perf_cpu){ .cpu = -1 });
+					       prefix, metric_only,
+					       &first);
 		}
 		if (metric_only)
 			fputc('\n', output);
 	}
 }
 
-static int cmp_val(const void *a, const void *b)
-{
-	return ((struct perf_aggr_thread_value *)b)->val -
-		((struct perf_aggr_thread_value *)a)->val;
-}
-
-static struct perf_aggr_thread_value *sort_aggr_thread(
-					struct evsel *counter,
-					int *ret,
-					struct target *_target)
-{
-	int nthreads = perf_thread_map__nr(counter->core.threads);
-	int i = 0;
-	double uval;
-	struct perf_aggr_thread_value *buf;
-
-	buf = calloc(nthreads, sizeof(struct perf_aggr_thread_value));
-	if (!buf)
-		return NULL;
-
-	for (int thread = 0; thread < nthreads; thread++) {
-		int idx;
-		u64 ena = 0, run = 0, val = 0;
-
-		perf_cpu_map__for_each_idx(idx, evsel__cpus(counter)) {
-			struct perf_counts_values *counts =
-				perf_counts(counter->counts, idx, thread);
-
-			val += counts->val;
-			ena += counts->ena;
-			run += counts->run;
-		}
-
-		uval = val * counter->scale;
-
-		/*
-		 * Skip value 0 when enabling --per-thread globally,
-		 * otherwise too many 0 output.
-		 */
-		if (uval == 0.0 && target__has_per_thread(_target))
-			continue;
-
-		buf[i].counter = counter;
-		buf[i].id = aggr_cpu_id__empty();
-		buf[i].id.thread_idx = thread;
-		buf[i].uval = uval;
-		buf[i].val = val;
-		buf[i].run = run;
-		buf[i].ena = ena;
-		i++;
-	}
-
-	qsort(buf, i, sizeof(struct perf_aggr_thread_value), cmp_val);
-
-	if (ret)
-		*ret = i;
-
-	return buf;
-}
-
-static void print_aggr_thread(struct perf_stat_config *config,
-			      struct target *_target,
-			      struct evsel *counter, char *prefix)
-{
-	FILE *output = config->output;
-	int thread, sorted_threads;
-	struct aggr_cpu_id id;
-	struct perf_aggr_thread_value *buf;
-
-	buf = sort_aggr_thread(counter, &sorted_threads, _target);
-	if (!buf) {
-		perror("cannot sort aggr thread");
-		return;
-	}
-
-	for (thread = 0; thread < sorted_threads; thread++) {
-		if (prefix)
-			fprintf(output, "%s", prefix);
-
-		id = buf[thread].id;
-		printout(config, id, 0, buf[thread].counter, buf[thread].uval,
-			 prefix, buf[thread].run, buf[thread].ena, 1.0,
-			 &rt_stat);
-		fputc('\n', output);
-	}
-
-	free(buf);
-}
-
-struct caggr_data {
-	double avg, avg_enabled, avg_running;
-};
-
-static void counter_aggr_cb(struct perf_stat_config *config __maybe_unused,
-			    struct evsel *counter, void *data,
-			    bool first __maybe_unused)
-{
-	struct caggr_data *cd = data;
-	struct perf_counts_values *aggr = &counter->counts->aggr;
-
-	cd->avg += aggr->val;
-	cd->avg_enabled += aggr->ena;
-	cd->avg_running += aggr->run;
-}
-
-/*
- * Print out the results of a single counter:
- * aggregated counts in system-wide mode
- */
-static void print_counter_aggr(struct perf_stat_config *config,
-			       struct evsel *counter, char *prefix)
-{
-	bool metric_only = config->metric_only;
-	FILE *output = config->output;
-	double uval;
-	struct caggr_data cd = { .avg = 0.0 };
-
-	if (!collect_data(config, counter, counter_aggr_cb, &cd))
-		return;
-
-	if (prefix && !metric_only)
-		fprintf(output, "%s", prefix);
-
-	uval = cd.avg * counter->scale;
-	printout(config, aggr_cpu_id__empty(), 0, counter, uval, prefix, cd.avg_running,
-		 cd.avg_enabled, cd.avg, &rt_stat);
-	if (!metric_only)
-		fprintf(output, "\n");
-}
-
-static void counter_cb(struct perf_stat_config *config __maybe_unused,
-		       struct evsel *counter, void *data,
-		       bool first __maybe_unused)
-{
-	struct aggr_data *ad = data;
-
-	ad->val += perf_counts(counter->counts, ad->cpu_map_idx, 0)->val;
-	ad->ena += perf_counts(counter->counts, ad->cpu_map_idx, 0)->ena;
-	ad->run += perf_counts(counter->counts, ad->cpu_map_idx, 0)->run;
-}
-
-/*
- * Print out the results of a single counter:
- * does not use aggregated count in system-wide
- */
 static void print_counter(struct perf_stat_config *config,
 			  struct evsel *counter, char *prefix)
 {
+	bool metric_only = config->metric_only;
 	FILE *output = config->output;
-	u64 ena, run, val;
-	double uval;
-	int idx;
-	struct perf_cpu cpu;
-	struct aggr_cpu_id id;
-
-	perf_cpu_map__for_each_cpu(cpu, idx, evsel__cpus(counter)) {
-		struct aggr_data ad = { .cpu_map_idx = idx };
-
-		if (!collect_data(config, counter, counter_cb, &ad))
-			return;
-		val = ad.val;
-		ena = ad.ena;
-		run = ad.run;
+	bool first = false;
+	int s;
 
-		if (prefix)
-			fprintf(output, "%s", prefix);
+	/* AGGR_THREAD doesn't have config->aggr_get_id */
+	if (!config->aggr_map)
+		return;
 
-		uval = val * counter->scale;
-		id = aggr_cpu_id__cpu(cpu, /*data=*/NULL);
-		printout(config, id, 0, counter, uval, prefix,
-			 run, ena, 1.0, &rt_stat);
+	if (counter->merged_stat)
+		return;
 
-		fputc('\n', output);
+	for (s = 0; s < config->aggr_map->nr; s++) {
+		print_counter_aggrdata(config, counter, s,
+				       prefix, metric_only,
+				       &first);
+		if (metric_only)
+			fputc('\n', output);
 	}
 }
 
@@ -1081,6 +790,7 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
 			u64 ena, run, val;
 			double uval;
 			struct aggr_cpu_id id;
+			struct perf_stat_evsel *ps = counter->stats;
 			int counter_idx = perf_cpu_map__idx(evsel__cpus(counter), cpu);
 
 			if (counter_idx < 0)
@@ -1093,13 +803,13 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
 				aggr_printout(config, counter, id, 0);
 				first = false;
 			}
-			val = perf_counts(counter->counts, counter_idx, 0)->val;
-			ena = perf_counts(counter->counts, counter_idx, 0)->ena;
-			run = perf_counts(counter->counts, counter_idx, 0)->run;
+			val = ps->aggr[counter_idx].counts.val;
+			ena = ps->aggr[counter_idx].counts.ena;
+			run = ps->aggr[counter_idx].counts.run;
 
 			uval = val * counter->scale;
 			printout(config, id, 0, counter, uval, prefix,
-				 run, ena, 1.0, &rt_stat);
+				 run, ena, 1.0, &rt_stat, counter_idx);
 		}
 		if (!first)
 			fputc('\n', config->output);
@@ -1135,8 +845,8 @@ static void print_metric_headers(struct perf_stat_config *config,
 	};
 	bool first = true;
 
-		if (config->json_output && !config->interval)
-			fprintf(config->output, "{");
+	if (config->json_output && !config->interval)
+		fprintf(config->output, "{");
 
 	if (prefix && !config->json_output)
 		fprintf(config->output, "%s", prefix);
@@ -1379,31 +1089,6 @@ static void print_footer(struct perf_stat_config *config)
 			"the same PMU. Try reorganizing the group.\n");
 }
 
-static void print_percore_thread(struct perf_stat_config *config,
-				 struct evsel *counter, char *prefix)
-{
-	int s;
-	struct aggr_cpu_id s2, id;
-	struct perf_cpu_map *cpus;
-	bool first = true;
-	int idx;
-	struct perf_cpu cpu;
-
-	cpus = evsel__cpus(counter);
-	perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
-		s2 = config->aggr_get_id(config, cpu);
-		for (s = 0; s < config->aggr_map->nr; s++) {
-			id = config->aggr_map->map[s];
-			if (aggr_cpu_id__equal(&s2, &id))
-				break;
-		}
-
-		print_counter_aggrdata(config, counter, s,
-				       prefix, false,
-				       &first, cpu);
-	}
-}
-
 static void print_percore(struct perf_stat_config *config,
 			  struct evsel *counter, char *prefix)
 {
@@ -1416,15 +1101,14 @@ static void print_percore(struct perf_stat_config *config,
 		return;
 
 	if (config->percore_show_thread)
-		return print_percore_thread(config, counter, prefix);
+		return print_counter(config, counter, prefix);
 
 	for (s = 0; s < config->aggr_map->nr; s++) {
 		if (prefix && metric_only)
 			fprintf(output, "%s", prefix);
 
 		print_counter_aggrdata(config, counter, s,
-				prefix, metric_only,
-				&first, (struct perf_cpu){ .cpu = -1 });
+				       prefix, metric_only, &first);
 	}
 
 	if (metric_only)
@@ -1469,16 +1153,14 @@ void evlist__print_counters(struct evlist *evlist, struct perf_stat_config *conf
 		print_aggr(config, evlist, prefix);
 		break;
 	case AGGR_THREAD:
-		evlist__for_each_entry(evlist, counter) {
-			print_aggr_thread(config, _target, counter, prefix);
-		}
-		break;
 	case AGGR_GLOBAL:
-		if (config->iostat_run)
+		if (config->iostat_run) {
 			iostat_print_counters(evlist, config, ts, prefix = buf,
-					      print_counter_aggr);
-		else
-			print_aggr(config, evlist, prefix);
+					      print_counter);
+			break;
+		}
+		evlist__for_each_entry(evlist, counter)
+			print_counter(config, counter, prefix);
 		break;
 	case AGGR_NONE:
 		if (metric_only)
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index f2a3761dacff..1652586a4925 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -545,11 +545,6 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 			evsel__name(counter), count[0], count[1], count[2]);
 	}
 
-	/*
-	 * Save the full runtime - to allow normalization during printout:
-	 */
-	perf_stat__update_shadow_stats(counter, *count, 0, &rt_stat);
-
 	return 0;
 }
 
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index e51214918c7f..b02d8a4ffabf 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -213,15 +213,6 @@ static inline void update_rusage_stats(struct rusage_stats *ru_stats, struct rus
 struct evsel;
 struct evlist;
 
-struct perf_aggr_thread_value {
-	struct evsel *counter;
-	struct aggr_cpu_id id;
-	double uval;
-	u64 val;
-	u64 run;
-	u64 ena;
-};
-
 bool __perf_stat_evsel__is(struct evsel *evsel, enum perf_stat_evsel_id id);
 
 #define perf_stat_evsel__is(evsel, id) \
-- 
2.38.0.rc1.362.ged0d419d3c-goog



* [PATCH 18/19] perf stat: Display percore events properly
  2022-10-10  5:35 [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Namhyung Kim
                   ` (16 preceding siblings ...)
  2022-10-10  5:35 ` [PATCH 17/19] perf stat: Display event stats using aggr counts Namhyung Kim
@ 2022-10-10  5:35 ` Namhyung Kim
  2022-10-10 23:39   ` Ian Rogers
  2022-10-10  5:36 ` [PATCH 19/19] perf stat: Remove unused perf_counts.aggr field Namhyung Kim
  2022-10-11  0:25 ` [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Andi Kleen
  19 siblings, 1 reply; 63+ messages in thread
From: Namhyung Kim @ 2022-10-10  5:35 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun

The recent change in perf stat broke the percore event display.
Note that the aggr counts are already processed so that every
sibling thread in the same core gets the per-core counter values.

Check percore evsels and skip the sibling threads in the display.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-stat.c      | 16 ----------------
 tools/perf/util/stat-display.c | 27 +++++++++++++++++++++++++--
 2 files changed, 25 insertions(+), 18 deletions(-)
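
The sibling skipping can be pictured as keeping a small list of core ids
that were already printed and showing only the first CPU of each core.  A
minimal standalone sketch, with core ids reduced to plain integers and a
hypothetical helper name (first_cpu_per_core), might look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * core_of_cpu[s] is the core id of the CPU at index s (assumed to be known).
 * Store in 'out' the indexes that would be printed and return how many there
 * are, or -1 if the scratch list cannot be allocated.
 */
static int first_cpu_per_core(const int *core_of_cpu, int nr, int *out)
{
	int *seen;
	int s, c = 0, i;

	assert(core_of_cpu != NULL && out != NULL);

	/* at most one entry per CPU; avoid a zero-sized allocation */
	seen = calloc(nr > 0 ? nr : 1, sizeof(*seen));
	if (seen == NULL)
		return -1;

	for (s = 0; s < nr; s++) {
		bool found = false;

		/* linear search over the cores printed so far */
		for (i = 0; i < c; i++) {
			if (seen[i] == core_of_cpu[s]) {
				found = true;
				break;
			}
		}
		if (found)
			continue;	/* a sibling was already shown */

		out[c] = s;
		seen[c++] = core_of_cpu[s];
	}

	free(seen);
	return c;
}
```

The lookup is quadratic in the number of CPUs, which should be acceptable
for a display path; a set keyed by core id would be an alternative if that
ever became a concern.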

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index d92815f4eae0..b3a39d4c86a7 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1403,18 +1403,6 @@ static struct aggr_cpu_id perf_stat__get_cpu_cached(struct perf_stat_config *con
 	return perf_stat__get_aggr(config, perf_stat__get_cpu, cpu);
 }
 
-static bool term_percore_set(void)
-{
-	struct evsel *counter;
-
-	evlist__for_each_entry(evsel_list, counter) {
-		if (counter->percore)
-			return true;
-	}
-
-	return false;
-}
-
 static aggr_cpu_id_get_t aggr_mode__get_aggr(enum aggr_mode aggr_mode)
 {
 	switch (aggr_mode) {
@@ -1427,8 +1415,6 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr(enum aggr_mode aggr_mode)
 	case AGGR_NODE:
 		return aggr_cpu_id__node;
 	case AGGR_NONE:
-		if (term_percore_set())
-			return aggr_cpu_id__core;
 		return aggr_cpu_id__cpu;;
 	case AGGR_GLOBAL:
 		return aggr_cpu_id__global;
@@ -1452,8 +1438,6 @@ static aggr_get_id_t aggr_mode__get_id(enum aggr_mode aggr_mode)
 	case AGGR_NODE:
 		return perf_stat__get_node_cached;
 	case AGGR_NONE:
-		if (term_percore_set())
-			return perf_stat__get_core_cached;
 		return perf_stat__get_cpu_cached;
 	case AGGR_GLOBAL:
 		return perf_stat__get_global_cached;
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 0c0e22c175a1..e0c0df99d40d 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -1094,7 +1094,8 @@ static void print_percore(struct perf_stat_config *config,
 {
 	bool metric_only = config->metric_only;
 	FILE *output = config->output;
-	int s;
+	struct cpu_aggr_map *core_map;
+	int s, c, i;
 	bool first = true;
 
 	if (!config->aggr_map || !config->aggr_get_id)
@@ -1103,13 +1104,35 @@ static void print_percore(struct perf_stat_config *config,
 	if (config->percore_show_thread)
 		return print_counter(config, counter, prefix);
 
-	for (s = 0; s < config->aggr_map->nr; s++) {
+	core_map = cpu_aggr_map__empty_new(config->aggr_map->nr);
+	if (core_map == NULL) {
+		fprintf(output, "Cannot allocate per-core aggr map for display\n");
+		return;
+	}
+
+	for (s = 0, c = 0; s < config->aggr_map->nr; s++) {
+		struct perf_cpu curr_cpu = config->aggr_map->map[s].cpu;
+		struct aggr_cpu_id core_id = aggr_cpu_id__core(curr_cpu, NULL);
+		bool found = false;
+
+		for (i = 0; i < c; i++) {
+			if (aggr_cpu_id__equal(&core_map->map[i], &core_id)) {
+				found = true;
+				break;
+			}
+		}
+		if (found)
+			continue;
+
 		if (prefix && metric_only)
 			fprintf(output, "%s", prefix);
 
 		print_counter_aggrdata(config, counter, s,
 				       prefix, metric_only, &first);
+
+		core_map->map[c++] = core_id;
 	}
+	free(core_map);
 
 	if (metric_only)
 		fputc('\n', output);
-- 
2.38.0.rc1.362.ged0d419d3c-goog



* [PATCH 19/19] perf stat: Remove unused perf_counts.aggr field
  2022-10-10  5:35 [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Namhyung Kim
                   ` (17 preceding siblings ...)
  2022-10-10  5:35 ` [PATCH 18/19] perf stat: Display percore events properly Namhyung Kim
@ 2022-10-10  5:36 ` Namhyung Kim
  2022-10-10 23:40   ` Ian Rogers
  2022-10-12  8:41   ` Jiri Olsa
  2022-10-11  0:25 ` [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Andi Kleen
  19 siblings, 2 replies; 63+ messages in thread
From: Namhyung Kim @ 2022-10-10  5:36 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun

The aggr field in struct perf_counts was used by the old code to keep the
aggregated value for AGGR_GLOBAL.  It's not used anymore, so remove it.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/counts.c |  1 -
 tools/perf/util/counts.h |  1 -
 tools/perf/util/stat.c   | 35 ++---------------------------------
 3 files changed, 2 insertions(+), 35 deletions(-)

diff --git a/tools/perf/util/counts.c b/tools/perf/util/counts.c
index 7a447d918458..11cd85b278a6 100644
--- a/tools/perf/util/counts.c
+++ b/tools/perf/util/counts.c
@@ -48,7 +48,6 @@ void perf_counts__reset(struct perf_counts *counts)
 {
 	xyarray__reset(counts->loaded);
 	xyarray__reset(counts->values);
-	memset(&counts->aggr, 0, sizeof(struct perf_counts_values));
 }
 
 void evsel__reset_counts(struct evsel *evsel)
diff --git a/tools/perf/util/counts.h b/tools/perf/util/counts.h
index 5de275194f2b..42760242e0df 100644
--- a/tools/perf/util/counts.h
+++ b/tools/perf/util/counts.h
@@ -11,7 +11,6 @@ struct evsel;
 
 struct perf_counts {
 	s8			  scaled;
-	struct perf_counts_values aggr;
 	struct xyarray		  *values;
 	struct xyarray		  *loaded;
 };
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 1652586a4925..0dccfa273fa7 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -307,8 +307,6 @@ static void evsel__copy_prev_raw_counts(struct evsel *evsel)
 				*perf_counts(evsel->prev_raw_counts, idx, thread);
 		}
 	}
-
-	evsel->counts->aggr = evsel->prev_raw_counts->aggr;
 }
 
 void evlist__copy_prev_raw_counts(struct evlist *evlist)
@@ -319,26 +317,6 @@ void evlist__copy_prev_raw_counts(struct evlist *evlist)
 		evsel__copy_prev_raw_counts(evsel);
 }
 
-void evlist__save_aggr_prev_raw_counts(struct evlist *evlist)
-{
-	struct evsel *evsel;
-
-	/*
-	 * To collect the overall statistics for interval mode,
-	 * we copy the counts from evsel->prev_raw_counts to
-	 * evsel->counts. The perf_stat_process_counter creates
-	 * aggr values from per cpu values, but the per cpu values
-	 * are 0 for AGGR_GLOBAL. So we use a trick that saves the
-	 * previous aggr value to the first member of perf_counts,
-	 * then aggr calculation in process_counter_values can work
-	 * correctly.
-	 */
-	evlist__for_each_entry(evlist, evsel) {
-		*perf_counts(evsel->prev_raw_counts, 0, 0) =
-			evsel->prev_raw_counts->aggr;
-	}
-}
-
 static size_t pkg_id_hash(const void *__key, void *ctx __maybe_unused)
 {
 	uint64_t *key = (uint64_t *) __key;
@@ -422,7 +400,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
 		       int cpu_map_idx, int thread,
 		       struct perf_counts_values *count)
 {
-	struct perf_counts_values *aggr = &evsel->counts->aggr;
 	struct perf_stat_evsel *ps = evsel->stats;
 	static struct perf_counts_values zero;
 	bool skip = false;
@@ -491,12 +468,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
 		}
 	}
 
-	if (config->aggr_mode == AGGR_GLOBAL) {
-		aggr->val += count->val;
-		aggr->ena += count->ena;
-		aggr->run += count->run;
-	}
-
 	return 0;
 }
 
@@ -521,13 +492,10 @@ static int process_counter_maps(struct perf_stat_config *config,
 int perf_stat_process_counter(struct perf_stat_config *config,
 			      struct evsel *counter)
 {
-	struct perf_counts_values *aggr = &counter->counts->aggr;
 	struct perf_stat_evsel *ps = counter->stats;
-	u64 *count = counter->counts->aggr.values;
+	u64 *count;
 	int ret;
 
-	aggr->val = aggr->ena = aggr->run = 0;
-
 	if (counter->per_pkg)
 		evsel__zero_per_pkg(counter);
 
@@ -538,6 +506,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 	if (config->aggr_mode != AGGR_GLOBAL)
 		return 0;
 
+	count = ps->aggr[0].counts.values;
 	update_stats(&ps->res_stats, *count);
 
 	if (verbose > 0) {
-- 
2.38.0.rc1.362.ged0d419d3c-goog
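
For readers following the change, here is a minimal standalone sketch of
the idea behind this patch: once per-CPU values are summed into a
per-event aggregation slot, a separate aggr field on the counts is
redundant, and global mode can read slot 0.  The names counts_values,
stat_aggr and aggregate_global are simplified stand-ins invented for
illustration; they are not the perf definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for perf_counts_values: value, time enabled, time running. */
struct counts_values {
	uint64_t val;
	uint64_t ena;
	uint64_t run;
};

/* Simplified stand-in for one aggregation slot (hypothetical layout). */
struct stat_aggr {
	struct counts_values counts;
	int nr;	/* number of per-CPU entries folded into this slot */
};

/* Fold every per-CPU count into slot 0, the only slot used in global mode. */
static void aggregate_global(struct stat_aggr *aggr,
			     const struct counts_values *per_cpu, size_t nr_cpus)
{
	size_t i;

	assert(aggr != NULL && (per_cpu != NULL || nr_cpus == 0));

	for (i = 0; i < nr_cpus; i++) {
		aggr[0].counts.val += per_cpu[i].val;
		aggr[0].counts.ena += per_cpu[i].ena;
		aggr[0].counts.run += per_cpu[i].run;
		aggr[0].nr++;
	}
}
```

A caller in this sketch would zero the slot at the start of each
interval and then read aggr[0].counts, which mirrors how the patch
switches the reader to ps->aggr[0].counts.values.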


* Re: [PATCH 01/19] perf tools: Save evsel->pmu in parse_events()
  2022-10-10  5:35 ` [PATCH 01/19] perf tools: Save evsel->pmu in parse_events() Namhyung Kim
@ 2022-10-10 22:21   ` Ian Rogers
  0 siblings, 0 replies; 63+ messages in thread
From: Ian Rogers @ 2022-10-10 22:21 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Now evsel has a pmu pointer, let's save the info and use it like in
> evsel__find_pmu().
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/util/evsel.c        | 1 +
>  tools/perf/util/parse-events.c | 1 +
>  tools/perf/util/pmu.c          | 4 ++++
>  3 files changed, 6 insertions(+)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 76605fde3507..196f8e4859d7 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -467,6 +467,7 @@ struct evsel *evsel__clone(struct evsel *orig)
>         evsel->collect_stat = orig->collect_stat;
>         evsel->weak_group = orig->weak_group;
>         evsel->use_config_name = orig->use_config_name;
> +       evsel->pmu = orig->pmu;
>
>         if (evsel__copy_config_terms(evsel, orig) < 0)
>                 goto out_err;
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 437389dacf48..9e704841273d 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -263,6 +263,7 @@ __add_event(struct list_head *list, int *idx,
>         evsel->core.own_cpus = perf_cpu_map__get(cpus);
>         evsel->core.requires_cpu = pmu ? pmu->is_uncore : false;
>         evsel->auto_merge_stats = auto_merge_stats;
> +       evsel->pmu = pmu;
>
>         if (name)
>                 evsel->name = strdup(name);
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index 74a2cafb4e8d..15bf5943083a 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -1048,11 +1048,15 @@ struct perf_pmu *evsel__find_pmu(struct evsel *evsel)
>  {
>         struct perf_pmu *pmu = NULL;
>
> +       if (evsel->pmu)
> +               return evsel->pmu;
> +
>         while ((pmu = perf_pmu__scan(pmu)) != NULL) {
>                 if (pmu->type == evsel->core.attr.type)
>                         break;
>         }
>
> +       evsel->pmu = pmu;
>         return pmu;
>  }
>
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>
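
The lookup-then-cache pattern in the evsel__find_pmu() hunk above can be
sketched outside of perf as follows.  This is only an illustration under
assumed names: struct pmu, struct event, event_find_pmu() and the
nr_scans counter are hypothetical and exist to make the behaviour
observable.

```c
#include <assert.h>
#include <stddef.h>

struct pmu {
	unsigned int type;
	const char *name;
};

struct event {
	unsigned int attr_type;
	const struct pmu *pmu;	/* cached result, NULL until resolved */
};

static int nr_scans;	/* counts slow-path scans, for demonstration only */

/* Return the PMU matching the event type, scanning only on a cache miss. */
static const struct pmu *event_find_pmu(struct event *ev,
					const struct pmu *pmus, size_t nr_pmus)
{
	size_t i;

	assert(ev != NULL);

	/* fast path: a previous call (or the parser) already stored it */
	if (ev->pmu)
		return ev->pmu;

	nr_scans++;
	for (i = 0; i < nr_pmus; i++) {
		if (pmus[i].type == ev->attr_type) {
			ev->pmu = &pmus[i];
			break;
		}
	}

	/* still NULL when nothing matched, so a later call scans again */
	return ev->pmu;
}
```

As in the patch, a failed lookup leaves the cached pointer NULL, so
only successful lookups are memoized.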

* Re: [PATCH 02/19] perf tools: Use pmu info in evsel__is_hybrid()
  2022-10-10  5:35 ` [PATCH 02/19] perf tools: Use pmu info in evsel__is_hybrid() Namhyung Kim
@ 2022-10-10 22:31   ` Ian Rogers
  2022-10-11  5:10     ` Namhyung Kim
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Rogers @ 2022-10-10 22:31 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> If evsel has pmu, it can use pmu->is_hybrid directly.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/util/evsel.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 196f8e4859d7..a6ea91c72659 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -3132,6 +3132,9 @@ void evsel__zero_per_pkg(struct evsel *evsel)
>
>  bool evsel__is_hybrid(struct evsel *evsel)
>  {
> +       if (evsel->pmu)
> +               return evsel->pmu->is_hybrid;
> +
>         return evsel->pmu_name && perf_pmu__is_hybrid(evsel->pmu_name);

Wow, there's so much duplicated state. Why do evsels have a pmu_name
and a pmu? Why not just pmu->name? I feel always having a pmu would be
cleanest here. That said, what does evsel__is_hybrid even mean? Does it
mean this event is on a PMU normally called cpu and called cpu_core
and cpu_atom on hybrid systems? And of course there are no comments to
explain what this little mystery could be. Anyway, that's not a fault
of this change, and probably later changes will go some way toward
cleaning this up. It was a shame the code wasn't cleaner in the first
place.

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

>  }
>
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>

* Re: [PATCH 03/19] perf stat: Use evsel__is_hybrid() more
  2022-10-10  5:35 ` [PATCH 03/19] perf stat: Use evsel__is_hybrid() more Namhyung Kim
@ 2022-10-10 22:32   ` Ian Rogers
  0 siblings, 0 replies; 63+ messages in thread
From: Ian Rogers @ 2022-10-10 22:32 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> In the stat-display code, it needs to check if the current evsel is
> hybrid but it uses perf_pmu__has_hybrid() which can return true for
> non-hybrid events too.  I think it's better to use evsel__is_hybrid().
>
> Also remove a NULL check for the 'config' parameter in the
> hybrid_merge() since it's called after config->no_merge check.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/util/stat-display.c | 20 ++++----------------
>  1 file changed, 4 insertions(+), 16 deletions(-)
>
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index 5c47ee9963a7..4113aa86772f 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -704,7 +704,7 @@ static void uniquify_event_name(struct evsel *counter)
>                         counter->name = new_name;
>                 }
>         } else {
> -               if (perf_pmu__has_hybrid()) {
> +               if (evsel__is_hybrid(counter)) {
>                         ret = asprintf(&new_name, "%s/%s/",
>                                        counter->pmu_name, counter->name);
>                 } else {
> @@ -744,26 +744,14 @@ static void collect_all_aliases(struct perf_stat_config *config, struct evsel *c
>         }
>  }
>
> -static bool is_uncore(struct evsel *evsel)
> -{
> -       struct perf_pmu *pmu = evsel__find_pmu(evsel);
> -
> -       return pmu && pmu->is_uncore;
> -}
> -
> -static bool hybrid_uniquify(struct evsel *evsel)
> -{
> -       return perf_pmu__has_hybrid() && !is_uncore(evsel);
> -}
> -
>  static bool hybrid_merge(struct evsel *counter, struct perf_stat_config *config,
>                          bool check)
>  {
> -       if (hybrid_uniquify(counter)) {
> +       if (evsel__is_hybrid(counter)) {
>                 if (check)
> -                       return config && config->hybrid_merge;
> +                       return config->hybrid_merge;
>                 else
> -                       return config && !config->hybrid_merge;
> +                       return !config->hybrid_merge;
>         }
>
>         return false;
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>

* Re: [PATCH 04/19] perf stat: Add aggr id for global mode
  2022-10-10  5:35 ` [PATCH 04/19] perf stat: Add aggr id for global mode Namhyung Kim
@ 2022-10-10 22:46   ` Ian Rogers
  2022-10-11 23:08     ` Namhyung Kim
  2022-10-12 10:55   ` Jiri Olsa
  1 sibling, 1 reply; 63+ messages in thread
From: Ian Rogers @ 2022-10-10 22:46 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> To make the code simpler, I'd like to use the same aggregation code for
> the global mode.  We can simply add an id function to return cpu 0 and
> use print_aggr().
>
> No functional change intended.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-stat.c      | 39 ++++++++++++++++++++++++++++++++--
>  tools/perf/util/cpumap.c       | 10 +++++++++
>  tools/perf/util/cpumap.h       |  6 +++++-
>  tools/perf/util/stat-display.c |  9 ++------
>  4 files changed, 54 insertions(+), 10 deletions(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 265b05157972..144bb3a657f2 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1330,6 +1330,15 @@ static struct aggr_cpu_id perf_stat__get_node(struct perf_stat_config *config __
>         return aggr_cpu_id__node(cpu, /*data=*/NULL);
>  }
>
> +static struct aggr_cpu_id perf_stat__get_global(struct perf_stat_config *config __maybe_unused,
> +                                               struct perf_cpu cpu __maybe_unused)
> +{
> +       struct aggr_cpu_id id = aggr_cpu_id__empty();
> +
> +       id.cpu = (struct perf_cpu){ .cpu = 0 };
> +       return id;
> +}
> +

See below, I think this should just return aggr_cpu_id__global or just
call that directly.

>  static struct aggr_cpu_id perf_stat__get_aggr(struct perf_stat_config *config,
>                                               aggr_get_id_t get_id, struct perf_cpu cpu)
>  {
> @@ -1366,6 +1375,12 @@ static struct aggr_cpu_id perf_stat__get_node_cached(struct perf_stat_config *co
>         return perf_stat__get_aggr(config, perf_stat__get_node, cpu);
>  }
>
> +static struct aggr_cpu_id perf_stat__get_global_cached(struct perf_stat_config *config,
> +                                                      struct perf_cpu cpu)
> +{
> +       return perf_stat__get_aggr(config, perf_stat__get_global, cpu);
> +}
> +
>  static bool term_percore_set(void)
>  {
>         struct evsel *counter;
> @@ -1395,6 +1410,7 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr(enum aggr_mode aggr_mode)
>
>                 return NULL;
>         case AGGR_GLOBAL:
> +               return aggr_cpu_id__global;
>         case AGGR_THREAD:
>         case AGGR_UNSET:
>         case AGGR_MAX:
> @@ -1420,6 +1436,7 @@ static aggr_get_id_t aggr_mode__get_id(enum aggr_mode aggr_mode)
>                 }
>                 return NULL;
>         case AGGR_GLOBAL:
> +               return perf_stat__get_global_cached;
>         case AGGR_THREAD:
>         case AGGR_UNSET:
>         case AGGR_MAX:
> @@ -1535,6 +1552,16 @@ static struct aggr_cpu_id perf_env__get_node_aggr_by_cpu(struct perf_cpu cpu, vo
>         return id;
>  }
>
> +static struct aggr_cpu_id perf_env__get_global_aggr_by_cpu(struct perf_cpu cpu __maybe_unused,
> +                                                          void *data __maybe_unused)
> +{
> +       struct aggr_cpu_id id = aggr_cpu_id__empty();
> +
> +       /* it always aggregates to the cpu 0 */
> +       id.cpu = (struct perf_cpu){ .cpu = 0 };
> +       return id;
> +}
> +
>  static struct aggr_cpu_id perf_stat__get_socket_file(struct perf_stat_config *config __maybe_unused,
>                                                      struct perf_cpu cpu)
>  {
> @@ -1558,6 +1585,12 @@ static struct aggr_cpu_id perf_stat__get_node_file(struct perf_stat_config *conf
>         return perf_env__get_node_aggr_by_cpu(cpu, &perf_stat.session->header.env);
>  }
>
> +static struct aggr_cpu_id perf_stat__get_global_file(struct perf_stat_config *config __maybe_unused,
> +                                                    struct perf_cpu cpu)
> +{
> +       return perf_env__get_global_aggr_by_cpu(cpu, &perf_stat.session->header.env);
> +}
> +
>  static aggr_cpu_id_get_t aggr_mode__get_aggr_file(enum aggr_mode aggr_mode)
>  {
>         switch (aggr_mode) {
> @@ -1569,8 +1602,9 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr_file(enum aggr_mode aggr_mode)
>                 return perf_env__get_core_aggr_by_cpu;
>         case AGGR_NODE:
>                 return perf_env__get_node_aggr_by_cpu;
> -       case AGGR_NONE:
>         case AGGR_GLOBAL:
> +               return perf_env__get_global_aggr_by_cpu;
> +       case AGGR_NONE:
>         case AGGR_THREAD:
>         case AGGR_UNSET:
>         case AGGR_MAX:
> @@ -1590,8 +1624,9 @@ static aggr_get_id_t aggr_mode__get_id_file(enum aggr_mode aggr_mode)
>                 return perf_stat__get_core_file;
>         case AGGR_NODE:
>                 return perf_stat__get_node_file;
> -       case AGGR_NONE:
>         case AGGR_GLOBAL:
> +               return perf_stat__get_global_file;
> +       case AGGR_NONE:
>         case AGGR_THREAD:
>         case AGGR_UNSET:
>         case AGGR_MAX:
> diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
> index 8486ca3bec75..60209fe87456 100644
> --- a/tools/perf/util/cpumap.c
> +++ b/tools/perf/util/cpumap.c
> @@ -354,6 +354,16 @@ struct aggr_cpu_id aggr_cpu_id__node(struct perf_cpu cpu, void *data __maybe_unu
>         return id;
>  }
>
> +struct aggr_cpu_id aggr_cpu_id__global(struct perf_cpu cpu, void *data __maybe_unused)

Is this a duplicate of perf_stat__get_global()? Could we replace all
uses of that function with this one?

Thanks,
Ian

> +{
> +       struct aggr_cpu_id id = aggr_cpu_id__empty();
> +
> +       /* it always aggregates to the cpu 0 */
> +       cpu.cpu = 0;
> +       id.cpu = cpu;
> +       return id;
> +}
> +
>  /* setup simple routines to easily access node numbers given a cpu number */
>  static int get_max_num(char *path, int *max)
>  {
> diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
> index 4a6d029576ee..b2ff648bc417 100644
> --- a/tools/perf/util/cpumap.h
> +++ b/tools/perf/util/cpumap.h
> @@ -133,5 +133,9 @@ struct aggr_cpu_id aggr_cpu_id__cpu(struct perf_cpu cpu, void *data);
>   * cpu. The function signature is compatible with aggr_cpu_id_get_t.
>   */
>  struct aggr_cpu_id aggr_cpu_id__node(struct perf_cpu cpu, void *data);
> -
> +/**
> + * aggr_cpu_id__global - Create an aggr_cpu_id for global aggregation.
> + * The function signature is compatible with aggr_cpu_id_get_t.
> + */
> +struct aggr_cpu_id aggr_cpu_id__global(struct perf_cpu cpu, void *data);
>  #endif /* __PERF_CPUMAP_H */
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index 4113aa86772f..1d8e585df4ad 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -1477,13 +1477,8 @@ void evlist__print_counters(struct evlist *evlist, struct perf_stat_config *conf
>                 if (config->iostat_run)
>                         iostat_print_counters(evlist, config, ts, prefix = buf,
>                                               print_counter_aggr);
> -               else {
> -                       evlist__for_each_entry(evlist, counter) {
> -                               print_counter_aggr(config, counter, prefix);
> -                       }
> -                       if (metric_only)
> -                               fputc('\n', config->output);
> -               }
> +               else
> +                       print_aggr(config, evlist, prefix);
>                 break;
>         case AGGR_NONE:
>                 if (metric_only)
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>
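
To illustrate the approach discussed in this message, the sketch below
shows how a "global" id callback that maps every CPU to the same id
(cpu 0) collapses all CPUs into one aggregation group, while a per-CPU
callback keeps them separate.  All type and function names here
(cpu_id, aggr_id, aggr_id_global, count_groups, ...) are hypothetical
simplifications, not the perf API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct cpu_id {
	int cpu;
};

struct aggr_id {
	int socket;
	int core;
	struct cpu_id cpu;
};

typedef struct aggr_id (*get_id_fn)(struct cpu_id cpu, void *data);

static struct aggr_id aggr_id_empty(void)
{
	struct aggr_id id = { .socket = -1, .core = -1, .cpu = { .cpu = -1 } };

	return id;
}

/* Global mode: every CPU aggregates to the id of cpu 0. */
static struct aggr_id aggr_id_global(struct cpu_id cpu, void *data)
{
	struct aggr_id id = aggr_id_empty();

	(void)cpu;
	(void)data;
	id.cpu.cpu = 0;
	return id;
}

/* No aggregation: every CPU is its own id. */
static struct aggr_id aggr_id_cpu(struct cpu_id cpu, void *data)
{
	struct aggr_id id = aggr_id_empty();

	(void)data;
	id.cpu = cpu;
	return id;
}

static bool aggr_id_equal(const struct aggr_id *a, const struct aggr_id *b)
{
	return a->socket == b->socket && a->core == b->core &&
	       a->cpu.cpu == b->cpu.cpu;
}

/* Count the distinct ids produced for CPUs 0..nr_cpus-1. */
static int count_groups(get_id_fn get_id, int nr_cpus)
{
	int i, j, groups = 0;

	assert(get_id != NULL && nr_cpus >= 0);

	for (i = 0; i < nr_cpus; i++) {
		struct aggr_id cur = get_id((struct cpu_id){ .cpu = i }, NULL);
		bool seen = false;

		for (j = 0; j < i && !seen; j++) {
			struct aggr_id prev = get_id((struct cpu_id){ .cpu = j }, NULL);

			seen = aggr_id_equal(&cur, &prev);
		}
		if (!seen)
			groups++;
	}
	return groups;
}
```

Because both callbacks share one signature, a single definition can
serve every caller, which is the deduplication the review asks about.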

* Re: [PATCH 05/19] perf stat: Add cpu aggr id for no aggregation mode
  2022-10-10  5:35 ` [PATCH 05/19] perf stat: Add cpu aggr id for no aggregation mode Namhyung Kim
@ 2022-10-10 22:49   ` Ian Rogers
  2022-10-12 10:40   ` Jiri Olsa
  1 sibling, 0 replies; 63+ messages in thread
From: Ian Rogers @ 2022-10-10 22:49 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Likewise, add an aggr_id for cpu for the no-aggregation mode.  It is not
> used yet, but later code will use it to unify the aggregation code.
>
> No functional change intended.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-stat.c | 48 +++++++++++++++++++++++++++++++++++----
>  1 file changed, 43 insertions(+), 5 deletions(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 144bb3a657f2..b00ef20aef5b 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1339,6 +1339,12 @@ static struct aggr_cpu_id perf_stat__get_global(struct perf_stat_config *config
>         return id;
>  }
>
> +static struct aggr_cpu_id perf_stat__get_cpu(struct perf_stat_config *config __maybe_unused,
> +                                            struct perf_cpu cpu)
> +{
> +       return aggr_cpu_id__cpu(cpu, /*data=*/NULL);
> +}
> +
>  static struct aggr_cpu_id perf_stat__get_aggr(struct perf_stat_config *config,
>                                               aggr_get_id_t get_id, struct perf_cpu cpu)
>  {
> @@ -1381,6 +1387,12 @@ static struct aggr_cpu_id perf_stat__get_global_cached(struct perf_stat_config *
>         return perf_stat__get_aggr(config, perf_stat__get_global, cpu);
>  }
>
> +static struct aggr_cpu_id perf_stat__get_cpu_cached(struct perf_stat_config *config,
> +                                                   struct perf_cpu cpu)
> +{
> +       return perf_stat__get_aggr(config, perf_stat__get_cpu, cpu);
> +}
> +

There's an existing issue with this code that it is underdocumented -
in particular, what does "cached" mean here?

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

>  static bool term_percore_set(void)
>  {
>         struct evsel *counter;
> @@ -1407,8 +1419,7 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr(enum aggr_mode aggr_mode)
>         case AGGR_NONE:
>                 if (term_percore_set())
>                         return aggr_cpu_id__core;
> -
> -               return NULL;
> +               return aggr_cpu_id__cpu;
>         case AGGR_GLOBAL:
>                 return aggr_cpu_id__global;
>         case AGGR_THREAD:
> @@ -1431,10 +1442,9 @@ static aggr_get_id_t aggr_mode__get_id(enum aggr_mode aggr_mode)
>         case AGGR_NODE:
>                 return perf_stat__get_node_cached;
>         case AGGR_NONE:
> -               if (term_percore_set()) {
> +               if (term_percore_set())
>                         return perf_stat__get_core_cached;
> -               }
> -               return NULL;
> +               return perf_stat__get_cpu_cached;
>         case AGGR_GLOBAL:
>                 return perf_stat__get_global_cached;
>         case AGGR_THREAD:
> @@ -1544,6 +1554,26 @@ static struct aggr_cpu_id perf_env__get_core_aggr_by_cpu(struct perf_cpu cpu, vo
>         return id;
>  }
>
> +static struct aggr_cpu_id perf_env__get_cpu_aggr_by_cpu(struct perf_cpu cpu, void *data)
> +{
> +       struct perf_env *env = data;
> +       struct aggr_cpu_id id = aggr_cpu_id__empty();
> +
> +       if (cpu.cpu != -1) {
> +               /*
> +                * core_id is relative to socket and die,
> +                * we need a global id. So we set
> +                * socket, die id and core id
> +                */
> +               id.socket = env->cpu[cpu.cpu].socket_id;
> +               id.die = env->cpu[cpu.cpu].die_id;
> +               id.core = env->cpu[cpu.cpu].core_id;
> +               id.cpu = cpu;
> +       }
> +
> +       return id;
> +}
> +
>  static struct aggr_cpu_id perf_env__get_node_aggr_by_cpu(struct perf_cpu cpu, void *data)
>  {
>         struct aggr_cpu_id id = aggr_cpu_id__empty();
> @@ -1579,6 +1609,12 @@ static struct aggr_cpu_id perf_stat__get_core_file(struct perf_stat_config *conf
>         return perf_env__get_core_aggr_by_cpu(cpu, &perf_stat.session->header.env);
>  }
>
> +static struct aggr_cpu_id perf_stat__get_cpu_file(struct perf_stat_config *config __maybe_unused,
> +                                                 struct perf_cpu cpu)
> +{
> +       return perf_env__get_cpu_aggr_by_cpu(cpu, &perf_stat.session->header.env);
> +}
> +
>  static struct aggr_cpu_id perf_stat__get_node_file(struct perf_stat_config *config __maybe_unused,
>                                                    struct perf_cpu cpu)
>  {
> @@ -1605,6 +1641,7 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr_file(enum aggr_mode aggr_mode)
>         case AGGR_GLOBAL:
>                 return perf_env__get_global_aggr_by_cpu;
>         case AGGR_NONE:
> +               return perf_env__get_cpu_aggr_by_cpu;
>         case AGGR_THREAD:
>         case AGGR_UNSET:
>         case AGGR_MAX:
> @@ -1627,6 +1664,7 @@ static aggr_get_id_t aggr_mode__get_id_file(enum aggr_mode aggr_mode)
>         case AGGR_GLOBAL:
>                 return perf_stat__get_global_file;
>         case AGGR_NONE:
> +               return perf_stat__get_cpu_file;
>         case AGGR_THREAD:
>         case AGGR_UNSET:
>         case AGGR_MAX:
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>
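
On the "cached" question raised above: one plausible reading, which this
thread does not confirm, is that the *_cached wrappers remember the id
computed for each CPU so the underlying lookup runs at most once per
CPU.  A minimal sketch of that idea, with hypothetical names (id_cache,
get_id_cached, slow_core_id) and a made-up CPU-to-core mapping:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CPUS 8
#define ID_EMPTY (-1)

typedef int (*get_id_fn)(int cpu);

struct id_cache {
	int id[MAX_CPUS];
};

static int nr_lookups;	/* counts real lookups, for demonstration only */

/* Stand-in for an expensive topology lookup. */
static int slow_core_id(int cpu)
{
	nr_lookups++;
	return cpu / 2;	/* pretend two hardware threads share each core */
}

static void id_cache_init(struct id_cache *cache)
{
	int i;

	for (i = 0; i < MAX_CPUS; i++)
		cache->id[i] = ID_EMPTY;
}

/* Return the id for a CPU, calling get_id only on the first request. */
static int get_id_cached(struct id_cache *cache, get_id_fn get_id, int cpu)
{
	assert(cache != NULL && get_id != NULL);

	/* no per-CPU slot to cache into (e.g. cpu == -1): ask directly */
	if (cpu < 0 || cpu >= MAX_CPUS)
		return get_id(cpu);

	if (cache->id[cpu] == ID_EMPTY)
		cache->id[cpu] = get_id(cpu);

	return cache->id[cpu];
}
```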

* Re: [PATCH 06/19] perf stat: Add 'needs_sort' argument to cpu_aggr_map__new()
  2022-10-10  5:35 ` [PATCH 06/19] perf stat: Add 'needs_sort' argument to cpu_aggr_map__new() Namhyung Kim
@ 2022-10-10 22:53   ` Ian Rogers
  2022-10-11 23:32     ` Namhyung Kim
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Rogers @ 2022-10-10 22:53 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> In case of no aggregation, it needs to keep the original (cpu) ordering
> in the aggr_map so that it can be in sync with the cpu map.  This will
> make it easier for the code to handle AGGR_NONE like the other modes.
>

The CPU map is sorted and so sorting the aggr_map should be fine. If
the data is already sorted then it is O(n) to sort. I think this is
preferable to having additional complexity around whether the aggr_map
is sorted.

Thanks,
Ian

> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-stat.c | 7 +++++--
>  tools/perf/util/cpumap.c  | 6 ++++--
>  tools/perf/util/cpumap.h  | 2 +-
>  3 files changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index b00ef20aef5b..e5ddf60ab31d 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1461,8 +1461,9 @@ static int perf_stat_init_aggr_mode(void)
>         aggr_cpu_id_get_t get_id = aggr_mode__get_aggr(stat_config.aggr_mode);
>
>         if (get_id) {
> +               bool needs_sort = stat_config.aggr_mode != AGGR_NONE;
>                 stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_requested_cpus,
> -                                                        get_id, /*data=*/NULL);
> +                                                        get_id, /*data=*/NULL, needs_sort);
>                 if (!stat_config.aggr_map) {
>                         pr_err("cannot build %s map", aggr_mode__string[stat_config.aggr_mode]);
>                         return -1;
> @@ -1677,11 +1678,13 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
>  {
>         struct perf_env *env = &st->session->header.env;
>         aggr_cpu_id_get_t get_id = aggr_mode__get_aggr_file(stat_config.aggr_mode);
> +       bool needs_sort = stat_config.aggr_mode != AGGR_NONE;
>
>         if (!get_id)
>                 return 0;
>
> -       stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_requested_cpus, get_id, env);
> +       stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_requested_cpus,
> +                                                get_id, env, needs_sort);
>         if (!stat_config.aggr_map) {
>                 pr_err("cannot build %s map", aggr_mode__string[stat_config.aggr_mode]);
>                 return -1;
> diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
> index 60209fe87456..6e3fcf523de9 100644
> --- a/tools/perf/util/cpumap.c
> +++ b/tools/perf/util/cpumap.c
> @@ -234,7 +234,7 @@ static int aggr_cpu_id__cmp(const void *a_pointer, const void *b_pointer)
>
>  struct cpu_aggr_map *cpu_aggr_map__new(const struct perf_cpu_map *cpus,
>                                        aggr_cpu_id_get_t get_id,
> -                                      void *data)
> +                                      void *data, bool needs_sort)
>  {
>         int idx;
>         struct perf_cpu cpu;
> @@ -270,8 +270,10 @@ struct cpu_aggr_map *cpu_aggr_map__new(const struct perf_cpu_map *cpus,
>                 if (trimmed_c)
>                         c = trimmed_c;
>         }
> +
>         /* ensure we process id in increasing order */
> -       qsort(c->map, c->nr, sizeof(struct aggr_cpu_id), aggr_cpu_id__cmp);
> +       if (needs_sort)
> +               qsort(c->map, c->nr, sizeof(struct aggr_cpu_id), aggr_cpu_id__cmp);
>
>         return c;
>
> diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
> index b2ff648bc417..da28b3146ef9 100644
> --- a/tools/perf/util/cpumap.h
> +++ b/tools/perf/util/cpumap.h
> @@ -97,7 +97,7 @@ typedef struct aggr_cpu_id (*aggr_cpu_id_get_t)(struct perf_cpu cpu, void *data)
>   */
>  struct cpu_aggr_map *cpu_aggr_map__new(const struct perf_cpu_map *cpus,
>                                        aggr_cpu_id_get_t get_id,
> -                                      void *data);
> +                                      void *data, bool needs_sort);
>
>  bool aggr_cpu_id__equal(const struct aggr_cpu_id *a, const struct aggr_cpu_id *b);
>  bool aggr_cpu_id__is_empty(const struct aggr_cpu_id *a);
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>
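
The trade-off debated in this message can be seen in a small sketch: a
map builder that collects distinct ids and sorts them only when asked.
The names build_id_map, id_identity and id_socket are hypothetical, and
the unsorted input {3, 1, 2} is contrived to make the difference
visible; as noted in the review, a real cpu map is already sorted, in
which case both settings give the same result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	return (x > y) - (x < y);
}

/*
 * Collect the distinct ids of cpus[] into out[] in first-seen order,
 * then sort them in increasing order if needs_sort is set.
 * Returns the number of distinct ids; out[] must hold at least nr ints.
 */
static size_t build_id_map(const int *cpus, size_t nr, int (*get_id)(int cpu),
			   bool needs_sort, int *out)
{
	size_t i, j, n = 0;

	assert(cpus != NULL && out != NULL && get_id != NULL);

	for (i = 0; i < nr; i++) {
		int id = get_id(cpus[i]);

		for (j = 0; j < n; j++) {
			if (out[j] == id)
				break;
		}
		if (j == n)
			out[n++] = id;
	}

	if (needs_sort)
		qsort(out, n, sizeof(out[0]), cmp_int);

	return n;
}

/* No aggregation: the id is the CPU itself. */
static int id_identity(int cpu)
{
	return cpu;
}

/* Per-socket aggregation, assuming four CPUs per socket. */
static int id_socket(int cpu)
{
	return cpu / 4;
}
```

With needs_sort unset, out[i] stays aligned with cpus[i] when the ids
are unique, which is the property the patch wants for AGGR_NONE.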

* Re: [PATCH 07/19] perf stat: Add struct perf_stat_aggr to perf_stat_evsel
  2022-10-10  5:35 ` [PATCH 07/19] perf stat: Add struct perf_stat_aggr to perf_stat_evsel Namhyung Kim
@ 2022-10-10 23:00   ` Ian Rogers
  2022-10-11 23:37     ` Namhyung Kim
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Rogers @ 2022-10-10 23:00 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> The perf_stat_aggr struct keeps the aggregated counter values and
> state for each aggregation mode.  The number of entries depends on
> the mode.  This is preparation for later use.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/util/stat.c | 34 +++++++++++++++++++++++++++-------
>  tools/perf/util/stat.h |  9 +++++++++
>  2 files changed, 36 insertions(+), 7 deletions(-)
>
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 8ec8bb4a9912..c9d5aa295b54 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -133,15 +133,33 @@ static void perf_stat_evsel_id_init(struct evsel *evsel)
>  static void evsel__reset_stat_priv(struct evsel *evsel)
>  {
>         struct perf_stat_evsel *ps = evsel->stats;
> +       struct perf_stat_aggr *aggr = ps->aggr;
>
>         init_stats(&ps->res_stats);
> +
> +       if (aggr)
> +               memset(aggr, 0, sizeof(*aggr) * ps->nr_aggr);
>  }
>
> -static int evsel__alloc_stat_priv(struct evsel *evsel)
> +
> +static int evsel__alloc_stat_priv(struct evsel *evsel, int nr_aggr)
>  {
> -       evsel->stats = zalloc(sizeof(struct perf_stat_evsel));
> -       if (evsel->stats == NULL)
> +       struct perf_stat_evsel *ps;
> +
> +       ps = zalloc(sizeof(*ps));
> +       if (ps == NULL)
>                 return -ENOMEM;
> +
> +       if (nr_aggr) {
> +               ps->nr_aggr = nr_aggr;
> +               ps->aggr = calloc(nr_aggr, sizeof(*ps->aggr));
> +               if (ps->aggr == NULL) {
> +                       free(ps);
> +                       return -ENOMEM;
> +               }
> +       }
> +
> +       evsel->stats = ps;
>         perf_stat_evsel_id_init(evsel);
>         evsel__reset_stat_priv(evsel);
>         return 0;
> @@ -151,8 +169,10 @@ static void evsel__free_stat_priv(struct evsel *evsel)
>  {
>         struct perf_stat_evsel *ps = evsel->stats;
>
> -       if (ps)
> +       if (ps) {
> +               zfree(&ps->aggr);
>                 zfree(&ps->group_data);
> +       }
>         zfree(&evsel->stats);
>  }
>
> @@ -181,9 +201,9 @@ static void evsel__reset_prev_raw_counts(struct evsel *evsel)
>                 perf_counts__reset(evsel->prev_raw_counts);
>  }
>
> -static int evsel__alloc_stats(struct evsel *evsel, bool alloc_raw)
> +static int evsel__alloc_stats(struct evsel *evsel, int nr_aggr, bool alloc_raw)
>  {
> -       if (evsel__alloc_stat_priv(evsel) < 0 ||
> +       if (evsel__alloc_stat_priv(evsel, nr_aggr) < 0 ||
>             evsel__alloc_counts(evsel) < 0 ||
>             (alloc_raw && evsel__alloc_prev_raw_counts(evsel) < 0))
>                 return -ENOMEM;
> @@ -196,7 +216,7 @@ int evlist__alloc_stats(struct evlist *evlist, bool alloc_raw)
>         struct evsel *evsel;
>
>         evlist__for_each_entry(evlist, evsel) {
> -               if (evsel__alloc_stats(evsel, alloc_raw))
> +               if (evsel__alloc_stats(evsel, 0, alloc_raw))
>                         goto out_free;
>         }
>
> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> index b0899c6e002f..ea356e5aa351 100644
> --- a/tools/perf/util/stat.h
> +++ b/tools/perf/util/stat.h
> @@ -8,6 +8,7 @@
>  #include <sys/resource.h>
>  #include "cpumap.h"
>  #include "rblist.h"
> +#include "counts.h"
>
>  struct perf_cpu_map;
>  struct perf_stat_config;
> @@ -42,9 +43,17 @@ enum perf_stat_evsel_id {
>         PERF_STAT_EVSEL_ID__MAX,
>  };
>

The new struct fields below all deserve comments.
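
As a rough sketch of the kind of comments meant here, assuming the field
semantics implied by how later patches in this series use them (nr is
bumped once per contributing count, failed is set when a contributing
read is bad).  The perf_counts_values stand-in is simplified and
hypothetical, not the real definition from counts.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the real struct in counts.h (assumption). */
struct perf_counts_values {
	uint64_t val;	/* counter value */
	uint64_t ena;	/* time enabled */
	uint64_t run;	/* time running */
};

/* Aggregated state for one entry of the aggregation map. */
struct perf_stat_aggr {
	/* Sum of the val/ena/run values merged into this entry. */
	struct perf_counts_values	counts;
	/* Number of individual counts merged into this entry. */
	int				nr;
	/* True once any contributing count was unusable. */
	bool				failed;
};

/* Per-evsel stat state, reduced to the two new fields. */
struct perf_stat_evsel {
	/* Number of elements in the aggr array (0 if unallocated). */
	int			 nr_aggr;
	/* One entry per aggregation id, or NULL. */
	struct perf_stat_aggr	*aggr;
};
```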

> +struct perf_stat_aggr {
> +       struct perf_counts_values       counts;
> +       int                             nr;

Could this value be derived from counts.values.size?

> +       bool                            failed;
> +};
> +
>  struct perf_stat_evsel {
>         struct stats             res_stats;
>         enum perf_stat_evsel_id  id;
> +       int                      nr_aggr;
> +       struct perf_stat_aggr   *aggr;
>         u64                     *group_data;
>  };
>
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>

* Re: [PATCH 08/19] perf stat: Allocate evsel->stats->aggr properly
  2022-10-10  5:35 ` [PATCH 08/19] perf stat: Allocate evsel->stats->aggr properly Namhyung Kim
@ 2022-10-10 23:03   ` Ian Rogers
  2022-10-11 23:38     ` Namhyung Kim
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Rogers @ 2022-10-10 23:03 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> The perf_stat_config.aggr_map should have a correct size of the
> aggregation map.  Use it to allocate aggr_counts.
>
> Also AGGR_NONE with per-core events can be tricky because it doesn't
> aggreate basically but it needs to do so for per-core events only.

nit: s/aggreate/aggregate/

> So only per-core evsels will have stats->aggr data.
>
> Note that other callers of evlist__alloc_stats() might not have
> stat_config or aggr_map.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Acked-by: Ian Rogers <irogers@google.com>

nit: The calls below pass the bare constants true, false and NULL; it
would be nice to use the /*argument_name=*/... style of parameter
passing to make clear what each argument means.
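
A minimal sketch of that style, assuming stub types and a stub
evlist__alloc_stats() that only mirrors the parameter order of the
patched function so the snippet compiles outside the perf tree
(example_call() is a hypothetical caller):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins so the sketch compiles outside the perf tree. */
struct perf_stat_config { int nr_aggr; };
struct evlist { int nr_entries; };

/* Stub with the same parameter order as the patched evlist__alloc_stats(). */
static int evlist__alloc_stats(struct perf_stat_config *config,
			       struct evlist *evlist, bool alloc_raw)
{
	(void)alloc_raw;
	if (evlist == NULL)
		return -1;
	/* A NULL config means "no aggregation map", as in the tests. */
	return config ? config->nr_aggr : 0;
}

static int example_call(struct evlist *evlist)
{
	/* The inline comments name what each bare constant stands for. */
	return evlist__alloc_stats(/*config=*/NULL, evlist,
				   /*alloc_raw=*/false);
}
```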

Thanks,
Ian

> ---
>  tools/perf/builtin-script.c     | 4 ++--
>  tools/perf/builtin-stat.c       | 6 +++---
>  tools/perf/tests/parse-metric.c | 2 +-
>  tools/perf/tests/pmu-events.c   | 2 +-
>  tools/perf/util/stat.c          | 9 +++++++--
>  tools/perf/util/stat.h          | 3 ++-
>  6 files changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index 7ca238277d83..691915a71c86 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -2049,7 +2049,7 @@ static void perf_sample__fprint_metric(struct perf_script *script,
>         u64 val;
>
>         if (!evsel->stats)
> -               evlist__alloc_stats(script->session->evlist, false);
> +               evlist__alloc_stats(&stat_config, script->session->evlist, false);
>         if (evsel_script(leader)->gnum++ == 0)
>                 perf_stat__reset_shadow_stats();
>         val = sample->period * evsel->scale;
> @@ -3632,7 +3632,7 @@ static int set_maps(struct perf_script *script)
>
>         perf_evlist__set_maps(&evlist->core, script->cpus, script->threads);
>
> -       if (evlist__alloc_stats(evlist, true))
> +       if (evlist__alloc_stats(&stat_config, evlist, true))
>                 return -ENOMEM;
>
>         script->allocated = true;
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index e5ddf60ab31d..eaddafbd7ff2 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -2124,7 +2124,7 @@ static int set_maps(struct perf_stat *st)
>
>         perf_evlist__set_maps(&evsel_list->core, st->cpus, st->threads);
>
> -       if (evlist__alloc_stats(evsel_list, true))
> +       if (evlist__alloc_stats(&stat_config, evsel_list, true))
>                 return -ENOMEM;
>
>         st->maps_allocated = true;
> @@ -2571,10 +2571,10 @@ int cmd_stat(int argc, const char **argv)
>                 goto out;
>         }
>
> -       if (evlist__alloc_stats(evsel_list, interval))
> +       if (perf_stat_init_aggr_mode())
>                 goto out;
>
> -       if (perf_stat_init_aggr_mode())
> +       if (evlist__alloc_stats(&stat_config, evsel_list, interval))
>                 goto out;
>
>         /*
> diff --git a/tools/perf/tests/parse-metric.c b/tools/perf/tests/parse-metric.c
> index 68f5a2a03242..cb3a9b795c0f 100644
> --- a/tools/perf/tests/parse-metric.c
> +++ b/tools/perf/tests/parse-metric.c
> @@ -103,7 +103,7 @@ static int __compute_metric(const char *name, struct value *vals,
>         if (err)
>                 goto out;
>
> -       err = evlist__alloc_stats(evlist, false);
> +       err = evlist__alloc_stats(NULL, evlist, false);
>         if (err)
>                 goto out;
>
> diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
> index 097e05c796ab..a5e1028dacfc 100644
> --- a/tools/perf/tests/pmu-events.c
> +++ b/tools/perf/tests/pmu-events.c
> @@ -889,7 +889,7 @@ static int test__parsing_callback(const struct pmu_event *pe, const struct pmu_e
>                 goto out_err;
>         }
>
> -       err = evlist__alloc_stats(evlist, false);
> +       err = evlist__alloc_stats(NULL, evlist, false);
>         if (err)
>                 goto out_err;
>         /*
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index c9d5aa295b54..374149628507 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -211,12 +211,17 @@ static int evsel__alloc_stats(struct evsel *evsel, int nr_aggr, bool alloc_raw)
>         return 0;
>  }
>
> -int evlist__alloc_stats(struct evlist *evlist, bool alloc_raw)
> +int evlist__alloc_stats(struct perf_stat_config *config,
> +                       struct evlist *evlist, bool alloc_raw)
>  {
>         struct evsel *evsel;
> +       int nr_aggr = 0;
> +
> +       if (config && config->aggr_map)
> +               nr_aggr = config->aggr_map->nr;
>
>         evlist__for_each_entry(evlist, evsel) {
> -               if (evsel__alloc_stats(evsel, 0, alloc_raw))
> +               if (evsel__alloc_stats(evsel, nr_aggr, alloc_raw))
>                         goto out_free;
>         }
>
> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> index ea356e5aa351..74bd51a3cb36 100644
> --- a/tools/perf/util/stat.h
> +++ b/tools/perf/util/stat.h
> @@ -257,7 +257,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
>                                    struct runtime_stat *st);
>  void perf_stat__collect_metric_expr(struct evlist *);
>
> -int evlist__alloc_stats(struct evlist *evlist, bool alloc_raw);
> +int evlist__alloc_stats(struct perf_stat_config *config,
> +                       struct evlist *evlist, bool alloc_raw);
>  void evlist__free_stats(struct evlist *evlist);
>  void evlist__reset_stats(struct evlist *evlist);
>  void evlist__reset_prev_raw_counts(struct evlist *evlist);
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>

* Re: [PATCH 09/19] perf stat: Aggregate events using evsel->stats->aggr
  2022-10-10  5:35 ` [PATCH 09/19] perf stat: Aggregate events using evsel->stats->aggr Namhyung Kim
@ 2022-10-10 23:11   ` Ian Rogers
  2022-10-11 23:44     ` Namhyung Kim
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Rogers @ 2022-10-10 23:11 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Add logic to aggregate counter values into the new evsel->stats->aggr.
> This is not used yet, so shadow stats are not updated.  A later patch
> will convert the existing code to use it.
>
> With that, we don't need to handle AGGR_GLOBAL specially anymore.  It
> can use the same logic with counts, prev_counts and aggr_counts.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-stat.c                     |  3 --
>  tools/perf/util/evsel.c                       |  9 +---
>  .../scripting-engines/trace-event-python.c    |  6 ---
>  tools/perf/util/stat.c                        | 46 ++++++++++++++++---
>  4 files changed, 41 insertions(+), 23 deletions(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index eaddafbd7ff2..139e35ed68d3 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -963,9 +963,6 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>                 init_stats(&walltime_nsecs_stats);
>                 update_stats(&walltime_nsecs_stats, t1 - t0);
>
> -               if (stat_config.aggr_mode == AGGR_GLOBAL)
> -                       evlist__save_aggr_prev_raw_counts(evsel_list);
> -
>                 evlist__copy_prev_raw_counts(evsel_list);
>                 evlist__reset_prev_raw_counts(evsel_list);
>                 perf_stat__reset_shadow_per_stat(&rt_stat);
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index a6ea91c72659..a1fcb3166149 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -1526,13 +1526,8 @@ void evsel__compute_deltas(struct evsel *evsel, int cpu_map_idx, int thread,
>         if (!evsel->prev_raw_counts)
>                 return;
>
> -       if (cpu_map_idx == -1) {
> -               tmp = evsel->prev_raw_counts->aggr;
> -               evsel->prev_raw_counts->aggr = *count;
> -       } else {
> -               tmp = *perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread);
> -               *perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread) = *count;
> -       }
> +       tmp = *perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread);
> +       *perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread) = *count;
>
>         count->val = count->val - tmp.val;
>         count->ena = count->ena - tmp.ena;
> diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
> index 1f2040f36d4e..7bc8559dce6a 100644
> --- a/tools/perf/util/scripting-engines/trace-event-python.c
> +++ b/tools/perf/util/scripting-engines/trace-event-python.c
> @@ -1653,12 +1653,6 @@ static void python_process_stat(struct perf_stat_config *config,
>         struct perf_cpu_map *cpus = counter->core.cpus;
>         int cpu, thread;
>
> -       if (config->aggr_mode == AGGR_GLOBAL) {
> -               process_stat(counter, (struct perf_cpu){ .cpu = -1 }, -1, tstamp,
> -                            &counter->counts->aggr);
> -               return;
> -       }
> -
>         for (thread = 0; thread < threads->nr; thread++) {
>                 for (cpu = 0; cpu < perf_cpu_map__nr(cpus); cpu++) {
>                         process_stat(counter, perf_cpu_map__cpu(cpus, cpu),
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 374149628507..99874254809d 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -387,6 +387,7 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
>                        struct perf_counts_values *count)
>  {
>         struct perf_counts_values *aggr = &evsel->counts->aggr;
> +       struct perf_stat_evsel *ps = evsel->stats;
>         static struct perf_counts_values zero;
>         bool skip = false;
>
> @@ -398,6 +399,44 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
>         if (skip)
>                 count = &zero;
>
> +       if (!evsel->snapshot)
> +               evsel__compute_deltas(evsel, cpu_map_idx, thread, count);
> +       perf_counts_values__scale(count, config->scale, NULL);
> +
> +       if (ps->aggr) {
> +               struct perf_cpu cpu = perf_cpu_map__cpu(evsel->core.cpus, cpu_map_idx);
> +               struct aggr_cpu_id aggr_id = config->aggr_get_id(config, cpu);
> +               struct perf_stat_aggr *ps_aggr;
> +               int i;
> +
> +               for (i = 0; i < ps->nr_aggr; i++) {

Would it be cleaner to have a helper function here that returns i or
ps_aggr for the first CPU being aggregated into? That would avoid the
continue/break.
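
A minimal sketch of such a helper, assuming a hypothetical name
perf_stat_evsel__find_aggr() and simplified stand-in types so it
compiles outside the perf tree.  It returns the matching entry or NULL,
so the caller needs neither continue nor break:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the perf types (assumptions, not the real layout). */
struct aggr_cpu_id { int socket; int core; };
struct perf_stat_aggr { int nr; bool failed; };
struct perf_stat_evsel { int nr_aggr; struct perf_stat_aggr *aggr; };

static bool aggr_cpu_id__equal(const struct aggr_cpu_id *a,
			       const struct aggr_cpu_id *b)
{
	return a->socket == b->socket && a->core == b->core;
}

/*
 * Hypothetical helper: return the aggr entry whose id in 'map' equals
 * 'id', or NULL when the id is not part of the aggregation map.
 * 'map' must have at least ps->nr_aggr elements.
 */
static struct perf_stat_aggr *
perf_stat_evsel__find_aggr(struct perf_stat_evsel *ps,
			   const struct aggr_cpu_id *map,
			   const struct aggr_cpu_id *id)
{
	for (int i = 0; i < ps->nr_aggr; i++) {
		if (aggr_cpu_id__equal(id, &map[i]))
			return &ps->aggr[i];
	}
	return NULL;
}
```

With something like this, the block in process_counter_values() would
reduce to one lookup followed by a NULL check before updating the entry.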

> +                       if (!aggr_cpu_id__equal(&aggr_id, &config->aggr_map->map[i]))
> +                               continue;
> +
> +                       ps_aggr = &ps->aggr[i];
> +                       ps_aggr->nr++;
> +
> +                       /*
> +                        * When any result is bad, make them all to give
> +                        * consistent output in interval mode.
> +                        */
> +                       if (count->ena == 0 || count->run == 0 ||
> +                           evsel->counts->scaled == -1) {
> +                               ps_aggr->counts.val = 0;
> +                               ps_aggr->counts.ena = 0;
> +                               ps_aggr->counts.run = 0;
> +                               ps_aggr->failed = true;
> +                       }
> +
> +                       if (!ps_aggr->failed) {
> +                               ps_aggr->counts.val += count->val;
> +                               ps_aggr->counts.ena += count->ena;
> +                               ps_aggr->counts.run += count->run;
> +                       }
> +                       break;
> +               }
> +       }
> +
>         switch (config->aggr_mode) {
>         case AGGR_THREAD:
>         case AGGR_CORE:
> @@ -405,9 +444,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
>         case AGGR_SOCKET:
>         case AGGR_NODE:
>         case AGGR_NONE:
> -               if (!evsel->snapshot)
> -                       evsel__compute_deltas(evsel, cpu_map_idx, thread, count);
> -               perf_counts_values__scale(count, config->scale, NULL);
>                 if ((config->aggr_mode == AGGR_NONE) && (!evsel->percore)) {
>                         perf_stat__update_shadow_stats(evsel, count->val,
>                                                        cpu_map_idx, &rt_stat);
> @@ -469,10 +505,6 @@ int perf_stat_process_counter(struct perf_stat_config *config,
>         if (config->aggr_mode != AGGR_GLOBAL)
>                 return 0;
>
> -       if (!counter->snapshot)
> -               evsel__compute_deltas(counter, -1, -1, aggr);
> -       perf_counts_values__scale(aggr, config->scale, &counter->counts->scaled);
> -

It isn't clear to me how this relates to the patch.

Thanks,
Ian

>         update_stats(&ps->res_stats, *count);
>
>         if (verbose > 0) {
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>

* Re: [PATCH 10/19] perf stat: Aggregate per-thread stats using evsel->stats->aggr
  2022-10-10  5:35 ` [PATCH 10/19] perf stat: Aggregate per-thread stats " Namhyung Kim
@ 2022-10-10 23:17   ` Ian Rogers
  2022-10-11 23:46     ` Namhyung Kim
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Rogers @ 2022-10-10 23:17 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Per-thread aggregation doesn't use the CPU numbers but the logic should
> be the same.  Initialize cpu_aggr_map separately for AGGR_THREAD and use
> thread map idx to aggregate counter values.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-stat.c | 31 +++++++++++++++++++++++++++++++
>  tools/perf/util/stat.c    | 19 +++++++++++++++++++
>  2 files changed, 50 insertions(+)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 139e35ed68d3..c76240cfc635 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1468,6 +1468,21 @@ static int perf_stat_init_aggr_mode(void)
>                 stat_config.aggr_get_id = aggr_mode__get_id(stat_config.aggr_mode);
>         }
>
> +       if (stat_config.aggr_mode == AGGR_THREAD) {
> +               nr = perf_thread_map__nr(evsel_list->core.threads);
> +               stat_config.aggr_map = cpu_aggr_map__empty_new(nr);
> +               if (stat_config.aggr_map == NULL)
> +                       return -ENOMEM;
> +
> +               for (int s = 0; s < nr; s++) {
> +                       struct aggr_cpu_id id = aggr_cpu_id__empty();
> +
> +                       id.thread_idx = s;
> +                       stat_config.aggr_map->map[s] = id;
> +               }
> +               return 0;
> +       }
> +
>         /*
>          * The evsel_list->cpus is the base we operate on,
>          * taking the highest cpu number to be the size of
> @@ -1677,6 +1692,22 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
>         aggr_cpu_id_get_t get_id = aggr_mode__get_aggr_file(stat_config.aggr_mode);
>         bool needs_sort = stat_config.aggr_mode != AGGR_NONE;
>
> +       if (stat_config.aggr_mode == AGGR_THREAD) {
> +               int nr = perf_thread_map__nr(evsel_list->core.threads);
> +
> +               stat_config.aggr_map = cpu_aggr_map__empty_new(nr);
> +               if (stat_config.aggr_map == NULL)
> +                       return -ENOMEM;
> +
> +               for (int s = 0; s < nr; s++) {
> +                       struct aggr_cpu_id id = aggr_cpu_id__empty();
> +
> +                       id.thread_idx = s;
> +                       stat_config.aggr_map->map[s] = id;
> +               }
> +               return 0;
> +       }
> +
>         if (!get_id)
>                 return 0;
>
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 99874254809d..013dbe1c5d28 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -403,6 +403,24 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
>                 evsel__compute_deltas(evsel, cpu_map_idx, thread, count);
>         perf_counts_values__scale(count, config->scale, NULL);
>
> +       if (config->aggr_mode == AGGR_THREAD) {
> +               struct perf_counts_values *aggr_counts = &ps->aggr[thread].counts;
> +
> +               /*
> +                * Skip value 0 when enabling --per-thread globally,
> +                * otherwise too many 0 output.
> +                */
> +               if (count->val == 0 && config->system_wide)
> +                       return 0;
> +
> +               ps->aggr[thread].nr++;
> +
> +               aggr_counts->val += count->val;
> +               aggr_counts->ena += count->ena;
> +               aggr_counts->run += count->run;
> +               goto update;

nit: perhaps there is a more intention-revealing name than 'update' for
this label.

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> +       }
> +
>         if (ps->aggr) {
>                 struct perf_cpu cpu = perf_cpu_map__cpu(evsel->core.cpus, cpu_map_idx);
>                 struct aggr_cpu_id aggr_id = config->aggr_get_id(config, cpu);
> @@ -437,6 +455,7 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
>                 }
>         }
>
> +update:
>         switch (config->aggr_mode) {
>         case AGGR_THREAD:
>         case AGGR_CORE:
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>

* Re: [PATCH 11/19] perf stat: Allocate aggr counts for recorded data
  2022-10-10  5:35 ` [PATCH 11/19] perf stat: Allocate aggr counts for recorded data Namhyung Kim
@ 2022-10-10 23:18   ` Ian Rogers
  0 siblings, 0 replies; 63+ messages in thread
From: Ian Rogers @ 2022-10-10 23:18 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> process_stat_config_event() is what sets the aggr_mode, which means the
> earlier evlist__alloc_stats() cannot allocate the aggr counts because the
> aggr_mode is still missing.
>
> Do it after setting the aggr_map using evlist__alloc_aggr_stats().
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/builtin-stat.c |  8 ++++++++
>  tools/perf/util/stat.c    | 39 +++++++++++++++++++++++++++++++--------
>  tools/perf/util/stat.h    |  2 ++
>  3 files changed, 41 insertions(+), 8 deletions(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index c76240cfc635..983f38cd4caa 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -2139,6 +2139,14 @@ int process_stat_config_event(struct perf_session *session,
>         else
>                 perf_stat_init_aggr_mode_file(st);
>
> +       if (stat_config.aggr_map) {
> +               int nr_aggr = stat_config.aggr_map->nr;
> +
> +               if (evlist__alloc_aggr_stats(session->evlist, nr_aggr) < 0) {
> +                       pr_err("cannot allocate aggr counts\n");
> +                       return -1;
> +               }
> +       }
>         return 0;
>  }
>
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 013dbe1c5d28..279aa4ea342d 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -141,6 +141,31 @@ static void evsel__reset_stat_priv(struct evsel *evsel)
>                 memset(aggr, 0, sizeof(*aggr) * ps->nr_aggr);
>  }
>
> +static int evsel__alloc_aggr_stats(struct evsel *evsel, int nr_aggr)
> +{
> +       struct perf_stat_evsel *ps = evsel->stats;
> +
> +       if (ps == NULL)
> +               return 0;
> +
> +       ps->nr_aggr = nr_aggr;
> +       ps->aggr = calloc(nr_aggr, sizeof(*ps->aggr));
> +       if (ps->aggr == NULL)
> +               return -ENOMEM;
> +
> +       return 0;
> +}
> +
> +int evlist__alloc_aggr_stats(struct evlist *evlist, int nr_aggr)
> +{
> +       struct evsel *evsel;
> +
> +       evlist__for_each_entry(evlist, evsel) {
> +               if (evsel__alloc_aggr_stats(evsel, nr_aggr) < 0)
> +                       return -1;
> +       }
> +       return 0;
> +}
>
>  static int evsel__alloc_stat_priv(struct evsel *evsel, int nr_aggr)
>  {
> @@ -150,16 +175,14 @@ static int evsel__alloc_stat_priv(struct evsel *evsel, int nr_aggr)
>         if (ps == NULL)
>                 return -ENOMEM;
>
> -       if (nr_aggr) {
> -               ps->nr_aggr = nr_aggr;
> -               ps->aggr = calloc(nr_aggr, sizeof(*ps->aggr));
> -               if (ps->aggr == NULL) {
> -                       free(ps);
> -                       return -ENOMEM;
> -               }
> +       evsel->stats = ps;
> +
> +       if (nr_aggr && evsel__alloc_aggr_stats(evsel, nr_aggr) < 0) {
> +               evsel->stats = NULL;
> +               free(ps);
> +               return -ENOMEM;
>         }
>
> -       evsel->stats = ps;
>         perf_stat_evsel_id_init(evsel);
>         evsel__reset_stat_priv(evsel);
>         return 0;
> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> index 74bd51a3cb36..936c0709ce0d 100644
> --- a/tools/perf/util/stat.h
> +++ b/tools/perf/util/stat.h
> @@ -265,6 +265,8 @@ void evlist__reset_prev_raw_counts(struct evlist *evlist);
>  void evlist__copy_prev_raw_counts(struct evlist *evlist);
>  void evlist__save_aggr_prev_raw_counts(struct evlist *evlist);
>
> +int evlist__alloc_aggr_stats(struct evlist *evlist, int nr_aggr);
> +
>  int perf_stat_process_counter(struct perf_stat_config *config,
>                               struct evsel *counter);
>  struct perf_tool;
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>

* Re: [PATCH 12/19] perf stat: Reset aggr counts for each interval
  2022-10-10  5:35 ` [PATCH 12/19] perf stat: Reset aggr counts for each interval Namhyung Kim
@ 2022-10-10 23:20   ` Ian Rogers
  2022-10-11 23:48     ` Namhyung Kim
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Rogers @ 2022-10-10 23:20 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> The evsel->stats->aggr->counts should be reset for interval processing
> since we want to use the values directly for display.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-stat.c |  3 +++
>  tools/perf/util/stat.c    | 13 +++++++++++++
>  tools/perf/util/stat.h    |  1 +
>  3 files changed, 17 insertions(+)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 983f38cd4caa..38036f40e993 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -492,6 +492,8 @@ static void process_interval(void)
>         diff_timespec(&rs, &ts, &ref_time);
>
>         perf_stat__reset_shadow_per_stat(&rt_stat);
> +       evlist__reset_aggr_stats(evsel_list);
> +
>         read_counters(&rs);
>
>         if (STAT_RECORD) {
> @@ -965,6 +967,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>
>                 evlist__copy_prev_raw_counts(evsel_list);
>                 evlist__reset_prev_raw_counts(evsel_list);
> +               evlist__reset_aggr_stats(evsel_list);
>                 perf_stat__reset_shadow_per_stat(&rt_stat);
>         } else {
>                 update_stats(&walltime_nsecs_stats, t1 - t0);
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 279aa4ea342d..4edfc1c5dc07 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -276,6 +276,19 @@ void evlist__reset_stats(struct evlist *evlist)
>         }
>  }
>
> +void evlist__reset_aggr_stats(struct evlist *evlist)
> +{
> +       struct evsel *evsel;
> +
> +       evlist__for_each_entry(evlist, evsel) {
> +               struct perf_stat_evsel *ps = evsel->stats;
> +               struct perf_stat_aggr *aggr = ps->aggr;
> +
> +               if (aggr)
> +                       memset(aggr, 0, sizeof(*aggr) * ps->nr_aggr);

Perhaps this would be cleaner with helper functions on perf_stat_evsel
and perf_stat_aggr?
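
One possible shape for such helpers, as a sketch only.  The names
perf_stat_aggr__reset() and perf_stat_evsel__reset_aggr() are
hypothetical, and the structs are simplified stand-ins for the ones in
stat.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the perf structs (assumptions). */
struct perf_stat_aggr { unsigned long long val; int nr; bool failed; };
struct perf_stat_evsel { int nr_aggr; struct perf_stat_aggr *aggr; };

/* Hypothetical helper: clear a single aggregation entry. */
static void perf_stat_aggr__reset(struct perf_stat_aggr *aggr)
{
	memset(aggr, 0, sizeof(*aggr));
}

/* Hypothetical helper: clear every entry; tolerates a missing array. */
static void perf_stat_evsel__reset_aggr(struct perf_stat_evsel *ps)
{
	if (ps == NULL || ps->aggr == NULL)
		return;

	for (int i = 0; i < ps->nr_aggr; i++)
		perf_stat_aggr__reset(&ps->aggr[i]);
}
```

evsel__reset_stat_priv() and evlist__reset_aggr_stats() both open-code
the same memset(), so they could share the second helper.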

Thanks,
Ian

> +       }
> +}
> +
>  void evlist__reset_prev_raw_counts(struct evlist *evlist)
>  {
>         struct evsel *evsel;
> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> index 936c0709ce0d..3a876ad2870b 100644
> --- a/tools/perf/util/stat.h
> +++ b/tools/perf/util/stat.h
> @@ -266,6 +266,7 @@ void evlist__copy_prev_raw_counts(struct evlist *evlist);
>  void evlist__save_aggr_prev_raw_counts(struct evlist *evlist);
>
>  int evlist__alloc_aggr_stats(struct evlist *evlist, int nr_aggr);
> +void evlist__reset_aggr_stats(struct evlist *evlist);
>
>  int perf_stat_process_counter(struct perf_stat_config *config,
>                               struct evsel *counter);
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>

* Re: [PATCH 13/19] perf stat: Split process_counters()
  2022-10-10  5:35 ` [PATCH 13/19] perf stat: Split process_counters() Namhyung Kim
@ 2022-10-10 23:21   ` Ian Rogers
  0 siblings, 0 replies; 63+ messages in thread
From: Ian Rogers @ 2022-10-10 23:21 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> It will do more processing with aggregation.  Let's split the function so that it
> can be shared with process_stat_round_event() too.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/builtin-stat.c | 22 +++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 38036f40e993..49a7e290d778 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -465,15 +465,19 @@ static int read_bpf_map_counters(void)
>         return 0;
>  }
>
> -static void read_counters(struct timespec *rs)
> +static int read_counters(struct timespec *rs)
>  {
> -       struct evsel *counter;
> -
>         if (!stat_config.stop_read_counter) {
>                 if (read_bpf_map_counters() ||
>                     read_affinity_counters(rs))
> -                       return;
> +                       return -1;
>         }
> +       return 0;
> +}
> +
> +static void process_counters(void)
> +{
> +       struct evsel *counter;
>
>         evlist__for_each_entry(evsel_list, counter) {
>                 if (counter->err)
> @@ -494,7 +498,8 @@ static void process_interval(void)
>         perf_stat__reset_shadow_per_stat(&rt_stat);
>         evlist__reset_aggr_stats(evsel_list);
>
> -       read_counters(&rs);
> +       if (read_counters(&rs) == 0)
> +               process_counters();
>
>         if (STAT_RECORD) {
>                 if (WRITE_STAT_ROUND_EVENT(rs.tv_sec * NSEC_PER_SEC + rs.tv_nsec, INTERVAL))
> @@ -980,7 +985,8 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>          * avoid arbitrary skew, we must read all counters before closing any
>          * group leaders.
>          */
> -       read_counters(&(struct timespec) { .tv_nsec = t1-t0 });
> +       if (read_counters(&(struct timespec) { .tv_nsec = t1-t0 }) == 0)
> +               process_counters();
>
>         /*
>          * We need to keep evsel_list alive, because it's processed
> @@ -2098,13 +2104,11 @@ static int process_stat_round_event(struct perf_session *session,
>                                     union perf_event *event)
>  {
>         struct perf_record_stat_round *stat_round = &event->stat_round;
> -       struct evsel *counter;
>         struct timespec tsh, *ts = NULL;
>         const char **argv = session->header.env.cmdline_argv;
>         int argc = session->header.env.nr_cmdline;
>
> -       evlist__for_each_entry(evsel_list, counter)
> -               perf_stat_process_counter(&stat_config, counter);
> +       process_counters();
>
>         if (stat_round->type == PERF_STAT_ROUND_TYPE__FINAL)
>                 update_stats(&walltime_nsecs_stats, stat_round->time);
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>

* Re: [PATCH 14/19] perf stat: Add perf_stat_merge_counters()
  2022-10-10  5:35 ` [PATCH 14/19] perf stat: Add perf_stat_merge_counters() Namhyung Kim
@ 2022-10-10 23:31   ` Ian Rogers
  2022-10-11 23:55     ` Namhyung Kim
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Rogers @ 2022-10-10 23:31 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> perf_stat_merge_counters() aggregates the same event across different
> PMUs, as in the uncore or hybrid cases.  The same logic exists in the
> stat-display routines, but I think it should be handled when the event
> counters are processed.

I think I'm confused as to what a merged counter is. Does it relate to
the evsel leader? How are aliases and merging related?

Thanks,
Ian

>
> As it works on the aggr_counters, it doesn't change the output yet.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-stat.c |  2 +
>  tools/perf/util/stat.c    | 96 +++++++++++++++++++++++++++++++++++++++
>  tools/perf/util/stat.h    |  2 +
>  3 files changed, 100 insertions(+)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 49a7e290d778..f90e8f29cb23 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -486,6 +486,8 @@ static void process_counters(void)
>                         pr_warning("failed to process counter %s\n", counter->name);
>                 counter->err = 0;
>         }
> +
> +       perf_stat_merge_counters(&stat_config, evsel_list);
>  }
>
>  static void process_interval(void)
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 4edfc1c5dc07..1bb197782a34 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -575,6 +575,102 @@ int perf_stat_process_counter(struct perf_stat_config *config,
>         return 0;
>  }
>
> +static int evsel__merge_aggr_counters(struct evsel *evsel, struct evsel *alias)
> +{
> +       struct perf_stat_evsel *ps_a = evsel->stats;
> +       struct perf_stat_evsel *ps_b = alias->stats;
> +       int i;
> +
> +       if (ps_a->aggr == NULL && ps_b->aggr == NULL)
> +               return 0;
> +
> +       if (ps_a->nr_aggr != ps_b->nr_aggr) {
> +               pr_err("Unmatched aggregation mode between aliases\n");
> +               return -1;
> +       }
> +
> +       for (i = 0; i < ps_a->nr_aggr; i++) {
> +               struct perf_counts_values *aggr_counts_a = &ps_a->aggr[i].counts;
> +               struct perf_counts_values *aggr_counts_b = &ps_b->aggr[i].counts;
> +
> +               /* NB: don't increase aggr.nr for aliases */
> +
> +               aggr_counts_a->val += aggr_counts_b->val;
> +               aggr_counts_a->ena += aggr_counts_b->ena;
> +               aggr_counts_a->run += aggr_counts_b->run;
> +       }
> +
> +       return 0;
> +}
> +/* events should have the same name, scale, unit, cgroup but on different PMUs */
> +static bool evsel__is_alias(struct evsel *evsel_a, struct evsel *evsel_b)
> +{
> +       if (strcmp(evsel__name(evsel_a), evsel__name(evsel_b)))
> +               return false;
> +
> +       if (evsel_a->scale != evsel_b->scale)
> +               return false;
> +
> +       if (evsel_a->cgrp != evsel_b->cgrp)
> +               return false;
> +
> +       if (strcmp(evsel_a->unit, evsel_b->unit))
> +               return false;
> +
> +       if (evsel__is_clock(evsel_a) != evsel__is_clock(evsel_b))
> +               return false;
> +
> +       return !!strcmp(evsel_a->pmu_name, evsel_b->pmu_name);
> +}
> +
> +static void evsel__merge_aliases(struct evsel *evsel)
> +{
> +       struct evlist *evlist = evsel->evlist;
> +       struct evsel *alias;
> +
> +       alias = list_prepare_entry(evsel, &(evlist->core.entries), core.node);
> +       list_for_each_entry_continue(alias, &evlist->core.entries, core.node) {
> +               /* Merge the same events on different PMUs. */
> +               if (evsel__is_alias(evsel, alias)) {
> +                       evsel__merge_aggr_counters(evsel, alias);
> +                       alias->merged_stat = true;
> +               }
> +       }
> +}
> +
> +static bool evsel__should_merge_hybrid(struct evsel *evsel, struct perf_stat_config *config)
> +{
> +       struct perf_pmu *pmu;
> +
> +       if (!config->hybrid_merge)
> +               return false;
> +
> +       pmu = evsel__find_pmu(evsel);
> +       return pmu && pmu->is_hybrid;
> +}
> +
> +static void evsel__merge_stats(struct evsel *evsel, struct perf_stat_config *config)
> +{
> +       /* this evsel is already merged */
> +       if (evsel->merged_stat)
> +               return;
> +
> +       if (evsel->auto_merge_stats || evsel__should_merge_hybrid(evsel, config))
> +               evsel__merge_aliases(evsel);
> +}
> +
> +/* merge the same uncore and hybrid events if requested */
> +void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *evlist)
> +{
> +       struct evsel *evsel;
> +
> +       if (config->no_merge)
> +               return;
> +
> +       evlist__for_each_entry(evlist, evsel)
> +               evsel__merge_stats(evsel, config);
> +}
> +
>  int perf_event__process_stat_event(struct perf_session *session,
>                                    union perf_event *event)
>  {
> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> index 3a876ad2870b..12cc60ab04e4 100644
> --- a/tools/perf/util/stat.h
> +++ b/tools/perf/util/stat.h
> @@ -270,6 +270,8 @@ void evlist__reset_aggr_stats(struct evlist *evlist);
>
>  int perf_stat_process_counter(struct perf_stat_config *config,
>                               struct evsel *counter);
> +void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *evlist);
> +
>  struct perf_tool;
>  union perf_event;
>  struct perf_session;
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>
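
For readers following the question about what a "merged" counter is, here
is a minimal sketch of the idea in the commit message above.  It is not
code from the patch: struct toy_counts, struct toy_evsel and the toy_*
functions are hypothetical stand-ins, the alias test is reduced to "same
name, different PMU" (the patch also compares scale, unit, cgroup and
clock-ness), each event has a single aggr slot, and the auto_merge_stats /
hybrid_merge / no_merge gating is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct perf_counts_values. */
struct toy_counts {
	unsigned long long val, ena, run;
};

/* Hypothetical stand-in for struct evsel with a single aggr slot. */
struct toy_evsel {
	const char *name;       /* event name shown to the user */
	const char *pmu_name;   /* PMU that provides this instance */
	struct toy_counts counts;
	bool merged_stat;       /* true once folded into an earlier entry */
};

/* Same name, different PMU: the reduced alias test used in this sketch. */
static bool toy_is_alias(const struct toy_evsel *a, const struct toy_evsel *b)
{
	return strcmp(a->name, b->name) == 0 &&
	       strcmp(a->pmu_name, b->pmu_name) != 0;
}

/* Fold every later alias into the first entry of its kind. */
static void toy_merge_counters(struct toy_evsel *evs, size_t nr)
{
	assert(evs != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (evs[i].merged_stat)
			continue;	/* already folded into an earlier entry */

		for (size_t j = i + 1; j < nr; j++) {
			if (!toy_is_alias(&evs[i], &evs[j]))
				continue;

			evs[i].counts.val += evs[j].counts.val;
			evs[i].counts.ena += evs[j].counts.ena;
			evs[i].counts.run += evs[j].counts.run;
			evs[j].merged_stat = true;
		}
	}
}
```

In this model the first instance of an event keeps the summed counts, and
the later instances are only flagged so that a display pass can skip them.
That mirrors how the patch sets alias->merged_stat and leaves aggr.nr
untouched for aliases.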


* Re: [PATCH 15/19] perf stat: Add perf_stat_process_percore()
  2022-10-10  5:35 ` [PATCH 15/19] perf stat: Add perf_stat_process_percore() Namhyung Kim
@ 2022-10-10 23:32   ` Ian Rogers
  2022-10-11 23:59     ` Namhyung Kim
  0 siblings, 1 reply; 63+ messages in thread
From: Ian Rogers @ 2022-10-10 23:32 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> perf_stat_process_percore() aggregates counts for an event per core
> even if the aggr_mode is AGGR_NONE.  This is enabled when the user
> requests it on the command line.

Is there an example command line for this? It would be nice to add as a test.

Thanks,
Ian

> To handle that, it keeps the per-cpu counts at first.  Then it aggregates
> the counts that have the same core id in aggr->counts and writes the
> aggregated values back for each cpu.
>
> Later, per-core events will skip one of the CPUs unless the
> percore-show-thread option is given.  In that case, it can simply print
> all cpu stats with the updated (per-core) values.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-stat.c |  1 +
>  tools/perf/util/stat.c    | 71 +++++++++++++++++++++++++++++++++++++++
>  tools/perf/util/stat.h    |  2 ++
>  3 files changed, 74 insertions(+)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index f90e8f29cb23..c127e784a7be 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -488,6 +488,7 @@ static void process_counters(void)
>         }
>
>         perf_stat_merge_counters(&stat_config, evsel_list);
> +       perf_stat_process_percore(&stat_config, evsel_list);
>  }
>
>  static void process_interval(void)
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 1bb197782a34..d788d0e85204 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -671,6 +671,77 @@ void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *ev
>                 evsel__merge_stats(evsel, config);
>  }
>
> +static void evsel__update_percore_stats(struct evsel *evsel, struct aggr_cpu_id *core_id)
> +{
> +       struct perf_stat_evsel *ps = evsel->stats;
> +       struct perf_counts_values counts = { 0, };
> +       struct aggr_cpu_id id;
> +       struct perf_cpu cpu;
> +       int idx;
> +
> +       /* collect per-core counts */
> +       perf_cpu_map__for_each_cpu(cpu, idx, evsel->core.cpus) {
> +               struct perf_stat_aggr *aggr = &ps->aggr[idx];
> +
> +               id = aggr_cpu_id__core(cpu, NULL);
> +               if (!aggr_cpu_id__equal(core_id, &id))
> +                       continue;
> +
> +               counts.val += aggr->counts.val;
> +               counts.ena += aggr->counts.ena;
> +               counts.run += aggr->counts.run;
> +       }
> +
> +       /* update aggregated per-core counts for each CPU */
> +       perf_cpu_map__for_each_cpu(cpu, idx, evsel->core.cpus) {
> +               struct perf_stat_aggr *aggr = &ps->aggr[idx];
> +
> +               id = aggr_cpu_id__core(cpu, NULL);
> +               if (!aggr_cpu_id__equal(core_id, &id))
> +                       continue;
> +
> +               aggr->counts.val = counts.val;
> +               aggr->counts.ena = counts.ena;
> +               aggr->counts.run = counts.run;
> +
> +               aggr->used = true;
> +       }
> +}
> +
> +/* we have an aggr_map for cpu, but want to aggregate the counters per-core */
> +static void evsel__process_percore(struct evsel *evsel)
> +{
> +       struct perf_stat_evsel *ps = evsel->stats;
> +       struct aggr_cpu_id core_id;
> +       struct perf_cpu cpu;
> +       int idx;
> +
> +       if (!evsel->percore)
> +               return;
> +
> +       perf_cpu_map__for_each_cpu(cpu, idx, evsel->core.cpus) {
> +               struct perf_stat_aggr *aggr = &ps->aggr[idx];
> +
> +               if (aggr->used)
> +                       continue;
> +
> +               core_id = aggr_cpu_id__core(cpu, NULL);
> +               evsel__update_percore_stats(evsel, &core_id);
> +       }
> +}
> +
> +/* process cpu stats on per-core events */
> +void perf_stat_process_percore(struct perf_stat_config *config, struct evlist *evlist)
> +{
> +       struct evsel *evsel;
> +
> +       if (config->aggr_mode != AGGR_NONE)
> +               return;
> +
> +       evlist__for_each_entry(evlist, evsel)
> +               evsel__process_percore(evsel);
> +}
> +
>  int perf_event__process_stat_event(struct perf_session *session,
>                                    union perf_event *event)
>  {
> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> index 12cc60ab04e4..ac85ed46aa59 100644
> --- a/tools/perf/util/stat.h
> +++ b/tools/perf/util/stat.h
> @@ -46,6 +46,7 @@ enum perf_stat_evsel_id {
>  struct perf_stat_aggr {
>         struct perf_counts_values       counts;
>         int                             nr;
> +       bool                            used;
>         bool                            failed;
>  };
>
> @@ -271,6 +272,7 @@ void evlist__reset_aggr_stats(struct evlist *evlist);
>  int perf_stat_process_counter(struct perf_stat_config *config,
>                               struct evsel *counter);
>  void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *evlist);
> +void perf_stat_process_percore(struct perf_stat_config *config, struct evlist *evlist);
>
>  struct perf_tool;
>  union perf_event;
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>
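
The two-pass scheme described in the commit message (sum the per-cpu
counts that share a core id, then write the sum back to every cpu of that
core) can be sketched as below.  This is an illustration under stated
assumptions, not the patch itself: struct toy_cpu_stat and the toy_*
functions are hypothetical, the core id is a plain int rather than a
struct aggr_cpu_id, and only the val field is tracked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cpu slot, loosely modelled on struct perf_stat_aggr. */
struct toy_cpu_stat {
	int core_id;             /* core this cpu belongs to (assumed known) */
	unsigned long long val;  /* per-cpu count, later the per-core sum */
	bool used;               /* true once the per-core sum is stored */
};

/* Sum all cpus of one core, then store that sum in each of them. */
static void toy_update_percore(struct toy_cpu_stat *s, size_t nr, int core_id)
{
	unsigned long long sum = 0;

	/* First pass: collect the per-core count. */
	for (size_t i = 0; i < nr; i++) {
		if (s[i].core_id == core_id)
			sum += s[i].val;
	}

	/* Second pass: write the aggregated value back to each cpu. */
	for (size_t i = 0; i < nr; i++) {
		if (s[i].core_id != core_id)
			continue;
		s[i].val = sum;
		s[i].used = true;
	}
}

/* Visit each core once; cpus already marked 'used' are skipped. */
static void toy_process_percore(struct toy_cpu_stat *s, size_t nr)
{
	assert(s != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (s[i].used)
			continue;
		toy_update_percore(s, nr, s[i].core_id);
	}
}
```

The 'used' flag is what keeps a core from being summed twice: once the
first cpu of a core has been processed, its siblings already hold the
per-core value and are skipped.  A display pass can then print one cpu per
core, or all of them, without further arithmetic.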


* Re: [PATCH 16/19] perf stat: Add perf_stat_process_shadow_stats()
  2022-10-10  5:35 ` [PATCH 16/19] perf stat: Add perf_stat_process_shadow_stats() Namhyung Kim
@ 2022-10-10 23:36   ` Ian Rogers
  0 siblings, 0 replies; 63+ messages in thread
From: Ian Rogers @ 2022-10-10 23:36 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> This function updates the shadow stats using the aggregated counts
> uniformly, since it uses the aggr_counts for every aggr mode.
>
> It'd have duplicate shadow stats for each item for now, since the
> display routines will update them once again.  But that'd be fine
> as it shows the average values and it'd be gone eventually.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/builtin-stat.c |  1 +
>  tools/perf/util/stat.c    | 50 ++++++++++++++++++++-------------------
>  tools/perf/util/stat.h    |  1 +
>  3 files changed, 28 insertions(+), 24 deletions(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index c127e784a7be..d92815f4eae0 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -489,6 +489,7 @@ static void process_counters(void)
>
>         perf_stat_merge_counters(&stat_config, evsel_list);
>         perf_stat_process_percore(&stat_config, evsel_list);
> +       perf_stat_process_shadow_stats(&stat_config, evsel_list);
>  }
>
>  static void process_interval(void)
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index d788d0e85204..f2a3761dacff 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -454,7 +454,7 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
>                 aggr_counts->val += count->val;
>                 aggr_counts->ena += count->ena;
>                 aggr_counts->run += count->run;
> -               goto update;
> +               return 0;
>         }
>
>         if (ps->aggr) {
> @@ -491,32 +491,10 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
>                 }
>         }
>
> -update:
> -       switch (config->aggr_mode) {
> -       case AGGR_THREAD:
> -       case AGGR_CORE:
> -       case AGGR_DIE:
> -       case AGGR_SOCKET:
> -       case AGGR_NODE:
> -       case AGGR_NONE:
> -               if ((config->aggr_mode == AGGR_NONE) && (!evsel->percore)) {
> -                       perf_stat__update_shadow_stats(evsel, count->val,
> -                                                      cpu_map_idx, &rt_stat);
> -               }
> -
> -               if (config->aggr_mode == AGGR_THREAD) {
> -                       perf_stat__update_shadow_stats(evsel, count->val,
> -                                                      thread, &rt_stat);
> -               }
> -               break;
> -       case AGGR_GLOBAL:
> +       if (config->aggr_mode == AGGR_GLOBAL) {
>                 aggr->val += count->val;
>                 aggr->ena += count->ena;
>                 aggr->run += count->run;
> -       case AGGR_UNSET:
> -       case AGGR_MAX:
> -       default:
> -               break;
>         }
>
>         return 0;
> @@ -742,6 +720,30 @@ void perf_stat_process_percore(struct perf_stat_config *config, struct evlist *e
>                 evsel__process_percore(evsel);
>  }
>
> +static void evsel__update_shadow_stats(struct evsel *evsel)
> +{
> +       struct perf_stat_evsel *ps = evsel->stats;
> +       int i;
> +
> +       if (ps->aggr == NULL)
> +               return;
> +
> +       for (i = 0; i < ps->nr_aggr; i++) {
> +               struct perf_counts_values *aggr_counts = &ps->aggr[i].counts;
> +
> +               perf_stat__update_shadow_stats(evsel, aggr_counts->val, i, &rt_stat);
> +       }
> +}
> +
> +void perf_stat_process_shadow_stats(struct perf_stat_config *config __maybe_unused,
> +                                   struct evlist *evlist)
> +{
> +       struct evsel *evsel;
> +
> +       evlist__for_each_entry(evlist, evsel)
> +               evsel__update_shadow_stats(evsel);
> +}
> +
>  int perf_event__process_stat_event(struct perf_session *session,
>                                    union perf_event *event)
>  {
> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> index ac85ed46aa59..e51214918c7f 100644
> --- a/tools/perf/util/stat.h
> +++ b/tools/perf/util/stat.h
> @@ -273,6 +273,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
>                               struct evsel *counter);
>  void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *evlist);
>  void perf_stat_process_percore(struct perf_stat_config *config, struct evlist *evlist);
> +void perf_stat_process_shadow_stats(struct perf_stat_config *config, struct evlist *evlist);
>
>  struct perf_tool;
>  union perf_event;
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>


* Re: [PATCH 17/19] perf stat: Display event stats using aggr counts
  2022-10-10  5:35 ` [PATCH 17/19] perf stat: Display event stats using aggr counts Namhyung Kim
@ 2022-10-10 23:38   ` Ian Rogers
  0 siblings, 0 replies; 63+ messages in thread
From: Ian Rogers @ 2022-10-10 23:38 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Now aggr counts are ready for use.  Convert the display routines to use
> the aggr counts and update the shadow stat with them.  It doesn't need
> to aggregate counts or collect aliases anymore during the display.  Get
> rid of now unused struct perf_aggr_thread_value.
>
> Note that there's a difference in the display order among the aggr modes.
> For per-core/die/socket/node aggregation, it shows relevant events in
> the same unit together, whereas for global/thread/no aggregation it shows
> the same events for different units together.  So it still uses separate
> code paths to display them due to the ordering.
>
> One more thing to note is that it breaks per-core event display for now.
> The next patch will fix it so that the output is identical to the current one.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/util/stat-display.c | 428 +++++----------------------------
>  tools/perf/util/stat.c         |   5 -
>  tools/perf/util/stat.h         |   9 -
>  3 files changed, 55 insertions(+), 387 deletions(-)

Nice!

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

>
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index 1d8e585df4ad..0c0e22c175a1 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -442,31 +442,6 @@ static void print_metric_header(struct perf_stat_config *config,
>                 fprintf(os->fh, "%*s ", config->metric_only_len, unit);
>  }
>
> -static int first_shadow_map_idx(struct perf_stat_config *config,
> -                               struct evsel *evsel, const struct aggr_cpu_id *id)
> -{
> -       struct perf_cpu_map *cpus = evsel__cpus(evsel);
> -       struct perf_cpu cpu;
> -       int idx;
> -
> -       if (config->aggr_mode == AGGR_NONE)
> -               return perf_cpu_map__idx(cpus, id->cpu);
> -
> -       if (config->aggr_mode == AGGR_THREAD)
> -               return id->thread_idx;
> -
> -       if (!config->aggr_get_id)
> -               return 0;
> -
> -       perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> -               struct aggr_cpu_id cpu_id = config->aggr_get_id(config, cpu);
> -
> -               if (aggr_cpu_id__equal(&cpu_id, id))
> -                       return idx;
> -       }
> -       return 0;
> -}
> -
>  static void abs_printout(struct perf_stat_config *config,
>                          struct aggr_cpu_id id, int nr, struct evsel *evsel, double avg)
>  {
> @@ -537,7 +512,7 @@ static bool is_mixed_hw_group(struct evsel *counter)
>  static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int nr,
>                      struct evsel *counter, double uval,
>                      char *prefix, u64 run, u64 ena, double noise,
> -                    struct runtime_stat *st)
> +                    struct runtime_stat *st, int map_idx)
>  {
>         struct perf_stat_output_ctx out;
>         struct outstate os = {
> @@ -648,8 +623,7 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int
>                 print_running(config, run, ena);
>         }
>
> -       perf_stat__print_shadow_stats(config, counter, uval,
> -                               first_shadow_map_idx(config, counter, &id),
> +       perf_stat__print_shadow_stats(config, counter, uval, map_idx,
>                                 &out, &config->metric_events, st);
>         if (!config->csv_output && !config->metric_only && !config->json_output) {
>                 print_noise(config, counter, noise);
> @@ -657,34 +631,6 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int
>         }
>  }
>
> -static void aggr_update_shadow(struct perf_stat_config *config,
> -                              struct evlist *evlist)
> -{
> -       int idx, s;
> -       struct perf_cpu cpu;
> -       struct aggr_cpu_id s2, id;
> -       u64 val;
> -       struct evsel *counter;
> -       struct perf_cpu_map *cpus;
> -
> -       for (s = 0; s < config->aggr_map->nr; s++) {
> -               id = config->aggr_map->map[s];
> -               evlist__for_each_entry(evlist, counter) {
> -                       cpus = evsel__cpus(counter);
> -                       val = 0;
> -                       perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> -                               s2 = config->aggr_get_id(config, cpu);
> -                               if (!aggr_cpu_id__equal(&s2, &id))
> -                                       continue;
> -                               val += perf_counts(counter->counts, idx, 0)->val;
> -                       }
> -                       perf_stat__update_shadow_stats(counter, val,
> -                                       first_shadow_map_idx(config, counter, &id),
> -                                       &rt_stat);
> -               }
> -       }
> -}
> -
>  static void uniquify_event_name(struct evsel *counter)
>  {
>         char *new_name;
> @@ -721,137 +667,51 @@ static void uniquify_event_name(struct evsel *counter)
>         counter->uniquified_name = true;
>  }
>
> -static void collect_all_aliases(struct perf_stat_config *config, struct evsel *counter,
> -                           void (*cb)(struct perf_stat_config *config, struct evsel *counter, void *data,
> -                                      bool first),
> -                           void *data)
> -{
> -       struct evlist *evlist = counter->evlist;
> -       struct evsel *alias;
> -
> -       alias = list_prepare_entry(counter, &(evlist->core.entries), core.node);
> -       list_for_each_entry_continue (alias, &evlist->core.entries, core.node) {
> -               /* Merge events with the same name, etc. but on different PMUs. */
> -               if (!strcmp(evsel__name(alias), evsel__name(counter)) &&
> -                       alias->scale == counter->scale &&
> -                       alias->cgrp == counter->cgrp &&
> -                       !strcmp(alias->unit, counter->unit) &&
> -                       evsel__is_clock(alias) == evsel__is_clock(counter) &&
> -                       strcmp(alias->pmu_name, counter->pmu_name)) {
> -                       alias->merged_stat = true;
> -                       cb(config, alias, data, false);
> -               }
> -       }
> -}
> -
> -static bool hybrid_merge(struct evsel *counter, struct perf_stat_config *config,
> -                        bool check)
> +static bool hybrid_uniquify(struct evsel *evsel, struct perf_stat_config *config)
>  {
> -       if (evsel__is_hybrid(counter)) {
> -               if (check)
> -                       return config->hybrid_merge;
> -               else
> -                       return !config->hybrid_merge;
> -       }
> -
> -       return false;
> +       return evsel__is_hybrid(evsel) && !config->hybrid_merge;
>  }
>
> -static bool collect_data(struct perf_stat_config *config, struct evsel *counter,
> -                           void (*cb)(struct perf_stat_config *config, struct evsel *counter, void *data,
> -                                      bool first),
> -                           void *data)
> +static void uniquify_counter(struct perf_stat_config *config, struct evsel *counter)
>  {
> -       if (counter->merged_stat)
> -               return false;
> -       cb(config, counter, data, true);
> -       if (config->no_merge || hybrid_merge(counter, config, false))
> +       if (config->no_merge || hybrid_uniquify(counter, config))
>                 uniquify_event_name(counter);
> -       else if (counter->auto_merge_stats || hybrid_merge(counter, config, true))
> -               collect_all_aliases(config, counter, cb, data);
> -       return true;
> -}
> -
> -struct aggr_data {
> -       u64 ena, run, val;
> -       struct aggr_cpu_id id;
> -       int nr;
> -       int cpu_map_idx;
> -};
> -
> -static void aggr_cb(struct perf_stat_config *config,
> -                   struct evsel *counter, void *data, bool first)
> -{
> -       struct aggr_data *ad = data;
> -       int idx;
> -       struct perf_cpu cpu;
> -       struct perf_cpu_map *cpus;
> -       struct aggr_cpu_id s2;
> -
> -       cpus = evsel__cpus(counter);
> -       perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> -               struct perf_counts_values *counts;
> -
> -               s2 = config->aggr_get_id(config, cpu);
> -               if (!aggr_cpu_id__equal(&s2, &ad->id))
> -                       continue;
> -               if (first)
> -                       ad->nr++;
> -               counts = perf_counts(counter->counts, idx, 0);
> -               /*
> -                * When any result is bad, make them all to give
> -                * consistent output in interval mode.
> -                */
> -               if (counts->ena == 0 || counts->run == 0 ||
> -                   counter->counts->scaled == -1) {
> -                       ad->ena = 0;
> -                       ad->run = 0;
> -                       break;
> -               }
> -               ad->val += counts->val;
> -               ad->ena += counts->ena;
> -               ad->run += counts->run;
> -       }
>  }
>
>  static void print_counter_aggrdata(struct perf_stat_config *config,
>                                    struct evsel *counter, int s,
>                                    char *prefix, bool metric_only,
> -                                  bool *first, struct perf_cpu cpu)
> +                                  bool *first)
>  {
> -       struct aggr_data ad;
>         FILE *output = config->output;
>         u64 ena, run, val;
> -       int nr;
> -       struct aggr_cpu_id id;
>         double uval;
> +       struct perf_stat_evsel *ps = counter->stats;
> +       struct perf_stat_aggr *aggr = &ps->aggr[s];
> +       struct aggr_cpu_id id = config->aggr_map->map[s];
> +       double avg = aggr->counts.val;
>
> -       ad.id = id = config->aggr_map->map[s];
> -       ad.val = ad.ena = ad.run = 0;
> -       ad.nr = 0;
> -       if (!collect_data(config, counter, aggr_cb, &ad))
> +       if (aggr->nr == 0)
>                 return;
>
> -       if (perf_pmu__has_hybrid() && ad.ena == 0)
> -               return;
> +       uniquify_counter(config, counter);
> +
> +       val = aggr->counts.val;
> +       ena = aggr->counts.ena;
> +       run = aggr->counts.run;
>
> -       nr = ad.nr;
> -       ena = ad.ena;
> -       run = ad.run;
> -       val = ad.val;
>         if (*first && metric_only) {
>                 *first = false;
> -               aggr_printout(config, counter, id, nr);
> +               aggr_printout(config, counter, id, aggr->nr);
>         }
>         if (prefix && !metric_only)
>                 fprintf(output, "%s", prefix);
>
>         uval = val * counter->scale;
> -       if (cpu.cpu != -1)
> -               id = aggr_cpu_id__cpu(cpu, /*data=*/NULL);
>
> -       printout(config, id, nr, counter, uval,
> -                prefix, run, ena, 1.0, &rt_stat);
> +       printout(config, id, aggr->nr, counter, uval,
> +                prefix, run, ena, avg, &rt_stat, s);
> +
>         if (!metric_only)
>                 fputc('\n', output);
>  }
> @@ -869,8 +729,6 @@ static void print_aggr(struct perf_stat_config *config,
>         if (!config->aggr_map || !config->aggr_get_id)
>                 return;
>
> -       aggr_update_shadow(config, evlist);
> -
>         /*
>          * With metric_only everything is on a single line.
>          * Without each counter has its own line.
> @@ -881,188 +739,39 @@ static void print_aggr(struct perf_stat_config *config,
>
>                 first = true;
>                 evlist__for_each_entry(evlist, counter) {
> +                       if (counter->merged_stat)
> +                               continue;
> +
>                         print_counter_aggrdata(config, counter, s,
> -                                       prefix, metric_only,
> -                                       &first, (struct perf_cpu){ .cpu = -1 });
> +                                              prefix, metric_only,
> +                                              &first);
>                 }
>                 if (metric_only)
>                         fputc('\n', output);
>         }
>  }
>
> -static int cmp_val(const void *a, const void *b)
> -{
> -       return ((struct perf_aggr_thread_value *)b)->val -
> -               ((struct perf_aggr_thread_value *)a)->val;
> -}
> -
> -static struct perf_aggr_thread_value *sort_aggr_thread(
> -                                       struct evsel *counter,
> -                                       int *ret,
> -                                       struct target *_target)
> -{
> -       int nthreads = perf_thread_map__nr(counter->core.threads);
> -       int i = 0;
> -       double uval;
> -       struct perf_aggr_thread_value *buf;
> -
> -       buf = calloc(nthreads, sizeof(struct perf_aggr_thread_value));
> -       if (!buf)
> -               return NULL;
> -
> -       for (int thread = 0; thread < nthreads; thread++) {
> -               int idx;
> -               u64 ena = 0, run = 0, val = 0;
> -
> -               perf_cpu_map__for_each_idx(idx, evsel__cpus(counter)) {
> -                       struct perf_counts_values *counts =
> -                               perf_counts(counter->counts, idx, thread);
> -
> -                       val += counts->val;
> -                       ena += counts->ena;
> -                       run += counts->run;
> -               }
> -
> -               uval = val * counter->scale;
> -
> -               /*
> -                * Skip value 0 when enabling --per-thread globally,
> -                * otherwise too many 0 output.
> -                */
> -               if (uval == 0.0 && target__has_per_thread(_target))
> -                       continue;
> -
> -               buf[i].counter = counter;
> -               buf[i].id = aggr_cpu_id__empty();
> -               buf[i].id.thread_idx = thread;
> -               buf[i].uval = uval;
> -               buf[i].val = val;
> -               buf[i].run = run;
> -               buf[i].ena = ena;
> -               i++;
> -       }
> -
> -       qsort(buf, i, sizeof(struct perf_aggr_thread_value), cmp_val);
> -
> -       if (ret)
> -               *ret = i;
> -
> -       return buf;
> -}
> -
> -static void print_aggr_thread(struct perf_stat_config *config,
> -                             struct target *_target,
> -                             struct evsel *counter, char *prefix)
> -{
> -       FILE *output = config->output;
> -       int thread, sorted_threads;
> -       struct aggr_cpu_id id;
> -       struct perf_aggr_thread_value *buf;
> -
> -       buf = sort_aggr_thread(counter, &sorted_threads, _target);
> -       if (!buf) {
> -               perror("cannot sort aggr thread");
> -               return;
> -       }
> -
> -       for (thread = 0; thread < sorted_threads; thread++) {
> -               if (prefix)
> -                       fprintf(output, "%s", prefix);
> -
> -               id = buf[thread].id;
> -               printout(config, id, 0, buf[thread].counter, buf[thread].uval,
> -                        prefix, buf[thread].run, buf[thread].ena, 1.0,
> -                        &rt_stat);
> -               fputc('\n', output);
> -       }
> -
> -       free(buf);
> -}
> -
> -struct caggr_data {
> -       double avg, avg_enabled, avg_running;
> -};
> -
> -static void counter_aggr_cb(struct perf_stat_config *config __maybe_unused,
> -                           struct evsel *counter, void *data,
> -                           bool first __maybe_unused)
> -{
> -       struct caggr_data *cd = data;
> -       struct perf_counts_values *aggr = &counter->counts->aggr;
> -
> -       cd->avg += aggr->val;
> -       cd->avg_enabled += aggr->ena;
> -       cd->avg_running += aggr->run;
> -}
> -
> -/*
> - * Print out the results of a single counter:
> - * aggregated counts in system-wide mode
> - */
> -static void print_counter_aggr(struct perf_stat_config *config,
> -                              struct evsel *counter, char *prefix)
> -{
> -       bool metric_only = config->metric_only;
> -       FILE *output = config->output;
> -       double uval;
> -       struct caggr_data cd = { .avg = 0.0 };
> -
> -       if (!collect_data(config, counter, counter_aggr_cb, &cd))
> -               return;
> -
> -       if (prefix && !metric_only)
> -               fprintf(output, "%s", prefix);
> -
> -       uval = cd.avg * counter->scale;
> -       printout(config, aggr_cpu_id__empty(), 0, counter, uval, prefix, cd.avg_running,
> -                cd.avg_enabled, cd.avg, &rt_stat);
> -       if (!metric_only)
> -               fprintf(output, "\n");
> -}
> -
> -static void counter_cb(struct perf_stat_config *config __maybe_unused,
> -                      struct evsel *counter, void *data,
> -                      bool first __maybe_unused)
> -{
> -       struct aggr_data *ad = data;
> -
> -       ad->val += perf_counts(counter->counts, ad->cpu_map_idx, 0)->val;
> -       ad->ena += perf_counts(counter->counts, ad->cpu_map_idx, 0)->ena;
> -       ad->run += perf_counts(counter->counts, ad->cpu_map_idx, 0)->run;
> -}
> -
> -/*
> - * Print out the results of a single counter:
> - * does not use aggregated count in system-wide
> - */
>  static void print_counter(struct perf_stat_config *config,
>                           struct evsel *counter, char *prefix)
>  {
> +       bool metric_only = config->metric_only;
>         FILE *output = config->output;
> -       u64 ena, run, val;
> -       double uval;
> -       int idx;
> -       struct perf_cpu cpu;
> -       struct aggr_cpu_id id;
> -
> -       perf_cpu_map__for_each_cpu(cpu, idx, evsel__cpus(counter)) {
> -               struct aggr_data ad = { .cpu_map_idx = idx };
> -
> -               if (!collect_data(config, counter, counter_cb, &ad))
> -                       return;
> -               val = ad.val;
> -               ena = ad.ena;
> -               run = ad.run;
> +       bool first = false;
> +       int s;
>
> -               if (prefix)
> -                       fprintf(output, "%s", prefix);
> +       /* AGGR_THREAD doesn't have config->aggr_get_id */
> +       if (!config->aggr_map)
> +               return;
>
> -               uval = val * counter->scale;
> -               id = aggr_cpu_id__cpu(cpu, /*data=*/NULL);
> -               printout(config, id, 0, counter, uval, prefix,
> -                        run, ena, 1.0, &rt_stat);
> +       if (counter->merged_stat)
> +               return;
>
> -               fputc('\n', output);
> +       for (s = 0; s < config->aggr_map->nr; s++) {
> +               print_counter_aggrdata(config, counter, s,
> +                                      prefix, metric_only,
> +                                      &first);
> +               if (metric_only)
> +                       fputc('\n', output);
>         }
>  }
>
> @@ -1081,6 +790,7 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
>                         u64 ena, run, val;
>                         double uval;
>                         struct aggr_cpu_id id;
> +                       struct perf_stat_evsel *ps = counter->stats;
>                         int counter_idx = perf_cpu_map__idx(evsel__cpus(counter), cpu);
>
>                         if (counter_idx < 0)
> @@ -1093,13 +803,13 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
>                                 aggr_printout(config, counter, id, 0);
>                                 first = false;
>                         }
> -                       val = perf_counts(counter->counts, counter_idx, 0)->val;
> -                       ena = perf_counts(counter->counts, counter_idx, 0)->ena;
> -                       run = perf_counts(counter->counts, counter_idx, 0)->run;
> +                       val = ps->aggr[counter_idx].counts.val;
> +                       ena = ps->aggr[counter_idx].counts.ena;
> +                       run = ps->aggr[counter_idx].counts.run;
>
>                         uval = val * counter->scale;
>                         printout(config, id, 0, counter, uval, prefix,
> -                                run, ena, 1.0, &rt_stat);
> +                                run, ena, 1.0, &rt_stat, counter_idx);
>                 }
>                 if (!first)
>                         fputc('\n', config->output);
> @@ -1135,8 +845,8 @@ static void print_metric_headers(struct perf_stat_config *config,
>         };
>         bool first = true;
>
> -               if (config->json_output && !config->interval)
> -                       fprintf(config->output, "{");
> +       if (config->json_output && !config->interval)
> +               fprintf(config->output, "{");
>
>         if (prefix && !config->json_output)
>                 fprintf(config->output, "%s", prefix);
> @@ -1379,31 +1089,6 @@ static void print_footer(struct perf_stat_config *config)
>                         "the same PMU. Try reorganizing the group.\n");
>  }
>
> -static void print_percore_thread(struct perf_stat_config *config,
> -                                struct evsel *counter, char *prefix)
> -{
> -       int s;
> -       struct aggr_cpu_id s2, id;
> -       struct perf_cpu_map *cpus;
> -       bool first = true;
> -       int idx;
> -       struct perf_cpu cpu;
> -
> -       cpus = evsel__cpus(counter);
> -       perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> -               s2 = config->aggr_get_id(config, cpu);
> -               for (s = 0; s < config->aggr_map->nr; s++) {
> -                       id = config->aggr_map->map[s];
> -                       if (aggr_cpu_id__equal(&s2, &id))
> -                               break;
> -               }
> -
> -               print_counter_aggrdata(config, counter, s,
> -                                      prefix, false,
> -                                      &first, cpu);
> -       }
> -}
> -
>  static void print_percore(struct perf_stat_config *config,
>                           struct evsel *counter, char *prefix)
>  {
> @@ -1416,15 +1101,14 @@ static void print_percore(struct perf_stat_config *config,
>                 return;
>
>         if (config->percore_show_thread)
> -               return print_percore_thread(config, counter, prefix);
> +               return print_counter(config, counter, prefix);
>
>         for (s = 0; s < config->aggr_map->nr; s++) {
>                 if (prefix && metric_only)
>                         fprintf(output, "%s", prefix);
>
>                 print_counter_aggrdata(config, counter, s,
> -                               prefix, metric_only,
> -                               &first, (struct perf_cpu){ .cpu = -1 });
> +                                      prefix, metric_only, &first);
>         }
>
>         if (metric_only)
> @@ -1469,16 +1153,14 @@ void evlist__print_counters(struct evlist *evlist, struct perf_stat_config *conf
>                 print_aggr(config, evlist, prefix);
>                 break;
>         case AGGR_THREAD:
> -               evlist__for_each_entry(evlist, counter) {
> -                       print_aggr_thread(config, _target, counter, prefix);
> -               }
> -               break;
>         case AGGR_GLOBAL:
> -               if (config->iostat_run)
> +               if (config->iostat_run) {
>                         iostat_print_counters(evlist, config, ts, prefix = buf,
> -                                             print_counter_aggr);
> -               else
> -                       print_aggr(config, evlist, prefix);
> +                                             print_counter);
> +                       break;
> +               }
> +               evlist__for_each_entry(evlist, counter)
> +                       print_counter(config, counter, prefix);
>                 break;
>         case AGGR_NONE:
>                 if (metric_only)
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index f2a3761dacff..1652586a4925 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -545,11 +545,6 @@ int perf_stat_process_counter(struct perf_stat_config *config,
>                         evsel__name(counter), count[0], count[1], count[2]);
>         }
>
> -       /*
> -        * Save the full runtime - to allow normalization during printout:
> -        */
> -       perf_stat__update_shadow_stats(counter, *count, 0, &rt_stat);
> -
>         return 0;
>  }
>
> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> index e51214918c7f..b02d8a4ffabf 100644
> --- a/tools/perf/util/stat.h
> +++ b/tools/perf/util/stat.h
> @@ -213,15 +213,6 @@ static inline void update_rusage_stats(struct rusage_stats *ru_stats, struct rus
>  struct evsel;
>  struct evlist;
>
> -struct perf_aggr_thread_value {
> -       struct evsel *counter;
> -       struct aggr_cpu_id id;
> -       double uval;
> -       u64 val;
> -       u64 run;
> -       u64 ena;
> -};
> -
>  bool __perf_stat_evsel__is(struct evsel *evsel, enum perf_stat_evsel_id id);
>
>  #define perf_stat_evsel__is(evsel, id) \
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>


* Re: [PATCH 18/19] perf stat: Display percore events properly
  2022-10-10  5:35 ` [PATCH 18/19] perf stat: Display percore events properly Namhyung Kim
@ 2022-10-10 23:39   ` Ian Rogers
  0 siblings, 0 replies; 63+ messages in thread
From: Ian Rogers @ 2022-10-10 23:39 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> The recent change in the perf stat broke the percore event display.
> Note that the aggr counts are already processed so that the every
> sibling thread in the same core will get the per-core counter values.

Could we add a test given this has broken once?

> Check percore evsels and skip the sibling threads in the display.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

> ---
>  tools/perf/builtin-stat.c      | 16 ----------------
>  tools/perf/util/stat-display.c | 27 +++++++++++++++++++++++++--
>  2 files changed, 25 insertions(+), 18 deletions(-)
>
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index d92815f4eae0..b3a39d4c86a7 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1403,18 +1403,6 @@ static struct aggr_cpu_id perf_stat__get_cpu_cached(struct perf_stat_config *con
>         return perf_stat__get_aggr(config, perf_stat__get_cpu, cpu);
>  }
>
> -static bool term_percore_set(void)
> -{
> -       struct evsel *counter;
> -
> -       evlist__for_each_entry(evsel_list, counter) {
> -               if (counter->percore)
> -                       return true;
> -       }
> -
> -       return false;
> -}
> -
>  static aggr_cpu_id_get_t aggr_mode__get_aggr(enum aggr_mode aggr_mode)
>  {
>         switch (aggr_mode) {
> @@ -1427,8 +1415,6 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr(enum aggr_mode aggr_mode)
>         case AGGR_NODE:
>                 return aggr_cpu_id__node;
>         case AGGR_NONE:
> -               if (term_percore_set())
> -                       return aggr_cpu_id__core;
>                 return aggr_cpu_id__cpu;;
>         case AGGR_GLOBAL:
>                 return aggr_cpu_id__global;
> @@ -1452,8 +1438,6 @@ static aggr_get_id_t aggr_mode__get_id(enum aggr_mode aggr_mode)
>         case AGGR_NODE:
>                 return perf_stat__get_node_cached;
>         case AGGR_NONE:
> -               if (term_percore_set())
> -                       return perf_stat__get_core_cached;
>                 return perf_stat__get_cpu_cached;
>         case AGGR_GLOBAL:
>                 return perf_stat__get_global_cached;
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index 0c0e22c175a1..e0c0df99d40d 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -1094,7 +1094,8 @@ static void print_percore(struct perf_stat_config *config,
>  {
>         bool metric_only = config->metric_only;
>         FILE *output = config->output;
> -       int s;
> +       struct cpu_aggr_map *core_map;
> +       int s, c, i;
>         bool first = true;
>
>         if (!config->aggr_map || !config->aggr_get_id)
> @@ -1103,13 +1104,35 @@ static void print_percore(struct perf_stat_config *config,
>         if (config->percore_show_thread)
>                 return print_counter(config, counter, prefix);
>
> -       for (s = 0; s < config->aggr_map->nr; s++) {
> +       core_map = cpu_aggr_map__empty_new(config->aggr_map->nr);
> +       if (core_map == NULL) {
> +               fprintf(output, "Cannot allocate per-core aggr map for display\n");
> +               return;
> +       }
> +
> +       for (s = 0, c = 0; s < config->aggr_map->nr; s++) {
> +               struct perf_cpu curr_cpu = config->aggr_map->map[s].cpu;
> +               struct aggr_cpu_id core_id = aggr_cpu_id__core(curr_cpu, NULL);
> +               bool found = false;
> +
> +               for (i = 0; i < c; i++) {
> +                       if (aggr_cpu_id__equal(&core_map->map[i], &core_id)) {
> +                               found = true;
> +                               break;
> +                       }
> +               }
> +               if (found)
> +                       continue;
> +
>                 if (prefix && metric_only)
>                         fprintf(output, "%s", prefix);
>
>                 print_counter_aggrdata(config, counter, s,
>                                        prefix, metric_only, &first);
> +
> +               core_map->map[c++] = core_id;
>         }
> +       free(core_map);
>
>         if (metric_only)
>                 fputc('\n', output);
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>
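
For readers following along, the sibling-skipping loop in the patch above can be reduced to a small standalone sketch. This is only an illustration under stated assumptions: struct core_id, core_id_equal() and first_cpu_per_core() are hypothetical stand-ins, not perf APIs, and the real code works on struct aggr_cpu_id and prints each entry instead of collecting indexes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for perf's struct aggr_cpu_id. */
struct core_id {
	int socket;
	int die;
	int core;
};

static bool core_id_equal(const struct core_id *a, const struct core_id *b)
{
	return a->socket == b->socket && a->die == b->die && a->core == b->core;
}

/*
 * Store in out[] the index of the first entry seen for each distinct core.
 * 'seen' is caller-provided scratch space with room for nr entries.
 * Returns the number of distinct cores found.
 */
static size_t first_cpu_per_core(const struct core_id *ids, size_t nr,
				 struct core_id *seen, size_t *out)
{
	size_t s, i, c = 0;

	assert(ids && seen && out);

	for (s = 0; s < nr; s++) {
		bool found = false;

		for (i = 0; i < c; i++) {
			if (core_id_equal(&seen[i], &ids[s])) {
				found = true;
				break;
			}
		}
		if (found)
			continue;	/* sibling thread: core already handled */

		seen[c] = ids[s];	/* remember the core ... */
		out[c] = s;		/* ... and which entry represents it */
		c++;
	}
	return c;
}
```

The inner scan is quadratic in the number of map entries, as in the patch; that seems acceptable for display code where the map is bounded by the CPU count.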


* Re: [PATCH 19/19] perf stat: Remove unused perf_counts.aggr field
  2022-10-10  5:36 ` [PATCH 19/19] perf stat: Remove unused perf_counts.aggr field Namhyung Kim
@ 2022-10-10 23:40   ` Ian Rogers
  2022-10-12  8:41   ` Jiri Olsa
  1 sibling, 0 replies; 63+ messages in thread
From: Ian Rogers @ 2022-10-10 23:40 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> The aggr field in the struct perf_counts is to keep the aggregated value
> in the AGGR_GLOBAL for the old code.  But it's not used anymore.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/util/counts.c |  1 -
>  tools/perf/util/counts.h |  1 -
>  tools/perf/util/stat.c   | 35 ++---------------------------------
>  3 files changed, 2 insertions(+), 35 deletions(-)

Very nice!

Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Ian

>
> diff --git a/tools/perf/util/counts.c b/tools/perf/util/counts.c
> index 7a447d918458..11cd85b278a6 100644
> --- a/tools/perf/util/counts.c
> +++ b/tools/perf/util/counts.c
> @@ -48,7 +48,6 @@ void perf_counts__reset(struct perf_counts *counts)
>  {
>         xyarray__reset(counts->loaded);
>         xyarray__reset(counts->values);
> -       memset(&counts->aggr, 0, sizeof(struct perf_counts_values));
>  }
>
>  void evsel__reset_counts(struct evsel *evsel)
> diff --git a/tools/perf/util/counts.h b/tools/perf/util/counts.h
> index 5de275194f2b..42760242e0df 100644
> --- a/tools/perf/util/counts.h
> +++ b/tools/perf/util/counts.h
> @@ -11,7 +11,6 @@ struct evsel;
>
>  struct perf_counts {
>         s8                        scaled;
> -       struct perf_counts_values aggr;
>         struct xyarray            *values;
>         struct xyarray            *loaded;
>  };
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 1652586a4925..0dccfa273fa7 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -307,8 +307,6 @@ static void evsel__copy_prev_raw_counts(struct evsel *evsel)
>                                 *perf_counts(evsel->prev_raw_counts, idx, thread);
>                 }
>         }
> -
> -       evsel->counts->aggr = evsel->prev_raw_counts->aggr;
>  }
>
>  void evlist__copy_prev_raw_counts(struct evlist *evlist)
> @@ -319,26 +317,6 @@ void evlist__copy_prev_raw_counts(struct evlist *evlist)
>                 evsel__copy_prev_raw_counts(evsel);
>  }
>
> -void evlist__save_aggr_prev_raw_counts(struct evlist *evlist)
> -{
> -       struct evsel *evsel;
> -
> -       /*
> -        * To collect the overall statistics for interval mode,
> -        * we copy the counts from evsel->prev_raw_counts to
> -        * evsel->counts. The perf_stat_process_counter creates
> -        * aggr values from per cpu values, but the per cpu values
> -        * are 0 for AGGR_GLOBAL. So we use a trick that saves the
> -        * previous aggr value to the first member of perf_counts,
> -        * then aggr calculation in process_counter_values can work
> -        * correctly.
> -        */
> -       evlist__for_each_entry(evlist, evsel) {
> -               *perf_counts(evsel->prev_raw_counts, 0, 0) =
> -                       evsel->prev_raw_counts->aggr;
> -       }
> -}
> -
>  static size_t pkg_id_hash(const void *__key, void *ctx __maybe_unused)
>  {
>         uint64_t *key = (uint64_t *) __key;
> @@ -422,7 +400,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
>                        int cpu_map_idx, int thread,
>                        struct perf_counts_values *count)
>  {
> -       struct perf_counts_values *aggr = &evsel->counts->aggr;
>         struct perf_stat_evsel *ps = evsel->stats;
>         static struct perf_counts_values zero;
>         bool skip = false;
> @@ -491,12 +468,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
>                 }
>         }
>
> -       if (config->aggr_mode == AGGR_GLOBAL) {
> -               aggr->val += count->val;
> -               aggr->ena += count->ena;
> -               aggr->run += count->run;
> -       }
> -
>         return 0;
>  }
>
> @@ -521,13 +492,10 @@ static int process_counter_maps(struct perf_stat_config *config,
>  int perf_stat_process_counter(struct perf_stat_config *config,
>                               struct evsel *counter)
>  {
> -       struct perf_counts_values *aggr = &counter->counts->aggr;
>         struct perf_stat_evsel *ps = counter->stats;
> -       u64 *count = counter->counts->aggr.values;
> +       u64 *count;
>         int ret;
>
> -       aggr->val = aggr->ena = aggr->run = 0;
> -
>         if (counter->per_pkg)
>                 evsel__zero_per_pkg(counter);
>
> @@ -538,6 +506,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
>         if (config->aggr_mode != AGGR_GLOBAL)
>                 return 0;
>
> +       count = ps->aggr[0].counts.values;
>         update_stats(&ps->res_stats, *count);
>
>         if (verbose > 0) {
> --
> 2.38.0.rc1.362.ged0d419d3c-goog
>
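
As a rough sketch of why the last hunk can read the global total through ps->aggr[0].counts.values: the counts structure overlays named fields and an array, so values[0] is the same storage as val. The types below are simplified, hypothetical stand-ins (the real definitions live in libperf and in util/stat.h and carry more fields), and aggr_reset()/aggr_add() are names made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct perf_counts_values (assumed layout). */
struct counts_values {
	union {
		struct {
			uint64_t val;	/* raw counter value */
			uint64_t ena;	/* time enabled */
			uint64_t run;	/* time running */
		};
		uint64_t values[3];	/* same storage, indexed access */
	};
};

/* Simplified stand-in for one struct perf_stat_aggr slot. */
struct stat_aggr {
	struct counts_values counts;
};

/* Clear a slot, e.g. at the start of each interval. */
static void aggr_reset(struct stat_aggr *aggr)
{
	assert(aggr);
	memset(aggr, 0, sizeof(*aggr));
}

/* Fold one per-cpu/per-thread reading into the slot. */
static void aggr_add(struct stat_aggr *aggr, const struct counts_values *count)
{
	assert(aggr && count);
	aggr->counts.val += count->val;
	aggr->counts.ena += count->ena;
	aggr->counts.run += count->run;
}
```

In AGGR_GLOBAL mode there is a single slot, so every reading would be folded into index 0 and values[0] of that slot is the total that update_stats() consumes.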


* Re: [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1)
  2022-10-10  5:35 [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Namhyung Kim
                   ` (18 preceding siblings ...)
  2022-10-10  5:36 ` [PATCH 19/19] perf stat: Remove unused perf_counts.aggr field Namhyung Kim
@ 2022-10-11  0:25 ` Andi Kleen
  2022-10-11  5:38   ` Namhyung Kim
  19 siblings, 1 reply; 63+ messages in thread
From: Andi Kleen @ 2022-10-11  0:25 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Athira Rajeev, James Clark,
	Xing Zhengjun


On 10/10/2022 10:35 PM, Namhyung Kim wrote:
> Hello,
>
> Current perf stat code is somewhat hard to follow since it handles
> many combinations of PMUs/events for given display and aggregation
> options.  This is my attempt to clean it up a little. ;-)


My main concern would be subtle regressions since there are so many
different combinations and ways to travel through the code, and a lot of
things are not covered by unit tests. When I worked on the code it was
difficult to keep it all working. I assume you have some way to
enumerate them all and have tested that the output is identical?

-Andi



* Re: [PATCH 02/19] perf tools: Use pmu info in evsel__is_hybrid()
  2022-10-10 22:31   ` Ian Rogers
@ 2022-10-11  5:10     ` Namhyung Kim
  0 siblings, 0 replies; 63+ messages in thread
From: Namhyung Kim @ 2022-10-11  5:10 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

Hi Ian,

On Mon, Oct 10, 2022 at 3:31 PM Ian Rogers <irogers@google.com> wrote:
>
> On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > If evsel has pmu, it can use pmu->is_hybrid directly.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/util/evsel.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > index 196f8e4859d7..a6ea91c72659 100644
> > --- a/tools/perf/util/evsel.c
> > +++ b/tools/perf/util/evsel.c
> > @@ -3132,6 +3132,9 @@ void evsel__zero_per_pkg(struct evsel *evsel)
> >
> >  bool evsel__is_hybrid(struct evsel *evsel)
> >  {
> > +       if (evsel->pmu)
> > +               return evsel->pmu->is_hybrid;
> > +
> >         return evsel->pmu_name && perf_pmu__is_hybrid(evsel->pmu_name);
>
> Wow, there's so much duplicated state. Why do evsels have a pmu_name
> and a pmu? Why not just pmu->name? I feel always having a pmu would be
> cleanest here.

Thanks a lot Ian for your detailed review!

The evsel->pmu field was added recently for checking missing features.
This series just makes it carry the PMU info when parsing events.

I guess evsel has pmu_name because nobody wanted to add a pmu.c dependency
to the Python module.  But this change only adds a pmu.h dependency.


> That said what does evsel__is_hybrid even mean? Does it
> mean this event is on a PMU normally called cpu and called cpu_core
> and cpu_atom on hybrid systems? And of course there are no comments to
> explain what this little mystery could be.

I believe so.


> Anyway, that's not a fault
> of this change, and probably later changes will go someway toward
> cleaning this up. It was a shame the code wasn't cleaner in the first
> place.
>
> Acked-by: Ian Rogers

Thanks,
Namhyung


* Re: [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1)
  2022-10-11  0:25 ` [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Andi Kleen
@ 2022-10-11  5:38   ` Namhyung Kim
  2022-10-11  6:13     ` Ian Rogers
  2022-10-11 11:57     ` Andi Kleen
  0 siblings, 2 replies; 63+ messages in thread
From: Namhyung Kim @ 2022-10-11  5:38 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Ian Rogers, Adrian Hunter, linux-perf-users, Kan Liang,
	Leo Yan, Athira Rajeev, James Clark, Xing Zhengjun

Hi Andi,

On Mon, Oct 10, 2022 at 5:25 PM Andi Kleen <ak@linux.intel.com> wrote:
>
>
> On 10/10/2022 10:35 PM, Namhyung Kim wrote:
> > Hello,
> >
> > Current perf stat code is somewhat hard to follow since it handles
> > many combinations of PMUs/events for given display and aggregation
> > options.  This is my attempt to clean it up a little. ;-)
>
>
> My main concern would be subtle regressions since there are so many
> different combinations and way to travel through the code, and a lot of
> things are not covered by unit tests. When I worked on the code it was
> difficult to keep it all working. I assume you have some way to
> enumerate them all and tested that the output is identical?

Right, that's my concern too.

I have tested many combinations manually and checked whether they
produced similar results.  But the problem is that I cannot test
all hardware and, more importantly, it's hard to check
programmatically whether the output is the same or not.  The numbers
vary on each run and sometimes fluctuate a lot.  I don't have
good test workloads whose results hold for every combination.

Any suggestions?

Thanks,
Namhyung


* Re: [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1)
  2022-10-11  5:38   ` Namhyung Kim
@ 2022-10-11  6:13     ` Ian Rogers
  2022-10-12  3:55       ` Namhyung Kim
  2022-10-11 11:57     ` Andi Kleen
  1 sibling, 1 reply; 63+ messages in thread
From: Ian Rogers @ 2022-10-11  6:13 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar,
	Peter Zijlstra, LKML, Adrian Hunter, linux-perf-users, Kan Liang,
	Leo Yan, Athira Rajeev, James Clark, Xing Zhengjun

On Mon, Oct 10, 2022 at 10:38 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hi Andi,
>
> On Mon, Oct 10, 2022 at 5:25 PM Andi Kleen <ak@linux.intel.com> wrote:
> >
> >
> > On 10/10/2022 10:35 PM, Namhyung Kim wrote:
> > > Hello,
> > >
> > > Current perf stat code is somewhat hard to follow since it handles
> > > many combinations of PMUs/events for given display and aggregation
> > > options.  This is my attempt to clean it up a little. ;-)
> >
> >
> > My main concern would be subtle regressions since there are so many
> > different combinations and way to travel through the code, and a lot of
> > things are not covered by unit tests. When I worked on the code it was
> > difficult to keep it all working. I assume you have some way to
> > enumerate them all and tested that the output is identical?
>
> Right, that's my concern too.
>
> I have tested many combinations manually and checked if they
> produced similar results.  But the problem is that I cannot test
> all hardwares and more importantly it's hard to check
> programmatically if the output is the same or not.  The numbers
> vary on each run and sometimes it fluctuates a lot.  I don't have
> good test workloads and the results work for every combination.
>
> Any suggestions?

I don't think there is anything clever we can do here. A few releases
ago summary mode was enabled by default. For CSV output this meant a
summary was printed at the bottom of perf stat and importantly the
summary print out added a column on the left of all the other columns.
This caused some tool issues for us. We now have a test that CSV
output has a fixed number of columns. We added the CSV test because
the json output code reformatted the display code and it would be easy
to introduce a regression (in fact I did :-/ ). So my point is that
stat output can change and break things and we've been doing this by
accident for a while now. This isn't a reason to not merge this
change.
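
To make the fixed-column idea concrete, here is a minimal sketch of such a check. It is not the actual perf test; count_fields() and check_csv_columns() are hypothetical helpers, and the sketch assumes plain separator-delimited lines with no quoting and treats lines starting with '#' as comments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count separator-delimited fields in one line (no quoting support). */
static int count_fields(const char *line, char sep)
{
	int n = 1;

	assert(line != NULL);
	for (; *line && *line != '\n'; line++)
		if (*line == sep)
			n++;
	return n;
}

/*
 * Return 0 if every non-empty, non-comment line of 'text' has exactly
 * 'expected' fields, otherwise the 1-based number of the first bad line.
 */
static int check_csv_columns(const char *text, char sep, int expected)
{
	int lineno = 1;

	assert(text != NULL);
	while (*text) {
		const char *eol = strchr(text, '\n');
		size_t len = eol ? (size_t)(eol - text) : strlen(text);

		if (len > 0 && text[0] != '#' &&
		    count_fields(text, sep) != expected)
			return lineno;

		text += len;
		if (*text == '\n')
			text++;
		lineno++;
	}
	return 0;
}
```

A check like this would flag a summary line that grows an extra leading column, since its field count no longer matches the other lines.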

I think the real fix here is for tools to stop using text or CSV
output and switch to the json output, that way output isn't as brittle
except to the keys we use. It isn't feasible for the perf tool to
stand still in case there is a script somewhere, we'll just accumulate
bugs and baggage. However, if someone has a script and they want to
enforce an output, all they need to do is stick a test on it (the
Beyonce principle except s/ring/test/).

Thanks,
Ian


* Re: [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1)
  2022-10-11  5:38   ` Namhyung Kim
  2022-10-11  6:13     ` Ian Rogers
@ 2022-10-11 11:57     ` Andi Kleen
  2022-10-12  3:58       ` Namhyung Kim
  1 sibling, 1 reply; 63+ messages in thread
From: Andi Kleen @ 2022-10-11 11:57 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Ian Rogers, Adrian Hunter, linux-perf-users, Kan Liang,
	Leo Yan, Athira Rajeev, James Clark, Xing Zhengjun


>> My main concern would be subtle regressions since there are so many
>> different combinations and way to travel through the code, and a lot of
>> things are not covered by unit tests. When I worked on the code it was
>> difficult to keep it all working. I assume you have some way to
>> enumerate them all and tested that the output is identical?
> Right, that's my concern too.
>
> I have tested many combinations manually and checked if they
> produced similar results.

I had a script to test many combinations, but had to check the output
manually.


> But the problem is that I cannot test
> all hardwares and more importantly it's hard to check
> programmatically if the output is the same or not.

You can use "dummy" or some software event (e.g. a probe on some syscall) to
get stable numbers. I don't think we need to cover all hardware for the
output options; the different events should be similar, but we need some
coverage for the different aggregation modes. Or we could add some more tool
events just for testing purposes, which would allow covering different
core scopes etc. and would make it easy to generate known counts.

-Andi




* Re: [PATCH 04/19] perf stat: Add aggr id for global mode
  2022-10-10 22:46   ` Ian Rogers
@ 2022-10-11 23:08     ` Namhyung Kim
  0 siblings, 0 replies; 63+ messages in thread
From: Namhyung Kim @ 2022-10-11 23:08 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Mon, Oct 10, 2022 at 3:46 PM Ian Rogers <irogers@google.com> wrote:
>
> On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > To make the code simpler, I'd like to use the same aggregation code for
> > the global mode.  We can simply add an id function to return cpu 0 and
> > use print_aggr().
> >
> > No functional change intended.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/builtin-stat.c      | 39 ++++++++++++++++++++++++++++++++--
> >  tools/perf/util/cpumap.c       | 10 +++++++++
> >  tools/perf/util/cpumap.h       |  6 +++++-
> >  tools/perf/util/stat-display.c |  9 ++------
> >  4 files changed, 54 insertions(+), 10 deletions(-)
> >
> > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > index 265b05157972..144bb3a657f2 100644
> > --- a/tools/perf/builtin-stat.c
> > +++ b/tools/perf/builtin-stat.c
> > @@ -1330,6 +1330,15 @@ static struct aggr_cpu_id perf_stat__get_node(struct perf_stat_config *config __
> >         return aggr_cpu_id__node(cpu, /*data=*/NULL);
> >  }
> >
> > +static struct aggr_cpu_id perf_stat__get_global(struct perf_stat_config *config __maybe_unused,
> > +                                               struct perf_cpu cpu __maybe_unused)
> > +{
> > +       struct aggr_cpu_id id = aggr_cpu_id__empty();
> > +
> > +       id.cpu = (struct perf_cpu){ .cpu = 0 };
> > +       return id;
> > +}
> > +
>
> See below, I think this should just return aggr_cpu_id__global or just
> call that directly.

Ok, will do.

>
> >  static struct aggr_cpu_id perf_stat__get_aggr(struct perf_stat_config *config,
> >                                               aggr_get_id_t get_id, struct perf_cpu cpu)
> >  {
> > @@ -1366,6 +1375,12 @@ static struct aggr_cpu_id perf_stat__get_node_cached(struct perf_stat_config *co
> >         return perf_stat__get_aggr(config, perf_stat__get_node, cpu);
> >  }
> >
> > +static struct aggr_cpu_id perf_stat__get_global_cached(struct perf_stat_config *config,
> > +                                                      struct perf_cpu cpu)
> > +{
> > +       return perf_stat__get_aggr(config, perf_stat__get_global, cpu);
> > +}
> > +
> >  static bool term_percore_set(void)
> >  {
> >         struct evsel *counter;
> > @@ -1395,6 +1410,7 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr(enum aggr_mode aggr_mode)
> >
> >                 return NULL;
> >         case AGGR_GLOBAL:
> > +               return aggr_cpu_id__global;
> >         case AGGR_THREAD:
> >         case AGGR_UNSET:
> >         case AGGR_MAX:
> > @@ -1420,6 +1436,7 @@ static aggr_get_id_t aggr_mode__get_id(enum aggr_mode aggr_mode)
> >                 }
> >                 return NULL;
> >         case AGGR_GLOBAL:
> > +               return perf_stat__get_global_cached;
> >         case AGGR_THREAD:
> >         case AGGR_UNSET:
> >         case AGGR_MAX:
> > @@ -1535,6 +1552,16 @@ static struct aggr_cpu_id perf_env__get_node_aggr_by_cpu(struct perf_cpu cpu, vo
> >         return id;
> >  }
> >
> > +static struct aggr_cpu_id perf_env__get_global_aggr_by_cpu(struct perf_cpu cpu __maybe_unused,
> > +                                                          void *data __maybe_unused)
> > +{
> > +       struct aggr_cpu_id id = aggr_cpu_id__empty();
> > +
> > +       /* it always aggregates to the cpu 0 */
> > +       id.cpu = (struct perf_cpu){ .cpu = 0 };
> > +       return id;
> > +}
> > +
> >  static struct aggr_cpu_id perf_stat__get_socket_file(struct perf_stat_config *config __maybe_unused,
> >                                                      struct perf_cpu cpu)
> >  {
> > @@ -1558,6 +1585,12 @@ static struct aggr_cpu_id perf_stat__get_node_file(struct perf_stat_config *conf
> >         return perf_env__get_node_aggr_by_cpu(cpu, &perf_stat.session->header.env);
> >  }
> >
> > +static struct aggr_cpu_id perf_stat__get_global_file(struct perf_stat_config *config __maybe_unused,
> > +                                                    struct perf_cpu cpu)
> > +{
> > +       return perf_env__get_global_aggr_by_cpu(cpu, &perf_stat.session->header.env);
> > +}
> > +
> >  static aggr_cpu_id_get_t aggr_mode__get_aggr_file(enum aggr_mode aggr_mode)
> >  {
> >         switch (aggr_mode) {
> > @@ -1569,8 +1602,9 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr_file(enum aggr_mode aggr_mode)
> >                 return perf_env__get_core_aggr_by_cpu;
> >         case AGGR_NODE:
> >                 return perf_env__get_node_aggr_by_cpu;
> > -       case AGGR_NONE:
> >         case AGGR_GLOBAL:
> > +               return perf_env__get_global_aggr_by_cpu;
> > +       case AGGR_NONE:
> >         case AGGR_THREAD:
> >         case AGGR_UNSET:
> >         case AGGR_MAX:
> > @@ -1590,8 +1624,9 @@ static aggr_get_id_t aggr_mode__get_id_file(enum aggr_mode aggr_mode)
> >                 return perf_stat__get_core_file;
> >         case AGGR_NODE:
> >                 return perf_stat__get_node_file;
> > -       case AGGR_NONE:
> >         case AGGR_GLOBAL:
> > +               return perf_stat__get_global_file;
> > +       case AGGR_NONE:
> >         case AGGR_THREAD:
> >         case AGGR_UNSET:
> >         case AGGR_MAX:
> > diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
> > index 8486ca3bec75..60209fe87456 100644
> > --- a/tools/perf/util/cpumap.c
> > +++ b/tools/perf/util/cpumap.c
> > @@ -354,6 +354,16 @@ struct aggr_cpu_id aggr_cpu_id__node(struct perf_cpu cpu, void *data __maybe_unu
> >         return id;
> >  }
> >
> > +struct aggr_cpu_id aggr_cpu_id__global(struct perf_cpu cpu, void *data __maybe_unused)
>
> Is this a duplicate of aggr_cpu_id perf_stat__get_global? Could we
> replace all uses of the former with this one?

They are very similar but used for different purposes.
I'll think about how to simplify this code more.

Thanks,
Namhyung

* Re: [PATCH 06/19] perf stat: Add 'needs_sort' argument to cpu_aggr_map__new()
  2022-10-10 22:53   ` Ian Rogers
@ 2022-10-11 23:32     ` Namhyung Kim
  0 siblings, 0 replies; 63+ messages in thread
From: Namhyung Kim @ 2022-10-11 23:32 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Mon, Oct 10, 2022 at 3:53 PM Ian Rogers <irogers@google.com> wrote:
>
> On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > In case of no aggregation, it needs to keep the original (cpu) ordering
> > in the aggr_map so that it can be in sync with the cpu map.  This will
> > make the code easier to handle AGGR_NONE similar to others.
> >
>
> The CPU map is sorted and so sorting the aggr_map should be fine. If
> the data is already sorted then it is O(n) to sort. I think this is
> preferable to having additional complexity around whether the aggr_map
> is sorted.

The problem is that aggr_cpu_id__cmp() only checks socket, die and core,
so sorting would put CPUs in the same core next to each other (like
0, 4, 1, 5, 2, 6, 3, 7) and the aggr_map would no longer be in sync with
the cpu map.
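
A minimal sketch of the ordering issue, not the actual perf code.  The
struct toy_id type, the helper names and the topology (8 CPUs where CPU n
and CPU n + 4 share core n) are assumptions made up for illustration.
Because the comparison never looks at the CPU number, sorting groups the
CPUs by core and the original CPU order is lost:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct aggr_cpu_id. */
struct toy_id {
	int socket;
	int die;
	int core;
	int cpu;	/* not consulted by the comparison below */
};

/* Compares socket, die and core only, as described in the reply above. */
static int toy_id_cmp(const void *a, const void *b)
{
	const struct toy_id *x = a, *y = b;

	if (x->socket != y->socket)
		return x->socket - y->socket;
	if (x->die != y->die)
		return x->die - y->die;
	return x->core - y->core;
}

/* Assumed topology: CPU n and CPU n + nr_cores share core n. */
static void toy_fill(struct toy_id *ids, int nr_cpus, int nr_cores)
{
	assert(nr_cores > 0);

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		ids[cpu] = (struct toy_id){
			.socket = 0,
			.die = 0,
			.core = cpu % nr_cores,
			.cpu = cpu,
		};
	}
}

/* Sort the entries the way a "needs_sort" map would be sorted. */
static void toy_sort(struct toy_id *ids, int nr)
{
	qsort(ids, nr, sizeof(*ids), toy_id_cmp);
}
```

After toy_sort() the two CPUs of each core sit next to each other, so
index i no longer refers to CPU i.  Which of the two siblings comes first
is unspecified here, since qsort() is not guaranteed to be stable.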

Thanks,
Namhyung

* Re: [PATCH 07/19] perf stat: Add struct perf_stat_aggr to perf_stat_evsel
  2022-10-10 23:00   ` Ian Rogers
@ 2022-10-11 23:37     ` Namhyung Kim
  0 siblings, 0 replies; 63+ messages in thread
From: Namhyung Kim @ 2022-10-11 23:37 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Mon, Oct 10, 2022 at 4:01 PM Ian Rogers <irogers@google.com> wrote:
>
> On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > The perf_stat_aggr struct is to keep aggregated counter values and the
> > states according to the aggregation mode.  The number of entries
> > depends on the mode and this is a preparation for the later use.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/util/stat.c | 34 +++++++++++++++++++++++++++-------
> >  tools/perf/util/stat.h |  9 +++++++++
> >  2 files changed, 36 insertions(+), 7 deletions(-)
> >
> > diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> > index 8ec8bb4a9912..c9d5aa295b54 100644
> > --- a/tools/perf/util/stat.c
> > +++ b/tools/perf/util/stat.c
> > @@ -133,15 +133,33 @@ static void perf_stat_evsel_id_init(struct evsel *evsel)
> >  static void evsel__reset_stat_priv(struct evsel *evsel)
> >  {
> >         struct perf_stat_evsel *ps = evsel->stats;
> > +       struct perf_stat_aggr *aggr = ps->aggr;
> >
> >         init_stats(&ps->res_stats);
> > +
> > +       if (aggr)
> > +               memset(aggr, 0, sizeof(*aggr) * ps->nr_aggr);
> >  }
> >
> > -static int evsel__alloc_stat_priv(struct evsel *evsel)
> > +
> > +static int evsel__alloc_stat_priv(struct evsel *evsel, int nr_aggr)
> >  {
> > -       evsel->stats = zalloc(sizeof(struct perf_stat_evsel));
> > -       if (evsel->stats == NULL)
> > +       struct perf_stat_evsel *ps;
> > +
> > +       ps = zalloc(sizeof(*ps));
> > +       if (ps == NULL)
> >                 return -ENOMEM;
> > +
> > +       if (nr_aggr) {
> > +               ps->nr_aggr = nr_aggr;
> > +               ps->aggr = calloc(nr_aggr, sizeof(*ps->aggr));
> > +               if (ps->aggr == NULL) {
> > +                       free(ps);
> > +                       return -ENOMEM;
> > +               }
> > +       }
> > +
> > +       evsel->stats = ps;
> >         perf_stat_evsel_id_init(evsel);
> >         evsel__reset_stat_priv(evsel);
> >         return 0;
> > @@ -151,8 +169,10 @@ static void evsel__free_stat_priv(struct evsel *evsel)
> >  {
> >         struct perf_stat_evsel *ps = evsel->stats;
> >
> > -       if (ps)
> > +       if (ps) {
> > +               zfree(&ps->aggr);
> >                 zfree(&ps->group_data);
> > +       }
> >         zfree(&evsel->stats);
> >  }
> >
> > @@ -181,9 +201,9 @@ static void evsel__reset_prev_raw_counts(struct evsel *evsel)
> >                 perf_counts__reset(evsel->prev_raw_counts);
> >  }
> >
> > -static int evsel__alloc_stats(struct evsel *evsel, bool alloc_raw)
> > +static int evsel__alloc_stats(struct evsel *evsel, int nr_aggr, bool alloc_raw)
> >  {
> > -       if (evsel__alloc_stat_priv(evsel) < 0 ||
> > +       if (evsel__alloc_stat_priv(evsel, nr_aggr) < 0 ||
> >             evsel__alloc_counts(evsel) < 0 ||
> >             (alloc_raw && evsel__alloc_prev_raw_counts(evsel) < 0))
> >                 return -ENOMEM;
> > @@ -196,7 +216,7 @@ int evlist__alloc_stats(struct evlist *evlist, bool alloc_raw)
> >         struct evsel *evsel;
> >
> >         evlist__for_each_entry(evlist, evsel) {
> > -               if (evsel__alloc_stats(evsel, alloc_raw))
> > +               if (evsel__alloc_stats(evsel, 0, alloc_raw))
> >                         goto out_free;
> >         }
> >
> > diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> > index b0899c6e002f..ea356e5aa351 100644
> > --- a/tools/perf/util/stat.h
> > +++ b/tools/perf/util/stat.h
> > @@ -8,6 +8,7 @@
> >  #include <sys/resource.h>
> >  #include "cpumap.h"
> >  #include "rblist.h"
> > +#include "counts.h"
> >
> >  struct perf_cpu_map;
> >  struct perf_stat_config;
> > @@ -42,9 +43,17 @@ enum perf_stat_evsel_id {
> >         PERF_STAT_EVSEL_ID__MAX,
> >  };
> >
>
> The new struct variables below are all worth comments.

Sure, will add.

>
> > +struct perf_stat_aggr {
> > +       struct perf_counts_values       counts;
> > +       int                             nr;
>
> Could this value be derived from counts.values.size ?

Do you mean sizeof() or ARRAY_SIZE() for counts.values?
There's no counts.values.size.

It's a completely different thing.  It counts how many CPUs are
aggregated into the entry (aggregate-number in JSON).
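
To illustrate the difference, here is a rough sketch with simplified
stand-in types (struct toy_counts and struct toy_aggr are hypothetical,
not the real perf structs).  It follows the aggregation logic quoted
later in this thread for patch 09, minus the evsel->counts->scaled check:
'nr' grows by one for every CPU folded into the entry, regardless of the
counter values themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct perf_counts_values. */
struct toy_counts {
	uint64_t val;
	uint64_t ena;
	uint64_t run;
};

/* Simplified stand-in for struct perf_stat_aggr. */
struct toy_aggr {
	struct toy_counts counts;
	int nr;		/* how many CPUs were aggregated into this entry */
	bool failed;	/* sticky: one bad reading zeroes the whole entry */
};

static void toy_aggr_add(struct toy_aggr *aggr, const struct toy_counts *count)
{
	assert(aggr && count);

	/* Every contributing CPU is counted, even a failed one. */
	aggr->nr++;

	/* When any result is bad, make the whole entry bad. */
	if (count->ena == 0 || count->run == 0) {
		aggr->counts = (struct toy_counts){ 0, 0, 0 };
		aggr->failed = true;
	}

	if (!aggr->failed) {
		aggr->counts.val += count->val;
		aggr->counts.ena += count->ena;
		aggr->counts.run += count->run;
	}
}
```

So 'nr' cannot be derived from the size of the counts: the counts hold
three summed values, while 'nr' records how many CPUs contributed.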

Thanks,
Namhyung


>
> > +       bool                            failed;
> > +};
> > +
> >  struct perf_stat_evsel {
> >         struct stats             res_stats;
> >         enum perf_stat_evsel_id  id;
> > +       int                      nr_aggr;
> > +       struct perf_stat_aggr   *aggr;
> >         u64                     *group_data;
> >  };
> >
> > --
> > 2.38.0.rc1.362.ged0d419d3c-goog
> >

* Re: [PATCH 08/19] perf stat: Allocate evsel->stats->aggr properly
  2022-10-10 23:03   ` Ian Rogers
@ 2022-10-11 23:38     ` Namhyung Kim
  0 siblings, 0 replies; 63+ messages in thread
From: Namhyung Kim @ 2022-10-11 23:38 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Mon, Oct 10, 2022 at 4:03 PM Ian Rogers <irogers@google.com> wrote:
>
> On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > The perf_stat_config.aggr_map should have a correct size of the
> > aggregation map.  Use it to allocate aggr_counts.
> >
> > Also AGGR_NONE with per-core events can be tricky because it doesn't
> > aggreate basically but it needs to do so for per-core events only.
>
> nit: s/aggreate/aggregate/
>
> > So only per-core evsels will have stats->aggr data.
> >
> > Note that other caller of evlist__alloc_stat() might not have
> > stat_config or aggr_map.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
>
> Acked-by: Ian Rogers <irogers@google.com>
>
> nit: Below there are use of constants true, false and NULL, it would
> be nice to use the /*argument_name=*/... style parameter passing to be
> clearer on what the parameter means.

Sounds good.  Will add.
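
For reference, the style in question looks roughly like this.  The
function below is a made-up placeholder (not evlist__alloc_stats()
itself); only the call site matters:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical function, only here to have something to call. */
static int toy_alloc_stats(int nr_aggr, bool alloc_raw)
{
	assert(nr_aggr >= 0);
	return nr_aggr + (alloc_raw ? 1 : 0);
}

static int toy_caller(void)
{
	/* Without the comments this would read toy_alloc_stats(0, true). */
	return toy_alloc_stats(/*nr_aggr=*/0, /*alloc_raw=*/true);
}
```

The comments cost nothing at run time and make bare constants such as
0, true, false or NULL self-describing at the call site.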

Thanks,
Namhyung

* Re: [PATCH 09/19] perf stat: Aggregate events using evsel->stats->aggr
  2022-10-10 23:11   ` Ian Rogers
@ 2022-10-11 23:44     ` Namhyung Kim
  0 siblings, 0 replies; 63+ messages in thread
From: Namhyung Kim @ 2022-10-11 23:44 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Mon, Oct 10, 2022 at 4:11 PM Ian Rogers <irogers@google.com> wrote:
>
> On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Add logic to aggregate counter values to the new evsel->stats->aggr.
> > This is not used yet so shadow stats are not updated.  But a later patch
> > will convert the existing code to use it.
> >
> > With that, we don't need to handle AGGR_GLOBAL specially anymore.  It
> > can use the same logic with counts, prev_counts and aggr_counts.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/builtin-stat.c                     |  3 --
> >  tools/perf/util/evsel.c                       |  9 +---
> >  .../scripting-engines/trace-event-python.c    |  6 ---
> >  tools/perf/util/stat.c                        | 46 ++++++++++++++++---
> >  4 files changed, 41 insertions(+), 23 deletions(-)
> >
> > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > index eaddafbd7ff2..139e35ed68d3 100644
> > --- a/tools/perf/builtin-stat.c
> > +++ b/tools/perf/builtin-stat.c
> > @@ -963,9 +963,6 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> >                 init_stats(&walltime_nsecs_stats);
> >                 update_stats(&walltime_nsecs_stats, t1 - t0);
> >
> > -               if (stat_config.aggr_mode == AGGR_GLOBAL)
> > -                       evlist__save_aggr_prev_raw_counts(evsel_list);
> > -
> >                 evlist__copy_prev_raw_counts(evsel_list);
> >                 evlist__reset_prev_raw_counts(evsel_list);
> >                 perf_stat__reset_shadow_per_stat(&rt_stat);
> > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > index a6ea91c72659..a1fcb3166149 100644
> > --- a/tools/perf/util/evsel.c
> > +++ b/tools/perf/util/evsel.c
> > @@ -1526,13 +1526,8 @@ void evsel__compute_deltas(struct evsel *evsel, int cpu_map_idx, int thread,
> >         if (!evsel->prev_raw_counts)
> >                 return;
> >
> > -       if (cpu_map_idx == -1) {
> > -               tmp = evsel->prev_raw_counts->aggr;
> > -               evsel->prev_raw_counts->aggr = *count;
> > -       } else {
> > -               tmp = *perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread);
> > -               *perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread) = *count;
> > -       }
> > +       tmp = *perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread);
> > +       *perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread) = *count;
> >
> >         count->val = count->val - tmp.val;
> >         count->ena = count->ena - tmp.ena;
> > diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
> > index 1f2040f36d4e..7bc8559dce6a 100644
> > --- a/tools/perf/util/scripting-engines/trace-event-python.c
> > +++ b/tools/perf/util/scripting-engines/trace-event-python.c
> > @@ -1653,12 +1653,6 @@ static void python_process_stat(struct perf_stat_config *config,
> >         struct perf_cpu_map *cpus = counter->core.cpus;
> >         int cpu, thread;
> >
> > -       if (config->aggr_mode == AGGR_GLOBAL) {
> > -               process_stat(counter, (struct perf_cpu){ .cpu = -1 }, -1, tstamp,
> > -                            &counter->counts->aggr);
> > -               return;
> > -       }
> > -
> >         for (thread = 0; thread < threads->nr; thread++) {
> >                 for (cpu = 0; cpu < perf_cpu_map__nr(cpus); cpu++) {
> >                         process_stat(counter, perf_cpu_map__cpu(cpus, cpu),
> > diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> > index 374149628507..99874254809d 100644
> > --- a/tools/perf/util/stat.c
> > +++ b/tools/perf/util/stat.c
> > @@ -387,6 +387,7 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
> >                        struct perf_counts_values *count)
> >  {
> >         struct perf_counts_values *aggr = &evsel->counts->aggr;
> > +       struct perf_stat_evsel *ps = evsel->stats;
> >         static struct perf_counts_values zero;
> >         bool skip = false;
> >
> > @@ -398,6 +399,44 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
> >         if (skip)
> >                 count = &zero;
> >
> > +       if (!evsel->snapshot)
> > +               evsel__compute_deltas(evsel, cpu_map_idx, thread, count);
> > +       perf_counts_values__scale(count, config->scale, NULL);
> > +
> > +       if (ps->aggr) {
> > +               struct perf_cpu cpu = perf_cpu_map__cpu(evsel->core.cpus, cpu_map_idx);
> > +               struct aggr_cpu_id aggr_id = config->aggr_get_id(config, cpu);
> > +               struct perf_stat_aggr *ps_aggr;
> > +               int i;
> > +
> > +               for (i = 0; i < ps->nr_aggr; i++) {
>
> Would it be cleaner to have a helper function here that returns i or
> ps_aggr for the first CPU being aggregated into? That would avoid the
> continue/break.

Right, we need cpu -> aggr_idx mapping.
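
Something along these lines, perhaps.  This is only a sketch with
simplified, hypothetical types (struct toy_id, struct toy_aggr_map) and
a made-up helper name; the real code would use struct aggr_cpu_id,
aggr_cpu_id__equal() and config->aggr_map:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct aggr_cpu_id. */
struct toy_id {
	int socket;
	int die;
	int core;
};

/* Hypothetical, simplified stand-in for struct cpu_aggr_map. */
struct toy_aggr_map {
	int nr;
	const struct toy_id *map;
};

static bool toy_id_equal(const struct toy_id *a, const struct toy_id *b)
{
	return a->socket == b->socket && a->die == b->die && a->core == b->core;
}

/* Return the index of the entry matching 'id', or -1 if there is none. */
static int toy_aggr_map__find(const struct toy_aggr_map *m, const struct toy_id *id)
{
	assert(m && id);

	for (int i = 0; i < m->nr; i++) {
		if (toy_id_equal(&m->map[i], id))
			return i;
	}
	return -1;
}
```

The caller can then bail out early on -1 and update ps->aggr[idx]
directly, which removes the continue/break pair from the loop body.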

>
> > +                       if (!aggr_cpu_id__equal(&aggr_id, &config->aggr_map->map[i]))
> > +                               continue;
> > +
> > +                       ps_aggr = &ps->aggr[i];
> > +                       ps_aggr->nr++;
> > +
> > +                       /*
> > +                        * When any result is bad, make them all to give
> > +                        * consistent output in interval mode.
> > +                        */
> > +                       if (count->ena == 0 || count->run == 0 ||
> > +                           evsel->counts->scaled == -1) {
> > +                               ps_aggr->counts.val = 0;
> > +                               ps_aggr->counts.ena = 0;
> > +                               ps_aggr->counts.run = 0;
> > +                               ps_aggr->failed = true;
> > +                       }
> > +
> > +                       if (!ps_aggr->failed) {
> > +                               ps_aggr->counts.val += count->val;
> > +                               ps_aggr->counts.ena += count->ena;
> > +                               ps_aggr->counts.run += count->run;
> > +                       }
> > +                       break;
> > +               }
> > +       }
> > +
> >         switch (config->aggr_mode) {
> >         case AGGR_THREAD:
> >         case AGGR_CORE:
> > @@ -405,9 +444,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
> >         case AGGR_SOCKET:
> >         case AGGR_NODE:
> >         case AGGR_NONE:
> > -               if (!evsel->snapshot)
> > -                       evsel__compute_deltas(evsel, cpu_map_idx, thread, count);
> > -               perf_counts_values__scale(count, config->scale, NULL);
> >                 if ((config->aggr_mode == AGGR_NONE) && (!evsel->percore)) {
> >                         perf_stat__update_shadow_stats(evsel, count->val,
> >                                                        cpu_map_idx, &rt_stat);
> > @@ -469,10 +505,6 @@ int perf_stat_process_counter(struct perf_stat_config *config,
> >         if (config->aggr_mode != AGGR_GLOBAL)
> >                 return 0;
> >
> > -       if (!counter->snapshot)
> > -               evsel__compute_deltas(counter, -1, -1, aggr);
> > -       perf_counts_values__scale(aggr, config->scale, &counter->counts->scaled);
> > -
>
> It isn't clear to me how this relates to the patch.

It's moved to process_counter_values() so that AGGR_GLOBAL is handled
like the other aggregation modes.

Thanks,
Namhyung


>
> >         update_stats(&ps->res_stats, *count);
> >
> >         if (verbose > 0) {
> > --
> > 2.38.0.rc1.362.ged0d419d3c-goog
> >

* Re: [PATCH 10/19] perf stat: Aggregate per-thread stats using evsel->stats->aggr
  2022-10-10 23:17   ` Ian Rogers
@ 2022-10-11 23:46     ` Namhyung Kim
  0 siblings, 0 replies; 63+ messages in thread
From: Namhyung Kim @ 2022-10-11 23:46 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Mon, Oct 10, 2022 at 4:17 PM Ian Rogers <irogers@google.com> wrote:
>
> On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Per-thread aggregation doesn't use the CPU numbers but the logic should
> > be the same.  Initialize cpu_aggr_map separately for AGGR_THREAD and use
> > thread map idx to aggregate counter values.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/builtin-stat.c | 31 +++++++++++++++++++++++++++++++
> >  tools/perf/util/stat.c    | 19 +++++++++++++++++++
> >  2 files changed, 50 insertions(+)
> >
> > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > index 139e35ed68d3..c76240cfc635 100644
> > --- a/tools/perf/builtin-stat.c
> > +++ b/tools/perf/builtin-stat.c
> > @@ -1468,6 +1468,21 @@ static int perf_stat_init_aggr_mode(void)
> >                 stat_config.aggr_get_id = aggr_mode__get_id(stat_config.aggr_mode);
> >         }
> >
> > +       if (stat_config.aggr_mode == AGGR_THREAD) {
> > +               nr = perf_thread_map__nr(evsel_list->core.threads);
> > +               stat_config.aggr_map = cpu_aggr_map__empty_new(nr);
> > +               if (stat_config.aggr_map == NULL)
> > +                       return -ENOMEM;
> > +
> > +               for (int s = 0; s < nr; s++) {
> > +                       struct aggr_cpu_id id = aggr_cpu_id__empty();
> > +
> > +                       id.thread_idx = s;
> > +                       stat_config.aggr_map->map[s] = id;
> > +               }
> > +               return 0;
> > +       }
> > +
> >         /*
> >          * The evsel_list->cpus is the base we operate on,
> >          * taking the highest cpu number to be the size of
> > @@ -1677,6 +1692,22 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
> >         aggr_cpu_id_get_t get_id = aggr_mode__get_aggr_file(stat_config.aggr_mode);
> >         bool needs_sort = stat_config.aggr_mode != AGGR_NONE;
> >
> > +       if (stat_config.aggr_mode == AGGR_THREAD) {
> > +               int nr = perf_thread_map__nr(evsel_list->core.threads);
> > +
> > +               stat_config.aggr_map = cpu_aggr_map__empty_new(nr);
> > +               if (stat_config.aggr_map == NULL)
> > +                       return -ENOMEM;
> > +
> > +               for (int s = 0; s < nr; s++) {
> > +                       struct aggr_cpu_id id = aggr_cpu_id__empty();
> > +
> > +                       id.thread_idx = s;
> > +                       stat_config.aggr_map->map[s] = id;
> > +               }
> > +               return 0;
> > +       }
> > +
> >         if (!get_id)
> >                 return 0;
> >
> > diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> > index 99874254809d..013dbe1c5d28 100644
> > --- a/tools/perf/util/stat.c
> > +++ b/tools/perf/util/stat.c
> > @@ -403,6 +403,24 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
> >                 evsel__compute_deltas(evsel, cpu_map_idx, thread, count);
> >         perf_counts_values__scale(count, config->scale, NULL);
> >
> > +       if (config->aggr_mode == AGGR_THREAD) {
> > +               struct perf_counts_values *aggr_counts = &ps->aggr[thread].counts;
> > +
> > +               /*
> > +                * Skip value 0 when enabling --per-thread globally,
> > +                * otherwise too many 0 output.
> > +                */
> > +               if (count->val == 0 && config->system_wide)
> > +                       return 0;
> > +
> > +               ps->aggr[thread].nr++;
> > +
> > +               aggr_counts->val += count->val;
> > +               aggr_counts->ena += count->ena;
> > +               aggr_counts->run += count->run;
> > +               goto update;
>
> nit: perhaps there is a more intention revealing name than update here.

thread_map_idx ? ;-)  I think we need to rename it separately.

>
> Acked-by: Ian Rogers <irogers@google.com>

Thanks,
Namhyung


>
> > +       }
> > +
> >         if (ps->aggr) {
> >                 struct perf_cpu cpu = perf_cpu_map__cpu(evsel->core.cpus, cpu_map_idx);
> >                 struct aggr_cpu_id aggr_id = config->aggr_get_id(config, cpu);
> > @@ -437,6 +455,7 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
> >                 }
> >         }
> >
> > +update:
> >         switch (config->aggr_mode) {
> >         case AGGR_THREAD:
> >         case AGGR_CORE:
> > --
> > 2.38.0.rc1.362.ged0d419d3c-goog
> >

* Re: [PATCH 12/19] perf stat: Reset aggr counts for each interval
  2022-10-10 23:20   ` Ian Rogers
@ 2022-10-11 23:48     ` Namhyung Kim
  0 siblings, 0 replies; 63+ messages in thread
From: Namhyung Kim @ 2022-10-11 23:48 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Mon, Oct 10, 2022 at 4:20 PM Ian Rogers <irogers@google.com> wrote:
>
> On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > The evsel->stats->aggr->count should be reset for interval processing
> > since we want to use the values directly for display.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/builtin-stat.c |  3 +++
> >  tools/perf/util/stat.c    | 13 +++++++++++++
> >  tools/perf/util/stat.h    |  1 +
> >  3 files changed, 17 insertions(+)
> >
> > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > index 983f38cd4caa..38036f40e993 100644
> > --- a/tools/perf/builtin-stat.c
> > +++ b/tools/perf/builtin-stat.c
> > @@ -492,6 +492,8 @@ static void process_interval(void)
> >         diff_timespec(&rs, &ts, &ref_time);
> >
> >         perf_stat__reset_shadow_per_stat(&rt_stat);
> > +       evlist__reset_aggr_stats(evsel_list);
> > +
> >         read_counters(&rs);
> >
> >         if (STAT_RECORD) {
> > @@ -965,6 +967,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> >
> >                 evlist__copy_prev_raw_counts(evsel_list);
> >                 evlist__reset_prev_raw_counts(evsel_list);
> > +               evlist__reset_aggr_stats(evsel_list);
> >                 perf_stat__reset_shadow_per_stat(&rt_stat);
> >         } else {
> >                 update_stats(&walltime_nsecs_stats, t1 - t0);
> > diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> > index 279aa4ea342d..4edfc1c5dc07 100644
> > --- a/tools/perf/util/stat.c
> > +++ b/tools/perf/util/stat.c
> > @@ -276,6 +276,19 @@ void evlist__reset_stats(struct evlist *evlist)
> >         }
> >  }
> >
> > +void evlist__reset_aggr_stats(struct evlist *evlist)
> > +{
> > +       struct evsel *evsel;
> > +
> > +       evlist__for_each_entry(evlist, evsel) {
> > +               struct perf_stat_evsel *ps = evsel->stats;
> > +               struct perf_stat_aggr *aggr = ps->aggr;
> > +
> > +               if (aggr)
> > +                       memset(aggr, 0, sizeof(*aggr) * ps->nr_aggr);
>
> Perhaps this would be cleaner with helper functions on perf_stat_evsel
> and perf_stat_aggr?

Sounds good.  Will add.

Thanks,
Namhyung

* Re: [PATCH 14/19] perf stat: Add perf_stat_merge_counters()
  2022-10-10 23:31   ` Ian Rogers
@ 2022-10-11 23:55     ` Namhyung Kim
  0 siblings, 0 replies; 63+ messages in thread
From: Namhyung Kim @ 2022-10-11 23:55 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Mon, Oct 10, 2022 at 4:31 PM Ian Rogers <irogers@google.com> wrote:
>
> On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > The perf_stat_merge_counters() is to aggregate the same events in different
> > PMUs like in case of uncore or hybrid.  The same logic is in the stat-display
> > routines but I think it should be handled when it processes the event counters.
>
> I think I'm confused as to what a merged counter is. Does it relate to
> the evsel leader? How are aliases and merging related?

I'm also not sure if 'aliases' is a good name, but I just followed the
existing name in collect_data().  It's about uncore and hybrid events
afaik, not about groups.

For the uncore case, it wants to merge the same event from different
uncore PMU instances together, e.g. uncore_imc_0/cas_count_read/ ,
uncore_imc_1/cas_count_read/ , ... to uncore_imc/cas_count_read/ or
simply imc/cas_count_read/ .

For the hybrid case, it'd merge cpu_core/cycles/ and cpu_atom/cycles/ to
cycles when --hybrid-merge is given.
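
As a rough illustration of the idea (not the patch itself): the sketch
below uses a hypothetical struct toy_evsel and compares only the event
name and the PMU name, whereas the evsel__is_alias() in the patch also
checks scale, unit, cgroup and clock-ness.  Later events with the same
name on a different PMU are folded into the first one and marked as
merged:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for struct evsel. */
struct toy_evsel {
	const char *name;	/* event name without the PMU prefix */
	const char *pmu_name;
	uint64_t val;		/* aggregated counter value */
	bool merged;		/* already folded into an earlier evsel */
};

/* Same event name, but on a different PMU. */
static bool toy_is_alias(const struct toy_evsel *a, const struct toy_evsel *b)
{
	return strcmp(a->name, b->name) == 0 &&
	       strcmp(a->pmu_name, b->pmu_name) != 0;
}

static void toy_merge_aliases(struct toy_evsel *evsels, int nr)
{
	assert(evsels || nr == 0);

	for (int i = 0; i < nr; i++) {
		/* this evsel is already merged */
		if (evsels[i].merged)
			continue;

		for (int j = i + 1; j < nr; j++) {
			if (evsels[j].merged)
				continue;
			if (!toy_is_alias(&evsels[i], &evsels[j]))
				continue;

			evsels[i].val += evsels[j].val;
			evsels[j].merged = true;
		}
	}
}
```

With cas_count_read on uncore_imc_0 and uncore_imc_1, the first evsel
ends up holding the sum and the second one is skipped at display time.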

Thanks,
Namhyung

>
> >
> > As it works on the aggr_counters, it doesn't change the output yet.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/builtin-stat.c |  2 +
> >  tools/perf/util/stat.c    | 96 +++++++++++++++++++++++++++++++++++++++
> >  tools/perf/util/stat.h    |  2 +
> >  3 files changed, 100 insertions(+)
> >
> > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > index 49a7e290d778..f90e8f29cb23 100644
> > --- a/tools/perf/builtin-stat.c
> > +++ b/tools/perf/builtin-stat.c
> > @@ -486,6 +486,8 @@ static void process_counters(void)
> >                         pr_warning("failed to process counter %s\n", counter->name);
> >                 counter->err = 0;
> >         }
> > +
> > +       perf_stat_merge_counters(&stat_config, evsel_list);
> >  }
> >
> >  static void process_interval(void)
> > diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> > index 4edfc1c5dc07..1bb197782a34 100644
> > --- a/tools/perf/util/stat.c
> > +++ b/tools/perf/util/stat.c
> > @@ -575,6 +575,102 @@ int perf_stat_process_counter(struct perf_stat_config *config,
> >         return 0;
> >  }
> >
> > +static int evsel__merge_aggr_counters(struct evsel *evsel, struct evsel *alias)
> > +{
> > +       struct perf_stat_evsel *ps_a = evsel->stats;
> > +       struct perf_stat_evsel *ps_b = alias->stats;
> > +       int i;
> > +
> > +       if (ps_a->aggr == NULL && ps_b->aggr == NULL)
> > +               return 0;
> > +
> > +       if (ps_a->nr_aggr != ps_b->nr_aggr) {
> > +               pr_err("Unmatched aggregation mode between aliases\n");
> > +               return -1;
> > +       }
> > +
> > +       for (i = 0; i < ps_a->nr_aggr; i++) {
> > +               struct perf_counts_values *aggr_counts_a = &ps_a->aggr[i].counts;
> > +               struct perf_counts_values *aggr_counts_b = &ps_b->aggr[i].counts;
> > +
> > +               /* NB: don't increase aggr.nr for aliases */
> > +
> > +               aggr_counts_a->val += aggr_counts_b->val;
> > +               aggr_counts_a->ena += aggr_counts_b->ena;
> > +               aggr_counts_a->run += aggr_counts_b->run;
> > +       }
> > +
> > +       return 0;
> > +}
> > +/* events should have the same name, scale, unit, cgroup but on different PMUs */
> > +static bool evsel__is_alias(struct evsel *evsel_a, struct evsel *evsel_b)
> > +{
> > +       if (strcmp(evsel__name(evsel_a), evsel__name(evsel_b)))
> > +               return false;
> > +
> > +       if (evsel_a->scale != evsel_b->scale)
> > +               return false;
> > +
> > +       if (evsel_a->cgrp != evsel_b->cgrp)
> > +               return false;
> > +
> > +       if (strcmp(evsel_a->unit, evsel_b->unit))
> > +               return false;
> > +
> > +       if (evsel__is_clock(evsel_a) != evsel__is_clock(evsel_b))
> > +               return false;
> > +
> > +       return !!strcmp(evsel_a->pmu_name, evsel_b->pmu_name);
> > +}
> > +
> > +static void evsel__merge_aliases(struct evsel *evsel)
> > +{
> > +       struct evlist *evlist = evsel->evlist;
> > +       struct evsel *alias;
> > +
> > +       alias = list_prepare_entry(evsel, &(evlist->core.entries), core.node);
> > +       list_for_each_entry_continue(alias, &evlist->core.entries, core.node) {
> > +               /* Merge the same events on different PMUs. */
> > +               if (evsel__is_alias(evsel, alias)) {
> > +                       evsel__merge_aggr_counters(evsel, alias);
> > +                       alias->merged_stat = true;
> > +               }
> > +       }
> > +}
> > +
> > +static bool evsel__should_merge_hybrid(struct evsel *evsel, struct perf_stat_config *config)
> > +{
> > +       struct perf_pmu *pmu;
> > +
> > +       if (!config->hybrid_merge)
> > +               return false;
> > +
> > +       pmu = evsel__find_pmu(evsel);
> > +       return pmu && pmu->is_hybrid;
> > +}
> > +
> > +static void evsel__merge_stats(struct evsel *evsel, struct perf_stat_config *config)
> > +{
> > +       /* this evsel is already merged */
> > +       if (evsel->merged_stat)
> > +               return;
> > +
> > +       if (evsel->auto_merge_stats || evsel__should_merge_hybrid(evsel, config))
> > +               evsel__merge_aliases(evsel);
> > +}
> > +
> > +/* merge the same uncore and hybrid events if requested */
> > +void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *evlist)
> > +{
> > +       struct evsel *evsel;
> > +
> > +       if (config->no_merge)
> > +               return;
> > +
> > +       evlist__for_each_entry(evlist, evsel)
> > +               evsel__merge_stats(evsel, config);
> > +}
> > +
> >  int perf_event__process_stat_event(struct perf_session *session,
> >                                    union perf_event *event)
> >  {
> > diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> > index 3a876ad2870b..12cc60ab04e4 100644
> > --- a/tools/perf/util/stat.h
> > +++ b/tools/perf/util/stat.h
> > @@ -270,6 +270,8 @@ void evlist__reset_aggr_stats(struct evlist *evlist);
> >
> >  int perf_stat_process_counter(struct perf_stat_config *config,
> >                               struct evsel *counter);
> > +void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *evlist);
> > +
> >  struct perf_tool;
> >  union perf_event;
> >  struct perf_session;
> > --
> > 2.38.0.rc1.362.ged0d419d3c-goog
> >
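
For readers following the quoted patch, the alias merge can be pictured
with a small self-contained model.  This is only a sketch: struct toy_event,
toy_is_alias() and toy_merge_aliases() are hypothetical names that mirror the
shape of evsel__is_alias() and evsel__merge_aliases() above, with a single
aggregation slot per event and the scale, cgroup and clock checks left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct perf_counts_values. */
struct toy_counts {
	unsigned long long val, ena, run;
};

/* Hypothetical stand-in for an evsel with one aggregation slot. */
struct toy_event {
	const char *name;
	const char *unit;
	const char *pmu_name;
	struct toy_counts counts;
	bool merged;
};

/* Aliases share name and unit but live on different PMUs. */
static bool toy_is_alias(const struct toy_event *a, const struct toy_event *b)
{
	if (strcmp(a->name, b->name) || strcmp(a->unit, b->unit))
		return false;
	return strcmp(a->pmu_name, b->pmu_name) != 0;
}

/* Fold every later alias into the event at index i and mark it merged. */
static void toy_merge_aliases(struct toy_event *ev, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		/* this event was already folded into an earlier one */
		if (ev[i].merged)
			continue;

		for (size_t j = i + 1; j < nr; j++) {
			if (!toy_is_alias(&ev[i], &ev[j]))
				continue;

			ev[i].counts.val += ev[j].counts.val;
			ev[i].counts.ena += ev[j].counts.ena;
			ev[i].counts.run += ev[j].counts.run;
			ev[j].merged = true;
		}
	}
}
```

As in the patch, the merged alias keeps its own counts and is only flagged,
so the display side is assumed to skip flagged events.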

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 15/19] perf stat: Add perf_stat_process_percore()
  2022-10-10 23:32   ` Ian Rogers
@ 2022-10-11 23:59     ` Namhyung Kim
  0 siblings, 0 replies; 63+ messages in thread
From: Namhyung Kim @ 2022-10-11 23:59 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Mon, Oct 10, 2022 at 4:33 PM Ian Rogers <irogers@google.com> wrote:
>
> On Sun, Oct 9, 2022 at 10:36 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > The perf_stat_process_percore() aggregates counts for an event per-core
> > even if the aggr_mode is AGGR_NONE.  This is enabled when the user requests
> > it on the command line.
>
> Is there an example command line for this? It would be nice to add as a test.

  perf stat -a -A -e cpu/event=cpu-cycles,percore/ sleep 1

Thanks,
Namhyung


>
> > To handle that, it keeps the per-cpu counts at first.  Then it aggregates
> > the counts that have the same core id in the aggr->counts and writes the
> > per-core values back to each cpu.
> >
> > Later, per-core events will skip one of the CPUs unless percore-show-thread
> > option is given.  In that case, it can simply print all cpu stats with the
> > updated (per-core) values.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/builtin-stat.c |  1 +
> >  tools/perf/util/stat.c    | 71 +++++++++++++++++++++++++++++++++++++++
> >  tools/perf/util/stat.h    |  2 ++
> >  3 files changed, 74 insertions(+)
> >
> > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > index f90e8f29cb23..c127e784a7be 100644
> > --- a/tools/perf/builtin-stat.c
> > +++ b/tools/perf/builtin-stat.c
> > @@ -488,6 +488,7 @@ static void process_counters(void)
> >         }
> >
> >         perf_stat_merge_counters(&stat_config, evsel_list);
> > +       perf_stat_process_percore(&stat_config, evsel_list);
> >  }
> >
> >  static void process_interval(void)
> > diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> > index 1bb197782a34..d788d0e85204 100644
> > --- a/tools/perf/util/stat.c
> > +++ b/tools/perf/util/stat.c
> > @@ -671,6 +671,77 @@ void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *ev
> >                 evsel__merge_stats(evsel, config);
> >  }
> >
> > +static void evsel__update_percore_stats(struct evsel *evsel, struct aggr_cpu_id *core_id)
> > +{
> > +       struct perf_stat_evsel *ps = evsel->stats;
> > +       struct perf_counts_values counts = { 0, };
> > +       struct aggr_cpu_id id;
> > +       struct perf_cpu cpu;
> > +       int idx;
> > +
> > +       /* collect per-core counts */
> > +       perf_cpu_map__for_each_cpu(cpu, idx, evsel->core.cpus) {
> > +               struct perf_stat_aggr *aggr = &ps->aggr[idx];
> > +
> > +               id = aggr_cpu_id__core(cpu, NULL);
> > +               if (!aggr_cpu_id__equal(core_id, &id))
> > +                       continue;
> > +
> > +               counts.val += aggr->counts.val;
> > +               counts.ena += aggr->counts.ena;
> > +               counts.run += aggr->counts.run;
> > +       }
> > +
> > +       /* update aggregated per-core counts for each CPU */
> > +       perf_cpu_map__for_each_cpu(cpu, idx, evsel->core.cpus) {
> > +               struct perf_stat_aggr *aggr = &ps->aggr[idx];
> > +
> > +               id = aggr_cpu_id__core(cpu, NULL);
> > +               if (!aggr_cpu_id__equal(core_id, &id))
> > +                       continue;
> > +
> > +               aggr->counts.val = counts.val;
> > +               aggr->counts.ena = counts.ena;
> > +               aggr->counts.run = counts.run;
> > +
> > +               aggr->used = true;
> > +       }
> > +}
> > +
> > +/* we have an aggr_map for cpu, but want to aggregate the counters per-core */
> > +static void evsel__process_percore(struct evsel *evsel)
> > +{
> > +       struct perf_stat_evsel *ps = evsel->stats;
> > +       struct aggr_cpu_id core_id;
> > +       struct perf_cpu cpu;
> > +       int idx;
> > +
> > +       if (!evsel->percore)
> > +               return;
> > +
> > +       perf_cpu_map__for_each_cpu(cpu, idx, evsel->core.cpus) {
> > +               struct perf_stat_aggr *aggr = &ps->aggr[idx];
> > +
> > +               if (aggr->used)
> > +                       continue;
> > +
> > +               core_id = aggr_cpu_id__core(cpu, NULL);
> > +               evsel__update_percore_stats(evsel, &core_id);
> > +       }
> > +}
> > +
> > +/* process cpu stats on per-core events */
> > +void perf_stat_process_percore(struct perf_stat_config *config, struct evlist *evlist)
> > +{
> > +       struct evsel *evsel;
> > +
> > +       if (config->aggr_mode != AGGR_NONE)
> > +               return;
> > +
> > +       evlist__for_each_entry(evlist, evsel)
> > +               evsel__process_percore(evsel);
> > +}
> > +
> >  int perf_event__process_stat_event(struct perf_session *session,
> >                                    union perf_event *event)
> >  {
> > diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> > index 12cc60ab04e4..ac85ed46aa59 100644
> > --- a/tools/perf/util/stat.h
> > +++ b/tools/perf/util/stat.h
> > @@ -46,6 +46,7 @@ enum perf_stat_evsel_id {
> >  struct perf_stat_aggr {
> >         struct perf_counts_values       counts;
> >         int                             nr;
> > +       bool                            used;
> >         bool                            failed;
> >  };
> >
> > @@ -271,6 +272,7 @@ void evlist__reset_aggr_stats(struct evlist *evlist);
> >  int perf_stat_process_counter(struct perf_stat_config *config,
> >                               struct evsel *counter);
> >  void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *evlist);
> > +void perf_stat_process_percore(struct perf_stat_config *config, struct evlist *evlist);
> >
> >  struct perf_tool;
> >  union perf_event;
> > --
> > 2.38.0.rc1.362.ged0d419d3c-goog
> >
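
The two-pass update described in the changelog above can be sketched in
isolation.  The code below is a simplified model, not the perf
implementation: struct toy_aggr and toy_process_percore() are hypothetical,
and the core id of each CPU is assumed to be known up front rather than
looked up with aggr_cpu_id__core().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU slot, loosely modelled on struct perf_stat_aggr. */
struct toy_aggr {
	unsigned long long val;
	int core;	/* core id this CPU belongs to (assumed known) */
	bool used;	/* already rewritten with the per-core total */
};

/* Replace each CPU's count with the total of the core it belongs to. */
static void toy_process_percore(struct toy_aggr *aggr, size_t nr_cpus)
{
	for (size_t i = 0; i < nr_cpus; i++) {
		unsigned long long sum = 0;

		/* this CPU's core was handled via an earlier sibling */
		if (aggr[i].used)
			continue;

		/* collect the counts of all CPUs on the same core */
		for (size_t j = 0; j < nr_cpus; j++) {
			if (aggr[j].core == aggr[i].core)
				sum += aggr[j].val;
		}

		/* write the per-core total back to each of those CPUs */
		for (size_t j = 0; j < nr_cpus; j++) {
			if (aggr[j].core != aggr[i].core)
				continue;
			aggr[j].val = sum;
			aggr[j].used = true;
		}
	}
}
```

The used flag matters here: without it, a sibling CPU visited later would
sum values that already hold the per-core total and count them twice.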

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1)
  2022-10-11  6:13     ` Ian Rogers
@ 2022-10-12  3:55       ` Namhyung Kim
  0 siblings, 0 replies; 63+ messages in thread
From: Namhyung Kim @ 2022-10-12  3:55 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Andi Kleen, Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar,
	Peter Zijlstra, LKML, Adrian Hunter, linux-perf-users, Kan Liang,
	Leo Yan, Athira Rajeev, James Clark, Xing Zhengjun

On Mon, Oct 10, 2022 at 11:14 PM Ian Rogers <irogers@google.com> wrote:
>
> On Mon, Oct 10, 2022 at 10:38 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hi Andi,
> >
> > On Mon, Oct 10, 2022 at 5:25 PM Andi Kleen <ak@linux.intel.com> wrote:
> > >
> > >
> > > On 10/10/2022 10:35 PM, Namhyung Kim wrote:
> > > > Hello,
> > > >
> > > > Current perf stat code is somewhat hard to follow since it handles
> > > > many combinations of PMUs/events for given display and aggregation
> > > > options.  This is my attempt to clean it up a little. ;-)
> > >
> > >
> > > My main concern would be subtle regressions since there are so many
> > > different combinations and way to travel through the code, and a lot of
> > > things are not covered by unit tests. When I worked on the code it was
> > > difficult to keep it all working. I assume you have some way to
> > > enumerate them all and tested that the output is identical?
> >
> > Right, that's my concern too.
> >
> > I have tested many combinations manually and checked if they
> > produced similar results.  But the problem is that I cannot test
> > all hardwares and more importantly it's hard to check
> > programmatically if the output is the same or not.  The numbers
> > vary on each run and sometimes it fluctuates a lot.  I don't have
> > good test workloads and the results work for every combination.
> >
> > Any suggestions?
>
> I don't think there is anything clever we can do here. A few releases
> ago summary mode was enabled by default. For CSV output this meant a
> summary was printed at the bottom of perf stat and importantly the
> summary print out added a column on the left of all the other columns.
> This caused some tool issues for us. We now have a test that CSV
> output has a fixed number of columns. We added the CSV test because
> the json output code reformatted the display code and it would be easy
> to introduce a regression (in fact I did :-/ ). So my point is that
> stat output can change and break things and we've been doing this by
> accident for a while now. This isn't a reason to not merge this
> change.
>
> I think the real fix here is for tools to stop using text or CSV
> output and switch to the json output, that way output isn't as brittle
> except to the keys we use. It isn't feasible for the perf tool to
> stand still in case there is a script somewhere, we'll just accumulate
> bugs and baggage. However, if someone has a script and they want to
> enforce an output, all they need to do is stick a test on it (the
> Beyonce principle except s/ring/test/).

Thanks for your opinion.

I agree that it'd be better to use JSON output for machine processing.
Although there is a history of perf stat output breakages, it'd be nice
if we could avoid that for the default output mode. :)
Let me think about whether there's a better way.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1)
  2022-10-11 11:57     ` Andi Kleen
@ 2022-10-12  3:58       ` Namhyung Kim
  0 siblings, 0 replies; 63+ messages in thread
From: Namhyung Kim @ 2022-10-12  3:58 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Ian Rogers, Adrian Hunter, linux-perf-users, Kan Liang,
	Leo Yan, Athira Rajeev, James Clark, Xing Zhengjun

On Tue, Oct 11, 2022 at 4:57 AM Andi Kleen <ak@linux.intel.com> wrote:
>
>
> >> My main concern would be subtle regressions since there are so many
> >> different combinations and way to travel through the code, and a lot of
> >> things are not covered by unit tests. When I worked on the code it was
> >> difficult to keep it all working. I assume you have some way to
> >> enumerate them all and tested that the output is identical?
> > Right, that's my concern too.
> >
> > I have tested many combinations manually and checked if they
> > produced similar results.
>
> I had a script to test many combinations, but had to check the output
> manually
>
>
> > But the problem is that I cannot test
> > all hardwares and more importantly it's hard to check
> > programmatically if the output is the same or not.
>
> Can use "dummy" or some software event (e.g. a probe on some syscall) to
> get stable numbers. I don't think we need to cover all hardware for the
> output options, the different events should be similar, but need some
> coverage for the different aggregation. Or we could add some more tool
> events just for testing purposes, that would allow covering different
> core scopes etc. and would easily allow generating known counts.

Even if we can get a stable number, the test still needs to know the CPU
topology for the different aggregation modes to verify the count.  Also I'm
afraid that CPU hotplug can affect the aggregation.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 19/19] perf stat: Remove unused perf_counts.aggr field
  2022-10-10  5:36 ` [PATCH 19/19] perf stat: Remove unused perf_counts.aggr field Namhyung Kim
  2022-10-10 23:40   ` Ian Rogers
@ 2022-10-12  8:41   ` Jiri Olsa
  2022-10-12 16:26     ` Namhyung Kim
  1 sibling, 1 reply; 63+ messages in thread
From: Jiri Olsa @ 2022-10-12  8:41 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	Ian Rogers, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Sun, Oct 09, 2022 at 10:36:00PM -0700, Namhyung Kim wrote:
> The aggr field in struct perf_counts kept the aggregated value for
> AGGR_GLOBAL in the old code.  It's not used anymore.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/util/counts.c |  1 -
>  tools/perf/util/counts.h |  1 -
>  tools/perf/util/stat.c   | 35 ++---------------------------------
>  3 files changed, 2 insertions(+), 35 deletions(-)
> 
> diff --git a/tools/perf/util/counts.c b/tools/perf/util/counts.c
> index 7a447d918458..11cd85b278a6 100644
> --- a/tools/perf/util/counts.c
> +++ b/tools/perf/util/counts.c
> @@ -48,7 +48,6 @@ void perf_counts__reset(struct perf_counts *counts)
>  {
>  	xyarray__reset(counts->loaded);
>  	xyarray__reset(counts->values);
> -	memset(&counts->aggr, 0, sizeof(struct perf_counts_values));
>  }
>  
>  void evsel__reset_counts(struct evsel *evsel)
> diff --git a/tools/perf/util/counts.h b/tools/perf/util/counts.h
> index 5de275194f2b..42760242e0df 100644
> --- a/tools/perf/util/counts.h
> +++ b/tools/perf/util/counts.h
> @@ -11,7 +11,6 @@ struct evsel;
>  
>  struct perf_counts {
>  	s8			  scaled;
> -	struct perf_counts_values aggr;
>  	struct xyarray		  *values;
>  	struct xyarray		  *loaded;
>  };
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 1652586a4925..0dccfa273fa7 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -307,8 +307,6 @@ static void evsel__copy_prev_raw_counts(struct evsel *evsel)
>  				*perf_counts(evsel->prev_raw_counts, idx, thread);
>  		}
>  	}
> -
> -	evsel->counts->aggr = evsel->prev_raw_counts->aggr;
>  }
>  
>  void evlist__copy_prev_raw_counts(struct evlist *evlist)
> @@ -319,26 +317,6 @@ void evlist__copy_prev_raw_counts(struct evlist *evlist)
>  		evsel__copy_prev_raw_counts(evsel);
>  }
>  
> -void evlist__save_aggr_prev_raw_counts(struct evlist *evlist)
> -{
> -	struct evsel *evsel;
> -
> -	/*
> -	 * To collect the overall statistics for interval mode,
> -	 * we copy the counts from evsel->prev_raw_counts to
> -	 * evsel->counts. The perf_stat_process_counter creates
> -	 * aggr values from per cpu values, but the per cpu values
> -	 * are 0 for AGGR_GLOBAL. So we use a trick that saves the
> -	 * previous aggr value to the first member of perf_counts,
> -	 * then aggr calculation in process_counter_values can work
> -	 * correctly.
> -	 */
> -	evlist__for_each_entry(evlist, evsel) {
> -		*perf_counts(evsel->prev_raw_counts, 0, 0) =
> -			evsel->prev_raw_counts->aggr;
> -	}
> -}
> -
>  static size_t pkg_id_hash(const void *__key, void *ctx __maybe_unused)
>  {
>  	uint64_t *key = (uint64_t *) __key;
> @@ -422,7 +400,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
>  		       int cpu_map_idx, int thread,
>  		       struct perf_counts_values *count)
>  {
> -	struct perf_counts_values *aggr = &evsel->counts->aggr;
>  	struct perf_stat_evsel *ps = evsel->stats;
>  	static struct perf_counts_values zero;
>  	bool skip = false;
> @@ -491,12 +468,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
>  		}
>  	}
>  
> -	if (config->aggr_mode == AGGR_GLOBAL) {
> -		aggr->val += count->val;
> -		aggr->ena += count->ena;
> -		aggr->run += count->run;
> -	}
> -
>  	return 0;
>  }
>  
> @@ -521,13 +492,10 @@ static int process_counter_maps(struct perf_stat_config *config,
>  int perf_stat_process_counter(struct perf_stat_config *config,
>  			      struct evsel *counter)
>  {
> -	struct perf_counts_values *aggr = &counter->counts->aggr;
>  	struct perf_stat_evsel *ps = counter->stats;
> -	u64 *count = counter->counts->aggr.values;
> +	u64 *count;
>  	int ret;
>  
> -	aggr->val = aggr->ena = aggr->run = 0;
> -
>  	if (counter->per_pkg)
>  		evsel__zero_per_pkg(counter);
>  
> @@ -538,6 +506,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
>  	if (config->aggr_mode != AGGR_GLOBAL)
>  		return 0;
>  
> +	count = ps->aggr[0].counts.values;
>  	update_stats(&ps->res_stats, *count);

hi,
for some reason 'count' can be NULL, I'm getting a crash here:

	$ ./perf stat record -o krava.data true 
	...

	$ gdb ./perf

	(gdb) r stat report -i krava.data
	Starting program: /home/jolsa/kernel/linux-perf/tools/perf/perf stat report -i krava.data
	[Thread debugging using libthread_db enabled]
	Using host libthread_db library "/lib64/libthread_db.so.1".

	Program received signal SIGSEGV, Segmentation fault.
	0x00000000005ae90b in perf_stat_process_counter (config=0xe18d60 <stat_config>, counter=0xecfd00) at util/stat.c:510
	510             update_stats(&ps->res_stats, *count);
	Missing separate debuginfos, use: dnf debuginfo-install bzip2-libs-1.0.8-11.fc36.x86_64 cyrus-sasl-lib-2.1.27-18.fc36.x86_64 elfutils-debuginfod-client-0.187-4.fc36.x86_64 elfutils-libelf-0.187-4.fc36.x86_64 elfutils-libs-0.187-4.fc36.x86_64 glibc-2.35-15.fc36.x86_64 glibc-2.35-17.fc36.x86_64 keyutils-libs-1.6.1-4.fc36.x86_64 krb5-libs-1.19.2-11.fc36.x86_64 libbrotli-1.0.9-7.fc36.x86_64 libcap-2.48-4.fc36.x86_64 libcom_err-1.46.5-2.fc36.x86_64 libcurl-7.82.0-8.fc36.x86_64 libevent-2.1.12-6.fc36.x86_64 libgcc-12.2.1-2.fc36.x86_64 libidn2-2.3.3-1.fc36.x86_64 libnghttp2-1.46.0-2.fc36.x86_64 libpsl-0.21.1-5.fc36.x86_64 libselinux-3.3-4.fc36.x86_64 libssh-0.9.6-4.fc36.x86_64 libunistring-1.0-1.fc36.x86_64 libunwind-1.6.2-2.fc36.x86_64 libxcrypt-4.4.28-1.fc36.x86_64 libzstd-1.5.2-2.fc36.x86_64 numactl-libs-2.0.14-5.fc36.x86_64 openldap-2.6.3-1.fc36.x86_64 openssl-libs-3.0.5-1.fc36.x86_64 perl-libs-5.34.1-486.fc36.x86_64 python3-libs-3.10.7-1.fc36.x86_64 slang-2.3.2-11.fc36.x86_64 xz-libs-5.2.5-9.fc36.x86_64 zlib-1.2.11-33.fc36.x86_64
	(gdb) bt
	#0  0x00000000005ae90b in perf_stat_process_counter (config=0xe18d60 <stat_config>, counter=0xecfd00) at util/stat.c:510
	#1  0x000000000043b716 in process_counters () at builtin-stat.c:485
	#2  0x000000000043f2bf in process_stat_round_event (session=0xec84f0, event=0x7ffff7ffaba8) at builtin-stat.c:2099
	#3  0x000000000056c7b6 in perf_session__process_user_event (session=0xec84f0, event=0x7ffff7ffaba8, file_offset=2984, file_path=0xecf220 "krava.data")
	    at util/session.c:1714
	#4  0x000000000056cea5 in perf_session__process_event (session=0xec84f0, event=0x7ffff7ffaba8, file_offset=2984, file_path=0xecf220 "krava.data")
	    at util/session.c:1857
	#5  0x000000000056e4fa in process_simple (session=0xec84f0, event=0x7ffff7ffaba8, file_offset=2984, file_path=0xecf220 "krava.data") at util/session.c:2432
	#6  0x000000000056e1b9 in reader__read_event (rd=0x7fffffffb6c0, session=0xec84f0, prog=0x7fffffffb690) at util/session.c:2361
	#7  0x000000000056e3ae in reader__process_events (rd=0x7fffffffb6c0, session=0xec84f0, prog=0x7fffffffb690) at util/session.c:2410
	#8  0x000000000056e652 in __perf_session__process_events (session=0xec84f0) at util/session.c:2457
	#9  0x000000000056eff8 in perf_session__process_events (session=0xec84f0) at util/session.c:2623
	#10 0x000000000043fa53 in __cmd_report (argc=0, argv=0x7fffffffdf70) at builtin-stat.c:2265
	#11 0x000000000043fd94 in cmd_stat (argc=3, argv=0x7fffffffdf70) at builtin-stat.c:2346
	#12 0x00000000004ef495 in run_builtin (p=0xe2f930 <commands+336>, argc=4, argv=0x7fffffffdf70) at perf.c:322
	#13 0x00000000004ef709 in handle_internal_command (argc=4, argv=0x7fffffffdf70) at perf.c:376
	#14 0x00000000004ef858 in run_argv (argcp=0x7fffffffdd9c, argv=0x7fffffffdd90) at perf.c:420
	#15 0x00000000004efc1f in main (argc=4, argv=0x7fffffffdf70) at perf.c:550
	(gdb) p count
	$1 = (u64 *) 0x0

jirka

>  
>  	if (verbose > 0) {
> -- 
> 2.38.0.rc1.362.ged0d419d3c-goog
> 
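
A note on the crash above: 'count' being NULL is consistent with ps->aggr
itself being NULL, since taking the address of the first slot of an
unallocated array yields a near-zero pointer.  The follow-up listed in this
thread ("perf stat: Init aggr_map when reporting per-process stat") points
the same way.  The sketch below only illustrates a defensive check before
the dereference; it is not that fix, and struct toy_stats and
toy_global_count() are hypothetical, trimmed-down stand-ins for the perf
structures.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down versions of the perf structures. */
struct toy_counts {
	unsigned long long values[3];	/* val, ena, run */
};

struct toy_aggr {
	struct toy_counts counts;
};

struct toy_stats {
	struct toy_aggr *aggr;	/* NULL until the aggregation slots exist */
	int nr_aggr;
};

/*
 * Return a pointer to the global count, or NULL when no aggregation
 * slot has been allocated, so the caller can skip the update instead
 * of dereferencing a NULL pointer.
 */
static unsigned long long *toy_global_count(struct toy_stats *ps)
{
	if (ps == NULL || ps->aggr == NULL || ps->nr_aggr < 1)
		return NULL;

	return ps->aggr[0].counts.values;
}
```

A caller would test the returned pointer before passing *count on, which
turns the segfault into a path that can be reported or skipped.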

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 05/19] perf stat: Add cpu aggr id for no aggregation mode
  2022-10-10  5:35 ` [PATCH 05/19] perf stat: Add cpu aggr id for no aggregation mode Namhyung Kim
  2022-10-10 22:49   ` Ian Rogers
@ 2022-10-12 10:40   ` Jiri Olsa
  2022-10-12 16:27     ` Namhyung Kim
  1 sibling, 1 reply; 63+ messages in thread
From: Jiri Olsa @ 2022-10-12 10:40 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	Ian Rogers, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Sun, Oct 09, 2022 at 10:35:46PM -0700, Namhyung Kim wrote:
> Likewise, add an aggr_id for cpu for the no-aggregation mode.  It is not
> actually used yet, but later code will use it to unify the aggregation code.
> 
> No functional change intended.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-stat.c | 48 +++++++++++++++++++++++++++++++++++----
>  1 file changed, 43 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 144bb3a657f2..b00ef20aef5b 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -1339,6 +1339,12 @@ static struct aggr_cpu_id perf_stat__get_global(struct perf_stat_config *config
>  	return id;
>  }
>  
> +static struct aggr_cpu_id perf_stat__get_cpu(struct perf_stat_config *config __maybe_unused,
> +					     struct perf_cpu cpu)
> +{
> +	return aggr_cpu_id__cpu(cpu, /*data=*/NULL);
> +}
> +
>  static struct aggr_cpu_id perf_stat__get_aggr(struct perf_stat_config *config,
>  					      aggr_get_id_t get_id, struct perf_cpu cpu)
>  {
> @@ -1381,6 +1387,12 @@ static struct aggr_cpu_id perf_stat__get_global_cached(struct perf_stat_config *
>  	return perf_stat__get_aggr(config, perf_stat__get_global, cpu);
>  }
>  
> +static struct aggr_cpu_id perf_stat__get_cpu_cached(struct perf_stat_config *config,
> +						    struct perf_cpu cpu)
> +{
> +	return perf_stat__get_aggr(config, perf_stat__get_cpu, cpu);
> +}
> +
>  static bool term_percore_set(void)
>  {
>  	struct evsel *counter;
> @@ -1407,8 +1419,7 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr(enum aggr_mode aggr_mode)
>  	case AGGR_NONE:
>  		if (term_percore_set())
>  			return aggr_cpu_id__core;
> -
> -		return NULL;
> +		return aggr_cpu_id__cpu;;

nit, double ;; ;-)

jirka

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 04/19] perf stat: Add aggr id for global mode
  2022-10-10  5:35 ` [PATCH 04/19] perf stat: Add aggr id for global mode Namhyung Kim
  2022-10-10 22:46   ` Ian Rogers
@ 2022-10-12 10:55   ` Jiri Olsa
  2022-10-12 16:31     ` Namhyung Kim
  1 sibling, 1 reply; 63+ messages in thread
From: Jiri Olsa @ 2022-10-12 10:55 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	Ian Rogers, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Sun, Oct 09, 2022 at 10:35:45PM -0700, Namhyung Kim wrote:
> To make the code simpler, I'd like to use the same aggregation code for
> the global mode.  We can simply add an id function to return cpu 0 and
> use print_aggr().
> 
> No functional change intended.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-stat.c      | 39 ++++++++++++++++++++++++++++++++--
>  tools/perf/util/cpumap.c       | 10 +++++++++
>  tools/perf/util/cpumap.h       |  6 +++++-
>  tools/perf/util/stat-display.c |  9 ++------
>  4 files changed, 54 insertions(+), 10 deletions(-)
> 

SNIP

> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index 4113aa86772f..1d8e585df4ad 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -1477,13 +1477,8 @@ void evlist__print_counters(struct evlist *evlist, struct perf_stat_config *conf
>  		if (config->iostat_run)
>  			iostat_print_counters(evlist, config, ts, prefix = buf,
>  					      print_counter_aggr);
> -		else {
> -			evlist__for_each_entry(evlist, counter) {
> -				print_counter_aggr(config, counter, prefix);
> -			}
> -			if (metric_only)
> -				fputc('\n', config->output);
> -		}
> +		else
> +			print_aggr(config, evlist, prefix);

this seems to break output for:

before:
	# ./perf stat -M ipc -I 1000 --metric-only
	#           time                  IPC 
	     1.000674320                 0.61 
	     2.001700284                 0.66 
	     3.003677500                 0.67 
	     4.005583140                 0.64 

after:
	# ./perf stat -M ipc -I 1000 --metric-only
	#           time                  IPC 
	     1.001004048                 0.94 

	     2.003120471                 0.69 

	     3.005030405                 0.65 

	     4.006788766                 0.64 

	     5.008004052                 0.68 


also, should this hunk be in a separate patch?

jirka

>  		break;
>  	case AGGR_NONE:
>  		if (metric_only)
> -- 
> 2.38.0.rc1.362.ged0d419d3c-goog
> 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH 19/19] perf stat: Remove unused perf_counts.aggr field
  2022-10-12  8:41   ` Jiri Olsa
@ 2022-10-12 16:26     ` Namhyung Kim
  2022-10-13 20:56       ` [PATCH] perf stat: Init aggr_map when reporting per-process stat Namhyung Kim
  0 siblings, 1 reply; 63+ messages in thread
From: Namhyung Kim @ 2022-10-12 16:26 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	Ian Rogers, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

Hi Jiri,

On Wed, Oct 12, 2022 at 1:41 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Sun, Oct 09, 2022 at 10:36:00PM -0700, Namhyung Kim wrote:
> > The aggr field in struct perf_counts kept the aggregated value for
> > AGGR_GLOBAL in the old code.  It's not used anymore.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/util/counts.c |  1 -
> >  tools/perf/util/counts.h |  1 -
> >  tools/perf/util/stat.c   | 35 ++---------------------------------
> >  3 files changed, 2 insertions(+), 35 deletions(-)
> >
> > diff --git a/tools/perf/util/counts.c b/tools/perf/util/counts.c
> > index 7a447d918458..11cd85b278a6 100644
> > --- a/tools/perf/util/counts.c
> > +++ b/tools/perf/util/counts.c
> > @@ -48,7 +48,6 @@ void perf_counts__reset(struct perf_counts *counts)
> >  {
> >       xyarray__reset(counts->loaded);
> >       xyarray__reset(counts->values);
> > -     memset(&counts->aggr, 0, sizeof(struct perf_counts_values));
> >  }
> >
> >  void evsel__reset_counts(struct evsel *evsel)
> > diff --git a/tools/perf/util/counts.h b/tools/perf/util/counts.h
> > index 5de275194f2b..42760242e0df 100644
> > --- a/tools/perf/util/counts.h
> > +++ b/tools/perf/util/counts.h
> > @@ -11,7 +11,6 @@ struct evsel;
> >
> >  struct perf_counts {
> >       s8                        scaled;
> > -     struct perf_counts_values aggr;
> >       struct xyarray            *values;
> >       struct xyarray            *loaded;
> >  };
> > diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> > index 1652586a4925..0dccfa273fa7 100644
> > --- a/tools/perf/util/stat.c
> > +++ b/tools/perf/util/stat.c
> > @@ -307,8 +307,6 @@ static void evsel__copy_prev_raw_counts(struct evsel *evsel)
> >                               *perf_counts(evsel->prev_raw_counts, idx, thread);
> >               }
> >       }
> > -
> > -     evsel->counts->aggr = evsel->prev_raw_counts->aggr;
> >  }
> >
> >  void evlist__copy_prev_raw_counts(struct evlist *evlist)
> > @@ -319,26 +317,6 @@ void evlist__copy_prev_raw_counts(struct evlist *evlist)
> >               evsel__copy_prev_raw_counts(evsel);
> >  }
> >
> > -void evlist__save_aggr_prev_raw_counts(struct evlist *evlist)
> > -{
> > -     struct evsel *evsel;
> > -
> > -     /*
> > -      * To collect the overall statistics for interval mode,
> > -      * we copy the counts from evsel->prev_raw_counts to
> > -      * evsel->counts. The perf_stat_process_counter creates
> > -      * aggr values from per cpu values, but the per cpu values
> > -      * are 0 for AGGR_GLOBAL. So we use a trick that saves the
> > -      * previous aggr value to the first member of perf_counts,
> > -      * then aggr calculation in process_counter_values can work
> > -      * correctly.
> > -      */
> > -     evlist__for_each_entry(evlist, evsel) {
> > -             *perf_counts(evsel->prev_raw_counts, 0, 0) =
> > -                     evsel->prev_raw_counts->aggr;
> > -     }
> > -}
> > -
> >  static size_t pkg_id_hash(const void *__key, void *ctx __maybe_unused)
> >  {
> >       uint64_t *key = (uint64_t *) __key;
> > @@ -422,7 +400,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
> >                      int cpu_map_idx, int thread,
> >                      struct perf_counts_values *count)
> >  {
> > -     struct perf_counts_values *aggr = &evsel->counts->aggr;
> >       struct perf_stat_evsel *ps = evsel->stats;
> >       static struct perf_counts_values zero;
> >       bool skip = false;
> > @@ -491,12 +468,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
> >               }
> >       }
> >
> > -     if (config->aggr_mode == AGGR_GLOBAL) {
> > -             aggr->val += count->val;
> > -             aggr->ena += count->ena;
> > -             aggr->run += count->run;
> > -     }
> > -
> >       return 0;
> >  }
> >
> > @@ -521,13 +492,10 @@ static int process_counter_maps(struct perf_stat_config *config,
> >  int perf_stat_process_counter(struct perf_stat_config *config,
> >                             struct evsel *counter)
> >  {
> > -     struct perf_counts_values *aggr = &counter->counts->aggr;
> >       struct perf_stat_evsel *ps = counter->stats;
> > -     u64 *count = counter->counts->aggr.values;
> > +     u64 *count;
> >       int ret;
> >
> > -     aggr->val = aggr->ena = aggr->run = 0;
> > -
> >       if (counter->per_pkg)
> >               evsel__zero_per_pkg(counter);
> >
> > @@ -538,6 +506,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
> >       if (config->aggr_mode != AGGR_GLOBAL)
> >               return 0;
> >
> > +     count = ps->aggr[0].counts.values;
> >       update_stats(&ps->res_stats, *count);
>
> hi,
> for some reason 'count' can be NULL; I'm getting a crash here:

Ouch, will check.  Thanks for the test!

Thanks,
Namhyung


>
>         $ ./perf stat record -o krava.data true
>         ...
>
>         $ gdb ./perf
>
>         (gdb) r stat report -i krava.data
>         Starting program: /home/jolsa/kernel/linux-perf/tools/perf/perf stat report -i krava.data
>         [Thread debugging using libthread_db enabled]
>         Using host libthread_db library "/lib64/libthread_db.so.1".
>
>         Program received signal SIGSEGV, Segmentation fault.
>         0x00000000005ae90b in perf_stat_process_counter (config=0xe18d60 <stat_config>, counter=0xecfd00) at util/stat.c:510
>         510             update_stats(&ps->res_stats, *count);
>         Missing separate debuginfos, use: dnf debuginfo-install bzip2-libs-1.0.8-11.fc36.x86_64 cyrus-sasl-lib-2.1.27-18.fc36.x86_64 elfutils-debuginfod-client-0.187-4.fc36.x86_64 elfutils-libelf-0.187-4.fc36.x86_64 elfutils-libs-0.187-4.fc36.x86_64 glibc-2.35-15.fc36.x86_64 glibc-2.35-17.fc36.x86_64 keyutils-libs-1.6.1-4.fc36.x86_64 krb5-libs-1.19.2-11.fc36.x86_64 libbrotli-1.0.9-7.fc36.x86_64 libcap-2.48-4.fc36.x86_64 libcom_err-1.46.5-2.fc36.x86_64 libcurl-7.82.0-8.fc36.x86_64 libevent-2.1.12-6.fc36.x86_64 libgcc-12.2.1-2.fc36.x86_64 libidn2-2.3.3-1.fc36.x86_64 libnghttp2-1.46.0-2.fc36.x86_64 libpsl-0.21.1-5.fc36.x86_64 libselinux-3.3-4.fc36.x86_64 libssh-0.9.6-4.fc36.x86_64 libunistring-1.0-1.fc36.x86_64 libunwind-1.6.2-2.fc36.x86_64 libxcrypt-4.4.28-1.fc36.x86_64 libzstd-1.5.2-2.fc36.x86_64 numactl-libs-2.0.14-5.fc36.x86_64 openldap-2.6.3-1.fc36.x86_64 openssl-libs-3.0.5-1.fc36.x86_64 perl-libs-5.34.1-486.fc36.x86_64 python3-libs-3.10.7-1.fc36.x86_64 slang-2.3.2-11.fc36.x86_64 xz-libs-5.2.5-9.fc36.x86_64 zlib-1.2.11-33.fc36.x86_64
>         (gdb) bt
>         #0  0x00000000005ae90b in perf_stat_process_counter (config=0xe18d60 <stat_config>, counter=0xecfd00) at util/stat.c:510
>         #1  0x000000000043b716 in process_counters () at builtin-stat.c:485
>         #2  0x000000000043f2bf in process_stat_round_event (session=0xec84f0, event=0x7ffff7ffaba8) at builtin-stat.c:2099
>         #3  0x000000000056c7b6 in perf_session__process_user_event (session=0xec84f0, event=0x7ffff7ffaba8, file_offset=2984, file_path=0xecf220 "krava.data")
>             at util/session.c:1714
>         #4  0x000000000056cea5 in perf_session__process_event (session=0xec84f0, event=0x7ffff7ffaba8, file_offset=2984, file_path=0xecf220 "krava.data")
>             at util/session.c:1857
>         #5  0x000000000056e4fa in process_simple (session=0xec84f0, event=0x7ffff7ffaba8, file_offset=2984, file_path=0xecf220 "krava.data") at util/session.c:2432
>         #6  0x000000000056e1b9 in reader__read_event (rd=0x7fffffffb6c0, session=0xec84f0, prog=0x7fffffffb690) at util/session.c:2361
>         #7  0x000000000056e3ae in reader__process_events (rd=0x7fffffffb6c0, session=0xec84f0, prog=0x7fffffffb690) at util/session.c:2410
>         #8  0x000000000056e652 in __perf_session__process_events (session=0xec84f0) at util/session.c:2457
>         #9  0x000000000056eff8 in perf_session__process_events (session=0xec84f0) at util/session.c:2623
>         #10 0x000000000043fa53 in __cmd_report (argc=0, argv=0x7fffffffdf70) at builtin-stat.c:2265
>         #11 0x000000000043fd94 in cmd_stat (argc=3, argv=0x7fffffffdf70) at builtin-stat.c:2346
>         #12 0x00000000004ef495 in run_builtin (p=0xe2f930 <commands+336>, argc=4, argv=0x7fffffffdf70) at perf.c:322
>         #13 0x00000000004ef709 in handle_internal_command (argc=4, argv=0x7fffffffdf70) at perf.c:376
>         #14 0x00000000004ef858 in run_argv (argcp=0x7fffffffdd9c, argv=0x7fffffffdd90) at perf.c:420
>         #15 0x00000000004efc1f in main (argc=4, argv=0x7fffffffdf70) at perf.c:550
>         (gdb) p count
>         $1 = (u64 *) 0x0
>
> jirka
>
> >
> >       if (verbose > 0) {
> > --
> > 2.38.0.rc1.362.ged0d419d3c-goog
> >
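
Editorial note on the crash above: 'count' being NULL is consistent with
ps->aggr itself being NULL when 'perf stat report' replays the recorded data,
assuming 'counts' is the first member of each aggr entry.  That is an inference
from the gdb output, not something stated in the thread.  A minimal sketch of a
defensive lookup follows; stat_evsel, stat_aggr and global_count() are
hypothetical, simplified stand-ins and not perf's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; the real perf structures carry more fields. */
struct counts_values {
	uint64_t values[3];	/* val, ena, run */
};

struct stat_aggr {
	struct counts_values counts;
	int nr;
};

struct stat_evsel {
	int nr_aggr;
	struct stat_aggr *aggr;	/* NULL until the aggr array is allocated */
};

/*
 * Return the values of the first aggregation slot, or NULL when the
 * aggr array was never allocated (hypothetical helper).
 */
const uint64_t *global_count(const struct stat_evsel *ps)
{
	assert(ps != NULL);	/* an evsel without stats is a caller bug */

	if (ps->aggr == NULL || ps->nr_aggr < 1)
		return NULL;

	return ps->aggr[0].counts.values;
}
```

A caller would skip the update when global_count() returns NULL instead of
dereferencing the result unconditionally.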

* Re: [PATCH 05/19] perf stat: Add cpu aggr id for no aggregation mode
  2022-10-12 10:40   ` Jiri Olsa
@ 2022-10-12 16:27     ` Namhyung Kim
  0 siblings, 0 replies; 63+ messages in thread
From: Namhyung Kim @ 2022-10-12 16:27 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	Ian Rogers, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Wed, Oct 12, 2022 at 3:40 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Sun, Oct 09, 2022 at 10:35:46PM -0700, Namhyung Kim wrote:
> > Likewise, add an aggr_id for cpu for no-aggregation mode.  This is not
> > actually used yet, but later code will use it to unify the aggregation code.
> >
> > No functional change intended.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/builtin-stat.c | 48 +++++++++++++++++++++++++++++++++++----
> >  1 file changed, 43 insertions(+), 5 deletions(-)
> >
> > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > index 144bb3a657f2..b00ef20aef5b 100644
> > --- a/tools/perf/builtin-stat.c
> > +++ b/tools/perf/builtin-stat.c
> > @@ -1339,6 +1339,12 @@ static struct aggr_cpu_id perf_stat__get_global(struct perf_stat_config *config
> >       return id;
> >  }
> >
> > +static struct aggr_cpu_id perf_stat__get_cpu(struct perf_stat_config *config __maybe_unused,
> > +                                          struct perf_cpu cpu)
> > +{
> > +     return aggr_cpu_id__cpu(cpu, /*data=*/NULL);
> > +}
> > +
> >  static struct aggr_cpu_id perf_stat__get_aggr(struct perf_stat_config *config,
> >                                             aggr_get_id_t get_id, struct perf_cpu cpu)
> >  {
> > @@ -1381,6 +1387,12 @@ static struct aggr_cpu_id perf_stat__get_global_cached(struct perf_stat_config *
> >       return perf_stat__get_aggr(config, perf_stat__get_global, cpu);
> >  }
> >
> > +static struct aggr_cpu_id perf_stat__get_cpu_cached(struct perf_stat_config *config,
> > +                                                 struct perf_cpu cpu)
> > +{
> > +     return perf_stat__get_aggr(config, perf_stat__get_cpu, cpu);
> > +}
> > +
> >  static bool term_percore_set(void)
> >  {
> >       struct evsel *counter;
> > @@ -1407,8 +1419,7 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr(enum aggr_mode aggr_mode)
> >       case AGGR_NONE:
> >               if (term_percore_set())
> >                       return aggr_cpu_id__core;
> > -
> > -             return NULL;
> > +             return aggr_cpu_id__cpu;;
>
> nit, double ;; ;-)

Good eye :)  I'll remove it.

Thanks,
Namhyung
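
Editorial note: the hunk above makes the no-aggregation mode return a per-CPU
id function instead of NULL, so every mode ends up with a usable id getter.
A minimal sketch of that dispatch follows.  The names mode_get_id, id_global,
id_for_cpu and id_for_core, and the fixed two-CPUs-per-core mapping, are
assumptions for illustration only; perf reads the real topology.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum mode { MODE_GLOBAL, MODE_NONE };

struct aggr_id {
	int core;	/* -1 when unused */
	int cpu;	/* -1 when unused */
};

typedef struct aggr_id (*get_id_fn)(int cpu);

/* Global mode: every CPU collapses into the same id (cpu 0). */
struct aggr_id id_global(int cpu)
{
	(void)cpu;
	return (struct aggr_id){ .core = -1, .cpu = 0 };
}

/* No aggregation: each CPU is its own group. */
struct aggr_id id_for_cpu(int cpu)
{
	assert(cpu >= 0);
	return (struct aggr_id){ .core = -1, .cpu = cpu };
}

/* Per-core: assume two CPUs per core (illustrative mapping only). */
struct aggr_id id_for_core(int cpu)
{
	assert(cpu >= 0);
	return (struct aggr_id){ .core = cpu / 2, .cpu = -1 };
}

/* Pick the id getter for a mode; MODE_NONE no longer yields NULL. */
get_id_fn mode_get_id(enum mode mode, bool percore)
{
	switch (mode) {
	case MODE_GLOBAL:
		return id_global;
	case MODE_NONE:
		return percore ? id_for_core : id_for_cpu;
	}
	return NULL;
}
```

With a getter available in every mode, later code can treat "no aggregation"
as aggregation over groups of one CPU rather than as a special case.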

* Re: [PATCH 04/19] perf stat: Add aggr id for global mode
  2022-10-12 10:55   ` Jiri Olsa
@ 2022-10-12 16:31     ` Namhyung Kim
  0 siblings, 0 replies; 63+ messages in thread
From: Namhyung Kim @ 2022-10-12 16:31 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	Ian Rogers, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun

On Wed, Oct 12, 2022 at 3:56 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Sun, Oct 09, 2022 at 10:35:45PM -0700, Namhyung Kim wrote:
> > To make the code simpler, I'd like to use the same aggregation code for
> > the global mode.  We can simply add an id function to return cpu 0 and
> > use print_aggr().
> >
> > No functional change intended.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/builtin-stat.c      | 39 ++++++++++++++++++++++++++++++++--
> >  tools/perf/util/cpumap.c       | 10 +++++++++
> >  tools/perf/util/cpumap.h       |  6 +++++-
> >  tools/perf/util/stat-display.c |  9 ++------
> >  4 files changed, 54 insertions(+), 10 deletions(-)
> >
>
> SNIP
>
> > diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> > index 4113aa86772f..1d8e585df4ad 100644
> > --- a/tools/perf/util/stat-display.c
> > +++ b/tools/perf/util/stat-display.c
> > @@ -1477,13 +1477,8 @@ void evlist__print_counters(struct evlist *evlist, struct perf_stat_config *conf
> >               if (config->iostat_run)
> >                       iostat_print_counters(evlist, config, ts, prefix = buf,
> >                                             print_counter_aggr);
> > -             else {
> > -                     evlist__for_each_entry(evlist, counter) {
> > -                             print_counter_aggr(config, counter, prefix);
> > -                     }
> > -                     if (metric_only)
> > -                             fputc('\n', config->output);
> > -             }
> > +             else
> > +                     print_aggr(config, evlist, prefix);
>
> this seems to break output for:
>
> before:
>         # ./perf stat -M ipc -I 1000 --metric-only
>         #           time                  IPC
>              1.000674320                 0.61
>              2.001700284                 0.66
>              3.003677500                 0.67
>              4.005583140                 0.64
>
> after:
>         # ./perf stat -M ipc -I 1000 --metric-only
>         #           time                  IPC
>              1.001004048                 0.94
>
>              2.003120471                 0.69
>
>              3.005030405                 0.65
>
>              4.006788766                 0.64
>
>              5.008004052                 0.68
>
>
> also, should this hunk be in a separate patch?

Yeah, looks like it.  It probably slipped in during a rebase.  Will check.

Thanks,
Namhyung


>
> >               break;
> >       case AGGR_NONE:
> >               if (metric_only)
> > --
> > 2.38.0.rc1.362.ged0d419d3c-goog
> >

* [PATCH] perf stat: Init aggr_map when reporting per-process stat
  2022-10-12 16:26     ` Namhyung Kim
@ 2022-10-13 20:56       ` Namhyung Kim
  0 siblings, 0 replies; 63+ messages in thread
From: Namhyung Kim @ 2022-10-13 20:56 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun

I'll merge this into the problematic commit.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-stat.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 4cb3ceeb7ba4..9d35a3338976 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1355,7 +1355,11 @@ static struct aggr_cpu_id perf_stat__get_cpu(struct perf_stat_config *config __m
 static struct aggr_cpu_id perf_stat__get_aggr(struct perf_stat_config *config,
 					      aggr_get_id_t get_id, struct perf_cpu cpu)
 {
-	struct aggr_cpu_id id = aggr_cpu_id__empty();
+	struct aggr_cpu_id id;
+
+	/* per-process mode - should use global aggr mode */
+	if (cpu.cpu == -1)
+		return get_id(config, cpu);
 
 	if (aggr_cpu_id__is_empty(&config->cpus_aggr_map->map[cpu.cpu]))
 		config->cpus_aggr_map->map[cpu.cpu] = get_id(config, cpu);
@@ -2120,11 +2124,9 @@ int process_stat_config_event(struct perf_session *session,
 	if (perf_cpu_map__empty(st->cpus)) {
 		if (st->aggr_mode != AGGR_UNSET)
 			pr_warning("warning: processing task data, aggregation mode not set\n");
-		return 0;
-	}
-
-	if (st->aggr_mode != AGGR_UNSET)
+	} else if (st->aggr_mode != AGGR_UNSET) {
 		stat_config.aggr_mode = st->aggr_mode;
+	}
 
 	if (perf_stat.data.is_pipe)
 		perf_stat_init_aggr_mode();
-- 
2.38.0.413.g74048e4d9e-goog
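
Editorial note: the first hunk can be read as follows.  Per-process data
carries a dummy CPU of -1, which cannot index the per-CPU cache, so the id
function is called directly and the cache is left alone.  A minimal sketch
under simplified assumptions follows; id_cache, cached_id, pair_id and the
integer ids are hypothetical stand-ins, not perf's cpus_aggr_map.

```c
#include <assert.h>

#define ID_EMPTY (-2)	/* sentinel for "not computed yet" */

struct id_cache {
	int nr;		/* number of slots, one per CPU */
	int *map;	/* cached id per CPU, ID_EMPTY if unset */
};

typedef int (*get_id_fn)(int cpu);

/* Example getter: two CPUs share an id; the dummy CPU maps to id 0. */
int pair_id(int cpu)
{
	return cpu < 0 ? 0 : cpu / 2;
}

/* Look up the aggregation id for a CPU, filling the cache on first use. */
int cached_id(struct id_cache *cache, get_id_fn get_id, int cpu)
{
	/* Per-process mode: no real CPU, so bypass the cache entirely. */
	if (cpu == -1)
		return get_id(cpu);

	assert(cpu >= 0 && cpu < cache->nr);

	if (cache->map[cpu] == ID_EMPTY)
		cache->map[cpu] = get_id(cpu);

	return cache->map[cpu];
}
```

Without the early return, a CPU of -1 would be used as an array index, which
is out of bounds for the cache.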


* [PATCH 14/19] perf stat: Add perf_stat_merge_counters()
  2022-10-14  6:15 [PATCHSET 00/19] perf stat: Cleanup counter aggregation (v2) Namhyung Kim
@ 2022-10-14  6:15 ` Namhyung Kim
  0 siblings, 0 replies; 63+ messages in thread
From: Namhyung Kim @ 2022-10-14  6:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun

perf_stat_merge_counters() aggregates the same events on different PMUs,
as in the uncore or hybrid cases.  The same logic exists in the stat-display
routines, but I think it should be handled when the event counters are processed.

As it works on the aggr_counters, it doesn't change the output yet.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-stat.c |  2 +
 tools/perf/util/stat.c    | 96 +++++++++++++++++++++++++++++++++++++++
 tools/perf/util/stat.h    |  2 +
 3 files changed, 100 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 838d29590bed..371d6e896942 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -486,6 +486,8 @@ static void process_counters(void)
 			pr_warning("failed to process counter %s\n", counter->name);
 		counter->err = 0;
 	}
+
+	perf_stat_merge_counters(&stat_config, evsel_list);
 }
 
 static void process_interval(void)
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 847481cc3d5a..877107f5a820 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -577,6 +577,102 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 	return 0;
 }
 
+static int evsel__merge_aggr_counters(struct evsel *evsel, struct evsel *alias)
+{
+	struct perf_stat_evsel *ps_a = evsel->stats;
+	struct perf_stat_evsel *ps_b = alias->stats;
+	int i;
+
+	if (ps_a->aggr == NULL && ps_b->aggr == NULL)
+		return 0;
+
+	if (ps_a->nr_aggr != ps_b->nr_aggr) {
+		pr_err("Unmatched aggregation mode between aliases\n");
+		return -1;
+	}
+
+	for (i = 0; i < ps_a->nr_aggr; i++) {
+		struct perf_counts_values *aggr_counts_a = &ps_a->aggr[i].counts;
+		struct perf_counts_values *aggr_counts_b = &ps_b->aggr[i].counts;
+
+		/* NB: don't increase aggr.nr for aliases */
+
+		aggr_counts_a->val += aggr_counts_b->val;
+		aggr_counts_a->ena += aggr_counts_b->ena;
+		aggr_counts_a->run += aggr_counts_b->run;
+	}
+
+	return 0;
+}
+/* events should have the same name, scale, unit, cgroup but on different PMUs */
+static bool evsel__is_alias(struct evsel *evsel_a, struct evsel *evsel_b)
+{
+	if (strcmp(evsel__name(evsel_a), evsel__name(evsel_b)))
+		return false;
+
+	if (evsel_a->scale != evsel_b->scale)
+		return false;
+
+	if (evsel_a->cgrp != evsel_b->cgrp)
+		return false;
+
+	if (strcmp(evsel_a->unit, evsel_b->unit))
+		return false;
+
+	if (evsel__is_clock(evsel_a) != evsel__is_clock(evsel_b))
+		return false;
+
+	return !!strcmp(evsel_a->pmu_name, evsel_b->pmu_name);
+}
+
+static void evsel__merge_aliases(struct evsel *evsel)
+{
+	struct evlist *evlist = evsel->evlist;
+	struct evsel *alias;
+
+	alias = list_prepare_entry(evsel, &(evlist->core.entries), core.node);
+	list_for_each_entry_continue(alias, &evlist->core.entries, core.node) {
+		/* Merge the same events on different PMUs. */
+		if (evsel__is_alias(evsel, alias)) {
+			evsel__merge_aggr_counters(evsel, alias);
+			alias->merged_stat = true;
+		}
+	}
+}
+
+static bool evsel__should_merge_hybrid(struct evsel *evsel, struct perf_stat_config *config)
+{
+	struct perf_pmu *pmu;
+
+	if (!config->hybrid_merge)
+		return false;
+
+	pmu = evsel__find_pmu(evsel);
+	return pmu && pmu->is_hybrid;
+}
+
+static void evsel__merge_stats(struct evsel *evsel, struct perf_stat_config *config)
+{
+	/* this evsel is already merged */
+	if (evsel->merged_stat)
+		return;
+
+	if (evsel->auto_merge_stats || evsel__should_merge_hybrid(evsel, config))
+		evsel__merge_aliases(evsel);
+}
+
+/* merge the same uncore and hybrid events if requested */
+void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *evlist)
+{
+	struct evsel *evsel;
+
+	if (config->no_merge)
+		return;
+
+	evlist__for_each_entry(evlist, evsel)
+		evsel__merge_stats(evsel, config);
+}
+
 int perf_event__process_stat_event(struct perf_session *session,
 				   union perf_event *event)
 {
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 809f9f0aff0c..728bbc823b0d 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -280,6 +280,8 @@ void evlist__reset_aggr_stats(struct evlist *evlist);
 
 int perf_stat_process_counter(struct perf_stat_config *config,
 			      struct evsel *counter);
+void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *evlist);
+
 struct perf_tool;
 union perf_event;
 struct perf_session;
-- 
2.38.0.413.g74048e4d9e-goog
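
Editorial note: the core of the merge is a slot-by-slot sum of val, ena and
run from the alias into the first event, after which the alias is flagged so
that it is not merged again.  A minimal sketch with simplified stand-in types
follows; counts, event and merge_alias are hypothetical names, not the perf
definitions, and the sketch sets the merged flag inside the helper for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct counts {
	uint64_t val, ena, run;
};

struct event {
	int nr_aggr;		/* number of aggregation slots */
	struct counts *aggr;	/* one entry per slot */
	bool merged;		/* set once folded into another event */
};

/* Fold the alias counts into evsel; 0 on success, -1 on a mismatch. */
int merge_alias(struct event *evsel, struct event *alias)
{
	assert(evsel != NULL && alias != NULL);

	/* Both events must use the same aggregation layout. */
	if (evsel->nr_aggr != alias->nr_aggr)
		return -1;
	if (evsel->nr_aggr > 0 && (evsel->aggr == NULL || alias->aggr == NULL))
		return -1;

	for (int i = 0; i < evsel->nr_aggr; i++) {
		evsel->aggr[i].val += alias->aggr[i].val;
		evsel->aggr[i].ena += alias->aggr[i].ena;
		evsel->aggr[i].run += alias->aggr[i].run;
	}

	alias->merged = true;	/* skip this event in later passes */
	return 0;
}
```

Summing ena and run along with val keeps the enabled/running ratio of the
merged result meaningful when it is later used for scaling.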


Thread overview: 63+ messages
2022-10-10  5:35 [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Namhyung Kim
2022-10-10  5:35 ` [PATCH 01/19] perf tools: Save evsel->pmu in parse_events() Namhyung Kim
2022-10-10 22:21   ` Ian Rogers
2022-10-10  5:35 ` [PATCH 02/19] perf tools: Use pmu info in evsel__is_hybrid() Namhyung Kim
2022-10-10 22:31   ` Ian Rogers
2022-10-11  5:10     ` Namhyung Kim
2022-10-10  5:35 ` [PATCH 03/19] perf stat: Use evsel__is_hybrid() more Namhyung Kim
2022-10-10 22:32   ` Ian Rogers
2022-10-10  5:35 ` [PATCH 04/19] perf stat: Add aggr id for global mode Namhyung Kim
2022-10-10 22:46   ` Ian Rogers
2022-10-11 23:08     ` Namhyung Kim
2022-10-12 10:55   ` Jiri Olsa
2022-10-12 16:31     ` Namhyung Kim
2022-10-10  5:35 ` [PATCH 05/19] perf stat: Add cpu aggr id for no aggregation mode Namhyung Kim
2022-10-10 22:49   ` Ian Rogers
2022-10-12 10:40   ` Jiri Olsa
2022-10-12 16:27     ` Namhyung Kim
2022-10-10  5:35 ` [PATCH 06/19] perf stat: Add 'needs_sort' argument to cpu_aggr_map__new() Namhyung Kim
2022-10-10 22:53   ` Ian Rogers
2022-10-11 23:32     ` Namhyung Kim
2022-10-10  5:35 ` [PATCH 07/19] perf stat: Add struct perf_stat_aggr to perf_stat_evsel Namhyung Kim
2022-10-10 23:00   ` Ian Rogers
2022-10-11 23:37     ` Namhyung Kim
2022-10-10  5:35 ` [PATCH 08/19] perf stat: Allocate evsel->stats->aggr properly Namhyung Kim
2022-10-10 23:03   ` Ian Rogers
2022-10-11 23:38     ` Namhyung Kim
2022-10-10  5:35 ` [PATCH 09/19] perf stat: Aggregate events using evsel->stats->aggr Namhyung Kim
2022-10-10 23:11   ` Ian Rogers
2022-10-11 23:44     ` Namhyung Kim
2022-10-10  5:35 ` [PATCH 10/19] perf stat: Aggregate per-thread stats " Namhyung Kim
2022-10-10 23:17   ` Ian Rogers
2022-10-11 23:46     ` Namhyung Kim
2022-10-10  5:35 ` [PATCH 11/19] perf stat: Allocate aggr counts for recorded data Namhyung Kim
2022-10-10 23:18   ` Ian Rogers
2022-10-10  5:35 ` [PATCH 12/19] perf stat: Reset aggr counts for each interval Namhyung Kim
2022-10-10 23:20   ` Ian Rogers
2022-10-11 23:48     ` Namhyung Kim
2022-10-10  5:35 ` [PATCH 13/19] perf stat: Split process_counters() Namhyung Kim
2022-10-10 23:21   ` Ian Rogers
2022-10-10  5:35 ` [PATCH 14/19] perf stat: Add perf_stat_merge_counters() Namhyung Kim
2022-10-10 23:31   ` Ian Rogers
2022-10-11 23:55     ` Namhyung Kim
2022-10-10  5:35 ` [PATCH 15/19] perf stat: Add perf_stat_process_percore() Namhyung Kim
2022-10-10 23:32   ` Ian Rogers
2022-10-11 23:59     ` Namhyung Kim
2022-10-10  5:35 ` [PATCH 16/19] perf stat: Add perf_stat_process_shadow_stats() Namhyung Kim
2022-10-10 23:36   ` Ian Rogers
2022-10-10  5:35 ` [PATCH 17/19] perf stat: Display event stats using aggr counts Namhyung Kim
2022-10-10 23:38   ` Ian Rogers
2022-10-10  5:35 ` [PATCH 18/19] perf stat: Display percore events properly Namhyung Kim
2022-10-10 23:39   ` Ian Rogers
2022-10-10  5:36 ` [PATCH 19/19] perf stat: Remove unused perf_counts.aggr field Namhyung Kim
2022-10-10 23:40   ` Ian Rogers
2022-10-12  8:41   ` Jiri Olsa
2022-10-12 16:26     ` Namhyung Kim
2022-10-13 20:56       ` [PATCH] perf stat: Init aggr_map when reporting per-process stat Namhyung Kim
2022-10-11  0:25 ` [RFC/PATCHSET 00/19] perf stat: Cleanup counter aggregation (v1) Andi Kleen
2022-10-11  5:38   ` Namhyung Kim
2022-10-11  6:13     ` Ian Rogers
2022-10-12  3:55       ` Namhyung Kim
2022-10-11 11:57     ` Andi Kleen
2022-10-12  3:58       ` Namhyung Kim
2022-10-14  6:15 [PATCHSET 00/19] perf stat: Cleanup counter aggregation (v2) Namhyung Kim
2022-10-14  6:15 ` [PATCH 14/19] perf stat: Add perf_stat_merge_counters() Namhyung Kim
