* [PATCHSET 00/20] perf stat: Cleanup counter aggregation (v3)
@ 2022-10-18  2:02 Namhyung Kim
  2022-10-18  2:02 ` [PATCH 01/20] perf tools: Save evsel->pmu in parse_events() Namhyung Kim
                   ` (19 more replies)
  0 siblings, 20 replies; 28+ messages in thread
From: Namhyung Kim @ 2022-10-18  2:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun, Michael Petlan

Hello,

The current perf stat code is somewhat hard to follow since it handles
many combinations of PMUs/events for the given display and aggregation
options.  This is my attempt to clean it up a little. ;-)

changes in v3)
 * fix cgroup event display
 * fix perf test failures  (Arnaldo)
 * add comment on the global debug output  (Athira)

changes in v2)
 * fix a segfault in perf stat report for per-process record  (Jiri)
 * fix metric only display  (Jiri)
 * add evsel__reset_aggr_stats  (Ian)
 * add more comments  (Ian)
 * add Acked-by from Ian


My first concern is that the aggregation and display routines are
intermixed and processed differently depending on the aggregation mode.
I'd like to separate them and make the logic clearer.

To do that, I added struct perf_stat_aggr to save the aggregated counter
values and other info.  It'll be allocated and processed according to
the aggr_mode, and the display logic will use it.
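
For reference, the new struct (added in patch 07) looks like the below,
and an array of them hangs off evsel->stats with one entry per aggr_map
entry:

  /* hold aggregated event info */
  struct perf_stat_aggr {
          struct perf_counts_values  counts;  /* aggregated values */
          int                        nr;      /* number of entries (CPUs) aggregated */
          bool                       failed;  /* any entry failed to read/process */
  };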

I've tested the following combinations.

  $ cat test-matrix.sh
  #!/bin/sh

  set -e

  yes > /dev/null &
  TARGET=$!

  ./perf stat true
  ./perf stat -a true
  ./perf stat -C0 true
  ./perf stat -p $TARGET true
  ./perf stat -t $TARGET true

  ./perf stat -a -A true
  ./perf stat -a --per-node true
  ./perf stat -a --per-socket true
  ./perf stat -a --per-die true
  ./perf stat -a --per-core true
  ./perf stat -a --per-thread true

  ./perf stat -a -I 500 sleep 1
  ./perf stat -a -I 500 --summary sleep 1
  ./perf stat -a -I 500 --per-socket sleep 1
  ./perf stat -a -I 500 --summary --per-socket sleep 1

  ./perf stat -a --metric-only true
  ./perf stat -a --metric-only --per-socket true
  ./perf stat -a --metric-only -I 500 sleep 1
  ./perf stat -a --metric-only -I 500 --per-socket sleep 1

  ./perf stat record true && ./perf stat report
  ./perf stat record -p $TARGET true && ./perf stat report
  ./perf stat record -a true && ./perf stat report
  ./perf stat record -a --per-core true && ./perf stat report
  ./perf stat record -a --per-core --metric-only true && ./perf stat report
  ./perf stat record -a -I 500 sleep 1 && ./perf stat report
  ./perf stat record -a -I 500 --per-core sleep 1 && ./perf stat report
  ./perf stat record -a -I 500 --per-core --metric-only sleep 1 && ./perf stat report

  ./perf stat -a -A -e cpu/event=cpu-cycles,percore/ true
  ./perf stat -a -A -e cpu/event=cpu-cycles,percore/ --percore-show-thread true

  kill $TARGET
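
Since v3 also fixes the cgroup event display, the matrix could be
extended with cgroup runs along the same lines.  These are just
suggested additions, not part of the script above:

  ./perf stat -a -e cycles -G system.slice true
  ./perf stat -a -e cycles --for-each-cgroup system.slice true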

The code is available at 'perf/stat-aggr-v3' branch in

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung


Namhyung Kim (20):
  perf tools: Save evsel->pmu in parse_events()
  perf tools: Use pmu info in evsel__is_hybrid()
  perf stat: Use evsel__is_hybrid() more
  perf stat: Add aggr id for global mode
  perf stat: Add cpu aggr id for no aggregation mode
  perf stat: Add 'needs_sort' argument to cpu_aggr_map__new()
  perf stat: Add struct perf_stat_aggr to perf_stat_evsel
  perf stat: Allocate evsel->stats->aggr properly
  perf stat: Aggregate events using evsel->stats->aggr
  perf stat: Factor out evsel__count_has_error()
  perf stat: Aggregate per-thread stats using evsel->stats->aggr
  perf stat: Allocate aggr counts for recorded data
  perf stat: Reset aggr counts for each interval
  perf stat: Split process_counters()
  perf stat: Add perf_stat_merge_counters()
  perf stat: Add perf_stat_process_percore()
  perf stat: Add perf_stat_process_shadow_stats()
  perf stat: Display event stats using aggr counts
  perf stat: Display percore events properly
  perf stat: Remove unused perf_counts.aggr field

 tools/perf/builtin-script.c                   |   4 +-
 tools/perf/builtin-stat.c                     | 186 +++++--
 tools/perf/tests/parse-metric.c               |   2 +-
 tools/perf/tests/pmu-events.c                 |   2 +-
 tools/perf/util/counts.c                      |   1 -
 tools/perf/util/counts.h                      |   1 -
 tools/perf/util/cpumap.c                      |  16 +-
 tools/perf/util/cpumap.h                      |   8 +-
 tools/perf/util/evsel.c                       |  32 +-
 tools/perf/util/parse-events.c                |   1 +
 tools/perf/util/pmu.c                         |   4 +
 .../scripting-engines/trace-event-python.c    |   6 -
 tools/perf/util/stat-display.c                | 462 +++---------------
 tools/perf/util/stat.c                        | 407 ++++++++++++---
 tools/perf/util/stat.h                        |  40 +-
 15 files changed, 633 insertions(+), 539 deletions(-)


base-commit: a3a365655a28f12f07eddf4f3fd596987b175e1d
-- 
2.38.0.413.g74048e4d9e-goog


* [PATCH 01/20] perf tools: Save evsel->pmu in parse_events()
  2022-10-18  2:02 [PATCHSET 00/20] perf stat: Cleanup counter aggregation (v3) Namhyung Kim
@ 2022-10-18  2:02 ` Namhyung Kim
  2022-10-18  2:02 ` [PATCH 02/20] perf tools: Use pmu info in evsel__is_hybrid() Namhyung Kim
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2022-10-18  2:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun, Michael Petlan

Now that evsel has a pmu pointer, let's save the info there and use it,
like in evsel__find_pmu().  The missing feature check needs to be
changed since the pmu pointer can now be set from the beginning.

Acked-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/evsel.c        | 20 ++++++++++----------
 tools/perf/util/parse-events.c |  1 +
 tools/perf/util/pmu.c          |  4 ++++
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 76605fde3507..b7140beca970 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -467,6 +467,7 @@ struct evsel *evsel__clone(struct evsel *orig)
 	evsel->collect_stat = orig->collect_stat;
 	evsel->weak_group = orig->weak_group;
 	evsel->use_config_name = orig->use_config_name;
+	evsel->pmu = orig->pmu;
 
 	if (evsel__copy_config_terms(evsel, orig) < 0)
 		goto out_err;
@@ -1966,17 +1967,16 @@ bool evsel__detect_missing_features(struct evsel *evsel)
 		perf_missing_features.mmap2 = true;
 		pr_debug2_peo("switching off mmap2\n");
 		return true;
-	} else if ((evsel->core.attr.exclude_guest || evsel->core.attr.exclude_host) &&
-		   (evsel->pmu == NULL || evsel->pmu->missing_features.exclude_guest)) {
-		if (evsel->pmu == NULL) {
+	} else if (evsel->core.attr.exclude_guest || evsel->core.attr.exclude_host) {
+		if (evsel->pmu == NULL)
 			evsel->pmu = evsel__find_pmu(evsel);
-			if (evsel->pmu)
-				evsel->pmu->missing_features.exclude_guest = true;
-			else {
-				/* we cannot find PMU, disable attrs now */
-				evsel->core.attr.exclude_host = false;
-				evsel->core.attr.exclude_guest = false;
-			}
+
+		if (evsel->pmu)
+			evsel->pmu->missing_features.exclude_guest = true;
+		else {
+			/* we cannot find PMU, disable attrs now */
+			evsel->core.attr.exclude_host = false;
+			evsel->core.attr.exclude_guest = false;
 		}
 
 		if (evsel->exclude_GH) {
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 5973f46c2375..6502cd679f57 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -266,6 +266,7 @@ __add_event(struct list_head *list, int *idx,
 	evsel->core.own_cpus = perf_cpu_map__get(cpus);
 	evsel->core.requires_cpu = pmu ? pmu->is_uncore : false;
 	evsel->auto_merge_stats = auto_merge_stats;
+	evsel->pmu = pmu;
 
 	if (name)
 		evsel->name = strdup(name);
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 03284059175f..6a86e6af0903 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1065,11 +1065,15 @@ struct perf_pmu *evsel__find_pmu(struct evsel *evsel)
 {
 	struct perf_pmu *pmu = NULL;
 
+	if (evsel->pmu)
+		return evsel->pmu;
+
 	while ((pmu = perf_pmu__scan(pmu)) != NULL) {
 		if (pmu->type == evsel->core.attr.type)
 			break;
 	}
 
+	evsel->pmu = pmu;
 	return pmu;
 }
 
-- 
2.38.0.413.g74048e4d9e-goog


* [PATCH 02/20] perf tools: Use pmu info in evsel__is_hybrid()
  2022-10-18  2:02 [PATCHSET 00/20] perf stat: Cleanup counter aggregation (v3) Namhyung Kim
  2022-10-18  2:02 ` [PATCH 01/20] perf tools: Save evsel->pmu in parse_events() Namhyung Kim
@ 2022-10-18  2:02 ` Namhyung Kim
  2022-10-18  2:02 ` [PATCH 03/20] perf stat: Use evsel__is_hybrid() more Namhyung Kim
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2022-10-18  2:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun, Michael Petlan

If the evsel has a pmu pointer, it can use pmu->is_hybrid directly.

Acked-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/evsel.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index b7140beca970..c9fef26d0702 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -3131,6 +3131,9 @@ void evsel__zero_per_pkg(struct evsel *evsel)
 
 bool evsel__is_hybrid(struct evsel *evsel)
 {
+	if (evsel->pmu)
+		return evsel->pmu->is_hybrid;
+
 	return evsel->pmu_name && perf_pmu__is_hybrid(evsel->pmu_name);
 }
 
-- 
2.38.0.413.g74048e4d9e-goog


* [PATCH 03/20] perf stat: Use evsel__is_hybrid() more
  2022-10-18  2:02 [PATCHSET 00/20] perf stat: Cleanup counter aggregation (v3) Namhyung Kim
  2022-10-18  2:02 ` [PATCH 01/20] perf tools: Save evsel->pmu in parse_events() Namhyung Kim
  2022-10-18  2:02 ` [PATCH 02/20] perf tools: Use pmu info in evsel__is_hybrid() Namhyung Kim
@ 2022-10-18  2:02 ` Namhyung Kim
  2022-10-18  2:02 ` [PATCH 04/20] perf stat: Add aggr id for global mode Namhyung Kim
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2022-10-18  2:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun, Michael Petlan

The stat-display code needs to check whether the current evsel is
hybrid, but it uses perf_pmu__has_hybrid() which can return true for
non-hybrid events too.  I think it's better to use evsel__is_hybrid().

Also remove the NULL check for the 'config' parameter in hybrid_merge()
since it's called after the config->no_merge check.

Acked-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/stat-display.c | 20 ++++----------------
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 5c47ee9963a7..4113aa86772f 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -704,7 +704,7 @@ static void uniquify_event_name(struct evsel *counter)
 			counter->name = new_name;
 		}
 	} else {
-		if (perf_pmu__has_hybrid()) {
+		if (evsel__is_hybrid(counter)) {
 			ret = asprintf(&new_name, "%s/%s/",
 				       counter->pmu_name, counter->name);
 		} else {
@@ -744,26 +744,14 @@ static void collect_all_aliases(struct perf_stat_config *config, struct evsel *c
 	}
 }
 
-static bool is_uncore(struct evsel *evsel)
-{
-	struct perf_pmu *pmu = evsel__find_pmu(evsel);
-
-	return pmu && pmu->is_uncore;
-}
-
-static bool hybrid_uniquify(struct evsel *evsel)
-{
-	return perf_pmu__has_hybrid() && !is_uncore(evsel);
-}
-
 static bool hybrid_merge(struct evsel *counter, struct perf_stat_config *config,
 			 bool check)
 {
-	if (hybrid_uniquify(counter)) {
+	if (evsel__is_hybrid(counter)) {
 		if (check)
-			return config && config->hybrid_merge;
+			return config->hybrid_merge;
 		else
-			return config && !config->hybrid_merge;
+			return !config->hybrid_merge;
 	}
 
 	return false;
-- 
2.38.0.413.g74048e4d9e-goog


* [PATCH 04/20] perf stat: Add aggr id for global mode
  2022-10-18  2:02 [PATCHSET 00/20] perf stat: Cleanup counter aggregation (v3) Namhyung Kim
                   ` (2 preceding siblings ...)
  2022-10-18  2:02 ` [PATCH 03/20] perf stat: Use evsel__is_hybrid() more Namhyung Kim
@ 2022-10-18  2:02 ` Namhyung Kim
  2022-10-18  2:02 ` [PATCH 05/20] perf stat: Add cpu aggr id for no aggregation mode Namhyung Kim
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2022-10-18  2:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun, Michael Petlan

To make the code simpler, I'd like to use the same aggregation code for
the global mode.  We can simply add an id function that returns cpu 0
and use print_aggr().

No functional change intended.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-stat.c | 36 ++++++++++++++++++++++++++++++++++--
 tools/perf/util/cpumap.c  | 10 ++++++++++
 tools/perf/util/cpumap.h  |  6 +++++-
 3 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 265b05157972..75d16e9705a4 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1330,6 +1330,12 @@ static struct aggr_cpu_id perf_stat__get_node(struct perf_stat_config *config __
 	return aggr_cpu_id__node(cpu, /*data=*/NULL);
 }
 
+static struct aggr_cpu_id perf_stat__get_global(struct perf_stat_config *config __maybe_unused,
+						struct perf_cpu cpu)
+{
+	return aggr_cpu_id__global(cpu, /*data=*/NULL);
+}
+
 static struct aggr_cpu_id perf_stat__get_aggr(struct perf_stat_config *config,
 					      aggr_get_id_t get_id, struct perf_cpu cpu)
 {
@@ -1366,6 +1372,12 @@ static struct aggr_cpu_id perf_stat__get_node_cached(struct perf_stat_config *co
 	return perf_stat__get_aggr(config, perf_stat__get_node, cpu);
 }
 
+static struct aggr_cpu_id perf_stat__get_global_cached(struct perf_stat_config *config,
+						       struct perf_cpu cpu)
+{
+	return perf_stat__get_aggr(config, perf_stat__get_global, cpu);
+}
+
 static bool term_percore_set(void)
 {
 	struct evsel *counter;
@@ -1395,6 +1407,7 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr(enum aggr_mode aggr_mode)
 
 		return NULL;
 	case AGGR_GLOBAL:
+		return aggr_cpu_id__global;
 	case AGGR_THREAD:
 	case AGGR_UNSET:
 	case AGGR_MAX:
@@ -1420,6 +1433,7 @@ static aggr_get_id_t aggr_mode__get_id(enum aggr_mode aggr_mode)
 		}
 		return NULL;
 	case AGGR_GLOBAL:
+		return perf_stat__get_global_cached;
 	case AGGR_THREAD:
 	case AGGR_UNSET:
 	case AGGR_MAX:
@@ -1535,6 +1549,16 @@ static struct aggr_cpu_id perf_env__get_node_aggr_by_cpu(struct perf_cpu cpu, vo
 	return id;
 }
 
+static struct aggr_cpu_id perf_env__get_global_aggr_by_cpu(struct perf_cpu cpu __maybe_unused,
+							   void *data __maybe_unused)
+{
+	struct aggr_cpu_id id = aggr_cpu_id__empty();
+
+	/* it always aggregates to the cpu 0 */
+	id.cpu = (struct perf_cpu){ .cpu = 0 };
+	return id;
+}
+
 static struct aggr_cpu_id perf_stat__get_socket_file(struct perf_stat_config *config __maybe_unused,
 						     struct perf_cpu cpu)
 {
@@ -1558,6 +1582,12 @@ static struct aggr_cpu_id perf_stat__get_node_file(struct perf_stat_config *conf
 	return perf_env__get_node_aggr_by_cpu(cpu, &perf_stat.session->header.env);
 }
 
+static struct aggr_cpu_id perf_stat__get_global_file(struct perf_stat_config *config __maybe_unused,
+						     struct perf_cpu cpu)
+{
+	return perf_env__get_global_aggr_by_cpu(cpu, &perf_stat.session->header.env);
+}
+
 static aggr_cpu_id_get_t aggr_mode__get_aggr_file(enum aggr_mode aggr_mode)
 {
 	switch (aggr_mode) {
@@ -1569,8 +1599,9 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr_file(enum aggr_mode aggr_mode)
 		return perf_env__get_core_aggr_by_cpu;
 	case AGGR_NODE:
 		return perf_env__get_node_aggr_by_cpu;
-	case AGGR_NONE:
 	case AGGR_GLOBAL:
+		return perf_env__get_global_aggr_by_cpu;
+	case AGGR_NONE:
 	case AGGR_THREAD:
 	case AGGR_UNSET:
 	case AGGR_MAX:
@@ -1590,8 +1621,9 @@ static aggr_get_id_t aggr_mode__get_id_file(enum aggr_mode aggr_mode)
 		return perf_stat__get_core_file;
 	case AGGR_NODE:
 		return perf_stat__get_node_file;
-	case AGGR_NONE:
 	case AGGR_GLOBAL:
+		return perf_stat__get_global_file;
+	case AGGR_NONE:
 	case AGGR_THREAD:
 	case AGGR_UNSET:
 	case AGGR_MAX:
diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
index 8486ca3bec75..60209fe87456 100644
--- a/tools/perf/util/cpumap.c
+++ b/tools/perf/util/cpumap.c
@@ -354,6 +354,16 @@ struct aggr_cpu_id aggr_cpu_id__node(struct perf_cpu cpu, void *data __maybe_unu
 	return id;
 }
 
+struct aggr_cpu_id aggr_cpu_id__global(struct perf_cpu cpu, void *data __maybe_unused)
+{
+	struct aggr_cpu_id id = aggr_cpu_id__empty();
+
+	/* it always aggregates to the cpu 0 */
+	cpu.cpu = 0;
+	id.cpu = cpu;
+	return id;
+}
+
 /* setup simple routines to easily access node numbers given a cpu number */
 static int get_max_num(char *path, int *max)
 {
diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
index 4a6d029576ee..b2ff648bc417 100644
--- a/tools/perf/util/cpumap.h
+++ b/tools/perf/util/cpumap.h
@@ -133,5 +133,9 @@ struct aggr_cpu_id aggr_cpu_id__cpu(struct perf_cpu cpu, void *data);
  * cpu. The function signature is compatible with aggr_cpu_id_get_t.
  */
 struct aggr_cpu_id aggr_cpu_id__node(struct perf_cpu cpu, void *data);
-
+/**
+ * aggr_cpu_id__global - Create an aggr_cpu_id for global aggregation.
+ * The function signature is compatible with aggr_cpu_id_get_t.
+ */
+struct aggr_cpu_id aggr_cpu_id__global(struct perf_cpu cpu, void *data);
 #endif /* __PERF_CPUMAP_H */
-- 
2.38.0.413.g74048e4d9e-goog


* [PATCH 05/20] perf stat: Add cpu aggr id for no aggregation mode
  2022-10-18  2:02 [PATCHSET 00/20] perf stat: Cleanup counter aggregation (v3) Namhyung Kim
                   ` (3 preceding siblings ...)
  2022-10-18  2:02 ` [PATCH 04/20] perf stat: Add aggr id for global mode Namhyung Kim
@ 2022-10-18  2:02 ` Namhyung Kim
  2022-10-18  2:02 ` [PATCH 06/20] perf stat: Add 'needs_sort' argument to cpu_aggr_map__new() Namhyung Kim
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2022-10-18  2:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun, Michael Petlan

Likewise, add a cpu aggr_id for the no-aggregation mode.  It is not
actually used yet, but later code will use it to unify the aggregation
code.

No functional change intended.

Acked-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-stat.c | 48 +++++++++++++++++++++++++++++++++++----
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 75d16e9705a4..b03b530fe9a6 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1336,6 +1336,12 @@ static struct aggr_cpu_id perf_stat__get_global(struct perf_stat_config *config
 	return aggr_cpu_id__global(cpu, /*data=*/NULL);
 }
 
+static struct aggr_cpu_id perf_stat__get_cpu(struct perf_stat_config *config __maybe_unused,
+					     struct perf_cpu cpu)
+{
+	return aggr_cpu_id__cpu(cpu, /*data=*/NULL);
+}
+
 static struct aggr_cpu_id perf_stat__get_aggr(struct perf_stat_config *config,
 					      aggr_get_id_t get_id, struct perf_cpu cpu)
 {
@@ -1378,6 +1384,12 @@ static struct aggr_cpu_id perf_stat__get_global_cached(struct perf_stat_config *
 	return perf_stat__get_aggr(config, perf_stat__get_global, cpu);
 }
 
+static struct aggr_cpu_id perf_stat__get_cpu_cached(struct perf_stat_config *config,
+						    struct perf_cpu cpu)
+{
+	return perf_stat__get_aggr(config, perf_stat__get_cpu, cpu);
+}
+
 static bool term_percore_set(void)
 {
 	struct evsel *counter;
@@ -1404,8 +1416,7 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr(enum aggr_mode aggr_mode)
 	case AGGR_NONE:
 		if (term_percore_set())
 			return aggr_cpu_id__core;
-
-		return NULL;
+		return aggr_cpu_id__cpu;
 	case AGGR_GLOBAL:
 		return aggr_cpu_id__global;
 	case AGGR_THREAD:
@@ -1428,10 +1439,9 @@ static aggr_get_id_t aggr_mode__get_id(enum aggr_mode aggr_mode)
 	case AGGR_NODE:
 		return perf_stat__get_node_cached;
 	case AGGR_NONE:
-		if (term_percore_set()) {
+		if (term_percore_set())
 			return perf_stat__get_core_cached;
-		}
-		return NULL;
+		return perf_stat__get_cpu_cached;
 	case AGGR_GLOBAL:
 		return perf_stat__get_global_cached;
 	case AGGR_THREAD:
@@ -1541,6 +1551,26 @@ static struct aggr_cpu_id perf_env__get_core_aggr_by_cpu(struct perf_cpu cpu, vo
 	return id;
 }
 
+static struct aggr_cpu_id perf_env__get_cpu_aggr_by_cpu(struct perf_cpu cpu, void *data)
+{
+	struct perf_env *env = data;
+	struct aggr_cpu_id id = aggr_cpu_id__empty();
+
+	if (cpu.cpu != -1) {
+		/*
+		 * core_id is relative to socket and die,
+		 * we need a global id. So we set
+		 * socket, die id and core id
+		 */
+		id.socket = env->cpu[cpu.cpu].socket_id;
+		id.die = env->cpu[cpu.cpu].die_id;
+		id.core = env->cpu[cpu.cpu].core_id;
+		id.cpu = cpu;
+	}
+
+	return id;
+}
+
 static struct aggr_cpu_id perf_env__get_node_aggr_by_cpu(struct perf_cpu cpu, void *data)
 {
 	struct aggr_cpu_id id = aggr_cpu_id__empty();
@@ -1576,6 +1606,12 @@ static struct aggr_cpu_id perf_stat__get_core_file(struct perf_stat_config *conf
 	return perf_env__get_core_aggr_by_cpu(cpu, &perf_stat.session->header.env);
 }
 
+static struct aggr_cpu_id perf_stat__get_cpu_file(struct perf_stat_config *config __maybe_unused,
+						  struct perf_cpu cpu)
+{
+	return perf_env__get_cpu_aggr_by_cpu(cpu, &perf_stat.session->header.env);
+}
+
 static struct aggr_cpu_id perf_stat__get_node_file(struct perf_stat_config *config __maybe_unused,
 						   struct perf_cpu cpu)
 {
@@ -1602,6 +1638,7 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr_file(enum aggr_mode aggr_mode)
 	case AGGR_GLOBAL:
 		return perf_env__get_global_aggr_by_cpu;
 	case AGGR_NONE:
+		return perf_env__get_cpu_aggr_by_cpu;
 	case AGGR_THREAD:
 	case AGGR_UNSET:
 	case AGGR_MAX:
@@ -1624,6 +1661,7 @@ static aggr_get_id_t aggr_mode__get_id_file(enum aggr_mode aggr_mode)
 	case AGGR_GLOBAL:
 		return perf_stat__get_global_file;
 	case AGGR_NONE:
+		return perf_stat__get_cpu_file;
 	case AGGR_THREAD:
 	case AGGR_UNSET:
 	case AGGR_MAX:
-- 
2.38.0.413.g74048e4d9e-goog


* [PATCH 06/20] perf stat: Add 'needs_sort' argument to cpu_aggr_map__new()
  2022-10-18  2:02 [PATCHSET 00/20] perf stat: Cleanup counter aggregation (v3) Namhyung Kim
                   ` (4 preceding siblings ...)
  2022-10-18  2:02 ` [PATCH 05/20] perf stat: Add cpu aggr id for no aggregation mode Namhyung Kim
@ 2022-10-18  2:02 ` Namhyung Kim
  2022-10-18  2:02 ` [PATCH 07/20] perf stat: Add struct perf_stat_aggr to perf_stat_evsel Namhyung Kim
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2022-10-18  2:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun, Michael Petlan

In case of no aggregation, it needs to keep the original (cpu) ordering
in the aggr_map so that it stays in sync with the cpu map.  This will
make it easier for the code to handle AGGR_NONE like the other modes.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-stat.c | 7 +++++--
 tools/perf/util/cpumap.c  | 6 ++++--
 tools/perf/util/cpumap.h  | 2 +-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index b03b530fe9a6..9053fd4d15a7 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1458,8 +1458,9 @@ static int perf_stat_init_aggr_mode(void)
 	aggr_cpu_id_get_t get_id = aggr_mode__get_aggr(stat_config.aggr_mode);
 
 	if (get_id) {
+		bool needs_sort = stat_config.aggr_mode != AGGR_NONE;
 		stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_requested_cpus,
-							 get_id, /*data=*/NULL);
+							 get_id, /*data=*/NULL, needs_sort);
 		if (!stat_config.aggr_map) {
 			pr_err("cannot build %s map", aggr_mode__string[stat_config.aggr_mode]);
 			return -1;
@@ -1674,11 +1675,13 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
 {
 	struct perf_env *env = &st->session->header.env;
 	aggr_cpu_id_get_t get_id = aggr_mode__get_aggr_file(stat_config.aggr_mode);
+	bool needs_sort = stat_config.aggr_mode != AGGR_NONE;
 
 	if (!get_id)
 		return 0;
 
-	stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_requested_cpus, get_id, env);
+	stat_config.aggr_map = cpu_aggr_map__new(evsel_list->core.user_requested_cpus,
+						 get_id, env, needs_sort);
 	if (!stat_config.aggr_map) {
 		pr_err("cannot build %s map", aggr_mode__string[stat_config.aggr_mode]);
 		return -1;
diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
index 60209fe87456..6e3fcf523de9 100644
--- a/tools/perf/util/cpumap.c
+++ b/tools/perf/util/cpumap.c
@@ -234,7 +234,7 @@ static int aggr_cpu_id__cmp(const void *a_pointer, const void *b_pointer)
 
 struct cpu_aggr_map *cpu_aggr_map__new(const struct perf_cpu_map *cpus,
 				       aggr_cpu_id_get_t get_id,
-				       void *data)
+				       void *data, bool needs_sort)
 {
 	int idx;
 	struct perf_cpu cpu;
@@ -270,8 +270,10 @@ struct cpu_aggr_map *cpu_aggr_map__new(const struct perf_cpu_map *cpus,
 		if (trimmed_c)
 			c = trimmed_c;
 	}
+
 	/* ensure we process id in increasing order */
-	qsort(c->map, c->nr, sizeof(struct aggr_cpu_id), aggr_cpu_id__cmp);
+	if (needs_sort)
+		qsort(c->map, c->nr, sizeof(struct aggr_cpu_id), aggr_cpu_id__cmp);
 
 	return c;
 
diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
index b2ff648bc417..da28b3146ef9 100644
--- a/tools/perf/util/cpumap.h
+++ b/tools/perf/util/cpumap.h
@@ -97,7 +97,7 @@ typedef struct aggr_cpu_id (*aggr_cpu_id_get_t)(struct perf_cpu cpu, void *data)
  */
 struct cpu_aggr_map *cpu_aggr_map__new(const struct perf_cpu_map *cpus,
 				       aggr_cpu_id_get_t get_id,
-				       void *data);
+				       void *data, bool needs_sort);
 
 bool aggr_cpu_id__equal(const struct aggr_cpu_id *a, const struct aggr_cpu_id *b);
 bool aggr_cpu_id__is_empty(const struct aggr_cpu_id *a);
-- 
2.38.0.413.g74048e4d9e-goog


* [PATCH 07/20] perf stat: Add struct perf_stat_aggr to perf_stat_evsel
  2022-10-18  2:02 [PATCHSET 00/20] perf stat: Cleanup counter aggregation (v3) Namhyung Kim
                   ` (5 preceding siblings ...)
  2022-10-18  2:02 ` [PATCH 06/20] perf stat: Add 'needs_sort' argument to cpu_aggr_map__new() Namhyung Kim
@ 2022-10-18  2:02 ` Namhyung Kim
  2022-10-18  2:02 ` [PATCH 08/20] perf stat: Allocate evsel->stats->aggr properly Namhyung Kim
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2022-10-18  2:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun, Michael Petlan

The perf_stat_aggr struct keeps the aggregated counter values and state
according to the aggregation mode.  The number of entries depends on the
mode; this is a preparation for later use.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/stat.c | 34 +++++++++++++++++++++++++++-------
 tools/perf/util/stat.h | 19 +++++++++++++++++++
 2 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 8ec8bb4a9912..c9d5aa295b54 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -133,15 +133,33 @@ static void perf_stat_evsel_id_init(struct evsel *evsel)
 static void evsel__reset_stat_priv(struct evsel *evsel)
 {
 	struct perf_stat_evsel *ps = evsel->stats;
+	struct perf_stat_aggr *aggr = ps->aggr;
 
 	init_stats(&ps->res_stats);
+
+	if (aggr)
+		memset(aggr, 0, sizeof(*aggr) * ps->nr_aggr);
 }
 
-static int evsel__alloc_stat_priv(struct evsel *evsel)
+
+static int evsel__alloc_stat_priv(struct evsel *evsel, int nr_aggr)
 {
-	evsel->stats = zalloc(sizeof(struct perf_stat_evsel));
-	if (evsel->stats == NULL)
+	struct perf_stat_evsel *ps;
+
+	ps = zalloc(sizeof(*ps));
+	if (ps == NULL)
 		return -ENOMEM;
+
+	if (nr_aggr) {
+		ps->nr_aggr = nr_aggr;
+		ps->aggr = calloc(nr_aggr, sizeof(*ps->aggr));
+		if (ps->aggr == NULL) {
+			free(ps);
+			return -ENOMEM;
+		}
+	}
+
+	evsel->stats = ps;
 	perf_stat_evsel_id_init(evsel);
 	evsel__reset_stat_priv(evsel);
 	return 0;
@@ -151,8 +169,10 @@ static void evsel__free_stat_priv(struct evsel *evsel)
 {
 	struct perf_stat_evsel *ps = evsel->stats;
 
-	if (ps)
+	if (ps) {
+		zfree(&ps->aggr);
 		zfree(&ps->group_data);
+	}
 	zfree(&evsel->stats);
 }
 
@@ -181,9 +201,9 @@ static void evsel__reset_prev_raw_counts(struct evsel *evsel)
 		perf_counts__reset(evsel->prev_raw_counts);
 }
 
-static int evsel__alloc_stats(struct evsel *evsel, bool alloc_raw)
+static int evsel__alloc_stats(struct evsel *evsel, int nr_aggr, bool alloc_raw)
 {
-	if (evsel__alloc_stat_priv(evsel) < 0 ||
+	if (evsel__alloc_stat_priv(evsel, nr_aggr) < 0 ||
 	    evsel__alloc_counts(evsel) < 0 ||
 	    (alloc_raw && evsel__alloc_prev_raw_counts(evsel) < 0))
 		return -ENOMEM;
@@ -196,7 +216,7 @@ int evlist__alloc_stats(struct evlist *evlist, bool alloc_raw)
 	struct evsel *evsel;
 
 	evlist__for_each_entry(evlist, evsel) {
-		if (evsel__alloc_stats(evsel, alloc_raw))
+		if (evsel__alloc_stats(evsel, 0, alloc_raw))
 			goto out_free;
 	}
 
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index b0899c6e002f..42453513ffea 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -8,6 +8,7 @@
 #include <sys/resource.h>
 #include "cpumap.h"
 #include "rblist.h"
+#include "counts.h"
 
 struct perf_cpu_map;
 struct perf_stat_config;
@@ -42,9 +43,27 @@ enum perf_stat_evsel_id {
 	PERF_STAT_EVSEL_ID__MAX,
 };
 
+/* hold aggregated event info */
+struct perf_stat_aggr {
+	/* aggregated values */
+	struct perf_counts_values	counts;
+	/* number of entries (CPUs) aggregated */
+	int				nr;
+	/* whether any entry has failed to read/process event */
+	bool				failed;
+};
+
+/* per-evsel event stats */
 struct perf_stat_evsel {
+	/* used for repeated runs */
 	struct stats		 res_stats;
+	/* evsel id for quick check */
 	enum perf_stat_evsel_id	 id;
+	/* number of allocated 'aggr' */
+	int			 nr_aggr;
+	/* aggregated event values */
+	struct perf_stat_aggr	*aggr;
+	/* used for group read */
 	u64			*group_data;
 };
 
-- 
2.38.0.413.g74048e4d9e-goog


* [PATCH 08/20] perf stat: Allocate evsel->stats->aggr properly
  2022-10-18  2:02 [PATCHSET 00/20] perf stat: Cleanup counter aggregation (v3) Namhyung Kim
                   ` (6 preceding siblings ...)
  2022-10-18  2:02 ` [PATCH 07/20] perf stat: Add struct perf_stat_aggr to perf_stat_evsel Namhyung Kim
@ 2022-10-18  2:02 ` Namhyung Kim
  2022-10-18  2:02 ` [PATCH 09/20] perf stat: Aggregate events using evsel->stats->aggr Namhyung Kim
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2022-10-18  2:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun, Michael Petlan

The perf_stat_config.aggr_map should have the correct size of the
aggregation map.  Use it to allocate aggr_counts.

Also, AGGR_NONE with per-core events can be tricky because it basically
doesn't aggregate, but it needs to do so for the per-core events only.
So only per-core evsels will have stats->aggr data.

Note that other callers of evlist__alloc_stats() might not have a
stat_config or aggr_map.

Acked-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-script.c     | 4 ++--
 tools/perf/builtin-stat.c       | 6 +++---
 tools/perf/tests/parse-metric.c | 2 +-
 tools/perf/tests/pmu-events.c   | 2 +-
 tools/perf/util/stat.c          | 9 +++++++--
 tools/perf/util/stat.h          | 3 ++-
 6 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 7ca238277d83..d7ec8c1af293 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2049,7 +2049,7 @@ static void perf_sample__fprint_metric(struct perf_script *script,
 	u64 val;
 
 	if (!evsel->stats)
-		evlist__alloc_stats(script->session->evlist, false);
+		evlist__alloc_stats(&stat_config, script->session->evlist, /*alloc_raw=*/false);
 	if (evsel_script(leader)->gnum++ == 0)
 		perf_stat__reset_shadow_stats();
 	val = sample->period * evsel->scale;
@@ -3632,7 +3632,7 @@ static int set_maps(struct perf_script *script)
 
 	perf_evlist__set_maps(&evlist->core, script->cpus, script->threads);
 
-	if (evlist__alloc_stats(evlist, true))
+	if (evlist__alloc_stats(&stat_config, evlist, /*alloc_raw=*/true))
 		return -ENOMEM;
 
 	script->allocated = true;
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 9053fd4d15a7..92a8e4512f98 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -2121,7 +2121,7 @@ static int set_maps(struct perf_stat *st)
 
 	perf_evlist__set_maps(&evsel_list->core, st->cpus, st->threads);
 
-	if (evlist__alloc_stats(evsel_list, true))
+	if (evlist__alloc_stats(&stat_config, evsel_list, /*alloc_raw=*/true))
 		return -ENOMEM;
 
 	st->maps_allocated = true;
@@ -2568,10 +2568,10 @@ int cmd_stat(int argc, const char **argv)
 		goto out;
 	}
 
-	if (evlist__alloc_stats(evsel_list, interval))
+	if (perf_stat_init_aggr_mode())
 		goto out;
 
-	if (perf_stat_init_aggr_mode())
+	if (evlist__alloc_stats(&stat_config, evsel_list, interval))
 		goto out;
 
 	/*
diff --git a/tools/perf/tests/parse-metric.c b/tools/perf/tests/parse-metric.c
index 68f5a2a03242..21b7ac00d798 100644
--- a/tools/perf/tests/parse-metric.c
+++ b/tools/perf/tests/parse-metric.c
@@ -103,7 +103,7 @@ static int __compute_metric(const char *name, struct value *vals,
 	if (err)
 		goto out;
 
-	err = evlist__alloc_stats(evlist, false);
+	err = evlist__alloc_stats(/*config=*/NULL, evlist, /*alloc_raw=*/false);
 	if (err)
 		goto out;
 
diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c
index 097e05c796ab..5d0d3b239a68 100644
--- a/tools/perf/tests/pmu-events.c
+++ b/tools/perf/tests/pmu-events.c
@@ -889,7 +889,7 @@ static int test__parsing_callback(const struct pmu_event *pe, const struct pmu_e
 		goto out_err;
 	}
 
-	err = evlist__alloc_stats(evlist, false);
+	err = evlist__alloc_stats(/*config=*/NULL, evlist, /*alloc_raw=*/false);
 	if (err)
 		goto out_err;
 	/*
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index c9d5aa295b54..374149628507 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -211,12 +211,17 @@ static int evsel__alloc_stats(struct evsel *evsel, int nr_aggr, bool alloc_raw)
 	return 0;
 }
 
-int evlist__alloc_stats(struct evlist *evlist, bool alloc_raw)
+int evlist__alloc_stats(struct perf_stat_config *config,
+			struct evlist *evlist, bool alloc_raw)
 {
 	struct evsel *evsel;
+	int nr_aggr = 0;
+
+	if (config && config->aggr_map)
+		nr_aggr = config->aggr_map->nr;
 
 	evlist__for_each_entry(evlist, evsel) {
-		if (evsel__alloc_stats(evsel, 0, alloc_raw))
+		if (evsel__alloc_stats(evsel, nr_aggr, alloc_raw))
 			goto out_free;
 	}
 
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 42453513ffea..0980875b9be1 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -267,7 +267,8 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
 				   struct runtime_stat *st);
 void perf_stat__collect_metric_expr(struct evlist *);
 
-int evlist__alloc_stats(struct evlist *evlist, bool alloc_raw);
+int evlist__alloc_stats(struct perf_stat_config *config,
+			struct evlist *evlist, bool alloc_raw);
 void evlist__free_stats(struct evlist *evlist);
 void evlist__reset_stats(struct evlist *evlist);
 void evlist__reset_prev_raw_counts(struct evlist *evlist);
-- 
2.38.0.413.g74048e4d9e-goog


* [PATCH 09/20] perf stat: Aggregate events using evsel->stats->aggr
  2022-10-18  2:02 [PATCHSET 00/20] perf stat: Cleanup counter aggregation (v3) Namhyung Kim
                   ` (7 preceding siblings ...)
  2022-10-18  2:02 ` [PATCH 08/20] perf stat: Allocate evsel->stats->aggr properly Namhyung Kim
@ 2022-10-18  2:02 ` Namhyung Kim
  2022-10-18  2:02 ` [PATCH 10/20] perf stat: Factor out evsel__count_has_error() Namhyung Kim
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2022-10-18  2:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun, Michael Petlan

Add logic to aggregate counter values into the new evsel->stats->aggr.
It's not used yet, so the shadow stats are not updated.  But a later
patch will convert the existing code to use it.

With that, we don't need to handle AGGR_GLOBAL specially anymore.  It
can use the same logic with counts, prev_counts and aggr_counts.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-stat.c                     |  3 --
 tools/perf/util/evsel.c                       |  9 +---
 .../scripting-engines/trace-event-python.c    |  6 ---
 tools/perf/util/stat.c                        | 46 ++++++++++++++++---
 4 files changed, 41 insertions(+), 23 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 92a8e4512f98..abede56d79b6 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -963,9 +963,6 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 		init_stats(&walltime_nsecs_stats);
 		update_stats(&walltime_nsecs_stats, t1 - t0);
 
-		if (stat_config.aggr_mode == AGGR_GLOBAL)
-			evlist__save_aggr_prev_raw_counts(evsel_list);
-
 		evlist__copy_prev_raw_counts(evsel_list);
 		evlist__reset_prev_raw_counts(evsel_list);
 		perf_stat__reset_shadow_per_stat(&rt_stat);
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index c9fef26d0702..cdde5b5f8ad2 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1526,13 +1526,8 @@ void evsel__compute_deltas(struct evsel *evsel, int cpu_map_idx, int thread,
 	if (!evsel->prev_raw_counts)
 		return;
 
-	if (cpu_map_idx == -1) {
-		tmp = evsel->prev_raw_counts->aggr;
-		evsel->prev_raw_counts->aggr = *count;
-	} else {
-		tmp = *perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread);
-		*perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread) = *count;
-	}
+	tmp = *perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread);
+	*perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread) = *count;
 
 	count->val = count->val - tmp.val;
 	count->ena = count->ena - tmp.ena;
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index 1f2040f36d4e..7bc8559dce6a 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -1653,12 +1653,6 @@ static void python_process_stat(struct perf_stat_config *config,
 	struct perf_cpu_map *cpus = counter->core.cpus;
 	int cpu, thread;
 
-	if (config->aggr_mode == AGGR_GLOBAL) {
-		process_stat(counter, (struct perf_cpu){ .cpu = -1 }, -1, tstamp,
-			     &counter->counts->aggr);
-		return;
-	}
-
 	for (thread = 0; thread < threads->nr; thread++) {
 		for (cpu = 0; cpu < perf_cpu_map__nr(cpus); cpu++) {
 			process_stat(counter, perf_cpu_map__cpu(cpus, cpu),
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 374149628507..99874254809d 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -387,6 +387,7 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
 		       struct perf_counts_values *count)
 {
 	struct perf_counts_values *aggr = &evsel->counts->aggr;
+	struct perf_stat_evsel *ps = evsel->stats;
 	static struct perf_counts_values zero;
 	bool skip = false;
 
@@ -398,6 +399,44 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
 	if (skip)
 		count = &zero;
 
+	if (!evsel->snapshot)
+		evsel__compute_deltas(evsel, cpu_map_idx, thread, count);
+	perf_counts_values__scale(count, config->scale, NULL);
+
+	if (ps->aggr) {
+		struct perf_cpu cpu = perf_cpu_map__cpu(evsel->core.cpus, cpu_map_idx);
+		struct aggr_cpu_id aggr_id = config->aggr_get_id(config, cpu);
+		struct perf_stat_aggr *ps_aggr;
+		int i;
+
+		for (i = 0; i < ps->nr_aggr; i++) {
+			if (!aggr_cpu_id__equal(&aggr_id, &config->aggr_map->map[i]))
+				continue;
+
+			ps_aggr = &ps->aggr[i];
+			ps_aggr->nr++;
+
+			/*
+			 * When any result is bad, make them all to give
+			 * consistent output in interval mode.
+			 */
+			if (count->ena == 0 || count->run == 0 ||
+			    evsel->counts->scaled == -1) {
+				ps_aggr->counts.val = 0;
+				ps_aggr->counts.ena = 0;
+				ps_aggr->counts.run = 0;
+				ps_aggr->failed = true;
+			}
+
+			if (!ps_aggr->failed) {
+				ps_aggr->counts.val += count->val;
+				ps_aggr->counts.ena += count->ena;
+				ps_aggr->counts.run += count->run;
+			}
+			break;
+		}
+	}
+
 	switch (config->aggr_mode) {
 	case AGGR_THREAD:
 	case AGGR_CORE:
@@ -405,9 +444,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
 	case AGGR_SOCKET:
 	case AGGR_NODE:
 	case AGGR_NONE:
-		if (!evsel->snapshot)
-			evsel__compute_deltas(evsel, cpu_map_idx, thread, count);
-		perf_counts_values__scale(count, config->scale, NULL);
 		if ((config->aggr_mode == AGGR_NONE) && (!evsel->percore)) {
 			perf_stat__update_shadow_stats(evsel, count->val,
 						       cpu_map_idx, &rt_stat);
@@ -469,10 +505,6 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 	if (config->aggr_mode != AGGR_GLOBAL)
 		return 0;
 
-	if (!counter->snapshot)
-		evsel__compute_deltas(counter, -1, -1, aggr);
-	perf_counts_values__scale(aggr, config->scale, &counter->counts->scaled);
-
 	update_stats(&ps->res_stats, *count);
 
 	if (verbose > 0) {
-- 
2.38.0.413.g74048e4d9e-goog


* [PATCH 10/20] perf stat: Factor out evsel__count_has_error()
  2022-10-18  2:02 [PATCHSET 00/20] perf stat: Cleanup counter aggregation (v3) Namhyung Kim
                   ` (8 preceding siblings ...)
  2022-10-18  2:02 ` [PATCH 09/20] perf stat: Aggregate events using evsel->stats->aggr Namhyung Kim
@ 2022-10-18  2:02 ` Namhyung Kim
  2022-10-18  2:02 ` [PATCH 11/20] perf stat: Aggregate per-thread stats using evsel->stats->aggr Namhyung Kim
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2022-10-18  2:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun, Michael Petlan

It's possible to have 0 enabled/running time for some per-task or
per-cgroup events since they are not scheduled on any CPU.  Treating the
whole event as failed would not work in this case.  Thinking about it
again, the check only existed for the CPU-level aggregation modes (like
per-socket, per-core, ...).

To make it clearer, factor out the condition check into the new
evsel__count_has_error() function and add some comments.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/stat.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 99874254809d..dc075d5a0f72 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -381,6 +381,25 @@ static int check_per_pkg(struct evsel *counter, struct perf_counts_values *vals,
 	return ret;
 }
 
+static bool evsel__count_has_error(struct evsel *evsel,
+				   struct perf_counts_values *count,
+				   struct perf_stat_config *config)
+{
+	/* the evsel was failed already */
+	if (evsel->err || evsel->counts->scaled == -1)
+		return true;
+
+	/* this is meaningful for CPU aggregation modes only */
+	if (config->aggr_mode == AGGR_GLOBAL)
+		return false;
+
+	/* it's considered ok when it actually ran */
+	if (count->ena != 0 && count->run != 0)
+		return false;
+
+	return true;
+}
+
 static int
 process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
 		       int cpu_map_idx, int thread,
@@ -420,8 +439,7 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
 			 * When any result is bad, make them all to give
 			 * consistent output in interval mode.
 			 */
-			if (count->ena == 0 || count->run == 0 ||
-			    evsel->counts->scaled == -1) {
+			if (evsel__count_has_error(evsel, count, config) && !ps_aggr->failed) {
 				ps_aggr->counts.val = 0;
 				ps_aggr->counts.ena = 0;
 				ps_aggr->counts.run = 0;
-- 
2.38.0.413.g74048e4d9e-goog


* [PATCH 11/20] perf stat: Aggregate per-thread stats using evsel->stats->aggr
  2022-10-18  2:02 [PATCHSET 00/20] perf stat: Cleanup counter aggregation (v3) Namhyung Kim
                   ` (9 preceding siblings ...)
  2022-10-18  2:02 ` [PATCH 10/20] perf stat: Factor out evsel__count_has_error() Namhyung Kim
@ 2022-10-18  2:02 ` Namhyung Kim
  2022-10-18  2:02 ` [PATCH 12/20] perf stat: Allocate aggr counts for recorded data Namhyung Kim
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2022-10-18  2:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun, Michael Petlan

Per-thread aggregation doesn't use the CPU numbers, but the logic should
be the same.  Initialize the cpu_aggr_map separately for AGGR_THREAD and
use the thread map idx to aggregate the counter values.

Acked-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-stat.c | 31 +++++++++++++++++++++++++++++++
 tools/perf/util/stat.c    | 24 ++++++++++++++++++++++--
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index abede56d79b6..6777fef0d56c 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1465,6 +1465,21 @@ static int perf_stat_init_aggr_mode(void)
 		stat_config.aggr_get_id = aggr_mode__get_id(stat_config.aggr_mode);
 	}
 
+	if (stat_config.aggr_mode == AGGR_THREAD) {
+		nr = perf_thread_map__nr(evsel_list->core.threads);
+		stat_config.aggr_map = cpu_aggr_map__empty_new(nr);
+		if (stat_config.aggr_map == NULL)
+			return -ENOMEM;
+
+		for (int s = 0; s < nr; s++) {
+			struct aggr_cpu_id id = aggr_cpu_id__empty();
+
+			id.thread_idx = s;
+			stat_config.aggr_map->map[s] = id;
+		}
+		return 0;
+	}
+
 	/*
 	 * The evsel_list->cpus is the base we operate on,
 	 * taking the highest cpu number to be the size of
@@ -1674,6 +1689,22 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
 	aggr_cpu_id_get_t get_id = aggr_mode__get_aggr_file(stat_config.aggr_mode);
 	bool needs_sort = stat_config.aggr_mode != AGGR_NONE;
 
+	if (stat_config.aggr_mode == AGGR_THREAD) {
+		int nr = perf_thread_map__nr(evsel_list->core.threads);
+
+		stat_config.aggr_map = cpu_aggr_map__empty_new(nr);
+		if (stat_config.aggr_map == NULL)
+			return -ENOMEM;
+
+		for (int s = 0; s < nr; s++) {
+			struct aggr_cpu_id id = aggr_cpu_id__empty();
+
+			id.thread_idx = s;
+			stat_config.aggr_map->map[s] = id;
+		}
+		return 0;
+	}
+
 	if (!get_id)
 		return 0;
 
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index dc075d5a0f72..5b04c9d16156 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -422,6 +422,24 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
 		evsel__compute_deltas(evsel, cpu_map_idx, thread, count);
 	perf_counts_values__scale(count, config->scale, NULL);
 
+	if (config->aggr_mode == AGGR_THREAD) {
+		struct perf_counts_values *aggr_counts = &ps->aggr[thread].counts;
+
+		/*
+		 * Skip value 0 when enabling --per-thread globally,
+		 * otherwise too many 0 output.
+		 */
+		if (count->val == 0 && config->system_wide)
+			return 0;
+
+		ps->aggr[thread].nr++;
+
+		aggr_counts->val += count->val;
+		aggr_counts->ena += count->ena;
+		aggr_counts->run += count->run;
+		goto update;
+	}
+
 	if (ps->aggr) {
 		struct perf_cpu cpu = perf_cpu_map__cpu(evsel->core.cpus, cpu_map_idx);
 		struct aggr_cpu_id aggr_id = config->aggr_get_id(config, cpu);
@@ -436,8 +454,9 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
 			ps_aggr->nr++;
 
 			/*
-			 * When any result is bad, make them all to give
-			 * consistent output in interval mode.
+			 * When any result is bad, make them all to give consistent output
+			 * in interval mode.  But per-task counters can have 0 enabled time
+			 * when some tasks are idle.
 			 */
 			if (evsel__count_has_error(evsel, count, config) && !ps_aggr->failed) {
 				ps_aggr->counts.val = 0;
@@ -455,6 +474,7 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
 		}
 	}
 
+update:
 	switch (config->aggr_mode) {
 	case AGGR_THREAD:
 	case AGGR_CORE:
-- 
2.38.0.413.g74048e4d9e-goog


* [PATCH 12/20] perf stat: Allocate aggr counts for recorded data
  2022-10-18  2:02 [PATCHSET 00/20] perf stat: Cleanup counter aggregation (v3) Namhyung Kim
                   ` (10 preceding siblings ...)
  2022-10-18  2:02 ` [PATCH 11/20] perf stat: Aggregate per-thread stats using evsel->stats->aggr Namhyung Kim
@ 2022-10-18  2:02 ` Namhyung Kim
  2022-10-18  2:02 ` [PATCH 13/20] perf stat: Reset aggr counts for each interval Namhyung Kim
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2022-10-18  2:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun, Michael Petlan

The aggr_mode is set in process_stat_config_event(), which means the
earlier evlist__alloc_stats() cannot allocate the aggr counts due to the
missing aggr_mode.

Do it after setting the aggr_map, using evlist__alloc_aggr_stats().

Acked-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-stat.c | 20 +++++++++++++++-----
 tools/perf/util/stat.c    | 39 +++++++++++++++++++++++++++++++--------
 tools/perf/util/stat.h    |  2 ++
 3 files changed, 48 insertions(+), 13 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 6777fef0d56c..2a6a5d0c5563 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1342,7 +1342,11 @@ static struct aggr_cpu_id perf_stat__get_cpu(struct perf_stat_config *config __m
 static struct aggr_cpu_id perf_stat__get_aggr(struct perf_stat_config *config,
 					      aggr_get_id_t get_id, struct perf_cpu cpu)
 {
-	struct aggr_cpu_id id = aggr_cpu_id__empty();
+	struct aggr_cpu_id id;
+
+	/* per-process mode - should use global aggr mode */
+	if (cpu.cpu == -1)
+		return get_id(config, cpu);
 
 	if (aggr_cpu_id__is_empty(&config->cpus_aggr_map->map[cpu.cpu]))
 		config->cpus_aggr_map->map[cpu.cpu] = get_id(config, cpu);
@@ -2125,17 +2129,23 @@ int process_stat_config_event(struct perf_session *session,
 	if (perf_cpu_map__empty(st->cpus)) {
 		if (st->aggr_mode != AGGR_UNSET)
 			pr_warning("warning: processing task data, aggregation mode not set\n");
-		return 0;
-	}
-
-	if (st->aggr_mode != AGGR_UNSET)
+	} else if (st->aggr_mode != AGGR_UNSET) {
 		stat_config.aggr_mode = st->aggr_mode;
+	}
 
 	if (perf_stat.data.is_pipe)
 		perf_stat_init_aggr_mode();
 	else
 		perf_stat_init_aggr_mode_file(st);
 
+	if (stat_config.aggr_map) {
+		int nr_aggr = stat_config.aggr_map->nr;
+
+		if (evlist__alloc_aggr_stats(session->evlist, nr_aggr) < 0) {
+			pr_err("cannot allocate aggr counts\n");
+			return -1;
+		}
+	}
 	return 0;
 }
 
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 5b04c9d16156..1b9048115a18 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -141,6 +141,31 @@ static void evsel__reset_stat_priv(struct evsel *evsel)
 		memset(aggr, 0, sizeof(*aggr) * ps->nr_aggr);
 }
 
+static int evsel__alloc_aggr_stats(struct evsel *evsel, int nr_aggr)
+{
+	struct perf_stat_evsel *ps = evsel->stats;
+
+	if (ps == NULL)
+		return 0;
+
+	ps->nr_aggr = nr_aggr;
+	ps->aggr = calloc(nr_aggr, sizeof(*ps->aggr));
+	if (ps->aggr == NULL)
+		return -ENOMEM;
+
+	return 0;
+}
+
+int evlist__alloc_aggr_stats(struct evlist *evlist, int nr_aggr)
+{
+	struct evsel *evsel;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel__alloc_aggr_stats(evsel, nr_aggr) < 0)
+			return -1;
+	}
+	return 0;
+}
 
 static int evsel__alloc_stat_priv(struct evsel *evsel, int nr_aggr)
 {
@@ -150,16 +175,14 @@ static int evsel__alloc_stat_priv(struct evsel *evsel, int nr_aggr)
 	if (ps == NULL)
 		return -ENOMEM;
 
-	if (nr_aggr) {
-		ps->nr_aggr = nr_aggr;
-		ps->aggr = calloc(nr_aggr, sizeof(*ps->aggr));
-		if (ps->aggr == NULL) {
-			free(ps);
-			return -ENOMEM;
-		}
+	evsel->stats = ps;
+
+	if (nr_aggr && evsel__alloc_aggr_stats(evsel, nr_aggr) < 0) {
+		evsel->stats = NULL;
+		free(ps);
+		return -ENOMEM;
 	}
 
-	evsel->stats = ps;
 	perf_stat_evsel_id_init(evsel);
 	evsel__reset_stat_priv(evsel);
 	return 0;
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 0980875b9be1..4c00f814bd79 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -275,6 +275,8 @@ void evlist__reset_prev_raw_counts(struct evlist *evlist);
 void evlist__copy_prev_raw_counts(struct evlist *evlist);
 void evlist__save_aggr_prev_raw_counts(struct evlist *evlist);
 
+int evlist__alloc_aggr_stats(struct evlist *evlist, int nr_aggr);
+
 int perf_stat_process_counter(struct perf_stat_config *config,
 			      struct evsel *counter);
 struct perf_tool;
-- 
2.38.0.413.g74048e4d9e-goog


* [PATCH 13/20] perf stat: Reset aggr counts for each interval
  2022-10-18  2:02 [PATCHSET 00/20] perf stat: Cleanup counter aggregation (v3) Namhyung Kim
                   ` (11 preceding siblings ...)
  2022-10-18  2:02 ` [PATCH 12/20] perf stat: Allocate aggr counts for recorded data Namhyung Kim
@ 2022-10-18  2:02 ` Namhyung Kim
  2022-10-18  2:02 ` [PATCH 14/20] perf stat: Split process_counters() Namhyung Kim
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2022-10-18  2:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun, Michael Petlan

The evsel->stats->aggr->counts should be reset for each interval since
we want to use the values directly for display.
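
For illustration only (this is not part of the patch; the struct and
function names below are made up), a tiny standalone sketch of why the
per-aggr counts need to be zeroed at the start of every interval once
the display consumes them directly:

  #include <stdio.h>
  #include <string.h>

  /* simplified stand-in for the per-aggr-id counter state */
  struct toy_aggr { unsigned long long val; };

  static void reset_aggr(struct toy_aggr *aggr, int nr)
  {
          /* like evlist__reset_aggr_stats(): forget the previous interval */
          memset(aggr, 0, sizeof(*aggr) * nr);
  }

  static void read_counters(struct toy_aggr *aggr, int nr, unsigned long long delta)
  {
          for (int i = 0; i < nr; i++)
                  aggr[i].val += delta;   /* counts read in this interval */
  }

  int main(void)
  {
          struct toy_aggr aggr[2];
          unsigned long long fake_delta[] = { 100, 250, 80 };

          for (int interval = 0; interval < 3; interval++) {
                  reset_aggr(aggr, 2);    /* without this, intervals would add up */
                  read_counters(aggr, 2, fake_delta[interval]);
                  printf("interval %d: aggr[0]=%llu\n", interval, aggr[0].val);
          }
          return 0;
  }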

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-stat.c |  3 +++
 tools/perf/util/stat.c    | 20 +++++++++++++++++---
 tools/perf/util/stat.h    |  1 +
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 2a6a5d0c5563..bff28a199dfd 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -492,6 +492,8 @@ static void process_interval(void)
 	diff_timespec(&rs, &ts, &ref_time);
 
 	perf_stat__reset_shadow_per_stat(&rt_stat);
+	evlist__reset_aggr_stats(evsel_list);
+
 	read_counters(&rs);
 
 	if (STAT_RECORD) {
@@ -965,6 +967,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 
 		evlist__copy_prev_raw_counts(evsel_list);
 		evlist__reset_prev_raw_counts(evsel_list);
+		evlist__reset_aggr_stats(evsel_list);
 		perf_stat__reset_shadow_per_stat(&rt_stat);
 	} else {
 		update_stats(&walltime_nsecs_stats, t1 - t0);
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 1b9048115a18..a4066f0d3637 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -130,17 +130,23 @@ static void perf_stat_evsel_id_init(struct evsel *evsel)
 	}
 }
 
-static void evsel__reset_stat_priv(struct evsel *evsel)
+static void evsel__reset_aggr_stats(struct evsel *evsel)
 {
 	struct perf_stat_evsel *ps = evsel->stats;
 	struct perf_stat_aggr *aggr = ps->aggr;
 
-	init_stats(&ps->res_stats);
-
 	if (aggr)
 		memset(aggr, 0, sizeof(*aggr) * ps->nr_aggr);
 }
 
+static void evsel__reset_stat_priv(struct evsel *evsel)
+{
+	struct perf_stat_evsel *ps = evsel->stats;
+
+	init_stats(&ps->res_stats);
+	evsel__reset_aggr_stats(evsel);
+}
+
 static int evsel__alloc_aggr_stats(struct evsel *evsel, int nr_aggr)
 {
 	struct perf_stat_evsel *ps = evsel->stats;
@@ -276,6 +282,14 @@ void evlist__reset_stats(struct evlist *evlist)
 	}
 }
 
+void evlist__reset_aggr_stats(struct evlist *evlist)
+{
+	struct evsel *evsel;
+
+	evlist__for_each_entry(evlist, evsel)
+		evsel__reset_aggr_stats(evsel);
+}
+
 void evlist__reset_prev_raw_counts(struct evlist *evlist)
 {
 	struct evsel *evsel;
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 4c00f814bd79..809f9f0aff0c 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -276,6 +276,7 @@ void evlist__copy_prev_raw_counts(struct evlist *evlist);
 void evlist__save_aggr_prev_raw_counts(struct evlist *evlist);
 
 int evlist__alloc_aggr_stats(struct evlist *evlist, int nr_aggr);
+void evlist__reset_aggr_stats(struct evlist *evlist);
 
 int perf_stat_process_counter(struct perf_stat_config *config,
 			      struct evsel *counter);
-- 
2.38.0.413.g74048e4d9e-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 14/20] perf stat: Split process_counters()
  2022-10-18  2:02 [PATCHSET 00/20] perf stat: Cleanup counter aggregation (v3) Namhyung Kim
                   ` (12 preceding siblings ...)
  2022-10-18  2:02 ` [PATCH 13/20] perf stat: Reset aggr counts for each interval Namhyung Kim
@ 2022-10-18  2:02 ` Namhyung Kim
  2022-10-18  2:02 ` [PATCH 15/20] perf stat: Add perf_stat_merge_counters() Namhyung Kim
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2022-10-18  2:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun, Michael Petlan

It will do more processing with aggregation later.  Let's split the
function so that it can be shared by process_stat_round_event() too.

Acked-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-stat.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index bff28a199dfd..838d29590bed 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -465,15 +465,19 @@ static int read_bpf_map_counters(void)
 	return 0;
 }
 
-static void read_counters(struct timespec *rs)
+static int read_counters(struct timespec *rs)
 {
-	struct evsel *counter;
-
 	if (!stat_config.stop_read_counter) {
 		if (read_bpf_map_counters() ||
 		    read_affinity_counters(rs))
-			return;
+			return -1;
 	}
+	return 0;
+}
+
+static void process_counters(void)
+{
+	struct evsel *counter;
 
 	evlist__for_each_entry(evsel_list, counter) {
 		if (counter->err)
@@ -494,7 +498,8 @@ static void process_interval(void)
 	perf_stat__reset_shadow_per_stat(&rt_stat);
 	evlist__reset_aggr_stats(evsel_list);
 
-	read_counters(&rs);
+	if (read_counters(&rs) == 0)
+		process_counters();
 
 	if (STAT_RECORD) {
 		if (WRITE_STAT_ROUND_EVENT(rs.tv_sec * NSEC_PER_SEC + rs.tv_nsec, INTERVAL))
@@ -980,7 +985,8 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	 * avoid arbitrary skew, we must read all counters before closing any
 	 * group leaders.
 	 */
-	read_counters(&(struct timespec) { .tv_nsec = t1-t0 });
+	if (read_counters(&(struct timespec) { .tv_nsec = t1-t0 }) == 0)
+		process_counters();
 
 	/*
 	 * We need to keep evsel_list alive, because it's processed
@@ -2099,13 +2105,11 @@ static int process_stat_round_event(struct perf_session *session,
 				    union perf_event *event)
 {
 	struct perf_record_stat_round *stat_round = &event->stat_round;
-	struct evsel *counter;
 	struct timespec tsh, *ts = NULL;
 	const char **argv = session->header.env.cmdline_argv;
 	int argc = session->header.env.nr_cmdline;
 
-	evlist__for_each_entry(evsel_list, counter)
-		perf_stat_process_counter(&stat_config, counter);
+	process_counters();
 
 	if (stat_round->type == PERF_STAT_ROUND_TYPE__FINAL)
 		update_stats(&walltime_nsecs_stats, stat_round->time);
-- 
2.38.0.413.g74048e4d9e-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 15/20] perf stat: Add perf_stat_merge_counters()
  2022-10-18  2:02 [PATCHSET 00/20] perf stat: Cleanup counter aggregation (v3) Namhyung Kim
                   ` (13 preceding siblings ...)
  2022-10-18  2:02 ` [PATCH 14/20] perf stat: Split process_counters() Namhyung Kim
@ 2022-10-18  2:02 ` Namhyung Kim
  2022-10-18  2:02 ` [PATCH 16/20] perf stat: Add perf_stat_process_percore() Namhyung Kim
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2022-10-18  2:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun, Michael Petlan

The perf_stat_merge_counters() aggregates the same events on different
PMUs, as in the case of uncore or hybrid.  The same logic lives in the
stat-display routines, but I think it should be handled when the event
counters are processed.

As it works on the aggr_counters, it doesn't change the output yet.
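
As a rough standalone model (simplified, made-up types rather than the
real perf structures; the event/PMU names are only placeholders), the
merge boils down to summing val/ena/run of an alias into the first
evsel for each aggregation index:

  #include <stdio.h>
  #include <string.h>

  struct toy_counts { unsigned long long val, ena, run; };

  struct toy_evsel {
          const char *name;
          const char *pmu;
          int nr_aggr;
          struct toy_counts aggr[4];
          int merged;
  };

  /* same event name on a different PMU counts as an alias here */
  static int is_alias(const struct toy_evsel *a, const struct toy_evsel *b)
  {
          return !strcmp(a->name, b->name) && strcmp(a->pmu, b->pmu);
  }

  static void merge_alias(struct toy_evsel *evsel, struct toy_evsel *alias)
  {
          for (int i = 0; i < evsel->nr_aggr; i++) {
                  evsel->aggr[i].val += alias->aggr[i].val;
                  evsel->aggr[i].ena += alias->aggr[i].ena;
                  evsel->aggr[i].run += alias->aggr[i].run;
          }
          alias->merged = 1;      /* display will skip merged evsels later */
  }

  int main(void)
  {
          struct toy_evsel a = { "cas_count_read", "uncore_imc_0", 1, { { 10, 100, 100 } }, 0 };
          struct toy_evsel b = { "cas_count_read", "uncore_imc_1", 1, { { 15, 100, 100 } }, 0 };

          if (is_alias(&a, &b))
                  merge_alias(&a, &b);

          printf("%s: val=%llu (alias merged=%d)\n", a.name, a.aggr[0].val, b.merged);
          return 0;
  }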

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-stat.c |  2 +
 tools/perf/util/stat.c    | 96 +++++++++++++++++++++++++++++++++++++++
 tools/perf/util/stat.h    |  2 +
 3 files changed, 100 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 838d29590bed..371d6e896942 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -486,6 +486,8 @@ static void process_counters(void)
 			pr_warning("failed to process counter %s\n", counter->name);
 		counter->err = 0;
 	}
+
+	perf_stat_merge_counters(&stat_config, evsel_list);
 }
 
 static void process_interval(void)
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index a4066f0d3637..aff1e7390585 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -595,6 +595,102 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 	return 0;
 }
 
+static int evsel__merge_aggr_counters(struct evsel *evsel, struct evsel *alias)
+{
+	struct perf_stat_evsel *ps_a = evsel->stats;
+	struct perf_stat_evsel *ps_b = alias->stats;
+	int i;
+
+	if (ps_a->aggr == NULL && ps_b->aggr == NULL)
+		return 0;
+
+	if (ps_a->nr_aggr != ps_b->nr_aggr) {
+		pr_err("Unmatched aggregation mode between aliases\n");
+		return -1;
+	}
+
+	for (i = 0; i < ps_a->nr_aggr; i++) {
+		struct perf_counts_values *aggr_counts_a = &ps_a->aggr[i].counts;
+		struct perf_counts_values *aggr_counts_b = &ps_b->aggr[i].counts;
+
+		/* NB: don't increase aggr.nr for aliases */
+
+		aggr_counts_a->val += aggr_counts_b->val;
+		aggr_counts_a->ena += aggr_counts_b->ena;
+		aggr_counts_a->run += aggr_counts_b->run;
+	}
+
+	return 0;
+}
+/* events should have the same name, scale, unit, cgroup but on different PMUs */
+static bool evsel__is_alias(struct evsel *evsel_a, struct evsel *evsel_b)
+{
+	if (strcmp(evsel__name(evsel_a), evsel__name(evsel_b)))
+		return false;
+
+	if (evsel_a->scale != evsel_b->scale)
+		return false;
+
+	if (evsel_a->cgrp != evsel_b->cgrp)
+		return false;
+
+	if (strcmp(evsel_a->unit, evsel_b->unit))
+		return false;
+
+	if (evsel__is_clock(evsel_a) != evsel__is_clock(evsel_b))
+		return false;
+
+	return !!strcmp(evsel_a->pmu_name, evsel_b->pmu_name);
+}
+
+static void evsel__merge_aliases(struct evsel *evsel)
+{
+	struct evlist *evlist = evsel->evlist;
+	struct evsel *alias;
+
+	alias = list_prepare_entry(evsel, &(evlist->core.entries), core.node);
+	list_for_each_entry_continue(alias, &evlist->core.entries, core.node) {
+		/* Merge the same events on different PMUs. */
+		if (evsel__is_alias(evsel, alias)) {
+			evsel__merge_aggr_counters(evsel, alias);
+			alias->merged_stat = true;
+		}
+	}
+}
+
+static bool evsel__should_merge_hybrid(struct evsel *evsel, struct perf_stat_config *config)
+{
+	struct perf_pmu *pmu;
+
+	if (!config->hybrid_merge)
+		return false;
+
+	pmu = evsel__find_pmu(evsel);
+	return pmu && pmu->is_hybrid;
+}
+
+static void evsel__merge_stats(struct evsel *evsel, struct perf_stat_config *config)
+{
+	/* this evsel is already merged */
+	if (evsel->merged_stat)
+		return;
+
+	if (evsel->auto_merge_stats || evsel__should_merge_hybrid(evsel, config))
+		evsel__merge_aliases(evsel);
+}
+
+/* merge the same uncore and hybrid events if requested */
+void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *evlist)
+{
+	struct evsel *evsel;
+
+	if (config->no_merge)
+		return;
+
+	evlist__for_each_entry(evlist, evsel)
+		evsel__merge_stats(evsel, config);
+}
+
 int perf_event__process_stat_event(struct perf_session *session,
 				   union perf_event *event)
 {
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 809f9f0aff0c..728bbc823b0d 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -280,6 +280,8 @@ void evlist__reset_aggr_stats(struct evlist *evlist);
 
 int perf_stat_process_counter(struct perf_stat_config *config,
 			      struct evsel *counter);
+void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *evlist);
+
 struct perf_tool;
 union perf_event;
 struct perf_session;
-- 
2.38.0.413.g74048e4d9e-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 16/20] perf stat: Add perf_stat_process_percore()
  2022-10-18  2:02 [PATCHSET 00/20] perf stat: Cleanup counter aggregation (v3) Namhyung Kim
                   ` (14 preceding siblings ...)
  2022-10-18  2:02 ` [PATCH 15/20] perf stat: Add perf_stat_merge_counters() Namhyung Kim
@ 2022-10-18  2:02 ` Namhyung Kim
  2022-10-18  2:02 ` [PATCH 17/20] perf stat: Add perf_stat_process_shadow_stats() Namhyung Kim
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2022-10-18  2:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun, Michael Petlan

The perf_stat_process_percore() aggregates counts for an event per core
even if the aggr_mode is AGGR_NONE.  This is enabled when the user
requests it on the command line.

To handle that, it keeps the per-cpu counts at first, then aggregates
the counts that share the same core id into the aggr->counts and writes
the values back for each cpu.

Later, per-core events will skip the sibling CPUs unless the
percore-show-thread option is given.  In that case, it can simply print
all cpu stats with the updated (per-core) values.
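
A tiny standalone sketch of that two-pass idea, with a made-up
cpu-to-core mapping purely for illustration:

  #include <stdio.h>

  #define NR_CPUS 4

  /* toy model: cpu index -> core id, plus one count per cpu */
  static const int core_of[NR_CPUS]      = { 0, 1, 0, 1 };
  static unsigned long long cnt[NR_CPUS] = { 100, 70, 200, 30 };

  int main(void)
  {
          unsigned long long per_core[NR_CPUS] = { 0 };

          /* pass 1: collect the counts that share the same core id */
          for (int cpu = 0; cpu < NR_CPUS; cpu++)
                  per_core[core_of[cpu]] += cnt[cpu];

          /* pass 2: write the aggregated value back to every cpu */
          for (int cpu = 0; cpu < NR_CPUS; cpu++)
                  cnt[cpu] = per_core[core_of[cpu]];

          for (int cpu = 0; cpu < NR_CPUS; cpu++)
                  printf("cpu %d (core %d): %llu\n", cpu, core_of[cpu], cnt[cpu]);
          return 0;
  }

Both CPUs of core 0 end up with 300 and both CPUs of core 1 with 100,
which is exactly what the later display code will consume.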

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-stat.c |  1 +
 tools/perf/util/stat.c    | 71 +++++++++++++++++++++++++++++++++++++++
 tools/perf/util/stat.h    |  3 ++
 3 files changed, 75 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 371d6e896942..d6a006e41da0 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -488,6 +488,7 @@ static void process_counters(void)
 	}
 
 	perf_stat_merge_counters(&stat_config, evsel_list);
+	perf_stat_process_percore(&stat_config, evsel_list);
 }
 
 static void process_interval(void)
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index aff1e7390585..26c48ef7ca92 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -691,6 +691,77 @@ void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *ev
 		evsel__merge_stats(evsel, config);
 }
 
+static void evsel__update_percore_stats(struct evsel *evsel, struct aggr_cpu_id *core_id)
+{
+	struct perf_stat_evsel *ps = evsel->stats;
+	struct perf_counts_values counts = { 0, };
+	struct aggr_cpu_id id;
+	struct perf_cpu cpu;
+	int idx;
+
+	/* collect per-core counts */
+	perf_cpu_map__for_each_cpu(cpu, idx, evsel->core.cpus) {
+		struct perf_stat_aggr *aggr = &ps->aggr[idx];
+
+		id = aggr_cpu_id__core(cpu, NULL);
+		if (!aggr_cpu_id__equal(core_id, &id))
+			continue;
+
+		counts.val += aggr->counts.val;
+		counts.ena += aggr->counts.ena;
+		counts.run += aggr->counts.run;
+	}
+
+	/* update aggregated per-core counts for each CPU */
+	perf_cpu_map__for_each_cpu(cpu, idx, evsel->core.cpus) {
+		struct perf_stat_aggr *aggr = &ps->aggr[idx];
+
+		id = aggr_cpu_id__core(cpu, NULL);
+		if (!aggr_cpu_id__equal(core_id, &id))
+			continue;
+
+		aggr->counts.val = counts.val;
+		aggr->counts.ena = counts.ena;
+		aggr->counts.run = counts.run;
+
+		aggr->used = true;
+	}
+}
+
+/* we have an aggr_map for cpu, but want to aggregate the counters per-core */
+static void evsel__process_percore(struct evsel *evsel)
+{
+	struct perf_stat_evsel *ps = evsel->stats;
+	struct aggr_cpu_id core_id;
+	struct perf_cpu cpu;
+	int idx;
+
+	if (!evsel->percore)
+		return;
+
+	perf_cpu_map__for_each_cpu(cpu, idx, evsel->core.cpus) {
+		struct perf_stat_aggr *aggr = &ps->aggr[idx];
+
+		if (aggr->used)
+			continue;
+
+		core_id = aggr_cpu_id__core(cpu, NULL);
+		evsel__update_percore_stats(evsel, &core_id);
+	}
+}
+
+/* process cpu stats on per-core events */
+void perf_stat_process_percore(struct perf_stat_config *config, struct evlist *evlist)
+{
+	struct evsel *evsel;
+
+	if (config->aggr_mode != AGGR_NONE)
+		return;
+
+	evlist__for_each_entry(evlist, evsel)
+		evsel__process_percore(evsel);
+}
+
 int perf_event__process_stat_event(struct perf_session *session,
 				   union perf_event *event)
 {
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 728bbc823b0d..d23f8743e442 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -51,6 +51,8 @@ struct perf_stat_aggr {
 	int				nr;
 	/* whether any entry has failed to read/process event */
 	bool				failed;
+	/* to mark this data is processed already */
+	bool				used;
 };
 
 /* per-evsel event stats */
@@ -281,6 +283,7 @@ void evlist__reset_aggr_stats(struct evlist *evlist);
 int perf_stat_process_counter(struct perf_stat_config *config,
 			      struct evsel *counter);
 void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *evlist);
+void perf_stat_process_percore(struct perf_stat_config *config, struct evlist *evlist);
 
 struct perf_tool;
 union perf_event;
-- 
2.38.0.413.g74048e4d9e-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 17/20] perf stat: Add perf_stat_process_shadow_stats()
  2022-10-18  2:02 [PATCHSET 00/20] perf stat: Cleanup counter aggregation (v3) Namhyung Kim
                   ` (15 preceding siblings ...)
  2022-10-18  2:02 ` [PATCH 16/20] perf stat: Add perf_stat_process_percore() Namhyung Kim
@ 2022-10-18  2:02 ` Namhyung Kim
  2022-10-18  2:02 ` [PATCH 18/20] perf stat: Display event stats using aggr counts Namhyung Kim
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2022-10-18  2:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun, Michael Petlan

This function updates the shadow stats using the aggregated counts
uniformly, since it uses the aggr_counts for every aggr mode.

It'd have duplicate shadow stats for each item for now since the
display routines will update them once again.  But that'd be fine as it
shows the average values, and the duplication will be gone eventually.

Acked-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-stat.c |  1 +
 tools/perf/util/stat.c    | 50 ++++++++++++++++++++-------------------
 tools/perf/util/stat.h    |  1 +
 3 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index d6a006e41da0..d7c52cef70a3 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -489,6 +489,7 @@ static void process_counters(void)
 
 	perf_stat_merge_counters(&stat_config, evsel_list);
 	perf_stat_process_percore(&stat_config, evsel_list);
+	perf_stat_process_shadow_stats(&stat_config, evsel_list);
 }
 
 static void process_interval(void)
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 26c48ef7ca92..c0955a0427ab 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -474,7 +474,7 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
 		aggr_counts->val += count->val;
 		aggr_counts->ena += count->ena;
 		aggr_counts->run += count->run;
-		goto update;
+		return 0;
 	}
 
 	if (ps->aggr) {
@@ -511,32 +511,10 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
 		}
 	}
 
-update:
-	switch (config->aggr_mode) {
-	case AGGR_THREAD:
-	case AGGR_CORE:
-	case AGGR_DIE:
-	case AGGR_SOCKET:
-	case AGGR_NODE:
-	case AGGR_NONE:
-		if ((config->aggr_mode == AGGR_NONE) && (!evsel->percore)) {
-			perf_stat__update_shadow_stats(evsel, count->val,
-						       cpu_map_idx, &rt_stat);
-		}
-
-		if (config->aggr_mode == AGGR_THREAD) {
-			perf_stat__update_shadow_stats(evsel, count->val,
-						       thread, &rt_stat);
-		}
-		break;
-	case AGGR_GLOBAL:
+	if (config->aggr_mode == AGGR_GLOBAL) {
 		aggr->val += count->val;
 		aggr->ena += count->ena;
 		aggr->run += count->run;
-	case AGGR_UNSET:
-	case AGGR_MAX:
-	default:
-		break;
 	}
 
 	return 0;
@@ -762,6 +740,30 @@ void perf_stat_process_percore(struct perf_stat_config *config, struct evlist *e
 		evsel__process_percore(evsel);
 }
 
+static void evsel__update_shadow_stats(struct evsel *evsel)
+{
+	struct perf_stat_evsel *ps = evsel->stats;
+	int i;
+
+	if (ps->aggr == NULL)
+		return;
+
+	for (i = 0; i < ps->nr_aggr; i++) {
+		struct perf_counts_values *aggr_counts = &ps->aggr[i].counts;
+
+		perf_stat__update_shadow_stats(evsel, aggr_counts->val, i, &rt_stat);
+	}
+}
+
+void perf_stat_process_shadow_stats(struct perf_stat_config *config __maybe_unused,
+				    struct evlist *evlist)
+{
+	struct evsel *evsel;
+
+	evlist__for_each_entry(evlist, evsel)
+		evsel__update_shadow_stats(evsel);
+}
+
 int perf_event__process_stat_event(struct perf_session *session,
 				   union perf_event *event)
 {
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index d23f8743e442..3d413ba8c68a 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -284,6 +284,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 			      struct evsel *counter);
 void perf_stat_merge_counters(struct perf_stat_config *config, struct evlist *evlist);
 void perf_stat_process_percore(struct perf_stat_config *config, struct evlist *evlist);
+void perf_stat_process_shadow_stats(struct perf_stat_config *config, struct evlist *evlist);
 
 struct perf_tool;
 union perf_event;
-- 
2.38.0.413.g74048e4d9e-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 18/20] perf stat: Display event stats using aggr counts
  2022-10-18  2:02 [PATCHSET 00/20] perf stat: Cleanup counter aggregation (v3) Namhyung Kim
                   ` (16 preceding siblings ...)
  2022-10-18  2:02 ` [PATCH 17/20] perf stat: Add perf_stat_process_shadow_stats() Namhyung Kim
@ 2022-10-18  2:02 ` Namhyung Kim
  2022-12-05 10:51   ` Athira Rajeev
  2023-06-14 13:40   ` Jiri Olsa
  2022-10-18  2:02 ` [PATCH 19/20] perf stat: Display percore events properly Namhyung Kim
  2022-10-18  2:02 ` [PATCH 20/20] perf stat: Remove unused perf_counts.aggr field Namhyung Kim
  19 siblings, 2 replies; 28+ messages in thread
From: Namhyung Kim @ 2022-10-18  2:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun, Michael Petlan

Now aggr counts are ready for use.  Convert the display routines to use
the aggr counts and update the shadow stat with them.  It doesn't need
to aggregate counts or collect aliases anymore during the display.  Get
rid of now unused struct perf_aggr_thread_value.

Note that there's a difference in the display order among the aggr
modes.  For per-core/die/socket/node aggregation, it shows the relevant
events of the same unit together, whereas for global/thread/no
aggregation it shows the same event for different units together.  So
it still uses separate code paths to display them due to the ordering.
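
Roughly speaking (a toy illustration only, not real perf output), the
two orderings correspond to swapping the loop nesting:

  #include <stdio.h>

  static const char *events[] = { "cycles", "instructions" };
  static const char *units[]  = { "S0", "S1" };   /* e.g. sockets or cpus */

  int main(void)
  {
          /* per-core/die/socket/node: one unit at a time, all its events */
          for (int u = 0; u < 2; u++)
                  for (int e = 0; e < 2; e++)
                          printf("%s %s\n", units[u], events[e]);

          printf("---\n");

          /* global/thread/no aggregation: one event at a time, all units */
          for (int e = 0; e < 2; e++)
                  for (int u = 0; u < 2; u++)
                          printf("%s %s\n", units[u], events[e]);
          return 0;
  }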

One more thing to note is that it breaks the per-core event display for
now.  The next patch will fix it to produce output identical to the
current one.

Acked-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/stat-display.c | 421 ++++-----------------------------
 tools/perf/util/stat.c         |   5 -
 tools/perf/util/stat.h         |   9 -
 3 files changed, 49 insertions(+), 386 deletions(-)

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 4113aa86772f..bfae2784609c 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -442,31 +442,6 @@ static void print_metric_header(struct perf_stat_config *config,
 		fprintf(os->fh, "%*s ", config->metric_only_len, unit);
 }
 
-static int first_shadow_map_idx(struct perf_stat_config *config,
-				struct evsel *evsel, const struct aggr_cpu_id *id)
-{
-	struct perf_cpu_map *cpus = evsel__cpus(evsel);
-	struct perf_cpu cpu;
-	int idx;
-
-	if (config->aggr_mode == AGGR_NONE)
-		return perf_cpu_map__idx(cpus, id->cpu);
-
-	if (config->aggr_mode == AGGR_THREAD)
-		return id->thread_idx;
-
-	if (!config->aggr_get_id)
-		return 0;
-
-	perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
-		struct aggr_cpu_id cpu_id = config->aggr_get_id(config, cpu);
-
-		if (aggr_cpu_id__equal(&cpu_id, id))
-			return idx;
-	}
-	return 0;
-}
-
 static void abs_printout(struct perf_stat_config *config,
 			 struct aggr_cpu_id id, int nr, struct evsel *evsel, double avg)
 {
@@ -537,7 +512,7 @@ static bool is_mixed_hw_group(struct evsel *counter)
 static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int nr,
 		     struct evsel *counter, double uval,
 		     char *prefix, u64 run, u64 ena, double noise,
-		     struct runtime_stat *st)
+		     struct runtime_stat *st, int map_idx)
 {
 	struct perf_stat_output_ctx out;
 	struct outstate os = {
@@ -648,8 +623,7 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int
 		print_running(config, run, ena);
 	}
 
-	perf_stat__print_shadow_stats(config, counter, uval,
-				first_shadow_map_idx(config, counter, &id),
+	perf_stat__print_shadow_stats(config, counter, uval, map_idx,
 				&out, &config->metric_events, st);
 	if (!config->csv_output && !config->metric_only && !config->json_output) {
 		print_noise(config, counter, noise);
@@ -657,34 +631,6 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int
 	}
 }
 
-static void aggr_update_shadow(struct perf_stat_config *config,
-			       struct evlist *evlist)
-{
-	int idx, s;
-	struct perf_cpu cpu;
-	struct aggr_cpu_id s2, id;
-	u64 val;
-	struct evsel *counter;
-	struct perf_cpu_map *cpus;
-
-	for (s = 0; s < config->aggr_map->nr; s++) {
-		id = config->aggr_map->map[s];
-		evlist__for_each_entry(evlist, counter) {
-			cpus = evsel__cpus(counter);
-			val = 0;
-			perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
-				s2 = config->aggr_get_id(config, cpu);
-				if (!aggr_cpu_id__equal(&s2, &id))
-					continue;
-				val += perf_counts(counter->counts, idx, 0)->val;
-			}
-			perf_stat__update_shadow_stats(counter, val,
-					first_shadow_map_idx(config, counter, &id),
-					&rt_stat);
-		}
-	}
-}
-
 static void uniquify_event_name(struct evsel *counter)
 {
 	char *new_name;
@@ -721,137 +667,51 @@ static void uniquify_event_name(struct evsel *counter)
 	counter->uniquified_name = true;
 }
 
-static void collect_all_aliases(struct perf_stat_config *config, struct evsel *counter,
-			    void (*cb)(struct perf_stat_config *config, struct evsel *counter, void *data,
-				       bool first),
-			    void *data)
+static bool hybrid_uniquify(struct evsel *evsel, struct perf_stat_config *config)
 {
-	struct evlist *evlist = counter->evlist;
-	struct evsel *alias;
-
-	alias = list_prepare_entry(counter, &(evlist->core.entries), core.node);
-	list_for_each_entry_continue (alias, &evlist->core.entries, core.node) {
-		/* Merge events with the same name, etc. but on different PMUs. */
-		if (!strcmp(evsel__name(alias), evsel__name(counter)) &&
-			alias->scale == counter->scale &&
-			alias->cgrp == counter->cgrp &&
-			!strcmp(alias->unit, counter->unit) &&
-			evsel__is_clock(alias) == evsel__is_clock(counter) &&
-			strcmp(alias->pmu_name, counter->pmu_name)) {
-			alias->merged_stat = true;
-			cb(config, alias, data, false);
-		}
-	}
+	return evsel__is_hybrid(evsel) && !config->hybrid_merge;
 }
 
-static bool hybrid_merge(struct evsel *counter, struct perf_stat_config *config,
-			 bool check)
+static void uniquify_counter(struct perf_stat_config *config, struct evsel *counter)
 {
-	if (evsel__is_hybrid(counter)) {
-		if (check)
-			return config->hybrid_merge;
-		else
-			return !config->hybrid_merge;
-	}
-
-	return false;
-}
-
-static bool collect_data(struct perf_stat_config *config, struct evsel *counter,
-			    void (*cb)(struct perf_stat_config *config, struct evsel *counter, void *data,
-				       bool first),
-			    void *data)
-{
-	if (counter->merged_stat)
-		return false;
-	cb(config, counter, data, true);
-	if (config->no_merge || hybrid_merge(counter, config, false))
+	if (config->no_merge || hybrid_uniquify(counter, config))
 		uniquify_event_name(counter);
-	else if (counter->auto_merge_stats || hybrid_merge(counter, config, true))
-		collect_all_aliases(config, counter, cb, data);
-	return true;
-}
-
-struct aggr_data {
-	u64 ena, run, val;
-	struct aggr_cpu_id id;
-	int nr;
-	int cpu_map_idx;
-};
-
-static void aggr_cb(struct perf_stat_config *config,
-		    struct evsel *counter, void *data, bool first)
-{
-	struct aggr_data *ad = data;
-	int idx;
-	struct perf_cpu cpu;
-	struct perf_cpu_map *cpus;
-	struct aggr_cpu_id s2;
-
-	cpus = evsel__cpus(counter);
-	perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
-		struct perf_counts_values *counts;
-
-		s2 = config->aggr_get_id(config, cpu);
-		if (!aggr_cpu_id__equal(&s2, &ad->id))
-			continue;
-		if (first)
-			ad->nr++;
-		counts = perf_counts(counter->counts, idx, 0);
-		/*
-		 * When any result is bad, make them all to give
-		 * consistent output in interval mode.
-		 */
-		if (counts->ena == 0 || counts->run == 0 ||
-		    counter->counts->scaled == -1) {
-			ad->ena = 0;
-			ad->run = 0;
-			break;
-		}
-		ad->val += counts->val;
-		ad->ena += counts->ena;
-		ad->run += counts->run;
-	}
 }
 
 static void print_counter_aggrdata(struct perf_stat_config *config,
 				   struct evsel *counter, int s,
 				   char *prefix, bool metric_only,
-				   bool *first, struct perf_cpu cpu)
+				   bool *first)
 {
-	struct aggr_data ad;
 	FILE *output = config->output;
 	u64 ena, run, val;
-	int nr;
-	struct aggr_cpu_id id;
 	double uval;
+	struct perf_stat_evsel *ps = counter->stats;
+	struct perf_stat_aggr *aggr = &ps->aggr[s];
+	struct aggr_cpu_id id = config->aggr_map->map[s];
+	double avg = aggr->counts.val;
 
-	ad.id = id = config->aggr_map->map[s];
-	ad.val = ad.ena = ad.run = 0;
-	ad.nr = 0;
-	if (!collect_data(config, counter, aggr_cb, &ad))
+	if (counter->supported && aggr->nr == 0)
 		return;
 
-	if (perf_pmu__has_hybrid() && ad.ena == 0)
-		return;
+	uniquify_counter(config, counter);
+
+	val = aggr->counts.val;
+	ena = aggr->counts.ena;
+	run = aggr->counts.run;
 
-	nr = ad.nr;
-	ena = ad.ena;
-	run = ad.run;
-	val = ad.val;
 	if (*first && metric_only) {
 		*first = false;
-		aggr_printout(config, counter, id, nr);
+		aggr_printout(config, counter, id, aggr->nr);
 	}
 	if (prefix && !metric_only)
 		fprintf(output, "%s", prefix);
 
 	uval = val * counter->scale;
-	if (cpu.cpu != -1)
-		id = aggr_cpu_id__cpu(cpu, /*data=*/NULL);
 
-	printout(config, id, nr, counter, uval,
-		 prefix, run, ena, 1.0, &rt_stat);
+	printout(config, id, aggr->nr, counter, uval,
+		 prefix, run, ena, avg, &rt_stat, s);
+
 	if (!metric_only)
 		fputc('\n', output);
 }
@@ -869,8 +729,6 @@ static void print_aggr(struct perf_stat_config *config,
 	if (!config->aggr_map || !config->aggr_get_id)
 		return;
 
-	aggr_update_shadow(config, evlist);
-
 	/*
 	 * With metric_only everything is on a single line.
 	 * Without each counter has its own line.
@@ -881,188 +739,36 @@ static void print_aggr(struct perf_stat_config *config,
 
 		first = true;
 		evlist__for_each_entry(evlist, counter) {
+			if (counter->merged_stat)
+				continue;
+
 			print_counter_aggrdata(config, counter, s,
-					prefix, metric_only,
-					&first, (struct perf_cpu){ .cpu = -1 });
+					       prefix, metric_only,
+					       &first);
 		}
 		if (metric_only)
 			fputc('\n', output);
 	}
 }
 
-static int cmp_val(const void *a, const void *b)
-{
-	return ((struct perf_aggr_thread_value *)b)->val -
-		((struct perf_aggr_thread_value *)a)->val;
-}
-
-static struct perf_aggr_thread_value *sort_aggr_thread(
-					struct evsel *counter,
-					int *ret,
-					struct target *_target)
-{
-	int nthreads = perf_thread_map__nr(counter->core.threads);
-	int i = 0;
-	double uval;
-	struct perf_aggr_thread_value *buf;
-
-	buf = calloc(nthreads, sizeof(struct perf_aggr_thread_value));
-	if (!buf)
-		return NULL;
-
-	for (int thread = 0; thread < nthreads; thread++) {
-		int idx;
-		u64 ena = 0, run = 0, val = 0;
-
-		perf_cpu_map__for_each_idx(idx, evsel__cpus(counter)) {
-			struct perf_counts_values *counts =
-				perf_counts(counter->counts, idx, thread);
-
-			val += counts->val;
-			ena += counts->ena;
-			run += counts->run;
-		}
-
-		uval = val * counter->scale;
-
-		/*
-		 * Skip value 0 when enabling --per-thread globally,
-		 * otherwise too many 0 output.
-		 */
-		if (uval == 0.0 && target__has_per_thread(_target))
-			continue;
-
-		buf[i].counter = counter;
-		buf[i].id = aggr_cpu_id__empty();
-		buf[i].id.thread_idx = thread;
-		buf[i].uval = uval;
-		buf[i].val = val;
-		buf[i].run = run;
-		buf[i].ena = ena;
-		i++;
-	}
-
-	qsort(buf, i, sizeof(struct perf_aggr_thread_value), cmp_val);
-
-	if (ret)
-		*ret = i;
-
-	return buf;
-}
-
-static void print_aggr_thread(struct perf_stat_config *config,
-			      struct target *_target,
-			      struct evsel *counter, char *prefix)
-{
-	FILE *output = config->output;
-	int thread, sorted_threads;
-	struct aggr_cpu_id id;
-	struct perf_aggr_thread_value *buf;
-
-	buf = sort_aggr_thread(counter, &sorted_threads, _target);
-	if (!buf) {
-		perror("cannot sort aggr thread");
-		return;
-	}
-
-	for (thread = 0; thread < sorted_threads; thread++) {
-		if (prefix)
-			fprintf(output, "%s", prefix);
-
-		id = buf[thread].id;
-		printout(config, id, 0, buf[thread].counter, buf[thread].uval,
-			 prefix, buf[thread].run, buf[thread].ena, 1.0,
-			 &rt_stat);
-		fputc('\n', output);
-	}
-
-	free(buf);
-}
-
-struct caggr_data {
-	double avg, avg_enabled, avg_running;
-};
-
-static void counter_aggr_cb(struct perf_stat_config *config __maybe_unused,
-			    struct evsel *counter, void *data,
-			    bool first __maybe_unused)
-{
-	struct caggr_data *cd = data;
-	struct perf_counts_values *aggr = &counter->counts->aggr;
-
-	cd->avg += aggr->val;
-	cd->avg_enabled += aggr->ena;
-	cd->avg_running += aggr->run;
-}
-
-/*
- * Print out the results of a single counter:
- * aggregated counts in system-wide mode
- */
-static void print_counter_aggr(struct perf_stat_config *config,
-			       struct evsel *counter, char *prefix)
-{
-	bool metric_only = config->metric_only;
-	FILE *output = config->output;
-	double uval;
-	struct caggr_data cd = { .avg = 0.0 };
-
-	if (!collect_data(config, counter, counter_aggr_cb, &cd))
-		return;
-
-	if (prefix && !metric_only)
-		fprintf(output, "%s", prefix);
-
-	uval = cd.avg * counter->scale;
-	printout(config, aggr_cpu_id__empty(), 0, counter, uval, prefix, cd.avg_running,
-		 cd.avg_enabled, cd.avg, &rt_stat);
-	if (!metric_only)
-		fprintf(output, "\n");
-}
-
-static void counter_cb(struct perf_stat_config *config __maybe_unused,
-		       struct evsel *counter, void *data,
-		       bool first __maybe_unused)
-{
-	struct aggr_data *ad = data;
-
-	ad->val += perf_counts(counter->counts, ad->cpu_map_idx, 0)->val;
-	ad->ena += perf_counts(counter->counts, ad->cpu_map_idx, 0)->ena;
-	ad->run += perf_counts(counter->counts, ad->cpu_map_idx, 0)->run;
-}
-
-/*
- * Print out the results of a single counter:
- * does not use aggregated count in system-wide
- */
 static void print_counter(struct perf_stat_config *config,
 			  struct evsel *counter, char *prefix)
 {
-	FILE *output = config->output;
-	u64 ena, run, val;
-	double uval;
-	int idx;
-	struct perf_cpu cpu;
-	struct aggr_cpu_id id;
-
-	perf_cpu_map__for_each_cpu(cpu, idx, evsel__cpus(counter)) {
-		struct aggr_data ad = { .cpu_map_idx = idx };
-
-		if (!collect_data(config, counter, counter_cb, &ad))
-			return;
-		val = ad.val;
-		ena = ad.ena;
-		run = ad.run;
+	bool metric_only = config->metric_only;
+	bool first = false;
+	int s;
 
-		if (prefix)
-			fprintf(output, "%s", prefix);
+	/* AGGR_THREAD doesn't have config->aggr_get_id */
+	if (!config->aggr_map)
+		return;
 
-		uval = val * counter->scale;
-		id = aggr_cpu_id__cpu(cpu, /*data=*/NULL);
-		printout(config, id, 0, counter, uval, prefix,
-			 run, ena, 1.0, &rt_stat);
+	if (counter->merged_stat)
+		return;
 
-		fputc('\n', output);
+	for (s = 0; s < config->aggr_map->nr; s++) {
+		print_counter_aggrdata(config, counter, s,
+				       prefix, metric_only,
+				       &first);
 	}
 }
 
@@ -1081,6 +787,7 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
 			u64 ena, run, val;
 			double uval;
 			struct aggr_cpu_id id;
+			struct perf_stat_evsel *ps = counter->stats;
 			int counter_idx = perf_cpu_map__idx(evsel__cpus(counter), cpu);
 
 			if (counter_idx < 0)
@@ -1093,13 +800,13 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
 				aggr_printout(config, counter, id, 0);
 				first = false;
 			}
-			val = perf_counts(counter->counts, counter_idx, 0)->val;
-			ena = perf_counts(counter->counts, counter_idx, 0)->ena;
-			run = perf_counts(counter->counts, counter_idx, 0)->run;
+			val = ps->aggr[counter_idx].counts.val;
+			ena = ps->aggr[counter_idx].counts.ena;
+			run = ps->aggr[counter_idx].counts.run;
 
 			uval = val * counter->scale;
 			printout(config, id, 0, counter, uval, prefix,
-				 run, ena, 1.0, &rt_stat);
+				 run, ena, 1.0, &rt_stat, counter_idx);
 		}
 		if (!first)
 			fputc('\n', config->output);
@@ -1135,8 +842,8 @@ static void print_metric_headers(struct perf_stat_config *config,
 	};
 	bool first = true;
 
-		if (config->json_output && !config->interval)
-			fprintf(config->output, "{");
+	if (config->json_output && !config->interval)
+		fprintf(config->output, "{");
 
 	if (prefix && !config->json_output)
 		fprintf(config->output, "%s", prefix);
@@ -1379,31 +1086,6 @@ static void print_footer(struct perf_stat_config *config)
 			"the same PMU. Try reorganizing the group.\n");
 }
 
-static void print_percore_thread(struct perf_stat_config *config,
-				 struct evsel *counter, char *prefix)
-{
-	int s;
-	struct aggr_cpu_id s2, id;
-	struct perf_cpu_map *cpus;
-	bool first = true;
-	int idx;
-	struct perf_cpu cpu;
-
-	cpus = evsel__cpus(counter);
-	perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
-		s2 = config->aggr_get_id(config, cpu);
-		for (s = 0; s < config->aggr_map->nr; s++) {
-			id = config->aggr_map->map[s];
-			if (aggr_cpu_id__equal(&s2, &id))
-				break;
-		}
-
-		print_counter_aggrdata(config, counter, s,
-				       prefix, false,
-				       &first, cpu);
-	}
-}
-
 static void print_percore(struct perf_stat_config *config,
 			  struct evsel *counter, char *prefix)
 {
@@ -1416,15 +1098,14 @@ static void print_percore(struct perf_stat_config *config,
 		return;
 
 	if (config->percore_show_thread)
-		return print_percore_thread(config, counter, prefix);
+		return print_counter(config, counter, prefix);
 
 	for (s = 0; s < config->aggr_map->nr; s++) {
 		if (prefix && metric_only)
 			fprintf(output, "%s", prefix);
 
 		print_counter_aggrdata(config, counter, s,
-				prefix, metric_only,
-				&first, (struct perf_cpu){ .cpu = -1 });
+				       prefix, metric_only, &first);
 	}
 
 	if (metric_only)
@@ -1469,17 +1150,13 @@ void evlist__print_counters(struct evlist *evlist, struct perf_stat_config *conf
 		print_aggr(config, evlist, prefix);
 		break;
 	case AGGR_THREAD:
-		evlist__for_each_entry(evlist, counter) {
-			print_aggr_thread(config, _target, counter, prefix);
-		}
-		break;
 	case AGGR_GLOBAL:
 		if (config->iostat_run)
 			iostat_print_counters(evlist, config, ts, prefix = buf,
-					      print_counter_aggr);
+					      print_counter);
 		else {
 			evlist__for_each_entry(evlist, counter) {
-				print_counter_aggr(config, counter, prefix);
+				print_counter(config, counter, prefix);
 			}
 			if (metric_only)
 				fputc('\n', config->output);
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index c0955a0427ab..0316557adce9 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -565,11 +565,6 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 			evsel__name(counter), count[0], count[1], count[2]);
 	}
 
-	/*
-	 * Save the full runtime - to allow normalization during printout:
-	 */
-	perf_stat__update_shadow_stats(counter, *count, 0, &rt_stat);
-
 	return 0;
 }
 
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 3d413ba8c68a..382a1ab92ce1 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -224,15 +224,6 @@ static inline void update_rusage_stats(struct rusage_stats *ru_stats, struct rus
 struct evsel;
 struct evlist;
 
-struct perf_aggr_thread_value {
-	struct evsel *counter;
-	struct aggr_cpu_id id;
-	double uval;
-	u64 val;
-	u64 run;
-	u64 ena;
-};
-
 bool __perf_stat_evsel__is(struct evsel *evsel, enum perf_stat_evsel_id id);
 
 #define perf_stat_evsel__is(evsel, id) \
-- 
2.38.0.413.g74048e4d9e-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 19/20] perf stat: Display percore events properly
  2022-10-18  2:02 [PATCHSET 00/20] perf stat: Cleanup counter aggregation (v3) Namhyung Kim
                   ` (17 preceding siblings ...)
  2022-10-18  2:02 ` [PATCH 18/20] perf stat: Display event stats using aggr counts Namhyung Kim
@ 2022-10-18  2:02 ` Namhyung Kim
  2022-10-18  2:02 ` [PATCH 20/20] perf stat: Remove unused perf_counts.aggr field Namhyung Kim
  19 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2022-10-18  2:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun, Michael Petlan

The recent change in perf stat broke the percore event display.
Note that the aggr counts are already processed so that every
sibling thread in the same core gets the per-core counter values.

Check percore evsels and skip the sibling threads in the display.
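
A minimal standalone sketch of the dedup idea (made-up data; in the
actual patch the core ids come from aggr_cpu_id__core() and are tracked
in a temporary cpu_aggr_map):

  #include <stdio.h>
  #include <stdbool.h>

  #define NR_CPUS 4

  static const int core_of[NR_CPUS] = { 0, 1, 0, 1 };

  int main(void)
  {
          int seen[NR_CPUS];
          int nr_seen = 0;

          for (int cpu = 0; cpu < NR_CPUS; cpu++) {
                  bool found = false;

                  /* skip sibling threads whose core was already printed */
                  for (int i = 0; i < nr_seen; i++) {
                          if (seen[i] == core_of[cpu]) {
                                  found = true;
                                  break;
                          }
                  }
                  if (found)
                          continue;

                  printf("print row for cpu %d (core %d)\n", cpu, core_of[cpu]);
                  seen[nr_seen++] = core_of[cpu];
          }
          return 0;
  }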

Acked-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-stat.c      | 16 ----------------
 tools/perf/util/stat-display.c | 27 +++++++++++++++++++++++++--
 2 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index d7c52cef70a3..9d35a3338976 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1404,18 +1404,6 @@ static struct aggr_cpu_id perf_stat__get_cpu_cached(struct perf_stat_config *con
 	return perf_stat__get_aggr(config, perf_stat__get_cpu, cpu);
 }
 
-static bool term_percore_set(void)
-{
-	struct evsel *counter;
-
-	evlist__for_each_entry(evsel_list, counter) {
-		if (counter->percore)
-			return true;
-	}
-
-	return false;
-}
-
 static aggr_cpu_id_get_t aggr_mode__get_aggr(enum aggr_mode aggr_mode)
 {
 	switch (aggr_mode) {
@@ -1428,8 +1416,6 @@ static aggr_cpu_id_get_t aggr_mode__get_aggr(enum aggr_mode aggr_mode)
 	case AGGR_NODE:
 		return aggr_cpu_id__node;
 	case AGGR_NONE:
-		if (term_percore_set())
-			return aggr_cpu_id__core;
 		return aggr_cpu_id__cpu;
 	case AGGR_GLOBAL:
 		return aggr_cpu_id__global;
@@ -1453,8 +1439,6 @@ static aggr_get_id_t aggr_mode__get_id(enum aggr_mode aggr_mode)
 	case AGGR_NODE:
 		return perf_stat__get_node_cached;
 	case AGGR_NONE:
-		if (term_percore_set())
-			return perf_stat__get_core_cached;
 		return perf_stat__get_cpu_cached;
 	case AGGR_GLOBAL:
 		return perf_stat__get_global_cached;
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index bfae2784609c..657434cd29ee 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -1091,7 +1091,8 @@ static void print_percore(struct perf_stat_config *config,
 {
 	bool metric_only = config->metric_only;
 	FILE *output = config->output;
-	int s;
+	struct cpu_aggr_map *core_map;
+	int s, c, i;
 	bool first = true;
 
 	if (!config->aggr_map || !config->aggr_get_id)
@@ -1100,13 +1101,35 @@ static void print_percore(struct perf_stat_config *config,
 	if (config->percore_show_thread)
 		return print_counter(config, counter, prefix);
 
-	for (s = 0; s < config->aggr_map->nr; s++) {
+	core_map = cpu_aggr_map__empty_new(config->aggr_map->nr);
+	if (core_map == NULL) {
+		fprintf(output, "Cannot allocate per-core aggr map for display\n");
+		return;
+	}
+
+	for (s = 0, c = 0; s < config->aggr_map->nr; s++) {
+		struct perf_cpu curr_cpu = config->aggr_map->map[s].cpu;
+		struct aggr_cpu_id core_id = aggr_cpu_id__core(curr_cpu, NULL);
+		bool found = false;
+
+		for (i = 0; i < c; i++) {
+			if (aggr_cpu_id__equal(&core_map->map[i], &core_id)) {
+				found = true;
+				break;
+			}
+		}
+		if (found)
+			continue;
+
 		if (prefix && metric_only)
 			fprintf(output, "%s", prefix);
 
 		print_counter_aggrdata(config, counter, s,
 				       prefix, metric_only, &first);
+
+		core_map->map[c++] = core_id;
 	}
+	free(core_map);
 
 	if (metric_only)
 		fputc('\n', output);
-- 
2.38.0.413.g74048e4d9e-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 20/20] perf stat: Remove unused perf_counts.aggr field
  2022-10-18  2:02 [PATCHSET 00/20] perf stat: Cleanup counter aggregation (v3) Namhyung Kim
                   ` (18 preceding siblings ...)
  2022-10-18  2:02 ` [PATCH 19/20] perf stat: Display percore events properly Namhyung Kim
@ 2022-10-18  2:02 ` Namhyung Kim
  19 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2022-10-18  2:02 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Ian Rogers, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, Athira Rajeev,
	James Clark, Xing Zhengjun, Michael Petlan

The aggr field in struct perf_counts was used to keep the aggregated
value for AGGR_GLOBAL mode in the old code.  But it's not used anymore.

Acked-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/counts.c |  1 -
 tools/perf/util/counts.h |  1 -
 tools/perf/util/stat.c   | 39 ++++++---------------------------------
 3 files changed, 6 insertions(+), 35 deletions(-)

diff --git a/tools/perf/util/counts.c b/tools/perf/util/counts.c
index 7a447d918458..11cd85b278a6 100644
--- a/tools/perf/util/counts.c
+++ b/tools/perf/util/counts.c
@@ -48,7 +48,6 @@ void perf_counts__reset(struct perf_counts *counts)
 {
 	xyarray__reset(counts->loaded);
 	xyarray__reset(counts->values);
-	memset(&counts->aggr, 0, sizeof(struct perf_counts_values));
 }
 
 void evsel__reset_counts(struct evsel *evsel)
diff --git a/tools/perf/util/counts.h b/tools/perf/util/counts.h
index 5de275194f2b..42760242e0df 100644
--- a/tools/perf/util/counts.h
+++ b/tools/perf/util/counts.h
@@ -11,7 +11,6 @@ struct evsel;
 
 struct perf_counts {
 	s8			  scaled;
-	struct perf_counts_values aggr;
 	struct xyarray		  *values;
 	struct xyarray		  *loaded;
 };
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 0316557adce9..3a432a949d46 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -308,8 +308,6 @@ static void evsel__copy_prev_raw_counts(struct evsel *evsel)
 				*perf_counts(evsel->prev_raw_counts, idx, thread);
 		}
 	}
-
-	evsel->counts->aggr = evsel->prev_raw_counts->aggr;
 }
 
 void evlist__copy_prev_raw_counts(struct evlist *evlist)
@@ -320,26 +318,6 @@ void evlist__copy_prev_raw_counts(struct evlist *evlist)
 		evsel__copy_prev_raw_counts(evsel);
 }
 
-void evlist__save_aggr_prev_raw_counts(struct evlist *evlist)
-{
-	struct evsel *evsel;
-
-	/*
-	 * To collect the overall statistics for interval mode,
-	 * we copy the counts from evsel->prev_raw_counts to
-	 * evsel->counts. The perf_stat_process_counter creates
-	 * aggr values from per cpu values, but the per cpu values
-	 * are 0 for AGGR_GLOBAL. So we use a trick that saves the
-	 * previous aggr value to the first member of perf_counts,
-	 * then aggr calculation in process_counter_values can work
-	 * correctly.
-	 */
-	evlist__for_each_entry(evlist, evsel) {
-		*perf_counts(evsel->prev_raw_counts, 0, 0) =
-			evsel->prev_raw_counts->aggr;
-	}
-}
-
 static size_t pkg_id_hash(const void *__key, void *ctx __maybe_unused)
 {
 	uint64_t *key = (uint64_t *) __key;
@@ -442,7 +420,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
 		       int cpu_map_idx, int thread,
 		       struct perf_counts_values *count)
 {
-	struct perf_counts_values *aggr = &evsel->counts->aggr;
 	struct perf_stat_evsel *ps = evsel->stats;
 	static struct perf_counts_values zero;
 	bool skip = false;
@@ -511,12 +488,6 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
 		}
 	}
 
-	if (config->aggr_mode == AGGR_GLOBAL) {
-		aggr->val += count->val;
-		aggr->ena += count->ena;
-		aggr->run += count->run;
-	}
-
 	return 0;
 }
 
@@ -541,13 +512,10 @@ static int process_counter_maps(struct perf_stat_config *config,
 int perf_stat_process_counter(struct perf_stat_config *config,
 			      struct evsel *counter)
 {
-	struct perf_counts_values *aggr = &counter->counts->aggr;
 	struct perf_stat_evsel *ps = counter->stats;
-	u64 *count = counter->counts->aggr.values;
+	u64 *count;
 	int ret;
 
-	aggr->val = aggr->ena = aggr->run = 0;
-
 	if (counter->per_pkg)
 		evsel__zero_per_pkg(counter);
 
@@ -558,6 +526,11 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 	if (config->aggr_mode != AGGR_GLOBAL)
 		return 0;
 
+	/*
+	 * GLOBAL aggregation mode only has a single aggr counts,
+	 * so we can use ps->aggr[0] as the actual output.
+	 */
+	count = ps->aggr[0].counts.values;
 	update_stats(&ps->res_stats, *count);
 
 	if (verbose > 0) {
-- 
2.38.0.413.g74048e4d9e-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 18/20] perf stat: Display event stats using aggr counts
  2022-10-18  2:02 ` [PATCH 18/20] perf stat: Display event stats using aggr counts Namhyung Kim
@ 2022-12-05 10:51   ` Athira Rajeev
  2022-12-05 19:00     ` Namhyung Kim
  2023-06-14 13:40   ` Jiri Olsa
  1 sibling, 1 reply; 28+ messages in thread
From: Athira Rajeev @ 2022-12-05 10:51 UTC (permalink / raw)
  To: Namhyung Kim, Arnaldo Carvalho de Melo, Ian Rogers, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Adrian Hunter,
	linux-perf-users, Kan Liang, Leo Yan, Andi Kleen, James Clark,
	Xing Zhengjun, Michael Petlan



> On 18-Oct-2022, at 7:32 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> 
> Now aggr counts are ready for use.  Convert the display routines to use
> the aggr counts and update the shadow stat with them.  It doesn't need
> to aggregate counts or collect aliases anymore during the display.  Get
> rid of now unused struct perf_aggr_thread_value.
> 
> Note that there's a difference in the display order among the aggr mode.
> For per-core/die/socket/node aggregation, it shows relevant events in
> the same unit together, whereas global/thread/no aggregation it shows
> the same events for different units together.  So it still uses separate
> codes to display them due to the ordering.
> 
> One more thing to note is that it breaks per-core event display for now.
> The next patch will fix it to have identical output as of now.
> 
> Acked-by: Ian Rogers <irogers@google.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
> tools/perf/util/stat-display.c | 421 ++++-----------------------------
> tools/perf/util/stat.c         |   5 -
> tools/perf/util/stat.h         |   9 -
> 3 files changed, 49 insertions(+), 386 deletions(-)
> 
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index 4113aa86772f..bfae2784609c 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -442,31 +442,6 @@ static void print_metric_header(struct perf_stat_config *config,
> 		fprintf(os->fh, "%*s ", config->metric_only_len, unit);
> }
> 
> -static int first_shadow_map_idx(struct perf_stat_config *config,
> -				struct evsel *evsel, const struct aggr_cpu_id *id)
> -{
> -	struct perf_cpu_map *cpus = evsel__cpus(evsel);
> -	struct perf_cpu cpu;
> -	int idx;
> -
> -	if (config->aggr_mode == AGGR_NONE)
> -		return perf_cpu_map__idx(cpus, id->cpu);
> -
> -	if (config->aggr_mode == AGGR_THREAD)
> -		return id->thread_idx;
> -
> -	if (!config->aggr_get_id)
> -		return 0;
> -
> -	perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> -		struct aggr_cpu_id cpu_id = config->aggr_get_id(config, cpu);
> -
> -		if (aggr_cpu_id__equal(&cpu_id, id))
> -			return idx;
> -	}
> -	return 0;
> -}
> -
> static void abs_printout(struct perf_stat_config *config,
> 			 struct aggr_cpu_id id, int nr, struct evsel *evsel, double avg)
> {
> @@ -537,7 +512,7 @@ static bool is_mixed_hw_group(struct evsel *counter)
> static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int nr,
> 		     struct evsel *counter, double uval,
> 		     char *prefix, u64 run, u64 ena, double noise,
> -		     struct runtime_stat *st)
> +		     struct runtime_stat *st, int map_idx)
> {
> 	struct perf_stat_output_ctx out;
> 	struct outstate os = {
> @@ -648,8 +623,7 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int
> 		print_running(config, run, ena);
> 	}
> 
> -	perf_stat__print_shadow_stats(config, counter, uval,
> -				first_shadow_map_idx(config, counter, &id),
> +	perf_stat__print_shadow_stats(config, counter, uval, map_idx,
> 				&out, &config->metric_events, st);
> 	if (!config->csv_output && !config->metric_only && !config->json_output) {
> 		print_noise(config, counter, noise);
> @@ -657,34 +631,6 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int
> 	}
> }
> 
> -static void aggr_update_shadow(struct perf_stat_config *config,
> -			       struct evlist *evlist)
> -{
> -	int idx, s;
> -	struct perf_cpu cpu;
> -	struct aggr_cpu_id s2, id;
> -	u64 val;
> -	struct evsel *counter;
> -	struct perf_cpu_map *cpus;
> -
> -	for (s = 0; s < config->aggr_map->nr; s++) {
> -		id = config->aggr_map->map[s];
> -		evlist__for_each_entry(evlist, counter) {
> -			cpus = evsel__cpus(counter);
> -			val = 0;
> -			perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> -				s2 = config->aggr_get_id(config, cpu);
> -				if (!aggr_cpu_id__equal(&s2, &id))
> -					continue;
> -				val += perf_counts(counter->counts, idx, 0)->val;
> -			}
> -			perf_stat__update_shadow_stats(counter, val,
> -					first_shadow_map_idx(config, counter, &id),
> -					&rt_stat);
> -		}
> -	}
> -}
> -
> static void uniquify_event_name(struct evsel *counter)
> {
> 	char *new_name;
> @@ -721,137 +667,51 @@ static void uniquify_event_name(struct evsel *counter)
> 	counter->uniquified_name = true;
> }
> 
> -static void collect_all_aliases(struct perf_stat_config *config, struct evsel *counter,
> -			    void (*cb)(struct perf_stat_config *config, struct evsel *counter, void *data,
> -				       bool first),
> -			    void *data)
> +static bool hybrid_uniquify(struct evsel *evsel, struct perf_stat_config *config)
> {
> -	struct evlist *evlist = counter->evlist;
> -	struct evsel *alias;
> -
> -	alias = list_prepare_entry(counter, &(evlist->core.entries), core.node);
> -	list_for_each_entry_continue (alias, &evlist->core.entries, core.node) {
> -		/* Merge events with the same name, etc. but on different PMUs. */
> -		if (!strcmp(evsel__name(alias), evsel__name(counter)) &&
> -			alias->scale == counter->scale &&
> -			alias->cgrp == counter->cgrp &&
> -			!strcmp(alias->unit, counter->unit) &&
> -			evsel__is_clock(alias) == evsel__is_clock(counter) &&
> -			strcmp(alias->pmu_name, counter->pmu_name)) {
> -			alias->merged_stat = true;
> -			cb(config, alias, data, false);
> -		}
> -	}
> +	return evsel__is_hybrid(evsel) && !config->hybrid_merge;
> }
> 
> -static bool hybrid_merge(struct evsel *counter, struct perf_stat_config *config,
> -			 bool check)
> +static void uniquify_counter(struct perf_stat_config *config, struct evsel *counter)
> {
> -	if (evsel__is_hybrid(counter)) {
> -		if (check)
> -			return config->hybrid_merge;
> -		else
> -			return !config->hybrid_merge;
> -	}
> -
> -	return false;
> -}
> -
> -static bool collect_data(struct perf_stat_config *config, struct evsel *counter,
> -			    void (*cb)(struct perf_stat_config *config, struct evsel *counter, void *data,
> -				       bool first),
> -			    void *data)
> -{
> -	if (counter->merged_stat)
> -		return false;
> -	cb(config, counter, data, true);
> -	if (config->no_merge || hybrid_merge(counter, config, false))
> +	if (config->no_merge || hybrid_uniquify(counter, config))
> 		uniquify_event_name(counter);
> -	else if (counter->auto_merge_stats || hybrid_merge(counter, config, true))
> -		collect_all_aliases(config, counter, cb, data);
> -	return true;
> -}
> -
> -struct aggr_data {
> -	u64 ena, run, val;
> -	struct aggr_cpu_id id;
> -	int nr;
> -	int cpu_map_idx;
> -};
> -
> -static void aggr_cb(struct perf_stat_config *config,
> -		    struct evsel *counter, void *data, bool first)
> -{
> -	struct aggr_data *ad = data;
> -	int idx;
> -	struct perf_cpu cpu;
> -	struct perf_cpu_map *cpus;
> -	struct aggr_cpu_id s2;
> -
> -	cpus = evsel__cpus(counter);
> -	perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> -		struct perf_counts_values *counts;
> -
> -		s2 = config->aggr_get_id(config, cpu);
> -		if (!aggr_cpu_id__equal(&s2, &ad->id))
> -			continue;
> -		if (first)
> -			ad->nr++;
> -		counts = perf_counts(counter->counts, idx, 0);
> -		/*
> -		 * When any result is bad, make them all to give
> -		 * consistent output in interval mode.
> -		 */
> -		if (counts->ena == 0 || counts->run == 0 ||
> -		    counter->counts->scaled == -1) {
> -			ad->ena = 0;
> -			ad->run = 0;
> -			break;
> -		}
> -		ad->val += counts->val;
> -		ad->ena += counts->ena;
> -		ad->run += counts->run;
> -	}
> }
> 
> static void print_counter_aggrdata(struct perf_stat_config *config,
> 				   struct evsel *counter, int s,
> 				   char *prefix, bool metric_only,
> -				   bool *first, struct perf_cpu cpu)
> +				   bool *first)
> {
> -	struct aggr_data ad;
> 	FILE *output = config->output;
> 	u64 ena, run, val;
> -	int nr;
> -	struct aggr_cpu_id id;
> 	double uval;
> +	struct perf_stat_evsel *ps = counter->stats;
> +	struct perf_stat_aggr *aggr = &ps->aggr[s];
> +	struct aggr_cpu_id id = config->aggr_map->map[s];
> +	double avg = aggr->counts.val;
> 
> -	ad.id = id = config->aggr_map->map[s];
> -	ad.val = ad.ena = ad.run = 0;
> -	ad.nr = 0;
> -	if (!collect_data(config, counter, aggr_cb, &ad))
> +	if (counter->supported && aggr->nr == 0)
> 		return;

Hi Namhyung,

Observed one issue with this change in the tmp.perf/core branch of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git
Patch link: https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/commit/?h=tmp.perf/core&id=91f85f98da7ab8c32105f42dd03884c01ec4498f

If we have multiple events in a group and not all of them can be scheduled at once, the whole group fails to schedule.
The perf stat result used to show the status of all events in the group.

Example result from powerpc:
	# perf stat -e '{r1001e,r401e0,r4c010}' <workload>
	Performance counter stats for 'workload':

	     <not counted>      r1001e                                                      
	     <not counted>      r401e0                                                      
	   <not supported>      r4c010                                                      

But after this change, it shows only the event which broke the constraint.

	# ./perf stat -e '{r1001e,r401e0,r4c010}' <workload>
	Performance counter stats for 'workload':
	<not supported>      r4c010                    

The new condition added by the patch is:

	if (counter->supported && aggr->nr == 0)
	                return;


The first two events, r1001e and r401e0, are good to go in this group, hence
counter->supported is true, whereas aggr->nr is zero since the group is not
scheduled. Since counter->supported is false for the third event, which broke
the group constraint, the condition does not return early, and hence only this
event is printed in the output.
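
As a rough illustration (just restating the above, per event):

	r1001e: counter->supported == true,  aggr->nr == 0  -> returns early, not printed
	r401e0: counter->supported == true,  aggr->nr == 0  -> returns early, not printed
	r4c010: counter->supported == false                 -> falls through, gets printed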

Namhyung, is there any scenario for which this condition was added?
If not, I will go ahead and send a fix patch to remove this check.

Thanks
Athira

> 
> -	if (perf_pmu__has_hybrid() && ad.ena == 0)
> -		return;
> +	uniquify_counter(config, counter);
> +
> +	val = aggr->counts.val;
> +	ena = aggr->counts.ena;
> +	run = aggr->counts.run;
> 
> -	nr = ad.nr;
> -	ena = ad.ena;
> -	run = ad.run;
> -	val = ad.val;
> 	if (*first && metric_only) {
> 		*first = false;
> -		aggr_printout(config, counter, id, nr);
> +		aggr_printout(config, counter, id, aggr->nr);
> 	}
> 	if (prefix && !metric_only)
> 		fprintf(output, "%s", prefix);
> 
> 	uval = val * counter->scale;
> -	if (cpu.cpu != -1)
> -		id = aggr_cpu_id__cpu(cpu, /*data=*/NULL);
> 
> -	printout(config, id, nr, counter, uval,
> -		 prefix, run, ena, 1.0, &rt_stat);
> +	printout(config, id, aggr->nr, counter, uval,
> +		 prefix, run, ena, avg, &rt_stat, s);
> +
> 	if (!metric_only)
> 		fputc('\n', output);
> }
> @@ -869,8 +729,6 @@ static void print_aggr(struct perf_stat_config *config,
> 	if (!config->aggr_map || !config->aggr_get_id)
> 		return;
> 
> -	aggr_update_shadow(config, evlist);
> -
> 	/*
> 	 * With metric_only everything is on a single line.
> 	 * Without each counter has its own line.
> @@ -881,188 +739,36 @@ static void print_aggr(struct perf_stat_config *config,
> 
> 		first = true;
> 		evlist__for_each_entry(evlist, counter) {
> +			if (counter->merged_stat)
> +				continue;
> +
> 			print_counter_aggrdata(config, counter, s,
> -					prefix, metric_only,
> -					&first, (struct perf_cpu){ .cpu = -1 });
> +					       prefix, metric_only,
> +					       &first);
> 		}
> 		if (metric_only)
> 			fputc('\n', output);
> 	}
> }
> 
> -static int cmp_val(const void *a, const void *b)
> -{
> -	return ((struct perf_aggr_thread_value *)b)->val -
> -		((struct perf_aggr_thread_value *)a)->val;
> -}
> -
> -static struct perf_aggr_thread_value *sort_aggr_thread(
> -					struct evsel *counter,
> -					int *ret,
> -					struct target *_target)
> -{
> -	int nthreads = perf_thread_map__nr(counter->core.threads);
> -	int i = 0;
> -	double uval;
> -	struct perf_aggr_thread_value *buf;
> -
> -	buf = calloc(nthreads, sizeof(struct perf_aggr_thread_value));
> -	if (!buf)
> -		return NULL;
> -
> -	for (int thread = 0; thread < nthreads; thread++) {
> -		int idx;
> -		u64 ena = 0, run = 0, val = 0;
> -
> -		perf_cpu_map__for_each_idx(idx, evsel__cpus(counter)) {
> -			struct perf_counts_values *counts =
> -				perf_counts(counter->counts, idx, thread);
> -
> -			val += counts->val;
> -			ena += counts->ena;
> -			run += counts->run;
> -		}
> -
> -		uval = val * counter->scale;
> -
> -		/*
> -		 * Skip value 0 when enabling --per-thread globally,
> -		 * otherwise too many 0 output.
> -		 */
> -		if (uval == 0.0 && target__has_per_thread(_target))
> -			continue;
> -
> -		buf[i].counter = counter;
> -		buf[i].id = aggr_cpu_id__empty();
> -		buf[i].id.thread_idx = thread;
> -		buf[i].uval = uval;
> -		buf[i].val = val;
> -		buf[i].run = run;
> -		buf[i].ena = ena;
> -		i++;
> -	}
> -
> -	qsort(buf, i, sizeof(struct perf_aggr_thread_value), cmp_val);
> -
> -	if (ret)
> -		*ret = i;
> -
> -	return buf;
> -}
> -
> -static void print_aggr_thread(struct perf_stat_config *config,
> -			      struct target *_target,
> -			      struct evsel *counter, char *prefix)
> -{
> -	FILE *output = config->output;
> -	int thread, sorted_threads;
> -	struct aggr_cpu_id id;
> -	struct perf_aggr_thread_value *buf;
> -
> -	buf = sort_aggr_thread(counter, &sorted_threads, _target);
> -	if (!buf) {
> -		perror("cannot sort aggr thread");
> -		return;
> -	}
> -
> -	for (thread = 0; thread < sorted_threads; thread++) {
> -		if (prefix)
> -			fprintf(output, "%s", prefix);
> -
> -		id = buf[thread].id;
> -		printout(config, id, 0, buf[thread].counter, buf[thread].uval,
> -			 prefix, buf[thread].run, buf[thread].ena, 1.0,
> -			 &rt_stat);
> -		fputc('\n', output);
> -	}
> -
> -	free(buf);
> -}
> -
> -struct caggr_data {
> -	double avg, avg_enabled, avg_running;
> -};
> -
> -static void counter_aggr_cb(struct perf_stat_config *config __maybe_unused,
> -			    struct evsel *counter, void *data,
> -			    bool first __maybe_unused)
> -{
> -	struct caggr_data *cd = data;
> -	struct perf_counts_values *aggr = &counter->counts->aggr;
> -
> -	cd->avg += aggr->val;
> -	cd->avg_enabled += aggr->ena;
> -	cd->avg_running += aggr->run;
> -}
> -
> -/*
> - * Print out the results of a single counter:
> - * aggregated counts in system-wide mode
> - */
> -static void print_counter_aggr(struct perf_stat_config *config,
> -			       struct evsel *counter, char *prefix)
> -{
> -	bool metric_only = config->metric_only;
> -	FILE *output = config->output;
> -	double uval;
> -	struct caggr_data cd = { .avg = 0.0 };
> -
> -	if (!collect_data(config, counter, counter_aggr_cb, &cd))
> -		return;
> -
> -	if (prefix && !metric_only)
> -		fprintf(output, "%s", prefix);
> -
> -	uval = cd.avg * counter->scale;
> -	printout(config, aggr_cpu_id__empty(), 0, counter, uval, prefix, cd.avg_running,
> -		 cd.avg_enabled, cd.avg, &rt_stat);
> -	if (!metric_only)
> -		fprintf(output, "\n");
> -}
> -
> -static void counter_cb(struct perf_stat_config *config __maybe_unused,
> -		       struct evsel *counter, void *data,
> -		       bool first __maybe_unused)
> -{
> -	struct aggr_data *ad = data;
> -
> -	ad->val += perf_counts(counter->counts, ad->cpu_map_idx, 0)->val;
> -	ad->ena += perf_counts(counter->counts, ad->cpu_map_idx, 0)->ena;
> -	ad->run += perf_counts(counter->counts, ad->cpu_map_idx, 0)->run;
> -}
> -
> -/*
> - * Print out the results of a single counter:
> - * does not use aggregated count in system-wide
> - */
> static void print_counter(struct perf_stat_config *config,
> 			  struct evsel *counter, char *prefix)
> {
> -	FILE *output = config->output;
> -	u64 ena, run, val;
> -	double uval;
> -	int idx;
> -	struct perf_cpu cpu;
> -	struct aggr_cpu_id id;
> -
> -	perf_cpu_map__for_each_cpu(cpu, idx, evsel__cpus(counter)) {
> -		struct aggr_data ad = { .cpu_map_idx = idx };
> -
> -		if (!collect_data(config, counter, counter_cb, &ad))
> -			return;
> -		val = ad.val;
> -		ena = ad.ena;
> -		run = ad.run;
> +	bool metric_only = config->metric_only;
> +	bool first = false;
> +	int s;
> 
> -		if (prefix)
> -			fprintf(output, "%s", prefix);
> +	/* AGGR_THREAD doesn't have config->aggr_get_id */
> +	if (!config->aggr_map)
> +		return;
> 
> -		uval = val * counter->scale;
> -		id = aggr_cpu_id__cpu(cpu, /*data=*/NULL);
> -		printout(config, id, 0, counter, uval, prefix,
> -			 run, ena, 1.0, &rt_stat);
> +	if (counter->merged_stat)
> +		return;
> 
> -		fputc('\n', output);
> +	for (s = 0; s < config->aggr_map->nr; s++) {
> +		print_counter_aggrdata(config, counter, s,
> +				       prefix, metric_only,
> +				       &first);
> 	}
> }
> 
> @@ -1081,6 +787,7 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
> 			u64 ena, run, val;
> 			double uval;
> 			struct aggr_cpu_id id;
> +			struct perf_stat_evsel *ps = counter->stats;
> 			int counter_idx = perf_cpu_map__idx(evsel__cpus(counter), cpu);
> 
> 			if (counter_idx < 0)
> @@ -1093,13 +800,13 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
> 				aggr_printout(config, counter, id, 0);
> 				first = false;
> 			}
> -			val = perf_counts(counter->counts, counter_idx, 0)->val;
> -			ena = perf_counts(counter->counts, counter_idx, 0)->ena;
> -			run = perf_counts(counter->counts, counter_idx, 0)->run;
> +			val = ps->aggr[counter_idx].counts.val;
> +			ena = ps->aggr[counter_idx].counts.ena;
> +			run = ps->aggr[counter_idx].counts.run;
> 
> 			uval = val * counter->scale;
> 			printout(config, id, 0, counter, uval, prefix,
> -				 run, ena, 1.0, &rt_stat);
> +				 run, ena, 1.0, &rt_stat, counter_idx);
> 		}
> 		if (!first)
> 			fputc('\n', config->output);
> @@ -1135,8 +842,8 @@ static void print_metric_headers(struct perf_stat_config *config,
> 	};
> 	bool first = true;
> 
> -		if (config->json_output && !config->interval)
> -			fprintf(config->output, "{");
> +	if (config->json_output && !config->interval)
> +		fprintf(config->output, "{");
> 
> 	if (prefix && !config->json_output)
> 		fprintf(config->output, "%s", prefix);
> @@ -1379,31 +1086,6 @@ static void print_footer(struct perf_stat_config *config)
> 			"the same PMU. Try reorganizing the group.\n");
> }
> 
> -static void print_percore_thread(struct perf_stat_config *config,
> -				 struct evsel *counter, char *prefix)
> -{
> -	int s;
> -	struct aggr_cpu_id s2, id;
> -	struct perf_cpu_map *cpus;
> -	bool first = true;
> -	int idx;
> -	struct perf_cpu cpu;
> -
> -	cpus = evsel__cpus(counter);
> -	perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> -		s2 = config->aggr_get_id(config, cpu);
> -		for (s = 0; s < config->aggr_map->nr; s++) {
> -			id = config->aggr_map->map[s];
> -			if (aggr_cpu_id__equal(&s2, &id))
> -				break;
> -		}
> -
> -		print_counter_aggrdata(config, counter, s,
> -				       prefix, false,
> -				       &first, cpu);
> -	}
> -}
> -
> static void print_percore(struct perf_stat_config *config,
> 			  struct evsel *counter, char *prefix)
> {
> @@ -1416,15 +1098,14 @@ static void print_percore(struct perf_stat_config *config,
> 		return;
> 
> 	if (config->percore_show_thread)
> -		return print_percore_thread(config, counter, prefix);
> +		return print_counter(config, counter, prefix);
> 
> 	for (s = 0; s < config->aggr_map->nr; s++) {
> 		if (prefix && metric_only)
> 			fprintf(output, "%s", prefix);
> 
> 		print_counter_aggrdata(config, counter, s,
> -				prefix, metric_only,
> -				&first, (struct perf_cpu){ .cpu = -1 });
> +				       prefix, metric_only, &first);
> 	}
> 
> 	if (metric_only)
> @@ -1469,17 +1150,13 @@ void evlist__print_counters(struct evlist *evlist, struct perf_stat_config *conf
> 		print_aggr(config, evlist, prefix);
> 		break;
> 	case AGGR_THREAD:
> -		evlist__for_each_entry(evlist, counter) {
> -			print_aggr_thread(config, _target, counter, prefix);
> -		}
> -		break;
> 	case AGGR_GLOBAL:
> 		if (config->iostat_run)
> 			iostat_print_counters(evlist, config, ts, prefix = buf,
> -					      print_counter_aggr);
> +					      print_counter);
> 		else {
> 			evlist__for_each_entry(evlist, counter) {
> -				print_counter_aggr(config, counter, prefix);
> +				print_counter(config, counter, prefix);
> 			}
> 			if (metric_only)
> 				fputc('\n', config->output);
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index c0955a0427ab..0316557adce9 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -565,11 +565,6 @@ int perf_stat_process_counter(struct perf_stat_config *config,
> 			evsel__name(counter), count[0], count[1], count[2]);
> 	}
> 
> -	/*
> -	 * Save the full runtime - to allow normalization during printout:
> -	 */
> -	perf_stat__update_shadow_stats(counter, *count, 0, &rt_stat);
> -
> 	return 0;
> }
> 
> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> index 3d413ba8c68a..382a1ab92ce1 100644
> --- a/tools/perf/util/stat.h
> +++ b/tools/perf/util/stat.h
> @@ -224,15 +224,6 @@ static inline void update_rusage_stats(struct rusage_stats *ru_stats, struct rus
> struct evsel;
> struct evlist;
> 
> -struct perf_aggr_thread_value {
> -	struct evsel *counter;
> -	struct aggr_cpu_id id;
> -	double uval;
> -	u64 val;
> -	u64 run;
> -	u64 ena;
> -};
> -
> bool __perf_stat_evsel__is(struct evsel *evsel, enum perf_stat_evsel_id id);
> 
> #define perf_stat_evsel__is(evsel, id) \
> -- 
> 2.38.0.413.g74048e4d9e-goog
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 18/20] perf stat: Display event stats using aggr counts
  2022-12-05 10:51   ` Athira Rajeev
@ 2022-12-05 19:00     ` Namhyung Kim
  2022-12-06 13:39       ` Athira Rajeev
  0 siblings, 1 reply; 28+ messages in thread
From: Namhyung Kim @ 2022-12-05 19:00 UTC (permalink / raw)
  To: Athira Rajeev
  Cc: Arnaldo Carvalho de Melo, Ian Rogers, Jiri Olsa, Ingo Molnar,
	Peter Zijlstra, LKML, Adrian Hunter, linux-perf-users, Kan Liang,
	Leo Yan, Andi Kleen, James Clark, Xing Zhengjun, Michael Petlan

Hello,

On Mon, Dec 5, 2022 at 2:52 AM Athira Rajeev
<atrajeev@linux.vnet.ibm.com> wrote:
>
>
>
> > On 18-Oct-2022, at 7:32 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Now aggr counts are ready for use.  Convert the display routines to use
> > the aggr counts and update the shadow stat with them.  It doesn't need
> > to aggregate counts or collect aliases anymore during the display.  Get
> > rid of now unused struct perf_aggr_thread_value.
> >
> > Note that there's a difference in the display order among the aggr mode.
> > For per-core/die/socket/node aggregation, it shows relevant events in
> > the same unit together, whereas global/thread/no aggregation it shows
> > the same events for different units together.  So it still uses separate
> > codes to display them due to the ordering.
> >
> > One more thing to note is that it breaks per-core event display for now.
> > The next patch will fix it to have identical output as of now.
> >
> > Acked-by: Ian Rogers <irogers@google.com>
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> > tools/perf/util/stat-display.c | 421 ++++-----------------------------
> > tools/perf/util/stat.c         |   5 -
> > tools/perf/util/stat.h         |   9 -
> > 3 files changed, 49 insertions(+), 386 deletions(-)
> >
> > diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> > index 4113aa86772f..bfae2784609c 100644
> > --- a/tools/perf/util/stat-display.c
> > +++ b/tools/perf/util/stat-display.c
> > @@ -442,31 +442,6 @@ static void print_metric_header(struct perf_stat_config *config,
> >               fprintf(os->fh, "%*s ", config->metric_only_len, unit);
> > }
> >
> > -static int first_shadow_map_idx(struct perf_stat_config *config,
> > -                             struct evsel *evsel, const struct aggr_cpu_id *id)
> > -{
> > -     struct perf_cpu_map *cpus = evsel__cpus(evsel);
> > -     struct perf_cpu cpu;
> > -     int idx;
> > -
> > -     if (config->aggr_mode == AGGR_NONE)
> > -             return perf_cpu_map__idx(cpus, id->cpu);
> > -
> > -     if (config->aggr_mode == AGGR_THREAD)
> > -             return id->thread_idx;
> > -
> > -     if (!config->aggr_get_id)
> > -             return 0;
> > -
> > -     perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> > -             struct aggr_cpu_id cpu_id = config->aggr_get_id(config, cpu);
> > -
> > -             if (aggr_cpu_id__equal(&cpu_id, id))
> > -                     return idx;
> > -     }
> > -     return 0;
> > -}
> > -
> > static void abs_printout(struct perf_stat_config *config,
> >                        struct aggr_cpu_id id, int nr, struct evsel *evsel, double avg)
> > {
> > @@ -537,7 +512,7 @@ static bool is_mixed_hw_group(struct evsel *counter)
> > static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int nr,
> >                    struct evsel *counter, double uval,
> >                    char *prefix, u64 run, u64 ena, double noise,
> > -                  struct runtime_stat *st)
> > +                  struct runtime_stat *st, int map_idx)
> > {
> >       struct perf_stat_output_ctx out;
> >       struct outstate os = {
> > @@ -648,8 +623,7 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int
> >               print_running(config, run, ena);
> >       }
> >
> > -     perf_stat__print_shadow_stats(config, counter, uval,
> > -                             first_shadow_map_idx(config, counter, &id),
> > +     perf_stat__print_shadow_stats(config, counter, uval, map_idx,
> >                               &out, &config->metric_events, st);
> >       if (!config->csv_output && !config->metric_only && !config->json_output) {
> >               print_noise(config, counter, noise);
> > @@ -657,34 +631,6 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int
> >       }
> > }
> >
> > -static void aggr_update_shadow(struct perf_stat_config *config,
> > -                            struct evlist *evlist)
> > -{
> > -     int idx, s;
> > -     struct perf_cpu cpu;
> > -     struct aggr_cpu_id s2, id;
> > -     u64 val;
> > -     struct evsel *counter;
> > -     struct perf_cpu_map *cpus;
> > -
> > -     for (s = 0; s < config->aggr_map->nr; s++) {
> > -             id = config->aggr_map->map[s];
> > -             evlist__for_each_entry(evlist, counter) {
> > -                     cpus = evsel__cpus(counter);
> > -                     val = 0;
> > -                     perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> > -                             s2 = config->aggr_get_id(config, cpu);
> > -                             if (!aggr_cpu_id__equal(&s2, &id))
> > -                                     continue;
> > -                             val += perf_counts(counter->counts, idx, 0)->val;
> > -                     }
> > -                     perf_stat__update_shadow_stats(counter, val,
> > -                                     first_shadow_map_idx(config, counter, &id),
> > -                                     &rt_stat);
> > -             }
> > -     }
> > -}
> > -
> > static void uniquify_event_name(struct evsel *counter)
> > {
> >       char *new_name;
> > @@ -721,137 +667,51 @@ static void uniquify_event_name(struct evsel *counter)
> >       counter->uniquified_name = true;
> > }
> >
> > -static void collect_all_aliases(struct perf_stat_config *config, struct evsel *counter,
> > -                         void (*cb)(struct perf_stat_config *config, struct evsel *counter, void *data,
> > -                                    bool first),
> > -                         void *data)
> > +static bool hybrid_uniquify(struct evsel *evsel, struct perf_stat_config *config)
> > {
> > -     struct evlist *evlist = counter->evlist;
> > -     struct evsel *alias;
> > -
> > -     alias = list_prepare_entry(counter, &(evlist->core.entries), core.node);
> > -     list_for_each_entry_continue (alias, &evlist->core.entries, core.node) {
> > -             /* Merge events with the same name, etc. but on different PMUs. */
> > -             if (!strcmp(evsel__name(alias), evsel__name(counter)) &&
> > -                     alias->scale == counter->scale &&
> > -                     alias->cgrp == counter->cgrp &&
> > -                     !strcmp(alias->unit, counter->unit) &&
> > -                     evsel__is_clock(alias) == evsel__is_clock(counter) &&
> > -                     strcmp(alias->pmu_name, counter->pmu_name)) {
> > -                     alias->merged_stat = true;
> > -                     cb(config, alias, data, false);
> > -             }
> > -     }
> > +     return evsel__is_hybrid(evsel) && !config->hybrid_merge;
> > }
> >
> > -static bool hybrid_merge(struct evsel *counter, struct perf_stat_config *config,
> > -                      bool check)
> > +static void uniquify_counter(struct perf_stat_config *config, struct evsel *counter)
> > {
> > -     if (evsel__is_hybrid(counter)) {
> > -             if (check)
> > -                     return config->hybrid_merge;
> > -             else
> > -                     return !config->hybrid_merge;
> > -     }
> > -
> > -     return false;
> > -}
> > -
> > -static bool collect_data(struct perf_stat_config *config, struct evsel *counter,
> > -                         void (*cb)(struct perf_stat_config *config, struct evsel *counter, void *data,
> > -                                    bool first),
> > -                         void *data)
> > -{
> > -     if (counter->merged_stat)
> > -             return false;
> > -     cb(config, counter, data, true);
> > -     if (config->no_merge || hybrid_merge(counter, config, false))
> > +     if (config->no_merge || hybrid_uniquify(counter, config))
> >               uniquify_event_name(counter);
> > -     else if (counter->auto_merge_stats || hybrid_merge(counter, config, true))
> > -             collect_all_aliases(config, counter, cb, data);
> > -     return true;
> > -}
> > -
> > -struct aggr_data {
> > -     u64 ena, run, val;
> > -     struct aggr_cpu_id id;
> > -     int nr;
> > -     int cpu_map_idx;
> > -};
> > -
> > -static void aggr_cb(struct perf_stat_config *config,
> > -                 struct evsel *counter, void *data, bool first)
> > -{
> > -     struct aggr_data *ad = data;
> > -     int idx;
> > -     struct perf_cpu cpu;
> > -     struct perf_cpu_map *cpus;
> > -     struct aggr_cpu_id s2;
> > -
> > -     cpus = evsel__cpus(counter);
> > -     perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> > -             struct perf_counts_values *counts;
> > -
> > -             s2 = config->aggr_get_id(config, cpu);
> > -             if (!aggr_cpu_id__equal(&s2, &ad->id))
> > -                     continue;
> > -             if (first)
> > -                     ad->nr++;
> > -             counts = perf_counts(counter->counts, idx, 0);
> > -             /*
> > -              * When any result is bad, make them all to give
> > -              * consistent output in interval mode.
> > -              */
> > -             if (counts->ena == 0 || counts->run == 0 ||
> > -                 counter->counts->scaled == -1) {
> > -                     ad->ena = 0;
> > -                     ad->run = 0;
> > -                     break;
> > -             }
> > -             ad->val += counts->val;
> > -             ad->ena += counts->ena;
> > -             ad->run += counts->run;
> > -     }
> > }
> >
> > static void print_counter_aggrdata(struct perf_stat_config *config,
> >                                  struct evsel *counter, int s,
> >                                  char *prefix, bool metric_only,
> > -                                bool *first, struct perf_cpu cpu)
> > +                                bool *first)
> > {
> > -     struct aggr_data ad;
> >       FILE *output = config->output;
> >       u64 ena, run, val;
> > -     int nr;
> > -     struct aggr_cpu_id id;
> >       double uval;
> > +     struct perf_stat_evsel *ps = counter->stats;
> > +     struct perf_stat_aggr *aggr = &ps->aggr[s];
> > +     struct aggr_cpu_id id = config->aggr_map->map[s];
> > +     double avg = aggr->counts.val;
> >
> > -     ad.id = id = config->aggr_map->map[s];
> > -     ad.val = ad.ena = ad.run = 0;
> > -     ad.nr = 0;
> > -     if (!collect_data(config, counter, aggr_cb, &ad))
> > +     if (counter->supported && aggr->nr == 0)
> >               return;
>
> Hi Namhyung,
>
> Observed one issue with this change in tmp.perf/core branch of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git
> Patch link: https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/commit/?h=tmp.perf/core&id=91f85f98da7ab8c32105f42dd03884c01ec4498f
>
> If we have multiple events in a group and if all events in that group can't go in at once, the group will fail to schedule.
> The perf stat result used to have status of all events in group.
>
> Example result from powerpc:
>         # perf stat -e '{r1001e,r401e0,r4c010}' <workload>
>         Performance counter stats for 'workload’:
>
>              <not counted>      r1001e
>              <not counted>      r401e0
>            <not supported>      r4c010
>
> But after this change, it shows only the event which broke the constraint.
>
>         # ./perf stat -e '{r1001e,r401e0,r4c010}' <workload>
>         Performance counter stats for 'workload’:
>         <not supported>      r4c010
>
> The new condition added by the patch is:
>
>         if (counter->supported && aggr->nr == 0)
>                         return;
>
>
> The first two events r1001e and r401e0 are good to go in this group, hence
> counter->supported  is true where as aggr->nr is zero since this group is
> not scheduled. Since counter->supported is false for third event which
> broke the group constraint, the condition won't return and hence only prints
> this event in output.
>
> Namhyung, is there any scenario because of which this condition was added ?
> If not I would go ahead and sent a fix patch to remove this check.

I remember it's for AGGR_THREAD, merged (uncore) aliases and hybrid
events.  In system-wide per-thread mode, many of them would get 0 results
and we'd like to skip them.  And now I see that we already have the
evsel->merged_stat check.

Maybe we could just add the aggr_mode check there.
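
As a rough sketch of that direction (just an illustration, not a tested
fix; the exact condition would need to be checked against the merged-alias
and hybrid cases above):

	/* skip all-zero rows only where they are expected noise,
	 * e.g. system-wide per-thread mode, instead of unconditionally
	 */
	if (config->aggr_mode == AGGR_THREAD &&
	    counter->supported && aggr->nr == 0)
		return;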

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 18/20] perf stat: Display event stats using aggr counts
  2022-12-05 19:00     ` Namhyung Kim
@ 2022-12-06 13:39       ` Athira Rajeev
  0 siblings, 0 replies; 28+ messages in thread
From: Athira Rajeev @ 2022-12-06 13:39 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ian Rogers, Jiri Olsa, Ingo Molnar,
	Peter Zijlstra, LKML, Adrian Hunter, linux-perf-users, Kan Liang,
	Leo Yan, Andi Kleen, James Clark, Xing Zhengjun, Michael Petlan



> On 06-Dec-2022, at 12:30 AM, Namhyung Kim <namhyung@kernel.org> wrote:
> 
> Hello,
> 
> On Mon, Dec 5, 2022 at 2:52 AM Athira Rajeev
> <atrajeev@linux.vnet.ibm.com> wrote:
>> 
>> 
>> 
>>> On 18-Oct-2022, at 7:32 AM, Namhyung Kim <namhyung@kernel.org> wrote:
>>> 
>>> Now aggr counts are ready for use.  Convert the display routines to use
>>> the aggr counts and update the shadow stat with them.  It doesn't need
>>> to aggregate counts or collect aliases anymore during the display.  Get
>>> rid of now unused struct perf_aggr_thread_value.
>>> 
>>> Note that there's a difference in the display order among the aggr mode.
>>> For per-core/die/socket/node aggregation, it shows relevant events in
>>> the same unit together, whereas global/thread/no aggregation it shows
>>> the same events for different units together.  So it still uses separate
>>> codes to display them due to the ordering.
>>> 
>>> One more thing to note is that it breaks per-core event display for now.
>>> The next patch will fix it to have identical output as of now.
>>> 
>>> Acked-by: Ian Rogers <irogers@google.com>
>>> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
>>> ---
>>> tools/perf/util/stat-display.c | 421 ++++-----------------------------
>>> tools/perf/util/stat.c         |   5 -
>>> tools/perf/util/stat.h         |   9 -
>>> 3 files changed, 49 insertions(+), 386 deletions(-)
>>> 
>>> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
>>> index 4113aa86772f..bfae2784609c 100644
>>> --- a/tools/perf/util/stat-display.c
>>> +++ b/tools/perf/util/stat-display.c
>>> @@ -442,31 +442,6 @@ static void print_metric_header(struct perf_stat_config *config,
>>>              fprintf(os->fh, "%*s ", config->metric_only_len, unit);
>>> }
>>> 
>>> -static int first_shadow_map_idx(struct perf_stat_config *config,
>>> -                             struct evsel *evsel, const struct aggr_cpu_id *id)
>>> -{
>>> -     struct perf_cpu_map *cpus = evsel__cpus(evsel);
>>> -     struct perf_cpu cpu;
>>> -     int idx;
>>> -
>>> -     if (config->aggr_mode == AGGR_NONE)
>>> -             return perf_cpu_map__idx(cpus, id->cpu);
>>> -
>>> -     if (config->aggr_mode == AGGR_THREAD)
>>> -             return id->thread_idx;
>>> -
>>> -     if (!config->aggr_get_id)
>>> -             return 0;
>>> -
>>> -     perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
>>> -             struct aggr_cpu_id cpu_id = config->aggr_get_id(config, cpu);
>>> -
>>> -             if (aggr_cpu_id__equal(&cpu_id, id))
>>> -                     return idx;
>>> -     }
>>> -     return 0;
>>> -}
>>> -
>>> static void abs_printout(struct perf_stat_config *config,
>>>                       struct aggr_cpu_id id, int nr, struct evsel *evsel, double avg)
>>> {
>>> @@ -537,7 +512,7 @@ static bool is_mixed_hw_group(struct evsel *counter)
>>> static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int nr,
>>>                   struct evsel *counter, double uval,
>>>                   char *prefix, u64 run, u64 ena, double noise,
>>> -                  struct runtime_stat *st)
>>> +                  struct runtime_stat *st, int map_idx)
>>> {
>>>      struct perf_stat_output_ctx out;
>>>      struct outstate os = {
>>> @@ -648,8 +623,7 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int
>>>              print_running(config, run, ena);
>>>      }
>>> 
>>> -     perf_stat__print_shadow_stats(config, counter, uval,
>>> -                             first_shadow_map_idx(config, counter, &id),
>>> +     perf_stat__print_shadow_stats(config, counter, uval, map_idx,
>>>                              &out, &config->metric_events, st);
>>>      if (!config->csv_output && !config->metric_only && !config->json_output) {
>>>              print_noise(config, counter, noise);
>>> @@ -657,34 +631,6 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int
>>>      }
>>> }
>>> 
>>> -static void aggr_update_shadow(struct perf_stat_config *config,
>>> -                            struct evlist *evlist)
>>> -{
>>> -     int idx, s;
>>> -     struct perf_cpu cpu;
>>> -     struct aggr_cpu_id s2, id;
>>> -     u64 val;
>>> -     struct evsel *counter;
>>> -     struct perf_cpu_map *cpus;
>>> -
>>> -     for (s = 0; s < config->aggr_map->nr; s++) {
>>> -             id = config->aggr_map->map[s];
>>> -             evlist__for_each_entry(evlist, counter) {
>>> -                     cpus = evsel__cpus(counter);
>>> -                     val = 0;
>>> -                     perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
>>> -                             s2 = config->aggr_get_id(config, cpu);
>>> -                             if (!aggr_cpu_id__equal(&s2, &id))
>>> -                                     continue;
>>> -                             val += perf_counts(counter->counts, idx, 0)->val;
>>> -                     }
>>> -                     perf_stat__update_shadow_stats(counter, val,
>>> -                                     first_shadow_map_idx(config, counter, &id),
>>> -                                     &rt_stat);
>>> -             }
>>> -     }
>>> -}
>>> -
>>> static void uniquify_event_name(struct evsel *counter)
>>> {
>>>      char *new_name;
>>> @@ -721,137 +667,51 @@ static void uniquify_event_name(struct evsel *counter)
>>>      counter->uniquified_name = true;
>>> }
>>> 
>>> -static void collect_all_aliases(struct perf_stat_config *config, struct evsel *counter,
>>> -                         void (*cb)(struct perf_stat_config *config, struct evsel *counter, void *data,
>>> -                                    bool first),
>>> -                         void *data)
>>> +static bool hybrid_uniquify(struct evsel *evsel, struct perf_stat_config *config)
>>> {
>>> -     struct evlist *evlist = counter->evlist;
>>> -     struct evsel *alias;
>>> -
>>> -     alias = list_prepare_entry(counter, &(evlist->core.entries), core.node);
>>> -     list_for_each_entry_continue (alias, &evlist->core.entries, core.node) {
>>> -             /* Merge events with the same name, etc. but on different PMUs. */
>>> -             if (!strcmp(evsel__name(alias), evsel__name(counter)) &&
>>> -                     alias->scale == counter->scale &&
>>> -                     alias->cgrp == counter->cgrp &&
>>> -                     !strcmp(alias->unit, counter->unit) &&
>>> -                     evsel__is_clock(alias) == evsel__is_clock(counter) &&
>>> -                     strcmp(alias->pmu_name, counter->pmu_name)) {
>>> -                     alias->merged_stat = true;
>>> -                     cb(config, alias, data, false);
>>> -             }
>>> -     }
>>> +     return evsel__is_hybrid(evsel) && !config->hybrid_merge;
>>> }
>>> 
>>> -static bool hybrid_merge(struct evsel *counter, struct perf_stat_config *config,
>>> -                      bool check)
>>> +static void uniquify_counter(struct perf_stat_config *config, struct evsel *counter)
>>> {
>>> -     if (evsel__is_hybrid(counter)) {
>>> -             if (check)
>>> -                     return config->hybrid_merge;
>>> -             else
>>> -                     return !config->hybrid_merge;
>>> -     }
>>> -
>>> -     return false;
>>> -}
>>> -
>>> -static bool collect_data(struct perf_stat_config *config, struct evsel *counter,
>>> -                         void (*cb)(struct perf_stat_config *config, struct evsel *counter, void *data,
>>> -                                    bool first),
>>> -                         void *data)
>>> -{
>>> -     if (counter->merged_stat)
>>> -             return false;
>>> -     cb(config, counter, data, true);
>>> -     if (config->no_merge || hybrid_merge(counter, config, false))
>>> +     if (config->no_merge || hybrid_uniquify(counter, config))
>>>              uniquify_event_name(counter);
>>> -     else if (counter->auto_merge_stats || hybrid_merge(counter, config, true))
>>> -             collect_all_aliases(config, counter, cb, data);
>>> -     return true;
>>> -}
>>> -
>>> -struct aggr_data {
>>> -     u64 ena, run, val;
>>> -     struct aggr_cpu_id id;
>>> -     int nr;
>>> -     int cpu_map_idx;
>>> -};
>>> -
>>> -static void aggr_cb(struct perf_stat_config *config,
>>> -                 struct evsel *counter, void *data, bool first)
>>> -{
>>> -     struct aggr_data *ad = data;
>>> -     int idx;
>>> -     struct perf_cpu cpu;
>>> -     struct perf_cpu_map *cpus;
>>> -     struct aggr_cpu_id s2;
>>> -
>>> -     cpus = evsel__cpus(counter);
>>> -     perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
>>> -             struct perf_counts_values *counts;
>>> -
>>> -             s2 = config->aggr_get_id(config, cpu);
>>> -             if (!aggr_cpu_id__equal(&s2, &ad->id))
>>> -                     continue;
>>> -             if (first)
>>> -                     ad->nr++;
>>> -             counts = perf_counts(counter->counts, idx, 0);
>>> -             /*
>>> -              * When any result is bad, make them all to give
>>> -              * consistent output in interval mode.
>>> -              */
>>> -             if (counts->ena == 0 || counts->run == 0 ||
>>> -                 counter->counts->scaled == -1) {
>>> -                     ad->ena = 0;
>>> -                     ad->run = 0;
>>> -                     break;
>>> -             }
>>> -             ad->val += counts->val;
>>> -             ad->ena += counts->ena;
>>> -             ad->run += counts->run;
>>> -     }
>>> }
>>> 
>>> static void print_counter_aggrdata(struct perf_stat_config *config,
>>>                                 struct evsel *counter, int s,
>>>                                 char *prefix, bool metric_only,
>>> -                                bool *first, struct perf_cpu cpu)
>>> +                                bool *first)
>>> {
>>> -     struct aggr_data ad;
>>>      FILE *output = config->output;
>>>      u64 ena, run, val;
>>> -     int nr;
>>> -     struct aggr_cpu_id id;
>>>      double uval;
>>> +     struct perf_stat_evsel *ps = counter->stats;
>>> +     struct perf_stat_aggr *aggr = &ps->aggr[s];
>>> +     struct aggr_cpu_id id = config->aggr_map->map[s];
>>> +     double avg = aggr->counts.val;
>>> 
>>> -     ad.id = id = config->aggr_map->map[s];
>>> -     ad.val = ad.ena = ad.run = 0;
>>> -     ad.nr = 0;
>>> -     if (!collect_data(config, counter, aggr_cb, &ad))
>>> +     if (counter->supported && aggr->nr == 0)
>>>              return;
>> 
>> Hi Namhyung,
>> 
>> Observed one issue with this change in tmp.perf/core branch of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git
>> Patch link: https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/commit/?h=tmp.perf/core&id=91f85f98da7ab8c32105f42dd03884c01ec4498f
>> 
>> If we have multiple events in a group and if all events in that group can't go in at once, the group will fail to schedule.
>> The perf stat result used to have status of all events in group.
>> 
>> Example result from powerpc:
>>        # perf stat -e '{r1001e,r401e0,r4c010}' <workload>
>>        Performance counter stats for 'workload’:
>> 
>>             <not counted>      r1001e
>>             <not counted>      r401e0
>>           <not supported>      r4c010
>> 
>> But after this change, it shows only the event which broke the constraint.
>> 
>>        # ./perf stat -e '{r1001e,r401e0,r4c010}' <workload>
>>        Performance counter stats for 'workload’:
>>        <not supported>      r4c010
>> 
>> The new condition added by the patch is:
>> 
>>        if (counter->supported && aggr->nr == 0)
>>                        return;
>> 
>> 
>> The first two events r1001e and r401e0 are good to go in this group, hence
>> counter->supported  is true where as aggr->nr is zero since this group is
>> not scheduled. Since counter->supported is false for third event which
>> broke the group constraint, the condition won't return and hence only prints
>> this event in output.
>> 
>> Namhyung, is there any scenario because of which this condition was added ?
>> If not I would go ahead and sent a fix patch to remove this check.
> 
> I remember it's for AGGR_THREAD and merged (uncore) aliases and hybrid
> events.  In system-wide per-thread mode, many of them would get 0 results
> and we'd like to skip them.  And now I see that we have evsel->merged_stat
> check already.
> 
> Maybe we could just add the aggr_mode check there.

Thanks for the response, Namhyung.
I will share my response in the patch "[PATCH] perf stat: Update event skip condition".

Thanks
Athira
> 
> Thanks,
> Namhyung


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 18/20] perf stat: Display event stats using aggr counts
  2022-10-18  2:02 ` [PATCH 18/20] perf stat: Display event stats using aggr counts Namhyung Kim
  2022-12-05 10:51   ` Athira Rajeev
@ 2023-06-14 13:40   ` Jiri Olsa
  2023-06-14 16:20     ` Ian Rogers
  1 sibling, 1 reply; 28+ messages in thread
From: Jiri Olsa @ 2023-06-14 13:40 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, Peter Zijlstra, LKML,
	Ian Rogers, Adrian Hunter, linux-perf-users, Kan Liang, Leo Yan,
	Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun,
	Michael Petlan

On Mon, Oct 17, 2022 at 07:02:25PM -0700, Namhyung Kim wrote:
> Now aggr counts are ready for use.  Convert the display routines to use
> the aggr counts and update the shadow stat with them.  It doesn't need
> to aggregate counts or collect aliases anymore during the display.  Get
> rid of now unused struct perf_aggr_thread_value.
> 
> Note that there's a difference in the display order among the aggr mode.
> For per-core/die/socket/node aggregation, it shows relevant events in
> the same unit together, whereas global/thread/no aggregation it shows
> the same events for different units together.  So it still uses separate
> codes to display them due to the ordering.
> 
> One more thing to note is that it breaks per-core event display for now.
> The next patch will fix it to have identical output as of now.
> 
> Acked-by: Ian Rogers <irogers@google.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

hi,
this one seems to break 'perf stat -r X'; not sure why so far.

the final counts seem to be accumulated instead of averaged, like:


with this patch:

	 Performance counter stats for './test_progs -n 103/1' (2 runs):

	       206,815,929      cycles:u                                                             ( +-  0.05% )
	    16,052,747,533      cycles:k                                                             ( +-  0.10% )
	    16,259,643,167      cycles                                                               ( +-  0.10% )

		   1.98093 +- 0.00586 seconds time elapsed  ( +-  0.30% )


without this patch:

	 Performance counter stats for './test_progs -n 103/1' (2 runs):

	       103,300,812      cycles:u                                                             ( +-  0.37% )
	     8,016,856,866      cycles:k                                                             ( +-  0.32% )
	     8,120,200,572      cycles                                                               ( +-  0.32% )

		   1.97272 +- 0.00270 seconds time elapsed  ( +-  0.14% )
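
as a rough sanity check on the accumulation theory (numbers taken from the
two outputs above):

	2 x   103,300,812 =    206,601,624  ~=    206,815,929  (cycles:u)
	2 x 8,016,856,866 = 16,033,713,732  ~= 16,052,747,533  (cycles:k)

so the counts of the 2 runs look summed rather than averaged.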


any idea? ;-)

thanks,
jirka




> ---
>  tools/perf/util/stat-display.c | 421 ++++-----------------------------
>  tools/perf/util/stat.c         |   5 -
>  tools/perf/util/stat.h         |   9 -
>  3 files changed, 49 insertions(+), 386 deletions(-)
> 
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index 4113aa86772f..bfae2784609c 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -442,31 +442,6 @@ static void print_metric_header(struct perf_stat_config *config,
>  		fprintf(os->fh, "%*s ", config->metric_only_len, unit);
>  }
>  
> -static int first_shadow_map_idx(struct perf_stat_config *config,
> -				struct evsel *evsel, const struct aggr_cpu_id *id)
> -{
> -	struct perf_cpu_map *cpus = evsel__cpus(evsel);
> -	struct perf_cpu cpu;
> -	int idx;
> -
> -	if (config->aggr_mode == AGGR_NONE)
> -		return perf_cpu_map__idx(cpus, id->cpu);
> -
> -	if (config->aggr_mode == AGGR_THREAD)
> -		return id->thread_idx;
> -
> -	if (!config->aggr_get_id)
> -		return 0;
> -
> -	perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> -		struct aggr_cpu_id cpu_id = config->aggr_get_id(config, cpu);
> -
> -		if (aggr_cpu_id__equal(&cpu_id, id))
> -			return idx;
> -	}
> -	return 0;
> -}
> -
>  static void abs_printout(struct perf_stat_config *config,
>  			 struct aggr_cpu_id id, int nr, struct evsel *evsel, double avg)
>  {
> @@ -537,7 +512,7 @@ static bool is_mixed_hw_group(struct evsel *counter)
>  static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int nr,
>  		     struct evsel *counter, double uval,
>  		     char *prefix, u64 run, u64 ena, double noise,
> -		     struct runtime_stat *st)
> +		     struct runtime_stat *st, int map_idx)
>  {
>  	struct perf_stat_output_ctx out;
>  	struct outstate os = {
> @@ -648,8 +623,7 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int
>  		print_running(config, run, ena);
>  	}
>  
> -	perf_stat__print_shadow_stats(config, counter, uval,
> -				first_shadow_map_idx(config, counter, &id),
> +	perf_stat__print_shadow_stats(config, counter, uval, map_idx,
>  				&out, &config->metric_events, st);
>  	if (!config->csv_output && !config->metric_only && !config->json_output) {
>  		print_noise(config, counter, noise);
> @@ -657,34 +631,6 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int
>  	}
>  }
>  
> -static void aggr_update_shadow(struct perf_stat_config *config,
> -			       struct evlist *evlist)
> -{
> -	int idx, s;
> -	struct perf_cpu cpu;
> -	struct aggr_cpu_id s2, id;
> -	u64 val;
> -	struct evsel *counter;
> -	struct perf_cpu_map *cpus;
> -
> -	for (s = 0; s < config->aggr_map->nr; s++) {
> -		id = config->aggr_map->map[s];
> -		evlist__for_each_entry(evlist, counter) {
> -			cpus = evsel__cpus(counter);
> -			val = 0;
> -			perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> -				s2 = config->aggr_get_id(config, cpu);
> -				if (!aggr_cpu_id__equal(&s2, &id))
> -					continue;
> -				val += perf_counts(counter->counts, idx, 0)->val;
> -			}
> -			perf_stat__update_shadow_stats(counter, val,
> -					first_shadow_map_idx(config, counter, &id),
> -					&rt_stat);
> -		}
> -	}
> -}
> -
>  static void uniquify_event_name(struct evsel *counter)
>  {
>  	char *new_name;
> @@ -721,137 +667,51 @@ static void uniquify_event_name(struct evsel *counter)
>  	counter->uniquified_name = true;
>  }
>  
> -static void collect_all_aliases(struct perf_stat_config *config, struct evsel *counter,
> -			    void (*cb)(struct perf_stat_config *config, struct evsel *counter, void *data,
> -				       bool first),
> -			    void *data)
> +static bool hybrid_uniquify(struct evsel *evsel, struct perf_stat_config *config)
>  {
> -	struct evlist *evlist = counter->evlist;
> -	struct evsel *alias;
> -
> -	alias = list_prepare_entry(counter, &(evlist->core.entries), core.node);
> -	list_for_each_entry_continue (alias, &evlist->core.entries, core.node) {
> -		/* Merge events with the same name, etc. but on different PMUs. */
> -		if (!strcmp(evsel__name(alias), evsel__name(counter)) &&
> -			alias->scale == counter->scale &&
> -			alias->cgrp == counter->cgrp &&
> -			!strcmp(alias->unit, counter->unit) &&
> -			evsel__is_clock(alias) == evsel__is_clock(counter) &&
> -			strcmp(alias->pmu_name, counter->pmu_name)) {
> -			alias->merged_stat = true;
> -			cb(config, alias, data, false);
> -		}
> -	}
> +	return evsel__is_hybrid(evsel) && !config->hybrid_merge;
>  }
>  
> -static bool hybrid_merge(struct evsel *counter, struct perf_stat_config *config,
> -			 bool check)
> +static void uniquify_counter(struct perf_stat_config *config, struct evsel *counter)
>  {
> -	if (evsel__is_hybrid(counter)) {
> -		if (check)
> -			return config->hybrid_merge;
> -		else
> -			return !config->hybrid_merge;
> -	}
> -
> -	return false;
> -}
> -
> -static bool collect_data(struct perf_stat_config *config, struct evsel *counter,
> -			    void (*cb)(struct perf_stat_config *config, struct evsel *counter, void *data,
> -				       bool first),
> -			    void *data)
> -{
> -	if (counter->merged_stat)
> -		return false;
> -	cb(config, counter, data, true);
> -	if (config->no_merge || hybrid_merge(counter, config, false))
> +	if (config->no_merge || hybrid_uniquify(counter, config))
>  		uniquify_event_name(counter);
> -	else if (counter->auto_merge_stats || hybrid_merge(counter, config, true))
> -		collect_all_aliases(config, counter, cb, data);
> -	return true;
> -}
> -
> -struct aggr_data {
> -	u64 ena, run, val;
> -	struct aggr_cpu_id id;
> -	int nr;
> -	int cpu_map_idx;
> -};
> -
> -static void aggr_cb(struct perf_stat_config *config,
> -		    struct evsel *counter, void *data, bool first)
> -{
> -	struct aggr_data *ad = data;
> -	int idx;
> -	struct perf_cpu cpu;
> -	struct perf_cpu_map *cpus;
> -	struct aggr_cpu_id s2;
> -
> -	cpus = evsel__cpus(counter);
> -	perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> -		struct perf_counts_values *counts;
> -
> -		s2 = config->aggr_get_id(config, cpu);
> -		if (!aggr_cpu_id__equal(&s2, &ad->id))
> -			continue;
> -		if (first)
> -			ad->nr++;
> -		counts = perf_counts(counter->counts, idx, 0);
> -		/*
> -		 * When any result is bad, make them all to give
> -		 * consistent output in interval mode.
> -		 */
> -		if (counts->ena == 0 || counts->run == 0 ||
> -		    counter->counts->scaled == -1) {
> -			ad->ena = 0;
> -			ad->run = 0;
> -			break;
> -		}
> -		ad->val += counts->val;
> -		ad->ena += counts->ena;
> -		ad->run += counts->run;
> -	}
>  }
>  
>  static void print_counter_aggrdata(struct perf_stat_config *config,
>  				   struct evsel *counter, int s,
>  				   char *prefix, bool metric_only,
> -				   bool *first, struct perf_cpu cpu)
> +				   bool *first)
>  {
> -	struct aggr_data ad;
>  	FILE *output = config->output;
>  	u64 ena, run, val;
> -	int nr;
> -	struct aggr_cpu_id id;
>  	double uval;
> +	struct perf_stat_evsel *ps = counter->stats;
> +	struct perf_stat_aggr *aggr = &ps->aggr[s];
> +	struct aggr_cpu_id id = config->aggr_map->map[s];
> +	double avg = aggr->counts.val;
>  
> -	ad.id = id = config->aggr_map->map[s];
> -	ad.val = ad.ena = ad.run = 0;
> -	ad.nr = 0;
> -	if (!collect_data(config, counter, aggr_cb, &ad))
> +	if (counter->supported && aggr->nr == 0)
>  		return;
>  
> -	if (perf_pmu__has_hybrid() && ad.ena == 0)
> -		return;
> +	uniquify_counter(config, counter);
> +
> +	val = aggr->counts.val;
> +	ena = aggr->counts.ena;
> +	run = aggr->counts.run;
>  
> -	nr = ad.nr;
> -	ena = ad.ena;
> -	run = ad.run;
> -	val = ad.val;
>  	if (*first && metric_only) {
>  		*first = false;
> -		aggr_printout(config, counter, id, nr);
> +		aggr_printout(config, counter, id, aggr->nr);
>  	}
>  	if (prefix && !metric_only)
>  		fprintf(output, "%s", prefix);
>  
>  	uval = val * counter->scale;
> -	if (cpu.cpu != -1)
> -		id = aggr_cpu_id__cpu(cpu, /*data=*/NULL);
>  
> -	printout(config, id, nr, counter, uval,
> -		 prefix, run, ena, 1.0, &rt_stat);
> +	printout(config, id, aggr->nr, counter, uval,
> +		 prefix, run, ena, avg, &rt_stat, s);
> +
>  	if (!metric_only)
>  		fputc('\n', output);
>  }
> @@ -869,8 +729,6 @@ static void print_aggr(struct perf_stat_config *config,
>  	if (!config->aggr_map || !config->aggr_get_id)
>  		return;
>  
> -	aggr_update_shadow(config, evlist);
> -
>  	/*
>  	 * With metric_only everything is on a single line.
>  	 * Without each counter has its own line.
> @@ -881,188 +739,36 @@ static void print_aggr(struct perf_stat_config *config,
>  
>  		first = true;
>  		evlist__for_each_entry(evlist, counter) {
> +			if (counter->merged_stat)
> +				continue;
> +
>  			print_counter_aggrdata(config, counter, s,
> -					prefix, metric_only,
> -					&first, (struct perf_cpu){ .cpu = -1 });
> +					       prefix, metric_only,
> +					       &first);
>  		}
>  		if (metric_only)
>  			fputc('\n', output);
>  	}
>  }
>  
> -static int cmp_val(const void *a, const void *b)
> -{
> -	return ((struct perf_aggr_thread_value *)b)->val -
> -		((struct perf_aggr_thread_value *)a)->val;
> -}
> -
> -static struct perf_aggr_thread_value *sort_aggr_thread(
> -					struct evsel *counter,
> -					int *ret,
> -					struct target *_target)
> -{
> -	int nthreads = perf_thread_map__nr(counter->core.threads);
> -	int i = 0;
> -	double uval;
> -	struct perf_aggr_thread_value *buf;
> -
> -	buf = calloc(nthreads, sizeof(struct perf_aggr_thread_value));
> -	if (!buf)
> -		return NULL;
> -
> -	for (int thread = 0; thread < nthreads; thread++) {
> -		int idx;
> -		u64 ena = 0, run = 0, val = 0;
> -
> -		perf_cpu_map__for_each_idx(idx, evsel__cpus(counter)) {
> -			struct perf_counts_values *counts =
> -				perf_counts(counter->counts, idx, thread);
> -
> -			val += counts->val;
> -			ena += counts->ena;
> -			run += counts->run;
> -		}
> -
> -		uval = val * counter->scale;
> -
> -		/*
> -		 * Skip value 0 when enabling --per-thread globally,
> -		 * otherwise too many 0 output.
> -		 */
> -		if (uval == 0.0 && target__has_per_thread(_target))
> -			continue;
> -
> -		buf[i].counter = counter;
> -		buf[i].id = aggr_cpu_id__empty();
> -		buf[i].id.thread_idx = thread;
> -		buf[i].uval = uval;
> -		buf[i].val = val;
> -		buf[i].run = run;
> -		buf[i].ena = ena;
> -		i++;
> -	}
> -
> -	qsort(buf, i, sizeof(struct perf_aggr_thread_value), cmp_val);
> -
> -	if (ret)
> -		*ret = i;
> -
> -	return buf;
> -}
> -
> -static void print_aggr_thread(struct perf_stat_config *config,
> -			      struct target *_target,
> -			      struct evsel *counter, char *prefix)
> -{
> -	FILE *output = config->output;
> -	int thread, sorted_threads;
> -	struct aggr_cpu_id id;
> -	struct perf_aggr_thread_value *buf;
> -
> -	buf = sort_aggr_thread(counter, &sorted_threads, _target);
> -	if (!buf) {
> -		perror("cannot sort aggr thread");
> -		return;
> -	}
> -
> -	for (thread = 0; thread < sorted_threads; thread++) {
> -		if (prefix)
> -			fprintf(output, "%s", prefix);
> -
> -		id = buf[thread].id;
> -		printout(config, id, 0, buf[thread].counter, buf[thread].uval,
> -			 prefix, buf[thread].run, buf[thread].ena, 1.0,
> -			 &rt_stat);
> -		fputc('\n', output);
> -	}
> -
> -	free(buf);
> -}
> -
> -struct caggr_data {
> -	double avg, avg_enabled, avg_running;
> -};
> -
> -static void counter_aggr_cb(struct perf_stat_config *config __maybe_unused,
> -			    struct evsel *counter, void *data,
> -			    bool first __maybe_unused)
> -{
> -	struct caggr_data *cd = data;
> -	struct perf_counts_values *aggr = &counter->counts->aggr;
> -
> -	cd->avg += aggr->val;
> -	cd->avg_enabled += aggr->ena;
> -	cd->avg_running += aggr->run;
> -}
> -
> -/*
> - * Print out the results of a single counter:
> - * aggregated counts in system-wide mode
> - */
> -static void print_counter_aggr(struct perf_stat_config *config,
> -			       struct evsel *counter, char *prefix)
> -{
> -	bool metric_only = config->metric_only;
> -	FILE *output = config->output;
> -	double uval;
> -	struct caggr_data cd = { .avg = 0.0 };
> -
> -	if (!collect_data(config, counter, counter_aggr_cb, &cd))
> -		return;
> -
> -	if (prefix && !metric_only)
> -		fprintf(output, "%s", prefix);
> -
> -	uval = cd.avg * counter->scale;
> -	printout(config, aggr_cpu_id__empty(), 0, counter, uval, prefix, cd.avg_running,
> -		 cd.avg_enabled, cd.avg, &rt_stat);
> -	if (!metric_only)
> -		fprintf(output, "\n");
> -}
> -
> -static void counter_cb(struct perf_stat_config *config __maybe_unused,
> -		       struct evsel *counter, void *data,
> -		       bool first __maybe_unused)
> -{
> -	struct aggr_data *ad = data;
> -
> -	ad->val += perf_counts(counter->counts, ad->cpu_map_idx, 0)->val;
> -	ad->ena += perf_counts(counter->counts, ad->cpu_map_idx, 0)->ena;
> -	ad->run += perf_counts(counter->counts, ad->cpu_map_idx, 0)->run;
> -}
> -
> -/*
> - * Print out the results of a single counter:
> - * does not use aggregated count in system-wide
> - */
>  static void print_counter(struct perf_stat_config *config,
>  			  struct evsel *counter, char *prefix)
>  {
> -	FILE *output = config->output;
> -	u64 ena, run, val;
> -	double uval;
> -	int idx;
> -	struct perf_cpu cpu;
> -	struct aggr_cpu_id id;
> -
> -	perf_cpu_map__for_each_cpu(cpu, idx, evsel__cpus(counter)) {
> -		struct aggr_data ad = { .cpu_map_idx = idx };
> -
> -		if (!collect_data(config, counter, counter_cb, &ad))
> -			return;
> -		val = ad.val;
> -		ena = ad.ena;
> -		run = ad.run;
> +	bool metric_only = config->metric_only;
> +	bool first = false;
> +	int s;
>  
> -		if (prefix)
> -			fprintf(output, "%s", prefix);
> +	/* AGGR_THREAD doesn't have config->aggr_get_id */
> +	if (!config->aggr_map)
> +		return;
>  
> -		uval = val * counter->scale;
> -		id = aggr_cpu_id__cpu(cpu, /*data=*/NULL);
> -		printout(config, id, 0, counter, uval, prefix,
> -			 run, ena, 1.0, &rt_stat);
> +	if (counter->merged_stat)
> +		return;
>  
> -		fputc('\n', output);
> +	for (s = 0; s < config->aggr_map->nr; s++) {
> +		print_counter_aggrdata(config, counter, s,
> +				       prefix, metric_only,
> +				       &first);
>  	}
>  }
>  
> @@ -1081,6 +787,7 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
>  			u64 ena, run, val;
>  			double uval;
>  			struct aggr_cpu_id id;
> +			struct perf_stat_evsel *ps = counter->stats;
>  			int counter_idx = perf_cpu_map__idx(evsel__cpus(counter), cpu);
>  
>  			if (counter_idx < 0)
> @@ -1093,13 +800,13 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
>  				aggr_printout(config, counter, id, 0);
>  				first = false;
>  			}
> -			val = perf_counts(counter->counts, counter_idx, 0)->val;
> -			ena = perf_counts(counter->counts, counter_idx, 0)->ena;
> -			run = perf_counts(counter->counts, counter_idx, 0)->run;
> +			val = ps->aggr[counter_idx].counts.val;
> +			ena = ps->aggr[counter_idx].counts.ena;
> +			run = ps->aggr[counter_idx].counts.run;
>  
>  			uval = val * counter->scale;
>  			printout(config, id, 0, counter, uval, prefix,
> -				 run, ena, 1.0, &rt_stat);
> +				 run, ena, 1.0, &rt_stat, counter_idx);
>  		}
>  		if (!first)
>  			fputc('\n', config->output);
> @@ -1135,8 +842,8 @@ static void print_metric_headers(struct perf_stat_config *config,
>  	};
>  	bool first = true;
>  
> -		if (config->json_output && !config->interval)
> -			fprintf(config->output, "{");
> +	if (config->json_output && !config->interval)
> +		fprintf(config->output, "{");
>  
>  	if (prefix && !config->json_output)
>  		fprintf(config->output, "%s", prefix);
> @@ -1379,31 +1086,6 @@ static void print_footer(struct perf_stat_config *config)
>  			"the same PMU. Try reorganizing the group.\n");
>  }
>  
> -static void print_percore_thread(struct perf_stat_config *config,
> -				 struct evsel *counter, char *prefix)
> -{
> -	int s;
> -	struct aggr_cpu_id s2, id;
> -	struct perf_cpu_map *cpus;
> -	bool first = true;
> -	int idx;
> -	struct perf_cpu cpu;
> -
> -	cpus = evsel__cpus(counter);
> -	perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> -		s2 = config->aggr_get_id(config, cpu);
> -		for (s = 0; s < config->aggr_map->nr; s++) {
> -			id = config->aggr_map->map[s];
> -			if (aggr_cpu_id__equal(&s2, &id))
> -				break;
> -		}
> -
> -		print_counter_aggrdata(config, counter, s,
> -				       prefix, false,
> -				       &first, cpu);
> -	}
> -}
> -
>  static void print_percore(struct perf_stat_config *config,
>  			  struct evsel *counter, char *prefix)
>  {
> @@ -1416,15 +1098,14 @@ static void print_percore(struct perf_stat_config *config,
>  		return;
>  
>  	if (config->percore_show_thread)
> -		return print_percore_thread(config, counter, prefix);
> +		return print_counter(config, counter, prefix);
>  
>  	for (s = 0; s < config->aggr_map->nr; s++) {
>  		if (prefix && metric_only)
>  			fprintf(output, "%s", prefix);
>  
>  		print_counter_aggrdata(config, counter, s,
> -				prefix, metric_only,
> -				&first, (struct perf_cpu){ .cpu = -1 });
> +				       prefix, metric_only, &first);
>  	}
>  
>  	if (metric_only)
> @@ -1469,17 +1150,13 @@ void evlist__print_counters(struct evlist *evlist, struct perf_stat_config *conf
>  		print_aggr(config, evlist, prefix);
>  		break;
>  	case AGGR_THREAD:
> -		evlist__for_each_entry(evlist, counter) {
> -			print_aggr_thread(config, _target, counter, prefix);
> -		}
> -		break;
>  	case AGGR_GLOBAL:
>  		if (config->iostat_run)
>  			iostat_print_counters(evlist, config, ts, prefix = buf,
> -					      print_counter_aggr);
> +					      print_counter);
>  		else {
>  			evlist__for_each_entry(evlist, counter) {
> -				print_counter_aggr(config, counter, prefix);
> +				print_counter(config, counter, prefix);
>  			}
>  			if (metric_only)
>  				fputc('\n', config->output);
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index c0955a0427ab..0316557adce9 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -565,11 +565,6 @@ int perf_stat_process_counter(struct perf_stat_config *config,
>  			evsel__name(counter), count[0], count[1], count[2]);
>  	}
>  
> -	/*
> -	 * Save the full runtime - to allow normalization during printout:
> -	 */
> -	perf_stat__update_shadow_stats(counter, *count, 0, &rt_stat);
> -
>  	return 0;
>  }
>  
> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> index 3d413ba8c68a..382a1ab92ce1 100644
> --- a/tools/perf/util/stat.h
> +++ b/tools/perf/util/stat.h
> @@ -224,15 +224,6 @@ static inline void update_rusage_stats(struct rusage_stats *ru_stats, struct rus
>  struct evsel;
>  struct evlist;
>  
> -struct perf_aggr_thread_value {
> -	struct evsel *counter;
> -	struct aggr_cpu_id id;
> -	double uval;
> -	u64 val;
> -	u64 run;
> -	u64 ena;
> -};
> -
>  bool __perf_stat_evsel__is(struct evsel *evsel, enum perf_stat_evsel_id id);
>  
>  #define perf_stat_evsel__is(evsel, id) \
> -- 
> 2.38.0.413.g74048e4d9e-goog
> 


* Re: [PATCH 18/20] perf stat: Display event stats using aggr counts
  2023-06-14 13:40   ` Jiri Olsa
@ 2023-06-14 16:20     ` Ian Rogers
  2023-06-15  8:10       ` Jiri Olsa
  0 siblings, 1 reply; 28+ messages in thread
From: Ian Rogers @ 2023-06-14 16:20 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Namhyung Kim, Arnaldo Carvalho de Melo, Ingo Molnar,
	Peter Zijlstra, LKML, Adrian Hunter, linux-perf-users, Kan Liang,
	Leo Yan, Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun,
	Michael Petlan

On Wed, Jun 14, 2023 at 6:40 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Mon, Oct 17, 2022 at 07:02:25PM -0700, Namhyung Kim wrote:
> > Now aggr counts are ready for use.  Convert the display routines to use
> > the aggr counts and update the shadow stat with them.  It doesn't need
> > to aggregate counts or collect aliases anymore during the display.  Get
> > rid of now unused struct perf_aggr_thread_value.
> >
> > Note that there's a difference in the display order among the aggr modes.
> > For per-core/die/socket/node aggregation, it shows relevant events in
> > the same unit together, whereas for global/thread/no aggregation it shows
> > the same events for different units together.  So it still uses separate
> > code paths to display them due to the ordering.
> >
> > One more thing to note is that it breaks per-core event display for now.
> > The next patch will fix it so the output is identical to the current one.
> >
> > Acked-by: Ian Rogers <irogers@google.com>
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
>
> hi,
> this one seems to break 'perf stat -r X', not sure why so far..
>
> the final counts seem to be accumulated instead of averaged, like:
>
>
> with this patch:
>
>          Performance counter stats for './test_progs -n 103/1' (2 runs):
>
>                206,815,929      cycles:u                                                             ( +-  0.05% )
>             16,052,747,533      cycles:k                                                             ( +-  0.10% )
>             16,259,643,167      cycles                                                               ( +-  0.10% )
>
>                    1.98093 +- 0.00586 seconds time elapsed  ( +-  0.30% )
>
>
> without this patch:
>
>          Performance counter stats for './test_progs -n 103/1' (2 runs):
>
>                103,300,812      cycles:u                                                             ( +-  0.37% )
>              8,016,856,866      cycles:k                                                             ( +-  0.32% )
>              8,120,200,572      cycles                                                               ( +-  0.32% )
>
>                    1.97272 +- 0.00270 seconds time elapsed  ( +-  0.14% )
>
>
> any idea? ;-)
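
The with-patch numbers are almost exactly 2x the without-patch ones
(206,815,929 vs 103,300,812 cycles:u, and similarly for cycles:k and
cycles), which does look like the two runs being summed rather than
averaged.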

Is this still broken in perf-tools-next? The patch is quite old and
there's been work in this area. I'm assuming yes, but thought it was
worth checking.

Thanks,
Ian

> thanks,
> jirka
>
>
>
>
> > ---
> >  tools/perf/util/stat-display.c | 421 ++++-----------------------------
> >  tools/perf/util/stat.c         |   5 -
> >  tools/perf/util/stat.h         |   9 -
> >  3 files changed, 49 insertions(+), 386 deletions(-)
> >
> > diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> > index 4113aa86772f..bfae2784609c 100644
> > --- a/tools/perf/util/stat-display.c
> > +++ b/tools/perf/util/stat-display.c
> > @@ -442,31 +442,6 @@ static void print_metric_header(struct perf_stat_config *config,
> >               fprintf(os->fh, "%*s ", config->metric_only_len, unit);
> >  }
> >
> > -static int first_shadow_map_idx(struct perf_stat_config *config,
> > -                             struct evsel *evsel, const struct aggr_cpu_id *id)
> > -{
> > -     struct perf_cpu_map *cpus = evsel__cpus(evsel);
> > -     struct perf_cpu cpu;
> > -     int idx;
> > -
> > -     if (config->aggr_mode == AGGR_NONE)
> > -             return perf_cpu_map__idx(cpus, id->cpu);
> > -
> > -     if (config->aggr_mode == AGGR_THREAD)
> > -             return id->thread_idx;
> > -
> > -     if (!config->aggr_get_id)
> > -             return 0;
> > -
> > -     perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> > -             struct aggr_cpu_id cpu_id = config->aggr_get_id(config, cpu);
> > -
> > -             if (aggr_cpu_id__equal(&cpu_id, id))
> > -                     return idx;
> > -     }
> > -     return 0;
> > -}
> > -
> >  static void abs_printout(struct perf_stat_config *config,
> >                        struct aggr_cpu_id id, int nr, struct evsel *evsel, double avg)
> >  {
> > @@ -537,7 +512,7 @@ static bool is_mixed_hw_group(struct evsel *counter)
> >  static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int nr,
> >                    struct evsel *counter, double uval,
> >                    char *prefix, u64 run, u64 ena, double noise,
> > -                  struct runtime_stat *st)
> > +                  struct runtime_stat *st, int map_idx)
> >  {
> >       struct perf_stat_output_ctx out;
> >       struct outstate os = {
> > @@ -648,8 +623,7 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int
> >               print_running(config, run, ena);
> >       }
> >
> > -     perf_stat__print_shadow_stats(config, counter, uval,
> > -                             first_shadow_map_idx(config, counter, &id),
> > +     perf_stat__print_shadow_stats(config, counter, uval, map_idx,
> >                               &out, &config->metric_events, st);
> >       if (!config->csv_output && !config->metric_only && !config->json_output) {
> >               print_noise(config, counter, noise);
> > @@ -657,34 +631,6 @@ static void printout(struct perf_stat_config *config, struct aggr_cpu_id id, int
> >       }
> >  }
> >
> > -static void aggr_update_shadow(struct perf_stat_config *config,
> > -                            struct evlist *evlist)
> > -{
> > -     int idx, s;
> > -     struct perf_cpu cpu;
> > -     struct aggr_cpu_id s2, id;
> > -     u64 val;
> > -     struct evsel *counter;
> > -     struct perf_cpu_map *cpus;
> > -
> > -     for (s = 0; s < config->aggr_map->nr; s++) {
> > -             id = config->aggr_map->map[s];
> > -             evlist__for_each_entry(evlist, counter) {
> > -                     cpus = evsel__cpus(counter);
> > -                     val = 0;
> > -                     perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> > -                             s2 = config->aggr_get_id(config, cpu);
> > -                             if (!aggr_cpu_id__equal(&s2, &id))
> > -                                     continue;
> > -                             val += perf_counts(counter->counts, idx, 0)->val;
> > -                     }
> > -                     perf_stat__update_shadow_stats(counter, val,
> > -                                     first_shadow_map_idx(config, counter, &id),
> > -                                     &rt_stat);
> > -             }
> > -     }
> > -}
> > -
> >  static void uniquify_event_name(struct evsel *counter)
> >  {
> >       char *new_name;
> > @@ -721,137 +667,51 @@ static void uniquify_event_name(struct evsel *counter)
> >       counter->uniquified_name = true;
> >  }
> >
> > -static void collect_all_aliases(struct perf_stat_config *config, struct evsel *counter,
> > -                         void (*cb)(struct perf_stat_config *config, struct evsel *counter, void *data,
> > -                                    bool first),
> > -                         void *data)
> > +static bool hybrid_uniquify(struct evsel *evsel, struct perf_stat_config *config)
> >  {
> > -     struct evlist *evlist = counter->evlist;
> > -     struct evsel *alias;
> > -
> > -     alias = list_prepare_entry(counter, &(evlist->core.entries), core.node);
> > -     list_for_each_entry_continue (alias, &evlist->core.entries, core.node) {
> > -             /* Merge events with the same name, etc. but on different PMUs. */
> > -             if (!strcmp(evsel__name(alias), evsel__name(counter)) &&
> > -                     alias->scale == counter->scale &&
> > -                     alias->cgrp == counter->cgrp &&
> > -                     !strcmp(alias->unit, counter->unit) &&
> > -                     evsel__is_clock(alias) == evsel__is_clock(counter) &&
> > -                     strcmp(alias->pmu_name, counter->pmu_name)) {
> > -                     alias->merged_stat = true;
> > -                     cb(config, alias, data, false);
> > -             }
> > -     }
> > +     return evsel__is_hybrid(evsel) && !config->hybrid_merge;
> >  }
> >
> > -static bool hybrid_merge(struct evsel *counter, struct perf_stat_config *config,
> > -                      bool check)
> > +static void uniquify_counter(struct perf_stat_config *config, struct evsel *counter)
> >  {
> > -     if (evsel__is_hybrid(counter)) {
> > -             if (check)
> > -                     return config->hybrid_merge;
> > -             else
> > -                     return !config->hybrid_merge;
> > -     }
> > -
> > -     return false;
> > -}
> > -
> > -static bool collect_data(struct perf_stat_config *config, struct evsel *counter,
> > -                         void (*cb)(struct perf_stat_config *config, struct evsel *counter, void *data,
> > -                                    bool first),
> > -                         void *data)
> > -{
> > -     if (counter->merged_stat)
> > -             return false;
> > -     cb(config, counter, data, true);
> > -     if (config->no_merge || hybrid_merge(counter, config, false))
> > +     if (config->no_merge || hybrid_uniquify(counter, config))
> >               uniquify_event_name(counter);
> > -     else if (counter->auto_merge_stats || hybrid_merge(counter, config, true))
> > -             collect_all_aliases(config, counter, cb, data);
> > -     return true;
> > -}
> > -
> > -struct aggr_data {
> > -     u64 ena, run, val;
> > -     struct aggr_cpu_id id;
> > -     int nr;
> > -     int cpu_map_idx;
> > -};
> > -
> > -static void aggr_cb(struct perf_stat_config *config,
> > -                 struct evsel *counter, void *data, bool first)
> > -{
> > -     struct aggr_data *ad = data;
> > -     int idx;
> > -     struct perf_cpu cpu;
> > -     struct perf_cpu_map *cpus;
> > -     struct aggr_cpu_id s2;
> > -
> > -     cpus = evsel__cpus(counter);
> > -     perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> > -             struct perf_counts_values *counts;
> > -
> > -             s2 = config->aggr_get_id(config, cpu);
> > -             if (!aggr_cpu_id__equal(&s2, &ad->id))
> > -                     continue;
> > -             if (first)
> > -                     ad->nr++;
> > -             counts = perf_counts(counter->counts, idx, 0);
> > -             /*
> > -              * When any result is bad, make them all to give
> > -              * consistent output in interval mode.
> > -              */
> > -             if (counts->ena == 0 || counts->run == 0 ||
> > -                 counter->counts->scaled == -1) {
> > -                     ad->ena = 0;
> > -                     ad->run = 0;
> > -                     break;
> > -             }
> > -             ad->val += counts->val;
> > -             ad->ena += counts->ena;
> > -             ad->run += counts->run;
> > -     }
> >  }
> >
> >  static void print_counter_aggrdata(struct perf_stat_config *config,
> >                                  struct evsel *counter, int s,
> >                                  char *prefix, bool metric_only,
> > -                                bool *first, struct perf_cpu cpu)
> > +                                bool *first)
> >  {
> > -     struct aggr_data ad;
> >       FILE *output = config->output;
> >       u64 ena, run, val;
> > -     int nr;
> > -     struct aggr_cpu_id id;
> >       double uval;
> > +     struct perf_stat_evsel *ps = counter->stats;
> > +     struct perf_stat_aggr *aggr = &ps->aggr[s];
> > +     struct aggr_cpu_id id = config->aggr_map->map[s];
> > +     double avg = aggr->counts.val;
> >
> > -     ad.id = id = config->aggr_map->map[s];
> > -     ad.val = ad.ena = ad.run = 0;
> > -     ad.nr = 0;
> > -     if (!collect_data(config, counter, aggr_cb, &ad))
> > +     if (counter->supported && aggr->nr == 0)
> >               return;
> >
> > -     if (perf_pmu__has_hybrid() && ad.ena == 0)
> > -             return;
> > +     uniquify_counter(config, counter);
> > +
> > +     val = aggr->counts.val;
> > +     ena = aggr->counts.ena;
> > +     run = aggr->counts.run;
> >
> > -     nr = ad.nr;
> > -     ena = ad.ena;
> > -     run = ad.run;
> > -     val = ad.val;
> >       if (*first && metric_only) {
> >               *first = false;
> > -             aggr_printout(config, counter, id, nr);
> > +             aggr_printout(config, counter, id, aggr->nr);
> >       }
> >       if (prefix && !metric_only)
> >               fprintf(output, "%s", prefix);
> >
> >       uval = val * counter->scale;
> > -     if (cpu.cpu != -1)
> > -             id = aggr_cpu_id__cpu(cpu, /*data=*/NULL);
> >
> > -     printout(config, id, nr, counter, uval,
> > -              prefix, run, ena, 1.0, &rt_stat);
> > +     printout(config, id, aggr->nr, counter, uval,
> > +              prefix, run, ena, avg, &rt_stat, s);
> > +
> >       if (!metric_only)
> >               fputc('\n', output);
> >  }
> > @@ -869,8 +729,6 @@ static void print_aggr(struct perf_stat_config *config,
> >       if (!config->aggr_map || !config->aggr_get_id)
> >               return;
> >
> > -     aggr_update_shadow(config, evlist);
> > -
> >       /*
> >        * With metric_only everything is on a single line.
> >        * Without each counter has its own line.
> > @@ -881,188 +739,36 @@ static void print_aggr(struct perf_stat_config *config,
> >
> >               first = true;
> >               evlist__for_each_entry(evlist, counter) {
> > +                     if (counter->merged_stat)
> > +                             continue;
> > +
> >                       print_counter_aggrdata(config, counter, s,
> > -                                     prefix, metric_only,
> > -                                     &first, (struct perf_cpu){ .cpu = -1 });
> > +                                            prefix, metric_only,
> > +                                            &first);
> >               }
> >               if (metric_only)
> >                       fputc('\n', output);
> >       }
> >  }
> >
> > -static int cmp_val(const void *a, const void *b)
> > -{
> > -     return ((struct perf_aggr_thread_value *)b)->val -
> > -             ((struct perf_aggr_thread_value *)a)->val;
> > -}
> > -
> > -static struct perf_aggr_thread_value *sort_aggr_thread(
> > -                                     struct evsel *counter,
> > -                                     int *ret,
> > -                                     struct target *_target)
> > -{
> > -     int nthreads = perf_thread_map__nr(counter->core.threads);
> > -     int i = 0;
> > -     double uval;
> > -     struct perf_aggr_thread_value *buf;
> > -
> > -     buf = calloc(nthreads, sizeof(struct perf_aggr_thread_value));
> > -     if (!buf)
> > -             return NULL;
> > -
> > -     for (int thread = 0; thread < nthreads; thread++) {
> > -             int idx;
> > -             u64 ena = 0, run = 0, val = 0;
> > -
> > -             perf_cpu_map__for_each_idx(idx, evsel__cpus(counter)) {
> > -                     struct perf_counts_values *counts =
> > -                             perf_counts(counter->counts, idx, thread);
> > -
> > -                     val += counts->val;
> > -                     ena += counts->ena;
> > -                     run += counts->run;
> > -             }
> > -
> > -             uval = val * counter->scale;
> > -
> > -             /*
> > -              * Skip value 0 when enabling --per-thread globally,
> > -              * otherwise too many 0 output.
> > -              */
> > -             if (uval == 0.0 && target__has_per_thread(_target))
> > -                     continue;
> > -
> > -             buf[i].counter = counter;
> > -             buf[i].id = aggr_cpu_id__empty();
> > -             buf[i].id.thread_idx = thread;
> > -             buf[i].uval = uval;
> > -             buf[i].val = val;
> > -             buf[i].run = run;
> > -             buf[i].ena = ena;
> > -             i++;
> > -     }
> > -
> > -     qsort(buf, i, sizeof(struct perf_aggr_thread_value), cmp_val);
> > -
> > -     if (ret)
> > -             *ret = i;
> > -
> > -     return buf;
> > -}
> > -
> > -static void print_aggr_thread(struct perf_stat_config *config,
> > -                           struct target *_target,
> > -                           struct evsel *counter, char *prefix)
> > -{
> > -     FILE *output = config->output;
> > -     int thread, sorted_threads;
> > -     struct aggr_cpu_id id;
> > -     struct perf_aggr_thread_value *buf;
> > -
> > -     buf = sort_aggr_thread(counter, &sorted_threads, _target);
> > -     if (!buf) {
> > -             perror("cannot sort aggr thread");
> > -             return;
> > -     }
> > -
> > -     for (thread = 0; thread < sorted_threads; thread++) {
> > -             if (prefix)
> > -                     fprintf(output, "%s", prefix);
> > -
> > -             id = buf[thread].id;
> > -             printout(config, id, 0, buf[thread].counter, buf[thread].uval,
> > -                      prefix, buf[thread].run, buf[thread].ena, 1.0,
> > -                      &rt_stat);
> > -             fputc('\n', output);
> > -     }
> > -
> > -     free(buf);
> > -}
> > -
> > -struct caggr_data {
> > -     double avg, avg_enabled, avg_running;
> > -};
> > -
> > -static void counter_aggr_cb(struct perf_stat_config *config __maybe_unused,
> > -                         struct evsel *counter, void *data,
> > -                         bool first __maybe_unused)
> > -{
> > -     struct caggr_data *cd = data;
> > -     struct perf_counts_values *aggr = &counter->counts->aggr;
> > -
> > -     cd->avg += aggr->val;
> > -     cd->avg_enabled += aggr->ena;
> > -     cd->avg_running += aggr->run;
> > -}
> > -
> > -/*
> > - * Print out the results of a single counter:
> > - * aggregated counts in system-wide mode
> > - */
> > -static void print_counter_aggr(struct perf_stat_config *config,
> > -                            struct evsel *counter, char *prefix)
> > -{
> > -     bool metric_only = config->metric_only;
> > -     FILE *output = config->output;
> > -     double uval;
> > -     struct caggr_data cd = { .avg = 0.0 };
> > -
> > -     if (!collect_data(config, counter, counter_aggr_cb, &cd))
> > -             return;
> > -
> > -     if (prefix && !metric_only)
> > -             fprintf(output, "%s", prefix);
> > -
> > -     uval = cd.avg * counter->scale;
> > -     printout(config, aggr_cpu_id__empty(), 0, counter, uval, prefix, cd.avg_running,
> > -              cd.avg_enabled, cd.avg, &rt_stat);
> > -     if (!metric_only)
> > -             fprintf(output, "\n");
> > -}
> > -
> > -static void counter_cb(struct perf_stat_config *config __maybe_unused,
> > -                    struct evsel *counter, void *data,
> > -                    bool first __maybe_unused)
> > -{
> > -     struct aggr_data *ad = data;
> > -
> > -     ad->val += perf_counts(counter->counts, ad->cpu_map_idx, 0)->val;
> > -     ad->ena += perf_counts(counter->counts, ad->cpu_map_idx, 0)->ena;
> > -     ad->run += perf_counts(counter->counts, ad->cpu_map_idx, 0)->run;
> > -}
> > -
> > -/*
> > - * Print out the results of a single counter:
> > - * does not use aggregated count in system-wide
> > - */
> >  static void print_counter(struct perf_stat_config *config,
> >                         struct evsel *counter, char *prefix)
> >  {
> > -     FILE *output = config->output;
> > -     u64 ena, run, val;
> > -     double uval;
> > -     int idx;
> > -     struct perf_cpu cpu;
> > -     struct aggr_cpu_id id;
> > -
> > -     perf_cpu_map__for_each_cpu(cpu, idx, evsel__cpus(counter)) {
> > -             struct aggr_data ad = { .cpu_map_idx = idx };
> > -
> > -             if (!collect_data(config, counter, counter_cb, &ad))
> > -                     return;
> > -             val = ad.val;
> > -             ena = ad.ena;
> > -             run = ad.run;
> > +     bool metric_only = config->metric_only;
> > +     bool first = false;
> > +     int s;
> >
> > -             if (prefix)
> > -                     fprintf(output, "%s", prefix);
> > +     /* AGGR_THREAD doesn't have config->aggr_get_id */
> > +     if (!config->aggr_map)
> > +             return;
> >
> > -             uval = val * counter->scale;
> > -             id = aggr_cpu_id__cpu(cpu, /*data=*/NULL);
> > -             printout(config, id, 0, counter, uval, prefix,
> > -                      run, ena, 1.0, &rt_stat);
> > +     if (counter->merged_stat)
> > +             return;
> >
> > -             fputc('\n', output);
> > +     for (s = 0; s < config->aggr_map->nr; s++) {
> > +             print_counter_aggrdata(config, counter, s,
> > +                                    prefix, metric_only,
> > +                                    &first);
> >       }
> >  }
> >
> > @@ -1081,6 +787,7 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
> >                       u64 ena, run, val;
> >                       double uval;
> >                       struct aggr_cpu_id id;
> > +                     struct perf_stat_evsel *ps = counter->stats;
> >                       int counter_idx = perf_cpu_map__idx(evsel__cpus(counter), cpu);
> >
> >                       if (counter_idx < 0)
> > @@ -1093,13 +800,13 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
> >                               aggr_printout(config, counter, id, 0);
> >                               first = false;
> >                       }
> > -                     val = perf_counts(counter->counts, counter_idx, 0)->val;
> > -                     ena = perf_counts(counter->counts, counter_idx, 0)->ena;
> > -                     run = perf_counts(counter->counts, counter_idx, 0)->run;
> > +                     val = ps->aggr[counter_idx].counts.val;
> > +                     ena = ps->aggr[counter_idx].counts.ena;
> > +                     run = ps->aggr[counter_idx].counts.run;
> >
> >                       uval = val * counter->scale;
> >                       printout(config, id, 0, counter, uval, prefix,
> > -                              run, ena, 1.0, &rt_stat);
> > +                              run, ena, 1.0, &rt_stat, counter_idx);
> >               }
> >               if (!first)
> >                       fputc('\n', config->output);
> > @@ -1135,8 +842,8 @@ static void print_metric_headers(struct perf_stat_config *config,
> >       };
> >       bool first = true;
> >
> > -             if (config->json_output && !config->interval)
> > -                     fprintf(config->output, "{");
> > +     if (config->json_output && !config->interval)
> > +             fprintf(config->output, "{");
> >
> >       if (prefix && !config->json_output)
> >               fprintf(config->output, "%s", prefix);
> > @@ -1379,31 +1086,6 @@ static void print_footer(struct perf_stat_config *config)
> >                       "the same PMU. Try reorganizing the group.\n");
> >  }
> >
> > -static void print_percore_thread(struct perf_stat_config *config,
> > -                              struct evsel *counter, char *prefix)
> > -{
> > -     int s;
> > -     struct aggr_cpu_id s2, id;
> > -     struct perf_cpu_map *cpus;
> > -     bool first = true;
> > -     int idx;
> > -     struct perf_cpu cpu;
> > -
> > -     cpus = evsel__cpus(counter);
> > -     perf_cpu_map__for_each_cpu(cpu, idx, cpus) {
> > -             s2 = config->aggr_get_id(config, cpu);
> > -             for (s = 0; s < config->aggr_map->nr; s++) {
> > -                     id = config->aggr_map->map[s];
> > -                     if (aggr_cpu_id__equal(&s2, &id))
> > -                             break;
> > -             }
> > -
> > -             print_counter_aggrdata(config, counter, s,
> > -                                    prefix, false,
> > -                                    &first, cpu);
> > -     }
> > -}
> > -
> >  static void print_percore(struct perf_stat_config *config,
> >                         struct evsel *counter, char *prefix)
> >  {
> > @@ -1416,15 +1098,14 @@ static void print_percore(struct perf_stat_config *config,
> >               return;
> >
> >       if (config->percore_show_thread)
> > -             return print_percore_thread(config, counter, prefix);
> > +             return print_counter(config, counter, prefix);
> >
> >       for (s = 0; s < config->aggr_map->nr; s++) {
> >               if (prefix && metric_only)
> >                       fprintf(output, "%s", prefix);
> >
> >               print_counter_aggrdata(config, counter, s,
> > -                             prefix, metric_only,
> > -                             &first, (struct perf_cpu){ .cpu = -1 });
> > +                                    prefix, metric_only, &first);
> >       }
> >
> >       if (metric_only)
> > @@ -1469,17 +1150,13 @@ void evlist__print_counters(struct evlist *evlist, struct perf_stat_config *conf
> >               print_aggr(config, evlist, prefix);
> >               break;
> >       case AGGR_THREAD:
> > -             evlist__for_each_entry(evlist, counter) {
> > -                     print_aggr_thread(config, _target, counter, prefix);
> > -             }
> > -             break;
> >       case AGGR_GLOBAL:
> >               if (config->iostat_run)
> >                       iostat_print_counters(evlist, config, ts, prefix = buf,
> > -                                           print_counter_aggr);
> > +                                           print_counter);
> >               else {
> >                       evlist__for_each_entry(evlist, counter) {
> > -                             print_counter_aggr(config, counter, prefix);
> > +                             print_counter(config, counter, prefix);
> >                       }
> >                       if (metric_only)
> >                               fputc('\n', config->output);
> > diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> > index c0955a0427ab..0316557adce9 100644
> > --- a/tools/perf/util/stat.c
> > +++ b/tools/perf/util/stat.c
> > @@ -565,11 +565,6 @@ int perf_stat_process_counter(struct perf_stat_config *config,
> >                       evsel__name(counter), count[0], count[1], count[2]);
> >       }
> >
> > -     /*
> > -      * Save the full runtime - to allow normalization during printout:
> > -      */
> > -     perf_stat__update_shadow_stats(counter, *count, 0, &rt_stat);
> > -
> >       return 0;
> >  }
> >
> > diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> > index 3d413ba8c68a..382a1ab92ce1 100644
> > --- a/tools/perf/util/stat.h
> > +++ b/tools/perf/util/stat.h
> > @@ -224,15 +224,6 @@ static inline void update_rusage_stats(struct rusage_stats *ru_stats, struct rus
> >  struct evsel;
> >  struct evlist;
> >
> > -struct perf_aggr_thread_value {
> > -     struct evsel *counter;
> > -     struct aggr_cpu_id id;
> > -     double uval;
> > -     u64 val;
> > -     u64 run;
> > -     u64 ena;
> > -};
> > -
> >  bool __perf_stat_evsel__is(struct evsel *evsel, enum perf_stat_evsel_id id);
> >
> >  #define perf_stat_evsel__is(evsel, id) \
> > --
> > 2.38.0.413.g74048e4d9e-goog
> >


* Re: [PATCH 18/20] perf stat: Display event stats using aggr counts
  2023-06-14 16:20     ` Ian Rogers
@ 2023-06-15  8:10       ` Jiri Olsa
  2023-06-15 17:31         ` Namhyung Kim
  0 siblings, 1 reply; 28+ messages in thread
From: Jiri Olsa @ 2023-06-15  8:10 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Jiri Olsa, Namhyung Kim, Arnaldo Carvalho de Melo, Ingo Molnar,
	Peter Zijlstra, LKML, Adrian Hunter, linux-perf-users, Kan Liang,
	Leo Yan, Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun,
	Michael Petlan

On Wed, Jun 14, 2023 at 09:20:53AM -0700, Ian Rogers wrote:
> On Wed, Jun 14, 2023 at 6:40 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Mon, Oct 17, 2022 at 07:02:25PM -0700, Namhyung Kim wrote:
> > > Now aggr counts are ready for use.  Convert the display routines to use
> > > the aggr counts and update the shadow stat with them.  It doesn't need
> > > to aggregate counts or collect aliases anymore during the display.  Get
> > > rid of now unused struct perf_aggr_thread_value.
> > >
> > > Note that there's a difference in the display order among the aggr modes.
> > > For per-core/die/socket/node aggregation, it shows relevant events in
> > > the same unit together, whereas for global/thread/no aggregation it shows
> > > the same events for different units together.  So it still uses separate
> > > code paths to display them due to the ordering.
> > >
> > > One more thing to note is that it breaks per-core event display for now.
> > > The next patch will fix it so the output is identical to the current one.
> > >
> > > Acked-by: Ian Rogers <irogers@google.com>
> > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> >
> > hi,
> > this one seems to break 'perf stat -r X', not sure why so far..
> >
> > the final counts seem to be accumulated instead of averaged, like:
> >
> >
> > with this patch:
> >
> >          Performance counter stats for './test_progs -n 103/1' (2 runs):
> >
> >                206,815,929      cycles:u                                                             ( +-  0.05% )
> >             16,052,747,533      cycles:k                                                             ( +-  0.10% )
> >             16,259,643,167      cycles                                                               ( +-  0.10% )
> >
> >                    1.98093 +- 0.00586 seconds time elapsed  ( +-  0.30% )
> >
> >
> > without this patch:
> >
> >          Performance counter stats for './test_progs -n 103/1' (2 runs):
> >
> >                103,300,812      cycles:u                                                             ( +-  0.37% )
> >              8,016,856,866      cycles:k                                                             ( +-  0.32% )
> >              8,120,200,572      cycles                                                               ( +-  0.32% )
> >
> >                    1.97272 +- 0.00270 seconds time elapsed  ( +-  0.14% )
> >
> >
> > any idea? ;-)
> 
> Is this still broken in perf-tools-next? The patch is quite old and
> there's been work in this area. I'm assuming yes, but thought it was
> worth checking.

yes


single run:

	[root@krava perf]# ./perf stat -e cycles:u ./perf bench sched pipe
	# Running 'sched/pipe' benchmark:
	# Executed 1000000 pipe operations between two processes

	     Total time: 4.725 [sec]

	       4.725795 usecs/op
		 211604 ops/sec

	 Performance counter stats for './perf bench sched pipe':

	       398,096,363      cycles:u                                                              

	       4.737638715 seconds time elapsed

	       0.227961000 seconds user
	       4.348895000 seconds sys


3 runs (with verbose):

	[root@krava perf]# ./perf stat -v -r 3 -e cycles:u ./perf bench sched pipe
	Using CPUID GenuineIntel-6-8C-1
	Control descriptor is not initialized
	[ perf stat: executing run #1 ... ]
	# Running 'sched/pipe' benchmark:
	# Executed 1000000 pipe operations between two processes

	     Total time: 4.659 [sec]

	       4.659396 usecs/op
		 214620 ops/sec
	cycles:u: 399150620 3866604490 3866604490
	[ perf stat: executing run #2 ... ]
	# Running 'sched/pipe' benchmark:
	# Executed 1000000 pipe operations between two processes

	     Total time: 4.656 [sec]

	       4.656820 usecs/op
		 214738 ops/sec
	cycles:u: 795910540 7700776638 7700776638
	[ perf stat: executing run #3 ... ]
	# Running 'sched/pipe' benchmark:
	# Executed 1000000 pipe operations between two processes

	     Total time: 4.634 [sec]

	       4.634090 usecs/op
		 215792 ops/sec
	cycles:u: 1189927957 11522039559 11522039559

	 Performance counter stats for './perf bench sched pipe' (3 runs):

	     1,189,927,957      cycles:u                                                                ( +- 19.18% )

		    4.6611 +- 0.0102 seconds time elapsed  ( +-  0.22% )
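
fwiw the per-run deltas in the cycles:u debug lines above are
399,150,620 / 396,759,920 / 394,017,417, each close to the single-run
number, so the final 1,189,927,957 is the sum over the 3 runs rather
than their mean (~396M).

quick standalone sketch of the two behaviours, just for illustration
(plain C, not the actual perf code or its data structures):

	#include <stdio.h>

	struct stats { double mean; unsigned long n; };

	/* feed each run's total into a running mean - what -r N is
	 * expected to print */
	static void update_stats(struct stats *s, double val)
	{
		double delta = val - s->mean;

		s->n++;
		s->mean += delta / s->n;
	}

	int main(void)
	{
		/* per-run cycles:u deltas from the verbose output above */
		double runs[] = { 399150620, 396759920, 394017417 };
		struct stats avg = { 0, 0 };
		double sum = 0;

		for (int i = 0; i < 3; i++) {
			update_stats(&avg, runs[i]);
			sum += runs[i];	/* never reset between runs - matches
					   the broken output */
		}

		printf("mean: %.0f  accumulated: %.0f\n", avg.mean, sum);
		return 0;
	}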


jirka


* Re: [PATCH 18/20] perf stat: Display event stats using aggr counts
  2023-06-15  8:10       ` Jiri Olsa
@ 2023-06-15 17:31         ` Namhyung Kim
  0 siblings, 0 replies; 28+ messages in thread
From: Namhyung Kim @ 2023-06-15 17:31 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Ian Rogers, Arnaldo Carvalho de Melo, Ingo Molnar,
	Peter Zijlstra, LKML, Adrian Hunter, linux-perf-users, Kan Liang,
	Leo Yan, Andi Kleen, Athira Rajeev, James Clark, Xing Zhengjun,
	Michael Petlan

Hi Jiri,

On Thu, Jun 15, 2023 at 1:10 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Wed, Jun 14, 2023 at 09:20:53AM -0700, Ian Rogers wrote:
> > On Wed, Jun 14, 2023 at 6:40 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> > >
> > > On Mon, Oct 17, 2022 at 07:02:25PM -0700, Namhyung Kim wrote:
> > > > Now aggr counts are ready for use.  Convert the display routines to use
> > > > the aggr counts and update the shadow stat with them.  It doesn't need
> > > > to aggregate counts or collect aliases anymore during the display.  Get
> > > > rid of now unused struct perf_aggr_thread_value.
> > > >
> > > > Note that there's a difference in the display order among the aggr modes.
> > > > For per-core/die/socket/node aggregation, it shows relevant events in
> > > > the same unit together, whereas for global/thread/no aggregation it shows
> > > > the same events for different units together.  So it still uses separate
> > > > code paths to display them due to the ordering.
> > > >
> > > > One more thing to note is that it breaks per-core event display for now.
> > > > The next patch will fix it so the output is identical to the current one.
> > > >
> > > > Acked-by: Ian Rogers <irogers@google.com>
> > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > >
> > > hi,
> > > this one seems to break 'perf stat -r X', not sure why so far..
> > >
> > > the final counts seem to be accumulated instead of averaged, like:
> > >
> > >
> > > with this patch:
> > >
> > >          Performance counter stats for './test_progs -n 103/1' (2 runs):
> > >
> > >                206,815,929      cycles:u                                                             ( +-  0.05% )
> > >             16,052,747,533      cycles:k                                                             ( +-  0.10% )
> > >             16,259,643,167      cycles                                                               ( +-  0.10% )
> > >
> > >                    1.98093 +- 0.00586 seconds time elapsed  ( +-  0.30% )
> > >
> > >
> > > without this patch:
> > >
> > >          Performance counter stats for './test_progs -n 103/1' (2 runs):
> > >
> > >                103,300,812      cycles:u                                                             ( +-  0.37% )
> > >              8,016,856,866      cycles:k                                                             ( +-  0.32% )
> > >              8,120,200,572      cycles                                                               ( +-  0.32% )
> > >
> > >                    1.97272 +- 0.00270 seconds time elapsed  ( +-  0.14% )
> > >
> > >
> > > any idea? ;-)
> >
> > Is this still broken in perf-tools-next? The patch is quite old and
> > there's been work in this area. I'm assuming yes, but thought it was
> > worth checking.
>
> yes

I'll take a look.


Thread overview: 28+ messages
2022-10-18  2:02 [PATCHSET 00/20] perf stat: Cleanup counter aggregation (v3) Namhyung Kim
2022-10-18  2:02 ` [PATCH 01/20] perf tools: Save evsel->pmu in parse_events() Namhyung Kim
2022-10-18  2:02 ` [PATCH 02/20] perf tools: Use pmu info in evsel__is_hybrid() Namhyung Kim
2022-10-18  2:02 ` [PATCH 03/20] perf stat: Use evsel__is_hybrid() more Namhyung Kim
2022-10-18  2:02 ` [PATCH 04/20] perf stat: Add aggr id for global mode Namhyung Kim
2022-10-18  2:02 ` [PATCH 05/20] perf stat: Add cpu aggr id for no aggregation mode Namhyung Kim
2022-10-18  2:02 ` [PATCH 06/20] perf stat: Add 'needs_sort' argument to cpu_aggr_map__new() Namhyung Kim
2022-10-18  2:02 ` [PATCH 07/20] perf stat: Add struct perf_stat_aggr to perf_stat_evsel Namhyung Kim
2022-10-18  2:02 ` [PATCH 08/20] perf stat: Allocate evsel->stats->aggr properly Namhyung Kim
2022-10-18  2:02 ` [PATCH 09/20] perf stat: Aggregate events using evsel->stats->aggr Namhyung Kim
2022-10-18  2:02 ` [PATCH 10/20] perf stat: Factor out evsel__count_has_error() Namhyung Kim
2022-10-18  2:02 ` [PATCH 11/20] perf stat: Aggregate per-thread stats using evsel->stats->aggr Namhyung Kim
2022-10-18  2:02 ` [PATCH 12/20] perf stat: Allocate aggr counts for recorded data Namhyung Kim
2022-10-18  2:02 ` [PATCH 13/20] perf stat: Reset aggr counts for each interval Namhyung Kim
2022-10-18  2:02 ` [PATCH 14/20] perf stat: Split process_counters() Namhyung Kim
2022-10-18  2:02 ` [PATCH 15/20] perf stat: Add perf_stat_merge_counters() Namhyung Kim
2022-10-18  2:02 ` [PATCH 16/20] perf stat: Add perf_stat_process_percore() Namhyung Kim
2022-10-18  2:02 ` [PATCH 17/20] perf stat: Add perf_stat_process_shadow_stats() Namhyung Kim
2022-10-18  2:02 ` [PATCH 18/20] perf stat: Display event stats using aggr counts Namhyung Kim
2022-12-05 10:51   ` Athira Rajeev
2022-12-05 19:00     ` Namhyung Kim
2022-12-06 13:39       ` Athira Rajeev
2023-06-14 13:40   ` Jiri Olsa
2023-06-14 16:20     ` Ian Rogers
2023-06-15  8:10       ` Jiri Olsa
2023-06-15 17:31         ` Namhyung Kim
2022-10-18  2:02 ` [PATCH 19/20] perf stat: Display percore events properly Namhyung Kim
2022-10-18  2:02 ` [PATCH 20/20] perf stat: Remove unused perf_counts.aggr field Namhyung Kim
