All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] perf stat: Introduce --iiostat mode to provide I/O performance metrics
@ 2020-12-10  9:03 Alexander Antonov
  2020-12-10  9:03 ` [PATCH 1/5] perf stat: Add AGGR_IIO_STACK mode Alexander Antonov
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Alexander Antonov @ 2020-12-10  9:03 UTC (permalink / raw)
  To: acme
  Cc: linux-kernel, jolsa, alexander.shishkin, mark.rutland, namhyung,
	mingo, peterz, ak, alexander.antonov

Mode is intended to provide four I/O performance metrics in MB per each
IIO stack:
 - Inbound Read: I/O devices below IIO stack read from the host memory
 - Inbound Write: I/O devices below IIO stack write to the host memory
 - Outbound Read: CPU reads from I/O devices below IIO stack
 - Outbound Write: CPU writes to I/O devices below IIO stack

Each metric requiries only one IIO event which increments at every 4B
transfer in corresponding direction. The formulas to compute metrics
are generic:
    #EventCount * 4B / (1024 * 1024)

Note: --iiostat introduces new perf data aggregation mode - per I/O stack
hence -e and -M options are not supported.

Usage examples:

1. List all IIO stacks (example for 2-S platform):
   $ perf stat --iiostat=show
   S0-uncore_iio_0<0000:00>
   S1-uncore_iio_0<0000:80>
   S0-uncore_iio_1<0000:17>
   S1-uncore_iio_1<0000:85>
   S0-uncore_iio_2<0000:3a>
   S1-uncore_iio_2<0000:ae>
   S0-uncore_iio_3<0000:5d>
   S1-uncore_iio_3<0000:d7>

2. Collect metrics for all I/O stacks:
   $ perf stat --iiostat -- dd if=/dev/zero of=/dev/nvme0n1 bs=1M oflag=direct
   357708+0 records in
   357707+0 records out
   375083606016 bytes (375 GB, 349 GiB) copied, 215.974 s, 1.7 GB/s

    Performance counter stats for 'system wide':

      port             Inbound Read(MB)    Inbound Write(MB)    Outbound Read(MB)   Outbound Write(MB) 
   0000:00                    1                    0                    2                    3 
   0000:80                    0                    0                    0                    0 
   0000:17               352552                   43                    0                   21 
   0000:85                    0                    0                    0                    0 
   0000:3a                    3                    0                    0                    0 
   0000:ae                    0                    0                    0                    0 
   0000:5d                    0                    0                    0                    0 
   0000:d7                    0                    0                    0                    0

3. Collect metrics for comma separated list of I/O stacks:
   $ perf stat --iiostat=0000:17,0:3a -- dd if=/dev/zero of=/dev/nvme0n1 bs=1M oflag=direct
   357708+0 records in
   357707+0 records out
   375083606016 bytes (375 GB, 349 GiB) copied, 197.08 s, 1.9 GB/s

    Performance counter stats for 'system wide':

      port             Inbound Read(MB)    Inbound Write(MB)    Outbound Read(MB)   Outbound Write(MB) 
   0000:17               358559                   44                    0                   22 
   0000:3a                    3                    2                    0                    0 

        197.081983474 seconds time elapsed

Alexander Antonov (5):
  perf stat: Add AGGR_IIO_STACK mode
  perf evsel: Introduce an observed performance device
  perf stat: Basic support for iiostat in perf stat
  perf stat: Helper functions for IIO stacks list in iiostat mode
  perf stat: Enable --iiostat mode for x86 platforms

 tools/perf/Documentation/perf-stat.txt        |  31 ++
 tools/perf/arch/x86/util/Build                |   1 +
 tools/perf/arch/x86/util/iiostat.c            | 460 ++++++++++++++++++
 tools/perf/builtin-stat.c                     |  38 +-
 tools/perf/util/evsel.h                       |   1 +
 tools/perf/util/iiostat.h                     |  33 ++
 .../scripting-engines/trace-event-python.c    |   2 +-
 tools/perf/util/stat-display.c                |  51 +-
 tools/perf/util/stat-shadow.c                 |  11 +-
 tools/perf/util/stat.c                        |   3 +-
 tools/perf/util/stat.h                        |   2 +
 11 files changed, 625 insertions(+), 8 deletions(-)
 create mode 100644 tools/perf/arch/x86/util/iiostat.c
 create mode 100644 tools/perf/util/iiostat.h


base-commit: 644bf4b0f7acde641d3db200b4db66977e96c3bd
-- 
2.19.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/5] perf stat: Add AGGR_IIO_STACK mode
  2020-12-10  9:03 [PATCH 0/5] perf stat: Introduce --iiostat mode to provide I/O performance metrics Alexander Antonov
@ 2020-12-10  9:03 ` Alexander Antonov
  2020-12-10  9:03 ` [PATCH 2/5] perf evsel: Introduce an observed performance device Alexander Antonov
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Alexander Antonov @ 2020-12-10  9:03 UTC (permalink / raw)
  To: acme
  Cc: linux-kernel, jolsa, alexander.shishkin, mark.rutland, namhyung,
	mingo, peterz, ak, alexander.antonov

Adding AGGR_IIO_STACK mode to be able to distinguish aggr_mode
for IIO stacks in following patches.

Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com>
---
 tools/perf/builtin-stat.c                           |  5 ++++-
 .../util/scripting-engines/trace-event-python.c     |  2 +-
 tools/perf/util/stat-display.c                      | 13 +++++++++++--
 tools/perf/util/stat.c                              |  3 ++-
 tools/perf/util/stat.h                              |  1 +
 5 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index f15b2f8aa14d..e3ff55de4f7a 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1309,6 +1309,7 @@ static int perf_stat_init_aggr_mode(void)
 		break;
 	case AGGR_GLOBAL:
 	case AGGR_THREAD:
+	case AGGR_IIO_STACK:
 	case AGGR_UNSET:
 	default:
 		break;
@@ -1499,6 +1500,7 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
 	case AGGR_NONE:
 	case AGGR_GLOBAL:
 	case AGGR_THREAD:
+	case AGGR_IIO_STACK:
 	case AGGR_UNSET:
 	default:
 		break;
@@ -2216,7 +2218,8 @@ int cmd_stat(int argc, const char **argv)
 	 * --per-thread is aggregated per thread, we dont mix it with cpu mode
 	 */
 	if (((stat_config.aggr_mode != AGGR_GLOBAL &&
-	      stat_config.aggr_mode != AGGR_THREAD) || nr_cgroups) &&
+	      stat_config.aggr_mode != AGGR_THREAD &&
+	      stat_config.aggr_mode != AGGR_IIO_STACK) || nr_cgroups) &&
 	    !target__has_cpu(&target)) {
 		fprintf(stderr, "both cgroup and no-aggregation "
 			"modes only available in system-wide mode\n");
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index c83c2c6564e0..e8b472faeae4 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -1401,7 +1401,7 @@ static void python_process_stat(struct perf_stat_config *config,
 	struct perf_cpu_map *cpus = counter->core.cpus;
 	int cpu, thread;
 
-	if (config->aggr_mode == AGGR_GLOBAL) {
+	if (config->aggr_mode == AGGR_GLOBAL || config->aggr_mode == AGGR_IIO_STACK) {
 		process_stat(counter, -1, -1, tstamp,
 			     &counter->counts->aggr);
 		return;
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 4b57c0c07632..3bfcdb80443a 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -133,6 +133,7 @@ static void aggr_printout(struct perf_stat_config *config,
 			config->csv_sep);
 		break;
 	case AGGR_GLOBAL:
+	case AGGR_IIO_STACK:
 	case AGGR_UNSET:
 	default:
 		break;
@@ -330,7 +331,7 @@ static int first_shadow_cpu(struct perf_stat_config *config,
 	if (config->aggr_mode == AGGR_NONE)
 		return id;
 
-	if (config->aggr_mode == AGGR_GLOBAL)
+	if (config->aggr_mode == AGGR_GLOBAL || config->aggr_mode == AGGR_IIO_STACK)
 		return 0;
 
 	for (i = 0; i < evsel__nr_cpus(evsel); i++) {
@@ -424,6 +425,7 @@ static void printout(struct perf_stat_config *config, int id, int nr,
 	if (config->csv_output && !config->metric_only) {
 		static int aggr_fields[] = {
 			[AGGR_GLOBAL] = 0,
+			[AGGR_IIO_STACK] = 0,
 			[AGGR_THREAD] = 1,
 			[AGGR_NONE] = 1,
 			[AGGR_SOCKET] = 2,
@@ -906,6 +908,7 @@ static int aggr_header_lens[] = {
 	[AGGR_NONE] = 6,
 	[AGGR_THREAD] = 24,
 	[AGGR_GLOBAL] = 0,
+	[AGGR_IIO_STACK] = 0,
 };
 
 static const char *aggr_header_csv[] = {
@@ -914,7 +917,8 @@ static const char *aggr_header_csv[] = {
 	[AGGR_SOCKET] 	= 	"socket,cpus",
 	[AGGR_NONE] 	= 	"cpu,",
 	[AGGR_THREAD] 	= 	"comm-pid,",
-	[AGGR_GLOBAL] 	=	""
+	[AGGR_GLOBAL]	=	"",
+	[AGGR_IIO_STACK] =	"port,"
 };
 
 static void print_metric_headers(struct perf_stat_config *config,
@@ -1001,6 +1005,9 @@ static void print_interval(struct perf_stat_config *config,
 			if (!metric_only)
 				fprintf(output, "                  counts %*s events\n", unit_width, "unit");
 			break;
+		case AGGR_IIO_STACK:
+			fprintf(output, "#           time    port        ");
+			break;
 		case AGGR_GLOBAL:
 		default:
 			fprintf(output, "#           time");
@@ -1246,6 +1253,8 @@ perf_evlist__print_counters(struct evlist *evlist,
 			}
 		}
 		break;
+	case AGGR_IIO_STACK:
+		break;
 	case AGGR_UNSET:
 	default:
 		break;
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index bd0decd6d753..6a655f909587 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -363,6 +363,7 @@ process_counter_values(struct perf_stat_config *config, struct evsel *evsel,
 		}
 		break;
 	case AGGR_GLOBAL:
+	case AGGR_IIO_STACK:
 		aggr->val += count->val;
 		aggr->ena += count->ena;
 		aggr->run += count->run;
@@ -424,7 +425,7 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 	if (ret)
 		return ret;
 
-	if (config->aggr_mode != AGGR_GLOBAL)
+	if (config->aggr_mode != AGGR_GLOBAL && config->aggr_mode != AGGR_IIO_STACK)
 		return 0;
 
 	if (!counter->snapshot)
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 05adf8165025..d053ebd44ae3 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -50,6 +50,7 @@ enum aggr_mode {
 	AGGR_DIE,
 	AGGR_CORE,
 	AGGR_THREAD,
+	AGGR_IIO_STACK,
 	AGGR_UNSET,
 	AGGR_NODE,
 };
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/5] perf evsel: Introduce an observed performance device
  2020-12-10  9:03 [PATCH 0/5] perf stat: Introduce --iiostat mode to provide I/O performance metrics Alexander Antonov
  2020-12-10  9:03 ` [PATCH 1/5] perf stat: Add AGGR_IIO_STACK mode Alexander Antonov
@ 2020-12-10  9:03 ` Alexander Antonov
  2020-12-10  9:03 ` [PATCH 3/5] perf stat: Basic support for iiostat in perf stat Alexander Antonov
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Alexander Antonov @ 2020-12-10  9:03 UTC (permalink / raw)
  To: acme
  Cc: linux-kernel, jolsa, alexander.shishkin, mark.rutland, namhyung,
	mingo, peterz, ak, alexander.antonov

Adding evsel::perf_device void pointer.

For performance monitoring purposes, an evsel can have a related device.
These changes allow to attribute, for example, I/O performance metrics
to IIO stack.

Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com>
---
 tools/perf/util/evsel.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 79a860d8e3ee..c346920f477a 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -127,6 +127,7 @@ struct evsel {
 	 * See also evsel__has_callchain().
 	 */
 	__u64			synth_sample_type;
+	void			*perf_device;
 };
 
 struct perf_missing_features {
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/5] perf stat: Basic support for iiostat in perf stat
  2020-12-10  9:03 [PATCH 0/5] perf stat: Introduce --iiostat mode to provide I/O performance metrics Alexander Antonov
  2020-12-10  9:03 ` [PATCH 1/5] perf stat: Add AGGR_IIO_STACK mode Alexander Antonov
  2020-12-10  9:03 ` [PATCH 2/5] perf evsel: Introduce an observed performance device Alexander Antonov
@ 2020-12-10  9:03 ` Alexander Antonov
  2020-12-10  9:03 ` [PATCH 4/5] perf stat: Helper functions for IIO stacks list in iiostat mode Alexander Antonov
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Alexander Antonov @ 2020-12-10  9:03 UTC (permalink / raw)
  To: acme
  Cc: linux-kernel, jolsa, alexander.shishkin, mark.rutland, namhyung,
	mingo, peterz, ak, alexander.antonov

Add basic flow for a new --iiostat mode in perf stat. Mode is intended to
provide four I/O performance metrics per each IIO stack: Inbound Read,
Inbound Write, Outbound Read, Outbound Write.

The actual code to compute the metrics and attribute it to
evsel::perf_device is in follow-on patches.

Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com>
---
 tools/perf/builtin-stat.c      | 33 ++++++++++++++++++++++++++++-
 tools/perf/util/iiostat.h      | 33 +++++++++++++++++++++++++++++
 tools/perf/util/stat-display.c | 38 +++++++++++++++++++++++++++++++++-
 tools/perf/util/stat-shadow.c  | 11 +++++++++-
 tools/perf/util/stat.h         |  1 +
 5 files changed, 113 insertions(+), 3 deletions(-)
 create mode 100644 tools/perf/util/iiostat.h

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index e3ff55de4f7a..c8168cfe202b 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -67,6 +67,7 @@
 #include "util/top.h"
 #include "util/affinity.h"
 #include "util/pfm.h"
+#include "util/iiostat.h"
 #include "asm/bug.h"
 
 #include <linux/time64.h>
@@ -198,7 +199,8 @@ static struct perf_stat_config stat_config = {
 	.walltime_nsecs_stats	= &walltime_nsecs_stats,
 	.big_num		= true,
 	.ctl_fd			= -1,
-	.ctl_fd_ack		= -1
+	.ctl_fd_ack		= -1,
+	.iiostat_run		= false,
 };
 
 static bool cpus_map_matched(struct evsel *a, struct evsel *b)
@@ -1073,6 +1075,14 @@ static int parse_stat_cgroups(const struct option *opt,
 	return parse_cgroups(opt, str, unset);
 }
 
+__weak int iiostat_parse(const struct option *opt __maybe_unused,
+			 const char *str __maybe_unused,
+			 int unset __maybe_unused)
+{
+	pr_err("--iiostat mode is not supported\n");
+	return -1;
+}
+
 static struct option stat_options[] = {
 	OPT_BOOLEAN('T', "transaction", &transaction_run,
 		    "hardware transaction statistics"),
@@ -1185,6 +1195,8 @@ static struct option stat_options[] = {
 		     "\t\t\t  Optionally send control command completion ('ack\\n') to ack-fd descriptor.\n"
 		     "\t\t\t  Alternatively, ctl-fifo / ack-fifo will be opened and used as ctl-fd / ack-fd.",
 		      parse_control_option),
+	OPT_CALLBACK_OPTARG(0, "iiostat", &evsel_list, &stat_config, "root port",
+			    "measure PCIe metrics per IIO stack", iiostat_parse),
 	OPT_END()
 };
 
@@ -1509,6 +1521,12 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
 	return 0;
 }
 
+__weak int iiostat_show_root_ports(struct evlist *evlist __maybe_unused,
+				   struct perf_stat_config *config __maybe_unused)
+{
+	return 0;
+}
+
 /*
  * Add default attributes, if there were no attributes specified or
  * if -d/--detailed, -d -d or -d -d -d is used:
@@ -2054,6 +2072,10 @@ static void setup_system_wide(int forks)
 	}
 }
 
+__weak void iiostat_delete_root_ports(struct evlist *evlist __maybe_unused)
+{
+}
+
 int cmd_stat(int argc, const char **argv)
 {
 	const char * const stat_usage[] = {
@@ -2230,6 +2252,12 @@ int cmd_stat(int argc, const char **argv)
 		goto out;
 	}
 
+	if (stat_config.iiostat_run) {
+		status = iiostat_show_root_ports(evsel_list, &stat_config);
+		if (status || !stat_config.iiostat_run)
+			goto out;
+	}
+
 	if (add_default_attributes())
 		goto out;
 
@@ -2406,6 +2434,9 @@ int cmd_stat(int argc, const char **argv)
 	perf_stat__exit_aggr_mode();
 	perf_evlist__free_stats(evsel_list);
 out:
+	if (stat_config.iiostat_run)
+		iiostat_delete_root_ports(evsel_list);
+
 	zfree(&stat_config.walltime_run);
 
 	if (smi_cost && smi_reset)
diff --git a/tools/perf/util/iiostat.h b/tools/perf/util/iiostat.h
new file mode 100644
index 000000000000..6a905b2b40b9
--- /dev/null
+++ b/tools/perf/util/iiostat.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * perf stat --iiostat
+ *
+ * Copyright (C) 2020, Intel Corporation
+ *
+ * Authors: Alexander Antonov <alexander.antonov@linux.intel.com>
+ */
+
+#ifndef _IIOSTAT_H
+#define _IIOSTAT_H
+
+#include <subcmd/parse-options.h>
+#include "util/stat.h"
+#include "util/parse-events.h"
+#include "util/evlist.h"
+
+struct option;
+struct perf_stat_config;
+struct evlist;
+struct timespec;
+
+int iiostat_parse(const struct option *opt, const char *str,
+		  int unset __maybe_unused);
+void iiostat_prefix(struct perf_stat_config *config, struct evlist *evlist,
+		    char *prefix, struct timespec *ts);
+void iiostat_print_metric(struct perf_stat_config *config, struct evsel *evsel,
+			  struct perf_stat_output_ctx *out);
+int iiostat_show_root_ports(struct evlist *evlist,
+			    struct perf_stat_config *config);
+void iiostat_delete_root_ports(struct evlist *evlist);
+
+#endif /* _IIOSTAT_H */
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 3bfcdb80443a..9eb8484e8b90 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -17,6 +17,7 @@
 #include "cgroup.h"
 #include <api/fs/fs.h>
 #include "util.h"
+#include "iiostat.h"
 
 #define CNTR_NOT_SUPPORTED	"<not supported>"
 #define CNTR_NOT_COUNTED	"<not counted>"
@@ -310,6 +311,12 @@ static void print_metric_header(struct perf_stat_config *config,
 	struct outstate *os = ctx;
 	char tbuf[1024];
 
+	/* In case of iiostat, print metric header for first perf_device only */
+	if (os->evsel->perf_device && os->evsel->evlist->selected->perf_device &&
+	    config->iiostat_run &&
+	    os->evsel->perf_device != os->evsel->evlist->selected->perf_device)
+		return;
+
 	if (!valid_only_metric(unit))
 		return;
 	unit = fixunit(tbuf, os->evsel, unit);
@@ -942,6 +949,8 @@ static void print_metric_headers(struct perf_stat_config *config,
 			fputs("time,", config->output);
 		fputs(aggr_header_csv[config->aggr_mode], config->output);
 	}
+	if (config->iiostat_run && !config->interval && !config->csv_output)
+		fprintf(config->output, "   port         ");
 
 	/* Print metrics headers only */
 	evlist__for_each_entry(evlist, counter) {
@@ -959,6 +968,13 @@ static void print_metric_headers(struct perf_stat_config *config,
 	fputc('\n', config->output);
 }
 
+__weak void iiostat_prefix(struct perf_stat_config *config __maybe_unused,
+			    struct evlist *evlist __maybe_unused,
+			    char *prefix __maybe_unused,
+			    struct timespec *ts __maybe_unused)
+{
+}
+
 static void print_interval(struct perf_stat_config *config,
 			   struct evlist *evlist,
 			   char *prefix, struct timespec *ts)
@@ -971,7 +987,8 @@ static void print_interval(struct perf_stat_config *config,
 	if (config->interval_clear)
 		puts(CONSOLE_CLEAR);
 
-	sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, config->csv_sep);
+	if (!config->iiostat_run)
+		sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, config->csv_sep);
 
 	if ((num_print_interval == 0 && !config->csv_output) || config->interval_clear) {
 		switch (config->aggr_mode) {
@@ -1205,6 +1222,10 @@ perf_evlist__print_counters(struct evlist *evlist,
 	int interval = config->interval;
 	struct evsel *counter;
 	char buf[64], *prefix = NULL;
+	void *device = NULL;
+
+	if (config->iiostat_run)
+		evlist->selected = evlist__first(evlist);
 
 	if (interval)
 		print_interval(config, evlist, prefix = buf, ts);
@@ -1254,6 +1275,21 @@ perf_evlist__print_counters(struct evlist *evlist,
 		}
 		break;
 	case AGGR_IIO_STACK:
+		counter = evlist__first(evlist);
+		perf_evlist__set_selected(evlist, counter);
+		iiostat_prefix(config, evlist, prefix = buf, ts);
+		fprintf(config->output, "%s", prefix);
+		evlist__for_each_entry(evlist, counter) {
+			device = evlist->selected->perf_device;
+			if (device && device != counter->perf_device) {
+				perf_evlist__set_selected(evlist, counter);
+				iiostat_prefix(config, evlist, prefix, ts);
+				fprintf(config->output, "\n%s", prefix);
+			}
+			print_counter_aggr(config, counter, prefix);
+			if ((counter->idx + 1) == evlist->core.nr_entries)
+				fputc('\n', config->output);
+		}
 		break;
 	case AGGR_UNSET:
 	default:
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 901265127e36..c5851ceb4c6b 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -9,6 +9,7 @@
 #include "expr.h"
 #include "metricgroup.h"
 #include <linux/zalloc.h>
+#include "iiostat.h"
 
 /*
  * AGGR_GLOBAL: Use CPU 0
@@ -919,6 +920,12 @@ double test_generic_metric(struct metric_expr *mexp, int cpu, struct runtime_sta
 	return ratio;
 }
 
+__weak void iiostat_print_metric(struct perf_stat_config *config __maybe_unused,
+				  struct evsel *evsel __maybe_unused,
+				  struct perf_stat_output_ctx *out __maybe_unused)
+{
+}
+
 void perf_stat__print_shadow_stats(struct perf_stat_config *config,
 				   struct evsel *evsel,
 				   double avg, int cpu,
@@ -934,7 +941,9 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
 	struct metric_event *me;
 	int num = 1;
 
-	if (evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS)) {
+	if (config->iiostat_run) {
+		iiostat_print_metric(config, evsel, out);
+	} else if (evsel__match(evsel, HARDWARE, HW_INSTRUCTIONS)) {
 		total = runtime_stat_avg(st, STAT_CYCLES, ctx, cpu);
 
 		if (total) {
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index d053ebd44ae3..a072dfe3dbbf 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -118,6 +118,7 @@ struct perf_stat_config {
 	bool			 walltime_run_table;
 	bool			 all_kernel;
 	bool			 all_user;
+	bool			 iiostat_run;
 	bool			 percore_show_thread;
 	bool			 summary;
 	bool			 metric_no_group;
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/5] perf stat: Helper functions for IIO stacks list in iiostat mode
  2020-12-10  9:03 [PATCH 0/5] perf stat: Introduce --iiostat mode to provide I/O performance metrics Alexander Antonov
                   ` (2 preceding siblings ...)
  2020-12-10  9:03 ` [PATCH 3/5] perf stat: Basic support for iiostat in perf stat Alexander Antonov
@ 2020-12-10  9:03 ` Alexander Antonov
  2020-12-10  9:03 ` [PATCH 5/5] perf stat: Enable --iiostat mode for x86 platforms Alexander Antonov
  2020-12-14 13:28 ` [PATCH 0/5] perf stat: Introduce --iiostat mode to provide I/O performance metrics Arnaldo Carvalho de Melo
  5 siblings, 0 replies; 10+ messages in thread
From: Alexander Antonov @ 2020-12-10  9:03 UTC (permalink / raw)
  To: acme
  Cc: linux-kernel, jolsa, alexander.shishkin, mark.rutland, namhyung,
	mingo, peterz, ak, alexander.antonov

Introduce helper functions to control IIO stacks list.
These helpers will be used in the follow-up patch.

Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com>
---
 tools/perf/arch/x86/util/iiostat.c | 125 +++++++++++++++++++++++++++++
 1 file changed, 125 insertions(+)
 create mode 100644 tools/perf/arch/x86/util/iiostat.c

diff --git a/tools/perf/arch/x86/util/iiostat.c b/tools/perf/arch/x86/util/iiostat.c
new file mode 100644
index 000000000000..70f93a96723f
--- /dev/null
+++ b/tools/perf/arch/x86/util/iiostat.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * perf stat --iiostat
+ *
+ * Copyright (C) 2020, Intel Corporation
+ *
+ * Authors: Alexander Antonov <alexander.antonov@linux.intel.com>
+ */
+
+#include <api/fs/fs.h>
+#include <linux/kernel.h>
+#include <linux/err.h>
+#include <limits.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <dirent.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <regex.h>
+#include "util/cpumap.h"
+#include "util/debug.h"
+#include "util/iiostat.h"
+#include "util/counts.h"
+#include "path.h"
+
+struct iio_root_port {
+	u32 domain;
+	u8 bus;
+	u8 die;
+	u8 pmu_idx;
+	int idx;
+};
+
+struct iio_root_ports_list {
+	struct iio_root_port **rps;
+	int nr_entries;
+};
+
+static void iio_root_port_show(FILE *output, const struct iio_root_port * const rp)
+{
+	if (output && rp)
+		fprintf(output, "S%d-uncore_iio_%d<%04x:%02x>\n",
+			rp->die, rp->pmu_idx, rp->domain, rp->bus);
+}
+
+static struct iio_root_port *iio_root_port_new(u32 domain, u8 bus, u8 die, u8 pmu_idx)
+{
+	struct iio_root_port *p = calloc(1, sizeof(*p));
+
+	if (p) {
+		p->domain = domain;
+		p->bus = bus;
+		p->die = die;
+		p->pmu_idx = pmu_idx;
+	}
+	return p;
+}
+
+static struct iio_root_ports_list *iio_root_ports_list_new(void)
+{
+	struct iio_root_ports_list *list = calloc(1, sizeof(*list));
+
+	if (list) {
+		list->rps = calloc(1, sizeof(struct iio_root_port *));
+		if (!list->rps) {
+			free(list);
+			list = NULL;
+		}
+	}
+
+	return list;
+}
+
+static void iio_root_ports_list_free(struct iio_root_ports_list *list)
+{
+	int idx;
+
+	if (list) {
+		for (idx = 0; idx < list->nr_entries; idx++)
+			free(list->rps[idx]);
+		free(list->rps);
+		free(list);
+	}
+}
+
+static
+struct iio_root_port *iio_root_port_find_by_notation(const struct iio_root_ports_list * const list,
+						     u32 domain, u8 bus)
+{
+	int idx;
+	struct iio_root_port *rp;
+
+	if (list) {
+		for (idx = 0; idx < list->nr_entries; idx++) {
+			rp = list->rps[idx];
+			if (rp && rp->domain == domain && rp->bus == bus)
+				return rp;
+		}
+	}
+	return NULL;
+}
+
+static int iio_root_ports_list_insert(struct iio_root_ports_list *list,
+				      struct iio_root_port * const rp)
+{
+	struct iio_root_port **tmp_buf;
+
+	if (list && rp) {
+		rp->idx = list->nr_entries++;
+		/* One more for NULL.*/
+		tmp_buf = realloc(list->rps, (list->nr_entries + 1) * sizeof(*list->rps));
+		if (!tmp_buf) {
+			pr_err("Failed to realloc memory\n");
+			return -ENOMEM;
+		}
+		tmp_buf[rp->idx] = rp;
+		tmp_buf[list->nr_entries] = NULL;
+		list->rps = tmp_buf;
+	}
+	return 0;
+}
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 5/5] perf stat: Enable --iiostat mode for x86 platforms
  2020-12-10  9:03 [PATCH 0/5] perf stat: Introduce --iiostat mode to provide I/O performance metrics Alexander Antonov
                   ` (3 preceding siblings ...)
  2020-12-10  9:03 ` [PATCH 4/5] perf stat: Helper functions for IIO stacks list in iiostat mode Alexander Antonov
@ 2020-12-10  9:03 ` Alexander Antonov
  2020-12-14 13:28 ` [PATCH 0/5] perf stat: Introduce --iiostat mode to provide I/O performance metrics Arnaldo Carvalho de Melo
  5 siblings, 0 replies; 10+ messages in thread
From: Alexander Antonov @ 2020-12-10  9:03 UTC (permalink / raw)
  To: acme
  Cc: linux-kernel, jolsa, alexander.shishkin, mark.rutland, namhyung,
	mingo, peterz, ak, alexander.antonov

This functionality is based on recently introduced sysfs attributes
for Intel® Xeon® Scalable processor family (code name Skylake-SP):
Commit bb42b3d39781 ("perf/x86/intel/uncore: Expose an Uncore unit to
IIO PMON mapping")

Mode is intended to provide four I/O performance metrics in MB per each
IIO stack:
 - Inbound Read: I/O devices below IIO stack read from the host memory
 - Inbound Write: I/O devices below IIO stack write to the host memory
 - Outbound Read: CPU reads from I/O devices below IIO stack
 - Outbound Write: CPU writes to I/O devices below IIO stack

Each metric requiries only one IIO event which increments at every 4B
transfer in corresponding direction. The formulas to compute metrics
are generic:
    #EventCount * 4B / (1024 * 1024)

Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com>
---
 tools/perf/Documentation/perf-stat.txt |  31 +++
 tools/perf/arch/x86/util/Build         |   1 +
 tools/perf/arch/x86/util/iiostat.c     | 335 +++++++++++++++++++++++++
 3 files changed, 367 insertions(+)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 5d4a673d7621..2c066f7e0681 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -121,6 +121,37 @@ to activate system-wide monitoring. Default is to count on all CPUs.
 -A::
 --no-aggr::
 Do not aggregate counts across all monitored CPUs.
+--iiostat::
+Mode is intended to provide four I/O performance metrics per each IIO
+stack (PCIe root port):
+    --Inbound Read(MB)   - I/O devices below IIO stack read from the host memory, in MB
+    --Inbound Write(MB)  - I/O devices below IIO stack write to the host memory, in MB
+    --Outbound Read(MB)  - CPU reads from I/O devices below IIO stack, in MB
+    --Outbound Write(MB) - CPU writes to I/O devices below IIO stack, in MB
+
+Sample output:
+
+Show all IIO stacks on 2-S platform:
+  $ perf stat --iiostat=show
+    S0-uncore_iio_0<0000:00>
+    S1-uncore_iio_0<0000:80>
+    S0-uncore_iio_1<0000:17>
+    S1-uncore_iio_1<0000:85>
+    S0-uncore_iio_2<0000:3a>
+    S1-uncore_iio_2<0000:ae>
+    S0-uncore_iio_3<0000:5d>
+    S1-uncore_iio_3<0000:d7>
+
+Print metrics for requested IIO stacks, multiple comma-separated list supported.
+  $ perf stat --iiostat=0000:17 -- dd if=/dev/zero of=/dev/nvme0n1 bs=1M oflag=direct
+		357708+0 records in
+		357707+0 records out
+		375083606016 bytes (375 GB, 349 GiB) copied, 213.997 s, 1.8 GB/s
+
+   Performance counter stats for 'system wide':
+
+     port             Inbound Read(MB)    Inbound Write(MB)    Outbound Read(MB)   Outbound Write(MB)
+  0000:17               358559                   44                    0                   22
 
 -n::
 --null::
diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index 347c39b960eb..6fa275d3d897 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -6,6 +6,7 @@ perf-y += perf_regs.o
 perf-y += topdown.o
 perf-y += machine.o
 perf-y += event.o
+perf-y += iiostat.o
 
 perf-$(CONFIG_DWARF) += dwarf-regs.o
 perf-$(CONFIG_BPF_PROLOGUE) += dwarf-regs.o
diff --git a/tools/perf/arch/x86/util/iiostat.c b/tools/perf/arch/x86/util/iiostat.c
index 70f93a96723f..44342a111746 100644
--- a/tools/perf/arch/x86/util/iiostat.c
+++ b/tools/perf/arch/x86/util/iiostat.c
@@ -27,6 +27,44 @@
 #include "util/counts.h"
 #include "path.h"
 
+#ifndef MAX_PATH
+#define MAX_PATH 1024
+#endif
+
+#define UNCORE_IIO_PMU_PATH	"devices/uncore_iio_%d"
+#define SYSFS_UNCORE_PMU_PATH	"%s/"UNCORE_IIO_PMU_PATH
+#define PLATFORM_MAPPING_PATH	UNCORE_IIO_PMU_PATH"/die%d"
+
+enum iiostat_mode_t {
+	IIOSTAT_NONE		= -1,
+	IIOSTAT_RUN		= 0,
+	IIOSTAT_SHOW		= 1
+};
+
+static enum iiostat_mode_t iiostat_mode = IIOSTAT_NONE;
+
+/*
+ * Each metric requiries only one IIO event which increments at every 4B transfer
+ * in corresponding direction. The formulas to compute metrics are generic:
+ *     #EventCount * 4B / (1024 * 1024)
+ */
+static const char * const iiostat_metrics[] = {
+	"Inbound Read(MB)",
+	"Inbound Write(MB)",
+	"Outbound Read(MB)",
+	"Outbound Write(MB)",
+};
+
+static inline int iiostat_metrics_count(void)
+{
+	return sizeof(iiostat_metrics) / sizeof(char *);
+}
+
+static const char *iiostat_metric_by_idx(int idx)
+{
+	return *(iiostat_metrics + idx % iiostat_metrics_count());
+}
+
 struct iio_root_port {
 	u32 domain;
 	u8 bus;
@@ -123,3 +161,300 @@ static int iio_root_ports_list_insert(struct iio_root_ports_list *list,
 	}
 	return 0;
 }
+
+static int uncore_pmu_iio_platform_mapping(u8 pmu_idx, struct iio_root_ports_list * const list)
+{
+	char *buf;
+	char path[MAX_PATH];
+	u32 domain;
+	u8 bus;
+	struct iio_root_port *rp;
+	size_t size;
+	int ret;
+
+	for (int die = 0; die < cpu__max_node(); die++) {
+		scnprintf(path, MAX_PATH, PLATFORM_MAPPING_PATH, pmu_idx, die);
+		if (sysfs__read_str(path, &buf, &size) < 0) {
+			if (pmu_idx)
+				goto out;
+			pr_err("Mode iiostat is not supported\n");
+			return -1;
+		}
+		ret = sscanf(buf, "%04x:%02hhx", &domain, &bus);
+		free(buf);
+		if (ret != 2) {
+			pr_err("Invalid mapping data: iio_%d; die%d\n", pmu_idx, die);
+			return -1;
+		}
+		rp = iio_root_port_new(domain, bus, die, pmu_idx);
+		if (!rp || iio_root_ports_list_insert(list, rp)) {
+			free(rp);
+			return -ENOMEM;
+		}
+	}
+out:
+	return 0;
+}
+
+static u8 iio_pmu_count(void)
+{
+	u8 pmu_idx = 0;
+	char path[MAX_PATH];
+	const char *sysfs = sysfs__mountpoint();
+
+	if (sysfs) {
+		for (;; pmu_idx++) {
+			snprintf(path, sizeof(path), SYSFS_UNCORE_PMU_PATH,
+				 sysfs, pmu_idx);
+			if (access(path, F_OK) != 0)
+				break;
+		}
+	}
+	return pmu_idx;
+}
+
+static int iio_root_ports_scan(struct iio_root_ports_list **list)
+{
+	int ret = -ENOMEM;
+	struct iio_root_ports_list *tmp_list;
+	u8 pmu_count = iio_pmu_count();
+
+	if (!pmu_count) {
+		pr_err("Unsupported uncore pmu configuration\n");
+		return -1;
+	}
+
+	tmp_list = iio_root_ports_list_new();
+	if (!tmp_list)
+		goto err;
+
+	for (u8 pmu_idx = 0; pmu_idx < pmu_count; pmu_idx++) {
+		ret = uncore_pmu_iio_platform_mapping(pmu_idx, tmp_list);
+		if (ret)
+			break;
+	}
+err:
+	if (!ret)
+		*list = tmp_list;
+	else
+		iio_root_ports_list_free(tmp_list);
+
+	return ret;
+}
+
+static int iio_root_port_parse_str(u32 *domain, u8 *bus, char *str)
+{
+	int ret;
+	regex_t regex;
+	/*
+	 * Expected format domain:bus:
+	 * Valid domain range [0:ffff]
+	 * Valid bus range [0:ff]
+	 * Example: 0000:af, 0:3d, 01:7
+	 */
+	regcomp(&regex, "^([a-f0-9A-F]{1,}):([a-f0-9A-F]{1,2})$", REG_EXTENDED);
+	ret = regexec(&regex, str, 0, NULL, 0);
+	if (ret || sscanf(str, "%08x:%02hhx", domain, bus) != 2)
+		pr_warning("Unrecognized root port format: %s\n"
+			   "Please use the following format:\n"
+			   "\t [domain]:[bus]\n"
+			   "\t for example: 0000:3d\n", str);
+
+	regfree(&regex);
+	return ret;
+}
+
+static int iio_root_ports_list_filter(struct iio_root_ports_list **list,
+				      const char *filter)
+{
+	char *tok, *tmp, *filter_copy = NULL;
+	struct iio_root_port *rp;
+	u32 domain;
+	u8 bus;
+	int ret = -ENOMEM;
+	struct iio_root_ports_list *tmp_list = iio_root_ports_list_new();
+
+	if (!tmp_list)
+		goto err;
+
+	filter_copy = strdup(filter);
+	if (!filter_copy)
+		goto err;
+
+	for (tok = strtok_r(filter_copy, ",", &tmp); tok; tok = strtok_r(NULL, ",", &tmp)) {
+		if (!iio_root_port_parse_str(&domain, &bus, tok)) {
+			rp = iio_root_port_find_by_notation(*list, domain, bus);
+			if (rp) {
+				(*list)->rps[rp->idx] = NULL;
+				ret = iio_root_ports_list_insert(tmp_list, rp);
+				if (ret) {
+					free(rp);
+					goto err;
+				}
+			} else if (!iio_root_port_find_by_notation(tmp_list, domain, bus))
+				pr_warning("Root port %04x:%02x were not found\n", domain, bus);
+		}
+	}
+
+	if (tmp_list->nr_entries == 0) {
+		pr_err("Requested root ports were not found\n");
+		ret = -EINVAL;
+	}
+err:
+	iio_root_ports_list_free(*list);
+	if (ret)
+		iio_root_ports_list_free(tmp_list);
+	else
+		*list = tmp_list;
+
+	free(filter_copy);
+	return ret;
+}
+
+static int iiostat_event_group(struct evlist *evl, struct iio_root_ports_list *list)
+{
+	int ret;
+	struct iio_root_port **rp;
+	const char *iiostat_cmd_template =
+		"{uncore_iio_%x/event=0x83,umask=0x04,ch_mask=0xF,fc_mask=0x07/,\
+		uncore_iio_%x/event=0x83,umask=0x01,ch_mask=0xF,fc_mask=0x07/,\
+		uncore_iio_%x/event=0xc0,umask=0x04,ch_mask=0xF,fc_mask=0x07/,\
+		uncore_iio_%x/event=0xc0,umask=0x01,ch_mask=0xF,fc_mask=0x07/}";
+	const int len_template = strlen(iiostat_cmd_template) + 1;
+	struct evsel *evsel = NULL;
+	int metrics_count = iiostat_metrics_count();
+	char *iiostat_cmd = calloc(len_template, 1);
+
+	if (!iiostat_cmd)
+		return -ENOMEM;
+
+	for (rp = list->rps; *rp; rp++) {
+		sprintf(iiostat_cmd, iiostat_cmd_template,
+			(*rp)->pmu_idx, (*rp)->pmu_idx, (*rp)->pmu_idx, (*rp)->pmu_idx);
+		ret = parse_events(evl, iiostat_cmd, NULL);
+		if (ret)
+			goto err;
+	}
+
+	evlist__for_each_entry(evl, evsel) {
+		evsel->perf_device = list->rps[evsel->idx / metrics_count];
+	}
+	list->nr_entries = 0;
+err:
+	iio_root_ports_list_free(list);
+	free(iiostat_cmd);
+	return ret;
+}
+
+int iiostat_parse(const struct option *opt, const char *str,
+		  int unset __maybe_unused)
+{
+	int ret;
+	struct iio_root_ports_list *list;
+	struct evlist *evl = *(struct evlist **)opt->value;
+	struct perf_stat_config *config = (struct perf_stat_config *)opt->data;
+
+	if (evl->core.nr_entries > 0) {
+		pr_err("Unsupported event configuration\n");
+		return -1;
+	}
+	config->metric_only = true;
+	config->aggr_mode = AGGR_IIO_STACK;
+	config->iiostat_run = true;
+	ret = iio_root_ports_scan(&list);
+	if (ret)
+		return ret;
+
+	if (!str) {
+		iiostat_mode = IIOSTAT_RUN;
+	} else if (!strcmp(str, "show")) {
+		iiostat_mode = IIOSTAT_SHOW;
+	} else {
+		iiostat_mode = IIOSTAT_RUN;
+		ret = iio_root_ports_list_filter(&list, str);
+		if (ret)
+			return ret;
+	}
+	return iiostat_event_group(evl, list);
+}
+
+void iiostat_prefix(struct perf_stat_config *config,
+		    struct evlist *evlist,
+		    char *prefix, struct timespec *ts)
+{
+	struct iio_root_port *rp = evlist->selected->perf_device;
+
+	if (rp) {
+		if (ts)
+			sprintf(prefix, "%6lu.%09lu%s%04x:%02x%s",
+					ts->tv_sec, ts->tv_nsec,
+					config->csv_sep, rp->domain, rp->bus,
+					config->csv_sep);
+		else
+			sprintf(prefix, "%04x:%02x%s", rp->domain, rp->bus,
+				config->csv_sep);
+	}
+}
+
+void iiostat_print_metric(struct perf_stat_config *config, struct evsel *evsel,
+			  struct perf_stat_output_ctx *out)
+{
+	double iiostat_value = 0;
+	u64 prev_count_val = 0;
+	const char *iiostat_metric = iiostat_metric_by_idx(evsel->idx);
+	u8 die = ((struct iio_root_port *)evsel->perf_device)->die;
+	struct perf_counts_values *count = perf_counts(evsel->counts, die, 0);
+
+	if (evsel->prev_raw_counts && !out->force_header) {
+		struct perf_counts_values *prev_count = perf_counts(evsel->prev_raw_counts, die, 0);
+
+		prev_count_val = prev_count->val;
+		prev_count->val = count->val;
+	}
+	iiostat_value = (count->val - prev_count_val) / ((double) count->run / count->ena);
+	out->print_metric(config, out->ctx, NULL, "%8.0f", iiostat_metric,
+			  iiostat_value / (256 * 1024));
+}
+
+int iiostat_show_root_ports(struct evlist *evlist, struct perf_stat_config *config)
+{
+	struct evsel *evsel;
+	struct iio_root_port *rp = NULL;
+	bool is_show_mode = (iiostat_mode == IIOSTAT_SHOW);
+
+	if (config->aggr_mode != AGGR_IIO_STACK) {
+		pr_err("Unsupported event configuration\n");
+		return -1;
+	}
+
+	if (is_show_mode || verbose) {
+		evlist__for_each_entry(evlist, evsel) {
+			if (!evsel->perf_device) {
+				pr_err("Unsupported event configuration\n");
+				return -1;
+			}
+			if (rp != evsel->perf_device) {
+				rp = evsel->perf_device;
+				iio_root_port_show(config->output, rp);
+			}
+		}
+	}
+	/* Stop iiostat for show mode*/
+	config->iiostat_run = !is_show_mode;
+	if (is_show_mode)
+		iiostat_delete_root_ports(evlist);
+	return 0;
+}
+
+void iiostat_delete_root_ports(struct evlist *evlist)
+{
+	struct evsel *evsel;
+	struct iio_root_port *rp = NULL;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (rp != evsel->perf_device) {
+			rp = evsel->perf_device;
+			free(evsel->perf_device);
+		}
+	}
+}
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/5] perf stat: Introduce --iiostat mode to provide I/O performance metrics
  2020-12-10  9:03 [PATCH 0/5] perf stat: Introduce --iiostat mode to provide I/O performance metrics Alexander Antonov
                   ` (4 preceding siblings ...)
  2020-12-10  9:03 ` [PATCH 5/5] perf stat: Enable --iiostat mode for x86 platforms Alexander Antonov
@ 2020-12-14 13:28 ` Arnaldo Carvalho de Melo
  2020-12-15  3:04   ` Andi Kleen
  5 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-12-14 13:28 UTC (permalink / raw)
  To: Alexander Antonov
  Cc: linux-kernel, jolsa, alexander.shishkin, mark.rutland, namhyung,
	mingo, peterz, ak, Ian Rogers

Em Thu, Dec 10, 2020 at 12:03:35PM +0300, Alexander Antonov escreveu:
> Mode is intended to provide four I/O performance metrics in MB per each
> IIO stack:
>  - Inbound Read: I/O devices below IIO stack read from the host memory
>  - Inbound Write: I/O devices below IIO stack write to the host memory
>  - Outbound Read: CPU reads from I/O devices below IIO stack
>  - Outbound Write: CPU writes to I/O devices below IIO stack
> 
> Each metric requiries only one IIO event which increments at every 4B
> transfer in corresponding direction. The formulas to compute metrics
> are generic:
>     #EventCount * 4B / (1024 * 1024)
> 
> Note: --iiostat introduces new perf data aggregation mode - per I/O stack
> hence -e and -M options are not supported.
> 
> Usage examples:

My first thought was: Why not have a 'perf iiostat' subcommand?

You're reusing the aggregation code for 'perf stat' and for that I'd
love to have Ian, Andi, Jiri et all to look at how you implemented it,
but I think having a shorter way of using this would be interesting :-)

- Arnaldo
 
> 1. List all IIO stacks (example for 2-S platform):
>    $ perf stat --iiostat=show

  Would be:

     $ perf iiostat show

>    S0-uncore_iio_0<0000:00>
>    S1-uncore_iio_0<0000:80>
>    S0-uncore_iio_1<0000:17>
>    S1-uncore_iio_1<0000:85>
>    S0-uncore_iio_2<0000:3a>
>    S1-uncore_iio_2<0000:ae>
>    S0-uncore_iio_3<0000:5d>
>    S1-uncore_iio_3<0000:d7>
> 
> 2. Collect metrics for all I/O stacks:
>    $ perf stat --iiostat -- dd if=/dev/zero of=/dev/nvme0n1 bs=1M oflag=direct

     $ perf iiostat -- dd if=/dev/zero of=/dev/nvme0n1 bs=1M oflag=direct

>    357708+0 records in
>    357707+0 records out
>    375083606016 bytes (375 GB, 349 GiB) copied, 215.974 s, 1.7 GB/s
> 
>     Performance counter stats for 'system wide':
> 
>       port             Inbound Read(MB)    Inbound Write(MB)    Outbound Read(MB)   Outbound Write(MB) 
>    0000:00                    1                    0                    2                    3 
>    0000:80                    0                    0                    0                    0 
>    0000:17               352552                   43                    0                   21 
>    0000:85                    0                    0                    0                    0 
>    0000:3a                    3                    0                    0                    0 
>    0000:ae                    0                    0                    0                    0 
>    0000:5d                    0                    0                    0                    0 
>    0000:d7                    0                    0                    0                    0
> 
> 3. Collect metrics for comma separated list of I/O stacks:
>    $ perf stat --iiostat=0000:17,0:3a -- dd if=/dev/zero of=/dev/nvme0n1 bs=1M oflag=direct

     $ perf iiostat 0000:17,0:3a -- dd if=/dev/zero of=/dev/nvme0n1 bs=1M oflag=direct

>    357708+0 records in
>    357707+0 records out
>    375083606016 bytes (375 GB, 349 GiB) copied, 197.08 s, 1.9 GB/s
> 
>     Performance counter stats for 'system wide':
> 
>       port             Inbound Read(MB)    Inbound Write(MB)    Outbound Read(MB)   Outbound Write(MB) 
>    0000:17               358559                   44                    0                   22 
>    0000:3a                    3                    2                    0                    0 
> 
>         197.081983474 seconds time elapsed
> 
> Alexander Antonov (5):
>   perf stat: Add AGGR_IIO_STACK mode
>   perf evsel: Introduce an observed performance device
>   perf stat: Basic support for iiostat in perf stat
>   perf stat: Helper functions for IIO stacks list in iiostat mode
>   perf stat: Enable --iiostat mode for x86 platforms
> 
>  tools/perf/Documentation/perf-stat.txt        |  31 ++
>  tools/perf/arch/x86/util/Build                |   1 +
>  tools/perf/arch/x86/util/iiostat.c            | 460 ++++++++++++++++++
>  tools/perf/builtin-stat.c                     |  38 +-
>  tools/perf/util/evsel.h                       |   1 +
>  tools/perf/util/iiostat.h                     |  33 ++
>  .../scripting-engines/trace-event-python.c    |   2 +-
>  tools/perf/util/stat-display.c                |  51 +-
>  tools/perf/util/stat-shadow.c                 |  11 +-
>  tools/perf/util/stat.c                        |   3 +-
>  tools/perf/util/stat.h                        |   2 +
>  11 files changed, 625 insertions(+), 8 deletions(-)
>  create mode 100644 tools/perf/arch/x86/util/iiostat.c
>  create mode 100644 tools/perf/util/iiostat.h
> 
> 
> base-commit: 644bf4b0f7acde641d3db200b4db66977e96c3bd
> -- 
> 2.19.1
> 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/5] perf stat: Introduce --iiostat mode to provide I/O performance metrics
  2020-12-14 13:28 ` [PATCH 0/5] perf stat: Introduce --iiostat mode to provide I/O performance metrics Arnaldo Carvalho de Melo
@ 2020-12-15  3:04   ` Andi Kleen
  2020-12-15 13:58     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2020-12-15  3:04 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Alexander Antonov, linux-kernel, jolsa, alexander.shishkin,
	mark.rutland, namhyung, mingo, peterz, Ian Rogers

> My first thought was: Why not have a 'perf iiostat' subcommand?

Same would apply to a lot of options in perf stat.

I guess you could add some aliases to "perf" that give shortcuts
for common perf stat command lines.

-Andi

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/5] perf stat: Introduce --iiostat mode to provide I/O performance metrics
  2020-12-15  3:04   ` Andi Kleen
@ 2020-12-15 13:58     ` Arnaldo Carvalho de Melo
  2020-12-20 17:23       ` Alexander Antonov
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-12-15 13:58 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alexander Antonov, linux-kernel, jolsa, alexander.shishkin,
	mark.rutland, namhyung, mingo, peterz, Ian Rogers

Em Mon, Dec 14, 2020 at 07:04:30PM -0800, Andi Kleen escreveu:
> > My first thought was: Why not have a 'perf iiostat' subcommand?
 
> Same would apply to a lot of options in perf stat.
 
> I guess you could add some aliases to "perf" that give shortcuts
> for common perf stat command lines.

Yeah, and we have a mechanism for that, that was exercised only in the
'perf archive' case:

~/libexec/perf-core/perf-archive

I tried this and it works:

[root@five ~]# ls -la ~/bin/perf
lrwxrwxrwx. 1 root root 19 Feb 18  2020 /root/bin/perf -> /home/acme/bin/perf
[root@five ~]# vim ~acme/libexec/perf-core/perf-cgtop
[root@five ~]# chmod +x ~acme/libexec/perf-core/perf-cgtop
[root@five ~]# cat ~acme/libexec/perf-core/perf-cgtop
perf top --hierarchy --all-cgroups -s cgroup,dso,sym $*
[root@five ~]# perf cgtop
[root@five ~]#

use 'e' to expand collapse the current level (+ -> -), 'E'/'C' to
expand/collapse all levels.

'perf help' doesn't show it, which is a shame, I'll add support for it
to traverse ~/libexec/perf-core/perf-* and get the first non interpreter
comment line as a description for the command, so to add a new one is
just a matter of dropping a shell + man page, no need to change the perf
binary.


To test that '$*' at the end:

[root@five ~]# perf cgtop -U

I.e.:

[acme@five perf]$ perf top -h -U

 Usage: perf top [<options>]

    -U, --hide_user_symbols
                          hide user symbols

[acme@five perf]$

And it works, just kernel level samples grouped in an hierarchy, first
cgroup, then dso, then the symbol.

Also, using this with the 'P' hotkey:

[root@five ~]# perf cgtop --percent-limit 1

Shows how it looks like:

[root@five ~]# cat perf.hist.0 
-  86.77%        /user.slice/user-1000.slice/session-2.scope
   -  36.18%        [kernel]
          2.24%        [k] unmap_page_range
          1.15%        [k] clear_page_rep
          1.10%        [k] add_mm_counter_fast
          1.03%        [k] alloc_set_pte
          1.03%        [k] handle_mm_fault
   -  17.65%        libc-2.32.so
          2.04%        [.] _int_malloc
          1.82%        [.] __memmove_avx_unaligned_erms
          1.48%        [.] __strlen_avx2
          1.13%        [.] _int_free
          1.12%        [.] malloc
   -   8.09%        make
          1.65%        [.] jhash_string
          1.05%        [.] hash_find_slot
   -   6.90%        ld-2.32.so
          2.03%        [.] do_lookup_x
          1.49%        [.] _dl_lookup_symbol_x
   -   4.78%        cc1
   -   4.60%        libperl.so.5.32.0
   -   2.86%        bash
   -   1.98%        libselinux.so.1
   -   1.61%        libpython2.7.so.1.0
   -   1.06%        libpcre2-8.so.0.10.0
-   9.17%        /user.slice/user-1000.slice/session-4.scope
   -   4.66%        perf
   -   2.40%        libc-2.32.so
   -   1.82%        [kernel]
-   4.04%        /
   -   4.02%        [kernel]
[root@five ~]#

So 'perf iiostat' would become:

[root@five ~]# cat ~acme/libexec/perf-core/perf-iiostat
perf stat --iiostat $*
[root@five ~]#

There are parameters to that '--iiostat' in the current patchset that
may complicates this tho, with some changes I guess we get what we want.

- Arnaldo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/5] perf stat: Introduce --iiostat mode to provide I/O performance metrics
  2020-12-15 13:58     ` Arnaldo Carvalho de Melo
@ 2020-12-20 17:23       ` Alexander Antonov
  0 siblings, 0 replies; 10+ messages in thread
From: Alexander Antonov @ 2020-12-20 17:23 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Andi Kleen
  Cc: linux-kernel, jolsa, alexander.shishkin, mark.rutland, namhyung,
	mingo, peterz, Ian Rogers

On 12/15/2020 4:58 PM, Arnaldo Carvalho de Melo wrote:
> Em Mon, Dec 14, 2020 at 07:04:30PM -0800, Andi Kleen escreveu:
>>> My first thought was: Why not have a 'perf iiostat' subcommand?
>   
>> Same would apply to a lot of options in perf stat.
>   
>> I guess you could add some aliases to "perf" that give shortcuts
>> for common perf stat command lines.
> Yeah, and we have a mechanism for that, that was exercised only in the
> 'perf archive' case:
>
> ~/libexec/perf-core/perf-archive
>
> I tried this and it works:
>
> [root@five ~]# ls -la ~/bin/perf
> lrwxrwxrwx. 1 root root 19 Feb 18  2020 /root/bin/perf -> /home/acme/bin/perf
> [root@five ~]# vim ~acme/libexec/perf-core/perf-cgtop
> [root@five ~]# chmod +x ~acme/libexec/perf-core/perf-cgtop
> [root@five ~]# cat ~acme/libexec/perf-core/perf-cgtop
> perf top --hierarchy --all-cgroups -s cgroup,dso,sym $*
> [root@five ~]# perf cgtop
> [root@five ~]#
>
> use 'e' to expand collapse the current level (+ -> -), 'E'/'C' to
> expand/collapse all levels.
>
> 'perf help' doesn't show it, which is a shame, I'll add support for it
> to traverse ~/libexec/perf-core/perf-* and get the first non interpreter
> comment line as a description for the command, so to add a new one is
> just a matter of dropping a shell + man page, no need to change the perf
> binary.
>
>
> To test that '$*' at the end:
>
> [root@five ~]# perf cgtop -U
>
> I.e.:
>
> [acme@five perf]$ perf top -h -U
>
>   Usage: perf top [<options>]
>
>      -U, --hide_user_symbols
>                            hide user symbols
>
> [acme@five perf]$
>
> And it works, just kernel level samples grouped in an hierarchy, first
> cgroup, then dso, then the symbol.
>
> Also, using this with the 'P' hotkey:
>
> [root@five ~]# perf cgtop --percent-limit 1
>
> Shows how it looks like:
>
> [root@five ~]# cat perf.hist.0
> -  86.77%        /user.slice/user-1000.slice/session-2.scope
>     -  36.18%        [kernel]
>            2.24%        [k] unmap_page_range
>            1.15%        [k] clear_page_rep
>            1.10%        [k] add_mm_counter_fast
>            1.03%        [k] alloc_set_pte
>            1.03%        [k] handle_mm_fault
>     -  17.65%        libc-2.32.so
>            2.04%        [.] _int_malloc
>            1.82%        [.] __memmove_avx_unaligned_erms
>            1.48%        [.] __strlen_avx2
>            1.13%        [.] _int_free
>            1.12%        [.] malloc
>     -   8.09%        make
>            1.65%        [.] jhash_string
>            1.05%        [.] hash_find_slot
>     -   6.90%        ld-2.32.so
>            2.03%        [.] do_lookup_x
>            1.49%        [.] _dl_lookup_symbol_x
>     -   4.78%        cc1
>     -   4.60%        libperl.so.5.32.0
>     -   2.86%        bash
>     -   1.98%        libselinux.so.1
>     -   1.61%        libpython2.7.so.1.0
>     -   1.06%        libpcre2-8.so.0.10.0
> -   9.17%        /user.slice/user-1000.slice/session-4.scope
>     -   4.66%        perf
>     -   2.40%        libc-2.32.so
>     -   1.82%        [kernel]
> -   4.04%        /
>     -   4.02%        [kernel]
> [root@five ~]#
>
> So 'perf iiostat' would become:
>
> [root@five ~]# cat ~acme/libexec/perf-core/perf-iiostat
> perf stat --iiostat $*
> [root@five ~]#
>
> There are parameters to that '--iiostat' in the current patchset that
> may complicates this tho, with some changes I guess we get what we want.
>
> - Arnaldo

Hello Arnaldo,
Sorry for delayed response.

This is the interesting approach to get shorter command. Thank you for the
explanation. I will update the patchset.

- Alexander

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-12-20 17:25 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-10  9:03 [PATCH 0/5] perf stat: Introduce --iiostat mode to provide I/O performance metrics Alexander Antonov
2020-12-10  9:03 ` [PATCH 1/5] perf stat: Add AGGR_IIO_STACK mode Alexander Antonov
2020-12-10  9:03 ` [PATCH 2/5] perf evsel: Introduce an observed performance device Alexander Antonov
2020-12-10  9:03 ` [PATCH 3/5] perf stat: Basic support for iiostat in perf stat Alexander Antonov
2020-12-10  9:03 ` [PATCH 4/5] perf stat: Helper functions for IIO stacks list in iiostat mode Alexander Antonov
2020-12-10  9:03 ` [PATCH 5/5] perf stat: Enable --iiostat mode for x86 platforms Alexander Antonov
2020-12-14 13:28 ` [PATCH 0/5] perf stat: Introduce --iiostat mode to provide I/O performance metrics Arnaldo Carvalho de Melo
2020-12-15  3:04   ` Andi Kleen
2020-12-15 13:58     ` Arnaldo Carvalho de Melo
2020-12-20 17:23       ` Alexander Antonov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.