* [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user @ 2019-09-25 2:02 Jin Yao 2019-09-25 2:02 ` [PATCH v1 1/2] perf stat: Support --all-kernel and --all-user options Jin Yao ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Jin Yao @ 2019-09-25 2:02 UTC (permalink / raw) To: acme, jolsa, peterz, mingo, alexander.shishkin Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao This patch series supports the new options "--all-kernel" and "--all-user" in perf-stat. For example, root@kbl:~# perf stat -e cycles,instructions --all-kernel --all-user -a -- sleep 1 Performance counter stats for 'system wide': 19,156,665 cycles:k 7,265,342 instructions:k # 0.38 insn per cycle 4,511,186,293 cycles:u 121,881,436 instructions:u # 0.03 insn per cycle 1.001153540 seconds time elapsed root@kbl:~# perf stat -a --topdown --all-kernel -- sleep 1 Performance counter stats for 'system wide': retiring:k bad speculation:k frontend bound:k backend bound:k S0-D0-C0 2 7.6% 1.8% 40.5% 50.0% S0-D0-C1 2 15.4% 3.4% 14.4% 66.8% S0-D0-C2 2 15.8% 5.1% 26.9% 52.2% S0-D0-C3 2 5.7% 5.7% 46.2% 42.4% 1.000771709 seconds time elapsed More detail information are in the patch descriptions. Jin Yao (2): perf stat: Support --all-kernel and --all-user options perf stat: Support topdown with --all-kernel/--all-user tools/perf/Documentation/perf-record.txt | 3 +- tools/perf/Documentation/perf-stat.txt | 7 + tools/perf/builtin-stat.c | 200 ++++++++++++++++++++++- tools/perf/util/stat-shadow.c | 167 ++++++++++++++----- tools/perf/util/stat.h | 23 +++ 5 files changed, 353 insertions(+), 47 deletions(-) -- 2.17.1 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v1 1/2] perf stat: Support --all-kernel and --all-user options 2019-09-25 2:02 [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user Jin Yao @ 2019-09-25 2:02 ` Jin Yao 2019-09-25 2:02 ` [PATCH v1 2/2] perf stat: Support topdown with --all-kernel/--all-user Jin Yao 2019-09-29 15:29 ` [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user Jiri Olsa 2 siblings, 0 replies; 14+ messages in thread From: Jin Yao @ 2019-09-25 2:02 UTC (permalink / raw) To: acme, jolsa, peterz, mingo, alexander.shishkin Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao perf record has supported --all-kernel / --all-user to configure all used events to run in kernel space or in user space. But perf stat doesn't support that. It would be useful to support these options so that we can collect metrics for e.g. user space only without having to type ":u" in the events manually. Also it would be useful if --all-user / --all-kernel could be specified together, and the tool would automatically add two copies of the events, so that we get a break down between user and kernel. Since we need to support specifying both --all-user and --all-kernel together, we can't do as what the perf record does for supporting --all-user / --all-kernel (it sets attr->exclude_kernel and attr->exclude_user). This patch uses another solution which appends the modifiers ":u"/":k" to event string. For example, perf stat -e cycles,instructions --all-user --all-kernel It's automatically expanded to: perf stat -e cycles:k,instructions:k,cycles:u,instructions:u More examples, root@kbl:~# perf stat -e cycles --all-kernel --all-user -a -- sleep 1 Performance counter stats for 'system wide': 20,884,637 cycles:k 4,511,494,722 cycles:u 1.000891147 seconds time elapsed root@kbl:~# perf stat -e cycles,instructions --all-kernel --all-user -a -- sleep 1 Performance counter stats for 'system wide': 19,156,665 cycles:k 7,265,342 instructions:k # 0.38 insn per cycle 4,511,186,293 cycles:u 121,881,436 instructions:u # 0.03 insn per cycle 1.001153540 seconds time elapsed root@kbl:~# perf stat -e "{cycles,instructions}" --all-kernel --all-user -a -- sleep 1 Performance counter stats for 'system wide': 16,230,472 cycles:k 5,357,549 instructions:k # 0.33 insn per cycle 4,510,695,030 cycles:u 122,097,780 instructions:u # 0.03 insn per cycle 1.000933419 seconds time elapsed root@kbl:~# perf stat -e "{cycles,instructions},{cache-misses,cache-references}" --all-kernel --all-user -a -- sleep 1 Performance counter stats for 'system wide': 111,688,302 cycles:k (74.81%) 24,322,238 instructions:k # 0.22 insn per cycle (74.81%) 1,115,414 cache-misses:k # 21.292 % of all cache refs (75.02%) 5,238,665 cache-references:k (75.02%) 4,506,792,681 cycles:u (75.22%) 124,199,635 instructions:u # 0.03 insn per cycle (75.22%) 43,846,543 cache-misses:u # 62.616 % of all cache refs (74.97%) 70,024,231 cache-references:u (74.97%) 1.001186804 seconds time elapsed Note: 1. This patch can only configure the specified events (-e xxx) to run in user space or in kernel space. For supporting other options, such as --topdown, need follow-up patches. 2. In perf-record, it has already supported the --all-kernel and --all-user, but they can't be combined. We should keep the behavior consistent among all perf subtools. So if this patch can be accepted, will post follow-up patches for supporting other subtools for the same behavior. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> --- tools/perf/Documentation/perf-record.txt | 3 +- tools/perf/Documentation/perf-stat.txt | 7 + tools/perf/builtin-stat.c | 163 ++++++++++++++++++++++- tools/perf/util/stat.h | 11 ++ 4 files changed, 182 insertions(+), 2 deletions(-) diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index c6f9f31b6039..739e29905184 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -495,7 +495,8 @@ Produce compressed trace using specified level n (default: 1 - fastest compressi Configure all used events to run in kernel space. --all-user:: -Configure all used events to run in user space. +Configure all used events to run in user space. The --all-kernel +and --all-user can't be combined. --kernel-callchains:: Collect callchains only from kernel space. I.e. this option sets diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt index 930c51c01201..3f630e2f4144 100644 --- a/tools/perf/Documentation/perf-stat.txt +++ b/tools/perf/Documentation/perf-stat.txt @@ -323,6 +323,13 @@ The output is SMI cycles%, equals to (aperf - unhalted core cycles) / aperf Users who wants to get the actual value can apply --no-metric-only. +--all-kernel:: +Configure all specified events to run in kernel space. + +--all-user:: +Configure all specified events to run in user space. The --all-kernel +and --all-user can be combined. + EXAMPLES -------- diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 468fc49420ce..7f4d22b00d04 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -65,6 +65,7 @@ #include "util/target.h" #include "util/time-utils.h" #include "util/top.h" +#include "util/strbuf.h" #include "asm/bug.h" #include <linux/time64.h> @@ -723,12 +724,124 @@ static int parse_metric_groups(const struct option *opt, return metricgroup__parse_groups(opt, str, &stat_config.metric_events); } +static int get_char_num(const char *str, char ch) +{ + int i = 0; + + while (*str != '\0') { + if (*str == ch) + i++; + str++; + } + + return i; +} + +static int append_uk(struct strbuf *new_str, char *old_str, char ch) +{ + char *str_dup, *name, *next, *sep; + int event_num = get_char_num(old_str, ',') + 1; + + str_dup = strdup(old_str); + if (!str_dup) + return -1; + + name = strtok_r(str_dup, ",", &next); + while (name) { + sep = strchr(name, ':'); + if (sep) + *sep = 0; + + if (--event_num) + strbuf_addf(new_str, "%s:%c,", name, ch); + else + strbuf_addf(new_str, "%s:%c", name, ch); + + name = strtok_r(NULL, ",", &next); + } + + free(str_dup); + return 0; +} + +static int get_group_num(char *str) +{ + return get_char_num(str, '}'); +} + +static int append_group_uk(struct strbuf *new_str, const char *old_str, bool k) +{ + char *str_dup, *group_str, *group_next, ch; + int group_num; + bool no_group; + + str_dup = strdup(old_str); + if (!str_dup) + return -1; + + group_num = get_group_num(str_dup); + ch = (k) ? 'k' : 'u'; + no_group = (group_num) ? false : true; + + group_str = strtok_r(str_dup, "}", &group_next); + while (group_str) { + append_uk(new_str, group_str, ch); + + if (!no_group) { + if (--group_num) + strbuf_addf(new_str, "},"); + else + strbuf_addf(new_str, "}"); + } + + group_str = strtok_r(NULL, "}", &group_next); + if (group_str && *group_str == ',') + group_str++; + } + + free(str_dup); + return 0; +} + +static int append_modifier(struct strbuf *new_str, const char *str, + bool all_kernel, bool all_user) +{ + int ret; + + strbuf_init(new_str, 512); + + if (all_kernel && all_user) { + ret = append_group_uk(new_str, str, true); + if (ret) + return ret; + + strbuf_addf(new_str, ","); + ret = append_group_uk(new_str, str, false); + } else + ret = append_group_uk(new_str, str, all_kernel); + + return ret; +} + +static int parse_events_option_wrap(const struct option *opt, const char *str, + int unset __maybe_unused) +{ + if (stat_config.opt_num < PERF_STAT_EVENT_OPT_MAX) { + stat_config.event_opt[stat_config.opt_num].opt = opt; + stat_config.event_opt[stat_config.opt_num].event_str = str; + stat_config.opt_num++; + return 0; + } + + return -1; +} + static struct option stat_options[] = { OPT_BOOLEAN('T', "transaction", &transaction_run, "hardware transaction statistics"), OPT_CALLBACK('e', "event", &evsel_list, "event", "event selector. use 'perf list' to list available events", - parse_events_option), + parse_events_option_wrap), OPT_CALLBACK(0, "filter", &evsel_list, "filter", "event filter", parse_filter), OPT_BOOLEAN('i', "no-inherit", &stat_config.no_inherit, @@ -800,6 +913,10 @@ static struct option stat_options[] = { "measure topdown level 1 statistics"), OPT_BOOLEAN(0, "smi-cost", &smi_cost, "measure SMI cost"), + OPT_BOOLEAN(0, "all-user", &stat_config.all_user, + "Configure all used events to run in user space."), + OPT_BOOLEAN(0, "all-kernel", &stat_config.all_kernel, + "Configure all used events to run in kernel space."), OPT_CALLBACK('M', "metrics", &evsel_list, "metric/metric group list", "monitor specified metrics or metric groups (separated by ,)", parse_metric_groups), @@ -1680,6 +1797,44 @@ static void setup_system_wide(int forks) } } +static int parse_events_again(void) +{ + int i, ret; + + if (!stat_config.all_kernel && !stat_config.all_user) { + for (i = 0; i < stat_config.opt_num; i++) { + ret = parse_events_option(stat_config.event_opt[i].opt, + stat_config.event_opt[i].event_str, 0); + if (ret) + return ret; + } + + return 0; + } + + for (i = 0; i < stat_config.opt_num; i++) { + struct strbuf new_str; + + ret = append_modifier(&new_str, + stat_config.event_opt[i].event_str, + stat_config.all_kernel, + stat_config.all_user); + if (ret) + return ret; + + pr_debug("New event string with appended modifiers: %s\n", + new_str.buf); + + ret = parse_events_option(stat_config.event_opt[i].opt, + new_str.buf, 0); + strbuf_release(&new_str); + if (ret) + return ret; + } + + return 0; +} + int cmd_stat(int argc, const char **argv) { const char * const stat_usage[] = { @@ -1708,6 +1863,12 @@ int cmd_stat(int argc, const char **argv) argc = parse_options_subcommand(argc, argv, stat_options, stat_subcommands, (const char **) stat_usage, PARSE_OPT_STOP_AT_NON_OPTION); + + if (parse_events_again() != 0) { + parse_options_usage(stat_usage, stat_options, "e", 1); + goto out; + } + perf_stat__collect_metric_expr(evsel_list); perf_stat__init_shadow_stats(); diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h index edbeb2f63e8d..8154e07ced64 100644 --- a/tools/perf/util/stat.h +++ b/tools/perf/util/stat.h @@ -93,6 +93,13 @@ struct runtime_stat { typedef int (*aggr_get_id_t)(struct perf_stat_config *config, struct perf_cpu_map *m, int cpu); +struct perf_stat_event_opt { + const struct option *opt; + const char *event_str; +}; + +#define PERF_STAT_EVENT_OPT_MAX 128 + struct perf_stat_config { enum aggr_mode aggr_mode; bool scale; @@ -106,6 +113,8 @@ struct perf_stat_config { bool big_num; bool no_merge; bool walltime_run_table; + bool all_kernel; + bool all_user; FILE *output; unsigned int interval; unsigned int timeout; @@ -126,6 +135,8 @@ struct perf_stat_config { struct perf_cpu_map *cpus_aggr_map; u64 *walltime_run; struct rblist metric_events; + struct perf_stat_event_opt event_opt[PERF_STAT_EVENT_OPT_MAX]; + int opt_num; }; void update_stats(struct stats *stats, u64 val); -- 2.17.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v1 2/2] perf stat: Support topdown with --all-kernel/--all-user 2019-09-25 2:02 [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user Jin Yao 2019-09-25 2:02 ` [PATCH v1 1/2] perf stat: Support --all-kernel and --all-user options Jin Yao @ 2019-09-25 2:02 ` Jin Yao 2019-09-29 15:29 ` [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user Jiri Olsa 2 siblings, 0 replies; 14+ messages in thread From: Jin Yao @ 2019-09-25 2:02 UTC (permalink / raw) To: acme, jolsa, peterz, mingo, alexander.shishkin Cc: Linux-kernel, ak, kan.liang, yao.jin, Jin Yao When perf stat --topdown is enabled, the internal event list is expanded to: "{topdown-total-slots,topdown-slots-retired,topdown-recovery-bubbles,topdown-fetch-bubbles,topdown-slots-issued}". With this patch, 1. When --all-user is enabled, it's expanded to: "{topdown-total-slots:u,topdown-slots-retired:u,topdown-recovery-bubbles:u,topdown-fetch-bubbles:u,topdown-slots-issued:u}" 2. When --all-kernel is enabled, it's expanded to: "{topdown-total-slots:k,topdown-slots-retired:k,topdown-recovery-bubbles:k,topdown-fetch-bubbles:k,topdown-slots-issued:k}" 3. Both are enabled, it's expanded to: "{topdown-total-slots:k,topdown-slots-retired:k,topdown-recovery-bubbles:k,topdown-fetch-bubbles:k,topdown-slots-issued:k},{topdown-total-slots:u,topdown-slots-retired:u,topdown-recovery-bubbles:u,topdown-fetch-bubbles:u,topdown-slots-issued:u}" This patch creates new topdown stat type (STAT_TOPDOWN_XXX_K / STAT_TOPDOWN_XXX_U), and save the event counting value to type related entry in runtime_stat rblist. For example, root@kbl:~# perf stat -a --topdown --all-kernel -- sleep 1 Performance counter stats for 'system wide': retiring:k bad speculation:k frontend bound:k backend bound:k S0-D0-C0 2 7.6% 1.8% 40.5% 50.0% S0-D0-C1 2 15.4% 3.4% 14.4% 66.8% S0-D0-C2 2 15.8% 5.1% 26.9% 52.2% S0-D0-C3 2 5.7% 5.7% 46.2% 42.4% 1.000771709 seconds time elapsed root@kbl:~# perf stat -a --topdown --all-user -- sleep 1 Performance counter stats for 'system wide': retiring:u bad speculation:u frontend bound:u backend bound:u S0-D0-C0 2 0.5% 0.0% 0.0% 99.4% S0-D0-C1 2 5.7% 5.8% 77.7% 10.7% S0-D0-C2 2 15.5% 20.5% 35.8% 28.2% S0-D0-C3 2 14.1% 0.5% 1.5% 83.9% 1.000773028 seconds time elapsed Signed-off-by: Jin Yao <yao.jin@linux.intel.com> --- tools/perf/builtin-stat.c | 37 +++++++- tools/perf/util/stat-shadow.c | 167 +++++++++++++++++++++++++--------- tools/perf/util/stat.h | 12 +++ 3 files changed, 171 insertions(+), 45 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 7f4d22b00d04..b766293b9a15 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -1436,7 +1436,8 @@ static int add_default_attributes(void) if (topdown_run) { char *str = NULL; - bool warn = false; + bool warn = false, append_uk = false; + struct strbuf new_str; if (stat_config.aggr_mode != AGGR_GLOBAL && stat_config.aggr_mode != AGGR_CORE) { @@ -1457,6 +1458,21 @@ static int add_default_attributes(void) return -1; } if (topdown_attrs[0] && str) { + int ret; + + if (stat_config.all_kernel || stat_config.all_user) { + ret = append_modifier(&new_str, str, + stat_config.all_kernel, + stat_config.all_user); + if (ret) + return ret; + + free(str); + str = strbuf_detach(&new_str, NULL); + strbuf_release(&new_str); + append_uk = true; + } + if (warn) arch_topdown_group_warn(); err = parse_events(evsel_list, str, &errinfo); @@ -1468,6 +1484,25 @@ static int add_default_attributes(void) free(str); return -1; } + + if (append_uk) { + struct evsel *evsel; + char *p; + + evlist__for_each_entry(evsel_list, evsel) { + /* + * We appended the modifiers ":u"/":k" + * to evsel->name. Since the events have + * been parsed, remove the appended + * modifiers from event name here. + */ + if (evsel->name) { + p = strchr(evsel->name, ':'); + if (p) + *p = 0; + } + } + } } else { fprintf(stderr, "System does not support topdown\n"); return -1; diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c index 70c87fdb2a43..013e0f772658 100644 --- a/tools/perf/util/stat-shadow.c +++ b/tools/perf/util/stat-shadow.c @@ -204,6 +204,24 @@ static void update_runtime_stat(struct runtime_stat *st, update_stats(&v->stats, count); } +static void update_runtime_stat_uk(struct runtime_stat *st, + enum stat_type type, + int ctx, int cpu, u64 count, + struct evsel *counter, int type_off) +{ + struct perf_event_attr *attr = &counter->core.attr; + + if (!attr->exclude_user && !attr->exclude_kernel) + update_runtime_stat(st, type, ctx, cpu, count); + else if (attr->exclude_user) { + update_runtime_stat(st, type + type_off, + ctx, cpu, count); + } else { + update_runtime_stat(st, type + type_off * 2, + ctx, cpu, count); + } +} + /* * Update various tracking values we maintain to print * more semantic information such as miss/hit ratios, @@ -229,20 +247,25 @@ void perf_stat__update_shadow_stats(struct evsel *counter, u64 count, else if (perf_stat_evsel__is(counter, ELISION_START)) update_runtime_stat(st, STAT_ELISION, ctx, cpu, count); else if (perf_stat_evsel__is(counter, TOPDOWN_TOTAL_SLOTS)) - update_runtime_stat(st, STAT_TOPDOWN_TOTAL_SLOTS, - ctx, cpu, count); + update_runtime_stat_uk(st, STAT_TOPDOWN_TOTAL_SLOTS, + ctx, cpu, count, counter, + STAT_TOPDOWN_NUM); else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_ISSUED)) - update_runtime_stat(st, STAT_TOPDOWN_SLOTS_ISSUED, - ctx, cpu, count); + update_runtime_stat_uk(st, STAT_TOPDOWN_SLOTS_ISSUED, + ctx, cpu, count, counter, + STAT_TOPDOWN_NUM); else if (perf_stat_evsel__is(counter, TOPDOWN_SLOTS_RETIRED)) - update_runtime_stat(st, STAT_TOPDOWN_SLOTS_RETIRED, - ctx, cpu, count); + update_runtime_stat_uk(st, STAT_TOPDOWN_SLOTS_RETIRED, + ctx, cpu, count, counter, + STAT_TOPDOWN_NUM); else if (perf_stat_evsel__is(counter, TOPDOWN_FETCH_BUBBLES)) - update_runtime_stat(st, STAT_TOPDOWN_FETCH_BUBBLES, - ctx, cpu, count); + update_runtime_stat_uk(st, STAT_TOPDOWN_FETCH_BUBBLES, + ctx, cpu, count, counter, + STAT_TOPDOWN_NUM); else if (perf_stat_evsel__is(counter, TOPDOWN_RECOVERY_BUBBLES)) - update_runtime_stat(st, STAT_TOPDOWN_RECOVERY_BUBBLES, - ctx, cpu, count); + update_runtime_stat_uk(st, STAT_TOPDOWN_RECOVERY_BUBBLES, + ctx, cpu, count, counter, + STAT_TOPDOWN_NUM); else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND)) update_runtime_stat(st, STAT_STALLED_CYCLES_FRONT, ctx, cpu, count); @@ -410,6 +433,20 @@ static double runtime_stat_avg(struct runtime_stat *st, return avg_stats(&v->stats); } +static double runtime_stat_avg_uk(struct runtime_stat *st, + enum stat_type type, int ctx, int cpu, + struct evsel *counter, int type_off) +{ + struct perf_event_attr *attr = &counter->core.attr; + + if (!attr->exclude_user && !attr->exclude_kernel) + return runtime_stat_avg(st, type, ctx, cpu); + else if (attr->exclude_user) + return runtime_stat_avg(st, type + type_off, ctx, cpu); + + return runtime_stat_avg(st, type + type_off * 2, ctx, cpu); +} + static double runtime_stat_n(struct runtime_stat *st, enum stat_type type, int ctx, int cpu) { @@ -639,56 +676,67 @@ static double sanitize_val(double x) return x; } -static double td_total_slots(int ctx, int cpu, struct runtime_stat *st) +static double td_total_slots(int ctx, int cpu, struct runtime_stat *st, + struct evsel *evsel) { - return runtime_stat_avg(st, STAT_TOPDOWN_TOTAL_SLOTS, ctx, cpu); + return runtime_stat_avg_uk(st, STAT_TOPDOWN_TOTAL_SLOTS, ctx, cpu, + evsel, STAT_TOPDOWN_NUM); } -static double td_bad_spec(int ctx, int cpu, struct runtime_stat *st) +static double td_bad_spec(int ctx, int cpu, struct runtime_stat *st, + struct evsel *evsel) { double bad_spec = 0; double total_slots; double total; - total = runtime_stat_avg(st, STAT_TOPDOWN_SLOTS_ISSUED, ctx, cpu) - - runtime_stat_avg(st, STAT_TOPDOWN_SLOTS_RETIRED, ctx, cpu) + - runtime_stat_avg(st, STAT_TOPDOWN_RECOVERY_BUBBLES, ctx, cpu); + total = runtime_stat_avg_uk(st, STAT_TOPDOWN_SLOTS_ISSUED, ctx, cpu, + evsel, STAT_TOPDOWN_NUM) - + runtime_stat_avg_uk(st, STAT_TOPDOWN_SLOTS_RETIRED, ctx, cpu, + evsel, STAT_TOPDOWN_NUM) + + runtime_stat_avg_uk(st, STAT_TOPDOWN_RECOVERY_BUBBLES, ctx, cpu, + evsel, STAT_TOPDOWN_NUM); - total_slots = td_total_slots(ctx, cpu, st); + total_slots = td_total_slots(ctx, cpu, st, evsel); if (total_slots) bad_spec = total / total_slots; return sanitize_val(bad_spec); } -static double td_retiring(int ctx, int cpu, struct runtime_stat *st) +static double td_retiring(int ctx, int cpu, struct runtime_stat *st, + struct evsel *evsel) { double retiring = 0; - double total_slots = td_total_slots(ctx, cpu, st); - double ret_slots = runtime_stat_avg(st, STAT_TOPDOWN_SLOTS_RETIRED, - ctx, cpu); + double total_slots = td_total_slots(ctx, cpu, st, evsel); + double ret_slots = runtime_stat_avg_uk(st, STAT_TOPDOWN_SLOTS_RETIRED, + ctx, cpu, evsel, + STAT_TOPDOWN_NUM); if (total_slots) retiring = ret_slots / total_slots; return retiring; } -static double td_fe_bound(int ctx, int cpu, struct runtime_stat *st) +static double td_fe_bound(int ctx, int cpu, struct runtime_stat *st, + struct evsel *evsel) { double fe_bound = 0; - double total_slots = td_total_slots(ctx, cpu, st); - double fetch_bub = runtime_stat_avg(st, STAT_TOPDOWN_FETCH_BUBBLES, - ctx, cpu); + double total_slots = td_total_slots(ctx, cpu, st, evsel); + double fetch_bub = runtime_stat_avg_uk(st, STAT_TOPDOWN_FETCH_BUBBLES, + ctx, cpu, evsel, + STAT_TOPDOWN_NUM); if (total_slots) fe_bound = fetch_bub / total_slots; return fe_bound; } -static double td_be_bound(int ctx, int cpu, struct runtime_stat *st) +static double td_be_bound(int ctx, int cpu, struct runtime_stat *st, + struct evsel *evsel) { - double sum = (td_fe_bound(ctx, cpu, st) + - td_bad_spec(ctx, cpu, st) + - td_retiring(ctx, cpu, st)); + double sum = (td_fe_bound(ctx, cpu, st, evsel) + + td_bad_spec(ctx, cpu, st, evsel) + + td_retiring(ctx, cpu, st, evsel)); if (sum == 0) return 0; return sanitize_val(1.0 - sum); @@ -814,6 +862,33 @@ static void generic_metric(struct perf_stat_config *config, zfree(&pctx.ids[i].name); } +static void print_metric_uk(struct perf_stat_config *config, + void *ctx, const char *color, + const char *fmt, const char *unit, + double val, struct evsel *evsel, + print_metric_t print_metric) +{ + struct perf_event_attr *attr = &evsel->core.attr; + char *new_unit; + + if (!attr->exclude_user && !attr->exclude_kernel) { + print_metric(config, ctx, color, fmt, unit, val); + return; + } + + new_unit = calloc(1, strlen(unit) + 3); + if (!new_unit) + return; + + if (attr->exclude_user) + sprintf(new_unit, "%s:k", unit); + else + sprintf(new_unit, "%s:u", unit); + + print_metric(config, ctx, color, fmt, new_unit, val); + free(new_unit); +} + void perf_stat__print_shadow_stats(struct perf_stat_config *config, struct evsel *evsel, double avg, int cpu, @@ -986,28 +1061,30 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, else print_metric(config, ctxp, NULL, NULL, "CPUs utilized", 0); } else if (perf_stat_evsel__is(evsel, TOPDOWN_FETCH_BUBBLES)) { - double fe_bound = td_fe_bound(ctx, cpu, st); + double fe_bound = td_fe_bound(ctx, cpu, st, evsel); if (fe_bound > 0.2) color = PERF_COLOR_RED; - print_metric(config, ctxp, color, "%8.1f%%", "frontend bound", - fe_bound * 100.); + print_metric_uk(config, ctxp, color, "%8.1f%%", + "frontend bound", + fe_bound * 100., evsel, print_metric); } else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_RETIRED)) { - double retiring = td_retiring(ctx, cpu, st); + double retiring = td_retiring(ctx, cpu, st, evsel); if (retiring > 0.7) color = PERF_COLOR_GREEN; - print_metric(config, ctxp, color, "%8.1f%%", "retiring", - retiring * 100.); + print_metric_uk(config, ctxp, color, "%8.1f%%", "retiring", + retiring * 100., evsel, print_metric); } else if (perf_stat_evsel__is(evsel, TOPDOWN_RECOVERY_BUBBLES)) { - double bad_spec = td_bad_spec(ctx, cpu, st); + double bad_spec = td_bad_spec(ctx, cpu, st, evsel); if (bad_spec > 0.1) color = PERF_COLOR_RED; - print_metric(config, ctxp, color, "%8.1f%%", "bad speculation", - bad_spec * 100.); + print_metric_uk(config, ctxp, color, "%8.1f%%", + "bad speculation", + bad_spec * 100., evsel, print_metric); } else if (perf_stat_evsel__is(evsel, TOPDOWN_SLOTS_ISSUED)) { - double be_bound = td_be_bound(ctx, cpu, st); + double be_bound = td_be_bound(ctx, cpu, st, evsel); const char *name = "backend bound"; static int have_recovery_bubbles = -1; @@ -1020,11 +1097,13 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config, if (be_bound > 0.2) color = PERF_COLOR_RED; - if (td_total_slots(ctx, cpu, st) > 0) - print_metric(config, ctxp, color, "%8.1f%%", name, - be_bound * 100.); - else - print_metric(config, ctxp, NULL, NULL, name, 0); + if (td_total_slots(ctx, cpu, st, evsel) > 0) + print_metric_uk(config, ctxp, color, "%8.1f%%", name, + be_bound * 100., evsel, print_metric); + else { + print_metric_uk(config, ctxp, NULL, NULL, name, 0, + evsel, print_metric); + } } else if (evsel->metric_expr) { generic_metric(config, evsel->metric_expr, evsel->metric_events, evsel->name, evsel->metric_name, NULL, avg, cpu, out, st); diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h index 8154e07ced64..1bae80ed5543 100644 --- a/tools/perf/util/stat.h +++ b/tools/perf/util/stat.h @@ -60,6 +60,8 @@ enum { #define NUM_CTX CTX_BIT_MAX +#define STAT_TOPDOWN_NUM 5 + enum stat_type { STAT_NONE = 0, STAT_NSECS, @@ -81,6 +83,16 @@ enum stat_type { STAT_TOPDOWN_SLOTS_RETIRED, STAT_TOPDOWN_FETCH_BUBBLES, STAT_TOPDOWN_RECOVERY_BUBBLES, + STAT_TOPDOWN_TOTAL_SLOTS_K, + STAT_TOPDOWN_SLOTS_ISSUED_K, + STAT_TOPDOWN_SLOTS_RETIRED_K, + STAT_TOPDOWN_FETCH_BUBBLES_K, + STAT_TOPDOWN_RECOVERY_BUBBLES_K, + STAT_TOPDOWN_TOTAL_SLOTS_U, + STAT_TOPDOWN_SLOTS_ISSUED_U, + STAT_TOPDOWN_SLOTS_RETIRED_U, + STAT_TOPDOWN_FETCH_BUBBLES_U, + STAT_TOPDOWN_RECOVERY_BUBBLES_U, STAT_SMI_NUM, STAT_APERF, STAT_MAX -- 2.17.1 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user 2019-09-25 2:02 [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user Jin Yao 2019-09-25 2:02 ` [PATCH v1 1/2] perf stat: Support --all-kernel and --all-user options Jin Yao 2019-09-25 2:02 ` [PATCH v1 2/2] perf stat: Support topdown with --all-kernel/--all-user Jin Yao @ 2019-09-29 15:29 ` Jiri Olsa 2019-09-30 18:21 ` Andi Kleen 2 siblings, 1 reply; 14+ messages in thread From: Jiri Olsa @ 2019-09-29 15:29 UTC (permalink / raw) To: Jin Yao Cc: acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, ak, kan.liang, yao.jin On Wed, Sep 25, 2019 at 10:02:16AM +0800, Jin Yao wrote: > This patch series supports the new options "--all-kernel" and "--all-user" > in perf-stat. > > For example, > > root@kbl:~# perf stat -e cycles,instructions --all-kernel --all-user -a -- sleep 1 > > Performance counter stats for 'system wide': > > 19,156,665 cycles:k > 7,265,342 instructions:k # 0.38 insn per cycle > 4,511,186,293 cycles:u > 121,881,436 instructions:u # 0.03 insn per cycle hi, I think we should follow --all-kernel/--all-user behaviour from record command, adding extra events seems like unnecesary complexity to me jirka > > 1.001153540 seconds time elapsed > > > root@kbl:~# perf stat -a --topdown --all-kernel -- sleep 1 > > Performance counter stats for 'system wide': > > retiring:k bad speculation:k frontend bound:k backend bound:k > S0-D0-C0 2 7.6% 1.8% 40.5% 50.0% > S0-D0-C1 2 15.4% 3.4% 14.4% 66.8% > S0-D0-C2 2 15.8% 5.1% 26.9% 52.2% > S0-D0-C3 2 5.7% 5.7% 46.2% 42.4% > > 1.000771709 seconds time elapsed > > More detail information are in the patch descriptions. > > Jin Yao (2): > perf stat: Support --all-kernel and --all-user options > perf stat: Support topdown with --all-kernel/--all-user > > tools/perf/Documentation/perf-record.txt | 3 +- > tools/perf/Documentation/perf-stat.txt | 7 + > tools/perf/builtin-stat.c | 200 ++++++++++++++++++++++- > tools/perf/util/stat-shadow.c | 167 ++++++++++++++----- > tools/perf/util/stat.h | 23 +++ > 5 files changed, 353 insertions(+), 47 deletions(-) > > -- > 2.17.1 > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user 2019-09-29 15:29 ` [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user Jiri Olsa @ 2019-09-30 18:21 ` Andi Kleen 2019-09-30 19:28 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 14+ messages in thread From: Andi Kleen @ 2019-09-30 18:21 UTC (permalink / raw) To: Jiri Olsa Cc: Jin Yao, acme, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, kan.liang, yao.jin On Sun, Sep 29, 2019 at 05:29:13PM +0200, Jiri Olsa wrote: > On Wed, Sep 25, 2019 at 10:02:16AM +0800, Jin Yao wrote: > > This patch series supports the new options "--all-kernel" and "--all-user" > > in perf-stat. > > > > For example, > > > > root@kbl:~# perf stat -e cycles,instructions --all-kernel --all-user -a -- sleep 1 > > > > Performance counter stats for 'system wide': > > > > 19,156,665 cycles:k > > 7,265,342 instructions:k # 0.38 insn per cycle > > 4,511,186,293 cycles:u > > 121,881,436 instructions:u # 0.03 insn per cycle > > hi, > I think we should follow --all-kernel/--all-user behaviour from record > command, adding extra events seems like unnecesary complexity to me I think it's useful. Makes it easy to do kernel/user break downs. perf record should support the same. -Andi ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user 2019-09-30 18:21 ` Andi Kleen @ 2019-09-30 19:28 ` Arnaldo Carvalho de Melo 2019-10-01 2:17 ` Andi Kleen 0 siblings, 1 reply; 14+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-09-30 19:28 UTC (permalink / raw) To: Andi Kleen Cc: Jiri Olsa, Jin Yao, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, kan.liang, yao.jin Em Mon, Sep 30, 2019 at 11:21:36AM -0700, Andi Kleen escreveu: > On Sun, Sep 29, 2019 at 05:29:13PM +0200, Jiri Olsa wrote: > > On Wed, Sep 25, 2019 at 10:02:16AM +0800, Jin Yao wrote: > > > This patch series supports the new options "--all-kernel" and "--all-user" > > > in perf-stat. > > > For example, > > > root@kbl:~# perf stat -e cycles,instructions --all-kernel --all-user -a -- sleep 1 > > > Performance counter stats for 'system wide': > > > 19,156,665 cycles:k > > > 7,265,342 instructions:k # 0.38 insn per cycle > > > 4,511,186,293 cycles:u > > > 121,881,436 instructions:u # 0.03 insn per cycle > > I think we should follow --all-kernel/--all-user behaviour from record > > command, adding extra events seems like unnecesary complexity to me > I think it's useful. Makes it easy to do kernel/user break downs. > perf record should support the same. Don't we have this already with: [root@quaco ~]# perf stat -e cycles:u,instructions:u,cycles:k,instructions:k -a -- sleep 1 Performance counter stats for 'system wide': 69,937,445 cycles:u 23,459,595 instructions:u # 0.34 insn per cycle 51,040,704 cycles:k 11,368,152 instructions:k # 0.22 insn per cycle 1.002887417 seconds time elapsed [root@quaco ~]# perf record -e cycles:u,instructions:u,cycles:k,instructions:k -a -- sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.340 MB perf.data (927 samples) ] [root@quaco ~]# perf evlist cycles:u instructions:u cycles:k instructions:k [root@quaco ~]# To make it shorter we could have: diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l index 7469497cd28e..7df28b0e9682 100644 --- a/tools/perf/util/parse-events.l +++ b/tools/perf/util/parse-events.l @@ -313,11 +313,11 @@ aux-output { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT); } <<EOF>> { BEGIN(INITIAL); } } -cpu-cycles|cycles { return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES); } +cpu-cycles|cycles|cyc { return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES); } stalled-cycles-frontend|idle-cycles-frontend { return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); } stalled-cycles-backend|idle-cycles-backend { return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); } -instructions { return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS); } +instructions|insn|ins { return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS); } cache-misses { return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_MISSES); } branch-instructions|branches { return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); } branch-misses { return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_MISSES); } And another thing that comes to mind is to make -M metrics be accepted as -e arg, as someone suggested recently (Andi?), and also make it set its events honouring the :k or :u modifiers: [root@quaco ~]# perf stat -M ipc ^C Performance counter stats for 'system wide': 15,030,011 inst_retired.any # 0.3 IPC 54,449,797 cpu_clk_unhalted.thread 1.186531715 seconds time elapsed [root@quaco ~]# perf stat -M ipc:k,ipc:u Cannot find metric or group `ipc:k' Usage: perf stat [<options>] [<command>] -M, --metrics <metric/metric group list> monitor specified metrics or metric groups (separated by ,) [root@quaco ~]# - Arnaldo ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user 2019-09-30 19:28 ` Arnaldo Carvalho de Melo @ 2019-10-01 2:17 ` Andi Kleen 2019-10-10 6:46 ` Jin, Yao 0 siblings, 1 reply; 14+ messages in thread From: Andi Kleen @ 2019-10-01 2:17 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Jiri Olsa, Jin Yao, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, kan.liang, yao.jin > > I think it's useful. Makes it easy to do kernel/user break downs. > > perf record should support the same. > > Don't we have this already with: > > [root@quaco ~]# perf stat -e cycles:u,instructions:u,cycles:k,instructions:k -a -- sleep 1 This only works for simple cases. Try it for --topdown or multiple -M metrics. -Andi ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user 2019-10-01 2:17 ` Andi Kleen @ 2019-10-10 6:46 ` Jin, Yao 2019-10-10 8:00 ` Jiri Olsa 0 siblings, 1 reply; 14+ messages in thread From: Jin, Yao @ 2019-10-10 6:46 UTC (permalink / raw) To: Andi Kleen, Arnaldo Carvalho de Melo Cc: Jiri Olsa, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, kan.liang, yao.jin On 10/1/2019 10:17 AM, Andi Kleen wrote: >>> I think it's useful. Makes it easy to do kernel/user break downs. >>> perf record should support the same. >> >> Don't we have this already with: >> >> [root@quaco ~]# perf stat -e cycles:u,instructions:u,cycles:k,instructions:k -a -- sleep 1 > > This only works for simple cases. Try it for --topdown or multiple -M metrics. > > -Andi > Hi Arnaldo, Jiri, We think it should be very useful if --all-user / --all-kernel can be specified together, so that we can get a break down between user and kernel easily. But yes, the patches for supporting this new semantics is much complicated than the patch which just follows original perf-record behavior. I fully understand this concern. So if this new semantics can be accepted, that would be very good. But if you think the new semantics is too complicated, I'm also fine for posting a new patch which just follows the perf-record behavior. Thanks Jin Yao ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user 2019-10-10 6:46 ` Jin, Yao @ 2019-10-10 8:00 ` Jiri Olsa 2019-10-10 8:33 ` Jin, Yao 0 siblings, 1 reply; 14+ messages in thread From: Jiri Olsa @ 2019-10-10 8:00 UTC (permalink / raw) To: Jin, Yao Cc: Andi Kleen, Arnaldo Carvalho de Melo, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, kan.liang, yao.jin On Thu, Oct 10, 2019 at 02:46:36PM +0800, Jin, Yao wrote: > > > On 10/1/2019 10:17 AM, Andi Kleen wrote: > > > > I think it's useful. Makes it easy to do kernel/user break downs. > > > > perf record should support the same. > > > > > > Don't we have this already with: > > > > > > [root@quaco ~]# perf stat -e cycles:u,instructions:u,cycles:k,instructions:k -a -- sleep 1 > > > > This only works for simple cases. Try it for --topdown or multiple -M metrics. > > > > -Andi > > > > Hi Arnaldo, Jiri, > > We think it should be very useful if --all-user / --all-kernel can be > specified together, so that we can get a break down between user and kernel > easily. > > But yes, the patches for supporting this new semantics is much complicated > than the patch which just follows original perf-record behavior. I fully > understand this concern. > > So if this new semantics can be accepted, that would be very good. But if > you think the new semantics is too complicated, I'm also fine for posting a > new patch which just follows the perf-record behavior. I still need to think a bit more about this.. did you consider other options like cloning of the perf_evlist/perf_evsel and changing just the exclude* bits? might be event worse actualy ;-) or maybe if we add modifier we could add extra events/groups within the parser.. like: "{cycles,instructions}:A,{cache-misses,cache-references}:A,cycles:A" but that might be still more complicated then what you did also please add the perf record changes so we have same code and logic for both if we are going to change it jirka ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user 2019-10-10 8:00 ` Jiri Olsa @ 2019-10-10 8:33 ` Jin, Yao 2019-10-10 12:33 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 14+ messages in thread From: Jin, Yao @ 2019-10-10 8:33 UTC (permalink / raw) To: Jiri Olsa Cc: Andi Kleen, Arnaldo Carvalho de Melo, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, kan.liang, yao.jin On 10/10/2019 4:00 PM, Jiri Olsa wrote: > On Thu, Oct 10, 2019 at 02:46:36PM +0800, Jin, Yao wrote: >> >> >> On 10/1/2019 10:17 AM, Andi Kleen wrote: >>>>> I think it's useful. Makes it easy to do kernel/user break downs. >>>>> perf record should support the same. >>>> >>>> Don't we have this already with: >>>> >>>> [root@quaco ~]# perf stat -e cycles:u,instructions:u,cycles:k,instructions:k -a -- sleep 1 >>> >>> This only works for simple cases. Try it for --topdown or multiple -M metrics. >>> >>> -Andi >>> >> >> Hi Arnaldo, Jiri, >> >> We think it should be very useful if --all-user / --all-kernel can be >> specified together, so that we can get a break down between user and kernel >> easily. >> >> But yes, the patches for supporting this new semantics is much complicated >> than the patch which just follows original perf-record behavior. I fully >> understand this concern. >> >> So if this new semantics can be accepted, that would be very good. But if >> you think the new semantics is too complicated, I'm also fine for posting a >> new patch which just follows the perf-record behavior. > > I still need to think a bit more about this.. did you consider > other options like cloning of the perf_evlist/perf_evsel and > changing just the exclude* bits? might be event worse actualy ;-) > That should be another approach, but it might be a bit more complicated than just appending ":u"/":k" modifiers to the event name string. > or maybe if we add modifier we could add extra events/groups > within the parser.. like: > > "{cycles,instructions}:A,{cache-misses,cache-references}:A,cycles:A" > > but that might be still more complicated then what you did > Yes agree. > also please add the perf record changes so we have same code > and logic for both if we are going to change it > If this new semantics can be accepted, I'd like to add perf record supporting as well. :) Another difficulty for the new semantics is we need to create user and kernel stat type in runtime_stat rblist (see patch "perf stat: Support topdown with --all-kernel/--all-user"). That has to bring extra complexity. Thanks Jin Yao > jirka > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user 2019-10-10 8:33 ` Jin, Yao @ 2019-10-10 12:33 ` Arnaldo Carvalho de Melo 2019-10-11 2:50 ` Jin, Yao 0 siblings, 1 reply; 14+ messages in thread From: Arnaldo Carvalho de Melo @ 2019-10-10 12:33 UTC (permalink / raw) To: Jin, Yao Cc: Jiri Olsa, Andi Kleen, Arnaldo Carvalho de Melo, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, kan.liang, yao.jin Em Thu, Oct 10, 2019 at 04:33:57PM +0800, Jin, Yao escreveu: > > > On 10/10/2019 4:00 PM, Jiri Olsa wrote: > > On Thu, Oct 10, 2019 at 02:46:36PM +0800, Jin, Yao wrote: > > > > > > > > > On 10/1/2019 10:17 AM, Andi Kleen wrote: > > > > > > I think it's useful. Makes it easy to do kernel/user break downs. > > > > > > perf record should support the same. > > > > > > > > > > Don't we have this already with: > > > > > > > > > > [root@quaco ~]# perf stat -e cycles:u,instructions:u,cycles:k,instructions:k -a -- sleep 1 > > > > > > > > This only works for simple cases. Try it for --topdown or multiple -M metrics. > > > > > > > > -Andi > > > > > > > > > > Hi Arnaldo, Jiri, > > > > > > We think it should be very useful if --all-user / --all-kernel can be > > > specified together, so that we can get a break down between user and kernel > > > easily. > > > > > > But yes, the patches for supporting this new semantics is much complicated > > > than the patch which just follows original perf-record behavior. I fully > > > understand this concern. > > > > > > So if this new semantics can be accepted, that would be very good. But if > > > you think the new semantics is too complicated, I'm also fine for posting a > > > new patch which just follows the perf-record behavior. > > > > I still need to think a bit more about this.. did you consider > > other options like cloning of the perf_evlist/perf_evsel and > > changing just the exclude* bits? might be event worse actualy ;-) > > > > That should be another approach, but it might be a bit more complicated than > just appending ":u"/":k" modifiers to the event name string. > > > or maybe if we add modifier we could add extra events/groups > > within the parser.. like: > > > > "{cycles,instructions}:A,{cache-misses,cache-references}:A,cycles:A" > > > > but that might be still more complicated then what you did > > > > Yes agree. > > > also please add the perf record changes so we have same code > > and logic for both if we are going to change it > If this new semantics can be accepted, I'd like to add perf record > supporting as well. :) Changes in semantics should be avoided, when we add an option already present in some other tool, we should strive to keep the semantics, so that people can reuse their knowledge and just switch tools to go from sampling to counting, say. So if at all possible, and without having really looked deep in this specific case, I would prefer that new semantics come with a new syntax, would that be possible? - Arnaldo > Another difficulty for the new semantics is we need to create user and > kernel stat type in runtime_stat rblist (see patch "perf stat: Support > topdown with --all-kernel/--all-user"). That has to bring extra complexity. > > Thanks > Jin Yao > > > jirka > > -- - Arnaldo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user 2019-10-10 12:33 ` Arnaldo Carvalho de Melo @ 2019-10-11 2:50 ` Jin, Yao 2019-10-11 7:21 ` Jiri Olsa 0 siblings, 1 reply; 14+ messages in thread From: Jin, Yao @ 2019-10-11 2:50 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Jiri Olsa, Andi Kleen, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, kan.liang, yao.jin On 10/10/2019 8:33 PM, Arnaldo Carvalho de Melo wrote: > Em Thu, Oct 10, 2019 at 04:33:57PM +0800, Jin, Yao escreveu: >> >> >> On 10/10/2019 4:00 PM, Jiri Olsa wrote: >>> On Thu, Oct 10, 2019 at 02:46:36PM +0800, Jin, Yao wrote: >>>> >>>> >>>> On 10/1/2019 10:17 AM, Andi Kleen wrote: >>>>>>> I think it's useful. Makes it easy to do kernel/user break downs. >>>>>>> perf record should support the same. >>>>>> >>>>>> Don't we have this already with: >>>>>> >>>>>> [root@quaco ~]# perf stat -e cycles:u,instructions:u,cycles:k,instructions:k -a -- sleep 1 >>>>> >>>>> This only works for simple cases. Try it for --topdown or multiple -M metrics. >>>>> >>>>> -Andi >>>>> >>>> >>>> Hi Arnaldo, Jiri, >>>> >>>> We think it should be very useful if --all-user / --all-kernel can be >>>> specified together, so that we can get a break down between user and kernel >>>> easily. >>>> >>>> But yes, the patches for supporting this new semantics is much complicated >>>> than the patch which just follows original perf-record behavior. I fully >>>> understand this concern. >>>> >>>> So if this new semantics can be accepted, that would be very good. But if >>>> you think the new semantics is too complicated, I'm also fine for posting a >>>> new patch which just follows the perf-record behavior. >>> >>> I still need to think a bit more about this.. did you consider >>> other options like cloning of the perf_evlist/perf_evsel and >>> changing just the exclude* bits? might be event worse actualy ;-) >>> >> >> That should be another approach, but it might be a bit more complicated than >> just appending ":u"/":k" modifiers to the event name string. >> >>> or maybe if we add modifier we could add extra events/groups >>> within the parser.. like: >>> >>> "{cycles,instructions}:A,{cache-misses,cache-references}:A,cycles:A" >>> >>> but that might be still more complicated then what you did >>> >> >> Yes agree. >> >>> also please add the perf record changes so we have same code >>> and logic for both if we are going to change it > >> If this new semantics can be accepted, I'd like to add perf record >> supporting as well. :) > > Changes in semantics should be avoided, when we add an option already > present in some other tool, we should strive to keep the semantics, so > that people can reuse their knowledge and just switch tools to go from > sampling to counting, say. > Yes, that makes sense. We need to try our best to keep the original semantics. I will post a patch for perf-stat which just follows the semantics in perf-record. > So if at all possible, and without having really looked deep in this > specific case, I would prefer that new semantics come with a new syntax, > would that be possible? > Yes, that's possible. Maybe we can use a new option for automatically adding two copies of the events (one copy for user and the other copy for kernel). The option something like "--all-space"? Thanks Jin Yao > - Arnaldo > >> Another difficulty for the new semantics is we need to create user and >> kernel stat type in runtime_stat rblist (see patch "perf stat: Support >> topdown with --all-kernel/--all-user"). That has to bring extra complexity. >> >> Thanks >> Jin Yao >> >>> jirka >>> > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user 2019-10-11 2:50 ` Jin, Yao @ 2019-10-11 7:21 ` Jiri Olsa 2019-10-12 1:49 ` Jin, Yao 0 siblings, 1 reply; 14+ messages in thread From: Jiri Olsa @ 2019-10-11 7:21 UTC (permalink / raw) To: Jin, Yao Cc: Arnaldo Carvalho de Melo, Andi Kleen, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, kan.liang, yao.jin On Fri, Oct 11, 2019 at 10:50:35AM +0800, Jin, Yao wrote: > > > On 10/10/2019 8:33 PM, Arnaldo Carvalho de Melo wrote: > > Em Thu, Oct 10, 2019 at 04:33:57PM +0800, Jin, Yao escreveu: > > > > > > > > > On 10/10/2019 4:00 PM, Jiri Olsa wrote: > > > > On Thu, Oct 10, 2019 at 02:46:36PM +0800, Jin, Yao wrote: > > > > > > > > > > > > > > > On 10/1/2019 10:17 AM, Andi Kleen wrote: > > > > > > > > I think it's useful. Makes it easy to do kernel/user break downs. > > > > > > > > perf record should support the same. > > > > > > > > > > > > > > Don't we have this already with: > > > > > > > > > > > > > > [root@quaco ~]# perf stat -e cycles:u,instructions:u,cycles:k,instructions:k -a -- sleep 1 > > > > > > > > > > > > This only works for simple cases. Try it for --topdown or multiple -M metrics. > > > > > > > > > > > > -Andi > > > > > > > > > > > > > > > > Hi Arnaldo, Jiri, > > > > > > > > > > We think it should be very useful if --all-user / --all-kernel can be > > > > > specified together, so that we can get a break down between user and kernel > > > > > easily. > > > > > > > > > > But yes, the patches for supporting this new semantics is much complicated > > > > > than the patch which just follows original perf-record behavior. I fully > > > > > understand this concern. > > > > > > > > > > So if this new semantics can be accepted, that would be very good. But if > > > > > you think the new semantics is too complicated, I'm also fine for posting a > > > > > new patch which just follows the perf-record behavior. > > > > > > > > I still need to think a bit more about this.. did you consider > > > > other options like cloning of the perf_evlist/perf_evsel and > > > > changing just the exclude* bits? might be event worse actualy ;-) > > > > > > > > > > That should be another approach, but it might be a bit more complicated than > > > just appending ":u"/":k" modifiers to the event name string. > > > > > > > or maybe if we add modifier we could add extra events/groups > > > > within the parser.. like: > > > > > > > > "{cycles,instructions}:A,{cache-misses,cache-references}:A,cycles:A" > > > > > > > > but that might be still more complicated then what you did > > > > > > > > > > Yes agree. > > > > > > > also please add the perf record changes so we have same code > > > > and logic for both if we are going to change it > > > If this new semantics can be accepted, I'd like to add perf record > > > supporting as well. :) > > > > Changes in semantics should be avoided, when we add an option already > > present in some other tool, we should strive to keep the semantics, so > > that people can reuse their knowledge and just switch tools to go from > > sampling to counting, say. > > > > Yes, that makes sense. We need to try our best to keep the original > semantics. I will post a patch for perf-stat which just follows the > semantics in perf-record. > > > So if at all possible, and without having really looked deep in this > > specific case, I would prefer that new semantics come with a new syntax, > > would that be possible? > > > > Yes, that's possible. Maybe we can use a new option for automatically adding > two copies of the events (one copy for user and the other copy for kernel). > The option something like "--all-space"? some other ideas: --all --uk --both --full -e {cycles,cache-misses}:A,cycles,instructions:A -e {cycles,cache-misses}:B,cycles,instructions:B --duplicate-every-event-or-group-of-events-for-each-address-space ;-) jirka ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user 2019-10-11 7:21 ` Jiri Olsa @ 2019-10-12 1:49 ` Jin, Yao 0 siblings, 0 replies; 14+ messages in thread From: Jin, Yao @ 2019-10-12 1:49 UTC (permalink / raw) To: Jiri Olsa Cc: Arnaldo Carvalho de Melo, Andi Kleen, jolsa, peterz, mingo, alexander.shishkin, Linux-kernel, kan.liang, yao.jin On 10/11/2019 3:21 PM, Jiri Olsa wrote: > On Fri, Oct 11, 2019 at 10:50:35AM +0800, Jin, Yao wrote: >> >> >> On 10/10/2019 8:33 PM, Arnaldo Carvalho de Melo wrote: >>> Em Thu, Oct 10, 2019 at 04:33:57PM +0800, Jin, Yao escreveu: >>>> >>>> >>>> On 10/10/2019 4:00 PM, Jiri Olsa wrote: >>>>> On Thu, Oct 10, 2019 at 02:46:36PM +0800, Jin, Yao wrote: >>>>>> >>>>>> >>>>>> On 10/1/2019 10:17 AM, Andi Kleen wrote: >>>>>>>>> I think it's useful. Makes it easy to do kernel/user break downs. >>>>>>>>> perf record should support the same. >>>>>>>> >>>>>>>> Don't we have this already with: >>>>>>>> >>>>>>>> [root@quaco ~]# perf stat -e cycles:u,instructions:u,cycles:k,instructions:k -a -- sleep 1 >>>>>>> >>>>>>> This only works for simple cases. Try it for --topdown or multiple -M metrics. >>>>>>> >>>>>>> -Andi >>>>>>> >>>>>> >>>>>> Hi Arnaldo, Jiri, >>>>>> >>>>>> We think it should be very useful if --all-user / --all-kernel can be >>>>>> specified together, so that we can get a break down between user and kernel >>>>>> easily. >>>>>> >>>>>> But yes, the patches for supporting this new semantics is much complicated >>>>>> than the patch which just follows original perf-record behavior. I fully >>>>>> understand this concern. >>>>>> >>>>>> So if this new semantics can be accepted, that would be very good. But if >>>>>> you think the new semantics is too complicated, I'm also fine for posting a >>>>>> new patch which just follows the perf-record behavior. >>>>> >>>>> I still need to think a bit more about this.. did you consider >>>>> other options like cloning of the perf_evlist/perf_evsel and >>>>> changing just the exclude* bits? might be event worse actualy ;-) >>>>> >>>> >>>> That should be another approach, but it might be a bit more complicated than >>>> just appending ":u"/":k" modifiers to the event name string. >>>> >>>>> or maybe if we add modifier we could add extra events/groups >>>>> within the parser.. like: >>>>> >>>>> "{cycles,instructions}:A,{cache-misses,cache-references}:A,cycles:A" >>>>> >>>>> but that might be still more complicated then what you did >>>>> >>>> >>>> Yes agree. >>>> >>>>> also please add the perf record changes so we have same code >>>>> and logic for both if we are going to change it >>>> If this new semantics can be accepted, I'd like to add perf record >>>> supporting as well. :) >>> >>> Changes in semantics should be avoided, when we add an option already >>> present in some other tool, we should strive to keep the semantics, so >>> that people can reuse their knowledge and just switch tools to go from >>> sampling to counting, say. >>> >> >> Yes, that makes sense. We need to try our best to keep the original >> semantics. I will post a patch for perf-stat which just follows the >> semantics in perf-record. >> >>> So if at all possible, and without having really looked deep in this >>> specific case, I would prefer that new semantics come with a new syntax, >>> would that be possible? >>> >> >> Yes, that's possible. Maybe we can use a new option for automatically adding >> two copies of the events (one copy for user and the other copy for kernel). >> The option something like "--all-space"? > > some other ideas: > > --all > --uk > --both > --full > -e {cycles,cache-misses}:A,cycles,instructions:A > -e {cycles,cache-misses}:B,cycles,instructions:B > --duplicate-every-event-or-group-of-events-for-each-address-space ;-) > > jirka > I like '--uk'. :) Thanks Jin Yao ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2019-10-12 1:49 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-09-25 2:02 [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user Jin Yao 2019-09-25 2:02 ` [PATCH v1 1/2] perf stat: Support --all-kernel and --all-user options Jin Yao 2019-09-25 2:02 ` [PATCH v1 2/2] perf stat: Support topdown with --all-kernel/--all-user Jin Yao 2019-09-29 15:29 ` [PATCH v1 0/2] perf stat: Support --all-kernel and --all-user Jiri Olsa 2019-09-30 18:21 ` Andi Kleen 2019-09-30 19:28 ` Arnaldo Carvalho de Melo 2019-10-01 2:17 ` Andi Kleen 2019-10-10 6:46 ` Jin, Yao 2019-10-10 8:00 ` Jiri Olsa 2019-10-10 8:33 ` Jin, Yao 2019-10-10 12:33 ` Arnaldo Carvalho de Melo 2019-10-11 2:50 ` Jin, Yao 2019-10-11 7:21 ` Jiri Olsa 2019-10-12 1:49 ` Jin, Yao
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).