From: Nilay Vaish
To: Andi Kleen
Cc: Andi Kleen, acme@kernel.org, jolsa@kernel.org, Linux Kernel list
Date: Thu, 2 Jun 2016 06:52:03 -0500
Subject: Re: [PATCH 1/4] perf stat: Basic support for TopDown in perf stat
In-Reply-To: <20160601152456.GQ22049@tassilo.jf.intel.com>
References: <1464119559-17203-1-git-send-email-andi@firstfloor.org>
 <20160601152456.GQ22049@tassilo.jf.intel.com>

This patch looks fine to me.

--
Nilay

On 1 June 2016 at 10:24, Andi Kleen wrote:
>
> Here's an updated patch that addresses the duplicate-include issue and
> explicitly frees memory.
>
> ---
>
> Add basic plumbing for TopDown in perf stat
>
> TopDown is intended to replace the frontend cycles idle /
> backend cycles idle metrics in standard perf stat output.
> These metrics are not reliable in many workloads,
> due to out-of-order effects.
>
> This implements a new --topdown mode in perf stat
> (similar to --transaction) that measures the pipeline
> bottlenecks using standardized formulas. The measurement
> can all be done with 5 counters (one of them a fixed counter).
>
> The result is four metrics:
> FrontendBound, BackendBound, BadSpeculation, Retiring
>
> that describe the CPU pipeline behavior at a high level.
> FrontendBound and BackendBound show where the pipeline stalls,
> while BadSpeculation accounts for slots wasted on work that
> never retires.
>
> The full top down methodology has many hierarchical metrics.
> This implementation only supports level 1, which can be
> collected without multiplexing. A full implementation
> of top down on top of perf is available in pmu-tools toplev
> (http://github.com/andikleen/pmu-tools).
>
> The current version works on Intel Core CPUs starting
> with Sandy Bridge, and on Atom CPUs starting with Silvermont.
> In principle the generic metrics should also be implementable
> on other out-of-order CPUs.
>
> TopDown level 1 uses a set of abstracted metrics which
> are generic to out-of-order CPU cores (although some
> CPUs may not implement all of them):
>
> topdown-total-slots       Available slots in the pipeline
> topdown-slots-issued      Slots issued into the pipeline
> topdown-slots-retired     Slots successfully retired
> topdown-fetch-bubbles     Pipeline gaps in the frontend
> topdown-recovery-bubbles  Pipeline gaps during recovery
>                           from misspeculation
>
> These events then allow computing the four useful metrics:
> FrontendBound, BackendBound, Retiring, BadSpeculation.
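[Aside for readers following the series: the actual computation lands in a
follow-on patch, but the level-1 formulas derived from these five events
are well established in the TopDown methodology. A sketch in C — the
struct and function names here are mine, not from the patch:

        /* Reviewer's sketch of the level-1 TopDown arithmetic.
         * Inputs are raw counts of the five events listed above. */
        struct topdown_l1 {
                double frontend_bound, backend_bound;
                double bad_speculation, retiring;
        };

        static struct topdown_l1 topdown_l1_compute(double total_slots,
                                                    double slots_issued,
                                                    double slots_retired,
                                                    double fetch_bubbles,
                                                    double recovery_bubbles)
        {
                struct topdown_l1 m;

                m.retiring        = slots_retired / total_slots;
                m.bad_speculation = (slots_issued - slots_retired +
                                     recovery_bubbles) / total_slots;
                m.frontend_bound  = fetch_bubbles / total_slots;
                /* Backend bound is whatever remains of the slot budget. */
                m.backend_bound   = 1.0 - (m.retiring + m.bad_speculation +
                                           m.frontend_bound);
                return m;
        }

The four fractions sum to one, which is why one fixed counter plus four
generic counters suffice without multiplexing.]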
> Add a new --topdown option to enable the events.
> When --topdown is specified, set up events for all topdown
> events supported by the kernel.
> Add topdown-* as a special case to the event parser, as is
> needed for all events containing a '-'.
>
> The actual code to compute the metrics is in follow-on patches.
>
> v2: Use standard sysctl read function.
> v3: Move x86 specific code to arch/.
> v4: Enable --metric-only implicitly for topdown.
> v5: Add --single-thread option to not force per-core mode.
> v6: Fix output order of topdown metrics.
> v7: Allow combining with -d.
> v8: Remove --single-thread again.
> v9: Rename functions, adding arch_ and topdown_.
> v10: Expand man page and describe TopDown better.
>      Paste intro into commit description.
>      Print error when malloc fails.
> v11: Free memory before exit.
>      Remove duplicate include.
>
> Acked-by: Jiri Olsa
> Signed-off-by: Andi Kleen
> ---
>  tools/perf/Documentation/perf-stat.txt |  32 +++++++++
>  tools/perf/arch/x86/util/Build         |   1 +
>  tools/perf/arch/x86/util/group.c       |  27 ++++++++
>  tools/perf/builtin-stat.c              | 119 ++++++++++++++++++++++++++++++++-
>  tools/perf/util/group.h                |   7 ++
>  tools/perf/util/parse-events.l         |   1 +
>  6 files changed, 184 insertions(+), 3 deletions(-)
>  create mode 100644 tools/perf/arch/x86/util/group.c
>  create mode 100644 tools/perf/util/group.h
>
> diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
> index 04f23b404bbc..d96ccd4844df 100644
> --- a/tools/perf/Documentation/perf-stat.txt
> +++ b/tools/perf/Documentation/perf-stat.txt
> @@ -204,6 +204,38 @@ Aggregate counts per physical processor for system-wide mode measurements.
>  --no-aggr::
>  Do not aggregate counts across all monitored CPUs.
>
> +--topdown::
> +Print top down level 1 metrics if supported by the CPU. This allows
> +determining the bottlenecks in the CPU pipeline for CPU-bound workloads,
> +by breaking the consumed cycles down into frontend bound, backend bound,
> +bad speculation and retiring.
> +
> +Frontend bound means that the CPU cannot fetch and decode instructions fast
> +enough. Backend bound means that computation or memory access is the
> +bottleneck. Bad speculation means that the CPU wasted cycles due to branch
> +mispredictions and similar issues. Retiring means that the CPU computed
> +without an apparent bottleneck. The bottleneck is only the real bottleneck
> +if the workload is actually bound by the CPU and not by something else.
> +
> +For best results it is usually a good idea to use it with interval
> +mode, like -I 1000, as the bottleneck of a workload can change often.
> +
> +The top down metrics are collected per core instead of per
> +CPU thread. Per-core mode is automatically enabled
> +and -a (global monitoring) is needed, requiring root rights or
> +perf.perf_event_paranoid=-1.
> +
> +Topdown uses the full Performance Monitoring Unit, and needs
> +disabling of the NMI watchdog (as root):
> +echo 0 > /proc/sys/kernel/nmi_watchdog
> +for best results. Otherwise the bottlenecks may be inconsistent
> +on workloads with changing phases.
> +
> +This enables --metric-only, unless overridden with --no-metric-only.
> +
> +To interpret the results it usually helps to know which CPUs the
> +workload runs on. If needed, the CPUs can be forced with taskset.
>
>  EXAMPLES
>  --------
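[The man page text above translates to an invocation roughly like this —
my example, not from the patch; the workload path is a placeholder:

        # echo 0 > /proc/sys/kernel/nmi_watchdog
        # perf stat --topdown -a -I 1000 -- ./my_workload

i.e. run as root, system-wide, with interval printing so phase changes in
the workload stay visible.]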
> diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
> index 465970370f3e..4cd8a16b1b7b 100644
> --- a/tools/perf/arch/x86/util/Build
> +++ b/tools/perf/arch/x86/util/Build
> @@ -3,6 +3,7 @@ libperf-y += tsc.o
>  libperf-y += pmu.o
>  libperf-y += kvm-stat.o
>  libperf-y += perf_regs.o
> +libperf-y += group.o
>
>  libperf-$(CONFIG_DWARF) += dwarf-regs.o
>  libperf-$(CONFIG_BPF_PROLOGUE) += dwarf-regs.o
> diff --git a/tools/perf/arch/x86/util/group.c b/tools/perf/arch/x86/util/group.c
> new file mode 100644
> index 000000000000..37f92aa39a5d
> --- /dev/null
> +++ b/tools/perf/arch/x86/util/group.c
> @@ -0,0 +1,27 @@
> +#include <stdio.h>
> +#include "api/fs/fs.h"
> +#include "util/group.h"
> +
> +/*
> + * Check whether we can use a group for top down.
> + * Without a group we may get bad results due to multiplexing.
> + */
> +bool arch_topdown_check_group(bool *warn)
> +{
> +        int n;
> +
> +        if (sysctl__read_int("kernel/nmi_watchdog", &n) < 0)
> +                return false;
> +        if (n > 0) {
> +                *warn = true;
> +                return false;
> +        }
> +        return true;
> +}
> +
> +void arch_topdown_group_warn(void)
> +{
> +        fprintf(stderr,
> +                "nmi_watchdog enabled with topdown. May give wrong results.\n"
> +                "Disable with echo 0 > /proc/sys/kernel/nmi_watchdog\n");
> +}
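[For anyone without the perf tree handy: sysctl__read_int() just parses an
integer out of the named file under /proc/sys. A rough stand-alone
equivalent of the check, under that assumption (the function name is
mine):

        #include <stdio.h>

        /* Returns 1 if the NMI watchdog is on, 0 if off or unreadable.
         * Illustration only; perf uses its fs helpers instead. */
        static int nmi_watchdog_enabled(void)
        {
                FILE *f = fopen("/proc/sys/kernel/nmi_watchdog", "r");
                int n = 0;

                if (!f)
                        return 0;
                if (fscanf(f, "%d", &n) != 1)
                        n = 0;
                fclose(f);
                return n > 0;
        }

The watchdog matters because it permanently occupies one generic PMU
counter, which can prevent the five topdown events from being scheduled
as a single group.]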
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 715a1128daeb..0e7d65969028 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -59,10 +59,12 @@
>  #include "util/thread.h"
>  #include "util/thread_map.h"
>  #include "util/counts.h"
> +#include "util/group.h"
>  #include "util/session.h"
>  #include "util/tool.h"
>  #include "asm/bug.h"
>
> +#include <api/fs/fs.h>
>  #include <stdlib.h>
>  #include <sys/prctl.h>
>  #include <locale.h>
> @@ -98,6 +100,15 @@ static const char * transaction_limited_attrs = {
>          "}"
>  };
>
> +static const char * topdown_attrs[] = {
> +        "topdown-total-slots",
> +        "topdown-slots-retired",
> +        "topdown-recovery-bubbles",
> +        "topdown-fetch-bubbles",
> +        "topdown-slots-issued",
> +        NULL,
> +};
> +
>  static struct perf_evlist *evsel_list;
>
>  static struct target target = {
> @@ -112,6 +123,7 @@ static volatile pid_t child_pid = -1;
>  static bool null_run = false;
>  static int detailed_run = 0;
>  static bool transaction_run;
> +static bool topdown_run = false;
>  static bool big_num = true;
>  static int big_num_opt = -1;
>  static const char *csv_sep = NULL;
> @@ -124,6 +136,7 @@ static unsigned int initial_delay = 0;
>  static unsigned int unit_width = 4; /* strlen("unit") */
>  static bool forever = false;
>  static bool metric_only = false;
> +static bool force_metric_only = false;
>  static struct timespec ref_time;
>  static struct cpu_map *aggr_map;
>  static aggr_get_id_t aggr_get_id;
> @@ -1520,6 +1533,14 @@ static int stat__set_big_num(const struct option *opt __maybe_unused,
>          return 0;
>  }
>
> +static int enable_metric_only(const struct option *opt __maybe_unused,
> +                              const char *s __maybe_unused, int unset)
> +{
> +        force_metric_only = true;
> +        metric_only = !unset;
> +        return 0;
> +}
> +
>  static const struct option stat_options[] = {
>          OPT_BOOLEAN('T', "transaction", &transaction_run,
>                      "hardware transaction statistics"),
> @@ -1578,8 +1599,10 @@ static const struct option stat_options[] = {
>                       "aggregate counts per thread", AGGR_THREAD),
>          OPT_UINTEGER('D', "delay", &initial_delay,
>                       "ms to wait before starting measurement after program start"),
> -        OPT_BOOLEAN(0, "metric-only", &metric_only,
> -                    "Only print computed metrics. No raw values"),
> +        OPT_CALLBACK_NOOPT(0, "metric-only", &metric_only, NULL,
> +                           "Only print computed metrics. No raw values", enable_metric_only),
> +        OPT_BOOLEAN(0, "topdown", &topdown_run,
> +                    "measure topdown level 1 statistics"),
>          OPT_END()
>  };
>
> @@ -1772,12 +1795,62 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
>          return 0;
>  }
>
> +static int topdown_filter_events(const char **attr, char **str, bool use_group)
> +{
> +        int off = 0;
> +        int i;
> +        int len = 0;
> +        char *s;
> +
> +        for (i = 0; attr[i]; i++) {
> +                if (pmu_have_event("cpu", attr[i])) {
> +                        len += strlen(attr[i]) + 1;
> +                        attr[i - off] = attr[i];
> +                } else
> +                        off++;
> +        }
> +        attr[i - off] = NULL;
> +
> +        *str = malloc(len + 1 + 2);
> +        if (!*str)
> +                return -1;
> +        s = *str;
> +        if (i - off == 0) {
> +                *s = 0;
> +                return 0;
> +        }
> +        if (use_group)
> +                *s++ = '{';
> +        for (i = 0; attr[i]; i++) {
> +                strcpy(s, attr[i]);
> +                s += strlen(s);
> +                *s++ = ',';
> +        }
> +        if (use_group) {
> +                s[-1] = '}';
> +                *s = 0;
> +        } else
> +                s[-1] = 0;
> +        return 0;
> +}
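[To make the string building concrete: on a CPU that exposes all five
events, with use_group true, *str ends up as

        {topdown-total-slots,topdown-slots-retired,topdown-recovery-bubbles,topdown-fetch-bubbles,topdown-slots-issued}

which parse_events() below consumes as one scheduling group. The
"len + 1 + 2" in the malloc covers the surrounding braces and the
terminating NUL, since each name already budgets one byte for its
trailing comma.]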
> +
> +__weak bool arch_topdown_check_group(bool *warn)
> +{
> +        *warn = false;
> +        return false;
> +}
> +
> +__weak void arch_topdown_group_warn(void)
> +{
> +}
> +
>  /*
>   * Add default attributes, if there were no attributes specified or
>   * if -d/--detailed, -d -d or -d -d -d is used:
>   */
>  static int add_default_attributes(void)
>  {
> +        int err;
>          struct perf_event_attr default_attrs0[] = {
>
>          { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
> @@ -1896,7 +1969,6 @@ static int add_default_attributes(void)
>                  return 0;
>
>          if (transaction_run) {
> -                int err;
>                  if (pmu_have_event("cpu", "cycles-ct") &&
>                      pmu_have_event("cpu", "el-start"))
>                          err = parse_events(evsel_list, transaction_attrs, NULL);
> @@ -1909,6 +1981,47 @@ static int add_default_attributes(void)
>                  return 0;
>          }
>
> +        if (topdown_run) {
> +                char *str = NULL;
> +                bool warn = false;
> +
> +                if (stat_config.aggr_mode != AGGR_GLOBAL &&
> +                    stat_config.aggr_mode != AGGR_CORE) {
> +                        pr_err("top down event configuration requires --per-core mode\n");
> +                        return -1;
> +                }
> +                stat_config.aggr_mode = AGGR_CORE;
> +                if (nr_cgroups || !target__has_cpu(&target)) {
> +                        pr_err("top down event configuration requires system-wide mode (-a)\n");
> +                        return -1;
> +                }
> +
> +                if (!force_metric_only)
> +                        metric_only = true;
> +                if (topdown_filter_events(topdown_attrs, &str,
> +                                arch_topdown_check_group(&warn)) < 0) {
> +                        pr_err("Out of memory\n");
> +                        return -1;
> +                }
> +                if (topdown_attrs[0] && str) {
> +                        if (warn)
> +                                arch_topdown_group_warn();
> +                        err = parse_events(evsel_list, str, NULL);
> +                        if (err) {
> +                                fprintf(stderr,
> +                                        "Cannot set up top down events %s: %d\n",
> +                                        str, err);
> +                                free(str);
> +                                return -1;
> +                        }
> +                } else {
> +                        fprintf(stderr, "System does not support topdown\n");
> +                        free(str);
> +                        return -1;
> +                }
> +                free(str);
> +        }
> +
>          if (!evsel_list->nr_entries) {
>                  if (perf_evlist__add_default_attrs(evsel_list, default_attrs0) < 0)
>                          return -1;
> diff --git a/tools/perf/util/group.h b/tools/perf/util/group.h
> new file mode 100644
> index 000000000000..116debe7a995
> --- /dev/null
> +++ b/tools/perf/util/group.h
> @@ -0,0 +1,7 @@
> +#ifndef GROUP_H
> +#define GROUP_H 1
> +
> +bool arch_topdown_check_group(bool *warn);
> +void arch_topdown_group_warn(void);
> +
> +#endif
> diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
> index 1477fbc78993..744ebe3fa30f 100644
> --- a/tools/perf/util/parse-events.l
> +++ b/tools/perf/util/parse-events.l
> @@ -259,6 +259,7 @@ cycles-ct        { return str(yyscanner, PE_KERNEL_PMU_EVENT); }
>  cycles-t         { return str(yyscanner, PE_KERNEL_PMU_EVENT); }
>  mem-loads        { return str(yyscanner, PE_KERNEL_PMU_EVENT); }
>  mem-stores       { return str(yyscanner, PE_KERNEL_PMU_EVENT); }
> +topdown-[a-z-]+  { return str(yyscanner, PE_KERNEL_PMU_EVENT); }
>
>  L1-dcache|l1-d|l1d|L1-data               |
>  L1-icache|l1-i|l1i|L1-instruction        |
> --
> 2.5.5
>