From: Andi Kleen
To: acme@kernel.org
Cc: peterz@infradead.org, jolsa@kernel.org, mingo@kernel.org, linux-kernel@vger.kernel.org
Subject: Add top down metrics to perf stat
Date: Fri, 13 May 2016 18:44:49 -0700
Message-Id: <1463190297-17408-1-git-send-email-andi@firstfloor.org>

Note to reviewers: includes both tools and kernel patches.
The kernel patches are at the beginning.

[v2: Address review feedback.
Metrics are now always printed, but colored when crossing threshold.
--topdown implies --metric-only.
Various smaller fixes, see individual patches]
[v3: Add --single-thread option and support it with HT off.
Clean up old HT workaround.
Improve documentation.
Various smaller fixes, see individual patches.]
[v4: Rebased on latest tree]
[v5: Rebased on latest tree. Move debug messages to -vv]
[v6: Rebased. Remove .aggr-per-core and --single-thread to not
break old perf binaries. Put SMT enumeration into
generic topology API.]
[v7: Address review comments. Change patch title headers.]

This patchkit adds support for TopDown measurements to perf stat.
It applies on top of my earlier metrics patchkit, posted
separately.

TopDown is intended to replace the frontend cycles idle/
backend cycles idle metrics in standard perf stat output.
These metrics are not reliable in many workloads,
due to out of order effects.

This implements a new --topdown mode in perf stat
(similar to --transaction) that measures the pipeline
bottlenecks using standardized formulas. The measurement
can be done with 5 counters (one fixed counter).

The results are four metrics:
FrontendBound, BackendBound, BadSpeculation, Retiring

that describe the CPU pipeline behavior on a high level.

FrontendBound and BackendBound
BadSpeculation is a higher

The full top down methodology has many hierarchical metrics.
This implementation only supports level 1, which can be
collected without multiplexing. A full implementation
of top down on top of perf is available in pmu-tools toplev.
(http://github.com/andikleen/pmu-tools)

The current version works on Intel Core CPUs starting
with Sandy Bridge, and Atom CPUs starting with Silvermont.
In principle the generic metrics should also be implementable
on other out of order CPUs.

TopDown level 1 uses a set of abstracted events which
are generic to out of order CPU cores (although some
CPUs may not implement all of them):

topdown-total-slots       Available slots in the pipeline
topdown-slots-issued      Slots issued into the pipeline
topdown-slots-retired     Slots successfully retired
topdown-fetch-bubbles     Pipeline gaps in the frontend
topdown-recovery-bubbles  Pipeline gaps during recovery
                          from misspeculation

These events then allow computing four useful metrics:
FrontendBound, BackendBound, Retiring, BadSpeculation.

The formulas to compute the metrics are generic; they
only change based on the availability of the abstracted
input values.

The kernel declares the events supported by the current
CPU and perf stat then computes the formulas based on the
available events.
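
To make the relationship between the five events and the four metrics concrete, here is a minimal sketch in Python. It uses the commonly published level 1 TopDown formulas; these are not spelled out in this message, so treat them as an assumption rather than as the exact code in the patches. The function name topdown_level1 and the dict-based interface are hypothetical, and treating a missing topdown-recovery-bubbles as zero is also an assumption.

```python
from typing import Dict


def topdown_level1(counts: Dict[str, float]) -> Dict[str, float]:
    """Compute level 1 TopDown fractions (0.0 .. 1.0) from raw event counts.

    Assumed formulas (not quoted from the patches):
      Retiring       = slots_retired / total_slots
      FrontendBound  = fetch_bubbles / total_slots
      BadSpeculation = (slots_issued - slots_retired + recovery_bubbles)
                       / total_slots
      BackendBound   = 1 - (FrontendBound + BadSpeculation + Retiring)
    """
    total = counts["topdown-total-slots"]
    if total <= 0:
        raise ValueError("topdown-total-slots must be positive")

    issued = counts["topdown-slots-issued"]
    retired = counts["topdown-slots-retired"]
    fetch_bubbles = counts["topdown-fetch-bubbles"]
    # Assumption: a CPU that does not expose this event contributes 0.
    recovery_bubbles = counts.get("topdown-recovery-bubbles", 0.0)

    retiring = retired / total
    frontend_bound = fetch_bubbles / total
    bad_speculation = (issued - retired + recovery_bubbles) / total
    # Backend bound is the remainder; clamp so counter skew
    # cannot produce a negative fraction.
    backend_bound = max(0.0, 1.0 - (frontend_bound + bad_speculation + retiring))

    return {
        "FrontendBound": frontend_bound,
        "BackendBound": backend_bound,
        "BadSpeculation": bad_speculation,
        "Retiring": retiring,
    }
```

Because BackendBound is computed as the remainder, the four fractions sum to 1 whenever the clamp does not trigger.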

Example output:

$ perf stat --topdown -I 1000 cmd
1.000735655           frontend bound  retiring  bad speculation  backend bound
1.000735655 S0-C0  2          47.84%    11.69%            8.37%         32.10%
1.000735655 S0-C1  2          45.53%    11.39%            8.52%         34.56%
2.003978563 S0-C0  2          49.47%    12.22%            8.65%         29.66%
2.003978563 S0-C1  2          47.21%    12.98%            8.77%         31.04%
3.004901432 S0-C0  2          49.35%    12.26%            8.68%         29.70%
3.004901432 S0-C1  2          47.23%    12.67%            8.76%         31.35%
4.005766611 S0-C0  2          48.44%    12.14%            8.59%         30.82%
4.005766611 S0-C1  2          46.07%    12.41%            8.67%         32.85%
5.006580592 S0-C0  2          47.91%    12.08%            8.57%         31.44%
5.006580592 S0-C1  2          45.57%    12.27%            8.63%         33.53%
6.007545125 S0-C0  2          47.45%    12.02%            8.57%         31.96%
6.007545125 S0-C1  2          45.13%    12.17%            8.57%         34.14%
7.008539347 S0-C0  2          47.07%    12.03%            8.61%         32.29%
...

For Level 1, TopDown computes metrics per core instead of
per logical CPU on Core CPUs. (On Atom CPUs there is no Hyper
Threading and TopDown is per thread.)

In this case perf stat automatically enables --per-core mode and
also requires global mode (-a) and avoiding other filters (no
cgroup mode).

One side effect is that this may require root rights or a
kernel.perf_event_paranoid=-1 setting.

Full tree available in
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/top-down-21
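
For anyone who wants to post-process the interval output shown in the example above, here is a minimal parsing sketch in Python. It is not part of the patchkit. The names TopdownRow and parse_row are hypothetical, the column order is taken from the example header (frontend bound, retiring, bad speculation, backend bound), and reading the third field as the number of logical CPUs aggregated into the core is an assumption.

```python
import re
from typing import NamedTuple, Optional


class TopdownRow(NamedTuple):
    time: float             # interval timestamp in seconds
    core: str               # socket/core label, e.g. "S0-C0"
    cpus: int               # assumed: logical CPUs aggregated into this core
    frontend_bound: float   # percent
    retiring: float         # percent
    bad_speculation: float  # percent
    backend_bound: float    # percent


# One data row: timestamp, core label, CPU count, four percentages.
_ROW = re.compile(
    r"^\s*(\d+\.\d+)\s+(S\d+-C\d+)\s+(\d+)"
    r"\s+([\d.]+)%\s+([\d.]+)%\s+([\d.]+)%\s+([\d.]+)%\s*$"
)


def parse_row(line: str) -> Optional[TopdownRow]:
    """Parse one data row; return None for headers and other lines."""
    m = _ROW.match(line)
    if m is None:
        return None
    t, core, cpus, fe, ret, bad, be = m.groups()
    return TopdownRow(float(t), core, int(cpus),
                      float(fe), float(ret), float(bad), float(be))
```

A quick sanity check on parsed rows is that the four percentages add up to roughly 100, as they do in the example output.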