From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754413AbcJVI2Q (ORCPT ); Sat, 22 Oct 2016 04:28:16 -0400
Received: from mail-wm0-f67.google.com ([74.125.82.67]:34245 "EHLO mail-wm0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752135AbcJVI2L (ORCPT ); Sat, 22 Oct 2016 04:28:11 -0400
Date: Sat, 22 Oct 2016 10:28:06 +0200
From: Ingo Molnar
To: Arnaldo Carvalho de Melo
Cc: linux-kernel@vger.kernel.org, Linux Weekly News, Andi Kleen, David Ahern, Don Zickus, Jiri Olsa, Joe Mario, Namhyung Kim, Peter Zijlstra, Arnaldo Carvalho de Melo
Subject: Re: [GIT PULL 00/52] New Tool: perf c2c
Message-ID: <20161022082806.GA4526@gmail.com>
References: <1476975876-2522-1-git-send-email-acme@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <1476975876-2522-1-git-send-email-acme@kernel.org>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

* Arnaldo Carvalho de Melo wrote:

> Hi Ingo,
>
>   Please consider pulling into tip/perf/core,
>
> Thanks,
>
> - Arnaldo
>
> The following changes since commit 10b37cb59fa1e61fec1386f324615e0e8202cd87:
>
>   Merge tag 'perf-vendor_events-for-mingo-20161018' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2016-10-19 15:22:26 +0200)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-c2c-for-mingo-20161020
>
> for you to fetch changes up to 535bbde62701b2bb298063e9dfa007e8a1ff95d1:
>
>   perf c2c report: Add --show-all option (2016-10-19 13:18:31 -0300)
>
> ----------------------------------------------------------------
> - The 'perf c2c' tool provides means for Shared Data C2C/HITM analysis.
>
>   It allows you to track down cacheline contention. The tool is based
>   on x86's load latency and precise store facility events provided by
>   Intel CPUs.
>
>   It was tested by Joe Mario and has proven to be useful, finding some
>   cacheline contentions. Joe also wrote a blog about c2c tool with
>   examples:
>
>     https://joemario.github.io/blog/2016/09/01/c2c-blog/
>
>   Excerpt of the content on this site:
>
>   ---
>   At a high level, “perf c2c” will show you:
>
>   * The cachelines where false sharing was detected.
>   * The readers and writers to those cachelines, and the offsets where those accesses occurred.
>   * The pid, tid, instruction addr, function name, binary object name for those readers and writers.
>   * The source file and line number for each reader and writer.
>   * The average load latency for the loads to those cachelines.
>   * Which numa nodes the samples a cacheline came from and which CPUs were involved.
>
>   Using perf c2c is similar to using the Linux perf tool today.
>   First collect data with “perf c2c record”
>   Then generate a report output with “perf c2c report”
>   ---
>
>   There one finds extensive details on using the tool, with tips on
>   reducing the volume of samples while still capturing enough to do
>   its job.
>   (Dick Fowles, Joe Mario, Don Zickus, Jiri Olsa)
>
> Signed-off-by: Arnaldo Carvalho de Melo
>
> ----------------------------------------------------------------
> Jiri Olsa (52):
>       perf c2c: Introduce c2c_decode_stats function
>       perf c2c: Introduce c2c_add_stats function
>       perf c2c: Add c2c command
>       perf c2c: Add record subcommand
>       perf c2c: Add report subcommand
>       perf c2c report: Add dimension support
>       perf c2c report: Add sort_entry dimension support
>       perf c2c report: Fallback to standard dimensions
>       perf c2c report: Add sample processing
>       perf c2c report: Add cacheline hists processing
>       perf c2c report: Decode c2c_stats for hist entries
>       perf c2c report: Add header macros
>       perf c2c report: Add 'dcacheline' dimension key
>       perf c2c report: Add 'offset' dimension key
>       perf c2c report: Add 'iaddr' dimension key
>       perf c2c report: Add hitm related dimension keys
>       perf c2c report: Add stores related dimension keys
>       perf c2c report: Add loads related dimension keys
>       perf c2c report: Add llc and remote loads related dimension keys
>       perf c2c report: Add llc load miss dimension key
>       perf c2c report: Add total record sort key
>       perf c2c report: Add total loads sort key
>       perf c2c report: Add hitm percent sort key
>       perf c2c report: Add hitm/store percent related sort keys
>       perf c2c report: Add dram related sort keys
>       perf c2c report: Add 'pid' sort key
>       perf c2c report: Add 'tid' sort key
>       perf c2c report: Add 'symbol' and 'dso' sort keys
>       perf c2c report: Add 'node' sort key
>       perf c2c report: Add stats related sort keys
>       perf c2c report: Add 'cpucnt' sort key
>       perf c2c report: Add src line sort key
>       perf c2c report: Setup number of header lines for hists
>       perf c2c report: Set final resort fields
>       perf c2c report: Add stdio output support
>       perf c2c report: Add main TUI browser
>       perf c2c report: Add TUI cacheline browser
>       perf c2c report: Add global stats stdio output
>       perf c2c report: Add shared cachelines stats stdio output
>       perf c2c report: Add c2c related stats stdio output
>       perf c2c report: Allow to report callchains
>       perf c2c report: Limit the cachelines table entries
>       perf c2c report: Add support to choose local HITMs
>       perf c2c report: Allow to set cacheline sort fields
>       perf c2c report: Recalc width of global sort entries
>       perf c2c report: Add cacheline index entry
>       perf c2c report: Add support to manage symbol name length
>       perf c2c report: Iterate node display in browser
>       perf c2c report: Add help windows
>       perf c2c: Add man page and credits
>       perf c2c report: Add --no-source option
>       perf c2c report: Add --show-all option
>
>  tools/perf/Build                      |    1 +
>  tools/perf/Documentation/perf-c2c.txt |  282 ++++
>  tools/perf/builtin-c2c.c              | 2754 +++++++++++++++++++++++++++++++++
>  tools/perf/builtin.h                  |    1 +
>  tools/perf/perf.c                     |    1 +
>  tools/perf/ui/browsers/hists.c        |    2 +-
>  tools/perf/ui/browsers/hists.h        |    1 +
>  tools/perf/util/hist.c                |    1 +
>  tools/perf/util/hist.h                |    1 +
>  tools/perf/util/mem-events.c          |  128 ++
>  tools/perf/util/mem-events.h          |   37 +
>  tools/perf/util/sort.c                |    2 +-
>  tools/perf/util/sort.h                |    1 +
>  13 files changed, 3210 insertions(+), 2 deletions(-)
>  create mode 100644 tools/perf/Documentation/perf-c2c.txt
>  create mode 100644 tools/perf/builtin-c2c.c

Pulled the perf-c2c-for-mingo-20161021 tag, thanks a lot Arnaldo!

I can see some teething problems.
For example if I run it on an older kernel (v4.4 distro kernel), I get this:

  triton:~/tip> perf c2c record perf bench sched pipe
  # Running 'sched/pipe' benchmark:
  # Executed 1000000 pipe operations between two processes

       Total time: 12.001 [sec]

        12.001919 usecs/op
            83320 ops/sec
  [ perf record: Woken up 18 times to write data ]
  [ perf record: Captured and wrote 5.356 MB perf.data (69804 samples) ]

but there's no 'perf c2c report' TUI output at all:

  Shared Data Cache Line Table (0 entries, sorted on remote HITMs)

                               Total      Rmt  ----- LLC Load Hitm -----  ---- Store Reference ----  --- Load Dram ----      LLC    Total  ----- Core Load Hit -----  -- LLC Load Hit -
    Index           Cacheline  records     Hitm    Total      Lcl      Rmt    Total    L1Hit   L1Miss       Lcl       Rmt  Ld Miss    Loads       FB       L1       L2       Llc       Rm

and just an empty screen.

If I do 'perf report' I get two events:

  Available samples
  24K cpu/mem-loads,ldlat=30/P
  45K cpu/mem-stores/P

and both have some real data.

What am I missing?

	Ingo