From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933750AbcI2JTf (ORCPT ); Thu, 29 Sep 2016 05:19:35 -0400
Received: from bombadil.infradead.org ([198.137.202.9]:38556 "EHLO
	bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932591AbcI2JTZ (ORCPT );
	Thu, 29 Sep 2016 05:19:25 -0400
Date: Thu, 29 Sep 2016 11:19:12 +0200
From: Peter Zijlstra
To: Jiri Olsa
Cc: Arnaldo Carvalho de Melo, Michael Trapp, "Long, Wai Man",
	Stanislav Ievlev, Kim Phillips, lkml, Don Zickus, Joe Mario,
	Ingo Molnar, Namhyung Kim, David Ahern, Andi Kleen,
	Stephane Eranian
Subject: Re: [PATCHv4 00/57] perf c2c: Add new tool to analyze cacheline
	contention on NUMA systems
Message-ID: <20160929091912.GV5012@twins.programming.kicks-ass.net>
References: <1474558645-19956-1-git-send-email-jolsa@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1474558645-19956-1-git-send-email-jolsa@kernel.org>
User-Agent: Mutt/1.5.23.1 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Sep 22, 2016 at 05:36:28PM +0200, Jiri Olsa wrote:
> hi,
> sending new version of c2c patches (v3) originally posted in here:
>   http://lwn.net/Articles/588866/
>
> I took the old set and reworked it to fit into current upstream code.
> It follows the same logic as original patch and provides (almost) the
> same stdio interface. In addition new TUI interface was added.
>
> The perf c2c tool provides means for Shared Data C2C/HITM analysis.
> It allows you to track down the cacheline contentions. The tool is
> based on x86's load latency and precise store facility events provided
> by Intel CPUs.
>
> The tool was tested by Joe Mario and has proven to be useful and found
> some cachelines contentions.
> Joe also wrote a blog about c2c tool with
> examples located in here:
>
>   https://joemario.github.io/blog/2016/09/01/c2c-blog/
>
> v4 changes:
>   - 4 patches already queued
>   - used u32 for c2c_stats instead of int [Stanislav]
>   - fixed NO_SLANG=1 compilation [Kim]
>   - add __hist_entry__snprintf helper [Arnaldo]
>
> Code is also available in:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
>   perf/c2c_v4
>
> Testing:
>   $ perf c2c record -a [workload]
>   $ perf c2c report [--stdio]
>   $ man perf-c2c
>
> It's most likely you won't generate any remote HITMs on common
> laptops, so to get results for local HITMs please use:
>
>   $ perf c2c report -d lcl [--stdio]

I'll just keep repeating; this is not the tool I want :-(

I'll not block this tool, but I also think it's far less usable than it
should've been.

  https://lkml.kernel.org/r/20151209093402.GM6356@twins.programming.kicks-ass.net

What I want is a tool that maps memop events (any PEBS memops) back to a
'type::member' form and sorts on that. It would not rely on the PEBS
'Data Linear Address' field, as that is useless for dynamically
allocated bits. Instead it would use the IP and DWARF information to
deduce the 'type::member' of the memop.

I want pahole-like output, showing me where the hits (green) and misses
(red) are in a structure.

I want to be able to 'perf memops report -EC task_struct' and see the
expanded task_struct (as per 'pahole -EC task_struct') annotated, not a
data address for each task in my workload (which could be 100+ and
entirely useless).

Currently this is somewhat involved, since DWARF doesn't include type
information for all memops, so we'd have to disassemble and interpret,
which, while tedious, is possible.

However, afaik, Stephane has been working with their tools team to get
additional DWARF info to make this easier. Stephane, any updates on
that?
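
To make the report I'm asking for a bit more concrete, here is a rough
Python sketch of the aggregation step only. It is not perf code and
nothing in it exists in the tree: it assumes every sample has already
been resolved (from IP plus DWARF, which is the hard part) to a
(type, member, is_miss) tuple. The Member class, the aggregate() and
report() helpers, the 'foo' struct layout and the sample list are all
made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    offset: int  # byte offset within the struct
    size: int    # size in bytes


def aggregate(samples):
    """Count hits and misses per (type, member) key.

    Each sample is a (type_name, member_name, is_miss) tuple. How a real
    tool would derive these from IP + DWARF is out of scope here.
    """
    hits, misses = Counter(), Counter()
    for type_name, member_name, is_miss in samples:
        key = (type_name, member_name)
        (misses if is_miss else hits)[key] += 1
    return hits, misses


def report(type_name, layout, hits, misses):
    """Return pahole-like lines for one type, annotated with counts."""
    lines = [f"struct {type_name} {{"]
    for m in layout:
        key = (type_name, m.name)
        lines.append(
            f"    {m.name:<12} /* {m.offset:4d} {m.size:4d} */"
            f"  hits={hits[key]} misses={misses[key]}"
        )
    lines.append("};")
    return lines


# Hypothetical layout and samples, for illustration only.
layout = [Member("state", 0, 8), Member("flags", 8, 4), Member("usage", 12, 4)]
samples = [
    ("foo", "state", False),
    ("foo", "state", True),
    ("foo", "usage", True),
    ("foo", "usage", True),
    ("bar", "x", False),  # a different type; ignored when reporting 'foo'
]

hits, misses = aggregate(samples)
for line in report("foo", layout, hits, misses):
    print(line)
```

The point of keying on (type, member) rather than on the data address
is that samples from every instance of the structure collapse into one
row per member, however many objects the workload allocated.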