From: Andi Kleen
To: acme@kernel.org
Cc: jolsa@redhat.com, namhyung@kernel.org, eranian@google.com, linux-kernel@vger.kernel.org
Subject: Cycles annotation support for perf tools v2
Date: Wed, 27 May 2015 10:51:43 -0700
Message-Id: <1432749114-904-1-git-send-email-andi@firstfloor.org>

[v2: Addressed review comments. Fixed display problems; IPC is now computed correctly. See the individual patches for detailed changes.]

The upcoming Skylake CPU has a new timed branch stack feature that reports cycle counts for individual branches in the Last Branch Record (LBR). This makes it possible to get fine-grained cost information for code and to compute fine-grained IPC.

Available from
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git perf/skl-tools2

This patchkit adds support for this in the perf tools:

- Basic support for the cycles field, like the other branch fields
- Show cycles in the standard branch sort view (no IPC here, as IPC needs the instruction counts from annotation)
- Annotate cycles and IPC in the assembler annotate view
- Add branch support to perf top, so we can do live annotation
- Misc support, like dumping the cycles field in perf report -D

The kernel support has been posted separately. I have included a test patch that generates fake data for testing on existing systems.
Example output for annotate (with made-up numbers). The first column is the usual sample percentage, the second is the IPC, and the third is the average number of cycles for the basic block:

                   │    static int hex(char ch)
                   │    {
         0.12      │      push   %rbp
         0.12      │      mov    %rsp,%rbp
         0.12      │      sub    $0x20,%rsp
         0.12      │      mov    %edi,%eax
         0.12      │      mov    %al,-0x14(%rbp)
         0.12      │      mov    %fs:0x28,%rax
         0.12      │      mov    %rax,-0x8(%rbp)
         0.12      │      xor    %eax,%eax
                   │    if ((ch >= '0') && (ch <= '9'))
         0.12      │      cmpb   $0x2f,-0x14(%rbp)
  66.67  0.12  123 │    ↓ jle    31
         0.12      │      cmpb   $0x39,-0x14(%rbp)
         0.12  123 │    ↓ jg     31
                   │    return ch - '0';
  22.22  0.12      │      movsbl -0x14(%rbp),%eax
         0.12      │      sub    $0x30,%eax
         0.12  123 │    ↓ jmp    60
                   │    if ((ch >= 'a') && (ch <= 'f'))
         0.06      │31:   cmpb   $0x60,-0x14(%rbp)
         0.06  123 │    ↓ jle    46
         0.06      │      cmpb   $0x66,-0x14(%rbp)
         0.06      │    ↓ jg     46
                   │    return ch - 'a' + 10;
         0.06      │      movsbl -0x14(%rbp),%eax

Example output for the branch view (again with fake data):

Overhead  Command  Source Shared Object  Source Symbol          Target Symbol          Basic Block Cycles
  30.08%  tcall    tcall                 [.] f1                 [.] f2                 123
  27.44%  tcall    tcall                 [.] f2                 [.] f1                 123
  15.60%  tcall    tcall                 [.] main               [.] f1                 123
  12.96%  tcall    tcall                 [.] f1                 [.] main               123
  12.86%  tcall    tcall                 [.] main               [.] main               123
   0.08%  tcall    [kernel.kallsyms]     [k] hrtimer_interrupt  [k] hrtimer_interrupt  123

IPC computation has a few limitations (see the comments in the respective patches); in particular, it punts on overlapping basic blocks.

The cycles annotation only works in the interactive annotate browser. It does not currently work in the scripted perf annotate, which is missing much of the infrastructure needed for per-instruction state.

It would be nice to add column headers to annotate.

There is no support yet in --branch-history or in perf script.