linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL 00/18] perf/core improvements and fixes
@ 2016-11-23 16:40 Arnaldo Carvalho de Melo
  2016-11-23 16:40 ` [PATCH 01/18] perf annotate: Start supporting cross arch annotation Arnaldo Carvalho de Melo
                   ` (18 more replies)
  0 siblings, 19 replies; 20+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-11-23 16:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Adrian Hunter,
	Alexander Shishkin, Andi Kleen, Andrew Morton, Chris Riyder,
	David Ahern, Don Zickus, Jiri Olsa, Joe Mario, Kim Phillips,
	Markus Trippelsdorf, Masami Hiramatsu, Namhyung Kim,
	Naveen N . Rao, Pawel Moll, Peter Zijlstra, Ravi Bangoria,
	Russell King, Stephane Eranian, Steven Rostedt, Taeung Song,
	Wang Nan

From: Arnaldo Carvalho de Melo <acme@redhat.com>

Hi Ingo,

	Please consider pulling,

- Arnaldo

Test results at the end.

The following changes since commit 6a6b12e2125591e24891e6860410795ea53aed11:

  Merge tag 'perf-core-for-mingo-20161114' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2016-11-15 09:45:04 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-20161123

for you to fetch changes up to a407b0678bc1c39d70af5fdbe6421c164b69a8c0:

  perf sched timehist: Add -V/--cpu-visual option (2016-11-23 10:44:09 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

New tool:

- 'perf sched timehist' provides an analysis of scheduling events.

  Example usage:
      perf sched record -- sleep 1
      perf sched timehist

  By default it shows the individual schedule events, including the wait
  time (time between sched-out and next sched-in events for the task), the
  task scheduling delay (time between wakeup and actually running) and run
  time for the task:

        time    cpu  task name         wait time  sch delay  run time
                     [tid/pid]            (msec)     (msec)    (msec)
    -------- ------  ----------------  ---------  ---------  --------
    1.874569 [0011]  gcc[31949]            0.014      0.000     1.148
    1.874591 [0010]  gcc[31951]            0.000      0.000     0.024
    1.874603 [0010]  migration/10[59]      3.350      0.004     0.011
    1.874604 [0011]  <idle>                1.148      0.000     0.035
    1.874723 [0005]  <idle>                0.016      0.000     1.383
    1.874746 [0005]  gcc[31949]            0.153      0.078     0.022
  ...

  Times are in msec.usec. (David Ahern, Namhyung Kim)

Improvements:

- Make 'perf c2c report' support -f/--force, to allow skipping the
  ownership check for root users, for instance, just like the other
  tools (Jiri Olsa)

- Allow sorting cachelines by total number of HITMs, in addition to
  local and remote numbers (Jiri Olsa)

Fixes:

- Make sure errors aren't suppressed by the TUI reset at the end of
  a 'perf c2c report' session (Jiri Olsa)

Infrastructure:

- Initial work on having the annotate code better support multiple
  architectures, including the ability to cross-annotate, i.e. to
  annotate perf.data files collected on an ARM system on a x86_64
  workstation (Arnaldo Carvalho de Melo, Ravi Bangoria, Kim Phillips)

- Use USECS_PER_SEC instead of hard coded number in libtraceevent (Steven Rostedt)

- Add retrieval of preempt count and latency flags in libtraceevent (Steven Rostedt)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Arnaldo Carvalho de Melo (3):
      perf annotate: Start supporting cross arch annotation
      perf annotate: Allow arches to specify functions to skip
      perf annotate: Add per arch instructions annotate handlers

David Ahern (5):
      perf sched timehist: Introduce timehist command
      perf sched timehist: Add summary options
      perf sched timehist: Add -w/--wakeups option
      perf sched timehist: Add call graph options
      perf sched timehist: Add -V/--cpu-visual option

Jiri Olsa (6):
      perf tools: Show event fd in debug output
      perf c2c report: Setup browser after opening perf.data
      perf c2c report: Add -f/--force option
      perf c2c report: Add struct c2c_stats::tot_hitm field
      perf c2c report: Display total HITMs on default
      perf c2c: Support cascading options

Namhyung Kim (2):
      perf symbols: Print symbol offsets conditionally
      perf evsel: Support printing callchains with arrows

Steven Rostedt (2):
      tools lib traceevent: Use USECS_PER_SEC instead of hardcoded number
      tools lib traceevent: Add retrieval of preempt count and latency flags

 tools/lib/traceevent/event-parse.c          |  41 +-
 tools/lib/traceevent/event-parse.h          |   5 +-
 tools/perf/Documentation/perf-c2c.txt       |   8 +
 tools/perf/Documentation/perf-sched.txt     |  66 +-
 tools/perf/arch/arm/annotate/instructions.c |  90 +++
 tools/perf/arch/x86/annotate/instructions.c |  78 +++
 tools/perf/builtin-c2c.c                    |  80 ++-
 tools/perf/builtin-sched.c                  | 914 +++++++++++++++++++++++++++-
 tools/perf/builtin-top.c                    |   2 +-
 tools/perf/ui/browsers/annotate.c           |   2 +-
 tools/perf/ui/gtk/annotate.c                |   2 +-
 tools/perf/util/annotate.c                  | 251 ++++----
 tools/perf/util/annotate.h                  |   6 +-
 tools/perf/util/evsel.c                     |   6 +-
 tools/perf/util/evsel.h                     |   1 +
 tools/perf/util/evsel_fprintf.c             |  12 +-
 tools/perf/util/mem-events.c                |  12 +-
 tools/perf/util/mem-events.h                |   1 +
 tools/perf/util/symbol.h                    |   3 +-
 tools/perf/util/symbol_fprintf.c            |  11 +-
 20 files changed, 1406 insertions(+), 185 deletions(-)
 create mode 100644 tools/perf/arch/arm/annotate/instructions.c
 create mode 100644 tools/perf/arch/x86/annotate/instructions.c

  # uname -a
  Linux jouet 4.8.6-201.fc24.x86_64 #1 SMP Thu Nov 3 14:38:57 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
  # perf test
   1: vmlinux symtab matches kallsyms                          : Ok
   2: detect openat syscall event                              : Ok
   3: detect openat syscall event on all cpus                  : Ok
   4: read samples using the mmap interface                    : Ok
   5: parse events tests                                       : Ok
   6: Validate PERF_RECORD_* events & perf_sample fields       : Ok
   7: Test perf pmu format parsing                             : Ok
   8: Test dso data read                                       : Ok
   9: Test dso data cache                                      : Ok
  10: Test dso data reopen                                     : Ok
  11: roundtrip evsel->name check                              : Ok
  12: Check parsing of sched tracepoints fields                : Ok
  13: Generate and check syscalls:sys_enter_openat event fields: Ok
  14: struct perf_event_attr setup                             : Ok
  15: Test matching and linking multiple hists                 : Ok
  16: Try 'import perf' in python, checking link problems      : Ok
  17: Test breakpoint overflow signal handler                  : Ok
  18: Test breakpoint overflow sampling                        : Ok
  19: Test number of exit event of a simple workload           : Ok
  20: Test software clock events have valid period values      : Ok
  21: Test object code reading                                 : Ok
  22: Test sample parsing                                      : Ok
  23: Test using a dummy software event to keep tracking       : Ok
  24: Test parsing with no sample_id_all bit set               : Ok
  25: Test filtering hist entries                              : Ok
  26: Test mmap thread lookup                                  : Ok
  27: Test thread mg sharing                                   : Ok
  28: Test output sorting of hist entries                      : Ok
  29: Test cumulation of child hist entries                    : Ok
  30: Test tracking with sched_switch                          : Ok
  31: Filter fds with revents mask in a fdarray                : Ok
  32: Add fd to a fdarray, making it autogrow                  : Ok
  33: Test kmod_path__parse function                           : Ok
  34: Test thread map                                          : Ok
  35: Test LLVM searching and compiling                        :
  35.1: Basic BPF llvm compiling test                          : Ok
  35.2: Test kbuild searching                                  : Ok
  35.3: Compile source for BPF prologue generation test        : Ok
  35.4: Compile source for BPF relocation test                 : Ok
  36: Test topology in session                                 : Ok
  37: Test BPF filter                                          :
  37.1: Test basic BPF filtering                               : Ok
  37.2: Test BPF prologue generation                           : Ok
  37.3: Test BPF relocation checker                            : Ok
  38: Test thread map synthesize                               : Ok
  39: Test cpu map synthesize                                  : Ok
  40: Test stat config synthesize                              : Ok
  41: Test stat synthesize                                     : Ok
  42: Test stat round synthesize                               : Ok
  43: Test attr update synthesize                              : Ok
  44: Test events times                                        : Ok
  45: Test backward reading from ring buffer                   : Ok
  46: Test cpu map print                                       : Ok
  47: Test SDT event probing                                   : Ok
  48: Test is_printable_array function                         : Ok
  49: Test bitmap print                                        : Ok
  50: x86 rdpmc test                                           : Ok
  51: Test converting perf time to TSC                         : Ok
  52: Test dwarf unwind                                        : Ok
  53: Test x86 instruction decoder - new instructions          : Ok
  54: Test intel cqm nmi context read                          : Skip
  # 

  # dm
   1 alpine:3.4: Ok
   2 android-ndk:r12b-arm: Ok
   3 archlinux:latest: Ok
   4 centos:5: Ok
   5 centos:6: Ok
   6 centos:7: Ok
   7 debian:7: Ok
   8 debian:8: Ok
   9 debian:experimental: Ok
  10 fedora:20: Ok
  11 fedora:21: Ok
  12 fedora:22: Ok
  13 fedora:23: Ok
  14 fedora:24: Ok
  15 fedora:24-x-ARC-uClibc: Ok
  16 fedora:rawhide: Ok
  17 mageia:5: Ok
  18 opensuse:13.2: Ok
  19 opensuse:42.1: Ok
  20 opensuse:tumbleweed: Ok
  21 ubuntu:12.04.5: Ok
  22 ubuntu:14.04: Ok
  23 ubuntu:14.04.4: Ok
  24 ubuntu:15.10: Ok
  25 ubuntu:16.04: Ok
  26 ubuntu:16.04-x-arm: Ok
  27 ubuntu:16.04-x-arm64: Ok
  28 ubuntu:16.04-x-powerpc: Ok
  29 ubuntu:16.04-x-powerpc64: Ok
  30 ubuntu:16.04-x-powerpc64el: Ok
  31 ubuntu:16.04-x-s390: Ok
  32 ubuntu:16.10: Ok
  #

  $ make -C tools/perf build-test
  make: Entering directory '/home/acme/git/linux/tools/perf'
  - tarpkg: ./tests/perf-targz-src-pkg .
             make_no_libperl_O: make NO_LIBPERL=1
            make_no_demangle_O: make NO_DEMANGLE=1
             make_no_libnuma_O: make NO_LIBNUMA=1
            make_install_bin_O: make install-bin
         make_install_prefix_O: make install prefix=/tmp/krava
             make_util_map_o_O: make util/map.o
  make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
              make_no_libbpf_O: make NO_LIBBPF=1
                    make_doc_O: make doc
       make_util_pmu_bison_o_O: make util/pmu-bison.o
        make_with_babeltrace_O: make LIBBABELTRACE=1
                  make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
             make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
                  make_debug_O: make DEBUG=1
                 make_perf_o_O: make perf.o
               make_no_slang_O: make NO_SLANG=1
                make_no_newt_O: make NO_NEWT=1
                make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1
                make_install_O: make install
            make_no_auxtrace_O: make NO_AUXTRACE=1
           make_no_backtrace_O: make NO_BACKTRACE=1
   make_install_prefix_slash_O: make install prefix=/tmp/krava/
              make_clean_all_O: make clean all
           make_no_libpython_O: make NO_LIBPYTHON=1
            make_no_libaudit_O: make NO_LIBAUDIT=1
           make_no_libbionic_O: make NO_LIBBIONIC=1
                   make_help_O: make help
              make_no_libelf_O: make NO_LIBELF=1
                   make_tags_O: make tags
           make_no_libunwind_O: make NO_LIBUNWIND=1
                make_no_gtk2_O: make NO_GTK2=1
                   make_pure_O: make
                 make_static_O: make LDFLAGS=-static
  OK
  make: Leaving directory '/home/acme/git/linux/tools/perf'

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 01/18] perf annotate: Start supporting cross arch annotation
  2016-11-23 16:40 [GIT PULL 00/18] perf/core improvements and fixes Arnaldo Carvalho de Melo
@ 2016-11-23 16:40 ` Arnaldo Carvalho de Melo
  2016-11-23 16:40 ` [PATCH 02/18] perf annotate: Allow arches to specify functions to skip Arnaldo Carvalho de Melo
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-11-23 16:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Adrian Hunter,
	Alexander Shishkin, Chris Riyder, David Ahern, Jiri Olsa,
	Kim Phillips, Markus Trippelsdorf, Masami Hiramatsu,
	Namhyung Kim, Naveen N . Rao, Pawel Moll, Peter Zijlstra,
	Ravi Bangoria, Taeung Song, Wang Nan

From: Arnaldo Carvalho de Melo <acme@redhat.com>

Introduce a 'struct arch', where arch specific stuff will live, starting
with objdump's choice of comment delimitation character, that is '#' in
x86 while a ';' in arm.

This has some bits and pieces from a patch submitted by Ravi.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Chris Riyder <chris.ryder@arm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-f337tzjjcl8vtapgvjxmhrbx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-top.c          |   2 +-
 tools/perf/ui/browsers/annotate.c |   2 +-
 tools/perf/ui/gtk/annotate.c      |   2 +-
 tools/perf/util/annotate.c        | 114 ++++++++++++++++++++++++++++++++------
 tools/perf/util/annotate.h        |   6 +-
 5 files changed, 103 insertions(+), 23 deletions(-)

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index fe3af9535e85..3df4178ba378 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -130,7 +130,7 @@ static int perf_top__parse_source(struct perf_top *top, struct hist_entry *he)
 		return err;
 	}
 
-	err = symbol__disassemble(sym, map, 0);
+	err = symbol__disassemble(sym, map, NULL, 0);
 	if (err == 0) {
 out_assign:
 		top->sym_filter_entry = he;
diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index 4c18271c71c9..e6e9f7d80dbd 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -1050,7 +1050,7 @@ int symbol__tui_annotate(struct symbol *sym, struct map *map,
 		  (nr_pcnt - 1);
 	}
 
-	err = symbol__disassemble(sym, map, sizeof_bdl);
+	err = symbol__disassemble(sym, map, perf_evsel__env_arch(evsel), sizeof_bdl);
 	if (err) {
 		char msg[BUFSIZ];
 		symbol__strerror_disassemble(sym, map, err, msg, sizeof(msg));
diff --git a/tools/perf/ui/gtk/annotate.c b/tools/perf/ui/gtk/annotate.c
index 42d319927762..8c9308ac30b7 100644
--- a/tools/perf/ui/gtk/annotate.c
+++ b/tools/perf/ui/gtk/annotate.c
@@ -167,7 +167,7 @@ static int symbol__gtk_annotate(struct symbol *sym, struct map *map,
 	if (map->dso->annotate_warned)
 		return -1;
 
-	err = symbol__disassemble(sym, map, 0);
+	err = symbol__disassemble(sym, map, perf_evsel__env_arch(evsel), 0);
 	if (err) {
 		char msg[BUFSIZ];
 		symbol__strerror_disassemble(sym, map, err, msg, sizeof(msg));
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index aeb5a441bd74..1ba41a27214d 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -18,9 +18,11 @@
 #include "annotate.h"
 #include "evsel.h"
 #include "block-range.h"
+#include "arch/common.h"
 #include <regex.h>
 #include <pthread.h>
 #include <linux/bitops.h>
+#include <sys/utsname.h>
 
 const char 	*disassembler_style;
 const char	*objdump_path;
@@ -29,6 +31,28 @@ static regex_t	 file_lineno;
 static struct ins *ins__find(const char *name);
 static int disasm_line__parse(char *line, char **namep, char **rawp);
 
+struct arch {
+	const char	*name;
+	struct		{
+		char comment_char;
+	} objdump;
+};
+
+static struct arch architectures[] = {
+	{
+		.name = "arm",
+		.objdump =  {
+			.comment_char = ';',
+		},
+	},
+	{
+		.name = "x86",
+		.objdump =  {
+			.comment_char = '#',
+		},
+	},
+};
+
 static void ins__delete(struct ins_operands *ops)
 {
 	if (ops == NULL)
@@ -54,7 +78,7 @@ int ins__scnprintf(struct ins *ins, char *bf, size_t size,
 	return ins__raw_scnprintf(ins, bf, size, ops);
 }
 
-static int call__parse(struct ins_operands *ops, struct map *map)
+static int call__parse(struct arch *arch __maybe_unused, struct ins_operands *ops, struct map *map)
 {
 	char *endptr, *tok, *name;
 
@@ -118,7 +142,7 @@ bool ins__is_call(const struct ins *ins)
 	return ins->ops == &call_ops;
 }
 
-static int jump__parse(struct ins_operands *ops, struct map *map __maybe_unused)
+static int jump__parse(struct arch *arch __maybe_unused, struct ins_operands *ops, struct map *map __maybe_unused)
 {
 	const char *s = strchr(ops->raw, '+');
 
@@ -173,7 +197,7 @@ static int comment__symbol(char *raw, char *comment, u64 *addrp, char **namep)
 	return 0;
 }
 
-static int lock__parse(struct ins_operands *ops, struct map *map)
+static int lock__parse(struct arch *arch, struct ins_operands *ops, struct map *map)
 {
 	char *name;
 
@@ -194,7 +218,7 @@ static int lock__parse(struct ins_operands *ops, struct map *map)
 		return 0;
 
 	if (ops->locked.ins->ops->parse &&
-	    ops->locked.ins->ops->parse(ops->locked.ops, map) < 0)
+	    ops->locked.ins->ops->parse(arch, ops->locked.ops, map) < 0)
 		goto out_free_ops;
 
 	return 0;
@@ -237,7 +261,7 @@ static struct ins_ops lock_ops = {
 	.scnprintf = lock__scnprintf,
 };
 
-static int mov__parse(struct ins_operands *ops, struct map *map __maybe_unused)
+static int mov__parse(struct arch *arch, struct ins_operands *ops, struct map *map __maybe_unused)
 {
 	char *s = strchr(ops->raw, ','), *target, *comment, prev;
 
@@ -252,11 +276,7 @@ static int mov__parse(struct ins_operands *ops, struct map *map __maybe_unused)
 		return -1;
 
 	target = ++s;
-#ifdef __arm__
-	comment = strchr(s, ';');
-#else
-	comment = strchr(s, '#');
-#endif
+	comment = strchr(s, arch->objdump.comment_char);
 
 	if (comment != NULL)
 		s = comment - 1;
@@ -304,7 +324,7 @@ static struct ins_ops mov_ops = {
 	.scnprintf = mov__scnprintf,
 };
 
-static int dec__parse(struct ins_operands *ops, struct map *map __maybe_unused)
+static int dec__parse(struct arch *arch __maybe_unused, struct ins_operands *ops, struct map *map __maybe_unused)
 {
 	char *target, *comment, *s, prev;
 
@@ -492,6 +512,41 @@ static struct ins *ins__find(const char *name)
 	return bsearch(name, instructions, nmemb, sizeof(struct ins), ins__key_cmp);
 }
 
+static int arch__key_cmp(const void *name, const void *archp)
+{
+	const struct arch *arch = archp;
+
+	return strcmp(name, arch->name);
+}
+
+static int arch__cmp(const void *a, const void *b)
+{
+	const struct arch *aa = a;
+	const struct arch *ab = b;
+
+	return strcmp(aa->name, ab->name);
+}
+
+static void arch__sort(void)
+{
+	const int nmemb = ARRAY_SIZE(architectures);
+
+	qsort(architectures, nmemb, sizeof(struct arch), arch__cmp);
+}
+
+static struct arch *arch__find(const char *name)
+{
+	const int nmemb = ARRAY_SIZE(architectures);
+	static bool sorted;
+
+	if (!sorted) {
+		arch__sort();
+		sorted = true;
+	}
+
+	return bsearch(name, architectures, nmemb, sizeof(struct arch), arch__key_cmp);
+}
+
 int symbol__alloc_hist(struct symbol *sym)
 {
 	struct annotation *notes = symbol__annotation(sym);
@@ -709,7 +764,7 @@ int hist_entry__inc_addr_samples(struct hist_entry *he, int evidx, u64 ip)
 	return symbol__inc_addr_samples(he->ms.sym, he->ms.map, evidx, ip);
 }
 
-static void disasm_line__init_ins(struct disasm_line *dl, struct map *map)
+static void disasm_line__init_ins(struct disasm_line *dl, struct arch *arch, struct map *map)
 {
 	dl->ins = ins__find(dl->name);
 
@@ -719,7 +774,7 @@ static void disasm_line__init_ins(struct disasm_line *dl, struct map *map)
 	if (!dl->ins->ops)
 		return;
 
-	if (dl->ins->ops->parse && dl->ins->ops->parse(&dl->ops, map) < 0)
+	if (dl->ins->ops->parse && dl->ins->ops->parse(arch, &dl->ops, map) < 0)
 		dl->ins = NULL;
 }
 
@@ -762,6 +817,7 @@ static int disasm_line__parse(char *line, char **namep, char **rawp)
 
 static struct disasm_line *disasm_line__new(s64 offset, char *line,
 					    size_t privsize, int line_nr,
+					    struct arch *arch,
 					    struct map *map)
 {
 	struct disasm_line *dl = zalloc(sizeof(*dl) + privsize);
@@ -777,7 +833,7 @@ static struct disasm_line *disasm_line__new(s64 offset, char *line,
 			if (disasm_line__parse(dl->line, &dl->name, &dl->ops.raw) < 0)
 				goto out_free_line;
 
-			disasm_line__init_ins(dl, map);
+			disasm_line__init_ins(dl, arch, map);
 		}
 	}
 
@@ -1087,6 +1143,7 @@ static int disasm_line__print(struct disasm_line *dl, struct symbol *sym, u64 st
  * The ops.raw part will be parsed further according to type of the instruction.
  */
 static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
+				      struct arch *arch,
 				      FILE *file, size_t privsize,
 				      int *line_nr)
 {
@@ -1149,7 +1206,7 @@ static int symbol__parse_objdump_line(struct symbol *sym, struct map *map,
 			parsed_line = tmp2 + 1;
 	}
 
-	dl = disasm_line__new(offset, parsed_line, privsize, *line_nr, map);
+	dl = disasm_line__new(offset, parsed_line, privsize, *line_nr, arch, map);
 	free(line);
 	(*line_nr)++;
 
@@ -1280,10 +1337,23 @@ static int dso__disassemble_filename(struct dso *dso, char *filename, size_t fil
 	return 0;
 }
 
-int symbol__disassemble(struct symbol *sym, struct map *map, size_t privsize)
+static const char *annotate__norm_arch(const char *arch_name)
+{
+	struct utsname uts;
+
+	if (!arch_name) { /* Assume we are annotating locally. */
+		if (uname(&uts) < 0)
+			return NULL;
+		arch_name = uts.machine;
+	}
+	return normalize_arch((char *)arch_name);
+}
+
+int symbol__disassemble(struct symbol *sym, struct map *map, const char *arch_name, size_t privsize)
 {
 	struct dso *dso = map->dso;
 	char command[PATH_MAX * 2];
+	struct arch *arch = NULL;
 	FILE *file;
 	char symfs_filename[PATH_MAX];
 	struct kcore_extract kce;
@@ -1297,6 +1367,14 @@ int symbol__disassemble(struct symbol *sym, struct map *map, size_t privsize)
 	if (err)
 		return err;
 
+	arch_name = annotate__norm_arch(arch_name);
+	if (!arch_name)
+		return -1;
+
+	arch = arch__find(arch_name);
+	if (arch == NULL)
+		return -ENOTSUP;
+
 	pr_debug("%s: filename=%s, sym=%s, start=%#" PRIx64 ", end=%#" PRIx64 "\n", __func__,
 		 symfs_filename, sym->name, map->unmap_ip(map, sym->start),
 		 map->unmap_ip(map, sym->end));
@@ -1395,7 +1473,7 @@ int symbol__disassemble(struct symbol *sym, struct map *map, size_t privsize)
 
 	nline = 0;
 	while (!feof(file)) {
-		if (symbol__parse_objdump_line(sym, map, file, privsize,
+		if (symbol__parse_objdump_line(sym, map, arch, file, privsize,
 			    &lineno) < 0)
 			break;
 		nline++;
@@ -1793,7 +1871,7 @@ int symbol__tty_annotate(struct symbol *sym, struct map *map,
 	struct rb_root source_line = RB_ROOT;
 	u64 len;
 
-	if (symbol__disassemble(sym, map, 0) < 0)
+	if (symbol__disassemble(sym, map, perf_evsel__env_arch(evsel), 0) < 0)
 		return -1;
 
 	len = symbol__size(sym);
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 5bbcec173b82..8e490b5c91bc 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -34,9 +34,11 @@ struct ins_operands {
 	};
 };
 
+struct arch;
+
 struct ins_ops {
 	void (*free)(struct ins_operands *ops);
-	int (*parse)(struct ins_operands *ops, struct map *map);
+	int (*parse)(struct arch *arch, struct ins_operands *ops, struct map *map);
 	int (*scnprintf)(struct ins *ins, char *bf, size_t size,
 			 struct ins_operands *ops);
 };
@@ -156,7 +158,7 @@ int hist_entry__inc_addr_samples(struct hist_entry *he, int evidx, u64 addr);
 int symbol__alloc_hist(struct symbol *sym);
 void symbol__annotate_zero_histograms(struct symbol *sym);
 
-int symbol__disassemble(struct symbol *sym, struct map *map, size_t privsize);
+int symbol__disassemble(struct symbol *sym, struct map *map, const char *arch_name, size_t privsize);
 
 enum symbol_disassemble_errno {
 	SYMBOL_ANNOTATE_ERRNO__SUCCESS		= 0,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 02/18] perf annotate: Allow arches to specify functions to skip
  2016-11-23 16:40 [GIT PULL 00/18] perf/core improvements and fixes Arnaldo Carvalho de Melo
  2016-11-23 16:40 ` [PATCH 01/18] perf annotate: Start supporting cross arch annotation Arnaldo Carvalho de Melo
@ 2016-11-23 16:40 ` Arnaldo Carvalho de Melo
  2016-11-23 16:40 ` [PATCH 03/18] perf annotate: Add per arch instructions annotate handlers Arnaldo Carvalho de Melo
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-11-23 16:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Adrian Hunter,
	Alexander Shishkin, Chris Riyder, David Ahern, Jiri Olsa,
	Kim Phillips, Markus Trippelsdorf, Masami Hiramatsu,
	Namhyung Kim, Naveen N . Rao, Pawel Moll, Peter Zijlstra,
	Ravi Bangoria, Russell King, Taeung Song, Wang Nan

From: Arnaldo Carvalho de Melo <acme@redhat.com>

This is to cope with an ARM specific kludge introduced in the original
patch supporting ARM annotation, cfef25b8daf7 ("perf annotate: ARM
support") that made functions with a '+' in its name to be skipped when
processing call instructions.

With this patchkit it should be possible to collect a perf.data file on
a ARM machine and then annotate it on a x86 workstation and have those
ARM kludges used.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Chris Riyder <chris.ryder@arm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-2fi3sy7q3sssdi7m7cbe07gy@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/annotate.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 1ba41a27214d..72769762ece9 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -35,6 +35,7 @@ struct arch {
 	const char	*name;
 	struct		{
 		char comment_char;
+		char skip_functions_char;
 	} objdump;
 };
 
@@ -43,6 +44,7 @@ static struct arch architectures[] = {
 		.name = "arm",
 		.objdump =  {
 			.comment_char = ';',
+			.skip_functions_char = '+',
 		},
 	},
 	{
@@ -78,7 +80,7 @@ int ins__scnprintf(struct ins *ins, char *bf, size_t size,
 	return ins__raw_scnprintf(ins, bf, size, ops);
 }
 
-static int call__parse(struct arch *arch __maybe_unused, struct ins_operands *ops, struct map *map)
+static int call__parse(struct arch *arch, struct ins_operands *ops, struct map *map)
 {
 	char *endptr, *tok, *name;
 
@@ -90,10 +92,9 @@ static int call__parse(struct arch *arch __maybe_unused, struct ins_operands *op
 
 	name++;
 
-#ifdef __arm__
-	if (strchr(name, '+'))
+	if (arch->objdump.skip_functions_char &&
+	    strchr(name, arch->objdump.skip_functions_char))
 		return -1;
-#endif
 
 	tok = strchr(name, '>');
 	if (tok == NULL)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 03/18] perf annotate: Add per arch instructions annotate handlers
  2016-11-23 16:40 [GIT PULL 00/18] perf/core improvements and fixes Arnaldo Carvalho de Melo
  2016-11-23 16:40 ` [PATCH 01/18] perf annotate: Start supporting cross arch annotation Arnaldo Carvalho de Melo
  2016-11-23 16:40 ` [PATCH 02/18] perf annotate: Allow arches to specify functions to skip Arnaldo Carvalho de Melo
@ 2016-11-23 16:40 ` Arnaldo Carvalho de Melo
  2016-11-23 16:40 ` [PATCH 04/18] tools lib traceevent: Use USECS_PER_SEC instead of hardcoded number Arnaldo Carvalho de Melo
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-11-23 16:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Adrian Hunter,
	Alexander Shishkin, Chris Riyder, David Ahern, Jiri Olsa,
	Kim Phillips, Markus Trippelsdorf, Masami Hiramatsu,
	Namhyung Kim, Naveen N . Rao, Pawel Moll, Peter Zijlstra,
	Ravi Bangoria, Russell King, Taeung Song, Wang Nan

From: Arnaldo Carvalho de Melo <acme@redhat.com>

Another step in supporting cross annotation.

The arch specific tables are put in:

   tools/perf/arch/$ARCH/annotation/instructions.c

which, so far, just plug instructions to a bunch of parsers/formatters,
but may have more as the need arises.

This is an alternative implementation to a previous attempt made by Ravi
Bangoria.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Chris Riyder <chris.ryder@arm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-g3wt282lfa51j4qd0813e3az@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/arch/arm/annotate/instructions.c |  90 ++++++++++++++++++
 tools/perf/arch/x86/annotate/instructions.c |  78 ++++++++++++++++
 tools/perf/util/annotate.c                  | 136 ++++++----------------------
 3 files changed, 198 insertions(+), 106 deletions(-)
 create mode 100644 tools/perf/arch/arm/annotate/instructions.c
 create mode 100644 tools/perf/arch/x86/annotate/instructions.c

diff --git a/tools/perf/arch/arm/annotate/instructions.c b/tools/perf/arch/arm/annotate/instructions.c
new file mode 100644
index 000000000000..d67b8aa26274
--- /dev/null
+++ b/tools/perf/arch/arm/annotate/instructions.c
@@ -0,0 +1,90 @@
+static struct ins arm__instructions[] = {
+	{ .name = "add",	.ops = &mov_ops,  },
+	{ .name = "addl",	.ops = &mov_ops,  },
+	{ .name = "addq",	.ops = &mov_ops,  },
+	{ .name = "addw",	.ops = &mov_ops,  },
+	{ .name = "and",	.ops = &mov_ops,  },
+	{ .name = "b",		.ops = &jump_ops, }, // might also be a call
+	{ .name = "bcc",	.ops = &jump_ops, },
+	{ .name = "bcs",	.ops = &jump_ops, },
+	{ .name = "beq",	.ops = &jump_ops, },
+	{ .name = "bge",	.ops = &jump_ops, },
+	{ .name = "bgt",	.ops = &jump_ops, },
+	{ .name = "bhi",	.ops = &jump_ops, },
+	{ .name = "bl",		.ops = &call_ops, },
+	{ .name = "bls",	.ops = &jump_ops, },
+	{ .name = "blt",	.ops = &jump_ops, },
+	{ .name = "blx",	.ops = &call_ops, },
+	{ .name = "bne",	.ops = &jump_ops, },
+	{ .name = "bts",	.ops = &mov_ops,  },
+	{ .name = "call",	.ops = &call_ops, },
+	{ .name = "callq",	.ops = &call_ops, },
+	{ .name = "cmp",	.ops = &mov_ops,  },
+	{ .name = "cmpb",	.ops = &mov_ops,  },
+	{ .name = "cmpl",	.ops = &mov_ops,  },
+	{ .name = "cmpq",	.ops = &mov_ops,  },
+	{ .name = "cmpw",	.ops = &mov_ops,  },
+	{ .name = "cmpxch",	.ops = &mov_ops,  },
+	{ .name = "dec",	.ops = &dec_ops,  },
+	{ .name = "decl",	.ops = &dec_ops,  },
+	{ .name = "imul",	.ops = &mov_ops,  },
+	{ .name = "inc",	.ops = &dec_ops,  },
+	{ .name = "incl",	.ops = &dec_ops,  },
+	{ .name = "ja",		.ops = &jump_ops, },
+	{ .name = "jae",	.ops = &jump_ops, },
+	{ .name = "jb",		.ops = &jump_ops, },
+	{ .name = "jbe",	.ops = &jump_ops, },
+	{ .name = "jc",		.ops = &jump_ops, },
+	{ .name = "jcxz",	.ops = &jump_ops, },
+	{ .name = "je",		.ops = &jump_ops, },
+	{ .name = "jecxz",	.ops = &jump_ops, },
+	{ .name = "jg",		.ops = &jump_ops, },
+	{ .name = "jge",	.ops = &jump_ops, },
+	{ .name = "jl",		.ops = &jump_ops, },
+	{ .name = "jle",	.ops = &jump_ops, },
+	{ .name = "jmp",	.ops = &jump_ops, },
+	{ .name = "jmpq",	.ops = &jump_ops, },
+	{ .name = "jna",	.ops = &jump_ops, },
+	{ .name = "jnae",	.ops = &jump_ops, },
+	{ .name = "jnb",	.ops = &jump_ops, },
+	{ .name = "jnbe",	.ops = &jump_ops, },
+	{ .name = "jnc",	.ops = &jump_ops, },
+	{ .name = "jne",	.ops = &jump_ops, },
+	{ .name = "jng",	.ops = &jump_ops, },
+	{ .name = "jnge",	.ops = &jump_ops, },
+	{ .name = "jnl",	.ops = &jump_ops, },
+	{ .name = "jnle",	.ops = &jump_ops, },
+	{ .name = "jno",	.ops = &jump_ops, },
+	{ .name = "jnp",	.ops = &jump_ops, },
+	{ .name = "jns",	.ops = &jump_ops, },
+	{ .name = "jnz",	.ops = &jump_ops, },
+	{ .name = "jo",		.ops = &jump_ops, },
+	{ .name = "jp",		.ops = &jump_ops, },
+	{ .name = "jpe",	.ops = &jump_ops, },
+	{ .name = "jpo",	.ops = &jump_ops, },
+	{ .name = "jrcxz",	.ops = &jump_ops, },
+	{ .name = "js",		.ops = &jump_ops, },
+	{ .name = "jz",		.ops = &jump_ops, },
+	{ .name = "lea",	.ops = &mov_ops,  },
+	{ .name = "lock",	.ops = &lock_ops, },
+	{ .name = "mov",	.ops = &mov_ops,  },
+	{ .name = "movb",	.ops = &mov_ops,  },
+	{ .name = "movdqa",	.ops = &mov_ops,  },
+	{ .name = "movl",	.ops = &mov_ops,  },
+	{ .name = "movq",	.ops = &mov_ops,  },
+	{ .name = "movslq",	.ops = &mov_ops,  },
+	{ .name = "movzbl",	.ops = &mov_ops,  },
+	{ .name = "movzwl",	.ops = &mov_ops,  },
+	{ .name = "nop",	.ops = &nop_ops,  },
+	{ .name = "nopl",	.ops = &nop_ops,  },
+	{ .name = "nopw",	.ops = &nop_ops,  },
+	{ .name = "or",		.ops = &mov_ops,  },
+	{ .name = "orl",	.ops = &mov_ops,  },
+	{ .name = "test",	.ops = &mov_ops,  },
+	{ .name = "testb",	.ops = &mov_ops,  },
+	{ .name = "testl",	.ops = &mov_ops,  },
+	{ .name = "xadd",	.ops = &mov_ops,  },
+	{ .name = "xbeginl",	.ops = &jump_ops, },
+	{ .name = "xbeginq",	.ops = &jump_ops, },
+	{ .name = "retq",	.ops = &ret_ops,  },
+};
diff --git a/tools/perf/arch/x86/annotate/instructions.c b/tools/perf/arch/x86/annotate/instructions.c
new file mode 100644
index 000000000000..c1625f256df3
--- /dev/null
+++ b/tools/perf/arch/x86/annotate/instructions.c
@@ -0,0 +1,78 @@
+static struct ins x86__instructions[] = {
+	{ .name = "add",	.ops = &mov_ops,  },
+	{ .name = "addl",	.ops = &mov_ops,  },
+	{ .name = "addq",	.ops = &mov_ops,  },
+	{ .name = "addw",	.ops = &mov_ops,  },
+	{ .name = "and",	.ops = &mov_ops,  },
+	{ .name = "bts",	.ops = &mov_ops,  },
+	{ .name = "call",	.ops = &call_ops, },
+	{ .name = "callq",	.ops = &call_ops, },
+	{ .name = "cmp",	.ops = &mov_ops,  },
+	{ .name = "cmpb",	.ops = &mov_ops,  },
+	{ .name = "cmpl",	.ops = &mov_ops,  },
+	{ .name = "cmpq",	.ops = &mov_ops,  },
+	{ .name = "cmpw",	.ops = &mov_ops,  },
+	{ .name = "cmpxch",	.ops = &mov_ops,  },
+	{ .name = "dec",	.ops = &dec_ops,  },
+	{ .name = "decl",	.ops = &dec_ops,  },
+	{ .name = "imul",	.ops = &mov_ops,  },
+	{ .name = "inc",	.ops = &dec_ops,  },
+	{ .name = "incl",	.ops = &dec_ops,  },
+	{ .name = "ja",		.ops = &jump_ops, },
+	{ .name = "jae",	.ops = &jump_ops, },
+	{ .name = "jb",		.ops = &jump_ops, },
+	{ .name = "jbe",	.ops = &jump_ops, },
+	{ .name = "jc",		.ops = &jump_ops, },
+	{ .name = "jcxz",	.ops = &jump_ops, },
+	{ .name = "je",		.ops = &jump_ops, },
+	{ .name = "jecxz",	.ops = &jump_ops, },
+	{ .name = "jg",		.ops = &jump_ops, },
+	{ .name = "jge",	.ops = &jump_ops, },
+	{ .name = "jl",		.ops = &jump_ops, },
+	{ .name = "jle",	.ops = &jump_ops, },
+	{ .name = "jmp",	.ops = &jump_ops, },
+	{ .name = "jmpq",	.ops = &jump_ops, },
+	{ .name = "jna",	.ops = &jump_ops, },
+	{ .name = "jnae",	.ops = &jump_ops, },
+	{ .name = "jnb",	.ops = &jump_ops, },
+	{ .name = "jnbe",	.ops = &jump_ops, },
+	{ .name = "jnc",	.ops = &jump_ops, },
+	{ .name = "jne",	.ops = &jump_ops, },
+	{ .name = "jng",	.ops = &jump_ops, },
+	{ .name = "jnge",	.ops = &jump_ops, },
+	{ .name = "jnl",	.ops = &jump_ops, },
+	{ .name = "jnle",	.ops = &jump_ops, },
+	{ .name = "jno",	.ops = &jump_ops, },
+	{ .name = "jnp",	.ops = &jump_ops, },
+	{ .name = "jns",	.ops = &jump_ops, },
+	{ .name = "jnz",	.ops = &jump_ops, },
+	{ .name = "jo",		.ops = &jump_ops, },
+	{ .name = "jp",		.ops = &jump_ops, },
+	{ .name = "jpe",	.ops = &jump_ops, },
+	{ .name = "jpo",	.ops = &jump_ops, },
+	{ .name = "jrcxz",	.ops = &jump_ops, },
+	{ .name = "js",		.ops = &jump_ops, },
+	{ .name = "jz",		.ops = &jump_ops, },
+	{ .name = "lea",	.ops = &mov_ops,  },
+	{ .name = "lock",	.ops = &lock_ops, },
+	{ .name = "mov",	.ops = &mov_ops,  },
+	{ .name = "movb",	.ops = &mov_ops,  },
+	{ .name = "movdqa",	.ops = &mov_ops,  },
+	{ .name = "movl",	.ops = &mov_ops,  },
+	{ .name = "movq",	.ops = &mov_ops,  },
+	{ .name = "movslq",	.ops = &mov_ops,  },
+	{ .name = "movzbl",	.ops = &mov_ops,  },
+	{ .name = "movzwl",	.ops = &mov_ops,  },
+	{ .name = "nop",	.ops = &nop_ops,  },
+	{ .name = "nopl",	.ops = &nop_ops,  },
+	{ .name = "nopw",	.ops = &nop_ops,  },
+	{ .name = "or",		.ops = &mov_ops,  },
+	{ .name = "orl",	.ops = &mov_ops,  },
+	{ .name = "test",	.ops = &mov_ops,  },
+	{ .name = "testb",	.ops = &mov_ops,  },
+	{ .name = "testl",	.ops = &mov_ops,  },
+	{ .name = "xadd",	.ops = &mov_ops,  },
+	{ .name = "xbeginl",	.ops = &jump_ops, },
+	{ .name = "xbeginq",	.ops = &jump_ops, },
+	{ .name = "retq",	.ops = &ret_ops,  },
+};
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 72769762ece9..095d90a9077f 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -28,20 +28,36 @@ const char 	*disassembler_style;
 const char	*objdump_path;
 static regex_t	 file_lineno;
 
-static struct ins *ins__find(const char *name);
+static struct ins *ins__find(struct arch *arch, const char *name);
 static int disasm_line__parse(char *line, char **namep, char **rawp);
 
 struct arch {
 	const char	*name;
+	struct ins	*instructions;
+	size_t		nr_instructions;
+	bool		sorted_instructions;
 	struct		{
 		char comment_char;
 		char skip_functions_char;
 	} objdump;
 };
 
+static struct ins_ops call_ops;
+static struct ins_ops dec_ops;
+static struct ins_ops jump_ops;
+static struct ins_ops mov_ops;
+static struct ins_ops nop_ops;
+static struct ins_ops lock_ops;
+static struct ins_ops ret_ops;
+
+#include "arch/arm/annotate/instructions.c"
+#include "arch/x86/annotate/instructions.c"
+
 static struct arch architectures[] = {
 	{
 		.name = "arm",
+		.instructions = arm__instructions,
+		.nr_instructions = ARRAY_SIZE(arm__instructions),
 		.objdump =  {
 			.comment_char = ';',
 			.skip_functions_char = '+',
@@ -49,6 +65,8 @@ static struct arch architectures[] = {
 	},
 	{
 		.name = "x86",
+		.instructions = x86__instructions,
+		.nr_instructions = ARRAY_SIZE(x86__instructions),
 		.objdump =  {
 			.comment_char = '#',
 		},
@@ -209,7 +227,7 @@ static int lock__parse(struct arch *arch, struct ins_operands *ops, struct map *
 	if (disasm_line__parse(ops->raw, &name, &ops->locked.ops->raw) < 0)
 		goto out_free_ops;
 
-	ops->locked.ins = ins__find(name);
+	ops->locked.ins = ins__find(arch, name);
 	free(name);
 
 	if (ops->locked.ins == NULL)
@@ -385,99 +403,6 @@ bool ins__is_ret(const struct ins *ins)
 	return ins->ops == &ret_ops;
 }
 
-static struct ins instructions[] = {
-	{ .name = "add",   .ops  = &mov_ops, },
-	{ .name = "addl",  .ops  = &mov_ops, },
-	{ .name = "addq",  .ops  = &mov_ops, },
-	{ .name = "addw",  .ops  = &mov_ops, },
-	{ .name = "and",   .ops  = &mov_ops, },
-#ifdef __arm__
-	{ .name = "b",     .ops  = &jump_ops, }, // might also be a call
-	{ .name = "bcc",   .ops  = &jump_ops, },
-	{ .name = "bcs",   .ops  = &jump_ops, },
-	{ .name = "beq",   .ops  = &jump_ops, },
-	{ .name = "bge",   .ops  = &jump_ops, },
-	{ .name = "bgt",   .ops  = &jump_ops, },
-	{ .name = "bhi",   .ops  = &jump_ops, },
-	{ .name = "bl",    .ops  = &call_ops, },
-	{ .name = "bls",   .ops  = &jump_ops, },
-	{ .name = "blt",   .ops  = &jump_ops, },
-	{ .name = "blx",   .ops  = &call_ops, },
-	{ .name = "bne",   .ops  = &jump_ops, },
-#endif
-	{ .name = "bts",   .ops  = &mov_ops, },
-	{ .name = "call",  .ops  = &call_ops, },
-	{ .name = "callq", .ops  = &call_ops, },
-	{ .name = "cmp",   .ops  = &mov_ops, },
-	{ .name = "cmpb",  .ops  = &mov_ops, },
-	{ .name = "cmpl",  .ops  = &mov_ops, },
-	{ .name = "cmpq",  .ops  = &mov_ops, },
-	{ .name = "cmpw",  .ops  = &mov_ops, },
-	{ .name = "cmpxch", .ops  = &mov_ops, },
-	{ .name = "dec",   .ops  = &dec_ops, },
-	{ .name = "decl",  .ops  = &dec_ops, },
-	{ .name = "imul",  .ops  = &mov_ops, },
-	{ .name = "inc",   .ops  = &dec_ops, },
-	{ .name = "incl",  .ops  = &dec_ops, },
-	{ .name = "ja",	   .ops  = &jump_ops, },
-	{ .name = "jae",   .ops  = &jump_ops, },
-	{ .name = "jb",	   .ops  = &jump_ops, },
-	{ .name = "jbe",   .ops  = &jump_ops, },
-	{ .name = "jc",	   .ops  = &jump_ops, },
-	{ .name = "jcxz",  .ops  = &jump_ops, },
-	{ .name = "je",	   .ops  = &jump_ops, },
-	{ .name = "jecxz", .ops  = &jump_ops, },
-	{ .name = "jg",	   .ops  = &jump_ops, },
-	{ .name = "jge",   .ops  = &jump_ops, },
-	{ .name = "jl",    .ops  = &jump_ops, },
-	{ .name = "jle",   .ops  = &jump_ops, },
-	{ .name = "jmp",   .ops  = &jump_ops, },
-	{ .name = "jmpq",  .ops  = &jump_ops, },
-	{ .name = "jna",   .ops  = &jump_ops, },
-	{ .name = "jnae",  .ops  = &jump_ops, },
-	{ .name = "jnb",   .ops  = &jump_ops, },
-	{ .name = "jnbe",  .ops  = &jump_ops, },
-	{ .name = "jnc",   .ops  = &jump_ops, },
-	{ .name = "jne",   .ops  = &jump_ops, },
-	{ .name = "jng",   .ops  = &jump_ops, },
-	{ .name = "jnge",  .ops  = &jump_ops, },
-	{ .name = "jnl",   .ops  = &jump_ops, },
-	{ .name = "jnle",  .ops  = &jump_ops, },
-	{ .name = "jno",   .ops  = &jump_ops, },
-	{ .name = "jnp",   .ops  = &jump_ops, },
-	{ .name = "jns",   .ops  = &jump_ops, },
-	{ .name = "jnz",   .ops  = &jump_ops, },
-	{ .name = "jo",	   .ops  = &jump_ops, },
-	{ .name = "jp",	   .ops  = &jump_ops, },
-	{ .name = "jpe",   .ops  = &jump_ops, },
-	{ .name = "jpo",   .ops  = &jump_ops, },
-	{ .name = "jrcxz", .ops  = &jump_ops, },
-	{ .name = "js",	   .ops  = &jump_ops, },
-	{ .name = "jz",	   .ops  = &jump_ops, },
-	{ .name = "lea",   .ops  = &mov_ops, },
-	{ .name = "lock",  .ops  = &lock_ops, },
-	{ .name = "mov",   .ops  = &mov_ops, },
-	{ .name = "movb",  .ops  = &mov_ops, },
-	{ .name = "movdqa",.ops  = &mov_ops, },
-	{ .name = "movl",  .ops  = &mov_ops, },
-	{ .name = "movq",  .ops  = &mov_ops, },
-	{ .name = "movslq", .ops  = &mov_ops, },
-	{ .name = "movzbl", .ops  = &mov_ops, },
-	{ .name = "movzwl", .ops  = &mov_ops, },
-	{ .name = "nop",   .ops  = &nop_ops, },
-	{ .name = "nopl",  .ops  = &nop_ops, },
-	{ .name = "nopw",  .ops  = &nop_ops, },
-	{ .name = "or",    .ops  = &mov_ops, },
-	{ .name = "orl",   .ops  = &mov_ops, },
-	{ .name = "test",  .ops  = &mov_ops, },
-	{ .name = "testb", .ops  = &mov_ops, },
-	{ .name = "testl", .ops  = &mov_ops, },
-	{ .name = "xadd",  .ops  = &mov_ops, },
-	{ .name = "xbeginl", .ops  = &jump_ops, },
-	{ .name = "xbeginq", .ops  = &jump_ops, },
-	{ .name = "retq",  .ops  = &ret_ops, },
-};
-
 static int ins__key_cmp(const void *name, const void *insp)
 {
 	const struct ins *ins = insp;
@@ -493,24 +418,23 @@ static int ins__cmp(const void *a, const void *b)
 	return strcmp(ia->name, ib->name);
 }
 
-static void ins__sort(void)
+static void ins__sort(struct arch *arch)
 {
-	const int nmemb = ARRAY_SIZE(instructions);
+	const int nmemb = arch->nr_instructions;
 
-	qsort(instructions, nmemb, sizeof(struct ins), ins__cmp);
+	qsort(arch->instructions, nmemb, sizeof(struct ins), ins__cmp);
 }
 
-static struct ins *ins__find(const char *name)
+static struct ins *ins__find(struct arch *arch, const char *name)
 {
-	const int nmemb = ARRAY_SIZE(instructions);
-	static bool sorted;
+	const int nmemb = arch->nr_instructions;
 
-	if (!sorted) {
-		ins__sort();
-		sorted = true;
+	if (!arch->sorted_instructions) {
+		ins__sort(arch);
+		arch->sorted_instructions = true;
 	}
 
-	return bsearch(name, instructions, nmemb, sizeof(struct ins), ins__key_cmp);
+	return bsearch(name, arch->instructions, nmemb, sizeof(struct ins), ins__key_cmp);
 }
 
 static int arch__key_cmp(const void *name, const void *archp)
@@ -767,7 +691,7 @@ int hist_entry__inc_addr_samples(struct hist_entry *he, int evidx, u64 ip)
 
 static void disasm_line__init_ins(struct disasm_line *dl, struct arch *arch, struct map *map)
 {
-	dl->ins = ins__find(dl->name);
+	dl->ins = ins__find(arch, dl->name);
 
 	if (dl->ins == NULL)
 		return;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 04/18] tools lib traceevent: Use USECS_PER_SEC instead of hardcoded number
  2016-11-23 16:40 [GIT PULL 00/18] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (2 preceding siblings ...)
  2016-11-23 16:40 ` [PATCH 03/18] perf annotate: Add per arch instructions annotate handlers Arnaldo Carvalho de Melo
@ 2016-11-23 16:40 ` Arnaldo Carvalho de Melo
  2016-11-23 16:40 ` [PATCH 05/18] tools lib traceevent: Add retrieval of preempt count and latency flags Arnaldo Carvalho de Melo
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-11-23 16:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Steven Rostedt, Andrew Morton, Jiri Olsa,
	Arnaldo Carvalho de Melo

From: Steven Rostedt <rostedt@goodmis.org>

Instead of using 1000000, use the define in time64.h instead.

Also remove the the duplicate defines for NSECS_PER_SEC.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20161121114149.67111981@gandalf.local.home
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/lib/traceevent/event-parse.c | 11 ++++++-----
 tools/lib/traceevent/event-parse.h |  3 ---
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c
index 664c90c8e22b..c1ad1ffa7335 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -33,6 +33,7 @@
 #include <stdint.h>
 #include <limits.h>
 #include <linux/string.h>
+#include <linux/time64.h>
 
 #include <netinet/in.h>
 #include "event-parse.h"
@@ -5424,8 +5425,8 @@ void pevent_print_event_time(struct pevent *pevent, struct trace_seq *s,
 	use_usec_format = is_timestamp_in_us(pevent->trace_clock,
 							use_trace_clock);
 	if (use_usec_format) {
-		secs = record->ts / NSECS_PER_SEC;
-		nsecs = record->ts - secs * NSECS_PER_SEC;
+		secs = record->ts / NSEC_PER_SEC;
+		nsecs = record->ts - secs * NSEC_PER_SEC;
 	}
 
 	if (pevent->latency_format) {
@@ -5437,10 +5438,10 @@ void pevent_print_event_time(struct pevent *pevent, struct trace_seq *s,
 			usecs = nsecs;
 			p = 9;
 		} else {
-			usecs = (nsecs + 500) / NSECS_PER_USEC;
+			usecs = (nsecs + 500) / NSEC_PER_USEC;
 			/* To avoid usecs larger than 1 sec */
-			if (usecs >= 1000000) {
-				usecs -= 1000000;
+			if (usecs >= USEC_PER_SEC) {
+				usecs -= USEC_PER_SEC;
 				secs++;
 			}
 			p = 6;
diff --git a/tools/lib/traceevent/event-parse.h b/tools/lib/traceevent/event-parse.h
index 9ffde377e89d..783c842d6517 100644
--- a/tools/lib/traceevent/event-parse.h
+++ b/tools/lib/traceevent/event-parse.h
@@ -172,9 +172,6 @@ struct pevent_plugin_option {
 #define PEVENT_PLUGIN_OPTIONS_NAME MAKE_STR(PEVENT_PLUGIN_OPTIONS)
 #define PEVENT_PLUGIN_ALIAS_NAME MAKE_STR(PEVENT_PLUGIN_ALIAS)
 
-#define NSECS_PER_SEC		1000000000ULL
-#define NSECS_PER_USEC		1000ULL
-
 enum format_flags {
 	FIELD_IS_ARRAY		= 1,
 	FIELD_IS_POINTER	= 2,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 05/18] tools lib traceevent: Add retrieval of preempt count and latency flags
  2016-11-23 16:40 [GIT PULL 00/18] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (3 preceding siblings ...)
  2016-11-23 16:40 ` [PATCH 04/18] tools lib traceevent: Use USECS_PER_SEC instead of hardcoded number Arnaldo Carvalho de Melo
@ 2016-11-23 16:40 ` Arnaldo Carvalho de Melo
  2016-11-23 16:40 ` [PATCH 06/18] perf tools: Show event fd in debug output Arnaldo Carvalho de Melo
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-11-23 16:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Steven Rostedt, Andrew Morton, Jiri Olsa,
	Arnaldo Carvalho de Melo

From: Steven Rostedt <rostedt@goodmis.org>

Add a way to retrieve the preempt count as well as the latency flags from a
pevent_record.

  int pevent_data_preempt_count(pevent, record);

returns the preempt count of a record.

  int pevent_data_flags(pevent, record);

returns the latency flags for a record.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20161122113158.03a010a8@gandalf.local.home
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/lib/traceevent/event-parse.c | 30 ++++++++++++++++++++++++++++--
 tools/lib/traceevent/event-parse.h |  2 ++
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c
index c1ad1ffa7335..14a4f623c1a5 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -5192,11 +5192,11 @@ struct event_format *pevent_data_event_from_type(struct pevent *pevent, int type
 }
 
 /**
- * pevent_data_pid - parse the PID from raw data
+ * pevent_data_pid - parse the PID from record
  * @pevent: a handle to the pevent
  * @rec: the record to parse
  *
- * This returns the PID from a raw data.
+ * This returns the PID from a record.
  */
 int pevent_data_pid(struct pevent *pevent, struct pevent_record *rec)
 {
@@ -5204,6 +5204,32 @@ int pevent_data_pid(struct pevent *pevent, struct pevent_record *rec)
 }
 
 /**
+ * pevent_data_prempt_count - parse the preempt count from the record
+ * @pevent: a handle to the pevent
+ * @rec: the record to parse
+ *
+ * This returns the preempt count from a record.
+ */
+int pevent_data_prempt_count(struct pevent *pevent, struct pevent_record *rec)
+{
+	return parse_common_pc(pevent, rec->data);
+}
+
+/**
+ * pevent_data_flags - parse the latency flags from the record
+ * @pevent: a handle to the pevent
+ * @rec: the record to parse
+ *
+ * This returns the latency flags from a record.
+ *
+ *  Use trace_flag_type enum for the flags (see event-parse.h).
+ */
+int pevent_data_flags(struct pevent *pevent, struct pevent_record *rec)
+{
+	return parse_common_flags(pevent, rec->data);
+}
+
+/**
  * pevent_data_comm_from_pid - return the command line from PID
  * @pevent: a handle to the pevent
  * @pid: the PID of the task to search for
diff --git a/tools/lib/traceevent/event-parse.h b/tools/lib/traceevent/event-parse.h
index 783c842d6517..7aae746ec2fe 100644
--- a/tools/lib/traceevent/event-parse.h
+++ b/tools/lib/traceevent/event-parse.h
@@ -709,6 +709,8 @@ void pevent_data_lat_fmt(struct pevent *pevent,
 int pevent_data_type(struct pevent *pevent, struct pevent_record *rec);
 struct event_format *pevent_data_event_from_type(struct pevent *pevent, int type);
 int pevent_data_pid(struct pevent *pevent, struct pevent_record *rec);
+int pevent_data_prempt_count(struct pevent *pevent, struct pevent_record *rec);
+int pevent_data_flags(struct pevent *pevent, struct pevent_record *rec);
 const char *pevent_data_comm_from_pid(struct pevent *pevent, int pid);
 struct cmdline;
 struct cmdline *pevent_data_pid_from_comm(struct pevent *pevent, const char *comm,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 06/18] perf tools: Show event fd in debug output
  2016-11-23 16:40 [GIT PULL 00/18] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (4 preceding siblings ...)
  2016-11-23 16:40 ` [PATCH 05/18] tools lib traceevent: Add retrieval of preempt count and latency flags Arnaldo Carvalho de Melo
@ 2016-11-23 16:40 ` Arnaldo Carvalho de Melo
  2016-11-23 16:40 ` [PATCH 07/18] perf c2c report: Setup browser after opening perf.data Arnaldo Carvalho de Melo
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-11-23 16:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Jiri Olsa, Jiri Olsa, Andi Kleen, David Ahern,
	Don Zickus, Joe Mario, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@kernel.org>

It is useful for debug to see file descriptors for each event.

Before:

  $ perf stat -vvv -e cycles,cache-misses ls
  ...
  sys_perf_event_open: pid 12146  cpu -1  group_fd -1  flags 0x8
  ...
  sys_perf_event_open: pid 12146  cpu -1  group_fd 3  flags 0x8
  sys_perf_event_open failed, error -13

Now:

  $ perf stat -vvv -e cycles,cache-misses ls
  ...
  sys_perf_event_open: pid 12858  cpu -1  group_fd -1  flags 0x8 = 3
  ...
  sys_perf_event_open: pid 12858  cpu -1  group_fd 3  flags 0x8
  sys_perf_event_open failed, error -13

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1479764011-10732-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evsel.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index e58a2fbf3b16..b2365a63db45 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1481,7 +1481,7 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
 
 			group_fd = get_group_fd(evsel, cpu, thread);
 retry_open:
-			pr_debug2("sys_perf_event_open: pid %d  cpu %d  group_fd %d  flags %#lx\n",
+			pr_debug2("sys_perf_event_open: pid %d  cpu %d  group_fd %d  flags %#lx",
 				  pid, cpus->map[cpu], group_fd, flags);
 
 			FD(evsel, cpu, thread) = sys_perf_event_open(&evsel->attr,
@@ -1490,11 +1490,13 @@ static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
 								     group_fd, flags);
 			if (FD(evsel, cpu, thread) < 0) {
 				err = -errno;
-				pr_debug2("sys_perf_event_open failed, error %d\n",
+				pr_debug2("\nsys_perf_event_open failed, error %d\n",
 					  err);
 				goto try_fallback;
 			}
 
+			pr_debug2(" = %d\n", FD(evsel, cpu, thread));
+
 			if (evsel->bpf_fd >= 0) {
 				int evt_fd = FD(evsel, cpu, thread);
 				int bpf_fd = evsel->bpf_fd;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 07/18] perf c2c report: Setup browser after opening perf.data
  2016-11-23 16:40 [GIT PULL 00/18] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (5 preceding siblings ...)
  2016-11-23 16:40 ` [PATCH 06/18] perf tools: Show event fd in debug output Arnaldo Carvalho de Melo
@ 2016-11-23 16:40 ` Arnaldo Carvalho de Melo
  2016-11-23 16:40 ` [PATCH 08/18] perf c2c report: Add -f/--force option Arnaldo Carvalho de Melo
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-11-23 16:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Jiri Olsa, Jiri Olsa, Andi Kleen, David Ahern,
	Don Zickus, Joe Mario, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@kernel.org>

Because of the early browser switch we won't get possible error
messages, as it will clear the screen right after showing the message,
e.g.:

Before:

   $ sudo perf c2c report -d lcl
   $

After:
   $ sudo perf c2c report -d lcl
   File perf.data not owned by current user or root (use -f to override)
   $
   $ ls -la perf.data
   -rw-------. 1 acme acme 26648 Nov 22 15:11 perf.data
   $

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1479764011-10732-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-c2c.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index c6d0dda594d9..15addb06d611 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2535,13 +2535,6 @@ static int perf_c2c__report(int argc, const char **argv)
 	if (c2c.stats_only)
 		c2c.use_stdio = true;
 
-	if (c2c.use_stdio)
-		use_browser = 0;
-	else
-		use_browser = 1;
-
-	setup_browser(false);
-
 	if (!input_name || !strlen(input_name))
 		input_name = "perf.data";
 
@@ -2568,6 +2561,7 @@ static int perf_c2c__report(int argc, const char **argv)
 		pr_debug("No memory for session\n");
 		goto out;
 	}
+
 	err = setup_nodes(session);
 	if (err) {
 		pr_err("Failed setup nodes\n");
@@ -2587,6 +2581,13 @@ static int perf_c2c__report(int argc, const char **argv)
 		goto out_session;
 	}
 
+	if (c2c.use_stdio)
+		use_browser = 0;
+	else
+		use_browser = 1;
+
+	setup_browser(false);
+
 	err = perf_session__process_events(session);
 	if (err) {
 		pr_err("failed to process sample\n");
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 08/18] perf c2c report: Add -f/--force option
  2016-11-23 16:40 [GIT PULL 00/18] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (6 preceding siblings ...)
  2016-11-23 16:40 ` [PATCH 07/18] perf c2c report: Setup browser after opening perf.data Arnaldo Carvalho de Melo
@ 2016-11-23 16:40 ` Arnaldo Carvalho de Melo
  2016-11-23 16:40 ` [PATCH 09/18] perf c2c report: Add struct c2c_stats::tot_hitm field Arnaldo Carvalho de Melo
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-11-23 16:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Jiri Olsa, Jiri Olsa, Andi Kleen, David Ahern,
	Don Zickus, Joe Mario, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@kernel.org>

Adding -f/--force option to go through ownership validation:

  $ sudo perf c2c report
  File perf.data not owned by current user or root (use -f to override)
  $
  $ sudo perf c2c report -f
  < c2c report output >
  $

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1479764011-10732-4-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-c2c.txt | 4 ++++
 tools/perf/builtin-c2c.c              | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
index 21810d711f5f..5eda9336267e 100644
--- a/tools/perf/Documentation/perf-c2c.txt
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -100,6 +100,10 @@ REPORT OPTIONS
 --show-all::
 	Show all captured HITM lines, with no regard to HITM % 0.0005 limit.
 
+-f::
+--force::
+	Don't do ownership validation.
+
 C2C RECORD
 ----------
 The perf c2c record command setup options related to HITM cacheline analysis
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 15addb06d611..d873977b8fb6 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2523,6 +2523,7 @@ static int perf_c2c__report(int argc, const char **argv)
 	OPT_STRING('d', "display", &display, NULL, "lcl,rmt"),
 	OPT_STRING('c', "coalesce", &coalesce, "coalesce fields",
 		   "coalesce fields: pid,tid,iaddr,dso"),
+	OPT_BOOLEAN('f', "force", &symbol_conf.force, "don't complain, do it"),
 	OPT_END()
 	};
 	int err = 0;
@@ -2538,7 +2539,8 @@ static int perf_c2c__report(int argc, const char **argv)
 	if (!input_name || !strlen(input_name))
 		input_name = "perf.data";
 
-	file.path = input_name;
+	file.path  = input_name;
+	file.force = symbol_conf.force;
 
 	err = setup_display(display);
 	if (err)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 09/18] perf c2c report: Add struct c2c_stats::tot_hitm field
  2016-11-23 16:40 [GIT PULL 00/18] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (7 preceding siblings ...)
  2016-11-23 16:40 ` [PATCH 08/18] perf c2c report: Add -f/--force option Arnaldo Carvalho de Melo
@ 2016-11-23 16:40 ` Arnaldo Carvalho de Melo
  2016-11-23 16:40 ` [PATCH 10/18] perf c2c report: Display total HITMs on default Arnaldo Carvalho de Melo
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-11-23 16:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Jiri Olsa, Jiri Olsa, Andi Kleen, David Ahern,
	Don Zickus, Joe Mario, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@kernel.org>

Count total number of HITMs in a special field. This will ease up
addition of total HITM sorting into c2c report in the following patch.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1479764011-10732-5-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/mem-events.c | 12 ++++++++++--
 tools/perf/util/mem-events.h |  1 +
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/mem-events.c b/tools/perf/util/mem-events.c
index e50773286ef6..1d4ab53c60ca 100644
--- a/tools/perf/util/mem-events.c
+++ b/tools/perf/util/mem-events.c
@@ -280,6 +280,12 @@ int c2c_decode_stats(struct c2c_stats *stats, struct mem_info *mi)
 	u64 lock   = data_src->mem_lock;
 	int err = 0;
 
+#define HITM_INC(__f)		\
+do {				\
+	stats->__f++;		\
+	stats->tot_hitm++;	\
+} while (0)
+
 #define P(a, b) PERF_MEM_##a##_##b
 
 	stats->nr_entries++;
@@ -303,7 +309,7 @@ int c2c_decode_stats(struct c2c_stats *stats, struct mem_info *mi)
 			if (lvl & P(LVL, L2 )) stats->ld_l2hit++;
 			if (lvl & P(LVL, L3 )) {
 				if (snoop & P(SNOOP, HITM))
-					stats->lcl_hitm++;
+					HITM_INC(lcl_hitm);
 				else
 					stats->ld_llchit++;
 			}
@@ -331,7 +337,7 @@ int c2c_decode_stats(struct c2c_stats *stats, struct mem_info *mi)
 			if (snoop & P(SNOOP, HIT))
 				stats->rmt_hit++;
 			else if (snoop & P(SNOOP, HITM))
-				stats->rmt_hitm++;
+				HITM_INC(rmt_hitm);
 		}
 
 		if ((lvl & P(LVL, MISS)))
@@ -364,6 +370,7 @@ int c2c_decode_stats(struct c2c_stats *stats, struct mem_info *mi)
 	}
 
 #undef P
+#undef HITM_INC
 	return err;
 }
 
@@ -390,6 +397,7 @@ void c2c_add_stats(struct c2c_stats *stats, struct c2c_stats *add)
 	stats->ld_llchit	+= add->ld_llchit;
 	stats->lcl_hitm		+= add->lcl_hitm;
 	stats->rmt_hitm		+= add->rmt_hitm;
+	stats->tot_hitm		+= add->tot_hitm;
 	stats->rmt_hit		+= add->rmt_hit;
 	stats->lcl_dram		+= add->lcl_dram;
 	stats->rmt_dram		+= add->rmt_dram;
diff --git a/tools/perf/util/mem-events.h b/tools/perf/util/mem-events.h
index faf80403b519..40f72ee4f42a 100644
--- a/tools/perf/util/mem-events.h
+++ b/tools/perf/util/mem-events.h
@@ -59,6 +59,7 @@ struct c2c_stats {
 	u32	ld_llchit;           /* count of loads that hit LLC */
 	u32	lcl_hitm;            /* count of loads with local HITM  */
 	u32	rmt_hitm;            /* count of loads with remote HITM */
+	u32	tot_hitm;            /* count of loads with local and remote HITM */
 	u32	rmt_hit;             /* count of loads with remote hit clean; */
 	u32	lcl_dram;            /* count of loads miss to local DRAM */
 	u32	rmt_dram;            /* count of loads miss to remote DRAM */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 10/18] perf c2c report: Display total HITMs on default
  2016-11-23 16:40 [GIT PULL 00/18] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (8 preceding siblings ...)
  2016-11-23 16:40 ` [PATCH 09/18] perf c2c report: Add struct c2c_stats::tot_hitm field Arnaldo Carvalho de Melo
@ 2016-11-23 16:40 ` Arnaldo Carvalho de Melo
  2016-11-23 16:40 ` [PATCH 11/18] perf c2c: Support cascading options Arnaldo Carvalho de Melo
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-11-23 16:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Jiri Olsa, Jiri Olsa, Andi Kleen, David Ahern,
	Don Zickus, Joe Mario, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@kernel.org>

Currently we display the cacheline list sorted on remote HITMs by
default.

The problem is that they might not be always counted and 'perf c2c
report' displays an empty output. Thus it's more convenient to display
and sort the cacheline list based on the total of HITMs and have the
best change to see data in the default report run.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1479764011-10732-6-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-c2c.txt |  4 ++++
 tools/perf/builtin-c2c.c              | 39 ++++++++++++++++++++++++++++-------
 2 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
index 5eda9336267e..3f06730c7f47 100644
--- a/tools/perf/Documentation/perf-c2c.txt
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -104,6 +104,10 @@ REPORT OPTIONS
 --force::
 	Don't do ownership validation.
 
+-d::
+--display::
+	Siwtch to HITM type (rmt, lcl) to display and sort on. Total HITMs as default.
+
 C2C RECORD
 ----------
 The perf c2c record command setup options related to HITM cacheline analysis
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index d873977b8fb6..54924717ae8e 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -91,6 +91,14 @@ struct perf_c2c {
 enum {
 	DISPLAY_LCL,
 	DISPLAY_RMT,
+	DISPLAY_TOT,
+	DISPLAY_MAX,
+};
+
+static const char *display_str[DISPLAY_MAX] = {
+	[DISPLAY_LCL] = "Local",
+	[DISPLAY_RMT] = "Remote",
+	[DISPLAY_TOT] = "Total",
 };
 
 static struct perf_c2c c2c;
@@ -745,6 +753,10 @@ static double percent_hitm(struct c2c_hist_entry *c2c_he)
 	case DISPLAY_LCL:
 		st  = stats->lcl_hitm;
 		tot = total->lcl_hitm;
+		break;
+	case DISPLAY_TOT:
+		st  = stats->tot_hitm;
+		tot = total->tot_hitm;
 	default:
 		break;
 	}
@@ -1044,6 +1056,9 @@ node_entry(struct perf_hpp_fmt *fmt __maybe_unused, struct perf_hpp *hpp,
 				break;
 			case DISPLAY_LCL:
 				DISPLAY_HITM(lcl_hitm);
+				break;
+			case DISPLAY_TOT:
+				DISPLAY_HITM(tot_hitm);
 			default:
 				break;
 			}
@@ -1351,6 +1366,7 @@ static struct c2c_dimension dim_tot_loads = {
 static struct c2c_header percent_hitm_header[] = {
 	[DISPLAY_LCL] = HEADER_BOTH("Lcl", "Hitm"),
 	[DISPLAY_RMT] = HEADER_BOTH("Rmt", "Hitm"),
+	[DISPLAY_TOT] = HEADER_BOTH("Tot", "Hitm"),
 };
 
 static struct c2c_dimension dim_percent_hitm = {
@@ -1794,6 +1810,9 @@ static bool he__display(struct hist_entry *he, struct c2c_stats *stats)
 		break;
 	case DISPLAY_RMT:
 		FILTER_HITM(rmt_hitm);
+		break;
+	case DISPLAY_TOT:
+		FILTER_HITM(tot_hitm);
 	default:
 		break;
 	};
@@ -1809,8 +1828,9 @@ static inline int valid_hitm_or_store(struct hist_entry *he)
 	bool has_hitm;
 
 	c2c_he = container_of(he, struct c2c_hist_entry, he);
-	has_hitm = c2c.display == DISPLAY_LCL ?
-		   c2c_he->stats.lcl_hitm : c2c_he->stats.rmt_hitm;
+	has_hitm = c2c.display == DISPLAY_TOT ? c2c_he->stats.tot_hitm :
+		   c2c.display == DISPLAY_LCL ? c2c_he->stats.lcl_hitm :
+						c2c_he->stats.rmt_hitm;
 	return has_hitm || c2c_he->stats.store;
 }
 
@@ -2095,7 +2115,7 @@ static void print_c2c_info(FILE *out, struct perf_session *session)
 		first = false;
 	}
 	fprintf(out, "  Cachelines sort on                : %s HITMs\n",
-		c2c.display == DISPLAY_LCL ? "Local" : "Remote");
+		display_str[c2c.display]);
 	fprintf(out, "  Cacheline data grouping           : %s\n", c2c.cl_sort);
 }
 
@@ -2250,7 +2270,7 @@ static int perf_c2c_browser__title(struct hist_browser *browser,
 		  "Shared Data Cache Line Table     "
 		  "(%lu entries, sorted on %s HITMs)",
 		  browser->nr_non_filtered_entries,
-		  c2c.display == DISPLAY_LCL ? "local" : "remote");
+		  display_str[c2c.display]);
 	return 0;
 }
 
@@ -2387,9 +2407,11 @@ static int setup_callchain(struct perf_evlist *evlist)
 
 static int setup_display(const char *str)
 {
-	const char *display = str ?: "rmt";
+	const char *display = str ?: "tot";
 
-	if (!strcmp(display, "rmt"))
+	if (!strcmp(display, "tot"))
+		c2c.display = DISPLAY_TOT;
+	else if (!strcmp(display, "rmt"))
 		c2c.display = DISPLAY_RMT;
 	else if (!strcmp(display, "lcl"))
 		c2c.display = DISPLAY_LCL;
@@ -2474,6 +2496,8 @@ static int setup_coalesce(const char *coalesce, bool no_source)
 		return -1;
 
 	if (asprintf(&c2c.cl_resort, "offset,%s",
+		     c2c.display == DISPLAY_TOT ?
+		     "tot_hitm" :
 		     c2c.display == DISPLAY_RMT ?
 		     "rmt_hitm,lcl_hitm" :
 		     "lcl_hitm,rmt_hitm") < 0)
@@ -2520,7 +2544,7 @@ static int perf_c2c__report(int argc, const char **argv)
 			     "print_type,threshold[,print_limit],order,sort_key[,branch],value",
 			     callchain_help, &parse_callchain_opt,
 			     callchain_default_opt),
-	OPT_STRING('d', "display", &display, NULL, "lcl,rmt"),
+	OPT_STRING('d', "display", &display, "Switch HITM output type", "lcl,rmt"),
 	OPT_STRING('c', "coalesce", &coalesce, "coalesce fields",
 		   "coalesce fields: pid,tid,iaddr,dso"),
 	OPT_BOOLEAN('f', "force", &symbol_conf.force, "don't complain, do it"),
@@ -2608,6 +2632,7 @@ static int perf_c2c__report(int argc, const char **argv)
 			"tot_loads,"
 			"ld_fbhit,ld_l1hit,ld_l2hit,"
 			"ld_lclhit,ld_rmthit",
+			c2c.display == DISPLAY_TOT ? "tot_hitm" :
 			c2c.display == DISPLAY_LCL ? "lcl_hitm" : "rmt_hitm"
 			);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 11/18] perf c2c: Support cascading options
  2016-11-23 16:40 [GIT PULL 00/18] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (9 preceding siblings ...)
  2016-11-23 16:40 ` [PATCH 10/18] perf c2c report: Display total HITMs on default Arnaldo Carvalho de Melo
@ 2016-11-23 16:40 ` Arnaldo Carvalho de Melo
  2016-11-23 16:40 ` [PATCH 12/18] perf symbols: Print symbol offsets conditionally Arnaldo Carvalho de Melo
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-11-23 16:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Jiri Olsa, Jiri Olsa, Andi Kleen, David Ahern,
	Don Zickus, Joe Mario, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@kernel.org>

Adding support for cascading options added by Namhyung in:

  commit 369a2478973a ("tools lib subcmd: Support cascading options")

This way the report and record command share options with with c2c
command and can save some option duplicates. For now it's the 'v'
option.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1479764011-10732-7-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-c2c.c | 22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 54924717ae8e..4b419631753d 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -101,6 +101,11 @@ static const char *display_str[DISPLAY_MAX] = {
 	[DISPLAY_TOT] = "Total",
 };
 
+static const struct option c2c_options[] = {
+	OPT_INCR('v', "verbose", &verbose, "be more verbose (show counter open errors, etc)"),
+	OPT_END()
+};
+
 static struct perf_c2c c2c;
 
 static void *c2c_he_zalloc(size_t size)
@@ -2520,11 +2525,9 @@ static int perf_c2c__report(int argc, const char **argv)
 	const char *display = NULL;
 	const char *coalesce = NULL;
 	bool no_source = false;
-	const struct option c2c_options[] = {
+	const struct option options[] = {
 	OPT_STRING('k', "vmlinux", &symbol_conf.vmlinux_name,
 		   "file", "vmlinux pathname"),
-	OPT_INCR('v', "verbose", &verbose,
-		 "be more verbose (show counter open errors, etc)"),
 	OPT_STRING('i', "input", &input_name, "file",
 		   "the input file to process"),
 	OPT_INCR('N', "node-info", &c2c.node_info,
@@ -2548,14 +2551,15 @@ static int perf_c2c__report(int argc, const char **argv)
 	OPT_STRING('c', "coalesce", &coalesce, "coalesce fields",
 		   "coalesce fields: pid,tid,iaddr,dso"),
 	OPT_BOOLEAN('f', "force", &symbol_conf.force, "don't complain, do it"),
+	OPT_PARENT(c2c_options),
 	OPT_END()
 	};
 	int err = 0;
 
-	argc = parse_options(argc, argv, c2c_options, report_c2c_usage,
+	argc = parse_options(argc, argv, options, report_c2c_usage,
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 	if (argc)
-		usage_with_options(report_c2c_usage, c2c_options);
+		usage_with_options(report_c2c_usage, options);
 
 	if (c2c.stats_only)
 		c2c.use_stdio = true;
@@ -2683,11 +2687,10 @@ static int perf_c2c__record(int argc, const char **argv)
 	OPT_CALLBACK('e', "event", &event_set, "event",
 		     "event selector. Use 'perf mem record -e list' to list available events",
 		     parse_record_events),
-	OPT_INCR('v', "verbose", &verbose,
-		 "be more verbose (show counter open errors, etc)"),
 	OPT_BOOLEAN('u', "all-user", &all_user, "collect only user level data"),
 	OPT_BOOLEAN('k', "all-kernel", &all_kernel, "collect only kernel level data"),
 	OPT_UINTEGER('l', "ldlat", &perf_mem_events__loads_ldlat, "setup mem-loads latency"),
+	OPT_PARENT(c2c_options),
 	OPT_END()
 	};
 
@@ -2759,11 +2762,6 @@ static int perf_c2c__record(int argc, const char **argv)
 
 int cmd_c2c(int argc, const char **argv, const char *prefix __maybe_unused)
 {
-	const struct option c2c_options[] = {
-	OPT_INCR('v', "verbose", &verbose, "be more verbose"),
-	OPT_END()
-	};
-
 	argc = parse_options(argc, argv, c2c_options, c2c_usage,
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 12/18] perf symbols: Print symbol offsets conditionally
  2016-11-23 16:40 [GIT PULL 00/18] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (10 preceding siblings ...)
  2016-11-23 16:40 ` [PATCH 11/18] perf c2c: Support cascading options Arnaldo Carvalho de Melo
@ 2016-11-23 16:40 ` Arnaldo Carvalho de Melo
  2016-11-23 16:40 ` [PATCH 13/18] perf evsel: Support printing callchains with arrows Arnaldo Carvalho de Melo
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-11-23 16:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Namhyung Kim, Andi Kleen, David Ahern,
	Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo

From: Namhyung Kim <namhyung@kernel.org>

The __symbol__fprintf_symname_offs() always shows symbol offsets.  So
there's no difference between 'perf script -F ip,sym' and 'perf script
-F ip,sym,symoff'.  I don't think it's a desired behavior..

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20161116060634.28477-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evsel_fprintf.c  |  6 ++++--
 tools/perf/util/symbol.h         |  3 ++-
 tools/perf/util/symbol_fprintf.c | 11 ++++++-----
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/evsel_fprintf.c b/tools/perf/util/evsel_fprintf.c
index 662a0a6182e7..ccb602397b60 100644
--- a/tools/perf/util/evsel_fprintf.c
+++ b/tools/perf/util/evsel_fprintf.c
@@ -137,7 +137,8 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
 
 				if (print_symoffset) {
 					printed += __symbol__fprintf_symname_offs(node->sym, &node_al,
-										  print_unknown_as_addr, fp);
+										  print_unknown_as_addr,
+										  true, fp);
 				} else {
 					printed += __symbol__fprintf_symname(node->sym, &node_al,
 									     print_unknown_as_addr, fp);
@@ -188,7 +189,8 @@ int sample__fprintf_sym(struct perf_sample *sample, struct addr_location *al,
 			printed += fprintf(fp, " ");
 			if (print_symoffset) {
 				printed += __symbol__fprintf_symname_offs(al->sym, al,
-									  print_unknown_as_addr, fp);
+									  print_unknown_as_addr,
+									  true, fp);
 			} else {
 				printed += __symbol__fprintf_symname(al->sym, al,
 								     print_unknown_as_addr, fp);
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 2d0a905c879a..dec7e2d44885 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -282,7 +282,8 @@ int symbol__annotation_init(void);
 struct symbol *symbol__new(u64 start, u64 len, u8 binding, const char *name);
 size_t __symbol__fprintf_symname_offs(const struct symbol *sym,
 				      const struct addr_location *al,
-				      bool unknown_as_addr, FILE *fp);
+				      bool unknown_as_addr,
+				      bool print_offsets, FILE *fp);
 size_t symbol__fprintf_symname_offs(const struct symbol *sym,
 				    const struct addr_location *al, FILE *fp);
 size_t __symbol__fprintf_symname(const struct symbol *sym,
diff --git a/tools/perf/util/symbol_fprintf.c b/tools/perf/util/symbol_fprintf.c
index a680bdaa65dc..7c6b33e8e2d2 100644
--- a/tools/perf/util/symbol_fprintf.c
+++ b/tools/perf/util/symbol_fprintf.c
@@ -15,14 +15,15 @@ size_t symbol__fprintf(struct symbol *sym, FILE *fp)
 
 size_t __symbol__fprintf_symname_offs(const struct symbol *sym,
 				      const struct addr_location *al,
-				      bool unknown_as_addr, FILE *fp)
+				      bool unknown_as_addr,
+				      bool print_offsets, FILE *fp)
 {
 	unsigned long offset;
 	size_t length;
 
 	if (sym && sym->name) {
 		length = fprintf(fp, "%s", sym->name);
-		if (al) {
+		if (al && print_offsets) {
 			if (al->addr < sym->end)
 				offset = al->addr - sym->start;
 			else
@@ -40,19 +41,19 @@ size_t symbol__fprintf_symname_offs(const struct symbol *sym,
 				    const struct addr_location *al,
 				    FILE *fp)
 {
-	return __symbol__fprintf_symname_offs(sym, al, false, fp);
+	return __symbol__fprintf_symname_offs(sym, al, false, true, fp);
 }
 
 size_t __symbol__fprintf_symname(const struct symbol *sym,
 				 const struct addr_location *al,
 				 bool unknown_as_addr, FILE *fp)
 {
-	return __symbol__fprintf_symname_offs(sym, al, unknown_as_addr, fp);
+	return __symbol__fprintf_symname_offs(sym, al, unknown_as_addr, false, fp);
 }
 
 size_t symbol__fprintf_symname(const struct symbol *sym, FILE *fp)
 {
-	return __symbol__fprintf_symname_offs(sym, NULL, false, fp);
+	return __symbol__fprintf_symname_offs(sym, NULL, false, false, fp);
 }
 
 size_t dso__fprintf_symbols_by_name(struct dso *dso,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 13/18] perf evsel: Support printing callchains with arrows
  2016-11-23 16:40 [GIT PULL 00/18] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (11 preceding siblings ...)
  2016-11-23 16:40 ` [PATCH 12/18] perf symbols: Print symbol offsets conditionally Arnaldo Carvalho de Melo
@ 2016-11-23 16:40 ` Arnaldo Carvalho de Melo
  2016-11-23 16:40 ` [PATCH 14/18] perf sched timehist: Introduce timehist command Arnaldo Carvalho de Melo
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-11-23 16:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Namhyung Kim, Andi Kleen, David Ahern,
	Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo

From: Namhyung Kim <namhyung@kernel.org>

The EVSEL__PRINT_CALLCHAIN_ARROW options can be used to print callchains
with arrows for readability.  It will be used 'sched timehist' command
like below:

    __schedule <- schedule <- schedule_timeout <- rcu_gp_kthread <- kthread <- ret_from_fork
    __schedule <- schedule <- schedule_timeout <- rcu_gp_kthread <- kthread <- ret_from_fork
    __schedule <- schedule <- worker_thread <- kthread <- ret_from_fork

Suggested-and-Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20161116060634.28477-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evsel.h         | 1 +
 tools/perf/util/evsel_fprintf.c | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 8cd7cd227483..27fa3a343577 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -391,6 +391,7 @@ int perf_evsel__fprintf(struct perf_evsel *evsel,
 #define EVSEL__PRINT_ONELINE		(1<<4)
 #define EVSEL__PRINT_SRCLINE		(1<<5)
 #define EVSEL__PRINT_UNKNOWN_AS_ADDR	(1<<6)
+#define EVSEL__PRINT_CALLCHAIN_ARROW	(1<<7)
 
 struct callchain_cursor;
 
diff --git a/tools/perf/util/evsel_fprintf.c b/tools/perf/util/evsel_fprintf.c
index ccb602397b60..53bb614feafb 100644
--- a/tools/perf/util/evsel_fprintf.c
+++ b/tools/perf/util/evsel_fprintf.c
@@ -108,7 +108,9 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
 	int print_oneline = print_opts & EVSEL__PRINT_ONELINE;
 	int print_srcline = print_opts & EVSEL__PRINT_SRCLINE;
 	int print_unknown_as_addr = print_opts & EVSEL__PRINT_UNKNOWN_AS_ADDR;
+	int print_arrow = print_opts & EVSEL__PRINT_CALLCHAIN_ARROW;
 	char s = print_oneline ? ' ' : '\t';
+	bool first = true;
 
 	if (sample->callchain) {
 		struct addr_location node_al;
@@ -124,6 +126,9 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
 
 			printed += fprintf(fp, "%-*.*s", left_alignment, left_alignment, " ");
 
+			if (print_arrow && !first)
+				printed += fprintf(fp, " <-");
+
 			if (print_ip)
 				printed += fprintf(fp, "%c%16" PRIx64, s, node->ip);
 
@@ -158,6 +163,7 @@ int sample__fprintf_callchain(struct perf_sample *sample, int left_alignment,
 				printed += fprintf(fp, "\n");
 
 			callchain_cursor_advance(cursor);
+			first = false;
 		}
 	}
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 14/18] perf sched timehist: Introduce timehist command
  2016-11-23 16:40 [GIT PULL 00/18] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (12 preceding siblings ...)
  2016-11-23 16:40 ` [PATCH 13/18] perf evsel: Support printing callchains with arrows Arnaldo Carvalho de Melo
@ 2016-11-23 16:40 ` Arnaldo Carvalho de Melo
  2016-11-23 16:40 ` [PATCH 15/18] perf sched timehist: Add summary options Arnaldo Carvalho de Melo
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-11-23 16:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, David Ahern, Namhyung Kim, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo

From: David Ahern <dsahern@gmail.com>

'perf sched timehist' provides an analysis of scheduling events.

Example usage:
    perf sched record -- sleep 1
    perf sched timehist

By default it shows the individual schedule events, including the wait
time (time between sched-out and next sched-in events for the task), the
task scheduling delay (time between wakeup and actually running) and run
time for the task:

            time    cpu  task name             wait time  sch delay   run time
                         [tid/pid]                (msec)     (msec)     (msec)
  -------------- ------  --------------------  ---------  ---------  ---------
    79371.874569 [0011]  gcc[31949]                0.014      0.000      1.148
    79371.874591 [0010]  gcc[31951]                0.000      0.000      0.024
    79371.874603 [0010]  migration/10[59]          3.350      0.004      0.011
    79371.874604 [0011]  <idle>                    1.148      0.000      0.035
    79371.874723 [0005]  <idle>                    0.016      0.000      1.383
    79371.874746 [0005]  gcc[31949]                0.153      0.078      0.022
...

Times are in msec.usec.

Committer note:

Add above explanation as the 'perf sched timehist' entry for 'man
perf-sched'.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20161116060634.28477-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-sched.txt |  50 ++-
 tools/perf/builtin-sched.c              | 594 +++++++++++++++++++++++++++++++-
 2 files changed, 637 insertions(+), 7 deletions(-)

diff --git a/tools/perf/Documentation/perf-sched.txt b/tools/perf/Documentation/perf-sched.txt
index 1cc08cc47ac5..72730da307b9 100644
--- a/tools/perf/Documentation/perf-sched.txt
+++ b/tools/perf/Documentation/perf-sched.txt
@@ -8,11 +8,11 @@ perf-sched - Tool to trace/measure scheduler properties (latencies)
 SYNOPSIS
 --------
 [verse]
-'perf sched' {record|latency|map|replay|script}
+'perf sched' {record|latency|map|replay|script|timehist}
 
 DESCRIPTION
 -----------
-There are five variants of perf sched:
+There are several variants of 'perf sched':
 
   'perf sched record <command>' to record the scheduling events
   of an arbitrary workload.
@@ -36,6 +36,30 @@ There are five variants of perf sched:
   are running on a CPU. A '*' denotes the CPU that had the event, and
   a dot signals an idle CPU.
 
+  'perf sched timehist' provides an analysis of scheduling events.
+    
+    Example usage:
+        perf sched record -- sleep 1
+        perf sched timehist
+    
+   By default it shows the individual schedule events, including the wait
+   time (time between sched-out and next sched-in events for the task), the
+   task scheduling delay (time between wakeup and actually running) and run
+   time for the task:
+    
+                time    cpu  task name             wait time  sch delay   run time
+                             [tid/pid]                (msec)     (msec)     (msec)
+      -------------- ------  --------------------  ---------  ---------  ---------
+        79371.874569 [0011]  gcc[31949]                0.014      0.000      1.148
+        79371.874591 [0010]  gcc[31951]                0.000      0.000      0.024
+        79371.874603 [0010]  migration/10[59]          3.350      0.004      0.011
+        79371.874604 [0011]  <idle>                    1.148      0.000      0.035
+        79371.874723 [0005]  <idle>                    0.016      0.000      1.383
+        79371.874746 [0005]  gcc[31949]                0.153      0.078      0.022
+    ...
+    
+   Times are in msec.usec.
+
 OPTIONS
 -------
 -i::
@@ -66,6 +90,28 @@ OPTIONS for 'perf sched map'
 --color-pids::
 	Highlight the given pids.
 
+OPTIONS for 'perf sched timehist'
+---------------------------------
+-k::
+--vmlinux=<file>::
+    vmlinux pathname
+
+--kallsyms=<file>::
+    kallsyms pathname
+
+-s::
+--summary::
+    Show only a summary of scheduling by thread with min, max, and average
+    run times (in sec) and relative stddev.
+
+-S::
+--with-summary::
+    Show all scheduling events followed by a summary by thread with min,
+    max, and average run times (in sec) and relative stddev.
+
+--symfs=<directory>::
+    Look for files with symbols relative to this directory.
+
 SEE ALSO
 --------
 linkperf:perf-record[1]
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index fb3441211e4b..c0ac0c9557e8 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -13,12 +13,14 @@
 #include "util/cloexec.h"
 #include "util/thread_map.h"
 #include "util/color.h"
+#include "util/stat.h"
 
 #include <subcmd/parse-options.h>
 #include "util/trace-event.h"
 
 #include "util/debug.h"
 
+#include <linux/log2.h>
 #include <sys/prctl.h>
 #include <sys/resource.h>
 
@@ -194,6 +196,29 @@ struct perf_sched {
 	struct perf_sched_map map;
 };
 
+/* per thread run time data */
+struct thread_runtime {
+	u64 last_time;      /* time of previous sched in/out event */
+	u64 dt_run;         /* run time */
+	u64 dt_wait;        /* time between CPU access (off cpu) */
+	u64 dt_delay;       /* time between wakeup and sched-in */
+	u64 ready_to_run;   /* time of wakeup */
+
+	struct stats run_stats;
+	u64 total_run_time;
+};
+
+/* per event run time data */
+struct evsel_runtime {
+	u64 *last_time; /* time this event was last seen per cpu */
+	u32 ncpu;       /* highest cpu slot allocated */
+};
+
+/* track idle times per cpu */
+static struct thread **idle_threads;
+static int idle_max_cpu;
+static char idle_comm[] = "<idle>";
+
 static u64 get_nsecs(void)
 {
 	struct timespec ts;
@@ -1654,6 +1679,546 @@ static int perf_sched__read_events(struct perf_sched *sched)
 	return rc;
 }
 
+/*
+ * scheduling times are printed as msec.usec
+ */
+static inline void print_sched_time(unsigned long long nsecs, int width)
+{
+	unsigned long msecs;
+	unsigned long usecs;
+
+	msecs  = nsecs / NSEC_PER_MSEC;
+	nsecs -= msecs * NSEC_PER_MSEC;
+	usecs  = nsecs / NSEC_PER_USEC;
+	printf("%*lu.%03lu ", width, msecs, usecs);
+}
+
+/*
+ * returns runtime data for event, allocating memory for it the
+ * first time it is used.
+ */
+static struct evsel_runtime *perf_evsel__get_runtime(struct perf_evsel *evsel)
+{
+	struct evsel_runtime *r = evsel->priv;
+
+	if (r == NULL) {
+		r = zalloc(sizeof(struct evsel_runtime));
+		evsel->priv = r;
+	}
+
+	return r;
+}
+
+/*
+ * save last time event was seen per cpu
+ */
+static void perf_evsel__save_time(struct perf_evsel *evsel,
+				  u64 timestamp, u32 cpu)
+{
+	struct evsel_runtime *r = perf_evsel__get_runtime(evsel);
+
+	if (r == NULL)
+		return;
+
+	if ((cpu >= r->ncpu) || (r->last_time == NULL)) {
+		int i, n = __roundup_pow_of_two(cpu+1);
+		void *p = r->last_time;
+
+		p = realloc(r->last_time, n * sizeof(u64));
+		if (!p)
+			return;
+
+		r->last_time = p;
+		for (i = r->ncpu; i < n; ++i)
+			r->last_time[i] = (u64) 0;
+
+		r->ncpu = n;
+	}
+
+	r->last_time[cpu] = timestamp;
+}
+
+/* returns last time this event was seen on the given cpu */
+static u64 perf_evsel__get_time(struct perf_evsel *evsel, u32 cpu)
+{
+	struct evsel_runtime *r = perf_evsel__get_runtime(evsel);
+
+	if ((r == NULL) || (r->last_time == NULL) || (cpu >= r->ncpu))
+		return 0;
+
+	return r->last_time[cpu];
+}
+
+static int comm_width = 20;
+
+static char *timehist_get_commstr(struct thread *thread)
+{
+	static char str[32];
+	const char *comm = thread__comm_str(thread);
+	pid_t tid = thread->tid;
+	pid_t pid = thread->pid_;
+	int n;
+
+	if (pid == 0)
+		n = scnprintf(str, sizeof(str), "%s", comm);
+
+	else if (tid != pid)
+		n = scnprintf(str, sizeof(str), "%s[%d/%d]", comm, tid, pid);
+
+	else
+		n = scnprintf(str, sizeof(str), "%s[%d]", comm, tid);
+
+	if (n > comm_width)
+		comm_width = n;
+
+	return str;
+}
+
+static void timehist_header(void)
+{
+	printf("%15s %6s ", "time", "cpu");
+
+	printf(" %-20s  %9s  %9s  %9s",
+		"task name", "wait time", "sch delay", "run time");
+
+	printf("\n");
+
+	/*
+	 * units row
+	 */
+	printf("%15s %-6s ", "", "");
+
+	printf(" %-20s  %9s  %9s  %9s\n", "[tid/pid]", "(msec)", "(msec)", "(msec)");
+
+	/*
+	 * separator
+	 */
+	printf("%.15s %.6s ", graph_dotted_line, graph_dotted_line);
+
+	printf(" %.20s  %.9s  %.9s  %.9s",
+		graph_dotted_line, graph_dotted_line, graph_dotted_line,
+		graph_dotted_line);
+
+	printf("\n");
+}
+
+static void timehist_print_sample(struct perf_sample *sample,
+				  struct thread *thread)
+{
+	struct thread_runtime *tr = thread__priv(thread);
+	char tstr[64];
+
+	timestamp__scnprintf_usec(sample->time, tstr, sizeof(tstr));
+	printf("%15s [%04d] ", tstr, sample->cpu);
+
+	printf(" %-*s ", comm_width, timehist_get_commstr(thread));
+
+	print_sched_time(tr->dt_wait, 6);
+	print_sched_time(tr->dt_delay, 6);
+	print_sched_time(tr->dt_run, 6);
+	printf("\n");
+}
+
+/*
+ * Explanation of delta-time stats:
+ *
+ *            t = time of current schedule out event
+ *        tprev = time of previous sched out event
+ *                also time of schedule-in event for current task
+ *    last_time = time of last sched change event for current task
+ *                (i.e, time process was last scheduled out)
+ * ready_to_run = time of wakeup for current task
+ *
+ * -----|------------|------------|------------|------
+ *    last         ready        tprev          t
+ *    time         to run
+ *
+ *      |-------- dt_wait --------|
+ *                   |- dt_delay -|-- dt_run --|
+ *
+ *   dt_run = run time of current task
+ *  dt_wait = time between last schedule out event for task and tprev
+ *            represents time spent off the cpu
+ * dt_delay = time between wakeup and schedule-in of task
+ */
+
+static void timehist_update_runtime_stats(struct thread_runtime *r,
+					 u64 t, u64 tprev)
+{
+	r->dt_delay   = 0;
+	r->dt_wait    = 0;
+	r->dt_run     = 0;
+	if (tprev) {
+		r->dt_run = t - tprev;
+		if (r->ready_to_run) {
+			if (r->ready_to_run > tprev)
+				pr_debug("time travel: wakeup time for task > previous sched_switch event\n");
+			else
+				r->dt_delay = tprev - r->ready_to_run;
+		}
+
+		if (r->last_time > tprev)
+			pr_debug("time travel: last sched out time for task > previous sched_switch event\n");
+		else if (r->last_time)
+			r->dt_wait = tprev - r->last_time;
+	}
+
+	update_stats(&r->run_stats, r->dt_run);
+	r->total_run_time += r->dt_run;
+}
+
+static bool is_idle_sample(struct perf_sample *sample,
+			   struct perf_evsel *evsel)
+{
+	/* pid 0 == swapper == idle task */
+	if (sample->pid == 0)
+		return true;
+
+	if (strcmp(perf_evsel__name(evsel), "sched:sched_switch") == 0) {
+		if (perf_evsel__intval(evsel, sample, "prev_pid") == 0)
+			return true;
+	}
+	return false;
+}
+
+/*
+ * Track idle stats per cpu by maintaining a local thread
+ * struct for the idle task on each cpu.
+ */
+static int init_idle_threads(int ncpu)
+{
+	int i;
+
+	idle_threads = zalloc(ncpu * sizeof(struct thread *));
+	if (!idle_threads)
+		return -ENOMEM;
+
+	idle_max_cpu = ncpu - 1;
+
+	/* allocate the actual thread struct if needed */
+	for (i = 0; i < ncpu; ++i) {
+		idle_threads[i] = thread__new(0, 0);
+		if (idle_threads[i] == NULL)
+			return -ENOMEM;
+
+		thread__set_comm(idle_threads[i], idle_comm, 0);
+	}
+
+	return 0;
+}
+
+static void free_idle_threads(void)
+{
+	int i;
+
+	if (idle_threads == NULL)
+		return;
+
+	for (i = 0; i <= idle_max_cpu; ++i) {
+		if ((idle_threads[i]))
+			thread__delete(idle_threads[i]);
+	}
+
+	free(idle_threads);
+}
+
+static struct thread *get_idle_thread(int cpu)
+{
+	/*
+	 * expand/allocate array of pointers to local thread
+	 * structs if needed
+	 */
+	if ((cpu >= idle_max_cpu) || (idle_threads == NULL)) {
+		int i, j = __roundup_pow_of_two(cpu+1);
+		void *p;
+
+		p = realloc(idle_threads, j * sizeof(struct thread *));
+		if (!p)
+			return NULL;
+
+		idle_threads = (struct thread **) p;
+		i = idle_max_cpu ? idle_max_cpu + 1 : 0;
+		for (; i < j; ++i)
+			idle_threads[i] = NULL;
+
+		idle_max_cpu = j;
+	}
+
+	/* allocate a new thread struct if needed */
+	if (idle_threads[cpu] == NULL) {
+		idle_threads[cpu] = thread__new(0, 0);
+
+		if (idle_threads[cpu]) {
+			idle_threads[cpu]->tid = 0;
+			thread__set_comm(idle_threads[cpu], idle_comm, 0);
+		}
+	}
+
+	return idle_threads[cpu];
+}
+
+/*
+ * handle runtime stats saved per thread
+ */
+static struct thread_runtime *thread__init_runtime(struct thread *thread)
+{
+	struct thread_runtime *r;
+
+	r = zalloc(sizeof(struct thread_runtime));
+	if (!r)
+		return NULL;
+
+	init_stats(&r->run_stats);
+	thread__set_priv(thread, r);
+
+	return r;
+}
+
+static struct thread_runtime *thread__get_runtime(struct thread *thread)
+{
+	struct thread_runtime *tr;
+
+	tr = thread__priv(thread);
+	if (tr == NULL) {
+		tr = thread__init_runtime(thread);
+		if (tr == NULL)
+			pr_debug("Failed to malloc memory for runtime data.\n");
+	}
+
+	return tr;
+}
+
+static struct thread *timehist_get_thread(struct perf_sample *sample,
+					  struct machine *machine,
+					  struct perf_evsel *evsel)
+{
+	struct thread *thread;
+
+	if (is_idle_sample(sample, evsel)) {
+		thread = get_idle_thread(sample->cpu);
+		if (thread == NULL)
+			pr_err("Failed to get idle thread for cpu %d.\n", sample->cpu);
+
+	} else {
+		thread = machine__findnew_thread(machine, sample->pid, sample->tid);
+		if (thread == NULL) {
+			pr_debug("Failed to get thread for tid %d. skipping sample.\n",
+				 sample->tid);
+		}
+	}
+
+	return thread;
+}
+
+static bool timehist_skip_sample(struct thread *thread)
+{
+	bool rc = false;
+
+	if (thread__is_filtered(thread))
+		rc = true;
+
+	return rc;
+}
+
+static int timehist_sched_wakeup_event(struct perf_tool *tool __maybe_unused,
+				       union perf_event *event __maybe_unused,
+				       struct perf_evsel *evsel,
+				       struct perf_sample *sample,
+				       struct machine *machine)
+{
+	struct thread *thread;
+	struct thread_runtime *tr = NULL;
+	/* want pid of awakened task not pid in sample */
+	const u32 pid = perf_evsel__intval(evsel, sample, "pid");
+
+	thread = machine__findnew_thread(machine, 0, pid);
+	if (thread == NULL)
+		return -1;
+
+	tr = thread__get_runtime(thread);
+	if (tr == NULL)
+		return -1;
+
+	if (tr->ready_to_run == 0)
+		tr->ready_to_run = sample->time;
+
+	return 0;
+}
+
+static int timehist_sched_change_event(struct perf_tool *tool __maybe_unused,
+				       union perf_event *event,
+				       struct perf_evsel *evsel,
+				       struct perf_sample *sample,
+				       struct machine *machine)
+{
+	struct addr_location al;
+	struct thread *thread;
+	struct thread_runtime *tr = NULL;
+	u64 tprev;
+	int rc = 0;
+
+	if (machine__resolve(machine, &al, sample) < 0) {
+		pr_err("problem processing %d event. skipping it\n",
+		       event->header.type);
+		rc = -1;
+		goto out;
+	}
+
+	thread = timehist_get_thread(sample, machine, evsel);
+	if (thread == NULL) {
+		rc = -1;
+		goto out;
+	}
+
+	if (timehist_skip_sample(thread))
+		goto out;
+
+	tr = thread__get_runtime(thread);
+	if (tr == NULL) {
+		rc = -1;
+		goto out;
+	}
+
+	tprev = perf_evsel__get_time(evsel, sample->cpu);
+
+	timehist_update_runtime_stats(tr, sample->time, tprev);
+	timehist_print_sample(sample, thread);
+
+out:
+	if (tr) {
+		/* time of this sched_switch event becomes last time task seen */
+		tr->last_time = sample->time;
+
+		/* sched out event for task so reset ready to run time */
+		tr->ready_to_run = 0;
+	}
+
+	perf_evsel__save_time(evsel, sample->time, sample->cpu);
+
+	return rc;
+}
+
+static int timehist_sched_switch_event(struct perf_tool *tool,
+			     union perf_event *event,
+			     struct perf_evsel *evsel,
+			     struct perf_sample *sample,
+			     struct machine *machine __maybe_unused)
+{
+	return timehist_sched_change_event(tool, event, evsel, sample, machine);
+}
+
+static int process_lost(struct perf_tool *tool __maybe_unused,
+			union perf_event *event,
+			struct perf_sample *sample,
+			struct machine *machine __maybe_unused)
+{
+	char tstr[64];
+
+	timestamp__scnprintf_usec(sample->time, tstr, sizeof(tstr));
+	printf("%15s ", tstr);
+	printf("lost %" PRIu64 " events on cpu %d\n", event->lost.lost, sample->cpu);
+
+	return 0;
+}
+
+
+typedef int (*sched_handler)(struct perf_tool *tool,
+			  union perf_event *event,
+			  struct perf_evsel *evsel,
+			  struct perf_sample *sample,
+			  struct machine *machine);
+
+static int perf_timehist__process_sample(struct perf_tool *tool,
+					 union perf_event *event,
+					 struct perf_sample *sample,
+					 struct perf_evsel *evsel,
+					 struct machine *machine)
+{
+	struct perf_sched *sched = container_of(tool, struct perf_sched, tool);
+	int err = 0;
+	int this_cpu = sample->cpu;
+
+	if (this_cpu > sched->max_cpu)
+		sched->max_cpu = this_cpu;
+
+	if (evsel->handler != NULL) {
+		sched_handler f = evsel->handler;
+
+		err = f(tool, event, evsel, sample, machine);
+	}
+
+	return err;
+}
+
+static int perf_sched__timehist(struct perf_sched *sched)
+{
+	const struct perf_evsel_str_handler handlers[] = {
+		{ "sched:sched_switch",       timehist_sched_switch_event, },
+		{ "sched:sched_wakeup",	      timehist_sched_wakeup_event, },
+		{ "sched:sched_wakeup_new",   timehist_sched_wakeup_event, },
+	};
+	struct perf_data_file file = {
+		.path = input_name,
+		.mode = PERF_DATA_MODE_READ,
+	};
+
+	struct perf_session *session;
+	int err = -1;
+
+	/*
+	 * event handlers for timehist option
+	 */
+	sched->tool.sample	 = perf_timehist__process_sample;
+	sched->tool.mmap	 = perf_event__process_mmap;
+	sched->tool.comm	 = perf_event__process_comm;
+	sched->tool.exit	 = perf_event__process_exit;
+	sched->tool.fork	 = perf_event__process_fork;
+	sched->tool.lost	 = process_lost;
+	sched->tool.attr	 = perf_event__process_attr;
+	sched->tool.tracing_data = perf_event__process_tracing_data;
+	sched->tool.build_id	 = perf_event__process_build_id;
+
+	sched->tool.ordered_events = true;
+	sched->tool.ordering_requires_timestamps = true;
+
+	session = perf_session__new(&file, false, &sched->tool);
+	if (session == NULL)
+		return -ENOMEM;
+
+	symbol__init(&session->header.env);
+
+	setup_pager();
+
+	/* setup per-evsel handlers */
+	if (perf_session__set_tracepoints_handlers(session, handlers))
+		goto out;
+
+	if (!perf_session__has_traces(session, "record -R"))
+		goto out;
+
+	/* pre-allocate struct for per-CPU idle stats */
+	sched->max_cpu = session->header.env.nr_cpus_online;
+	if (sched->max_cpu == 0)
+		sched->max_cpu = 4;
+	if (init_idle_threads(sched->max_cpu))
+		goto out;
+
+	timehist_header();
+
+	err = perf_session__process_events(session);
+	if (err) {
+		pr_err("Failed to process events, error %d", err);
+		goto out;
+	}
+
+out:
+	free_idle_threads();
+	perf_session__delete(session);
+
+	return err;
+}
+
+
 static void print_bad_events(struct perf_sched *sched)
 {
 	if (sched->nr_unordered_timestamps && sched->nr_timestamps) {
@@ -1970,8 +2535,6 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 	const struct option latency_options[] = {
 	OPT_STRING('s', "sort", &sched.sort_order, "key[,key2...]",
 		   "sort by key(s): runtime, switch, avg, max"),
-	OPT_INCR('v', "verbose", &verbose,
-		    "be more verbose (show symbol address, etc)"),
 	OPT_INTEGER('C', "CPU", &sched.profile_cpu,
 		    "CPU to profile on"),
 	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
@@ -1983,8 +2546,6 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 	const struct option replay_options[] = {
 	OPT_UINTEGER('r', "repeat", &sched.replay_repeat,
 		     "repeat the workload replay N times (-1: infinite)"),
-	OPT_INCR('v', "verbose", &verbose,
-		    "be more verbose (show symbol address, etc)"),
 	OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
 		    "dump raw trace in ASCII"),
 	OPT_BOOLEAN('f', "force", &sched.force, "don't complain, do it"),
@@ -2001,6 +2562,16 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
                     "display given CPUs in map"),
 	OPT_PARENT(sched_options)
 	};
+	const struct option timehist_options[] = {
+	OPT_STRING('k', "vmlinux", &symbol_conf.vmlinux_name,
+		   "file", "vmlinux pathname"),
+	OPT_STRING(0, "kallsyms", &symbol_conf.kallsyms_name,
+		   "file", "kallsyms pathname"),
+	OPT_STRING(0, "symfs", &symbol_conf.symfs, "directory",
+		    "Look for files with symbols relative to this directory"),
+	OPT_PARENT(sched_options)
+	};
+
 	const char * const latency_usage[] = {
 		"perf sched latency [<options>]",
 		NULL
@@ -2013,8 +2584,13 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 		"perf sched map [<options>]",
 		NULL
 	};
+	const char * const timehist_usage[] = {
+		"perf sched timehist [<options>]",
+		NULL
+	};
 	const char *const sched_subcommands[] = { "record", "latency", "map",
-						  "replay", "script", NULL };
+						  "replay", "script",
+						  "timehist", NULL };
 	const char *sched_usage[] = {
 		NULL,
 		NULL
@@ -2077,6 +2653,14 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 				usage_with_options(replay_usage, replay_options);
 		}
 		return perf_sched__replay(&sched);
+	} else if (!strcmp(argv[0], "timehist")) {
+		if (argc) {
+			argc = parse_options(argc, argv, timehist_options,
+					     timehist_usage, 0);
+			if (argc)
+				usage_with_options(timehist_usage, timehist_options);
+		}
+		return perf_sched__timehist(&sched);
 	} else {
 		usage_with_options(sched_usage, sched_options);
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 15/18] perf sched timehist: Add summary options
  2016-11-23 16:40 [GIT PULL 00/18] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (13 preceding siblings ...)
  2016-11-23 16:40 ` [PATCH 14/18] perf sched timehist: Introduce timehist command Arnaldo Carvalho de Melo
@ 2016-11-23 16:40 ` Arnaldo Carvalho de Melo
  2016-11-23 16:40 ` [PATCH 16/18] perf sched timehist: Add -w/--wakeups option Arnaldo Carvalho de Melo
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-11-23 16:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, David Ahern, Namhyung Kim, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo

From: David Ahern <dsahern@gmail.com>

The -s/--summary option is to show process runtime statistics.  And the
 -S/--with-summary option is to show the stats with the normal output.

  $ perf sched timehist -s

  Runtime summary
                            comm  parent   sched-in     run-time    min-run     avg-run     max-run  stddev
                                            (count)       (msec)     (msec)      (msec)      (msec)       %
  ---------------------------------------------------------------------------------------------------------
                  ksoftirqd/0[3]       2          2        0.011      0.004       0.005       0.006   14.87
                  rcu_preempt[7]       2         11        0.071      0.002       0.006       0.017   20.23
                  watchdog/0[11]       2          1        0.002      0.002       0.002       0.002    0.00
                  watchdog/1[12]       2          1        0.004      0.004       0.004       0.004    0.00
  ...

  Terminated tasks:
                     sleep[7220]    7219          3        0.770      0.087       0.256       0.576   62.28

  Idle stats:
      CPU  0 idle for   2352.006  msec
      CPU  1 idle for   2764.497  msec
      CPU  2 idle for   2998.229  msec
      CPU  3 idle for   2967.800  msec

      Total number of unique tasks: 52
  Total number of context switches: 2532
             Total run time (msec): 218.036

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20161116060634.28477-5-namhyung@kernel.org
[ Add documentation from last commit, so that docs comes with the cset that introduces the feature ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-sched.c | 166 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 160 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index c0ac0c9557e8..1e7d81ad5ec6 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -194,6 +194,11 @@ struct perf_sched {
 	bool force;
 	bool skip_merge;
 	struct perf_sched_map map;
+
+	/* options for timehist command */
+	bool		summary;
+	bool		summary_only;
+	u64		skipped_samples;
 };
 
 /* per thread run time data */
@@ -2010,12 +2015,15 @@ static struct thread *timehist_get_thread(struct perf_sample *sample,
 	return thread;
 }
 
-static bool timehist_skip_sample(struct thread *thread)
+static bool timehist_skip_sample(struct perf_sched *sched,
+				 struct thread *thread)
 {
 	bool rc = false;
 
-	if (thread__is_filtered(thread))
+	if (thread__is_filtered(thread)) {
 		rc = true;
+		sched->skipped_samples++;
+	}
 
 	return rc;
 }
@@ -2045,7 +2053,7 @@ static int timehist_sched_wakeup_event(struct perf_tool *tool __maybe_unused,
 	return 0;
 }
 
-static int timehist_sched_change_event(struct perf_tool *tool __maybe_unused,
+static int timehist_sched_change_event(struct perf_tool *tool,
 				       union perf_event *event,
 				       struct perf_evsel *evsel,
 				       struct perf_sample *sample,
@@ -2056,6 +2064,7 @@ static int timehist_sched_change_event(struct perf_tool *tool __maybe_unused,
 	struct thread_runtime *tr = NULL;
 	u64 tprev;
 	int rc = 0;
+	struct perf_sched *sched = container_of(tool, struct perf_sched, tool);
 
 	if (machine__resolve(machine, &al, sample) < 0) {
 		pr_err("problem processing %d event. skipping it\n",
@@ -2070,7 +2079,7 @@ static int timehist_sched_change_event(struct perf_tool *tool __maybe_unused,
 		goto out;
 	}
 
-	if (timehist_skip_sample(thread))
+	if (timehist_skip_sample(sched, thread))
 		goto out;
 
 	tr = thread__get_runtime(thread);
@@ -2082,7 +2091,8 @@ static int timehist_sched_change_event(struct perf_tool *tool __maybe_unused,
 	tprev = perf_evsel__get_time(evsel, sample->cpu);
 
 	timehist_update_runtime_stats(tr, sample->time, tprev);
-	timehist_print_sample(sample, thread);
+	if (!sched->summary_only)
+		timehist_print_sample(sample, thread);
 
 out:
 	if (tr) {
@@ -2122,6 +2132,131 @@ static int process_lost(struct perf_tool *tool __maybe_unused,
 }
 
 
+static void print_thread_runtime(struct thread *t,
+				 struct thread_runtime *r)
+{
+	double mean = avg_stats(&r->run_stats);
+	float stddev;
+
+	printf("%*s   %5d  %9" PRIu64 " ",
+	       comm_width, timehist_get_commstr(t), t->ppid,
+	       (u64) r->run_stats.n);
+
+	print_sched_time(r->total_run_time, 8);
+	stddev = rel_stddev_stats(stddev_stats(&r->run_stats), mean);
+	print_sched_time(r->run_stats.min, 6);
+	printf(" ");
+	print_sched_time((u64) mean, 6);
+	printf(" ");
+	print_sched_time(r->run_stats.max, 6);
+	printf("  ");
+	printf("%5.2f", stddev);
+	printf("\n");
+}
+
+struct total_run_stats {
+	u64  sched_count;
+	u64  task_count;
+	u64  total_run_time;
+};
+
+static int __show_thread_runtime(struct thread *t, void *priv)
+{
+	struct total_run_stats *stats = priv;
+	struct thread_runtime *r;
+
+	if (thread__is_filtered(t))
+		return 0;
+
+	r = thread__priv(t);
+	if (r && r->run_stats.n) {
+		stats->task_count++;
+		stats->sched_count += r->run_stats.n;
+		stats->total_run_time += r->total_run_time;
+		print_thread_runtime(t, r);
+	}
+
+	return 0;
+}
+
+static int show_thread_runtime(struct thread *t, void *priv)
+{
+	if (t->dead)
+		return 0;
+
+	return __show_thread_runtime(t, priv);
+}
+
+static int show_deadthread_runtime(struct thread *t, void *priv)
+{
+	if (!t->dead)
+		return 0;
+
+	return __show_thread_runtime(t, priv);
+}
+
+static void timehist_print_summary(struct perf_sched *sched,
+				   struct perf_session *session)
+{
+	struct machine *m = &session->machines.host;
+	struct total_run_stats totals;
+	u64 task_count;
+	struct thread *t;
+	struct thread_runtime *r;
+	int i;
+
+	memset(&totals, 0, sizeof(totals));
+
+	if (comm_width < 30)
+		comm_width = 30;
+
+	printf("\nRuntime summary\n");
+	printf("%*s  parent   sched-in  ", comm_width, "comm");
+	printf("   run-time    min-run     avg-run     max-run  stddev\n");
+	printf("%*s            (count)  ", comm_width, "");
+	printf("     (msec)     (msec)      (msec)      (msec)       %%\n");
+	printf("%.105s\n", graph_dotted_line);
+
+	machine__for_each_thread(m, show_thread_runtime, &totals);
+	task_count = totals.task_count;
+	if (!task_count)
+		printf("<no still running tasks>\n");
+
+	printf("\nTerminated tasks:\n");
+	machine__for_each_thread(m, show_deadthread_runtime, &totals);
+	if (task_count == totals.task_count)
+		printf("<no terminated tasks>\n");
+
+	/* CPU idle stats not tracked when samples were skipped */
+	if (sched->skipped_samples)
+		return;
+
+	printf("\nIdle stats:\n");
+	for (i = 0; i <= idle_max_cpu; ++i) {
+		t = idle_threads[i];
+		if (!t)
+			continue;
+
+		r = thread__priv(t);
+		if (r && r->run_stats.n) {
+			totals.sched_count += r->run_stats.n;
+			printf("    CPU %2d idle for ", i);
+			print_sched_time(r->total_run_time, 6);
+			printf(" msec\n");
+		} else
+			printf("    CPU %2d idle entire time window\n", i);
+	}
+
+	printf("\n"
+	       "    Total number of unique tasks: %" PRIu64 "\n"
+	       "Total number of context switches: %" PRIu64 "\n"
+	       "           Total run time (msec): ",
+	       totals.task_count, totals.sched_count);
+
+	print_sched_time(totals.total_run_time, 2);
+	printf("\n");
+}
+
 typedef int (*sched_handler)(struct perf_tool *tool,
 			  union perf_event *event,
 			  struct perf_evsel *evsel,
@@ -2163,6 +2298,7 @@ static int perf_sched__timehist(struct perf_sched *sched)
 	};
 
 	struct perf_session *session;
+	struct perf_evlist *evlist;
 	int err = -1;
 
 	/*
@@ -2185,6 +2321,8 @@ static int perf_sched__timehist(struct perf_sched *sched)
 	if (session == NULL)
 		return -ENOMEM;
 
+	evlist = session->evlist;
+
 	symbol__init(&session->header.env);
 
 	setup_pager();
@@ -2203,7 +2341,12 @@ static int perf_sched__timehist(struct perf_sched *sched)
 	if (init_idle_threads(sched->max_cpu))
 		goto out;
 
-	timehist_header();
+	/* summary_only implies summary option, but don't overwrite summary if set */
+	if (sched->summary_only)
+		sched->summary = sched->summary_only;
+
+	if (!sched->summary_only)
+		timehist_header();
 
 	err = perf_session__process_events(session);
 	if (err) {
@@ -2211,6 +2354,13 @@ static int perf_sched__timehist(struct perf_sched *sched)
 		goto out;
 	}
 
+	sched->nr_events      = evlist->stats.nr_events[0];
+	sched->nr_lost_events = evlist->stats.total_lost;
+	sched->nr_lost_chunks = evlist->stats.nr_events[PERF_RECORD_LOST];
+
+	if (sched->summary)
+		timehist_print_summary(sched, session);
+
 out:
 	free_idle_threads();
 	perf_session__delete(session);
@@ -2569,6 +2719,10 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 		   "file", "kallsyms pathname"),
 	OPT_STRING(0, "symfs", &symbol_conf.symfs, "directory",
 		    "Look for files with symbols relative to this directory"),
+	OPT_BOOLEAN('s', "summary", &sched.summary_only,
+		    "Show only syscall summary with statistics"),
+	OPT_BOOLEAN('S', "with-summary", &sched.summary,
+		    "Show all syscalls and summary with statistics"),
 	OPT_PARENT(sched_options)
 	};
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 16/18] perf sched timehist: Add -w/--wakeups option
  2016-11-23 16:40 [GIT PULL 00/18] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (14 preceding siblings ...)
  2016-11-23 16:40 ` [PATCH 15/18] perf sched timehist: Add summary options Arnaldo Carvalho de Melo
@ 2016-11-23 16:40 ` Arnaldo Carvalho de Melo
  2016-11-23 16:40 ` [PATCH 17/18] perf sched timehist: Add call graph options Arnaldo Carvalho de Melo
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-11-23 16:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, David Ahern, Namhyung Kim, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo

From: David Ahern <dsahern@gmail.com>

The -w option is to show wakeup events with timehist.

  $ perf sched timehist -w
             time    cpu  task name              b/n time  sch delay   run time
                          [tid/pid]                (msec)     (msec)     (msec)
  --------------- ------  --------------------  ---------  ---------  ---------
   2412598.429689 [0002]  perf[7219]                                             awakened: perf[7220]
   2412598.429696 [0009]  <idle>                    0.000      0.000      0.000
   2412598.429767 [0002]  perf[7219]                0.000      0.000      0.000
   2412598.429780 [0009]  perf[7220]                                             awakened: migration/9[53]
  ...

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20161116060634.28477-6-namhyung@kernel.org
[ Add documentation based on above commit message ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-sched.txt |  4 +++
 tools/perf/builtin-sched.c              | 58 ++++++++++++++++++++++++++++++---
 2 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/tools/perf/Documentation/perf-sched.txt b/tools/perf/Documentation/perf-sched.txt
index 72730da307b9..9a77bc73e8a3 100644
--- a/tools/perf/Documentation/perf-sched.txt
+++ b/tools/perf/Documentation/perf-sched.txt
@@ -112,6 +112,10 @@ OPTIONS for 'perf sched timehist'
 --symfs=<directory>::
     Look for files with symbols relative to this directory.
 
+-w::
+--wakeups::
+	Show wakeup events.
+
 SEE ALSO
 --------
 linkperf:perf-record[1]
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 1e7d81ad5ec6..8fb7bcc2cb76 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -198,6 +198,7 @@ struct perf_sched {
 	/* options for timehist command */
 	bool		summary;
 	bool		summary_only;
+	bool		show_wakeups;
 	u64		skipped_samples;
 };
 
@@ -1807,7 +1808,8 @@ static void timehist_header(void)
 	printf("\n");
 }
 
-static void timehist_print_sample(struct perf_sample *sample,
+static void timehist_print_sample(struct perf_sched *sched,
+				  struct perf_sample *sample,
 				  struct thread *thread)
 {
 	struct thread_runtime *tr = thread__priv(thread);
@@ -1821,6 +1823,10 @@ static void timehist_print_sample(struct perf_sample *sample,
 	print_sched_time(tr->dt_wait, 6);
 	print_sched_time(tr->dt_delay, 6);
 	print_sched_time(tr->dt_run, 6);
+
+	if (sched->show_wakeups)
+		printf("  %-*s", comm_width, "");
+
 	printf("\n");
 }
 
@@ -2028,12 +2034,44 @@ static bool timehist_skip_sample(struct perf_sched *sched,
 	return rc;
 }
 
-static int timehist_sched_wakeup_event(struct perf_tool *tool __maybe_unused,
+static void timehist_print_wakeup_event(struct perf_sched *sched,
+					struct perf_sample *sample,
+					struct machine *machine,
+					struct thread *awakened)
+{
+	struct thread *thread;
+	char tstr[64];
+
+	thread = machine__findnew_thread(machine, sample->pid, sample->tid);
+	if (thread == NULL)
+		return;
+
+	/* show wakeup unless both awakee and awaker are filtered */
+	if (timehist_skip_sample(sched, thread) &&
+	    timehist_skip_sample(sched, awakened)) {
+		return;
+	}
+
+	timestamp__scnprintf_usec(sample->time, tstr, sizeof(tstr));
+	printf("%15s [%04d] ", tstr, sample->cpu);
+
+	printf(" %-*s ", comm_width, timehist_get_commstr(thread));
+
+	/* dt spacer */
+	printf("  %9s  %9s  %9s ", "", "", "");
+
+	printf("awakened: %s", timehist_get_commstr(awakened));
+
+	printf("\n");
+}
+
+static int timehist_sched_wakeup_event(struct perf_tool *tool,
 				       union perf_event *event __maybe_unused,
 				       struct perf_evsel *evsel,
 				       struct perf_sample *sample,
 				       struct machine *machine)
 {
+	struct perf_sched *sched = container_of(tool, struct perf_sched, tool);
 	struct thread *thread;
 	struct thread_runtime *tr = NULL;
 	/* want pid of awakened task not pid in sample */
@@ -2050,6 +2088,10 @@ static int timehist_sched_wakeup_event(struct perf_tool *tool __maybe_unused,
 	if (tr->ready_to_run == 0)
 		tr->ready_to_run = sample->time;
 
+	/* show wakeups if requested */
+	if (sched->show_wakeups)
+		timehist_print_wakeup_event(sched, sample, machine, thread);
+
 	return 0;
 }
 
@@ -2059,12 +2101,12 @@ static int timehist_sched_change_event(struct perf_tool *tool,
 				       struct perf_sample *sample,
 				       struct machine *machine)
 {
+	struct perf_sched *sched = container_of(tool, struct perf_sched, tool);
 	struct addr_location al;
 	struct thread *thread;
 	struct thread_runtime *tr = NULL;
 	u64 tprev;
 	int rc = 0;
-	struct perf_sched *sched = container_of(tool, struct perf_sched, tool);
 
 	if (machine__resolve(machine, &al, sample) < 0) {
 		pr_err("problem processing %d event. skipping it\n",
@@ -2092,7 +2134,7 @@ static int timehist_sched_change_event(struct perf_tool *tool,
 
 	timehist_update_runtime_stats(tr, sample->time, tprev);
 	if (!sched->summary_only)
-		timehist_print_sample(sample, thread);
+		timehist_print_sample(sched, sample, thread);
 
 out:
 	if (tr) {
@@ -2723,6 +2765,7 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 		    "Show only syscall summary with statistics"),
 	OPT_BOOLEAN('S', "with-summary", &sched.summary,
 		    "Show all syscalls and summary with statistics"),
+	OPT_BOOLEAN('w', "wakeups", &sched.show_wakeups, "Show wakeup events"),
 	OPT_PARENT(sched_options)
 	};
 
@@ -2814,6 +2857,13 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 			if (argc)
 				usage_with_options(timehist_usage, timehist_options);
 		}
+		if (sched.show_wakeups && sched.summary_only) {
+			pr_err(" Error: -s and -w are mutually exclusive.\n");
+			parse_options_usage(timehist_usage, timehist_options, "s", true);
+			parse_options_usage(NULL, timehist_options, "w", true);
+			return -EINVAL;
+		}
+
 		return perf_sched__timehist(&sched);
 	} else {
 		usage_with_options(sched_usage, sched_options);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 17/18] perf sched timehist: Add call graph options
  2016-11-23 16:40 [GIT PULL 00/18] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (15 preceding siblings ...)
  2016-11-23 16:40 ` [PATCH 16/18] perf sched timehist: Add -w/--wakeups option Arnaldo Carvalho de Melo
@ 2016-11-23 16:40 ` Arnaldo Carvalho de Melo
  2016-11-23 16:40 ` [PATCH 18/18] perf sched timehist: Add -V/--cpu-visual option Arnaldo Carvalho de Melo
  2016-11-24  4:10 ` [GIT PULL 00/18] perf/core improvements and fixes Ingo Molnar
  18 siblings, 0 replies; 20+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-11-23 16:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, David Ahern, Namhyung Kim, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo

From: David Ahern <dsahern@gmail.com>

If callchains were recorded they are appended to the line with a default stack depth of 5:

  1.874569 [0011] gcc[31949]       0.014 0.000 1.148 wait_for_completion_killable <- do_fork <- sys_vfork <- stub_vfork <- __vfork
  1.874591 [0010] gcc[31951]       0.000 0.000 0.024 __cond_resched <- _cond_resched <- wait_for_completion <- stop_one_cpu <- sched_exec
  1.874603 [0010] migration/10[59] 3.350 0.004 0.011 smpboot_thread_fn <- kthread <- ret_from_fork
  1.874604 [0011] <idle>           1.148 0.000 0.035 cpu_startup_entry <- start_secondary
  1.874723 [0005] <idle>           0.016 0.000 1.383 cpu_startup_entry <- start_secondary
  1.874746 [0005] gcc[31949]       0.153 0.078 0.022 do_wait sys_wait4 <- system_call_fastpath <- __GI___waitpid

 --no-call-graph can be used to not show the callchains. --max-stack is used
to control the number of frames shown (default of 5). -x/--excl options can
be used to collapse redundant callchains to get more relevant data on screen.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20161116060634.28477-7-namhyung@kernel.org
[ Add documentation based on above commit message ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-sched.txt |  7 +++
 tools/perf/builtin-sched.c              | 88 ++++++++++++++++++++++++++++++---
 2 files changed, 89 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/perf-sched.txt b/tools/perf/Documentation/perf-sched.txt
index 9a77bc73e8a3..83452088727d 100644
--- a/tools/perf/Documentation/perf-sched.txt
+++ b/tools/perf/Documentation/perf-sched.txt
@@ -99,6 +99,13 @@ OPTIONS for 'perf sched timehist'
 --kallsyms=<file>::
     kallsyms pathname
 
+-g::
+--no-call-graph::
+	Do not display call chains if present.
+
+--max-stack::
+	Maximum number of functions to display in backtrace, default 5.
+
 -s::
 --summary::
     Show only a summary of scheduling by thread with min, max, and average
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 8fb7bcc2cb76..1f8731640809 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -14,6 +14,7 @@
 #include "util/thread_map.h"
 #include "util/color.h"
 #include "util/stat.h"
+#include "util/callchain.h"
 
 #include <subcmd/parse-options.h>
 #include "util/trace-event.h"
@@ -198,6 +199,8 @@ struct perf_sched {
 	/* options for timehist command */
 	bool		summary;
 	bool		summary_only;
+	bool		show_callchain;
+	unsigned int	max_stack;
 	bool		show_wakeups;
 	u64		skipped_samples;
 };
@@ -1810,6 +1813,7 @@ static void timehist_header(void)
 
 static void timehist_print_sample(struct perf_sched *sched,
 				  struct perf_sample *sample,
+				  struct addr_location *al,
 				  struct thread *thread)
 {
 	struct thread_runtime *tr = thread__priv(thread);
@@ -1827,6 +1831,18 @@ static void timehist_print_sample(struct perf_sched *sched,
 	if (sched->show_wakeups)
 		printf("  %-*s", comm_width, "");
 
+	if (thread->tid == 0)
+		goto out;
+
+	if (sched->show_callchain)
+		printf("  ");
+
+	sample__fprintf_sym(sample, al, 0,
+			    EVSEL__PRINT_SYM | EVSEL__PRINT_ONELINE |
+			    EVSEL__PRINT_CALLCHAIN_ARROW,
+			    &callchain_cursor, stdout);
+
+out:
 	printf("\n");
 }
 
@@ -1878,9 +1894,14 @@ static void timehist_update_runtime_stats(struct thread_runtime *r,
 	r->total_run_time += r->dt_run;
 }
 
-static bool is_idle_sample(struct perf_sample *sample,
-			   struct perf_evsel *evsel)
+static bool is_idle_sample(struct perf_sched *sched,
+			   struct perf_sample *sample,
+			   struct perf_evsel *evsel,
+			   struct machine *machine)
 {
+	struct thread *thread;
+	struct callchain_cursor *cursor = &callchain_cursor;
+
 	/* pid 0 == swapper == idle task */
 	if (sample->pid == 0)
 		return true;
@@ -1889,6 +1910,25 @@ static bool is_idle_sample(struct perf_sample *sample,
 		if (perf_evsel__intval(evsel, sample, "prev_pid") == 0)
 			return true;
 	}
+
+	/* want main thread for process - has maps */
+	thread = machine__findnew_thread(machine, sample->pid, sample->pid);
+	if (thread == NULL) {
+		pr_debug("Failed to get thread for pid %d.\n", sample->pid);
+		return false;
+	}
+
+	if (!symbol_conf.use_callchain || sample->callchain == NULL)
+		return false;
+
+	if (thread__resolve_callchain(thread, cursor, evsel, sample,
+				      NULL, NULL, sched->max_stack) != 0) {
+		if (verbose)
+			error("Failed to resolve callchain. Skipping\n");
+
+		return false;
+	}
+	callchain_cursor_commit(cursor);
 	return false;
 }
 
@@ -1999,13 +2039,14 @@ static struct thread_runtime *thread__get_runtime(struct thread *thread)
 	return tr;
 }
 
-static struct thread *timehist_get_thread(struct perf_sample *sample,
+static struct thread *timehist_get_thread(struct perf_sched *sched,
+					  struct perf_sample *sample,
 					  struct machine *machine,
 					  struct perf_evsel *evsel)
 {
 	struct thread *thread;
 
-	if (is_idle_sample(sample, evsel)) {
+	if (is_idle_sample(sched, sample, evsel, machine)) {
 		thread = get_idle_thread(sample->cpu);
 		if (thread == NULL)
 			pr_err("Failed to get idle thread for cpu %d.\n", sample->cpu);
@@ -2115,7 +2156,7 @@ static int timehist_sched_change_event(struct perf_tool *tool,
 		goto out;
 	}
 
-	thread = timehist_get_thread(sample, machine, evsel);
+	thread = timehist_get_thread(sched, sample, machine, evsel);
 	if (thread == NULL) {
 		rc = -1;
 		goto out;
@@ -2134,7 +2175,7 @@ static int timehist_sched_change_event(struct perf_tool *tool,
 
 	timehist_update_runtime_stats(tr, sample->time, tprev);
 	if (!sched->summary_only)
-		timehist_print_sample(sched, sample, thread);
+		timehist_print_sample(sched, sample, &al, thread);
 
 out:
 	if (tr) {
@@ -2327,6 +2368,30 @@ static int perf_timehist__process_sample(struct perf_tool *tool,
 	return err;
 }
 
+static int timehist_check_attr(struct perf_sched *sched,
+			       struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+	struct evsel_runtime *er;
+
+	list_for_each_entry(evsel, &evlist->entries, node) {
+		er = perf_evsel__get_runtime(evsel);
+		if (er == NULL) {
+			pr_err("Failed to allocate memory for evsel runtime data\n");
+			return -1;
+		}
+
+		if (sched->show_callchain &&
+		    !(evsel->attr.sample_type & PERF_SAMPLE_CALLCHAIN)) {
+			pr_info("Samples do not have callchains.\n");
+			sched->show_callchain = 0;
+			symbol_conf.use_callchain = 0;
+		}
+	}
+
+	return 0;
+}
+
 static int perf_sched__timehist(struct perf_sched *sched)
 {
 	const struct perf_evsel_str_handler handlers[] = {
@@ -2359,6 +2424,8 @@ static int perf_sched__timehist(struct perf_sched *sched)
 	sched->tool.ordered_events = true;
 	sched->tool.ordering_requires_timestamps = true;
 
+	symbol_conf.use_callchain = sched->show_callchain;
+
 	session = perf_session__new(&file, false, &sched->tool);
 	if (session == NULL)
 		return -ENOMEM;
@@ -2367,6 +2434,9 @@ static int perf_sched__timehist(struct perf_sched *sched)
 
 	symbol__init(&session->header.env);
 
+	if (timehist_check_attr(sched, evlist) != 0)
+		goto out;
+
 	setup_pager();
 
 	/* setup per-evsel handlers */
@@ -2714,6 +2784,8 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 		.next_shortname1      = 'A',
 		.next_shortname2      = '0',
 		.skip_merge           = 0,
+		.show_callchain	      = 1,
+		.max_stack            = 5,
 	};
 	const struct option sched_options[] = {
 	OPT_STRING('i', "input", &input_name, "file",
@@ -2759,6 +2831,10 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 		   "file", "vmlinux pathname"),
 	OPT_STRING(0, "kallsyms", &symbol_conf.kallsyms_name,
 		   "file", "kallsyms pathname"),
+	OPT_BOOLEAN('g', "call-graph", &sched.show_callchain,
+		    "Display call chains if present (default on)"),
+	OPT_UINTEGER(0, "max-stack", &sched.max_stack,
+		   "Maximum number of functions to display backtrace."),
 	OPT_STRING(0, "symfs", &symbol_conf.symfs, "directory",
 		    "Look for files with symbols relative to this directory"),
 	OPT_BOOLEAN('s', "summary", &sched.summary_only,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 18/18] perf sched timehist: Add -V/--cpu-visual option
  2016-11-23 16:40 [GIT PULL 00/18] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (16 preceding siblings ...)
  2016-11-23 16:40 ` [PATCH 17/18] perf sched timehist: Add call graph options Arnaldo Carvalho de Melo
@ 2016-11-23 16:40 ` Arnaldo Carvalho de Melo
  2016-11-24  4:10 ` [GIT PULL 00/18] perf/core improvements and fixes Ingo Molnar
  18 siblings, 0 replies; 20+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-11-23 16:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, David Ahern, Namhyung Kim, Andi Kleen,
	Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo

From: David Ahern <dsahern@gmail.com>

The -V option provides a visual aid for sched switches by cpu:

  $ perf sched timehist -V
             time    cpu  0123456789abc  task name              b/n time  sch delay   run time
                                         [tid/pid]                (msec)     (msec)     (msec)
  --------------- ------  -------------  --------------------  ---------  ---------  ---------
  ...
   2412598.429696 [0009]           i     <idle>                    0.000      0.000      0.000
   2412598.429767 [0002]    s            perf[7219]                0.000      0.000      0.000
   2412598.429783 [0009]           s     perf[7220]                0.000      0.006      0.087
   2412598.429794 [0010]            i    <idle>                    0.000      0.000      0.000
   2412598.429795 [0009]           s     migration/9[53]           0.000      0.003      0.011
   2412598.430370 [0010]            s    sleep[7220]               0.011      0.000      0.576
   2412598.432584 [0003]     i           <idle>                    0.000      0.000      0.000
  ...

Committer notes:

'i' marks idle time, 's' are scheduler events.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20161116060634.28477-8-namhyung@kernel.org
[ Add documentation based on above commit message ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-sched.txt |  5 ++++
 tools/perf/builtin-sched.c              | 44 +++++++++++++++++++++++++++++++--
 2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-sched.txt b/tools/perf/Documentation/perf-sched.txt
index 83452088727d..fb9e52d65fca 100644
--- a/tools/perf/Documentation/perf-sched.txt
+++ b/tools/perf/Documentation/perf-sched.txt
@@ -119,6 +119,11 @@ OPTIONS for 'perf sched timehist'
 --symfs=<directory>::
     Look for files with symbols relative to this directory.
 
+-V::
+--cpu-visual::
+	Show visual aid for sched switches by CPU: 'i' marks idle time,
+	's' are scheduler events.
+
 -w::
 --wakeups::
 	Show wakeup events.
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 1f8731640809..829468defa07 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -201,6 +201,7 @@ struct perf_sched {
 	bool		summary_only;
 	bool		show_callchain;
 	unsigned int	max_stack;
+	bool		show_cpu_visual;
 	bool		show_wakeups;
 	u64		skipped_samples;
 };
@@ -1783,10 +1784,23 @@ static char *timehist_get_commstr(struct thread *thread)
 	return str;
 }
 
-static void timehist_header(void)
+static void timehist_header(struct perf_sched *sched)
 {
+	u32 ncpus = sched->max_cpu + 1;
+	u32 i, j;
+
 	printf("%15s %6s ", "time", "cpu");
 
+	if (sched->show_cpu_visual) {
+		printf(" ");
+		for (i = 0, j = 0; i < ncpus; ++i) {
+			printf("%x", j++);
+			if (j > 15)
+				j = 0;
+		}
+		printf(" ");
+	}
+
 	printf(" %-20s  %9s  %9s  %9s",
 		"task name", "wait time", "sch delay", "run time");
 
@@ -1797,6 +1811,9 @@ static void timehist_header(void)
 	 */
 	printf("%15s %-6s ", "", "");
 
+	if (sched->show_cpu_visual)
+		printf(" %*s ", ncpus, "");
+
 	printf(" %-20s  %9s  %9s  %9s\n", "[tid/pid]", "(msec)", "(msec)", "(msec)");
 
 	/*
@@ -1804,6 +1821,9 @@ static void timehist_header(void)
 	 */
 	printf("%.15s %.6s ", graph_dotted_line, graph_dotted_line);
 
+	if (sched->show_cpu_visual)
+		printf(" %.*s ", ncpus, graph_dotted_line);
+
 	printf(" %.20s  %.9s  %.9s  %.9s",
 		graph_dotted_line, graph_dotted_line, graph_dotted_line,
 		graph_dotted_line);
@@ -1817,11 +1837,28 @@ static void timehist_print_sample(struct perf_sched *sched,
 				  struct thread *thread)
 {
 	struct thread_runtime *tr = thread__priv(thread);
+	u32 max_cpus = sched->max_cpu + 1;
 	char tstr[64];
 
 	timestamp__scnprintf_usec(sample->time, tstr, sizeof(tstr));
 	printf("%15s [%04d] ", tstr, sample->cpu);
 
+	if (sched->show_cpu_visual) {
+		u32 i;
+		char c;
+
+		printf(" ");
+		for (i = 0; i < max_cpus; ++i) {
+			/* flag idle times with 'i'; others are sched events */
+			if (i == sample->cpu)
+				c = (thread->tid == 0) ? 'i' : 's';
+			else
+				c = ' ';
+			printf("%c", c);
+		}
+		printf(" ");
+	}
+
 	printf(" %-*s ", comm_width, timehist_get_commstr(thread));
 
 	print_sched_time(tr->dt_wait, 6);
@@ -2095,6 +2132,8 @@ static void timehist_print_wakeup_event(struct perf_sched *sched,
 
 	timestamp__scnprintf_usec(sample->time, tstr, sizeof(tstr));
 	printf("%15s [%04d] ", tstr, sample->cpu);
+	if (sched->show_cpu_visual)
+		printf(" %*s ", sched->max_cpu + 1, "");
 
 	printf(" %-*s ", comm_width, timehist_get_commstr(thread));
 
@@ -2458,7 +2497,7 @@ static int perf_sched__timehist(struct perf_sched *sched)
 		sched->summary = sched->summary_only;
 
 	if (!sched->summary_only)
-		timehist_header();
+		timehist_header(sched);
 
 	err = perf_session__process_events(session);
 	if (err) {
@@ -2842,6 +2881,7 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_BOOLEAN('S', "with-summary", &sched.summary,
 		    "Show all syscalls and summary with statistics"),
 	OPT_BOOLEAN('w', "wakeups", &sched.show_wakeups, "Show wakeup events"),
+	OPT_BOOLEAN('V', "cpu-visual", &sched.show_cpu_visual, "Add CPU visual"),
 	OPT_PARENT(sched_options)
 	};
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [GIT PULL 00/18] perf/core improvements and fixes
  2016-11-23 16:40 [GIT PULL 00/18] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (17 preceding siblings ...)
  2016-11-23 16:40 ` [PATCH 18/18] perf sched timehist: Add -V/--cpu-visual option Arnaldo Carvalho de Melo
@ 2016-11-24  4:10 ` Ingo Molnar
  18 siblings, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2016-11-24  4:10 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Adrian Hunter,
	Alexander Shishkin, Andi Kleen, Andrew Morton, Chris Riyder,
	David Ahern, Don Zickus, Jiri Olsa, Joe Mario, Kim Phillips,
	Markus Trippelsdorf, Masami Hiramatsu, Namhyung Kim,
	Naveen N . Rao, Pawel Moll, Peter Zijlstra, Ravi Bangoria,
	Russell King, Stephane Eranian, Steven Rostedt, Taeung Song,
	Wang Nan


* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> From: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> Hi Ingo,
> 
> 	Please consider pulling,
> 
> - Arnaldo
> 
> Test results at the end.
> 
> The following changes since commit 6a6b12e2125591e24891e6860410795ea53aed11:
> 
>   Merge tag 'perf-core-for-mingo-20161114' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2016-11-15 09:45:04 +0100)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-20161123
> 
> for you to fetch changes up to a407b0678bc1c39d70af5fdbe6421c164b69a8c0:
> 
>   perf sched timehist: Add -V/--cpu-visual option (2016-11-23 10:44:09 -0300)
> 
> ----------------------------------------------------------------
> perf/core improvements and fixes:
> 
> New tool:
> 
> - 'perf sched timehist' provides an analysis of scheduling events.
> 
>   Example usage:
>       perf sched record -- sleep 1
>       perf sched timehist
> 
>   By default it shows the individual schedule events, including the wait
>   time (time between sched-out and next sched-in events for the task), the
>   task scheduling delay (time between wakeup and actually running) and run
>   time for the task:
> 
>         time    cpu  task name         wait time  sch delay  run time
>                      [tid/pid]            (msec)     (msec)    (msec)
>     -------- ------  ----------------  ---------  ---------  --------
>     1.874569 [0011]  gcc[31949]            0.014      0.000     1.148
>     1.874591 [0010]  gcc[31951]            0.000      0.000     0.024
>     1.874603 [0010]  migration/10[59]      3.350      0.004     0.011
>     1.874604 [0011]  <idle>                1.148      0.000     0.035
>     1.874723 [0005]  <idle>                0.016      0.000     1.383
>     1.874746 [0005]  gcc[31949]            0.153      0.078     0.022
>   ...
> 
>   Times are in msec.usec. (David Ahern, Namhyung Kim)
> 
> Improvements:
> 
> - Make 'perf c2c report' support -f/--force, to allow skipping the
>   ownership check for root users, for instance, just like the other
>   tools (Jiri Olsa)
> 
> - Allow sorting cachelines by total number of HITMs, in addition to
>   local and remote numbers (Jiri Olsa)
> 
> Fixes:
> 
> - Make sure errors aren't suppressed by the TUI reset at the end of
>   a 'perf c2c report' session (Jiri Olsa)
> 
> Infrastructure:
> 
> - Initial work on having the annotate code better support multiple
>   architectures, including the ability to cross-annotate, i.e. to
>   annotate perf.data files collected on an ARM system on a x86_64
>   workstation (Arnaldo Carvalho de Melo, Ravi Bangoria, Kim Phillips)
> 
> - Use USECS_PER_SEC instead of hard coded number in libtraceevent (Steven Rostedt)
> 
> - Add retrieval of preempt count and latency flags in libtraceevent (Steven Rostedt)
> 
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> ----------------------------------------------------------------
> Arnaldo Carvalho de Melo (3):
>       perf annotate: Start supporting cross arch annotation
>       perf annotate: Allow arches to specify functions to skip
>       perf annotate: Add per arch instructions annotate handlers
> 
> David Ahern (5):
>       perf sched timehist: Introduce timehist command
>       perf sched timehist: Add summary options
>       perf sched timehist: Add -w/--wakeups option
>       perf sched timehist: Add call graph options
>       perf sched timehist: Add -V/--cpu-visual option
> 
> Jiri Olsa (6):
>       perf tools: Show event fd in debug output
>       perf c2c report: Setup browser after opening perf.data
>       perf c2c report: Add -f/--force option
>       perf c2c report: Add struct c2c_stats::tot_hitm field
>       perf c2c report: Display total HITMs on default
>       perf c2c: Support cascading options
> 
> Namhyung Kim (2):
>       perf symbols: Print symbol offsets conditionally
>       perf evsel: Support printing callchains with arrows
> 
> Steven Rostedt (2):
>       tools lib traceevent: Use USECS_PER_SEC instead of hardcoded number
>       tools lib traceevent: Add retrieval of preempt count and latency flags
> 
>  tools/lib/traceevent/event-parse.c          |  41 +-
>  tools/lib/traceevent/event-parse.h          |   5 +-
>  tools/perf/Documentation/perf-c2c.txt       |   8 +
>  tools/perf/Documentation/perf-sched.txt     |  66 +-
>  tools/perf/arch/arm/annotate/instructions.c |  90 +++
>  tools/perf/arch/x86/annotate/instructions.c |  78 +++
>  tools/perf/builtin-c2c.c                    |  80 ++-
>  tools/perf/builtin-sched.c                  | 914 +++++++++++++++++++++++++++-
>  tools/perf/builtin-top.c                    |   2 +-
>  tools/perf/ui/browsers/annotate.c           |   2 +-
>  tools/perf/ui/gtk/annotate.c                |   2 +-
>  tools/perf/util/annotate.c                  | 251 ++++----
>  tools/perf/util/annotate.h                  |   6 +-
>  tools/perf/util/evsel.c                     |   6 +-
>  tools/perf/util/evsel.h                     |   1 +
>  tools/perf/util/evsel_fprintf.c             |  12 +-
>  tools/perf/util/mem-events.c                |  12 +-
>  tools/perf/util/mem-events.h                |   1 +
>  tools/perf/util/symbol.h                    |   3 +-
>  tools/perf/util/symbol_fprintf.c            |  11 +-
>  20 files changed, 1406 insertions(+), 185 deletions(-)
>  create mode 100644 tools/perf/arch/arm/annotate/instructions.c
>  create mode 100644 tools/perf/arch/x86/annotate/instructions.c
> 
>   # uname -a
>   Linux jouet 4.8.6-201.fc24.x86_64 #1 SMP Thu Nov 3 14:38:57 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>   # perf test
>    1: vmlinux symtab matches kallsyms                          : Ok
>    2: detect openat syscall event                              : Ok
>    3: detect openat syscall event on all cpus                  : Ok
>    4: read samples using the mmap interface                    : Ok
>    5: parse events tests                                       : Ok
>    6: Validate PERF_RECORD_* events & perf_sample fields       : Ok
>    7: Test perf pmu format parsing                             : Ok
>    8: Test dso data read                                       : Ok
>    9: Test dso data cache                                      : Ok
>   10: Test dso data reopen                                     : Ok
>   11: roundtrip evsel->name check                              : Ok
>   12: Check parsing of sched tracepoints fields                : Ok
>   13: Generate and check syscalls:sys_enter_openat event fields: Ok
>   14: struct perf_event_attr setup                             : Ok
>   15: Test matching and linking multiple hists                 : Ok
>   16: Try 'import perf' in python, checking link problems      : Ok
>   17: Test breakpoint overflow signal handler                  : Ok
>   18: Test breakpoint overflow sampling                        : Ok
>   19: Test number of exit event of a simple workload           : Ok
>   20: Test software clock events have valid period values      : Ok
>   21: Test object code reading                                 : Ok
>   22: Test sample parsing                                      : Ok
>   23: Test using a dummy software event to keep tracking       : Ok
>   24: Test parsing with no sample_id_all bit set               : Ok
>   25: Test filtering hist entries                              : Ok
>   26: Test mmap thread lookup                                  : Ok
>   27: Test thread mg sharing                                   : Ok
>   28: Test output sorting of hist entries                      : Ok
>   29: Test cumulation of child hist entries                    : Ok
>   30: Test tracking with sched_switch                          : Ok
>   31: Filter fds with revents mask in a fdarray                : Ok
>   32: Add fd to a fdarray, making it autogrow                  : Ok
>   33: Test kmod_path__parse function                           : Ok
>   34: Test thread map                                          : Ok
>   35: Test LLVM searching and compiling                        :
>   35.1: Basic BPF llvm compiling test                          : Ok
>   35.2: Test kbuild searching                                  : Ok
>   35.3: Compile source for BPF prologue generation test        : Ok
>   35.4: Compile source for BPF relocation test                 : Ok
>   36: Test topology in session                                 : Ok
>   37: Test BPF filter                                          :
>   37.1: Test basic BPF filtering                               : Ok
>   37.2: Test BPF prologue generation                           : Ok
>   37.3: Test BPF relocation checker                            : Ok
>   38: Test thread map synthesize                               : Ok
>   39: Test cpu map synthesize                                  : Ok
>   40: Test stat config synthesize                              : Ok
>   41: Test stat synthesize                                     : Ok
>   42: Test stat round synthesize                               : Ok
>   43: Test attr update synthesize                              : Ok
>   44: Test events times                                        : Ok
>   45: Test backward reading from ring buffer                   : Ok
>   46: Test cpu map print                                       : Ok
>   47: Test SDT event probing                                   : Ok
>   48: Test is_printable_array function                         : Ok
>   49: Test bitmap print                                        : Ok
>   50: x86 rdpmc test                                           : Ok
>   51: Test converting perf time to TSC                         : Ok
>   52: Test dwarf unwind                                        : Ok
>   53: Test x86 instruction decoder - new instructions          : Ok
>   54: Test intel cqm nmi context read                          : Skip
>   # 
> 
>   # dm
>    1 alpine:3.4: Ok
>    2 android-ndk:r12b-arm: Ok
>    3 archlinux:latest: Ok
>    4 centos:5: Ok
>    5 centos:6: Ok
>    6 centos:7: Ok
>    7 debian:7: Ok
>    8 debian:8: Ok
>    9 debian:experimental: Ok
>   10 fedora:20: Ok
>   11 fedora:21: Ok
>   12 fedora:22: Ok
>   13 fedora:23: Ok
>   14 fedora:24: Ok
>   15 fedora:24-x-ARC-uClibc: Ok
>   16 fedora:rawhide: Ok
>   17 mageia:5: Ok
>   18 opensuse:13.2: Ok
>   19 opensuse:42.1: Ok
>   20 opensuse:tumbleweed: Ok
>   21 ubuntu:12.04.5: Ok
>   22 ubuntu:14.04: Ok
>   23 ubuntu:14.04.4: Ok
>   24 ubuntu:15.10: Ok
>   25 ubuntu:16.04: Ok
>   26 ubuntu:16.04-x-arm: Ok
>   27 ubuntu:16.04-x-arm64: Ok
>   28 ubuntu:16.04-x-powerpc: Ok
>   29 ubuntu:16.04-x-powerpc64: Ok
>   30 ubuntu:16.04-x-powerpc64el: Ok
>   31 ubuntu:16.04-x-s390: Ok
>   32 ubuntu:16.10: Ok
>   #
> 
>   $ make -C tools/perf build-test
>   make: Entering directory '/home/acme/git/linux/tools/perf'
>   - tarpkg: ./tests/perf-targz-src-pkg .
>              make_no_libperl_O: make NO_LIBPERL=1
>             make_no_demangle_O: make NO_DEMANGLE=1
>              make_no_libnuma_O: make NO_LIBNUMA=1
>             make_install_bin_O: make install-bin
>          make_install_prefix_O: make install prefix=/tmp/krava
>              make_util_map_o_O: make util/map.o
>   make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
>               make_no_libbpf_O: make NO_LIBBPF=1
>                     make_doc_O: make doc
>        make_util_pmu_bison_o_O: make util/pmu-bison.o
>         make_with_babeltrace_O: make LIBBABELTRACE=1
>                   make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
>              make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
>                   make_debug_O: make DEBUG=1
>                  make_perf_o_O: make perf.o
>                make_no_slang_O: make NO_SLANG=1
>                 make_no_newt_O: make NO_NEWT=1
>                 make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1
>                 make_install_O: make install
>             make_no_auxtrace_O: make NO_AUXTRACE=1
>            make_no_backtrace_O: make NO_BACKTRACE=1
>    make_install_prefix_slash_O: make install prefix=/tmp/krava/
>               make_clean_all_O: make clean all
>            make_no_libpython_O: make NO_LIBPYTHON=1
>             make_no_libaudit_O: make NO_LIBAUDIT=1
>            make_no_libbionic_O: make NO_LIBBIONIC=1
>                    make_help_O: make help
>               make_no_libelf_O: make NO_LIBELF=1
>                    make_tags_O: make tags
>            make_no_libunwind_O: make NO_LIBUNWIND=1
>                 make_no_gtk2_O: make NO_GTK2=1
>                    make_pure_O: make
>                  make_static_O: make LDFLAGS=-static
>   OK
>   make: Leaving directory '/home/acme/git/linux/tools/perf'

Pulled, thanks a lot Arnaldo!

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-11-24  4:10 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-23 16:40 [GIT PULL 00/18] perf/core improvements and fixes Arnaldo Carvalho de Melo
2016-11-23 16:40 ` [PATCH 01/18] perf annotate: Start supporting cross arch annotation Arnaldo Carvalho de Melo
2016-11-23 16:40 ` [PATCH 02/18] perf annotate: Allow arches to specify functions to skip Arnaldo Carvalho de Melo
2016-11-23 16:40 ` [PATCH 03/18] perf annotate: Add per arch instructions annotate handlers Arnaldo Carvalho de Melo
2016-11-23 16:40 ` [PATCH 04/18] tools lib traceevent: Use USECS_PER_SEC instead of hardcoded number Arnaldo Carvalho de Melo
2016-11-23 16:40 ` [PATCH 05/18] tools lib traceevent: Add retrieval of preempt count and latency flags Arnaldo Carvalho de Melo
2016-11-23 16:40 ` [PATCH 06/18] perf tools: Show event fd in debug output Arnaldo Carvalho de Melo
2016-11-23 16:40 ` [PATCH 07/18] perf c2c report: Setup browser after opening perf.data Arnaldo Carvalho de Melo
2016-11-23 16:40 ` [PATCH 08/18] perf c2c report: Add -f/--force option Arnaldo Carvalho de Melo
2016-11-23 16:40 ` [PATCH 09/18] perf c2c report: Add struct c2c_stats::tot_hitm field Arnaldo Carvalho de Melo
2016-11-23 16:40 ` [PATCH 10/18] perf c2c report: Display total HITMs on default Arnaldo Carvalho de Melo
2016-11-23 16:40 ` [PATCH 11/18] perf c2c: Support cascading options Arnaldo Carvalho de Melo
2016-11-23 16:40 ` [PATCH 12/18] perf symbols: Print symbol offsets conditionally Arnaldo Carvalho de Melo
2016-11-23 16:40 ` [PATCH 13/18] perf evsel: Support printing callchains with arrows Arnaldo Carvalho de Melo
2016-11-23 16:40 ` [PATCH 14/18] perf sched timehist: Introduce timehist command Arnaldo Carvalho de Melo
2016-11-23 16:40 ` [PATCH 15/18] perf sched timehist: Add summary options Arnaldo Carvalho de Melo
2016-11-23 16:40 ` [PATCH 16/18] perf sched timehist: Add -w/--wakeups option Arnaldo Carvalho de Melo
2016-11-23 16:40 ` [PATCH 17/18] perf sched timehist: Add call graph options Arnaldo Carvalho de Melo
2016-11-23 16:40 ` [PATCH 18/18] perf sched timehist: Add -V/--cpu-visual option Arnaldo Carvalho de Melo
2016-11-24  4:10 ` [GIT PULL 00/18] perf/core improvements and fixes Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).