linux-kernel.vger.kernel.org archive mirror
* [GIT PULL 00/19] perf/core improvements and fixes
@ 2017-03-14 18:50 Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo @ 2017-03-14 18:50 UTC
  To: Ingo Molnar
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Alexander Shishkin,
	Alexei Starovoitov, Ananth N Mavinakayanahalli, Andi Kleen,
	Aravinda Prasad, Brendan Gregg, Changbin Du, Daniel Borkmann,
	Eric Biederman, Feng Tang, Hari Bathini, Jiri Olsa, kernel-team,
	linuxppc-dev, Masami Hiramatsu, Michael Ellerman, Namhyung Kim,
	Naveen N . Rao, Peter Zijlstra, Sargun Dhillon, Steven Rostedt,
	Arnaldo Carvalho de Melo

Hi Ingo,

	Please consider pulling,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit 84e5b549214f2160c12318aac549de85f600c79a:

  Merge tag 'perf-core-for-mingo-4.11-20170306' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2017-03-07 08:14:14 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.12-20170314

for you to fetch changes up to 5f6bee34707973ea7879a7857fd63ddccc92fff3:

  kprobes: Convert kprobe_exceptions_notify to use NOKPROBE_SYMBOL (2017-03-14 15:17:40 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

New features:

- Add PERF_RECORD_NAMESPACES so that the kernel can record information
  required to associate samples to namespaces, helping in container
  problem characterization.

  'perf record' now has a --namespaces option to ask for such info, and
  when present, it can be used, initially, via a new sort order,
  'cgroup_id', allowing histogram entries to be bucketized by a
  (device, inode) based cgroup identifier (Hari Bathini)

- Add --next option to 'perf sched timehist', showing the next thread
  to run (Brendan Gregg)
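A minimal userspace sketch of the (device, inode) identity mentioned above, assuming that stat() on a /proc/<pid>/ns/<type> link (for example /proc/self/ns/cgroup) reports the namespace's st_dev/st_ino pair. struct ns_key, ns_key_from_path() and ns_key_cmp() are hypothetical names, not perf's code; the comparator only shows how such keys could order histogram buckets, device first, then inode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical key: a namespace identified by the (st_dev, st_ino) pair
 * reported by stat() on its /proc/<pid>/ns/<type> link. */
struct ns_key {
	uint64_t dev;
	uint64_t ino;
};

/* Fill @key from the file at @path; returns 0 on success, -1 on error. */
static int ns_key_from_path(const char *path, struct ns_key *key)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	key->dev = (uint64_t)st.st_dev;
	key->ino = (uint64_t)st.st_ino;
	return 0;
}

/* Order keys by device first, then by inode: <0, 0 or >0 like strcmp(). */
static int ns_key_cmp(const struct ns_key *l, const struct ns_key *r)
{
	assert(l != NULL && r != NULL);
	if (l->dev != r->dev)
		return l->dev < r->dev ? -1 : 1;
	if (l->ino != r->ino)
		return l->ino < r->ino ? -1 : 1;
	return 0;
}
```

Calling ns_key_from_path("/proc/self/ns/cgroup", &key) on a kernel with cgroup namespaces would give the current task's key; two tasks in the same cgroup namespace compare equal.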

Fixes:

- Fix segfault with basic block 'cycles' sort dimension (Changbin Du)

- Add c2c to command-list.txt, making it appear in the 'perf help'
  output (Changbin Du)

- Fix zeroing of 'abs_path' variable in the perf hists browser switch
  file code (Changbin Du)

- Hide tip messages when -q/--quiet is given to 'perf report' (Namhyung Kim)

Infrastructure:

- Use ref_reloc_sym + offset to setup kretprobes (Naveen Rao)

- Ignore generated files pmu-events/{jevents,pmu-events.c} for git (Changbin Du)
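For the kretprobe item, a sketch of what a probe definition relative to the relocation symbol could look like, assuming the kprobe_events syntax "r:GRP/EVENT SYM+offs" and a kernel whose tracing README advertises offset support for kretprobes. format_kretprobe() and the group, event, symbol and offset values used with it are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a kprobe_events line for a return probe at @reloc_sym + @offset,
 * e.g. "r:probe/myfunc__return _text+0x1234".  Returns snprintf()'s
 * result (the untruncated length) or a negative value on error. */
static int format_kretprobe(char *buf, size_t size, const char *group,
			    const char *event, const char *reloc_sym,
			    unsigned long offset)
{
	assert(buf != NULL && group != NULL && event != NULL &&
	       reloc_sym != NULL);
	return snprintf(buf, size, "r:%s/%s %s+0x%lx",
			group, event, reloc_sym, offset);
}
```

For example, format_kretprobe(buf, sizeof(buf), "probe", "myfunc__return", "_text", 0x1234) produces "r:probe/myfunc__return _text+0x1234".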

Documentation:

- Document +field style argument support for --field option (Changbin Du)

- Clarify 'perf c2c --stats' help message (Namhyung Kim)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Brendan Gregg (1):
      perf sched timehist: Add --next option

Changbin Du (5):
      perf tools: Missing c2c command in command-list
      perf tools: Ignore generated files pmu-events/{jevents,pmu-events.c} for git
      perf sort: Fix segfault with basic block 'cycles' sort dimension
      perf report: Document +field style argument support for --field option
      perf hists browser: Fix typo in function switch_data_file

Hari Bathini (5):
      perf: Add PERF_RECORD_NAMESPACES to include namespaces related info
      perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info
      perf record: Synthesize namespace events for current processes
      perf script: Add script print support for namespace events
      perf tools: Add 'cgroup_id' sort order keyword

Namhyung Kim (3):
      perf report: Hide tip message when -q option is given
      perf c2c: Clarify help message of --stats option
      perf c2c: Fix display bug when using pipe

Naveen N. Rao (5):
      perf probe: Factor out the ftrace README scanning
      perf kretprobes: Offset from reloc_sym if kernel supports it
      perf powerpc: Choose local entry point with kretprobes
      doc: trace/kprobes: add information about NOKPROBE_SYMBOL
      kprobes: Convert kprobe_exceptions_notify to use NOKPROBE_SYMBOL

 Documentation/trace/kprobetrace.txt         |   5 +-
 include/linux/perf_event.h                  |   2 +
 include/uapi/linux/perf_event.h             |  32 +++++-
 kernel/events/core.c                        | 139 ++++++++++++++++++++++++++
 kernel/fork.c                               |   2 +
 kernel/kprobes.c                            |   5 +-
 kernel/nsproxy.c                            |   3 +
 tools/include/uapi/linux/perf_event.h       |  32 +++++-
 tools/perf/.gitignore                       |   2 +
 tools/perf/Documentation/perf-record.txt    |   3 +
 tools/perf/Documentation/perf-report.txt    |   7 +-
 tools/perf/Documentation/perf-sched.txt     |   4 +
 tools/perf/Documentation/perf-script.txt    |   3 +
 tools/perf/arch/powerpc/util/sym-handling.c |  14 ++-
 tools/perf/builtin-annotate.c               |   1 +
 tools/perf/builtin-c2c.c                    |   4 +-
 tools/perf/builtin-diff.c                   |   1 +
 tools/perf/builtin-inject.c                 |  13 +++
 tools/perf/builtin-kmem.c                   |   1 +
 tools/perf/builtin-kvm.c                    |   2 +
 tools/perf/builtin-lock.c                   |   1 +
 tools/perf/builtin-mem.c                    |   1 +
 tools/perf/builtin-record.c                 |  35 ++++++-
 tools/perf/builtin-report.c                 |   4 +-
 tools/perf/builtin-sched.c                  |  26 ++++-
 tools/perf/builtin-script.c                 |  41 ++++++++
 tools/perf/builtin-trace.c                  |   3 +-
 tools/perf/command-list.txt                 |   1 +
 tools/perf/perf.h                           |   1 +
 tools/perf/ui/browsers/hists.c              |   2 +-
 tools/perf/util/Build                       |   1 +
 tools/perf/util/data-convert-bt.c           |   1 +
 tools/perf/util/event.c                     | 150 ++++++++++++++++++++++++++--
 tools/perf/util/event.h                     |  19 ++++
 tools/perf/util/evsel.c                     |   3 +
 tools/perf/util/hist.c                      |   7 ++
 tools/perf/util/hist.h                      |   1 +
 tools/perf/util/machine.c                   |  34 +++++++
 tools/perf/util/machine.h                   |   3 +
 tools/perf/util/namespaces.c                |  36 +++++++
 tools/perf/util/namespaces.h                |  26 +++++
 tools/perf/util/probe-event.c               |  12 +--
 tools/perf/util/probe-file.c                |  77 ++++++++------
 tools/perf/util/probe-file.h                |   1 +
 tools/perf/util/session.c                   |   7 ++
 tools/perf/util/sort.c                      |  46 +++++++++
 tools/perf/util/sort.h                      |   7 ++
 tools/perf/util/thread.c                    |  44 +++++++-
 tools/perf/util/thread.h                    |   6 ++
 tools/perf/util/tool.h                      |   2 +
 50 files changed, 799 insertions(+), 74 deletions(-)
 create mode 100644 tools/perf/util/namespaces.c
 create mode 100644 tools/perf/util/namespaces.h

Test results:

The first ones are container (docker) based builds of tools/perf with and
without libelf support, plus objtool, where supported, and samples/bpf/, built
the same way. Where clang is available, it is also used to build perf with and
without libelf.

Several are cross builds (the ones with -x-ARCH, plus the android one), and
those may not have all features built, due to the lack of multi-arch devel
packages, which are available and in use so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' one performs a variety of tests exercising tools/perf/util/,
tools/lib/{bpf,traceevent,etc}, and also runs perf commands with a variety of
command line event specifications, intercepting the sys_perf_event_open
syscall to check that the perf_event_attr fields are set up as expected, among
a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, which build
tools/perf/ with a variety of feature sets, exercising the build with an
incomplete set of features as well as with a complete one. The plan is to run
them on each of the containers mentioned above, using some container
orchestration infrastructure. Get in contact if you are interested in helping
to put this in place.

  # dm
   1 alpine:3.4: Ok
   2 alpine:3.5: Ok
   3 alpine:edge: Ok
   4 android-ndk:r12b-arm: Ok
   5 archlinux:latest: Ok
   6 centos:5: Ok
   7 centos:6: Ok
   8 centos:7: Ok
   9 debian:7: Ok
  10 debian:8: Ok
  11 debian:experimental: Ok
  12 debian:experimental-x-arm64: Ok
  13 debian:experimental-x-mips: Ok
  14 debian:experimental-x-mips64: Ok
  15 debian:experimental-x-mipsel: Ok
  16 fedora:20: Ok
  17 fedora:21: Ok
  18 fedora:22: Ok
  19 fedora:23: Ok
  20 fedora:24: Ok
  21 fedora:24-x-ARC-uClibc: Ok
  22 fedora:25: Ok
  23 fedora:rawhide: Ok
  24 mageia:5: Ok
  25 opensuse:13.2: Ok
  26 opensuse:42.1: Ok
  27 opensuse:tumbleweed: Ok
  28 ubuntu:12.04.5: Ok
  29 ubuntu:14.04.4: Ok
  30 ubuntu:14.04.4-x-linaro-arm64: Ok
  31 ubuntu:15.10: Ok
  32 ubuntu:16.04: Ok
  33 ubuntu:16.04-x-arm: Ok
  34 ubuntu:16.04-x-arm64: Ok
  35 ubuntu:16.04-x-powerpc: Ok
  36 ubuntu:16.04-x-powerpc64: Ok
  37 ubuntu:16.04-x-s390: Ok
  38 ubuntu:16.10: Ok
  39 ubuntu:17.04: Ok
  #

  # uname -a
  Linux zoo 4.9.13-100.fc24.x86_64 #1 SMP Mon Feb 27 16:57:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  # perf test
   1: vmlinux symtab matches kallsyms            : Ok
   2: Detect openat syscall event                : Ok
   3: Detect openat syscall event on all cpus    : Ok
   4: Read samples using the mmap interface      : Ok
   5: Parse event definition strings             : Ok
   6: PERF_RECORD_* events & perf_sample fields  : Ok
   7: Parse perf pmu format                      : Ok
   8: DSO data read                              : Ok
   9: DSO data cache                             : Ok
  10: DSO data reopen                            : Ok
  11: Roundtrip evsel->name                      : Ok
  12: Parse sched tracepoints fields             : Ok
  13: syscalls:sys_enter_openat event fields     : Ok
  14: Setup struct perf_event_attr               : Ok
  15: Match and link multiple hists              : Ok
  16: 'import perf' in python                    : Ok
  17: Breakpoint overflow signal handler         : Ok
  18: Breakpoint overflow sampling               : Ok
  19: Number of exit events of a simple workload : Ok
  20: Software clock events period values        : Ok
  21: Object code reading                        : Ok
  22: Sample parsing                             : Ok
  23: Use a dummy software event to keep tracking: Ok
  24: Parse with no sample_id_all bit set        : Ok
  25: Filter hist entries                        : Ok
  26: Lookup mmap thread                         : Ok
  27: Share thread mg                            : Ok
  28: Sort output of hist entries                : Ok
  29: Cumulate child hist entries                : Ok
  30: Track with sched_switch                    : Ok
  31: Filter fds with revents mask in a fdarray  : Ok
  32: Add fd to a fdarray, making it autogrow    : Ok
  33: kmod_path__parse                           : Ok
  34: Thread map                                 : Ok
  35: LLVM search and compile                    :
  35.1: Basic BPF llvm compile                    : Ok
  35.2: kbuild searching                          : Ok
  35.3: Compile source for BPF prologue generation: Ok
  35.4: Compile source for BPF relocation         : Ok
  36: Session topology                           : Ok
  37: BPF filter                                 :
  37.1: Basic BPF filtering                      : Ok
  37.2: BPF pinning                              : Ok
  37.3: BPF prologue generation                  : Ok
  37.4: BPF relocation checker                   : Ok
  38: Synthesize thread map                      : Ok
  39: Remove thread map                          : Ok
  40: Synthesize cpu map                         : Ok
  41: Synthesize stat config                     : Ok
  42: Synthesize stat                            : Ok
  43: Synthesize stat round                      : Ok
  44: Synthesize attr update                     : Ok
  45: Event times                                : Ok
  46: Read backward ring buffer                  : Ok
  47: Print cpu map                              : Ok
  48: Probe SDT events                           : Ok
  49: is_printable_array                         : Ok
  50: Print bitmap                               : Ok
  51: perf hooks                                 : Ok
  52: builtin clang support                      : Skip (not compiled in)
  53: unit_number__scnprintf                     : Ok
  54: x86 rdpmc                                  : Ok
  55: Convert perf time to TSC                   : Ok
  56: DWARF unwind                               : Ok
  57: x86 instruction decoder - new instructions : Ok
  58: Intel cqm nmi context read                 : Skip
  # 

  $ make -C tools/perf build-test
  make: Entering directory '/home/acme/git/linux/tools/perf'
  - tarpkg: ./tests/perf-targz-src-pkg .
                  make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
                  make_debug_O: make DEBUG=1
              make_no_libelf_O: make NO_LIBELF=1
           make_no_libbionic_O: make NO_LIBBIONIC=1
            make_no_libaudit_O: make NO_LIBAUDIT=1
                   make_pure_O: make
              make_no_libbpf_O: make NO_LIBBPF=1
                   make_tags_O: make tags
        make_with_babeltrace_O: make LIBBABELTRACE=1
         make_with_clangllvm_O: make LIBCLANGLLVM=1
            make_no_auxtrace_O: make NO_AUXTRACE=1
                 make_perf_o_O: make perf.o
            make_no_demangle_O: make NO_DEMANGLE=1
              make_clean_all_O: make clean all
               make_no_slang_O: make NO_SLANG=1
                    make_doc_O: make doc
                make_no_newt_O: make NO_NEWT=1
           make_no_libpython_O: make NO_LIBPYTHON=1
       make_util_pmu_bison_o_O: make util/pmu-bison.o
            make_install_bin_O: make install-bin
                make_no_gtk2_O: make NO_GTK2=1
           make_no_backtrace_O: make NO_BACKTRACE=1
             make_no_libnuma_O: make NO_LIBNUMA=1
  make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
             make_util_map_o_O: make util/map.o
             make_no_libperl_O: make NO_LIBPERL=1
                 make_static_O: make LDFLAGS=-static
           make_no_libunwind_O: make NO_LIBUNWIND=1
                make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1
                   make_help_O: make help
             make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
  OK
  $


* [PATCH 01/19] perf report: Hide tip message when -q option is given
From: Arnaldo Carvalho de Melo @ 2017-03-14 18:50 UTC
  To: Ingo Molnar
  Cc: linux-kernel, Namhyung Kim, Jiri Olsa, Peter Zijlstra,
	kernel-team, Arnaldo Carvalho de Melo

From: Namhyung Kim <namhyung@kernel.org>

The tip message at the end was printed regardless of the -q option.

Originally, the message only suggested the '-s comm,dso' option for a
higher-level view when neither a sort option nor a parent option was
given.

Now it shows a random help message regardless of the options, so the
condition can be simplified to honor the -q option.

Committer notes:

Before:

  $ perf report --stdio -q
    42.77%  ls       ls                [.] _init
    13.21%  ls       ld-2.24.so        [.] match_symbol
    12.55%  ls       libc-2.24.so      [.] __strcoll_l
    11.94%  ls       libc-2.24.so      [.] _init

  #
  # (Tip: Show current config key-value pairs: perf config --list)
  #
  $

After:

  $ perf report --stdio -q
    42.77%  ls       ls                [.] _init
    13.21%  ls       ld-2.24.so        [.] match_symbol
    12.55%  ls       libc-2.24.so      [.] __strcoll_l
    11.94%  ls       libc-2.24.so      [.] _init

  $

We still have those two extra blank lines, though (which git commit
insists on turning into one, or which git commit --amend doesn't let me
add); food for another patch...

Reported-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@lge.com
Link: http://lkml.kernel.org/r/20170307150851.22304-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-report.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 0a88670e56f3..f03a5eac2a62 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -394,8 +394,7 @@ static int perf_evlist__tty_browse_hists(struct perf_evlist *evlist,
 		fprintf(stdout, "\n\n");
 	}
 
-	if (sort_order == NULL &&
-	    parent_pattern == default_parent_pattern)
+	if (!quiet)
 		fprintf(stdout, "#\n# (%s)\n#\n", help);
 
 	if (rep->show_threads) {
-- 
2.9.3

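A small sketch of the simplified condition in this patch: the tip block is emitted unless quiet mode is on, independent of the sort or parent options. format_tip() is a hypothetical helper that formats into a buffer instead of writing to stdout, so the behaviour is easy to check; the "#\n# (%s)\n#\n" format is the one used in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Format the trailing tip as "#\n# (<help>)\n#\n" unless @quiet is set.
 * Returns snprintf()'s result (the untruncated length), or 0 when quiet. */
static int format_tip(char *buf, size_t size, const char *help, bool quiet)
{
	assert(buf != NULL && size > 0);
	buf[0] = '\0';
	if (quiet)	/* -q/--quiet: print nothing */
		return 0;
	return snprintf(buf, size, "#\n# (%s)\n#\n", help);
}
```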

* [PATCH 02/19] perf c2c: Clarify help message of --stats option
From: Arnaldo Carvalho de Melo @ 2017-03-14 18:50 UTC
  To: Ingo Molnar
  Cc: linux-kernel, Namhyung Kim, Peter Zijlstra, kernel-team,
	Arnaldo Carvalho de Melo

From: Namhyung Kim <namhyung@kernel.org>

The --stats option does not strictly ask for stdio output only, but it
implies using it.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@lge.com
Link: http://lkml.kernel.org/r/20170307150851.22304-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-c2c.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index e2b21723bbf8..3fac30ed92f1 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2536,7 +2536,7 @@ static int perf_c2c__report(int argc, const char **argv)
 	OPT_BOOLEAN(0, "stdio", &c2c.use_stdio, "Use the stdio interface"),
 #endif
 	OPT_BOOLEAN(0, "stats", &c2c.stats_only,
-		    "Use the stdio interface"),
+		    "Display only statistic tables (implies --stdio)"),
 	OPT_BOOLEAN(0, "full-symbols", &c2c.symbol_full,
 		    "Display full length of symbols"),
 	OPT_BOOLEAN(0, "no-source", &no_source,
-- 
2.9.3


* [PATCH 03/19] perf c2c: Fix display bug when using pipe
From: Arnaldo Carvalho de Melo @ 2017-03-14 18:50 UTC
  To: Ingo Molnar
  Cc: linux-kernel, Namhyung Kim, Peter Zijlstra, kernel-team,
	Arnaldo Carvalho de Melo

From: Namhyung Kim <namhyung@kernel.org>

Currently 'perf c2c report' determines the display mode using the
--stdio option, but this is a problem when stdout is not a tty, since
setup_browser() falls back to stdio in that case.

But perf c2c didn't know this and tried to use the TUI browser anyway.
It should check the "use_browser" variable instead.

For example, the following command showed nothing and broke the
terminal settings.  Now it's fixed:

  $ perf c2c report | head
  =================================================
              Trace Event Information
  =================================================
    Total records                     :        136
    Locked Load/Store Operations      :          6
    Load Operations                   :         62
    Loads - uncacheable               :          0
    Loads - IO                        :          1
    Loads - Miss                      :          7
    Loads - no mapping                :          2

Committer notes:

Running it without a proper perf.data file results in a stuck
terminal, just as Namhyung reported above:

  [acme@jouet ~]$ perf c2c report | head
  WARNING: no sample cpu value[acme@jouet ~]$

One has to kill it from another xterm. Confirming that this patch fixes
it:

After:

  $ perf c2c report | head
  WARNING: no sample cpu value=================================================
              Trace Event Information
  =================================================
    Total records                     :         14
    Locked Load/Store Operations      :          0
    Load Operations                   :          0
    Loads - uncacheable               :          0
    Loads - IO                        :          0
    Loads - Miss                      :          0
    Loads - no mapping                :          0
  $

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@lge.com
Link: http://lkml.kernel.org/r/20170307150851.22304-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-c2c.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 3fac30ed92f1..5cd6d7a047b9 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2334,7 +2334,7 @@ static int perf_c2c__hists_browse(struct hists *hists)
 
 static void perf_c2c_display(struct perf_session *session)
 {
-	if (c2c.use_stdio)
+	if (use_browser == 0)
 		perf_c2c__hists_fprintf(stdout, session);
 	else
 		perf_c2c__hists_browse(&c2c.hists.hists);
-- 
2.9.3

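The decision this patch relies on can be modeled as below, assuming, as the commit message says, that setup_browser() falls back to stdio when stdout is not a tty. decide_use_browser() and decide_use_browser_for_stdout() are hypothetical; perf also has a GTK mode (use_browser == 2), which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical model: the TUI (1) is used only when stdio was not
 * requested AND stdout is a terminal; otherwise use stdio (0). */
static int decide_use_browser(bool want_stdio, bool stdout_is_tty)
{
	if (want_stdio || !stdout_is_tty)
		return 0;	/* stdio output */
	return 1;		/* TUI browser */
}

/* Same decision, but querying the real terminal state of stdout. */
static int decide_use_browser_for_stdout(bool want_stdio)
{
	return decide_use_browser(want_stdio, isatty(STDOUT_FILENO) == 1);
}
```

With `perf c2c report | head`, stdout is a pipe, so decide_use_browser_for_stdout(false) returns 0 and the stdio path is taken, which is why checking use_browser rather than c2c.use_stdio fixes the hang.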

* [PATCH 04/19] perf tools: Missing c2c command in command-list
From: Arnaldo Carvalho de Melo @ 2017-03-14 18:50 UTC
  To: Ingo Molnar
  Cc: linux-kernel, Changbin Du, Peter Zijlstra, Arnaldo Carvalho de Melo

From: Changbin Du <changbin.du@intel.com>

Add the c2c command to command-list.txt so perf help can list this
command.

Committer notes:

Before:

  # perf help | grep c2c
  #

After:

  # perf help | grep c2c
     c2c             Shared Data C2C/HITM Analyzer.
  #

Signed-off-by: Changbin Du <changbin.du@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170313082845.23373-1-changbin.du@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/command-list.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt
index ac3efd396a72..2d0caf20ff3a 100644
--- a/tools/perf/command-list.txt
+++ b/tools/perf/command-list.txt
@@ -9,6 +9,7 @@ perf-buildid-cache		mainporcelain common
 perf-buildid-list		mainporcelain common
 perf-data			mainporcelain common
 perf-diff			mainporcelain common
+perf-c2c			mainporcelain common
 perf-config			mainporcelain common
 perf-evlist			mainporcelain common
 perf-ftrace			mainporcelain common
-- 
2.9.3

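command-list.txt is line oriented: a perf-<name> token followed by whitespace-separated categories, as in the hunk above. perf's build has its own generator for turning it into the help tables, assumed here to list the entries tagged "common"; the C parser below, parse_cmdlist_line(), is a hypothetical illustration of reading one line and detecting that tag, not that generator.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Parse one "perf-<name> <category> ..." line.  Copies <name> (without the
 * "perf-" prefix) into @name and sets *is_common if one of the categories
 * is exactly "common".  Returns 0 on success, -1 if the line does not
 * match or @name is too small. */
static int parse_cmdlist_line(const char *line, char *name, size_t size,
			      bool *is_common)
{
	const char *p;
	size_t len;

	assert(line && name && is_common);
	if (strncmp(line, "perf-", 5) != 0)
		return -1;
	p = line + 5;
	len = strcspn(p, " \t\n");
	if (len == 0 || len >= size)
		return -1;
	memcpy(name, p, len);
	name[len] = '\0';

	/* Walk the remaining whitespace-separated category tokens. */
	*is_common = false;
	for (p += len; *p != '\0'; p += len) {
		p += strspn(p, " \t\n");
		len = strcspn(p, " \t\n");
		if (len == 6 && strncmp(p, "common", 6) == 0)
			*is_common = true;
	}
	return 0;
}
```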

* [PATCH 05/19] perf tools: Ignore generated files pmu-events/{jevents,pmu-events.c} for git
From: Arnaldo Carvalho de Melo @ 2017-03-14 18:50 UTC
  To: Ingo Molnar
  Cc: linux-kernel, Changbin Du, Peter Zijlstra, Arnaldo Carvalho de Melo

From: Changbin Du <changbin.du@intel.com>

Ignore two files: pmu-events/{jevents,pmu-events.c} which are generated
during the build.

Committer notes:

Testing it:

  $ make -C tools/perf/
  $ git status
  On branch perf/core
  Untracked files:
  (use "git add <file>..." to include in what will be committed)

	tools/perf/pmu-events/jevents
	tools/perf/pmu-events/pmu-events.c

  nothing added to commit but untracked files present (use "git add" to track)
  $

After the patch:

  $ git status
  On branch perf/core
  nothing to commit, working tree clean
  $

Signed-off-by: Changbin Du <changbin.du@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170313083026.23487-1-changbin.du@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/.gitignore | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/.gitignore b/tools/perf/.gitignore
index 3db3db9278be..643cc4ba6872 100644
--- a/tools/perf/.gitignore
+++ b/tools/perf/.gitignore
@@ -31,3 +31,5 @@ config.mak.autogen
 .config-detected
 util/intel-pt-decoder/inat-tables.c
 arch/*/include/generated/
+pmu-events/pmu-events.c
+pmu-events/jevents
-- 
2.9.3


* [PATCH 06/19] perf sort: Fix segfault with basic block 'cycles' sort dimension
From: Arnaldo Carvalho de Melo @ 2017-03-14 18:50 UTC
  To: Ingo Molnar
  Cc: linux-kernel, Changbin Du, Andi Kleen, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Changbin Du <changbin.du@intel.com>

Skip samples that don't have branch_info, to avoid a segmentation
fault.

The fault can be reproduced with:

  perf record -a
  perf report -F cycles

Signed-off-by: Changbin Du <changbin.du@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 0e332f033a82 ("perf tools: Add support for cycles, weight branch_info field")
Link: http://lkml.kernel.org/r/20170313083148.23568-1-changbin.du@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/sort.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index f8f16c0e20b6..93f755ac60ca 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -846,6 +846,9 @@ static int hist_entry__mispredict_snprintf(struct hist_entry *he, char *bf,
 static int64_t
 sort__cycles_cmp(struct hist_entry *left, struct hist_entry *right)
 {
+	if (!left->branch_info || !right->branch_info)
+		return cmp_null(left->branch_info, right->branch_info);
+
 	return left->branch_info->flags.cycles -
 		right->branch_info->flags.cycles;
 }
@@ -853,6 +856,8 @@ sort__cycles_cmp(struct hist_entry *left, struct hist_entry *right)
 static int hist_entry__cycles_snprintf(struct hist_entry *he, char *bf,
 				    size_t size, unsigned int width)
 {
+	if (!he->branch_info)
+		return scnprintf(bf, size, "%-.*s", width, "N/A");
 	if (he->branch_info->flags.cycles == 0)
 		return repsep_snprintf(bf, size, "%-*s", width, "-");
 	return repsep_snprintf(bf, size, "%-*hd", width,
-- 
2.9.3

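A self-contained sketch of the NULL-safe comparison added above. The structures are trimmed, hypothetical stand-ins for perf's hist_entry and branch_info, and cmp_null_sketch() assumes the ordering that perf's cmp_null() provides here: two NULLs compare equal and NULL sorts before non-NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed, hypothetical stand-ins for perf's structures. */
struct branch_flags { uint16_t cycles; };
struct branch_info  { struct branch_flags flags; };
struct hist_entry   { struct branch_info *branch_info; };

/* NULL-aware ordering: two NULLs are equal, NULL sorts before non-NULL. */
static int64_t cmp_null_sketch(const void *l, const void *r)
{
	if (!l && !r)
		return 0;
	return !l ? -1 : 1;
}

/* Compare by cycles, but never dereference a missing branch_info. */
static int64_t cycles_cmp_sketch(const struct hist_entry *left,
				 const struct hist_entry *right)
{
	if (!left->branch_info || !right->branch_info)
		return cmp_null_sketch(left->branch_info, right->branch_info);
	return (int64_t)left->branch_info->flags.cycles -
	       right->branch_info->flags.cycles;
}
```

Samples recorded without branch stacks (as with plain `perf record -a`) end up with a NULL branch_info and now sort together instead of crashing the comparison.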

* [PATCH 07/19] perf report: Document +field style argument support for --field option
From: Arnaldo Carvalho de Melo @ 2017-03-14 18:50 UTC
  To: Ingo Molnar
  Cc: linux-kernel, Changbin Du, Peter Zijlstra, Arnaldo Carvalho de Melo

From: Changbin Du <changbin.du@intel.com>

Commit 2f3f9bcf000b ("perf tools: Add +field argument support for
--field option") by Jiri Olsa <jolsa@kernel.org> introduced +field style
argument support for --field option.

This is useful, but the documentation was not updated.  This adds a
short description there.

Signed-off-by: Changbin Du <changbin.du@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170313083252.23644-1-changbin.du@intel.com
[ Slightly improved the phrase structure ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-report.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 33f91906f5dc..672b149aa80a 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -173,6 +173,9 @@ OPTIONS
 	By default, every sort keys not specified in -F will be appended
 	automatically.
 
+	If the keys starts with a prefix '+', then it will append the specified
+        field(s) to the default field order. For example: perf report -F +period,sample.
+
 -p::
 --parent=<regex>::
         A regex filter to identify parent. The parent is a caller of this
-- 
2.9.3

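The '+' prefix documented above can be sketched as string expansion: a leading '+' appends the listed fields to the default order, otherwise the argument replaces it. expand_fields() and the "overhead,comm" default used in the example are hypothetical; the real default field order depends on perf's configuration and on the sort keys in use.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Expand an -F/--fields argument into @out.  A leading '+' appends the
 * listed fields to @defaults; otherwise the argument replaces them.
 * Returns 0 on success, -1 if @out is too small. */
static int expand_fields(const char *defaults, const char *arg,
			 char *out, size_t size)
{
	int n;

	assert(defaults != NULL && arg != NULL && out != NULL);
	if (arg[0] == '+')	/* "+a,b": append to the defaults */
		n = snprintf(out, size, "%s,%s", defaults, arg + 1);
	else			/* "a,b": replace the defaults */
		n = snprintf(out, size, "%s", arg);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For example, expand_fields("overhead,comm", "+period,sample", buf, sizeof(buf)) yields "overhead,comm,period,sample".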

* [PATCH 08/19] perf hists browser: Fix typo in function switch_data_file
  2017-03-14 18:50 [GIT PULL 00/19] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (6 preceding siblings ...)
  2017-03-14 18:50 ` [PATCH 07/19] perf report: Document +field style argument support for --field option Arnaldo Carvalho de Melo
@ 2017-03-14 18:50 ` Arnaldo Carvalho de Melo
  2017-03-14 18:50 ` [PATCH 09/19] perf: Add PERF_RECORD_NAMESPACES to include namespaces related info Arnaldo Carvalho de Melo
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-03-14 18:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Changbin Du, Feng Tang, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Changbin Du <changbin.du@intel.com>

The second memset() should clear the buffer 'abs_path', not 'options'.
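One way to avoid this class of copy-paste bug is to name the array only once when clearing it, so the pointer and the size always refer to the same object. A minimal sketch follows; the macro, struct and capacity are hypothetical stand-ins, since the hunk does not show how the two buffers are declared in switch_data_file().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Clear an array using its own size. Because the object is named only
 * once, the buffer passed to memset() and the size can no longer come
 * from two different variables. Only valid for true arrays, not for
 * pointers to their first element.
 */
#define ZERO_ARRAY(a) memset((a), 0, sizeof(a))

#define NR_ENTRIES 32	/* hypothetical capacity */

/* Hypothetical stand-in for the two buffers in switch_data_file(). */
struct data_file_menu {
	char *options[NR_ENTRIES];
	char *abs_path[NR_ENTRIES];
};

static void data_file_menu__init(struct data_file_menu *m)
{
	assert(m != NULL);
	ZERO_ARRAY(m->options);
	ZERO_ARRAY(m->abs_path);
}
```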

Signed-off-by: Changbin Du <changbin.du@intel.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 341487ab561f ("perf hists browser: Add option for runtime switching perf data file")
Link: http://lkml.kernel.org/r/20170313114652.9207-1-changbin.du@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/ui/browsers/hists.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index fc4fb669ceee..2dc82bec10c0 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -2308,7 +2308,7 @@ static int switch_data_file(void)
 		return ret;
 
 	memset(options, 0, sizeof(options));
-	memset(options, 0, sizeof(abs_path));
+	memset(abs_path, 0, sizeof(abs_path));
 
 	while ((dent = readdir(pwd_dir))) {
 		char path[PATH_MAX];
-- 
2.9.3

* [PATCH 09/19] perf: Add PERF_RECORD_NAMESPACES to include namespaces related info
  2017-03-14 18:50 [GIT PULL 00/19] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (7 preceding siblings ...)
  2017-03-14 18:50 ` [PATCH 08/19] perf hists browser: Fix typo in function switch_data_file Arnaldo Carvalho de Melo
@ 2017-03-14 18:50 ` Arnaldo Carvalho de Melo
  2017-03-14 18:50 ` [PATCH 10/19] perf tools: " Arnaldo Carvalho de Melo
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-03-14 18:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Hari Bathini, Alexander Shishkin,
	Alexei Starovoitov, Ananth N Mavinakayanahalli, Aravinda Prasad,
	Brendan Gregg, Daniel Borkmann, Eric Biederman, Sargun Dhillon,
	Steven Rostedt, Arnaldo Carvalho de Melo

From: Hari Bathini <hbathini@linux.vnet.ibm.com>

With the advent of container technologies like Docker, which depend on
namespaces for isolation, there is a need for tracing support for
namespaces. This patch introduces a new PERF_RECORD_NAMESPACES event for
recording namespace-related info. By recording info for every
namespace, it is left to userspace to decide on the definition of a
container and to trace containers by updating the perf tool accordingly.

Each namespace is identified by a combination of device and inode
numbers. Although every namespace currently has the same device number,
that may change in the future to avoid the need for a namespace of
namespaces. To allow for that possibility, record both the device and
inode numbers separately for each namespace.
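A minimal userspace sketch of where such a (device, inode) pair comes from, assuming a Linux system with /proc mounted: each /proc/<pid>/ns/<name> entry resolves to an nsfs inode, and stat() on it returns a device and inode that should correspond to what the kernel records here (the kernel side encodes the device with new_encode_dev()). The struct and helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical mirror of the uapi struct perf_ns_link_info. */
struct ns_link_info {
	uint64_t dev;
	uint64_t ino;
};

/*
 * Look up the (device, inode) pair identifying namespace 'name'
 * ("net", "uts", "ipc", "pid", "user", "mnt", "cgroup") of the
 * calling process. stat() follows the /proc symlink to the nsfs
 * inode. Returns 0 on success, -1 if the file cannot be stat()ed
 * (e.g. unknown name, or namespace type not supported).
 */
static int read_ns_link_info(const char *name, struct ns_link_info *info)
{
	char path[64];
	struct stat st;

	assert(name != NULL && info != NULL);

	snprintf(path, sizeof(path), "/proc/self/ns/%s", name);
	if (stat(path, &st) != 0)
		return -1;

	info->dev = (uint64_t)st.st_dev;
	info->ino = (uint64_t)st.st_ino;
	return 0;
}
```

Two processes share, say, a network namespace exactly when both fields match for "net"; comparing the inode alone would rely on all namespaces sharing one device number, which is the assumption this patch avoids baking in.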

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/148891929686.25309.2827618988917007768.stgit@hbathini.in.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 include/linux/perf_event.h      |   2 +
 include/uapi/linux/perf_event.h |  32 ++++++++-
 kernel/events/core.c            | 139 ++++++++++++++++++++++++++++++++++++++++
 kernel/fork.c                   |   2 +
 kernel/nsproxy.c                |   3 +
 5 files changed, 177 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 000fdb211c7d..f19a82362851 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1112,6 +1112,7 @@ extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks
 
 extern void perf_event_exec(void);
 extern void perf_event_comm(struct task_struct *tsk, bool exec);
+extern void perf_event_namespaces(struct task_struct *tsk);
 extern void perf_event_fork(struct task_struct *tsk);
 
 /* Callchains */
@@ -1315,6 +1316,7 @@ static inline int perf_unregister_guest_info_callbacks
 static inline void perf_event_mmap(struct vm_area_struct *vma)		{ }
 static inline void perf_event_exec(void)				{ }
 static inline void perf_event_comm(struct task_struct *tsk, bool exec)	{ }
+static inline void perf_event_namespaces(struct task_struct *tsk)	{ }
 static inline void perf_event_fork(struct task_struct *tsk)		{ }
 static inline void perf_event_init(void)				{ }
 static inline int  perf_swevent_get_recursion_context(void)		{ return -1; }
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index c66a485a24ac..bec0aad0e15c 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -344,7 +344,8 @@ struct perf_event_attr {
 				use_clockid    :  1, /* use @clockid for time fields */
 				context_switch :  1, /* context switch data */
 				write_backward :  1, /* Write ring buffer from end to beginning */
-				__reserved_1   : 36;
+				namespaces     :  1, /* include namespaces data */
+				__reserved_1   : 35;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -610,6 +611,23 @@ struct perf_event_header {
 	__u16	size;
 };
 
+struct perf_ns_link_info {
+	__u64	dev;
+	__u64	ino;
+};
+
+enum {
+	NET_NS_INDEX		= 0,
+	UTS_NS_INDEX		= 1,
+	IPC_NS_INDEX		= 2,
+	PID_NS_INDEX		= 3,
+	USER_NS_INDEX		= 4,
+	MNT_NS_INDEX		= 5,
+	CGROUP_NS_INDEX		= 6,
+
+	NR_NAMESPACES,		/* number of available namespaces */
+};
+
 enum perf_event_type {
 
 	/*
@@ -862,6 +880,18 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_SWITCH_CPU_WIDE		= 15,
 
+	/*
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u32				pid;
+	 *	u32				tid;
+	 *	u64				nr_namespaces;
+	 *	{ u64				dev, inode; } [nr_namespaces];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_NAMESPACES			= 16,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6f41548f2e32..16c877a121c8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -48,6 +48,8 @@
 #include <linux/parser.h>
 #include <linux/sched/clock.h>
 #include <linux/sched/mm.h>
+#include <linux/proc_ns.h>
+#include <linux/mount.h>
 
 #include "internal.h"
 
@@ -379,6 +381,7 @@ static DEFINE_PER_CPU(struct pmu_event_list, pmu_sb_events);
 
 static atomic_t nr_mmap_events __read_mostly;
 static atomic_t nr_comm_events __read_mostly;
+static atomic_t nr_namespaces_events __read_mostly;
 static atomic_t nr_task_events __read_mostly;
 static atomic_t nr_freq_events __read_mostly;
 static atomic_t nr_switch_events __read_mostly;
@@ -3991,6 +3994,8 @@ static void unaccount_event(struct perf_event *event)
 		atomic_dec(&nr_mmap_events);
 	if (event->attr.comm)
 		atomic_dec(&nr_comm_events);
+	if (event->attr.namespaces)
+		atomic_dec(&nr_namespaces_events);
 	if (event->attr.task)
 		atomic_dec(&nr_task_events);
 	if (event->attr.freq)
@@ -6491,6 +6496,7 @@ static void perf_event_task(struct task_struct *task,
 void perf_event_fork(struct task_struct *task)
 {
 	perf_event_task(task, NULL, 1);
+	perf_event_namespaces(task);
 }
 
 /*
@@ -6593,6 +6599,132 @@ void perf_event_comm(struct task_struct *task, bool exec)
 }
 
 /*
+ * namespaces tracking
+ */
+
+struct perf_namespaces_event {
+	struct task_struct		*task;
+
+	struct {
+		struct perf_event_header	header;
+
+		u32				pid;
+		u32				tid;
+		u64				nr_namespaces;
+		struct perf_ns_link_info	link_info[NR_NAMESPACES];
+	} event_id;
+};
+
+static int perf_event_namespaces_match(struct perf_event *event)
+{
+	return event->attr.namespaces;
+}
+
+static void perf_event_namespaces_output(struct perf_event *event,
+					 void *data)
+{
+	struct perf_namespaces_event *namespaces_event = data;
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	int ret;
+
+	if (!perf_event_namespaces_match(event))
+		return;
+
+	perf_event_header__init_id(&namespaces_event->event_id.header,
+				   &sample, event);
+	ret = perf_output_begin(&handle, event,
+				namespaces_event->event_id.header.size);
+	if (ret)
+		return;
+
+	namespaces_event->event_id.pid = perf_event_pid(event,
+							namespaces_event->task);
+	namespaces_event->event_id.tid = perf_event_tid(event,
+							namespaces_event->task);
+
+	perf_output_put(&handle, namespaces_event->event_id);
+
+	perf_event__output_id_sample(event, &handle, &sample);
+
+	perf_output_end(&handle);
+}
+
+static void perf_fill_ns_link_info(struct perf_ns_link_info *ns_link_info,
+				   struct task_struct *task,
+				   const struct proc_ns_operations *ns_ops)
+{
+	struct path ns_path;
+	struct inode *ns_inode;
+	void *error;
+
+	error = ns_get_path(&ns_path, task, ns_ops);
+	if (!error) {
+		ns_inode = ns_path.dentry->d_inode;
+		ns_link_info->dev = new_encode_dev(ns_inode->i_sb->s_dev);
+		ns_link_info->ino = ns_inode->i_ino;
+	}
+}
+
+void perf_event_namespaces(struct task_struct *task)
+{
+	struct perf_namespaces_event namespaces_event;
+	struct perf_ns_link_info *ns_link_info;
+
+	if (!atomic_read(&nr_namespaces_events))
+		return;
+
+	namespaces_event = (struct perf_namespaces_event){
+		.task	= task,
+		.event_id  = {
+			.header = {
+				.type = PERF_RECORD_NAMESPACES,
+				.misc = 0,
+				.size = sizeof(namespaces_event.event_id),
+			},
+			/* .pid */
+			/* .tid */
+			.nr_namespaces = NR_NAMESPACES,
+			/* .link_info[NR_NAMESPACES] */
+		},
+	};
+
+	ns_link_info = namespaces_event.event_id.link_info;
+
+	perf_fill_ns_link_info(&ns_link_info[MNT_NS_INDEX],
+			       task, &mntns_operations);
+
+#ifdef CONFIG_USER_NS
+	perf_fill_ns_link_info(&ns_link_info[USER_NS_INDEX],
+			       task, &userns_operations);
+#endif
+#ifdef CONFIG_NET_NS
+	perf_fill_ns_link_info(&ns_link_info[NET_NS_INDEX],
+			       task, &netns_operations);
+#endif
+#ifdef CONFIG_UTS_NS
+	perf_fill_ns_link_info(&ns_link_info[UTS_NS_INDEX],
+			       task, &utsns_operations);
+#endif
+#ifdef CONFIG_IPC_NS
+	perf_fill_ns_link_info(&ns_link_info[IPC_NS_INDEX],
+			       task, &ipcns_operations);
+#endif
+#ifdef CONFIG_PID_NS
+	perf_fill_ns_link_info(&ns_link_info[PID_NS_INDEX],
+			       task, &pidns_operations);
+#endif
+#ifdef CONFIG_CGROUPS
+	perf_fill_ns_link_info(&ns_link_info[CGROUP_NS_INDEX],
+			       task, &cgroupns_operations);
+#endif
+
+	perf_iterate_sb(perf_event_namespaces_output,
+			&namespaces_event,
+			NULL);
+}
+
+/*
  * mmap tracking
  */
 
@@ -9146,6 +9278,8 @@ static void account_event(struct perf_event *event)
 		atomic_inc(&nr_mmap_events);
 	if (event->attr.comm)
 		atomic_inc(&nr_comm_events);
+	if (event->attr.namespaces)
+		atomic_inc(&nr_namespaces_events);
 	if (event->attr.task)
 		atomic_inc(&nr_task_events);
 	if (event->attr.freq)
@@ -9691,6 +9825,11 @@ SYSCALL_DEFINE5(perf_event_open,
 			return -EACCES;
 	}
 
+	if (attr.namespaces) {
+		if (!capable(CAP_SYS_ADMIN))
+			return -EACCES;
+	}
+
 	if (attr.freq) {
 		if (attr.sample_freq > sysctl_perf_event_sample_rate)
 			return -EINVAL;
diff --git a/kernel/fork.c b/kernel/fork.c
index 6c463c80e93d..afa2947286cd 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2352,6 +2352,8 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
 		}
 	}
 
+	perf_event_namespaces(current);
+
 bad_unshare_cleanup_cred:
 	if (new_cred)
 		put_cred(new_cred);
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 782102e59eed..f6c5d330059a 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -26,6 +26,7 @@
 #include <linux/file.h>
 #include <linux/syscalls.h>
 #include <linux/cgroup.h>
+#include <linux/perf_event.h>
 
 static struct kmem_cache *nsproxy_cachep;
 
@@ -262,6 +263,8 @@ SYSCALL_DEFINE2(setns, int, fd, int, nstype)
 		goto out;
 	}
 	switch_task_namespaces(tsk, new_nsproxy);
+
+	perf_event_namespaces(tsk);
 out:
 	fput(file);
 	return err;
-- 
2.9.3

* [PATCH 10/19] perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info
  2017-03-14 18:50 [GIT PULL 00/19] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (8 preceding siblings ...)
  2017-03-14 18:50 ` [PATCH 09/19] perf: Add PERF_RECORD_NAMESPACES to include namespaces related info Arnaldo Carvalho de Melo
@ 2017-03-14 18:50 ` Arnaldo Carvalho de Melo
  2017-03-14 18:50 ` [PATCH 11/19] perf record: Synthesize namespace events for current processes Arnaldo Carvalho de Melo
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-03-14 18:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Hari Bathini, Alexander Shishkin,
	Alexei Starovoitov, Ananth N Mavinakayanahalli, Aravinda Prasad,
	Brendan Gregg, Daniel Borkmann, Eric Biederman, Peter Zijlstra,
	Sargun Dhillon, Steven Rostedt, Arnaldo Carvalho de Melo

From: Hari Bathini <hbathini@linux.vnet.ibm.com>

Introduce a new option to record PERF_RECORD_NAMESPACES events emitted
by the kernel when fork, clone, setns or unshare are invoked, and update
the perf-record documentation to describe the new option.

Committer notes:

Combined it with a later patch to allow printing it via 'perf report -D',
so that the feature introduced in this patch can be tested. Also had to
move perf_ns__name(), which was introduced in another later patch, here.

Also used PRIu64 and PRIx64 to fix the build in some environments, which failed with:

  util/event.c:1129:39: error: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'long long unsigned int' [-Werror=format=]
     ret  += fprintf(fp, "%u/%s: %lu/0x%lx%s", idx
                                         ^
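The error arises because uint64_t is 'unsigned long' on LP64 targets but 'unsigned long long' on 32-bit ones, so a hard-coded "%lu"/"%lx" only matches one of them. A small sketch of the portable pattern used in the fix; the helper name is hypothetical, while the format string follows the one in perf_event__fprintf_namespaces() below.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one namespace entry as "idx/name: dev/0xino". The
 * <inttypes.h> macros expand to the correct length modifier for
 * uint64_t on both 32-bit and 64-bit targets.
 */
static int format_ns_entry(char *buf, size_t len, unsigned int idx,
			   const char *name, uint64_t dev, uint64_t ino)
{
	assert(buf != NULL && name != NULL);
	return snprintf(buf, len, "%u/%s: %" PRIu64 "/%#" PRIx64,
			idx, name, dev, ino);
}
```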
Testing it:

  # perf record --namespaces -a
  ^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 1.083 MB perf.data (423 samples) ]
  #
  # perf report -D
  <SNIP>
  3 2028902078892 0x115140 [0xa0]: PERF_RECORD_NAMESPACES 14783/14783 - nr_namespaces: 7
                [0/net: 3/0xf0000081, 1/uts: 3/0xeffffffe, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc,
                 4/user: 3/0xeffffffd, 5/mnt: 3/0xf0000000, 6/cgroup: 3/0xeffffffb]

  0x1151e0 [0x30]: event: 9
  .
  . ... raw event: size 48 bytes
  .  0000:  09 00 00 00 02 00 30 00 c4 71 82 68 0c 7f 00 00  ......0..q.h....
  .  0010:  a9 39 00 00 a9 39 00 00 94 28 fe 63 d8 01 00 00  .9...9...(.c....
  .  0020:  03 00 00 00 00 00 00 00 ce c4 02 00 00 00 00 00  ................
  <SNIP>
        NAMESPACES events:          1
  <SNIP>
  #
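For readers writing their own consumers, here is a minimal sketch of decoding the fixed part of a PERF_RECORD_NAMESPACES record according to the layout documented in perf_event.h (header, pid, tid, nr_namespaces, then nr_namespaces {dev, inode} pairs). It assumes the record is in host byte order and ignores any trailing sample_id; the local struct and function names are hypothetical, not perf's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the ABI layout; names are hypothetical. */
struct rec_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

struct rec_ns_link_info {
	uint64_t dev;
	uint64_t ino;
};

#define REC_NAMESPACES	16	/* PERF_RECORD_NAMESPACES */
#define REC_MAX_NS	7	/* NR_NAMESPACES as of this patch */

struct decoded_namespaces {
	uint32_t pid, tid;
	uint64_t nr;
	struct rec_ns_link_info link_info[REC_MAX_NS];
};

/*
 * Decode header, pid, tid, nr_namespaces and the link_info array.
 * memcpy() is used so unaligned input buffers are handled safely.
 * Returns 0 on success, -1 on a malformed or unexpected record.
 */
static int decode_namespaces(const void *buf, size_t len,
			     struct decoded_namespaces *out)
{
	const unsigned char *p = buf;
	struct rec_header hdr;
	size_t off = sizeof(hdr), need;

	if (len < sizeof(hdr) + 16)
		return -1;
	memcpy(&hdr, p, sizeof(hdr));
	if (hdr.type != REC_NAMESPACES || hdr.size > len)
		return -1;

	memcpy(&out->pid, p + off, 4);	off += 4;
	memcpy(&out->tid, p + off, 4);	off += 4;
	memcpy(&out->nr, p + off, 8);	off += 8;

	if (out->nr > REC_MAX_NS)
		return -1;
	need = off + out->nr * sizeof(struct rec_ns_link_info);
	if (need > hdr.size)
		return -1;

	memcpy(out->link_info, p + off,
	       out->nr * sizeof(struct rec_ns_link_info));
	return 0;
}
```

With seven namespaces the fixed part is 8 + 8 + 8 + 7 * 16 = 136 bytes; the 0xa0 (160) byte record in the dump above presumably also carries a 24-byte sample_id.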

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/148891930386.25309.18412039920746995488.stgit@hbathini.in.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/include/uapi/linux/perf_event.h    | 32 +++++++++++++++++-
 tools/perf/Documentation/perf-record.txt |  3 ++
 tools/perf/builtin-annotate.c            |  1 +
 tools/perf/builtin-diff.c                |  1 +
 tools/perf/builtin-inject.c              | 13 ++++++++
 tools/perf/builtin-kmem.c                |  1 +
 tools/perf/builtin-kvm.c                 |  2 ++
 tools/perf/builtin-lock.c                |  1 +
 tools/perf/builtin-mem.c                 |  1 +
 tools/perf/builtin-record.c              |  6 ++++
 tools/perf/builtin-report.c              |  1 +
 tools/perf/builtin-sched.c               |  1 +
 tools/perf/builtin-script.c              |  1 +
 tools/perf/builtin-trace.c               |  3 +-
 tools/perf/perf.h                        |  1 +
 tools/perf/util/Build                    |  1 +
 tools/perf/util/data-convert-bt.c        |  1 +
 tools/perf/util/event.c                  | 56 ++++++++++++++++++++++++++++++++
 tools/perf/util/event.h                  | 13 ++++++++
 tools/perf/util/evsel.c                  |  3 ++
 tools/perf/util/machine.c                | 34 +++++++++++++++++++
 tools/perf/util/machine.h                |  3 ++
 tools/perf/util/namespaces.c             | 36 ++++++++++++++++++++
 tools/perf/util/namespaces.h             | 26 +++++++++++++++
 tools/perf/util/session.c                |  7 ++++
 tools/perf/util/thread.c                 | 44 +++++++++++++++++++++++--
 tools/perf/util/thread.h                 |  6 ++++
 tools/perf/util/tool.h                   |  2 ++
 28 files changed, 296 insertions(+), 4 deletions(-)
 create mode 100644 tools/perf/util/namespaces.c
 create mode 100644 tools/perf/util/namespaces.h

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index c66a485a24ac..bec0aad0e15c 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -344,7 +344,8 @@ struct perf_event_attr {
 				use_clockid    :  1, /* use @clockid for time fields */
 				context_switch :  1, /* context switch data */
 				write_backward :  1, /* Write ring buffer from end to beginning */
-				__reserved_1   : 36;
+				namespaces     :  1, /* include namespaces data */
+				__reserved_1   : 35;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
@@ -610,6 +611,23 @@ struct perf_event_header {
 	__u16	size;
 };
 
+struct perf_ns_link_info {
+	__u64	dev;
+	__u64	ino;
+};
+
+enum {
+	NET_NS_INDEX		= 0,
+	UTS_NS_INDEX		= 1,
+	IPC_NS_INDEX		= 2,
+	PID_NS_INDEX		= 3,
+	USER_NS_INDEX		= 4,
+	MNT_NS_INDEX		= 5,
+	CGROUP_NS_INDEX		= 6,
+
+	NR_NAMESPACES,		/* number of available namespaces */
+};
+
 enum perf_event_type {
 
 	/*
@@ -862,6 +880,18 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_SWITCH_CPU_WIDE		= 15,
 
+	/*
+	 * struct {
+	 *	struct perf_event_header	header;
+	 *	u32				pid;
+	 *	u32				tid;
+	 *	u64				nr_namespaces;
+	 *	{ u64				dev, inode; } [nr_namespaces];
+	 *	struct sample_id		sample_id;
+	 * };
+	 */
+	PERF_RECORD_NAMESPACES			= 16,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index b16003ec14a7..ea3789d05e5e 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -347,6 +347,9 @@ Enable weightened sampling. An additional weight is recorded per sample and can
 displayed with the weight and local_weight sort keys.  This currently works for TSX
 abort events and some memory events in precise mode on modern Intel CPUs.
 
+--namespaces::
+Record events of type PERF_RECORD_NAMESPACES.
+
 --transaction::
 Record transaction flags for transaction related events.
 
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 4f52d85f5ebc..e54b1f9fe1ee 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -393,6 +393,7 @@ int cmd_annotate(int argc, const char **argv, const char *prefix __maybe_unused)
 			.comm	= perf_event__process_comm,
 			.exit	= perf_event__process_exit,
 			.fork	= perf_event__process_fork,
+			.namespaces = perf_event__process_namespaces,
 			.ordered_events = true,
 			.ordering_requires_timestamps = true,
 		},
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 1b96a3122228..5e4803158672 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -364,6 +364,7 @@ static struct perf_tool tool = {
 	.exit	= perf_event__process_exit,
 	.fork	= perf_event__process_fork,
 	.lost	= perf_event__process_lost,
+	.namespaces = perf_event__process_namespaces,
 	.ordered_events = true,
 	.ordering_requires_timestamps = true,
 };
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index b9bc7e39833a..8d1d13b9bab6 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -333,6 +333,18 @@ static int perf_event__repipe_comm(struct perf_tool *tool,
 	return err;
 }
 
+static int perf_event__repipe_namespaces(struct perf_tool *tool,
+					 union perf_event *event,
+					 struct perf_sample *sample,
+					 struct machine *machine)
+{
+	int err = perf_event__process_namespaces(tool, event, sample, machine);
+
+	perf_event__repipe(tool, event, sample, machine);
+
+	return err;
+}
+
 static int perf_event__repipe_exit(struct perf_tool *tool,
 				   union perf_event *event,
 				   struct perf_sample *sample,
@@ -660,6 +672,7 @@ static int __cmd_inject(struct perf_inject *inject)
 		session->itrace_synth_opts = &inject->itrace_synth_opts;
 		inject->itrace_synth_opts.inject = true;
 		inject->tool.comm	    = perf_event__repipe_comm;
+		inject->tool.namespaces	    = perf_event__repipe_namespaces;
 		inject->tool.exit	    = perf_event__repipe_exit;
 		inject->tool.id_index	    = perf_event__repipe_id_index;
 		inject->tool.auxtrace_info  = perf_event__process_auxtrace_info;
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 6da8d083e4e5..d509e74bc6e8 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -964,6 +964,7 @@ static struct perf_tool perf_kmem = {
 	.comm		 = perf_event__process_comm,
 	.mmap		 = perf_event__process_mmap,
 	.mmap2		 = perf_event__process_mmap2,
+	.namespaces	 = perf_event__process_namespaces,
 	.ordered_events	 = true,
 };
 
diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 08fa88f62a24..18e6c38864bc 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -1044,6 +1044,7 @@ static int read_events(struct perf_kvm_stat *kvm)
 	struct perf_tool eops = {
 		.sample			= process_sample_event,
 		.comm			= perf_event__process_comm,
+		.namespaces		= perf_event__process_namespaces,
 		.ordered_events		= true,
 	};
 	struct perf_data_file file = {
@@ -1348,6 +1349,7 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
 	kvm->tool.exit   = perf_event__process_exit;
 	kvm->tool.fork   = perf_event__process_fork;
 	kvm->tool.lost   = process_lost_event;
+	kvm->tool.namespaces  = perf_event__process_namespaces;
 	kvm->tool.ordered_events = true;
 	perf_tool__fill_defaults(&kvm->tool);
 
diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
index ce3bfb48b26f..d750ccaa978f 100644
--- a/tools/perf/builtin-lock.c
+++ b/tools/perf/builtin-lock.c
@@ -858,6 +858,7 @@ static int __cmd_report(bool display_info)
 	struct perf_tool eops = {
 		.sample		 = process_sample_event,
 		.comm		 = perf_event__process_comm,
+		.namespaces	 = perf_event__process_namespaces,
 		.ordered_events	 = true,
 	};
 	struct perf_data_file file = {
diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c
index 6114e07ca613..030a6cfdda59 100644
--- a/tools/perf/builtin-mem.c
+++ b/tools/perf/builtin-mem.c
@@ -342,6 +342,7 @@ int cmd_mem(int argc, const char **argv, const char *prefix __maybe_unused)
 			.lost		= perf_event__process_lost,
 			.fork		= perf_event__process_fork,
 			.build_id	= perf_event__process_build_id,
+			.namespaces	= perf_event__process_namespaces,
 			.ordered_events	= true,
 		},
 		.input_name		 = "perf.data",
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index bc84a375295d..99562c7242b6 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -876,6 +876,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	signal(SIGTERM, sig_handler);
 	signal(SIGSEGV, sigsegv_handler);
 
+	if (rec->opts.record_namespaces)
+		tool->namespace_events = true;
+
 	if (rec->opts.auxtrace_snapshot_mode || rec->switch_output.enabled) {
 		signal(SIGUSR2, snapshot_sig_handler);
 		if (rec->opts.auxtrace_snapshot_mode)
@@ -1497,6 +1500,7 @@ static struct record record = {
 		.fork		= perf_event__process_fork,
 		.exit		= perf_event__process_exit,
 		.comm		= perf_event__process_comm,
+		.namespaces	= perf_event__process_namespaces,
 		.mmap		= perf_event__process_mmap,
 		.mmap2		= perf_event__process_mmap2,
 		.ordered_events	= true,
@@ -1611,6 +1615,8 @@ static struct option __record_options[] = {
 			  "opts", "AUX area tracing Snapshot Mode", ""),
 	OPT_UINTEGER(0, "proc-map-timeout", &record.opts.proc_map_timeout,
 			"per thread proc mmap processing timeout in ms"),
+	OPT_BOOLEAN(0, "namespaces", &record.opts.record_namespaces,
+		    "Record namespaces events"),
 	OPT_BOOLEAN(0, "switch-events", &record.opts.record_switch_events,
 		    "Record context switch events"),
 	OPT_BOOLEAN_FLAG(0, "all-kernel", &record.opts.all_kernel,
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index f03a5eac2a62..5ab8117c3bfd 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -700,6 +700,7 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 			.mmap		 = perf_event__process_mmap,
 			.mmap2		 = perf_event__process_mmap2,
 			.comm		 = perf_event__process_comm,
+			.namespaces	 = perf_event__process_namespaces,
 			.exit		 = perf_event__process_exit,
 			.fork		 = perf_event__process_fork,
 			.lost		 = perf_event__process_lost,
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index b94cf0de715a..16170e9b47e6 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -3272,6 +3272,7 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 		.tool = {
 			.sample		 = perf_sched__process_tracepoint_sample,
 			.comm		 = perf_event__process_comm,
+			.namespaces	 = perf_event__process_namespaces,
 			.lost		 = perf_event__process_lost,
 			.fork		 = perf_sched__process_fork_event,
 			.ordered_events = true,
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index c0783b4f7b6c..f1ce806a1f31 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2097,6 +2097,7 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 			.mmap		 = perf_event__process_mmap,
 			.mmap2		 = perf_event__process_mmap2,
 			.comm		 = perf_event__process_comm,
+			.namespaces	 = perf_event__process_namespaces,
 			.exit		 = perf_event__process_exit,
 			.fork		 = perf_event__process_fork,
 			.attr		 = process_attr,
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 256f1fac6f7e..912fedc5b42d 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -2415,8 +2415,9 @@ static int trace__replay(struct trace *trace)
 	trace->tool.exit	  = perf_event__process_exit;
 	trace->tool.fork	  = perf_event__process_fork;
 	trace->tool.attr	  = perf_event__process_attr;
-	trace->tool.tracing_data = perf_event__process_tracing_data;
+	trace->tool.tracing_data  = perf_event__process_tracing_data;
 	trace->tool.build_id	  = perf_event__process_build_id;
+	trace->tool.namespaces	  = perf_event__process_namespaces;
 
 	trace->tool.ordered_events = true;
 	trace->tool.ordering_requires_timestamps = true;
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 1c27d947c2fe..806c216a1078 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -50,6 +50,7 @@ struct record_opts {
 	bool	     running_time;
 	bool	     full_auxtrace;
 	bool	     auxtrace_snapshot_mode;
+	bool	     record_namespaces;
 	bool	     record_switch_events;
 	bool	     all_kernel;
 	bool	     all_user;
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 5da376bc1afc..2ea5ee179a3b 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -42,6 +42,7 @@ libperf-y += pstack.o
 libperf-y += session.o
 libperf-$(CONFIG_AUDIT) += syscalltbl.o
 libperf-y += ordered-events.o
+libperf-y += namespaces.o
 libperf-y += comm.o
 libperf-y += thread.o
 libperf-y += thread_map.o
diff --git a/tools/perf/util/data-convert-bt.c b/tools/perf/util/data-convert-bt.c
index 4e6cbc99f08e..89ece2445713 100644
--- a/tools/perf/util/data-convert-bt.c
+++ b/tools/perf/util/data-convert-bt.c
@@ -1468,6 +1468,7 @@ int bt_convert__perf2ctf(const char *input, const char *path,
 			.lost            = perf_event__process_lost,
 			.tracing_data    = perf_event__process_tracing_data,
 			.build_id        = perf_event__process_build_id,
+			.namespaces      = perf_event__process_namespaces,
 			.ordered_events  = true,
 			.ordering_requires_timestamps = true,
 		},
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 4ea7ce72ed9c..fb52819023c7 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -31,6 +31,7 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_LOST_SAMPLES]		= "LOST_SAMPLES",
 	[PERF_RECORD_SWITCH]			= "SWITCH",
 	[PERF_RECORD_SWITCH_CPU_WIDE]		= "SWITCH_CPU_WIDE",
+	[PERF_RECORD_NAMESPACES]		= "NAMESPACES",
 	[PERF_RECORD_HEADER_ATTR]		= "ATTR",
 	[PERF_RECORD_HEADER_EVENT_TYPE]		= "EVENT_TYPE",
 	[PERF_RECORD_HEADER_TRACING_DATA]	= "TRACING_DATA",
@@ -49,6 +50,16 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_TIME_CONV]			= "TIME_CONV",
 };
 
+static const char *perf_ns__names[] = {
+	[NET_NS_INDEX]		= "net",
+	[UTS_NS_INDEX]		= "uts",
+	[IPC_NS_INDEX]		= "ipc",
+	[PID_NS_INDEX]		= "pid",
+	[USER_NS_INDEX]		= "user",
+	[MNT_NS_INDEX]		= "mnt",
+	[CGROUP_NS_INDEX]	= "cgroup",
+};
+
 const char *perf_event__name(unsigned int id)
 {
 	if (id >= ARRAY_SIZE(perf_event__names))
@@ -58,6 +69,13 @@ const char *perf_event__name(unsigned int id)
 	return perf_event__names[id];
 }
 
+static const char *perf_ns__name(unsigned int id)
+{
+	if (id >= ARRAY_SIZE(perf_ns__names))
+		return "UNKNOWN";
+	return perf_ns__names[id];
+}
+
 static int perf_tool__process_synth_event(struct perf_tool *tool,
 					  union perf_event *event,
 					  struct machine *machine,
@@ -1008,6 +1026,33 @@ size_t perf_event__fprintf_comm(union perf_event *event, FILE *fp)
 	return fprintf(fp, "%s: %s:%d/%d\n", s, event->comm.comm, event->comm.pid, event->comm.tid);
 }
 
+size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp)
+{
+	size_t ret = 0;
+	struct perf_ns_link_info *ns_link_info;
+	u32 nr_namespaces, idx;
+
+	ns_link_info = event->namespaces.link_info;
+	nr_namespaces = event->namespaces.nr_namespaces;
+
+	ret += fprintf(fp, " %d/%d - nr_namespaces: %u\n\t\t[",
+		       event->namespaces.pid,
+		       event->namespaces.tid,
+		       nr_namespaces);
+
+	for (idx = 0; idx < nr_namespaces; idx++) {
+		if (idx && (idx % 4 == 0))
+			ret += fprintf(fp, "\n\t\t ");
+
+		ret  += fprintf(fp, "%u/%s: %" PRIu64 "/%#" PRIx64 "%s", idx,
+				perf_ns__name(idx), (u64)ns_link_info[idx].dev,
+				(u64)ns_link_info[idx].ino,
+				((idx + 1) != nr_namespaces) ? ", " : "]\n");
+	}
+
+	return ret;
+}
+
 int perf_event__process_comm(struct perf_tool *tool __maybe_unused,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -1016,6 +1061,14 @@ int perf_event__process_comm(struct perf_tool *tool __maybe_unused,
 	return machine__process_comm_event(machine, event, sample);
 }
 
+int perf_event__process_namespaces(struct perf_tool *tool __maybe_unused,
+				   union perf_event *event,
+				   struct perf_sample *sample,
+				   struct machine *machine)
+{
+	return machine__process_namespaces_event(machine, event, sample);
+}
+
 int perf_event__process_lost(struct perf_tool *tool __maybe_unused,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -1196,6 +1249,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 	case PERF_RECORD_MMAP:
 		ret += perf_event__fprintf_mmap(event, fp);
 		break;
+	case PERF_RECORD_NAMESPACES:
+		ret += perf_event__fprintf_namespaces(event, fp);
+		break;
 	case PERF_RECORD_MMAP2:
 		ret += perf_event__fprintf_mmap2(event, fp);
 		break;
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index c735c53a26f8..b39ff795b9a9 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -39,6 +39,13 @@ struct comm_event {
 	char comm[16];
 };
 
+struct namespaces_event {
+	struct perf_event_header header;
+	u32 pid, tid;
+	u64 nr_namespaces;
+	struct perf_ns_link_info link_info[];
+};
+
 struct fork_event {
 	struct perf_event_header header;
 	u32 pid, ppid;
@@ -485,6 +492,7 @@ union perf_event {
 	struct mmap_event		mmap;
 	struct mmap2_event		mmap2;
 	struct comm_event		comm;
+	struct namespaces_event		namespaces;
 	struct fork_event		fork;
 	struct lost_event		lost;
 	struct lost_samples_event	lost_samples;
@@ -587,6 +595,10 @@ int perf_event__process_switch(struct perf_tool *tool,
 			       union perf_event *event,
 			       struct perf_sample *sample,
 			       struct machine *machine);
+int perf_event__process_namespaces(struct perf_tool *tool,
+				   union perf_event *event,
+				   struct perf_sample *sample,
+				   struct machine *machine);
 int perf_event__process_mmap(struct perf_tool *tool,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -653,6 +665,7 @@ size_t perf_event__fprintf_itrace_start(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_switch(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_cpu_map(union perf_event *event, FILE *fp);
+size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf(union perf_event *event, FILE *fp);
 
 u64 kallsyms__get_function_start(const char *kallsyms_filename,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index ac59710b79e0..175dc2305aa8 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -932,6 +932,9 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts,
 	attr->mmap2 = track && !perf_missing_features.mmap2;
 	attr->comm  = track;
 
+	if (opts->record_namespaces)
+		attr->namespaces  = track;
+
 	if (opts->record_switch_events)
 		attr->context_switch = track;
 
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index b9974fe41bc1..dfc600446586 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -13,6 +13,7 @@
 #include <symbol/kallsyms.h>
 #include "unwind.h"
 #include "linux/hash.h"
+#include "asm/bug.h"
 
 static void __machine__remove_thread(struct machine *machine, struct thread *th, bool lock);
 
@@ -501,6 +502,37 @@ int machine__process_comm_event(struct machine *machine, union perf_event *event
 	return err;
 }
 
+int machine__process_namespaces_event(struct machine *machine __maybe_unused,
+				      union perf_event *event,
+				      struct perf_sample *sample __maybe_unused)
+{
+	struct thread *thread = machine__findnew_thread(machine,
+							event->namespaces.pid,
+							event->namespaces.tid);
+	int err = 0;
+
+	WARN_ONCE(event->namespaces.nr_namespaces > NR_NAMESPACES,
+		  "\nWARNING: kernel seems to support more namespaces than perf"
+		  " tool.\nTry updating the perf tool..\n\n");
+
+	WARN_ONCE(event->namespaces.nr_namespaces < NR_NAMESPACES,
+		  "\nWARNING: perf tool seems to support more namespaces than"
+		  " the kernel.\nTry updating the kernel..\n\n");
+
+	if (dump_trace)
+		perf_event__fprintf_namespaces(event, stdout);
+
+	if (thread == NULL ||
+	    thread__set_namespaces(thread, sample->time, &event->namespaces)) {
+		dump_printf("problem processing PERF_RECORD_NAMESPACES, skipping event.\n");
+		err = -1;
+	}
+
+	thread__put(thread);
+
+	return err;
+}
+
 int machine__process_lost_event(struct machine *machine __maybe_unused,
 				union perf_event *event, struct perf_sample *sample __maybe_unused)
 {
@@ -1538,6 +1570,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
 		ret = machine__process_comm_event(machine, event, sample); break;
 	case PERF_RECORD_MMAP:
 		ret = machine__process_mmap_event(machine, event, sample); break;
+	case PERF_RECORD_NAMESPACES:
+		ret = machine__process_namespaces_event(machine, event, sample); break;
 	case PERF_RECORD_MMAP2:
 		ret = machine__process_mmap2_event(machine, event, sample); break;
 	case PERF_RECORD_FORK:
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index a28305029711..3cdb1340f917 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -97,6 +97,9 @@ int machine__process_itrace_start_event(struct machine *machine,
 					union perf_event *event);
 int machine__process_switch_event(struct machine *machine,
 				  union perf_event *event);
+int machine__process_namespaces_event(struct machine *machine,
+				      union perf_event *event,
+				      struct perf_sample *sample);
 int machine__process_mmap_event(struct machine *machine, union perf_event *event,
 				struct perf_sample *sample);
 int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
diff --git a/tools/perf/util/namespaces.c b/tools/perf/util/namespaces.c
new file mode 100644
index 000000000000..2de8da64d90c
--- /dev/null
+++ b/tools/perf/util/namespaces.c
@@ -0,0 +1,36 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * Copyright (C) 2017 Hari Bathini, IBM Corporation
+ */
+
+#include "namespaces.h"
+#include "util.h"
+#include "event.h"
+#include <stdlib.h>
+#include <stdio.h>
+
+struct namespaces *namespaces__new(struct namespaces_event *event)
+{
+	struct namespaces *namespaces;
+	u64 link_info_size = ((event ? event->nr_namespaces : NR_NAMESPACES) *
+			      sizeof(struct perf_ns_link_info));
+
+	namespaces = zalloc(sizeof(struct namespaces) + link_info_size);
+	if (!namespaces)
+		return NULL;
+
+	namespaces->end_time = -1;
+
+	if (event)
+		memcpy(namespaces->link_info, event->link_info, link_info_size);
+
+	return namespaces;
+}
+
+void namespaces__free(struct namespaces *namespaces)
+{
+	free(namespaces);
+}
diff --git a/tools/perf/util/namespaces.h b/tools/perf/util/namespaces.h
new file mode 100644
index 000000000000..468f1e9a1484
--- /dev/null
+++ b/tools/perf/util/namespaces.h
@@ -0,0 +1,26 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * Copyright (C) 2017 Hari Bathini, IBM Corporation
+ */
+
+#ifndef __PERF_NAMESPACES_H
+#define __PERF_NAMESPACES_H
+
+#include "../perf.h"
+#include <linux/list.h>
+
+struct namespaces_event;
+
+struct namespaces {
+	struct list_head list;
+	u64 end_time;
+	struct perf_ns_link_info link_info[];
+};
+
+struct namespaces *namespaces__new(struct namespaces_event *event);
+void namespaces__free(struct namespaces *namespaces);
+
+#endif  /* __PERF_NAMESPACES_H */
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 1dd617d116b5..ae42e742d461 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1239,6 +1239,8 @@ static int machines__deliver_event(struct machines *machines,
 		return tool->mmap2(tool, event, sample, machine);
 	case PERF_RECORD_COMM:
 		return tool->comm(tool, event, sample, machine);
+	case PERF_RECORD_NAMESPACES:
+		return tool->namespaces(tool, event, sample, machine);
 	case PERF_RECORD_FORK:
 		return tool->fork(tool, event, sample, machine);
 	case PERF_RECORD_EXIT:
@@ -1494,6 +1496,11 @@ int perf_session__register_idle_thread(struct perf_session *session)
 		err = -1;
 	}
 
+	if (thread == NULL || thread__set_namespaces(thread, 0, NULL)) {
+		pr_err("problem inserting idle task.\n");
+		err = -1;
+	}
+
 	/* machine__findnew_thread() got the thread, so put it */
 	thread__put(thread);
 	return err;
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 74e79d26b421..dcdb87a5d0a1 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -7,6 +7,7 @@
 #include "thread-stack.h"
 #include "util.h"
 #include "debug.h"
+#include "namespaces.h"
 #include "comm.h"
 #include "unwind.h"
 
@@ -40,6 +41,7 @@ struct thread *thread__new(pid_t pid, pid_t tid)
 		thread->tid = tid;
 		thread->ppid = -1;
 		thread->cpu = -1;
+		INIT_LIST_HEAD(&thread->namespaces_list);
 		INIT_LIST_HEAD(&thread->comm_list);
 
 		comm_str = malloc(32);
@@ -66,7 +68,8 @@ struct thread *thread__new(pid_t pid, pid_t tid)
 
 void thread__delete(struct thread *thread)
 {
-	struct comm *comm, *tmp;
+	struct namespaces *namespaces, *tmp_namespaces;
+	struct comm *comm, *tmp_comm;
 
 	BUG_ON(!RB_EMPTY_NODE(&thread->rb_node));
 
@@ -76,7 +79,12 @@ void thread__delete(struct thread *thread)
 		map_groups__put(thread->mg);
 		thread->mg = NULL;
 	}
-	list_for_each_entry_safe(comm, tmp, &thread->comm_list, list) {
+	list_for_each_entry_safe(namespaces, tmp_namespaces,
+				 &thread->namespaces_list, list) {
+		list_del(&namespaces->list);
+		namespaces__free(namespaces);
+	}
+	list_for_each_entry_safe(comm, tmp_comm, &thread->comm_list, list) {
 		list_del(&comm->list);
 		comm__free(comm);
 	}
@@ -104,6 +112,38 @@ void thread__put(struct thread *thread)
 	}
 }
 
+struct namespaces *thread__namespaces(const struct thread *thread)
+{
+	if (list_empty(&thread->namespaces_list))
+		return NULL;
+
+	return list_first_entry(&thread->namespaces_list, struct namespaces, list);
+}
+
+int thread__set_namespaces(struct thread *thread, u64 timestamp,
+			   struct namespaces_event *event)
+{
+	struct namespaces *new, *curr = thread__namespaces(thread);
+
+	new = namespaces__new(event);
+	if (!new)
+		return -ENOMEM;
+
+	list_add(&new->list, &thread->namespaces_list);
+
+	if (timestamp && curr) {
+		/*
+		 * setns syscall must have changed few or all the namespaces
+		 * of this thread. Update end time for the namespaces
+		 * previously used.
+		 */
+		curr = list_next_entry(new, list);
+		curr->end_time = timestamp;
+	}
+
+	return 0;
+}
+
 struct comm *thread__comm(const struct thread *thread)
 {
 	if (list_empty(&thread->comm_list))
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index e57188546465..4eb849e9098f 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -28,6 +28,7 @@ struct thread {
 	bool			comm_set;
 	int			comm_len;
 	bool			dead; /* if set thread has exited */
+	struct list_head	namespaces_list;
 	struct list_head	comm_list;
 	u64			db_id;
 
@@ -40,6 +41,7 @@ struct thread {
 };
 
 struct machine;
+struct namespaces;
 struct comm;
 
 struct thread *thread__new(pid_t pid, pid_t tid);
@@ -62,6 +64,10 @@ static inline void thread__exited(struct thread *thread)
 	thread->dead = true;
 }
 
+struct namespaces *thread__namespaces(const struct thread *thread);
+int thread__set_namespaces(struct thread *thread, u64 timestamp,
+			   struct namespaces_event *event);
+
 int __thread__set_comm(struct thread *thread, const char *comm, u64 timestamp,
 		       bool exec);
 static inline int thread__set_comm(struct thread *thread, const char *comm,
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index ac2590a3de2d..829471a1c6d7 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -40,6 +40,7 @@ struct perf_tool {
 	event_op	mmap,
 			mmap2,
 			comm,
+			namespaces,
 			fork,
 			exit,
 			lost,
@@ -66,6 +67,7 @@ struct perf_tool {
 	event_op3	auxtrace;
 	bool		ordered_events;
 	bool		ordering_requires_timestamps;
+	bool		namespace_events;
 };
 
 #endif /* __PERF_TOOL_H */
-- 
2.9.3


* [PATCH 11/19] perf record: Synthesize namespace events for current processes
From: Arnaldo Carvalho de Melo @ 2017-03-14 18:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Hari Bathini, Alexander Shishkin,
	Alexei Starovoitov, Ananth N Mavinakayanahalli, Aravinda Prasad,
	Brendan Gregg, Daniel Borkmann, Eric Biederman, Peter Zijlstra,
	Sargun Dhillon, Steven Rostedt, Arnaldo Carvalho de Melo

From: Hari Bathini <hbathini@linux.vnet.ibm.com>

Synthesize PERF_RECORD_NAMESPACES events for processes that were running prior
to the invocation of perf record. The data for this is taken from /proc/$PID/ns.
These changes pave the way for analyzing events with regard to namespaces.
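As a rough illustration of where that data comes from, here is a minimal standalone sketch (not the perf code itself) mirroring what perf_event__get_ns_link_info() does in the diff below: stat() the /proc/<pid>/ns/<name> link and keep its (st_dev, st_ino) pair. The struct ns_link_info and get_ns_link_info() names are hypothetical; unlike the patch, the sketch uses plain stat() and snprintf() and reports failure to the caller.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical stand-in for struct perf_ns_link_info: a (device, inode) pair. */
struct ns_link_info {
	uint64_t dev;
	uint64_t ino;
};

/*
 * Fill @info with the device and inode numbers of /proc/<pid>/ns/<ns>.
 * stat() (not lstat()) follows the magic link, so the inode is that of the
 * namespace itself, not of the symlink. Returns 0 on success, -1 if the link
 * could not be stat()ed (e.g. the task exited or the namespace type is absent).
 */
int get_ns_link_info(pid_t pid, const char *ns, struct ns_link_info *info)
{
	char path[128];
	struct stat st;

	snprintf(path, sizeof(path), "/proc/%d/ns/%s", (int)pid, ns);
	if (stat(path, &st) != 0)
		return -1;

	info->dev = st.st_dev;
	info->ino = st.st_ino;
	return 0;
}
```

Two tasks sharing a namespace yield the same pair, which is the property that makes the pair usable as a namespace identifier.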

Committer notes:

Check if 'tool' is NULL in perf_event__synthesize_namespaces(), as it is in the
test__mmap_thread_lookup case, i.e. 'perf test "Lookup mmap thread"'.

Testing it:

  # ps axH > /tmp/allthreads
  # perf record -a --namespaces usleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 1.169 MB perf.data (8 samples) ]
  # perf report -D | grep PERF_RECORD_NAMESPACES | wc -l
  602
  # wc -l /tmp/allthreads
  601 /tmp/allthreads
  # tail /tmp/allthreads
  16951 pts/4    T      0:00 git rebase -i a033bf1bfacdaa25642e6bcc857a7d0f67cc3c92^
  16952 pts/4    T      0:00 /bin/sh /usr/libexec/git-core/git-rebase -i a033bf1bfacdaa25642e6bcc857a7d0f67cc3c92^
  17176 pts/4    T      0:00 git commit --amend --no-post-rewrite
  17204 pts/4    T      0:00 vim /home/acme/git/linux/.git/COMMIT_EDITMSG
  18939 ?        S      0:00 [kworker/2:1]
  18947 ?        S      0:00 [kworker/3:0]
  18974 ?        S      0:00 [kworker/1:0]
  19047 ?        S      0:00 [kworker/0:1]
  19152 pts/6    S+     0:00 weechat
  19153 pts/7    R+     0:00 ps axH
  # perf report -D | grep PERF_RECORD_NAMESPACES | tail
  0 0 0x125068 [0xa0]: PERF_RECORD_NAMESPACES 17176/17176 - nr_namespaces: 7
  0 0 0x1255b8 [0xa0]: PERF_RECORD_NAMESPACES 17204/17204 - nr_namespaces: 7
  0 0 0x125df0 [0xa0]: PERF_RECORD_NAMESPACES 18939/18939 - nr_namespaces: 7
  0 0 0x125f00 [0xa0]: PERF_RECORD_NAMESPACES 18947/18947 - nr_namespaces: 7
  0 0 0x126010 [0xa0]: PERF_RECORD_NAMESPACES 18974/18974 - nr_namespaces: 7
  0 0 0x126120 [0xa0]: PERF_RECORD_NAMESPACES 19047/19047 - nr_namespaces: 7
  0 0 0x126230 [0xa0]: PERF_RECORD_NAMESPACES 19152/19152 - nr_namespaces: 7
  0 0 0x129330 [0xa0]: PERF_RECORD_NAMESPACES 19154/19154 - nr_namespaces: 7
  0 0 0x12a1f8 [0xa0]: PERF_RECORD_NAMESPACES 19155/19155 - nr_namespaces: 7
  0 0 0x12b0b8 [0xa0]: PERF_RECORD_NAMESPACES 19155/19155 - nr_namespaces: 7
  #

Hmm, need to investigate why we got two records for the 19155 pid/tid...

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/148891931111.25309.11073854609798681633.stgit@hbathini.in.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-record.c | 29 ++++++++++++--
 tools/perf/util/event.c     | 94 ++++++++++++++++++++++++++++++++++++++++++---
 tools/perf/util/event.h     |  6 +++
 3 files changed, 119 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 99562c7242b6..04faef79a548 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -986,6 +986,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	 */
 	if (forks) {
 		union perf_event *event;
+		pid_t tgid;
 
 		event = malloc(sizeof(event->comm) + machine->id_hdr_size);
 		if (event == NULL) {
@@ -999,10 +1000,30 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 		 * cannot see a correct process name for those events.
 		 * Synthesize COMM event to prevent it.
 		 */
-		perf_event__synthesize_comm(tool, event,
-					    rec->evlist->workload.pid,
-					    process_synthesized_event,
-					    machine);
+		tgid = perf_event__synthesize_comm(tool, event,
+						   rec->evlist->workload.pid,
+						   process_synthesized_event,
+						   machine);
+		free(event);
+
+		if (tgid == -1)
+			goto out_child;
+
+		event = malloc(sizeof(event->namespaces) +
+			       (NR_NAMESPACES * sizeof(struct perf_ns_link_info)) +
+			       machine->id_hdr_size);
+		if (event == NULL) {
+			err = -ENOMEM;
+			goto out_child;
+		}
+
+		/*
+		 * Synthesize NAMESPACES event for the command specified.
+		 */
+		perf_event__synthesize_namespaces(tool, event,
+						  rec->evlist->workload.pid,
+						  tgid, process_synthesized_event,
+						  machine);
 		free(event);
 
 		perf_evlist__start_workload(rec->evlist);
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index fb52819023c7..d082cb70445d 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -221,6 +221,58 @@ pid_t perf_event__synthesize_comm(struct perf_tool *tool,
 	return tgid;
 }
 
+static void perf_event__get_ns_link_info(pid_t pid, const char *ns,
+					 struct perf_ns_link_info *ns_link_info)
+{
+	struct stat64 st;
+	char proc_ns[128];
+
+	sprintf(proc_ns, "/proc/%u/ns/%s", pid, ns);
+	if (stat64(proc_ns, &st) == 0) {
+		ns_link_info->dev = st.st_dev;
+		ns_link_info->ino = st.st_ino;
+	}
+}
+
+int perf_event__synthesize_namespaces(struct perf_tool *tool,
+				      union perf_event *event,
+				      pid_t pid, pid_t tgid,
+				      perf_event__handler_t process,
+				      struct machine *machine)
+{
+	u32 idx;
+	struct perf_ns_link_info *ns_link_info;
+
+	if (!tool || !tool->namespace_events)
+		return 0;
+
+	memset(&event->namespaces, 0, (sizeof(event->namespaces) +
+	       (NR_NAMESPACES * sizeof(struct perf_ns_link_info)) +
+	       machine->id_hdr_size));
+
+	event->namespaces.pid = tgid;
+	event->namespaces.tid = pid;
+
+	event->namespaces.nr_namespaces = NR_NAMESPACES;
+
+	ns_link_info = event->namespaces.link_info;
+
+	for (idx = 0; idx < event->namespaces.nr_namespaces; idx++)
+		perf_event__get_ns_link_info(pid, perf_ns__name(idx),
+					     &ns_link_info[idx]);
+
+	event->namespaces.header.type = PERF_RECORD_NAMESPACES;
+
+	event->namespaces.header.size = (sizeof(event->namespaces) +
+			(NR_NAMESPACES * sizeof(struct perf_ns_link_info)) +
+			machine->id_hdr_size);
+
+	if (perf_tool__process_synth_event(tool, event, machine, process) != 0)
+		return -1;
+
+	return 0;
+}
+
 static int perf_event__synthesize_fork(struct perf_tool *tool,
 				       union perf_event *event,
 				       pid_t pid, pid_t tgid, pid_t ppid,
@@ -452,8 +504,9 @@ int perf_event__synthesize_modules(struct perf_tool *tool,
 static int __event__synthesize_thread(union perf_event *comm_event,
 				      union perf_event *mmap_event,
 				      union perf_event *fork_event,
+				      union perf_event *namespaces_event,
 				      pid_t pid, int full,
-					  perf_event__handler_t process,
+				      perf_event__handler_t process,
 				      struct perf_tool *tool,
 				      struct machine *machine,
 				      bool mmap_data,
@@ -473,6 +526,11 @@ static int __event__synthesize_thread(union perf_event *comm_event,
 		if (tgid == -1)
 			return -1;
 
+		if (perf_event__synthesize_namespaces(tool, namespaces_event, pid,
+						      tgid, process, machine) < 0)
+			return -1;
+
+
 		return perf_event__synthesize_mmap_events(tool, mmap_event, pid, tgid,
 							  process, machine, mmap_data,
 							  proc_map_timeout);
@@ -506,6 +564,11 @@ static int __event__synthesize_thread(union perf_event *comm_event,
 		if (perf_event__synthesize_fork(tool, fork_event, _pid, tgid,
 						ppid, process, machine) < 0)
 			break;
+
+		if (perf_event__synthesize_namespaces(tool, namespaces_event, _pid,
+						      tgid, process, machine) < 0)
+			break;
+
 		/*
 		 * Send the prepared comm event
 		 */
@@ -534,6 +597,7 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
 				      unsigned int proc_map_timeout)
 {
 	union perf_event *comm_event, *mmap_event, *fork_event;
+	union perf_event *namespaces_event;
 	int err = -1, thread, j;
 
 	comm_event = malloc(sizeof(comm_event->comm) + machine->id_hdr_size);
@@ -548,10 +612,16 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
 	if (fork_event == NULL)
 		goto out_free_mmap;
 
+	namespaces_event = malloc(sizeof(namespaces_event->namespaces) +
+				  (NR_NAMESPACES * sizeof(struct perf_ns_link_info)) +
+				  machine->id_hdr_size);
+	if (namespaces_event == NULL)
+		goto out_free_fork;
+
 	err = 0;
 	for (thread = 0; thread < threads->nr; ++thread) {
 		if (__event__synthesize_thread(comm_event, mmap_event,
-					       fork_event,
+					       fork_event, namespaces_event,
 					       thread_map__pid(threads, thread), 0,
 					       process, tool, machine,
 					       mmap_data, proc_map_timeout)) {
@@ -577,7 +647,7 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
 			/* if not, generate events for it */
 			if (need_leader &&
 			    __event__synthesize_thread(comm_event, mmap_event,
-						       fork_event,
+						       fork_event, namespaces_event,
 						       comm_event->comm.pid, 0,
 						       process, tool, machine,
 						       mmap_data, proc_map_timeout)) {
@@ -586,6 +656,8 @@ int perf_event__synthesize_thread_map(struct perf_tool *tool,
 			}
 		}
 	}
+	free(namespaces_event);
+out_free_fork:
 	free(fork_event);
 out_free_mmap:
 	free(mmap_event);
@@ -605,6 +677,7 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
 	char proc_path[PATH_MAX];
 	struct dirent *dirent;
 	union perf_event *comm_event, *mmap_event, *fork_event;
+	union perf_event *namespaces_event;
 	int err = -1;
 
 	if (machine__is_default_guest(machine))
@@ -622,11 +695,17 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
 	if (fork_event == NULL)
 		goto out_free_mmap;
 
+	namespaces_event = malloc(sizeof(namespaces_event->namespaces) +
+				  (NR_NAMESPACES * sizeof(struct perf_ns_link_info)) +
+				  machine->id_hdr_size);
+	if (namespaces_event == NULL)
+		goto out_free_fork;
+
 	snprintf(proc_path, sizeof(proc_path), "%s/proc", machine->root_dir);
 	proc = opendir(proc_path);
 
 	if (proc == NULL)
-		goto out_free_fork;
+		goto out_free_namespaces;
 
 	while ((dirent = readdir(proc)) != NULL) {
 		char *end;
@@ -638,13 +717,16 @@ int perf_event__synthesize_threads(struct perf_tool *tool,
  		 * We may race with exiting thread, so don't stop just because
  		 * one thread couldn't be synthesized.
  		 */
-		__event__synthesize_thread(comm_event, mmap_event, fork_event, pid,
-					   1, process, tool, machine, mmap_data,
+		__event__synthesize_thread(comm_event, mmap_event, fork_event,
+					   namespaces_event, pid, 1, process,
+					   tool, machine, mmap_data,
 					   proc_map_timeout);
 	}
 
 	err = 0;
 	closedir(proc);
+out_free_namespaces:
+	free(namespaces_event);
 out_free_fork:
 	free(fork_event);
 out_free_mmap:
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index b39ff795b9a9..e1d8166ebbd5 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -648,6 +648,12 @@ pid_t perf_event__synthesize_comm(struct perf_tool *tool,
 				  perf_event__handler_t process,
 				  struct machine *machine);
 
+int perf_event__synthesize_namespaces(struct perf_tool *tool,
+				      union perf_event *event,
+				      pid_t pid, pid_t tgid,
+				      perf_event__handler_t process,
+				      struct machine *machine);
+
 int perf_event__synthesize_mmap_events(struct perf_tool *tool,
 				       union perf_event *event,
 				       pid_t pid, pid_t tgid,
-- 
2.9.3


* [PATCH 12/19] perf script: Add script print support for namespace events
From: Arnaldo Carvalho de Melo @ 2017-03-14 18:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Hari Bathini, Alexander Shishkin,
	Alexei Starovoitov, Ananth N Mavinakayanahalli, Aravinda Prasad,
	Brendan Gregg, Daniel Borkmann, Eric Biederman, Peter Zijlstra,
	Sargun Dhillon, Steven Rostedt, Arnaldo Carvalho de Melo

From: Hari Bathini <hbathini@linux.vnet.ibm.com>

Introduce a new option to display events of type PERF_RECORD_NAMESPACES
and update perf-script documentation accordingly.

Shown below is the (trimmed) output of the perf script command with the newly
introduced option, on a perf.data file generated by the perf record command
using the --namespaces option.

  $ perf script --show-namespace-events
      swapper   0 [000]     0.000000: PERF_RECORD_NAMESPACES 1/1 - nr_namespaces: 7
                [0/net: 3/0xf000001c, 1/uts: 3/0xeffffffe, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc,
                 4/user: 3/0xeffffffd, 5/mnt: 3/0xf0000000, 6/cgroup: 3/0xeffffffb]
      swapper   0 [000]     0.000000: PERF_RECORD_NAMESPACES 2/2 - nr_namespaces: 7
                [0/net: 3/0xf000001c, 1/uts: 3/0xeffffffe, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc,
                 4/user: 3/0xeffffffd, 5/mnt: 3/0xf0000000, 6/cgroup: 3/0xeffffffb]
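The bracketed list above comes from the formatting loop in perf_event__fprintf_namespaces(). A minimal sketch of that loop, written against a hypothetical struct ns_link_info rather than perf's own types, shows how each entry is printed as index/name: dev/inode and how a line break is inserted every four entries (perf's tab indentation is simplified to a single space, and an empty list is closed explicitly, which the patch does not do):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct perf_ns_link_info. */
struct ns_link_info {
	uint64_t dev;
	uint64_t ino;
};

/* Same index order as perf_ns__names[] in the patch. */
static const char *ns_names[] = {
	"net", "uts", "ipc", "pid", "user", "mnt", "cgroup",
};

static const char *ns_name(uint32_t idx)
{
	if (idx >= sizeof(ns_names) / sizeof(ns_names[0]))
		return "UNKNOWN";
	return ns_names[idx];
}

/*
 * Print "[idx/name: dev/0xino, ...]\n", breaking the line every 4 entries.
 * Returns the number of bytes written, assuming every fprintf() succeeds.
 */
size_t fprintf_ns_list(FILE *fp, const struct ns_link_info *info, uint32_t nr)
{
	size_t ret = 0;
	uint32_t idx;

	ret += fprintf(fp, "[");
	if (nr == 0)
		return ret + fprintf(fp, "]\n");

	for (idx = 0; idx < nr; idx++) {
		if (idx && idx % 4 == 0)
			ret += fprintf(fp, "\n ");
		ret += fprintf(fp, "%" PRIu32 "/%s: %" PRIu64 "/%#" PRIx64 "%s",
			       idx, ns_name(idx), info[idx].dev, info[idx].ino,
			       (idx + 1 != nr) ? ", " : "]\n");
	}
	return ret;
}
```

With two entries { 3, 0xf000001c } and { 3, 0xeffffffe } this prints "[0/net: 3/0xf000001c, 1/uts: 3/0xeffffffe]" followed by a newline.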

Committer notes:

Testing it:

Investigating that double PERF_RECORD_NAMESPACES for the 19155
pid/tid... It's more than that: there are two PERF_RECORD_COMM as well,
both with zeroed timestamps, so this is probably a synthesizing artifact...

  # perf script --show-task --show-namespace
  <SNIP>
      perf     0 [000]     0.000000: PERF_RECORD_COMM: perf:19154/19154
      perf     0 [000]     0.000000: PERF_RECORD_FORK(19155:19155):(19154:19154)
      perf     0 [000]     0.000000: PERF_RECORD_NAMESPACES 19155/19155 - nr_namespaces: 7
          [0/net: 3/0xf0000081, 1/uts: 3/0xeffffffe, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc,
           4/user: 3/0xeffffffd, 5/mnt: 3/0xf0000000, 6/cgroup: 3/0xeffffffb]
      perf     0 [000]     0.000000: PERF_RECORD_COMM: perf:19155/19155
      perf     0 [000]     0.000000: PERF_RECORD_COMM: perf:19155/19155
      perf     0 [000]     0.000000: PERF_RECORD_NAMESPACES 19155/19155 - nr_namespaces: 7
          [0/net: 3/0xf0000081, 1/uts: 3/0xeffffffe, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc,
           4/user: 3/0xeffffffd, 5/mnt: 3/0xf0000000, 6/cgroup: 3/0xeffffffb]
   swapper     0 [000]  3110.881834:          1 cycles:  ffffffffa7060bf6 native_write_msr (/lib/modules/4.11.0-rc1+/build/vmlinux)

  <SNIP>

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/148891932627.25309.1941587059154176221.stgit@hbathini.in.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-script.txt |  3 +++
 tools/perf/builtin-script.c              | 40 ++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 4ed5f239ba7d..62c9b0c77a3a 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -248,6 +248,9 @@ OPTIONS
 --show-mmap-events
 	Display mmap related events (e.g. MMAP, MMAP2).
 
+--show-namespace-events
+	Display namespace events i.e. events of type PERF_RECORD_NAMESPACES.
+
 --show-switch-events
 	Display context switch events i.e. events of type PERF_RECORD_SWITCH or
 	PERF_RECORD_SWITCH_CPU_WIDE.
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index f1ce806a1f31..66d62c98dff9 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -830,6 +830,7 @@ struct perf_script {
 	bool			show_task_events;
 	bool			show_mmap_events;
 	bool			show_switch_events;
+	bool			show_namespace_events;
 	bool			allocated;
 	struct cpu_map		*cpus;
 	struct thread_map	*threads;
@@ -1118,6 +1119,41 @@ static int process_comm_event(struct perf_tool *tool,
 	return ret;
 }
 
+static int process_namespaces_event(struct perf_tool *tool,
+				    union perf_event *event,
+				    struct perf_sample *sample,
+				    struct machine *machine)
+{
+	struct thread *thread;
+	struct perf_script *script = container_of(tool, struct perf_script, tool);
+	struct perf_session *session = script->session;
+	struct perf_evsel *evsel = perf_evlist__id2evsel(session->evlist, sample->id);
+	int ret = -1;
+
+	thread = machine__findnew_thread(machine, event->namespaces.pid,
+					 event->namespaces.tid);
+	if (thread == NULL) {
+		pr_debug("problem processing NAMESPACES event, skipping it.\n");
+		return -1;
+	}
+
+	if (perf_event__process_namespaces(tool, event, sample, machine) < 0)
+		goto out;
+
+	if (!evsel->attr.sample_id_all) {
+		sample->cpu = 0;
+		sample->time = 0;
+		sample->tid = event->namespaces.tid;
+		sample->pid = event->namespaces.pid;
+	}
+	print_sample_start(sample, thread, evsel);
+	perf_event__fprintf(event, stdout);
+	ret = 0;
+out:
+	thread__put(thread);
+	return ret;
+}
+
 static int process_fork_event(struct perf_tool *tool,
 			      union perf_event *event,
 			      struct perf_sample *sample,
@@ -1293,6 +1329,8 @@ static int __cmd_script(struct perf_script *script)
 	}
 	if (script->show_switch_events)
 		script->tool.context_switch = process_switch_event;
+	if (script->show_namespace_events)
+		script->tool.namespaces = process_namespaces_event;
 
 	ret = perf_session__process_events(script->session);
 
@@ -2181,6 +2219,8 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 		    "Show the mmap events"),
 	OPT_BOOLEAN('\0', "show-switch-events", &script.show_switch_events,
 		    "Show context switch events (if recorded)"),
+	OPT_BOOLEAN('\0', "show-namespace-events", &script.show_namespace_events,
+		    "Show namespace events (if recorded)"),
 	OPT_BOOLEAN('f', "force", &symbol_conf.force, "don't complain, do it"),
 	OPT_BOOLEAN(0, "ns", &nanosecs,
 		    "Use 9 decimal places when displaying time"),
-- 
2.9.3

* [PATCH 13/19] perf tools: Add 'cgroup_id' sort order keyword
  2017-03-14 18:50 [GIT PULL 00/19] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (11 preceding siblings ...)
  2017-03-14 18:50 ` [PATCH 12/19] perf script: Add script print support for namespace events Arnaldo Carvalho de Melo
@ 2017-03-14 18:50 ` Arnaldo Carvalho de Melo
  2017-03-14 18:50 ` [PATCH 14/19] perf sched timehist: Add --next option Arnaldo Carvalho de Melo
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-03-14 18:50 UTC
  To: Ingo Molnar
  Cc: linux-kernel, Hari Bathini, Alexander Shishkin,
	Alexei Starovoitov, Ananth N Mavinakayanahalli, Aravinda Prasad,
	Brendan Gregg, Daniel Borkmann, Eric Biederman, Jiri Olsa,
	Peter Zijlstra, Sargun Dhillon, Steven Rostedt,
	Arnaldo Carvalho de Melo

From: Hari Bathini <hbathini@linux.vnet.ibm.com>

This patch introduces a cgroup identifier field in perf report to
distinguish data belonging to different cgroups. It uses the device and
inode numbers of the cgroup namespace, recorded in perf data by the new
PERF_RECORD_NAMESPACES event, as the cgroup identifier.

Assuming that each container is created with its own cgroup namespace,
this allows multiple containers to be analyzed at once.
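
Below is a minimal standalone sketch of how such a (dev, inode) key can be ordered and printed. It uses a hypothetical 'ns_key' type in place of the patch's struct namespace_id. Entries compare by device number first and then by inode number, and the key prints in the same "dev/0xinode" layout as the report column shown in this message:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the patch's struct namespace_id. */
struct ns_key {
	uint64_t dev;
	uint64_t ino;
};

/* Order by device number first, then by inode number. */
static int ns_key_cmp(const struct ns_key *a, const struct ns_key *b)
{
	if (a->dev != b->dev)
		return a->dev < b->dev ? -1 : 1;
	if (a->ino != b->ino)
		return a->ino < b->ino ? -1 : 1;
	return 0;
}

/* Print as "dev/0xinode"; returns the snprintf() result. */
static int ns_key_format(const struct ns_key *k, char *buf, size_t size)
{
	return snprintf(buf, size, "%" PRIu64 "/0x%" PRIx64, k->dev, k->ino);
}
```

The sketch uses explicit comparisons instead of casting an unsigned difference to int64_t, because that cast can report the wrong sign for values more than 2^63 apart. PRIu64/PRIx64 keep the format portable across 32-bit and 64-bit builds.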

A simple test for this is to clone a few processes, passing the SIGCHLD
and CLONE_NEWCGROUP flags to each of them, exec a shell, and run a
different workload in each of those contexts while running 'perf record'
with the --namespaces option.
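
This test relies on each child landing in its own cgroup namespace. One way to confirm that is to compare the device and inode numbers of each process's /proc/<pid>/ns/cgroup link. Per namespaces(7), two processes share a namespace exactly when both numbers match, and these are presumably the same values that PERF_RECORD_NAMESPACES records. The sketch below assumes procfs is mounted at /proc. The read_ns_link() helper is hypothetical and not part of perf:

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

/* Device and inode numbers identifying one namespace. */
struct ns_link {
	uint64_t dev;
	uint64_t ino;
};

/*
 * stat() /proc/<pid>/ns/<type> (pid may be "self", type e.g. "cgroup")
 * and return its device/inode pair. Returns 0 on success, -1 on error.
 */
static int read_ns_link(const char *pid, const char *type, struct ns_link *out)
{
	char path[64];
	struct stat st;

	if (snprintf(path, sizeof(path), "/proc/%s/ns/%s", pid, type) >=
	    (int)sizeof(path))
		return -1;	/* path would have been truncated */
	if (stat(path, &st) < 0)
		return -1;	/* no procfs, or namespace type unsupported */
	out->dev = (uint64_t)st.st_dev;
	out->ino = (uint64_t)st.st_ino;
	return 0;
}
```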

Shown below is the output of perf report, sorted by cgroup identifier,
on perf.data generated with the above test scenario. It clearly shows
one context using considerably more kernel memory than the others:

	$ perf report -s cgroup_id,sample --stdio
	#
	# Total Lost Samples: 0
	#
	# Samples: 5K of event 'kmem:kmalloc'
	# Event count (approx.): 5965
	#
	# Overhead  cgroup id (dev/inode)       Samples
	# ........  .....................  ............
	#
	    81.27%  3/0xeffffffb                   4848
	    16.24%  3/0xf00000d0                    969
	     1.16%  3/0xf00000ce                     69
	     0.82%  3/0xf00000cf                     49
	     0.50%  0/0x0                            30

While this is a start, there is further scope for improvement. For
example, instead of only the cgroup namespace's device and inode
numbers, the dev and inode numbers of some or all namespaces could be
used to identify which processes are running in a given container
context.

Scripts that map device and inode info to containers also look like a
plausible way to improve container tracing.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/148891933338.25309.756882900782042645.stgit@hbathini.in.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-report.txt |  4 +++-
 tools/perf/util/hist.c                   |  7 ++++++
 tools/perf/util/hist.h                   |  1 +
 tools/perf/util/sort.c                   | 41 ++++++++++++++++++++++++++++++++
 tools/perf/util/sort.h                   |  7 ++++++
 5 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 672b149aa80a..e9a61f5485eb 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -72,7 +72,8 @@ OPTIONS
 --sort=::
 	Sort histogram entries by given key(s) - multiple keys can be specified
 	in CSV format.  Following sort keys are available:
-	pid, comm, dso, symbol, parent, cpu, socket, srcline, weight, local_weight.
+	pid, comm, dso, symbol, parent, cpu, socket, srcline, weight,
+	local_weight, cgroup_id.
 
 	Each key has following meaning:
 
@@ -92,6 +93,7 @@ OPTIONS
 	- weight: Event specific weight, e.g. memory latency or transaction
 	abort cost. This is the global weight.
 	- local_weight: Local weight version of the weight above.
+	- cgroup_id: ID derived from cgroup namespace device and inode numbers.
 	- transaction: Transaction abort flags.
 	- overhead: Overhead percentage of sample
 	- overhead_sys: Overhead percentage of sample running in system mode
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index eaf72a938fb4..e3b38f629504 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -3,6 +3,7 @@
 #include "hist.h"
 #include "map.h"
 #include "session.h"
+#include "namespaces.h"
 #include "sort.h"
 #include "evlist.h"
 #include "evsel.h"
@@ -169,6 +170,7 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 		hists__set_unres_dso_col_len(hists, HISTC_MEM_DADDR_DSO);
 	}
 
+	hists__new_col_len(hists, HISTC_CGROUP_ID, 20);
 	hists__new_col_len(hists, HISTC_CPU, 3);
 	hists__new_col_len(hists, HISTC_SOCKET, 6);
 	hists__new_col_len(hists, HISTC_MEM_LOCKED, 6);
@@ -574,9 +576,14 @@ __hists__add_entry(struct hists *hists,
 		   bool sample_self,
 		   struct hist_entry_ops *ops)
 {
+	struct namespaces *ns = thread__namespaces(al->thread);
 	struct hist_entry entry = {
 		.thread	= al->thread,
 		.comm = thread__comm(al->thread),
+		.cgroup_id = {
+			.dev = ns ? ns->link_info[CGROUP_NS_INDEX].dev : 0,
+			.ino = ns ? ns->link_info[CGROUP_NS_INDEX].ino : 0,
+		},
 		.ms = {
 			.map	= al->map,
 			.sym	= al->sym,
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 2e839bf40bdd..ee3670a388df 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -30,6 +30,7 @@ enum hist_column {
 	HISTC_DSO,
 	HISTC_THREAD,
 	HISTC_COMM,
+	HISTC_CGROUP_ID,
 	HISTC_PARENT,
 	HISTC_CPU,
 	HISTC_SOCKET,
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 93f755ac60ca..8b0d4e39f640 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -536,6 +536,46 @@ struct sort_entry sort_cpu = {
 	.se_width_idx	= HISTC_CPU,
 };
 
+/* --sort cgroup_id */
+
+static int64_t _sort__cgroup_dev_cmp(u64 left_dev, u64 right_dev)
+{
+	return (int64_t)(right_dev - left_dev);
+}
+
+static int64_t _sort__cgroup_inode_cmp(u64 left_ino, u64 right_ino)
+{
+	return (int64_t)(right_ino - left_ino);
+}
+
+static int64_t
+sort__cgroup_id_cmp(struct hist_entry *left, struct hist_entry *right)
+{
+	int64_t ret;
+
+	ret = _sort__cgroup_dev_cmp(right->cgroup_id.dev, left->cgroup_id.dev);
+	if (ret != 0)
+		return ret;
+
+	return _sort__cgroup_inode_cmp(right->cgroup_id.ino,
+				       left->cgroup_id.ino);
+}
+
+static int hist_entry__cgroup_id_snprintf(struct hist_entry *he,
+					  char *bf, size_t size,
+					  unsigned int width __maybe_unused)
+{
+	return repsep_snprintf(bf, size, "%lu/0x%lx", he->cgroup_id.dev,
+			       he->cgroup_id.ino);
+}
+
+struct sort_entry sort_cgroup_id = {
+	.se_header      = "cgroup id (dev/inode)",
+	.se_cmp	        = sort__cgroup_id_cmp,
+	.se_snprintf    = hist_entry__cgroup_id_snprintf,
+	.se_width_idx	= HISTC_CGROUP_ID,
+};
+
 /* --sort socket */
 
 static int64_t
@@ -1464,6 +1504,7 @@ static struct sort_dimension common_sort_dimensions[] = {
 	DIM(SORT_TRANSACTION, "transaction", sort_transaction),
 	DIM(SORT_TRACE, "trace", sort_trace),
 	DIM(SORT_SYM_SIZE, "symbol_size", sort_sym_size),
+	DIM(SORT_CGROUP_ID, "cgroup_id", sort_cgroup_id),
 };
 
 #undef DIM
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index f583325a3743..baf20a399f34 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -54,6 +54,11 @@ struct he_stat {
 	u32			nr_events;
 };
 
+struct namespace_id {
+	u64			dev;
+	u64			ino;
+};
+
 struct hist_entry_diff {
 	bool	computed;
 	union {
@@ -91,6 +96,7 @@ struct hist_entry {
 	struct map_symbol	ms;
 	struct thread		*thread;
 	struct comm		*comm;
+	struct namespace_id	cgroup_id;
 	u64			ip;
 	u64			transaction;
 	s32			socket;
@@ -212,6 +218,7 @@ enum sort_type {
 	SORT_TRANSACTION,
 	SORT_TRACE,
 	SORT_SYM_SIZE,
+	SORT_CGROUP_ID,
 
 	/* branch stack specific sort keys */
 	__SORT_BRANCH_STACK,
-- 
2.9.3

* [PATCH 14/19] perf sched timehist: Add --next option
  2017-03-14 18:50 [GIT PULL 00/19] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (12 preceding siblings ...)
  2017-03-14 18:50 ` [PATCH 13/19] perf tools: Add 'cgroup_id' sort order keyword Arnaldo Carvalho de Melo
@ 2017-03-14 18:50 ` Arnaldo Carvalho de Melo
  2017-03-14 18:50 ` [PATCH 15/19] perf probe: Factor out the ftrace README scanning Arnaldo Carvalho de Melo
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-03-14 18:50 UTC
  To: Ingo Molnar
  Cc: linux-kernel, Brendan Gregg, Alexander Shishkin, Namhyung Kim,
	Peter Zijlstra, Arnaldo Carvalho de Melo

From: Brendan Gregg <bgregg@netflix.com>

The --next option shows the next task for each context switch, providing
more context for the sequence of scheduler events.

  $ perf sched timehist --next | head
  Samples do not have callchains.
       time  cpu task name  waittime schdelay run time
                 [tid/pid]     (msec) (msec) (msec)
  ---------- --- ---------- --------- ------ -----
  374.793792 [0] <idle>         0.000  0.000 0.000 next: rngd[1524]
  374.793801 [0] rngd[1524]     0.000  0.000 0.009 next: swapper/0[0]
  374.794048 [7] <idle>         0.000  0.000 0.000 next: yes[30884]
  374.794066 [7] yes[30884]     0.000  0.000 0.018 next: swapper/7[0]
  374.794126 [2] <idle>         0.000  0.000 0.000 next: rngd[1524]
  374.794140 [2] rngd[1524]     0.325  0.006 0.013 next: swapper/2[0]
  374.794281 [3] <idle>         0.000  0.000 0.000 next: perf[31070]
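
The "next:" annotation is built with a bounded snprintf() into the 30-byte 'nstr' buffer. Below is a standalone sketch of that step, using a hypothetical format_next() helper. It also shows that a 15-character comm combined with a 7-digit PID is truncated safely rather than overflowing:

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "next: <comm>[<pid>]" into buf. snprintf() never writes more
 * than 'size' bytes (NUL included) and returns the length the full
 * string would have had, so truncation can be detected.
 * Returns true if the output was truncated (or snprintf failed).
 */
static bool format_next(char *buf, size_t size, const char *comm, uint32_t pid)
{
	int n = snprintf(buf, size, "next: %s[%" PRIu32 "]", comm, pid);

	return n < 0 || (size_t)n >= size;
}
```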

Signed-off-by: Brendan Gregg <bgregg@netflix.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1489456589-32555-1-git-send-email-bgregg@netflix.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-sched.txt |  4 ++++
 tools/perf/builtin-sched.c              | 25 ++++++++++++++++++++-----
 2 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/tools/perf/Documentation/perf-sched.txt b/tools/perf/Documentation/perf-sched.txt
index d33deddb0146..a092a2499e8f 100644
--- a/tools/perf/Documentation/perf-sched.txt
+++ b/tools/perf/Documentation/perf-sched.txt
@@ -132,6 +132,10 @@ OPTIONS for 'perf sched timehist'
 --migrations::
 	Show migration events.
 
+-n::
+--next::
+	Show next task.
+
 -I::
 --idle-hist::
 	Show idle-related events only.
diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 16170e9b47e6..b92c4d97192c 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -221,6 +221,7 @@ struct perf_sched {
 	unsigned int	max_stack;
 	bool		show_cpu_visual;
 	bool		show_wakeups;
+	bool		show_next;
 	bool		show_migrations;
 	bool		show_state;
 	u64		skipped_samples;
@@ -1897,14 +1898,18 @@ static char task_state_char(struct thread *thread, int state)
 }
 
 static void timehist_print_sample(struct perf_sched *sched,
+				  struct perf_evsel *evsel,
 				  struct perf_sample *sample,
 				  struct addr_location *al,
 				  struct thread *thread,
 				  u64 t, int state)
 {
 	struct thread_runtime *tr = thread__priv(thread);
+	const char *next_comm = perf_evsel__strval(evsel, sample, "next_comm");
+	const u32 next_pid = perf_evsel__intval(evsel, sample, "next_pid");
 	u32 max_cpus = sched->max_cpu + 1;
 	char tstr[64];
+	char nstr[30];
 	u64 wait_time;
 
 	timestamp__scnprintf_usec(t, tstr, sizeof(tstr));
@@ -1937,7 +1942,12 @@ static void timehist_print_sample(struct perf_sched *sched,
 	if (sched->show_state)
 		printf(" %5c ", task_state_char(thread, state));
 
-	if (sched->show_wakeups)
+	if (sched->show_next) {
+		snprintf(nstr, sizeof(nstr), "next: %s[%d]", next_comm, next_pid);
+		printf(" %-*s", comm_width, nstr);
+	}
+
+	if (sched->show_wakeups && !sched->show_next)
 		printf("  %-*s", comm_width, "");
 
 	if (thread->tid == 0)
@@ -2531,7 +2541,7 @@ static int timehist_sched_change_event(struct perf_tool *tool,
 	}
 
 	if (!sched->summary_only)
-		timehist_print_sample(sched, sample, &al, thread, t, state);
+		timehist_print_sample(sched, evsel, sample, &al, thread, t, state);
 
 out:
 	if (sched->hist_time.start == 0 && t >= ptime->start)
@@ -3341,6 +3351,7 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_BOOLEAN('S', "with-summary", &sched.summary,
 		    "Show all syscalls and summary with statistics"),
 	OPT_BOOLEAN('w', "wakeups", &sched.show_wakeups, "Show wakeup events"),
+	OPT_BOOLEAN('n', "next", &sched.show_next, "Show next task"),
 	OPT_BOOLEAN('M', "migrations", &sched.show_migrations, "Show migration events"),
 	OPT_BOOLEAN('V', "cpu-visual", &sched.show_cpu_visual, "Add CPU visual"),
 	OPT_BOOLEAN('I', "idle-hist", &sched.idle_hist, "Show idle events only"),
@@ -3438,10 +3449,14 @@ int cmd_sched(int argc, const char **argv, const char *prefix __maybe_unused)
 			if (argc)
 				usage_with_options(timehist_usage, timehist_options);
 		}
-		if (sched.show_wakeups && sched.summary_only) {
-			pr_err(" Error: -s and -w are mutually exclusive.\n");
+		if ((sched.show_wakeups || sched.show_next) &&
+		    sched.summary_only) {
+			pr_err(" Error: -s and -[n|w] are mutually exclusive.\n");
 			parse_options_usage(timehist_usage, timehist_options, "s", true);
-			parse_options_usage(NULL, timehist_options, "w", true);
+			if (sched.show_wakeups)
+				parse_options_usage(NULL, timehist_options, "w", true);
+			if (sched.show_next)
+				parse_options_usage(NULL, timehist_options, "n", true);
 			return -EINVAL;
 		}
 
-- 
2.9.3

* [PATCH 15/19] perf probe: Factor out the ftrace README scanning
  2017-03-14 18:50 [GIT PULL 00/19] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (13 preceding siblings ...)
  2017-03-14 18:50 ` [PATCH 14/19] perf sched timehist: Add --next option Arnaldo Carvalho de Melo
@ 2017-03-14 18:50 ` Arnaldo Carvalho de Melo
  2017-03-14 18:50 ` [PATCH 16/19] perf kretprobes: Offset from reloc_sym if kernel supports it Arnaldo Carvalho de Melo
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-03-14 18:50 UTC
  To: Ingo Molnar
  Cc: linux-kernel, Naveen N. Rao, Ananth N Mavinakayanahalli,
	Michael Ellerman, Steven Rostedt, linuxppc-dev,
	Arnaldo Carvalho de Melo

From: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>

Simplify the ftrace README scanning logic and factor it out into a
separate helper. The helper is used subsequently to scan for all
patterns of interest and to cache the results.

Since we are only interested in the availability of probe argument
type x, scan only for that.
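
The approach of reading the README once, checking every line against all patterns of interest, and caching the results can be sketched outside perf. This sketch makes a few assumptions:
- POSIX fnmatch(3) stands in for perf's internal strglobmatch().
- Input comes from any FILE *, so the code can be exercised without tracefs.
- The 'readme_cache' type and the feature names are hypothetical.

The second pattern mirrors the kretprobe check added later in this series:

```c
#define _POSIX_C_SOURCE 200809L
#include <fnmatch.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

enum readme_feature {
	README_PROBE_TYPE_X,
	README_KRETPROBE_OFFSET,
	README_FEATURE_END,
};

/* Glob patterns matched against each README line. */
static const char *const readme_pattern[README_FEATURE_END] = {
	[README_PROBE_TYPE_X]     = "*type: * x8/16/32/64,*",
	[README_KRETPROBE_OFFSET] = "*place (kretprobe): *",
};

struct readme_cache {
	bool scanned;
	bool avail[README_FEATURE_END];
};

/* Read each line once, recording every pattern that matches some line. */
static void readme_scan(struct readme_cache *c, FILE *fp)
{
	char *buf = NULL;
	size_t len = 0;

	while (getline(&buf, &len, fp) > 0)
		for (int i = 0; i < README_FEATURE_END; i++)
			if (!c->avail[i] &&
			    fnmatch(readme_pattern[i], buf, 0) == 0)
				c->avail[i] = true;
	free(buf);
	c->scanned = true;
}

/* After the first call, answers come from the cache; fp is not reread. */
static bool readme_has(struct readme_cache *c, FILE *fp, enum readme_feature f)
{
	if (!c->scanned)
		readme_scan(c, fp);
	return f < README_FEATURE_END && c->avail[f];
}
```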

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/6dc30edc747ba82a236593be6cf3a046fa9453b5.1488961018.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/probe-file.c | 70 +++++++++++++++++++++++---------------------
 1 file changed, 37 insertions(+), 33 deletions(-)

diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index 1a62daceb028..8a219cd831b7 100644
--- a/tools/perf/util/probe-file.c
+++ b/tools/perf/util/probe-file.c
@@ -877,35 +877,31 @@ int probe_cache__show_all_caches(struct strfilter *filter)
 	return 0;
 }
 
+enum ftrace_readme {
+	FTRACE_README_PROBE_TYPE_X = 0,
+	FTRACE_README_END,
+};
+
 static struct {
 	const char *pattern;
-	bool	avail;
-	bool	checked;
-} probe_type_table[] = {
-#define DEFINE_TYPE(idx, pat, def_avail)	\
-	[idx] = {.pattern = pat, .avail = (def_avail)}
-	DEFINE_TYPE(PROBE_TYPE_U, "* u8/16/32/64,*", true),
-	DEFINE_TYPE(PROBE_TYPE_S, "* s8/16/32/64,*", true),
-	DEFINE_TYPE(PROBE_TYPE_X, "* x8/16/32/64,*", false),
-	DEFINE_TYPE(PROBE_TYPE_STRING, "* string,*", true),
-	DEFINE_TYPE(PROBE_TYPE_BITFIELD,
-		    "* b<bit-width>@<bit-offset>/<container-size>", true),
+	bool avail;
+} ftrace_readme_table[] = {
+#define DEFINE_TYPE(idx, pat)			\
+	[idx] = {.pattern = pat, .avail = false}
+	DEFINE_TYPE(FTRACE_README_PROBE_TYPE_X, "*type: * x8/16/32/64,*"),
 };
 
-bool probe_type_is_available(enum probe_type type)
+static bool scan_ftrace_readme(enum ftrace_readme type)
 {
+	int fd;
 	FILE *fp;
 	char *buf = NULL;
 	size_t len = 0;
-	bool target_line = false;
-	bool ret = probe_type_table[type].avail;
-	int fd;
+	bool ret = false;
+	static bool scanned = false;
 
-	if (type >= PROBE_TYPE_END)
-		return false;
-	/* We don't have to check the type which supported by default */
-	if (ret || probe_type_table[type].checked)
-		return ret;
+	if (scanned)
+		goto result;
 
 	fd = open_trace_file("README", false);
 	if (fd < 0)
@@ -917,21 +913,29 @@ bool probe_type_is_available(enum probe_type type)
 		return ret;
 	}
 
-	while (getline(&buf, &len, fp) > 0 && !ret) {
-		if (!target_line) {
-			target_line = !!strstr(buf, " type: ");
-			if (!target_line)
-				continue;
-		} else if (strstr(buf, "\t          ") != buf)
-			break;
-		ret = strglobmatch(buf, probe_type_table[type].pattern);
-	}
-	/* Cache the result */
-	probe_type_table[type].checked = true;
-	probe_type_table[type].avail = ret;
+	while (getline(&buf, &len, fp) > 0)
+		for (enum ftrace_readme i = 0; i < FTRACE_README_END; i++)
+			if (!ftrace_readme_table[i].avail)
+				ftrace_readme_table[i].avail =
+					strglobmatch(buf, ftrace_readme_table[i].pattern);
+	scanned = true;
 
 	fclose(fp);
 	free(buf);
 
-	return ret;
+result:
+	if (type >= FTRACE_README_END)
+		return false;
+
+	return ftrace_readme_table[type].avail;
+}
+
+bool probe_type_is_available(enum probe_type type)
+{
+	if (type >= PROBE_TYPE_END)
+		return false;
+	else if (type == PROBE_TYPE_X)
+		return scan_ftrace_readme(FTRACE_README_PROBE_TYPE_X);
+
+	return true;
 }
-- 
2.9.3

* [PATCH 16/19] perf kretprobes: Offset from reloc_sym if kernel supports it
  2017-03-14 18:50 [GIT PULL 00/19] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (14 preceding siblings ...)
  2017-03-14 18:50 ` [PATCH 15/19] perf probe: Factor out the ftrace README scanning Arnaldo Carvalho de Melo
@ 2017-03-14 18:50 ` Arnaldo Carvalho de Melo
  2017-03-14 18:50 ` [PATCH 17/19] perf powerpc: Choose local entry point with kretprobes Arnaldo Carvalho de Melo
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-03-14 18:50 UTC
  To: Ingo Molnar
  Cc: linux-kernel, Naveen N. Rao, Ananth N Mavinakayanahalli,
	Michael Ellerman, Steven Rostedt, linuxppc-dev,
	Arnaldo Carvalho de Melo

From: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>

We indicate support for accepting sym+offset with kretprobes through a
line in the ftrace README. Parse that line to detect support and choose
the appropriate format for kprobe_events.

As an example, without this perf patch, but with the ftrace changes:

  naveen@ubuntu:~/linux/tools/perf$ sudo cat /sys/kernel/debug/tracing/README | grep kretprobe
  place (kretprobe): [<module>:]<symbol>[+<offset>]|<memaddr>
  naveen@ubuntu:~/linux/tools/perf$
  naveen@ubuntu:~/linux/tools/perf$ sudo ./perf probe -v do_open%return
  probe-definition(0): do_open%return
  symbol:do_open file:(null) line:0 offset:0 return:1 lazy:(null)
  0 arguments
  Looking at the vmlinux_path (8 entries long)
  Using /boot/vmlinux for symbols
  Open Debuginfo file: /boot/vmlinux
  Try to find probe point from debuginfo.
  Matched function: do_open [2d0c7d8]
  Probe point found: do_open+0
  Matched function: do_open [35d76b5]
  found inline addr: 0xc0000000004ba984
  Failed to find "do_open%return",
   because do_open is an inlined function and has no return point.
  An error occurred in debuginfo analysis (-22).
  Trying to use symbols.
  Opening /sys/kernel/debug/tracing//kprobe_events write=1
  Writing event: r:probe/do_open do_open+0
  Writing event: r:probe/do_open_1 do_open+0
  Added new events:
    probe:do_open        (on do_open%return)
    probe:do_open_1      (on do_open%return)

  You can now use it in all perf tools, such as:

	  perf record -e probe:do_open_1 -aR sleep 1

  naveen@ubuntu:~/linux/tools/perf$ sudo cat /sys/kernel/debug/kprobes/list
  c000000000041370  k  kretprobe_trampoline+0x0    [OPTIMIZED]
  c0000000004433d0  r  do_open+0x0    [DISABLED]
  c0000000004433d0  r  do_open+0x0    [DISABLED]

And after this patch (and the subsequent powerpc patch):

  naveen@ubuntu:~/linux/tools/perf$ sudo ./perf probe -v do_open%return
  probe-definition(0): do_open%return
  symbol:do_open file:(null) line:0 offset:0 return:1 lazy:(null)
  0 arguments
  Looking at the vmlinux_path (8 entries long)
  Using /boot/vmlinux for symbols
  Open Debuginfo file: /boot/vmlinux
  Try to find probe point from debuginfo.
  Matched function: do_open [2d0c7d8]
  Probe point found: do_open+0
  Matched function: do_open [35d76b5]
  found inline addr: 0xc0000000004ba984
  Failed to find "do_open%return",
   because do_open is an inlined function and has no return point.
  An error occurred in debuginfo analysis (-22).
  Trying to use symbols.
  Opening /sys/kernel/debug/tracing//README write=0
  Opening /sys/kernel/debug/tracing//kprobe_events write=1
  Writing event: r:probe/do_open _text+4469712
  Writing event: r:probe/do_open_1 _text+4956248
  Added new events:
    probe:do_open        (on do_open%return)
    probe:do_open_1      (on do_open%return)

  You can now use it in all perf tools, such as:

	  perf record -e probe:do_open_1 -aR sleep 1

  naveen@ubuntu:~/linux/tools/perf$ sudo cat /sys/kernel/debug/kprobes/list
  c000000000041370  k  kretprobe_trampoline+0x0    [OPTIMIZED]
  c0000000004433d0  r  do_open+0x0    [DISABLED]
  c0000000004ba058  r  do_open+0x8    [DISABLED]
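
A small sketch can reproduce the two kprobe_events formats written in the logs above. The format_kretprobe() helper is hypothetical. The _text address 0xc000000000000000 is an assumption inferred from the logs (0xc0000000004433d0 - 0xc000000000000000 = 4469712):

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a kprobe_events definition for a return probe. If the kernel
 * accepts sym+offset for kretprobes, express the address relative to
 * the reference symbol _text; otherwise fall back to "<func>+0".
 */
static int format_kretprobe(char *buf, size_t size, const char *event,
			    const char *func, uint64_t addr,
			    uint64_t text_addr, bool offset_supported)
{
	if (offset_supported)
		return snprintf(buf, size, "r:probe/%s _text+%" PRIu64,
				event, addr - text_addr);
	return snprintf(buf, size, "r:probe/%s %s+0", event, func);
}
```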

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/496ef9f33c1ab16286ece9dd62aa672807aef91c.1488961018.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/probe-event.c | 12 +++++-------
 tools/perf/util/probe-file.c  |  7 +++++++
 tools/perf/util/probe-file.h  |  1 +
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 28fb62c32678..c9bdc9ded0c3 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -757,7 +757,9 @@ post_process_kernel_probe_trace_events(struct probe_trace_event *tevs,
 	}
 
 	for (i = 0; i < ntevs; i++) {
-		if (!tevs[i].point.address || tevs[i].point.retprobe)
+		if (!tevs[i].point.address)
+			continue;
+		if (tevs[i].point.retprobe && !kretprobe_offset_is_supported())
 			continue;
 		/* If we found a wrong one, mark it by NULL symbol */
 		if (kprobe_warn_out_range(tevs[i].point.symbol,
@@ -1528,11 +1530,6 @@ static int parse_perf_probe_point(char *arg, struct perf_probe_event *pev)
 		return -EINVAL;
 	}
 
-	if (pp->retprobe && !pp->function) {
-		semantic_error("Return probe requires an entry function.\n");
-		return -EINVAL;
-	}
-
 	if ((pp->offset || pp->line || pp->lazy_line) && pp->retprobe) {
 		semantic_error("Offset/Line/Lazy pattern can't be used with "
 			       "return probe.\n");
@@ -2841,7 +2838,8 @@ static int find_probe_trace_events_from_map(struct perf_probe_event *pev,
 	}
 
 	/* Note that the symbols in the kmodule are not relocated */
-	if (!pev->uprobes && !pp->retprobe && !pev->target) {
+	if (!pev->uprobes && !pev->target &&
+			(!pp->retprobe || kretprobe_offset_is_supported())) {
 		reloc_sym = kernel_get_ref_reloc_sym();
 		if (!reloc_sym) {
 			pr_warning("Relocated base symbol is not found!\n");
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index 8a219cd831b7..1542cd0d6799 100644
--- a/tools/perf/util/probe-file.c
+++ b/tools/perf/util/probe-file.c
@@ -879,6 +879,7 @@ int probe_cache__show_all_caches(struct strfilter *filter)
 
 enum ftrace_readme {
 	FTRACE_README_PROBE_TYPE_X = 0,
+	FTRACE_README_KRETPROBE_OFFSET,
 	FTRACE_README_END,
 };
 
@@ -889,6 +890,7 @@ static struct {
 #define DEFINE_TYPE(idx, pat)			\
 	[idx] = {.pattern = pat, .avail = false}
 	DEFINE_TYPE(FTRACE_README_PROBE_TYPE_X, "*type: * x8/16/32/64,*"),
+	DEFINE_TYPE(FTRACE_README_KRETPROBE_OFFSET, "*place (kretprobe): *"),
 };
 
 static bool scan_ftrace_readme(enum ftrace_readme type)
@@ -939,3 +941,8 @@ bool probe_type_is_available(enum probe_type type)
 
 	return true;
 }
+
+bool kretprobe_offset_is_supported(void)
+{
+	return scan_ftrace_readme(FTRACE_README_KRETPROBE_OFFSET);
+}
diff --git a/tools/perf/util/probe-file.h b/tools/perf/util/probe-file.h
index a17a82eff8a0..dbf95a00864a 100644
--- a/tools/perf/util/probe-file.h
+++ b/tools/perf/util/probe-file.h
@@ -65,6 +65,7 @@ struct probe_cache_entry *probe_cache__find_by_name(struct probe_cache *pcache,
 					const char *group, const char *event);
 int probe_cache__show_all_caches(struct strfilter *filter);
 bool probe_type_is_available(enum probe_type type);
+bool kretprobe_offset_is_supported(void);
 #else	/* ! HAVE_LIBELF_SUPPORT */
 static inline struct probe_cache *probe_cache__new(const char *tgt __maybe_unused)
 {
-- 
2.9.3

* [PATCH 17/19] perf powerpc: Choose local entry point with kretprobes
  2017-03-14 18:50 [GIT PULL 00/19] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (15 preceding siblings ...)
  2017-03-14 18:50 ` [PATCH 16/19] perf kretprobes: Offset from reloc_sym if kernel supports it Arnaldo Carvalho de Melo
@ 2017-03-14 18:50 ` Arnaldo Carvalho de Melo
  2017-03-14 18:50 ` [PATCH 18/19] doc: trace/kprobes: add information about NOKPROBE_SYMBOL Arnaldo Carvalho de Melo
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-03-14 18:50 UTC
  To: Ingo Molnar
  Cc: linux-kernel, Naveen N. Rao, Ananth N Mavinakayanahalli,
	Michael Ellerman, Steven Rostedt, linuxppc-dev,
	Arnaldo Carvalho de Melo

From: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>

If the kernel supports it, perf now specifies kretprobes as an offset
from _text/_stext rather than by function name. As such, choose the
local entry point (LEP) for powerpc ABIv2 to ensure the probe gets hit,
but only when the kernel supports specifying offsets with kretprobes.
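
Below is a simplified sketch of the resulting entry-point choice. ppc64_lep_offset() is intended to mirror the PPC64_LOCAL_ENTRY_OFFSET() macro from <elf.h>. choose_probe_addr() is a hypothetical helper that omits the map/symbol and kallsyms handling done by the real arch__fix_tev_from_maps():

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * ELFv2 stores the LEP offset encoding in bits 5-7 of st_other:
 * 0 and 1 mean no offset, 2..6 mean 4..64 bytes (7 is reserved).
 */
static uint64_t ppc64_lep_offset(unsigned char st_other)
{
	unsigned int field = (st_other & 0xe0) >> 5;

	return (uint64_t)(((1u << field) >> 2) << 2);
}

/*
 * Keep the global entry point (GEP) when the user gave an explicit
 * offset, or for a kernel kretprobe when the kernel cannot take
 * sym+offset; otherwise move the probe to the LEP.
 */
static uint64_t choose_probe_addr(uint64_t gep, unsigned char st_other,
				  bool user_offset, bool kernel_kretprobe,
				  bool kretprobe_offset_supported)
{
	if (user_offset)
		return gep;
	if (kernel_kretprobe && !kretprobe_offset_supported)
		return gep;
	return gep + ppc64_lep_offset(st_other);
}
```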

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/7445b5334673ef5404ac1d12609bad4d73d2b567.1488961018.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/arch/powerpc/util/sym-handling.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/tools/perf/arch/powerpc/util/sym-handling.c b/tools/perf/arch/powerpc/util/sym-handling.c
index 1030a6e504bb..39dbe512b9fc 100644
--- a/tools/perf/arch/powerpc/util/sym-handling.c
+++ b/tools/perf/arch/powerpc/util/sym-handling.c
@@ -10,6 +10,7 @@
 #include "symbol.h"
 #include "map.h"
 #include "probe-event.h"
+#include "probe-file.h"
 
 #ifdef HAVE_LIBELF_SUPPORT
 bool elf__needs_adjust_symbols(GElf_Ehdr ehdr)
@@ -79,13 +80,18 @@ void arch__fix_tev_from_maps(struct perf_probe_event *pev,
 	 * However, if the user specifies an offset, we fall back to using the
 	 * GEP since all userspace applications (objdump/readelf) show function
 	 * disassembly with offsets from the GEP.
-	 *
-	 * In addition, we shouldn't specify an offset for kretprobes.
 	 */
-	if (pev->point.offset || (!pev->uprobes && pev->point.retprobe) ||
-	    !map || !sym)
+	if (pev->point.offset || !map || !sym)
 		return;
 
+	/* For kretprobes, add an offset only if the kernel supports it */
+	if (!pev->uprobes && pev->point.retprobe) {
+#ifdef HAVE_LIBELF_SUPPORT
+		if (!kretprobe_offset_is_supported())
+#endif
+			return;
+	}
+
 	lep_offset = PPC64_LOCAL_ENTRY_OFFSET(sym->arch_sym);
 
 	if (map->dso->symtab_type == DSO_BINARY_TYPE__KALLSYMS)
-- 
2.9.3

* [PATCH 18/19] doc: trace/kprobes: add information about NOKPROBE_SYMBOL
  2017-03-14 18:50 [GIT PULL 00/19] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (16 preceding siblings ...)
  2017-03-14 18:50 ` [PATCH 17/19] perf powerpc: Choose local entry point with kretprobes Arnaldo Carvalho de Melo
@ 2017-03-14 18:50 ` Arnaldo Carvalho de Melo
  2017-03-14 18:50 ` [PATCH 19/19] kprobes: Convert kprobe_exceptions_notify to use NOKPROBE_SYMBOL Arnaldo Carvalho de Melo
  2017-03-15 18:29 ` [GIT PULL 00/19] perf/core improvements and fixes Ingo Molnar
  19 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-03-14 18:50 UTC
  To: Ingo Molnar
  Cc: linux-kernel, Naveen N. Rao, Ananth N Mavinakayanahalli,
	Arnaldo Carvalho de Melo

From: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>

Update kprobe tracer documentation to also mention that
NOKPROBE_SYMBOL() and nokprobe_inline add symbols to the kprobes
blacklist.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/d924e20de099579ace4286e610304f054cd798db.1488991670.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 Documentation/trace/kprobetrace.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
index 41ef9d8efe95..5ea85059db3b 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -8,8 +8,9 @@ Overview
 --------
 These events are similar to tracepoint based events. Instead of Tracepoint,
 this is based on kprobes (kprobe and kretprobe). So it can probe wherever
-kprobes can probe (this means, all functions body except for __kprobes
-functions). Unlike the Tracepoint based event, this can be added and removed
+kprobes can probe (this means, all functions except those with
+__kprobes/nokprobe_inline annotation and those marked NOKPROBE_SYMBOL).
+Unlike the Tracepoint based event, this can be added and removed
 dynamically, on the fly.
 
 To enable this feature, build your kernel with CONFIG_KPROBE_EVENTS=y.
-- 
2.9.3
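
The blacklist mechanism that this documentation change refers to can be illustrated in userspace. The sketch below is modeled loosely on the kernel's NOKPROBE_SYMBOL(), which records a function's address in a dedicated `_kprobe_blacklist` ELF section that is walked later. The macro DEMO_NOKPROBE_SYMBOL, the section name demo_blacklist and the demo_* functions are hypothetical. The code also assumes a GNU-compatible toolchain (GCC or Clang with GNU ld or lld), which defines __start_/__stop_ symbols for sections whose names are valid C identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace analogue of NOKPROBE_SYMBOL(): each use stores
 * the function's address in the "demo_blacklist" section. The variable
 * is marked 'used' so the compiler keeps it even though nothing names it.
 */
#define DEMO_NOKPROBE_SYMBOL(fname)                                   \
	static uintptr_t __attribute__((used, section("demo_blacklist"))) \
	_demo_bl_addr_##fname = (uintptr_t)fname

/* Provided by the linker for a section named like a C identifier. */
extern uintptr_t __start_demo_blacklist[];
extern uintptr_t __stop_demo_blacklist[];

int demo_handler(int x) { return x + 1; }
DEMO_NOKPROBE_SYMBOL(demo_handler);

int demo_helper(int x) { return x * 2; }
DEMO_NOKPROBE_SYMBOL(demo_helper);

/* Not blacklisted: never passed to DEMO_NOKPROBE_SYMBOL(). */
int demo_probeable(int x) { return x - 1; }

/* Number of entries the linker collected into the section. */
size_t demo_blacklist_count(void)
{
	assert(__start_demo_blacklist <= __stop_demo_blacklist);
	return (size_t)(__stop_demo_blacklist - __start_demo_blacklist);
}

/* Return 1 if addr is the entry address of a blacklisted function. */
int demo_is_blacklisted(uintptr_t addr)
{
	for (const uintptr_t *p = __start_demo_blacklist;
	     p < __stop_demo_blacklist; p++)
		if (*p == addr)
			return 1;
	return 0;
}
```

Because every entry is emitted into a single named section, callers never have to maintain a central list: marking a function is a one-line, per-symbol annotation next to its definition.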


* [PATCH 19/19] kprobes: Convert kprobe_exceptions_notify to use NOKPROBE_SYMBOL
  2017-03-14 18:50 [GIT PULL 00/19] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (17 preceding siblings ...)
  2017-03-14 18:50 ` [PATCH 18/19] doc: trace/kprobes: add information about NOKPROBE_SYMBOL Arnaldo Carvalho de Melo
@ 2017-03-14 18:50 ` Arnaldo Carvalho de Melo
  2017-03-15 18:29 ` [GIT PULL 00/19] perf/core improvements and fixes Ingo Molnar
  19 siblings, 0 replies; 21+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-03-14 18:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Naveen N. Rao, Ananth N Mavinakayanahalli,
	Arnaldo Carvalho de Melo

From: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>

Commit fc62d0207ae0 ("kprobes: Introduce weak variant of
kprobe_exceptions_notify()") used the __kprobes annotation to exclude
kprobe_exceptions_notify from being probed. Since NOKPROBE_SYMBOL() is a
better way to do this, as it lets the symbol be discovered as
blacklisted, switch over to using NOKPROBE_SYMBOL().

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/3f25bf400da5c222cd9b10eec6ded2d6b58209f8.1488991670.git.naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 kernel/kprobes.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 448759d4a263..4780ec236035 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1740,11 +1740,12 @@ void unregister_kprobes(struct kprobe **kps, int num)
 }
 EXPORT_SYMBOL_GPL(unregister_kprobes);
 
-int __weak __kprobes kprobe_exceptions_notify(struct notifier_block *self,
-					      unsigned long val, void *data)
+int __weak kprobe_exceptions_notify(struct notifier_block *self,
+					unsigned long val, void *data)
 {
 	return NOTIFY_DONE;
 }
+NOKPROBE_SYMBOL(kprobe_exceptions_notify);
 
 static struct notifier_block kprobe_exceptions_nb = {
 	.notifier_call = kprobe_exceptions_notify,
-- 
2.9.3
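
The changelog's point that NOKPROBE_SYMBOL() lets the symbol be "discovered as being blacklisted" can be illustrated by contrasting two kinds of check. A __kprobes-style check only asks whether an address falls inside one section range, while a per-symbol table can also report which function matched. This is a hypothetical sketch: struct demo_bl_entry, the demo_* functions and the addresses used with them are made up for illustration and are not the kernel's data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-symbol blacklist entry covering [start, end). */
struct demo_bl_entry {
	const char *name;
	uintptr_t start;
	uintptr_t end;	/* exclusive */
};

/*
 * __kprobes-style check: one section range. It answers "is this address
 * excluded?" but cannot say which function the address belongs to.
 */
int demo_in_section(uintptr_t addr, uintptr_t sec_start, uintptr_t sec_end)
{
	return addr >= sec_start && addr < sec_end;
}

/*
 * NOKPROBE_SYMBOL()-style check: one entry per symbol, so a match can be
 * reported by name. Returns NULL when addr is not blacklisted.
 */
const char *demo_blacklist_lookup(const struct demo_bl_entry *tbl, size_t n,
				  uintptr_t addr)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (addr >= tbl[i].start && addr < tbl[i].end)
			return tbl[i].name;
	return NULL;
}
```

For example, with a made-up entry { "kprobe_exceptions_notify", 0x1000, 0x1040 }, address 0x1010 resolves to that name, while 0x1040 (one past the end) returns NULL.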


* Re: [GIT PULL 00/19] perf/core improvements and fixes
  2017-03-14 18:50 [GIT PULL 00/19] perf/core improvements and fixes Arnaldo Carvalho de Melo
                   ` (18 preceding siblings ...)
  2017-03-14 18:50 ` [PATCH 19/19] kprobes: Convert kprobe_exceptions_notify to use NOKPROBE_SYMBOL Arnaldo Carvalho de Melo
@ 2017-03-15 18:29 ` Ingo Molnar
  19 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2017-03-15 18:29 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, Alexander Shishkin, Alexei Starovoitov,
	Ananth N Mavinakayanahalli, Andi Kleen, Aravinda Prasad,
	Brendan Gregg, Changbin Du, Daniel Borkmann, Eric Biederman,
	Feng Tang, Hari Bathini, Jiri Olsa, kernel-team, linuxppc-dev,
	Masami Hiramatsu, Michael Ellerman, Namhyung Kim, Naveen N . Rao,
	Peter Zijlstra, Sargun Dhillon, Steven Rostedt,
	Arnaldo Carvalho de Melo


* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Hi Ingo,
> 
> 	Please consider pulling,
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit 84e5b549214f2160c12318aac549de85f600c79a:
> 
>   Merge tag 'perf-core-for-mingo-4.11-20170306' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2017-03-07 08:14:14 +0100)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.12-20170314
> 
> for you to fetch changes up to 5f6bee34707973ea7879a7857fd63ddccc92fff3:
> 
>   kprobes: Convert kprobe_exceptions_notify to use NOKPROBE_SYMBOL (2017-03-14 15:17:40 -0300)
> 
> ----------------------------------------------------------------
> perf/core improvements and fixes:
> 
> New features:
> 
> - Add PERF_RECORD_NAMESPACES so that the kernel can record information
>   required to associate samples to namespaces, helping in container
>   problem characterization.
> 
>   'perf record' now has a --namespace option to ask for such info,
>   and when it is present, the info can be used, initially, via a new
>   sort order, 'cgroup_id', allowing histogram entries to be bucketized
>   by a (device, inode) based cgroup identifier (Hari Bathini)
> 
> - Add a --next option to 'perf sched timehist', showing which thread
>   runs next (Brendan Gregg)
> 
> Fixes:
> 
> - Fix segfault with basic block 'cycles' sort dimension (Changbin Du)
> 
> - Add c2c to command-list.txt, making it appear in the 'perf help'
>   output (Changbin Du)
> 
> - Fix zeroing of 'abs_path' variable in the perf hists browser switch
>   file code (Changbin Du)
> 
> - Hide tip messages when -q/--quiet is given to 'perf report' (Namhyung Kim)
> 
> Infrastructure:
> 
> - Use ref_reloc_sym + offset to set up kretprobes (Naveen Rao)
> 
> - Ignore generated files pmu-events/{jevents,pmu-events.c} for git (Changbin Du)
> 
> Documentation:
> 
> - Document +field style argument support for --field option (Changbin Du)
> 
> - Clarify 'perf c2c --stats' help message (Namhyung Kim)
> 
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> ----------------------------------------------------------------
> Brendan Gregg (1):
>       perf sched timehist: Add --next option
> 
> Changbin Du (5):
>       perf tools: Missing c2c command in command-list
>       perf tools: Ignore generated files pmu-events/{jevents,pmu-events.c} for git
>       perf sort: Fix segfault with basic block 'cycles' sort dimension
>       perf report: Document +field style argument support for --field option
>       perf hists browser: Fix typo in function switch_data_file
> 
> Hari Bathini (5):
>       perf: Add PERF_RECORD_NAMESPACES to include namespaces related info
>       perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info
>       perf record: Synthesize namespace events for current processes
>       perf script: Add script print support for namespace events
>       perf tools: Add 'cgroup_id' sort order keyword
> 
> Namhyung Kim (3):
>       perf report: Hide tip message when -q option is given
>       perf c2c: Clarify help message of --stats option
>       perf c2c: Fix display bug when using pipe
> 
> Naveen N. Rao (5):
>       perf probe: Factor out the ftrace README scanning
>       perf kretprobes: Offset from reloc_sym if kernel supports it
>       perf powerpc: Choose local entry point with kretprobes
>       doc: trace/kprobes: add information about NOKPROBE_SYMBOL
>       kprobes: Convert kprobe_exceptions_notify to use NOKPROBE_SYMBOL
> 
>  Documentation/trace/kprobetrace.txt         |   5 +-
>  include/linux/perf_event.h                  |   2 +
>  include/uapi/linux/perf_event.h             |  32 +++++-
>  kernel/events/core.c                        | 139 ++++++++++++++++++++++++++
>  kernel/fork.c                               |   2 +
>  kernel/kprobes.c                            |   5 +-
>  kernel/nsproxy.c                            |   3 +
>  tools/include/uapi/linux/perf_event.h       |  32 +++++-
>  tools/perf/.gitignore                       |   2 +
>  tools/perf/Documentation/perf-record.txt    |   3 +
>  tools/perf/Documentation/perf-report.txt    |   7 +-
>  tools/perf/Documentation/perf-sched.txt     |   4 +
>  tools/perf/Documentation/perf-script.txt    |   3 +
>  tools/perf/arch/powerpc/util/sym-handling.c |  14 ++-
>  tools/perf/builtin-annotate.c               |   1 +
>  tools/perf/builtin-c2c.c                    |   4 +-
>  tools/perf/builtin-diff.c                   |   1 +
>  tools/perf/builtin-inject.c                 |  13 +++
>  tools/perf/builtin-kmem.c                   |   1 +
>  tools/perf/builtin-kvm.c                    |   2 +
>  tools/perf/builtin-lock.c                   |   1 +
>  tools/perf/builtin-mem.c                    |   1 +
>  tools/perf/builtin-record.c                 |  35 ++++++-
>  tools/perf/builtin-report.c                 |   4 +-
>  tools/perf/builtin-sched.c                  |  26 ++++-
>  tools/perf/builtin-script.c                 |  41 ++++++++
>  tools/perf/builtin-trace.c                  |   3 +-
>  tools/perf/command-list.txt                 |   1 +
>  tools/perf/perf.h                           |   1 +
>  tools/perf/ui/browsers/hists.c              |   2 +-
>  tools/perf/util/Build                       |   1 +
>  tools/perf/util/data-convert-bt.c           |   1 +
>  tools/perf/util/event.c                     | 150 ++++++++++++++++++++++++++--
>  tools/perf/util/event.h                     |  19 ++++
>  tools/perf/util/evsel.c                     |   3 +
>  tools/perf/util/hist.c                      |   7 ++
>  tools/perf/util/hist.h                      |   1 +
>  tools/perf/util/machine.c                   |  34 +++++++
>  tools/perf/util/machine.h                   |   3 +
>  tools/perf/util/namespaces.c                |  36 +++++++
>  tools/perf/util/namespaces.h                |  26 +++++
>  tools/perf/util/probe-event.c               |  12 +--
>  tools/perf/util/probe-file.c                |  77 ++++++++------
>  tools/perf/util/probe-file.h                |   1 +
>  tools/perf/util/session.c                   |   7 ++
>  tools/perf/util/sort.c                      |  46 +++++++++
>  tools/perf/util/sort.h                      |   7 ++
>  tools/perf/util/thread.c                    |  44 +++++++-
>  tools/perf/util/thread.h                    |   6 ++
>  tools/perf/util/tool.h                      |   2 +
>  50 files changed, 799 insertions(+), 74 deletions(-)
>  create mode 100644 tools/perf/util/namespaces.c
>  create mode 100644 tools/perf/util/namespaces.h

Pulled, thanks a lot Arnaldo!

	Ingo
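
The 'cgroup_id' sort order described in the quoted pull request groups histogram entries by a (device, inode) pair. The sketch below shows that kind of two-key bucketing in plain C. struct demo_sample, demo_bucket_by_cgroup() and the use of qsort() are assumptions for illustration and are not perf's hist/sort implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sample carrying a (device, inode) based cgroup id. */
struct demo_sample {
	uint64_t dev;
	uint64_t ino;
	uint64_t period;
};

/* Order by device, then inode, so samples with equal ids become adjacent. */
static int demo_cmp_cgroup(const void *a, const void *b)
{
	const struct demo_sample *l = a, *r = b;

	if (l->dev != r->dev)
		return l->dev < r->dev ? -1 : 1;
	if (l->ino != r->ino)
		return l->ino < r->ino ? -1 : 1;
	return 0;
}

/*
 * Sort samples in place and sum 'period' per distinct (dev, ino) bucket
 * into totals[], which must have room for n entries. Returns the number
 * of buckets.
 */
size_t demo_bucket_by_cgroup(struct demo_sample *s, size_t n, uint64_t *totals)
{
	size_t buckets = 0;

	assert((s != NULL && totals != NULL) || n == 0);
	if (n == 0)
		return 0;
	qsort(s, n, sizeof(*s), demo_cmp_cgroup);
	for (size_t i = 0; i < n; i++) {
		if (i == 0 || demo_cmp_cgroup(&s[i - 1], &s[i]) != 0)
			totals[buckets++] = 0;	/* start a new bucket */
		totals[buckets - 1] += s[i].period;
	}
	return buckets;
}
```

Comparing the device first and the inode second gives a total order, so identical identifiers end up adjacent after sorting and a single linear pass is enough to form the buckets.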


end of thread

Thread overview: 21+ messages
2017-03-14 18:50 [GIT PULL 00/19] perf/core improvements and fixes Arnaldo Carvalho de Melo
2017-03-14 18:50 ` [PATCH 01/19] perf report: Hide tip message when -q option is given Arnaldo Carvalho de Melo
2017-03-14 18:50 ` [PATCH 02/19] perf c2c: Clarify help message of --stats option Arnaldo Carvalho de Melo
2017-03-14 18:50 ` [PATCH 03/19] perf c2c: Fix display bug when using pipe Arnaldo Carvalho de Melo
2017-03-14 18:50 ` [PATCH 04/19] perf tools: Missing c2c command in command-list Arnaldo Carvalho de Melo
2017-03-14 18:50 ` [PATCH 05/19] perf tools: Ignore generated files pmu-events/{jevents,pmu-events.c} for git Arnaldo Carvalho de Melo
2017-03-14 18:50 ` [PATCH 06/19] perf sort: Fix segfault with basic block 'cycles' sort dimension Arnaldo Carvalho de Melo
2017-03-14 18:50 ` [PATCH 07/19] perf report: Document +field style argument support for --field option Arnaldo Carvalho de Melo
2017-03-14 18:50 ` [PATCH 08/19] perf hists browser: Fix typo in function switch_data_file Arnaldo Carvalho de Melo
2017-03-14 18:50 ` [PATCH 09/19] perf: Add PERF_RECORD_NAMESPACES to include namespaces related info Arnaldo Carvalho de Melo
2017-03-14 18:50 ` [PATCH 10/19] perf tools: " Arnaldo Carvalho de Melo
2017-03-14 18:50 ` [PATCH 11/19] perf record: Synthesize namespace events for current processes Arnaldo Carvalho de Melo
2017-03-14 18:50 ` [PATCH 12/19] perf script: Add script print support for namespace events Arnaldo Carvalho de Melo
2017-03-14 18:50 ` [PATCH 13/19] perf tools: Add 'cgroup_id' sort order keyword Arnaldo Carvalho de Melo
2017-03-14 18:50 ` [PATCH 14/19] perf sched timehist: Add --next option Arnaldo Carvalho de Melo
2017-03-14 18:50 ` [PATCH 15/19] perf probe: Factor out the ftrace README scanning Arnaldo Carvalho de Melo
2017-03-14 18:50 ` [PATCH 16/19] perf kretprobes: Offset from reloc_sym if kernel supports it Arnaldo Carvalho de Melo
2017-03-14 18:50 ` [PATCH 17/19] perf powerpc: Choose local entry point with kretprobes Arnaldo Carvalho de Melo
2017-03-14 18:50 ` [PATCH 18/19] doc: trace/kprobes: add information about NOKPROBE_SYMBOL Arnaldo Carvalho de Melo
2017-03-14 18:50 ` [PATCH 19/19] kprobes: Convert kprobe_exceptions_notify to use NOKPROBE_SYMBOL Arnaldo Carvalho de Melo
2017-03-15 18:29 ` [GIT PULL 00/19] perf/core improvements and fixes Ingo Molnar
