* [GIT PULL 00/41] perf/core improvements and fixes
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo,
	Adrian Hunter, Alexander Shishkin, Andi Kleen, coresight,
	David Ahern, Heiko Carstens, Hendrik Brueckner, Jaecheol Shin,
	Jin Yao, Jiri Olsa, Kan Liang, linux-arm-kernel, linuxppc-dev,
	Martin Schwidefsky, Masami Hiramatsu, Mathieu Poirier,
	Michael Ellerman, Milian Wolff, Namhyung Kim, Naveen N . Rao,
	Peter Zijlstra, Ravi Bangoria, Robert Walker, Sangwon Hong,
	Stephane Eranian, Taeung Song, Thomas Richter, Wang Nan,
	yuzhoujian, Arnaldo Carvalho de Melo

Hi Ingo,

	Please consider pulling, this is on top of tip/perf/urgent.

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit 297f9233b53a08fd457815e19f1d6f2c3389857b:

  kprobes: Propagate error from disarm_kprobe_ftrace() (2018-02-16 09:12:58 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.17-20180216

for you to fetch changes up to 21316ac6803d4a1aadd74b896db8d60a92cd1140:

  perf tests shell lib: Use a wildcard to remove the vfs_getname probe (2018-02-16 15:31:12 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

- Fix wrong jump arrow on systems whose branch records include cycles,
  i.e. Intel Skylake and later (Jin Yao)

- Fix a 'perf record --per-thread' problem introduced when
  implementing 'perf stat --per-thread' (Jin Yao)

- Use arch__compare_symbol_names() to fix 'perf test vmlinux',
  which was using strcmp() on symbol names while the dso routines
  doing symbol lookups used the arch-overridable function, making
  this test fail on architectures that override that function
  with something other than strcmp() (Jiri Olsa)

- Add 'perf script --show-round-event' to display
  PERF_RECORD_FINISHED_ROUND entries (Jiri Olsa)

- Fix dwarf unwind for stripped binaries in 'perf test' (Jiri Olsa)

- Use ordered_events for 'perf report --tasks', otherwise we may get
  artifacts when PERF_RECORD_FORK gets processed before PERF_RECORD_COMM
  (when they were recorded on different CPUs) (Jiri Olsa)

- Add support to display group output for non-group events, i.e.
  'perf report --group' on a perf.data file recorded without
  explicitly grouping events with {} (as in
  "perf record -e '{cycles,instructions}'") now produces the same
  output that grouped recording would, showing all those
  non-grouped events in multiple columns at the same time (Jiri Olsa)

- Skip non-address kallsyms entries, e.g. '(null)' for !root (Jiri Olsa)

- Kernel maps fixes with respect to perf.data (report) versus the live
  system (top) (Jiri Olsa)

- Fix memory corruption when using 'perf record -j call -g -a <application>'
  followed by 'perf report --branch-history' (Jiri Olsa)

- ARM CoreSight fixes (Mathieu Poirier)

- Add inject capability for CoreSight Traces (Robert Walker)

- Update documentation for use of 'perf' + ARM CoreSight (Robert Walker)

- Man pages fixes (Sangwon Hong, Jaecheol Shin)

- Fix some 'perf test' cases on s390 and x86_64 (some backtraces
  changed with a glibc update) (Thomas Richter)

- Add detailed CPUID info for s390 to the 'perf.data' headers, so
  that 'perf annotate' can use it (Thomas Richter)

- Add '--interval-count N' to 'perf stat', for use with -I, e.g.
  'perf stat -I 1000 --interval-count 2' will show stats every
  1000ms, twice (yuzhoujian)

- Add 'perf stat --timeout Nms', which runs for that many
  milliseconds and then stops, printing the counters (yuzhoujian)

- Fix description for 'perf report --mem-mode' (Andi Kleen)

- Use a wildcard to remove the vfs_getname probe in the
  'perf test' shell based test cases (Arnaldo Carvalho de Melo)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Andi Kleen (1):
      perf report: Fix description for --mem-mode

Arnaldo Carvalho de Melo (1):
      perf tests shell lib: Use a wildcard to remove the vfs_getname probe

Jaecheol Shin (1):
      perf annotate: Add missing arguments in Man page

Jin Yao (2):
      perf tools: Use target->per_thread and target->system_wide flags
      perf report: Fix wrong jump arrow

Jiri Olsa (18):
      perf record: Put new line after target override warning
      perf script: Add --show-round-event to display PERF_RECORD_FINISHED_ROUND
      tools lib api fs: Add filename__read_xll function
      tools lib api fs: Add sysfs__read_xll function
      perf tests: Fix dwarf unwind for stripped binaries
      perf tools: Fix comment for sort__* compare functions
      perf report: Ask for ordered events for --tasks option
      perf report: Add support to display group output for non group events
      tools lib symbol: Skip non-address kallsyms line
      perf symbols: Check if we read regular file in dso__load()
      perf machine: Free root_dir in machine__init() error path
      perf machine: Move kernel mmap name into struct machine
      perf machine: Generalize machine__set_kernel_mmap()
      perf machine: Don't search for active kernel start in __machine__create_kernel_maps
      perf machine: Remove machine__load_kallsyms()
      perf tools: Do not create kernel maps in sample__resolve()
      perf tests: Use arch__compare_symbol_names to compare symbols
      perf report: Fix memory corruption in --branch-history mode --branch-history

Mathieu Poirier (3):
      perf cs-etm: Freeing allocated memory
      perf auxtrace arm: Fixing uninitialised variable
      perf cs-etm: Properly deal with cpu maps

Ravi Bangoria (3):
      tools include powerpc: Grab a copy of arch/powerpc/include/uapi/asm/unistd.h
      perf powerpc: Generate system call table from asm/unistd.h
      perf trace powerpc: Use generated syscall table

Robert Walker (3):
      perf cs-etm: Inject capabilitity for CoreSight traces
      perf inject: Emit instruction records on ETM trace discontinuity
      coresight: Update documentation for perf usage

Sangwon Hong (2):
      perf kmem: Document a missing option & an argument
      perf mem: Document a missing option

Thomas Richter (5):
      perf record: Provide detailed information on s390 CPU
      perf annotate: Scan cpuid for s390 and save machine type
      perf cpuid: Introduce a platform specific cpuid compare function
      perf test: Fix test case 23 for s390 z/VM or KVM guests
      perf test: Fix test case inet_pton to accept inlines.

yuzhoujian (2):
      perf stat: Add support to print counts for fixed times
      perf stat: Add support to print counts after a period of time

 Documentation/trace/coresight.txt                  |  51 +++
 tools/arch/powerpc/include/uapi/asm/unistd.h       | 402 +++++++++++++++++
 tools/lib/api/fs/fs.c                              |  44 +-
 tools/lib/api/fs/fs.h                              |   2 +
 tools/lib/symbol/kallsyms.c                        |   4 +
 tools/perf/Documentation/perf-annotate.txt         |   6 +-
 tools/perf/Documentation/perf-kmem.txt             |   6 +-
 tools/perf/Documentation/perf-mem.txt              |   4 +
 tools/perf/Documentation/perf-report.txt           |   5 +-
 tools/perf/Documentation/perf-script.txt           |   3 +
 tools/perf/Documentation/perf-stat.txt             |  10 +
 tools/perf/Makefile.config                         |   2 +
 tools/perf/arch/arm/util/auxtrace.c                |   2 +-
 tools/perf/arch/arm/util/cs-etm.c                  |  51 ++-
 tools/perf/arch/powerpc/Makefile                   |  25 ++
 .../perf/arch/powerpc/entry/syscalls/mksyscalltbl  |  37 ++
 tools/perf/arch/s390/annotate/instructions.c       |  27 +-
 tools/perf/arch/s390/util/header.c                 | 148 ++++++-
 tools/perf/builtin-record.c                        |   2 +-
 tools/perf/builtin-report.c                        |   7 +-
 tools/perf/builtin-script.c                        |  17 +
 tools/perf/builtin-stat.c                          |  53 ++-
 tools/perf/check-headers.sh                        |   1 +
 tools/perf/tests/code-reading.c                    |  33 +-
 tools/perf/tests/dwarf-unwind.c                    |  46 +-
 tools/perf/tests/shell/lib/probe_vfs_getname.sh    |   2 +-
 .../perf/tests/shell/trace+probe_libc_inet_pton.sh |   6 +-
 tools/perf/tests/vmlinux-kallsyms.c                |   4 +-
 tools/perf/ui/browsers/annotate.c                  |   9 +-
 tools/perf/util/build-id.c                         |  10 +-
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c    |  74 +++-
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h    |   2 +
 tools/perf/util/cs-etm.c                           | 478 ++++++++++++++++++---
 tools/perf/util/event.c                            |  16 +-
 tools/perf/util/evlist.c                           |  21 +-
 tools/perf/util/header.h                           |   1 +
 tools/perf/util/hist.c                             |   4 +-
 tools/perf/util/hist.h                             |   1 -
 tools/perf/util/machine.c                          | 145 +++----
 tools/perf/util/machine.h                          |   6 +-
 tools/perf/util/pmu.c                              |  47 +-
 tools/perf/util/sort.c                             |   7 +-
 tools/perf/util/stat.h                             |   2 +
 tools/perf/util/symbol.c                           |  13 +-
 tools/perf/util/syscalltbl.c                       |   8 +
 tools/perf/util/thread_map.c                       |   4 +-
 tools/perf/util/thread_map.h                       |   2 +-
 47 files changed, 1577 insertions(+), 273 deletions(-)
 create mode 100644 tools/arch/powerpc/include/uapi/asm/unistd.h
 create mode 100755 tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl

Test results:

The first ones are container (docker) based builds of tools/perf with and
without libelf support.  Where clang is available, it is also used to build
perf with/without libelf.

The objtool and samples/bpf/ builds are disabled now that I'm switching from
using the sources in a local volume to fetching them from an HTTP server to
build them inside the container, to make it easier to build in a container cluster.
Those will come back later.

Several are cross builds, the ones with -x-ARCH and the android one, and those
may not have all the features built, due to the lack of multi-arch devel packages,
which are available and being used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' one performs a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as running perf commands
with a variety of command line event specifications and intercepting the
sys_perf_event_open syscall to check that the perf_event_attr fields are set up as
expected, among a variety of other unit tests.

Then there are the 'make -C tools/perf build-test' ones, which build tools/perf/
with a variety of feature sets, exercising the build with an incomplete set of
features as well as with a complete one. It is planned to have it run on each
of the containers mentioned above, using some container orchestration
infrastructure. Get in contact if you are interested in helping to put this in place.

  On an Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz

  # dm
   1 39.82 alpine:3.4                    : Ok   gcc (Alpine 5.3.0) 5.3.0
   2 57.59 alpine:3.5                    : Ok   gcc (Alpine 6.2.1) 6.2.1 20160822
   3 44.30 alpine:3.6                    : Ok   gcc (Alpine 6.3.0) 6.3.0
   4 42.14 alpine:edge                   : Ok   gcc (Alpine 6.4.0) 6.4.0
   5 35.50 amazonlinux:1                 : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
   6 42.97 amazonlinux:2                 : Ok   gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2)
   7 26.46 android-ndk:r12b-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
   8 27.28 android-ndk:r15c-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
   9 24.29 centos:5                      : Ok   gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)
  10 32.15 centos:6                      : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18)
  11 39.36 centos:7                      : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
  12 34.69 debian:7                      : Ok   gcc (Debian 4.7.2-5) 4.7.2
  13 37.92 debian:8                      : Ok   gcc (Debian 4.9.2-10) 4.9.2
  14 62.13 debian:9                      : Ok   gcc (Debian 6.3.0-18) 6.3.0 20170516
  15 65.51 debian:experimental           : Ok   gcc (Debian 7.2.0-18) 7.2.0
  16 38.73 debian:experimental-x-arm64   : Ok   aarch64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
  17 68.18 debian:experimental-x-mips    : Ok   mips-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
  18 36.21 debian:experimental-x-mips64  : Ok   mips64-linux-gnuabi64-gcc (Debian 7.2.0-11) 7.2.0
  19 37.57 debian:experimental-x-mipsel  : Ok   mipsel-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
  20 38.22 fedora:20                     : Ok   gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
  21 42.49 fedora:21                     : Ok   gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
  22 39.15 fedora:22                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
  23 41.46 fedora:23                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
  24 41.12 fedora:24                     : Ok   gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1)
  25 34.92 fedora:24-x-ARC-uClibc        : Ok   arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
  26 78.28 fedora:25                     : Ok   gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1)
  27 84.02 fedora:26                     : Ok   gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2)
  28 95.42 fedora:27                     : Ok   gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2)
  29 78.89 fedora:rawhide                : Ok   gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-4)
  30 57.48 gentoo-stage3-amd64:latest    : Ok   gcc (Gentoo 6.4.0-r1 p1.3) 6.4.0
  31 41.18 mageia:5                      : Ok   gcc (GCC) 4.9.2
  32 42.27 mageia:6                      : Ok   gcc (Mageia 5.4.0-5.mga6) 5.4.0
  33 39.66 opensuse:42.1                 : Ok   gcc (SUSE Linux) 4.8.5
  34 40.09 opensuse:42.2                 : Ok   gcc (SUSE Linux) 4.8.5
  35 41.01 opensuse:42.3                 : Ok   gcc (SUSE Linux) 4.8.5
  36 82.32 opensuse:tumbleweed           : Ok   gcc (SUSE Linux) 7.3.0
  37 31.70 oraclelinux:6                 : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18)
  38 38.39 oraclelinux:7                 : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
  39 30.49 ubuntu:12.04.5                : Ok   gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
  40 36.44 ubuntu:14.04.4                : Ok   gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
  41 32.13 ubuntu:14.04.4-x-linaro-arm64 : Ok   aarch64-linux-gnu-gcc (Linaro GCC 5.5-2017.10) 5.5.0
  42 58.58 ubuntu:16.04                  : Ok   gcc (Ubuntu 5.4.0-6ubuntu1~16.04.6) 5.4.0 20160609
  43 31.52 ubuntu:16.04-x-arm            : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
  44 31.06 ubuntu:16.04-x-arm64          : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
  45 31.61 ubuntu:16.04-x-powerpc        : Ok   powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
  46 31.93 ubuntu:16.04-x-powerpc64      : Ok   powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.1) 5.4.0 20160609
  47 33.02 ubuntu:16.04-x-powerpc64el    : Ok   powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
  48 30.94 ubuntu:16.04-x-s390           : Ok   s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
  49 63.24 ubuntu:16.10                  : Ok   gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005
  50 63.34 ubuntu:17.04                  : Ok   gcc (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406
  51 63.56 ubuntu:17.10                  : Ok   gcc (Ubuntu 7.2.0-8ubuntu3) 7.2.0
  52 63.45 ubuntu:18.04                  : Ok   gcc (Ubuntu 7.2.0-18ubuntu2) 7.2.0

  # uname -a
  Linux jouet 4.15.0-rc9+ #7 SMP Mon Jan 22 18:16:36 -03 2018 x86_64 x86_64 x86_64 GNU/Linux
  # perf test
   1: vmlinux symtab matches kallsyms                       : Ok
   2: Detect openat syscall event                           : Ok
   3: Detect openat syscall event on all cpus               : Ok
   4: Read samples using the mmap interface                 : Ok
   5: Test data source output                               : Ok
   6: Parse event definition strings                        : Ok
   7: Simple expression parser                              : Ok
   8: PERF_RECORD_* events & perf_sample fields             : Ok
   9: Parse perf pmu format                                 : Ok
  10: DSO data read                                         : Ok
  11: DSO data cache                                        : Ok
  12: DSO data reopen                                       : Ok
  13: Roundtrip evsel->name                                 : Ok
  14: Parse sched tracepoints fields                        : Ok
  15: syscalls:sys_enter_openat event fields                : Ok
  16: Setup struct perf_event_attr                          : Ok
  17: Match and link multiple hists                         : Ok
  18: 'import perf' in python                               : Ok
  19: Breakpoint overflow signal handler                    : Ok
  20: Breakpoint overflow sampling                          : Ok
  21: Number of exit events of a simple workload            : Ok
  22: Software clock events period values                   : Ok
  23: Object code reading                                   : Ok
  24: Sample parsing                                        : Ok
  25: Use a dummy software event to keep tracking           : Ok
  26: Parse with no sample_id_all bit set                   : Ok
  27: Filter hist entries                                   : Ok
  28: Lookup mmap thread                                    : Ok
  29: Share thread mg                                       : Ok
  30: Sort output of hist entries                           : Ok
  31: Cumulate child hist entries                           : Ok
  32: Track with sched_switch                               : Ok
  33: Filter fds with revents mask in a fdarray             : Ok
  34: Add fd to a fdarray, making it autogrow               : Ok
  35: kmod_path__parse                                      : Ok
  36: Thread map                                            : Ok
  37: LLVM search and compile                               :
  37.1: Basic BPF llvm compile                              : Ok
  37.2: kbuild searching                                    : Ok
  37.3: Compile source for BPF prologue generation          : Ok
  37.4: Compile source for BPF relocation                   : Ok
  38: Session topology                                      : Ok
  39: BPF filter                                            :
  39.1: Basic BPF filtering                                 : Ok
  39.2: BPF pinning                                         : Ok
  39.3: BPF prologue generation                             : Ok
  39.4: BPF relocation checker                              : Ok
  40: Synthesize thread map                                 : Ok
  41: Remove thread map                                     : Ok
  42: Synthesize cpu map                                    : Ok
  43: Synthesize stat config                                : Ok
  44: Synthesize stat                                       : Ok
  45: Synthesize stat round                                 : Ok
  46: Synthesize attr update                                : Ok
  47: Event times                                           : Ok
  48: Read backward ring buffer                             : Ok
  49: Print cpu map                                         : Ok
  50: Probe SDT events                                      : Ok
  51: is_printable_array                                    : Ok
  52: Print bitmap                                          : Ok
  53: perf hooks                                            : Ok
  54: builtin clang support                                 : Skip (not compiled in)
  55: unit_number__scnprintf                                : Ok
  56: x86 rdpmc                                             : Ok
  57: Convert perf time to TSC                              : Ok
  58: DWARF unwind                                          : Ok
  59: x86 instruction decoder - new instructions            : Ok
  60: Use vfs_getname probe to get syscall args filenames   : Ok
  61: probe libc's inet_pton & backtrace it with ping       : Ok
  62: Check open filename arg using perf trace + vfs_getname: Ok
  63: Add vfs_getname probe to get syscall args filenames   : Ok
  # 
  
  $ make -C tools/perf build-test
  make: Entering directory '/home/acme/git/perf/tools/perf'
  - tarpkg: ./tests/perf-targz-src-pkg .
           make_no_libunwind_O: make NO_LIBUNWIND=1
           make_no_backtrace_O: make NO_BACKTRACE=1
        make_with_babeltrace_O: make LIBBABELTRACE=1
  make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
   make_install_prefix_slash_O: make install prefix=/tmp/krava/
                 make_static_O: make LDFLAGS=-static
                make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1
         make_with_clangllvm_O: make LIBCLANGLLVM=1
                make_install_O: make install
                    make_doc_O: make doc
                   make_pure_O: make
            make_no_libaudit_O: make NO_LIBAUDIT=1
               make_no_slang_O: make NO_SLANG=1
                make_no_gtk2_O: make NO_GTK2=1
            make_install_bin_O: make install-bin
             make_no_libperl_O: make NO_LIBPERL=1
            make_no_demangle_O: make NO_DEMANGLE=1
                   make_tags_O: make tags
            make_no_auxtrace_O: make NO_AUXTRACE=1
       make_util_pmu_bison_o_O: make util/pmu-bison.o
         make_install_prefix_O: make install prefix=/tmp/krava
                  make_debug_O: make DEBUG=1
             make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
                make_no_newt_O: make NO_NEWT=1
                  make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
             make_no_libnuma_O: make NO_LIBNUMA=1
              make_no_libelf_O: make NO_LIBELF=1
                   make_help_O: make help
              make_no_libbpf_O: make NO_LIBBPF=1
           make_no_libpython_O: make NO_LIBPYTHON=1
           make_no_libbionic_O: make NO_LIBBIONIC=1
              make_clean_all_O: make clean all
                 make_perf_o_O: make perf.o
             make_util_map_o_O: make util/map.o
  OK
  make: Leaving directory '/home/acme/git/perf/tools/perf'
  $ 


* [GIT PULL 00/41] perf/core improvements and fixes
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo,
	Adrian Hunter, Alexander Shishkin, Andi Kleen, coresight,
	David Ahern, Heiko Carstens, Hendrik Brueckner, Jaecheol Shin,
	Jin Yao, Jiri Olsa, Kan Liang, linux-arm-kernel, linuxppc-dev,
	Martin Schwidefsky, Masami Hiramatsu, Mathieu Poirier,
	Michael Ellerman

Hi Ingo,

	Please consider pulling, this is on top of tip/perf/urgent.

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit 297f9233b53a08fd457815e19f1d6f2c3389857b:

  kprobes: Propagate error from disarm_kprobe_ftrace() (2018-02-16 09:12:58 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.17-20180216

for you to fetch changes up to 21316ac6803d4a1aadd74b896db8d60a92cd1140:

  perf tests shell lib: Use a wildcard to remove the vfs_getname probe (2018-02-16 15:31:12 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

- Fix wrong jump arrow in systems with branch records with cycles,
  i.e. Intel's >= Skylake (Jin Yao)

- Fix 'perf record --per-thread' problem introduced when
  implementing 'perf stat --per-thread (Jin Yao)

- Use arch__compare_symbol_names() to fix 'perf test vmlinux',
  that was using strcmp(symbol names) while the dso routines
  doing symbol lookups used the arch overridable one, making
  this test fail in architectures that overrided that function
  with something other than strcmp() (Jiri Olsa)

- Add 'perf script --show-round-event' to display
  PERF_RECORD_FINISHED_ROUND entries (Jiri Olsa)

- Fix dwarf unwind for stripped binaries in 'perf test' (Jiri Olsa)

- Use ordered_events for 'perf report --tasks', otherwise we may get
  artifacts when PERF_RECORD_FORK gets processed before PERF_RECORD_COMM
  (when they got recorded in different CPUs) (Jiri Olsa)

- Add support to display group output for non group events, i.e.
  now when one uses 'perf report --group' on a perf.data file
  recorded without explicitly grouping events with {} (e.g.
  "perf record -e '{cycles,instructions}'" get the same output
  that would produce, i.e. see all those non-grouped events in
  multiple columns, at the same time (Jiri Olsa)

- Skip non-address kallsyms entries, e.g. '(null)' for !root (Jiri Olsa)

- Kernel maps fixes wrt perf.data(report) versus live system (top)
  (Jiri Olsa)

- Fix memory corruption when using 'perf record -j call -g -a <application>'
  followed by 'perf report --branch-history' (Jiri Olsa)

- ARM CoreSight fixes (Mathieu Poirier)

- Add inject capability for CoreSight Traces (Robert Waker)

- Update documentation for use of 'perf' + ARM CoreSight (Robert Walker)

- Man pages fixes (Sangwon Hong, Jaecheol Shin)

- Fix some 'perf test' cases on s/390 and x86_64 (some backtraces
  changed with a glibc update) (Thomas Richter)

- Add detailed CPUID info in the 'perf.data' headers for s/390 to
  then use it in 'perf annotate' (Thomas Richter)

- Add '--interval-count N' to 'perf stat', to use with -I, i.e.
  'perf stat -I 1000 --interval-count 2' will show stats every
   1000ms, two times (yuzhoujian)

- Add 'perf stat --timeout Nms', that will run for that many
  milliseconds and then stop, printing the counters (yuzhoujian)

- Fix description for 'perf report --mem-modex (Andi Kleen)

- Use a wildcard to remove the vfs_getname probe in the
  'perf test' shell based test cases (Arnaldo Carvalho de Melo)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Andi Kleen (1):
      perf report: Fix description for --mem-mode

Arnaldo Carvalho de Melo (1):
      perf tests shell lib: Use a wildcard to remove the vfs_getname probe

Jaecheol Shin (1):
      perf annotate: Add missing arguments in Man page

Jin Yao (2):
      perf tools: Use target->per_thread and target->system_wide flags
      perf report: Fix wrong jump arrow

Jiri Olsa (18):
      perf record: Put new line after target override warning
      perf script: Add --show-round-event to display PERF_RECORD_FINISHED_ROUND
      tools lib api fs: Add filename__read_xll function
      tools lib api fs: Add sysfs__read_xll function
      perf tests: Fix dwarf unwind for stripped binaries
      perf tools: Fix comment for sort__* compare functions
      perf report: Ask for ordered events for --tasks option
      perf report: Add support to display group output for non group events
      tools lib symbol: Skip non-address kallsyms line
      perf symbols: Check if we read regular file in dso__load()
      perf machine: Free root_dir in machine__init() error path
      perf machine: Move kernel mmap name into struct machine
      perf machine: Generalize machine__set_kernel_mmap()
      perf machine: Don't search for active kernel start in __machine__create_kernel_maps
      perf machine: Remove machine__load_kallsyms()
      perf tools: Do not create kernel maps in sample__resolve()
      perf tests: Use arch__compare_symbol_names to compare symbols
      perf report: Fix memory corruption in --branch-history mode --branch-history

Mathieu Poirier (3):
      perf cs-etm: Freeing allocated memory
      perf auxtrace arm: Fixing uninitialised variable
      perf cs-etm: Properly deal with cpu maps

Ravi Bangoria (3):
      tools include powerpc: Grab a copy of arch/powerpc/include/uapi/asm/unistd.h
      perf powerpc: Generate system call table from asm/unistd.h
      perf trace powerpc: Use generated syscall table

Robert Walker (3):
      perf cs-etm: Inject capabilitity for CoreSight traces
      perf inject: Emit instruction records on ETM trace discontinuity
      coresight: Update documentation for perf usage

Sangwon Hong (2):
      perf kmem: Document a missing option & an argument
      perf mem: Document a missing option

Thomas Richter (5):
      perf record: Provide detailed information on s390 CPU
      perf annotate: Scan cpuid for s390 and save machine type
      perf cpuid: Introduce a platform specific cpuid compare function
      perf test: Fix test case 23 for s390 z/VM or KVM guests
      perf test: Fix test case inet_pton to accept inlines.

yuzhoujian (2):
      perf stat: Add support to print counts for fixed times
      perf stat: Add support to print counts after a period of time

 Documentation/trace/coresight.txt                  |  51 +++
 tools/arch/powerpc/include/uapi/asm/unistd.h       | 402 +++++++++++++++++
 tools/lib/api/fs/fs.c                              |  44 +-
 tools/lib/api/fs/fs.h                              |   2 +
 tools/lib/symbol/kallsyms.c                        |   4 +
 tools/perf/Documentation/perf-annotate.txt         |   6 +-
 tools/perf/Documentation/perf-kmem.txt             |   6 +-
 tools/perf/Documentation/perf-mem.txt              |   4 +
 tools/perf/Documentation/perf-report.txt           |   5 +-
 tools/perf/Documentation/perf-script.txt           |   3 +
 tools/perf/Documentation/perf-stat.txt             |  10 +
 tools/perf/Makefile.config                         |   2 +
 tools/perf/arch/arm/util/auxtrace.c                |   2 +-
 tools/perf/arch/arm/util/cs-etm.c                  |  51 ++-
 tools/perf/arch/powerpc/Makefile                   |  25 ++
 .../perf/arch/powerpc/entry/syscalls/mksyscalltbl  |  37 ++
 tools/perf/arch/s390/annotate/instructions.c       |  27 +-
 tools/perf/arch/s390/util/header.c                 | 148 ++++++-
 tools/perf/builtin-record.c                        |   2 +-
 tools/perf/builtin-report.c                        |   7 +-
 tools/perf/builtin-script.c                        |  17 +
 tools/perf/builtin-stat.c                          |  53 ++-
 tools/perf/check-headers.sh                        |   1 +
 tools/perf/tests/code-reading.c                    |  33 +-
 tools/perf/tests/dwarf-unwind.c                    |  46 +-
 tools/perf/tests/shell/lib/probe_vfs_getname.sh    |   2 +-
 .../perf/tests/shell/trace+probe_libc_inet_pton.sh |   6 +-
 tools/perf/tests/vmlinux-kallsyms.c                |   4 +-
 tools/perf/ui/browsers/annotate.c                  |   9 +-
 tools/perf/util/build-id.c                         |  10 +-
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c    |  74 +++-
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h    |   2 +
 tools/perf/util/cs-etm.c                           | 478 ++++++++++++++++++---
 tools/perf/util/event.c                            |  16 +-
 tools/perf/util/evlist.c                           |  21 +-
 tools/perf/util/header.h                           |   1 +
 tools/perf/util/hist.c                             |   4 +-
 tools/perf/util/hist.h                             |   1 -
 tools/perf/util/machine.c                          | 145 +++----
 tools/perf/util/machine.h                          |   6 +-
 tools/perf/util/pmu.c                              |  47 +-
 tools/perf/util/sort.c                             |   7 +-
 tools/perf/util/stat.h                             |   2 +
 tools/perf/util/symbol.c                           |  13 +-
 tools/perf/util/syscalltbl.c                       |   8 +
 tools/perf/util/thread_map.c                       |   4 +-
 tools/perf/util/thread_map.h                       |   2 +-
 47 files changed, 1577 insertions(+), 273 deletions(-)
 create mode 100644 tools/arch/powerpc/include/uapi/asm/unistd.h
 create mode 100755 tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl

Test results:

The first set are container (Docker) based builds of tools/perf, with and
without libelf support. Where clang is available, it is also used to build
perf with and without libelf.

The objtool and samples/bpf/ builds are disabled now that I'm switching from
using the sources in a local volume to fetching them from an HTTP server and
building inside the container, which makes it easier to build in a container
cluster. Those will come back later.

Several are cross builds (the ones with -x-ARCH, plus the android one), and
those may not have all features built, due to the lack of multi-arch devel
packages, which are available and used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' run exercises tools/perf/util/ and
tools/lib/{bpf,traceevent,etc}, and also runs perf commands with many
different command line event specifications, intercepting the
sys_perf_event_open syscall to check that the perf_event_attr fields are set
up as expected, alongside a number of other unit tests.

Then there are the 'make -C tools/perf build-test' builds, which build
tools/perf/ with a variety of feature sets, exercising the build with an
incomplete set of features as well as with a complete one. The plan is to run
these on each of the containers mentioned above, using some container
orchestration infrastructure. Get in contact if you are interested in helping
to put this in place.

  On an Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz

  # dm
   1 39.82 alpine:3.4                    : Ok   gcc (Alpine 5.3.0) 5.3.0
   2 57.59 alpine:3.5                    : Ok   gcc (Alpine 6.2.1) 6.2.1 20160822
   3 44.30 alpine:3.6                    : Ok   gcc (Alpine 6.3.0) 6.3.0
   4 42.14 alpine:edge                   : Ok   gcc (Alpine 6.4.0) 6.4.0
   5 35.50 amazonlinux:1                 : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
   6 42.97 amazonlinux:2                 : Ok   gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2)
   7 26.46 android-ndk:r12b-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
   8 27.28 android-ndk:r15c-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
   9 24.29 centos:5                      : Ok   gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)
  10 32.15 centos:6                      : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18)
  11 39.36 centos:7                      : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
  12 34.69 debian:7                      : Ok   gcc (Debian 4.7.2-5) 4.7.2
  13 37.92 debian:8                      : Ok   gcc (Debian 4.9.2-10) 4.9.2
  14 62.13 debian:9                      : Ok   gcc (Debian 6.3.0-18) 6.3.0 20170516
  15 65.51 debian:experimental           : Ok   gcc (Debian 7.2.0-18) 7.2.0
  16 38.73 debian:experimental-x-arm64   : Ok   aarch64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
  17 68.18 debian:experimental-x-mips    : Ok   mips-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
  18 36.21 debian:experimental-x-mips64  : Ok   mips64-linux-gnuabi64-gcc (Debian 7.2.0-11) 7.2.0
  19 37.57 debian:experimental-x-mipsel  : Ok   mipsel-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
  20 38.22 fedora:20                     : Ok   gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
  21 42.49 fedora:21                     : Ok   gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
  22 39.15 fedora:22                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
  23 41.46 fedora:23                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
  24 41.12 fedora:24                     : Ok   gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1)
  25 34.92 fedora:24-x-ARC-uClibc        : Ok   arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
  26 78.28 fedora:25                     : Ok   gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1)
  27 84.02 fedora:26                     : Ok   gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2)
  28 95.42 fedora:27                     : Ok   gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2)
  29 78.89 fedora:rawhide                : Ok   gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-4)
  30 57.48 gentoo-stage3-amd64:latest    : Ok   gcc (Gentoo 6.4.0-r1 p1.3) 6.4.0
  31 41.18 mageia:5                      : Ok   gcc (GCC) 4.9.2
  32 42.27 mageia:6                      : Ok   gcc (Mageia 5.4.0-5.mga6) 5.4.0
  33 39.66 opensuse:42.1                 : Ok   gcc (SUSE Linux) 4.8.5
  34 40.09 opensuse:42.2                 : Ok   gcc (SUSE Linux) 4.8.5
  35 41.01 opensuse:42.3                 : Ok   gcc (SUSE Linux) 4.8.5
  36 82.32 opensuse:tumbleweed           : Ok   gcc (SUSE Linux) 7.3.0
  37 31.70 oraclelinux:6                 : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18)
  38 38.39 oraclelinux:7                 : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
  39 30.49 ubuntu:12.04.5                : Ok   gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
  40 36.44 ubuntu:14.04.4                : Ok   gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
  41 32.13 ubuntu:14.04.4-x-linaro-arm64 : Ok   aarch64-linux-gnu-gcc (Linaro GCC 5.5-2017.10) 5.5.0
  42 58.58 ubuntu:16.04                  : Ok   gcc (Ubuntu 5.4.0-6ubuntu1~16.04.6) 5.4.0 20160609
  43 31.52 ubuntu:16.04-x-arm            : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
  44 31.06 ubuntu:16.04-x-arm64          : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
  45 31.61 ubuntu:16.04-x-powerpc        : Ok   powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
  46 31.93 ubuntu:16.04-x-powerpc64      : Ok   powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.1) 5.4.0 20160609
  47 33.02 ubuntu:16.04-x-powerpc64el    : Ok   powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
  48 30.94 ubuntu:16.04-x-s390           : Ok   s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
  49 63.24 ubuntu:16.10                  : Ok   gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005
  50 63.34 ubuntu:17.04                  : Ok   gcc (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406
  51 63.56 ubuntu:17.10                  : Ok   gcc (Ubuntu 7.2.0-8ubuntu3) 7.2.0
  52 63.45 ubuntu:18.04                  : Ok   gcc (Ubuntu 7.2.0-18ubuntu2) 7.2.0

  # uname -a
  Linux jouet 4.15.0-rc9+ #7 SMP Mon Jan 22 18:16:36 -03 2018 x86_64 x86_64 x86_64 GNU/Linux
  # perf test
   1: vmlinux symtab matches kallsyms                       : Ok
   2: Detect openat syscall event                           : Ok
   3: Detect openat syscall event on all cpus               : Ok
   4: Read samples using the mmap interface                 : Ok
   5: Test data source output                               : Ok
   6: Parse event definition strings                        : Ok
   7: Simple expression parser                              : Ok
   8: PERF_RECORD_* events & perf_sample fields             : Ok
   9: Parse perf pmu format                                 : Ok
  10: DSO data read                                         : Ok
  11: DSO data cache                                        : Ok
  12: DSO data reopen                                       : Ok
  13: Roundtrip evsel->name                                 : Ok
  14: Parse sched tracepoints fields                        : Ok
  15: syscalls:sys_enter_openat event fields                : Ok
  16: Setup struct perf_event_attr                          : Ok
  17: Match and link multiple hists                         : Ok
  18: 'import perf' in python                               : Ok
  19: Breakpoint overflow signal handler                    : Ok
  20: Breakpoint overflow sampling                          : Ok
  21: Number of exit events of a simple workload            : Ok
  22: Software clock events period values                   : Ok
  23: Object code reading                                   : Ok
  24: Sample parsing                                        : Ok
  25: Use a dummy software event to keep tracking           : Ok
  26: Parse with no sample_id_all bit set                   : Ok
  27: Filter hist entries                                   : Ok
  28: Lookup mmap thread                                    : Ok
  29: Share thread mg                                       : Ok
  30: Sort output of hist entries                           : Ok
  31: Cumulate child hist entries                           : Ok
  32: Track with sched_switch                               : Ok
  33: Filter fds with revents mask in a fdarray             : Ok
  34: Add fd to a fdarray, making it autogrow               : Ok
  35: kmod_path__parse                                      : Ok
  36: Thread map                                            : Ok
  37: LLVM search and compile                               :
  37.1: Basic BPF llvm compile                              : Ok
  37.2: kbuild searching                                    : Ok
  37.3: Compile source for BPF prologue generation          : Ok
  37.4: Compile source for BPF relocation                   : Ok
  38: Session topology                                      : Ok
  39: BPF filter                                            :
  39.1: Basic BPF filtering                                 : Ok
  39.2: BPF pinning                                         : Ok
  39.3: BPF prologue generation                             : Ok
  39.4: BPF relocation checker                              : Ok
  40: Synthesize thread map                                 : Ok
  41: Remove thread map                                     : Ok
  42: Synthesize cpu map                                    : Ok
  43: Synthesize stat config                                : Ok
  44: Synthesize stat                                       : Ok
  45: Synthesize stat round                                 : Ok
  46: Synthesize attr update                                : Ok
  47: Event times                                           : Ok
  48: Read backward ring buffer                             : Ok
  49: Print cpu map                                         : Ok
  50: Probe SDT events                                      : Ok
  51: is_printable_array                                    : Ok
  52: Print bitmap                                          : Ok
  53: perf hooks                                            : Ok
  54: builtin clang support                                 : Skip (not compiled in)
  55: unit_number__scnprintf                                : Ok
  56: x86 rdpmc                                             : Ok
  57: Convert perf time to TSC                              : Ok
  58: DWARF unwind                                          : Ok
  59: x86 instruction decoder - new instructions            : Ok
  60: Use vfs_getname probe to get syscall args filenames   : Ok
  61: probe libc's inet_pton & backtrace it with ping       : Ok
  62: Check open filename arg using perf trace + vfs_getname: Ok
  63: Add vfs_getname probe to get syscall args filenames   : Ok
  # 
  
  $ make -C tools/perf build-test
  make: Entering directory '/home/acme/git/perf/tools/perf'
  - tarpkg: ./tests/perf-targz-src-pkg .
           make_no_libunwind_O: make NO_LIBUNWIND=1
           make_no_backtrace_O: make NO_BACKTRACE=1
        make_with_babeltrace_O: make LIBBABELTRACE=1
  make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
   make_install_prefix_slash_O: make install prefix=/tmp/krava/
                 make_static_O: make LDFLAGS=-static
                make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1
         make_with_clangllvm_O: make LIBCLANGLLVM=1
                make_install_O: make install
                    make_doc_O: make doc
                   make_pure_O: make
            make_no_libaudit_O: make NO_LIBAUDIT=1
               make_no_slang_O: make NO_SLANG=1
                make_no_gtk2_O: make NO_GTK2=1
            make_install_bin_O: make install-bin
             make_no_libperl_O: make NO_LIBPERL=1
            make_no_demangle_O: make NO_DEMANGLE=1
                   make_tags_O: make tags
            make_no_auxtrace_O: make NO_AUXTRACE=1
       make_util_pmu_bison_o_O: make util/pmu-bison.o
         make_install_prefix_O: make install prefix=/tmp/krava
                  make_debug_O: make DEBUG=1
             make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
                make_no_newt_O: make NO_NEWT=1
                  make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
             make_no_libnuma_O: make NO_LIBNUMA=1
              make_no_libelf_O: make NO_LIBELF=1
                   make_help_O: make help
              make_no_libbpf_O: make NO_LIBBPF=1
           make_no_libpython_O: make NO_LIBPYTHON=1
           make_no_libbionic_O: make NO_LIBBIONIC=1
              make_clean_all_O: make clean all
                 make_perf_o_O: make perf.o
             make_util_map_o_O: make util/map.o
  OK
  make: Leaving directory '/home/acme/git/perf/tools/perf'
  $ 


* [GIT PULL 00/41] perf/core improvements and fixes
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC
  To: linux-arm-kernel

Hi Ingo,

	Please consider pulling, this is on top of tip/perf/urgent.

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit 297f9233b53a08fd457815e19f1d6f2c3389857b:

  kprobes: Propagate error from disarm_kprobe_ftrace() (2018-02-16 09:12:58 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.17-20180216

for you to fetch changes up to 21316ac6803d4a1aadd74b896db8d60a92cd1140:

  perf tests shell lib: Use a wildcard to remove the vfs_getname probe (2018-02-16 15:31:12 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

- Fix wrong jump arrow on systems whose branch records include cycle
  counts, i.e. Intel Skylake and later (Jin Yao)

- Fix a 'perf record --per-thread' problem introduced when
  implementing 'perf stat --per-thread' (Jin Yao)

- Use arch__compare_symbol_names() to fix 'perf test vmlinux',
  which was comparing symbol names with strcmp() while the dso
  routines doing symbol lookups used the arch-overridable function,
  making this test fail on architectures that override that function
  with something other than strcmp() (Jiri Olsa)

- Add 'perf script --show-round-event' to display
  PERF_RECORD_FINISHED_ROUND entries (Jiri Olsa)

- Fix dwarf unwind for stripped binaries in 'perf test' (Jiri Olsa)

- Use ordered_events for 'perf report --tasks', otherwise we may get
  artifacts when PERF_RECORD_FORK gets processed before PERF_RECORD_COMM
  (when they were recorded on different CPUs) (Jiri Olsa)

- Add support to display group output for non group events, i.e.
  when one uses 'perf report --group' on a perf.data file recorded
  without explicitly grouping events with {}, one now gets the same
  output that a grouped recording (e.g.
  "perf record -e '{cycles,instructions}'") would produce, showing
  all those non-grouped events in multiple columns at the same time
  (Jiri Olsa)

- Skip non-address kallsyms entries, e.g. '(null)' for !root (Jiri Olsa)

- Kernel maps fixes with respect to perf.data files ('perf report')
  versus the live system ('perf top') (Jiri Olsa)

- Fix memory corruption when using 'perf record -j call -g -a <application>'
  followed by 'perf report --branch-history' (Jiri Olsa)

- ARM CoreSight fixes (Mathieu Poirier)

- Add inject capability for CoreSight traces (Robert Walker)

- Update documentation for use of 'perf' + ARM CoreSight (Robert Walker)

- Man page fixes (Sangwon Hong, Jaecheol Shin)

- Fix some 'perf test' cases on s/390 and x86_64 (some backtraces
  changed with a glibc update) (Thomas Richter)

- Add detailed CPUID info in the 'perf.data' headers for s/390 to
  then use it in 'perf annotate' (Thomas Richter)

- Add '--interval-count N' to 'perf stat', for use with -I, e.g.
  'perf stat -I 1000 --interval-count 2' will show stats every
  1000ms, two times (yuzhoujian)

- Add 'perf stat --timeout Nms', that will run for that many
  milliseconds and then stop, printing the counters (yuzhoujian)

- Fix description for 'perf report --mem-mode' (Andi Kleen)

- Use a wildcard to remove the vfs_getname probe in the
  'perf test' shell based test cases (Arnaldo Carvalho de Melo)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Andi Kleen (1):
      perf report: Fix description for --mem-mode

Arnaldo Carvalho de Melo (1):
      perf tests shell lib: Use a wildcard to remove the vfs_getname probe

Jaecheol Shin (1):
      perf annotate: Add missing arguments in Man page

Jin Yao (2):
      perf tools: Use target->per_thread and target->system_wide flags
      perf report: Fix wrong jump arrow

Jiri Olsa (18):
      perf record: Put new line after target override warning
      perf script: Add --show-round-event to display PERF_RECORD_FINISHED_ROUND
      tools lib api fs: Add filename__read_xll function
      tools lib api fs: Add sysfs__read_xll function
      perf tests: Fix dwarf unwind for stripped binaries
      perf tools: Fix comment for sort__* compare functions
      perf report: Ask for ordered events for --tasks option
      perf report: Add support to display group output for non group events
      tools lib symbol: Skip non-address kallsyms line
      perf symbols: Check if we read regular file in dso__load()
      perf machine: Free root_dir in machine__init() error path
      perf machine: Move kernel mmap name into struct machine
      perf machine: Generalize machine__set_kernel_mmap()
      perf machine: Don't search for active kernel start in __machine__create_kernel_maps
      perf machine: Remove machine__load_kallsyms()
      perf tools: Do not create kernel maps in sample__resolve()
      perf tests: Use arch__compare_symbol_names to compare symbols
      perf report: Fix memory corruption in --branch-history mode --branch-history

Mathieu Poirier (3):
      perf cs-etm: Freeing allocated memory
      perf auxtrace arm: Fixing uninitialised variable
      perf cs-etm: Properly deal with cpu maps

Ravi Bangoria (3):
      tools include powerpc: Grab a copy of arch/powerpc/include/uapi/asm/unistd.h
      perf powerpc: Generate system call table from asm/unistd.h
      perf trace powerpc: Use generated syscall table

Robert Walker (3):
      perf cs-etm: Inject capabilitity for CoreSight traces
      perf inject: Emit instruction records on ETM trace discontinuity
      coresight: Update documentation for perf usage

Sangwon Hong (2):
      perf kmem: Document a missing option & an argument
      perf mem: Document a missing option

Thomas Richter (5):
      perf record: Provide detailed information on s390 CPU
      perf annotate: Scan cpuid for s390 and save machine type
      perf cpuid: Introduce a platform specific cpuid compare function
      perf test: Fix test case 23 for s390 z/VM or KVM guests
      perf test: Fix test case inet_pton to accept inlines.

yuzhoujian (2):
      perf stat: Add support to print counts for fixed times
      perf stat: Add support to print counts after a period of time

 Documentation/trace/coresight.txt                  |  51 +++
 tools/arch/powerpc/include/uapi/asm/unistd.h       | 402 +++++++++++++++++
 tools/lib/api/fs/fs.c                              |  44 +-
 tools/lib/api/fs/fs.h                              |   2 +
 tools/lib/symbol/kallsyms.c                        |   4 +
 tools/perf/Documentation/perf-annotate.txt         |   6 +-
 tools/perf/Documentation/perf-kmem.txt             |   6 +-
 tools/perf/Documentation/perf-mem.txt              |   4 +
 tools/perf/Documentation/perf-report.txt           |   5 +-
 tools/perf/Documentation/perf-script.txt           |   3 +
 tools/perf/Documentation/perf-stat.txt             |  10 +
 tools/perf/Makefile.config                         |   2 +
 tools/perf/arch/arm/util/auxtrace.c                |   2 +-
 tools/perf/arch/arm/util/cs-etm.c                  |  51 ++-
 tools/perf/arch/powerpc/Makefile                   |  25 ++
 .../perf/arch/powerpc/entry/syscalls/mksyscalltbl  |  37 ++
 tools/perf/arch/s390/annotate/instructions.c       |  27 +-
 tools/perf/arch/s390/util/header.c                 | 148 ++++++-
 tools/perf/builtin-record.c                        |   2 +-
 tools/perf/builtin-report.c                        |   7 +-
 tools/perf/builtin-script.c                        |  17 +
 tools/perf/builtin-stat.c                          |  53 ++-
 tools/perf/check-headers.sh                        |   1 +
 tools/perf/tests/code-reading.c                    |  33 +-
 tools/perf/tests/dwarf-unwind.c                    |  46 +-
 tools/perf/tests/shell/lib/probe_vfs_getname.sh    |   2 +-
 .../perf/tests/shell/trace+probe_libc_inet_pton.sh |   6 +-
 tools/perf/tests/vmlinux-kallsyms.c                |   4 +-
 tools/perf/ui/browsers/annotate.c                  |   9 +-
 tools/perf/util/build-id.c                         |  10 +-
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c    |  74 +++-
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h    |   2 +
 tools/perf/util/cs-etm.c                           | 478 ++++++++++++++++++---
 tools/perf/util/event.c                            |  16 +-
 tools/perf/util/evlist.c                           |  21 +-
 tools/perf/util/header.h                           |   1 +
 tools/perf/util/hist.c                             |   4 +-
 tools/perf/util/hist.h                             |   1 -
 tools/perf/util/machine.c                          | 145 +++----
 tools/perf/util/machine.h                          |   6 +-
 tools/perf/util/pmu.c                              |  47 +-
 tools/perf/util/sort.c                             |   7 +-
 tools/perf/util/stat.h                             |   2 +
 tools/perf/util/symbol.c                           |  13 +-
 tools/perf/util/syscalltbl.c                       |   8 +
 tools/perf/util/thread_map.c                       |   4 +-
 tools/perf/util/thread_map.h                       |   2 +-
 47 files changed, 1577 insertions(+), 273 deletions(-)
 create mode 100644 tools/arch/powerpc/include/uapi/asm/unistd.h
 create mode 100755 tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl

Test results:

The first set are container (Docker) based builds of tools/perf, with and
without libelf support. Where clang is available, it is also used to build
perf with and without libelf.

The objtool and samples/bpf/ builds are disabled now that I'm switching from
using the sources in a local volume to fetching them from an HTTP server and
building inside the container, which makes it easier to build in a container
cluster. Those will come back later.

Several are cross builds (the ones with -x-ARCH, plus the android one), and
those may not have all features built, due to the lack of multi-arch devel
packages, which are available and used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' run exercises tools/perf/util/ and
tools/lib/{bpf,traceevent,etc}, and also runs perf commands with many
different command line event specifications, intercepting the
sys_perf_event_open syscall to check that the perf_event_attr fields are set
up as expected, alongside a number of other unit tests.

Then there are the 'make -C tools/perf build-test' builds, which build
tools/perf/ with a variety of feature sets, exercising the build with an
incomplete set of features as well as with a complete one. The plan is to run
these on each of the containers mentioned above, using some container
orchestration infrastructure. Get in contact if you are interested in helping
to put this in place.

  On an Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz

  # dm
   1 39.82 alpine:3.4                    : Ok   gcc (Alpine 5.3.0) 5.3.0
   2 57.59 alpine:3.5                    : Ok   gcc (Alpine 6.2.1) 6.2.1 20160822
   3 44.30 alpine:3.6                    : Ok   gcc (Alpine 6.3.0) 6.3.0
   4 42.14 alpine:edge                   : Ok   gcc (Alpine 6.4.0) 6.4.0
   5 35.50 amazonlinux:1                 : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
   6 42.97 amazonlinux:2                 : Ok   gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2)
   7 26.46 android-ndk:r12b-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
   8 27.28 android-ndk:r15c-arm          : Ok   arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
   9 24.29 centos:5                      : Ok   gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)
  10 32.15 centos:6                      : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18)
  11 39.36 centos:7                      : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
  12 34.69 debian:7                      : Ok   gcc (Debian 4.7.2-5) 4.7.2
  13 37.92 debian:8                      : Ok   gcc (Debian 4.9.2-10) 4.9.2
  14 62.13 debian:9                      : Ok   gcc (Debian 6.3.0-18) 6.3.0 20170516
  15 65.51 debian:experimental           : Ok   gcc (Debian 7.2.0-18) 7.2.0
  16 38.73 debian:experimental-x-arm64   : Ok   aarch64-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
  17 68.18 debian:experimental-x-mips    : Ok   mips-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
  18 36.21 debian:experimental-x-mips64  : Ok   mips64-linux-gnuabi64-gcc (Debian 7.2.0-11) 7.2.0
  19 37.57 debian:experimental-x-mipsel  : Ok   mipsel-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
  20 38.22 fedora:20                     : Ok   gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
  21 42.49 fedora:21                     : Ok   gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
  22 39.15 fedora:22                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
  23 41.46 fedora:23                     : Ok   gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
  24 41.12 fedora:24                     : Ok   gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1)
  25 34.92 fedora:24-x-ARC-uClibc        : Ok   arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
  26 78.28 fedora:25                     : Ok   gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1)
  27 84.02 fedora:26                     : Ok   gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2)
  28 95.42 fedora:27                     : Ok   gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2)
  29 78.89 fedora:rawhide                : Ok   gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-4)
  30 57.48 gentoo-stage3-amd64:latest    : Ok   gcc (Gentoo 6.4.0-r1 p1.3) 6.4.0
  31 41.18 mageia:5                      : Ok   gcc (GCC) 4.9.2
  32 42.27 mageia:6                      : Ok   gcc (Mageia 5.4.0-5.mga6) 5.4.0
  33 39.66 opensuse:42.1                 : Ok   gcc (SUSE Linux) 4.8.5
  34 40.09 opensuse:42.2                 : Ok   gcc (SUSE Linux) 4.8.5
  35 41.01 opensuse:42.3                 : Ok   gcc (SUSE Linux) 4.8.5
  36 82.32 opensuse:tumbleweed           : Ok   gcc (SUSE Linux) 7.3.0
  37 31.70 oraclelinux:6                 : Ok   gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18)
  38 38.39 oraclelinux:7                 : Ok   gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
  39 30.49 ubuntu:12.04.5                : Ok   gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
  40 36.44 ubuntu:14.04.4                : Ok   gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
  41 32.13 ubuntu:14.04.4-x-linaro-arm64 : Ok   aarch64-linux-gnu-gcc (Linaro GCC 5.5-2017.10) 5.5.0
  42 58.58 ubuntu:16.04                  : Ok   gcc (Ubuntu 5.4.0-6ubuntu1~16.04.6) 5.4.0 20160609
  43 31.52 ubuntu:16.04-x-arm            : Ok   arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
  44 31.06 ubuntu:16.04-x-arm64          : Ok   aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
  45 31.61 ubuntu:16.04-x-powerpc        : Ok   powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
  46 31.93 ubuntu:16.04-x-powerpc64      : Ok   powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.1) 5.4.0 20160609
  47 33.02 ubuntu:16.04-x-powerpc64el    : Ok   powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
  48 30.94 ubuntu:16.04-x-s390           : Ok   s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
  49 63.24 ubuntu:16.10                  : Ok   gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005
  50 63.34 ubuntu:17.04                  : Ok   gcc (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406
  51 63.56 ubuntu:17.10                  : Ok   gcc (Ubuntu 7.2.0-8ubuntu3) 7.2.0
  52 63.45 ubuntu:18.04                  : Ok   gcc (Ubuntu 7.2.0-18ubuntu2) 7.2.0

  # uname -a
  Linux jouet 4.15.0-rc9+ #7 SMP Mon Jan 22 18:16:36 -03 2018 x86_64 x86_64 x86_64 GNU/Linux
  # perf test
   1: vmlinux symtab matches kallsyms                       : Ok
   2: Detect openat syscall event                           : Ok
   3: Detect openat syscall event on all cpus               : Ok
   4: Read samples using the mmap interface                 : Ok
   5: Test data source output                               : Ok
   6: Parse event definition strings                        : Ok
   7: Simple expression parser                              : Ok
   8: PERF_RECORD_* events & perf_sample fields             : Ok
   9: Parse perf pmu format                                 : Ok
  10: DSO data read                                         : Ok
  11: DSO data cache                                        : Ok
  12: DSO data reopen                                       : Ok
  13: Roundtrip evsel->name                                 : Ok
  14: Parse sched tracepoints fields                        : Ok
  15: syscalls:sys_enter_openat event fields                : Ok
  16: Setup struct perf_event_attr                          : Ok
  17: Match and link multiple hists                         : Ok
  18: 'import perf' in python                               : Ok
  19: Breakpoint overflow signal handler                    : Ok
  20: Breakpoint overflow sampling                          : Ok
  21: Number of exit events of a simple workload            : Ok
  22: Software clock events period values                   : Ok
  23: Object code reading                                   : Ok
  24: Sample parsing                                        : Ok
  25: Use a dummy software event to keep tracking           : Ok
  26: Parse with no sample_id_all bit set                   : Ok
  27: Filter hist entries                                   : Ok
  28: Lookup mmap thread                                    : Ok
  29: Share thread mg                                       : Ok
  30: Sort output of hist entries                           : Ok
  31: Cumulate child hist entries                           : Ok
  32: Track with sched_switch                               : Ok
  33: Filter fds with revents mask in a fdarray             : Ok
  34: Add fd to a fdarray, making it autogrow               : Ok
  35: kmod_path__parse                                      : Ok
  36: Thread map                                            : Ok
  37: LLVM search and compile                               :
  37.1: Basic BPF llvm compile                              : Ok
  37.2: kbuild searching                                    : Ok
  37.3: Compile source for BPF prologue generation          : Ok
  37.4: Compile source for BPF relocation                   : Ok
  38: Session topology                                      : Ok
  39: BPF filter                                            :
  39.1: Basic BPF filtering                                 : Ok
  39.2: BPF pinning                                         : Ok
  39.3: BPF prologue generation                             : Ok
  39.4: BPF relocation checker                              : Ok
  40: Synthesize thread map                                 : Ok
  41: Remove thread map                                     : Ok
  42: Synthesize cpu map                                    : Ok
  43: Synthesize stat config                                : Ok
  44: Synthesize stat                                       : Ok
  45: Synthesize stat round                                 : Ok
  46: Synthesize attr update                                : Ok
  47: Event times                                           : Ok
  48: Read backward ring buffer                             : Ok
  49: Print cpu map                                         : Ok
  50: Probe SDT events                                      : Ok
  51: is_printable_array                                    : Ok
  52: Print bitmap                                          : Ok
  53: perf hooks                                            : Ok
  54: builtin clang support                                 : Skip (not compiled in)
  55: unit_number__scnprintf                                : Ok
  56: x86 rdpmc                                             : Ok
  57: Convert perf time to TSC                              : Ok
  58: DWARF unwind                                          : Ok
  59: x86 instruction decoder - new instructions            : Ok
  60: Use vfs_getname probe to get syscall args filenames   : Ok
  61: probe libc's inet_pton & backtrace it with ping       : Ok
  62: Check open filename arg using perf trace + vfs_getname: Ok
  63: Add vfs_getname probe to get syscall args filenames   : Ok
  # 
  
  $ make -C tools/perf build-test
  make: Entering directory '/home/acme/git/perf/tools/perf'
  - tarpkg: ./tests/perf-targz-src-pkg .
           make_no_libunwind_O: make NO_LIBUNWIND=1
           make_no_backtrace_O: make NO_BACKTRACE=1
        make_with_babeltrace_O: make LIBBABELTRACE=1
  make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
   make_install_prefix_slash_O: make install prefix=/tmp/krava/
                 make_static_O: make LDFLAGS=-static
                make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1
         make_with_clangllvm_O: make LIBCLANGLLVM=1
                make_install_O: make install
                    make_doc_O: make doc
                   make_pure_O: make
            make_no_libaudit_O: make NO_LIBAUDIT=1
               make_no_slang_O: make NO_SLANG=1
                make_no_gtk2_O: make NO_GTK2=1
            make_install_bin_O: make install-bin
             make_no_libperl_O: make NO_LIBPERL=1
            make_no_demangle_O: make NO_DEMANGLE=1
                   make_tags_O: make tags
            make_no_auxtrace_O: make NO_AUXTRACE=1
       make_util_pmu_bison_o_O: make util/pmu-bison.o
         make_install_prefix_O: make install prefix=/tmp/krava
                  make_debug_O: make DEBUG=1
             make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
                make_no_newt_O: make NO_NEWT=1
                  make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
             make_no_libnuma_O: make NO_LIBNUMA=1
              make_no_libelf_O: make NO_LIBELF=1
                   make_help_O: make help
              make_no_libbpf_O: make NO_LIBBPF=1
           make_no_libpython_O: make NO_LIBPYTHON=1
           make_no_libbionic_O: make NO_LIBBIONIC=1
              make_clean_all_O: make clean all
                 make_perf_o_O: make perf.o
             make_util_map_o_O: make util/map.o
  OK
  make: Leaving directory '/home/acme/git/perf/tools/perf'
  $ 

^ permalink raw reply	[flat|nested] 63+ messages in thread

* [PATCH 01/41] perf record: Put new line after target override warning
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Jiri Olsa, Alexander Shishkin,
	David Ahern, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@kernel.org>

There's no newline after the target override warning. Currently:

  $ perf record -a --per-thread
  Warning:
  SYSTEM/CPU switch overriding PER-THREAD^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.705 MB perf.data (2939 samples) ]

With the patch:

  $ perf record -a --per-thread
  Warning:
  SYSTEM/CPU switch overriding PER-THREAD
  ^C[ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.705 MB perf.data (2939 samples) ]

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 16ad2ffb822c ("perf tools: Introduce perf_target__strerror()")
Link: http://lkml.kernel.org/r/20180206181813.10943-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-record.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index bf4ca749d1ac..907267206973 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1803,7 +1803,7 @@ int cmd_record(int argc, const char **argv)
 	err = target__validate(&rec->opts.target);
 	if (err) {
 		target__strerror(&rec->opts.target, err, errbuf, BUFSIZ);
-		ui__warning("%s", errbuf);
+		ui__warning("%s\n", errbuf);
 	}
 
 	err = target__parse_uid(&rec->opts.target);
-- 
2.14.3

* [PATCH 02/41] perf script: Add --show-round-event to display PERF_RECORD_FINISHED_ROUND
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
                   ` (2 preceding siblings ...)
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Jiri Olsa, Alexander Shishkin,
	David Ahern, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@kernel.org>

Add --show-round-events to display PERF_RECORD_FINISHED_ROUND events,
like:

  # perf script --show-round-events 2>/dev/null
               yes  8591 [002] 124177.397597:         18         cpu/mem-stores/P: ff...
               yes  8591 [002] 124177.397615:          1 cpu/mem-loads,ldlat=30/P: ff...
  PERF_RECORD_FINISHED_ROUND
              perf 10380 [001] 124177.397622:          6 cpu/mem-loads,ldlat=30/P: ff...
  PERF_RECORD_FINISHED_ROUND
           swapper     0 [000] 124177.400518:         88         cpu/mem-stores/P: ff...
           swapper     0 [000] 124177.400521:         88         cpu/mem-stores/P: ff...
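
A minimal sketch of the wiring this patch adds to __cmd_script(), using a
hypothetical struct demo_tool in place of perf's struct perf_tool: when the
option is set, ordering is switched off and a round handler is installed.
Switching ordering off is presumably needed because a custom finished_round
handler replaces the one that would otherwise flush the ordered-events queue;
treat that reading as an assumption, not a statement about perf internals.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the two struct perf_tool fields this patch touches. */
struct demo_tool {
	bool ordered_events;
	int (*finished_round)(const char *event_name);
};

/* Print the meta event instead of using it as an ordering flush point. */
static int demo_print_round(const char *event_name)
{
	printf("%s\n", event_name);
	return 0;
}

/* Mirrors the __cmd_script() hunk: the tool is only changed when asked to. */
static void demo_setup_round_events(struct demo_tool *tool, bool show_round_events)
{
	if (show_round_events) {
		tool->ordered_events = false;
		tool->finished_round = demo_print_round;
	}
}
```

Without the option the tool is left untouched, so the default ordered
processing is unchanged for existing users.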

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180206181813.10943-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-script.txt |  3 +++
 tools/perf/builtin-script.c              | 17 +++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 7730c1d2b5d3..36ec0257f8d3 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -303,6 +303,9 @@ OPTIONS
 --show-lost-events
 	Display lost events i.e. events of type PERF_RECORD_LOST.
 
+--show-round-events
+	Display finished round events i.e. events of type PERF_RECORD_FINISHED_ROUND.
+
 --demangle::
 	Demangle symbol names to human readable form. It's enabled by default,
 	disable with --no-demangle.
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index ab19a6ee4093..cce926aeb0c0 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1489,6 +1489,7 @@ struct perf_script {
 	bool			show_switch_events;
 	bool			show_namespace_events;
 	bool			show_lost_events;
+	bool			show_round_events;
 	bool			allocated;
 	bool			per_event_dump;
 	struct cpu_map		*cpus;
@@ -2104,6 +2105,16 @@ process_lost_event(struct perf_tool *tool,
 	return 0;
 }
 
+static int
+process_finished_round_event(struct perf_tool *tool __maybe_unused,
+			     union perf_event *event,
+			     struct ordered_events *oe __maybe_unused)
+
+{
+	perf_event__fprintf(event, stdout);
+	return 0;
+}
+
 static void sig_handler(int sig __maybe_unused)
 {
 	session_done = 1;
@@ -2200,6 +2211,10 @@ static int __cmd_script(struct perf_script *script)
 		script->tool.namespaces = process_namespaces_event;
 	if (script->show_lost_events)
 		script->tool.lost = process_lost_event;
+	if (script->show_round_events) {
+		script->tool.ordered_events = false;
+		script->tool.finished_round = process_finished_round_event;
+	}
 
 	if (perf_script__setup_per_event_dump(script)) {
 		pr_err("Couldn't create the per event dump files\n");
@@ -3139,6 +3154,8 @@ int cmd_script(int argc, const char **argv)
 		    "Show namespace events (if recorded)"),
 	OPT_BOOLEAN('\0', "show-lost-events", &script.show_lost_events,
 		    "Show lost events (if recorded)"),
+	OPT_BOOLEAN('\0', "show-round-events", &script.show_round_events,
+		    "Show round events (if recorded)"),
 	OPT_BOOLEAN('\0', "per-event-dump", &script.per_event_dump,
 		    "Dump trace output to files named by the monitored events"),
 	OPT_BOOLEAN('f', "force", &symbol_conf.force, "don't complain, do it"),
-- 
2.14.3

* [PATCH 03/41] tools lib api fs: Add filename__read_xll function
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
                   ` (3 preceding siblings ...)
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Jiri Olsa, Alexander Shishkin,
	David Ahern, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@kernel.org>

Add the filename__read_xll() function, to be able to read files containing
hex numbers that do not have the 0x prefix.
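
Why the base matters: with base 0, strtoull() only treats a number as hex
when it has a 0x prefix, so a file containing just "ff" parses as 0. The
sketch below uses hypothetical demo_* names and follows the shape of
filename__read_ull_base(); unlike the hunk below, it NUL-terminates the
buffer returned by read() before parsing it.

```c
#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

/* Parse one number from a text buffer with the given strtoull() base.
 * Base 0 auto-detects "0x"/"0" prefixes; base 16 always reads hex digits. */
static int demo_parse_ull_base(const char *line, unsigned long long *value, int base)
{
	*value = strtoull(line, NULL, base);
	return *value == ULLONG_MAX ? -1 : 0;
}

/* File wrapper in the spirit of filename__read_ull_base(). */
static int demo_read_ull_base(const char *filename, unsigned long long *value, int base)
{
	char line[64];
	ssize_t n;
	int err = -1;
	int fd = open(filename, O_RDONLY);

	if (fd < 0)
		return -1;

	n = read(fd, line, sizeof(line) - 1);
	if (n > 0) {
		line[n] = '\0';	/* read() does not terminate the string */
		err = demo_parse_ull_base(line, value, base);
	}
	close(fd);
	return err;
}

/* Hex variant: "ff" reads as 255 even without a 0x prefix. */
static int demo_read_xll(const char *filename, unsigned long long *value)
{
	return demo_read_ull_base(filename, value, 16);
}
```

Note that base 16 still accepts an optional 0x prefix, so the hex variant
also copes with files that do include it.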

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180206181813.10943-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/lib/api/fs/fs.c | 29 ++++++++++++++++++++++-------
 tools/lib/api/fs/fs.h |  1 +
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c
index b24afc0e6e81..8b0e4a4315bd 100644
--- a/tools/lib/api/fs/fs.c
+++ b/tools/lib/api/fs/fs.c
@@ -315,12 +315,8 @@ int filename__read_int(const char *filename, int *value)
 	return err;
 }
 
-/*
- * Parses @value out of @filename with strtoull.
- * By using 0 for base, the strtoull detects the
- * base automatically (see man strtoull).
- */
-int filename__read_ull(const char *filename, unsigned long long *value)
+static int filename__read_ull_base(const char *filename,
+				   unsigned long long *value, int base)
 {
 	char line[64];
 	int fd = open(filename, O_RDONLY), err = -1;
@@ -329,7 +325,7 @@ int filename__read_ull(const char *filename, unsigned long long *value)
 		return -1;
 
 	if (read(fd, line, sizeof(line)) > 0) {
-		*value = strtoull(line, NULL, 0);
+		*value = strtoull(line, NULL, base);
 		if (*value != ULLONG_MAX)
 			err = 0;
 	}
@@ -338,6 +334,25 @@ int filename__read_ull(const char *filename, unsigned long long *value)
 	return err;
 }
 
+/*
+ * Parses @value out of @filename with strtoull.
+ * By using 16 for base to treat the number as hex.
+ */
+int filename__read_xll(const char *filename, unsigned long long *value)
+{
+	return filename__read_ull_base(filename, value, 16);
+}
+
+/*
+ * Parses @value out of @filename with strtoull.
+ * By using 0 for base, the strtoull detects the
+ * base automatically (see man strtoull).
+ */
+int filename__read_ull(const char *filename, unsigned long long *value)
+{
+	return filename__read_ull_base(filename, value, 0);
+}
+
 #define STRERR_BUFSIZE  128     /* For the buffer size of strerror_r */
 
 int filename__read_str(const char *filename, char **buf, size_t *sizep)
diff --git a/tools/lib/api/fs/fs.h b/tools/lib/api/fs/fs.h
index dda49deefb52..8ebee35a6395 100644
--- a/tools/lib/api/fs/fs.h
+++ b/tools/lib/api/fs/fs.h
@@ -30,6 +30,7 @@ FS(bpf_fs)
 
 int filename__read_int(const char *filename, int *value);
 int filename__read_ull(const char *filename, unsigned long long *value);
+int filename__read_xll(const char *filename, unsigned long long *value);
 int filename__read_str(const char *filename, char **buf, size_t *sizep);
 
 int filename__write_int(const char *filename, int value);
-- 
2.14.3

* [PATCH 04/41] tools lib api fs: Add sysfs__read_xll function
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
                   ` (4 preceding siblings ...)
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Jiri Olsa, Alexander Shishkin,
	David Ahern, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@kernel.org>

Add the sysfs__read_xll() function, to be able to read sysfs files
containing hex numbers that do not have the 0x prefix.
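
The sysfs layer only prepends the sysfs mountpoint to the entry ("%s/%s")
and then reads the resulting file. A minimal sketch of that layering, with
hypothetical demo_* names and an example entry "devices/demo/id" that is not
a real sysfs path; unlike the hunk below, it also reports a truncated path
instead of reading it.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build "<mountpoint>/<entry>"; returns -1 if the result does not fit. */
static int demo_sysfs_path(char *buf, size_t size, const char *mountpoint,
			   const char *entry)
{
	int n = snprintf(buf, size, "%s/%s", mountpoint, entry);

	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Read a prefix-less hex number from <mountpoint>/<entry>. */
static int demo_sysfs_read_xll(const char *mountpoint, const char *entry,
			       unsigned long long *value)
{
	char path[4096];	/* assumed large enough, like PATH_MAX on Linux */
	char line[64];
	FILE *f;
	int err = -1;

	if (demo_sysfs_path(path, sizeof(path), mountpoint, entry))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(line, sizeof(line), f)) {
		*value = strtoull(line, NULL, 16);	/* hex, 0x optional */
		if (*value != ULLONG_MAX)
			err = 0;
	}
	fclose(f);
	return err;
}
```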

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180206181813.10943-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/lib/api/fs/fs.c | 15 +++++++++++++--
 tools/lib/api/fs/fs.h |  1 +
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/tools/lib/api/fs/fs.c b/tools/lib/api/fs/fs.c
index 8b0e4a4315bd..6a12bbf39f7b 100644
--- a/tools/lib/api/fs/fs.c
+++ b/tools/lib/api/fs/fs.c
@@ -432,7 +432,8 @@ int procfs__read_str(const char *entry, char **buf, size_t *sizep)
 	return filename__read_str(path, buf, sizep);
 }
 
-int sysfs__read_ull(const char *entry, unsigned long long *value)
+static int sysfs__read_ull_base(const char *entry,
+				unsigned long long *value, int base)
 {
 	char path[PATH_MAX];
 	const char *sysfs = sysfs__mountpoint();
@@ -442,7 +443,17 @@ int sysfs__read_ull(const char *entry, unsigned long long *value)
 
 	snprintf(path, sizeof(path), "%s/%s", sysfs, entry);
 
-	return filename__read_ull(path, value);
+	return filename__read_ull_base(path, value, base);
+}
+
+int sysfs__read_xll(const char *entry, unsigned long long *value)
+{
+	return sysfs__read_ull_base(entry, value, 16);
+}
+
+int sysfs__read_ull(const char *entry, unsigned long long *value)
+{
+	return sysfs__read_ull_base(entry, value, 0);
 }
 
 int sysfs__read_int(const char *entry, int *value)
diff --git a/tools/lib/api/fs/fs.h b/tools/lib/api/fs/fs.h
index 8ebee35a6395..92d03b8396b1 100644
--- a/tools/lib/api/fs/fs.h
+++ b/tools/lib/api/fs/fs.h
@@ -40,6 +40,7 @@ int procfs__read_str(const char *entry, char **buf, size_t *sizep);
 int sysctl__read_int(const char *sysctl, int *value);
 int sysfs__read_int(const char *entry, int *value);
 int sysfs__read_ull(const char *entry, unsigned long long *value);
+int sysfs__read_xll(const char *entry, unsigned long long *value);
 int sysfs__read_str(const char *entry, char **buf, size_t *sizep);
 int sysfs__read_bool(const char *entry, bool *value);
 
-- 
2.14.3

* [PATCH 05/41] perf tests: Fix dwarf unwind for stripped binaries
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
                   ` (5 preceding siblings ...)
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Jiri Olsa, Alexander Shishkin,
	David Ahern, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@kernel.org>

When the perf binary is stripped, the DWARF unwind test stops working.
The reason is that strip removes static function symbols, which the test
needs in order to check the unwound call chain.

This change keeps the test working in cases where global symbols are
put into the dynamic symbol table, which is the case on x86. It still
won't work on powerpc.

Make those 5 local functions global, and add the 'test_dwarf_unwind__'
prefix to their names.
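
The pattern in isolation, as a sketch with hypothetical demo_unwind__* names:
the functions are global (no "static"), carry a common prefix, have
prototypes, and are kept out of line with the GCC/Clang noinline attribute
so that each one is its own frame. Whether a global symbol actually survives
strip depends on it being exported to the dynamic symbol table at link time
(for example with -rdynamic); that linker detail is an assumption here.

```c
/* Prototypes, as in the patch, for functions that are global but local in use. */
int demo_unwind__krava_3(int depth);
int demo_unwind__krava_2(int depth);
int demo_unwind__krava_1(int depth);

__attribute__((noinline)) int demo_unwind__krava_3(int depth)
{
	return depth;	/* the unwinder would be triggered from here */
}

/* The "+ 1" after each call keeps it from becoming a tail call, so the
 * caller's frame stays on the stack for an unwinder to find. */
__attribute__((noinline)) int demo_unwind__krava_2(int depth)
{
	return demo_unwind__krava_3(depth + 1) + 1;
}

__attribute__((noinline)) int demo_unwind__krava_1(int depth)
{
	return demo_unwind__krava_2(depth + 1) + 1;
}
```

The prefix keeps the now-global names from colliding with symbols in other
objects linked into perf.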

Committer testing:

Before:

  # perf test dwarf
  58: DWARF unwind                               : Ok
  # strip ~/bin/perf
  # perf test dwarf
  58: DWARF unwind                               : FAILED!
  # perf test -v dwarf
  58: DWARF unwind                               :
  --- start ---
  test child forked, pid 6590
  unwind: thread map already set, dso=/home/acme/bin/perf
  <SNIP>
  unwind: access_mem addr 0x7ffce6c48098 val 48563f, offset 1144
  unwind: test__dwarf_unwind:ip = 0x4a54e5 (0xa54e5)
  got: test__dwarf_unwind 0xa54e5, expecting test__dwarf_unwind
  unwind: '':ip = 0x4a50bb (0xa50bb)
  failed: got unresolved address 0xa50bb
  unwind failed
  test child finished with -1
  ---- end ----
  DWARF unwind: FAILED!
  #

After:

  # perf test dwarf
  58: DWARF unwind                               : Ok
  # strip ~/bin/perf
  # perf test dwarf
  58: DWARF unwind                               : Ok
  #
  # perf test -v dwarf
  58: DWARF unwind                               :
  --- start ---
  test child forked, pid 7219
  unwind: thread map already set, dso=/home/acme/bin/perf
  <SNIP>
  unwind: access_mem addr 0x7fff007da2c8 val 48575f, offset 1144
  unwind: test__arch_unwind_sample:ip = 0x589044 (0x189044)
  got: test__arch_unwind_sample 0x189044, expecting test__arch_unwind_sample
  unwind: test_dwarf_unwind__thread:ip = 0x4a52f7 (0xa52f7)
  got: test_dwarf_unwind__thread 0xa52f7, expecting test_dwarf_unwind__thread
  unwind: test_dwarf_unwind__compare:ip = 0x4a5468 (0xa5468)
  got: test_dwarf_unwind__compare 0xa5468, expecting test_dwarf_unwind__compare
  unwind: bsearch:ip = 0x7f6608ae94d8 (0x394d8)
  got: bsearch 0x394d8, expecting bsearch
  unwind: test_dwarf_unwind__krava_3:ip = 0x4a54d1 (0xa54d1)
  got: test_dwarf_unwind__krava_3 0xa54d1, expecting test_dwarf_unwind__krava_3
  unwind: test_dwarf_unwind__krava_2:ip = 0x4a550b (0xa550b)
  got: test_dwarf_unwind__krava_2 0xa550b, expecting test_dwarf_unwind__krava_2
  unwind: test_dwarf_unwind__krava_1:ip = 0x4a554b (0xa554b)
  got: test_dwarf_unwind__krava_1 0xa554b, expecting test_dwarf_unwind__krava_1
  unwind: test__dwarf_unwind:ip = 0x4a5605 (0xa5605)
  got: test__dwarf_unwind 0xa5605, expecting test__dwarf_unwind
  test child finished with 0
  ---- end ----
  DWARF unwind: Ok
  #

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180206181813.10943-17-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/tests/dwarf-unwind.c | 46 +++++++++++++++++++++++++++--------------
 1 file changed, 30 insertions(+), 16 deletions(-)

diff --git a/tools/perf/tests/dwarf-unwind.c b/tools/perf/tests/dwarf-unwind.c
index 260418969120..2f008067d989 100644
--- a/tools/perf/tests/dwarf-unwind.c
+++ b/tools/perf/tests/dwarf-unwind.c
@@ -37,6 +37,19 @@ static int init_live_machine(struct machine *machine)
 						  mmap_handler, machine, true, 500);
 }
 
+/*
+ * We need to keep these functions global, despite the
+ * fact that they are used only locally in this object,
+ * in order to keep them around even if the binary is
+ * stripped. If they are gone, the unwind check for
+ * symbol fails.
+ */
+int test_dwarf_unwind__thread(struct thread *thread);
+int test_dwarf_unwind__compare(void *p1, void *p2);
+int test_dwarf_unwind__krava_3(struct thread *thread);
+int test_dwarf_unwind__krava_2(struct thread *thread);
+int test_dwarf_unwind__krava_1(struct thread *thread);
+
 #define MAX_STACK 8
 
 static int unwind_entry(struct unwind_entry *entry, void *arg)
@@ -45,12 +58,12 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 	char *symbol = entry->sym ? entry->sym->name : NULL;
 	static const char *funcs[MAX_STACK] = {
 		"test__arch_unwind_sample",
-		"unwind_thread",
-		"compare",
+		"test_dwarf_unwind__thread",
+		"test_dwarf_unwind__compare",
 		"bsearch",
-		"krava_3",
-		"krava_2",
-		"krava_1",
+		"test_dwarf_unwind__krava_3",
+		"test_dwarf_unwind__krava_2",
+		"test_dwarf_unwind__krava_1",
 		"test__dwarf_unwind"
 	};
 	/*
@@ -77,7 +90,7 @@ static int unwind_entry(struct unwind_entry *entry, void *arg)
 	return strcmp((const char *) symbol, funcs[idx]);
 }
 
-static noinline int unwind_thread(struct thread *thread)
+noinline int test_dwarf_unwind__thread(struct thread *thread)
 {
 	struct perf_sample sample;
 	unsigned long cnt = 0;
@@ -108,7 +121,7 @@ static noinline int unwind_thread(struct thread *thread)
 
 static int global_unwind_retval = -INT_MAX;
 
-static noinline int compare(void *p1, void *p2)
+noinline int test_dwarf_unwind__compare(void *p1, void *p2)
 {
 	/* Any possible value should be 'thread' */
 	struct thread *thread = *(struct thread **)p1;
@@ -117,17 +130,17 @@ static noinline int compare(void *p1, void *p2)
 		/* Call unwinder twice for both callchain orders. */
 		callchain_param.order = ORDER_CALLER;
 
-		global_unwind_retval = unwind_thread(thread);
+		global_unwind_retval = test_dwarf_unwind__thread(thread);
 		if (!global_unwind_retval) {
 			callchain_param.order = ORDER_CALLEE;
-			global_unwind_retval = unwind_thread(thread);
+			global_unwind_retval = test_dwarf_unwind__thread(thread);
 		}
 	}
 
 	return p1 - p2;
 }
 
-static noinline int krava_3(struct thread *thread)
+noinline int test_dwarf_unwind__krava_3(struct thread *thread)
 {
 	struct thread *array[2] = {thread, thread};
 	void *fp = &bsearch;
@@ -141,18 +154,19 @@ static noinline int krava_3(struct thread *thread)
 			size_t, int (*)(void *, void *));
 
 	_bsearch = fp;
-	_bsearch(array, &thread, 2, sizeof(struct thread **), compare);
+	_bsearch(array, &thread, 2, sizeof(struct thread **),
+		 test_dwarf_unwind__compare);
 	return global_unwind_retval;
 }
 
-static noinline int krava_2(struct thread *thread)
+noinline int test_dwarf_unwind__krava_2(struct thread *thread)
 {
-	return krava_3(thread);
+	return test_dwarf_unwind__krava_3(thread);
 }
 
-static noinline int krava_1(struct thread *thread)
+noinline int test_dwarf_unwind__krava_1(struct thread *thread)
 {
-	return krava_2(thread);
+	return test_dwarf_unwind__krava_2(thread);
 }
 
 int test__dwarf_unwind(struct test *test __maybe_unused, int subtest __maybe_unused)
@@ -189,7 +203,7 @@ int test__dwarf_unwind(struct test *test __maybe_unused, int subtest __maybe_unu
 		goto out;
 	}
 
-	err = krava_1(thread);
+	err = test_dwarf_unwind__krava_1(thread);
 	thread__put(thread);
 
  out:
-- 
2.14.3

* [PATCH 06/41] perf tools: Fix comment for sort__* compare functions
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
                   ` (6 preceding siblings ...)
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Jiri Olsa, Alexander Shishkin,
	David Ahern, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@kernel.org>

In commit 2f15bd8c6c6e ("perf tools: Fix "Command" sort_entry's cmp and
collapse function") we switched from pointer to string comparison.

But we failed to remove the related comments. Remove them, and add a new
one warning against using pointer comparison here.
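
Why address comparison is unsafe here, as a sketch with a hypothetical
struct demo_entry standing in for struct hist_entry: two entries whose comm
strings are equal but live at different addresses compare equal by content
and unequal by address, and the address order itself depends on allocation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct hist_entry: only the command name matters. */
struct demo_entry {
	const char *comm;
};

/* Content comparison, same strcmp(right, left) order as sort__comm_cmp(). */
static int64_t demo_comm_cmp(const struct demo_entry *left,
			     const struct demo_entry *right)
{
	return strcmp(right->comm, left->comm);
}

/* The pitfall: the result reflects where the strings were allocated. */
static int64_t demo_comm_cmp_by_addr(const struct demo_entry *left,
				     const struct demo_entry *right)
{
	uintptr_t l = (uintptr_t)left->comm, r = (uintptr_t)right->comm;

	return (r > l) - (r < l);
}
```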

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180206181813.10943-18-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/sort.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index 2da4d0456a03..e8514f651865 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -111,17 +111,20 @@ struct sort_entry sort_thread = {
 
 /* --sort comm */
 
+/*
+ * We can't use pointer comparison in functions below,
+ * because it gives different results based on pointer
+ * values, which could break some sorting assumptions.
+ */
 static int64_t
 sort__comm_cmp(struct hist_entry *left, struct hist_entry *right)
 {
-	/* Compare the addr that should be unique among comm */
 	return strcmp(comm__str(right->comm), comm__str(left->comm));
 }
 
 static int64_t
 sort__comm_collapse(struct hist_entry *left, struct hist_entry *right)
 {
-	/* Compare the addr that should be unique among comm */
 	return strcmp(comm__str(right->comm), comm__str(left->comm));
 }
 
-- 
2.14.3

* [PATCH 07/41] perf report: Ask for ordered events for --tasks option
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
                   ` (7 preceding siblings ...)
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Jiri Olsa, Alexander Shishkin,
	David Ahern, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@kernel.org>

If we have the time in the samples, keep the events in time order.

Committer notes:

Trying to be more verbose: what actual effect will this have in this
particular case?

Before and after this patch shows the artifacts:

  --- /tmp/before 2018-02-06 15:40:29.536411625 -0300
  +++ /tmp/after  2018-02-06 15:40:51.963403599 -0300
  @@ -5,34 +5,34 @@
         2540     2540     1818 |   gnome-terminal-
         3489     3489     2540 |    bash
        32433    32433     3489 |     perf
  -     32434    32434    32433 |      perf
  +     32434    32434    32433 |      make
        32441    32441    32434 |       make
        32514    32514    32441 |        make
          511      511    32514 |         sh
  -       512      512      511 |          sh
  +       512      512      511 |          install
<SNIP>

We don't have 'perf' calling 'perf' calling 'make', etc.; the second
'perf' is actually 'make', i.e. the relevant PERF_RECORD_COMM and
PERF_RECORD_FORK records were reordered.

Ditto for sh/install later on.

Looking at the FORK and COMM meta events for those tids:

  # perf report -D | egrep 'PERF_RECORD_(FORK|COMM)' | egrep '3243[34]'
  0 14774650990679 0x1a3cd8 [0x38]: PERF_RECORD_FORK(32433:32433):(3489:3489)
  1 14774652080381 0x1d6568 [0x30]: PERF_RECORD_COMM exec: perf:32433/32433
  1 14774742473340 0x1dbb48 [0x38]: PERF_RECORD_FORK(32434:32434):(32433:32433)
  0 14774752005779 0x1a4af8 [0x30]: PERF_RECORD_COMM exec: make:32434/32434
  0 14774753997960 0x1a5578 [0x38]: PERF_RECORD_FORK(32435:32435):(32434:32434)
  0 14774756070782 0x1a5618 [0x38]: PERF_RECORD_FORK(32438:32438):(32434:32434)
  0 14774757772939 0x1a5680 [0x38]: PERF_RECORD_FORK(32440:32440):(32434:32434)
  0 14774758230600 0x1a56e8 [0x38]: PERF_RECORD_FORK(32441:32441):(32434:32434)
  #

The first column is the CPU, the second is the timestamp.

So they are on different CPUs, and thus in different ring buffers, and
when we don't use the ordered_events class we end up mixing them up.
Use it, to take advantage of the PERF_RECORD_FINISHED_ROUND meta events
and order the events using the PERF_SAMPLE_TIME present in the
PERF_RECORD_{FORK,COMM,EXIT,SAMPLE,etc} records in the ring buffer.
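
A simplified sketch of the idea, with hypothetical demo_* names, using the
FORK and COMM records for tid 32434 from the dump above: reading the CPU 0
buffer before the CPU 1 buffer would deliver the COMM before the FORK, while
sorting by timestamp restores the real order. perf's ordered_events works
incrementally, roughly flushing queued events when PERF_RECORD_FINISHED_ROUND
marks a safe point; sorting a whole batch at once is a simplification.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical queued event: source ring buffer (CPU) and PERF_SAMPLE_TIME. */
struct demo_event {
	int cpu;
	uint64_t time;
	const char *desc;
};

static int demo_event_time_cmp(const void *a, const void *b)
{
	const struct demo_event *ea = a, *eb = b;

	return (ea->time > eb->time) - (ea->time < eb->time);
}

/* Deliver a batch of events in timestamp order. */
static void demo_order_events(struct demo_event *events, size_t n)
{
	qsort(events, n, sizeof(*events), demo_event_time_cmp);
}
```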

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180206181813.10943-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-report.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 4ad5dc649716..8ef71669e7a0 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -614,6 +614,7 @@ static int stats_print(struct report *rep)
 static void tasks_setup(struct report *rep)
 {
 	memset(&rep->tool, 0, sizeof(rep->tool));
+	rep->tool.ordered_events = true;
 	if (rep->mmaps_mode) {
 		rep->tool.mmap = perf_event__process_mmap;
 		rep->tool.mmap2 = perf_event__process_mmap2;
-- 
2.14.3

* [PATCH 08/41] perf report: Add support to display group output for non group events
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
                   ` (8 preceding siblings ...)
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Jiri Olsa, Jiri Olsa,
	Alexander Shishkin, Andi Kleen, David Ahern, Namhyung Kim,
	Peter Zijlstra, Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@redhat.com>

Add support for displaying group output when non-grouped events are
detected and the user forces the --group option. Now, for non-group
events recorded like:

  $ perf record -e 'cycles,instructions' ls

you can still get group output by using the --group option
in 'perf report':

  $ perf report --group --stdio
  ...
  #         Overhead  Command  Shared Object     Symbol
  # ................  .......  ................  ......................
  #
      17.67%   0.00%  ls       libc-2.25.so      [.] _IO_do_write@@GLIB
      15.59%  25.94%  ls       ls                [.] calculate_columns
      15.41%  31.35%  ls       libc-2.25.so      [.] __strcoll_l
  ...
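
In the group view each event gets its own overhead column. As a rough,
hypothetical sketch (demo_* names, not perf's hist code): each column is the
entry's period for that event divided by that event's total period, which is
why one event can show 0.00% for a symbol the other event sampled heavily.

```c
#include <stddef.h>

/* One overhead percentage per event; an event with no samples yields 0. */
static void demo_group_overheads(const unsigned long long *entry_period,
				 const unsigned long long *total_period,
				 size_t nr_events, double *pct)
{
	for (size_t i = 0; i < nr_events; i++)
		pct[i] = total_period[i] ?
			 100.0 * (double)entry_period[i] / (double)total_period[i] :
			 0.0;
}
```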

Committer note:

We should improve on this by making sure that the first line states that
this is not a group, but since the user doesn't have to force group view
when really using grouped events (e.g. '{cycles,instructions}'), the
user had better know what is being done...
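
As a rough illustration of what forcing --group on ungrouped data amounts to, the C sketch below points every event at the first one as its group leader, which is approximately what perf_evlist__set_leader() does to the session's event list. The event_sketch type and set_leader_sketch() are hypothetical simplifications, not perf's struct perf_evsel or its real helper.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified event descriptor. perf's real struct perf_evsel
 * carries far more state; only the grouping fields matter here.
 */
struct event_sketch {
	const char *name;
	struct event_sketch *leader;   /* group leader (itself if it leads) */
	int nr_members;                /* meaningful on the leader only */
};

/*
 * Make the first event the leader of every event in the array, so that a
 * report can print one row per symbol with one overhead column per member.
 */
static void set_leader_sketch(struct event_sketch *evs, size_t n)
{
	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		evs[i].leader = &evs[0];
	evs[0].nr_members = (int)n;
}
```

With cycles and instructions recorded as separate events, both end up led by cycles, which is what lets each row in the output above show two overhead columns.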

Requested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Stephane Eranian <eranian@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180209092734.GB20449@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-report.txt | 3 ++-
 tools/perf/builtin-report.c              | 6 +++++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 907e505b6309..a76b871f78a6 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -354,7 +354,8 @@ OPTIONS
         Path to objdump binary.
 
 --group::
-	Show event group information together.
+	Show event group information together. It forces group output also
+	if there are no groups defined in data file.
 
 --demangle::
 	Demangle symbol names to human readable form. It's enabled by default,
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 8ef71669e7a0..1eedb1815c4c 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -938,6 +938,7 @@ int cmd_report(int argc, const char **argv)
 		"perf report [<options>]",
 		NULL
 	};
+	bool group_set = false;
 	struct report report = {
 		.tool = {
 			.sample		 = process_sample_event,
@@ -1057,7 +1058,7 @@ int cmd_report(int argc, const char **argv)
 		   "Specify disassembler style (e.g. -M intel for intel syntax)"),
 	OPT_BOOLEAN(0, "show-total-period", &symbol_conf.show_total_period,
 		    "Show a column with the sum of periods"),
-	OPT_BOOLEAN(0, "group", &symbol_conf.event_group,
+	OPT_BOOLEAN_SET(0, "group", &symbol_conf.event_group, &group_set,
 		    "Show event group information together"),
 	OPT_CALLBACK_NOOPT('b', "branch-stack", &branch_mode, "",
 		    "use branch records for per branch histogram filling",
@@ -1174,6 +1175,9 @@ int cmd_report(int argc, const char **argv)
 	has_br_stack = perf_header__has_feat(&session->header,
 					     HEADER_BRANCH_STACK);
 
+	if (group_set && !session->evlist->nr_groups)
+		perf_evlist__set_leader(session->evlist);
+
 	if (itrace_synth_opts.last_branch)
 		has_br_stack = true;
 
-- 
2.14.3

* [PATCH 09/41] perf stat: Add support to print counts for fixed times
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, yuzhoujian, Adrian Hunter,
	Alexander Shishkin, David Ahern, Kan Liang, Milian Wolff,
	Namhyung Kim, Peter Zijlstra, Wang Nan, Arnaldo Carvalho de Melo

From: yuzhoujian <yuzhoujian@didichuxing.com>

Introduce a new option to print counts a fixed number of times and
update the 'perf stat' documentation accordingly.

Shown below is the output of the new option for 'perf stat':

  $ perf stat -I 1000 --interval-count 2 -e cycles -a
  #           time             counts unit events
           1.002827089         93,884,870      cycles
           2.004231506         56,573,446      cycles

With this newly introduced option, the counts are printed only a fixed
number of times. Its usage is a little like that of 'vmstat' with a
count argument, and it must be used together with the "-I" option.

  $ vmstat -n 1 2
  procs ---------memory-------------- --swap- ----io-- -system-- ------cpu---
   r  b swpd   free   buff   cache    si   so  bi   bo  in   cs us sy id wa st
   0  0    0 78270544 547484 51732076  0   0   0   20    1    1  1  0 99  0 0
   0  0    0 78270512 547484 51732080  0   0   0   16  477 1555  0  0 100 0 0
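
The stopping rule added to __run_perf_stat() can be sketched in isolation as follows. This is a minimal sketch assuming a hypothetical process_interval_sketch() that only counts calls and a max_loops bound that stands in for perf's 'done' flag; it keeps the same `if (interval_count && !(--times)) break;` test as the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for perf's process_interval(): only counts calls. */
static int intervals_printed;

static void process_interval_sketch(void)
{
	intervals_printed++;
}

/*
 * Sleep one interval, "print" the deltas, and stop once 'times' intervals
 * have been printed. 'max_loops' is a hypothetical bound replacing perf's
 * 'done' flag, which is normally set by a signal or by the workload exiting.
 */
static void run_intervals_sketch(unsigned int interval_ms, int times,
				 int max_loops)
{
	bool interval_count = times > 0;   /* 0 means "no limit" */
	struct timespec ts = {
		.tv_sec  = interval_ms / 1000,
		.tv_nsec = (long)(interval_ms % 1000) * 1000000L,
	};

	assert(interval_ms >= 10);         /* -I has a 10 ms minimum */
	for (int loop = 0; loop < max_loops; loop++) {
		nanosleep(&ts, NULL);
		process_interval_sketch();
		if (interval_count && !(--times))
			break;
	}
}
```

A times value of 0 leaves the loop unbounded (here, until max_loops), which mirrors the patch only setting interval_count when --interval-count was given together with -I.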

Changes since v3:
- merge interval_count check and times check to one line.
- fix the wrong indent in stat.h
- use stat_config.times instead of 'times' in cmd_stat function.

Changes since v2:
- none.

Changes since v1:
- change the name of the new option "times-print" to "interval-count".
- keep the new option interval specifically.

Signed-off-by: yuzhoujian <yuzhoujian@didichuxing.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1517217923-8302-2-git-send-email-ufo19890607@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-stat.txt |  5 +++++
 tools/perf/builtin-stat.c              | 20 +++++++++++++++++++-
 tools/perf/util/stat.h                 |  1 +
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 823fce7674bb..47a21645f60c 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -146,6 +146,11 @@ Print count deltas every N milliseconds (minimum: 10ms)
 The overhead percentage could be high in some cases, for instance with small, sub 100ms intervals.  Use with caution.
 	example: 'perf stat -I 1000 -e cycles -a sleep 5'
 
+--interval-count times::
+Print count deltas for fixed number of times.
+This option should be used together with "-I" option.
+	example: 'perf stat -I 1000 --interval-count 2 -e cycles -a'
+
 --metric-only::
 Only print computed metrics. Print them in a single line.
 Don't show any raw values. Not supported with --per-thread.
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 98bf9d32f222..7d1d7613bf56 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -168,6 +168,7 @@ static struct timespec		ref_time;
 static struct cpu_map		*aggr_map;
 static aggr_get_id_t		aggr_get_id;
 static bool			append_file;
+static bool			interval_count;
 static const char		*output_name;
 static int			output_fd;
 static int			print_free_counters_hint;
@@ -571,6 +572,7 @@ static struct perf_evsel *perf_evsel__reset_weak_group(struct perf_evsel *evsel)
 static int __run_perf_stat(int argc, const char **argv)
 {
 	int interval = stat_config.interval;
+	int times = stat_config.times;
 	char msg[BUFSIZ];
 	unsigned long long t0, t1;
 	struct perf_evsel *counter;
@@ -700,6 +702,8 @@ static int __run_perf_stat(int argc, const char **argv)
 			while (!waitpid(child_pid, &status, WNOHANG)) {
 				nanosleep(&ts, NULL);
 				process_interval();
+				if (interval_count && !(--times))
+					break;
 			}
 		}
 		waitpid(child_pid, &status, 0);
@@ -716,8 +720,11 @@ static int __run_perf_stat(int argc, const char **argv)
 		enable_counters();
 		while (!done) {
 			nanosleep(&ts, NULL);
-			if (interval)
+			if (interval) {
 				process_interval();
+				if (interval_count && !(--times))
+					break;
+			}
 		}
 	}
 
@@ -1891,6 +1898,8 @@ static const struct option stat_options[] = {
 			"command to run after to the measured command"),
 	OPT_UINTEGER('I', "interval-print", &stat_config.interval,
 		    "print counts at regular interval in ms (>= 10)"),
+	OPT_INTEGER(0, "interval-count", &stat_config.times,
+		    "print counts for fixed number of times"),
 	OPT_SET_UINT(0, "per-socket", &stat_config.aggr_mode,
 		     "aggregate counts per processor socket", AGGR_SOCKET),
 	OPT_SET_UINT(0, "per-core", &stat_config.aggr_mode,
@@ -2870,6 +2879,15 @@ int cmd_stat(int argc, const char **argv)
 				   "The overhead percentage could be high in some cases. "
 				   "Please proceed with caution.\n");
 	}
+	if (stat_config.times && interval)
+		interval_count = true;
+	else if (stat_config.times && !interval) {
+		pr_err("interval-count option should be used together with "
+				"interval-print.\n");
+		parse_options_usage(stat_usage, stat_options, "interval-count", 0);
+		parse_options_usage(stat_usage, stat_options, "I", 1);
+		goto out;
+	}
 
 	if (perf_evlist__alloc_stats(evsel_list, interval))
 		goto out;
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index dbc6f7134f61..540fbb350e53 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -90,6 +90,7 @@ struct perf_stat_config {
 	bool		scale;
 	FILE		*output;
 	unsigned int	interval;
+	int		times;
 	struct runtime_stat *stats;
 	int		stats_num;
 };
-- 
2.14.3

* [PATCH 10/41] perf stat: Add support to print counts after a period of time
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, yuzhoujian, Adrian Hunter,
	Alexander Shishkin, David Ahern, Kan Liang, Milian Wolff,
	Namhyung Kim, Peter Zijlstra, Wang Nan, Arnaldo Carvalho de Melo

From: yuzhoujian <yuzhoujian@didichuxing.com>

Introduce a new option to print counts after N milliseconds and update
'perf stat' documentation accordingly.

Shown below is the output of the new option for 'perf stat':

  $ perf stat --time 2000 -e cycles -a
  Performance counter stats for 'system wide':

        157,260,423      cycles

        2.003060766 seconds time elapsed

With this newly introduced option, the count deltas are printed after
N milliseconds. It is not supported together with the "-I" option.

In addition, according to Kan Liang's patch (19afd10410957), the
monitoring overhead for system-wide core events could be very high if the
interval-print parameter was below 100ms, and the minimum allowed value
is 10ms.

So the same warning is displayed when the time is set between 10ms
and 100ms, and the minimum time is limited to 10ms. Users can make a
decision according to their specific cases.
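
The option checks described above, plus the millisecond-to-timespec conversion that __run_perf_stat() feeds to nanosleep(), can be sketched as two small C helpers. The enum values and function names are hypothetical. Unlike cmd_stat(), which prints the overhead warning before it rejects a --timeout/-I combination, check_timeout_sketch() simply returns the final outcome.

```c
#include <assert.h>
#include <time.h>

/* Possible outcomes of the --timeout checks (hypothetical names). */
enum timeout_check {
	TIMEOUT_OK,
	TIMEOUT_WARN_OVERHEAD,     /* 10 ms <= timeout < 100 ms: warn, but run */
	TIMEOUT_ERR_TOO_SMALL,     /* timeout < 10 ms: refuse */
	TIMEOUT_ERR_WITH_INTERVAL, /* --timeout combined with -I: refuse */
};

static enum timeout_check check_timeout_sketch(unsigned int timeout_ms,
					       unsigned int interval_ms)
{
	if (timeout_ms == 0)
		return TIMEOUT_OK;                 /* option not given */
	if (timeout_ms < 10)
		return TIMEOUT_ERR_TOO_SMALL;
	if (interval_ms)
		return TIMEOUT_ERR_WITH_INTERVAL;
	if (timeout_ms < 100)
		return TIMEOUT_WARN_OVERHEAD;
	return TIMEOUT_OK;
}

/* Split milliseconds into the seconds/nanoseconds pair nanosleep() takes. */
static struct timespec ms_to_timespec(unsigned int ms)
{
	struct timespec ts = {
		.tv_sec  = ms / 1000,
		.tv_nsec = (long)(ms % 1000) * 1000000L,
	};

	assert(ts.tv_nsec < 1000000000L);  /* always a valid tv_nsec */
	return ts;
}
```

For example, a 2000 ms timeout becomes { .tv_sec = 2, .tv_nsec = 0 }, so the loop in __run_perf_stat() sleeps once and then breaks out.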

Committer notes:

This actually stops the workload after the specified time, then prints
the counts.

So I renamed the option to --timeout and updated the documentation to
state that it will not just print the counts after the specified time,
but will really stop the 'perf stat' session and print the counts.

The rename from 'time' to 'timeout' also fixes the build on systems
such as centos:5 and centos:6, where glibc already declares 'time', so
it can't be used as the name of a variable.

Changes since v3:
- none.

Changes since v2:
- modify the time check in the __run_perf_stat function to keep it
  consistent with the workload case.
- add the warning when the time is set between 10ms and 100ms.
- add the pr_err when the time is set below 10ms.

Changes since v1:
- none.

Signed-off-by: yuzhoujian <yuzhoujian@didichuxing.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1517217923-8302-3-git-send-email-ufo19890607@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-stat.txt |  5 +++++
 tools/perf/builtin-stat.c              | 33 +++++++++++++++++++++++++++++++--
 tools/perf/util/stat.h                 |  1 +
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 47a21645f60c..2bbe79a50d3c 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -151,6 +151,11 @@ Print count deltas for fixed number of times.
 This option should be used together with "-I" option.
 	example: 'perf stat -I 1000 --interval-count 2 -e cycles -a'
 
+--timeout msecs::
+Stop the 'perf stat' session and print count deltas after N milliseconds (minimum: 10 ms).
+This option is not supported with the "-I" option.
+	example: 'perf stat --time 2000 -e cycles -a'
+
 --metric-only::
 Only print computed metrics. Print them in a single line.
 Don't show any raw values. Not supported with --per-thread.
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 7d1d7613bf56..2d49eccf98f2 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -573,6 +573,7 @@ static int __run_perf_stat(int argc, const char **argv)
 {
 	int interval = stat_config.interval;
 	int times = stat_config.times;
+	int timeout = stat_config.timeout;
 	char msg[BUFSIZ];
 	unsigned long long t0, t1;
 	struct perf_evsel *counter;
@@ -586,6 +587,9 @@ static int __run_perf_stat(int argc, const char **argv)
 	if (interval) {
 		ts.tv_sec  = interval / USEC_PER_MSEC;
 		ts.tv_nsec = (interval % USEC_PER_MSEC) * NSEC_PER_MSEC;
+	} else if (timeout) {
+		ts.tv_sec  = timeout / USEC_PER_MSEC;
+		ts.tv_nsec = (timeout % USEC_PER_MSEC) * NSEC_PER_MSEC;
 	} else {
 		ts.tv_sec  = 1;
 		ts.tv_nsec = 0;
@@ -698,9 +702,11 @@ static int __run_perf_stat(int argc, const char **argv)
 		perf_evlist__start_workload(evsel_list);
 		enable_counters();
 
-		if (interval) {
+		if (interval || timeout) {
 			while (!waitpid(child_pid, &status, WNOHANG)) {
 				nanosleep(&ts, NULL);
+				if (timeout)
+					break;
 				process_interval();
 				if (interval_count && !(--times))
 					break;
@@ -720,6 +726,8 @@ static int __run_perf_stat(int argc, const char **argv)
 		enable_counters();
 		while (!done) {
 			nanosleep(&ts, NULL);
+			if (timeout)
+				break;
 			if (interval) {
 				process_interval();
 				if (interval_count && !(--times))
@@ -1900,6 +1908,8 @@ static const struct option stat_options[] = {
 		    "print counts at regular interval in ms (>= 10)"),
 	OPT_INTEGER(0, "interval-count", &stat_config.times,
 		    "print counts for fixed number of times"),
+	OPT_UINTEGER(0, "timeout", &stat_config.timeout,
+		    "stop workload and print counts after a timeout period in ms (>= 10ms)"),
 	OPT_SET_UINT(0, "per-socket", &stat_config.aggr_mode,
 		     "aggregate counts per processor socket", AGGR_SOCKET),
 	OPT_SET_UINT(0, "per-core", &stat_config.aggr_mode,
@@ -2697,7 +2707,7 @@ int cmd_stat(int argc, const char **argv)
 	int status = -EINVAL, run_idx;
 	const char *mode;
 	FILE *output = stderr;
-	unsigned int interval;
+	unsigned int interval, timeout;
 	const char * const stat_subcommands[] = { "record", "report" };
 
 	setlocale(LC_ALL, "");
@@ -2728,6 +2738,7 @@ int cmd_stat(int argc, const char **argv)
 		return __cmd_report(argc, argv);
 
 	interval = stat_config.interval;
+	timeout = stat_config.timeout;
 
 	/*
 	 * For record command the -o is already taken care of.
@@ -2879,6 +2890,7 @@ int cmd_stat(int argc, const char **argv)
 				   "The overhead percentage could be high in some cases. "
 				   "Please proceed with caution.\n");
 	}
+
 	if (stat_config.times && interval)
 		interval_count = true;
 	else if (stat_config.times && !interval) {
@@ -2889,6 +2901,23 @@ int cmd_stat(int argc, const char **argv)
 		goto out;
 	}
 
+	if (timeout && timeout < 100) {
+		if (timeout < 10) {
+			pr_err("timeout must be >= 10ms.\n");
+			parse_options_usage(stat_usage, stat_options, "timeout", 0);
+			goto out;
+		} else
+			pr_warning("timeout < 100ms. "
+				   "The overhead percentage could be high in some cases. "
+				   "Please proceed with caution.\n");
+	}
+	if (timeout && interval) {
+		pr_err("timeout option is not supported with interval-print.\n");
+		parse_options_usage(stat_usage, stat_options, "timeout", 0);
+		parse_options_usage(stat_usage, stat_options, "I", 1);
+		goto out;
+	}
+
 	if (perf_evlist__alloc_stats(evsel_list, interval))
 		goto out;
 
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 540fbb350e53..2f44e386a0e8 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -90,6 +90,7 @@ struct perf_stat_config {
 	bool		scale;
 	FILE		*output;
 	unsigned int	interval;
+	unsigned int	timeout;
 	int		times;
 	struct runtime_stat *stats;
 	int		stats_num;
-- 
2.14.3

* [PATCH 11/41] tools lib symbol: Skip non-address kallsyms line
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Jiri Olsa, Alexander Shishkin,
	David Ahern, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@kernel.org>

Add a check for a failed attempt to parse the address, and skip the
line early in that case.

The address can be replaced with the '(null)' string when the user
doesn't have enough permissions, like:

  $ cat /proc/kallsyms
      (null) A irq_stack_union
      (null) A __per_cpu_start
      ...
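
The skip logic relies on the address parser reporting how many characters it consumed. The sketch below assumes a hypothetical parse_hex_prefix(), written in the spirit of hex2u64() but not its actual source, and a simplified line splitter; a line starting with '(null)' yields a length of 0 and is rejected before any further parsing.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical hex parser: returns the number of characters consumed,
 * so 0 means "no address at the start of the line".
 */
static size_t parse_hex_prefix(const char *s, uint64_t *out)
{
	size_t n = 0;
	uint64_t v = 0;

	while (isxdigit((unsigned char)s[n])) {
		int c = tolower((unsigned char)s[n]);

		v = (v << 4) | (uint64_t)(isdigit(c) ? c - '0' : c - 'a' + 10);
		n++;
	}
	*out = v;
	return n;
}

/* Split an "<addr> <type> <name>" kallsyms line; false means "skip it". */
static bool parse_kallsyms_line(const char *line, uint64_t *addr,
				char *type, const char **name)
{
	size_t len = parse_hex_prefix(line, addr);

	if (!len)                  /* e.g. "(null) A irq_stack_union" */
		return false;
	if (line[len] != ' ' || line[len + 1] == '\0' || line[len + 2] != ' ')
		return false;      /* too short or malformed */
	*type = line[len + 1];
	*name = line + len + 3;
	return true;
}
```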

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180215122635.24029-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/lib/symbol/kallsyms.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/lib/symbol/kallsyms.c b/tools/lib/symbol/kallsyms.c
index 914cb8e3d40b..689b6a130dd7 100644
--- a/tools/lib/symbol/kallsyms.c
+++ b/tools/lib/symbol/kallsyms.c
@@ -38,6 +38,10 @@ int kallsyms__parse(const char *filename, void *arg,
 
 		len = hex2u64(line, &start);
 
+		/* Skip the line if we failed to parse the address. */
+		if (!len)
+			continue;
+
 		len++;
 		if (len + 2 >= line_len)
 			continue;
-- 
2.14.3

* [PATCH 12/41] perf symbols: Check if we read regular file in dso__load()
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Jiri Olsa, Alexander Shishkin,
	David Ahern, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@kernel.org>

The current code in dso__load() calls is_regular_file(), but it checks
its return value only after calling symsrc__init().

That can make symsrc__init() block in elf_* functions when reading
the file, if the file happens to be a device rather than a regular one.

Call symsrc__init() only for regular files. Also remove the
symsrc__destroy() cleanup, which is no longer needed because
symsrc__init() is now called only for regular files.
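
The ordering fix boils down to classifying the path before anything tries to open and parse it. A minimal POSIX sketch using stat() and S_ISREG(), with hypothetical helper names (perf's own is_regular_file() and symsrc__init() are not reproduced here):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * True only for regular files. stat() inspects metadata without opening
 * the file, so a device or FIFO is never read at this point.
 */
static bool is_regular_file_sketch(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}

/* Hypothetical loader: check the file type *before* any open/parse step. */
static int load_symbols_sketch(const char *path)
{
	assert(path != NULL);
	if (!is_regular_file_sketch(path)) {
		fprintf(stderr, "skipping non-regular file %s\n", path);
		return -1;
	}
	/* Only now would the ELF file be opened and parsed (omitted). */
	return 0;
}
```

Because nothing is initialized for a non-regular path, there is also nothing to tear down on the skip path, which is why the symsrc__destroy() call could go.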

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180215122635.24029-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/symbol.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index cc065d4bfafc..e366e3060e6b 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1582,7 +1582,7 @@ int dso__load(struct dso *dso, struct map *map)
 		bool next_slot = false;
 		bool is_reg;
 		bool nsexit;
-		int sirc;
+		int sirc = -1;
 
 		enum dso_binary_type symtab_type = binary_type_symtab[i];
 
@@ -1600,16 +1600,14 @@ int dso__load(struct dso *dso, struct map *map)
 			nsinfo__mountns_exit(&nsc);
 
 		is_reg = is_regular_file(name);
-		sirc = symsrc__init(ss, dso, name, symtab_type);
+		if (is_reg)
+			sirc = symsrc__init(ss, dso, name, symtab_type);
 
 		if (nsexit)
 			nsinfo__mountns_enter(dso->nsinfo, &nsc);
 
-		if (!is_reg || sirc < 0) {
-			if (sirc >= 0)
-				symsrc__destroy(ss);
+		if (!is_reg || sirc < 0)
 			continue;
-		}
 
 		if (!syms_ss && symsrc__has_symtab(ss)) {
 			syms_ss = ss;
-- 
2.14.3

* [PATCH 13/41] perf machine: Free root_dir in machine__init() error path
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Jiri Olsa, Alexander Shishkin,
	David Ahern, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@kernel.org>

Free root_dir in machine__init() error path.
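
The single-exit cleanup pattern this patch introduces is sketched below with a hypothetical, trimmed-down struct and a fail_extra knob that forces the failure path. As the hunk below stands, machine__init() still ends with `return 0;`, so err is not reported to the caller; the sketch returns err from the common exit instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down machine with two owned strings. */
struct machine_sketch {
	char *root_dir;
	char *extra;    /* stands in for any later allocation that can fail */
};

/*
 * Every failure after root_dir was allocated jumps to 'out', which frees
 * what was set up and returns the error code to the caller.
 */
static int machine_init_sketch(struct machine_sketch *m, const char *root_dir,
			       int fail_extra)
{
	int err = -ENOMEM;

	memset(m, 0, sizeof(*m));
	m->root_dir = strdup(root_dir);
	if (m->root_dir == NULL)
		return -ENOMEM;

	m->extra = fail_extra ? NULL : strdup("extra");
	if (m->extra == NULL)
		goto out;

	err = 0;
out:
	if (err) {
		free(m->root_dir);     /* what perf's zfree() does: free... */
		m->root_dir = NULL;    /* ...and clear the pointer */
	}
	return err;
}
```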

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180215122635.24029-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/machine.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index b05a67464c03..c976384f9022 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -50,6 +50,8 @@ static void machine__threads_init(struct machine *machine)
 
 int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 {
+	int err = -ENOMEM;
+
 	memset(machine, 0, sizeof(*machine));
 	map_groups__init(&machine->kmaps, machine);
 	RB_CLEAR_NODE(&machine->rb_node);
@@ -79,7 +81,7 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 		char comm[64];
 
 		if (thread == NULL)
-			return -ENOMEM;
+			goto out;
 
 		snprintf(comm, sizeof(comm), "[guest/%d]", pid);
 		thread__set_comm(thread, comm, 0);
@@ -87,7 +89,11 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 	}
 
 	machine->current_tid = NULL;
+	err = 0;
 
+out:
+	if (err)
+		zfree(&machine->root_dir);
 	return 0;
 }
 
-- 
2.14.3

* [PATCH 14/41] perf machine: Move kernel mmap name into struct machine
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Jiri Olsa, Alexander Shishkin,
	David Ahern, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@kernel.org>

This simplifies and centralizes the code. The kernel mmap name depends
only on the machine type, which is known from the beginning, so there's
no reason to generate it every time it is needed.
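
The naming rule that is now computed once in machine__set_mmap_name() can be sketched as follows. This is a simplified sketch: it assumes the host and default-guest IDs are -1 and 0, as with perf's HOST_KERNEL_ID and DEFAULT_GUEST_KERNEL_ID, and it omits the symbol_conf.vmlinux_name and symbol_conf.default_guest_vmlinux_name overrides that the patch checks first.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Assumed to mirror perf's HOST_KERNEL_ID / DEFAULT_GUEST_KERNEL_ID. */
#define HOST_ID          (-1)
#define DEFAULT_GUEST_ID 0

struct machine_sketch {
	pid_t pid;
	char *mmap_name;   /* computed once at init, freed at exit */
};

/* Derive the kernel mmap name from the machine type alone. */
static int set_mmap_name_sketch(struct machine_sketch *m)
{
	char buf[64];

	if (m->pid == HOST_ID) {
		m->mmap_name = strdup("[kernel.kallsyms]");
	} else if (m->pid == DEFAULT_GUEST_ID) {
		m->mmap_name = strdup("[guest.kernel.kallsyms]");
	} else {
		snprintf(buf, sizeof(buf), "[guest.kernel.kallsyms.%d]",
			 (int)m->pid);
		m->mmap_name = strdup(buf);
	}
	return m->mmap_name ? 0 : -ENOMEM;
}
```

Because the name lives in the machine and is freed in machine__exit(), call sites such as machine__process_kernel_mmap_event() can read machine->mmap_name directly instead of formatting it into a PATH_MAX stack buffer each time.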

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180215122635.24029-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/build-id.c | 10 +++----
 tools/perf/util/event.c    |  5 +---
 tools/perf/util/machine.c  | 67 +++++++++++++++++++++++-----------------------
 tools/perf/util/machine.h  |  3 +--
 tools/perf/util/symbol.c   |  3 +--
 5 files changed, 39 insertions(+), 49 deletions(-)

diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index 7f8553630c4d..537eadd81914 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -316,7 +316,6 @@ static int machine__write_buildid_table(struct machine *machine,
 					struct feat_fd *fd)
 {
 	int err = 0;
-	char nm[PATH_MAX];
 	struct dso *pos;
 	u16 kmisc = PERF_RECORD_MISC_KERNEL,
 	    umisc = PERF_RECORD_MISC_USER;
@@ -338,9 +337,8 @@ static int machine__write_buildid_table(struct machine *machine,
 			name = pos->short_name;
 			name_len = pos->short_name_len;
 		} else if (dso__is_kcore(pos)) {
-			machine__mmap_name(machine, nm, sizeof(nm));
-			name = nm;
-			name_len = strlen(nm);
+			name = machine->mmap_name;
+			name_len = strlen(name);
 		} else {
 			name = pos->long_name;
 			name_len = pos->long_name_len;
@@ -813,12 +811,10 @@ static int dso__cache_build_id(struct dso *dso, struct machine *machine)
 	bool is_kallsyms = dso__is_kallsyms(dso);
 	bool is_vdso = dso__is_vdso(dso);
 	const char *name = dso->long_name;
-	char nm[PATH_MAX];
 
 	if (dso__is_kcore(dso)) {
 		is_kallsyms = true;
-		machine__mmap_name(machine, nm, sizeof(nm));
-		name = nm;
+		name = machine->mmap_name;
 	}
 	return build_id_cache__add_b(dso->build_id, sizeof(dso->build_id), name,
 				     dso->nsinfo, is_kallsyms, is_vdso);
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 44e603c27944..4644e751a3e3 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -894,8 +894,6 @@ int perf_event__synthesize_kernel_mmap(struct perf_tool *tool,
 				       struct machine *machine)
 {
 	size_t size;
-	const char *mmap_name;
-	char name_buff[PATH_MAX];
 	struct map *map = machine__kernel_map(machine);
 	struct kmap *kmap;
 	int err;
@@ -918,7 +916,6 @@ int perf_event__synthesize_kernel_mmap(struct perf_tool *tool,
 		return -1;
 	}
 
-	mmap_name = machine__mmap_name(machine, name_buff, sizeof(name_buff));
 	if (machine__is_host(machine)) {
 		/*
 		 * kernel uses PERF_RECORD_MISC_USER for user space maps,
@@ -931,7 +928,7 @@ int perf_event__synthesize_kernel_mmap(struct perf_tool *tool,
 
 	kmap = map__kmap(map);
 	size = snprintf(event->mmap.filename, sizeof(event->mmap.filename),
-			"%s%s", mmap_name, kmap->ref_reloc_sym->name) + 1;
+			"%s%s", machine->mmap_name, kmap->ref_reloc_sym->name) + 1;
 	size = PERF_ALIGN(size, sizeof(u64));
 	event->mmap.header.type = PERF_RECORD_MMAP;
 	event->mmap.header.size = (sizeof(event->mmap) -
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index c976384f9022..b1f1961b13f4 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -48,6 +48,27 @@ static void machine__threads_init(struct machine *machine)
 	}
 }
 
+static int machine__set_mmap_name(struct machine *machine)
+{
+	if (machine__is_host(machine)) {
+		if (symbol_conf.vmlinux_name)
+			machine->mmap_name = strdup(symbol_conf.vmlinux_name);
+		else
+			machine->mmap_name = strdup("[kernel.kallsyms]");
+	} else if (machine__is_default_guest(machine)) {
+		if (symbol_conf.default_guest_vmlinux_name)
+			machine->mmap_name = strdup(symbol_conf.default_guest_vmlinux_name);
+		else
+			machine->mmap_name = strdup("[guest.kernel.kallsyms]");
+	} else {
+		if (asprintf(&machine->mmap_name, "[guest.kernel.kallsyms.%d]",
+			 machine->pid) < 0)
+			machine->mmap_name = NULL;
+	}
+
+	return machine->mmap_name ? 0 : -ENOMEM;
+}
+
 int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 {
 	int err = -ENOMEM;
@@ -75,6 +96,9 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 	if (machine->root_dir == NULL)
 		return -ENOMEM;
 
+	if (machine__set_mmap_name(machine))
+		goto out;
+
 	if (pid != HOST_KERNEL_ID) {
 		struct thread *thread = machine__findnew_thread(machine, -1,
 								pid);
@@ -92,8 +116,10 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 	err = 0;
 
 out:
-	if (err)
+	if (err) {
 		zfree(&machine->root_dir);
+		zfree(&machine->mmap_name);
+	}
 	return 0;
 }
 
@@ -186,6 +212,7 @@ void machine__exit(struct machine *machine)
 	dsos__exit(&machine->dsos);
 	machine__exit_vdso(machine);
 	zfree(&machine->root_dir);
+	zfree(&machine->mmap_name);
 	zfree(&machine->current_tid);
 
 	for (i = 0; i < THREADS__TABLE_SIZE; i++) {
@@ -328,20 +355,6 @@ void machines__process_guests(struct machines *machines,
 	}
 }
 
-char *machine__mmap_name(struct machine *machine, char *bf, size_t size)
-{
-	if (machine__is_host(machine))
-		snprintf(bf, size, "[%s]", "kernel.kallsyms");
-	else if (machine__is_default_guest(machine))
-		snprintf(bf, size, "[%s]", "guest.kernel.kallsyms");
-	else {
-		snprintf(bf, size, "[%s.%d]", "guest.kernel.kallsyms",
-			 machine->pid);
-	}
-
-	return bf;
-}
-
 void machines__set_id_hdr_size(struct machines *machines, u16 id_hdr_size)
 {
 	struct rb_node *node;
@@ -777,25 +790,13 @@ size_t machine__fprintf(struct machine *machine, FILE *fp)
 
 static struct dso *machine__get_kernel(struct machine *machine)
 {
-	const char *vmlinux_name = NULL;
+	const char *vmlinux_name = machine->mmap_name;
 	struct dso *kernel;
 
 	if (machine__is_host(machine)) {
-		vmlinux_name = symbol_conf.vmlinux_name;
-		if (!vmlinux_name)
-			vmlinux_name = DSO__NAME_KALLSYMS;
-
 		kernel = machine__findnew_kernel(machine, vmlinux_name,
 						 "[kernel]", DSO_TYPE_KERNEL);
 	} else {
-		char bf[PATH_MAX];
-
-		if (machine__is_default_guest(machine))
-			vmlinux_name = symbol_conf.default_guest_vmlinux_name;
-		if (!vmlinux_name)
-			vmlinux_name = machine__mmap_name(machine, bf,
-							  sizeof(bf));
-
 		kernel = machine__findnew_kernel(machine, vmlinux_name,
 						 "[guest.kernel]",
 						 DSO_TYPE_GUEST_KERNEL);
@@ -1295,7 +1296,6 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
 					      union perf_event *event)
 {
 	struct map *map;
-	char kmmap_prefix[PATH_MAX];
 	enum dso_kernel_type kernel_type;
 	bool is_kernel_mmap;
 
@@ -1303,15 +1303,14 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
 	if (machine__uses_kcore(machine))
 		return 0;
 
-	machine__mmap_name(machine, kmmap_prefix, sizeof(kmmap_prefix));
 	if (machine__is_host(machine))
 		kernel_type = DSO_TYPE_KERNEL;
 	else
 		kernel_type = DSO_TYPE_GUEST_KERNEL;
 
 	is_kernel_mmap = memcmp(event->mmap.filename,
-				kmmap_prefix,
-				strlen(kmmap_prefix) - 1) == 0;
+				machine->mmap_name,
+				strlen(machine->mmap_name) - 1) == 0;
 	if (event->mmap.filename[0] == '/' ||
 	    (!is_kernel_mmap && event->mmap.filename[0] == '[')) {
 		map = machine__findnew_module_map(machine, event->mmap.start,
@@ -1322,7 +1321,7 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
 		map->end = map->start + event->mmap.len;
 	} else if (is_kernel_mmap) {
 		const char *symbol_name = (event->mmap.filename +
-				strlen(kmmap_prefix));
+				strlen(machine->mmap_name));
 		/*
 		 * Should be there already, from the build-id table in
 		 * the header.
@@ -1363,7 +1362,7 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
 		up_read(&machine->dsos.lock);
 
 		if (kernel == NULL)
-			kernel = machine__findnew_dso(machine, kmmap_prefix);
+			kernel = machine__findnew_dso(machine, machine->mmap_name);
 		if (kernel == NULL)
 			goto out_problem;
 
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 5ce860b64c74..cb0a20f3a96b 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -43,6 +43,7 @@ struct machine {
 	bool		  comm_exec;
 	bool		  kptr_restrict_warned;
 	char		  *root_dir;
+	char		  *mmap_name;
 	struct threads    threads[THREADS__TABLE_SIZE];
 	struct vdso_info  *vdso_info;
 	struct perf_env   *env;
@@ -142,8 +143,6 @@ struct machine *machines__find(struct machines *machines, pid_t pid);
 struct machine *machines__findnew(struct machines *machines, pid_t pid);
 
 void machines__set_id_hdr_size(struct machines *machines, u16 id_hdr_size);
-char *machine__mmap_name(struct machine *machine, char *bf, size_t size);
-
 void machines__set_comm_exec(struct machines *machines, bool comm_exec);
 
 struct machine *machine__new_host(void);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index e366e3060e6b..a1a312d99f30 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1958,8 +1958,7 @@ static int dso__load_guest_kernel_sym(struct dso *dso, struct map *map)
 		pr_debug("Using %s for symbols\n", kallsyms_filename);
 	if (err > 0 && !dso__is_kcore(dso)) {
 		dso->binary_type = DSO_BINARY_TYPE__GUEST_KALLSYMS;
-		machine__mmap_name(machine, path, sizeof(path));
-		dso__set_long_name(dso, strdup(path), true);
+		dso__set_long_name(dso, machine->mmap_name, false);
 		map__fixup_start(map);
 		map__fixup_end(map);
 	}
-- 
2.14.3

* [PATCH 15/41] perf machine: Generalize machine__set_kernel_mmap()
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Jiri Olsa, Alexander Shishkin,
	David Ahern, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@kernel.org>

Pass the start and end values directly instead of an event object, so
the function can be called without one. It will be used in the
following patch.
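
A minimal C sketch of the generalized setter, under simplified
assumptions: struct kmap_range, NR_MAP_TYPES and set_kernel_range() are
hypothetical stand-ins for perf's struct map, MAP__NR_TYPES and
machine__set_kernel_mmap(). It shows that the caller now computes the
range itself, while keeping the patch's rule that an end of 0 is widened
to ~0ULL.

```c
#include <stdint.h>

/* Hypothetical stand-in for MAP__NR_TYPES (function and variable maps). */
#define NR_MAP_TYPES 2

/* Hypothetical stand-in for the start/end fields of perf's struct map. */
struct kmap_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Set every kernel map to [start, end). The caller decides where the
 * range comes from, so no event object is needed here.
 */
static void set_kernel_range(struct kmap_range maps[NR_MAP_TYPES],
			     uint64_t start, uint64_t end)
{
	for (int i = 0; i < NR_MAP_TYPES; i++) {
		maps[i].start = start;
		maps[i].end   = end;
		/* Mirror the patch: an end of 0 is treated as "no upper bound". */
		if (maps[i].end == 0)
			maps[i].end = ~0ULL;
	}
}
```

An MMAP-event caller would pass start and start + len; a caller that
only knows a start address can pass 0 as the end and get an open-ended
map.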

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180215122635.24029-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/machine.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index b1f1961b13f4..292e70c774bd 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1262,15 +1262,15 @@ int machine__create_kernel_maps(struct machine *machine)
 	return 0;
 }
 
-static void machine__set_kernel_mmap_len(struct machine *machine,
-					 union perf_event *event)
+static void machine__set_kernel_mmap(struct machine *machine,
+				     u64 start, u64 end)
 {
 	int i;
 
 	for (i = 0; i < MAP__NR_TYPES; i++) {
-		machine->vmlinux_maps[i]->start = event->mmap.start;
-		machine->vmlinux_maps[i]->end   = (event->mmap.start +
-						   event->mmap.len);
+		machine->vmlinux_maps[i]->start = start;
+		machine->vmlinux_maps[i]->end   = end;
+
 		/*
 		 * Be a bit paranoid here, some perf.data file came with
 		 * a zero sized synthesized MMAP event for the kernel.
@@ -1375,7 +1375,8 @@ static int machine__process_kernel_mmap_event(struct machine *machine,
 		if (strstr(kernel->long_name, "vmlinux"))
 			dso__set_short_name(kernel, "[kernel.vmlinux]", false);
 
-		machine__set_kernel_mmap_len(machine, event);
+		machine__set_kernel_mmap(machine, event->mmap.start,
+					 event->mmap.start + event->mmap.len);
 
 		/*
 		 * Avoid using a zero address (kptr_restrict) for the ref reloc
-- 
2.14.3

* [PATCH 16/41] perf machine: Don't search for active kernel start in __machine__create_kernel_maps
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Jiri Olsa, Alexander Shishkin,
	David Ahern, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@kernel.org>

We should not search for the kernel start address in
__machine__create_kernel_maps(), because it's being used in the 'report'
code path, where we are interested in the kernel MMAP data address (the
one recorded via 'perf record', possibly on another machine, or on an
older or newer kernel on the same machine where analysis is being
performed) instead of the current kernel address.

The __machine__create_kernel_maps() function serves purely for creating
the machine's kernel maps and setting up the kmap group. The report code
path then sets the address based on the data from kernel MMAP event in
the machine__set_kernel_mmap() function.

The kallsyms address search logic is used by test code, which calls
machine__create_kernel_maps() to get the current maps and
machine__get_running_kernel_start() to get the kernel start address.

Use machine__set_kernel_mmap() to set the kernel maps' start address,
and move the map_groups__fixup_end() call so it runs once all maps are
in place.
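
A simplified sketch of an end-fixup pass, assuming it only fills in ends
that are still zero. struct simple_map and fixup_end() are hypothetical;
perf keeps maps in an rbtree sorted by start, while this sketch uses a
sorted array. Since each missing end is taken from the next map's start,
the starts should be final before the pass runs, which is presumably why
the call now comes after all maps are in place.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified map; 'end' == 0 means "not known yet". */
struct simple_map {
	uint64_t start;
	uint64_t end;
};

/*
 * For maps sorted by start: a map without an end runs up to the next
 * map's start, and a last map without an end is left open-ended.
 */
static void fixup_end(struct simple_map *maps, size_t n)
{
	if (n == 0)
		return;

	for (size_t i = 0; i + 1 < n; i++) {
		if (maps[i].end == 0)
			maps[i].end = maps[i + 1].start;
	}

	if (maps[n - 1].end == 0)
		maps[n - 1].end = ~0ULL;
}
```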

Also make __machine__create_kernel_maps() static, because it has no
external users.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180215122635.24029-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/machine.c | 55 ++++++++++++++++++++++-------------------------
 tools/perf/util/machine.h |  1 -
 2 files changed, 26 insertions(+), 30 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 292e70c774bd..2db8d7dd0f80 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -856,13 +856,10 @@ static int machine__get_running_kernel_start(struct machine *machine,
 	return 0;
 }
 
-int __machine__create_kernel_maps(struct machine *machine, struct dso *kernel)
+static int
+__machine__create_kernel_maps(struct machine *machine, struct dso *kernel)
 {
 	int type;
-	u64 start = 0;
-
-	if (machine__get_running_kernel_start(machine, NULL, &start))
-		return -1;
 
 	/* In case of renewal the kernel map, destroy previous one */
 	machine__destroy_kernel_maps(machine);
@@ -871,7 +868,7 @@ int __machine__create_kernel_maps(struct machine *machine, struct dso *kernel)
 		struct kmap *kmap;
 		struct map *map;
 
-		machine->vmlinux_maps[type] = map__new2(start, kernel, type);
+		machine->vmlinux_maps[type] = map__new2(0, kernel, type);
 		if (machine->vmlinux_maps[type] == NULL)
 			return -1;
 
@@ -1222,6 +1219,24 @@ static int machine__create_modules(struct machine *machine)
 	return 0;
 }
 
+static void machine__set_kernel_mmap(struct machine *machine,
+				     u64 start, u64 end)
+{
+	int i;
+
+	for (i = 0; i < MAP__NR_TYPES; i++) {
+		machine->vmlinux_maps[i]->start = start;
+		machine->vmlinux_maps[i]->end   = end;
+
+		/*
+		 * Be a bit paranoid here, some perf.data file came with
+		 * a zero sized synthesized MMAP event for the kernel.
+		 */
+		if (machine->vmlinux_maps[i]->end == 0)
+			machine->vmlinux_maps[i]->end = ~0ULL;
+	}
+}
+
 int machine__create_kernel_maps(struct machine *machine)
 {
 	struct dso *kernel = machine__get_kernel(machine);
@@ -1246,40 +1261,22 @@ int machine__create_kernel_maps(struct machine *machine)
 				 "continuing anyway...\n", machine->pid);
 	}
 
-	/*
-	 * Now that we have all the maps created, just set the ->end of them:
-	 */
-	map_groups__fixup_end(&machine->kmaps);
-
 	if (!machine__get_running_kernel_start(machine, &name, &addr)) {
 		if (name &&
 		    maps__set_kallsyms_ref_reloc_sym(machine->vmlinux_maps, name, addr)) {
 			machine__destroy_kernel_maps(machine);
 			return -1;
 		}
+		machine__set_kernel_mmap(machine, addr, 0);
 	}
 
+	/*
+	 * Now that we have all the maps created, just set the ->end of them:
+	 */
+	map_groups__fixup_end(&machine->kmaps);
 	return 0;
 }
 
-static void machine__set_kernel_mmap(struct machine *machine,
-				     u64 start, u64 end)
-{
-	int i;
-
-	for (i = 0; i < MAP__NR_TYPES; i++) {
-		machine->vmlinux_maps[i]->start = start;
-		machine->vmlinux_maps[i]->end   = end;
-
-		/*
-		 * Be a bit paranoid here, some perf.data file came with
-		 * a zero sized synthesized MMAP event for the kernel.
-		 */
-		if (machine->vmlinux_maps[i]->end == 0)
-			machine->vmlinux_maps[i]->end = ~0ULL;
-	}
-}
-
 static bool machine__uses_kcore(struct machine *machine)
 {
 	struct dso *dso;
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index cb0a20f3a96b..50d587d34459 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -238,7 +238,6 @@ size_t machines__fprintf_dsos_buildid(struct machines *machines, FILE *fp,
 				     bool (skip)(struct dso *dso, int parm), int parm);
 
 void machine__destroy_kernel_maps(struct machine *machine);
-int __machine__create_kernel_maps(struct machine *machine, struct dso *kernel);
 int machine__create_kernel_maps(struct machine *machine);
 
 int machines__create_kernel_maps(struct machines *machines, pid_t pid);
-- 
2.14.3

* [PATCH 17/41] perf machine: Remove machine__load_kallsyms()
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Jiri Olsa, Alexander Shishkin,
	David Ahern, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@kernel.org>

The current machine__load_kallsyms() function has no callers, so drop it
and rename __machine__load_kallsyms() to take its place. Also remove the
no_kcore argument, as it was always passed as 'true'.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180215122635.24029-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/tests/vmlinux-kallsyms.c |  2 +-
 tools/perf/util/machine.c           | 14 ++++----------
 tools/perf/util/machine.h           |  2 --
 3 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/tools/perf/tests/vmlinux-kallsyms.c b/tools/perf/tests/vmlinux-kallsyms.c
index f6789fb029d6..58349297f9fb 100644
--- a/tools/perf/tests/vmlinux-kallsyms.c
+++ b/tools/perf/tests/vmlinux-kallsyms.c
@@ -56,7 +56,7 @@ int test__vmlinux_matches_kallsyms(struct test *test __maybe_unused, int subtest
 	 * be compacted against the list of modules found in the "vmlinux"
 	 * code and with the one got from /proc/modules from the "kallsyms" code.
 	 */
-	if (__machine__load_kallsyms(&kallsyms, "/proc/kallsyms", type, true) <= 0) {
+	if (machine__load_kallsyms(&kallsyms, "/proc/kallsyms", type) <= 0) {
 		pr_debug("dso__load_kallsyms ");
 		goto out;
 	}
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 2db8d7dd0f80..fe27ef55cbb9 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -151,7 +151,7 @@ struct machine *machine__new_kallsyms(void)
 	 *    ask for not using the kcore parsing code, once this one is fixed
 	 *    to create a map per module.
 	 */
-	if (machine && __machine__load_kallsyms(machine, "/proc/kallsyms", MAP__FUNCTION, true) <= 0) {
+	if (machine && machine__load_kallsyms(machine, "/proc/kallsyms", MAP__FUNCTION) <= 0) {
 		machine__delete(machine);
 		machine = NULL;
 	}
@@ -991,11 +991,11 @@ int machines__create_kernel_maps(struct machines *machines, pid_t pid)
 	return machine__create_kernel_maps(machine);
 }
 
-int __machine__load_kallsyms(struct machine *machine, const char *filename,
-			     enum map_type type, bool no_kcore)
+int machine__load_kallsyms(struct machine *machine, const char *filename,
+			     enum map_type type)
 {
 	struct map *map = machine__kernel_map(machine);
-	int ret = __dso__load_kallsyms(map->dso, filename, map, no_kcore);
+	int ret = __dso__load_kallsyms(map->dso, filename, map, true);
 
 	if (ret > 0) {
 		dso__set_loaded(map->dso, type);
@@ -1010,12 +1010,6 @@ int __machine__load_kallsyms(struct machine *machine, const char *filename,
 	return ret;
 }
 
-int machine__load_kallsyms(struct machine *machine, const char *filename,
-			   enum map_type type)
-{
-	return __machine__load_kallsyms(machine, filename, type, false);
-}
-
 int machine__load_vmlinux_path(struct machine *machine, enum map_type type)
 {
 	struct map *map = machine__kernel_map(machine);
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 50d587d34459..66cc200ef86f 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -225,8 +225,6 @@ struct map *machine__findnew_module_map(struct machine *machine, u64 start,
 					const char *filename);
 int arch__fix_module_text_start(u64 *start, const char *name);
 
-int __machine__load_kallsyms(struct machine *machine, const char *filename,
-			     enum map_type type, bool no_kcore);
 int machine__load_kallsyms(struct machine *machine, const char *filename,
 			   enum map_type type);
 int machine__load_vmlinux_path(struct machine *machine, enum map_type type);
-- 
2.14.3

* [PATCH 18/41] perf tools: Do not create kernel maps in sample__resolve()
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Jiri Olsa, Alexander Shishkin,
	David Ahern, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@kernel.org>

There's no need to allocate kernel maps at this point (sample
processing).

We search for kernel maps using the kernel map_groups in machine::kmaps,
which is static. If the vmlinux maps still don't exist for any reason,
the search correctly fails because they are not in the map group.
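
A hypothetical sketch of the lookup behaviour this relies on: searching
a map group that simply has no map covering an address returns NULL,
which the caller can treat as "unresolved" instead of creating maps on
the fly. struct range_map and find_map() are illustrative; perf itself
keeps maps in an rbtree rather than a sorted array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical map covering [start, end). */
struct range_map {
	uint64_t start, end;
	const char *name;
};

/*
 * Binary search over non-overlapping maps sorted by start.
 * Returns NULL when no map covers 'addr' (e.g. the map was never added).
 */
static const struct range_map *find_map(const struct range_map *maps,
					size_t n, uint64_t addr)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < maps[mid].start)
			hi = mid;
		else if (addr >= maps[mid].end)
			lo = mid + 1;
		else
			return &maps[mid];
	}
	return NULL;
}
```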

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180215122635.24029-9-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/event.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 4644e751a3e3..f0a6cbd033cc 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -1588,17 +1588,6 @@ int machine__resolve(struct machine *machine, struct addr_location *al,
 		return -1;
 
 	dump_printf(" ... thread: %s:%d\n", thread__comm_str(thread), thread->tid);
-	/*
-	 * Have we already created the kernel maps for this machine?
-	 *
-	 * This should have happened earlier, when we processed the kernel MMAP
-	 * events, but for older perf.data files there was no such thing, so do
-	 * it now.
-	 */
-	if (sample->cpumode == PERF_RECORD_MISC_KERNEL &&
-	    machine__kernel_map(machine) == NULL)
-		machine__create_kernel_maps(machine);
-
 	thread__find_addr_map(thread, sample->cpumode, MAP__FUNCTION, sample->ip, al);
 	dump_printf(" ...... dso: %s\n",
 		    al->map ? al->map->dso->long_name :
-- 
2.14.3

* [PATCH 19/41] perf tests: Use arch__compare_symbol_names to compare symbols
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Jiri Olsa, Alexander Shishkin,
	David Ahern, Namhyung Kim, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@kernel.org>

The symbol search called by machine__find_kernel_symbol_by_name()
internally uses arch__compare_symbol_names() to compare two symbol
names, because different archs compare symbols differently, mostly to
skip '.' prefixes and similar.
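
A minimal sketch of the kind of arch-specific comparison meant here,
assuming the only difference from strcmp() is an optional leading '.'
(as with powerpc dot symbols). compare_symbol_names_skip_dot() is a
hypothetical name, not perf's arch__compare_symbol_names() itself.

```c
#include <string.h>

/* Compare two symbol names, ignoring one leading '.' on either side. */
static int compare_symbol_names_skip_dot(const char *namea,
					 const char *nameb)
{
	if (*namea == '.')
		namea++;
	if (*nameb == '.')
		nameb++;
	return strcmp(namea, nameb);
}
```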

In test 1 we try to find matching symbols in kallsyms and vmlinux, by
address and by symbol name. When either is found, we compare the pair's
symbol names with a simple strcmp(), which is not good enough for the
reasons explained in the previous paragraph.

On powerpc this can cause a lockup because, even though we found the
pair, the compared names differ and don't match with a simple strcmp().
The following code path is executed, leading to the lockup:

   - we find the pair in kallsyms by sym->start
next_pair:
   - we compare the names and it fails
   - we find the pair by sym->name
   - the pair addresses match, so we goto next_pair
     because we assume the names match in this case
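
A simplified model of the sequence above, showing why it never
terminates with strcmp(): the by-name lookup keeps returning the same
pair at the same address, so the code jumps back to next_pair with the
same two names. match_pair() and its max_tries cap are hypothetical and
exist only so the demonstration halts.

```c
#include <string.h>

typedef int (*name_cmp_fn)(const char *, const char *);

/*
 * Returns the number of retries needed before the names compare equal,
 * or -1 if 'max_tries' was reached (the real test would spin forever).
 */
static int match_pair(const char *sym_name, const char *pair_name,
		      name_cmp_fn cmp, int max_tries)
{
	int tries = 0;

next_pair:
	if (cmp(sym_name, pair_name) == 0)
		return tries;
	if (++tries >= max_tries)
		return -1;
	/* Lookup by name finds the same pair at the same address: retry. */
	goto next_pair;
}
```

Passing strcmp() for ".schedule" vs "schedule" hits the cap, while a
dot-skipping comparison matches on the first attempt.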

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 031b84c407c3 ("perf probe ppc: Enable matching against dot symbols automatically")
Link: http://lkml.kernel.org/r/20180215122635.24029-10-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/tests/vmlinux-kallsyms.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/tests/vmlinux-kallsyms.c b/tools/perf/tests/vmlinux-kallsyms.c
index 58349297f9fb..1e5adb65632a 100644
--- a/tools/perf/tests/vmlinux-kallsyms.c
+++ b/tools/perf/tests/vmlinux-kallsyms.c
@@ -125,7 +125,7 @@ int test__vmlinux_matches_kallsyms(struct test *test __maybe_unused, int subtest
 
 		if (pair && UM(pair->start) == mem_start) {
 next_pair:
-			if (strcmp(sym->name, pair->name) == 0) {
+			if (arch__compare_symbol_names(sym->name, pair->name) == 0) {
 				/*
 				 * kallsyms don't have the symbol end, so we
 				 * set that by using the next symbol start - 1,
-- 
2.14.3

* [PATCH 20/41] perf cs-etm: Freeing allocated memory
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17   ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Mathieu Poirier,
	Alexander Shishkin, Jin Yao, Namhyung Kim, Peter Zijlstra,
	linux-arm-kernel, Arnaldo Carvalho de Melo

From: Mathieu Poirier <mathieu.poirier@linaro.org>

This patch frees all the memory allocated in function
cs_etm__alloc_queue().
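
A hedged sketch of the same teardown pattern with hypothetical types
(struct queue, struct decoder, decoder_free() and ZFREE() are
illustrative; ZFREE() is modeled on perf's zfree(), which takes the
address of the pointer, and the real code also drops the thread
reference with thread__zput()): bail out on NULL, release each member
the allocator created, then free the container.

```c
#include <stdlib.h>

/* Free *pp and reset it to NULL, like perf's zfree() helper. */
#define ZFREE(pp) do { free(*(pp)); *(pp) = NULL; } while (0)

/* Hypothetical stand-ins for the decoder and queue types. */
struct decoder {
	unsigned char *buf;
};

struct queue {
	struct decoder *decoder;
	unsigned char *event_buf;
};

static void decoder_free(struct decoder *d)
{
	if (!d)
		return;
	free(d->buf);
	free(d);
}

/* Release everything attached to the queue, then the queue itself. */
static void queue_free(void *priv)
{
	struct queue *q = priv;

	if (!q)		/* tolerate a NULL private pointer */
		return;

	decoder_free(q->decoder);
	ZFREE(&q->event_buf);
	free(q);
}
```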

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1518467557-18505-2-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/cs-etm.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index b9f0a53dfa65..f2c98774e665 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -174,6 +174,12 @@ static void cs_etm__free_queue(void *priv)
 {
 	struct cs_etm_queue *etmq = priv;
 
+	if (!etmq)
+		return;
+
+	thread__zput(etmq->thread);
+	cs_etm_decoder__free(etmq->decoder);
+	zfree(&etmq->event_buf);
 	free(etmq);
 }
 
-- 
2.14.3

* [PATCH 21/41] perf tools: Use target->per_thread and target->system_wide flags
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17   ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Jin Yao, Alexander Shishkin,
	Namhyung Kim, Peter Zijlstra, linux-arm-kernel, Mathieu Poirier,
	Arnaldo Carvalho de Melo

From: Jin Yao <yao.jin@linux.intel.com>

Mathieu Poirier reported that commit 73c0ca1eee3d ("perf thread_map:
Enumerate all threads from /proc") has a negative impact on 'perf record
--per-thread': it creates a kernel event for each thread in the system.

Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag")
addresses this issue by adding a new target->all_threads flag.

This patch is based on Mathieu Poirier's patch, but instead of adding a
new target->all_threads flag it uses 'target->per_thread &&
target->system_wide' as the condition for the all-threads case.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Fixes: 73c0ca1eee3d ("perf thread_map: Enumerate all threads from /proc")
Link: http://lkml.kernel.org/r/1518467557-18505-3-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
[Fixed checkpatch warning about line over 80 characters]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evlist.c     | 21 ++++++++++++++++++++-
 tools/perf/util/thread_map.c |  4 ++--
 tools/perf/util/thread_map.h |  2 +-
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index e5fc14e53c05..7b7d535396f7 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1086,11 +1086,30 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages)
 
 int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
 {
+	bool all_threads = (target->per_thread && target->system_wide);
 	struct cpu_map *cpus;
 	struct thread_map *threads;
 
+	/*
+	 * If specify '-a' and '--per-thread' to perf record, perf record
+	 * will override '--per-thread'. target->per_thread = false and
+	 * target->system_wide = true.
+	 *
+	 * If specify '--per-thread' only to perf record,
+	 * target->per_thread = true and target->system_wide = false.
+	 *
+	 * So target->per_thread && target->system_wide is false.
+	 * For perf record, thread_map__new_str doesn't call
+	 * thread_map__new_all_cpus. That will keep perf record's
+	 * current behavior.
+	 *
+	 * For perf stat, it allows the case that target->per_thread and
+	 * target->system_wide are all true. It means to collect system-wide
+	 * per-thread data. thread_map__new_str will call
+	 * thread_map__new_all_cpus to enumerate all threads.
+	 */
 	threads = thread_map__new_str(target->pid, target->tid, target->uid,
-				      target->per_thread);
+				      all_threads);
 
 	if (!threads)
 		return -1;
diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
index 3e1038f6491c..729dad8f412d 100644
--- a/tools/perf/util/thread_map.c
+++ b/tools/perf/util/thread_map.c
@@ -323,7 +323,7 @@ struct thread_map *thread_map__new_by_tid_str(const char *tid_str)
 }
 
 struct thread_map *thread_map__new_str(const char *pid, const char *tid,
-				       uid_t uid, bool per_thread)
+				       uid_t uid, bool all_threads)
 {
 	if (pid)
 		return thread_map__new_by_pid_str(pid);
@@ -331,7 +331,7 @@ struct thread_map *thread_map__new_str(const char *pid, const char *tid,
 	if (!tid && uid != UINT_MAX)
 		return thread_map__new_by_uid(uid);
 
-	if (per_thread)
+	if (all_threads)
 		return thread_map__new_all_cpus();
 
 	return thread_map__new_by_tid_str(tid);
diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h
index 0a806b99e73c..5ec91cfd1869 100644
--- a/tools/perf/util/thread_map.h
+++ b/tools/perf/util/thread_map.h
@@ -31,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map);
 void thread_map__put(struct thread_map *map);
 
 struct thread_map *thread_map__new_str(const char *pid,
-		const char *tid, uid_t uid, bool per_thread);
+		const char *tid, uid_t uid, bool all_threads);
 
 struct thread_map *thread_map__new_by_tid_str(const char *tid_str);
 
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 21/41] perf tools: Use target->per_thread and target->system_wide flags
@ 2018-02-16 19:17   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: linux-arm-kernel

From: Jin Yao <yao.jin@linux.intel.com>

Mathieu Poirier reports issue in commit ("73c0ca1eee3d perf thread_map:
Enumerate all threads from /proc") that it has negative impact on 'perf
record --per-thread'. It has the effect of creating a kernel event for
each thread in the system for 'perf record --per-thread'.

Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag")
can fix this issue by creating a new target->all_threads flag.

This patch is based on Mathieu Poirier's patch but it doesn't use a new
target->all_threads flag. This patch just uses 'target->per_thread &&
target->system_wide' as a condition to check for all threads case.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel at lists.infradead.org
Fixes: 73c0ca1eee3d ("perf thread_map: Enumerate all threads from /proc")
Link: http://lkml.kernel.org/r/1518467557-18505-3-git-send-email-mathieu.poirier at linaro.org
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
[Fixed checkpatch warning about line over 80 characters]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evlist.c     | 21 ++++++++++++++++++++-
 tools/perf/util/thread_map.c |  4 ++--
 tools/perf/util/thread_map.h |  2 +-
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index e5fc14e53c05..7b7d535396f7 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1086,11 +1086,30 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages)
 
 int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
 {
+	bool all_threads = (target->per_thread && target->system_wide);
 	struct cpu_map *cpus;
 	struct thread_map *threads;
 
+	/*
+	 * If specify '-a' and '--per-thread' to perf record, perf record
+	 * will override '--per-thread'. target->per_thread = false and
+	 * target->system_wide = true.
+	 *
+	 * If specify '--per-thread' only to perf record,
+	 * target->per_thread = true and target->system_wide = false.
+	 *
+	 * So target->per_thread && target->system_wide is false.
+	 * For perf record, thread_map__new_str doesn't call
+	 * thread_map__new_all_cpus. That will keep perf record's
+	 * current behavior.
+	 *
+	 * For perf stat, it allows the case that target->per_thread and
+	 * target->system_wide are all true. It means to collect system-wide
+	 * per-thread data. thread_map__new_str will call
+	 * thread_map__new_all_cpus to enumerate all threads.
+	 */
 	threads = thread_map__new_str(target->pid, target->tid, target->uid,
-				      target->per_thread);
+				      all_threads);
 
 	if (!threads)
 		return -1;
diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
index 3e1038f6491c..729dad8f412d 100644
--- a/tools/perf/util/thread_map.c
+++ b/tools/perf/util/thread_map.c
@@ -323,7 +323,7 @@ struct thread_map *thread_map__new_by_tid_str(const char *tid_str)
 }
 
 struct thread_map *thread_map__new_str(const char *pid, const char *tid,
-				       uid_t uid, bool per_thread)
+				       uid_t uid, bool all_threads)
 {
 	if (pid)
 		return thread_map__new_by_pid_str(pid);
@@ -331,7 +331,7 @@ struct thread_map *thread_map__new_str(const char *pid, const char *tid,
 	if (!tid && uid != UINT_MAX)
 		return thread_map__new_by_uid(uid);
 
-	if (per_thread)
+	if (all_threads)
 		return thread_map__new_all_cpus();
 
 	return thread_map__new_by_tid_str(tid);
diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h
index 0a806b99e73c..5ec91cfd1869 100644
--- a/tools/perf/util/thread_map.h
+++ b/tools/perf/util/thread_map.h
@@ -31,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map);
 void thread_map__put(struct thread_map *map);
 
 struct thread_map *thread_map__new_str(const char *pid,
-		const char *tid, uid_t uid, bool per_thread);
+		const char *tid, uid_t uid, bool all_threads);
 
 struct thread_map *thread_map__new_by_tid_str(const char *tid_str);
 
-- 
2.14.3

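The comment in perf_evlist__create_maps() reduces to a fixed decision order inside thread_map__new_str(). The sketch below mirrors that order; `enum map_kind` and `pick_thread_map()` are hypothetical stand-ins for the real thread_map constructors, and only the order of the checks is taken from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the constructor thread_map__new_str() picks. */
enum map_kind {
	MAP_BY_PID,      /* thread_map__new_by_pid_str() */
	MAP_BY_UID,      /* thread_map__new_by_uid() */
	MAP_ALL_THREADS, /* thread_map__new_all_cpus() */
	MAP_BY_TID,      /* thread_map__new_by_tid_str() */
};

/* 'all_threads' is what perf_evlist__create_maps() now passes:
 * per_thread && system_wide, instead of per_thread alone.
 * UINT_MAX means "no uid given", as in the original code. */
enum map_kind pick_thread_map(const char *pid, const char *tid,
			      unsigned int uid, bool per_thread,
			      bool system_wide)
{
	bool all_threads = per_thread && system_wide;

	if (pid)
		return MAP_BY_PID;
	if (!tid && uid != UINT_MAX)
		return MAP_BY_UID;
	if (all_threads)
		return MAP_ALL_THREADS;
	return MAP_BY_TID;
}
```

For 'perf record --per-thread', system_wide is false, so the sketch falls through to MAP_BY_TID; only a caller that sets both flags (perf stat with '-a --per-thread') reaches MAP_ALL_THREADS.
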
* [PATCH 22/41] perf auxtrace arm: Fixing uninitialised variable
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17   ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Mathieu Poirier,
	Alexander Shishkin, Jin Yao, Namhyung Kim, Peter Zijlstra,
	linux-arm-kernel, Arnaldo Carvalho de Melo

From: Mathieu Poirier <mathieu.poirier@linaro.org>

When building natively on arm64 the compiler complains that variable
'i' may be used uninitialised, which breaks the compilation.  No further
checks are needed here since variable 'found_spe' can only be true if
variable 'i' has been initialised as part of the for loop.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1518467557-18505-4-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/arch/arm/util/auxtrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

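The warning comes from a common shape: an index assigned only inside an inner loop that may never run, and read later only when a flag set in that same loop is true. The compiler cannot always prove that relationship, so giving the index an initial value silences the warning without changing behaviour. The sketch below reproduces that shape with hypothetical names (`match_pmu()`, plain `int` arrays standing in for evsels and SPE PMUs); it is not the actual auxtrace.c code.

```c
#include <assert.h>
#include <stdbool.h>

/* Return the index of the first PMU type matching an event type, or -1. */
int match_pmu(const int *event_types, int nr_events,
	      const int *pmu_types, int nr_pmus)
{
	bool found = false;
	int e;
	int i = 0;	/* without "= 0" some compilers warn about 'i' */

	for (e = 0; e < nr_events && !found; e++) {
		if (!nr_pmus)
			continue;
		for (i = 0; i < nr_pmus; i++) {
			if (event_types[e] == pmu_types[i]) {
				found = true;
				break;
			}
		}
	}

	/* 'i' is only read when 'found' is true, and 'found' can only be
	 * set inside the inner loop, after 'i' has been assigned. */
	return found ? i : -1;
}
```

Stopping the outer loop once 'found' is set is a choice of this sketch, so that a later non-matching event cannot move 'i' past the match.
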
diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c
index 2323581b157d..fa639e3e52ac 100644
--- a/tools/perf/arch/arm/util/auxtrace.c
+++ b/tools/perf/arch/arm/util/auxtrace.c
@@ -68,7 +68,7 @@ struct auxtrace_record
 	bool found_spe = false;
 	static struct perf_pmu **arm_spe_pmus = NULL;
 	static int nr_spes = 0;
-	int i;
+	int i = 0;
 
 	if (!evlist)
 		return NULL;
-- 
2.14.3

* [PATCH 23/41] perf cs-etm: Properly deal with cpu maps
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17   ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Mathieu Poirier,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Peter Zijlstra,
	linux-arm-kernel, Arnaldo Carvalho de Melo

From: Mathieu Poirier <mathieu.poirier@linaro.org>

This patch allows the CoreSight AUX info section to handle topologies
where only a subset of all available CPUs is present, while also
avoiding accesses to the ETM configuration areas of CPUs that have been
offlined.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1518478737-24649-1-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/arch/arm/util/cs-etm.c | 51 +++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 15 deletions(-)

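The core of the change is that a CPU contributes ETM metadata only if it is requested by the event (an empty event map means "all CPUs") and is currently online. A minimal sketch of that selection using 64-bit masks instead of struct cpu_map; `MAX_CPUS`, `has_cpu()` and `count_traced_cpus()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CPUS 64	/* hypothetical stand-in for cpu__max_cpu() */

static bool has_cpu(uint64_t mask, int cpu)
{
	return (mask >> cpu) & 1;
}

/* Count the CPUs whose ETM configuration would be read: the requested
 * CPUs (or every CPU when event_mask is 0, i.e. an empty map), filtered
 * by the online mask so that offlined CPUs are skipped. */
int count_traced_cpus(uint64_t event_mask, uint64_t online_mask)
{
	int cpu, nr = 0;

	for (cpu = 0; cpu < MAX_CPUS; cpu++) {
		if (event_mask && !has_cpu(event_mask, cpu))
			continue;
		if (!has_cpu(online_mask, cpu))
			continue;
		nr++;
	}
	return nr;
}
```

This follows the cs_etm_info_priv_size() behaviour of skipping offline CPUs; cs_etm_info_fill() in the patch instead returns -EINVAL when it finds a requested CPU offline.
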
diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index fbfc055d3f4d..5c655ad4621e 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -298,12 +298,17 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
 {
 	int i;
 	int etmv3 = 0, etmv4 = 0;
-	const struct cpu_map *cpus = evlist->cpus;
+	struct cpu_map *event_cpus = evlist->cpus;
+	struct cpu_map *online_cpus = cpu_map__new(NULL);
 
 	/* cpu map is not empty, we have specific CPUs to work with */
-	if (!cpu_map__empty(cpus)) {
-		for (i = 0; i < cpu_map__nr(cpus); i++) {
-			if (cs_etm_is_etmv4(itr, cpus->map[i]))
+	if (!cpu_map__empty(event_cpus)) {
+		for (i = 0; i < cpu__max_cpu(); i++) {
+			if (!cpu_map__has(event_cpus, i) ||
+			    !cpu_map__has(online_cpus, i))
+				continue;
+
+			if (cs_etm_is_etmv4(itr, i))
 				etmv4++;
 			else
 				etmv3++;
@@ -311,6 +316,9 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
 	} else {
 		/* get configuration for all CPUs in the system */
 		for (i = 0; i < cpu__max_cpu(); i++) {
+			if (!cpu_map__has(online_cpus, i))
+				continue;
+
 			if (cs_etm_is_etmv4(itr, i))
 				etmv4++;
 			else
@@ -318,6 +326,8 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
 		}
 	}
 
+	cpu_map__put(online_cpus);
+
 	return (CS_ETM_HEADER_SIZE +
 	       (etmv4 * CS_ETMV4_PRIV_SIZE) +
 	       (etmv3 * CS_ETMV3_PRIV_SIZE));
@@ -447,7 +457,9 @@ static int cs_etm_info_fill(struct auxtrace_record *itr,
 	int i;
 	u32 offset;
 	u64 nr_cpu, type;
-	const struct cpu_map *cpus = session->evlist->cpus;
+	struct cpu_map *cpu_map;
+	struct cpu_map *event_cpus = session->evlist->cpus;
+	struct cpu_map *online_cpus = cpu_map__new(NULL);
 	struct cs_etm_recording *ptr =
 			container_of(itr, struct cs_etm_recording, itr);
 	struct perf_pmu *cs_etm_pmu = ptr->cs_etm_pmu;
@@ -458,8 +470,21 @@ static int cs_etm_info_fill(struct auxtrace_record *itr,
 	if (!session->evlist->nr_mmaps)
 		return -EINVAL;
 
-	/* If the cpu_map is empty all CPUs are involved */
-	nr_cpu = cpu_map__empty(cpus) ? cpu__max_cpu() : cpu_map__nr(cpus);
+	/* If the cpu_map is empty all online CPUs are involved */
+	if (cpu_map__empty(event_cpus)) {
+		cpu_map = online_cpus;
+	} else {
+		/* Make sure all specified CPUs are online */
+		for (i = 0; i < cpu_map__nr(event_cpus); i++) {
+			if (cpu_map__has(event_cpus, i) &&
+			    !cpu_map__has(online_cpus, i))
+				return -EINVAL;
+		}
+
+		cpu_map = event_cpus;
+	}
+
+	nr_cpu = cpu_map__nr(cpu_map);
 	/* Get PMU type as dynamically assigned by the core */
 	type = cs_etm_pmu->type;
 
@@ -472,15 +497,11 @@ static int cs_etm_info_fill(struct auxtrace_record *itr,
 
 	offset = CS_ETM_SNAPSHOT + 1;
 
-	/* cpu map is not empty, we have specific CPUs to work with */
-	if (!cpu_map__empty(cpus)) {
-		for (i = 0; i < cpu_map__nr(cpus) && offset < priv_size; i++)
-			cs_etm_get_metadata(cpus->map[i], &offset, itr, info);
-	} else {
-		/* get configuration for all CPUs in the system */
-		for (i = 0; i < cpu__max_cpu(); i++)
+	for (i = 0; i < cpu__max_cpu() && offset < priv_size; i++)
+		if (cpu_map__has(cpu_map, i))
 			cs_etm_get_metadata(i, &offset, itr, info);
-	}
+
+	cpu_map__put(online_cpus);
 
 	return 0;
 }
-- 
2.14.3

* [PATCH 24/41] perf annotate: Add missing arguments in Man page
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Jaecheol Shin, Jiri Olsa,
	Namhyung Kim, Taeung Song, Arnaldo Carvalho de Melo

From: Jaecheol Shin <jcgod413@gmail.com>

Some options require an argument, but the man page entries for --input,
--stdio-color and --cpu did not show one.  Add the missing argument
placeholders.

Signed-off-by: Jaecheol Shin <jcgod413@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Link: http://lkml.kernel.org/r/20180207095205.62715-1-jcgod413@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-annotate.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

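The '--cpu=<cpu>' argument documented here accepts a comma-separated list with no spaces, such as "0,1", and ranges such as "0-2". A minimal sketch of parsing that syntax into a bitmask; `parse_cpu_list()` is hypothetical and is not perf's own parser (perf builds a struct cpu_map), and it only handles CPUs 0 to 63.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse "0,1", "0-2" or "0-2,5" into a CPU bitmask.
 * Returns 0 on success, -1 on malformed input or CPUs above 63.
 * Spaces are rejected, matching "comma-separated list with no space". */
int parse_cpu_list(const char *s, uint64_t *mask)
{
	uint64_t m = 0;

	if (!*s)
		return -1;	/* empty list */

	while (*s) {
		char *end;
		long first, last;

		if (!isdigit((unsigned char)*s))
			return -1;
		first = strtol(s, &end, 10);
		last = first;
		if (*end == '-') {
			s = end + 1;
			if (!isdigit((unsigned char)*s))
				return -1;
			last = strtol(s, &end, 10);
		}
		if (first > last || last > 63)
			return -1;
		for (long cpu = first; cpu <= last; cpu++)
			m |= UINT64_C(1) << cpu;

		if (*end == ',' && end[1] != '\0')
			s = end + 1;	/* another entry follows */
		else if (*end == '\0')
			s = end;	/* done */
		else
			return -1;	/* trailing comma or junk */
	}
	*mask = m;
	return 0;
}
```
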
diff --git a/tools/perf/Documentation/perf-annotate.txt b/tools/perf/Documentation/perf-annotate.txt
index c635eab6af54..292809c3c0ca 100644
--- a/tools/perf/Documentation/perf-annotate.txt
+++ b/tools/perf/Documentation/perf-annotate.txt
@@ -21,7 +21,7 @@ If there is no debug info in the object, then annotated assembly is displayed.
 OPTIONS
 -------
 -i::
---input=::
+--input=<file>::
         Input file name. (default: perf.data unless stdin is a fifo)
 
 -d::
@@ -69,7 +69,7 @@ OPTIONS
 
 --stdio:: Use the stdio interface.
 
---stdio-color::
+--stdio-color=<mode>::
 	'always', 'never' or 'auto', allowing configuring color output
 	via the command line, in addition to via "color.ui" .perfconfig.
 	Use '--stdio-color always' to generate color even when redirecting
@@ -84,7 +84,7 @@ OPTIONS
 --gtk:: Use the GTK interface.
 
 -C::
---cpu:: Only report samples for the list of CPUs provided. Multiple CPUs can
+--cpu=<cpu>:: Only report samples for the list of CPUs provided. Multiple CPUs can
 	be provided as a comma-separated list with no space: 0,1. Ranges of
 	CPUs are specified with -: 0-2. Default is to report samples on all
 	CPUs.
-- 
2.14.3

* [PATCH 25/41] perf kmem: Document a missing option & an argument
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Sangwon Hong, Jiri Olsa,
	Taeung Song, Arnaldo Carvalho de Melo

From: Sangwon Hong <qpakzk@gmail.com>

First, 'perf kmem' has a '--force' option that was not documented in
the man page, so add it.

Second, the '--time' option takes a value, but the man page did not
show it, so describe it.

Signed-off-by: Sangwon Hong <qpakzk@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Link: http://lkml.kernel.org/r/1518381517-30766-1-git-send-email-qpakzk@gmail.com
[ Add blank line after --force block, as requested by Namhyung ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-kmem.txt | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

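The '--time=<start>,<stop>' value uses seconds.microseconds timestamps, and an empty start (',x.y') means "from the beginning of the file". A minimal sketch of splitting and converting such a string, assuming, symmetrically, that an empty stop means "until the end"; `struct time_window`, `parse_ts()` and `parse_time_window()` are hypothetical, the sketch stores microseconds, and padding a short fraction ("1.5" as 1.500000 s) is this sketch's interpretation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

struct time_window {
	uint64_t start_us;	/* 0: from the beginning of the file */
	uint64_t stop_us;	/* 0: until the end (assumed) */
};

/* Parse "seconds[.microseconds]" from s[0..len) into microseconds.
 * An empty field yields 0. */
static int parse_ts(const char *s, size_t len, uint64_t *us)
{
	uint64_t sec = 0, frac = 0;
	size_t i = 0, ndigits = 0;

	if (len == 0) {
		*us = 0;
		return 0;
	}
	for (; i < len && isdigit((unsigned char)s[i]); i++)
		sec = sec * 10 + (uint64_t)(s[i] - '0');
	if (i == 0)
		return -1;
	if (i < len && s[i] == '.') {
		for (i++; i < len && isdigit((unsigned char)s[i]); i++) {
			if (ndigits == 6)
				return -1;	/* finer than microseconds */
			frac = frac * 10 + (uint64_t)(s[i] - '0');
			ndigits++;
		}
	}
	if (i != len)
		return -1;
	for (; ndigits < 6; ndigits++)
		frac *= 10;	/* "1.5" -> 1 s + 500000 us */
	*us = sec * 1000000 + frac;
	return 0;
}

int parse_time_window(const char *str, struct time_window *tw)
{
	const char *comma = strchr(str, ',');

	if (!comma)
		return -1;
	if (parse_ts(str, (size_t)(comma - str), &tw->start_us) ||
	    parse_ts(comma + 1, strlen(comma + 1), &tw->stop_us))
		return -1;
	if (tw->start_us && tw->stop_us && tw->start_us > tw->stop_us)
		return -1;
	return 0;
}
```
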
diff --git a/tools/perf/Documentation/perf-kmem.txt b/tools/perf/Documentation/perf-kmem.txt
index 479fc3261a50..85b8ac695c87 100644
--- a/tools/perf/Documentation/perf-kmem.txt
+++ b/tools/perf/Documentation/perf-kmem.txt
@@ -25,6 +25,10 @@ OPTIONS
 --input=<file>::
 	Select the input file (default: perf.data unless stdin is a fifo)
 
+-f::
+--force::
+	Don't do ownership validation
+
 -v::
 --verbose::
         Be more verbose. (show symbol address, etc)
@@ -61,7 +65,7 @@ OPTIONS
 	default, but this option shows live (currently allocated) pages
 	instead.  (This option works with --page option only)
 
---time::
+--time=<start>,<stop>::
 	Only analyze samples within given time window: <start>,<stop>. Times
 	have the format seconds.microseconds. If start is not given (i.e., time
 	string is ',x.y') then analysis starts at the beginning of the file. If
-- 
2.14.3

* [PATCH 26/41] perf mem: Document a missing option
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Sangwon Hong, Jiri Olsa,
	Taeung Song, Arnaldo Carvalho de Melo

From: Sangwon Hong <qpakzk@gmail.com>

Add the missing --force option to the man page.

Signed-off-by: Sangwon Hong <qpakzk@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Link: http://lkml.kernel.org/r/1518381517-30766-2-git-send-email-qpakzk@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-mem.txt | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/Documentation/perf-mem.txt b/tools/perf/Documentation/perf-mem.txt
index 4be08a1e3f8d..b0211410969b 100644
--- a/tools/perf/Documentation/perf-mem.txt
+++ b/tools/perf/Documentation/perf-mem.txt
@@ -28,6 +28,10 @@ OPTIONS
 <command>...::
 	Any command you can specify in a shell.
 
+-f::
+--force::
+	Don't do ownership validation
+
 -t::
 --type=::
 	Select the memory operation type: load or store (default: load,store)
-- 
2.14.3

* [PATCH 27/41] perf cs-etm: Inject capabilitity for CoreSight traces
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17   ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Robert Walker, coresight,
	linux-arm-kernel, Arnaldo Carvalho de Melo

From: Robert Walker <robert.walker@arm.com>

Add user space perf functionality to translate CoreSight traces into
instruction events with branch stack.

To invoke the new functionality, use the perf inject tool with
--itrace=il.  For example, to translate the ETM trace from perf.data into
last branch records in a new perf.data.new file:

    $ perf inject --itrace=i100000il128 -i perf.data -o perf.data.new

The 'i' parameter to itrace generates periodic instruction events.  The
period between instruction events can be specified as a number of
instructions suffixed by i (default 100000).

The parameter to 'l' specifies the number of entries in the branch stack
attached to instruction events.

The 'b' parameter to itrace generates events on taken branches.

This patch also fixes the contents of the branch events used in perf
report: previously, a branch event was generated for each contiguous
range of instructions executed.  Branch events are now generated between
the last address of a range ending in an executed branch instruction and
the start address of the next range.

Based on patches by Sebastian Pop <s.pop@samsung.com> with additional fixes
and support for specifying the instruction period.

Originally-by: Sebastian Pop <s.pop@samsung.com>
Signed-off-by: Robert Walker <robert.walker@arm.com>
Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1518607481-4059-2-git-send-email-robert.walker@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c |  65 +++-
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |   1 +
 tools/perf/util/cs-etm.c                        | 434 +++++++++++++++++++++---
 3 files changed, 436 insertions(+), 64 deletions(-)
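
With only A64 supported, the patch derives instruction counts and addresses from a range by dividing or multiplying by A64_INSTR_SIZE (4 bytes), treating the end address as exclusive. The sketch below shows how an 'i<period>i' option could turn a stream of such ranges into periodic instruction samples. Carrying leftover instructions from one range to the next (the role the new period_instructions field suggests) is an assumption, since the sampling code itself is not part of this excerpt; `struct range` and `sample_range()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define A64_INSTR_SIZE 4

struct range {
	uint64_t start_addr;
	uint64_t end_addr;	/* exclusive, as in struct cs_etm_packet */
};

static uint64_t instr_count(const struct range *r)
{
	return (r->end_addr - r->start_addr) / A64_INSTR_SIZE;
}

static uint64_t instr_addr(const struct range *r, uint64_t offset)
{
	return r->start_addr + offset * A64_INSTR_SIZE;
}

/* Emit one sample address every 'period' executed instructions across
 * consecutive ranges. '*carry' holds the instructions seen since the
 * last sample and stays below 'period' between calls. 'out' must have
 * room for instr_count(r) / period + 1 entries. Returns the number of
 * samples written. */
size_t sample_range(const struct range *r, uint64_t period,
		    uint64_t *carry, uint64_t *out)
{
	uint64_t n = instr_count(r);
	size_t nr = 0;

	assert(period > 0);
	*carry += n;
	while (*carry >= period) {
		/* Offset of the sampled instruction within this range. */
		uint64_t offset = n - (*carry - period) - 1;

		out[nr++] = instr_addr(r, offset);
		*carry -= period;
	}
	return nr;
}
```

For example, with a period of 4 and a carry of 0, a 10-instruction range starting at 0x1000 yields samples at 0x100c and 0x101c and leaves a carry of 2 for the next range.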

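cs_etm__update_last_branch_rb() takes a branch's 'from' address from the last executed instruction of the previous range (end address minus A64_INSTR_SIZE) and its 'to' address from the start of the current range. It stores entries newest-first in a circular buffer, which cs_etm__copy_last_branch_rb() later linearises in two memcpy() steps. A reduced sketch of that buffer follows; the names and the fixed `LBR_SZ` are hypothetical, and recording only when the previous range ended in an executed branch follows the commit message and the new last_instr_taken_branch field rather than code shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define A64_INSTR_SIZE 4
#define LBR_SZ 4	/* hypothetical synth_opts.last_branch_sz */

struct range {
	uint64_t start_addr, end_addr;	/* end_addr is exclusive */
	int last_instr_taken_branch;
};

struct branch { uint64_t from, to; };

struct lbr_ring {
	struct branch entries[LBR_SZ];
	size_t pos;	/* index of the newest entry */
	size_t nr;	/* valid entries, capped at LBR_SZ */
};

/* Record prev -> cur if prev ended in a taken branch. Entries are
 * written from the end of the array downwards, wrapping around. */
void lbr_add(struct lbr_ring *rb, const struct range *prev,
	     const struct range *cur)
{
	if (!prev->last_instr_taken_branch)
		return;
	if (rb->pos == 0)
		rb->pos = LBR_SZ;
	rb->pos--;
	rb->entries[rb->pos].from = prev->end_addr - A64_INSTR_SIZE;
	rb->entries[rb->pos].to = cur->start_addr;
	if (rb->nr < LBR_SZ)
		rb->nr++;
}

/* Copy the ring into 'dst' newest-first; returns the entry count. */
size_t lbr_copy(const struct lbr_ring *rb, struct branch *dst)
{
	size_t nr;

	if (!rb->nr)
		return 0;
	/* Step 1: from the newest entry to the end of the array. */
	nr = LBR_SZ - rb->pos;
	memcpy(dst, &rb->entries[rb->pos], nr * sizeof(*dst));
	/* Step 2: after a wrap, older entries at the start of the array. */
	if (rb->nr >= LBR_SZ)
		memcpy(&dst[nr], rb->entries, rb->pos * sizeof(*dst));
	return rb->nr;
}
```

Filling the array from the top down means the newest entry always sits at 'pos', so the copy never has to reverse anything: the two memcpy() calls already produce newest-first order.
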
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 1fb01849f1c7..8ff69dfd725a 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -78,6 +78,8 @@ int cs_etm_decoder__reset(struct cs_etm_decoder *decoder)
 {
 	ocsd_datapath_resp_t dp_ret;
 
+	decoder->prev_return = OCSD_RESP_CONT;
+
 	dp_ret = ocsd_dt_process_data(decoder->dcd_tree, OCSD_OP_RESET,
 				      0, 0, NULL, NULL);
 	if (OCSD_DATA_RESP_IS_FATAL(dp_ret))
@@ -253,16 +255,16 @@ static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder)
 	decoder->packet_count = 0;
 	for (i = 0; i < MAX_BUFFER; i++) {
 		decoder->packet_buffer[i].start_addr = 0xdeadbeefdeadbeefUL;
-		decoder->packet_buffer[i].end_addr   = 0xdeadbeefdeadbeefUL;
-		decoder->packet_buffer[i].exc	     = false;
-		decoder->packet_buffer[i].exc_ret    = false;
-		decoder->packet_buffer[i].cpu	     = INT_MIN;
+		decoder->packet_buffer[i].end_addr = 0xdeadbeefdeadbeefUL;
+		decoder->packet_buffer[i].last_instr_taken_branch = false;
+		decoder->packet_buffer[i].exc = false;
+		decoder->packet_buffer[i].exc_ret = false;
+		decoder->packet_buffer[i].cpu = INT_MIN;
 	}
 }
 
 static ocsd_datapath_resp_t
 cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
-			      const ocsd_generic_trace_elem *elem,
 			      const u8 trace_chan_id,
 			      enum cs_etm_sample_type sample_type)
 {
@@ -278,18 +280,16 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
 		return OCSD_RESP_FATAL_SYS_ERR;
 
 	et = decoder->tail;
+	et = (et + 1) & (MAX_BUFFER - 1);
+	decoder->tail = et;
+	decoder->packet_count++;
+
 	decoder->packet_buffer[et].sample_type = sample_type;
-	decoder->packet_buffer[et].start_addr = elem->st_addr;
-	decoder->packet_buffer[et].end_addr = elem->en_addr;
 	decoder->packet_buffer[et].exc = false;
 	decoder->packet_buffer[et].exc_ret = false;
 	decoder->packet_buffer[et].cpu = *((int *)inode->priv);
-
-	/* Wrap around if need be */
-	et = (et + 1) & (MAX_BUFFER - 1);
-
-	decoder->tail = et;
-	decoder->packet_count++;
+	decoder->packet_buffer[et].start_addr = 0xdeadbeefdeadbeefUL;
+	decoder->packet_buffer[et].end_addr = 0xdeadbeefdeadbeefUL;
 
 	if (decoder->packet_count == MAX_BUFFER - 1)
 		return OCSD_RESP_WAIT;
@@ -297,6 +297,40 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
 	return OCSD_RESP_CONT;
 }
 
+static ocsd_datapath_resp_t
+cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
+			     const ocsd_generic_trace_elem *elem,
+			     const uint8_t trace_chan_id)
+{
+	int ret = 0;
+	struct cs_etm_packet *packet;
+
+	ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+					    CS_ETM_RANGE);
+	if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
+		return ret;
+
+	packet = &decoder->packet_buffer[decoder->tail];
+
+	packet->start_addr = elem->st_addr;
+	packet->end_addr = elem->en_addr;
+	switch (elem->last_i_type) {
+	case OCSD_INSTR_BR:
+	case OCSD_INSTR_BR_INDIRECT:
+		packet->last_instr_taken_branch = elem->last_instr_exec;
+		break;
+	case OCSD_INSTR_ISB:
+	case OCSD_INSTR_DSB_DMB:
+	case OCSD_INSTR_OTHER:
+	default:
+		packet->last_instr_taken_branch = false;
+		break;
+	}
+
+	return ret;
+
+}
+
 static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
 				const void *context,
 				const ocsd_trc_index_t indx __maybe_unused,
@@ -316,9 +350,8 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
 		decoder->trace_on = true;
 		break;
 	case OCSD_GEN_TRC_ELEM_INSTR_RANGE:
-		resp = cs_etm_decoder__buffer_packet(decoder, elem,
-						     trace_chan_id,
-						     CS_ETM_RANGE);
+		resp = cs_etm_decoder__buffer_range(decoder, elem,
+						    trace_chan_id);
 		break;
 	case OCSD_GEN_TRC_ELEM_EXCEPTION:
 		decoder->packet_buffer[decoder->tail].exc = true;
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index 3d2e6205d186..a4fdd285b145 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -30,6 +30,7 @@ struct cs_etm_packet {
 	enum cs_etm_sample_type sample_type;
 	u64 start_addr;
 	u64 end_addr;
+	u8 last_instr_taken_branch;
 	u8 exc;
 	u8 exc_ret;
 	int cpu;
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index f2c98774e665..6e595d96c04d 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -32,6 +32,14 @@
 
 #define MAX_TIMESTAMP (~0ULL)
 
+/*
+ * A64 instructions are always 4 bytes
+ *
+ * Only A64 is supported, so can use this constant for converting between
+ * addresses and instruction counts, calculating offsets etc
+ */
+#define A64_INSTR_SIZE 4
+
 struct cs_etm_auxtrace {
 	struct auxtrace auxtrace;
 	struct auxtrace_queues queues;
@@ -45,11 +53,15 @@ struct cs_etm_auxtrace {
 	u8 snapshot_mode;
 	u8 data_queued;
 	u8 sample_branches;
+	u8 sample_instructions;
 
 	int num_cpu;
 	u32 auxtrace_type;
 	u64 branches_sample_type;
 	u64 branches_id;
+	u64 instructions_sample_type;
+	u64 instructions_sample_period;
+	u64 instructions_id;
 	u64 **metadata;
 	u64 kernel_start;
 	unsigned int pmu_type;
@@ -68,6 +80,12 @@ struct cs_etm_queue {
 	u64 time;
 	u64 timestamp;
 	u64 offset;
+	u64 period_instructions;
+	struct branch_stack *last_branch;
+	struct branch_stack *last_branch_rb;
+	size_t last_branch_pos;
+	struct cs_etm_packet *prev_packet;
+	struct cs_etm_packet *packet;
 };
 
 static int cs_etm__update_queues(struct cs_etm_auxtrace *etm);
@@ -180,6 +198,10 @@ static void cs_etm__free_queue(void *priv)
 	thread__zput(etmq->thread);
 	cs_etm_decoder__free(etmq->decoder);
 	zfree(&etmq->event_buf);
+	zfree(&etmq->last_branch);
+	zfree(&etmq->last_branch_rb);
+	zfree(&etmq->prev_packet);
+	zfree(&etmq->packet);
 	free(etmq);
 }
 
@@ -276,11 +298,35 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm,
 	struct cs_etm_decoder_params d_params;
 	struct cs_etm_trace_params  *t_params;
 	struct cs_etm_queue *etmq;
+	size_t szp = sizeof(struct cs_etm_packet);
 
 	etmq = zalloc(sizeof(*etmq));
 	if (!etmq)
 		return NULL;
 
+	etmq->packet = zalloc(szp);
+	if (!etmq->packet)
+		goto out_free;
+
+	if (etm->synth_opts.last_branch || etm->sample_branches) {
+		etmq->prev_packet = zalloc(szp);
+		if (!etmq->prev_packet)
+			goto out_free;
+	}
+
+	if (etm->synth_opts.last_branch) {
+		size_t sz = sizeof(struct branch_stack);
+
+		sz += etm->synth_opts.last_branch_sz *
+		      sizeof(struct branch_entry);
+		etmq->last_branch = zalloc(sz);
+		if (!etmq->last_branch)
+			goto out_free;
+		etmq->last_branch_rb = zalloc(sz);
+		if (!etmq->last_branch_rb)
+			goto out_free;
+	}
+
 	etmq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
 	if (!etmq->event_buf)
 		goto out_free;
@@ -335,6 +381,7 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm,
 		goto out_free_decoder;
 
 	etmq->offset = 0;
+	etmq->period_instructions = 0;
 
 	return etmq;
 
@@ -342,6 +389,10 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm,
 	cs_etm_decoder__free(etmq->decoder);
 out_free:
 	zfree(&etmq->event_buf);
+	zfree(&etmq->last_branch);
+	zfree(&etmq->last_branch_rb);
+	zfree(&etmq->prev_packet);
+	zfree(&etmq->packet);
 	free(etmq);
 
 	return NULL;
@@ -395,6 +446,129 @@ static int cs_etm__update_queues(struct cs_etm_auxtrace *etm)
 	return 0;
 }
 
+static inline void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq)
+{
+	struct branch_stack *bs_src = etmq->last_branch_rb;
+	struct branch_stack *bs_dst = etmq->last_branch;
+	size_t nr = 0;
+
+	/*
+	 * Set the number of records before early exit: ->nr is used to
+	 * determine how many branches to copy from ->entries.
+	 */
+	bs_dst->nr = bs_src->nr;
+
+	/*
+	 * Early exit when there is nothing to copy.
+	 */
+	if (!bs_src->nr)
+		return;
+
+	/*
+	 * As bs_src->entries is a circular buffer, we need to copy from it in
+	 * two steps.  First, copy the branches from the most recently inserted
+	 * branch ->last_branch_pos until the end of bs_src->entries buffer.
+	 */
+	nr = etmq->etm->synth_opts.last_branch_sz - etmq->last_branch_pos;
+	memcpy(&bs_dst->entries[0],
+	       &bs_src->entries[etmq->last_branch_pos],
+	       sizeof(struct branch_entry) * nr);
+
+	/*
+	 * If we wrapped around at least once, the branches from the beginning
+	 * of the bs_src->entries buffer and until the ->last_branch_pos element
+	 * are older valid branches: copy them over.  The total number of
+	 * branches copied over will be equal to the number of branches asked by
+	 * the user in last_branch_sz.
+	 */
+	if (bs_src->nr >= etmq->etm->synth_opts.last_branch_sz) {
+		memcpy(&bs_dst->entries[nr],
+		       &bs_src->entries[0],
+		       sizeof(struct branch_entry) * etmq->last_branch_pos);
+	}
+}
+
+static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
+{
+	etmq->last_branch_pos = 0;
+	etmq->last_branch_rb->nr = 0;
+}
+
+static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
+{
+	/*
+	 * The packet records the execution range with an exclusive end address
+	 *
+	 * A64 instructions are constant size, so the last executed
+	 * instruction is A64_INSTR_SIZE before the end address
+	 * Will need to do instruction level decode for T32 instructions as
+	 * they can be variable size (not yet supported).
+	 */
+	return packet->end_addr - A64_INSTR_SIZE;
+}
+
+static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
+{
+	/*
+	 * Only A64 instructions are currently supported, so can get
+	 * instruction count by dividing.
+	 * Will need to do instruction level decode for T32 instructions as
+	 * they can be variable size (not yet supported).
+	 */
+	return (packet->end_addr - packet->start_addr) / A64_INSTR_SIZE;
+}
+
+static inline u64 cs_etm__instr_addr(const struct cs_etm_packet *packet,
+				     u64 offset)
+{
+	/*
+	 * Only A64 instructions are currently supported, so can get
+	 * instruction address by multiplying.
+	 * Will need to do instruction level decode for T32 instructions as
+	 * they can be variable size (not yet supported).
+	 */
+	return packet->start_addr + offset * A64_INSTR_SIZE;
+}
+
+static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
+{
+	struct branch_stack *bs = etmq->last_branch_rb;
+	struct branch_entry *be;
+
+	/*
+	 * The branches are recorded in a circular buffer in reverse
+	 * chronological order: we start recording from the last element of the
+	 * buffer down.  After writing the first element of the stack, move the
+	 * insert position back to the end of the buffer.
+	 */
+	if (!etmq->last_branch_pos)
+		etmq->last_branch_pos = etmq->etm->synth_opts.last_branch_sz;
+
+	etmq->last_branch_pos -= 1;
+
+	be       = &bs->entries[etmq->last_branch_pos];
+	be->from = cs_etm__last_executed_instr(etmq->prev_packet);
+	be->to	 = etmq->packet->start_addr;
+	/* No support for mispredict */
+	be->flags.mispred = 0;
+	be->flags.predicted = 1;
+
+	/*
+	 * Increment bs->nr until reaching the number of last branches asked by
+	 * the user on the command line.
+	 */
+	if (bs->nr < etmq->etm->synth_opts.last_branch_sz)
+		bs->nr += 1;
+}
+
+static int cs_etm__inject_event(union perf_event *event,
+			       struct perf_sample *sample, u64 type)
+{
+	event->header.size = perf_event__sample_event_size(sample, type, 0);
+	return perf_event__synthesize_sample(event, type, 0, sample);
+}
+
+
 static int
 cs_etm__get_trace(struct cs_etm_buffer *buff, struct cs_etm_queue *etmq)
 {
@@ -459,35 +633,105 @@ static void  cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm,
 	}
 }
 
+static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
+					    u64 addr, u64 period)
+{
+	int ret = 0;
+	struct cs_etm_auxtrace *etm = etmq->etm;
+	union perf_event *event = etmq->event_buf;
+	struct perf_sample sample = {.ip = 0,};
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = PERF_RECORD_MISC_USER;
+	event->sample.header.size = sizeof(struct perf_event_header);
+
+	sample.ip = addr;
+	sample.pid = etmq->pid;
+	sample.tid = etmq->tid;
+	sample.id = etmq->etm->instructions_id;
+	sample.stream_id = etmq->etm->instructions_id;
+	sample.period = period;
+	sample.cpu = etmq->packet->cpu;
+	sample.flags = 0;
+	sample.insn_len = 1;
+	sample.cpumode = event->header.misc;
+
+	if (etm->synth_opts.last_branch) {
+		cs_etm__copy_last_branch_rb(etmq);
+		sample.branch_stack = etmq->last_branch;
+	}
+
+	if (etm->synth_opts.inject) {
+		ret = cs_etm__inject_event(event, &sample,
+					   etm->instructions_sample_type);
+		if (ret)
+			return ret;
+	}
+
+	ret = perf_session__deliver_synth_event(etm->session, event, &sample);
+
+	if (ret)
+		pr_err(
+			"CS ETM Trace: failed to deliver instruction event, error %d\n",
+			ret);
+
+	if (etm->synth_opts.last_branch)
+		cs_etm__reset_last_branch_rb(etmq);
+
+	return ret;
+}
+
 /*
  * The cs etm packet encodes an instruction range between a branch target
  * and the next taken branch. Generate sample accordingly.
  */
-static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
-				       struct cs_etm_packet *packet)
+static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq)
 {
 	int ret = 0;
 	struct cs_etm_auxtrace *etm = etmq->etm;
 	struct perf_sample sample = {.ip = 0,};
 	union perf_event *event = etmq->event_buf;
-	u64 start_addr = packet->start_addr;
-	u64 end_addr = packet->end_addr;
+	struct dummy_branch_stack {
+		u64			nr;
+		struct branch_entry	entries;
+	} dummy_bs;
 
 	event->sample.header.type = PERF_RECORD_SAMPLE;
 	event->sample.header.misc = PERF_RECORD_MISC_USER;
 	event->sample.header.size = sizeof(struct perf_event_header);
 
-	sample.ip = start_addr;
+	sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
 	sample.pid = etmq->pid;
 	sample.tid = etmq->tid;
-	sample.addr = end_addr;
+	sample.addr = etmq->packet->start_addr;
 	sample.id = etmq->etm->branches_id;
 	sample.stream_id = etmq->etm->branches_id;
 	sample.period = 1;
-	sample.cpu = packet->cpu;
+	sample.cpu = etmq->packet->cpu;
 	sample.flags = 0;
 	sample.cpumode = PERF_RECORD_MISC_USER;
 
+	/*
+	 * perf report cannot handle events without a branch stack
+	 */
+	if (etm->synth_opts.last_branch) {
+		dummy_bs = (struct dummy_branch_stack){
+			.nr = 1,
+			.entries = {
+				.from = sample.ip,
+				.to = sample.addr,
+			},
+		};
+		sample.branch_stack = (struct branch_stack *)&dummy_bs;
+	}
+
+	if (etm->synth_opts.inject) {
+		ret = cs_etm__inject_event(event, &sample,
+					   etm->branches_sample_type);
+		if (ret)
+			return ret;
+	}
+
 	ret = perf_session__deliver_synth_event(etm->session, event, &sample);
 
 	if (ret)
@@ -584,6 +828,24 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm,
 		etm->sample_branches = true;
 		etm->branches_sample_type = attr.sample_type;
 		etm->branches_id = id;
+		id += 1;
+		attr.sample_type &= ~(u64)PERF_SAMPLE_ADDR;
+	}
+
+	if (etm->synth_opts.last_branch)
+		attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
+
+	if (etm->synth_opts.instructions) {
+		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
+		attr.sample_period = etm->synth_opts.period;
+		etm->instructions_sample_period = attr.sample_period;
+		err = cs_etm__synth_event(session, &attr, id);
+		if (err)
+			return err;
+		etm->sample_instructions = true;
+		etm->instructions_sample_type = attr.sample_type;
+		etm->instructions_id = id;
+		id += 1;
 	}
 
 	return 0;
@@ -591,20 +853,68 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm,
 
 static int cs_etm__sample(struct cs_etm_queue *etmq)
 {
+	struct cs_etm_auxtrace *etm = etmq->etm;
+	struct cs_etm_packet *tmp;
 	int ret;
-	struct cs_etm_packet packet;
+	u64 instrs_executed;
 
-	while (1) {
-		ret = cs_etm_decoder__get_packet(etmq->decoder, &packet);
-		if (ret <= 0)
+	instrs_executed = cs_etm__instr_count(etmq->packet);
+	etmq->period_instructions += instrs_executed;
+
+	/*
+	 * Record a branch when the last instruction in
+	 * PREV_PACKET is a branch.
+	 */
+	if (etm->synth_opts.last_branch &&
+	    etmq->prev_packet &&
+	    etmq->prev_packet->last_instr_taken_branch)
+		cs_etm__update_last_branch_rb(etmq);
+
+	if (etm->sample_instructions &&
+	    etmq->period_instructions >= etm->instructions_sample_period) {
+		/*
+		 * Emit instruction sample periodically
+		 * TODO: allow period to be defined in cycles and clock time
+		 */
+
+		/* Get number of instructions executed after the sample point */
+		u64 instrs_over = etmq->period_instructions -
+			etm->instructions_sample_period;
+
+		/*
+		 * Calculate the address of the sampled instruction (-1 as
+		 * sample is reported as though instruction has just been
+		 * executed, but PC has not advanced to next instruction)
+		 */
+		u64 offset = (instrs_executed - instrs_over - 1);
+		u64 addr = cs_etm__instr_addr(etmq->packet, offset);
+
+		ret = cs_etm__synth_instruction_sample(
+			etmq, addr, etm->instructions_sample_period);
+		if (ret)
+			return ret;
+
+		/* Carry remaining instructions into next sample period */
+		etmq->period_instructions = instrs_over;
+	}
+
+	if (etm->sample_branches &&
+	    etmq->prev_packet &&
+	    etmq->prev_packet->sample_type == CS_ETM_RANGE &&
+	    etmq->prev_packet->last_instr_taken_branch) {
+		ret = cs_etm__synth_branch_sample(etmq);
+		if (ret)
 			return ret;
+	}
 
+	if (etm->sample_branches || etm->synth_opts.last_branch) {
 		/*
-		 * If the packet contains an instruction range, generate an
-		 * instruction sequence event.
+		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
+		 * the next incoming packet.
 		 */
-		if (packet.sample_type & CS_ETM_RANGE)
-			cs_etm__synth_branch_sample(etmq, &packet);
+		tmp = etmq->packet;
+		etmq->packet = etmq->prev_packet;
+		etmq->prev_packet = tmp;
 	}
 
 	return 0;
@@ -621,45 +931,73 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
 		etm->kernel_start = machine__kernel_start(etm->machine);
 
 	/* Go through each buffer in the queue and decode them one by one */
-more:
-	buffer_used = 0;
-	memset(&buffer, 0, sizeof(buffer));
-	err = cs_etm__get_trace(&buffer, etmq);
-	if (err <= 0)
-		return err;
-	/*
-	 * We cannot assume consecutive blocks in the data file are contiguous,
-	 * reset the decoder to force re-sync.
-	 */
-	err = cs_etm_decoder__reset(etmq->decoder);
-	if (err != 0)
-		return err;
-
-	/* Run trace decoder until buffer consumed or end of trace */
-	do {
-		processed = 0;
-
-		err = cs_etm_decoder__process_data_block(
-						etmq->decoder,
-						etmq->offset,
-						&buffer.buf[buffer_used],
-						buffer.len - buffer_used,
-						&processed);
-
-		if (err)
+	while (1) {
+		buffer_used = 0;
+		memset(&buffer, 0, sizeof(buffer));
+		err = cs_etm__get_trace(&buffer, etmq);
+		if (err <= 0)
+			return err;
+		/*
+		 * We cannot assume consecutive blocks in the data file are
+		 * contiguous, reset the decoder to force re-sync.
+		 */
+		err = cs_etm_decoder__reset(etmq->decoder);
+		if (err != 0)
 			return err;
 
-		etmq->offset += processed;
-		buffer_used += processed;
+		/* Run trace decoder until buffer consumed or end of trace */
+		do {
+			processed = 0;
+			err = cs_etm_decoder__process_data_block(
+				etmq->decoder,
+				etmq->offset,
+				&buffer.buf[buffer_used],
+				buffer.len - buffer_used,
+				&processed);
+			if (err)
+				return err;
+
+			etmq->offset += processed;
+			buffer_used += processed;
+
+			/* Process each packet in this chunk */
+			while (1) {
+				err = cs_etm_decoder__get_packet(etmq->decoder,
+								 etmq->packet);
+				if (err <= 0)
+					/*
+					 * Stop processing this chunk on
+					 * end of data or error
+					 */
+					break;
+
+				/*
+				 * If the packet contains an instruction
+				 * range, generate instruction sequence
+				 * events.
+				 */
+				if (etmq->packet->sample_type & CS_ETM_RANGE)
+					err = cs_etm__sample(etmq);
+			}
+		} while (buffer.len > buffer_used);
 
 		/*
-		 * Nothing to do with an error condition, let's hope the next
-		 * chunk will be better.
+		 * Generate a last branch event for the branches left in
+		 * the circular buffer at the end of the trace.
 		 */
-		err = cs_etm__sample(etmq);
-	} while (buffer.len > buffer_used);
+		if (etm->sample_instructions &&
+		    etmq->etm->synth_opts.last_branch) {
+			struct branch_stack *bs = etmq->last_branch_rb;
+			struct branch_entry *be =
+				&bs->entries[etmq->last_branch_pos];
+
+			err = cs_etm__synth_instruction_sample(
+				etmq, be->to, etmq->period_instructions);
+			if (err)
+				return err;
+		}
 
-goto more;
+	}
 
 	return err;
 }
-- 
2.14.3


* [PATCH 27/41] perf cs-etm: Inject capability for CoreSight traces
@ 2018-02-16 19:17   ` Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, coresight, linux-kernel,
	linux-perf-users, Robert Walker, linux-arm-kernel

From: Robert Walker <robert.walker@arm.com>

Add user space perf functionality to translate CoreSight traces into
instruction events with branch stacks.

To invoke the new functionality, use the perf inject tool with
--itrace=il. For example, to translate the ETM trace from perf.data into
last branch records in a new perf.data.new file:

    $ perf inject --itrace=i100000il128 -i perf.data -o perf.data.new

The 'i' parameter to itrace generates periodic instruction events.  The
period between instruction events can be specified as a number of
instructions suffixed by i (default 100000).
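As a rough illustration of how such a period maps onto the decoded
address ranges, the following standalone sketch (not the perf code)
counts A64 instructions, which are always 4 bytes, across ranges with an
exclusive end address, and reports the address of each instruction that
completes a period. The names struct sampler and sample_range() are
hypothetical. This variant emits one sample per full period, so a single
long range can yield several samples, and carries the remainder into the
next range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A64 instructions are always 4 bytes (A64-only assumption) */
#define A64_INSTR_SIZE 4

/* Hypothetical sampler state, not a perf structure */
struct sampler {
	uint64_t period;	/* instructions per sample, e.g. 100000 */
	uint64_t count;		/* instructions since last sample, < period */
};

/*
 * Feed one executed range [start_addr, end_addr) to the sampler.  Stores
 * the address of every instruction that completes a period in out[] (at
 * most max_out of them) and returns how many periods were completed.
 */
static size_t sample_range(struct sampler *s, uint64_t start_addr,
			   uint64_t end_addr, uint64_t *out, size_t max_out)
{
	uint64_t n = (end_addr - start_addr) / A64_INSTR_SIZE;
	uint64_t i;
	size_t nr = 0;

	assert(s->period > 0 && s->count < s->period);

	/*
	 * Index (within this range) of the instruction that completes the
	 * current period, then one every period instructions after that.
	 */
	for (i = s->period - s->count - 1; i < n; i += s->period) {
		if (nr < max_out)
			out[nr] = start_addr + i * A64_INSTR_SIZE;
		nr++;
	}

	/* Carry the remaining instructions into the next period */
	s->count = (s->count + n) % s->period;
	return nr;
}
```

For example, with a period of 3 and a first range of 10 instructions,
samples land on the 3rd, 6th and 9th instructions and one instruction is
carried into the next range.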

The parameter to 'l' specifies the number of entries in the branch stack
attached to instruction events.
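The branch stack is kept in a ring buffer sized by the 'l' argument: the
cs-etm.c changes record branches newest-first, filling the array from its
end downwards, and flatten it with two memcpy() calls when an instruction
sample is emitted. A simplified standalone sketch of that scheme follows;
struct lbr_ring, lbr_init(), lbr_push(), lbr_copy() and the fixed LBR_MAX
bound are hypothetical (the patch allocates last_branch_sz entries
instead).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LBR_MAX 64	/* hypothetical upper bound for this sketch */

struct lbr_entry {
	uint64_t from;
	uint64_t to;
};

struct lbr_ring {
	size_t sz;			/* entries requested, like 'l' */
	size_t nr;			/* valid entries, saturates at sz */
	size_t pos;			/* index of the newest entry */
	struct lbr_entry e[LBR_MAX];
};

static void lbr_init(struct lbr_ring *rb, size_t sz)
{
	assert(sz > 0 && sz <= LBR_MAX);
	memset(rb, 0, sizeof(*rb));
	rb->sz = sz;
}

/* Insert newest-first: fill from the end of the array down, wrapping */
static void lbr_push(struct lbr_ring *rb, uint64_t from, uint64_t to)
{
	if (rb->pos == 0)
		rb->pos = rb->sz;
	rb->pos--;
	rb->e[rb->pos].from = from;
	rb->e[rb->pos].to = to;
	if (rb->nr < rb->sz)
		rb->nr++;
}

/* Copy the ring into out[], newest first; returns the entry count */
static size_t lbr_copy(const struct lbr_ring *rb, struct lbr_entry *out)
{
	/* Step 1: newest entry up to the end of the array */
	size_t first = rb->sz - rb->pos;

	if (rb->nr == 0)
		return 0;
	memcpy(out, &rb->e[rb->pos], first * sizeof(*out));

	/* Step 2: after a wrap, older entries sit at the array start */
	if (rb->nr >= rb->sz)
		memcpy(out + first, rb->e, rb->pos * sizeof(*out));
	return rb->nr;
}
```

Once more than sz branches have been pushed, the oldest entries are
overwritten and lbr_copy() still returns the sz most recent ones, newest
first.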

The 'b' parameter to itrace generates events on taken branches.

This patch also fixes the contents of the branch events used by perf
report.  Previously, a branch event was generated for each contiguous
range of executed instructions.  Now branch events are generated from the
last instruction of a range that ends in a taken branch to the start
address of the next range.

Based on patches by Sebastian Pop <s.pop@samsung.com> with additional fixes
and support for specifying the instruction period.

Originally-by: Sebastian Pop <s.pop@samsung.com>
Signed-off-by: Robert Walker <robert.walker@arm.com>
Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1518607481-4059-2-git-send-email-robert.walker@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c |  65 +++-
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |   1 +
 tools/perf/util/cs-etm.c                        | 434 +++++++++++++++++++++---
 3 files changed, 436 insertions(+), 64 deletions(-)

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 1fb01849f1c7..8ff69dfd725a 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -78,6 +78,8 @@ int cs_etm_decoder__reset(struct cs_etm_decoder *decoder)
 {
 	ocsd_datapath_resp_t dp_ret;
 
+	decoder->prev_return = OCSD_RESP_CONT;
+
 	dp_ret = ocsd_dt_process_data(decoder->dcd_tree, OCSD_OP_RESET,
 				      0, 0, NULL, NULL);
 	if (OCSD_DATA_RESP_IS_FATAL(dp_ret))
@@ -253,16 +255,16 @@ static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder)
 	decoder->packet_count = 0;
 	for (i = 0; i < MAX_BUFFER; i++) {
 		decoder->packet_buffer[i].start_addr = 0xdeadbeefdeadbeefUL;
-		decoder->packet_buffer[i].end_addr   = 0xdeadbeefdeadbeefUL;
-		decoder->packet_buffer[i].exc	     = false;
-		decoder->packet_buffer[i].exc_ret    = false;
-		decoder->packet_buffer[i].cpu	     = INT_MIN;
+		decoder->packet_buffer[i].end_addr = 0xdeadbeefdeadbeefUL;
+		decoder->packet_buffer[i].last_instr_taken_branch = false;
+		decoder->packet_buffer[i].exc = false;
+		decoder->packet_buffer[i].exc_ret = false;
+		decoder->packet_buffer[i].cpu = INT_MIN;
 	}
 }
 
 static ocsd_datapath_resp_t
 cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
-			      const ocsd_generic_trace_elem *elem,
 			      const u8 trace_chan_id,
 			      enum cs_etm_sample_type sample_type)
 {
@@ -278,18 +280,16 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
 		return OCSD_RESP_FATAL_SYS_ERR;
 
 	et = decoder->tail;
+	et = (et + 1) & (MAX_BUFFER - 1);
+	decoder->tail = et;
+	decoder->packet_count++;
+
 	decoder->packet_buffer[et].sample_type = sample_type;
-	decoder->packet_buffer[et].start_addr = elem->st_addr;
-	decoder->packet_buffer[et].end_addr = elem->en_addr;
 	decoder->packet_buffer[et].exc = false;
 	decoder->packet_buffer[et].exc_ret = false;
 	decoder->packet_buffer[et].cpu = *((int *)inode->priv);
-
-	/* Wrap around if need be */
-	et = (et + 1) & (MAX_BUFFER - 1);
-
-	decoder->tail = et;
-	decoder->packet_count++;
+	decoder->packet_buffer[et].start_addr = 0xdeadbeefdeadbeefUL;
+	decoder->packet_buffer[et].end_addr = 0xdeadbeefdeadbeefUL;
 
 	if (decoder->packet_count == MAX_BUFFER - 1)
 		return OCSD_RESP_WAIT;
@@ -297,6 +297,40 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
 	return OCSD_RESP_CONT;
 }
 
+static ocsd_datapath_resp_t
+cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
+			     const ocsd_generic_trace_elem *elem,
+			     const uint8_t trace_chan_id)
+{
+	int ret = 0;
+	struct cs_etm_packet *packet;
+
+	ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+					    CS_ETM_RANGE);
+	if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
+		return ret;
+
+	packet = &decoder->packet_buffer[decoder->tail];
+
+	packet->start_addr = elem->st_addr;
+	packet->end_addr = elem->en_addr;
+	switch (elem->last_i_type) {
+	case OCSD_INSTR_BR:
+	case OCSD_INSTR_BR_INDIRECT:
+		packet->last_instr_taken_branch = elem->last_instr_exec;
+		break;
+	case OCSD_INSTR_ISB:
+	case OCSD_INSTR_DSB_DMB:
+	case OCSD_INSTR_OTHER:
+	default:
+		packet->last_instr_taken_branch = false;
+		break;
+	}
+
+	return ret;
+
+}
+
 static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
 				const void *context,
 				const ocsd_trc_index_t indx __maybe_unused,
@@ -316,9 +350,8 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
 		decoder->trace_on = true;
 		break;
 	case OCSD_GEN_TRC_ELEM_INSTR_RANGE:
-		resp = cs_etm_decoder__buffer_packet(decoder, elem,
-						     trace_chan_id,
-						     CS_ETM_RANGE);
+		resp = cs_etm_decoder__buffer_range(decoder, elem,
+						    trace_chan_id);
 		break;
 	case OCSD_GEN_TRC_ELEM_EXCEPTION:
 		decoder->packet_buffer[decoder->tail].exc = true;
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index 3d2e6205d186..a4fdd285b145 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -30,6 +30,7 @@ struct cs_etm_packet {
 	enum cs_etm_sample_type sample_type;
 	u64 start_addr;
 	u64 end_addr;
+	u8 last_instr_taken_branch;
 	u8 exc;
 	u8 exc_ret;
 	int cpu;
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index f2c98774e665..6e595d96c04d 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -32,6 +32,14 @@
 
 #define MAX_TIMESTAMP (~0ULL)
 
+/*
+ * A64 instructions are always 4 bytes
+ *
+ * Only A64 is supported, so can use this constant for converting between
+ * addresses and instruction counts, calculating offsets etc.
+ */
+#define A64_INSTR_SIZE 4
+
 struct cs_etm_auxtrace {
 	struct auxtrace auxtrace;
 	struct auxtrace_queues queues;
@@ -45,11 +53,15 @@ struct cs_etm_auxtrace {
 	u8 snapshot_mode;
 	u8 data_queued;
 	u8 sample_branches;
+	u8 sample_instructions;
 
 	int num_cpu;
 	u32 auxtrace_type;
 	u64 branches_sample_type;
 	u64 branches_id;
+	u64 instructions_sample_type;
+	u64 instructions_sample_period;
+	u64 instructions_id;
 	u64 **metadata;
 	u64 kernel_start;
 	unsigned int pmu_type;
@@ -68,6 +80,12 @@ struct cs_etm_queue {
 	u64 time;
 	u64 timestamp;
 	u64 offset;
+	u64 period_instructions;
+	struct branch_stack *last_branch;
+	struct branch_stack *last_branch_rb;
+	size_t last_branch_pos;
+	struct cs_etm_packet *prev_packet;
+	struct cs_etm_packet *packet;
 };
 
 static int cs_etm__update_queues(struct cs_etm_auxtrace *etm);
@@ -180,6 +198,10 @@ static void cs_etm__free_queue(void *priv)
 	thread__zput(etmq->thread);
 	cs_etm_decoder__free(etmq->decoder);
 	zfree(&etmq->event_buf);
+	zfree(&etmq->last_branch);
+	zfree(&etmq->last_branch_rb);
+	zfree(&etmq->prev_packet);
+	zfree(&etmq->packet);
 	free(etmq);
 }
 
@@ -276,11 +298,35 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm,
 	struct cs_etm_decoder_params d_params;
 	struct cs_etm_trace_params  *t_params;
 	struct cs_etm_queue *etmq;
+	size_t szp = sizeof(struct cs_etm_packet);
 
 	etmq = zalloc(sizeof(*etmq));
 	if (!etmq)
 		return NULL;
 
+	etmq->packet = zalloc(szp);
+	if (!etmq->packet)
+		goto out_free;
+
+	if (etm->synth_opts.last_branch || etm->sample_branches) {
+		etmq->prev_packet = zalloc(szp);
+		if (!etmq->prev_packet)
+			goto out_free;
+	}
+
+	if (etm->synth_opts.last_branch) {
+		size_t sz = sizeof(struct branch_stack);
+
+		sz += etm->synth_opts.last_branch_sz *
+		      sizeof(struct branch_entry);
+		etmq->last_branch = zalloc(sz);
+		if (!etmq->last_branch)
+			goto out_free;
+		etmq->last_branch_rb = zalloc(sz);
+		if (!etmq->last_branch_rb)
+			goto out_free;
+	}
+
 	etmq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
 	if (!etmq->event_buf)
 		goto out_free;
@@ -335,6 +381,7 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm,
 		goto out_free_decoder;
 
 	etmq->offset = 0;
+	etmq->period_instructions = 0;
 
 	return etmq;
 
@@ -342,6 +389,10 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm,
 	cs_etm_decoder__free(etmq->decoder);
 out_free:
 	zfree(&etmq->event_buf);
+	zfree(&etmq->last_branch);
+	zfree(&etmq->last_branch_rb);
+	zfree(&etmq->prev_packet);
+	zfree(&etmq->packet);
 	free(etmq);
 
 	return NULL;
@@ -395,6 +446,129 @@ static int cs_etm__update_queues(struct cs_etm_auxtrace *etm)
 	return 0;
 }
 
+static inline void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq)
+{
+	struct branch_stack *bs_src = etmq->last_branch_rb;
+	struct branch_stack *bs_dst = etmq->last_branch;
+	size_t nr = 0;
+
+	/*
+	 * Set the number of records before early exit: ->nr is used to
+	 * determine how many branches to copy from ->entries.
+	 */
+	bs_dst->nr = bs_src->nr;
+
+	/*
+	 * Early exit when there is nothing to copy.
+	 */
+	if (!bs_src->nr)
+		return;
+
+	/*
+	 * As bs_src->entries is a circular buffer, we need to copy from it in
+	 * two steps.  First, copy the branches from the most recently inserted
+	 * branch ->last_branch_pos until the end of bs_src->entries buffer.
+	 */
+	nr = etmq->etm->synth_opts.last_branch_sz - etmq->last_branch_pos;
+	memcpy(&bs_dst->entries[0],
+	       &bs_src->entries[etmq->last_branch_pos],
+	       sizeof(struct branch_entry) * nr);
+
+	/*
+	 * If we wrapped around at least once, the branches from the beginning
+	 * of the bs_src->entries buffer and until the ->last_branch_pos element
+	 * are older valid branches: copy them over.  The total number of
+	 * branches copied over will be equal to the number of branches asked by
+	 * the user in last_branch_sz.
+	 */
+	if (bs_src->nr >= etmq->etm->synth_opts.last_branch_sz) {
+		memcpy(&bs_dst->entries[nr],
+		       &bs_src->entries[0],
+		       sizeof(struct branch_entry) * etmq->last_branch_pos);
+	}
+}
+
+static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
+{
+	etmq->last_branch_pos = 0;
+	etmq->last_branch_rb->nr = 0;
+}
+
+static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
+{
+	/*
+	 * The packet records the execution range with an exclusive end address
+	 *
+	 * A64 instructions are constant size, so the last executed
+	 * instruction is A64_INSTR_SIZE before the end address
+	 * Will need to do instruction level decode for T32 instructions as
+	 * they can be variable size (not yet supported).
+	 */
+	return packet->end_addr - A64_INSTR_SIZE;
+}
+
+static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
+{
+	/*
+	 * Only A64 instructions are currently supported, so can get
+	 * instruction count by dividing.
+	 * Will need to do instruction level decode for T32 instructions as
+	 * they can be variable size (not yet supported).
+	 */
+	return (packet->end_addr - packet->start_addr) / A64_INSTR_SIZE;
+}
+
+static inline u64 cs_etm__instr_addr(const struct cs_etm_packet *packet,
+				     u64 offset)
+{
+	/*
+	 * Only A64 instructions are currently supported, so can get
+	 * instruction address by multiplying.
+	 * Will need to do instruction level decode for T32 instructions as
+	 * they can be variable size (not yet supported).
+	 */
+	return packet->start_addr + offset * A64_INSTR_SIZE;
+}
+
+static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
+{
+	struct branch_stack *bs = etmq->last_branch_rb;
+	struct branch_entry *be;
+
+	/*
+	 * The branches are recorded in a circular buffer in reverse
+	 * chronological order: we start recording from the last element of the
+	 * buffer down.  After writing the first element of the stack, move the
+	 * insert position back to the end of the buffer.
+	 */
+	if (!etmq->last_branch_pos)
+		etmq->last_branch_pos = etmq->etm->synth_opts.last_branch_sz;
+
+	etmq->last_branch_pos -= 1;
+
+	be       = &bs->entries[etmq->last_branch_pos];
+	be->from = cs_etm__last_executed_instr(etmq->prev_packet);
+	be->to	 = etmq->packet->start_addr;
+	/* No support for mispredict */
+	be->flags.mispred = 0;
+	be->flags.predicted = 1;
+
+	/*
+	 * Increment bs->nr until reaching the number of last branches asked by
+	 * the user on the command line.
+	 */
+	if (bs->nr < etmq->etm->synth_opts.last_branch_sz)
+		bs->nr += 1;
+}
+
+static int cs_etm__inject_event(union perf_event *event,
+			       struct perf_sample *sample, u64 type)
+{
+	event->header.size = perf_event__sample_event_size(sample, type, 0);
+	return perf_event__synthesize_sample(event, type, 0, sample);
+}
+
+
 static int
 cs_etm__get_trace(struct cs_etm_buffer *buff, struct cs_etm_queue *etmq)
 {
@@ -459,35 +633,105 @@ static void  cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm,
 	}
 }
 
+static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
+					    u64 addr, u64 period)
+{
+	int ret = 0;
+	struct cs_etm_auxtrace *etm = etmq->etm;
+	union perf_event *event = etmq->event_buf;
+	struct perf_sample sample = {.ip = 0,};
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = PERF_RECORD_MISC_USER;
+	event->sample.header.size = sizeof(struct perf_event_header);
+
+	sample.ip = addr;
+	sample.pid = etmq->pid;
+	sample.tid = etmq->tid;
+	sample.id = etmq->etm->instructions_id;
+	sample.stream_id = etmq->etm->instructions_id;
+	sample.period = period;
+	sample.cpu = etmq->packet->cpu;
+	sample.flags = 0;
+	sample.insn_len = 1;
+	sample.cpumode = event->header.misc;
+
+	if (etm->synth_opts.last_branch) {
+		cs_etm__copy_last_branch_rb(etmq);
+		sample.branch_stack = etmq->last_branch;
+	}
+
+	if (etm->synth_opts.inject) {
+		ret = cs_etm__inject_event(event, &sample,
+					   etm->instructions_sample_type);
+		if (ret)
+			return ret;
+	}
+
+	ret = perf_session__deliver_synth_event(etm->session, event, &sample);
+
+	if (ret)
+		pr_err(
+			"CS ETM Trace: failed to deliver instruction event, error %d\n",
+			ret);
+
+	if (etm->synth_opts.last_branch)
+		cs_etm__reset_last_branch_rb(etmq);
+
+	return ret;
+}
+
 /*
  * The cs etm packet encodes an instruction range between a branch target
  * and the next taken branch. Generate sample accordingly.
  */
-static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
-				       struct cs_etm_packet *packet)
+static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq)
 {
 	int ret = 0;
 	struct cs_etm_auxtrace *etm = etmq->etm;
 	struct perf_sample sample = {.ip = 0,};
 	union perf_event *event = etmq->event_buf;
-	u64 start_addr = packet->start_addr;
-	u64 end_addr = packet->end_addr;
+	struct dummy_branch_stack {
+		u64			nr;
+		struct branch_entry	entries;
+	} dummy_bs;
 
 	event->sample.header.type = PERF_RECORD_SAMPLE;
 	event->sample.header.misc = PERF_RECORD_MISC_USER;
 	event->sample.header.size = sizeof(struct perf_event_header);
 
-	sample.ip = start_addr;
+	sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
 	sample.pid = etmq->pid;
 	sample.tid = etmq->tid;
-	sample.addr = end_addr;
+	sample.addr = etmq->packet->start_addr;
 	sample.id = etmq->etm->branches_id;
 	sample.stream_id = etmq->etm->branches_id;
 	sample.period = 1;
-	sample.cpu = packet->cpu;
+	sample.cpu = etmq->packet->cpu;
 	sample.flags = 0;
 	sample.cpumode = PERF_RECORD_MISC_USER;
 
+	/*
+	 * perf report cannot handle events without a branch stack
+	 */
+	if (etm->synth_opts.last_branch) {
+		dummy_bs = (struct dummy_branch_stack){
+			.nr = 1,
+			.entries = {
+				.from = sample.ip,
+				.to = sample.addr,
+			},
+		};
+		sample.branch_stack = (struct branch_stack *)&dummy_bs;
+	}
+
+	if (etm->synth_opts.inject) {
+		ret = cs_etm__inject_event(event, &sample,
+					   etm->branches_sample_type);
+		if (ret)
+			return ret;
+	}
+
 	ret = perf_session__deliver_synth_event(etm->session, event, &sample);
 
 	if (ret)
@@ -584,6 +828,24 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm,
 		etm->sample_branches = true;
 		etm->branches_sample_type = attr.sample_type;
 		etm->branches_id = id;
+		id += 1;
+		attr.sample_type &= ~(u64)PERF_SAMPLE_ADDR;
+	}
+
+	if (etm->synth_opts.last_branch)
+		attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
+
+	if (etm->synth_opts.instructions) {
+		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
+		attr.sample_period = etm->synth_opts.period;
+		etm->instructions_sample_period = attr.sample_period;
+		err = cs_etm__synth_event(session, &attr, id);
+		if (err)
+			return err;
+		etm->sample_instructions = true;
+		etm->instructions_sample_type = attr.sample_type;
+		etm->instructions_id = id;
+		id += 1;
 	}
 
 	return 0;
@@ -591,20 +853,68 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm,
 
 static int cs_etm__sample(struct cs_etm_queue *etmq)
 {
+	struct cs_etm_auxtrace *etm = etmq->etm;
+	struct cs_etm_packet *tmp;
 	int ret;
-	struct cs_etm_packet packet;
+	u64 instrs_executed;
 
-	while (1) {
-		ret = cs_etm_decoder__get_packet(etmq->decoder, &packet);
-		if (ret <= 0)
+	instrs_executed = cs_etm__instr_count(etmq->packet);
+	etmq->period_instructions += instrs_executed;
+
+	/*
+	 * Record a branch when the last instruction in
+	 * PREV_PACKET is a branch.
+	 */
+	if (etm->synth_opts.last_branch &&
+	    etmq->prev_packet &&
+	    etmq->prev_packet->last_instr_taken_branch)
+		cs_etm__update_last_branch_rb(etmq);
+
+	if (etm->sample_instructions &&
+	    etmq->period_instructions >= etm->instructions_sample_period) {
+		/*
+		 * Emit instruction sample periodically
+		 * TODO: allow period to be defined in cycles and clock time
+		 */
+
+		/* Get number of instructions executed after the sample point */
+		u64 instrs_over = etmq->period_instructions -
+			etm->instructions_sample_period;
+
+		/*
+		 * Calculate the address of the sampled instruction (-1 as
+		 * sample is reported as though instruction has just been
+		 * executed, but PC has not advanced to next instruction)
+		 */
+		u64 offset = (instrs_executed - instrs_over - 1);
+		u64 addr = cs_etm__instr_addr(etmq->packet, offset);
+
+		ret = cs_etm__synth_instruction_sample(
+			etmq, addr, etm->instructions_sample_period);
+		if (ret)
+			return ret;
+
+		/* Carry remaining instructions into next sample period */
+		etmq->period_instructions = instrs_over;
+	}
+
+	if (etm->sample_branches &&
+	    etmq->prev_packet &&
+	    etmq->prev_packet->sample_type == CS_ETM_RANGE &&
+	    etmq->prev_packet->last_instr_taken_branch) {
+		ret = cs_etm__synth_branch_sample(etmq);
+		if (ret)
 			return ret;
+	}
 
+	if (etm->sample_branches || etm->synth_opts.last_branch) {
 		/*
-		 * If the packet contains an instruction range, generate an
-		 * instruction sequence event.
+		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
+		 * the next incoming packet.
 		 */
-		if (packet.sample_type & CS_ETM_RANGE)
-			cs_etm__synth_branch_sample(etmq, &packet);
+		tmp = etmq->packet;
+		etmq->packet = etmq->prev_packet;
+		etmq->prev_packet = tmp;
 	}
 
 	return 0;
@@ -621,45 +931,73 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
 		etm->kernel_start = machine__kernel_start(etm->machine);
 
 	/* Go through each buffer in the queue and decode them one by one */
-more:
-	buffer_used = 0;
-	memset(&buffer, 0, sizeof(buffer));
-	err = cs_etm__get_trace(&buffer, etmq);
-	if (err <= 0)
-		return err;
-	/*
-	 * We cannot assume consecutive blocks in the data file are contiguous,
-	 * reset the decoder to force re-sync.
-	 */
-	err = cs_etm_decoder__reset(etmq->decoder);
-	if (err != 0)
-		return err;
-
-	/* Run trace decoder until buffer consumed or end of trace */
-	do {
-		processed = 0;
-
-		err = cs_etm_decoder__process_data_block(
-						etmq->decoder,
-						etmq->offset,
-						&buffer.buf[buffer_used],
-						buffer.len - buffer_used,
-						&processed);
-
-		if (err)
+	while (1) {
+		buffer_used = 0;
+		memset(&buffer, 0, sizeof(buffer));
+		err = cs_etm__get_trace(&buffer, etmq);
+		if (err <= 0)
+			return err;
+		/*
+		 * We cannot assume consecutive blocks in the data file are
+		 * contiguous, reset the decoder to force re-sync.
+		 */
+		err = cs_etm_decoder__reset(etmq->decoder);
+		if (err != 0)
 			return err;
 
-		etmq->offset += processed;
-		buffer_used += processed;
+		/* Run trace decoder until buffer consumed or end of trace */
+		do {
+			processed = 0;
+			err = cs_etm_decoder__process_data_block(
+				etmq->decoder,
+				etmq->offset,
+				&buffer.buf[buffer_used],
+				buffer.len - buffer_used,
+				&processed);
+			if (err)
+				return err;
+
+			etmq->offset += processed;
+			buffer_used += processed;
+
+			/* Process each packet in this chunk */
+			while (1) {
+				err = cs_etm_decoder__get_packet(etmq->decoder,
+								 etmq->packet);
+				if (err <= 0)
+					/*
+					 * Stop processing this chunk on
+					 * end of data or error
+					 */
+					break;
+
+				/*
+				 * If the packet contains an instruction
+				 * range, generate instruction sequence
+				 * events.
+				 */
+				if (etmq->packet->sample_type & CS_ETM_RANGE)
+					err = cs_etm__sample(etmq);
+			}
+		} while (buffer.len > buffer_used);
 
 		/*
-		 * Nothing to do with an error condition, let's hope the next
-		 * chunk will be better.
+		 * Generate a last branch event for the branches left in
+		 * the circular buffer at the end of the trace.
 		 */
-		err = cs_etm__sample(etmq);
-	} while (buffer.len > buffer_used);
+		if (etm->sample_instructions &&
+		    etmq->etm->synth_opts.last_branch) {
+			struct branch_stack *bs = etmq->last_branch_rb;
+			struct branch_entry *be =
+				&bs->entries[etmq->last_branch_pos];
+
+			err = cs_etm__synth_instruction_sample(
+				etmq, be->to, etmq->period_instructions);
+			if (err)
+				return err;
+		}
 
-goto more;
+	}
 
 	return err;
 }
-- 
2.14.3


* [PATCH 27/41] perf cs-etm: Inject capability for CoreSight traces
@ 2018-02-16 19:17   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC
  To: linux-arm-kernel

From: Robert Walker <robert.walker@arm.com>

Add user-space perf functionality to translate CoreSight traces into
instruction events with an attached branch stack.

To invoke the new functionality, use the perf inject tool with
--itrace=il. For example, to translate the ETM trace in perf.data into
instruction events carrying last branch records, written to a new file
perf.data.new:

    $ perf inject --itrace=i100000il128 -i perf.data -o perf.data.new

The 'i' parameter to itrace generates periodic instruction events.  The
period between instruction events can be specified as a number of
instructions followed by the suffix 'i' (default 100000), as in
i100000i above.
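The period bookkeeping can be illustrated with a small standalone sketch
modeled on the cs_etm__sample() hunk below. It assumes fixed 4-byte A64
instructions, and the names struct range and sample_ranges() are
hypothetical. Unlike the hunk, which emits at most one sample per range,
the sketch loops so that a range longer than the period yields several
samples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define A64_INSTR_SIZE 4 /* A64 only: every instruction is 4 bytes */

/* Hypothetical stand-in for an executed range: [start_addr, end_addr) */
struct range {
	uint64_t start_addr;
	uint64_t end_addr;
};

/*
 * Walk executed ranges and compute the address of the instruction at
 * which each sample period expires.  At most out_max addresses are
 * stored; the return value is the total number of samples.
 */
static size_t sample_ranges(const struct range *r, size_t n, uint64_t period,
			    uint64_t *out, size_t out_max)
{
	uint64_t period_instructions = 0;
	size_t count = 0;

	assert(period > 0); /* a zero period would never drain */

	for (size_t i = 0; i < n; i++) {
		uint64_t executed =
			(r[i].end_addr - r[i].start_addr) / A64_INSTR_SIZE;

		period_instructions += executed;
		while (period_instructions >= period) {
			/* Instructions executed after the sample point */
			uint64_t over = period_instructions - period;
			/* -1: index of the instruction that completed the period */
			uint64_t offset = executed - over - 1;

			if (count < out_max)
				out[count] = r[i].start_addr +
					     offset * A64_INSTR_SIZE;
			count++;
			/* Carry the remainder into the next period */
			period_instructions = over;
		}
	}
	return count;
}
```

With ranges of 4 and 8 instructions and a period of 5, the samples land
on the 5th and 10th executed instructions, i.e. the first and sixth
instructions of the second range.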

The parameter to 'l' specifies the number of entries in the branch stack
attached to instruction events.
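A minimal sketch of the circular branch stack, following the logic of
cs_etm__update_last_branch_rb() and cs_etm__copy_last_branch_rb() in the
hunks below: entries are written newest-first from the end of the array,
then copied out in two steps so that the newest branch lands at index 0.
LB_SZ, struct lbr_ring and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LB_SZ 4 /* hypothetical depth, i.e. the number given to 'l' */

struct entry {
	uint64_t from;
	uint64_t to;
};

/* Hypothetical ring mirroring last_branch_rb and last_branch_pos */
struct lbr_ring {
	size_t nr;  /* valid entries, saturating at LB_SZ */
	size_t pos; /* index of the most recently written entry */
	struct entry entries[LB_SZ];
};

/* Record a branch, filling the array from the end towards index 0 */
static void lbr_push(struct lbr_ring *rb, uint64_t from, uint64_t to)
{
	if (rb->pos == 0)
		rb->pos = LB_SZ; /* wrap back to the end of the array */
	rb->pos--;
	rb->entries[rb->pos].from = from;
	rb->entries[rb->pos].to = to;
	if (rb->nr < LB_SZ)
		rb->nr++;
}

/* Linearize into dst (LB_SZ entries), newest first; returns rb->nr */
static size_t lbr_copy(const struct lbr_ring *rb, struct entry *dst)
{
	size_t n;

	if (!rb->nr)
		return 0;

	/* Step 1: newest entries, from pos to the end of the array */
	n = LB_SZ - rb->pos;
	memcpy(&dst[0], &rb->entries[rb->pos], n * sizeof(*dst));

	/* Step 2: after a wrap, older entries sit in [0, pos) */
	if (rb->nr >= LB_SZ)
		memcpy(&dst[n], &rb->entries[0], rb->pos * sizeof(*dst));

	return rb->nr;
}
```

Writing in reverse means the two memcpy() calls already produce
newest-to-oldest order, so no per-entry reversal is needed when a sample
is synthesized.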

The 'b' parameter to itrace generates events on taken branches.

This patch also fixes the contents of the branch events used by perf
report.  Previously, a branch event was generated for each contiguous
range of executed instructions.  Branch events are now generated from
the last instruction of a range that ends in an executed (taken) branch
to the start address of the next range.
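The new branch event endpoints can be sketched as follows, assuming
A64-only code as the patch does; struct range_pkt and branch_endpoints()
are hypothetical stand-ins for struct cs_etm_packet and the logic in
cs_etm__synth_branch_sample().

```c
#include <stdbool.h>
#include <stdint.h>

#define A64_INSTR_SIZE 4 /* A64 only, as in the patch */

/* Hypothetical subset of struct cs_etm_packet */
struct range_pkt {
	uint64_t start_addr;
	uint64_t end_addr; /* exclusive */
	bool last_instr_taken_branch;
};

/*
 * Compute the endpoints of a branch event between two consecutive
 * ranges.  Returns false (no event) unless prev ended in a taken branch.
 */
static bool branch_endpoints(const struct range_pkt *prev,
			     const struct range_pkt *cur,
			     uint64_t *from, uint64_t *to)
{
	if (!prev->last_instr_taken_branch)
		return false;

	/* Source: the last executed instruction of the previous range */
	*from = prev->end_addr - A64_INSTR_SIZE;
	/* Target: the first instruction of the next range */
	*to = cur->start_addr;
	return true;
}
```

Because the range end address is exclusive, the branch source is one
instruction before it rather than the end address itself.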

Based on patches by Sebastian Pop <s.pop@samsung.com> with additional fixes
and support for specifying the instruction period.

Originally-by: Sebastian Pop <s.pop@samsung.com>
Signed-off-by: Robert Walker <robert.walker@arm.com>
Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1518607481-4059-2-git-send-email-robert.walker@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c |  65 +++-
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |   1 +
 tools/perf/util/cs-etm.c                        | 434 +++++++++++++++++++++---
 3 files changed, 436 insertions(+), 64 deletions(-)

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 1fb01849f1c7..8ff69dfd725a 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -78,6 +78,8 @@ int cs_etm_decoder__reset(struct cs_etm_decoder *decoder)
 {
 	ocsd_datapath_resp_t dp_ret;
 
+	decoder->prev_return = OCSD_RESP_CONT;
+
 	dp_ret = ocsd_dt_process_data(decoder->dcd_tree, OCSD_OP_RESET,
 				      0, 0, NULL, NULL);
 	if (OCSD_DATA_RESP_IS_FATAL(dp_ret))
@@ -253,16 +255,16 @@ static void cs_etm_decoder__clear_buffer(struct cs_etm_decoder *decoder)
 	decoder->packet_count = 0;
 	for (i = 0; i < MAX_BUFFER; i++) {
 		decoder->packet_buffer[i].start_addr = 0xdeadbeefdeadbeefUL;
-		decoder->packet_buffer[i].end_addr   = 0xdeadbeefdeadbeefUL;
-		decoder->packet_buffer[i].exc	     = false;
-		decoder->packet_buffer[i].exc_ret    = false;
-		decoder->packet_buffer[i].cpu	     = INT_MIN;
+		decoder->packet_buffer[i].end_addr = 0xdeadbeefdeadbeefUL;
+		decoder->packet_buffer[i].last_instr_taken_branch = false;
+		decoder->packet_buffer[i].exc = false;
+		decoder->packet_buffer[i].exc_ret = false;
+		decoder->packet_buffer[i].cpu = INT_MIN;
 	}
 }
 
 static ocsd_datapath_resp_t
 cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
-			      const ocsd_generic_trace_elem *elem,
 			      const u8 trace_chan_id,
 			      enum cs_etm_sample_type sample_type)
 {
@@ -278,18 +280,16 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
 		return OCSD_RESP_FATAL_SYS_ERR;
 
 	et = decoder->tail;
+	et = (et + 1) & (MAX_BUFFER - 1);
+	decoder->tail = et;
+	decoder->packet_count++;
+
 	decoder->packet_buffer[et].sample_type = sample_type;
-	decoder->packet_buffer[et].start_addr = elem->st_addr;
-	decoder->packet_buffer[et].end_addr = elem->en_addr;
 	decoder->packet_buffer[et].exc = false;
 	decoder->packet_buffer[et].exc_ret = false;
 	decoder->packet_buffer[et].cpu = *((int *)inode->priv);
-
-	/* Wrap around if need be */
-	et = (et + 1) & (MAX_BUFFER - 1);
-
-	decoder->tail = et;
-	decoder->packet_count++;
+	decoder->packet_buffer[et].start_addr = 0xdeadbeefdeadbeefUL;
+	decoder->packet_buffer[et].end_addr = 0xdeadbeefdeadbeefUL;
 
 	if (decoder->packet_count == MAX_BUFFER - 1)
 		return OCSD_RESP_WAIT;
@@ -297,6 +297,40 @@ cs_etm_decoder__buffer_packet(struct cs_etm_decoder *decoder,
 	return OCSD_RESP_CONT;
 }
 
+static ocsd_datapath_resp_t
+cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
+			     const ocsd_generic_trace_elem *elem,
+			     const uint8_t trace_chan_id)
+{
+	int ret = 0;
+	struct cs_etm_packet *packet;
+
+	ret = cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+					    CS_ETM_RANGE);
+	if (ret != OCSD_RESP_CONT && ret != OCSD_RESP_WAIT)
+		return ret;
+
+	packet = &decoder->packet_buffer[decoder->tail];
+
+	packet->start_addr = elem->st_addr;
+	packet->end_addr = elem->en_addr;
+	switch (elem->last_i_type) {
+	case OCSD_INSTR_BR:
+	case OCSD_INSTR_BR_INDIRECT:
+		packet->last_instr_taken_branch = elem->last_instr_exec;
+		break;
+	case OCSD_INSTR_ISB:
+	case OCSD_INSTR_DSB_DMB:
+	case OCSD_INSTR_OTHER:
+	default:
+		packet->last_instr_taken_branch = false;
+		break;
+	}
+
+	return ret;
+
+}
+
 static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
 				const void *context,
 				const ocsd_trc_index_t indx __maybe_unused,
@@ -316,9 +350,8 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
 		decoder->trace_on = true;
 		break;
 	case OCSD_GEN_TRC_ELEM_INSTR_RANGE:
-		resp = cs_etm_decoder__buffer_packet(decoder, elem,
-						     trace_chan_id,
-						     CS_ETM_RANGE);
+		resp = cs_etm_decoder__buffer_range(decoder, elem,
+						    trace_chan_id);
 		break;
 	case OCSD_GEN_TRC_ELEM_EXCEPTION:
 		decoder->packet_buffer[decoder->tail].exc = true;
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index 3d2e6205d186..a4fdd285b145 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -30,6 +30,7 @@ struct cs_etm_packet {
 	enum cs_etm_sample_type sample_type;
 	u64 start_addr;
 	u64 end_addr;
+	u8 last_instr_taken_branch;
 	u8 exc;
 	u8 exc_ret;
 	int cpu;
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index f2c98774e665..6e595d96c04d 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -32,6 +32,14 @@
 
 #define MAX_TIMESTAMP (~0ULL)
 
+/*
+ * A64 instructions are always 4 bytes
+ *
+ * Only A64 is supported, so this constant can be used for converting between
+ * addresses and instruction counts, calculating offsets, etc.
+ */
+#define A64_INSTR_SIZE 4
+
 struct cs_etm_auxtrace {
 	struct auxtrace auxtrace;
 	struct auxtrace_queues queues;
@@ -45,11 +53,15 @@ struct cs_etm_auxtrace {
 	u8 snapshot_mode;
 	u8 data_queued;
 	u8 sample_branches;
+	u8 sample_instructions;
 
 	int num_cpu;
 	u32 auxtrace_type;
 	u64 branches_sample_type;
 	u64 branches_id;
+	u64 instructions_sample_type;
+	u64 instructions_sample_period;
+	u64 instructions_id;
 	u64 **metadata;
 	u64 kernel_start;
 	unsigned int pmu_type;
@@ -68,6 +80,12 @@ struct cs_etm_queue {
 	u64 time;
 	u64 timestamp;
 	u64 offset;
+	u64 period_instructions;
+	struct branch_stack *last_branch;
+	struct branch_stack *last_branch_rb;
+	size_t last_branch_pos;
+	struct cs_etm_packet *prev_packet;
+	struct cs_etm_packet *packet;
 };
 
 static int cs_etm__update_queues(struct cs_etm_auxtrace *etm);
@@ -180,6 +198,10 @@ static void cs_etm__free_queue(void *priv)
 	thread__zput(etmq->thread);
 	cs_etm_decoder__free(etmq->decoder);
 	zfree(&etmq->event_buf);
+	zfree(&etmq->last_branch);
+	zfree(&etmq->last_branch_rb);
+	zfree(&etmq->prev_packet);
+	zfree(&etmq->packet);
 	free(etmq);
 }
 
@@ -276,11 +298,35 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm,
 	struct cs_etm_decoder_params d_params;
 	struct cs_etm_trace_params  *t_params;
 	struct cs_etm_queue *etmq;
+	size_t szp = sizeof(struct cs_etm_packet);
 
 	etmq = zalloc(sizeof(*etmq));
 	if (!etmq)
 		return NULL;
 
+	etmq->packet = zalloc(szp);
+	if (!etmq->packet)
+		goto out_free;
+
+	if (etm->synth_opts.last_branch || etm->sample_branches) {
+		etmq->prev_packet = zalloc(szp);
+		if (!etmq->prev_packet)
+			goto out_free;
+	}
+
+	if (etm->synth_opts.last_branch) {
+		size_t sz = sizeof(struct branch_stack);
+
+		sz += etm->synth_opts.last_branch_sz *
+		      sizeof(struct branch_entry);
+		etmq->last_branch = zalloc(sz);
+		if (!etmq->last_branch)
+			goto out_free;
+		etmq->last_branch_rb = zalloc(sz);
+		if (!etmq->last_branch_rb)
+			goto out_free;
+	}
+
 	etmq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
 	if (!etmq->event_buf)
 		goto out_free;
@@ -335,6 +381,7 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm,
 		goto out_free_decoder;
 
 	etmq->offset = 0;
+	etmq->period_instructions = 0;
 
 	return etmq;
 
@@ -342,6 +389,10 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm,
 	cs_etm_decoder__free(etmq->decoder);
 out_free:
 	zfree(&etmq->event_buf);
+	zfree(&etmq->last_branch);
+	zfree(&etmq->last_branch_rb);
+	zfree(&etmq->prev_packet);
+	zfree(&etmq->packet);
 	free(etmq);
 
 	return NULL;
@@ -395,6 +446,129 @@ static int cs_etm__update_queues(struct cs_etm_auxtrace *etm)
 	return 0;
 }
 
+static inline void cs_etm__copy_last_branch_rb(struct cs_etm_queue *etmq)
+{
+	struct branch_stack *bs_src = etmq->last_branch_rb;
+	struct branch_stack *bs_dst = etmq->last_branch;
+	size_t nr = 0;
+
+	/*
+	 * Set the number of records before early exit: ->nr is used to
+	 * determine how many branches to copy from ->entries.
+	 */
+	bs_dst->nr = bs_src->nr;
+
+	/*
+	 * Early exit when there is nothing to copy.
+	 */
+	if (!bs_src->nr)
+		return;
+
+	/*
+	 * As bs_src->entries is a circular buffer, we need to copy from it in
+	 * two steps.  First, copy the branches from the most recently inserted
+	 * branch ->last_branch_pos until the end of bs_src->entries buffer.
+	 */
+	nr = etmq->etm->synth_opts.last_branch_sz - etmq->last_branch_pos;
+	memcpy(&bs_dst->entries[0],
+	       &bs_src->entries[etmq->last_branch_pos],
+	       sizeof(struct branch_entry) * nr);
+
+	/*
+	 * If we wrapped around at least once, the branches from the beginning
+	 * of the bs_src->entries buffer and until the ->last_branch_pos element
+	 * are older valid branches: copy them over.  The total number of
+	 * branches copied over will be equal to the number of branches asked by
+	 * the user in last_branch_sz.
+	 */
+	if (bs_src->nr >= etmq->etm->synth_opts.last_branch_sz) {
+		memcpy(&bs_dst->entries[nr],
+		       &bs_src->entries[0],
+		       sizeof(struct branch_entry) * etmq->last_branch_pos);
+	}
+}
+
+static inline void cs_etm__reset_last_branch_rb(struct cs_etm_queue *etmq)
+{
+	etmq->last_branch_pos = 0;
+	etmq->last_branch_rb->nr = 0;
+}
+
+static inline u64 cs_etm__last_executed_instr(struct cs_etm_packet *packet)
+{
+	/*
+	 * The packet records the execution range with an exclusive end address
+	 *
+	 * A64 instructions are constant size, so the last executed
+	 * instruction is A64_INSTR_SIZE before the end address
+	 * Will need to do instruction level decode for T32 instructions as
+	 * they can be variable size (not yet supported).
+	 */
+	return packet->end_addr - A64_INSTR_SIZE;
+}
+
+static inline u64 cs_etm__instr_count(const struct cs_etm_packet *packet)
+{
+	/*
+	 * Only A64 instructions are currently supported, so can get
+	 * instruction count by dividing.
+	 * Will need to do instruction level decode for T32 instructions as
+	 * they can be variable size (not yet supported).
+	 */
+	return (packet->end_addr - packet->start_addr) / A64_INSTR_SIZE;
+}
+
+static inline u64 cs_etm__instr_addr(const struct cs_etm_packet *packet,
+				     u64 offset)
+{
+	/*
+	 * Only A64 instructions are currently supported, so can get
+	 * instruction address by multiplying.
+	 * Will need to do instruction level decode for T32 instructions as
+	 * they can be variable size (not yet supported).
+	 */
+	return packet->start_addr + offset * A64_INSTR_SIZE;
+}
+
+static void cs_etm__update_last_branch_rb(struct cs_etm_queue *etmq)
+{
+	struct branch_stack *bs = etmq->last_branch_rb;
+	struct branch_entry *be;
+
+	/*
+	 * The branches are recorded in a circular buffer in reverse
+	 * chronological order: we start recording from the last element of the
+	 * buffer down.  After writing the first element of the stack, move the
+	 * insert position back to the end of the buffer.
+	 */
+	if (!etmq->last_branch_pos)
+		etmq->last_branch_pos = etmq->etm->synth_opts.last_branch_sz;
+
+	etmq->last_branch_pos -= 1;
+
+	be       = &bs->entries[etmq->last_branch_pos];
+	be->from = cs_etm__last_executed_instr(etmq->prev_packet);
+	be->to	 = etmq->packet->start_addr;
+	/* No support for mispredict */
+	be->flags.mispred = 0;
+	be->flags.predicted = 1;
+
+	/*
+	 * Increment bs->nr until reaching the number of last branches asked by
+	 * the user on the command line.
+	 */
+	if (bs->nr < etmq->etm->synth_opts.last_branch_sz)
+		bs->nr += 1;
+}
+
+static int cs_etm__inject_event(union perf_event *event,
+			       struct perf_sample *sample, u64 type)
+{
+	event->header.size = perf_event__sample_event_size(sample, type, 0);
+	return perf_event__synthesize_sample(event, type, 0, sample);
+}
+
+
 static int
 cs_etm__get_trace(struct cs_etm_buffer *buff, struct cs_etm_queue *etmq)
 {
@@ -459,35 +633,105 @@ static void  cs_etm__set_pid_tid_cpu(struct cs_etm_auxtrace *etm,
 	}
 }
 
+static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
+					    u64 addr, u64 period)
+{
+	int ret = 0;
+	struct cs_etm_auxtrace *etm = etmq->etm;
+	union perf_event *event = etmq->event_buf;
+	struct perf_sample sample = {.ip = 0,};
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = PERF_RECORD_MISC_USER;
+	event->sample.header.size = sizeof(struct perf_event_header);
+
+	sample.ip = addr;
+	sample.pid = etmq->pid;
+	sample.tid = etmq->tid;
+	sample.id = etmq->etm->instructions_id;
+	sample.stream_id = etmq->etm->instructions_id;
+	sample.period = period;
+	sample.cpu = etmq->packet->cpu;
+	sample.flags = 0;
+	sample.insn_len = 1;
+	sample.cpumode = event->header.misc;
+
+	if (etm->synth_opts.last_branch) {
+		cs_etm__copy_last_branch_rb(etmq);
+		sample.branch_stack = etmq->last_branch;
+	}
+
+	if (etm->synth_opts.inject) {
+		ret = cs_etm__inject_event(event, &sample,
+					   etm->instructions_sample_type);
+		if (ret)
+			return ret;
+	}
+
+	ret = perf_session__deliver_synth_event(etm->session, event, &sample);
+
+	if (ret)
+		pr_err(
+			"CS ETM Trace: failed to deliver instruction event, error %d\n",
+			ret);
+
+	if (etm->synth_opts.last_branch)
+		cs_etm__reset_last_branch_rb(etmq);
+
+	return ret;
+}
+
 /*
  * The cs etm packet encodes an instruction range between a branch target
  * and the next taken branch. Generate sample accordingly.
  */
-static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq,
-				       struct cs_etm_packet *packet)
+static int cs_etm__synth_branch_sample(struct cs_etm_queue *etmq)
 {
 	int ret = 0;
 	struct cs_etm_auxtrace *etm = etmq->etm;
 	struct perf_sample sample = {.ip = 0,};
 	union perf_event *event = etmq->event_buf;
-	u64 start_addr = packet->start_addr;
-	u64 end_addr = packet->end_addr;
+	struct dummy_branch_stack {
+		u64			nr;
+		struct branch_entry	entries;
+	} dummy_bs;
 
 	event->sample.header.type = PERF_RECORD_SAMPLE;
 	event->sample.header.misc = PERF_RECORD_MISC_USER;
 	event->sample.header.size = sizeof(struct perf_event_header);
 
-	sample.ip = start_addr;
+	sample.ip = cs_etm__last_executed_instr(etmq->prev_packet);
 	sample.pid = etmq->pid;
 	sample.tid = etmq->tid;
-	sample.addr = end_addr;
+	sample.addr = etmq->packet->start_addr;
 	sample.id = etmq->etm->branches_id;
 	sample.stream_id = etmq->etm->branches_id;
 	sample.period = 1;
-	sample.cpu = packet->cpu;
+	sample.cpu = etmq->packet->cpu;
 	sample.flags = 0;
 	sample.cpumode = PERF_RECORD_MISC_USER;
 
+	/*
+	 * perf report cannot handle events without a branch stack
+	 */
+	if (etm->synth_opts.last_branch) {
+		dummy_bs = (struct dummy_branch_stack){
+			.nr = 1,
+			.entries = {
+				.from = sample.ip,
+				.to = sample.addr,
+			},
+		};
+		sample.branch_stack = (struct branch_stack *)&dummy_bs;
+	}
+
+	if (etm->synth_opts.inject) {
+		ret = cs_etm__inject_event(event, &sample,
+					   etm->branches_sample_type);
+		if (ret)
+			return ret;
+	}
+
 	ret = perf_session__deliver_synth_event(etm->session, event, &sample);
 
 	if (ret)
@@ -584,6 +828,24 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm,
 		etm->sample_branches = true;
 		etm->branches_sample_type = attr.sample_type;
 		etm->branches_id = id;
+		id += 1;
+		attr.sample_type &= ~(u64)PERF_SAMPLE_ADDR;
+	}
+
+	if (etm->synth_opts.last_branch)
+		attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
+
+	if (etm->synth_opts.instructions) {
+		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
+		attr.sample_period = etm->synth_opts.period;
+		etm->instructions_sample_period = attr.sample_period;
+		err = cs_etm__synth_event(session, &attr, id);
+		if (err)
+			return err;
+		etm->sample_instructions = true;
+		etm->instructions_sample_type = attr.sample_type;
+		etm->instructions_id = id;
+		id += 1;
 	}
 
 	return 0;
@@ -591,20 +853,68 @@ static int cs_etm__synth_events(struct cs_etm_auxtrace *etm,
 
 static int cs_etm__sample(struct cs_etm_queue *etmq)
 {
+	struct cs_etm_auxtrace *etm = etmq->etm;
+	struct cs_etm_packet *tmp;
 	int ret;
-	struct cs_etm_packet packet;
+	u64 instrs_executed;
 
-	while (1) {
-		ret = cs_etm_decoder__get_packet(etmq->decoder, &packet);
-		if (ret <= 0)
+	instrs_executed = cs_etm__instr_count(etmq->packet);
+	etmq->period_instructions += instrs_executed;
+
+	/*
+	 * Record a branch when the last instruction in
+	 * PREV_PACKET is a branch.
+	 */
+	if (etm->synth_opts.last_branch &&
+	    etmq->prev_packet &&
+	    etmq->prev_packet->last_instr_taken_branch)
+		cs_etm__update_last_branch_rb(etmq);
+
+	if (etm->sample_instructions &&
+	    etmq->period_instructions >= etm->instructions_sample_period) {
+		/*
+		 * Emit instruction sample periodically
+		 * TODO: allow period to be defined in cycles and clock time
+		 */
+
+		/* Get number of instructions executed after the sample point */
+		u64 instrs_over = etmq->period_instructions -
+			etm->instructions_sample_period;
+
+		/*
+		 * Calculate the address of the sampled instruction (-1 as
+		 * sample is reported as though instruction has just been
+		 * executed, but PC has not advanced to next instruction)
+		 */
+		u64 offset = (instrs_executed - instrs_over - 1);
+		u64 addr = cs_etm__instr_addr(etmq->packet, offset);
+
+		ret = cs_etm__synth_instruction_sample(
+			etmq, addr, etm->instructions_sample_period);
+		if (ret)
+			return ret;
+
+		/* Carry remaining instructions into next sample period */
+		etmq->period_instructions = instrs_over;
+	}
+
+	if (etm->sample_branches &&
+	    etmq->prev_packet &&
+	    etmq->prev_packet->sample_type == CS_ETM_RANGE &&
+	    etmq->prev_packet->last_instr_taken_branch) {
+		ret = cs_etm__synth_branch_sample(etmq);
+		if (ret)
 			return ret;
+	}
 
+	if (etm->sample_branches || etm->synth_opts.last_branch) {
 		/*
-		 * If the packet contains an instruction range, generate an
-		 * instruction sequence event.
+		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
+		 * the next incoming packet.
 		 */
-		if (packet.sample_type & CS_ETM_RANGE)
-			cs_etm__synth_branch_sample(etmq, &packet);
+		tmp = etmq->packet;
+		etmq->packet = etmq->prev_packet;
+		etmq->prev_packet = tmp;
 	}
 
 	return 0;
@@ -621,45 +931,73 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
 		etm->kernel_start = machine__kernel_start(etm->machine);
 
 	/* Go through each buffer in the queue and decode them one by one */
-more:
-	buffer_used = 0;
-	memset(&buffer, 0, sizeof(buffer));
-	err = cs_etm__get_trace(&buffer, etmq);
-	if (err <= 0)
-		return err;
-	/*
-	 * We cannot assume consecutive blocks in the data file are contiguous,
-	 * reset the decoder to force re-sync.
-	 */
-	err = cs_etm_decoder__reset(etmq->decoder);
-	if (err != 0)
-		return err;
-
-	/* Run trace decoder until buffer consumed or end of trace */
-	do {
-		processed = 0;
-
-		err = cs_etm_decoder__process_data_block(
-						etmq->decoder,
-						etmq->offset,
-						&buffer.buf[buffer_used],
-						buffer.len - buffer_used,
-						&processed);
-
-		if (err)
+	while (1) {
+		buffer_used = 0;
+		memset(&buffer, 0, sizeof(buffer));
+		err = cs_etm__get_trace(&buffer, etmq);
+		if (err <= 0)
+			return err;
+		/*
+		 * We cannot assume consecutive blocks in the data file are
+		 * contiguous, reset the decoder to force re-sync.
+		 */
+		err = cs_etm_decoder__reset(etmq->decoder);
+		if (err != 0)
 			return err;
 
-		etmq->offset += processed;
-		buffer_used += processed;
+		/* Run trace decoder until buffer consumed or end of trace */
+		do {
+			processed = 0;
+			err = cs_etm_decoder__process_data_block(
+				etmq->decoder,
+				etmq->offset,
+				&buffer.buf[buffer_used],
+				buffer.len - buffer_used,
+				&processed);
+			if (err)
+				return err;
+
+			etmq->offset += processed;
+			buffer_used += processed;
+
+			/* Process each packet in this chunk */
+			while (1) {
+				err = cs_etm_decoder__get_packet(etmq->decoder,
+								 etmq->packet);
+				if (err <= 0)
+					/*
+					 * Stop processing this chunk on
+					 * end of data or error
+					 */
+					break;
+
+				/*
+				 * If the packet contains an instruction
+				 * range, generate instruction sequence
+				 * events.
+				 */
+				if (etmq->packet->sample_type & CS_ETM_RANGE)
+					err = cs_etm__sample(etmq);
+			}
+		} while (buffer.len > buffer_used);
 
 		/*
-		 * Nothing to do with an error condition, let's hope the next
-		 * chunk will be better.
+		 * Generate a last branch event for the branches left in
+		 * the circular buffer at the end of the trace.
 		 */
-		err = cs_etm__sample(etmq);
-	} while (buffer.len > buffer_used);
+		if (etm->sample_instructions &&
+		    etmq->etm->synth_opts.last_branch) {
+			struct branch_stack *bs = etmq->last_branch_rb;
+			struct branch_entry *be =
+				&bs->entries[etmq->last_branch_pos];
+
+			err = cs_etm__synth_instruction_sample(
+				etmq, be->to, etmq->period_instructions);
+			if (err)
+				return err;
+		}
 
-goto more;
+	}
 
 	return err;
 }
-- 
2.14.3


* [PATCH 28/41] perf inject: Emit instruction records on ETM trace discontinuity
@ 2018-02-16 19:17   ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Robert Walker, coresight,
	linux-arm-kernel, Arnaldo Carvalho de Melo

From: Robert Walker <robert.walker@arm.com>

There may be discontinuities in the ETM trace stream due to overflows or
to ETM configuration for selective trace.  This patch emits an instruction
sample with the pending branch stack when a TRACE ON packet, which
indicates a discontinuity in the trace data, is received.

A new packet type, CS_ETM_TRACE_ON, is added; the low-level decoder emits
it when a TRACE ON occurs, and the higher-level decoder flushes the branch
stack when it sees this packet.
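The flushing behavior can be sketched as a simplified packet walk,
assuming A64-only ranges.  It omits the last_branch option check, the
branch stack and periodic sampling; the types and the flush_pending()
and walk() helpers are hypothetical, and only the two packet type values
match the enum in the hunk below.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define A64_INSTR_SIZE 4

/* Packet types; the values match the enum added by the patch */
enum pkt_type { PKT_RANGE = 1 << 0, PKT_TRACE_ON = 1 << 1 };

struct pkt {
	enum pkt_type type;
	uint64_t start_addr;
	uint64_t end_addr; /* exclusive; unused for PKT_TRACE_ON */
};

/* Hypothetical record of one flushed instruction sample */
struct flushed {
	uint64_t addr;   /* last executed instruction before the flush */
	uint64_t period; /* instructions accumulated since the last flush */
};

/* Emit a sample for the pending range, if any, then reset the state */
static bool flush_pending(const struct pkt **prev, uint64_t *pending,
			  struct flushed *out)
{
	bool emitted = false;

	if (*prev) {
		out->addr = (*prev)->end_addr - A64_INSTR_SIZE;
		out->period = *pending;
		emitted = true;
	}
	*prev = NULL;
	*pending = 0;
	return emitted;
}

/*
 * Walk the stream, flushing on TRACE_ON and at the end of the data.
 * out needs room for (number of TRACE_ON packets + 1) samples.
 */
static size_t walk(const struct pkt *p, size_t n, struct flushed *out)
{
	const struct pkt *prev = NULL;
	uint64_t pending = 0;
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		switch (p[i].type) {
		case PKT_RANGE:
			pending += (p[i].end_addr - p[i].start_addr) /
				   A64_INSTR_SIZE;
			prev = &p[i];
			break;
		case PKT_TRACE_ON: /* discontinuity in the trace */
			if (flush_pending(&prev, &pending, &out[count]))
				count++;
			break;
		default:
			break;
		}
	}
	if (flush_pending(&prev, &pending, &out[count]))
		count++;
	return count;
}
```

Clearing prev after a flush plays the role of the packet swap in
cs_etm__flush(): once the previous packet is no longer a range, a second
TRACE_ON or the end of the data does not emit a duplicate sample.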

Signed-off-by: Robert Walker <robert.walker@arm.com>
Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1518607481-4059-3-git-send-email-robert.walker@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c |  9 +++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |  1 +
 tools/perf/util/cs-etm.c                        | 80 ++++++++++++++++++-------
 3 files changed, 67 insertions(+), 23 deletions(-)

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 8ff69dfd725a..640af88331b4 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -328,7 +328,14 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
 	}
 
 	return ret;
+}
 
+static ocsd_datapath_resp_t
+cs_etm_decoder__buffer_trace_on(struct cs_etm_decoder *decoder,
+				const uint8_t trace_chan_id)
+{
+	return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+					     CS_ETM_TRACE_ON);
 }
 
 static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
@@ -347,6 +354,8 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
 		decoder->trace_on = false;
 		break;
 	case OCSD_GEN_TRC_ELEM_TRACE_ON:
+		resp = cs_etm_decoder__buffer_trace_on(decoder,
+						       trace_chan_id);
 		decoder->trace_on = true;
 		break;
 	case OCSD_GEN_TRC_ELEM_INSTR_RANGE:
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index a4fdd285b145..743f5f444304 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -24,6 +24,7 @@ struct cs_etm_buffer {
 
 enum cs_etm_sample_type {
 	CS_ETM_RANGE = 1 << 0,
+	CS_ETM_TRACE_ON = 1 << 1,
 };
 
 struct cs_etm_packet {
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 6e595d96c04d..1b0d422373be 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -867,6 +867,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
 	 */
 	if (etm->synth_opts.last_branch &&
 	    etmq->prev_packet &&
+	    etmq->prev_packet->sample_type == CS_ETM_RANGE &&
 	    etmq->prev_packet->last_instr_taken_branch)
 		cs_etm__update_last_branch_rb(etmq);
 
@@ -920,6 +921,40 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
 	return 0;
 }
 
+static int cs_etm__flush(struct cs_etm_queue *etmq)
+{
+	int err = 0;
+	struct cs_etm_packet *tmp;
+
+	if (etmq->etm->synth_opts.last_branch &&
+	    etmq->prev_packet &&
+	    etmq->prev_packet->sample_type == CS_ETM_RANGE) {
+		/*
+		 * Generate a last branch event for the branches left in the
+		 * circular buffer at the end of the trace.
+		 *
+		 * Use the address of the end of the last reported execution
+		 * range
+		 */
+		u64 addr = cs_etm__last_executed_instr(etmq->prev_packet);
+
+		err = cs_etm__synth_instruction_sample(
+			etmq, addr,
+			etmq->period_instructions);
+		etmq->period_instructions = 0;
+
+		/*
+		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
+		 * the next incoming packet.
+		 */
+		tmp = etmq->packet;
+		etmq->packet = etmq->prev_packet;
+		etmq->prev_packet = tmp;
+	}
+
+	return err;
+}
+
 static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
 {
 	struct cs_etm_auxtrace *etm = etmq->etm;
@@ -971,32 +1006,31 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
 					 */
 					break;
 
-				/*
-				 * If the packet contains an instruction
-				 * range, generate instruction sequence
-				 * events.
-				 */
-				if (etmq->packet->sample_type & CS_ETM_RANGE)
-					err = cs_etm__sample(etmq);
+				switch (etmq->packet->sample_type) {
+				case CS_ETM_RANGE:
+					/*
+					 * If the packet contains an instruction
+					 * range, generate instruction sequence
+					 * events.
+					 */
+					cs_etm__sample(etmq);
+					break;
+				case CS_ETM_TRACE_ON:
+					/*
+					 * Discontinuity in trace, flush
+					 * previous branch stack
+					 */
+					cs_etm__flush(etmq);
+					break;
+				default:
+					break;
+				}
 			}
 		} while (buffer.len > buffer_used);
 
-		/*
-		 * Generate a last branch event for the branches left in
-		 * the circular buffer at the end of the trace.
-		 */
-		if (etm->sample_instructions &&
-		    etmq->etm->synth_opts.last_branch) {
-			struct branch_stack *bs = etmq->last_branch_rb;
-			struct branch_entry *be =
-				&bs->entries[etmq->last_branch_pos];
-
-			err = cs_etm__synth_instruction_sample(
-				etmq, be->to, etmq->period_instructions);
-			if (err)
-				return err;
-		}
-
+		if (err == 0)
+			/* Flush any remaining branch stack entries */
+			err = cs_etm__flush(etmq);
 	}
 
 	return err;
-- 
2.14.3


* [PATCH 28/41] perf inject: Emit instruction records on ETM trace discontinuity
@ 2018-02-16 19:17   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, coresight, linux-kernel,
	linux-perf-users, Robert Walker, linux-arm-kernel

From: Robert Walker <robert.walker@arm.com>

There may be discontinuities in the ETM trace stream due to overflows or
to ETM configuration for selective trace.  This patch emits an instruction
sample with the pending branch stack when a TRACE ON packet, which
indicates a discontinuity in the trace data, is received.

A new packet type, CS_ETM_TRACE_ON, is added; the low-level decoder emits
it when a TRACE ON occurs, and the higher-level decoder flushes the branch
stack when it sees this packet.

Signed-off-by: Robert Walker <robert.walker@arm.com>
Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1518607481-4059-3-git-send-email-robert.walker@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c |  9 +++
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.h |  1 +
 tools/perf/util/cs-etm.c                        | 80 ++++++++++++++++++-------
 3 files changed, 67 insertions(+), 23 deletions(-)

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 8ff69dfd725a..640af88331b4 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -328,7 +328,14 @@ cs_etm_decoder__buffer_range(struct cs_etm_decoder *decoder,
 	}
 
 	return ret;
+}
 
+static ocsd_datapath_resp_t
+cs_etm_decoder__buffer_trace_on(struct cs_etm_decoder *decoder,
+				const uint8_t trace_chan_id)
+{
+	return cs_etm_decoder__buffer_packet(decoder, trace_chan_id,
+					     CS_ETM_TRACE_ON);
 }
 
 static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
@@ -347,6 +354,8 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
 		decoder->trace_on = false;
 		break;
 	case OCSD_GEN_TRC_ELEM_TRACE_ON:
+		resp = cs_etm_decoder__buffer_trace_on(decoder,
+						       trace_chan_id);
 		decoder->trace_on = true;
 		break;
 	case OCSD_GEN_TRC_ELEM_INSTR_RANGE:
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
index a4fdd285b145..743f5f444304 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.h
@@ -24,6 +24,7 @@ struct cs_etm_buffer {
 
 enum cs_etm_sample_type {
 	CS_ETM_RANGE = 1 << 0,
+	CS_ETM_TRACE_ON = 1 << 1,
 };
 
 struct cs_etm_packet {
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 6e595d96c04d..1b0d422373be 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -867,6 +867,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
 	 */
 	if (etm->synth_opts.last_branch &&
 	    etmq->prev_packet &&
+	    etmq->prev_packet->sample_type == CS_ETM_RANGE &&
 	    etmq->prev_packet->last_instr_taken_branch)
 		cs_etm__update_last_branch_rb(etmq);
 
@@ -920,6 +921,40 @@ static int cs_etm__sample(struct cs_etm_queue *etmq)
 	return 0;
 }
 
+static int cs_etm__flush(struct cs_etm_queue *etmq)
+{
+	int err = 0;
+	struct cs_etm_packet *tmp;
+
+	if (etmq->etm->synth_opts.last_branch &&
+	    etmq->prev_packet &&
+	    etmq->prev_packet->sample_type == CS_ETM_RANGE) {
+		/*
+		 * Generate a last branch event for the branches left in the
+		 * circular buffer at the end of the trace.
+		 *
+		 * Use the address of the end of the last reported execution
+		 * range
+		 */
+		u64 addr = cs_etm__last_executed_instr(etmq->prev_packet);
+
+		err = cs_etm__synth_instruction_sample(
+			etmq, addr,
+			etmq->period_instructions);
+		etmq->period_instructions = 0;
+
+		/*
+		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
+		 * the next incoming packet.
+		 */
+		tmp = etmq->packet;
+		etmq->packet = etmq->prev_packet;
+		etmq->prev_packet = tmp;
+	}
+
+	return err;
+}
+
 static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
 {
 	struct cs_etm_auxtrace *etm = etmq->etm;
@@ -971,32 +1006,31 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
 					 */
 					break;
 
-				/*
-				 * If the packet contains an instruction
-				 * range, generate instruction sequence
-				 * events.
-				 */
-				if (etmq->packet->sample_type & CS_ETM_RANGE)
-					err = cs_etm__sample(etmq);
+				switch (etmq->packet->sample_type) {
+				case CS_ETM_RANGE:
+					/*
+					 * If the packet contains an instruction
+					 * range, generate instruction sequence
+					 * events.
+					 */
+					cs_etm__sample(etmq);
+					break;
+				case CS_ETM_TRACE_ON:
+					/*
+					 * Discontinuity in trace, flush
+					 * previous branch stack
+					 */
+					cs_etm__flush(etmq);
+					break;
+				default:
+					break;
+				}
 			}
 		} while (buffer.len > buffer_used);
 
-		/*
-		 * Generate a last branch event for the branches left in
-		 * the circular buffer at the end of the trace.
-		 */
-		if (etm->sample_instructions &&
-		    etmq->etm->synth_opts.last_branch) {
-			struct branch_stack *bs = etmq->last_branch_rb;
-			struct branch_entry *be =
-				&bs->entries[etmq->last_branch_pos];
-
-			err = cs_etm__synth_instruction_sample(
-				etmq, be->to, etmq->period_instructions);
-			if (err)
-				return err;
-		}
-
+		if (err == 0)
+			/* Flush any remaining branch stack entries */
+			err = cs_etm__flush(etmq);
 	}
 
 	return err;
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH 29/41] coresight: Update documentation for perf usage
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  (?)
@ 2018-02-16 19:17   ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Robert Walker, Mathieu Poirier,
	coresight, linux-arm-kernel, Arnaldo Carvalho de Melo

From: Robert Walker <robert.walker@arm.com>

Add notes on using perf to collect and analyze CoreSight trace

Signed-off-by: Robert Walker <robert.walker@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1518607481-4059-4-git-send-email-robert.walker@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 Documentation/trace/coresight.txt | 51 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/Documentation/trace/coresight.txt b/Documentation/trace/coresight.txt
index a33c88cd5d1d..6f0120c3a4f1 100644
--- a/Documentation/trace/coresight.txt
+++ b/Documentation/trace/coresight.txt
@@ -330,3 +330,54 @@ Details on how to use the generic STM API can be found here [2].
 
 [1]. Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
 [2]. Documentation/trace/stm.txt
+
+
+Using perf tools
+----------------
+
+perf can be used to record and analyze trace of programs.
+
+Execution can be recorded using 'perf record' with the cs_etm event,
+specifying the name of the sink to record to, e.g:
+
+    perf record -e cs_etm/@20070000.etr/u --per-thread
+
+The 'perf report' and 'perf script' commands can be used to analyze execution,
+synthesizing instruction and branch events from the instruction trace.
+'perf inject' can be used to replace the trace data with the synthesized events.
+The --itrace option controls the type and frequency of synthesized events
+(see perf documentation).
+
+Note that only 64-bit programs are currently supported - further work is
+required to support instruction decode of 32-bit Arm programs.
+
+
+Generating coverage files for Feedback Directed Optimization: AutoFDO
+---------------------------------------------------------------------
+
+'perf inject' accepts the --itrace option in which case tracing data is
+removed and replaced with the synthesized events. e.g.
+
+	perf inject --itrace --strip -i perf.data -o perf.data.new
+
+Below is an example of using ARM ETM for autoFDO.  It requires autofdo
+(https://github.com/google/autofdo) and gcc version 5.  The bubble
+sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial).
+
+	$ gcc-5 -O3 sort.c -o sort
+	$ taskset -c 2 ./sort
+	Bubble sorting array of 30000 elements
+	5910 ms
+
+	$ perf record -e cs_etm/@20070000.etr/u --per-thread taskset -c 2 ./sort
+	Bubble sorting array of 30000 elements
+	12543 ms
+	[ perf record: Woken up 35 times to write data ]
+	[ perf record: Captured and wrote 69.640 MB perf.data ]
+
+	$ perf inject -i perf.data -o inj.data --itrace=il64 --strip
+	$ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1
+	$ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
+	$ taskset -c 2 ./sort_autofdo
+	Bubble sorting array of 30000 elements
+	5806 ms
-- 
2.14.3
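
The AutoFDO example above passes --itrace=il64. According to perf's itrace documentation, this requests instruction samples that carry a last-branch stack, with the number after 'l' setting its depth (64 here, which is also the default). Decoders keep such a stack in a fixed-size circular buffer. The cs-etm.c hunk of patch 28 shows this as last_branch_rb and last_branch_pos. The sketch below is a generic, hypothetical version of that idea, not perf's code. It writes entries backwards so that reading forward from the current position yields the newest branch first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LB_MAX 64	/* assumed maximum depth, matching "l64" */

struct branch {
	uint64_t from, to;
};

struct lb_ring {
	struct branch e[LB_MAX];
	size_t size;	/* configured depth, 1..LB_MAX */
	size_t pos;	/* index of the newest entry */
	size_t nr;	/* number of valid entries, <= size */
};

static void lb_init(struct lb_ring *r, size_t size)
{
	assert(size > 0 && size <= LB_MAX);
	r->size = size;
	r->pos = 0;
	r->nr = 0;
}

/* Insert walking backwards so entries from pos onward are newest first. */
static void lb_push(struct lb_ring *r, uint64_t from, uint64_t to)
{
	r->pos = r->pos ? r->pos - 1 : r->size - 1;
	r->e[r->pos].from = from;
	r->e[r->pos].to = to;
	if (r->nr < r->size)
		r->nr++;
}

/* Copy the valid entries, newest first; out needs room for r->nr items. */
static size_t lb_copy(const struct lb_ring *r, struct branch *out)
{
	for (size_t i = 0; i < r->nr; i++)
		out[i] = r->e[(r->pos + i) % r->size];
	return r->nr;
}
```

When the buffer is full, the oldest branch is overwritten, so a sample only ever reports the most recent `size` branches.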


* [PATCH 30/41] perf report: Fix description for --mem-mode
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
                   ` (30 preceding siblings ...)
  (?)
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Andi Kleen, Jiri Olsa,
	Arnaldo Carvalho de Melo

From: Andi Kleen <ak@linux.intel.com>

The "mem-loads" event only works when PEBS is enabled, so add the "/p"
("precise") suffix to the examples.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
LPU-Reference: 20180209163909.9240-1-andi@firstfloor.org
Link: https://lkml.kernel.org/n/tip-v0gcd4u9tktrvjjsp6y7ouv4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/perf-report.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index a76b871f78a6..cba16d8a970e 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -368,7 +368,7 @@ OPTIONS
 	Use the data addresses of samples in addition to instruction addresses
 	to build the histograms.  To generate meaningful output, the perf.data
 	file must have been obtained using perf record -d -W and using a
-	special event -e cpu/mem-loads/ or -e cpu/mem-stores/. See
+	special event -e cpu/mem-loads/p or -e cpu/mem-stores/p. See
 	'perf mem' for simpler access.
 
 --percent-limit::
-- 
2.14.3
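
The "p" appended after the PMU term list, as in cpu/mem-loads/p, raises the event's precise level. In struct perf_event_attr this is the 2-bit precise_ip field: "p", "pp" and "ppp" request levels 1 to 3. The helper below is a hypothetical, simplified illustration of that mapping, not perf's event parser, and it only accepts the u, k and p modifiers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: map a modifier string such as "p" to a precise_ip
 * level. Returns -1 for modifiers this sketch does not handle, or for
 * more than three 'p' characters (precise_ip only has two bits).
 */
static int precise_level(const char *mods)
{
	int level = 0;

	assert(mods != NULL);
	for (; *mods != '\0'; mods++) {
		switch (*mods) {
		case 'p':
			level++;
			break;
		case 'u':	/* user space only */
		case 'k':	/* kernel only */
			break;
		default:
			return -1;
		}
	}
	return level <= 3 ? level : -1;
}
```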


* [PATCH 31/41] perf report: Fix wrong jump arrow
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
                   ` (31 preceding siblings ...)
  (?)
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Jin Yao, Alexander Shishkin,
	Andi Kleen, Jin Yao, Jiri Olsa, Kan Liang, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jin Yao <yao.jin@linux.intel.com>

When using the interactive annotate view of perf report, the jump arrow
is drawn at the wrong position. For example,

1. perf record -b ...
2. perf report
3. In interactive mode, select Annotate 'function'

Percent│ IPC Cycle
       │                                if (flag)
  1.37 │0.4┌──   1      ↓ je     82
       │   │                                    x += x / y + y / x;
  0.00 │0.4│  1310        movsd  (%rsp),%xmm0
  0.00 │0.4│   565        movsd  0x8(%rsp),%xmm4
       │0.4│              movsd  0x8(%rsp),%xmm1
       │0.4│              movsd  (%rsp),%xmm3
       │0.4│              divsd  %xmm4,%xmm0
  0.00 │0.4│   579        divsd  %xmm3,%xmm1
       │0.4│              movsd  (%rsp),%xmm2
       │0.4│              addsd  %xmm1,%xmm0
       │0.4│              addsd  %xmm2,%xmm0
  0.00 │0.4│              movsd  %xmm0,(%rsp)
       │   │                    volatile double x = 1212121212, y = 121212;
       │   │
       │   │                    s_randseed = time(0);
       │   │                    srand(s_randseed);
       │   │
       │   │                    for (i = 0; i < 2000000000; i++) {
  1.37 │0.4└─→      82:   sub    $0x1,%ebx
 28.21 │0.48    17      ↑ jne    38

The jump arrow in the above example is misplaced: its column should
also include the width of the IPC and Cycle columns.

With this patch, the result is:

Percent│ IPC Cycle
       │                                if (flag)
  1.37 │0.48     1     ┌──je     82
       │               │                        x += x / y + y / x;
  0.00 │0.48  1310     │  movsd  (%rsp),%xmm0
  0.00 │0.48   565     │  movsd  0x8(%rsp),%xmm4
       │0.48           │  movsd  0x8(%rsp),%xmm1
       │0.48           │  movsd  (%rsp),%xmm3
       │0.48           │  divsd  %xmm4,%xmm0
  0.00 │0.48   579     │  divsd  %xmm3,%xmm1
       │0.48           │  movsd  (%rsp),%xmm2
       │0.48           │  addsd  %xmm1,%xmm0
       │0.48           │  addsd  %xmm2,%xmm0
  0.00 │0.48           │  movsd  %xmm0,(%rsp)
       │               │        volatile double x = 1212121212, y = 121212;
       │               │
       │               │        s_randseed = time(0);
       │               │        srand(s_randseed);
       │               │
       │               │        for (i = 0; i < 2000000000; i++) {
  1.37 │0.48        82:└─→sub    $0x1,%ebx
 28.21 │0.48    17      ↑ jne    38

Committer notes:

Please note that only from LBRv5 onwards (according to Jiri), i.e. on
Skylake and later, do branch record entries carry cycle counts, so
capable hardware is needed to see the Cycles and IPC columns and to
test this patch.

While applying this I first tested it on a Broadwell class machine and
couldn't get those columns. I will add code to the annotate browser to
warn the user about that, i.e. that branch records are present but carry
no cycles, so more recent hardware is needed to get the Cycles and IPC
columns.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1517223473-14750-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/ui/browsers/annotate.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index 286427975112..e2f666391ac4 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -319,6 +319,7 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser)
 	struct map_symbol *ms = ab->b.priv;
 	struct symbol *sym = ms->sym;
 	u8 pcnt_width = annotate_browser__pcnt_width(ab);
+	int width = 0;
 
 	/* PLT symbols contain external offsets */
 	if (strstr(sym->name, "@plt"))
@@ -340,13 +341,17 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser)
 		to = (u64)btarget->idx;
 	}
 
+	if (ab->have_cycles)
+		width = IPC_WIDTH + CYCLES_WIDTH;
+
 	ui_browser__set_color(browser, HE_COLORSET_JUMP_ARROWS);
-	__ui_browser__line_arrow(browser, pcnt_width + 2 + ab->addr_width,
+	__ui_browser__line_arrow(browser,
+				 pcnt_width + 2 + ab->addr_width + width,
 				 from, to);
 
 	if (is_fused(ab, cursor)) {
 		ui_browser__mark_fused(browser,
-				       pcnt_width + 3 + ab->addr_width,
+				       pcnt_width + 3 + ab->addr_width + width,
 				       from - 1,
 				       to > from ? true : false);
 	}
-- 
2.14.3
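
The fix comes down to one extra term in the column where the arrow is drawn. The functions below restate that arithmetic from the annotate.c hunk as a standalone sketch. The IPC_WIDTH and CYCLES_WIDTH values used here are assumptions for illustration. perf defines its own values elsewhere.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed widths for illustration only; perf defines its own values. */
#define IPC_WIDTH    6
#define CYCLES_WIDTH 6

/* Column of the jump arrow, as computed after the fix. */
static int jump_arrow_col(int pcnt_width, int addr_width, bool have_cycles)
{
	int width = 0;

	/* Skip over the IPC and Cycle columns when they are displayed. */
	if (have_cycles)
		width = IPC_WIDTH + CYCLES_WIDTH;

	assert(pcnt_width >= 0 && addr_width >= 0);
	return pcnt_width + 2 + addr_width + width;
}

/* The fused-instruction marker sits one column to the right of the arrow. */
static int fused_mark_col(int pcnt_width, int addr_width, bool have_cycles)
{
	return jump_arrow_col(pcnt_width, addr_width, have_cycles) + 1;
}
```

Without cycles the result is unchanged from before the patch, which is why the bug only showed up on hardware that reports cycle counts.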


* [PATCH 32/41] perf report: Fix memory corruption in --branch-history mode --branch-history
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
                   ` (32 preceding siblings ...)
  (?)
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Jiri Olsa, Jiri Olsa,
	Alexander Shishkin, Andi Kleen, Kan Liang, Peter Zijlstra,
	Arnaldo Carvalho de Melo

From: Jiri Olsa <jolsa@redhat.com>

Jin Yao reported memory corruption in perf report when
branch info is used for the stack trace:

  > Following command lines will cause perf crash.

  > perf record -j call -g -a <application>
  > perf report --branch-history
  >
  > *** Error in `perf': double free or corruption (!prev): 0x00000000104aa040 ***
  > ======= Backtrace: =========
  > /lib/x86_64-linux-gnu/libc.so.6(+0x77725)[0x7f6b37254725]
  > /lib/x86_64-linux-gnu/libc.so.6(+0x7ff4a)[0x7f6b3725cf4a]
  > /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f6b37260abc]
  > perf[0x51b914]
  > perf(hist_entry_iter__add+0x1e5)[0x51f305]
  > perf[0x43cf01]
  > perf[0x4fa3bf]
  > perf[0x4fa923]
  > perf[0x4fd396]
  > perf[0x4f9614]
  > perf(perf_session__process_events+0x89e)[0x4fc38e]
  > perf(cmd_report+0x15d2)[0x43f202]
  > perf[0x4a059f]
  > perf(main+0x631)[0x427b71]
  > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f6b371fd830]
  > perf(_start+0x29)[0x427d89]

For the cumulative output, we allocate the he_cache array based on the
--max-stack option value and populate it with data from 'callchain_cursor'.

The --max-stack option value currently does not limit the number of
callchain_cursor nodes, so the cumulative iter code may allocate a smaller
array than is actually needed and cause the above corruption.

I think the --max-stack limit does not apply here anyway, because we add
callchain data as normal hist entries, while --max-stack controls the
callchain depth of a single entry.

Use callchain_cursor.nr as the he_cache array count to fix this. Also
remove struct hist_entry_iter::max_stack, because there's no longer any use
for it.

We need more fixes to ensure that the branch stack code properly follows the
logic of --max-stack, which is not the case at the moment.

Original-patch-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reported-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180216123619.GA9945@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/hist.c | 4 +---
 tools/perf/util/hist.h | 1 -
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index b6140950301e..44a8456cea10 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -879,7 +879,7 @@ iter_prepare_cumulative_entry(struct hist_entry_iter *iter,
 	 * cumulated only one time to prevent entries more than 100%
 	 * overhead.
 	 */
-	he_cache = malloc(sizeof(*he_cache) * (iter->max_stack + 1));
+	he_cache = malloc(sizeof(*he_cache) * (callchain_cursor.nr + 1));
 	if (he_cache == NULL)
 		return -ENOMEM;
 
@@ -1045,8 +1045,6 @@ int hist_entry_iter__add(struct hist_entry_iter *iter, struct addr_location *al,
 	if (err)
 		return err;
 
-	iter->max_stack = max_stack_depth;
-
 	err = iter->ops->prepare_entry(iter, al);
 	if (err)
 		goto out;
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 02721b579746..e869cad4d89f 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -107,7 +107,6 @@ struct hist_entry_iter {
 	int curr;
 
 	bool hide_unresolved;
-	int max_stack;
 
 	struct perf_evsel *evsel;
 	struct perf_sample *sample;
-- 
2.14.3
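
The sizing rule behind this fix can be shown on its own. The cache holds one slot for the sample's own entry plus at most one per callchain node, so callchain_cursor.nr + 1 slots always suffice, whatever --max-stack says. The sketch below is not the perf code. It uses a hypothetical struct entry with an integer id in place of real hist entries and symbol comparison, and it keeps the "accumulate each symbol only once" rule from the comment in the hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a hist entry; id plays the role of the symbol. */
struct entry {
	int id;
};

/* One slot for the sample's own entry plus one per callchain node. */
static struct entry **alloc_cache(size_t nr_nodes)
{
	return malloc(sizeof(struct entry *) * (nr_nodes + 1));
}

/* Fill the cache, skipping repeated ids; returns the slots used. */
static size_t fill_cache(struct entry **cache, struct entry *self,
			 struct entry *nodes, size_t nr_nodes)
{
	size_t n = 0;

	cache[n++] = self;
	for (size_t i = 0; i < nr_nodes; i++) {
		size_t j;

		/* Accumulate each symbol only once per sample. */
		for (j = 0; j < n; j++)
			if (cache[j]->id == nodes[i].id)
				break;
		if (j == n)
			cache[n++] = &nodes[i];
	}
	/* Never exceeds the allocation, however deep the chain is. */
	assert(n <= nr_nodes + 1);
	return n;
}
```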


* [PATCH 33/41] tools include powerpc: Grab a copy of arch/powerpc/include/uapi/asm/unistd.h
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17   ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Ravi Bangoria,
	Alexander Shishkin, Hendrik Brueckner, Jiri Olsa,
	Michael Ellerman, Namhyung Kim, Thomas Richter, linuxppc-dev,
	Arnaldo Carvalho de Melo

From: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>

Will be used for generating the syscall id/string translation table.
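
To show what such an id/string table looks like, the sketch below hand-codes the first ten entries from the unistd.h added by this patch. According to this commit message, the real table is generated from the header. The names syscall_names, syscall_name() and syscall_id() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* First ten powerpc syscall numbers, copied from the unistd.h below. */
static const char *const syscall_names[] = {
	[0] = "restart_syscall",
	[1] = "exit",
	[2] = "fork",
	[3] = "read",
	[4] = "write",
	[5] = "open",
	[6] = "close",
	[7] = "waitpid",
	[8] = "creat",
	[9] = "link",
};

#define NR_SYSCALL_NAMES (sizeof(syscall_names) / sizeof(syscall_names[0]))

/* id -> name; NULL when the id is outside the table. */
static const char *syscall_name(int id)
{
	if (id < 0 || (size_t)id >= NR_SYSCALL_NAMES)
		return NULL;
	return syscall_names[id];
}

/* name -> id by linear search; -1 when the name is unknown. */
static int syscall_id(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < NR_SYSCALL_NAMES; i++)
		if (syscall_names[i] && strcmp(syscall_names[i], name) == 0)
			return (int)i;
	return -1;
}
```

Designated initializers keep the array indexed by syscall number, so gaps in the numbering would simply leave NULL slots.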

Committer notes:

Update it already to catch up with these csets, applied since Ravi first
submitted this patch:

  3350eb2ea127 powerpc: sys_pkey_mprotect() system call
  9499ec1b5e82 powerpc: sys_pkey_alloc() and sys_pkey_free() system calls

So 'perf trace' on ppc now knows about the pkey_ syscalls.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20180129083417.31240-2-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/arch/powerpc/include/uapi/asm/unistd.h | 402 +++++++++++++++++++++++++++
 tools/perf/check-headers.sh                  |   1 +
 2 files changed, 403 insertions(+)
 create mode 100644 tools/arch/powerpc/include/uapi/asm/unistd.h

diff --git a/tools/arch/powerpc/include/uapi/asm/unistd.h b/tools/arch/powerpc/include/uapi/asm/unistd.h
new file mode 100644
index 000000000000..389c36fd8299
--- /dev/null
+++ b/tools/arch/powerpc/include/uapi/asm/unistd.h
@@ -0,0 +1,402 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/*
+ * This file contains the system call numbers.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#ifndef _UAPI_ASM_POWERPC_UNISTD_H_
+#define _UAPI_ASM_POWERPC_UNISTD_H_
+
+
+#define __NR_restart_syscall	  0
+#define __NR_exit		  1
+#define __NR_fork		  2
+#define __NR_read		  3
+#define __NR_write		  4
+#define __NR_open		  5
+#define __NR_close		  6
+#define __NR_waitpid		  7
+#define __NR_creat		  8
+#define __NR_link		  9
+#define __NR_unlink		 10
+#define __NR_execve		 11
+#define __NR_chdir		 12
+#define __NR_time		 13
+#define __NR_mknod		 14
+#define __NR_chmod		 15
+#define __NR_lchown		 16
+#define __NR_break		 17
+#define __NR_oldstat		 18
+#define __NR_lseek		 19
+#define __NR_getpid		 20
+#define __NR_mount		 21
+#define __NR_umount		 22
+#define __NR_setuid		 23
+#define __NR_getuid		 24
+#define __NR_stime		 25
+#define __NR_ptrace		 26
+#define __NR_alarm		 27
+#define __NR_oldfstat		 28
+#define __NR_pause		 29
+#define __NR_utime		 30
+#define __NR_stty		 31
+#define __NR_gtty		 32
+#define __NR_access		 33
+#define __NR_nice		 34
+#define __NR_ftime		 35
+#define __NR_sync		 36
+#define __NR_kill		 37
+#define __NR_rename		 38
+#define __NR_mkdir		 39
+#define __NR_rmdir		 40
+#define __NR_dup		 41
+#define __NR_pipe		 42
+#define __NR_times		 43
+#define __NR_prof		 44
+#define __NR_brk		 45
+#define __NR_setgid		 46
+#define __NR_getgid		 47
+#define __NR_signal		 48
+#define __NR_geteuid		 49
+#define __NR_getegid		 50
+#define __NR_acct		 51
+#define __NR_umount2		 52
+#define __NR_lock		 53
+#define __NR_ioctl		 54
+#define __NR_fcntl		 55
+#define __NR_mpx		 56
+#define __NR_setpgid		 57
+#define __NR_ulimit		 58
+#define __NR_oldolduname	 59
+#define __NR_umask		 60
+#define __NR_chroot		 61
+#define __NR_ustat		 62
+#define __NR_dup2		 63
+#define __NR_getppid		 64
+#define __NR_getpgrp		 65
+#define __NR_setsid		 66
+#define __NR_sigaction		 67
+#define __NR_sgetmask		 68
+#define __NR_ssetmask		 69
+#define __NR_setreuid		 70
+#define __NR_setregid		 71
+#define __NR_sigsuspend		 72
+#define __NR_sigpending		 73
+#define __NR_sethostname	 74
+#define __NR_setrlimit		 75
+#define __NR_getrlimit		 76
+#define __NR_getrusage		 77
+#define __NR_gettimeofday	 78
+#define __NR_settimeofday	 79
+#define __NR_getgroups		 80
+#define __NR_setgroups		 81
+#define __NR_select		 82
+#define __NR_symlink		 83
+#define __NR_oldlstat		 84
+#define __NR_readlink		 85
+#define __NR_uselib		 86
+#define __NR_swapon		 87
+#define __NR_reboot		 88
+#define __NR_readdir		 89
+#define __NR_mmap		 90
+#define __NR_munmap		 91
+#define __NR_truncate		 92
+#define __NR_ftruncate		 93
+#define __NR_fchmod		 94
+#define __NR_fchown		 95
+#define __NR_getpriority	 96
+#define __NR_setpriority	 97
+#define __NR_profil		 98
+#define __NR_statfs		 99
+#define __NR_fstatfs		100
+#define __NR_ioperm		101
+#define __NR_socketcall		102
+#define __NR_syslog		103
+#define __NR_setitimer		104
+#define __NR_getitimer		105
+#define __NR_stat		106
+#define __NR_lstat		107
+#define __NR_fstat		108
+#define __NR_olduname		109
+#define __NR_iopl		110
+#define __NR_vhangup		111
+#define __NR_idle		112
+#define __NR_vm86		113
+#define __NR_wait4		114
+#define __NR_swapoff		115
+#define __NR_sysinfo		116
+#define __NR_ipc		117
+#define __NR_fsync		118
+#define __NR_sigreturn		119
+#define __NR_clone		120
+#define __NR_setdomainname	121
+#define __NR_uname		122
+#define __NR_modify_ldt		123
+#define __NR_adjtimex		124
+#define __NR_mprotect		125
+#define __NR_sigprocmask	126
+#define __NR_create_module	127
+#define __NR_init_module	128
+#define __NR_delete_module	129
+#define __NR_get_kernel_syms	130
+#define __NR_quotactl		131
+#define __NR_getpgid		132
+#define __NR_fchdir		133
+#define __NR_bdflush		134
+#define __NR_sysfs		135
+#define __NR_personality	136
+#define __NR_afs_syscall	137 /* Syscall for Andrew File System */
+#define __NR_setfsuid		138
+#define __NR_setfsgid		139
+#define __NR__llseek		140
+#define __NR_getdents		141
+#define __NR__newselect		142
+#define __NR_flock		143
+#define __NR_msync		144
+#define __NR_readv		145
+#define __NR_writev		146
+#define __NR_getsid		147
+#define __NR_fdatasync		148
+#define __NR__sysctl		149
+#define __NR_mlock		150
+#define __NR_munlock		151
+#define __NR_mlockall		152
+#define __NR_munlockall		153
+#define __NR_sched_setparam		154
+#define __NR_sched_getparam		155
+#define __NR_sched_setscheduler		156
+#define __NR_sched_getscheduler		157
+#define __NR_sched_yield		158
+#define __NR_sched_get_priority_max	159
+#define __NR_sched_get_priority_min	160
+#define __NR_sched_rr_get_interval	161
+#define __NR_nanosleep		162
+#define __NR_mremap		163
+#define __NR_setresuid		164
+#define __NR_getresuid		165
+#define __NR_query_module	166
+#define __NR_poll		167
+#define __NR_nfsservctl		168
+#define __NR_setresgid		169
+#define __NR_getresgid		170
+#define __NR_prctl		171
+#define __NR_rt_sigreturn	172
+#define __NR_rt_sigaction	173
+#define __NR_rt_sigprocmask	174
+#define __NR_rt_sigpending	175
+#define __NR_rt_sigtimedwait	176
+#define __NR_rt_sigqueueinfo	177
+#define __NR_rt_sigsuspend	178
+#define __NR_pread64		179
+#define __NR_pwrite64		180
+#define __NR_chown		181
+#define __NR_getcwd		182
+#define __NR_capget		183
+#define __NR_capset		184
+#define __NR_sigaltstack	185
+#define __NR_sendfile		186
+#define __NR_getpmsg		187	/* some people actually want streams */
+#define __NR_putpmsg		188	/* some people actually want streams */
+#define __NR_vfork		189
+#define __NR_ugetrlimit		190	/* SuS compliant getrlimit */
+#define __NR_readahead		191
+#ifndef __powerpc64__			/* these are 32-bit only */
+#define __NR_mmap2		192
+#define __NR_truncate64		193
+#define __NR_ftruncate64	194
+#define __NR_stat64		195
+#define __NR_lstat64		196
+#define __NR_fstat64		197
+#endif
+#define __NR_pciconfig_read	198
+#define __NR_pciconfig_write	199
+#define __NR_pciconfig_iobase	200
+#define __NR_multiplexer	201
+#define __NR_getdents64		202
+#define __NR_pivot_root		203
+#ifndef __powerpc64__
+#define __NR_fcntl64		204
+#endif
+#define __NR_madvise		205
+#define __NR_mincore		206
+#define __NR_gettid		207
+#define __NR_tkill		208
+#define __NR_setxattr		209
+#define __NR_lsetxattr		210
+#define __NR_fsetxattr		211
+#define __NR_getxattr		212
+#define __NR_lgetxattr		213
+#define __NR_fgetxattr		214
+#define __NR_listxattr		215
+#define __NR_llistxattr		216
+#define __NR_flistxattr		217
+#define __NR_removexattr	218
+#define __NR_lremovexattr	219
+#define __NR_fremovexattr	220
+#define __NR_futex		221
+#define __NR_sched_setaffinity	222
+#define __NR_sched_getaffinity	223
+/* 224 currently unused */
+#define __NR_tuxcall		225
+#ifndef __powerpc64__
+#define __NR_sendfile64		226
+#endif
+#define __NR_io_setup		227
+#define __NR_io_destroy		228
+#define __NR_io_getevents	229
+#define __NR_io_submit		230
+#define __NR_io_cancel		231
+#define __NR_set_tid_address	232
+#define __NR_fadvise64		233
+#define __NR_exit_group		234
+#define __NR_lookup_dcookie	235
+#define __NR_epoll_create	236
+#define __NR_epoll_ctl		237
+#define __NR_epoll_wait		238
+#define __NR_remap_file_pages	239
+#define __NR_timer_create	240
+#define __NR_timer_settime	241
+#define __NR_timer_gettime	242
+#define __NR_timer_getoverrun	243
+#define __NR_timer_delete	244
+#define __NR_clock_settime	245
+#define __NR_clock_gettime	246
+#define __NR_clock_getres	247
+#define __NR_clock_nanosleep	248
+#define __NR_swapcontext	249
+#define __NR_tgkill		250
+#define __NR_utimes		251
+#define __NR_statfs64		252
+#define __NR_fstatfs64		253
+#ifndef __powerpc64__
+#define __NR_fadvise64_64	254
+#endif
+#define __NR_rtas		255
+#define __NR_sys_debug_setcontext 256
+/* Number 257 is reserved for vserver */
+#define __NR_migrate_pages	258
+#define __NR_mbind		259
+#define __NR_get_mempolicy	260
+#define __NR_set_mempolicy	261
+#define __NR_mq_open		262
+#define __NR_mq_unlink		263
+#define __NR_mq_timedsend	264
+#define __NR_mq_timedreceive	265
+#define __NR_mq_notify		266
+#define __NR_mq_getsetattr	267
+#define __NR_kexec_load		268
+#define __NR_add_key		269
+#define __NR_request_key	270
+#define __NR_keyctl		271
+#define __NR_waitid		272
+#define __NR_ioprio_set		273
+#define __NR_ioprio_get		274
+#define __NR_inotify_init	275
+#define __NR_inotify_add_watch	276
+#define __NR_inotify_rm_watch	277
+#define __NR_spu_run		278
+#define __NR_spu_create		279
+#define __NR_pselect6		280
+#define __NR_ppoll		281
+#define __NR_unshare		282
+#define __NR_splice		283
+#define __NR_tee		284
+#define __NR_vmsplice		285
+#define __NR_openat		286
+#define __NR_mkdirat		287
+#define __NR_mknodat		288
+#define __NR_fchownat		289
+#define __NR_futimesat		290
+#ifdef __powerpc64__
+#define __NR_newfstatat		291
+#else
+#define __NR_fstatat64		291
+#endif
+#define __NR_unlinkat		292
+#define __NR_renameat		293
+#define __NR_linkat		294
+#define __NR_symlinkat		295
+#define __NR_readlinkat		296
+#define __NR_fchmodat		297
+#define __NR_faccessat		298
+#define __NR_get_robust_list	299
+#define __NR_set_robust_list	300
+#define __NR_move_pages		301
+#define __NR_getcpu		302
+#define __NR_epoll_pwait	303
+#define __NR_utimensat		304
+#define __NR_signalfd		305
+#define __NR_timerfd_create	306
+#define __NR_eventfd		307
+#define __NR_sync_file_range2	308
+#define __NR_fallocate		309
+#define __NR_subpage_prot	310
+#define __NR_timerfd_settime	311
+#define __NR_timerfd_gettime	312
+#define __NR_signalfd4		313
+#define __NR_eventfd2		314
+#define __NR_epoll_create1	315
+#define __NR_dup3		316
+#define __NR_pipe2		317
+#define __NR_inotify_init1	318
+#define __NR_perf_event_open	319
+#define __NR_preadv		320
+#define __NR_pwritev		321
+#define __NR_rt_tgsigqueueinfo	322
+#define __NR_fanotify_init	323
+#define __NR_fanotify_mark	324
+#define __NR_prlimit64		325
+#define __NR_socket		326
+#define __NR_bind		327
+#define __NR_connect		328
+#define __NR_listen		329
+#define __NR_accept		330
+#define __NR_getsockname	331
+#define __NR_getpeername	332
+#define __NR_socketpair		333
+#define __NR_send		334
+#define __NR_sendto		335
+#define __NR_recv		336
+#define __NR_recvfrom		337
+#define __NR_shutdown		338
+#define __NR_setsockopt		339
+#define __NR_getsockopt		340
+#define __NR_sendmsg		341
+#define __NR_recvmsg		342
+#define __NR_recvmmsg		343
+#define __NR_accept4		344
+#define __NR_name_to_handle_at	345
+#define __NR_open_by_handle_at	346
+#define __NR_clock_adjtime	347
+#define __NR_syncfs		348
+#define __NR_sendmmsg		349
+#define __NR_setns		350
+#define __NR_process_vm_readv	351
+#define __NR_process_vm_writev	352
+#define __NR_finit_module	353
+#define __NR_kcmp		354
+#define __NR_sched_setattr	355
+#define __NR_sched_getattr	356
+#define __NR_renameat2		357
+#define __NR_seccomp		358
+#define __NR_getrandom		359
+#define __NR_memfd_create	360
+#define __NR_bpf		361
+#define __NR_execveat		362
+#define __NR_switch_endian	363
+#define __NR_userfaultfd	364
+#define __NR_membarrier		365
+#define __NR_mlock2		378
+#define __NR_copy_file_range	379
+#define __NR_preadv2		380
+#define __NR_pwritev2		381
+#define __NR_kexec_file_load	382
+#define __NR_statx		383
+#define __NR_pkey_alloc		384
+#define __NR_pkey_free		385
+#define __NR_pkey_mprotect	386
+
+#endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
diff --git a/tools/perf/check-headers.sh b/tools/perf/check-headers.sh
index 790ec25919a0..bf206ffe5c45 100755
--- a/tools/perf/check-headers.sh
+++ b/tools/perf/check-headers.sh
@@ -42,6 +42,7 @@ arch/parisc/include/uapi/asm/errno.h
 arch/powerpc/include/uapi/asm/errno.h
 arch/sparc/include/uapi/asm/errno.h
 arch/x86/include/uapi/asm/errno.h
+arch/powerpc/include/uapi/asm/unistd.h
 include/asm-generic/bitops/arch_hweight.h
 include/asm-generic/bitops/const_hweight.h
 include/asm-generic/bitops/__fls.h
-- 
2.14.3

* [PATCH 34/41] perf powerpc: Generate system call table from asm/unistd.h
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Ravi Bangoria,
	Alexander Shishkin, Hendrik Brueckner, Jiri Olsa,
	Michael Ellerman, Namhyung Kim, Thomas Richter, linuxppc-dev,
	Arnaldo Carvalho de Melo

From: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>

This should make new system calls usable by perf as soon as they are
introduced in the kernel, rather than waiting for libaudit updates to
include them.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20180129083417.31240-3-ravi.bangoria@linux.vnet.ibm.com
[ Made it generate syscalls_32.c as well to fix the build on 32-bit ppc ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/arch/powerpc/Makefile                   | 25 +++++++++++++++
 .../perf/arch/powerpc/entry/syscalls/mksyscalltbl  | 37 ++++++++++++++++++++++
 2 files changed, 62 insertions(+)
 create mode 100755 tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl

diff --git a/tools/perf/arch/powerpc/Makefile b/tools/perf/arch/powerpc/Makefile
index 42dab7c8f508..a111239df182 100644
--- a/tools/perf/arch/powerpc/Makefile
+++ b/tools/perf/arch/powerpc/Makefile
@@ -6,3 +6,28 @@ endif
 HAVE_KVM_STAT_SUPPORT := 1
 PERF_HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET := 1
 PERF_HAVE_JITDUMP := 1
+
+#
+# Syscall table generation for perf
+#
+
+out    := $(OUTPUT)arch/powerpc/include/generated/asm
+header32 := $(out)/syscalls_32.c
+header64 := $(out)/syscalls_64.c
+sysdef := $(srctree)/tools/arch/powerpc/include/uapi/asm/unistd.h
+sysprf := $(srctree)/tools/perf/arch/powerpc/entry/syscalls/
+systbl := $(sysprf)/mksyscalltbl
+
+# Create output directory if not already present
+_dummy := $(shell [ -d '$(out)' ] || mkdir -p '$(out)')
+
+$(header64): $(sysdef) $(systbl)
+	$(Q)$(SHELL) '$(systbl)' '64' '$(CC)' $(sysdef) > $@
+
+$(header32): $(sysdef) $(systbl)
+	$(Q)$(SHELL) '$(systbl)' '32' '$(CC)' $(sysdef) > $@
+
+clean::
+	$(call QUIET_CLEAN, powerpc) $(RM) $(header32) $(header64)
+
+archheaders: $(header32) $(header64)
diff --git a/tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl b/tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl
new file mode 100755
index 000000000000..ef52e1dd694b
--- /dev/null
+++ b/tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl
@@ -0,0 +1,37 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# Generate system call table for perf. Derived from
+# s390 script.
+#
+# Copyright IBM Corp. 2017
+# Author(s):  Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
+# Changed by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
+
+wordsize=$1
+gcc=$2
+input=$3
+
+if ! test -r $input; then
+	echo "Could not read input file" >&2
+	exit 1
+fi
+
+create_table()
+{
+	local wordsize=$1
+	local max_nr
+
+	echo "static const char *syscalltbl_powerpc_${wordsize}[] = {"
+	while read sc nr; do
+		printf '\t[%d] = "%s",\n' $nr $sc
+		max_nr=$nr
+	done
+	echo '};'
+	echo "#define SYSCALLTBL_POWERPC_${wordsize}_MAX_ID $max_nr"
+}
+
+$gcc -m${wordsize} -E -dM -x c  $input	       \
+	|sed -ne 's/^#define __NR_//p' \
+	|sort -t' ' -k2 -nu	       \
+	|create_table ${wordsize}
-- 
2.14.3
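
mksyscalltbl runs a pipeline of several stages. First, passing `-m32` or `-m64` to a powerpc compiler changes whether `__powerpc64__` is defined, so the preprocessor keeps only the entries for that word size (for example `newfstatat` versus `fstatat64` at 291). Second, `-dM` prints each surviving macro as `#define NAME VALUE`. Then `sed` keeps the `__NR_` lines and `sort -t' ' -k2 -nu` orders them by number, dropping repeated numbers. The C sketch below mimics only these last stages on already-preprocessed lines and emits a table in the same shape as `create_table()`. The function name `emit_table`, the `MAX_SYSCALLS` limit and the fixed-size buffers are assumptions for illustration. The sketch keeps the first name seen for a repeated number, and GNU `sort -u` is not guaranteed to choose the same one.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SYSCALLS 512	/* assumption: enough for this header */

struct sc {
	char name[64];
	int nr;
	int order;	/* input position, used to keep the first duplicate */
};

static int by_nr(const void *a, const void *b)
{
	const struct sc *x = a, *y = b;

	if (x->nr != y->nr)
		return x->nr < y->nr ? -1 : 1;
	return x->order - y->order;
}

/*
 * Read `gcc -dM`-style lines from in, keep "#define __NR_<name> <nr>",
 * sort them numerically, drop repeated numbers and print a C table.
 * Returns the highest syscall number emitted, or -1 if none was found.
 */
static int emit_table(FILE *in, FILE *out, int wordsize)
{
	static struct sc tbl[MAX_SYSCALLS];
	char line[256];
	int n = 0, max_nr = -1;

	while (n < MAX_SYSCALLS && fgets(line, sizeof(line), in)) {
		/* Same filter as: sed -ne 's/^#define __NR_//p' */
		if (strncmp(line, "#define __NR_", 13) != 0)
			continue;
		if (sscanf(line + 13, "%63s %d", tbl[n].name, &tbl[n].nr) != 2)
			continue;
		tbl[n].order = n;
		n++;
	}

	/* Same ordering as: sort -t' ' -k2 -n */
	qsort(tbl, n, sizeof(tbl[0]), by_nr);

	fprintf(out, "static const char *syscalltbl_powerpc_%d[] = {\n", wordsize);
	for (int i = 0; i < n; i++) {
		if (i > 0 && tbl[i].nr == tbl[i - 1].nr)
			continue;	/* -u: keep one name per number */
		fprintf(out, "\t[%d] = \"%s\",\n", tbl[i].nr, tbl[i].name);
		max_nr = tbl[i].nr;
	}
	fprintf(out, "};\n");
	fprintf(out, "#define SYSCALLTBL_POWERPC_%d_MAX_ID %d\n", wordsize, max_nr);
	return max_nr;
}
```

If fed the real 64-bit output, the table would end at `[386] = "pkey_mprotect"`. Numbers the header never defines for that word size (for example 224, 257 and 366 to 377) stay NULL, because array elements that a designated initializer does not name are zero-initialized.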

* [PATCH 35/41] perf trace powerpc: Use generated syscall table
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Ravi Bangoria,
	Alexander Shishkin, Hendrik Brueckner, Jiri Olsa,
	Michael Ellerman, Namhyung Kim, Thomas Richter, linuxppc-dev,
	Arnaldo Carvalho de Melo

From: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>

This should make new system calls usable by perf as soon as they are
introduced in the kernel, rather than waiting for libaudit updates to
include them.

It also enables users to specify wildcards, for example, perf trace -e
'open*', just like was already possible on x86 and s390.
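A minimal standalone sketch of wildcard matching over a sparse table in the layout mksyscalltbl generates. It uses POSIX fnmatch(3). The table entries, `DEMO_MAX_ID` and `demo_glob_next()` are hypothetical and illustrative, and they do not reproduce perf's own lookup code in tools/perf/util/syscalltbl.c.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/*
 * Hypothetical excerpt in the "[nr] = \"name\"" layout emitted by
 * mksyscalltbl. Slots that are not listed are NULL, so lookups must
 * skip the holes.
 */
static const char *demo_syscalltbl[] = {
	[3]   = "read",
	[4]   = "write",
	[5]   = "open",
	[286] = "openat",
};
#define DEMO_MAX_ID 286

/*
 * Return the first syscall number >= *idx whose name matches the
 * shell-style glob and advance *idx past it; return -1 when no
 * further entry matches.
 */
static int demo_glob_next(const char *glob, int *idx)
{
	assert(glob != NULL && idx != NULL && *idx >= 0);

	for (int nr = *idx; nr <= DEMO_MAX_ID; nr++) {
		const char *name = demo_syscalltbl[nr];

		if (name != NULL && fnmatch(glob, name, 0) == 0) {
			*idx = nr + 1;
			return nr;
		}
	}
	*idx = DEMO_MAX_ID + 1;
	return -1;
}
```

Calling it repeatedly with the same `idx`, starting from 0, yields 5 and then 286 for "open*", after which it returns -1.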

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20180129083417.31240-4-ravi.bangoria@linux.vnet.ibm.com
[ Do it for ppc32 as well ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Makefile.config   | 2 ++
 tools/perf/util/syscalltbl.c | 8 ++++++++
 2 files changed, 10 insertions(+)

diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index 0dfdaa9fa81e..577a5d2988fe 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -27,6 +27,8 @@ NO_SYSCALL_TABLE := 1
 # Additional ARCH settings for ppc
 ifeq ($(SRCARCH),powerpc)
   NO_PERF_REGS := 0
+  NO_SYSCALL_TABLE := 0
+  CFLAGS += -I$(OUTPUT)arch/powerpc/include/generated
   LIBUNWIND_LIBS := -lunwind -lunwind-ppc64
 endif
 
diff --git a/tools/perf/util/syscalltbl.c b/tools/perf/util/syscalltbl.c
index 303bdb84ab5a..895122d638dd 100644
--- a/tools/perf/util/syscalltbl.c
+++ b/tools/perf/util/syscalltbl.c
@@ -30,6 +30,14 @@ static const char **syscalltbl_native = syscalltbl_x86_64;
 #include <asm/syscalls_64.c>
 const int syscalltbl_native_max_id = SYSCALLTBL_S390_64_MAX_ID;
 static const char **syscalltbl_native = syscalltbl_s390_64;
+#elif defined(__powerpc64__)
+#include <asm/syscalls_64.c>
+const int syscalltbl_native_max_id = SYSCALLTBL_POWERPC_64_MAX_ID;
+static const char **syscalltbl_native = syscalltbl_powerpc_64;
+#elif defined(__powerpc__)
+#include <asm/syscalls_32.c>
+const int syscalltbl_native_max_id = SYSCALLTBL_POWERPC_32_MAX_ID;
+static const char **syscalltbl_native = syscalltbl_powerpc_32;
 #endif
 
 struct syscall {
-- 
2.14.3
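
The order of the `#elif` tests in syscalltbl.c matters: on 64-bit PowerPC, GCC defines `__powerpc__` as well as `__powerpc64__`, so the `__powerpc64__` test has to come first or a ppc64 build would pick the 32-bit table. A small sketch that mirrors only the ordering of that chain; `native_syscalltbl_name()` is hypothetical and just returns the name of the table the real code would reference.

```c
#include <assert.h>
#include <string.h>

/*
 * Simplified mirror of the architecture chain in util/syscalltbl.c.
 * __powerpc64__ is tested before __powerpc__ because 64-bit PowerPC
 * builds define both macros.
 */
static const char *native_syscalltbl_name(void)
{
#if defined(__x86_64__)
	return "syscalltbl_x86_64";
#elif defined(__s390x__)
	return "syscalltbl_s390_64";
#elif defined(__powerpc64__)
	return "syscalltbl_powerpc_64";
#elif defined(__powerpc__)
	return "syscalltbl_powerpc_32";
#else
	return NULL;	/* other architectures: not covered by this sketch */
#endif
}
```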


* [PATCH 36/41] perf record: Provide detailed information on s390 CPU
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Thomas Richter, Heiko Carstens,
	Martin Schwidefsky, Arnaldo Carvalho de Melo

From: Thomas Richter <tmricht@linux.vnet.ibm.com>

When 'perf record' is set up to record data, the s390 CPU information
was a fixed string, "IBM/S390".

Replace this string with one containing more information about the
machine. The information included in the cpuid is a comma-separated
list:

   manufacturer,type,model-capacity,model[,version,authorization]
with

- manufacturer: name of the manufacturer (IBM), up to 16 bytes.
- type: a four-digit number referring to the machine
  generation.
- model-capacity: up to 16 characters describing the number
  of CPUs etc.
- model: up to 16 characters describing the model.
- version: the CPU-MF counter facility version number,
  available on LPARs only, omitted on z/VM guests.
- authorization: the CPU-MF counter facility authorization level,
  available on LPARs only, omitted on z/VM guests.

Before:

  [root@s8360047 perf]# ./perf record -- sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.001 MB perf.data (4 samples) ]
  [root@s8360047 perf]# ./perf report --header | fgrep cpuid
   # cpuid : IBM/S390
  [root@s8360047 perf]#

After:

  [root@s35lp76 perf]# ./perf report --header|fgrep cpuid
   # cpuid : IBM,3906,704,M03,3.5,002f
  [root@s35lp76 perf]#
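
A sketch of how the final cpuid string can be assembled and length-checked, assuming the individual fields have already been read. As in the example above, the model field already holds model capacity and model joined by a comma ("704,M03"), and the two CPU-MF fields are appended only when both are present. `format_cpuid()` is a hypothetical name, not the patch's get_cpuid().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "manufacturer,type,model[,version,authorization]" into buf.
 * The CPU-MF fields are added only when both are non-empty (LPAR);
 * otherwise the short form is used (e.g. z/VM guests).
 * Returns -1 if the result would not fit into sz bytes, 0 otherwise.
 */
static int format_cpuid(char *buf, size_t sz, const char *manufacturer,
			const char *type, const char *model,
			const char *version, const char *authorization)
{
	int n;

	if (version && version[0] && authorization && authorization[0])
		n = snprintf(buf, sz, "%s,%s,%s,%s,%s", manufacturer, type,
			     model, version, authorization);
	else
		n = snprintf(buf, sz, "%s,%s,%s", manufacturer, type, model);

	return (n < 0 || (size_t)n >= sz) ? -1 : 0;
}
```

Checking the snprintf() return value against the buffer size is what turns silent truncation into an error, matching the `nbytes >= sz` test in the patch.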

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20180213151419.80737-1-tmricht@linux.vnet.ibm.com
[ Use scnprintf instead of strncat to fix build errors on gcc GNU C99 5.4.0 20160609 -march=zEC12 -m64 -mzarch -ggdb3 -O6 -std=gnu99 -fPIC -fno-omit-frame-pointer -funwind-tables -fstack-protector-all ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/arch/s390/util/header.c | 130 +++++++++++++++++++++++++++++++++++--
 1 file changed, 125 insertions(+), 5 deletions(-)

diff --git a/tools/perf/arch/s390/util/header.c b/tools/perf/arch/s390/util/header.c
index 9fa6c3e5782c..a78064c25ced 100644
--- a/tools/perf/arch/s390/util/header.c
+++ b/tools/perf/arch/s390/util/header.c
@@ -1,8 +1,9 @@
 /*
  * Implementation of get_cpuid().
  *
- * Copyright 2014 IBM Corp.
+ * Copyright IBM Corp. 2014, 2018
  * Author(s): Alexander Yarygin <yarygin@linux.vnet.ibm.com>
+ *	      Thomas Richter <tmricht@linux.vnet.ibm.com>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License (version 2 only)
@@ -13,16 +14,135 @@
 #include <unistd.h>
 #include <stdio.h>
 #include <string.h>
+#include <ctype.h>
 
 #include "../../util/header.h"
+#include "../../util/util.h"
+
+#define SYSINFO_MANU	"Manufacturer:"
+#define SYSINFO_TYPE	"Type:"
+#define SYSINFO_MODEL	"Model:"
+#define SRVLVL_CPUMF	"CPU-MF:"
+#define SRVLVL_VERSION	"version="
+#define SRVLVL_AUTHORIZATION	"authorization="
+#define SYSINFO		"/proc/sysinfo"
+#define SRVLVL		"/proc/service_levels"
 
 int get_cpuid(char *buffer, size_t sz)
 {
-	const char *cpuid = "IBM/S390";
+	char *cp, *line = NULL, *line2;
+	char type[8], model[33], version[8], manufacturer[32], authorization[8];
+	int tpsize = 0, mdsize = 0, vssize = 0, mfsize = 0, atsize = 0;
+	int read;
+	unsigned long line_sz;
+	size_t nbytes;
+	FILE *sysinfo;
+
+	/*
+	 * Scan /proc/sysinfo line by line and read out values for
+	 * Manufacturer:, Type: and Model:, for example:
+	 * Manufacturer:    IBM
+	 * Type:            2964
+	 * Model:           702              N96
+	 * The first word is the Model Capacity and the second word is
+	 * Model (can be omitted). Both words have a maximum size of 16
+	 * bytes.
+	 */
+	memset(manufacturer, 0, sizeof(manufacturer));
+	memset(type, 0, sizeof(type));
+	memset(model, 0, sizeof(model));
+	memset(version, 0, sizeof(version));
+	memset(authorization, 0, sizeof(authorization));
+
+	sysinfo = fopen(SYSINFO, "r");
+	if (sysinfo == NULL)
+		return -1;
+
+	while ((read = getline(&line, &line_sz, sysinfo)) != -1) {
+		if (!strncmp(line, SYSINFO_MANU, strlen(SYSINFO_MANU))) {
+			line2 = line + strlen(SYSINFO_MANU);
+
+			while ((cp = strtok_r(line2, "\n ", &line2))) {
+				mfsize += scnprintf(manufacturer + mfsize,
+						    sizeof(manufacturer) - mfsize, "%s", cp);
+			}
+		}
+
+		if (!strncmp(line, SYSINFO_TYPE, strlen(SYSINFO_TYPE))) {
+			line2 = line + strlen(SYSINFO_TYPE);
+
+			while ((cp = strtok_r(line2, "\n ", &line2))) {
+				tpsize += scnprintf(type + tpsize,
+						    sizeof(type) - tpsize, "%s", cp);
+			}
+		}
+
+		if (!strncmp(line, SYSINFO_MODEL, strlen(SYSINFO_MODEL))) {
+			line2 = line + strlen(SYSINFO_MODEL);
+
+			while ((cp = strtok_r(line2, "\n ", &line2))) {
+				mdsize += scnprintf(model + mdsize, sizeof(model) - mdsize,
+						    "%s%s", model[0] ? "," : "", cp);
+			}
+			break;
+		}
+	}
+	fclose(sysinfo);
 
-	if (strlen(cpuid) + 1 > sz)
+	/* Missing manufacturer, type or model information should not happen */
+	if (!manufacturer[0] || !type[0] || !model[0])
 		return -1;
 
-	strcpy(buffer, cpuid);
-	return 0;
+	/*
+	 * Scan /proc/service_levels and return the CPU-MF counter facility
+	 * version number and authorization level.
+	 * Optional, does not exist on z/VM guests.
+	 */
+	sysinfo = fopen(SRVLVL, "r");
+	if (sysinfo == NULL)
+		goto skip_sysinfo;
+	while ((read = getline(&line, &line_sz, sysinfo)) != -1) {
+		if (strncmp(line, SRVLVL_CPUMF, strlen(SRVLVL_CPUMF)))
+			continue;
+
+		line2 = line + strlen(SRVLVL_CPUMF);
+		while ((cp = strtok_r(line2, "\n ", &line2))) {
+			if (!strncmp(cp, SRVLVL_VERSION,
+				     strlen(SRVLVL_VERSION))) {
+				char *sep = strchr(cp, '=');
+
+				vssize += scnprintf(version + vssize,
+						    sizeof(version) - vssize, "%s", sep + 1);
+			}
+			if (!strncmp(cp, SRVLVL_AUTHORIZATION,
+				     strlen(SRVLVL_AUTHORIZATION))) {
+				char *sep = strchr(cp, '=');
+
+				atsize += scnprintf(authorization + atsize,
+						    sizeof(authorization) - atsize, "%s", sep + 1);
+			}
+		}
+	}
+	fclose(sysinfo);
+
+skip_sysinfo:
+	free(line);
+
+	if (version[0] && authorization[0] )
+		nbytes = snprintf(buffer, sz, "%s,%s,%s,%s,%s",
+				  manufacturer, type, model, version,
+				  authorization);
+	else
+		nbytes = snprintf(buffer, sz, "%s,%s,%s", manufacturer, type,
+				  model);
+	return (nbytes >= sz) ? -1 : 0;
+}
+
+char *get_cpuid_str(struct perf_pmu *pmu __maybe_unused)
+{
+	char *buf = malloc(128);
+
+	if (buf && get_cpuid(buf, 128) < 0)
+		zfree(&buf);
+	return buf;
 }
-- 
2.14.3
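
get_cpuid() above pulls each value out of /proc/sysinfo by matching a key prefix and concatenating the whitespace-separated words that follow. A standalone sketch of that step, assuming lines look like the `Model:` example in the code comment; the helper name `sysinfo_value()` is hypothetical. The point it illustrates is that every append is bounded by the size of the buffer actually being written, so a long value is truncated rather than overflowing.

```c
#define _POSIX_C_SOURCE 200809L	/* for strtok_r() */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * If line starts with key (e.g. "Model:"), copy the words that follow
 * into out, joined by sep, and return 0; otherwise return -1.
 * Appends are bounded by outsz, the size of out itself.
 */
static int sysinfo_value(const char *line, const char *key, const char *sep,
			 char *out, size_t outsz)
{
	char copy[256], *save = NULL, *word;
	size_t used = 0;

	assert(outsz > 0);
	if (strncmp(line, key, strlen(key)) != 0)
		return -1;

	snprintf(copy, sizeof(copy), "%s", line + strlen(key));
	out[0] = '\0';
	for (word = strtok_r(copy, " \n", &save); word;
	     word = strtok_r(NULL, " \n", &save)) {
		int n = snprintf(out + used, outsz - used, "%s%s",
				 used ? sep : "", word);

		if (n < 0 || (size_t)n >= outsz - used)
			break;		/* truncated: stop appending */
		used += n;
	}
	return 0;
}
```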


* [PATCH 37/41] perf annotate: Scan cpuid for s390 and save machine type
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Thomas Richter, Heiko Carstens,
	Martin Schwidefsky, Arnaldo Carvalho de Melo

From: Thomas Richter <tmricht@linux.vnet.ibm.com>

Scan the cpuid string and extract the type number for later use.
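
A sketch of the family extraction, using the same scan format as s390__cpuid_parse() but with explicit field widths. The cpuid fields can be up to 16 characters, so each buffer here has room for 16 characters plus the terminating NUL, and `%16[^,]` cannot write past it. `s390_cpuid_family()` is a hypothetical name; like the patch, it requires at least the family and model-capacity fields to convert.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse "IBM,family,model-capacity,model[,version,authorization]".
 * Returns the number of converted fields (2..5), or -1 if the family
 * and model capacity could not both be read.
 */
static int s390_cpuid_family(const char *cpuid, unsigned int *family)
{
	char model_c[17], model[17], cpumf_v[17], cpumf_a[17];
	int ret;

	assert(cpuid != NULL && family != NULL);
	ret = sscanf(cpuid, "%*[^,],%u,%16[^,],%16[^,],%16[^,],%16s",
		     family, model_c, model, cpumf_v, cpumf_a);
	return ret >= 2 ? ret : -1;
}
```

On an LPAR string all five conversions succeed; on a z/VM guest the string stops after the model, so three conversions succeed and the family is still available.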

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20180213151419.80737-2-tmricht@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/arch/s390/annotate/instructions.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/tools/perf/arch/s390/annotate/instructions.c b/tools/perf/arch/s390/annotate/instructions.c
index 8c72b44444cb..01df9d8303e1 100644
--- a/tools/perf/arch/s390/annotate/instructions.c
+++ b/tools/perf/arch/s390/annotate/instructions.c
@@ -23,12 +23,37 @@ static struct ins_ops *s390__associate_ins_ops(struct arch *arch, const char *na
 	return ops;
 }
 
+static int s390__cpuid_parse(struct arch *arch, char *cpuid)
+{
+	unsigned int family;
+	char model[16], model_c[16], cpumf_v[16], cpumf_a[16];
+	int ret;
+
+	/*
+	 * cpuid string format:
+	 * "IBM,family,model-capacity,model[,cpum_cf-version,cpum_cf-authorization]"
+	 */
+	ret = sscanf(cpuid, "%*[^,],%u,%[^,],%[^,],%[^,],%s", &family, model_c,
+		     model, cpumf_v, cpumf_a);
+	if (ret >= 2) {
+		arch->family = family;
+		arch->model = 0;
+		return 0;
+	}
+
+	return -1;
+}
+
 static int s390__annotate_init(struct arch *arch, char *cpuid __maybe_unused)
 {
+	int err = 0;
+
 	if (!arch->initialized) {
 		arch->initialized = true;
 		arch->associate_instruction_ops = s390__associate_ins_ops;
+		if (cpuid)
+			err = s390__cpuid_parse(arch, cpuid);
 	}
 
-	return 0;
+	return err;
 }
-- 
2.14.3


* [PATCH 38/41] perf cpuid: Introduce a platform specific cpuid compare function
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Thomas Richter, Heiko Carstens,
	Martin Schwidefsky, Arnaldo Carvalho de Melo

From: Thomas Richter <tmricht@linux.vnet.ibm.com>

The function get_cpuid_str() is called by perf_pmu__getcpuid() and, on
s390, returns a complete description of the CPU and its capabilities as
a comma-separated list.

To map the CPU type to the value defined in
pmu-events/arch/s390/mapfile.csv, introduce an architecture-specific
cpuid compare function named strcmp_cpuid_str().

The currently used regex algorithm becomes the weak default and is used
when no platform-specific function is defined, so the current behavior
is unchanged.
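
Both comparison strategies can be tried side by side in a standalone sketch. `cmp_regex()` follows the weak default (the mapfile entry is an extended regex that must cover the whole cpuid string) and `cmp_s390()` follows the s390 override (compare against the field after the first comma). The function names and sample strings are illustrative only.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Default: mapcpuid is an extended regex that must match all of cpuid. */
static int cmp_regex(const char *mapcpuid, const char *cpuid)
{
	regex_t re;
	regmatch_t pmatch[1];
	int match;

	if (regcomp(&re, mapcpuid, REG_EXTENDED) != 0)
		return 1;	/* an invalid pattern never matches */

	match = regexec(&re, cpuid, 1, pmatch, 0) == 0;
	regfree(&re);
	if (match && pmatch[0].rm_so == 0 &&
	    (size_t)pmatch[0].rm_eo == strlen(cpuid))
		return 0;
	return 1;
}

/* s390 variant: compare the entry with the field after the first comma. */
static int cmp_s390(const char *mapcpuid, const char *cpuid)
{
	const char *cp = strchr(cpuid, ',');

	if (cp == NULL)
		return -1;
	return strncmp(cp + 1, mapcpuid, strlen(mapcpuid));
}
```

Because the s390 variant compares only strlen(mapcpuid) bytes, it is a prefix match: a mapfile entry "390" would also match type 3906.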

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20180213151419.80737-3-tmricht@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/arch/s390/util/header.c | 18 +++++++++++++++
 tools/perf/util/header.h           |  1 +
 tools/perf/util/pmu.c              | 47 +++++++++++++++++++++++---------------
 3 files changed, 48 insertions(+), 18 deletions(-)

diff --git a/tools/perf/arch/s390/util/header.c b/tools/perf/arch/s390/util/header.c
index a78064c25ced..231294b80dc4 100644
--- a/tools/perf/arch/s390/util/header.c
+++ b/tools/perf/arch/s390/util/header.c
@@ -146,3 +146,21 @@ char *get_cpuid_str(struct perf_pmu *pmu __maybe_unused)
 		zfree(&buf);
 	return buf;
 }
+
+/*
+ * Compare the cpuid string returned by get_cpuid() function
+ * with the name generated by the jevents file read from
+ * pmu-events/arch/s390/mapfile.csv.
+ *
+ * Parameter mapcpuid is the cpuid as stored in the
+ * pmu-events/arch/s390/mapfile.csv. This is just the type number.
+ * Parameter cpuid is the cpuid returned by function get_cpuid().
+ */
+int strcmp_cpuid_str(const char *mapcpuid, const char *cpuid)
+{
+	char *cp = strchr(cpuid, ',');
+
+	if (cp == NULL)
+		return -1;
+	return strncmp(cp + 1, mapcpuid, strlen(mapcpuid));
+}
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index f28aaaa3a440..942bdec6d70d 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -174,4 +174,5 @@ int write_padded(struct feat_fd *fd, const void *bf,
 int get_cpuid(char *buffer, size_t sz);
 
 char *get_cpuid_str(struct perf_pmu *pmu __maybe_unused);
+int strcmp_cpuid_str(const char *s1, const char *s2);
 #endif /* __PERF_HEADER_H */
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 57e38fdf0b34..1111d5bf15ca 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -576,6 +576,34 @@ char * __weak get_cpuid_str(struct perf_pmu *pmu __maybe_unused)
 	return NULL;
 }
 
+/* Return zero when the cpuid from the mapfile.csv matches the
+ * cpuid string generated on this platform.
+ * Otherwise return non-zero.
+ */
+int __weak strcmp_cpuid_str(const char *mapcpuid, const char *cpuid)
+{
+	regex_t re;
+	regmatch_t pmatch[1];
+	int match;
+
+	if (regcomp(&re, mapcpuid, REG_EXTENDED) != 0) {
+		/* Warn unable to generate match particular string. */
+		pr_info("Invalid regular expression %s\n", mapcpuid);
+		return 1;
+	}
+
+	match = !regexec(&re, cpuid, 1, pmatch, 0);
+	regfree(&re);
+	if (match) {
+		size_t match_len = (pmatch[0].rm_eo - pmatch[0].rm_so);
+
+		/* Verify the entire string matched. */
+		if (match_len == strlen(cpuid))
+			return 0;
+	}
+	return 1;
+}
+
 static char *perf_pmu__getcpuid(struct perf_pmu *pmu)
 {
 	char *cpuid;
@@ -610,31 +638,14 @@ struct pmu_events_map *perf_pmu__find_map(struct perf_pmu *pmu)
 
 	i = 0;
 	for (;;) {
-		regex_t re;
-		regmatch_t pmatch[1];
-		int match;
-
 		map = &pmu_events_map[i++];
 		if (!map->table) {
 			map = NULL;
 			break;
 		}
 
-		if (regcomp(&re, map->cpuid, REG_EXTENDED) != 0) {
-			/* Warn unable to generate match particular string. */
-			pr_info("Invalid regular expression %s\n", map->cpuid);
+		if (!strcmp_cpuid_str(map->cpuid, cpuid))
 			break;
-		}
-
-		match = !regexec(&re, cpuid, 1, pmatch, 0);
-		regfree(&re);
-		if (match) {
-			size_t match_len = (pmatch[0].rm_eo - pmatch[0].rm_so);
-
-			/* Verify the entire string matched. */
-			if (match_len == strlen(cpuid))
-				break;
-		}
 	}
 	free(cpuid);
 	return map;
-- 
2.14.3


* [PATCH 39/41] perf test: Fix test case 23 for s390 z/VM or KVM guests
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Thomas Richter, Heiko Carstens,
	Martin Schwidefsky, Arnaldo Carvalho de Melo

From: Thomas Richter <tmricht@linux.vnet.ibm.com>

On s390, perf can be executed on an LPAR with support for hardware
events (e.g. cycles), or on a z/VM or KVM guest where no hardware events
are supported. In the latter environment, use the software event
cpu-clock for this test case.

Use the cpuid infrastructure functions to determine the cpuid on s390,
which indicates whether the CPU counter facility is available.
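
The event choice in do_determine_event() can be sketched as a pure function of the cpuid string. The authorization bit values (0x4 checked when kernel samples are excluded, 0x2 otherwise) are taken from the patch, not independently verified, and `pick_event()` is a hypothetical name. One detail differs deliberately: the patch declares `cpum_cf_a` as `int`, while `%x` expects an `unsigned int *`, so the sketch uses `unsigned int`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return "cycles"/"cycles:u" when the cpuid carries all five fields and
 * the relevant authorization bit is set; otherwise fall back to the
 * software event cpu-clock.
 */
static const char *pick_event(const char *cpuid, bool excl_kernel)
{
	char model_c[17], model[17], cpum_cf_v[17];
	unsigned int family, cpum_cf_a;	/* %x needs an unsigned int */

	assert(cpuid != NULL);
	if (sscanf(cpuid, "%*[^,],%u,%16[^,],%16[^,],%16[^,],%x", &family,
		   model_c, model, cpum_cf_v, &cpum_cf_a) == 5) {
		if (excl_kernel && (cpum_cf_a & 4))
			return "cycles:u";
		if (!excl_kernel && (cpum_cf_a & 2))
			return "cycles";
	}
	return excl_kernel ? "cpu-clock:u" : "cpu-clock";
}
```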

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20180213151419.80737-4-tmricht@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/tests/code-reading.c | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/tools/perf/tests/code-reading.c b/tools/perf/tests/code-reading.c
index 3bf7b145b826..c7115d369511 100644
--- a/tools/perf/tests/code-reading.c
+++ b/tools/perf/tests/code-reading.c
@@ -482,6 +482,34 @@ static void fs_something(void)
 	}
 }
 
+static const char *do_determine_event(bool excl_kernel)
+{
+	const char *event = excl_kernel ? "cycles:u" : "cycles";
+
+#ifdef __s390x__
+	char cpuid[128], model[16], model_c[16], cpum_cf_v[16];
+	unsigned int family;
+	int ret, cpum_cf_a;
+
+	if (get_cpuid(cpuid, sizeof(cpuid)))
+		goto out_clocks;
+	ret = sscanf(cpuid, "%*[^,],%u,%[^,],%[^,],%[^,],%x", &family, model_c,
+		     model, cpum_cf_v, &cpum_cf_a);
+	if (ret != 5)		 /* Not available */
+		goto out_clocks;
+	if (excl_kernel && (cpum_cf_a & 4))
+		return event;
+	if (!excl_kernel && (cpum_cf_a & 2))
+		return event;
+
+	/* Fall through: missing authorization */
+out_clocks:
+	event = excl_kernel ? "cpu-clock:u" : "cpu-clock";
+
+#endif
+	return event;
+}
+
 static void do_something(void)
 {
 	fs_something();
@@ -592,10 +620,7 @@ static int do_test_code_reading(bool try_kcore)
 
 		perf_evlist__set_maps(evlist, cpus, threads);
 
-		if (excl_kernel)
-			str = "cycles:u";
-		else
-			str = "cycles";
+		str = do_determine_event(excl_kernel);
 		pr_debug("Parsing event '%s'\n", str);
 		ret = parse_events(evlist, str, NULL);
 		if (ret < 0) {
-- 
2.14.3


* [PATCH 40/41] perf test: Fix test case inet_pton to accept inlines.
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Thomas Richter, Heiko Carstens,
	Hendrik Brueckner, Martin Schwidefsky, Arnaldo Carvalho de Melo

From: Thomas Richter <tmricht@linux.vnet.ibm.com>

With Fedora 27 and the latest Linux kernel, the test case
trace+probe_libc_inet_pton.sh fails again on s390. This time it is the
inlining of functions that does not match: after a glibc update (from
2.26-16 to 2.26-24) the output is different.

The expected output is:

             __inet_pton (/usr/lib64/libc-2.26.so)
             gaih_inet (inlined)
             ....

The actual output is:

  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 0.061/0.061/0.061/0.000 ms
       0.000 probe_libc:inet_pton:(3ffb2140448))
             __inet_pton (inlined)
             gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
             ...

Fix this by being less strict about 'inlined' versus the library name
and accepting both.

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20180214070303.55757-1-tmricht@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/tests/shell/trace+probe_libc_inet_pton.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/perf/tests/shell/trace+probe_libc_inet_pton.sh b/tools/perf/tests/shell/trace+probe_libc_inet_pton.sh
index c446c894b297..8c4ab0b390c0 100755
--- a/tools/perf/tests/shell/trace+probe_libc_inet_pton.sh
+++ b/tools/perf/tests/shell/trace+probe_libc_inet_pton.sh
@@ -21,12 +21,12 @@ trace_libc_inet_pton_backtrace() {
 	expected[3]=".*packets transmitted.*"
 	expected[4]="rtt min.*"
 	expected[5]="[0-9]+\.[0-9]+[[:space:]]+probe_libc:inet_pton:\([[:xdigit:]]+\)"
-	expected[6]=".*inet_pton[[:space:]]\($libc\)$"
+	expected[6]=".*inet_pton[[:space:]]\($libc|inlined\)$"
 	case "$(uname -m)" in
 	s390x)
 		eventattr='call-graph=dwarf'
-		expected[7]="gaih_inet[[:space:]]\(inlined\)$"
-		expected[8]="__GI_getaddrinfo[[:space:]]\(inlined\)$"
+		expected[7]="gaih_inet.*[[:space:]]\($libc|inlined\)$"
+		expected[8]="__GI_getaddrinfo[[:space:]]\($libc|inlined\)$"
 		expected[9]="main[[:space:]]\(.*/bin/ping.*\)$"
 		expected[10]="__libc_start_main[[:space:]]\($libc\)$"
 		expected[11]="_start[[:space:]]\(.*/bin/ping.*\)$"
-- 
2.14.3


* [PATCH 41/41] perf tests shell lib: Use a wildcard to remove the vfs_getname probe
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-16 19:17 ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-02-16 19:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo,
	Adrian Hunter, David Ahern, Jiri Olsa, Masami Hiramatsu,
	Namhyung Kim, Thomas Richter, Wang Nan

From: Arnaldo Carvalho de Melo <acme@redhat.com>

In some situations the vfs_getname probe is added both under the
requested name and with a _1 suffix (inlines?):

  probe:vfs_getname_1  (on getname_flags:63@acme/git/linux/fs/namei.c with pathname)

This causes the cleanup to miss that one, as it removes just
'probe:vfs_getname'. The second test that uses this probe point then
fails, since it finds the leftover from the first test. Use a wildcard
to remove both.

Before:

  # perf test 60 61 62 63
  60: Use vfs_getname probe to get syscall args filenames   : FAILED!
  61: probe libc's inet_pton & backtrace it with ping       : Ok
  62: Check open filename arg using perf trace + vfs_getname: FAILED!
  63: Add vfs_getname probe to get syscall args filenames   : Ok

After:

  # perf test 60 61 62 63
  60: Use vfs_getname probe to get syscall args filenames   : Ok
  61: probe libc's inet_pton & backtrace it with ping       : Ok
  62: Check open filename arg using perf trace + vfs_getname: Ok
  63: Add vfs_getname probe to get syscall args filenames   : Ok
  #
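
The cleanup now deletes by pattern instead of by exact name. The following sketch shows the difference, using POSIX fnmatch(3) purely to illustrate glob semantics; it makes no claim about how 'perf probe -d' parses its argument. `count_selected()` is hypothetical.

```c
#include <assert.h>
#include <fnmatch.h>
#include <string.h>

/* Count how many probe names a delete pattern would select. */
static int count_selected(const char *pattern, const char *const names[],
			  int n, int use_glob)
{
	int hits = 0;

	for (int i = 0; i < n; i++) {
		int match = use_glob ? fnmatch(pattern, names[i], 0) == 0
				     : strcmp(pattern, names[i]) == 0;
		hits += match;
	}
	return hits;
}
```

Because the pattern is unquoted in the script, the shell would expand it first if a file in the current directory happened to match `probe:vfs_getname*`. Quoting it as 'probe:vfs_getname*' avoids that.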

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-2k5kutwr4ds36adiakyb4yvy@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/tests/shell/lib/probe_vfs_getname.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/tests/shell/lib/probe_vfs_getname.sh b/tools/perf/tests/shell/lib/probe_vfs_getname.sh
index 30a950c9d407..1c16e56cd93e 100644
--- a/tools/perf/tests/shell/lib/probe_vfs_getname.sh
+++ b/tools/perf/tests/shell/lib/probe_vfs_getname.sh
@@ -5,7 +5,7 @@ had_vfs_getname=$?
 
 cleanup_probe_vfs_getname() {
 	if [ $had_vfs_getname -eq 1 ] ; then
-		perf probe -q -d probe:vfs_getname
+		perf probe -q -d probe:vfs_getname*
 	fi
 }
 
-- 
2.14.3


* Re: [GIT PULL 00/41] perf/core improvements and fixes
  2018-02-16 19:17 ` Arnaldo Carvalho de Melo
@ 2018-02-17 10:49   ` Ingo Molnar
  -1 siblings, 0 replies; 63+ messages in thread
From: Ingo Molnar @ 2018-02-17 10:49 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, linux-perf-users, Adrian Hunter,
	Alexander Shishkin, Andi Kleen, coresight, David Ahern,
	Heiko Carstens, Hendrik Brueckner, Jaecheol Shin, Jin Yao,
	Jiri Olsa, Kan Liang, linux-arm-kernel, linuxppc-dev,
	Martin Schwidefsky, Masami Hiramatsu, Mathieu Poirier,
	Michael Ellerman, Milian Wolff, Namhyung Kim, Naveen N . Rao,
	Peter Zijlstra, Ravi Bangoria, Robert Walker, Sangwon Hong,
	Stephane Eranian, Taeung Song, Thomas Richter, Wang Nan,
	yuzhoujian, Arnaldo Carvalho de Melo


* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Hi Ingo,
> 
> 	Please consider pulling, this is on top of tip/perf/urgent.
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit 297f9233b53a08fd457815e19f1d6f2c3389857b:
> 
>   kprobes: Propagate error from disarm_kprobe_ftrace() (2018-02-16 09:12:58 +0100)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.17-20180216
> 
> for you to fetch changes up to 21316ac6803d4a1aadd74b896db8d60a92cd1140:
> 
>   perf tests shell lib: Use a wildcard to remove the vfs_getname probe (2018-02-16 15:31:12 -0300)
> 
> ----------------------------------------------------------------
> perf/core improvements and fixes:
> 
> - Fix wrong jump arrow in systems with branch records with cycles,
>   i.e. Intel's >= Skylake (Jin Yao)
> 
> - Fix 'perf record --per-thread' problem introduced when
>   implementing 'perf stat --per-thread' (Jin Yao)
> 
> - Use arch__compare_symbol_names() to fix 'perf test vmlinux',
>   that was using strcmp(symbol names) while the dso routines
>   doing symbol lookups used the arch overridable one, making
>   this test fail in architectures that override that function
>   with something other than strcmp() (Jiri Olsa)
> 
> - Add 'perf script --show-round-event' to display
>   PERF_RECORD_FINISHED_ROUND entries (Jiri Olsa)
> 
> - Fix dwarf unwind for stripped binaries in 'perf test' (Jiri Olsa)
> 
> - Use ordered_events for 'perf report --tasks', otherwise we may get
>   artifacts when PERF_RECORD_FORK gets processed before PERF_RECORD_COMM
>   (when they got recorded in different CPUs) (Jiri Olsa)
> 
> - Add support to display group output for non group events, i.e.
>   now when one uses 'perf report --group' on a perf.data file
>   recorded without explicitly grouping events with {} (e.g.
>   "perf record -e '{cycles,instructions}'" get the same output
>   that would produce, i.e. see all those non-grouped events in
>   multiple columns, at the same time (Jiri Olsa)
> 
> - Skip non-address kallsyms entries, e.g. '(null)' for !root (Jiri Olsa)
> 
> - Kernel maps fixes wrt perf.data(report) versus live system (top)
>   (Jiri Olsa)
> 
> - Fix memory corruption when using 'perf record -j call -g -a <application>'
>   followed by 'perf report --branch-history' (Jiri Olsa)
> 
> - ARM CoreSight fixes (Mathieu Poirier)
> 
> - Add inject capability for CoreSight Traces (Robert Walker)
> 
> - Update documentation for use of 'perf' + ARM CoreSight (Robert Walker)
> 
> - Man pages fixes (Sangwon Hong, Jaecheol Shin)
> 
> - Fix some 'perf test' cases on s/390 and x86_64 (some backtraces
>   changed with a glibc update) (Thomas Richter)
> 
> - Add detailed CPUID info in the 'perf.data' headers for s/390 to
>   then use it in 'perf annotate' (Thomas Richter)
> 
> - Add '--interval-count N' to 'perf stat', to use with -I, i.e.
>   'perf stat -I 1000 --interval-count 2' will show stats every
>    1000ms, two times (yuzhoujian)
> 
> - Add 'perf stat --timeout Nms', that will run for that many
>   milliseconds and then stop, printing the counters (yuzhoujian)
> 
> - Fix description for 'perf report --mem-mode' (Andi Kleen)
> 
> - Use a wildcard to remove the vfs_getname probe in the
>   'perf test' shell based test cases (Arnaldo Carvalho de Melo)
> 
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> ----------------------------------------------------------------
> Andi Kleen (1):
>       perf report: Fix description for --mem-mode
> 
> Arnaldo Carvalho de Melo (1):
>       perf tests shell lib: Use a wildcard to remove the vfs_getname probe
> 
> Jaecheol Shin (1):
>       perf annotate: Add missing arguments in Man page
> 
> Jin Yao (2):
>       perf tools: Use target->per_thread and target->system_wide flags
>       perf report: Fix wrong jump arrow
> 
> Jiri Olsa (18):
>       perf record: Put new line after target override warning
>       perf script: Add --show-round-event to display PERF_RECORD_FINISHED_ROUND
>       tools lib api fs: Add filename__read_xll function
>       tools lib api fs: Add sysfs__read_xll function
>       perf tests: Fix dwarf unwind for stripped binaries
>       perf tools: Fix comment for sort__* compare functions
>       perf report: Ask for ordered events for --tasks option
>       perf report: Add support to display group output for non group events
>       tools lib symbol: Skip non-address kallsyms line
>       perf symbols: Check if we read regular file in dso__load()
>       perf machine: Free root_dir in machine__init() error path
>       perf machine: Move kernel mmap name into struct machine
>       perf machine: Generalize machine__set_kernel_mmap()
>       perf machine: Don't search for active kernel start in __machine__create_kernel_maps
>       perf machine: Remove machine__load_kallsyms()
>       perf tools: Do not create kernel maps in sample__resolve()
>       perf tests: Use arch__compare_symbol_names to compare symbols
>       perf report: Fix memory corruption in --branch-history mode --branch-history
> 
> Mathieu Poirier (3):
>       perf cs-etm: Freeing allocated memory
>       perf auxtrace arm: Fixing uninitialised variable
>       perf cs-etm: Properly deal with cpu maps
> 
> Ravi Bangoria (3):
>       tools include powerpc: Grab a copy of arch/powerpc/include/uapi/asm/unistd.h
>       perf powerpc: Generate system call table from asm/unistd.h
>       perf trace powerpc: Use generated syscall table
> 
> Robert Walker (3):
>       perf cs-etm: Inject capabilitity for CoreSight traces
>       perf inject: Emit instruction records on ETM trace discontinuity
>       coresight: Update documentation for perf usage
> 
> Sangwon Hong (2):
>       perf kmem: Document a missing option & an argument
>       perf mem: Document a missing option
> 
> Thomas Richter (5):
>       perf record: Provide detailed information on s390 CPU
>       perf annotate: Scan cpuid for s390 and save machine type
>       perf cpuid: Introduce a platform specific cpuid compare function
>       perf test: Fix test case 23 for s390 z/VM or KVM guests
>       perf test: Fix test case inet_pton to accept inlines.
> 
> yuzhoujian (2):
>       perf stat: Add support to print counts for fixed times
>       perf stat: Add support to print counts after a period of time
> 
>  Documentation/trace/coresight.txt                  |  51 +++
>  tools/arch/powerpc/include/uapi/asm/unistd.h       | 402 +++++++++++++++++
>  tools/lib/api/fs/fs.c                              |  44 +-
>  tools/lib/api/fs/fs.h                              |   2 +
>  tools/lib/symbol/kallsyms.c                        |   4 +
>  tools/perf/Documentation/perf-annotate.txt         |   6 +-
>  tools/perf/Documentation/perf-kmem.txt             |   6 +-
>  tools/perf/Documentation/perf-mem.txt              |   4 +
>  tools/perf/Documentation/perf-report.txt           |   5 +-
>  tools/perf/Documentation/perf-script.txt           |   3 +
>  tools/perf/Documentation/perf-stat.txt             |  10 +
>  tools/perf/Makefile.config                         |   2 +
>  tools/perf/arch/arm/util/auxtrace.c                |   2 +-
>  tools/perf/arch/arm/util/cs-etm.c                  |  51 ++-
>  tools/perf/arch/powerpc/Makefile                   |  25 ++
>  .../perf/arch/powerpc/entry/syscalls/mksyscalltbl  |  37 ++
>  tools/perf/arch/s390/annotate/instructions.c       |  27 +-
>  tools/perf/arch/s390/util/header.c                 | 148 ++++++-
>  tools/perf/builtin-record.c                        |   2 +-
>  tools/perf/builtin-report.c                        |   7 +-
>  tools/perf/builtin-script.c                        |  17 +
>  tools/perf/builtin-stat.c                          |  53 ++-
>  tools/perf/check-headers.sh                        |   1 +
>  tools/perf/tests/code-reading.c                    |  33 +-
>  tools/perf/tests/dwarf-unwind.c                    |  46 +-
>  tools/perf/tests/shell/lib/probe_vfs_getname.sh    |   2 +-
>  .../perf/tests/shell/trace+probe_libc_inet_pton.sh |   6 +-
>  tools/perf/tests/vmlinux-kallsyms.c                |   4 +-
>  tools/perf/ui/browsers/annotate.c                  |   9 +-
>  tools/perf/util/build-id.c                         |  10 +-
>  tools/perf/util/cs-etm-decoder/cs-etm-decoder.c    |  74 +++-
>  tools/perf/util/cs-etm-decoder/cs-etm-decoder.h    |   2 +
>  tools/perf/util/cs-etm.c                           | 478 ++++++++++++++++++---
>  tools/perf/util/event.c                            |  16 +-
>  tools/perf/util/evlist.c                           |  21 +-
>  tools/perf/util/header.h                           |   1 +
>  tools/perf/util/hist.c                             |   4 +-
>  tools/perf/util/hist.h                             |   1 -
>  tools/perf/util/machine.c                          | 145 +++----
>  tools/perf/util/machine.h                          |   6 +-
>  tools/perf/util/pmu.c                              |  47 +-
>  tools/perf/util/sort.c                             |   7 +-
>  tools/perf/util/stat.h                             |   2 +
>  tools/perf/util/symbol.c                           |  13 +-
>  tools/perf/util/syscalltbl.c                       |   8 +
>  tools/perf/util/thread_map.c                       |   4 +-
>  tools/perf/util/thread_map.h                       |   2 +-
>  47 files changed, 1577 insertions(+), 273 deletions(-)
>  create mode 100644 tools/arch/powerpc/include/uapi/asm/unistd.h
>  create mode 100755 tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl

Pulled, thanks a lot Arnaldo!

	Ingo


* Re: [GIT PULL 00/41] perf/core improvements and fixes
@ 2018-02-17 10:49   ` Ingo Molnar
From: Ingo Molnar @ 2018-02-17 10:49 UTC
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, linux-perf-users, Adrian Hunter,
	Alexander Shishkin, Andi Kleen, coresight, David Ahern,
	Heiko Carstens, Hendrik Brueckner, Jaecheol Shin, Jin Yao,
	Jiri Olsa, Kan Liang, linux-arm-kernel, linuxppc-dev,
	Martin Schwidefsky, Masami Hiramatsu, Mathieu Poirier,
	Michael Ellerman, Milian Wolff


* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Hi Ingo,
> 
> 	Please consider pulling, this is on top of tip/perf/urgent.
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit 297f9233b53a08fd457815e19f1d6f2c3389857b:
> 
>   kprobes: Propagate error from disarm_kprobe_ftrace() (2018-02-16 09:12:58 +0100)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.17-20180216
> 
> for you to fetch changes up to 21316ac6803d4a1aadd74b896db8d60a92cd1140:
> 
>   perf tests shell lib: Use a wildcard to remove the vfs_getname probe (2018-02-16 15:31:12 -0300)
> 
> ----------------------------------------------------------------
> perf/core improvements and fixes:
> 
> - Fix wrong jump arrow in systems with branch records with cycles,
>   i.e. Intel's >= Skylake (Jin Yao)
> 
> - Fix 'perf record --per-thread' problem introduced when
>   implementing 'perf stat --per-thread' (Jin Yao)
> 
> - Use arch__compare_symbol_names() to fix 'perf test vmlinux',
>   which was using strcmp() on symbol names while the dso routines
>   doing symbol lookups used the arch-overridable one, making this
>   test fail on architectures that override that function with
>   something other than strcmp() (Jiri Olsa)
> 
> - Add 'perf script --show-round-event' to display
>   PERF_RECORD_FINISHED_ROUND entries (Jiri Olsa)
> 
> - Fix dwarf unwind for stripped binaries in 'perf test' (Jiri Olsa)
> 
> - Use ordered_events for 'perf report --tasks', otherwise we may get
>   artifacts when PERF_RECORD_FORK gets processed before PERF_RECORD_COMM
>   (when they were recorded on different CPUs) (Jiri Olsa)
> 
> - Add support to display group output for non-group events: when
>   'perf report --group' is used on a perf.data file recorded without
>   explicitly grouping events with {} (as in
>   "perf record -e '{cycles,instructions}'"), it now produces the same
>   output grouping would have, i.e. all those non-grouped events are
>   shown in multiple columns at the same time (Jiri Olsa)
> 
> - Skip non-address kallsyms entries, e.g. '(null)' for !root (Jiri Olsa)
> 
> - Kernel maps fixes wrt perf.data (report) versus live system (top)
>   (Jiri Olsa)
> 
> - Fix memory corruption when using 'perf record -j call -g -a <application>'
>   followed by 'perf report --branch-history' (Jiri Olsa)
> 
> - ARM CoreSight fixes (Mathieu Poirier)
> 
> - Add inject capability for CoreSight Traces (Robert Walker)
> 
> - Update documentation for use of 'perf' + ARM CoreSight (Robert Walker)
> 
> - Man pages fixes (Sangwon Hong, Jaecheol Shin)
> 
> - Fix some 'perf test' cases on s/390 and x86_64 (some backtraces
>   changed with a glibc update) (Thomas Richter)
> 
> - Add detailed CPUID info in the 'perf.data' headers for s/390 to
>   then use it in 'perf annotate' (Thomas Richter)
> 
> - Add '--interval-count N' to 'perf stat', to use with -I, i.e.
>   'perf stat -I 1000 --interval-count 2' will show stats every
>   1000ms, two times (yuzhoujian)
> 
> - Add 'perf stat --timeout Nms', that will run for that many
>   milliseconds and then stop, printing the counters (yuzhoujian)
> 
> - Fix description for 'perf report --mem-mode' (Andi Kleen)
> 
> - Use a wildcard to remove the vfs_getname probe in the
>   'perf test' shell based test cases (Arnaldo Carvalho de Melo)
> 
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> ----------------------------------------------------------------
> Andi Kleen (1):
>       perf report: Fix description for --mem-mode
> 
> Arnaldo Carvalho de Melo (1):
>       perf tests shell lib: Use a wildcard to remove the vfs_getname probe
> 
> Jaecheol Shin (1):
>       perf annotate: Add missing arguments in Man page
> 
> Jin Yao (2):
>       perf tools: Use target->per_thread and target->system_wide flags
>       perf report: Fix wrong jump arrow
> 
> Jiri Olsa (18):
>       perf record: Put new line after target override warning
>       perf script: Add --show-round-event to display PERF_RECORD_FINISHED_ROUND
>       tools lib api fs: Add filename__read_xll function
>       tools lib api fs: Add sysfs__read_xll function
>       perf tests: Fix dwarf unwind for stripped binaries
>       perf tools: Fix comment for sort__* compare functions
>       perf report: Ask for ordered events for --tasks option
>       perf report: Add support to display group output for non group events
>       tools lib symbol: Skip non-address kallsyms line
>       perf symbols: Check if we read regular file in dso__load()
>       perf machine: Free root_dir in machine__init() error path
>       perf machine: Move kernel mmap name into struct machine
>       perf machine: Generalize machine__set_kernel_mmap()
>       perf machine: Don't search for active kernel start in __machine__create_kernel_maps
>       perf machine: Remove machine__load_kallsyms()
>       perf tools: Do not create kernel maps in sample__resolve()
>       perf tests: Use arch__compare_symbol_names to compare symbols
>       perf report: Fix memory corruption in --branch-history mode --branch-history
> 
> Mathieu Poirier (3):
>       perf cs-etm: Freeing allocated memory
>       perf auxtrace arm: Fixing uninitialised variable
>       perf cs-etm: Properly deal with cpu maps
> 
> Ravi Bangoria (3):
>       tools include powerpc: Grab a copy of arch/powerpc/include/uapi/asm/unistd.h
>       perf powerpc: Generate system call table from asm/unistd.h
>       perf trace powerpc: Use generated syscall table
> 
> Robert Walker (3):
>       perf cs-etm: Inject capabilitity for CoreSight traces
>       perf inject: Emit instruction records on ETM trace discontinuity
>       coresight: Update documentation for perf usage
> 
> Sangwon Hong (2):
>       perf kmem: Document a missing option & an argument
>       perf mem: Document a missing option
> 
> Thomas Richter (5):
>       perf record: Provide detailed information on s390 CPU
>       perf annotate: Scan cpuid for s390 and save machine type
>       perf cpuid: Introduce a platform specific cpuid compare function
>       perf test: Fix test case 23 for s390 z/VM or KVM guests
>       perf test: Fix test case inet_pton to accept inlines.
> 
> yuzhoujian (2):
>       perf stat: Add support to print counts for fixed times
>       perf stat: Add support to print counts after a period of time
> 
>  Documentation/trace/coresight.txt                  |  51 +++
>  tools/arch/powerpc/include/uapi/asm/unistd.h       | 402 +++++++++++++++++
>  tools/lib/api/fs/fs.c                              |  44 +-
>  tools/lib/api/fs/fs.h                              |   2 +
>  tools/lib/symbol/kallsyms.c                        |   4 +
>  tools/perf/Documentation/perf-annotate.txt         |   6 +-
>  tools/perf/Documentation/perf-kmem.txt             |   6 +-
>  tools/perf/Documentation/perf-mem.txt              |   4 +
>  tools/perf/Documentation/perf-report.txt           |   5 +-
>  tools/perf/Documentation/perf-script.txt           |   3 +
>  tools/perf/Documentation/perf-stat.txt             |  10 +
>  tools/perf/Makefile.config                         |   2 +
>  tools/perf/arch/arm/util/auxtrace.c                |   2 +-
>  tools/perf/arch/arm/util/cs-etm.c                  |  51 ++-
>  tools/perf/arch/powerpc/Makefile                   |  25 ++
>  .../perf/arch/powerpc/entry/syscalls/mksyscalltbl  |  37 ++
>  tools/perf/arch/s390/annotate/instructions.c       |  27 +-
>  tools/perf/arch/s390/util/header.c                 | 148 ++++++-
>  tools/perf/builtin-record.c                        |   2 +-
>  tools/perf/builtin-report.c                        |   7 +-
>  tools/perf/builtin-script.c                        |  17 +
>  tools/perf/builtin-stat.c                          |  53 ++-
>  tools/perf/check-headers.sh                        |   1 +
>  tools/perf/tests/code-reading.c                    |  33 +-
>  tools/perf/tests/dwarf-unwind.c                    |  46 +-
>  tools/perf/tests/shell/lib/probe_vfs_getname.sh    |   2 +-
>  .../perf/tests/shell/trace+probe_libc_inet_pton.sh |   6 +-
>  tools/perf/tests/vmlinux-kallsyms.c                |   4 +-
>  tools/perf/ui/browsers/annotate.c                  |   9 +-
>  tools/perf/util/build-id.c                         |  10 +-
>  tools/perf/util/cs-etm-decoder/cs-etm-decoder.c    |  74 +++-
>  tools/perf/util/cs-etm-decoder/cs-etm-decoder.h    |   2 +
>  tools/perf/util/cs-etm.c                           | 478 ++++++++++++++++++---
>  tools/perf/util/event.c                            |  16 +-
>  tools/perf/util/evlist.c                           |  21 +-
>  tools/perf/util/header.h                           |   1 +
>  tools/perf/util/hist.c                             |   4 +-
>  tools/perf/util/hist.h                             |   1 -
>  tools/perf/util/machine.c                          | 145 +++----
>  tools/perf/util/machine.h                          |   6 +-
>  tools/perf/util/pmu.c                              |  47 +-
>  tools/perf/util/sort.c                             |   7 +-
>  tools/perf/util/stat.h                             |   2 +
>  tools/perf/util/symbol.c                           |  13 +-
>  tools/perf/util/syscalltbl.c                       |   8 +
>  tools/perf/util/thread_map.c                       |   4 +-
>  tools/perf/util/thread_map.h                       |   2 +-
>  47 files changed, 1577 insertions(+), 273 deletions(-)
>  create mode 100644 tools/arch/powerpc/include/uapi/asm/unistd.h
>  create mode 100755 tools/perf/arch/powerpc/entry/syscalls/mksyscalltbl

Pulled, thanks a lot Arnaldo!

	Ingo


* [GIT PULL 00/41] perf/core improvements and fixes
@ 2018-02-17 10:49   ` Ingo Molnar
From: Ingo Molnar @ 2018-02-17 10:49 UTC
  To: linux-arm-kernel


Pulled, thanks a lot Arnaldo!

	Ingo


* [GIT PULL 00/41] perf/core improvements and fixes
@ 2017-09-12 15:09 Arnaldo Carvalho de Melo
From: Arnaldo Carvalho de Melo @ 2017-09-12 15:09 UTC
  To: Ingo Molnar
  Cc: linux-kernel, linux-perf-users, Arnaldo Carvalho de Melo,
	Adrian Hunter, Andi Kleen, David Ahern, Jiri Olsa, Kan Liang,
	Lukasz Odzioba, Milian Wolff, Namhyung Kim, Peter Zijlstra,
	Taeung Song, Wang Nan, Yao Jin, Arnaldo Carvalho de Melo

Hi Ingo,

	Please consider pulling. I'll try to be strict from now on wrt
new stuff versus fixes; fixes will go to a perf/urgent branch.

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit 770e96125515daf1c7bc179323f2e0d488dfe6ac:

  Merge tag 'perf-core-for-mingo-4.14-20170901' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent (2017-09-05 07:14:28 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.14-20170912

for you to fetch changes up to f94843e82778769422e7e6a02b9ce39d5eb882f6:

  perf vendor events: Add JSON metrics for Skylake server (2017-09-11 17:22:40 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

- Support direct --user-regs arguments in 'perf record'; previously the
  only way to sample PERF_SAMPLE_REGS_USER was to select it implicitly
  when recording callchains (Andi Kleen)

- Support showing sampled user regs in 'perf script' (Andi Kleen)

- Introduce the concept of weak groups in 'perf stat': try to set up a
  group, but if it's not schedulable, fall back to not using a group. That
  gives us the best of both worlds: groups if they work, but still a
  usable fallback if they don't. E.g. (Andi Kleen):

  % perf stat -e '{branches,branch-misses,l1d.replacement,l2_lines_in.all,l2_rqsts.all_code_rd}:W' -a sleep 1

    125,366,055  branches                                    (80.02%)
      9,208,402  branch-misses       # 7.35% of all branches (80.01%)
     24,560,249  l1d.replacement                             (80.00%)
     43,174,971  l2_lines_in.all                             (80.05%)
     31,891,457  l2_rqsts.all_code_rd                        (79.92%)

- Support metrics in 'stat' and 'list'. A metric is a formula that
  uses multiple events to compute a higher level result (e.g. IPC). (Andi Kleen)

- Add vendor event metrics JSON files for Intel processors (Andi Kleen)

- Add 'pid' and 'tid' options to 'perf sched timehist' (David Ahern)

- Improve TUI progress bar by showing how many bytes from a total were
  processed (Jiri Olsa)

- Make tools/lib/api make DEBUG=1 build use -D_FORTIFY_SOURCE=2 not
  to cripple debuginfo, just like tools/perf/ does (Jiri Olsa)

- Avoid leaking the 'perf.data' file to workloads started from the
  'perf record' command line by using the O_CLOEXEC open flag (Jiri Olsa)

- Add 'python-clean' make target (Jiri Olsa)

- Use scandir() to replace readdir(), as prep work for making the
  synthesizing of PERF_RECORD_ entries for existing threads multithreaded,
  making 'perf top' bearable on high core count systems such as Intel's
  Knights Landing/Mill (Kan Liang)

- Fix building when libunwind's 'unwind.h' file is present in the
  include path, clashing with tools/perf/util/unwind.h (Milian Wolff)

- Support running perf binaries with a dash in their name (Milian Wolff)

- Allow creating a ~/.perfconfig file when setting a variable to its
  default value; previously it would bail out and not write such a
  file (Taeung Song)

- Avoid writing ~/.perfconfig repeatedly when processing changes to
  multiple variables (Taeung Song)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Andi Kleen (25):
      perf tools: Support weak groups in 'perf stat'
      perf vendor events: Support metric_group and no event name in JSON parser
      perf stat: Factor out generic metric printing
      perf stat: Print generic metric header even for failed expressions
      perf pmu: Extract function to get JSON alias map
      perf stat: Support JSON metrics in perf stat
      perf list: Add metric groups to perf list
      perf stat: Don't use ctx for saved values lookup
      perf stat: Support duration_time for metrics
      perf stat: Hide internal duration_time counter
      perf stat: Update walltime_nsecs_stats in interval mode
      perf record: Support direct --user-regs arguments
      perf script: Support user regs
      perf stat: Fall weak group back even for EBADF
      perf vendor events: Add JSON metrics for Broadwell
      perf vendor events: Add JSON metrics for Skylake
      perf vendor events: Add JSON metrics for Sandy Bridge
      perf vendor events: Add JSON metrics for Sandy Bridge EP
      perf vendor events: Add JSON metrics for Ivy Bridge
      perf vendor events: Add JSON metrics for Haswell
      perf vendor events: Add JSON metrics for Ivy Town
      perf vendor events: Add JSON metrics for Haswell EP
      perf vendor events: Add JSON metrics for Broadwell Server
      perf vendor events: Add JSON metrics for Broadwell DE
      perf vendor events: Add JSON metrics for Skylake server

Arnaldo Carvalho de Melo (2):
      tools include linux: Guard against redefinition of some macros
      perf tools: Make copyfile_offset() static

David Ahern (1):
      perf sched timehist: Add pid and tid options

Jiri Olsa (7):
      tools lib api: Fix make DEBUG=1 build
      perf tools: Open perf.data with O_CLOEXEC flag
      perf tools: Add python-clean target
      perf ui progress: Make sure we always define step value
      perf ui progress: Fix progress update
      perf ui progress: Add ui specific init function
      perf ui progress: Add size info into progress bar

Kan Liang (1):
      perf tools: Use scandir() to replace readdir()

Milian Wolff (2):
      perf tests: Fix compile when libunwind's unwind.h is available
      perf tools: Support running perf binaries with a dash in their name

Taeung Song (3):
      perf config: Check not only section->from_system_config but also item's
      perf config: Write a config file just once
      perf config: Allow creating empty config set for config file autogeneration

 tools/include/linux/compiler-gcc.h                 |   9 +-
 tools/lib/api/Makefile                             |   8 +-
 tools/perf/Documentation/perf-list.txt             |   9 +-
 tools/perf/Documentation/perf-record.txt           |   2 +
 tools/perf/Documentation/perf-sched.txt            |   8 +
 tools/perf/Documentation/perf-script.txt           |   4 +-
 tools/perf/Documentation/perf-stat.txt             |   7 +
 tools/perf/Makefile.perf                           |   8 +-
 tools/perf/builtin-config.c                        |  24 +-
 tools/perf/builtin-list.c                          |   7 +
 tools/perf/builtin-record.c                        |   3 +
 tools/perf/builtin-sched.c                         |   4 +
 tools/perf/builtin-script.c                        |  30 +-
 tools/perf/builtin-stat.c                          |  82 +++-
 tools/perf/perf.c                                  |  14 +-
 tools/perf/perf.h                                  |   1 +
 .../pmu-events/arch/x86/broadwell/bdw-metrics.json | 164 +++++++
 .../arch/x86/broadwellde/bdwde-metrics.json        | 164 +++++++
 .../arch/x86/broadwellx/bdx-metrics.json           | 164 +++++++
 .../pmu-events/arch/x86/haswell/hsw-metrics.json   | 158 +++++++
 .../pmu-events/arch/x86/haswellx/hsx-metrics.json  | 158 +++++++
 .../pmu-events/arch/x86/ivybridge/ivb-metrics.json | 164 +++++++
 .../pmu-events/arch/x86/ivytown/ivt-metrics.json   | 164 +++++++
 .../pmu-events/arch/x86/jaketown/jkt-metrics.json  | 140 ++++++
 .../arch/x86/sandybridge/snb-metrics.json          | 140 ++++++
 .../pmu-events/arch/x86/skylake/skl-metrics.json   | 164 +++++++
 .../pmu-events/arch/x86/skylakex/skx-metrics.json  | 182 ++++++++
 tools/perf/pmu-events/jevents.c                    |  24 +-
 tools/perf/pmu-events/jevents.h                    |   2 +-
 tools/perf/pmu-events/pmu-events.h                 |   1 +
 tools/perf/tests/builtin-test.c                    |   1 +
 tools/perf/tests/dwarf-unwind.c                    |   2 +-
 tools/perf/ui/progress.c                           |  15 +-
 tools/perf/ui/progress.h                           |  12 +-
 tools/perf/ui/tui/progress.c                       |  32 +-
 tools/perf/util/Build                              |   1 +
 tools/perf/util/config.c                           |   5 +-
 tools/perf/util/data.c                             |  14 +-
 tools/perf/util/dso.c                              |   1 +
 tools/perf/util/event.c                            |  46 +-
 tools/perf/util/evlist.h                           |   1 +
 tools/perf/util/evsel.c                            |   7 +-
 tools/perf/util/evsel.h                            |   1 +
 tools/perf/util/metricgroup.c                      | 489 +++++++++++++++++++++
 tools/perf/util/metricgroup.h                      |  31 ++
 tools/perf/util/namespaces.c                       |   1 +
 tools/perf/util/parse-events.c                     |  11 +-
 tools/perf/util/parse-events.l                     |   3 +-
 tools/perf/util/pmu.c                              |  55 ++-
 tools/perf/util/pmu.h                              |   2 +
 tools/perf/util/probe-file.c                       |   1 +
 tools/perf/util/session.c                          |   2 +-
 tools/perf/util/stat-shadow.c                      | 110 +++--
 tools/perf/util/stat.h                             |   4 +-
 tools/perf/util/util.c                             |   3 +-
 tools/perf/util/util.h                             |   2 -
 tools/perf/util/zlib.c                             |   1 +
 57 files changed, 2734 insertions(+), 128 deletions(-)
 create mode 100644 tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json
 create mode 100644 tools/perf/pmu-events/arch/x86/broadwellde/bdwde-metrics.json
 create mode 100644 tools/perf/pmu-events/arch/x86/broadwellx/bdx-metrics.json
 create mode 100644 tools/perf/pmu-events/arch/x86/haswell/hsw-metrics.json
 create mode 100644 tools/perf/pmu-events/arch/x86/haswellx/hsx-metrics.json
 create mode 100644 tools/perf/pmu-events/arch/x86/ivybridge/ivb-metrics.json
 create mode 100644 tools/perf/pmu-events/arch/x86/ivytown/ivt-metrics.json
 create mode 100644 tools/perf/pmu-events/arch/x86/jaketown/jkt-metrics.json
 create mode 100644 tools/perf/pmu-events/arch/x86/sandybridge/snb-metrics.json
 create mode 100644 tools/perf/pmu-events/arch/x86/skylake/skl-metrics.json
 create mode 100644 tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json
 create mode 100644 tools/perf/util/metricgroup.c
 create mode 100644 tools/perf/util/metricgroup.h

Test results:

The first ones are container (docker) based builds of tools/perf with and
without libelf support, plus builds of objtool and samples/bpf/, each where
supported. Where clang is available, it is also used to build perf with and
without libelf.

Several are cross builds (the ones with -x-ARCH, plus the android ones), and
those may not have all features built, due to the lack of multi-arch devel
packages; such packages are available and used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.

The 'perf test' run performs a variety of tests exercising tools/perf/util/
and tools/lib/{bpf,traceevent,etc}. It also runs perf commands with various
command line event specifications, intercepting the sys_perf_event_open
syscall to check that the perf_event_attr fields are set up as expected,
among many other unit tests.

Then there are the 'make -C tools/perf build-test' ones, which build tools/perf/
with a variety of feature sets, exercising the build with incomplete sets of
features as well as with a complete one. The plan is to run these on each of
the containers mentioned above, using some container orchestration
infrastructure. Get in contact if you are interested in helping put this in place.
Tue Sep 12 09:49:13 -03 2017

[root@jouet ~]# waitp 23628 ; time dm
   1 alpine:3.4: Ok
   2 alpine:3.5: Ok
   3 alpine:3.6: Ok
   4 alpine:edge: Ok
   5 android-ndk:r12b-arm: Ok
   6 android-ndk:r15c-arm: Ok
   7 archlinux:latest: Ok
   8 centos:5: Ok
   9 centos:6: Ok
  10 centos:7: Ok
  11 debian:7: Ok
  12 debian:8: Ok
  13 debian:9: Ok
  14 debian:experimental: Ok
  15 debian:experimental-x-arm64: Ok
  16 debian:experimental-x-mips: Ok
  17 debian:experimental-x-mips64: Ok
  18 debian:experimental-x-mipsel: Ok
  19 fedora:20: Ok
  20 fedora:21: Ok
  21 fedora:22: Ok
  22 fedora:23: Ok
  23 fedora:24: Ok
  24 fedora:24-x-ARC-uClibc: Ok
  25 fedora:25: Ok
  26 fedora:26: Ok
  27 fedora:rawhide: Ok
  28 mageia:5: Ok
  29 opensuse:13.2: Ok
  30 opensuse:42.1: Ok
  31 opensuse:42.2: Ok
  32 opensuse:42.3: Ok
  33 opensuse:tumbleweed: Ok
  34 oraclelinux:6: Ok
  35 oraclelinux:7: Ok
  36 ubuntu:12.04.5: Ok
  37 ubuntu:14.04.4: Ok
  38 ubuntu:14.04.4-x-linaro-arm64: Ok
  39 ubuntu:15.10: Ok
  40 ubuntu:16.04: Ok
  41 ubuntu:16.04-x-arm: Ok
  42 ubuntu:16.04-x-arm64: Ok
  43 ubuntu:16.04-x-powerpc: Ok
  44 ubuntu:16.04-x-powerpc64: Ok
  45 ubuntu:16.04-x-powerpc64el: Ok
  46 ubuntu:16.04-x-s390: Ok
  47 ubuntu:16.10: Ok
  48 ubuntu:17.04: Ok
  49 ubuntu:17.10: Ok
  #

  # uname -a
  Linux jouet 4.13.0-rc7+ #3 SMP Sat Sep 2 09:04:44 -03 2017 x86_64 x86_64 x86_64 GNU/Linux
  # perf test
   1: vmlinux symtab matches kallsyms                       : Ok
   2: Detect openat syscall event                           : Ok
   3: Detect openat syscall event on all cpus               : Ok
   4: Read samples using the mmap interface                 : Ok
   5: Test data source output                               : Ok
   6: Parse event definition strings                        : Ok
   7: Simple expression parser                              : Ok
   8: PERF_RECORD_* events & perf_sample fields             : Ok
   9: Parse perf pmu format                                 : Ok
  10: DSO data read                                         : Ok
  11: DSO data cache                                        : Ok
  12: DSO data reopen                                       : Ok
  13: Roundtrip evsel->name                                 : Ok
  14: Parse sched tracepoints fields                        : Ok
  15: syscalls:sys_enter_openat event fields                : Ok
  16: Setup struct perf_event_attr                          : Ok
  17: Match and link multiple hists                         : Ok
  18: 'import perf' in python                               : Ok
  19: Breakpoint overflow signal handler                    : Ok
  20: Breakpoint overflow sampling                          : Ok
  21: Number of exit events of a simple workload            : Ok
  22: Software clock events period values                   : Ok
  23: Object code reading                                   : Ok
  24: Sample parsing                                        : Ok
  25: Use a dummy software event to keep tracking           : Ok
  26: Parse with no sample_id_all bit set                   : Ok
  27: Filter hist entries                                   : Ok
  28: Lookup mmap thread                                    : Ok
  29: Share thread mg                                       : Ok
  30: Sort output of hist entries                           : Ok
  31: Cumulate child hist entries                           : Ok
  32: Track with sched_switch                               : Ok
  33: Filter fds with revents mask in a fdarray             : Ok
  34: Add fd to a fdarray, making it autogrow               : Ok
  35: kmod_path__parse                                      : Ok
  36: Thread map                                            : Ok
  37: LLVM search and compile                               :
  37.1: Basic BPF llvm compile                              : Ok
  37.2: kbuild searching                                    : Ok
  37.3: Compile source for BPF prologue generation          : Ok
  37.4: Compile source for BPF relocation                   : Ok
  38: Session topology                                      : Ok
  39: BPF filter                                            :
  39.1: Basic BPF filtering                                 : Ok
  39.2: BPF pinning                                         : Ok
  39.3: BPF prologue generation                             : Ok
  39.4: BPF relocation checker                              : Ok
  40: Synthesize thread map                                 : Ok
  41: Remove thread map                                     : Ok
  42: Synthesize cpu map                                    : Ok
  43: Synthesize stat config                                : Ok
  44: Synthesize stat                                       : Ok
  45: Synthesize stat round                                 : Ok
  46: Synthesize attr update                                : Ok
  47: Event times                                           : Ok
  48: Read backward ring buffer                             : Ok
  49: Print cpu map                                         : Ok
  50: Probe SDT events                                      : Ok
  51: is_printable_array                                    : Ok
  52: Print bitmap                                          : Ok
  53: perf hooks                                            : Ok
  54: builtin clang support                                 : Skip (not compiled in)
  55: unit_number__scnprintf                                : Ok
  56: x86 rdpmc                                             : Ok
  57: Convert perf time to TSC                              : Ok
  58: DWARF unwind                                          : Ok
  59: x86 instruction decoder - new instructions            : Ok
  60: Intel cqm nmi context read                            : Skip
  61: Use vfs_getname probe to get syscall args filenames   : Ok
  62: probe libc's inet_pton & backtrace it with ping       : Ok
  63: Check open filename arg using perf trace + vfs_getname: Ok
  64: Add vfs_getname probe to get syscall args filenames   : Ok
  #
  
  [acme@jouet linux]$ time make -C tools/perf build-test
  make: Entering directory '/home/acme/git/linux/tools/perf'
  - tarpkg: ./tests/perf-targz-src-pkg .
           make_no_backtrace_O: make NO_BACKTRACE=1
                make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1
                 make_static_O: make LDFLAGS=-static
        make_with_babeltrace_O: make LIBBABELTRACE=1
            make_no_auxtrace_O: make NO_AUXTRACE=1
           make_no_libpython_O: make NO_LIBPYTHON=1
                   make_help_O: make help
             make_no_libperl_O: make NO_LIBPERL=1
            make_no_demangle_O: make NO_DEMANGLE=1
              make_clean_all_O: make clean all
              make_no_libelf_O: make NO_LIBELF=1
                make_install_O: make install
                   make_tags_O: make tags
             make_no_libnuma_O: make NO_LIBNUMA=1
           make_no_libbionic_O: make NO_LIBBIONIC=1
                  make_debug_O: make DEBUG=1
            make_no_libaudit_O: make NO_LIBAUDIT=1
                make_no_gtk2_O: make NO_GTK2=1
                 make_perf_o_O: make perf.o
                make_no_newt_O: make NO_NEWT=1
       make_util_pmu_bison_o_O: make util/pmu-bison.o
   make_install_prefix_slash_O: make install prefix=/tmp/krava/
         make_with_clangllvm_O: make LIBCLANGLLVM=1
  make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
             make_util_map_o_O: make util/map.o
            make_install_bin_O: make install-bin
                   make_pure_O: make
                  make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
           make_no_libunwind_O: make NO_LIBUNWIND=1
                    make_doc_O: make doc
             make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
              make_no_libbpf_O: make NO_LIBBPF=1
               make_no_slang_O: make NO_SLANG=1
         make_install_prefix_O: make install prefix=/tmp/krava
  OK
  make: Leaving directory '/home/acme/git/linux/tools/perf'
  $


* Re: [GIT PULL 00/41] perf/core improvements and fixes
  2014-08-13 22:47 Arnaldo Carvalho de Melo
@ 2014-08-14  8:40 ` Ingo Molnar
  0 siblings, 0 replies; 63+ messages in thread
From: Ingo Molnar @ 2014-08-14  8:40 UTC
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, Adrian Hunter, Alex Snast, Andi Kleen,
	Corey Ashford, David Ahern, Don Zickus, Frederic Weisbecker,
	Jean Pihet, Jiri Olsa, Masami Hiramatsu, Mike Galbraith,
	Minchan Kim, Namhyung Kim, Naohiro Aota, Paul Mackerras,
	Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo


* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Hi Ingo,
> 
> 	Please consider pulling, more to come! :-)
> 
> - Arnaldo
> 
> The following changes since commit ddcd0973fe02aad3d4bdc59dd0f1db90f51105a9:
> 
>   perf/x86/uncore: Rename IvyTown to IvyBridge-EP (2014-08-13 07:51:18 +0200)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo
> 
> for you to fetch changes up to 1c65056c547141a0cb52fb8e6056f63524d2bbf2:
> 
>   perf evlist: Add perf_evlist__enable_event_idx() (2014-08-13 19:23:48 -0300)
> 
> ----------------------------------------------------------------
> perf/probe fixes and improvements:
> 
> User visible changes:
> 
> . Do not show +/- callchain expansion when there are no children (top/report) (Namhyung Kim)
> 
> . Fix -z and add a corresponding 'z' hotkey to zero samples before refresh
>   in 'perf top' (Namhyung Kim)
> 
> . Capability probing fixes, improving the detection of
>   kernel features for non-privileged users (Adrian Hunter)
> 
> . Add beautifier for mremap flags param in 'trace' (Alex Snast)
> 
> . Fix the --list and --del options to work when only uprobe events
>   exist (Masami Hiramatsu)
> 
> . Allow callchains in 'perf script' if any event samples them (Adrian Hunter)
> 
> . Don't look for kernel idle symbols in all DSOs in 'perf top' (Arnaldo Carvalho de Melo)
> 
> . Add cpu_startup_entry to the list of kernel idle symbols (Arnaldo Carvalho de Melo)
> 
> . 'perf top' terminal output fixes (Jiri Olsa)
> 
> . Fix stdin handling for 'perf kvm stat live' (Jiri Olsa)
> 
> . Fix missing label symbols (Adrian Hunter)
> 
> . Don't demangle C++ parameters and such by default, only in
>   --verbose mode (Namhyung Kim)
> 
> . Set proper sort__mode for the branch option (Naohiro Aota)
> 
> . Check recorded kernel version when finding vmlinux (Namhyung Kim)
> 
> Developer Stuff:
> 
> . More prep work for Intel PT (Adrian Hunter)
> 
> . Fix possible memory leaks (Namhyung Kim)
> 
> . Fix a memory leak in vmlinux_path__init() (Namhyung Kim)
> 
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> ----------------------------------------------------------------
> Adrian Hunter (14):
>       perf tools: Fix CLOEXEC probe for perf_event_paranoid == 2
>       perf tools: Fix one of the probe events to exclude kernel
>       perf tools: Fix probing the kernel API with cpu-wide events
>       perf tools: Prefer to use a cpu-wide event for probing CLOEXEC
>       perf symbols: Fix missing label symbols
>       perf evlist: Add 'system_wide' option
>       perf evlist: Add perf_evlist__set_tracking_event()
>       perf session: Add perf_session__peek_event()
>       perf script: Allow callchains if any event samples them
>       perf script python: Add helpers for calling Python objects
>       perf tools: Identify which comms are from exec
>       perf machine: Add machine__thread_exec_comm()
>       perf tools: Add flags and insn_len to struct sample
>       perf evlist: Add perf_evlist__enable_event_idx()
> 
> Alex Snast (1):
>       perf trace: Add beautifier for mremap flags param
> 
> Arnaldo Carvalho de Melo (2):
>       perf top: Don't look for kernel idle symbols in all DSOs
>       perf tools: Add cpu_startup_entry to the list of kernel idle symbols
> 
> Jiri Olsa (4):
>       perf top: Join the display thread on exit
>       perf tools: Introduce set_term_quiet_input helper function
>       perf top: Setup signals for terminal output
>       perf kvm: Fix stdin handling for 'kvm stat live' command
> 
> Masami Hiramatsu (2):
>       perf probe: Fix --list option to show events only with uprobe events
>       perf probe: Fix --del option to delete events only with uprobe events
> 
> Namhyung Kim (17):
>       perf script: Fix possible memory leaks
>       perf symbols: Fix a memory leak in vmlinux_path__init()
>       perf annotate: Move session handling out of __cmd_annotate()
>       perf buildid-cache: Move session handling into cmd_buildid_cache()
>       perf inject: Move session handling out of __cmd_inject()
>       perf kmem: Move session handling out of __cmd_kmem()
>       perf kvm: Move call to symbol__init() after creating session
>       perf lock: Move call to symbol__init() after creating session
>       perf sched: Move call to symbol__init() after creating session
>       perf script: Move call to symbol__init() after creating session
>       perf timechart: Move call to symbol__init() after creating session
>       perf trace: Move call to symbol__init() after creating session
>       perf tools: Check recorded kernel version when finding vmlinux
>       perf hists browser: Fix a small callchain display bug
>       perf top: Fix -z option behavior
>       perf top: Handle 'z' key for toggle zeroing samples in TUI
>       perf symbols: Don't demangle parameters and such by default
> 
> naota@elisp.net (1):
>       perf report: Set proper sort__mode for the branch option
> 
>  tools/perf/builtin-annotate.c                      |  75 +++++++-------
>  tools/perf/builtin-buildid-cache.c                 |  37 ++++---
>  tools/perf/builtin-diff.c                          |   2 +-
>  tools/perf/builtin-inject.c                        |  31 +++---
>  tools/perf/builtin-kmem.c                          |  49 +++++----
>  tools/perf/builtin-kvm.c                           |  24 ++---
>  tools/perf/builtin-lock.c                          |   3 +-
>  tools/perf/builtin-mem.c                           |   2 +-
>  tools/perf/builtin-record.c                        |   2 +-
>  tools/perf/builtin-report.c                        |   4 +-
>  tools/perf/builtin-sched.c                         |   3 +-
>  tools/perf/builtin-script.c                        |  57 +++++++----
>  tools/perf/builtin-timechart.c                     |   4 +-
>  tools/perf/builtin-top.c                           |  59 ++++++++---
>  tools/perf/builtin-trace.c                         |  34 +++++-
>  tools/perf/tests/builtin-test.c                    |   2 +-
>  tools/perf/ui/browsers/hists.c                     |  13 ++-
>  tools/perf/util/cloexec.c                          |  23 ++++-
>  tools/perf/util/comm.c                             |   7 +-
>  tools/perf/util/comm.h                             |   6 +-
>  tools/perf/util/event.h                            |   2 +
>  tools/perf/util/evlist.c                           | 109 ++++++++++++++++++--
>  tools/perf/util/evlist.h                           |   5 +
>  tools/perf/util/evsel.c                            |  34 ++++--
>  tools/perf/util/evsel.h                            |   2 +
>  tools/perf/util/hist.c                             |  22 ++++
>  tools/perf/util/hist.h                             |   1 +
>  tools/perf/util/machine.c                          |  30 +++++-
>  tools/perf/util/machine.h                          |   4 +
>  tools/perf/util/probe-event.c                      |  98 ++++++++++++------
>  tools/perf/util/record.c                           |  40 ++++++--
>  .../util/scripting-engines/trace-event-python.c    | 114 +++++++++------------
>  tools/perf/util/session.c                          |  79 +++++++++++++-
>  tools/perf/util/session.h                          |   5 +
>  tools/perf/util/symbol-elf.c                       |   9 +-
>  tools/perf/util/symbol.c                           |  31 ++++--
>  tools/perf/util/symbol.h                           |   4 +-
>  tools/perf/util/thread.c                           |  24 ++++-
>  tools/perf/util/thread.h                           |  10 +-
>  tools/perf/util/util.c                             |  13 +++
>  tools/perf/util/util.h                             |   2 +
>  41 files changed, 770 insertions(+), 305 deletions(-)

Pulled, thanks a lot Arnaldo!

	Ingo


* [GIT PULL 00/41] perf/core improvements and fixes
@ 2014-08-13 22:47 Arnaldo Carvalho de Melo
  2014-08-14  8:40 ` Ingo Molnar
  0 siblings, 1 reply; 63+ messages in thread
From: Arnaldo Carvalho de Melo @ 2014-08-13 22:47 UTC
  To: Ingo Molnar
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Adrian Hunter,
	Alex Snast, Andi Kleen, Corey Ashford, David Ahern, Don Zickus,
	Frederic Weisbecker, Jean Pihet, Jiri Olsa, Masami Hiramatsu,
	Mike Galbraith, Minchan Kim, Namhyung Kim, Naohiro Aota,
	Paul Mackerras, Peter Zijlstra, Stephane Eranian,
	Arnaldo Carvalho de Melo

Hi Ingo,

	Please consider pulling, more to come! :-)

- Arnaldo

The following changes since commit ddcd0973fe02aad3d4bdc59dd0f1db90f51105a9:

  perf/x86/uncore: Rename IvyTown to IvyBridge-EP (2014-08-13 07:51:18 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo

for you to fetch changes up to 1c65056c547141a0cb52fb8e6056f63524d2bbf2:

  perf evlist: Add perf_evlist__enable_event_idx() (2014-08-13 19:23:48 -0300)

----------------------------------------------------------------
perf/probe fixes and improvements:

User visible changes:

. Do not show +/- callchain expansion when there are no children (top/report) (Namhyung Kim)

. Fix -z and add a corresponding 'z' hotkey to zero samples before refresh
  in 'perf top' (Namhyung Kim)

. Capability probing fixes, improving the detection of
  kernel features for non-privileged users (Adrian Hunter)

. Add beautifier for mremap flags param in 'trace' (Alex Snast)

. Fix the --list and --del options to work when only uprobe events
  exist (Masami Hiramatsu)

. Allow callchains in 'perf script' if any event samples them (Adrian Hunter)

. Don't look for kernel idle symbols in all DSOs in 'perf top' (Arnaldo Carvalho de Melo)

. Add cpu_startup_entry to the list of kernel idle symbols (Arnaldo Carvalho de Melo)

. 'perf top' terminal output fixes (Jiri Olsa)

. Fix stdin handling for 'perf kvm stat live' (Jiri Olsa)

. Fix missing label symbols (Adrian Hunter)

. Don't demangle C++ parameters and such by default, only in
  --verbose mode (Namhyung Kim)

. Set proper sort__mode for the branch option (Naohiro Aota)

. Check recorded kernel version when finding vmlinux (Namhyung Kim)

Developer Stuff:

. More prep work for Intel PT (Adrian Hunter)

. Fix possible memory leaks (Namhyung Kim)

. Fix a memory leak in vmlinux_path__init() (Namhyung Kim)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Adrian Hunter (14):
      perf tools: Fix CLOEXEC probe for perf_event_paranoid == 2
      perf tools: Fix one of the probe events to exclude kernel
      perf tools: Fix probing the kernel API with cpu-wide events
      perf tools: Prefer to use a cpu-wide event for probing CLOEXEC
      perf symbols: Fix missing label symbols
      perf evlist: Add 'system_wide' option
      perf evlist: Add perf_evlist__set_tracking_event()
      perf session: Add perf_session__peek_event()
      perf script: Allow callchains if any event samples them
      perf script python: Add helpers for calling Python objects
      perf tools: Identify which comms are from exec
      perf machine: Add machine__thread_exec_comm()
      perf tools: Add flags and insn_len to struct sample
      perf evlist: Add perf_evlist__enable_event_idx()

Alex Snast (1):
      perf trace: Add beautifier for mremap flags param

Arnaldo Carvalho de Melo (2):
      perf top: Don't look for kernel idle symbols in all DSOs
      perf tools: Add cpu_startup_entry to the list of kernel idle symbols

Jiri Olsa (4):
      perf top: Join the display thread on exit
      perf tools: Introduce set_term_quiet_input helper function
      perf top: Setup signals for terminal output
      perf kvm: Fix stdin handling for 'kvm stat live' command

Masami Hiramatsu (2):
      perf probe: Fix --list option to show events only with uprobe events
      perf probe: Fix --del option to delete events only with uprobe events

Namhyung Kim (17):
      perf script: Fix possible memory leaks
      perf symbols: Fix a memory leak in vmlinux_path__init()
      perf annotate: Move session handling out of __cmd_annotate()
      perf buildid-cache: Move session handling into cmd_buildid_cache()
      perf inject: Move session handling out of __cmd_inject()
      perf kmem: Move session handling out of __cmd_kmem()
      perf kvm: Move call to symbol__init() after creating session
      perf lock: Move call to symbol__init() after creating session
      perf sched: Move call to symbol__init() after creating session
      perf script: Move call to symbol__init() after creating session
      perf timechart: Move call to symbol__init() after creating session
      perf trace: Move call to symbol__init() after creating session
      perf tools: Check recorded kernel version when finding vmlinux
      perf hists browser: Fix a small callchain display bug
      perf top: Fix -z option behavior
      perf top: Handle 'z' key for toggle zeroing samples in TUI
      perf symbols: Don't demangle parameters and such by default

naota@elisp.net (1):
      perf report: Set proper sort__mode for the branch option

 tools/perf/builtin-annotate.c                      |  75 +++++++-------
 tools/perf/builtin-buildid-cache.c                 |  37 ++++---
 tools/perf/builtin-diff.c                          |   2 +-
 tools/perf/builtin-inject.c                        |  31 +++---
 tools/perf/builtin-kmem.c                          |  49 +++++----
 tools/perf/builtin-kvm.c                           |  24 ++---
 tools/perf/builtin-lock.c                          |   3 +-
 tools/perf/builtin-mem.c                           |   2 +-
 tools/perf/builtin-record.c                        |   2 +-
 tools/perf/builtin-report.c                        |   4 +-
 tools/perf/builtin-sched.c                         |   3 +-
 tools/perf/builtin-script.c                        |  57 +++++++----
 tools/perf/builtin-timechart.c                     |   4 +-
 tools/perf/builtin-top.c                           |  59 ++++++++---
 tools/perf/builtin-trace.c                         |  34 +++++-
 tools/perf/tests/builtin-test.c                    |   2 +-
 tools/perf/ui/browsers/hists.c                     |  13 ++-
 tools/perf/util/cloexec.c                          |  23 ++++-
 tools/perf/util/comm.c                             |   7 +-
 tools/perf/util/comm.h                             |   6 +-
 tools/perf/util/event.h                            |   2 +
 tools/perf/util/evlist.c                           | 109 ++++++++++++++++++--
 tools/perf/util/evlist.h                           |   5 +
 tools/perf/util/evsel.c                            |  34 ++++--
 tools/perf/util/evsel.h                            |   2 +
 tools/perf/util/hist.c                             |  22 ++++
 tools/perf/util/hist.h                             |   1 +
 tools/perf/util/machine.c                          |  30 +++++-
 tools/perf/util/machine.h                          |   4 +
 tools/perf/util/probe-event.c                      |  98 ++++++++++++------
 tools/perf/util/record.c                           |  40 ++++++--
 .../util/scripting-engines/trace-event-python.c    | 114 +++++++++------------
 tools/perf/util/session.c                          |  79 +++++++++++++-
 tools/perf/util/session.h                          |   5 +
 tools/perf/util/symbol-elf.c                       |   9 +-
 tools/perf/util/symbol.c                           |  31 ++++--
 tools/perf/util/symbol.h                           |   4 +-
 tools/perf/util/thread.c                           |  24 ++++-
 tools/perf/util/thread.h                           |  10 +-
 tools/perf/util/util.c                             |  13 +++
 tools/perf/util/util.h                             |   2 +
 41 files changed, 770 insertions(+), 305 deletions(-)


end of thread, other threads:[~2018-02-17 10:49 UTC | newest]

Thread overview: 63+ messages
2018-02-16 19:17 [GIT PULL 00/41] perf/core improvements and fixes Arnaldo Carvalho de Melo
2018-02-16 19:17 ` Arnaldo Carvalho de Melo
2018-02-16 19:17 ` Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 01/41] perf record: Put new line after target override warning Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 02/41] perf script: Add --show-round-event to display PERF_RECORD_FINISHED_ROUND Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 03/41] tools lib api fs: Add filename__read_xll function Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 04/41] tools lib api fs: Add sysfs__read_xll function Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 05/41] perf tests: Fix dwarf unwind for stripped binaries Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 06/41] perf tools: Fix comment for sort__* compare functions Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 07/41] perf report: Ask for ordered events for --tasks option Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 08/41] perf report: Add support to display group output for non group events Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 09/41] perf stat: Add support to print counts for fixed times Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 10/41] perf stat: Add support to print counts after a period of time Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 11/41] tools lib symbol: Skip non-address kallsyms line Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 12/41] perf symbols: Check if we read regular file in dso__load() Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 13/41] perf machine: Free root_dir in machine__init() error path Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 14/41] perf machine: Move kernel mmap name into struct machine Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 15/41] perf machine: Generalize machine__set_kernel_mmap() Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 16/41] perf machine: Don't search for active kernel start in __machine__create_kernel_maps Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 17/41] perf machine: Remove machine__load_kallsyms() Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 18/41] perf tools: Do not create kernel maps in sample__resolve() Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 19/41] perf tests: Use arch__compare_symbol_names to compare symbols Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 20/41] perf cs-etm: Freeing allocated memory Arnaldo Carvalho de Melo
2018-02-16 19:17   ` Arnaldo Carvalho de Melo
2018-02-16 19:17   ` Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 21/41] perf tools: Use target->per_thread and target->system_wide flags Arnaldo Carvalho de Melo
2018-02-16 19:17   ` Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 22/41] perf auxtrace arm: Fixing uninitialised variable Arnaldo Carvalho de Melo
2018-02-16 19:17   ` Arnaldo Carvalho de Melo
2018-02-16 19:17   ` Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 23/41] perf cs-etm: Properly deal with cpu maps Arnaldo Carvalho de Melo
2018-02-16 19:17   ` Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 24/41] perf annotate: Add missing arguments in Man page Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 25/41] perf kmem: Document a missing option & an argument Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 26/41] perf mem: Document a missing option Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 27/41] perf cs-etm: Inject capabilitity for CoreSight traces Arnaldo Carvalho de Melo
2018-02-16 19:17   ` Arnaldo Carvalho de Melo
2018-02-16 19:17   ` Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 28/41] perf inject: Emit instruction records on ETM trace discontinuity Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 29/41] coresight: Update documentation for perf usage Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 30/41] perf report: Fix description for --mem-mode Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 31/41] perf report: Fix wrong jump arrow Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 32/41] perf report: Fix memory corruption in --branch-history mode --branch-history Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 33/41] tools include powerpc: Grab a copy of arch/powerpc/include/uapi/asm/unistd.h Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 34/41] perf powerpc: Generate system call table from asm/unistd.h Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 35/41] perf trace powerpc: Use generated syscall table Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 36/41] perf record: Provide detailed information on s390 CPU Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 37/41] perf annotate: Scan cpuid for s390 and save machine type Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 38/41] perf cpuid: Introduce a platform specific cpuid compare function Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 39/41] perf test: Fix test case 23 for s390 z/VM or KVM guests Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 40/41] perf test: Fix test case inet_pton to accept inlines Arnaldo Carvalho de Melo
2018-02-16 19:17 ` [PATCH 41/41] perf tests shell lib: Use a wildcard to remove the vfs_getname probe Arnaldo Carvalho de Melo
2018-02-17 10:49 ` [GIT PULL 00/41] perf/core improvements and fixes Ingo Molnar
  -- strict thread matches above, loose matches on Subject: below --
2017-09-12 15:09 Arnaldo Carvalho de Melo
2014-08-13 22:47 Arnaldo Carvalho de Melo
2014-08-14  8:40 ` Ingo Molnar