linux-kernel.vger.kernel.org archive mirror
* [GIT PULL] perf changes for v3.8
@ 2012-12-11  9:09 Ingo Molnar
  2012-12-13  2:53 ` Linus Torvalds
  2012-12-13 17:04 ` [PATCH] x86: fix perf build with uclibc toolchains Florian Fainelli
From: Ingo Molnar @ 2012-12-11  9:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Peter Zijlstra,
	Thomas Gleixner, Andrew Morton

Linus,

Please pull the latest perf-core-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf-core-for-linus

   HEAD: cc1b39dbf9f55a438e8a21a694394c20e6a17129 Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core

Lots of activity:

   211 files changed, 8328 insertions(+), 4116 deletions(-)

most of it on the tooling side.

Main changes:
    
     * ftrace enhancements and fixes from Steve Rostedt.
    
     * uprobes fixes, cleanups and preparation for the ARM 
       port from Oleg Nesterov.
    
     * UAPI fixes from David Howells, preparing for the arch/x86 UAPI
       transition.
    
     * Separate perf tests into multiple objects, one per test, from Jiri Olsa.
    
     * Make hardware event translations available in sysfs, from Jiri Olsa.

     * Fixes to /proc/pid/maps parsing, preparatory to supporting data maps,
       from Namhyung Kim.
    
     * Implement ui_progress for GTK, from Namhyung Kim.
    
     * Add a framework for automated perf_event_attr tests, where tools are
       run with different command line options from 'perf test' via Python
       glue, and the perf syscall is intercepted to verify that the
       perf_event_attr fields set by the tool are the expected ones, from
       Jiri Olsa.
    
     * Add a 'link' method for hists, so that the leader has buckets for
       all the entries in all the hists. This new method is now used in the
       default 'diff' output, making the 'baseline' column sum to 100% and
       eliminating blind spots.
    
     * libtraceevent fixes for compiler warnings, to make perf build on
       some distros such as 32-bit Fedora 14; some of the warnings pointed
       to real bugs.
    
     * Add a browser for 'perf script' and make it available from the report
       and annotate browsers. It filters the scripts to find the ones that
       handle the events found in the perf.data file being used, from Feng Tang.
    
     * perf inject changes to allow showing where a task sleeps, from Andrew Vagin.
    
     * Makefile improvements from Namhyung Kim.
    
     * Add --pre and --post command hooks in 'stat', from Peter Zijlstra.
    
     * Don't stop synthesizing threads when one vanishes; this matters for
       the threads that already exist when a tool like 'trace' is started.
    
     * Use sched:sched_stat_runtime to provide a thread summary, this
       produces the same output as the 'trace summary' subcommand of
       tglx's original "trace" tool.
    
     * Support interrupted syscalls in 'trace'.
    
     * Add an event duration column and filter in 'trace'.
    
     * There are references to the man pages in some tools, so try to build
       Documentation when installing, warning the user if that is not possible,
       from Borislav Petkov.
    
     * Give user better message if precise is not supported, from David Ahern.
    
     * Try to find cross-built objdump path by using the session environment
       information in the perf.data file header, from Irina Tirdea, original
       patch and idea by Namhyung Kim.
    
     * Display more output from the feature checks with make V=1, so that one
       can figure out what is happening by looking at the gcc output, etc.,
       from Jiri Olsa.
    
     * Add on_exit implementation for systems without one, e.g. Android, from
       Bernhard Rosenkraenzer.
    
     * Only process events for vcpus of interest in 'perf kvm', which helps
       when handling a large number of events, from David Ahern.
    
     * Cross compilation fixes for Android, from Irina Tirdea.
    
     * Add documentation on compiling for Android, from Irina Tirdea.
    
     * perf diff improvements from Jiri Olsa.
    
     * Target (task/user/cpu/syswide) handling improvements, from Namhyung Kim.
    
     * Add support in 'trace' for tracing workload given by command line, from
       Namhyung Kim.

     * ... and much more.
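The hists 'link' idea can be illustrated with a minimal Python sketch, assuming a histogram is simply a mapping from symbol name to sample period. The names `Entry`, `link` and `baseline_column` are hypothetical; this is a conceptual model only, not perf's C implementation of hists__link. Entries that exist only in the other hists get a zero-period dummy bucket in the leader, so iterating the leader's rows covers every entry and the other column adds up to 100%.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entry:
    period: int                      # samples attributed to this symbol
    dummy: bool = False              # True if created only to hold a pair
    pair: Optional["Entry"] = None   # matching entry in the other hists


def link(leader: Dict[str, Entry], other: Dict[str, Entry]) -> None:
    """Pair every entry of `other` with a leader entry, creating a
    zero-period dummy bucket in the leader when the symbol is missing."""
    for sym, entry in other.items():
        if sym not in leader:
            leader[sym] = Entry(period=0, dummy=True)
        leader[sym].pair = entry


def baseline_column(leader: Dict[str, Entry],
                    other: Dict[str, Entry]) -> Dict[str, float]:
    """Percentage of `other`'s total period shown on each leader row."""
    total = sum(e.period for e in other.values())
    return {sym: (100.0 * e.pair.period / total if e.pair else 0.0)
            for sym, e in leader.items()}


baseline = {"foo": Entry(60), "bar": Entry(40)}
new = {"foo": Entry(70), "baz": Entry(30)}

link(new, baseline)
pct = baseline_column(new, baseline)
print(pct)  # 'bar' now has a (dummy) row, so the column sums to 100
```

Without the dummy bucket for 'bar', walking the rows of `new` would only show 60% of the baseline, which is the kind of blind spot described above.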
    
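A rough Python sketch of how an event duration column and a duration filter can work, assuming syscall enter/exit events have already been decoded into (timestamp_ns, tid, kind, name) tuples. The function name, the tuple layout and the "at least the threshold" comparison are assumptions for illustration; the real logic lives in C in builtin-trace.c and may differ in detail. Durations are computed per thread, so interleaved syscalls from different threads pair up correctly.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical decoded event: (timestamp_ns, tid, kind, syscall_name),
# where kind is "enter" or "exit".
Event = Tuple[int, int, str, str]


def syscall_durations(events: Iterable[Event],
                      min_duration_ms: float = 0.0) -> List[Tuple[int, str, float]]:
    """Pair each syscall exit with the pending enter on the same thread,
    compute its duration in ms and keep those of at least min_duration_ms."""
    pending: Dict[int, Tuple[int, str]] = {}
    result = []
    for ts, tid, kind, name in events:
        if kind == "enter":
            pending[tid] = (ts, name)
        elif kind == "exit" and tid in pending:
            start, syscall = pending.pop(tid)
            duration_ms = (ts - start) / 1e6
            if duration_ms >= min_duration_ms:
                result.append((tid, syscall, duration_ms))
        # An exit with no recorded enter (e.g. tracing started mid-syscall)
        # is ignored.
    return result


trace = [
    (0,         1, "enter", "read"),
    (500_000,   2, "enter", "write"),  # thread 2 runs while thread 1 waits
    (700_000,   2, "exit",  "write"),
    (2_000_000, 1, "exit",  "read"),
]
print(syscall_durations(trace))                       # both syscalls
print(syscall_durations(trace, min_duration_ms=1.0))  # only the long read
```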
Thanks,

	Ingo

------------------>
Andi Kleen (3):
      perf tools: Move parse_events error printing to parse_events_options
      perf annotate: Handle XBEGIN like a jump
      perf tools: Add arbitary aliases and support names with -

Andrew Vagin (3):
      perf inject: Work with files
      perf inject: Merge sched_stat_* and sched_switch events
      perf inject: Mark a dso if it's used

Arnaldo Carvalho de Melo (31):
      perf tools: Have the page size value available for all tools
      perf machine: Introduce find_thread method
      perf event: No need to create a thread when handling PERF_RECORD_EXIT
      perf annotate: Handle PERF_RECORD_EXIT events
      perf sched: Handle PERF_RECORD_EXIT events
      perf machine: Carve up event processing specific from perf_tool
      perf tools: Remove noise in python version feature test
      perf test: Align the 'Ok'/'FAILED!' test results
      perf trace: Support interrupted syscalls
      perf trace: Add an event duration column
      perf trace: Add duration filter
      perf tools: Pretty print errno for some more functions
      perf trace: Print the name of a syscall when failing to read its info
      perf tools: Don't stop synthesizing threads when one vanishes
      perf trace: Count number of events for each thread and globally
      perf trace: Use sched:sched_stat_runtime to provide a thread summary
      perf python: Initialize 'page_size' variable
      perf tools: Handle --version string generation on machines without git
      perf diff: Start moving to support matching more than two hists
      perf diff: Move hists__match to the hists lib
      perf hists: Introduce hists__link
      perf diff: Use hists__link when not pairing just with baseline
      perf machine: Move more methods to machine.[ch]
      tools lib traceevent: Add __maybe_unused to unused parameters
      tools lib traceevent: Avoid comparisions between signed/unsigned
      tools lib traceevent: No need to check for < 0 on an unsigned enum
      tools lib traceevent: Handle INVALID_ARG_TYPE errno in pevent_strerror
      tools lib traceevent: Use 'const' in variables pointing to const strings
      perf tools: Stop using 'self' in pstack
      perf hists: Initialize all of he->stat with zeroes
      perf evsel: Introduce is_group_member method

Bernhard Rosenkraenzer (1):
      perf tools: Add on_exit implementation

Borislav Petkov (1):
      perf tools: Try to build Documentation when installing

Daniel Walter (1):
      tracing: Replace strict_strto* with kstrto*

David Ahern (5):
      perf kvm: Only process events for vcpus of interest
      perf kvm: Remove typecast in init_kvm_event_record
      perf kvm: Total count is a u64, print as so
      perf kvm: Add braces around multi-line statements
      perf tools: Give user better message if precise is not supported

David Howells (3):
      tools: Define a Makefile function to do subdir processing
      tools: Honour the O= flag when tool build called from a higher Makefile
      tools: Pass the target in descend

David Sharp (4):
      tracing: Trivial cleanup
      tracing: Reset ring buffer when changing trace_clocks
      tracing,x86: Add a TSC trace_clock
      tracing: Format non-nanosec times from tsc clock without a decimal point.

David Vrabel (1):
      x86: Allow tracing of functions in arch/x86/kernel/rtc.c

Feng Tang (7):
      perf tools: Add a global variable "const char *input_name"
      perf script: Add more filter to find_scripts()
      perf scripts browser: Add a browser for perf script
      perf annotate browser: Integrate script browser into annotation browser
      perf hists browser: Integrate script browser into main hists browser
      perf header: Add is_perf_magic() func
      perf browser: Don't show scripts menu for 'perf top'

Hiraku Toyooka (1):
      tracing: Change tracer's integer flags to bool

Ingo Molnar (2):
      perf tools: Speed up the perf build time by simplifying the perf --version string generation
      perf tools: Further speed up the perf build

Irina Tirdea (3):
      perf tools: Update Makefile for Android
      Documentation: add documentation on compiling for Android
      perf tools: Try to find cross-built objdump path

Jiri Olsa (70):
      perf diff: Add -b option for perf diff to display paired entries only
      perf diff: Add ratio computation way to compare hist entries
      perf diff: Add option to sort entries based on diff computation
      perf diff: Add weighted diff computation way to compare hist entries
      perf diff: Add -p option to display period values for hist entries
      perf diff: Add -F option to display formula for computation
      perf diff: Include samples without symbol in overall stats
      perf diff: Display empty space for non paired samples
      perf/x86: Make hardware event translations available in sysfs
      perf/x86: Filter out undefined events from sysfs events attribute
      perf/x86: Add hardware events translations for Intel cpus
      perf/x86: Add hardware events translations for AMD cpus
      perf/x86: Add hardware events translations for Intel P6 cpus
      perf tools: Fix PMU object alias initialization
      perf tools: Add support to specify hw event as PMU event term
      perf test: Add automated tests for pmu sysfs translated events
      perf tools: Diplays more output on features check for make V=1
      perf tools: Move build_id__sprintf into build-id object
      perf tools: Move BUILD_ID_SIZE into build-id object
      perf tools: Move hex2u64 into util object
      perf tools: Move strxfrchar into string object
      perf tools: Move dso_* related functions into dso object
      perf record: Fix mmap error output condition
      perf tools: Remove BINDIR define from exec_cmd.o compilation
      perf tests: Move test objects into 'tests' directory
      perf tests: Add framework for automated perf_event_attr tests
      perf tests: Add attr record basic test
      perf tests: Add attr tests under builtin test command
      perf tests: Add attr record group test
      perf tests: Add attr record event syntax group test
      perf tests: Add attr record freq test
      perf tests: Add attr record count test
      perf tests: Add attr record graph test
      perf tests: Add attr record period test
      perf tests: Add attr record no samples test
      perf tests: Add attr record no-inherit test
      perf tests: Add attr record data test
      perf tests: Add attr record raw test
      perf tests: Add attr record no delay test
      perf tests: Add attr record branch any test
      perf tests: Add attr record branch filter tests
      perf tests: Add attr stat no-inherit test
      perf tests: Add attr stat group test
      perf tests: Add attr stat event syntax group test
      perf tests: Add attr stat default test
      perf tests: Add attr stat default test
      perf tests: Add documentation for attr tests
      perf tests: Add missing attr stat basic test
      perf tests: Factor attr tests WRITE_ASS macro
      perf tests: Fix attr watermark field name typo
      perf tests: Removing 'optional' field
      perf tests: Move attr.py temp dir cleanup into finally section
      perf tools: Add LIBDW_DIR Makefile variable to for alternate libdw
      perf tests: Move test__vmlinux_matches_kallsyms into separate object
      perf tests: Move test__open_syscall_event into separate object
      perf tests: Move test__open_syscall_event_on_all_cpus into separate object
      perf tests: Move test__basic_mmap into separate object
      perf tests: Move test__PERF_RECORD into separate object
      perf tests: Move test__rdpmc into separate object
      perf tests: Move perf_evsel__roundtrip_name_test into separate object
      perf tests: Move perf_evsel__tp_sched_test into separate object
      perf tests: Move test__syscall_open_tp_fields into separate object
      perf tests: Move pmu tests into separate object
      perf tests: Final cleanup for builtin-test move
      perf tests: Check for mkstemp return value in dso-data test
      perf tools: Fix attributes for '{}' defined event groups
      perf tools: Fix 'disabled' attribute config for record command
      perf tools: Ensure single disable call per event in record comand
      perf tools: Omit group members from perf_evlist__disable/enable
      perf tools: Add basic event modifier sanity check

Joonsoo Kim (1):
      perf tools: Add info about cross compiling for Android ARM

Jovi Zhang (1):
      uprobes: Fix misleading log entry

Michal Hocko (1):
      linux/kernel.h: Remove duplicate trace_printk declaration

Namhyung Kim (27):
      perf trace: Validate target task/user/cpu argument
      perf trace: Explicitly enable system-wide mode if no option is given
      perf trace: Add support for tracing workload given by command line
      tools lib traceevent: Do not generate dependency for system header files
      perf tools: Cleanup doc related targets
      perf tools: Convert invocation of MAKE into SUBDIR
      perf tools: Always show CHK message when doing try-cc
      perf tools: Fix LIBELF_MMAP checking
      perf tools: Warn about missing libelf
      perf tools: Use normalized arch name for searching objdump path
      perf tools: Introduce struct hist_browser_timer
      perf report: Postpone objdump check until annotation requested
      perf machine: Set kernel data mapping length
      perf tools: Fix detection of stack area
      perf hists: Free branch_info when freeing hist_entry
      perf tools: Don't try to lookup objdump for live mode
      perf annotate: Whitespace fixups
      perf annotate: Don't try to follow jump target on PLT symbols
      perf annotate: Merge same lines in summary view
      perf tools: Fix compile error on NO_NEWT=1 build
      perf tools: Add gtk.<command> config option for launching GTK browser
      perf tools: Use sscanf for parsing /proc/pid/maps
      perf ui tui: Move progress.c under ui/tui directory
      perf ui: Introduce generic ui_progress helper
      perf ui gtk: Implement ui_progress functions
      perf ui: Add ui_progress__finish()
      perf ui: Always compile browser setup code

Oleg Nesterov (5):
      uprobes/powerpc: Don't clear TIF_UPROBE in do_notify_resume()
      uprobes/powerpc: Do not use arch_uprobe_*_step() helpers
      uprobes/x86: Cleanup the single-stepping code
      uprobes: Kill arch_uprobe_enable/disable_step() hooks
      uprobes: Use percpu_rw_semaphore to fix register/unregister vs dup_mmap() race

Peter Huewe (1):
      perf/x86: Fix sparse warnings

Peter Zijlstra (1):
      perf stat: Add --pre and --post command

Rabin Vincent (1):
      uprobes: Flush cache after xol write

Shan Wei (1):
      tracing: Kill unused and puzzled sample code in ftrace.h

Slava Pestov (1):
      ring-buffer: Add a 'dropped events' counter

Steven Rostedt (11):
      tracing: Allow tracers to start at core initcall
      tracing: Expand ring buffer when trace_printk() is used
      tracing: Enable comm recording if trace_printk() is used
      tracing: Have tracing_sched_wakeup_trace() use standard unlock_commit
      tracing: Cache comms only after an event occurred
      tracing: Separate open function from set_event and available_events
      tracing: Remove unused function unregister_tracer()
      tracing: Make tracing_enabled be equal to tracing_on
      tracing: Remove deprecated tracing_enabled file
      tracing: Use irq_work for wake ups and remove *_nowake_*() functions
      tracing: Add trace_options kernel command line parameter

Sukadev Bhattiprolu (1):
      perf powerpc: Use uapi/unistd.h to fix build error

Suzuki K. Poulose (1):
      Account the nr_entries in rblist properly

Vaibhav Nagarnaik (1):
      tracing: Cleanup unnecessary function declarations

Wei Yongjun (1):
      perf tools: Remove duplicated include from trace-event-python.c

Yoshihiro YUNOMAE (2):
      ring-buffer: Change unsigned long type of ring_buffer_oldest_event_ts() to u64
      tracing: Show raw time stamp on stats per cpu using counter or tsc mode for trace_clock

Zheng Liu (1):
      perf test: fix a build error on builtin-test


 Documentation/kernel-parameters.txt                |   16 +
 arch/alpha/include/asm/Kbuild                      |    1 +
 arch/arm/include/asm/Kbuild                        |    1 +
 arch/arm64/include/asm/Kbuild                      |    1 +
 arch/avr32/include/asm/Kbuild                      |    1 +
 arch/blackfin/include/asm/Kbuild                   |    1 +
 arch/c6x/include/asm/Kbuild                        |    1 +
 arch/cris/include/asm/Kbuild                       |    1 +
 arch/frv/include/asm/Kbuild                        |    1 +
 arch/h8300/include/asm/Kbuild                      |    1 +
 arch/hexagon/include/asm/Kbuild                    |    1 +
 arch/ia64/include/asm/Kbuild                       |    1 +
 arch/m32r/include/asm/Kbuild                       |    1 +
 arch/m68k/include/asm/Kbuild                       |    1 +
 arch/microblaze/include/asm/Kbuild                 |    1 +
 arch/mips/include/asm/Kbuild                       |    1 +
 arch/mn10300/include/asm/Kbuild                    |    1 +
 arch/openrisc/include/asm/Kbuild                   |    1 +
 arch/parisc/include/asm/Kbuild                     |    1 +
 arch/powerpc/include/asm/Kbuild                    |    1 +
 arch/powerpc/kernel/signal.c                       |    4 +-
 arch/powerpc/kernel/uprobes.c                      |    6 +
 arch/s390/include/asm/Kbuild                       |    1 +
 arch/score/include/asm/Kbuild                      |    1 +
 arch/sh/include/asm/Kbuild                         |    1 +
 arch/sparc/include/asm/Kbuild                      |    1 +
 arch/tile/include/asm/Kbuild                       |    1 +
 arch/um/include/asm/Kbuild                         |    1 +
 arch/unicore32/include/asm/Kbuild                  |    1 +
 arch/x86/include/asm/trace_clock.h                 |   20 +
 arch/x86/kernel/Makefile                           |    2 +-
 arch/x86/kernel/cpu/perf_event.c                   |  121 ++
 arch/x86/kernel/cpu/perf_event.h                   |    5 +
 arch/x86/kernel/cpu/perf_event_amd.c               |    9 +
 arch/x86/kernel/cpu/perf_event_intel.c             |    9 +
 arch/x86/kernel/cpu/perf_event_p6.c                |    2 +
 arch/x86/kernel/rtc.c                              |    6 -
 arch/x86/kernel/trace_clock.c                      |   21 +
 arch/x86/kernel/tsc.c                              |    6 +
 arch/x86/kernel/uprobes.c                          |   54 +-
 arch/xtensa/include/asm/Kbuild                     |    1 +
 include/asm-generic/trace_clock.h                  |   16 +
 include/linux/ftrace_event.h                       |   20 +-
 include/linux/kernel.h                             |    7 +-
 include/linux/ring_buffer.h                        |    3 +-
 include/linux/trace_clock.h                        |    2 +
 include/linux/uprobes.h                            |   10 +-
 include/trace/ftrace.h                             |   76 +-
 include/trace/syscall.h                            |   23 -
 kernel/events/uprobes.c                            |   43 +-
 kernel/fork.c                                      |    2 +
 kernel/trace/Kconfig                               |    1 +
 kernel/trace/ftrace.c                              |    6 +-
 kernel/trace/ring_buffer.c                         |   51 +-
 kernel/trace/trace.c                               |  411 +++---
 kernel/trace/trace.h                               |   18 +-
 kernel/trace/trace_branch.c                        |    4 +-
 kernel/trace/trace_events.c                        |   51 +-
 kernel/trace/trace_events_filter.c                 |    4 +-
 kernel/trace/trace_functions.c                     |    5 +-
 kernel/trace/trace_functions_graph.c               |    6 +-
 kernel/trace/trace_irqsoff.c                       |   14 +-
 kernel/trace/trace_kprobe.c                        |   10 +-
 kernel/trace/trace_output.c                        |   78 +-
 kernel/trace/trace_probe.c                         |   14 +-
 kernel/trace/trace_sched_switch.c                  |    4 +-
 kernel/trace/trace_sched_wakeup.c                  |   10 +-
 kernel/trace/trace_selftest.c                      |   13 +-
 kernel/trace/trace_syscalls.c                      |   61 +-
 kernel/trace/trace_uprobe.c                        |    4 +-
 tools/lib/traceevent/Makefile                      |    2 +-
 tools/lib/traceevent/event-parse.c                 |   22 +-
 tools/perf/Documentation/Makefile                  |   31 +-
 tools/perf/Documentation/android.txt               |   78 +
 tools/perf/Documentation/perf-diff.txt             |   60 +
 tools/perf/Documentation/perf-inject.txt           |   11 +
 tools/perf/Documentation/perf-stat.txt             |    5 +
 tools/perf/Documentation/perf-trace.txt            |    6 +
 tools/perf/Makefile                                |  174 ++-
 tools/perf/arch/common.c                           |  211 +++
 tools/perf/arch/common.h                           |   10 +
 tools/perf/builtin-annotate.c                      |   17 +-
 tools/perf/builtin-buildid-cache.c                 |    1 +
 tools/perf/builtin-buildid-list.c                  |    6 +-
 tools/perf/builtin-diff.c                          |  437 +++++-
 tools/perf/builtin-evlist.c                        |    5 +-
 tools/perf/builtin-inject.c                        |  195 ++-
 tools/perf/builtin-kmem.c                          |    5 +-
 tools/perf/builtin-kvm.c                           |   35 +-
 tools/perf/builtin-lock.c                          |    2 -
 tools/perf/builtin-record.c                        |   66 +-
 tools/perf/builtin-report.c                        |   23 +-
 tools/perf/builtin-sched.c                         |    8 +-
 tools/perf/builtin-script.c                        |   87 +-
 tools/perf/builtin-stat.c                          |   54 +-
 tools/perf/builtin-test.c                          | 1547 --------------------
 tools/perf/builtin-timechart.c                     |    5 +-
 tools/perf/builtin-top.c                           |   17 +-
 tools/perf/builtin-trace.c                         |  403 ++++-
 tools/perf/config/feature-tests.mak                |   25 +-
 tools/perf/config/utilities.mak                    |   10 +-
 tools/perf/perf.c                                  |   20 +-
 tools/perf/perf.h                                  |   18 +-
 tools/perf/tests/attr.c                            |  175 +++
 tools/perf/tests/attr.py                           |  322 ++++
 tools/perf/tests/attr/README                       |   64 +
 tools/perf/tests/attr/base-record                  |   39 +
 tools/perf/tests/attr/base-stat                    |   39 +
 tools/perf/tests/attr/test-record-basic            |    5 +
 tools/perf/tests/attr/test-record-branch-any       |    8 +
 .../perf/tests/attr/test-record-branch-filter-any  |    8 +
 .../tests/attr/test-record-branch-filter-any_call  |    8 +
 .../tests/attr/test-record-branch-filter-any_ret   |    8 +
 tools/perf/tests/attr/test-record-branch-filter-hv |    8 +
 .../tests/attr/test-record-branch-filter-ind_call  |    8 +
 tools/perf/tests/attr/test-record-branch-filter-k  |    8 +
 tools/perf/tests/attr/test-record-branch-filter-u  |    8 +
 tools/perf/tests/attr/test-record-count            |    8 +
 tools/perf/tests/attr/test-record-data             |    8 +
 tools/perf/tests/attr/test-record-freq             |    6 +
 tools/perf/tests/attr/test-record-graph-default    |    6 +
 tools/perf/tests/attr/test-record-graph-dwarf      |   10 +
 tools/perf/tests/attr/test-record-graph-fp         |    6 +
 tools/perf/tests/attr/test-record-group            |   18 +
 tools/perf/tests/attr/test-record-group1           |   19 +
 tools/perf/tests/attr/test-record-no-delay         |    9 +
 tools/perf/tests/attr/test-record-no-inherit       |    7 +
 tools/perf/tests/attr/test-record-no-samples       |    6 +
 tools/perf/tests/attr/test-record-period           |    7 +
 tools/perf/tests/attr/test-record-raw              |    7 +
 tools/perf/tests/attr/test-stat-basic              |    6 +
 tools/perf/tests/attr/test-stat-default            |   64 +
 tools/perf/tests/attr/test-stat-detailed-1         |  101 ++
 tools/perf/tests/attr/test-stat-detailed-2         |  155 ++
 tools/perf/tests/attr/test-stat-detailed-3         |  173 +++
 tools/perf/tests/attr/test-stat-group              |   15 +
 tools/perf/tests/attr/test-stat-group1             |   15 +
 tools/perf/tests/attr/test-stat-no-inherit         |    7 +
 tools/perf/tests/builtin-test.c                    |  173 +++
 .../{util/dso-test-data.c => tests/dso-data.c}     |    8 +-
 tools/perf/tests/evsel-roundtrip-name.c            |  114 ++
 tools/perf/tests/evsel-tp-sched.c                  |   84 ++
 tools/perf/tests/mmap-basic.c                      |  162 ++
 tools/perf/tests/open-syscall-all-cpus.c           |  120 ++
 tools/perf/tests/open-syscall-tp-fields.c          |  117 ++
 tools/perf/tests/open-syscall.c                    |   66 +
 .../parse-events-test.c => tests/parse-events.c}   |   91 +-
 tools/perf/tests/perf-record.c                     |  312 ++++
 tools/perf/tests/pmu.c                             |  178 +++
 tools/perf/tests/rdpmc.c                           |  175 +++
 tools/perf/tests/tests.h                           |   22 +
 tools/perf/tests/util.c                            |   30 +
 tools/perf/tests/vmlinux-kallsyms.c                |  230 +++
 tools/perf/ui/browsers/annotate.c                  |   45 +-
 tools/perf/ui/browsers/hists.c                     |   97 +-
 tools/perf/ui/browsers/scripts.c                   |  189 +++
 tools/perf/ui/gtk/browser.c                        |    4 +-
 tools/perf/ui/gtk/gtk.h                            |    1 +
 tools/perf/ui/gtk/progress.c                       |   59 +
 tools/perf/ui/gtk/setup.c                          |    2 +
 tools/perf/ui/gtk/util.c                           |   11 -
 tools/perf/ui/hist.c                               |  138 +-
 tools/perf/ui/progress.c                           |   44 +-
 tools/perf/ui/progress.h                           |   10 +
 tools/perf/ui/stdio/hist.c                         |    2 +-
 tools/perf/ui/tui/progress.c                       |   42 +
 tools/perf/ui/tui/setup.c                          |    1 +
 tools/perf/ui/ui.h                                 |   28 +
 tools/perf/util/PERF-VERSION-GEN                   |   14 +-
 tools/perf/util/annotate.c                         |   72 +-
 tools/perf/util/annotate.h                         |   10 +-
 tools/perf/util/build-id.c                         |   27 +-
 tools/perf/util/build-id.h                         |   11 +-
 tools/perf/util/cache.h                            |   39 +-
 tools/perf/util/debug.h                            |    1 +
 tools/perf/util/dso.c                              |  595 ++++++++
 tools/perf/util/dso.h                              |  148 ++
 tools/perf/util/event.c                            |  302 +---
 tools/perf/util/event.h                            |    9 +-
 tools/perf/util/evlist.c                           |   13 +-
 tools/perf/util/evsel.c                            |   52 +-
 tools/perf/util/evsel.h                            |    8 +-
 tools/perf/util/header.c                           |   11 +
 tools/perf/util/header.h                           |    1 +
 tools/perf/util/hist.c                             |   99 ++
 tools/perf/util/hist.h                             |   49 +-
 tools/perf/util/machine.c                          |  464 ++++++
 tools/perf/util/machine.h                          |  148 ++
 tools/perf/util/map.c                              |  182 +--
 tools/perf/util/map.h                              |   93 --
 tools/perf/util/parse-events.c                     |   54 +-
 tools/perf/util/parse-events.h                     |    3 +-
 tools/perf/util/parse-events.l                     |    4 +-
 tools/perf/util/parse-events.y                     |   18 +
 tools/perf/util/pmu.c                              |  192 +--
 tools/perf/util/pmu.h                              |    4 +
 tools/perf/util/pstack.c                           |   46 +-
 tools/perf/util/python.c                           |    2 +
 tools/perf/util/rblist.c                           |    4 +-
 .../util/scripting-engines/trace-event-python.c    |    1 -
 tools/perf/util/session.c                          |    5 +-
 tools/perf/util/session.h                          |    5 +-
 tools/perf/util/sort.h                             |   45 +-
 tools/perf/util/string.c                           |   18 +
 tools/perf/util/symbol.c                           |  658 +--------
 tools/perf/util/symbol.h                           |  162 +-
 tools/perf/util/thread.c                           |   41 +-
 tools/perf/util/thread.h                           |    2 +
 tools/perf/util/trace-event-read.c                 |    2 -
 tools/perf/util/util.c                             |   35 +
 tools/perf/util/util.h                             |    8 +
 211 files changed, 8328 insertions(+), 4116 deletions(-)
 create mode 100644 arch/x86/include/asm/trace_clock.h
 create mode 100644 arch/x86/kernel/trace_clock.c
 create mode 100644 include/asm-generic/trace_clock.h
 create mode 100644 tools/perf/Documentation/android.txt
 create mode 100644 tools/perf/arch/common.c
 create mode 100644 tools/perf/arch/common.h
 delete mode 100644 tools/perf/builtin-test.c
 create mode 100644 tools/perf/tests/attr.c
 create mode 100644 tools/perf/tests/attr.py
 create mode 100644 tools/perf/tests/attr/README
 create mode 100644 tools/perf/tests/attr/base-record
 create mode 100644 tools/perf/tests/attr/base-stat
 create mode 100644 tools/perf/tests/attr/test-record-basic
 create mode 100644 tools/perf/tests/attr/test-record-branch-any
 create mode 100644 tools/perf/tests/attr/test-record-branch-filter-any
 create mode 100644 tools/perf/tests/attr/test-record-branch-filter-any_call
 create mode 100644 tools/perf/tests/attr/test-record-branch-filter-any_ret
 create mode 100644 tools/perf/tests/attr/test-record-branch-filter-hv
 create mode 100644 tools/perf/tests/attr/test-record-branch-filter-ind_call
 create mode 100644 tools/perf/tests/attr/test-record-branch-filter-k
 create mode 100644 tools/perf/tests/attr/test-record-branch-filter-u
 create mode 100644 tools/perf/tests/attr/test-record-count
 create mode 100644 tools/perf/tests/attr/test-record-data
 create mode 100644 tools/perf/tests/attr/test-record-freq
 create mode 100644 tools/perf/tests/attr/test-record-graph-default
 create mode 100644 tools/perf/tests/attr/test-record-graph-dwarf
 create mode 100644 tools/perf/tests/attr/test-record-graph-fp
 create mode 100644 tools/perf/tests/attr/test-record-group
 create mode 100644 tools/perf/tests/attr/test-record-group1
 create mode 100644 tools/perf/tests/attr/test-record-no-delay
 create mode 100644 tools/perf/tests/attr/test-record-no-inherit
 create mode 100644 tools/perf/tests/attr/test-record-no-samples
 create mode 100644 tools/perf/tests/attr/test-record-period
 create mode 100644 tools/perf/tests/attr/test-record-raw
 create mode 100644 tools/perf/tests/attr/test-stat-basic
 create mode 100644 tools/perf/tests/attr/test-stat-default
 create mode 100644 tools/perf/tests/attr/test-stat-detailed-1
 create mode 100644 tools/perf/tests/attr/test-stat-detailed-2
 create mode 100644 tools/perf/tests/attr/test-stat-detailed-3
 create mode 100644 tools/perf/tests/attr/test-stat-group
 create mode 100644 tools/perf/tests/attr/test-stat-group1
 create mode 100644 tools/perf/tests/attr/test-stat-no-inherit
 create mode 100644 tools/perf/tests/builtin-test.c
 rename tools/perf/{util/dso-test-data.c => tests/dso-data.c} (95%)
 create mode 100644 tools/perf/tests/evsel-roundtrip-name.c
 create mode 100644 tools/perf/tests/evsel-tp-sched.c
 create mode 100644 tools/perf/tests/mmap-basic.c
 create mode 100644 tools/perf/tests/open-syscall-all-cpus.c
 create mode 100644 tools/perf/tests/open-syscall-tp-fields.c
 create mode 100644 tools/perf/tests/open-syscall.c
 rename tools/perf/{util/parse-events-test.c => tests/parse-events.c} (94%)
 create mode 100644 tools/perf/tests/perf-record.c
 create mode 100644 tools/perf/tests/pmu.c
 create mode 100644 tools/perf/tests/rdpmc.c
 create mode 100644 tools/perf/tests/tests.h
 create mode 100644 tools/perf/tests/util.c
 create mode 100644 tools/perf/tests/vmlinux-kallsyms.c
 create mode 100644 tools/perf/ui/browsers/scripts.c
 create mode 100644 tools/perf/ui/gtk/progress.c
 create mode 100644 tools/perf/ui/tui/progress.c
 create mode 100644 tools/perf/util/dso.c
 create mode 100644 tools/perf/util/dso.h
 create mode 100644 tools/perf/util/machine.c
 create mode 100644 tools/perf/util/machine.h

[ ... half a meg diff omitted due to lkml limits ... ]

* Re: [GIT PULL] perf changes for v3.8
  2012-12-11  9:09 [GIT PULL] perf changes for v3.8 Ingo Molnar
@ 2012-12-13  2:53 ` Linus Torvalds
  2012-12-13  3:02   ` David Ahern
  2012-12-13  3:25   ` David Ahern
  2012-12-13 17:04 ` [PATCH] x86: fix perf build with uclibc toolchains Florian Fainelli
  1 sibling, 2 replies; 34+ messages in thread
From: Linus Torvalds @ 2012-12-13  2:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton

Hmm. This may be entirely unrelated to this particular pull request, but

   perf record -e cycles:pp

no longer works on my westmere machine (Operation not supported). It
used to work, but I haven't tried to bisect it, since I hope somebody
will just go "oh, I know what's up".

dmesg says:

  Performance Events: PEBS fmt1+, 16-deep LBR, Westmere events, Intel PMU driver.
  perf_event_intel: CPUID marked event: 'bus cycles' unavailable
  ... version:                3
  ... bit width:              48
  ... generic registers:      4
  ... value mask:             0000ffffffffffff
  ... max period:             000000007fffffff
  ... fixed-purpose events:   3
  ... event mask:             000000070000000f

Any ideas?

              Linus

* Re: [GIT PULL] perf changes for v3.8
  2012-12-13  2:53 ` Linus Torvalds
@ 2012-12-13  3:02   ` David Ahern
  2012-12-13  3:09     ` Linus Torvalds
  2012-12-13  3:25   ` David Ahern
  1 sibling, 1 reply; 34+ messages in thread
From: David Ahern @ 2012-12-13  3:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton

On 12/12/12 7:53 PM, Linus Torvalds wrote:
> Hmm. This may be entirely unrelated to this particular pull request, but
>
>     perf record -e cycles:pp
>
> no longer works on my westmere machine (Operation not supported). It
> used to work, but I haven't tried to bisect it, since I hope somebody
> will just go "oh, I know what's up".

Can you add -v and see if it spits out more info?

David

* Re: [GIT PULL] perf changes for v3.8
  2012-12-13  3:02   ` David Ahern
@ 2012-12-13  3:09     ` Linus Torvalds
  2012-12-13  3:16       ` David Ahern
  0 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2012-12-13  3:09 UTC (permalink / raw)
  To: David Ahern
  Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton

On Wed, Dec 12, 2012 at 7:02 PM, David Ahern <dsahern@gmail.com> wrote:
>
> Can you add -v and see if it spits out more info?

No more info.

Sure, it does the usual "do you have an APIC" message (it does that
without "-v" too), which isn't useful:

  Error: sys_perf_event_open() syscall returned with 95 (Operation not supported) for event cycles:pp. /bin/dmesg may provide additional information.

  No hardware sampling interrupt available. No APIC? If so then you can boot the kernel with the "lapic" boot parameter to force-enable it.

And yes, I have a local apic. Every single modern CPU does.

The error message is garbage and actively misleading. Lack of an APIC
is just about the *least* likely possible reason for the EOPNOTSUPP
error return.

              Linus

* Re: [GIT PULL] perf changes for v3.8
  2012-12-13  3:09     ` Linus Torvalds
@ 2012-12-13  3:16       ` David Ahern
  0 siblings, 0 replies; 34+ messages in thread
From: David Ahern @ 2012-12-13  3:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton

On 12/12/12 8:09 PM, Linus Torvalds wrote:
> On Wed, Dec 12, 2012 at 7:02 PM, David Ahern <dsahern@gmail.com> wrote:
>>
>> Can you add -v and see if it spits out more info?
>
> No more info.

I'm surprised you are not seeing this as well:

            } else if ((err == EOPNOTSUPP) && (attr->precise_ip)) {
                ui__error("\'precise\' request may not be supported. "
                          "Try removing 'p' modifier\n");
                rc = -err;
                goto out;
            }

I made changes in this area relatively recently; I'll take a look.
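A minimal sketch of the ordering David's snippet implies, assuming hypothetical helpers classify_open_error() and hint_text() (these are not perf's functions, and every message except the 'precise' one is an illustrative placeholder): test EOPNOTSUPP together with precise_ip before falling back to any generic hint, so that a refused ':pp' event is not blamed on a missing APIC.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical hint categories; real perf prints its messages inline. */
enum open_hint {
	HINT_PRECISE_UNSUPPORTED,	/* EOPNOTSUPP with a 'p' modifier */
	HINT_EVENT_UNSUPPORTED,		/* EOPNOTSUPP without 'p' */
	HINT_PERMISSION,		/* EACCES or EPERM */
	HINT_GENERIC,			/* anything else */
};

/* Pick the most specific explanation first, as the snippet above does. */
static enum open_hint classify_open_error(int err, unsigned int precise_ip)
{
	assert(err > 0);	/* caller passes a positive errno value */

	if (err == EOPNOTSUPP)
		return precise_ip ? HINT_PRECISE_UNSUPPORTED
				  : HINT_EVENT_UNSUPPORTED;
	if (err == EACCES || err == EPERM)
		return HINT_PERMISSION;
	return HINT_GENERIC;
}

/* Map a category to user-facing text (placeholders, except the first). */
static const char *hint_text(enum open_hint hint)
{
	switch (hint) {
	case HINT_PRECISE_UNSUPPORTED:
		return "'precise' request may not be supported. "
		       "Try removing 'p' modifier";
	case HINT_EVENT_UNSUPPORTED:
		return "event not supported by this kernel or PMU";
	case HINT_PERMISSION:
		return "permission denied; check /proc/sys/kernel/perf_event_paranoid";
	default:
		return "/bin/dmesg may provide additional information";
	}
}
```

With these helpers, classify_open_error(EOPNOTSUPP, 2) yields HINT_PRECISE_UNSUPPORTED, while the same errno without a 'p' modifier yields HINT_EVENT_UNSUPPORTED. The APIC hint Linus saw presumably came from an older binary that had no such precise-specific branch.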

David

* Re: [GIT PULL] perf changes for v3.8
  2012-12-13  2:53 ` Linus Torvalds
  2012-12-13  3:02   ` David Ahern
@ 2012-12-13  3:25   ` David Ahern
  2012-12-13  3:34     ` Linus Torvalds
  1 sibling, 1 reply; 34+ messages in thread
From: David Ahern @ 2012-12-13  3:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton

On 12/12/12 7:53 PM, Linus Torvalds wrote:
> Hmm. This may be entirely unrelated to this particular pull request, but
>
>     perf record -e cycles:pp
>
> no longer works on my westmere machine (Operation not supported). It
> used to work, but I haven't tried to bisect it, since I hope somebody
> will just go "oh, I know what's up".

One last "I may know what's up" question. I wonder if you are tripping 
on this:

     if (event->attr.precise_ip) {
         int precise = 0;

         if (!event->attr.exclude_guest)
             return -EOPNOTSUPP;

Are you running an older perf binary on the 3.8 kernel?

Does this work: perf record -e cycles:ppH  ...
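The kernel-side test above can be modelled in a few lines. This is a sketch under stated assumptions: struct attr_model stands in for the two relevant bit-fields of struct perf_event_attr (<linux/perf_event.h>), and parse_modifiers() is a hypothetical, simplified parser in which each 'p' raises precise_ip and only 'H' sets exclude_guest, roughly what an older perf binary sends.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for the two struct perf_event_attr bit-fields involved. */
struct attr_model {
	unsigned int precise_ip    : 2;	/* incremented by each 'p', 0..3 */
	unsigned int exclude_guest : 1;	/* set by 'H' (host-only) */
};

/* Mirrors the quoted test: precise sampling is refused unless the
 * caller explicitly excluded guest execution. */
static int check_precise(struct attr_model attr)
{
	if (attr.precise_ip && !attr.exclude_guest)
		return -EOPNOTSUPP;
	return 0;
}

/* Simplified modifier parsing; other modifiers (u, k, G, ...) are ignored. */
static struct attr_model parse_modifiers(const char *mod)
{
	struct attr_model attr = { 0, 0 };

	assert(mod);
	for (; *mod; mod++) {
		if (*mod == 'p' && attr.precise_ip < 3)
			attr.precise_ip++;
		else if (*mod == 'H')
			attr.exclude_guest = 1;
	}
	return attr;
}
```

Here parse_modifiers("pp") gives precise_ip == 2 with exclude_guest == 0, so check_precise() returns -EOPNOTSUPP; "ppH" sets exclude_guest and passes, which matches what Linus reports.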

David

* Re: [GIT PULL] perf changes for v3.8
  2012-12-13  3:25   ` David Ahern
@ 2012-12-13  3:34     ` Linus Torvalds
  2012-12-13  3:43       ` David Ahern
  0 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2012-12-13  3:34 UTC (permalink / raw)
  To: David Ahern
  Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton

On Wed, Dec 12, 2012 at 7:25 PM, David Ahern <dsahern@gmail.com> wrote:
>
> Are you running an older perf binary on the 3.8 kernel?

I am.. I don't tend to rebuild 'perf'..

> Does this work: perf record -e cycles:ppH  ...

Yes it does. What is 'H' and why should anybody care? Especially since
I'm not running virtualized.

That whole "exclude_guest" test is insane when there isn't any
virtualization going on. Very annoying.

                Linus

* Re: [GIT PULL] perf changes for v3.8
  2012-12-13  3:34     ` Linus Torvalds
@ 2012-12-13  3:43       ` David Ahern
  2012-12-13  3:51         ` Linus Torvalds
                           ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: David Ahern @ 2012-12-13  3:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton

On 12/12/12 8:34 PM, Linus Torvalds wrote:
> On Wed, Dec 12, 2012 at 7:25 PM, David Ahern <dsahern@gmail.com> wrote:
>>
>> Are you running an older perf binary on the 3.8 kernel?
>
> I am.. I don't tend to rebuild 'perf'..
>
>> Does this work: perf record -e cycles:ppH  ...
>
> Yes it does. What is 'H' and why should anybody care? Especially since
> I'm not running virtualized.
>
> That whole "exclude_guest" test is insane when there isn't any
> virtualization going on. Very annoying.

You know what's worse? All of your VMs blowing up because someone runs
perf with the precise attribute. Virtualization and performance
monitoring collide. From the log message for commit 1342798:

"Intel PEBS in VT-x context uses the DS address as a guest linear
address, even though it's programmed by the host as a host linear
address. This either results in guest memory corruption and/or the
hardware faulting and 'crashing' the virtual machine.  Therefore we have
to disable PEBS on VT-x enter and re-enable on VT-x exit, enforcing a
strict exclude_guest."

David

* Re: [GIT PULL] perf changes for v3.8
  2012-12-13  3:43       ` David Ahern
@ 2012-12-13  3:51         ` Linus Torvalds
  2012-12-13  4:31           ` David Ahern
  2012-12-13  7:48         ` [PATCH] Revert "perf: Require exclude_guest to use PEBS - kernel side enforcement" Ingo Molnar
       [not found]         ` <20121217102000.GE11016@redhat.com>
  2 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2012-12-13  3:51 UTC (permalink / raw)
  To: David Ahern
  Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton

On Wed, Dec 12, 2012 at 7:43 PM, David Ahern <dsahern@gmail.com> wrote:
>
> You know what's worse? All of your VMs blowing up because someone runs perf
> with the precise attribute. Virtualization and performance monitoring
> collide. From the log message for commit 1342798:
>
> "Intel PEBS in VT-x context uses the DS address as a guest linear address,
> even though it's programmed by the host as a host linear address. This either
> results in guest memory corruption and/or the hardware faulting and
> 'crashing' the virtual machine.  Therefore we have to disable PEBS on VT-x
> enter and re-enable on VT-x exit, enforcing a strict exclude_guest."

Right.

SO WHY DON'T YOU JUST DO THAT THEN?

Disable PEBS on VT-x enter and re-enable it on exit. End of story.
Exactly like you say.

But don't in the process screw up people WHO DON'T EVEN DO VIRTUALIZATION!

So please, just remove that idiotic "if (!event->attr.exclude_guest)"
test. It's wrong. It cannot possibly do the right thing.  It is
totally misdesigned, exactly because you don't even know beforehand if
somebody uses virtualization or not.

Now, if the feature had been done the sane way around, and you'd have
an explicit flag that says "force this even on entry to virtualized
guests", then you could have said "Dave, I can't do that combination
of precise and virtualized guests". At that point you have - at perf
record time - a valid reason to say EOPNOTSUPP.

But doing it this way was wrong. Switch that "exclude_guest" attribute
around, and admit that "H" was bogus, and that the right thing to do
was to add a "V" flag that sets the "force_guest" flag instead.

Problem solved, without screwing people who have no reason to ever care.

               Linus

* Re: [GIT PULL] perf changes for v3.8
  2012-12-13  3:51         ` Linus Torvalds
@ 2012-12-13  4:31           ` David Ahern
  2012-12-13  4:46             ` Linus Torvalds
  2012-12-13  7:30             ` Ingo Molnar
  0 siblings, 2 replies; 34+ messages in thread
From: David Ahern @ 2012-12-13  4:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton

On 12/12/12 8:51 PM, Linus Torvalds wrote:
> SO WHY DON'T YOU JUST DO THAT THEN?
>
> Disable PEBS on VT-x enter and re-enable it on exit. End of story.
> Exactly like you say.

See commit 26a4f3c0. But that was not enough. Requiring exclude_guest 
was another required piece. If you want to see the discussion: 
https://lkml.org/lkml/2012/7/9/264

>
> But doing it this way was wrong. Switch that "exclude_guest" attribute
> around, and admit that "H" was bogus, and that the right thing to do
> was to add a "V" flag that sets the "force_guest" flag instead.

I understand this is annoying. Older binaries on newer kernels was the 
only case I could not fix. (I guess a message could be added kernel side 
to at least give a hint.) But the alternative -- based on code that has 
existed for some time -- is for older binaries to crash VMs.

David

* Re: [GIT PULL] perf changes for v3.8
  2012-12-13  4:31           ` David Ahern
@ 2012-12-13  4:46             ` Linus Torvalds
  2012-12-13  7:27               ` Ingo Molnar
  2012-12-13  7:30             ` Ingo Molnar
  1 sibling, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2012-12-13  4:46 UTC (permalink / raw)
  To: David Ahern
  Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton

On Wed, Dec 12, 2012 at 8:31 PM, David Ahern <dsahern@gmail.com> wrote:
>
>
> See commit 26a4f3c0. But that was not enough.

Why? Make the people who run virtualization do the extra work. Things
never worked for them anyway, so forcing *them* to set a flag to get a
working thing is sane.

Forcing everybody else to set a flag is insane. See?

Your "that was not enough" is insane. It's purely about which *default
convention* you choose. The "if (!event->attr.exclude_guest)" test is
the wrong default convention, and it *should* have been "if
(event->attr.include_guest)" with the virtualization people forced to
use "cycles:ppV".

Claiming that there is some hardware overrun is silly, since that's
totally *independent* of the choice of which way the flag works!
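Linus's point is that only the meaning of the zero value differs between the two conventions. A sketch comparing them, assuming a hypothetical include_guest bit (no such perf_event_attr field exists in this thread; it is only the proposal) and assuming the kernel still keeps PEBS off across VM entry, as David says commit 26a4f3c0 does:

```c
#include <assert.h>
#include <errno.h>

/* Convention enforced by the quoted check: a zero bit means "guest
 * included", so a binary that never sets exclude_guest is refused. */
static int check_exclude_guest(unsigned int precise_ip, unsigned int exclude_guest)
{
	assert(precise_ip <= 3);	/* precise_ip is a 2-bit field */
	return (precise_ip && !exclude_guest) ? -EOPNOTSUPP : 0;
}

/* Proposed inversion (hypothetical include_guest bit): a zero bit means
 * "host only", so only callers that opt in to guests can be refused. */
static int check_include_guest(unsigned int precise_ip, unsigned int include_guest)
{
	assert(precise_ip <= 3);
	return (precise_ip && include_guest) ? -EOPNOTSUPP : 0;
}
```

For what an older binary sends with ':pp' (precise_ip = 2, guest bit left at zero), check_exclude_guest(2, 0) returns -EOPNOTSUPP while check_include_guest(2, 0) returns 0; only an explicit opt-in, check_include_guest(2, 1), is refused.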

> Requiring exclude_guest was
> another required piece. If you want to see the discussion:
> https://lkml.org/lkml/2012/7/9/264

The only thing that discussion shows is that people were *AWARE* that
this was a stupid change. I see Peter pointing out that this breaks
people's existing working setups.

You broke the WORKING case for old binaries in order to give an error
return in a case that NEVER EVEN WORKED with those binaries. Don't you
see how insane that is?

The 'H' flag is totally the wrong way around.  Exactly because it only
"fixes" a case that was already working, and makes a case that never
worked anyway now return an error value. That's not sane. Since the
old broken case never worked, nobody can have depended on it. See why
I'm saying that it's the people who use virtualization who should be
forced to use the new flag, not the other way around?

                   Linus

* Re: [GIT PULL] perf changes for v3.8
  2012-12-13  4:46             ` Linus Torvalds
@ 2012-12-13  7:27               ` Ingo Molnar
  0 siblings, 0 replies; 34+ messages in thread
From: Ingo Molnar @ 2012-12-13  7:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Ahern, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, Dec 12, 2012 at 8:31 PM, David Ahern <dsahern@gmail.com> wrote:
> >
> >
> > See commit 26a4f3c0. But that was not enough.
> 
> Why? Make the people who run virtualization do the extra work. Things
> never worked for them anyway, so forcing *them* to set a flag to get a
> working thing is sane.
> 
> Forcing everybody else to set a flag is insane. See?

Yeah, that's 100% stupid, we'll revert this change.

Arnaldo, wanna do it or should I? This slipped through the 
testing cracks ...

Thanks,

	Ingo

* Re: [GIT PULL] perf changes for v3.8
  2012-12-13  4:31           ` David Ahern
  2012-12-13  4:46             ` Linus Torvalds
@ 2012-12-13  7:30             ` Ingo Molnar
  2012-12-13 14:30               ` David Ahern
  1 sibling, 1 reply; 34+ messages in thread
From: Ingo Molnar @ 2012-12-13  7:30 UTC (permalink / raw)
  To: David Ahern
  Cc: Linus Torvalds, Linux Kernel Mailing List,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton


* David Ahern <dsahern@gmail.com> wrote:

> > But doing it this way was wrong. Switch that "exclude_guest" 
> > attribute around, and admit that "H" was bogus, and that the 
> > right thing to do was to add a "V" flag that sets the 
> > "force_guest" flag instead.
> 
> I understand this is annoying. [...]

It's not annoying, it's outright broken - it's a regression that 
we'll fix.

> [...] Older binaries on newer kernels was the only case I 
> could not fix. [...]

The "only" case?? Old, working binaries are actually our _most_ 
important use case: it's 99.9% of our current installed base ...

> [...] (I guess a message could be added kernel side to at 
> least give a hint.) But the alternative -- based on code that 
> has existed for some time -- is for older binaries to crash 
> VMs.

That should be fixed differently, by not breaking existing 
working functionality.

Thanks,

	Ingo

* [PATCH] Revert "perf: Require exclude_guest to use PEBS - kernel side enforcement"
  2012-12-13  3:43       ` David Ahern
  2012-12-13  3:51         ` Linus Torvalds
@ 2012-12-13  7:48         ` Ingo Molnar
       [not found]         ` <20121217102000.GE11016@redhat.com>
  2 siblings, 0 replies; 34+ messages in thread
From: Ingo Molnar @ 2012-12-13  7:48 UTC (permalink / raw)
  To: David Ahern
  Cc: Linus Torvalds, Linux Kernel Mailing List,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton


* David Ahern <dsahern@gmail.com> wrote:

> On 12/12/12 8:34 PM, Linus Torvalds wrote:
> >On Wed, Dec 12, 2012 at 7:25 PM, David Ahern <dsahern@gmail.com> wrote:
> >>
> >>Are you running an older perf binary on the 3.8 kernel?
> >
> >I am.. I don't tend to rebuild 'perf'..
> >
> >>Does this work: perf record -e cycles:ppH  ...
> >
> >Yes it does. What is 'H' and why should anybody care? Especially since
> >I'm not running virtualized.
> >
> > That whole "exclude_guest" test is insane when there isn't 
> > any virtualization going on. Very annoying.
> 
> you know what's worse? [...]

No, nothing can be worse than breaking 99% of our installed 
base...

I'm wondering where this broke - is it:

  20b279ddb38c perf: Require exclude_guest to use PEBS - kernel side enforcement

Linus, does the straight revert below fix everything for you - 
or do we need to do more?

( The VM problem needs a different fix: a new include_guest bit 
  should be introduced, which would naturally default to 'off' 
  on older binaries, and the old bit should be phased out. Then 
  new perf binaries can turn on that bit safely. Or PEBS should 
  be fixed for guests. Or something along these lines - but 
  it should *not* be fixed by regressing existing binaries ... )

Thanks,

	Ingo

----------------->
>From 581ba4671bf1d1095e9ecf843be61904e4c97e91 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@kernel.org>
Date: Thu, 13 Dec 2012 08:41:40 +0100
Subject: [PATCH] Revert "perf: Require exclude_guest to use PEBS - kernel
 side enforcement"

This reverts commit 20b279ddb38ca42f8863cec07b4d45ec24589f13.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/cpu/perf_event.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 4428fd1..6774c17 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -340,9 +340,6 @@ int x86_setup_perfctr(struct perf_event *event)
 		/* BTS is currently only allowed for user-mode. */
 		if (!attr->exclude_kernel)
 			return -EOPNOTSUPP;
-
-		if (!attr->exclude_guest)
-			return -EOPNOTSUPP;
 	}
 
 	hwc->config |= config;
@@ -385,9 +382,6 @@ int x86_pmu_hw_config(struct perf_event *event)
 	if (event->attr.precise_ip) {
 		int precise = 0;
 
-		if (!event->attr.exclude_guest)
-			return -EOPNOTSUPP;
-
 		/* Support for constant skid */
 		if (x86_pmu.pebs_active && !x86_pmu.pebs_broken) {
 			precise++;

* Re: [GIT PULL] perf changes for v3.8
  2012-12-13  7:30             ` Ingo Molnar
@ 2012-12-13 14:30               ` David Ahern
  2012-12-13 14:38                 ` David Ahern
  2012-12-13 16:03                 ` Linus Torvalds
  0 siblings, 2 replies; 34+ messages in thread
From: David Ahern @ 2012-12-13 14:30 UTC (permalink / raw)
  To: Ingo Molnar, Linus Torvalds
  Cc: Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton

On 12/13/12 12:30 AM, Ingo Molnar wrote:
>
> * David Ahern <dsahern@gmail.com> wrote:
>
>>> But doing it this way was wrong. Switch that "exclude_guest"
>>> attribute around, and admit that "H" was bogus, and that the
>>> right thing to do was to add a "V" flag that sets the
>>> "force_guest" flag instead.
>>
>> I understand this is annoying. [...]
>
> It's not annoying, it's outright broken - it's a regression that
> we'll fix.

One of the problems is that existing binaries set the exclude_guest flag 
(https://lkml.org/lkml/2012/7/9/292).

So, requesting users to update their binaries if they want to use 
precise sampling is not acceptable. A 100% catastrophic failure of all 
running VMs is acceptable? All VMs will crash and there is no direct 
causal relationship.

David

* Re: [GIT PULL] perf changes for v3.8
  2012-12-13 14:30               ` David Ahern
@ 2012-12-13 14:38                 ` David Ahern
  2012-12-13 16:03                 ` Linus Torvalds
  1 sibling, 0 replies; 34+ messages in thread
From: David Ahern @ 2012-12-13 14:38 UTC (permalink / raw)
  To: Ingo Molnar, Linus Torvalds
  Cc: Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton

On 12/13/12 7:30 AM, David Ahern wrote:
>> It's not annoying, it's outright broken - it's a regression that
>> we'll fix.
>
> One of the problems is that existing binaries set the exclude_guest flag
> (https://lkml.org/lkml/2012/7/9/292).

Correction: I meant to say that one of the problems is that existing
binaries set the flag to 0 when precise is used.

>
> So, requesting users to update their binaries if they want to use
> precise sampling is not acceptable. A 100% catastrophic failure of all
> running VMs is acceptable? All VMs will crash and there is no direct
> causal relationship.

* Re: [GIT PULL] perf changes for v3.8
  2012-12-13 14:30               ` David Ahern
  2012-12-13 14:38                 ` David Ahern
@ 2012-12-13 16:03                 ` Linus Torvalds
  2012-12-13 16:24                   ` David Ahern
  1 sibling, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2012-12-13 16:03 UTC (permalink / raw)
  To: David Ahern
  Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton

On Thu, Dec 13, 2012 at 6:30 AM, David Ahern <dsahern@gmail.com> wrote:
>
> One of the problems is that existing binaries set the exclude_guest flag
> (https://lkml.org/lkml/2012/7/9/292).

[ to zero ]

Yeah. And it apparently *never* worked. So it's not a regression.

> So, requesting users to update their binaries if they want to use precise
> sampling is not acceptable. A 100% catastrophic failure of all running VMs
> is acceptable? All VMs will crash and there is no direct causal
> relationship.

So instead, you expect everybody else - for whom things *used* to work
- to upgrade their binary, or their scripts, or just start using an
insane command line flag that makes no sense for them? Forcing
non-virtualization users to use a "only trace the host" flag is crazy.

Either way, somebody will be unhappy. No question about that. But our
rule in the kernel is "no regressions".

Now, I do agree that for "perf", it's fairly easy to say "just
recompile". I can do it in seconds, and it would presumably solve my
problem by just making the "host only" case the default, and I don't
need the "H" any more.

But that whole "no regressions" really is important. I can work around
things very easily, but the "no regressions" rule really means that I
should never *need* to work around things.

So when I see a regression, I consider it a major bug, even if the
workaround is trivial.

                     Linus

* Re: [GIT PULL] perf changes for v3.8
  2012-12-13 16:03                 ` Linus Torvalds
@ 2012-12-13 16:24                   ` David Ahern
  2012-12-13 16:33                     ` Linus Torvalds
  0 siblings, 1 reply; 34+ messages in thread
From: David Ahern @ 2012-12-13 16:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton

On 12/13/12 9:03 AM, Linus Torvalds wrote:
> On Thu, Dec 13, 2012 at 6:30 AM, David Ahern <dsahern@gmail.com> wrote:
>>
>> One of the problems is that existing binaries set the exclude_guest flag
>> (https://lkml.org/lkml/2012/7/9/292).
>
> [ to zero ]
>
> Yeah. And it apparently *never* worked. So it's not a regression.

The flag works. It does have a purpose. I did not write the original 
code; I am not defending its design. It is what it is. We now have a 
catastrophic problem that needs to be fixed.

> So instead, you expect everybody else - for whom things *used* to work
> - to upgrade their binary, or their scripts, or just start using an
> insane command line flag that makes no sense for them? Forcing
> non-virtualization users to use a "only trace the host" flag is crazy.
>
> Either way, somebody will be unhappy. No question about that. But our
> rule in the kernel is "no regressions".

...

> But that whole "no regressions" really is important. I can work around
> things very easily, but the "no regressions" rule really means that I
> should never *need* to work around things.

I get the regressions point. I have seen that statement from you often
enough that I think you have it on a permanent copy-and-paste shortcut.

Without the kernel side restriction existing perf binaries will crash 
all running VMs. I could write the patch to completely invert the 
exclude_guest logic -- make it include_guest. That breaks all existing 
perf binaries as well - just a different syntax that gets broken. That 
regression is acceptable?

David

* Re: [GIT PULL] perf changes for v3.8
  2012-12-13 16:24                   ` David Ahern
@ 2012-12-13 16:33                     ` Linus Torvalds
  2012-12-13 16:59                       ` Ingo Molnar
  2012-12-13 17:02                       ` Linus Torvalds
  0 siblings, 2 replies; 34+ messages in thread
From: Linus Torvalds @ 2012-12-13 16:33 UTC (permalink / raw)
  To: David Ahern
  Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton

On Thu, Dec 13, 2012 at 8:24 AM, David Ahern <dsahern@gmail.com> wrote:
>
> Without the kernel side restriction existing perf binaries will crash all
> running VMs.

..and they apparently always did, and we had that situation for years
without anybody ever even noticing.

And no, it's not a security fix, since you can just add the 'H' flag
and it will *still* crash according to the thread I saw (ie there is
some race condition in PEBS handling at VM entry, possibly at a
hardware level).

So the real security fix has to either fix the root cause or the
actual crash (which apparently is unknown), or make perf root-only,
at least in the presence of virtualization.

The "return EOPNOTSUPP" thing does nothing but annoy people.

> I could write the patch to completely invert the exclude_guest
> logic -- make it include_guest. That breaks all existing perf binaries as
> well - just a different syntax that gets broken. That regression is
> acceptable?

It's not a regression since THAT CODE NEVER WORKED, for chrissake! The
case of people actually profiling into virtual machines crashes the
running VMs, as you say. There's no way in hell we can call it a
regression to say "you now have to use a flag if you profile a load
with virtualization", since there wasn't any working case to begin
with.

                Linus

* Re: [GIT PULL] perf changes for v3.8
  2012-12-13 16:33                     ` Linus Torvalds
@ 2012-12-13 16:59                       ` Ingo Molnar
  2012-12-13 17:10                         ` Linus Torvalds
  2012-12-13 17:02                       ` Linus Torvalds
  1 sibling, 1 reply; 34+ messages in thread
From: Ingo Molnar @ 2012-12-13 16:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Ahern, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > I could write the patch to completely invert the 
> > exclude_guest logic -- make it include_guest. That breaks 
> > all existing perf binaries as well - just a different syntax 
> > that gets broken. That regression is acceptable?
> 
> It's not a regression since THAT CODE NEVER WORKED, for 
> chrissake! The case of people actually profiling into virtual 
> machines crashes the running VMs, as you say. There's no way 
> in hell we can call it a regression to say "you now have to 
> use a flag if you profile a load with virtualization", since 
> there wasn't any working case to begin with.

Correct.

::include_guest looks like the more logical flag direction to
use in any case.

Thanks,

	Ingo

* Re: [GIT PULL] perf changes for v3.8
  2012-12-13 16:33                     ` Linus Torvalds
  2012-12-13 16:59                       ` Ingo Molnar
@ 2012-12-13 17:02                       ` Linus Torvalds
  2012-12-13 17:30                         ` David Ahern
  1 sibling, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2012-12-13 17:02 UTC (permalink / raw)
  To: David Ahern
  Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton

Btw, I do *not* think that you should necessarily default to 'H' for
host-only mode.

The way it should work is that ":pp", ":ppH" and ":ppV" are all different.

 - "cycles:ppH" means: I want precise cycles only for the host case

 - "cycles:ppV" means: I want precise cycles, and I want the VM too

   This would result in EOPNOTSUPP for the case we know is buggy (but
   would presumably work on some other CPUs that don't have the problem)

 - "cycles:pp" means: I want precise cycles, and I don't care about
   virtualization

   This would do whatever works. So it would basically become
   host-only, but if you don't want precise cycles (so no ":pp") then
   whatever our old behavior was (presumably "profile the virtual
   machine too") would be what happens.

That sounds like (a) the interface that people want and (b) entirely
backwards-compatible for all cases that can matter (where "oops, I
crashed the VM" case does not matter).

                Linus

* [PATCH] x86: fix perf build with uclibc toolchains
  2012-12-11  9:09 [GIT PULL] perf changes for v3.8 Ingo Molnar
  2012-12-13  2:53 ` Linus Torvalds
@ 2012-12-13 17:04 ` Florian Fainelli
  1 sibling, 0 replies; 34+ messages in thread
From: Florian Fainelli @ 2012-12-13 17:04 UTC
  To: open list; +Cc: Florian Fainelli

libio.h is not provided by uClibc. In order to be able to test the
definition of __UCLIBC__, we need to include stdlib.h, which also
includes stddef.h, providing the definition of 'NULL'.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
---
 tools/perf/arch/x86/util/dwarf-regs.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/arch/x86/util/dwarf-regs.c b/tools/perf/arch/x86/util/dwarf-regs.c
index a794d30..6f5267f 100644
--- a/tools/perf/arch/x86/util/dwarf-regs.c
+++ b/tools/perf/arch/x86/util/dwarf-regs.c
@@ -20,7 +20,10 @@
  *
  */
 
+#include <stdlib.h>
+#ifndef __UCLIBC__
 #include <libio.h>
+#endif
 #include <dwarf-regs.h>
 
 /*
-- 
1.7.10.4
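
The fix relies on <stdlib.h> pulling in the C library's feature macros, so that __UCLIBC__ is already visible when the #ifndef is evaluated, and on <stdlib.h> supplying NULL once <libio.h> is skipped. A minimal sketch of that detection pattern, assuming uClibc defines __UCLIBC__ via its <features.h>; `built_with_uclibc()` and `no_name` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>	/* pulls in the libc's <features.h> and defines NULL */

/* Hypothetical helper: was this file compiled against uClibc? */
bool built_with_uclibc(void)
{
#ifdef __UCLIBC__
	return true;	/* uClibc: <libio.h> is not available */
#else
	return false;	/* e.g. glibc of that era, which shipped <libio.h> */
#endif
}

/* NULL is usable here without <libio.h>; <stdlib.h> provides it. */
const char *const no_name = NULL;
```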


* Re: [GIT PULL] perf changes for v3.8
  2012-12-13 16:59                       ` Ingo Molnar
@ 2012-12-13 17:10                         ` Linus Torvalds
  2012-12-13 17:31                           ` Ingo Molnar
  0 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2012-12-13 17:10 UTC
  To: Ingo Molnar
  Cc: David Ahern, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton

On Thu, Dec 13, 2012 at 8:59 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>> It's not a regression since THAT CODE NEVER WORKED, for
>> chrissake! The case of people actually profiling into virtual
>> machines crashes the running VMs, as you say. There's no way
>> in hell we can call it a regression to say "you now have to
>> use a flag if you profile a load with virtualization", since
>> there wasn't any working case to begin with.
>
> Correct.
>
> ::include_guest looks like the more logical flag direction to
> use in any case.

See the email I just sent. The *non*-precise case presumably used to
work (and included the virtualized environment). No?

So the default shouldn't necessarily be "include guest". The default
should presumably be "the user didn't say", and then the kernel does
whatever works best.

If the user actually explicitly says one or the other, we should try
to honor that (and then EOPNOTSUPP may be a "sorry, I really cannot do
that particular combination that you explicitly asked for").

That should make everybody happy. Doing a non-PEBS virtualized perf
run should still work with the old binary.

So there should be two bits: "include guest" (V in the event specifier
unless you already used that for something else) and "host only" (H),
and they should both default to off. Then the kernel can see the three
actual cases.

(Or four cases, if you really want to: you may or may not want to make
the "both V and H set means both, and _only_ V set means 'no host at
all, _only_ virtual environment'. So then ":ppV" would mean
"cycle-accurate for virtual box _only_", while ":ppVH" would mean
"cycle-accurate for both the host and the virtual box". Of course,
considering the PEBS interface, right now neither of those can
actually work, but plain ":V" and ":HV" could work).
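
The two-bit scheme can be sketched as a decoder over a hypothetical request word; `REQ_GUEST`, `REQ_HOST`, `struct scope` and `decode_scope()` are illustrative names, not perf's ABI. It takes the four-case reading: V alone means guest only, H alone host only, both means both, and neither lets the kernel pick via a caller-supplied default.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request bits; both default to off. */
#define REQ_GUEST	(1u << 0)	/* "V": include the guest */
#define REQ_HOST	(1u << 1)	/* "H": host */

struct scope {
	bool guest;
	bool host;
};

/*
 * 'guest_default' is what the kernel considers safe when the user said
 * nothing (e.g. false for PEBS events on a CPU with the known problem).
 */
struct scope decode_scope(unsigned int req, bool guest_default)
{
	struct scope s;

	assert((req & ~(REQ_GUEST | REQ_HOST)) == 0);

	switch (req) {
	case REQ_GUEST:			/* ":V"  - virtual environment only */
		s.guest = true;
		s.host = false;
		break;
	case REQ_HOST:			/* ":H"  - host only */
		s.guest = false;
		s.host = true;
		break;
	case REQ_GUEST | REQ_HOST:	/* ":HV" - both */
		s.guest = true;
		s.host = true;
		break;
	default:			/* unspecified: kernel decides */
		s.guest = guest_default;
		s.host = true;
		break;
	}
	return s;
}
```

Only the "neither bit set" row consults the kernel's own judgment, which is what keeps old binaries (that never set either bit) working.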

The important thing, I think, is that if the user doesn't know or care
about the VM case (because he's not running any!) and doesn't specify,
then the kernel should not say EOPNOTSUPP, and should do whatever
works for that cpu.

                 Linus

* Re: [GIT PULL] perf changes for v3.8
  2012-12-13 17:02                       ` Linus Torvalds
@ 2012-12-13 17:30                         ` David Ahern
  2012-12-13 17:36                           ` Ingo Molnar
  0 siblings, 1 reply; 34+ messages in thread
From: David Ahern @ 2012-12-13 17:30 UTC
  To: Linus Torvalds
  Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton

On 12/13/12 10:02 AM, Linus Torvalds wrote:

From your response to Ingo I take it you looked into other cases. I'll 
summarize here to make sure we are on the same page:

1. guest only profiling from the host
perf {record|top} -e cycles:G

2. host only profiling
perf {record|top} -e cycles:H

These are 4 existing use cases that toggle exclude_guest and do work 
today for those who care. Note the lack of precise attribute on the 
commands. These are the existing use cases that break by inverting the 
logic in the kernel.

The problem child is perf record -e cycles:ppG. That command silently 
crashes running VMs. You don't get a pop up or message that says "Dave, 
you crashed your VMs running perf". You don't notice the VMs have 
crashed until you attempt to login or what have you.

So how many perf users are having weird VM crashes? I don't know. I just 
happened to:

1. not use libvirt
2. have a running VM with console messages kicked to ttyS0
3. ttyS0 connected to stdio
4. screen session with a running VM open

at the time I ran perf.


> Btw, I do *not* think that you should necessarily default to 'H' for
> host-only mode.

The change made to perf userspace was to set exclude_guest IF precise is 
requested AND neither G nor H has been specified.

>
> The way it should work is that ":pp", ":ppH" and ":ppV" are all different.
>
>   - "cycles:ppH" means: I want precise cycles only for the host case
>
>   - "cycles:ppV" means: I want precise cycles, and I want the VM too
>
>     This would result in EOPNOTSUPP for the case we know is buggy (but
> presumably work on some other CPUs that don't have the problem)
>
>   - "cycles:pp" is "I want precise cycles, and I don't care about virtualization"

yes, this is the case I handled within perf userspace.

And then there are the 'perf kvm {top|record}' twists to the perf code.

David


* Re: [GIT PULL] perf changes for v3.8
  2012-12-13 17:10                         ` Linus Torvalds
@ 2012-12-13 17:31                           ` Ingo Molnar
  2012-12-17  4:43                             ` David Ahern
  0 siblings, 1 reply; 34+ messages in thread
From: Ingo Molnar @ 2012-12-13 17:31 UTC
  To: Linus Torvalds
  Cc: David Ahern, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, Dec 13, 2012 at 8:59 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >>
> >> It's not a regression since THAT CODE NEVER WORKED, for
> >> chrissake! The case of people actually profiling into virtual
> >> machines crashes the running VMs, as you say. There's no way
> >> in hell we can call it a regression to say "you now have to
> >> use a flag if you profile a load with virtualization", since
> >> there wasn't any working case to begin with.
> >
> > Correct.
> >
> > ::include_guest looks like the more logical flag direction to
> > use in any case.
> 
> See the email I just sent. The *non*-precise case presumably used to
> work (and included the virtualized environment). No?
> 
> So the default shouldn't necessarily be "include guest". The default
> should presumably be "the user didn't say", and then the kernel does
> whatever works best.
> 
> If the user actually explicitly says one or the other, we should try
> to honor that (and then EOPNOTSUPP may be a "sorry, I really cannot do
> that particular combination that you explicitly asked for").
> 
> That should make everybody happy. Doing a non-PEBS virtualized perf
> run should still work with the old binary.
> 
> So there should be two bits: "include guest" (V in the event specifier
> unless you already used that for something else) and "host only" (H),
> and they should both default to off. Then the kernel can see the three
> actual cases.
> 
> (Or four cases, if you really want to: you may or may not want to make
> the "both V and H set means both, and _only_ V set means 'no host at
> all, _only_ virtual environment'. So then ":ppV" would mean
> "cycle-accurate for virtual box _only_", while ":ppVH" would mean
> "cycle-accurate for both the host and the virtual box". Of course,
> considering the PEBS interface, right now neither of those can
> actually work, but plain ":V" and ":HV" could work).
> 
> The important thing, I think, is that if the user doesn't know 
> or care about the VM case (because he's not running any!) and 
> doesn't specify, then the kernel should not say EOPNOTSUPP, 
> and should do whatever works for that cpu.

Agreed.

David, wanna send a patch for this?

Thanks,

	Ingo

* Re: [GIT PULL] perf changes for v3.8
  2012-12-13 17:30                         ` David Ahern
@ 2012-12-13 17:36                           ` Ingo Molnar
  2012-12-13 19:12                             ` David Ahern
  0 siblings, 1 reply; 34+ messages in thread
From: Ingo Molnar @ 2012-12-13 17:36 UTC
  To: David Ahern
  Cc: Linus Torvalds, Linux Kernel Mailing List,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton


* David Ahern <dsahern@gmail.com> wrote:

> On 12/13/12 10:02 AM, Linus Torvalds wrote:
> 
> From your response to Ingo I take it you looked into other cases.
> I'll summarize here to make sure we are on the same page:
> 
> 1. guest only profiling from the host
> perf {record|top} -e cycles:G
> 
> 2. host only profiling
> perf {record|top} -e cycles:H
> 
> These are 4 existing use cases that toggle exclude_guest and do work
> today for those who care. Note the lack of precise attribute on the
> commands. These are the existing use cases that break by inverting
> the logic in the kernel.
> 
> The problem child is perf record -e cycles:ppG. [...]

The #1 problem child in this particular case, the one you should 
care about most is:

	perf record -e cycles:pp

As 99% of the people won't be doing any host or guest side 
profiling, they just want to do profiling.

The above G/H variants are the 1%.

So make sure the default works fine, that old binaries don't 
stop working - and then you can automatically (by default) 
exclude guest profiling if PEBS is enabled, and only ever reject 
profiling in the very specific case of:

	perf record -e cycles:ppG

where the user asks for something we don't support (yet).

So please stop thinking exclusively with a virtualization hat 
on, first think with a generic kernel developer hat on. Once all 
those cases work, make the virtualization case do the right 
thing as well.

We can fix all these cases properly and automatically, users 
don't need to specify flags they don't care about, old binaries 
will work and no VMs will crash.

Ok? So now we need a patch that does all that - otherwise we'll 
have to revert the one that added the regression.

Thanks,

	Ingo

* Re: [GIT PULL] perf changes for v3.8
  2012-12-13 17:36                           ` Ingo Molnar
@ 2012-12-13 19:12                             ` David Ahern
  0 siblings, 0 replies; 34+ messages in thread
From: David Ahern @ 2012-12-13 19:12 UTC
  To: Ingo Molnar
  Cc: Linus Torvalds, Linux Kernel Mailing List,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton

On 12/13/12 10:36 AM, Ingo Molnar wrote:
> The #1 problem child in this particular case, the one you should
> care about most is:
>
> 	perf record -e cycles:pp
>
> As 99% of the people won't be doing any host or guest side
> profiling, they just want to do profiling.

In older code (v3.6 and before) -e cycles:p sets the exclude_guest flag 
to 0, so -e cycles:p and -e cycles:pG are equivalent with respect to 
exclude_guest.


> Ok? So now we need a patch that does all that - otherwise we'll
> have to revert the one that added the regression.

I will not have time to work on a patch until Sunday at the earliest.

David

* Re: [GIT PULL] perf changes for v3.8
  2012-12-13 17:31                           ` Ingo Molnar
@ 2012-12-17  4:43                             ` David Ahern
  2012-12-22 19:22                               ` David Ahern
  0 siblings, 1 reply; 34+ messages in thread
From: David Ahern @ 2012-12-17  4:43 UTC
  To: Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Linus Torvalds, Linux Kernel Mailing List, Peter Zijlstra,
	Thomas Gleixner, Andrew Morton

On 12/13/12 10:31 AM, Ingo Molnar wrote:
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> So the default shouldn't necessarily be "include guest". The default
>> should presumably be "the user didn't say", and then the kernel does
>> whatever works best.
>>
>> If the user actually explicitly says one or the other, we should try
>> to honor that (and then EOPNOTSUPP may be a "sorry, I really cannot do
>> that particular combination that you explicitly asked for").
>>
>> That should make everybody happy. Doing a non-PEBS virtualized perf
>> run should still work with the old binary.
>>
>> So there should be two bits: "include guest" (V in the event specifier
>> unless you already used that for something else) and "host only" (H),
>> and they should both default to off. Then the kernel can see the three
>> actual cases.
>>
>> (Or four cases, if you really want to: you may or may not want to make
>> the "both V and H set means both, and _only_ V set means 'no host at
>> all, _only_ virtual environment'. So then ":ppV" would mean
>> "cycle-accurate for virtual box _only_", while ":ppVH" would mean
>> "cycle-accurate for both the host and the virtual box". Of course,
>> considering the PEBS interface, right now neither of those can
>> actually work, but plain ":V" and ":HV" could work).
>>
>> The important thing, I think, is that if the user doesn't know
>> or care about the VM case (because he's not running any!) and
>> doesn't specify, then the kernel should not say EOPNOTSUPP,
>> and should do whatever works for that cpu.
>
> Agreed.
>
> David, wanna send a patch for this?

As I mentioned in a prior email, exclude_{guest,host} currently work 
fine without PEBS. The current matrix for the flags:

                    profiling
                   guest    host
-e <event>         y        y
-e <event>:G       y        n    - G means enable guest, turn off host
-e <event>:H       n        y    - H means enable host, turn off guest
-e <event>:GH      y        y    - G followed by H means enable both
-e <event>:HG      y        y    - same as GH

There is no reason to change how these work. It's the variants with :p 
that need to be handled:

-e <event>:p       n        y    - guest off is required
-e <event>:pG      y        n    - needs to fail - not supported
-e <event>:pH      n        y
-e <event>:pGH     y        y    - needs to fail - not supported

This is the logic that was implemented in the original patchset which 
was pulled into v3.7 and the cause of this email thread.
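
The matrix above can be expressed as a small C sketch over a modifier string. `struct attr_bits` and `apply_modifiers()` are hypothetical stand-ins whose fields mirror exclude_guest, exclude_host and precise_ip in struct perf_event_attr; this is not perf's actual event parser.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the attr bits the modifiers control. */
struct attr_bits {
	bool exclude_guest;
	bool exclude_host;
	int  precise_ip;
};

/*
 * Apply a modifier string such as "", "G", "pH", "ppGH" following the
 * matrix. Returns 0, -EINVAL for an unknown modifier, or -EOPNOTSUPP
 * for precise + guest combinations.
 */
int apply_modifiers(const char *mods, struct attr_bits *a)
{
	bool g = false, h = false;

	assert(mods != NULL && a != NULL);
	a->exclude_guest = false;
	a->exclude_host = false;
	a->precise_ip = 0;

	for (; *mods; mods++) {
		switch (*mods) {
		case 'p': a->precise_ip++; break;
		case 'G': g = true; break;
		case 'H': h = true; break;
		default:  return -EINVAL;
		}
	}

	/* G or H alone turns the other side off; both (any order) = both. */
	if (g && !h)
		a->exclude_host = true;
	if (h && !g)
		a->exclude_guest = true;

	if (a->precise_ip) {
		if (g)				/* :pG and :pGH */
			return -EOPNOTSUPP;
		a->exclude_guest = true;	/* :p - guest off is required */
	}
	return 0;
}
```

Because G and H are recorded as flags rather than in order, :GH and :HG decode identically, matching the last two rows of the first table.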

One suggestion was to switch exclude_guest to include_guest. I take that 
to mean deprecate the current exclude_guest and add a new include_guest 
flag. Given that there are a number of exclude_XXXX flags (XXXX = user, 
kernel, host, guest, hv, etc) that would make the perf code inconsistent.

All that is needed is for the current exclude_guest flag to be 
deprecated such that for older binaries on newer kernels it is ignored 
(perhaps a warn on once), and then a new flag -- exclude_guest2 -- is 
then used for the new logic.

e.g.,

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 4f63c05..19900df 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -266,12 +266,14 @@ struct perf_event_attr {
                 sample_id_all  :  1, /* sample_type all events */

                 exclude_host   :  1, /* don't count in host   */
-               exclude_guest  :  1, /* don't count in guest  */
+               exclude_guest  :  1, /* don't count in guest - DEPRECATED */

                 exclude_callchain_kernel : 1, /* exclude kernel callchains */
                 exclude_callchain_user   : 1, /* exclude user callchains */

-               __reserved_1   : 41;
+               exclude_guest2  :  1, /* don't count in guest  */
+
+               __reserved_1   : 40;

     union {
         __u32       wakeup_events;    /* wakeup every n events */


Do you agree with that?
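
A cut-down model of the proposed layout, and of how a kernel might honor only the new bit while warning once about the deprecated one. `struct toy_attr` and `toy_excludes_guest()` are hypothetical; the real struct perf_event_attr has many more fields, and the fprintf() merely stands in for a kernel warn-once message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, cut-down model of the flag word in the diff above. */
struct toy_attr {
	uint64_t exclude_host   : 1,
		 exclude_guest  : 1,	/* DEPRECATED: ignored below */
		 exclude_guest2 : 1,	/* new "don't count in guest" bit */
		 reserved       : 61;
};

/* Return whether guest execution is excluded; the old bit only warns. */
int toy_excludes_guest(const struct toy_attr *attr)
{
	static int warned;

	assert(attr != NULL);
	if (attr->exclude_guest && !warned) {
		warned = 1;	/* stand-in for a warn-once in the kernel */
		fprintf(stderr, "exclude_guest is deprecated and ignored\n");
	}
	return attr->exclude_guest2;
}
```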

David

* Re: [GIT PULL] perf changes for v3.8
  2012-12-17  4:43                             ` David Ahern
@ 2012-12-22 19:22                               ` David Ahern
  2012-12-23  0:00                                 ` Linus Torvalds
  0 siblings, 1 reply; 34+ messages in thread
From: David Ahern @ 2012-12-22 19:22 UTC
  To: Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: Linus Torvalds, Linux Kernel Mailing List, Peter Zijlstra,
	Thomas Gleixner, Andrew Morton

any opinions on whether the approach is reasonable?

On 12/16/12 9:43 PM, David Ahern wrote:
> On 12/13/12 10:31 AM, Ingo Molnar wrote:
>> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>> So the default shouldn't necessarily be "include guest". The default
>>> should presumably be "the user didn't say", and then the kernel does
>>> whatever works best.
>>>
>>> If the user actually explicitly says one or the other, we should try
>>> to honor that (and then EOPNOTSUPP may be a "sorry, I really cannot do
>>> that particular combination that you explicitly asked for").
>>>
>>> That should make everybody happy. Doing a non-PEBS virtualized perf
>>> run should still work with the old binary.
>>>
>>> So there should be two bits: "include guest" (V in the event specifier
>>> unless you already used that for something else) and "host only" (H),
>>> and they should both default to off. Then the kernel can see the three
>>> actual cases.
>>>
>>> (Or four cases, if you really want to: you may or may not want to make
>>> the "both V and H set means both, and _only_ V set means 'no host at
>>> all, _only_ virtual environment'. So then ":ppV" would mean
>>> "cycle-accurate for virtual box _only_", while ":ppVH" would mean
>>> "cycle-accurate for both the host and the virtual box". Of course,
>>> considering the PEBS interface, right now neither of those can
>>> actually work, but plain ":V" and ":HV" could work).
>>>
>>> The important thing, I think, is that if the user doesn't know
>>> or care about the VM case (because he's not running any!) and
>>> doesn't specify, then the kernel should not say EOPNOTSUPP,
>>> and should do whatever works for that cpu.
>>
>> Agreed.
>>
>> David, wanna send a patch for this?
>
> As I mentioned in a prior email, exclude_{guest,host} currently work
> fine without PEBS. The current matrix for the flags:
>
>                     profiling
>                    guest    host
> -e <event>         y        y
> -e <event>:G       y        n    - G means enable guest, turn off host
> -e <event>:H       n        y    - H means enable host, turn off guest
> -e <event>:GH      y        y    - G followed by H means enable both
> -e <event>:HG      y        y    - same as GH
>
> There is no reason to change how these work. It's the variants with :p
> that need to be handled:
>
> -e <event>:p       n        y    - guest off is required
> -e <event>:pG      y        n    - needs to fail - not supported
> -e <event>:pH      n        y
> -e <event>:pGH     y        y    - needs to fail - not supported
>
> This is the logic that was implemented in the original patchset which
> was pulled into v3.7 and the cause of this email thread.
>
> One suggestion was to switch exclude_guest to include_guest. I take that
> to mean deprecate the current exclude_guest and add a new include_guest
> flag. Given that there are a number of exclude_XXXX flags (XXXX = user,
> kernel, host, guest, hv, etc) that would make the perf code inconsistent.
>
> All that is needed is for the current exclude_guest flag to be
> deprecated such that for older binaries on newer kernels it is ignored
> (perhaps a warn on once), and then a new flag -- exclude_guest2 -- is
> then used for the new logic.
>
> e.g.,
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 4f63c05..19900df 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -266,12 +266,14 @@ struct perf_event_attr {
>                  sample_id_all  :  1, /* sample_type all events */
>
>                  exclude_host   :  1, /* don't count in host   */
> -               exclude_guest  :  1, /* don't count in guest  */
> +               exclude_guest  :  1, /* don't count in guest - DEPRECATED */
>
>                  exclude_callchain_kernel : 1, /* exclude kernel callchains */
>                  exclude_callchain_user   : 1, /* exclude user callchains */
>
> -               __reserved_1   : 41;
> +               exclude_guest2  :  1, /* don't count in guest  */
> +
> +               __reserved_1   : 40;
>
>      union {
>          __u32       wakeup_events;    /* wakeup every n events */
>
>
> Do you agree with that?
>
> David


* Re: [GIT PULL] perf changes for v3.8
       [not found]         ` <20121217102000.GE11016@redhat.com>
@ 2012-12-22 19:30           ` David Ahern
  2012-12-23  9:23             ` Gleb Natapov
  0 siblings, 1 reply; 34+ messages in thread
From: David Ahern @ 2012-12-22 19:30 UTC
  To: Gleb Natapov
  Cc: Linus Torvalds, Ingo Molnar, Linux Kernel Mailing List,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton

On 12/17/12 3:20 AM, Gleb Natapov wrote:
> Does the regression happen because of commit 20b279ddb38c? If it does I
> think it is safe to revert it. KVM disables PEBS during guest entry now, so
> VMs shouldn't be blowing up (they do not in my testing) and if they still
> do we can disable the counter that has PEBS enabled on a guest entry too.
> Yes, if user runs "perf record -e cycles:ppG" he will not know that
> kernel ignored :pp modifier (with 20b279ddb38c he will get an error), but
> at least old binaries will continue working and new binaries can do the
> checking in userspace.
>

Your patch alone was not enough. Start here:
  https://lkml.org/lkml/2012/7/12/3

And from your response:
https://lkml.org/lkml/2012/7/12/337

"Do not run perf kvm. It does not set exclude_guest and :p and :pp is 
not compatible with guest profiling and should be disallowed. Again 
Peter's patch takes care of this."

20b279ddb38c is Peter's patch -- kernel side enforcement that 
exclude_guest needs to be set when using precise mode.

David

* Re: [GIT PULL] perf changes for v3.8
  2012-12-22 19:22                               ` David Ahern
@ 2012-12-23  0:00                                 ` Linus Torvalds
  0 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2012-12-23  0:00 UTC
  To: David Ahern
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Linux Kernel Mailing List,
	Peter Zijlstra, Thomas Gleixner, Andrew Morton

Sure. I actually think both should be deprecated, and replaced with
explicit "trace guest" vs "trace host" flags, just to make it
consistent, but I don't care deeply.

The only thing I care about is that compatibility must never be broken
(which the current setup does), and that people who don't use
virtualization should never be asked to use flags that have anything
to do with virtualization (which the current work-around involves).

        Linus

On Sat, Dec 22, 2012 at 11:22 AM, David Ahern <dsahern@gmail.com> wrote:
> any opinions on whether the approach is reasonable?
>
> On 12/16/12 9:43 PM, David Ahern wrote:
>>
>> All that is needed is for the current exclude_guest flag to be
>> deprecated such that for older binaries on newer kernels it is ignored
>> (perhaps a warn on once), and then a new flag -- exclude_guest2 -- is
>> then used for the new logic.

* Re: [GIT PULL] perf changes for v3.8
  2012-12-22 19:30           ` [GIT PULL] perf changes for v3.8 David Ahern
@ 2012-12-23  9:23             ` Gleb Natapov
  2012-12-23 23:17               ` David Ahern
  0 siblings, 1 reply; 34+ messages in thread
From: Gleb Natapov @ 2012-12-23  9:23 UTC
  To: David Ahern
  Cc: Linus Torvalds, Ingo Molnar, Linux Kernel Mailing List,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton

On Sat, Dec 22, 2012 at 12:30:35PM -0700, David Ahern wrote:
> On 12/17/12 3:20 AM, Gleb Natapov wrote:
> >Does the regression happen because of commit 20b279ddb38c? If it does I
> >think it is safe to revert it. KVM disables PEBS during guest entry now, so
> >VMs shouldn't be blowing up (they do not in my testing) and if they still
> >do we can disable the counter that has PEBS enabled on a guest entry too.
> >Yes, if user runs "perf record -e cycles:ppG" he will not know that
> >kernel ignored :pp modifier (with 20b279ddb38c he will get an error), but
> >at least old binaries will continue working and new binaries can do the
> >checking in userspace.
> >
> 
> Your patch alone was not enough. Start here:
>  https://lkml.org/lkml/2012/7/12/3
>
I cannot reproduce this failure. I reverted 20b279ddb38c and ran "perf
record -e cycles:ppG" while guest was running. Admittedly I ran the test
for a short time, but without disabling PEBS during the guest entry this
was enough to crash a guest.

The difference between "perf record -e cycles:ppG" and "perf record -e
cycles:ppH" from KVM's point of view is that for ppH both the PMU counter
and PEBS are disabled on guest entry, while for ppG only PEBS is
disabled. So maybe my testing is not enough, and if the counter remains
enabled a PEBS write can eventually overshoot guest entry. In that case
we can treat ppG and ppH the same on guest entry and disable both the
counter and PEBS.
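
The guest-entry behavior described above can be modeled with per-counter bitmasks. The structures and `guest_entry_masks()` are hypothetical illustrations, not KVM's or perf's actual code; `treat_pebs_as_host` selects the suggested fix, where any counter with PEBS enabled is stopped on guest entry just like a ppH one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-counter state. */
struct toy_counter {
	bool exclude_guest;	/* event set up host-only (ppH) */
	bool pebs;		/* precise sampling enabled */
};

struct entry_masks {
	uint64_t counters_off;	/* counters to disable on guest entry */
	uint64_t pebs_off;	/* PEBS enables to clear on guest entry */
};

/*
 * With treat_pebs_as_host == false, a ppG counter keeps counting and
 * only its PEBS enable is cleared; with true, it is stopped entirely.
 */
struct entry_masks guest_entry_masks(const struct toy_counter *c, int n,
				     bool treat_pebs_as_host)
{
	struct entry_masks m = { 0, 0 };

	assert(n >= 0 && n <= 64);
	for (int i = 0; i < n; i++) {
		uint64_t bit = UINT64_C(1) << i;

		if (c[i].pebs)
			m.pebs_off |= bit;
		if (c[i].exclude_guest || (c[i].pebs && treat_pebs_as_host))
			m.counters_off |= bit;
	}
	return m;
}
```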

> And from your response:
> https://lkml.org/lkml/2012/7/12/337
> 
> "Do not run perf kvm. It does not set exclude_guest and :p and :pp
> is not compatible with guest profiling and should be disallowed.
> Again Peter's patch takes care of this."
> 
I stand by this :) It should be disallowed as in "user should get
a warning that he does something wrong and his settings will be
ignored". Unfortunately the way it was implemented breaks old perf
binaries, and keeping them running is more important than warning users
about something that never worked anyway. A new perf binary can do the
check in userspace. The kernel should still disallow a configuration that
may crash a guest, but not in a way that breaks userspace that does not
set exclude_guest. What about forcing exclude_guest on an event that
has precise flag set without reporting error to userspace?
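
Roughly, the proposed alternative of forcing exclude_guest for precise events, instead of failing them, could look like this. `struct toy_attr` is a hypothetical cut-down stand-in for struct perf_event_attr (only precise_ip and exclude_guest), and `force_safe_precise()` is not an existing kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cut-down attr; field names mirror perf_event_attr. */
struct toy_attr {
	unsigned int precise_ip    : 2;	/* 0 = not precise, 1..3 = :p, :pp, :ppp */
	unsigned int exclude_guest : 1;
};

/*
 * Instead of rejecting precise events that would also count in the
 * guest (which breaks old binaries), narrow them to host-only.
 * Returns true if the request was silently adjusted.
 */
bool force_safe_precise(struct toy_attr *attr)
{
	assert(attr != NULL);
	if (attr->precise_ip && !attr->exclude_guest) {
		attr->exclude_guest = 1;
		return true;
	}
	return false;
}
```

The trade-off is visible in the sketch: the caller ends up with a narrower request than the one it made, and nothing outside the kernel learns that unless it is reported somehow.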
 
> 20b279ddb38c is Peter's patch -- kernel side enforcement that
> exclude_guest needs to be set when using precise mode.
> 
> David

--
			Gleb.

* Re: [GIT PULL] perf changes for v3.8
  2012-12-23  9:23             ` Gleb Natapov
@ 2012-12-23 23:17               ` David Ahern
  2012-12-24 10:36                 ` Gleb Natapov
  0 siblings, 1 reply; 34+ messages in thread
From: David Ahern @ 2012-12-23 23:17 UTC
  To: Gleb Natapov
  Cc: Linus Torvalds, Ingo Molnar, Linux Kernel Mailing List,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton

On 12/23/12 2:23 AM, Gleb Natapov wrote:
>> Your patch alone was not enough. Start here:
>>   https://lkml.org/lkml/2012/7/12/3
>>
> I cannot reproduce this failure. I reverted 20b279ddb38c and ran "perf
> record -e cycles:ppG" while guest was running. Admittedly I ran the test
> for a short time, but without disabling PEBS during the guest entry this
> was enough to crash a guest.

In the beginning (without any patches) VMs crashed fairly quickly. With 
your patch it took longer, but I was able to consistently crash VMs. The 
thread notes server info (processor, OS) and VM versions as well as load 
used for the tests -- a cpu bound process (openssl), disk bound (dd) and 
network (netperf).


> What about forcing exclude_guest on an event that
> has precise flag set without reporting error to userspace?

That's up to the perf maintainers -- Ingo, Peter, Arnaldo. Personally, I 
don't like it since kernel side is changing the user request.

David

* Re: [GIT PULL] perf changes for v3.8
  2012-12-23 23:17               ` David Ahern
@ 2012-12-24 10:36                 ` Gleb Natapov
  0 siblings, 0 replies; 34+ messages in thread
From: Gleb Natapov @ 2012-12-24 10:36 UTC
  To: David Ahern
  Cc: Linus Torvalds, Ingo Molnar, Linux Kernel Mailing List,
	Arnaldo Carvalho de Melo, Peter Zijlstra, Thomas Gleixner,
	Andrew Morton

On Sun, Dec 23, 2012 at 04:17:45PM -0700, David Ahern wrote:
> On 12/23/12 2:23 AM, Gleb Natapov wrote:
> >>Your patch alone was not enough. Start here:
> >>  https://lkml.org/lkml/2012/7/12/3
> >>
> >I cannot reproduce this failure. I reverted 20b279ddb38c and ran "perf
> >record -e cycles:ppG" while guest was running. Admittedly I ran the test
> >for a short time, but without disabling PEBS during the guest entry this
> >was enough to crash a guest.
> 
> In the beginning (without any patches) VMs crashed fairly quickly.
> With your patch it took longer, but I was able to consistently crash
> VMs. The thread notes server info (processor, OS) and VM versions as
> well as load used for the tests -- a cpu bound process (openssl),
> disk bound (dd) and network (netperf).
> 
> 
It means that disabling PEBS is not enough and PMU counter should be
disabled too.

> >What about forcing exclude_guest on an event that
> >has precise flag set without reporting error to userspace?
> 
> That's up to the perf maintainers -- Ingo, Peter, Arnaldo.
> Personally, I don't like it since kernel side is changing the user
> request.
> 
I do not see another way to prevent guests from crashing with older perf
binaries if 20b279ddb38c is reverted.

--
			Gleb.

end of thread

Thread overview: 34+ messages
2012-12-11  9:09 [GIT PULL] perf changes for v3.8 Ingo Molnar
2012-12-13  2:53 ` Linus Torvalds
2012-12-13  3:02   ` David Ahern
2012-12-13  3:09     ` Linus Torvalds
2012-12-13  3:16       ` David Ahern
2012-12-13  3:25   ` David Ahern
2012-12-13  3:34     ` Linus Torvalds
2012-12-13  3:43       ` David Ahern
2012-12-13  3:51         ` Linus Torvalds
2012-12-13  4:31           ` David Ahern
2012-12-13  4:46             ` Linus Torvalds
2012-12-13  7:27               ` Ingo Molnar
2012-12-13  7:30             ` Ingo Molnar
2012-12-13 14:30               ` David Ahern
2012-12-13 14:38                 ` David Ahern
2012-12-13 16:03                 ` Linus Torvalds
2012-12-13 16:24                   ` David Ahern
2012-12-13 16:33                     ` Linus Torvalds
2012-12-13 16:59                       ` Ingo Molnar
2012-12-13 17:10                         ` Linus Torvalds
2012-12-13 17:31                           ` Ingo Molnar
2012-12-17  4:43                             ` David Ahern
2012-12-22 19:22                               ` David Ahern
2012-12-23  0:00                                 ` Linus Torvalds
2012-12-13 17:02                       ` Linus Torvalds
2012-12-13 17:30                         ` David Ahern
2012-12-13 17:36                           ` Ingo Molnar
2012-12-13 19:12                             ` David Ahern
2012-12-13  7:48         ` [PATCH] Revert "perf: Require exclude_guest to use PEBS - kernel side enforcement" Ingo Molnar
     [not found]         ` <20121217102000.GE11016@redhat.com>
2012-12-22 19:30           ` [GIT PULL] perf changes for v3.8 David Ahern
2012-12-23  9:23             ` Gleb Natapov
2012-12-23 23:17               ` David Ahern
2012-12-24 10:36                 ` Gleb Natapov
2012-12-13 17:04 ` [PATCH] x86: fix perf build with uclibc toolchains Florian Fainelli
