* [GIT PULL] perf changes for v3.8
From: Ingo Molnar @ 2012-12-11 9:09 UTC
To: Linus Torvalds
Cc: linux-kernel, Arnaldo Carvalho de Melo, Peter Zijlstra,
Thomas Gleixner, Andrew Morton
Linus,
Please pull the latest perf-core-for-linus git tree from:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf-core-for-linus
HEAD: cc1b39dbf9f55a438e8a21a694394c20e6a17129 Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core
Lots of activity:
211 files changed, 8328 insertions(+), 4116 deletions(-)
most of it on the tooling side.
Main changes:
* ftrace enhancements and fixes from Steve Rostedt.
* uprobes fixes, cleanups and preparation for the ARM
port from Oleg Nesterov.
* UAPI fixes, from David Howells - prepares the arch/x86 UAPI transition
* Separate perf tests into multiple objects, one per test, from Jiri Olsa.
* Make hardware event translations available in sysfs, from Jiri Olsa.
* Fixes to /proc/pid/maps parsing, preparatory to supporting data maps,
from Namhyung Kim
* Implement ui_progress for GTK, from Namhyung Kim
* Add framework for automated perf_event_attr tests, where tools with
different command line options will be run from a 'perf test', via
python glue, and the perf syscall will be intercepted to verify that
the perf_event_attr fields set by the tool are those expected,
from Jiri Olsa
* Add a 'link' method for hists, so that we can have the leader with
buckets for all the entries in all the hists. This new method
is now used in the default 'diff' output, making the sum of the 'baseline'
column be 100%, eliminating blind spots.
* libtraceevent fixes for compiler warnings, to make perf build
on some distros such as 32-bit Fedora 14; some of the warnings
pointed to real bugs.
* Add a browser for 'perf script' and make it available from the report
and annotate browsers. It does filtering to find the scripts that
handle events found in the perf.data file used. From Feng Tang
* perf inject changes to allow showing where a task sleeps, from Andrew Vagin.
* Makefile improvements from Namhyung Kim.
* Add --pre and --post command hooks in 'stat', from Peter Zijlstra.
* Don't stop synthesizing threads when one vanishes; this applies to
the threads that already exist when we start a tool like trace.
* Use sched:sched_stat_runtime to provide a thread summary, this
produces the same output as the 'trace summary' subcommand of
tglx's original "trace" tool.
* Support interrupted syscalls in 'trace'
* Add an event duration column and filter in 'trace'.
* There are references to the man pages in some tools, so try to build
Documentation when installing, warning the user if that is not possible,
from Borislav Petkov.
* Give user better message if precise is not supported, from David Ahern.
* Try to find cross-built objdump path by using the session environment
information in the perf.data file header, from Irina Tirdea, original
patch and idea by Namhyung Kim.
* Display more output on features check for make V=1, so that one can figure
out what is happening by looking at gcc output, etc. From Jiri Olsa.
* Add on_exit implementation for systems without one, e.g. Android, from
Bernhard Rosenkraenzer.
* Only process events for vcpus of interest, helps handling large number
of events, from David Ahern.
* Cross compilation fixes for Android, from Irina Tirdea.
* Add documentation on compiling for Android, from Irina Tirdea.
* perf diff improvements from Jiri Olsa.
* Target (task/user/cpu/syswide) handling improvements, from Namhyung Kim.
* Add support in 'trace' for tracing workload given by command line, from
Namhyung Kim.
* ... and much more.
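The on_exit item in the list above can be illustrated with a minimal sketch of how such a fallback might be built on top of atexit(), assuming a fixed-size hook table. All names here (compat_on_exit, compat_exit, compat_run_hooks, save_status) are hypothetical and this is not the code merged into perf.

```c
#include <stdlib.h>

/* Hypothetical names throughout; this is not the code merged into perf. */
#define COMPAT_HOOK_MAX 32

typedef void (*compat_hook_fn)(int status, void *arg);

static compat_hook_fn hook_fn[COMPAT_HOOK_MAX];
static void *hook_arg[COMPAT_HOOK_MAX];
static int hook_count;
static int saved_status;	/* status handed to hooks at process exit */
static int trampoline_registered;

/* Run and drain all hooks, most recently registered first. */
void compat_run_hooks(int status)
{
	while (hook_count > 0) {
		hook_count--;
		hook_fn[hook_count](status, hook_arg[hook_count]);
	}
}

/* atexit() callbacks take no arguments, so the status comes from a global. */
static void compat_trampoline(void)
{
	compat_run_hooks(saved_status);
}

/* Register fn(status, arg). Returns 0 on success, -1 on failure. */
int compat_on_exit(compat_hook_fn fn, void *arg)
{
	if (hook_count == COMPAT_HOOK_MAX)
		return -1;
	if (!trampoline_registered) {
		if (atexit(compat_trampoline) != 0)
			return -1;
		trampoline_registered = 1;
	}
	hook_fn[hook_count] = fn;
	hook_arg[hook_count] = arg;
	hook_count++;
	return 0;
}

/* Callers must exit through this wrapper so the hooks see the status. */
void compat_exit(int status)
{
	saved_status = status;
	exit(status);
}

/* Example hook: store the exit status in the int that arg points to. */
void save_status(int status, void *arg)
{
	*(int *)arg = status;
}
```

One limitation of this approach: a status returned from main() never passes through compat_exit(), so hooks would see 0 in that case unless the caller routes every exit through the wrapper.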
Thanks,
Ingo
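For the /proc/pid/maps parsing item above (the shortlog below lists it as "Use sscanf for parsing /proc/pid/maps"), a rough sketch of what sscanf-based parsing of one maps line can look like. The struct and function names are hypothetical, and this is an illustration rather than the actual patch.

```c
#include <stdio.h>

/* Hypothetical container for the fields of one /proc/<pid>/maps line. */
struct map_line {
	unsigned long start, end, pgoff, inode;
	unsigned int major, minor;
	char prot[5];		/* e.g. "r-xp" */
	char path[256];		/* empty for anonymous mappings */
};

/*
 * Parse "start-end perms offset major:minor inode [path]".
 * Returns 0 on success, -1 if the mandatory fields are missing.
 * Simplification: %255s stops at whitespace, so a path containing
 * spaces (or a trailing " (deleted)") is cut at the first space.
 */
int parse_maps_line(const char *line, struct map_line *m)
{
	int n;

	m->path[0] = '\0';
	n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255s",
		   &m->start, &m->end, m->prot, &m->pgoff,
		   &m->major, &m->minor, &m->inode, m->path);

	/* Seven fields are mandatory; the path is optional. */
	return n >= 7 ? 0 : -1;
}
```

Letting sscanf return the number of converted fields gives a cheap validity check: anything below seven means the line was not a well-formed mapping.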
------------------>
Andi Kleen (3):
perf tools: Move parse_events error printing to parse_events_options
perf annotate: Handle XBEGIN like a jump
perf tools: Add arbitary aliases and support names with -
Andrew Vagin (3):
perf inject: Work with files
perf inject: Merge sched_stat_* and sched_switch events
perf inject: Mark a dso if it's used
Arnaldo Carvalho de Melo (31):
perf tools: Have the page size value available for all tools
perf machine: Introduce find_thread method
perf event: No need to create a thread when handling PERF_RECORD_EXIT
perf annotate: Handle PERF_RECORD_EXIT events
perf sched: Handle PERF_RECORD_EXIT events
perf machine: Carve up event processing specific from perf_tool
perf tools: Remove noise in python version feature test
perf test: Align the 'Ok'/'FAILED!' test results
perf trace: Support interrupted syscalls
perf trace: Add an event duration column
perf trace: Add duration filter
perf tools: Pretty print errno for some more functions
perf trace: Print the name of a syscall when failing to read its info
perf tools: Don't stop synthesizing threads when one vanishes
perf trace: Count number of events for each thread and globally
perf trace: Use sched:sched_stat_runtime to provide a thread summary
perf python: Initialize 'page_size' variable
perf tools: Handle --version string generation on machines without git
perf diff: Start moving to support matching more than two hists
perf diff: Move hists__match to the hists lib
perf hists: Introduce hists__link
perf diff: Use hists__link when not pairing just with baseline
perf machine: Move more methods to machine.[ch]
tools lib traceevent: Add __maybe_unused to unused parameters
tools lib traceevent: Avoid comparisions between signed/unsigned
tools lib traceevent: No need to check for < 0 on an unsigned enum
tools lib traceevent: Handle INVALID_ARG_TYPE errno in pevent_strerror
tools lib traceevent: Use 'const' in variables pointing to const strings
perf tools: Stop using 'self' in pstack
perf hists: Initialize all of he->stat with zeroes
perf evsel: Introduce is_group_member method
Bernhard Rosenkraenzer (1):
perf tools: Add on_exit implementation
Borislav Petkov (1):
perf tools: Try to build Documentation when installing
Daniel Walter (1):
tracing: Replace strict_strto* with kstrto*
David Ahern (5):
perf kvm: Only process events for vcpus of interest
perf kvm: Remove typecast in init_kvm_event_record
perf kvm: Total count is a u64, print as so
perf kvm: Add braces around multi-line statements
perf tools: Give user better message if precise is not supported
David Howells (3):
tools: Define a Makefile function to do subdir processing
tools: Honour the O= flag when tool build called from a higher Makefile
tools: Pass the target in descend
David Sharp (4):
tracing: Trivial cleanup
tracing: Reset ring buffer when changing trace_clocks
tracing,x86: Add a TSC trace_clock
tracing: Format non-nanosec times from tsc clock without a decimal point.
David Vrabel (1):
x86: Allow tracing of functions in arch/x86/kernel/rtc.c
Feng Tang (7):
perf tools: Add a global variable "const char *input_name"
perf script: Add more filter to find_scripts()
perf scripts browser: Add a browser for perf script
perf annotate browser: Integrate script browser into annotation browser
perf hists browser: Integrate script browser into main hists browser
perf header: Add is_perf_magic() func
perf browser: Don't show scripts menu for 'perf top'
Hiraku Toyooka (1):
tracing: Change tracer's integer flags to bool
Ingo Molnar (2):
perf tools: Speed up the perf build time by simplifying the perf --version string generation
perf tools: Further speed up the perf build
Irina Tirdea (3):
perf tools: Update Makefile for Android
Documentation: add documentation on compiling for Android
perf tools: Try to find cross-built objdump path
Jiri Olsa (70):
perf diff: Add -b option for perf diff to display paired entries only
perf diff: Add ratio computation way to compare hist entries
perf diff: Add option to sort entries based on diff computation
perf diff: Add weighted diff computation way to compare hist entries
perf diff: Add -p option to display period values for hist entries
perf diff: Add -F option to display formula for computation
perf diff: Include samples without symbol in overall stats
perf diff: Display empty space for non paired samples
perf/x86: Make hardware event translations available in sysfs
perf/x86: Filter out undefined events from sysfs events attribute
perf/x86: Add hardware events translations for Intel cpus
perf/x86: Add hardware events translations for AMD cpus
perf/x86: Add hardware events translations for Intel P6 cpus
perf tools: Fix PMU object alias initialization
perf tools: Add support to specify hw event as PMU event term
perf test: Add automated tests for pmu sysfs translated events
perf tools: Diplays more output on features check for make V=1
perf tools: Move build_id__sprintf into build-id object
perf tools: Move BUILD_ID_SIZE into build-id object
perf tools: Move hex2u64 into util object
perf tools: Move strxfrchar into string object
perf tools: Move dso_* related functions into dso object
perf record: Fix mmap error output condition
perf tools: Remove BINDIR define from exec_cmd.o compilation
perf tests: Move test objects into 'tests' directory
perf tests: Add framework for automated perf_event_attr tests
perf tests: Add attr record basic test
perf tests: Add attr tests under builtin test command
perf tests: Add attr record group test
perf tests: Add attr record event syntax group test
perf tests: Add attr record freq test
perf tests: Add attr record count test
perf tests: Add attr record graph test
perf tests: Add attr record period test
perf tests: Add attr record no samples test
perf tests: Add attr record no-inherit test
perf tests: Add attr record data test
perf tests: Add attr record raw test
perf tests: Add attr record no delay test
perf tests: Add attr record branch any test
perf tests: Add attr record branch filter tests
perf tests: Add attr stat no-inherit test
perf tests: Add attr stat group test
perf tests: Add attr stat event syntax group test
perf tests: Add attr stat default test
perf tests: Add attr stat default test
perf tests: Add documentation for attr tests
perf tests: Add missing attr stat basic test
perf tests: Factor attr tests WRITE_ASS macro
perf tests: Fix attr watermark field name typo
perf tests: Removing 'optional' field
perf tests: Move attr.py temp dir cleanup into finally section
perf tools: Add LIBDW_DIR Makefile variable to for alternate libdw
perf tests: Move test__vmlinux_matches_kallsyms into separate object
perf tests: Move test__open_syscall_event into separate object
perf tests: Move test__open_syscall_event_on_all_cpus into separate object
perf tests: Move test__basic_mmap into separate object
perf tests: Move test__PERF_RECORD into separate object
perf tests: Move test__rdpmc into separate object
perf tests: Move perf_evsel__roundtrip_name_test into separate object
perf tests: Move perf_evsel__tp_sched_test into separate object
perf tests: Move test__syscall_open_tp_fields into separate object
perf tests: Move pmu tests into separate object
perf tests: Final cleanup for builtin-test move
perf tests: Check for mkstemp return value in dso-data test
perf tools: Fix attributes for '{}' defined event groups
perf tools: Fix 'disabled' attribute config for record command
perf tools: Ensure single disable call per event in record comand
perf tools: Omit group members from perf_evlist__disable/enable
perf tools: Add basic event modifier sanity check
Joonsoo Kim (1):
perf tools: Add info about cross compiling for Android ARM
Jovi Zhang (1):
uprobes: Fix misleading log entry
Michal Hocko (1):
linux/kernel.h: Remove duplicate trace_printk declaration
Namhyung Kim (27):
perf trace: Validate target task/user/cpu argument
perf trace: Explicitly enable system-wide mode if no option is given
perf trace: Add support for tracing workload given by command line
tools lib traceevent: Do not generate dependency for system header files
perf tools: Cleanup doc related targets
perf tools: Convert invocation of MAKE into SUBDIR
perf tools: Always show CHK message when doing try-cc
perf tools: Fix LIBELF_MMAP checking
perf tools: Warn about missing libelf
perf tools: Use normalized arch name for searching objdump path
perf tools: Introduce struct hist_browser_timer
perf report: Postpone objdump check until annotation requested
perf machine: Set kernel data mapping length
perf tools: Fix detection of stack area
perf hists: Free branch_info when freeing hist_entry
perf tools: Don't try to lookup objdump for live mode
perf annotate: Whitespace fixups
perf annotate: Don't try to follow jump target on PLT symbols
perf annotate: Merge same lines in summary view
perf tools: Fix compile error on NO_NEWT=1 build
perf tools: Add gtk.<command> config option for launching GTK browser
perf tools: Use sscanf for parsing /proc/pid/maps
perf ui tui: Move progress.c under ui/tui directory
perf ui: Introduce generic ui_progress helper
perf ui gtk: Implement ui_progress functions
perf ui: Add ui_progress__finish()
perf ui: Always compile browser setup code
Oleg Nesterov (5):
uprobes/powerpc: Don't clear TIF_UPROBE in do_notify_resume()
uprobes/powerpc: Do not use arch_uprobe_*_step() helpers
uprobes/x86: Cleanup the single-stepping code
uprobes: Kill arch_uprobe_enable/disable_step() hooks
uprobes: Use percpu_rw_semaphore to fix register/unregister vs dup_mmap() race
Peter Huewe (1):
perf/x86: Fix sparse warnings
Peter Zijlstra (1):
perf stat: Add --pre and --post command
Rabin Vincent (1):
uprobes: Flush cache after xol write
Shan Wei (1):
tracing: Kill unused and puzzled sample code in ftrace.h
Slava Pestov (1):
ring-buffer: Add a 'dropped events' counter
Steven Rostedt (11):
tracing: Allow tracers to start at core initcall
tracing: Expand ring buffer when trace_printk() is used
tracing: Enable comm recording if trace_printk() is used
tracing: Have tracing_sched_wakeup_trace() use standard unlock_commit
tracing: Cache comms only after an event occurred
tracing: Separate open function from set_event and available_events
tracing: Remove unused function unregister_tracer()
tracing: Make tracing_enabled be equal to tracing_on
tracing: Remove deprecated tracing_enabled file
tracing: Use irq_work for wake ups and remove *_nowake_*() functions
tracing: Add trace_options kernel command line parameter
Sukadev Bhattiprolu (1):
perf powerpc: Use uapi/unistd.h to fix build error
Suzuki K. Poulose (1):
Account the nr_entries in rblist properly
Vaibhav Nagarnaik (1):
tracing: Cleanup unnecessary function declarations
Wei Yongjun (1):
perf tools: Remove duplicated include from trace-event-python.c
Yoshihiro YUNOMAE (2):
ring-buffer: Change unsigned long type of ring_buffer_oldest_event_ts() to u64
tracing: Show raw time stamp on stats per cpu using counter or tsc mode for trace_clock
Zheng Liu (1):
perf test: fix a build error on builtin-test
Documentation/kernel-parameters.txt | 16 +
arch/alpha/include/asm/Kbuild | 1 +
arch/arm/include/asm/Kbuild | 1 +
arch/arm64/include/asm/Kbuild | 1 +
arch/avr32/include/asm/Kbuild | 1 +
arch/blackfin/include/asm/Kbuild | 1 +
arch/c6x/include/asm/Kbuild | 1 +
arch/cris/include/asm/Kbuild | 1 +
arch/frv/include/asm/Kbuild | 1 +
arch/h8300/include/asm/Kbuild | 1 +
arch/hexagon/include/asm/Kbuild | 1 +
arch/ia64/include/asm/Kbuild | 1 +
arch/m32r/include/asm/Kbuild | 1 +
arch/m68k/include/asm/Kbuild | 1 +
arch/microblaze/include/asm/Kbuild | 1 +
arch/mips/include/asm/Kbuild | 1 +
arch/mn10300/include/asm/Kbuild | 1 +
arch/openrisc/include/asm/Kbuild | 1 +
arch/parisc/include/asm/Kbuild | 1 +
arch/powerpc/include/asm/Kbuild | 1 +
arch/powerpc/kernel/signal.c | 4 +-
arch/powerpc/kernel/uprobes.c | 6 +
arch/s390/include/asm/Kbuild | 1 +
arch/score/include/asm/Kbuild | 1 +
arch/sh/include/asm/Kbuild | 1 +
arch/sparc/include/asm/Kbuild | 1 +
arch/tile/include/asm/Kbuild | 1 +
arch/um/include/asm/Kbuild | 1 +
arch/unicore32/include/asm/Kbuild | 1 +
arch/x86/include/asm/trace_clock.h | 20 +
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/cpu/perf_event.c | 121 ++
arch/x86/kernel/cpu/perf_event.h | 5 +
arch/x86/kernel/cpu/perf_event_amd.c | 9 +
arch/x86/kernel/cpu/perf_event_intel.c | 9 +
arch/x86/kernel/cpu/perf_event_p6.c | 2 +
arch/x86/kernel/rtc.c | 6 -
arch/x86/kernel/trace_clock.c | 21 +
arch/x86/kernel/tsc.c | 6 +
arch/x86/kernel/uprobes.c | 54 +-
arch/xtensa/include/asm/Kbuild | 1 +
include/asm-generic/trace_clock.h | 16 +
include/linux/ftrace_event.h | 20 +-
include/linux/kernel.h | 7 +-
include/linux/ring_buffer.h | 3 +-
include/linux/trace_clock.h | 2 +
include/linux/uprobes.h | 10 +-
include/trace/ftrace.h | 76 +-
include/trace/syscall.h | 23 -
kernel/events/uprobes.c | 43 +-
kernel/fork.c | 2 +
kernel/trace/Kconfig | 1 +
kernel/trace/ftrace.c | 6 +-
kernel/trace/ring_buffer.c | 51 +-
kernel/trace/trace.c | 411 +++---
kernel/trace/trace.h | 18 +-
kernel/trace/trace_branch.c | 4 +-
kernel/trace/trace_events.c | 51 +-
kernel/trace/trace_events_filter.c | 4 +-
kernel/trace/trace_functions.c | 5 +-
kernel/trace/trace_functions_graph.c | 6 +-
kernel/trace/trace_irqsoff.c | 14 +-
kernel/trace/trace_kprobe.c | 10 +-
kernel/trace/trace_output.c | 78 +-
kernel/trace/trace_probe.c | 14 +-
kernel/trace/trace_sched_switch.c | 4 +-
kernel/trace/trace_sched_wakeup.c | 10 +-
kernel/trace/trace_selftest.c | 13 +-
kernel/trace/trace_syscalls.c | 61 +-
kernel/trace/trace_uprobe.c | 4 +-
tools/lib/traceevent/Makefile | 2 +-
tools/lib/traceevent/event-parse.c | 22 +-
tools/perf/Documentation/Makefile | 31 +-
tools/perf/Documentation/android.txt | 78 +
tools/perf/Documentation/perf-diff.txt | 60 +
tools/perf/Documentation/perf-inject.txt | 11 +
tools/perf/Documentation/perf-stat.txt | 5 +
tools/perf/Documentation/perf-trace.txt | 6 +
tools/perf/Makefile | 174 ++-
tools/perf/arch/common.c | 211 +++
tools/perf/arch/common.h | 10 +
tools/perf/builtin-annotate.c | 17 +-
tools/perf/builtin-buildid-cache.c | 1 +
tools/perf/builtin-buildid-list.c | 6 +-
tools/perf/builtin-diff.c | 437 +++++-
tools/perf/builtin-evlist.c | 5 +-
tools/perf/builtin-inject.c | 195 ++-
tools/perf/builtin-kmem.c | 5 +-
tools/perf/builtin-kvm.c | 35 +-
tools/perf/builtin-lock.c | 2 -
tools/perf/builtin-record.c | 66 +-
tools/perf/builtin-report.c | 23 +-
tools/perf/builtin-sched.c | 8 +-
tools/perf/builtin-script.c | 87 +-
tools/perf/builtin-stat.c | 54 +-
tools/perf/builtin-test.c | 1547 --------------------
tools/perf/builtin-timechart.c | 5 +-
tools/perf/builtin-top.c | 17 +-
tools/perf/builtin-trace.c | 403 ++++-
tools/perf/config/feature-tests.mak | 25 +-
tools/perf/config/utilities.mak | 10 +-
tools/perf/perf.c | 20 +-
tools/perf/perf.h | 18 +-
tools/perf/tests/attr.c | 175 +++
tools/perf/tests/attr.py | 322 ++++
tools/perf/tests/attr/README | 64 +
tools/perf/tests/attr/base-record | 39 +
tools/perf/tests/attr/base-stat | 39 +
tools/perf/tests/attr/test-record-basic | 5 +
tools/perf/tests/attr/test-record-branch-any | 8 +
.../perf/tests/attr/test-record-branch-filter-any | 8 +
.../tests/attr/test-record-branch-filter-any_call | 8 +
.../tests/attr/test-record-branch-filter-any_ret | 8 +
tools/perf/tests/attr/test-record-branch-filter-hv | 8 +
.../tests/attr/test-record-branch-filter-ind_call | 8 +
tools/perf/tests/attr/test-record-branch-filter-k | 8 +
tools/perf/tests/attr/test-record-branch-filter-u | 8 +
tools/perf/tests/attr/test-record-count | 8 +
tools/perf/tests/attr/test-record-data | 8 +
tools/perf/tests/attr/test-record-freq | 6 +
tools/perf/tests/attr/test-record-graph-default | 6 +
tools/perf/tests/attr/test-record-graph-dwarf | 10 +
tools/perf/tests/attr/test-record-graph-fp | 6 +
tools/perf/tests/attr/test-record-group | 18 +
tools/perf/tests/attr/test-record-group1 | 19 +
tools/perf/tests/attr/test-record-no-delay | 9 +
tools/perf/tests/attr/test-record-no-inherit | 7 +
tools/perf/tests/attr/test-record-no-samples | 6 +
tools/perf/tests/attr/test-record-period | 7 +
tools/perf/tests/attr/test-record-raw | 7 +
tools/perf/tests/attr/test-stat-basic | 6 +
tools/perf/tests/attr/test-stat-default | 64 +
tools/perf/tests/attr/test-stat-detailed-1 | 101 ++
tools/perf/tests/attr/test-stat-detailed-2 | 155 ++
tools/perf/tests/attr/test-stat-detailed-3 | 173 +++
tools/perf/tests/attr/test-stat-group | 15 +
tools/perf/tests/attr/test-stat-group1 | 15 +
tools/perf/tests/attr/test-stat-no-inherit | 7 +
tools/perf/tests/builtin-test.c | 173 +++
.../{util/dso-test-data.c => tests/dso-data.c} | 8 +-
tools/perf/tests/evsel-roundtrip-name.c | 114 ++
tools/perf/tests/evsel-tp-sched.c | 84 ++
tools/perf/tests/mmap-basic.c | 162 ++
tools/perf/tests/open-syscall-all-cpus.c | 120 ++
tools/perf/tests/open-syscall-tp-fields.c | 117 ++
tools/perf/tests/open-syscall.c | 66 +
.../parse-events-test.c => tests/parse-events.c} | 91 +-
tools/perf/tests/perf-record.c | 312 ++++
tools/perf/tests/pmu.c | 178 +++
tools/perf/tests/rdpmc.c | 175 +++
tools/perf/tests/tests.h | 22 +
tools/perf/tests/util.c | 30 +
tools/perf/tests/vmlinux-kallsyms.c | 230 +++
tools/perf/ui/browsers/annotate.c | 45 +-
tools/perf/ui/browsers/hists.c | 97 +-
tools/perf/ui/browsers/scripts.c | 189 +++
tools/perf/ui/gtk/browser.c | 4 +-
tools/perf/ui/gtk/gtk.h | 1 +
tools/perf/ui/gtk/progress.c | 59 +
tools/perf/ui/gtk/setup.c | 2 +
tools/perf/ui/gtk/util.c | 11 -
tools/perf/ui/hist.c | 138 +-
tools/perf/ui/progress.c | 44 +-
tools/perf/ui/progress.h | 10 +
tools/perf/ui/stdio/hist.c | 2 +-
tools/perf/ui/tui/progress.c | 42 +
tools/perf/ui/tui/setup.c | 1 +
tools/perf/ui/ui.h | 28 +
tools/perf/util/PERF-VERSION-GEN | 14 +-
tools/perf/util/annotate.c | 72 +-
tools/perf/util/annotate.h | 10 +-
tools/perf/util/build-id.c | 27 +-
tools/perf/util/build-id.h | 11 +-
tools/perf/util/cache.h | 39 +-
tools/perf/util/debug.h | 1 +
tools/perf/util/dso.c | 595 ++++++++
tools/perf/util/dso.h | 148 ++
tools/perf/util/event.c | 302 +---
tools/perf/util/event.h | 9 +-
tools/perf/util/evlist.c | 13 +-
tools/perf/util/evsel.c | 52 +-
tools/perf/util/evsel.h | 8 +-
tools/perf/util/header.c | 11 +
tools/perf/util/header.h | 1 +
tools/perf/util/hist.c | 99 ++
tools/perf/util/hist.h | 49 +-
tools/perf/util/machine.c | 464 ++++++
tools/perf/util/machine.h | 148 ++
tools/perf/util/map.c | 182 +--
tools/perf/util/map.h | 93 --
tools/perf/util/parse-events.c | 54 +-
tools/perf/util/parse-events.h | 3 +-
tools/perf/util/parse-events.l | 4 +-
tools/perf/util/parse-events.y | 18 +
tools/perf/util/pmu.c | 192 +--
tools/perf/util/pmu.h | 4 +
tools/perf/util/pstack.c | 46 +-
tools/perf/util/python.c | 2 +
tools/perf/util/rblist.c | 4 +-
.../util/scripting-engines/trace-event-python.c | 1 -
tools/perf/util/session.c | 5 +-
tools/perf/util/session.h | 5 +-
tools/perf/util/sort.h | 45 +-
tools/perf/util/string.c | 18 +
tools/perf/util/symbol.c | 658 +--------
tools/perf/util/symbol.h | 162 +-
tools/perf/util/thread.c | 41 +-
tools/perf/util/thread.h | 2 +
tools/perf/util/trace-event-read.c | 2 -
tools/perf/util/util.c | 35 +
tools/perf/util/util.h | 8 +
211 files changed, 8328 insertions(+), 4116 deletions(-)
create mode 100644 arch/x86/include/asm/trace_clock.h
create mode 100644 arch/x86/kernel/trace_clock.c
create mode 100644 include/asm-generic/trace_clock.h
create mode 100644 tools/perf/Documentation/android.txt
create mode 100644 tools/perf/arch/common.c
create mode 100644 tools/perf/arch/common.h
delete mode 100644 tools/perf/builtin-test.c
create mode 100644 tools/perf/tests/attr.c
create mode 100644 tools/perf/tests/attr.py
create mode 100644 tools/perf/tests/attr/README
create mode 100644 tools/perf/tests/attr/base-record
create mode 100644 tools/perf/tests/attr/base-stat
create mode 100644 tools/perf/tests/attr/test-record-basic
create mode 100644 tools/perf/tests/attr/test-record-branch-any
create mode 100644 tools/perf/tests/attr/test-record-branch-filter-any
create mode 100644 tools/perf/tests/attr/test-record-branch-filter-any_call
create mode 100644 tools/perf/tests/attr/test-record-branch-filter-any_ret
create mode 100644 tools/perf/tests/attr/test-record-branch-filter-hv
create mode 100644 tools/perf/tests/attr/test-record-branch-filter-ind_call
create mode 100644 tools/perf/tests/attr/test-record-branch-filter-k
create mode 100644 tools/perf/tests/attr/test-record-branch-filter-u
create mode 100644 tools/perf/tests/attr/test-record-count
create mode 100644 tools/perf/tests/attr/test-record-data
create mode 100644 tools/perf/tests/attr/test-record-freq
create mode 100644 tools/perf/tests/attr/test-record-graph-default
create mode 100644 tools/perf/tests/attr/test-record-graph-dwarf
create mode 100644 tools/perf/tests/attr/test-record-graph-fp
create mode 100644 tools/perf/tests/attr/test-record-group
create mode 100644 tools/perf/tests/attr/test-record-group1
create mode 100644 tools/perf/tests/attr/test-record-no-delay
create mode 100644 tools/perf/tests/attr/test-record-no-inherit
create mode 100644 tools/perf/tests/attr/test-record-no-samples
create mode 100644 tools/perf/tests/attr/test-record-period
create mode 100644 tools/perf/tests/attr/test-record-raw
create mode 100644 tools/perf/tests/attr/test-stat-basic
create mode 100644 tools/perf/tests/attr/test-stat-default
create mode 100644 tools/perf/tests/attr/test-stat-detailed-1
create mode 100644 tools/perf/tests/attr/test-stat-detailed-2
create mode 100644 tools/perf/tests/attr/test-stat-detailed-3
create mode 100644 tools/perf/tests/attr/test-stat-group
create mode 100644 tools/perf/tests/attr/test-stat-group1
create mode 100644 tools/perf/tests/attr/test-stat-no-inherit
create mode 100644 tools/perf/tests/builtin-test.c
rename tools/perf/{util/dso-test-data.c => tests/dso-data.c} (95%)
create mode 100644 tools/perf/tests/evsel-roundtrip-name.c
create mode 100644 tools/perf/tests/evsel-tp-sched.c
create mode 100644 tools/perf/tests/mmap-basic.c
create mode 100644 tools/perf/tests/open-syscall-all-cpus.c
create mode 100644 tools/perf/tests/open-syscall-tp-fields.c
create mode 100644 tools/perf/tests/open-syscall.c
rename tools/perf/{util/parse-events-test.c => tests/parse-events.c} (94%)
create mode 100644 tools/perf/tests/perf-record.c
create mode 100644 tools/perf/tests/pmu.c
create mode 100644 tools/perf/tests/rdpmc.c
create mode 100644 tools/perf/tests/tests.h
create mode 100644 tools/perf/tests/util.c
create mode 100644 tools/perf/tests/vmlinux-kallsyms.c
create mode 100644 tools/perf/ui/browsers/scripts.c
create mode 100644 tools/perf/ui/gtk/progress.c
create mode 100644 tools/perf/ui/tui/progress.c
create mode 100644 tools/perf/util/dso.c
create mode 100644 tools/perf/util/dso.h
create mode 100644 tools/perf/util/machine.c
create mode 100644 tools/perf/util/machine.h
[ ... half a meg diff omitted due to lkml limits ... ]
* Re: [GIT PULL] perf changes for v3.8
From: Linus Torvalds @ 2012-12-13 2:53 UTC
To: Ingo Molnar
Cc: Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Andrew Morton
Hmm. This may be entirely unrelated to this particular pull request, but
perf record -e cycles:pp
no longer works on my westmere machine (Operation not supported). It
used to work, but I haven't tried to bisect it, since I hope somebody
will just go "oh, I know what's up".
dmesg says:
Performance Events: PEBS fmt1+, 16-deep LBR, Westmere events, Intel
PMU driver.
perf_event_intel: CPUID marked event: 'bus cycles' unavailable
... version: 3
... bit width: 48
... generic registers: 4
... value mask: 0000ffffffffffff
... max period: 000000007fffffff
... fixed-purpose events: 3
... event mask: 000000070000000f
Any ideas?
Linus
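The masks in the dmesg output above follow from the counter counts and widths it reports. A small sketch of the arithmetic, assuming fixed-purpose counters are numbered from bit 32 of the event mask (which is consistent with the values shown); the function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: fixed-purpose counters start at bit 32 of the event mask. */
#define FIXED_COUNTER_BASE 32

/* Mask with the low n bits set; n >= 64 yields all ones. */
uint64_t low_bits(unsigned int n)
{
	return n >= 64 ? ~0ULL : (1ULL << n) - 1;
}

/* One bit per generic counter, plus one bit per fixed counter from bit 32. */
uint64_t pmu_event_mask(unsigned int generic, unsigned int fixed)
{
	return low_bits(generic) | (low_bits(fixed) << FIXED_COUNTER_BASE);
}
```

With the numbers from the log: low_bits(48) gives the value mask 0000ffffffffffff, low_bits(31) gives the max period 000000007fffffff, and pmu_event_mask(4, 3) gives the event mask 000000070000000f.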
* Re: [GIT PULL] perf changes for v3.8
From: David Ahern @ 2012-12-13 3:02 UTC
To: Linus Torvalds
Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Andrew Morton
On 12/12/12 7:53 PM, Linus Torvalds wrote:
> Hmm. This may be entirely unrelated to this particular pull request, but
>
> perf record -e cycles:pp
>
> no longer works on my westmere machine (Operation not supported). It
> used to work, but I haven't tried to bisect it, since I hope somebody
> will just go "oh, I know what's up".
Can you add -v and see if it spits out more info?
David
* Re: [GIT PULL] perf changes for v3.8
From: Linus Torvalds @ 2012-12-13 3:09 UTC
To: David Ahern
Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Andrew Morton
On Wed, Dec 12, 2012 at 7:02 PM, David Ahern <dsahern@gmail.com> wrote:
>
> Can you add -v and see if it spits out more info?
No more info.
Sure, it does the usual "do you have an APIC" message (it does that
without "-v" too), which isn't useful:
Error: sys_perf_event_open() syscall returned with 95 (Operation not
supported) for event cycles:pp. /bin/dmesg may provide additional
information.
No hardware sampling interrupt available. No APIC? If so then you
can boot the kernel with the "lapic" boot parameter to force-enable
it.
And yes, I have a local apic. Every single modern CPU does.
The error message is garbage and actively misleading. Lack of an APIC
is just about the *least* likely possible reason for the EOPNOTSUPP
error return.
Linus
* Re: [GIT PULL] perf changes for v3.8
From: David Ahern @ 2012-12-13 3:16 UTC
To: Linus Torvalds
Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Andrew Morton
On 12/12/12 8:09 PM, Linus Torvalds wrote:
> On Wed, Dec 12, 2012 at 7:02 PM, David Ahern <dsahern@gmail.com> wrote:
>>
>> Can you add -v and see if it spits out more info?
>
> No more info.
I'm surprised you are not seeing this as well:
	} else if ((err == EOPNOTSUPP) && (attr->precise_ip)) {
		ui__error("\'precise\' request may not be supported. "
			  "Try removing 'p' modifier\n");
		rc = -err;
		goto out;
	}
I made changes in this area relatively recently; I'll take a look.
David
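The branch quoted above picks a more specific message when a precise request is refused. A standalone sketch of that ordering (specific case first, generic fallback last) follows. The struct, the function name and the two fallback strings are hypothetical; only the first message comes from the snippet above.

```c
#include <errno.h>

/* Hypothetical stand-in for the one perf_event_attr field used here. */
struct attr_hint {
	unsigned int precise_ip;
};

/* Choose a hint for a failed open, given errno and the requested attr. */
const char *open_error_hint(int err, const struct attr_hint *attr)
{
	/* Most specific case first: a precise ('p') request was refused. */
	if (err == EOPNOTSUPP && attr->precise_ip)
		return "'precise' request may not be supported. "
		       "Try removing 'p' modifier";

	/* Illustrative fallback wording, not taken from perf. */
	if (err == EOPNOTSUPP)
		return "event or option not supported on this system";

	return "sys_perf_event_open() failed; dmesg may have details";
}
```

A binary built before such a branch existed would fall through to whatever generic message it had, which would explain seeing only the APIC hint.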
* Re: [GIT PULL] perf changes for v3.8
From: David Ahern @ 2012-12-13 3:25 UTC
To: Linus Torvalds
Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Andrew Morton
On 12/12/12 7:53 PM, Linus Torvalds wrote:
> Hmm. This may be entirely unrelated to this particular pull request, but
>
> perf record -e cycles:pp
>
> no longer works on my westmere machine (Operation not supported). It
> used to work, but I haven't tried to bisect it, since I hope somebody
> will just go "oh, I know what's up".
One last "I may know what's up" question. I wonder if you are tripping
on this:
	if (event->attr.precise_ip) {
		int precise = 0;

		if (!event->attr.exclude_guest)
			return -EOPNOTSUPP;
Are you running an older perf binary on the 3.8 kernel?
Does this work: perf record -e cycles:ppH ...
David
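The check quoted above can be modelled in isolation to see which attribute combinations it rejects. This sketch covers only the two quoted lines, with a hypothetical struct standing in for the relevant perf_event_attr fields; any other validation the kernel performs is left out.

```c
#include <errno.h>

/* Hypothetical stand-in for the two perf_event_attr fields involved. */
struct attr_model {
	unsigned int precise_ip;	/* count of 'p' modifiers */
	unsigned int exclude_guest;	/* 1 = do not count in guests */
};

/* Mirrors only the quoted check: precise events must exclude guests. */
int check_precise(const struct attr_model *attr)
{
	if (attr->precise_ip && !attr->exclude_guest)
		return -EOPNOTSUPP;
	return 0;
}
```

So cycles:pp from a tool that leaves exclude_guest clear is rejected, while the same event with exclude_guest set passes, and non-precise events are unaffected by this check.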
* Re: [GIT PULL] perf changes for v3.8
From: Linus Torvalds @ 2012-12-13 3:34 UTC
To: David Ahern
Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Andrew Morton
On Wed, Dec 12, 2012 at 7:25 PM, David Ahern <dsahern@gmail.com> wrote:
>
> Are you running an older perf binary on the 3.8 kernel?
I am.. I don't tend to rebuild 'perf'..
> Does this work: perf record -e cycles:ppH ...
Yes it does. What is 'H' and why should anybody care? Especially since
I'm not running virtualized.
That whole "exclude_guest" test is insane when there isn't any
virtualization going on. Very annoying.
Linus
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] perf changes for v3.8
2012-12-13 3:34 ` Linus Torvalds
@ 2012-12-13 3:43 ` David Ahern
2012-12-13 3:51 ` Linus Torvalds
` (2 more replies)
0 siblings, 3 replies; 34+ messages in thread
From: David Ahern @ 2012-12-13 3:43 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Andrew Morton
On 12/12/12 8:34 PM, Linus Torvalds wrote:
> On Wed, Dec 12, 2012 at 7:25 PM, David Ahern <dsahern@gmail.com> wrote:
>>
>> Are you running an older perf binary on the 3.8 kernel?
>
> I am.. I don't tend to rebuild 'perf'..
>
>> Does this work: perf record -e cycles:ppH ...
>
> Yes it does. What is 'H' and why should anybody care? Especially since
> I'm not running virtualized.
>
> That whole "exclude_guest" test is insane when there isn't any
> virtualization going on. Very annoying.
You know what's worse? All of your VMs blowing up because anyone runs
perf with the precise attribute. Virtualization and performance
monitoring collide. From the log message for commit 1342798:
"Intel PEBS in VT-x context uses the DS address as a guest linear
address, even though its programmed by the host as a host linear
address. This either results in guest memory corruption and or the
hardware faulting and 'crashing' the virtual machine. Therefore we have
to disable PEBS on VT-x enter and re-enable on VT-x exit, enforcing a
strict exclude_guest."
David
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] perf changes for v3.8
2012-12-13 3:43 ` David Ahern
@ 2012-12-13 3:51 ` Linus Torvalds
2012-12-13 4:31 ` David Ahern
2012-12-13 7:48 ` [PATCH] Revert "perf: Require exclude_guest to use PEBS - kernel side enforcement" Ingo Molnar
[not found] ` <20121217102000.GE11016@redhat.com>
2 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2012-12-13 3:51 UTC (permalink / raw)
To: David Ahern
Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Andrew Morton
On Wed, Dec 12, 2012 at 7:43 PM, David Ahern <dsahern@gmail.com> wrote:
>
> you know what's worse? All of your VMs blowing up because anyone runs perf
> with precise attribute. Virtualization and performance monitoring
> collide. From the log message for commit 1342798.
>
> "Intel PEBS in VT-x context uses the DS address as a guest linear address,
> even though its programmed by the host as a host linear address. This either
> results in guest memory corruption and or the hardware faulting and
> 'crashing' the virtual machine. Therefore we have to disable PEBS on VT-x
> enter and re-enable on VT-x exit, enforcing a strict exclude_guest."
Right.
SO WHY DON'T YOU JUST DO THAT THEN?
Disable PEBS on Vt-x enter and re-enable it on exit. End of story.
Exactly like you say.
But don't in the process screw up people WHO DON'T EVEN DO VIRTUALIZATION!
So please, just remove that idiotic "if (!event->attr.exclude_guest)"
test. It's wrong. It cannot possibly do the right thing. It is
totally misdesigned, exactly because you don't even know beforehand if
somebody uses virtualization or not.
Now, if the feature had been done the sane way around, and you'd have
an explicit flag that says "force this even on entry to virtualized
guests", then you could have said "Dave, I can't do that combination
of precise and virtualized guests". At that point you have - at perf
record time - a valid reason to say EOPNOTSUPP.
But doing it this way was wrong. Switch that "exclude_guest" attribute
around, and admit that "H" was bogus, and that the right thing to do
was to add a "V" flag that sets the "force_guest" flag instead.
Problem solved, without screwing people who have no reason to ever care.
Linus
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] perf changes for v3.8
2012-12-13 3:51 ` Linus Torvalds
@ 2012-12-13 4:31 ` David Ahern
2012-12-13 4:46 ` Linus Torvalds
2012-12-13 7:30 ` Ingo Molnar
0 siblings, 2 replies; 34+ messages in thread
From: David Ahern @ 2012-12-13 4:31 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Andrew Morton
On 12/12/12 8:51 PM, Linus Torvalds wrote:
> SO WHY DON'T YOU JUST DO THAT THEN?
>
> Disable PEBS on Vt-x enter and re-enable it on exit. End of story.
> Exactly like you say.
See commit 26a4f3c0. But that was not enough. Requiring exclude_guest
was another required piece. If you want to see the discussion:
https://lkml.org/lkml/2012/7/9/264
>
> But doing it this way was wrong. Switch that "exclude_guest" attribute
> around, and admit that "H" was bogus, and that the right thing to do
> was to add a "V" flag that sets the "force_guest" flag instead.
I understand this is annoying. Older binaries on newer kernels was the
only case I could not fix. (I guess a message could be added kernel side
to at least give a hint.) But the alternative -- based on code that has
existed for some time -- is for older binaries to crash VMs.
David
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] perf changes for v3.8
2012-12-13 4:31 ` David Ahern
@ 2012-12-13 4:46 ` Linus Torvalds
2012-12-13 7:27 ` Ingo Molnar
2012-12-13 7:30 ` Ingo Molnar
1 sibling, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2012-12-13 4:46 UTC (permalink / raw)
To: David Ahern
Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Andrew Morton
On Wed, Dec 12, 2012 at 8:31 PM, David Ahern <dsahern@gmail.com> wrote:
>
>
> See commit 26a4f3c0. But that was not enough.
Why? Make the people who run virtualization do the extra work. Things
never worked for them anyway, so forcing *them* to set a flag to get a
working thing is sane.
Forcing everybody else to set a flag is insane. See?
Your "that was not enough" is insane. It's purely about which *default
convention* you choose. The "if (!event->attr.exclude_guest)" test is
the wrong default convention, and it *should* have been "if
(event->attr.include_guest)" with the virtualization people forced to
use "cycles:ppV".
Claiming that there is some hardware overrun is silly, since that's
totally *independent* of the choice of which way the flag works!
> Requiring exclude_guest was
> another required piece. If you want to see the discussion:
> https://lkml.org/lkml/2012/7/9/264
The only thing that discussion shows is that people were *AWARE* that
this was a stupid change. I see Peter pointing out that this breaks
people's existing working setups.
You broke the WORKING case for old binaries in order to give an error
return in a case that NEVER EVEN WORKED with those binaries. Don't you
see how insane that is?
The 'H' flag is totally the wrong way around. Exactly because it only
"fixes" a case that was already working, and makes a case that never
worked anyway now return an error value. That's not sane. Since the
old broken case never worked, nobody can have depended on it. See why
I'm saying that it's the people who use virtualization who should be
forced to use the new flag, not the other way around?
Linus
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] perf changes for v3.8
2012-12-13 4:46 ` Linus Torvalds
@ 2012-12-13 7:27 ` Ingo Molnar
0 siblings, 0 replies; 34+ messages in thread
From: Ingo Molnar @ 2012-12-13 7:27 UTC (permalink / raw)
To: Linus Torvalds
Cc: David Ahern, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Andrew Morton
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Wed, Dec 12, 2012 at 8:31 PM, David Ahern <dsahern@gmail.com> wrote:
> >
> >
> > See commit 26a4f3c0. But that was not enough.
>
> Why? Make the people who run virtualization do the extra work. Things
> never worked for them anyway, so forcing *them* to set a flag to get a
> working thing is sane.
>
> Forcing everybody else to set a flag is insane. See?
Yeah, that's 100% stupid, we'll revert this change.
Arnaldo, wanna do it or should I? This slipped through the
testing cracks ...
Thanks,
Ingo
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] perf changes for v3.8
2012-12-13 4:31 ` David Ahern
2012-12-13 4:46 ` Linus Torvalds
@ 2012-12-13 7:30 ` Ingo Molnar
2012-12-13 14:30 ` David Ahern
1 sibling, 1 reply; 34+ messages in thread
From: Ingo Molnar @ 2012-12-13 7:30 UTC (permalink / raw)
To: David Ahern
Cc: Linus Torvalds, Linux Kernel Mailing List,
Arnaldo Carvalho de Melo, Peter Zijlstra, Thomas Gleixner,
Andrew Morton
* David Ahern <dsahern@gmail.com> wrote:
> > But doing it this way was wrong. Switch that "exclude_guest"
> > attribute around, and admit that "H" was bogus, and that the
> > right thing to do was to add a "V" flag that sets the
> > "force_guest" flag instead.
>
> I understand this is annoying. [...]
It's not annoying, it's outright broken - it's a regression that
we'll fix.
> [...] Older binaries on newer kernels was the only case I
> could not fix. [...]
The "only" case?? Old, working binaries are actually our _most_
important usecase: it's 99.9% of our current installed base ...
> [...] (I guess a message could be added kernel side to at
> least give a hint.) But the alternative -- based on code that
> has existed for some time -- is for older binaries to crash
> VMs.
That should be fixed differently, by not breaking existing
working functionality.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH] Revert "perf: Require exclude_guest to use PEBS - kernel side enforcement"
2012-12-13 3:43 ` David Ahern
2012-12-13 3:51 ` Linus Torvalds
@ 2012-12-13 7:48 ` Ingo Molnar
[not found] ` <20121217102000.GE11016@redhat.com>
2 siblings, 0 replies; 34+ messages in thread
From: Ingo Molnar @ 2012-12-13 7:48 UTC (permalink / raw)
To: David Ahern
Cc: Linus Torvalds, Linux Kernel Mailing List,
Arnaldo Carvalho de Melo, Peter Zijlstra, Thomas Gleixner,
Andrew Morton
* David Ahern <dsahern@gmail.com> wrote:
> On 12/12/12 8:34 PM, Linus Torvalds wrote:
> >On Wed, Dec 12, 2012 at 7:25 PM, David Ahern <dsahern@gmail.com> wrote:
> >>
> >>Are you running an older perf binary on the 3.8 kernel?
> >
> >I am.. I don't tend to rebuild 'perf'..
> >
> >>Does this work: perf record -e cycles:ppH ...
> >
> >Yes it does. What is 'H' and why should anybody care? Especially since
> >I'm not running virtualized.
> >
> > That whole "exclude_guest" test is insane when there isn't
> > any virtualization going on. Very annoying.
>
> you know what's worse? [...]
No, nothing can be worse than breaking 99% of our installed
base...
I'm wondering where this broke - is it:
20b279ddb38c perf: Require exclude_guest to use PEBS - kernel side enforcement
Linus, does the straight revert below fix everything for you -
or do we need to do more?
( The VM problem needs a different fix: a new include_guest bit
should be introduced, which would naturally default to 'off'
on older binaries, and the old bit should be phased out. Then
new perf binaries can turn on that bit safely. Or PEBS should
be fixed for guests. Or something along these lines - but
it should *not* be fixed by regressing existing binaries ... )
Thanks,
Ingo
----------------->
From 581ba4671bf1d1095e9ecf843be61904e4c97e91 Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo@kernel.org>
Date: Thu, 13 Dec 2012 08:41:40 +0100
Subject: [PATCH] Revert "perf: Require exclude_guest to use PEBS - kernel
side enforcement"
This reverts commit 20b279ddb38ca42f8863cec07b4d45ec24589f13.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
arch/x86/kernel/cpu/perf_event.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 4428fd1..6774c17 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -340,9 +340,6 @@ int x86_setup_perfctr(struct perf_event *event)
/* BTS is currently only allowed for user-mode. */
if (!attr->exclude_kernel)
return -EOPNOTSUPP;
-
- if (!attr->exclude_guest)
- return -EOPNOTSUPP;
}
hwc->config |= config;
@@ -385,9 +382,6 @@ int x86_pmu_hw_config(struct perf_event *event)
if (event->attr.precise_ip) {
int precise = 0;
- if (!event->attr.exclude_guest)
- return -EOPNOTSUPP;
-
/* Support for constant skid */
if (x86_pmu.pebs_active && !x86_pmu.pebs_broken) {
precise++;
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [GIT PULL] perf changes for v3.8
2012-12-13 7:30 ` Ingo Molnar
@ 2012-12-13 14:30 ` David Ahern
2012-12-13 14:38 ` David Ahern
2012-12-13 16:03 ` Linus Torvalds
0 siblings, 2 replies; 34+ messages in thread
From: David Ahern @ 2012-12-13 14:30 UTC (permalink / raw)
To: Ingo Molnar, Linus Torvalds
Cc: Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Andrew Morton
On 12/13/12 12:30 AM, Ingo Molnar wrote:
>
> * David Ahern <dsahern@gmail.com> wrote:
>
>>> But doing it this way was wrong. Switch that "exclude_guest"
>>> attribute around, and admit that "H" was bogus, and that the
>>> right thing to do was to add a "V" flag that sets the
>>> "force_guest" flag instead.
>>
>> I understand this is annoying. [...]
>
> It's not annoying, it's outright broken - it's a regression that
> we'll fix.
One of the problems is that existing binaries set the exclude_guest flag
(https://lkml.org/lkml/2012/7/9/292).
So, requesting users to update their binaries if they want to use
precise sampling is not acceptable. A 100% catastrophic failure of all
running VMs is acceptable? All VMs will crash and there is no direct
causal relationship.
David
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] perf changes for v3.8
2012-12-13 14:30 ` David Ahern
@ 2012-12-13 14:38 ` David Ahern
2012-12-13 16:03 ` Linus Torvalds
1 sibling, 0 replies; 34+ messages in thread
From: David Ahern @ 2012-12-13 14:38 UTC (permalink / raw)
To: Ingo Molnar, Linus Torvalds
Cc: Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Andrew Morton
On 12/13/12 7:30 AM, David Ahern wrote:
>> It's not annoying, it's outright broken - it's a regression that
>> we'll fix.
>
> One of the problems is that existing binaries set the exclude_guest flag
> (https://lkml.org/lkml/2012/7/9/292).
Correction, I meant to say one of the problems is that existing binaries
set the flag to 0 when precise is used.
>
> So, requesting users to update their binaries if they want to use
> precise sampling is not acceptable. A 100% catastrophic failure of all
> running VMs is acceptable? All VMs will crash and there is no direct
> causal relationship.
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] perf changes for v3.8
2012-12-13 14:30 ` David Ahern
2012-12-13 14:38 ` David Ahern
@ 2012-12-13 16:03 ` Linus Torvalds
2012-12-13 16:24 ` David Ahern
1 sibling, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2012-12-13 16:03 UTC (permalink / raw)
To: David Ahern
Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Andrew Morton
On Thu, Dec 13, 2012 at 6:30 AM, David Ahern <dsahern@gmail.com> wrote:
>
> One of the problems is that existing binaries set the exclude_guest flag
> (https://lkml.org/lkml/2012/7/9/292).
[ to zero ]
Yeah. And it apparently *never* worked. So it's not a regression.
> So, requesting users to update their binaries if they want to use precise
> sampling is not acceptable. A 100% catastrophic failure of all running VMs
> is acceptable? All VMs will crash and there is no direct causal
> relationship.
So instead, you expect everybody else - for whom things *used* to work
- to upgrade their binary, or their scripts, or just start using an
insane command line flag that makes no sense for them? Forcing
non-virtualization users to use a "only trace the host" flag is crazy.
Either way, somebody will be unhappy. No question about that. But our
rule in the kernel is "no regressions".
Now, I do agree that for "perf", it's fairly easy to say "just
recompile". I can do it in seconds, and it would presumably solve my
problem by just making the "host only" case the default, and I don't
need the "H" any more.
But that whole "no regressions" really is important. I can work around
things very easily, but the "no regressions" rule really means that I
should never *need* to work around things.
So when I see a regression, I consider it a major bug, even if the
workaround is trivial.
Linus
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] perf changes for v3.8
2012-12-13 16:03 ` Linus Torvalds
@ 2012-12-13 16:24 ` David Ahern
2012-12-13 16:33 ` Linus Torvalds
0 siblings, 1 reply; 34+ messages in thread
From: David Ahern @ 2012-12-13 16:24 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Andrew Morton
On 12/13/12 9:03 AM, Linus Torvalds wrote:
> On Thu, Dec 13, 2012 at 6:30 AM, David Ahern <dsahern@gmail.com> wrote:
>>
>> One of the problems is that existing binaries set the exclude_guest flag
>> (https://lkml.org/lkml/2012/7/9/292).
>
> [ to zero ]
>
> Yeah. And it apparently *never* worked. So it's not a regression.
The flag works. It does have a purpose. I did not write the original
code; I am not defending its design. It is what it is. We now have a
catastrophic problem that needs to be fixed.
> So instead, you expect everybody else - for whom things *used* to work
> - to upgrade their binary, or their scripts, or just start using an
> insane command line flag that makes no sense for them? Forcing
> non-virtualization users to use a "only trace the host" flag is crazy.
>
> Either way, somebody will be unhappy. No question about that. But our
> rule in the kernel is "no regressions".
...
> But that whole "no regressions" really is important. I can work around
> things very easily, but the "no regressions" rule really means that I
> should never *need* to work around things.
I get the regressions point. I have seen that statement from you enough
I think you have it on a permanent copy-and-paste shortcut.
Without the kernel side restriction existing perf binaries will crash
all running VMs. I could write the patch to completely invert the
exclude_guest logic -- make it include_guest. That breaks all existing
perf binaries as well - just a different syntax that gets broken. That
regression is acceptable?
David
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] perf changes for v3.8
2012-12-13 16:24 ` David Ahern
@ 2012-12-13 16:33 ` Linus Torvalds
2012-12-13 16:59 ` Ingo Molnar
2012-12-13 17:02 ` Linus Torvalds
0 siblings, 2 replies; 34+ messages in thread
From: Linus Torvalds @ 2012-12-13 16:33 UTC (permalink / raw)
To: David Ahern
Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Andrew Morton
On Thu, Dec 13, 2012 at 8:24 AM, David Ahern <dsahern@gmail.com> wrote:
>
> Without the kernel side restriction existing perf binaries will crash all
> running VMs.
..and they apparently always did, and we had that situation for years
without anybody ever even noticing.
And no, it's not a security fix, since you can just add the 'H' flag
and it will *still* crash according to the thread I saw (ie there is
some race condition in PEBS handling at VM entry, possibly at a
hardware level).
So the real security fix has to either fix the root cause or the
actual crash (which apparently is unknown), or to make perf be
root-only at least in the presence of virtualization.
The "return EOPNOTSUPP" thing does nothing but annoy people.
> I could write the patch to completely invert the exclude_guest
> logic -- make it include_guest. That breaks all existing perf binaries as
> well - just a different syntax that gets broken. That regression is
> acceptable?
It's not a regression since THAT CODE NEVER WORKED, for chissake! The
case of people actually profiling into virtual machines crashes the
running VMs, as you say. There's no way in hell we can call it a
regression to say "you now have to use a flag if you profile a load
with virtualization", since there wasn't any working case to begin
with.
Linus
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] perf changes for v3.8
2012-12-13 16:33 ` Linus Torvalds
@ 2012-12-13 16:59 ` Ingo Molnar
2012-12-13 17:10 ` Linus Torvalds
2012-12-13 17:02 ` Linus Torvalds
1 sibling, 1 reply; 34+ messages in thread
From: Ingo Molnar @ 2012-12-13 16:59 UTC (permalink / raw)
To: Linus Torvalds
Cc: David Ahern, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Andrew Morton
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > I could write the patch to completely invert the
> > exclude_guest logic -- make it include_guest. That breaks
> > all existing perf binaries as well - just a different syntax
> > that gets broken. That regression is acceptable?
>
> It's not a regression since THAT CODE NEVER WORKED, for
> chissake! The case of people actually profiling into virtual
> machines crashes the running VMs, as you say. There's no way
> in hell we can call it a regression to say "you now have to
> use a flag if you profile a load with virtualization", since
> there wasn't any working case to begin with.
Correct.
::include_guest looks like the more logical flag direction to
use in any case.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [GIT PULL] perf changes for v3.8
2012-12-13 16:33 ` Linus Torvalds
2012-12-13 16:59 ` Ingo Molnar
@ 2012-12-13 17:02 ` Linus Torvalds
2012-12-13 17:30 ` David Ahern
1 sibling, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2012-12-13 17:02 UTC (permalink / raw)
To: David Ahern
Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Andrew Morton
Btw, I do *not* think that you should necessarily default to 'H' for
host-only mode.
The way it should work is that ":pp", ":ppH" and ":ppV" are all different.
- "cycles:ppH" means: I want precise cycles only for the host case
- "cycles:ppV" means: I want precise cycles, and I want the VM too
This would result in EOPNOTSUPP for the case we know is buggy (but
presumably work on some other CPUs that don't have the problem)
- "cycles:pp" is "I want precise cycles, and I don't care about virtualization"
This would do whatever works. So it would basically become
host-only, but if you don't want precise cycles (so no ":pp") then
whatever our old behavior was (presumably "profile the virtual machine
too") would be what happens.
That sounds like (a) the interface that people want and (b) entirely
backwards-compatible for all cases that can matter (where "oops, I
crashed the VM" case does not matter).
Linus
^ permalink raw reply [flat|nested] 34+ messages in thread
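The three-way convention proposed in the message above (":pp", ":ppH", ":ppV") can be sketched as a small decision function. This is only an illustration under stated assumptions: `enum guest_pref`, `resolve_guest` and the `pebs_guest_unsafe` parameter are hypothetical names, and V is the proposed modifier, not one perf is known to implement.

```c
#include <errno.h>
#include <stdbool.h>

/* What the user said about guests (hypothetical names). */
enum guest_pref {
	GUEST_UNSPECIFIED, /* "cycles:pp"  : user does not care */
	GUEST_HOST_ONLY,   /* "cycles:ppH" : host only */
	GUEST_INCLUDE,     /* "cycles:ppV" : proposed "and the VM too" */
};

/*
 * Decide whether guest context is profiled.
 *   precise            the event asked for precise sampling
 *   pebs_guest_unsafe  this CPU cannot keep PEBS on across VM entry
 * Returns 0 and stores the decision in *profile_guest, or
 * -EOPNOTSUPP when an explicit request cannot be honored.
 */
int resolve_guest(enum guest_pref pref, bool precise,
		  bool pebs_guest_unsafe, bool *profile_guest)
{
	bool unsafe = precise && pebs_guest_unsafe;

	switch (pref) {
	case GUEST_HOST_ONLY:
		*profile_guest = false;
		return 0;
	case GUEST_INCLUDE:
		if (unsafe)
			return -EOPNOTSUPP; /* explicit, known-buggy combination */
		*profile_guest = true;
		return 0;
	case GUEST_UNSPECIFIED:
	default:
		/* Do whatever works: host-only when unsafe, old behavior otherwise. */
		*profile_guest = !unsafe;
		return 0;
	}
}
```

The point of the design is that the error path is reachable only when the user explicitly asked for guest profiling; an unspecified preference never fails.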
* [PATCH] x86: fix perf build with uclibc toolchains
2012-12-11 9:09 [GIT PULL] perf changes for v3.8 Ingo Molnar
2012-12-13 2:53 ` Linus Torvalds
@ 2012-12-13 17:04 ` Florian Fainelli
1 sibling, 0 replies; 34+ messages in thread
From: Florian Fainelli @ 2012-12-13 17:04 UTC (permalink / raw)
To: open list; +Cc: Florian Fainelli
libio.h is not provided by uClibc, in order to be able to test the
definition of __UCLIBC__ we need to include stdlib.h, which also
includes stddef.h, providing the definition of 'NULL'
Signed-off-by: Florian Fainelli <florian@openwrt.org>
---
tools/perf/arch/x86/util/dwarf-regs.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/perf/arch/x86/util/dwarf-regs.c b/tools/perf/arch/x86/util/dwarf-regs.c
index a794d30..6f5267f 100644
--- a/tools/perf/arch/x86/util/dwarf-regs.c
+++ b/tools/perf/arch/x86/util/dwarf-regs.c
@@ -20,7 +20,10 @@
*
*/
+#include <stdlib.h>
+#ifndef __UCLIBC__
#include <libio.h>
+#endif
#include <dwarf-regs.h>
/*
--
1.7.10.4
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [GIT PULL] perf changes for v3.8
2012-12-13 16:59 ` Ingo Molnar
@ 2012-12-13 17:10 ` Linus Torvalds
2012-12-13 17:31 ` Ingo Molnar
0 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2012-12-13 17:10 UTC (permalink / raw)
To: Ingo Molnar
Cc: David Ahern, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Andrew Morton
On Thu, Dec 13, 2012 at 8:59 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>> It's not a regression since THAT CODE NEVER WORKED, for
>> chissake! The case of people actually profiling into virtual
>> machines crashes the running VMs, as you say. There's no way
>> in hell we can call it a regression to say "you now have to
>> use a flag if you profile a load with virtualization", since
>> there wasn't any working case to begin with.
>
> Correct.
>
> ::include_guest looks like the more logical flag direction to
> use in any case.
See the email I just sent. The *non*-precise case presumably used to
work (and included the virtualized environment). No?
So the default shouldn't necessarily be "include guest". The default
should presumably be "the user didn't say", and then the kernel does
whatever works best.
If the user actually explicitly says one or the other, we should try
to honor that (and then EOPNOTSUPP may be a "sorry, I really cannot do
that particular combination that you explicitly asked for").
That should make everybody happy. Doing a non-PEBS virtualized perf
run should still work with the old binary.
So there should be two bits: "include guest" (V in the event specifier
unless you already used that for something else) and "host only" (H),
and they should both default to off. Then the kernel can see the three
actual cases.
(Or four cases, if you really want to: you may or may not want to make
the "both V and H set means both, and _only_ V set means 'no host at
all, _only_ virtual environment'. So then ":ppV" would mean
"cycle-accurate for virtual box _only_", while ":ppVH" would mean
"cycle-accurate for both the host and the virtual box". Of course,
considering the PEBS interface, right now neither of those can
actually work, but plain ":V" and ":HV" could work).
The important thing, I think, is that if the user doesn't know or care
about the VM case (because he's not running any!) and doesn't specify,
then the kernel should not say EOPNOTSUPP, and should do whatever
works for that cpu.
Linus
^ permalink raw reply [flat|nested] 34+ messages in thread
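The two-bit encoding described in the message above, in its optional four-case form, could be decoded roughly as follows. The names `enum scope` and `decode_scope` are hypothetical, and the bits themselves ("include guest" for V, "host only" for H) are the proposal from the message, not existing perf_event_attr fields.

```c
#include <stdbool.h>

/* The cases the kernel could distinguish (hypothetical names). */
enum scope {
	SCOPE_UNSPECIFIED,    /* neither bit set: old binaries land here */
	SCOPE_HOST_ONLY,      /* only H set */
	SCOPE_GUEST_ONLY,     /* only V set */
	SCOPE_HOST_AND_GUEST, /* both V and H set */
};

/* Map the two proposed bits, both defaulting to off, to a scope. */
enum scope decode_scope(bool include_guest, bool host_only)
{
	if (include_guest && host_only)
		return SCOPE_HOST_AND_GUEST;
	if (include_guest)
		return SCOPE_GUEST_ONLY;
	if (host_only)
		return SCOPE_HOST_ONLY;
	return SCOPE_UNSPECIFIED;
}
```

Because both bits default to off, a binary that knows nothing about them always decodes to the unspecified case, which is the one the kernel should never answer with EOPNOTSUPP.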
* Re: [GIT PULL] perf changes for v3.8
2012-12-13 17:02 ` Linus Torvalds
@ 2012-12-13 17:30 ` David Ahern
2012-12-13 17:36 ` Ingo Molnar
0 siblings, 1 reply; 34+ messages in thread
From: David Ahern @ 2012-12-13 17:30 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Andrew Morton
On 12/13/12 10:02 AM, Linus Torvalds wrote:
From your response to Ingo I take it you looked into other cases. I'll
summarize here to make sure we are on the same page:
1. guest only profiling from the host
perf {record|top} -e cycles:G
2. host only profiling
perf {record|top} -e cycles:H
These are 4 existing use cases that toggle exclude_guest and do work
today for those who care. Note the lack of the precise attribute on the
commands. These are the existing use cases that break by inverting the
logic in the kernel.
The problem child is perf record -e cycles:ppG. That command silently
crashes running VMs. You don't get a pop up or message that says "Dave,
you crashed your VMs running perf". You don't notice the VMs have
crashed until you attempt to login or what have you.
So how many perf users are having weird VM crashes? I don't know. I just
happened to:
1. not use libvirt
2. have a running VM with console messages kicked to ttyS0
3. ttyS0 connected to stdio
4. screen session with a running VM open
at the time I ran perf.
> Btw, I do *not* think that you should necessarily default to 'H' for
> host-only mode.
The change made to perf userspace was to set exclude_guest IF precise is
requested AND GH have not been specified.
>
> The way it should work is that ":pp", ":ppH" and ":ppV" are all different.
>
> - "cycles:ppH" means: I want precise cycles only for the host case
>
> - "cycles:ppV" means: I want precise cycles, and I want the VM too
>
> This would result in EOPNOTSUPP for the case we know is buggy (but
> presumably work on some other CPUs that don't have the problem)
>
> - "cycles:pp" is "I want precise cycles, and I don't care about virtualization"
yes, this is the case I handled within perf userspace.
And then there are the 'perf kvm {top|record}' twists in the perf code.
David
^ permalink raw reply [flat|nested] 34+ messages in thread
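The userspace rule stated in the message above (set exclude_guest if precise is requested and neither G nor H was given) can be sketched like this. `struct modifiers` and `parse_modifiers` are hypothetical and are not perf's actual event parser; the G and H mapping follows the guest-only and host-only use cases listed in the message, and the handling of "no modifiers at all" is an assumption.

```c
#include <stdbool.h>

/* Result of parsing an event modifier string such as "pp" or "ppH". */
struct modifiers {
	unsigned int precise; /* number of 'p' characters seen */
	bool exclude_guest;
	bool exclude_host;
};

/* Hypothetical parser covering only the 'p', 'G' and 'H' modifiers. */
struct modifiers parse_modifiers(const char *mods)
{
	struct modifiers m = { 0, false, false };
	bool saw_g = false, saw_h = false;

	for (; *mods; mods++) {
		if (*mods == 'p')
			m.precise++;
		else if (*mods == 'G')
			saw_g = true;
		else if (*mods == 'H')
			saw_h = true;
	}

	if (saw_g || saw_h) {
		/* Explicit choice: G alone is guest only, H alone is host only. */
		m.exclude_host = saw_g && !saw_h;
		m.exclude_guest = saw_h && !saw_g;
	} else if (m.precise) {
		/* The rule from the message: precise without G/H excludes guests. */
		m.exclude_guest = true;
	}
	return m;
}
```

With this rule a rebuilt perf turns "cycles:pp" into a host-only precise event, while "cycles:ppG" still reaches the kernel with exclude_guest clear, which is the combination the thread identifies as the one that crashes VMs.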
* Re: [GIT PULL] perf changes for v3.8
2012-12-13 17:10 ` Linus Torvalds
@ 2012-12-13 17:31 ` Ingo Molnar
2012-12-17 4:43 ` David Ahern
0 siblings, 1 reply; 34+ messages in thread
From: Ingo Molnar @ 2012-12-13 17:31 UTC (permalink / raw)
To: Linus Torvalds
Cc: David Ahern, Linux Kernel Mailing List, Arnaldo Carvalho de Melo,
Peter Zijlstra, Thomas Gleixner, Andrew Morton
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Thu, Dec 13, 2012 at 8:59 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >>
> >> It's not a regression since THAT CODE NEVER WORKED, for
> >> chissake! The case of people actually profiling into virtual
> >> machines crashes the running VMs, as you say. There's no way
> >> in hell we can call it a regression to say "you now have to
> >> use a flag if you profile a load with virtualization", since
> >> there wasn't any working case to begin with.
> >
> > Correct.
> >
> > ::include_guest looks like the more logical flag direction to
> > use in any case.
>
> See the email I just sent. The *non*-precise case presumably used to
> work (and included the virtualized environment). No?
>
> So the default shouldn't necessarily be "include guest". The default
> should presumably be "the user didn't say", and then the kernel does
> whatever works best.
>
> If the user actually explicitly says one or the other, we should try
> to honor that (and then EOPNOTSUPP may be a "sorry, I really cannot do
> that particular combination that you explicitly asked for").
>
> That should make everybody happy. Doing a non-PEBS virtualized perf
> run should still work with the old binary.
>
> So there should be two bits: "include guest" (V in the event specifier
> unless you already used that for something else) and "host only" (H),
> and they should both default to off. Then the kernel can see the three
> actual cases.
>
> (Or four cases, if you really want to: you may or may not want to make
> the "both V and H set means both, and _only_ V set means 'no host at
> all, _only_ virtual environment'. So then ":ppV" would mean
> "cycle-accurate for virtual box _only_", while ":ppVH" would mean
> "cycle-accurate for both the host and the virtual box". Of course,
> considering the PEBS interface, right now neither of those can
> actually work, but plain ":V" and ":HV" could work).
>
> The important thing, I think, is that if the user doesn't know
> or care about the VM case (because he's not running any!) and
> doesn't specify, then the kernel should not say EOPNOTSUPP,
> and should do whatever works for that cpu.
Agreed.
David, wanna send a patch for this?
Thanks,
Ingo
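
For illustration only, the two-bit scheme in the quoted text above could be sketched as follows. This is not kernel code: resolve_scope(), enum profile_scope and the v_bit/h_bit parameters are hypothetical names, and the sketch assumes the four-case reading (V alone means guest only, V plus H means both).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical result type; not a kernel or perf type. */
enum profile_scope { SCOPE_HOST_ONLY, SCOPE_GUEST_ONLY, SCOPE_BOTH };

/*
 * Resolve the two proposed bits ("include guest" = V, "host only" = H)
 * into a profiling scope. Returns 0 and fills *out on success, or
 * -EOPNOTSUPP when guest samples are explicitly requested together
 * with precise (PEBS) mode.
 */
static int resolve_scope(bool v_bit, bool h_bit, bool precise,
                         enum profile_scope *out)
{
    if (!v_bit && !h_bit) {
        /* "The user didn't say": do whatever works for this event. */
        *out = precise ? SCOPE_HOST_ONLY : SCOPE_BOTH;
        return 0;
    }
    if (v_bit && precise)
        return -EOPNOTSUPP;     /* explicit request we cannot honor */
    if (v_bit)
        *out = h_bit ? SCOPE_BOTH : SCOPE_GUEST_ONLY;
    else
        *out = SCOPE_HOST_ONLY;
    return 0;
}
```

The point of the sketch is that the error path is reachable only when the V bit is set, so a user who never mentions virtualization cannot get EOPNOTSUPP.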
* Re: [GIT PULL] perf changes for v3.8
2012-12-13 17:30 ` David Ahern
@ 2012-12-13 17:36 ` Ingo Molnar
2012-12-13 19:12 ` David Ahern
0 siblings, 1 reply; 34+ messages in thread
From: Ingo Molnar @ 2012-12-13 17:36 UTC (permalink / raw)
To: David Ahern
Cc: Linus Torvalds, Linux Kernel Mailing List,
Arnaldo Carvalho de Melo, Peter Zijlstra, Thomas Gleixner,
Andrew Morton
* David Ahern <dsahern@gmail.com> wrote:
> On 12/13/12 10:02 AM, Linus Torvalds wrote:
>
> From your response to Ingo I take it you looked into other cases.
> I'll summarize here to make sure we are on the same page:
>
> 1. guest only profiling from the host
> perf {record|top} -e cycles:G
>
> 2. host only profiling
> perf {record|top} -e cycles:H
>
> These are 4 existing use cases that toggle exclude_guest and do work
> today for those who care. Note the lack of precise attribute on the
> commands. These are the existing use cases that break by inverting
> the logic in the kernel.
>
> The problem child is perf record -e cycles:ppG. [...]
The #1 problem child in this particular case, the one you should
care about most is:
perf record -e cycles:pp
As 99% of the people won't be doing any host or guest side
profiling, they just want to do profiling.
The above G/H variants are the 1%.
So make sure the default works fine, that old binaries don't
stop working - and then you can automatically (by default)
exclude guest profiling if PEBS is enabled, and only ever reject
profiling in the very specific case of:
perf record -e cycles:ppG
where the user asks for something we don't support (yet).
So please stop thinking exclusively with a virtualization hat
on, first think with a generic kernel developer hat on. Once all
those cases work, make the virtualization case do the right
thing as well.
We can fix all these cases properly and automatically, users
don't need to specify flags they don't care about, old binaries
will work and no VMs will crash.
Ok? So now we need a patch that does all that - otherwise we'll
have to revert the one that added the regression.
Thanks,
Ingo
* Re: [GIT PULL] perf changes for v3.8
2012-12-13 17:36 ` Ingo Molnar
@ 2012-12-13 19:12 ` David Ahern
0 siblings, 0 replies; 34+ messages in thread
From: David Ahern @ 2012-12-13 19:12 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, Linux Kernel Mailing List,
Arnaldo Carvalho de Melo, Peter Zijlstra, Thomas Gleixner,
Andrew Morton
On 12/13/12 10:36 AM, Ingo Molnar wrote:
> The #1 problem child in this particular case, the one you should
> care about most is:
>
> perf record -e cycles:pp
>
> As 99% of the people won't be doing any host or guest side
> profiling, they just want to do profiling.
In older code (v3.6 and before) -e cycles:p sets the exclude_guest flag
to 0, so -e cycles:p and -e cycles:pG are equivalent with respect to
exclude_guest.
> Ok? So now we need a patch that does all that - otherwise we'll
> have to revert the one that added the regression.
I will not have time to work on a patch until Sunday at the earliest.
David
* Re: [GIT PULL] perf changes for v3.8
2012-12-13 17:31 ` Ingo Molnar
@ 2012-12-17 4:43 ` David Ahern
2012-12-22 19:22 ` David Ahern
0 siblings, 1 reply; 34+ messages in thread
From: David Ahern @ 2012-12-17 4:43 UTC (permalink / raw)
To: Ingo Molnar, Arnaldo Carvalho de Melo
Cc: Linus Torvalds, Linux Kernel Mailing List, Peter Zijlstra,
Thomas Gleixner, Andrew Morton
On 12/13/12 10:31 AM, Ingo Molnar wrote:
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> So the default shouldn't necessarily be "include guest". The default
>> should presumably be "the user didn't say", and then the kernel does
>> whatever works best.
>>
>> If the user actually explicitly says one or the other, we should try
>> to honor that (and then EOPNOTSUPP may be a "sorry, I really cannot do
>> that particular combination that you explicitly asked for").
>>
>> That should make everybody happy. Doing a non-PEBS virtualized perf
>> run should still work with the old binary.
>>
>> So there should be two bits: "include guest" (V in the event specifier
>> unless you already used that for something else) and "host only" (H),
>> and they should both default to off. Then the kernel can see the three
>> actual cases.
>>
>> (Or four cases, if you really want to: you may or may not want to make
>> the "both V and H set means both, and _only_ V set means 'no host at
>> all, _only_ virtual environment'. So then ":ppV" would mean
>> "cycle-accurate for virtual box _only_", while ":ppVH" would mean
>> "cycle-accurate for both the host and the virtual box". Of course,
>> considering the PEBS interface, right now neither of those can
>> actually work, but plain ":V" and ":HV" could work).
>>
>> The important thing, I think, is that if the user doesn't know
>> or care about the VM case (because he's not running any!) and
>> doesn't specify, then the kernel should not say EOPNOTSUPP,
>> and should do whatever works for that cpu.
>
> Agreed.
>
> David, wanna send a patch for this?
As I mentioned in a prior email, exclude_{guest,host} currently work
fine without PEBS. The current matrix for the flags:
                  profiling
                  guest  host
-e <event>          y     y
-e <event>:G        y     n    - G means enable guest, turn off host
-e <event>:H        n     y    - H means enable host, turn off guest
-e <event>:GH       y     y    - G followed by H means enable both
-e <event>:HG       y     y    - same as GH
There is no reason to change how these work. It's the variants with :p
that need to be handled:
-e <event>:p        n     y    - guest off is required
-e <event>:pG       y     n    - needs to fail - not supported
-e <event>:pH       n     y
-e <event>:pGH      y     y    - needs to fail - not supported
This is the logic that was implemented in the original patchset which
was pulled into v3.7 and the cause of this email thread.
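
As an illustration only (not code from the patchset), the two matrices above can be sketched as a small modifier parser. parse_modifiers() and struct event_scope are hypothetical names, and the sketch assumes only the G, H and p modifiers exist; the real perf tool accepts more.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical result type: which sides get profiled. */
struct event_scope {
    bool guest;
    bool host;
};

/*
 * Parse the modifier string that follows "<event>:" and apply the
 * matrices above. Returns 0 on success, -EOPNOTSUPP for precise
 * events that ask for guest samples, -EINVAL for anything the
 * sketch does not model.
 */
static int parse_modifiers(const char *mod, struct event_scope *out)
{
    bool g = false, h = false, precise = false;

    for (; *mod; mod++) {
        switch (*mod) {
        case 'G': g = true; break;
        case 'H': h = true; break;
        case 'p': precise = true; break;
        default:  return -EINVAL;
        }
    }

    if (precise && g)
        return -EOPNOTSUPP;         /* the :pG and :pGH rows */

    if (!g && !h) {
        /* No G/H given: both sides, unless precise forces guest off. */
        out->guest = !precise;
        out->host = true;
    } else {
        out->guest = g;
        out->host = h;
    }
    return 0;
}
```

With this sketch, "p" and "pH" resolve to the same host-only scope, while "pG" and "pGH" are the only inputs that return -EOPNOTSUPP.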
One suggestion was to switch exclude_guest to include_guest. I take that
to mean deprecate the current exclude_guest and add a new include_guest
flag. Given that there are a number of exclude_XXXX flags (XXXX = user,
kernel, host, guest, hv, etc) that would make the perf code inconsistent.
All that is needed is for the current exclude_guest flag to be
deprecated such that for older binaries on newer kernels it is ignored
(perhaps a warn on once), and then a new flag -- exclude_guest2 -- is
then used for the new logic.
e.g.,
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 4f63c05..19900df 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -266,12 +266,14 @@ struct perf_event_attr {
         sample_id_all : 1, /* sample_type all events */

         exclude_host : 1, /* don't count in host */
-        exclude_guest : 1, /* don't count in guest */
+        exclude_guest : 1, /* don't count in guest - DEPRECATED */

         exclude_callchain_kernel : 1, /* exclude kernel callchains */
         exclude_callchain_user : 1, /* exclude user callchains */

-        __reserved_1 : 41;
+        exclude_guest2 : 1, /* don't count in guest */
+
+        __reserved_1 : 40;

     union {
         __u32 wakeup_events; /* wakeup every n events */
Do you agree with that?
David
* Re: [GIT PULL] perf changes for v3.8
2012-12-17 4:43 ` David Ahern
@ 2012-12-22 19:22 ` David Ahern
2012-12-23 0:00 ` Linus Torvalds
0 siblings, 1 reply; 34+ messages in thread
From: David Ahern @ 2012-12-22 19:22 UTC (permalink / raw)
To: Ingo Molnar, Arnaldo Carvalho de Melo
Cc: Linus Torvalds, Linux Kernel Mailing List, Peter Zijlstra,
Thomas Gleixner, Andrew Morton
any opinions on whether the approach is reasonable?
On 12/16/12 9:43 PM, David Ahern wrote:
> On 12/13/12 10:31 AM, Ingo Molnar wrote:
>> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>> So the default shouldn't necessarily be "include guest". The default
>>> should presumably be "the user didn't say", and then the kernel does
>>> whatever works best.
>>>
>>> If the user actually explicitly says one or the other, we should try
>>> to honor that (and then EOPNOTSUPP may be a "sorry, I really cannot do
>>> that particular combination that you explicitly asked for").
>>>
>>> That should make everybody happy. Doing a non-PEBS virtualized perf
>>> run should still work with the old binary.
>>>
>>> So there should be two bits: "include guest" (V in the event specifier
>>> unless you already used that for something else) and "host only" (H),
>>> and they should both default to off. Then the kernel can see the three
>>> actual cases.
>>>
>>> (Or four cases, if you really want to: you may or may not want to make
>>> the "both V and H set means both, and _only_ V set means 'no host at
>>> all, _only_ virtual environment'. So then ":ppV" would mean
>>> "cycle-accurate for virtual box _only_", while ":ppVH" would mean
>>> "cycle-accurate for both the host and the virtual box". Of course,
>>> considering the PEBS interface, right now neither of those can
>>> actually work, but plain ":V" and ":HV" could work).
>>>
>>> The important thing, I think, is that if the user doesn't know
>>> or care about the VM case (because he's not running any!) and
>>> doesn't specify, then the kernel should not say EOPNOTSUPP,
>>> and should do whatever works for that cpu.
>>
>> Agreed.
>>
>> David, wanna send a patch for this?
>
> As I mentioned in a prior email, exclude_{guest,host} currently work
> fine without PEBS. The current matrix for the flags:
>
>                   profiling
>                   guest  host
> -e <event>          y     y
> -e <event>:G        y     n    - G means enable guest, turn off host
> -e <event>:H        n     y    - H means enable host, turn off guest
> -e <event>:GH       y     y    - G followed by H means enable both
> -e <event>:HG       y     y    - same as GH
>
> There is no reason to change how these work. It's the variants with :p
> that need to be handled:
>
> -e <event>:p        n     y    - guest off is required
> -e <event>:pG       y     n    - needs to fail - not supported
> -e <event>:pH       n     y
> -e <event>:pGH      y     y    - needs to fail - not supported
>
> This is the logic that was implemented in the original patchset which
> was pulled into v3.7 and the cause of this email thread.
>
> One suggestion was to switch exclude_guest to include_guest. I take that
> to mean deprecate the current exclude_guest and add a new include_guest
> flag. Given that there are a number of exclude_XXXX flags (XXXX = user,
> kernel, host, guest, hv, etc) that would make the perf code inconsistent.
>
> All that is needed is for the current exclude_guest flag to be
> deprecated such that for older binaries on newer kernels it is ignored
> (perhaps a warn on once), and then a new flag -- exclude_guest2 -- is
> then used for the new logic.
>
> e.g.,
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 4f63c05..19900df 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -266,12 +266,14 @@ struct perf_event_attr {
>          sample_id_all : 1, /* sample_type all events */
>
>          exclude_host : 1, /* don't count in host */
> -        exclude_guest : 1, /* don't count in guest */
> +        exclude_guest : 1, /* don't count in guest - DEPRECATED */
>
>          exclude_callchain_kernel : 1, /* exclude kernel callchains */
>          exclude_callchain_user : 1, /* exclude user callchains */
>
> -        __reserved_1 : 41;
> +        exclude_guest2 : 1, /* don't count in guest */
> +
> +        __reserved_1 : 40;
>
>      union {
>          __u32 wakeup_events; /* wakeup every n events */
>
>
> Do you agree with that?
>
> David
* Re: [GIT PULL] perf changes for v3.8
[not found] ` <20121217102000.GE11016@redhat.com>
@ 2012-12-22 19:30 ` David Ahern
2012-12-23 9:23 ` Gleb Natapov
0 siblings, 1 reply; 34+ messages in thread
From: David Ahern @ 2012-12-22 19:30 UTC (permalink / raw)
To: Gleb Natapov
Cc: Linus Torvalds, Ingo Molnar, Linux Kernel Mailing List,
Arnaldo Carvalho de Melo, Peter Zijlstra, Thomas Gleixner,
Andrew Morton
On 12/17/12 3:20 AM, Gleb Natapov wrote:
> Does the regression happen because of commit 20b279ddb38c? If it does, I
> think it is safe to revert it. KVM disables PEBS during guest entry now, so
> VMs shouldn't be blowing up (they do not in my testing) and if they still
> do we can disable the counter that has PEBS enabled on a guest entry too.
> Yes, if user runs "perf record -e cycles:ppG" he will not know that
> kernel ignored :pp modifier (with 20b279ddb38c he will get an error), but
> at least old binaries will continue working and new binaries can do the
> checking in userspace.
>
Your patch alone was not enough. Start here:
https://lkml.org/lkml/2012/7/12/3
And from your response:
https://lkml.org/lkml/2012/7/12/337
"Do not run perf kvm. It does not set exclude_guest and :p and :pp is
not compatible with guest profiling and should be disallowed. Again
Peter's patch takes care of this."
20b279ddb38c is Peter's patch -- kernel side enforcement that
exclude_guest needs to be set when using precise mode.
David
* Re: [GIT PULL] perf changes for v3.8
2012-12-22 19:22 ` David Ahern
@ 2012-12-23 0:00 ` Linus Torvalds
0 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2012-12-23 0:00 UTC (permalink / raw)
To: David Ahern
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Linux Kernel Mailing List,
Peter Zijlstra, Thomas Gleixner, Andrew Morton
Sure. I actually think both should be deprecated, and replaced with
explicit "trace guest" vs "trace host" flags, just to make it
consistent, but I don't care deeply.
The only thing I care about is that compatibility must never be broken
(which the current setup does), and that people who don't use
virtualization should never be asked to use flags that have anything
to do with virtualization (which the current work-around involves).
Linus
On Sat, Dec 22, 2012 at 11:22 AM, David Ahern <dsahern@gmail.com> wrote:
> any opinions on whether the approach is reasonable?
>
> On 12/16/12 9:43 PM, David Ahern wrote:
>>
>> All that is needed is for the current exclude_guest flag to be
>> deprecated such that for older binaries on newer kernels it is ignored
>> (perhaps a warn on once), and then a new flag -- exclude_guest2 -- is
>> then used for the new logic.
* Re: [GIT PULL] perf changes for v3.8
2012-12-22 19:30 ` [GIT PULL] perf changes for v3.8 David Ahern
@ 2012-12-23 9:23 ` Gleb Natapov
2012-12-23 23:17 ` David Ahern
0 siblings, 1 reply; 34+ messages in thread
From: Gleb Natapov @ 2012-12-23 9:23 UTC (permalink / raw)
To: David Ahern
Cc: Linus Torvalds, Ingo Molnar, Linux Kernel Mailing List,
Arnaldo Carvalho de Melo, Peter Zijlstra, Thomas Gleixner,
Andrew Morton
On Sat, Dec 22, 2012 at 12:30:35PM -0700, David Ahern wrote:
> On 12/17/12 3:20 AM, Gleb Natapov wrote:
> >Does the regression happen because of commit 20b279ddb38c? If it does, I
> >think it is safe to revert it. KVM disables PEBS during guest entry now, so
> >VMs shouldn't be blowing up (they do not in my testing) and if they still
> >do we can disable the counter that has PEBS enabled on a guest entry too.
> >Yes, if user runs "perf record -e cycles:ppG" he will not know that
> >kernel ignored :pp modifier (with 20b279ddb38c he will get an error), but
> >at least old binaries will continue working and new binaries can do the
> >checking in userspace.
> >
>
> Your patch alone was not enough. Start here:
> https://lkml.org/lkml/2012/7/12/3
>
I cannot reproduce this failure. I reverted 20b279ddb38c and ran "perf
record -e cycles:ppG" while guest was running. Admittedly I ran the test
for a short time, but without disabling PEBS during the guest entry this
was enough to crash a guest.
The difference between "perf record -e cycles:ppG" and "perf record -e
cycles:ppH" from KVM's point of view is that for ppH both the PMU counter
and PEBS will be disabled during a guest entry, while for ppG only PEBS
will be disabled. So maybe my testing is not enough, and if the counter
remains enabled a PEBS write can eventually overshoot guest entry. In this
case we can treat ppG and ppH the same during guest entry and disable both
the counter and PEBS.
> And from your response:
> https://lkml.org/lkml/2012/7/12/337
>
> "Do not run perf kvm. It does not set exclude_guest and :p and :pp
> is not compatible with guest profiling and should be disallowed.
> Again Peter's patch takes care of this."
>
I stand by this :) It should be disallowed as in "user should get
a warning that he does something wrong and his settings will be
ignored". Unfortunately the way it was implemented breaks old perf
binaries and keeping them running is more important than warning users
about something that never worked anyway. New perf binary can do the
check in userspace. Kernel should still disallow configuration that may
crash a guest, but not in a way that breaks the userspace that does not
set exclude_guest. What about forcing exclude_guest on an event that
has precise flag set without reporting error to userspace?
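
A minimal sketch of the difference between the two behaviours, for illustration only. struct attr_sketch is a hypothetical stand-in for the two relevant perf_event_attr fields, and both functions are assumptions about the shape of the check, not the actual code of 20b279ddb38c.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant perf_event_attr fields. */
struct attr_sketch {
    unsigned int precise_ip;    /* non-zero for :p / :pp */
    bool exclude_guest;
};

/*
 * Strict enforcement as described in this thread: a precise event
 * without exclude_guest is rejected, which is what breaks old
 * binaries that never set the flag.
 */
static int check_strict(const struct attr_sketch *attr)
{
    if (attr->precise_ip && !attr->exclude_guest)
        return -EOPNOTSUPP;
    return 0;
}

/*
 * Suggested alternative: silently force exclude_guest on precise
 * events. Returns true when the request was modified, so a caller
 * could still log or warn once if it wanted to.
 */
static bool force_exclude_guest(struct attr_sketch *attr)
{
    if (attr->precise_ip && !attr->exclude_guest) {
        attr->exclude_guest = true;
        return true;
    }
    return false;
}
```

The trade-off is visible in the return types: the strict check reports the problem to userspace, while the forcing variant keeps old binaries working at the cost of changing what the user asked for.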
> 20b279ddb38c is Peter's patch -- kernel side enforcement that
> exclude_guest needs to be set when using precise mode.
>
> David
--
Gleb.
* Re: [GIT PULL] perf changes for v3.8
2012-12-23 9:23 ` Gleb Natapov
@ 2012-12-23 23:17 ` David Ahern
2012-12-24 10:36 ` Gleb Natapov
0 siblings, 1 reply; 34+ messages in thread
From: David Ahern @ 2012-12-23 23:17 UTC (permalink / raw)
To: Gleb Natapov
Cc: Linus Torvalds, Ingo Molnar, Linux Kernel Mailing List,
Arnaldo Carvalho de Melo, Peter Zijlstra, Thomas Gleixner,
Andrew Morton
On 12/23/12 2:23 AM, Gleb Natapov wrote:
>> Your patch alone was not enough. Start here:
>> https://lkml.org/lkml/2012/7/12/3
>>
> I cannot reproduce this failure. I reverted 20b279ddb38c and ran "perf
> record -e cycles:ppG" while guest was running. Admittedly I ran the test
> for a short time, but without disabling PEBS during the guest entry this
> was enough to crash a guest.
In the beginning (without any patches) VMs crashed fairly quickly. With
your patch it took longer, but I was able to consistently crash VMs. The
thread notes server info (processor, OS) and VM versions as well as load
used for the tests -- a cpu bound process (openssl), disk bound (dd) and
network (netperf).
> What about forcing exclude_guest on an event that
> has precise flag set without reporting error to userspace?
That's up to the perf maintainers -- Ingo, Peter, Arnaldo. Personally, I
don't like it since kernel side is changing the user request.
David
* Re: [GIT PULL] perf changes for v3.8
2012-12-23 23:17 ` David Ahern
@ 2012-12-24 10:36 ` Gleb Natapov
0 siblings, 0 replies; 34+ messages in thread
From: Gleb Natapov @ 2012-12-24 10:36 UTC (permalink / raw)
To: David Ahern
Cc: Linus Torvalds, Ingo Molnar, Linux Kernel Mailing List,
Arnaldo Carvalho de Melo, Peter Zijlstra, Thomas Gleixner,
Andrew Morton
On Sun, Dec 23, 2012 at 04:17:45PM -0700, David Ahern wrote:
> On 12/23/12 2:23 AM, Gleb Natapov wrote:
> >>Your patch alone was not enough. Start here:
> >> https://lkml.org/lkml/2012/7/12/3
> >>
> >I cannot reproduce this failure. I reverted 20b279ddb38c and ran "perf
> >record -e cycles:ppG" while guest was running. Admittedly I ran the test
> >for a short time, but without disabling PEBS during the guest entry this
> >was enough to crash a guest.
>
> In the beginning (without any patches) VMs crashed fairly quickly.
> With your patch it took longer, but I was able to consistently crash
> VMs. The thread notes server info (processor, OS) and VM versions as
> well as load used for the tests -- a cpu bound process (openssl),
> disk bound (dd) and network (netperf).
>
>
It means that disabling PEBS is not enough and PMU counter should be
disabled too.
> >What about forcing exclude_guest on an event that
> >has precise flag set without reporting error to userspace?
>
> That's up to the perf maintainers -- Ingo, Peter, Arnaldo.
> Personally, I don't like it since kernel side is changing the user
> request.
>
I do not see another way to prevent guests from crashing with older perf
binaries if 20b279ddb38c is reverted.
--
Gleb.
end of thread, other threads:[~2012-12-24 10:36 UTC | newest]
Thread overview: 34+ messages
2012-12-11 9:09 [GIT PULL] perf changes for v3.8 Ingo Molnar
2012-12-13 2:53 ` Linus Torvalds
2012-12-13 3:02 ` David Ahern
2012-12-13 3:09 ` Linus Torvalds
2012-12-13 3:16 ` David Ahern
2012-12-13 3:25 ` David Ahern
2012-12-13 3:34 ` Linus Torvalds
2012-12-13 3:43 ` David Ahern
2012-12-13 3:51 ` Linus Torvalds
2012-12-13 4:31 ` David Ahern
2012-12-13 4:46 ` Linus Torvalds
2012-12-13 7:27 ` Ingo Molnar
2012-12-13 7:30 ` Ingo Molnar
2012-12-13 14:30 ` David Ahern
2012-12-13 14:38 ` David Ahern
2012-12-13 16:03 ` Linus Torvalds
2012-12-13 16:24 ` David Ahern
2012-12-13 16:33 ` Linus Torvalds
2012-12-13 16:59 ` Ingo Molnar
2012-12-13 17:10 ` Linus Torvalds
2012-12-13 17:31 ` Ingo Molnar
2012-12-17 4:43 ` David Ahern
2012-12-22 19:22 ` David Ahern
2012-12-23 0:00 ` Linus Torvalds
2012-12-13 17:02 ` Linus Torvalds
2012-12-13 17:30 ` David Ahern
2012-12-13 17:36 ` Ingo Molnar
2012-12-13 19:12 ` David Ahern
2012-12-13 7:48 ` [PATCH] Revert "perf: Require exclude_guest to use PEBS - kernel side enforcement" Ingo Molnar
[not found] ` <20121217102000.GE11016@redhat.com>
2012-12-22 19:30 ` [GIT PULL] perf changes for v3.8 David Ahern
2012-12-23 9:23 ` Gleb Natapov
2012-12-23 23:17 ` David Ahern
2012-12-24 10:36 ` Gleb Natapov
2012-12-13 17:04 ` [PATCH] x86: fix perf build with uclibc toolchains Florian Fainelli