* [RFC PATCH 00/17] perf: Detached events
@ 2017-09-05 13:30 Alexander Shishkin
  2017-09-05 13:30 ` [RFC PATCH 01/17] perf: Allow mmapping only user page Alexander Shishkin
                   ` (17 more replies)
  0 siblings, 18 replies; 34+ messages in thread
From: Alexander Shishkin @ 2017-09-05 13:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov,
	Borislav Petkov, rric, Alexander Shishkin

Hi,

I'm going to keep this short.

Objective: include perf data (specifically, AUX/Intel PT) in process core
dumps.

Obstacles and how this patchset deals with them:
(1) Need to be able to have perf events running without a consumer (perf
record) running in the background.
Detached events: a new flag to the perf syscall creates a 'detached' event,
which continues to exist after its file descriptor is released. Not all
detached events are per-thread AUX events: this also tries to take into
account the need for system-wide persistent events.

(2) Need to be able to kill those events, so they need to be accessible
after they are created.
Event files: detached events exist as files in tracefs (at the moment), which
can be opened/mmapped/read/removed.

(3) Ring buffer contents from these events need to end up in the core dump
file.
Injecting the perf ring buffer into the target task's address space.

(4) Inheritance will have to allocate ring buffers for such events for this
feature to be useful.
A parentless detached event is created (with a ring buffer) upon inheritance;
there is no output redirection, and each event has its own ring buffer.

(5) A side effect of (4) is that we can't use GFP_KERNEL pages for such ring
buffers, or else we'd have to fail inherit_event() (and, therefore, the user's
fork()) when they exhaust their mlock limit.
Using shmemfs-backed pages for such ring buffers and only pinning them while
the corresponding target task is running; at other times these pages can be
swapped out.

(6) Ring buffer memory accounting needs to take this new arrangement into
account: one user can pin at most NR_CPUS * buffer_size worth of memory at
any given point in time.
Only account the first such event and undo the accounting when the last
event is gone.

(7) We also need to supply all the things that the [PT] decoder normally
finds out via sysfs attributes (clock ratios, capabilities, etc.), so that
this information also finds its way into the core dump file.
A "PMU info" structure is appended to the user page.

I've also hacked the perf tool to support all this; those changes can be
found at [1]. I'm not posting the tooling patches, though, as they are
thoroughly ugly and proof-of-concept. In short, perf record creates detached
events with '--detached' and afterwards opens them via their path in tracefs;
a rough syscall-level sketch of that flow follows the link below.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/ash/linux.git/log/?h=perf-detached-shmem-wip
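
For illustration, a minimal, hypothetical sketch of the syscall-level flow
such a tool would use (PERF_FLAG_DETACHED and the detached_*_nr_pages
attribute fields are introduced in patches 05 and 06 of this series; the PMU
type and page counts are made up for the example):

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static int create_detached_event(pid_t pid, __u32 pmu_type)
{
        struct perf_event_attr attr = {
                .size = sizeof(attr),
                .type = pmu_type,               /* e.g. Intel PT's PMU type from sysfs */
                .exclude_kernel = 1,
                .detached_nr_pages = 8,         /* data area: 8 pages */
                .detached_aux_nr_pages = 64,    /* AUX area: 64 pages */
        };
        int fd = syscall(__NR_perf_event_open, &attr, pid, -1, -1,
                         PERF_FLAG_DETACHED);

        if (fd < 0)
                return -1;

        /*
         * The event now lives on without the file descriptor; it shows up
         * as a file under the "perf" directory in tracefs and is destroyed
         * by unlinking that file.
         */
        close(fd);
        return 0;
}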

Alexander Shishkin (17):
  perf: Allow mmapping only user page
  perf: Factor out mlock accounting
  tracefs: De-globalize instances' callbacks
  tracefs: Add ->unlink callback to tracefs_dir_ops
  perf: Introduce detached events
  perf: Add buffers to the detached events
  perf: Add pmu_info to user page
  perf: Allow inheritance for detached events
  perf: Use shmemfs pages for userspace-only per-thread detached events
  perf: Implement pinning and scheduling for SHMEM events
  perf: Implement mlock accounting for shmem ring buffers
  perf: Track pinned events per user
  perf: Re-inject shmem buffers after exec
  perf: Add ioctl(REATTACH) for detached events
  perf: Allow controlled non-root access to detached events
  perf/x86/intel/pt: Add PMU info
  perf/x86/intel/bts: Add PMU info

 arch/x86/events/intel/bts.c     |  20 +-
 arch/x86/events/intel/pt.c      |  23 +-
 arch/x86/events/intel/pt.h      |  11 +
 fs/tracefs/inode.c              |  71 +++-
 include/linux/perf_event.h      |  33 ++
 include/linux/sched/user.h      |   6 +
 include/linux/tracefs.h         |   3 +-
 include/uapi/linux/perf_event.h |  15 +
 kernel/events/core.c            | 526 +++++++++++++++++++++++------
 kernel/events/internal.h        |  27 +-
 kernel/events/ring_buffer.c     | 730 ++++++++++++++++++++++++++++++++++++--
 kernel/trace/trace.c            |   8 +-
 kernel/user.c                   |   1 +
 13 files changed, 1315 insertions(+), 159 deletions(-)

-- 
2.14.1


* [RFC PATCH 01/17] perf: Allow mmapping only user page
  2017-09-05 13:30 [RFC PATCH 00/17] perf: Detached events Alexander Shishkin
@ 2017-09-05 13:30 ` Alexander Shishkin
  2017-09-06 16:28   ` Borislav Petkov
  2017-09-05 13:30 ` [RFC PATCH 02/17] perf: Factor out mlock accounting Alexander Shishkin
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Alexander Shishkin @ 2017-09-05 13:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov,
	Borislav Petkov, rric, Alexander Shishkin

The 'user page' contains the offsets and sizes of the data and AUX areas of
the ring buffer. If a user wants to mmap a pre-existing buffer, they need to
know these in order to issue mmap()s with correct offsets and sizes.

This patch allows mmapping just the user page when the ring buffer already
exists.
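
A minimal userspace sketch of what this enables (hypothetical helper; assumes
a file descriptor for an event whose ring buffer already exists, e.g. a
detached event opened via its tracefs file):

#include <linux/perf_event.h>
#include <sys/mman.h>
#include <unistd.h>

static int read_buffer_geometry(int event_fd, struct perf_event_mmap_page *out)
{
        long psz = sysconf(_SC_PAGESIZE);
        struct perf_event_mmap_page *pg;

        /* mapping a single page means nr_pages == 0 in perf_mmap(), now accepted */
        pg = mmap(NULL, psz, PROT_READ, MAP_SHARED, event_fd, 0);
        if (pg == MAP_FAILED)
                return -1;

        *out = *pg;     /* data_offset/data_size, aux_offset/aux_size */
        munmap(pg, psz);
        return 0;
}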

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 kernel/events/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index cb7eaf0f91..9389e27cb0 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5366,7 +5366,7 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 again:
 	mutex_lock(&event->mmap_mutex);
 	if (event->rb) {
-		if (event->rb->nr_pages != nr_pages) {
+		if (nr_pages && event->rb->nr_pages != nr_pages) {
 			ret = -EINVAL;
 			goto unlock;
 		}
-- 
2.14.1


* [RFC PATCH 02/17] perf: Factor out mlock accounting
  2017-09-05 13:30 [RFC PATCH 00/17] perf: Detached events Alexander Shishkin
  2017-09-05 13:30 ` [RFC PATCH 01/17] perf: Allow mmapping only user page Alexander Shishkin
@ 2017-09-05 13:30 ` Alexander Shishkin
  2017-09-05 13:30 ` [RFC PATCH 03/17] tracefs: De-globalize instances' callbacks Alexander Shishkin
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Alexander Shishkin @ 2017-09-05 13:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov,
	Borislav Petkov, rric, Alexander Shishkin

This patch moves ring buffer memory accounting down the rb_alloc() path
so that its callers won't have to worry about it. This also serves the
additional purpose of slightly cleaning up perf_mmap().

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 kernel/events/core.c        |  67 +++-----------------
 kernel/events/internal.h    |   5 +-
 kernel/events/ring_buffer.c | 145 ++++++++++++++++++++++++++++++++++++++------
 3 files changed, 136 insertions(+), 81 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9389e27cb0..24099ed9e5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5122,6 +5122,8 @@ void ring_buffer_put(struct ring_buffer *rb)
 	if (!atomic_dec_and_test(&rb->refcount))
 		return;
 
+	ring_buffer_unaccount(rb, false);
+
 	WARN_ON_ONCE(!list_empty(&rb->event_list));
 
 	call_rcu(&rb->rcu_head, rb_free_rcu);
@@ -5156,9 +5158,6 @@ static void perf_mmap_close(struct vm_area_struct *vma)
 	struct perf_event *event = vma->vm_file->private_data;
 
 	struct ring_buffer *rb = ring_buffer_get(event);
-	struct user_struct *mmap_user = rb->mmap_user;
-	int mmap_locked = rb->mmap_locked;
-	unsigned long size = perf_data_size(rb);
 
 	if (event->pmu->event_unmapped)
 		event->pmu->event_unmapped(event, vma->vm_mm);
@@ -5178,11 +5177,7 @@ static void perf_mmap_close(struct vm_area_struct *vma)
 		 */
 		perf_pmu_output_stop(event);
 
-		/* now it's safe to free the pages */
-		atomic_long_sub(rb->aux_nr_pages, &mmap_user->locked_vm);
-		vma->vm_mm->pinned_vm -= rb->aux_mmap_locked;
-
-		/* this has to be the last one */
+		/* now it's safe to free the pages; ought to be the last one */
 		rb_free_aux(rb);
 		WARN_ON_ONCE(atomic_read(&rb->aux_refcount));
 
@@ -5243,19 +5238,6 @@ static void perf_mmap_close(struct vm_area_struct *vma)
 	}
 	rcu_read_unlock();
 
-	/*
-	 * It could be there's still a few 0-ref events on the list; they'll
-	 * get cleaned up by free_event() -- they'll also still have their
-	 * ref on the rb and will free it whenever they are done with it.
-	 *
-	 * Aside from that, this buffer is 'fully' detached and unmapped,
-	 * undo the VM accounting.
-	 */
-
-	atomic_long_sub((size >> PAGE_SHIFT) + 1, &mmap_user->locked_vm);
-	vma->vm_mm->pinned_vm -= mmap_locked;
-	free_uid(mmap_user);
-
 out_put:
 	ring_buffer_put(rb); /* could be last */
 }
@@ -5270,13 +5252,9 @@ static const struct vm_operations_struct perf_mmap_vmops = {
 static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct perf_event *event = file->private_data;
-	unsigned long user_locked, user_lock_limit;
-	struct user_struct *user = current_user();
-	unsigned long locked, lock_limit;
 	struct ring_buffer *rb = NULL;
 	unsigned long vma_size;
 	unsigned long nr_pages;
-	long user_extra = 0, extra = 0;
 	int ret = 0, flags = 0;
 
 	/*
@@ -5347,7 +5325,6 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 		}
 
 		atomic_set(&rb->aux_mmap_count, 1);
-		user_extra = nr_pages;
 
 		goto accounting;
 	}
@@ -5384,49 +5361,24 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 		goto unlock;
 	}
 
-	user_extra = nr_pages + 1;
-
 accounting:
-	user_lock_limit = sysctl_perf_event_mlock >> (PAGE_SHIFT - 10);
-
-	/*
-	 * Increase the limit linearly with more CPUs:
-	 */
-	user_lock_limit *= num_online_cpus();
-
-	user_locked = atomic_long_read(&user->locked_vm) + user_extra;
-
-	if (user_locked > user_lock_limit)
-		extra = user_locked - user_lock_limit;
-
-	lock_limit = rlimit(RLIMIT_MEMLOCK);
-	lock_limit >>= PAGE_SHIFT;
-	locked = vma->vm_mm->pinned_vm + extra;
-
-	if ((locked > lock_limit) && perf_paranoid_tracepoint_raw() &&
-		!capable(CAP_IPC_LOCK)) {
-		ret = -EPERM;
-		goto unlock;
-	}
-
 	WARN_ON(!rb && event->rb);
 
 	if (vma->vm_flags & VM_WRITE)
 		flags |= RING_BUFFER_WRITABLE;
 
 	if (!rb) {
-		rb = rb_alloc(nr_pages,
+		rb = rb_alloc(vma->vm_mm, nr_pages,
 			      event->attr.watermark ? event->attr.wakeup_watermark : 0,
 			      event->cpu, flags);
 
-		if (!rb) {
-			ret = -ENOMEM;
+		if (IS_ERR_OR_NULL(rb)) {
+			ret = PTR_ERR(rb);
+			rb = NULL;
 			goto unlock;
 		}
 
 		atomic_set(&rb->mmap_count, 1);
-		rb->mmap_user = get_current_user();
-		rb->mmap_locked = extra;
 
 		ring_buffer_attach(event, rb);
 
@@ -5435,15 +5387,10 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 	} else {
 		ret = rb_alloc_aux(rb, event, vma->vm_pgoff, nr_pages,
 				   event->attr.aux_watermark, flags);
-		if (!ret)
-			rb->aux_mmap_locked = extra;
 	}
 
 unlock:
 	if (!ret) {
-		atomic_long_add(user_extra, &user->locked_vm);
-		vma->vm_mm->pinned_vm += extra;
-
 		atomic_inc(&event->mmap_count);
 	} else if (rb) {
 		atomic_dec(&rb->mmap_count);
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 843e970473..3e603c45eb 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -36,6 +36,7 @@ struct ring_buffer {
 	atomic_t			mmap_count;
 	unsigned long			mmap_locked;
 	struct user_struct		*mmap_user;
+	struct mm_struct		*mmap_mapping;
 
 	/* AUX area */
 	long				aux_head;
@@ -56,6 +57,7 @@ struct ring_buffer {
 };
 
 extern void rb_free(struct ring_buffer *rb);
+extern void ring_buffer_unaccount(struct ring_buffer *rb, bool aux);
 
 static inline void rb_free_rcu(struct rcu_head *rcu_head)
 {
@@ -74,7 +76,8 @@ static inline void rb_toggle_paused(struct ring_buffer *rb, bool pause)
 }
 
 extern struct ring_buffer *
-rb_alloc(int nr_pages, long watermark, int cpu, int flags);
+rb_alloc(struct mm_struct *mm, int nr_pages, long watermark, int cpu,
+	 int flags);
 extern void perf_event_wakeup(struct perf_event *event);
 extern int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
 			pgoff_t pgoff, int nr_pages, long watermark, int flags);
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index af71a84e12..d36f169cae 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -505,6 +505,88 @@ void *perf_get_aux(struct perf_output_handle *handle)
 	return handle->rb->aux_priv;
 }
 
+/*
+ * Check if the current user can afford @nr_pages, considering the
+ * perf_event_mlock sysctl and their mlock limit. If the former is exceeded,
+ * pin the remainder on their mm, if the latter is not sufficient either,
+ * error out. Otherwise, keep track of the pages used in the ring_buffer so
+ * that the accounting can be undone when the pages are freed.
+ */
+static int ring_buffer_account(struct ring_buffer *rb, struct mm_struct *mm,
+			       unsigned long nr_pages, bool aux)
+{
+	unsigned long total, limit, pinned;
+
+	if (!mm)
+		mm = rb->mmap_mapping;
+
+	rb->mmap_user = current_user();
+
+	limit = sysctl_perf_event_mlock >> (PAGE_SHIFT - 10);
+
+	/*
+	 * Increase the limit linearly with more CPUs:
+	 */
+	limit *= num_online_cpus();
+
+	total = atomic_long_read(&rb->mmap_user->locked_vm) + nr_pages;
+
+	pinned = 0;
+	if (total > limit) {
+		/*
+		 * Everything that's over the sysctl_perf_event_mlock
+		 * limit needs to be accounted to the consumer's mm.
+		 */
+		if (!mm)
+			return -EPERM;
+
+		pinned = total - limit;
+
+		limit = rlimit(RLIMIT_MEMLOCK);
+		limit >>= PAGE_SHIFT;
+		total = mm->pinned_vm + pinned;
+
+		if ((total > limit) && perf_paranoid_tracepoint_raw() &&
+		    !capable(CAP_IPC_LOCK)) {
+			return -EPERM;
+		}
+
+		if (aux)
+			rb->aux_mmap_locked = pinned;
+		else
+			rb->mmap_locked = pinned;
+
+		mm->pinned_vm += pinned;
+	}
+
+	if (!rb->mmap_mapping)
+		rb->mmap_mapping = mm;
+
+	/* account for user page */
+	if (!aux)
+		nr_pages++;
+
+	rb->mmap_user = get_current_user();
+	atomic_long_add(nr_pages, &rb->mmap_user->locked_vm);
+
+	return 0;
+}
+
+/*
+ * Undo the mlock pages accounting done in ring_buffer_account().
+ */
+void ring_buffer_unaccount(struct ring_buffer *rb, bool aux)
+{
+	unsigned long nr_pages = aux ? rb->aux_nr_pages : rb->nr_pages + 1;
+	unsigned long pinned = aux ? rb->aux_mmap_locked : rb->mmap_locked;
+
+	atomic_long_sub(nr_pages, &rb->mmap_user->locked_vm);
+	if (rb->mmap_mapping)
+		rb->mmap_mapping->pinned_vm -= pinned;
+
+	free_uid(rb->mmap_user);
+}
+
 #define PERF_AUX_GFP	(GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY)
 
 static struct page *rb_alloc_aux_page(int node, int order)
@@ -574,11 +656,16 @@ int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
 {
 	bool overwrite = !(flags & RING_BUFFER_WRITABLE);
 	int node = (event->cpu == -1) ? -1 : cpu_to_node(event->cpu);
-	int ret = -ENOMEM, max_order = 0;
+	int ret, max_order = 0;
 
 	if (!has_aux(event))
 		return -EOPNOTSUPP;
 
+	ret = ring_buffer_account(rb, NULL, nr_pages, true);
+	if (ret)
+		return ret;
+
+	ret = -ENOMEM;
 	if (event->pmu->capabilities & PERF_PMU_CAP_AUX_NO_SG) {
 		/*
 		 * We need to start with the max_order that fits in nr_pages,
@@ -593,7 +680,7 @@ int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
 		if ((event->pmu->capabilities & PERF_PMU_CAP_AUX_SW_DOUBLEBUF) &&
 		    !overwrite) {
 			if (!max_order)
-				return -EINVAL;
+				goto out;
 
 			max_order--;
 		}
@@ -654,18 +741,23 @@ int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
 		rb->aux_watermark = nr_pages << (PAGE_SHIFT - 1);
 
 out:
-	if (!ret)
+	if (!ret) {
 		rb->aux_pgoff = pgoff;
-	else
+	} else {
+		ring_buffer_unaccount(rb, true);
 		__rb_free_aux(rb);
+	}
 
 	return ret;
 }
 
 void rb_free_aux(struct ring_buffer *rb)
 {
-	if (atomic_dec_and_test(&rb->aux_refcount))
+	if (atomic_dec_and_test(&rb->aux_refcount)) {
+		ring_buffer_unaccount(rb, true);
+
 		__rb_free_aux(rb);
+	}
 }
 
 #ifndef CONFIG_PERF_USE_VMALLOC
@@ -699,22 +791,25 @@ static void *perf_mmap_alloc_page(int cpu)
 	return page_address(page);
 }
 
-struct ring_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
+struct ring_buffer *rb_alloc(struct mm_struct *mm, int nr_pages, long watermark,
+			     int cpu, int flags)
 {
+	unsigned long size = offsetof(struct ring_buffer, data_pages[nr_pages]);
 	struct ring_buffer *rb;
-	unsigned long size;
-	int i;
-
-	size = sizeof(struct ring_buffer);
-	size += nr_pages * sizeof(void *);
+	int i, ret = -ENOMEM;
 
 	rb = kzalloc(size, GFP_KERNEL);
 	if (!rb)
 		goto fail;
 
+	ret = ring_buffer_account(rb, mm, nr_pages, false);
+	if (ret)
+		goto fail_free_rb;
+
+	ret = -ENOMEM;
 	rb->user_page = perf_mmap_alloc_page(cpu);
 	if (!rb->user_page)
-		goto fail_user_page;
+		goto fail_unaccount;
 
 	for (i = 0; i < nr_pages; i++) {
 		rb->data_pages[i] = perf_mmap_alloc_page(cpu);
@@ -734,11 +829,14 @@ struct ring_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
 
 	free_page((unsigned long)rb->user_page);
 
-fail_user_page:
+fail_unaccount:
+	ring_buffer_unaccount(rb, false);
+
+fail_free_rb:
 	kfree(rb);
 
 fail:
-	return NULL;
+	return ERR_PTR(ret);
 }
 
 static void perf_mmap_free_page(unsigned long addr)
@@ -805,19 +903,23 @@ void rb_free(struct ring_buffer *rb)
 	schedule_work(&rb->work);
 }
 
-struct ring_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
+struct ring_buffer *rb_alloc(struct mm_struct *mm, int nr_pages, long watermark,
+			     int cpu, int flags)
 {
+	unsigned long size = offsetof(struct ring_buffer, data_pages[1]);
 	struct ring_buffer *rb;
-	unsigned long size;
 	void *all_buf;
-
-	size = sizeof(struct ring_buffer);
-	size += sizeof(void *);
+	int ret = -ENOMEM;
 
 	rb = kzalloc(size, GFP_KERNEL);
 	if (!rb)
 		goto fail;
 
+	ret = ring_buffer_account(rb, mm, nr_pages, false);
+	if (ret)
+		goto fail_free;
+
+	ret = -ENOMEM;
 	INIT_WORK(&rb->work, rb_free_work);
 
 	all_buf = vmalloc_user((nr_pages + 1) * PAGE_SIZE);
@@ -836,10 +938,13 @@ struct ring_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
 	return rb;
 
 fail_all_buf:
+	ring_buffer_unaccount(rb, false);
+
+fail_free:
 	kfree(rb);
 
 fail:
-	return NULL;
+	return ERR_PTR(ret);
 }
 
 #endif
-- 
2.14.1


* [RFC PATCH 03/17] tracefs: De-globalize instances' callbacks
  2017-09-05 13:30 [RFC PATCH 00/17] perf: Detached events Alexander Shishkin
  2017-09-05 13:30 ` [RFC PATCH 01/17] perf: Allow mmapping only user page Alexander Shishkin
  2017-09-05 13:30 ` [RFC PATCH 02/17] perf: Factor out mlock accounting Alexander Shishkin
@ 2017-09-05 13:30 ` Alexander Shishkin
  2018-01-24 18:54   ` Steven Rostedt
  2017-09-05 13:30 ` [RFC PATCH 04/17] tracefs: Add ->unlink callback to tracefs_dir_ops Alexander Shishkin
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Alexander Shishkin @ 2017-09-05 13:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov,
	Borislav Petkov, rric, Alexander Shishkin, Steven Rostedt

Currently, tracefs has exactly one special 'instances' subdirectory, for
which the caller can install their own .mkdir/.rmdir callbacks to handle the
user's mkdir/rmdir inside that directory. Tracefs allows only one set of
these callbacks (the global tracefs_ops).

This patch de-globalizes tracefs_dir_ops so that it's possible to have
multiple such subdirectories.
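
To illustrate what this makes possible, a hypothetical second user of the
instances mechanism (names and callbacks below are made up); with the dir_ops
stored in the dentry's ->d_fsdata, this no longer trips the single-instance
WARN_ON:

#include <linux/errno.h>
#include <linux/init.h>
#include <linux/tracefs.h>

static int demo_mkdir(const char *name)
{
        /* set up per-instance state for @name */
        return 0;
}

static int demo_rmdir(const char *name)
{
        /* tear down the instance named @name */
        return 0;
}

static int __init demo_tracefs_init(void)
{
        struct dentry *dir;

        dir = tracefs_create_instance_dir("demo_instances", NULL,
                                          demo_mkdir, demo_rmdir);
        return dir ? 0 : -ENOMEM;
}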

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
 fs/tracefs/inode.c | 35 +++++++++++++++++++++++++----------
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index bea8ad876b..b14f03a655 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -50,10 +50,10 @@ static const struct file_operations tracefs_file_operations = {
 	.llseek =	noop_llseek,
 };
 
-static struct tracefs_dir_ops {
+struct tracefs_dir_ops {
 	int (*mkdir)(const char *name);
 	int (*rmdir)(const char *name);
-} tracefs_ops;
+};
 
 static char *get_dname(struct dentry *dentry)
 {
@@ -72,6 +72,7 @@ static char *get_dname(struct dentry *dentry)
 
 static int tracefs_syscall_mkdir(struct inode *inode, struct dentry *dentry, umode_t mode)
 {
+	struct tracefs_dir_ops *tracefs_ops = dentry->d_parent->d_fsdata;
 	char *name;
 	int ret;
 
@@ -85,7 +86,7 @@ static int tracefs_syscall_mkdir(struct inode *inode, struct dentry *dentry, umo
 	 * mkdir routine to handle races.
 	 */
 	inode_unlock(inode);
-	ret = tracefs_ops.mkdir(name);
+	ret = tracefs_ops->mkdir(name);
 	inode_lock(inode);
 
 	kfree(name);
@@ -95,6 +96,7 @@ static int tracefs_syscall_mkdir(struct inode *inode, struct dentry *dentry, umo
 
 static int tracefs_syscall_rmdir(struct inode *inode, struct dentry *dentry)
 {
+	struct tracefs_dir_ops *tracefs_ops = dentry->d_fsdata;
 	char *name;
 	int ret;
 
@@ -112,7 +114,7 @@ static int tracefs_syscall_rmdir(struct inode *inode, struct dentry *dentry)
 	inode_unlock(inode);
 	inode_unlock(dentry->d_inode);
 
-	ret = tracefs_ops.rmdir(name);
+	ret = tracefs_ops->rmdir(name);
 
 	inode_lock_nested(inode, I_MUTEX_PARENT);
 	inode_lock(dentry->d_inode);
@@ -342,6 +344,9 @@ static struct dentry *start_creating(const char *name, struct dentry *parent)
 	if (IS_ERR(dentry)) {
 		inode_unlock(parent->d_inode);
 		simple_release_fs(&tracefs_mount, &tracefs_mount_count);
+	} else {
+		/* propagate dir ops */
+		dentry->d_fsdata = parent->d_fsdata;
 	}
 
 	return dentry;
@@ -482,18 +487,25 @@ struct dentry *tracefs_create_instance_dir(const char *name, struct dentry *pare
 					  int (*mkdir)(const char *name),
 					  int (*rmdir)(const char *name))
 {
+	struct tracefs_dir_ops *tracefs_ops = parent ? parent->d_fsdata : NULL;
 	struct dentry *dentry;
 
-	/* Only allow one instance of the instances directory. */
-	if (WARN_ON(tracefs_ops.mkdir || tracefs_ops.rmdir))
+	if (WARN_ON(tracefs_ops))
+		return NULL;
+
+	tracefs_ops = kzalloc(sizeof(*tracefs_ops), GFP_KERNEL);
+	if (!tracefs_ops)
 		return NULL;
 
 	dentry = __create_dir(name, parent, &tracefs_dir_inode_operations);
-	if (!dentry)
+	if (!dentry) {
+		kfree(tracefs_ops);
 		return NULL;
+	}
 
-	tracefs_ops.mkdir = mkdir;
-	tracefs_ops.rmdir = rmdir;
+	tracefs_ops->mkdir = mkdir;
+	tracefs_ops->rmdir = rmdir;
+	dentry->d_fsdata = tracefs_ops;
 
 	return dentry;
 }
@@ -513,8 +525,11 @@ static int __tracefs_remove(struct dentry *dentry, struct dentry *parent)
 				simple_unlink(parent->d_inode, dentry);
 				break;
 			}
-			if (!ret)
+			if (!ret) {
 				d_delete(dentry);
+				if (dentry->d_fsdata != parent->d_fsdata)
+					kfree(dentry->d_fsdata);
+			}
 			dput(dentry);
 		}
 	}
-- 
2.14.1


* [RFC PATCH 04/17] tracefs: Add ->unlink callback to tracefs_dir_ops
  2017-09-05 13:30 [RFC PATCH 00/17] perf: Detached events Alexander Shishkin
                   ` (2 preceding siblings ...)
  2017-09-05 13:30 ` [RFC PATCH 03/17] tracefs: De-globalize instances' callbacks Alexander Shishkin
@ 2017-09-05 13:30 ` Alexander Shishkin
  2017-09-05 13:30 ` [RFC PATCH 05/17] perf: Introduce detached events Alexander Shishkin
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Alexander Shishkin @ 2017-09-05 13:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov,
	Borislav Petkov, rric, Alexander Shishkin, Steven Rostedt

In addition to mkdir and rmdir, also allow the unlink operation within the
'instances' directory if such a callback is defined.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
---
 fs/tracefs/inode.c      | 36 +++++++++++++++++++++++++++++++++++-
 include/linux/tracefs.h |  3 ++-
 kernel/trace/trace.c    |  8 +++++++-
 3 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index b14f03a655..fba5a0ce07 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -53,6 +53,7 @@ static const struct file_operations tracefs_file_operations = {
 struct tracefs_dir_ops {
 	int (*mkdir)(const char *name);
 	int (*rmdir)(const char *name);
+	int (*unlink)(const char *name);
 };
 
 static char *get_dname(struct dentry *dentry)
@@ -124,10 +125,41 @@ static int tracefs_syscall_rmdir(struct inode *inode, struct dentry *dentry)
 	return ret;
 }
 
+static int tracefs_syscall_unlink(struct inode *inode, struct dentry *dentry)
+{
+	struct tracefs_dir_ops *tracefs_ops = dentry->d_fsdata;
+	char *name;
+	int ret;
+
+	name = get_dname(dentry);
+	if (!name)
+		return -ENOMEM;
+
+	/*
+	 * The unlink call can call the generic functions that create
+	 * the files within the tracefs system. It is up to the individual
+	 * unlink routine to handle races.
+	 * This time we need to unlock not only the parent (inode) but
+	 * also the file that is being deleted.
+	 */
+	inode_unlock(inode);
+	inode_unlock(dentry->d_inode);
+
+	ret = tracefs_ops->unlink(name);
+
+	inode_lock_nested(inode, I_MUTEX_PARENT);
+	inode_lock(dentry->d_inode);
+
+	kfree(name);
+
+	return ret;
+}
+
 static const struct inode_operations tracefs_dir_inode_operations = {
 	.lookup		= simple_lookup,
 	.mkdir		= tracefs_syscall_mkdir,
 	.rmdir		= tracefs_syscall_rmdir,
+	.unlink		= tracefs_syscall_unlink,
 };
 
 static struct inode *tracefs_get_inode(struct super_block *sb)
@@ -485,7 +517,8 @@ struct dentry *tracefs_create_dir(const char *name, struct dentry *parent)
  */
 struct dentry *tracefs_create_instance_dir(const char *name, struct dentry *parent,
 					  int (*mkdir)(const char *name),
-					  int (*rmdir)(const char *name))
+					  int (*rmdir)(const char *name),
+					  int (*unlink)(const char *name))
 {
 	struct tracefs_dir_ops *tracefs_ops = parent ? parent->d_fsdata : NULL;
 	struct dentry *dentry;
@@ -505,6 +538,7 @@ struct dentry *tracefs_create_instance_dir(const char *name, struct dentry *pare
 
 	tracefs_ops->mkdir = mkdir;
 	tracefs_ops->rmdir = rmdir;
+	tracefs_ops->unlink = unlink;
 	dentry->d_fsdata = tracefs_ops;
 
 	return dentry;
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index 5b727a17be..e5bd1f01b6 100644
--- a/include/linux/tracefs.h
+++ b/include/linux/tracefs.h
@@ -36,7 +36,8 @@ void tracefs_remove_recursive(struct dentry *dentry);
 
 struct dentry *tracefs_create_instance_dir(const char *name, struct dentry *parent,
 					   int (*mkdir)(const char *name),
-					   int (*rmdir)(const char *name));
+					   int (*rmdir)(const char *name),
+					   int (*unlink)(const char *name));
 
 bool tracefs_initialized(void);
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 44004d8aa3..b9abd2029e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -7792,11 +7792,17 @@ static int instance_rmdir(const char *name)
 	return ret;
 }
 
+static int instance_unlink(const char *name)
+{
+	return -EACCES;
+}
+
 static __init void create_trace_instances(struct dentry *d_tracer)
 {
 	trace_instance_dir = tracefs_create_instance_dir("instances", d_tracer,
 							 instance_mkdir,
-							 instance_rmdir);
+							 instance_rmdir,
+							 instance_unlink);
 	if (WARN_ON(!trace_instance_dir))
 		return;
 }
-- 
2.14.1


* [RFC PATCH 05/17] perf: Introduce detached events
  2017-09-05 13:30 [RFC PATCH 00/17] perf: Detached events Alexander Shishkin
                   ` (3 preceding siblings ...)
  2017-09-05 13:30 ` [RFC PATCH 04/17] tracefs: Add ->unlink callback to tracefs_dir_ops Alexander Shishkin
@ 2017-09-05 13:30 ` Alexander Shishkin
  2017-10-03 14:34   ` Peter Zijlstra
  2017-09-05 13:30 ` [RFC PATCH 06/17] perf: Add buffers to the " Alexander Shishkin
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Alexander Shishkin @ 2017-09-05 13:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov,
	Borislav Petkov, rric, Alexander Shishkin

There are use cases where it is desirable to have perf events without the
userspace tool running in the background to keep them alive, and instead to
only collect the data when it is needed, for example when an MCE event is
triggered.

This patch adds a new flag to the perf_event_open() syscall that allows
creating such events. Once created, the file descriptor can be closed
and the event continues to exist on its own. To allow access to this
event, a file is created in tracefs, which the user can open.

Finally, when it is no longer needed, it can be destroyed by unlinking
the file.
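
To illustrate this lifecycle from userspace (the path below is a placeholder:
the actual filename is a hash-based name such as task:<hash>.event under the
perf directory in tracefs):

#include <fcntl.h>
#include <unistd.h>

static void reattach_and_destroy(const char *tracefs_path)
{
        /* re-open a previously created detached event via its file */
        int fd = open(tracefs_path, O_RDONLY);

        if (fd >= 0) {
                /* read()/mmap() the event as with a regular perf fd */
                close(fd);
        }

        /* once it is no longer needed, unlinking the file destroys the event */
        unlink(tracefs_path);
}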

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 include/linux/perf_event.h      |   4 ++
 include/uapi/linux/perf_event.h |   1 +
 kernel/events/core.c            | 138 ++++++++++++++++++++++++++++++++++++++--
 kernel/events/internal.h        |   6 ++
 4 files changed, 142 insertions(+), 7 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 82b2e3fef9..a07982f48d 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -537,6 +537,7 @@ typedef void (*perf_overflow_handler_t)(struct perf_event *,
 #define PERF_EV_CAP_SOFTWARE		BIT(0)
 #define PERF_EV_CAP_READ_ACTIVE_PKG	BIT(1)
 
+#define PERF_TRACEFS_HASH_BITS		32
 #define SWEVENT_HLIST_BITS		8
 #define SWEVENT_HLIST_SIZE		(1 << SWEVENT_HLIST_BITS)
 
@@ -550,6 +551,7 @@ struct swevent_hlist {
 #define PERF_ATTACH_TASK	0x04
 #define PERF_ATTACH_TASK_DATA	0x08
 #define PERF_ATTACH_ITRACE	0x10
+#define PERF_ATTACH_DETACHED	0x20
 
 struct perf_cgroup;
 struct ring_buffer;
@@ -672,6 +674,8 @@ struct perf_event {
 	struct list_head		owner_entry;
 	struct task_struct		*owner;
 
+	struct dentry			*dent;
+
 	/* mmap bits */
 	struct mutex			mmap_mutex;
 	atomic_t			mmap_count;
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 140ae638cf..89355584fa 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -946,6 +946,7 @@ enum perf_callchain_context {
 #define PERF_FLAG_FD_OUTPUT		(1UL << 1)
 #define PERF_FLAG_PID_CGROUP		(1UL << 2) /* pid=cgroup id, per-cpu mode only */
 #define PERF_FLAG_FD_CLOEXEC		(1UL << 3) /* O_CLOEXEC */
+#define PERF_FLAG_DETACHED		(1UL << 4) /* event w/o owner */
 
 #if defined(__LITTLE_ENDIAN_BITFIELD)
 union perf_mem_data_src {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 24099ed9e5..320070410d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -50,11 +50,14 @@
 #include <linux/sched/mm.h>
 #include <linux/proc_ns.h>
 #include <linux/mount.h>
+#include <linux/tracefs.h>
 
 #include "internal.h"
 
 #include <asm/irq_regs.h>
 
+static struct dentry *perf_tracefs_dir;
+
 typedef int (*remote_function_f)(void *);
 
 struct remote_function_call {
@@ -346,7 +349,8 @@ static void event_function_local(struct perf_event *event, event_f func, void *d
 #define PERF_FLAG_ALL (PERF_FLAG_FD_NO_GROUP |\
 		       PERF_FLAG_FD_OUTPUT  |\
 		       PERF_FLAG_PID_CGROUP |\
-		       PERF_FLAG_FD_CLOEXEC)
+		       PERF_FLAG_FD_CLOEXEC |\
+		       PERF_FLAG_DETACHED)
 
 /*
  * branch priv levels that need permission checks
@@ -4177,6 +4181,12 @@ static void _free_event(struct perf_event *event)
 
 	unaccount_event(event);
 
+	if (event->dent) {
+		tracefs_remove(event->dent);
+
+		event->attach_state &= ~PERF_ATTACH_DETACHED;
+	}
+
 	if (event->rb) {
 		/*
 		 * Can happen when we close an event with re-directed output.
@@ -5427,8 +5437,27 @@ static int perf_fasync(int fd, struct file *filp, int on)
 	return 0;
 }
 
+static int perf_open(struct inode *inode, struct file *file)
+{
+	struct perf_event *event = inode->i_private;
+	int ret;
+
+	if (WARN_ON_ONCE(!event))
+		return -EINVAL;
+
+	if (!atomic_long_inc_not_zero(&event->refcount))
+		return -ENOENT;
+
+	ret = simple_open(inode, file);
+	if (ret)
+		put_event(event);
+
+	return ret;
+}
+
 static const struct file_operations perf_fops = {
 	.llseek			= no_llseek,
+	.open			= perf_open,
 	.release		= perf_release,
 	.read			= perf_read,
 	.poll			= perf_poll,
@@ -9387,6 +9416,27 @@ static void account_event(struct perf_event *event)
 	account_pmu_sb_event(event);
 }
 
+static int perf_event_detach(struct perf_event *event, struct task_struct *task,
+			     struct mm_struct *mm)
+{
+	char *filename;
+
+	filename = kasprintf(GFP_KERNEL, "%s:%x.event",
+			     task ? "task" : "cpu",
+			     hash_64((u64)event, PERF_TRACEFS_HASH_BITS));
+	if (!filename)
+		return -ENOMEM;
+
+	event->dent = tracefs_create_file(filename, 0600,
+					  perf_tracefs_dir,
+					  event, &perf_fops);
+	kfree(filename);
+
+	if (!event->dent)
+		return -ENOMEM;
+
+	return 0;
+}
 /*
  * Allocate and initialize a event structure
  */
@@ -9716,6 +9766,10 @@ perf_event_set_output(struct perf_event *event, struct perf_event *output_event)
 	struct ring_buffer *rb = NULL;
 	int ret = -EINVAL;
 
+	if ((event->attach_state | output_event->attach_state) &
+	    PERF_ATTACH_DETACHED)
+		goto out;
+
 	if (!output_event)
 		goto set;
 
@@ -9876,7 +9930,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	struct task_struct *task = NULL;
 	struct pmu *pmu;
 	int event_fd;
-	int move_group = 0;
+	int move_group = 0, detached = 0;
 	int err;
 	int f_flags = O_RDWR;
 	int cgroup_fd = -1;
@@ -9956,6 +10010,16 @@ SYSCALL_DEFINE5(perf_event_open,
 		goto err_task;
 	}
 
+	if (flags & PERF_FLAG_DETACHED) {
+		err = -EINVAL;
+
+		/* output redirection and grouping are not allowed */
+		if (output_event || (group_fd != -1))
+			goto err_task;
+
+		detached = 1;
+	}
+
 	if (task) {
 		err = mutex_lock_interruptible(&task->signal->cred_guard_mutex);
 		if (err)
@@ -10104,6 +10168,16 @@ SYSCALL_DEFINE5(perf_event_open,
 		goto err_context;
 	}
 
+	if (detached) {
+		err = perf_event_detach(event, task, NULL);
+		if (err)
+			goto err_context;
+
+		atomic_long_inc(&event->refcount);
+
+		event_file->private_data = event;
+	}
+
 	if (move_group) {
 		gctx = __perf_event_ctx_lock_double(group_leader, ctx);
 
@@ -10236,7 +10310,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	perf_event__header_size(event);
 	perf_event__id_header_size(event);
 
-	event->owner = current;
+	event->owner = detached ? TASK_TOMBSTONE : current;
 
 	perf_install_in_context(ctx, event, event->cpu);
 	perf_unpin_context(ctx);
@@ -10250,9 +10324,11 @@ SYSCALL_DEFINE5(perf_event_open,
 		put_task_struct(task);
 	}
 
-	mutex_lock(&current->perf_event_mutex);
-	list_add_tail(&event->owner_entry, &current->perf_event_list);
-	mutex_unlock(&current->perf_event_mutex);
+	if (!detached) {
+		mutex_lock(&current->perf_event_mutex);
+		list_add_tail(&event->owner_entry, &current->perf_event_list);
+		mutex_unlock(&current->perf_event_mutex);
+	}
 
 	/*
 	 * Drop the reference on the group_event after placing the
@@ -10492,7 +10568,16 @@ perf_event_exit_event(struct perf_event *child_event,
 	 * Parent events are governed by their filedesc, retain them.
 	 */
 	if (!parent_event) {
-		perf_event_wakeup(child_event);
+		/*
+		 * unless they are DETACHED, in which case we still have
+		 * to dispose of them; they have an extra reference with
+		 * the DETACHED state and a tracefs file
+		 */
+		if (is_detached_event(child_event))
+			put_event(child_event); /* can be last */
+		else
+			perf_event_wakeup(child_event);
+
 		return;
 	}
 	/*
@@ -11205,6 +11290,45 @@ static int __init perf_event_sysfs_init(void)
 }
 device_initcall(perf_event_sysfs_init);
 
+static int perf_instance_nop(const char *name)
+{
+	return -EACCES;
+}
+
+static int perf_instance_unlink(const char *name)
+{
+	struct perf_event *event;
+	struct dentry *dent;
+
+	dent = lookup_one_len_unlocked(name, perf_tracefs_dir, strlen(name));
+	if (!dent)
+		return -ENOENT;
+
+	event = dent->d_inode->i_private;
+	if (!event)
+		return -EINVAL;
+
+	if (!(event->attach_state & PERF_ATTACH_CONTEXT))
+		return -EBUSY;
+
+	perf_event_release_kernel(event);
+
+	return 0;
+}
+
+static int __init perf_event_tracefs_init(void)
+{
+	perf_tracefs_dir = tracefs_create_instance_dir("perf", NULL,
+						       perf_instance_nop,
+						       perf_instance_nop,
+						       perf_instance_unlink);
+	if (!perf_tracefs_dir)
+		return -ENOMEM;
+
+	return 0;
+}
+device_initcall(perf_event_tracefs_init);
+
 #ifdef CONFIG_CGROUP_PERF
 static struct cgroup_subsys_state *
 perf_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 3e603c45eb..59136a0e98 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -126,6 +126,12 @@ static inline unsigned long perf_aux_size(struct ring_buffer *rb)
 	return rb->aux_nr_pages << PAGE_SHIFT;
 }
 
+static inline bool is_detached_event(struct perf_event *event)
+{
+	lockdep_assert_held(&event->ctx->mutex);
+	return !!(event->attach_state & PERF_ATTACH_DETACHED);
+}
+
 #define __DEFINE_OUTPUT_COPY_BODY(advance_buf, memcpy_func, ...)	\
 {									\
 	unsigned long size, written;					\
-- 
2.14.1


* [RFC PATCH 06/17] perf: Add buffers to the detached events
  2017-09-05 13:30 [RFC PATCH 00/17] perf: Detached events Alexander Shishkin
                   ` (4 preceding siblings ...)
  2017-09-05 13:30 ` [RFC PATCH 05/17] perf: Introduce detached events Alexander Shishkin
@ 2017-09-05 13:30 ` Alexander Shishkin
  2017-10-03 14:36   ` Peter Zijlstra
  2017-09-05 13:30 ` [RFC PATCH 07/17] perf: Add pmu_info to user page Alexander Shishkin
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Alexander Shishkin @ 2017-09-05 13:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov,
	Borislav Petkov, rric, Alexander Shishkin

Detached events make much more sense with ring buffers, which the user can
mmap and read a snapshot of. Unlike normal perf events, these ring buffers
are allocated by the perf syscall; the sizes of the data and AUX areas are
specified in the event attribute.

These ring buffers can be mmapped read-only.
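
A rough userspace sketch of the read side (assumptions: the fd comes from
opening the event's tracefs file; the offsets and sizes are taken from the
user page, which can be mapped on its own as of patch 01):

#include <linux/perf_event.h>
#include <sys/mman.h>
#include <unistd.h>

static int map_detached_buffers(int fd, void **data, void **aux)
{
        long psz = sysconf(_SC_PAGESIZE);
        struct perf_event_mmap_page *pg;

        pg = mmap(NULL, psz, PROT_READ, MAP_SHARED, fd, 0);
        if (pg == MAP_FAILED)
                return -1;

        /* user page + data pages, mapped read-only from offset 0 */
        *data = mmap(NULL, psz + pg->data_size, PROT_READ, MAP_SHARED, fd, 0);

        /* AUX area at the offset/size advertised in the user page */
        *aux = mmap(NULL, pg->aux_size, PROT_READ, MAP_SHARED, fd,
                    pg->aux_offset);

        munmap(pg, psz);
        return (*data == MAP_FAILED || *aux == MAP_FAILED) ? -1 : 0;
}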

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 include/uapi/linux/perf_event.h |  3 +++
 kernel/events/core.c            | 19 ++++++++++++++++
 kernel/events/internal.h        |  2 ++
 kernel/events/ring_buffer.c     | 50 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 74 insertions(+)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 89355584fa..3d64d9ea80 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -297,6 +297,7 @@ enum perf_event_read_format {
 					/* add: sample_stack_user */
 #define PERF_ATTR_SIZE_VER4	104	/* add: sample_regs_intr */
 #define PERF_ATTR_SIZE_VER5	112	/* add: aux_watermark */
+#define PERF_ATTR_SIZE_VER6	120	/* add: detached_* */
 
 /*
  * Hardware event_id to monitor via a performance monitoring event:
@@ -415,6 +416,8 @@ struct perf_event_attr {
 	__u32	aux_watermark;
 	__u16	sample_max_stack;
 	__u16	__reserved_2;	/* align to __u64 */
+	__u32	detached_nr_pages;
+	__u32	detached_aux_nr_pages;
 };
 
 #define perf_flags(attr)	(*(&(attr)->read_format + 1))
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 320070410d..fef1f97974 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4185,6 +4185,9 @@ static void _free_event(struct perf_event *event)
 		tracefs_remove(event->dent);
 
 		event->attach_state &= ~PERF_ATTACH_DETACHED;
+
+		ring_buffer_unaccount(event->rb, false);
+		rb_free_detached(event->rb, event);
 	}
 
 	if (event->rb) {
@@ -5012,6 +5015,10 @@ static int perf_mmap_fault(struct vm_fault *vmf)
 	int ret = VM_FAULT_SIGBUS;
 
 	if (vmf->flags & FAULT_FLAG_MKWRITE) {
+		/* detached events R/O only */
+		if (event->dent)
+			return ret;
+
 		if (vmf->pgoff == 0)
 			ret = 0;
 		return ret;
@@ -9420,6 +9427,7 @@ static int perf_event_detach(struct perf_event *event, struct task_struct *task,
 			     struct mm_struct *mm)
 {
 	char *filename;
+	int err;
 
 	filename = kasprintf(GFP_KERNEL, "%s:%x.event",
 			     task ? "task" : "cpu",
@@ -9435,6 +9443,13 @@ static int perf_event_detach(struct perf_event *event, struct task_struct *task,
 	if (!event->dent)
 		return -ENOMEM;
 
+	err = rb_alloc_detached(event);
+	if (err) {
+		tracefs_remove(event->dent);
+		event->dent = NULL;
+		return err;
+	}
+
 	return 0;
 }
 /*
@@ -10017,6 +10032,9 @@ SYSCALL_DEFINE5(perf_event_open,
 		if (output_event || (group_fd != -1))
 			goto err_task;
 
+		if (!attr.detached_nr_pages)
+			goto err_task;
+
 		detached = 1;
 	}
 
@@ -10174,6 +10192,7 @@ SYSCALL_DEFINE5(perf_event_open,
 			goto err_context;
 
 		atomic_long_inc(&event->refcount);
+		atomic_inc(&event->mmap_count);
 
 		event_file->private_data = event;
 	}
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 59136a0e98..8e267d8faa 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -82,6 +82,8 @@ extern void perf_event_wakeup(struct perf_event *event);
 extern int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
 			pgoff_t pgoff, int nr_pages, long watermark, int flags);
 extern void rb_free_aux(struct ring_buffer *rb);
+extern int rb_alloc_detached(struct perf_event *event);
+extern void rb_free_detached(struct ring_buffer *rb, struct perf_event *event);
 extern struct ring_buffer *ring_buffer_get(struct perf_event *event);
 extern void ring_buffer_put(struct ring_buffer *rb);
 
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index d36f169cae..b4d7841025 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -760,6 +760,56 @@ void rb_free_aux(struct ring_buffer *rb)
 	}
 }
 
+/*
+ * Allocate a ring_buffer for a detached event and attach it to this event.
+ * There's one ring_buffer per detached event and vice versa, so
+ * ring_buffer_attach() does not apply.
+ */
+int rb_alloc_detached(struct perf_event *event)
+{
+	int aux_nr_pages = event->attr.detached_aux_nr_pages;
+	int nr_pages = event->attr.detached_nr_pages;
+	struct ring_buffer *rb;
+	int ret, pgoff = nr_pages + 1;
+
+	/*
+	 * Use overwrite mode (!RING_BUFFER_WRITABLE) for both data and aux
+	 * areas as we don't want wakeups or interrupts.
+	 */
+	rb = rb_alloc(NULL, nr_pages, 0, event->cpu, 0);
+	if (IS_ERR(rb))
+		return PTR_ERR(rb);
+
+	ret = rb_alloc_aux(rb, event, pgoff, aux_nr_pages, 0, 0);
+	if (ret) {
+		rb_free(rb);
+		return ret;
+	}
+
+	atomic_set(&rb->mmap_count, 1);
+	if (aux_nr_pages)
+		atomic_set(&rb->aux_mmap_count, 1);
+
+	/*
+	 * Detached events don't need ring buffer wakeups, therefore we don't
+	 * use ring_buffer_attach() here and event->rb_entry stays empty.
+	 */
+	rcu_assign_pointer(event->rb, rb);
+
+	return 0;
+}
+
+void rb_free_detached(struct ring_buffer *rb, struct perf_event *event)
+{
+	/* Must be the last one */
+	WARN_ON_ONCE(atomic_read(&rb->refcount) != 1);
+
+	atomic_set(&rb->aux_mmap_count, 0);
+	rcu_assign_pointer(event->rb, NULL);
+	rb_free_aux(rb);
+	rb_free(rb);
+}
+
 #ifndef CONFIG_PERF_USE_VMALLOC
 
 /*
-- 
2.14.1


* [RFC PATCH 07/17] perf: Add pmu_info to user page
  2017-09-05 13:30 [RFC PATCH 00/17] perf: Detached events Alexander Shishkin
                   ` (5 preceding siblings ...)
  2017-09-05 13:30 ` [RFC PATCH 06/17] perf: Add buffers to the " Alexander Shishkin
@ 2017-09-05 13:30 ` Alexander Shishkin
  2017-10-03 14:40   ` Peter Zijlstra
  2017-09-05 13:30 ` [RFC PATCH 08/17] perf: Allow inheritance for detached events Alexander Shishkin
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Alexander Shishkin @ 2017-09-05 13:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov,
	Borislav Petkov, rric, Alexander Shishkin

Allow PMUs to supply additional static information that may be required by
their decoders. Most of what the Intel PT driver exports as capability
attributes (timing packet frequencies, frequency ratios, etc.) is needed by
its decoder to correctly decode the binary stream. However, when decoding an
Intel PT stream from a core dump, we can't rely on the sysfs attributes, so
we need to pack this information into the perf buffer, so that the resulting
core dump is self-contained.

In order to do this, we append a PMU-specific structure to the user page.
Such structures include a size field, for versioning.
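
For illustration, a sketch of how a (hypothetical) decoder would locate this
data in the mapped user page; per this patch, a copy of the event's attr is
placed at pmu_offset, immediately followed by the PMU-specific descriptor:

#include <linux/perf_event.h>
#include <stddef.h>

/* returns the PMU-specific descriptor that follows the attr copy, if any */
static const void *find_pmu_desc(const struct perf_event_mmap_page *pg,
                                 size_t *desc_size)
{
        if (pg->pmu_size <= sizeof(struct perf_event_attr))
                return NULL;

        *desc_size = pg->pmu_size - sizeof(struct perf_event_attr);
        return (const char *)pg + pg->pmu_offset +
               sizeof(struct perf_event_attr);
}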

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 include/linux/perf_event.h      | 17 ++++++++++
 include/uapi/linux/perf_event.h | 10 ++++++
 kernel/events/core.c            | 27 +--------------
 kernel/events/internal.h        |  2 +-
 kernel/events/ring_buffer.c     | 75 ++++++++++++++++++++++++++++++++++-------
 5 files changed, 92 insertions(+), 39 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index a07982f48d..b7939e8811 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -235,6 +235,8 @@ struct hw_perf_event {
 
 struct perf_event;
 
+struct pmu_info;
+
 /*
  * Common implementation detail of pmu::{start,commit,cancel}_txn
  */
@@ -285,6 +287,9 @@ struct pmu {
 	/* number of address filters this PMU can do */
 	unsigned int			nr_addr_filters;
 
+	/* PMU-specific data to append to the user page */
+	const struct pmu_info		*pmu_info;
+
 	/*
 	 * Fully disable/enable this PMU, can be used to protect from the PMI
 	 * as well as for lazy/batch writing of the MSRs.
@@ -508,6 +513,18 @@ struct perf_addr_filters_head {
 	unsigned int		nr_file_filters;
 };
 
+struct pmu_info {
+	/*
+	 * Size of this structure, for versioning.
+	 */
+	u32	note_size;
+
+	/*
+	 * Size of the container structure, not including this one
+	 */
+	u32	pmu_descsz;
+};
+
 /**
  * enum perf_event_active_state - the states of a event
  */
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 3d64d9ea80..4cdd4fab9d 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -599,6 +599,16 @@ struct perf_event_mmap_page {
 	__u64	aux_tail;
 	__u64	aux_offset;
 	__u64	aux_size;
+
+	/*
+	 * PMU data: static info that (AUX) decoder wants to know in order to
+	 * decode correctly:
+	 *
+	 *   pmu_offset >= sizeof(struct perf_event_mmap_page)
+	 *   pmu_offset + pmu_size <= PAGE_SIZE
+	 */
+	__u64	pmu_offset;
+	__u64	pmu_size;
 };
 
 #define PERF_RECORD_MISC_CPUMODE_MASK		(7 << 0)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index fef1f97974..d62ab2d1de 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4926,28 +4926,6 @@ static void calc_timer_values(struct perf_event *event,
 	*running = ctx_time - event->tstamp_running;
 }
 
-static void perf_event_init_userpage(struct perf_event *event)
-{
-	struct perf_event_mmap_page *userpg;
-	struct ring_buffer *rb;
-
-	rcu_read_lock();
-	rb = rcu_dereference(event->rb);
-	if (!rb)
-		goto unlock;
-
-	userpg = rb->user_page;
-
-	/* Allow new userspace to detect that bit 0 is deprecated */
-	userpg->cap_bit0_is_deprecated = 1;
-	userpg->size = offsetof(struct perf_event_mmap_page, __reserved);
-	userpg->data_offset = PAGE_SIZE;
-	userpg->data_size = perf_data_size(rb);
-
-unlock:
-	rcu_read_unlock();
-}
-
 void __weak arch_perf_update_userpage(
 	struct perf_event *event, struct perf_event_mmap_page *userpg, u64 now)
 {
@@ -5385,9 +5363,7 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 		flags |= RING_BUFFER_WRITABLE;
 
 	if (!rb) {
-		rb = rb_alloc(vma->vm_mm, nr_pages,
-			      event->attr.watermark ? event->attr.wakeup_watermark : 0,
-			      event->cpu, flags);
+		rb = rb_alloc(event, vma->vm_mm, nr_pages, flags);
 
 		if (IS_ERR_OR_NULL(rb)) {
 			ret = PTR_ERR(rb);
@@ -5399,7 +5375,6 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 
 		ring_buffer_attach(event, rb);
 
-		perf_event_init_userpage(event);
 		perf_event_update_userpage(event);
 	} else {
 		ret = rb_alloc_aux(rb, event, vma->vm_pgoff, nr_pages,
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 8e267d8faa..4b345ee0d4 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -76,7 +76,7 @@ static inline void rb_toggle_paused(struct ring_buffer *rb, bool pause)
 }
 
 extern struct ring_buffer *
-rb_alloc(struct mm_struct *mm, int nr_pages, long watermark, int cpu,
+rb_alloc(struct perf_event *event, struct mm_struct *mm, int nr_pages,
 	 int flags);
 extern void perf_event_wakeup(struct perf_event *event);
 extern int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index b4d7841025..d7051868d0 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -268,10 +268,59 @@ void perf_output_end(struct perf_output_handle *handle)
 	rcu_read_unlock();
 }
 
+static void perf_event_init_pmu_info(struct perf_event *event,
+				     struct perf_event_mmap_page *userpg)
+{
+	const struct pmu_info *pi = NULL;
+	void *ptr = (void *)userpg + sizeof(*userpg);
+	size_t size = sizeof(event->attr);
+
+	if (event->pmu && event->pmu->pmu_info) {
+		pi = event->pmu->pmu_info;
+		size += pi->pmu_descsz;
+	}
+
+	if (size + sizeof(*userpg) > PAGE_SIZE)
+		return;
+
+	userpg->pmu_offset = offset_in_page(ptr);
+	userpg->pmu_size = size;
+
+	memcpy(ptr, &event->attr, sizeof(event->attr));
+	if (pi) {
+		ptr += sizeof(event->attr);
+		memcpy(ptr, (void *)pi + pi->note_size, pi->pmu_descsz);
+	}
+}
+
+static void perf_event_init_userpage(struct perf_event *event,
+				     struct ring_buffer *rb)
+{
+	struct perf_event_mmap_page *userpg;
+
+	userpg = rb->user_page;
+
+	/* Allow new userspace to detect that bit 0 is deprecated */
+	userpg->cap_bit0_is_deprecated = 1;
+	userpg->size = offsetof(struct perf_event_mmap_page, __reserved);
+	userpg->data_offset = PAGE_SIZE;
+	userpg->data_size = perf_data_size(rb);
+	if (event->attach_state & PERF_ATTACH_DETACHED) {
+		userpg->aux_offset =
+			(event->attr.detached_nr_pages + 1) << PAGE_SHIFT;
+		userpg->aux_size =
+			event->attr.detached_aux_nr_pages << PAGE_SHIFT;
+	}
+
+	perf_event_init_pmu_info(event, userpg);
+}
+
 static void
-ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
+ring_buffer_init(struct ring_buffer *rb, struct perf_event *event, int flags)
 {
 	long max_size = perf_data_size(rb);
+	long watermark =
+		event->attr.watermark ? event->attr.wakeup_watermark : 0;
 
 	if (watermark)
 		rb->watermark = min(max_size, watermark);
@@ -295,6 +344,8 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
 	 */
 	if (!rb->nr_pages)
 		rb->paused = 1;
+
+	perf_event_init_userpage(event, rb);
 }
 
 void perf_aux_output_flag(struct perf_output_handle *handle, u64 flags)
@@ -776,7 +827,7 @@ int rb_alloc_detached(struct perf_event *event)
 	 * Use overwrite mode (!RING_BUFFER_WRITABLE) for both data and aux
 	 * areas as we don't want wakeups or interrupts.
 	 */
-	rb = rb_alloc(NULL, nr_pages, 0, event->cpu, 0);
+	rb = rb_alloc(event, NULL, nr_pages, 0);
 	if (IS_ERR(rb))
 		return PTR_ERR(rb);
 
@@ -841,8 +892,8 @@ static void *perf_mmap_alloc_page(int cpu)
 	return page_address(page);
 }
 
-struct ring_buffer *rb_alloc(struct mm_struct *mm, int nr_pages, long watermark,
-			     int cpu, int flags)
+struct ring_buffer *rb_alloc(struct perf_event *event, struct mm_struct *mm,
+			     int nr_pages, int flags)
 {
 	unsigned long size = offsetof(struct ring_buffer, data_pages[nr_pages]);
 	struct ring_buffer *rb;
@@ -850,26 +901,27 @@ struct ring_buffer *rb_alloc(struct mm_struct *mm, int nr_pages, long watermark,
 
 	rb = kzalloc(size, GFP_KERNEL);
 	if (!rb)
-		goto fail;
+		return ERR_PTR(-ENOMEM);
 
 	ret = ring_buffer_account(rb, mm, nr_pages, false);
 	if (ret)
 		goto fail_free_rb;
 
 	ret = -ENOMEM;
-	rb->user_page = perf_mmap_alloc_page(cpu);
+	rb->user_page = perf_mmap_alloc_page(event->cpu);
 	if (!rb->user_page)
 		goto fail_unaccount;
 
 	for (i = 0; i < nr_pages; i++) {
-		rb->data_pages[i] = perf_mmap_alloc_page(cpu);
+		rb->data_pages[i] = perf_mmap_alloc_page(event->cpu);
+
 		if (!rb->data_pages[i])
 			goto fail_data_pages;
 	}
 
 	rb->nr_pages = nr_pages;
 
-	ring_buffer_init(rb, watermark, flags);
+	ring_buffer_init(rb, event, flags);
 
 	return rb;
 
@@ -885,7 +937,6 @@ struct ring_buffer *rb_alloc(struct mm_struct *mm, int nr_pages, long watermark,
 fail_free_rb:
 	kfree(rb);
 
-fail:
 	return ERR_PTR(ret);
 }
 
@@ -953,8 +1004,8 @@ void rb_free(struct ring_buffer *rb)
 	schedule_work(&rb->work);
 }
 
-struct ring_buffer *rb_alloc(struct mm_struct *mm, int nr_pages, long watermark,
-			     int cpu, int flags)
+struct ring_buffer *rb_alloc(struct perf_event *event, struct mm_struct *mm,
+			     int nr_pages, int flags)
 {
 	unsigned long size = offsetof(struct ring_buffer, data_pages[1]);
 	struct ring_buffer *rb;
@@ -983,7 +1034,7 @@ struct ring_buffer *rb_alloc(struct mm_struct *mm, int nr_pages, long watermark,
 		rb->page_order = ilog2(nr_pages);
 	}
 
-	ring_buffer_init(rb, watermark, flags);
+	ring_buffer_init(rb, event, flags);
 
 	return rb;
 
-- 
2.14.1


* [RFC PATCH 08/17] perf: Allow inheritance for detached events
  2017-09-05 13:30 [RFC PATCH 00/17] perf: Detached events Alexander Shishkin
                   ` (6 preceding siblings ...)
  2017-09-05 13:30 ` [RFC PATCH 07/17] perf: Add pmu_info to user page Alexander Shishkin
@ 2017-09-05 13:30 ` Alexander Shishkin
  2017-10-03 14:42   ` Peter Zijlstra
  2017-09-05 13:30 ` [RFC PATCH 09/17] perf: Use shmemfs pages for userspace-only per-thread " Alexander Shishkin
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Alexander Shishkin @ 2017-09-05 13:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov,
	Borislav Petkov, rric, Alexander Shishkin

This enables inheritance for detached events. Unlike traditional events,
these do not have parents: inheritance produces a new independent event with
the same attributes. If the 'parent' event has a ring buffer, so will the new
event. Because of the mlock accounting, this buffer allocation may fail,
which in turn will fail the parent's fork(), something to be aware of.

This also effectively disables context cloning: unlike traditional events,
these each have their own ring buffer, so the context switch optimization
can't work.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 64 ++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 54 insertions(+), 11 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index b7939e8811..0b45abad12 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -780,6 +780,7 @@ struct perf_event_context {
 	int				nr_stat;
 	int				nr_freq;
 	int				rotate_disable;
+	int				clone_disable;
 	atomic_t			refcount;
 	struct task_struct		*task;
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d62ab2d1de..89c14644df 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -259,11 +259,12 @@ static void event_function_call(struct perf_event *event, event_f func, void *da
 		.data = data,
 	};
 
-	if (!event->parent) {
+	if (!event->parent && !ctx->clone_disable) {
 		/*
 		 * If this is a !child event, we must hold ctx::mutex to
 		 * stabilize the the event->ctx relation. See
 		 * perf_event_ctx_lock().
+		 * Note: detached events' ctx is always stable.
 		 */
 		lockdep_assert_held(&ctx->mutex);
 	}
@@ -10169,6 +10170,7 @@ SYSCALL_DEFINE5(perf_event_open,
 		atomic_long_inc(&event->refcount);
 		atomic_inc(&event->mmap_count);
 
+		ctx->clone_disable = 1;
 		event_file->private_data = event;
 	}
 
@@ -10699,14 +10701,18 @@ static void perf_free_event(struct perf_event *event,
 {
 	struct perf_event *parent = event->parent;
 
-	if (WARN_ON_ONCE(!parent))
-		return;
+	/*
+	 * If a parentless event turns up here, it has to be a detached
+	 * event, in case of inherit_event() failure.
+	 */
 
-	mutex_lock(&parent->child_mutex);
-	list_del_init(&event->child_list);
-	mutex_unlock(&parent->child_mutex);
+	if (parent) {
+		mutex_lock(&parent->child_mutex);
+		list_del_init(&event->child_list);
+		mutex_unlock(&parent->child_mutex);
 
-	put_event(parent);
+		put_event(parent);
+	}
 
 	raw_spin_lock_irq(&ctx->lock);
 	perf_group_detach(event);
@@ -10803,6 +10809,7 @@ inherit_event(struct perf_event *parent_event,
 	      struct perf_event_context *child_ctx)
 {
 	enum perf_event_active_state parent_state = parent_event->state;
+	bool detached = is_detached_event(parent_event);
 	struct perf_event *child_event;
 	unsigned long flags;
 
@@ -10815,10 +10822,16 @@ inherit_event(struct perf_event *parent_event,
 	if (parent_event->parent)
 		parent_event = parent_event->parent;
 
+	/*
+	 * Detached events don't have parents; instead, inheritance
+	 * creates a new independent event, which is accessible via
+	 * tracefs.
+	 */
 	child_event = perf_event_alloc(&parent_event->attr,
 					   parent_event->cpu,
 					   child,
-					   group_leader, parent_event,
+					   group_leader,
+					   detached ? NULL : parent_event,
 					   NULL, NULL, -1);
 	if (IS_ERR(child_event))
 		return child_event;
@@ -10864,6 +10877,29 @@ inherit_event(struct perf_event *parent_event,
 	child_event->overflow_handler_context
 		= parent_event->overflow_handler_context;
 
+	/*
+	 * For per-task detached events with ring buffers, set_output doesn't
+	 * make sense, but we can allocate a new buffer here. CPU-wide events
+	 * don't have inheritance.
+	 */
+	if (detached) {
+		int err;
+
+		err = perf_event_detach(child_event, child, NULL);
+		if (err) {
+			perf_free_event(child_event, child_ctx);
+			mutex_unlock(&parent_event->child_mutex);
+			put_event(parent_event);
+			return NULL;
+		}
+
+		/*
+		 * Inherited detached events don't use their parent's
+		 * ring buffer, so cloning can't work for them.
+		 */
+		child_ctx->clone_disable = 1;
+	}
+
 	/*
 	 * Precalculate sample_data sizes
 	 */
@@ -10878,11 +10914,17 @@ inherit_event(struct perf_event *parent_event,
 	raw_spin_unlock_irqrestore(&child_ctx->lock, flags);
 
 	/*
-	 * Link this into the parent event's child list
+	 * Link this into the parent event's child list, unless
+	 * it's a detached event, see above.
 	 */
-	list_add_tail(&child_event->child_list, &parent_event->child_list);
+	if (!detached)
+		list_add_tail(&child_event->child_list,
+			      &parent_event->child_list);
 	mutex_unlock(&parent_event->child_mutex);
 
+	if (detached)
+		put_event(parent_event);
+
 	return child_event;
 }
 
@@ -11042,7 +11084,7 @@ static int perf_event_init_context(struct task_struct *child, int ctxn)
 
 	child_ctx = child->perf_event_ctxp[ctxn];
 
-	if (child_ctx && inherited_all) {
+	if (child_ctx && inherited_all && !child_ctx->clone_disable) {
 		/*
 		 * Mark the child context as a clone of the parent
 		 * context, or of whatever the parent is a clone of.
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 09/17] perf: Use shmemfs pages for userspace-only per-thread detached events
  2017-09-05 13:30 [RFC PATCH 00/17] perf: Detached events Alexander Shishkin
                   ` (7 preceding siblings ...)
  2017-09-05 13:30 ` [RFC PATCH 08/17] perf: Allow inheritance for detached events Alexander Shishkin
@ 2017-09-05 13:30 ` Alexander Shishkin
  2017-10-03 14:43   ` Peter Zijlstra
  2017-09-05 13:30 ` [RFC PATCH 10/17] perf: Implement pinning and scheduling for SHMEM events Alexander Shishkin
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Alexander Shishkin @ 2017-09-05 13:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov,
	Borislav Petkov, rric, Alexander Shishkin

In order to work around the problem of using up mlocked memory for the
detached events, we can pin the ring buffer pages only while they are
in use (that is, while the event is ACTIVE), and unpin them for the
rest of the time. When not pinned, these pages can be swapped out. This
way, one user can have at most mlock_limit*nr_cpus kB of memory pinned
at any given moment, however many events they actually have.

This enforces a constraint: pinning and unpinning may sleep and thus
can't be done in the event scheduling path. Instead, we use a task
work to do this, which limits this scheme to userspace-only events.
Also, since one userspace thread only needs one buffer (for whatever
CPU it's running on at any given moment), we only do this for
per-thread events.

The source for such swappable pages is shmemfs. This patch allows
allocating perf ring buffer pages from a shmemfs file if the above
constraints are met.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 include/linux/perf_event.h  |   1 +
 kernel/events/core.c        |   2 +-
 kernel/events/internal.h    |   8 +-
 kernel/events/ring_buffer.c | 177 +++++++++++++++++++++++++++++++++++++-------
 4 files changed, 160 insertions(+), 28 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 0b45abad12..341e9960bc 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -569,6 +569,7 @@ struct swevent_hlist {
 #define PERF_ATTACH_TASK_DATA	0x08
 #define PERF_ATTACH_ITRACE	0x10
 #define PERF_ATTACH_DETACHED	0x20
+#define PERF_ATTACH_SHMEM	0x40
 
 struct perf_cgroup;
 struct ring_buffer;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 89c14644df..feff812e30 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9419,7 +9419,7 @@ static int perf_event_detach(struct perf_event *event, struct task_struct *task,
 	if (!event->dent)
 		return -ENOMEM;
 
-	err = rb_alloc_detached(event);
+	err = rb_alloc_detached(event, task, mm);
 	if (err) {
 		tracefs_remove(event->dent);
 		event->dent = NULL;
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 4b345ee0d4..8de9e9cb6a 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -7,6 +7,7 @@
 /* Buffer handling */
 
 #define RING_BUFFER_WRITABLE		0x01
+#define RING_BUFFER_SHMEM		0x02
 
 struct ring_buffer {
 	atomic_t			refcount;
@@ -52,6 +53,9 @@ struct ring_buffer {
 	void				**aux_pages;
 	void				*aux_priv;
 
+	/* tmpfs file for kernel-owned ring buffers */
+	struct file			*shmem_file;
+
 	struct perf_event_mmap_page	*user_page;
 	void				*data_pages[0];
 };
@@ -82,7 +86,9 @@ extern void perf_event_wakeup(struct perf_event *event);
 extern int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
 			pgoff_t pgoff, int nr_pages, long watermark, int flags);
 extern void rb_free_aux(struct ring_buffer *rb);
-extern int rb_alloc_detached(struct perf_event *event);
+extern int rb_alloc_detached(struct perf_event *event,
+			     struct task_struct *task,
+			     struct mm_struct *mm);
 extern void rb_free_detached(struct ring_buffer *rb, struct perf_event *event);
 extern struct ring_buffer *ring_buffer_get(struct perf_event *event);
 extern void ring_buffer_put(struct ring_buffer *rb);
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index d7051868d0..25159fe038 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -14,6 +14,7 @@
 #include <linux/slab.h>
 #include <linux/circ_buf.h>
 #include <linux/poll.h>
+#include <linux/shmem_fs.h>
 
 #include "internal.h"
 
@@ -342,10 +343,11 @@ ring_buffer_init(struct ring_buffer *rb, struct perf_event *event, int flags)
 	 * perf_output_begin() only checks rb->paused, therefore
 	 * rb->paused must be true if we have no pages for output.
 	 */
-	if (!rb->nr_pages)
+	if (!rb->nr_pages || (flags & RING_BUFFER_SHMEM))
 		rb->paused = 1;
 
-	perf_event_init_userpage(event, rb);
+	if (!(flags & RING_BUFFER_SHMEM))
+		perf_event_init_userpage(event, rb);
 }
 
 void perf_aux_output_flag(struct perf_output_handle *handle, u64 flags)
@@ -631,6 +633,9 @@ void ring_buffer_unaccount(struct ring_buffer *rb, bool aux)
 	unsigned long nr_pages = aux ? rb->aux_nr_pages : rb->nr_pages + 1;
 	unsigned long pinned = aux ? rb->aux_mmap_locked : rb->mmap_locked;
 
+	if (!rb->nr_pages && !rb->aux_nr_pages)
+		return;
+
 	atomic_long_sub(nr_pages, &rb->mmap_user->locked_vm);
 	if (rb->mmap_mapping)
 		rb->mmap_mapping->pinned_vm -= pinned;
@@ -640,10 +645,15 @@ void ring_buffer_unaccount(struct ring_buffer *rb, bool aux)
 
 #define PERF_AUX_GFP	(GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY)
 
-static struct page *rb_alloc_aux_page(int node, int order)
+static struct page *
+rb_alloc_aux_page(struct ring_buffer *rb, int node, int order, int pgoff)
 {
+	struct file *file = rb->shmem_file;
 	struct page *page;
 
+	if (order && file)
+		return NULL;
+
 	if (order > MAX_ORDER)
 		order = MAX_ORDER;
 
@@ -670,8 +680,13 @@ static void rb_free_aux_page(struct ring_buffer *rb, int idx)
 {
 	struct page *page = virt_to_page(rb->aux_pages[idx]);
 
-	ClearPagePrivate(page);
+	/* SHMEM pages are freed elsewhere */
+	if (rb->shmem_file)
+		return;
+
 	page->mapping = NULL;
+
+	ClearPagePrivate(page);
 	__free_page(page);
 }
 
@@ -706,17 +721,20 @@ int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
 		 pgoff_t pgoff, int nr_pages, long watermark, int flags)
 {
 	bool overwrite = !(flags & RING_BUFFER_WRITABLE);
+	bool shmem = !!(flags & RING_BUFFER_SHMEM);
 	int node = (event->cpu == -1) ? -1 : cpu_to_node(event->cpu);
 	int ret, max_order = 0;
 
 	if (!has_aux(event))
 		return -EOPNOTSUPP;
 
-	ret = ring_buffer_account(rb, NULL, nr_pages, true);
-	if (ret)
-		return ret;
+	if (!shmem) {
+		ret = ring_buffer_account(rb, NULL, nr_pages, true);
+		if (ret)
+			return ret;
+	}
 
-	ret = -ENOMEM;
+	ret = -EINVAL;
 	if (event->pmu->capabilities & PERF_PMU_CAP_AUX_NO_SG) {
 		/*
 		 * We need to start with the max_order that fits in nr_pages,
@@ -737,21 +755,41 @@ int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
 		}
 	}
 
+	ret = -ENOMEM;
 	rb->aux_pages = kzalloc_node(nr_pages * sizeof(void *), GFP_KERNEL, node);
 	if (!rb->aux_pages)
-		return -ENOMEM;
+		goto out;
 
 	rb->free_aux = event->pmu->free_aux;
+
+	if (shmem) {
+		/*
+		 * Can't guarantee contiguous high order allocations.
+		 */
+		if (max_order)
+			goto out;
+
+		/*
+		 * Skip page allocation; it's done in rb_get_kernel_pages().
+		 */
+		rb->aux_nr_pages = nr_pages;
+
+		goto post_setup;
+	}
+
 	for (rb->aux_nr_pages = 0; rb->aux_nr_pages < nr_pages;) {
 		struct page *page;
 		int last, order;
 
 		order = min(max_order, ilog2(nr_pages - rb->aux_nr_pages));
-		page = rb_alloc_aux_page(node, order);
+		page = rb_alloc_aux_page(rb, node, order, pgoff + rb->aux_nr_pages);
 		if (!page)
 			goto out;
 
-		for (last = rb->aux_nr_pages + (1 << page_private(page));
+		if (order)
+			order = page_private(page);
+
+		for (last = rb->aux_nr_pages + (1 << order);
 		     last > rb->aux_nr_pages; rb->aux_nr_pages++)
 			rb->aux_pages[rb->aux_nr_pages] = page_address(page++);
 	}
@@ -775,6 +813,7 @@ int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
 	if (!rb->aux_priv)
 		goto out;
 
+post_setup:
 	ret = 0;
 
 	/*
@@ -795,7 +834,8 @@ int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
 	if (!ret) {
 		rb->aux_pgoff = pgoff;
 	} else {
-		ring_buffer_unaccount(rb, true);
+		if (!shmem)
+			ring_buffer_unaccount(rb, true);
 		__rb_free_aux(rb);
 	}
 
@@ -811,35 +851,95 @@ void rb_free_aux(struct ring_buffer *rb)
 	}
 }
 
+static int rb_shmem_setup(struct perf_event *event,
+			  struct task_struct *task,
+			  struct ring_buffer *rb)
+{
+	int nr_pages, err;
+	char *name;
+
+	if (WARN_ON_ONCE(!task))
+		return -EINVAL;
+
+	name = event->dent && event->dent->d_name.name ?
+		kasprintf(GFP_KERNEL, "perf/%s/%s/%d",
+			  event->dent->d_name.name, event->pmu->name,
+			  task_pid_nr_ns(task, event->ns)) :
+		kasprintf(GFP_KERNEL, "perf/%s/%d", event->pmu->name,
+			  task_pid_nr_ns(task, event->ns));
+	if (!name)
+		return -ENOMEM;
+
+	WARN_ON_ONCE(rb->user_page);
+
+	nr_pages = rb->nr_pages + rb->aux_nr_pages + 1;
+	rb->shmem_file = shmem_file_setup(name, nr_pages << PAGE_SHIFT,
+					  VM_NORESERVE);
+	kfree(name);
+
+	if (IS_ERR(rb->shmem_file)) {
+		err = PTR_ERR(rb->shmem_file);
+		rb->shmem_file = NULL;
+		return err;
+	}
+
+	mapping_set_gfp_mask(rb->shmem_file->f_mapping,
+			     GFP_HIGHUSER | __GFP_RECLAIMABLE);
+
+	event->dent->d_inode->i_mapping = rb->shmem_file->f_mapping;
+	event->attach_state |= PERF_ATTACH_SHMEM;
+
+	return 0;
+}
+
 /*
  * Allocate a ring_buffer for a detached event and attach it to this event.
  * There's one ring_buffer per detached event and vice versa, so
  * ring_buffer_attach() does not apply.
  */
-int rb_alloc_detached(struct perf_event *event)
+int rb_alloc_detached(struct perf_event *event, struct task_struct *task,
+		      struct mm_struct *mm)
 {
 	int aux_nr_pages = event->attr.detached_aux_nr_pages;
 	int nr_pages = event->attr.detached_nr_pages;
-	struct ring_buffer *rb;
 	int ret, pgoff = nr_pages + 1;
+	struct ring_buffer *rb;
+	int flags = 0;
 
 	/*
-	 * Use overwrite mode (!RING_BUFFER_WRITABLE) for both data and aux
-	 * areas as we don't want wakeups or interrupts.
+	 * These are basically coredump conditions. If these are
+	 * not met, we proceed as we would, but with pinned pages
+	 * and therefore *no inheritance*.
 	 */
-	rb = rb_alloc(event, NULL, nr_pages, 0);
+	if (event->attr.inherit && event->attr.exclude_kernel &&
+	    event->cpu == -1)
+		flags = RING_BUFFER_SHMEM;
+	else if (event->attr.inherit)
+		return -EINVAL;
+
+	rb = rb_alloc(event, mm, nr_pages, flags);
 	if (IS_ERR(rb))
 		return PTR_ERR(rb);
 
-	ret = rb_alloc_aux(rb, event, pgoff, aux_nr_pages, 0, 0);
-	if (ret) {
-		rb_free(rb);
-		return ret;
+	if (aux_nr_pages) {
+		ret = rb_alloc_aux(rb, event, pgoff, aux_nr_pages, 0, flags);
+		if (ret)
+			goto err_free;
 	}
 
-	atomic_set(&rb->mmap_count, 1);
-	if (aux_nr_pages)
-		atomic_set(&rb->aux_mmap_count, 1);
+	if (flags & RING_BUFFER_SHMEM) {
+		ret = rb_shmem_setup(event, task, rb);
+		if (ret) {
+			rb_free_aux(rb);
+			goto err_free;
+		}
+
+		rb_toggle_paused(rb, true);
+	} else {
+		atomic_inc(&rb->mmap_count);
+		if (aux_nr_pages)
+			atomic_inc(&rb->aux_mmap_count);
+	}
 
 	/*
 	 * Detached events don't need ring buffer wakeups, therefore we don't
@@ -847,7 +947,14 @@ int rb_alloc_detached(struct perf_event *event)
 	 */
 	rcu_assign_pointer(event->rb, rb);
 
+	event->attach_state |= PERF_ATTACH_DETACHED;
+
 	return 0;
+
+err_free:
+	rb_free(rb);
+
+	return ret;
 }
 
 void rb_free_detached(struct ring_buffer *rb, struct perf_event *event)
@@ -855,6 +962,9 @@ void rb_free_detached(struct ring_buffer *rb, struct perf_event *event)
 	/* Must be the last one */
 	WARN_ON_ONCE(atomic_read(&rb->refcount) != 1);
 
+	if (rb->shmem_file)
+		shmem_truncate_range(rb->shmem_file->f_inode, 0, (loff_t)-1);
+
 	atomic_set(&rb->aux_mmap_count, 0);
 	rcu_assign_pointer(event->rb, NULL);
 	rb_free_aux(rb);
@@ -896,6 +1006,7 @@ struct ring_buffer *rb_alloc(struct perf_event *event, struct mm_struct *mm,
 			     int nr_pages, int flags)
 {
 	unsigned long size = offsetof(struct ring_buffer, data_pages[nr_pages]);
+	bool shmem = !!(flags & RING_BUFFER_SHMEM);
 	struct ring_buffer *rb;
 	int i, ret = -ENOMEM;
 
@@ -903,6 +1014,9 @@ struct ring_buffer *rb_alloc(struct perf_event *event, struct mm_struct *mm,
 	if (!rb)
 		return ERR_PTR(-ENOMEM);
 
+	if (shmem)
+		goto post_alloc;
+
 	ret = ring_buffer_account(rb, mm, nr_pages, false);
 	if (ret)
 		goto fail_free_rb;
@@ -919,6 +1033,7 @@ struct ring_buffer *rb_alloc(struct perf_event *event, struct mm_struct *mm,
 			goto fail_data_pages;
 	}
 
+post_alloc:
 	rb->nr_pages = nr_pages;
 
 	ring_buffer_init(rb, event, flags);
@@ -927,9 +1042,9 @@ struct ring_buffer *rb_alloc(struct perf_event *event, struct mm_struct *mm,
 
 fail_data_pages:
 	for (i--; i >= 0; i--)
-		free_page((unsigned long)rb->data_pages[i]);
+		put_page(virt_to_page(rb->data_pages[i]));
 
-	free_page((unsigned long)rb->user_page);
+	put_page(virt_to_page(rb->user_page));
 
 fail_unaccount:
 	ring_buffer_unaccount(rb, false);
@@ -952,9 +1067,16 @@ void rb_free(struct ring_buffer *rb)
 {
 	int i;
 
+	if (rb->shmem_file) {
+		/* the pages should have been freed before */
+		fput(rb->shmem_file);
+		goto out_free;
+	}
+
 	perf_mmap_free_page((unsigned long)rb->user_page);
 	for (i = 0; i < rb->nr_pages; i++)
 		perf_mmap_free_page((unsigned long)rb->data_pages[i]);
+out_free:
 	kfree(rb);
 }
 
@@ -1012,6 +1134,9 @@ struct ring_buffer *rb_alloc(struct perf_event *event, struct mm_struct *mm,
 	void *all_buf;
 	int ret = -ENOMEM;
 
+	if (flags & RING_BUFFER_SHMEM)
+		return -EOPNOTSUPP;
+
 	rb = kzalloc(size, GFP_KERNEL);
 	if (!rb)
 		goto fail;
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 10/17] perf: Implement pinning and scheduling for SHMEM events
  2017-09-05 13:30 [RFC PATCH 00/17] perf: Detached events Alexander Shishkin
                   ` (8 preceding siblings ...)
  2017-09-05 13:30 ` [RFC PATCH 09/17] perf: Use shmemfs pages for userspace-only per-thread " Alexander Shishkin
@ 2017-09-05 13:30 ` Alexander Shishkin
  2017-09-05 13:30 ` [RFC PATCH 11/17] perf: Implement mlock accounting for shmem ring buffers Alexander Shishkin
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Alexander Shishkin @ 2017-09-05 13:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov,
	Borislav Petkov, rric, Alexander Shishkin

A SHMEM buffer is only pinned while its task is scheduled in, and the
pinning is done in a task work, which also implies that the
corresponding event can only be started from that task work.

Pinning is done on a per-cpu basis: if a different event has previously
been pinned on the local cpu, it is unpinned (its pin count is dropped)
and the new event is pinned on this cpu instead. When an event's pin
count drops to zero, we unpin its pages; when it goes to one, we pin
them.
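
A minimal userspace model of these pin count transitions (illustration
only, not kernel code; all names here are made up):

	#include <stdatomic.h>
	#include <stdio.h>

	struct model_event {
		atomic_int xpinned;	/* number of CPUs holding a pin */
	};

	/* Stands in for the per-cpu pinned-event slot on the current CPU. */
	static struct model_event *pinned_here;

	static void pin_pages(struct model_event *e)   { printf("pin %p\n", (void *)e); }
	static void unpin_pages(struct model_event *e) { printf("unpin %p\n", (void *)e); }

	/* Runs in task context when @e's task is scheduled in on this CPU. */
	static void schedule_in(struct model_event *e)
	{
		struct model_event *old = pinned_here;

		if (old == e)
			return;			/* already pinned here */

		/* Drop the previous event's pin; 1 -> 0 unpins its pages. */
		if (old && atomic_fetch_sub(&old->xpinned, 1) == 1)
			unpin_pages(old);

		pinned_here = e;
		/* 0 -> 1 is when the pages actually get pinned. */
		if (atomic_fetch_add(&e->xpinned, 1) == 0)
			pin_pages(e);
	}

	int main(void)
	{
		struct model_event a = { 0 }, b = { 0 };

		schedule_in(&a);	/* pins a's pages */
		schedule_in(&a);	/* no-op */
		schedule_in(&b);	/* drops a's pin, pins b's pages */
		return 0;
	}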

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 include/linux/perf_event.h  |  10 +++
 kernel/events/core.c        | 134 ++++++++++++++++++++++++++++-
 kernel/events/internal.h    |   5 ++
 kernel/events/ring_buffer.c | 202 +++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 347 insertions(+), 4 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 341e9960bc..4b966dd0d8 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -703,6 +703,13 @@ struct perf_event {
 	unsigned long			rcu_batches;
 	int				rcu_pending;
 
+	/*
+	 * Number of times (CPUs) this event's been pinned (on):
+	 *  xpinned -> 0: unpin the pages,
+	 *  xpinned -> 1: pin the pages. See get_pages_work().
+	 */
+	atomic_t			xpinned;
+
 	/* poll related */
 	wait_queue_head_t		waitq;
 	struct fasync_struct		*fasync;
@@ -735,6 +742,9 @@ struct perf_event {
 	struct bpf_prog			*prog;
 #endif
 
+	/* Task work to pin event's rb pages if needed */
+	struct callback_head		get_pages_work;
+
 #ifdef CONFIG_EVENT_TRACING
 	struct trace_event_call		*tp_event;
 	struct event_filter		*filter;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index feff812e30..c80ffcdb5c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -50,6 +50,7 @@
 #include <linux/sched/mm.h>
 #include <linux/proc_ns.h>
 #include <linux/mount.h>
+#include <linux/task_work.h>
 #include <linux/tracefs.h>
 
 #include "internal.h"
@@ -383,6 +384,7 @@ static atomic_t perf_sched_count;
 static DEFINE_PER_CPU(atomic_t, perf_cgroup_events);
 static DEFINE_PER_CPU(int, perf_sched_cb_usages);
 static DEFINE_PER_CPU(struct pmu_event_list, pmu_sb_events);
+static DEFINE_PER_CPU(struct perf_event *, shmem_events);
 
 static atomic_t nr_mmap_events __read_mostly;
 static atomic_t nr_comm_events __read_mostly;
@@ -2058,6 +2060,94 @@ static void perf_set_shadow_time(struct perf_event *event,
 		event->shadow_ctx_time = tstamp - ctx->timestamp;
 }
 
+static void __unpin_event_pages(struct perf_event *event,
+				struct perf_cpu_context *cpuctx,
+				struct perf_event_context *ctx,
+				void *info)
+{
+	if (!atomic_dec_and_test(&event->xpinned))
+		return;
+
+	/*
+	 * If this event happens to be running, we need to stop it before we
+	 * can pull the pages. Note that this will be happening if we allow
+	 * concurrent shmem events, which seems like a bad idea.
+	 */
+	if (READ_ONCE(event->state) == PERF_EVENT_STATE_ACTIVE)
+		event->pmu->stop(event, PERF_EF_UPDATE);
+
+	rb_put_kernel_pages(event->rb, false);
+}
+
+enum pin_event_t {
+	PIN_IN = 0,
+	PIN_NOP,
+};
+
+static enum pin_event_t pin_event_pages(struct perf_event *event)
+{
+	struct perf_event **pinned_event = this_cpu_ptr(&shmem_events);
+	struct perf_event *old_event = *pinned_event;
+
+	if (old_event == event)
+		return PIN_NOP;
+
+	if (old_event && old_event->state > PERF_EVENT_STATE_DEAD)
+		event_function_call(old_event, __unpin_event_pages, NULL);
+
+	*pinned_event = event;
+	if (atomic_inc_return(&event->xpinned) != 1)
+		return PIN_NOP;
+
+	return PIN_IN;
+}
+
+static int perf_event_stop(struct perf_event *event, int restart);
+
+static void get_pages_work(struct callback_head *work)
+{
+	struct perf_event *event = container_of(work, struct perf_event, get_pages_work);
+	int ret;
+	struct ring_buffer *rb = event->rb;
+	int (*get_fn)(struct perf_event *event) = rb_get_kernel_pages;
+
+	work->func = NULL;
+
+	if (!rb || current->flags & PF_EXITING)
+		return;
+
+	if (!rb->shmem_file_addr) {
+		get_fn = rb_inject;
+		if (atomic_cmpxchg(&event->xpinned, 1, 0))
+			rb_put_kernel_pages(rb, false);
+	}
+
+	if (pin_event_pages(event) == PIN_IN) {
+		ret = get_fn(event);
+	} else {
+		ret = 0;
+	}
+
+	if (!ret)
+		perf_event_stop(event, 1);
+}
+
+static int perf_event_queue_work(struct perf_event *event,
+				 struct task_struct *task)
+{
+	int ret;
+
+	if (event->get_pages_work.func)
+		return 0;
+
+	init_task_work(&event->get_pages_work, get_pages_work);
+	ret = task_work_add(task, &event->get_pages_work, true);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
 #define MAX_INTERRUPTS (~0ULL)
 
 static void perf_log_throttle(struct perf_event *event, int enable);
@@ -2069,7 +2159,7 @@ event_sched_in(struct perf_event *event,
 		 struct perf_event_context *ctx)
 {
 	u64 tstamp = perf_event_time(event);
-	int ret = 0;
+	int ret = 0, shmem = event->attach_state & PERF_ATTACH_SHMEM;
 
 	lockdep_assert_held(&ctx->lock);
 
@@ -2105,13 +2195,21 @@ event_sched_in(struct perf_event *event,
 
 	perf_log_itrace_start(event);
 
-	if (event->pmu->add(event, PERF_EF_START)) {
+	/*
+	 * For shmem events pmu::start will fail because of
+	 * rb::aux_mmap_count==0, so skip the PERF_EF_START, but
+	 * queue the task work that will actually start it.
+	 */
+	if (event->pmu->add(event, shmem ? 0 : PERF_EF_START)) {
 		event->state = PERF_EVENT_STATE_INACTIVE;
 		event->oncpu = -1;
 		ret = -EAGAIN;
 		goto out;
 	}
 
+	if (shmem)
+		perf_event_queue_work(event, ctx->task);
+
 	event->tstamp_running += tstamp - event->tstamp_stopped;
 
 	if (!is_software_event(event))
@@ -4182,6 +4280,30 @@ static void _free_event(struct perf_event *event)
 
 	unaccount_event(event);
 
+	if (event->attach_state & PERF_ATTACH_SHMEM) {
+		struct perf_event_context *ctx = event->ctx;
+		int cpu;
+
+		atomic_set(&event->xpinned, 0);
+		for_each_possible_cpu(cpu) {
+			struct perf_event **pinned_event =
+				per_cpu_ptr(&shmem_events, cpu);
+
+			cmpxchg(pinned_event, event, NULL);
+		}
+
+		event->attach_state &= ~PERF_ATTACH_SHMEM;
+
+		/*
+		 * XXX: !ctx means event is still being created;
+		 * we can get here via tracefs file though
+		 */
+		if (ctx && ctx->task && ctx->task != TASK_TOMBSTONE)
+			task_work_cancel(ctx->task, get_pages_work);
+
+		rb_put_kernel_pages(event->rb, false);
+	}
+
 	if (event->dent) {
 		tracefs_remove(event->dent);
 
@@ -4948,6 +5070,10 @@ void perf_event_update_userpage(struct perf_event *event)
 	if (!rb)
 		goto unlock;
 
+	/* Don't bother with the file backed rb when it's inactive */
+	if (rb->shmem_file && rb->paused)
+		goto unlock;
+
 	/*
 	 * compute total_time_enabled, total_time_running
 	 * based on snapshot values taken when the event
@@ -10684,6 +10810,8 @@ void perf_event_exit_task(struct task_struct *child)
 	}
 	mutex_unlock(&child->perf_event_mutex);
 
+	task_work_cancel(child, get_pages_work);
+
 	for_each_task_context_nr(ctxn)
 		perf_event_exit_task_context(child, ctxn);
 
@@ -10881,6 +11009,8 @@ inherit_event(struct perf_event *parent_event,
 	 * For per-task detached events with ring buffers, set_output doesn't
 	 * make sense, but we can allocate a new buffer here. CPU-wide events
 	 * don't have inheritance.
+	 * If we have to allocate a ring buffer, it must be shmem backed,
+	 * otherwise inheritance is disallowed in rb_alloc_detached().
 	 */
 	if (detached) {
 		int err;
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 8de9e9cb6a..80d36a7277 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -55,11 +55,16 @@ struct ring_buffer {
 
 	/* tmpfs file for kernel-owned ring buffers */
 	struct file			*shmem_file;
+	unsigned long			shmem_file_addr;
+	int				shmem_pages_in;
 
 	struct perf_event_mmap_page	*user_page;
 	void				*data_pages[0];
 };
 
+extern int rb_inject(struct perf_event *event);
+extern int rb_get_kernel_pages(struct perf_event *event);
+extern void rb_put_kernel_pages(struct ring_buffer *rb, bool final);
 extern void rb_free(struct ring_buffer *rb);
 extern void ring_buffer_unaccount(struct ring_buffer *rb, bool aux);
 
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 25159fe038..771dfdb71f 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -15,6 +15,8 @@
 #include <linux/circ_buf.h>
 #include <linux/poll.h>
 #include <linux/shmem_fs.h>
+#include <linux/mman.h>
+#include <linux/sched/mm.h>
 
 #include "internal.h"
 
@@ -384,8 +386,11 @@ void *perf_aux_output_begin(struct perf_output_handle *handle,
 	unsigned long aux_head, aux_tail;
 	struct ring_buffer *rb;
 
-	if (output_event->parent)
+	if (output_event->parent) {
+		WARN_ON_ONCE(is_detached_event(event));
+		WARN_ON_ONCE(event->attach_state & PERF_ATTACH_SHMEM);
 		output_event = output_event->parent;
+	}
 
 	/*
 	 * Since this will typically be open across pmu::add/pmu::del, we
@@ -851,6 +856,64 @@ void rb_free_aux(struct ring_buffer *rb)
 	}
 }
 
+static unsigned long perf_rb_size(struct ring_buffer *rb)
+{
+	return perf_data_size(rb) + perf_aux_size(rb) + PAGE_SIZE;
+}
+
+int rb_inject(struct perf_event *event)
+{
+	struct ring_buffer *rb = event->rb;
+	struct mm_struct *mm;
+	unsigned long addr;
+	int err = -ENOMEM;
+
+	mm = get_task_mm(current);
+	if (!mm)
+		return -ESRCH;
+
+	err = rb_get_kernel_pages(event);
+	if (err)
+		goto err_mmput;
+
+	addr = vm_mmap(rb->shmem_file, 0, perf_rb_size(rb), PROT_READ,
+		       MAP_SHARED | MAP_POPULATE, 0);
+
+	mmput(mm);
+	rb->mmap_mapping = mm;
+	rb->shmem_file_addr = addr;
+
+	return 0;
+
+err_mmput:
+	mmput(mm);
+
+	return err;
+}
+
+static void rb_shmem_unmap(struct perf_event *event)
+{
+	struct ring_buffer *rb = event->rb;
+	struct mm_struct *mm = rb->mmap_mapping;
+
+	rb_toggle_paused(rb, true);
+
+	if (!rb->shmem_file_addr)
+		return;
+
+	/*
+	 * EXIT state means the task is past exit_mm(),
+	 * no need to unmap anything
+	 */
+	if (event->state == PERF_EVENT_STATE_EXIT)
+		return;
+
+	down_write(&mm->mmap_sem);
+	(void)do_munmap(mm, rb->shmem_file_addr, perf_rb_size(rb), NULL);
+	up_write(&mm->mmap_sem);
+	rb->shmem_file_addr = 0;
+}
+
 static int rb_shmem_setup(struct perf_event *event,
 			  struct task_struct *task,
 			  struct ring_buffer *rb)
@@ -892,6 +955,138 @@ static int rb_shmem_setup(struct perf_event *event,
 	return 0;
 }
 
+/*
+ * Pin ring_buffer's pages to memory while the task is scheduled in;
+ * populate its page arrays (data_pages, aux_pages, user_page).
+ */
+int rb_get_kernel_pages(struct perf_event *event)
+{
+	struct ring_buffer *rb = event->rb;
+	struct address_space *mapping;
+	int nr_pages, i = 0, err = -EINVAL, changed = 0, mc = 0;
+	struct page *page;
+
+	/*
+	 * The mmap_count rules for SHMEM buffers:
+	 *  - they are always taken together
+	 *  - except for perf_mmap(), which doesn't work for shmem buffers:
+	 *    mmaping will force-pin more user's pages than is allowed
+	 *  - if either of them was taken before us, the pages are there
+	 */
+	if (atomic_inc_return(&rb->mmap_count) == 1)
+		mc++;
+
+	if (atomic_inc_return(&rb->aux_mmap_count) == 1)
+		mc++;
+
+	if (mc < 2)
+		goto done;
+
+	if (WARN_ON_ONCE(!rb->shmem_file))
+		goto err_put;
+
+	nr_pages = perf_rb_size(rb) >> PAGE_SHIFT;
+
+	mapping = rb->shmem_file->f_mapping;
+
+restart:
+	for (i = 0; i < nr_pages; i++) {
+		WRITE_ONCE(rb->shmem_pages_in, i);
+		err = shmem_getpage(mapping->host, i, &page, SGP_NOHUGE);
+		if (err)
+			goto err_put;
+
+		unlock_page(page);
+
+		if (READ_ONCE(rb->shmem_pages_in) != i) {
+			put_page(page);
+			goto restart;
+		}
+
+		mark_page_accessed(page);
+		set_page_dirty(page);
+		page->mapping = mapping;
+
+		if (page == perf_mmap_to_page(rb, i))
+			continue;
+
+		changed++;
+		if (!i) {
+			bool init = !rb->user_page;
+
+			rb->user_page = page_address(page);
+			if (init)
+				perf_event_init_userpage(event, rb);
+		} else if (i <= rb->nr_pages) {
+			rb->data_pages[i - 1] = page_address(page);
+		} else {
+			rb->aux_pages[i - rb->nr_pages - 1] = page_address(page);
+		}
+	}
+
+	/* rebuild SG tables: pages may have changed */
+	if (changed) {
+		if (rb->aux_priv)
+			rb->free_aux(rb->aux_priv);
+
+		rb->aux_priv = event->pmu->setup_aux(smp_processor_id(),
+						     rb->aux_pages,
+						     rb->aux_nr_pages, true);
+	}
+
+	if (!rb->aux_priv) {
+		err = -ENOMEM;
+		goto err_put;
+	}
+
+done:
+	rb_toggle_paused(rb, false);
+	if (changed)
+		perf_event_update_userpage(event);
+
+	return 0;
+
+err_put:
+	for (i--; i >= 0; i--) {
+		page = perf_mmap_to_page(rb, i);
+		put_page(page);
+	}
+
+	atomic_dec(&rb->aux_mmap_count);
+	atomic_dec(&rb->mmap_count);
+
+	return err;
+}
+
+void rb_put_kernel_pages(struct ring_buffer *rb, bool final)
+{
+	struct page *page;
+	int i;
+
+	if (!rb || !rb->shmem_file)
+		return;
+
+	rb_toggle_paused(rb, true);
+
+	/*
+	 * If both mmap_counts go to zero, put the pages, otherwise
+	 * do nothing.
+	 */
+	if (!atomic_dec_and_test(&rb->aux_mmap_count) ||
+	    !atomic_dec_and_test(&rb->mmap_count))
+		return;
+
+	for (i = 0; i < READ_ONCE(rb->shmem_pages_in); i++) {
+		page = perf_mmap_to_page(rb, i);
+		set_page_dirty(page);
+		if (final)
+			page->mapping = NULL;
+		put_page(page);
+	}
+
+	WRITE_ONCE(rb->shmem_pages_in, 0);
+}
+
 /*
  * Allocate a ring_buffer for a detached event and attach it to this event.
  * There's one ring_buffer per detached event and vice versa, so
@@ -962,8 +1157,11 @@ void rb_free_detached(struct ring_buffer *rb, struct perf_event *event)
 	/* Must be the last one */
 	WARN_ON_ONCE(atomic_read(&rb->refcount) != 1);
 
-	if (rb->shmem_file)
+	if (rb->shmem_file) {
+		rb_shmem_unmap(event);
 		shmem_truncate_range(rb->shmem_file->f_inode, 0, (loff_t)-1);
+		rb_put_kernel_pages(rb, true);
+	}
 
 	atomic_set(&rb->aux_mmap_count, 0);
 	rcu_assign_pointer(event->rb, NULL);
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 11/17] perf: Implement mlock accounting for shmem ring buffers
  2017-09-05 13:30 [RFC PATCH 00/17] perf: Detached events Alexander Shishkin
                   ` (9 preceding siblings ...)
  2017-09-05 13:30 ` [RFC PATCH 10/17] perf: Implement pinning and scheduling for SHMEM events Alexander Shishkin
@ 2017-09-05 13:30 ` Alexander Shishkin
  2017-09-05 13:30 ` [RFC PATCH 12/17] perf: Track pinned events per user Alexander Shishkin
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Alexander Shishkin @ 2017-09-05 13:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov,
	Borislav Petkov, rric, Alexander Shishkin

With shmem ring buffers, a user can have at most nr_pages * nr_cpus
pages pinned at any given time, so we only need to do the accounting
once, when the event is created (by means of sys_perf_event_open()).
This implements such accounting by adding a shared reference counter:
when it goes 0 -> 1, we account the pages; when it drops back to 0, we
undo the accounting.
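
A back-of-the-envelope illustration of the bound being accounted here
(buffer sizes are made-up examples):

	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		long page_size  = sysconf(_SC_PAGESIZE);
		long nr_cpus    = sysconf(_SC_NPROCESSORS_ONLN);
		long data_pages = 64, aux_pages = 1024;	/* example sizes */

		/*
		 * One user page + data + AUX per buffer, and at most one
		 * buffer pinned per CPU at a time, no matter how many
		 * inherited events exist; hence accounting once per
		 * syscall-created event is enough.
		 */
		long worst = nr_cpus * (1 + data_pages + aux_pages) * page_size;

		printf("worst-case pinned memory: %ld bytes\n", worst);
		return 0;
	}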

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 kernel/events/core.c        |  12 +++--
 kernel/events/internal.h    |   5 +-
 kernel/events/ring_buffer.c | 124 +++++++++++++++++++++++++++++++++++++-------
 3 files changed, 116 insertions(+), 25 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index c80ffcdb5c..1fed69d4ba 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4309,7 +4309,6 @@ static void _free_event(struct perf_event *event)
 
 		event->attach_state &= ~PERF_ATTACH_DETACHED;
 
-		ring_buffer_unaccount(event->rb, false);
 		rb_free_detached(event->rb, event);
 	}
 
@@ -9525,9 +9524,11 @@ static void account_event(struct perf_event *event)
 	account_pmu_sb_event(event);
 }
 
-static int perf_event_detach(struct perf_event *event, struct task_struct *task,
-			     struct mm_struct *mm)
+static int
+perf_event_detach(struct perf_event *event, struct perf_event *parent_event,
+		  struct task_struct *task, struct mm_struct *mm)
 {
+	struct ring_buffer *parent_rb = parent_event ? parent_event->rb : NULL;
 	char *filename;
 	int err;
 
@@ -9545,7 +9546,7 @@ static int perf_event_detach(struct perf_event *event, struct task_struct *task,
 	if (!event->dent)
 		return -ENOMEM;
 
-	err = rb_alloc_detached(event, task, mm);
+	err = rb_alloc_detached(event, task, mm, parent_rb);
 	if (err) {
 		tracefs_remove(event->dent);
 		event->dent = NULL;
@@ -11015,7 +11016,8 @@ inherit_event(struct perf_event *parent_event,
 	if (detached) {
 		int err;
 
-		err = perf_event_detach(child_event, child, NULL);
+		err = perf_event_detach(child_event, parent_event, child,
+					NULL);
 		if (err) {
 			perf_free_event(child_event, child_ctx);
 			mutex_unlock(&parent_event->child_mutex);
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 80d36a7277..3dc66961d9 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -53,6 +53,8 @@ struct ring_buffer {
 	void				**aux_pages;
 	void				*aux_priv;
 
+	atomic_t			*acct_refcount;
+
 	/* tmpfs file for kernel-owned ring buffers */
 	struct file			*shmem_file;
 	unsigned long			shmem_file_addr;
@@ -93,7 +95,8 @@ extern int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
 extern void rb_free_aux(struct ring_buffer *rb);
 extern int rb_alloc_detached(struct perf_event *event,
 			     struct task_struct *task,
-			     struct mm_struct *mm);
+			     struct mm_struct *mm,
+			     struct ring_buffer *parent_rb);
 extern void rb_free_detached(struct ring_buffer *rb, struct perf_event *event);
 extern struct ring_buffer *ring_buffer_get(struct perf_event *event);
 extern void ring_buffer_put(struct ring_buffer *rb);
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 771dfdb71f..896d441642 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -570,8 +570,8 @@ void *perf_get_aux(struct perf_output_handle *handle)
  * error out. Otherwise, keep track of the pages used in the ring_buffer so
  * that the accounting can be undone when the pages are freed.
  */
-static int ring_buffer_account(struct ring_buffer *rb, struct mm_struct *mm,
-			       unsigned long nr_pages, bool aux)
+static int __ring_buffer_account(struct ring_buffer *rb, struct mm_struct *mm,
+                                 unsigned long nr_pages, unsigned long *locked)
 {
 	unsigned long total, limit, pinned;
 
@@ -589,6 +589,9 @@ static int ring_buffer_account(struct ring_buffer *rb, struct mm_struct *mm,
 
 	total = atomic_long_read(&rb->mmap_user->locked_vm) + nr_pages;
 
+	free_uid(rb->mmap_user);
+	rb->mmap_user = NULL;
+
 	pinned = 0;
 	if (total > limit) {
 		/*
@@ -609,27 +612,33 @@ static int ring_buffer_account(struct ring_buffer *rb, struct mm_struct *mm,
 			return -EPERM;
 		}
 
-		if (aux)
-			rb->aux_mmap_locked = pinned;
-		else
-			rb->mmap_locked = pinned;
-
+		*locked = pinned;
 		mm->pinned_vm += pinned;
 	}
 
 	if (!rb->mmap_mapping)
 		rb->mmap_mapping = mm;
 
-	/* account for user page */
-	if (!aux)
-		nr_pages++;
-
 	rb->mmap_user = get_current_user();
 	atomic_long_add(nr_pages, &rb->mmap_user->locked_vm);
 
 	return 0;
 }
 
+static int ring_buffer_account(struct ring_buffer *rb, struct mm_struct *mm,
+			       unsigned long nr_pages, bool aux)
+{
+	int ret;
+
+	/* account for user page */
+	if (!aux)
+		nr_pages++;
+	ret = __ring_buffer_account(rb, mm, nr_pages,
+	                            aux ? &rb->aux_mmap_locked : &rb->mmap_locked);
+
+	return ret;
+}
+
 /*
  * Undo the mlock pages accounting done in ring_buffer_account().
  */
@@ -641,6 +650,9 @@ void ring_buffer_unaccount(struct ring_buffer *rb, bool aux)
 	if (!rb->nr_pages && !rb->aux_nr_pages)
 		return;
 
+	if (WARN_ON_ONCE(!rb->mmap_user))
+		return;
+
 	atomic_long_sub(nr_pages, &rb->mmap_user->locked_vm);
 	if (rb->mmap_mapping)
 		rb->mmap_mapping->pinned_vm -= pinned;
@@ -850,7 +862,8 @@ int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
 void rb_free_aux(struct ring_buffer *rb)
 {
 	if (atomic_dec_and_test(&rb->aux_refcount)) {
-		ring_buffer_unaccount(rb, true);
+		if (!rb->shmem_file)
+			ring_buffer_unaccount(rb, true);
 
 		__rb_free_aux(rb);
 	}
@@ -1087,13 +1100,68 @@ void rb_put_kernel_pages(struct ring_buffer *rb, bool final)
 	WRITE_ONCE(rb->shmem_pages_in, 0);
 }
 
+/*
+ * SHMEM memory is accounted once per user allocated event (via
+ * the syscall), since we can have at most NR_CPUS * nr_pages
+ * pinned pages at any given point in time, regardless of how
+ * many events there actually are.
+ *
+ * The first one (parent_rb==NULL) is where we do the accounting;
+ * it will also be the one coming from the syscall, so if it fails,
+ * we'll hand them back the error.
+ * Others just inherit and bump the counter; can't fail.
+ */
+static int
+rb_shmem_account(struct ring_buffer *rb, struct ring_buffer *parent_rb)
+{
+	unsigned long nr_pages = perf_rb_size(rb) >> PAGE_SHIFT;
+	int ret = 0;
+
+	if (parent_rb) {
+		/* "parent" rb *must* have accounting refcounter */
+		if (WARN_ON_ONCE(!parent_rb->acct_refcount))
+			return -EINVAL;
+
+		rb->acct_refcount = parent_rb->acct_refcount;
+		atomic_inc(rb->acct_refcount);
+
+		return 0;
+	}
+
+	/* All (data + aux + user page) in one go */
+	ret = __ring_buffer_account(rb, NULL, nr_pages,
+	                            &rb->mmap_locked);
+	if (ret)
+		return ret;
+
+	rb->acct_refcount = kmalloc(sizeof(*rb->acct_refcount),
+	                            GFP_KERNEL);
+	if (!rb->acct_refcount)
+		return -ENOMEM;
+
+	atomic_set(rb->acct_refcount, 1);
+
+	return 0;
+}
+
+static void rb_shmem_unaccount(struct ring_buffer *rb)
+{
+	if (!atomic_dec_and_test(rb->acct_refcount)) {
+		rb->acct_refcount = NULL;
+		return;
+	}
+
+	ring_buffer_unaccount(rb, false);
+	kfree(rb->acct_refcount);
+}
+
 /*
  * Allocate a ring_buffer for a detached event and attach it to this event.
  * There's one ring_buffer per detached event and vice versa, so
  * ring_buffer_attach() does not apply.
  */
 int rb_alloc_detached(struct perf_event *event, struct task_struct *task,
-		      struct mm_struct *mm)
+		      struct mm_struct *mm, struct ring_buffer *parent_rb)
 {
 	int aux_nr_pages = event->attr.detached_aux_nr_pages;
 	int nr_pages = event->attr.detached_nr_pages;
@@ -1116,18 +1184,22 @@ int rb_alloc_detached(struct perf_event *event, struct task_struct *task,
 	if (IS_ERR(rb))
 		return PTR_ERR(rb);
 
+	if (flags & RING_BUFFER_SHMEM) {
+		ret = rb_shmem_account(rb, parent_rb);
+		if (ret)
+			goto err_free;
+	}
+
 	if (aux_nr_pages) {
 		ret = rb_alloc_aux(rb, event, pgoff, aux_nr_pages, 0, flags);
 		if (ret)
-			goto err_free;
+			goto err_unaccount;
 	}
 
 	if (flags & RING_BUFFER_SHMEM) {
 		ret = rb_shmem_setup(event, task, rb);
-		if (ret) {
-			rb_free_aux(rb);
-			goto err_free;
-		}
+		if (ret)
+			goto err_free_aux;
 
 		rb_toggle_paused(rb, true);
 	} else {
@@ -1146,8 +1218,19 @@ int rb_alloc_detached(struct perf_event *event, struct task_struct *task,
 
 	return 0;
 
+err_free_aux:
+	if (!(flags & RING_BUFFER_SHMEM))
+		rb_free_aux(rb);
+
+err_unaccount:
+	if (flags & RING_BUFFER_SHMEM)
+		rb_shmem_unaccount(rb);
+
 err_free:
-	rb_free(rb);
+	if (flags & RING_BUFFER_SHMEM)
+		kfree(rb);
+	else
+		rb_free(rb);
 
 	return ret;
 }
@@ -1161,6 +1244,9 @@ void rb_free_detached(struct ring_buffer *rb, struct perf_event *event)
 		rb_shmem_unmap(event);
 		shmem_truncate_range(rb->shmem_file->f_inode, 0, (loff_t)-1);
 		rb_put_kernel_pages(rb, true);
+		rb_shmem_unaccount(rb);
+	} else {
+		ring_buffer_unaccount(rb, false);
 	}
 
 	atomic_set(&rb->aux_mmap_count, 0);
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 12/17] perf: Track pinned events per user
  2017-09-05 13:30 [RFC PATCH 00/17] perf: Detached events Alexander Shishkin
                   ` (10 preceding siblings ...)
  2017-09-05 13:30 ` [RFC PATCH 11/17] perf: Implement mlock accounting for shmem ring buffers Alexander Shishkin
@ 2017-09-05 13:30 ` Alexander Shishkin
  2017-09-05 13:30 ` [RFC PATCH 13/17] perf: Re-inject shmem buffers after exec Alexander Shishkin
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Alexander Shishkin @ 2017-09-05 13:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov,
	Borislav Petkov, rric, Alexander Shishkin

Maintain a per-user, cpu-indexed array of shmemfs-backed events, the
same way as the mlock accounting is done per user.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 include/linux/sched/user.h  |  6 ++++
 kernel/events/core.c        | 14 ++++-----
 kernel/events/ring_buffer.c | 69 +++++++++++++++++++++++++++++++++++++--------
 kernel/user.c               |  1 +
 4 files changed, 71 insertions(+), 19 deletions(-)

diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h
index 5d5415e129..bf10f95250 100644
--- a/include/linux/sched/user.h
+++ b/include/linux/sched/user.h
@@ -5,6 +5,7 @@
 #include <linux/atomic.h>
 
 struct key;
+struct perf_event;
 
 /*
  * Some day this will be a full-fledged user tracking system..
@@ -39,6 +40,11 @@ struct user_struct {
 #if defined(CONFIG_PERF_EVENTS) || defined(CONFIG_BPF_SYSCALL)
 	atomic_long_t locked_vm;
 #endif
+#ifdef CONFIG_PERF_EVENTS
+	atomic_long_t nr_pinnable_events;
+	struct mutex pinned_mutex;
+	struct perf_event ** __percpu pinned_events;
+#endif
 };
 
 extern int uids_sysfs_init(void);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 1fed69d4ba..e00f1f6aaf 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -384,7 +384,6 @@ static atomic_t perf_sched_count;
 static DEFINE_PER_CPU(atomic_t, perf_cgroup_events);
 static DEFINE_PER_CPU(int, perf_sched_cb_usages);
 static DEFINE_PER_CPU(struct pmu_event_list, pmu_sb_events);
-static DEFINE_PER_CPU(struct perf_event *, shmem_events);
 
 static atomic_t nr_mmap_events __read_mostly;
 static atomic_t nr_comm_events __read_mostly;
@@ -2086,7 +2085,8 @@ enum pin_event_t {
 
 static enum pin_event_t pin_event_pages(struct perf_event *event)
 {
-	struct perf_event **pinned_event = this_cpu_ptr(&shmem_events);
+	struct user_struct *user = event->rb->mmap_user;
+	struct perf_event **pinned_event = this_cpu_ptr(user->pinned_events);
 	struct perf_event *old_event = *pinned_event;
 
 	if (old_event == event)
@@ -4281,13 +4281,14 @@ static void _free_event(struct perf_event *event)
 	unaccount_event(event);
 
 	if (event->attach_state & PERF_ATTACH_SHMEM) {
+		struct user_struct *user = event->rb->mmap_user;
 		struct perf_event_context *ctx = event->ctx;
 		int cpu;
 
 		atomic_set(&event->xpinned, 0);
 		for_each_possible_cpu(cpu) {
 			struct perf_event **pinned_event =
-				per_cpu_ptr(&shmem_events, cpu);
+				per_cpu_ptr(user->pinned_events, cpu);
 
 			cmpxchg(pinned_event, event, NULL);
 		}
@@ -9530,7 +9531,7 @@ perf_event_detach(struct perf_event *event, struct perf_event *parent_event,
 {
 	struct ring_buffer *parent_rb = parent_event ? parent_event->rb : NULL;
 	char *filename;
-	int err;
+	int err = -ENOMEM;
 
 	filename = kasprintf(GFP_KERNEL, "%s:%x.event",
 			     task ? "task" : "cpu",
@@ -9550,10 +9551,9 @@ perf_event_detach(struct perf_event *event, struct perf_event *parent_event,
 	if (err) {
 		tracefs_remove(event->dent);
 		event->dent = NULL;
-		return err;
 	}
 
-	return 0;
+	return err;
 }
 /*
  * Allocate and initialize a event structure
@@ -10290,7 +10290,7 @@ SYSCALL_DEFINE5(perf_event_open,
 	}
 
 	if (detached) {
-		err = perf_event_detach(event, task, NULL);
+		err = perf_event_detach(event, NULL, task, NULL);
 		if (err)
 			goto err_context;
 
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 896d441642..8d37e4e591 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -563,6 +563,44 @@ void *perf_get_aux(struct perf_output_handle *handle)
 	return handle->rb->aux_priv;
 }
 
+static struct user_struct *get_users_pinned_events(void)
+{
+	struct user_struct *user = current_user(), *ret = NULL;
+
+	if (atomic_long_inc_not_zero(&user->nr_pinnable_events))
+		return user;
+
+	mutex_lock(&user->pinned_mutex);
+	if (!atomic_long_read(&user->nr_pinnable_events)) {
+		if (WARN_ON_ONCE(!!user->pinned_events))
+			goto unlock;
+
+		user->pinned_events = alloc_percpu(struct perf_event *);
+		if (!user->pinned_events) {
+			goto unlock;
+		} else {
+			atomic_long_inc(&user->nr_pinnable_events);
+			ret = get_current_user();
+		}
+	}
+
+unlock:
+	mutex_unlock(&user->pinned_mutex);
+
+	return ret;
+}
+
+static void put_users_pinned_events(struct user_struct *user)
+{
+	if (!atomic_long_dec_and_test(&user->nr_pinnable_events))
+		return;
+
+	mutex_lock(&user->pinned_mutex);
+	free_percpu(user->pinned_events);
+	user->pinned_events = NULL;
+	mutex_unlock(&user->pinned_mutex);
+}
+
 /*
  * Check if the current user can afford @nr_pages, considering the
  * perf_event_mlock sysctl and their mlock limit. If the former is exceeded,
@@ -574,11 +612,14 @@ static int __ring_buffer_account(struct ring_buffer *rb, struct mm_struct *mm,
                                  unsigned long nr_pages, unsigned long *locked)
 {
 	unsigned long total, limit, pinned;
+	struct user_struct *user;
 
 	if (!mm)
 		mm = rb->mmap_mapping;
 
-	rb->mmap_user = current_user();
+	user = get_users_pinned_events();
+	if (!user)
+		return -ENOMEM;
 
 	limit = sysctl_perf_event_mlock >> (PAGE_SHIFT - 10);
 
@@ -587,10 +628,7 @@ static int __ring_buffer_account(struct ring_buffer *rb, struct mm_struct *mm,
 	 */
 	limit *= num_online_cpus();
 
-	total = atomic_long_read(&rb->mmap_user->locked_vm) + nr_pages;
-
-	free_uid(rb->mmap_user);
-	rb->mmap_user = NULL;
+	total = atomic_long_read(&user->locked_vm) + nr_pages;
 
 	pinned = 0;
 	if (total > limit) {
@@ -599,7 +637,7 @@ static int __ring_buffer_account(struct ring_buffer *rb, struct mm_struct *mm,
 		 * limit needs to be accounted to the consumer's mm.
 		 */
 		if (!mm)
-			return -EPERM;
+			goto err_put_user;
 
 		pinned = total - limit;
 
@@ -608,9 +646,8 @@ static int __ring_buffer_account(struct ring_buffer *rb, struct mm_struct *mm,
 		total = mm->pinned_vm + pinned;
 
 		if ((total > limit) && perf_paranoid_tracepoint_raw() &&
-		    !capable(CAP_IPC_LOCK)) {
-			return -EPERM;
-		}
+		    !capable(CAP_IPC_LOCK))
+			goto err_put_user;
 
 		*locked = pinned;
 		mm->pinned_vm += pinned;
@@ -619,10 +656,15 @@ static int __ring_buffer_account(struct ring_buffer *rb, struct mm_struct *mm,
 	if (!rb->mmap_mapping)
 		rb->mmap_mapping = mm;
 
-	rb->mmap_user = get_current_user();
-	atomic_long_add(nr_pages, &rb->mmap_user->locked_vm);
+	rb->mmap_user = user;
+	atomic_long_add(nr_pages, &user->locked_vm);
 
 	return 0;
+
+err_put_user:
+	put_users_pinned_events(user);
+
+	return -EPERM;
 }
 
 static int ring_buffer_account(struct ring_buffer *rb, struct mm_struct *mm,
@@ -657,7 +699,7 @@ void ring_buffer_unaccount(struct ring_buffer *rb, bool aux)
 	if (rb->mmap_mapping)
 		rb->mmap_mapping->pinned_vm -= pinned;
 
-	free_uid(rb->mmap_user);
+	put_users_pinned_events(rb->mmap_user);
 }
 
 #define PERF_AUX_GFP	(GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY)
@@ -1124,6 +1166,7 @@ rb_shmem_account(struct ring_buffer *rb, struct ring_buffer *parent_rb)
 
 		rb->acct_refcount = parent_rb->acct_refcount;
 		atomic_inc(rb->acct_refcount);
+		rb->mmap_user = get_uid(parent_rb->mmap_user);
 
 		return 0;
 	}
@@ -1146,6 +1189,8 @@ rb_shmem_account(struct ring_buffer *rb, struct ring_buffer *parent_rb)
 
 static void rb_shmem_unaccount(struct ring_buffer *rb)
 {
+	free_uid(rb->mmap_user);
+
 	if (!atomic_dec_and_test(rb->acct_refcount)) {
 		rb->acct_refcount = NULL;
 		return;
diff --git a/kernel/user.c b/kernel/user.c
index 00281add65..e95a82d31d 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -185,6 +185,7 @@ struct user_struct *alloc_uid(kuid_t uid)
 
 		new->uid = uid;
 		atomic_set(&new->__count, 1);
+		mutex_init(&new->pinned_mutex);
 
 		/*
 		 * Before adding this, check whether we raced
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 13/17] perf: Re-inject shmem buffers after exec
  2017-09-05 13:30 [RFC PATCH 00/17] perf: Detached events Alexander Shishkin
                   ` (11 preceding siblings ...)
  2017-09-05 13:30 ` [RFC PATCH 12/17] perf: Track pinned events per user Alexander Shishkin
@ 2017-09-05 13:30 ` Alexander Shishkin
  2017-09-05 13:30 ` [RFC PATCH 14/17] perf: Add ioctl(REATTACH) for detached events Alexander Shishkin
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 34+ messages in thread
From: Alexander Shishkin @ 2017-09-05 13:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov,
	Borislav Petkov, rric, Alexander Shishkin

An exec will unmap everything, but we want our shmem buffers to persist.
This tells the page-pinning task work to re-mmap the event's ring buffer
after the task has exec'ed.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 kernel/events/core.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index e00f1f6aaf..f0b77b33b4 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6482,6 +6482,29 @@ static void perf_event_addr_filters_exec(struct perf_event *event, void *data)
 		perf_event_stop(event, 1);
 }
 
+static void perf_shmem_ctx_exec(struct perf_event_context *ctx)
+{
+	struct perf_event *event;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&ctx->lock, flags);
+
+	list_for_each_entry(event, &ctx->event_list, event_entry) {
+		if (event->attach_state & PERF_ATTACH_SHMEM) {
+			struct ring_buffer *rb;
+
+			/* called inside rcu read section */
+			rb = rcu_dereference(event->rb);
+			if (!rb)
+				continue;
+
+			rb->shmem_file_addr = 0;
+		}
+	}
+
+	raw_spin_unlock_irqrestore(&ctx->lock, flags);
+}
+
 void perf_event_exec(void)
 {
 	struct perf_event_context *ctx;
@@ -6497,6 +6520,7 @@ void perf_event_exec(void)
 
 		perf_iterate_ctx(ctx, perf_event_addr_filters_exec, NULL,
 				   true);
+		perf_shmem_ctx_exec(ctx);
 	}
 	rcu_read_unlock();
 }
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 14/17] perf: Add ioctl(REATTACH) for detached events
  2017-09-05 13:30 [RFC PATCH 00/17] perf: Detached events Alexander Shishkin
                   ` (12 preceding siblings ...)
  2017-09-05 13:30 ` [RFC PATCH 13/17] perf: Re-inject shmem buffers after exec Alexander Shishkin
@ 2017-09-05 13:30 ` Alexander Shishkin
  2017-10-03 14:50   ` Peter Zijlstra
  2017-09-05 13:30 ` [RFC PATCH 15/17] perf: Allow controlled non-root access to " Alexander Shishkin
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Alexander Shishkin @ 2017-09-05 13:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov,
	Borislav Petkov, rric, Alexander Shishkin

This adds an ioctl command to demote a detached event to a 'normal' one
that gets destroyed when its file descriptor is closed. The file
descriptor can still be used to mmap the buffers, but it is not very
useful otherwise.
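
A minimal userspace sketch of the intended use; PERF_EVENT_IOC_REATTACH
comes from this patch, while the tracefs mount point and event file name
below are only illustrative:

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>
	#include <sys/ioctl.h>
	#include <linux/perf_event.h>

	#ifndef PERF_EVENT_IOC_REATTACH
	#define PERF_EVENT_IOC_REATTACH	_IO('$', 10)
	#endif

	int main(void)
	{
		/* illustrative path: <tracefs>/perf/task:<hex>.event */
		int fd = open("/sys/kernel/tracing/perf/task:1a2b.event", O_RDWR);

		if (fd < 0) {
			perror("open");
			return 1;
		}

		/* From here on, closing fd tears the event down as usual. */
		if (ioctl(fd, PERF_EVENT_IOC_REATTACH) < 0)
			perror("ioctl(REATTACH)");

		close(fd);
		return 0;
	}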

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 include/uapi/linux/perf_event.h |  1 +
 kernel/events/core.c            | 32 +++++++++++++++++++++++++++++++-
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 4cdd4fab9d..ae54bd496d 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -435,6 +435,7 @@ struct perf_event_attr {
 #define PERF_EVENT_IOC_ID		_IOR('$', 7, __u64 *)
 #define PERF_EVENT_IOC_SET_BPF		_IOW('$', 8, __u32)
 #define PERF_EVENT_IOC_PAUSE_OUTPUT	_IOW('$', 9, __u32)
+#define PERF_EVENT_IOC_REATTACH		_IO ('$', 10)
 
 enum perf_event_ioc_flags {
 	PERF_IOC_FLAG_GROUP		= 1U << 0,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f0b77b33b4..fbee221d19 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4532,7 +4532,19 @@ EXPORT_SYMBOL_GPL(perf_event_release_kernel);
  */
 static int perf_release(struct inode *inode, struct file *file)
 {
-	perf_event_release_kernel(file->private_data);
+	struct perf_event *event = file->private_data;
+
+	/*
+	 * For a DETACHED event, perf_release() can't have the last reference,
+	 * because we grabbed one extra in the sys_perf_event_open, IOW it is
+	 * always put_event(). In order for it to be the last reference, we'd
+	 * first need to ioctl(REATTACH) on this event, which would drop the
+	 * PERF_ATTACH_DETACHED attach state.
+	 */
+	if (event->attach_state & PERF_ATTACH_DETACHED)
+		put_event(event);
+	else
+		perf_event_release_kernel(file->private_data);
 	return 0;
 }
 
@@ -4885,6 +4897,11 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
 	void (*func)(struct perf_event *);
 	u32 flags = arg;
 
+	if (event->attach_state & PERF_ATTACH_DETACHED &&
+	    cmd != PERF_EVENT_IOC_REATTACH &&
+	    cmd != PERF_EVENT_IOC_ID)
+		return -EINVAL;
+
 	switch (cmd) {
 	case PERF_EVENT_IOC_ENABLE:
 		func = _perf_event_enable;
@@ -4948,6 +4965,19 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
 		rcu_read_unlock();
 		return 0;
 	}
+	case PERF_EVENT_IOC_REATTACH:
+		/*
+		 * DETACHED state is serialized on ctx::mutex
+		 */
+		if (!is_detached_event(event))
+			return -EINVAL;
+
+		event->attach_state &= ~PERF_ATTACH_DETACHED;
+		tracefs_remove(event->dent);
+		event->dent = NULL;
+		put_event(event); /* can't be last */
+
+		return 0;
 	default:
 		return -ENOTTY;
 	}
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 15/17] perf: Allow controlled non-root access to detached events
  2017-09-05 13:30 [RFC PATCH 00/17] perf: Detached events Alexander Shishkin
                   ` (13 preceding siblings ...)
  2017-09-05 13:30 ` [RFC PATCH 14/17] perf: Add ioctl(REATTACH) for detached events Alexander Shishkin
@ 2017-09-05 13:30 ` Alexander Shishkin
  2017-10-03 14:53   ` Peter Zijlstra
  2017-09-05 13:30 ` [RFC PATCH 16/17] perf/x86/intel/pt: Add PMU info Alexander Shishkin
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 34+ messages in thread
From: Alexander Shishkin @ 2017-09-05 13:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov,
	Borislav Petkov, rric, Alexander Shishkin

The user who created the event should also be able to open its
corresponding file in tracefs and/or remove it.
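
A short sketch of the intended permission model from a non-owning user's
point of view (the path is illustrative):

	#include <errno.h>
	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		int fd = open("/sys/kernel/tracing/perf/task:1a2b.event", O_RDONLY);

		if (fd < 0 && errno == EACCES) {
			/* not the creating user and no CAP_SYS_ADMIN */
			fprintf(stderr, "not our event, access denied\n");
			return 1;
		}

		if (fd >= 0)
			close(fd);

		return 0;
	}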

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 kernel/events/core.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index fbee221d19..802c0862a9 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5579,7 +5579,7 @@ static int perf_fasync(int fd, struct file *filp, int on)
 static int perf_open(struct inode *inode, struct file *file)
 {
 	struct perf_event *event = inode->i_private;
-	int ret;
+	int ret = 0;
 
 	if (WARN_ON_ONCE(!event))
 		return -EINVAL;
@@ -5587,7 +5587,13 @@ static int perf_open(struct inode *inode, struct file *file)
 	if (!atomic_long_inc_not_zero(&event->refcount))
 		return -ENOENT;
 
-	ret = simple_open(inode, file);
+	/* event's user is stable while we're holding the reference */
+	if (event->rb->mmap_user != current_user() &&
+	    !capable(CAP_SYS_ADMIN))
+		ret = -EACCES;
+
+	if (!ret)
+		ret = simple_open(inode, file);
 	if (ret)
 		put_event(event);
 
@@ -9593,7 +9599,7 @@ perf_event_detach(struct perf_event *event, struct perf_event *parent_event,
 	if (!filename)
 		return -ENOMEM;
 
-	event->dent = tracefs_create_file(filename, 0600,
+	event->dent = tracefs_create_file(filename, 0666,
 					  perf_tracefs_dir,
 					  event, &perf_fops);
 	kfree(filename);
@@ -11521,6 +11527,7 @@ static int perf_instance_unlink(const char *name)
 {
 	struct perf_event *event;
 	struct dentry *dent;
+	int ret = 0;
 
 	dent = lookup_one_len_unlocked(name, perf_tracefs_dir, strlen(name));
 	if (!dent)
@@ -11530,6 +11537,18 @@ static int perf_instance_unlink(const char *name)
 	if (!event)
 		return -EINVAL;
 
+	if (!atomic_long_inc_not_zero(&event->refcount))
+		return 0;
+
+	/* event's user is stable while we're holding the reference */
+	if (event->rb->mmap_user != current_user() &&
+	    !capable(CAP_SYS_ADMIN))
+		ret = -EACCES;
+	put_event(event);
+
+	if (ret)
+		return ret;
+
 	if (!(event->attach_state & PERF_ATTACH_CONTEXT))
 		return -EBUSY;
 
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 16/17] perf/x86/intel/pt: Add PMU info
  2017-09-05 13:30 [RFC PATCH 00/17] perf: Detached events Alexander Shishkin
                   ` (14 preceding siblings ...)
  2017-09-05 13:30 ` [RFC PATCH 15/17] perf: Allow controlled non-root access to " Alexander Shishkin
@ 2017-09-05 13:30 ` Alexander Shishkin
  2017-09-05 13:30 ` [RFC PATCH 17/17] perf/x86/intel/bts: " Alexander Shishkin
  2017-09-06 16:24 ` [RFC PATCH 00/17] perf: Detached events Borislav Petkov
  17 siblings, 0 replies; 34+ messages in thread
From: Alexander Shishkin @ 2017-09-05 13:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov,
	Borislav Petkov, rric, Alexander Shishkin

Add PMU-specific data structure with family/model/stepping and clock
information required by the decoder.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/events/intel/pt.c | 23 ++++++++++++++++++++++-
 arch/x86/events/intel/pt.h | 11 +++++++++++
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 3b993942a0..053b96f491 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -35,6 +35,8 @@
 
 static DEFINE_PER_CPU(struct pt, pt_ctx);
 
+#define PMU_NAME "intel_pt"
+
 static struct pt_pmu pt_pmu;
 
 /*
@@ -271,6 +273,22 @@ static int __init pt_pmu_hw_init(void)
 	return ret;
 }
 
+static struct intel_pt_pmu_info pt_pmu_info;
+
+static void pt_pmu_info_setup(void)
+{
+	BUILD_BUG_ON(sizeof(pt_pmu_info) +
+	             sizeof(struct perf_event_mmap_page) > PAGE_SIZE);
+	pt_pmu_info.pi.note_size  = sizeof(pt_pmu_info.pi);
+	pt_pmu_info.pi.pmu_descsz = sizeof(pt_pmu_info) - pt_pmu_info.pi.note_size;
+	pt_pmu_info.x86_family    = boot_cpu_data.x86;
+	pt_pmu_info.x86_model     = boot_cpu_data.x86_model;
+	pt_pmu_info.x86_step      = boot_cpu_data.x86_mask;
+	pt_pmu_info.x86_tsc_max_nonturbo_ratio = pt_pmu.max_nonturbo_ratio;
+	pt_pmu_info.x86_tsc_to_art_numerator   = pt_pmu.tsc_art_num;
+	pt_pmu_info.x86_tsc_to_art_denominator = pt_pmu.tsc_art_den;
+}
+
 #define RTIT_CTL_CYC_PSB (RTIT_CTL_CYCLEACC	| \
 			  RTIT_CTL_CYC_THRESH	| \
 			  RTIT_CTL_PSB_FREQ)
@@ -1512,6 +1530,8 @@ static __init int pt_init(void)
 		return -ENODEV;
 	}
 
+	pt_pmu_info_setup();
+
 	if (!pt_cap_get(PT_CAP_topa_multiple_entries))
 		pt_pmu.pmu.capabilities =
 			PERF_PMU_CAP_AUX_NO_SG | PERF_PMU_CAP_AUX_SW_DOUBLEBUF;
@@ -1531,8 +1551,9 @@ static __init int pt_init(void)
 	pt_pmu.pmu.addr_filters_validate = pt_event_addr_filters_validate;
 	pt_pmu.pmu.nr_addr_filters       =
 		pt_cap_get(PT_CAP_num_address_ranges);
+	pt_pmu.pmu.pmu_info		 = &pt_pmu_info.pi;
 
-	ret = perf_pmu_register(&pt_pmu.pmu, "intel_pt", -1);
+	ret = perf_pmu_register(&pt_pmu.pmu, PMU_NAME, -1);
 
 	return ret;
 }
diff --git a/arch/x86/events/intel/pt.h b/arch/x86/events/intel/pt.h
index 25fa9710f4..fc19080ca3 100644
--- a/arch/x86/events/intel/pt.h
+++ b/arch/x86/events/intel/pt.h
@@ -189,4 +189,15 @@ struct pt {
 	int			vmx_on;
 };
 
+struct intel_pt_pmu_info {
+	struct pmu_info		pi;
+	u8			x86_family;
+	u8			x86_model;
+	u8			x86_step;
+	u8			x86_tsc_max_nonturbo_ratio;
+	u32			x86_tsc_to_art_numerator;
+	u32			x86_tsc_to_art_denominator;
+	u32			__reserved_0;
+};
+
 #endif /* __INTEL_PT_H__ */
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 17/17] perf/x86/intel/bts: Add PMU info
  2017-09-05 13:30 [RFC PATCH 00/17] perf: Detached events Alexander Shishkin
                   ` (15 preceding siblings ...)
  2017-09-05 13:30 ` [RFC PATCH 16/17] perf/x86/intel/pt: Add PMU info Alexander Shishkin
@ 2017-09-05 13:30 ` Alexander Shishkin
  2017-09-06 16:24 ` [RFC PATCH 00/17] perf: Detached events Borislav Petkov
  17 siblings, 0 replies; 34+ messages in thread
From: Alexander Shishkin @ 2017-09-05 13:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov,
	Borislav Petkov, rric, Alexander Shishkin

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/events/intel/bts.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c
index 16076eb346..ce1dac7115 100644
--- a/arch/x86/events/intel/bts.c
+++ b/arch/x86/events/intel/bts.c
@@ -21,13 +21,22 @@
 #include <linux/slab.h>
 #include <linux/debugfs.h>
 #include <linux/device.h>
-#include <linux/coredump.h>
 
 #include <asm-generic/sizes.h>
 #include <asm/perf_event.h>
 
 #include "../perf_event.h"
 
+#define PMU_NAME "intel_bts"
+
+static struct intel_bts_pmu_info {
+	struct pmu_info		pi;
+	u8			x86_family;
+	u8			x86_model;
+	u8			x86_step;
+	u8			__reserved_0[5];
+} bts_pmu_info;
+
 struct bts_ctx {
 	struct perf_output_handle	handle;
 	struct debug_store		ds_back;
@@ -582,6 +591,12 @@ static __init int bts_init(void)
 	if (!boot_cpu_has(X86_FEATURE_DTES64) || !x86_pmu.bts)
 		return -ENODEV;
 
+	bts_pmu_info.pi.note_size  = sizeof(bts_pmu_info.pi);
+	bts_pmu_info.pi.pmu_descsz = sizeof(bts_pmu_info) - bts_pmu_info.pi.note_size;
+	bts_pmu_info.x86_family    = boot_cpu_data.x86;
+	bts_pmu_info.x86_model	    = boot_cpu_data.x86_model;
+	bts_pmu_info.x86_step	    = boot_cpu_data.x86_mask;
+
 	bts_pmu.capabilities	= PERF_PMU_CAP_AUX_NO_SG | PERF_PMU_CAP_ITRACE |
 				  PERF_PMU_CAP_EXCLUSIVE;
 	bts_pmu.task_ctx_nr	= perf_sw_context;
@@ -593,7 +608,8 @@ static __init int bts_init(void)
 	bts_pmu.read		= bts_event_read;
 	bts_pmu.setup_aux	= bts_buffer_setup_aux;
 	bts_pmu.free_aux	= bts_buffer_free_aux;
+	bts_pmu.pmu_info	= &bts_pmu_info.pi;
 
-	return perf_pmu_register(&bts_pmu, "intel_bts", -1);
+	return perf_pmu_register(&bts_pmu, PMU_NAME, -1);
 }
 arch_initcall(bts_init);
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 00/17] perf: Detached events
  2017-09-05 13:30 [RFC PATCH 00/17] perf: Detached events Alexander Shishkin
                   ` (16 preceding siblings ...)
  2017-09-05 13:30 ` [RFC PATCH 17/17] perf/x86/intel/bts: " Alexander Shishkin
@ 2017-09-06 16:24 ` Borislav Petkov
  2017-09-13 11:54   ` Alexander Shishkin
  17 siblings, 1 reply; 34+ messages in thread
From: Borislav Petkov @ 2017-09-06 16:24 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, acme, kirill.shutemov, rric

On Tue, Sep 05, 2017 at 04:30:09PM +0300, Alexander Shishkin wrote:
> Detached events: a new flag to the perf syscall makes a 'detached' event,
> which exists after its file descriptor is released. Not all detached events
> are per-thread AUX events: this tries to take into account the need for
> system-wide persistent events too.

Nice, thanks!

> (2) Need to be able to kill those events, so they need to be accessible
> after they are created.
> Event files: detached events exist as files in tracefs (at the moment), can
> be opened/mmaped/read/removed.

I guess I'll see when I continue reading but I remember us doing ioctls
on the event fd.

> (6) Ring buffer memory accounting needs to take this new arrangement into
> account: one user can use up at most NR_CPUS * buffer_size memory at any
> given point in time.
> Only account the first such event and undo the accounting when the last
> event is gone.

... and I guess we probably shouldn't allow the user to create too many
events and shoot herself in the OOM-foot.

> (7) We'll also need to supply all the things that the [PT] decoder normally
> finds out via sysfs attributes, like clock ratios, capabilities, etc so that
> it also finds its way into the core dump file.
> "PMU info" structure is appended to the user page.
> 
> I've also hack the perf tool to support all this, all these things can be
> found at [1]. I'm not posting the tooling patches though, them being
> thoroughly ugly and proof-of-concept. In short, perf record will create
> detached events with '--detached' and afterwards will open detached events
> via their path in tracefs.

Sounds nice. I'd need to test all that so I can create detached RAS
events (which are tracepoints) with it.

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 01/17] perf: Allow mmapping only user page
  2017-09-05 13:30 ` [RFC PATCH 01/17] perf: Allow mmapping only user page Alexander Shishkin
@ 2017-09-06 16:28   ` Borislav Petkov
  2017-09-13 11:35     ` Alexander Shishkin
  0 siblings, 1 reply; 34+ messages in thread
From: Borislav Petkov @ 2017-09-06 16:28 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, acme, kirill.shutemov, rric

On Tue, Sep 05, 2017 at 04:30:10PM +0300, Alexander Shishkin wrote:
> The 'user page' contains offsets and sizes of data and aux areas of the
> ring buffer. If a user wants to mmap a pre-existing buffer, they need to
> know these in order to issue mmap()s with correct offsets and sizes.

Ok, stupid question: shouldn't this be a properly defined interface
instead of allowing userspace to poke inside the user page? Or are we
prepared to handle any changes in the layout of that user page and there
won't be any userspace crying because of it?

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 01/17] perf: Allow mmapping only user page
  2017-09-06 16:28   ` Borislav Petkov
@ 2017-09-13 11:35     ` Alexander Shishkin
  2017-09-13 12:58       ` Borislav Petkov
  0 siblings, 1 reply; 34+ messages in thread
From: Alexander Shishkin @ 2017-09-13 11:35 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, acme, kirill.shutemov, rric

Borislav Petkov <bp@alien8.de> writes:

> On Tue, Sep 05, 2017 at 04:30:10PM +0300, Alexander Shishkin wrote:
>> The 'user page' contains offsets and sizes of data and aux areas of the
>> ring buffer. If a user wants to mmap a pre-existing buffer, they need to
>> know these in order to issue mmap()s with correct offsets and sizes.
>
> Ok, stupid question: shouldn't this be a properly defined interface
> instead of allowing userspace to poke inside the user page? Or are we
> prepared to handle any changes in the layout of that user page and there
> won't be any userspace crying because of it?

Well, it is a 'defined' interface: there's the 'struct
perf_event_mmap_page' with versioning and whatnot, which is used for all
the ring buffer metainformation. Or am I misunderstanding your question?
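
For instance, with this series a consumer that's only handed the event
file could do roughly the following (just a sketch of the intent; the
path comes from wherever the tool found the event, and error handling
plus actual use of the mappings are left out):

  #include <linux/perf_event.h>
  #include <sys/mman.h>
  #include <fcntl.h>
  #include <unistd.h>

  static void map_detached(const char *path)	/* the file in tracefs */
  {
          int fd = open(path, O_RDWR);
          long psz = sysconf(_SC_PAGESIZE);
          struct perf_event_mmap_page *up;
          void *data, *aux;

          /* map the user page alone first -- what patch 01 allows ... */
          up = mmap(NULL, psz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

          /* ... then use the offsets/sizes it advertises */
          data = mmap(NULL, up->data_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, up->data_offset);
          aux  = mmap(NULL, up->aux_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, up->aux_offset);
  }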

Thanks,
--
Alex

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 00/17] perf: Detached events
  2017-09-06 16:24 ` [RFC PATCH 00/17] perf: Detached events Borislav Petkov
@ 2017-09-13 11:54   ` Alexander Shishkin
  0 siblings, 0 replies; 34+ messages in thread
From: Alexander Shishkin @ 2017-09-13 11:54 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, acme, kirill.shutemov, rric

Borislav Petkov <bp@alien8.de> writes:

> On Tue, Sep 05, 2017 at 04:30:09PM +0300, Alexander Shishkin wrote:
>> Detached events: a new flag to the perf syscall makes a 'detached' event,
>> which exists after its file descriptor is released. Not all detached events
>> are per-thread AUX events: this tries to take into account the need for
>> system-wide persistent events too.
>
> Nice, thanks!

Forgot to mention that I did hack the tracepoint support into the
tooling as well to make sure it's a workable idea.

>> (2) Need to be able to kill those events, so they need to be accessible
>> after they are created.
>> Event files: detached events exist as files in tracefs (at the moment), can
>> be opened/mmaped/read/removed.
>
> I guess I'll see when I continue reading but I remember us doing ioctls
> on the event fd.

Iirc that was for re-attaching to the event to make it 'normal' before
closing.

>> (6) Ring buffer memory accounting needs to take this new arrangement into
>> account: one user can use up at most NR_CPUS * buffer_size memory at any
>> given point in time.
>> Only account the first such event and undo the accounting when the last
>> event is gone.
>
> ... and I guess we probably shouldn't allow the user to create too many
> events and shoot herself in the OOM-foot.

Well, they are still limited by the RLIMIT_MEMLOCK and perf_event_mlock
sysctl for the total amount of memory that can be pinned for the ring
buffers at any given time, so that should be fine.
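
(Back-of-the-envelope, with made-up numbers: with the default-ish
perf_event_mlock_kb and N online CPUs, the worst case one user should be
able to keep pinned at any instant is roughly

  long mlock_kb = 516;	/* kernel.perf_event_mlock_kb, illustrative */
  long ncpus    = sysconf(_SC_NPROCESSORS_ONLN);
  long worst_kb = mlock_kb * ncpus;	/* plus whatever RLIMIT_MEMLOCK allows */

and anything beyond that should fail the allocation rather than pile up.)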

>> (7) We'll also need to supply all the things that the [PT] decoder normally
>> finds out via sysfs attributes, like clock ratios, capabilities, etc so that
>> it also finds its way into the core dump file.
>> "PMU info" structure is appended to the user page.
>> 
>> I've also hack the perf tool to support all this, all these things can be
>> found at [1]. I'm not posting the tooling patches though, them being
>> thoroughly ugly and proof-of-concept. In short, perf record will create
>> detached events with '--detached' and afterwards will open detached events
>> via their path in tracefs.
>
> Sounds nice. I'd need to test all that just so I can be able to create
> detached RAS events (which are tracepoints) with it.

Thanks, let me know how it goes.

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 01/17] perf: Allow mmapping only user page
  2017-09-13 11:35     ` Alexander Shishkin
@ 2017-09-13 12:58       ` Borislav Petkov
  0 siblings, 0 replies; 34+ messages in thread
From: Borislav Petkov @ 2017-09-13 12:58 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, acme, kirill.shutemov, rric

On Wed, Sep 13, 2017 at 02:35:42PM +0300, Alexander Shishkin wrote:
> Well, it is a 'defined' interface: there's the 'struct
> perf_event_mmap_page' with versioning and whatnot, which is used for all
> the ring buffer metainformation. Or am I misunderstanding your question?

No, you're not. Looking at perf_event_mmap_page, sounds like my concerns are
addressed. :)

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 05/17] perf: Introduce detached events
  2017-09-05 13:30 ` [RFC PATCH 05/17] perf: Introduce detached events Alexander Shishkin
@ 2017-10-03 14:34   ` Peter Zijlstra
  2017-10-06 11:23     ` Alexander Shishkin
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2017-10-03 14:34 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov, Borislav Petkov, rric

On Tue, Sep 05, 2017 at 04:30:14PM +0300, Alexander Shishkin wrote:
> There are use cases where it is desirable to have perf events without the
> userspace tool running in the background to keep them alive, and instead
> to only collect the data when it is needed, for example when an MCE event
> is triggered.
> 
> This patch adds a new flag to the perf_event_open() syscall that allows
> creating such events. Once created, the file descriptor can be closed
> and the event continues to exist on its own. To allow access to this
> event, a file is created in the tracefs, which the user can open.
> 
> Finally, when it is no longer needed, it can be destroyed by unlinking
> the file.
> 

> @@ -9387,6 +9416,27 @@ static void account_event(struct perf_event *event)
>  	account_pmu_sb_event(event);
>  }
>  
> +static int perf_event_detach(struct perf_event *event, struct task_struct *task,
> +			     struct mm_struct *mm)
> +{
> +	char *filename;
> +
> +	filename = kasprintf(GFP_KERNEL, "%s:%x.event",
> +			     task ? "task" : "cpu",
> +			     hash_64((u64)event, PERF_TRACEFS_HASH_BITS));
> +	if (!filename)
> +		return -ENOMEM;
> +
> +	event->dent = tracefs_create_file(filename, 0600,
> +					  perf_tracefs_dir,
> +					  event, &perf_fops);
> +	kfree(filename);
> +
> +	if (!event->dent)
> +		return -ENOMEM;
> +
> +	return 0;
> +}

So I'm not opposed to the idea of creating events that live independently
of file descriptors. And stuffing them in a filesystem makes sense.
However, I'm not entirely convinced by the details.

The above has a number of problems:

 - there's a filesystem race; two concurrent syscalls can try and create
   the same file. In that case the error most certainly is not -ENOMEM.

 - there's a hash collision, similar issue.

 - there's some asymmetry in the create/destroy; that is you create the
   file with sys_perf_event_open() and remove it with unlink().

 - the actual name is very opaque and hard to use; how would a tool find
   the right event to open?


Would it instead make sense to allow the user to creat() their own files
in this filesystem (with whatever descriptive name they need) and then
pass that fd like:

  sys_perf_event_open(.group_fd=fd, .flags=PERF_FLAG_FD_DETACH);

or something to associate the file with the event. Of course, that makes
it very hard to create detached cgroup events :/
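
Something like this, I mean (flag and path names invented, only to
illustrate the flow):

  int file = open("/sys/kernel/tracing/perf/my-ras-event",
                  O_CREAT | O_RDWR, 0600);	/* user picks the name */

  int efd = syscall(__NR_perf_event_open, &attr, pid, cpu,
                    file /* group_fd reused to pass the file */,
                    PERF_FLAG_FD_DETACH);
  close(efd);					/* the event lives on */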

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 06/17] perf: Add buffers to the detached events
  2017-09-05 13:30 ` [RFC PATCH 06/17] perf: Add buffers to the " Alexander Shishkin
@ 2017-10-03 14:36   ` Peter Zijlstra
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2017-10-03 14:36 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov, Borislav Petkov, rric

On Tue, Sep 05, 2017 at 04:30:15PM +0300, Alexander Shishkin wrote:
> @@ -415,6 +416,8 @@ struct perf_event_attr {
>  	__u32	aux_watermark;
>  	__u16	sample_max_stack;
>  	__u16	__reserved_2;	/* align to __u64 */
> +	__u32	detached_nr_pages;
> +	__u32	detached_aux_nr_pages;
>  };

Not sure the naming makes sense; I don't see why this would be limited
to detached events. That is, what would stop someone from pre-allocating
buffers on 'regular' events if they wanted to?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 07/17] perf: Add pmu_info to user page
  2017-09-05 13:30 ` [RFC PATCH 07/17] perf: Add pmu_info to user page Alexander Shishkin
@ 2017-10-03 14:40   ` Peter Zijlstra
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2017-10-03 14:40 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov, Borislav Petkov, rric

On Tue, Sep 05, 2017 at 04:30:16PM +0300, Alexander Shishkin wrote:
> Allow PMUs to supply additional static information that may be required
> by their decoders. Most of what the Intel PT driver exports as capability
> attributes (timing packet frequencies, frequency ratios, etc.), its decoder
> needs in order to decode its binary stream correctly. However, when
> decoding Intel PT stream from a core dump, we can't rely on the sysfs
> attributes, so we need to pack this information into the perf buffer,
> so that the resulting core dump is self-contained.
> 
> In order to do this, we append a PMU-specific structure to the user
> page. Such structures will include size, for versioning.
> 

> @@ -508,6 +513,18 @@ struct perf_addr_filters_head {
>  	unsigned int		nr_file_filters;
>  };
>  
> +struct pmu_info {
> +	/*
> +	 * Size of this structure, for versioning.
> +	 */
> +	u32	note_size;
> +
> +	/*
> +	 * Size of the container structure, not including this one
> +	 */
> +	u32	pmu_descsz;
> +};
> +
>  /**
>   * enum perf_event_active_state - the states of a event
>   */
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 3d64d9ea80..4cdd4fab9d 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -599,6 +599,16 @@ struct perf_event_mmap_page {
>  	__u64	aux_tail;
>  	__u64	aux_offset;
>  	__u64	aux_size;
> +
> +	/*
> +	 * PMU data: static info that (AUX) decoder wants to know in order to
> +	 * decode correctly:
> +	 *
> +	 *   pmu_offset >= sizeof(struct perf_event_mmap_page)
> +	 *   pmu_offset + pmu_size <= PAGE_SIZE
> +	 */
> +	__u64	pmu_offset;
> +	__u64	pmu_size;
>  };

Why like this? Why not dump the data as part of
PERF_RECORD_ITRACE_START/PERF_RECORD_AUX ?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 08/17] perf: Allow inheritance for detached events
  2017-09-05 13:30 ` [RFC PATCH 08/17] perf: Allow inheritance for detached events Alexander Shishkin
@ 2017-10-03 14:42   ` Peter Zijlstra
  2017-10-06 11:40     ` Alexander Shishkin
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2017-10-03 14:42 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov, Borislav Petkov, rric

On Tue, Sep 05, 2017 at 04:30:17PM +0300, Alexander Shishkin wrote:
> This enables inheritance for detached events. Unlike traditional events,
> these do not have parents: inheritance produces a new independent event
> with the same attribute. If the 'parent' event has a ring buffer, so will
> the new event. Considering the mlock accounting, this buffer allocation
> may fail, which in turn will fail the parent's fork, something to be
> aware of.
> 
> This also effectively disables context cloning, because unlike the
> traditional events, these will each have its own ring buffer and
> context switch optimization can't work.

Right, so this thing is icky... as you know. More naming issues though,
what will you go and call those files?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 09/17] perf: Use shmemfs pages for userspace-only per-thread detached events
  2017-09-05 13:30 ` [RFC PATCH 09/17] perf: Use shmemfs pages for userspace-only per-thread " Alexander Shishkin
@ 2017-10-03 14:43   ` Peter Zijlstra
  2017-10-06 11:52     ` Alexander Shishkin
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2017-10-03 14:43 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov, Borislav Petkov, rric

On Tue, Sep 05, 2017 at 04:30:18PM +0300, Alexander Shishkin wrote:
> In order to work around the problem of using up mlocked memory for the
> detached events, we can pin the ring buffer pages only while they are
> in use (that is, the event is ACTIVE), and unpin them for the rest of
> the time. When not pinned in, these pages can be swapped out. This way,
> one user can have at most mlock_limit*nr_cpus kB of memory pinned at
> any given moment, however many events they actually have.
> 
> This enforces a constraint: pinning and unpinning may sleep and thus
> can't be done in the event scheduling path. Instead, we use a task
> work to do this, which limits this pattern to userspace-only events.
> Also, since one userspace thread only needs one buffer (for whatever
> CPU it's running on at any given moment), we only do this for per-thread
> events.
> 
> The source for such swappable pages is shmemfs. This patch allows
> allocating perf ring buffer pages from an shmemfs file if the above
> constraints are met.

Right, so why still allow that previous icky thing? What cases do we
need that for?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 14/17] perf: Add ioctl(REATTACH) for detached events
  2017-09-05 13:30 ` [RFC PATCH 14/17] perf: Add ioctl(REATTACH) for detached events Alexander Shishkin
@ 2017-10-03 14:50   ` Peter Zijlstra
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2017-10-03 14:50 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov, Borislav Petkov, rric

On Tue, Sep 05, 2017 at 04:30:23PM +0300, Alexander Shishkin wrote:
> This adds an ioctl command to demote a detached event to a 'normal' one
> that gets destroyed when its file descriptor is closed. It can still be
> used to mmap the buffers, but not very useful otherwise.

why not simply use the fd obtained from open() on our special
filesystem?

If you open and then unlink, you lose the 'detached' state and the
filedesc is the only life-line.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 15/17] perf: Allow controlled non-root access to detached events
  2017-09-05 13:30 ` [RFC PATCH 15/17] perf: Allow controlled non-root access to " Alexander Shishkin
@ 2017-10-03 14:53   ` Peter Zijlstra
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2017-10-03 14:53 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov, Borislav Petkov, rric

On Tue, Sep 05, 2017 at 04:30:24PM +0300, Alexander Shishkin wrote:
> @@ -5587,7 +5587,13 @@ static int perf_open(struct inode *inode, struct file *file)
>  	if (!atomic_long_inc_not_zero(&event->refcount))
>  		return -ENOENT;
>  
> -	ret = simple_open(inode, file);
> +	/* event's user is stable while we're holding the reference */
> +	if (event->rb->mmap_user != current_user() &&
> +	    !capable(CAP_SYS_ADMIN))
> +		ret = -EACCES;
> +
> +	if (!ret)
> +		ret = simple_open(inode, file);
>  	if (ret)
>  		put_event(event);
>  

> @@ -11530,6 +11537,18 @@ static int perf_instance_unlink(const char *name)
>  	if (!event)
>  		return -EINVAL;
>  
> +	if (!atomic_long_inc_not_zero(&event->refcount))
> +		return 0;
> +
> +	/* event's user is stable while we're holding the reference */
> +	if (event->rb->mmap_user != current_user() &&
> +	    !capable(CAP_SYS_ADMIN))
> +		ret = -EACCES;
> +	put_event(event);
> +
> +	if (ret)
> +		return ret;
> +
>  	if (!(event->attach_state & PERF_ATTACH_CONTEXT))
>  		return -EBUSY;
>  

Why aren't we using regular file permissions for this?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 05/17] perf: Introduce detached events
  2017-10-03 14:34   ` Peter Zijlstra
@ 2017-10-06 11:23     ` Alexander Shishkin
  0 siblings, 0 replies; 34+ messages in thread
From: Alexander Shishkin @ 2017-10-06 11:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, acme, kirill.shutemov,
	rric, alexander.shishkin

Peter Zijlstra <peterz@infradead.org> writes:

> So I'm not opposed to the idea of creating events that live independently
> of file descriptors. And stuffing them in a filesystem makes sense.
> However, I'm not entirely convinced by the details.
>
> The above has a number of problems:
>
>  - there's a filesystem race; two concurrent syscalls can try and create
>    the same file. In that case the error most certainly is not -ENOMEM.

Indeed.

>  - there's a hash collision, similar issue.
>
>  - there's some asymmetry in the create/destroy; that is you create the
>    file with sys_perf_event_open() and remove it with unlink().

There is also an ioctl() to turn it into a normal event fd that can then
be closed.
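
E.g. (sketch only):

  fd = open(path_in_tracefs, O_RDWR);
  ioctl(fd, PERF_EVENT_IOC_REATTACH, 0);
  close(fd);	/* now the last reference: the event is gone */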

>  - the actual name is very opaque and hard to use; how would a tool find
>    the right event to open?

They can readlink("/proc/self/fd/$fd"), something that I hacked into the
perf tool as well, although, truth be told, I didn't actually need it for
anything, partly because it's not a useful name. One use case that I
could think of would be a task that's inherited a detached event wanting
to get rid of it. They can scan their /proc/$pid/maps, find the vma by
name and use that to locate the file.
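
(I.e. something along these lines, just to show the idea:

  char lnk[64], path[PATH_MAX];

  snprintf(lnk, sizeof(lnk), "/proc/self/fd/%d", fd);
  ssize_t n = readlink(lnk, path, sizeof(path) - 1);
  if (n > 0)
          path[n] = '\0';	/* ".../perf/task:<hash>.event" with this series */
)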

> Would it instead make sense to allow the user to creat() their own files
> in this filesystem (with whatever descriptive name they need) and then
> pass that fd like:
>
>   sys_perf_event_open(.group_fd=fd, .flags=PERF_FLAG_FD_DETACH);
>
> or something to associate the file with the event. Of course, that makes
> it very hard to create detached cgroup events :/

Yes, I like the idea of moving the burden of naming to userspace,
but then we have a problem with inheritance, which would still produce
new events without the user's input.

Maybe use a directory for the 'parent' event? Then the above would still
work.
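
Something like (layout invented, just to illustrate):

  perf/
      my-trace/			<- created/named by the user
          task:1f3a.event	<- per-child events added on inherit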

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 08/17] perf: Allow inheritance for detached events
  2017-10-03 14:42   ` Peter Zijlstra
@ 2017-10-06 11:40     ` Alexander Shishkin
  0 siblings, 0 replies; 34+ messages in thread
From: Alexander Shishkin @ 2017-10-06 11:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov, Borislav Petkov, rric

Peter Zijlstra <peterz@infradead.org> writes:

> On Tue, Sep 05, 2017 at 04:30:17PM +0300, Alexander Shishkin wrote:
>> This enables inheritance for detached events. Unlike traditional events,
>> these do not have parents: inheritance produces a new independent event
>> with the same attribute. If the 'parent' event has a ring buffer, so will
>> the new event. Considering the mlock accounting, this buffer allocation
>> may fail, which in turn will fail the parent's fork, something to be
>> aware of.
>> 
>> This also effectively disables context cloning, because unlike the
>> traditional events, these will each have its own ring buffer and
>> context switch optimization can't work.
>
> Right, so this thing is icky... as you know. More naming issues though,
> what will you go and call those files.

Yes. The failing-the-fork ickiness is dealt with later on in 11/17.
But true about the naming.

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 09/17] perf: Use shmemfs pages for userspace-only per-thread detached events
  2017-10-03 14:43   ` Peter Zijlstra
@ 2017-10-06 11:52     ` Alexander Shishkin
  0 siblings, 0 replies; 34+ messages in thread
From: Alexander Shishkin @ 2017-10-06 11:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, acme, kirill.shutemov, Borislav Petkov, rric

Peter Zijlstra <peterz@infradead.org> writes:

> On Tue, Sep 05, 2017 at 04:30:18PM +0300, Alexander Shishkin wrote:
>> In order to work around the problem of using up mlocked memory for the
>> detached events, we can pin the ring buffer pages only while they are
>> in use (that is, the event is ACTIVE), and unpin them for the rest of
>> the time. When not pinned in, these pages can be swapped out. This way,
>> one user can have at most mlock_limit*nr_cpus kB of memory pinned at
>> any given moment, however many events they actually have.
>> 
>> This enforces a constraint: pinning and unpinning may sleep and thus
>> can't be done in the event scheduling path. Instead, we use a task
>> work to do this, which limits this pattern to userspace-only events.
>> Also, since one userspace thread only needs one buffer (for whatever
>> CPU it's running on at any given moment), we only do this for per-thread
>> events.
>> 
>> The source for such swappable pages is shmemfs. This patch allows
>> allocating perf ring buffer pages from an shmemfs file if the above
>> constraints are met.
>
> Right, so why still allow that previous icky thing? What cases do we
> need that for?

8/17..12/17 are really one patch split into smaller chunks. The first
one does the icky thing and then we get to what we actually want.

The idea is that you won't be able to enable inheritance for detached
events unless they are shmem-backed.
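
I.e. roughly (a sketch of the intent, not the literal check in the
series; rb_is_shmem() is a made-up name):

  if (event->attr.inherit && is_detached_event(event) &&
      !rb_is_shmem(event->rb))
          return -EINVAL;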

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 03/17] tracefs: De-globalize instances' callbacks
  2017-09-05 13:30 ` [RFC PATCH 03/17] tracefs: De-globalize instances' callbacks Alexander Shishkin
@ 2018-01-24 18:54   ` Steven Rostedt
  0 siblings, 0 replies; 34+ messages in thread
From: Steven Rostedt @ 2018-01-24 18:54 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, acme, kirill.shutemov,
	Borislav Petkov, rric


I just stumbled across this patch (and the following one). What purpose
would this have? The "instances" directory is used to make multiple
buffers. Why would we have more than one?

-- Steve


On Tue,  5 Sep 2017 16:30:12 +0300
Alexander Shishkin <alexander.shishkin@linux.intel.com> wrote:

> Currently, tracefs has exactly one special 'instances' subdirectory, where
> the caller can have their own .mkdir/.rmdir callbacks, which allow the
> caller to handle user's mkdir/rmdir inside that directory. Tracefs allows
> one set of these callbacks (tracefs_dir_ops).
> 
> This patch de-globalizes tracefs_dir_ops so that it's possible to have
> multiple such subdirectories.
> 
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> ---
>  fs/tracefs/inode.c | 35 +++++++++++++++++++++++++----------
>  1 file changed, 25 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
> index bea8ad876b..b14f03a655 100644
> --- a/fs/tracefs/inode.c
> +++ b/fs/tracefs/inode.c
> @@ -50,10 +50,10 @@ static const struct file_operations tracefs_file_operations = {
>  	.llseek =	noop_llseek,
>  };
>  
> -static struct tracefs_dir_ops {
> +struct tracefs_dir_ops {
>  	int (*mkdir)(const char *name);
>  	int (*rmdir)(const char *name);
> -} tracefs_ops;
> +};
>  
>  static char *get_dname(struct dentry *dentry)
>  {
> @@ -72,6 +72,7 @@ static char *get_dname(struct dentry *dentry)
>  
>  static int tracefs_syscall_mkdir(struct inode *inode, struct dentry *dentry, umode_t mode)
>  {
> +	struct tracefs_dir_ops *tracefs_ops = dentry->d_parent->d_fsdata;
>  	char *name;
>  	int ret;
>  
> @@ -85,7 +86,7 @@ static int tracefs_syscall_mkdir(struct inode *inode, struct dentry *dentry, umo
>  	 * mkdir routine to handle races.
>  	 */
>  	inode_unlock(inode);
> -	ret = tracefs_ops.mkdir(name);
> +	ret = tracefs_ops->mkdir(name);
>  	inode_lock(inode);
>  
>  	kfree(name);
> @@ -95,6 +96,7 @@ static int tracefs_syscall_mkdir(struct inode *inode, struct dentry *dentry, umo
>  
>  static int tracefs_syscall_rmdir(struct inode *inode, struct dentry *dentry)
>  {
> +	struct tracefs_dir_ops *tracefs_ops = dentry->d_fsdata;
>  	char *name;
>  	int ret;
>  
> @@ -112,7 +114,7 @@ static int tracefs_syscall_rmdir(struct inode *inode, struct dentry *dentry)
>  	inode_unlock(inode);
>  	inode_unlock(dentry->d_inode);
>  
> -	ret = tracefs_ops.rmdir(name);
> +	ret = tracefs_ops->rmdir(name);
>  
>  	inode_lock_nested(inode, I_MUTEX_PARENT);
>  	inode_lock(dentry->d_inode);
> @@ -342,6 +344,9 @@ static struct dentry *start_creating(const char *name, struct dentry *parent)
>  	if (IS_ERR(dentry)) {
>  		inode_unlock(parent->d_inode);
>  		simple_release_fs(&tracefs_mount, &tracefs_mount_count);
> +	} else {
> +		/* propagate dir ops */
> +		dentry->d_fsdata = parent->d_fsdata;
>  	}
>  
>  	return dentry;
> @@ -482,18 +487,25 @@ struct dentry *tracefs_create_instance_dir(const char *name, struct dentry *pare
>  					  int (*mkdir)(const char *name),
>  					  int (*rmdir)(const char *name))
>  {
> +	struct tracefs_dir_ops *tracefs_ops = parent ? parent->d_fsdata : NULL;
>  	struct dentry *dentry;
>  
> -	/* Only allow one instance of the instances directory. */
> -	if (WARN_ON(tracefs_ops.mkdir || tracefs_ops.rmdir))
> +	if (WARN_ON(tracefs_ops))
> +		return NULL;
> +
> +	tracefs_ops = kzalloc(sizeof(*tracefs_ops), GFP_KERNEL);
> +	if (!tracefs_ops)
>  		return NULL;
>  
>  	dentry = __create_dir(name, parent, &tracefs_dir_inode_operations);
> -	if (!dentry)
> +	if (!dentry) {
> +		kfree(tracefs_ops);
>  		return NULL;
> +	}
>  
> -	tracefs_ops.mkdir = mkdir;
> -	tracefs_ops.rmdir = rmdir;
> +	tracefs_ops->mkdir = mkdir;
> +	tracefs_ops->rmdir = rmdir;
> +	dentry->d_fsdata = tracefs_ops;
>  
>  	return dentry;
>  }
> @@ -513,8 +525,11 @@ static int __tracefs_remove(struct dentry *dentry, struct dentry *parent)
>  				simple_unlink(parent->d_inode, dentry);
>  				break;
>  			}
> -			if (!ret)
> +			if (!ret) {
>  				d_delete(dentry);
> +				if (dentry->d_fsdata != parent->d_fsdata)
> +					kfree(dentry->d_fsdata);
> +			}
>  			dput(dentry);
>  		}
>  	}

^ permalink raw reply	[flat|nested] 34+ messages in thread

Thread overview: 34+ messages
2017-09-05 13:30 [RFC PATCH 00/17] perf: Detached events Alexander Shishkin
2017-09-05 13:30 ` [RFC PATCH 01/17] perf: Allow mmapping only user page Alexander Shishkin
2017-09-06 16:28   ` Borislav Petkov
2017-09-13 11:35     ` Alexander Shishkin
2017-09-13 12:58       ` Borislav Petkov
2017-09-05 13:30 ` [RFC PATCH 02/17] perf: Factor out mlock accounting Alexander Shishkin
2017-09-05 13:30 ` [RFC PATCH 03/17] tracefs: De-globalize instances' callbacks Alexander Shishkin
2018-01-24 18:54   ` Steven Rostedt
2017-09-05 13:30 ` [RFC PATCH 04/17] tracefs: Add ->unlink callback to tracefs_dir_ops Alexander Shishkin
2017-09-05 13:30 ` [RFC PATCH 05/17] perf: Introduce detached events Alexander Shishkin
2017-10-03 14:34   ` Peter Zijlstra
2017-10-06 11:23     ` Alexander Shishkin
2017-09-05 13:30 ` [RFC PATCH 06/17] perf: Add buffers to the " Alexander Shishkin
2017-10-03 14:36   ` Peter Zijlstra
2017-09-05 13:30 ` [RFC PATCH 07/17] perf: Add pmu_info to user page Alexander Shishkin
2017-10-03 14:40   ` Peter Zijlstra
2017-09-05 13:30 ` [RFC PATCH 08/17] perf: Allow inheritance for detached events Alexander Shishkin
2017-10-03 14:42   ` Peter Zijlstra
2017-10-06 11:40     ` Alexander Shishkin
2017-09-05 13:30 ` [RFC PATCH 09/17] perf: Use shmemfs pages for userspace-only per-thread " Alexander Shishkin
2017-10-03 14:43   ` Peter Zijlstra
2017-10-06 11:52     ` Alexander Shishkin
2017-09-05 13:30 ` [RFC PATCH 10/17] perf: Implement pinning and scheduling for SHMEM events Alexander Shishkin
2017-09-05 13:30 ` [RFC PATCH 11/17] perf: Implement mlock accounting for shmem ring buffers Alexander Shishkin
2017-09-05 13:30 ` [RFC PATCH 12/17] perf: Track pinned events per user Alexander Shishkin
2017-09-05 13:30 ` [RFC PATCH 13/17] perf: Re-inject shmem buffers after exec Alexander Shishkin
2017-09-05 13:30 ` [RFC PATCH 14/17] perf: Add ioctl(REATTACH) for detached events Alexander Shishkin
2017-10-03 14:50   ` Peter Zijlstra
2017-09-05 13:30 ` [RFC PATCH 15/17] perf: Allow controlled non-root access to " Alexander Shishkin
2017-10-03 14:53   ` Peter Zijlstra
2017-09-05 13:30 ` [RFC PATCH 16/17] perf/x86/intel/pt: Add PMU info Alexander Shishkin
2017-09-05 13:30 ` [RFC PATCH 17/17] perf/x86/intel/bts: " Alexander Shishkin
2017-09-06 16:24 ` [RFC PATCH 00/17] perf: Detached events Borislav Petkov
2017-09-13 11:54   ` Alexander Shishkin
