linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter
@ 2015-07-22  8:09 Kaixu Xia
  2015-07-22  8:09 ` [PATCH v2 1/5] bpf: Add new bpf map type to store the pointer to struct perf_event Kaixu Xia
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Kaixu Xia @ 2015-07-22  8:09 UTC (permalink / raw)
  To: ast, davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt, jolsa
  Cc: xiakaixu, wangnan0, linux-kernel, pi3orama, hekuang

Previous patch v1 URL:
https://lkml.org/lkml/2015/7/17/287

This patchset allows users to read PMU events in the following way:
 1. Open the PMU using perf_event_open() (for each CPU or each
    process they'd like to watch);
 2. Create a BPF_MAP_TYPE_PERF_EVENT_ARRAY BPF map;
 3. Insert the FDs into the map with some key-value mapping scheme
    (i.e. cpuid -> event on that CPU);
 4. Load and attach eBPF programs as usual;
 5. In the eBPF program, get the perf_event_map_fd and key (i.e.
    the cpuid obtained from bpf_get_smp_processor_id()), then use
    bpf_perf_event_read() to read from it;
 6. Do anything they want with the value.

Changes in v2:
 - put atomic_long_inc_not_zero() between fdget() and fdput();
 - limit the event type to PERF_TYPE_RAW and PERF_TYPE_HARDWARE;
 - only read the event counter on the current CPU or in the current
   process;
 - add the new map type BPF_MAP_TYPE_PERF_EVENT_ARRAY to store
   pointers to struct perf_event;
 - according to the perf_event_map_fd and key, the function
   bpf_perf_event_read() can get the hardware PMU counter value;

Patch 5/5 is a simple example that shows how to use this new eBPF
ability. The PMU counter data can be found in
/sys/kernel/debug/tracing/trace (or trace_pipe); the output below
shows the cycles PMU value sampled at 'kprobe/sys_write':

  $ cat /sys/kernel/debug/tracing/trace_pipe
  $ ./tracex6
       ...
             cat-677   [002] d..1   210.299270: : bpf count: CPU-2  5316659
             cat-677   [002] d..1   210.299316: : bpf count: CPU-2  5378639
             cat-677   [002] d..1   210.299362: : bpf count: CPU-2  5440654
             cat-677   [002] d..1   210.299408: : bpf count: CPU-2  5503211
             cat-677   [002] d..1   210.299454: : bpf count: CPU-2  5565438
             cat-677   [002] d..1   210.299500: : bpf count: CPU-2  5627433
             cat-677   [002] d..1   210.299547: : bpf count: CPU-2  5690033
             cat-677   [002] d..1   210.299593: : bpf count: CPU-2  5752184
             cat-677   [002] d..1   210.299639: : bpf count: CPU-2  5814543
           <...>-548   [009] d..1   210.299667: : bpf count: CPU-9  605418074
           <...>-548   [009] d..1   210.299692: : bpf count: CPU-9  605452692
             cat-677   [002] d..1   210.299700: : bpf count: CPU-2  5896319
           <...>-548   [009] d..1   210.299710: : bpf count: CPU-9  605477824
           <...>-548   [009] d..1   210.299728: : bpf count: CPU-9  605501726
           <...>-548   [009] d..1   210.299745: : bpf count: CPU-9  605525279
           <...>-548   [009] d..1   210.299762: : bpf count: CPU-9  605547817
           <...>-548   [009] d..1   210.299778: : bpf count: CPU-9  605570433
           <...>-548   [009] d..1   210.299795: : bpf count: CPU-9  605592743
       ...

The patches are detailed as follows:

Patch 1/5 introduces a new bpf map type. This map only stores the
pointer to struct perf_event;

Patch 2/5 introduces a map_traverse_elem() function for further use;

Patch 3/5 converts event file descriptors into perf_event structures
when adding a new element to the map;

Patch 4/5 implements the function bpf_perf_event_read() that reads the
selected hardware PMU counter;

Patch 5/5 gives a simple example.

Kaixu Xia (5):
  bpf: Add new bpf map type to store the pointer to struct perf_event
  bpf: Add function map->ops->map_traverse_elem() to traverse map elems
  bpf: Save the pointer to struct perf_event to map
  bpf: Implement function bpf_perf_event_read() that gets the selected
    hardware PMU counter
  samples/bpf: example of get selected PMU counter value

 include/linux/bpf.h        |   6 +++
 include/linux/perf_event.h |   5 ++-
 include/uapi/linux/bpf.h   |   3 ++
 kernel/bpf/arraymap.c      | 110 +++++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/helpers.c       |  42 +++++++++++++++++
 kernel/bpf/syscall.c       |  26 +++++++++++
 kernel/events/core.c       |  30 ++++++++++++-
 kernel/trace/bpf_trace.c   |   2 +
 samples/bpf/Makefile       |   4 ++
 samples/bpf/bpf_helpers.h  |   2 +
 samples/bpf/tracex6_kern.c |  27 +++++++++++
 samples/bpf/tracex6_user.c |  67 +++++++++++++++++++++++++++
 12 files changed, 321 insertions(+), 3 deletions(-)
 create mode 100644 samples/bpf/tracex6_kern.c
 create mode 100644 samples/bpf/tracex6_user.c

-- 
1.8.3.4


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/5] bpf: Add new bpf map type to store the pointer to struct perf_event
  2015-07-22  8:09 [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter Kaixu Xia
@ 2015-07-22  8:09 ` Kaixu Xia
  2015-07-23  0:48   ` Alexei Starovoitov
  2015-07-22  8:09 ` [PATCH v2 2/5] bpf: Add function map->ops->map_traverse_elem() to traverse map elems Kaixu Xia
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Kaixu Xia @ 2015-07-22  8:09 UTC (permalink / raw)
  To: ast, davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt, jolsa
  Cc: xiakaixu, wangnan0, linux-kernel, pi3orama, hekuang

Introduce a new bpf map type 'BPF_MAP_TYPE_PERF_EVENT_ARRAY'.
This map will only store the pointer to struct perf_event.

Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
---
 include/linux/bpf.h      |  2 ++
 include/uapi/linux/bpf.h |  1 +
 kernel/bpf/arraymap.c    | 40 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 43 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4383476..f6a2442 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -11,6 +11,8 @@
 #include <linux/workqueue.h>
 #include <linux/file.h>
 
+#define MAX_PERF_EVENT_ARRAY_ENTRY	(2*NR_CPUS)
+
 struct bpf_map;
 
 /* map is generic key/value storage optionally accesible by eBPF programs */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 29ef6f9..69a1f6b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -114,6 +114,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_HASH,
 	BPF_MAP_TYPE_ARRAY,
 	BPF_MAP_TYPE_PROG_ARRAY,
+	BPF_MAP_TYPE_PERF_EVENT_ARRAY,
 };
 
 enum bpf_prog_type {
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index cb31229..183c1f7 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -255,3 +255,43 @@ static int __init register_prog_array_map(void)
 	return 0;
 }
 late_initcall(register_prog_array_map);
+
+static struct bpf_map *perf_event_array_map_alloc(union bpf_attr *attr)
+{
+	/* only the pointer to struct perf_event can be stored in
+	 * perf_event_array map
+	 */
+	if (attr->value_size != sizeof(void *))
+		return ERR_PTR(-EINVAL);
+
+	if (attr->max_entries > MAX_PERF_EVENT_ARRAY_ENTRY)
+		return ERR_PTR(-EINVAL);
+
+	return array_map_alloc(attr);
+}
+
+static int perf_event_array_map_get_next_key(struct bpf_map *map, void *key,
+					     void *next_key)
+{
+	return -EINVAL;
+}
+
+static const struct bpf_map_ops perf_event_array_ops = {
+	.map_alloc = perf_event_array_map_alloc,
+	.map_free = array_map_free,
+	.map_get_next_key = perf_event_array_map_get_next_key,
+	.map_lookup_elem = array_map_lookup_elem,
+	.map_delete_elem = array_map_delete_elem,
+};
+
+static struct bpf_map_type_list perf_event_array_type __read_mostly = {
+	.ops = &perf_event_array_ops,
+	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
+};
+
+static int __init register_perf_event_array_map(void)
+{
+	bpf_register_map_type(&perf_event_array_type);
+	return 0;
+}
+late_initcall(register_perf_event_array_map);
-- 
1.8.3.4



* [PATCH v2 2/5] bpf: Add function map->ops->map_traverse_elem() to traverse map elems
  2015-07-22  8:09 [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter Kaixu Xia
  2015-07-22  8:09 ` [PATCH v2 1/5] bpf: Add new bpf map type to store the pointer to struct perf_event Kaixu Xia
@ 2015-07-22  8:09 ` Kaixu Xia
  2015-07-23  1:00   ` Alexei Starovoitov
  2015-07-22  8:09 ` [PATCH v2 3/5] bpf: Save the pointer to struct perf_event to map Kaixu Xia
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Kaixu Xia @ 2015-07-22  8:09 UTC (permalink / raw)
  To: ast, davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt, jolsa
  Cc: xiakaixu, wangnan0, linux-kernel, pi3orama, hekuang

We want to traverse the map elements and make use of
the map values one by one, so add a new function
map->ops->map_traverse_elem() to traverse the map elements.

Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
---
 include/linux/bpf.h   |  3 +++
 kernel/bpf/arraymap.c | 17 +++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f6a2442..257149c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -15,6 +15,8 @@
 
 struct bpf_map;
 
+typedef int (*bpf_map_traverse_callback)(void *value);
+
 /* map is generic key/value storage optionally accesible by eBPF programs */
 struct bpf_map_ops {
 	/* funcs callable from userspace (via syscall) */
@@ -26,6 +28,7 @@ struct bpf_map_ops {
 	void *(*map_lookup_elem)(struct bpf_map *map, void *key);
 	int (*map_update_elem)(struct bpf_map *map, void *key, void *value, u64 flags);
 	int (*map_delete_elem)(struct bpf_map *map, void *key);
+	int (*map_traverse_elem)(bpf_map_traverse_callback func, struct bpf_map *map);
 };
 
 struct bpf_map {
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 183c1f7..410bc40 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -276,12 +276,29 @@ static int perf_event_array_map_get_next_key(struct bpf_map *map, void *key,
 	return -EINVAL;
 }
 
+static int perf_event_array_map_traverse_elem(bpf_map_traverse_callback func,
+					      struct bpf_map *map)
+{
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	void *value;
+	int i;
+
+	for(i = 0; i < array->map.max_entries; i++) {
+		value = array->value + array->elem_size * i;
+
+		func(value);
+	}
+
+	return 0;
+}
+
 static const struct bpf_map_ops perf_event_array_ops = {
 	.map_alloc = perf_event_array_map_alloc,
 	.map_free = array_map_free,
 	.map_get_next_key = perf_event_array_map_get_next_key,
 	.map_lookup_elem = array_map_lookup_elem,
 	.map_delete_elem = array_map_delete_elem,
+	.map_traverse_elem = perf_event_array_map_traverse_elem,
 };
 
 static struct bpf_map_type_list perf_event_array_type __read_mostly = {
-- 
1.8.3.4



* [PATCH v2 3/5] bpf: Save the pointer to struct perf_event to map
  2015-07-22  8:09 [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter Kaixu Xia
  2015-07-22  8:09 ` [PATCH v2 1/5] bpf: Add new bpf map type to store the pointer to struct perf_event Kaixu Xia
  2015-07-22  8:09 ` [PATCH v2 2/5] bpf: Add function map->ops->map_traverse_elem() to traverse map elems Kaixu Xia
@ 2015-07-22  8:09 ` Kaixu Xia
  2015-07-23  1:08   ` Alexei Starovoitov
  2015-07-22  8:09 ` [PATCH v2 4/5] bpf: Implement function bpf_perf_event_read() that gets the selected hardware PMU counter Kaixu Xia
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Kaixu Xia @ 2015-07-22  8:09 UTC (permalink / raw)
  To: ast, davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt, jolsa
  Cc: xiakaixu, wangnan0, linux-kernel, pi3orama, hekuang

The user space event FDs from the perf_event_open() syscall are
converted to pointers to struct perf_event and stored in the map.

Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
---
 include/linux/perf_event.h |  2 ++
 kernel/bpf/arraymap.c      | 53 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c       | 26 +++++++++++++++++++++++
 kernel/events/core.c       | 26 +++++++++++++++++++++++
 4 files changed, 107 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2027809..2ea4067 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -641,6 +641,7 @@ extern int perf_event_init_task(struct task_struct *child);
 extern void perf_event_exit_task(struct task_struct *child);
 extern void perf_event_free_task(struct task_struct *task);
 extern void perf_event_delayed_put(struct task_struct *task);
+extern struct perf_event *perf_event_get(unsigned int fd);
 extern void perf_event_print_debug(void);
 extern void perf_pmu_disable(struct pmu *pmu);
 extern void perf_pmu_enable(struct pmu *pmu);
@@ -979,6 +980,7 @@ static inline int perf_event_init_task(struct task_struct *child)	{ return 0; }
 static inline void perf_event_exit_task(struct task_struct *child)	{ }
 static inline void perf_event_free_task(struct task_struct *task)	{ }
 static inline void perf_event_delayed_put(struct task_struct *task)	{ }
+static struct perf_event *perf_event_get(unsigned int fd)		{ return NULL; }
 static inline void perf_event_print_debug(void)				{ }
 static inline int perf_event_task_disable(void)				{ return -EINVAL; }
 static inline int perf_event_task_enable(void)				{ return -EINVAL; }
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 410bc40..a7475ae 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -15,6 +15,7 @@
 #include <linux/slab.h>
 #include <linux/mm.h>
 #include <linux/filter.h>
+#include <linux/perf_event.h>
 
 /* Called from syscall */
 static struct bpf_map *array_map_alloc(union bpf_attr *attr)
@@ -276,6 +277,57 @@ static int perf_event_array_map_get_next_key(struct bpf_map *map, void *key,
 	return -EINVAL;
 }
 
+static int replace_map_with_perf_event(void *value)
+{
+	struct perf_event *event;
+	u32 fd;
+
+	fd = *(u32 *)value;
+
+	event = perf_event_get(fd);
+	if (IS_ERR(event))
+		return PTR_ERR(event);
+
+	/* limit the event type to PERF_TYPE_RAW
+	 * and PERF_TYPE_HARDWARE
+	 */
+	if (event->attr.type != PERF_TYPE_RAW &&
+	    event->attr.type != PERF_TYPE_HARDWARE)
+		return -EINVAL;
+
+	memcpy(value, &event, sizeof(struct perf_event *));
+
+	return 0;
+}
+
+static bool check_map_perf_event_stored(struct bpf_map *map, void *key)
+{
+	void *event;
+	bool is_stored = false;
+
+	rcu_read_lock();
+	event = array_map_lookup_elem(map, key);
+	if (event && (*(unsigned long *)event))
+		is_stored = true;
+	rcu_read_unlock();
+
+	return is_stored;
+}
+
+/* only called from syscall */
+static int perf_event_array_map_update_elem(struct bpf_map *map, void *key,
+					    void *value, u64 map_flags)
+{
+	/* check if the value is already stored */
+	if (check_map_perf_event_stored(map, key))
+		return -EINVAL;
+
+	if (replace_map_with_perf_event(value))
+		return -EBADF;
+
+	return array_map_update_elem(map, key, value, map_flags);
+}
+
 static int perf_event_array_map_traverse_elem(bpf_map_traverse_callback func,
 					      struct bpf_map *map)
 {
@@ -297,6 +349,7 @@ static const struct bpf_map_ops perf_event_array_ops = {
 	.map_free = array_map_free,
 	.map_get_next_key = perf_event_array_map_get_next_key,
 	.map_lookup_elem = array_map_lookup_elem,
+	.map_update_elem = perf_event_array_map_update_elem,
 	.map_delete_elem = array_map_delete_elem,
 	.map_traverse_elem = perf_event_array_map_traverse_elem,
 };
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a1b14d1..854f351 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -17,6 +17,7 @@
 #include <linux/license.h>
 #include <linux/filter.h>
 #include <linux/version.h>
+#include <linux/perf_event.h>
 
 static LIST_HEAD(bpf_map_types);
 
@@ -64,6 +65,19 @@ void bpf_map_put(struct bpf_map *map)
 	}
 }
 
+static int bpf_map_perf_event_put(void *value)
+{
+	struct perf_event *event;
+
+	event = (struct perf_event *)(*(unsigned long *)value);
+	if (!event)
+		return -EBADF;
+
+	perf_event_release_kernel(event);
+
+	return 0;
+}
+
 static int bpf_map_release(struct inode *inode, struct file *filp)
 {
 	struct bpf_map *map = filp->private_data;
@@ -74,6 +88,18 @@ static int bpf_map_release(struct inode *inode, struct file *filp)
 		 */
 		bpf_prog_array_map_clear(map);
 
+	if (map->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY) {
+		if (!map->ops->map_traverse_elem)
+			return -EPERM;
+
+		rcu_read_lock();
+		if (map->ops->map_traverse_elem(bpf_map_perf_event_put, map) < 0) {
+			rcu_read_unlock();
+			return -EINVAL;
+		}
+		rcu_read_unlock();
+	}
+
 	bpf_map_put(map);
 	return 0;
 }
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d3dae34..14a9924 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8574,6 +8574,32 @@ void perf_event_delayed_put(struct task_struct *task)
 		WARN_ON_ONCE(task->perf_event_ctxp[ctxn]);
 }
 
+struct perf_event *perf_event_get(unsigned int fd)
+{
+	struct perf_event *event;
+	struct fd f;
+
+	f = fdget(fd);
+
+	if (!f.file)
+		return ERR_PTR(-EBADF);
+
+	if (f.file->f_op != &perf_fops) {
+		fdput(f);
+		return ERR_PTR(-EINVAL);
+	}
+
+	event = f.file->private_data;
+
+	if (!atomic_long_inc_not_zero(&event->refcount)) {
+		fdput(f);
+		return ERR_PTR(-ENOENT);
+	}
+
+	fdput(f);
+	return event;
+}
+
 /*
  * inherit a event from parent task to child task:
  */
-- 
1.8.3.4



* [PATCH v2 4/5] bpf: Implement function bpf_perf_event_read() that gets the selected hardware PMU counter
  2015-07-22  8:09 [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter Kaixu Xia
                   ` (2 preceding siblings ...)
  2015-07-22  8:09 ` [PATCH v2 3/5] bpf: Save the pointer to struct perf_event to map Kaixu Xia
@ 2015-07-22  8:09 ` Kaixu Xia
  2015-07-23  1:14   ` Alexei Starovoitov
  2015-07-22  8:09 ` [PATCH v2 5/5] samples/bpf: example of get selected PMU counter value Kaixu Xia
  2015-07-23 23:33 ` [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter Daniel Borkmann
  5 siblings, 1 reply; 18+ messages in thread
From: Kaixu Xia @ 2015-07-22  8:09 UTC (permalink / raw)
  To: ast, davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt, jolsa
  Cc: xiakaixu, wangnan0, linux-kernel, pi3orama, hekuang

According to the perf_event_map_fd and key, the function
bpf_perf_event_read() can convert the corresponding map
value to a pointer to struct perf_event and return the
hardware PMU counter value.

The key can't be passed to bpf_perf_event_read() directly
because the required function argument constraint is lacking.

Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
---
 include/linux/bpf.h        |  1 +
 include/linux/perf_event.h |  3 ++-
 include/uapi/linux/bpf.h   |  2 ++
 kernel/bpf/helpers.c       | 42 ++++++++++++++++++++++++++++++++++++++++++
 kernel/events/core.c       |  4 ++--
 kernel/trace/bpf_trace.c   |  2 ++
 6 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 257149c..0fbff67 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -197,5 +197,6 @@ extern const struct bpf_func_proto bpf_ktime_get_ns_proto;
 extern const struct bpf_func_proto bpf_get_current_pid_tgid_proto;
 extern const struct bpf_func_proto bpf_get_current_uid_gid_proto;
 extern const struct bpf_func_proto bpf_get_current_comm_proto;
+extern const struct bpf_func_proto bpf_perf_event_read_proto;
 
 #endif /* _LINUX_BPF_H */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2ea4067..899abcb 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -662,7 +662,8 @@ extern void perf_pmu_migrate_context(struct pmu *pmu,
 				int src_cpu, int dst_cpu);
 extern u64 perf_event_read_value(struct perf_event *event,
 				 u64 *enabled, u64 *running);
-
+extern void __perf_event_read(void *info);
+extern u64 perf_event_count(struct perf_event *event);
 
 struct perf_sample_data {
 	/*
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 69a1f6b..e3bb181 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -250,6 +250,8 @@ enum bpf_func_id {
 	 * Return: 0 on success
 	 */
 	BPF_FUNC_get_current_comm,
+
+	BPF_FUNC_perf_event_read,	/* u64 bpf_perf_event_read(&map, &key) */
 	__BPF_FUNC_MAX_ID,
 };
 
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 1447ec0..aef2640 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -16,6 +16,7 @@
 #include <linux/ktime.h>
 #include <linux/sched.h>
 #include <linux/uidgid.h>
+#include <linux/perf_event.h>
 
 /* If kernel subsystem is allowing eBPF programs to call this function,
  * inside its own verifier_ops->get_func_proto() callback it should return
@@ -182,3 +183,44 @@ const struct bpf_func_proto bpf_get_current_comm_proto = {
 	.arg1_type	= ARG_PTR_TO_STACK,
 	.arg2_type	= ARG_CONST_STACK_SIZE,
 };
+
+static u64 bpf_perf_event_read(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+	struct bpf_map *map = (struct bpf_map *) (unsigned long) r1;
+	void *key = (void *) (unsigned long) r2;
+	struct perf_event *event;
+	void *ptr;
+
+	if (map->map_type != BPF_MAP_TYPE_PERF_EVENT_ARRAY)
+		return -EINVAL;
+
+	rcu_read_lock();
+	ptr = map->ops->map_lookup_elem(map, key);
+	rcu_read_unlock();
+	if (!ptr || !(*(unsigned long *)ptr))
+		return -EBADF;
+
+	event = (struct perf_event *)(*(unsigned long *)ptr);
+
+	if (event->state != PERF_EVENT_STATE_ACTIVE)
+		return -ENOENT;
+
+	if (event->oncpu != raw_smp_processor_id() &&
+	    event->ctx->task != current)
+		return -EINVAL;
+
+	if (event->attr.inherit)
+		return -EINVAL;
+
+	__perf_event_read(event);
+
+	return perf_event_count(event);
+}
+
+const struct bpf_func_proto bpf_perf_event_read_proto = {
+	.func		= bpf_perf_event_read,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_PTR_TO_MAP_KEY,
+};
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 14a9924..b4bde27 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3177,7 +3177,7 @@ void perf_event_exec(void)
 /*
  * Cross CPU call to read the hardware event
  */
-static void __perf_event_read(void *info)
+void __perf_event_read(void *info)
 {
 	struct perf_event *event = info;
 	struct perf_event_context *ctx = event->ctx;
@@ -3204,7 +3204,7 @@ static void __perf_event_read(void *info)
 	raw_spin_unlock(&ctx->lock);
 }
 
-static inline u64 perf_event_count(struct perf_event *event)
+u64 perf_event_count(struct perf_event *event)
 {
 	if (event->pmu->count)
 		return event->pmu->count(event);
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 88a041a..9cf094f 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -183,6 +183,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
 		return bpf_get_trace_printk_proto();
 	case BPF_FUNC_get_smp_processor_id:
 		return &bpf_get_smp_processor_id_proto;
+	case BPF_FUNC_perf_event_read:
+		return &bpf_perf_event_read_proto;
 	default:
 		return NULL;
 	}
-- 
1.8.3.4



* [PATCH v2 5/5] samples/bpf: example of get selected PMU counter value
  2015-07-22  8:09 [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter Kaixu Xia
                   ` (3 preceding siblings ...)
  2015-07-22  8:09 ` [PATCH v2 4/5] bpf: Implement function bpf_perf_event_read() that gets the selected hardware PMU counter Kaixu Xia
@ 2015-07-22  8:09 ` Kaixu Xia
  2015-07-23  1:16   ` Alexei Starovoitov
  2015-07-23 23:33 ` [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter Daniel Borkmann
  5 siblings, 1 reply; 18+ messages in thread
From: Kaixu Xia @ 2015-07-22  8:09 UTC (permalink / raw)
  To: ast, davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt, jolsa
  Cc: xiakaixu, wangnan0, linux-kernel, pi3orama, hekuang

This is a simple example that shows how to use the new ability
to read the selected hardware PMU counter value.

Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
---
 samples/bpf/Makefile       |  4 +++
 samples/bpf/bpf_helpers.h  |  2 ++
 samples/bpf/tracex6_kern.c | 27 +++++++++++++++++++
 samples/bpf/tracex6_user.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 100 insertions(+)
 create mode 100644 samples/bpf/tracex6_kern.c
 create mode 100644 samples/bpf/tracex6_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 4450fed..63e7d50 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -12,6 +12,7 @@ hostprogs-y += tracex2
 hostprogs-y += tracex3
 hostprogs-y += tracex4
 hostprogs-y += tracex5
+hostprogs-y += tracex6
 hostprogs-y += lathist
 
 test_verifier-objs := test_verifier.o libbpf.o
@@ -25,6 +26,7 @@ tracex2-objs := bpf_load.o libbpf.o tracex2_user.o
 tracex3-objs := bpf_load.o libbpf.o tracex3_user.o
 tracex4-objs := bpf_load.o libbpf.o tracex4_user.o
 tracex5-objs := bpf_load.o libbpf.o tracex5_user.o
+tracex6-objs := bpf_load.o libbpf.o tracex6_user.o
 lathist-objs := bpf_load.o libbpf.o lathist_user.o
 
 # Tell kbuild to always build the programs
@@ -37,6 +39,7 @@ always += tracex2_kern.o
 always += tracex3_kern.o
 always += tracex4_kern.o
 always += tracex5_kern.o
+always += tracex6_kern.o
 always += tcbpf1_kern.o
 always += lathist_kern.o
 
@@ -51,6 +54,7 @@ HOSTLOADLIBES_tracex2 += -lelf
 HOSTLOADLIBES_tracex3 += -lelf
 HOSTLOADLIBES_tracex4 += -lelf -lrt
 HOSTLOADLIBES_tracex5 += -lelf
+HOSTLOADLIBES_tracex6 += -lelf
 HOSTLOADLIBES_lathist += -lelf
 
 # point this to your LLVM backend with bpf support
diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h
index bdf1c16..b0037dd 100644
--- a/samples/bpf/bpf_helpers.h
+++ b/samples/bpf/bpf_helpers.h
@@ -31,6 +31,8 @@ static unsigned long long (*bpf_get_current_uid_gid)(void) =
 	(void *) BPF_FUNC_get_current_uid_gid;
 static int (*bpf_get_current_comm)(void *buf, int buf_size) =
 	(void *) BPF_FUNC_get_current_comm;
+static int (*bpf_perf_event_read)(void *map, void *key) =
+	(void *) BPF_FUNC_perf_event_read;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/samples/bpf/tracex6_kern.c b/samples/bpf/tracex6_kern.c
new file mode 100644
index 0000000..9c3c6b2
--- /dev/null
+++ b/samples/bpf/tracex6_kern.c
@@ -0,0 +1,27 @@
+#include <linux/version.h>
+#include <uapi/linux/bpf.h>
+#include "bpf_helpers.h"
+
+/* the max_entries can't be larger than 2*NR_CPUS */
+struct bpf_map_def SEC("maps") my_map = {
+	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
+	.key_size = sizeof(int),
+	.value_size = sizeof(unsigned long),
+	.max_entries = 32,
+};
+
+SEC("kprobe/sys_write")
+int bpf_prog1(struct pt_regs *ctx)
+{
+	u64 count;
+	u32 key = bpf_get_smp_processor_id();
+	char fmt[] = "bpf count: CPU-%d  %llu\n";
+
+	count = bpf_perf_event_read(&my_map, &key);
+	bpf_trace_printk(fmt, sizeof(fmt), key, count);
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/samples/bpf/tracex6_user.c b/samples/bpf/tracex6_user.c
new file mode 100644
index 0000000..30307c9
--- /dev/null
+++ b/samples/bpf/tracex6_user.c
@@ -0,0 +1,67 @@
+#include <stdio.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <fcntl.h>
+#include <poll.h>
+#include <sys/ioctl.h>
+#include <linux/perf_event.h>
+#include <linux/bpf.h>
+#include "libbpf.h"
+#include "bpf_load.h"
+
+static void test_bpf_perf_event(void)
+{
+	int nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
+	int *pmu_fd = malloc(nr_cpus * sizeof(int));
+	unsigned long value;
+	int i;
+
+	struct perf_event_attr attr_insn_pmu = {
+		.freq = 0,
+		.sample_period = 0x7fffffffffffffffULL,
+		.inherit = 0,
+		.type = PERF_TYPE_HARDWARE,
+		.read_format = 0,
+		.sample_type = 0,
+		.config = 0,/* PMU: cycles */
+	};
+
+	for(i = 0; i < nr_cpus; i++) {
+		pmu_fd[i] = perf_event_open(&attr_insn_pmu, -1/*pid*/, i/*cpu*/, -1/*group_fd*/, 0);
+		if (pmu_fd[i] < 0)
+			printf("event syscall failed ****\n");
+
+		bpf_update_elem(map_fd[0], &i, (pmu_fd + i), BPF_ANY);
+
+		ioctl(pmu_fd[i], PERF_EVENT_IOC_ENABLE, 0);
+	}
+
+	system("ls");
+	system("pwd");
+	system("sleep 2");
+
+	for(i = 0; i < nr_cpus; i++)
+		 close(pmu_fd[i]);
+
+	close(map_fd);
+
+	free(pmu_fd);
+}
+
+int main(int argc, char **argv)
+{
+	char filename[256];
+
+	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+
+	if (load_bpf_file(filename)) {
+		printf("%s", bpf_log_buf);
+		return 1;
+	}
+
+	test_bpf_perf_event();
+
+	return 0;
+}
-- 
1.8.3.4



* Re: [PATCH v2 1/5] bpf: Add new bpf map type to store the pointer to struct perf_event
  2015-07-22  8:09 ` [PATCH v2 1/5] bpf: Add new bpf map type to store the pointer to struct perf_event Kaixu Xia
@ 2015-07-23  0:48   ` Alexei Starovoitov
  2015-07-23  1:08     ` Wangnan (F)
  0 siblings, 1 reply; 18+ messages in thread
From: Alexei Starovoitov @ 2015-07-23  0:48 UTC (permalink / raw)
  To: Kaixu Xia, davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt, jolsa
  Cc: wangnan0, linux-kernel, pi3orama, hekuang

On 7/22/15 1:09 AM, Kaixu Xia wrote:
> Introduce a new bpf map type 'BPF_MAP_TYPE_PERF_EVENT_ARRAY'.
> This map will only store the pointer to struct perf_event.
>
> Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
> ---
>   include/linux/bpf.h      |  2 ++
>   include/uapi/linux/bpf.h |  1 +
>   kernel/bpf/arraymap.c    | 40 ++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 43 insertions(+)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 4383476..f6a2442 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -11,6 +11,8 @@
>   #include <linux/workqueue.h>
>   #include <linux/file.h>
>
> +#define MAX_PERF_EVENT_ARRAY_ENTRY	(2*NR_CPUS)

why this artificial limit?
Just drop it.

> +static struct bpf_map *perf_event_array_map_alloc(union bpf_attr *attr)
> +{
> +	/* only the pointer to struct perf_event can be stored in
> +	 * perf_event_array map
> +	 */
> +	if (attr->value_size != sizeof(void *))
> +		return ERR_PTR(-EINVAL);

hmm. that's odd. why reinvent things? please do the same as
prog_array does. Namely below:

> +static const struct bpf_map_ops perf_event_array_ops = {
> +	.map_alloc = perf_event_array_map_alloc,
> +	.map_free = array_map_free,
> +	.map_get_next_key = perf_event_array_map_get_next_key,
> +	.map_lookup_elem = array_map_lookup_elem,
> +	.map_delete_elem = array_map_delete_elem,

this is broken. you don't want programs to manipulate
'struct perf_event *' pointers.
lookup/update/delete helpers shouldn't be accessible from the programs;
then update/delete can be cleanly implemented and called via syscall.
See how prog_array does it.

Also please collapse patches 1-3 into one. They're logically one piece.
I'll comment on them as well, but it would have been easier for me
and you if they were part of one email thread.



* Re: [PATCH v2 2/5] bpf: Add function map->ops->map_traverse_elem() to traverse map elems
  2015-07-22  8:09 ` [PATCH v2 2/5] bpf: Add function map->ops->map_traverse_elem() to traverse map elems Kaixu Xia
@ 2015-07-23  1:00   ` Alexei Starovoitov
  0 siblings, 0 replies; 18+ messages in thread
From: Alexei Starovoitov @ 2015-07-23  1:00 UTC (permalink / raw)
  To: Kaixu Xia, davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt, jolsa
  Cc: wangnan0, linux-kernel, pi3orama, hekuang

On 7/22/15 1:09 AM, Kaixu Xia wrote:
> We want to traverse the map elements and make use
> of the map value one by one. So add new function
> map->ops->map_traverse_elem() to traverse map elements.
...
> @@ -26,6 +28,7 @@ struct bpf_map_ops {
>   	void *(*map_lookup_elem)(struct bpf_map *map, void *key);
>   	int (*map_update_elem)(struct bpf_map *map, void *key, void *value, u64 flags);
>   	int (*map_delete_elem)(struct bpf_map *map, void *key);
> +	int (*map_traverse_elem)(bpf_map_traverse_callback func, struct bpf_map *map);

I think 'traverse' is too error prone.
Better to add a 'map_delete_all_elem' callback.
Then convert bpf_prog_array_map_clear() into such a callback
and add a similar callback for perf_event_array.
Then call it from bpf_map_release() and drop the
if (map->map_type == PROG_ARRAY) check there.
Just unconditionally call map->ops->map_delete_all_elem().
Though looking at patch 3, I don't see why you need that.
prog_array has to do it due to a potential circular dependency:
a program that uses a prog_array_fd can store its own prog_fd
inside that prog_array.
In case of perf_event_array no such thing is possible.
Just release them as part of map_free.



* Re: [PATCH v2 3/5] bpf: Save the pointer to struct perf_event to map
  2015-07-22  8:09 ` [PATCH v2 3/5] bpf: Save the pointer to struct perf_event to map Kaixu Xia
@ 2015-07-23  1:08   ` Alexei Starovoitov
  0 siblings, 0 replies; 18+ messages in thread
From: Alexei Starovoitov @ 2015-07-23  1:08 UTC (permalink / raw)
  To: Kaixu Xia, davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt, jolsa
  Cc: wangnan0, linux-kernel, pi3orama, hekuang

On 7/22/15 1:09 AM, Kaixu Xia wrote:
> The user space event FDs from perf_event_open() syscall are
> converted to the pointer to struct perf_event and stored in map.
...
> +static int replace_map_with_perf_event(void *value)
> +{
> +	struct perf_event *event;
> +	u32 fd;
> +
> +	fd = *(u32 *)value;
> +
> +	event = perf_event_get(fd);
> +	if (IS_ERR(event))
> +		return PTR_ERR(event);
> +
> +	/* limit the event type to PERF_TYPE_RAW
> +	 * and PERF_TYPE_HARDWARE
> +	 */
> +	if (event->attr.type != PERF_TYPE_RAW &&
> +	    event->attr.type != PERF_TYPE_HARDWARE)
> +		return -EINVAL;
> +
> +	memcpy(value, &event, sizeof(struct perf_event *));
> +
> +	return 0;
> +}
> +
> +static bool check_map_perf_event_stored(struct bpf_map *map, void *key)
> +{
> +	void *event;
> +	bool is_stored = false;
> +
> +	rcu_read_lock();
> +	event = array_map_lookup_elem(map, key);
> +	if (event && (*(unsigned long *)event))
> +		is_stored = true;
> +	rcu_read_unlock();
> +
> +	return is_stored;
> +}
> +
> +/* only called from syscall */
> +static int perf_event_array_map_update_elem(struct bpf_map *map, void *key,
> +					    void *value, u64 map_flags)
> +{
> +	/* check if the value is already stored */
> +	if (check_map_perf_event_stored(map, key))
> +		return -EINVAL;
> +
> +	if (replace_map_with_perf_event(value))
> +		return -EBADF;

the above three functions are broken due to races.
update_elem can be called by different user space processes.
Please see how prog_array solves it via xchg()

> +	if (map->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY) {
> +		if (!map->ops->map_traverse_elem)
> +			return -EPERM;
> +
> +		rcu_read_lock();
> +		if (map->ops->map_traverse_elem(bpf_map_perf_event_put, map) < 0) {
> +			rcu_read_unlock();
> +			return -EINVAL;
> +		}
> +		rcu_read_unlock();

as I mentioned as part of patch 2, the above seems unnecessary.

> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index d3dae34..14a9924 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8574,6 +8574,32 @@ void perf_event_delayed_put(struct task_struct *task)
>   		WARN_ON_ONCE(task->perf_event_ctxp[ctxn]);
>   }
>
> +struct perf_event *perf_event_get(unsigned int fd)
> +{
> +	struct perf_event *event;
> +	struct fd f;
> +
> +	f = fdget(fd);
> +
> +	if (!f.file)
> +		return ERR_PTR(-EBADF);
> +
> +	if (f.file->f_op != &perf_fops) {
> +		fdput(f);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	event = f.file->private_data;
> +
> +	if (!atomic_long_inc_not_zero(&event->refcount)) {

I don't understand the necessity of the '_not_zero' suffix. why?
How can it possibly race to zero here if we have an fd?

> +		fdput(f);
> +		return ERR_PTR(-ENOENT);
> +	}
> +
> +	fdput(f);
> +	return event;
> +}



* Re: [PATCH v2 1/5] bpf: Add new bpf map type to store the pointer to struct perf_event
  2015-07-23  0:48   ` Alexei Starovoitov
@ 2015-07-23  1:08     ` Wangnan (F)
  2015-07-23  1:39       ` Alexei Starovoitov
  0 siblings, 1 reply; 18+ messages in thread
From: Wangnan (F) @ 2015-07-23  1:08 UTC (permalink / raw)
  To: Alexei Starovoitov, Kaixu Xia, davem, acme, mingo, a.p.zijlstra,
	masami.hiramatsu.pt, jolsa
  Cc: linux-kernel, pi3orama, hekuang



On 2015/7/23 8:48, Alexei Starovoitov wrote:
> On 7/22/15 1:09 AM, Kaixu Xia wrote:
>> Introduce a new bpf map type 'BPF_MAP_TYPE_PERF_EVENT_ARRAY'.
>> This map will only store the pointer to struct perf_event.
>>
>> Signed-off-by: Kaixu Xia <xiakaixu@huawei.com>
>> ---
>>   include/linux/bpf.h      |  2 ++
>>   include/uapi/linux/bpf.h |  1 +
>>   kernel/bpf/arraymap.c    | 40 ++++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 43 insertions(+)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 4383476..f6a2442 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -11,6 +11,8 @@
>>   #include <linux/workqueue.h>
>>   #include <linux/file.h>
>>
>> +#define MAX_PERF_EVENT_ARRAY_ENTRY    (2*NR_CPUS)
>
> why this artificial limit?
> Just drop it.
>

Then we should find another way to prevent a user from creating a large map
but only using a small portion of it, since a normal array map can't report
whether a slot is used or not. When releasing the map, we have to release
each perf event. Currently the only way is to check the map value in each
slot.

>> +static struct bpf_map *perf_event_array_map_alloc(union bpf_attr *attr)
>> +{
>> +    /* only the pointer to struct perf_event can be stored in
>> +     * perf_event_array map
>> +     */
>> +    if (attr->value_size != sizeof(void *))
>> +        return ERR_PTR(-EINVAL);
>
> hmm. that's odd. why reinvent things? please do the same as
> prog_array does. Namely below:
>
>> +static const struct bpf_map_ops perf_event_array_ops = {
>> +    .map_alloc = perf_event_array_map_alloc,
>> +    .map_free = array_map_free,
>> +    .map_get_next_key = perf_event_array_map_get_next_key,
>> +    .map_lookup_elem = array_map_lookup_elem,
>> +    .map_delete_elem = array_map_delete_elem,
>
> this is broken. you don't want programs to manipulate
> 'struct perf_event *' pointers.
> lookup/update/delete helpers shouldn't be accessible from the programs;
> then update/delete can be cleanly implemented and called via syscall.
> See how prog_array does it.
>
> Also please collapse patches 1-3 into one. They're logically one piece.
> I'll comment on them as well, but it would have been easier for me
> and you if they were part of one email thread.
>




* Re: [PATCH v2 4/5] bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU counter
  2015-07-22  8:09 ` [PATCH v2 4/5] bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU counter Kaixu Xia
@ 2015-07-23  1:14   ` Alexei Starovoitov
  2015-07-23  2:12     ` xiakaixu
  0 siblings, 1 reply; 18+ messages in thread
From: Alexei Starovoitov @ 2015-07-23  1:14 UTC (permalink / raw)
  To: Kaixu Xia, davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt, jolsa
  Cc: wangnan0, linux-kernel, pi3orama, hekuang

On 7/22/15 1:09 AM, Kaixu Xia wrote:
> According to the perf_event_map_fd and key, the function
> bpf_perf_event_read() can convert the corresponding map
> value to the pointer to struct perf_event and return the
> Hardware PMU counter value.
>
> The key can't be passed to bpf_perf_event_read() directly
> because the function argument constraint is lacked.

I don't understand the above sentence.

> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 69a1f6b..e3bb181 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -250,6 +250,8 @@ enum bpf_func_id {
>   	 * Return: 0 on success
>   	 */
>   	BPF_FUNC_get_current_comm,
> +
> +	BPF_FUNC_perf_event_read,	/* u64 bpf_perf_event_read(&map, &key) */

no need for extra empty line.

> +
> +static u64 bpf_perf_event_read(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
> +{
> +	struct bpf_map *map = (struct bpf_map *) (unsigned long) r1;
> +	void *key = (void *) (unsigned long) r2;
> +	struct perf_event *event;
> +	void *ptr;
> +
> +	if (map->map_type != BPF_MAP_TYPE_PERF_EVENT_ARRAY)
> +		return -EINVAL;

please check this statically in the verifier instead of at run-time.

> +
> +	rcu_read_lock();

unnecessary.

> +	ptr = map->ops->map_lookup_elem(map, key);
> +	rcu_read_unlock();
> +	if (!ptr || !(*(unsigned long *)ptr))
> +		return -EBADF;

all these casts can be removed. First cast of 'r1' into
perf_event_array will be enough.

> +const struct bpf_func_proto bpf_perf_event_read_proto = {
> +	.func		= bpf_perf_event_read,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_CONST_MAP_PTR,
> +	.arg2_type	= ARG_PTR_TO_MAP_KEY,

make it arg2_type = ARG_ANYTHING then you'll just index
into array the way prog_array does and similar to bpf_tail_call.



* Re: [PATCH v2 5/5] samples/bpf: example of get selected PMU counter value
  2015-07-22  8:09 ` [PATCH v2 5/5] samples/bpf: example of get selected PMU counter value Kaixu Xia
@ 2015-07-23  1:16   ` Alexei Starovoitov
  0 siblings, 0 replies; 18+ messages in thread
From: Alexei Starovoitov @ 2015-07-23  1:16 UTC (permalink / raw)
  To: Kaixu Xia, davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt, jolsa
  Cc: wangnan0, linux-kernel, pi3orama, hekuang

On 7/22/15 1:09 AM, Kaixu Xia wrote:
> +SEC("kprobe/sys_write")
> +int bpf_prog1(struct pt_regs *ctx)
> +{
> +	u64 count;
> +	u32 key = bpf_get_smp_processor_id();
> +	char fmt[] = "bpf count: CPU-%d  %llu\n";
> +
> +	count = bpf_perf_event_read(&my_map, &key);
> +	bpf_trace_printk(fmt, sizeof(fmt), key, count);
> +
> +	return 0;
> +}

overall the whole thing looks much better, but since you can print
anything, print some sensible string here instead of 'bpf count'.

Thanks!



* Re: [PATCH v2 1/5] bpf: Add new bpf map type to store the pointer to struct perf_event
  2015-07-23  1:08     ` Wangnan (F)
@ 2015-07-23  1:39       ` Alexei Starovoitov
  0 siblings, 0 replies; 18+ messages in thread
From: Alexei Starovoitov @ 2015-07-23  1:39 UTC (permalink / raw)
  To: Wangnan (F),
	Kaixu Xia, davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt,
	jolsa
  Cc: linux-kernel, pi3orama, hekuang

On 7/22/15 6:08 PM, Wangnan (F) wrote:
> Then we should find another way to prevent a user from creating a large map
> but only using a small portion of it, since a normal array map can't report
> whether a slot is used or not. When releasing the map, we have to release
> each perf event. Currently the only way is to check the map value in each
> slot.

yeah. In prog_array each element is either a NULL ptr or a valid
'struct bpf_prog *'. Why not do the same for perf_event_array?



* Re: [PATCH v2 4/5] bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU counter
  2015-07-23  1:14   ` Alexei Starovoitov
@ 2015-07-23  2:12     ` xiakaixu
  2015-07-23  2:22       ` Alexei Starovoitov
  0 siblings, 1 reply; 18+ messages in thread
From: xiakaixu @ 2015-07-23  2:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt, jolsa,
	wangnan0, linux-kernel, pi3orama, hekuang

On 2015/7/23 9:14, Alexei Starovoitov wrote:
> On 7/22/15 1:09 AM, Kaixu Xia wrote:
>> According to the perf_event_map_fd and key, the function
>> bpf_perf_event_read() can convert the corresponding map
>> value to the pointer to struct perf_event and return the
>> Hardware PMU counter value.
>>
>> The key can't be passed to bpf_perf_event_read() directly
>> because the function argument constraint is lacked.
> 
> I don't understand above sentence.
> 
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 69a1f6b..e3bb181 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -250,6 +250,8 @@ enum bpf_func_id {
>>        * Return: 0 on success
>>        */
>>       BPF_FUNC_get_current_comm,
>> +
>> +    BPF_FUNC_perf_event_read,    /* u64 bpf_perf_event_read(&map, &key) */
> 
> no need for extra empty line.
> 
>> +
>> +static u64 bpf_perf_event_read(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
>> +{
>> +    struct bpf_map *map = (struct bpf_map *) (unsigned long) r1;
>> +    void *key = (void *) (unsigned long) r2;
>> +    struct perf_event *event;
>> +    void *ptr;
>> +
>> +    if (map->map_type != BPF_MAP_TYPE_PERF_EVENT_ARRAY)
>> +        return -EINVAL;
> 
> please check this statically in verifier instead of in run-time.
> 
>> +
>> +    rcu_read_lock();
> 
> unnecessary.
> 
>> +    ptr = map->ops->map_lookup_elem(map, key);
>> +    rcu_read_unlock();
>> +    if (!ptr || !(*(unsigned long *)ptr))
>> +        return -EBADF;
> 
> all these casts can be removed. First cast of 'r1' into
> perf_event_array will be enough.

So you mean like this?

u64 bpf_perf_event_read(u64 r1, u64 index,...)
{
  struct bpf_perf_event_array *array = (void *) (long) r1;
  struct perf_event *event;
  ...
  event = array->events[index];
  ...
}
> 
>> +const struct bpf_func_proto bpf_perf_event_read_proto = {
>> +    .func        = bpf_perf_event_read,
>> +    .gpl_only    = false,
>> +    .ret_type    = RET_INTEGER,
>> +    .arg1_type    = ARG_CONST_MAP_PTR,
>> +    .arg2_type    = ARG_PTR_TO_MAP_KEY,
> 
> make it arg2_type = ARG_ANYTHING then you'll just index
> into array the way prog_array does and similar to bpf_tail_call.

 ARG_ANYTHING means any (initialized) argument is ok, but what we here
 really want is a map key. So I'm not sure ARG_ANYTHING is suitable.
 You know ARG_ANYTHING is not checked enough in the verifier.

Thanks.
> 
> 
> .
> 




* Re: [PATCH v2 4/5] bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU counter
  2015-07-23  2:12     ` xiakaixu
@ 2015-07-23  2:22       ` Alexei Starovoitov
  2015-07-23  2:39         ` xiakaixu
  0 siblings, 1 reply; 18+ messages in thread
From: Alexei Starovoitov @ 2015-07-23  2:22 UTC (permalink / raw)
  To: xiakaixu
  Cc: davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt, jolsa,
	wangnan0, linux-kernel, pi3orama, hekuang

On 7/22/15 7:12 PM, xiakaixu wrote:
> So you mean like this?
>
> u64 bpf_perf_event_read(u64 r1, u64 index,...)
> {
>    struct bpf_perf_event_array *array = (void *) (long) r1;
>    struct perf_event *event;
>    ...
>    event = array->events[index];
>    ...
> }

yes. the only thing needed is to add:
if (index >= array->map.max_entries)
	return -E2BIG;
before accessing array->events[index];

>> >
>>> >>+const struct bpf_func_proto bpf_perf_event_read_proto = {
>>> >>+    .func        = bpf_perf_event_read,
>>> >>+    .gpl_only    = false,
>>> >>+    .ret_type    = RET_INTEGER,
>>> >>+    .arg1_type    = ARG_CONST_MAP_PTR,
>>> >>+    .arg2_type    = ARG_PTR_TO_MAP_KEY,
>> >
>> >make it arg2_type = ARG_ANYTHING then you'll just index
>> >into array the way prog_array does and similar to bpf_tail_call.

>   ARG_ANYTHING means any (initialized) argument is ok, but we here

correct.

>   really want is a map key. So I'm not sure ARG_ANYTHING is suitable.
>   You know ARG_ANYTHING is not checked enough in the verifier.

why? at perf_event_array creation time we check that key_size == sizeof(u32),
so we can accept any integer.
ARG_PTR_TO_MAP_KEY forces the program author to go through the stack instead
of passing the index directly. A direct index is obviously faster.



* Re: [PATCH v2 4/5] bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU counter
  2015-07-23  2:22       ` Alexei Starovoitov
@ 2015-07-23  2:39         ` xiakaixu
  0 siblings, 0 replies; 18+ messages in thread
From: xiakaixu @ 2015-07-23  2:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt, jolsa,
	wangnan0, linux-kernel, pi3orama, hekuang

On 2015/7/23 10:22, Alexei Starovoitov wrote:
> On 7/22/15 7:12 PM, xiakaixu wrote:
>> So you mean like this?
>>
>> u64 bpf_perf_event_read(u64 r1, u64 index,...)
>> {
>>    struct bpf_perf_event_array *array = (void *) (long) r1;
>>    struct perf_event *event;
>>    ...
>>    event = array->events[index];
>>    ...
>> }
> 
> yes. the only thing needed is to add:
> if (index >= array->map.max_entries)
>     return -E2BIG;
> before accessing array->events[index];
> 
>>> >
>>>> >>+const struct bpf_func_proto bpf_perf_event_read_proto = {
>>>> >>+    .func        = bpf_perf_event_read,
>>>> >>+    .gpl_only    = false,
>>>> >>+    .ret_type    = RET_INTEGER,
>>>> >>+    .arg1_type    = ARG_CONST_MAP_PTR,
>>>> >>+    .arg2_type    = ARG_PTR_TO_MAP_KEY,
>>> >
>>> >make it arg2_type = ARG_ANYTHING then you'll just index
>>> >into array the way prog_array does and similar to bpf_tail_call.
> 
>>   ARG_ANYTHING means any (initialized) argument is ok, but we here
> 
> correct.
> 
>>   really want is a map key. So I'm not sure ARG_ANYTHING is suitable.
>>   You know ARG_ANYTHING is not checked enough in the verifier.
> 
> why? during perf_event_array creation time we check that key_size == u32
> so we can accept any integer.
> ARG_PTR_TO_MAP_KEY forces program author to use stack instead of
> passing index directly. Direct index is obviously faster.

Copy that. We will follow them in V3.
> 
> 
> .
> 




* Re: [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter
  2015-07-22  8:09 [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter Kaixu Xia
                   ` (4 preceding siblings ...)
  2015-07-22  8:09 ` [PATCH v2 5/5] samples/bpf: example of get selected PMU counter value Kaixu Xia
@ 2015-07-23 23:33 ` Daniel Borkmann
  2015-07-25  2:14   ` xiakaixu
  5 siblings, 1 reply; 18+ messages in thread
From: Daniel Borkmann @ 2015-07-23 23:33 UTC (permalink / raw)
  To: Kaixu Xia, ast, davem, acme, mingo, a.p.zijlstra,
	masami.hiramatsu.pt, jolsa
  Cc: wangnan0, linux-kernel, pi3orama, hekuang

On 07/22/2015 10:09 AM, Kaixu Xia wrote:
> Previous patch v1 url:
> https://lkml.org/lkml/2015/7/17/287

[ Sorry to chime in late, just noticed this series now as I wasn't in Cc for
   the core BPF changes. More below ... ]

> This patchset allows user read PMU events in the following way:
>   1. Open the PMU using perf_event_open() (for each CPUs or for
>      each processes he/she'd like to watch);
>   2. Create a BPF_MAP_TYPE_PERF_EVENT_ARRAY BPF map;
>   3. Insert FDs into the map with some key-value mapping scheme
>      (i.e. cpuid -> event on that CPU);
>   4. Load and attach eBPF programs as usual;
>   5. In eBPF program, get the perf_event_map_fd and key (i.e.
>      cpuid get from bpf_get_smp_processor_id()) then use
>      bpf_perf_event_read() to read from it.
>   6. Do anything he/her want.
>
> changes in V2:
>   - put atomic_long_inc_not_zero() between fdget() and fdput();
>   - limit the event type to PERF_TYPE_RAW and PERF_TYPE_HARDWARE;
>   - Only read the event counter on current CPU or on current
>     process;
>   - add new map type BPF_MAP_TYPE_PERF_EVENT_ARRAY to store the
>     pointer to the struct perf_event;
>   - according to the perf_event_map_fd and key, the function
>     bpf_perf_event_read() can get the Hardware PMU counter value;
>
> Patch 5/5 is a simple example and shows how to use this new eBPF
> programs ability. The PMU counter data can be found in
> /sys/kernel/debug/tracing/trace(trace_pipe).(the cycles PMU
> value when 'kprobe/sys_write' sampling)
>
>    $ cat /sys/kernel/debug/tracing/trace_pipe
>    $ ./tracex6
>         ...
>               cat-677   [002] d..1   210.299270: : bpf count: CPU-2  5316659
>               cat-677   [002] d..1   210.299316: : bpf count: CPU-2  5378639
>               cat-677   [002] d..1   210.299362: : bpf count: CPU-2  5440654
>               cat-677   [002] d..1   210.299408: : bpf count: CPU-2  5503211
>               cat-677   [002] d..1   210.299454: : bpf count: CPU-2  5565438
>               cat-677   [002] d..1   210.299500: : bpf count: CPU-2  5627433
>               cat-677   [002] d..1   210.299547: : bpf count: CPU-2  5690033
>               cat-677   [002] d..1   210.299593: : bpf count: CPU-2  5752184
>               cat-677   [002] d..1   210.299639: : bpf count: CPU-2  5814543
>             <...>-548   [009] d..1   210.299667: : bpf count: CPU-9  605418074
>             <...>-548   [009] d..1   210.299692: : bpf count: CPU-9  605452692
>               cat-677   [002] d..1   210.299700: : bpf count: CPU-2  5896319
>             <...>-548   [009] d..1   210.299710: : bpf count: CPU-9  605477824
>             <...>-548   [009] d..1   210.299728: : bpf count: CPU-9  605501726
>             <...>-548   [009] d..1   210.299745: : bpf count: CPU-9  605525279
>             <...>-548   [009] d..1   210.299762: : bpf count: CPU-9  605547817
>             <...>-548   [009] d..1   210.299778: : bpf count: CPU-9  605570433
>             <...>-548   [009] d..1   210.299795: : bpf count: CPU-9  605592743
>         ...
>
> The detail of patches is as follow:
>
> Patch 1/5 introduces a new bpf map type. This map only stores the
> pointer to struct perf_event;
>
> Patch 2/5 introduces a map_traverse_elem() function for further use;
>
> Patch 3/5 converts event file descriptors into perf_event structures when
> adding new elements to the map;

So far all the map backends are of generic nature, knowing absolutely nothing
about a particular consumer/subsystem of eBPF (tc, socket filters, etc). The
tail call is a bit special, but nevertheless generic for each user and [very]
useful, so it makes sense to inherit from the array map and move the code there.

I don't really like that we start adding new _special_-cased maps into the
eBPF core code here; it seems quite hacky. :( From your rather terse commit
description where you introduce the maps, I failed to see a detailed elaboration
on this, i.e. why it cannot be abstracted any differently.

> Patch 4/5 implements function bpf_perf_event_read() that gets the selected
> hardware PMU counter;
>
> Patch 5/5 gives a simple example.
>
> Kaixu Xia (5):
>    bpf: Add new bpf map type to store the pointer to struct perf_event
>    bpf: Add function map->ops->map_traverse_elem() to traverse map elems
>    bpf: Save the pointer to struct perf_event to map
>    bpf: Implement function bpf_perf_event_read() that get the selected
>      hardware PMU counter
>    samples/bpf: example of get selected PMU counter value
>
>   include/linux/bpf.h        |   6 +++
>   include/linux/perf_event.h |   5 ++-
>   include/uapi/linux/bpf.h   |   3 ++
>   kernel/bpf/arraymap.c      | 110 +++++++++++++++++++++++++++++++++++++++++++++
>   kernel/bpf/helpers.c       |  42 +++++++++++++++++
>   kernel/bpf/syscall.c       |  26 +++++++++++
>   kernel/events/core.c       |  30 ++++++++++++-
>   kernel/trace/bpf_trace.c   |   2 +
>   samples/bpf/Makefile       |   4 ++
>   samples/bpf/bpf_helpers.h  |   2 +
>   samples/bpf/tracex6_kern.c |  27 +++++++++++
>   samples/bpf/tracex6_user.c |  67 +++++++++++++++++++++++++++
>   12 files changed, 321 insertions(+), 3 deletions(-)
>   create mode 100644 samples/bpf/tracex6_kern.c
>   create mode 100644 samples/bpf/tracex6_user.c
>



* Re: [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter
  2015-07-23 23:33 ` [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter Daniel Borkmann
@ 2015-07-25  2:14   ` xiakaixu
  0 siblings, 0 replies; 18+ messages in thread
From: xiakaixu @ 2015-07-25  2:14 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: ast, davem, acme, mingo, a.p.zijlstra, masami.hiramatsu.pt,
	jolsa, wangnan0, linux-kernel, pi3orama, hekuang

On 2015/7/24 7:33, Daniel Borkmann wrote:
> On 07/22/2015 10:09 AM, Kaixu Xia wrote:
>> Previous patch v1 url:
>> https://lkml.org/lkml/2015/7/17/287
> 
> [ Sorry to chime in late, just noticed this series now as I wasn't in Cc for
>   the core BPF changes. More below ... ]

Sorry about this, will add you to the Cc list :) Your comments are welcome.
> 
>> This patchset allows user read PMU events in the following way:
>>   1. Open the PMU using perf_event_open() (for each CPUs or for
>>      each processes he/she'd like to watch);
>>   2. Create a BPF_MAP_TYPE_PERF_EVENT_ARRAY BPF map;
>>   3. Insert FDs into the map with some key-value mapping scheme
>>      (i.e. cpuid -> event on that CPU);
>>   4. Load and attach eBPF programs as usual;
>>   5. In eBPF program, get the perf_event_map_fd and key (i.e.
>>      cpuid get from bpf_get_smp_processor_id()) then use
>>      bpf_perf_event_read() to read from it.
>>   6. Do anything he/her want.
>>
>> changes in V2:
>>   - put atomic_long_inc_not_zero() between fdget() and fdput();
>>   - limit the event type to PERF_TYPE_RAW and PERF_TYPE_HARDWARE;
>>   - Only read the event counter on current CPU or on current
>>     process;
>>   - add new map type BPF_MAP_TYPE_PERF_EVENT_ARRAY to store the
>>     pointer to the struct perf_event;
>>   - according to the perf_event_map_fd and key, the function
>>     bpf_perf_event_read() can get the Hardware PMU counter value;
>>
>> Patch 5/5 is a simple example and shows how to use this new eBPF
>> programs ability. The PMU counter data can be found in
>> /sys/kernel/debug/tracing/trace(trace_pipe).(the cycles PMU
>> value when 'kprobe/sys_write' sampling)
>>
>>    $ cat /sys/kernel/debug/tracing/trace_pipe
>>    $ ./tracex6
>>         ...
>>               cat-677   [002] d..1   210.299270: : bpf count: CPU-2  5316659
>>               cat-677   [002] d..1   210.299316: : bpf count: CPU-2  5378639
>>               cat-677   [002] d..1   210.299362: : bpf count: CPU-2  5440654
>>               cat-677   [002] d..1   210.299408: : bpf count: CPU-2  5503211
>>               cat-677   [002] d..1   210.299454: : bpf count: CPU-2  5565438
>>               cat-677   [002] d..1   210.299500: : bpf count: CPU-2  5627433
>>               cat-677   [002] d..1   210.299547: : bpf count: CPU-2  5690033
>>               cat-677   [002] d..1   210.299593: : bpf count: CPU-2  5752184
>>               cat-677   [002] d..1   210.299639: : bpf count: CPU-2  5814543
>>             <...>-548   [009] d..1   210.299667: : bpf count: CPU-9  605418074
>>             <...>-548   [009] d..1   210.299692: : bpf count: CPU-9  605452692
>>               cat-677   [002] d..1   210.299700: : bpf count: CPU-2  5896319
>>             <...>-548   [009] d..1   210.299710: : bpf count: CPU-9  605477824
>>             <...>-548   [009] d..1   210.299728: : bpf count: CPU-9  605501726
>>             <...>-548   [009] d..1   210.299745: : bpf count: CPU-9  605525279
>>             <...>-548   [009] d..1   210.299762: : bpf count: CPU-9  605547817
>>             <...>-548   [009] d..1   210.299778: : bpf count: CPU-9  605570433
>>             <...>-548   [009] d..1   210.299795: : bpf count: CPU-9  605592743
>>         ...
>>
>> The detail of patches is as follow:
>>
>> Patch 1/5 introduces a new bpf map type. This map only stores the
>> pointer to struct perf_event;
>>
>> Patch 2/5 introduces a map_traverse_elem() function for further use;
>>
>> Patch 3/5 converts event file descriptors into perf_event structures when
>> adding new elements to the map;
> 
> So far all the map backends are of generic nature, knowing absolutely nothing
> about a particular consumer/subsystem of eBPF (tc, socket filters, etc). The
> tail call is a bit special, but nevertheless generic for each user and [very]
> useful, so it makes sense to inherit from the array map and move the code there.
> 
> I don't really like that we start adding new _special_-cased maps into the
> eBPF core code here; it seems quite hacky. :( From your rather terse commit
> description where you introduce the maps, I failed to see a detailed
> elaboration on this, i.e. why it cannot be abstracted any differently.

It will be very useful to give eBPF programs the ability to access
hardware PMU counters, just as I mentioned in the V1 commit message.
Of course, there is some special code for creating the perf_event map type
in V2, but you will find less special code in the next version (V3). I have
reused most of the prog_array map implementation. We can make the perf_event
array map more generic in the future.

BR.
>  
>> Patch 4/5 implements function bpf_perf_event_read() that gets the selected
>> hardware PMU counter;
>>
>> Patch 5/5 gives a simple example.
>>
>> Kaixu Xia (5):
>>    bpf: Add new bpf map type to store the pointer to struct perf_event
>>    bpf: Add function map->ops->map_traverse_elem() to traverse map elems
>>    bpf: Save the pointer to struct perf_event to map
>>    bpf: Implement function bpf_perf_event_read() that get the selected
>>      hardware PMU counter
>>    samples/bpf: example of get selected PMU counter value
>>
>>   include/linux/bpf.h        |   6 +++
>>   include/linux/perf_event.h |   5 ++-
>>   include/uapi/linux/bpf.h   |   3 ++
>>   kernel/bpf/arraymap.c      | 110 +++++++++++++++++++++++++++++++++++++++++++++
>>   kernel/bpf/helpers.c       |  42 +++++++++++++++++
>>   kernel/bpf/syscall.c       |  26 +++++++++++
>>   kernel/events/core.c       |  30 ++++++++++++-
>>   kernel/trace/bpf_trace.c   |   2 +
>>   samples/bpf/Makefile       |   4 ++
>>   samples/bpf/bpf_helpers.h  |   2 +
>>   samples/bpf/tracex6_kern.c |  27 +++++++++++
>>   samples/bpf/tracex6_user.c |  67 +++++++++++++++++++++++++++
>>   12 files changed, 321 insertions(+), 3 deletions(-)
>>   create mode 100644 samples/bpf/tracex6_kern.c
>>   create mode 100644 samples/bpf/tracex6_user.c
>>
> 
> 
> .
> 




end of thread, other threads:[~2015-07-25  2:15 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-22  8:09 [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter Kaixu Xia
2015-07-22  8:09 ` [PATCH v2 1/5] bpf: Add new bpf map type to store the pointer to struct perf_event Kaixu Xia
2015-07-23  0:48   ` Alexei Starovoitov
2015-07-23  1:08     ` Wangnan (F)
2015-07-23  1:39       ` Alexei Starovoitov
2015-07-22  8:09 ` [PATCH v2 2/5] bpf: Add function map->ops->map_traverse_elem() to traverse map elems Kaixu Xia
2015-07-23  1:00   ` Alexei Starovoitov
2015-07-22  8:09 ` [PATCH v2 3/5] bpf: Save the pointer to struct perf_event to map Kaixu Xia
2015-07-23  1:08   ` Alexei Starovoitov
2015-07-22  8:09 ` [PATCH v2 4/5] bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU counter Kaixu Xia
2015-07-23  1:14   ` Alexei Starovoitov
2015-07-23  2:12     ` xiakaixu
2015-07-23  2:22       ` Alexei Starovoitov
2015-07-23  2:39         ` xiakaixu
2015-07-22  8:09 ` [PATCH v2 5/5] samples/bpf: example of get selected PMU counter value Kaixu Xia
2015-07-23  1:16   ` Alexei Starovoitov
2015-07-23 23:33 ` [PATCH v2 0/5] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter Daniel Borkmann
2015-07-25  2:14   ` xiakaixu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).