From: Kaixu Xia <xiakaixu@huawei.com>
To: <ast@plumgrid.com>, <davem@davemloft.net>, <acme@kernel.org>,
	<mingo@redhat.com>, <a.p.zijlstra@chello.nl>,
	<masami.hiramatsu.pt@hitachi.com>, <jolsa@kernel.org>
Cc: <xiakaixu@huawei.com>, <wangnan0@huawei.com>,
	<linux-kernel@vger.kernel.org>, <pi3orama@163.com>,
	<hekuang@huawei.com>
Subject: [PATCH v3 0/3] bpf: Introduce the new ability of eBPF programs to access hardware PMU counter
Date: Thu, 23 Jul 2015 09:42:39 +0000
Message-ID: <1437644562-84431-1-git-send-email-xiakaixu@huawei.com>

Previous patch v2 url:
https://lkml.org/lkml/2015/7/22/104

This patchset lets users read PMU events as follows:
 1. Open the PMU using perf_event_open() (for each CPU or for
    each process to be watched);
 2. Create a BPF_MAP_TYPE_PERF_EVENT_ARRAY BPF map;
 3. Insert FDs into the map with some key-value mapping scheme
    (e.g. cpuid -> event on that CPU);
 4. Load and attach eBPF programs as usual;
 5. In the eBPF program, take the perf_event_map_fd and an index
    (e.g. the cpuid from bpf_get_smp_processor_id()) and use
    bpf_perf_event_read() to read the counter;
 6. Do whatever is needed with the value.
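
Steps 1-3 might look roughly like this from user space. This is a
minimal sketch, not the code from patch 3/3: the function names
(cycles_attr, open_cycles_on_cpu, perf_array_attr, create_perf_array,
store_event) are made up for illustration, and the 4-byte key and
value sizes are an assumption. It uses raw syscalls because glibc has
no wrappers for perf_event_open() or bpf(), and it needs kernel
headers that already define BPF_MAP_TYPE_PERF_EVENT_ARRAY.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>
#include <linux/perf_event.h>

/* Step 1a: describe a hardware CPU-cycles counter. */
struct perf_event_attr cycles_attr(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_HARDWARE;
	attr.size = sizeof(attr);
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	return attr;
}

/* Step 1b: open the counter on one CPU (pid = -1: any task on that CPU).
 * Returns the event FD, or -1 with errno set. */
int open_cycles_on_cpu(int cpu)
{
	struct perf_event_attr attr = cycles_attr();

	return (int)syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
}

/* Step 2a: describe the map; one slot per CPU (sizes are an assumption). */
union bpf_attr perf_array_attr(unsigned int nr_cpus)
{
	union bpf_attr attr;

	assert(nr_cpus > 0);
	memset(&attr, 0, sizeof(attr));
	attr.map_type = BPF_MAP_TYPE_PERF_EVENT_ARRAY;
	attr.key_size = sizeof(int);	/* key: cpuid */
	attr.value_size = sizeof(int);	/* value: perf event FD */
	attr.max_entries = nr_cpus;
	return attr;
}

/* Step 2b: create the map. Returns the map FD, or -1 with errno set. */
int create_perf_array(unsigned int nr_cpus)
{
	union bpf_attr attr = perf_array_attr(nr_cpus);

	return (int)syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}

/* Step 3: store event_fd at index cpu. Returns 0, or -1 with errno set. */
int store_event(int map_fd, int cpu, int event_fd)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = map_fd;
	attr.key = (__u64)(unsigned long)&cpu;
	attr.value = (__u64)(unsigned long)&event_fd;
	attr.flags = BPF_ANY;
	return (int)syscall(__NR_bpf, BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
}
```

A caller would loop over the CPUs of interest, calling
open_cycles_on_cpu(cpu) and then store_event(map_fd, cpu, fd) for
each one. These syscalls usually need root or equivalent
capabilities, so every return value should be checked.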

changes in V3: 
 - collapse V2 patches 1-3 into one;
 - drop the function map->ops->map_traverse_elem() and release
   the struct perf_event in map_free;
 - only allow access to bpf_perf_event_read() from programs;
 - update the perf_event_array_map elem via xchg();
 - pass index directly to bpf_perf_event_read() instead of
   MAP_KEY;

changes in V2: 
 - put atomic_long_inc_not_zero() between fdget() and fdput();
 - limit the event type to PERF_TYPE_RAW and PERF_TYPE_HARDWARE;
 - only read the event counter on the current CPU or for the
   current process;
 - add new map type BPF_MAP_TYPE_PERF_EVENT_ARRAY to store the
   pointer to the struct perf_event;
 - given the perf_event_map_fd and key, the function
   bpf_perf_event_read() can get the hardware PMU counter value;

Patch 3/3 is a simple example showing how to use this new eBPF
program ability. The PMU counter data can be read from
/sys/kernel/debug/tracing/trace (or trace_pipe). The output below
shows the cycles PMU value sampled at each 'kprobe/sys_write' hit:

  $ cat /sys/kernel/debug/tracing/trace_pipe
  $ ./tracex6
       ...
             cat-674   [000] d..1   146.413405: : CPU-0   2558223
           <...>-699   [003] d..1   146.413441: : CPU-3   2663985
             cat-674   [000] d..1   146.413480: : CPU-0   2659705
           <...>-699   [003] d..1   146.413516: : CPU-3   2765199
             cat-674   [000] d..1   146.413555: : CPU-0   2761277
           <...>-699   [003] d..1   146.413600: : CPU-3   2877051
             cat-674   [000] d..1   146.413651: : CPU-0   2889668
           <...>-699   [003] d..1   146.413696: : CPU-3   3006447
             cat-674   [000] d..1   146.413749: : CPU-0   3021285
           <...>-699   [003] d..1   146.413787: : CPU-3   3131459
             cat-674   [000] d..1   146.413838: : CPU-0   3141959
           <...>-699   [003] d..1   146.413886: : CPU-3   3264188
             cat-674   [000] d..1   146.413930: : CPU-0   3266461
           <...>-699   [003] d..1   146.413973: : CPU-3   3381038
             cat-674   [000] d..1   146.414025: : CPU-0   3393730
           <...>-699   [003] d..1   146.414071: : CPU-3   3514676
       ...
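
To post-process output like the above, each line can be reduced to a
(cpu, count) pair. The helper below is only an illustrative sketch:
struct pmu_sample and parse_sample() are hypothetical names, not part
of the patchset, and the "CPU-%d %llu" layout is assumed from the
sample output shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One decoded trace line; valid is 0 if the line had no counter field. */
struct pmu_sample {
	int valid;
	int cpu;
	unsigned long long count;
};

/* Find the "CPU-<n>   <count>" tail of a trace_pipe line and decode it. */
struct pmu_sample parse_sample(const char *line)
{
	struct pmu_sample s = { 0, -1, 0 };
	const char *p;

	assert(line != NULL);
	p = strstr(line, "CPU-");
	if (p && sscanf(p, "CPU-%d %llu", &s.cpu, &s.count) == 2)
		s.valid = 1;
	return s;
}
```

Subtracting two consecutive counts for the same CPU gives the number
of cycles counted between the two sys_write hits; for the first two
CPU-0 lines above that is 2659705 - 2558223 = 101482.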

The patches in detail:

Patch 1/3 introduces a new bpf map type. This map only stores
pointers to struct perf_event;

Patch 2/3 implements the function bpf_perf_event_read(), which reads
the selected hardware PMU counter;

Patch 3/3 gives a simple example.

Kaixu Xia (3):
  bpf: Add new bpf map type to store the pointer to struct perf_event
  bpf: Implement function bpf_perf_event_read() that get the selected
    hardware PMU conuter
  samples/bpf: example of get selected PMU counter value

 include/linux/bpf.h        |   3 ++
 include/linux/perf_event.h |   5 +-
 include/uapi/linux/bpf.h   |   2 +
 kernel/bpf/arraymap.c      | 113 +++++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/helpers.c       |  36 +++++++++++++++
 kernel/bpf/verifier.c      |  15 ++++++
 kernel/events/core.c       |  27 ++++++++++-
 kernel/trace/bpf_trace.c   |   2 +
 samples/bpf/Makefile       |   4 ++
 samples/bpf/bpf_helpers.h  |   2 +
 samples/bpf/tracex6_kern.c |  26 +++++++++++
 samples/bpf/tracex6_user.c |  67 +++++++++++++++++++++++++++
 12 files changed, 299 insertions(+), 3 deletions(-)
 create mode 100644 samples/bpf/tracex6_kern.c
 create mode 100644 samples/bpf/tracex6_user.c

-- 
1.8.3.4


Thread overview (15+ messages):
2015-07-23  9:42 Kaixu Xia [this message]
2015-07-23  9:42 ` [PATCH v3 1/3] bpf: Add new bpf map type to store the pointer to struct perf_event Kaixu Xia
2015-07-23 22:54   ` Alexei Starovoitov
2015-07-24  2:22     ` xiakaixu
2015-07-24  2:26       ` Alexei Starovoitov
2015-08-03  9:38   ` Peter Zijlstra
2015-07-23  9:42 ` [PATCH v3 2/3] bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter Kaixu Xia
2015-07-23 22:56   ` Alexei Starovoitov
2015-07-24  1:57     ` xiakaixu
2015-08-03  9:34   ` Peter Zijlstra
2015-08-03 10:32     ` xiakaixu
2015-07-23  9:42 ` [PATCH v3 3/3] samples/bpf: example of get selected PMU counter value Kaixu Xia
2015-07-23 22:59   ` Alexei Starovoitov
2015-07-24  1:54     ` xiakaixu
2015-07-24  2:23       ` Alexei Starovoitov
