* [PATCH v5 bpf-next 0/5] libbpf: add perf buffer abstraction and API
@ 2019-07-06 4:35 Andrii Nakryiko
2019-07-06 4:35 ` [PATCH v5 bpf-next 1/5] libbpf: add perf buffer API Andrii Nakryiko
` (4 more replies)
0 siblings, 5 replies; 9+ messages in thread
From: Andrii Nakryiko @ 2019-07-06 4:35 UTC (permalink / raw)
To: andrii.nakryiko, kernel-team, bpf, netdev, ast, daniel; +Cc: Andrii Nakryiko
This patchset adds a high-level API for setting up and polling perf buffers
associated with a BPF_MAP_TYPE_PERF_EVENT_ARRAY map. Details of the APIs are
described in the corresponding commits.
Patch #1 adds a set of APIs to set up and work with a perf buffer.
Patch #2 enhances libbpf to support auto-setting PERF_EVENT_ARRAY map size.
Patch #3 adds a selftest.
Patch #4 converts bpftool's map event_pipe to the new API.
Patch #5 updates the README to mention the perf_buffer_ prefix.
v4->v5:
- initialize perf_buffer_raw_opts in bpftool map event_pipe (Jakub);
- add perf_buffer_ to README;
v3->v4:
- fixed bpftool event_pipe cmd error handling (Jakub);
v2->v3:
- added perf_buffer__new_raw for more low-level control;
- converted bpftool map event_pipe to new API (Daniel);
- fixed bug with error handling in create_maps (Song);
v1->v2:
- add auto-sizing of PERF_EVENT_ARRAY maps;
Andrii Nakryiko (5):
libbpf: add perf buffer API
libbpf: auto-set PERF_EVENT_ARRAY size to number of CPUs
selftests/bpf: test perf buffer API
tools/bpftool: switch map event_pipe to libbpf's perf_buffer
libbpf: add perf_buffer_ prefix to README
tools/bpf/bpftool/map_perf_ring.c | 201 +++------
tools/lib/bpf/README.rst | 3 +-
tools/lib/bpf/libbpf.c | 397 +++++++++++++++++-
tools/lib/bpf/libbpf.h | 49 +++
tools/lib/bpf/libbpf.map | 4 +
.../selftests/bpf/prog_tests/perf_buffer.c | 94 +++++
.../selftests/bpf/progs/test_perf_buffer.c | 25 ++
7 files changed, 628 insertions(+), 145 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_buffer.c
create mode 100644 tools/testing/selftests/bpf/progs/test_perf_buffer.c
--
2.17.1
* [PATCH v5 bpf-next 1/5] libbpf: add perf buffer API
2019-07-06 4:35 [PATCH v5 bpf-next 0/5] libbpf: add perf buffer abstraction and API Andrii Nakryiko
@ 2019-07-06 4:35 ` Andrii Nakryiko
2019-07-06 5:42 ` Yonghong Song
2019-07-06 4:35 ` [PATCH v5 bpf-next 2/5] libbpf: auto-set PERF_EVENT_ARRAY size to number of CPUs Andrii Nakryiko
` (3 subsequent siblings)
4 siblings, 1 reply; 9+ messages in thread
From: Andrii Nakryiko @ 2019-07-06 4:35 UTC (permalink / raw)
To: andrii.nakryiko, kernel-team, bpf, netdev, ast, daniel; +Cc: Andrii Nakryiko
A BPF_MAP_TYPE_PERF_EVENT_ARRAY map is often used to send data from a BPF program
to user space for additional processing. libbpf already has a very low-level API
to read a single CPU's perf buffer, bpf_perf_event_read_simple(), but it's hard to
use and requires a lot of code to set everything up. This patch adds a
perf_buffer abstraction on top of it, hiding the per-CPU setup and polling
logic behind a simple and convenient API, similar to what BCC provides.
perf_buffer__new() sets up per-CPU ring buffers and updates the corresponding BPF
map entries. It accepts two user-provided callbacks: one for handling raw
samples and one for getting notified of samples lost due to buffer overflow.
perf_buffer__new_raw() is similar, but provides more control over how
perf events are set up (by accepting a user-provided perf_event_attr), how
they are handled (a perf_event_header pointer is passed directly to the
user-provided callback), and on which CPUs ring buffers are created
(it's possible to provide a list of CPUs and corresponding map keys to
update). This gives advanced users fuller control.
perf_buffer__poll() is used to fetch ring buffer data across all CPUs,
utilizing an epoll instance.
perf_buffer__free() does the corresponding clean-up and unsets FDs from the BPF map.
None of these APIs are thread-safe; users should ensure proper locking/coordination
if they are used in a multi-threaded setup.
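For illustration, a minimal usage sketch (hypothetical helper names; assumes
libbpf headers are installed as <bpf/libbpf.h> and 'map' is a
BPF_MAP_TYPE_PERF_EVENT_ARRAY map from an already loaded bpf_object; error
handling is abbreviated):

    #include <stdio.h>
    #include <stdbool.h>
    #include <bpf/libbpf.h>

    static volatile bool exiting;

    static void handle_sample(void *ctx, int cpu, void *data, __u32 size)
    {
            /* data/size is the raw sample emitted by bpf_perf_event_output() */
    }

    static void handle_lost(void *ctx, int cpu, __u64 cnt)
    {
            fprintf(stderr, "lost %llu samples on CPU #%d\n",
                    (unsigned long long)cnt, cpu);
    }

    void consume_samples(struct bpf_map *map)
    {
            struct perf_buffer_opts pb_opts = {};
            struct perf_buffer *pb;

            pb_opts.sample_cb = handle_sample;
            pb_opts.lost_cb = handle_lost;
            pb = perf_buffer__new(bpf_map__fd(map), 8 /* pages per CPU ring */,
                                  &pb_opts);
            if (libbpf_get_error(pb))
                    return;

            while (!exiting)
                    perf_buffer__poll(pb, 100 /* timeout, ms */);

            perf_buffer__free(pb);
    }

perf_buffer__new() transparently performs perf_event_open() and mmap() for each
CPU and sets the resulting FDs into the map, so no per-CPU boilerplate is needed
on the user side.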
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
tools/lib/bpf/libbpf.c | 366 +++++++++++++++++++++++++++++++++++++++
tools/lib/bpf/libbpf.h | 49 ++++++
tools/lib/bpf/libbpf.map | 4 +
3 files changed, 419 insertions(+)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 2a08eb106221..72149d68b8c1 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -32,7 +32,9 @@
#include <linux/limits.h>
#include <linux/perf_event.h>
#include <linux/ring_buffer.h>
+#include <sys/epoll.h>
#include <sys/ioctl.h>
+#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/vfs.h>
@@ -4354,6 +4356,370 @@ bpf_perf_event_read_simple(void *mmap_mem, size_t mmap_size, size_t page_size,
return ret;
}
+struct perf_buffer;
+
+struct perf_buffer_params {
+ struct perf_event_attr *attr;
+ /* if event_cb is specified, it takes precedence */
+ perf_buffer_event_fn event_cb;
+ /* sample_cb and lost_cb are higher-level common-case callbacks */
+ perf_buffer_sample_fn sample_cb;
+ perf_buffer_lost_fn lost_cb;
+ void *ctx;
+ int cpu_cnt;
+ int *cpus;
+ int *map_keys;
+};
+
+struct perf_cpu_buf {
+ struct perf_buffer *pb;
+ void *base; /* mmap()'ed memory */
+ void *buf; /* for reconstructing segmented data */
+ size_t buf_size;
+ int fd;
+ int cpu;
+ int map_key;
+};
+
+struct perf_buffer {
+ perf_buffer_event_fn event_cb;
+ perf_buffer_sample_fn sample_cb;
+ perf_buffer_lost_fn lost_cb;
+ void *ctx; /* passed into callbacks */
+
+ size_t page_size;
+ size_t mmap_size;
+ struct perf_cpu_buf **cpu_bufs;
+ struct epoll_event *events;
+ int cpu_cnt;
+ int epoll_fd; /* epoll instance FD */
+ int map_fd; /* BPF_MAP_TYPE_PERF_EVENT_ARRAY BPF map FD */
+};
+
+static void perf_buffer__free_cpu_buf(struct perf_buffer *pb,
+ struct perf_cpu_buf *cpu_buf)
+{
+ if (!cpu_buf)
+ return;
+ if (cpu_buf->base &&
+ munmap(cpu_buf->base, pb->mmap_size + pb->page_size))
+ pr_warning("failed to munmap cpu_buf #%d\n", cpu_buf->cpu);
+ if (cpu_buf->fd >= 0) {
+ ioctl(cpu_buf->fd, PERF_EVENT_IOC_DISABLE, 0);
+ close(cpu_buf->fd);
+ }
+ free(cpu_buf->buf);
+ free(cpu_buf);
+}
+
+void perf_buffer__free(struct perf_buffer *pb)
+{
+ int i;
+
+ if (!pb)
+ return;
+ if (pb->cpu_bufs) {
+ for (i = 0; i < pb->cpu_cnt && pb->cpu_bufs[i]; i++) {
+ struct perf_cpu_buf *cpu_buf = pb->cpu_bufs[i];
+
+ bpf_map_delete_elem(pb->map_fd, &cpu_buf->map_key);
+ perf_buffer__free_cpu_buf(pb, cpu_buf);
+ }
+ free(pb->cpu_bufs);
+ }
+ if (pb->epoll_fd >= 0)
+ close(pb->epoll_fd);
+ free(pb->events);
+ free(pb);
+}
+
+static struct perf_cpu_buf *
+perf_buffer__open_cpu_buf(struct perf_buffer *pb, struct perf_event_attr *attr,
+ int cpu, int map_key)
+{
+ struct perf_cpu_buf *cpu_buf;
+ char msg[STRERR_BUFSIZE];
+ int err;
+
+ cpu_buf = calloc(1, sizeof(*cpu_buf));
+ if (!cpu_buf)
+ return ERR_PTR(-ENOMEM);
+
+ cpu_buf->pb = pb;
+ cpu_buf->cpu = cpu;
+ cpu_buf->map_key = map_key;
+
+ cpu_buf->fd = syscall(__NR_perf_event_open, attr, -1 /* pid */, cpu,
+ -1, PERF_FLAG_FD_CLOEXEC);
+ if (cpu_buf->fd < 0) {
+ err = -errno;
+ pr_warning("failed to open perf buffer event on cpu #%d: %s\n",
+ cpu, libbpf_strerror_r(err, msg, sizeof(msg)));
+ goto error;
+ }
+
+ cpu_buf->base = mmap(NULL, pb->mmap_size + pb->page_size,
+ PROT_READ | PROT_WRITE, MAP_SHARED,
+ cpu_buf->fd, 0);
+ if (cpu_buf->base == MAP_FAILED) {
+ cpu_buf->base = NULL;
+ err = -errno;
+ pr_warning("failed to mmap perf buffer on cpu #%d: %s\n",
+ cpu, libbpf_strerror_r(err, msg, sizeof(msg)));
+ goto error;
+ }
+
+ if (ioctl(cpu_buf->fd, PERF_EVENT_IOC_ENABLE, 0) < 0) {
+ err = -errno;
+ pr_warning("failed to enable perf buffer event on cpu #%d: %s\n",
+ cpu, libbpf_strerror_r(err, msg, sizeof(msg)));
+ goto error;
+ }
+
+ return cpu_buf;
+
+error:
+ perf_buffer__free_cpu_buf(pb, cpu_buf);
+ return (struct perf_cpu_buf *)ERR_PTR(err);
+}
+
+static struct perf_buffer *__perf_buffer__new(int map_fd, size_t page_cnt,
+ struct perf_buffer_params *p);
+
+struct perf_buffer *perf_buffer__new(int map_fd, size_t page_cnt,
+ const struct perf_buffer_opts *opts)
+{
+ struct perf_buffer_params p = {};
+ struct perf_event_attr attr = {
+ .config = PERF_COUNT_SW_BPF_OUTPUT,
+ .type = PERF_TYPE_SOFTWARE,
+ .sample_type = PERF_SAMPLE_RAW,
+ .sample_period = 1,
+ .wakeup_events = 1,
+ };
+
+ p.attr = &attr;
+ p.sample_cb = opts ? opts->sample_cb : NULL;
+ p.lost_cb = opts ? opts->lost_cb : NULL;
+ p.ctx = opts ? opts->ctx : NULL;
+
+ return __perf_buffer__new(map_fd, page_cnt, &p);
+}
+
+struct perf_buffer *
+perf_buffer__new_raw(int map_fd, size_t page_cnt,
+ const struct perf_buffer_raw_opts *opts)
+{
+ struct perf_buffer_params p = {};
+
+ p.attr = opts->attr;
+ p.event_cb = opts->event_cb;
+ p.ctx = opts->ctx;
+ p.cpu_cnt = opts->cpu_cnt;
+ p.cpus = opts->cpus;
+ p.map_keys = opts->map_keys;
+
+ return __perf_buffer__new(map_fd, page_cnt, &p);
+}
+
+static struct perf_buffer *__perf_buffer__new(int map_fd, size_t page_cnt,
+ struct perf_buffer_params *p)
+{
+ struct bpf_map_info map = {};
+ char msg[STRERR_BUFSIZE];
+ struct perf_buffer *pb;
+ __u32 map_info_len;
+ int err, i;
+
+ if (page_cnt & (page_cnt - 1)) {
+ pr_warning("page count should be power of two, but is %zu\n",
+ page_cnt);
+ return ERR_PTR(-EINVAL);
+ }
+
+ map_info_len = sizeof(map);
+ err = bpf_obj_get_info_by_fd(map_fd, &map, &map_info_len);
+ if (err) {
+ err = -errno;
+ pr_warning("failed to get map info for map FD %d: %s\n",
+ map_fd, libbpf_strerror_r(err, msg, sizeof(msg)));
+ return ERR_PTR(err);
+ }
+
+ if (map.type != BPF_MAP_TYPE_PERF_EVENT_ARRAY) {
+ pr_warning("map '%s' should be BPF_MAP_TYPE_PERF_EVENT_ARRAY\n",
+ map.name);
+ return ERR_PTR(-EINVAL);
+ }
+
+ pb = calloc(1, sizeof(*pb));
+ if (!pb)
+ return ERR_PTR(-ENOMEM);
+
+ pb->event_cb = p->event_cb;
+ pb->sample_cb = p->sample_cb;
+ pb->lost_cb = p->lost_cb;
+ pb->ctx = p->ctx;
+
+ pb->page_size = getpagesize();
+ pb->mmap_size = pb->page_size * page_cnt;
+ pb->map_fd = map_fd;
+
+ pb->epoll_fd = epoll_create1(EPOLL_CLOEXEC);
+ if (pb->epoll_fd < 0) {
+ err = -errno;
+ pr_warning("failed to create epoll instance: %s\n",
+ libbpf_strerror_r(err, msg, sizeof(msg)));
+ goto error;
+ }
+
+ if (p->cpu_cnt > 0) {
+ pb->cpu_cnt = p->cpu_cnt;
+ } else {
+ pb->cpu_cnt = libbpf_num_possible_cpus();
+ if (pb->cpu_cnt < 0) {
+ err = pb->cpu_cnt;
+ goto error;
+ }
+ if (map.max_entries < pb->cpu_cnt)
+ pb->cpu_cnt = map.max_entries;
+ }
+
+ pb->events = calloc(pb->cpu_cnt, sizeof(*pb->events));
+ if (!pb->events) {
+ err = -ENOMEM;
+ pr_warning("failed to allocate events: out of memory\n");
+ goto error;
+ }
+ pb->cpu_bufs = calloc(pb->cpu_cnt, sizeof(*pb->cpu_bufs));
+ if (!pb->cpu_bufs) {
+ err = -ENOMEM;
+ pr_warning("failed to allocate buffers: out of memory\n");
+ goto error;
+ }
+
+ for (i = 0; i < pb->cpu_cnt; i++) {
+ struct perf_cpu_buf *cpu_buf;
+ int cpu, map_key;
+
+ cpu = p->cpu_cnt > 0 ? p->cpus[i] : i;
+ map_key = p->cpu_cnt > 0 ? p->map_keys[i] : i;
+
+ cpu_buf = perf_buffer__open_cpu_buf(pb, p->attr, cpu, map_key);
+ if (IS_ERR(cpu_buf)) {
+ err = PTR_ERR(cpu_buf);
+ goto error;
+ }
+
+ pb->cpu_bufs[i] = cpu_buf;
+
+ err = bpf_map_update_elem(pb->map_fd, &map_key,
+ &cpu_buf->fd, 0);
+ if (err) {
+ err = -errno;
+ pr_warning("failed to set cpu #%d, key %d -> perf FD %d: %s\n",
+ cpu, map_key, cpu_buf->fd,
+ libbpf_strerror_r(err, msg, sizeof(msg)));
+ goto error;
+ }
+
+ pb->events[i].events = EPOLLIN;
+ pb->events[i].data.ptr = cpu_buf;
+ if (epoll_ctl(pb->epoll_fd, EPOLL_CTL_ADD, cpu_buf->fd,
+ &pb->events[i]) < 0) {
+ err = -errno;
+ pr_warning("failed to epoll_ctl cpu #%d perf FD %d: %s\n",
+ cpu, cpu_buf->fd,
+ libbpf_strerror_r(err, msg, sizeof(msg)));
+ goto error;
+ }
+ }
+
+ return pb;
+
+error:
+ if (pb)
+ perf_buffer__free(pb);
+ return ERR_PTR(err);
+}
+
+struct perf_sample_raw {
+ struct perf_event_header header;
+ uint32_t size;
+ char data[0];
+};
+
+struct perf_sample_lost {
+ struct perf_event_header header;
+ uint64_t id;
+ uint64_t lost;
+ uint64_t sample_id;
+};
+
+static enum bpf_perf_event_ret
+perf_buffer__process_record(struct perf_event_header *e, void *ctx)
+{
+ struct perf_cpu_buf *cpu_buf = ctx;
+ struct perf_buffer *pb = cpu_buf->pb;
+ void *data = e;
+
+ /* user wants full control over parsing perf event */
+ if (pb->event_cb)
+ return pb->event_cb(pb->ctx, cpu_buf->cpu, e);
+
+ switch (e->type) {
+ case PERF_RECORD_SAMPLE: {
+ struct perf_sample_raw *s = data;
+
+ if (pb->sample_cb)
+ pb->sample_cb(pb->ctx, cpu_buf->cpu, s->data, s->size);
+ break;
+ }
+ case PERF_RECORD_LOST: {
+ struct perf_sample_lost *s = data;
+
+ if (pb->lost_cb)
+ pb->lost_cb(pb->ctx, cpu_buf->cpu, s->lost);
+ break;
+ }
+ default:
+ pr_warning("unknown perf sample type %d\n", e->type);
+ return LIBBPF_PERF_EVENT_ERROR;
+ }
+ return LIBBPF_PERF_EVENT_CONT;
+}
+
+static int perf_buffer__process_records(struct perf_buffer *pb,
+ struct perf_cpu_buf *cpu_buf)
+{
+ enum bpf_perf_event_ret ret;
+
+ ret = bpf_perf_event_read_simple(cpu_buf->base, pb->mmap_size,
+ pb->page_size, &cpu_buf->buf,
+ &cpu_buf->buf_size,
+ perf_buffer__process_record, cpu_buf);
+ if (ret != LIBBPF_PERF_EVENT_CONT)
+ return ret;
+ return 0;
+}
+
+int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms)
+{
+ int cnt, err;
+
+ cnt = epoll_wait(pb->epoll_fd, pb->events, pb->cpu_cnt, timeout_ms);
+ for (int i = 0; i < cnt; i++) {
+ struct perf_cpu_buf *cpu_buf = pb->events[i].data.ptr;
+
+ err = perf_buffer__process_records(pb, cpu_buf);
+ if (err) {
+ pr_warning("error while processing records: %d\n", err);
+ return err;
+ }
+ }
+ return cnt < 0 ? -errno : cnt;
+}
+
struct bpf_prog_info_array_desc {
int array_offset; /* e.g. offset of jited_prog_insns */
int count_offset; /* e.g. offset of jited_prog_len */
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index f55933784f95..5cbf459ece0b 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -358,6 +358,26 @@ LIBBPF_API int bpf_prog_load(const char *file, enum bpf_prog_type type,
LIBBPF_API int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags);
LIBBPF_API int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags);
+struct perf_buffer;
+
+typedef void (*perf_buffer_sample_fn)(void *ctx, int cpu,
+ void *data, __u32 size);
+typedef void (*perf_buffer_lost_fn)(void *ctx, int cpu, __u64 cnt);
+
+/* common use perf buffer options */
+struct perf_buffer_opts {
+ /* if specified, sample_cb is called for each sample */
+ perf_buffer_sample_fn sample_cb;
+ /* if specified, lost_cb is called for each batch of lost samples */
+ perf_buffer_lost_fn lost_cb;
+ /* ctx is provided to sample_cb and lost_cb */
+ void *ctx;
+};
+
+LIBBPF_API struct perf_buffer *
+perf_buffer__new(int map_fd, size_t page_cnt,
+ const struct perf_buffer_opts *opts);
+
enum bpf_perf_event_ret {
LIBBPF_PERF_EVENT_DONE = 0,
LIBBPF_PERF_EVENT_ERROR = -1,
@@ -365,6 +385,35 @@ enum bpf_perf_event_ret {
};
struct perf_event_header;
+
+typedef enum bpf_perf_event_ret
+(*perf_buffer_event_fn)(void *ctx, int cpu, struct perf_event_header *event);
+
+/* raw perf buffer options, giving most power and control */
+struct perf_buffer_raw_opts {
+ /* perf event attrs passed directly into perf_event_open() */
+ struct perf_event_attr *attr;
+ /* raw event callback */
+ perf_buffer_event_fn event_cb;
+ /* ctx is provided to event_cb */
+ void *ctx;
+ /* if cpu_cnt == 0, open all on all possible CPUs (up to the number of
+ * max_entries of given PERF_EVENT_ARRAY map)
+ */
+ int cpu_cnt;
+ /* if cpu_cnt > 0, cpus is an array of CPUs to open ring buffers on */
+ int *cpus;
+ /* if cpu_cnt > 0, map_keys specify map keys to set per-CPU FDs for */
+ int *map_keys;
+};
+
+LIBBPF_API struct perf_buffer *
+perf_buffer__new_raw(int map_fd, size_t page_cnt,
+ const struct perf_buffer_raw_opts *opts);
+
+LIBBPF_API void perf_buffer__free(struct perf_buffer *pb);
+LIBBPF_API int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms);
+
typedef enum bpf_perf_event_ret
(*bpf_perf_event_print_t)(struct perf_event_header *hdr,
void *private_data);
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index e6b7d4edbc93..f9d316e873d8 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -179,4 +179,8 @@ LIBBPF_0.0.4 {
btf_dump__new;
btf__parse_elf;
libbpf_num_possible_cpus;
+ perf_buffer__free;
+ perf_buffer__new;
+ perf_buffer__new_raw;
+ perf_buffer__poll;
} LIBBPF_0.0.3;
--
2.17.1
* [PATCH v5 bpf-next 2/5] libbpf: auto-set PERF_EVENT_ARRAY size to number of CPUs
2019-07-06 4:35 [PATCH v5 bpf-next 0/5] libbpf: add perf buffer abstraction and API Andrii Nakryiko
2019-07-06 4:35 ` [PATCH v5 bpf-next 1/5] libbpf: add perf buffer API Andrii Nakryiko
@ 2019-07-06 4:35 ` Andrii Nakryiko
2019-07-06 4:35 ` [PATCH v5 bpf-next 3/5] selftests/bpf: test perf buffer API Andrii Nakryiko
` (2 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Andrii Nakryiko @ 2019-07-06 4:35 UTC (permalink / raw)
To: andrii.nakryiko, kernel-team, bpf, netdev, ast, daniel; +Cc: Andrii Nakryiko
For a BPF_MAP_TYPE_PERF_EVENT_ARRAY map, the typically correct size is the number
of possible CPUs, which is impossible to specify at compilation time. This
change automatically sets the PERF_EVENT_ARRAY size to the number of
system CPUs, unless a non-zero size is specified explicitly. This still allows
the size to be adjusted for advanced, specific cases, while providing a
convenient and logical default.
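For example, with this change the following map definition (the one used by the
selftest added later in this series) gets max_entries set to the number of
possible CPUs automatically at map creation time, because max_entries is left
unspecified:

    struct {
            __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
            __uint(key_size, sizeof(int));
            __uint(value_size, sizeof(int));
    } perf_buf_map SEC(".maps");

An explicitly specified non-zero max_entries is left untouched.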
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
---
tools/lib/bpf/libbpf.c | 31 ++++++++++++++++++++++++-------
1 file changed, 24 insertions(+), 7 deletions(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 72149d68b8c1..e4df833b5138 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -2116,6 +2116,7 @@ static int
bpf_object__create_maps(struct bpf_object *obj)
{
struct bpf_create_map_attr create_attr = {};
+ int nr_cpus = 0;
unsigned int i;
int err;
@@ -2138,7 +2139,22 @@ bpf_object__create_maps(struct bpf_object *obj)
create_attr.map_flags = def->map_flags;
create_attr.key_size = def->key_size;
create_attr.value_size = def->value_size;
- create_attr.max_entries = def->max_entries;
+ if (def->type == BPF_MAP_TYPE_PERF_EVENT_ARRAY &&
+ !def->max_entries) {
+ if (!nr_cpus)
+ nr_cpus = libbpf_num_possible_cpus();
+ if (nr_cpus < 0) {
+ pr_warning("failed to determine number of system CPUs: %d\n",
+ nr_cpus);
+ err = nr_cpus;
+ goto err_out;
+ }
+ pr_debug("map '%s': setting size to %d\n",
+ map->name, nr_cpus);
+ create_attr.max_entries = nr_cpus;
+ } else {
+ create_attr.max_entries = def->max_entries;
+ }
create_attr.btf_fd = 0;
create_attr.btf_key_type_id = 0;
create_attr.btf_value_type_id = 0;
@@ -2155,9 +2171,10 @@ bpf_object__create_maps(struct bpf_object *obj)
*pfd = bpf_create_map_xattr(&create_attr);
if (*pfd < 0 && (create_attr.btf_key_type_id ||
create_attr.btf_value_type_id)) {
- cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
+ err = -errno;
+ cp = libbpf_strerror_r(err, errmsg, sizeof(errmsg));
pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
- map->name, cp, errno);
+ map->name, cp, err);
create_attr.btf_fd = 0;
create_attr.btf_key_type_id = 0;
create_attr.btf_value_type_id = 0;
@@ -2169,11 +2186,11 @@ bpf_object__create_maps(struct bpf_object *obj)
if (*pfd < 0) {
size_t j;
- err = *pfd;
+ err = -errno;
err_out:
- cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
- pr_warning("failed to create map (name: '%s'): %s\n",
- map->name, cp);
+ cp = libbpf_strerror_r(err, errmsg, sizeof(errmsg));
+ pr_warning("failed to create map (name: '%s'): %s(%d)\n",
+ map->name, cp, err);
for (j = 0; j < i; j++)
zclose(obj->maps[j].fd);
return err;
--
2.17.1
* [PATCH v5 bpf-next 3/5] selftests/bpf: test perf buffer API
2019-07-06 4:35 [PATCH v5 bpf-next 0/5] libbpf: add perf buffer abstraction and API Andrii Nakryiko
2019-07-06 4:35 ` [PATCH v5 bpf-next 1/5] libbpf: add perf buffer API Andrii Nakryiko
2019-07-06 4:35 ` [PATCH v5 bpf-next 2/5] libbpf: auto-set PERF_EVENT_ARRAY size to number of CPUs Andrii Nakryiko
@ 2019-07-06 4:35 ` Andrii Nakryiko
2019-07-06 4:35 ` [PATCH v5 bpf-next 4/5] tools/bpftool: switch map event_pipe to libbpf's perf_buffer Andrii Nakryiko
2019-07-06 4:35 ` [PATCH v5 bpf-next 5/5] libbpf: add perf_buffer_ prefix to README Andrii Nakryiko
4 siblings, 0 replies; 9+ messages in thread
From: Andrii Nakryiko @ 2019-07-06 4:35 UTC (permalink / raw)
To: andrii.nakryiko, kernel-team, bpf, netdev, ast, daniel; +Cc: Andrii Nakryiko
Add a test verifying perf buffer API functionality.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
---
.../selftests/bpf/prog_tests/perf_buffer.c | 94 +++++++++++++++++++
.../selftests/bpf/progs/test_perf_buffer.c | 25 +++++
2 files changed, 119 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_buffer.c
create mode 100644 tools/testing/selftests/bpf/progs/test_perf_buffer.c
diff --git a/tools/testing/selftests/bpf/prog_tests/perf_buffer.c b/tools/testing/selftests/bpf/prog_tests/perf_buffer.c
new file mode 100644
index 000000000000..64556ab0d1a9
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/perf_buffer.c
@@ -0,0 +1,94 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <pthread.h>
+#include <sched.h>
+#include <sys/socket.h>
+#include <test_progs.h>
+
+static void on_sample(void *ctx, int cpu, void *data, __u32 size)
+{
+ int cpu_data = *(int *)data, duration = 0;
+ cpu_set_t *cpu_seen = ctx;
+
+ if (cpu_data != cpu)
+ CHECK(cpu_data != cpu, "check_cpu_data",
+ "cpu_data %d != cpu %d\n", cpu_data, cpu);
+
+ CPU_SET(cpu, cpu_seen);
+}
+
+void test_perf_buffer(void)
+{
+ int err, prog_fd, nr_cpus, i, duration = 0;
+ const char *prog_name = "kprobe/sys_nanosleep";
+ const char *file = "./test_perf_buffer.o";
+ struct perf_buffer_opts pb_opts = {};
+ struct bpf_map *perf_buf_map;
+ cpu_set_t cpu_set, cpu_seen;
+ struct bpf_program *prog;
+ struct bpf_object *obj;
+ struct perf_buffer *pb;
+ struct bpf_link *link;
+
+ nr_cpus = libbpf_num_possible_cpus();
+ if (CHECK(nr_cpus < 0, "nr_cpus", "err %d\n", nr_cpus))
+ return;
+
+ /* load program */
+ err = bpf_prog_load(file, BPF_PROG_TYPE_KPROBE, &obj, &prog_fd);
+ if (CHECK(err, "obj_load", "err %d errno %d\n", err, errno))
+ return;
+
+ prog = bpf_object__find_program_by_title(obj, prog_name);
+ if (CHECK(!prog, "find_probe", "prog '%s' not found\n", prog_name))
+ goto out_close;
+
+ /* load map */
+ perf_buf_map = bpf_object__find_map_by_name(obj, "perf_buf_map");
+ if (CHECK(!perf_buf_map, "find_perf_buf_map", "not found\n"))
+ goto out_close;
+
+ /* attach kprobe */
+ link = bpf_program__attach_kprobe(prog, false /* retprobe */,
+ "sys_nanosleep");
+ if (CHECK(IS_ERR(link), "attach_kprobe", "err %ld\n", PTR_ERR(link)))
+ goto out_close;
+
+ /* set up perf buffer */
+ pb_opts.sample_cb = on_sample;
+ pb_opts.ctx = &cpu_seen;
+ pb = perf_buffer__new(bpf_map__fd(perf_buf_map), 1, &pb_opts);
+ if (CHECK(IS_ERR(pb), "perf_buf__new", "err %ld\n", PTR_ERR(pb)))
+ goto out_detach;
+
+ /* trigger kprobe on every CPU */
+ CPU_ZERO(&cpu_seen);
+ for (i = 0; i < nr_cpus; i++) {
+ CPU_ZERO(&cpu_set);
+ CPU_SET(i, &cpu_set);
+
+ err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set),
+ &cpu_set);
+ if (err && CHECK(err, "set_affinity", "cpu #%d, err %d\n",
+ i, err))
+ goto out_detach;
+
+ usleep(1);
+ }
+
+ /* read perf buffer */
+ err = perf_buffer__poll(pb, 100);
+ if (CHECK(err < 0, "perf_buffer__poll", "err %d\n", err))
+ goto out_free_pb;
+
+ if (CHECK(CPU_COUNT(&cpu_seen) != nr_cpus, "seen_cpu_cnt",
+ "expect %d, seen %d\n", nr_cpus, CPU_COUNT(&cpu_seen)))
+ goto out_free_pb;
+
+out_free_pb:
+ perf_buffer__free(pb);
+out_detach:
+ bpf_link__destroy(link);
+out_close:
+ bpf_object__close(obj);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_perf_buffer.c b/tools/testing/selftests/bpf/progs/test_perf_buffer.c
new file mode 100644
index 000000000000..876c27deb65a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_perf_buffer.c
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+
+#include <linux/ptrace.h>
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+
+struct {
+ __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
+ __uint(key_size, sizeof(int));
+ __uint(value_size, sizeof(int));
+} perf_buf_map SEC(".maps");
+
+SEC("kprobe/sys_nanosleep")
+int handle_sys_nanosleep_entry(struct pt_regs *ctx)
+{
+ int cpu = bpf_get_smp_processor_id();
+
+ bpf_perf_event_output(ctx, &perf_buf_map, BPF_F_CURRENT_CPU,
+ &cpu, sizeof(cpu));
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = 1;
--
2.17.1
* [PATCH v5 bpf-next 4/5] tools/bpftool: switch map event_pipe to libbpf's perf_buffer
2019-07-06 4:35 [PATCH v5 bpf-next 0/5] libbpf: add perf buffer abstraction and API Andrii Nakryiko
` (2 preceding siblings ...)
2019-07-06 4:35 ` [PATCH v5 bpf-next 3/5] selftests/bpf: test perf buffer API Andrii Nakryiko
@ 2019-07-06 4:35 ` Andrii Nakryiko
2019-07-06 4:35 ` [PATCH v5 bpf-next 5/5] libbpf: add perf_buffer_ prefix to README Andrii Nakryiko
4 siblings, 0 replies; 9+ messages in thread
From: Andrii Nakryiko @ 2019-07-06 4:35 UTC (permalink / raw)
To: andrii.nakryiko, kernel-team, bpf, netdev, ast, daniel; +Cc: Andrii Nakryiko
Switch the event_pipe implementation to rely on the new libbpf perf buffer API
(its raw low-level variant).
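Condensed from the diff below, the conversion essentially boils down to filling
in perf_buffer_raw_opts and letting libbpf handle perf_event_open(), mmap() and
the map updates:

    struct perf_buffer_raw_opts opts = {};

    opts.attr = &perf_attr;              /* bpftool's own perf_event_attr */
    opts.event_cb = print_bpf_output;    /* receives raw perf_event_header */
    opts.ctx = &ctx;
    opts.cpu_cnt = ctx.all_cpus ? 0 : 1; /* 0 == all possible CPUs */
    opts.cpus = &ctx.cpu;
    opts.map_keys = &ctx.idx;

    pb = perf_buffer__new_raw(map_fd, MMAP_PAGE_CNT, &opts);

followed by perf_buffer__poll() in the main loop and perf_buffer__free() on
exit.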
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
tools/bpf/bpftool/map_perf_ring.c | 201 ++++++++++--------------------
1 file changed, 64 insertions(+), 137 deletions(-)
diff --git a/tools/bpf/bpftool/map_perf_ring.c b/tools/bpf/bpftool/map_perf_ring.c
index 0507dfaf7a8f..3f108ab17797 100644
--- a/tools/bpf/bpftool/map_perf_ring.c
+++ b/tools/bpf/bpftool/map_perf_ring.c
@@ -28,7 +28,7 @@
#define MMAP_PAGE_CNT 16
-static bool stop;
+static volatile bool stop;
struct event_ring_info {
int fd;
@@ -44,32 +44,44 @@ struct perf_event_sample {
unsigned char data[];
};
+struct perf_event_lost {
+ struct perf_event_header header;
+ __u64 id;
+ __u64 lost;
+};
+
static void int_exit(int signo)
{
fprintf(stderr, "Stopping...\n");
stop = true;
}
+struct event_pipe_ctx {
+ bool all_cpus;
+ int cpu;
+ int idx;
+};
+
static enum bpf_perf_event_ret
-print_bpf_output(struct perf_event_header *event, void *private_data)
+print_bpf_output(void *private_data, int cpu, struct perf_event_header *event)
{
- struct perf_event_sample *e = container_of(event, struct perf_event_sample,
+ struct perf_event_sample *e = container_of(event,
+ struct perf_event_sample,
header);
- struct event_ring_info *ring = private_data;
- struct {
- struct perf_event_header header;
- __u64 id;
- __u64 lost;
- } *lost = (typeof(lost))event;
+ struct perf_event_lost *lost = container_of(event,
+ struct perf_event_lost,
+ header);
+ struct event_pipe_ctx *ctx = private_data;
+ int idx = ctx->all_cpus ? cpu : ctx->idx;
if (json_output) {
jsonw_start_object(json_wtr);
jsonw_name(json_wtr, "type");
jsonw_uint(json_wtr, e->header.type);
jsonw_name(json_wtr, "cpu");
- jsonw_uint(json_wtr, ring->cpu);
+ jsonw_uint(json_wtr, cpu);
jsonw_name(json_wtr, "index");
- jsonw_uint(json_wtr, ring->key);
+ jsonw_uint(json_wtr, idx);
if (e->header.type == PERF_RECORD_SAMPLE) {
jsonw_name(json_wtr, "timestamp");
jsonw_uint(json_wtr, e->time);
@@ -89,7 +101,7 @@ print_bpf_output(struct perf_event_header *event, void *private_data)
if (e->header.type == PERF_RECORD_SAMPLE) {
printf("== @%lld.%09lld CPU: %d index: %d =====\n",
e->time / 1000000000ULL, e->time % 1000000000ULL,
- ring->cpu, ring->key);
+ cpu, idx);
fprint_hex(stdout, e->data, e->size, " ");
printf("\n");
} else if (e->header.type == PERF_RECORD_LOST) {
@@ -103,87 +115,25 @@ print_bpf_output(struct perf_event_header *event, void *private_data)
return LIBBPF_PERF_EVENT_CONT;
}
-static void
-perf_event_read(struct event_ring_info *ring, void **buf, size_t *buf_len)
-{
- enum bpf_perf_event_ret ret;
-
- ret = bpf_perf_event_read_simple(ring->mem,
- MMAP_PAGE_CNT * get_page_size(),
- get_page_size(), buf, buf_len,
- print_bpf_output, ring);
- if (ret != LIBBPF_PERF_EVENT_CONT) {
- fprintf(stderr, "perf read loop failed with %d\n", ret);
- stop = true;
- }
-}
-
-static int perf_mmap_size(void)
-{
- return get_page_size() * (MMAP_PAGE_CNT + 1);
-}
-
-static void *perf_event_mmap(int fd)
-{
- int mmap_size = perf_mmap_size();
- void *base;
-
- base = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
- if (base == MAP_FAILED) {
- p_err("event mmap failed: %s\n", strerror(errno));
- return NULL;
- }
-
- return base;
-}
-
-static void perf_event_unmap(void *mem)
-{
- if (munmap(mem, perf_mmap_size()))
- fprintf(stderr, "Can't unmap ring memory!\n");
-}
-
-static int bpf_perf_event_open(int map_fd, int key, int cpu)
+int do_event_pipe(int argc, char **argv)
{
- struct perf_event_attr attr = {
+ struct perf_event_attr perf_attr = {
.sample_type = PERF_SAMPLE_RAW | PERF_SAMPLE_TIME,
.type = PERF_TYPE_SOFTWARE,
.config = PERF_COUNT_SW_BPF_OUTPUT,
+ .sample_period = 1,
+ .wakeup_events = 1,
};
- int pmu_fd;
-
- pmu_fd = sys_perf_event_open(&attr, -1, cpu, -1, 0);
- if (pmu_fd < 0) {
- p_err("failed to open perf event %d for CPU %d", key, cpu);
- return -1;
- }
-
- if (bpf_map_update_elem(map_fd, &key, &pmu_fd, BPF_ANY)) {
- p_err("failed to update map for event %d for CPU %d", key, cpu);
- goto err_close;
- }
- if (ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0)) {
- p_err("failed to enable event %d for CPU %d", key, cpu);
- goto err_close;
- }
-
- return pmu_fd;
-
-err_close:
- close(pmu_fd);
- return -1;
-}
-
-int do_event_pipe(int argc, char **argv)
-{
- int i, nfds, map_fd, index = -1, cpu = -1;
struct bpf_map_info map_info = {};
- struct event_ring_info *rings;
- size_t tmp_buf_sz = 0;
- void *tmp_buf = NULL;
- struct pollfd *pfds;
+ struct perf_buffer_raw_opts opts = {};
+ struct event_pipe_ctx ctx = {
+ .all_cpus = true,
+ .cpu = -1,
+ .idx = -1,
+ };
+ struct perf_buffer *pb;
__u32 map_info_len;
- bool do_all = true;
+ int err, map_fd;
map_info_len = sizeof(map_info);
map_fd = map_parse_fd_and_info(&argc, &argv, &map_info, &map_info_len);
@@ -205,7 +155,7 @@ int do_event_pipe(int argc, char **argv)
char *endptr;
NEXT_ARG();
- cpu = strtoul(*argv, &endptr, 0);
+ ctx.cpu = strtoul(*argv, &endptr, 0);
if (*endptr) {
p_err("can't parse %s as CPU ID", **argv);
goto err_close_map;
@@ -216,7 +166,7 @@ int do_event_pipe(int argc, char **argv)
char *endptr;
NEXT_ARG();
- index = strtoul(*argv, &endptr, 0);
+ ctx.idx = strtoul(*argv, &endptr, 0);
if (*endptr) {
p_err("can't parse %s as index", **argv);
goto err_close_map;
@@ -228,45 +178,32 @@ int do_event_pipe(int argc, char **argv)
goto err_close_map;
}
- do_all = false;
+ ctx.all_cpus = false;
}
- if (!do_all) {
- if (index == -1 || cpu == -1) {
+ if (!ctx.all_cpus) {
+ if (ctx.idx == -1 || ctx.cpu == -1) {
p_err("cpu and index must be specified together");
goto err_close_map;
}
-
- nfds = 1;
} else {
- nfds = min(get_possible_cpus(), map_info.max_entries);
- cpu = 0;
- index = 0;
+ ctx.cpu = 0;
+ ctx.idx = 0;
}
- rings = calloc(nfds, sizeof(rings[0]));
- if (!rings)
+ opts.attr = &perf_attr;
+ opts.event_cb = print_bpf_output;
+ opts.ctx = &ctx;
+ opts.cpu_cnt = ctx.all_cpus ? 0 : 1;
+ opts.cpus = &ctx.cpu;
+ opts.map_keys = &ctx.idx;
+
+ pb = perf_buffer__new_raw(map_fd, MMAP_PAGE_CNT, &opts);
+ err = libbpf_get_error(pb);
+ if (err) {
+ p_err("failed to create perf buffer: %s (%d)",
+ strerror(err), err);
goto err_close_map;
-
- pfds = calloc(nfds, sizeof(pfds[0]));
- if (!pfds)
- goto err_free_rings;
-
- for (i = 0; i < nfds; i++) {
- rings[i].cpu = cpu + i;
- rings[i].key = index + i;
-
- rings[i].fd = bpf_perf_event_open(map_fd, rings[i].key,
- rings[i].cpu);
- if (rings[i].fd < 0)
- goto err_close_fds_prev;
-
- rings[i].mem = perf_event_mmap(rings[i].fd);
- if (!rings[i].mem)
- goto err_close_fds_current;
-
- pfds[i].fd = rings[i].fd;
- pfds[i].events = POLLIN;
}
signal(SIGINT, int_exit);
@@ -277,34 +214,24 @@ int do_event_pipe(int argc, char **argv)
jsonw_start_array(json_wtr);
while (!stop) {
- poll(pfds, nfds, 200);
- for (i = 0; i < nfds; i++)
- perf_event_read(&rings[i], &tmp_buf, &tmp_buf_sz);
+ err = perf_buffer__poll(pb, 200);
+ if (err < 0 && err != -EINTR) {
+ p_err("perf buffer polling failed: %s (%d)",
+ strerror(err), err);
+ goto err_close_pb;
+ }
}
- free(tmp_buf);
if (json_output)
jsonw_end_array(json_wtr);
- for (i = 0; i < nfds; i++) {
- perf_event_unmap(rings[i].mem);
- close(rings[i].fd);
- }
- free(pfds);
- free(rings);
+ perf_buffer__free(pb);
close(map_fd);
return 0;
-err_close_fds_prev:
- while (i--) {
- perf_event_unmap(rings[i].mem);
-err_close_fds_current:
- close(rings[i].fd);
- }
- free(pfds);
-err_free_rings:
- free(rings);
+err_close_pb:
+ perf_buffer__free(pb);
err_close_map:
close(map_fd);
return -1;
--
2.17.1
* [PATCH v5 bpf-next 5/5] libbpf: add perf_buffer_ prefix to README
2019-07-06 4:35 [PATCH v5 bpf-next 0/5] libbpf: add perf buffer abstraction and API Andrii Nakryiko
` (3 preceding siblings ...)
2019-07-06 4:35 ` [PATCH v5 bpf-next 4/5] tools/bpftool: switch map event_pipe to libbpf's perf_buffer Andrii Nakryiko
@ 2019-07-06 4:35 ` Andrii Nakryiko
4 siblings, 0 replies; 9+ messages in thread
From: Andrii Nakryiko @ 2019-07-06 4:35 UTC (permalink / raw)
To: andrii.nakryiko, kernel-team, bpf, netdev, ast, daniel; +Cc: Andrii Nakryiko
perf_buffer "object" is part of libbpf API now, add it to the list of
libbpf function prefixes.
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
tools/lib/bpf/README.rst | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/lib/bpf/README.rst b/tools/lib/bpf/README.rst
index cef7b77eab69..8928f7787f2d 100644
--- a/tools/lib/bpf/README.rst
+++ b/tools/lib/bpf/README.rst
@@ -9,7 +9,8 @@ described here. It's recommended to follow these conventions whenever a
new function or type is added to keep libbpf API clean and consistent.
All types and functions provided by libbpf API should have one of the
-following prefixes: ``bpf_``, ``btf_``, ``libbpf_``, ``xsk_``.
+following prefixes: ``bpf_``, ``btf_``, ``libbpf_``, ``xsk_``,
+``perf_buffer_``.
System call wrappers
--------------------
--
2.17.1
* Re: [PATCH v5 bpf-next 1/5] libbpf: add perf buffer API
2019-07-06 4:35 ` [PATCH v5 bpf-next 1/5] libbpf: add perf buffer API Andrii Nakryiko
@ 2019-07-06 5:42 ` Yonghong Song
2019-07-06 5:54 ` Andrii Nakryiko
0 siblings, 1 reply; 9+ messages in thread
From: Yonghong Song @ 2019-07-06 5:42 UTC (permalink / raw)
To: Andrii Nakryiko, andrii.nakryiko, Kernel Team, bpf, netdev,
Alexei Starovoitov, daniel
On 7/5/19 9:35 PM, Andrii Nakryiko wrote:
> BPF_MAP_TYPE_PERF_EVENT_ARRAY map is often used to send data from BPF program
> to user space for additional processing. libbpf already has very low-level API
> to read single CPU perf buffer, bpf_perf_event_read_simple(), but it's hard to
> use and requires a lot of code to set everything up. This patch adds
> perf_buffer abstraction on top of it, abstracting setting up and polling
> per-CPU logic into simple and convenient API, similar to what BCC provides.
>
> perf_buffer__new() sets up per-CPU ring buffers and updates corresponding BPF
> map entries. It accepts two user-provided callbacks: one for handling raw
> samples and one for get notifications of lost samples due to buffer overflow.
>
> perf_buffer__new_raw() is similar, but provides more control over how
> perf events are set up (by accepting user-provided perf_event_attr), how
> they are handled (perf_event_header pointer is passed directly to
> user-provided callback), and on which CPUs ring buffers are created
> (it's possible to provide a list of CPUs and corresponding map keys to
> update). This API allows advanced users fuller control.
>
> perf_buffer__poll() is used to fetch ring buffer data across all CPUs,
> utilizing epoll instance.
>
> perf_buffer__free() does corresponding clean up and unsets FDs from BPF map.
>
> All APIs are not thread-safe. User should ensure proper locking/coordination if
> used in multi-threaded set up.
>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
> tools/lib/bpf/libbpf.c | 366 +++++++++++++++++++++++++++++++++++++++
> tools/lib/bpf/libbpf.h | 49 ++++++
> tools/lib/bpf/libbpf.map | 4 +
> 3 files changed, 419 insertions(+)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 2a08eb106221..72149d68b8c1 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -32,7 +32,9 @@
> #include <linux/limits.h>
> #include <linux/perf_event.h>
> #include <linux/ring_buffer.h>
> +#include <sys/epoll.h>
> #include <sys/ioctl.h>
> +#include <sys/mman.h>
> #include <sys/stat.h>
> #include <sys/types.h>
> #include <sys/vfs.h>
> @@ -4354,6 +4356,370 @@ bpf_perf_event_read_simple(void *mmap_mem, size_t mmap_size, size_t page_size,
> return ret;
> }
>
> +struct perf_buffer;
> +
> +struct perf_buffer_params {
> + struct perf_event_attr *attr;
> + /* if event_cb is specified, it takes precendence */
> + perf_buffer_event_fn event_cb;
> + /* sample_cb and lost_cb are higher-level common-case callbacks */
> + perf_buffer_sample_fn sample_cb;
> + perf_buffer_lost_fn lost_cb;
> + void *ctx;
> + int cpu_cnt;
> + int *cpus;
[...]
> +
> +int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms)
> +{
> + int cnt, err;
> +
> + cnt = epoll_wait(pb->epoll_fd, pb->events, pb->cpu_cnt, timeout_ms);
> + for (int i = 0; i < cnt; i++) {
Found one compilation error here.
libbpf.c: In function ‘perf_buffer__poll’:
libbpf.c:4728:2: error: ‘for’ loop initial declarations are only allowed
in C99 mode
for (int i = 0; i < cnt; i++) {
^
> + struct perf_cpu_buf *cpu_buf = pb->events[i].data.ptr;
> +
> + err = perf_buffer__process_records(pb, cpu_buf);
> + if (err) {
> + pr_warning("error while processing records: %d\n", err);
> + return err;
> + }
> + }
> + return cnt < 0 ? -errno : cnt;
> +}
> +
> struct bpf_prog_info_array_desc {
> int array_offset; /* e.g. offset of jited_prog_insns */
> int count_offset; /* e.g. offset of jited_prog_len */
[...]
* Re: [PATCH v5 bpf-next 1/5] libbpf: add perf buffer API
2019-07-06 5:42 ` Yonghong Song
@ 2019-07-06 5:54 ` Andrii Nakryiko
2019-07-06 15:56 ` Yonghong Song
0 siblings, 1 reply; 9+ messages in thread
From: Andrii Nakryiko @ 2019-07-06 5:54 UTC (permalink / raw)
To: Yonghong Song
Cc: Andrii Nakryiko, Kernel Team, bpf, netdev, Alexei Starovoitov, daniel
On Fri, Jul 5, 2019 at 10:42 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 7/5/19 9:35 PM, Andrii Nakryiko wrote:
> > BPF_MAP_TYPE_PERF_EVENT_ARRAY map is often used to send data from BPF program
> > to user space for additional processing. libbpf already has very low-level API
> > to read single CPU perf buffer, bpf_perf_event_read_simple(), but it's hard to
> > use and requires a lot of code to set everything up. This patch adds
> > perf_buffer abstraction on top of it, abstracting setting up and polling
> > per-CPU logic into simple and convenient API, similar to what BCC provides.
> >
> > perf_buffer__new() sets up per-CPU ring buffers and updates corresponding BPF
> > map entries. It accepts two user-provided callbacks: one for handling raw
> > samples and one for get notifications of lost samples due to buffer overflow.
> >
> > perf_buffer__new_raw() is similar, but provides more control over how
> > perf events are set up (by accepting user-provided perf_event_attr), how
> > they are handled (perf_event_header pointer is passed directly to
> > user-provided callback), and on which CPUs ring buffers are created
> > (it's possible to provide a list of CPUs and corresponding map keys to
> > update). This API allows advanced users fuller control.
> >
> > perf_buffer__poll() is used to fetch ring buffer data across all CPUs,
> > utilizing epoll instance.
> >
> > perf_buffer__free() does corresponding clean up and unsets FDs from BPF map.
> >
> > All APIs are not thread-safe. User should ensure proper locking/coordination if
> > used in multi-threaded set up.
> >
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > ---
> > tools/lib/bpf/libbpf.c | 366 +++++++++++++++++++++++++++++++++++++++
> > tools/lib/bpf/libbpf.h | 49 ++++++
> > tools/lib/bpf/libbpf.map | 4 +
> > 3 files changed, 419 insertions(+)
> >
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index 2a08eb106221..72149d68b8c1 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -32,7 +32,9 @@
> > #include <linux/limits.h>
> > #include <linux/perf_event.h>
> > #include <linux/ring_buffer.h>
> > +#include <sys/epoll.h>
> > #include <sys/ioctl.h>
> > +#include <sys/mman.h>
> > #include <sys/stat.h>
> > #include <sys/types.h>
> > #include <sys/vfs.h>
> > @@ -4354,6 +4356,370 @@ bpf_perf_event_read_simple(void *mmap_mem, size_t mmap_size, size_t page_size,
> > return ret;
> > }
> >
> > +struct perf_buffer;
> > +
> > +struct perf_buffer_params {
> > + struct perf_event_attr *attr;
> > + /* if event_cb is specified, it takes precendence */
> > + perf_buffer_event_fn event_cb;
> > + /* sample_cb and lost_cb are higher-level common-case callbacks */
> > + perf_buffer_sample_fn sample_cb;
> > + perf_buffer_lost_fn lost_cb;
> > + void *ctx;
> > + int cpu_cnt;
> > + int *cpus;
> [...]
> > +
> > +int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms)
> > +{
> > + int cnt, err;
> > +
> > + cnt = epoll_wait(pb->epoll_fd, pb->events, pb->cpu_cnt, timeout_ms);
> > + for (int i = 0; i < cnt; i++) {
>
> Find one compilation error here.
>
> libbpf.c: In function ‘perf_buffer__poll’:
> libbpf.c:4728:2: error: ‘for’ loop initial declarations are only allowed
> in C99 mode
> for (int i = 0; i < cnt; i++) {
> ^
>
Ah... Fixing, thanks! How did you compile? make -C tools/lib/bpf
doesn't show this; should we update the libbpf Makefile to catch stuff
like this?
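(For reference, a minimal sketch of the obvious fix, assuming the loop counter
is simply hoisted into the function's existing declarations:

    int cnt, err, i;
    ...
    for (i = 0; i < cnt; i++) {

)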
> > + struct perf_cpu_buf *cpu_buf = pb->events[i].data.ptr;
> > +
> > + err = perf_buffer__process_records(pb, cpu_buf);
> > + if (err) {
> > + pr_warning("error while processing records: %d\n", err);
> > + return err;
> > + }
> > + }
> > + return cnt < 0 ? -errno : cnt;
> > +}
> > +
> > struct bpf_prog_info_array_desc {
> > int array_offset; /* e.g. offset of jited_prog_insns */
> > int count_offset; /* e.g. offset of jited_prog_len */
> [...]
* Re: [PATCH v5 bpf-next 1/5] libbpf: add perf buffer API
2019-07-06 5:54 ` Andrii Nakryiko
@ 2019-07-06 15:56 ` Yonghong Song
0 siblings, 0 replies; 9+ messages in thread
From: Yonghong Song @ 2019-07-06 15:56 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Andrii Nakryiko, Kernel Team, bpf, netdev, Alexei Starovoitov, daniel
On 7/5/19 10:54 PM, Andrii Nakryiko wrote:
> On Fri, Jul 5, 2019 at 10:42 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 7/5/19 9:35 PM, Andrii Nakryiko wrote:
>>> BPF_MAP_TYPE_PERF_EVENT_ARRAY map is often used to send data from BPF program
>>> to user space for additional processing. libbpf already has very low-level API
>>> to read single CPU perf buffer, bpf_perf_event_read_simple(), but it's hard to
>>> use and requires a lot of code to set everything up. This patch adds
>>> perf_buffer abstraction on top of it, abstracting setting up and polling
>>> per-CPU logic into simple and convenient API, similar to what BCC provides.
>>>
>>> perf_buffer__new() sets up per-CPU ring buffers and updates corresponding BPF
>>> map entries. It accepts two user-provided callbacks: one for handling raw
>>> samples and one for get notifications of lost samples due to buffer overflow.
>>>
>>> perf_buffer__new_raw() is similar, but provides more control over how
>>> perf events are set up (by accepting user-provided perf_event_attr), how
>>> they are handled (perf_event_header pointer is passed directly to
>>> user-provided callback), and on which CPUs ring buffers are created
>>> (it's possible to provide a list of CPUs and corresponding map keys to
>>> update). This API allows advanced users fuller control.
>>>
>>> perf_buffer__poll() is used to fetch ring buffer data across all CPUs,
>>> utilizing epoll instance.
>>>
>>> perf_buffer__free() does corresponding clean up and unsets FDs from BPF map.
>>>
>>> All APIs are not thread-safe. User should ensure proper locking/coordination if
>>> used in multi-threaded set up.
>>>
>>> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
>>> ---
>>> tools/lib/bpf/libbpf.c | 366 +++++++++++++++++++++++++++++++++++++++
>>> tools/lib/bpf/libbpf.h | 49 ++++++
>>> tools/lib/bpf/libbpf.map | 4 +
>>> 3 files changed, 419 insertions(+)
>>>
>>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>>> index 2a08eb106221..72149d68b8c1 100644
>>> --- a/tools/lib/bpf/libbpf.c
>>> +++ b/tools/lib/bpf/libbpf.c
>>> @@ -32,7 +32,9 @@
>>> #include <linux/limits.h>
>>> #include <linux/perf_event.h>
>>> #include <linux/ring_buffer.h>
>>> +#include <sys/epoll.h>
>>> #include <sys/ioctl.h>
>>> +#include <sys/mman.h>
>>> #include <sys/stat.h>
>>> #include <sys/types.h>
>>> #include <sys/vfs.h>
>>> @@ -4354,6 +4356,370 @@ bpf_perf_event_read_simple(void *mmap_mem, size_t mmap_size, size_t page_size,
>>> return ret;
>>> }
>>>
>>> +struct perf_buffer;
>>> +
>>> +struct perf_buffer_params {
>>> + struct perf_event_attr *attr;
>>> + /* if event_cb is specified, it takes precendence */
>>> + perf_buffer_event_fn event_cb;
>>> + /* sample_cb and lost_cb are higher-level common-case callbacks */
>>> + perf_buffer_sample_fn sample_cb;
>>> + perf_buffer_lost_fn lost_cb;
>>> + void *ctx;
>>> + int cpu_cnt;
>>> + int *cpus;
>> [...]
>>> +
>>> +int perf_buffer__poll(struct perf_buffer *pb, int timeout_ms)
>>> +{
>>> + int cnt, err;
>>> +
>>> + cnt = epoll_wait(pb->epoll_fd, pb->events, pb->cpu_cnt, timeout_ms);
>>> + for (int i = 0; i < cnt; i++) {
>>
>> Find one compilation error here.
>>
>> libbpf.c: In function ‘perf_buffer__poll’:
>> libbpf.c:4728:2: error: ‘for’ loop initial declarations are only allowed
>> in C99 mode
>> for (int i = 0; i < cnt; i++) {
>> ^
>>
>
> Ah... Fixing, thanks!. How did you compile? make -C tools/lib/bpf
> doesn't show this, should we update libbpf Makefile to catch stuff
> like this?
I did not make any code changes. My compiler is gcc 4.8.5; it is
possible that the older compiler is less tolerant.
>>> + struct perf_cpu_buf *cpu_buf = pb->events[i].data.ptr;
>>> +
>>> + err = perf_buffer__process_records(pb, cpu_buf);
>>> + if (err) {
>>> + pr_warning("error while processing records: %d\n", err);
>>> + return err;
>>> + }
>>> + }
>>> + return cnt < 0 ? -errno : cnt;
>>> +}
>>> +
>>> struct bpf_prog_info_array_desc {
>>> int array_offset; /* e.g. offset of jited_prog_insns */
>>> int count_offset; /* e.g. offset of jited_prog_len */
>> [...]