* [PATCH v2 net-next 0/3] bpf: Add BPF support to all perf_event
@ 2017-05-26  5:55 Alexei Starovoitov
  2017-05-26  5:55 ` [PATCH v2 net-next 1/3] perf, bpf: Add BPF support to all perf_event types Alexei Starovoitov
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Alexei Starovoitov @ 2017-05-26  5:55 UTC (permalink / raw)
  To: David S . Miller
  Cc: Peter Zijlstra, Brendan Gregg, Daniel Borkmann, Teng Qin, netdev,
	linux-kernel

v1->v2: address Peter's feedback:
- patch 1: refactor to allow attaching bpf programs to all event types
  and reading counters from all of them as well
- patch 2: more tests
- patch 3: address Dave's feedback and document bpf_perf_event_read()
  and bpf_perf_event_output() properly

Teng Qin (3):
  perf, bpf: Add BPF support to all perf_event types
  samples/bpf: add samples for more perf event types
  bpf: update perf event helper functions documentation

 include/uapi/linux/bpf.h       |  11 ++-
 kernel/bpf/arraymap.c          |  26 +++---
 kernel/events/core.c           |   6 +-
 kernel/trace/bpf_trace.c       |   4 +-
 samples/bpf/bpf_helpers.h      |   3 +-
 samples/bpf/trace_event_user.c |  46 ++++++++++-
 samples/bpf/tracex6_kern.c     |  28 +++++--
 samples/bpf/tracex6_user.c     | 176 ++++++++++++++++++++++++++++++++---------
 tools/include/uapi/linux/bpf.h |  11 ++-
 9 files changed, 232 insertions(+), 79 deletions(-)

-- 
2.9.3


* [PATCH v2 net-next 1/3] perf, bpf: Add BPF support to all perf_event types
  2017-05-26  5:55 [PATCH v2 net-next 0/3] bpf: Add BPF support to all perf_event Alexei Starovoitov
@ 2017-05-26  5:55 ` Alexei Starovoitov
  2017-05-26 11:04   ` kbuild test robot
                     ` (2 more replies)
  2017-05-26  5:55 ` [PATCH v2 net-next 2/3] samples/bpf: add samples for more perf event types Alexei Starovoitov
  2017-05-26  5:55 ` [PATCH v2 net-next 3/3] bpf: update perf event helper functions documentation Alexei Starovoitov
  2 siblings, 3 replies; 14+ messages in thread
From: Alexei Starovoitov @ 2017-05-26  5:55 UTC (permalink / raw)
  To: David S . Miller
  Cc: Peter Zijlstra, Brendan Gregg, Daniel Borkmann, Teng Qin, netdev,
	linux-kernel

From: Teng Qin <qinteng@fb.com>

Allow BPF programs to attach to all perf_event types supported
by the current bpf and perf code, including HW_CACHE, RAW,
and dynamic pmu events.

Also add support for reading these event counters using the
bpf_perf_event_read() helper.
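
As an illustration, a bpf program can now read any of these counter
types through a BPF_MAP_TYPE_PERF_EVENT_ARRAY populated by user space.
A minimal kernel-side sketch, modeled on the tracex6 sample updated in
patch 2 (the map, section and function names are illustrative only):

	struct bpf_map_def SEC("maps") counters = {
		.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
		.key_size = sizeof(int),
		.value_size = sizeof(u32),
		.max_entries = 64,
	};

	SEC("kprobe/htab_map_get_next_key")
	int read_counter(struct pt_regs *ctx)
	{
		u32 key = bpf_get_smp_processor_id();
		/* returns the counter value, or a small negative error code */
		u64 count = bpf_perf_event_read(&counters, key);

		if ((s64)count < 0 && (s64)count > -256)
			return 0;
		/* ... use count ... */
		return 0;
	}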

Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/bpf/arraymap.c    | 26 +++++++++++---------------
 kernel/events/core.c     |  6 +-----
 kernel/trace/bpf_trace.c |  4 ++--
 3 files changed, 14 insertions(+), 22 deletions(-)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 5e00b2333c26..55ffa9949128 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -462,26 +462,22 @@ static void *perf_event_fd_array_get_ptr(struct bpf_map *map,
 
 	event = perf_file->private_data;
 	ee = ERR_PTR(-EINVAL);
+	/* Per-task events are not supported */
+	if (event->attach_state & PERF_ATTACH_TASK)
+		goto err_out;
 
 	attr = perf_event_attrs(event);
 	if (IS_ERR(attr) || attr->inherit)
 		goto err_out;
+	/* TRACEPOINT and BREAKPOINT not supported in perf_event_read_local */
+	if (attr->type == PERF_TYPE_TRACEPOINT ||
+	    attr->type == PERF_TYPE_BREAKPOINT)
+		goto err_out;
 
-	switch (attr->type) {
-	case PERF_TYPE_SOFTWARE:
-		if (attr->config != PERF_COUNT_SW_BPF_OUTPUT)
-			goto err_out;
-		/* fall-through */
-	case PERF_TYPE_RAW:
-	case PERF_TYPE_HARDWARE:
-		ee = bpf_event_entry_gen(perf_file, map_file);
-		if (ee)
-			return ee;
-		ee = ERR_PTR(-ENOMEM);
-		/* fall-through */
-	default:
-		break;
-	}
+	ee = bpf_event_entry_gen(perf_file, map_file);
+	if (ee)
+		return ee;
+	ee = ERR_PTR(-ENOMEM);
 
 err_out:
 	fput(perf_file);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6e75a5c9412d..52f667046599 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8037,12 +8037,8 @@ static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
 	bool is_kprobe, is_tracepoint;
 	struct bpf_prog *prog;
 
-	if (event->attr.type == PERF_TYPE_HARDWARE ||
-	    event->attr.type == PERF_TYPE_SOFTWARE)
-		return perf_event_set_bpf_handler(event, prog_fd);
-
 	if (event->attr.type != PERF_TYPE_TRACEPOINT)
-		return -EINVAL;
+		return perf_event_set_bpf_handler(event, prog_fd);
 
 	if (event->tp_event->prog)
 		return -EEXIST;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 460a031c77e5..8425bf193f39 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -248,8 +248,8 @@ BPF_CALL_2(bpf_perf_event_read, struct bpf_map *, map, u64, flags)
 		return -ENOENT;
 
 	event = ee->event;
-	if (unlikely(event->attr.type != PERF_TYPE_HARDWARE &&
-		     event->attr.type != PERF_TYPE_RAW))
+	if (unlikely(event->attr.type == PERF_TYPE_SOFTWARE &&
+		     event->attr.config == PERF_COUNT_SW_BPF_OUTPUT))
 		return -EINVAL;
 
 	/* make sure event is local and doesn't have pmu::count */
-- 
2.9.3


* [PATCH v2 net-next 2/3] samples/bpf: add samples for more perf event types
  2017-05-26  5:55 [PATCH v2 net-next 0/3] bpf: Add BPF support to all perf_event Alexei Starovoitov
  2017-05-26  5:55 ` [PATCH v2 net-next 1/3] perf, bpf: Add BPF support to all perf_event types Alexei Starovoitov
@ 2017-05-26  5:55 ` Alexei Starovoitov
  2017-05-26  5:55 ` [PATCH v2 net-next 3/3] bpf: update perf event helper functions documentation Alexei Starovoitov
  2 siblings, 0 replies; 14+ messages in thread
From: Alexei Starovoitov @ 2017-05-26  5:55 UTC (permalink / raw)
  To: David S . Miller
  Cc: Peter Zijlstra, Brendan Gregg, Daniel Borkmann, Teng Qin, netdev,
	linux-kernel

From: Teng Qin <qinteng@fb.com>

This commit adds test code to attach BPF to HW_CACHE and RAW type events
and updates the clean-up logic to disable the perf events before closing
pmu_fd.

It also adds test code to read SOFTWARE, HW_CACHE, RAW and dynamic pmu
events from a BPF program using bpf_perf_event_read(), and refactors the
existing sample to fork an individual task on each CPU, attach the kprobe
to a more controllable function, and more accurately check whether each
read on every CPU returns a good value.

Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 samples/bpf/bpf_helpers.h      |   3 +-
 samples/bpf/trace_event_user.c |  46 ++++++++++-
 samples/bpf/tracex6_kern.c     |  28 +++++--
 samples/bpf/tracex6_user.c     | 176 ++++++++++++++++++++++++++++++++---------
 4 files changed, 204 insertions(+), 49 deletions(-)

diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h
index 9a9c95f2c9fb..51e567bc70fc 100644
--- a/samples/bpf/bpf_helpers.h
+++ b/samples/bpf/bpf_helpers.h
@@ -31,7 +31,8 @@ static unsigned long long (*bpf_get_current_uid_gid)(void) =
 	(void *) BPF_FUNC_get_current_uid_gid;
 static int (*bpf_get_current_comm)(void *buf, int buf_size) =
 	(void *) BPF_FUNC_get_current_comm;
-static int (*bpf_perf_event_read)(void *map, int index) =
+static unsigned long long (*bpf_perf_event_read)(void *map,
+						 unsigned long long flags) =
 	(void *) BPF_FUNC_perf_event_read;
 static int (*bpf_clone_redirect)(void *ctx, int ifindex, int flags) =
 	(void *) BPF_FUNC_clone_redirect;
diff --git a/samples/bpf/trace_event_user.c b/samples/bpf/trace_event_user.c
index fa4336423da5..666761773fda 100644
--- a/samples/bpf/trace_event_user.c
+++ b/samples/bpf/trace_event_user.c
@@ -122,13 +122,14 @@ static void test_perf_event_all_cpu(struct perf_event_attr *attr)
 {
 	int nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
 	int *pmu_fd = malloc(nr_cpus * sizeof(int));
-	int i;
+	int i, error = 0;
 
 	/* open perf_event on all cpus */
 	for (i = 0; i < nr_cpus; i++) {
 		pmu_fd[i] = sys_perf_event_open(attr, -1, i, -1, 0);
 		if (pmu_fd[i] < 0) {
 			printf("sys_perf_event_open failed\n");
+			error = 1;
 			goto all_cpu_err;
 		}
 		assert(ioctl(pmu_fd[i], PERF_EVENT_IOC_SET_BPF, prog_fd[0]) == 0);
@@ -137,9 +138,13 @@ static void test_perf_event_all_cpu(struct perf_event_attr *attr)
 	system("dd if=/dev/zero of=/dev/null count=5000k");
 	print_stacks();
 all_cpu_err:
-	for (i--; i >= 0; i--)
+	for (i--; i >= 0; i--) {
+		ioctl(pmu_fd[i], PERF_EVENT_IOC_DISABLE, 0);
 		close(pmu_fd[i]);
+	}
 	free(pmu_fd);
+	if (error)
+		int_exit(0);
 }
 
 static void test_perf_event_task(struct perf_event_attr *attr)
@@ -150,7 +155,7 @@ static void test_perf_event_task(struct perf_event_attr *attr)
 	pmu_fd = sys_perf_event_open(attr, 0, -1, -1, 0);
 	if (pmu_fd < 0) {
 		printf("sys_perf_event_open failed\n");
-		return;
+		int_exit(0);
 	}
 	assert(ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd[0]) == 0);
 	assert(ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0) == 0);
@@ -175,11 +180,45 @@ static void test_bpf_perf_event(void)
 		.config = PERF_COUNT_SW_CPU_CLOCK,
 		.inherit = 1,
 	};
+	struct perf_event_attr attr_hw_cache_l1d = {
+		.sample_freq = SAMPLE_FREQ,
+		.freq = 1,
+		.type = PERF_TYPE_HW_CACHE,
+		.config =
+			PERF_COUNT_HW_CACHE_L1D |
+			(PERF_COUNT_HW_CACHE_OP_READ << 8) |
+			(PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16),
+		.inherit = 1,
+	};
+	struct perf_event_attr attr_hw_cache_branch_miss = {
+		.sample_freq = SAMPLE_FREQ,
+		.freq = 1,
+		.type = PERF_TYPE_HW_CACHE,
+		.config =
+			PERF_COUNT_HW_CACHE_BPU |
+			(PERF_COUNT_HW_CACHE_OP_READ << 8) |
+			(PERF_COUNT_HW_CACHE_RESULT_MISS << 16),
+		.inherit = 1,
+	};
+	struct perf_event_attr attr_type_raw = {
+		.sample_freq = SAMPLE_FREQ,
+		.freq = 1,
+		.type = PERF_TYPE_RAW,
+		/* Intel Instruction Retired */
+		.config = 0xc0,
+		.inherit = 1,
+	};
 
 	test_perf_event_all_cpu(&attr_type_hw);
 	test_perf_event_task(&attr_type_hw);
 	test_perf_event_all_cpu(&attr_type_sw);
 	test_perf_event_task(&attr_type_sw);
+	test_perf_event_all_cpu(&attr_hw_cache_l1d);
+	test_perf_event_task(&attr_hw_cache_l1d);
+	test_perf_event_all_cpu(&attr_hw_cache_branch_miss);
+	test_perf_event_task(&attr_hw_cache_branch_miss);
+	test_perf_event_all_cpu(&attr_type_raw);
+	test_perf_event_task(&attr_type_raw);
 }
 
 
@@ -210,6 +249,7 @@ int main(int argc, char **argv)
 	}
 	test_bpf_perf_event();
 
+	printf("Success!\n");
 	int_exit(0);
 	return 0;
 }
diff --git a/samples/bpf/tracex6_kern.c b/samples/bpf/tracex6_kern.c
index be479c4af9e2..646f86426d09 100644
--- a/samples/bpf/tracex6_kern.c
+++ b/samples/bpf/tracex6_kern.c
@@ -3,22 +3,36 @@
 #include <uapi/linux/bpf.h>
 #include "bpf_helpers.h"
 
-struct bpf_map_def SEC("maps") my_map = {
+struct bpf_map_def SEC("maps") counters = {
 	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
 	.key_size = sizeof(int),
 	.value_size = sizeof(u32),
-	.max_entries = 32,
+	.max_entries = 64,
+};
+struct bpf_map_def SEC("maps") values = {
+	.type = BPF_MAP_TYPE_HASH,
+	.key_size = sizeof(int),
+	.value_size = sizeof(u64),
+	.max_entries = 64,
 };
 
-SEC("kprobe/sys_write")
+SEC("kprobe/htab_map_get_next_key")
 int bpf_prog1(struct pt_regs *ctx)
 {
-	u64 count;
+	u64 count, *val;
+	s64 error;
 	u32 key = bpf_get_smp_processor_id();
-	char fmt[] = "CPU-%d   %llu\n";
 
-	count = bpf_perf_event_read(&my_map, key);
-	bpf_trace_printk(fmt, sizeof(fmt), key, count);
+	count = bpf_perf_event_read(&counters, key);
+	error = (s64)count;
+	if (error < 0 && error > -256)
+		return 0;
+
+	val = bpf_map_lookup_elem(&values, &key);
+	if (val)
+		*val = count;
+	else
+		bpf_map_update_elem(&values, &key, &count, BPF_NOEXIST);
 
 	return 0;
 }
diff --git a/samples/bpf/tracex6_user.c b/samples/bpf/tracex6_user.c
index ca7874ed77f4..43743aa8b3e0 100644
--- a/samples/bpf/tracex6_user.c
+++ b/samples/bpf/tracex6_user.c
@@ -1,73 +1,173 @@
-#include <stdio.h>
-#include <unistd.h>
-#include <stdlib.h>
-#include <stdbool.h>
-#include <string.h>
+#define _GNU_SOURCE
+
+#include <assert.h>
 #include <fcntl.h>
-#include <poll.h>
-#include <sys/ioctl.h>
 #include <linux/perf_event.h>
 #include <linux/bpf.h>
-#include "libbpf.h"
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/ioctl.h>
+#include <sys/resource.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
 #include "bpf_load.h"
+#include "libbpf.h"
 #include "perf-sys.h"
 
 #define SAMPLE_PERIOD  0x7fffffffffffffffULL
 
-static void test_bpf_perf_event(void)
+static void check_on_cpu(int cpu, struct perf_event_attr *attr)
+{
+	cpu_set_t set;
+	int pmu_fd;
+	__u64 value;
+	int error = 0;
+	/* Move to target CPU */
+	CPU_ZERO(&set);
+	CPU_SET(cpu, &set);
+	assert(sched_setaffinity(0, sizeof(set), &set) == 0);
+	/* Open perf event and attach to the perf_event_array */
+	pmu_fd = sys_perf_event_open(attr, -1/*pid*/, cpu/*cpu*/, -1/*group_fd*/, 0);
+	if (pmu_fd < 0) {
+		fprintf(stderr, "sys_perf_event_open failed on CPU %d\n", cpu);
+		error = 1;
+		goto on_exit;
+	}
+	assert(bpf_map_update_elem(map_fd[0], &cpu, &pmu_fd, BPF_ANY) == 0);
+	assert(ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0) == 0);
+	/* Trigger the kprobe */
+	bpf_map_get_next_key(map_fd[1], &cpu, NULL);
+	/* Check the value */
+	if (bpf_map_lookup_elem(map_fd[1], &cpu, &value)) {
+		fprintf(stderr, "Value missing for CPU %d\n", cpu);
+		error = 1;
+		goto on_exit;
+	}
+	fprintf(stderr, "CPU %d: %llu\n", cpu, value);
+
+on_exit:
+	assert(bpf_map_delete_elem(map_fd[0], &cpu) == 0 || error);
+	assert(ioctl(pmu_fd, PERF_EVENT_IOC_DISABLE, 0) == 0);
+	assert(close(pmu_fd) == 0 || error);
+	assert(bpf_map_delete_elem(map_fd[1], &cpu) == 0 || error);
+	exit(error);
+}
+
+static void test_perf_event_array(struct perf_event_attr *attr,
+				  const char *name)
 {
-	int nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
-	int *pmu_fd = malloc(nr_cpus * sizeof(int));
-	int status, i;
+	int i, status, nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
+	pid_t pid[nr_cpus];
+
+	printf("Test reading %s counters\n", name);
+
+	for (i = 0; i < nr_cpus; i++) {
+		pid[i] = fork();
+		assert(pid[i] >= 0);
+		if (pid[i] == 0) {
+			check_on_cpu(i, attr);
+			exit(1);
+		}
+	}
 
-	struct perf_event_attr attr_insn_pmu = {
+	for (i = 0; i < nr_cpus; i++) {
+		assert(waitpid(pid[i], &status, 0) == pid[i]);
+		assert(status == 0);
+	}
+}
+
+static void test_bpf_perf_event(void)
+{
+	struct perf_event_attr attr_cycles = {
 		.freq = 0,
 		.sample_period = SAMPLE_PERIOD,
 		.inherit = 0,
 		.type = PERF_TYPE_HARDWARE,
 		.read_format = 0,
 		.sample_type = 0,
-		.config = 0,/* PMU: cycles */
+		.config = PERF_COUNT_HW_CPU_CYCLES,
+	};
+	struct perf_event_attr attr_clock = {
+		.freq = 0,
+		.sample_period = SAMPLE_PERIOD,
+		.inherit = 0,
+		.type = PERF_TYPE_SOFTWARE,
+		.read_format = 0,
+		.sample_type = 0,
+		.config = PERF_COUNT_SW_CPU_CLOCK,
+	};
+	struct perf_event_attr attr_raw = {
+		.freq = 0,
+		.sample_period = SAMPLE_PERIOD,
+		.inherit = 0,
+		.type = PERF_TYPE_RAW,
+		.read_format = 0,
+		.sample_type = 0,
+		/* Intel Instruction Retired */
+		.config = 0xc0,
+	};
+	struct perf_event_attr attr_l1d_load = {
+		.freq = 0,
+		.sample_period = SAMPLE_PERIOD,
+		.inherit = 0,
+		.type = PERF_TYPE_HW_CACHE,
+		.read_format = 0,
+		.sample_type = 0,
+		.config =
+			PERF_COUNT_HW_CACHE_L1D |
+			(PERF_COUNT_HW_CACHE_OP_READ << 8) |
+			(PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16),
+	};
+	struct perf_event_attr attr_llc_miss = {
+		.freq = 0,
+		.sample_period = SAMPLE_PERIOD,
+		.inherit = 0,
+		.type = PERF_TYPE_HW_CACHE,
+		.read_format = 0,
+		.sample_type = 0,
+		.config =
+			PERF_COUNT_HW_CACHE_LL |
+			(PERF_COUNT_HW_CACHE_OP_READ << 8) |
+			(PERF_COUNT_HW_CACHE_RESULT_MISS << 16),
+	};
+	struct perf_event_attr attr_msr_tsc = {
+		.freq = 0,
+		.sample_period = 0,
+		.inherit = 0,
+		/* From /sys/bus/event_source/devices/msr/ */
+		.type = 7,
+		.read_format = 0,
+		.sample_type = 0,
+		.config = 0,
 	};
 
-	for (i = 0; i < nr_cpus; i++) {
-		pmu_fd[i] = sys_perf_event_open(&attr_insn_pmu, -1/*pid*/, i/*cpu*/, -1/*group_fd*/, 0);
-		if (pmu_fd[i] < 0) {
-			printf("event syscall failed\n");
-			goto exit;
-		}
-
-		bpf_map_update_elem(map_fd[0], &i, &pmu_fd[i], BPF_ANY);
-		ioctl(pmu_fd[i], PERF_EVENT_IOC_ENABLE, 0);
-	}
-
-	status = system("ls > /dev/null");
-	if (status)
-		goto exit;
-	status = system("sleep 2");
-	if (status)
-		goto exit;
-
-exit:
-	for (i = 0; i < nr_cpus; i++)
-		close(pmu_fd[i]);
-	close(map_fd[0]);
-	free(pmu_fd);
+	test_perf_event_array(&attr_cycles, "HARDWARE-cycles");
+	test_perf_event_array(&attr_clock, "SOFTWARE-clock");
+	test_perf_event_array(&attr_raw, "RAW-instruction-retired");
+	test_perf_event_array(&attr_l1d_load, "HW_CACHE-L1D-load");
+	test_perf_event_array(&attr_llc_miss, "HW_CACHE-LLC-miss");
+	test_perf_event_array(&attr_msr_tsc, "Dynamic-msr-tsc");
 }
 
 int main(int argc, char **argv)
 {
+	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
 	char filename[256];
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
 
+	setrlimit(RLIMIT_MEMLOCK, &r);
 	if (load_bpf_file(filename)) {
 		printf("%s", bpf_log_buf);
 		return 1;
 	}
 
 	test_bpf_perf_event();
-	read_trace_pipe();
 
+	printf("Success!\n");
 	return 0;
 }
-- 
2.9.3


* [PATCH v2 net-next 3/3] bpf: update perf event helper functions documentation
  2017-05-26  5:55 [PATCH v2 net-next 0/3] bpf: Add BPF support to all perf_event Alexei Starovoitov
  2017-05-26  5:55 ` [PATCH v2 net-next 1/3] perf, bpf: Add BPF support to all perf_event types Alexei Starovoitov
  2017-05-26  5:55 ` [PATCH v2 net-next 2/3] samples/bpf: add samples for more perf event types Alexei Starovoitov
@ 2017-05-26  5:55 ` Alexei Starovoitov
  2 siblings, 0 replies; 14+ messages in thread
From: Alexei Starovoitov @ 2017-05-26  5:55 UTC (permalink / raw)
  To: David S . Miller
  Cc: Peter Zijlstra, Brendan Gregg, Daniel Borkmann, Teng Qin, netdev,
	linux-kernel

From: Teng Qin <qinteng@fb.com>

This commit updates the documentation of the bpf_perf_event_output() and
bpf_perf_event_read() helpers to match their implementations.
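
As an aside, a typical call site passing the flags argument looks like
this sketch — BPF_F_CURRENT_CPU selects the current CPU's map entry,
while the low 32 bits of flags can instead carry a fixed index (the map
and data names here are illustrative only):

	/* emit a raw sample into the perf buffer of the current CPU */
	bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
			      &data, sizeof(data));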

Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/uapi/linux/bpf.h       | 11 +++++++----
 tools/include/uapi/linux/bpf.h | 11 +++++++----
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 94dfa9def355..e78aece03628 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -313,8 +313,11 @@ union bpf_attr {
  *     @flags: room for future extensions
  *     Return: 0 on success or negative error
  *
- * u64 bpf_perf_event_read(&map, index)
- *     Return: Number events read or error code
+ * u64 bpf_perf_event_read(map, flags)
+ *     read perf event counter value
+ *     @map: pointer to perf_event_array map
+ *     @flags: index of event in the map or bitmask flags
+ *     Return: value of perf event counter read or error code
  *
  * int bpf_redirect(ifindex, flags)
  *     redirect to another netdev
@@ -328,11 +331,11 @@ union bpf_attr {
  *     @skb: pointer to skb
  *     Return: realm if != 0
  *
- * int bpf_perf_event_output(ctx, map, index, data, size)
+ * int bpf_perf_event_output(ctx, map, flags, data, size)
  *     output perf raw sample
  *     @ctx: struct pt_regs*
  *     @map: pointer to perf_event_array map
- *     @index: index of event in the map
+ *     @flags: index of event in the map or bitmask flags
  *     @data: data on stack to be output as raw data
  *     @size: size of data
  *     Return: 0 on success or negative error
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 94dfa9def355..e78aece03628 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -313,8 +313,11 @@ union bpf_attr {
  *     @flags: room for future extensions
  *     Return: 0 on success or negative error
  *
- * u64 bpf_perf_event_read(&map, index)
- *     Return: Number events read or error code
+ * u64 bpf_perf_event_read(map, flags)
+ *     read perf event counter value
+ *     @map: pointer to perf_event_array map
+ *     @flags: index of event in the map or bitmask flags
+ *     Return: value of perf event counter read or error code
  *
  * int bpf_redirect(ifindex, flags)
  *     redirect to another netdev
@@ -328,11 +331,11 @@ union bpf_attr {
  *     @skb: pointer to skb
  *     Return: realm if != 0
  *
- * int bpf_perf_event_output(ctx, map, index, data, size)
+ * int bpf_perf_event_output(ctx, map, flags, data, size)
  *     output perf raw sample
  *     @ctx: struct pt_regs*
  *     @map: pointer to perf_event_array map
- *     @index: index of event in the map
+ *     @flags: index of event in the map or bitmask flags
  *     @data: data on stack to be output as raw data
  *     @size: size of data
  *     Return: 0 on success or negative error
-- 
2.9.3


* Re: [PATCH v2 net-next 1/3] perf, bpf: Add BPF support to all perf_event types
  2017-05-26  5:55 ` [PATCH v2 net-next 1/3] perf, bpf: Add BPF support to all perf_event types Alexei Starovoitov
@ 2017-05-26 11:04   ` kbuild test robot
  2017-05-26 14:55   ` David Miller
  2017-05-29  9:12   ` Peter Zijlstra
  2 siblings, 0 replies; 14+ messages in thread
From: kbuild test robot @ 2017-05-26 11:04 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: kbuild-all, David S . Miller, Peter Zijlstra, Brendan Gregg,
	Daniel Borkmann, Teng Qin, netdev, linux-kernel


Hi Teng,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Alexei-Starovoitov/bpf-Add-BPF-support-to-all-perf_event/20170526-171542
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 6.2.0
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   kernel//bpf/arraymap.c: In function 'perf_event_fd_array_get_ptr':
>> kernel//bpf/arraymap.c:466:11: error: 'struct perf_event' has no member named 'attach_state'
     if (event->attach_state & PERF_ATTACH_TASK)
              ^~

vim +466 kernel//bpf/arraymap.c

   460		if (IS_ERR(perf_file))
   461			return perf_file;
   462	
   463		event = perf_file->private_data;
   464		ee = ERR_PTR(-EINVAL);
   465		/* Per-task events are not supported */
 > 466		if (event->attach_state & PERF_ATTACH_TASK)
   467			goto err_out;
   468	
   469		attr = perf_event_attrs(event);

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



* Re: [PATCH v2 net-next 1/3] perf, bpf: Add BPF support to all perf_event types
  2017-05-26  5:55 ` [PATCH v2 net-next 1/3] perf, bpf: Add BPF support to all perf_event types Alexei Starovoitov
  2017-05-26 11:04   ` kbuild test robot
@ 2017-05-26 14:55   ` David Miller
  2017-05-29  9:12   ` Peter Zijlstra
  2 siblings, 0 replies; 14+ messages in thread
From: David Miller @ 2017-05-26 14:55 UTC (permalink / raw)
  To: ast; +Cc: peterz, bgregg, daniel, qinteng, netdev, linux-kernel

From: Alexei Starovoitov <ast@fb.com>
Date: Thu, 25 May 2017 22:55:47 -0700

> +	if (event->attach_state & PERF_ATTACH_TASK)
> +		goto err_out;

The attach_state member only exists when CONFIG_PERF_EVENTS is enabled.


* Re: [PATCH v2 net-next 1/3] perf, bpf: Add BPF support to all perf_event types
  2017-05-26  5:55 ` [PATCH v2 net-next 1/3] perf, bpf: Add BPF support to all perf_event types Alexei Starovoitov
  2017-05-26 11:04   ` kbuild test robot
  2017-05-26 14:55   ` David Miller
@ 2017-05-29  9:12   ` Peter Zijlstra
  2017-05-29  9:39     ` Peter Zijlstra
  2 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2017-05-29  9:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S . Miller, Brendan Gregg, Daniel Borkmann, Teng Qin,
	netdev, linux-kernel

On Thu, May 25, 2017 at 10:55:47PM -0700, Alexei Starovoitov wrote:

> +++ b/kernel/bpf/arraymap.c
> @@ -462,26 +462,22 @@ static void *perf_event_fd_array_get_ptr(struct bpf_map *map,
>  
>  	event = perf_file->private_data;
>  	ee = ERR_PTR(-EINVAL);
> +	/* Per-task events are not supported */
> +	if (event->attach_state & PERF_ATTACH_TASK)
> +		goto err_out;
>  
>  	attr = perf_event_attrs(event);
>  	if (IS_ERR(attr) || attr->inherit)
>  		goto err_out;

> +	/* TRACEPOINT and BREAKPOINT not supported in perf_event_read_local */

I cannot find reason for this comment. That is, why would
perf_event_read_local() not support those two types?

> +	if (attr->type == PERF_TYPE_TRACEPOINT ||
> +	    attr->type == PERF_TYPE_BREAKPOINT)
> +		goto err_out;
>  
> +	ee = bpf_event_entry_gen(perf_file, map_file);
> +	if (ee)
> +		return ee;
> +	ee = ERR_PTR(-ENOMEM);
>  
>  err_out:
>  	fput(perf_file);


* Re: [PATCH v2 net-next 1/3] perf, bpf: Add BPF support to all perf_event types
  2017-05-29  9:12   ` Peter Zijlstra
@ 2017-05-29  9:39     ` Peter Zijlstra
  2017-05-30 15:52       ` Alexei Starovoitov
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2017-05-29  9:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S . Miller, Brendan Gregg, Daniel Borkmann, Teng Qin,
	netdev, linux-kernel

On Mon, May 29, 2017 at 11:12:53AM +0200, Peter Zijlstra wrote:
> On Thu, May 25, 2017 at 10:55:47PM -0700, Alexei Starovoitov wrote:
> 
> > +++ b/kernel/bpf/arraymap.c
> > @@ -462,26 +462,22 @@ static void *perf_event_fd_array_get_ptr(struct bpf_map *map,
> >  
> >  	event = perf_file->private_data;
> >  	ee = ERR_PTR(-EINVAL);
> > +	/* Per-task events are not supported */
> > +	if (event->attach_state & PERF_ATTACH_TASK)
> > +		goto err_out;
> >  
> >  	attr = perf_event_attrs(event);
> >  	if (IS_ERR(attr) || attr->inherit)
> >  		goto err_out;
> 
> > +	/* TRACEPOINT and BREAKPOINT not supported in perf_event_read_local */
> 
> I cannot find reason for this comment. That is, why would
> perf_event_read_local() not support those two types?
> 
> > +	if (attr->type == PERF_TYPE_TRACEPOINT ||
> > +	    attr->type == PERF_TYPE_BREAKPOINT)
> > +		goto err_out;
> >  
> > +	ee = bpf_event_entry_gen(perf_file, map_file);
> > +	if (ee)
> > +		return ee;
> > +	ee = ERR_PTR(-ENOMEM);
> >  
> >  err_out:
> >  	fput(perf_file);


Do we want something like the below to replace much of the above?

	if (!perf_event_valid_local(event, NULL, cpu))
		goto err_out;

Seems to be roughly what you're after, although I suppose @cpu might be
hard to determine a priori, so maybe we should allow a magic value to
short-circuit that test.

---
 kernel/events/core.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8d6acaeeea17..a7dc34f19568 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3630,6 +3630,36 @@ static inline u64 perf_event_count(struct perf_event *event)
 }
 
 /*
+ * perf_event_valid_local() - validates if the event is usable by perf_event_read_local()
+ * event: the event to validate
+ * task:  the task the @event will be used in
+ * cpu:   the cpu the @event will be used on
+ *
+ * In case one wants to disallow all per-task events, use @task = NULL.
+ * In case one wants to disallow all per-cpu events, use @cpu = -1.
+ */
+bool perf_event_valid_local(struct perf_event *event, struct task_struct *task, int cpu)
+{
+	/* See perf_event_read_local() for the reasons for these tests */
+
+	if ((event->attach_state & PERF_ATTACH_TASK) &&
+	    event->hw.target != task)
+		return false;
+
+	if (!(event->attach_state & PERF_ATTACH_TASK) &&
+	    event->cpu != cpu)
+		return false;
+
+	if (event->attr.inherit)
+		return false;
+
+	if (event->pmu->count)
+		return false;
+
+	return true;
+}
+
+/*
  * NMI-safe method to read a local event, that is an event that
  * is:
  *   - either for the current task, or for this CPU


* Re: [PATCH v2 net-next 1/3] perf, bpf: Add BPF support to all perf_event types
  2017-05-29  9:39     ` Peter Zijlstra
@ 2017-05-30 15:52       ` Alexei Starovoitov
  2017-05-30 16:51         ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Alexei Starovoitov @ 2017-05-30 15:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David S . Miller, Brendan Gregg, Daniel Borkmann, Teng Qin,
	netdev, linux-kernel

On 5/29/17 2:39 AM, Peter Zijlstra wrote:
>
>
> Do we want something like the below to replace much of the above?
>
> 	if (!perf_event_valid_local(event, NULL, cpu))
> 		goto err_out;
>
> Seems to be roughly what you're after, although I suppose @cpu might be
> hard to determine a priori, so maybe we should allow a magic value to
> short-circuit that test.
>
> ---
>  kernel/events/core.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 8d6acaeeea17..a7dc34f19568 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3630,6 +3630,36 @@ static inline u64 perf_event_count(struct perf_event *event)
>  }
>
>  /*
> + * perf_event_valid_local() - validates if the event is usable by perf_event_read_local()
> + * event: the event to validate
> + * task:  the task the @event will be used in
> + * cpu:   the cpu the @event will be used on
> + *
> + * In case one wants to disallow all per-task events, use @task = NULL.
> + * In case one wants to disallow all per-cpu events, use @cpu = -1.
> + */
> +bool perf_event_valid_local(struct perf_event *event, struct task_struct *task, int cpu)
> +{
> +	/* See perf_event_read_local() for the reasons for these tests */
> +
> +	if ((event->attach_state & PERF_ATTACH_TASK) &&
> +	    event->hw.target != task)
> +		return false;
> +
> +	if (!(event->attach_state & PERF_ATTACH_TASK) &&
> +	    event->cpu != cpu)
> +		return false;

we do if (unlikely(event->oncpu != cpu))
as dynamic check inside bpf_perf_event_read(), since we cannot do it
statically at perf_event_array update time.
If we drop the above 'if' and keep 'task==null' trick,
then indeed we can use this function as static check.

Right now we're trying to keep as many checks as possible as
static checks to make bpf_perf_event_read() faster.
I guess we can drop that approach and do perf_event_valid_local()
check for every read since perf_event_read_local() does all the
same checks anyway.
So how about converting all WARN_ON in perf_event_read_local()
into 'return -EINVAL' and changing the func proto to:
int perf_event_read_local(struct perf_event *event, u64 *counter_val)

 > I cannot find reason for this comment. That is, why would
 > perf_event_read_local() not support those two types?

I don't know. What is the meaning of
reading tracepoint/breakpoint counter?
Because of 'event->oncpu != cpu' dynamic check all counters are
expected to be per-cpu. I'm not sure how uncore counters work.
What do they have in event->oncpu? -1? I guess they have pmu->count?
So we cannot read them from bpf program anyway?

If we change the warn_ons in perf_event_read_local() into returns,
then we can make per-task counters work.
The user side will open a per-task counter and the bpf program will
do a current->pid != expected_pid check before calling
bpf_perf_event_read(); bpf scripts often do that already.

int perf_event_read_local(struct perf_event *event, u64 *counter_val)
{
   unsigned long flags;
   int ret = -EINVAL;

   local_irq_save(flags);
   if ((event->attach_state & PERF_ATTACH_TASK) &&
       event->hw.target != current)
     goto out;

   if (!(event->attach_state & PERF_ATTACH_TASK) &&
       event->cpu != smp_processor_id())
     goto out;
   ... inherit and pmu->count checks here ...
   *counter_val = local64_read(&event->count);
   ret = 0;
out:
   local_irq_restore(flags);
   return ret;
}
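
and the guard on the bpf side is just a couple of lines (sketch;
expected_pid would be supplied by the loader somehow):

   u32 pid = bpf_get_current_pid_tgid();

   if (pid != expected_pid)
     return 0;
   count = bpf_perf_event_read(&counters, key);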

thoughts?


* Re: [PATCH v2 net-next 1/3] perf, bpf: Add BPF support to all perf_event types
  2017-05-30 15:52       ` Alexei Starovoitov
@ 2017-05-30 16:51         ` Peter Zijlstra
  2017-05-30 17:37           ` Alexei Starovoitov
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2017-05-30 16:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S . Miller, Brendan Gregg, Daniel Borkmann, Teng Qin,
	netdev, linux-kernel

On Tue, May 30, 2017 at 08:52:14AM -0700, Alexei Starovoitov wrote:

> > +	if (!(event->attach_state & PERF_ATTACH_TASK) &&
> > +	    event->cpu != cpu)
> > +		return false;
> 
> we do if (unlikely(event->oncpu != cpu))
> as dynamic check inside bpf_perf_event_read(), since we cannot do it
> statically at perf_event_array update time.

Right, that's what I thought.

> If we drop the above 'if' and keep 'task==null' trick,
> then indeed we can use this function as static check.

Right, or otherwise have a special value to disable it.

> Right now we're trying to keep as many checks as possible as
> static checks to make bpf_perf_event_read() faster.
> I guess we can drop that approach and do perf_event_valid_local()
> check for every read since perf_event_read_local() does all the
> same checks anyway.
> So how about converting all WARN_ON in perf_event_read_local()
> into 'return -EINVAL' and changing the func proto to:
> int perf_event_read_local(struct perf_event *event, u64 *counter_val)

I'm confused on how that is better. My recent patches to WARN should
have greatly improved performance of WARN_ON_ONCE(). And looking at that
code, I suspect it's dominated by the POPF for inactive events.

> > I cannot find reason for this comment. That is, why would
> > perf_event_read_local() not support those two types?
> 
> I don't know. What is the meaning of
> reading tracepoint/breakpoint counter?

They count like all other software events. +1 for each occurrence.

So for instance, if you use irq_vectors:local_timer_entry you get how
many cpu local timer instances happened during your measurement window.

Same with a breakpoint, it counts how many times it got hit. Typically
you'd want to install a custom handler on breakpoints to do something
'interesting', but even without that it acts like a normal software
event.
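
For instance, a pure counter for that tracepoint can be opened with
something like this (sketch; tracepoint_id is read from
/sys/kernel/debug/tracing/events/irq_vectors/local_timer_entry/id):

	struct perf_event_attr attr = {
		.type = PERF_TYPE_TRACEPOINT,
		.config = tracepoint_id,
		.sample_period = 0,	/* count only, no sampling */
	};
	fd = sys_perf_event_open(&attr, -1 /*pid*/, cpu, -1 /*group_fd*/, 0);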

> Because of 'event->oncpu != cpu' dynamic check all counters are
> expected to be per-cpu. I'm not sure how uncore counters work.

Uncore thingies are assigned to any online CPU in their 'domain'.

> What do they have in event->oncpu? -1? I guess they have pmu->count?
> So we cannot read them from bpf program anyway?

They have the CPU number of the CPU that's assigned to them. So you
_could_ make use of them, but it's a bit tricky to get them to work
reliably because you'd have to get that CPU 'right' and it can change.

Typically they would end up on the first CPU in their domain, but with
CPU hotplug you can move them about and get confusion.

I'd have to think on how to do that nicely.

> If we change the warn_ons in perf_event_read_local() into returns,
> then we can make per-task counters work.

I'm not entirely sure I see how that is required. Should per task not
already work? The WARN that's there will only trigger if you call them
on the wrong task, which is something you shouldn't do anyway.


* Re: [PATCH v2 net-next 1/3] perf, bpf: Add BPF support to all perf_event types
  2017-05-30 16:51         ` Peter Zijlstra
@ 2017-05-30 17:37           ` Alexei Starovoitov
  2017-05-30 19:03             ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Alexei Starovoitov @ 2017-05-30 17:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David S . Miller, Brendan Gregg, Daniel Borkmann, Teng Qin,
	netdev, linux-kernel

On 5/30/17 9:51 AM, Peter Zijlstra wrote:
> On Tue, May 30, 2017 at 08:52:14AM -0700, Alexei Starovoitov wrote:
>
>>> +	if (!(event->attach_state & PERF_ATTACH_TASK) &&
>>> +	    event->cpu != cpu)
>>> +		return false;
>>
>> we do if (unlikely(event->oncpu != cpu))
>> as dynamic check inside bpf_perf_event_read(), since we cannot do it
>> statically at perf_event_array update time.
>
> Right, that's what I thought.
>
>> If we drop the above 'if' and keep 'task==null' trick,
>> then indeed we can use this function as static check.
>
> Right, or otherwise have a special value to disable it.
>
>> Right now we're trying to keep as many checks as possible as
>> static checks to make bpf_perf_event_read() faster.
>> I guess we can drop that approach and do perf_event_valid_local()
>> check for every read since perf_event_read_local() does all the
>> same checks anyway.
>> So how about converting all WARN_ON in perf_event_read_local()
>> into 'return -EINVAL' and changing the func proto to:
>> int perf_event_read_local(struct perf_event *event, u64 *counter_val)
>
> I'm confused on how that is better. My recent patches to WARN should
> have greatly improved performance of WARN_ON_ONCE(). And looking at that
> code, I suspect it's dominated by the POPF for inactive events.
>
>>> I cannot find reason for this comment. That is, why would
>>> perf_event_read_local() not support those two types?
>>
>> I don't know. What is the meaning of
>> reading tracepoint/breakpoint counter?
>
> They count like all other software events. +1 for each occurrence.
>
> So for instance, if you use irq_vectors:local_timer_entry you get how
> many cpu local timer instances happened during your measurement window.
>
> Same with a breakpoint, it counts how many times it got hit. Typically
> you'd want to install a custom handler on breakpoints to do something
> 'interesting', but even without that it acts like a normal software
> event.
>
>> Because of 'event->oncpu != cpu' dynamic check all counters are
>> expected to be per-cpu. I'm not sure how uncore counters work.
>
> Uncore thingies are assigned to any online CPU in their 'domain'.
>
>> What do they have in event->oncpu? -1? I guess they have pmu->count?
>> So we cannot read them from bpf program anyway?
>
> They have the CPU number of the CPU that's assigned to them. So you
> _could_ make use of them, but it's a bit tricky to get them to work
> reliably because you'd have to get that CPU 'right' and it can change.
>
> Typically they would end up on the first CPU in their domain, but with
> CPU hotplug you can move them about and get confusion.
>
> I'd have to think on how to do that nicely.
>
>> If we change the warn_ons in perf_event_read_local() into returns,
>> then we can make per-task counters work.
>
> I'm not entirely sure I see how that is required. Should per task not
> already work? The WARN that's there will only trigger if you call them
> on the wrong task, which is something you shouldn't do anyway.

The kernel WARN is considered to be a bug of bpf infra. That's the
reason we do all these checks at map update time and at run-time.
The bpf program authors should be able to do all possible experiments
until their scripts work. Dealing with kernel warns and reboots is not
something user space folks like to do.
Today bpf_perf_event_read() for per-task events isn't really
working due to event->oncpu != cpu runtime check in there.
If we convert warns to returns the existing scripts will continue
to work as-is and per-task will be possible.


* Re: [PATCH v2 net-next 1/3] perf, bpf: Add BPF support to all perf_event types
  2017-05-30 17:37           ` Alexei Starovoitov
@ 2017-05-30 19:03             ` Peter Zijlstra
  2017-06-01 13:32               ` Peter Zijlstra
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2017-05-30 19:03 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S . Miller, Brendan Gregg, Daniel Borkmann, Teng Qin,
	netdev, linux-kernel

On Tue, May 30, 2017 at 10:37:46AM -0700, Alexei Starovoitov wrote:
> On 5/30/17 9:51 AM, Peter Zijlstra wrote:

> > I'm not entirely sure I see how that is required. Should per task not
> > already work? The WARN that's there will only trigger if you call them
> > on the wrong task, which is something you shouldn't do anyway.
> 
> The kernel WARN is considered to be a bug of bpf infra. That's the
> reason we do all these checks at map update time and at run-time.
> The bpf program authors should be able to do all possible experiments
> until their scripts work. Dealing with kernel warns and reboots is not
> something user space folks like to do.
> Today bpf_perf_event_read() for per-task events isn't really
> working due to event->oncpu != cpu runtime check in there.
> If we convert warns to returns the existing scripts will continue
> to work as-is and per-task will be possible.

Ah yes.. I always forget that side of things (not ever having seen a
bpf thing working doesn't help either of course).

Let me consider that for a bit..


* Re: [PATCH v2 net-next 1/3] perf, bpf: Add BPF support to all perf_event types
  2017-05-30 19:03             ` Peter Zijlstra
@ 2017-06-01 13:32               ` Peter Zijlstra
  2017-06-01 15:21                 ` Alexei Starovoitov
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2017-06-01 13:32 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S . Miller, Brendan Gregg, Daniel Borkmann, Teng Qin,
	netdev, linux-kernel

On Tue, May 30, 2017 at 09:03:39PM +0200, Peter Zijlstra wrote:
> On Tue, May 30, 2017 at 10:37:46AM -0700, Alexei Starovoitov wrote:
> > On 5/30/17 9:51 AM, Peter Zijlstra wrote:
> 
> > > I'm not entirely sure I see how that is required. Should per task not
> > > already work? The WARN that's there will only trigger if you call them
> > > on the wrong task, which is something you shouldn't do anyway.
> > 
> > The kernel WARN is considered to be a bug of bpf infra. That's the
> > reason we do all these checks at map update time and at run-time.
> > The bpf program authors should be able to do all possible experiments
> > until their scripts work. Dealing with kernel warns and reboots is not
> > something user space folks like to do.
> > Today bpf_perf_event_read() for per-task events isn't really
> > working due to event->oncpu != cpu runtime check in there.
> > If we convert warns to returns the existing scripts will continue
> > to work as-is and per-task will be possible.
> 
> Ah yes.. I always forget that side of things (not ever having seen a
> bpf thing working doesn't help either of course).
> 
> Let me consider that for a bit..

OK, do that. Something like the below should do I suppose.

It will return -EOPNOTSUPP for permanent failure (it cannot ever work)
and -EINVAL for temporary failure (could work if you call it on another
task/cpu).

---
 kernel/events/core.c | 41 +++++++++++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 14 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8d6acaeeea17..22b6407da374 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3637,10 +3637,10 @@ static inline u64 perf_event_count(struct perf_event *event)
  *     will not be local and we cannot read them atomically
  *   - must not have a pmu::count method
  */
-u64 perf_event_read_local(struct perf_event *event)
+int perf_event_read_local(struct perf_event *event, u64 *value)
 {
 	unsigned long flags;
-	u64 val;
+	int ret = 0;
 
 	/*
 	 * Disabling interrupts avoids all counter scheduling (context
@@ -3648,25 +3648,37 @@ u64 perf_event_read_local(struct perf_event *event)
 	 */
 	local_irq_save(flags);
 
-	/* If this is a per-task event, it must be for current */
-	WARN_ON_ONCE((event->attach_state & PERF_ATTACH_TASK) &&
-		     event->hw.target != current);
-
-	/* If this is a per-CPU event, it must be for this CPU */
-	WARN_ON_ONCE(!(event->attach_state & PERF_ATTACH_TASK) &&
-		     event->cpu != smp_processor_id());
-
 	/*
 	 * It must not be an event with inherit set, we cannot read
 	 * all child counters from atomic context.
 	 */
-	WARN_ON_ONCE(event->attr.inherit);
+	if (event->attr.inherit) {
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
 
 	/*
 	 * It must not have a pmu::count method, those are not
 	 * NMI safe.
 	 */
-	WARN_ON_ONCE(event->pmu->count);
+	if (event->pmu->count) {
+		ret = -EOPNOTSUPP;
+		goto out;
+	}
+
+	/* If this is a per-task event, it must be for current */
+	if ((event->attach_state & PERF_ATTACH_TASK) &&
+	    event->hw.target != current) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* If this is a per-CPU event, it must be for this CPU */
+	if (!(event->attach_state & PERF_ATTACH_TASK) &&
+	    event->cpu != smp_processor_id()) {
+		ret = -EINVAL;
+		goto out;
+	}
 
 	/*
 	 * If the event is currently on this CPU, its either a per-task event,
@@ -3676,10 +3688,11 @@ u64 perf_event_read_local(struct perf_event *event)
 	if (event->oncpu == smp_processor_id())
 		event->pmu->read(event);
 
-	val = local64_read(&event->count);
+	*value = local64_read(&event->count);
+out:
 	local_irq_restore(flags);
 
-	return val;
+	return ret;
 }
 
 static int perf_event_read(struct perf_event *event, bool group)


* Re: [PATCH v2 net-next 1/3] perf, bpf: Add BPF support to all perf_event types
  2017-06-01 13:32               ` Peter Zijlstra
@ 2017-06-01 15:21                 ` Alexei Starovoitov
  0 siblings, 0 replies; 14+ messages in thread
From: Alexei Starovoitov @ 2017-06-01 15:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David S . Miller, Brendan Gregg, Daniel Borkmann, Teng Qin,
	netdev, linux-kernel

On 6/1/17 6:32 AM, Peter Zijlstra wrote:
> OK, do that. Something like the below should do I suppose.
>
> It will return -EOPNOTSUPP for permanent failure (it cannot ever work)
> and -EINVAL for temporary failure (could work if you call it on another
> task/cpu).

Thanks! Will test it and report back.

