* [RFC PATCH bpf-next v2 1/3] bpf: implement bpf_send_signal() helper
  2019-05-03  0:08 [RFC PATCH bpf-next v2 0/3] bpf: implement bpf_send_signal() helper Yonghong Song
@ 2019-05-03  0:08 ` Yonghong Song
  2019-05-05  7:29   ` Alexei Starovoitov
  2019-05-03  0:08 ` [RFC PATCH bpf-next v2 2/3] tools/bpf: sync bpf uapi header bpf.h Yonghong Song
  2019-05-03  0:08 ` [RFC PATCH bpf-next v2 3/3] tools/bpf: add a selftest for bpf_send_signal() helper Yonghong Song
  2 siblings, 1 reply; 6+ messages in thread
From: Yonghong Song @ 2019-05-03  0:08 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Peter Zijlstra,
	Yonghong Song

This patch tries to solve the following specific use case.

Currently, a bpf program can already collect stack traces
when certain events happen (e.g., a cache miss counter or
cpu clock counter overflows). These stack traces can be
used for performance analysis. For jitted programs, e.g.,
hhvm (jitted php), it is very hard to get the true stack
trace in the bpf program due to jit complexity.

To resolve this issue, hhvm implements a signal handler,
e.g. for SIGALRM, and a set of program locations at which
it can dump stack traces. When it receives a signal, it will
dump the stack at the next such program location.

The following is the current way to handle this use case:
  . The profiler installs a bpf program and polls on a map.
    When a certain event happens, the bpf program writes to the map.
  . Upon receiving the information from the map, the profiler
    sends a signal to hhvm.
This method can have large delays, causing the profiling results
to be skewed.

This patch implements a bpf_send_signal() helper to send a signal to
hhvm in real time, resulting in the intended stack traces.
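
For reference, a minimal sketch of the bpf side of the intended usage,
along the lines of the selftest added in patch 3/3 (the pid and signal
values below are placeholders that a real profiler would supply, e.g.
via a map):

  #include <linux/bpf.h>
  #include <linux/bpf_perf_event.h>
  #include "bpf_helpers.h"

  SEC("perf_event")
  int profile_event(struct bpf_perf_event_data *ctx)
  {
      __u32 target_pid = 1234;  /* placeholder: pid of the jitted process */
      __u32 target_sig = 10;    /* SIGUSR1 */

      /* ask the target to dump its own stack at the next safe point */
      bpf_send_signal(target_pid, target_sig);
      return 0;
  }

  char _license[] SEC("license") = "GPL";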

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/uapi/linux/bpf.h | 15 ++++++-
 kernel/trace/bpf_trace.c | 85 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 99 insertions(+), 1 deletion(-)

v1 -> v2:
 fixed a compilation warning (missing return value in one case)
 reported by the kbuild test robot.
 added the Reported-by tag above to notify the bot.

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 72336bac7573..e3e824848335 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2667,6 +2667,18 @@ union bpf_attr {
  *		0 on success.
  *
  *		**-ENOENT** if the bpf-local-storage cannot be found.
+ *
+ * int bpf_send_signal(u32 pid, u32 sig)
+ *	Description
+ *		Send signal *sig* to *pid*.
+ *	Return
+ *		0 on success or successfully queued.
+ *
+ *		**-ENOENT** if *pid* cannot be found.
+ *
+ *		**-EBUSY** if work queue under nmi is full.
+ *
+ * 		Other negative values for actual signal sending errors.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2777,7 +2789,8 @@ union bpf_attr {
 	FN(strtol),			\
 	FN(strtoul),			\
 	FN(sk_storage_get),		\
-	FN(sk_storage_delete),
+	FN(sk_storage_delete),		\
+	FN(send_signal),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 8607aba1d882..49a804d59bfb 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -559,6 +559,76 @@ static const struct bpf_func_proto bpf_probe_read_str_proto = {
 	.arg3_type	= ARG_ANYTHING,
 };
 
+struct send_signal_irq_work {
+	struct irq_work irq_work;
+	u32 pid;
+	u32 sig;
+};
+
+static DEFINE_PER_CPU(struct send_signal_irq_work, send_signal_work);
+
+static int do_bpf_send_signal_pid(u32 pid, u32 sig)
+{
+	struct task_struct *task;
+	struct pid *pidp;
+
+	pidp = find_vpid(pid);
+	if (!pidp)
+		return -ENOENT;
+
+	task = get_pid_task(pidp, PIDTYPE_PID);
+	if (!task)
+		return -ENOENT;
+
+	kill_pid_info(sig, SEND_SIG_PRIV, pidp);
+	put_task_struct(task);
+
+	return 0;
+}
+
+static void do_bpf_send_signal(struct irq_work *entry)
+{
+	struct send_signal_irq_work *work;
+
+	work = container_of(entry, struct send_signal_irq_work, irq_work);
+	do_bpf_send_signal_pid(work->pid, work->sig);
+}
+
+BPF_CALL_2(bpf_send_signal, u32, pid, u32, sig)
+{
+	struct send_signal_irq_work *work = NULL;
+
+	/* current may be in the middle of teardown and task_pid(current)
+	 * becomes NULL. task_pid(current)) is needed to find pid namespace
+	 * in order to locate proper pid structure for the target pid.
+	 */
+	if (!task_pid(current))
+		return -ENOENT;
+
+	if (in_nmi()) {
+		work = this_cpu_ptr(&send_signal_work);
+		if (work->irq_work.flags & IRQ_WORK_BUSY)
+			return -EBUSY;
+	}
+
+	if (!work)
+		return do_bpf_send_signal_pid(pid, sig);
+
+	work->pid = pid;
+	work->sig = sig;
+	irq_work_queue(&work->irq_work);
+
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_send_signal_proto = {
+	.func		= bpf_send_signal,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_ANYTHING,
+	.arg2_type	= ARG_ANYTHING,
+};
+
 static const struct bpf_func_proto *
 tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -609,6 +679,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_get_current_cgroup_id:
 		return &bpf_get_current_cgroup_id_proto;
 #endif
+	case BPF_FUNC_send_signal:
+		return &bpf_send_signal_proto;
 	default:
 		return NULL;
 	}
@@ -1334,5 +1406,18 @@ int __init bpf_event_init(void)
 	return 0;
 }
 
+static int __init send_signal_irq_work_init(void)
+{
+	int cpu;
+	struct send_signal_irq_work *work;
+
+	for_each_possible_cpu(cpu) {
+		work = per_cpu_ptr(&send_signal_work, cpu);
+		init_irq_work(&work->irq_work, do_bpf_send_signal);
+	}
+	return 0;
+}
+
 fs_initcall(bpf_event_init);
+subsys_initcall(send_signal_irq_work_init);
 #endif /* CONFIG_MODULES */
-- 
2.17.1


* [RFC PATCH bpf-next v2 0/3] bpf: implement bpf_send_signal() helper
@ 2019-05-03  0:08 Yonghong Song
  2019-05-03  0:08 ` [RFC PATCH bpf-next v2 1/3] " Yonghong Song
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Yonghong Song @ 2019-05-03  0:08 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Peter Zijlstra,
	Yonghong Song

Currently, a bpf program can already collect stack traces
when certain events happen (e.g., a cache miss counter or
cpu clock counter overflows). These stack traces can be
used for performance analysis. For jitted programs, e.g.,
hhvm (jitted php), it is very hard to get the true stack
trace in the bpf program due to jit complexity.

To resolve this issue, hhvm implements a signal handler,
e.g. for SIGALRM, and a set of program locations at which
it can dump stack traces. When it receives a signal, it will
dump the stack at the next such program location.
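
To illustrate the handler/safe-point pattern described above, here is a
minimal user-space sketch (not taken from hhvm; the names dump_requested
and dump_stack_at_safe_point are hypothetical):

  #include <signal.h>

  static volatile sig_atomic_t dump_requested;

  static void on_signal(int sig)
  {
      (void)sig;
      dump_requested = 1;       /* only set a flag in the handler */
  }

  static void dump_stack_at_safe_point(void)
  {
      /* hypothetical: walk and record the runtime's own stack here */
  }

  /* called by the runtime at known-safe program locations */
  static void maybe_dump_stack(void)
  {
      if (dump_requested) {
          dump_requested = 0;
          dump_stack_at_safe_point();
      }
  }

  int main(void)
  {
      signal(SIGUSR1, on_signal);
      /* ... run jitted code, calling maybe_dump_stack() at safe points ... */
      return 0;
  }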

The following is the current way to handle this use case:
  . The profiler installs a bpf program and polls on a map.
    When a certain event happens, the bpf program writes to the map.
  . Upon receiving the information from the map, the profiler
    sends a signal to hhvm.
This method can have large delays, causing the profiling results
to be skewed.

This patch set implements a bpf_send_signal() helper to send a signal
to hhvm in real time, resulting in the intended stack traces.

The patch set is sent out as an RFC because (1) I have not found a
simple solution to return an error code from irq_work, and (2) I would
like some general feedback about the approach.

Changelogs:
  v1 -> v2:
    . fixed a compilation warning (missing return value in one
      error branch) discovered by the kbuild bot.

Yonghong Song (3):
  bpf: implement bpf_send_signal() helper
  tools/bpf: sync bpf uapi header bpf.h
  tools/bpf: add a selftest for bpf_send_signal() helper

 include/uapi/linux/bpf.h                      |  15 +-
 kernel/trace/bpf_trace.c                      |  85 ++++++++
 tools/include/uapi/linux/bpf.h                |  15 +-
 tools/testing/selftests/bpf/Makefile          |   5 +-
 tools/testing/selftests/bpf/bpf_helpers.h     |   2 +
 .../bpf/progs/test_send_signal_kern.c         |  50 +++++
 .../selftests/bpf/test_send_signal_user.c     | 186 ++++++++++++++++++
 7 files changed, 354 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_send_signal_kern.c
 create mode 100644 tools/testing/selftests/bpf/test_send_signal_user.c

-- 
2.17.1


* [RFC PATCH bpf-next v2 2/3] tools/bpf: sync bpf uapi header bpf.h
  2019-05-03  0:08 [RFC PATCH bpf-next v2 0/3] bpf: implement bpf_send_signal() helper Yonghong Song
  2019-05-03  0:08 ` [RFC PATCH bpf-next v2 1/3] " Yonghong Song
@ 2019-05-03  0:08 ` Yonghong Song
  2019-05-03  0:08 ` [RFC PATCH bpf-next v2 3/3] tools/bpf: add a selftest for bpf_send_signal() helper Yonghong Song
  2 siblings, 0 replies; 6+ messages in thread
From: Yonghong Song @ 2019-05-03  0:08 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Peter Zijlstra,
	Yonghong Song

Sync the bpf uapi header bpf.h to the tools directory.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/include/uapi/linux/bpf.h | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 72336bac7573..e3e824848335 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2667,6 +2667,18 @@ union bpf_attr {
  *		0 on success.
  *
  *		**-ENOENT** if the bpf-local-storage cannot be found.
+ *
+ * int bpf_send_signal(u32 pid, u32 sig)
+ *	Description
+ *		Send signal *sig* to *pid*.
+ *	Return
+ *		0 on success or successfully queued.
+ *
+ *		**-ENOENT** if *pid* cannot be found.
+ *
+ *		**-EBUSY** if work queue under nmi is full.
+ *
+ * 		Other negative values for actual signal sending errors.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2777,7 +2789,8 @@ union bpf_attr {
 	FN(strtol),			\
 	FN(strtoul),			\
 	FN(sk_storage_get),		\
-	FN(sk_storage_delete),
+	FN(sk_storage_delete),		\
+	FN(send_signal),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
-- 
2.17.1


* [RFC PATCH bpf-next v2 3/3] tools/bpf: add a selftest for bpf_send_signal() helper
  2019-05-03  0:08 [RFC PATCH bpf-next v2 0/3] bpf: implement bpf_send_signal() helper Yonghong Song
  2019-05-03  0:08 ` [RFC PATCH bpf-next v2 1/3] " Yonghong Song
  2019-05-03  0:08 ` [RFC PATCH bpf-next v2 2/3] tools/bpf: sync bpf uapi header bpf.h Yonghong Song
@ 2019-05-03  0:08 ` Yonghong Song
  2 siblings, 0 replies; 6+ messages in thread
From: Yonghong Song @ 2019-05-03  0:08 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Peter Zijlstra,
	Yonghong Song

The test covers both nmi and tracepoint perf events.
  $ ./test_send_signal_user
  test_send_signal (tracepoint): OK
  test_send_signal (perf_event): OK

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/testing/selftests/bpf/Makefile          |   5 +-
 tools/testing/selftests/bpf/bpf_helpers.h     |   2 +
 .../bpf/progs/test_send_signal_kern.c         |  50 +++++
 .../selftests/bpf/test_send_signal_user.c     | 186 ++++++++++++++++++
 4 files changed, 241 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_send_signal_kern.c
 create mode 100644 tools/testing/selftests/bpf/test_send_signal_user.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 66f2dca1dee1..c3aa33dfae18 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -23,7 +23,8 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
 	test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \
 	test_sock test_btf test_sockmap test_lirc_mode2_user get_cgroup_id_user \
 	test_socket_cookie test_cgroup_storage test_select_reuseport test_section_names \
-	test_netcnt test_tcpnotify_user test_sock_fields test_sysctl
+	test_netcnt test_tcpnotify_user test_sock_fields test_sysctl \
+	test_send_signal_user
 
 BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
 TEST_GEN_FILES = $(BPF_OBJ_FILES)
@@ -124,7 +125,7 @@ endif
 CLANG_SYS_INCLUDES := $(shell $(CLANG) -v -E - </dev/null 2>&1 \
 	| sed -n '/<...> search starts here:/,/End of search list./{ s| \(/.*\)|-idirafter \1|p }')
 
-CLANG_FLAGS = -I. -I./include/uapi -I../../../include/uapi \
+CLANG_FLAGS = -I. -I./include/uapi -I../../../include/uapi -I../../../include \
 	      $(CLANG_SYS_INCLUDES) \
 	      -Wno-compare-distinct-pointer-types
 
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 6e80b66d7fb1..4517bca4f9ed 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -216,6 +216,8 @@ static void *(*bpf_sk_storage_get)(void *map, struct bpf_sock *sk,
 	(void *) BPF_FUNC_sk_storage_get;
 static int (*bpf_sk_storage_delete)(void *map, struct bpf_sock *sk) =
 	(void *)BPF_FUNC_sk_storage_delete;
+static int (*bpf_send_signal)(unsigned pid, unsigned sig) =
+	(void *)BPF_FUNC_send_signal;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/tools/testing/selftests/bpf/progs/test_send_signal_kern.c b/tools/testing/selftests/bpf/progs/test_send_signal_kern.c
new file mode 100644
index 000000000000..b81dbbf60b3d
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_send_signal_kern.c
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+#include <linux/bpf.h>
+#include <linux/bpf_perf_event.h>
+#include <linux/version.h>
+#include "bpf_helpers.h"
+
+struct bpf_map_def SEC("maps") info_map = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(__u64),
+	.max_entries = 1,
+};
+
+BPF_ANNOTATE_KV_PAIR(info_map, __u32, __u64);
+
+struct bpf_map_def SEC("maps") status_map = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(__u64),
+	.max_entries = 1,
+};
+
+BPF_ANNOTATE_KV_PAIR(status_map, __u32, __u64);
+
+SEC("send_signal_demo")
+int bpf_send_signal_test(void *ctx)
+{
+	__u64 *info_val, *status_val;
+	__u32 key = 0, pid, sig;
+	int ret;
+
+	status_val = bpf_map_lookup_elem(&status_map, &key);
+	if (!status_val || *status_val != 0)
+		return 0;
+
+	info_val = bpf_map_lookup_elem(&info_map, &key);
+	if (!info_val || *info_val == 0)
+		return 0;
+
+	sig = *info_val >> 32;
+	pid = *info_val & 0xffffFFFF;
+
+	ret = bpf_send_signal(pid, sig);
+	if (ret == 0)
+		*status_val = 1;
+
+	return 0;
+}
+char __license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_send_signal_user.c b/tools/testing/selftests/bpf/test_send_signal_user.c
new file mode 100644
index 000000000000..38898c575bd2
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_send_signal_user.c
@@ -0,0 +1,186 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <signal.h>
+#include <syscall.h>
+#include <sys/ioctl.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/wait.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+#include <linux/perf_event.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include "bpf_rlimit.h"
+
+static volatile int signal_received = 0;
+
+static void sigusr1_handler(int signum)
+{
+	signal_received++;
+}
+
+static void test_common(struct perf_event_attr *attr, int prog_type,
+			const char *test_name, int trace_pid, int cpu)
+{
+	int pmu_fd, prog_fd, info_map_fd, status_map_fd;
+	const char *file = "./test_send_signal_kern.o";
+	struct bpf_object *obj = NULL;
+	int pipe_c2p[2], pipe_p2c[2];
+	char buf[256];
+	int err = 0;
+	u32 key = 0;
+	pid_t pid;
+	u64 val;
+
+	if (pipe(pipe_c2p) < 0) {
+		fprintf(stderr, "Creating pipe c2p error\n");
+		return;
+	}
+	if (pipe(pipe_p2c) < 0) {
+		fprintf(stderr, "Creating pipe p2c error\n");
+		return;
+	}
+
+	pid = fork();
+	if (pid == 0) {
+		/* install signal handler and notify parent */
+		signal(SIGUSR1, sigusr1_handler);
+
+		close(pipe_c2p[0]); /* close read */
+		close(pipe_p2c[1]); /* close write */
+
+		/* notify parent signal handler is installed */
+		write(pipe_c2p[1], buf, 1);
+
+		/* make sure the parent has enabled the bpf program to send_signal */
+		read(pipe_p2c[0], buf, 1);
+
+		/* wait a little for signal handler */
+		sleep(1);
+
+		if (signal_received)
+			write(pipe_c2p[1], "2", 1);
+		else
+			write(pipe_c2p[1], "0", 1);
+
+		/* wait for parent notification and exit */
+		read(pipe_p2c[0], buf, 1);
+		exit(0);
+	}
+
+	close(pipe_c2p[1]); /* close write */
+	close(pipe_p2c[0]); /* close read */
+
+	err = bpf_prog_load(file, prog_type, &obj, &prog_fd);
+	if (err) {
+		printf("test_send_signal:bpf_prog_load errno %d\n", errno);
+		goto close_prog;
+	}
+
+	pmu_fd = syscall(__NR_perf_event_open, attr, trace_pid, cpu,
+			 -1 /* group id */, 0 /* flags */);
+	if (pmu_fd < 0)
+		goto close_prog;
+
+	err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
+	if (err)
+		goto disable_pmu;
+
+	err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
+	if (err)
+		goto disable_pmu;
+
+	info_map_fd = bpf_object__find_map_fd_by_name(obj, "info_map");
+	if (info_map_fd < 0)
+		goto disable_pmu;
+
+	status_map_fd = bpf_object__find_map_fd_by_name(obj, "status_map");
+	if (status_map_fd < 0)
+		goto disable_pmu;
+
+	/* wait until child signal handler installed */
+	read(pipe_c2p[0], buf, 1);
+
+	/* trigger the bpf send_signal */
+	key = 0;
+	val = (((u64)(SIGUSR1)) << 32) | pid;
+	bpf_map_update_elem(info_map_fd, &key, &val, 0);
+
+	/* notify child that bpf program can send_signal now */
+	write(pipe_p2c[1], buf, 1);
+
+	/* wait for result */
+	read(pipe_c2p[0], buf, 1);
+
+	if (buf[0] == '2')
+		printf("test_send_signal (%s): OK\n", test_name);
+	else
+		printf("test_send_signal (%s): FAIL\n", test_name);
+
+	/* notify child safe to exit */
+	write(pipe_p2c[1], buf, 1);
+
+	goto close_prog_noerr;
+
+close_prog:
+disable_pmu:
+	printf("test_send_signal (%s): FAIL\n", test_name);
+close_prog_noerr:
+	bpf_object__close(obj);
+
+	wait(NULL);
+}
+
+static void test_tracepoint(void)
+{
+	struct perf_event_attr attr = {
+		.type = PERF_TYPE_TRACEPOINT,
+		.sample_type = PERF_SAMPLE_RAW | PERF_SAMPLE_CALLCHAIN,
+		.sample_period = 1,
+		.wakeup_events = 1,
+	};
+	int bytes, efd;
+	char buf[256];
+
+	snprintf(buf, sizeof(buf),
+		 "/sys/kernel/debug/tracing/events/syscalls/sys_enter_nanosleep/id");
+	efd = open(buf, O_RDONLY, 0);
+	if (efd < 0) {
+		printf("test_send_signal: tracepoint id open errno %d\n", errno);
+		return;
+	}
+
+	bytes = read(efd, buf, sizeof(buf));
+	close(efd);
+	if (bytes <= 0 || bytes >= sizeof(buf))
+		return;
+
+	attr.config = strtol(buf, NULL, 0);
+
+	test_common(&attr, BPF_PROG_TYPE_TRACEPOINT, "tracepoint", -1, 0);
+}
+
+static void test_nmi_perf_event(void)
+{
+	struct perf_event_attr attr = {
+		.sample_freq = 50,
+		.freq = 1,
+		.type = PERF_TYPE_HARDWARE,
+		.config = PERF_COUNT_HW_CPU_CYCLES,
+	};
+
+	test_common(&attr, BPF_PROG_TYPE_PERF_EVENT, "perf_event", -1, 0);
+}
+
+int main(void)
+{
+	test_tracepoint();
+	test_nmi_perf_event();
+	return 0;
+}
-- 
2.17.1


* Re: [RFC PATCH bpf-next v2 1/3] bpf: implement bpf_send_signal() helper
  2019-05-03  0:08 ` [RFC PATCH bpf-next v2 1/3] " Yonghong Song
@ 2019-05-05  7:29   ` Alexei Starovoitov
  2019-05-05 17:27     ` Yonghong Song
  0 siblings, 1 reply; 6+ messages in thread
From: Alexei Starovoitov @ 2019-05-05  7:29 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Network Development, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Peter Zijlstra

On Thu, May 2, 2019 at 5:08 PM Yonghong Song <yhs@fb.com> wrote:
>
> This patch tries to solve the following specific use case.
>
> Currently, bpf program can already collect stack traces
> when certain events happens (e.g., cache miss counter or
> cpu clock counter overflows). These stack traces can be
> used for performance analysis. For jitted programs, e.g.,
> hhvm (jited php), it is very hard to get the true stack
> trace in the bpf program due to jit complexity.
>
> To resolve this issue, hhvm implements a signal handler,
> e.g. for SIGALARM, and a set of program locations which
> it can dump stack traces. When it receives a signal, it will
> dump the stack in next such program location.
>
> The following is the current way to handle this use case:
>   . profiler installs a bpf program and polls on a map.
>     When certain event happens, bpf program writes to a map.
>   . Once receiving the information from the map, the profiler
>     sends a signal to hhvm.
> This method could have large delays and causing profiling
> results skewed.
>
> This patch implements bpf_send_signal() helper to send a signal to
> hhvm in real time, resulting in intended stack traces.
>
> Reported-by: kbuild test robot <lkp@intel.com>

A v2 addressing a buildbot issue doesn't need to have such a tag.

> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  include/uapi/linux/bpf.h | 15 ++++++-
>  kernel/trace/bpf_trace.c | 85 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 99 insertions(+), 1 deletion(-)
>
> v1 -> v2:
>  fixed a compilation warning (missing return value in case)
>  reported by kbuild test robot.
>  added Reported-by in the above to notify the bot.
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 72336bac7573..e3e824848335 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2667,6 +2667,18 @@ union bpf_attr {
>   *             0 on success.
>   *
>   *             **-ENOENT** if the bpf-local-storage cannot be found.
> + *
> + * int bpf_send_signal(u32 pid, u32 sig)
> + *     Description
> + *             Send signal *sig* to *pid*.
> + *     Return
> + *             0 on success or successfully queued.
> + *
> + *             **-ENOENT** if *pid* cannot be found.
> + *
> + *             **-EBUSY** if work queue under nmi is full.
> + *
> + *             Other negative values for actual signal sending errors.
>   */
>  #define __BPF_FUNC_MAPPER(FN)          \
>         FN(unspec),                     \
> @@ -2777,7 +2789,8 @@ union bpf_attr {
>         FN(strtol),                     \
>         FN(strtoul),                    \
>         FN(sk_storage_get),             \
> -       FN(sk_storage_delete),
> +       FN(sk_storage_delete),          \
> +       FN(send_signal),
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 8607aba1d882..49a804d59bfb 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -559,6 +559,76 @@ static const struct bpf_func_proto bpf_probe_read_str_proto = {
>         .arg3_type      = ARG_ANYTHING,
>  };
>
> +struct send_signal_irq_work {
> +       struct irq_work irq_work;
> +       u32 pid;
> +       u32 sig;
> +};
> +
> +static DEFINE_PER_CPU(struct send_signal_irq_work, send_signal_work);
> +
> +static int do_bpf_send_signal_pid(u32 pid, u32 sig)
> +{
> +       struct task_struct *task;
> +       struct pid *pidp;
> +
> +       pidp = find_vpid(pid);

It takes the pidns from current, which should be valid; this makes
the bpf prog depend on current, but from nmi there are no guarantees.
I think find_pid_ns() would be cleaner, but then the question
would be how to pass the pidns...
Another issue is the instability of the pid as an integer...
The upcoming pidfd could be the answer.
At this point I think it's cleaner to make such an api send a signal
to an existing process only under the same conditions as in
bpf_probe_write_user.
Would that work for your use case?
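
(For reference, find_vpid() resolves the number in the pid namespace
of current; roughly, from kernel/pid.c:

  struct pid *find_vpid(int nr)
  {
      return find_pid_ns(nr, task_active_pid_ns(current));
  }

so an explicit-namespace variant would need a struct pid_namespace
argument from somewhere, e.g. find_pid_ns(pid, ns).)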

* Re: [RFC PATCH bpf-next v2 1/3] bpf: implement bpf_send_signal() helper
  2019-05-05  7:29   ` Alexei Starovoitov
@ 2019-05-05 17:27     ` Yonghong Song
  0 siblings, 0 replies; 6+ messages in thread
From: Yonghong Song @ 2019-05-05 17:27 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Network Development, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Peter Zijlstra



On 5/5/19 12:29 AM, Alexei Starovoitov wrote:
> On Thu, May 2, 2019 at 5:08 PM Yonghong Song <yhs@fb.com> wrote:
>>
>> This patch tries to solve the following specific use case.
>>
>> Currently, bpf program can already collect stack traces
>> when certain events happens (e.g., cache miss counter or
>> cpu clock counter overflows). These stack traces can be
>> used for performance analysis. For jitted programs, e.g.,
>> hhvm (jited php), it is very hard to get the true stack
>> trace in the bpf program due to jit complexity.
>>
>> To resolve this issue, hhvm implements a signal handler,
>> e.g. for SIGALARM, and a set of program locations which
>> it can dump stack traces. When it receives a signal, it will
>> dump the stack in next such program location.
>>
>> The following is the current way to handle this use case:
>>    . profiler installs a bpf program and polls on a map.
>>      When certain event happens, bpf program writes to a map.
>>    . Once receiving the information from the map, the profiler
>>      sends a signal to hhvm.
>> This method could have large delays and causing profiling
>> results skewed.
>>
>> This patch implements bpf_send_signal() helper to send a signal to
>> hhvm in real time, resulting in intended stack traces.
>>
>> Reported-by: kbuild test robot <lkp@intel.com>
> 
> v2 addressing buildbot issue doesn't need to have such tag.

I will drop this in the next revision. The intention, as suggested by
the buildbot email, was to give credit to the buildbot and let it know
the bug is taken care of.

> 
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   include/uapi/linux/bpf.h | 15 ++++++-
>>   kernel/trace/bpf_trace.c | 85 ++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 99 insertions(+), 1 deletion(-)
>>
>> v1 -> v2:
>>   fixed a compilation warning (missing return value in case)
>>   reported by kbuild test robot.
>>   added Reported-by in the above to notify the bot.
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 72336bac7573..e3e824848335 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -2667,6 +2667,18 @@ union bpf_attr {
>>    *             0 on success.
>>    *
>>    *             **-ENOENT** if the bpf-local-storage cannot be found.
>> + *
>> + * int bpf_send_signal(u32 pid, u32 sig)
>> + *     Description
>> + *             Send signal *sig* to *pid*.
>> + *     Return
>> + *             0 on success or successfully queued.
>> + *
>> + *             **-ENOENT** if *pid* cannot be found.
>> + *
>> + *             **-EBUSY** if work queue under nmi is full.
>> + *
>> + *             Other negative values for actual signal sending errors.
>>    */
>>   #define __BPF_FUNC_MAPPER(FN)          \
>>          FN(unspec),                     \
>> @@ -2777,7 +2789,8 @@ union bpf_attr {
>>          FN(strtol),                     \
>>          FN(strtoul),                    \
>>          FN(sk_storage_get),             \
>> -       FN(sk_storage_delete),
>> +       FN(sk_storage_delete),          \
>> +       FN(send_signal),
>>
>>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>>    * function eBPF program intends to call
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index 8607aba1d882..49a804d59bfb 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -559,6 +559,76 @@ static const struct bpf_func_proto bpf_probe_read_str_proto = {
>>          .arg3_type      = ARG_ANYTHING,
>>   };
>>
>> +struct send_signal_irq_work {
>> +       struct irq_work irq_work;
>> +       u32 pid;
>> +       u32 sig;
>> +};
>> +
>> +static DEFINE_PER_CPU(struct send_signal_irq_work, send_signal_work);
>> +
>> +static int do_bpf_send_signal_pid(u32 pid, u32 sig)
>> +{
>> +       struct task_struct *task;
>> +       struct pid *pidp;
>> +
>> +       pidp = find_vpid(pid);
> 
> it takes pidns from current which should be valid
> which makes bpf prog depend on current, but from nmi
> there are no guarantees.

This is right. For nmi, I have found that some fields of "current" may
not be valid.

> I think find_pid_ns() would be cleaner, but then the question
> would be how to pass pidns...

The "current" is mostly used to find the pid namespace via
task_active_pid_ns(current). Yes, we can use find_pid_ns(), but we
would need to get the pid_namespace in advance and then the pid, maybe
by introducing a new map or something, or maybe by directly using the
pidfd-based approach below to hold the reference count in a map.

> Another issue is instability of pid as an integer...
> upcoming pidfd could be the answer.

Will take a look at the details. Thanks for referencing.

> At this point I think it's cleaner to make such api to send signal
> to existing process only under the same conditions as in bpf_probe_write_user.
> Would that work for your use case?

Let me double check this.
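
For reference, a rough sketch of what "the same conditions as
bpf_probe_write_user" could look like if applied here (this is an
assumption about the direction, not code from this patch set; the
helper name is hypothetical, and the nmi case would still need the
irq_work deferral from this patch):

  /* sketch only: restrict signalling to the current task */
  BPF_CALL_1(bpf_send_signal_current, u32, sig)
  {
      /* mirror bpf_probe_write_user's guards on current */
      if (unlikely(current->flags & (PF_KTHREAD | PF_EXITING)))
          return -EPERM;

      return group_send_sig_info(sig, SEND_SIG_PRIV, current,
                                 PIDTYPE_TGID);
  }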
