* [PATCH bpf-next v5 0/3] bpf: implement bpf_send_signal() helper
@ 2019-05-23 21:47 Yonghong Song
  2019-05-23 21:47 ` [PATCH bpf-next v5 1/3] " Yonghong Song
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Yonghong Song @ 2019-05-23 21:47 UTC
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Peter Zijlstra,
	Yonghong Song

This patch set addresses the following specific use case.

Currently, a bpf program can already collect stack traces
through the kernel function get_perf_callchain()
when certain events happen (e.g., a cache miss counter or
cpu clock counter overflows). But such stack traces are
not sufficient for jitted programs, e.g., hhvm (jitted php).
To get the real stack trace, the jit engine's internal data
structures need to be traversed in order to recover the real
user functions.

The bpf program itself may not be the best place to traverse
the jit engine, as the traversal logic could be complex and
the jit internals are not a stable interface either.

Instead, hhvm implements a signal handler, e.g. for SIGALRM,
and a set of program locations at which it can dump stack
traces. When it receives a signal, it will dump the stack at
the next such program location.

This patch set implements a bpf_send_signal() helper to send
a signal to hhvm in real time, resulting in the intended
stack traces.
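
For illustration, the receiving side can be as small as the sketch
below (hypothetical names; the real hhvm handler consults its jit
metadata rather than just setting a flag):

#include <signal.h>

static volatile sig_atomic_t dump_requested;

static void on_sigalrm(int sig)
{
	/* async-signal-safe: only record that a dump was asked for */
	dump_requested = 1;
}

/* called by the runtime at each safe program location */
static void maybe_dump_stack(void)
{
	if (dump_requested) {
		dump_requested = 0;
		/* walk the jit engine's data structures and emit
		 * the real user-level stack here
		 */
	}
}

int main(void)
{
	signal(SIGALRM, on_sigalrm);
	/* ... run jitted code, hitting maybe_dump_stack() at the
	 * designated program locations ...
	 */
	return 0;
}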

Patch #1 implements the bpf_send_signal() helper in the kernel.
Patch #2 syncs the uapi header bpf.h to the tools directory.
Patch #3 adds a selftest which covers tracepoint
and perf_event bpf programs.

Changelogs:
  v4 => v5:
    . pass the "current" task struct to irq_work as well
      since the current task struct may change between
      the nmi and the subsequent irq_work_interrupt.
      Discovered by Daniel.
  v3 => v4:
    . fix one typo and declare "const char *id_path = ..."
      to avoid using the long string directly in the
      function body in Patch #3.
  v2 => v3:
    . change the standalone test to be part of prog_tests.
  RFC v1 => v2:
    . the previous version allowed sending a signal to an
      arbitrary pid. This version only sends the signal to the
      current task, to avoid unstable pids and potential races
      between sending the signal and task state changes for
      that pid.

Yonghong Song (3):
  bpf: implement bpf_send_signal() helper
  tools/bpf: sync bpf uapi header bpf.h to tools directory
  tools/bpf: add selftest in test_progs for bpf_send_signal() helper

 include/uapi/linux/bpf.h                      |  17 +-
 kernel/trace/bpf_trace.c                      |  72 +++++++
 tools/include/uapi/linux/bpf.h                |  17 +-
 tools/testing/selftests/bpf/bpf_helpers.h     |   1 +
 .../selftests/bpf/prog_tests/send_signal.c    | 198 ++++++++++++++++++
 .../bpf/progs/test_send_signal_kern.c         |  51 +++++
 6 files changed, 354 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/send_signal.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_send_signal_kern.c

-- 
2.17.1



* [PATCH bpf-next v5 1/3] bpf: implement bpf_send_signal() helper
  2019-05-23 21:47 [PATCH bpf-next v5 0/3] bpf: implement bpf_send_signal() helper Yonghong Song
@ 2019-05-23 21:47 ` Yonghong Song
  2019-05-24 21:39   ` Daniel Borkmann
  2019-05-23 21:47 ` [PATCH bpf-next v5 2/3] tools/bpf: sync bpf uapi header bpf.h to tools directory Yonghong Song
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Yonghong Song @ 2019-05-23 21:47 UTC
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Peter Zijlstra,
	Yonghong Song

This patch addresses the following specific use case.

Currently, a bpf program can already collect stack traces
through the kernel function get_perf_callchain()
when certain events happen (e.g., a cache miss counter or
cpu clock counter overflows). But such stack traces are
not sufficient for jitted programs, e.g., hhvm (jitted php).
To get the real stack trace, the jit engine's internal data
structures need to be traversed in order to recover the real
user functions.

The bpf program itself may not be the best place to traverse
the jit engine, as the traversal logic could be complex and
the jit internals are not a stable interface either.

Instead, hhvm implements a signal handler, e.g. for SIGALRM,
and a set of program locations at which it can dump stack
traces. When it receives a signal, it will dump the stack at
the next such program location.

Such a mechanism can be implemented in the following way:
  . a perf ring buffer is created between the bpf program
    and the tracing app.
  . once a particular event happens, the bpf program writes
    to the ring buffer and the tracing app gets notified.
  . the tracing app sends a SIGALRM signal to hhvm.

But this method could introduce large delays and cause skewed
profiling results.
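
For reference, the userspace end of the ring-buffer approach above
is roughly the following sketch (a libbpf-style perf buffer sample
callback; the sample layout is made up for illustration):

#include <signal.h>
#include <sys/types.h>

/* runs only after the kernel has woken up the poll()ing tracing
 * app, which is where the extra delay comes from
 */
static void on_sample(void *ctx, int cpu, void *data, unsigned int size)
{
	pid_t target = *(pid_t *)data;	/* pid written by the bpf prog */

	kill(target, SIGALRM);		/* arrives well after the event */
}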

This patch implements a bpf_send_signal() helper to send
a signal to hhvm in real time, resulting in the intended
stack traces.
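
A minimal (hypothetical) tracing program using the helper, assuming
the bpf_helpers.h declaration added in patch #3:

SEC("perf_event")
int profile(void *ctx)
{
	/* deliver SIGALRM (14) to whichever task the perf event
	 * interrupted; under nmi this may return -EBUSY while the
	 * per-cpu irq_work from a previous event is still in flight
	 */
	bpf_send_signal(14 /* SIGALRM */);
	return 0;
}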

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/uapi/linux/bpf.h | 17 +++++++++-
 kernel/trace/bpf_trace.c | 72 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 63e0cf66f01a..68d4470523a0 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2672,6 +2672,20 @@ union bpf_attr {
  *		0 on success.
  *
  *		**-ENOENT** if the bpf-local-storage cannot be found.
+ *
+ * int bpf_send_signal(u32 sig)
+ *	Description
+ *		Send signal *sig* to the current task.
+ *	Return
+ *		0 on success or successfully queued.
+ *
+ *		**-EBUSY** if work queue under nmi is full.
+ *
+ *		**-EINVAL** if *sig* is invalid.
+ *
+ *		**-EPERM** if no permission to send the *sig*.
+ *
+ *		**-EAGAIN** if bpf program can try again.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2782,7 +2796,8 @@ union bpf_attr {
 	FN(strtol),			\
 	FN(strtoul),			\
 	FN(sk_storage_get),		\
-	FN(sk_storage_delete),
+	FN(sk_storage_delete),		\
+	FN(send_signal),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index f92d6ad5e080..70029eafc71f 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -567,6 +567,63 @@ static const struct bpf_func_proto bpf_probe_read_str_proto = {
 	.arg3_type	= ARG_ANYTHING,
 };
 
+struct send_signal_irq_work {
+	struct irq_work irq_work;
+	struct task_struct *task;
+	u32 sig;
+};
+
+static DEFINE_PER_CPU(struct send_signal_irq_work, send_signal_work);
+
+static void do_bpf_send_signal(struct irq_work *entry)
+{
+	struct send_signal_irq_work *work;
+
+	work = container_of(entry, struct send_signal_irq_work, irq_work);
+	group_send_sig_info(work->sig, SEND_SIG_PRIV, work->task, PIDTYPE_TGID);
+}
+
+BPF_CALL_1(bpf_send_signal, u32, sig)
+{
+	struct send_signal_irq_work *work = NULL;
+
+	/* Similar to bpf_probe_write_user, task needs to be
+	 * in a sound condition and kernel memory access be
+	 * permitted in order to send signal to the current
+	 * task.
+	 */
+	if (unlikely(current->flags & (PF_KTHREAD | PF_EXITING)))
+		return -EPERM;
+	if (unlikely(uaccess_kernel()))
+		return -EPERM;
+	if (unlikely(!nmi_uaccess_okay()))
+		return -EPERM;
+
+	if (in_nmi()) {
+		work = this_cpu_ptr(&send_signal_work);
+		if (work->irq_work.flags & IRQ_WORK_BUSY)
+			return -EBUSY;
+
+		/* Add the current task, which is the target of sending signal,
+		 * to the irq_work. The current task may change when queued
+		 * irq works get executed.
+		 */
+		work->task = current;
+		work->sig = sig;
+		irq_work_queue(&work->irq_work);
+		return 0;
+	}
+
+	return group_send_sig_info(sig, SEND_SIG_PRIV, current, PIDTYPE_TGID);
+}
+
+static const struct bpf_func_proto bpf_send_signal_proto = {
+	.func		= bpf_send_signal,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_ANYTHING,
+};
+
 static const struct bpf_func_proto *
 tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -617,6 +674,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_get_current_cgroup_id:
 		return &bpf_get_current_cgroup_id_proto;
 #endif
+	case BPF_FUNC_send_signal:
+		return &bpf_send_signal_proto;
 	default:
 		return NULL;
 	}
@@ -1343,5 +1402,18 @@ static int __init bpf_event_init(void)
 	return 0;
 }
 
+static int __init send_signal_irq_work_init(void)
+{
+	int cpu;
+	struct send_signal_irq_work *work;
+
+	for_each_possible_cpu(cpu) {
+		work = per_cpu_ptr(&send_signal_work, cpu);
+		init_irq_work(&work->irq_work, do_bpf_send_signal);
+	}
+	return 0;
+}
+
 fs_initcall(bpf_event_init);
+subsys_initcall(send_signal_irq_work_init);
 #endif /* CONFIG_MODULES */
-- 
2.17.1



* [PATCH bpf-next v5 2/3] tools/bpf: sync bpf uapi header bpf.h to tools directory
  2019-05-23 21:47 [PATCH bpf-next v5 0/3] bpf: implement bpf_send_signal() helper Yonghong Song
  2019-05-23 21:47 ` [PATCH bpf-next v5 1/3] " Yonghong Song
@ 2019-05-23 21:47 ` Yonghong Song
  2019-05-23 21:47 ` [PATCH bpf-next v5 3/3] tools/bpf: add selftest in test_progs for bpf_send_signal() helper Yonghong Song
  2019-05-24 21:32 ` [PATCH bpf-next v5 0/3] bpf: implement " Daniel Borkmann
  3 siblings, 0 replies; 12+ messages in thread
From: Yonghong Song @ 2019-05-23 21:47 UTC
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Peter Zijlstra,
	Yonghong Song

The bpf uapi header include/uapi/linux/bpf.h is sync'ed
to tools/include/uapi/linux/bpf.h.

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/include/uapi/linux/bpf.h | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 63e0cf66f01a..68d4470523a0 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2672,6 +2672,20 @@ union bpf_attr {
  *		0 on success.
  *
  *		**-ENOENT** if the bpf-local-storage cannot be found.
+ *
+ * int bpf_send_signal(u32 sig)
+ *	Description
+ *		Send signal *sig* to the current task.
+ *	Return
+ *		0 on success or successfully queued.
+ *
+ *		**-EBUSY** if work queue under nmi is full.
+ *
+ *		**-EINVAL** if *sig* is invalid.
+ *
+ *		**-EPERM** if no permission to send the *sig*.
+ *
+ *		**-EAGAIN** if bpf program can try again.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2782,7 +2796,8 @@ union bpf_attr {
 	FN(strtol),			\
 	FN(strtoul),			\
 	FN(sk_storage_get),		\
-	FN(sk_storage_delete),
+	FN(sk_storage_delete),		\
+	FN(send_signal),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
-- 
2.17.1



* [PATCH bpf-next v5 3/3] tools/bpf: add selftest in test_progs for bpf_send_signal() helper
  2019-05-23 21:47 [PATCH bpf-next v5 0/3] bpf: implement bpf_send_signal() helper Yonghong Song
  2019-05-23 21:47 ` [PATCH bpf-next v5 1/3] " Yonghong Song
  2019-05-23 21:47 ` [PATCH bpf-next v5 2/3] tools/bpf: sync bpf uapi header bpf.h to tools directory Yonghong Song
@ 2019-05-23 21:47 ` Yonghong Song
  2019-05-24 21:32 ` [PATCH bpf-next v5 0/3] bpf: implement " Daniel Borkmann
  3 siblings, 0 replies; 12+ messages in thread
From: Yonghong Song @ 2019-05-23 21:47 UTC
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Peter Zijlstra,
	Yonghong Song

The test covers both nmi and tracepoint perf events:
  $ ./test_progs
  ...
  test_send_signal_tracepoint:PASS:tracepoint 0 nsec
  ...
  test_send_signal_common:PASS:tracepoint 0 nsec
  ...
  test_send_signal_common:PASS:perf_event 0 nsec
  ...
  test_send_signal:OK

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/testing/selftests/bpf/bpf_helpers.h     |   1 +
 .../selftests/bpf/prog_tests/send_signal.c    | 198 ++++++++++++++++++
 .../bpf/progs/test_send_signal_kern.c         |  51 +++++
 3 files changed, 250 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/send_signal.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_send_signal_kern.c

diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 5f6f9e7aba2a..cb02521b8e58 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -216,6 +216,7 @@ static void *(*bpf_sk_storage_get)(void *map, struct bpf_sock *sk,
 	(void *) BPF_FUNC_sk_storage_get;
 static int (*bpf_sk_storage_delete)(void *map, struct bpf_sock *sk) =
 	(void *)BPF_FUNC_sk_storage_delete;
+static int (*bpf_send_signal)(unsigned sig) = (void *)BPF_FUNC_send_signal;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/tools/testing/selftests/bpf/prog_tests/send_signal.c b/tools/testing/selftests/bpf/prog_tests/send_signal.c
new file mode 100644
index 000000000000..67cea1686305
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/send_signal.c
@@ -0,0 +1,198 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+
+static volatile int sigusr1_received = 0;
+
+static void sigusr1_handler(int signum)
+{
+	sigusr1_received++;
+}
+
+static int test_send_signal_common(struct perf_event_attr *attr,
+				    int prog_type,
+				    const char *test_name)
+{
+	int err = -1, pmu_fd, prog_fd, info_map_fd, status_map_fd;
+	const char *file = "./test_send_signal_kern.o";
+	struct bpf_object *obj = NULL;
+	int pipe_c2p[2], pipe_p2c[2];
+	__u32 key = 0, duration = 0;
+	char buf[256];
+	pid_t pid;
+	__u64 val;
+
+	if (CHECK(pipe(pipe_c2p), test_name,
+		  "pipe pipe_c2p error: %s\n", strerror(errno)))
+		goto no_fork_done;
+
+	if (CHECK(pipe(pipe_p2c), test_name,
+		  "pipe pipe_p2c error: %s\n", strerror(errno))) {
+		close(pipe_c2p[0]);
+		close(pipe_c2p[1]);
+		goto no_fork_done;
+	}
+
+	pid = fork();
+	if (CHECK(pid < 0, test_name, "fork error: %s\n", strerror(errno))) {
+		close(pipe_c2p[0]);
+		close(pipe_c2p[1]);
+		close(pipe_p2c[0]);
+		close(pipe_p2c[1]);
+		goto no_fork_done;
+	}
+
+	if (pid == 0) {
+		/* install signal handler and notify parent */
+		signal(SIGUSR1, sigusr1_handler);
+
+		close(pipe_c2p[0]); /* close read */
+		close(pipe_p2c[1]); /* close write */
+
+		/* notify parent signal handler is installed */
+		write(pipe_c2p[1], buf, 1);
+
+		/* make sure parent enabled bpf program to send_signal */
+		read(pipe_p2c[0], buf, 1);
+
+		/* wait a little for signal handler */
+		sleep(1);
+
+		if (sigusr1_received)
+			write(pipe_c2p[1], "2", 1);
+		else
+			write(pipe_c2p[1], "0", 1);
+
+		/* wait for parent notification and exit */
+		read(pipe_p2c[0], buf, 1);
+
+		close(pipe_c2p[1]);
+		close(pipe_p2c[0]);
+		exit(0);
+	}
+
+	close(pipe_c2p[1]); /* close write */
+	close(pipe_p2c[0]); /* close read */
+
+	err = bpf_prog_load(file, prog_type, &obj, &prog_fd);
+	if (CHECK(err < 0, test_name, "bpf_prog_load error: %s\n",
+		  strerror(errno)))
+		goto prog_load_failure;
+
+	pmu_fd = syscall(__NR_perf_event_open, attr, pid, -1,
+			 -1 /* group id */, 0 /* flags */);
+	if (CHECK(pmu_fd < 0, test_name, "perf_event_open error: %s\n",
+		  strerror(errno))) {
+		err = -1;
+		goto close_prog;
+	}
+
+	err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
+	if (CHECK(err < 0, test_name, "ioctl perf_event_ioc_enable error: %s\n",
+		  strerror(errno)))
+		goto disable_pmu;
+
+	err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
+	if (CHECK(err < 0, test_name, "ioctl perf_event_ioc_set_bpf error: %s\n",
+		  strerror(errno)))
+		goto disable_pmu;
+
+	err = -1;
+	info_map_fd = bpf_object__find_map_fd_by_name(obj, "info_map");
+	if (CHECK(info_map_fd < 0, test_name, "find map %s error\n", "info_map"))
+		goto disable_pmu;
+
+	status_map_fd = bpf_object__find_map_fd_by_name(obj, "status_map");
+	if (CHECK(status_map_fd < 0, test_name, "find map %s error\n", "status_map"))
+		goto disable_pmu;
+
+	/* wait until child signal handler installed */
+	read(pipe_c2p[0], buf, 1);
+
+	/* trigger the bpf send_signal */
+	key = 0;
+	val = (((__u64)(SIGUSR1)) << 32) | pid;
+	bpf_map_update_elem(info_map_fd, &key, &val, 0);
+
+	/* notify child that bpf program can send_signal now */
+	write(pipe_p2c[1], buf, 1);
+
+	/* wait for result */
+	err = read(pipe_c2p[0], buf, 1);
+	if (CHECK(err < 0, test_name, "reading pipe error: %s\n", strerror(errno)))
+		goto disable_pmu;
+	if (CHECK(err == 0, test_name, "reading pipe error: size 0\n")) {
+		err = -1;
+		goto disable_pmu;
+	}
+
+	err = CHECK(buf[0] != '2', test_name, "incorrect result\n");
+
+	/* notify child safe to exit */
+	write(pipe_p2c[1], buf, 1);
+
+disable_pmu:
+	close(pmu_fd);
+close_prog:
+	bpf_object__close(obj);
+prog_load_failure:
+	close(pipe_c2p[0]);
+	close(pipe_p2c[1]);
+	wait(NULL);
+no_fork_done:
+	return err;
+}
+
+static int test_send_signal_tracepoint(void)
+{
+	const char *id_path = "/sys/kernel/debug/tracing/events/syscalls/sys_enter_nanosleep/id";
+	struct perf_event_attr attr = {
+		.type = PERF_TYPE_TRACEPOINT,
+		.sample_type = PERF_SAMPLE_RAW | PERF_SAMPLE_CALLCHAIN,
+		.sample_period = 1,
+		.wakeup_events = 1,
+	};
+	__u32 duration = 0;
+	int bytes, efd;
+	char buf[256];
+
+	efd = open(id_path, O_RDONLY, 0);
+	if (CHECK(efd < 0, "tracepoint",
+		  "open syscalls/sys_enter_nanosleep/id failure: %s\n",
+		  strerror(errno)))
+		return -1;
+
+	bytes = read(efd, buf, sizeof(buf));
+	close(efd);
+	if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "tracepoint",
+		  "read syscalls/sys_enter_nanosleep/id failure: %s\n",
+		  strerror(errno)))
+		return -1;
+
+	attr.config = strtol(buf, NULL, 0);
+
+	return test_send_signal_common(&attr, BPF_PROG_TYPE_TRACEPOINT, "tracepoint");
+}
+
+static int test_send_signal_nmi(void)
+{
+	struct perf_event_attr attr = {
+		.sample_freq = 50,
+		.freq = 1,
+		.type = PERF_TYPE_HARDWARE,
+		.config = PERF_COUNT_HW_CPU_CYCLES,
+	};
+
+	return test_send_signal_common(&attr, BPF_PROG_TYPE_PERF_EVENT, "perf_event");
+}
+
+void test_send_signal(void)
+{
+	int ret = 0;
+
+	ret |= test_send_signal_tracepoint();
+	ret |= test_send_signal_nmi();
+	if (!ret)
+		printf("test_send_signal:OK\n");
+	else
+		printf("test_send_signal:FAIL\n");
+}
diff --git a/tools/testing/selftests/bpf/progs/test_send_signal_kern.c b/tools/testing/selftests/bpf/progs/test_send_signal_kern.c
new file mode 100644
index 000000000000..45a1a1a2c345
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_send_signal_kern.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+#include <linux/bpf.h>
+#include <linux/version.h>
+#include "bpf_helpers.h"
+
+struct bpf_map_def SEC("maps") info_map = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(__u64),
+	.max_entries = 1,
+};
+
+BPF_ANNOTATE_KV_PAIR(info_map, __u32, __u64);
+
+struct bpf_map_def SEC("maps") status_map = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(__u64),
+	.max_entries = 1,
+};
+
+BPF_ANNOTATE_KV_PAIR(status_map, __u32, __u64);
+
+SEC("send_signal_demo")
+int bpf_send_signal_test(void *ctx)
+{
+	__u64 *info_val, *status_val;
+	__u32 key = 0, pid, sig;
+	int ret;
+
+	status_val = bpf_map_lookup_elem(&status_map, &key);
+	if (!status_val || *status_val != 0)
+		return 0;
+
+	info_val = bpf_map_lookup_elem(&info_map, &key);
+	if (!info_val || *info_val == 0)
+		return 0;
+
+	sig = *info_val >> 32;
+	pid = *info_val & 0xffffFFFF;
+
+	if ((bpf_get_current_pid_tgid() >> 32) == pid) {
+		ret = bpf_send_signal(sig);
+		if (ret == 0)
+			*status_val = 1;
+	}
+
+	return 0;
+}
+char __license[] SEC("license") = "GPL";
-- 
2.17.1



* Re: [PATCH bpf-next v5 0/3] bpf: implement bpf_send_signal() helper
  2019-05-23 21:47 [PATCH bpf-next v5 0/3] bpf: implement bpf_send_signal() helper Yonghong Song
                   ` (2 preceding siblings ...)
  2019-05-23 21:47 ` [PATCH bpf-next v5 3/3] tools/bpf: add selftest in test_progs for bpf_send_signal() helper Yonghong Song
@ 2019-05-24 21:32 ` Daniel Borkmann
  3 siblings, 0 replies; 12+ messages in thread
From: Daniel Borkmann @ 2019-05-24 21:32 UTC
  To: Yonghong Song, bpf, netdev
  Cc: Alexei Starovoitov, kernel-team, Peter Zijlstra

On 05/23/2019 11:47 PM, Yonghong Song wrote:
> This patch set addresses the following specific use case.
>
> Currently, a bpf program can already collect stack traces
> through the kernel function get_perf_callchain()
> when certain events happen (e.g., a cache miss counter or
> cpu clock counter overflows). But such stack traces are
> not sufficient for jitted programs, e.g., hhvm (jitted php).
> To get the real stack trace, the jit engine's internal data
> structures need to be traversed in order to recover the real
> user functions.
>
> The bpf program itself may not be the best place to traverse
> the jit engine, as the traversal logic could be complex and
> the jit internals are not a stable interface either.
>
> Instead, hhvm implements a signal handler, e.g. for SIGALRM,
> and a set of program locations at which it can dump stack
> traces. When it receives a signal, it will dump the stack at
> the next such program location.
>
> This patch set implements a bpf_send_signal() helper to send
> a signal to hhvm in real time, resulting in the intended
> stack traces.
>
> Patch #1 implements the bpf_send_signal() helper in the kernel.
> Patch #2 syncs the uapi header bpf.h to the tools directory.
> Patch #3 adds a selftest which covers tracepoint
> and perf_event bpf programs.
>
> Changelogs:
>   v4 => v5:
>     . pass the "current" task struct to irq_work as well
>       since the current task struct may change between
>       the nmi and the subsequent irq_work_interrupt.
>       Discovered by Daniel.
>   v3 => v4:
>     . fix one typo and declare "const char *id_path = ..."
>       to avoid using the long string directly in the
>       function body in Patch #3.
>   v2 => v3:
>     . change the standalone test to be part of prog_tests.
>   RFC v1 => v2:
>     . the previous version allowed sending a signal to an
>       arbitrary pid. This version only sends the signal to the
>       current task, to avoid unstable pids and potential races
>       between sending the signal and task state changes for
>       that pid.
> 
> Yonghong Song (3):
>   bpf: implement bpf_send_signal() helper
>   tools/bpf: sync bpf uapi header bpf.h to tools directory
>   tools/bpf: add selftest in test_progs for bpf_send_signal() helper
> 
>  include/uapi/linux/bpf.h                      |  17 +-
>  kernel/trace/bpf_trace.c                      |  72 +++++++
>  tools/include/uapi/linux/bpf.h                |  17 +-
>  tools/testing/selftests/bpf/bpf_helpers.h     |   1 +
>  .../selftests/bpf/prog_tests/send_signal.c    | 198 ++++++++++++++++++
>  .../bpf/progs/test_send_signal_kern.c         |  51 +++++
>  6 files changed, 354 insertions(+), 2 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/send_signal.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_send_signal_kern.c
> 

Applied, thanks. One more remark in patch 1, will reply there.


* Re: [PATCH bpf-next v5 1/3] bpf: implement bpf_send_signal() helper
  2019-05-23 21:47 ` [PATCH bpf-next v5 1/3] " Yonghong Song
@ 2019-05-24 21:39   ` Daniel Borkmann
  2019-05-24 21:59     ` Daniel Borkmann
  2019-05-24 22:20     ` Yonghong Song
  0 siblings, 2 replies; 12+ messages in thread
From: Daniel Borkmann @ 2019-05-24 21:39 UTC
  To: Yonghong Song, bpf, netdev
  Cc: Alexei Starovoitov, kernel-team, Peter Zijlstra

On 05/23/2019 11:47 PM, Yonghong Song wrote:
> This patch addresses the following specific use case.
>
> Currently, a bpf program can already collect stack traces
> through the kernel function get_perf_callchain()
> when certain events happen (e.g., a cache miss counter or
> cpu clock counter overflows). But such stack traces are
> not sufficient for jitted programs, e.g., hhvm (jitted php).
> To get the real stack trace, the jit engine's internal data
> structures need to be traversed in order to recover the real
> user functions.
>
> The bpf program itself may not be the best place to traverse
> the jit engine, as the traversal logic could be complex and
> the jit internals are not a stable interface either.
>
> Instead, hhvm implements a signal handler, e.g. for SIGALRM,
> and a set of program locations at which it can dump stack
> traces. When it receives a signal, it will dump the stack at
> the next such program location.
>
> Such a mechanism can be implemented in the following way:
>   . a perf ring buffer is created between the bpf program
>     and the tracing app.
>   . once a particular event happens, the bpf program writes
>     to the ring buffer and the tracing app gets notified.
>   . the tracing app sends a SIGALRM signal to hhvm.
>
> But this method could introduce large delays and cause skewed
> profiling results.
>
> This patch implements a bpf_send_signal() helper to send
> a signal to hhvm in real time, resulting in the intended
> stack traces.
> 
> Acked-by: Andrii Nakryiko <andriin@fb.com>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  include/uapi/linux/bpf.h | 17 +++++++++-
>  kernel/trace/bpf_trace.c | 72 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 88 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 63e0cf66f01a..68d4470523a0 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2672,6 +2672,20 @@ union bpf_attr {
>   *		0 on success.
>   *
>   *		**-ENOENT** if the bpf-local-storage cannot be found.
> + *
> + * int bpf_send_signal(u32 sig)
> + *	Description
> + *		Send signal *sig* to the current task.
> + *	Return
> + *		0 on success or successfully queued.
> + *
> + *		**-EBUSY** if work queue under nmi is full.
> + *
> + *		**-EINVAL** if *sig* is invalid.
> + *
> + *		**-EPERM** if no permission to send the *sig*.
> + *
> + *		**-EAGAIN** if bpf program can try again.
>   */
>  #define __BPF_FUNC_MAPPER(FN)		\
>  	FN(unspec),			\
> @@ -2782,7 +2796,8 @@ union bpf_attr {
>  	FN(strtol),			\
>  	FN(strtoul),			\
>  	FN(sk_storage_get),		\
> -	FN(sk_storage_delete),
> +	FN(sk_storage_delete),		\
> +	FN(send_signal),
>  
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index f92d6ad5e080..70029eafc71f 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -567,6 +567,63 @@ static const struct bpf_func_proto bpf_probe_read_str_proto = {
>  	.arg3_type	= ARG_ANYTHING,
>  };
>  
> +struct send_signal_irq_work {
> +	struct irq_work irq_work;
> +	struct task_struct *task;
> +	u32 sig;
> +};
> +
> +static DEFINE_PER_CPU(struct send_signal_irq_work, send_signal_work);
> +
> +static void do_bpf_send_signal(struct irq_work *entry)
> +{
> +	struct send_signal_irq_work *work;
> +
> +	work = container_of(entry, struct send_signal_irq_work, irq_work);
> +	group_send_sig_info(work->sig, SEND_SIG_PRIV, work->task, PIDTYPE_TGID);
> +}
> +
> +BPF_CALL_1(bpf_send_signal, u32, sig)
> +{
> +	struct send_signal_irq_work *work = NULL;
> +
> +	/* Similar to bpf_probe_write_user, task needs to be
> +	 * in a sound condition and kernel memory access be
> +	 * permitted in order to send signal to the current
> +	 * task.
> +	 */
> +	if (unlikely(current->flags & (PF_KTHREAD | PF_EXITING)))
> +		return -EPERM;
> +	if (unlikely(uaccess_kernel()))
> +		return -EPERM;
> +	if (unlikely(!nmi_uaccess_okay()))
> +		return -EPERM;
> +
> +	if (in_nmi()) {
> +		work = this_cpu_ptr(&send_signal_work);
> +		if (work->irq_work.flags & IRQ_WORK_BUSY)

Given that here and stackmap are the only two users outside of
kernel/irq_work.c, it would probably be good to add a small helper
to include/linux/irq_work.h and use it for both.

Perhaps something like ...

static inline bool irq_work_busy(struct irq_work *work)
{
	return READ_ONCE(work->flags) & IRQ_WORK_BUSY;
}
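
The call sites would then shrink to e.g.:

	if (irq_work_busy(&work->irq_work))
		return -EBUSY;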

> +			return -EBUSY;
> +
> +		/* Add the current task, which is the target of sending signal,
> +		 * to the irq_work. The current task may change when queued
> +		 * irq works get executed.
> +		 */
> +		work->task = current;
> +		work->sig = sig;
> +		irq_work_queue(&work->irq_work);
> +		return 0;
> +	}
> +
> +	return group_send_sig_info(sig, SEND_SIG_PRIV, current, PIDTYPE_TGID);
> +}
> +
> +static const struct bpf_func_proto bpf_send_signal_proto = {
> +	.func		= bpf_send_signal,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_ANYTHING,
> +};
> +
>  static const struct bpf_func_proto *
>  tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  {
> @@ -617,6 +674,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  	case BPF_FUNC_get_current_cgroup_id:
>  		return &bpf_get_current_cgroup_id_proto;
>  #endif
> +	case BPF_FUNC_send_signal:
> +		return &bpf_send_signal_proto;
>  	default:
>  		return NULL;
>  	}
> @@ -1343,5 +1402,18 @@ static int __init bpf_event_init(void)
>  	return 0;
>  }
>  
> +static int __init send_signal_irq_work_init(void)
> +{
> +	int cpu;
> +	struct send_signal_irq_work *work;
> +
> +	for_each_possible_cpu(cpu) {
> +		work = per_cpu_ptr(&send_signal_work, cpu);
> +		init_irq_work(&work->irq_work, do_bpf_send_signal);
> +	}
> +	return 0;
> +}
> +
>  fs_initcall(bpf_event_init);
> +subsys_initcall(send_signal_irq_work_init);
>  #endif /* CONFIG_MODULES */
> 



* Re: [PATCH bpf-next v5 1/3] bpf: implement bpf_send_signal() helper
  2019-05-24 21:39   ` Daniel Borkmann
@ 2019-05-24 21:59     ` Daniel Borkmann
  2019-05-24 22:23       ` Yonghong Song
  2019-05-24 22:20     ` Yonghong Song
  1 sibling, 1 reply; 12+ messages in thread
From: Daniel Borkmann @ 2019-05-24 21:59 UTC
  To: Yonghong Song, bpf, netdev
  Cc: Alexei Starovoitov, kernel-team, Peter Zijlstra

On 05/24/2019 11:39 PM, Daniel Borkmann wrote:
> On 05/23/2019 11:47 PM, Yonghong Song wrote:
>> This patch addresses the following specific use case.
>>
>> Currently, a bpf program can already collect stack traces
>> through the kernel function get_perf_callchain()
>> when certain events happen (e.g., a cache miss counter or
>> cpu clock counter overflows). But such stack traces are
>> not sufficient for jitted programs, e.g., hhvm (jitted php).
>> To get the real stack trace, the jit engine's internal data
>> structures need to be traversed in order to recover the real
>> user functions.
>>
>> The bpf program itself may not be the best place to traverse
>> the jit engine, as the traversal logic could be complex and
>> the jit internals are not a stable interface either.
>>
>> Instead, hhvm implements a signal handler, e.g. for SIGALRM,
>> and a set of program locations at which it can dump stack
>> traces. When it receives a signal, it will dump the stack at
>> the next such program location.
>>
>> Such a mechanism can be implemented in the following way:
>>   . a perf ring buffer is created between the bpf program
>>     and the tracing app.
>>   . once a particular event happens, the bpf program writes
>>     to the ring buffer and the tracing app gets notified.
>>   . the tracing app sends a SIGALRM signal to hhvm.
>>
>> But this method could introduce large delays and cause skewed
>> profiling results.
>>
>> This patch implements a bpf_send_signal() helper to send
>> a signal to hhvm in real time, resulting in the intended
>> stack traces.
>>
>> Acked-by: Andrii Nakryiko <andriin@fb.com>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>  include/uapi/linux/bpf.h | 17 +++++++++-
>>  kernel/trace/bpf_trace.c | 72 ++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 88 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 63e0cf66f01a..68d4470523a0 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -2672,6 +2672,20 @@ union bpf_attr {
>>   *		0 on success.
>>   *
>>   *		**-ENOENT** if the bpf-local-storage cannot be found.
>> + *
>> + * int bpf_send_signal(u32 sig)
>> + *	Description
>> + *		Send signal *sig* to the current task.
>> + *	Return
>> + *		0 on success or successfully queued.
>> + *
>> + *		**-EBUSY** if work queue under nmi is full.
>> + *
>> + *		**-EINVAL** if *sig* is invalid.
>> + *
>> + *		**-EPERM** if no permission to send the *sig*.
>> + *
>> + *		**-EAGAIN** if bpf program can try again.
>>   */
>>  #define __BPF_FUNC_MAPPER(FN)		\
>>  	FN(unspec),			\
>> @@ -2782,7 +2796,8 @@ union bpf_attr {
>>  	FN(strtol),			\
>>  	FN(strtoul),			\
>>  	FN(sk_storage_get),		\
>> -	FN(sk_storage_delete),
>> +	FN(sk_storage_delete),		\
>> +	FN(send_signal),
>>  
>>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>>   * function eBPF program intends to call
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index f92d6ad5e080..70029eafc71f 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -567,6 +567,63 @@ static const struct bpf_func_proto bpf_probe_read_str_proto = {
>>  	.arg3_type	= ARG_ANYTHING,
>>  };
>>  
>> +struct send_signal_irq_work {
>> +	struct irq_work irq_work;
>> +	struct task_struct *task;
>> +	u32 sig;
>> +};
>> +
>> +static DEFINE_PER_CPU(struct send_signal_irq_work, send_signal_work);
>> +
>> +static void do_bpf_send_signal(struct irq_work *entry)
>> +{
>> +	struct send_signal_irq_work *work;
>> +
>> +	work = container_of(entry, struct send_signal_irq_work, irq_work);
>> +	group_send_sig_info(work->sig, SEND_SIG_PRIV, work->task, PIDTYPE_TGID);
>> +}
>> +
>> +BPF_CALL_1(bpf_send_signal, u32, sig)
>> +{
>> +	struct send_signal_irq_work *work = NULL;
>> +

Oh, and one more thing:

	if (!valid_signal(sig))
		return -EINVAL;

Otherwise when deferring the work, you don't have any such feedback.
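
I.e. a sketch of the placement, up front before the irq_work
deferral (valid_signal() comes from linux/signal.h), so the nmi
path also reports invalid signals synchronously:

	if (unlikely(!valid_signal(sig)))
		return -EINVAL;

	if (in_nmi()) {
		...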

>> +	/* Similar to bpf_probe_write_user, task needs to be
>> +	 * in a sound condition and kernel memory access be
>> +	 * permitted in order to send signal to the current
>> +	 * task.
>> +	 */
>> +	if (unlikely(current->flags & (PF_KTHREAD | PF_EXITING)))
>> +		return -EPERM;
>> +	if (unlikely(uaccess_kernel()))
>> +		return -EPERM;
>> +	if (unlikely(!nmi_uaccess_okay()))
>> +		return -EPERM;
>> +
>> +	if (in_nmi()) {
>> +		work = this_cpu_ptr(&send_signal_work);
>> +		if (work->irq_work.flags & IRQ_WORK_BUSY)
> 
> Given that here and stackmap are the only two users outside of
> kernel/irq_work.c, it would probably be good to add a small helper
> to include/linux/irq_work.h and use it for both.
> 
> Perhaps something like ...
> 
> static inline bool irq_work_busy(struct irq_work *work)
> {
> 	return READ_ONCE(work->flags) & IRQ_WORK_BUSY;
> }
> 
>> +			return -EBUSY;
>> +
>> +		/* Add the current task, which is the target of sending signal,
>> +		 * to the irq_work. The current task may change when queued
>> +		 * irq works get executed.
>> +		 */
>> +		work->task = current;
>> +		work->sig = sig;
>> +		irq_work_queue(&work->irq_work);
>> +		return 0;
>> +	}
>> +
>> +	return group_send_sig_info(sig, SEND_SIG_PRIV, current, PIDTYPE_TGID);
>> +}
>> +
>> +static const struct bpf_func_proto bpf_send_signal_proto = {
>> +	.func		= bpf_send_signal,
>> +	.gpl_only	= false,
>> +	.ret_type	= RET_INTEGER,
>> +	.arg1_type	= ARG_ANYTHING,
>> +};
>> +
>>  static const struct bpf_func_proto *
>>  tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>>  {
>> @@ -617,6 +674,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>>  	case BPF_FUNC_get_current_cgroup_id:
>>  		return &bpf_get_current_cgroup_id_proto;
>>  #endif
>> +	case BPF_FUNC_send_signal:
>> +		return &bpf_send_signal_proto;
>>  	default:
>>  		return NULL;
>>  	}
>> @@ -1343,5 +1402,18 @@ static int __init bpf_event_init(void)
>>  	return 0;
>>  }
>>  
>> +static int __init send_signal_irq_work_init(void)
>> +{
>> +	int cpu;
>> +	struct send_signal_irq_work *work;
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		work = per_cpu_ptr(&send_signal_work, cpu);
>> +		init_irq_work(&work->irq_work, do_bpf_send_signal);
>> +	}
>> +	return 0;
>> +}
>> +
>>  fs_initcall(bpf_event_init);
>> +subsys_initcall(send_signal_irq_work_init);
>>  #endif /* CONFIG_MODULES */
>>
> 



* Re: [PATCH bpf-next v5 1/3] bpf: implement bpf_send_signal() helper
  2019-05-24 21:39   ` Daniel Borkmann
  2019-05-24 21:59     ` Daniel Borkmann
@ 2019-05-24 22:20     ` Yonghong Song
  2019-05-24 22:37       ` Daniel Borkmann
  1 sibling, 1 reply; 12+ messages in thread
From: Yonghong Song @ 2019-05-24 22:20 UTC
  To: Daniel Borkmann, bpf, netdev
  Cc: Alexei Starovoitov, Kernel Team, Peter Zijlstra



On 5/24/19 2:39 PM, Daniel Borkmann wrote:
> On 05/23/2019 11:47 PM, Yonghong Song wrote:
>> This patch addresses the following specific use case.
>>
>> Currently, a bpf program can already collect stack traces
>> through the kernel function get_perf_callchain()
>> when certain events happen (e.g., a cache miss counter or
>> cpu clock counter overflows). But such stack traces are
>> not sufficient for jitted programs, e.g., hhvm (jitted php).
>> To get the real stack trace, the jit engine's internal data
>> structures need to be traversed in order to recover the real
>> user functions.
>>
>> The bpf program itself may not be the best place to traverse
>> the jit engine, as the traversal logic could be complex and
>> the jit internals are not a stable interface either.
>>
>> Instead, hhvm implements a signal handler, e.g. for SIGALRM,
>> and a set of program locations at which it can dump stack
>> traces. When it receives a signal, it will dump the stack at
>> the next such program location.
>>
>> Such a mechanism can be implemented in the following way:
>>   . a perf ring buffer is created between the bpf program
>>     and the tracing app.
>>   . once a particular event happens, the bpf program writes
>>     to the ring buffer and the tracing app gets notified.
>>   . the tracing app sends a SIGALRM signal to hhvm.
>>
>> But this method could introduce large delays and cause skewed
>> profiling results.
>>
>> This patch implements a bpf_send_signal() helper to send
>> a signal to hhvm in real time, resulting in the intended
>> stack traces.
>>
>> Acked-by: Andrii Nakryiko <andriin@fb.com>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   include/uapi/linux/bpf.h | 17 +++++++++-
>>   kernel/trace/bpf_trace.c | 72 ++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 88 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 63e0cf66f01a..68d4470523a0 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -2672,6 +2672,20 @@ union bpf_attr {
>>    *		0 on success.
>>    *
>>    *		**-ENOENT** if the bpf-local-storage cannot be found.
>> + *
>> + * int bpf_send_signal(u32 sig)
>> + *	Description
>> + *		Send signal *sig* to the current task.
>> + *	Return
>> + *		0 on success or successfully queued.
>> + *
>> + *		**-EBUSY** if work queue under nmi is full.
>> + *
>> + *		**-EINVAL** if *sig* is invalid.
>> + *
>> + *		**-EPERM** if no permission to send the *sig*.
>> + *
>> + *		**-EAGAIN** if bpf program can try again.
>>    */
>>   #define __BPF_FUNC_MAPPER(FN)		\
>>   	FN(unspec),			\
>> @@ -2782,7 +2796,8 @@ union bpf_attr {
>>   	FN(strtol),			\
>>   	FN(strtoul),			\
>>   	FN(sk_storage_get),		\
>> -	FN(sk_storage_delete),
>> +	FN(sk_storage_delete),		\
>> +	FN(send_signal),
>>   
>>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>>    * function eBPF program intends to call
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index f92d6ad5e080..70029eafc71f 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -567,6 +567,63 @@ static const struct bpf_func_proto bpf_probe_read_str_proto = {
>>   	.arg3_type	= ARG_ANYTHING,
>>   };
>>   
>> +struct send_signal_irq_work {
>> +	struct irq_work irq_work;
>> +	struct task_struct *task;
>> +	u32 sig;
>> +};
>> +
>> +static DEFINE_PER_CPU(struct send_signal_irq_work, send_signal_work);
>> +
>> +static void do_bpf_send_signal(struct irq_work *entry)
>> +{
>> +	struct send_signal_irq_work *work;
>> +
>> +	work = container_of(entry, struct send_signal_irq_work, irq_work);
>> +	group_send_sig_info(work->sig, SEND_SIG_PRIV, work->task, PIDTYPE_TGID);
>> +}
>> +
>> +BPF_CALL_1(bpf_send_signal, u32, sig)
>> +{
>> +	struct send_signal_irq_work *work = NULL;
>> +
>> +	/* Similar to bpf_probe_write_user, task needs to be
>> +	 * in a sound condition and kernel memory access be
>> +	 * permitted in order to send signal to the current
>> +	 * task.
>> +	 */
>> +	if (unlikely(current->flags & (PF_KTHREAD | PF_EXITING)))
>> +		return -EPERM;
>> +	if (unlikely(uaccess_kernel()))
>> +		return -EPERM;
>> +	if (unlikely(!nmi_uaccess_okay()))
>> +		return -EPERM;
>> +
>> +	if (in_nmi()) {
>> +		work = this_cpu_ptr(&send_signal_work);
>> +		if (work->irq_work.flags & IRQ_WORK_BUSY)
> 
> Given that here and stackmap are the only two users outside of
> kernel/irq_work.c, it would probably be good to add a small helper
> to include/linux/irq_work.h and use it for both.
> 
> Perhaps something like ...
> 
> static inline bool irq_work_busy(struct irq_work *work)
> {
> 	return READ_ONCE(work->flags) & IRQ_WORK_BUSY;
> }

I am not sure whether READ_ONCE is needed here or not.

The irq_work is a per-cpu data structure,
   static DEFINE_PER_CPU(struct send_signal_irq_work, send_signal_work);
so presumably there is no collision on the work->flags memory
reference.

Without READ_ONCE, do you still think a helper is needed?

> 
>> +			return -EBUSY;
>> +
>> +		/* Add the current task, which is the target of sending signal,
>> +		 * to the irq_work. The current task may change when queued
>> +		 * irq works get executed.
>> +		 */
>> +		work->task = current;
>> +		work->sig = sig;
>> +		irq_work_queue(&work->irq_work);
>> +		return 0;
>> +	}
>> +
>> +	return group_send_sig_info(sig, SEND_SIG_PRIV, current, PIDTYPE_TGID);
>> +}
>> +
>> +static const struct bpf_func_proto bpf_send_signal_proto = {
>> +	.func		= bpf_send_signal,
>> +	.gpl_only	= false,
>> +	.ret_type	= RET_INTEGER,
>> +	.arg1_type	= ARG_ANYTHING,
>> +};
>> +
>>   static const struct bpf_func_proto *
>>   tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>>   {
>> @@ -617,6 +674,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>>   	case BPF_FUNC_get_current_cgroup_id:
>>   		return &bpf_get_current_cgroup_id_proto;
>>   #endif
>> +	case BPF_FUNC_send_signal:
>> +		return &bpf_send_signal_proto;
>>   	default:
>>   		return NULL;
>>   	}
>> @@ -1343,5 +1402,18 @@ static int __init bpf_event_init(void)
>>   	return 0;
>>   }
>>   
>> +static int __init send_signal_irq_work_init(void)
>> +{
>> +	int cpu;
>> +	struct send_signal_irq_work *work;
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		work = per_cpu_ptr(&send_signal_work, cpu);
>> +		init_irq_work(&work->irq_work, do_bpf_send_signal);
>> +	}
>> +	return 0;
>> +}
>> +
>>   fs_initcall(bpf_event_init);
>> +subsys_initcall(send_signal_irq_work_init);
>>   #endif /* CONFIG_MODULES */
>>
> 


* Re: [PATCH bpf-next v5 1/3] bpf: implement bpf_send_signal() helper
  2019-05-24 21:59     ` Daniel Borkmann
@ 2019-05-24 22:23       ` Yonghong Song
  2019-05-24 22:38         ` Daniel Borkmann
  0 siblings, 1 reply; 12+ messages in thread
From: Yonghong Song @ 2019-05-24 22:23 UTC
  To: Daniel Borkmann, bpf, netdev
  Cc: Alexei Starovoitov, Kernel Team, Peter Zijlstra



On 5/24/19 2:59 PM, Daniel Borkmann wrote:
> On 05/24/2019 11:39 PM, Daniel Borkmann wrote:
>> On 05/23/2019 11:47 PM, Yonghong Song wrote:
>>> This patch addresses the following specific use case.
>>>
>>> Currently, a bpf program can already collect stack traces
>>> through the kernel function get_perf_callchain()
>>> when certain events happen (e.g., a cache miss counter or
>>> cpu clock counter overflows). But such stack traces are
>>> not sufficient for jitted programs, e.g., hhvm (jitted php).
>>> To get the real stack trace, the jit engine's internal data
>>> structures need to be traversed in order to recover the real
>>> user functions.
>>>
>>> The bpf program itself may not be the best place to traverse
>>> the jit engine, as the traversal logic could be complex and
>>> the jit internals are not a stable interface either.
>>>
>>> Instead, hhvm implements a signal handler, e.g. for SIGALRM,
>>> and a set of program locations at which it can dump stack
>>> traces. When it receives a signal, it will dump the stack at
>>> the next such program location.
>>>
>>> Such a mechanism can be implemented in the following way:
>>>   . a perf ring buffer is created between the bpf program
>>>     and the tracing app.
>>>   . once a particular event happens, the bpf program writes
>>>     to the ring buffer and the tracing app gets notified.
>>>   . the tracing app sends a SIGALRM signal to hhvm.
>>>
>>> But this method could introduce large delays and cause skewed
>>> profiling results.
>>>
>>> This patch implements a bpf_send_signal() helper to send
>>> a signal to hhvm in real time, resulting in the intended
>>> stack traces.
>>>
>>> Acked-by: Andrii Nakryiko <andriin@fb.com>
>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>> ---
>>>   include/uapi/linux/bpf.h | 17 +++++++++-
>>>   kernel/trace/bpf_trace.c | 72 ++++++++++++++++++++++++++++++++++++++++
>>>   2 files changed, 88 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index 63e0cf66f01a..68d4470523a0 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -2672,6 +2672,20 @@ union bpf_attr {
>>>    *		0 on success.
>>>    *
>>>    *		**-ENOENT** if the bpf-local-storage cannot be found.
>>> + *
>>> + * int bpf_send_signal(u32 sig)
>>> + *	Description
>>> + *		Send signal *sig* to the current task.
>>> + *	Return
>>> + *		0 on success or successfully queued.
>>> + *
>>> + *		**-EBUSY** if work queue under nmi is full.
>>> + *
>>> + *		**-EINVAL** if *sig* is invalid.
>>> + *
>>> + *		**-EPERM** if no permission to send the *sig*.
>>> + *
>>> + *		**-EAGAIN** if bpf program can try again.
>>>    */
>>>   #define __BPF_FUNC_MAPPER(FN)		\
>>>   	FN(unspec),			\
>>> @@ -2782,7 +2796,8 @@ union bpf_attr {
>>>   	FN(strtol),			\
>>>   	FN(strtoul),			\
>>>   	FN(sk_storage_get),		\
>>> -	FN(sk_storage_delete),
>>> +	FN(sk_storage_delete),		\
>>> +	FN(send_signal),
>>>   
>>>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>>>    * function eBPF program intends to call
>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>>> index f92d6ad5e080..70029eafc71f 100644
>>> --- a/kernel/trace/bpf_trace.c
>>> +++ b/kernel/trace/bpf_trace.c
>>> @@ -567,6 +567,63 @@ static const struct bpf_func_proto bpf_probe_read_str_proto = {
>>>   	.arg3_type	= ARG_ANYTHING,
>>>   };
>>>   
>>> +struct send_signal_irq_work {
>>> +	struct irq_work irq_work;
>>> +	struct task_struct *task;
>>> +	u32 sig;
>>> +};
>>> +
>>> +static DEFINE_PER_CPU(struct send_signal_irq_work, send_signal_work);
>>> +
>>> +static void do_bpf_send_signal(struct irq_work *entry)
>>> +{
>>> +	struct send_signal_irq_work *work;
>>> +
>>> +	work = container_of(entry, struct send_signal_irq_work, irq_work);
>>> +	group_send_sig_info(work->sig, SEND_SIG_PRIV, work->task, PIDTYPE_TGID);
>>> +}
>>> +
>>> +BPF_CALL_1(bpf_send_signal, u32, sig)
>>> +{
>>> +	struct send_signal_irq_work *work = NULL;
>>> +
> 
> Oh, and one more thing:
> 
> 	if (!valid_signal(sig))
> 		return -EINVAL;
> 
> Otherwise when deferring the work, you don't have any such feedback.

Good advice! Do you want me to send a follow-up patch or
resend the whole series?

> 
>>> +	/* Similar to bpf_probe_write_user, task needs to be
>>> +	 * in a sound condition and kernel memory access be
>>> +	 * permitted in order to send signal to the current
>>> +	 * task.
>>> +	 */
>>> +	if (unlikely(current->flags & (PF_KTHREAD | PF_EXITING)))
>>> +		return -EPERM;
>>> +	if (unlikely(uaccess_kernel()))
>>> +		return -EPERM;
>>> +	if (unlikely(!nmi_uaccess_okay()))
>>> +		return -EPERM;
>>> +
>>> +	if (in_nmi()) {
>>> +		work = this_cpu_ptr(&send_signal_work);
>>> +		if (work->irq_work.flags & IRQ_WORK_BUSY)
>>
>> Given that here and stackmap are the only two users outside of
>> kernel/irq_work.c, it would probably be good to add a small helper
>> to include/linux/irq_work.h and use it for both.
>>
>> Perhaps something like ...
>>
>> static inline bool irq_work_busy(struct irq_work *work)
>> {
>> 	return READ_ONCE(work->flags) & IRQ_WORK_BUSY;
>> }
>>
>>> +			return -EBUSY;
>>> +
>>> +		/* Add the current task, which is the target of sending signal,
>>> +		 * to the irq_work. The current task may change when queued
>>> +		 * irq works get executed.
>>> +		 */
>>> +		work->task = current;
>>> +		work->sig = sig;
>>> +		irq_work_queue(&work->irq_work);
>>> +		return 0;
>>> +	}
>>> +
>>> +	return group_send_sig_info(sig, SEND_SIG_PRIV, current, PIDTYPE_TGID);
>>> +}
>>> +
>>> +static const struct bpf_func_proto bpf_send_signal_proto = {
>>> +	.func		= bpf_send_signal,
>>> +	.gpl_only	= false,
>>> +	.ret_type	= RET_INTEGER,
>>> +	.arg1_type	= ARG_ANYTHING,
>>> +};
>>> +
>>>   static const struct bpf_func_proto *
>>>   tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>>>   {
>>> @@ -617,6 +674,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>>>   	case BPF_FUNC_get_current_cgroup_id:
>>>   		return &bpf_get_current_cgroup_id_proto;
>>>   #endif
>>> +	case BPF_FUNC_send_signal:
>>> +		return &bpf_send_signal_proto;
>>>   	default:
>>>   		return NULL;
>>>   	}
>>> @@ -1343,5 +1402,18 @@ static int __init bpf_event_init(void)
>>>   	return 0;
>>>   }
>>>   
>>> +static int __init send_signal_irq_work_init(void)
>>> +{
>>> +	int cpu;
>>> +	struct send_signal_irq_work *work;
>>> +
>>> +	for_each_possible_cpu(cpu) {
>>> +		work = per_cpu_ptr(&send_signal_work, cpu);
>>> +		init_irq_work(&work->irq_work, do_bpf_send_signal);
>>> +	}
>>> +	return 0;
>>> +}
>>> +
>>>   fs_initcall(bpf_event_init);
>>> +subsys_initcall(send_signal_irq_work_init);
>>>   #endif /* CONFIG_MODULES */
>>>
>>
> 


* Re: [PATCH bpf-next v5 1/3] bpf: implement bpf_send_signal() helper
  2019-05-24 22:20     ` Yonghong Song
@ 2019-05-24 22:37       ` Daniel Borkmann
  2019-05-25 19:14         ` Yonghong Song
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Borkmann @ 2019-05-24 22:37 UTC
  To: Yonghong Song, bpf, netdev
  Cc: Alexei Starovoitov, Kernel Team, Peter Zijlstra

On 05/25/2019 12:20 AM, Yonghong Song wrote:
> On 5/24/19 2:39 PM, Daniel Borkmann wrote:
>> On 05/23/2019 11:47 PM, Yonghong Song wrote:
>>> This patch addresses the following specific use case.
>>>
>>> Currently, a bpf program can already collect stack traces
>>> through the kernel function get_perf_callchain()
>>> when certain events happen (e.g., a cache miss counter or
>>> cpu clock counter overflows). But such stack traces are
>>> not sufficient for jitted programs, e.g., hhvm (jitted php).
>>> To get the real stack trace, the jit engine's internal data
>>> structures need to be traversed in order to recover the real
>>> user functions.
>>>
>>> The bpf program itself may not be the best place to traverse
>>> the jit engine, as the traversal logic could be complex and
>>> the jit internals are not a stable interface either.
>>>
>>> Instead, hhvm implements a signal handler, e.g. for SIGALRM,
>>> and a set of program locations at which it can dump stack
>>> traces. When it receives a signal, it will dump the stack at
>>> the next such program location.
>>>
>>> Such a mechanism can be implemented in the following way:
>>>   . a perf ring buffer is created between the bpf program
>>>     and the tracing app.
>>>   . once a particular event happens, the bpf program writes
>>>     to the ring buffer and the tracing app gets notified.
>>>   . the tracing app sends a SIGALRM signal to hhvm.
>>>
>>> But this method could introduce large delays and cause skewed
>>> profiling results.
>>>
>>> This patch implements a bpf_send_signal() helper to send
>>> a signal to hhvm in real time, resulting in the intended
>>> stack traces.
>>>
>>> Acked-by: Andrii Nakryiko <andriin@fb.com>
>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>> ---
>>>   include/uapi/linux/bpf.h | 17 +++++++++-
>>>   kernel/trace/bpf_trace.c | 72 ++++++++++++++++++++++++++++++++++++++++
>>>   2 files changed, 88 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index 63e0cf66f01a..68d4470523a0 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -2672,6 +2672,20 @@ union bpf_attr {
>>>    *		0 on success.
>>>    *
>>>    *		**-ENOENT** if the bpf-local-storage cannot be found.
>>> + *
>>> + * int bpf_send_signal(u32 sig)
>>> + *	Description
>>> + *		Send signal *sig* to the current task.
>>> + *	Return
>>> + *		0 on success or successfully queued.
>>> + *
>>> + *		**-EBUSY** if work queue under nmi is full.
>>> + *
>>> + *		**-EINVAL** if *sig* is invalid.
>>> + *
>>> + *		**-EPERM** if no permission to send the *sig*.
>>> + *
>>> + *		**-EAGAIN** if bpf program can try again.
>>>    */
>>>   #define __BPF_FUNC_MAPPER(FN)		\
>>>   	FN(unspec),			\
>>> @@ -2782,7 +2796,8 @@ union bpf_attr {
>>>   	FN(strtol),			\
>>>   	FN(strtoul),			\
>>>   	FN(sk_storage_get),		\
>>> -	FN(sk_storage_delete),
>>> +	FN(sk_storage_delete),		\
>>> +	FN(send_signal),
>>>   
>>>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>>>    * function eBPF program intends to call
>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>>> index f92d6ad5e080..70029eafc71f 100644
>>> --- a/kernel/trace/bpf_trace.c
>>> +++ b/kernel/trace/bpf_trace.c
>>> @@ -567,6 +567,63 @@ static const struct bpf_func_proto bpf_probe_read_str_proto = {
>>>   	.arg3_type	= ARG_ANYTHING,
>>>   };
>>>   
>>> +struct send_signal_irq_work {
>>> +	struct irq_work irq_work;
>>> +	struct task_struct *task;
>>> +	u32 sig;
>>> +};
>>> +
>>> +static DEFINE_PER_CPU(struct send_signal_irq_work, send_signal_work);
>>> +
>>> +static void do_bpf_send_signal(struct irq_work *entry)
>>> +{
>>> +	struct send_signal_irq_work *work;
>>> +
>>> +	work = container_of(entry, struct send_signal_irq_work, irq_work);
>>> +	group_send_sig_info(work->sig, SEND_SIG_PRIV, work->task, PIDTYPE_TGID);
>>> +}
>>> +
>>> +BPF_CALL_1(bpf_send_signal, u32, sig)
>>> +{
>>> +	struct send_signal_irq_work *work = NULL;
>>> +
>>> +	/* Similar to bpf_probe_write_user, task needs to be
>>> +	 * in a sound condition and kernel memory access be
>>> +	 * permitted in order to send signal to the current
>>> +	 * task.
>>> +	 */
>>> +	if (unlikely(current->flags & (PF_KTHREAD | PF_EXITING)))
>>> +		return -EPERM;
>>> +	if (unlikely(uaccess_kernel()))
>>> +		return -EPERM;
>>> +	if (unlikely(!nmi_uaccess_okay()))
>>> +		return -EPERM;
>>> +
>>> +	if (in_nmi()) {
>>> +		work = this_cpu_ptr(&send_signal_work);
>>> +		if (work->irq_work.flags & IRQ_WORK_BUSY)
>>
>> Given that here and stackmap are the only two users outside of
>> kernel/irq_work.c, it would probably be good to add a small helper
>> to include/linux/irq_work.h and use it for both.
>>
>> Perhaps something like ...
>>
>> static inline bool irq_work_busy(struct irq_work *work)
>> {
>> 	return READ_ONCE(work->flags) & IRQ_WORK_BUSY;
>> }
> 
> I am not sure whether READ_ONCE is needed here or not.
>
> The irq_work is a per-cpu data structure,
>    static DEFINE_PER_CPU(struct send_signal_irq_work, send_signal_work);
> so presumably there is no collision on the work->flags memory
> reference.

The busy bit you're testing is cleared via cmpxchg(), kernel/irq_work.c +169:

cmpxchg(&work->flags, flags, flags & ~IRQ_WORK_BUSY);
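
So even though the work is per-cpu, the flag is still updated
concurrently with atomics, and the lockless test wants READ_ONCE()
to keep the compiler from re-reading or splitting the plain load.
Roughly the interleaving in question:

	/* irq context (irq_work_run_list)    nmi (bpf_send_signal)
	 *
	 *   flags = work->flags;
	 *   work->func(work);                 work->irq_work.flags
	 *   cmpxchg(&work->flags, flags,        & IRQ_WORK_BUSY
	 *           flags & ~IRQ_WORK_BUSY);    (plain load)
	 */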


* Re: [PATCH bpf-next v5 1/3] bpf: implement bpf_send_signal() helper
  2019-05-24 22:23       ` Yonghong Song
@ 2019-05-24 22:38         ` Daniel Borkmann
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel Borkmann @ 2019-05-24 22:38 UTC
  To: Yonghong Song, bpf, netdev
  Cc: Alexei Starovoitov, Kernel Team, Peter Zijlstra

On 05/25/2019 12:23 AM, Yonghong Song wrote:
> On 5/24/19 2:59 PM, Daniel Borkmann wrote:
>> On 05/24/2019 11:39 PM, Daniel Borkmann wrote:
>>> On 05/23/2019 11:47 PM, Yonghong Song wrote:
>>>> This patch tries to solve the following specific use case.
>>>>
>>>> Currently, bpf program can already collect stack traces
>>>> through kernel function get_perf_callchain()
>>>> when certain events happens (e.g., cache miss counter or
>>>> cpu clock counter overflows). But such stack traces are
>>>> not enough for jitted programs, e.g., hhvm (jited php).
>>>> To get real stack trace, jit engine internal data structures
>>>> need to be traversed in order to get the real user functions.
>>>>
>>>> bpf program itself may not be the best place to traverse
>>>> the jit engine as the traversing logic could be complex and
>>>> it is not a stable interface either.
>>>>
>>>> Instead, hhvm implements a signal handler,
>>>> e.g. for SIGALARM, and a set of program locations which
>>>> it can dump stack traces. When it receives a signal, it will
>>>> dump the stack in next such program location.
>>>>
>>>> Such a mechanism can be implemented in the following way:
>>>>    . a perf ring buffer is created between bpf program
>>>>      and tracing app.
>>>>    . once a particular event happens, bpf program writes
>>>>      to the ring buffer and the tracing app gets notified.
>>>>    . the tracing app sends a signal SIGALARM to the hhvm.
>>>>
>>>> But this method could introduce large delays, skewing the
>>>> profiling results.
>>>>
>>>> This patch implements bpf_send_signal() helper to send
>>>> a signal to hhvm in real time, resulting in intended stack traces.
>>>>
>>>> Acked-by: Andrii Nakryiko <andriin@fb.com>
>>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>>> ---
>>>>   include/uapi/linux/bpf.h | 17 +++++++++-
>>>>   kernel/trace/bpf_trace.c | 72 ++++++++++++++++++++++++++++++++++++++++
>>>>   2 files changed, 88 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>>> index 63e0cf66f01a..68d4470523a0 100644
>>>> --- a/include/uapi/linux/bpf.h
>>>> +++ b/include/uapi/linux/bpf.h
>>>> @@ -2672,6 +2672,20 @@ union bpf_attr {
>>>>    *		0 on success.
>>>>    *
>>>>    *		**-ENOENT** if the bpf-local-storage cannot be found.
>>>> + *
>>>> + * int bpf_send_signal(u32 sig)
>>>> + *	Description
>>>> + *		Send signal *sig* to the current task.
>>>> + *	Return
>>>> + *		0 on success or successfully queued.
>>>> + *
>>>> + *		**-EBUSY** if work queue under nmi is full.
>>>> + *
>>>> + *		**-EINVAL** if *sig* is invalid.
>>>> + *
>>>> + *		**-EPERM** if no permission to send the *sig*.
>>>> + *
>>>> + *		**-EAGAIN** if bpf program can try again.
>>>>    */
>>>>   #define __BPF_FUNC_MAPPER(FN)		\
>>>>   	FN(unspec),			\
>>>> @@ -2782,7 +2796,8 @@ union bpf_attr {
>>>>   	FN(strtol),			\
>>>>   	FN(strtoul),			\
>>>>   	FN(sk_storage_get),		\
>>>> -	FN(sk_storage_delete),
>>>> +	FN(sk_storage_delete),		\
>>>> +	FN(send_signal),
>>>>   
>>>>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>>>>    * function eBPF program intends to call
>>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>>>> index f92d6ad5e080..70029eafc71f 100644
>>>> --- a/kernel/trace/bpf_trace.c
>>>> +++ b/kernel/trace/bpf_trace.c
>>>> @@ -567,6 +567,63 @@ static const struct bpf_func_proto bpf_probe_read_str_proto = {
>>>>   	.arg3_type	= ARG_ANYTHING,
>>>>   };
>>>>   
>>>> +struct send_signal_irq_work {
>>>> +	struct irq_work irq_work;
>>>> +	struct task_struct *task;
>>>> +	u32 sig;
>>>> +};
>>>> +
>>>> +static DEFINE_PER_CPU(struct send_signal_irq_work, send_signal_work);
>>>> +
>>>> +static void do_bpf_send_signal(struct irq_work *entry)
>>>> +{
>>>> +	struct send_signal_irq_work *work;
>>>> +
>>>> +	work = container_of(entry, struct send_signal_irq_work, irq_work);
>>>> +	group_send_sig_info(work->sig, SEND_SIG_PRIV, work->task, PIDTYPE_TGID);
>>>> +}
>>>> +
>>>> +BPF_CALL_1(bpf_send_signal, u32, sig)
>>>> +{
>>>> +	struct send_signal_irq_work *work = NULL;
>>>> +
>>
>> Oh, and one more thing:
>>
>> 	if (!valid_signal(sig))
>> 		return -EINVAL;
>>
>> Otherwise when deferring the work, you don't have any such feedback.
> 
> Good advice! Do you want me to send a follow-up patch or
> resend the whole series?

Let's do a follow-up.
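
For context, valid_signal() from include/linux/signal.h bounds-checks
the signal number against _NSIG. A sketch of where the suggested check
would sit in the helper (the placement is the reviewer's suggestion,
not the applied follow-up):

	BPF_CALL_1(bpf_send_signal, u32, sig)
	{
		struct send_signal_irq_work *work = NULL;

		/* Suggested: validate sig up front, so the NMI path,
		 * which defers delivery to irq_work, can still report
		 * -EINVAL synchronously to the bpf program.
		 */
		if (unlikely(!valid_signal(sig)))
			return -EINVAL;

		/* existing PF_KTHREAD/PF_EXITING, uaccess_kernel()
		 * and nmi_uaccess_okay() checks follow unchanged ...
		 */
	}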

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH bpf-next v5 1/3] bpf: implement bpf_send_signal() helper
  2019-05-24 22:37       ` Daniel Borkmann
@ 2019-05-25 19:14         ` Yonghong Song
  0 siblings, 0 replies; 12+ messages in thread
From: Yonghong Song @ 2019-05-25 19:14 UTC (permalink / raw)
  To: Daniel Borkmann, bpf, netdev
  Cc: Alexei Starovoitov, Kernel Team, Peter Zijlstra



On 5/24/19 3:37 PM, Daniel Borkmann wrote:
> On 05/25/2019 12:20 AM, Yonghong Song wrote:
>> On 5/24/19 2:39 PM, Daniel Borkmann wrote:
>>> On 05/23/2019 11:47 PM, Yonghong Song wrote:
>>>> This patch tries to solve the following specific use case.
>>>>
>>>> Currently, bpf program can already collect stack traces
>>>> through kernel function get_perf_callchain()
>>>> when certain events happens (e.g., cache miss counter or
>>>> cpu clock counter overflows). But such stack traces are
>>>> not enough for jitted programs, e.g., hhvm (jited php).
>>>> To get real stack trace, jit engine internal data structures
>>>> need to be traversed in order to get the real user functions.
>>>>
>>>> bpf program itself may not be the best place to traverse
>>>> the jit engine as the traversing logic could be complex and
>>>> it is not a stable interface either.
>>>>
>>>> Instead, hhvm implements a signal handler,
>>>> e.g. for SIGALARM, and a set of program locations which
>>>> it can dump stack traces. When it receives a signal, it will
>>>> dump the stack in next such program location.
>>>>
>>>> Such a mechanism can be implemented in the following way:
>>>>     . a perf ring buffer is created between bpf program
>>>>       and tracing app.
>>>>     . once a particular event happens, bpf program writes
>>>>       to the ring buffer and the tracing app gets notified.
>>>>     . the tracing app sends a signal SIGALARM to the hhvm.
>>>>
>>>> But this method could introduce large delays, skewing the
>>>> profiling results.
>>>>
>>>> This patch implements bpf_send_signal() helper to send
>>>> a signal to hhvm in real time, resulting in intended stack traces.
>>>>
>>>> Acked-by: Andrii Nakryiko <andriin@fb.com>
>>>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>>> ---
>>>>    include/uapi/linux/bpf.h | 17 +++++++++-
>>>>    kernel/trace/bpf_trace.c | 72 ++++++++++++++++++++++++++++++++++++++++
>>>>    2 files changed, 88 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>>> index 63e0cf66f01a..68d4470523a0 100644
>>>> --- a/include/uapi/linux/bpf.h
>>>> +++ b/include/uapi/linux/bpf.h
>>>> @@ -2672,6 +2672,20 @@ union bpf_attr {
>>>>     *		0 on success.
>>>>     *
>>>>     *		**-ENOENT** if the bpf-local-storage cannot be found.
>>>> + *
>>>> + * int bpf_send_signal(u32 sig)
>>>> + *	Description
>>>> + *		Send signal *sig* to the current task.
>>>> + *	Return
>>>> + *		0 on success or successfully queued.
>>>> + *
>>>> + *		**-EBUSY** if work queue under nmi is full.
>>>> + *
>>>> + *		**-EINVAL** if *sig* is invalid.
>>>> + *
>>>> + *		**-EPERM** if no permission to send the *sig*.
>>>> + *
>>>> + *		**-EAGAIN** if bpf program can try again.
>>>>     */
>>>>    #define __BPF_FUNC_MAPPER(FN)		\
>>>>    	FN(unspec),			\
>>>> @@ -2782,7 +2796,8 @@ union bpf_attr {
>>>>    	FN(strtol),			\
>>>>    	FN(strtoul),			\
>>>>    	FN(sk_storage_get),		\
>>>> -	FN(sk_storage_delete),
>>>> +	FN(sk_storage_delete),		\
>>>> +	FN(send_signal),
>>>>    
>>>>    /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>>>>     * function eBPF program intends to call
>>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>>>> index f92d6ad5e080..70029eafc71f 100644
>>>> --- a/kernel/trace/bpf_trace.c
>>>> +++ b/kernel/trace/bpf_trace.c
>>>> @@ -567,6 +567,63 @@ static const struct bpf_func_proto bpf_probe_read_str_proto = {
>>>>    	.arg3_type	= ARG_ANYTHING,
>>>>    };
>>>>    
>>>> +struct send_signal_irq_work {
>>>> +	struct irq_work irq_work;
>>>> +	struct task_struct *task;
>>>> +	u32 sig;
>>>> +};
>>>> +
>>>> +static DEFINE_PER_CPU(struct send_signal_irq_work, send_signal_work);
>>>> +
>>>> +static void do_bpf_send_signal(struct irq_work *entry)
>>>> +{
>>>> +	struct send_signal_irq_work *work;
>>>> +
>>>> +	work = container_of(entry, struct send_signal_irq_work, irq_work);
>>>> +	group_send_sig_info(work->sig, SEND_SIG_PRIV, work->task, PIDTYPE_TGID);
>>>> +}
>>>> +
>>>> +BPF_CALL_1(bpf_send_signal, u32, sig)
>>>> +{
>>>> +	struct send_signal_irq_work *work = NULL;
>>>> +
>>>> +	/* Similar to bpf_probe_write_user, task needs to be
>>>> +	 * in a sound condition and kernel memory access be
>>>> +	 * permitted in order to send signal to the current
>>>> +	 * task.
>>>> +	 */
>>>> +	if (unlikely(current->flags & (PF_KTHREAD | PF_EXITING)))
>>>> +		return -EPERM;
>>>> +	if (unlikely(uaccess_kernel()))
>>>> +		return -EPERM;
>>>> +	if (unlikely(!nmi_uaccess_okay()))
>>>> +		return -EPERM;
>>>> +
>>>> +	if (in_nmi()) {
>>>> +		work = this_cpu_ptr(&send_signal_work);
>>>> +		if (work->irq_work.flags & IRQ_WORK_BUSY)
>>>
>>> Given here and in stackmap are the only two users outside of kernel/irq_work.c,
>>> it may probably be good to add a small helper to include/linux/irq_work.h and
>>> use it for both.
>>>
>>> Perhaps something like ...
>>>
>>> static inline bool irq_work_busy(struct irq_work *work)
>>> {
>>> 	return READ_ONCE(work->flags) & IRQ_WORK_BUSY;
>>> }
>>
>> Not sure whether READ_ONCE is needed here or not.
>>
>> The irq_work is a per-cpu data structure,
>>     static DEFINE_PER_CPU(struct send_signal_irq_work, send_signal_work);
>> so presumably there is no collision on the work->flags memory reference.
> 
> The busy bit you're testing is cleared via cmpxchg(), kernel/irq_work.c +169:
> 
> cmpxchg(&work->flags, flags, flags & ~IRQ_WORK_BUSY);

Looks like in the bpf case we have bpf_prog_active guarding,
so a nested nmi won't trigger a nested bpf_send_signal() call,
and we should be fine without READ_ONCE here.

Also,
struct irq_work {
         unsigned long flags;
         struct llist_node llnode;
         void (*func)(struct irq_work *);
};

-bash-4.4$ egrep -r 'work->flags' irq_work.c 

         flags = work->flags & ~IRQ_WORK_PENDING;
                 oflags = cmpxchg(&work->flags, flags, nflags);
         if (work->flags & IRQ_WORK_LAZY) {
                 flags = work->flags & ~IRQ_WORK_PENDING;
                 xchg(&work->flags, flags);
                 (void)cmpxchg(&work->flags, flags, flags & ~IRQ_WORK_BUSY);
         while (work->flags & IRQ_WORK_BUSY)
-bash-4.4$

From the above, `flags` is unsigned long and no READ_ONCE
is used; people may already assume a plain read of an unsigned long
is atomic, so it is not used here.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-05-25 19:14 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-23 21:47 [PATCH bpf-next v5 0/3] bpf: implement bpf_send_signal() helper Yonghong Song
2019-05-23 21:47 ` [PATCH bpf-next v5 1/3] " Yonghong Song
2019-05-24 21:39   ` Daniel Borkmann
2019-05-24 21:59     ` Daniel Borkmann
2019-05-24 22:23       ` Yonghong Song
2019-05-24 22:38         ` Daniel Borkmann
2019-05-24 22:20     ` Yonghong Song
2019-05-24 22:37       ` Daniel Borkmann
2019-05-25 19:14         ` Yonghong Song
2019-05-23 21:47 ` [PATCH bpf-next v5 2/3] tools/bpf: sync bpf uapi header bpf.h to tools directory Yonghong Song
2019-05-23 21:47 ` [PATCH bpf-next v5 3/3] tools/bpf: add selftest in test_progs for bpf_send_signal() helper Yonghong Song
2019-05-24 21:32 ` [PATCH bpf-next v5 0/3] bpf: implement " Daniel Borkmann
