linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2][v2] Add the ability to do BPF directed error injection
@ 2017-10-31 15:45 Josef Bacik
  2017-10-31 15:45 ` [PATCH 1/2] bpf: add a bpf_override_function helper Josef Bacik
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Josef Bacik @ 2017-10-31 15:45 UTC (permalink / raw)
  To: davem, netdev, linux-kernel, ast, kernel-team

v1->v2:
- moved things around to make sure that bpf_override_return could really only be
  used for an ftrace kprobe.
- killed the special return values from trace_call_bpf.
- renamed pc_modified to bpf_kprobe_state so bpf_override_return could tell if
  it was being called from an ftrace kprobe context.
- reworked the logic in kprobe_perf_func to take advantage of bpf_kprobe_state.
- updated the test as per Alexei's review.

A lot of our error paths are not well tested because we have no good way of
injecting errors generically.  Some subsystems (block, memory) have ways to
inject errors, but they are random so it's hard to get reproducible results.

With BPF we can add determinism to our error injection.  We can use kprobes and
other things to verify we are injecting errors in the exact case we are trying
to test.  This patch gives us the tool to actually do the error injection part.
It is very simple: we just set the return value of the pt_regs we're given to
whatever we provide, and then override the PC with a dummy function that simply
returns.
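
For illustration only (this is not part of the series; the pid filter, the
TEST_PID value and the program name are made up), a kprobe program could use
the helper to fail a specific function for just the process under test:

#include <uapi/linux/ptrace.h>
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"

#define TEST_PID 1234			/* placeholder: pid of the test process */

SEC("kprobe/open_ctree")
int inject_enomem(struct pt_regs *ctx)
{
	unsigned long rc = -12;		/* -ENOMEM */
	u32 tgid = bpf_get_current_pid_tgid() >> 32;

	/* only inject the failure for our test process */
	if (tgid != TEST_PID)
		return 0;

	bpf_override_return(ctx, rc);
	return 0;
}

char _license[] SEC("license") = "GPL";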

Right now this only works on x86, but it would be simple enough to expand to
other architectures.  Thanks,

Josef


* [PATCH 1/2] bpf: add a bpf_override_function helper
  2017-10-31 15:45 [PATCH 0/2][v2] Add the ability to do BPF directed error injection Josef Bacik
@ 2017-10-31 15:45 ` Josef Bacik
  2017-11-01  4:47   ` Alexei Starovoitov
  2017-10-31 15:45 ` [PATCH 2/2] samples/bpf: add a test for bpf_override_return Josef Bacik
  2017-11-01  1:55 ` [PATCH 0/2][v2] Add the ability to do BPF directed error injection David Miller
  2 siblings, 1 reply; 7+ messages in thread
From: Josef Bacik @ 2017-10-31 15:45 UTC (permalink / raw)
  To: davem, netdev, linux-kernel, ast, kernel-team; +Cc: Josef Bacik

From: Josef Bacik <jbacik@fb.com>

Error injection is sloppy and very ad-hoc.  BPF could fill this niche
perfectly with its kprobe functionality.  We could make sure errors are
only triggered in specific call chains that we care about, in very
specific situations.  Accomplish this with the bpf_override_function
helper.  This will modify the probed function's return value to the
specified value and set the PC to an override function that simply
returns, bypassing the originally probed function.  This gives us a nice
clean way to implement systematic error injection for all of our code
paths.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 arch/Kconfig                     |  3 +++
 arch/x86/Kconfig                 |  1 +
 arch/x86/include/asm/kprobes.h   |  4 ++++
 arch/x86/include/asm/ptrace.h    |  5 +++++
 arch/x86/kernel/kprobes/ftrace.c | 14 ++++++++++++++
 include/linux/trace_events.h     |  7 +++++++
 include/uapi/linux/bpf.h         |  7 ++++++-
 kernel/trace/Kconfig             | 11 +++++++++++
 kernel/trace/bpf_trace.c         | 30 ++++++++++++++++++++++++++++
 kernel/trace/trace_kprobe.c      | 42 +++++++++++++++++++++++++++++++++-------
 10 files changed, 116 insertions(+), 8 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index d789a89cb32c..4fb618082259 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -195,6 +195,9 @@ config HAVE_OPTPROBES
 config HAVE_KPROBES_ON_FTRACE
 	bool
 
+config HAVE_KPROBE_OVERRIDE
+	bool
+
 config HAVE_NMI
 	bool
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 971feac13506..5126d2750dd0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -152,6 +152,7 @@ config X86
 	select HAVE_KERNEL_XZ
 	select HAVE_KPROBES
 	select HAVE_KPROBES_ON_FTRACE
+	select HAVE_KPROBE_OVERRIDE
 	select HAVE_KRETPROBES
 	select HAVE_KVM
 	select HAVE_LIVEPATCH			if X86_64
diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index 6cf65437b5e5..c6c3b1f4306a 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -67,6 +67,10 @@ extern const int kretprobe_blacklist_size;
 void arch_remove_kprobe(struct kprobe *p);
 asmlinkage void kretprobe_trampoline(void);
 
+#ifdef CONFIG_KPROBES_ON_FTRACE
+extern void arch_ftrace_kprobe_override_function(struct pt_regs *regs);
+#endif
+
 /* Architecture specific copy of original instruction*/
 struct arch_specific_insn {
 	/* copy of the original instruction */
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 91c04c8e67fa..f04e71800c2f 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -108,6 +108,11 @@ static inline unsigned long regs_return_value(struct pt_regs *regs)
 	return regs->ax;
 }
 
+static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
+{
+	regs->ax = rc;
+}
+
 /*
  * user_mode(regs) determines whether a register set came from user
  * mode.  On x86_32, this is true if V8086 mode was enabled OR if the
diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
index 041f7b6dfa0f..3c455bf490cb 100644
--- a/arch/x86/kernel/kprobes/ftrace.c
+++ b/arch/x86/kernel/kprobes/ftrace.c
@@ -97,3 +97,17 @@ int arch_prepare_kprobe_ftrace(struct kprobe *p)
 	p->ainsn.boostable = false;
 	return 0;
 }
+
+asmlinkage void override_func(void);
+asm(
+	".type override_func, @function\n"
+	"override_func:\n"
+	"	ret\n"
+	".size override_func, .-override_func\n"
+);
+
+void arch_ftrace_kprobe_override_function(struct pt_regs *regs)
+{
+	regs->ip = (unsigned long)&override_func;
+}
+NOKPROBE_SYMBOL(arch_ftrace_kprobe_override_function);
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index fc6aeca945db..9179f109c49b 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -521,7 +521,14 @@ do {									\
 #ifdef CONFIG_PERF_EVENTS
 struct perf_event;
 
+enum {
+	BPF_STATE_NORMAL_KPROBE = 0,
+	BPF_STATE_FTRACE_KPROBE,
+	BPF_STATE_MODIFIED_PC,
+};
+
 DECLARE_PER_CPU(struct pt_regs, perf_trace_regs);
+DECLARE_PER_CPU(int, bpf_kprobe_state);
 
 extern int  perf_trace_init(struct perf_event *event);
 extern void perf_trace_destroy(struct perf_event *event);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0b7b54d898bd..1ad5b87a42f6 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -673,6 +673,10 @@ union bpf_attr {
  *     @buf: buf to fill
  *     @buf_size: size of the buf
  *     Return : 0 on success or negative error code
+ *
+ * int bpf_override_return(pt_regs, rc)
+ *	@pt_regs: pointer to struct pt_regs
+ *	@rc: the return value to set
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -732,7 +736,8 @@ union bpf_attr {
 	FN(xdp_adjust_meta),		\
 	FN(perf_event_read_value),	\
 	FN(perf_prog_read_value),	\
-	FN(getsockopt),
+	FN(getsockopt),			\
+	FN(override_return),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 434c840e2d82..9dc0deeaad2b 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -518,6 +518,17 @@ config FUNCTION_PROFILER
 
 	  If in doubt, say N.
 
+config BPF_KPROBE_OVERRIDE
+	bool "Enable BPF programs to override a kprobed function"
+	depends on BPF_EVENTS
+	depends on KPROBES_ON_FTRACE
+	depends on HAVE_KPROBE_OVERRIDE
+	depends on DYNAMIC_FTRACE_WITH_REGS
+	default n
+	help
+	 Allows BPF to override the execution of a probed function and
+	 set a different return value.  This is used for error injection.
+
 config FTRACE_MCOUNT_RECORD
 	def_bool y
 	depends on DYNAMIC_FTRACE
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 136aa6bb0422..e9a7ae9c3a78 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -13,6 +13,8 @@
 #include <linux/filter.h>
 #include <linux/uaccess.h>
 #include <linux/ctype.h>
+#include <asm/kprobes.h>
+
 #include "trace.h"
 
 u64 bpf_get_stackid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
@@ -76,6 +78,32 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
 }
 EXPORT_SYMBOL_GPL(trace_call_bpf);
 
+#ifdef CONFIG_BPF_KPROBE_OVERRIDE
+BPF_CALL_2(bpf_override_return, struct pt_regs *, regs, unsigned long, rc)
+{
+	/* We can only override ftrace kprobes at the moment. */
+	if (__this_cpu_read(bpf_kprobe_state) != BPF_STATE_FTRACE_KPROBE)
+		return -EINVAL;
+	__this_cpu_write(bpf_kprobe_state, BPF_STATE_MODIFIED_PC);
+	regs_set_return_value(regs, rc);
+	arch_ftrace_kprobe_override_function(regs);
+	return 0;
+}
+#else
+BPF_CALL_2(bpf_override_return, struct pt_regs *, regs, unsigned long, rc)
+{
+	return -EINVAL;
+}
+#endif
+
+static const struct bpf_func_proto bpf_override_return_proto = {
+	.func		= bpf_override_return,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_ANYTHING,
+};
+
 BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
 {
 	int ret;
@@ -551,6 +579,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
 		return &bpf_get_stackid_proto;
 	case BPF_FUNC_perf_event_read_value:
 		return &bpf_perf_event_read_value_proto;
+	case BPF_FUNC_override_return:
+		return &bpf_override_return_proto;
 	default:
 		return tracing_func_proto(func_id);
 	}
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index abf92e478cfb..b7173e44e2e7 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -42,6 +42,7 @@ struct trace_kprobe {
 	(offsetof(struct trace_kprobe, tp.args) +	\
 	(sizeof(struct probe_arg) * (n)))
 
+DEFINE_PER_CPU(int, bpf_kprobe_state);
 
 static nokprobe_inline bool trace_kprobe_is_return(struct trace_kprobe *tk)
 {
@@ -1170,7 +1171,7 @@ static int kretprobe_event_define_fields(struct trace_event_call *event_call)
 #ifdef CONFIG_PERF_EVENTS
 
 /* Kprobe profile handler */
-static void
+static int
 kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
 {
 	struct trace_event_call *call = &tk->tp.call;
@@ -1179,12 +1180,37 @@ kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
 	int size, __size, dsize;
 	int rctx;
 
-	if (bpf_prog_array_valid(call) && !trace_call_bpf(call, regs))
-		return;
+	if (bpf_prog_array_valid(call)) {
+		int ret, state;
+
+		/*
+		 * If we are an ftrace kprobe then we are allowed to do
+		 * overrides for this kprobe.
+		 */
+		if (unlikely(kprobe_ftrace(&tk->rp.kp)))
+			__this_cpu_write(bpf_kprobe_state,
+					 BPF_STATE_FTRACE_KPROBE);
+		ret = trace_call_bpf(call, regs);
+
+		/*
+		 * We need to check and see if we modified the pc of the
+		 * pt_regs, and if so clear the kprobe and return 1 so that we
+		 * don't do the instruction skipping.  Also reset our state so
+		 * we are clean the next pass through.
+		 */
+		state = __this_cpu_read(bpf_kprobe_state);
+		__this_cpu_write(bpf_kprobe_state, BPF_STATE_NORMAL_KPROBE);
+		if (state == BPF_STATE_MODIFIED_PC) {
+			reset_current_kprobe();
+			return 1;
+		}
+		if (!ret)
+			return 0;
+	}
 
 	head = this_cpu_ptr(call->perf_events);
 	if (hlist_empty(head))
-		return;
+		return 0;
 
 	dsize = __get_data_size(&tk->tp, regs);
 	__size = sizeof(*entry) + tk->tp.size + dsize;
@@ -1193,13 +1219,14 @@ kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
 
 	entry = perf_trace_buf_alloc(size, NULL, &rctx);
 	if (!entry)
-		return;
+		return 0;
 
 	entry->ip = (unsigned long)tk->rp.kp.addr;
 	memset(&entry[1], 0, dsize);
 	store_trace_args(sizeof(*entry), &tk->tp, regs, (u8 *)&entry[1], dsize);
 	perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, regs,
 			      head, NULL, NULL);
+	return 0;
 }
 NOKPROBE_SYMBOL(kprobe_perf_func);
 
@@ -1275,6 +1302,7 @@ static int kprobe_register(struct trace_event_call *event,
 static int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs)
 {
 	struct trace_kprobe *tk = container_of(kp, struct trace_kprobe, rp.kp);
+	int ret = 0;
 
 	raw_cpu_inc(*tk->nhit);
 
@@ -1282,9 +1310,9 @@ static int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs)
 		kprobe_trace_func(tk, regs);
 #ifdef CONFIG_PERF_EVENTS
 	if (tk->tp.flags & TP_FLAG_PROFILE)
-		kprobe_perf_func(tk, regs);
+		ret = kprobe_perf_func(tk, regs);
 #endif
-	return 0;	/* We don't tweek kernel, so just return 0 */
+	return ret;
 }
 NOKPROBE_SYMBOL(kprobe_dispatcher);
 
-- 
2.7.5


* [PATCH 2/2] samples/bpf: add a test for bpf_override_return
  2017-10-31 15:45 [PATCH 0/2][v2] Add the ability to do BPF directed error injection Josef Bacik
  2017-10-31 15:45 ` [PATCH 1/2] bpf: add a bpf_override_function helper Josef Bacik
@ 2017-10-31 15:45 ` Josef Bacik
  2017-11-01  1:55 ` [PATCH 0/2][v2] Add the ability to do BPF directed error injection David Miller
  2 siblings, 0 replies; 7+ messages in thread
From: Josef Bacik @ 2017-10-31 15:45 UTC (permalink / raw)
  To: davem, netdev, linux-kernel, ast, kernel-team; +Cc: Josef Bacik

From: Josef Bacik <jbacik@fb.com>

This adds a basic test for bpf_override_return to verify it works.  We
override open_ctree(), the main function for mounting a btrfs fs, so it
returns -ENOMEM, and then make sure that trying to mount a btrfs fs fails.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 samples/bpf/Makefile                      |  4 ++++
 samples/bpf/test_override_return.sh       | 15 +++++++++++++++
 samples/bpf/tracex7_kern.c                | 16 ++++++++++++++++
 samples/bpf/tracex7_user.c                | 28 ++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h            |  7 ++++++-
 tools/testing/selftests/bpf/bpf_helpers.h |  3 ++-
 6 files changed, 71 insertions(+), 2 deletions(-)
 create mode 100755 samples/bpf/test_override_return.sh
 create mode 100644 samples/bpf/tracex7_kern.c
 create mode 100644 samples/bpf/tracex7_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index ea2b9e6135f3..83d06bc1f710 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -14,6 +14,7 @@ hostprogs-y += tracex3
 hostprogs-y += tracex4
 hostprogs-y += tracex5
 hostprogs-y += tracex6
+hostprogs-y += tracex7
 hostprogs-y += test_probe_write_user
 hostprogs-y += trace_output
 hostprogs-y += lathist
@@ -58,6 +59,7 @@ tracex3-objs := bpf_load.o $(LIBBPF) tracex3_user.o
 tracex4-objs := bpf_load.o $(LIBBPF) tracex4_user.o
 tracex5-objs := bpf_load.o $(LIBBPF) tracex5_user.o
 tracex6-objs := bpf_load.o $(LIBBPF) tracex6_user.o
+tracex7-objs := bpf_load.o $(LIBBPF) tracex7_user.o
 load_sock_ops-objs := bpf_load.o $(LIBBPF) load_sock_ops.o
 test_probe_write_user-objs := bpf_load.o $(LIBBPF) test_probe_write_user_user.o
 trace_output-objs := bpf_load.o $(LIBBPF) trace_output_user.o
@@ -100,6 +102,7 @@ always += tracex3_kern.o
 always += tracex4_kern.o
 always += tracex5_kern.o
 always += tracex6_kern.o
+always += tracex7_kern.o
 always += sock_flags_kern.o
 always += test_probe_write_user_kern.o
 always += trace_output_kern.o
@@ -153,6 +156,7 @@ HOSTLOADLIBES_tracex3 += -lelf
 HOSTLOADLIBES_tracex4 += -lelf -lrt
 HOSTLOADLIBES_tracex5 += -lelf
 HOSTLOADLIBES_tracex6 += -lelf
+HOSTLOADLIBES_tracex7 += -lelf
 HOSTLOADLIBES_test_cgrp2_sock2 += -lelf
 HOSTLOADLIBES_load_sock_ops += -lelf
 HOSTLOADLIBES_test_probe_write_user += -lelf
diff --git a/samples/bpf/test_override_return.sh b/samples/bpf/test_override_return.sh
new file mode 100755
index 000000000000..e68b9ee6814b
--- /dev/null
+++ b/samples/bpf/test_override_return.sh
@@ -0,0 +1,15 @@
+#!/bin/bash
+
+rm -f testfile.img
+dd if=/dev/zero of=testfile.img bs=1M seek=1000 count=1
+DEVICE=$(losetup --show -f testfile.img)
+mkfs.btrfs -f $DEVICE
+mkdir tmpmnt
+./tracex7 $DEVICE
+if [ $? -eq 0 ]
+then
+	echo "SUCCESS!"
+else
+	echo "FAILED!"
+fi
+losetup -d $DEVICE
diff --git a/samples/bpf/tracex7_kern.c b/samples/bpf/tracex7_kern.c
new file mode 100644
index 000000000000..1ab308a43e0f
--- /dev/null
+++ b/samples/bpf/tracex7_kern.c
@@ -0,0 +1,16 @@
+#include <uapi/linux/ptrace.h>
+#include <uapi/linux/bpf.h>
+#include <linux/version.h>
+#include "bpf_helpers.h"
+
+SEC("kprobe/open_ctree")
+int bpf_prog1(struct pt_regs *ctx)
+{
+	unsigned long rc = -12;
+
+	bpf_override_return(ctx, rc);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/samples/bpf/tracex7_user.c b/samples/bpf/tracex7_user.c
new file mode 100644
index 000000000000..8a52ac492e8b
--- /dev/null
+++ b/samples/bpf/tracex7_user.c
@@ -0,0 +1,28 @@
+#define _GNU_SOURCE
+
+#include <stdio.h>
+#include <linux/bpf.h>
+#include <unistd.h>
+#include "libbpf.h"
+#include "bpf_load.h"
+
+int main(int argc, char **argv)
+{
+	FILE *f;
+	char filename[256];
+	char command[256];
+	int ret;
+
+	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+
+	if (load_bpf_file(filename)) {
+		printf("%s", bpf_log_buf);
+		return 1;
+	}
+
+	snprintf(command, 256, "mount %s tmpmnt/", argv[1]);
+	f = popen(command, "r");
+	ret = pclose(f);
+
+	return ret ? 0 : 1;
+}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 4a4b6e78c977..3756dde69834 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -673,6 +673,10 @@ union bpf_attr {
  *     @buf: buf to fill
  *     @buf_size: size of the buf
  *     Return : 0 on success or negative error code
+ *
+ * int bpf_override_return(pt_regs, rc)
+ *	@pt_regs: pointer to struct pt_regs
+ *	@rc: the return value to set
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -732,7 +736,8 @@ union bpf_attr {
 	FN(xdp_adjust_meta),		\
 	FN(perf_event_read_value),	\
 	FN(perf_prog_read_value),	\
-	FN(getsockopt),
+	FN(getsockopt),			\
+	FN(override_return),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index abfa4c5c8527..086733298d5e 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -81,7 +81,8 @@ static int (*bpf_perf_event_read_value)(void *map, unsigned long long flags,
 static int (*bpf_perf_prog_read_value)(void *ctx, void *buf,
 				       unsigned int buf_size) =
 	(void *) BPF_FUNC_perf_prog_read_value;
-
+static int (*bpf_override_return)(void *ctx, unsigned long rc) =
+	(void *) BPF_FUNC_override_return;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
-- 
2.7.5


* Re: [PATCH 0/2][v2] Add the ability to do BPF directed error injection
  2017-10-31 15:45 [PATCH 0/2][v2] Add the ability to do BPF directed error injection Josef Bacik
  2017-10-31 15:45 ` [PATCH 1/2] bpf: add a bpf_override_function helper Josef Bacik
  2017-10-31 15:45 ` [PATCH 2/2] samples/bpf: add a test for bpf_override_return Josef Bacik
@ 2017-11-01  1:55 ` David Miller
  2017-11-01  1:58   ` Alexei Starovoitov
  2 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2017-11-01  1:55 UTC (permalink / raw)
  To: josef; +Cc: netdev, linux-kernel, ast, kernel-team

From: Josef Bacik <josef@toxicpanda.com>
Date: Tue, 31 Oct 2017 11:45:55 -0400

> v1->v2:
> - moved things around to make sure that bpf_override_return could really only be
>   used for an ftrace kprobe.
> - killed the special return values from trace_call_bpf.
> - renamed pc_modified to bpf_kprobe_state so bpf_override_return could tell if
>   it was being called from an ftrace kprobe context.
> - reworked the logic in kprobe_perf_func to take advantage of bpf_kprobe_state.
> - updated the test as per Alexei's review.
> 
> A lot of our error paths are not well tested because we have no good way of
> injecting errors generically.  Some subsystems (block, memory) have ways to
> inject errors, but they are random so it's hard to get reproducible results.
>
> With BPF we can add determinism to our error injection.  We can use kprobes and
> other things to verify we are injecting errors in the exact case we are trying
> to test.  This patch gives us the tool to actually do the error injection part.
> It is very simple: we just set the return value of the pt_regs we're given to
> whatever we provide, and then override the PC with a dummy function that simply
> returns.
> 
> Right now this only works on x86, but it would be simple enough to expand to
> other architectures.  Thanks,

This appears to target the tracing tree more than the networking tree.

Let me know if that's not the case and I should be the one integrating
these changes.

Thanks.


* Re: [PATCH 0/2][v2] Add the ability to do BPF directed error injection
  2017-11-01  1:55 ` [PATCH 0/2][v2] Add the ability to do BPF directed error injection David Miller
@ 2017-11-01  1:58   ` Alexei Starovoitov
  2017-11-01  2:44     ` David Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Alexei Starovoitov @ 2017-11-01  1:58 UTC (permalink / raw)
  To: David Miller, josef; +Cc: netdev, linux-kernel, ast, kernel-team

On 10/31/17 6:55 PM, David Miller wrote:
> From: Josef Bacik <josef@toxicpanda.com>
> Date: Tue, 31 Oct 2017 11:45:55 -0400
>
>> v1->v2:
>> - moved things around to make sure that bpf_override_return could really only be
>>   used for an ftrace kprobe.
>> - killed the special return values from trace_call_bpf.
>> - renamed pc_modified to bpf_kprobe_state so bpf_override_return could tell if
>>   it was being called from an ftrace kprobe context.
>> - reworked the logic in kprobe_perf_func to take advantage of bpf_kprobe_state.
>> - updated the test as per Alexei's review.
>>
>> A lot of our error paths are not well tested because we have no good way of
>> injecting errors generically.  Some subsystems (block, memory) have ways to
>> inject errors, but they are random so it's hard to get reproducible results.
>>
>> With BPF we can add determinism to our error injection.  We can use kprobes and
>> other things to verify we are injecting errors in the exact case we are trying
>> to test.  This patch gives us the tool to actually do the error injection part.
>> It is very simple: we just set the return value of the pt_regs we're given to
>> whatever we provide, and then override the PC with a dummy function that simply
>> returns.
>>
>> Right now this only works on x86, but it would be simple enough to expand to
>> other architectures.  Thanks,
>
> This appears to target the tracing tree more than the networking tree.
>
> Let me know if that's not the case and I should be the one integrating
> these changes.

I don't think it will apply to anything but net-next. If it goes through any
other tree we will have major conflicts during the merge window.
Btw, I haven't reviewed them a second time yet.


* Re: [PATCH 0/2][v2] Add the ability to do BPF directed error injection
  2017-11-01  1:58   ` Alexei Starovoitov
@ 2017-11-01  2:44     ` David Miller
  0 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2017-11-01  2:44 UTC (permalink / raw)
  To: ast; +Cc: josef, netdev, linux-kernel, ast, kernel-team

From: Alexei Starovoitov <ast@fb.com>
Date: Tue, 31 Oct 2017 18:58:00 -0700

> I don't think it will apply to anything but net-next. If it goes through any
> other tree we will have major conflicts during the merge window.
> Btw, I haven't reviewed them a second time yet.

Ok, then I'll need to see some ACKs from the tracing folks.

Thank you.


* Re: [PATCH 1/2] bpf: add a bpf_override_function helper
  2017-10-31 15:45 ` [PATCH 1/2] bpf: add a bpf_override_function helper Josef Bacik
@ 2017-11-01  4:47   ` Alexei Starovoitov
  0 siblings, 0 replies; 7+ messages in thread
From: Alexei Starovoitov @ 2017-11-01  4:47 UTC (permalink / raw)
  To: Josef Bacik, davem, netdev, linux-kernel, ast, kernel-team
  Cc: Josef Bacik, Daniel Borkmann

On 10/31/17 8:45 AM, Josef Bacik wrote:
> From: Josef Bacik <jbacik@fb.com>
>
> Error injection is sloppy and very ad-hoc.  BPF could fill this niche
> perfectly with its kprobe functionality.  We could make sure errors are
> only triggered in specific call chains that we care about, in very
> specific situations.  Accomplish this with the bpf_override_function
> helper.  This will modify the probed function's return value to the
> specified value and set the PC to an override function that simply
> returns, bypassing the originally probed function.  This gives us a nice
> clean way to implement systematic error injection for all of our code
> paths.
>
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  arch/Kconfig                     |  3 +++
>  arch/x86/Kconfig                 |  1 +
>  arch/x86/include/asm/kprobes.h   |  4 ++++
>  arch/x86/include/asm/ptrace.h    |  5 +++++
>  arch/x86/kernel/kprobes/ftrace.c | 14 ++++++++++++++
>  include/linux/trace_events.h     |  7 +++++++
>  include/uapi/linux/bpf.h         |  7 ++++++-
>  kernel/trace/Kconfig             | 11 +++++++++++
>  kernel/trace/bpf_trace.c         | 30 ++++++++++++++++++++++++++++
>  kernel/trace/trace_kprobe.c      | 42 +++++++++++++++++++++++++++++++++-------
>  10 files changed, 116 insertions(+), 8 deletions(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index d789a89cb32c..4fb618082259 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -195,6 +195,9 @@ config HAVE_OPTPROBES
>  config HAVE_KPROBES_ON_FTRACE
>  	bool
>
> +config HAVE_KPROBE_OVERRIDE
> +	bool
> +
>  config HAVE_NMI
>  	bool
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 971feac13506..5126d2750dd0 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -152,6 +152,7 @@ config X86
>  	select HAVE_KERNEL_XZ
>  	select HAVE_KPROBES
>  	select HAVE_KPROBES_ON_FTRACE
> +	select HAVE_KPROBE_OVERRIDE
>  	select HAVE_KRETPROBES
>  	select HAVE_KVM
>  	select HAVE_LIVEPATCH			if X86_64
> diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
> index 6cf65437b5e5..c6c3b1f4306a 100644
> --- a/arch/x86/include/asm/kprobes.h
> +++ b/arch/x86/include/asm/kprobes.h
> @@ -67,6 +67,10 @@ extern const int kretprobe_blacklist_size;
>  void arch_remove_kprobe(struct kprobe *p);
>  asmlinkage void kretprobe_trampoline(void);
>
> +#ifdef CONFIG_KPROBES_ON_FTRACE
> +extern void arch_ftrace_kprobe_override_function(struct pt_regs *regs);
> +#endif
> +
>  /* Architecture specific copy of original instruction*/
>  struct arch_specific_insn {
>  	/* copy of the original instruction */
> diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
> index 91c04c8e67fa..f04e71800c2f 100644
> --- a/arch/x86/include/asm/ptrace.h
> +++ b/arch/x86/include/asm/ptrace.h
> @@ -108,6 +108,11 @@ static inline unsigned long regs_return_value(struct pt_regs *regs)
>  	return regs->ax;
>  }
>
> +static inline void regs_set_return_value(struct pt_regs *regs, unsigned long rc)
> +{
> +	regs->ax = rc;
> +}
> +
>  /*
>   * user_mode(regs) determines whether a register set came from user
>   * mode.  On x86_32, this is true if V8086 mode was enabled OR if the
> diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
> index 041f7b6dfa0f..3c455bf490cb 100644
> --- a/arch/x86/kernel/kprobes/ftrace.c
> +++ b/arch/x86/kernel/kprobes/ftrace.c
> @@ -97,3 +97,17 @@ int arch_prepare_kprobe_ftrace(struct kprobe *p)
>  	p->ainsn.boostable = false;
>  	return 0;
>  }
> +
> +asmlinkage void override_func(void);
> +asm(
> +	".type override_func, @function\n"
> +	"override_func:\n"
> +	"	ret\n"
> +	".size override_func, .-override_func\n"
> +);
> +
> +void arch_ftrace_kprobe_override_function(struct pt_regs *regs)
> +{
> +	regs->ip = (unsigned long)&override_func;
> +}
> +NOKPROBE_SYMBOL(arch_ftrace_kprobe_override_function);
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index fc6aeca945db..9179f109c49b 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -521,7 +521,14 @@ do {									\
>  #ifdef CONFIG_PERF_EVENTS
>  struct perf_event;
>
> +enum {
> +	BPF_STATE_NORMAL_KPROBE = 0,
> +	BPF_STATE_FTRACE_KPROBE,
> +	BPF_STATE_MODIFIED_PC,
> +};
> +
>  DECLARE_PER_CPU(struct pt_regs, perf_trace_regs);
> +DECLARE_PER_CPU(int, bpf_kprobe_state);
>
>  extern int  perf_trace_init(struct perf_event *event);
>  extern void perf_trace_destroy(struct perf_event *event);
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 0b7b54d898bd..1ad5b87a42f6 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -673,6 +673,10 @@ union bpf_attr {
>   *     @buf: buf to fill
>   *     @buf_size: size of the buf
>   *     Return : 0 on success or negative error code
> + *
> + * int bpf_override_return(pt_regs, rc)
> + *	@pt_regs: pointer to struct pt_regs
> + *	@rc: the return value to set
>   */
>  #define __BPF_FUNC_MAPPER(FN)		\
>  	FN(unspec),			\
> @@ -732,7 +736,8 @@ union bpf_attr {
>  	FN(xdp_adjust_meta),		\
>  	FN(perf_event_read_value),	\
>  	FN(perf_prog_read_value),	\
> -	FN(getsockopt),
> +	FN(getsockopt),			\
> +	FN(override_return),
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>   * function eBPF program intends to call
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index 434c840e2d82..9dc0deeaad2b 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -518,6 +518,17 @@ config FUNCTION_PROFILER
>
>  	  If in doubt, say N.
>
> +config BPF_KPROBE_OVERRIDE
> +	bool "Enable BPF programs to override a kprobed function"
> +	depends on BPF_EVENTS
> +	depends on KPROBES_ON_FTRACE
> +	depends on HAVE_KPROBE_OVERRIDE
> +	depends on DYNAMIC_FTRACE_WITH_REGS
> +	default n
> +	help
> +	 Allows BPF to override the execution of a probed function and
> +	 set a different return value.  This is used for error injection.
> +
>  config FTRACE_MCOUNT_RECORD
>  	def_bool y
>  	depends on DYNAMIC_FTRACE
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 136aa6bb0422..e9a7ae9c3a78 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -13,6 +13,8 @@
>  #include <linux/filter.h>
>  #include <linux/uaccess.h>
>  #include <linux/ctype.h>
> +#include <asm/kprobes.h>
> +
>  #include "trace.h"
>
>  u64 bpf_get_stackid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
> @@ -76,6 +78,32 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
>  }
>  EXPORT_SYMBOL_GPL(trace_call_bpf);
>
> +#ifdef CONFIG_BPF_KPROBE_OVERRIDE
> +BPF_CALL_2(bpf_override_return, struct pt_regs *, regs, unsigned long, rc)
> +{
> +	/* We can only override ftrace kprobes at the moment. */
> +	if (__this_cpu_read(bpf_kprobe_state) != BPF_STATE_FTRACE_KPROBE)
> +		return -EINVAL;
> +	__this_cpu_write(bpf_kprobe_state, BPF_STATE_MODIFIED_PC);
> +	regs_set_return_value(regs, rc);
> +	arch_ftrace_kprobe_override_function(regs);
> +	return 0;
> +}
> +#else
> +BPF_CALL_2(bpf_override_return, struct pt_regs *, regs, unsigned long, rc)
> +{
> +	return -EINVAL;
> +}
> +#endif
> +
> +static const struct bpf_func_proto bpf_override_return_proto = {
> +	.func		= bpf_override_return,
> +	.gpl_only	= true,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_CTX,
> +	.arg2_type	= ARG_ANYTHING,
> +};
> +
>  BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
>  {
>  	int ret;
> @@ -551,6 +579,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
>  		return &bpf_get_stackid_proto;
>  	case BPF_FUNC_perf_event_read_value:
>  		return &bpf_perf_event_read_value_proto;
> +	case BPF_FUNC_override_return:
> +		return &bpf_override_return_proto;
>  	default:
>  		return tracing_func_proto(func_id);
>  	}
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index abf92e478cfb..b7173e44e2e7 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -42,6 +42,7 @@ struct trace_kprobe {
>  	(offsetof(struct trace_kprobe, tp.args) +	\
>  	(sizeof(struct probe_arg) * (n)))
>
> +DEFINE_PER_CPU(int, bpf_kprobe_state);
>
>  static nokprobe_inline bool trace_kprobe_is_return(struct trace_kprobe *tk)
>  {
> @@ -1170,7 +1171,7 @@ static int kretprobe_event_define_fields(struct trace_event_call *event_call)
>  #ifdef CONFIG_PERF_EVENTS
>
>  /* Kprobe profile handler */
> -static void
> +static int
>  kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
>  {
>  	struct trace_event_call *call = &tk->tp.call;
> @@ -1179,12 +1180,37 @@ kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
>  	int size, __size, dsize;
>  	int rctx;
>
> -	if (bpf_prog_array_valid(call) && !trace_call_bpf(call, regs))
> -		return;
> +	if (bpf_prog_array_valid(call)) {
> +		int ret, state;
> +
> +		/*
> +		 * If we are an ftrace kprobe then we are allowed to do
> +		 * overrides for this kprobe.
> +		 */
> +		if (unlikely(kprobe_ftrace(&tk->rp.kp)))
> +			__this_cpu_write(bpf_kprobe_state,
> +					 BPF_STATE_FTRACE_KPROBE);
> +		ret = trace_call_bpf(call, regs);
> +
> +		/*
> +		 * We need to check and see if we modified the pc of the
> +		 * pt_regs, and if so clear the kprobe and return 1 so that we
> +		 * don't do the instruction skipping.  Also reset our state so
> +		 * we are clean the next pass through.
> +		 */
> +		state = __this_cpu_read(bpf_kprobe_state);
> +		__this_cpu_write(bpf_kprobe_state, BPF_STATE_NORMAL_KPROBE);
> +		if (state == BPF_STATE_MODIFIED_PC) {
> +			reset_current_kprobe();
> +			return 1;
> +		}

I have been thinking about whether it's possible to reduce the runtime
penalty this feature adds to all bpf kprobes...
The first kprobe_ftrace() check can be done at attach time.
For example, a bit in bpf_prog could indicate whether it attempts to use
bpf_override_return() (similar to cb_access), and in
perf_event_attach_bpf_prog() do:
if (event->tp_event->flags & TRACE_EVENT_FL_KPROBE)
   kprobe_ftrace(&((struct trace_kprobe *)
                  (event->tp_event->data))->rp.kp)
Maybe there are helpers to make it pretty.
Then after trace_call_bpf() we don't have to do both __this_cpu_read and
__this_cpu_write; only the read will be enough.
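
Roughly something like this (untested sketch; the "kprobe_override" bit in
bpf_prog and the function name are assumptions, nothing that exists today):

/* would be called from perf_event_attach_bpf_prog(), sketch only */
static int bpf_prog_check_kprobe_override(struct perf_event *event,
					  struct bpf_prog *prog)
{
	struct trace_kprobe *tk;

	if (!prog->kprobe_override)	/* assumed new bit in bpf_prog */
		return 0;
	if (!(event->tp_event->flags & TRACE_EVENT_FL_KPROBE))
		return -EINVAL;
	tk = (struct trace_kprobe *)event->tp_event->data;
	if (!kprobe_ftrace(&tk->rp.kp))	/* only ftrace kprobes can override */
		return -EINVAL;
	return 0;
}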

The rest looks great. Looks like an amazing feature.

> +		if (!ret)
> +			return 0;
> +	}
>
>  	head = this_cpu_ptr(call->perf_events);
>  	if (hlist_empty(head))
> -		return;
> +		return 0;
>
>  	dsize = __get_data_size(&tk->tp, regs);
>  	__size = sizeof(*entry) + tk->tp.size + dsize;
> @@ -1193,13 +1219,14 @@ kprobe_perf_func(struct trace_kprobe *tk, struct pt_regs *regs)
>
>  	entry = perf_trace_buf_alloc(size, NULL, &rctx);
>  	if (!entry)
> -		return;
> +		return 0;
>
>  	entry->ip = (unsigned long)tk->rp.kp.addr;
>  	memset(&entry[1], 0, dsize);
>  	store_trace_args(sizeof(*entry), &tk->tp, regs, (u8 *)&entry[1], dsize);
>  	perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, regs,
>  			      head, NULL, NULL);
> +	return 0;
>  }
>  NOKPROBE_SYMBOL(kprobe_perf_func);
>
> @@ -1275,6 +1302,7 @@ static int kprobe_register(struct trace_event_call *event,
>  static int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs)
>  {
>  	struct trace_kprobe *tk = container_of(kp, struct trace_kprobe, rp.kp);
> +	int ret = 0;
>
>  	raw_cpu_inc(*tk->nhit);
>
> @@ -1282,9 +1310,9 @@ static int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs)
>  		kprobe_trace_func(tk, regs);
>  #ifdef CONFIG_PERF_EVENTS
>  	if (tk->tp.flags & TP_FLAG_PROFILE)
> -		kprobe_perf_func(tk, regs);
> +		ret = kprobe_perf_func(tk, regs);
>  #endif
> -	return 0;	/* We don't tweek kernel, so just return 0 */
> +	return ret;
>  }
>  NOKPROBE_SYMBOL(kprobe_dispatcher);
>
>


end of thread

Thread overview: 7+ messages
2017-10-31 15:45 [PATCH 0/2][v2] Add the ability to do BPF directed error injection Josef Bacik
2017-10-31 15:45 ` [PATCH 1/2] bpf: add a bpf_override_function helper Josef Bacik
2017-11-01  4:47   ` Alexei Starovoitov
2017-10-31 15:45 ` [PATCH 2/2] samples/bpf: add a test for bpf_override_return Josef Bacik
2017-11-01  1:55 ` [PATCH 0/2][v2] Add the ability to do BPF directed error injection David Miller
2017-11-01  1:58   ` Alexei Starovoitov
2017-11-01  2:44     ` David Miller
