bpf.vger.kernel.org archive mirror
* [PATCH bpf-next 0/7] Introduce BPF trampoline
@ 2019-11-02 22:00 Alexei Starovoitov
  2019-11-02 22:00 ` [PATCH bpf-next 1/7] bpf, ftrace: temporary workaround Alexei Starovoitov
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2019-11-02 22:00 UTC (permalink / raw)
  To: davem; +Cc: daniel, peterz, rostedt, x86, netdev, bpf, kernel-team

Introduce a BPF trampoline that works as a bridge between kernel functions and
BPF programs. The first use case is fentry/fexit BPF programs that are roughly
equivalent to kprobe/kretprobe. Unlike k[ret]probe there is practically zero
overhead to call a set of BPF programs before or after a kernel function.
Future patches will explore networking use cases. For example, the BPF
trampoline can be used to call XDP programs from drivers with direct calls or
to wrap one BPF program into another BPF program.
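
For illustration, a minimal fentry/fexit pair, taken almost verbatim from the
selftest in patch 5, looks like this (the ctx struct mirrors the traced
function's arguments plus, for fexit, its return value):

  #include <linux/bpf.h>
  #include "bpf_helpers.h"

  struct sk_buff;
  struct net_device;

  struct eth_type_trans_args {
          struct sk_buff *skb;
          struct net_device *dev;
          unsigned short protocol; /* return value, visible to fexit progs */
  };

  SEC("fentry/eth_type_trans")
  int fentry_eth_type_trans(struct eth_type_trans_args *ctx)
  {
          /* raw kernel function arguments are available via ctx */
          return 0;
  }

  SEC("fexit/eth_type_trans")
  int fexit_eth_type_trans(struct eth_type_trans_args *ctx)
  {
          /* same arguments plus the return value in ctx->protocol */
          return 0;
  }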

The patch set depends on the register_ftrace_direct() API, which is not
upstream yet and is available in [1]. The first patch is a hack to work around
this dependency. The idea is to land this set via the bpf-next tree and land
register_ftrace_direct via Steven's ftrace tree, then revert the first patch
during the merge window.
Steven,
do you think that's workable?
As an alternative we can route the register_ftrace_direct patches via bpf-next?

Peter's static_call patches [2] solve the issue of indirect call overhead for
a large class of kernel use cases, but unfortunately don't help with calling
into a set of BPF programs.  The BPF trampoline's first goal is to translate
the kernel calling convention into the BPF calling convention. The second goal
is to call a set of programs efficiently. In the future we can replace
BPF_PROG_RUN_ARRAY with a BPF trampoline variant. Other future work is to add
support for static_key, static_jmp and static_call inside the generated BPF
trampoline.

Please see patch 3 for details.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git/commit/?h=ftrace/direct&id=3ac423d902727884a389699fd7294c0e2e94b29c
[2]
https://lore.kernel.org/lkml/20191007082708.01393931.1@infradead.org/

Alexei Starovoitov (7):
  bpf, ftrace: temporary workaround
  bpf: refactor x86 JIT into helpers
  bpf: Introduce BPF trampoline
  libbpf: Add support to attach to fentry/fexit tracing progs
  selftest/bpf: Simple test for fentry/fexit
  bpf: Add kernel test functions for fentry testing
  selftests/bpf: Add test for BPF trampoline

 arch/x86/kernel/ftrace.c                      |  36 ++
 arch/x86/net/bpf_jit_comp.c                   | 352 +++++++++++++++---
 include/linux/bpf.h                           |  90 +++++
 include/uapi/linux/bpf.h                      |   2 +
 kernel/bpf/Makefile                           |   1 +
 kernel/bpf/btf.c                              |  73 +++-
 kernel/bpf/core.c                             |  31 ++
 kernel/bpf/syscall.c                          |  53 ++-
 kernel/bpf/trampoline.c                       | 252 +++++++++++++
 kernel/bpf/verifier.c                         |  39 ++
 net/bpf/test_run.c                            |  41 ++
 tools/include/uapi/linux/bpf.h                |   2 +
 tools/lib/bpf/bpf_helpers.h                   |  13 +
 tools/lib/bpf/libbpf.c                        |  55 ++-
 tools/lib/bpf/libbpf.h                        |   2 +
 tools/lib/bpf/libbpf.map                      |   1 +
 .../selftests/bpf/prog_tests/fentry_test.c    |  65 ++++
 .../selftests/bpf/prog_tests/kfree_skb.c      |  37 +-
 .../testing/selftests/bpf/progs/fentry_test.c |  90 +++++
 tools/testing/selftests/bpf/progs/kfree_skb.c |  52 +++
 20 files changed, 1215 insertions(+), 72 deletions(-)
 create mode 100644 kernel/bpf/trampoline.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/fentry_test.c
 create mode 100644 tools/testing/selftests/bpf/progs/fentry_test.c

-- 
2.23.0



* [PATCH bpf-next 1/7] bpf, ftrace: temporary workaround
  2019-11-02 22:00 [PATCH bpf-next 0/7] Introduce BPF trampoline Alexei Starovoitov
@ 2019-11-02 22:00 ` Alexei Starovoitov
  2019-11-02 22:00 ` [PATCH bpf-next 2/7] bpf: refactor x86 JIT into helpers Alexei Starovoitov
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2019-11-02 22:00 UTC (permalink / raw)
  To: davem; +Cc: daniel, peterz, rostedt, x86, netdev, bpf, kernel-team

Add a temporary workaround until the proper register_ftrace_direct() API
lands. This commit must be reverted during the upcoming merge window. The hack
functions don't have the safety checks that the ftrace API performs.
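
The only intended user is the BPF trampoline added later in this series; a
sketch of the call sites in bpf_trampoline_update() (patch 3) is:

  /* attach the generated trampoline at the function's fentry nop */
  err = register_ftrace_direct(tr->func.addr, (long)image);
  /* or, when a trampoline is already installed at that address */
  err = modify_ftrace_direct(tr->func.addr, (long)image);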

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 arch/x86/kernel/ftrace.c | 36 ++++++++++++++++++++++++++++++++++++
 include/linux/bpf.h      |  5 +++++
 kernel/bpf/core.c        | 30 ++++++++++++++++++++++++++++++
 3 files changed, 71 insertions(+)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 024c3053dbba..0fd7643da92d 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -265,6 +265,42 @@ static int update_ftrace_func(unsigned long ip, void *new)
 	return ret;
 }
 
+/*
+ * There are no safety checks here. It's a temporary hack until proper
+ * register_ftrace_direct() api lands. These functions blindly rewrite kernel
+ * text. Proper register_ftrace_direct() should check that IP is one of allowed
+ * fentry points. unregister_ftrace_direct() and modify_ftrace_direct() should
+ * do similar checks.
+ */
+int arch_register_ftrace_hack(unsigned long ip, unsigned long addr)
+{
+	u8 *new;
+	int ret;
+
+	ftrace_arch_code_modify_prepare();
+	new = ftrace_call_replace(ip, addr);
+	ret = update_ftrace_func(ip, new);
+	ftrace_arch_code_modify_post_process();
+	return ret;
+}
+
+int arch_unregister_ftrace_hack(unsigned long ip, unsigned long addr)
+{
+	u8 *old;
+	int ret;
+
+	ftrace_arch_code_modify_prepare();
+	old = (void *)ftrace_nop_replace();
+	ret = update_ftrace_func(ip, old);
+	ftrace_arch_code_modify_post_process();
+	return ret;
+}
+
+int arch_modify_ftrace_hack(unsigned long ip, unsigned long addr)
+{
+	return arch_register_ftrace_hack(ip, addr);
+}
+
 int ftrace_update_ftrace_func(ftrace_func_t func)
 {
 	unsigned long ip = (unsigned long)(&ftrace_call);
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7c7f518811a6..a8941f113298 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1157,4 +1157,9 @@ static inline u32 bpf_xdp_sock_convert_ctx_access(enum bpf_access_type type,
 }
 #endif /* CONFIG_INET */
 
+/* workaround */
+int register_ftrace_direct(unsigned long ip, unsigned long addr);
+int unregister_ftrace_direct(unsigned long ip, unsigned long addr);
+int modify_ftrace_direct(unsigned long ip, unsigned long addr);
+
 #endif /* _LINUX_BPF_H */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index df82d5a42b23..1bacf70e6509 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2140,6 +2140,36 @@ int __weak skb_copy_bits(const struct sk_buff *skb, int offset, void *to,
 	return -EFAULT;
 }
 
+int __weak arch_register_ftrace_hack(unsigned long ip, unsigned long addr)
+{
+	return -ENOTSUPP;
+}
+
+int __weak register_ftrace_direct(unsigned long ip, unsigned long addr)
+{
+	return arch_register_ftrace_hack(ip, addr);
+}
+
+int __weak arch_unregister_ftrace_hack(unsigned long ip, unsigned long addr)
+{
+	return -ENOTSUPP;
+}
+
+int __weak unregister_ftrace_direct(unsigned long ip, unsigned long addr)
+{
+	return arch_unregister_ftrace_hack(ip, addr);
+}
+
+int __weak arch_modify_ftrace_hack(unsigned long ip, unsigned long addr)
+{
+	return -ENOTSUPP;
+}
+
+int __weak modify_ftrace_direct(unsigned long ip, unsigned long addr)
+{
+	return arch_modify_ftrace_hack(ip, addr);
+}
+
 DEFINE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
 EXPORT_SYMBOL(bpf_stats_enabled_key);
 
-- 
2.23.0



* [PATCH bpf-next 2/7] bpf: refactor x86 JIT into helpers
  2019-11-02 22:00 [PATCH bpf-next 0/7] Introduce BPF trampoline Alexei Starovoitov
  2019-11-02 22:00 ` [PATCH bpf-next 1/7] bpf, ftrace: temporary workaround Alexei Starovoitov
@ 2019-11-02 22:00 ` Alexei Starovoitov
  2019-11-02 22:00 ` [PATCH bpf-next 3/7] bpf: Introduce BPF trampoline Alexei Starovoitov
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2019-11-02 22:00 UTC (permalink / raw)
  To: davem; +Cc: daniel, peterz, rostedt, x86, netdev, bpf, kernel-team

Refactor x86 JITing of the LDX, STX and CALL instructions into separate helper
functions.  There are no functional changes in the LDX and STX helpers.  There
is a minor change in the CALL helper: it now populates the target address
correctly on the first JIT pass instead of the second one. That doesn't reduce
the total number of JIT passes though.
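
For reference, the emitted call is the standard 5-byte near call; the core of
the new helper (shown in full in the diff below) is:

  offset = func - (ip + 5);       /* rel32 is relative to the next instruction */
  EMIT1_off32(0xE8, offset);      /* call rel32 */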

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 arch/x86/net/bpf_jit_comp.c | 151 +++++++++++++++++++++++-------------
 1 file changed, 97 insertions(+), 54 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 8cd23d8309bf..e73fc8a6d665 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -390,6 +390,100 @@ static void emit_mov_reg(u8 **pprog, bool is64, u32 dst_reg, u32 src_reg)
 	*pprog = prog;
 }
 
+/* LDX: dst_reg = *(u8*)(src_reg + off) */
+static void emit_ldx(u8 **pprog, u32 size, u32 dst_reg, u32 src_reg, int off)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+
+	switch (size) {
+	case BPF_B:
+		/* Emit 'movzx rax, byte ptr [rax + off]' */
+		EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x0F, 0xB6);
+		break;
+	case BPF_H:
+		/* Emit 'movzx rax, word ptr [rax + off]' */
+		EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x0F, 0xB7);
+		break;
+	case BPF_W:
+		/* Emit 'mov eax, dword ptr [rax+0x14]' */
+		if (is_ereg(dst_reg) || is_ereg(src_reg))
+			EMIT2(add_2mod(0x40, src_reg, dst_reg), 0x8B);
+		else
+			EMIT1(0x8B);
+		break;
+	case BPF_DW:
+		/* Emit 'mov rax, qword ptr [rax+0x14]' */
+		EMIT2(add_2mod(0x48, src_reg, dst_reg), 0x8B);
+		break;
+	}
+	/*
+	 * If insn->off == 0 we can save one extra byte, but
+	 * special case of x86 R13 which always needs an offset
+	 * is not worth the hassle
+	 */
+	if (is_imm8(off))
+		EMIT2(add_2reg(0x40, src_reg, dst_reg), off);
+	else
+		EMIT1_off32(add_2reg(0x80, src_reg, dst_reg), off);
+	*pprog = prog;
+}
+
+/* STX: *(u8*)(dst_reg + off) = src_reg */
+static void emit_stx(u8 **pprog, u32 size, u32 dst_reg, u32 src_reg, int off)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+
+	switch (size) {
+	case BPF_B:
+		/* Emit 'mov byte ptr [rax + off], al' */
+		if (is_ereg(dst_reg) || is_ereg(src_reg) ||
+		    /* We have to add extra byte for x86 SIL, DIL regs */
+		    src_reg == BPF_REG_1 || src_reg == BPF_REG_2)
+			EMIT2(add_2mod(0x40, dst_reg, src_reg), 0x88);
+		else
+			EMIT1(0x88);
+		break;
+	case BPF_H:
+		if (is_ereg(dst_reg) || is_ereg(src_reg))
+			EMIT3(0x66, add_2mod(0x40, dst_reg, src_reg), 0x89);
+		else
+			EMIT2(0x66, 0x89);
+		break;
+	case BPF_W:
+		if (is_ereg(dst_reg) || is_ereg(src_reg))
+			EMIT2(add_2mod(0x40, dst_reg, src_reg), 0x89);
+		else
+			EMIT1(0x89);
+		break;
+	case BPF_DW:
+		EMIT2(add_2mod(0x48, dst_reg, src_reg), 0x89);
+		break;
+	}
+	if (is_imm8(off))
+		EMIT2(add_2reg(0x40, dst_reg, src_reg), off);
+	else
+		EMIT1_off32(add_2reg(0x80, dst_reg, src_reg),
+			    off);
+	*pprog = prog;
+}
+
+static int emit_call(u8 **pprog, void *func, void *ip)
+{
+	u8 *prog = *pprog;
+	int cnt = 0;
+	s64 offset;
+
+	offset = func - (ip + 5);
+	if (!is_simm32(offset)) {
+		pr_err("Target call %p is out of range\n", func);
+		return -EINVAL;
+	}
+	EMIT1_off32(0xE8, offset);
+	*pprog = prog;
+	return 0;
+}
 
 static bool ex_handler_bpf(const struct exception_table_entry *x,
 			   struct pt_regs *regs, int trapnr,
@@ -773,68 +867,22 @@ st:			if (is_imm8(insn->off))
 
 			/* STX: *(u8*)(dst_reg + off) = src_reg */
 		case BPF_STX | BPF_MEM | BPF_B:
-			/* Emit 'mov byte ptr [rax + off], al' */
-			if (is_ereg(dst_reg) || is_ereg(src_reg) ||
-			    /* We have to add extra byte for x86 SIL, DIL regs */
-			    src_reg == BPF_REG_1 || src_reg == BPF_REG_2)
-				EMIT2(add_2mod(0x40, dst_reg, src_reg), 0x88);
-			else
-				EMIT1(0x88);
-			goto stx;
 		case BPF_STX | BPF_MEM | BPF_H:
-			if (is_ereg(dst_reg) || is_ereg(src_reg))
-				EMIT3(0x66, add_2mod(0x40, dst_reg, src_reg), 0x89);
-			else
-				EMIT2(0x66, 0x89);
-			goto stx;
 		case BPF_STX | BPF_MEM | BPF_W:
-			if (is_ereg(dst_reg) || is_ereg(src_reg))
-				EMIT2(add_2mod(0x40, dst_reg, src_reg), 0x89);
-			else
-				EMIT1(0x89);
-			goto stx;
 		case BPF_STX | BPF_MEM | BPF_DW:
-			EMIT2(add_2mod(0x48, dst_reg, src_reg), 0x89);
-stx:			if (is_imm8(insn->off))
-				EMIT2(add_2reg(0x40, dst_reg, src_reg), insn->off);
-			else
-				EMIT1_off32(add_2reg(0x80, dst_reg, src_reg),
-					    insn->off);
+			emit_stx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn->off);
 			break;
 
 			/* LDX: dst_reg = *(u8*)(src_reg + off) */
 		case BPF_LDX | BPF_MEM | BPF_B:
 		case BPF_LDX | BPF_PROBE_MEM | BPF_B:
-			/* Emit 'movzx rax, byte ptr [rax + off]' */
-			EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x0F, 0xB6);
-			goto ldx;
 		case BPF_LDX | BPF_MEM | BPF_H:
 		case BPF_LDX | BPF_PROBE_MEM | BPF_H:
-			/* Emit 'movzx rax, word ptr [rax + off]' */
-			EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x0F, 0xB7);
-			goto ldx;
 		case BPF_LDX | BPF_MEM | BPF_W:
 		case BPF_LDX | BPF_PROBE_MEM | BPF_W:
-			/* Emit 'mov eax, dword ptr [rax+0x14]' */
-			if (is_ereg(dst_reg) || is_ereg(src_reg))
-				EMIT2(add_2mod(0x40, src_reg, dst_reg), 0x8B);
-			else
-				EMIT1(0x8B);
-			goto ldx;
 		case BPF_LDX | BPF_MEM | BPF_DW:
 		case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
-			/* Emit 'mov rax, qword ptr [rax+0x14]' */
-			EMIT2(add_2mod(0x48, src_reg, dst_reg), 0x8B);
-ldx:			/*
-			 * If insn->off == 0 we can save one extra byte, but
-			 * special case of x86 R13 which always needs an offset
-			 * is not worth the hassle
-			 */
-			if (is_imm8(insn->off))
-				EMIT2(add_2reg(0x40, src_reg, dst_reg), insn->off);
-			else
-				EMIT1_off32(add_2reg(0x80, src_reg, dst_reg),
-					    insn->off);
+			emit_ldx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn->off);
 			if (BPF_MODE(insn->code) == BPF_PROBE_MEM) {
 				struct exception_table_entry *ex;
 				u8 *_insn = image + proglen;
@@ -899,13 +947,8 @@ xadd:			if (is_imm8(insn->off))
 			/* call */
 		case BPF_JMP | BPF_CALL:
 			func = (u8 *) __bpf_call_base + imm32;
-			jmp_offset = func - (image + addrs[i]);
-			if (!imm32 || !is_simm32(jmp_offset)) {
-				pr_err("unsupported BPF func %d addr %p image %p\n",
-				       imm32, func, image);
+			if (!imm32 || emit_call(&prog, func, image + addrs[i - 1]))
 				return -EINVAL;
-			}
-			EMIT1_off32(0xE8, jmp_offset);
 			break;
 
 		case BPF_JMP | BPF_TAIL_CALL:
-- 
2.23.0



* [PATCH bpf-next 3/7] bpf: Introduce BPF trampoline
  2019-11-02 22:00 [PATCH bpf-next 0/7] Introduce BPF trampoline Alexei Starovoitov
  2019-11-02 22:00 ` [PATCH bpf-next 1/7] bpf, ftrace: temporary workaround Alexei Starovoitov
  2019-11-02 22:00 ` [PATCH bpf-next 2/7] bpf: refactor x86 JIT into helpers Alexei Starovoitov
@ 2019-11-02 22:00 ` Alexei Starovoitov
  2019-11-05 19:51   ` Andrii Nakryiko
  2019-11-02 22:00 ` [PATCH bpf-next 4/7] libbpf: Add support to attach to fentry/fexit tracing progs Alexei Starovoitov
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2019-11-02 22:00 UTC (permalink / raw)
  To: davem; +Cc: daniel, peterz, rostedt, x86, netdev, bpf, kernel-team

Introduce the BPF trampoline concept to allow kernel code to call into BPF
programs with practically zero overhead.  The trampoline generation logic is
architecture dependent.  It converts the native calling convention into the
BPF calling convention.  The BPF ISA is 64-bit (even on 32-bit architectures).
Registers R1 to R5 are used to pass arguments into BPF functions. The main BPF
program accepts only a single argument "ctx" in R1, whereas the native CPU
calling conventions differ: x86-64 passes the first 6 arguments in registers
and the rest on the stack, x86-32 passes the first 3 arguments in registers,
sparc64 passes the first 6 in registers, and so on.

Trampolines between BPF and the kernel already exist.  The BPF_CALL_x macros
in include/linux/filter.h statically compile trampolines from BPF into kernel
helpers. They convert up to five u64 arguments into kernel C pointers and
integers. On 64-bit architectures these BPF-to-kernel trampolines are nops. On
32-bit architectures they're meaningful.

The opposite job, kernel-to-BPF trampolines, is done by the CAST_TO_U64 macros
and the __bpf_trace_##call() shim functions in include/trace/bpf_probe.h. They
convert kernel function arguments into an array of u64s that the BPF program
consumes via the R1=ctx pointer.
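
Conceptually, for a tracepoint with two arguments the shim boils down to
something like this (a rough sketch, not the actual macro expansion):

  /* kernel-to-BPF: pack the native arguments into a u64 array */
  u64 ctx[2];

  ctx[0] = (u64) (long) arg1;
  ctx[1] = (u64) (long) arg2;
  BPF_PROG_RUN(prog, ctx);        /* the BPF program receives &ctx[0] in R1 */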

This patch set does the same job as the __bpf_trace_##call() static
trampolines, but dynamically for any kernel function. There are ~22k global
kernel functions that are attachable via ftrace. The function arguments and
types are described in BTF.  The job of the btf_distill_kernel_func() function
is to extract useful information from BTF into a "function model" that the
architecture dependent trampoline generators will use to generate assembly
code that casts kernel function arguments into an array of u64s.  For example
the kernel function eth_type_trans has two pointer arguments. They will be
cast to u64 and stored onto the stack of the generated trampoline. The pointer
to that stack space will be passed into the BPF program in R1. On x86-64 such
a generated trampoline consumes 16 bytes of stack and performs two stores of
%rdi and %rsi into the stack. The verifier will make sure that only two u64s
are accessed read-only by the BPF program. The verifier will also recognize
the precise types of the pointers being accessed and will not allow
typecasting of a pointer to a different type within the BPF program.

The tracing use case in the datacenter demonstrated that certain key kernel
functions (like tcp_retransmit_skb) have 2 or more kprobes that are always
active.  Other functions have both a kprobe and a kretprobe.  So it is
essential to keep both kernel code and BPF programs executing at maximum
speed. Hence the BPF trampoline is re-generated every time a new program is
attached or detached, to maintain maximum performance.

To avoid the high cost of retpoline the attached BPF programs are called
directly. __bpf_prog_enter/exit() are used to support per-program execution
stats.  In the future this logic will be optimized further by adding support
for bpf_stats_enabled_key inside the generated assembly code. The introduction
of preemptible and sleepable BPF programs will completely remove the need to
call __bpf_prog_enter/exit().

Detaching a BPF program from the trampoline should not fail. To avoid memory
allocation in the detach path, half of the trampoline page is used as a
reserve and the two halves are flipped after each attach/detach. 2k bytes is
enough to call 40+ BPF programs directly, which suffices for BPF tracing use
cases. This limit can be increased in the future.
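
The flip itself is just pointer arithmetic on the trampoline page; from
bpf_trampoline_update() in this patch:

  void *old_image = tr->image + ((tr->selector + 1) & 1) * PAGE_SIZE/2;
  void *image = tr->image + (tr->selector & 1) * PAGE_SIZE/2;
  ...
  tr->selector++; /* flip the halves after a successful update */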

BPF_TRACE_FENTRY programs have access to the raw kernel function arguments,
while BPF_TRACE_FEXIT programs have access to the kernel return value as well.
Often a kprobe BPF program remembers function arguments in a map while the
corresponding kretprobe fetches them from the map and analyzes them together
with the return value. BPF_TRACE_FEXIT accelerates this typical use case.

Recursion prevention for kprobe BPF programs is done via the per-cpu
bpf_prog_active counter. In practice that turned out to be a mistake: it
caused programs to randomly skip execution and the tracing tools missed
results they were looking for. Hence the BPF trampoline doesn't provide
built-in recursion prevention. It's the job of the BPF program itself and will
be addressed in follow-up patches.

The BPF trampoline is intended to be used beyond tracing and fentry/fexit use
cases in the future, for example to remove the retpoline cost from XDP
programs.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 arch/x86/net/bpf_jit_comp.c | 201 ++++++++++++++++++++++++++++
 include/linux/bpf.h         |  85 ++++++++++++
 include/uapi/linux/bpf.h    |   2 +
 kernel/bpf/Makefile         |   1 +
 kernel/bpf/btf.c            |  73 ++++++++++-
 kernel/bpf/core.c           |   1 +
 kernel/bpf/syscall.c        |  53 +++++++-
 kernel/bpf/trampoline.c     | 252 ++++++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c       |  39 ++++++
 9 files changed, 699 insertions(+), 8 deletions(-)
 create mode 100644 kernel/bpf/trampoline.c

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index e73fc8a6d665..34b9e6bd57c0 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -96,6 +96,7 @@ static int bpf_size_to_x86_bytes(int bpf_size)
 
 /* Pick a register outside of BPF range for JIT internal work */
 #define AUX_REG (MAX_BPF_JIT_REG + 1)
+#define X86_REG_R9 (MAX_BPF_JIT_REG + 2)
 
 /*
  * The following table maps BPF registers to x86-64 registers.
@@ -121,6 +122,7 @@ static const int reg2hex[] = {
 	[BPF_REG_FP] = 5, /* RBP readonly */
 	[BPF_REG_AX] = 2, /* R10 temp register */
 	[AUX_REG] = 3,    /* R11 temp register */
+	[X86_REG_R9] = 1, /* R9 register, 6th function argument */
 };
 
 static const int reg2pt_regs[] = {
@@ -148,6 +150,7 @@ static bool is_ereg(u32 reg)
 			     BIT(BPF_REG_7) |
 			     BIT(BPF_REG_8) |
 			     BIT(BPF_REG_9) |
+			     BIT(X86_REG_R9) |
 			     BIT(BPF_REG_AX));
 }
 
@@ -1181,6 +1184,204 @@ xadd:			if (is_imm8(insn->off))
 	return proglen;
 }
 
+static void save_regs(struct btf_func_model *m, u8 **prog, int nr_args,
+		      int stack_size)
+{
+	int i;
+	/* Store function arguments to stack.
+	 * For a function that accepts two pointers the sequence will be:
+	 * mov QWORD PTR [rbp-0x10],rdi
+	 * mov QWORD PTR [rbp-0x8],rsi
+	 */
+	for (i = 0; i < min(nr_args, 6); i++)
+		emit_stx(prog, bytes_to_bpf_size(m->arg_size[i]),
+			 BPF_REG_FP,
+			 i == 5 ? X86_REG_R9 : BPF_REG_1 + i,
+			 -(stack_size - i * 8));
+}
+
+static void restore_regs(struct btf_func_model *m, u8 **prog, int nr_args,
+			 int stack_size)
+{
+	int i;
+
+	/* Restore function arguments from stack.
+	 * For a function that accepts two pointers the sequence will be:
+	 * EMIT4(0x48, 0x8B, 0x7D, 0xF0); mov rdi,QWORD PTR [rbp-0x10]
+	 * EMIT4(0x48, 0x8B, 0x75, 0xF8); mov rsi,QWORD PTR [rbp-0x8]
+	 */
+	for (i = 0; i < min(nr_args, 6); i++)
+		emit_ldx(prog, bytes_to_bpf_size(m->arg_size[i]),
+			 i == 5 ? X86_REG_R9 : BPF_REG_1 + i,
+			 BPF_REG_FP,
+			 -(stack_size - i * 8));
+}
+
+static int invoke_bpf(struct btf_func_model *m, u8 **pprog,
+		      struct bpf_prog **progs, int prog_cnt, int stack_size)
+{
+	u8 *prog = *pprog;
+	int cnt = 0, i;
+
+	for (i = 0; i < prog_cnt; i++) {
+		if (emit_call(&prog, __bpf_prog_enter, prog))
+			return -EINVAL;
+		/* remember prog start time returned by __bpf_prog_enter */
+		emit_mov_reg(&prog, true, BPF_REG_6, BPF_REG_0);
+
+		/* arg1: lea rdi, [rbp - stack_size] */
+		EMIT4(0x48, 0x8D, 0x7D, -stack_size);
+		/* arg2: progs[i]->insnsi for interpreter */
+		if (!progs[i]->jited)
+			emit_mov_imm64(&prog, BPF_REG_2,
+				       (long) progs[i]->insnsi >> 32,
+				       (u32) (long) progs[i]->insnsi);
+		/* call JITed bpf program or interpreter */
+		if (emit_call(&prog, progs[i]->bpf_func, prog))
+			return -EINVAL;
+
+		/* arg1: mov rdi, progs[i] */
+		emit_mov_imm64(&prog, BPF_REG_1, (long) progs[i] >> 32,
+			       (u32) (long) progs[i]);
+		/* arg2: mov rsi, rbx <- start time in nsec */
+		emit_mov_reg(&prog, true, BPF_REG_2, BPF_REG_6);
+		if (emit_call(&prog, __bpf_prog_exit, prog))
+			return -EINVAL;
+	}
+	*pprog = prog;
+	return 0;
+}
+
+/* Example:
+ * __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev);
+ * its 'struct btf_func_model' will be nr_args=2
+ * The assembly code when eth_type_trans is executing after trampoline:
+ *
+ * push rbp
+ * mov rbp, rsp
+ * sub rsp, 16                     // space for skb and dev
+ * push rbx                        // temp regs to pass start time
+ * mov qword ptr [rbp - 16], rdi   // save skb pointer to stack
+ * mov qword ptr [rbp - 8], rsi    // save dev pointer to stack
+ * call __bpf_prog_enter           // rcu_read_lock and preempt_disable
+ * mov rbx, rax                    // remember start time if bpf stats are enabled
+ * lea rdi, [rbp - 16]             // R1==ctx of bpf prog
+ * call addr_of_jited_FENTRY_prog
+ * movabsq rdi, 64bit_addr_of_struct_bpf_prog  // unused if bpf stats are off
+ * mov rsi, rbx                    // prog start time
+ * call __bpf_prog_exit            // rcu_read_unlock, preempt_enable and stats math
+ * mov rdi, qword ptr [rbp - 16]   // restore skb pointer from stack
+ * mov rsi, qword ptr [rbp - 8]    // restore dev pointer from stack
+ * pop rbx
+ * leave
+ * ret
+ *
+ * eth_type_trans has 5 byte nop at the beginning. These 5 bytes will be
+ * replaced with 'call generated_bpf_trampoline'. When it returns
+ * eth_type_trans will continue executing with original skb and dev pointers.
+ *
+ * The assembly code when eth_type_trans is called from trampoline:
+ *
+ * push rbp
+ * mov rbp, rsp
+ * sub rsp, 24                     // space for skb, dev, return value
+ * push rbx                        // temp regs to pass start time
+ * mov qword ptr [rbp - 24], rdi   // save skb pointer to stack
+ * mov qword ptr [rbp - 16], rsi   // save dev pointer to stack
+ * call __bpf_prog_enter           // rcu_read_lock and preempt_disable
+ * mov rbx, rax                    // remember start time if bpf stats are enabled
+ * lea rdi, [rbp - 24]             // R1==ctx of bpf prog
+ * call addr_of_jited_FENTRY_prog  // bpf prog can access skb and dev
+ * movabsq rdi, 64bit_addr_of_struct_bpf_prog  // unused if bpf stats are off
+ * mov rsi, rbx                    // prog start time
+ * call __bpf_prog_exit            // rcu_read_unlock, preempt_enable and stats math
+ * mov rdi, qword ptr [rbp - 24]   // restore skb pointer from stack
+ * mov rsi, qword ptr [rbp - 16]   // restore dev pointer from stack
+ * call eth_type_trans+5           // execute body of eth_type_trans
+ * mov qword ptr [rbp - 8], rax    // save return value
+ * call __bpf_prog_enter           // rcu_read_lock and preempt_disable
+ * mov rbx, rax                    // remember start time if bpf stats are enabled
+ * lea rdi, [rbp - 24]             // R1==ctx of bpf prog
+ * call addr_of_jited_FEXIT_prog   // bpf prog can access skb, dev, return value
+ * movabsq rdi, 64bit_addr_of_struct_bpf_prog  // unused if bpf stats are off
+ * mov rsi, rbx                    // prog start time
+ * call __bpf_prog_exit            // rcu_read_unlock, preempt_enable and stats math
+ * mov rax, qword ptr [rbp - 8]    // restore eth_type_trans's return value
+ * pop rbx
+ * leave
+ * add rsp, 8                      // skip eth_type_trans's frame
+ * ret                             // return to its caller
+ */
+int arch_prepare_bpf_trampoline(void *image, struct btf_func_model *m, u32 flags,
+				struct bpf_prog **fentry_progs, int fentry_cnt,
+				struct bpf_prog **fexit_progs, int fexit_cnt,
+				long orig_call)
+{
+	int cnt = 0, nr_args = m->nr_args;
+	int stack_size = nr_args * 8;
+	u8 *prog;
+
+	/* x86-64 supports up to 6 arguments. 7+ can be added in the future */
+	if (nr_args > 6)
+		return -ENOTSUPP;
+
+	if ((flags & BPF_TRAMP_F_RESTORE_REGS) &&
+	    (flags & BPF_TRAMP_F_SKIP_FRAME))
+		return -EINVAL;
+
+	if (flags & BPF_TRAMP_F_CALL_ORIG)
+		stack_size += 8; /* room for return value of orig_call */
+
+	prog = image;
+
+	EMIT1(0x55);		 /* push rbp */
+	EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */
+	EMIT4(0x48, 0x83, 0xEC, stack_size); /* sub rsp, stack_size */
+	EMIT1(0x53);		 /* push rbx */
+
+	save_regs(m, &prog, nr_args, stack_size);
+
+	if (fentry_cnt)
+		if (invoke_bpf(m, &prog, fentry_progs, fentry_cnt, stack_size))
+			return -EINVAL;
+
+	if (flags & BPF_TRAMP_F_CALL_ORIG) {
+		if (fentry_cnt)
+			restore_regs(m, &prog, nr_args, stack_size);
+
+		/* call original function */
+		if (emit_call(&prog, (void *)orig_call, prog))
+			return -EINVAL;
+		/* remember return value in a stack for bpf prog to access */
+		emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8);
+	}
+
+	if (fexit_cnt)
+		if (invoke_bpf(m, &prog, fexit_progs, fexit_cnt, stack_size))
+			return -EINVAL;
+
+	if (flags & BPF_TRAMP_F_RESTORE_REGS)
+		restore_regs(m, &prog, nr_args, stack_size);
+
+	if (flags & BPF_TRAMP_F_CALL_ORIG)
+		/* restore original return value back into RAX */
+		emit_ldx(&prog, BPF_DW, BPF_REG_0, BPF_REG_FP, -8);
+
+	EMIT1(0x5B); /* pop rbx */
+	EMIT1(0xC9); /* leave */
+	if (flags & BPF_TRAMP_F_SKIP_FRAME)
+		/* skip our return address and return to parent */
+		EMIT4(0x48, 0x83, 0xC4, 8); /* add rsp, 8 */
+	EMIT1(0xC3); /* ret */
+	/* One half of the page holds the currently active trampoline.
+	 * The other half is an area for the next trampoline.
+	 * Make sure the trampoline generation logic doesn't overflow.
+	 */
+	if (WARN_ON_ONCE(prog - (u8 *)image > PAGE_SIZE/2))
+		return -EFAULT;
+	return 0;
+}
+
 struct x64_jit_data {
 	struct bpf_binary_header *header;
 	int *addrs;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a8941f113298..2a7720021202 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -14,6 +14,7 @@
 #include <linux/numa.h>
 #include <linux/wait.h>
 #include <linux/u64_stats_sync.h>
+#include <linux/refcount.h>
 
 struct bpf_verifier_env;
 struct bpf_verifier_log;
@@ -384,6 +385,77 @@ struct bpf_prog_stats {
 	struct u64_stats_sync syncp;
 } __aligned(2 * sizeof(u64));
 
+struct btf_func_model {
+	u8 ret_size;
+	u8 nr_args;
+	u8 arg_size[MAX_BPF_FUNC_ARGS];
+};
+
+/* Restore arguments before returning from trampoline to let original function
+ * continue executing. This flag is used for fentry progs when there are no
+ * fexit progs.
+ */
+#define BPF_TRAMP_F_RESTORE_REGS	BIT(0)
+/* Call original function after fentry progs, but before fexit progs.
+ * Makes sense for fentry/fexit, normal calls and indirect calls.
+ */
+#define BPF_TRAMP_F_CALL_ORIG		BIT(1)
+/* Skip current frame and return to parent.  Makes sense for fentry/fexit
+ * programs only. Should not be used with normal calls and indirect calls.
+ */
+#define BPF_TRAMP_F_SKIP_FRAME		BIT(2)
+
+/* Different use cases for BPF trampoline:
+ * 1. replace nop at the function entry (kprobe equivalent)
+ *    flags = BPF_TRAMP_F_RESTORE_REGS
+ *    fentry = a set of programs to run before returning from trampoline
+ *
+ * 2. replace nop at the function entry (kprobe + kretprobe equivalent)
+ *    flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME
+ *    orig_call = fentry_ip + MCOUNT_INSN_SIZE
+ *    fentry = a set of program to run before calling original function
+ *    fexit = a set of program to run after original function
+ *
+ * 3. replace direct call instruction anywhere in the function body
+ *    or assign a function pointer for indirect call (like tcp_congestion_ops->cong_avoid)
+ *    With flags = 0
+ *      fentry = a set of programs to run before returning from trampoline
+ *    With flags = BPF_TRAMP_F_CALL_ORIG
+ *      orig_call = original callback addr or direct function addr
+ *      fentry = a set of program to run before calling original function
+ *      fexit = a set of program to run after original function
+ */
+int arch_prepare_bpf_trampoline(void *image, struct btf_func_model *m, u32 flags,
+				struct bpf_prog **fentry_progs, int fentry_cnt,
+				struct bpf_prog **fexit_progs, int fexit_cnt,
+				long orig_call);
+/* these two functions are called from generated trampoline */
+u64 notrace __bpf_prog_enter(void);
+void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start);
+
+enum bpf_tramp_prog_type {
+	BPF_TRAMP_FENTRY,
+	BPF_TRAMP_FEXIT,
+	BPF_TRAMP_MAX
+};
+
+struct bpf_trampoline {
+	struct hlist_node hlist;
+	refcount_t refcnt;
+	u32 btf_id;
+	struct {
+		struct btf_func_model model;
+		long addr;
+	} func;
+	struct hlist_head progs_hlist[BPF_TRAMP_MAX];
+	int progs_cnt[BPF_TRAMP_MAX];
+	void *image;
+	u64 selector;
+};
+struct bpf_trampoline *bpf_trampoline_lookup(u32 btf_id);
+int bpf_trampoline_link_prog(struct bpf_prog *prog);
+int bpf_trampoline_unlink_prog(struct bpf_prog *prog);
+
 struct bpf_prog_aux {
 	atomic_t refcnt;
 	u32 used_map_cnt;
@@ -398,6 +470,9 @@ struct bpf_prog_aux {
 	bool verifier_zext; /* Zero extensions has been inserted by verifier. */
 	bool offload_requested;
 	bool attach_btf_trace; /* true if attaching to BTF-enabled raw tp */
+	enum bpf_tramp_prog_type trampoline_prog_type;
+	struct bpf_trampoline *trampoline;
+	struct hlist_node tramp_hlist;
 	/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
 	const struct btf_type *attach_func_proto;
 	/* function name for valid attach_btf_id */
@@ -671,6 +746,7 @@ struct bpf_prog * __must_check bpf_prog_inc_not_zero(struct bpf_prog *prog);
 void bpf_prog_put(struct bpf_prog *prog);
 int __bpf_prog_charge(struct user_struct *user, u32 pages);
 void __bpf_prog_uncharge(struct user_struct *user, u32 pages);
+void bpf_trampoline_put(struct bpf_trampoline *tr);
 
 void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock);
 void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock);
@@ -784,6 +860,11 @@ int btf_struct_access(struct bpf_verifier_log *log,
 		      u32 *next_btf_id);
 u32 btf_resolve_helper_id(struct bpf_verifier_log *log, void *, int);
 
+int btf_distill_kernel_func(struct bpf_verifier_log *log,
+			    const struct btf_type *func_proto,
+			    const char *func_name,
+			    struct btf_func_model *m);
+
 #else /* !CONFIG_BPF_SYSCALL */
 static inline struct bpf_prog *bpf_prog_get(u32 ufd)
 {
@@ -852,6 +933,10 @@ static inline void __dev_map_flush(struct bpf_map *map)
 {
 }
 
+static inline void bpf_trampoline_put(struct bpf_trampoline *tr)
+{
+}
+
 struct xdp_buff;
 struct bpf_dtab_netdev;
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index df6809a76404..69c200e6e696 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -201,6 +201,8 @@ enum bpf_attach_type {
 	BPF_CGROUP_GETSOCKOPT,
 	BPF_CGROUP_SETSOCKOPT,
 	BPF_TRACE_RAW_TP,
+	BPF_TRACE_FENTRY,
+	BPF_TRACE_FEXIT,
 	__MAX_BPF_ATTACH_TYPE
 };
 
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index e1d9adb212f9..f7b0bd1c7bc2 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -6,6 +6,7 @@ obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
 obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o
 obj-$(CONFIG_BPF_SYSCALL) += disasm.o
+obj-$(CONFIG_BPF_SYSCALL) += trampoline.o
 obj-$(CONFIG_BPF_SYSCALL) += btf.o
 ifeq ($(CONFIG_NET),y)
 obj-$(CONFIG_BPF_SYSCALL) += devmap.o
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 128d89601d73..64ba1fd5b6a0 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3441,13 +3441,18 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 		args++;
 		nr_args--;
 	}
-	if (arg >= nr_args) {
+
+	if (prog->expected_attach_type == BPF_TRACE_FEXIT &&
+	    arg == nr_args) {
+		/* function return type */
+		t = btf_type_by_id(btf_vmlinux, t->type);
+	} else if (arg >= nr_args) {
 		bpf_log(log, "func '%s' doesn't have %d-th argument\n",
-			tname, arg);
+			tname, arg + 1);
 		return false;
+	} else {
+		t = btf_type_by_id(btf_vmlinux, args[arg].type);
 	}
-
-	t = btf_type_by_id(btf_vmlinux, args[arg].type);
 	/* skip modifiers */
 	while (btf_type_is_modifier(t))
 		t = btf_type_by_id(btf_vmlinux, t->type);
@@ -3647,6 +3652,66 @@ u32 btf_resolve_helper_id(struct bpf_verifier_log *log, void *fn, int arg)
 	return btf_id;
 }
 
+static int __get_type_size(u32 btf_id, const struct btf_type **bad_type)
+{
+	const struct btf_type *t;
+
+	if (!btf_id)
+		/* void */
+		return 0;
+	t = btf_type_by_id(btf_vmlinux, btf_id);
+	while (btf_type_is_modifier(t))
+		t = btf_type_by_id(btf_vmlinux, t->type);
+	if (btf_type_is_ptr(t))
+		/* kernel size of pointer. Not BPF's size of pointer*/
+		return sizeof(void *);
+	if (btf_type_is_int(t) || btf_type_is_enum(t))
+		return t->size;
+	*bad_type = t;
+	return -EINVAL;
+}
+
+int btf_distill_kernel_func(struct bpf_verifier_log *log,
+			    const struct btf_type *func,
+			    const char *tname,
+			    struct btf_func_model *m)
+{
+	const struct btf_param *args;
+	const struct btf_type *t;
+	u32 i, nargs;
+	int ret;
+
+	args = (const struct btf_param *)(func + 1);
+	nargs = btf_type_vlen(func);
+	if (nargs >= MAX_BPF_FUNC_ARGS) {
+		bpf_log(log,
+			"The function %s has %d arguments. Too many.\n",
+			tname, nargs);
+		return -EINVAL;
+	}
+	ret = __get_type_size(func->type, &t);
+	if (ret < 0) {
+		bpf_log(log,
+			"The function %s return type %s is unsupported.\n",
+			tname, btf_kind_str[BTF_INFO_KIND(t->info)]);
+		return -EINVAL;
+	}
+	m->ret_size = ret;
+
+	for (i = 0; i < nargs; i++) {
+		ret = __get_type_size(args[i].type, &t);
+		if (ret < 0) {
+			bpf_log(log,
+				"The function %s arg%d type %s is unsupported.\n",
+				tname, i, btf_kind_str[BTF_INFO_KIND(t->info)]);
+			return -EINVAL;
+		}
+		m->arg_size[i] = ret;
+	}
+	m->nr_args = nargs;
+	return 0;
+}
+
 void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
 		       struct seq_file *m)
 {
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 1bacf70e6509..26b0be03c748 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2011,6 +2011,7 @@ static void bpf_prog_free_deferred(struct work_struct *work)
 	if (aux->prog->has_callchain_buf)
 		put_callchain_buffers();
 #endif
+	bpf_trampoline_put(aux->trampoline);
 	for (i = 0; i < aux->func_cnt; i++)
 		bpf_jit_free(aux->func[i]);
 	if (aux->func_cnt) {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 985d01ced196..8e435df99b0c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1790,6 +1790,49 @@ static int bpf_obj_get(const union bpf_attr *attr)
 				attr->file_flags);
 }
 
+static int bpf_tracing_prog_release(struct inode *inode, struct file *filp)
+{
+	struct bpf_prog *prog = filp->private_data;
+
+	WARN_ON_ONCE(bpf_trampoline_unlink_prog(prog));
+	bpf_prog_put(prog);
+	return 0;
+}
+
+static const struct file_operations bpf_tracing_prog_fops = {
+	.release	= bpf_tracing_prog_release,
+	.read		= bpf_dummy_read,
+	.write		= bpf_dummy_write,
+};
+
+static int bpf_tracing_prog_attach(struct bpf_prog *prog)
+{
+	int tr_fd, err;
+
+	if (prog->expected_attach_type != BPF_TRACE_FENTRY &&
+	    prog->expected_attach_type != BPF_TRACE_FEXIT) {
+		err = -EINVAL;
+		goto out_put_prog;
+	}
+
+	err = bpf_trampoline_link_prog(prog);
+	if (err)
+		goto out_put_prog;
+
+	tr_fd = anon_inode_getfd("bpf-tracing-prog", &bpf_tracing_prog_fops,
+				 prog, O_CLOEXEC);
+	if (tr_fd < 0) {
+		WARN_ON_ONCE(bpf_trampoline_unlink_prog(prog));
+		err = tr_fd;
+		goto out_put_prog;
+	}
+	return tr_fd;
+
+out_put_prog:
+	bpf_prog_put(prog);
+	return err;
+}
+
 struct bpf_raw_tracepoint {
 	struct bpf_raw_event_map *btp;
 	struct bpf_prog *prog;
@@ -1841,14 +1884,16 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
 
 	if (prog->type == BPF_PROG_TYPE_TRACING) {
 		if (attr->raw_tracepoint.name) {
-			/* raw_tp name should not be specified in raw_tp
-			 * programs that were verified via in-kernel BTF info
+			/* The attach point for this category of programs
+			 * should be specified via btf_id during program load.
 			 */
 			err = -EINVAL;
 			goto out_put_prog;
 		}
-		/* raw_tp name is taken from type name instead */
-		tp_name = prog->aux->attach_func_name;
+		if (prog->expected_attach_type == BPF_TRACE_RAW_TP)
+			tp_name = prog->aux->attach_func_name;
+		else
+			return bpf_tracing_prog_attach(prog);
 	} else {
 		if (strncpy_from_user(buf,
 				      u64_to_user_ptr(attr->raw_tracepoint.name),
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
new file mode 100644
index 000000000000..d3ac233c8193
--- /dev/null
+++ b/kernel/bpf/trampoline.c
@@ -0,0 +1,252 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2019 Facebook */
+#include <linux/hash.h>
+#include <linux/bpf.h>
+#include <linux/ftrace.h>
+#include <linux/filter.h>
+
+#ifndef MCOUNT_INSN_SIZE
+#define MCOUNT_INSN_SIZE 0
+#endif
+
+/* btf_vmlinux has ~22k attachable functions. 1k htab is enough. */
+#define TRAMPOLINE_HASH_BITS 10
+#define TRAMPOLINE_TABLE_SIZE (1 << TRAMPOLINE_HASH_BITS)
+
+static struct hlist_head trampoline_table[TRAMPOLINE_TABLE_SIZE];
+
+static DEFINE_MUTEX(trampoline_mutex);
+
+struct bpf_trampoline *bpf_trampoline_lookup(u32 btf_id)
+{
+	struct bpf_trampoline *tr;
+	struct hlist_head *head;
+	void *image;
+	int i;
+
+	mutex_lock(&trampoline_mutex);
+	head = &trampoline_table[hash_32(btf_id, TRAMPOLINE_HASH_BITS)];
+	hlist_for_each_entry(tr, head, hlist) {
+		if (tr->btf_id == btf_id) {
+			refcount_inc(&tr->refcnt);
+			goto out;
+		}
+	}
+	tr = kzalloc(sizeof(*tr), GFP_KERNEL);
+	if (!tr)
+		goto out;
+
+	/* is_root was checked earlier. No need for bpf_jit_charge_modmem() */
+	image = bpf_jit_alloc_exec(PAGE_SIZE);
+	if (!image || MCOUNT_INSN_SIZE == 0) {
+		kfree(tr);
+		tr = NULL;
+		goto out;
+	}
+
+	tr->btf_id = btf_id;
+	INIT_HLIST_NODE(&tr->hlist);
+	hlist_add_head(&tr->hlist, head);
+	refcount_set(&tr->refcnt, 1);
+	for (i = 0; i < BPF_TRAMP_MAX; i++)
+		INIT_HLIST_HEAD(&tr->progs_hlist[i]);
+
+	set_vm_flush_reset_perms(image);
+	/* Keep image writable. The alternative is to keep flipping ro/rw
+	 * every time a new program is attached or detached.
+	 */
+	set_memory_x((long)image, 1);
+	tr->image = image;
+out:
+	mutex_unlock(&trampoline_mutex);
+	return tr;
+}
+
+/* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
+ * bytes on x86.  Pick a number to fit into PAGE_SIZE / 2
+ */
+#define BPF_MAX_TRAMP_PROGS 40
+
+static int bpf_trampoline_update(struct bpf_prog *prog)
+{
+	struct bpf_trampoline *tr = prog->aux->trampoline;
+	void *old_image = tr->image + ((tr->selector + 1) & 1) * PAGE_SIZE/2;
+	void *image = tr->image + (tr->selector & 1) * PAGE_SIZE/2;
+	struct bpf_prog *progs_to_run[BPF_MAX_TRAMP_PROGS];
+	int fentry_cnt = tr->progs_cnt[BPF_TRAMP_FENTRY];
+	int fexit_cnt = tr->progs_cnt[BPF_TRAMP_FEXIT];
+	struct bpf_prog **progs, **fentry, **fexit;
+	u32 flags = BPF_TRAMP_F_RESTORE_REGS;
+	struct bpf_prog_aux *aux;
+	int err;
+
+	if (fentry_cnt + fexit_cnt == 0) {
+		err = unregister_ftrace_direct(tr->func.addr, (long)old_image);
+		tr->selector = 0;
+		goto out;
+	}
+
+	/* populate fentry progs */
+	fentry = progs = progs_to_run;
+	hlist_for_each_entry(aux, &tr->progs_hlist[BPF_TRAMP_FENTRY], tramp_hlist)
+		*progs++ = aux->prog;
+
+	/* populate fexit progs */
+	fexit = progs;
+	hlist_for_each_entry(aux, &tr->progs_hlist[BPF_TRAMP_FEXIT], tramp_hlist)
+		*progs++ = aux->prog;
+
+	if (fexit_cnt)
+		flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
+
+	err = arch_prepare_bpf_trampoline(image, &tr->func.model, flags,
+					  fentry, fentry_cnt,
+					  fexit, fexit_cnt,
+					  tr->func.addr + MCOUNT_INSN_SIZE);
+	if (err)
+		goto out;
+
+	if (tr->selector)
+		/* progs already running at this address */
+		err = modify_ftrace_direct(tr->func.addr, (long)image);
+	else
+		/* first time registering */
+		err = register_ftrace_direct(tr->func.addr, (long)image);
+
+	if (err)
+		goto out;
+	tr->selector++;
+out:
+	return err;
+}
+
+static enum bpf_tramp_prog_type bpf_attach_type_to_tramp(enum bpf_attach_type t)
+{
+	switch (t) {
+	case BPF_TRACE_FENTRY:
+		return BPF_TRAMP_FENTRY;
+	default:
+		return BPF_TRAMP_FEXIT;
+	}
+}
+
+int bpf_trampoline_link_prog(struct bpf_prog *prog)
+{
+	enum bpf_tramp_prog_type kind;
+	struct bpf_trampoline *tr;
+	int err = 0;
+
+	tr = prog->aux->trampoline;
+	kind = bpf_attach_type_to_tramp(prog->expected_attach_type);
+	mutex_lock(&trampoline_mutex);
+	if (tr->progs_cnt[BPF_TRAMP_FENTRY] + tr->progs_cnt[BPF_TRAMP_FEXIT]
+	    >= BPF_MAX_TRAMP_PROGS) {
+		err = -E2BIG;
+		goto out;
+	}
+	hlist_add_head(&prog->aux->tramp_hlist, &tr->progs_hlist[kind]);
+	tr->progs_cnt[kind]++;
+	err = bpf_trampoline_update(prog);
+	if (err) {
+		hlist_del(&prog->aux->tramp_hlist);
+		tr->progs_cnt[kind]--;
+	}
+out:
+	mutex_unlock(&trampoline_mutex);
+	return err;
+}
+
+/* bpf_trampoline_unlink_prog() should never fail.
+ * The possible exception is failure in unregister/modify_ftrace_direct()
+ * which will be almost fatal.
+ */
+int bpf_trampoline_unlink_prog(struct bpf_prog *prog)
+{
+	enum bpf_tramp_prog_type kind;
+	struct bpf_trampoline *tr;
+	int err;
+
+	tr = prog->aux->trampoline;
+	kind = bpf_attach_type_to_tramp(prog->expected_attach_type);
+	mutex_lock(&trampoline_mutex);
+	hlist_del(&prog->aux->tramp_hlist);
+	tr->progs_cnt[kind]--;
+	err = bpf_trampoline_update(prog);
+	mutex_unlock(&trampoline_mutex);
+	return err;
+}
+
+void bpf_trampoline_put(struct bpf_trampoline *tr)
+{
+	if (!tr)
+		return;
+	mutex_lock(&trampoline_mutex);
+	if (!refcount_dec_and_test(&tr->refcnt))
+		goto out;
+	if (WARN_ON_ONCE(!hlist_empty(&tr->progs_hlist[BPF_TRAMP_FENTRY])))
+		goto out;
+	if (WARN_ON_ONCE(!hlist_empty(&tr->progs_hlist[BPF_TRAMP_FEXIT])))
+		goto out;
+	bpf_jit_free_exec(tr->image);
+	hlist_del(&tr->hlist);
+	kfree(tr);
+out:
+	mutex_unlock(&trampoline_mutex);
+}
+
+/* The logic is similar to BPF_PROG_RUN, but with the explicit rcu and preempt
+ * handling that the trampoline needs. The macro is split into
+ * call __bpf_prog_enter
+ * call prog->bpf_func
+ * call __bpf_prog_exit
+ */
+u64 notrace __bpf_prog_enter(void)
+{
+	u64 start = 0;
+
+	rcu_read_lock();
+	preempt_disable();
+	if (static_branch_unlikely(&bpf_stats_enabled_key))
+		start = sched_clock();
+	return start;
+}
+
+void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start)
+{
+	struct bpf_prog_stats *stats;
+
+	if (static_branch_unlikely(&bpf_stats_enabled_key) &&
+	    /* static_key could be enabled in __bpf_prog_enter
+	     * and disabled in __bpf_prog_exit.
+	     * And vice versa.
+	     * Hence check that 'start' is not zero.
+	     */
+	    start) {
+		stats = this_cpu_ptr(prog->aux->stats);
+		u64_stats_update_begin(&stats->syncp);
+		stats->cnt++;
+		stats->nsecs += sched_clock() - start;
+		u64_stats_update_end(&stats->syncp);
+	}
+	preempt_enable();
+	rcu_read_unlock();
+}
+
+int __weak
+arch_prepare_bpf_trampoline(void *image, struct btf_func_model *m, u32 flags,
+			    struct bpf_prog **fentry_progs, int fentry_cnt,
+			    struct bpf_prog **fexit_progs, int fexit_cnt,
+			    long orig_call)
+{
+	return -ENOTSUPP;
+}
+
+static int __init init_trampolines(void)
+{
+	int i;
+
+	for (i = 0; i < TRAMPOLINE_TABLE_SIZE; i++)
+		INIT_HLIST_HEAD(&trampoline_table[i]);
+	return 0;
+}
+late_initcall(init_trampolines);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2f2374967b36..351501745c79 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9382,8 +9382,11 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	struct bpf_prog *prog = env->prog;
 	u32 btf_id = prog->aux->attach_btf_id;
 	const char prefix[] = "btf_trace_";
+	struct bpf_trampoline *tr;
 	const struct btf_type *t;
 	const char *tname;
+	long addr;
+	int ret;
 
 	if (prog->type != BPF_PROG_TYPE_TRACING)
 		return 0;
@@ -9432,6 +9435,42 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 		prog->aux->attach_func_proto = t;
 		prog->aux->attach_btf_trace = true;
 		return 0;
+	case BPF_TRACE_FENTRY:
+	case BPF_TRACE_FEXIT:
+		if (!btf_type_is_func(t)) {
+			verbose(env, "attach_btf_id %u is not a function\n",
+				btf_id);
+			return -EINVAL;
+		}
+		t = btf_type_by_id(btf_vmlinux, t->type);
+		if (!btf_type_is_func_proto(t))
+			return -EINVAL;
+		tr = bpf_trampoline_lookup(btf_id);
+		if (!tr)
+			return -ENOMEM;
+		prog->aux->attach_func_name = tname;
+		prog->aux->attach_func_proto = t;
+		if (tr->func.addr) {
+			prog->aux->trampoline = tr;
+			return 0;
+		}
+		ret = btf_distill_kernel_func(&env->log, t, tname,
+					      &tr->func.model);
+		if (ret < 0) {
+			bpf_trampoline_put(tr);
+			return ret;
+		}
+		addr = kallsyms_lookup_name(tname);
+		if (!addr) {
+			verbose(env,
+				"The address of function %s cannot be found\n",
+				tname);
+			bpf_trampoline_put(tr);
+			return -ENOENT;
+		}
+		tr->func.addr = addr;
+		prog->aux->trampoline = tr;
+		return 0;
 	default:
 		return -EINVAL;
 	}
-- 
2.23.0



* [PATCH bpf-next 4/7] libbpf: Add support to attach to fentry/fexit tracing progs
  2019-11-02 22:00 [PATCH bpf-next 0/7] Introduce BPF trampoline Alexei Starovoitov
                   ` (2 preceding siblings ...)
  2019-11-02 22:00 ` [PATCH bpf-next 3/7] bpf: Introduce BPF trampoline Alexei Starovoitov
@ 2019-11-02 22:00 ` Alexei Starovoitov
  2019-11-05 21:17   ` Andrii Nakryiko
  2019-11-02 22:00 ` [PATCH bpf-next 5/7] selftest/bpf: Simple test for fentry/fexit Alexei Starovoitov
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2019-11-02 22:00 UTC (permalink / raw)
  To: davem; +Cc: daniel, peterz, rostedt, x86, netdev, bpf, kernel-team

Teach libbpf to recognize tracing program types and attach them to
fentry/fexit.
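
Typical usage, as exercised by the selftest in the next patch, looks roughly
like this:

  struct bpf_program *fentry;
  struct bpf_link *link_fentry;

  fentry = bpf_object__find_program_by_title(obj, "fentry/eth_type_trans");
  link_fentry = bpf_program__attach_trace(fentry);
  if (IS_ERR(link_fentry))
          /* handle the error */;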

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/include/uapi/linux/bpf.h |  2 ++
 tools/lib/bpf/libbpf.c         | 55 +++++++++++++++++++++++++++++-----
 tools/lib/bpf/libbpf.h         |  2 ++
 tools/lib/bpf/libbpf.map       |  1 +
 4 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index df6809a76404..69c200e6e696 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -201,6 +201,8 @@ enum bpf_attach_type {
 	BPF_CGROUP_GETSOCKOPT,
 	BPF_CGROUP_SETSOCKOPT,
 	BPF_TRACE_RAW_TP,
+	BPF_TRACE_FENTRY,
+	BPF_TRACE_FEXIT,
 	__MAX_BPF_ATTACH_TYPE
 };
 
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 7aa2a2a22cef..03e784f36dd9 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -3744,7 +3744,7 @@ bpf_object__load_progs(struct bpf_object *obj, int log_level)
 	return 0;
 }
 
-static int libbpf_attach_btf_id_by_name(const char *name, __u32 *btf_id);
+static int libbpf_attach_btf_id_by_name(const char *name, __u32 *btf_id, bool raw_tp);
 
 static struct bpf_object *
 __bpf_object__open(const char *path, const void *obj_buf, size_t obj_buf_sz,
@@ -3811,7 +3811,9 @@ __bpf_object__open(const char *path, const void *obj_buf, size_t obj_buf_sz,
 		bpf_program__set_type(prog, prog_type);
 		bpf_program__set_expected_attach_type(prog, attach_type);
 		if (prog_type == BPF_PROG_TYPE_TRACING) {
-			err = libbpf_attach_btf_id_by_name(prog->section_name, &btf_id);
+			err = libbpf_attach_btf_id_by_name(prog->section_name,
+							   &btf_id,
+							   attach_type == BPF_TRACE_RAW_TP);
 			if (err)
 				goto out;
 			prog->attach_btf_id = btf_id;
@@ -4813,6 +4815,10 @@ static const struct {
 	BPF_PROG_SEC("raw_tp/",			BPF_PROG_TYPE_RAW_TRACEPOINT),
 	BPF_PROG_BTF("tp_btf/",			BPF_PROG_TYPE_TRACING,
 						BPF_TRACE_RAW_TP),
+	BPF_PROG_BTF("fentry/",			BPF_PROG_TYPE_TRACING,
+						BPF_TRACE_FENTRY),
+	BPF_PROG_BTF("fexit/",			BPF_PROG_TYPE_TRACING,
+						BPF_TRACE_FEXIT),
 	BPF_PROG_SEC("xdp",			BPF_PROG_TYPE_XDP),
 	BPF_PROG_SEC("perf_event",		BPF_PROG_TYPE_PERF_EVENT),
 	BPF_PROG_SEC("lwt_in",			BPF_PROG_TYPE_LWT_IN),
@@ -4930,7 +4936,7 @@ int libbpf_prog_type_by_name(const char *name, enum bpf_prog_type *prog_type,
 }
 
 #define BTF_PREFIX "btf_trace_"
-static int libbpf_attach_btf_id_by_name(const char *name, __u32 *btf_id)
+static int libbpf_attach_btf_id_by_name(const char *name, __u32 *btf_id, bool raw_tp)
 {
 	struct btf *btf = bpf_core_find_kernel_btf();
 	char raw_tp_btf_name[128] = BTF_PREFIX;
@@ -4950,10 +4956,14 @@ static int libbpf_attach_btf_id_by_name(const char *name, __u32 *btf_id)
 			continue;
 		if (strncmp(name, section_names[i].sec, section_names[i].len))
 			continue;
-		/* prepend "btf_trace_" prefix per kernel convention */
-		strncat(dst, name + section_names[i].len,
-			sizeof(raw_tp_btf_name) - sizeof(BTF_PREFIX));
-		ret = btf__find_by_name(btf, raw_tp_btf_name);
+		if (!raw_tp) {
+			ret = btf__find_by_name(btf, name + section_names[i].len);
+		} else {
+			/* prepend "btf_trace_" prefix per kernel convention */
+			strncat(dst, name + section_names[i].len,
+				sizeof(raw_tp_btf_name) - sizeof(BTF_PREFIX));
+			ret = btf__find_by_name(btf, raw_tp_btf_name);
+		}
 		if (ret <= 0) {
 			pr_warn("%s is not found in vmlinux BTF\n", dst);
 			goto out;
@@ -5594,6 +5604,37 @@ struct bpf_link *bpf_program__attach_raw_tracepoint(struct bpf_program *prog,
 	return (struct bpf_link *)link;
 }
 
+struct bpf_link *bpf_program__attach_trace(struct bpf_program *prog)
+{
+	char errmsg[STRERR_BUFSIZE];
+	struct bpf_link_fd *link;
+	int prog_fd, pfd;
+
+	prog_fd = bpf_program__fd(prog);
+	if (prog_fd < 0) {
+		pr_warn("program '%s': can't attach before loaded\n",
+			bpf_program__title(prog, false));
+		return ERR_PTR(-EINVAL);
+	}
+
+	link = malloc(sizeof(*link));
+	if (!link)
+		return ERR_PTR(-ENOMEM);
+	link->link.destroy = &bpf_link__destroy_fd;
+
+	pfd = bpf_raw_tracepoint_open(NULL, prog_fd);
+	if (pfd < 0) {
+		pfd = -errno;
+		free(link);
+		pr_warn("program '%s': failed to attach to trace: %s\n",
+			bpf_program__title(prog, false),
+			libbpf_strerror_r(pfd, errmsg, sizeof(errmsg)));
+		return ERR_PTR(pfd);
+	}
+	link->fd = pfd;
+	return (struct bpf_link *)link;
+}
+
 enum bpf_perf_event_ret
 bpf_perf_event_read_simple(void *mmap_mem, size_t mmap_size, size_t page_size,
 			   void **copy_mem, size_t *copy_size,
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 6ddc0419337b..0f53b0e0e135 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -248,6 +248,8 @@ LIBBPF_API struct bpf_link *
 bpf_program__attach_raw_tracepoint(struct bpf_program *prog,
 				   const char *tp_name);
 
+LIBBPF_API struct bpf_link *
+bpf_program__attach_trace(struct bpf_program *prog);
 struct bpf_insn;
 
 /*
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 86173cbb159d..f7b4e062d704 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -198,6 +198,7 @@ LIBBPF_0.0.6 {
 		bpf_map__set_pin_path;
 		bpf_object__open_file;
 		bpf_object__open_mem;
+		bpf_program__attach_trace;
 		bpf_program__get_expected_attach_type;
 		bpf_program__get_type;
 		bpf_program__is_tracing;
-- 
2.23.0



* [PATCH bpf-next 5/7] selftest/bpf: Simple test for fentry/fexit
  2019-11-02 22:00 [PATCH bpf-next 0/7] Introduce BPF trampoline Alexei Starovoitov
                   ` (3 preceding siblings ...)
  2019-11-02 22:00 ` [PATCH bpf-next 4/7] libbpf: Add support to attach to fentry/fexit tracing progs Alexei Starovoitov
@ 2019-11-02 22:00 ` Alexei Starovoitov
  2019-11-05 21:37   ` Andrii Nakryiko
  2019-11-02 22:00 ` [PATCH bpf-next 6/7] bpf: Add kernel test functions for fentry testing Alexei Starovoitov
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2019-11-02 22:00 UTC (permalink / raw)
  To: davem; +Cc: daniel, peterz, rostedt, x86, netdev, bpf, kernel-team

Add a simple test for fentry and fexit programs around eth_type_trans.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 .../selftests/bpf/prog_tests/kfree_skb.c      | 37 +++++++++++--
 tools/testing/selftests/bpf/progs/kfree_skb.c | 52 +++++++++++++++++++
 2 files changed, 86 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/kfree_skb.c b/tools/testing/selftests/bpf/prog_tests/kfree_skb.c
index 430b50de1583..d3402261bbae 100644
--- a/tools/testing/selftests/bpf/prog_tests/kfree_skb.c
+++ b/tools/testing/selftests/bpf/prog_tests/kfree_skb.c
@@ -30,15 +30,17 @@ void test_kfree_skb(void)
 		.file = "./kfree_skb.o",
 	};
 
+	struct bpf_link *link = NULL, *link_fentry = NULL, *link_fexit = NULL;
+	struct bpf_program *prog, *fentry, *fexit;
 	struct bpf_object *obj, *obj2 = NULL;
 	struct perf_buffer_opts pb_opts = {};
 	struct perf_buffer *pb = NULL;
-	struct bpf_link *link = NULL;
-	struct bpf_map *perf_buf_map;
-	struct bpf_program *prog;
+	struct bpf_map *perf_buf_map, *global_data;
 	__u32 duration, retval;
 	int err, pkt_fd, kfree_skb_fd;
 	bool passed = false;
+	const int zero = 0;
+	bool test_ok[2];
 
 	err = bpf_prog_load("./test_pkt_access.o", BPF_PROG_TYPE_SCHED_CLS, &obj, &pkt_fd);
 	if (CHECK(err, "prog_load sched cls", "err %d errno %d\n", err, errno))
@@ -51,9 +53,26 @@ void test_kfree_skb(void)
 	prog = bpf_object__find_program_by_title(obj2, "tp_btf/kfree_skb");
 	if (CHECK(!prog, "find_prog", "prog kfree_skb not found\n"))
 		goto close_prog;
+	fentry = bpf_object__find_program_by_title(obj2, "fentry/eth_type_trans");
+	if (CHECK(!fentry, "find_prog", "prog eth_type_trans not found\n"))
+		goto close_prog;
+	fexit = bpf_object__find_program_by_title(obj2, "fexit/eth_type_trans");
+	if (CHECK(!fexit, "find_prog", "prog eth_type_trans not found\n"))
+		goto close_prog;
+
+	global_data = bpf_object__find_map_by_name(obj2, "kfree_sk.bss");
+	if (CHECK(!global_data, "find global data", "not found\n"))
+		goto close_prog;
+
 	link = bpf_program__attach_raw_tracepoint(prog, NULL);
 	if (CHECK(IS_ERR(link), "attach_raw_tp", "err %ld\n", PTR_ERR(link)))
 		goto close_prog;
+	link_fentry = bpf_program__attach_trace(fentry);
+	if (CHECK(IS_ERR(link_fentry), "attach fentry", "err %ld\n", PTR_ERR(link_fentry)))
+		goto close_prog;
+	link_fexit = bpf_program__attach_trace(fexit);
+	if (CHECK(IS_ERR(link_fexit), "attach fexit", "err %ld\n", PTR_ERR(link_fexit)))
+		goto close_prog;
 
 	perf_buf_map = bpf_object__find_map_by_name(obj2, "perf_buf_map");
 	if (CHECK(!perf_buf_map, "find_perf_buf_map", "not found\n"))
@@ -76,14 +95,26 @@ void test_kfree_skb(void)
 	err = perf_buffer__poll(pb, 100);
 	if (CHECK(err < 0, "perf_buffer__poll", "err %d\n", err))
 		goto close_prog;
+
 	/* make sure kfree_skb program was triggered
 	 * and it sent expected skb into ring buffer
 	 */
 	CHECK_FAIL(!passed);
+
+	err = bpf_map_lookup_elem(bpf_map__fd(global_data), &zero, test_ok);
+	if (CHECK(err, "get_result",
+		  "failed to get output data: %d\n", err))
+		goto close_prog;
+
+	CHECK_FAIL(!test_ok[0] || !test_ok[1]);
 close_prog:
 	perf_buffer__free(pb);
 	if (!IS_ERR_OR_NULL(link))
 		bpf_link__destroy(link);
+	if (!IS_ERR_OR_NULL(link_fentry))
+		bpf_link__destroy(link_fentry);
+	if (!IS_ERR_OR_NULL(link_fexit))
+		bpf_link__destroy(link_fexit);
 	bpf_object__close(obj);
 	bpf_object__close(obj2);
 }
diff --git a/tools/testing/selftests/bpf/progs/kfree_skb.c b/tools/testing/selftests/bpf/progs/kfree_skb.c
index 489319ea1d6a..33fb87ebddee 100644
--- a/tools/testing/selftests/bpf/progs/kfree_skb.c
+++ b/tools/testing/selftests/bpf/progs/kfree_skb.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (c) 2019 Facebook
 #include <linux/bpf.h>
+#include <stdbool.h>
 #include "bpf_helpers.h"
 #include "bpf_endian.h"
 
@@ -101,3 +102,54 @@ int trace_kfree_skb(struct trace_kfree_skb *ctx)
 		       &ifindex, sizeof(ifindex));
 	return 0;
 }
+
+static volatile struct {
+	bool fentry_test_ok;
+	bool fexit_test_ok;
+} result;
+
+struct eth_type_trans_args {
+	struct sk_buff *skb;
+	struct net_device *dev;
+	unsigned short protocol; /* return value available to fexit progs */
+};
+
+SEC("fentry/eth_type_trans")
+int fentry_eth_type_trans(struct eth_type_trans_args *ctx)
+{
+	struct sk_buff *skb = ctx->skb;
+	struct net_device *dev = ctx->dev;
+	int len, ifindex;
+
+	__builtin_preserve_access_index(({
+		len = skb->len;
+		ifindex = dev->ifindex;
+	}));
+
+	/* fentry sees full packet including L2 header */
+	if (len != 74 || ifindex != 1)
+		return 0;
+	result.fentry_test_ok = true;
+	return 0;
+}
+
+SEC("fexit/eth_type_trans")
+int fexit_eth_type_trans(struct eth_type_trans_args *ctx)
+{
+	struct sk_buff *skb = ctx->skb;
+	struct net_device *dev = ctx->dev;
+	int len, ifindex;
+
+	__builtin_preserve_access_index(({
+		len = skb->len;
+		ifindex = dev->ifindex;
+	}));
+
+	/* fexit sees packet without L2 header that eth_type_trans should have
+	 * consumed.
+	 */
+	if (len != 60 || ctx->protocol != bpf_htons(0x86dd) || ifindex != 1)
+		return 0;
+	result.fexit_test_ok = true;
+	return 0;
+}
-- 
2.23.0



* [PATCH bpf-next 6/7] bpf: Add kernel test functions for fentry testing
  2019-11-02 22:00 [PATCH bpf-next 0/7] Introduce BPF trampoline Alexei Starovoitov
                   ` (4 preceding siblings ...)
  2019-11-02 22:00 ` [PATCH bpf-next 5/7] selftest/bpf: Simple test for fentry/fexit Alexei Starovoitov
@ 2019-11-02 22:00 ` Alexei Starovoitov
  2019-11-02 22:00 ` [PATCH bpf-next 7/7] selftests/bpf: Add test for BPF trampoline Alexei Starovoitov
  2019-11-05 14:31 ` [PATCH bpf-next 0/7] Introduce " Alexei Starovoitov
  7 siblings, 0 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2019-11-02 22:00 UTC (permalink / raw)
  To: davem; +Cc: daniel, peterz, rostedt, x86, netdev, bpf, kernel-team

Add a few kernel functions with various numbers of arguments,
argument types and sizes for BPF trampoline testing, to cover
different calling conventions.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 net/bpf/test_run.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 0be4497cb832..62933279fbba 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -105,6 +105,40 @@ static int bpf_test_finish(const union bpf_attr *kattr,
 	return err;
 }
 
+/* Integer types of various sizes and pointer combinations cover variety of
+ * architecture dependent calling conventions. 7+ can be supported in the
+ * future.
+ */
+int noinline bpf_fentry_test1(int a)
+{
+	return a + 1;
+}
+
+int noinline bpf_fentry_test2(int a, u64 b)
+{
+	return a + b;
+}
+
+int noinline bpf_fentry_test3(char a, int b, u64 c)
+{
+	return a + b + c;
+}
+
+int noinline bpf_fentry_test4(void *a, char b, int c, u64 d)
+{
+	return (long)a + b + c + d;
+}
+
+int noinline bpf_fentry_test5(u64 a, void *b, short c, int d, u64 e)
+{
+	return a + (long)b + c + d + e;
+}
+
+int noinline bpf_fentry_test6(u64 a, void *b, short c, int d, void *e, u64 f)
+{
+	return a + (long)b + c + d + (long)e + f;
+}
+
 static void *bpf_test_init(const union bpf_attr *kattr, u32 size,
 			   u32 headroom, u32 tailroom)
 {
@@ -122,6 +156,13 @@ static void *bpf_test_init(const union bpf_attr *kattr, u32 size,
 		kfree(data);
 		return ERR_PTR(-EFAULT);
 	}
+	if (bpf_fentry_test1(1) != 2 ||
+	    bpf_fentry_test2(2, 3) != 5 ||
+	    bpf_fentry_test3(4, 5, 6) != 15 ||
+	    bpf_fentry_test4((void *)7, 8, 9, 10) != 34 ||
+	    bpf_fentry_test5(11, (void *)12, 13, 14, 15) != 65 ||
+	    bpf_fentry_test6(16, (void *)17, 18, 19, (void *)20, 21) != 111)
+		return ERR_PTR(-EFAULT);
 	return data;
 }
 
-- 
2.23.0



* [PATCH bpf-next 7/7] selftests/bpf: Add test for BPF trampoline
  2019-11-02 22:00 [PATCH bpf-next 0/7] Introduce BPF trampoline Alexei Starovoitov
                   ` (5 preceding siblings ...)
  2019-11-02 22:00 ` [PATCH bpf-next 6/7] bpf: Add kernel test functions for fentry testing Alexei Starovoitov
@ 2019-11-02 22:00 ` Alexei Starovoitov
  2019-11-05 21:50   ` Andrii Nakryiko
  2019-11-05 14:31 ` [PATCH bpf-next 0/7] Introduce " Alexei Starovoitov
  7 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2019-11-02 22:00 UTC (permalink / raw)
  To: davem; +Cc: daniel, peterz, rostedt, x86, netdev, bpf, kernel-team

Add sanity test for BPF trampoline that checks kernel functions
with up to 6 arguments of different sizes.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/lib/bpf/bpf_helpers.h                   | 13 +++
 .../selftests/bpf/prog_tests/fentry_test.c    | 65 ++++++++++++++
 .../testing/selftests/bpf/progs/fentry_test.c | 90 +++++++++++++++++++
 3 files changed, 168 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/fentry_test.c
 create mode 100644 tools/testing/selftests/bpf/progs/fentry_test.c

diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index 0c7d28292898..c63ab1add126 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -44,4 +44,17 @@ enum libbpf_pin_type {
 	LIBBPF_PIN_BY_NAME,
 };
 
+/* The following types should be used by BPF_PROG_TYPE_TRACING program to
+ * access kernel function arguments. BPF trampoline and raw tracepoints
+ * typecast arguments to 'unsigned long long'.
+ */
+typedef int __attribute__((aligned(8))) ks32;
+typedef char __attribute__((aligned(8))) ks8;
+typedef short __attribute__((aligned(8))) ks16;
+typedef long long __attribute__((aligned(8))) ks64;
+typedef unsigned int __attribute__((aligned(8))) ku32;
+typedef unsigned char __attribute__((aligned(8))) ku8;
+typedef unsigned short __attribute__((aligned(8))) ku16;
+typedef unsigned long long __attribute__((aligned(8))) ku64;
+
 #endif
diff --git a/tools/testing/selftests/bpf/prog_tests/fentry_test.c b/tools/testing/selftests/bpf/prog_tests/fentry_test.c
new file mode 100644
index 000000000000..c24c5f739df6
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/fentry_test.c
@@ -0,0 +1,65 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2019 Facebook */
+#include <test_progs.h>
+
+void test_fentry_test(void)
+{
+	struct bpf_prog_load_attr attr = {
+		.file = "./fentry_test.o",
+	};
+
+	char prog_name[] = "fentry/bpf_fentry_testX";
+	struct bpf_object *obj = NULL, *pkt_obj;
+	int err, pkt_fd, kfree_skb_fd, i;
+	struct bpf_link *link[6] = {};
+	struct bpf_program *prog[6];
+	__u32 duration, retval;
+	struct bpf_map *data_map;
+	bool passed = false;
+	const int zero = 0;
+	u64 result[6];
+
+	err = bpf_prog_load("./test_pkt_access.o", BPF_PROG_TYPE_SCHED_CLS,
+			    &pkt_obj, &pkt_fd);
+	if (CHECK(err, "prog_load sched cls", "err %d errno %d\n", err, errno))
+		return;
+	err = bpf_prog_load_xattr(&attr, &obj, &kfree_skb_fd);
+	if (CHECK(err, "prog_load fail", "err %d errno %d\n", err, errno))
+		goto close_prog;
+
+	for (i = 0; i < 6; i++) {
+		prog_name[sizeof(prog_name) - 2] = '1' + i;
+		prog[i] = bpf_object__find_program_by_title(obj, prog_name);
+		if (CHECK(!prog[i], "find_prog", "prog %s not found\n", prog_name))
+			goto close_prog;
+		link[i] = bpf_program__attach_trace(prog[i]);
+	}
+	data_map = bpf_object__find_map_by_name(obj, "fentry_t.bss");
+	if (CHECK(!data_map, "find_data_map", "data map not found\n"))
+		goto close_prog;
+
+	err = bpf_prog_test_run(pkt_fd, 1, &pkt_v6, sizeof(pkt_v6),
+				NULL, NULL, &retval, &duration);
+	CHECK(err || retval, "ipv6",
+	      "err %d errno %d retval %d duration %d\n",
+	      err, errno, retval, duration);
+
+	err = bpf_map_lookup_elem(bpf_map__fd(data_map), &zero, &result);
+	if (CHECK(err, "get_result",
+		  "failed to get output data: %d\n", err))
+		goto close_prog;
+
+	for (i = 0; i < 6; i++)
+		if (CHECK(result[i] != 1, "result", "bpf_fentry_test%d failed err %ld\n",
+			  i + 1, result[i]))
+			goto close_prog;
+
+	passed = true;
+	CHECK_FAIL(!passed);
+close_prog:
+	for (i = 0; i < 6; i++)
+		if (!IS_ERR_OR_NULL(link[i]))
+			bpf_link__destroy(link[i]);
+	bpf_object__close(obj);
+	bpf_object__close(pkt_obj);
+}
diff --git a/tools/testing/selftests/bpf/progs/fentry_test.c b/tools/testing/selftests/bpf/progs/fentry_test.c
new file mode 100644
index 000000000000..545788bf8d50
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/fentry_test.c
@@ -0,0 +1,90 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2019 Facebook */
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+
+char _license[] SEC("license") = "GPL";
+
+struct test1 {
+	ks32 a;
+};
+static volatile __u64 test1_result;
+SEC("fentry/bpf_fentry_test1")
+int test1(struct test1 *ctx)
+{
+	test1_result = ctx->a == 1;
+	return 0;
+}
+
+struct test2 {
+	ks32 a;
+	ku64 b;
+};
+static volatile __u64 test2_result;
+SEC("fentry/bpf_fentry_test2")
+int test2(struct test2 *ctx)
+{
+	test2_result = ctx->a == 2 && ctx->b == 3;
+	return 0;
+}
+
+struct test3 {
+	ks8 a;
+	ks32 b;
+	ku64 c;
+};
+static volatile __u64 test3_result;
+SEC("fentry/bpf_fentry_test3")
+int test3(struct test3 *ctx)
+{
+	test3_result = ctx->a == 4 && ctx->b == 5 && ctx->c == 6;
+	return 0;
+}
+
+struct test4 {
+	void *a;
+	ks8 b;
+	ks32 c;
+	ku64 d;
+};
+static volatile __u64 test4_result;
+SEC("fentry/bpf_fentry_test4")
+int test4(struct test4 *ctx)
+{
+	test4_result = ctx->a == (void *)7 && ctx->b == 8 && ctx->c == 9 &&
+		ctx->d == 10;
+	return 0;
+}
+
+struct test5 {
+	ku64 a;
+	void *b;
+	ks16 c;
+	ks32 d;
+	ku64 e;
+};
+static volatile __u64 test5_result;
+SEC("fentry/bpf_fentry_test5")
+int test5(struct test5 *ctx)
+{
+	test5_result = ctx->a == 11 && ctx->b == (void *)12 && ctx->c == 13 &&
+		ctx->d == 14 && ctx->e == 15;
+	return 0;
+}
+
+struct test6 {
+	ku64 a;
+	void *b;
+	ks16 c;
+	ks32 d;
+	void *e;
+	ks64 f;
+};
+static volatile __u64 test6_result;
+SEC("fentry/bpf_fentry_test6")
+int test6(struct test6 *ctx)
+{
+	test6_result = ctx->a == 16 && ctx->b == (void *)17 && ctx->c == 18 &&
+		ctx->d == 19 && ctx->e == (void *)20 && ctx->f == 21;
+	return 0;
+}
-- 
2.23.0



* Re: [PATCH bpf-next 0/7] Introduce BPF trampoline
  2019-11-02 22:00 [PATCH bpf-next 0/7] Introduce BPF trampoline Alexei Starovoitov
                   ` (6 preceding siblings ...)
  2019-11-02 22:00 ` [PATCH bpf-next 7/7] selftests/bpf: Add test for BPF trampoline Alexei Starovoitov
@ 2019-11-05 14:31 ` Alexei Starovoitov
  2019-11-05 15:40   ` Steven Rostedt
  7 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2019-11-05 14:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: davem, daniel, peterz, rostedt, x86, netdev, bpf, kernel-team

On Sat, Nov 02, 2019 at 03:00:18PM -0700, Alexei Starovoitov wrote:
> Introduce BPF trampoline that works as a bridge between kernel functions and
> BPF programs. The first use case is fentry/fexit BPF programs that are roughly
> equivalent to kprobe/kretprobe. Unlike k[ret]probe there is practically zero
> overhead to call a set of BPF programs before or after kernel function. In the
> future patches networking use cases will be explored. For example: BPF
> trampoline can be used to call XDP programs from drivers with direct calls or
> wrapping BPF program into another BPF program.
> 
> The patch set depends on register_ftrace_direct() API. It's not upstream yet
> and available in [1]. The first patch is a hack to workaround this dependency.
> The idea is to land this set via bpf-next tree and land register_ftrace_direct
> via Steven's ftrace tree. Then during the merge window revert the first patch.
> Steven,
> do you think it's workable?
> As an alternative we can route register_ftrace_direct patches via bpf-next ?
> 
> Peter's static_call patches [2] are solving the issue of indirect call overhead
> for large class of kernel use cases, but unfortunately don't help calling into
> a set of BPF programs.  BPF trampoline's first goal is to translate kernel
> calling convention into BPF calling convention. The second goal is to call a
> set of programs efficiently. In the future we can replace BPF_PROG_RUN_ARRAY
> with BPF trampoline variant. Another future work is to add support for static_key,
> static_jmp and static_call inside generated BPF trampoline.
> 
> Please see patch 3 for details.
> 
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git/commit/?h=ftrace/direct&id=3ac423d902727884a389699fd7294c0e2e94b29c

Steven,
it looks like your branch hasn't been updated in 13 days.
Are you still planning to fix the bugs and send it in for the merge window?
I cannot afford to wait a full release, since I have my own follow-up for
XDP/networking based on this and other folks are building their stuff on top of
BPF trampoline too. So I'm thinking for v2 I will switch to using text_poke()
and add appropriate guard checks, so it will be safe out of the box without
the ftrace dependency. If register_ftrace_direct() indeed comes in soon I'll
switch to that and will make bpf trampoline co-operate with ftrace.
The text_poke approach will make it such that whatever comes first to trace the
fentry (either bpf trampoline or ftrace) will grab the nop and the other system
will not be able to attach. That's safe and correct, but certainly not nice
long term, so a fix will be needed. The key point is that switching to text_poke
will remove the register_ftrace_direct() dependency, unblock bpf developers and
will give you time to land your stuff at your own pace.
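
To make "grab the nop" concrete: the compiler's -mfentry option leaves a 5-byte
nop at the start of each traced function, and attaching means rewriting that nop
into a direct call. A minimal sketch of the byte patching, assuming x86-64 and
leaving out the kernel's actual text_poke machinery, synchronization and range
checks (names below are made up for illustration):

/* Illustrative only: build the 5-byte "call rel32" instruction that would
 * replace the fentry nop at func_ip so execution branches into a trampoline
 * at tramp_ip. Real kernel code must patch the text atomically, check that
 * the +/-2G displacement fits, and first verify the site still holds the
 * expected nop.
 */
#include <stdint.h>
#include <string.h>

static void make_call_insn(uint8_t insn[5], uint64_t func_ip, uint64_t tramp_ip)
{
	int32_t rel = (int32_t)(tramp_ip - (func_ip + 5)); /* rel32 is relative to the next insn */

	insn[0] = 0xe8;			/* x86 opcode for "call rel32" */
	memcpy(&insn[1], &rel, sizeof(rel));
}

Whichever subsystem performs that rewrite first owns the call site, which is why
the other system can then no longer attach.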



* Re: [PATCH bpf-next 0/7] Introduce BPF trampoline
  2019-11-05 14:31 ` [PATCH bpf-next 0/7] Introduce " Alexei Starovoitov
@ 2019-11-05 15:40   ` Steven Rostedt
  2019-11-05 15:47     ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2019-11-05 15:40 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, davem, daniel, peterz, x86, netdev, bpf,
	kernel-team, Thomas Gleixner, Linus Torvalds

On Tue, 5 Nov 2019 06:31:56 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> Steven,
> look slike your branch hasn't been updated in 13 days.
> Are you still planning to fix the bugs and send it in for the merge window?
> I cannot afford to wait a full release, since I have my own follow for
> XDP/networking based on this and other folks are building their stuff on top of
> BPF trampoline too. So I'm thinking for v2 I will switch to using text_poke()
> and add appropriate guard checks, so it will be safe out of the box without
> ftrace dependency. If register_ftrace_direct() indeed comes in soon I'll
> switch to that and will make bpf trampoline to co-operate with ftrace.
> text_poke approach will make it such that what ever comes first to trace the
> fentry (either bpf trampoline or ftrace) will grab the nop and the other system
> will not be able to attach. That's safe and correct, but certainly not nice
> long term. So the fix will be needed. The key point that switching to text_poke
> will remove the register_ftrace_direct() dependency, unblock bpf developers and
> will give you time to land your stuff at your pace.

Alexei,

I am still working on it, and if it seems stable enough I will submit
it for this merge window, but there are no guarantees. It's ready when
it's ready. I gave 5 talks last week (4 in Lyon and one in Sofia,
Bulgaria), so I did not have time to work on it then. I'm currently
in the Sofia office, and I have a version working that seems to be
stable. But I still have to break it up into a patch series and run it
through more stress tests.

If you have to wait you may need to wait. The Linux kernel isn't
something that is supposed to take in temporary hacks just to satisfy
someone's deadline. Feel free to fork the kernel if you need to. That's
what everyone else does.

The RT patch has been out of tree specifically because we never pushed
in the hacks to make it work. We could have landed RT in the tree years
ago, but we *never* pushed in something that wasn't ready. I suggest
the BPF folks follow suit.

-- Steve


* Re: [PATCH bpf-next 0/7] Introduce BPF trampoline
  2019-11-05 15:40   ` Steven Rostedt
@ 2019-11-05 15:47     ` Alexei Starovoitov
  2019-11-05 16:00       ` Steven Rostedt
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2019-11-05 15:47 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Alexei Starovoitov, davem, daniel, peterz, x86, netdev, bpf,
	kernel-team, Thomas Gleixner, Linus Torvalds

On Tue, Nov 05, 2019 at 10:40:24AM -0500, Steven Rostedt wrote:
> On Tue, 5 Nov 2019 06:31:56 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> > Steven,
> > look slike your branch hasn't been updated in 13 days.
> > Are you still planning to fix the bugs and send it in for the merge window?
> > I cannot afford to wait a full release, since I have my own follow for
> > XDP/networking based on this and other folks are building their stuff on top of
> > BPF trampoline too. So I'm thinking for v2 I will switch to using text_poke()
> > and add appropriate guard checks, so it will be safe out of the box without
> > ftrace dependency. If register_ftrace_direct() indeed comes in soon I'll
> > switch to that and will make bpf trampoline to co-operate with ftrace.
> > text_poke approach will make it such that what ever comes first to trace the
> > fentry (either bpf trampoline or ftrace) will grab the nop and the other system
> > will not be able to attach. That's safe and correct, but certainly not nice
> > long term. So the fix will be needed. The key point that switching to text_poke
> > will remove the register_ftrace_direct() dependency, unblock bpf developers and
> > will give you time to land your stuff at your pace.
> 
> Alexei,
> 
> I am still working on it, and if it seems stable enough I will submit
> it for this merge window, but there's no guarantees. It's ready when
> it's ready. I gave 5 talks last week (4 in Lyon, and one is Sofia,
> Bulgaria), thus I did not have time to work on it then. I'm currently
> in the Sofia office, and I got a version working that seems to be
> stable. But I still have to break it up into a patch series, and run it
> through more stress tests.
> 
> If you have to wait you may need to wait. The Linux kernel isn't
> something that is suppose to put in temporary hacks, just to satisfy
> someone's deadline.

Ok. I will switch to text_poke and will make it hack free.
ftrace mechanisms are being replaced by text_poke anyway.



* Re: [PATCH bpf-next 0/7] Introduce BPF trampoline
  2019-11-05 15:47     ` Alexei Starovoitov
@ 2019-11-05 16:00       ` Steven Rostedt
  2019-11-05 16:28         ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2019-11-05 16:00 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, davem, daniel, peterz, x86, netdev, bpf,
	kernel-team, Thomas Gleixner, Linus Torvalds

On Tue, 5 Nov 2019 07:47:11 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> > If you have to wait you may need to wait. The Linux kernel isn't
> > something that is suppose to put in temporary hacks, just to satisfy
> > someone's deadline.  
> 
> Ok. I will switch to text_poke and will make it hack free.
> ftrace mechanisms are being replaced by text_poke anyway.

I see that Facebook now owns Linux.

Peter's text poke patches will most likely not be ready for the next
merge window either. Don't you require them?

The database of function nops is part of the ftrace mechanisms, which
are not part of text poke, and there is strong accounting associated
with it which allows the user to see how their kernel is modified. This
is why I was against live kernel patching modifying the function code
directly, because the user loses out on visibility into how their
kernel is being modified. Any "replacement" would require the same
transparency into the modification of the kernel.

I see how you work. You pressure someone into jumping to do your
almighty work, and if they don't jump as you would like them to, you
work to circumvent them and try to make them irrelevant.

I'm sure the rest of the community will enjoy working with you too.

-- Steve


* Re: [PATCH bpf-next 0/7] Introduce BPF trampoline
  2019-11-05 16:00       ` Steven Rostedt
@ 2019-11-05 16:28         ` Alexei Starovoitov
  2019-11-05 17:26           ` Steven Rostedt
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2019-11-05 16:28 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Alexei Starovoitov, davem, daniel, peterz, x86, netdev, bpf,
	kernel-team, Thomas Gleixner, Linus Torvalds

On Tue, Nov 05, 2019 at 11:00:28AM -0500, Steven Rostedt wrote:
> On Tue, 5 Nov 2019 07:47:11 -0800
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> > > If you have to wait you may need to wait. The Linux kernel isn't
> > > something that is suppose to put in temporary hacks, just to satisfy
> > > someone's deadline.  
> > 
> > Ok. I will switch to text_poke and will make it hack free.
> > ftrace mechanisms are being replaced by text_poke anyway.
> 
> I see that Facebook now owns Linux.

huh?

> Peter's text poke patches most likely not be ready for the next
> merge window either. Don't you require them?

nope.
But I strongly support them. ftrace->text_poke + static_call + nop2
are all great improvements.
I'd really like to see them landing in this merge window.

> The database of function nops are part of the ftrace mechanisms which
> are not part of text poke, and there's strong accounting associated to
> them which allows the user to see how their kernel is modified. 

I guess the part that wasn't obvious from the commit log of the bpf trampoline
patches is that they don't care about nops and ftrace recording of nops. The bpf
trampoline will work even if there are no nops in front of the function. It will
work when CONFIG_HAVE_FENTRY is off.



* Re: [PATCH bpf-next 0/7] Introduce BPF trampoline
  2019-11-05 16:28         ` Alexei Starovoitov
@ 2019-11-05 17:26           ` Steven Rostedt
  2019-11-05 17:59             ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2019-11-05 17:26 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, davem, daniel, peterz, x86, netdev, bpf,
	kernel-team, Thomas Gleixner, Linus Torvalds

On Tue, 5 Nov 2019 08:28:02 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Tue, Nov 05, 2019 at 11:00:28AM -0500, Steven Rostedt wrote:
> > On Tue, 5 Nov 2019 07:47:11 -0800
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >   
> > > > If you have to wait you may need to wait. The Linux kernel isn't
> > > > something that is suppose to put in temporary hacks, just to satisfy
> > > > someone's deadline.    
> > > 
> > > Ok. I will switch to text_poke and will make it hack free.
> > > ftrace mechanisms are being replaced by text_poke anyway.  
> > 
> > I see that Facebook now owns Linux.  
> 
> huh?

Sorry, I'm a bit grumpy. I've been going non-stop for over a week (7 days of
conferences), and I'm still not done traveling. :-p


> 
> > Peter's text poke patches most likely not be ready for the next
> > merge window either. Don't you require them?  
> 
> nope.
> But I strongly support them. ftrace->text_poke + static_call + nop2
> are all great improvements.
> I'd really like to see them landing in this merge window.
> 
> > The database of function nops are part of the ftrace mechanisms which
> > are not part of text poke, and there's strong accounting associated to
> > them which allows the user to see how their kernel is modified.   
> 
> I guess the part that wasn't obvious from commit log of bpf trampoline patches
> is that they don't care about nops and ftrace recording of nops. bpf trampoline
> will work even if there are no nops in front of the function. It will work when
> CONFIG_HAVE_FENTRY is off.

I'm guessing it will use kprobes (or optimized probes). I haven't had a
chance to look at your patches.

I still think using register_ftrace_direct() will be cleaner (as it
is built on top of code that's been in the kernel for a decade).
Perhaps we can make it work even without the full ftrace code.

-- Steve


* Re: [PATCH bpf-next 0/7] Introduce BPF trampoline
  2019-11-05 17:26           ` Steven Rostedt
@ 2019-11-05 17:59             ` Alexei Starovoitov
  0 siblings, 0 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2019-11-05 17:59 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Alexei Starovoitov, davem, daniel, peterz, x86, netdev, bpf,
	kernel-team, Thomas Gleixner, Linus Torvalds

On Tue, Nov 05, 2019 at 12:26:29PM -0500, Steven Rostedt wrote:
> 
> I'm guessing it will use kprobes (or optimized probes). I haven't had a
> chance to look at your patches.

People complained that kprobe and especially kretprobe are too slow and too
unpredictable (no guarantee that the prog will be executed and the k*probe won't
be missed). For bpf to attach to fentry/fexit none of the k*probe machinery is
needed.
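
For contrast, a kprobe program has to dig the arguments out of pt_regs and read
kernel memory through bpf_probe_read(); a rough sketch, assuming the
PT_REGS_PARM1() macro from libbpf's bpf_tracing.h (these macros lived in
bpf_helpers.h in older trees):

#include <linux/ptrace.h>
#include <linux/bpf.h>
#include "bpf_helpers.h"
#include "bpf_tracing.h"

SEC("kprobe/eth_type_trans")
int kprobe_eth_type_trans(struct pt_regs *ctx)
{
	/* the first argument has to be recovered from the saved registers */
	void *skb = (void *)PT_REGS_PARM1(ctx);
	__u64 first_word = 0;

	/* and every dereference must go through bpf_probe_read() */
	bpf_probe_read(&first_word, sizeof(first_word), skb);
	return 0;
}

char _license[] SEC("license") = "GPL";

A fentry program on the same function (see patch 5) simply reads its typed
arguments out of the ctx that the trampoline prepares.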

> I still think using the register_ftrace_direct() will be cleaner (as it
> is built on top of code that's been in the kernel for a decade).
> Perhaps we can make it work even without the full ftrace code.

Yes, and I still agree that long term using register_ftrace_direct() is cleaner.
What I strongly disagree with is that bpf developers need to wait for it to land,
especially when there is no direct dependency. It's nice to have both the bpf
trampoline and ftrace functional at the same time, tracing the same kernel
function. But for the time being running one or the other is an acceptable
limitation, especially since the path to making it clean is already defined and
agreed upon.



* Re: [PATCH bpf-next 3/7] bpf: Introduce BPF trampoline
  2019-11-02 22:00 ` [PATCH bpf-next 3/7] bpf: Introduce BPF trampoline Alexei Starovoitov
@ 2019-11-05 19:51   ` Andrii Nakryiko
  0 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2019-11-05 19:51 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Peter Ziljstra, Steven Rostedt,
	x86, Networking, bpf, Kernel Team

On Sat, Nov 2, 2019 at 3:01 PM Alexei Starovoitov <ast@kernel.org> wrote:
>
> Introduce BPF trampoline concept to allow kernel code to call into BPF programs
> with practically zero overhead.  The trampoline generation logic is
> architecture dependent.  It's converting native calling convention into BPF
> calling convention.  BPF ISA is 64-bit (even on 32-bit architectures). The
> registers R1 to R5 are used to pass arguments into BPF functions. The main BPF
> program accepts only single argument "ctx" in R1. Whereas CPU native calling
> convention is different. x86-64 is passing first 6 arguments in registers
> and the rest on the stack. x86-32 is passing first 3 arguments in registers.
> sparc64 is passing first 6 in registers. And so on.
>
> The trampolines between BPF and kernel already exist.  BPF_CALL_x macros in
> include/linux/filter.h statically compile trampolines from BPF into kernel
> helpers. They convert up to five u64 arguments into kernel C pointers and
> integers. On 64-bit architectures this BPF_to_kernel trampolines are nops. On
> 32-bit architecture they're meaningful.
>
> The opposite job kernel_to_BPF trampolines is done by CAST_TO_U64 macros and
> __bpf_trace_##call() shim functions in include/trace/bpf_probe.h. They convert
> kernel function arguments into array of u64s that BPF program consumes via
> R1=ctx pointer.
>
> This patch set is doing the same job as __bpf_trace_##call() static
> trampolines, but dynamically for any kernel function. There are ~22k global
> kernel functions that are attachable via ftrace. The function arguments and
> types are described in BTF.  The job of btf_distill_kernel_func() function is
> to extract useful information from BTF into "function model" that architecture
> dependent trampoline generators will use to generate assembly code to cast
> kernel function arguments into array of u64s.  For example the kernel function
> eth_type_trans has two pointers. They will be casted to u64 and stored into
> stack of generated trampoline. The pointer to that stack space will be passed
> into BPF program in R1. On x86-64 such generated trampoline will consume 16
> bytes of stack and two stores of %rdi and %rsi into stack. The verifier will
> make sure that only two u64 are accessed read-only by BPF program. The verifier
> will also recognize the precise type of the pointers being accessed and will
> not allow typecasting of the pointer to a different type within BPF program.
>
> The tracing use case in the datacenter demonstrated that certain key kernel
> functions (like tcp_retransmit_skb) have 2 or more kprobes that are always
> active.  Other functions have both kprobe and kretprobe.  So it is essential to
> keep both kernel code and BPF programs executing at maximum speed. Hence
> generated BPF trampoline is re-generated every time new program is attached or
> detached to maintain maximum performance.
>
> To avoid the high cost of retpoline the attached BPF programs are called
> directly. __bpf_prog_enter/exit() are used to support per-program execution
> stats.  In the future this logic will be optimized further by adding support
> for bpf_stats_enabled_key inside generated assembly code. Introduction of
> preemptible and sleepable BPF programs will completely remove the need to call
> to __bpf_prog_enter/exit().
>
> Detach of a BPF program from the trampoline should not fail. To avoid memory
> allocation in detach path the half of the page is used as a reserve and flipped
> after each attach/detach. 2k bytes is enough to call 40+ BPF programs directly
> which is enough for BPF tracing use cases. This limit can be increased in the
> future.
>
> BPF_TRACE_FENTRY programs have access to raw kernel function arguments while
> BPF_TRACE_FEXIT programs have access to kernel return value as well. Often
> kprobe BPF program remembers function arguments in a map while kretprobe
> fetches arguments from a map and analyzes them together with return value.
> BPF_TRACE_FEXIT accelerates this typical use case.
>
> Recursion prevention for kprobe BPF programs is done via per-cpu
> bpf_prog_active counter. In practice that turned out to be a mistake. It
> caused programs to randomly skip execution. The tracing tools missed results
> they were looking for. Hence BPF trampoline doesn't provide builtin recursion
> prevention. It's a job of BPF program itself and will be addressed in the
> follow up patches.
>
> BPF trampoline is intended to be used beyond tracing and fentry/fexit use cases
> in the future. For example to remove retpoline cost from XDP programs.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
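
To make the eth_type_trans example above concrete, a hand-written C analogue of
what the generated trampoline does for a two-pointer function might look as
follows. This is only a sketch: the real code is emitted as x86-64 machine code
by the JIT, restores the saved registers before falling through into the traced
function, and brackets the call with __bpf_prog_enter/exit(); the shim name is
made up for illustration.

#include <linux/filter.h>
#include <linux/skbuff.h>
#include <linux/netdevice.h>

/* Pack the two kernel arguments into an array of u64 on the stack and hand
 * the BPF program a pointer to it as its single ctx argument, i.e. the
 * "16 bytes of stack and two stores of %rdi and %rsi" described above.
 */
static void fentry_shim_eth_type_trans(struct bpf_prog *prog,
				       struct sk_buff *skb,
				       struct net_device *dev)
{
	u64 args[2];

	args[0] = (u64)(unsigned long)skb;	/* arrived in %rdi */
	args[1] = (u64)(unsigned long)dev;	/* arrived in %rsi */
	BPF_PROG_RUN(prog, args);		/* prog sees &args[0] as ctx in R1 */
}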

Acked-by: Andrii Nakryiko <andriin@fb.com>


* Re: [PATCH bpf-next 4/7] libbpf: Add support to attach to fentry/fexit tracing progs
  2019-11-02 22:00 ` [PATCH bpf-next 4/7] libbpf: Add support to attach to fentry/fexit tracing progs Alexei Starovoitov
@ 2019-11-05 21:17   ` Andrii Nakryiko
  2019-11-05 23:17     ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Andrii Nakryiko @ 2019-11-05 21:17 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Peter Ziljstra, Steven Rostedt,
	x86, Networking, bpf, Kernel Team

On Sat, Nov 2, 2019 at 3:03 PM Alexei Starovoitov <ast@kernel.org> wrote:
>
> Teach libbpf to recognize tracing programs types and attach them to
> fentry/fexit.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  tools/include/uapi/linux/bpf.h |  2 ++
>  tools/lib/bpf/libbpf.c         | 55 +++++++++++++++++++++++++++++-----
>  tools/lib/bpf/libbpf.h         |  2 ++
>  tools/lib/bpf/libbpf.map       |  1 +
>  4 files changed, 53 insertions(+), 7 deletions(-)
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index df6809a76404..69c200e6e696 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -201,6 +201,8 @@ enum bpf_attach_type {
>         BPF_CGROUP_GETSOCKOPT,
>         BPF_CGROUP_SETSOCKOPT,
>         BPF_TRACE_RAW_TP,
> +       BPF_TRACE_FENTRY,
> +       BPF_TRACE_FEXIT,
>         __MAX_BPF_ATTACH_TYPE
>  };
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 7aa2a2a22cef..03e784f36dd9 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -3744,7 +3744,7 @@ bpf_object__load_progs(struct bpf_object *obj, int log_level)
>         return 0;
>  }
>
> -static int libbpf_attach_btf_id_by_name(const char *name, __u32 *btf_id);
> +static int libbpf_attach_btf_id_by_name(const char *name, __u32 *btf_id, bool raw_tp);

Bools are hard to follow in code, why not just pass the full
attach_type instead? It will also be more future-proof if we need
another trick, similar to the "bpf_trace_" prefix for raw_tp.

Also, I have a mild preference for having output arguments be the
very last in the argument list. Do you mind reordering so that bool
raw_tp is second?
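
A minimal sketch of the resulting prototype, assuming the enum attach type is
threaded through in place of the bool and the output argument moves to the end
(the exact final shape is of course up to the next revision):

static int libbpf_attach_btf_id_by_name(const char *name,
					enum bpf_attach_type attach_type,
					__u32 *btf_id);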

>
>  static struct bpf_object *
>  __bpf_object__open(const char *path, const void *obj_buf, size_t obj_buf_sz,
> @@ -3811,7 +3811,9 @@ __bpf_object__open(const char *path, const void *obj_buf, size_t obj_buf_sz,
>                 bpf_program__set_type(prog, prog_type);
>                 bpf_program__set_expected_attach_type(prog, attach_type);
>                 if (prog_type == BPF_PROG_TYPE_TRACING) {
> -                       err = libbpf_attach_btf_id_by_name(prog->section_name, &btf_id);
> +                       err = libbpf_attach_btf_id_by_name(prog->section_name,
> +                                                          &btf_id,
> +                                                          attach_type == BPF_TRACE_RAW_TP);

[...]


* Re: [PATCH bpf-next 5/7] selftest/bpf: Simple test for fentry/fexit
  2019-11-02 22:00 ` [PATCH bpf-next 5/7] selftest/bpf: Simple test for fentry/fexit Alexei Starovoitov
@ 2019-11-05 21:37   ` Andrii Nakryiko
  0 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2019-11-05 21:37 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Peter Ziljstra, Steven Rostedt,
	x86, Networking, bpf, Kernel Team

On Sat, Nov 2, 2019 at 3:04 PM Alexei Starovoitov <ast@kernel.org> wrote:
>
> Add simple test for fentry and fexit programs around eth_type_trans.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---

LGTM, but please fix formatting.

Acked-by: Andrii Nakryiko <andriin@fb.com>

>  .../selftests/bpf/prog_tests/kfree_skb.c      | 37 +++++++++++--
>  tools/testing/selftests/bpf/progs/kfree_skb.c | 52 +++++++++++++++++++
>  2 files changed, 86 insertions(+), 3 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/kfree_skb.c b/tools/testing/selftests/bpf/prog_tests/kfree_skb.c
> index 430b50de1583..d3402261bbae 100644
> --- a/tools/testing/selftests/bpf/prog_tests/kfree_skb.c
> +++ b/tools/testing/selftests/bpf/prog_tests/kfree_skb.c
> @@ -30,15 +30,17 @@ void test_kfree_skb(void)
>                 .file = "./kfree_skb.o",
>         };
>
> +       struct bpf_link *link = NULL, *link_fentry = NULL, *link_fexit = NULL;
> +       struct bpf_program *prog, *fentry, *fexit;
>         struct bpf_object *obj, *obj2 = NULL;
>         struct perf_buffer_opts pb_opts = {};
>         struct perf_buffer *pb = NULL;
> -       struct bpf_link *link = NULL;
> -       struct bpf_map *perf_buf_map;
> -       struct bpf_program *prog;
> +       struct bpf_map *perf_buf_map, *global_data;
>         __u32 duration, retval;
>         int err, pkt_fd, kfree_skb_fd;
>         bool passed = false;
> +       const int zero = 0;
> +       bool test_ok[2];
>
>         err = bpf_prog_load("./test_pkt_access.o", BPF_PROG_TYPE_SCHED_CLS, &obj, &pkt_fd);

too long ;)

>         if (CHECK(err, "prog_load sched cls", "err %d errno %d\n", err, errno))
> @@ -51,9 +53,26 @@ void test_kfree_skb(void)
>         prog = bpf_object__find_program_by_title(obj2, "tp_btf/kfree_skb");
>         if (CHECK(!prog, "find_prog", "prog kfree_skb not found\n"))
>                 goto close_prog;
> +       fentry = bpf_object__find_program_by_title(obj2, "fentry/eth_type_trans");
> +       if (CHECK(!fentry, "find_prog", "prog eth_type_trans not found\n"))
> +               goto close_prog;
> +       fexit = bpf_object__find_program_by_title(obj2, "fexit/eth_type_trans");
> +       if (CHECK(!fexit, "find_prog", "prog eth_type_trans not found\n"))
> +               goto close_prog;
> +
> +       global_data = bpf_object__find_map_by_name(obj2, "kfree_sk.bss");
> +       if (CHECK(!global_data, "find global data", "not found\n"))
> +               goto close_prog;
> +
>         link = bpf_program__attach_raw_tracepoint(prog, NULL);
>         if (CHECK(IS_ERR(link), "attach_raw_tp", "err %ld\n", PTR_ERR(link)))
>                 goto close_prog;
> +       link_fentry = bpf_program__attach_trace(fentry);
> +       if (CHECK(IS_ERR(link_fentry), "attach fentry", "err %ld\n", PTR_ERR(link_fentry)))
> +               goto close_prog;
> +       link_fexit = bpf_program__attach_trace(fexit);
> +       if (CHECK(IS_ERR(link_fexit), "attach fexit", "err %ld\n", PTR_ERR(link_fexit)))

checkpatch.pl?

> +               goto close_prog;
>

[...]


* Re: [PATCH bpf-next 7/7] selftests/bpf: Add test for BPF trampoline
  2019-11-02 22:00 ` [PATCH bpf-next 7/7] selftests/bpf: Add test for BPF trampoline Alexei Starovoitov
@ 2019-11-05 21:50   ` Andrii Nakryiko
  0 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2019-11-05 21:50 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Peter Ziljstra, Steven Rostedt,
	x86, Networking, bpf, Kernel Team

On Sat, Nov 2, 2019 at 3:04 PM Alexei Starovoitov <ast@kernel.org> wrote:
>
> Add sanity test for BPF trampoline that checks kernel functions
> with up to 6 arguments of different sizes.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  tools/lib/bpf/bpf_helpers.h                   | 13 +++
>  .../selftests/bpf/prog_tests/fentry_test.c    | 65 ++++++++++++++
>  .../testing/selftests/bpf/progs/fentry_test.c | 90 +++++++++++++++++++
>  3 files changed, 168 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/fentry_test.c
>  create mode 100644 tools/testing/selftests/bpf/progs/fentry_test.c
>
> diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
> index 0c7d28292898..c63ab1add126 100644
> --- a/tools/lib/bpf/bpf_helpers.h
> +++ b/tools/lib/bpf/bpf_helpers.h
> @@ -44,4 +44,17 @@ enum libbpf_pin_type {
>         LIBBPF_PIN_BY_NAME,
>  };
>
> +/* The following types should be used by BPF_PROG_TYPE_TRACING program to
> + * access kernel function arguments. BPF trampoline and raw tracepoints
> + * typecast arguments to 'unsigned long long'.
> + */
> +typedef int __attribute__((aligned(8))) ks32;
> +typedef char __attribute__((aligned(8))) ks8;
> +typedef short __attribute__((aligned(8))) ks16;
> +typedef long long __attribute__((aligned(8))) ks64;
> +typedef unsigned int __attribute__((aligned(8))) ku32;
> +typedef unsigned char __attribute__((aligned(8))) ku8;
> +typedef unsigned short __attribute__((aligned(8))) ku16;
> +typedef unsigned long long __attribute__((aligned(8))) ku64;
> +
>  #endif

[...]

> +
> +       err = bpf_prog_load("./test_pkt_access.o", BPF_PROG_TYPE_SCHED_CLS,
> +                           &pkt_obj, &pkt_fd);
> +       if (CHECK(err, "prog_load sched cls", "err %d errno %d\n", err, errno))
> +               return;
> +       err = bpf_prog_load_xattr(&attr, &obj, &kfree_skb_fd);
> +       if (CHECK(err, "prog_load fail", "err %d errno %d\n", err, errno))
> +               goto close_prog;
> +
> +       for (i = 0; i < 6; i++) {
> +               prog_name[sizeof(prog_name) - 2] = '1' + i;
> +               prog[i] = bpf_object__find_program_by_title(obj, prog_name);
> +               if (CHECK(!prog[i], "find_prog", "prog %s not found\n", prog_name))
> +                       goto close_prog;
> +               link[i] = bpf_program__attach_trace(prog[i]);

CHECK() here?
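
One way to address this, mirroring the error handling used for the other attach
calls in this series (a sketch only, reusing the surrounding test's names):

		link[i] = bpf_program__attach_trace(prog[i]);
		if (CHECK(IS_ERR(link[i]), "attach_trace",
			  "prog %s failed: %ld\n", prog_name, PTR_ERR(link[i])))
			goto close_prog;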

> +       }
> +       data_map = bpf_object__find_map_by_name(obj, "fentry_t.bss");
> +       if (CHECK(!data_map, "find_data_map", "data map not found\n"))
> +               goto close_prog;
> +
> +       err = bpf_prog_test_run(pkt_fd, 1, &pkt_v6, sizeof(pkt_v6),
> +                               NULL, NULL, &retval, &duration);
> +       CHECK(err || retval, "ipv6",
> +             "err %d errno %d retval %d duration %d\n",
> +             err, errno, retval, duration);
> +
> +       err = bpf_map_lookup_elem(bpf_map__fd(data_map), &zero, &result);
> +       if (CHECK(err, "get_result",
> +                 "failed to get output data: %d\n", err))
> +               goto close_prog;
> +
> +       for (i = 0; i < 6; i++)
> +               if (CHECK(result[i] != 1, "result", "bpf_fentry_test%d failed err %ld\n",
> +                         i + 1, result[i]))
> +                       goto close_prog;
> +
> +       passed = true;
> +       CHECK_FAIL(!passed);

passed and CHECK_FAIL are completely redundant?

> +close_prog:
> +       for (i = 0; i < 6; i++)
> +               if (!IS_ERR_OR_NULL(link[i]))
> +                       bpf_link__destroy(link[i]);
> +       bpf_object__close(obj);
> +       bpf_object__close(pkt_obj);
> +}

[...]


* Re: [PATCH bpf-next 4/7] libbpf: Add support to attach to fentry/fexit tracing progs
  2019-11-05 21:17   ` Andrii Nakryiko
@ 2019-11-05 23:17     ` Alexei Starovoitov
  0 siblings, 0 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2019-11-05 23:17 UTC (permalink / raw)
  To: Andrii Nakryiko, Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Peter Ziljstra, Steven Rostedt,
	x86, Networking, bpf, Kernel Team

On 11/5/19 1:17 PM, Andrii Nakryiko wrote:
> On Sat, Nov 2, 2019 at 3:03 PM Alexei Starovoitov <ast@kernel.org> wrote:
>>
>> Teach libbpf to recognize tracing programs types and attach them to
>> fentry/fexit.
>>
>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>> ---
>>   tools/include/uapi/linux/bpf.h |  2 ++
>>   tools/lib/bpf/libbpf.c         | 55 +++++++++++++++++++++++++++++-----
>>   tools/lib/bpf/libbpf.h         |  2 ++
>>   tools/lib/bpf/libbpf.map       |  1 +
>>   4 files changed, 53 insertions(+), 7 deletions(-)
>>
>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>> index df6809a76404..69c200e6e696 100644
>> --- a/tools/include/uapi/linux/bpf.h
>> +++ b/tools/include/uapi/linux/bpf.h
>> @@ -201,6 +201,8 @@ enum bpf_attach_type {
>>          BPF_CGROUP_GETSOCKOPT,
>>          BPF_CGROUP_SETSOCKOPT,
>>          BPF_TRACE_RAW_TP,
>> +       BPF_TRACE_FENTRY,
>> +       BPF_TRACE_FEXIT,
>>          __MAX_BPF_ATTACH_TYPE
>>   };
>>
>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>> index 7aa2a2a22cef..03e784f36dd9 100644
>> --- a/tools/lib/bpf/libbpf.c
>> +++ b/tools/lib/bpf/libbpf.c
>> @@ -3744,7 +3744,7 @@ bpf_object__load_progs(struct bpf_object *obj, int log_level)
>>          return 0;
>>   }
>>
>> -static int libbpf_attach_btf_id_by_name(const char *name, __u32 *btf_id);
>> +static int libbpf_attach_btf_id_by_name(const char *name, __u32 *btf_id, bool raw_tp);
> 
> Bools are hard to follow in code, why not just passing full
> attach_type instead? It will also be more future-proof, if we need
> another trick, similar to "bpf_trace_" prefix for raw_tp?
> 
> Also, I have a mild preference for having output arguments to be the
> very last in the argument list. Do you mind reordering so thar bool
> raw_tp is second?

Agree on both counts. Spotted another small nit in this function
while testing corner cases.
Will fix this and address the feedback on 5/7 and 7/7 in the next version.

