* [PATCH bpf-next] bpf: mark kprobe_multi_link_prog_run as always inlined function
@ 2024-03-20 20:06 Andrii Nakryiko
  2024-03-21  6:55 ` Alexei Starovoitov
  0 siblings, 1 reply; 4+ messages in thread
From: Andrii Nakryiko @ 2024-03-20 20:06 UTC (permalink / raw)
  To: bpf, ast, daniel, martin.lau; +Cc: andrii, kernel-team

kprobe_multi_link_prog_run() is called both for multi-kprobe and
multi-kretprobe BPF programs from kprobe_multi_link_handler() and
kprobe_multi_link_exit_handler(), respectively.
kprobe_multi_link_prog_run() does all the relevant work; the wrappers
exist only to satisfy ftrace's interfaces (the kprobe callback is
expected to return int, while the kretprobe one returns void).

With this structure, the compiler performs tail-call optimization:

Dump of assembler code for function kprobe_multi_link_exit_handler:
   0xffffffff8122f1e0 <+0>:     add    $0xffffffffffffffc0,%rdi
   0xffffffff8122f1e4 <+4>:     mov    %rcx,%rdx
   0xffffffff8122f1e7 <+7>:     jmp    0xffffffff81230080 <kprobe_multi_link_prog_run>

This means that when capturing LBR records that trace all indirect
branches, we waste an entry just to record that
kprobe_multi_link_exit_handler jumped into kprobe_multi_link_prog_run.

LBR entries are especially sparse on AMD CPUs (just 16 entries on latest CPUs
vs typically 32 on latest Intel CPUs), and every entry counts (and we already
have a bunch of other LBR entries spent getting to a BPF program), so it would
be great to not waste any more than necessary.

Marking it as just `static inline` doesn't change anything: the
compiler still performs only the tail-call optimization. But marking
kprobe_multi_link_prog_run() as __always_inline ensures that the
compiler fully inlines it, avoiding the jump:

Dump of assembler code for function kprobe_multi_link_exit_handler:
   0xffffffff8122f4e0 <+0>:     push   %r15
   0xffffffff8122f4e2 <+2>:     push   %r14
   0xffffffff8122f4e4 <+4>:     push   %r13
   0xffffffff8122f4e6 <+6>:     push   %r12
   0xffffffff8122f4e8 <+8>:     push   %rbx
   0xffffffff8122f4e9 <+9>:     sub    $0x10,%rsp
   0xffffffff8122f4ed <+13>:    mov    %rdi,%r14
   0xffffffff8122f4f0 <+16>:    lea    -0x40(%rdi),%rax

   ...

   0xffffffff8122f590 <+176>:   call   0xffffffff8108e420 <sched_clock>
   0xffffffff8122f595 <+181>:   sub    %r14,%rax
   0xffffffff8122f598 <+184>:   add    %rax,0x8(%rbx,%r13,1)
   0xffffffff8122f59d <+189>:   jmp    0xffffffff8122f541 <kprobe_multi_link_exit_handler+97>

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/trace/bpf_trace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 434e3ece6688..0bebd6f02e17 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2796,7 +2796,7 @@ static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx *ctx)
 	return run_ctx->entry_ip;
 }
 
-static int
+static __always_inline int
 kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
 			   unsigned long entry_ip, struct pt_regs *regs)
 {
-- 
2.43.0



end of thread, other threads:[~2024-03-21 16:05 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-20 20:06 [PATCH bpf-next] bpf: mark kprobe_multi_link_prog_run as always inlined function Andrii Nakryiko
2024-03-21  6:55 ` Alexei Starovoitov
2024-03-21  7:02   ` Alexei Starovoitov
2024-03-21 16:04     ` Andrii Nakryiko
