All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Jiri Olsa <jolsa@kernel.org>
Cc: "Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Andrii Nakryiko" <andriin@fb.com>,
	Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	"Martin KaFai Lau" <kafai@fb.com>,
	"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"KP Singh" <kpsingh@chromium.org>, "Daniel Xu" <dxu@dxuuu.xyz>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Jesper Brouer" <jbrouer@redhat.com>,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Viktor Malik" <vmalik@redhat.com>
Subject: Re: [RFC bpf-next 09/16] bpf: Add BPF_TRAMPOLINE_BATCH_ATTACH support
Date: Fri, 23 Oct 2020 13:03:22 -0700	[thread overview]
Message-ID: <CAEf4Bzb_HPmGSoUX+9+LvSP2Yb95OqEQKtjpMiW1Um-rixAM8Q@mail.gmail.com> (raw)
In-Reply-To: <20201022082138.2322434-10-jolsa@kernel.org>

On Thu, Oct 22, 2020 at 8:01 AM Jiri Olsa <jolsa@kernel.org> wrote:
>
> Adding BPF_TRAMPOLINE_BATCH_ATTACH support, that allows to attach
> tracing multiple fentry/fexit pograms to trampolines within one
> syscall.
>
> Currently each tracing program is attached in seprate bpf syscall
> and more importantly by separate register_ftrace_direct call, which
> registers trampoline in ftrace subsystem. We can save some cycles
> by simple using its batch variant register_ftrace_direct_ips.
>
> Before:
>
>  Performance counter stats for './src/bpftrace -ve kfunc:__x64_sys_s*
>     { printf("test\n"); } i:ms:10 { printf("exit\n"); exit();}' (5 runs):
>
>      2,199,433,771      cycles:k               ( +-  0.55% )
>        936,105,469      cycles:u               ( +-  0.37% )
>
>              26.48 +- 3.57 seconds time elapsed  ( +- 13.49% )
>
> After:
>
>  Performance counter stats for './src/bpftrace -ve kfunc:__x64_sys_s*
>     { printf("test\n"); } i:ms:10 { printf("exit\n"); exit();}' (5 runs):
>
>      1,456,854,867      cycles:k               ( +-  0.57% )
>        937,737,431      cycles:u               ( +-  0.13% )
>
>              12.44 +- 2.98 seconds time elapsed  ( +- 23.95% )
>
> The new BPF_TRAMPOLINE_BATCH_ATTACH syscall command expects
> following data in union bpf_attr:
>
>   struct {
>           __aligned_u64   in;
>           __aligned_u64   out;
>           __u32           count;
>   } trampoline_batch;
>
>   in    - pointer to user space array with file descrptors of loaded bpf
>           programs to attach
>   out   - pointer to user space array for resulting link descriptor
>   count - number of 'in/out' file descriptors
>
> Basically the new code gets programs from 'in' file descriptors and
> attaches them the same way the current code does, apart from the last
> step that registers probe ip with trampoline. This is done at the end
> with new register_ftrace_direct_ips function.
>
> The resulting link descriptors are written in 'out' array and match
> 'in' array file descriptors order.
>

I think this is a pretty hard API to use correctly from user-space.
Think about all those partially attached and/or partially detached BPF
programs. And subsequent clean up for them. Also there is nothing even
close to atomicity, so you might get a spurious invocation a few times
before batch-attach fails mid-way and the kernel (hopefully) will
detach those already attached programs in an attempt to clean
everything up. Debugging and handling that is a big pain for users,
IMO.

Here's a raw idea, let's think if it would be possible to implement
something like this. It seems like what you need is to create a set of
logically-grouped placeholders for multiple functions you are about to
attach to. Until the BPF program is attached, those placeholders are
just no-ops (e.g., they might jump to an "inactive" single trampoline,
which just immediately returns). Then you attach the BPF program
atomically into a single place, and all those no-op jumps to a
trampoline start to call the BPF program at the same time. It's not
strictly atomic, but is much closer in time with each other. Also,
because it's still a single trampoline, you get a nice mapping to a
single bpf_link, so detaching is not an issue.

Basically, maybe ftrace subsystem could provide a set of APIs to
prepare a set of functions to attach to. Then BPF subsystem would just
do what it does today, except instead of attaching to a specific
kernel function, it would attach to ftrace's placeholder. I don't know
anything about ftrace implementation, so this might be far off. But I
thought that looking at this problem from a bit of a different angle
would benefit the discussion. Thoughts?


> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  include/linux/bpf.h      | 15 ++++++-
>  include/uapi/linux/bpf.h |  7 ++++
>  kernel/bpf/syscall.c     | 88 ++++++++++++++++++++++++++++++++++++++--
>  kernel/bpf/trampoline.c  | 69 +++++++++++++++++++++++++++----
>  4 files changed, 164 insertions(+), 15 deletions(-)
>

[...]

  parent reply	other threads:[~2020-10-23 20:03 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-22  8:21 [RFC bpf-next 00/16] bpf: Speed up trampoline attach Jiri Olsa
2020-10-22  8:21 ` [RFC bpf-next 01/16] ftrace: Add check_direct_entry function Jiri Olsa
2020-10-22  8:21 ` [RFC bpf-next 02/16] ftrace: Add adjust_direct_size function Jiri Olsa
2020-10-22  8:21 ` [RFC bpf-next 03/16] ftrace: Add get/put_direct_func function Jiri Olsa
2020-10-22  8:21 ` [RFC bpf-next 04/16] ftrace: Add ftrace_set_filter_ips function Jiri Olsa
2020-10-22  8:21 ` [RFC bpf-next 05/16] ftrace: Add register_ftrace_direct_ips function Jiri Olsa
2020-10-22  8:21 ` [RFC bpf-next 06/16] ftrace: Add unregister_ftrace_direct_ips function Jiri Olsa
2020-10-22  8:21 ` [RFC bpf-next 07/16] kallsyms: Use rb tree for kallsyms name search Jiri Olsa
2020-10-28 18:25   ` Jiri Olsa
2020-10-28 21:15     ` Alexei Starovoitov
2020-10-29  9:29       ` Jiri Olsa
2020-10-29 22:45         ` Andrii Nakryiko
2020-10-28 22:40     ` Andrii Nakryiko
2020-10-29  9:33       ` Jiri Olsa
2020-10-22  8:21 ` [RFC bpf-next 08/16] bpf: Use delayed link free in bpf_link_put Jiri Olsa
2020-10-23 19:46   ` Andrii Nakryiko
2020-10-25 19:02     ` Jiri Olsa
2020-10-22  8:21 ` [RFC bpf-next 09/16] bpf: Add BPF_TRAMPOLINE_BATCH_ATTACH support Jiri Olsa
2020-10-22 11:55   ` kernel test robot
2020-10-22 11:57   ` kernel test robot
2020-10-23 20:03   ` Andrii Nakryiko [this message]
2020-10-23 20:31     ` Steven Rostedt
2020-10-23 22:23       ` Andrii Nakryiko
2020-10-25 19:41         ` Jiri Olsa
2020-10-26 23:19           ` Andrii Nakryiko
2020-10-22  8:21 ` [RFC bpf-next 10/16] bpf: Add BPF_TRAMPOLINE_BATCH_DETACH support Jiri Olsa
2020-10-22 13:00   ` kernel test robot
2020-10-22 13:04   ` kernel test robot
2020-10-22  8:21 ` [RFC bpf-next 11/16] bpf: Sync uapi bpf.h to tools Jiri Olsa
2020-10-22  8:21 ` [RFC bpf-next 12/16] bpf: Move synchronize_rcu_mult for batch processing (NOT TO BE MERGED) Jiri Olsa
2020-10-22  8:21 ` [RFC bpf-next 13/16] libbpf: Add trampoline batch attach support Jiri Olsa
2020-10-23 20:09   ` Andrii Nakryiko
2020-10-25 19:11     ` Jiri Olsa
2020-10-26 23:15       ` Andrii Nakryiko
2020-10-27 19:03         ` Jiri Olsa
2020-10-22  8:21 ` [RFC bpf-next 14/16] libbpf: Add trampoline batch detach support Jiri Olsa
2020-10-22  8:21 ` [RFC bpf-next 15/16] selftests/bpf: Add trampoline batch test Jiri Olsa
2020-10-22  8:21 ` [RFC bpf-next 16/16] selftests/bpf: Add attach batch test (NOT TO BE MERGED) Jiri Olsa
2020-10-22 13:35 ` [RFC bpf-next 00/16] bpf: Speed up trampoline attach Steven Rostedt
2020-10-22 14:11   ` Jiri Olsa
2020-10-22 14:42     ` Steven Rostedt
2020-10-22 16:21       ` Steven Rostedt
2020-10-22 20:52         ` Steven Rostedt
2020-10-23  6:09           ` Jiri Olsa
2020-10-23 13:50             ` Steven Rostedt
2020-10-25 19:01               ` Jiri Olsa
2020-10-27  4:30       ` Alexei Starovoitov
2020-10-27 13:14         ` Steven Rostedt
2020-10-27 14:28         ` Jiri Olsa
2020-10-28 21:13           ` Alexei Starovoitov
2020-10-29 11:09             ` Jiri Olsa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAEf4Bzb_HPmGSoUX+9+LvSP2Yb95OqEQKtjpMiW1Um-rixAM8Q@mail.gmail.com \
    --to=andrii.nakryiko@gmail.com \
    --cc=andriin@fb.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=dxu@dxuuu.xyz \
    --cc=jbrouer@redhat.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kafai@fb.com \
    --cc=kpsingh@chromium.org \
    --cc=netdev@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=songliubraving@fb.com \
    --cc=toke@redhat.com \
    --cc=vmalik@redhat.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.