From: Jiri Olsa <jolsa@kernel.org>
To: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andriin@fb.com>
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org,
	"Martin KaFai Lau" <kafai@fb.com>,
	"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"KP Singh" <kpsingh@chromium.org>, "Daniel Xu" <dxu@dxuuu.xyz>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Jesper Brouer" <jbrouer@redhat.com>,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Viktor Malik" <vmalik@redhat.com>
Subject: [RFC bpf-next 12/16] bpf: Move synchronize_rcu_mult for batch processing (NOT TO BE MERGED)
Date: Thu, 22 Oct 2020 10:21:34 +0200
Message-ID: <20201022082138.2322434-13-jolsa@kernel.org>
In-Reply-To: <20201022082138.2322434-1-jolsa@kernel.org>

I noticed that some of the profiled workloads did not spend more
cycles, but still took longer to finish than with the current code.
I tracked it down to the synchronize_rcu_mult call in
bpf_trampoline_update; calling it just once in batch mode made
things faster.

The current flow when attaching programs is:

  for each program:
    bpf(BPF_RAW_TRACEPOINT_OPEN
      bpf_tracing_prog_attach
        bpf_trampoline_link_prog
          bpf_trampoline_update
            synchronize_rcu_mult
            register_ftrace_direct

With this change, synchronize_rcu_mult is called just once per batch:

  bpf(BPF_TRAMPOLINE_BATCH_ATTACH
    for each program:
      bpf_tracing_prog_attach
        bpf_trampoline_link_prog
          bpf_trampoline_update

    synchronize_rcu_mult
    register_ftrace_direct_ips
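
As a rough userspace illustration (not kernel code and not part of the
patch), the two call trees above can be modelled by counting how many
times the attach path would block on a grace period. Every fake_* name
and both waits_* helpers are made up for this sketch; only the
"if (!batch)" guard mirrors the diff below. Where exactly the single
wait belongs relative to the loop is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for synchronize_rcu_mult(): only counts how often we would block. */
static int grace_period_waits;

static void fake_synchronize(void)
{
	grace_period_waits++;
}

/* Mirrors the "if (!batch)" guard this patch adds to bpf_trampoline_update(). */
static void fake_trampoline_update(bool batch)
{
	if (!batch)
		fake_synchronize();
}

/* One syscall per program: every trampoline update waits on its own. */
static int waits_per_program_attach(int nr_progs)
{
	assert(nr_progs >= 0);
	grace_period_waits = 0;
	for (int i = 0; i < nr_progs; i++)
		fake_trampoline_update(false);
	return grace_period_waits;
}

/* One batch syscall: a single wait is shared by all trampoline updates. */
static int waits_batch_attach(int nr_progs)
{
	assert(nr_progs >= 0);
	grace_period_waits = 0;
	fake_synchronize();
	for (int i = 0; i < nr_progs; i++)
		fake_trampoline_update(true);
	return grace_period_waits;
}
```

In this model, attaching N programs one by one blocks N times, while
the batch path blocks once regardless of N. Blocking costs wall-clock
time rather than CPU cycles.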

I'm not sure this doesn't break anything, because I don't follow the
rcu code that closely ;-) However, the stats are nicer now:

Before:

 Performance counter stats for './test_progs -t attach_test' (5 runs):

        37,410,887      cycles:k             ( +-  0.98% )
        70,062,158      cycles:u             ( +-  0.39% )

             26.80 +- 4.10 seconds time elapsed  ( +- 15.31% )

After:

 Performance counter stats for './test_progs -t attach_test' (5 runs):

        36,812,432      cycles:k             ( +-  2.52% )
        69,907,191      cycles:u             ( +-  0.38% )

             15.04 +- 2.94 seconds time elapsed  ( +- 19.54% )
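
To put the numbers above in relative terms, here is a small sketch of
the arithmetic. The helper names are invented for illustration; the
inputs are the mean values reported by perf above. With those means,
elapsed time drops from 26.80 to 15.04 seconds (about 1.78x faster,
roughly 44% less), while both cycle counts change by less than 2%,
which is within the reported run-to-run variation for cycles:k and
close to it for cycles:u.

```c
/* Ratio of old to new value: > 1.0 means the new run is faster. */
static double speedup(double before, double after)
{
	return before / after;
}

/* Relative change in percent: negative means the value went down. */
static double percent_change(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Mean values copied from the perf stat output above. */
static const double elapsed_before = 26.80, elapsed_after = 15.04;
static const double cycles_k_before = 37410887.0, cycles_k_after = 36812432.0;
static const double cycles_u_before = 70062158.0, cycles_u_after = 69907191.0;
```

Calling speedup(elapsed_before, elapsed_after) and percent_change() on
the cycle pairs reproduces the figures quoted here. Note that the
elapsed-time variance is large in both runs (+- 15% and +- 20%), so
the exact ratio should be read as approximate.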

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/bpf/syscall.c    | 3 +++
 kernel/bpf/trampoline.c | 3 ++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 19fb608546c0..b315803c34d3 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -31,6 +31,7 @@
 #include <linux/poll.h>
 #include <linux/bpf-netns.h>
 #include <linux/rcupdate_trace.h>
+#include <linux/rcupdate_wait.h>
 
 #define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
 			  (map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
@@ -2920,6 +2921,8 @@ static int bpf_trampoline_batch(const union bpf_attr *attr, int cmd)
 	if (!batch)
 		goto out_clean;
 
+	synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_trace);
+
 	for (i = 0; i < count; i++) {
 		if (cmd == BPF_TRAMPOLINE_BATCH_ATTACH) {
 			prog = bpf_prog_get(in[i]);
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index cdad87461e5d..0d5e4c5860a9 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -271,7 +271,8 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr,
 	 * programs finish executing.
 	 * Wait for these two grace periods together.
 	 */
-	synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_trace);
+	if (!batch)
+		synchronize_rcu_mult(call_rcu_tasks, call_rcu_tasks_trace);
 
 	err = arch_prepare_bpf_trampoline(new_image, new_image + PAGE_SIZE / 2,
 					  &tr->func.model, flags, tprogs,
-- 
2.26.2

