From: Florent Revest <revest@chromium.org>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Xu Kuohai <xukuohai@huawei.com>,
Mark Rutland <mark.rutland@arm.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Xu Kuohai <xukuohai@huaweicloud.com>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
Will Deacon <will@kernel.org>,
Jean-Philippe Brucker <jean-philippe@linaro.org>,
Ingo Molnar <mingo@redhat.com>, Oleg Nesterov <oleg@redhat.com>,
Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, Zi Shen Lim <zlim.lnx@gmail.com>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
Ard Biesheuvel <ardb@kernel.org>, Marc Zyngier <maz@kernel.org>,
Guo Ren <guoren@kernel.org>,
Masami Hiramatsu <mhiramat@kernel.org>
Subject: Re: [PATCH bpf-next v2 0/4] Add ftrace direct call for arm64
Date: Mon, 17 Oct 2022 19:55:06 +0200 [thread overview]
Message-ID: <CABRcYm+d=xY9nBCJo-6JW_=F41g4X32QM9WOPChaOTfs6d6KCA@mail.gmail.com> (raw)
In-Reply-To: <20221006122922.53802a5c@gandalf.local.home>
On Thu, Oct 6, 2022 at 6:29 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Thu, 6 Oct 2022 18:19:12 +0200
> Florent Revest <revest@chromium.org> wrote:
>
> > Sure, we can give this a try, I'll work on a macro that generates the
> > 7 callbacks and we can check how much that helps. My belief right now
> > is that ftrace's iteration over all ops on arm64 is where we lose most
> > time but now that we have numbers it's pretty easy to check hypothesis
> > :)
>
> Ah, I forgot that's what Mark's code is doing. But yes, that needs to be
> fixed first. I forget that arm64 doesn't have the dedicated trampolines yet.
>
> So, let's hold off until that is complete.
>
> -- Steve
Mark finished an implementation of his per-callsite-ops and min-args
branches (meaning that we can now skip the expensive ftrace's saving
of all registers and iteration over all ops if only one is attached)
- https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64-ftrace-call-ops-20221017
And Masami wrote similar patches to what I had originally done to
fprobe in my branch:
- https://github.com/mhiramat/linux/commits/kprobes/fprobe-update
So I could rebase my previous "bpf on fprobe" branch on top of these:
(as before, it's just good enough for benchmarking and to give a
general sense of the idea, not for a thorough code review):
- https://github.com/FlorentRevest/linux/commits/fprobe-min-args-3
And I could run the benchmarks against my rpi4. I have different
baseline numbers as Xu so I ran everything again and tried to keep the
format the same. "indirect call" refers to my branch I just linked and
"direct call" refers to the series this is a reply to (Xu's work)
1. test with dd
1.1 when no bpf prog attached to vfs_write
# dd if=/dev/zero of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 3.94315 s, 130 MB/s
1.2 attach bpf prog with kprobe, bpftrace -e kprobe:vfs_write {}
# dd if=/dev/zero of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 5.80493 s, 88.2 MB/s
1.3 attach bpf prog with with direct call, bpftrace -e kfunc:vfs_write {}
# dd if=/dev/zero of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 4.18579 s, 122 MB/s
1.4 attach bpf prog with with indirect call, bpftrace -e kfunc:vfs_write {}
# dd if=/dev/zero of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 4.92616 s, 104 MB/s
2. test with bpf/bench
2.1 bench trig-base
Iter 0 ( 86.518us): hits 0.700M/s ( 0.700M/prod), drops
0.000M/s, total operations 0.700M/s
Iter 1 (-26.352us): hits 0.701M/s ( 0.701M/prod), drops
0.000M/s, total operations 0.701M/s
Iter 2 ( 1.092us): hits 0.701M/s ( 0.701M/prod), drops
0.000M/s, total operations 0.701M/s
Iter 3 ( -1.890us): hits 0.701M/s ( 0.701M/prod), drops
0.000M/s, total operations 0.701M/s
Iter 4 ( -2.315us): hits 0.701M/s ( 0.701M/prod), drops
0.000M/s, total operations 0.701M/s
Iter 5 ( 4.184us): hits 0.701M/s ( 0.701M/prod), drops
0.000M/s, total operations 0.701M/s
Iter 6 ( -3.241us): hits 0.701M/s ( 0.701M/prod), drops
0.000M/s, total operations 0.701M/s
Summary: hits 0.701 ± 0.000M/s ( 0.701M/prod), drops 0.000 ±
0.000M/s, total operations 0.701 ± 0.000M/s
2.2 bench trig-kprobe
Iter 0 ( 96.833us): hits 0.290M/s ( 0.290M/prod), drops
0.000M/s, total operations 0.290M/s
Iter 1 (-20.834us): hits 0.291M/s ( 0.291M/prod), drops
0.000M/s, total operations 0.291M/s
Iter 2 ( -2.426us): hits 0.291M/s ( 0.291M/prod), drops
0.000M/s, total operations 0.291M/s
Iter 3 ( 22.332us): hits 0.292M/s ( 0.292M/prod), drops
0.000M/s, total operations 0.292M/s
Iter 4 (-18.204us): hits 0.292M/s ( 0.292M/prod), drops
0.000M/s, total operations 0.292M/s
Iter 5 ( 5.370us): hits 0.292M/s ( 0.292M/prod), drops
0.000M/s, total operations 0.292M/s
Iter 6 ( -7.853us): hits 0.290M/s ( 0.290M/prod), drops
0.000M/s, total operations 0.290M/s
Summary: hits 0.291 ± 0.001M/s ( 0.291M/prod), drops 0.000 ±
0.000M/s, total operations 0.291 ± 0.001M/s
2.3 bench trig-fentry, with direct call
Iter 0 ( 86.481us): hits 0.530M/s ( 0.530M/prod), drops
0.000M/s, total operations 0.530M/s
Iter 1 (-12.593us): hits 0.536M/s ( 0.536M/prod), drops
0.000M/s, total operations 0.536M/s
Iter 2 ( -5.760us): hits 0.532M/s ( 0.532M/prod), drops
0.000M/s, total operations 0.532M/s
Iter 3 ( 1.629us): hits 0.532M/s ( 0.532M/prod), drops
0.000M/s, total operations 0.532M/s
Iter 4 ( -1.945us): hits 0.533M/s ( 0.533M/prod), drops
0.000M/s, total operations 0.533M/s
Iter 5 ( -1.297us): hits 0.532M/s ( 0.532M/prod), drops
0.000M/s, total operations 0.532M/s
Iter 6 ( 0.444us): hits 0.535M/s ( 0.535M/prod), drops
0.000M/s, total operations 0.535M/s
Summary: hits 0.533 ± 0.002M/s ( 0.533M/prod), drops 0.000 ±
0.000M/s, total operations 0.533 ± 0.002M/s
2.3 bench trig-fentry, with indirect call
Iter 0 ( 84.463us): hits 0.404M/s ( 0.404M/prod), drops
0.000M/s, total operations 0.404M/s
Iter 1 (-16.260us): hits 0.405M/s ( 0.405M/prod), drops
0.000M/s, total operations 0.405M/s
Iter 2 ( -1.038us): hits 0.405M/s ( 0.405M/prod), drops
0.000M/s, total operations 0.405M/s
Iter 3 ( -3.797us): hits 0.405M/s ( 0.405M/prod), drops
0.000M/s, total operations 0.405M/s
Iter 4 ( -0.537us): hits 0.402M/s ( 0.402M/prod), drops
0.000M/s, total operations 0.402M/s
Iter 5 ( 3.536us): hits 0.403M/s ( 0.403M/prod), drops
0.000M/s, total operations 0.403M/s
Iter 6 ( 12.203us): hits 0.404M/s ( 0.404M/prod), drops
0.000M/s, total operations 0.404M/s
Summary: hits 0.404 ± 0.001M/s ( 0.404M/prod), drops 0.000 ±
0.000M/s, total operations 0.404 ± 0.001M/s
3. perf report of bench trig-fentry
3.1 with direct call
98.67% 0.27% bench bench
[.] trigger_producer
|
--98.40%--trigger_producer
|
|--96.63%--syscall
| |
| --71.90%--el0t_64_sync
| el0t_64_sync_handler
| el0_svc
| do_el0_svc
| |
| |--70.94%--el0_svc_common
| | |
| |
|--29.55%--invoke_syscall
| | | |
| | |
|--26.23%--__arm64_sys_getpgid
| | | |
|
| | | |
|--18.88%--bpf_trampoline_6442462665_0
| | | |
| |
| | | |
| |--6.85%--__bpf_prog_enter
| | | |
| | |
| | | |
| | --2.68%--migrate_disable
| | | |
| |
| | | |
| |--5.28%--__bpf_prog_exit
| | | |
| | |
| | | |
| | --1.29%--migrate_enable
| | | |
| |
| | | |
|
|--3.96%--bpf_prog_21856463590f61f1_bench_trigger_fentry
| | | |
| |
| | | |
| --0.61%--__rcu_read_lock
| | | |
|
| | | |
--4.42%--find_task_by_vpid
| | | |
|
| | | |
|--2.53%--radix_tree_lookup
| | | |
|
| | | |
--0.61%--idr_find
| | | |
| | |
--0.81%--pid_vnr
| | |
| |
--0.53%--__arm64_sys_getpgid
| |
| --0.95%--invoke_syscall
|
--0.99%--syscall@plt
3.2 with indirect call
98.68% 0.20% bench bench
[.] trigger_producer
|
--98.48%--trigger_producer
|
--97.47%--syscall
|
--76.11%--el0t_64_sync
el0t_64_sync_handler
el0_svc
do_el0_svc
|
|--75.52%--el0_svc_common
| |
|
|--46.35%--invoke_syscall
| | |
| |
--44.06%--__arm64_sys_getpgid
| |
|
| |
|--35.40%--ftrace_caller
| |
| |
| |
| --34.04%--fprobe_handler
| |
| |
| |
| |--15.61%--bpf_fprobe_entry
| |
| | |
| |
| | |--3.79%--__bpf_prog_enter
| |
| | | |
| |
| | |
--0.80%--migrate_disable
| |
| | |
| |
| | |--3.74%--__bpf_prog_exit
| |
| | | |
| |
| | |
--0.77%--migrate_enable
| |
| | |
| |
| |
--2.65%--bpf_prog_21856463590f61f1_bench_trigger_fentry
| |
| |
| |
| |--12.65%--rethook_trampoline_handler
| |
| |
| |
| |--1.70%--rethook_try_get
| |
| | |
| |
| | --1.48%--rcu_is_watching
| |
| |
| |
| |--1.46%--freelist_try_get
| |
| |
| |
| --0.65%--rethook_recycle
| |
|
| |
--6.36%--find_task_by_vpid
| |
|
| |
|--3.64%--radix_tree_lookup
| |
|
| |
--1.74%--idr_find
| |
| --1.05%--ftrace_caller
|
--0.59%--invoke_syscall
This looks slightly better than before but it is actually still a
pretty significant performance hit compared to direct calls.
Note that I can't really make sense of the perf report with indirect
calls. it always reports it spent 12% of the time in
rethook_trampoline_handler but I verified with both a WARN in that
function and a breakpoint with a debugger, this function does *not*
get called when running this "bench trig-fentry" benchmark. Also it
wouldn't make sense for fprobe_handler to call it so I'm quite
confused why perf would report this call and such a long time spent
there. Anyone know what I could be missing here ?
WARNING: multiple messages have this Message-ID (diff)
From: Florent Revest <revest@chromium.org>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Xu Kuohai <xukuohai@huawei.com>,
Mark Rutland <mark.rutland@arm.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Xu Kuohai <xukuohai@huaweicloud.com>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
Will Deacon <will@kernel.org>,
Jean-Philippe Brucker <jean-philippe@linaro.org>,
Ingo Molnar <mingo@redhat.com>, Oleg Nesterov <oleg@redhat.com>,
Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, Zi Shen Lim <zlim.lnx@gmail.com>,
Pasha Tatashin <pasha.tatashin@soleen.com>,
Ard Biesheuvel <ardb@kernel.org>, Marc Zyngier <maz@kernel.org>,
Guo Ren <guoren@kernel.org>,
Masami Hiramatsu <mhiramat@kernel.org>
Subject: Re: [PATCH bpf-next v2 0/4] Add ftrace direct call for arm64
Date: Mon, 17 Oct 2022 19:55:06 +0200 [thread overview]
Message-ID: <CABRcYm+d=xY9nBCJo-6JW_=F41g4X32QM9WOPChaOTfs6d6KCA@mail.gmail.com> (raw)
In-Reply-To: <20221006122922.53802a5c@gandalf.local.home>
On Thu, Oct 6, 2022 at 6:29 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Thu, 6 Oct 2022 18:19:12 +0200
> Florent Revest <revest@chromium.org> wrote:
>
> > Sure, we can give this a try, I'll work on a macro that generates the
> > 7 callbacks and we can check how much that helps. My belief right now
> > is that ftrace's iteration over all ops on arm64 is where we lose most
> > time but now that we have numbers it's pretty easy to check hypothesis
> > :)
>
> Ah, I forgot that's what Mark's code is doing. But yes, that needs to be
> fixed first. I forget that arm64 doesn't have the dedicated trampolines yet.
>
> So, let's hold off until that is complete.
>
> -- Steve
Mark finished an implementation of his per-callsite-ops and min-args
branches (meaning that we can now skip the expensive ftrace's saving
of all registers and iteration over all ops if only one is attached)
- https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64-ftrace-call-ops-20221017
And Masami wrote similar patches to what I had originally done to
fprobe in my branch:
- https://github.com/mhiramat/linux/commits/kprobes/fprobe-update
So I could rebase my previous "bpf on fprobe" branch on top of these:
(as before, it's just good enough for benchmarking and to give a
general sense of the idea, not for a thorough code review):
- https://github.com/FlorentRevest/linux/commits/fprobe-min-args-3
And I could run the benchmarks against my rpi4. I have different
baseline numbers as Xu so I ran everything again and tried to keep the
format the same. "indirect call" refers to my branch I just linked and
"direct call" refers to the series this is a reply to (Xu's work)
1. test with dd
1.1 when no bpf prog attached to vfs_write
# dd if=/dev/zero of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 3.94315 s, 130 MB/s
1.2 attach bpf prog with kprobe, bpftrace -e kprobe:vfs_write {}
# dd if=/dev/zero of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 5.80493 s, 88.2 MB/s
1.3 attach bpf prog with with direct call, bpftrace -e kfunc:vfs_write {}
# dd if=/dev/zero of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 4.18579 s, 122 MB/s
1.4 attach bpf prog with with indirect call, bpftrace -e kfunc:vfs_write {}
# dd if=/dev/zero of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 4.92616 s, 104 MB/s
2. test with bpf/bench
2.1 bench trig-base
Iter 0 ( 86.518us): hits 0.700M/s ( 0.700M/prod), drops
0.000M/s, total operations 0.700M/s
Iter 1 (-26.352us): hits 0.701M/s ( 0.701M/prod), drops
0.000M/s, total operations 0.701M/s
Iter 2 ( 1.092us): hits 0.701M/s ( 0.701M/prod), drops
0.000M/s, total operations 0.701M/s
Iter 3 ( -1.890us): hits 0.701M/s ( 0.701M/prod), drops
0.000M/s, total operations 0.701M/s
Iter 4 ( -2.315us): hits 0.701M/s ( 0.701M/prod), drops
0.000M/s, total operations 0.701M/s
Iter 5 ( 4.184us): hits 0.701M/s ( 0.701M/prod), drops
0.000M/s, total operations 0.701M/s
Iter 6 ( -3.241us): hits 0.701M/s ( 0.701M/prod), drops
0.000M/s, total operations 0.701M/s
Summary: hits 0.701 ± 0.000M/s ( 0.701M/prod), drops 0.000 ±
0.000M/s, total operations 0.701 ± 0.000M/s
2.2 bench trig-kprobe
Iter 0 ( 96.833us): hits 0.290M/s ( 0.290M/prod), drops
0.000M/s, total operations 0.290M/s
Iter 1 (-20.834us): hits 0.291M/s ( 0.291M/prod), drops
0.000M/s, total operations 0.291M/s
Iter 2 ( -2.426us): hits 0.291M/s ( 0.291M/prod), drops
0.000M/s, total operations 0.291M/s
Iter 3 ( 22.332us): hits 0.292M/s ( 0.292M/prod), drops
0.000M/s, total operations 0.292M/s
Iter 4 (-18.204us): hits 0.292M/s ( 0.292M/prod), drops
0.000M/s, total operations 0.292M/s
Iter 5 ( 5.370us): hits 0.292M/s ( 0.292M/prod), drops
0.000M/s, total operations 0.292M/s
Iter 6 ( -7.853us): hits 0.290M/s ( 0.290M/prod), drops
0.000M/s, total operations 0.290M/s
Summary: hits 0.291 ± 0.001M/s ( 0.291M/prod), drops 0.000 ±
0.000M/s, total operations 0.291 ± 0.001M/s
2.3 bench trig-fentry, with direct call
Iter 0 ( 86.481us): hits 0.530M/s ( 0.530M/prod), drops
0.000M/s, total operations 0.530M/s
Iter 1 (-12.593us): hits 0.536M/s ( 0.536M/prod), drops
0.000M/s, total operations 0.536M/s
Iter 2 ( -5.760us): hits 0.532M/s ( 0.532M/prod), drops
0.000M/s, total operations 0.532M/s
Iter 3 ( 1.629us): hits 0.532M/s ( 0.532M/prod), drops
0.000M/s, total operations 0.532M/s
Iter 4 ( -1.945us): hits 0.533M/s ( 0.533M/prod), drops
0.000M/s, total operations 0.533M/s
Iter 5 ( -1.297us): hits 0.532M/s ( 0.532M/prod), drops
0.000M/s, total operations 0.532M/s
Iter 6 ( 0.444us): hits 0.535M/s ( 0.535M/prod), drops
0.000M/s, total operations 0.535M/s
Summary: hits 0.533 ± 0.002M/s ( 0.533M/prod), drops 0.000 ±
0.000M/s, total operations 0.533 ± 0.002M/s
2.3 bench trig-fentry, with indirect call
Iter 0 ( 84.463us): hits 0.404M/s ( 0.404M/prod), drops
0.000M/s, total operations 0.404M/s
Iter 1 (-16.260us): hits 0.405M/s ( 0.405M/prod), drops
0.000M/s, total operations 0.405M/s
Iter 2 ( -1.038us): hits 0.405M/s ( 0.405M/prod), drops
0.000M/s, total operations 0.405M/s
Iter 3 ( -3.797us): hits 0.405M/s ( 0.405M/prod), drops
0.000M/s, total operations 0.405M/s
Iter 4 ( -0.537us): hits 0.402M/s ( 0.402M/prod), drops
0.000M/s, total operations 0.402M/s
Iter 5 ( 3.536us): hits 0.403M/s ( 0.403M/prod), drops
0.000M/s, total operations 0.403M/s
Iter 6 ( 12.203us): hits 0.404M/s ( 0.404M/prod), drops
0.000M/s, total operations 0.404M/s
Summary: hits 0.404 ± 0.001M/s ( 0.404M/prod), drops 0.000 ±
0.000M/s, total operations 0.404 ± 0.001M/s
3. perf report of bench trig-fentry
3.1 with direct call
98.67% 0.27% bench bench
[.] trigger_producer
|
--98.40%--trigger_producer
|
|--96.63%--syscall
| |
| --71.90%--el0t_64_sync
| el0t_64_sync_handler
| el0_svc
| do_el0_svc
| |
| |--70.94%--el0_svc_common
| | |
| |
|--29.55%--invoke_syscall
| | | |
| | |
|--26.23%--__arm64_sys_getpgid
| | | |
|
| | | |
|--18.88%--bpf_trampoline_6442462665_0
| | | |
| |
| | | |
| |--6.85%--__bpf_prog_enter
| | | |
| | |
| | | |
| | --2.68%--migrate_disable
| | | |
| |
| | | |
| |--5.28%--__bpf_prog_exit
| | | |
| | |
| | | |
| | --1.29%--migrate_enable
| | | |
| |
| | | |
|
|--3.96%--bpf_prog_21856463590f61f1_bench_trigger_fentry
| | | |
| |
| | | |
| --0.61%--__rcu_read_lock
| | | |
|
| | | |
--4.42%--find_task_by_vpid
| | | |
|
| | | |
|--2.53%--radix_tree_lookup
| | | |
|
| | | |
--0.61%--idr_find
| | | |
| | |
--0.81%--pid_vnr
| | |
| |
--0.53%--__arm64_sys_getpgid
| |
| --0.95%--invoke_syscall
|
--0.99%--syscall@plt
3.2 with indirect call
98.68% 0.20% bench bench
[.] trigger_producer
|
--98.48%--trigger_producer
|
--97.47%--syscall
|
--76.11%--el0t_64_sync
el0t_64_sync_handler
el0_svc
do_el0_svc
|
|--75.52%--el0_svc_common
| |
|
|--46.35%--invoke_syscall
| | |
| |
--44.06%--__arm64_sys_getpgid
| |
|
| |
|--35.40%--ftrace_caller
| |
| |
| |
| --34.04%--fprobe_handler
| |
| |
| |
| |--15.61%--bpf_fprobe_entry
| |
| | |
| |
| | |--3.79%--__bpf_prog_enter
| |
| | | |
| |
| | |
--0.80%--migrate_disable
| |
| | |
| |
| | |--3.74%--__bpf_prog_exit
| |
| | | |
| |
| | |
--0.77%--migrate_enable
| |
| | |
| |
| |
--2.65%--bpf_prog_21856463590f61f1_bench_trigger_fentry
| |
| |
| |
| |--12.65%--rethook_trampoline_handler
| |
| |
| |
| |--1.70%--rethook_try_get
| |
| | |
| |
| | --1.48%--rcu_is_watching
| |
| |
| |
| |--1.46%--freelist_try_get
| |
| |
| |
| --0.65%--rethook_recycle
| |
|
| |
--6.36%--find_task_by_vpid
| |
|
| |
|--3.64%--radix_tree_lookup
| |
|
| |
--1.74%--idr_find
| |
| --1.05%--ftrace_caller
|
--0.59%--invoke_syscall
This looks slightly better than before but it is actually still a
pretty significant performance hit compared to direct calls.
Note that I can't really make sense of the perf report with indirect
calls. it always reports it spent 12% of the time in
rethook_trampoline_handler but I verified with both a WARN in that
function and a breakpoint with a debugger, this function does *not*
get called when running this "bench trig-fentry" benchmark. Also it
wouldn't make sense for fprobe_handler to call it so I'm quite
confused why perf would report this call and such a long time spent
there. Anyone know what I could be missing here ?
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2022-10-17 17:55 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-13 16:27 [PATCH bpf-next v2 0/4] Add ftrace direct call for arm64 Xu Kuohai
2022-09-13 16:27 ` Xu Kuohai
2022-09-13 16:27 ` [PATCH bpf-next v2 1/4] ftrace: Allow users to disable ftrace direct call Xu Kuohai
2022-09-13 16:27 ` Xu Kuohai
2022-09-13 16:27 ` [PATCH bpf-next v2 2/4] arm64: ftrace: Support long jump for " Xu Kuohai
2022-09-13 16:27 ` Xu Kuohai
2022-09-13 16:27 ` [PATCH bpf-next v2 3/4] arm64: ftrace: Add ftrace direct call support Xu Kuohai
2022-09-13 16:27 ` Xu Kuohai
2022-09-13 16:27 ` [PATCH bpf-next v2 4/4] ftrace: Fix dead loop caused by direct call in ftrace selftest Xu Kuohai
2022-09-13 16:27 ` Xu Kuohai
2022-09-22 18:01 ` [PATCH bpf-next v2 0/4] Add ftrace direct call for arm64 Daniel Borkmann
2022-09-22 18:01 ` Daniel Borkmann
2022-09-26 14:40 ` Catalin Marinas
2022-09-26 14:40 ` Catalin Marinas
2022-09-26 17:43 ` Mark Rutland
2022-09-26 17:43 ` Mark Rutland
2022-09-27 4:49 ` Xu Kuohai
2022-09-27 4:49 ` Xu Kuohai
2022-09-28 16:42 ` Mark Rutland
2022-09-28 16:42 ` Mark Rutland
2022-09-30 4:07 ` Xu Kuohai
2022-09-30 4:07 ` Xu Kuohai
2022-10-04 16:06 ` Florent Revest
2022-10-04 16:06 ` Florent Revest
2022-10-05 14:54 ` Xu Kuohai
2022-10-05 14:54 ` Xu Kuohai
2022-10-05 15:07 ` Steven Rostedt
2022-10-05 15:07 ` Steven Rostedt
2022-10-05 15:10 ` Florent Revest
2022-10-05 15:10 ` Florent Revest
2022-10-05 15:30 ` Steven Rostedt
2022-10-05 15:30 ` Steven Rostedt
2022-10-05 22:12 ` Jiri Olsa
2022-10-05 22:12 ` Jiri Olsa
2022-10-06 16:35 ` Florent Revest
2022-10-06 16:35 ` Florent Revest
2022-10-06 10:09 ` Xu Kuohai
2022-10-06 10:09 ` Xu Kuohai
2022-10-06 16:19 ` Florent Revest
2022-10-06 16:19 ` Florent Revest
2022-10-06 16:29 ` Steven Rostedt
2022-10-06 16:29 ` Steven Rostedt
2022-10-07 10:13 ` Xu Kuohai
2022-10-07 10:13 ` Xu Kuohai
2022-10-17 17:55 ` Florent Revest [this message]
2022-10-17 17:55 ` Florent Revest
2022-10-17 18:49 ` Steven Rostedt
2022-10-17 18:49 ` Steven Rostedt
2022-10-17 19:10 ` Florent Revest
2022-10-17 19:10 ` Florent Revest
2022-10-21 11:31 ` Masami Hiramatsu
2022-10-21 11:31 ` Masami Hiramatsu
2022-10-21 16:49 ` Florent Revest
2022-10-21 16:49 ` Florent Revest
2022-10-24 13:00 ` Masami Hiramatsu
2022-10-24 13:00 ` Masami Hiramatsu
2022-11-10 4:58 ` wuqiang
2022-11-10 4:58 ` wuqiang
2022-10-06 10:09 ` Xu Kuohai
2022-10-06 10:09 ` Xu Kuohai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CABRcYm+d=xY9nBCJo-6JW_=F41g4X32QM9WOPChaOTfs6d6KCA@mail.gmail.com' \
--to=revest@chromium.org \
--cc=andrii@kernel.org \
--cc=ardb@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=catalin.marinas@arm.com \
--cc=daniel@iogearbox.net \
--cc=guoren@kernel.org \
--cc=haoluo@google.com \
--cc=jean-philippe@linaro.org \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kpsingh@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mark.rutland@arm.com \
--cc=martin.lau@linux.dev \
--cc=maz@kernel.org \
--cc=mhiramat@kernel.org \
--cc=mingo@redhat.com \
--cc=oleg@redhat.com \
--cc=pasha.tatashin@soleen.com \
--cc=rostedt@goodmis.org \
--cc=sdf@google.com \
--cc=song@kernel.org \
--cc=will@kernel.org \
--cc=xukuohai@huawei.com \
--cc=xukuohai@huaweicloud.com \
--cc=yhs@fb.com \
--cc=zlim.lnx@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.