From: Song Liu <songliubraving@fb.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: open list <linux-kernel@vger.kernel.org>,
bpf <bpf@vger.kernel.org>, Networking <netdev@vger.kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Kernel Team <Kernel-team@fb.com>,
john fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@chromium.org>,
Jesper Dangaard Brouer <brouer@redhat.com>,
Daniel Xu <dlxu@fb.com>
Subject: Re: [PATCH bpf-next 5/5] selftests/bpf: add benchmark for uprobe vs. user_prog
Date: Wed, 5 Aug 2020 07:01:00 +0000
Message-ID: <BA750BC0-0A3B-488B-806C-90C1B6CDF586@fb.com>
In-Reply-To: <CAEf4BzZ29G2KexhKY=CffOPK_DiqAXxRHWuVRREHv0dnXgobRQ@mail.gmail.com>
> On Aug 4, 2020, at 10:47 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Aug 4, 2020 at 9:47 PM Song Liu <songliubraving@fb.com> wrote:
>>
>>
>>
>>> On Aug 4, 2020, at 6:52 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>>
>>> On Tue, Aug 4, 2020 at 2:01 PM Song Liu <songliubraving@fb.com> wrote:
>>>>
>>>>
>>>>
>>>>> On Aug 2, 2020, at 10:10 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>>>>
>>>>> On Sun, Aug 2, 2020 at 9:47 PM Song Liu <songliubraving@fb.com> wrote:
>>>>>>
>>>>>>
>>>>>>> On Aug 2, 2020, at 6:51 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>>>>>>>
>>>>>>> On Sat, Aug 1, 2020 at 1:50 AM Song Liu <songliubraving@fb.com> wrote:
>>>>>>>>
>>>>>>>> Add a benchmark to compare performance of
>>>>>>>> 1) uprobe;
>>>>>>>> 2) user program w/o args;
>>>>>>>> 3) user program w/ args;
>>>>>>>> 4) user program w/ args on random cpu.
>>>>>>>>
>>>>>>>
>>>>>>> Can you please add it to the existing benchmark runner instead, e.g.,
>>>>>>> alongside the other bench_trigger benchmarks? No need to re-implement
>>>>>>> benchmark setup. That would also allow comparing existing ways of
>>>>>>> cheaply triggering a program against this new _USER program.
>>>>>>
>>>>>> Will try.
>>>>>>
>>>>>>>
>>>>>>> If the performance is not significantly better than other ways, do you
>>>>>>> think it still makes sense to add a new BPF program type? I think
>>>>>>> triggering KPROBE/TRACEPOINT from bpf_prog_test_run() would be very
>>>>>>> nice, maybe it's possible to add that instead of a new program type?
>>>>>>> Either way, let's see comparison with other program triggering
>>>>>>> mechanisms first.
>>>>>>
>>>>>> Triggering KPROBE and TRACEPOINT from bpf_prog_test_run() will be useful.
>>>>>> But I don't think they can be used instead of the user program, for a
>>>>>> couple of reasons. First, KPROBE/TRACEPOINT may be triggered by other
>>>>>> programs running in the system, so the user will have to filter that
>>>>>> noise out in each program. Second, it is not easy to specify the CPU for
>>>>>> KPROBE/TRACEPOINT, while this feature could be useful in many cases, e.g.
>>>>>> getting a stack trace on a given CPU.
>>>>>>
>>>>>
>>>>> Right, it's not as convenient with KPROBE/TRACEPOINT as with the USER
>>>>> program you've added specifically with that feature in mind. But if
>>>>> you pin user-space thread on the needed CPU and trigger kprobe/tp,
>>>>> then you'll get what you want. As for the "noise", see how
>>>>> bench_trigger() deals with that: it records thread ID and filters
>>>>> everything not matching. You can do the same with CPU ID. It's not as
>>>>> automatic as with a special BPF program type, but still pretty simple,
>>>>> which is why I'm still deciding (for myself) whether USER program type
>>>>> is necessary :)
>>>>
>>>> Here are some bench_trigger numbers:
>>>>
>>>> base         : 1.698 ± 0.001M/s
>>>> tp           : 1.477 ± 0.001M/s
>>>> rawtp        : 1.567 ± 0.001M/s
>>>> kprobe       : 1.431 ± 0.000M/s
>>>> fentry       : 1.691 ± 0.000M/s
>>>> fmodret      : 1.654 ± 0.000M/s
>>>> user         : 1.253 ± 0.000M/s
>>>> fentry-on-cpu: 0.022 ± 0.011M/s
>>>> user-on-cpu  : 0.315 ± 0.001M/s
>>>>
>>>
>>> Ok, so basically all of raw_tp, tp, kprobe, fentry/fexit are
>>> significantly faster than USER programs. Sure, when compared to
>>> uprobe, they are faster, but not when doing an on-specific-CPU run, it
>>> seems (judging from this patch's description, if I'm reading it
>>> right). Anyway, the speed argument shouldn't be a reason for doing this,
>>> IMO.
>>>
>>>> The two "on-cpu" tests run the program on a different CPU (see the patch
>>>> at the end).
>>>>
>>>> "user" is about 25% slower than "fentry". I think this is mostly because
>>>> getpgid() is a faster syscall than bpf(BPF_TEST_RUN).
>>>
>>> Yes, probably.
>>>
>>>>
>>>> "user-on-cpu" is more than 10x faster than "fentry-on-cpu", because IPI
>>>> is way faster than moving the process (via sched_setaffinity).
>>>
>>> I don't think that's a good comparison, because you are actually
>>> testing sched_setaffinity performance on each iteration vs IPI in the
>>> kernel, not a BPF overhead.
>>>
>>> I think the fair comparison for this would be to create a thread and
>>> pin it on the necessary CPU, and only then call the BPF program in a loop.
>>> But I bet any of the existing program types would beat the USER program.
>>>
>>>>
>>>> For use cases that we would like to call BPF program on specific CPU,
>>>> triggering it via IPI is a lot faster.
>>>
>>> So these use cases would be nice to expand on in the motivational part
>>> of the patch set. It's not really emphasized and it's not at all clear
>>> what you are trying to achieve. It also seems, depending on latency
>>> requirements, it's totally possible to achieve comparable results by
>>> pre-creating a thread for each CPU, pinning each one to its designated
>>> CPU and then using any suitable user-space signaling mechanism (a
>>> queue, condvar, etc.) to ask a thread to trigger the BPF program (fentry on
>>> getpgid(), for instance).
>>
>> I don't see why a user-space signal plus fentry would be faster than IPI.
>> If the target cpu is running something, this is going to add two context
>> switches.
>>
>
> I didn't say faster, did I? I said it would be comparable and wouldn't
> require a new program type.
Well, I don't think adding a program type is that big a deal. If that is
really a problem, we can use a new attach type instead. The goal is to
trigger it with sys_bpf() on a different cpu. So we could call it a kprobe
attached to nothing and hack it that way. I added the new type because it
makes sense: the user just wants to trigger a BPF program from user space.
> But then again, without knowing all the
> details, it's a bit hard to discuss this. E.g., if you need to trigger
> that BPF program periodically, you can sleep in those per-CPU threads,
> or epoll, or whatever. Or maybe you can set up a per-CPU perf event
> that would trigger your program on the desired CPU, etc. My point is
> that I and others shouldn't be guessing this, I'd expect someone who's
> proposing an entire new BPF program type to motivate why this new
> program type is necessary and what problem it's solving that can't be
> solved with existing means.
Yes, there are other options, but they all come with non-trivial costs.
Per-CPU-per-process threads and/or per-CPU perf events are costs we have
to pay in production. IMO, these costs are much higher than a new program
type (or attach type).
>
> BTW, how frequently do you need to trigger the BPF program? Seems very
> frequently, if 2 vs 1 context switches might be a problem?
The whole solution requires two BPF programs: one runs on each context
switch, the other is the user program. The user program will not be
triggered very often.
>
>>> I bet in this case the performance would be
>>> really nice for a lot of practical use cases. But then again, I don't
>>> know details of the intended use case, so please provide some more
>>> details.
>>
>> Being able to trigger a BPF program on a different CPU could enable many
>> use cases and optimizations. The use case I am looking at is to access
>> perf_event and percpu maps on the target CPU. For example:
>> 0. trigger the program
>> 1. read perf_event on cpu x;
>> 2. (optional) check which process is running on cpu x;
>> 3. add perf_event value to percpu map(s) on cpu x.
>>
>> If we do these steps in a BPF program on cpu x, the cost is:
>> A.0) trigger BPF via IPI;
>> A.1) read perf_event locally;
>> A.2) local access current;
>> A.3) local access of percpu map(s).
>>
>> If we can only do these on a different CPU, the cost will be:
>> B.0) trigger BPF locally;
>> B.1) read perf_event via IPI;
>> B.2) remote access current on cpu x;
>> B.3) remote access percpu map(s), or use non-percpu map(s).
>>
>> Cost of (A.0 + A.1) is about the same as (B.0 + B.1), maybe a little higher
>> (sys_bpf() vs. sys_getpgid()). But A.2 and A.3 will be significantly
>> cheaper than B.2 and B.3.
>>
>> Does this make sense?
>
> It does, thanks. But what I was describing is still A, no? BPF program
> will be triggered on your desired cpu X, wouldn't it?
Well, that would be option C, but C could not do step 2, because we context
switch to the dedicated thread, so "current" on cpu x would be that thread
rather than whatever was running there before.