From: Kui-Feng Lee <kuifeng@fb.com>
To: "andrii.nakryiko@gmail.com" <andrii.nakryiko@gmail.com>
Cc: "daniel@iogearbox.net" <daniel@iogearbox.net>,
Kernel Team <Kernel-team@fb.com>, Yonghong Song <yhs@fb.com>,
"ast@kernel.org" <ast@kernel.org>,
"andrii@kernel.org" <andrii@kernel.org>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>
Subject: Re: [PATCH bpf-next v2 1/3] bpf: Parameterize task iterators.
Date: Tue, 2 Aug 2022 16:42:35 +0000 [thread overview]
Message-ID: <9ab00aa58259d9dd7b45fdf860423e86612b591d.camel@fb.com> (raw)
In-Reply-To: <CAEf4BzZqwoCecuUTe=LGBBrTWMp_bCttik1fkmRF1rBXxBYPAw@mail.gmail.com>
On Mon, 2022-08-01 at 20:30 -0700, Andrii Nakryiko wrote:
> On Mon, Aug 1, 2022 at 4:27 PM Kui-Feng Lee <kuifeng@fb.com> wrote:
> >
> > Allow creating an iterator that loops through resources of one
> > task/thread.
> >
> > People could only create iterators to loop through all resources of
> > files, vma, and tasks in the system, even though they were
> > interested
> > in only the resources of a specific task or process. Passing the
> > additional parameters, people can now create an iterator to go
> > through all resources or only the resources of a task.
> >
> > Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
> > ---
> > include/linux/bpf.h | 4 ++
> > include/uapi/linux/bpf.h | 23 +++++++++
> > kernel/bpf/task_iter.c | 93 ++++++++++++++++++++++++++----
> > ----
> > tools/include/uapi/linux/bpf.h | 23 +++++++++
> > 4 files changed, 121 insertions(+), 22 deletions(-)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 11950029284f..3c26dbfc9cef 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1718,6 +1718,10 @@ int bpf_obj_get_user(const char __user
> > *pathname, int flags);
> >
> > struct bpf_iter_aux_info {
> > struct bpf_map *map;
> > + struct {
> > + u32 tid;
> > + u8 type;
> > + } task;
> > };
> >
> > typedef int (*bpf_iter_attach_target_t)(struct bpf_prog *prog,
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index ffcbf79a556b..ed5ba501609f 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -87,10 +87,33 @@ struct bpf_cgroup_storage_key {
> > __u32 attach_type; /* program attach type
> > (enum bpf_attach_type) */
> > };
> >
> > +enum bpf_task_iter_type {
> > + BPF_TASK_ITER_ALL = 0,
> > + BPF_TASK_ITER_TID,
> > +};
> > +
> > union bpf_iter_link_info {
> > struct {
> > __u32 map_fd;
> > } map;
> > + /*
> > + * Parameters of task iterators.
> > + */
> > + struct {
> > + __u32 pid_fd;
>
> I was a bit late to the discussion about pidfd vs plain pid. I think
> we should support both in this API. While pid_fd has some nice
> guarantees like avoiding the risk of accidental PID reuse, in a lot
> (if not all) cases where task/task_vma/task_file iterators are going
> to be used this is never a risk, because pid will usually come from
> some tracing BPF program (kprobe/tp/fentry/etc), like in case of
> profiling, and then will be used by user-space almost immediately to
> query some additional information (fetching relevant vma information
> for profiling use case). So main benefit of pidfd is not that
> relevant
> for BPF tracing use cases, because PIDs are not going to be reused so
> fast within such a short time frame.
>
> But pidfd does have downsides. It requires 2 syscalls (pidfd_open and
> close) for each PID, it creates struct file for each such active
> pidfd. So it will have non-trivial overhead for high-frequency BPF
> iterator use cases (imagine querying some simple stats for a big set
> of tasks, frequently: you'll spend more time in pidfd syscalls and
> more resources just keeping corresponding struct file open than
> actually doing useful BPF work). For simple BPF iter cases it will
> unnecessarily complicate program flow while giving no benefit
> instead.
It is a good point to have more syscalls.
>
> So I propose we support both in UAPI. Internally either way we
> resolve
> to plain pid/tid, so this won't cause added maintenance burden. But
> simple cases will keep simple, while more long-lived and/or
> complicated ones will still be supported. We then can have
> BPF_TASK_ITER_PIDFD vs BPF_TASK_ITER_TID to differentiate whether the
> above __u32 pid_fd (which we should probably rename to something more
> generic like "target") is pid FD or TID/PID. See also below about TID
> vs PID.
>
> > + /*
> > + * The type of the iterator.
> > + *
> > + * It can be one of enum bpf_task_iter_type.
> > + *
> > + * BPF_TASK_ITER_ALL (default)
> > + * The iterator iterates over resources of
> > everyprocess.
> > + *
> > + * BPF_TASK_ITER_TID
> > + * You should also set *pid_fd* to iterate
> > over one task.
>
> naming nit: we should decide whether we use TID (thread) and PID
> (process) terminology (more usual for user-space) or PID (process ==
> task == user-space thread) and TGID (thread group, i.e. user-space
> process). I haven't investigated much what's we use most
> consistently,
> but curious to hear what others think.
>
> Also I can see use-cases where we want to iterate just specified task
> (i.e., just specified thread) vs all the tasks that belong to the
> same
> process group (i.e., thread within process). Naming TBD, but we
> should
> have BPF_TASK_ITER_TID and BPF_TASK_ITER_TGID (or some other naming).
I discussed with Yonghong about iterators over resources of all tasks
of a process. User code should create iterators for each thread of the
process if necessary. We may add the support of tgid if it is higly
demanded.
In a discussion of using pidfd, people mentioned to extend pidfd to
threads if there is a good use-case. It also applies to our case.
Most of the time, if not always, vma & files are shared by all threads
of a process. So, an iteration over all resources of every threads of
a process doesn't get obvious benefit. It is also true for an iterator
over the resources of a specific thread instead of a process.
>
> One might ask why do we need single-task mode if we can always stop
> iteration from BPF program, but this is trivial only for iter/task,
> while for iter/task_vma and iter/task_file it becomes inconvenient to
> detect switch from one task to another. It costs us essentially
> nothing to support this mode, so I advocate to do that.
>
> I have similar thoughts about cgroup iteration modes and actually
> supporting cgroup_fd as target for task iterators (which will mean
> iterating tasks belonging to provided cgroup(s)), but I'll reply on
> cgroup iterator patch first, and we can just reuse the same cgroup
> target specification between iter/cgroup and iter/task afterwards.
>
>
> > + */
> > + __u8 type; /* BPF_TASK_ITER_* */
> > + } task;
> > };
> >
>
> [...]
next prev parent reply other threads:[~2022-08-02 16:42 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-01 23:26 [PATCH bpf-next v2 0/3] Parameterize task iterators Kui-Feng Lee
2022-08-01 23:26 ` [PATCH bpf-next v2 1/3] bpf: " Kui-Feng Lee
2022-08-02 1:49 ` Alexei Starovoitov
2022-08-02 16:47 ` Kui-Feng Lee
2022-08-02 21:19 ` Andrii Nakryiko
2022-08-02 3:30 ` Andrii Nakryiko
2022-08-02 16:42 ` Kui-Feng Lee [this message]
2022-08-02 21:17 ` Andrii Nakryiko
2022-08-04 23:05 ` Kui-Feng Lee
2022-08-01 23:26 ` [PATCH bpf-next v2 2/3] bpf: Handle bpf_link_info for the parameterized task BPF iterators Kui-Feng Lee
2022-08-01 23:26 ` [PATCH bpf-next v2 3/3] selftests/bpf: Test " Kui-Feng Lee
2022-08-01 23:35 ` [PATCH bpf-next v2 0/3] Parameterize task iterators Kui-Feng Lee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9ab00aa58259d9dd7b45fdf860423e86612b591d.camel@fb.com \
--to=kuifeng@fb.com \
--cc=Kernel-team@fb.com \
--cc=andrii.nakryiko@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).