All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yonghong Song <yhs@fb.com>
To: Kui-Feng Lee <kuifeng@fb.com>, "memxor@gmail.com" <memxor@gmail.com>
Cc: "brauner@kernel.org" <brauner@kernel.org>,
	"daniel@iogearbox.net" <daniel@iogearbox.net>,
	"olsajiri@gmail.com" <olsajiri@gmail.com>,
	"ast@kernel.org" <ast@kernel.org>,
	Kernel Team <Kernel-team@fb.com>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"andrii@kernel.org" <andrii@kernel.org>
Subject: Re: [PATCH bpf-next 1/3] bpf: Parameterize task iterators.
Date: Thu, 28 Jul 2022 10:08:44 -0700	[thread overview]
Message-ID: <555a171a-9855-e827-878d-e75e533f72ad@fb.com> (raw)
In-Reply-To: <3805b621c511ee9bd76c6655d6ba814d1b54ee37.camel@fb.com>



On 7/28/22 9:40 AM, Kui-Feng Lee wrote:
> On Thu, 2022-07-28 at 18:22 +0200, Kumar Kartikeya Dwivedi wrote:
>> On Thu, 28 Jul 2022 at 17:16, Kui-Feng Lee <kuifeng@fb.com> wrote:
>>>
>>> On Thu, 2022-07-28 at 10:47 +0200, Kumar Kartikeya Dwivedi wrote:
>>>> On Thu, 28 Jul 2022 at 07:25, Kui-Feng Lee <kuifeng@fb.com>
>>>> wrote:
>>>>>
>>>>> On Wed, 2022-07-27 at 10:19 +0200, Kumar Kartikeya Dwivedi
>>>>> wrote:
>>>>>> On Wed, 27 Jul 2022 at 09:01, Kui-Feng Lee <kuifeng@fb.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> On Tue, 2022-07-26 at 14:13 +0200, Jiri Olsa wrote:
>>>>>>>> On Mon, Jul 25, 2022 at 10:17:11PM -0700, Kui-Feng Lee
>>>>>>>> wrote:
>>>>>>>>> Allow creating an iterator that loops through resources
>>>>>>>>> of
>>>>>>>>> one
>>>>>>>>> task/thread.
>>>>>>>>>
>>>>>>>>> People could only create iterators to loop through all
>>>>>>>>> resources of
>>>>>>>>> files, vma, and tasks in the system, even though they
>>>>>>>>> were
>>>>>>>>> interested
>>>>>>>>> in only the resources of a specific task or process.
>>>>>>>>> Passing
>>>>>>>>> the
>>>>>>>>> additional parameters, people can now create an
>>>>>>>>> iterator to
>>>>>>>>> go
>>>>>>>>> through all resources or only the resources of a task.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Kui-Feng Lee <kuifeng@fb.com>
>>>>>>>>> ---
>>>>>>>>>   include/linux/bpf.h            |  4 ++
>>>>>>>>>   include/uapi/linux/bpf.h       | 23 ++++++++++
>>>>>>>>>   kernel/bpf/task_iter.c         | 81
>>>>>>>>> +++++++++++++++++++++++++-
>>>>>>>>> ----
>>>>>>>>> ----
>>>>>>>>>   tools/include/uapi/linux/bpf.h | 23 ++++++++++
>>>>>>>>>   4 files changed, 109 insertions(+), 22 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>>>>>>>>> index 11950029284f..c8d164404e20 100644
>>>>>>>>> --- a/include/linux/bpf.h
>>>>>>>>> +++ b/include/linux/bpf.h
>>>>>>>>> @@ -1718,6 +1718,10 @@ int bpf_obj_get_user(const char
>>>>>>>>> __user
>>>>>>>>> *pathname, int flags);
>>>>>>>>>
>>>>>>>>>   struct bpf_iter_aux_info {
>>>>>>>>>          struct bpf_map *map;
>>>>>>>>> +       struct {
>>>>>>>>> +               __u32   tid;
>>>>>>>>
>>>>>>>> should be just u32 ?
>>>>>>>
>>>>>>> Or, should change the following 'type' to __u8?
>>>>>>
>>>>>> Would it be better to use a pidfd instead of a tid here?
>>>>>> Unset
>>>>>> pidfd
>>>>>> would mean going over all tasks, and any fd > 0 implies
>>>>>> attaching
>>>>>> to
>>>>>> a
>>>>>> specific task (as is the convention in BPF land). Most of the
>>>>>> new
>>>>>> UAPIs working on processes are using pidfds (to work with a
>>>>>> stable
>>>>>> handle instead of a reusable ID).
>>>>>> The iterator taking an fd also gives an opportunity to BPF
>>>>>> LSMs
>>>>>> to
>>>>>> attach permissions/policies to it (once we have a file local
>>>>>> storage
>>>>>> map) e.g. whether creating a task iterator for that specific
>>>>>> pidfd
>>>>>> instance (backed by the struct file) would be allowed or not.
>>>>>> You are using getpid in the selftest and keeping track of
>>>>>> last_tgid
>>>>>> in
>>>>>> the iterator, so I guess you don't even need to extend
>>>>>> pidfd_open
>>>>>> to
>>>>>> work on thread IDs right now for your use case (and fdtable
>>>>>> and
>>>>>> mm
>>>>>> are
>>>>>> shared for POSIX threads anyway, so for those two it won't
>>>>>> make a
>>>>>> difference).

There is one problem here. The current pidfd_open syscall
only supports thread-group leader, i.e., main thread, i.e.,
it won't support any non-main-thread tid's. Yes, thread-group
leader and other threads should share the same vma and files
in most of times, but it still possible different threads
in the same process may have different files which is why
in current task_iter.c we have:
                 *tid = pid_nr_ns(pid, ns);
                 task = get_pid_task(pid, PIDTYPE_PID);
                 if (!task) {
                         ++*tid;
                         goto retry;
                 } else if (skip_if_dup_files && 
!thread_group_leader(task) &&
                            task->files == task->group_leader->files) {
                         put_task_struct(task);
                         task = NULL;
                         ++*tid;
                         goto retry;
                 }


Each thread (tid) will have some fields different from
thread-group leader (tgid), e.g., comm and most (if not all)
scheduling related fields.

So it would be good to support for each tid from the start
as it is not clear when pidfd_open will support non
thread-group leader.

If it worries wrap around, a reference to the task
can be held when tid passed to the kernel at link
create time. This is similar to pid is passed to
the kernel at pidfd_open syscall. But in practice,
I think taking the reference during read() should
also fine. The race always exist anyway.

Kumar, could you give more details about security
concerns? I am not sure about the tight relationship
between bpf_iter and security. bpf_iter just for
iterating kernel data structures.

>>>>>>
>>>>>> What is your opinion?
>>>>>
>>>>> Do you mean removed both tid and type, and replace them with a
>>>>> pidfd?
>>>>> We can do that in uapi, struct bpf_link_info.  But, the interal
>>>>> types,
>>>>> ex. bpf_iter_aux_info, still need to use tid or struct file to
>>>>> avoid
>>>>> getting file from the per-process fdtable.  Is that what you
>>>>> mean?
>>>>>
>>>>
>>>> Yes, just for the UAPI, it is similar to taking map_fd for map
>>>> iter.
>>>> In bpf_link_info we should report just the tid, just like map
>>>> iter
>>>> reports map_id.
>>>
>>> It sounds good to me.
>>>
>>> One thing I need a clarification. You mentioned that a fd > 0
>>> implies
>>> attaching to a specific task, however fd can be 0. So, it should be
>>> fd
>>>> = 0. So, it forces the user to initialize the value of pidfd to -
>>>> 1.
>>> So, for convenience, we still need a field like 'type' to make it
>>> easy
>>> to create iterators without a filter.
>>>
>>
>> Right, but in lots of BPF UAPI fields, fd 0 means fd is unset, so it
>> is fine to rely on that assumption. For e.g. even for map_fd,
>> bpf_map_elem iterator considers fd 0 to be unset. Then you don't need
>> the type field.
> 
> I just realize that pidfd may be meaningless for the bpf_link_info
> returned by bpf_obj_get_info_by_fd() since the origin fd might be
> closed already.  So, I will always set it a value of 0.
> 

  reply	other threads:[~2022-07-28 17:09 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-26  5:17 [PATCH bpf-next 0/3] Parameterize task iterators Kui-Feng Lee
2022-07-26  5:17 ` [PATCH bpf-next 1/3] bpf: " Kui-Feng Lee
2022-07-26 12:13   ` Jiri Olsa
2022-07-26 13:19     ` Jiri Olsa
2022-07-27  6:39       ` Kui-Feng Lee
2022-07-27  6:56     ` Kui-Feng Lee
2022-07-27  8:19       ` Kumar Kartikeya Dwivedi
2022-07-28  5:25         ` Kui-Feng Lee
2022-07-28  8:47           ` Kumar Kartikeya Dwivedi
2022-07-28 15:16             ` Kui-Feng Lee
2022-07-28 16:22               ` Kumar Kartikeya Dwivedi
2022-07-28 16:40                 ` Kui-Feng Lee
2022-07-28 17:08                   ` Yonghong Song [this message]
2022-07-28 17:52                     ` Kumar Kartikeya Dwivedi
2022-07-28 19:11                       ` Hao Luo
2022-08-02 14:25                         ` Kumar Kartikeya Dwivedi
2022-07-28 22:54                       ` Yonghong Song
2022-07-29  9:10                         ` Christian Brauner
     [not found]                   ` <CAP01T74HRHapKDAfj104KNGnzCgNQSu_M5-KfEvGBNzLWNfd+Q@mail.gmail.com>
2022-07-30  2:46                     ` Kui-Feng Lee
2022-07-28 18:01                 ` Hao Luo
2022-07-26  5:17 ` [PATCH bpf-next 2/3] bpf: Handle bpf_link_info for the parameterized task BPF iterators Kui-Feng Lee
2022-07-26  5:17 ` [PATCH bpf-next 3/3] selftests/bpf: Test " Kui-Feng Lee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=555a171a-9855-e827-878d-e75e533f72ad@fb.com \
    --to=yhs@fb.com \
    --cc=Kernel-team@fb.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brauner@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=kuifeng@fb.com \
    --cc=memxor@gmail.com \
    --cc=olsajiri@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.