All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yonghong Song <yhs@fb.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Andrii Nakryiko <andriin@fb.com>, bpf <bpf@vger.kernel.org>,
	Martin KaFai Lau <kafai@fb.com>,
	Networking <netdev@vger.kernel.org>,
	Alexei Starovoitov <ast@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH bpf-next v2 11/20] bpf: add task and task/file iterator targets
Date: Wed, 6 May 2020 11:24:45 -0700	[thread overview]
Message-ID: <8758f1c9-f4ea-af99-9af8-afe9fb210928@fb.com> (raw)
In-Reply-To: <CAEf4BzbySjaBQSMTET=HGD_K748GOXZZQ7zMhgtEqE-JgJGbdQ@mail.gmail.com>



On 5/6/20 12:30 AM, Andrii Nakryiko wrote:
> On Sun, May 3, 2020 at 11:28 PM Yonghong Song <yhs@fb.com> wrote:
>>
>> Only the tasks belonging to "current" pid namespace
>> are enumerated.
>>
>> For task/file target, the bpf program will have access to
>>    struct task_struct *task
>>    u32 fd
>>    struct file *file
>> where fd/file is an open file for the task.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
> 
> I might be missing some subtleties with task refcounting for task_file
> iterator, asked few questions below...
> 
>>   kernel/bpf/Makefile    |   2 +-
>>   kernel/bpf/task_iter.c | 336 +++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 337 insertions(+), 1 deletion(-)
>>   create mode 100644 kernel/bpf/task_iter.c
>>
>> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
>> index b2b5eefc5254..37b2d8620153 100644
>> --- a/kernel/bpf/Makefile
>> +++ b/kernel/bpf/Makefile
>> @@ -2,7 +2,7 @@
>>   obj-y := core.o
>>   CFLAGS_core.o += $(call cc-disable-warning, override-init)
>>
>> -obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o
>> +obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o
>>   obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
>>   obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o
>>   obj-$(CONFIG_BPF_SYSCALL) += disasm.o
>> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
>> new file mode 100644
>> index 000000000000..1ca258f6e9f4
>> --- /dev/null
>> +++ b/kernel/bpf/task_iter.c
>> @@ -0,0 +1,336 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/* Copyright (c) 2020 Facebook */
>> +
>> +#include <linux/init.h>
>> +#include <linux/namei.h>
>> +#include <linux/pid_namespace.h>
>> +#include <linux/fs.h>
>> +#include <linux/fdtable.h>
>> +#include <linux/filter.h>
>> +
>> +struct bpf_iter_seq_task_common {
>> +       struct pid_namespace *ns;
>> +};
>> +
>> +struct bpf_iter_seq_task_info {
>> +       struct bpf_iter_seq_task_common common;
> 
> you have comment below in init_seq_pidns() that common is supposed to
> be the very first field, but I think it's more important and
> appropriate here, so that whoever adds anything here knows that order
> of field is important.

I can move the comments here.

> 
>> +       struct task_struct *task;
>> +       u32 id;
>> +};
>> +
> 
> [...]
> 
>> +static int __task_seq_show(struct seq_file *seq, void *v, bool in_stop)
>> +{
>> +       struct bpf_iter_meta meta;
>> +       struct bpf_iter__task ctx;
>> +       struct bpf_prog *prog;
>> +       int ret = 0;
>> +
>> +       meta.seq = seq;
>> +       prog = bpf_iter_get_info(&meta, in_stop);
>> +       if (prog) {
> 
> 
> nit: `if (!prog) return 0;` here would reduce nesting level below
> 
>> +               meta.seq = seq;
>> +               ctx.meta = &meta;
>> +               ctx.task = v;
>> +               ret = bpf_iter_run_prog(prog, &ctx);
>> +       }
>> +
>> +       return 0;
> 
> return **ret**; ?

It should return "ret". In task_file show() code is similar but correct.
I can do early return with !prog too although we do not have
deep nesting level yet.

> 
>> +}
>> +
> 
> [...]
> 
>> +
>> +static struct file *task_file_seq_get_next(struct pid_namespace *ns, u32 *id,
>> +                                          int *fd, struct task_struct **task,
>> +                                          struct files_struct **fstruct)
>> +{
>> +       struct files_struct *files;
>> +       struct task_struct *tk;
>> +       u32 sid = *id;
>> +       int sfd;
>> +
>> +       /* If this function returns a non-NULL file object,
>> +        * it held a reference to the files_struct and file.
>> +        * Otherwise, it does not hold any reference.
>> +        */
>> +again:
>> +       if (*fstruct) {
>> +               files = *fstruct;
>> +               sfd = *fd;
>> +       } else {
>> +               tk = task_seq_get_next(ns, &sid);
>> +               if (!tk)
>> +                       return NULL;
>> +
>> +               files = get_files_struct(tk);
>> +               put_task_struct(tk);
> 
> task is put here, but is still used below.. is there some additional
> hidden refcounting involved?

Good question. I had an impression that we take a reference count
for task->files so task should not go away. But reading linux
code again, I do not have sufficient evidence to back my claim.
So I will reference count task as well, e.g., do not put_task_struct()
until all files are done here.

> 
>> +               if (!files) {
>> +                       sid = ++(*id);
>> +                       *fd = 0;
>> +                       goto again;
>> +               }
>> +               *fstruct = files;
>> +               *task = tk;
>> +               if (sid == *id) {
>> +                       sfd = *fd;
>> +               } else {
>> +                       *id = sid;
>> +                       sfd = 0;
>> +               }
>> +       }
>> +
>> +       rcu_read_lock();
>> +       for (; sfd < files_fdtable(files)->max_fds; sfd++) {
> 
> files_fdtable does rcu_dereference on each iteration, would it be
> better to just cache files_fdtable(files)->max_fds into local
> variable? It's unlikely that there will be many iterations, but
> still...

I borrowed code from fs/proc/fd.c. But I can certainly to avoid
repeated reading max_fds as suggested.

> 
>> +               struct file *f;
>> +
>> +               f = fcheck_files(files, sfd);
>> +               if (!f)
>> +                       continue;
>> +               *fd = sfd;
>> +               get_file(f);
>> +               rcu_read_unlock();
>> +               return f;
>> +       }
>> +
>> +       /* the current task is done, go to the next task */
>> +       rcu_read_unlock();
>> +       put_files_struct(files);
>> +       *fstruct = NULL;
> 
> *task = NULL; for completeness?

if *fstruct == NULL, will try to get next task, so *task = NULL
is unnecessary, but I can add it, won't hurt and possibly make
it easy to understand.

> 
>> +       sid = ++(*id);
>> +       *fd = 0;
>> +       goto again;
>> +}
>> +
>> +static void *task_file_seq_start(struct seq_file *seq, loff_t *pos)
>> +{
>> +       struct bpf_iter_seq_task_file_info *info = seq->private;
>> +       struct files_struct *files = NULL;
>> +       struct task_struct *task = NULL;
>> +       struct file *file;
>> +       u32 id = info->id;
>> +       int fd = info->fd;
>> +
>> +       file = task_file_seq_get_next(info->common.ns, &id, &fd, &task, &files);
>> +       if (!file) {
>> +               info->files = NULL;
> 
> what about info->task here?

info->files == NULL indicates the end of iteration, info->task will not 
be checked any more. But I guess, I can assign NULL to task as well to
avoid confusion.

> 
>> +               return NULL;
>> +       }
>> +
>> +       ++*pos;
>> +       info->id = id;
>> +       info->fd = fd;
>> +       info->task = task;
>> +       info->files = files;
>> +
>> +       return file;
>> +}
>> +
> 
> [...]
> 
>> +
>> +struct bpf_iter__task_file {
>> +       __bpf_md_ptr(struct bpf_iter_meta *, meta);
>> +       __bpf_md_ptr(struct task_struct *, task);
>> +       u32 fd;
> 
> nit: sort of works by accident (due to all other field being 8-byte
> aligned pointers), shall we add __attribute__((aligned(8)))?

This is what I thought as well. It should work. But I think
add aligned(8) wont' hurt to expresss the intention.. Will add it.

> 
>> +       __bpf_md_ptr(struct file *, file);
>> +};
>> +
> 
> [...]
> 

  reply	other threads:[~2020-05-06 18:25 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-04  6:25 [PATCH bpf-next v2 00/20] bpf: implement bpf iterator for kernel data Yonghong Song
2020-05-04  6:25 ` [PATCH bpf-next v2 01/20] bpf: implement an interface to register bpf_iter targets Yonghong Song
2020-05-05 21:19   ` Andrii Nakryiko
2020-05-04  6:25 ` [PATCH bpf-next v2 02/20] bpf: allow loading of a bpf_iter program Yonghong Song
2020-05-05 21:29   ` Andrii Nakryiko
2020-05-06  0:07     ` Yonghong Song
2020-05-04  6:25 ` [PATCH bpf-next v2 03/20] bpf: support bpf tracing/iter programs for BPF_LINK_CREATE Yonghong Song
2020-05-05 21:30   ` Andrii Nakryiko
2020-05-06  0:14     ` Yonghong Song
2020-05-06  0:54       ` Alexei Starovoitov
2020-05-06  3:09         ` Andrii Nakryiko
2020-05-06 18:08           ` Alexei Starovoitov
2020-05-04  6:25 ` [PATCH bpf-next v2 04/20] bpf: support bpf tracing/iter programs for BPF_LINK_UPDATE Yonghong Song
2020-05-05 21:32   ` Andrii Nakryiko
2020-05-04  6:25 ` [PATCH bpf-next v2 05/20] bpf: implement bpf_seq_read() for bpf iterator Yonghong Song
2020-05-05 19:56   ` Andrii Nakryiko
2020-05-05 19:57     ` Alexei Starovoitov
2020-05-05 20:25     ` Yonghong Song
2020-05-05 21:08       ` Andrii Nakryiko
2020-05-04  6:25 ` [PATCH bpf-next v2 06/20] bpf: create anonymous " Yonghong Song
2020-05-05 20:11   ` Andrii Nakryiko
2020-05-05 20:28     ` Yonghong Song
2020-05-04  6:25 ` [PATCH bpf-next v2 07/20] bpf: create file " Yonghong Song
2020-05-05 20:15   ` Andrii Nakryiko
2020-05-04  6:25 ` [PATCH bpf-next v2 08/20] bpf: implement common macros/helpers for target iterators Yonghong Song
2020-05-05 20:25   ` Andrii Nakryiko
2020-05-05 20:30     ` Yonghong Song
2020-05-05 21:10       ` Andrii Nakryiko
2020-05-04  6:25 ` [PATCH bpf-next v2 09/20] bpf: add bpf_map iterator Yonghong Song
2020-05-06  5:11   ` Andrii Nakryiko
2020-05-04  6:25 ` [PATCH bpf-next v2 10/20] net: bpf: add netlink and ipv6_route bpf_iter targets Yonghong Song
2020-05-06  5:21   ` Andrii Nakryiko
2020-05-06 17:32     ` Yonghong Song
2020-05-04  6:25 ` [PATCH bpf-next v2 11/20] bpf: add task and task/file iterator targets Yonghong Song
2020-05-06  7:30   ` Andrii Nakryiko
2020-05-06 18:24     ` Yonghong Song [this message]
2020-05-06 20:51       ` Andrii Nakryiko
2020-05-06 21:20         ` Yonghong Song
2020-05-04  6:26 ` [PATCH bpf-next v2 12/20] bpf: add PTR_TO_BTF_ID_OR_NULL support Yonghong Song
2020-05-05 20:27   ` Andrii Nakryiko
2020-05-04  6:26 ` [PATCH bpf-next v2 13/20] bpf: add bpf_seq_printf and bpf_seq_write helpers Yonghong Song
2020-05-06 17:37   ` Andrii Nakryiko
2020-05-06 21:42     ` Yonghong Song
2020-05-08 18:15       ` Andrii Nakryiko
2020-05-04  6:26 ` [PATCH bpf-next v2 14/20] bpf: handle spilled PTR_TO_BTF_ID properly when checking stack_boundary Yonghong Song
2020-05-06 17:38   ` Andrii Nakryiko
2020-05-06 21:47     ` Yonghong Song
2020-05-04  6:26 ` [PATCH bpf-next v2 15/20] bpf: support variable length array in tracing programs Yonghong Song
2020-05-06 17:40   ` Andrii Nakryiko
2020-05-04  6:26 ` [PATCH bpf-next v2 16/20] tools/libbpf: add bpf_iter support Yonghong Song
2020-05-06  5:44   ` Andrii Nakryiko
2020-05-04  6:26 ` [PATCH bpf-next v2 17/20] tools/bpftool: add bpf_iter support for bptool Yonghong Song
2020-05-04  6:26 ` [PATCH bpf-next v2 18/20] tools/bpf: selftests: add iterator programs for ipv6_route and netlink Yonghong Song
2020-05-06  6:01   ` Andrii Nakryiko
2020-05-07  1:09     ` Yonghong Song
2020-05-08 18:17       ` Andrii Nakryiko
2020-05-06  6:04   ` Andrii Nakryiko
2020-05-06 23:07     ` Yonghong Song
2020-05-04  6:26 ` [PATCH bpf-next v2 19/20] tools/bpf: selftests: add iter progs for bpf_map/task/task_file Yonghong Song
2020-05-06  6:14   ` Andrii Nakryiko
2020-05-04  6:26 ` [PATCH bpf-next v2 20/20] tools/bpf: selftests: add bpf_iter selftests Yonghong Song
2020-05-06  6:39   ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8758f1c9-f4ea-af99-9af8-afe9fb210928@fb.com \
    --to=yhs@fb.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andriin@fb.com \
    --cc=ast@fb.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=kafai@fb.com \
    --cc=kernel-team@fb.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.