linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Hao Luo <haoluo@google.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>, KP Singh <kpsingh@kernel.org>,
	Shakeel Butt <shakeelb@google.com>,
	Joe Burton <jevburton.kernel@gmail.com>,
	Stanislav Fomichev <sdf@google.com>,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC bpf-next v2 5/5] selftests/bpf: test for pinning for cgroup_view link
Date: Thu, 3 Feb 2022 14:46:07 -0800	[thread overview]
Message-ID: <CA+khW7jkJbvQrTx4oPJAoBZ0EOCtr3C2PKbrzhxj-7euBK8ojg@mail.gmail.com> (raw)
In-Reply-To: <20220203180414.blk6ou3ccmod2qck@ast-mbp.dhcp.thefacebook.com>

On Thu, Feb 3, 2022 at 10:04 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Feb 01, 2022 at 12:55:34PM -0800, Hao Luo wrote:
> > +
> > +SEC("iter/cgroup_view")
> > +int dump_cgroup_lat(struct bpf_iter__cgroup_view *ctx)
> > +{
> > +     struct seq_file *seq = ctx->meta->seq;
> > +     struct cgroup *cgroup = ctx->cgroup;
> > +     struct wait_lat *lat;
> > +     u64 id;
> > +
> > +     BPF_SEQ_PRINTF(seq, "cgroup_id: %8lu\n", cgroup->kn->id);
> > +     lat = bpf_map_lookup_elem(&cgroup_lat, &id);
>
> Looks like "id = cgroup->kn->id" assignment is missing here?
>

Ah, yes. I'll fix it.

> Thanks a lot for this test. It explains the motivation well.
>
> It seems that the patches 1-4 are there to automatically
> supply cgroup pointer into bpf_iter__cgroup_view.
>
> Since user space needs to track good part of cgroup dir opreations
> can we task it with the job of patches 1-4 as well?
> It can register notifier for cgroupfs operations and
> do mkdir in bpffs similarly _and_ parametrize 'view' bpf program
> with corresponding cgroup_id.
> Ideally there is no new 'view' program and it's a subset of 'iter'
> bpf program. They're already parametrizable.
> When 'iter' is pinned the user space can tell it which object it should
> iterate on. The 'view' will be an interator of one element and
> argument to it can be cgroup_id.
> When user space pins the same 'view' program in a newly created bpffs
> directory it will parametrize it with a different cgroup_id.
> At the end the same 'view' program will be pinned in multiple directories
> with different cgroup_id arguments.
> This patch 5 will look very much the same, but patches 1-4 will not be
> necessary.
> Of course there are races between cgroup create/destroy and bpffs
> mkdir, prog pin operatiosn, but they will be there regardless.
> The patch 1-4 approach is not race free either.

Right. I tried to minimize the races between cgroupfs and bpffs in
this patchset. The cgroup and kernfs APIs called in this patchset
guarantee that the cgroup and kernfs objects are alive once get. Some
states in the objects such as 'id' should be valid at least.

> Will that work?

Thanks Alexei for the idea.

The parameterization part sounds good. By 'parametrize', do you mean a
variable in iter prog (like the 'pid' variable in bpf_iter_task_vma.c
[1])? or some metadata of the program? I assume it's program's
metadata. Either parameterizing with cgroup_id or passing cgroup
object to the prog should work. The problem is at pinning.

In our use case, we can't ask the users who create cgroups to do the
pinning. Pinning requires root privilege. In our use case, we have
non-root users who can create cgroup directories and still want to
read bpf stats. They can't do pinning by themselves. This is why
inheritance is a requirement for us. With inheritance, they only need
to mkdir in cgroupfs and bpffs (unprivileged operations), no pinning
operation is required. Patch 1-4 are needed to implement inheritance.

It's also not a good idea in our use case to add a userspace
privileged process to monitor cgroupfs operations and perform the
pinning. It's more complex and has a higher maintenance cost and
runtime overhead, compared to the solution of asking whoever makes
cgroups to mkdir in bpffs. The other problem is: if there are nodes in
the data center that don't have the userspace process deployed, the
stats will be unavailable, which is a no-no for some of our users.

[1] tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c

  reply	other threads:[~2022-02-03 22:46 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-01 20:55 [PATCH RFC bpf-next v2 0/5] Extend cgroup interface with bpf Hao Luo
2022-02-01 20:55 ` [PATCH RFC bpf-next v2 1/5] bpf: Bpffs directory tag Hao Luo
2022-02-01 20:55 ` [PATCH RFC bpf-next v2 2/5] bpf: Introduce inherit list for dir tag Hao Luo
2022-02-01 20:55 ` [PATCH RFC bpf-next v2 3/5] bpf: cgroup_view iter Hao Luo
2022-02-01 20:55 ` [PATCH RFC bpf-next v2 4/5] bpf: Pin cgroup_view Hao Luo
2022-02-01 20:55 ` [PATCH RFC bpf-next v2 5/5] selftests/bpf: test for pinning for cgroup_view link Hao Luo
2022-02-03 18:04   ` Alexei Starovoitov
2022-02-03 22:46     ` Hao Luo [this message]
2022-02-04  3:33       ` Alexei Starovoitov
2022-02-04 18:26         ` Hao Luo
2022-02-06  4:29           ` Alexei Starovoitov
2022-02-08 20:07             ` Hao Luo
2022-02-08 21:20               ` Alexei Starovoitov
2022-02-08 21:34                 ` Hao Luo
2022-02-14 18:29                 ` Hao Luo
2022-02-14 19:24                   ` Alexei Starovoitov
2022-02-14 20:23                     ` Hao Luo
2022-02-14 20:27                       ` Alexei Starovoitov
2022-02-14 20:39                         ` Hao Luo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+khW7jkJbvQrTx4oPJAoBZ0EOCtr3C2PKbrzhxj-7euBK8ojg@mail.gmail.com \
    --to=haoluo@google.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=jevburton.kernel@gmail.com \
    --cc=kafai@fb.com \
    --cc=kpsingh@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sdf@google.com \
    --cc=shakeelb@google.com \
    --cc=songliubraving@fb.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).