Re: [PATCH bpf-next v1 8/9] bpf: Introduce cgroup iter

From: Kumar Kartikeya Dwivedi <memxor@gmail.com>
To: Yonghong Song <yhs@fb.com>
Cc: Hao Luo <haoluo@google.com>, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	KP Singh <kpsingh@kernel.org>, Shakeel Butt <shakeelb@google.com>,
	Joe Burton <jevburton.kernel@gmail.com>,
	Tejun Heo <tj@kernel.org>,
	joshdon@google.com, sdf@google.com, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH bpf-next v1 8/9] bpf: Introduce cgroup iter
Date: Thu, 3 Mar 2022 13:43:21 +0530	[thread overview]
Message-ID: <20220303081321.hoxhygtgjzi2q6xs@apollo.legion> (raw)
In-Reply-To: <ee991731-0e85-23be-2720-2d641704dcf9@fb.com>

On Thu, Mar 03, 2022 at 01:03:57PM IST, Yonghong Song wrote:
> [...]
> >
> > Right, I was thinking whether it should call seq_show for v == NULL case. All
> > other iterators seem to do so, it's a bit different here since it is only
> > iterating over a single cgroup, I guess, but it would be nice to have some
> > consistency.
>
> You are correct that I think it is okay since it only iterates with one
> cgroup. This is different from other cases so far where more than one
> objects may be traversed. We may have future other use cases, e.g.,
> one task. I think we can abstract out start()/next()/stop() callbacks
> for such use cases. So it is okay it is different from other existing
> iterators since they are indeed different.
>
> >
> > > For cgroup_iter, the following is the current workflow:
> > >     start -> not NULL -> show -> next -> NULL -> stop
> > > or
> > >     start -> NULL -> stop
> > >
> > > So for cgroup_iter_seq_stop, the input parameter 'v' will be NULL, so
> > > the cgroup_put() is not actually called, i.e., corresponding cgroup is
> > > not freed.
> > >
> > > There are two ways to fix the issue:
> > >    . call cgroup_put() in next() before return NULL. This way,
> > >      stop() will be a noop.
> > >    . put cgroup_get_from_id() and cgroup_put() in
> > >      bpf_iter_attach_cgroup() and bpf_iter_detach_cgroup().
> > >
> > > I prefer the second approach as it is cleaner.
> > >
> >
> > I think current approach is also not safe if cgroup_id gets reused, right? I.e.
> > it only does cgroup_get_from_id in seq_start, not at attach time, so it may not
> > be the same cgroup when calling read(2). kernfs is using idr_alloc_cyclic, so it
> > is less likely to occur, but since it wraps around to find a free ID it might
> > not be theoretical.
>
> As Alexei mentioned, cgroup id is 64-bit, the collision should
> be nearly impossible. Another option is to get a fd from
> the cgroup path, and send the fd to the kernel. This probably
> works.
>

I see, even on 32-bit systems the actual id is 64-bit.
As for cgroup fd vs id, existing cgroup BPF programs seem to take fd, map iter
also takes map fd, so it might make sense to use cgroup fd here as well.

> [...]

--
Kartikeya