From: Yosry Ahmed <yosryahmed@google.com>
To: Hao Luo <haoluo@google.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>, Cgroups <cgroups@vger.kernel.org>,
	Networking <netdev@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>,
	Tejun Heo <tj@kernel.org>, Zefan Li <lizefan.x@bytedance.com>,
	KP Singh <kpsingh@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Benjamin Tissoires <benjamin.tissoires@redhat.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Michal Koutny <mkoutny@suse.com>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	David Rientjes <rientjes@google.com>,
	Stanislav Fomichev <sdf@google.com>,
	Shakeel Butt <shakeelb@google.com>
Subject: Re: [PATCH bpf-next v7 4/8] bpf: Introduce cgroup iter
Date: Thu, 11 Aug 2022 07:09:31 -0700
Message-ID: <CAJD7tkY6ihK9PkaAwrdRr-3QyiVFf8h4WkLXx73zYwNUjS_7pw@mail.gmail.com>
In-Reply-To: <CA+khW7j1Ni_PfvsGisUpUgFtgg=f_qEUVd1VUmocn6L3=kndhw@mail.gmail.com>

On Wed, Aug 10, 2022 at 8:10 PM Hao Luo <haoluo@google.com> wrote:
>
> On Tue, Aug 9, 2022 at 11:38 AM Hao Luo <haoluo@google.com> wrote:
> >
> > On Tue, Aug 9, 2022 at 9:23 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Mon, Aug 08, 2022 at 05:56:57PM -0700, Hao Luo wrote:
> > > > On Mon, Aug 8, 2022 at 5:19 PM Andrii Nakryiko
> > > > <andrii.nakryiko@gmail.com> wrote:
> > > > >
> > > > > On Fri, Aug 5, 2022 at 2:49 PM Hao Luo <haoluo@google.com> wrote:
> > > > > >
> > > > > > Cgroup_iter is a type of bpf_iter. It walks over cgroups in four modes:
> > > > > >
> > > > > >  - walking a cgroup's descendants in pre-order.
> > > > > >  - walking a cgroup's descendants in post-order.
> > > > > >  - walking a cgroup's ancestors.
> > > > > >  - process only the given cgroup.
> > > > > >
> > [...]
> > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > > index 59a217ca2dfd..4d758b2e70d6 100644
> > > > > > --- a/include/uapi/linux/bpf.h
> > > > > > +++ b/include/uapi/linux/bpf.h
> > > > > > @@ -87,10 +87,37 @@ struct bpf_cgroup_storage_key {
> > > > > >         __u32   attach_type;            /* program attach type (enum bpf_attach_type) */
> > > > > >  };
> > > > > >
> > > > > > +enum bpf_iter_order {
> > > > > > +       BPF_ITER_ORDER_DEFAULT = 0,     /* default order. */
> > > > >
> > > > > why is this default order necessary? It just adds confusion (I had to
> > > > > look up source code to know what is default order). I might have
> > > > > missed some discussion, so if there is some very good reason, then
> > > > > please document this in commit message. But I'd rather not do some
> > > > > magical default order instead. We can set 0 to mean invalid and error
> > > > > out, or just do SELF as the very first value (and if user forgot to
> > > > > specify more fancy mode, they hopefully will quickly discover this in
> > > > > their testing).
> > > > >
> > > >
> > > > PRE/POST/UP are tree-specific orders. SELF applies on all iters and
> > > > yields only a single object. How does task_iter express a non-self
> > > > order? By non-self, I mean something like "I don't care about the
> > > > order, just scan _all_ the objects". And this "don't care" order, IMO,
> > > > may be the common case. I don't think everyone cares about walking
> > > > order for tasks. The DEFAULT is intentionally put at the first value,
> > > > so that if users don't care about order, they don't have to specify
> > > > this field.
> > > >
> > > > If that sounds valid, maybe using "UNSPEC" instead of "DEFAULT" is better?
> > >
> > > I agree with Andrii.
> > > This:
> > > +       if (order == BPF_ITER_ORDER_DEFAULT)
> > > +               order = BPF_ITER_DESCENDANTS_PRE;
> > >
> > > looks like an arbitrary choice.
> > > imo
> > > BPF_ITER_DESCENDANTS_PRE = 0,
> > > would have been more obvious. No need to dig into definition of "default".
> > >
> > > UNSPEC = 0
> > > is fine too if we want user to always be conscious about the order
> > > and the kernel will error if that field is not initialized.
> > > That would be my preference, since it will match the rest of uapi/bpf.h
> > >
> >
> > Sounds good. In the next version, will use
> >
> > enum bpf_iter_order {
> >         BPF_ITER_ORDER_UNSPEC = 0,
> >         BPF_ITER_SELF_ONLY,             /* process only a single object. */
> >         BPF_ITER_DESCENDANTS_PRE,       /* walk descendants in pre-order. */
> >         BPF_ITER_DESCENDANTS_POST,      /* walk descendants in post-order. */
> >         BPF_ITER_ANCESTORS_UP,          /* walk ancestors upward. */
> > };
> >
>
> Sigh, I find that having UNSPEC=0 and erroring out when seeing UNSPEC
> doesn't work. Basically, if we have a non-iter prog and a cgroup_iter
> prog written in the same source file, I can't use
> bpf_object__attach_skeleton to attach them. Because the default
> prog_attach_fn for iter initializes `order` to 0 (that is, UNSPEC),
> which is going to be rejected by the kernel. In order to make
> bpf_object__attach_skeleton work on cgroup_iter, I think I need to use
> the following
>
> enum bpf_iter_order {
>         BPF_ITER_DESCENDANTS_PRE,       /* walk descendants in pre-order. */
>         BPF_ITER_DESCENDANTS_POST,      /* walk descendants in post-order. */
>         BPF_ITER_ANCESTORS_UP,          /* walk ancestors upward. */
>         BPF_ITER_SELF_ONLY,             /* process only a single object. */
> };
>
> So that when calling bpf_object__attach_skeleton() on cgroup_iter, a
> link can be generated and the generated link defaults to pre-order
> walk on the whole hierarchy. Is there a better solution?
>

I think this can be handled in userspace: attach the cgroup_iter
separately first (possibly setting prog->link as well) so that
bpf_object__attach_skeleton() doesn't try to attach it. I follow this
pattern in the selftest in the final patch, although I may have missed
setting prog->link there, so I wonder why that selftest, which has the
same scenario you describe, shows no issues.

I think such a pattern will be needed anyway whenever users set any
non-default arguments for the cgroup_iter prog (as the selftest does),
right? The only case we are discussing here is the one where the user
wants to attach the cgroup_iter with all default options (in which
case the default order will fail).

I agree that it might be inconvenient if the default/uninitialized
options don't work for cgroup_iter, but Alexei pointed out that this
matches other bpf uapis.
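
To make the failure mode concrete, here is a minimal sketch in plain C.
It is a toy model, not the kernel or libbpf code: struct
toy_cgroup_link_info and toy_cgroup_iter_attach() are hypothetical
names made up for illustration, and the enum is the UNSPEC = 0 layout
proposed earlier in this thread.

```c
#include <assert.h>
#include <errno.h>

/* Enum layout proposed earlier in the thread: 0 means "never set". */
enum bpf_iter_order {
	BPF_ITER_ORDER_UNSPEC = 0,
	BPF_ITER_SELF_ONLY,		/* process only a single object. */
	BPF_ITER_DESCENDANTS_PRE,	/* walk descendants in pre-order. */
	BPF_ITER_DESCENDANTS_POST,	/* walk descendants in post-order. */
	BPF_ITER_ANCESTORS_UP,		/* walk ancestors upward. */
};

/* A zero-initialized request must read as "unspecified". */
static_assert(BPF_ITER_ORDER_UNSPEC == 0, "UNSPEC must be the zero value");

/* Hypothetical stand-in for the cgroup part of the iter link info. */
struct toy_cgroup_link_info {
	unsigned int order;
	int cgroup_fd;
};

/*
 * Hypothetical kernel-side check: accept only the orders cgroup_iter
 * explicitly lists, and reject UNSPEC or any unknown value.
 */
static int toy_cgroup_iter_attach(const struct toy_cgroup_link_info *info)
{
	switch (info->order) {
	case BPF_ITER_SELF_ONLY:
	case BPF_ITER_DESCENDANTS_PRE:
	case BPF_ITER_DESCENDANTS_POST:
	case BPF_ITER_ANCESTORS_UP:
		return 0;
	default:
		return -EINVAL;
	}
}
```

With this model, a zero-initialized struct (what a generic auto-attach
path would pass) gets -EINVAL, while a caller that sets order to
BPF_ITER_DESCENDANTS_PRE before attaching gets 0. That is the split
between the default attach and the explicit userspace attach above.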

My concern is that in the future we may try to reuse enum
bpf_iter_order to set ordering for other iterators, and the
default/uninitialized value (BPF_ITER_DESCENDANTS_PRE) won't make
sense for such an iterator (e.g. one that is not a tree). In that
case, the same problem that we are avoiding for cgroup_iter here will
show up for that iterator, and we won't be able to change it easily at
that point because it's uapi.
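
A rough sketch of what I mean, again as a toy model in plain C. The
iterator kinds and toy_check_order() are hypothetical, and whether a
flat iterator would treat UNSPEC as "don't care" is only an assumption
for illustration. The point is that with UNSPEC = 0 each iterator can
decide for itself which values are acceptable, whereas a zero value of
DESCENDANTS_PRE would have no meaning for a non-tree iterator.

```c
#include <assert.h>
#include <errno.h>

enum bpf_iter_order {
	BPF_ITER_ORDER_UNSPEC = 0,
	BPF_ITER_SELF_ONLY,		/* process only a single object. */
	BPF_ITER_DESCENDANTS_PRE,	/* walk descendants in pre-order. */
	BPF_ITER_DESCENDANTS_POST,	/* walk descendants in post-order. */
	BPF_ITER_ANCESTORS_UP,		/* walk ancestors upward. */
};

static_assert(BPF_ITER_ORDER_UNSPEC == 0, "UNSPEC must be the zero value");

/* Hypothetical iterator kinds; only the cgroup one walks a tree. */
enum toy_iter_kind {
	TOY_ITER_CGROUP,
	TOY_ITER_TASK,
};

/* Hypothetical per-iterator validation of the shared order field. */
static int toy_check_order(enum toy_iter_kind kind, unsigned int order)
{
	switch (kind) {
	case TOY_ITER_CGROUP:
		/* Tree iterator: every explicit order is meaningful. */
		if (order >= BPF_ITER_SELF_ONLY &&
		    order <= BPF_ITER_ANCESTORS_UP)
			return 0;
		return -EINVAL;
	case TOY_ITER_TASK:
		/*
		 * Flat iterator (assumption): tree orders mean nothing,
		 * so accept only "unspecified" and "self only".
		 */
		if (order == BPF_ITER_ORDER_UNSPEC ||
		    order == BPF_ITER_SELF_ONLY)
			return 0;
		return -EINVAL;
	}
	return -EINVAL;
}
```

Here the cgroup kind rejects an unset order, and the task kind rejects
BPF_ITER_DESCENDANTS_PRE, so neither iterator is forced to give a
meaning to a value that does not fit it.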


> > and explicitly list the values acceptable by cgroup_iter, error out if
> > UNSPEC is detected.
> >
> > Also, following Andrii's comments, will change BPF_ITER_SELF to
> > BPF_ITER_SELF_ONLY, which does seem a little bit more explicit in
> > comparison.
> >
> > > I applied the first 3 patches to ease respin.
> >
> > Thanks! This helps!
> >
> > > Thanks!
