All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: "Kumar Kartikeya Dwivedi" <memxor@gmail.com>,
	bpf <bpf@vger.kernel.org>,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Martin KaFai Lau" <kafai@fb.com>,
	"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"KP Singh" <kpsingh@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Jesper Dangaard Brouer" <brouer@redhat.com>,
	Networking <netdev@vger.kernel.org>
Subject: Re: [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API
Date: Thu, 22 Apr 2021 11:28:36 -0700	[thread overview]
Message-ID: <CAEf4BzZKjhsrF3ii4U-pMk3pJt7G3U7Hzkf_7zMOFhGMv7WKWg@mail.gmail.com> (raw)
In-Reply-To: <b7ace8c2-0147-fde7-d319-479be1e2a05e@iogearbox.net>

On Thu, Apr 22, 2021 at 8:35 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 4/22/21 5:43 AM, Andrii Nakryiko wrote:
> > On Wed, Apr 21, 2021 at 3:59 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >> On 4/20/21 9:37 PM, Kumar Kartikeya Dwivedi wrote:
> >> [...]
> >>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> >>> index bec4e6a6e31d..b4ed6a41ea70 100644
> >>> --- a/tools/lib/bpf/libbpf.h
> >>> +++ b/tools/lib/bpf/libbpf.h
> >>> @@ -16,6 +16,8 @@
> >>>    #include <stdbool.h>
> >>>    #include <sys/types.h>  // for size_t
> >>>    #include <linux/bpf.h>
> >>> +#include <linux/pkt_sched.h>
> >>> +#include <linux/tc_act/tc_bpf.h>
> >>>
> >>>    #include "libbpf_common.h"
> >>>
> >>> @@ -775,6 +777,48 @@ LIBBPF_API int bpf_linker__add_file(struct bpf_linker *linker, const char *filen
> >>>    LIBBPF_API int bpf_linker__finalize(struct bpf_linker *linker);
> >>>    LIBBPF_API void bpf_linker__free(struct bpf_linker *linker);
> >>>
> >>> +/* Convenience macros for the clsact attach hooks */
> >>> +#define BPF_TC_CLSACT_INGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS)
> >>> +#define BPF_TC_CLSACT_EGRESS TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS)
> >>
> >> I would abstract those away into an enum, plus avoid having to pull in
> >> linux/pkt_sched.h and linux/tc_act/tc_bpf.h from main libbpf.h header.
> >>
> >> Just add a enum { BPF_TC_DIR_INGRESS, BPF_TC_DIR_EGRESS, } and then the
> >> concrete tc bits (TC_H_MAKE()) can be translated internally.
> >>
> >>> +struct bpf_tc_opts {
> >>> +     size_t sz;
> >>
> >> Is this set anywhere?
> >>
> >>> +     __u32 handle;
> >>> +     __u32 class_id;
> >>
> >> I'd remove class_id from here as well given in direct-action a BPF prog can
> >> set it if needed.
> >>
> >>> +     __u16 priority;
> >>> +     bool replace;
> >>> +     size_t :0;
> >>
> >> What's the rationale for this padding?
> >>
> >>> +};
> >>> +
> >>> +#define bpf_tc_opts__last_field replace
> >>> +
> >>> +/* Acts as a handle for an attached filter */
> >>> +struct bpf_tc_attach_id {
> >>
> >> nit: maybe bpf_tc_ctx
> >
> > ok, so wait. It seems like apart from INGRESS|EGRESS enum and ifindex,
> > everything else is optional and/or has some sane defaults, right? So
> > this bpf_tc_attach_id or bpf_tc_ctx seems a bit artificial construct
> > and it will cause problems for extending this.
> >
> > So if my understanding is correct, I'd get rid of it completely. As I
> > said previously, opts allow returning parameters back, so if user
> > didn't specify handle and priority and kernel picks values on user's
> > behalf, we can return them in the same opts fields.
> >
> > For detach, again, ifindex and INGRESS|EGRESS is sufficient, but if
> > user want to provide more detailed parameters, we should do that
> > through extensible opts. That way we can keep growing this easily,
> > plus simple cases will remain simple.
> >
> > Similarly bpf_tc_info below, there is no need to have struct
> > bpf_tc_attach_id id; field, just have handle and priority right there.
> > And bpf_tc_info should use OPTS framework for extensibility (though
> > opts name doesn't fit it very well, but it is still nice for
> > extensibility and for doing optional input/output params).
> >
> > Does this make sense? Am I missing something crucial here?
>
> I would probably keep the handle + priority in there; maybe if both are 0,
> we could fix it to some default value internally, but without those it might
> be a bit hard if people want to build a 'pipeline' of cls_bpf progs if they
> need/want to.

Oh, I'm not proposing to drop support for specifying handle and prio.
I'm just saying having a fixed UAPI struct bpf_tc_attach_id as an "ID"
is problematic from API stability point of view. So instead of
pretending we know what "ID" will always be like, pass any extra
non-default fields in OPTS struct. And if those are not specified by
user (either opts is NULL or handle/prio is 0), use sane defaults, as
you are proposing.

>
> Potentially, one could fixate the handle itself, and then allow to specify
> different priorities for it such that when a BPF prog returns a TC_ACT_UNSPEC,
> it will exec the next one inside that cls_bpf instance, every other TC_ACT_*
> opcode will terminate the processing. Technically, only priority would really
> be needed (unless you combine multiple different classifiers from tc side on
> the ingress/egress hook which is not great to begin with, tbh).
>
> > The general rule with any new structs added to libbpf APIs is to
> > either be 100% (ok, 99.99%) sure that they will never be changed, or
> > do forward/backward compatible OPTS. Any other thing is pain and calls
> > for symbol versioning, which we are trying really hard to avoid.
> >
> >>> +     __u32 handle;
> >>> +     __u16 priority;
> >>> +};
> >>> +
> >>> +struct bpf_tc_info {
> >>> +     struct bpf_tc_attach_id id;
> >>> +     __u16 protocol;
> >>> +     __u32 chain_index;
> >>> +     __u32 prog_id;
> >>> +     __u8 tag[BPF_TAG_SIZE];
> >>> +     __u32 class_id;
> >>> +     __u32 bpf_flags;
> >>> +     __u32 bpf_flags_gen;
> >>
> >> Given we do not yet have any setters e.g. for offload, etc, the one thing
> >> I'd see useful and crucial initially is prog_id.
> >>
> >> The protocol, chain_index, and I would also include tag should be dropped.
> >> Similarly class_id given my earlier statement, and flags I would extend once
> >> this lib API would support offloading progs.
> >>
> >>> +};
> >>> +
> >>> +/* id is out parameter that will be written to, it must not be NULL */
> >>> +LIBBPF_API int bpf_tc_attach(int fd, __u32 ifindex, __u32 parent_id,
> >>> +                          const struct bpf_tc_opts *opts,
> >>> +                          struct bpf_tc_attach_id *id);
> >>> +LIBBPF_API int bpf_tc_detach(__u32 ifindex, __u32 parent_id,
> >>> +                          const struct bpf_tc_attach_id *id);
> >>> +LIBBPF_API int bpf_tc_get_info(__u32 ifindex, __u32 parent_id,
> >>> +                            const struct bpf_tc_attach_id *id,
> >>> +                            struct bpf_tc_info *info);
> >>
> >> As per above, for parent_id I'd replace with dir enum.
> >>
> >>> +
> >>>    #ifdef __cplusplus
> >>>    } /* extern "C" */
> >>>    #endif
>

  reply	other threads:[~2021-04-22 18:28 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-20 19:37 [PATCH bpf-next v3 0/3] Add TC-BPF API Kumar Kartikeya Dwivedi
2021-04-20 19:37 ` [PATCH bpf-next v3 1/3] libbpf: add helpers for preparing netlink attributes Kumar Kartikeya Dwivedi
2021-04-21  6:37   ` Yonghong Song
2021-04-20 19:37 ` [PATCH bpf-next v3 2/3] libbpf: add low level TC-BPF API Kumar Kartikeya Dwivedi
2021-04-21  6:58   ` Yonghong Song
2021-04-21 17:06     ` Kumar Kartikeya Dwivedi
2021-04-22  1:56       ` Yonghong Song
2021-04-21 18:19   ` Andrii Nakryiko
2021-04-21 19:48     ` Toke Høiland-Jørgensen
2021-04-21 23:14       ` Daniel Borkmann
2021-04-22  9:08         ` Toke Høiland-Jørgensen
2021-04-22 11:51           ` Daniel Borkmann
2021-04-22 12:46             ` Toke Høiland-Jørgensen
2021-04-21 22:59   ` Daniel Borkmann
2021-04-21 23:08     ` Kumar Kartikeya Dwivedi
2021-04-21 23:21       ` Daniel Borkmann
2021-04-21 23:30         ` Kumar Kartikeya Dwivedi
2021-04-21 23:41           ` Daniel Borkmann
2021-04-22  9:47             ` Shaun Crampton
2021-04-22 11:26               ` Daniel Borkmann
2021-04-22  3:43     ` Andrii Nakryiko
2021-04-22 15:35       ` Daniel Borkmann
2021-04-22 18:28         ` Andrii Nakryiko [this message]
2021-04-20 19:37 ` [PATCH bpf-next v3 3/3] libbpf: add selftests for " Kumar Kartikeya Dwivedi
2021-04-21  7:01   ` Yonghong Song
2021-04-21 18:24   ` Andrii Nakryiko
2021-04-21 19:56     ` Kumar Kartikeya Dwivedi
2021-04-21 20:38       ` Toke Høiland-Jørgensen
2021-04-21 22:41         ` Daniel Borkmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAEf4BzZKjhsrF3ii4U-pMk3pJt7G3U7Hzkf_7zMOFhGMv7WKWg@mail.gmail.com \
    --to=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=memxor@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=songliubraving@fb.com \
    --cc=toke@redhat.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.