All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kumar Kartikeya Dwivedi <memxor@gmail.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org, "Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Martin KaFai Lau" <kafai@fb.com>,
	"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"KP Singh" <kpsingh@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"Jesper Dangaard Brouer" <brouer@redhat.com>,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH bpf-next v2 3/4] libbpf: add low level TC-BPF API
Date: Tue, 20 Apr 2021 03:15:17 +0530	[thread overview]
Message-ID: <20210419214517.bqur27tytx4onfnn@apollo> (raw)
In-Reply-To: <6e8b744c-e012-c76b-b55f-7ddc8b7483db@iogearbox.net>

On Tue, Apr 20, 2021 at 02:30:44AM IST, Daniel Borkmann wrote:
> On 4/19/21 2:18 PM, Kumar Kartikeya Dwivedi wrote:
> > This adds functions that wrap the netlink API used for adding,
> > manipulating, and removing traffic control filters. These functions
> > operate directly on the loaded prog's fd, and return a handle to the
> > filter using an out parameter named id.
> >
> > The basic featureset is covered to allow for attaching, manipulation of
> > properties, and removal of filters. Some additional features like
> > TCA_BPF_POLICE and TCA_RATE for tc_cls have been omitted. These can
> > added on top later by extending the bpf_tc_cls_opts struct.
> >
> > Support for binding actions directly to a classifier by passing them in
> > during filter creation has also been omitted for now. These actions have
> > an auto clean up property because their lifetime is bound to the filter
> > they are attached to. This can be added later, but was omitted for now
> > as direct action mode is a better alternative to it, which is enabled by
> > default.
> >
> > An API summary:
> >
> > bpf_tc_act_{attach, change, replace} may be used to attach, change, and
>
> typo on bpf_tc_act_{...} ?
>

Oops, yes. Should be bpf_tc_cls_...

> > replace SCHED_CLS bpf classifier. The protocol field can be set as 0, in
> > which case it is subsitituted as ETH_P_ALL by default.
>
> Do you have an actual user that needs anything other than ETH_P_ALL? Why is it
> even needed? Why not stick to just ETH_P_ALL?
>

Mostly because it was little to no effort to expose this. Though if you feel
strongly about it I can drop the protocol option, and just bake in ETH_P_ALL. It
can always be added later ofcourse, if the need arises in the future.

> > The behavior of the three functions is as follows:
> >
> > attach = create filter if it does not exist, fail otherwise
> > change = change properties of the classifier of existing filter
> > replace = create filter, and replace any existing filter
>
> This touches on tc oddities quite a bit. Why do we need to expose them? Can't we
> simplify/abstract this e.g. i) create or update instance, ii) delete instance,
> iii) get instance ? What concrete use case do you have that you need those three
> above?
>

'change' is relevant for modifying classifier specific options, and given it's
a lot less useful now as per the current state of this patch, I am fine with
removing it. This is also where the distinction becomes visible to the user, so
removing it should hide the filter/classifier separation.

> > bpf_tc_cls_detach may be used to detach existing SCHED_CLS
> > filter. The bpf_tc_cls_attach_id object filled in during attach,
> > change, or replace must be passed in to the detach functions for them to
> > remove the filter and its attached classififer correctly.
> >
> > bpf_tc_cls_get_info is a helper that can be used to obtain attributes
> > for the filter and classififer. The opts structure may be used to
> > choose the granularity of search, such that info for a specific filter
> > corresponding to the same loaded bpf program can be obtained. By
> > default, the first match is returned to the user.
> >
> > Examples:
> >
> > 	struct bpf_tc_cls_attach_id id = {};
> > 	struct bpf_object *obj;
> > 	struct bpf_program *p;
> > 	int fd, r;
> >
> > 	obj = bpf_object_open("foo.o");
> > 	if (IS_ERR_OR_NULL(obj))
> > 		return PTR_ERR(obj);
> >
> > 	p = bpf_object__find_program_by_title(obj, "classifier");
> > 	if (IS_ERR_OR_NULL(p))
> > 		return PTR_ERR(p);
> >
> > 	if (bpf_object__load(obj) < 0)
> > 		return -1;
> >
> > 	fd = bpf_program__fd(p);
> >
> > 	r = bpf_tc_cls_attach(fd, if_nametoindex("lo"),
> > 			      BPF_TC_CLSACT_INGRESS,
> > 			      NULL, &id);
> > 	if (r < 0)
> > 		return r;
> >
> > ... which is roughly equivalent to (after clsact qdisc setup):
> >    # tc filter add dev lo ingress bpf obj foo.o sec classifier da
> >
> > ... as direct action mode is always enabled.
> >
> > If a user wishes to modify existing options on an attached classifier,
> > bpf_tc_cls_change API may be used.
> >
> > Only parameters class_id can be modified, the rest are filled in to
> > identify the correct filter. protocol can be left out if it was not
> > chosen explicitly (defaulting to ETH_P_ALL).
> >
> > Example:
> >
> > 	/* Optional parameters necessary to select the right filter */
> > 	DECLARE_LIBBPF_OPTS(bpf_tc_cls_opts, opts,
> > 			    .handle = id.handle,
> > 			    .priority = id.priority,
> > 			    .chain_index = id.chain_index)
>
> Why do we need chain_index as part of the basic API?
>

It would be relevant when TC_ACT_GOTO_CHAIN is used. Other than that, I guess
it's not very useful.

> > 	opts.class_id = TC_H_MAKE(1UL << 16, 12);
> > 	r = bpf_tc_cls_change(fd, if_nametoindex("lo"),
> > 			      BPF_TC_CLSACT_INGRESS,
> > 			      &opts, &id);
>
> Also, I'm not sure whether the prefix should even be named  bpf_tc_cls_*() tbh,
> yes, despite being "low level" api. I think in the context of bpf we should stop
> regarding this as 'classifier' and 'action' objects since it's really just a
> single entity and not separate ones. It's weird enough to explain this concept
> to new users and if a libbpf based api could cleanly abstract it, I would be all
> for it. I don't think we need to map 1:1 the old tc legacy even in the low level
> api, tbh, as it feels backwards. I think the 'handle' & 'priority' bits are okay,
> but I would remove the others.
>

Ok, would dropping _cls from the name be better?
bpf_tc_attach
bpf_tc_replace
bpf_tc_get_info
bpf_tc_detach

As for options, I'll drop protocol, if you feel strongly about chain_index I can
drop that one too.

> > 	if (r < 0)
> > 		return r;
> >
> > 	struct bpf_tc_cls_info info = {};
> > 	r = bpf_tc_cls_get_info(fd, if_nametoindex("lo"),
> > 			        BPF_TC_CLSACT_INGRESS,
> > 				&opts, &info);
> > 	if (r < 0)
> > 		return r;
> >
> > 	assert(info.class_id == TC_H_MAKE(1UL << 16, 12));
> >
> > This would be roughly equivalent to doing:
> >    # tc filter change dev lo egress prio <p> handle <h> bpf obj foo.o sec \
> >      classifier classid 1:12
>
> Why even bother to support !da mode, what are you trying to solve with it? I
> don't think official libbpf api should support something that doesn't scale.
>

da is default now, this is yet another typo/oversight...

--
Kartikeya

  parent reply	other threads:[~2021-04-19 21:45 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-19 12:18 [PATCH bpf-next v2 0/4] Add TC-BPF API Kumar Kartikeya Dwivedi
2021-04-19 12:18 ` [PATCH bpf-next v2 1/4] tools: pkt_cls.h: sync with kernel sources Kumar Kartikeya Dwivedi
2021-04-19 12:18 ` [PATCH bpf-next v2 2/4] libbpf: add helpers for preparing netlink attributes Kumar Kartikeya Dwivedi
2021-04-19 12:18 ` [PATCH bpf-next v2 3/4] libbpf: add low level TC-BPF API Kumar Kartikeya Dwivedi
2021-04-19 21:00   ` Daniel Borkmann
2021-04-19 21:43     ` Toke Høiland-Jørgensen
2021-04-19 21:57       ` Daniel Borkmann
2021-04-19 21:45     ` Kumar Kartikeya Dwivedi [this message]
2021-04-19 12:18 ` [PATCH bpf-next v2 4/4] libbpf: add selftests for " Kumar Kartikeya Dwivedi
2021-04-20  4:35   ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210419214517.bqur27tytx4onfnn@apollo \
    --to=memxor@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=hawk@kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=songliubraving@fb.com \
    --cc=toke@redhat.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.