Re: [PATCH net-next v2 02/15] bpf: offload: add infrastructure for loading programs for a specific netdev

From: Daniel Borkmann <daniel@iogearbox.net>
To: Jakub Kicinski <jakub.kicinski@netronome.com>, netdev@vger.kernel.org
Cc: oss-drivers@netronome.com, alexei.starovoitov@gmail.com
Subject: Re: [PATCH net-next v2 02/15] bpf: offload: add infrastructure for loading programs for a specific netdev
Date: Mon, 06 Nov 2017 18:32:45 +0100
Message-ID: <5A009CBD.1080800@iogearbox.net>
In-Reply-To: <20171103205630.1083-3-jakub.kicinski@netronome.com>

On 11/03/2017 09:56 PM, Jakub Kicinski wrote:
> The fact that we don't know which device the program is going
> to be used on is quite limiting in current eBPF infrastructure.
> We have to reverse or limit the changes which kernel makes to
> the loaded bytecode if we want it to be offloaded to a networking
> device.  We also have to invent new APIs for debugging and
> troubleshooting support.
>
> Make it possible to load programs for a specific netdev.  This
> helps us to bring the debug information closer to the core
> eBPF infrastructure (e.g. we will be able to reuse the verifier
> log in device JIT).  It allows device JITs to perform translation
> on the original bytecode.
>
> __bpf_prog_get() when called to get a reference for an attachment
> point will now refuse to give it if program has a device assigned.
> Following patches will add a version of that function which passes
> the expected netdev in. @type argument in __bpf_prog_get() is
> renamed to attach_type to make it clearer that it's only set on
> attachment.
>
> All calls to ndo_bpf are protected by rtnl, only verifier callbacks
> are not.  We need a wait queue to make sure netdev doesn't get
> destroyed while verifier is still running and calling its driver.
>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Simon Horman <simon.horman@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>

First of all, great work! I went over the series and really like
the outcome. It's already applied anyway, but I have two minor
comments further below.

[...]
> @@ -1549,6 +1555,8 @@ static void bpf_prog_free_deferred(struct work_struct *work)
>   	struct bpf_prog_aux *aux;
>
>   	aux = container_of(work, struct bpf_prog_aux, work);
> +	if (bpf_prog_is_dev_bound(aux))
> +		bpf_prog_offload_destroy(aux->prog);
>   	bpf_jit_free(aux->prog);
>   }
[...]
> +static int bpf_offload_notification(struct notifier_block *notifier,
> +				    ulong event, void *ptr)
> +{
> +	struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
> +	struct bpf_dev_offload *offload, *tmp;
> +
> +	ASSERT_RTNL();
> +
> +	switch (event) {
> +	case NETDEV_UNREGISTER:
> +		list_for_each_entry_safe(offload, tmp, &bpf_prog_offload_devs,
> +					 offloads) {
> +			if (offload->netdev == netdev)
> +				__bpf_prog_offload_destroy(offload->prog);

We would be calling this twice, right? Once here and then again on
prog destruction. __bpf_prog_offload_destroy() looks like it will
handle this just fine, but we should probably add a comment to
__bpf_prog_offload_destroy() so that anyone changing it later sees
that it must stay safe to call twice, and is extra careful.
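To make the "called twice" concern concrete, here is a userspace sketch of a teardown routine that tolerates a second call. All names (struct offload_state, offload_destroy(), the dev_state flag, the list helpers) are hypothetical stand-ins loosely modeled on kernel idioms such as list_del_init(); this is not the actual __bpf_prog_offload_destroy(), just an illustration of the property the comment should protect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head (hypothetical). */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

static void list_add(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Like list_del_init(): unlink, then point the node at itself,
 * so a second call on the same node is a harmless no-op. */
static void list_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical per-program offload state. */
struct offload_state {
	void *netdev;          /* NULL once torn down */
	bool dev_state;        /* true while the driver holds state */
	struct list_node link; /* membership in the global offload list */
};

static int driver_destroy_calls; /* counts calls into the fake driver */

/*
 * May run twice: once from the NETDEV_UNREGISTER notifier and once
 * more when the program itself is freed. Every step must tolerate
 * having already been done.
 */
static void offload_destroy(struct offload_state *off)
{
	if (off->dev_state)
		driver_destroy_calls++; /* tell the driver only once */
	off->dev_state = false;
	list_del_init(&off->link);      /* self-linked after first call */
	off->netdev = NULL;
}

/* Returns how many times the driver was called after two destroys. */
static int demo_double_destroy(void)
{
	struct list_node devs;
	int fake_netdev;
	struct offload_state off = { .netdev = &fake_netdev, .dev_state = true };

	driver_destroy_calls = 0;
	list_init(&devs);
	list_add(&off.link, &devs);

	offload_destroy(&off); /* netdev unregister */
	offload_destroy(&off); /* program free */

	assert(devs.next == &devs); /* global list is empty again */
	assert(off.netdev == NULL);
	return driver_destroy_calls;
}
```

The driver callback is guarded by a flag and the list removal re-initializes the node, so the second call only repeats no-op assignments.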

[...]
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 323be2473c4b..1574b9f0f24e 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -824,7 +824,10 @@ static int find_prog_type(enum bpf_prog_type type, struct bpf_prog *prog)
>   	if (type >= ARRAY_SIZE(bpf_prog_types) || !bpf_prog_types[type])
>   		return -EINVAL;
>
> -	prog->aux->ops = bpf_prog_types[type];
> +	if (!bpf_prog_is_dev_bound(prog->aux))
> +		prog->aux->ops = bpf_prog_types[type];
> +	else
> +		prog->aux->ops = &bpf_offload_prog_ops;
>   	prog->type = type;
>   	return 0;
>   }
> @@ -1054,7 +1057,7 @@ struct bpf_prog *bpf_prog_inc_not_zero(struct bpf_prog *prog)
>   }
>   EXPORT_SYMBOL_GPL(bpf_prog_inc_not_zero);
>
> -static struct bpf_prog *__bpf_prog_get(u32 ufd, enum bpf_prog_type *type)
> +static struct bpf_prog *__bpf_prog_get(u32 ufd, enum bpf_prog_type *attach_type)
>   {
>   	struct fd f = fdget(ufd);
>   	struct bpf_prog *prog;
> @@ -1062,7 +1065,7 @@ static struct bpf_prog *__bpf_prog_get(u32 ufd, enum bpf_prog_type *type)
>   	prog = ____bpf_prog_get(f);
>   	if (IS_ERR(prog))
>   		return prog;
> -	if (type && prog->type != *type) {
> +	if (attach_type && (prog->type != *attach_type || prog->aux->offload)) {
>   		prog = ERR_PTR(-EINVAL);
>   		goto out;
>   	}
> @@ -1089,7 +1092,7 @@ struct bpf_prog *bpf_prog_get_type(u32 ufd, enum bpf_prog_type type)
>   EXPORT_SYMBOL_GPL(bpf_prog_get_type);
>
>   /* last field in 'union bpf_attr' used by this command */
> -#define	BPF_PROG_LOAD_LAST_FIELD prog_name
> +#define	BPF_PROG_LOAD_LAST_FIELD prog_target_ifindex

For program types that are neither XDP nor cls_bpf, we should reject
the request if something calls bpf(2) with a non-zero prog_target_ifindex.

That way, i) we don't burn the whole field and could perhaps reuse or
union it for other program types, such as tracing, in the future. It
probably makes sense to do anyway since ii) for types like tracing, we
would want to reject this up front here rather than later, when attach
happens.

I may have missed something when reading the code, but if I read it
correctly, we might otherwise even go ahead and nfp-JIT simple programs
for non-networking types (we would bail out later on in
__bpf_prog_get(), but we shouldn't let the syscall succeed in the first
place)?
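A minimal sketch of the load-time check suggested above, assuming only XDP and cls_bpf (BPF_PROG_TYPE_SCHED_CLS) programs may be bound to a device. The enum prog_type and check_target_ifindex() are hypothetical mirrors, not kernel code; in the kernel, an equivalent test would presumably sit early in bpf_prog_load(), before bpf_prog_offload_init() is reached.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical mirror of a few enum bpf_prog_type values; the real
 * ones live in include/uapi/linux/bpf.h. */
enum prog_type {
	PROG_TYPE_SOCKET_FILTER,
	PROG_TYPE_KPROBE,
	PROG_TYPE_SCHED_CLS,
	PROG_TYPE_XDP,
};

/*
 * Reject a device binding at load time for every type that cannot be
 * offloaded, so the field stays free for other uses and bpf(2) fails
 * before any device JIT runs.
 */
static int check_target_ifindex(enum prog_type type, unsigned int ifindex)
{
	if (!ifindex)
		return 0; /* not device-bound: nothing to check */
	if (type != PROG_TYPE_SCHED_CLS && type != PROG_TYPE_XDP)
		return -EINVAL;
	return 0;
}
```

With this, check_target_ifindex(PROG_TYPE_KPROBE, 3) returns -EINVAL, while an XDP program with the same ifindex passes and a kprobe program with ifindex 0 is unaffected.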

>   static int bpf_prog_load(union bpf_attr *attr)
>   {
> @@ -1152,6 +1155,12 @@ static int bpf_prog_load(union bpf_attr *attr)
>   	atomic_set(&prog->aux->refcnt, 1);
>   	prog->gpl_compatible = is_gpl ? 1 : 0;
>
> +	if (attr->prog_target_ifindex) {
> +		err = bpf_prog_offload_init(prog, attr);
> +		if (err)
> +			goto free_prog;
> +	}
> +
>   	/* find program type: socket_filter vs tracing_filter */
>   	err = find_prog_type(type, prog);
[...]