netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: "Björn Töpel" <bjorn.topel@gmail.com>
Cc: netdev@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
	"Björn Töpel" <bjorn.topel@intel.com>,
	bpf@vger.kernel.org, magnus.karlsson@gmail.com,
	magnus.karlsson@intel.com, jonathan.lemon@gmail.com,
	ecree@solarflare.com, thoiland@redhat.com,
	andrii.nakryiko@gmail.com, tariqt@mellanox.com,
	saeedm@mellanox.com, maximmi@mellanox.com
Subject: Re: [PATCH bpf-next v2 1/6] bpf: introduce BPF dispatcher
Date: Sat, 23 Nov 2019 17:55:06 -0800	[thread overview]
Message-ID: <20191124015504.yypqw4gx52e5e6og@ast-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <20191123071226.6501-2-bjorn.topel@gmail.com>

On Sat, Nov 23, 2019 at 08:12:20AM +0100, Björn Töpel wrote:
> +
> +		err = emit_jump(&prog,			/* jmp thunk */
> +				__x86_indirect_thunk_rdx, prog);

could you please add a comment that this is gcc specific and gate it
by build_bug_on ?
I think even if compiler stays the change of flags:
RETPOLINE_CFLAGS_GCC := -mindirect-branch=thunk-extern -mindirect-branch-register
may change the name of this helper?
I wonder whether it's possible to make it compiler independent.

> diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c
> new file mode 100644
> index 000000000000..385dd76ab6d2
> --- /dev/null
> +++ b/kernel/bpf/dispatcher.c
> @@ -0,0 +1,208 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2019 Intel Corporation. */
> +
> +#ifdef CONFIG_RETPOLINE

I'm worried that such strong gating will make the code rot. Especially it's not
covered by selftests.
Could you please add xdp_call_run() to generic xdp and add a selftest ?
Also could you please benchmark it without retpoline?
iirc direct call is often faster than indirect, so I suspect this optimization
may benefit non-mitigated kernels.

> +#define DISPATCHER_HASH_BITS 10
> +#define DISPATCHER_TABLE_SIZE (1 << DISPATCHER_HASH_BITS)
> +
> +static struct hlist_head dispatcher_table[DISPATCHER_TABLE_SIZE];

there is one DEFINE_XDP_CALL per driver, so total number of such
dispatch routines is pretty small. 1<<10 hash table is overkill.
The hash table itself is overkill :)

How about adding below:

> +#define BPF_DISPATCHER_MAX 16
> +
> +struct bpf_dispatcher {
> +	struct hlist_node hlist;
> +	void *func;
> +	struct bpf_prog *progs[BPF_DISPATCHER_MAX];
> +	int num_progs;
> +	void *image;
> +	u64 selector;
> +};

without hlist and without func to DEFINE_XDP_CALL() macro?
Then bpf_dispatcher_lookup() will become bpf_dispatcher_init()
and the rest will become a bit simpler?

> +
> +	set_vm_flush_reset_perms(image);
> +	set_memory_x((long)image, 1);
> +	d->image = image;

Can you add a common helper for this bit to share between
bpf dispatch and bpf trampoline?

> +static void bpf_dispatcher_update(struct bpf_dispatcher *d)
> +{
> +	void *old_image = d->image + ((d->selector + 1) & 1) * PAGE_SIZE / 2;
> +	void *new_image = d->image + (d->selector & 1) * PAGE_SIZE / 2;
> +	s64 ips[BPF_DISPATCHER_MAX] = {}, *ipsp = &ips[0];
> +	int i, err;
> +
> +	if (!d->num_progs) {
> +		bpf_arch_text_poke(d->func, BPF_MOD_JUMP_TO_NOP,
> +				   old_image, NULL);
> +		return;

how does it work? Without doing d->selector = 0; the next addition
will try to do JUMP_TO_JUMP and will fail...

> +	}
> +
> +	for (i = 0; i < BPF_DISPATCHER_MAX; i++) {
> +		if (d->progs[i])
> +			*ipsp++ = (s64)(uintptr_t)d->progs[i]->bpf_func;
> +	}
> +	err = arch_prepare_bpf_dispatcher(new_image, &ips[0], d->num_progs);
> +	if (err)
> +		return;
> +
> +	if (d->selector) {
> +		/* progs already running at this address */
> +		err = bpf_arch_text_poke(d->func, BPF_MOD_JUMP_TO_JUMP,
> +					 old_image, new_image);
> +	} else {
> +		/* first time registering */
> +		err = bpf_arch_text_poke(d->func, BPF_MOD_NOP_TO_JUMP,
> +					 NULL, new_image);
> +	}
> +	if (err)
> +		return;
> +	d->selector++;
> +}

Not sure how to share selector logic between dispatch and trampoline.
But above selector=0; weirdness is a sign that sharing is probably necessary?

> +
> +void bpf_dispatcher_change_prog(void *func, struct bpf_prog *from,
> +				struct bpf_prog *to)
> +{
> +	struct bpf_dispatcher *d;
> +	bool changed = false;
> +
> +	if (from == to)
> +		return;
> +
> +	mutex_lock(&dispatcher_mutex);
> +	d = bpf_dispatcher_lookup(func);
> +	if (!d)
> +		goto out;
> +
> +	changed |= bpf_dispatcher_remove_prog(d, from);
> +	changed |= bpf_dispatcher_add_prog(d, to);
> +
> +	if (!changed)
> +		goto out;
> +
> +	bpf_dispatcher_update(d);
> +	if (!d->num_progs)
> +		bpf_dispatcher_free(d);

I think I got it why it works.
Every time the prog cnt goes to zero you free the trampoline right away
and next time it will be allocated again and kzalloc() will zero selector.
That's hard to spot.
Also if user space does for(;;) attach/detach;
it will keep stressing bpf_jit_alloc_exec.
In case of bpf trampoline attach/detach won't be stressing it.
Only load/unload which are much slower due to verification.
I guess such difference is ok.


  reply	other threads:[~2019-11-24  1:55 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-23  7:12 [PATCH bpf-next v2 0/6] Introduce the BPF dispatcher and xdp_call.h Björn Töpel
2019-11-23  7:12 ` [PATCH bpf-next v2 1/6] bpf: introduce BPF dispatcher Björn Töpel
2019-11-24  1:55   ` Alexei Starovoitov [this message]
2019-11-24  6:55     ` Björn Töpel
2019-11-24 17:08       ` Alexei Starovoitov
2019-11-24 17:16         ` Björn Töpel
2019-11-25  0:08   ` kbuild test robot
2019-11-25 10:53   ` Daniel Borkmann
2019-11-25 15:20     ` Björn Töpel
2019-11-23  7:12 ` [PATCH bpf-next v2 2/6] xdp: introduce xdp_call Björn Töpel
2019-11-24  1:59   ` Alexei Starovoitov
2019-11-24  6:56     ` Björn Töpel
2019-11-25 11:18   ` Toke Høiland-Jørgensen
2019-11-25 15:21     ` Björn Töpel
2019-11-25 15:56       ` Toke Høiland-Jørgensen
2019-11-26  7:43         ` Björn Töpel
2019-11-26  8:37           ` Toke Høiland-Jørgensen
2019-11-23  7:12 ` [PATCH bpf-next v2 3/6] i40e: start using xdp_call.h Björn Töpel
2019-11-23  7:12 ` [PATCH bpf-next v2 4/6] ixgbe: " Björn Töpel
2019-11-23  7:12 ` [PATCH bpf-next v2 5/6] net/mlx4_en: " Björn Töpel
2019-11-23  7:12 ` [PATCH bpf-next v2 6/6] net/mlx5e: Start " Björn Töpel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191124015504.yypqw4gx52e5e6og@ast-mbp.dhcp.thefacebook.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=ast@kernel.org \
    --cc=bjorn.topel@gmail.com \
    --cc=bjorn.topel@intel.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=ecree@solarflare.com \
    --cc=jonathan.lemon@gmail.com \
    --cc=magnus.karlsson@gmail.com \
    --cc=magnus.karlsson@intel.com \
    --cc=maximmi@mellanox.com \
    --cc=netdev@vger.kernel.org \
    --cc=saeedm@mellanox.com \
    --cc=tariqt@mellanox.com \
    --cc=thoiland@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).