linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH bpf-next v4 2/6] bpf: introduce BPF dispatcher
       [not found] ` <20191211123017.13212-3-bjorn.topel@gmail.com>
@ 2022-08-15 14:13   ` Steven Rostedt
  2022-08-15 14:31     ` Alexei Starovoitov
  0 siblings, 1 reply; 7+ messages in thread
From: Steven Rostedt @ 2022-08-15 14:13 UTC (permalink / raw)
  To: Björn Töpel
  Cc: netdev, ast, daniel, Björn Töpel, bpf, magnus.karlsson,
	magnus.karlsson, jonathan.lemon, ecree, thoiland, brouer,
	andrii.nakryiko, LKML, Linus Torvalds, Christoph Hellwig,
	Peter Zijlstra, Josh Poimboeuf, Thomas Gleixner, Andrew Morton

On Wed, 11 Dec 2019 13:30:13 +0100
Björn Töpel <bjorn.topel@gmail.com> wrote:

> From: Björn Töpel <bjorn.topel@intel.com>
> 
> The BPF dispatcher is a multi-way branch code generator, mainly
> targeted for XDP programs. When an XDP program is executed via the
> bpf_prog_run_xdp(), it is invoked via an indirect call. The indirect
> call has a substantial performance impact, when retpolines are
> enabled. The dispatcher transforms indirect calls to direct calls, and
> therefore avoids the retpoline. The dispatcher is generated using the
> BPF JIT, and relies on text poking provided by bpf_arch_text_poke().
> 
> The dispatcher hijacks a trampoline function via the __fentry__ nop

Why were the ftrace maintainers not Cc'd on this patch?  I would have NACKED
it. Hell, it wasn't even sent to LKML! This was BPF being sneaky in
updating major infrastructure of the Linux kernel without letting the
stakeholders of this change know about it.

For some reason, the BPF folks think they own the entire kernel!

When I heard that ftrace was broken by BPF I thought it was something
unique they were doing, but unfortunately, I didn't investigate what they
were doing at the time.

Then they started sending me patches to hide fentry locations from ftrace.
And even telling me that fentry != ftrace

   https://lore.kernel.org/all/CAADnVQJTT7h3MniVqdBEU=eLwvJhEKNLSjbUAK4sOrhN=zggCQ@mail.gmail.com/
	
Even though fentry was created for ftrace

   https://lore.kernel.org/lkml/1258720459.22249.1018.camel@gandalf.stny.rr.com/

and all the work with fentry was for the ftrace infrastructure. Ftrace
takes a lot of care for security and use cases for other users (like
live kernel patching). But BPF has the NIH syndrome, and likes to own
everything and recreate the wheel so that they have full control.

> of the trampoline. One dispatcher instance currently supports up to 64
> dispatch points. A user creates a dispatcher with its corresponding
> trampoline with the DEFINE_BPF_DISPATCHER macro.
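
As an illustration of how a client uses these macros, a rough sketch follows.
The names are invented for this example (the real series wires a dispatcher up
for XDP in a later patch), so treat it as a shape-of-the-API sketch rather
than the actual code:

#include <linux/bpf.h>

/* Define one dispatcher plus its trampoline; "xdp" is a hypothetical name,
 * so this generates xdpfunc() and a struct bpf_dispatcher named xdp.
 */
DEFINE_BPF_DISPATCHER(xdp)

/* Hot path: call the program through the trampoline. With nothing
 * registered this is still an indirect call through bpf_func; once a
 * program is attached, the __fentry__ nop of xdpfunc() is patched to
 * jump into the JIT-generated direct-call dispatch code.
 */
static unsigned int run_prog(const struct bpf_prog *prog, const void *ctx)
{
	return BPF_DISPATCHER_FUNC(xdp)(ctx, prog->insnsi, prog->bpf_func);
}

/* Attach/detach path: swap programs and re-JIT the dispatch code. */
static void swap_prog(struct bpf_prog *old_prog, struct bpf_prog *new_prog)
{
	bpf_dispatcher_change_prog(BPF_DISPATCHER_PTR(xdp), old_prog, new_prog);
}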

Anyway, this patch just looks like a re-implementation of static_calls:

   https://lore.kernel.org/all/20200818135735.948368560@infradead.org/
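
For comparison, a minimal sketch of the static_call API that eventually landed
upstream (the handler names here are invented); as with the dispatcher, the
point is that updates rewrite the call site so the hot path stays a direct
call:

#include <linux/static_call.h>

static int default_handler(int x)
{
	return x;
}

DEFINE_STATIC_CALL(my_call, default_handler);

int do_thing(int x)
{
	/* Emitted as a direct call and patched in place; no retpoline here. */
	return static_call(my_call)(x);
}

void set_handler(int (*fn)(int))
{
	/* Re-points every static_call(my_call) site to fn. */
	static_call_update(my_call, fn);
}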

Maybe if the BPF folks communicated to a wider audience, they would not
need to constantly recreate the wheel.

-- Steve


> 
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  arch/x86/net/bpf_jit_comp.c | 122 +++++++++++++++++++++++++++
>  include/linux/bpf.h         |  56 +++++++++++++
>  kernel/bpf/Makefile         |   1 +
>  kernel/bpf/dispatcher.c     | 159 ++++++++++++++++++++++++++++++++++++
>  4 files changed, 338 insertions(+)
>  create mode 100644 kernel/bpf/dispatcher.c
> 
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index b8be18427277..3ce7ad41bd6f 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -10,10 +10,12 @@
>  #include <linux/if_vlan.h>
>  #include <linux/bpf.h>
>  #include <linux/memory.h>
> +#include <linux/sort.h>
>  #include <asm/extable.h>
>  #include <asm/set_memory.h>
>  #include <asm/nospec-branch.h>
>  #include <asm/text-patching.h>
> +#include <asm/asm-prototypes.h>
>  
>  static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len)
>  {
> @@ -1530,6 +1532,126 @@ int arch_prepare_bpf_trampoline(void *image, struct btf_func_model *m, u32 flags
>  	return 0;
>  }
>  
> +static int emit_cond_near_jump(u8 **pprog, void *func, void *ip, u8 jmp_cond)
> +{
> +	u8 *prog = *pprog;
> +	int cnt = 0;
> +	s64 offset;
> +
> +	offset = func - (ip + 2 + 4);
> +	if (!is_simm32(offset)) {
> +		pr_err("Target %p is out of range\n", func);
> +		return -EINVAL;
> +	}
> +	EMIT2_off32(0x0F, jmp_cond + 0x10, offset);
> +	*pprog = prog;
> +	return 0;
> +}
> +
> +static int emit_fallback_jump(u8 **pprog)
> +{
> +	u8 *prog = *pprog;
> +	int err = 0;
> +
> +#ifdef CONFIG_RETPOLINE
> +	/* Note that this assumes the compiler uses external
> +	 * thunks for indirect calls. Both clang and GCC use the same
> +	 * naming convention for external thunks.
> +	 */
> +	err = emit_jump(&prog, __x86_indirect_thunk_rdx, prog);
> +#else
> +	int cnt = 0;
> +
> +	EMIT2(0xFF, 0xE2);	/* jmp rdx */
> +#endif
> +	*pprog = prog;
> +	return err;
> +}
> +
> +static int emit_bpf_dispatcher(u8 **pprog, int a, int b, s64 *progs)
> +{
> +	int pivot, err, jg_bytes = 1, cnt = 0;
> +	u8 *jg_reloc, *prog = *pprog;
> +	s64 jg_offset;
> +
> +	if (a == b) {
> +		/* Leaf node of recursion, i.e. not a range of indices
> +		 * anymore.
> +		 */
> +		EMIT1(add_1mod(0x48, BPF_REG_3));	/* cmp rdx,func */
> +		if (!is_simm32(progs[a]))
> +			return -1;
> +		EMIT2_off32(0x81, add_1reg(0xF8, BPF_REG_3),
> +			    progs[a]);
> +		err = emit_cond_near_jump(&prog,	/* je func */
> +					  (void *)progs[a], prog,
> +					  X86_JE);
> +		if (err)
> +			return err;
> +
> +		err = emit_fallback_jump(&prog);	/* jmp thunk/indirect */
> +		if (err)
> +			return err;
> +
> +		*pprog = prog;
> +		return 0;
> +	}
> +
> +	/* Not a leaf node, so we pivot, and recursively descend into
> +	 * the lower and upper ranges.
> +	 */
> +	pivot = (b - a) / 2;
> +	EMIT1(add_1mod(0x48, BPF_REG_3));		/* cmp rdx,func */
> +	if (!is_simm32(progs[a + pivot]))
> +		return -1;
> +	EMIT2_off32(0x81, add_1reg(0xF8, BPF_REG_3), progs[a + pivot]);
> +
> +	if (pivot > 2) {				/* jg upper_part */
> +		/* Require near jump. */
> +		jg_bytes = 4;
> +		EMIT2_off32(0x0F, X86_JG + 0x10, 0);
> +	} else {
> +		EMIT2(X86_JG, 0);
> +	}
> +	jg_reloc = prog;
> +
> +	err = emit_bpf_dispatcher(&prog, a, a + pivot,	/* emit lower_part */
> +				  progs);
> +	if (err)
> +		return err;
> +
> +	jg_offset = prog - jg_reloc;
> +	emit_code(jg_reloc - jg_bytes, jg_offset, jg_bytes);
> +
> +	err = emit_bpf_dispatcher(&prog, a + pivot + 1,	/* emit upper_part */
> +				  b, progs);
> +	if (err)
> +		return err;
> +
> +	*pprog = prog;
> +	return 0;
> +}
> +
> +static int cmp_ips(const void *a, const void *b)
> +{
> +	const s64 *ipa = a;
> +	const s64 *ipb = b;
> +
> +	if (*ipa > *ipb)
> +		return 1;
> +	if (*ipa < *ipb)
> +		return -1;
> +	return 0;
> +}
> +
> +int arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs)
> +{
> +	u8 *prog = image;
> +
> +	sort(funcs, num_funcs, sizeof(funcs[0]), cmp_ips, NULL);
> +	return emit_bpf_dispatcher(&prog, 0, num_funcs - 1, funcs);
> +}
> +
>  struct x64_jit_data {
>  	struct bpf_binary_header *header;
>  	int *addrs;
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 5d744828b399..e6a9d74d4e30 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -470,12 +470,61 @@ struct bpf_trampoline {
>  	void *image;
>  	u64 selector;
>  };
> +
> +#define BPF_DISPATCHER_MAX 64 /* Fits in 2048B */
> +
> +struct bpf_dispatcher_prog {
> +	struct bpf_prog *prog;
> +	refcount_t users;
> +};
> +
> +struct bpf_dispatcher {
> +	/* dispatcher mutex */
> +	struct mutex mutex;
> +	void *func;
> +	struct bpf_dispatcher_prog progs[BPF_DISPATCHER_MAX];
> +	int num_progs;
> +	void *image;
> +	u32 image_off;
> +};
> +
>  #ifdef CONFIG_BPF_JIT
>  struct bpf_trampoline *bpf_trampoline_lookup(u64 key);
>  int bpf_trampoline_link_prog(struct bpf_prog *prog);
>  int bpf_trampoline_unlink_prog(struct bpf_prog *prog);
>  void bpf_trampoline_put(struct bpf_trampoline *tr);
>  void *bpf_jit_alloc_exec_page(void);
> +#define BPF_DISPATCHER_INIT(name) {			\
> +	.mutex = __MUTEX_INITIALIZER(name.mutex),	\
> +	.func = &name##func,				\
> +	.progs = {},					\
> +	.num_progs = 0,					\
> +	.image = NULL,					\
> +	.image_off = 0					\
> +}
> +
> +#define DEFINE_BPF_DISPATCHER(name)					\
> +	unsigned int name##func(					\
> +		const void *xdp_ctx,					\
> +		const struct bpf_insn *insnsi,				\
> +		unsigned int (*bpf_func)(const void *,			\
> +					 const struct bpf_insn *))	\
> +	{								\
> +		return bpf_func(xdp_ctx, insnsi);			\
> +	}								\
> +	EXPORT_SYMBOL(name##func);			\
> +	struct bpf_dispatcher name = BPF_DISPATCHER_INIT(name);
> +#define DECLARE_BPF_DISPATCHER(name)					\
> +	unsigned int name##func(					\
> +		const void *xdp_ctx,					\
> +		const struct bpf_insn *insnsi,				\
> +		unsigned int (*bpf_func)(const void *,			\
> +					 const struct bpf_insn *));	\
> +	extern struct bpf_dispatcher name;
> +#define BPF_DISPATCHER_FUNC(name) name##func
> +#define BPF_DISPATCHER_PTR(name) (&name)
> +void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from,
> +				struct bpf_prog *to);
>  #else
>  static inline struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
>  {
> @@ -490,6 +539,13 @@ static inline int bpf_trampoline_unlink_prog(struct bpf_prog *prog)
>  	return -ENOTSUPP;
>  }
>  static inline void bpf_trampoline_put(struct bpf_trampoline *tr) {}
> +#define DEFINE_BPF_DISPATCHER(name)
> +#define DECLARE_BPF_DISPATCHER(name)
> +#define BPF_DISPATCHER_FUNC(name) bpf_dispatcher_nopfunc
> +#define BPF_DISPATCHER_PTR(name) NULL
> +static inline void bpf_dispatcher_change_prog(struct bpf_dispatcher *d,
> +					      struct bpf_prog *from,
> +					      struct bpf_prog *to) {}
>  #endif
>  
>  struct bpf_func_info_aux {
> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
> index 3f671bf617e8..d4f330351f87 100644
> --- a/kernel/bpf/Makefile
> +++ b/kernel/bpf/Makefile
> @@ -8,6 +8,7 @@ obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o
>  obj-$(CONFIG_BPF_SYSCALL) += disasm.o
>  obj-$(CONFIG_BPF_JIT) += trampoline.o
>  obj-$(CONFIG_BPF_SYSCALL) += btf.o
> +obj-$(CONFIG_BPF_JIT) += dispatcher.o
>  ifeq ($(CONFIG_NET),y)
>  obj-$(CONFIG_BPF_SYSCALL) += devmap.o
>  obj-$(CONFIG_BPF_SYSCALL) += cpumap.o
> diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c
> new file mode 100644
> index 000000000000..a4690460d815
> --- /dev/null
> +++ b/kernel/bpf/dispatcher.c
> @@ -0,0 +1,159 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2019 Intel Corporation. */
> +
> +#include <linux/hash.h>
> +#include <linux/bpf.h>
> +#include <linux/filter.h>
> +
> +/* The BPF dispatcher is a multiway branch code generator. The
> + * dispatcher is a mechanism to avoid the performance penalty of an
> + * indirect call, which is expensive when retpolines are enabled. A
> + * dispatch client registers a BPF program into the dispatcher, and if
> + * there is available room in the dispatcher a direct call to the BPF
> + * program will be generated. All calls to the BPF programs called via
> + * the dispatcher will then be a direct call, instead of an
> + * indirect. The dispatcher hijacks a trampoline function via the
> + * __fentry__ of the trampoline. The trampoline function has the
> + * following signature:
> + *
> + * unsigned int trampoline(const void *xdp_ctx,
> + *                         const struct bpf_insn *insnsi,
> + *                         unsigned int (*bpf_func)(const void *,
> + *                                                  const struct bpf_insn *));
> + */
> +
> +static struct bpf_dispatcher_prog *bpf_dispatcher_find_prog(
> +	struct bpf_dispatcher *d, struct bpf_prog *prog)
> +{
> +	int i;
> +
> +	for (i = 0; i < BPF_DISPATCHER_MAX; i++) {
> +		if (prog == d->progs[i].prog)
> +			return &d->progs[i];
> +	}
> +	return NULL;
> +}
> +
> +static struct bpf_dispatcher_prog *bpf_dispatcher_find_free(
> +	struct bpf_dispatcher *d)
> +{
> +	return bpf_dispatcher_find_prog(d, NULL);
> +}
> +
> +static bool bpf_dispatcher_add_prog(struct bpf_dispatcher *d,
> +				    struct bpf_prog *prog)
> +{
> +	struct bpf_dispatcher_prog *entry;
> +
> +	if (!prog)
> +		return false;
> +
> +	entry = bpf_dispatcher_find_prog(d, prog);
> +	if (entry) {
> +		refcount_inc(&entry->users);
> +		return false;
> +	}
> +
> +	entry = bpf_dispatcher_find_free(d);
> +	if (!entry)
> +		return false;
> +
> +	bpf_prog_inc(prog);
> +	entry->prog = prog;
> +	refcount_set(&entry->users, 1);
> +	d->num_progs++;
> +	return true;
> +}
> +
> +static bool bpf_dispatcher_remove_prog(struct bpf_dispatcher *d,
> +				       struct bpf_prog *prog)
> +{
> +	struct bpf_dispatcher_prog *entry;
> +
> +	if (!prog)
> +		return false;
> +
> +	entry = bpf_dispatcher_find_prog(d, prog);
> +	if (!entry)
> +		return false;
> +
> +	if (refcount_dec_and_test(&entry->users)) {
> +		entry->prog = NULL;
> +		bpf_prog_put(prog);
> +		d->num_progs--;
> +		return true;
> +	}
> +	return false;
> +}
> +
> +int __weak arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs)
> +{
> +	return -ENOTSUPP;
> +}
> +
> +static int bpf_dispatcher_prepare(struct bpf_dispatcher *d, void *image)
> +{
> +	s64 ips[BPF_DISPATCHER_MAX] = {}, *ipsp = &ips[0];
> +	int i;
> +
> +	for (i = 0; i < BPF_DISPATCHER_MAX; i++) {
> +		if (d->progs[i].prog)
> +			*ipsp++ = (s64)(uintptr_t)d->progs[i].prog->bpf_func;
> +	}
> +	return arch_prepare_bpf_dispatcher(image, &ips[0], d->num_progs);
> +}
> +
> +static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs)
> +{
> +	void *old, *new;
> +	u32 noff;
> +	int err;
> +
> +	if (!prev_num_progs) {
> +		old = NULL;
> +		noff = 0;
> +	} else {
> +		old = d->image + d->image_off;
> +		noff = d->image_off ^ (PAGE_SIZE / 2);
> +	}
> +
> +	new = d->num_progs ? d->image + noff : NULL;
> +	if (new) {
> +		if (bpf_dispatcher_prepare(d, new))
> +			return;
> +	}
> +
> +	err = bpf_arch_text_poke(d->func, BPF_MOD_JUMP, old, new);
> +	if (err || !new)
> +		return;
> +
> +	d->image_off = noff;
> +}
> +
> +void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from,
> +				struct bpf_prog *to)
> +{
> +	bool changed = false;
> +	int prev_num_progs;
> +
> +	if (from == to)
> +		return;
> +
> +	mutex_lock(&d->mutex);
> +	if (!d->image) {
> +		d->image = bpf_jit_alloc_exec_page();
> +		if (!d->image)
> +			goto out;
> +	}
> +
> +	prev_num_progs = d->num_progs;
> +	changed |= bpf_dispatcher_remove_prog(d, from);
> +	changed |= bpf_dispatcher_add_prog(d, to);
> +
> +	if (!changed)
> +		goto out;
> +
> +	bpf_dispatcher_update(d, prev_num_progs);
> +out:
> +	mutex_unlock(&d->mutex);
> +}
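
To make the generated code easier to picture, here is a rough C equivalent of
what arch_prepare_bpf_dispatcher() emits for, say, four registered programs.
prog_a..prog_d stand in for the sorted JITed entry points; the real output is
a balanced cmp/je/jg tree, so only a handful of compares (logarithmic in the
number of programs) run before the direct call:

/* prog_a..prog_d: unsigned int (*)(const void *, const struct bpf_insn *) */
unsigned int dispatcher_equiv(const void *ctx, const struct bpf_insn *insnsi,
			      unsigned int (*bpf_func)(const void *,
						       const struct bpf_insn *))
{
	/* Each leaf of the emitted tree: compare, then direct call on match. */
	if (bpf_func == prog_a)
		return prog_a(ctx, insnsi);
	if (bpf_func == prog_b)
		return prog_b(ctx, insnsi);
	if (bpf_func == prog_c)
		return prog_c(ctx, insnsi);
	if (bpf_func == prog_d)
		return prog_d(ctx, insnsi);

	/* Not registered: fall back to the indirect call (retpoline). */
	return bpf_func(ctx, insnsi);
}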



* Re: [PATCH bpf-next v4 2/6] bpf: introduce BPF dispatcher
  2022-08-15 14:13   ` [PATCH bpf-next v4 2/6] bpf: introduce BPF dispatcher Steven Rostedt
@ 2022-08-15 14:31     ` Alexei Starovoitov
  2022-08-15 14:56       ` Peter Zijlstra
  2022-08-15 15:16       ` Steven Rostedt
  0 siblings, 2 replies; 7+ messages in thread
From: Alexei Starovoitov @ 2022-08-15 14:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Björn Töpel, Network Development, Alexei Starovoitov,
	Daniel Borkmann, Björn Töpel, bpf, Magnus Karlsson,
	Karlsson, Magnus, Jonathan Lemon, Edward Cree,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	Andrii Nakryiko, LKML, Linus Torvalds, Christoph Hellwig,
	Peter Zijlstra, Josh Poimboeuf, Thomas Gleixner, Andrew Morton

On Mon, Aug 15, 2022 at 7:13 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 11 Dec 2019 13:30:13 +0100
> Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> > From: Björn Töpel <bjorn.topel@intel.com>
> >
> > The BPF dispatcher is a multi-way branch code generator, mainly
> > targeted for XDP programs. When an XDP program is executed via the
> > bpf_prog_run_xdp(), it is invoked via an indirect call. The indirect
> > call has a substantial performance impact, when retpolines are
> > enabled. The dispatcher transforms indirect calls to direct calls, and
> > therefore avoids the retpoline. The dispatcher is generated using the
> > BPF JIT, and relies on text poking provided by bpf_arch_text_poke().
> >
> > The dispatcher hijacks a trampoline function via the __fentry__ nop
>
> Why were the ftrace maintainers not Cc'd on this patch?  I would have NACKED
> it. Hell, it wasn't even sent to LKML! This was BPF being sneaky in
> updating major infrastructure of the Linux kernel without letting the
> stakeholders of this change know about it.
>
> For some reason, the BPF folks think they own the entire kernel!
>
> When I heard that ftrace was broken by BPF I thought it was something
> unique they were doing, but unfortunately, I didn't investigate what they
> were doing at the time.

ftrace is still broken and refusing to accept the fact doesn't make it
non-broken.

> Then they started sending me patches to hide fentry locations from ftrace.
> And even telling me that fentry != ftrace

It sounds like you've invented nop5 and the kernel's ability
to replace nop5 with a jump or call.
ftrace should really stop trying to own all of the kernel text rewrites.
It's in the way. Like this case.

>    https://lore.kernel.org/all/CAADnVQJTT7h3MniVqdBEU=eLwvJhEKNLSjbUAK4sOrhN=zggCQ@mail.gmail.com/
>
> Even though fentry was created for ftrace
>
>    https://lore.kernel.org/lkml/1258720459.22249.1018.camel@gandalf.stny.rr.com/
>
> and all the work with fentry was for the ftrace infrastructure. Ftrace
> takes a lot of care for security and use cases for other users (like
> live kernel patching). But BPF has the NIH syndrome, and likes to own
> everything and recreate the wheel so that they have full control.
>
> > of the trampoline. One dispatcher instance currently supports up to 64
> > dispatch points. A user creates a dispatcher with its corresponding
> > trampoline with the DEFINE_BPF_DISPATCHER macro.
>
> Anyway, this patch just looks like a re-implementation of static_calls:

It was implemented long before static_calls made it to the kernel
and it's different. Please do your homework.


* Re: [PATCH bpf-next v4 2/6] bpf: introduce BPF dispatcher
  2022-08-15 14:31     ` Alexei Starovoitov
@ 2022-08-15 14:56       ` Peter Zijlstra
  2022-08-15 15:16       ` Steven Rostedt
  1 sibling, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2022-08-15 14:56 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Steven Rostedt, Björn Töpel, Network Development,
	Alexei Starovoitov, Daniel Borkmann, Björn Töpel, bpf,
	Magnus Karlsson, Karlsson, Magnus, Jonathan Lemon, Edward Cree,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	Andrii Nakryiko, LKML, Linus Torvalds, Christoph Hellwig,
	Josh Poimboeuf, Thomas Gleixner, Andrew Morton

On Mon, Aug 15, 2022 at 07:31:23AM -0700, Alexei Starovoitov wrote:
> On Mon, Aug 15, 2022 at 7:13 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > On Wed, 11 Dec 2019 13:30:13 +0100
> > Björn Töpel <bjorn.topel@gmail.com> wrote:
> >
> > > From: Björn Töpel <bjorn.topel@intel.com>
> > >
> > > The BPF dispatcher is a multi-way branch code generator, mainly
> > > targeted for XDP programs. When an XDP program is executed via the
> > > bpf_prog_run_xdp(), it is invoked via an indirect call. The indirect
> > > call has a substantial performance impact, when retpolines are
> > > enabled. The dispatcher transforms indirect calls to direct calls, and
> > > therefore avoids the retpoline. The dispatcher is generated using the
> > > BPF JIT, and relies on text poking provided by bpf_arch_text_poke().
> > >
> > > The dispatcher hijacks a trampoline function via the __fentry__ nop
> >
> > Why were the ftrace maintainers not Cc'd on this patch?  I would have NACKED
> > it. Hell, it wasn't even sent to LKML! This was BPF being sneaky in
> > updating major infrastructure of the Linux kernel without letting the
> > stakeholders of this change know about it.
> >
> > For some reason, the BPF folks think they own the entire kernel!
> >
> > When I heard that ftrace was broken by BPF I thought it was something
> > unique they were doing, but unfortunately, I didn't investigate what they
> > were doing at the time.
> 
> ftrace is still broken and refusing to accept the fact doesn't make it
> non-broken.

Alexei, stop this. The 'call __fentry__' sites are owned by ftrace.
Always have been. If BPF somehow thinks it can use them without telling
ftrace then it's BPF that's broken.

> > Then they started sending me patches to hide fentry locations from ftrace.
> > And even telling me that fentry != ftrace
> 
> It sounds like you've invented nop5 and the kernel's ability
> to replace nop5 with a jump or call.

Ftrace has introduced the mcount/fentry patching into the kernel and has
always owned it for those sites. There is a lot of other text rewriting
not owned by ftrace. But the fentry sites are ftrace's.

Ftrace was also the one that got us the text_poke_bp() infrastructure
and got it reviewed by the CPU vendors.

Since then we've grown static_branch and static_call, they have their
own patch sites and do not interfere with ftrace.
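
To illustrate what "their own patch sites" means (the names below are
invented), a static_branch user looks roughly like this; nothing in it touches
the __fentry__ nop at the start of the function:

#include <linux/jump_label.h>

extern void do_extra_work(void);

static DEFINE_STATIC_KEY_FALSE(my_feature_key);

void hot_path(void)
{
	/* Compiled as a nop (or jmp) at its own jump-label site. */
	if (static_branch_unlikely(&my_feature_key))
		do_extra_work();
}

void enable_feature(void)
{
	/* Patches every my_feature_key site from nop to jmp. */
	static_branch_enable(&my_feature_key);
}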

> ftrace should really stop trying to own all of the kernel text rewrites.
> It's in the way. Like this case.

It doesn't. It hasn't. But it *does* own the fentry sites.

> It was implemented long before static_calls made it to the kernel
> and it's different.

It wasn't long before. Yes it landed a few months prior to the
static_call work, but the whole static_call thing was in progress for a
long long time.

Anyway, yes it is different. But it's still very much broken. You simply
cannot step on __fentry__ sites like that.




* Re: [PATCH bpf-next v4 2/6] bpf: introduce BPF dispatcher
  2022-08-15 14:31     ` Alexei Starovoitov
  2022-08-15 14:56       ` Peter Zijlstra
@ 2022-08-15 15:16       ` Steven Rostedt
  2022-08-15 15:19         ` Alexei Starovoitov
                           ` (2 more replies)
  1 sibling, 3 replies; 7+ messages in thread
From: Steven Rostedt @ 2022-08-15 15:16 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Björn Töpel, Network Development, Alexei Starovoitov,
	Daniel Borkmann, Björn Töpel, bpf, Magnus Karlsson,
	Karlsson, Magnus, Jonathan Lemon, Edward Cree,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	Andrii Nakryiko, LKML, Linus Torvalds, Christoph Hellwig,
	Peter Zijlstra, Josh Poimboeuf, Thomas Gleixner, Andrew Morton

On Mon, 15 Aug 2022 07:31:23 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> > When I heard that ftrace was broken by BPF I thought it was something
> > unique they were doing, but unfortunately, I didn't investigate what they
> > were doing at the time.  
> 
> ftrace is still broken and refusing to accept the fact doesn't make it
> non-broken.

I extended Jiri's patch to make it work again.

> 
> > Then they started sending me patches to hide fentry locations from ftrace.
> > And even telling me that fentry != ftrace  
> 
> It sounds like you've invented nop5 and the kernel's ability
> to replace nop5 with a jump or call.

Actually I did invent it.

   https://lore.kernel.org/lkml/20080210072109.GR4100@elte.hu/


I'm the one that introduced the code to convert mcount into the 5 byte nop,
and did the research and development to make it work at run time. I had one
hiccup along the way that caused the e1000e network card breakage.

The "daemon" approach was horrible, and then I created the recordmcount.pl
perl script to accomplish the same thing at compile time.
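
For anyone unfamiliar with the mechanism being described: the compiler emits a
5-byte "call __fentry__" (earlier, "call mcount") at the start of every traced
function, ftrace rewrites each of those sites to a 5-byte nop at boot, and a
call or jmp is patched back in only when a tracer claims the site. A sketch of
the x86-64 byte patterns involved (illustrative, not the exact kernel code):

/* What the compiler emits at the start of a traced function. */
static const unsigned char call_fentry[5] = {
	0xe8, 0x00, 0x00, 0x00, 0x00,	/* call rel32; offset filled per site */
};

/* What ftrace leaves there while the site is unused (recommended 5-byte nop). */
static const unsigned char nop5[5] = {
	0x0f, 0x1f, 0x44, 0x00, 0x00,	/* nopl 0x0(%rax,%rax,1) */
};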

> ftrace should really stop trying to own all of the kernel text rewrites.
> It's in the way. Like this case.

It's not trying to own all modifications (static_calls is not ftrace). But
the code at the start of functions with fentry does belong to it.

> 
> >    https://lore.kernel.org/all/CAADnVQJTT7h3MniVqdBEU=eLwvJhEKNLSjbUAK4sOrhN=zggCQ@mail.gmail.com/
> >
> > Even though fentry was created for ftrace
> >
> >    https://lore.kernel.org/lkml/1258720459.22249.1018.camel@gandalf.stny.rr.com/
> >
> > and all the work with fentry was for the ftrace infrastructure. Ftrace
> > takes a lot of care for security and use cases for other users (like
> > live kernel patching). But BPF has the NIH syndrome, and likes to own
> > everything and recreate the wheel so that they have full control.
> >  
> > > of the trampoline. One dispatcher instance currently supports up to 64
> > > dispatch points. A user creates a dispatcher with its corresponding
> > > trampoline with the DEFINE_BPF_DISPATCHER macro.  
> >
> > Anyway, this patch just looks like a re-implementation of static_calls:  
> 
> It was implemented long before static_calls made it to the kernel
> and it's different. Please do your homework.

Long before? This code made it into the kernel in Dec 2019. Yes static calls
finally made it into the kernel in 2020, but it was first introduced in
October 2018:

  https://lore.kernel.org/all/20181006015110.653946300@goodmis.org/

If you had Cc'd us on this patch, we could have collaborated and come up
with something that would have worked for you.

It took time to get in because we don't just push our features in, we make
sure that they are generic and work for others, and are secure and robust.

I sent a proof of concept, Josh took over, Linus had issues, and finally
Peter pushed it through the gate. It's a long process, but we don't break
others' code while doing it!

-- Steve


* Re: [PATCH bpf-next v4 2/6] bpf: introduce BPF dispatcher
  2022-08-15 15:16       ` Steven Rostedt
@ 2022-08-15 15:19         ` Alexei Starovoitov
  2022-08-15 15:21         ` Steven Rostedt
  2022-08-15 15:39         ` Steven Rostedt
  2 siblings, 0 replies; 7+ messages in thread
From: Alexei Starovoitov @ 2022-08-15 15:19 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Björn Töpel, Network Development, Alexei Starovoitov,
	Daniel Borkmann, Björn Töpel, bpf, Magnus Karlsson,
	Karlsson, Magnus, Jonathan Lemon, Edward Cree,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	Andrii Nakryiko, LKML, Linus Torvalds, Christoph Hellwig,
	Peter Zijlstra, Josh Poimboeuf, Thomas Gleixner, Andrew Morton

On Mon, Aug 15, 2022 at 8:16 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 15 Aug 2022 07:31:23 -0700
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > >
> > > When I heard that ftrace was broken by BPF I thought it was something
> > > unique they were doing, but unfortunately, I didn't investigate what they
> > > were doing at the time.
> >
> > ftrace is still broken and refusing to accept the fact doesn't make it
> > non-broken.
>
> I extended Jiri's patch to make it work again.
>
> >
> > > Then they started sending me patches to hide fentry locations from ftrace.
> > > And even telling me that fentry != ftrace
> >
> > It sounds like you've invented nop5 and the kernel's ability
> > to replace nop5 with a jump or call.
>
> Actually I did invent it.
>
>    https://lore.kernel.org/lkml/20080210072109.GR4100@elte.hu/
>
>
> I'm the one that introduced the code to convert mcount into the 5 byte nop,
> and did the research and development to make it work at run time. I had one
> hiccup along the way that caused the e1000e network card breakage.
>
> The "daemon" approach was horrible, and then I created the recordmcount.pl
> perl script to accomplish the same thing at compile time.
>
> > ftrace should really stop trying to own all of the kernel text rewrites.
> > It's in the way. Like this case.
>
> It's not trying to own all modifications (static_calls is not ftrace). But
> the code at the start of functions with fentry does belong to it.
>
> >
> > >    https://lore.kernel.org/all/CAADnVQJTT7h3MniVqdBEU=eLwvJhEKNLSjbUAK4sOrhN=zggCQ@mail.gmail.com/
> > >
> > > Even though fentry was created for ftrace
> > >
> > >    https://lore.kernel.org/lkml/1258720459.22249.1018.camel@gandalf.stny.rr.com/
> > >
> > > and all the work with fentry was for the ftrace infrastructure. Ftrace
> > > takes a lot of care for security and use cases for other users (like
> > > live kernel patching). But BPF has the NIH syndrome, and likes to own
> > > everything and recreate the wheel so that they have full control.
> > >
> > > > of the trampoline. One dispatcher instance currently supports up to 64
> > > > dispatch points. A user creates a dispatcher with its corresponding
> > > > trampoline with the DEFINE_BPF_DISPATCHER macro.
> > >
> > > Anyway, this patch just looks like a re-implementation of static_calls:
> >
> > It was implemented long before static_calls made it to the kernel
> > and it's different. Please do your homework.
>
> Long before? This code made it into the kernel in Dec 2019. Yes static calls
> finally made it into the kernel in 2020, but it was first introduced in
> October 2018:
>
>   https://lore.kernel.org/all/20181006015110.653946300@goodmis.org/
>
> If you had Cc'd us on this patch, we could have collaborated and come up
> with something that would have worked for you.
>
> It took time to get in because we don't just push our features in, we make
> sure that they are generic and work for others, and are secure and robust.
>
> I sent a proof of concept, Josh took over, Linus had issues, and finally
> Peter pushed it through the gate. It's a long process, but we don't break
> others' code while doing it!

Replied in the other thread. Let's stick to one thread please.


* Re: [PATCH bpf-next v4 2/6] bpf: introduce BPF dispatcher
  2022-08-15 15:16       ` Steven Rostedt
  2022-08-15 15:19         ` Alexei Starovoitov
@ 2022-08-15 15:21         ` Steven Rostedt
  2022-08-15 15:39         ` Steven Rostedt
  2 siblings, 0 replies; 7+ messages in thread
From: Steven Rostedt @ 2022-08-15 15:21 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Björn Töpel, Network Development, Alexei Starovoitov,
	Daniel Borkmann, Björn Töpel, bpf, Magnus Karlsson,
	Karlsson, Magnus, Jonathan Lemon, Edward Cree,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	Andrii Nakryiko, LKML, Linus Torvalds, Christoph Hellwig,
	Peter Zijlstra, Josh Poimboeuf, Thomas Gleixner, Andrew Morton

On Mon, 15 Aug 2022 11:16:58 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> Actually I did invent it.
> 
>    https://lore.kernel.org/lkml/20080210072109.GR4100@elte.hu/

And the next patch replaced the jmps with nops. We kept this as separate
patches for debugging reasons.

 commit dfa60aba04dae78 ("ftrace: use nops instead of jmp")

-- Steve


* Re: [PATCH bpf-next v4 2/6] bpf: introduce BPF dispatcher
  2022-08-15 15:16       ` Steven Rostedt
  2022-08-15 15:19         ` Alexei Starovoitov
  2022-08-15 15:21         ` Steven Rostedt
@ 2022-08-15 15:39         ` Steven Rostedt
  2 siblings, 0 replies; 7+ messages in thread
From: Steven Rostedt @ 2022-08-15 15:39 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Björn Töpel, Network Development, Alexei Starovoitov,
	Daniel Borkmann, Björn Töpel, bpf, Magnus Karlsson,
	Karlsson, Magnus, Jonathan Lemon, Edward Cree,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer,
	Andrii Nakryiko, LKML, Linus Torvalds, Christoph Hellwig,
	Peter Zijlstra, Josh Poimboeuf, Thomas Gleixner, Andrew Morton

On Mon, 15 Aug 2022 11:16:58 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> > It sounds like you've invented nop5 and the kernel's ability
> > to replace nop5 with a jump or call.  
> 
> Actually I did invent it.
> 
>    https://lore.kernel.org/lkml/20080210072109.GR4100@elte.hu/
> 
> 
> I'm the one that introduced the code to convert mcount into the 5 byte nop,
> and did the research and development to make it work at run time. I had one
> hiccup along the way that caused the e1000e network card breakage.
> 
> The "daemon" approach was horrible, and then I created the recordmcount.pl
> perl script to accomplish the same thing at compile time.

I guess you were not paying attention to my talk at Kernel Recipes, which I
invited you to in 2019. The talk I gave was the history of how fentry came
about.

  https://kernel-recipes.org/en/2019/talks/ftrace-where-modifying-a-running-kernel-all-started/

 ;-)

-- Steve


Thread overview: 7+ messages
     [not found] <20191211123017.13212-1-bjorn.topel@gmail.com>
     [not found] ` <20191211123017.13212-3-bjorn.topel@gmail.com>
2022-08-15 14:13   ` [PATCH bpf-next v4 2/6] bpf: introduce BPF dispatcher Steven Rostedt
2022-08-15 14:31     ` Alexei Starovoitov
2022-08-15 14:56       ` Peter Zijlstra
2022-08-15 15:16       ` Steven Rostedt
2022-08-15 15:19         ` Alexei Starovoitov
2022-08-15 15:21         ` Steven Rostedt
2022-08-15 15:39         ` Steven Rostedt
