* [RFC PATCH bpf-next 0/4] Introduce xdp_call.h and the BPF dispatcher
@ 2019-11-13 20:47 Björn Töpel
  2019-11-13 20:47 ` [RFC PATCH bpf-next 1/4] bpf: teach bpf_arch_text_poke() jumps Björn Töpel
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Björn Töpel @ 2019-11-13 20:47 UTC (permalink / raw)
  To: netdev, ast, daniel
  Cc: Björn Töpel, bpf, magnus.karlsson, magnus.karlsson,
	jonathan.lemon

This RFC(!) introduces the BPF dispatcher and xdp_call.h, a mechanism
to avoid the retpoline overhead by text-poking/rewriting indirect
calls to direct calls.

The ideas build on Alexei's V3 of the BPF trampoline work, namely:
  * Use the existing BPF JIT infrastructure to generate code
  * Use bpf_arch_text_poke() to modify the kernel text

To try the series out, you'll need V3 of the BPF trampoline work [1].

The main idea: each XDP call site calls the jited dispatch table
instead of making an indirect call. The dispatch table calls the XDP
programs directly. In pseudo code this would be something similar to:

unsigned int do_call(struct bpf_prog *prog, struct xdp_buff *xdp)
{
	if (prog == PROG1)
		return call_direct_PROG1(xdp);
	if (prog == PROG2)
		return call_direct_PROG2(xdp);
	return indirect_call(prog, xdp);
}
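
For anyone who wants to poke at the idea outside the kernel, here is a
minimal userspace sketch of the pseudo code above. All types and names
(the stub struct xdp_buff, prog1/prog2/prog3, the *_DEMO return codes)
are hypothetical stand-ins, not the kernel definitions, and the
comparison chain is hand-written here, whereas the series generates it
with the JIT and patches it in with bpf_arch_text_poke().

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real layouts. */
struct xdp_buff { int len; };
struct bpf_prog;
typedef unsigned int (*xdp_fn)(const struct bpf_prog *prog,
			       struct xdp_buff *xdp);
struct bpf_prog { xdp_fn bpf_func; };

enum { XDP_DROP_DEMO = 1, XDP_PASS_DEMO = 2, XDP_TX_DEMO = 3 };

unsigned int prog1_func(const struct bpf_prog *prog, struct xdp_buff *xdp)
{
	(void)prog; (void)xdp;
	return XDP_DROP_DEMO;
}

unsigned int prog2_func(const struct bpf_prog *prog, struct xdp_buff *xdp)
{
	(void)prog; (void)xdp;
	return XDP_PASS_DEMO;
}

unsigned int prog3_func(const struct bpf_prog *prog, struct xdp_buff *xdp)
{
	(void)prog; (void)xdp;
	return XDP_TX_DEMO;
}

struct bpf_prog prog1 = { prog1_func };
struct bpf_prog prog2 = { prog2_func };
struct bpf_prog prog3 = { prog3_func };	/* not known to the dispatcher */

/* Known programs get a direct call; everything else falls back to the
 * indirect call through the function pointer. */
unsigned int do_call(const struct bpf_prog *prog, struct xdp_buff *xdp)
{
	if (prog == &prog1)
		return prog1_func(prog, xdp);	/* direct call */
	if (prog == &prog2)
		return prog2_func(prog, xdp);	/* direct call */
	return prog->bpf_func(prog, xdp);	/* indirect fallback */
}
```

Calling do_call(&prog3, &xdp) takes the fallback path, which in the
kernel is the call that would go through the retpoline.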

The current dispatcher supports four entries. It could support more,
but I don't know if it's really practical (...and I was lazy -- more
than 4 entries meant moving to >1B Jcc. :-P). The dispatcher is
re-generated for each new XDP program/entry. The upper limit of four
in this series means that if six i40e netdevs have an XDP program
running, the fifth and sixth will be using an indirect call.

Now to the performance numbers. I ran this on my 3 GHz Skylake, with
64B UDP packets sent to the i40e at ~40 Mpps.

Benchmark:
  # ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP

  1. Baseline:            26.0 Mpps
  2. Dispatcher 1 entry:   35.5 Mpps (+36.5%)
  3. Dispatcher 4 entries: 32.9 Mpps (+26.5%)
  4. Dispatcher 5 entries: 24.2 Mpps (-6.9%)

In scenario 4 the benchmark goes through the dispatcher, but the table
is full. This means that the caller pays for the dispatching *and* the
retpoline.
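
The percentages above are simply the measured rate relative to the
26.0 Mpps baseline. A small sketch of that arithmetic, plus the
per-packet time it implies (the helper names are mine, not from the
series):

```c
/* Relative change of a measured rate against a baseline, in percent. */
double rel_change_pct(double mpps, double baseline_mpps)
{
	return (mpps / baseline_mpps - 1.0) * 100.0;
}

/* Average time spent per packet, in nanoseconds, at a given rate. */
double ns_per_packet(double mpps)
{
	return 1000.0 / mpps;	/* 1 Mpps corresponds to 1000 ns per packet */
}
```

By this arithmetic the baseline spends about 38.5 ns per packet and
the one-entry dispatcher about 28.2 ns, i.e. roughly 10 ns saved per
packet on this setup.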

Is this a good idea? The performance is nice! Can it be done in a
better way? Useful for other BPF programs? I would love your input!


Thanks!
Björn

[1] https://patchwork.ozlabs.org/cover/1191672/

Björn Töpel (4):
  bpf: teach bpf_arch_text_poke() jumps
  bpf: introduce BPF dispatcher
  xdp: introduce xdp_call
  i40e: start using xdp_call.h

 arch/x86/net/bpf_jit_comp.c                 | 130 ++++++++++++-
 drivers/net/ethernet/intel/i40e/i40e_main.c |   5 +
 drivers/net/ethernet/intel/i40e/i40e_txrx.c |   5 +-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c  |   5 +-
 include/linux/bpf.h                         |   3 +
 include/linux/xdp_call.h                    |  49 +++++
 kernel/bpf/Makefile                         |   1 +
 kernel/bpf/dispatcher.c                     | 197 ++++++++++++++++++++
 8 files changed, 388 insertions(+), 7 deletions(-)
 create mode 100644 include/linux/xdp_call.h
 create mode 100644 kernel/bpf/dispatcher.c

-- 
2.20.1


Thread overview: 23+ messages
2019-11-13 20:47 [RFC PATCH bpf-next 0/4] Introduce xdp_call.h and the BPF dispatcher Björn Töpel
2019-11-13 20:47 ` [RFC PATCH bpf-next 1/4] bpf: teach bpf_arch_text_poke() jumps Björn Töpel
2019-11-13 20:47 ` [RFC PATCH bpf-next 2/4] bpf: introduce BPF dispatcher Björn Töpel
2019-11-13 21:40   ` Edward Cree
2019-11-14  6:29     ` Björn Töpel
2019-11-14 10:18       ` Edward Cree
2019-11-14 11:21         ` Björn Töpel
2019-11-14 13:58           ` Peter Zijlstra
2019-11-14 12:31   ` Toke Høiland-Jørgensen
2019-11-14 13:03     ` Daniel Borkmann
2019-11-14 13:09       ` Toke Høiland-Jørgensen
2019-11-14 13:56       ` Björn Töpel
2019-11-14 14:55         ` Toke Høiland-Jørgensen
2019-11-14 15:03           ` Björn Töpel
2019-11-14 15:12             ` Toke Høiland-Jørgensen
2019-11-15  0:30   ` Alexei Starovoitov
2019-11-15  7:56     ` Björn Töpel
2019-11-15 21:58       ` Alexei Starovoitov
2019-11-18 10:03         ` Björn Töpel
2019-11-18 19:36   ` Andrii Nakryiko
2019-11-18 20:11     ` Björn Töpel
2019-11-13 20:47 ` [RFC PATCH bpf-next 3/4] xdp: introduce xdp_call Björn Töpel
2019-11-13 20:47 ` [RFC PATCH bpf-next 4/4] i40e: start using xdp_call.h Björn Töpel
