* [RFC PATCH 00/17] xdp: Add packet queueing and scheduling capabilities
@ 2022-07-13 11:14 Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen @ 2022-07-13 11:14 UTC
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Jesper Dangaard Brouer, Björn Töpel, Magnus Karlsson,
	Maciej Fijalkowski, Jonathan Lemon, Mykola Lysenko
  Cc: Kumar Kartikeya Dwivedi, netdev, bpf, Freysteinn Alfredsson,
	Cong Wang, Toke Høiland-Jørgensen

Packet forwarding is an important use case for XDP, which offers
significant performance improvements compared to forwarding using the
regular networking stack. However, XDP currently offers no mechanism to
delay, queue or schedule packets, which limits the practical uses for
XDP-based forwarding to those where the capacity of input and output links
always match each other (i.e., no rate transitions or many-to-one
forwarding). It also prevents an XDP-based router from doing any kind of
traffic shaping or reordering to enforce policy.

This series represents a first RFC of our attempt to remedy this lack. The
code in these patches is functional, but needs additional testing and
polishing before being considered for merging. I'm posting it here as an
RFC to get some early feedback on the API and overall design of the
feature.

DESIGN

The design consists of three components: A new map type for storing XDP
frames, a new 'dequeue' program type that will run in the TX softirq to
provide the stack with packets to transmit, and a set of helpers to dequeue
packets from the map, optionally drop them, and to schedule an interface
for transmission.

The new map type is modelled on the PIFO data structure proposed in the
literature[0][1]. It represents a priority queue where packets can be
enqueued at any priority, but are always dequeued from the head. From the
XDP side, the map is simply used as a target for the bpf_redirect_map()
helper, where the target index is the desired priority.
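
To make the enqueue/dequeue semantics concrete, here is a minimal userspace
sketch of a PIFO. It is only a model and not the code in
kernel/bpf/pifomap.c: the struct and function names, the fixed capacity and
the sorted-array layout are all made up for illustration, and it assumes
that a lower rank sits closer to the head and that equal ranks keep their
arrival order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PIFO_CAPACITY 8 /* hypothetical; small so the sketch stays readable */

struct pifo_entry {
	uint64_t rank;   /* priority chosen at enqueue time */
	uint32_t pkt_id; /* stand-in for a queued XDP frame */
};

struct pifo {
	struct pifo_entry slots[PIFO_CAPACITY]; /* kept sorted by rank */
	size_t len;
};

/* Insert at the position given by rank; equal ranks keep arrival order.
 * Returns 0 on success, -1 if the queue is full.
 */
static int pifo_enqueue(struct pifo *q, uint64_t rank, uint32_t pkt_id)
{
	size_t pos;

	if (q->len == PIFO_CAPACITY)
		return -1;

	/* Shift entries with a strictly larger rank one slot towards the tail. */
	pos = q->len;
	while (pos > 0 && q->slots[pos - 1].rank > rank) {
		q->slots[pos] = q->slots[pos - 1];
		pos--;
	}
	q->slots[pos].rank = rank;
	q->slots[pos].pkt_id = pkt_id;
	q->len++;
	return 0;
}

/* Remove the head (lowest rank). Returns 0 and fills *out, or -1 if empty. */
static int pifo_dequeue(struct pifo *q, struct pifo_entry *out)
{
	size_t i;

	assert(out != NULL);
	if (q->len == 0)
		return -1;

	*out = q->slots[0];
	for (i = 1; i < q->len; i++)
		q->slots[i - 1] = q->slots[i];
	q->len--;
	return 0;
}
```

In this model, enqueueing packets with ranks 30, 10, 30 and 20 (in that
order) yields them back in rank order 10, 20, 30, 30, with the two rank-30
packets in arrival order. The XDP-side equivalent of pifo_enqueue() is the
bpf_redirect_map() call described above, with the rank as the map index.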

The dequeue program type is a new BPF program type that is attached to an
interface; when an interface is scheduled for transmission, the stack will
execute the attached dequeue program and, if it returns a packet to
transmit, that packet will be transmitted using the existing ndo_xdp_xmit()
driver function.

The dequeue program can obtain packets by pulling them out of a PIFO map
using the new bpf_packet_dequeue() helper. This returns a pointer to an
xdp_md structure, which can be dereferenced to obtain packet data and
data_meta pointers like in an XDP program. The returned packets are also
reference counted, meaning the verifier enforces that the dequeue program
either drops the packet (with the bpf_packet_drop() helper), or returns it
for transmission. Finally, a helper is added that can be used to actually
schedule an interface for transmission using the dequeue program type; this
helper can be called from both XDP and dequeue programs.
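
The ownership rule for dequeued packets can be sketched in plain userspace
C as follows. This is a model of the contract only, not BPF code and not
the helper signatures from the patches: every model_* name is hypothetical,
a small FIFO ring stands in for the PIFO map, the minimum-length policy is
invented for the example, and model_tx_run() is a stand-in for the stack
calling the program and passing the result to ndo_xdp_xmit().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_QUEUE_CAP 4 /* hypothetical capacity for the sketch */

struct model_pkt {
	uint32_t len; /* stand-in for data the program would inspect */
};

struct model_queue {
	struct model_pkt pkts[MODEL_QUEUE_CAP]; /* FIFO ring standing in for the map */
	size_t head, len;
	unsigned int dropped;     /* packets released via the drop path */
	unsigned int transmitted; /* packets handed to the driver stand-in */
};

static int model_enqueue(struct model_queue *q, uint32_t len)
{
	if (q->len == MODEL_QUEUE_CAP)
		return -1;
	q->pkts[(q->head + q->len) % MODEL_QUEUE_CAP].len = len;
	q->len++;
	return 0;
}

/* Models the dequeue helper: hands out a packet reference, or NULL if empty. */
static struct model_pkt *model_packet_dequeue(struct model_queue *q)
{
	struct model_pkt *p;

	if (q->len == 0)
		return NULL;
	p = &q->pkts[q->head];
	q->head = (q->head + 1) % MODEL_QUEUE_CAP;
	q->len--;
	return p;
}

/* Models the drop helper: releases a reference obtained from dequeue. */
static void model_packet_drop(struct model_queue *q, struct model_pkt *p)
{
	assert(p != NULL);
	q->dropped++;
}

/* Models a dequeue program: every path either drops the packet it
 * dequeued or returns it, so no reference is left dangling.
 */
static struct model_pkt *model_dequeue_prog(struct model_queue *q,
					    uint32_t min_len)
{
	struct model_pkt *p = model_packet_dequeue(q);

	if (!p)
		return NULL;              /* nothing queued */
	if (p->len < min_len) {
		model_packet_drop(q, p);  /* reference released here */
		return NULL;
	}
	return p;                         /* ownership passes to the caller */
}

/* Models the TX side: run the program, transmit whatever it returns.
 * Returns 1 if a packet was transmitted, 0 otherwise.
 */
static int model_tx_run(struct model_queue *q, uint32_t min_len)
{
	struct model_pkt *p = model_dequeue_prog(q, min_len);

	if (!p)
		return 0;
	q->transmitted++;
	return 1;
}
```

In the actual series this rule is enforced statically by the verifier at
program load time; the model above can only show what a conforming program
looks like at runtime.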

PERFORMANCE

Preliminary performance tests indicate that adding queueing to the xdp_fwd
example (last patch) costs about 50 ns per packet, which translates to
roughly a 20% drop in PPS (while still delivering nearly 2x the forwarding
performance of the netstack):

xdp_fwd:      4.7 Mpps  (213 ns/pkt)
xdp_fwd -Q:   3.8 Mpps  (263 ns/pkt)
netstack:     2.0 Mpps  (500 ns/pkt)
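
For reference, the per-packet times follow directly from the packet rates:
1 Mpps is 10^6 packets per second, so the time per packet in nanoseconds is
1000 divided by the rate in Mpps. The small sketch below (helper names are
made up for this example) reproduces the arithmetic.

```c
#include <assert.h>

/* Time spent per packet, in nanoseconds, for a rate given in Mpps. */
static double ns_per_pkt(double mpps)
{
	assert(mpps > 0.0);
	return 1000.0 / mpps;
}

/* Fraction of the baseline packet rate lost when going to new_mpps. */
static double pps_drop_fraction(double base_mpps, double new_mpps)
{
	assert(base_mpps > 0.0);
	return (base_mpps - new_mpps) / base_mpps;
}
```

With the numbers above, ns_per_pkt() gives about 213, 263 and 500 ns for
4.7, 3.8 and 2 Mpps respectively, pps_drop_fraction(4.7, 3.8) is about 0.19,
and 3.8 / 2 = 1.9 is the remaining advantage over the netstack.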

RELATION TO BPF QDISC

Cong Wang's BPF qdisc patches[2] share some aspects of this series, in
particular the use of a map to store packets. This is no accident, as we've
had ongoing discussions for a while now. I have no great hope that we can
completely converge the two efforts into a single BPF-based queueing
API (as has been discussed before[3], consolidating the SKB and XDP paths
is challenging). Rather, I'm hoping that we can converge the designs enough
that we can share BPF code between XDP and qdisc layers using common
functions, like it's possible to do with XDP and TC-BPF today. This would
imply agreeing on the map type and API, and possibly on the set of helpers
available to the BPF programs.

PATCH STRUCTURE

This series consists of a total of 17 patches, as follows:

Patches 1-3 are smaller preparatory refactoring patches used by subsequent
patches.

Patches 4-5 introduce the PIFO map type, and patch 6 introduces the dequeue
program type.

Patches 7-10 add the dequeue helpers and the verifier features needed to
recognise packet pointers, reference count them, and allow dereferencing
them to obtain packet data pointers.

Patches 11 and 12 add the dequeue program hook to the TX path, and the
helpers to schedule an interface.

Patches 13-16 add libbpf support for the new types, and selftests for the
new features.

Finally, patch 17 adds queueing support to the xdp_fwd program in
samples/bpf to provide an easy-to-use way of testing the feature; this is
for illustrative purposes for the RFC only, and will not be included in the
final submission.

SUPPLEMENTARY MATERIAL

A (WiP) test harness for implementing and unit-testing scheduling
algorithms using this framework (and the bpf_prog_run() hook) is available
as part of the bpf-examples repository[4]. We plan to expand this with more
test algorithms to smoke-test the API, and also to add ready-to-use queueing
algorithms for forwarding (to replace the xdp_fwd patch included as part of
this RFC submission).

The work represented in this series was done in collaboration with several
people. Thanks to Kumar Kartikeya Dwivedi for writing the verifier
enhancements in this series, to Frey Alfredsson for his work on the testing
harness in [4], and to Jesper Brouer, Per Hurtig and Anna Brunstrom for
their valuable input on the design of the queueing APIs.

This series is also available as a git tree on git.kernel.org[5].

NOTES

[0] http://web.mit.edu/pifo/
[1] https://arxiv.org/abs/1810.03060
[2] https://lore.kernel.org/r/20220602041028.95124-1-xiyou.wangcong@gmail.com
[3] https://lore.kernel.org/r/b4ff6a2b-1478-89f8-ea9f-added498c59f@gmail.com
[4] https://github.com/xdp-project/bpf-examples/pull/40
[5] https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/log/?h=xdp-queueing-06

Kumar Kartikeya Dwivedi (5):
  bpf: Use 64-bit return value for bpf_prog_run
  bpf: Teach the verifier about referenced packets returned from dequeue
    programs
  bpf: Introduce pkt_uid member for PTR_TO_PACKET
  bpf: Implement direct packet access in dequeue progs
  selftests/bpf: Add verifier tests for dequeue prog

Toke Høiland-Jørgensen (12):
  dev: Move received_rps counter next to RPS members in softnet data
  bpf: Expand map key argument of bpf_redirect_map to u64
  bpf: Add a PIFO priority queue map type
  pifomap: Add queue rotation for continuously increasing rank mode
  xdp: Add dequeue program type for getting packets from a PIFO
  bpf: Add helpers to dequeue from a PIFO map
  dev: Add XDP dequeue hook
  bpf: Add helper to schedule an interface for TX dequeue
  libbpf: Add support for dequeue program type and PIFO map type
  libbpf: Add support for querying dequeue programs
  selftests/bpf: Add test for XDP queueing through PIFO maps
  samples/bpf: Add queueing support to xdp_fwd sample

 include/linux/bpf-cgroup.h                    |  12 +-
 include/linux/bpf.h                           |  64 +-
 include/linux/bpf_types.h                     |   4 +
 include/linux/bpf_verifier.h                  |  14 +-
 include/linux/filter.h                        |  63 +-
 include/linux/netdevice.h                     |   8 +-
 include/net/xdp.h                             |  16 +-
 include/uapi/linux/bpf.h                      |  50 +-
 include/uapi/linux/if_link.h                  |   4 +-
 kernel/bpf/Makefile                           |   2 +-
 kernel/bpf/cgroup.c                           |  12 +-
 kernel/bpf/core.c                             |  14 +-
 kernel/bpf/cpumap.c                           |   4 +-
 kernel/bpf/devmap.c                           |  92 ++-
 kernel/bpf/offload.c                          |   4 +-
 kernel/bpf/pifomap.c                          | 635 ++++++++++++++++++
 kernel/bpf/syscall.c                          |   3 +
 kernel/bpf/verifier.c                         | 148 +++-
 net/bpf/test_run.c                            |  54 +-
 net/core/dev.c                                | 109 +++
 net/core/dev.h                                |   2 +
 net/core/filter.c                             | 307 ++++++++-
 net/core/rtnetlink.c                          |  30 +-
 net/packet/af_packet.c                        |   7 +-
 net/xdp/xskmap.c                              |   4 +-
 samples/bpf/xdp_fwd_kern.c                    |  65 +-
 samples/bpf/xdp_fwd_user.c                    | 200 ++++--
 tools/include/uapi/linux/bpf.h                |  48 ++
 tools/include/uapi/linux/if_link.h            |   4 +-
 tools/lib/bpf/libbpf.c                        |   1 +
 tools/lib/bpf/libbpf.h                        |   1 +
 tools/lib/bpf/libbpf_probes.c                 |   5 +
 tools/lib/bpf/netlink.c                       |   8 +
 .../selftests/bpf/prog_tests/pifo_map.c       | 125 ++++
 .../bpf/prog_tests/xdp_pifo_test_run.c        | 154 +++++
 tools/testing/selftests/bpf/progs/pifo_map.c  |  54 ++
 .../selftests/bpf/progs/test_xdp_pifo.c       | 110 +++
 tools/testing/selftests/bpf/test_verifier.c   |  29 +-
 .../testing/selftests/bpf/verifier/dequeue.c  | 160 +++++
 39 files changed, 2426 insertions(+), 200 deletions(-)
 create mode 100644 kernel/bpf/pifomap.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/pifo_map.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_pifo_test_run.c
 create mode 100644 tools/testing/selftests/bpf/progs/pifo_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_pifo.c
 create mode 100644 tools/testing/selftests/bpf/verifier/dequeue.c

-- 
2.37.0

