BPF Archive on lore.kernel.org
 help / color / Atom feed
From: Hangbin Liu <liuhangbin@gmail.com>
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Jiri Benc" <jbenc@redhat.com>,
	"Jesper Dangaard Brouer" <brouer@redhat.com>,
	"Eelco Chaudron" <echaudro@redhat.com>,
	ast@kernel.org, "Daniel Borkmann" <daniel@iogearbox.net>,
	"Lorenzo Bianconi" <lorenzo.bianconi@redhat.com>,
	"Hangbin Liu" <liuhangbin@gmail.com>
Subject: [PATCHv3 bpf-next 0/2] xdp: add dev map multicast support
Date: Sat, 23 May 2020 14:05:35 +0800
Message-ID: <20200523060537.264096-1-liuhangbin@gmail.com> (raw)
In-Reply-To: <20200415085437.23028-1-liuhangbin@gmail.com>

Hi all,

This patchset is for xdp multicast support, which has been discussed
before[0]. The goal is to be able to implement an OVS-like data plane in
XDP, i.e., a software switch that can forward XDP frames to multiple
ports.

To achieve this, an application needs to specify a group of interfaces
to forward a packet to. It is also common to want to exclude one or more
physical interfaces from the forwarding operation - e.g., to forward a
packet to all interfaces in the multicast group except the interface it
arrived on. While this could be done simply by adding more groups, this
quickly leads to a combinatorial explosion in the number of groups an
application has to maintain.

To avoid the combinatorial explosion, we propose to include the ability
to specify an "exclude group" as part of the forwarding operation. This
needs to be a group (instead of just a single port index), because a
physical interface can be part of a logical grouping, such as a bond
device.

Thus, the logical forwarding operation becomes a "set difference"
operation, i.e. "forward to all ports in group A that are not also in
group B". This series implements such an operation using device maps to
represent the groups. This means that the XDP program specifies two
device maps, one containing the list of netdevs to redirect to, and the
other containing the exclude list.

To achieve this, I re-implement a new helper bpf_redirect_map_multi()
to accept two maps, the forwarding map and exclude map. If user
don't want to use exclude map and just want simply stop redirecting back
to ingress device, they can use flag BPF_F_EXCLUDE_INGRESS.

The example in patch 2 is functional, but not a lot of effort
has been made on performance optimisation. I did a simple test(pkt size 64)
with pktgen. Here is the test result with BPF_MAP_TYPE_DEVMAP_HASH
arrays:

bpf_redirect_map() with 1 ingress, 1 egress:
generic path: ~1600k pps
native path: ~980k pps

bpf_redirect_map_multi() with 1 ingress, 3 egress:
generic path: ~600k pps
native path: ~480k pps

bpf_redirect_map_multi() with 1 ingress, 9 egress:
generic path: ~125k pps
native path: ~100k pps

The bpf_redirect_map_multi() is slower than bpf_redirect_map() as we loop
the arrays and do clone skb/xdpf. The native path is slower than generic
path as we send skbs by pktgen. So the result looks reasonable.

We need also note that the performace number will get slower if we use large
BPF_MAP_TYPE_DEVMAP arrays.

Last but not least, thanks a lot to Jiri, Eelco, Toke and Jesper for
suggestions and help on implementation.

[0] https://xdp-project.net/#Handling-multicast

v3: Based on Toke's suggestion, do the following update
a) Update bpf_redirect_map_multi() description in bpf.h.
b) Fix exclude_ifindex checking order in dev_in_exclude_map().
c) Fix one more xdpf clone in dev_map_enqueue_multi().
d) Go find next one in dev_map_enqueue_multi() if the interface is not
   able to forward instead of abort the whole loop.
e) Remove READ_ONCE/WRITE_ONCE for ex_map.
f) Add rxcnt map to show the packet transmit speed in sample test.
g) Add performace test number.

I didn't split the tools/include to a separate patch because I think
they are all the same change, and I saw some others also do like this.
But I can re-post the patch and split it if you insist.

v2:
Discussed with Jiri, Toke, Jesper, Eelco, we think the v1 is doing
a trick and may make user confused. So let's just add a new helper
to make the implementation more clear.

Hangbin Liu (2):
  xdp: add a new helper for dev map multicast support
  sample/bpf: add xdp_redirect_map_multicast test

 include/linux/bpf.h                       |  20 +++
 include/linux/filter.h                    |   1 +
 include/net/xdp.h                         |   1 +
 include/uapi/linux/bpf.h                  |  22 ++-
 kernel/bpf/devmap.c                       | 124 ++++++++++++++
 kernel/bpf/verifier.c                     |   6 +
 net/core/filter.c                         | 101 ++++++++++-
 net/core/xdp.c                            |  26 +++
 samples/bpf/Makefile                      |   3 +
 samples/bpf/xdp_redirect_map_multi.sh     | 133 +++++++++++++++
 samples/bpf/xdp_redirect_map_multi_kern.c | 112 ++++++++++++
 samples/bpf/xdp_redirect_map_multi_user.c | 198 ++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h            |  22 ++-
 13 files changed, 762 insertions(+), 7 deletions(-)
 create mode 100755 samples/bpf/xdp_redirect_map_multi.sh
 create mode 100644 samples/bpf/xdp_redirect_map_multi_kern.c
 create mode 100644 samples/bpf/xdp_redirect_map_multi_user.c

-- 
2.25.4


  parent reply index

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-15  8:54 [RFC PATCH " Hangbin Liu
2020-04-15  8:54 ` [RFC PATCH bpf-next 1/2] " Hangbin Liu
2020-04-20  9:52   ` Hangbin Liu
2020-04-15  8:54 ` [RFC PATCH bpf-next 2/2] sample/bpf: add xdp_redirect_map_multicast test Hangbin Liu
2020-04-24  8:56 ` [RFC PATCHv2 bpf-next 0/2] xdp: add dev map multicast support Hangbin Liu
2020-04-24  8:56   ` [RFC PATCHv2 bpf-next 1/2] xdp: add a new helper for " Hangbin Liu
2020-04-24 14:19     ` Lorenzo Bianconi
2020-04-28 11:09       ` Eelco Chaudron
2020-05-06  9:35       ` Hangbin Liu
2020-04-24 14:34     ` Toke Høiland-Jørgensen
2020-05-06  9:14       ` Hangbin Liu
2020-05-06 10:00         ` Toke Høiland-Jørgensen
2020-05-08  8:53           ` Hangbin Liu
2020-05-08 14:58             ` Toke Høiland-Jørgensen
2020-05-18  8:45       ` Hangbin Liu
2020-05-19 10:15         ` Jesper Dangaard Brouer
2020-05-20  1:24           ` Hangbin Liu
2020-04-24  8:56   ` [RFC PATCHv2 bpf-next 2/2] sample/bpf: add xdp_redirect_map_multicast test Hangbin Liu
2020-04-24 14:21     ` Lorenzo Bianconi
2020-05-23  6:05 ` Hangbin Liu [this message]
2020-05-23  6:05   ` [PATCHv3 bpf-next 1/2] xdp: add a new helper for dev map multicast support Hangbin Liu
2020-05-26  7:34     ` kbuild test robot
2020-05-23  6:05   ` [PATCHv3 bpf-next 2/2] sample/bpf: add xdp_redirect_map_multicast test Hangbin Liu
2020-05-26 14:05 ` [PATCHv4 bpf-next 0/2] xdp: add dev map multicast support Hangbin Liu
2020-05-26 14:05   ` [PATCHv4 bpf-next 1/2] xdp: add a new helper for " Hangbin Liu
2020-05-27 10:29     ` Toke Høiland-Jørgensen
2020-05-26 14:05   ` [PATCHv4 bpf-next 2/2] sample/bpf: add xdp_redirect_map_multicast test Hangbin Liu
2020-05-27 10:21   ` [PATCHv4 bpf-next 0/2] xdp: add dev map multicast support Toke Høiland-Jørgensen
2020-05-27 10:32     ` Eelco Chaudron
2020-05-27 12:38     ` Hangbin Liu
2020-05-27 15:04       ` Toke Høiland-Jørgensen
2020-06-03  2:40     ` Hangbin Liu
2020-06-03 11:05       ` Toke Høiland-Jørgensen
2020-06-04  4:09         ` Hangbin Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200523060537.264096-1-liuhangbin@gmail.com \
    --to=liuhangbin@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=daniel@iogearbox.net \
    --cc=echaudro@redhat.com \
    --cc=jbenc@redhat.com \
    --cc=lorenzo.bianconi@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=toke@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

BPF Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/bpf/0 bpf/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 bpf bpf/ https://lore.kernel.org/bpf \
		bpf@vger.kernel.org
	public-inbox-index bpf

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.bpf


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git