bpf.vger.kernel.org archive mirror
* [PATCH bpf-next 0/3] XDP bonding support
@ 2021-06-09 13:55 Jussi Maki
  2021-06-09 13:55 ` [PATCH bpf-next 1/3] net: bonding: Add XDP support to the bonding driver Jussi Maki
                   ` (8 more replies)
  0 siblings, 9 replies; 71+ messages in thread
From: Jussi Maki @ 2021-06-09 13:55 UTC
  To: bpf; +Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii, Jussi Maki

This patchset introduces XDP support to the bonding driver.

Patch 1 contains the implementation, including support for
the recently introduced EXCLUDE_INGRESS. Patch 2 contains a
performance fix to the roundrobin mode which switches rr_tx_counter
to be per-cpu. Patch 3 contains the test suite for the implementation
using a pair of veth devices.
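
The per-cpu counter idea behind patch 2 can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not the
code from patch 2: the names percpu_counter_sketch, counter_slot,
rr_next_slave, NR_CPUS_SKETCH and CACHE_LINE_SKETCH are hypothetical, the
CPU id is passed in explicitly, and each counter is padded to its own cache
line so that increments from different CPUs do not contend on one line.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS_SKETCH    4   /* hypothetical CPU count for this model */
#define CACHE_LINE_SKETCH 64  /* assumed cache line size in bytes */

/* One counter per CPU; the alignment pads each slot to a full cache line. */
struct counter_slot {
	_Alignas(CACHE_LINE_SKETCH) uint32_t val;
};

struct percpu_counter_sketch {
	struct counter_slot slot[NR_CPUS_SKETCH];
};

static void percpu_counter_init(struct percpu_counter_sketch *c)
{
	memset(c, 0, sizeof(*c));
}

/* Model of a round-robin pick: bump the counter that belongs to "cpu" and
 * reduce it modulo the number of slaves. Returns UINT32_MAX when there is
 * no slave or the CPU id is out of range.
 */
static uint32_t rr_next_slave(struct percpu_counter_sketch *c,
			      unsigned int cpu, uint32_t slave_cnt)
{
	uint32_t id;

	if (slave_cnt == 0 || cpu >= NR_CPUS_SKETCH)
		return UINT32_MAX;

	id = c->slot[cpu].val++;
	return id % slave_cnt;
}
```

In this model each CPU cycles through the slaves on its own, so the
distribution is round-robin per CPU rather than globally ordered, in
exchange for no shared cache line on the transmit path.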

The vmtest.sh is modified to enable the bonding module and install
modules. The config change should probably be done in the libbpf
repository. Andrii: How would you like this done properly?

The motivation for this change is to enable use of bonding (and
802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
XDP, and also to transparently support bond devices for projects that
use XDP, given that most modern NICs are dual-port adapters. An
alternative to this approach would be to implement 802.3ad in user-space
and the bonding load-balancing in the XDP program itself, but that
is a rather cumbersome endeavor in terms of slave device management
(e.g. by watching netlink) and requires the orchestrator to maintain
separate programs for the native and bond cases. A native in-kernel
implementation overcomes these issues and provides more flexibility.

Below are benchmark results from two machines, each with a 100Gbit
Intel E810 (ice) NIC: a 32-core 3970X on the sending machine and a
16-core 3950X on the receiving machine. 64 byte packets were sent with
pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
ice driver, so the tests were performed with iommu=off and patch [2]
applied. Additionally, the bonding round robin algorithm was modified
to use per-cpu tx counters, because the shared rr_tx_counter caused
high CPU load (50% vs 10%) and a high rate of cache misses (see patch
2/3). The statistics were collected using "sar -n dev -u 1 10".

 -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
 without patch (1 dev):
   XDP_DROP:              3.15%      48.6Mpps
   XDP_TX:                3.12%      18.3Mpps     18.3Mpps
   XDP_DROP (RSS):        9.47%      116.5Mpps
   XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
 -----------------------
 with patch, bond (1 dev):
   XDP_DROP:              3.14%      46.7Mpps
   XDP_TX:                3.15%      13.9Mpps     13.9Mpps
   XDP_DROP (RSS):       10.33%      117.2Mpps
   XDP_TX (RSS):         10.64%      25.1Mpps     24.0Mpps
 -----------------------
 with patch, bond (2 devs):
   XDP_DROP:              6.27%      92.7Mpps
   XDP_TX:                6.26%      17.6Mpps     17.5Mpps
   XDP_DROP (RSS):       11.38%      117.2Mpps
   XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
 --------------------------------------------------------------

RSS: Receive Side Scaling, i.e. the packets were sent to a range of
destination IPs.

[1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
[2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
[3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/

---

Jussi Maki (3):
  net: bonding: Add XDP support to the bonding driver
  net: bonding: Use per-cpu rr_tx_counter
  selftests/bpf: Add tests for XDP bonding

 drivers/net/bonding/bond_main.c               | 459 +++++++++++++++---
 include/linux/filter.h                        |  13 +-
 include/linux/netdevice.h                     |   5 +
 include/net/bonding.h                         |   3 +-
 kernel/bpf/devmap.c                           |  34 +-
 net/core/filter.c                             |  37 +-
 .../selftests/bpf/prog_tests/xdp_bonding.c    | 342 +++++++++++++
 tools/testing/selftests/bpf/vmtest.sh         |  30 +-
 8 files changed, 843 insertions(+), 80 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_bonding.c

-- 
2.30.2



* [PATCH bpf-next 1/3] net: bonding: Add XDP support to the bonding driver
  2021-06-09 13:55 [PATCH bpf-next 0/3] XDP bonding support Jussi Maki
@ 2021-06-09 13:55 ` Jussi Maki
  2021-06-09 22:29   ` Maciej Fijalkowski
                     ` (4 more replies)
  2021-06-09 13:55 ` [PATCH bpf-next 2/3] net: bonding: Use per-cpu rr_tx_counter Jussi Maki
                   ` (7 subsequent siblings)
  8 siblings, 5 replies; 71+ messages in thread
From: Jussi Maki @ 2021-06-09 13:55 UTC
  To: bpf; +Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii, Jussi Maki

XDP is implemented in the bonding driver by transparently delegating
the XDP program loading, removal and xmit operations to the bonding
slave devices. The overall goal of this work is that XDP programs
can be attached to a bond device *without* any further changes to (or
awareness in) the program itself, meaning the same XDP program can be
attached to a native device as well as to a bonding device.
The semantics of XDP_TX for a program attached to a bond are equivalent
to those of a tc/BPF program attached to the bond: the packet is
transmitted out of the bond itself, using one of the bond's configured
xmit methods to select a slave device (rather than XDP_TX on the slave
itself). Handling of XDP_TX to transmit using the configured bonding
mechanism is therefore implemented by rewriting the BPF program return
value in bpf_prog_run_xdp. To avoid performance impact, this check is
guarded by a static key, which is incremented when an XDP program is
loaded onto a bond device. This approach was chosen to avoid changes to
drivers implementing XDP. If the selected slave device does not match
the receive device, then XDP_REDIRECT is transparently used to perform
the redirection in order to have the network driver release the packet
from its RX ring. The bonding driver hashing functions have been
refactored to allow reuse with xdp_buffs to avoid code duplication.
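
The verdict rewriting described above can be modelled in user space as
follows. This is a sketch under assumptions, not the kernel code: the enum
and struct names are hypothetical, the static key is replaced by a plain
boolean argument, and the behaviour when no slave can be selected (keeping
the original verdict) is an assumption of the model rather than something
stated in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the XDP verdicts relevant to this sketch. */
enum sk_action { SK_ABORTED, SK_DROP, SK_PASS, SK_TX, SK_REDIRECT };

struct rx_ctx_sketch {
	int rx_ifindex;         /* device the frame was received on */
	bool rx_is_bond_slave;  /* receive device is enslaved to a bond */
	int xmit_ifindex;       /* slave picked by the bond, 0 if none */
};

/* Post-process the verdict of an XDP program. Only XDP_TX on a bond slave
 * is touched, and only while "key_enabled" (the static key stand-in) is
 * set. If the bond picked a different slave than the receive device, the
 * verdict becomes a redirect; otherwise it is left unchanged.
 */
static enum sk_action fixup_verdict(enum sk_action act, bool key_enabled,
				    const struct rx_ctx_sketch *ctx)
{
	if (!key_enabled)
		return act;

	if (act == SK_TX && ctx->rx_is_bond_slave) {
		if (ctx->xmit_ifindex != 0 &&
		    ctx->xmit_ifindex != ctx->rx_ifindex)
			return SK_REDIRECT;
	}

	return act;
}
```

With the flag cleared the function is a no-op, which mirrors why guarding
the check with a static key keeps the cost away from systems that never
attach an XDP program to a bond.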

The motivation for this change is to enable use of bonding (and
802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
XDP, and also to transparently support bond devices for projects that
use XDP, given that most modern NICs are dual-port adapters. An
alternative to this approach would be to implement 802.3ad in user-space
and the bonding load-balancing in the XDP program itself, but that
is a rather cumbersome endeavor in terms of slave device management
(e.g. by watching netlink) and requires the orchestrator to maintain
separate programs for the native and bond cases. A native in-kernel
implementation overcomes these issues and provides more flexibility.
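
Wherever the load-balancing lives, its core step is hashing packet headers
and indexing the list of usable slaves. The sketch below models only the
layer2 policy, following the expression in bond_eth_hash() and the
"hash % count" selection used in this patch. The names eth_hdr_sketch,
l2_hash and pick_slave are hypothetical, and h_proto is used exactly as
stored, without any byte-order conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal Ethernet header layout for the sketch. */
struct eth_hdr_sketch {
	uint8_t  h_dest[6];
	uint8_t  h_source[6];
	uint16_t h_proto;  /* used as stored, no byte swapping */
};

/* Layer2 hash: last byte of each MAC address XORed with the protocol. */
static uint32_t l2_hash(const struct eth_hdr_sketch *ep)
{
	return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
}

/* Map the hash onto one of "count" usable slaves. Returns -1 when there is
 * no usable slave.
 */
static int pick_slave(const struct eth_hdr_sketch *ep, unsigned int count)
{
	if (count == 0)
		return -1;

	return (int)(l2_hash(ep) % count);
}
```

Because the hash depends only on header fields, all frames of one flow map
to the same slave for as long as the set of usable slaves stays the same.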

Below are benchmark results from two machines, each with a 100Gbit
Intel E810 (ice) NIC: a 32-core 3970X on the sending machine and a
16-core 3950X on the receiving machine. 64 byte packets were sent with
pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
ice driver, so the tests were performed with iommu=off and patch [2]
applied. Additionally, the bonding round robin algorithm was modified
to use per-cpu tx counters, because the shared rr_tx_counter caused
high CPU load (50% vs 10%) and a high rate of cache misses (see patch
2/3). The statistics were collected using "sar -n dev -u 1 10".

 -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
 without patch (1 dev):
   XDP_DROP:              3.15%      48.6Mpps
   XDP_TX:                3.12%      18.3Mpps     18.3Mpps
   XDP_DROP (RSS):        9.47%      116.5Mpps
   XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
 -----------------------
 with patch, bond (1 dev):
   XDP_DROP:              3.14%      46.7Mpps
   XDP_TX:                3.15%      13.9Mpps     13.9Mpps
   XDP_DROP (RSS):       10.33%      117.2Mpps
   XDP_TX (RSS):         10.64%      25.1Mpps     24.0Mpps
 -----------------------
 with patch, bond (2 devs):
   XDP_DROP:              6.27%      92.7Mpps
   XDP_TX:                6.26%      17.6Mpps     17.5Mpps
   XDP_DROP (RSS):       11.38%      117.2Mpps
   XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
 --------------------------------------------------------------

RSS: Receive Side Scaling, i.e. the packets were sent to a range of
destination IPs.

[1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
[2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
[3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 drivers/net/bonding/bond_main.c | 441 ++++++++++++++++++++++++++++----
 include/linux/filter.h          |  13 +-
 include/linux/netdevice.h       |   5 +
 include/net/bonding.h           |   1 +
 kernel/bpf/devmap.c             |  34 ++-
 net/core/filter.c               |  37 ++-
 6 files changed, 467 insertions(+), 64 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index dafeaef3cbd3..38eea7e096f3 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -317,6 +317,19 @@ bool bond_sk_check(struct bonding *bond)
 	}
 }
 
+static bool bond_xdp_check(struct bonding *bond)
+{
+	switch (BOND_MODE(bond)) {
+	case BOND_MODE_ROUNDROBIN:
+	case BOND_MODE_ACTIVEBACKUP:
+	case BOND_MODE_8023AD:
+	case BOND_MODE_XOR:
+		return true;
+	default:
+		return false;
+	}
+}
+
 /*---------------------------------- VLAN -----------------------------------*/
 
 /* In the following 2 functions, bond_vlan_rx_add_vid and bond_vlan_rx_kill_vid,
@@ -2001,6 +2014,28 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
 	if (bond_mode_can_use_xmit_hash(bond))
 		bond_update_slave_arr(bond, NULL);
 
+	if (bond->xdp_prog) {
+		struct netdev_bpf xdp = {
+			.command = XDP_SETUP_PROG,
+			.flags   = 0,
+			.prog    = bond->xdp_prog,
+			.extack  = extack,
+		};
+		if (!slave_dev->netdev_ops->ndo_bpf ||
+		    !slave_dev->netdev_ops->ndo_xdp_xmit) {
+			NL_SET_ERR_MSG(extack, "Slave does not support XDP");
+			slave_err(bond_dev, slave_dev, "Slave does not support XDP\n");
+			res = -EOPNOTSUPP;
+			goto err_sysfs_del;
+		}
+		res = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (res < 0) {
+			/* ndo_bpf() sets extack error message */
+			slave_dbg(bond_dev, slave_dev, "Error %d calling ndo_bpf\n", res);
+			goto err_sysfs_del;
+		}
+		bpf_prog_inc(bond->xdp_prog);
+	}
 
 	slave_info(bond_dev, slave_dev, "Enslaving as %s interface with %s link\n",
 		   bond_is_active_slave(new_slave) ? "an active" : "a backup",
@@ -2121,6 +2156,17 @@ static int __bond_release_one(struct net_device *bond_dev,
 	/* recompute stats just before removing the slave */
 	bond_get_stats(bond->dev, &bond->bond_stats);
 
+	if (bond->xdp_prog) {
+		struct netdev_bpf xdp = {
+			.command = XDP_SETUP_PROG,
+			.flags   = 0,
+			.prog	 = NULL,
+			.extack  = NULL,
+		};
+		if (slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp))
+			slave_warn(bond_dev, slave_dev, "failed to unload XDP program\n");
+	}
+
 	bond_upper_dev_unlink(bond, slave);
 	/* unregister rx_handler early so bond_handle_frame wouldn't be called
 	 * for this slave anymore.
@@ -3479,55 +3525,80 @@ static struct notifier_block bond_netdev_notifier = {
 
 /*---------------------------- Hashing Policies -----------------------------*/
 
+/* Helper to access data in a packet, with or without a backing skb.
+ * If skb is given the data is linearized if necessary via pskb_may_pull.
+ */
+static inline const void *bond_pull_data(struct sk_buff *skb,
+					 const void *data, int hlen, int n)
+{
+	if (likely(n <= hlen))
+		return data;
+	else if (skb && likely(pskb_may_pull(skb, n)))
+		return skb->head;
+
+	return NULL;
+}
+
 /* L2 hash helper */
-static inline u32 bond_eth_hash(struct sk_buff *skb)
+static inline u32 bond_eth_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
 {
-	struct ethhdr *ep, hdr_tmp;
+	struct ethhdr *ep;
 
-	ep = skb_header_pointer(skb, 0, sizeof(hdr_tmp), &hdr_tmp);
-	if (ep)
-		return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
-	return 0;
+	data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
+	if (!data)
+		return 0;
+
+	ep = (struct ethhdr *)(data + mhoff);
+	return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
 }
 
-static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
-			 int *noff, int *proto, bool l34)
+static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk, const void *data,
+			 int hlen, int l2_proto, int *nhoff, int *ip_proto, bool l34)
 {
 	const struct ipv6hdr *iph6;
 	const struct iphdr *iph;
 
-	if (skb->protocol == htons(ETH_P_IP)) {
-		if (unlikely(!pskb_may_pull(skb, *noff + sizeof(*iph))))
+	if (l2_proto == htons(ETH_P_IP)) {
+		data = bond_pull_data(skb, data, hlen, *nhoff + sizeof(*iph));
+		if (!data)
 			return false;
-		iph = (const struct iphdr *)(skb->data + *noff);
+
+		iph = (const struct iphdr *)(data + *nhoff);
 		iph_to_flow_copy_v4addrs(fk, iph);
-		*noff += iph->ihl << 2;
+		*nhoff += iph->ihl << 2;
 		if (!ip_is_fragment(iph))
-			*proto = iph->protocol;
-	} else if (skb->protocol == htons(ETH_P_IPV6)) {
-		if (unlikely(!pskb_may_pull(skb, *noff + sizeof(*iph6))))
+			*ip_proto = iph->protocol;
+	} else if (l2_proto == htons(ETH_P_IPV6)) {
+		data = bond_pull_data(skb, data, hlen, *nhoff + sizeof(*iph6));
+		if (!data)
 			return false;
-		iph6 = (const struct ipv6hdr *)(skb->data + *noff);
+
+		iph6 = (const struct ipv6hdr *)(data + *nhoff);
 		iph_to_flow_copy_v6addrs(fk, iph6);
-		*noff += sizeof(*iph6);
-		*proto = iph6->nexthdr;
+		*nhoff += sizeof(*iph6);
+		*ip_proto = iph6->nexthdr;
 	} else {
 		return false;
 	}
 
-	if (l34 && *proto >= 0)
-		fk->ports.ports = skb_flow_get_ports(skb, *noff, *proto);
+	if (l34 && *ip_proto >= 0)
+		fk->ports.ports = __skb_flow_get_ports(skb, *nhoff, *ip_proto, data, hlen);
 
 	return true;
 }
 
-static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
+static u32 bond_vlan_srcmac_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
 {
-	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
+	struct ethhdr *mac_hdr;
 	u32 srcmac_vendor = 0, srcmac_dev = 0;
 	u16 vlan;
 	int i;
 
+	data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
+	if (!data)
+		return 0;
+	mac_hdr = (struct ethhdr *)(data + mhoff);
+
 	for (i = 0; i < 3; i++)
 		srcmac_vendor = (srcmac_vendor << 8) | mac_hdr->h_source[i];
 
@@ -3543,26 +3614,30 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
 }
 
 /* Extract the appropriate headers based on bond's xmit policy */
-static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
+static bool bond_flow_dissect(struct bonding *bond,
+			      struct sk_buff *skb,
+			      const void *data,
+			      __be16 l2_proto,
+			      int nhoff,
+			      int hlen,
 			      struct flow_keys *fk)
 {
 	bool l34 = bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34;
-	int noff, proto = -1;
+	int ip_proto = -1;
 
 	switch (bond->params.xmit_policy) {
 	case BOND_XMIT_POLICY_ENCAP23:
 	case BOND_XMIT_POLICY_ENCAP34:
 		memset(fk, 0, sizeof(*fk));
 		return __skb_flow_dissect(NULL, skb, &flow_keys_bonding,
-					  fk, NULL, 0, 0, 0, 0);
+					  fk, data, l2_proto, nhoff, hlen, 0);
 	default:
 		break;
 	}
 
 	fk->ports.ports = 0;
 	memset(&fk->icmp, 0, sizeof(fk->icmp));
-	noff = skb_network_offset(skb);
-	if (!bond_flow_ip(skb, fk, &noff, &proto, l34))
+	if (!bond_flow_ip(skb, fk, data, hlen, l2_proto, &nhoff, &ip_proto, l34))
 		return false;
 
 	/* ICMP error packets contains at least 8 bytes of the header
@@ -3570,22 +3645,20 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
 	 * to correlate ICMP error packets within the same flow which
 	 * generated the error.
 	 */
-	if (proto == IPPROTO_ICMP || proto == IPPROTO_ICMPV6) {
-		skb_flow_get_icmp_tci(skb, &fk->icmp, skb->data,
-				      skb_transport_offset(skb),
-				      skb_headlen(skb));
-		if (proto == IPPROTO_ICMP) {
+	if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
+		skb_flow_get_icmp_tci(skb, &fk->icmp, data, nhoff, hlen);
+		if (ip_proto == IPPROTO_ICMP) {
 			if (!icmp_is_err(fk->icmp.type))
 				return true;
 
-			noff += sizeof(struct icmphdr);
-		} else if (proto == IPPROTO_ICMPV6) {
+			nhoff += sizeof(struct icmphdr);
+		} else if (ip_proto == IPPROTO_ICMPV6) {
 			if (!icmpv6_is_err(fk->icmp.type))
 				return true;
 
-			noff += sizeof(struct icmp6hdr);
+			nhoff += sizeof(struct icmp6hdr);
 		}
-		return bond_flow_ip(skb, fk, &noff, &proto, l34);
+		return bond_flow_ip(skb, fk, data, hlen, l2_proto, &nhoff, &ip_proto, l34);
 	}
 
 	return true;
@@ -3601,33 +3674,30 @@ static u32 bond_ip_hash(u32 hash, struct flow_keys *flow)
 	return hash >> 1;
 }
 
-/**
- * bond_xmit_hash - generate a hash value based on the xmit policy
- * @bond: bonding device
- * @skb: buffer to use for headers
- *
- * This function will extract the necessary headers from the skb buffer and use
- * them to generate a hash based on the xmit_policy set in the bonding device
+/* Generate hash based on xmit policy. If @skb is given it is used to linearize
+ * the data as required, but this function can be used without it.
  */
-u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
+static u32 __bond_xmit_hash(struct bonding *bond,
+			    struct sk_buff *skb,
+			    const void *data,
+			    __be16 l2_proto,
+			    int mhoff,
+			    int nhoff,
+			    int hlen)
 {
 	struct flow_keys flow;
 	u32 hash;
 
-	if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
-	    skb->l4_hash)
-		return skb->hash;
-
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC)
-		return bond_vlan_srcmac_hash(skb);
+		return bond_vlan_srcmac_hash(skb, data, mhoff, hlen);
 
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
-	    !bond_flow_dissect(bond, skb, &flow))
-		return bond_eth_hash(skb);
+	    !bond_flow_dissect(bond, skb, data, l2_proto, nhoff, hlen, &flow))
+		return bond_eth_hash(skb, data, mhoff, hlen);
 
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
 	    bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23) {
-		hash = bond_eth_hash(skb);
+		hash = bond_eth_hash(skb, data, mhoff, hlen);
 	} else {
 		if (flow.icmp.id)
 			memcpy(&hash, &flow.icmp, sizeof(hash));
@@ -3638,6 +3708,48 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
 	return bond_ip_hash(hash, &flow);
 }
 
+/**
+ * bond_xmit_hash - generate a hash value based on the xmit policy
+ * @bond: bonding device
+ * @skb: buffer to use for headers
+ *
+ * This function will extract the necessary headers from the skb buffer and use
+ * them to generate a hash based on the xmit_policy set in the bonding device
+ */
+u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
+{
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
+	    skb->l4_hash)
+		return skb->hash;
+
+	return __bond_xmit_hash(bond, skb, skb->head, skb->protocol,
+				skb->mac_header,
+				skb->network_header,
+				skb_headlen(skb));
+}
+
+/**
+ * bond_xmit_hash_xdp - generate a hash value based on the xmit policy
+ * @bond: bonding device
+ * @xdp: buffer to use for headers
+ *
+ * XDP variant of bond_xmit_hash.
+ */
+static u32 bond_xmit_hash_xdp(struct bonding *bond, struct xdp_buff *xdp)
+{
+	struct ethhdr *eth;
+
+	if (xdp->data + sizeof(struct ethhdr) > xdp->data_end)
+		return 0;
+
+	eth = (struct ethhdr *)xdp->data;
+
+	return __bond_xmit_hash(bond, NULL, xdp->data, eth->h_proto,
+				0,
+				sizeof(struct ethhdr),
+				xdp->data_end - xdp->data);
+}
+
 /*-------------------------- Device entry points ----------------------------*/
 
 void bond_work_init_all(struct bonding *bond)
@@ -4254,6 +4366,47 @@ static struct slave *bond_xmit_roundrobin_slave_get(struct bonding *bond,
 	return NULL;
 }
 
+static struct slave *bond_xdp_xmit_roundrobin_slave_get(struct bonding *bond,
+							struct xdp_buff *xdp)
+{
+	struct slave *slave;
+	int slave_cnt;
+	u32 slave_id;
+	const struct ethhdr *eth;
+	void *data = xdp->data;
+
+	if (data + sizeof(struct ethhdr) > xdp->data_end)
+		goto non_igmp;
+
+	eth = (struct ethhdr *)data;
+	data += sizeof(struct ethhdr);
+
+	/* See comment on IGMP in bond_xmit_roundrobin_slave_get() */
+	if (eth->h_proto == htons(ETH_P_IP)) {
+		const struct iphdr *iph;
+
+		if (data + sizeof(struct iphdr) > xdp->data_end)
+			goto non_igmp;
+
+		iph = (struct iphdr *)data;
+
+		if (iph->protocol == IPPROTO_IGMP) {
+			slave = rcu_dereference(bond->curr_active_slave);
+			if (slave)
+				return slave;
+			return bond_get_slave_by_id(bond, 0);
+		}
+	}
+
+non_igmp:
+	slave_cnt = READ_ONCE(bond->slave_cnt);
+	if (likely(slave_cnt)) {
+		slave_id = bond_rr_gen_slave_id(bond) % slave_cnt;
+		return bond_get_slave_by_id(bond, slave_id);
+	}
+	return NULL;
+}
+
 static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
 					struct net_device *bond_dev)
 {
@@ -4267,8 +4420,7 @@ static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
 	return bond_tx_drop(bond_dev, skb);
 }
 
-static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond,
-						      struct sk_buff *skb)
+static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond)
 {
 	return rcu_dereference(bond->curr_active_slave);
 }
@@ -4282,7 +4434,7 @@ static netdev_tx_t bond_xmit_activebackup(struct sk_buff *skb,
 	struct bonding *bond = netdev_priv(bond_dev);
 	struct slave *slave;
 
-	slave = bond_xmit_activebackup_slave_get(bond, skb);
+	slave = bond_xmit_activebackup_slave_get(bond);
 	if (slave)
 		return bond_dev_queue_xmit(bond, skb, slave->dev);
 
@@ -4470,6 +4622,22 @@ static struct slave *bond_xmit_3ad_xor_slave_get(struct bonding *bond,
 	return slave;
 }
 
+static struct slave *bond_xdp_xmit_3ad_xor_slave_get(struct bonding *bond,
+						     struct xdp_buff *xdp)
+{
+	struct bond_up_slave *slaves;
+	unsigned int count;
+	u32 hash;
+
+	hash = bond_xmit_hash_xdp(bond, xdp);
+	slaves = rcu_dereference(bond->usable_slaves);
+	count = slaves ? READ_ONCE(slaves->count) : 0;
+	if (unlikely(!count))
+		return NULL;
+
+	return slaves->arr[hash % count];
+}
+
 /* Use this Xmit function for 3AD as well as XOR modes. The current
  * usable slave array is formed in the control path. The xmit function
  * just calculates hash and sends the packet out.
@@ -4580,7 +4748,7 @@ static struct net_device *bond_xmit_get_slave(struct net_device *master_dev,
 		slave = bond_xmit_roundrobin_slave_get(bond, skb);
 		break;
 	case BOND_MODE_ACTIVEBACKUP:
-		slave = bond_xmit_activebackup_slave_get(bond, skb);
+		slave = bond_xmit_activebackup_slave_get(bond);
 		break;
 	case BOND_MODE_8023AD:
 	case BOND_MODE_XOR:
@@ -4754,6 +4922,164 @@ static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return ret;
 }
 
+struct net_device *
+bond_xdp_get_xmit_slave(struct net_device *bond_dev, struct xdp_buff *xdp)
+{
+	struct bonding *bond = netdev_priv(bond_dev);
+	struct slave *slave;
+
+	/* Caller needs to hold rcu_read_lock() */
+
+	switch (BOND_MODE(bond)) {
+	case BOND_MODE_ROUNDROBIN:
+		slave = bond_xdp_xmit_roundrobin_slave_get(bond, xdp);
+		break;
+
+	case BOND_MODE_ACTIVEBACKUP:
+		slave = bond_xmit_activebackup_slave_get(bond);
+		break;
+
+	case BOND_MODE_8023AD:
+	case BOND_MODE_XOR:
+		slave = bond_xdp_xmit_3ad_xor_slave_get(bond, xdp);
+		break;
+
+	default:
+		/* Should never happen. Mode guarded by bond_xdp_check() */
+		netdev_err(bond_dev, "Unknown bonding mode %d for xdp xmit\n", BOND_MODE(bond));
+		WARN_ON_ONCE(1);
+		return NULL;
+	}
+
+	if (slave)
+		return slave->dev;
+
+	return NULL;
+}
+
+static int bond_xdp_xmit(struct net_device *bond_dev,
+			 int n, struct xdp_frame **frames, u32 flags)
+{
+	int nxmit, err = -ENXIO;
+
+	rcu_read_lock();
+
+	for (nxmit = 0; nxmit < n; nxmit++) {
+		struct xdp_frame *frame = frames[nxmit];
+		struct xdp_frame *frames1[] = {frame};
+		struct net_device *slave_dev;
+		struct xdp_buff xdp;
+
+		xdp_convert_frame_to_buff(frame, &xdp);
+
+		slave_dev = bond_xdp_get_xmit_slave(bond_dev, &xdp);
+		if (!slave_dev) {
+			err = -ENXIO;
+			break;
+		}
+
+		err = slave_dev->netdev_ops->ndo_xdp_xmit(slave_dev, 1, frames1, flags);
+		if (err < 1)
+			break;
+	}
+
+	rcu_read_unlock();
+
+	/* If error happened on the first frame then we can pass the error up, otherwise
+	 * report the number of frames that were xmitted.
+	 */
+	if (err < 0)
+		return (nxmit == 0 ? err : nxmit);
+
+	return nxmit;
+}
+
+static int bond_xdp_set(struct net_device *dev, struct bpf_prog *prog,
+			struct netlink_ext_ack *extack)
+{
+	struct bonding *bond = netdev_priv(dev);
+	struct list_head *iter;
+	struct slave *slave, *rollback_slave;
+	struct bpf_prog *old_prog;
+	struct netdev_bpf xdp = {
+		.command = XDP_SETUP_PROG,
+		.flags   = 0,
+		.prog    = prog,
+		.extack  = extack,
+	};
+	int err;
+
+	ASSERT_RTNL();
+
+	if (!bond_xdp_check(bond))
+		return -EOPNOTSUPP;
+
+	old_prog = bond->xdp_prog;
+	bond->xdp_prog = prog;
+
+	bond_for_each_slave(bond, slave, iter) {
+		struct net_device *slave_dev = slave->dev;
+
+		if (!slave_dev->netdev_ops->ndo_bpf ||
+		    !slave_dev->netdev_ops->ndo_xdp_xmit) {
+			NL_SET_ERR_MSG(extack, "Slave device does not support XDP");
+			slave_err(dev, slave_dev, "Slave does not support XDP\n");
+			err = -EOPNOTSUPP;
+			goto err;
+		}
+		err = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (err < 0) {
+			/* ndo_bpf() sets extack error message */
+			slave_err(dev, slave_dev, "Error %d calling ndo_bpf\n", err);
+			goto err;
+		}
+		if (prog)
+			bpf_prog_inc(prog);
+	}
+
+	if (old_prog)
+		bpf_prog_put(old_prog);
+
+	if (prog)
+		static_branch_inc(&bpf_bond_redirect_enabled_key);
+	else
+		static_branch_dec(&bpf_bond_redirect_enabled_key);
+
+	return 0;
+
+err:
+	/* unwind the program changes */
+	bond->xdp_prog = old_prog;
+	xdp.prog = old_prog;
+	xdp.extack = NULL; /* do not overwrite original error */
+
+	bond_for_each_slave(bond, rollback_slave, iter) {
+		struct net_device *slave_dev = rollback_slave->dev;
+		int err_unwind;
+
+		if (slave == rollback_slave)
+			break;
+
+		err_unwind = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (err_unwind < 0)
+			slave_err(dev, slave_dev,
+				  "Error %d when unwinding XDP program change\n", err_unwind);
+		else if (xdp.prog)
+			bpf_prog_inc(xdp.prog);
+	}
+	return err;
+}
+
+static int bond_xdp(struct net_device *dev, struct netdev_bpf *xdp)
+{
+	switch (xdp->command) {
+	case XDP_SETUP_PROG:
+		return bond_xdp_set(dev, xdp->prog, xdp->extack);
+	default:
+		return -EINVAL;
+	}
+}
+
 static u32 bond_mode_bcast_speed(struct slave *slave, u32 speed)
 {
 	if (speed == 0 || speed == SPEED_UNKNOWN)
@@ -4840,6 +5166,9 @@ static const struct net_device_ops bond_netdev_ops = {
 	.ndo_features_check	= passthru_features_check,
 	.ndo_get_xmit_slave	= bond_xmit_get_slave,
 	.ndo_sk_get_lower_dev	= bond_sk_get_lower_dev,
+	.ndo_bpf		= bond_xdp,
+	.ndo_xdp_xmit           = bond_xdp_xmit,
+	.ndo_xdp_get_xmit_slave = bond_xdp_get_xmit_slave,
 };
 
 static const struct device_type bond_type = {
diff --git a/include/linux/filter.h b/include/linux/filter.h
index c5ad7df029ed..57c166089456 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -760,6 +760,10 @@ static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
 
 DECLARE_BPF_DISPATCHER(xdp)
 
+DECLARE_STATIC_KEY_FALSE(bpf_bond_redirect_enabled_key);
+
+u32 xdp_bond_redirect(struct xdp_buff *xdp);
+
 static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 					    struct xdp_buff *xdp)
 {
@@ -769,7 +773,14 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 	 * already takes rcu_read_lock() when fetching the program, so
 	 * it's not necessary here anymore.
 	 */
-	return __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
+	u32 act = __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
+
+	if (static_branch_unlikely(&bpf_bond_redirect_enabled_key)) {
+		if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev))
+			act = xdp_bond_redirect(xdp);
+	}
+
+	return act;
 }
 
 void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5cbc950b34df..1a6cc6356498 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1321,6 +1321,9 @@ struct netdev_net_notifier {
  *	that got dropped are freed/returned via xdp_return_frame().
  *	Returns negative number, means general error invoking ndo, meaning
  *	no frames were xmit'ed and core-caller will free all frames.
+ * struct net_device *(*ndo_xdp_get_xmit_slave)(struct net_device *dev,
+ *					        struct xdp_buff *xdp);
+ *      Get the xmit slave of master device based on the xdp_buff.
  * int (*ndo_xsk_wakeup)(struct net_device *dev, u32 queue_id, u32 flags);
  *      This function is used to wake up the softirq, ksoftirqd or kthread
  *	responsible for sending and/or receiving packets on a specific
@@ -1539,6 +1542,8 @@ struct net_device_ops {
 	int			(*ndo_xdp_xmit)(struct net_device *dev, int n,
 						struct xdp_frame **xdp,
 						u32 flags);
+	struct net_device *	(*ndo_xdp_get_xmit_slave)(struct net_device *dev,
+							  struct xdp_buff *xdp);
 	int			(*ndo_xsk_wakeup)(struct net_device *dev,
 						  u32 queue_id, u32 flags);
 	struct devlink_port *	(*ndo_get_devlink_port)(struct net_device *dev);
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 019e998d944a..34acb81b4234 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -251,6 +251,7 @@ struct bonding {
 #ifdef CONFIG_XFRM_OFFLOAD
 	struct xfrm_state *xs;
 #endif /* CONFIG_XFRM_OFFLOAD */
+	struct bpf_prog *xdp_prog;
 };
 
 #define bond_slave_get_rcu(dev) \
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 2a75e6c2d27d..2caff5714f4d 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -514,9 +514,11 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 }
 
 static bool is_valid_dst(struct bpf_dtab_netdev *obj, struct xdp_buff *xdp,
-			 int exclude_ifindex)
+			 int exclude_ifindex, int exclude_ifindex_master)
 {
-	if (!obj || obj->dev->ifindex == exclude_ifindex ||
+	if (!obj ||
+	    obj->dev->ifindex == exclude_ifindex ||
+	    obj->dev->ifindex == exclude_ifindex_master ||
 	    !obj->dev->netdev_ops->ndo_xdp_xmit)
 		return false;
 
@@ -546,12 +548,19 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
 	int exclude_ifindex = exclude_ingress ? dev_rx->ifindex : 0;
+	int exclude_ifindex_master = 0;
 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
 	struct hlist_head *head;
 	struct xdp_frame *xdpf;
 	unsigned int i;
 	int err;
 
+	if (static_branch_unlikely(&bpf_bond_redirect_enabled_key)) {
+		struct net_device *master = netdev_master_upper_dev_get_rcu(dev_rx);
+
+		exclude_ifindex_master = (master && exclude_ingress) ? master->ifindex : 0;
+	}
+
 	xdpf = xdp_convert_buff_to_frame(xdp);
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
@@ -559,7 +568,7 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
 		for (i = 0; i < map->max_entries; i++) {
 			dst = READ_ONCE(dtab->netdev_map[i]);
-			if (!is_valid_dst(dst, xdp, exclude_ifindex))
+			if (!is_valid_dst(dst, xdp, exclude_ifindex, exclude_ifindex_master))
 				continue;
 
 			/* we only need n-1 clones; last_dst enqueued below */
@@ -579,7 +588,9 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 			head = dev_map_index_hash(dtab, i);
 			hlist_for_each_entry_rcu(dst, head, index_hlist,
 						 lockdep_is_held(&dtab->index_lock)) {
-				if (!is_valid_dst(dst, xdp, exclude_ifindex))
+				if (!is_valid_dst(dst, xdp,
+						  exclude_ifindex,
+						  exclude_ifindex_master))
 					continue;
 
 				/* we only need n-1 clones; last_dst enqueued below */
@@ -646,16 +657,25 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
 	int exclude_ifindex = exclude_ingress ? dev->ifindex : 0;
+	int exclude_ifindex_master = 0;
 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
 	struct hlist_head *head;
 	struct hlist_node *next;
 	unsigned int i;
 	int err;
 
+	if (static_branch_unlikely(&bpf_bond_redirect_enabled_key)) {
+		struct net_device *master = netdev_master_upper_dev_get_rcu(dev);
+
+		exclude_ifindex_master = (master && exclude_ingress) ? master->ifindex : 0;
+	}
+
 	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
 		for (i = 0; i < map->max_entries; i++) {
 			dst = READ_ONCE(dtab->netdev_map[i]);
-			if (!dst || dst->dev->ifindex == exclude_ifindex)
+			if (!dst ||
+			    dst->dev->ifindex == exclude_ifindex ||
+			    dst->dev->ifindex == exclude_ifindex_master)
 				continue;
 
 			/* we only need n-1 clones; last_dst enqueued below */
@@ -674,7 +694,9 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
 		for (i = 0; i < dtab->n_buckets; i++) {
 			head = dev_map_index_hash(dtab, i);
 			hlist_for_each_entry_safe(dst, next, head, index_hlist) {
-				if (!dst || dst->dev->ifindex == exclude_ifindex)
+				if (!dst ||
+				    dst->dev->ifindex == exclude_ifindex ||
+				    dst->dev->ifindex == exclude_ifindex_master)
 					continue;
 
 				/* we only need n-1 clones; last_dst enqueued below */
diff --git a/net/core/filter.c b/net/core/filter.c
index caa88955562e..5d268eb980e7 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2469,6 +2469,7 @@ int skb_do_redirect(struct sk_buff *skb)
 	ri->flags = 0;
 	if (unlikely(!dev))
 		goto out_drop;
+
 	if (flags & BPF_F_PEER) {
 		const struct net_device_ops *ops = dev->netdev_ops;
 
@@ -3947,6 +3948,40 @@ void bpf_clear_redirect_map(struct bpf_map *map)
 	}
 }
 
+DEFINE_STATIC_KEY_FALSE(bpf_bond_redirect_enabled_key);
+EXPORT_SYMBOL_GPL(bpf_bond_redirect_enabled_key);
+INDIRECT_CALLABLE_DECLARE(struct net_device *
+	bond_xdp_get_xmit_slave(struct net_device *bond_dev, struct xdp_buff *xdp));
+
+u32 xdp_bond_redirect(struct xdp_buff *xdp)
+{
+	struct net_device *master, *slave;
+	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+
+	master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev);
+
+#if IS_BUILTIN(CONFIG_BONDING)
+	slave = INDIRECT_CALL_1(master->netdev_ops->ndo_xdp_get_xmit_slave,
+				bond_xdp_get_xmit_slave,
+				master, xdp);
+#else
+	slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp);
+#endif
+	if (slave && slave != xdp->rxq->dev) {
+		/* The target device is different from the receiving device, so
+		 * redirect it to the new device.
+		 * Using XDP_REDIRECT gets the correct behaviour from XDP enabled
+		 * drivers to unmap the packet from their rx ring.
+		 */
+		ri->tgt_index = slave->ifindex;
+		ri->map_id = INT_MAX;
+		ri->map_type = BPF_MAP_TYPE_UNSPEC;
+		return XDP_REDIRECT;
+	}
+	return XDP_TX;
+}
+EXPORT_SYMBOL_GPL(xdp_bond_redirect);
+
 int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 		    struct bpf_prog *xdp_prog)
 {
@@ -4466,7 +4501,7 @@ static const struct bpf_func_proto bpf_skb_cgroup_id_proto = {
 };
 
 static inline u64 __bpf_sk_ancestor_cgroup_id(struct sock *sk,
-					      int ancestor_level)
+					     int ancestor_level)
 {
 	struct cgroup *ancestor;
 	struct cgroup *cgrp;
-- 
2.30.2
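
For readers following the filter.c and devmap.c hunks above, here is a
minimal userspace sketch of the two decisions they add. It is a model under
stated assumptions, not the kernel code: struct dev_model, enum xdp_verdict,
bond_tx_verdict() and dst_excluded() are hypothetical names, and the NULL
destination and ndo_xdp_xmit checks from is_valid_dst() are left out. In the
patch the master lookup only runs when bpf_bond_redirect_enabled_key is
enabled; the model always performs it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only ifindex and the
 * master link matter for this illustration.
 */
enum xdp_verdict { VERDICT_TX, VERDICT_REDIRECT };

struct dev_model {
	int ifindex;
	const struct dev_model *master;	/* NULL when not enslaved */
};

/* Models the tail of xdp_bond_redirect(): redirect only when the slave
 * chosen by the bond differs from the device the frame arrived on.
 */
enum xdp_verdict bond_tx_verdict(const struct dev_model *rx,
				 const struct dev_model *slave)
{
	if (slave && slave != rx)
		return VERDICT_REDIRECT;
	return VERDICT_TX;
}

/* Models the extended exclusion test: with exclude_ingress set, both the
 * receiving device and its master are skipped. An ifindex of 0 never
 * matches a real device, so it serves as "nothing to exclude".
 */
bool dst_excluded(const struct dev_model *dst, const struct dev_model *rx,
		  bool exclude_ingress)
{
	int exclude_ifindex_master =
		(rx->master && exclude_ingress) ? rx->master->ifindex : 0;
	int exclude_ifindex = exclude_ingress ? rx->ifindex : 0;

	return dst->ifindex == exclude_ifindex ||
	       dst->ifindex == exclude_ifindex_master;
}
```

With a bond at ifindex 10 and slaves at 2 and 3, a frame received on slave 2
with exclude_ingress set would skip both ifindex 2 and ifindex 10 in this
model, while slave 3 stays a valid destination.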



* [PATCH bpf-next 2/3] net: bonding: Use per-cpu rr_tx_counter
  2021-06-09 13:55 [PATCH bpf-next 0/3] XDP bonding support Jussi Maki
  2021-06-09 13:55 ` [PATCH bpf-next 1/3] net: bonding: Add XDP support to the bonding driver Jussi Maki
@ 2021-06-09 13:55 ` Jussi Maki
  2021-06-10  0:04   ` Jay Vosburgh
  2021-06-09 13:55 ` [PATCH bpf-next 3/3] selftests/bpf: Add tests for XDP bonding Jussi Maki
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 71+ messages in thread
From: Jussi Maki @ 2021-06-09 13:55 UTC (permalink / raw)
  To: bpf; +Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii, Jussi Maki

The round-robin rr_tx_counter was shared across CPUs, leading
to significant cache thrashing at high packet rates. This patch
switches the round-robin mechanism to use a per-cpu counter to
decide the destination device.

On a 100Gbit 64 byte packet test this reduces the CPU load from
50% to 10% on the test system.

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 drivers/net/bonding/bond_main.c | 18 +++++++++++++++---
 include/net/bonding.h           |  2 +-
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 38eea7e096f3..917dd2cdcbf4 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4314,16 +4314,16 @@ static u32 bond_rr_gen_slave_id(struct bonding *bond)
 		slave_id = prandom_u32();
 		break;
 	case 1:
-		slave_id = bond->rr_tx_counter;
+		slave_id = this_cpu_inc_return(*bond->rr_tx_counter);
 		break;
 	default:
 		reciprocal_packets_per_slave =
 			bond->params.reciprocal_packets_per_slave;
-		slave_id = reciprocal_divide(bond->rr_tx_counter,
+		slave_id = this_cpu_inc_return(*bond->rr_tx_counter);
+		slave_id = reciprocal_divide(slave_id,
 					     reciprocal_packets_per_slave);
 		break;
 	}
-	bond->rr_tx_counter++;
 
 	return slave_id;
 }
@@ -5278,6 +5278,9 @@ static void bond_uninit(struct net_device *bond_dev)
 
 	list_del(&bond->bond_list);
 
+	if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN)
+		free_percpu(bond->rr_tx_counter);
+
 	bond_debug_unregister(bond);
 }
 
@@ -5681,6 +5684,15 @@ static int bond_init(struct net_device *bond_dev)
 	if (!bond->wq)
 		return -ENOMEM;
 
+	if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN) {
+		bond->rr_tx_counter = alloc_percpu(u32);
+		if (!bond->rr_tx_counter) {
+			destroy_workqueue(bond->wq);
+			bond->wq = NULL;
+			return -ENOMEM;
+		}
+	}
+
 	spin_lock_init(&bond->stats_lock);
 	netdev_lockdep_set_classes(bond_dev);
 
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 34acb81b4234..8de8180f1be8 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -232,7 +232,7 @@ struct bonding {
 	char     proc_file_name[IFNAMSIZ];
 #endif /* CONFIG_PROC_FS */
 	struct   list_head bond_list;
-	u32      rr_tx_counter;
+	u32 __percpu *rr_tx_counter;
 	struct   ad_bond_info ad_info;
 	struct   alb_bond_info alb_info;
 	struct   bond_params params;
-- 
2.30.2
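
The counter change in bond_rr_gen_slave_id() can be illustrated with a
small userspace model. This is a sketch under assumptions, not the driver
code: struct rr_model, NR_CPUS_MODEL, rr_gen_slave_id() and rr_pick_slave()
are hypothetical names, a plain array stands in for the per-cpu allocation,
an ordinary division stands in for reciprocal_divide(), and the
packets_per_slave == 0 (random) case is not modeled.

```c
#include <stdint.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count for the model */

struct rr_model {
	uint32_t counter[NR_CPUS_MODEL];	/* one counter slot per CPU */
	uint32_t packets_per_slave;		/* assumed >= 1 in this model */
};

/* Increment this CPU's counter and use the new value, in the spirit of
 * this_cpu_inc_return(); no other CPU's slot is touched.
 */
uint32_t rr_gen_slave_id(struct rr_model *rr, unsigned int cpu)
{
	uint32_t id = ++rr->counter[cpu];

	if (rr->packets_per_slave > 1)
		id /= rr->packets_per_slave;
	return id;
}

/* Map the generated id onto one of slave_cnt slaves. */
unsigned int rr_pick_slave(struct rr_model *rr, unsigned int cpu,
			   unsigned int slave_cnt)
{
	return rr_gen_slave_id(rr, cpu) % slave_cnt;
}
```

One consequence visible in the model: each CPU alternates between slaves
independently, so the ordering across CPUs is no longer a strict global
round-robin. That appears to be the trade-off accepted in exchange for not
bouncing a shared cache line between CPUs.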



* [PATCH bpf-next 3/3] selftests/bpf: Add tests for XDP bonding
  2021-06-09 13:55 [PATCH bpf-next 0/3] XDP bonding support Jussi Maki
  2021-06-09 13:55 ` [PATCH bpf-next 1/3] net: bonding: Add XDP support to the bonding driver Jussi Maki
  2021-06-09 13:55 ` [PATCH bpf-next 2/3] net: bonding: Use per-cpu rr_tx_counter Jussi Maki
@ 2021-06-09 13:55 ` Jussi Maki
  2021-06-09 22:07   ` Maciej Fijalkowski
  2021-06-10 17:24 ` [PATCH bpf-next 0/3] XDP bonding support Andrii Nakryiko
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 71+ messages in thread
From: Jussi Maki @ 2021-06-09 13:55 UTC (permalink / raw)
  To: bpf; +Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii, Jussi Maki

Add a test suite to test XDP bonding implementation
over a pair of veth devices.

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 .../selftests/bpf/prog_tests/xdp_bonding.c    | 342 ++++++++++++++++++
 tools/testing/selftests/bpf/vmtest.sh         |  30 +-
 2 files changed, 360 insertions(+), 12 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_bonding.c

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c b/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
new file mode 100644
index 000000000000..fd2b83194127
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
@@ -0,0 +1,342 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/**
+ * Test XDP bonding support
+ *
+ * Sets up two bonded veth pairs between two fresh namespaces
+ * and verifies that XDP_TX programs loaded on a bond device
+ * are correctly loaded onto the slave devices and XDP_TX'd
+ * packets are balanced using bonding.
+ */
+
+#define _GNU_SOURCE
+#include <sched.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <fcntl.h>
+#include <net/if.h>
+#include <test_progs.h>
+#include <network_helpers.h>
+#include <linux/if_bonding.h>
+#include <linux/limits.h>
+#include <linux/if_ether.h>
+#include <linux/udp.h>
+
+#define BOND1_MAC {0x00, 0x11, 0x22, 0x33, 0x44, 0x55}
+#define BOND1_MAC_STR "00:11:22:33:44:55"
+#define BOND2_MAC {0x00, 0x22, 0x33, 0x44, 0x55, 0x66}
+#define BOND2_MAC_STR "00:22:33:44:55:66"
+#define NPACKETS 100
+
+static int root_netns_fd = -1;
+
+static void restore_root_netns(void)
+{
+	ASSERT_OK(setns(root_netns_fd, CLONE_NEWNET), "restore_root_netns");
+}
+
+int setns_by_name(char *name)
+{
+	int nsfd, err;
+	char nspath[PATH_MAX];
+
+	snprintf(nspath, sizeof(nspath), "%s/%s", "/var/run/netns", name);
+	nsfd = open(nspath, O_RDONLY | O_CLOEXEC);
+	if (nsfd < 0)
+		return -1;
+
+	err = setns(nsfd, CLONE_NEWNET);
+	close(nsfd);
+	return err;
+}
+
+static int get_rx_packets(const char *iface)
+{
+	FILE *f;
+	char line[512];
+	int iface_len = strlen(iface);
+
+	f = fopen("/proc/net/dev", "r");
+	if (!f)
+		return -1;
+
+	while (fgets(line, sizeof(line), f)) {
+		char *p = line;
+
+		while (*p == ' ')
+			p++; /* skip whitespace */
+		if (!strncmp(p, iface, iface_len)) {
+			p += iface_len;
+			if (*p++ != ':')
+				continue;
+			while (*p == ' ')
+				p++; /* skip whitespace */
+			while (*p && *p != ' ')
+				p++; /* skip rx bytes */
+			while (*p == ' ')
+				p++; /* skip whitespace */
+			fclose(f);
+			return atoi(p);
+		}
+	}
+	fclose(f);
+	return -1;
+}
+
+enum {
+	BOND_ONE_NO_ATTACH = 0,
+	BOND_BOTH_AND_ATTACH,
+};
+
+static int bonding_setup(int mode, int xmit_policy, int bond_both_attach)
+{
+#define SYS(fmt, ...)						\
+	({							\
+		char cmd[1024];					\
+		snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__);	\
+		if (!ASSERT_OK(system(cmd), cmd))		\
+			return -1;				\
+	})
+
+	SYS("ip netns add ns_dst");
+	SYS("ip link add veth1_1 type veth peer name veth2_1 netns ns_dst");
+	SYS("ip link add veth1_2 type veth peer name veth2_2 netns ns_dst");
+
+	SYS("modprobe -r bonding &> /dev/null");
+	SYS("modprobe bonding mode=%d packets_per_slave=1 xmit_hash_policy=%d", mode, xmit_policy);
+
+	SYS("ip link add bond1 type bond");
+	SYS("ip link set bond1 address " BOND1_MAC_STR);
+	SYS("ip link set bond1 up");
+	SYS("ip -netns ns_dst link add bond2 type bond");
+	SYS("ip -netns ns_dst link set bond2 address " BOND2_MAC_STR);
+	SYS("ip -netns ns_dst link set bond2 up");
+
+	SYS("ip link set veth1_1 master bond1");
+	if (bond_both_attach == BOND_BOTH_AND_ATTACH) {
+		SYS("ip link set veth1_2 master bond1");
+	} else {
+		SYS("ip link set veth1_2 up");
+		SYS("ip link set dev veth1_2 xdpdrv obj xdp_dummy.o sec xdp_dummy");
+	}
+
+	SYS("ip -netns ns_dst link set veth2_1 master bond2");
+
+	if (bond_both_attach == BOND_BOTH_AND_ATTACH)
+		SYS("ip -netns ns_dst link set veth2_2 master bond2");
+	else
+		SYS("ip -netns ns_dst link set veth2_2 up");
+
+	/* Load a dummy program on the sending side, as with veth the peer
+	 * needs to have an XDP program loaded as well.
+	 */
+	SYS("ip link set dev bond1 xdpdrv obj xdp_dummy.o sec xdp_dummy");
+
+	if (bond_both_attach == BOND_BOTH_AND_ATTACH)
+		SYS("ip -netns ns_dst link set dev bond2 xdpdrv obj xdp_tx.o sec tx");
+
+#undef SYS
+	return 0;
+}
+
+static void bonding_cleanup(void)
+{
+	ASSERT_OK(system("ip link delete veth1_1"), "delete veth1_1");
+	ASSERT_OK(system("ip link delete veth1_2"), "delete veth1_2");
+	ASSERT_OK(system("ip netns delete ns_dst"), "delete ns_dst");
+	ASSERT_OK(system("modprobe -r bonding"), "unload bond");
+}
+
+static int send_udp_packets(int vary_dst_ip)
+{
+	int i, s = -1;
+	int ifindex;
+	uint8_t buf[128] = {};
+	struct ethhdr eh = {
+		.h_source = BOND1_MAC,
+		.h_dest = BOND2_MAC,
+		.h_proto = htons(ETH_P_IP),
+	};
+	struct iphdr *iph = (struct iphdr *)(buf + sizeof(eh));
+	struct udphdr *uh = (struct udphdr *)(buf + sizeof(eh) + sizeof(*iph));
+
+	s = socket(AF_PACKET, SOCK_RAW, IPPROTO_RAW);
+	if (!ASSERT_GE(s, 0, "socket"))
+		goto err;
+
+	ifindex = if_nametoindex("bond1");
+	if (!ASSERT_GT(ifindex, 0, "get bond1 ifindex"))
+		goto err;
+
+	memcpy(buf, &eh, sizeof(eh));
+	iph->ihl = 5;
+	iph->version = 4;
+	iph->tos = 16;
+	iph->id = 1;
+	iph->ttl = 64;
+	iph->protocol = IPPROTO_UDP;
+	iph->saddr = 1;
+	iph->daddr = 2;
+	iph->tot_len = htons(sizeof(buf) - ETH_HLEN);
+	iph->check = 0;
+
+	for (i = 1; i <= NPACKETS; i++) {
+		int n;
+		struct sockaddr_ll saddr_ll = {
+			.sll_ifindex = ifindex,
+			.sll_halen = ETH_ALEN,
+			.sll_addr = BOND2_MAC,
+		};
+
+		/* vary the UDP destination port for even distribution with roundrobin/xor modes */
+		uh->dest++;
+
+		if (vary_dst_ip)
+			iph->daddr++;
+
+		n = sendto(s, buf, sizeof(buf), 0, (struct sockaddr *)&saddr_ll, sizeof(saddr_ll));
+		if (!ASSERT_EQ(n, sizeof(buf), "sendto"))
+			goto err;
+	}
+	close(s);
+	return 0;
+
+err:
+	if (s >= 0)
+		close(s);
+	return -1;
+}
+
+void test_xdp_bonding_with_mode(char *name, int mode, int xmit_policy)
+{
+	int bond1_rx;
+
+	if (!test__start_subtest(name))
+		return;
+
+	if (bonding_setup(mode, xmit_policy, BOND_BOTH_AND_ATTACH))
+		return;
+
+	if (send_udp_packets(xmit_policy != BOND_XMIT_POLICY_LAYER34))
+		return;
+
+	bond1_rx = get_rx_packets("bond1");
+	ASSERT_TRUE(
+		bond1_rx >= NPACKETS,
+		"expected more received packets");
+
+	switch (mode) {
+	case BOND_MODE_ROUNDROBIN:
+	case BOND_MODE_XOR: {
+		int veth1_rx = get_rx_packets("veth1_1");
+		int veth2_rx = get_rx_packets("veth1_2");
+		int diff = abs(veth1_rx - veth2_rx);
+
+		ASSERT_GE(veth1_rx + veth2_rx, NPACKETS, "expected more packets");
+
+		switch (xmit_policy) {
+		case BOND_XMIT_POLICY_LAYER2:
+			ASSERT_GE(diff, NPACKETS/2,
+				  "expected packets on only one of the interfaces");
+			break;
+		case BOND_XMIT_POLICY_LAYER23:
+		case BOND_XMIT_POLICY_LAYER34:
+			ASSERT_LT(diff, NPACKETS/2,
+				  "expected even distribution of packets");
+			break;
+		default:
+			abort();
+		}
+		break;
+	}
+	default:
+		break;
+	}
+
+	bonding_cleanup();
+}
+
+void test_xdp_bonding_redirect_multi(void)
+{
+	static const char * const ifaces[] = {"bond2", "veth2_1", "veth2_2"};
+	int veth1_rx, veth2_rx;
+	int err;
+
+	if (!test__start_subtest("xdp_bonding_redirect_multi"))
+		return;
+
+	if (bonding_setup(BOND_MODE_ROUNDROBIN, BOND_XMIT_POLICY_LAYER23, BOND_ONE_NO_ATTACH))
+		goto out;
+
+	err = system("ip -netns ns_dst link set dev bond2 xdpdrv "
+		     "obj xdp_redirect_multi_kern.o sec xdp_redirect_map_multi");
+	if (!ASSERT_OK(err, "link set xdpdrv"))
+		goto out;
+
+	/* populate the redirection devmap with the relevant interfaces */
+	if (!ASSERT_OK(setns_by_name("ns_dst"), "could not set netns to ns_dst"))
+		goto out;
+
+	for (int i = 0; i < ARRAY_SIZE(ifaces); i++) {
+		char cmd[512];
+		int ifindex = if_nametoindex(ifaces[i]);
+
+		if (!ASSERT_GT(ifindex, 0, "could not get interface index"))
+			goto out;
+
+		snprintf(cmd, sizeof(cmd),
+			 "ip netns exec ns_dst bpftool map update name map_all key %d 0 0 0 value %d 0 0 0",
+			 i, ifindex);
+
+		if (!ASSERT_OK(system(cmd), "bpftool map update"))
+			goto out;
+	}
+	restore_root_netns();
+
+	send_udp_packets(BOND_MODE_ROUNDROBIN);
+
+	veth1_rx = get_rx_packets("veth1_1");
+	veth2_rx = get_rx_packets("veth1_2");
+
+	ASSERT_LT(veth1_rx, NPACKETS/2, "expected few packets on veth1");
+	ASSERT_GE(veth2_rx, NPACKETS, "expected more packets on veth2");
+out:
+	restore_root_netns();
+	bonding_cleanup();
+}
+
+struct bond_test_case {
+	char *name;
+	int mode;
+	int xmit_policy;
+};
+
+static	struct bond_test_case bond_test_cases[] = {
+	{ "xdp_bonding_roundrobin", BOND_MODE_ROUNDROBIN, BOND_XMIT_POLICY_LAYER23, },
+	{ "xdp_bonding_activebackup", BOND_MODE_ACTIVEBACKUP, BOND_XMIT_POLICY_LAYER23 },
+
+	{ "xdp_bonding_xor_layer2", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER2, },
+	{ "xdp_bonding_xor_layer23", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER23, },
+	{ "xdp_bonding_xor_layer34", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER34, },
+};
+
+void test_xdp_bonding(void)
+{
+	int i;
+
+	root_netns_fd = open("/proc/self/ns/net", O_RDONLY);
+	if (!ASSERT_GE(root_netns_fd, 0, "open /proc/self/ns/net"))
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(bond_test_cases); i++) {
+		struct bond_test_case *test_case = &bond_test_cases[i];
+
+		test_xdp_bonding_with_mode(
+			test_case->name,
+			test_case->mode,
+			test_case->xmit_policy);
+	}
+
+	test_xdp_bonding_redirect_multi();
+}
diff --git a/tools/testing/selftests/bpf/vmtest.sh b/tools/testing/selftests/bpf/vmtest.sh
index 8889b3f55236..68818780e072 100755
--- a/tools/testing/selftests/bpf/vmtest.sh
+++ b/tools/testing/selftests/bpf/vmtest.sh
@@ -106,17 +106,6 @@ download_rootfs()
 		zstd -d | sudo tar -C "$dir" -x
 }
 
-recompile_kernel()
-{
-	local kernel_checkout="$1"
-	local make_command="$2"
-
-	cd "${kernel_checkout}"
-
-	${make_command} olddefconfig
-	${make_command}
-}
-
 mount_image()
 {
 	local rootfs_img="${OUTPUT_DIR}/${ROOTFS_IMAGE}"
@@ -132,6 +121,23 @@ unmount_image()
 	sudo umount "${mount_dir}" &> /dev/null
 }
 
+recompile_kernel()
+{
+	local kernel_checkout="$1"
+	local make_command="$2"
+	local kernel_config="$3"
+
+	cd "${kernel_checkout}"
+
+	${make_command} olddefconfig
+	scripts/config --file ${kernel_config} --module CONFIG_BONDING
+	${make_command}
+	${make_command} modules
+	mount_image
+	sudo ${make_command} INSTALL_MOD_PATH=${OUTPUT_DIR}/${MOUNT_DIR} modules_install
+	unmount_image
+}
+
 update_selftests()
 {
 	local kernel_checkout="$1"
@@ -358,7 +364,7 @@ main()
 	mkdir -p "${mount_dir}"
 	update_kconfig "${kconfig_file}"
 
-	recompile_kernel "${kernel_checkout}" "${make_command}"
+	recompile_kernel "${kernel_checkout}" "${make_command}" "${kconfig_file}"
 
 	if [[ "${update_image}" == "no" && ! -f "${rootfs_img}" ]]; then
 		echo "rootfs image not found in ${rootfs_img}"
-- 
2.30.2
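
The counter parsing done by get_rx_packets() in the test above can be
shown on a single line of /proc/net/dev. The following is a sketch only:
parse_rx_packets() is a hypothetical helper, not part of the patch, and it
takes the line as a string so that it can be tried without a live
interface. It assumes the usual layout in which the interface name is
followed by a colon, the rx bytes column and then the rx packets column.

```c
#include <stdlib.h>
#include <string.h>

/* Return the rx packet count from one /proc/net/dev line, or -1 if the
 * line does not describe iface.
 */
long parse_rx_packets(const char *line, const char *iface)
{
	size_t iface_len = strlen(iface);
	const char *p = line;

	while (*p == ' ')
		p++;		/* skip leading padding */
	if (strncmp(p, iface, iface_len) != 0 || p[iface_len] != ':')
		return -1;	/* other interface, or only a name prefix */
	p += iface_len + 1;
	while (*p == ' ')
		p++;		/* skip padding before rx bytes */
	while (*p && *p != ' ')
		p++;		/* skip the rx bytes column */
	while (*p == ' ')
		p++;		/* skip padding before rx packets */
	return strtol(p, NULL, 10);
}
```

Requiring the colon right after the name keeps "veth1" from matching a line
for "veth1_1", which is the same purpose the ':' check serves in the test.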



* Re: [PATCH bpf-next 3/3] selftests/bpf: Add tests for XDP bonding
  2021-06-09 13:55 ` [PATCH bpf-next 3/3] selftests/bpf: Add tests for XDP bonding Jussi Maki
@ 2021-06-09 22:07   ` Maciej Fijalkowski
  2021-06-14  8:08     ` Jussi Maki
  0 siblings, 1 reply; 71+ messages in thread
From: Maciej Fijalkowski @ 2021-06-09 22:07 UTC (permalink / raw)
  To: Jussi Maki
  Cc: bpf, netdev, daniel, j.vosburgh, andy, vfalico, andrii, magnus.karlsson

On Wed, Jun 09, 2021 at 01:55:37PM +0000, Jussi Maki wrote:
> Add a test suite to test XDP bonding implementation
> over a pair of veth devices.

Cc: Magnus

Jussi,
The AF_XDP selftests have functionality very similar to what you are
introducing here, e.g. we set up a veth pair and generate traffic.
After a quick look, it seems that we could have a generic layer that
would be used by both the AF_XDP and bonding selftests.

WDYT?

> 
> Signed-off-by: Jussi Maki <joamaki@gmail.com>
> ---
>  .../selftests/bpf/prog_tests/xdp_bonding.c    | 342 ++++++++++++++++++
>  tools/testing/selftests/bpf/vmtest.sh         |  30 +-
>  2 files changed, 360 insertions(+), 12 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_bonding.c


* Re: [PATCH bpf-next 1/3] net: bonding: Add XDP support to the bonding driver
  2021-06-09 13:55 ` [PATCH bpf-next 1/3] net: bonding: Add XDP support to the bonding driver Jussi Maki
@ 2021-06-09 22:29   ` Maciej Fijalkowski
  2021-06-09 23:29   ` Jay Vosburgh
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 71+ messages in thread
From: Maciej Fijalkowski @ 2021-06-09 22:29 UTC (permalink / raw)
  To: Jussi Maki; +Cc: bpf, netdev, daniel, j.vosburgh, andy, vfalico, andrii

On Wed, Jun 09, 2021 at 01:55:35PM +0000, Jussi Maki wrote:
> XDP is implemented in the bonding driver by transparently delegating
> the XDP program loading, removal and xmit operations to the bonding
> slave devices. The overall goal of this work is that XDP programs
> can be attached to a bond device *without* any further changes (or
> awareness) necessary to the program itself, meaning the same XDP
> program can be attached to a native device but also a bonding device.
> 
> Semantics of XDP_TX when attached to a bond are equivalent in such
> setting to the case when a tc/BPF program would be attached to the
> bond, meaning transmitting the packet out of the bond itself using one
> of the bond's configured xmit methods to select a slave device (rather
> than XDP_TX on the slave itself). Handling of XDP_TX to transmit
> using the configured bonding mechanism is therefore implemented by
> rewriting the BPF program return value in bpf_prog_run_xdp. To avoid
> performance impact this check is guarded by a static key, which is
> incremented when a XDP program is loaded onto a bond device. This
> approach was chosen to avoid changes to drivers implementing XDP. If
> the slave device does not match the receive device, then XDP_REDIRECT
> is transparently used to perform the redirection in order to have
> the network driver release the packet from its RX ring.  The bonding
> driver hashing functions have been refactored to allow reuse with
> xdp_buff's to avoid code duplication.
> 
> The motivation for this change is to enable use of bonding (and
> 802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
> XDP and also to transparently support bond devices for projects that
> use XDP given most modern NICs have dual port adapters.  An alternative
> to this approach would be to implement 802.3ad in user-space and
> implement the bonding load-balancing in the XDP program itself, but
> is rather a cumbersome endeavor in terms of slave device management
> (e.g. by watching netlink) and requires separate programs for native
> vs bond cases for the orchestrator. A native in-kernel implementation
> overcomes these issues and provides more flexibility.
> 
> Below are benchmark results done on two machines with 100Gbit
> Intel E810 (ice) NIC and with 32-core 3970X on sending machine, and
> 16-core 3950X on receiving machine. 64 byte packets were sent with
> pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
> ice driver, so the tests were performed with iommu=off and patch [2]
> applied. Additionally the bonding round robin algorithm was modified
> to use per-cpu tx counters as high CPU load (50% vs 10%) and high rate
> of cache misses were caused by the shared rr_tx_counter (see patch
> 2/3). The statistics were collected using "sar -n dev -u 1 10".
> 
>  -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
>  without patch (1 dev):
>    XDP_DROP:              3.15%      48.6Mpps
>    XDP_TX:                3.12%      18.3Mpps     18.3Mpps
>    XDP_DROP (RSS):        9.47%      116.5Mpps
>    XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
>  -----------------------
>  with patch, bond (1 dev):
>    XDP_DROP:              3.14%      46.7Mpps
>    XDP_TX:                3.15%      13.9Mpps     13.9Mpps
>    XDP_DROP (RSS):        10.33%     117.2Mpps
>    XDP_TX (RSS):          10.64%     25.1Mpps     24.0Mpps
>  -----------------------
>  with patch, bond (2 devs):
>    XDP_DROP:              6.27%      92.7Mpps
>    XDP_TX:                6.26%      17.6Mpps     17.5Mpps
>    XDP_DROP (RSS):       11.38%      117.2Mpps
>    XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
>  --------------------------------------------------------------
> 
> RSS: Receive Side Scaling, e.g. the packets were sent to a range of
> destination IPs.
> 
> [1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
> [2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
> [3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/
> 
> Signed-off-by: Jussi Maki <joamaki@gmail.com>
> ---
>  drivers/net/bonding/bond_main.c | 441 ++++++++++++++++++++++++++++----
>  include/linux/filter.h          |  13 +-
>  include/linux/netdevice.h       |   5 +
>  include/net/bonding.h           |   1 +
>  kernel/bpf/devmap.c             |  34 ++-
>  net/core/filter.c               |  37 ++-
>  6 files changed, 467 insertions(+), 64 deletions(-)
> 

Could this patch be broken down into smaller chunks that would be easier
to review? Also please apply the Reverse Christmas Tree rule.
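
For reference, the Reverse Christmas Tree rule is the netdev convention of
ordering local variable declarations from the longest line to the shortest.
A minimal sketch of the layout follows; count_matching_prefix() is a
hypothetical helper written only to show the ordering and is not taken from
the patch.

```c
#include <stddef.h>
#include <string.h>

/* Count how many leading characters of iface match the start of line,
 * ignoring leading spaces in line. The declarations below are ordered
 * longest line first, shortest last.
 */
size_t count_matching_prefix(const char *iface, const char *line)
{
	size_t iface_len = strlen(iface);
	const char *p = line;
	size_t n = 0;

	while (*p == ' ')
		p++;
	while (n < iface_len && p[n] == iface[n])
		n++;
	return n;
}
```

Applied to the patch, this would mean, for example, declaring ri before
master and slave in xdp_bond_redirect(), since the ri line is the longer
one.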



* Re: [PATCH bpf-next 1/3] net: bonding: Add XDP support to the bonding driver
  2021-06-09 13:55 ` [PATCH bpf-next 1/3] net: bonding: Add XDP support to the bonding driver Jussi Maki
  2021-06-09 22:29   ` Maciej Fijalkowski
@ 2021-06-09 23:29   ` Jay Vosburgh
  2021-06-14  8:02     ` Jussi Maki
  2021-06-17  3:40   ` kernel test robot
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 71+ messages in thread
From: Jay Vosburgh @ 2021-06-09 23:29 UTC (permalink / raw)
  To: Jussi Maki; +Cc: bpf, netdev, daniel, andy, vfalico, andrii

Jussi Maki <joamaki@gmail.com> wrote:

>XDP is implemented in the bonding driver by transparently delegating
>the XDP program loading, removal and xmit operations to the bonding
>slave devices. The overall goal of this work is that XDP programs
>can be attached to a bond device *without* any further changes (or
>awareness) necessary to the program itself, meaning the same XDP
>program can be attached to a native device but also a bonding device.
>
>Semantics of XDP_TX when attached to a bond are equivalent in such
>setting to the case when a tc/BPF program would be attached to the
>bond, meaning transmitting the packet out of the bond itself using one
>of the bond's configured xmit methods to select a slave device (rather
>than XDP_TX on the slave itself). Handling of XDP_TX to transmit
>using the configured bonding mechanism is therefore implemented by
>rewriting the BPF program return value in bpf_prog_run_xdp. To avoid
>performance impact this check is guarded by a static key, which is
>incremented when a XDP program is loaded onto a bond device. This
>approach was chosen to avoid changes to drivers implementing XDP. If
>the slave device does not match the receive device, then XDP_REDIRECT
>is transparently used to perform the redirection in order to have
>the network driver release the packet from its RX ring.  The bonding
>driver hashing functions have been refactored to allow reuse with
>xdp_buff's to avoid code duplication.
>
>The motivation for this change is to enable use of bonding (and
>802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
>XDP and also to transparently support bond devices for projects that
>use XDP given most modern NICs have dual port adapters.  An alternative
>to this approach would be to implement 802.3ad in user-space and
>implement the bonding load-balancing in the XDP program itself, but
>is rather a cumbersome endeavor in terms of slave device management
>(e.g. by watching netlink) and requires separate programs for native
>vs bond cases for the orchestrator. A native in-kernel implementation
>overcomes these issues and provides more flexibility.
>
>Below are benchmark results done on two machines with 100Gbit
>Intel E810 (ice) NIC and with 32-core 3970X on sending machine, and
>16-core 3950X on receiving machine. 64 byte packets were sent with
>pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
>ice driver, so the tests were performed with iommu=off and patch [2]
>applied. Additionally the bonding round robin algorithm was modified
>to use per-cpu tx counters as high CPU load (50% vs 10%) and high rate
>of cache misses were caused by the shared rr_tx_counter (see patch
>2/3). The statistics were collected using "sar -n dev -u 1 10".
>
> -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
> without patch (1 dev):
>   XDP_DROP:              3.15%      48.6Mpps
>   XDP_TX:                3.12%      18.3Mpps     18.3Mpps
>   XDP_DROP (RSS):        9.47%      116.5Mpps
>   XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
> -----------------------
> with patch, bond (1 dev):
>   XDP_DROP:              3.14%      46.7Mpps
>   XDP_TX:                3.15%      13.9Mpps     13.9Mpps
>   XDP_DROP (RSS):        10.33%     117.2Mpps
>   XDP_TX (RSS):          10.64%     25.1Mpps     24.0Mpps
> -----------------------
> with patch, bond (2 devs):
>   XDP_DROP:              6.27%      92.7Mpps
>   XDP_TX:                6.26%      17.6Mpps     17.5Mpps
>   XDP_DROP (RSS):       11.38%      117.2Mpps
>   XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
> --------------------------------------------------------------
>
>RSS: Receive Side Scaling, e.g. the packets were sent to a range of
>destination IPs.
>
>[1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
>[2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
>[3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/

	The design adds logic around a bpf_bond_redirect_enabled_key
static key in the BPF core functions dev_map_enqueue_multi,
dev_map_redirect_multi and bpf_prog_run_xdp.  Is this something that is
correctly implemented as a special case just for bonding (i.e., it will
never ever have to be extended), or is it possible that other
upper/lower type software devices will have similar XDP functionality
added in the future, e.g., bridge, VLAN, etc?

>Signed-off-by: Jussi Maki <joamaki@gmail.com>
>---
> drivers/net/bonding/bond_main.c | 441 ++++++++++++++++++++++++++++----
> include/linux/filter.h          |  13 +-
> include/linux/netdevice.h       |   5 +
> include/net/bonding.h           |   1 +
> kernel/bpf/devmap.c             |  34 ++-
> net/core/filter.c               |  37 ++-
> 6 files changed, 467 insertions(+), 64 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index dafeaef3cbd3..38eea7e096f3 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -317,6 +317,19 @@ bool bond_sk_check(struct bonding *bond)
> 	}
> }
> 
>+static bool bond_xdp_check(struct bonding *bond)
>+{
>+	switch (BOND_MODE(bond)) {
>+	case BOND_MODE_ROUNDROBIN:
>+	case BOND_MODE_ACTIVEBACKUP:
>+	case BOND_MODE_8023AD:
>+	case BOND_MODE_XOR:
>+		return true;
>+	default:
>+		return false;
>+	}
>+}
>+
> /*---------------------------------- VLAN -----------------------------------*/
> 
> /* In the following 2 functions, bond_vlan_rx_add_vid and bond_vlan_rx_kill_vid,
>@@ -2001,6 +2014,28 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
> 	if (bond_mode_can_use_xmit_hash(bond))
> 		bond_update_slave_arr(bond, NULL);
> 
>+	if (bond->xdp_prog) {

	Will everything that declares or references ->xdp_prog fail to
compile if CONFIG_BPF is not set in the kernel config?

>+		struct netdev_bpf xdp = {
>+			.command = XDP_SETUP_PROG,
>+			.flags   = 0,
>+			.prog    = bond->xdp_prog,
>+			.extack  = extack,
>+		};
>+		if (!slave_dev->netdev_ops->ndo_bpf ||
>+		    !slave_dev->netdev_ops->ndo_xdp_xmit) {
>+			NL_SET_ERR_MSG(extack, "Slave does not support XDP");
>+			slave_err(bond_dev, slave_dev, "Slave does not support XDP\n");
>+			res = -EOPNOTSUPP;
>+			goto err_sysfs_del;
>+		}
>+		res = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
>+		if (res < 0) {
>+			/* ndo_bpf() sets extack error message */
>+			slave_dbg(bond_dev, slave_dev, "Error %d calling ndo_bpf\n", res);
>+			goto err_sysfs_del;
>+		}
>+		bpf_prog_inc(bond->xdp_prog);
>+	}
> 
> 	slave_info(bond_dev, slave_dev, "Enslaving as %s interface with %s link\n",
> 		   bond_is_active_slave(new_slave) ? "an active" : "a backup",
>@@ -2121,6 +2156,17 @@ static int __bond_release_one(struct net_device *bond_dev,
> 	/* recompute stats just before removing the slave */
> 	bond_get_stats(bond->dev, &bond->bond_stats);
> 
>+	if (bond->xdp_prog) {
>+		struct netdev_bpf xdp = {
>+			.command = XDP_SETUP_PROG,
>+			.flags   = 0,
>+			.prog	 = NULL,
>+			.extack  = NULL,
>+		};
>+		if (slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp))
>+			slave_warn(bond_dev, slave_dev, "failed to unload XDP program\n");
>+	}
>+
> 	bond_upper_dev_unlink(bond, slave);
> 	/* unregister rx_handler early so bond_handle_frame wouldn't be called
> 	 * for this slave anymore.
>@@ -3479,55 +3525,80 @@ static struct notifier_block bond_netdev_notifier = {
> 
> /*---------------------------- Hashing Policies -----------------------------*/
> 
>+/* Helper to access data in a packet, with or without a backing skb.
>+ * If skb is given the data is linearized if necessary via pskb_may_pull.
>+ */
>+static inline const void *bond_pull_data(struct sk_buff *skb,
>+					 const void *data, int hlen, int n)
>+{
>+	if (likely(n <= hlen))
>+		return data;
>+	else if (skb && likely(pskb_may_pull(skb, n)))
>+		return skb->head;
>+
>+	return NULL;
>+}
>+
> /* L2 hash helper */
>-static inline u32 bond_eth_hash(struct sk_buff *skb)
>+static inline u32 bond_eth_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
> {
>-	struct ethhdr *ep, hdr_tmp;
>+	struct ethhdr *ep;
> 
>-	ep = skb_header_pointer(skb, 0, sizeof(hdr_tmp), &hdr_tmp);
>-	if (ep)
>-		return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
>-	return 0;
>+	data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
>+	if (!data)
>+		return 0;
>+
>+	ep = (struct ethhdr *)(data + mhoff);
>+	return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
> }
> 
>-static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
>-			 int *noff, int *proto, bool l34)
>+static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk, const void *data,
>+			 int hlen, int l2_proto, int *nhoff, int *ip_proto, bool l34)
> {
> 	const struct ipv6hdr *iph6;
> 	const struct iphdr *iph;
> 
>-	if (skb->protocol == htons(ETH_P_IP)) {
>-		if (unlikely(!pskb_may_pull(skb, *noff + sizeof(*iph))))
>+	if (l2_proto == htons(ETH_P_IP)) {
>+		data = bond_pull_data(skb, data, hlen, *nhoff + sizeof(*iph));
>+		if (!data)
> 			return false;
>-		iph = (const struct iphdr *)(skb->data + *noff);
>+
>+		iph = (const struct iphdr *)(data + *nhoff);
> 		iph_to_flow_copy_v4addrs(fk, iph);
>-		*noff += iph->ihl << 2;
>+		*nhoff += iph->ihl << 2;
> 		if (!ip_is_fragment(iph))
>-			*proto = iph->protocol;
>-	} else if (skb->protocol == htons(ETH_P_IPV6)) {
>-		if (unlikely(!pskb_may_pull(skb, *noff + sizeof(*iph6))))
>+			*ip_proto = iph->protocol;
>+	} else if (l2_proto == htons(ETH_P_IPV6)) {
>+		data = bond_pull_data(skb, data, hlen, *nhoff + sizeof(*iph6));
>+		if (!data)
> 			return false;
>-		iph6 = (const struct ipv6hdr *)(skb->data + *noff);
>+
>+		iph6 = (const struct ipv6hdr *)(data + *nhoff);
> 		iph_to_flow_copy_v6addrs(fk, iph6);
>-		*noff += sizeof(*iph6);
>-		*proto = iph6->nexthdr;
>+		*nhoff += sizeof(*iph6);
>+		*ip_proto = iph6->nexthdr;
> 	} else {
> 		return false;
> 	}
> 
>-	if (l34 && *proto >= 0)
>-		fk->ports.ports = skb_flow_get_ports(skb, *noff, *proto);
>+	if (l34 && *ip_proto >= 0)
>+		fk->ports.ports = __skb_flow_get_ports(skb, *nhoff, *ip_proto, data, hlen);
> 
> 	return true;
> }
> 
>-static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
>+static u32 bond_vlan_srcmac_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
> {
>-	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
>+	struct ethhdr *mac_hdr;
> 	u32 srcmac_vendor = 0, srcmac_dev = 0;
> 	u16 vlan;
> 	int i;
> 
>+	data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
>+	if (!data)
>+		return 0;
>+	mac_hdr = (struct ethhdr *)(data + mhoff);
>+
> 	for (i = 0; i < 3; i++)
> 		srcmac_vendor = (srcmac_vendor << 8) | mac_hdr->h_source[i];
> 
>@@ -3543,26 +3614,30 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
> }
> 
> /* Extract the appropriate headers based on bond's xmit policy */
>-static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
>+static bool bond_flow_dissect(struct bonding *bond,
>+			      struct sk_buff *skb,
>+			      const void *data,
>+			      __be16 l2_proto,
>+			      int nhoff,
>+			      int hlen,
> 			      struct flow_keys *fk)

	Please compact the argument list down to fewer lines, in
conformance with usual coding practice in the kernel.  The above style
of formatting occurs multiple times in this patch, both in function
declarations and function calls.

> {
> 	bool l34 = bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34;
>-	int noff, proto = -1;
>+	int ip_proto = -1;
> 
> 	switch (bond->params.xmit_policy) {
> 	case BOND_XMIT_POLICY_ENCAP23:
> 	case BOND_XMIT_POLICY_ENCAP34:
> 		memset(fk, 0, sizeof(*fk));
> 		return __skb_flow_dissect(NULL, skb, &flow_keys_bonding,
>-					  fk, NULL, 0, 0, 0, 0);
>+					  fk, data, l2_proto, nhoff, hlen, 0);
> 	default:
> 		break;
> 	}
> 
> 	fk->ports.ports = 0;
> 	memset(&fk->icmp, 0, sizeof(fk->icmp));
>-	noff = skb_network_offset(skb);
>-	if (!bond_flow_ip(skb, fk, &noff, &proto, l34))
>+	if (!bond_flow_ip(skb, fk, data, hlen, l2_proto, &nhoff, &ip_proto, l34))
> 		return false;
> 
> 	/* ICMP error packets contains at least 8 bytes of the header
>@@ -3570,22 +3645,20 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
> 	 * to correlate ICMP error packets within the same flow which
> 	 * generated the error.
> 	 */
>-	if (proto == IPPROTO_ICMP || proto == IPPROTO_ICMPV6) {
>-		skb_flow_get_icmp_tci(skb, &fk->icmp, skb->data,
>-				      skb_transport_offset(skb),
>-				      skb_headlen(skb));
>-		if (proto == IPPROTO_ICMP) {
>+	if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
>+		skb_flow_get_icmp_tci(skb, &fk->icmp, data, nhoff, hlen);
>+		if (ip_proto == IPPROTO_ICMP) {
> 			if (!icmp_is_err(fk->icmp.type))
> 				return true;
> 
>-			noff += sizeof(struct icmphdr);
>-		} else if (proto == IPPROTO_ICMPV6) {
>+			nhoff += sizeof(struct icmphdr);
>+		} else if (ip_proto == IPPROTO_ICMPV6) {
> 			if (!icmpv6_is_err(fk->icmp.type))
> 				return true;
> 
>-			noff += sizeof(struct icmp6hdr);
>+			nhoff += sizeof(struct icmp6hdr);
> 		}
>-		return bond_flow_ip(skb, fk, &noff, &proto, l34);
>+		return bond_flow_ip(skb, fk, data, hlen, l2_proto, &nhoff, &ip_proto, l34);
> 	}
> 
> 	return true;
>@@ -3601,33 +3674,30 @@ static u32 bond_ip_hash(u32 hash, struct flow_keys *flow)
> 	return hash >> 1;
> }
> 
>-/**
>- * bond_xmit_hash - generate a hash value based on the xmit policy
>- * @bond: bonding device
>- * @skb: buffer to use for headers
>- *
>- * This function will extract the necessary headers from the skb buffer and use
>- * them to generate a hash based on the xmit_policy set in the bonding device
>+/* Generate hash based on xmit policy. If @skb is given it is used to linearize
>+ * the data as required, but this function can be used without it.

	Please don't remove kernel-doc formatting; add your new
parameters to the documentation.
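
	As a rough sketch of what is being asked for (not the author's
final text), the kernel-doc block could be kept and extended as below.
The parameter descriptions are my reading of how the patch uses each
argument, and the typedefs and empty body are userspace stand-ins so
that the fragment compiles outside the kernel tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins (assumptions) for the kernel types used below. */
struct bonding;
struct sk_buff;
typedef uint16_t __be16;
typedef uint32_t u32;

/**
 * __bond_xmit_hash - generate a hash value based on the xmit policy
 * @bond: bonding device
 * @skb: buffer to use for headers, or NULL if there is no backing skb
 * @data: start of the packet data the offsets below refer to
 * @l2_proto: layer 2 protocol of the packet, in network byte order
 * @mhoff: offset of the MAC header from @data
 * @nhoff: offset of the network header from @data
 * @hlen: number of bytes that can be read directly from @data
 *
 * This function will extract the necessary headers from the packet data and
 * use them to generate a hash based on the xmit_policy set in the bonding
 * device. If @skb is given it is used to linearize the data as required.
 */
static u32 __bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, const void *data,
			    __be16 l2_proto, int mhoff, int nhoff, int hlen)
{
	/* Body elided: this stub only illustrates the comment and signature. */
	(void)bond; (void)skb; (void)data;
	(void)l2_proto; (void)mhoff; (void)nhoff; (void)hlen;
	return 0;
}
```

	The signature above also uses the compacted argument layout
requested earlier in this review.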

>  */
>-u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
>+static u32 __bond_xmit_hash(struct bonding *bond,
>+			    struct sk_buff *skb,
>+			    const void *data,
>+			    __be16 l2_proto,
>+			    int mhoff,
>+			    int nhoff,
>+			    int hlen)
> {
> 	struct flow_keys flow;
> 	u32 hash;
> 
>-	if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
>-	    skb->l4_hash)
>-		return skb->hash;
>-
> 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC)
>-		return bond_vlan_srcmac_hash(skb);
>+		return bond_vlan_srcmac_hash(skb, data, mhoff, hlen);
> 
> 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
>-	    !bond_flow_dissect(bond, skb, &flow))
>-		return bond_eth_hash(skb);
>+	    !bond_flow_dissect(bond, skb, data, l2_proto, nhoff, hlen, &flow))
>+		return bond_eth_hash(skb, data, mhoff, hlen);
> 
> 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
> 	    bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23) {
>-		hash = bond_eth_hash(skb);
>+		hash = bond_eth_hash(skb, data, mhoff, hlen);
> 	} else {
> 		if (flow.icmp.id)
> 			memcpy(&hash, &flow.icmp, sizeof(hash));
>@@ -3638,6 +3708,48 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
> 	return bond_ip_hash(hash, &flow);
> }
> 
>+/**
>+ * bond_xmit_hash_skb - generate a hash value based on the xmit policy
>+ * @bond: bonding device
>+ * @skb: buffer to use for headers
>+ *
>+ * This function will extract the necessary headers from the skb buffer and use
>+ * them to generate a hash based on the xmit_policy set in the bonding device
>+ */
>+u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
>+{
>+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
>+	    skb->l4_hash)
>+		return skb->hash;
>+
>+	return __bond_xmit_hash(bond, skb, skb->head, skb->protocol,
>+				skb->mac_header,
>+				skb->network_header,
>+				skb_headlen(skb));
>+}
>+
>+/**
>+ * bond_xmit_hash_xdp - generate a hash value based on the xmit policy
>+ * @bond: bonding device
>+ * @xdp: buffer to use for headers
>+ *
>+ * XDP variant of bond_xmit_hash.
>+ */
>+static u32 bond_xmit_hash_xdp(struct bonding *bond, struct xdp_buff *xdp)
>+{
>+	struct ethhdr *eth;
>+
>+	if (xdp->data + sizeof(struct ethhdr) > xdp->data_end)
>+		return 0;
>+
>+	eth = (struct ethhdr *)xdp->data;
>+
>+	return __bond_xmit_hash(bond, NULL, xdp->data, eth->h_proto,
>+				0,
>+				sizeof(struct ethhdr),
>+				xdp->data_end - xdp->data);
>+}
>+
> /*-------------------------- Device entry points ----------------------------*/
> 
> void bond_work_init_all(struct bonding *bond)
>@@ -4254,6 +4366,47 @@ static struct slave *bond_xmit_roundrobin_slave_get(struct bonding *bond,
> 	return NULL;
> }
> 
>+static struct slave *bond_xdp_xmit_roundrobin_slave_get(struct bonding *bond,
>+							struct xdp_buff *xdp)
>+{
>+	struct slave *slave;
>+	int slave_cnt;
>+	u32 slave_id;
>+	const struct ethhdr *eth;
>+	void *data = xdp->data;
>+
>+	if (data + sizeof(struct ethhdr) > xdp->data_end)
>+		goto non_igmp;
>+
>+	eth = (struct ethhdr *)data;
>+	data += sizeof(struct ethhdr);
>+
>+	/* See comment on IGMP in bond_xmit_roundrobin_slave_get() */
>+	if (eth->h_proto == htons(ETH_P_IP)) {
>+		const struct iphdr *iph;
>+
>+		if (data + sizeof(struct iphdr) > xdp->data_end)
>+			goto non_igmp;
>+
>+		iph = (struct iphdr *)data;
>+
>+		if (iph->protocol == IPPROTO_IGMP) {
>+			slave = rcu_dereference(bond->curr_active_slave);
>+			if (slave)
>+				return slave;
>+			return bond_get_slave_by_id(bond, 0);
>+		}
>+	}
>+
>+non_igmp:
>+	slave_cnt = READ_ONCE(bond->slave_cnt);
>+	if (likely(slave_cnt)) {
>+		slave_id = bond_rr_gen_slave_id(bond) % slave_cnt;
>+		return bond_get_slave_by_id(bond, slave_id);
>+	}
>+	return NULL;
>+}
>+
> static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
> 					struct net_device *bond_dev)
> {
>@@ -4267,8 +4420,7 @@ static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
> 	return bond_tx_drop(bond_dev, skb);
> }
> 
>-static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond,
>-						      struct sk_buff *skb)
>+static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond)
> {
> 	return rcu_dereference(bond->curr_active_slave);
> }
>@@ -4282,7 +4434,7 @@ static netdev_tx_t bond_xmit_activebackup(struct sk_buff *skb,
> 	struct bonding *bond = netdev_priv(bond_dev);
> 	struct slave *slave;
> 
>-	slave = bond_xmit_activebackup_slave_get(bond, skb);
>+	slave = bond_xmit_activebackup_slave_get(bond);
> 	if (slave)
> 		return bond_dev_queue_xmit(bond, skb, slave->dev);
> 
>@@ -4470,6 +4622,22 @@ static struct slave *bond_xmit_3ad_xor_slave_get(struct bonding *bond,
> 	return slave;
> }
> 
>+static struct slave *bond_xdp_xmit_3ad_xor_slave_get(struct bonding *bond,
>+						     struct xdp_buff *xdp)
>+{
>+	struct bond_up_slave *slaves;
>+	unsigned int count;
>+	u32 hash;
>+
>+	hash = bond_xmit_hash_xdp(bond, xdp);
>+	slaves = bond->usable_slaves;
>+	count = slaves ? READ_ONCE(slaves->count) : 0;
>+	if (unlikely(!count))
>+		return NULL;
>+
>+	return slaves->arr[hash % count];
>+}
>+
> /* Use this Xmit function for 3AD as well as XOR modes. The current
>  * usable slave array is formed in the control path. The xmit function
>  * just calculates hash and sends the packet out.
>@@ -4580,7 +4748,7 @@ static struct net_device *bond_xmit_get_slave(struct net_device *master_dev,
> 		slave = bond_xmit_roundrobin_slave_get(bond, skb);
> 		break;
> 	case BOND_MODE_ACTIVEBACKUP:
>-		slave = bond_xmit_activebackup_slave_get(bond, skb);
>+		slave = bond_xmit_activebackup_slave_get(bond);
> 		break;
> 	case BOND_MODE_8023AD:
> 	case BOND_MODE_XOR:
>@@ -4754,6 +4922,164 @@ static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
> 	return ret;
> }
> 
>+struct net_device *
>+bond_xdp_get_xmit_slave(struct net_device *bond_dev, struct xdp_buff *xdp)
>+{
>+	struct bonding *bond = netdev_priv(bond_dev);
>+	struct slave *slave;
>+
>+	/* Caller needs to hold rcu_read_lock() */
>+
>+	switch (BOND_MODE(bond)) {
>+	case BOND_MODE_ROUNDROBIN:
>+		slave = bond_xdp_xmit_roundrobin_slave_get(bond, xdp);
>+		break;
>+
>+	case BOND_MODE_ACTIVEBACKUP:
>+		slave = bond_xmit_activebackup_slave_get(bond);
>+		break;
>+
>+	case BOND_MODE_8023AD:
>+	case BOND_MODE_XOR:
>+		slave = bond_xdp_xmit_3ad_xor_slave_get(bond, xdp);
>+		break;
>+
>+	default:
>+		/* Should never happen. Mode guarded by bond_xdp_check() */
>+		netdev_err(bond_dev, "Unknown bonding mode %d for xdp xmit\n", BOND_MODE(bond));
>+		WARN_ON_ONCE(1);
>+		return NULL;
>+	}
>+
>+	if (slave)
>+		return slave->dev;
>+
>+	return NULL;
>+}
>+
>+static int bond_xdp_xmit(struct net_device *bond_dev,
>+			 int n, struct xdp_frame **frames, u32 flags)
>+{
>+	int nxmit, err = -ENXIO;
>+
>+	rcu_read_lock();
>+
>+	for (nxmit = 0; nxmit < n; nxmit++) {
>+		struct xdp_frame *frame = frames[nxmit];
>+		struct xdp_frame *frames1[] = {frame};
>+		struct net_device *slave_dev;
>+		struct xdp_buff xdp;
>+
>+		xdp_convert_frame_to_buff(frame, &xdp);
>+
>+		slave_dev = bond_xdp_get_xmit_slave(bond_dev, &xdp);
>+		if (!slave_dev) {
>+			err = -ENXIO;
>+			break;
>+		}
>+
>+		err = slave_dev->netdev_ops->ndo_xdp_xmit(slave_dev, 1, frames1, flags);
>+		if (err < 1)
>+			break;
>+	}
>+
>+	rcu_read_unlock();
>+
>+	/* If error happened on the first frame then we can pass the error up, otherwise
>+	 * report the number of frames that were xmitted.
>+	 */
>+	if (err < 0)
>+		return (nxmit == 0 ? err : nxmit);
>+
>+	return nxmit;
>+}
>+
>+static int bond_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>+			struct netlink_ext_ack *extack)
>+{
>+	struct bonding *bond = netdev_priv(dev);
>+	struct list_head *iter;
>+	struct slave *slave, *rollback_slave;
>+	struct bpf_prog *old_prog;
>+	struct netdev_bpf xdp = {
>+		.command = XDP_SETUP_PROG,
>+		.flags   = 0,
>+		.prog    = prog,
>+		.extack  = extack,
>+	};
>+	int err;
>+
>+	ASSERT_RTNL();
>+
>+	if (!bond_xdp_check(bond))
>+		return -EOPNOTSUPP;
>+
>+	old_prog = bond->xdp_prog;
>+	bond->xdp_prog = prog;
>+
>+	bond_for_each_slave(bond, slave, iter) {
>+		struct net_device *slave_dev = slave->dev;
>+
>+		if (!slave_dev->netdev_ops->ndo_bpf ||
>+		    !slave_dev->netdev_ops->ndo_xdp_xmit) {
>+			NL_SET_ERR_MSG(extack, "Slave device does not support XDP");
>+			slave_err(dev, slave_dev, "Slave does not support XDP\n");
>+			err = -EOPNOTSUPP;
>+			goto err;
>+		}
>+		err = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
>+		if (err < 0) {
>+			/* ndo_bpf() sets extack error message */
>+			slave_err(dev, slave_dev, "Error %d calling ndo_bpf\n", err);
>+			goto err;
>+		}
>+		if (prog)
>+			bpf_prog_inc(prog);
>+	}
>+
>+	if (old_prog)
>+		bpf_prog_put(old_prog);
>+
>+	if (prog)
>+		static_branch_inc(&bpf_bond_redirect_enabled_key);
>+	else
>+		static_branch_dec(&bpf_bond_redirect_enabled_key);
>+
>+	return 0;
>+
>+err:
>+	/* unwind the program changes */
>+	bond->xdp_prog = old_prog;
>+	xdp.prog = old_prog;
>+	xdp.extack = NULL; /* do not overwrite original error */
>+
>+	bond_for_each_slave(bond, rollback_slave, iter) {
>+		struct net_device *slave_dev = rollback_slave->dev;
>+		int err_unwind;
>+
>+		if (slave == rollback_slave)
>+			break;
>+
>+		err_unwind = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
>+		if (err_unwind < 0)
>+			slave_err(dev, slave_dev,
>+				  "Error %d when unwinding XDP program change\n", err_unwind);
>+		else if (xdp.prog)
>+			bpf_prog_inc(xdp.prog);
>+	}
>+	return err;
>+}
>+
>+static int bond_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>+{
>+	switch (xdp->command) {
>+	case XDP_SETUP_PROG:
>+		return bond_xdp_set(dev, xdp->prog, xdp->extack);
>+	default:
>+		return -EINVAL;
>+	}
>+}
>+
> static u32 bond_mode_bcast_speed(struct slave *slave, u32 speed)
> {
> 	if (speed == 0 || speed == SPEED_UNKNOWN)
>@@ -4840,6 +5166,9 @@ static const struct net_device_ops bond_netdev_ops = {
> 	.ndo_features_check	= passthru_features_check,
> 	.ndo_get_xmit_slave	= bond_xmit_get_slave,
> 	.ndo_sk_get_lower_dev	= bond_sk_get_lower_dev,
>+	.ndo_bpf		= bond_xdp,
>+	.ndo_xdp_xmit           = bond_xdp_xmit,
>+	.ndo_xdp_get_xmit_slave = bond_xdp_get_xmit_slave,
> };
> 
> static const struct device_type bond_type = {
>diff --git a/include/linux/filter.h b/include/linux/filter.h
>index c5ad7df029ed..57c166089456 100644
>--- a/include/linux/filter.h
>+++ b/include/linux/filter.h
>@@ -760,6 +760,10 @@ static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
> 
> DECLARE_BPF_DISPATCHER(xdp)
> 
>+DECLARE_STATIC_KEY_FALSE(bpf_bond_redirect_enabled_key);
>+
>+u32 xdp_bond_redirect(struct xdp_buff *xdp);
>+
> static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
> 					    struct xdp_buff *xdp)
> {
>@@ -769,7 +773,14 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
> 	 * already takes rcu_read_lock() when fetching the program, so
> 	 * it's not necessary here anymore.
> 	 */
>-	return __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
>+	u32 act = __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
>+
>+	if (static_branch_unlikely(&bpf_bond_redirect_enabled_key)) {
>+		if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev))
>+			act = xdp_bond_redirect(xdp);
>+	}
>+
>+	return act;
> }
> 
> void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog);
>diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>index 5cbc950b34df..1a6cc6356498 100644
>--- a/include/linux/netdevice.h
>+++ b/include/linux/netdevice.h
>@@ -1321,6 +1321,9 @@ struct netdev_net_notifier {
>  *	that got dropped are freed/returned via xdp_return_frame().
>  *	Returns negative number, means general error invoking ndo, meaning
>  *	no frames were xmit'ed and core-caller will free all frames.
>+ * struct net_device *(*ndo_xdp_get_xmit_slave)(struct net_device *dev,
>+ *					        struct xdp_buff *xdp);
>+ *      Get the xmit slave of master device based on the xdp_buff.
>  * int (*ndo_xsk_wakeup)(struct net_device *dev, u32 queue_id, u32 flags);
>  *      This function is used to wake up the softirq, ksoftirqd or kthread
>  *	responsible for sending and/or receiving packets on a specific
>@@ -1539,6 +1542,8 @@ struct net_device_ops {
> 	int			(*ndo_xdp_xmit)(struct net_device *dev, int n,
> 						struct xdp_frame **xdp,
> 						u32 flags);
>+	struct net_device *	(*ndo_xdp_get_xmit_slave)(struct net_device *dev,
>+							  struct xdp_buff *xdp);
> 	int			(*ndo_xsk_wakeup)(struct net_device *dev,
> 						  u32 queue_id, u32 flags);
> 	struct devlink_port *	(*ndo_get_devlink_port)(struct net_device *dev);
>diff --git a/include/net/bonding.h b/include/net/bonding.h
>index 019e998d944a..34acb81b4234 100644
>--- a/include/net/bonding.h
>+++ b/include/net/bonding.h
>@@ -251,6 +251,7 @@ struct bonding {
> #ifdef CONFIG_XFRM_OFFLOAD
> 	struct xfrm_state *xs;
> #endif /* CONFIG_XFRM_OFFLOAD */
>+	struct bpf_prog *xdp_prog;
> };
> 
> #define bond_slave_get_rcu(dev) \
>diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
>index 2a75e6c2d27d..2caff5714f4d 100644
>--- a/kernel/bpf/devmap.c
>+++ b/kernel/bpf/devmap.c
>@@ -514,9 +514,11 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
> }
> 
> static bool is_valid_dst(struct bpf_dtab_netdev *obj, struct xdp_buff *xdp,
>-			 int exclude_ifindex)
>+			 int exclude_ifindex, int exclude_ifindex_master)
> {
>-	if (!obj || obj->dev->ifindex == exclude_ifindex ||
>+	if (!obj ||
>+	    obj->dev->ifindex == exclude_ifindex ||
>+	    obj->dev->ifindex == exclude_ifindex_master ||
> 	    !obj->dev->netdev_ops->ndo_xdp_xmit)
> 		return false;
> 
>@@ -546,12 +548,19 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
> {
> 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
> 	int exclude_ifindex = exclude_ingress ? dev_rx->ifindex : 0;
>+	int exclude_ifindex_master = 0;
> 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
> 	struct hlist_head *head;
> 	struct xdp_frame *xdpf;
> 	unsigned int i;
> 	int err;
> 
>+	if (static_branch_unlikely(&bpf_bond_redirect_enabled_key)) {
>+		struct net_device *master = netdev_master_upper_dev_get_rcu(dev_rx);
>+
>+		exclude_ifindex_master = (master && exclude_ingress) ? master->ifindex : 0;
>+	}
>+
> 	xdpf = xdp_convert_buff_to_frame(xdp);
> 	if (unlikely(!xdpf))
> 		return -EOVERFLOW;
>@@ -559,7 +568,7 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
> 	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
> 		for (i = 0; i < map->max_entries; i++) {
> 			dst = READ_ONCE(dtab->netdev_map[i]);
>-			if (!is_valid_dst(dst, xdp, exclude_ifindex))
>+			if (!is_valid_dst(dst, xdp, exclude_ifindex, exclude_ifindex_master))
> 				continue;
> 
> 			/* we only need n-1 clones; last_dst enqueued below */
>@@ -579,7 +588,9 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
> 			head = dev_map_index_hash(dtab, i);
> 			hlist_for_each_entry_rcu(dst, head, index_hlist,
> 						 lockdep_is_held(&dtab->index_lock)) {
>-				if (!is_valid_dst(dst, xdp, exclude_ifindex))
>+				if (!is_valid_dst(dst, xdp,
>+						  exclude_ifindex,
>+						  exclude_ifindex_master))
> 					continue;
> 
> 				/* we only need n-1 clones; last_dst enqueued below */
>@@ -646,16 +657,25 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
> {
> 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
> 	int exclude_ifindex = exclude_ingress ? dev->ifindex : 0;
>+	int exclude_ifindex_master = 0;
> 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
> 	struct hlist_head *head;
> 	struct hlist_node *next;
> 	unsigned int i;
> 	int err;
> 
>+	if (static_branch_unlikely(&bpf_bond_redirect_enabled_key)) {
>+		struct net_device *master = netdev_master_upper_dev_get_rcu(dev);
>+
>+		exclude_ifindex_master = (master && exclude_ingress) ? master->ifindex : 0;
>+	}
>+
> 	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
> 		for (i = 0; i < map->max_entries; i++) {
> 			dst = READ_ONCE(dtab->netdev_map[i]);
>-			if (!dst || dst->dev->ifindex == exclude_ifindex)
>+			if (!dst ||
>+			    dst->dev->ifindex == exclude_ifindex ||
>+			    dst->dev->ifindex == exclude_ifindex_master)
> 				continue;
> 
> 			/* we only need n-1 clones; last_dst enqueued below */
>@@ -674,7 +694,9 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
> 		for (i = 0; i < dtab->n_buckets; i++) {
> 			head = dev_map_index_hash(dtab, i);
> 			hlist_for_each_entry_safe(dst, next, head, index_hlist) {
>-				if (!dst || dst->dev->ifindex == exclude_ifindex)
>+				if (!dst ||
>+				    dst->dev->ifindex == exclude_ifindex ||
>+				    dst->dev->ifindex == exclude_ifindex_master)
> 					continue;
> 
> 				/* we only need n-1 clones; last_dst enqueued below */
>diff --git a/net/core/filter.c b/net/core/filter.c
>index caa88955562e..5d268eb980e7 100644
>--- a/net/core/filter.c
>+++ b/net/core/filter.c
>@@ -2469,6 +2469,7 @@ int skb_do_redirect(struct sk_buff *skb)
> 	ri->flags = 0;
> 	if (unlikely(!dev))
> 		goto out_drop;
>+
> 	if (flags & BPF_F_PEER) {
> 		const struct net_device_ops *ops = dev->netdev_ops;
> 
>@@ -3947,6 +3948,40 @@ void bpf_clear_redirect_map(struct bpf_map *map)
> 	}
> }
> 
>+DEFINE_STATIC_KEY_FALSE(bpf_bond_redirect_enabled_key);
>+EXPORT_SYMBOL_GPL(bpf_bond_redirect_enabled_key);
>+INDIRECT_CALLABLE_DECLARE(struct net_device *
>+	bond_xdp_get_xmit_slave(struct net_device *bond_dev, struct xdp_buff *xdp));
>+
>+u32 xdp_bond_redirect(struct xdp_buff *xdp)
>+{
>+	struct net_device *master, *slave;
>+	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
>+
>+	master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev);
>+
>+#if IS_BUILTIN(CONFIG_BONDING)
>+	slave = INDIRECT_CALL_1(master->netdev_ops->ndo_xdp_get_xmit_slave,
>+				bond_xdp_get_xmit_slave,
>+				master, xdp);
>+#else
>+	slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp);
>+#endif
>+	if (slave && slave != xdp->rxq->dev) {
>+		/* The target device is different from the receiving device, so
>+		 * redirect it to the new device.
>+		 * Using XDP_REDIRECT gets the correct behaviour from XDP enabled
>+		 * drivers to unmap the packet from their rx ring.
>+		 */
>+		ri->tgt_index = slave->ifindex;
>+		ri->map_id = INT_MAX;
>+		ri->map_type = BPF_MAP_TYPE_UNSPEC;
>+		return XDP_REDIRECT;
>+	}
>+	return XDP_TX;
>+}
>+EXPORT_SYMBOL_GPL(xdp_bond_redirect);
>+
> int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
> 		    struct bpf_prog *xdp_prog)
> {
>@@ -4466,7 +4501,7 @@ static const struct bpf_func_proto bpf_skb_cgroup_id_proto = {
> };
> 
> static inline u64 __bpf_sk_ancestor_cgroup_id(struct sock *sk,
>-					      int ancestor_level)
>+					     int ancestor_level)
> {
> 	struct cgroup *ancestor;
> 	struct cgroup *cgrp;
>-- 
>2.30.2
>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

* Re: [PATCH bpf-next 2/3] net: bonding: Use per-cpu rr_tx_counter
  2021-06-09 13:55 ` [PATCH bpf-next 2/3] net: bonding: Use per-cpu rr_tx_counter Jussi Maki
@ 2021-06-10  0:04   ` Jay Vosburgh
  2021-06-14  7:54     ` Jussi Maki
  0 siblings, 1 reply; 71+ messages in thread
From: Jay Vosburgh @ 2021-06-10  0:04 UTC (permalink / raw)
  To: Jussi Maki; +Cc: bpf, netdev, daniel, andy, vfalico, andrii

Jussi Maki <joamaki@gmail.com> wrote:

>The round-robin rr_tx_counter was shared across CPUs leading
>to significant cache trashing at high packet rates. This patch

	"trashing" -> "thrashing" ?

>switches the round-robin mechanism to use a per-cpu counter to
>decide the destination device.
>
>On a 100Gbit 64 byte packet test this reduces the CPU load from
>50% to 10% on the test system.
>
>Signed-off-by: Jussi Maki <joamaki@gmail.com>
>---
> drivers/net/bonding/bond_main.c | 18 +++++++++++++++---
> include/net/bonding.h           |  2 +-
> 2 files changed, 16 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 38eea7e096f3..917dd2cdcbf4 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -4314,16 +4314,16 @@ static u32 bond_rr_gen_slave_id(struct bonding *bond)
> 		slave_id = prandom_u32();
> 		break;
> 	case 1:
>-		slave_id = bond->rr_tx_counter;
>+		slave_id = this_cpu_inc_return(*bond->rr_tx_counter);
> 		break;
> 	default:
> 		reciprocal_packets_per_slave =
> 			bond->params.reciprocal_packets_per_slave;
>-		slave_id = reciprocal_divide(bond->rr_tx_counter,
>+		slave_id = this_cpu_inc_return(*bond->rr_tx_counter);
>+		slave_id = reciprocal_divide(slave_id,
> 					     reciprocal_packets_per_slave);

	With rr_tx_counter being per-cpu, each CPU is essentially doing
its own round-robin logic, independently of other CPUs, so the resulting
spread of transmitted packets may not be as evenly distributed (as
multiple CPUs could select the same interface to transmit on
approximately in lock-step).  I'm not sure if this could cause actual
problems in practice, though, as particular flows shouldn't skip between
CPUs (and thus rr_tx_counters) very often, and round-robin already
shouldn't be the first choice if no packet reordering is a hard
requirement.
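
	To illustrate the lock-step concern, here is a small userspace
sketch (not kernel code). NR_CPUS, NR_SLAVES, struct rr_state and the
helper names are hypothetical; the shared path mirrors the old
read-then-increment of rr_tx_counter and the per-cpu path mirrors the
increment-then-return behaviour of this_cpu_inc_return().

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS   4 /* hypothetical number of transmitting CPUs */
#define NR_SLAVES 2 /* hypothetical number of slaves in the bond */

struct rr_state {
	uint32_t shared;          /* models the old bond->rr_tx_counter */
	uint32_t percpu[NR_CPUS]; /* models the new per-cpu counters */
};

/* Old scheme: use the current value, then increment the shared counter. */
static uint32_t shared_pick(struct rr_state *st)
{
	return st->shared++ % NR_SLAVES;
}

/* New scheme: increment this CPU's own counter and use the new value. */
static uint32_t percpu_pick(struct rr_state *st, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return ++st->percpu[cpu] % NR_SLAVES;
}

/* Every CPU sends one packet; tally[] counts packets per slave. */
static void send_round(struct rr_state *st, int use_percpu,
		       unsigned int tally[NR_SLAVES])
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		uint32_t id = use_percpu ? percpu_pick(st, cpu)
					 : shared_pick(st);
		tally[id]++;
	}
}
```

	With all counters starting at zero, one round through the shared
counter alternates between the two slaves, whereas one round through the
per-cpu counters lands every packet on the same slave. A second per-cpu
round then lands every packet on the other slave, so the long-run totals
still even out; only the short-term spread is burstier.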

	I think this patch could be submitted against net-next
independently of the rest of the series.

Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>

	-J

> 		break;
> 	}
>-	bond->rr_tx_counter++;
> 
> 	return slave_id;
> }
>@@ -5278,6 +5278,9 @@ static void bond_uninit(struct net_device *bond_dev)
> 
> 	list_del(&bond->bond_list);
> 
>+	if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN)
>+		free_percpu(bond->rr_tx_counter);
>+
> 	bond_debug_unregister(bond);
> }
> 
>@@ -5681,6 +5684,15 @@ static int bond_init(struct net_device *bond_dev)
> 	if (!bond->wq)
> 		return -ENOMEM;
> 
>+	if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN) {
>+		bond->rr_tx_counter = alloc_percpu(u32);
>+		if (!bond->rr_tx_counter) {
>+			destroy_workqueue(bond->wq);
>+			bond->wq = NULL;
>+			return -ENOMEM;
>+		}
>+	}
>+
> 	spin_lock_init(&bond->stats_lock);
> 	netdev_lockdep_set_classes(bond_dev);
> 
>diff --git a/include/net/bonding.h b/include/net/bonding.h
>index 34acb81b4234..8de8180f1be8 100644
>--- a/include/net/bonding.h
>+++ b/include/net/bonding.h
>@@ -232,7 +232,7 @@ struct bonding {
> 	char     proc_file_name[IFNAMSIZ];
> #endif /* CONFIG_PROC_FS */
> 	struct   list_head bond_list;
>-	u32      rr_tx_counter;
>+	u32 __percpu *rr_tx_counter;
> 	struct   ad_bond_info ad_info;
> 	struct   alb_bond_info alb_info;
> 	struct   bond_params params;
>-- 
>2.30.2
>

* Re: [PATCH bpf-next 0/3] XDP bonding support
  2021-06-09 13:55 [PATCH bpf-next 0/3] XDP bonding support Jussi Maki
                   ` (2 preceding siblings ...)
  2021-06-09 13:55 ` [PATCH bpf-next 3/3] selftests/bpf: Add tests for XDP bonding Jussi Maki
@ 2021-06-10 17:24 ` Andrii Nakryiko
  2021-06-14 12:25   ` Jussi Maki
  2021-06-24  9:18 ` [PATCH bpf-next v2 0/4] " joamaki
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 71+ messages in thread
From: Andrii Nakryiko @ 2021-06-10 17:24 UTC (permalink / raw)
  To: Jussi Maki
  Cc: bpf, Networking, Daniel Borkmann, j.vosburgh, andy, vfalico,
	Andrii Nakryiko

On Wed, Jun 9, 2021 at 6:55 AM Jussi Maki <joamaki@gmail.com> wrote:
>
> This patchset introduces XDP support to the bonding driver.
>
> Patch 1 contains the implementation, including support for
> the recently introduced EXCLUDE_INGRESS. Patch 2 contains a
> performance fix to the roundrobin mode which switches rr_tx_counter
> to be per-cpu. Patch 3 contains the test suite for the implementation
> using a pair of veth devices.
>
> The vmtest.sh is modified to enable the bonding module and install
> modules. The config change should probably be done in the libbpf
> repository. Andrii: How would you like this done properly?

I think vmtest.sh and the CI setup don't support modules (not easily, at
least). Can we just compile that driver in? Then you can submit a PR
against the libbpf GitHub repo to adjust the config. We also have a kernel
CI repo where we'll need to make this change.
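
For reference, compiling the bonding driver in rather than building it
as a module should come down to a one-line kernel config fragment like
the one below. Which config file it belongs in depends on the repo and
is not spelled out here.

```
CONFIG_BONDING=y
```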

>
> The motivation for this change is to enable use of bonding (and
> 802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
> XDP and also to transparently support bond devices for projects that
> use XDP given most modern NICs have dual port adapters.  An alternative
> to this approach would be to implement 802.3ad in user-space and
> implement the bonding load-balancing in the XDP program itself, but
> is rather a cumbersome endeavor in terms of slave device management
> (e.g. by watching netlink) and requires separate programs for native
> vs bond cases for the orchestrator. A native in-kernel implementation
> overcomes these issues and provides more flexibility.
>
> Below are benchmark results done on two machines with 100Gbit
> Intel E810 (ice) NIC and with 32-core 3970X on sending machine, and
> 16-core 3950X on receiving machine. 64 byte packets were sent with
> pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
> ice driver, so the tests were performed with iommu=off and patch [2]
> applied. Additionally the bonding round robin algorithm was modified
> to use per-cpu tx counters as high CPU load (50% vs 10%) and high rate
> of cache misses were caused by the shared rr_tx_counter (see patch
> 2/3). The statistics were collected using "sar -n dev -u 1 10".
>
>  -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
>  without patch (1 dev):
>    XDP_DROP:              3.15%      48.6Mpps
>    XDP_TX:                3.12%      18.3Mpps     18.3Mpps
>    XDP_DROP (RSS):        9.47%      116.5Mpps
>    XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
>  -----------------------
>  with patch, bond (1 dev):
>    XDP_DROP:              3.14%      46.7Mpps
>    XDP_TX:                3.15%      13.9Mpps     13.9Mpps
>    XDP_DROP (RSS):        10.33%     117.2Mpps
>    XDP_TX (RSS):          10.64%     25.1Mpps     24.0Mpps
>  -----------------------
>  with patch, bond (2 devs):
>    XDP_DROP:              6.27%      92.7Mpps
>    XDP_TX:                6.26%      17.6Mpps     17.5Mpps
>    XDP_DROP (RSS):       11.38%      117.2Mpps
>    XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
>  --------------------------------------------------------------
>
> RSS: Receive Side Scaling, e.g. the packets were sent to a range of
> destination IPs.
>
> [1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
> [2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
> [3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/
>
> ---
>
> Jussi Maki (3):
>   net: bonding: Add XDP support to the bonding driver
>   net: bonding: Use per-cpu rr_tx_counter
>   selftests/bpf: Add tests for XDP bonding
>
>  drivers/net/bonding/bond_main.c               | 459 +++++++++++++++---
>  include/linux/filter.h                        |  13 +-
>  include/linux/netdevice.h                     |   5 +
>  include/net/bonding.h                         |   3 +-
>  kernel/bpf/devmap.c                           |  34 +-
>  net/core/filter.c                             |  37 +-
>  .../selftests/bpf/prog_tests/xdp_bonding.c    | 342 +++++++++++++
>  tools/testing/selftests/bpf/vmtest.sh         |  30 +-
>  8 files changed, 843 insertions(+), 80 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
>
> --
> 2.30.2
>


* Re: [PATCH bpf-next 2/3] net: bonding: Use per-cpu rr_tx_counter
  2021-06-10  0:04   ` Jay Vosburgh
@ 2021-06-14  7:54     ` Jussi Maki
  0 siblings, 0 replies; 71+ messages in thread
From: Jussi Maki @ 2021-06-14  7:54 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: bpf, netdev, Daniel Borkmann, andy, vfalico, andrii

On Thu, Jun 10, 2021 at 2:04 AM Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>
> Jussi Maki <joamaki@gmail.com> wrote:
>
>         With the rr_tx_counter being per-cpu, each CPU is essentially doing
> its own round-robin logic, independently of other CPUs, so the resulting
> spread of transmitted packets may not be as evenly distributed (as
> multiple CPUs could select the same interface to transmit on
> approximately in lock-step).  I'm not sure if this could cause actual
> problems in practice, though, as particular flows shouldn't skip between
> CPUs (and thus rr_tx_counters) very often, and round-robin already
> shouldn't be the first choice if no packet reordering is a hard
> requirement.
>
>         I think this patch could be submitted against net-next
> independently of the rest of the series.

Yes, this makes sense. I'll submit it separately against net-next today
and drop it from this patchset.


* Re: [PATCH bpf-next 1/3] net: bonding: Add XDP support to the bonding driver
  2021-06-09 23:29   ` Jay Vosburgh
@ 2021-06-14  8:02     ` Jussi Maki
  0 siblings, 0 replies; 71+ messages in thread
From: Jussi Maki @ 2021-06-14  8:02 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: bpf, netdev, Daniel Borkmann, Andy Gospodarek, vfalico, andrii

On Thu, Jun 10, 2021 at 1:29 AM Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>         The design adds logic around a bpf_bond_redirect_enabled_key
> static key in the BPF core functions dev_map_enqueue_multi,
> dev_map_redirect_multi and bpf_prog_run_xdp.  Is this something that is
> correctly implemented as a special case just for bonding (i.e., it will
> never ever have to be extended), or is it possible that other
> upper/lower type software devices will have similar XDP functionality
> added in the future, e.g., bridge, VLAN, etc?

Good point. For example the "team" driver would basically need pretty
much the same implementation. For that just using non-bond naming
would be enough. I don't think there's much of a cost for doing a more
generic mechanism, e.g. xdp "upper intercept" hook in netdev_ops, so
I'll try that out. At the very least I'll change the naming.

...

> >@@ -3543,26 +3614,30 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
> > }
> >
> > /* Extract the appropriate headers based on bond's xmit policy */
> >-static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
> >+static bool bond_flow_dissect(struct bonding *bond,
> >+                            struct sk_buff *skb,
> >+                            const void *data,
> >+                            __be16 l2_proto,
> >+                            int nhoff,
> >+                            int hlen,
> >                             struct flow_keys *fk)
>
>         Please compact the argument list down to fewer lines, in
> conformance with usual coding practice in the kernel.  The above style
> of formatting occurs multiple times in this patch, both in function
> declarations and function calls.

Thanks, will do.

...

> >-/**
> >- * bond_xmit_hash - generate a hash value based on the xmit policy
> >- * @bond: bonding device
> >- * @skb: buffer to use for headers
> >- *
> >- * This function will extract the necessary headers from the skb buffer and use
> >- * them to generate a hash based on the xmit_policy set in the bonding device
> >+/* Generate hash based on xmit policy. If @skb is given it is used to linearize
> >+ * the data as required, but this function can be used without it.
>
>         Please don't remove kernel-doc formatting; add your new
> parameters to the documentation.

The comment and the function declaration were untouched (see further
below in patch).  I only introduced the common helper __bond_xmit_hash
used from bond_xmit_hash and bond_xmit_hash_xdp. Unfortunately the
generated diff was a bit confusing. I'll try and generate cleaner
diffs in the future.

> >  */
> >-u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
> >+static u32 __bond_xmit_hash(struct bonding *bond,
> >+                          struct sk_buff *skb,
> >+                          const void *data,
> >+                          __be16 l2_proto,
> >+                          int mhoff,
> >+                          int nhoff,
> >+                          int hlen)
> > {
> >       struct flow_keys flow;
> >       u32 hash;
> >
> >-      if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
> >-          skb->l4_hash)
> >-              return skb->hash;
> >-
> >       if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC)
> >-              return bond_vlan_srcmac_hash(skb);
> >+              return bond_vlan_srcmac_hash(skb, data, mhoff, hlen);
> >
> >       if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
> >-          !bond_flow_dissect(bond, skb, &flow))
> >-              return bond_eth_hash(skb);
> >+          !bond_flow_dissect(bond, skb, data, l2_proto, nhoff, hlen, &flow))
> >+              return bond_eth_hash(skb, data, mhoff, hlen);
> >
> >       if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
> >           bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23) {
> >-              hash = bond_eth_hash(skb);
> >+              hash = bond_eth_hash(skb, data, mhoff, hlen);
> >       } else {
> >               if (flow.icmp.id)
> >                       memcpy(&hash, &flow.icmp, sizeof(hash));
> >@@ -3638,6 +3708,48 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
> >       return bond_ip_hash(hash, &flow);
> > }
> >
> >+/**
> >+ * bond_xmit_hash_skb - generate a hash value based on the xmit policy
> >+ * @bond: bonding device
> >+ * @skb: buffer to use for headers
> >+ *
> >+ * This function will extract the necessary headers from the skb buffer and use
> >+ * them to generate a hash based on the xmit_policy set in the bonding device
> >+ */
> >+u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
> >+{
> >+      if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
> >+          skb->l4_hash)
> >+              return skb->hash;
> >+
> >+      return __bond_xmit_hash(bond, skb, skb->head, skb->protocol,
> >+                              skb->mac_header,
> >+                              skb->network_header,
> >+                              skb_headlen(skb));
> >+}
...


* Re: [PATCH bpf-next 3/3] selftests/bpf: Add tests for XDP bonding
  2021-06-09 22:07   ` Maciej Fijalkowski
@ 2021-06-14  8:08     ` Jussi Maki
  2021-06-14  8:48       ` Magnus Karlsson
  0 siblings, 1 reply; 71+ messages in thread
From: Jussi Maki @ 2021-06-14  8:08 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: bpf, netdev, Daniel Borkmann, j.vosburgh, Andy Gospodarek,
	vfalico, andrii, magnus.karlsson

On Thu, Jun 10, 2021 at 12:19 AM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> On Wed, Jun 09, 2021 at 01:55:37PM +0000, Jussi Maki wrote:
> > Add a test suite to test XDP bonding implementation
> > over a pair of veth devices.
>
> Cc: Magnus
>
> Jussi,
> AF_XDP selftests have very similar functionality just like you are trying
> to introduce over here, e.g. we setup veth pair and generate traffic.
> After a quick look seems that we could have a generic layer that would
> be used by both AF_XDP and bonding selftests.
>
> WDYT?

Sounds like a good idea to me to have more shared code in the
selftests and I don't see a reason not to use the AF_XDP datapath in
the bonding selftests. I'll look into it this week and get back to
you.


* Re: [PATCH bpf-next 3/3] selftests/bpf: Add tests for XDP bonding
  2021-06-14  8:08     ` Jussi Maki
@ 2021-06-14  8:48       ` Magnus Karlsson
  2021-06-14 12:20         ` Jussi Maki
  0 siblings, 1 reply; 71+ messages in thread
From: Magnus Karlsson @ 2021-06-14  8:48 UTC (permalink / raw)
  To: Jussi Maki
  Cc: Maciej Fijalkowski, bpf, Network Development, Daniel Borkmann,
	j.vosburgh, Andy Gospodarek, vfalico, Andrii Nakryiko, Karlsson,
	Magnus

On Mon, Jun 14, 2021 at 10:09 AM Jussi Maki <joamaki@gmail.com> wrote:
>
> On Thu, Jun 10, 2021 at 12:19 AM Maciej Fijalkowski
> <maciej.fijalkowski@intel.com> wrote:
> >
> > On Wed, Jun 09, 2021 at 01:55:37PM +0000, Jussi Maki wrote:
> > > Add a test suite to test XDP bonding implementation
> > > over a pair of veth devices.
> >
> > Cc: Magnus
> >
> > Jussi,
> > AF_XDP selftests have very similar functionality just like you are trying
> > to introduce over here, e.g. we setup veth pair and generate traffic.
> > After a quick look seems that we could have a generic layer that would
> > be used by both AF_XDP and bonding selftests.
> >
> > WDYT?
>
> Sounds like a good idea to me to have more shared code in the
> selftests and I don't see a reason not to use the AF_XDP datapath in
> the bonding selftests. I'll look into it this week and get back to
> you.

Note that I am currently rewriting a large part of the AF_XDP
selftests, making it more amenable to adding various tests. A test
in my patch set is described as a set of packets to send, a set of
packets that should be received in a certain order with specified
contents, and configuration/setup information for the sender and
receiver. The current code is riddled with test specific if-statements
that make it hard to extend and use generically. So please hold off
for a week or so and review my patch set when I send it to the list.
Better use of your time. Hopefully we can make it fit your bill too
with not too much work.


* Re: [PATCH bpf-next 3/3] selftests/bpf: Add tests for XDP bonding
  2021-06-14  8:48       ` Magnus Karlsson
@ 2021-06-14 12:20         ` Jussi Maki
  0 siblings, 0 replies; 71+ messages in thread
From: Jussi Maki @ 2021-06-14 12:20 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Maciej Fijalkowski, bpf, Network Development, Daniel Borkmann,
	j.vosburgh, Andy Gospodarek, vfalico, Andrii Nakryiko, Karlsson,
	Magnus

On Mon, Jun 14, 2021 at 10:48 AM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:
>
> On Mon, Jun 14, 2021 at 10:09 AM Jussi Maki <joamaki@gmail.com> wrote:
> > Sounds like a good idea to me to have more shared code in the
> > selftests and I don't see a reason not to use the AF_XDP datapath in
> > the bonding selftests. I'll look into it this week and get back to
> > you.
>
> Note that I am currently rewriting a large part of the AF_XDP
> selftests, making it more amenable to adding various tests. A test
> in my patch set is described as a set of packets to send, a set of
> packets that should be received in a certain order with specified
> contents, and configuration/setup information for the sender and
> receiver. The current code is riddled with test specific if-statements
> that make it hard to extend and use generically. So please hold off
> for a week or so and review my patch set when I send it to the list.
> Better use of your time. Hopefully we can make it fit your bill too
> with not too much work.

Ok, thanks for the heads up! Looking forward to your patch set.


* Re: [PATCH bpf-next 0/3] XDP bonding support
  2021-06-10 17:24 ` [PATCH bpf-next 0/3] XDP bonding support Andrii Nakryiko
@ 2021-06-14 12:25   ` Jussi Maki
  2021-06-14 15:37     ` Jay Vosburgh
  2021-06-15  5:34     ` Andrii Nakryiko
  0 siblings, 2 replies; 71+ messages in thread
From: Jussi Maki @ 2021-06-14 12:25 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, Daniel Borkmann, j.vosburgh, Andy Gospodarek,
	vfalico, Andrii Nakryiko

On Thu, Jun 10, 2021 at 7:24 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Jun 9, 2021 at 6:55 AM Jussi Maki <joamaki@gmail.com> wrote:
> >
> > This patchset introduces XDP support to the bonding driver.
> >
> > Patch 1 contains the implementation, including support for
> > the recently introduced EXCLUDE_INGRESS. Patch 2 contains a
> > performance fix to the roundrobin mode which switches rr_tx_counter
> > to be per-cpu. Patch 3 contains the test suite for the implementation
> > using a pair of veth devices.
> >
> > The vmtest.sh is modified to enable the bonding module and install
> > modules. The config change should probably be done in the libbpf
> > repository. Andrii: How would you like this done properly?
>
> I think vmtest.sh and the CI setup don't support modules (not easily
> at least). Can we just compile that driver in? Then you can submit a
> PR against the libbpf GitHub repo to adjust the config. We also have
> a kernel CI repo where we'll need to make this change.

Unfortunately the mode and xmit_policy options of the bonding driver
are module params, so it'll need to be a module so the different modes
can be tested. I already modified vmtest.sh [1] to "make
modules_install" into the rootfs and enable the bonding module via
scripts/config, but a cleaner approach would probably be to, as you
suggested, update latest.config in libbpf repo and probably get the
"modules_install" change into vmtest.sh separately (if you're happy
with this approach). What do you think?

[1] https://lore.kernel.org/netdev/20210609135537.1460244-1-joamaki@gmail.com/T/#maaf15ecd6b7c3af764558589118a3c6213e0af81


* Re: [PATCH bpf-next 0/3] XDP bonding support
  2021-06-14 12:25   ` Jussi Maki
@ 2021-06-14 15:37     ` Jay Vosburgh
  2021-06-15  5:34     ` Andrii Nakryiko
  1 sibling, 0 replies; 71+ messages in thread
From: Jay Vosburgh @ 2021-06-14 15:37 UTC (permalink / raw)
  To: Jussi Maki
  Cc: Andrii Nakryiko, bpf, Networking, Daniel Borkmann,
	Andy Gospodarek, vfalico, Andrii Nakryiko

Jussi Maki <joamaki@gmail.com> wrote:

>On Thu, Jun 10, 2021 at 7:24 PM Andrii Nakryiko
><andrii.nakryiko@gmail.com> wrote:
>>
>> On Wed, Jun 9, 2021 at 6:55 AM Jussi Maki <joamaki@gmail.com> wrote:
>> >
>> > This patchset introduces XDP support to the bonding driver.
>> >
>> > Patch 1 contains the implementation, including support for
>> > the recently introduced EXCLUDE_INGRESS. Patch 2 contains a
>> > performance fix to the roundrobin mode which switches rr_tx_counter
>> > to be per-cpu. Patch 3 contains the test suite for the implementation
>> > using a pair of veth devices.
>> >
>> > The vmtest.sh is modified to enable the bonding module and install
>> > modules. The config change should probably be done in the libbpf
>> > repository. Andrii: How would you like this done properly?
>>
>> I think vmtest.sh and the CI setup don't support modules (not easily
>> at least). Can we just compile that driver in? Then you can submit a
>> PR against the libbpf GitHub repo to adjust the config. We also have
>> a kernel CI repo where we'll need to make this change.
>
>Unfortunately the mode and xmit_policy options of the bonding driver
>are module params, so it'll need to be a module so the different modes
>can be tested. I already modified vmtest.sh [1] to "make
>modules_install" into the rootfs and enable the bonding module via
>scripts/config, but a cleaner approach would probably be to, as you
>suggested, update latest.config in libbpf repo and probably get the
>"modules_install" change into vmtest.sh separately (if you're happy
>with this approach). What do you think?

	The bonding mode and xmit_hash_policy (and any other option) can
be changed via "ip link"; no module parameter needed, e.g.,

ip link set dev bond0 type bond xmit_hash_policy layer2

	-J

>[1] https://lore.kernel.org/netdev/20210609135537.1460244-1-joamaki@gmail.com/T/#maaf15ecd6b7c3af764558589118a3c6213e0af81

---
	-Jay Vosburgh, jay.vosburgh@canonical.com


* Re: [PATCH bpf-next 0/3] XDP bonding support
  2021-06-14 12:25   ` Jussi Maki
  2021-06-14 15:37     ` Jay Vosburgh
@ 2021-06-15  5:34     ` Andrii Nakryiko
  1 sibling, 0 replies; 71+ messages in thread
From: Andrii Nakryiko @ 2021-06-15  5:34 UTC (permalink / raw)
  To: Jussi Maki
  Cc: bpf, Networking, Daniel Borkmann, j.vosburgh, Andy Gospodarek,
	vfalico, Andrii Nakryiko

On Mon, Jun 14, 2021 at 5:25 AM Jussi Maki <joamaki@gmail.com> wrote:
>
> On Thu, Jun 10, 2021 at 7:24 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, Jun 9, 2021 at 6:55 AM Jussi Maki <joamaki@gmail.com> wrote:
> > >
> > > This patchset introduces XDP support to the bonding driver.
> > >
> > > Patch 1 contains the implementation, including support for
> > > the recently introduced EXCLUDE_INGRESS. Patch 2 contains a
> > > performance fix to the roundrobin mode which switches rr_tx_counter
> > > to be per-cpu. Patch 3 contains the test suite for the implementation
> > > using a pair of veth devices.
> > >
> > > The vmtest.sh is modified to enable the bonding module and install
> > > modules. The config change should probably be done in the libbpf
> > > repository. Andrii: How would you like this done properly?
> >
> > > I think vmtest.sh and the CI setup don't support modules (not easily
> > > at least). Can we just compile that driver in? Then you can submit a
> > > PR against the libbpf GitHub repo to adjust the config. We also have
> > > a kernel CI repo where we'll need to make this change.
>
> Unfortunately the mode and xmit_policy options of the bonding driver
> are module params, so it'll need to be a module so the different modes
> can be tested. I already modified vmtest.sh [1] to "make
> modules_install" into the rootfs and enable the bonding module via
> scripts/config, but a cleaner approach would probably be to, as you
> suggested, update latest.config in libbpf repo and probably get the
> "modules_install" change into vmtest.sh separately (if you're happy
> with this approach). What do you think?

If we can make modules work in vmtest.sh then it's great, regardless
of whether you still need it. It's not supported right now because no
one has done the work to support modules, not because we explicitly
didn't want modules in CI.

>
> [1] https://lore.kernel.org/netdev/20210609135537.1460244-1-joamaki@gmail.com/T/#maaf15ecd6b7c3af764558589118a3c6213e0af81


* Re: [PATCH bpf-next 1/3] net: bonding: Add XDP support to the bonding driver
  2021-06-09 13:55 ` [PATCH bpf-next 1/3] net: bonding: Add XDP support to the bonding driver Jussi Maki
  2021-06-09 22:29   ` Maciej Fijalkowski
  2021-06-09 23:29   ` Jay Vosburgh
@ 2021-06-17  3:40   ` kernel test robot
  2021-06-17  6:35   ` kernel test robot
  2021-06-22  7:24   ` kernel test robot
  4 siblings, 0 replies; 71+ messages in thread
From: kernel test robot @ 2021-06-17  3:40 UTC (permalink / raw)
  To: Jussi Maki, bpf
  Cc: kbuild-all, clang-built-linux, netdev, daniel, j.vosburgh, andy,
	vfalico, andrii, Jussi Maki

[-- Attachment #1: Type: text/plain, Size: 3129 bytes --]

Hi Jussi,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Jussi-Maki/XDP-bonding-support/20210617-053146
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: x86_64-randconfig-a011-20210617 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 64720f57bea6a6bf033feef4a5751ab9c0c3b401)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/61fabab38aec5b8e0cdc33867e35ea9740da84c8
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jussi-Maki/XDP-bonding-support/20210617-053146
        git checkout 61fabab38aec5b8e0cdc33867e35ea9740da84c8
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/net/bonding/bond_main.c:4926:1: warning: no previous prototype for function 'bond_xdp_get_xmit_slave' [-Wmissing-prototypes]
   bond_xdp_get_xmit_slave(struct net_device *bond_dev, struct xdp_buff *xdp)
   ^
   drivers/net/bonding/bond_main.c:4925:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   struct net_device *
   ^
   static 
   1 warning generated.
--
>> drivers/net/bonding/bond_main.c:3720: warning: expecting prototype for bond_xmit_hash_skb(). Prototype was for bond_xmit_hash() instead


vim +/bond_xdp_get_xmit_slave +4926 drivers/net/bonding/bond_main.c

  4924	
  4925	struct net_device *
> 4926	bond_xdp_get_xmit_slave(struct net_device *bond_dev, struct xdp_buff *xdp)
  4927	{
  4928		struct bonding *bond = netdev_priv(bond_dev);
  4929		struct slave *slave;
  4930	
  4931		/* Caller needs to hold rcu_read_lock() */
  4932	
  4933		switch (BOND_MODE(bond)) {
  4934		case BOND_MODE_ROUNDROBIN:
  4935			slave = bond_xdp_xmit_roundrobin_slave_get(bond, xdp);
  4936			break;
  4937	
  4938		case BOND_MODE_ACTIVEBACKUP:
  4939			slave = bond_xmit_activebackup_slave_get(bond);
  4940			break;
  4941	
  4942		case BOND_MODE_8023AD:
  4943		case BOND_MODE_XOR:
  4944			slave = bond_xdp_xmit_3ad_xor_slave_get(bond, xdp);
  4945			break;
  4946	
  4947		default:
  4948			/* Should never happen. Mode guarded by bond_xdp_check() */
  4949			netdev_err(bond_dev, "Unknown bonding mode %d for xdp xmit\n", BOND_MODE(bond));
  4950			WARN_ON_ONCE(1);
  4951			return NULL;
  4952		}
  4953	
  4954		if (slave)
  4955			return slave->dev;
  4956	
  4957		return NULL;
  4958	}
  4959	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34438 bytes --]


* Re: [PATCH bpf-next 1/3] net: bonding: Add XDP support to the bonding driver
  2021-06-09 13:55 ` [PATCH bpf-next 1/3] net: bonding: Add XDP support to the bonding driver Jussi Maki
                     ` (2 preceding siblings ...)
  2021-06-17  3:40   ` kernel test robot
@ 2021-06-17  6:35   ` kernel test robot
  2021-06-22  7:24   ` kernel test robot
  4 siblings, 0 replies; 71+ messages in thread
From: kernel test robot @ 2021-06-17  6:35 UTC (permalink / raw)
  To: Jussi Maki, bpf
  Cc: kbuild-all, netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	Jussi Maki

[-- Attachment #1: Type: text/plain, Size: 2626 bytes --]

Hi Jussi,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Jussi-Maki/XDP-bonding-support/20210617-053146
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: alpha-randconfig-r014-20210617 (attached as .config)
compiler: alpha-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/61fabab38aec5b8e0cdc33867e35ea9740da84c8
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jussi-Maki/XDP-bonding-support/20210617-053146
        git checkout 61fabab38aec5b8e0cdc33867e35ea9740da84c8
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=alpha 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/net/bonding/bond_main.c:4926:1: warning: no previous prototype for 'bond_xdp_get_xmit_slave' [-Wmissing-prototypes]
    4926 | bond_xdp_get_xmit_slave(struct net_device *bond_dev, struct xdp_buff *xdp)
         | ^~~~~~~~~~~~~~~~~~~~~~~


vim +/bond_xdp_get_xmit_slave +4926 drivers/net/bonding/bond_main.c

  4924	
  4925	struct net_device *
> 4926	bond_xdp_get_xmit_slave(struct net_device *bond_dev, struct xdp_buff *xdp)
  4927	{
  4928		struct bonding *bond = netdev_priv(bond_dev);
  4929		struct slave *slave;
  4930	
  4931		/* Caller needs to hold rcu_read_lock() */
  4932	
  4933		switch (BOND_MODE(bond)) {
  4934		case BOND_MODE_ROUNDROBIN:
  4935			slave = bond_xdp_xmit_roundrobin_slave_get(bond, xdp);
  4936			break;
  4937	
  4938		case BOND_MODE_ACTIVEBACKUP:
  4939			slave = bond_xmit_activebackup_slave_get(bond);
  4940			break;
  4941	
  4942		case BOND_MODE_8023AD:
  4943		case BOND_MODE_XOR:
  4944			slave = bond_xdp_xmit_3ad_xor_slave_get(bond, xdp);
  4945			break;
  4946	
  4947		default:
  4948			/* Should never happen. Mode guarded by bond_xdp_check() */
  4949			netdev_err(bond_dev, "Unknown bonding mode %d for xdp xmit\n", BOND_MODE(bond));
  4950			WARN_ON_ONCE(1);
  4951			return NULL;
  4952		}
  4953	
  4954		if (slave)
  4955			return slave->dev;
  4956	
  4957		return NULL;
  4958	}
  4959	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 38330 bytes --]


* Re: [PATCH bpf-next 1/3] net: bonding: Add XDP support to the bonding driver
  2021-06-09 13:55 ` [PATCH bpf-next 1/3] net: bonding: Add XDP support to the bonding driver Jussi Maki
                     ` (3 preceding siblings ...)
  2021-06-17  6:35   ` kernel test robot
@ 2021-06-22  7:24   ` kernel test robot
  4 siblings, 0 replies; 71+ messages in thread
From: kernel test robot @ 2021-06-22  7:24 UTC (permalink / raw)
  To: Jussi Maki, bpf
  Cc: kbuild-all, netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	Jussi Maki

[-- Attachment #1: Type: text/plain, Size: 5208 bytes --]

Hi Jussi,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Jussi-Maki/XDP-bonding-support/20210617-053146
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: x86_64-randconfig-s031-20210622 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.3-341-g8af24329-dirty
        # https://github.com/0day-ci/linux/commit/61fabab38aec5b8e0cdc33867e35ea9740da84c8
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jussi-Maki/XDP-bonding-support/20210617-053146
        git checkout 61fabab38aec5b8e0cdc33867e35ea9740da84c8
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' W=1 ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
   drivers/net/bonding/bond_main.c:2660:26: sparse: sparse: restricted __be16 degrades to integer
   drivers/net/bonding/bond_main.c:2666:20: sparse: sparse: restricted __be16 degrades to integer
   drivers/net/bonding/bond_main.c:2713:40: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __be16 [usertype] vlan_proto @@     got int @@
   drivers/net/bonding/bond_main.c:2713:40: sparse:     expected restricted __be16 [usertype] vlan_proto
   drivers/net/bonding/bond_main.c:2713:40: sparse:     got int
   drivers/net/bonding/bond_main.c:3561:25: sparse: sparse: restricted __be16 degrades to integer
   drivers/net/bonding/bond_main.c:3571:32: sparse: sparse: restricted __be16 degrades to integer
>> drivers/net/bonding/bond_main.c:3640:48: sparse: sparse: incorrect type in argument 5 (different base types) @@     expected int l2_proto @@     got restricted __be16 [usertype] l2_proto @@
   drivers/net/bonding/bond_main.c:3640:48: sparse:     expected int l2_proto
   drivers/net/bonding/bond_main.c:3640:48: sparse:     got restricted __be16 [usertype] l2_proto
   drivers/net/bonding/bond_main.c:3661:58: sparse: sparse: incorrect type in argument 5 (different base types) @@     expected int l2_proto @@     got restricted __be16 [usertype] l2_proto @@
   drivers/net/bonding/bond_main.c:3661:58: sparse:     expected int l2_proto
   drivers/net/bonding/bond_main.c:3661:58: sparse:     got restricted __be16 [usertype] l2_proto
>> drivers/net/bonding/bond_main.c:4633:16: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct bond_up_slave *slaves @@     got struct bond_up_slave [noderef] __rcu *usable_slaves @@
   drivers/net/bonding/bond_main.c:4633:16: sparse:     expected struct bond_up_slave *slaves
   drivers/net/bonding/bond_main.c:4633:16: sparse:     got struct bond_up_slave [noderef] __rcu *usable_slaves
   drivers/net/bonding/bond_main.c:3552:52: sparse: sparse: restricted __be16 degrades to integer
   drivers/net/bonding/bond_main.c:3552:52: sparse: sparse: restricted __be16 degrades to integer

vim +3640 drivers/net/bonding/bond_main.c

  3615	
  3616	/* Extract the appropriate headers based on bond's xmit policy */
  3617	static bool bond_flow_dissect(struct bonding *bond,
  3618				      struct sk_buff *skb,
  3619				      const void *data,
  3620				      __be16 l2_proto,
  3621				      int nhoff,
  3622				      int hlen,
  3623				      struct flow_keys *fk)
  3624	{
  3625		bool l34 = bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34;
  3626		int ip_proto = -1;
  3627	
  3628		switch (bond->params.xmit_policy) {
  3629		case BOND_XMIT_POLICY_ENCAP23:
  3630		case BOND_XMIT_POLICY_ENCAP34:
  3631			memset(fk, 0, sizeof(*fk));
  3632			return __skb_flow_dissect(NULL, skb, &flow_keys_bonding,
  3633						  fk, data, l2_proto, nhoff, hlen, 0);
  3634		default:
  3635			break;
  3636		}
  3637	
  3638		fk->ports.ports = 0;
  3639		memset(&fk->icmp, 0, sizeof(fk->icmp));
> 3640		if (!bond_flow_ip(skb, fk, data, hlen, l2_proto, &nhoff, &ip_proto, l34))
  3641			return false;
  3642	
  3643		/* ICMP error packets contains at least 8 bytes of the header
  3644		 * of the packet which generated the error. Use this information
  3645		 * to correlate ICMP error packets within the same flow which
  3646		 * generated the error.
  3647		 */
  3648		if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
  3649			skb_flow_get_icmp_tci(skb, &fk->icmp, data, nhoff, hlen);
  3650			if (ip_proto == IPPROTO_ICMP) {
  3651				if (!icmp_is_err(fk->icmp.type))
  3652					return true;
  3653	
  3654				nhoff += sizeof(struct icmphdr);
  3655			} else if (ip_proto == IPPROTO_ICMPV6) {
  3656				if (!icmpv6_is_err(fk->icmp.type))
  3657					return true;
  3658	
  3659				nhoff += sizeof(struct icmp6hdr);
  3660			}
  3661			return bond_flow_ip(skb, fk, data, hlen, l2_proto, &nhoff, &ip_proto, l34);
  3662		}
  3663	
  3664		return true;
  3665	}
  3666	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 47225 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH bpf-next v2 0/4] XDP bonding support
  2021-06-09 13:55 [PATCH bpf-next 0/3] XDP bonding support Jussi Maki
                   ` (3 preceding siblings ...)
  2021-06-10 17:24 ` [PATCH bpf-next 0/3] XDP bonding support Andrii Nakryiko
@ 2021-06-24  9:18 ` joamaki
  2021-06-24  9:18   ` [PATCH bpf-next v2 1/4] net: bonding: Refactor bond_xmit_hash for use with xdp_buff joamaki
                     ` (4 more replies)
  2021-07-07 11:25 ` [PATCH bpf-next v3 0/5] " Jussi Maki
                   ` (3 subsequent siblings)
  8 siblings, 5 replies; 71+ messages in thread
From: joamaki @ 2021-06-24  9:18 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

From: Jussi Maki <joamaki@gmail.com>

This patchset introduces XDP support to the bonding driver.

The motivation for this change is to enable use of bonding (and
802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
XDP and also to transparently support bond devices for projects that
use XDP given most modern NICs have dual port adapters.  An alternative
to this approach would be to implement 802.3ad in user-space and
implement the bonding load-balancing in the XDP program itself, but
this is rather a cumbersome endeavor in terms of slave device management
(e.g. by watching netlink) and requires separate programs for native
vs bond cases for the orchestrator. A native in-kernel implementation
overcomes these issues and provides more flexibility.

Below are benchmark results done on two machines with 100Gbit
Intel E810 (ice) NIC and with 32-core 3970X on sending machine, and
16-core 3950X on receiving machine. 64 byte packets were sent with
pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
ice driver, so the tests were performed with iommu=off and patch [2]
applied. Additionally, the bonding round-robin algorithm was modified
to use per-CPU tx counters, because the shared rr_tx_counter caused high
CPU load (50% vs 10%) and a high rate of cache misses. A fix for this
has already been merged into net-next. The statistics were collected
using "sar -n dev -u 1 10".

 -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
 without patch (1 dev):
   XDP_DROP:              3.15%      48.6Mpps
   XDP_TX:                3.12%      18.3Mpps     18.3Mpps
   XDP_DROP (RSS):        9.47%      116.5Mpps
   XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
 -----------------------
 with patch, bond (1 dev):
   XDP_DROP:              3.14%      46.7Mpps
   XDP_TX:                3.15%      13.9Mpps     13.9Mpps
   XDP_DROP (RSS):       10.33%      117.2Mpps
   XDP_TX (RSS):         10.64%      25.1Mpps     24.0Mpps
 -----------------------
 with patch, bond (2 devs):
   XDP_DROP:              6.27%      92.7Mpps
   XDP_TX:                6.26%      17.6Mpps     17.5Mpps
   XDP_DROP (RSS):       11.38%      117.2Mpps
   XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
 --------------------------------------------------------------

RSS: Receive Side Scaling, i.e. the packets were sent to a range of
destination IPs.

[1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
[2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
[3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/

Patch 1 prepares bond_xmit_hash for hashing xdp_buffs.
Patch 2 adds hooks to implement redirection after the BPF program has run.
Patch 3 implements the hooks in the bonding driver.
Patch 4 modifies devmap to properly handle EXCLUDE_INGRESS with a slave device.

v1->v2:
- Split up into smaller, easier-to-review patches and address cosmetic
  review comments.
- Drop the INDIRECT_CALL optimization as it showed little improvement in tests.
- Drop the rr_tx_counter patch as that has already been merged into net-next.
- Separate the test suite into another patch set. This will follow later once a
  patch set from Magnus Karlsson is merged and provides test utilities that can
  be reused for XDP bonding tests. v2 contains no major functional changes and
  was tested with the test suite included in v1.
  (https://lore.kernel.org/bpf/202106221509.kwNvAAZg-lkp@intel.com/T/#m464146d47299125d5868a08affd6d6ce526dfad1)

---

Jussi Maki (4):
  net: bonding: Refactor bond_xmit_hash for use with xdp_buff
  net: core: Add support for XDP redirection to slave device
  net: bonding: Add XDP support to the bonding driver
  devmap: Exclude XDP broadcast to master device

 drivers/net/bonding/bond_main.c | 431 +++++++++++++++++++++++++++-----
 include/linux/filter.h          |  13 +-
 include/linux/netdevice.h       |   5 +
 include/net/bonding.h           |   1 +
 kernel/bpf/devmap.c             |  34 ++-
 net/core/filter.c               |  25 ++
 6 files changed, 445 insertions(+), 64 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH bpf-next v2 1/4] net: bonding: Refactor bond_xmit_hash for use with xdp_buff
  2021-06-24  9:18 ` [PATCH bpf-next v2 0/4] " joamaki
@ 2021-06-24  9:18   ` joamaki
  2021-06-24  9:18   ` [PATCH bpf-next v2 2/4] net: core: Add support for XDP redirection to slave device joamaki
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 71+ messages in thread
From: joamaki @ 2021-06-24  9:18 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

From: Jussi Maki <joamaki@gmail.com>

In preparation for adding XDP support to the bonding driver,
refactor the packet hashing functions to be able to work with
any linear data buffer without an skb.
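
As a rough illustration of what hashing a linear buffer without an skb
involves, here is a minimal user-space sketch. It is not the kernel
code: struct eth_hdr, pull_data() and eth_hash() are hypothetical
stand-ins for struct ethhdr, bond_pull_data() and bond_eth_hash(), and
the skb fallback (pskb_may_pull) is left out because there is no skb in
this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct ethhdr; not the kernel definition. */
struct eth_hdr {
	uint8_t  h_dest[6];
	uint8_t  h_source[6];
	uint16_t h_proto;	/* network byte order */
};

/* Return data if the first n bytes lie within the linear area of
 * length hlen, NULL otherwise. Without an skb there is nothing that
 * could be pulled in, so a short buffer is simply rejected.
 */
static const void *pull_data(const void *data, int hlen, int n)
{
	assert(hlen >= 0 && n >= 0);
	return n <= hlen ? data : NULL;
}

/* L2 hash over a linear buffer: the last octet of each MAC address
 * xor'ed with the EtherType, or 0 if the header does not fit.
 */
static uint32_t eth_hash(const void *data, int mhoff, int hlen)
{
	struct eth_hdr ep;

	data = pull_data(data, hlen, mhoff + (int)sizeof(ep));
	if (!data)
		return 0;

	/* memcpy sidesteps unaligned access in this user-space model */
	memcpy(&ep, (const uint8_t *)data + mhoff, sizeof(ep));
	return ep.h_dest[5] ^ ep.h_source[5] ^ ep.h_proto;
}
```

The point of the sketch is only that every access is bounds-checked
against hlen up front, which is what allows the same helpers to serve
both the skb path and a plain data pointer.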

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 drivers/net/bonding/bond_main.c | 147 +++++++++++++++++++-------------
 1 file changed, 90 insertions(+), 57 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index dafeaef3cbd3..c4dd0d0c701a 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3479,55 +3479,80 @@ static struct notifier_block bond_netdev_notifier = {
 
 /*---------------------------- Hashing Policies -----------------------------*/
 
+/* Helper to access data in a packet, with or without a backing skb.
+ * If skb is given the data is linearized if necessary via pskb_may_pull.
+ */
+static inline const void *bond_pull_data(struct sk_buff *skb,
+					 const void *data, int hlen, int n)
+{
+	if (likely(n <= hlen))
+		return data;
+	else if (skb && likely(pskb_may_pull(skb, n)))
+		return skb->head;
+
+	return NULL;
+}
+
 /* L2 hash helper */
-static inline u32 bond_eth_hash(struct sk_buff *skb)
+static inline u32 bond_eth_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
 {
-	struct ethhdr *ep, hdr_tmp;
+	struct ethhdr *ep;
 
-	ep = skb_header_pointer(skb, 0, sizeof(hdr_tmp), &hdr_tmp);
-	if (ep)
-		return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
-	return 0;
+	data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
+	if (!data)
+		return 0;
+
+	ep = (struct ethhdr *)(data + mhoff);
+	return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
 }
 
-static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
-			 int *noff, int *proto, bool l34)
+static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk, const void *data,
+			 int hlen, __be16 l2_proto, int *nhoff, int *ip_proto, bool l34)
 {
 	const struct ipv6hdr *iph6;
 	const struct iphdr *iph;
 
-	if (skb->protocol == htons(ETH_P_IP)) {
-		if (unlikely(!pskb_may_pull(skb, *noff + sizeof(*iph))))
+	if (l2_proto == htons(ETH_P_IP)) {
+		data = bond_pull_data(skb, data, hlen, *nhoff + sizeof(*iph));
+		if (!data)
 			return false;
-		iph = (const struct iphdr *)(skb->data + *noff);
+
+		iph = (const struct iphdr *)(data + *nhoff);
 		iph_to_flow_copy_v4addrs(fk, iph);
-		*noff += iph->ihl << 2;
+		*nhoff += iph->ihl << 2;
 		if (!ip_is_fragment(iph))
-			*proto = iph->protocol;
-	} else if (skb->protocol == htons(ETH_P_IPV6)) {
-		if (unlikely(!pskb_may_pull(skb, *noff + sizeof(*iph6))))
+			*ip_proto = iph->protocol;
+	} else if (l2_proto == htons(ETH_P_IPV6)) {
+		data = bond_pull_data(skb, data, hlen, *nhoff + sizeof(*iph6));
+		if (!data)
 			return false;
-		iph6 = (const struct ipv6hdr *)(skb->data + *noff);
+
+		iph6 = (const struct ipv6hdr *)(data + *nhoff);
 		iph_to_flow_copy_v6addrs(fk, iph6);
-		*noff += sizeof(*iph6);
-		*proto = iph6->nexthdr;
+		*nhoff += sizeof(*iph6);
+		*ip_proto = iph6->nexthdr;
 	} else {
 		return false;
 	}
 
-	if (l34 && *proto >= 0)
-		fk->ports.ports = skb_flow_get_ports(skb, *noff, *proto);
+	if (l34 && *ip_proto >= 0)
+		fk->ports.ports = __skb_flow_get_ports(skb, *nhoff, *ip_proto, data, hlen);
 
 	return true;
 }
 
-static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
+static u32 bond_vlan_srcmac_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
 {
-	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
+	struct ethhdr *mac_hdr;
 	u32 srcmac_vendor = 0, srcmac_dev = 0;
 	u16 vlan;
 	int i;
 
+	data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
+	if (!data)
+		return 0;
+	mac_hdr = (struct ethhdr *)(data + mhoff);
+
 	for (i = 0; i < 3; i++)
 		srcmac_vendor = (srcmac_vendor << 8) | mac_hdr->h_source[i];
 
@@ -3543,26 +3568,25 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
 }
 
 /* Extract the appropriate headers based on bond's xmit policy */
-static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
-			      struct flow_keys *fk)
+static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb, const void *data,
+			      __be16 l2_proto, int nhoff, int hlen, struct flow_keys *fk)
 {
 	bool l34 = bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34;
-	int noff, proto = -1;
+	int ip_proto = -1;
 
 	switch (bond->params.xmit_policy) {
 	case BOND_XMIT_POLICY_ENCAP23:
 	case BOND_XMIT_POLICY_ENCAP34:
 		memset(fk, 0, sizeof(*fk));
 		return __skb_flow_dissect(NULL, skb, &flow_keys_bonding,
-					  fk, NULL, 0, 0, 0, 0);
+					  fk, data, l2_proto, nhoff, hlen, 0);
 	default:
 		break;
 	}
 
 	fk->ports.ports = 0;
 	memset(&fk->icmp, 0, sizeof(fk->icmp));
-	noff = skb_network_offset(skb);
-	if (!bond_flow_ip(skb, fk, &noff, &proto, l34))
+	if (!bond_flow_ip(skb, fk, data, hlen, l2_proto, &nhoff, &ip_proto, l34))
 		return false;
 
 	/* ICMP error packets contains at least 8 bytes of the header
@@ -3570,22 +3594,20 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
 	 * to correlate ICMP error packets within the same flow which
 	 * generated the error.
 	 */
-	if (proto == IPPROTO_ICMP || proto == IPPROTO_ICMPV6) {
-		skb_flow_get_icmp_tci(skb, &fk->icmp, skb->data,
-				      skb_transport_offset(skb),
-				      skb_headlen(skb));
-		if (proto == IPPROTO_ICMP) {
+	if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
+		skb_flow_get_icmp_tci(skb, &fk->icmp, data, nhoff, hlen);
+		if (ip_proto == IPPROTO_ICMP) {
 			if (!icmp_is_err(fk->icmp.type))
 				return true;
 
-			noff += sizeof(struct icmphdr);
-		} else if (proto == IPPROTO_ICMPV6) {
+			nhoff += sizeof(struct icmphdr);
+		} else if (ip_proto == IPPROTO_ICMPV6) {
 			if (!icmpv6_is_err(fk->icmp.type))
 				return true;
 
-			noff += sizeof(struct icmp6hdr);
+			nhoff += sizeof(struct icmp6hdr);
 		}
-		return bond_flow_ip(skb, fk, &noff, &proto, l34);
+		return bond_flow_ip(skb, fk, data, hlen, l2_proto, &nhoff, &ip_proto, l34);
 	}
 
 	return true;
@@ -3601,33 +3623,26 @@ static u32 bond_ip_hash(u32 hash, struct flow_keys *flow)
 	return hash >> 1;
 }
 
-/**
- * bond_xmit_hash - generate a hash value based on the xmit policy
- * @bond: bonding device
- * @skb: buffer to use for headers
- *
- * This function will extract the necessary headers from the skb buffer and use
- * them to generate a hash based on the xmit_policy set in the bonding device
+/* Generate hash based on xmit policy. If @skb is given it is used to linearize
+ * the data as required, but this function can be used without it if the data is
+ * known to be linear (e.g. with xdp_buff).
  */
-u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
+static u32 __bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, const void *data,
+			    __be16 l2_proto, int mhoff, int nhoff, int hlen)
 {
 	struct flow_keys flow;
 	u32 hash;
 
-	if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
-	    skb->l4_hash)
-		return skb->hash;
-
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC)
-		return bond_vlan_srcmac_hash(skb);
+		return bond_vlan_srcmac_hash(skb, data, mhoff, hlen);
 
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
-	    !bond_flow_dissect(bond, skb, &flow))
-		return bond_eth_hash(skb);
+	    !bond_flow_dissect(bond, skb, data, l2_proto, nhoff, hlen, &flow))
+		return bond_eth_hash(skb, data, mhoff, hlen);
 
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
 	    bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23) {
-		hash = bond_eth_hash(skb);
+		hash = bond_eth_hash(skb, data, mhoff, hlen);
 	} else {
 		if (flow.icmp.id)
 			memcpy(&hash, &flow.icmp, sizeof(hash));
@@ -3638,6 +3653,25 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
 	return bond_ip_hash(hash, &flow);
 }
 
+/**
+ * bond_xmit_hash - generate a hash value based on the xmit policy
+ * @bond: bonding device
+ * @skb: buffer to use for headers
+ *
+ * This function will extract the necessary headers from the skb buffer and use
+ * them to generate a hash based on the xmit_policy set in the bonding device
+ */
+u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
+{
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
+	    skb->l4_hash)
+		return skb->hash;
+
+	return __bond_xmit_hash(bond, skb, skb->head, skb->protocol,
+				skb->mac_header, skb->network_header,
+				skb_headlen(skb));
+}
+
 /*-------------------------- Device entry points ----------------------------*/
 
 void bond_work_init_all(struct bonding *bond)
@@ -4267,8 +4301,7 @@ static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
 	return bond_tx_drop(bond_dev, skb);
 }
 
-static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond,
-						      struct sk_buff *skb)
+static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond)
 {
 	return rcu_dereference(bond->curr_active_slave);
 }
@@ -4282,7 +4315,7 @@ static netdev_tx_t bond_xmit_activebackup(struct sk_buff *skb,
 	struct bonding *bond = netdev_priv(bond_dev);
 	struct slave *slave;
 
-	slave = bond_xmit_activebackup_slave_get(bond, skb);
+	slave = bond_xmit_activebackup_slave_get(bond);
 	if (slave)
 		return bond_dev_queue_xmit(bond, skb, slave->dev);
 
@@ -4580,7 +4613,7 @@ static struct net_device *bond_xmit_get_slave(struct net_device *master_dev,
 		slave = bond_xmit_roundrobin_slave_get(bond, skb);
 		break;
 	case BOND_MODE_ACTIVEBACKUP:
-		slave = bond_xmit_activebackup_slave_get(bond, skb);
+		slave = bond_xmit_activebackup_slave_get(bond);
 		break;
 	case BOND_MODE_8023AD:
 	case BOND_MODE_XOR:
-- 
2.27.0


^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH bpf-next v2 2/4] net: core: Add support for XDP redirection to slave device
  2021-06-24  9:18 ` [PATCH bpf-next v2 0/4] " joamaki
  2021-06-24  9:18   ` [PATCH bpf-next v2 1/4] net: bonding: Refactor bond_xmit_hash for use with xdp_buff joamaki
@ 2021-06-24  9:18   ` joamaki
  2021-06-24  9:18   ` [PATCH bpf-next v2 3/4] net: bonding: Add XDP support to the bonding driver joamaki
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 71+ messages in thread
From: joamaki @ 2021-06-24  9:18 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

From: Jussi Maki <joamaki@gmail.com>

This adds the ndo_xdp_get_xmit_slave hook for transforming XDP_TX
into XDP_REDIRECT after the BPF program has run when the ingress
device is a bond slave.
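
The rewrite of the return value can be modelled in user space roughly
as follows. This is a sketch under simplifying assumptions, not the
kernel implementation: struct dev, struct redirect_info and
master_redirect() are hypothetical, and the xmit_slave field stands in
for whatever the master's ndo_xdp_get_xmit_slave hook would return.

```c
#include <assert.h>
#include <stddef.h>

/* Values match enum xdp_action in the UAPI header <linux/bpf.h>. */
enum xdp_action { XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT };

/* Hypothetical, heavily reduced model of a net_device. */
struct dev {
	int ifindex;
	struct dev *master;	/* bond master, NULL if not enslaved */
	struct dev *xmit_slave;	/* on a master: slave the hook would pick */
};

/* Models the per-CPU redirect target that the redirect path consumes. */
struct redirect_info {
	int tgt_index;
};

/* Rewrite XDP_TX on a bond slave: if the bond picks a different slave
 * for egress, turn the action into XDP_REDIRECT towards that slave.
 * All other actions, and non-enslaved devices, are left untouched.
 */
static enum xdp_action master_redirect(struct dev *rx_dev,
				       enum xdp_action act,
				       struct redirect_info *ri)
{
	struct dev *slave;

	if (act != XDP_TX || !rx_dev->master)
		return act;

	slave = rx_dev->master->xmit_slave;
	if (slave && slave != rx_dev) {
		ri->tgt_index = slave->ifindex;
		return XDP_REDIRECT;
	}
	return XDP_TX;
}
```

When the chosen slave is the receiving device, the action stays XDP_TX
and the driver's regular TX path handles it. Only the cross-device case
goes through the redirect machinery.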

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 include/linux/filter.h    | 13 ++++++++++++-
 include/linux/netdevice.h |  5 +++++
 net/core/filter.c         | 25 +++++++++++++++++++++++++
 3 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index c5ad7df029ed..752ba6a474a4 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -760,6 +760,10 @@ static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
 
 DECLARE_BPF_DISPATCHER(xdp)
 
+DECLARE_STATIC_KEY_FALSE(bpf_master_redirect_enabled_key);
+
+u32 xdp_master_redirect(struct xdp_buff *xdp);
+
 static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 					    struct xdp_buff *xdp)
 {
@@ -769,7 +773,14 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 	 * already takes rcu_read_lock() when fetching the program, so
 	 * it's not necessary here anymore.
 	 */
-	return __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
+	u32 act = __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
+
+	if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) {
+		if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev))
+			act = xdp_master_redirect(xdp);
+	}
+
+	return act;
 }
 
 void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5cbc950b34df..1a6cc6356498 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1321,6 +1321,9 @@ struct netdev_net_notifier {
  *	that got dropped are freed/returned via xdp_return_frame().
  *	Returns negative number, means general error invoking ndo, meaning
  *	no frames were xmit'ed and core-caller will free all frames.
+ * struct net_device *(*ndo_xdp_get_xmit_slave)(struct net_device *dev,
+ *					        struct xdp_buff *xdp);
+ *      Get the xmit slave of master device based on the xdp_buff.
  * int (*ndo_xsk_wakeup)(struct net_device *dev, u32 queue_id, u32 flags);
  *      This function is used to wake up the softirq, ksoftirqd or kthread
  *	responsible for sending and/or receiving packets on a specific
@@ -1539,6 +1542,8 @@ struct net_device_ops {
 	int			(*ndo_xdp_xmit)(struct net_device *dev, int n,
 						struct xdp_frame **xdp,
 						u32 flags);
+	struct net_device *	(*ndo_xdp_get_xmit_slave)(struct net_device *dev,
+							  struct xdp_buff *xdp);
 	int			(*ndo_xsk_wakeup)(struct net_device *dev,
 						  u32 queue_id, u32 flags);
 	struct devlink_port *	(*ndo_get_devlink_port)(struct net_device *dev);
diff --git a/net/core/filter.c b/net/core/filter.c
index caa88955562e..3c20edb0a0ff 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3947,6 +3947,31 @@ void bpf_clear_redirect_map(struct bpf_map *map)
 	}
 }
 
+DEFINE_STATIC_KEY_FALSE(bpf_master_redirect_enabled_key);
+EXPORT_SYMBOL_GPL(bpf_master_redirect_enabled_key);
+
+u32 xdp_master_redirect(struct xdp_buff *xdp)
+{
+	struct net_device *master, *slave;
+	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+
+	master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev);
+	slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp);
+	if (slave && slave != xdp->rxq->dev) {
+		/* The target device is different from the receiving device, so
+		 * redirect it to the new device.
+		 * Using XDP_REDIRECT gets the correct behaviour from XDP enabled
+		 * drivers to unmap the packet from their rx ring.
+		 */
+		ri->tgt_index = slave->ifindex;
+		ri->map_id = INT_MAX;
+		ri->map_type = BPF_MAP_TYPE_UNSPEC;
+		return XDP_REDIRECT;
+	}
+	return XDP_TX;
+}
+EXPORT_SYMBOL_GPL(xdp_master_redirect);
+
 int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 		    struct bpf_prog *xdp_prog)
 {
-- 
2.27.0


^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH bpf-next v2 3/4] net: bonding: Add XDP support to the bonding driver
  2021-06-24  9:18 ` [PATCH bpf-next v2 0/4] " joamaki
  2021-06-24  9:18   ` [PATCH bpf-next v2 1/4] net: bonding: Refactor bond_xmit_hash for use with xdp_buff joamaki
  2021-06-24  9:18   ` [PATCH bpf-next v2 2/4] net: core: Add support for XDP redirection to slave device joamaki
@ 2021-06-24  9:18   ` joamaki
  2021-06-24  9:18   ` [PATCH bpf-next v2 4/4] devmap: Exclude XDP broadcast to master device joamaki
  2021-07-01 18:20   ` [PATCH bpf-next v2 0/4] XDP bonding support Jay Vosburgh
  4 siblings, 0 replies; 71+ messages in thread
From: joamaki @ 2021-06-24  9:18 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

From: Jussi Maki <joamaki@gmail.com>

XDP is implemented in the bonding driver by transparently delegating
the XDP program loading, removal and xmit operations to the bonding
slave devices. The overall goal of this work is that XDP programs
can be attached to a bond device *without* any further changes (or
awareness) necessary to the program itself, meaning the same XDP
program can be attached to a native device but also a bonding device.

The semantics of XDP_TX on a bond are equivalent to those of a tc/BPF
program attached to the bond: the packet is transmitted out of the bond
itself, using one of the bond's configured xmit methods to select a
slave device (rather than XDP_TX on the slave itself). Handling of
XDP_TX to transmit using the configured bonding mechanism is therefore
implemented by rewriting the BPF program return value in
bpf_prog_run_xdp. To avoid performance impact, this check is guarded by
a static key, which is incremented when an XDP program is loaded onto a
bond device. This approach was chosen to avoid changes to drivers
implementing XDP. If the slave device does not match the receive device,
then XDP_REDIRECT is transparently used to perform the redirection in
order to have the network driver release the packet from its RX ring.
The bonding driver hashing functions have been refactored to allow
reuse with xdp_buffs to avoid code duplication.
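
For readers less familiar with the bonding modes, the slave selection
that XDP_TX ends up using can be pictured with the following user-space
sketch. It is a simplified model only: struct bond_model, enum mode and
xdp_pick_slave() are hypothetical names, slaves are represented by
their ifindex, 802.3ad is assumed to share the XOR path, and the IGMP
special case of round-robin is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum mode { MODE_ROUNDROBIN, MODE_ACTIVEBACKUP, MODE_XOR, MODE_BROADCAST };

/* Hypothetical reduced model of a bond. */
struct bond_model {
	enum mode mode;
	const int *slaves;	/* ifindexes of the usable slaves */
	unsigned int count;	/* number of entries in slaves[] */
	unsigned int active;	/* index of the active slave (active-backup) */
	uint32_t rr_counter;	/* shared counter for round-robin */
};

/* Return the ifindex of the egress slave for one packet, or -1 if no
 * slave is usable or the mode is not supported for XDP.
 */
static int xdp_pick_slave(struct bond_model *b, uint32_t hash)
{
	if (!b->count)
		return -1;

	switch (b->mode) {
	case MODE_ROUNDROBIN:
		/* successive packets rotate through the slaves */
		return b->slaves[b->rr_counter++ % b->count];
	case MODE_ACTIVEBACKUP:
		/* always the currently active slave */
		assert(b->active < b->count);
		return b->slaves[b->active];
	case MODE_XOR:
		/* flow hash keeps one flow on one slave */
		return b->slaves[hash % b->count];
	default:
		return -1;
	}
}
```

In this model an unsupported mode yields no slave at selection time.
The patch rejects such modes earlier, via bond_xdp_check(), so the
default branch should not be reached in practice.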

The motivation for this change is to enable use of bonding (and
802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
XDP and also to transparently support bond devices for projects that
use XDP given most modern NICs have dual port adapters.  An alternative
to this approach would be to implement 802.3ad in user-space and
implement the bonding load-balancing in the XDP program itself, but
this is rather a cumbersome endeavor in terms of slave device management
(e.g. by watching netlink) and requires separate programs for native
vs bond cases for the orchestrator. A native in-kernel implementation
overcomes these issues and provides more flexibility.

Below are benchmark results done on two machines with 100Gbit
Intel E810 (ice) NIC and with 32-core 3970X on sending machine, and
16-core 3950X on receiving machine. 64 byte packets were sent with
pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
ice driver, so the tests were performed with iommu=off and patch [2]
applied. Additionally, the bonding round-robin algorithm was modified
to use per-CPU tx counters, because the shared rr_tx_counter caused high
CPU load (50% vs 10%) and a high rate of cache misses.
The statistics were collected using "sar -n dev -u 1 10".

 -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
 without patch (1 dev):
   XDP_DROP:              3.15%      48.6Mpps
   XDP_TX:                3.12%      18.3Mpps     18.3Mpps
   XDP_DROP (RSS):        9.47%      116.5Mpps
   XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
 -----------------------
 with patch, bond (1 dev):
   XDP_DROP:              3.14%      46.7Mpps
   XDP_TX:                3.15%      13.9Mpps     13.9Mpps
   XDP_DROP (RSS):       10.33%      117.2Mpps
   XDP_TX (RSS):         10.64%      25.1Mpps     24.0Mpps
 -----------------------
 with patch, bond (2 devs):
   XDP_DROP:              6.27%      92.7Mpps
   XDP_TX:                6.26%      17.6Mpps     17.5Mpps
   XDP_DROP (RSS):       11.38%      117.2Mpps
   XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
 --------------------------------------------------------------

RSS: Receive Side Scaling, i.e. the packets were sent to a range of
destination IPs.

[1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
[2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
[3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 drivers/net/bonding/bond_main.c | 284 ++++++++++++++++++++++++++++++++
 include/net/bonding.h           |   1 +
 2 files changed, 285 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index c4dd0d0c701a..8fe5874f155a 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -317,6 +317,19 @@ bool bond_sk_check(struct bonding *bond)
 	}
 }
 
+static bool bond_xdp_check(struct bonding *bond)
+{
+	switch (BOND_MODE(bond)) {
+	case BOND_MODE_ROUNDROBIN:
+	case BOND_MODE_ACTIVEBACKUP:
+	case BOND_MODE_8023AD:
+	case BOND_MODE_XOR:
+		return true;
+	default:
+		return false;
+	}
+}
+
 /*---------------------------------- VLAN -----------------------------------*/
 
 /* In the following 2 functions, bond_vlan_rx_add_vid and bond_vlan_rx_kill_vid,
@@ -2001,6 +2014,28 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
 	if (bond_mode_can_use_xmit_hash(bond))
 		bond_update_slave_arr(bond, NULL);
 
+	if (bond->xdp_prog) {
+		struct netdev_bpf xdp = {
+			.command = XDP_SETUP_PROG,
+			.flags   = 0,
+			.prog    = bond->xdp_prog,
+			.extack  = extack,
+		};
+		if (!slave_dev->netdev_ops->ndo_bpf ||
+		    !slave_dev->netdev_ops->ndo_xdp_xmit) {
+			NL_SET_ERR_MSG(extack, "Slave does not support XDP");
+			slave_err(bond_dev, slave_dev, "Slave does not support XDP\n");
+			res = -EOPNOTSUPP;
+			goto err_sysfs_del;
+		}
+		res = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (res < 0) {
+			/* ndo_bpf() sets extack error message */
+			slave_dbg(bond_dev, slave_dev, "Error %d calling ndo_bpf\n", res);
+			goto err_sysfs_del;
+		}
+		bpf_prog_inc(bond->xdp_prog);
+	}
 
 	slave_info(bond_dev, slave_dev, "Enslaving as %s interface with %s link\n",
 		   bond_is_active_slave(new_slave) ? "an active" : "a backup",
@@ -2121,6 +2156,17 @@ static int __bond_release_one(struct net_device *bond_dev,
 	/* recompute stats just before removing the slave */
 	bond_get_stats(bond->dev, &bond->bond_stats);
 
+	if (bond->xdp_prog) {
+		struct netdev_bpf xdp = {
+			.command = XDP_SETUP_PROG,
+			.flags   = 0,
+			.prog	 = NULL,
+			.extack  = NULL,
+		};
+		if (slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp))
+			slave_warn(bond_dev, slave_dev, "failed to unload XDP program\n");
+	}
+
 	bond_upper_dev_unlink(bond, slave);
 	/* unregister rx_handler early so bond_handle_frame wouldn't be called
 	 * for this slave anymore.
@@ -3672,6 +3718,26 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
 				skb_headlen(skb));
 }
 
+/**
+ * bond_xmit_hash_xdp - generate a hash value based on the xmit policy
+ * @bond: bonding device
+ * @xdp: buffer to use for headers
+ *
+ * The XDP variant of bond_xmit_hash.
+ */
+static u32 bond_xmit_hash_xdp(struct bonding *bond, struct xdp_buff *xdp)
+{
+	struct ethhdr *eth;
+
+	if (xdp->data + sizeof(struct ethhdr) > xdp->data_end)
+		return 0;
+
+	eth = (struct ethhdr *)xdp->data;
+
+	return __bond_xmit_hash(bond, NULL, xdp->data, eth->h_proto, 0,
+				sizeof(struct ethhdr), xdp->data_end - xdp->data);
+}
+
 /*-------------------------- Device entry points ----------------------------*/
 
 void bond_work_init_all(struct bonding *bond)
@@ -4288,6 +4354,47 @@ static struct slave *bond_xmit_roundrobin_slave_get(struct bonding *bond,
 	return NULL;
 }
 
+static struct slave *bond_xdp_xmit_roundrobin_slave_get(struct bonding *bond,
+							struct xdp_buff *xdp)
+{
+	struct slave *slave;
+	int slave_cnt;
+	u32 slave_id;
+	const struct ethhdr *eth;
+	void *data = xdp->data;
+
+	if (data + sizeof(struct ethhdr) > xdp->data_end)
+		goto non_igmp;
+
+	eth = (struct ethhdr *)data;
+	data += sizeof(struct ethhdr);
+
+	/* See comment on IGMP in bond_xmit_roundrobin_slave_get() */
+	if (eth->h_proto == htons(ETH_P_IP)) {
+		const struct iphdr *iph;
+
+		if (data + sizeof(struct iphdr) > xdp->data_end)
+			goto non_igmp;
+
+		iph = (struct iphdr *)data;
+
+		if (iph->protocol == IPPROTO_IGMP) {
+			slave = rcu_dereference(bond->curr_active_slave);
+			if (slave)
+				return slave;
+			return bond_get_slave_by_id(bond, 0);
+		}
+	}
+
+non_igmp:
+	slave_cnt = READ_ONCE(bond->slave_cnt);
+	if (likely(slave_cnt)) {
+		slave_id = bond_rr_gen_slave_id(bond) % slave_cnt;
+		return bond_get_slave_by_id(bond, slave_id);
+	}
+	return NULL;
+}
+
 static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
 					struct net_device *bond_dev)
 {
@@ -4503,6 +4610,22 @@ static struct slave *bond_xmit_3ad_xor_slave_get(struct bonding *bond,
 	return slave;
 }
 
+static struct slave *bond_xdp_xmit_3ad_xor_slave_get(struct bonding *bond,
+						     struct xdp_buff *xdp)
+{
+	struct bond_up_slave *slaves;
+	unsigned int count;
+	u32 hash;
+
+	hash = bond_xmit_hash_xdp(bond, xdp);
+	slaves = bond->usable_slaves;
+	count = slaves ? READ_ONCE(slaves->count) : 0;
+	if (unlikely(!count))
+		return NULL;
+
+	return slaves->arr[hash % count];
+}
+
 /* Use this Xmit function for 3AD as well as XOR modes. The current
  * usable slave array is formed in the control path. The xmit function
  * just calculates hash and sends the packet out.
@@ -4787,6 +4910,164 @@ static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return ret;
 }
 
+static struct net_device *
+bond_xdp_get_xmit_slave(struct net_device *bond_dev, struct xdp_buff *xdp)
+{
+	struct bonding *bond = netdev_priv(bond_dev);
+	struct slave *slave;
+
+	/* Caller needs to hold rcu_read_lock() */
+
+	switch (BOND_MODE(bond)) {
+	case BOND_MODE_ROUNDROBIN:
+		slave = bond_xdp_xmit_roundrobin_slave_get(bond, xdp);
+		break;
+
+	case BOND_MODE_ACTIVEBACKUP:
+		slave = bond_xmit_activebackup_slave_get(bond);
+		break;
+
+	case BOND_MODE_8023AD:
+	case BOND_MODE_XOR:
+		slave = bond_xdp_xmit_3ad_xor_slave_get(bond, xdp);
+		break;
+
+	default:
+		/* Should never happen. Mode guarded by bond_xdp_check() */
+		netdev_err(bond_dev, "Unknown bonding mode %d for xdp xmit\n", BOND_MODE(bond));
+		WARN_ON_ONCE(1);
+		return NULL;
+	}
+
+	if (slave)
+		return slave->dev;
+
+	return NULL;
+}
+
+static int bond_xdp_xmit(struct net_device *bond_dev,
+			 int n, struct xdp_frame **frames, u32 flags)
+{
+	int nxmit, err = -ENXIO;
+
+	rcu_read_lock();
+
+	for (nxmit = 0; nxmit < n; nxmit++) {
+		struct xdp_frame *frame = frames[nxmit];
+		struct xdp_frame *frames1[] = {frame};
+		struct net_device *slave_dev;
+		struct xdp_buff xdp;
+
+		xdp_convert_frame_to_buff(frame, &xdp);
+
+		slave_dev = bond_xdp_get_xmit_slave(bond_dev, &xdp);
+		if (!slave_dev) {
+			err = -ENXIO;
+			break;
+		}
+
+		err = slave_dev->netdev_ops->ndo_xdp_xmit(slave_dev, 1, frames1, flags);
+		if (err < 1)
+			break;
+	}
+
+	rcu_read_unlock();
+
+	/* If error happened on the first frame then we can pass the error up, otherwise
+	 * report the number of frames that were xmitted.
+	 */
+	if (err < 0)
+		return (nxmit == 0 ? err : nxmit);
+
+	return nxmit;
+}
+
+static int bond_xdp_set(struct net_device *dev, struct bpf_prog *prog,
+			struct netlink_ext_ack *extack)
+{
+	struct bonding *bond = netdev_priv(dev);
+	struct list_head *iter;
+	struct slave *slave, *rollback_slave;
+	struct bpf_prog *old_prog;
+	struct netdev_bpf xdp = {
+		.command = XDP_SETUP_PROG,
+		.flags   = 0,
+		.prog    = prog,
+		.extack  = extack,
+	};
+	int err;
+
+	ASSERT_RTNL();
+
+	if (!bond_xdp_check(bond))
+		return -EOPNOTSUPP;
+
+	old_prog = bond->xdp_prog;
+	bond->xdp_prog = prog;
+
+	bond_for_each_slave(bond, slave, iter) {
+		struct net_device *slave_dev = slave->dev;
+
+		if (!slave_dev->netdev_ops->ndo_bpf ||
+		    !slave_dev->netdev_ops->ndo_xdp_xmit) {
+			NL_SET_ERR_MSG(extack, "Slave device does not support XDP");
+			slave_err(dev, slave_dev, "Slave does not support XDP\n");
+			err = -EOPNOTSUPP;
+			goto err;
+		}
+		err = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (err < 0) {
+			/* ndo_bpf() sets extack error message */
+			slave_err(dev, slave_dev, "Error %d calling ndo_bpf\n", err);
+			goto err;
+		}
+		if (prog)
+			bpf_prog_inc(prog);
+	}
+
+	if (old_prog)
+		bpf_prog_put(old_prog);
+
+	if (prog)
+		static_branch_inc(&bpf_master_redirect_enabled_key);
+	else
+		static_branch_dec(&bpf_master_redirect_enabled_key);
+
+	return 0;
+
+err:
+	/* unwind the program changes */
+	bond->xdp_prog = old_prog;
+	xdp.prog = old_prog;
+	xdp.extack = NULL; /* do not overwrite original error */
+
+	bond_for_each_slave(bond, rollback_slave, iter) {
+		struct net_device *slave_dev = rollback_slave->dev;
+		int err_unwind;
+
+		if (slave == rollback_slave)
+			break;
+
+		err_unwind = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (err_unwind < 0)
+			slave_err(dev, slave_dev,
+				  "Error %d when unwinding XDP program change\n", err_unwind);
+		else if (xdp.prog)
+			bpf_prog_inc(xdp.prog);
+	}
+	return err;
+}
+
+static int bond_xdp(struct net_device *dev, struct netdev_bpf *xdp)
+{
+	switch (xdp->command) {
+	case XDP_SETUP_PROG:
+		return bond_xdp_set(dev, xdp->prog, xdp->extack);
+	default:
+		return -EINVAL;
+	}
+}
+
 static u32 bond_mode_bcast_speed(struct slave *slave, u32 speed)
 {
 	if (speed == 0 || speed == SPEED_UNKNOWN)
@@ -4873,6 +5154,9 @@ static const struct net_device_ops bond_netdev_ops = {
 	.ndo_features_check	= passthru_features_check,
 	.ndo_get_xmit_slave	= bond_xmit_get_slave,
 	.ndo_sk_get_lower_dev	= bond_sk_get_lower_dev,
+	.ndo_bpf		= bond_xdp,
+	.ndo_xdp_xmit           = bond_xdp_xmit,
+	.ndo_xdp_get_xmit_slave = bond_xdp_get_xmit_slave,
 };
 
 static const struct device_type bond_type = {
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 019e998d944a..34acb81b4234 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -251,6 +251,7 @@ struct bonding {
 #ifdef CONFIG_XFRM_OFFLOAD
 	struct xfrm_state *xs;
 #endif /* CONFIG_XFRM_OFFLOAD */
+	struct bpf_prog *xdp_prog;
 };
 
 #define bond_slave_get_rcu(dev) \
-- 
2.27.0


^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH bpf-next v2 4/4] devmap: Exclude XDP broadcast to master device
  2021-06-24  9:18 ` [PATCH bpf-next v2 0/4] " joamaki
                     ` (2 preceding siblings ...)
  2021-06-24  9:18   ` [PATCH bpf-next v2 3/4] net: bonding: Add XDP support to the bonding driver joamaki
@ 2021-06-24  9:18   ` joamaki
  2021-07-01 18:12     ` Jay Vosburgh
  2021-07-01 18:20   ` [PATCH bpf-next v2 0/4] XDP bonding support Jay Vosburgh
  4 siblings, 1 reply; 71+ messages in thread
From: joamaki @ 2021-06-24  9:18 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

From: Jussi Maki <joamaki@gmail.com>

If the ingress device is a bond slave, do not broadcast back
through it or the bond master.

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 kernel/bpf/devmap.c | 34 ++++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 2a75e6c2d27d..0864fb28c8b5 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -514,9 +514,11 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 }
 
 static bool is_valid_dst(struct bpf_dtab_netdev *obj, struct xdp_buff *xdp,
-			 int exclude_ifindex)
+			 int exclude_ifindex, int exclude_ifindex_master)
 {
-	if (!obj || obj->dev->ifindex == exclude_ifindex ||
+	if (!obj ||
+	    obj->dev->ifindex == exclude_ifindex ||
+	    obj->dev->ifindex == exclude_ifindex_master ||
 	    !obj->dev->netdev_ops->ndo_xdp_xmit)
 		return false;
 
@@ -546,12 +548,19 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
 	int exclude_ifindex = exclude_ingress ? dev_rx->ifindex : 0;
+	int exclude_ifindex_master = 0;
 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
 	struct hlist_head *head;
 	struct xdp_frame *xdpf;
 	unsigned int i;
 	int err;
 
+	if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) {
+		struct net_device *master = netdev_master_upper_dev_get_rcu(dev_rx);
+
+		exclude_ifindex_master = (master && exclude_ingress) ? master->ifindex : 0;
+	}
+
 	xdpf = xdp_convert_buff_to_frame(xdp);
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
@@ -559,7 +568,7 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
 		for (i = 0; i < map->max_entries; i++) {
 			dst = READ_ONCE(dtab->netdev_map[i]);
-			if (!is_valid_dst(dst, xdp, exclude_ifindex))
+			if (!is_valid_dst(dst, xdp, exclude_ifindex, exclude_ifindex_master))
 				continue;
 
 			/* we only need n-1 clones; last_dst enqueued below */
@@ -579,7 +588,9 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 			head = dev_map_index_hash(dtab, i);
 			hlist_for_each_entry_rcu(dst, head, index_hlist,
 						 lockdep_is_held(&dtab->index_lock)) {
-				if (!is_valid_dst(dst, xdp, exclude_ifindex))
+				if (!is_valid_dst(dst, xdp,
+						  exclude_ifindex,
+						  exclude_ifindex_master))
 					continue;
 
 				/* we only need n-1 clones; last_dst enqueued below */
@@ -646,16 +657,25 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
 	int exclude_ifindex = exclude_ingress ? dev->ifindex : 0;
+	int exclude_ifindex_master = 0;
 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
 	struct hlist_head *head;
 	struct hlist_node *next;
 	unsigned int i;
 	int err;
 
+	if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) {
+		struct net_device *master = netdev_master_upper_dev_get_rcu(dev);
+
+		exclude_ifindex_master = (master && exclude_ingress) ? master->ifindex : 0;
+	}
+
 	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
 		for (i = 0; i < map->max_entries; i++) {
 			dst = READ_ONCE(dtab->netdev_map[i]);
-			if (!dst || dst->dev->ifindex == exclude_ifindex)
+			if (!dst ||
+			    dst->dev->ifindex == exclude_ifindex ||
+			    dst->dev->ifindex == exclude_ifindex_master)
 				continue;
 
 			/* we only need n-1 clones; last_dst enqueued below */
@@ -674,7 +694,9 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
 		for (i = 0; i < dtab->n_buckets; i++) {
 			head = dev_map_index_hash(dtab, i);
 			hlist_for_each_entry_safe(dst, next, head, index_hlist) {
-				if (!dst || dst->dev->ifindex == exclude_ifindex)
+				if (!dst ||
+				    dst->dev->ifindex == exclude_ifindex ||
+				    dst->dev->ifindex == exclude_ifindex_master)
 					continue;
 
 				/* we only need n-1 clones; last_dst enqueued below */
-- 
2.27.0


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH bpf-next v2 4/4] devmap: Exclude XDP broadcast to master device
  2021-06-24  9:18   ` [PATCH bpf-next v2 4/4] devmap: Exclude XDP broadcast to master device joamaki
@ 2021-07-01 18:12     ` Jay Vosburgh
  2021-07-05 11:44       ` Jussi Maki
  0 siblings, 1 reply; 71+ messages in thread
From: Jay Vosburgh @ 2021-07-01 18:12 UTC (permalink / raw)
  To: joamaki
  Cc: bpf, netdev, daniel, andy, vfalico, andrii, maciej.fijalkowski,
	magnus.karlsson

joamaki@gmail.com wrote:

>From: Jussi Maki <joamaki@gmail.com>
>
>If the ingress device is bond slave, do not broadcast back
>through it or the bond master.
>
>Signed-off-by: Jussi Maki <joamaki@gmail.com>
>---
> kernel/bpf/devmap.c | 34 ++++++++++++++++++++++++++++------
> 1 file changed, 28 insertions(+), 6 deletions(-)
>
>diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
>index 2a75e6c2d27d..0864fb28c8b5 100644
>--- a/kernel/bpf/devmap.c
>+++ b/kernel/bpf/devmap.c
>@@ -514,9 +514,11 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
> }
> 
> static bool is_valid_dst(struct bpf_dtab_netdev *obj, struct xdp_buff *xdp,
>-			 int exclude_ifindex)
>+			 int exclude_ifindex, int exclude_ifindex_master)
> {
>-	if (!obj || obj->dev->ifindex == exclude_ifindex ||
>+	if (!obj ||
>+	    obj->dev->ifindex == exclude_ifindex ||
>+	    obj->dev->ifindex == exclude_ifindex_master ||
> 	    !obj->dev->netdev_ops->ndo_xdp_xmit)
> 		return false;
> 
>@@ -546,12 +548,19 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
> {
> 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
> 	int exclude_ifindex = exclude_ingress ? dev_rx->ifindex : 0;
>+	int exclude_ifindex_master = 0;
> 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
> 	struct hlist_head *head;
> 	struct xdp_frame *xdpf;
> 	unsigned int i;
> 	int err;
> 
>+	if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) {
>+		struct net_device *master = netdev_master_upper_dev_get_rcu(dev_rx);
>+
>+		exclude_ifindex_master = (master && exclude_ingress) ? master->ifindex : 0;
>+	}
>+

	Will the above logic do what is intended if the device stacking
isn't a simple bond -> ethX arrangement?  I.e., bond -> VLAN.?? -> ethX
or perhaps even bondA -> VLAN.?? -> bondB -> ethX ?

	-J

> 	xdpf = xdp_convert_buff_to_frame(xdp);
> 	if (unlikely(!xdpf))
> 		return -EOVERFLOW;
>@@ -559,7 +568,7 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
> 	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
> 		for (i = 0; i < map->max_entries; i++) {
> 			dst = READ_ONCE(dtab->netdev_map[i]);
>-			if (!is_valid_dst(dst, xdp, exclude_ifindex))
>+			if (!is_valid_dst(dst, xdp, exclude_ifindex, exclude_ifindex_master))
> 				continue;
> 
> 			/* we only need n-1 clones; last_dst enqueued below */
>@@ -579,7 +588,9 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
> 			head = dev_map_index_hash(dtab, i);
> 			hlist_for_each_entry_rcu(dst, head, index_hlist,
> 						 lockdep_is_held(&dtab->index_lock)) {
>-				if (!is_valid_dst(dst, xdp, exclude_ifindex))
>+				if (!is_valid_dst(dst, xdp,
>+						  exclude_ifindex,
>+						  exclude_ifindex_master))
> 					continue;
> 
> 				/* we only need n-1 clones; last_dst enqueued below */
>@@ -646,16 +657,25 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
> {
> 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
> 	int exclude_ifindex = exclude_ingress ? dev->ifindex : 0;
>+	int exclude_ifindex_master = 0;
> 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
> 	struct hlist_head *head;
> 	struct hlist_node *next;
> 	unsigned int i;
> 	int err;
> 
>+	if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) {
>+		struct net_device *master = netdev_master_upper_dev_get_rcu(dev);
>+
>+		exclude_ifindex_master = (master && exclude_ingress) ? master->ifindex : 0;
>+	}
>+
> 	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
> 		for (i = 0; i < map->max_entries; i++) {
> 			dst = READ_ONCE(dtab->netdev_map[i]);
>-			if (!dst || dst->dev->ifindex == exclude_ifindex)
>+			if (!dst ||
>+			    dst->dev->ifindex == exclude_ifindex ||
>+			    dst->dev->ifindex == exclude_ifindex_master)
> 				continue;
> 
> 			/* we only need n-1 clones; last_dst enqueued below */
>@@ -674,7 +694,9 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
> 		for (i = 0; i < dtab->n_buckets; i++) {
> 			head = dev_map_index_hash(dtab, i);
> 			hlist_for_each_entry_safe(dst, next, head, index_hlist) {
>-				if (!dst || dst->dev->ifindex == exclude_ifindex)
>+				if (!dst ||
>+				    dst->dev->ifindex == exclude_ifindex ||
>+				    dst->dev->ifindex == exclude_ifindex_master)
> 					continue;
> 
> 				/* we only need n-1 clones; last_dst enqueued below */
>-- 
>2.27.0
>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH bpf-next v2 0/4] XDP bonding support
  2021-06-24  9:18 ` [PATCH bpf-next v2 0/4] " joamaki
                     ` (3 preceding siblings ...)
  2021-06-24  9:18   ` [PATCH bpf-next v2 4/4] devmap: Exclude XDP broadcast to master device joamaki
@ 2021-07-01 18:20   ` Jay Vosburgh
  2021-07-05 10:32     ` Jussi Maki
  4 siblings, 1 reply; 71+ messages in thread
From: Jay Vosburgh @ 2021-07-01 18:20 UTC (permalink / raw)
  To: joamaki
  Cc: bpf, netdev, daniel, andy, vfalico, andrii, maciej.fijalkowski,
	magnus.karlsson

joamaki@gmail.com wrote:

>From: Jussi Maki <joamaki@gmail.com>
>
>This patchset introduces XDP support to the bonding driver.
>
>The motivation for this change is to enable use of bonding (and
>802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
>XDP and also to transparently support bond devices for projects that
>use XDP given most modern NICs have dual port adapters.  An alternative
>to this approach would be to implement 802.3ad in user-space and
>implement the bonding load-balancing in the XDP program itself, but
>is rather a cumbersome endeavor in terms of slave device management
>(e.g. by watching netlink) and requires separate programs for native
>vs bond cases for the orchestrator. A native in-kernel implementation
>overcomes these issues and provides more flexibility.
>
>Below are benchmark results done on two machines with 100Gbit
>Intel E810 (ice) NIC and with 32-core 3970X on sending machine, and
>16-core 3950X on receiving machine. 64 byte packets were sent with
>pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
>ice driver, so the tests were performed with iommu=off and patch [2]
>applied. Additionally the bonding round robin algorithm was modified
>to use per-cpu tx counters as high CPU load (50% vs 10%) and high rate
>of cache misses were caused by the shared rr_tx_counter. Fix for this
>has been already merged into net-next. The statistics were collected 
>using "sar -n dev -u 1 10".
>
> -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
> without patch (1 dev):
>   XDP_DROP:              3.15%      48.6Mpps
>   XDP_TX:                3.12%      18.3Mpps     18.3Mpps
>   XDP_DROP (RSS):        9.47%      116.5Mpps
>   XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
> -----------------------
> with patch, bond (1 dev):
>   XDP_DROP:              3.14%      46.7Mpps
>   XDP_TX:                3.15%      13.9Mpps     13.9Mpps
>   XDP_DROP (RSS):        10.33%     117.2Mpps
>   XDP_TX (RSS):          10.64%     25.1Mpps     24.0Mpps
> -----------------------
> with patch, bond (2 devs):
>   XDP_DROP:              6.27%      92.7Mpps
>   XDP_TX:                6.26%      17.6Mpps     17.5Mpps
>   XDP_DROP (RSS):       11.38%      117.2Mpps
>   XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
> --------------------------------------------------------------

	To be clear, the fact that the performance numbers for XDP_DROP
and XDP_TX are lower for "with patch, bond (1 dev)" than "without patch
(1 dev)" is expected, correct?

	-J

>RSS: Receive Side Scaling, e.g. the packets were sent to a range of
>destination IPs.
>
>[1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
>[2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
>[3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/
>
>Patch 1 prepares bond_xmit_hash for hashing xdp_buff's
>Patch 2 adds hooks to implement redirection after bpf prog run
>Patch 3 implements the hooks in the bonding driver. 
>Patch 4 modifies devmap to properly handle EXCLUDE_INGRESS with a slave device.
>
>v1->v2:
>- Split up into smaller easier to review patches and address cosmetic 
>  review comments.
>- Drop the INDIRECT_CALL optimization as it showed little improvement in tests.
>- Drop the rr_tx_counter patch as that has already been merged into net-next.
>- Separate the test suite into another patch set. This will follow later once a
>  patch set from Magnus Karlsson is merged and provides test utilities that can
>  be reused for XDP bonding tests. v2 contains no major functional changes and
>  was tested with the test suite included in v1.
>  (https://lore.kernel.org/bpf/202106221509.kwNvAAZg-lkp@intel.com/T/#m464146d47299125d5868a08affd6d6ce526dfad1)
>
>---
>
>Jussi Maki (4):
>  net: bonding: Refactor bond_xmit_hash for use with xdp_buff
>  net: core: Add support for XDP redirection to slave device
>  net: bonding: Add XDP support to the bonding driver
>  devmap: Exclude XDP broadcast to master device
>
> drivers/net/bonding/bond_main.c | 431 +++++++++++++++++++++++++++-----
> include/linux/filter.h          |  13 +-
> include/linux/netdevice.h       |   5 +
> include/net/bonding.h           |   1 +
> kernel/bpf/devmap.c             |  34 ++-
> net/core/filter.c               |  25 ++
> 6 files changed, 445 insertions(+), 64 deletions(-)
>
>-- 
>2.27.0

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH bpf-next v2 0/4] XDP bonding support
  2021-07-01 18:20   ` [PATCH bpf-next v2 0/4] XDP bonding support Jay Vosburgh
@ 2021-07-05 10:32     ` Jussi Maki
  0 siblings, 0 replies; 71+ messages in thread
From: Jussi Maki @ 2021-07-05 10:32 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: bpf, Network Development, Daniel Borkmann, Andy Gospodarek,
	vfalico, Andrii Nakryiko, Maciej Fijalkowski, Karlsson, Magnus

On Thu, Jul 1, 2021 at 9:20 PM Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>
> joamaki@gmail.com wrote:
>
> >From: Jussi Maki <joamaki@gmail.com>
> >
> >This patchset introduces XDP support to the bonding driver.
> >
> >The motivation for this change is to enable use of bonding (and
> >802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
> >XDP and also to transparently support bond devices for projects that
> >use XDP given most modern NICs have dual port adapters.  An alternative
> >to this approach would be to implement 802.3ad in user-space and
> >implement the bonding load-balancing in the XDP program itself, but
> >is rather a cumbersome endeavor in terms of slave device management
> >(e.g. by watching netlink) and requires separate programs for native
> >vs bond cases for the orchestrator. A native in-kernel implementation
> >overcomes these issues and provides more flexibility.
> >
> >Below are benchmark results done on two machines with 100Gbit
> >Intel E810 (ice) NIC and with 32-core 3970X on sending machine, and
> >16-core 3950X on receiving machine. 64 byte packets were sent with
> >pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
> >ice driver, so the tests were performed with iommu=off and patch [2]
> >applied. Additionally the bonding round robin algorithm was modified
> >to use per-cpu tx counters as high CPU load (50% vs 10%) and high rate
> >of cache misses were caused by the shared rr_tx_counter. Fix for this
> >has been already merged into net-next. The statistics were collected
> >using "sar -n dev -u 1 10".
> >
> > -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
> > without patch (1 dev):
> >   XDP_DROP:              3.15%      48.6Mpps
> >   XDP_TX:                3.12%      18.3Mpps     18.3Mpps
> >   XDP_DROP (RSS):        9.47%      116.5Mpps
> >   XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
> > -----------------------
> > with patch, bond (1 dev):
> >   XDP_DROP:              3.14%      46.7Mpps
> >   XDP_TX:                3.15%      13.9Mpps     13.9Mpps
> >   XDP_DROP (RSS):        10.33%     117.2Mpps
> >   XDP_TX (RSS):          10.64%     25.1Mpps     24.0Mpps
> > -----------------------
> > with patch, bond (2 devs):
> >   XDP_DROP:              6.27%      92.7Mpps
> >   XDP_TX:                6.26%      17.6Mpps     17.5Mpps
> >   XDP_DROP (RSS):       11.38%      117.2Mpps
> >   XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
> > --------------------------------------------------------------
>
>         To be clear, the fact that the performance numbers for XDP_DROP
> and XDP_TX are lower for "with patch, bond (1 dev)" than "without patch
> (1 dev)" is expected, correct?

Yes, that is correct. With the patch, the ndo callback for choosing the
slave device is invoked, which in this test (mode=xor) hashes the L2 and L3
headers (I seem to have failed to mention this in the original
message). In round-robin mode I recall it being about 16Mpps versus
the 18Mpps without the patch. I also tried "INDIRECT_CALL" to avoid
going via ndo_ops, but that had no discernible effect.
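
To illustrate the per-packet work described above, here is a minimal
user-space sketch of an L2+L3 hash followed by slave selection. The L2 term
(last byte of each MAC XORed with the protocol) and the "hash % count"
selection follow the patches in this thread; the IPv4 address mixing and the
folding step are a simplified assumption, not the exact kernel code. All
sketch_* names and struct sketch_ethhdr are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical user-space stand-in for the kernel's struct ethhdr. */
struct sketch_ethhdr {
	uint8_t  h_dest[6];
	uint8_t  h_source[6];
	uint16_t h_proto;	/* network byte order */
} __attribute__((packed));

#define SKETCH_ETH_P_IP   0x0800
#define SKETCH_IPV4_HLEN  20	/* minimal IPv4 header, no options */

/* Hash the L2 header and, for IPv4, the source/destination addresses. */
static uint32_t sketch_l23_hash(const uint8_t *data, size_t len)
{
	struct sketch_ethhdr eth;
	uint32_t hash, saddr, daddr;

	/* Too short for an Ethernet header: fall back to hash 0. */
	if (len < sizeof(eth))
		return 0;
	memcpy(&eth, data, sizeof(eth));

	/* L2 part, as in bond_eth_hash() from patch 1. */
	hash = eth.h_dest[5] ^ eth.h_source[5] ^ eth.h_proto;

	/* L3 part (assumed mixing): XOR in the addresses, then fold. */
	if (eth.h_proto == htons(SKETCH_ETH_P_IP) &&
	    len >= sizeof(eth) + SKETCH_IPV4_HLEN) {
		memcpy(&saddr, data + sizeof(eth) + 12, sizeof(saddr));
		memcpy(&daddr, data + sizeof(eth) + 16, sizeof(daddr));
		hash ^= saddr ^ daddr;
		hash ^= hash >> 16;
		hash ^= hash >> 8;
	}

	return hash;
}

/* Map a hash onto one of 'count' usable slaves; -1 if there are none. */
static int sketch_pick_slave(uint32_t hash, unsigned int count)
{
	if (!count)
		return -1;
	return (int)(hash % count);
}
```

Every packet of a given flow produces the same hash and therefore lands on
the same slave. The extra header parsing per packet is the cost that a plain
XDP_TX on a non-bonded device does not pay.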

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH bpf-next v2 4/4] devmap: Exclude XDP broadcast to master device
  2021-07-01 18:12     ` Jay Vosburgh
@ 2021-07-05 11:44       ` Jussi Maki
  0 siblings, 0 replies; 71+ messages in thread
From: Jussi Maki @ 2021-07-05 11:44 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: bpf, Network Development, Daniel Borkmann, Andy Gospodarek,
	vfalico, Andrii Nakryiko, Maciej Fijalkowski, Karlsson, Magnus

On Thu, Jul 1, 2021 at 9:12 PM Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
> >+      if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) {
> >+              struct net_device *master = netdev_master_upper_dev_get_rcu(dev_rx);
> >+
> >+              exclude_ifindex_master = (master && exclude_ingress) ? master->ifindex : 0;
> >+      }
> >+
>
>         Will the above logic do what is intended if the device stacking
> isn't a simple bond -> ethX arrangement?  I.e., bond -> VLAN.?? -> ethX
> or perhaps even bondA -> VLAN.?? -> bondB -> ethX ?

Good point. "bond -> VLAN -> eth" isn't an issue currently as vlan
devices do not support XDP. "bondA -> bondB -> ethX" however would be
supported, so I think it makes sense to change the code to collect all
upper devices and exclude them. I'll try to follow up with an updated
patch for this soon.
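
As a rough illustration of the idea (not the follow-up patch itself), the
exclusion could be modelled in user space as below. This assumes each device
has at most one master, so the upper devices form a simple chain; struct
sketch_dev and the sketch_* helpers are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a net device with at most one master above it. */
struct sketch_dev {
	int ifindex;
	struct sketch_dev *master;	/* upper device, NULL if none */
};

/*
 * Collect the ifindex of the ingress device and of every device stacked
 * above it. Returns the number of entries written (at most 'max').
 */
static size_t sketch_collect_excluded(const struct sketch_dev *dev,
				      int *excluded, size_t max)
{
	size_t n = 0;

	for (; dev && n < max; dev = dev->master)
		excluded[n++] = dev->ifindex;

	return n;
}

/* Check whether a broadcast destination is in the exclusion set. */
static bool sketch_is_excluded(int ifindex, const int *excluded, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (excluded[i] == ifindex)
			return true;

	return false;
}
```

With a "bondA -> bondB -> ethX" stack, a packet arriving on ethX would then
be kept away from ethX, bondB and bondA, while unrelated devices in the map
still receive the broadcast.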

^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH bpf-next v3 0/5] XDP bonding support
  2021-06-09 13:55 [PATCH bpf-next 0/3] XDP bonding support Jussi Maki
                   ` (4 preceding siblings ...)
  2021-06-24  9:18 ` [PATCH bpf-next v2 0/4] " joamaki
@ 2021-07-07 11:25 ` Jussi Maki
  2021-07-07 11:25   ` [PATCH bpf-next v3 1/5] net: bonding: Refactor bond_xmit_hash for use with xdp_buff Jussi Maki
                     ` (4 more replies)
  2021-07-28 23:43 ` [PATCH bpf-next v4 0/6] XDP bonding support joamaki
                   ` (2 subsequent siblings)
  8 siblings, 5 replies; 71+ messages in thread
From: Jussi Maki @ 2021-07-07 11:25 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

This patchset introduces XDP support to the bonding driver.

The motivation for this change is to enable use of bonding (and
802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
XDP and also to transparently support bond devices for projects that
use XDP given most modern NICs have dual port adapters.  An alternative
to this approach would be to implement 802.3ad in user-space and
implement the bonding load-balancing in the XDP program itself, but
that is a rather cumbersome endeavor in terms of slave device management
(e.g. by watching netlink) and requires separate programs for native
vs bond cases for the orchestrator. A native in-kernel implementation
overcomes these issues and provides more flexibility.

Below are benchmark results done on two machines with 100Gbit
Intel E810 (ice) NIC and with 32-core 3970X on sending machine, and
16-core 3950X on receiving machine. 64 byte packets were sent with
pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
ice driver, so the tests were performed with iommu=off and patch [2]
applied. Additionally, the bonding round-robin algorithm was modified
to use per-cpu tx counters, as the shared rr_tx_counter caused high CPU
load (50% vs 10%) and a high rate of cache misses. A fix for this
has already been merged into net-next. The statistics were collected
using "sar -n dev -u 1 10".

 -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
 without patch (1 dev):
   XDP_DROP:              3.15%      48.6Mpps
   XDP_TX:                3.12%      18.3Mpps     18.3Mpps
   XDP_DROP (RSS):        9.47%      116.5Mpps
   XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
 -----------------------
 with patch, bond (1 dev):
   XDP_DROP:              3.14%      46.7Mpps
   XDP_TX:                3.15%      13.9Mpps     13.9Mpps
   XDP_DROP (RSS):        10.33%     117.2Mpps
   XDP_TX (RSS):          10.64%     25.1Mpps     24.0Mpps
 -----------------------
 with patch, bond (2 devs):
   XDP_DROP:              6.27%      92.7Mpps
   XDP_TX:                6.26%      17.6Mpps     17.5Mpps
   XDP_DROP (RSS):       11.38%      117.2Mpps
   XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
 --------------------------------------------------------------

RSS: Receive Side Scaling, i.e. the packets were sent to a range of
destination IPs.

[1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
[2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
[3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/

Patch 1 prepares bond_xmit_hash for hashing xdp_buffs.
Patch 2 adds hooks to implement redirection after the bpf prog has run.
Patch 3 implements the hooks in the bonding driver.
Patch 4 modifies devmap to properly handle EXCLUDE_INGRESS with a slave device.
Patch 5 fixes an issue related to recent cleanup of rcu_read_lock in XDP context.

v2->v3:
- Address Jay's comment to properly exclude upper devices with EXCLUDE_INGRESS
  when deeper nesting is involved. Now all upper devices are excluded.
- Refuse to enslave devices that already have XDP programs loaded and refuse to
  load XDP programs onto slave devices. Previously one could have an XDP program
  loaded and, after enslaving the device and loading another program onto the bond
  device, the xdp_state of the enslaved device would still point at the old program.
- Adapt netdev_lower_get_next_private_rcu so it can be called in the XDP context.

v1->v2:
- Split up into smaller, easier-to-review patches and address cosmetic
  review comments.
- Drop the INDIRECT_CALL optimization as it showed little improvement in tests.
- Drop the rr_tx_counter patch as that has already been merged into net-next.
- Separate the test suite into another patch set. This will follow later once a
  patch set from Magnus Karlsson is merged and provides test utilities that can
  be reused for XDP bonding tests. v2 contains no major functional changes and
  was tested with the test suite included in v1.
  (https://lore.kernel.org/bpf/202106221509.kwNvAAZg-lkp@intel.com/T/#m464146d47299125d5868a08affd6d6ce526dfad1)

---

Jussi Maki (5):
  net: bonding: Refactor bond_xmit_hash for use with xdp_buff
  net: core: Add support for XDP redirection to slave device
  net: bonding: Add XDP support to the bonding driver
  devmap: Exclude XDP broadcast to master device
  net: core: Allow netdev_lower_get_next_private_rcu in bh context

 drivers/net/bonding/bond_main.c | 450 ++++++++++++++++++++++++++++----
 include/linux/filter.h          |  13 +-
 include/linux/netdevice.h       |   6 +
 include/net/bonding.h           |   1 +
 kernel/bpf/devmap.c             |  67 ++++-
 net/core/dev.c                  |  11 +-
 net/core/filter.c               |  25 ++
 7 files changed, 504 insertions(+), 69 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH bpf-next v3 1/5] net: bonding: Refactor bond_xmit_hash for use with xdp_buff
  2021-07-07 11:25 ` [PATCH bpf-next v3 0/5] " Jussi Maki
@ 2021-07-07 11:25   ` Jussi Maki
  2021-07-07 11:25   ` [PATCH bpf-next v3 2/5] net: core: Add support for XDP redirection to slave device Jussi Maki
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 71+ messages in thread
From: Jussi Maki @ 2021-07-07 11:25 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

In preparation for adding XDP support to the bonding driver,
refactor the packet hashing functions so that they can work with
any linear data buffer without an skb.

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 drivers/net/bonding/bond_main.c | 147 +++++++++++++++++++-------------
 1 file changed, 90 insertions(+), 57 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 0ff7567bd04f..78284c451668 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3487,55 +3487,80 @@ static struct notifier_block bond_netdev_notifier = {
 
 /*---------------------------- Hashing Policies -----------------------------*/
 
+/* Helper to access data in a packet, with or without a backing skb.
+ * If skb is given the data is linearized if necessary via pskb_may_pull.
+ */
+static inline const void *bond_pull_data(struct sk_buff *skb,
+					 const void *data, int hlen, int n)
+{
+	if (likely(n <= hlen))
+		return data;
+	else if (skb && likely(pskb_may_pull(skb, n)))
+		return skb->head;
+
+	return NULL;
+}
+
 /* L2 hash helper */
-static inline u32 bond_eth_hash(struct sk_buff *skb)
+static inline u32 bond_eth_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
 {
-	struct ethhdr *ep, hdr_tmp;
+	struct ethhdr *ep;
 
-	ep = skb_header_pointer(skb, 0, sizeof(hdr_tmp), &hdr_tmp);
-	if (ep)
-		return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
-	return 0;
+	data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
+	if (!data)
+		return 0;
+
+	ep = (struct ethhdr *)(data + mhoff);
+	return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
 }
 
-static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
-			 int *noff, int *proto, bool l34)
+static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk, const void *data,
+			 int hlen, __be16 l2_proto, int *nhoff, int *ip_proto, bool l34)
 {
 	const struct ipv6hdr *iph6;
 	const struct iphdr *iph;
 
-	if (skb->protocol == htons(ETH_P_IP)) {
-		if (unlikely(!pskb_may_pull(skb, *noff + sizeof(*iph))))
+	if (l2_proto == htons(ETH_P_IP)) {
+		data = bond_pull_data(skb, data, hlen, *nhoff + sizeof(*iph));
+		if (!data)
 			return false;
-		iph = (const struct iphdr *)(skb->data + *noff);
+
+		iph = (const struct iphdr *)(data + *nhoff);
 		iph_to_flow_copy_v4addrs(fk, iph);
-		*noff += iph->ihl << 2;
+		*nhoff += iph->ihl << 2;
 		if (!ip_is_fragment(iph))
-			*proto = iph->protocol;
-	} else if (skb->protocol == htons(ETH_P_IPV6)) {
-		if (unlikely(!pskb_may_pull(skb, *noff + sizeof(*iph6))))
+			*ip_proto = iph->protocol;
+	} else if (l2_proto == htons(ETH_P_IPV6)) {
+		data = bond_pull_data(skb, data, hlen, *nhoff + sizeof(*iph6));
+		if (!data)
 			return false;
-		iph6 = (const struct ipv6hdr *)(skb->data + *noff);
+
+		iph6 = (const struct ipv6hdr *)(data + *nhoff);
 		iph_to_flow_copy_v6addrs(fk, iph6);
-		*noff += sizeof(*iph6);
-		*proto = iph6->nexthdr;
+		*nhoff += sizeof(*iph6);
+		*ip_proto = iph6->nexthdr;
 	} else {
 		return false;
 	}
 
-	if (l34 && *proto >= 0)
-		fk->ports.ports = skb_flow_get_ports(skb, *noff, *proto);
+	if (l34 && *ip_proto >= 0)
+		fk->ports.ports = __skb_flow_get_ports(skb, *nhoff, *ip_proto, data, hlen);
 
 	return true;
 }
 
-static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
+static u32 bond_vlan_srcmac_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
 {
-	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
+	struct ethhdr *mac_hdr;
 	u32 srcmac_vendor = 0, srcmac_dev = 0;
 	u16 vlan;
 	int i;
 
+	data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
+	if (!data)
+		return 0;
+	mac_hdr = (struct ethhdr *)(data + mhoff);
+
 	for (i = 0; i < 3; i++)
 		srcmac_vendor = (srcmac_vendor << 8) | mac_hdr->h_source[i];
 
@@ -3551,26 +3576,25 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
 }
 
 /* Extract the appropriate headers based on bond's xmit policy */
-static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
-			      struct flow_keys *fk)
+static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb, const void *data,
+			      __be16 l2_proto, int nhoff, int hlen, struct flow_keys *fk)
 {
 	bool l34 = bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34;
-	int noff, proto = -1;
+	int ip_proto = -1;
 
 	switch (bond->params.xmit_policy) {
 	case BOND_XMIT_POLICY_ENCAP23:
 	case BOND_XMIT_POLICY_ENCAP34:
 		memset(fk, 0, sizeof(*fk));
 		return __skb_flow_dissect(NULL, skb, &flow_keys_bonding,
-					  fk, NULL, 0, 0, 0, 0);
+					  fk, data, l2_proto, nhoff, hlen, 0);
 	default:
 		break;
 	}
 
 	fk->ports.ports = 0;
 	memset(&fk->icmp, 0, sizeof(fk->icmp));
-	noff = skb_network_offset(skb);
-	if (!bond_flow_ip(skb, fk, &noff, &proto, l34))
+	if (!bond_flow_ip(skb, fk, data, hlen, l2_proto, &nhoff, &ip_proto, l34))
 		return false;
 
 	/* ICMP error packets contains at least 8 bytes of the header
@@ -3578,22 +3602,20 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
 	 * to correlate ICMP error packets within the same flow which
 	 * generated the error.
 	 */
-	if (proto == IPPROTO_ICMP || proto == IPPROTO_ICMPV6) {
-		skb_flow_get_icmp_tci(skb, &fk->icmp, skb->data,
-				      skb_transport_offset(skb),
-				      skb_headlen(skb));
-		if (proto == IPPROTO_ICMP) {
+	if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
+		skb_flow_get_icmp_tci(skb, &fk->icmp, data, nhoff, hlen);
+		if (ip_proto == IPPROTO_ICMP) {
 			if (!icmp_is_err(fk->icmp.type))
 				return true;
 
-			noff += sizeof(struct icmphdr);
-		} else if (proto == IPPROTO_ICMPV6) {
+			nhoff += sizeof(struct icmphdr);
+		} else if (ip_proto == IPPROTO_ICMPV6) {
 			if (!icmpv6_is_err(fk->icmp.type))
 				return true;
 
-			noff += sizeof(struct icmp6hdr);
+			nhoff += sizeof(struct icmp6hdr);
 		}
-		return bond_flow_ip(skb, fk, &noff, &proto, l34);
+		return bond_flow_ip(skb, fk, data, hlen, l2_proto, &nhoff, &ip_proto, l34);
 	}
 
 	return true;
@@ -3609,33 +3631,26 @@ static u32 bond_ip_hash(u32 hash, struct flow_keys *flow)
 	return hash >> 1;
 }
 
-/**
- * bond_xmit_hash - generate a hash value based on the xmit policy
- * @bond: bonding device
- * @skb: buffer to use for headers
- *
- * This function will extract the necessary headers from the skb buffer and use
- * them to generate a hash based on the xmit_policy set in the bonding device
+/* Generate hash based on xmit policy. If @skb is given it is used to linearize
+ * the data as required, but this function can be used without it if the data is
+ * known to be linear (e.g. with xdp_buff).
  */
-u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
+static u32 __bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, const void *data,
+			    __be16 l2_proto, int mhoff, int nhoff, int hlen)
 {
 	struct flow_keys flow;
 	u32 hash;
 
-	if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
-	    skb->l4_hash)
-		return skb->hash;
-
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC)
-		return bond_vlan_srcmac_hash(skb);
+		return bond_vlan_srcmac_hash(skb, data, mhoff, hlen);
 
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
-	    !bond_flow_dissect(bond, skb, &flow))
-		return bond_eth_hash(skb);
+	    !bond_flow_dissect(bond, skb, data, l2_proto, nhoff, hlen, &flow))
+		return bond_eth_hash(skb, data, mhoff, hlen);
 
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
 	    bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23) {
-		hash = bond_eth_hash(skb);
+		hash = bond_eth_hash(skb, data, mhoff, hlen);
 	} else {
 		if (flow.icmp.id)
 			memcpy(&hash, &flow.icmp, sizeof(hash));
@@ -3646,6 +3661,25 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
 	return bond_ip_hash(hash, &flow);
 }
 
+/**
+ * bond_xmit_hash - generate a hash value based on the xmit policy
+ * @bond: bonding device
+ * @skb: buffer to use for headers
+ *
+ * This function will extract the necessary headers from the skb buffer and use
+ * them to generate a hash based on the xmit_policy set in the bonding device
+ */
+u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
+{
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
+	    skb->l4_hash)
+		return skb->hash;
+
+	return __bond_xmit_hash(bond, skb, skb->head, skb->protocol,
+				skb->mac_header, skb->network_header,
+				skb_headlen(skb));
+}
+
 /*-------------------------- Device entry points ----------------------------*/
 
 void bond_work_init_all(struct bonding *bond)
@@ -4275,8 +4309,7 @@ static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
 	return bond_tx_drop(bond_dev, skb);
 }
 
-static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond,
-						      struct sk_buff *skb)
+static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond)
 {
 	return rcu_dereference(bond->curr_active_slave);
 }
@@ -4290,7 +4323,7 @@ static netdev_tx_t bond_xmit_activebackup(struct sk_buff *skb,
 	struct bonding *bond = netdev_priv(bond_dev);
 	struct slave *slave;
 
-	slave = bond_xmit_activebackup_slave_get(bond, skb);
+	slave = bond_xmit_activebackup_slave_get(bond);
 	if (slave)
 		return bond_dev_queue_xmit(bond, skb, slave->dev);
 
@@ -4588,7 +4621,7 @@ static struct net_device *bond_xmit_get_slave(struct net_device *master_dev,
 		slave = bond_xmit_roundrobin_slave_get(bond, skb);
 		break;
 	case BOND_MODE_ACTIVEBACKUP:
-		slave = bond_xmit_activebackup_slave_get(bond, skb);
+		slave = bond_xmit_activebackup_slave_get(bond);
 		break;
 	case BOND_MODE_8023AD:
 	case BOND_MODE_XOR:
-- 
2.27.0



* [PATCH bpf-next v3 2/5] net: core: Add support for XDP redirection to slave device
  2021-07-07 11:25 ` [PATCH bpf-next v3 0/5] " Jussi Maki
  2021-07-07 11:25   ` [PATCH bpf-next v3 1/5] net: bonding: Refactor bond_xmit_hash for use with xdp_buff Jussi Maki
@ 2021-07-07 11:25   ` Jussi Maki
  2021-07-07 11:25   ` [PATCH bpf-next v3 3/5] net: bonding: Add XDP support to the bonding driver Jussi Maki
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 71+ messages in thread
From: Jussi Maki @ 2021-07-07 11:25 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

This adds the ndo_xdp_get_xmit_slave hook, which is used to transform
XDP_TX into XDP_REDIRECT after the BPF program has run when the
ingress device is a bond slave.

dev_xdp_prog_count is exposed so that slave devices can be checked
for loaded XDP programs, in order to avoid the situation where both
the bond master and a slave have programs loaded according to
xdp_state.
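
The action rewrite described above can be modelled in userspace
roughly as follows. This is only a sketch: dev_sketch, its xmit_slave
field (standing in for whatever the master's ndo_xdp_get_xmit_slave
would return) and the ACT_* codes are hypothetical names for this
example.

```c
#include <stddef.h>

/* Hypothetical action codes, not the kernel's XDP_* values. */
enum act_sketch { ACT_DROP, ACT_PASS, ACT_TX, ACT_REDIRECT };

struct dev_sketch {
	int ifindex;
	struct dev_sketch *master;	/* NULL when the device is not enslaved */
	struct dev_sketch *xmit_slave;	/* on a master: slave its policy picks */
};

/* Rewrite the program's verdict for a packet received on rx_dev.
 * Only ACT_TX on an enslaved device is touched; if the master picks a
 * different slave, the verdict becomes ACT_REDIRECT to that slave.
 */
static int rewrite_action(int act, struct dev_sketch *rx_dev, int *tgt_ifindex)
{
	struct dev_sketch *slave;

	if (act != ACT_TX || !rx_dev->master)
		return act;

	slave = rx_dev->master->xmit_slave;
	if (slave && slave != rx_dev) {
		*tgt_ifindex = slave->ifindex;
		return ACT_REDIRECT;
	}
	return ACT_TX;
}
```

When the chosen slave is the receiving device itself, the verdict
stays a plain TX, so the common single-port case pays nothing beyond
the lookup.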

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 include/linux/filter.h    | 13 ++++++++++++-
 include/linux/netdevice.h |  6 ++++++
 net/core/dev.c            |  9 ++++++++-
 net/core/filter.c         | 25 +++++++++++++++++++++++++
 4 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 472f97074da0..63b426c58a45 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -760,6 +760,10 @@ static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
 
 DECLARE_BPF_DISPATCHER(xdp)
 
+DECLARE_STATIC_KEY_FALSE(bpf_master_redirect_enabled_key);
+
+u32 xdp_master_redirect(struct xdp_buff *xdp);
+
 static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 					    struct xdp_buff *xdp)
 {
@@ -767,7 +771,14 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 	 * under local_bh_disable(), which provides the needed RCU protection
 	 * for accessing map entries.
 	 */
-	return __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
+	u32 act = __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
+
+	if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) {
+		if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev))
+			act = xdp_master_redirect(xdp);
+	}
+
+	return act;
 }
 
 void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index eaf5bb008aa9..d3b1d882d6fa 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1321,6 +1321,9 @@ struct netdev_net_notifier {
  *	that got dropped are freed/returned via xdp_return_frame().
  *	Returns negative number, means general error invoking ndo, meaning
  *	no frames were xmit'ed and core-caller will free all frames.
+ * struct net_device *(*ndo_xdp_get_xmit_slave)(struct net_device *dev,
+ *					        struct xdp_buff *xdp);
+ *      Get the xmit slave of master device based on the xdp_buff.
  * int (*ndo_xsk_wakeup)(struct net_device *dev, u32 queue_id, u32 flags);
  *      This function is used to wake up the softirq, ksoftirqd or kthread
  *	responsible for sending and/or receiving packets on a specific
@@ -1539,6 +1542,8 @@ struct net_device_ops {
 	int			(*ndo_xdp_xmit)(struct net_device *dev, int n,
 						struct xdp_frame **xdp,
 						u32 flags);
+	struct net_device *	(*ndo_xdp_get_xmit_slave)(struct net_device *dev,
+							  struct xdp_buff *xdp);
 	int			(*ndo_xsk_wakeup)(struct net_device *dev,
 						  u32 queue_id, u32 flags);
 	struct devlink_port *	(*ndo_get_devlink_port)(struct net_device *dev);
@@ -4069,6 +4074,7 @@ typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 		      int fd, int expected_fd, u32 flags);
 int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
+u8 dev_xdp_prog_count(struct net_device *dev);
 u32 dev_xdp_prog_id(struct net_device *dev, enum bpf_xdp_mode mode);
 
 int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index c253c2aafe97..05aac85b2bbc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9334,7 +9334,7 @@ static struct bpf_prog *dev_xdp_prog(struct net_device *dev,
 	return dev->xdp_state[mode].prog;
 }
 
-static u8 dev_xdp_prog_count(struct net_device *dev)
+u8 dev_xdp_prog_count(struct net_device *dev)
 {
 	u8 count = 0;
 	int i;
@@ -9344,6 +9344,7 @@ static u8 dev_xdp_prog_count(struct net_device *dev)
 			count++;
 	return count;
 }
+EXPORT_SYMBOL_GPL(dev_xdp_prog_count);
 
 u32 dev_xdp_prog_id(struct net_device *dev, enum bpf_xdp_mode mode)
 {
@@ -9379,6 +9380,7 @@ static int dev_xdp_install(struct net_device *dev, enum bpf_xdp_mode mode,
 	xdp.flags = flags;
 	xdp.prog = prog;
 
+
 	/* Drivers assume refcnt is already incremented (i.e, prog pointer is
 	 * "moved" into driver), so they don't increment it on their own, but
 	 * they do decrement refcnt when program is detached or replaced.
@@ -9467,6 +9469,11 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
 		NL_SET_ERR_MSG(extack, "XDP_FLAGS_REPLACE is not specified");
 		return -EINVAL;
 	}
+	/* don't allow attaching XDP programs directly to a bond slave */
+	if (netif_is_bond_slave(dev)) {
+		NL_SET_ERR_MSG(extack, "XDP program can not be attached to a bond slave");
+		return -EINVAL;
+	}
 
 	mode = dev_xdp_mode(dev, flags);
 	/* can't replace attached link */
diff --git a/net/core/filter.c b/net/core/filter.c
index d70187ce851b..10b12577f71d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3950,6 +3950,31 @@ void bpf_clear_redirect_map(struct bpf_map *map)
 	}
 }
 
+DEFINE_STATIC_KEY_FALSE(bpf_master_redirect_enabled_key);
+EXPORT_SYMBOL_GPL(bpf_master_redirect_enabled_key);
+
+u32 xdp_master_redirect(struct xdp_buff *xdp)
+{
+	struct net_device *master, *slave;
+	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+
+	master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev);
+	slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp);
+	if (slave && slave != xdp->rxq->dev) {
+		/* The target device is different from the receiving device, so
+		 * redirect it to the new device.
+		 * Using XDP_REDIRECT gets the correct behaviour from XDP enabled
+		 * drivers to unmap the packet from their rx ring.
+		 */
+		ri->tgt_index = slave->ifindex;
+		ri->map_id = INT_MAX;
+		ri->map_type = BPF_MAP_TYPE_UNSPEC;
+		return XDP_REDIRECT;
+	}
+	return XDP_TX;
+}
+EXPORT_SYMBOL_GPL(xdp_master_redirect);
+
 int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 		    struct bpf_prog *xdp_prog)
 {
-- 
2.27.0



* [PATCH bpf-next v3 3/5] net: bonding: Add XDP support to the bonding driver
  2021-07-07 11:25 ` [PATCH bpf-next v3 0/5] " Jussi Maki
  2021-07-07 11:25   ` [PATCH bpf-next v3 1/5] net: bonding: Refactor bond_xmit_hash for use with xdp_buff Jussi Maki
  2021-07-07 11:25   ` [PATCH bpf-next v3 2/5] net: core: Add support for XDP redirection to slave device Jussi Maki
@ 2021-07-07 11:25   ` Jussi Maki
  2021-07-13  7:14     ` kernel test robot
  2021-07-07 11:25   ` [PATCH bpf-next v3 4/5] devmap: Exclude XDP broadcast to master device Jussi Maki
  2021-07-07 11:25   ` [PATCH bpf-next v3 5/5] net: core: Allow netdev_lower_get_next_private_rcu in bh context Jussi Maki
  4 siblings, 1 reply; 71+ messages in thread
From: Jussi Maki @ 2021-07-07 11:25 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

XDP is implemented in the bonding driver by transparently delegating
the XDP program loading, removal and xmit operations to the bonding
slave devices. The overall goal of this work is that XDP programs
can be attached to a bond device *without* any further changes (or
awareness) necessary to the program itself, meaning the same XDP
program can be attached to a native device as well as to a bonding
device.
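
A simplified model of the delegation, assuming made-up types
(slave_sketch, bond_sketch) and ignoring program refcounting: the
program is pushed to every slave in turn, and slaves that were already
updated are restored if a later one cannot take it.

```c
#include <stddef.h>

#define MAX_SLAVES_SKETCH 8

/* Hypothetical slave model: whether it can run XDP, and what it runs. */
struct slave_sketch {
	int supports_xdp;
	const void *prog;
};

struct bond_sketch {
	struct slave_sketch *slaves[MAX_SLAVES_SKETCH];
	int n_slaves;
	const void *prog;
};

/* Install `prog` (NULL to remove) on all slaves of the bond.
 * Returns 0 on success. On failure returns -1 and restores the old
 * program on every slave that had already been switched.
 */
static int bond_set_prog_sketch(struct bond_sketch *bond, const void *prog)
{
	const void *old = bond->prog;
	int i;

	for (i = 0; i < bond->n_slaves; i++) {
		if (!bond->slaves[i]->supports_xdp)
			goto rollback;
		bond->slaves[i]->prog = prog;
	}
	bond->prog = prog;
	return 0;

rollback:
	/* i is the index of the failing slave; undo slaves [0, i) */
	while (i--)
		bond->slaves[i]->prog = old;
	return -1;
}
```

The all-or-nothing behaviour is the point of the sketch: the bond
never ends up with only some of its slaves running the new program.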

The semantics of XDP_TX for a program attached to a bond are equivalent
to those of a tc/BPF program attached to the bond: the packet is
transmitted out of the bond itself, using one of the bond's configured
xmit methods to select a slave device (rather than XDP_TX on the slave
itself). Handling of XDP_TX to transmit using the configured bonding
mechanism is therefore implemented by rewriting the BPF program return
value in bpf_prog_run_xdp. To avoid performance impact this check is
guarded by a static key, which is incremented when an XDP program is
loaded onto a bond device. This approach was chosen to avoid changes to
drivers implementing XDP. If the selected slave device does not match
the receive device, then XDP_REDIRECT is transparently used to perform
the redirection in order to have the network driver release the packet
from its RX ring. The bonding driver hashing functions have been
refactored to allow reuse with xdp_buffs to avoid code duplication.
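
For the hash-based modes (802.3ad and XOR), selecting the slave comes
down to reducing the packet hash modulo the number of usable slaves.
A minimal sketch, where pick_slave_by_hash and the ifindex array are
illustrative names rather than driver symbols:

```c
#include <stdint.h>

/* Pick an entry from the usable-slave array by hash.
 * Returns -1 when there is no usable slave.
 */
static int pick_slave_by_hash(const int *usable_ifindexes,
			      unsigned int count, uint32_t hash)
{
	if (count == 0)
		return -1;
	return usable_ifindexes[hash % count];
}
```

As long as the set of usable slaves does not change, packets with the
same hash keep going out through the same slave.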

The motivation for this change is to enable use of bonding (and
802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
XDP and also to transparently support bond devices for projects that
use XDP given most modern NICs have dual port adapters.  An alternative
to this approach would be to implement 802.3ad in user-space and
implement the bonding load-balancing in the XDP program itself, but
that is a rather cumbersome endeavor in terms of slave device
management (e.g. by watching netlink) and requires the orchestrator to
maintain separate programs for the native and bond cases. A native
in-kernel implementation overcomes these issues and provides more
flexibility.

Below are benchmark results from two machines, each with a 100Gbit
Intel E810 (ice) NIC, with a 32-core 3970X on the sending machine and
a 16-core 3950X on the receiving machine. 64 byte packets were sent
with pktgen-dpdk at full rate. Two issues [2, 3] were identified with
the ice driver, so the tests were performed with iommu=off and patch
[2] applied. Additionally the bonding round robin algorithm was
modified to use per-cpu tx counters, as the shared rr_tx_counter
caused high CPU load (50% vs 10%) and a high rate of cache misses (see
patch 2/3). The statistics were collected using "sar -n dev -u 1 10".

 -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
 without patch (1 dev):
   XDP_DROP:              3.15%      48.6Mpps
   XDP_TX:                3.12%      18.3Mpps     18.3Mpps
   XDP_DROP (RSS):        9.47%      116.5Mpps
   XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
 -----------------------
 with patch, bond (1 dev):
   XDP_DROP:              3.14%      46.7Mpps
   XDP_TX:                3.15%      13.9Mpps     13.9Mpps
   XDP_DROP (RSS):        10.33%     117.2Mpps
   XDP_TX (RSS):          10.64%     25.1Mpps     24.0Mpps
 -----------------------
 with patch, bond (2 devs):
   XDP_DROP:              6.27%      92.7Mpps
   XDP_TX:                6.26%      17.6Mpps     17.5Mpps
   XDP_DROP (RSS):       11.38%      117.2Mpps
   XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
 --------------------------------------------------------------

RSS: Receive Side Scaling, i.e. the packets were sent to a range of
destination IPs.

[1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
[2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
[3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 drivers/net/bonding/bond_main.c | 303 ++++++++++++++++++++++++++++++++
 include/net/bonding.h           |   1 +
 2 files changed, 304 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 78284c451668..07ac40c3dac1 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -317,6 +317,19 @@ bool bond_sk_check(struct bonding *bond)
 	}
 }
 
+static bool bond_xdp_check(struct bonding *bond)
+{
+	switch (BOND_MODE(bond)) {
+	case BOND_MODE_ROUNDROBIN:
+	case BOND_MODE_ACTIVEBACKUP:
+	case BOND_MODE_8023AD:
+	case BOND_MODE_XOR:
+		return true;
+	default:
+		return false;
+	}
+}
+
 /*---------------------------------- VLAN -----------------------------------*/
 
 /* In the following 2 functions, bond_vlan_rx_add_vid and bond_vlan_rx_kill_vid,
@@ -2010,6 +2023,39 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
 		bond_update_slave_arr(bond, NULL);
 
 
+	if (!slave_dev->netdev_ops->ndo_bpf ||
+	    !slave_dev->netdev_ops->ndo_xdp_xmit) {
+		if (bond->xdp_prog) {
+			NL_SET_ERR_MSG(extack, "Slave does not support XDP");
+			slave_err(bond_dev, slave_dev, "Slave does not support XDP\n");
+			res = -EOPNOTSUPP;
+			goto err_sysfs_del;
+		}
+	} else {
+		struct netdev_bpf xdp = {
+			.command = XDP_SETUP_PROG,
+			.flags   = 0,
+			.prog    = bond->xdp_prog,
+			.extack  = extack,
+		};
+
+		if (dev_xdp_prog_count(slave_dev) > 0) {
+			NL_SET_ERR_MSG(extack, "Slave has XDP program loaded, please unload before enslaving");
+			slave_err(bond_dev, slave_dev, "Slave has XDP program loaded, please unload before enslaving\n");
+			res = -EOPNOTSUPP;
+			goto err_sysfs_del;
+		}
+
+		res = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (res < 0) {
+			/* ndo_bpf() sets extack error message */
+			slave_dbg(bond_dev, slave_dev, "Error %d calling ndo_bpf\n", res);
+			goto err_sysfs_del;
+		}
+		if (bond->xdp_prog)
+			bpf_prog_inc(bond->xdp_prog);
+	}
+
 	slave_info(bond_dev, slave_dev, "Enslaving as %s interface with %s link\n",
 		   bond_is_active_slave(new_slave) ? "an active" : "a backup",
 		   new_slave->link != BOND_LINK_DOWN ? "an up" : "a down");
@@ -2129,6 +2175,17 @@ static int __bond_release_one(struct net_device *bond_dev,
 	/* recompute stats just before removing the slave */
 	bond_get_stats(bond->dev, &bond->bond_stats);
 
+	if (bond->xdp_prog) {
+		struct netdev_bpf xdp = {
+			.command = XDP_SETUP_PROG,
+			.flags   = 0,
+			.prog	 = NULL,
+			.extack  = NULL,
+		};
+		if (slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp))
+			slave_warn(bond_dev, slave_dev, "failed to unload XDP program\n");
+	}
+
 	bond_upper_dev_unlink(bond, slave);
 	/* unregister rx_handler early so bond_handle_frame wouldn't be called
 	 * for this slave anymore.
@@ -3680,6 +3737,26 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
 				skb_headlen(skb));
 }
 
+/**
+ * bond_xmit_hash_xdp - generate a hash value based on the xmit policy
+ * @bond: bonding device
+ * @xdp: buffer to use for headers
+ *
+ * The XDP variant of bond_xmit_hash.
+ */
+static u32 bond_xmit_hash_xdp(struct bonding *bond, struct xdp_buff *xdp)
+{
+	struct ethhdr *eth;
+
+	if (xdp->data + sizeof(struct ethhdr) > xdp->data_end)
+		return 0;
+
+	eth = (struct ethhdr *)xdp->data;
+
+	return __bond_xmit_hash(bond, NULL, xdp->data, eth->h_proto, 0,
+				sizeof(struct ethhdr), xdp->data_end - xdp->data);
+}
+
 /*-------------------------- Device entry points ----------------------------*/
 
 void bond_work_init_all(struct bonding *bond)
@@ -4296,6 +4373,47 @@ static struct slave *bond_xmit_roundrobin_slave_get(struct bonding *bond,
 	return NULL;
 }
 
+static struct slave *bond_xdp_xmit_roundrobin_slave_get(struct bonding *bond,
+							struct xdp_buff *xdp)
+{
+	struct slave *slave;
+	int slave_cnt;
+	u32 slave_id;
+	const struct ethhdr *eth;
+	void *data = xdp->data;
+
+	if (data + sizeof(struct ethhdr) > xdp->data_end)
+		goto non_igmp;
+
+	eth = (struct ethhdr *)data;
+	data += sizeof(struct ethhdr);
+
+	/* See comment on IGMP in bond_xmit_roundrobin_slave_get() */
+	if (eth->h_proto == htons(ETH_P_IP)) {
+		const struct iphdr *iph;
+
+		if (data + sizeof(struct iphdr) > xdp->data_end)
+			goto non_igmp;
+
+		iph = (struct iphdr *)data;
+
+		if (iph->protocol == IPPROTO_IGMP) {
+			slave = rcu_dereference(bond->curr_active_slave);
+			if (slave)
+				return slave;
+			return bond_get_slave_by_id(bond, 0);
+		}
+	}
+
+non_igmp:
+	slave_cnt = READ_ONCE(bond->slave_cnt);
+	if (likely(slave_cnt)) {
+		slave_id = bond_rr_gen_slave_id(bond) % slave_cnt;
+		return bond_get_slave_by_id(bond, slave_id);
+	}
+	return NULL;
+}
+
 static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
 					struct net_device *bond_dev)
 {
@@ -4511,6 +4629,22 @@ static struct slave *bond_xmit_3ad_xor_slave_get(struct bonding *bond,
 	return slave;
 }
 
+static struct slave *bond_xdp_xmit_3ad_xor_slave_get(struct bonding *bond,
+						     struct xdp_buff *xdp)
+{
+	struct bond_up_slave *slaves;
+	unsigned int count;
+	u32 hash;
+
+	hash = bond_xmit_hash_xdp(bond, xdp);
+	slaves = bond->usable_slaves;
+	count = slaves ? READ_ONCE(slaves->count) : 0;
+	if (unlikely(!count))
+		return NULL;
+
+	return slaves->arr[hash % count];
+}
+
 /* Use this Xmit function for 3AD as well as XOR modes. The current
  * usable slave array is formed in the control path. The xmit function
  * just calculates hash and sends the packet out.
@@ -4795,6 +4929,172 @@ static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return ret;
 }
 
+static struct net_device *
+bond_xdp_get_xmit_slave(struct net_device *bond_dev, struct xdp_buff *xdp)
+{
+	struct bonding *bond = netdev_priv(bond_dev);
+	struct slave *slave;
+
+	/* Caller needs to hold rcu_read_lock() */
+
+	switch (BOND_MODE(bond)) {
+	case BOND_MODE_ROUNDROBIN:
+		slave = bond_xdp_xmit_roundrobin_slave_get(bond, xdp);
+		break;
+
+	case BOND_MODE_ACTIVEBACKUP:
+		slave = bond_xmit_activebackup_slave_get(bond);
+		break;
+
+	case BOND_MODE_8023AD:
+	case BOND_MODE_XOR:
+		slave = bond_xdp_xmit_3ad_xor_slave_get(bond, xdp);
+		break;
+
+	default:
+		/* Should never happen. Mode guarded by bond_xdp_check() */
+		netdev_err(bond_dev, "Unknown bonding mode %d for xdp xmit\n", BOND_MODE(bond));
+		WARN_ON_ONCE(1);
+		return NULL;
+	}
+
+	if (slave)
+		return slave->dev;
+
+	return NULL;
+}
+
+static int bond_xdp_xmit(struct net_device *bond_dev,
+			 int n, struct xdp_frame **frames, u32 flags)
+{
+	int nxmit, err = -ENXIO;
+
+	rcu_read_lock();
+
+	for (nxmit = 0; nxmit < n; nxmit++) {
+		struct xdp_frame *frame = frames[nxmit];
+		struct xdp_frame *frames1[] = {frame};
+		struct net_device *slave_dev;
+		struct xdp_buff xdp;
+
+		xdp_convert_frame_to_buff(frame, &xdp);
+
+		slave_dev = bond_xdp_get_xmit_slave(bond_dev, &xdp);
+		if (!slave_dev) {
+			err = -ENXIO;
+			break;
+		}
+
+		err = slave_dev->netdev_ops->ndo_xdp_xmit(slave_dev, 1, frames1, flags);
+		if (err < 1)
+			break;
+	}
+
+	rcu_read_unlock();
+
+	/* If error happened on the first frame then we can pass the error up, otherwise
+	 * report the number of frames that were xmitted.
+	 */
+	if (err < 0)
+		return (nxmit == 0 ? err : nxmit);
+
+	return nxmit;
+}
+
+static int bond_xdp_set(struct net_device *dev, struct bpf_prog *prog,
+			struct netlink_ext_ack *extack)
+{
+	struct bonding *bond = netdev_priv(dev);
+	struct list_head *iter;
+	struct slave *slave, *rollback_slave;
+	struct bpf_prog *old_prog;
+	struct netdev_bpf xdp = {
+		.command = XDP_SETUP_PROG,
+		.flags   = 0,
+		.prog    = prog,
+		.extack  = extack,
+	};
+	int err;
+
+	ASSERT_RTNL();
+
+	if (!bond_xdp_check(bond))
+		return -EOPNOTSUPP;
+
+	old_prog = bond->xdp_prog;
+	bond->xdp_prog = prog;
+
+	bond_for_each_slave(bond, slave, iter) {
+		struct net_device *slave_dev = slave->dev;
+
+		if (!slave_dev->netdev_ops->ndo_bpf ||
+		    !slave_dev->netdev_ops->ndo_xdp_xmit) {
+			NL_SET_ERR_MSG(extack, "Slave device does not support XDP");
+			slave_err(dev, slave_dev, "Slave does not support XDP\n");
+			err = -EOPNOTSUPP;
+			goto err;
+		}
+
+		if (dev_xdp_prog_count(slave_dev) > 0) {
+			NL_SET_ERR_MSG(extack, "Slave has XDP program loaded, please unload before enslaving");
+			slave_err(dev, slave_dev, "Slave has XDP program loaded, please unload before enslaving\n");
+			err = -EOPNOTSUPP;
+			goto err;
+		}
+
+		err = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (err < 0) {
+			/* ndo_bpf() sets extack error message */
+			slave_err(dev, slave_dev, "Error %d calling ndo_bpf\n", err);
+			goto err;
+		}
+		if (prog)
+			bpf_prog_inc(prog);
+	}
+
+	if (old_prog)
+		bpf_prog_put(old_prog);
+
+	if (prog)
+		static_branch_inc(&bpf_master_redirect_enabled_key);
+	else
+		static_branch_dec(&bpf_master_redirect_enabled_key);
+
+	return 0;
+
+err:
+	/* unwind the program changes */
+	bond->xdp_prog = old_prog;
+	xdp.prog = old_prog;
+	xdp.extack = NULL; /* do not overwrite original error */
+
+	bond_for_each_slave(bond, rollback_slave, iter) {
+		struct net_device *slave_dev = rollback_slave->dev;
+		int err_unwind;
+
+		if (slave == rollback_slave)
+			break;
+
+		err_unwind = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (err_unwind < 0)
+			slave_err(dev, slave_dev,
+				  "Error %d when unwinding XDP program change\n", err_unwind);
+		else if (xdp.prog)
+			bpf_prog_inc(xdp.prog);
+	}
+	return err;
+}
+
+static int bond_xdp(struct net_device *dev, struct netdev_bpf *xdp)
+{
+	switch (xdp->command) {
+	case XDP_SETUP_PROG:
+		return bond_xdp_set(dev, xdp->prog, xdp->extack);
+	default:
+		return -EINVAL;
+	}
+}
+
 static u32 bond_mode_bcast_speed(struct slave *slave, u32 speed)
 {
 	if (speed == 0 || speed == SPEED_UNKNOWN)
@@ -4881,6 +5181,9 @@ static const struct net_device_ops bond_netdev_ops = {
 	.ndo_features_check	= passthru_features_check,
 	.ndo_get_xmit_slave	= bond_xmit_get_slave,
 	.ndo_sk_get_lower_dev	= bond_sk_get_lower_dev,
+	.ndo_bpf		= bond_xdp,
+	.ndo_xdp_xmit           = bond_xdp_xmit,
+	.ndo_xdp_get_xmit_slave = bond_xdp_get_xmit_slave,
 };
 
 static const struct device_type bond_type = {
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 15335732e166..8de8180f1be8 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -251,6 +251,7 @@ struct bonding {
 #ifdef CONFIG_XFRM_OFFLOAD
 	struct xfrm_state *xs;
 #endif /* CONFIG_XFRM_OFFLOAD */
+	struct bpf_prog *xdp_prog;
 };
 
 #define bond_slave_get_rcu(dev) \
-- 
2.27.0



* [PATCH bpf-next v3 4/5] devmap: Exclude XDP broadcast to master device
  2021-07-07 11:25 ` [PATCH bpf-next v3 0/5] " Jussi Maki
                     ` (2 preceding siblings ...)
  2021-07-07 11:25   ` [PATCH bpf-next v3 3/5] net: bonding: Add XDP support to the bonding driver Jussi Maki
@ 2021-07-07 11:25   ` Jussi Maki
  2021-07-07 11:25   ` [PATCH bpf-next v3 5/5] net: core: Allow netdev_lower_get_next_private_rcu in bh context Jussi Maki
  4 siblings, 0 replies; 71+ messages in thread
From: Jussi Maki @ 2021-07-07 11:25 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

If the ingress device is a bond slave, do not broadcast back
through it or the bond master.
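
The exclusion rule can be pictured with a small userspace sketch. It
assumes the exclusion list is simply the ingress ifindex plus the
ifindexes of its upper devices; build_excluded and ifindex_excluded
are names invented for this example.

```c
#include <stdbool.h>

/* True if `ifindex` appears in the first `n` entries of `excluded`. */
static bool ifindex_excluded(const int *excluded, int n, int ifindex)
{
	while (n--) {
		if (excluded[n] == ifindex)
			return true;
	}
	return false;
}

/* Fill `out` with the upper devices of the ingress device followed by
 * the ingress device itself. `out` must have room for n_uppers + 1
 * entries. Returns the number of entries written.
 */
static int build_excluded(int ingress, const int *uppers, int n_uppers,
			  int *out)
{
	int n = 0, i;

	for (i = 0; i < n_uppers; i++)
		out[n++] = uppers[i];
	out[n++] = ingress;
	return n;
}
```

During a broadcast each candidate destination would be checked against
this list, so neither the receiving slave nor its bond master gets a
copy of the packet.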

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 kernel/bpf/devmap.c | 67 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 58 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 2546dafd6672..c1a2dfb88724 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -513,10 +513,9 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 	return __xdp_enqueue(dev, xdp, dev_rx, dst->xdp_prog);
 }
 
-static bool is_valid_dst(struct bpf_dtab_netdev *obj, struct xdp_buff *xdp,
-			 int exclude_ifindex)
+static bool is_valid_dst(struct bpf_dtab_netdev *obj, struct xdp_buff *xdp)
 {
-	if (!obj || obj->dev->ifindex == exclude_ifindex ||
+	if (!obj ||
 	    !obj->dev->netdev_ops->ndo_xdp_xmit)
 		return false;
 
@@ -541,17 +540,48 @@ static int dev_map_enqueue_clone(struct bpf_dtab_netdev *obj,
 	return 0;
 }
 
+static inline bool is_ifindex_excluded(int *excluded, int num_excluded, int ifindex)
+{
+	while (num_excluded--) {
+		if (ifindex == excluded[num_excluded])
+			return true;
+	}
+	return false;
+}
+
+/* Get ifindex of each upper device. 'indexes' must be able to hold at
+ * least MAX_NEST_DEV elements.
+ * Returns the number of ifindexes added.
+ */
+static int get_upper_ifindexes(struct net_device *dev, int *indexes)
+{
+	struct net_device *upper;
+	struct list_head *iter;
+	int n = 0;
+
+	netdev_for_each_upper_dev_rcu(dev, upper, iter) {
+		indexes[n++] = upper->ifindex;
+	}
+	return n;
+}
+
 int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 			  struct bpf_map *map, bool exclude_ingress)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
-	int exclude_ifindex = exclude_ingress ? dev_rx->ifindex : 0;
 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
+	int excluded_devices[1+MAX_NEST_DEV];
 	struct hlist_head *head;
 	struct xdp_frame *xdpf;
+	int num_excluded = 0;
 	unsigned int i;
 	int err;
 
+	if (exclude_ingress) {
+		num_excluded = get_upper_ifindexes(dev_rx, excluded_devices);
+		excluded_devices[num_excluded++] = dev_rx->ifindex;
+	}
+
 	xdpf = xdp_convert_buff_to_frame(xdp);
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
@@ -559,7 +589,10 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
 		for (i = 0; i < map->max_entries; i++) {
 			dst = READ_ONCE(dtab->netdev_map[i]);
-			if (!is_valid_dst(dst, xdp, exclude_ifindex))
+			if (!is_valid_dst(dst, xdp))
+				continue;
+
+			if (is_ifindex_excluded(excluded_devices, num_excluded, dst->dev->ifindex))
 				continue;
 
 			/* we only need n-1 clones; last_dst enqueued below */
@@ -579,7 +612,10 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 			head = dev_map_index_hash(dtab, i);
 			hlist_for_each_entry_rcu(dst, head, index_hlist,
 						 lockdep_is_held(&dtab->index_lock)) {
-				if (!is_valid_dst(dst, xdp, exclude_ifindex))
+				if (!is_valid_dst(dst, xdp))
+					continue;
+
+				if (is_ifindex_excluded(excluded_devices, num_excluded, dst->dev->ifindex))
 					continue;
 
 				/* we only need n-1 clones; last_dst enqueued below */
@@ -645,17 +681,26 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
 			   bool exclude_ingress)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
-	int exclude_ifindex = exclude_ingress ? dev->ifindex : 0;
 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
+	int excluded_devices[1+MAX_NEST_DEV];
 	struct hlist_head *head;
 	struct hlist_node *next;
+	int num_excluded = 0;
 	unsigned int i;
 	int err;
 
+	if (exclude_ingress) {
+		num_excluded = get_upper_ifindexes(dev, excluded_devices);
+		excluded_devices[num_excluded++] = dev->ifindex;
+	}
+
 	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
 		for (i = 0; i < map->max_entries; i++) {
 			dst = READ_ONCE(dtab->netdev_map[i]);
-			if (!dst || dst->dev->ifindex == exclude_ifindex)
+			if (!dst)
+				continue;
+
+			if (is_ifindex_excluded(excluded_devices, num_excluded, dst->dev->ifindex))
 				continue;
 
 			/* we only need n-1 clones; last_dst enqueued below */
@@ -669,12 +714,16 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
 				return err;
 
 			last_dst = dst;
+
 		}
 	} else { /* BPF_MAP_TYPE_DEVMAP_HASH */
 		for (i = 0; i < dtab->n_buckets; i++) {
 			head = dev_map_index_hash(dtab, i);
 			hlist_for_each_entry_safe(dst, next, head, index_hlist) {
-				if (!dst || dst->dev->ifindex == exclude_ifindex)
+				if (!dst)
+					continue;
+
+				if (is_ifindex_excluded(excluded_devices, num_excluded, dst->dev->ifindex))
 					continue;
 
 				/* we only need n-1 clones; last_dst enqueued below */
-- 
2.27.0



* [PATCH bpf-next v3 5/5] net: core: Allow netdev_lower_get_next_private_rcu in bh context
  2021-07-07 11:25 ` [PATCH bpf-next v3 0/5] " Jussi Maki
                     ` (3 preceding siblings ...)
  2021-07-07 11:25   ` [PATCH bpf-next v3 4/5] devmap: Exclude XDP broadcast to master device Jussi Maki
@ 2021-07-07 11:25   ` Jussi Maki
  4 siblings, 0 replies; 71+ messages in thread
From: Jussi Maki @ 2021-07-07 11:25 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki, toke

For the XDP bonding slave lookup to work in the NAPI poll context, in
which the redundant rcu_read_lock() has been removed, we have to
follow the same approach as in [1] and modify the WARN_ON_ONCE to also
check rcu_read_lock_bh_held().

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=694cea395fded425008e93cd90cfdf7a451674af

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 05aac85b2bbc..27f95aeddc59 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7569,7 +7569,7 @@ void *netdev_lower_get_next_private_rcu(struct net_device *dev,
 {
 	struct netdev_adjacent *lower;
 
-	WARN_ON_ONCE(!rcu_read_lock_held());
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held());
 
 	lower = list_entry_rcu((*iter)->next, struct netdev_adjacent, list);
 
-- 
2.27.0



* Re: [PATCH bpf-next v3 3/5] net: bonding: Add XDP support to the bonding driver
  2021-07-07 11:25   ` [PATCH bpf-next v3 3/5] net: bonding: Add XDP support to the bonding driver Jussi Maki
@ 2021-07-13  7:14     ` kernel test robot
  0 siblings, 0 replies; 71+ messages in thread
From: kernel test robot @ 2021-07-13  7:14 UTC (permalink / raw)
  To: Jussi Maki, bpf
  Cc: kbuild-all, netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

[-- Attachment #1: Type: text/plain, Size: 3177 bytes --]

Hi Jussi,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Jussi-Maki/net-bonding-Refactor-bond_xmit_hash-for-use-with-xdp_buff/20210707-211616
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: m68k-randconfig-s032-20210713 (attached as .config)
compiler: m68k-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.3-341-g8af24329-dirty
        # https://github.com/0day-ci/linux/commit/127745a2455bc3577cdcafb06381fa4da354f8c2
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jussi-Maki/net-bonding-Refactor-bond_xmit_hash-for-use-with-xdp_buff/20210707-211616
        git checkout 127745a2455bc3577cdcafb06381fa4da354f8c2
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=m68k 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
   drivers/net/bonding/bond_main.c:2671:26: sparse: sparse: restricted __be16 degrades to integer
   drivers/net/bonding/bond_main.c:2677:20: sparse: sparse: restricted __be16 degrades to integer
   drivers/net/bonding/bond_main.c:2724:40: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __be16 [usertype] vlan_proto @@     got int @@
   drivers/net/bonding/bond_main.c:2724:40: sparse:     expected restricted __be16 [usertype] vlan_proto
   drivers/net/bonding/bond_main.c:2724:40: sparse:     got int
>> drivers/net/bonding/bond_main.c:4632:16: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected struct bond_up_slave *slaves @@     got struct bond_up_slave [noderef] __rcu *usable_slaves @@
   drivers/net/bonding/bond_main.c:4632:16: sparse:     expected struct bond_up_slave *slaves
   drivers/net/bonding/bond_main.c:4632:16: sparse:     got struct bond_up_slave [noderef] __rcu *usable_slaves
   drivers/net/bonding/bond_main.c:3563:52: sparse: sparse: restricted __be16 degrades to integer
   drivers/net/bonding/bond_main.c:3563:52: sparse: sparse: restricted __be16 degrades to integer

vim +4632 drivers/net/bonding/bond_main.c

  4623	
  4624	static struct slave *bond_xdp_xmit_3ad_xor_slave_get(struct bonding *bond,
  4625							     struct xdp_buff *xdp)
  4626	{
  4627		struct bond_up_slave *slaves;
  4628		unsigned int count;
  4629		u32 hash;
  4630	
  4631		hash = bond_xmit_hash_xdp(bond, xdp);
> 4632		slaves = bond->usable_slaves;
  4633		count = slaves ? READ_ONCE(slaves->count) : 0;
  4634		if (unlikely(!count))
  4635			return NULL;
  4636	
  4637		return slaves->arr[hash % count];
  4638	}
  4639	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 22479 bytes --]


* [PATCH bpf-next v4 0/6] XDP bonding support
  2021-06-09 13:55 [PATCH bpf-next 0/3] XDP bonding support Jussi Maki
                   ` (5 preceding siblings ...)
  2021-07-07 11:25 ` [PATCH bpf-next v3 0/5] " Jussi Maki
@ 2021-07-28 23:43 ` joamaki
  2021-07-28 23:43   ` [PATCH bpf-next v4 1/6] net: bonding: Refactor bond_xmit_hash for use with xdp_buff joamaki
                     ` (5 more replies)
  2021-07-30  6:18 ` [PATCH bpf-next v5 0/7] XDP bonding support Jussi Maki
  2021-07-31  5:57 ` [PATCH bpf-next v6 0/7]: XDP bonding support Jussi Maki
  8 siblings, 6 replies; 71+ messages in thread
From: joamaki @ 2021-07-28 23:43 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson

This patchset introduces XDP support to the bonding driver.

The motivation for this change is to enable use of bonding (and
802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
XDP, and also to transparently support bond devices for projects that
use XDP, given that most modern NICs have dual ports. An alternative
to this approach would be to implement 802.3ad in user-space and
implement the bonding load-balancing in the XDP program itself, but
that is a rather cumbersome endeavor in terms of slave device
management (e.g. by watching netlink) and requires the orchestrator
to maintain separate programs for the native and bond cases. A native
in-kernel implementation overcomes these issues and provides more
flexibility.

Below are benchmark results from two machines with 100Gbit Intel E810
(ice) NICs, with a 32-core 3970X on the sending machine and a 16-core
3950X on the receiving machine. 64 byte packets were sent with
pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
ice driver, so the tests were performed with iommu=off and patch [2]
applied. Additionally, the bonding round robin algorithm was modified
to use per-cpu tx counters, as the shared rr_tx_counter caused high
CPU load (50% vs 10%) and a high rate of cache misses. The fix for
this has already been merged into net-next. The statistics were
collected using "sar -n dev -u 1 10".

 -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
 without patch (1 dev):
   XDP_DROP:              3.15%      48.6Mpps
   XDP_TX:                3.12%      18.3Mpps     18.3Mpps
   XDP_DROP (RSS):        9.47%      116.5Mpps
   XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
 -----------------------
 with patch, bond (1 dev):
   XDP_DROP:              3.14%      46.7Mpps
   XDP_TX:                3.15%      13.9Mpps     13.9Mpps
   XDP_DROP (RSS):        10.33%     117.2Mpps
   XDP_TX (RSS):          10.64%     25.1Mpps     24.0Mpps
 -----------------------
 with patch, bond (2 devs):
   XDP_DROP:              6.27%      92.7Mpps
   XDP_TX:                6.26%      17.6Mpps     17.5Mpps
   XDP_DROP (RSS):        11.38%     117.2Mpps
   XDP_TX (RSS):          14.30%     28.7Mpps     27.4Mpps
 --------------------------------------------------------------

RSS: Receive Side Scaling, i.e. the packets were sent to a range of
destination IPs so that they are spread across receive queues.

[1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
[2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
[3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/

Patch 1 prepares bond_xmit_hash for hashing xdp_buff's.
Patch 2 adds hooks to implement redirection after the BPF program has run.
Patch 3 implements the hooks in the bonding driver.
Patch 4 modifies devmap to properly handle EXCLUDE_INGRESS with a slave device.
Patch 5 fixes an issue related to the recent cleanup of rcu_read_lock in XDP context.
Patch 6 adds tests.

v3->v4:
- Add back the test suite, while removing the vmtest.sh modifications to kernel
  config now that CONFIG_BONDING=y is set. As discussed with Magnus Karlsson,
  it makes sense for now not to reuse the code from xdpxceiver.c for testing
  XDP bonding.

v2->v3:
- Address Jay's comment to properly exclude upper devices with EXCLUDE_INGRESS
  when deeper nesting is involved. Now all upper devices are excluded.
- Refuse to enslave devices that already have XDP programs loaded and refuse to
  load XDP programs onto slave devices. Earlier one could have an XDP program
  loaded on a device, and after enslaving it and loading another program onto
  the bond device the xdp_state of the enslaved device would point at the old
  program.
- Adapt netdev_lower_get_next_private_rcu so it can be called in the XDP context.

v1->v2:
- Split up into smaller, easier-to-review patches and address cosmetic
  review comments.
- Drop the INDIRECT_CALL optimization as it showed little improvement in tests.
- Drop the rr_tx_counter patch as that has already been merged into net-next.
- Separate the test suite into another patch set. This will follow later once a
  patch set from Magnus Karlsson is merged and provides test utilities that can
  be reused for XDP bonding tests. v2 contains no major functional changes and
  was tested with the test suite included in v1.
  (https://lore.kernel.org/bpf/202106221509.kwNvAAZg-lkp@intel.com/T/#m464146d47299125d5868a08affd6d6ce526dfad1)

---




* [PATCH bpf-next v4 1/6] net: bonding: Refactor bond_xmit_hash for use with xdp_buff
  2021-07-28 23:43 ` [PATCH bpf-next v4 0/6] XDP bonding support joamaki
@ 2021-07-28 23:43   ` joamaki
  2021-07-28 23:43   ` [PATCH bpf-next v4 2/6] net: core: Add support for XDP redirection to slave device joamaki
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 71+ messages in thread
From: joamaki @ 2021-07-28 23:43 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

From: Jussi Maki <joamaki@gmail.com>

In preparation for adding XDP support to the bonding driver,
refactor the packet hashing functions so that they can work with
any linear data buffer without an skb.
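
To illustrate the idea outside the kernel, here is a minimal userspace
sketch of the L2 hash operating on a linear buffer. It is not the
patch's code: model_pull_linear() and model_eth_hash() are hypothetical
names, the Ethernet header is addressed by raw byte offsets instead of
struct ethhdr, and the skb fallback via pskb_may_pull() is left out, so
a buffer that is too short simply yields no data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ETH_HLEN 14	/* 6 dest + 6 source + 2 proto bytes */

/* Return 'data' if the first n bytes lie within the linear area of
 * length hlen, NULL otherwise (no skb to pull more data from).
 */
const uint8_t *model_pull_linear(const uint8_t *data, int hlen, int n)
{
	return n <= hlen ? data : NULL;
}

/* L2 hash: last byte of the destination MAC, last byte of the source
 * MAC and the raw (network byte order) protocol field, XORed together.
 * mhoff is the offset of the MAC header within 'data'.
 */
uint32_t model_eth_hash(const uint8_t *data, int mhoff, int hlen)
{
	const uint8_t *eth;
	uint16_t proto;

	data = model_pull_linear(data, hlen, mhoff + MODEL_ETH_HLEN);
	if (!data)
		return 0;

	eth = data + mhoff;
	/* memcpy avoids an unaligned 16-bit load */
	memcpy(&proto, eth + 12, sizeof(proto));
	return eth[5] ^ eth[11] ^ proto;
}
```

A caller with an XDP-style buffer would pass the start of the packet
data, a MAC header offset of 0 and the length of the linear data; a
truncated frame hashes to 0, matching the early return in the patch.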

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 drivers/net/bonding/bond_main.c | 147 +++++++++++++++++++-------------
 1 file changed, 90 insertions(+), 57 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index d22d78303311..dcec5cc4dab1 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3611,55 +3611,80 @@ static struct notifier_block bond_netdev_notifier = {
 
 /*---------------------------- Hashing Policies -----------------------------*/
 
+/* Helper to access data in a packet, with or without a backing skb.
+ * If skb is given the data is linearized if necessary via pskb_may_pull.
+ */
+static inline const void *bond_pull_data(struct sk_buff *skb,
+					 const void *data, int hlen, int n)
+{
+	if (likely(n <= hlen))
+		return data;
+	else if (skb && likely(pskb_may_pull(skb, n)))
+		return skb->head;
+
+	return NULL;
+}
+
 /* L2 hash helper */
-static inline u32 bond_eth_hash(struct sk_buff *skb)
+static inline u32 bond_eth_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
 {
-	struct ethhdr *ep, hdr_tmp;
+	struct ethhdr *ep;
 
-	ep = skb_header_pointer(skb, 0, sizeof(hdr_tmp), &hdr_tmp);
-	if (ep)
-		return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
-	return 0;
+	data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
+	if (!data)
+		return 0;
+
+	ep = (struct ethhdr *)(data + mhoff);
+	return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
 }
 
-static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
-			 int *noff, int *proto, bool l34)
+static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk, const void *data,
+			 int hlen, __be16 l2_proto, int *nhoff, int *ip_proto, bool l34)
 {
 	const struct ipv6hdr *iph6;
 	const struct iphdr *iph;
 
-	if (skb->protocol == htons(ETH_P_IP)) {
-		if (unlikely(!pskb_may_pull(skb, *noff + sizeof(*iph))))
+	if (l2_proto == htons(ETH_P_IP)) {
+		data = bond_pull_data(skb, data, hlen, *nhoff + sizeof(*iph));
+		if (!data)
 			return false;
-		iph = (const struct iphdr *)(skb->data + *noff);
+
+		iph = (const struct iphdr *)(data + *nhoff);
 		iph_to_flow_copy_v4addrs(fk, iph);
-		*noff += iph->ihl << 2;
+		*nhoff += iph->ihl << 2;
 		if (!ip_is_fragment(iph))
-			*proto = iph->protocol;
-	} else if (skb->protocol == htons(ETH_P_IPV6)) {
-		if (unlikely(!pskb_may_pull(skb, *noff + sizeof(*iph6))))
+			*ip_proto = iph->protocol;
+	} else if (l2_proto == htons(ETH_P_IPV6)) {
+		data = bond_pull_data(skb, data, hlen, *nhoff + sizeof(*iph6));
+		if (!data)
 			return false;
-		iph6 = (const struct ipv6hdr *)(skb->data + *noff);
+
+		iph6 = (const struct ipv6hdr *)(data + *nhoff);
 		iph_to_flow_copy_v6addrs(fk, iph6);
-		*noff += sizeof(*iph6);
-		*proto = iph6->nexthdr;
+		*nhoff += sizeof(*iph6);
+		*ip_proto = iph6->nexthdr;
 	} else {
 		return false;
 	}
 
-	if (l34 && *proto >= 0)
-		fk->ports.ports = skb_flow_get_ports(skb, *noff, *proto);
+	if (l34 && *ip_proto >= 0)
+		fk->ports.ports = __skb_flow_get_ports(skb, *nhoff, *ip_proto, data, hlen);
 
 	return true;
 }
 
-static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
+static u32 bond_vlan_srcmac_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
 {
-	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
+	struct ethhdr *mac_hdr;
 	u32 srcmac_vendor = 0, srcmac_dev = 0;
 	u16 vlan;
 	int i;
 
+	data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
+	if (!data)
+		return 0;
+	mac_hdr = (struct ethhdr *)(data + mhoff);
+
 	for (i = 0; i < 3; i++)
 		srcmac_vendor = (srcmac_vendor << 8) | mac_hdr->h_source[i];
 
@@ -3675,26 +3700,25 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
 }
 
 /* Extract the appropriate headers based on bond's xmit policy */
-static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
-			      struct flow_keys *fk)
+static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb, const void *data,
+			      __be16 l2_proto, int nhoff, int hlen, struct flow_keys *fk)
 {
 	bool l34 = bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34;
-	int noff, proto = -1;
+	int ip_proto = -1;
 
 	switch (bond->params.xmit_policy) {
 	case BOND_XMIT_POLICY_ENCAP23:
 	case BOND_XMIT_POLICY_ENCAP34:
 		memset(fk, 0, sizeof(*fk));
 		return __skb_flow_dissect(NULL, skb, &flow_keys_bonding,
-					  fk, NULL, 0, 0, 0, 0);
+					  fk, data, l2_proto, nhoff, hlen, 0);
 	default:
 		break;
 	}
 
 	fk->ports.ports = 0;
 	memset(&fk->icmp, 0, sizeof(fk->icmp));
-	noff = skb_network_offset(skb);
-	if (!bond_flow_ip(skb, fk, &noff, &proto, l34))
+	if (!bond_flow_ip(skb, fk, data, hlen, l2_proto, &nhoff, &ip_proto, l34))
 		return false;
 
 	/* ICMP error packets contains at least 8 bytes of the header
@@ -3702,22 +3726,20 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
 	 * to correlate ICMP error packets within the same flow which
 	 * generated the error.
 	 */
-	if (proto == IPPROTO_ICMP || proto == IPPROTO_ICMPV6) {
-		skb_flow_get_icmp_tci(skb, &fk->icmp, skb->data,
-				      skb_transport_offset(skb),
-				      skb_headlen(skb));
-		if (proto == IPPROTO_ICMP) {
+	if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
+		skb_flow_get_icmp_tci(skb, &fk->icmp, data, nhoff, hlen);
+		if (ip_proto == IPPROTO_ICMP) {
 			if (!icmp_is_err(fk->icmp.type))
 				return true;
 
-			noff += sizeof(struct icmphdr);
-		} else if (proto == IPPROTO_ICMPV6) {
+			nhoff += sizeof(struct icmphdr);
+		} else if (ip_proto == IPPROTO_ICMPV6) {
 			if (!icmpv6_is_err(fk->icmp.type))
 				return true;
 
-			noff += sizeof(struct icmp6hdr);
+			nhoff += sizeof(struct icmp6hdr);
 		}
-		return bond_flow_ip(skb, fk, &noff, &proto, l34);
+		return bond_flow_ip(skb, fk, data, hlen, l2_proto, &nhoff, &ip_proto, l34);
 	}
 
 	return true;
@@ -3733,33 +3755,26 @@ static u32 bond_ip_hash(u32 hash, struct flow_keys *flow)
 	return hash >> 1;
 }
 
-/**
- * bond_xmit_hash - generate a hash value based on the xmit policy
- * @bond: bonding device
- * @skb: buffer to use for headers
- *
- * This function will extract the necessary headers from the skb buffer and use
- * them to generate a hash based on the xmit_policy set in the bonding device
+/* Generate hash based on xmit policy. If @skb is given it is used to linearize
+ * the data as required, but this function can be used without it if the data is
+ * known to be linear (e.g. with xdp_buff).
  */
-u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
+static u32 __bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, const void *data,
+			    __be16 l2_proto, int mhoff, int nhoff, int hlen)
 {
 	struct flow_keys flow;
 	u32 hash;
 
-	if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
-	    skb->l4_hash)
-		return skb->hash;
-
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC)
-		return bond_vlan_srcmac_hash(skb);
+		return bond_vlan_srcmac_hash(skb, data, mhoff, hlen);
 
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
-	    !bond_flow_dissect(bond, skb, &flow))
-		return bond_eth_hash(skb);
+	    !bond_flow_dissect(bond, skb, data, l2_proto, nhoff, hlen, &flow))
+		return bond_eth_hash(skb, data, mhoff, hlen);
 
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
 	    bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23) {
-		hash = bond_eth_hash(skb);
+		hash = bond_eth_hash(skb, data, mhoff, hlen);
 	} else {
 		if (flow.icmp.id)
 			memcpy(&hash, &flow.icmp, sizeof(hash));
@@ -3770,6 +3785,25 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
 	return bond_ip_hash(hash, &flow);
 }
 
+/**
+ * bond_xmit_hash - generate a hash value based on the xmit policy
+ * @bond: bonding device
+ * @skb: buffer to use for headers
+ *
+ * This function will extract the necessary headers from the skb buffer and use
+ * them to generate a hash based on the xmit_policy set in the bonding device
+ */
+u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
+{
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
+	    skb->l4_hash)
+		return skb->hash;
+
+	return __bond_xmit_hash(bond, skb, skb->head, skb->protocol,
+				skb->mac_header, skb->network_header,
+				skb_headlen(skb));
+}
+
 /*-------------------------- Device entry points ----------------------------*/
 
 void bond_work_init_all(struct bonding *bond)
@@ -4399,8 +4433,7 @@ static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
 	return bond_tx_drop(bond_dev, skb);
 }
 
-static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond,
-						      struct sk_buff *skb)
+static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond)
 {
 	return rcu_dereference(bond->curr_active_slave);
 }
@@ -4414,7 +4447,7 @@ static netdev_tx_t bond_xmit_activebackup(struct sk_buff *skb,
 	struct bonding *bond = netdev_priv(bond_dev);
 	struct slave *slave;
 
-	slave = bond_xmit_activebackup_slave_get(bond, skb);
+	slave = bond_xmit_activebackup_slave_get(bond);
 	if (slave)
 		return bond_dev_queue_xmit(bond, skb, slave->dev);
 
@@ -4712,7 +4745,7 @@ static struct net_device *bond_xmit_get_slave(struct net_device *master_dev,
 		slave = bond_xmit_roundrobin_slave_get(bond, skb);
 		break;
 	case BOND_MODE_ACTIVEBACKUP:
-		slave = bond_xmit_activebackup_slave_get(bond, skb);
+		slave = bond_xmit_activebackup_slave_get(bond);
 		break;
 	case BOND_MODE_8023AD:
 	case BOND_MODE_XOR:
-- 
2.17.1



* [PATCH bpf-next v4 2/6] net: core: Add support for XDP redirection to slave device
  2021-07-28 23:43 ` [PATCH bpf-next v4 0/6] XDP bonding support joamaki
  2021-07-28 23:43   ` [PATCH bpf-next v4 1/6] net: bonding: Refactor bond_xmit_hash for use with xdp_buff joamaki
@ 2021-07-28 23:43   ` joamaki
  2021-07-28 23:43   ` [PATCH bpf-next v4 3/6] net: bonding: Add XDP support to the bonding driver joamaki
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 71+ messages in thread
From: joamaki @ 2021-07-28 23:43 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

From: Jussi Maki <joamaki@gmail.com>

This adds the ndo_xdp_get_xmit_slave hook for transforming XDP_TX
into XDP_REDIRECT after the BPF program has run when the ingress
device is a bond slave.

dev_xdp_prog_count() is exported so that slave devices can be checked
for loaded XDP programs, in order to avoid the situation where both
the bond master and a slave have programs loaded according to
xdp_state.
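
The action rewrite can be sketched as a small userspace model. This is
an illustration under stated assumptions, not the kernel code: struct
model_dev, model_master_redirect() and model_run_xdp() are hypothetical,
the slave chosen by the master is passed in directly instead of being
looked up through ndo_xdp_get_xmit_slave, and the redirect target is
written to a plain int rather than to the per-cpu bpf_redirect_info.
The action values match the XDP_* values of the uapi enum xdp_action.

```c
#include <stddef.h>

/* Model copies of the XDP action codes. */
enum model_xdp_action {
	MODEL_XDP_ABORTED = 0,
	MODEL_XDP_DROP,
	MODEL_XDP_PASS,
	MODEL_XDP_TX,
	MODEL_XDP_REDIRECT,
};

struct model_dev {
	int ifindex;
};

/* If the master picked a slave other than the receiving device, turn
 * the action into a redirect to that slave; otherwise keep XDP_TX.
 * *tgt_index is only written on redirect.
 */
int model_master_redirect(const struct model_dev *rx_dev,
			  const struct model_dev *slave, int *tgt_index)
{
	if (slave && slave != rx_dev) {
		*tgt_index = slave->ifindex;
		return MODEL_XDP_REDIRECT;
	}
	return MODEL_XDP_TX;
}

/* Post-processing of the program's return value: only XDP_TX on a
 * bond slave is rewritten, every other action passes through.
 */
int model_run_xdp(int act, int rx_is_bond_slave,
		  const struct model_dev *rx_dev,
		  const struct model_dev *slave, int *tgt_index)
{
	if (act == MODEL_XDP_TX && rx_is_bond_slave)
		act = model_master_redirect(rx_dev, slave, tgt_index);
	return act;
}
```

Keeping XDP_TX when the selected slave is the receiving device means
the driver's own fast path is still used in that case; only a change of
egress device goes through the redirect machinery.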

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 include/linux/filter.h    | 13 ++++++++++++-
 include/linux/netdevice.h |  6 ++++++
 net/core/dev.c            |  8 +++++++-
 net/core/filter.c         | 25 +++++++++++++++++++++++++
 4 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index ba36989f711a..7ea1cc378042 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -761,6 +761,10 @@ static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
 
 DECLARE_BPF_DISPATCHER(xdp)
 
+DECLARE_STATIC_KEY_FALSE(bpf_master_redirect_enabled_key);
+
+u32 xdp_master_redirect(struct xdp_buff *xdp);
+
 static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 					    struct xdp_buff *xdp)
 {
@@ -768,7 +772,14 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 	 * under local_bh_disable(), which provides the needed RCU protection
 	 * for accessing map entries.
 	 */
-	return __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
+	u32 act = __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
+
+	if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) {
+		if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev))
+			act = xdp_master_redirect(xdp);
+	}
+
+	return act;
 }
 
 void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 42f6f866d5f3..a380786429e1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1321,6 +1321,9 @@ struct netdev_net_notifier {
  *	that got dropped are freed/returned via xdp_return_frame().
  *	Returns negative number, means general error invoking ndo, meaning
  *	no frames were xmit'ed and core-caller will free all frames.
+ * struct net_device *(*ndo_xdp_get_xmit_slave)(struct net_device *dev,
+ *					        struct xdp_buff *xdp);
+ *      Get the xmit slave of master device based on the xdp_buff.
  * int (*ndo_xsk_wakeup)(struct net_device *dev, u32 queue_id, u32 flags);
  *      This function is used to wake up the softirq, ksoftirqd or kthread
  *	responsible for sending and/or receiving packets on a specific
@@ -1539,6 +1542,8 @@ struct net_device_ops {
 	int			(*ndo_xdp_xmit)(struct net_device *dev, int n,
 						struct xdp_frame **xdp,
 						u32 flags);
+	struct net_device *	(*ndo_xdp_get_xmit_slave)(struct net_device *dev,
+							  struct xdp_buff *xdp);
 	int			(*ndo_xsk_wakeup)(struct net_device *dev,
 						  u32 queue_id, u32 flags);
 	struct devlink_port *	(*ndo_get_devlink_port)(struct net_device *dev);
@@ -4071,6 +4076,7 @@ typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 		      int fd, int expected_fd, u32 flags);
 int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
+u8 dev_xdp_prog_count(struct net_device *dev);
 u32 dev_xdp_prog_id(struct net_device *dev, enum bpf_xdp_mode mode);
 
 int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 3ee58876e8f5..99cb14242164 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9353,7 +9353,7 @@ static struct bpf_prog *dev_xdp_prog(struct net_device *dev,
 	return dev->xdp_state[mode].prog;
 }
 
-static u8 dev_xdp_prog_count(struct net_device *dev)
+u8 dev_xdp_prog_count(struct net_device *dev)
 {
 	u8 count = 0;
 	int i;
@@ -9363,6 +9363,7 @@ static u8 dev_xdp_prog_count(struct net_device *dev)
 			count++;
 	return count;
 }
+EXPORT_SYMBOL_GPL(dev_xdp_prog_count);
 
 u32 dev_xdp_prog_id(struct net_device *dev, enum bpf_xdp_mode mode)
 {
@@ -9486,6 +9487,11 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
 		NL_SET_ERR_MSG(extack, "XDP_FLAGS_REPLACE is not specified");
 		return -EINVAL;
 	}
+	/* don't allow loading XDP programs to a bonded device */
+	if (netif_is_bond_slave(dev)) {
+		NL_SET_ERR_MSG(extack, "XDP program can not be attached to a bond slave");
+		return -EINVAL;
+	}
 
 	mode = dev_xdp_mode(dev, flags);
 	/* can't replace attached link */
diff --git a/net/core/filter.c b/net/core/filter.c
index faf29fd82276..ff62cd39046d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3950,6 +3950,31 @@ void bpf_clear_redirect_map(struct bpf_map *map)
 	}
 }
 
+DEFINE_STATIC_KEY_FALSE(bpf_master_redirect_enabled_key);
+EXPORT_SYMBOL_GPL(bpf_master_redirect_enabled_key);
+
+u32 xdp_master_redirect(struct xdp_buff *xdp)
+{
+	struct net_device *master, *slave;
+	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+
+	master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev);
+	slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp);
+	if (slave && slave != xdp->rxq->dev) {
+		/* The target device is different from the receiving device, so
+		 * redirect it to the new device.
+		 * Using XDP_REDIRECT gets the correct behaviour from XDP enabled
+		 * drivers to unmap the packet from their rx ring.
+		 */
+		ri->tgt_index = slave->ifindex;
+		ri->map_id = INT_MAX;
+		ri->map_type = BPF_MAP_TYPE_UNSPEC;
+		return XDP_REDIRECT;
+	}
+	return XDP_TX;
+}
+EXPORT_SYMBOL_GPL(xdp_master_redirect);
+
 int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 		    struct bpf_prog *xdp_prog)
 {
-- 
2.17.1


* [PATCH bpf-next v4 3/6] net: bonding: Add XDP support to the bonding driver
  2021-07-28 23:43 ` [PATCH bpf-next v4 0/6] XDP bonding support joamaki
  2021-07-28 23:43   ` [PATCH bpf-next v4 1/6] net: bonding: Refactor bond_xmit_hash for use with xdp_buff joamaki
  2021-07-28 23:43   ` [PATCH bpf-next v4 2/6] net: core: Add support for XDP redirection to slave device joamaki
@ 2021-07-28 23:43   ` joamaki
  2021-07-28 23:43   ` [PATCH bpf-next v4 4/6] devmap: Exclude XDP broadcast to master device joamaki
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 71+ messages in thread
From: joamaki @ 2021-07-28 23:43 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

From: Jussi Maki <joamaki@gmail.com>

XDP is implemented in the bonding driver by transparently delegating
the XDP program loading, removal and xmit operations to the bonding
slave devices. The overall goal of this work is that XDP programs
can be attached to a bond device *without* any further changes (or
awareness) necessary to the program itself, meaning the same XDP
program can be attached to a native device but also a bonding device.

Semantics of XDP_TX when attached to a bond are equivalent in such
setting to the case when a tc/BPF program would be attached to the
bond, meaning transmitting the packet out of the bond itself using one
of the bond's configured xmit methods to select a slave device (rather
than XDP_TX on the slave itself). Handling of XDP_TX to transmit
using the configured bonding mechanism is therefore implemented by
rewriting the BPF program return value in bpf_prog_run_xdp. To avoid
performance impact this check is guarded by a static key, which is
incremented when an XDP program is loaded onto a bond device. This
approach was chosen to avoid changes to drivers implementing XDP. If
the slave device does not match the receive device, then XDP_REDIRECT
is transparently used to perform the redirection in order to have
the network driver release the packet from its RX ring.  The bonding
driver hashing functions have been refactored to allow reuse with
xdp_buff's to avoid code duplication.
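
The verdict rewrite described above can be modelled in plain user-space C.
This is only an illustrative sketch: the model_ types, the boolean that
stands in for the static key, and the function name are assumptions made
for this example and are not kernel API. Only the action values mirror
enum xdp_action from the UAPI header.

```c
#include <stdbool.h>
#include <stddef.h>

/* Values mirror enum xdp_action in the UAPI header. */
enum model_xdp_action {
	MODEL_XDP_ABORTED = 0,
	MODEL_XDP_DROP,
	MODEL_XDP_PASS,
	MODEL_XDP_TX,
	MODEL_XDP_REDIRECT,
};

/* Hypothetical, heavily simplified stand-in for struct net_device. */
struct model_dev {
	int ifindex;
	bool is_bond_slave;
};

/* Hypothetical stand-in for the static key: true while at least one
 * bond device has an XDP program attached.
 */
static bool model_master_redirect_enabled;

/* Rewrite the program's verdict the way the commit message describes:
 * only XDP_TX on a bond slave is touched, and it becomes XDP_REDIRECT
 * when the bond picks a slave other than the receiving device.
 */
static int model_rewrite_verdict(int act, const struct model_dev *rx_dev,
				 const struct model_dev *xmit_slave,
				 int *redirect_ifindex)
{
	if (!model_master_redirect_enabled)
		return act;	/* fast path: no bond has XDP attached */

	if (act != MODEL_XDP_TX || !rx_dev->is_bond_slave)
		return act;

	if (xmit_slave && xmit_slave->ifindex != rx_dev->ifindex) {
		*redirect_ifindex = xmit_slave->ifindex;
		return MODEL_XDP_REDIRECT;
	}

	/* Same device (or no slave found): plain XDP_TX on the rx device. */
	return MODEL_XDP_TX;
}
```

In this model, xmit_slave plays the role of whatever device the bond's
configured xmit method selects; how that selection is made is left out
of the sketch on purpose.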

The motivation for this change is to enable use of bonding (and
802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
XDP and also to transparently support bond devices for projects that
use XDP given most modern NICs have dual port adapters.  An alternative
to this approach would be to implement 802.3ad in user-space and
implement the bonding load-balancing in the XDP program itself, but
this is a rather cumbersome endeavor in terms of slave device management
(e.g. by watching netlink) and requires separate programs for native
vs bond cases for the orchestrator. A native in-kernel implementation
overcomes these issues and provides more flexibility.

Below are benchmark results done on two machines with 100Gbit
Intel E810 (ice) NIC and with 32-core 3970X on sending machine, and
16-core 3950X on receiving machine. 64 byte packets were sent with
pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
ice driver, so the tests were performed with iommu=off and patch [2]
applied. Additionally the bonding round robin algorithm was modified
to use per-cpu tx counters as high CPU load (50% vs 10%) and high rate
of cache misses were caused by the shared rr_tx_counter (see patch
2/3). The statistics were collected using "sar -n dev -u 1 10".

 -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
 without patch (1 dev):
   XDP_DROP:              3.15%      48.6Mpps
   XDP_TX:                3.12%      18.3Mpps     18.3Mpps
   XDP_DROP (RSS):        9.47%      116.5Mpps
   XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
 -----------------------
 with patch, bond (1 dev):
   XDP_DROP:              3.14%      46.7Mpps
   XDP_TX:                3.15%      13.9Mpps     13.9Mpps
   XDP_DROP (RSS):        10.33%     117.2Mpps
   XDP_TX (RSS):          10.64%     25.1Mpps     24.0Mpps
 -----------------------
 with patch, bond (2 devs):
   XDP_DROP:              6.27%      92.7Mpps
   XDP_TX:                6.26%      17.6Mpps     17.5Mpps
   XDP_DROP (RSS):       11.38%      117.2Mpps
   XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
 --------------------------------------------------------------

RSS: Receive Side Scaling, i.e. the packets were sent to a range of
destination IPs.

[1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
[2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
[3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 drivers/net/bonding/bond_main.c | 309 +++++++++++++++++++++++++++++++-
 include/net/bonding.h           |   1 +
 2 files changed, 309 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index dcec5cc4dab1..fcd01acd1c83 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -317,6 +317,19 @@ bool bond_sk_check(struct bonding *bond)
 	}
 }
 
+static bool bond_xdp_check(struct bonding *bond)
+{
+	switch (BOND_MODE(bond)) {
+	case BOND_MODE_ROUNDROBIN:
+	case BOND_MODE_ACTIVEBACKUP:
+	case BOND_MODE_8023AD:
+	case BOND_MODE_XOR:
+		return true;
+	default:
+		return false;
+	}
+}
+
 /*---------------------------------- VLAN -----------------------------------*/
 
 /* In the following 2 functions, bond_vlan_rx_add_vid and bond_vlan_rx_kill_vid,
@@ -2133,6 +2146,41 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
 		bond_update_slave_arr(bond, NULL);
 
 
+	if (!slave_dev->netdev_ops->ndo_bpf ||
+	    !slave_dev->netdev_ops->ndo_xdp_xmit) {
+		if (bond->xdp_prog) {
+			NL_SET_ERR_MSG(extack, "Slave does not support XDP");
+			slave_err(bond_dev, slave_dev, "Slave does not support XDP\n");
+			res = -EOPNOTSUPP;
+			goto err_sysfs_del;
+		}
+	} else {
+		struct netdev_bpf xdp = {
+			.command = XDP_SETUP_PROG,
+			.flags   = 0,
+			.prog    = bond->xdp_prog,
+			.extack  = extack,
+		};
+
+		if (dev_xdp_prog_count(slave_dev) > 0) {
+			NL_SET_ERR_MSG(extack,
+				       "Slave has XDP program loaded, please unload before enslaving");
+			slave_err(bond_dev, slave_dev,
+				  "Slave has XDP program loaded, please unload before enslaving\n");
+			res = -EOPNOTSUPP;
+			goto err_sysfs_del;
+		}
+
+		res = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (res < 0) {
+			/* ndo_bpf() sets extack error message */
+			slave_dbg(bond_dev, slave_dev, "Error %d calling ndo_bpf\n", res);
+			goto err_sysfs_del;
+		}
+		if (bond->xdp_prog)
+			bpf_prog_inc(bond->xdp_prog);
+	}
+
 	slave_info(bond_dev, slave_dev, "Enslaving as %s interface with %s link\n",
 		   bond_is_active_slave(new_slave) ? "an active" : "a backup",
 		   new_slave->link != BOND_LINK_DOWN ? "an up" : "a down");
@@ -2252,6 +2300,17 @@ static int __bond_release_one(struct net_device *bond_dev,
 	/* recompute stats just before removing the slave */
 	bond_get_stats(bond->dev, &bond->bond_stats);
 
+	if (bond->xdp_prog) {
+		struct netdev_bpf xdp = {
+			.command = XDP_SETUP_PROG,
+			.flags   = 0,
+			.prog	 = NULL,
+			.extack  = NULL,
+		};
+		if (slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp))
+			slave_warn(bond_dev, slave_dev, "failed to unload XDP program\n");
+	}
+
 	bond_upper_dev_unlink(bond, slave);
 	/* unregister rx_handler early so bond_handle_frame wouldn't be called
 	 * for this slave anymore.
@@ -3635,7 +3694,7 @@ static inline u32 bond_eth_hash(struct sk_buff *skb, const void *data, int mhoff
 		return 0;
 
 	ep = (struct ethhdr *)(data + mhoff);
-	return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
+	return ep->h_dest[5] ^ ep->h_source[5] ^ be16_to_cpu(ep->h_proto);
 }
 
 static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk, const void *data,
@@ -3804,6 +3863,26 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
 				skb_headlen(skb));
 }
 
+/**
+ * bond_xmit_hash_xdp - generate a hash value based on the xmit policy
+ * @bond: bonding device
+ * @xdp: buffer to use for headers
+ *
+ * The XDP variant of bond_xmit_hash.
+ */
+static u32 bond_xmit_hash_xdp(struct bonding *bond, struct xdp_buff *xdp)
+{
+	struct ethhdr *eth;
+
+	if (xdp->data + sizeof(struct ethhdr) > xdp->data_end)
+		return 0;
+
+	eth = (struct ethhdr *)xdp->data;
+
+	return __bond_xmit_hash(bond, NULL, xdp->data, eth->h_proto, 0,
+				sizeof(struct ethhdr), xdp->data_end - xdp->data);
+}
+
 /*-------------------------- Device entry points ----------------------------*/
 
 void bond_work_init_all(struct bonding *bond)
@@ -4420,6 +4499,47 @@ static struct slave *bond_xmit_roundrobin_slave_get(struct bonding *bond,
 	return NULL;
 }
 
+static struct slave *bond_xdp_xmit_roundrobin_slave_get(struct bonding *bond,
+							struct xdp_buff *xdp)
+{
+	struct slave *slave;
+	int slave_cnt;
+	u32 slave_id;
+	const struct ethhdr *eth;
+	void *data = xdp->data;
+
+	if (data + sizeof(struct ethhdr) > xdp->data_end)
+		goto non_igmp;
+
+	eth = (struct ethhdr *)data;
+	data += sizeof(struct ethhdr);
+
+	/* See comment on IGMP in bond_xmit_roundrobin_slave_get() */
+	if (eth->h_proto == htons(ETH_P_IP)) {
+		const struct iphdr *iph;
+
+		if (data + sizeof(struct iphdr) > xdp->data_end)
+			goto non_igmp;
+
+		iph = (struct iphdr *)data;
+
+		if (iph->protocol == IPPROTO_IGMP) {
+			slave = rcu_dereference(bond->curr_active_slave);
+			if (slave)
+				return slave;
+			return bond_get_slave_by_id(bond, 0);
+		}
+	}
+
+non_igmp:
+	slave_cnt = READ_ONCE(bond->slave_cnt);
+	if (likely(slave_cnt)) {
+		slave_id = bond_rr_gen_slave_id(bond) % slave_cnt;
+		return bond_get_slave_by_id(bond, slave_id);
+	}
+	return NULL;
+}
+
 static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
 					struct net_device *bond_dev)
 {
@@ -4635,6 +4755,22 @@ static struct slave *bond_xmit_3ad_xor_slave_get(struct bonding *bond,
 	return slave;
 }
 
+static struct slave *bond_xdp_xmit_3ad_xor_slave_get(struct bonding *bond,
+						     struct xdp_buff *xdp)
+{
+	struct bond_up_slave *slaves;
+	unsigned int count;
+	u32 hash;
+
+	hash = bond_xmit_hash_xdp(bond, xdp);
+	slaves = rcu_dereference(bond->usable_slaves);
+	count = slaves ? READ_ONCE(slaves->count) : 0;
+	if (unlikely(!count))
+		return NULL;
+
+	return slaves->arr[hash % count];
+}
+
 /* Use this Xmit function for 3AD as well as XOR modes. The current
  * usable slave array is formed in the control path. The xmit function
  * just calculates hash and sends the packet out.
@@ -4919,6 +5055,174 @@ static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return ret;
 }
 
+static struct net_device *
+bond_xdp_get_xmit_slave(struct net_device *bond_dev, struct xdp_buff *xdp)
+{
+	struct bonding *bond = netdev_priv(bond_dev);
+	struct slave *slave;
+
+	/* Caller needs to hold rcu_read_lock() */
+
+	switch (BOND_MODE(bond)) {
+	case BOND_MODE_ROUNDROBIN:
+		slave = bond_xdp_xmit_roundrobin_slave_get(bond, xdp);
+		break;
+
+	case BOND_MODE_ACTIVEBACKUP:
+		slave = bond_xmit_activebackup_slave_get(bond);
+		break;
+
+	case BOND_MODE_8023AD:
+	case BOND_MODE_XOR:
+		slave = bond_xdp_xmit_3ad_xor_slave_get(bond, xdp);
+		break;
+
+	default:
+		/* Should never happen. Mode guarded by bond_xdp_check() */
+		netdev_err(bond_dev, "Unknown bonding mode %d for xdp xmit\n", BOND_MODE(bond));
+		WARN_ON_ONCE(1);
+		return NULL;
+	}
+
+	if (slave)
+		return slave->dev;
+
+	return NULL;
+}
+
+static int bond_xdp_xmit(struct net_device *bond_dev,
+			 int n, struct xdp_frame **frames, u32 flags)
+{
+	int nxmit, err = -ENXIO;
+
+	rcu_read_lock();
+
+	for (nxmit = 0; nxmit < n; nxmit++) {
+		struct xdp_frame *frame = frames[nxmit];
+		struct xdp_frame *frames1[] = {frame};
+		struct net_device *slave_dev;
+		struct xdp_buff xdp;
+
+		xdp_convert_frame_to_buff(frame, &xdp);
+
+		slave_dev = bond_xdp_get_xmit_slave(bond_dev, &xdp);
+		if (!slave_dev) {
+			err = -ENXIO;
+			break;
+		}
+
+		err = slave_dev->netdev_ops->ndo_xdp_xmit(slave_dev, 1, frames1, flags);
+		if (err < 1)
+			break;
+	}
+
+	rcu_read_unlock();
+
+	/* If error happened on the first frame then we can pass the error up, otherwise
+	 * report the number of frames that were xmitted.
+	 */
+	if (err < 0)
+		return (nxmit == 0 ? err : nxmit);
+
+	return nxmit;
+}
+
+static int bond_xdp_set(struct net_device *dev, struct bpf_prog *prog,
+			struct netlink_ext_ack *extack)
+{
+	struct bonding *bond = netdev_priv(dev);
+	struct list_head *iter;
+	struct slave *slave, *rollback_slave;
+	struct bpf_prog *old_prog;
+	struct netdev_bpf xdp = {
+		.command = XDP_SETUP_PROG,
+		.flags   = 0,
+		.prog    = prog,
+		.extack  = extack,
+	};
+	int err;
+
+	ASSERT_RTNL();
+
+	if (!bond_xdp_check(bond))
+		return -EOPNOTSUPP;
+
+	old_prog = bond->xdp_prog;
+	bond->xdp_prog = prog;
+
+	bond_for_each_slave(bond, slave, iter) {
+		struct net_device *slave_dev = slave->dev;
+
+		if (!slave_dev->netdev_ops->ndo_bpf ||
+		    !slave_dev->netdev_ops->ndo_xdp_xmit) {
+			NL_SET_ERR_MSG(extack, "Slave device does not support XDP");
+			slave_err(dev, slave_dev, "Slave does not support XDP\n");
+			err = -EOPNOTSUPP;
+			goto err;
+		}
+
+		if (dev_xdp_prog_count(slave_dev) > 0) {
+			NL_SET_ERR_MSG(extack,
+				       "Slave has XDP program loaded, please unload before enslaving");
+			slave_err(dev, slave_dev,
+				  "Slave has XDP program loaded, please unload before enslaving\n");
+			err = -EOPNOTSUPP;
+			goto err;
+		}
+
+		err = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (err < 0) {
+			/* ndo_bpf() sets extack error message */
+			slave_err(dev, slave_dev, "Error %d calling ndo_bpf\n", err);
+			goto err;
+		}
+		if (prog)
+			bpf_prog_inc(prog);
+	}
+
+	if (old_prog)
+		bpf_prog_put(old_prog);
+
+	if (prog)
+		static_branch_inc(&bpf_master_redirect_enabled_key);
+	else
+		static_branch_dec(&bpf_master_redirect_enabled_key);
+
+	return 0;
+
+err:
+	/* unwind the program changes */
+	bond->xdp_prog = old_prog;
+	xdp.prog = old_prog;
+	xdp.extack = NULL; /* do not overwrite original error */
+
+	bond_for_each_slave(bond, rollback_slave, iter) {
+		struct net_device *slave_dev = rollback_slave->dev;
+		int err_unwind;
+
+		if (slave == rollback_slave)
+			break;
+
+		err_unwind = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (err_unwind < 0)
+			slave_err(dev, slave_dev,
+				  "Error %d when unwinding XDP program change\n", err_unwind);
+		else if (xdp.prog)
+			bpf_prog_inc(xdp.prog);
+	}
+	return err;
+}
+
+static int bond_xdp(struct net_device *dev, struct netdev_bpf *xdp)
+{
+	switch (xdp->command) {
+	case XDP_SETUP_PROG:
+		return bond_xdp_set(dev, xdp->prog, xdp->extack);
+	default:
+		return -EINVAL;
+	}
+}
+
 static u32 bond_mode_bcast_speed(struct slave *slave, u32 speed)
 {
 	if (speed == 0 || speed == SPEED_UNKNOWN)
@@ -5005,6 +5309,9 @@ static const struct net_device_ops bond_netdev_ops = {
 	.ndo_features_check	= passthru_features_check,
 	.ndo_get_xmit_slave	= bond_xmit_get_slave,
 	.ndo_sk_get_lower_dev	= bond_sk_get_lower_dev,
+	.ndo_bpf		= bond_xdp,
+	.ndo_xdp_xmit           = bond_xdp_xmit,
+	.ndo_xdp_get_xmit_slave = bond_xdp_get_xmit_slave,
 };
 
 static const struct device_type bond_type = {
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 625d9c72dee3..b91c365e4e95 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -258,6 +258,7 @@ struct bonding {
 	/* protecting ipsec_list */
 	spinlock_t ipsec_lock;
 #endif /* CONFIG_XFRM_OFFLOAD */
+	struct bpf_prog *xdp_prog;
 };
 
 #define bond_slave_get_rcu(dev) \
-- 
2.17.1


* [PATCH bpf-next v4 4/6] devmap: Exclude XDP broadcast to master device
  2021-07-28 23:43 ` [PATCH bpf-next v4 0/6] XDP bonding support joamaki
                     ` (2 preceding siblings ...)
  2021-07-28 23:43   ` [PATCH bpf-next v4 3/6] net: bonding: Add XDP support to the bonding driver joamaki
@ 2021-07-28 23:43   ` joamaki
  2021-07-28 23:43   ` [PATCH bpf-next v4 5/6] net: core: Allow netdev_lower_get_next_private_rcu in bh context joamaki
  2021-07-28 23:43   ` [PATCH bpf-next v4 6/6] selftests/bpf: Add tests for XDP bonding joamaki
  5 siblings, 0 replies; 71+ messages in thread
From: joamaki @ 2021-07-28 23:43 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

From: Jussi Maki <joamaki@gmail.com>

If the ingress device is a bond slave, do not broadcast back
through it or the bond master.
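
The exclusion rule can be sketched in user-space C as follows. This is a
simplified model and not the kernel code: the model_ names are invented
for this example, and MODEL_MAX_UPPER is an assumed bound that stands in
for the limit the patch uses when sizing its array of excluded devices.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed bound for this sketch only. */
#define MODEL_MAX_UPPER 8

/* Fill 'excluded' (which must have room for MODEL_MAX_UPPER + 1 entries)
 * with the ifindexes of the ingress device's upper devices, followed by
 * the ingress device itself. Returns the number of entries written.
 */
static size_t model_build_excluded(int ingress_ifindex, const int *upper,
				   size_t n_upper, int *excluded)
{
	size_t n = 0;

	for (size_t i = 0; i < n_upper && i < MODEL_MAX_UPPER; i++)
		excluded[n++] = upper[i];
	excluded[n++] = ingress_ifindex;
	return n;
}

/* True if 'ifindex' must be skipped when broadcasting. */
static bool model_is_excluded(const int *excluded, size_t n, int ifindex)
{
	while (n--) {
		if (excluded[n] == ifindex)
			return true;
	}
	return false;
}
```

With an ingress slave whose only upper device is the bond, both the slave
and the bond end up in the list, so a broadcast skips both of them while
still reaching every other map entry.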

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 kernel/bpf/devmap.c | 69 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 60 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 542e94fa30b4..f02d04540c0c 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -534,10 +534,9 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 	return __xdp_enqueue(dev, xdp, dev_rx, dst->xdp_prog);
 }
 
-static bool is_valid_dst(struct bpf_dtab_netdev *obj, struct xdp_buff *xdp,
-			 int exclude_ifindex)
+static bool is_valid_dst(struct bpf_dtab_netdev *obj, struct xdp_buff *xdp)
 {
-	if (!obj || obj->dev->ifindex == exclude_ifindex ||
+	if (!obj ||
 	    !obj->dev->netdev_ops->ndo_xdp_xmit)
 		return false;
 
@@ -562,17 +561,48 @@ static int dev_map_enqueue_clone(struct bpf_dtab_netdev *obj,
 	return 0;
 }
 
+static inline bool is_ifindex_excluded(int *excluded, int num_excluded, int ifindex)
+{
+	while (num_excluded--) {
+		if (ifindex == excluded[num_excluded])
+			return true;
+	}
+	return false;
+}
+
+/* Get ifindex of each upper device. 'indexes' must be able to hold at
+ * least MAX_NEST_DEV elements.
+ * Returns the number of ifindexes added.
+ */
+static int get_upper_ifindexes(struct net_device *dev, int *indexes)
+{
+	struct net_device *upper;
+	struct list_head *iter;
+	int n = 0;
+
+	netdev_for_each_upper_dev_rcu(dev, upper, iter) {
+		indexes[n++] = upper->ifindex;
+	}
+	return n;
+}
+
 int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 			  struct bpf_map *map, bool exclude_ingress)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
-	int exclude_ifindex = exclude_ingress ? dev_rx->ifindex : 0;
 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
+	int excluded_devices[1+MAX_NEST_DEV];
 	struct hlist_head *head;
 	struct xdp_frame *xdpf;
+	int num_excluded = 0;
 	unsigned int i;
 	int err;
 
+	if (exclude_ingress) {
+		num_excluded = get_upper_ifindexes(dev_rx, excluded_devices);
+		excluded_devices[num_excluded++] = dev_rx->ifindex;
+	}
+
 	xdpf = xdp_convert_buff_to_frame(xdp);
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
@@ -581,7 +611,10 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 		for (i = 0; i < map->max_entries; i++) {
 			dst = rcu_dereference_check(dtab->netdev_map[i],
 						    rcu_read_lock_bh_held());
-			if (!is_valid_dst(dst, xdp, exclude_ifindex))
+			if (!is_valid_dst(dst, xdp))
+				continue;
+
+			if (is_ifindex_excluded(excluded_devices, num_excluded, dst->dev->ifindex))
 				continue;
 
 			/* we only need n-1 clones; last_dst enqueued below */
@@ -601,7 +634,11 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 			head = dev_map_index_hash(dtab, i);
 			hlist_for_each_entry_rcu(dst, head, index_hlist,
 						 lockdep_is_held(&dtab->index_lock)) {
-				if (!is_valid_dst(dst, xdp, exclude_ifindex))
+				if (!is_valid_dst(dst, xdp))
+					continue;
+
+				if (is_ifindex_excluded(excluded_devices, num_excluded,
+							dst->dev->ifindex))
 					continue;
 
 				/* we only need n-1 clones; last_dst enqueued below */
@@ -675,18 +712,27 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
 			   bool exclude_ingress)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
-	int exclude_ifindex = exclude_ingress ? dev->ifindex : 0;
 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
+	int excluded_devices[1+MAX_NEST_DEV];
 	struct hlist_head *head;
 	struct hlist_node *next;
+	int num_excluded = 0;
 	unsigned int i;
 	int err;
 
+	if (exclude_ingress) {
+		num_excluded = get_upper_ifindexes(dev, excluded_devices);
+		excluded_devices[num_excluded++] = dev->ifindex;
+	}
+
 	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
 		for (i = 0; i < map->max_entries; i++) {
 			dst = rcu_dereference_check(dtab->netdev_map[i],
 						    rcu_read_lock_bh_held());
-			if (!dst || dst->dev->ifindex == exclude_ifindex)
+			if (!dst)
+				continue;
+
+			if (is_ifindex_excluded(excluded_devices, num_excluded, dst->dev->ifindex))
 				continue;
 
 			/* we only need n-1 clones; last_dst enqueued below */
@@ -700,12 +746,17 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
 				return err;
 
 			last_dst = dst;
+
 		}
 	} else { /* BPF_MAP_TYPE_DEVMAP_HASH */
 		for (i = 0; i < dtab->n_buckets; i++) {
 			head = dev_map_index_hash(dtab, i);
 			hlist_for_each_entry_safe(dst, next, head, index_hlist) {
-				if (!dst || dst->dev->ifindex == exclude_ifindex)
+				if (!dst)
+					continue;
+
+				if (is_ifindex_excluded(excluded_devices, num_excluded,
+							dst->dev->ifindex))
 					continue;
 
 				/* we only need n-1 clones; last_dst enqueued below */
-- 
2.17.1


* [PATCH bpf-next v4 5/6] net: core: Allow netdev_lower_get_next_private_rcu in bh context
  2021-07-28 23:43 ` [PATCH bpf-next v4 0/6] XDP bonding support joamaki
                     ` (3 preceding siblings ...)
  2021-07-28 23:43   ` [PATCH bpf-next v4 4/6] devmap: Exclude XDP broadcast to master device joamaki
@ 2021-07-28 23:43   ` joamaki
  2021-07-28 23:43   ` [PATCH bpf-next v4 6/6] selftests/bpf: Add tests for XDP bonding joamaki
  5 siblings, 0 replies; 71+ messages in thread
From: joamaki @ 2021-07-28 23:43 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

From: Jussi Maki <joamaki@gmail.com>

For the XDP bonding slave lookup to work in the NAPI poll context,
in which the redundant rcu_read_lock() has been removed, we have to
follow the same approach as in [1] and modify the WARN_ON to also
check rcu_read_lock_bh_held().

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=694cea395fded425008e93cd90cfdf7a451674af

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 99cb14242164..9cdb551db5dd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7588,7 +7588,7 @@ void *netdev_lower_get_next_private_rcu(struct net_device *dev,
 {
 	struct netdev_adjacent *lower;
 
-	WARN_ON_ONCE(!rcu_read_lock_held());
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held());
 
 	lower = list_entry_rcu((*iter)->next, struct netdev_adjacent, list);
 
-- 
2.17.1


* [PATCH bpf-next v4 6/6] selftests/bpf: Add tests for XDP bonding
  2021-07-28 23:43 ` [PATCH bpf-next v4 0/6] XDP bonding support joamaki
                     ` (4 preceding siblings ...)
  2021-07-28 23:43   ` [PATCH bpf-next v4 5/6] net: core: Allow netdev_lower_get_next_private_rcu in bh context joamaki
@ 2021-07-28 23:43   ` joamaki
  2021-08-03  0:19     ` Andrii Nakryiko
  5 siblings, 1 reply; 71+ messages in thread
From: joamaki @ 2021-07-28 23:43 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

From: Jussi Maki <joamaki@gmail.com>

Add a test suite to test the XDP bonding implementation
over a pair of veth devices.

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 .../selftests/bpf/prog_tests/xdp_bonding.c    | 467 ++++++++++++++++++
 1 file changed, 467 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c b/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
new file mode 100644
index 000000000000..6e84c2d8d7ac
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
@@ -0,0 +1,467 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/**
+ * Test XDP bonding support
+ *
+ * Sets up two bonded veth pairs between two fresh namespaces
+ * and verifies that XDP_TX programs loaded on a bond device
+ * are correctly loaded onto the slave devices and XDP_TX'd
+ * packets are balanced using bonding.
+ */
+
+#define _GNU_SOURCE
+#include <sched.h>
+#include <net/if.h>
+#include <linux/if_link.h>
+#include "test_progs.h"
+#include "network_helpers.h"
+#include <linux/if_bonding.h>
+#include <linux/limits.h>
+#include <linux/udp.h>
+
+#define BOND1_MAC {0x00, 0x11, 0x22, 0x33, 0x44, 0x55}
+#define BOND1_MAC_STR "00:11:22:33:44:55"
+#define BOND2_MAC {0x00, 0x22, 0x33, 0x44, 0x55, 0x66}
+#define BOND2_MAC_STR "00:22:33:44:55:66"
+#define NPACKETS 100
+
+static int root_netns_fd = -1;
+
+static void restore_root_netns(void)
+{
+	ASSERT_OK(setns(root_netns_fd, CLONE_NEWNET), "restore_root_netns");
+}
+
+int setns_by_name(char *name)
+{
+	int nsfd, err;
+	char nspath[PATH_MAX];
+
+	snprintf(nspath, sizeof(nspath), "%s/%s", "/var/run/netns", name);
+	nsfd = open(nspath, O_RDONLY | O_CLOEXEC);
+	if (nsfd < 0)
+		return -1;
+
+	err = setns(nsfd, CLONE_NEWNET);
+	close(nsfd);
+	return err;
+}
+
+static int get_rx_packets(const char *iface)
+{
+	FILE *f;
+	char line[512];
+	int iface_len = strlen(iface);
+
+	f = fopen("/proc/net/dev", "r");
+	if (!f)
+		return -1;
+
+	while (fgets(line, sizeof(line), f)) {
+		char *p = line;
+
+		while (*p == ' ')
+			p++; /* skip whitespace */
+		if (!strncmp(p, iface, iface_len)) {
+			p += iface_len;
+			if (*p++ != ':')
+				continue;
+			while (*p == ' ')
+				p++; /* skip whitespace */
+			while (*p && *p != ' ')
+				p++; /* skip rx bytes */
+			while (*p == ' ')
+				p++; /* skip whitespace */
+			fclose(f);
+			return atoi(p);
+		}
+	}
+	fclose(f);
+	return -1;
+}
+
+enum {
+	BOND_ONE_NO_ATTACH = 0,
+	BOND_BOTH_AND_ATTACH,
+};
+
+static const char * const mode_names[] = {
+	[BOND_MODE_ROUNDROBIN]   = "balance-rr",
+	[BOND_MODE_ACTIVEBACKUP] = "active-backup",
+	[BOND_MODE_XOR]          = "balance-xor",
+	[BOND_MODE_BROADCAST]    = "broadcast",
+	[BOND_MODE_8023AD]       = "802.3ad",
+	[BOND_MODE_TLB]          = "balance-tlb",
+	[BOND_MODE_ALB]          = "balance-alb",
+};
+
+static const char * const xmit_policy_names[] = {
+	[BOND_XMIT_POLICY_LAYER2]       = "layer2",
+	[BOND_XMIT_POLICY_LAYER34]      = "layer3+4",
+	[BOND_XMIT_POLICY_LAYER23]      = "layer2+3",
+	[BOND_XMIT_POLICY_ENCAP23]      = "encap2+3",
+	[BOND_XMIT_POLICY_ENCAP34]      = "encap3+4",
+};
+
+#define MAX_LOADED 8
+static struct bpf_object *loaded_bpf_objects[MAX_LOADED] = {};
+static int n_loaded_bpf_objects;
+
+static int load_xdp_program(const char *filename, const char *sec_name, const char *iface)
+{
+	struct bpf_prog_load_attr prog_load_attr = {
+		.prog_type = BPF_PROG_TYPE_XDP,
+		.file = filename,
+	};
+	struct bpf_program *prog;
+	struct bpf_object *obj;
+	int prog_fd = -1;
+	int ifindex, err;
+
+	err = bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd);
+	if (!ASSERT_OK(err, "prog load xattr"))
+		return err;
+
+	prog = bpf_object__find_program_by_title(obj, sec_name);
+	if (!ASSERT_OK_PTR(prog, "find program"))
+		goto err;
+
+	prog_fd = bpf_program__fd(prog);
+	if (!ASSERT_GE(prog_fd, 0, "get program fd"))
+		goto err;
+
+	ifindex = if_nametoindex(iface);
+	if (!ASSERT_GT(ifindex, 0, "get ifindex"))
+		goto err;
+
+	err = bpf_set_link_xdp_fd(ifindex, prog_fd, XDP_FLAGS_DRV_MODE);
+	if (!ASSERT_OK(err, "load xdp program"))
+		goto err;
+
+	if (n_loaded_bpf_objects == MAX_LOADED) {
+		fprintf(stderr, "Too many loaded BPF objects\n");
+		goto err;
+	}
+	loaded_bpf_objects[n_loaded_bpf_objects++] = obj;
+
+	return 0;
+
+err:
+	bpf_object__close(obj);
+	return -1;
+}
+
+static int bonding_setup(int mode, int xmit_policy, int bond_both_attach)
+{
+#define SYS(fmt, ...)						\
+	({							\
+		char cmd[1024];					\
+		snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__);	\
+		if (!ASSERT_OK(system(cmd), cmd))		\
+			return -1;				\
+	})
+
+	SYS("ip netns add ns_dst");
+	SYS("ip link add veth1_1 type veth peer name veth2_1 netns ns_dst");
+	SYS("ip link add veth1_2 type veth peer name veth2_2 netns ns_dst");
+
+	SYS("ip link add bond1 type bond mode %s xmit_hash_policy %s",
+	    mode_names[mode], xmit_policy_names[xmit_policy]);
+	SYS("ip link set bond1 up address " BOND1_MAC_STR " addrgenmode none");
+	SYS("ip -netns ns_dst link add bond2 type bond mode %s xmit_hash_policy %s",
+	    mode_names[mode], xmit_policy_names[xmit_policy]);
+	SYS("ip -netns ns_dst link set bond2 up address " BOND2_MAC_STR " addrgenmode none");
+
+	SYS("ip link set veth1_1 master bond1");
+	if (bond_both_attach == BOND_BOTH_AND_ATTACH) {
+		SYS("ip link set veth1_2 master bond1");
+	} else {
+		SYS("ip link set veth1_2 up addrgenmode none");
+
+		if (load_xdp_program("xdp_dummy.o", "xdp_dummy", "veth1_2"))
+			return -1;
+	}
+
+	SYS("ip -netns ns_dst link set veth2_1 master bond2");
+
+	if (bond_both_attach == BOND_BOTH_AND_ATTACH)
+		SYS("ip -netns ns_dst link set veth2_2 master bond2");
+	else
+		SYS("ip -netns ns_dst link set veth2_2 up addrgenmode none");
+
+	/* Load a dummy program on the sending side, as with veth the peer
+	 * needs to have an XDP program loaded as well.
+	 */
+	if (load_xdp_program("xdp_dummy.o", "xdp_dummy", "bond1"))
+		return -1;
+
+	if (bond_both_attach == BOND_BOTH_AND_ATTACH) {
+		if (!ASSERT_OK(setns_by_name("ns_dst"), "set netns to ns_dst"))
+			return -1;
+		if (load_xdp_program("xdp_tx.o", "tx", "bond2"))
+			return -1;
+		restore_root_netns();
+	}
+
+#undef SYS
+	return 0;
+}
+
+static void bonding_cleanup(void)
+{
+	restore_root_netns();
+	while (n_loaded_bpf_objects) {
+		n_loaded_bpf_objects--;
+		bpf_object__close(loaded_bpf_objects[n_loaded_bpf_objects]);
+	}
+	ASSERT_OK(system("ip link delete bond1"), "delete bond1");
+	ASSERT_OK(system("ip link delete veth1_1"), "delete veth1_1");
+	ASSERT_OK(system("ip link delete veth1_2"), "delete veth1_2");
+	ASSERT_OK(system("ip netns delete ns_dst"), "delete ns_dst");
+}
+
+static int send_udp_packets(int vary_dst_ip)
+{
+	struct ethhdr eh = {
+		.h_source = BOND1_MAC,
+		.h_dest = BOND2_MAC,
+		.h_proto = htons(ETH_P_IP),
+	};
+	uint8_t buf[128] = {};
+	struct iphdr *iph = (struct iphdr *)(buf + sizeof(eh));
+	struct udphdr *uh = (struct udphdr *)(buf + sizeof(eh) + sizeof(*iph));
+	int i, s = -1;
+	int ifindex;
+
+	s = socket(AF_PACKET, SOCK_RAW, IPPROTO_RAW);
+	if (!ASSERT_GE(s, 0, "socket"))
+		goto err;
+
+	ifindex = if_nametoindex("bond1");
+	if (!ASSERT_GT(ifindex, 0, "get bond1 ifindex"))
+		goto err;
+
+	memcpy(buf, &eh, sizeof(eh));
+	iph->ihl = 5;
+	iph->version = 4;
+	iph->tos = 16;
+	iph->id = 1;
+	iph->ttl = 64;
+	iph->protocol = IPPROTO_UDP;
+	iph->saddr = 1;
+	iph->daddr = 2;
+	iph->tot_len = htons(sizeof(buf) - ETH_HLEN);
+	iph->check = 0;
+
+	for (i = 1; i <= NPACKETS; i++) {
+		int n;
+		struct sockaddr_ll saddr_ll = {
+			.sll_ifindex = ifindex,
+			.sll_halen = ETH_ALEN,
+			.sll_addr = BOND2_MAC,
+		};
+
+		/* vary the UDP destination port for even distribution with roundrobin/xor modes */
+		uh->dest++;
+
+		if (vary_dst_ip)
+			iph->daddr++;
+
+		n = sendto(s, buf, sizeof(buf), 0, (struct sockaddr *)&saddr_ll, sizeof(saddr_ll));
+		if (!ASSERT_EQ(n, sizeof(buf), "sendto"))
+			goto err;
+	}
+
+	return 0;
+
+err:
+	if (s >= 0)
+		close(s);
+	return -1;
+}
+
+void test_xdp_bonding_with_mode(char *name, int mode, int xmit_policy)
+{
+	int bond1_rx;
+
+	if (!test__start_subtest(name))
+		return;
+
+	if (bonding_setup(mode, xmit_policy, BOND_BOTH_AND_ATTACH))
+		goto out;
+
+	if (send_udp_packets(xmit_policy != BOND_XMIT_POLICY_LAYER34))
+		goto out;
+
+	bond1_rx = get_rx_packets("bond1");
+	ASSERT_EQ(bond1_rx, NPACKETS, "expected more received packets");
+
+	switch (mode) {
+	case BOND_MODE_ROUNDROBIN:
+	case BOND_MODE_XOR: {
+		int veth1_rx = get_rx_packets("veth1_1");
+		int veth2_rx = get_rx_packets("veth1_2");
+		int diff = abs(veth1_rx - veth2_rx);
+
+		ASSERT_GE(veth1_rx + veth2_rx, NPACKETS, "expected more packets");
+
+		switch (xmit_policy) {
+		case BOND_XMIT_POLICY_LAYER2:
+			ASSERT_GE(diff, NPACKETS,
+				  "expected packets on only one of the interfaces");
+			break;
+		case BOND_XMIT_POLICY_LAYER23:
+		case BOND_XMIT_POLICY_LAYER34:
+			ASSERT_LT(diff, NPACKETS/2,
+				  "expected even distribution of packets");
+			break;
+		default:
+			PRINT_FAIL("Unimplemented xmit_policy=%d\n", xmit_policy);
+			break;
+		}
+		break;
+	}
+	case BOND_MODE_ACTIVEBACKUP: {
+		int veth1_rx = get_rx_packets("veth1_1");
+		int veth2_rx = get_rx_packets("veth1_2");
+		int diff = abs(veth1_rx - veth2_rx);
+
+		ASSERT_GE(diff, NPACKETS,
+			  "expected packets on only one of the interfaces");
+		break;
+	}
+	default:
+		PRINT_FAIL("Unimplemented mode=%d\n", mode);
+		break;
+	}
+
+out:
+	bonding_cleanup();
+}
+
+
+/* Test the broadcast redirection using xdp_redirect_map_multi_prog: add all
+ * the interfaces to the map and check that broadcasting does not send the packet
+ * to either the ingress bond device (bond2) or its slave (veth2_1).
+ */
+void test_xdp_bonding_redirect_multi(void)
+{
+	static const char * const ifaces[] = {"bond2", "veth2_1", "veth2_2"};
+	struct bpf_prog_load_attr prog_load_attr = {
+		.prog_type = BPF_PROG_TYPE_UNSPEC,
+		.file = "xdp_redirect_multi_kern.o",
+	};
+	struct bpf_program *redirect_prog;
+	int prog_fd, map_all_fd;
+	struct bpf_object *obj = NULL;
+	int veth1_1_rx, veth1_2_rx;
+	int err;
+
+	if (!test__start_subtest("xdp_bonding_redirect_multi"))
+		return;
+
+	if (bonding_setup(BOND_MODE_ROUNDROBIN, BOND_XMIT_POLICY_LAYER23, BOND_ONE_NO_ATTACH))
+		goto out;
+
+	err = bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd);
+	if (!ASSERT_OK(err, "prog load xattr"))
+		goto out;
+
+	map_all_fd = bpf_object__find_map_fd_by_name(obj, "map_all");
+	if (!ASSERT_GE(map_all_fd, 0, "find map_all fd"))
+		goto out;
+
+	redirect_prog = bpf_object__find_program_by_name(obj, "xdp_redirect_map_multi_prog");
+	if (!ASSERT_OK_PTR(redirect_prog, "find xdp_redirect_map_multi_prog"))
+		goto out;
+
+	prog_fd = bpf_program__fd(redirect_prog);
+	if (!ASSERT_GE(prog_fd, 0, "get prog fd"))
+		goto out;
+
+	if (!ASSERT_OK(setns_by_name("ns_dst"), "could not set netns to ns_dst"))
+		goto out;
+
+	/* populate the devmap with the relevant interfaces */
+	for (int i = 0; i < ARRAY_SIZE(ifaces); i++) {
+		int ifindex = if_nametoindex(ifaces[i]);
+
+		if (!ASSERT_GT(ifindex, 0, "could not get interface index"))
+			goto out;
+
+		if (!ASSERT_OK(bpf_map_update_elem(map_all_fd, &ifindex, &ifindex, 0),
+			       "add interface to map_all"))
+			goto out;
+	}
+
+	/* finally attach the program */
+	err = bpf_set_link_xdp_fd(if_nametoindex("bond2"), prog_fd,
+				  XDP_FLAGS_DRV_MODE | XDP_FLAGS_UPDATE_IF_NOEXIST);
+	if (!ASSERT_OK(err, "set bond2 xdp"))
+		goto out;
+
+	restore_root_netns();
+
+	if (send_udp_packets(BOND_MODE_ROUNDROBIN))
+		goto out;
+
+	veth1_1_rx = get_rx_packets("veth1_1");
+	veth1_2_rx = get_rx_packets("veth1_2");
+
+	ASSERT_EQ(veth1_1_rx, 0, "expected no packets on veth1_1");
+	ASSERT_GE(veth1_2_rx, NPACKETS, "expected packets on veth1_2");
+
+out:
+	restore_root_netns();
+	bpf_object__close(obj);
+	bonding_cleanup();
+}
+
+static int libbpf_debug_print(enum libbpf_print_level level,
+			      const char *format, va_list args)
+{
+	if (level != LIBBPF_WARN)
+		vprintf(format, args);
+	return 0;
+}
+
+struct bond_test_case {
+	char *name;
+	int mode;
+	int xmit_policy;
+};
+
+static struct bond_test_case bond_test_cases[] = {
+	{ "xdp_bonding_roundrobin", BOND_MODE_ROUNDROBIN, BOND_XMIT_POLICY_LAYER23, },
+	{ "xdp_bonding_activebackup", BOND_MODE_ACTIVEBACKUP, BOND_XMIT_POLICY_LAYER23 },
+
+	{ "xdp_bonding_xor_layer2", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER2, },
+	{ "xdp_bonding_xor_layer23", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER23, },
+	{ "xdp_bonding_xor_layer34", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER34, },
+};
+
+void test_xdp_bonding(void)
+{
+	libbpf_print_fn_t old_print_fn;
+	int i;
+
+	old_print_fn = libbpf_set_print(libbpf_debug_print);
+
+	root_netns_fd = open("/proc/self/ns/net", O_RDONLY);
+	if (!ASSERT_GE(root_netns_fd, 0, "open /proc/self/ns/net"))
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(bond_test_cases); i++) {
+		struct bond_test_case *test_case = &bond_test_cases[i];
+
+		test_xdp_bonding_with_mode(
+			test_case->name,
+			test_case->mode,
+			test_case->xmit_policy);
+	}
+
+	test_xdp_bonding_redirect_multi();
+
+	libbpf_set_print(old_print_fn);
+	close(root_netns_fd);
+}
-- 
2.17.1



* [PATCH bpf-next v5 0/7] XDP bonding support
  2021-06-09 13:55 [PATCH bpf-next 0/3] XDP bonding support Jussi Maki
                   ` (6 preceding siblings ...)
  2021-07-28 23:43 ` [PATCH bpf-next v4 0/6] XDP bonding support joamaki
@ 2021-07-30  6:18 ` Jussi Maki
  2021-07-30  6:18   ` [PATCH bpf-next v5 1/7] net: bonding: Refactor bond_xmit_hash for use with xdp_buff Jussi Maki
                     ` (6 more replies)
  2021-07-31  5:57 ` [PATCH bpf-next v6 0/7]: XDP bonding support Jussi Maki
  8 siblings, 7 replies; 71+ messages in thread
From: Jussi Maki @ 2021-07-30  6:18 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson

This patchset introduces XDP support to the bonding driver.

The motivation for this change is to enable use of bonding (and
802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
XDP and also to transparently support bond devices for projects that
use XDP given most modern NICs have dual port adapters.  An alternative
to this approach would be to implement 802.3ad in user-space and
implement the bonding load-balancing in the XDP program itself, but
that is a rather cumbersome endeavor in terms of slave device management
(e.g. by watching netlink) and requires separate programs for native
vs bond cases for the orchestrator. A native in-kernel implementation
overcomes these issues and provides more flexibility.

Below are benchmark results done on two machines with 100Gbit
Intel E810 (ice) NIC and with 32-core 3970X on sending machine, and
16-core 3950X on receiving machine. 64 byte packets were sent with
pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
ice driver, so the tests were performed with iommu=off and patch [2]
applied. Additionally the bonding round robin algorithm was modified
to use per-cpu tx counters as high CPU load (50% vs 10%) and high rate
of cache misses were caused by the shared rr_tx_counter. The fix for this
has already been merged into net-next. The statistics were collected
using "sar -n dev -u 1 10".

 -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
 without patch (1 dev):
   XDP_DROP:              3.15%      48.6Mpps
   XDP_TX:                3.12%      18.3Mpps     18.3Mpps
   XDP_DROP (RSS):        9.47%      116.5Mpps
   XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
 -----------------------
 with patch, bond (1 dev):
   XDP_DROP:              3.14%      46.7Mpps
   XDP_TX:                3.15%      13.9Mpps     13.9Mpps
   XDP_DROP (RSS):        10.33%     117.2Mpps
   XDP_TX (RSS):          10.64%     25.1Mpps     24.0Mpps
 -----------------------
 with patch, bond (2 devs):
   XDP_DROP:              6.27%      92.7Mpps
   XDP_TX:                6.26%      17.6Mpps     17.5Mpps
   XDP_DROP (RSS):       11.38%      117.2Mpps
   XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
 --------------------------------------------------------------

RSS: Receive Side Scaling, i.e. the packets were sent to a range of
destination IPs.

[1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
[2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
[3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/

Patch 1 prepares bond_xmit_hash for hashing xdp_buff's.
Patch 2 adds hooks to implement redirection after bpf prog run.
Patch 3 implements the hooks in the bonding driver.
Patch 4 modifies devmap to properly handle EXCLUDE_INGRESS with a slave device.
Patch 5 fixes an issue related to recent cleanup of rcu_read_lock in XDP context.
Patch 6 fixes loading of xdp_tx.o by renaming section name.
Patch 7 adds tests.

v4->v5:
- As pointed out by Andrii, use the generated BPF skeletons rather than libbpf
  directly.
- Renamed section name in progs/xdp_tx.c as the BPF skeleton wouldn't load it
  otherwise due to unknown program type.
- Daniel Borkmann noted that to retain backwards compatibility and allow some
  use cases we should allow attaching XDP programs to a slave device when the
  master does not have a program loaded. Modified the logic to allow this and
  added tests for the different combinations of attaching a program.

v3->v4:
- Add back the test suite, while removing the vmtest.sh modifications to kernel
  config now that CONFIG_BONDING=y is set. Discussed with Magnus Karlsson that
  it makes sense right now to not reuse the code from xdpceiver.c for testing
  XDP bonding.

v2->v3:
- Address Jay's comment to properly exclude upper devices with EXCLUDE_INGRESS
  when there is deeper nesting involved. Now all upper devices are excluded.
- Refuse to enslave devices that already have XDP programs loaded and refuse to
  load XDP programs to slave devices. Earlier one could have an XDP program loaded
  and after enslaving and loading another program onto the bond device the xdp_state
  of the enslaved device would be pointing at an old program.
- Adapt netdev_lower_get_next_private_rcu so it can be called in the XDP context.

v1->v2:
- Split up into smaller, easier-to-review patches and address cosmetic
  review comments.
- Drop the INDIRECT_CALL optimization as it showed little improvement in tests.
- Drop the rr_tx_counter patch as that has already been merged into net-next.
- Separate the test suite into another patch set. This will follow later once a
  patch set from Magnus Karlsson is merged and provides test utilities that can
  be reused for XDP bonding tests. v2 contains no major functional changes and
  was tested with the test suite included in v1.
  (https://lore.kernel.org/bpf/202106221509.kwNvAAZg-lkp@intel.com/T/#m464146d47299125d5868a08affd6d6ce526dfad1)
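
As an illustration of the attach rule described in the v4->v5 notes, here
is a rough user-space sketch. It is not kernel code: struct attach_dev and
attach_check() are hypothetical stand-ins, and the model only looks at the
direct upper devices of the device being attached to. The real check in
patch 2 lives in dev_xdp_attach() and uses dev_xdp_prog_count().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define ATTACH_MAX_UPPERS 4 /* arbitrary limit for this sketch */

/* Hypothetical stand-in for struct net_device: only what the rule needs. */
struct attach_dev {
	int xdp_prog_count;                               /* attached XDP programs */
	const struct attach_dev *uppers[ATTACH_MAX_UPPERS]; /* direct upper devices */
	size_t n_uppers;
};

/* Return 0 if attaching an XDP program to dev is allowed, or -EEXIST if
 * any direct upper device (e.g. the bond master) already has a program.
 */
static int attach_check(const struct attach_dev *dev)
{
	size_t i;

	assert(dev != NULL);
	assert(dev->n_uppers <= ATTACH_MAX_UPPERS);

	for (i = 0; i < dev->n_uppers; i++) {
		if (dev->uppers[i]->xdp_prog_count > 0)
			return -EEXIST;
	}
	return 0;
}
```

With a bond as the only upper device, attach_check() on the slave succeeds
while the bond has no program and returns -EEXIST once the bond has one.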

---




* [PATCH bpf-next v5 1/7] net: bonding: Refactor bond_xmit_hash for use with xdp_buff
  2021-07-30  6:18 ` [PATCH bpf-next v5 0/7] XDP bonding support Jussi Maki
@ 2021-07-30  6:18   ` Jussi Maki
  2021-07-30  6:18   ` [PATCH bpf-next v5 2/7] net: core: Add support for XDP redirection to slave device Jussi Maki
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 71+ messages in thread
From: Jussi Maki @ 2021-07-30  6:18 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

In preparation for adding XDP support to the bonding driver
refactor the packet hashing functions to be able to work with
any linear data buffer without an skb.
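
To make the idea concrete, below is a minimal user-space sketch of hashing
on a linear buffer with an explicit length check, loosely modelled on
bond_pull_data() and bond_eth_hash() from this patch. The sketch_* names
and the fixed byte offsets are assumptions made for illustration; there is
no skb fallback here, so a buffer that is too short simply yields a hash
of 0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_HLEN 14 /* dest MAC (6) + source MAC (6) + EtherType (2) */

/* Return data if the first n bytes are available in the linear area of
 * length hlen, NULL otherwise. Without an skb there is nothing to pull.
 */
static const uint8_t *sketch_pull_data(const uint8_t *data, int hlen, int n)
{
	assert(data != NULL);
	return n <= hlen ? data : NULL;
}

/* L2 hash: last byte of dest MAC ^ last byte of source MAC ^ raw EtherType.
 * mhoff is the offset of the Ethernet header within data.
 */
static uint32_t sketch_eth_hash(const uint8_t *data, int mhoff, int hlen)
{
	uint16_t proto;

	data = sketch_pull_data(data, hlen, mhoff + SKETCH_ETH_HLEN);
	if (!data)
		return 0;

	/* memcpy avoids an unaligned 16-bit load; the value is used as stored
	 * (network byte order), with no byte swapping.
	 */
	memcpy(&proto, data + mhoff + 12, sizeof(proto));
	return (uint32_t)(data[mhoff + 5] ^ data[mhoff + 11] ^ proto);
}
```

Passing the header offset and the linear length explicitly is what lets
the same helper serve both an skb and an xdp_buff style caller.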

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 drivers/net/bonding/bond_main.c | 147 +++++++++++++++++++-------------
 1 file changed, 90 insertions(+), 57 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index d22d78303311..dcec5cc4dab1 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3611,55 +3611,80 @@ static struct notifier_block bond_netdev_notifier = {
 
 /*---------------------------- Hashing Policies -----------------------------*/
 
+/* Helper to access data in a packet, with or without a backing skb.
+ * If skb is given the data is linearized if necessary via pskb_may_pull.
+ */
+static inline const void *bond_pull_data(struct sk_buff *skb,
+					 const void *data, int hlen, int n)
+{
+	if (likely(n <= hlen))
+		return data;
+	else if (skb && likely(pskb_may_pull(skb, n)))
+		return skb->head;
+
+	return NULL;
+}
+
 /* L2 hash helper */
-static inline u32 bond_eth_hash(struct sk_buff *skb)
+static inline u32 bond_eth_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
 {
-	struct ethhdr *ep, hdr_tmp;
+	struct ethhdr *ep;
 
-	ep = skb_header_pointer(skb, 0, sizeof(hdr_tmp), &hdr_tmp);
-	if (ep)
-		return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
-	return 0;
+	data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
+	if (!data)
+		return 0;
+
+	ep = (struct ethhdr *)(data + mhoff);
+	return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
 }
 
-static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
-			 int *noff, int *proto, bool l34)
+static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk, const void *data,
+			 int hlen, __be16 l2_proto, int *nhoff, int *ip_proto, bool l34)
 {
 	const struct ipv6hdr *iph6;
 	const struct iphdr *iph;
 
-	if (skb->protocol == htons(ETH_P_IP)) {
-		if (unlikely(!pskb_may_pull(skb, *noff + sizeof(*iph))))
+	if (l2_proto == htons(ETH_P_IP)) {
+		data = bond_pull_data(skb, data, hlen, *nhoff + sizeof(*iph));
+		if (!data)
 			return false;
-		iph = (const struct iphdr *)(skb->data + *noff);
+
+		iph = (const struct iphdr *)(data + *nhoff);
 		iph_to_flow_copy_v4addrs(fk, iph);
-		*noff += iph->ihl << 2;
+		*nhoff += iph->ihl << 2;
 		if (!ip_is_fragment(iph))
-			*proto = iph->protocol;
-	} else if (skb->protocol == htons(ETH_P_IPV6)) {
-		if (unlikely(!pskb_may_pull(skb, *noff + sizeof(*iph6))))
+			*ip_proto = iph->protocol;
+	} else if (l2_proto == htons(ETH_P_IPV6)) {
+		data = bond_pull_data(skb, data, hlen, *nhoff + sizeof(*iph6));
+		if (!data)
 			return false;
-		iph6 = (const struct ipv6hdr *)(skb->data + *noff);
+
+		iph6 = (const struct ipv6hdr *)(data + *nhoff);
 		iph_to_flow_copy_v6addrs(fk, iph6);
-		*noff += sizeof(*iph6);
-		*proto = iph6->nexthdr;
+		*nhoff += sizeof(*iph6);
+		*ip_proto = iph6->nexthdr;
 	} else {
 		return false;
 	}
 
-	if (l34 && *proto >= 0)
-		fk->ports.ports = skb_flow_get_ports(skb, *noff, *proto);
+	if (l34 && *ip_proto >= 0)
+		fk->ports.ports = __skb_flow_get_ports(skb, *nhoff, *ip_proto, data, hlen);
 
 	return true;
 }
 
-static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
+static u32 bond_vlan_srcmac_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
 {
-	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
+	struct ethhdr *mac_hdr;
 	u32 srcmac_vendor = 0, srcmac_dev = 0;
 	u16 vlan;
 	int i;
 
+	data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
+	if (!data)
+		return 0;
+	mac_hdr = (struct ethhdr *)(data + mhoff);
+
 	for (i = 0; i < 3; i++)
 		srcmac_vendor = (srcmac_vendor << 8) | mac_hdr->h_source[i];
 
@@ -3675,26 +3700,25 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
 }
 
 /* Extract the appropriate headers based on bond's xmit policy */
-static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
-			      struct flow_keys *fk)
+static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb, const void *data,
+			      __be16 l2_proto, int nhoff, int hlen, struct flow_keys *fk)
 {
 	bool l34 = bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34;
-	int noff, proto = -1;
+	int ip_proto = -1;
 
 	switch (bond->params.xmit_policy) {
 	case BOND_XMIT_POLICY_ENCAP23:
 	case BOND_XMIT_POLICY_ENCAP34:
 		memset(fk, 0, sizeof(*fk));
 		return __skb_flow_dissect(NULL, skb, &flow_keys_bonding,
-					  fk, NULL, 0, 0, 0, 0);
+					  fk, data, l2_proto, nhoff, hlen, 0);
 	default:
 		break;
 	}
 
 	fk->ports.ports = 0;
 	memset(&fk->icmp, 0, sizeof(fk->icmp));
-	noff = skb_network_offset(skb);
-	if (!bond_flow_ip(skb, fk, &noff, &proto, l34))
+	if (!bond_flow_ip(skb, fk, data, hlen, l2_proto, &nhoff, &ip_proto, l34))
 		return false;
 
 	/* ICMP error packets contains at least 8 bytes of the header
@@ -3702,22 +3726,20 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
 	 * to correlate ICMP error packets within the same flow which
 	 * generated the error.
 	 */
-	if (proto == IPPROTO_ICMP || proto == IPPROTO_ICMPV6) {
-		skb_flow_get_icmp_tci(skb, &fk->icmp, skb->data,
-				      skb_transport_offset(skb),
-				      skb_headlen(skb));
-		if (proto == IPPROTO_ICMP) {
+	if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
+		skb_flow_get_icmp_tci(skb, &fk->icmp, data, nhoff, hlen);
+		if (ip_proto == IPPROTO_ICMP) {
 			if (!icmp_is_err(fk->icmp.type))
 				return true;
 
-			noff += sizeof(struct icmphdr);
-		} else if (proto == IPPROTO_ICMPV6) {
+			nhoff += sizeof(struct icmphdr);
+		} else if (ip_proto == IPPROTO_ICMPV6) {
 			if (!icmpv6_is_err(fk->icmp.type))
 				return true;
 
-			noff += sizeof(struct icmp6hdr);
+			nhoff += sizeof(struct icmp6hdr);
 		}
-		return bond_flow_ip(skb, fk, &noff, &proto, l34);
+		return bond_flow_ip(skb, fk, data, hlen, l2_proto, &nhoff, &ip_proto, l34);
 	}
 
 	return true;
@@ -3733,33 +3755,26 @@ static u32 bond_ip_hash(u32 hash, struct flow_keys *flow)
 	return hash >> 1;
 }
 
-/**
- * bond_xmit_hash - generate a hash value based on the xmit policy
- * @bond: bonding device
- * @skb: buffer to use for headers
- *
- * This function will extract the necessary headers from the skb buffer and use
- * them to generate a hash based on the xmit_policy set in the bonding device
+/* Generate hash based on xmit policy. If @skb is given it is used to linearize
+ * the data as required, but this function can be used without it if the data is
+ * known to be linear (e.g. with xdp_buff).
  */
-u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
+static u32 __bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, const void *data,
+			    __be16 l2_proto, int mhoff, int nhoff, int hlen)
 {
 	struct flow_keys flow;
 	u32 hash;
 
-	if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
-	    skb->l4_hash)
-		return skb->hash;
-
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC)
-		return bond_vlan_srcmac_hash(skb);
+		return bond_vlan_srcmac_hash(skb, data, mhoff, hlen);
 
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
-	    !bond_flow_dissect(bond, skb, &flow))
-		return bond_eth_hash(skb);
+	    !bond_flow_dissect(bond, skb, data, l2_proto, nhoff, hlen, &flow))
+		return bond_eth_hash(skb, data, mhoff, hlen);
 
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
 	    bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23) {
-		hash = bond_eth_hash(skb);
+		hash = bond_eth_hash(skb, data, mhoff, hlen);
 	} else {
 		if (flow.icmp.id)
 			memcpy(&hash, &flow.icmp, sizeof(hash));
@@ -3770,6 +3785,25 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
 	return bond_ip_hash(hash, &flow);
 }
 
+/**
+ * bond_xmit_hash - generate a hash value based on the xmit policy
+ * @bond: bonding device
+ * @skb: buffer to use for headers
+ *
+ * This function will extract the necessary headers from the skb buffer and use
+ * them to generate a hash based on the xmit_policy set in the bonding device
+ */
+u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
+{
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
+	    skb->l4_hash)
+		return skb->hash;
+
+	return __bond_xmit_hash(bond, skb, skb->head, skb->protocol,
+				skb->mac_header, skb->network_header,
+				skb_headlen(skb));
+}
+
 /*-------------------------- Device entry points ----------------------------*/
 
 void bond_work_init_all(struct bonding *bond)
@@ -4399,8 +4433,7 @@ static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
 	return bond_tx_drop(bond_dev, skb);
 }
 
-static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond,
-						      struct sk_buff *skb)
+static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond)
 {
 	return rcu_dereference(bond->curr_active_slave);
 }
@@ -4414,7 +4447,7 @@ static netdev_tx_t bond_xmit_activebackup(struct sk_buff *skb,
 	struct bonding *bond = netdev_priv(bond_dev);
 	struct slave *slave;
 
-	slave = bond_xmit_activebackup_slave_get(bond, skb);
+	slave = bond_xmit_activebackup_slave_get(bond);
 	if (slave)
 		return bond_dev_queue_xmit(bond, skb, slave->dev);
 
@@ -4712,7 +4745,7 @@ static struct net_device *bond_xmit_get_slave(struct net_device *master_dev,
 		slave = bond_xmit_roundrobin_slave_get(bond, skb);
 		break;
 	case BOND_MODE_ACTIVEBACKUP:
-		slave = bond_xmit_activebackup_slave_get(bond, skb);
+		slave = bond_xmit_activebackup_slave_get(bond);
 		break;
 	case BOND_MODE_8023AD:
 	case BOND_MODE_XOR:
-- 
2.17.1



* [PATCH bpf-next v5 2/7] net: core: Add support for XDP redirection to slave device
  2021-07-30  6:18 ` [PATCH bpf-next v5 0/7] XDP bonding support Jussi Maki
  2021-07-30  6:18   ` [PATCH bpf-next v5 1/7] net: bonding: Refactor bond_xmit_hash for use with xdp_buff Jussi Maki
@ 2021-07-30  6:18   ` Jussi Maki
  2021-07-30  6:18   ` [PATCH bpf-next v5 3/7] net: bonding: Add XDP support to the bonding driver Jussi Maki
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 71+ messages in thread
From: Jussi Maki @ 2021-07-30  6:18 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

This adds the ndo_xdp_get_xmit_slave hook for transforming XDP_TX
into XDP_REDIRECT after BPF program run when the ingress device
is a bond slave.

The dev_xdp_prog_count is exposed so that slave devices can be checked
for loaded XDP programs in order to avoid the situation where both
bond master and slave have programs loaded according to xdp_state.
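
The action rewrite can be sketched in user space roughly as follows. This
is a simplified model, not the kernel implementation: the redir_* types
and function are hypothetical, and the slave chosen by the master is
passed in directly instead of being obtained through
ndo_xdp_get_xmit_slave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values mirror the XDP action codes. */
enum redir_action {
	REDIR_XDP_DROP = 1,
	REDIR_XDP_PASS = 2,
	REDIR_XDP_TX = 3,
	REDIR_XDP_REDIRECT = 4,
};

/* Hypothetical stand-in for a net_device. */
struct redir_dev {
	int ifindex;
	bool is_bond_slave;
};

/* Hypothetical stand-in for the per-CPU redirect state. */
struct redir_info {
	int tgt_index;
};

/* Rewrite XDP_TX on a bond slave: if the master picks a different slave
 * for transmission, record it as the redirect target and return
 * XDP_REDIRECT. All other actions pass through unchanged.
 */
static enum redir_action redir_rewrite(enum redir_action act,
				       const struct redir_dev *rx_dev,
				       const struct redir_dev *xmit_slave,
				       struct redir_info *ri)
{
	assert(rx_dev != NULL && ri != NULL);

	if (act != REDIR_XDP_TX || !rx_dev->is_bond_slave)
		return act;

	if (xmit_slave && xmit_slave != rx_dev) {
		ri->tgt_index = xmit_slave->ifindex;
		return REDIR_XDP_REDIRECT;
	}
	return REDIR_XDP_TX;
}
```

When the chosen slave is the receiving device itself, the action stays
XDP_TX, so the common single-port case takes the usual transmit path.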

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 include/linux/filter.h    | 13 ++++++++++++-
 include/linux/netdevice.h |  6 ++++++
 net/core/dev.c            | 13 ++++++++++++-
 net/core/filter.c         | 25 +++++++++++++++++++++++++
 4 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index ba36989f711a..7ea1cc378042 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -761,6 +761,10 @@ static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
 
 DECLARE_BPF_DISPATCHER(xdp)
 
+DECLARE_STATIC_KEY_FALSE(bpf_master_redirect_enabled_key);
+
+u32 xdp_master_redirect(struct xdp_buff *xdp);
+
 static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 					    struct xdp_buff *xdp)
 {
@@ -768,7 +772,14 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 	 * under local_bh_disable(), which provides the needed RCU protection
 	 * for accessing map entries.
 	 */
-	return __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
+	u32 act = __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
+
+	if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) {
+		if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev))
+			act = xdp_master_redirect(xdp);
+	}
+
+	return act;
 }
 
 void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 42f6f866d5f3..a380786429e1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1321,6 +1321,9 @@ struct netdev_net_notifier {
  *	that got dropped are freed/returned via xdp_return_frame().
  *	Returns negative number, means general error invoking ndo, meaning
  *	no frames were xmit'ed and core-caller will free all frames.
+ * struct net_device *(*ndo_xdp_get_xmit_slave)(struct net_device *dev,
+ *					        struct xdp_buff *xdp);
+ *      Get the xmit slave of master device based on the xdp_buff.
  * int (*ndo_xsk_wakeup)(struct net_device *dev, u32 queue_id, u32 flags);
  *      This function is used to wake up the softirq, ksoftirqd or kthread
  *	responsible for sending and/or receiving packets on a specific
@@ -1539,6 +1542,8 @@ struct net_device_ops {
 	int			(*ndo_xdp_xmit)(struct net_device *dev, int n,
 						struct xdp_frame **xdp,
 						u32 flags);
+	struct net_device *	(*ndo_xdp_get_xmit_slave)(struct net_device *dev,
+							  struct xdp_buff *xdp);
 	int			(*ndo_xsk_wakeup)(struct net_device *dev,
 						  u32 queue_id, u32 flags);
 	struct devlink_port *	(*ndo_get_devlink_port)(struct net_device *dev);
@@ -4071,6 +4076,7 @@ typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 		      int fd, int expected_fd, u32 flags);
 int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
+u8 dev_xdp_prog_count(struct net_device *dev);
 u32 dev_xdp_prog_id(struct net_device *dev, enum bpf_xdp_mode mode);
 
 int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 3ee58876e8f5..27023ea933dd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9353,7 +9353,7 @@ static struct bpf_prog *dev_xdp_prog(struct net_device *dev,
 	return dev->xdp_state[mode].prog;
 }
 
-static u8 dev_xdp_prog_count(struct net_device *dev)
+u8 dev_xdp_prog_count(struct net_device *dev)
 {
 	u8 count = 0;
 	int i;
@@ -9363,6 +9363,7 @@ static u8 dev_xdp_prog_count(struct net_device *dev)
 			count++;
 	return count;
 }
+EXPORT_SYMBOL_GPL(dev_xdp_prog_count);
 
 u32 dev_xdp_prog_id(struct net_device *dev, enum bpf_xdp_mode mode)
 {
@@ -9456,6 +9457,8 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
 {
 	unsigned int num_modes = hweight32(flags & XDP_FLAGS_MODES);
 	struct bpf_prog *cur_prog;
+	struct net_device *upper;
+	struct list_head *iter;
 	enum bpf_xdp_mode mode;
 	bpf_op_t bpf_op;
 	int err;
@@ -9494,6 +9497,14 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
 		return -EBUSY;
 	}
 
+	/* don't allow if an upper device already has a program */
+	netdev_for_each_upper_dev_rcu(dev, upper, iter) {
+		if (dev_xdp_prog_count(upper) > 0) {
+			NL_SET_ERR_MSG(extack, "Cannot attach when an upper device already has a program");
+			return -EEXIST;
+		}
+	}
+
 	cur_prog = dev_xdp_prog(dev, mode);
 	/* can't replace attached prog with link */
 	if (link && cur_prog) {
diff --git a/net/core/filter.c b/net/core/filter.c
index faf29fd82276..ff62cd39046d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3950,6 +3950,31 @@ void bpf_clear_redirect_map(struct bpf_map *map)
 	}
 }
 
+DEFINE_STATIC_KEY_FALSE(bpf_master_redirect_enabled_key);
+EXPORT_SYMBOL_GPL(bpf_master_redirect_enabled_key);
+
+u32 xdp_master_redirect(struct xdp_buff *xdp)
+{
+	struct net_device *master, *slave;
+	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+
+	master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev);
+	slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp);
+	if (slave && slave != xdp->rxq->dev) {
+		/* The target device is different from the receiving device, so
+		 * redirect it to the new device.
+		 * Using XDP_REDIRECT gets the correct behaviour from XDP enabled
+		 * drivers to unmap the packet from their rx ring.
+		 */
+		ri->tgt_index = slave->ifindex;
+		ri->map_id = INT_MAX;
+		ri->map_type = BPF_MAP_TYPE_UNSPEC;
+		return XDP_REDIRECT;
+	}
+	return XDP_TX;
+}
+EXPORT_SYMBOL_GPL(xdp_master_redirect);
+
 int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 		    struct bpf_prog *xdp_prog)
 {
-- 
2.17.1



* [PATCH bpf-next v5 3/7] net: bonding: Add XDP support to the bonding driver
  2021-07-30  6:18 ` [PATCH bpf-next v5 0/7] XDP bonding support Jussi Maki
  2021-07-30  6:18   ` [PATCH bpf-next v5 1/7] net: bonding: Refactor bond_xmit_hash for use with xdp_buff Jussi Maki
  2021-07-30  6:18   ` [PATCH bpf-next v5 2/7] net: core: Add support for XDP redirection to slave device Jussi Maki
@ 2021-07-30  6:18   ` Jussi Maki
  2021-07-30  6:18   ` [PATCH bpf-next v5 4/7] devmap: Exclude XDP broadcast to master device Jussi Maki
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 71+ messages in thread
From: Jussi Maki @ 2021-07-30  6:18 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

XDP is implemented in the bonding driver by transparently delegating
the XDP program loading, removal and xmit operations to the bonding
slave devices. The overall goal of this work is that XDP programs
can be attached to a bond device *without* any further changes (or
awareness) necessary to the program itself, meaning the same XDP
program can be attached to a native device but also a bonding device.

The semantics of XDP_TX when attached to a bond are equivalent to the
case where a tc/BPF program is attached to the bond: the packet is
transmitted out of the bond itself, using one of the bond's configured
xmit methods to select a slave device (rather than XDP_TX on the slave
itself). Handling of XDP_TX to transmit
using the configured bonding mechanism is therefore implemented by
rewriting the BPF program return value in bpf_prog_run_xdp. To avoid
performance impact this check is guarded by a static key, which is
incremented when an XDP program is loaded onto a bond device. This
approach was chosen to avoid changes to drivers implementing XDP. If
the slave device does not match the receive device, then XDP_REDIRECT
is transparently used to perform the redirection in order to have
the network driver release the packet from its RX ring.  The bonding
driver hashing functions have been refactored to allow reuse with
xdp_buff's to avoid code duplication.
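
For readers unfamiliar with how a bond picks a slave, here is a
deliberately simplified user-space sketch of two selection styles. It
assumes selection reduces to an index modulo the number of usable slaves;
the select_slave_* names are hypothetical, and the real driver
additionally handles details such as slave state and the per-cpu
round-robin counter that this model ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round-robin style: each packet advances a counter, and the counter
 * modulo the slave count picks the slave.
 */
static size_t select_slave_rr(uint32_t *tx_counter, size_t n_slaves)
{
	assert(tx_counter != NULL && n_slaves > 0);
	return (size_t)((*tx_counter)++ % n_slaves);
}

/* Hash style (as in XOR or 802.3ad modes): packets with the same flow
 * hash always map to the same slave.
 */
static size_t select_slave_hash(uint32_t hash, size_t n_slaves)
{
	assert(n_slaves > 0);
	return (size_t)(hash % n_slaves);
}
```

Round-robin spreads consecutive packets across slaves, while hash-based
selection keeps a flow on one slave, which is why the tests vary packet
headers to get an even distribution in hashing modes.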

The motivation for this change is to enable use of bonding (and
802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
XDP and also to transparently support bond devices for projects that
use XDP given most modern NICs have dual port adapters.  An alternative
to this approach would be to implement 802.3ad in user-space and
implement the bonding load-balancing in the XDP program itself, but
that is a rather cumbersome endeavor in terms of slave device management
(e.g. by watching netlink) and requires separate programs for native
vs bond cases for the orchestrator. A native in-kernel implementation
overcomes these issues and provides more flexibility.

Below are benchmark results done on two machines with 100Gbit
Intel E810 (ice) NIC and with 32-core 3970X on sending machine, and
16-core 3950X on receiving machine. 64 byte packets were sent with
pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
ice driver, so the tests were performed with iommu=off and patch [2]
applied. Additionally the bonding round robin algorithm was modified
to use per-cpu tx counters as high CPU load (50% vs 10%) and high rate
of cache misses were caused by the shared rr_tx_counter (the fix for
this has already been merged into net-next). The statistics were
collected using "sar -n dev -u 1 10".

 -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
 without patch (1 dev):
   XDP_DROP:              3.15%      48.6Mpps
   XDP_TX:                3.12%      18.3Mpps     18.3Mpps
   XDP_DROP (RSS):        9.47%      116.5Mpps
   XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
 -----------------------
 with patch, bond (1 dev):
   XDP_DROP:              3.14%      46.7Mpps
   XDP_TX:                3.15%      13.9Mpps     13.9Mpps
   XDP_DROP (RSS):        10.33%     117.2Mpps
   XDP_TX (RSS):          10.64%     25.1Mpps     24.0Mpps
 -----------------------
 with patch, bond (2 devs):
   XDP_DROP:              6.27%      92.7Mpps
   XDP_TX:                6.26%      17.6Mpps     17.5Mpps
   XDP_DROP (RSS):       11.38%      117.2Mpps
   XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
 --------------------------------------------------------------

RSS: Receive Side Scaling, i.e. the packets were sent to a range of
destination IPs.

[1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
[2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
[3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 drivers/net/bonding/bond_main.c | 309 +++++++++++++++++++++++++++++++-
 include/net/bonding.h           |   1 +
 2 files changed, 309 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index dcec5cc4dab1..fcd01acd1c83 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -317,6 +317,19 @@ bool bond_sk_check(struct bonding *bond)
 	}
 }
 
+static bool bond_xdp_check(struct bonding *bond)
+{
+	switch (BOND_MODE(bond)) {
+	case BOND_MODE_ROUNDROBIN:
+	case BOND_MODE_ACTIVEBACKUP:
+	case BOND_MODE_8023AD:
+	case BOND_MODE_XOR:
+		return true;
+	default:
+		return false;
+	}
+}
+
 /*---------------------------------- VLAN -----------------------------------*/
 
 /* In the following 2 functions, bond_vlan_rx_add_vid and bond_vlan_rx_kill_vid,
@@ -2133,6 +2146,41 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
 		bond_update_slave_arr(bond, NULL);
 
 
+	if (!slave_dev->netdev_ops->ndo_bpf ||
+	    !slave_dev->netdev_ops->ndo_xdp_xmit) {
+		if (bond->xdp_prog) {
+			NL_SET_ERR_MSG(extack, "Slave does not support XDP");
+			slave_err(bond_dev, slave_dev, "Slave does not support XDP\n");
+			res = -EOPNOTSUPP;
+			goto err_sysfs_del;
+		}
+	} else {
+		struct netdev_bpf xdp = {
+			.command = XDP_SETUP_PROG,
+			.flags   = 0,
+			.prog    = bond->xdp_prog,
+			.extack  = extack,
+		};
+
+		if (dev_xdp_prog_count(slave_dev) > 0) {
+			NL_SET_ERR_MSG(extack,
+				       "Slave has XDP program loaded, please unload before enslaving");
+			slave_err(bond_dev, slave_dev,
+				  "Slave has XDP program loaded, please unload before enslaving\n");
+			res = -EOPNOTSUPP;
+			goto err_sysfs_del;
+		}
+
+		res = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (res < 0) {
+			/* ndo_bpf() sets extack error message */
+			slave_dbg(bond_dev, slave_dev, "Error %d calling ndo_bpf\n", res);
+			goto err_sysfs_del;
+		}
+		if (bond->xdp_prog)
+			bpf_prog_inc(bond->xdp_prog);
+	}
+
 	slave_info(bond_dev, slave_dev, "Enslaving as %s interface with %s link\n",
 		   bond_is_active_slave(new_slave) ? "an active" : "a backup",
 		   new_slave->link != BOND_LINK_DOWN ? "an up" : "a down");
@@ -2252,6 +2300,17 @@ static int __bond_release_one(struct net_device *bond_dev,
 	/* recompute stats just before removing the slave */
 	bond_get_stats(bond->dev, &bond->bond_stats);
 
+	if (bond->xdp_prog) {
+		struct netdev_bpf xdp = {
+			.command = XDP_SETUP_PROG,
+			.flags   = 0,
+			.prog	 = NULL,
+			.extack  = NULL,
+		};
+		if (slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp))
+			slave_warn(bond_dev, slave_dev, "failed to unload XDP program\n");
+	}
+
 	bond_upper_dev_unlink(bond, slave);
 	/* unregister rx_handler early so bond_handle_frame wouldn't be called
 	 * for this slave anymore.
@@ -3635,7 +3694,7 @@ static inline u32 bond_eth_hash(struct sk_buff *skb, const void *data, int mhoff
 		return 0;
 
 	ep = (struct ethhdr *)(data + mhoff);
-	return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
+	return ep->h_dest[5] ^ ep->h_source[5] ^ be16_to_cpu(ep->h_proto);
 }
 
 static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk, const void *data,
@@ -3804,6 +3863,26 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
 				skb_headlen(skb));
 }
 
+/**
+ * bond_xmit_hash_xdp - generate a hash value based on the xmit policy
+ * @bond: bonding device
+ * @xdp: buffer to use for headers
+ *
+ * The XDP variant of bond_xmit_hash.
+ */
+static u32 bond_xmit_hash_xdp(struct bonding *bond, struct xdp_buff *xdp)
+{
+	struct ethhdr *eth;
+
+	if (xdp->data + sizeof(struct ethhdr) > xdp->data_end)
+		return 0;
+
+	eth = (struct ethhdr *)xdp->data;
+
+	return __bond_xmit_hash(bond, NULL, xdp->data, eth->h_proto, 0,
+				sizeof(struct ethhdr), xdp->data_end - xdp->data);
+}
+
 /*-------------------------- Device entry points ----------------------------*/
 
 void bond_work_init_all(struct bonding *bond)
@@ -4420,6 +4499,47 @@ static struct slave *bond_xmit_roundrobin_slave_get(struct bonding *bond,
 	return NULL;
 }
 
+static struct slave *bond_xdp_xmit_roundrobin_slave_get(struct bonding *bond,
+							struct xdp_buff *xdp)
+{
+	struct slave *slave;
+	int slave_cnt;
+	u32 slave_id;
+	const struct ethhdr *eth;
+	void *data = xdp->data;
+
+	if (data + sizeof(struct ethhdr) > xdp->data_end)
+		goto non_igmp;
+
+	eth = (struct ethhdr *)data;
+	data += sizeof(struct ethhdr);
+
+	/* See comment on IGMP in bond_xmit_roundrobin_slave_get() */
+	if (eth->h_proto == htons(ETH_P_IP)) {
+		const struct iphdr *iph;
+
+		if (data + sizeof(struct iphdr) > xdp->data_end)
+			goto non_igmp;
+
+		iph = (struct iphdr *)data;
+
+		if (iph->protocol == IPPROTO_IGMP) {
+			slave = rcu_dereference(bond->curr_active_slave);
+			if (slave)
+				return slave;
+			return bond_get_slave_by_id(bond, 0);
+		}
+	}
+
+non_igmp:
+	slave_cnt = READ_ONCE(bond->slave_cnt);
+	if (likely(slave_cnt)) {
+		slave_id = bond_rr_gen_slave_id(bond) % slave_cnt;
+		return bond_get_slave_by_id(bond, slave_id);
+	}
+	return NULL;
+}
+
 static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
 					struct net_device *bond_dev)
 {
@@ -4635,6 +4755,22 @@ static struct slave *bond_xmit_3ad_xor_slave_get(struct bonding *bond,
 	return slave;
 }
 
+static struct slave *bond_xdp_xmit_3ad_xor_slave_get(struct bonding *bond,
+						     struct xdp_buff *xdp)
+{
+	struct bond_up_slave *slaves;
+	unsigned int count;
+	u32 hash;
+
+	hash = bond_xmit_hash_xdp(bond, xdp);
+	slaves = rcu_dereference(bond->usable_slaves);
+	count = slaves ? READ_ONCE(slaves->count) : 0;
+	if (unlikely(!count))
+		return NULL;
+
+	return slaves->arr[hash % count];
+}
+
 /* Use this Xmit function for 3AD as well as XOR modes. The current
  * usable slave array is formed in the control path. The xmit function
  * just calculates hash and sends the packet out.
@@ -4919,6 +5055,174 @@ static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return ret;
 }
 
+static struct net_device *
+bond_xdp_get_xmit_slave(struct net_device *bond_dev, struct xdp_buff *xdp)
+{
+	struct bonding *bond = netdev_priv(bond_dev);
+	struct slave *slave;
+
+	/* Caller needs to hold rcu_read_lock() */
+
+	switch (BOND_MODE(bond)) {
+	case BOND_MODE_ROUNDROBIN:
+		slave = bond_xdp_xmit_roundrobin_slave_get(bond, xdp);
+		break;
+
+	case BOND_MODE_ACTIVEBACKUP:
+		slave = bond_xmit_activebackup_slave_get(bond);
+		break;
+
+	case BOND_MODE_8023AD:
+	case BOND_MODE_XOR:
+		slave = bond_xdp_xmit_3ad_xor_slave_get(bond, xdp);
+		break;
+
+	default:
+		/* Should never happen. Mode guarded by bond_xdp_check() */
+		netdev_err(bond_dev, "Unknown bonding mode %d for xdp xmit\n", BOND_MODE(bond));
+		WARN_ON_ONCE(1);
+		return NULL;
+	}
+
+	if (slave)
+		return slave->dev;
+
+	return NULL;
+}
+
+static int bond_xdp_xmit(struct net_device *bond_dev,
+			 int n, struct xdp_frame **frames, u32 flags)
+{
+	int nxmit, err = -ENXIO;
+
+	rcu_read_lock();
+
+	for (nxmit = 0; nxmit < n; nxmit++) {
+		struct xdp_frame *frame = frames[nxmit];
+		struct xdp_frame *frames1[] = {frame};
+		struct net_device *slave_dev;
+		struct xdp_buff xdp;
+
+		xdp_convert_frame_to_buff(frame, &xdp);
+
+		slave_dev = bond_xdp_get_xmit_slave(bond_dev, &xdp);
+		if (!slave_dev) {
+			err = -ENXIO;
+			break;
+		}
+
+		err = slave_dev->netdev_ops->ndo_xdp_xmit(slave_dev, 1, frames1, flags);
+		if (err < 1)
+			break;
+	}
+
+	rcu_read_unlock();
+
+	/* If error happened on the first frame then we can pass the error up, otherwise
+	 * report the number of frames that were xmitted.
+	 */
+	if (err < 0)
+		return (nxmit == 0 ? err : nxmit);
+
+	return nxmit;
+}
+
+static int bond_xdp_set(struct net_device *dev, struct bpf_prog *prog,
+			struct netlink_ext_ack *extack)
+{
+	struct bonding *bond = netdev_priv(dev);
+	struct list_head *iter;
+	struct slave *slave, *rollback_slave;
+	struct bpf_prog *old_prog;
+	struct netdev_bpf xdp = {
+		.command = XDP_SETUP_PROG,
+		.flags   = 0,
+		.prog    = prog,
+		.extack  = extack,
+	};
+	int err;
+
+	ASSERT_RTNL();
+
+	if (!bond_xdp_check(bond))
+		return -EOPNOTSUPP;
+
+	old_prog = bond->xdp_prog;
+	bond->xdp_prog = prog;
+
+	bond_for_each_slave(bond, slave, iter) {
+		struct net_device *slave_dev = slave->dev;
+
+		if (!slave_dev->netdev_ops->ndo_bpf ||
+		    !slave_dev->netdev_ops->ndo_xdp_xmit) {
+			NL_SET_ERR_MSG(extack, "Slave device does not support XDP");
+			slave_err(dev, slave_dev, "Slave does not support XDP\n");
+			err = -EOPNOTSUPP;
+			goto err;
+		}
+
+		if (dev_xdp_prog_count(slave_dev) > 0) {
+			NL_SET_ERR_MSG(extack,
+				       "Slave has XDP program loaded, please unload before enslaving");
+			slave_err(dev, slave_dev,
+				  "Slave has XDP program loaded, please unload before enslaving\n");
+			err = -EOPNOTSUPP;
+			goto err;
+		}
+
+		err = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (err < 0) {
+			/* ndo_bpf() sets extack error message */
+			slave_err(dev, slave_dev, "Error %d calling ndo_bpf\n", err);
+			goto err;
+		}
+		if (prog)
+			bpf_prog_inc(prog);
+	}
+
+	if (old_prog)
+		bpf_prog_put(old_prog);
+
+	if (prog)
+		static_branch_inc(&bpf_master_redirect_enabled_key);
+	else
+		static_branch_dec(&bpf_master_redirect_enabled_key);
+
+	return 0;
+
+err:
+	/* unwind the program changes */
+	bond->xdp_prog = old_prog;
+	xdp.prog = old_prog;
+	xdp.extack = NULL; /* do not overwrite original error */
+
+	bond_for_each_slave(bond, rollback_slave, iter) {
+		struct net_device *slave_dev = rollback_slave->dev;
+		int err_unwind;
+
+		if (slave == rollback_slave)
+			break;
+
+		err_unwind = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (err_unwind < 0)
+			slave_err(dev, slave_dev,
+				  "Error %d when unwinding XDP program change\n", err_unwind);
+		else if (xdp.prog)
+			bpf_prog_inc(xdp.prog);
+	}
+	return err;
+}
+
+static int bond_xdp(struct net_device *dev, struct netdev_bpf *xdp)
+{
+	switch (xdp->command) {
+	case XDP_SETUP_PROG:
+		return bond_xdp_set(dev, xdp->prog, xdp->extack);
+	default:
+		return -EINVAL;
+	}
+}
+
 static u32 bond_mode_bcast_speed(struct slave *slave, u32 speed)
 {
 	if (speed == 0 || speed == SPEED_UNKNOWN)
@@ -5005,6 +5309,9 @@ static const struct net_device_ops bond_netdev_ops = {
 	.ndo_features_check	= passthru_features_check,
 	.ndo_get_xmit_slave	= bond_xmit_get_slave,
 	.ndo_sk_get_lower_dev	= bond_sk_get_lower_dev,
+	.ndo_bpf		= bond_xdp,
+	.ndo_xdp_xmit           = bond_xdp_xmit,
+	.ndo_xdp_get_xmit_slave = bond_xdp_get_xmit_slave,
 };
 
 static const struct device_type bond_type = {
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 625d9c72dee3..b91c365e4e95 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -258,6 +258,7 @@ struct bonding {
 	/* protecting ipsec_list */
 	spinlock_t ipsec_lock;
 #endif /* CONFIG_XFRM_OFFLOAD */
+	struct bpf_prog *xdp_prog;
 };
 
 #define bond_slave_get_rcu(dev) \
-- 
2.17.1


^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH bpf-next v5 4/7] devmap: Exclude XDP broadcast to master device
  2021-07-30  6:18 ` [PATCH bpf-next v5 0/7] XDP bonding support Jussi Maki
                     ` (2 preceding siblings ...)
  2021-07-30  6:18   ` [PATCH bpf-next v5 3/7] net: bonding: Add XDP support to the bonding driver Jussi Maki
@ 2021-07-30  6:18   ` Jussi Maki
  2021-07-30  6:18   ` [PATCH bpf-next v5 5/7] net: core: Allow netdev_lower_get_next_private_rcu in bh context Jussi Maki
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 71+ messages in thread
From: Jussi Maki @ 2021-07-30  6:18 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

If the ingress device is a bond slave, do not broadcast back
through it or the bond master.
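
The exclusion can be pictured with the following user-space sketch. It
assumes the upper devices' ifindexes are already available as a plain
array (in the kernel they come from walking the upper device list), and
the names MAX_UPPER_SKETCH, build_excluded_sketch and
is_excluded_sketch are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_UPPER_SKETCH 8 /* assumed bound on nested upper devices */

/* Fill 'out' with the upper ifindexes followed by the ingress ifindex.
 * 'out' must have room for MAX_UPPER_SKETCH + 1 entries.
 * Returns the number of entries written.
 */
static int build_excluded_sketch(int ingress_ifindex, const int *uppers,
				 int num_uppers, int *out)
{
	int n = 0;

	assert(out != NULL);
	for (int i = 0; i < num_uppers && n < MAX_UPPER_SKETCH; i++)
		out[n++] = uppers[i];
	out[n++] = ingress_ifindex;
	return n;
}

/* Linear scan; the list is short, so this stays cheap per destination. */
static bool is_excluded_sketch(const int *excluded, int num_excluded,
			       int ifindex)
{
	for (int i = 0; i < num_excluded; i++) {
		if (excluded[i] == ifindex)
			return true;
	}
	return false;
}
```

For an ingress slave with ifindex 11 whose master has ifindex 10, both
10 and 11 are skipped during the broadcast while any other map entry
remains a valid destination.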

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 kernel/bpf/devmap.c | 69 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 60 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 542e94fa30b4..f02d04540c0c 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -534,10 +534,9 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 	return __xdp_enqueue(dev, xdp, dev_rx, dst->xdp_prog);
 }
 
-static bool is_valid_dst(struct bpf_dtab_netdev *obj, struct xdp_buff *xdp,
-			 int exclude_ifindex)
+static bool is_valid_dst(struct bpf_dtab_netdev *obj, struct xdp_buff *xdp)
 {
-	if (!obj || obj->dev->ifindex == exclude_ifindex ||
+	if (!obj ||
 	    !obj->dev->netdev_ops->ndo_xdp_xmit)
 		return false;
 
@@ -562,17 +561,48 @@ static int dev_map_enqueue_clone(struct bpf_dtab_netdev *obj,
 	return 0;
 }
 
+static inline bool is_ifindex_excluded(int *excluded, int num_excluded, int ifindex)
+{
+	while (num_excluded--) {
+		if (ifindex == excluded[num_excluded])
+			return true;
+	}
+	return false;
+}
+
+/* Get ifindex of each upper device. 'indexes' must be able to hold at
+ * least MAX_NEST_DEV elements.
+ * Returns the number of ifindexes added.
+ */
+static int get_upper_ifindexes(struct net_device *dev, int *indexes)
+{
+	struct net_device *upper;
+	struct list_head *iter;
+	int n = 0;
+
+	netdev_for_each_upper_dev_rcu(dev, upper, iter) {
+		indexes[n++] = upper->ifindex;
+	}
+	return n;
+}
+
 int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 			  struct bpf_map *map, bool exclude_ingress)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
-	int exclude_ifindex = exclude_ingress ? dev_rx->ifindex : 0;
 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
+	int excluded_devices[1+MAX_NEST_DEV];
 	struct hlist_head *head;
 	struct xdp_frame *xdpf;
+	int num_excluded = 0;
 	unsigned int i;
 	int err;
 
+	if (exclude_ingress) {
+		num_excluded = get_upper_ifindexes(dev_rx, excluded_devices);
+		excluded_devices[num_excluded++] = dev_rx->ifindex;
+	}
+
 	xdpf = xdp_convert_buff_to_frame(xdp);
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
@@ -581,7 +611,10 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 		for (i = 0; i < map->max_entries; i++) {
 			dst = rcu_dereference_check(dtab->netdev_map[i],
 						    rcu_read_lock_bh_held());
-			if (!is_valid_dst(dst, xdp, exclude_ifindex))
+			if (!is_valid_dst(dst, xdp))
+				continue;
+
+			if (is_ifindex_excluded(excluded_devices, num_excluded, dst->dev->ifindex))
 				continue;
 
 			/* we only need n-1 clones; last_dst enqueued below */
@@ -601,7 +634,11 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 			head = dev_map_index_hash(dtab, i);
 			hlist_for_each_entry_rcu(dst, head, index_hlist,
 						 lockdep_is_held(&dtab->index_lock)) {
-				if (!is_valid_dst(dst, xdp, exclude_ifindex))
+				if (!is_valid_dst(dst, xdp))
+					continue;
+
+				if (is_ifindex_excluded(excluded_devices, num_excluded,
+							dst->dev->ifindex))
 					continue;
 
 				/* we only need n-1 clones; last_dst enqueued below */
@@ -675,18 +712,27 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
 			   bool exclude_ingress)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
-	int exclude_ifindex = exclude_ingress ? dev->ifindex : 0;
 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
+	int excluded_devices[1+MAX_NEST_DEV];
 	struct hlist_head *head;
 	struct hlist_node *next;
+	int num_excluded = 0;
 	unsigned int i;
 	int err;
 
+	if (exclude_ingress) {
+		num_excluded = get_upper_ifindexes(dev, excluded_devices);
+		excluded_devices[num_excluded++] = dev->ifindex;
+	}
+
 	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
 		for (i = 0; i < map->max_entries; i++) {
 			dst = rcu_dereference_check(dtab->netdev_map[i],
 						    rcu_read_lock_bh_held());
-			if (!dst || dst->dev->ifindex == exclude_ifindex)
+			if (!dst)
+				continue;
+
+			if (is_ifindex_excluded(excluded_devices, num_excluded, dst->dev->ifindex))
 				continue;
 
 			/* we only need n-1 clones; last_dst enqueued below */
@@ -700,12 +746,17 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
 				return err;
 
 			last_dst = dst;
+
 		}
 	} else { /* BPF_MAP_TYPE_DEVMAP_HASH */
 		for (i = 0; i < dtab->n_buckets; i++) {
 			head = dev_map_index_hash(dtab, i);
 			hlist_for_each_entry_safe(dst, next, head, index_hlist) {
-				if (!dst || dst->dev->ifindex == exclude_ifindex)
+				if (!dst)
+					continue;
+
+				if (is_ifindex_excluded(excluded_devices, num_excluded,
+							dst->dev->ifindex))
 					continue;
 
 				/* we only need n-1 clones; last_dst enqueued below */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH bpf-next v5 5/7] net: core: Allow netdev_lower_get_next_private_rcu in bh context
  2021-07-30  6:18 ` [PATCH bpf-next v5 0/7] XDP bonding support Jussi Maki
                     ` (3 preceding siblings ...)
  2021-07-30  6:18   ` [PATCH bpf-next v5 4/7] devmap: Exclude XDP broadcast to master device Jussi Maki
@ 2021-07-30  6:18   ` Jussi Maki
  2021-07-30  6:18   ` [PATCH bpf-next v5 6/7] selftests/bpf: Fix xdp_tx.c prog section name Jussi Maki
  2021-07-30  6:18   ` [PATCH bpf-next v5 7/7] selftests/bpf: Add tests for XDP bonding Jussi Maki
  6 siblings, 0 replies; 71+ messages in thread
From: Jussi Maki @ 2021-07-30  6:18 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

For the XDP bonding slave lookup to work in the NAPI poll context,
in which the redundant rcu_read_lock() has been removed, we have to
follow the same approach as in [1] and modify the WARN_ON to also
check rcu_read_lock_bh_held().

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=694cea395fded425008e93cd90cfdf7a451674af

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 27023ea933dd..ae1aecf97b58 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7588,7 +7588,7 @@ void *netdev_lower_get_next_private_rcu(struct net_device *dev,
 {
 	struct netdev_adjacent *lower;
 
-	WARN_ON_ONCE(!rcu_read_lock_held());
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held());
 
 	lower = list_entry_rcu((*iter)->next, struct netdev_adjacent, list);
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH bpf-next v5 6/7] selftests/bpf: Fix xdp_tx.c prog section name
  2021-07-30  6:18 ` [PATCH bpf-next v5 0/7] XDP bonding support Jussi Maki
                     ` (4 preceding siblings ...)
  2021-07-30  6:18   ` [PATCH bpf-next v5 5/7] net: core: Allow netdev_lower_get_next_private_rcu in bh context Jussi Maki
@ 2021-07-30  6:18   ` Jussi Maki
  2021-08-04 23:35     ` Andrii Nakryiko
  2021-07-30  6:18   ` [PATCH bpf-next v5 7/7] selftests/bpf: Add tests for XDP bonding Jussi Maki
  6 siblings, 1 reply; 71+ messages in thread
From: Jussi Maki @ 2021-07-30  6:18 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

The program type cannot be deduced from 'tx', which causes an invalid
argument error when trying to load xdp_tx.o using the skeleton.
Rename the section to "xdp/tx" so that libbpf can deduce the type.
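
To illustrate why the prefix matters, here is a simplified sketch of
prefix-based type deduction. This is not libbpf's actual code: the
table, the enum and guess_prog_type_sketch are hypothetical and only
show the idea of matching the start of the section name against known
prefixes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical program types for the sketch. */
enum prog_type_sketch {
	PROG_UNSPEC_SKETCH = 0,
	PROG_XDP_SKETCH,
	PROG_SOCKET_FILTER_SKETCH,
};

struct sec_def_sketch {
	const char *prefix;
	enum prog_type_sketch type;
};

/* Assumed prefix table; a real loader knows many more section names. */
static const struct sec_def_sketch sec_defs_sketch[] = {
	{ "xdp", PROG_XDP_SKETCH },
	{ "socket", PROG_SOCKET_FILTER_SKETCH },
};

/* Return the type of the first prefix that matches the section name. */
static enum prog_type_sketch guess_prog_type_sketch(const char *sec_name)
{
	size_t i;

	assert(sec_name != NULL);
	for (i = 0; i < sizeof(sec_defs_sketch) / sizeof(sec_defs_sketch[0]); i++) {
		size_t len = strlen(sec_defs_sketch[i].prefix);

		if (strncmp(sec_name, sec_defs_sketch[i].prefix, len) == 0)
			return sec_defs_sketch[i].type;
	}
	return PROG_UNSPEC_SKETCH;
}
```

In this sketch "tx" matches no prefix and stays unspecified, while
"xdp/tx" resolves to the XDP type.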

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 tools/testing/selftests/bpf/progs/xdp_tx.c   | 2 +-
 tools/testing/selftests/bpf/test_xdp_veth.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/xdp_tx.c b/tools/testing/selftests/bpf/progs/xdp_tx.c
index 94e6c2b281cb..ece1fbbc0984 100644
--- a/tools/testing/selftests/bpf/progs/xdp_tx.c
+++ b/tools/testing/selftests/bpf/progs/xdp_tx.c
@@ -3,7 +3,7 @@
 #include <linux/bpf.h>
 #include <bpf/bpf_helpers.h>
 
-SEC("tx")
+SEC("xdp/tx")
 int xdp_tx(struct xdp_md *xdp)
 {
 	return XDP_TX;
diff --git a/tools/testing/selftests/bpf/test_xdp_veth.sh b/tools/testing/selftests/bpf/test_xdp_veth.sh
index ba8ffcdaac30..c8e0b7d36f56 100755
--- a/tools/testing/selftests/bpf/test_xdp_veth.sh
+++ b/tools/testing/selftests/bpf/test_xdp_veth.sh
@@ -108,7 +108,7 @@ ip link set dev veth2 xdp pinned $BPF_DIR/progs/redirect_map_1
 ip link set dev veth3 xdp pinned $BPF_DIR/progs/redirect_map_2
 
 ip -n ns1 link set dev veth11 xdp obj xdp_dummy.o sec xdp_dummy
-ip -n ns2 link set dev veth22 xdp obj xdp_tx.o sec tx
+ip -n ns2 link set dev veth22 xdp obj xdp_tx.o sec xdp/tx
 ip -n ns3 link set dev veth33 xdp obj xdp_dummy.o sec xdp_dummy
 
 trap cleanup EXIT
-- 
2.17.1


^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH bpf-next v5 7/7] selftests/bpf: Add tests for XDP bonding
  2021-07-30  6:18 ` [PATCH bpf-next v5 0/7] XDP bonding support Jussi Maki
                     ` (5 preceding siblings ...)
  2021-07-30  6:18   ` [PATCH bpf-next v5 6/7] selftests/bpf: Fix xdp_tx.c prog section name Jussi Maki
@ 2021-07-30  6:18   ` Jussi Maki
  2021-08-04 23:33     ` Andrii Nakryiko
  6 siblings, 1 reply; 71+ messages in thread
From: Jussi Maki @ 2021-07-30  6:18 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

Add a test suite to test the XDP bonding implementation
over a pair of veth devices.

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 .../selftests/bpf/prog_tests/xdp_bonding.c    | 533 ++++++++++++++++++
 1 file changed, 533 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c b/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
new file mode 100644
index 000000000000..506fd9dc8aef
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
@@ -0,0 +1,533 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/**
+ * Test XDP bonding support
+ *
+ * Sets up two bonded veth pairs between two fresh namespaces
+ * and verifies that an XDP_TX program loaded on a bond device
+ * is correctly loaded onto the slave devices and that XDP_TX'd
+ * packets are balanced using bonding.
+ */
+
+#define _GNU_SOURCE
+#include <sched.h>
+#include <net/if.h>
+#include <linux/if_link.h>
+#include "test_progs.h"
+#include "network_helpers.h"
+#include <linux/if_bonding.h>
+#include <linux/limits.h>
+#include <linux/udp.h>
+
+#include "xdp_dummy.skel.h"
+#include "xdp_redirect_multi_kern.skel.h"
+#include "xdp_tx.skel.h"
+
+#define BOND1_MAC {0x00, 0x11, 0x22, 0x33, 0x44, 0x55}
+#define BOND1_MAC_STR "00:11:22:33:44:55"
+#define BOND2_MAC {0x00, 0x22, 0x33, 0x44, 0x55, 0x66}
+#define BOND2_MAC_STR "00:22:33:44:55:66"
+#define NPACKETS 100
+
+static int root_netns_fd = -1;
+
+static void restore_root_netns(void)
+{
+	ASSERT_OK(setns(root_netns_fd, CLONE_NEWNET), "restore_root_netns");
+}
+
+int setns_by_name(char *name)
+{
+	int nsfd, err;
+	char nspath[PATH_MAX];
+
+	snprintf(nspath, sizeof(nspath), "%s/%s", "/var/run/netns", name);
+	nsfd = open(nspath, O_RDONLY | O_CLOEXEC);
+	if (nsfd < 0)
+		return -1;
+
+	err = setns(nsfd, CLONE_NEWNET);
+	close(nsfd);
+	return err;
+}
+
+static int get_rx_packets(const char *iface)
+{
+	FILE *f;
+	char line[512];
+	int iface_len = strlen(iface);
+
+	f = fopen("/proc/net/dev", "r");
+	if (!f)
+		return -1;
+
+	while (fgets(line, sizeof(line), f)) {
+		char *p = line;
+
+		while (*p == ' ')
+			p++; /* skip whitespace */
+		if (!strncmp(p, iface, iface_len)) {
+			p += iface_len;
+			if (*p++ != ':')
+				continue;
+			while (*p == ' ')
+				p++; /* skip whitespace */
+			while (*p && *p != ' ')
+				p++; /* skip rx bytes */
+			while (*p == ' ')
+				p++; /* skip whitespace */
+			fclose(f);
+			return atoi(p);
+		}
+	}
+	fclose(f);
+	return -1;
+}
+
+#define MAX_BPF_LINKS 8
+
+struct skeletons {
+	struct xdp_dummy *xdp_dummy;
+	struct xdp_tx *xdp_tx;
+	struct xdp_redirect_multi_kern *xdp_redirect_multi_kern;
+
+	int nlinks;
+	struct bpf_link *links[MAX_BPF_LINKS];
+};
+
+static int xdp_attach(struct skeletons *skeletons, struct bpf_program *prog, char *iface)
+{
+	struct bpf_link *link;
+	int ifindex;
+
+	ifindex = if_nametoindex(iface);
+	if (!ASSERT_GT(ifindex, 0, "get ifindex"))
+		return -1;
+
+	if (!ASSERT_LT(skeletons->nlinks, MAX_BPF_LINKS, "too many XDP programs attached"))
+		return -1;
+
+	link = bpf_program__attach_xdp(prog, ifindex);
+	if (!ASSERT_OK_PTR(link, "attach xdp program"))
+		return -1;
+
+	skeletons->links[skeletons->nlinks++] = link;
+	return 0;
+}
+
+enum {
+	BOND_ONE_NO_ATTACH = 0,
+	BOND_BOTH_AND_ATTACH,
+};
+
+static const char * const mode_names[] = {
+	[BOND_MODE_ROUNDROBIN]   = "balance-rr",
+	[BOND_MODE_ACTIVEBACKUP] = "active-backup",
+	[BOND_MODE_XOR]          = "balance-xor",
+	[BOND_MODE_BROADCAST]    = "broadcast",
+	[BOND_MODE_8023AD]       = "802.3ad",
+	[BOND_MODE_TLB]          = "balance-tlb",
+	[BOND_MODE_ALB]          = "balance-alb",
+};
+
+static const char * const xmit_policy_names[] = {
+	[BOND_XMIT_POLICY_LAYER2]       = "layer2",
+	[BOND_XMIT_POLICY_LAYER34]      = "layer3+4",
+	[BOND_XMIT_POLICY_LAYER23]      = "layer2+3",
+	[BOND_XMIT_POLICY_ENCAP23]      = "encap2+3",
+	[BOND_XMIT_POLICY_ENCAP34]      = "encap3+4",
+};
+
+static int bonding_setup(struct skeletons *skeletons, int mode, int xmit_policy,
+			 int bond_both_attach)
+{
+#define SYS(fmt, ...)						\
+	({							\
+		char cmd[1024];					\
+		snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__);	\
+		if (!ASSERT_OK(system(cmd), cmd))		\
+			return -1;				\
+	})
+
+	SYS("ip netns add ns_dst");
+	SYS("ip link add veth1_1 type veth peer name veth2_1 netns ns_dst");
+	SYS("ip link add veth1_2 type veth peer name veth2_2 netns ns_dst");
+
+	SYS("ip link add bond1 type bond mode %s xmit_hash_policy %s",
+	    mode_names[mode], xmit_policy_names[xmit_policy]);
+	SYS("ip link set bond1 up address " BOND1_MAC_STR " addrgenmode none");
+	SYS("ip -netns ns_dst link add bond2 type bond mode %s xmit_hash_policy %s",
+	    mode_names[mode], xmit_policy_names[xmit_policy]);
+	SYS("ip -netns ns_dst link set bond2 up address " BOND2_MAC_STR " addrgenmode none");
+
+	SYS("ip link set veth1_1 master bond1");
+	if (bond_both_attach == BOND_BOTH_AND_ATTACH) {
+		SYS("ip link set veth1_2 master bond1");
+	} else {
+		SYS("ip link set veth1_2 up addrgenmode none");
+
+		if (xdp_attach(skeletons, skeletons->xdp_dummy->progs.xdp_dummy_prog, "veth1_2"))
+			return -1;
+	}
+
+	SYS("ip -netns ns_dst link set veth2_1 master bond2");
+
+	if (bond_both_attach == BOND_BOTH_AND_ATTACH)
+		SYS("ip -netns ns_dst link set veth2_2 master bond2");
+	else
+		SYS("ip -netns ns_dst link set veth2_2 up addrgenmode none");
+
+	/* Load a dummy program on the sending side, as with veth the peer needs
+	 * to have an XDP program loaded as well.
+	 */
+	if (xdp_attach(skeletons, skeletons->xdp_dummy->progs.xdp_dummy_prog, "bond1"))
+		return -1;
+
+	if (bond_both_attach == BOND_BOTH_AND_ATTACH) {
+		if (!ASSERT_OK(setns_by_name("ns_dst"), "set netns to ns_dst"))
+			return -1;
+
+		if (xdp_attach(skeletons, skeletons->xdp_tx->progs.xdp_tx, "bond2"))
+			return -1;
+
+		restore_root_netns();
+	}
+
+	return 0;
+
+#undef SYS
+}
+
+static void bonding_cleanup(struct skeletons *skeletons)
+{
+	restore_root_netns();
+	while (skeletons->nlinks) {
+		skeletons->nlinks--;
+		bpf_link__detach(skeletons->links[skeletons->nlinks]);
+	}
+	ASSERT_OK(system("ip link delete bond1"), "delete bond1");
+	ASSERT_OK(system("ip link delete veth1_1"), "delete veth1_1");
+	ASSERT_OK(system("ip link delete veth1_2"), "delete veth1_2");
+	ASSERT_OK(system("ip netns delete ns_dst"), "delete ns_dst");
+}
+
+static int send_udp_packets(int vary_dst_ip)
+{
+	struct ethhdr eh = {
+		.h_source = BOND1_MAC,
+		.h_dest = BOND2_MAC,
+		.h_proto = htons(ETH_P_IP),
+	};
+	uint8_t buf[128] = {};
+	struct iphdr *iph = (struct iphdr *)(buf + sizeof(eh));
+	struct udphdr *uh = (struct udphdr *)(buf + sizeof(eh) + sizeof(*iph));
+	int i, s = -1;
+	int ifindex;
+
+	s = socket(AF_PACKET, SOCK_RAW, IPPROTO_RAW);
+	if (!ASSERT_GE(s, 0, "socket"))
+		goto err;
+
+	ifindex = if_nametoindex("bond1");
+	if (!ASSERT_GT(ifindex, 0, "get bond1 ifindex"))
+		goto err;
+
+	memcpy(buf, &eh, sizeof(eh));
+	iph->ihl = 5;
+	iph->version = 4;
+	iph->tos = 16;
+	iph->id = 1;
+	iph->ttl = 64;
+	iph->protocol = IPPROTO_UDP;
+	iph->saddr = 1;
+	iph->daddr = 2;
+	iph->tot_len = htons(sizeof(buf) - ETH_HLEN);
+	iph->check = 0;
+
+	for (i = 1; i <= NPACKETS; i++) {
+		int n;
+		struct sockaddr_ll saddr_ll = {
+			.sll_ifindex = ifindex,
+			.sll_halen = ETH_ALEN,
+			.sll_addr = BOND2_MAC,
+		};
+
+		/* vary the UDP destination port for even distribution with roundrobin/xor modes */
+		uh->dest++;
+
+		if (vary_dst_ip)
+			iph->daddr++;
+
+		n = sendto(s, buf, sizeof(buf), 0, (struct sockaddr *)&saddr_ll, sizeof(saddr_ll));
+		if (!ASSERT_EQ(n, sizeof(buf), "sendto"))
+			goto err;
+	}
+
+	return 0;
+
+err:
+	if (s >= 0)
+		close(s);
+	return -1;
+}
+
+void test_xdp_bonding_with_mode(struct skeletons *skeletons, char *name, int mode, int xmit_policy)
+{
+	int bond1_rx;
+
+	if (!test__start_subtest(name))
+		return;
+
+	if (bonding_setup(skeletons, mode, xmit_policy, BOND_BOTH_AND_ATTACH))
+		goto out;
+
+	if (send_udp_packets(xmit_policy != BOND_XMIT_POLICY_LAYER34))
+		goto out;
+
+	bond1_rx = get_rx_packets("bond1");
+	ASSERT_EQ(bond1_rx, NPACKETS, "expected more received packets");
+
+	switch (mode) {
+	case BOND_MODE_ROUNDROBIN:
+	case BOND_MODE_XOR: {
+		int veth1_rx = get_rx_packets("veth1_1");
+		int veth2_rx = get_rx_packets("veth1_2");
+		int diff = abs(veth1_rx - veth2_rx);
+
+		ASSERT_GE(veth1_rx + veth2_rx, NPACKETS, "expected more packets");
+
+		switch (xmit_policy) {
+		case BOND_XMIT_POLICY_LAYER2:
+			ASSERT_GE(diff, NPACKETS,
+				  "expected packets on only one of the interfaces");
+			break;
+		case BOND_XMIT_POLICY_LAYER23:
+		case BOND_XMIT_POLICY_LAYER34:
+			ASSERT_LT(diff, NPACKETS/2,
+				  "expected even distribution of packets");
+			break;
+		default:
+			PRINT_FAIL("Unimplemented xmit_policy=%d\n", xmit_policy);
+			break;
+		}
+		break;
+	}
+	case BOND_MODE_ACTIVEBACKUP: {
+		int veth1_rx = get_rx_packets("veth1_1");
+		int veth2_rx = get_rx_packets("veth1_2");
+		int diff = abs(veth1_rx - veth2_rx);
+
+		ASSERT_GE(diff, NPACKETS,
+			  "expected packets on only one of the interfaces");
+		break;
+	}
+	default:
+		PRINT_FAIL("Unimplemented mode=%d\n", mode);
+		break;
+	}
+
+out:
+	bonding_cleanup(skeletons);
+}
+
+
+/* Test the broadcast redirection using xdp_redirect_map_multi_prog and adding
+ * all the interfaces to it and checking that broadcasting won't send the packet
+ * to either the ingress bond device (bond2) or its slave (veth2_1).
+ */
+void test_xdp_bonding_redirect_multi(struct skeletons *skeletons)
+{
+	static const char * const ifaces[] = {"bond2", "veth2_1", "veth2_2"};
+	int veth1_1_rx, veth1_2_rx;
+	int err;
+
+	if (!test__start_subtest("xdp_bonding_redirect_multi"))
+		return;
+
+	if (bonding_setup(skeletons, BOND_MODE_ROUNDROBIN, BOND_XMIT_POLICY_LAYER23,
+			  BOND_ONE_NO_ATTACH))
+		goto out;
+
+
+	if (!ASSERT_OK(setns_by_name("ns_dst"), "could not set netns to ns_dst"))
+		goto out;
+
+	/* populate the devmap with the relevant interfaces */
+	for (int i = 0; i < ARRAY_SIZE(ifaces); i++) {
+		int ifindex = if_nametoindex(ifaces[i]);
+		int map_fd = bpf_map__fd(skeletons->xdp_redirect_multi_kern->maps.map_all);
+
+		if (!ASSERT_GT(ifindex, 0, "could not get interface index"))
+			goto out;
+
+		err = bpf_map_update_elem(map_fd, &ifindex, &ifindex, 0);
+		if (!ASSERT_OK(err, "add interface to map_all"))
+			goto out;
+	}
+
+	if (xdp_attach(skeletons,
+		       skeletons->xdp_redirect_multi_kern->progs.xdp_redirect_map_multi_prog,
+		       "bond2"))
+		goto out;
+
+	restore_root_netns();
+
+	if (send_udp_packets(BOND_MODE_ROUNDROBIN))
+		goto out;
+
+	veth1_1_rx = get_rx_packets("veth1_1");
+	veth1_2_rx = get_rx_packets("veth1_2");
+
+	ASSERT_EQ(veth1_1_rx, 0, "expected no packets on veth1_1");
+	ASSERT_GE(veth1_2_rx, NPACKETS, "expected packets on veth1_2");
+
+out:
+	restore_root_netns();
+	bonding_cleanup(skeletons);
+}
+
+/* Test that XDP programs cannot be attached to both the bond master and slaves simultaneously */
+void test_xdp_bonding_attach(struct skeletons *skeletons)
+{
+	struct bpf_link *link = NULL;
+	struct bpf_link *link2 = NULL;
+	int veth, bond;
+	int err;
+
+	if (!test__start_subtest("xdp_bonding_attach"))
+		return;
+
+	if (!ASSERT_OK(system("ip link add veth type veth"), "add veth"))
+		goto out;
+	if (!ASSERT_OK(system("ip link add bond type bond"), "add bond"))
+		goto out;
+
+	veth = if_nametoindex("veth");
+	if (!ASSERT_GT(veth, 0, "if_nametoindex veth"))
+		goto out;
+	bond = if_nametoindex("bond");
+	if (!ASSERT_GT(bond, 0, "if_nametoindex bond"))
+		goto out;
+
+	/* enslaving with an XDP program loaded fails */
+	link = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, veth);
+	if (!ASSERT_OK_PTR(link, "attach program to veth"))
+		goto out;
+
+	err = system("ip link set veth master bond");
+	if (!ASSERT_NEQ(err, 0, "attaching slave with xdp program expected to fail"))
+		goto out;
+
+	bpf_link__detach(link);
+	link = NULL;
+
+	err = system("ip link set veth master bond");
+	if (!ASSERT_OK(err, "set veth master"))
+		goto out;
+
+	/* attaching to slave when master has no program is allowed */
+	link = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, veth);
+	if (!ASSERT_OK_PTR(link, "attach program to slave when enslaved"))
+		goto out;
+
+	/* attaching to master not allowed when slave has program loaded */
+	link2 = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, bond);
+	if (!ASSERT_ERR_PTR(link2, "attach program to master when slave has program"))
+		goto out;
+
+	bpf_link__detach(link);
+	link = NULL;
+
+	/* attaching XDP program to master allowed when slave has no program */
+	link = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, bond);
+	if (!ASSERT_OK_PTR(link, "attach program to master"))
+		goto out;
+
+	/* attaching to slave not allowed when master has program loaded */
+	link2 = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, veth);
+	ASSERT_ERR_PTR(link2, "attach program to slave when master has program");
+
+out:
+	if (link)
+		bpf_link__detach(link);
+	if (link2)
+		bpf_link__detach(link2);
+
+	system("ip link del veth");
+	system("ip link del bond");
+}
+
+static int libbpf_debug_print(enum libbpf_print_level level,
+			      const char *format, va_list args)
+{
+	if (level != LIBBPF_WARN)
+		vprintf(format, args);
+	return 0;
+}
+
+struct bond_test_case {
+	char *name;
+	int mode;
+	int xmit_policy;
+};
+
+static struct bond_test_case bond_test_cases[] = {
+	{ "xdp_bonding_roundrobin", BOND_MODE_ROUNDROBIN, BOND_XMIT_POLICY_LAYER23, },
+	{ "xdp_bonding_activebackup", BOND_MODE_ACTIVEBACKUP, BOND_XMIT_POLICY_LAYER23 },
+
+	{ "xdp_bonding_xor_layer2", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER2, },
+	{ "xdp_bonding_xor_layer23", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER23, },
+	{ "xdp_bonding_xor_layer34", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER34, },
+};
+
+void test_xdp_bonding(void)
+{
+	libbpf_print_fn_t old_print_fn;
+	struct skeletons skeletons = {};
+	int i;
+
+	old_print_fn = libbpf_set_print(libbpf_debug_print);
+
+	root_netns_fd = open("/proc/self/ns/net", O_RDONLY);
+	if (!ASSERT_GE(root_netns_fd, 0, "open /proc/self/ns/net"))
+		goto out;
+
+	skeletons.xdp_dummy = xdp_dummy__open_and_load();
+	if (!ASSERT_OK_PTR(skeletons.xdp_dummy, "xdp_dummy__open_and_load"))
+		goto out;
+
+	skeletons.xdp_tx = xdp_tx__open_and_load();
+	if (!ASSERT_OK_PTR(skeletons.xdp_tx, "xdp_tx__open_and_load"))
+		goto out;
+
+	skeletons.xdp_redirect_multi_kern = xdp_redirect_multi_kern__open_and_load();
+	if (!ASSERT_OK_PTR(skeletons.xdp_redirect_multi_kern,
+			   "xdp_redirect_multi_kern__open_and_load"))
+		goto out;
+
+	test_xdp_bonding_attach(&skeletons);
+
+	for (i = 0; i < ARRAY_SIZE(bond_test_cases); i++) {
+		struct bond_test_case *test_case = &bond_test_cases[i];
+
+		test_xdp_bonding_with_mode(
+			&skeletons,
+			test_case->name,
+			test_case->mode,
+			test_case->xmit_policy);
+	}
+
+	test_xdp_bonding_redirect_multi(&skeletons);
+
+out:
+	if (skeletons.xdp_dummy)
+		xdp_dummy__destroy(skeletons.xdp_dummy);
+	if (skeletons.xdp_tx)
+		xdp_tx__destroy(skeletons.xdp_tx);
+	if (skeletons.xdp_redirect_multi_kern)
+		xdp_redirect_multi_kern__destroy(skeletons.xdp_redirect_multi_kern);
+
+	libbpf_set_print(old_print_fn);
+	if (root_netns_fd >= 0)
+		close(root_netns_fd);
+}
-- 
2.17.1



* [PATCH bpf-next v6 0/7]: XDP bonding support
  2021-06-09 13:55 [PATCH bpf-next 0/3] XDP bonding support Jussi Maki
                   ` (7 preceding siblings ...)
  2021-07-30  6:18 ` [PATCH bpf-next v5 0/7] XDP bonding support Jussi Maki
@ 2021-07-31  5:57 ` Jussi Maki
  2021-07-31  5:57   ` [PATCH bpf-next v6 1/7] net: bonding: Refactor bond_xmit_hash for use with xdp_buff Jussi Maki
                     ` (6 more replies)
  8 siblings, 7 replies; 71+ messages in thread
From: Jussi Maki @ 2021-07-31  5:57 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson

This patchset introduces XDP support to the bonding driver.

The motivation for this change is to enable use of bonding (and
802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
XDP and also to transparently support bond devices for projects that
use XDP given most modern NICs have dual port adapters.  An alternative
to this approach would be to implement 802.3ad in user-space and
implement the bonding load-balancing in the XDP program itself, but
that is a rather cumbersome endeavor in terms of slave device management
(e.g. by watching netlink) and requires separate programs for native
vs bond cases for the orchestrator. A native in-kernel implementation
overcomes these issues and provides more flexibility.

Below are benchmark results done on two machines with 100Gbit
Intel E810 (ice) NIC and with 32-core 3970X on sending machine, and
16-core 3950X on receiving machine. 64 byte packets were sent with
pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
ice driver, so the tests were performed with iommu=off and patch [2]
applied. Additionally the bonding round robin algorithm was modified
to use per-cpu tx counters as high CPU load (50% vs 10%) and high rate
of cache misses were caused by the shared rr_tx_counter. A fix for this
has already been merged into net-next. The statistics were collected
using "sar -n dev -u 1 10".

 -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
 without patch (1 dev):
   XDP_DROP:              3.15%      48.6Mpps
   XDP_TX:                3.12%      18.3Mpps     18.3Mpps
   XDP_DROP (RSS):        9.47%      116.5Mpps
   XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
 -----------------------
 with patch, bond (1 dev):
   XDP_DROP:              3.14%      46.7Mpps
   XDP_TX:                3.15%      13.9Mpps     13.9Mpps
   XDP_DROP (RSS):       10.33%      117.2Mpps
   XDP_TX (RSS):         10.64%      25.1Mpps     24.0Mpps
 -----------------------
 with patch, bond (2 devs):
   XDP_DROP:              6.27%      92.7Mpps
   XDP_TX:                6.26%      17.6Mpps     17.5Mpps
   XDP_DROP (RSS):       11.38%      117.2Mpps
   XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
 --------------------------------------------------------------

RSS: Receive Side Scaling, i.e. the packets were sent to a range of
destination IPs.

[1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
[2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
[3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/

Patch 1 prepares bond_xmit_hash for hashing xdp_buff's.
Patch 2 adds hooks to implement redirection after bpf prog run.
Patch 3 implements the hooks in the bonding driver.
Patch 4 modifies devmap to properly handle EXCLUDE_INGRESS with a slave device.
Patch 5 fixes an issue related to recent cleanup of rcu_read_lock in XDP context.
Patch 6 fixes loading of xdp_tx.o by renaming section name.
Patch 7 adds tests.

v5->v6:
- Address Andrii's comments about the tests.

v4->v5:
- As pointed out by Andrii, use the generated BPF skeletons rather than libbpf
  directly.
- Renamed section name in progs/xdp_tx.c as the BPF skeleton wouldn't load it
  otherwise due to unknown program type.
- Daniel Borkmann noted that to retain backwards compatibility and allow some
  use cases we should allow attaching XDP programs to a slave device when the
  master does not have a program loaded. Modified the logic to allow this and
  added tests for the different combinations of attaching a program.

v3->v4:
- Add back the test suite, while removing the vmtest.sh modifications to kernel
  config now that CONFIG_BONDING=y is set. Discussed with Magnus Karlsson that
  it makes sense right now to not reuse the code from xdpceiver.c for testing
  XDP bonding.

v2->v3:
- Address Jay's comment to properly exclude upper devices with EXCLUDE_INGRESS
  when there is deeper nesting involved. Now all upper devices are excluded.
- Refuse to enslave devices that already have XDP programs loaded and refuse to
  load XDP programs to slave devices. Earlier one could have an XDP program loaded
  and after enslaving and loading another program onto the bond device the xdp_state
  of the enslaved device would be pointing at an old program.
- Adapt netdev_lower_get_next_private_rcu so it can be called in the XDP context.

v1->v2:
- Split up into smaller, easier-to-review patches and address cosmetic
  review comments.
- Drop the INDIRECT_CALL optimization as it showed little improvement in tests.
- Drop the rr_tx_counter patch as that has already been merged into net-next.
- Separate the test suite into another patch set. This will follow later once a
  patch set from Magnus Karlsson is merged and provides test utilities that can
  be reused for XDP bonding tests. v2 contains no major functional changes and
  was tested with the test suite included in v1.
  (https://lore.kernel.org/bpf/202106221509.kwNvAAZg-lkp@intel.com/T/#m464146d47299125d5868a08affd6d6ce526dfad1)
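
As a reading aid, the attach rules that result from the v2->v3 and
v4->v5 changes above can be modelled roughly as follows. This is a
user-space sketch under simplifying assumptions, not kernel code:
struct model_attach_state and the model_can_*() helpers are hypothetical
names, and only a bond with a single slave is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one bond with one (potential) slave device. */
struct model_attach_state {
	int master_progs;	/* XDP programs attached to the bond */
	int slave_progs;	/* XDP programs attached to the slave */
	bool enslaved;		/* is the slave currently part of the bond? */
};

/* A device that already has an XDP program loaded cannot be enslaved. */
static bool model_can_enslave(const struct model_attach_state *s)
{
	assert(s);
	return !s->enslaved && s->slave_progs == 0;
}

/* An enslaved device may take a program only while the master has none. */
static bool model_can_attach_slave(const struct model_attach_state *s)
{
	assert(s);
	return !s->enslaved || s->master_progs == 0;
}

/* The master may take a program only while its slave has none. */
static bool model_can_attach_master(const struct model_attach_state *s)
{
	assert(s);
	return !s->enslaved || s->slave_progs == 0;
}
```

The three predicates correspond to the combinations exercised by the
xdp_bonding_attach subtest in the test patch.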

---




* [PATCH bpf-next v6 1/7] net: bonding: Refactor bond_xmit_hash for use with xdp_buff
  2021-07-31  5:57 ` [PATCH bpf-next v6 0/7]: XDP bonding support Jussi Maki
@ 2021-07-31  5:57   ` Jussi Maki
  2021-08-11  1:52     ` Jonathan Toppins
  2021-07-31  5:57   ` [PATCH bpf-next v6 2/7] net: core: Add support for XDP redirection to slave device Jussi Maki
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 71+ messages in thread
From: Jussi Maki @ 2021-07-31  5:57 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

In preparation for adding XDP support to the bonding driver,
refactor the packet hashing functions to be able to work with
any linear data buffer without an skb.

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 drivers/net/bonding/bond_main.c | 147 +++++++++++++++++++-------------
 1 file changed, 90 insertions(+), 57 deletions(-)
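
For readers following the refactor, the idea of hashing from a linear
buffer with an explicit header length can be sketched in user space as
below. This is an illustrative stand-in, not the kernel code: struct
eth_hdr, pull_data() and eth_hash() are hypothetical names, and the
skb fallback via pskb_may_pull() is deliberately left out, so only the
skb-less (xdp_buff-like) case is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's struct ethhdr. */
struct eth_hdr {
	uint8_t  h_dest[6];
	uint8_t  h_source[6];
	uint16_t h_proto;	/* kept in network byte order, used as-is */
} __attribute__((packed));

/* Return data if the first n bytes lie inside the linear area of length
 * hlen, otherwise NULL. There is no skb to pull from in this sketch.
 */
static const void *pull_data(const void *data, int hlen, int n)
{
	return n <= hlen ? data : NULL;
}

/* L2 hash: last byte of dst MAC ^ last byte of src MAC ^ ethertype.
 * mhoff is the offset of the MAC header inside the buffer.
 */
static uint32_t eth_hash(const void *data, int mhoff, int hlen)
{
	const struct eth_hdr *ep;

	assert(mhoff >= 0 && hlen >= 0);

	data = pull_data(data, hlen, mhoff + (int)sizeof(struct eth_hdr));
	if (!data)
		return 0;	/* header not fully available: fall back to 0 */

	ep = (const struct eth_hdr *)((const uint8_t *)data + mhoff);
	return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
}
```

Passing the buffer, offset and length explicitly is what lets the same
hashing code serve both the skb path and a buffer that is already known
to be linear.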

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index d22d78303311..dcec5cc4dab1 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3611,55 +3611,80 @@ static struct notifier_block bond_netdev_notifier = {
 
 /*---------------------------- Hashing Policies -----------------------------*/
 
+/* Helper to access data in a packet, with or without a backing skb.
+ * If skb is given the data is linearized if necessary via pskb_may_pull.
+ */
+static inline const void *bond_pull_data(struct sk_buff *skb,
+					 const void *data, int hlen, int n)
+{
+	if (likely(n <= hlen))
+		return data;
+	else if (skb && likely(pskb_may_pull(skb, n)))
+		return skb->head;
+
+	return NULL;
+}
+
 /* L2 hash helper */
-static inline u32 bond_eth_hash(struct sk_buff *skb)
+static inline u32 bond_eth_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
 {
-	struct ethhdr *ep, hdr_tmp;
+	struct ethhdr *ep;
 
-	ep = skb_header_pointer(skb, 0, sizeof(hdr_tmp), &hdr_tmp);
-	if (ep)
-		return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
-	return 0;
+	data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
+	if (!data)
+		return 0;
+
+	ep = (struct ethhdr *)(data + mhoff);
+	return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
 }
 
-static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
-			 int *noff, int *proto, bool l34)
+static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk, const void *data,
+			 int hlen, __be16 l2_proto, int *nhoff, int *ip_proto, bool l34)
 {
 	const struct ipv6hdr *iph6;
 	const struct iphdr *iph;
 
-	if (skb->protocol == htons(ETH_P_IP)) {
-		if (unlikely(!pskb_may_pull(skb, *noff + sizeof(*iph))))
+	if (l2_proto == htons(ETH_P_IP)) {
+		data = bond_pull_data(skb, data, hlen, *nhoff + sizeof(*iph));
+		if (!data)
 			return false;
-		iph = (const struct iphdr *)(skb->data + *noff);
+
+		iph = (const struct iphdr *)(data + *nhoff);
 		iph_to_flow_copy_v4addrs(fk, iph);
-		*noff += iph->ihl << 2;
+		*nhoff += iph->ihl << 2;
 		if (!ip_is_fragment(iph))
-			*proto = iph->protocol;
-	} else if (skb->protocol == htons(ETH_P_IPV6)) {
-		if (unlikely(!pskb_may_pull(skb, *noff + sizeof(*iph6))))
+			*ip_proto = iph->protocol;
+	} else if (l2_proto == htons(ETH_P_IPV6)) {
+		data = bond_pull_data(skb, data, hlen, *nhoff + sizeof(*iph6));
+		if (!data)
 			return false;
-		iph6 = (const struct ipv6hdr *)(skb->data + *noff);
+
+		iph6 = (const struct ipv6hdr *)(data + *nhoff);
 		iph_to_flow_copy_v6addrs(fk, iph6);
-		*noff += sizeof(*iph6);
-		*proto = iph6->nexthdr;
+		*nhoff += sizeof(*iph6);
+		*ip_proto = iph6->nexthdr;
 	} else {
 		return false;
 	}
 
-	if (l34 && *proto >= 0)
-		fk->ports.ports = skb_flow_get_ports(skb, *noff, *proto);
+	if (l34 && *ip_proto >= 0)
+		fk->ports.ports = __skb_flow_get_ports(skb, *nhoff, *ip_proto, data, hlen);
 
 	return true;
 }
 
-static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
+static u32 bond_vlan_srcmac_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
 {
-	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
+	struct ethhdr *mac_hdr;
 	u32 srcmac_vendor = 0, srcmac_dev = 0;
 	u16 vlan;
 	int i;
 
+	data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
+	if (!data)
+		return 0;
+	mac_hdr = (struct ethhdr *)(data + mhoff);
+
 	for (i = 0; i < 3; i++)
 		srcmac_vendor = (srcmac_vendor << 8) | mac_hdr->h_source[i];
 
@@ -3675,26 +3700,25 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
 }
 
 /* Extract the appropriate headers based on bond's xmit policy */
-static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
-			      struct flow_keys *fk)
+static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb, const void *data,
+			      __be16 l2_proto, int nhoff, int hlen, struct flow_keys *fk)
 {
 	bool l34 = bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34;
-	int noff, proto = -1;
+	int ip_proto = -1;
 
 	switch (bond->params.xmit_policy) {
 	case BOND_XMIT_POLICY_ENCAP23:
 	case BOND_XMIT_POLICY_ENCAP34:
 		memset(fk, 0, sizeof(*fk));
 		return __skb_flow_dissect(NULL, skb, &flow_keys_bonding,
-					  fk, NULL, 0, 0, 0, 0);
+					  fk, data, l2_proto, nhoff, hlen, 0);
 	default:
 		break;
 	}
 
 	fk->ports.ports = 0;
 	memset(&fk->icmp, 0, sizeof(fk->icmp));
-	noff = skb_network_offset(skb);
-	if (!bond_flow_ip(skb, fk, &noff, &proto, l34))
+	if (!bond_flow_ip(skb, fk, data, hlen, l2_proto, &nhoff, &ip_proto, l34))
 		return false;
 
 	/* ICMP error packets contains at least 8 bytes of the header
@@ -3702,22 +3726,20 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
 	 * to correlate ICMP error packets within the same flow which
 	 * generated the error.
 	 */
-	if (proto == IPPROTO_ICMP || proto == IPPROTO_ICMPV6) {
-		skb_flow_get_icmp_tci(skb, &fk->icmp, skb->data,
-				      skb_transport_offset(skb),
-				      skb_headlen(skb));
-		if (proto == IPPROTO_ICMP) {
+	if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
+		skb_flow_get_icmp_tci(skb, &fk->icmp, data, nhoff, hlen);
+		if (ip_proto == IPPROTO_ICMP) {
 			if (!icmp_is_err(fk->icmp.type))
 				return true;
 
-			noff += sizeof(struct icmphdr);
-		} else if (proto == IPPROTO_ICMPV6) {
+			nhoff += sizeof(struct icmphdr);
+		} else if (ip_proto == IPPROTO_ICMPV6) {
 			if (!icmpv6_is_err(fk->icmp.type))
 				return true;
 
-			noff += sizeof(struct icmp6hdr);
+			nhoff += sizeof(struct icmp6hdr);
 		}
-		return bond_flow_ip(skb, fk, &noff, &proto, l34);
+		return bond_flow_ip(skb, fk, data, hlen, l2_proto, &nhoff, &ip_proto, l34);
 	}
 
 	return true;
@@ -3733,33 +3755,26 @@ static u32 bond_ip_hash(u32 hash, struct flow_keys *flow)
 	return hash >> 1;
 }
 
-/**
- * bond_xmit_hash - generate a hash value based on the xmit policy
- * @bond: bonding device
- * @skb: buffer to use for headers
- *
- * This function will extract the necessary headers from the skb buffer and use
- * them to generate a hash based on the xmit_policy set in the bonding device
+/* Generate hash based on xmit policy. If @skb is given it is used to linearize
+ * the data as required, but this function can be used without it if the data is
+ * known to be linear (e.g. with xdp_buff).
  */
-u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
+static u32 __bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, const void *data,
+			    __be16 l2_proto, int mhoff, int nhoff, int hlen)
 {
 	struct flow_keys flow;
 	u32 hash;
 
-	if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
-	    skb->l4_hash)
-		return skb->hash;
-
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC)
-		return bond_vlan_srcmac_hash(skb);
+		return bond_vlan_srcmac_hash(skb, data, mhoff, hlen);
 
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
-	    !bond_flow_dissect(bond, skb, &flow))
-		return bond_eth_hash(skb);
+	    !bond_flow_dissect(bond, skb, data, l2_proto, nhoff, hlen, &flow))
+		return bond_eth_hash(skb, data, mhoff, hlen);
 
 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
 	    bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23) {
-		hash = bond_eth_hash(skb);
+		hash = bond_eth_hash(skb, data, mhoff, hlen);
 	} else {
 		if (flow.icmp.id)
 			memcpy(&hash, &flow.icmp, sizeof(hash));
@@ -3770,6 +3785,25 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
 	return bond_ip_hash(hash, &flow);
 }
 
+/**
+ * bond_xmit_hash - generate a hash value based on the xmit policy
+ * @bond: bonding device
+ * @skb: buffer to use for headers
+ *
+ * This function will extract the necessary headers from the skb buffer and use
+ * them to generate a hash based on the xmit_policy set in the bonding device
+ */
+u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
+{
+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
+	    skb->l4_hash)
+		return skb->hash;
+
+	return __bond_xmit_hash(bond, skb, skb->head, skb->protocol,
+				skb->mac_header, skb->network_header,
+				skb_headlen(skb));
+}
+
 /*-------------------------- Device entry points ----------------------------*/
 
 void bond_work_init_all(struct bonding *bond)
@@ -4399,8 +4433,7 @@ static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
 	return bond_tx_drop(bond_dev, skb);
 }
 
-static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond,
-						      struct sk_buff *skb)
+static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond)
 {
 	return rcu_dereference(bond->curr_active_slave);
 }
@@ -4414,7 +4447,7 @@ static netdev_tx_t bond_xmit_activebackup(struct sk_buff *skb,
 	struct bonding *bond = netdev_priv(bond_dev);
 	struct slave *slave;
 
-	slave = bond_xmit_activebackup_slave_get(bond, skb);
+	slave = bond_xmit_activebackup_slave_get(bond);
 	if (slave)
 		return bond_dev_queue_xmit(bond, skb, slave->dev);
 
@@ -4712,7 +4745,7 @@ static struct net_device *bond_xmit_get_slave(struct net_device *master_dev,
 		slave = bond_xmit_roundrobin_slave_get(bond, skb);
 		break;
 	case BOND_MODE_ACTIVEBACKUP:
-		slave = bond_xmit_activebackup_slave_get(bond, skb);
+		slave = bond_xmit_activebackup_slave_get(bond);
 		break;
 	case BOND_MODE_8023AD:
 	case BOND_MODE_XOR:
-- 
2.17.1



* [PATCH bpf-next v6 2/7] net: core: Add support for XDP redirection to slave device
  2021-07-31  5:57 ` [PATCH bpf-next v6 0/7]: XDP bonding support Jussi Maki
  2021-07-31  5:57   ` [PATCH bpf-next v6 1/7] net: bonding: Refactor bond_xmit_hash for use with xdp_buff Jussi Maki
@ 2021-07-31  5:57   ` Jussi Maki
  2021-07-31  5:57   ` [PATCH bpf-next v6 3/7] net: bonding: Add XDP support to the bonding driver Jussi Maki
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 71+ messages in thread
From: Jussi Maki @ 2021-07-31  5:57 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

This adds the ndo_xdp_get_xmit_slave hook for transforming XDP_TX
into XDP_REDIRECT after BPF program run when the ingress device
is a bond slave.

The dev_xdp_prog_count is exposed so that slave devices can be checked
for loaded XDP programs in order to avoid the situation where both
bond master and slave have programs loaded according to xdp_state.

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 include/linux/filter.h    | 13 ++++++++++++-
 include/linux/netdevice.h |  6 ++++++
 net/core/dev.c            | 13 ++++++++++++-
 net/core/filter.c         | 25 +++++++++++++++++++++++++
 4 files changed, 55 insertions(+), 2 deletions(-)
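
The decision made after the BPF program returns XDP_TX can be pictured
with the small user-space model below. It is only a sketch of the
control flow described above: the model_* types and function are
hypothetical, the slave chosen by the master's ndo_xdp_get_xmit_slave
hook is passed in as a plain pointer, and the static key and RCU
handling are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two XDP verdicts involved. */
enum model_action { MODEL_TX, MODEL_REDIRECT };

struct model_dev {
	int ifindex;
};

struct model_redirect_info {
	int tgt_index;	/* ifindex to redirect to, valid for MODEL_REDIRECT */
};

/* rx_dev is the bond slave the packet arrived on; xmit_slave is the
 * device the bond selected for transmit (may be NULL if none is usable).
 */
static enum model_action
model_master_redirect(const struct model_dev *rx_dev,
		      const struct model_dev *xmit_slave,
		      struct model_redirect_info *ri)
{
	assert(rx_dev && ri);

	if (xmit_slave && xmit_slave != rx_dev) {
		/* Different egress device: hand the frame over via redirect
		 * so the receiving driver releases it from its RX ring.
		 */
		ri->tgt_index = xmit_slave->ifindex;
		return MODEL_REDIRECT;
	}

	/* Same device (or no slave selected): keep the plain TX verdict. */
	return MODEL_TX;
}
```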

diff --git a/include/linux/filter.h b/include/linux/filter.h
index ba36989f711a..7ea1cc378042 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -761,6 +761,10 @@ static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
 
 DECLARE_BPF_DISPATCHER(xdp)
 
+DECLARE_STATIC_KEY_FALSE(bpf_master_redirect_enabled_key);
+
+u32 xdp_master_redirect(struct xdp_buff *xdp);
+
 static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 					    struct xdp_buff *xdp)
 {
@@ -768,7 +772,14 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 	 * under local_bh_disable(), which provides the needed RCU protection
 	 * for accessing map entries.
 	 */
-	return __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
+	u32 act = __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
+
+	if (static_branch_unlikely(&bpf_master_redirect_enabled_key)) {
+		if (act == XDP_TX && netif_is_bond_slave(xdp->rxq->dev))
+			act = xdp_master_redirect(xdp);
+	}
+
+	return act;
 }
 
 void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 42f6f866d5f3..a380786429e1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1321,6 +1321,9 @@ struct netdev_net_notifier {
  *	that got dropped are freed/returned via xdp_return_frame().
  *	Returns negative number, means general error invoking ndo, meaning
  *	no frames were xmit'ed and core-caller will free all frames.
+ * struct net_device *(*ndo_xdp_get_xmit_slave)(struct net_device *dev,
+ *					        struct xdp_buff *xdp);
+ *      Get the xmit slave of master device based on the xdp_buff.
  * int (*ndo_xsk_wakeup)(struct net_device *dev, u32 queue_id, u32 flags);
  *      This function is used to wake up the softirq, ksoftirqd or kthread
  *	responsible for sending and/or receiving packets on a specific
@@ -1539,6 +1542,8 @@ struct net_device_ops {
 	int			(*ndo_xdp_xmit)(struct net_device *dev, int n,
 						struct xdp_frame **xdp,
 						u32 flags);
+	struct net_device *	(*ndo_xdp_get_xmit_slave)(struct net_device *dev,
+							  struct xdp_buff *xdp);
 	int			(*ndo_xsk_wakeup)(struct net_device *dev,
 						  u32 queue_id, u32 flags);
 	struct devlink_port *	(*ndo_get_devlink_port)(struct net_device *dev);
@@ -4071,6 +4076,7 @@ typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 		      int fd, int expected_fd, u32 flags);
 int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog);
+u8 dev_xdp_prog_count(struct net_device *dev);
 u32 dev_xdp_prog_id(struct net_device *dev, enum bpf_xdp_mode mode);
 
 int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 3ee58876e8f5..27023ea933dd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9353,7 +9353,7 @@ static struct bpf_prog *dev_xdp_prog(struct net_device *dev,
 	return dev->xdp_state[mode].prog;
 }
 
-static u8 dev_xdp_prog_count(struct net_device *dev)
+u8 dev_xdp_prog_count(struct net_device *dev)
 {
 	u8 count = 0;
 	int i;
@@ -9363,6 +9363,7 @@ static u8 dev_xdp_prog_count(struct net_device *dev)
 			count++;
 	return count;
 }
+EXPORT_SYMBOL_GPL(dev_xdp_prog_count);
 
 u32 dev_xdp_prog_id(struct net_device *dev, enum bpf_xdp_mode mode)
 {
@@ -9456,6 +9457,8 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
 {
 	unsigned int num_modes = hweight32(flags & XDP_FLAGS_MODES);
 	struct bpf_prog *cur_prog;
+	struct net_device *upper;
+	struct list_head *iter;
 	enum bpf_xdp_mode mode;
 	bpf_op_t bpf_op;
 	int err;
@@ -9494,6 +9497,14 @@ static int dev_xdp_attach(struct net_device *dev, struct netlink_ext_ack *extack
 		return -EBUSY;
 	}
 
+	/* don't allow if an upper device already has a program */
+	netdev_for_each_upper_dev_rcu(dev, upper, iter) {
+		if (dev_xdp_prog_count(upper) > 0) {
+			NL_SET_ERR_MSG(extack, "Cannot attach when an upper device already has a program");
+			return -EEXIST;
+		}
+	}
+
 	cur_prog = dev_xdp_prog(dev, mode);
 	/* can't replace attached prog with link */
 	if (link && cur_prog) {
diff --git a/net/core/filter.c b/net/core/filter.c
index faf29fd82276..ff62cd39046d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3950,6 +3950,31 @@ void bpf_clear_redirect_map(struct bpf_map *map)
 	}
 }
 
+DEFINE_STATIC_KEY_FALSE(bpf_master_redirect_enabled_key);
+EXPORT_SYMBOL_GPL(bpf_master_redirect_enabled_key);
+
+u32 xdp_master_redirect(struct xdp_buff *xdp)
+{
+	struct net_device *master, *slave;
+	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+
+	master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev);
+	slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp);
+	if (slave && slave != xdp->rxq->dev) {
+		/* The target device is different from the receiving device, so
+		 * redirect it to the new device.
+		 * Using XDP_REDIRECT gets the correct behaviour from XDP enabled
+		 * drivers to unmap the packet from their rx ring.
+		 */
+		ri->tgt_index = slave->ifindex;
+		ri->map_id = INT_MAX;
+		ri->map_type = BPF_MAP_TYPE_UNSPEC;
+		return XDP_REDIRECT;
+	}
+	return XDP_TX;
+}
+EXPORT_SYMBOL_GPL(xdp_master_redirect);
+
 int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
 		    struct bpf_prog *xdp_prog)
 {
-- 
2.17.1



* [PATCH bpf-next v6 3/7] net: bonding: Add XDP support to the bonding driver
  2021-07-31  5:57 ` [PATCH bpf-next v6 0/7]: XDP bonding support Jussi Maki
  2021-07-31  5:57   ` [PATCH bpf-next v6 1/7] net: bonding: Refactor bond_xmit_hash for use with xdp_buff Jussi Maki
  2021-07-31  5:57   ` [PATCH bpf-next v6 2/7] net: core: Add support for XDP redirection to slave device Jussi Maki
@ 2021-07-31  5:57   ` Jussi Maki
  2021-07-31  5:57   ` [PATCH bpf-next v6 4/7] devmap: Exclude XDP broadcast to master device Jussi Maki
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 71+ messages in thread
From: Jussi Maki @ 2021-07-31  5:57 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

XDP is implemented in the bonding driver by transparently delegating
the XDP program loading, removal and xmit operations to the bonding
slave devices. The overall goal of this work is that XDP programs
can be attached to a bond device *without* any further changes (or
awareness) necessary to the program itself, meaning the same XDP
program can be attached to a native device but also a bonding device.

The semantics of XDP_TX for a program attached to a bond are equivalent
to those of a tc/BPF program attached to the bond: the packet is
transmitted out of the bond itself, using one of the bond's configured
xmit methods to select a slave device (rather than XDP_TX on the slave
itself). Handling of XDP_TX to transmit using the configured bonding
mechanism is therefore implemented by rewriting the BPF program return
value in bpf_prog_run_xdp. To avoid performance impact this check is
guarded by a static key, which is incremented when an XDP program is
loaded onto a bond device. This approach was chosen to avoid changes to
drivers implementing XDP. If
the slave device does not match the receive device, then XDP_REDIRECT
is transparently used to perform the redirection in order to have
the network driver release the packet from its RX ring.  The bonding
driver hashing functions have been refactored to allow reuse with
xdp_buff's to avoid code duplication.

The motivation for this change is to enable use of bonding (and
802.3ad) in hairpinning L4 load-balancers such as [1] implemented with
XDP and also to transparently support bond devices for projects that
use XDP given most modern NICs have dual port adapters.  An alternative
to this approach would be to implement 802.3ad in user-space and
implement the bonding load-balancing in the XDP program itself, but
that is a rather cumbersome endeavor in terms of slave device management
(e.g. by watching netlink) and requires separate programs for native
vs bond cases for the orchestrator. A native in-kernel implementation
overcomes these issues and provides more flexibility.

Below are benchmark results done on two machines with 100Gbit
Intel E810 (ice) NIC and with 32-core 3970X on sending machine, and
16-core 3950X on receiving machine. 64 byte packets were sent with
pktgen-dpdk at full rate. Two issues [2, 3] were identified with the
ice driver, so the tests were performed with iommu=off and patch [2]
applied. Additionally the bonding round robin algorithm was modified
to use per-cpu tx counters as high CPU load (50% vs 10%) and high rate
of cache misses were caused by the shared rr_tx_counter; the fix for this
has already been merged into net-next. The statistics were collected
using "sar -n dev -u 1 10".

 -----------------------|  CPU  |--| rxpck/s |--| txpck/s |----
 without patch (1 dev):
   XDP_DROP:              3.15%      48.6Mpps
   XDP_TX:                3.12%      18.3Mpps     18.3Mpps
   XDP_DROP (RSS):        9.47%      116.5Mpps
   XDP_TX (RSS):          9.67%      25.3Mpps     24.2Mpps
 -----------------------
 with patch, bond (1 dev):
   XDP_DROP:              3.14%      46.7Mpps
   XDP_TX:                3.15%      13.9Mpps     13.9Mpps
   XDP_DROP (RSS):        10.33%     117.2Mpps
   XDP_TX (RSS):          10.64%     25.1Mpps     24.0Mpps
 -----------------------
 with patch, bond (2 devs):
   XDP_DROP:              6.27%      92.7Mpps
   XDP_TX:                6.26%      17.6Mpps     17.5Mpps
   XDP_DROP (RSS):       11.38%      117.2Mpps
   XDP_TX (RSS):         14.30%      28.7Mpps     27.4Mpps
 --------------------------------------------------------------

RSS: Receive Side Scaling, i.e. the packets were sent to a range of
destination IPs.

[1]: https://cilium.io/blog/2021/05/20/cilium-110#standalonelb
[2]: https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/T/#t
[3]: https://lore.kernel.org/bpf/CAHn8xckNXci+X_Eb2WMv4uVYjO2331UWB2JLtXr_58z0Av8+8A@mail.gmail.com/

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 drivers/net/bonding/bond_main.c | 309 +++++++++++++++++++++++++++++++-
 include/net/bonding.h           |   1 +
 2 files changed, 309 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index dcec5cc4dab1..fcd01acd1c83 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -317,6 +317,19 @@ bool bond_sk_check(struct bonding *bond)
 	}
 }
 
+static bool bond_xdp_check(struct bonding *bond)
+{
+	switch (BOND_MODE(bond)) {
+	case BOND_MODE_ROUNDROBIN:
+	case BOND_MODE_ACTIVEBACKUP:
+	case BOND_MODE_8023AD:
+	case BOND_MODE_XOR:
+		return true;
+	default:
+		return false;
+	}
+}
+
 /*---------------------------------- VLAN -----------------------------------*/
 
 /* In the following 2 functions, bond_vlan_rx_add_vid and bond_vlan_rx_kill_vid,
@@ -2133,6 +2146,41 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
 		bond_update_slave_arr(bond, NULL);
 
 
+	if (!slave_dev->netdev_ops->ndo_bpf ||
+	    !slave_dev->netdev_ops->ndo_xdp_xmit) {
+		if (bond->xdp_prog) {
+			NL_SET_ERR_MSG(extack, "Slave does not support XDP");
+			slave_err(bond_dev, slave_dev, "Slave does not support XDP\n");
+			res = -EOPNOTSUPP;
+			goto err_sysfs_del;
+		}
+	} else {
+		struct netdev_bpf xdp = {
+			.command = XDP_SETUP_PROG,
+			.flags   = 0,
+			.prog    = bond->xdp_prog,
+			.extack  = extack,
+		};
+
+		if (dev_xdp_prog_count(slave_dev) > 0) {
+			NL_SET_ERR_MSG(extack,
+				       "Slave has XDP program loaded, please unload before enslaving");
+			slave_err(bond_dev, slave_dev,
+				  "Slave has XDP program loaded, please unload before enslaving\n");
+			res = -EOPNOTSUPP;
+			goto err_sysfs_del;
+		}
+
+		res = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (res < 0) {
+			/* ndo_bpf() sets extack error message */
+			slave_dbg(bond_dev, slave_dev, "Error %d calling ndo_bpf\n", res);
+			goto err_sysfs_del;
+		}
+		if (bond->xdp_prog)
+			bpf_prog_inc(bond->xdp_prog);
+	}
+
 	slave_info(bond_dev, slave_dev, "Enslaving as %s interface with %s link\n",
 		   bond_is_active_slave(new_slave) ? "an active" : "a backup",
 		   new_slave->link != BOND_LINK_DOWN ? "an up" : "a down");
@@ -2252,6 +2300,17 @@ static int __bond_release_one(struct net_device *bond_dev,
 	/* recompute stats just before removing the slave */
 	bond_get_stats(bond->dev, &bond->bond_stats);
 
+	if (bond->xdp_prog) {
+		struct netdev_bpf xdp = {
+			.command = XDP_SETUP_PROG,
+			.flags   = 0,
+			.prog	 = NULL,
+			.extack  = NULL,
+		};
+		if (slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp))
+			slave_warn(bond_dev, slave_dev, "failed to unload XDP program\n");
+	}
+
 	bond_upper_dev_unlink(bond, slave);
 	/* unregister rx_handler early so bond_handle_frame wouldn't be called
 	 * for this slave anymore.
@@ -3635,7 +3694,7 @@ static inline u32 bond_eth_hash(struct sk_buff *skb, const void *data, int mhoff
 		return 0;
 
 	ep = (struct ethhdr *)(data + mhoff);
-	return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
+	return ep->h_dest[5] ^ ep->h_source[5] ^ be16_to_cpu(ep->h_proto);
 }
 
 static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk, const void *data,
@@ -3804,6 +3863,26 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
 				skb_headlen(skb));
 }
 
+/**
+ * bond_xmit_hash_xdp - generate a hash value based on the xmit policy
+ * @bond: bonding device
+ * @xdp: buffer to use for headers
+ *
+ * The XDP variant of bond_xmit_hash.
+ */
+static u32 bond_xmit_hash_xdp(struct bonding *bond, struct xdp_buff *xdp)
+{
+	struct ethhdr *eth;
+
+	if (xdp->data + sizeof(struct ethhdr) > xdp->data_end)
+		return 0;
+
+	eth = (struct ethhdr *)xdp->data;
+
+	return __bond_xmit_hash(bond, NULL, xdp->data, eth->h_proto, 0,
+				sizeof(struct ethhdr), xdp->data_end - xdp->data);
+}
+
 /*-------------------------- Device entry points ----------------------------*/
 
 void bond_work_init_all(struct bonding *bond)
@@ -4420,6 +4499,47 @@ static struct slave *bond_xmit_roundrobin_slave_get(struct bonding *bond,
 	return NULL;
 }
 
+static struct slave *bond_xdp_xmit_roundrobin_slave_get(struct bonding *bond,
+							struct xdp_buff *xdp)
+{
+	struct slave *slave;
+	int slave_cnt;
+	u32 slave_id;
+	const struct ethhdr *eth;
+	void *data = xdp->data;
+
+	if (data + sizeof(struct ethhdr) > xdp->data_end)
+		goto non_igmp;
+
+	eth = (struct ethhdr *)data;
+	data += sizeof(struct ethhdr);
+
+	/* See comment on IGMP in bond_xmit_roundrobin_slave_get() */
+	if (eth->h_proto == htons(ETH_P_IP)) {
+		const struct iphdr *iph;
+
+		if (data + sizeof(struct iphdr) > xdp->data_end)
+			goto non_igmp;
+
+		iph = (struct iphdr *)data;
+
+		if (iph->protocol == IPPROTO_IGMP) {
+			slave = rcu_dereference(bond->curr_active_slave);
+			if (slave)
+				return slave;
+			return bond_get_slave_by_id(bond, 0);
+		}
+	}
+
+non_igmp:
+	slave_cnt = READ_ONCE(bond->slave_cnt);
+	if (likely(slave_cnt)) {
+		slave_id = bond_rr_gen_slave_id(bond) % slave_cnt;
+		return bond_get_slave_by_id(bond, slave_id);
+	}
+	return NULL;
+}
+
 static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
 					struct net_device *bond_dev)
 {
@@ -4635,6 +4755,22 @@ static struct slave *bond_xmit_3ad_xor_slave_get(struct bonding *bond,
 	return slave;
 }
 
+static struct slave *bond_xdp_xmit_3ad_xor_slave_get(struct bonding *bond,
+						     struct xdp_buff *xdp)
+{
+	struct bond_up_slave *slaves;
+	unsigned int count;
+	u32 hash;
+
+	hash = bond_xmit_hash_xdp(bond, xdp);
+	slaves = rcu_dereference(bond->usable_slaves);
+	count = slaves ? READ_ONCE(slaves->count) : 0;
+	if (unlikely(!count))
+		return NULL;
+
+	return slaves->arr[hash % count];
+}
+
 /* Use this Xmit function for 3AD as well as XOR modes. The current
  * usable slave array is formed in the control path. The xmit function
  * just calculates hash and sends the packet out.
@@ -4919,6 +5055,174 @@ static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return ret;
 }
 
+static struct net_device *
+bond_xdp_get_xmit_slave(struct net_device *bond_dev, struct xdp_buff *xdp)
+{
+	struct bonding *bond = netdev_priv(bond_dev);
+	struct slave *slave;
+
+	/* Caller needs to hold rcu_read_lock() */
+
+	switch (BOND_MODE(bond)) {
+	case BOND_MODE_ROUNDROBIN:
+		slave = bond_xdp_xmit_roundrobin_slave_get(bond, xdp);
+		break;
+
+	case BOND_MODE_ACTIVEBACKUP:
+		slave = bond_xmit_activebackup_slave_get(bond);
+		break;
+
+	case BOND_MODE_8023AD:
+	case BOND_MODE_XOR:
+		slave = bond_xdp_xmit_3ad_xor_slave_get(bond, xdp);
+		break;
+
+	default:
+		/* Should never happen. Mode guarded by bond_xdp_check() */
+		netdev_err(bond_dev, "Unknown bonding mode %d for xdp xmit\n", BOND_MODE(bond));
+		WARN_ON_ONCE(1);
+		return NULL;
+	}
+
+	if (slave)
+		return slave->dev;
+
+	return NULL;
+}
+
+static int bond_xdp_xmit(struct net_device *bond_dev,
+			 int n, struct xdp_frame **frames, u32 flags)
+{
+	int nxmit, err = -ENXIO;
+
+	rcu_read_lock();
+
+	for (nxmit = 0; nxmit < n; nxmit++) {
+		struct xdp_frame *frame = frames[nxmit];
+		struct xdp_frame *frames1[] = {frame};
+		struct net_device *slave_dev;
+		struct xdp_buff xdp;
+
+		xdp_convert_frame_to_buff(frame, &xdp);
+
+		slave_dev = bond_xdp_get_xmit_slave(bond_dev, &xdp);
+		if (!slave_dev) {
+			err = -ENXIO;
+			break;
+		}
+
+		err = slave_dev->netdev_ops->ndo_xdp_xmit(slave_dev, 1, frames1, flags);
+		if (err < 1)
+			break;
+	}
+
+	rcu_read_unlock();
+
+	/* If error happened on the first frame then we can pass the error up, otherwise
+	 * report the number of frames that were xmitted.
+	 */
+	if (err < 0)
+		return (nxmit == 0 ? err : nxmit);
+
+	return nxmit;
+}
+
+static int bond_xdp_set(struct net_device *dev, struct bpf_prog *prog,
+			struct netlink_ext_ack *extack)
+{
+	struct bonding *bond = netdev_priv(dev);
+	struct list_head *iter;
+	struct slave *slave, *rollback_slave;
+	struct bpf_prog *old_prog;
+	struct netdev_bpf xdp = {
+		.command = XDP_SETUP_PROG,
+		.flags   = 0,
+		.prog    = prog,
+		.extack  = extack,
+	};
+	int err;
+
+	ASSERT_RTNL();
+
+	if (!bond_xdp_check(bond))
+		return -EOPNOTSUPP;
+
+	old_prog = bond->xdp_prog;
+	bond->xdp_prog = prog;
+
+	bond_for_each_slave(bond, slave, iter) {
+		struct net_device *slave_dev = slave->dev;
+
+		if (!slave_dev->netdev_ops->ndo_bpf ||
+		    !slave_dev->netdev_ops->ndo_xdp_xmit) {
+			NL_SET_ERR_MSG(extack, "Slave device does not support XDP");
+			slave_err(dev, slave_dev, "Slave does not support XDP\n");
+			err = -EOPNOTSUPP;
+			goto err;
+		}
+
+		if (dev_xdp_prog_count(slave_dev) > 0) {
+			NL_SET_ERR_MSG(extack,
+				       "Slave has XDP program loaded, please unload before enslaving");
+			slave_err(dev, slave_dev,
+				  "Slave has XDP program loaded, please unload before enslaving\n");
+			err = -EOPNOTSUPP;
+			goto err;
+		}
+
+		err = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (err < 0) {
+			/* ndo_bpf() sets extack error message */
+			slave_err(dev, slave_dev, "Error %d calling ndo_bpf\n", err);
+			goto err;
+		}
+		if (prog)
+			bpf_prog_inc(prog);
+	}
+
+	if (old_prog)
+		bpf_prog_put(old_prog);
+
+	if (prog)
+		static_branch_inc(&bpf_master_redirect_enabled_key);
+	else
+		static_branch_dec(&bpf_master_redirect_enabled_key);
+
+	return 0;
+
+err:
+	/* unwind the program changes */
+	bond->xdp_prog = old_prog;
+	xdp.prog = old_prog;
+	xdp.extack = NULL; /* do not overwrite original error */
+
+	bond_for_each_slave(bond, rollback_slave, iter) {
+		struct net_device *slave_dev = rollback_slave->dev;
+		int err_unwind;
+
+		if (slave == rollback_slave)
+			break;
+
+		err_unwind = slave_dev->netdev_ops->ndo_bpf(slave_dev, &xdp);
+		if (err_unwind < 0)
+			slave_err(dev, slave_dev,
+				  "Error %d when unwinding XDP program change\n", err_unwind);
+		else if (xdp.prog)
+			bpf_prog_inc(xdp.prog);
+	}
+	return err;
+}
+
+static int bond_xdp(struct net_device *dev, struct netdev_bpf *xdp)
+{
+	switch (xdp->command) {
+	case XDP_SETUP_PROG:
+		return bond_xdp_set(dev, xdp->prog, xdp->extack);
+	default:
+		return -EINVAL;
+	}
+}
+
 static u32 bond_mode_bcast_speed(struct slave *slave, u32 speed)
 {
 	if (speed == 0 || speed == SPEED_UNKNOWN)
@@ -5005,6 +5309,9 @@ static const struct net_device_ops bond_netdev_ops = {
 	.ndo_features_check	= passthru_features_check,
 	.ndo_get_xmit_slave	= bond_xmit_get_slave,
 	.ndo_sk_get_lower_dev	= bond_sk_get_lower_dev,
+	.ndo_bpf		= bond_xdp,
+	.ndo_xdp_xmit           = bond_xdp_xmit,
+	.ndo_xdp_get_xmit_slave = bond_xdp_get_xmit_slave,
 };
 
 static const struct device_type bond_type = {
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 625d9c72dee3..b91c365e4e95 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -258,6 +258,7 @@ struct bonding {
 	/* protecting ipsec_list */
 	spinlock_t ipsec_lock;
 #endif /* CONFIG_XFRM_OFFLOAD */
+	struct bpf_prog *xdp_prog;
 };
 
 #define bond_slave_get_rcu(dev) \
-- 
2.17.1



* [PATCH bpf-next v6 4/7] devmap: Exclude XDP broadcast to master device
  2021-07-31  5:57 ` [PATCH bpf-next v6 0/7]: XDP bonding support Jussi Maki
                     ` (2 preceding siblings ...)
  2021-07-31  5:57   ` [PATCH bpf-next v6 3/7] net: bonding: Add XDP support to the bonding driver Jussi Maki
@ 2021-07-31  5:57   ` Jussi Maki
  2021-07-31  5:57   ` [PATCH bpf-next v6 5/7] net: core: Allow netdev_lower_get_next_private_rcu in bh context Jussi Maki
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 71+ messages in thread
From: Jussi Maki @ 2021-07-31  5:57 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

If the ingress device is a bond slave, do not broadcast back
through it or the bond master.
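
A minimal user-space sketch of the exclusion rule, under simplifying
assumptions: devices are reduced to ifindexes, the devmap to a plain array,
and struct toy_dev, MAX_UPPER, build_excluded() and count_targets() are
hypothetical names that do not exist in the kernel. It only shows that the
excluded set becomes "ingress device plus all of its upper devices".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_UPPER 8 /* hypothetical bound standing in for MAX_NEST_DEV */

/* Toy model of a net device: its ifindex and the ifindexes of its uppers. */
struct toy_dev {
	int ifindex;
	int upper[MAX_UPPER];
	size_t n_upper;
};

/* Fill 'excluded' with the upper devices followed by the device itself.
 * 'excluded' must have room for 1 + MAX_UPPER entries.
 */
static size_t build_excluded(const struct toy_dev *rx, int *excluded)
{
	size_t n = 0;

	for (size_t i = 0; i < rx->n_upper; i++)
		excluded[n++] = rx->upper[i];
	excluded[n++] = rx->ifindex;
	return n;
}

/* Linear scan; the list is short, so no better structure is needed. */
static bool is_excluded(const int *excluded, size_t n, int ifindex)
{
	while (n--) {
		if (excluded[n] == ifindex)
			return true;
	}
	return false;
}

/* Count how many map entries a broadcast would be sent to. */
static size_t count_targets(const struct toy_dev *rx, const int *map,
			    size_t map_len, bool exclude_ingress)
{
	int excluded[1 + MAX_UPPER] = {0};
	size_t n_excluded = 0, targets = 0;

	assert(rx->n_upper <= MAX_UPPER);
	if (exclude_ingress)
		n_excluded = build_excluded(rx, excluded);

	for (size_t i = 0; i < map_len; i++) {
		if (!is_excluded(excluded, n_excluded, map[i]))
			targets++;
	}
	return targets;
}
```

For example, with a slave (ifindex 3) whose master has ifindex 2 and a map
holding {2, 3, 4}, only ifindex 4 remains a broadcast target when
exclude_ingress is set.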

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 kernel/bpf/devmap.c | 69 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 60 insertions(+), 9 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 542e94fa30b4..f02d04540c0c 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -534,10 +534,9 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
 	return __xdp_enqueue(dev, xdp, dev_rx, dst->xdp_prog);
 }
 
-static bool is_valid_dst(struct bpf_dtab_netdev *obj, struct xdp_buff *xdp,
-			 int exclude_ifindex)
+static bool is_valid_dst(struct bpf_dtab_netdev *obj, struct xdp_buff *xdp)
 {
-	if (!obj || obj->dev->ifindex == exclude_ifindex ||
+	if (!obj ||
 	    !obj->dev->netdev_ops->ndo_xdp_xmit)
 		return false;
 
@@ -562,17 +561,48 @@ static int dev_map_enqueue_clone(struct bpf_dtab_netdev *obj,
 	return 0;
 }
 
+static inline bool is_ifindex_excluded(int *excluded, int num_excluded, int ifindex)
+{
+	while (num_excluded--) {
+		if (ifindex == excluded[num_excluded])
+			return true;
+	}
+	return false;
+}
+
+/* Get ifindex of each upper device. 'indexes' must be able to hold at
+ * least MAX_NEST_DEV elements.
+ * Returns the number of ifindexes added.
+ */
+static int get_upper_ifindexes(struct net_device *dev, int *indexes)
+{
+	struct net_device *upper;
+	struct list_head *iter;
+	int n = 0;
+
+	netdev_for_each_upper_dev_rcu(dev, upper, iter) {
+		indexes[n++] = upper->ifindex;
+	}
+	return n;
+}
+
 int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 			  struct bpf_map *map, bool exclude_ingress)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
-	int exclude_ifindex = exclude_ingress ? dev_rx->ifindex : 0;
 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
+	int excluded_devices[1+MAX_NEST_DEV];
 	struct hlist_head *head;
 	struct xdp_frame *xdpf;
+	int num_excluded = 0;
 	unsigned int i;
 	int err;
 
+	if (exclude_ingress) {
+		num_excluded = get_upper_ifindexes(dev_rx, excluded_devices);
+		excluded_devices[num_excluded++] = dev_rx->ifindex;
+	}
+
 	xdpf = xdp_convert_buff_to_frame(xdp);
 	if (unlikely(!xdpf))
 		return -EOVERFLOW;
@@ -581,7 +611,10 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 		for (i = 0; i < map->max_entries; i++) {
 			dst = rcu_dereference_check(dtab->netdev_map[i],
 						    rcu_read_lock_bh_held());
-			if (!is_valid_dst(dst, xdp, exclude_ifindex))
+			if (!is_valid_dst(dst, xdp))
+				continue;
+
+			if (is_ifindex_excluded(excluded_devices, num_excluded, dst->dev->ifindex))
 				continue;
 
 			/* we only need n-1 clones; last_dst enqueued below */
@@ -601,7 +634,11 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
 			head = dev_map_index_hash(dtab, i);
 			hlist_for_each_entry_rcu(dst, head, index_hlist,
 						 lockdep_is_held(&dtab->index_lock)) {
-				if (!is_valid_dst(dst, xdp, exclude_ifindex))
+				if (!is_valid_dst(dst, xdp))
+					continue;
+
+				if (is_ifindex_excluded(excluded_devices, num_excluded,
+							dst->dev->ifindex))
 					continue;
 
 				/* we only need n-1 clones; last_dst enqueued below */
@@ -675,18 +712,27 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
 			   bool exclude_ingress)
 {
 	struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
-	int exclude_ifindex = exclude_ingress ? dev->ifindex : 0;
 	struct bpf_dtab_netdev *dst, *last_dst = NULL;
+	int excluded_devices[1+MAX_NEST_DEV];
 	struct hlist_head *head;
 	struct hlist_node *next;
+	int num_excluded = 0;
 	unsigned int i;
 	int err;
 
+	if (exclude_ingress) {
+		num_excluded = get_upper_ifindexes(dev, excluded_devices);
+		excluded_devices[num_excluded++] = dev->ifindex;
+	}
+
 	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
 		for (i = 0; i < map->max_entries; i++) {
 			dst = rcu_dereference_check(dtab->netdev_map[i],
 						    rcu_read_lock_bh_held());
-			if (!dst || dst->dev->ifindex == exclude_ifindex)
+			if (!dst)
+				continue;
+
+			if (is_ifindex_excluded(excluded_devices, num_excluded, dst->dev->ifindex))
 				continue;
 
 			/* we only need n-1 clones; last_dst enqueued below */
@@ -700,12 +746,17 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
 				return err;
 
 			last_dst = dst;
+
 		}
 	} else { /* BPF_MAP_TYPE_DEVMAP_HASH */
 		for (i = 0; i < dtab->n_buckets; i++) {
 			head = dev_map_index_hash(dtab, i);
 			hlist_for_each_entry_safe(dst, next, head, index_hlist) {
-				if (!dst || dst->dev->ifindex == exclude_ifindex)
+				if (!dst)
+					continue;
+
+				if (is_ifindex_excluded(excluded_devices, num_excluded,
+							dst->dev->ifindex))
 					continue;
 
 				/* we only need n-1 clones; last_dst enqueued below */
-- 
2.17.1



* [PATCH bpf-next v6 5/7] net: core: Allow netdev_lower_get_next_private_rcu in bh context
  2021-07-31  5:57 ` [PATCH bpf-next v6 0/7]: XDP bonding support Jussi Maki
                     ` (3 preceding siblings ...)
  2021-07-31  5:57   ` [PATCH bpf-next v6 4/7] devmap: Exclude XDP broadcast to master device Jussi Maki
@ 2021-07-31  5:57   ` Jussi Maki
  2021-07-31  5:57   ` [PATCH bpf-next v6 6/7] selftests/bpf: Fix xdp_tx.c prog section name Jussi Maki
  2021-07-31  5:57   ` [PATCH bpf-next v6 7/7] selftests/bpf: Add tests for XDP bonding Jussi Maki
  6 siblings, 0 replies; 71+ messages in thread
From: Jussi Maki @ 2021-07-31  5:57 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

For the XDP bonding slave lookup to work in the NAPI poll context,
in which the redundant rcu_read_lock() has been removed, we have to
follow the same approach as in [1] and modify the WARN_ON to also
check rcu_read_lock_bh_held().

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=694cea395fded425008e93cd90cfdf7a451674af

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 27023ea933dd..ae1aecf97b58 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7588,7 +7588,7 @@ void *netdev_lower_get_next_private_rcu(struct net_device *dev,
 {
 	struct netdev_adjacent *lower;
 
-	WARN_ON_ONCE(!rcu_read_lock_held());
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held());
 
 	lower = list_entry_rcu((*iter)->next, struct netdev_adjacent, list);
 
-- 
2.17.1



* [PATCH bpf-next v6 6/7] selftests/bpf: Fix xdp_tx.c prog section name
  2021-07-31  5:57 ` [PATCH bpf-next v6 0/7]: XDP bonding support Jussi Maki
                     ` (4 preceding siblings ...)
  2021-07-31  5:57   ` [PATCH bpf-next v6 5/7] net: core: Allow netdev_lower_get_next_private_rcu in bh context Jussi Maki
@ 2021-07-31  5:57   ` Jussi Maki
  2021-08-06 22:53     ` Andrii Nakryiko
  2021-07-31  5:57   ` [PATCH bpf-next v6 7/7] selftests/bpf: Add tests for XDP bonding Jussi Maki
  6 siblings, 1 reply; 71+ messages in thread
From: Jussi Maki @ 2021-07-31  5:57 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

The program type cannot be deduced from the section name 'tx', which
causes an invalid argument error when trying to load xdp_tx.o using the
skeleton. Rename the section to "xdp" so that libbpf can deduce the type.

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 tools/testing/selftests/bpf/progs/xdp_tx.c   | 2 +-
 tools/testing/selftests/bpf/test_xdp_veth.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/xdp_tx.c b/tools/testing/selftests/bpf/progs/xdp_tx.c
index 94e6c2b281cb..5f725c720e00 100644
--- a/tools/testing/selftests/bpf/progs/xdp_tx.c
+++ b/tools/testing/selftests/bpf/progs/xdp_tx.c
@@ -3,7 +3,7 @@
 #include <linux/bpf.h>
 #include <bpf/bpf_helpers.h>
 
-SEC("tx")
+SEC("xdp")
 int xdp_tx(struct xdp_md *xdp)
 {
 	return XDP_TX;
diff --git a/tools/testing/selftests/bpf/test_xdp_veth.sh b/tools/testing/selftests/bpf/test_xdp_veth.sh
index ba8ffcdaac30..995278e684b6 100755
--- a/tools/testing/selftests/bpf/test_xdp_veth.sh
+++ b/tools/testing/selftests/bpf/test_xdp_veth.sh
@@ -108,7 +108,7 @@ ip link set dev veth2 xdp pinned $BPF_DIR/progs/redirect_map_1
 ip link set dev veth3 xdp pinned $BPF_DIR/progs/redirect_map_2
 
 ip -n ns1 link set dev veth11 xdp obj xdp_dummy.o sec xdp_dummy
-ip -n ns2 link set dev veth22 xdp obj xdp_tx.o sec tx
+ip -n ns2 link set dev veth22 xdp obj xdp_tx.o sec xdp
 ip -n ns3 link set dev veth33 xdp obj xdp_dummy.o sec xdp_dummy
 
 trap cleanup EXIT
-- 
2.17.1



* [PATCH bpf-next v6 7/7] selftests/bpf: Add tests for XDP bonding
  2021-07-31  5:57 ` [PATCH bpf-next v6 0/7]: XDP bonding support Jussi Maki
                     ` (5 preceding siblings ...)
  2021-07-31  5:57   ` [PATCH bpf-next v6 6/7] selftests/bpf: Fix xdp_tx.c prog section name Jussi Maki
@ 2021-07-31  5:57   ` Jussi Maki
  2021-08-06 22:50     ` Andrii Nakryiko
  6 siblings, 1 reply; 71+ messages in thread
From: Jussi Maki @ 2021-07-31  5:57 UTC (permalink / raw)
  To: bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson, Jussi Maki

Add a test suite to test the XDP bonding implementation
over a pair of veth devices.

Signed-off-by: Jussi Maki <joamaki@gmail.com>
---
 .../selftests/bpf/prog_tests/xdp_bonding.c    | 520 ++++++++++++++++++
 1 file changed, 520 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c b/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
new file mode 100644
index 000000000000..334a04721a59
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
@@ -0,0 +1,520 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/**
+ * Test XDP bonding support
+ *
+ * Sets up two bonded veth pairs between two fresh namespaces
+ * and verifies that an XDP_TX program loaded on a bond device
+ * is correctly loaded onto the slave devices and that XDP_TX'd
+ * packets are balanced using bonding.
+ */
+
+#define _GNU_SOURCE
+#include <sched.h>
+#include <net/if.h>
+#include <linux/if_link.h>
+#include "test_progs.h"
+#include "network_helpers.h"
+#include <linux/if_bonding.h>
+#include <linux/limits.h>
+#include <linux/udp.h>
+
+#include "xdp_dummy.skel.h"
+#include "xdp_redirect_multi_kern.skel.h"
+#include "xdp_tx.skel.h"
+
+#define BOND1_MAC {0x00, 0x11, 0x22, 0x33, 0x44, 0x55}
+#define BOND1_MAC_STR "00:11:22:33:44:55"
+#define BOND2_MAC {0x00, 0x22, 0x33, 0x44, 0x55, 0x66}
+#define BOND2_MAC_STR "00:22:33:44:55:66"
+#define NPACKETS 100
+
+static int root_netns_fd = -1;
+
+static void restore_root_netns(void)
+{
+	ASSERT_OK(setns(root_netns_fd, CLONE_NEWNET), "restore_root_netns");
+}
+
+static int setns_by_name(char *name)
+{
+	int nsfd, err;
+	char nspath[PATH_MAX];
+
+	snprintf(nspath, sizeof(nspath), "%s/%s", "/var/run/netns", name);
+	nsfd = open(nspath, O_RDONLY | O_CLOEXEC);
+	if (nsfd < 0)
+		return -1;
+
+	err = setns(nsfd, CLONE_NEWNET);
+	close(nsfd);
+	return err;
+}
+
+static int get_rx_packets(const char *iface)
+{
+	FILE *f;
+	char line[512];
+	int iface_len = strlen(iface);
+
+	f = fopen("/proc/net/dev", "r");
+	if (!f)
+		return -1;
+
+	while (fgets(line, sizeof(line), f)) {
+		char *p = line;
+
+		while (*p == ' ')
+			p++; /* skip whitespace */
+		if (!strncmp(p, iface, iface_len)) {
+			p += iface_len;
+			if (*p++ != ':')
+				continue;
+			while (*p == ' ')
+				p++; /* skip whitespace */
+			while (*p && *p != ' ')
+				p++; /* skip rx bytes */
+			while (*p == ' ')
+				p++; /* skip whitespace */
+			fclose(f);
+			return atoi(p);
+		}
+	}
+	fclose(f);
+	return -1;
+}
+
+#define MAX_BPF_LINKS 8
+
+struct skeletons {
+	struct xdp_dummy *xdp_dummy;
+	struct xdp_tx *xdp_tx;
+	struct xdp_redirect_multi_kern *xdp_redirect_multi_kern;
+
+	int nlinks;
+	struct bpf_link *links[MAX_BPF_LINKS];
+};
+
+static int xdp_attach(struct skeletons *skeletons, struct bpf_program *prog, char *iface)
+{
+	struct bpf_link *link;
+	int ifindex;
+
+	ifindex = if_nametoindex(iface);
+	if (!ASSERT_GT(ifindex, 0, "get ifindex"))
+		return -1;
+
+	if (!ASSERT_LE(skeletons->nlinks+1, MAX_BPF_LINKS, "too many XDP programs attached"))
+		return -1;
+
+	link = bpf_program__attach_xdp(prog, ifindex);
+	if (!ASSERT_OK_PTR(link, "attach xdp program"))
+		return -1;
+
+	skeletons->links[skeletons->nlinks++] = link;
+	return 0;
+}
+
+enum {
+	BOND_ONE_NO_ATTACH = 0,
+	BOND_BOTH_AND_ATTACH,
+};
+
+static const char * const mode_names[] = {
+	[BOND_MODE_ROUNDROBIN]   = "balance-rr",
+	[BOND_MODE_ACTIVEBACKUP] = "active-backup",
+	[BOND_MODE_XOR]          = "balance-xor",
+	[BOND_MODE_BROADCAST]    = "broadcast",
+	[BOND_MODE_8023AD]       = "802.3ad",
+	[BOND_MODE_TLB]          = "balance-tlb",
+	[BOND_MODE_ALB]          = "balance-alb",
+};
+
+static const char * const xmit_policy_names[] = {
+	[BOND_XMIT_POLICY_LAYER2]       = "layer2",
+	[BOND_XMIT_POLICY_LAYER34]      = "layer3+4",
+	[BOND_XMIT_POLICY_LAYER23]      = "layer2+3",
+	[BOND_XMIT_POLICY_ENCAP23]      = "encap2+3",
+	[BOND_XMIT_POLICY_ENCAP34]      = "encap3+4",
+};
+
+static int bonding_setup(struct skeletons *skeletons, int mode, int xmit_policy,
+			 int bond_both_attach)
+{
+#define SYS(fmt, ...)						\
+	({							\
+		char cmd[1024];					\
+		snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__);	\
+		if (!ASSERT_OK(system(cmd), cmd))		\
+			return -1;				\
+	})
+
+	SYS("ip netns add ns_dst");
+	SYS("ip link add veth1_1 type veth peer name veth2_1 netns ns_dst");
+	SYS("ip link add veth1_2 type veth peer name veth2_2 netns ns_dst");
+
+	SYS("ip link add bond1 type bond mode %s xmit_hash_policy %s",
+	    mode_names[mode], xmit_policy_names[xmit_policy]);
+	SYS("ip link set bond1 up address " BOND1_MAC_STR " addrgenmode none");
+	SYS("ip -netns ns_dst link add bond2 type bond mode %s xmit_hash_policy %s",
+	    mode_names[mode], xmit_policy_names[xmit_policy]);
+	SYS("ip -netns ns_dst link set bond2 up address " BOND2_MAC_STR " addrgenmode none");
+
+	SYS("ip link set veth1_1 master bond1");
+	if (bond_both_attach == BOND_BOTH_AND_ATTACH) {
+		SYS("ip link set veth1_2 master bond1");
+	} else {
+		SYS("ip link set veth1_2 up addrgenmode none");
+
+		if (xdp_attach(skeletons, skeletons->xdp_dummy->progs.xdp_dummy_prog, "veth1_2"))
+			return -1;
+	}
+
+	SYS("ip -netns ns_dst link set veth2_1 master bond2");
+
+	if (bond_both_attach == BOND_BOTH_AND_ATTACH)
+		SYS("ip -netns ns_dst link set veth2_2 master bond2");
+	else
+		SYS("ip -netns ns_dst link set veth2_2 up addrgenmode none");
+
+	/* Load a dummy program on the sending side, as with veth the peer
+	 * needs to have an XDP program loaded as well.
+	 */
+	if (xdp_attach(skeletons, skeletons->xdp_dummy->progs.xdp_dummy_prog, "bond1"))
+		return -1;
+
+	if (bond_both_attach == BOND_BOTH_AND_ATTACH) {
+		if (!ASSERT_OK(setns_by_name("ns_dst"), "set netns to ns_dst"))
+			return -1;
+
+		if (xdp_attach(skeletons, skeletons->xdp_tx->progs.xdp_tx, "bond2"))
+			return -1;
+
+		restore_root_netns();
+	}
+
+	return 0;
+
+#undef SYS
+}
+
+static void bonding_cleanup(struct skeletons *skeletons)
+{
+	restore_root_netns();
+	while (skeletons->nlinks) {
+		skeletons->nlinks--;
+		bpf_link__destroy(skeletons->links[skeletons->nlinks]);
+	}
+	ASSERT_OK(system("ip link delete bond1"), "delete bond1");
+	ASSERT_OK(system("ip link delete veth1_1"), "delete veth1_1");
+	ASSERT_OK(system("ip link delete veth1_2"), "delete veth1_2");
+	ASSERT_OK(system("ip netns delete ns_dst"), "delete ns_dst");
+}
+
+static int send_udp_packets(int vary_dst_ip)
+{
+	struct ethhdr eh = {
+		.h_source = BOND1_MAC,
+		.h_dest = BOND2_MAC,
+		.h_proto = htons(ETH_P_IP),
+	};
+	uint8_t buf[128] = {};
+	struct iphdr *iph = (struct iphdr *)(buf + sizeof(eh));
+	struct udphdr *uh = (struct udphdr *)(buf + sizeof(eh) + sizeof(*iph));
+	int i, s = -1;
+	int ifindex;
+
+	s = socket(AF_PACKET, SOCK_RAW, IPPROTO_RAW);
+	if (!ASSERT_GE(s, 0, "socket"))
+		goto err;
+
+	ifindex = if_nametoindex("bond1");
+	if (!ASSERT_GT(ifindex, 0, "get bond1 ifindex"))
+		goto err;
+
+	memcpy(buf, &eh, sizeof(eh));
+	iph->ihl = 5;
+	iph->version = 4;
+	iph->tos = 16;
+	iph->id = 1;
+	iph->ttl = 64;
+	iph->protocol = IPPROTO_UDP;
+	iph->saddr = 1;
+	iph->daddr = 2;
+	iph->tot_len = htons(sizeof(buf) - ETH_HLEN);
+	iph->check = 0;
+
+	for (i = 1; i <= NPACKETS; i++) {
+		int n;
+		struct sockaddr_ll saddr_ll = {
+			.sll_ifindex = ifindex,
+			.sll_halen = ETH_ALEN,
+			.sll_addr = BOND2_MAC,
+		};
+
+		/* vary the UDP destination port for even distribution with roundrobin/xor modes */
+		uh->dest++;
+
+		if (vary_dst_ip)
+			iph->daddr++;
+
+		n = sendto(s, buf, sizeof(buf), 0, (struct sockaddr *)&saddr_ll, sizeof(saddr_ll));
+		if (!ASSERT_EQ(n, sizeof(buf), "sendto"))
+			goto err;
+	}
+
+	return 0;
+
+err:
+	if (s >= 0)
+		close(s);
+	return -1;
+}
+
+static void test_xdp_bonding_with_mode(struct skeletons *skeletons, int mode, int xmit_policy)
+{
+	int bond1_rx;
+
+	if (bonding_setup(skeletons, mode, xmit_policy, BOND_BOTH_AND_ATTACH))
+		goto out;
+
+	if (send_udp_packets(xmit_policy != BOND_XMIT_POLICY_LAYER34))
+		goto out;
+
+	bond1_rx = get_rx_packets("bond1");
+	ASSERT_EQ(bond1_rx, NPACKETS, "expected more received packets");
+
+	switch (mode) {
+	case BOND_MODE_ROUNDROBIN:
+	case BOND_MODE_XOR: {
+		int veth1_rx = get_rx_packets("veth1_1");
+		int veth2_rx = get_rx_packets("veth1_2");
+		int diff = abs(veth1_rx - veth2_rx);
+
+		ASSERT_GE(veth1_rx + veth2_rx, NPACKETS, "expected more packets");
+
+		switch (xmit_policy) {
+		case BOND_XMIT_POLICY_LAYER2:
+			ASSERT_GE(diff, NPACKETS,
+				  "expected packets on only one of the interfaces");
+			break;
+		case BOND_XMIT_POLICY_LAYER23:
+		case BOND_XMIT_POLICY_LAYER34:
+			ASSERT_LT(diff, NPACKETS/2,
+				  "expected even distribution of packets");
+			break;
+		default:
+			PRINT_FAIL("Unimplemented xmit_policy=%d\n", xmit_policy);
+			break;
+		}
+		break;
+	}
+	case BOND_MODE_ACTIVEBACKUP: {
+		int veth1_rx = get_rx_packets("veth1_1");
+		int veth2_rx = get_rx_packets("veth1_2");
+		int diff = abs(veth1_rx - veth2_rx);
+
+		ASSERT_GE(diff, NPACKETS,
+			  "expected packets on only one of the interfaces");
+		break;
+	}
+	default:
+		PRINT_FAIL("Unimplemented mode=%d\n", mode);
+		break;
+	}
+
+out:
+	bonding_cleanup(skeletons);
+}
+
+/* Test the broadcast redirection using xdp_redirect_map_multi_prog and adding
+ * all the interfaces to it and checking that broadcasting won't send the packet
+ * to either the ingress bond device (bond2) or its slave (veth2_1).
+ */
+static void test_xdp_bonding_redirect_multi(struct skeletons *skeletons)
+{
+	static const char * const ifaces[] = {"bond2", "veth2_1", "veth2_2"};
+	int veth1_1_rx, veth1_2_rx;
+	int err;
+
+	if (bonding_setup(skeletons, BOND_MODE_ROUNDROBIN, BOND_XMIT_POLICY_LAYER23,
+			  BOND_ONE_NO_ATTACH))
+		goto out;
+
+
+	if (!ASSERT_OK(setns_by_name("ns_dst"), "could not set netns to ns_dst"))
+		goto out;
+
+	/* populate the devmap with the relevant interfaces */
+	for (int i = 0; i < ARRAY_SIZE(ifaces); i++) {
+		int ifindex = if_nametoindex(ifaces[i]);
+		int map_fd = bpf_map__fd(skeletons->xdp_redirect_multi_kern->maps.map_all);
+
+		if (!ASSERT_GT(ifindex, 0, "could not get interface index"))
+			goto out;
+
+		err = bpf_map_update_elem(map_fd, &ifindex, &ifindex, 0);
+		if (!ASSERT_OK(err, "add interface to map_all"))
+			goto out;
+	}
+
+	if (xdp_attach(skeletons,
+		       skeletons->xdp_redirect_multi_kern->progs.xdp_redirect_map_multi_prog,
+		       "bond2"))
+		goto out;
+
+	restore_root_netns();
+
+	if (send_udp_packets(BOND_MODE_ROUNDROBIN))
+		goto out;
+
+	veth1_1_rx = get_rx_packets("veth1_1");
+	veth1_2_rx = get_rx_packets("veth1_2");
+
+	ASSERT_EQ(veth1_1_rx, 0, "expected no packets on veth1_1");
+	ASSERT_GE(veth1_2_rx, NPACKETS, "expected packets on veth1_2");
+
+out:
+	restore_root_netns();
+	bonding_cleanup(skeletons);
+}
+
+/* Test that XDP programs cannot be attached to both the bond master and slaves simultaneously */
+static void test_xdp_bonding_attach(struct skeletons *skeletons)
+{
+	struct bpf_link *link = NULL;
+	struct bpf_link *link2 = NULL;
+	int veth, bond;
+	int err;
+
+	if (!ASSERT_OK(system("ip link add veth type veth"), "add veth"))
+		goto out;
+	if (!ASSERT_OK(system("ip link add bond type bond"), "add bond"))
+		goto out;
+
+	veth = if_nametoindex("veth");
+	if (!ASSERT_GT(veth, 0, "if_nametoindex veth"))
+		goto out;
+	bond = if_nametoindex("bond");
+	if (!ASSERT_GT(bond, 0, "if_nametoindex bond"))
+		goto out;
+
+	/* enslaving with an XDP program loaded fails */
+	link = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, veth);
+	if (!ASSERT_OK_PTR(link, "attach program to veth"))
+		goto out;
+
+	err = system("ip link set veth master bond");
+	if (!ASSERT_NEQ(err, 0, "attaching slave with xdp program expected to fail"))
+		goto out;
+
+	bpf_link__destroy(link);
+	link = NULL;
+
+	err = system("ip link set veth master bond");
+	if (!ASSERT_OK(err, "set veth master"))
+		goto out;
+
+	/* attaching to slave when master has no program is allowed */
+	link = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, veth);
+	if (!ASSERT_OK_PTR(link, "attach program to slave when enslaved"))
+		goto out;
+
+	/* attaching to master not allowed when slave has program loaded */
+	link2 = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, bond);
+	if (!ASSERT_ERR_PTR(link2, "attach program to master when slave has program"))
+		goto out;
+
+	bpf_link__destroy(link);
+	link = NULL;
+
+	/* attaching XDP program to master allowed when slave has no program */
+	link = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, bond);
+	if (!ASSERT_OK_PTR(link, "attach program to master"))
+		goto out;
+
+	/* attaching to slave not allowed when master has program loaded */
+	link2 = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, veth);
+	ASSERT_ERR_PTR(link2, "attach program to slave when master has program");
+
+out:
+	bpf_link__destroy(link);
+	bpf_link__destroy(link2);
+
+	system("ip link del veth");
+	system("ip link del bond");
+}
+
+static int libbpf_debug_print(enum libbpf_print_level level,
+			      const char *format, va_list args)
+{
+	if (level != LIBBPF_WARN)
+		vprintf(format, args);
+	return 0;
+}
+
+struct bond_test_case {
+	char *name;
+	int mode;
+	int xmit_policy;
+};
+
+static struct bond_test_case bond_test_cases[] = {
+	{ "xdp_bonding_roundrobin", BOND_MODE_ROUNDROBIN, BOND_XMIT_POLICY_LAYER23, },
+	{ "xdp_bonding_activebackup", BOND_MODE_ACTIVEBACKUP, BOND_XMIT_POLICY_LAYER23 },
+
+	{ "xdp_bonding_xor_layer2", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER2, },
+	{ "xdp_bonding_xor_layer23", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER23, },
+	{ "xdp_bonding_xor_layer34", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER34, },
+};
+
+void test_xdp_bonding(void)
+{
+	libbpf_print_fn_t old_print_fn;
+	struct skeletons skeletons = {};
+	int i;
+
+	old_print_fn = libbpf_set_print(libbpf_debug_print);
+
+	root_netns_fd = open("/proc/self/ns/net", O_RDONLY);
+	if (!ASSERT_GE(root_netns_fd, 0, "open /proc/self/ns/net"))
+		goto out;
+
+	skeletons.xdp_dummy = xdp_dummy__open_and_load();
+	if (!ASSERT_OK_PTR(skeletons.xdp_dummy, "xdp_dummy__open_and_load"))
+		goto out;
+
+	skeletons.xdp_tx = xdp_tx__open_and_load();
+	if (!ASSERT_OK_PTR(skeletons.xdp_tx, "xdp_tx__open_and_load"))
+		goto out;
+
+	skeletons.xdp_redirect_multi_kern = xdp_redirect_multi_kern__open_and_load();
+	if (!ASSERT_OK_PTR(skeletons.xdp_redirect_multi_kern,
+			   "xdp_redirect_multi_kern__open_and_load"))
+		goto out;
+
+	if (!test__start_subtest("xdp_bonding_attach"))
+		test_xdp_bonding_attach(&skeletons);
+
+	for (i = 0; i < ARRAY_SIZE(bond_test_cases); i++) {
+		struct bond_test_case *test_case = &bond_test_cases[i];
+
+		if (!test__start_subtest(test_case->name))
+			test_xdp_bonding_with_mode(
+				&skeletons,
+				test_case->mode,
+				test_case->xmit_policy);
+	}
+
+	if (!test__start_subtest("xdp_bonding_redirect_multi"))
+		test_xdp_bonding_redirect_multi(&skeletons);
+
+out:
+	xdp_dummy__destroy(skeletons.xdp_dummy);
+	xdp_tx__destroy(skeletons.xdp_tx);
+	xdp_redirect_multi_kern__destroy(skeletons.xdp_redirect_multi_kern);
+
+	libbpf_set_print(old_print_fn);
+	if (root_netns_fd >= 0)
+		close(root_netns_fd);
+}
-- 
2.17.1


* Re: [PATCH bpf-next v4 6/6] selftests/bpf: Add tests for XDP bonding
  2021-07-28 23:43   ` [PATCH bpf-next v4 6/6] selftests/bpf: Add tests for XDP bonding joamaki
@ 2021-08-03  0:19     ` Andrii Nakryiko
  2021-08-03  9:40       ` Jussi Maki
  0 siblings, 1 reply; 71+ messages in thread
From: Andrii Nakryiko @ 2021-08-03  0:19 UTC (permalink / raw)
  To: Jussi Maki
  Cc: bpf, Networking, Daniel Borkmann, j.vosburgh, Andy Gospodarek,
	vfalico, Andrii Nakryiko, Maciej Fijalkowski, Magnus Karlsson

On Mon, Aug 2, 2021 at 6:24 AM <joamaki@gmail.com> wrote:
>
> From: Jussi Maki <joamaki@gmail.com>
>
> Add a test suite to test XDP bonding implementation
> over a pair of veth devices.
>
> Signed-off-by: Jussi Maki <joamaki@gmail.com>
> ---

Was there any reason not to use BPF skeleton in your new tests? And
also bpf_link-based XDP attachment instead of netlink-based?

>  .../selftests/bpf/prog_tests/xdp_bonding.c    | 467 ++++++++++++++++++
>  1 file changed, 467 insertions(+)
>

[...]

* Re: [PATCH bpf-next v4 6/6] selftests/bpf: Add tests for XDP bonding
  2021-08-03  0:19     ` Andrii Nakryiko
@ 2021-08-03  9:40       ` Jussi Maki
  0 siblings, 0 replies; 71+ messages in thread
From: Jussi Maki @ 2021-08-03  9:40 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, Daniel Borkmann, j.vosburgh, Andy Gospodarek,
	vfalico, Andrii Nakryiko, Maciej Fijalkowski, Magnus Karlsson

On Tue, Aug 3, 2021 at 2:19 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Aug 2, 2021 at 6:24 AM <joamaki@gmail.com> wrote:
> >
> > From: Jussi Maki <joamaki@gmail.com>
> >
> > Add a test suite to test XDP bonding implementation
> > over a pair of veth devices.
> >
> > Signed-off-by: Jussi Maki <joamaki@gmail.com>
> > ---
>
> Was there any reason not to use BPF skeleton in your new tests? And
> also bpf_link-based XDP attachment instead of netlink-based?

Not really. I used the existing xdp_redirect_multi test as a basis, and
that test used this approach. I'll have a go at changing this to use
BPF skeletons.

* Re: [PATCH bpf-next v5 7/7] selftests/bpf: Add tests for XDP bonding
  2021-07-30  6:18   ` [PATCH bpf-next v5 7/7] selftests/bpf: Add tests for XDP bonding Jussi Maki
@ 2021-08-04 23:33     ` Andrii Nakryiko
  0 siblings, 0 replies; 71+ messages in thread
From: Andrii Nakryiko @ 2021-08-04 23:33 UTC (permalink / raw)
  To: Jussi Maki
  Cc: bpf, Networking, Daniel Borkmann, j.vosburgh, Andy Gospodarek,
	vfalico, Andrii Nakryiko, Maciej Fijalkowski, Magnus Karlsson

On Wed, Aug 4, 2021 at 5:45 AM Jussi Maki <joamaki@gmail.com> wrote:
>
> Add a test suite to test XDP bonding implementation
> over a pair of veth devices.
>
> Signed-off-by: Jussi Maki <joamaki@gmail.com>
> ---
>  .../selftests/bpf/prog_tests/xdp_bonding.c    | 533 ++++++++++++++++++
>  1 file changed, 533 insertions(+)
>

[...]

> +
> +static int xdp_attach(struct skeletons *skeletons, struct bpf_program *prog, char *iface)
> +{
> +       struct bpf_link *link;
> +       int ifindex;
> +
> +       ifindex = if_nametoindex(iface);
> +       if (!ASSERT_GT(ifindex, 0, "get ifindex"))
> +               return -1;
> +
> +       if (!ASSERT_LE(skeletons->nlinks, MAX_BPF_LINKS, "too many XDP programs attached"))

If nlinks is already equal to MAX_BPF_LINKS, this check still passes, and
then you'll bump nlinks below one more time and write beyond the array
boundaries?
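
A minimal sketch of the boundary condition, not code from the patch:
`struct link_table`, `link_table_add()` and the capacity of 8 are hypothetical
stand-ins for the `links[]`/`nlinks` pair in `struct skeletons`. With a `<=`
guard, the call made when `nlinks == MAX_BPF_LINKS` passes the check and then
stores at index `MAX_BPF_LINKS`, one past the last valid slot. Rejecting the
call once `nlinks >= MAX_BPF_LINKS` keeps every store in bounds.

```c
#include <stddef.h>

#define MAX_BPF_LINKS 8	/* illustrative capacity, not necessarily the selftest's value */

/* Hypothetical stand-in for the links[]/nlinks pair in struct skeletons. */
struct link_table {
	void *links[MAX_BPF_LINKS];
	int nlinks;
};

/* Returns 0 on success, -1 if the table is already full. */
static int link_table_add(struct link_table *t, void *link)
{
	/* Valid indices are 0..MAX_BPF_LINKS-1, so refuse once nlinks reaches the cap. */
	if (t->nlinks >= MAX_BPF_LINKS)
		return -1;

	t->links[t->nlinks++] = link;
	return 0;
}
```

In the selftest itself this would presumably amount to using ASSERT_LT rather
than ASSERT_LE before the store.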

> +               return -1;
> +
> +       link = bpf_program__attach_xdp(prog, ifindex);
> +       if (!ASSERT_OK_PTR(link, "attach xdp program"))
> +               return -1;
> +
> +       skeletons->links[skeletons->nlinks++] = link;
> +       return 0;
> +}
> +

[...]

> +
> +static void bonding_cleanup(struct skeletons *skeletons)
> +{
> +       restore_root_netns();
> +       while (skeletons->nlinks) {
> +               skeletons->nlinks--;
> +               bpf_link__detach(skeletons->links[skeletons->nlinks]);

You want bpf_link__destroy(), not bpf_link__detach() (detach will leave the
underlying BPF link FD open and ensure that bpf_link__destroy() won't
do anything with it beyond freeing memory).

> +       }
> +       ASSERT_OK(system("ip link delete bond1"), "delete bond1");
> +       ASSERT_OK(system("ip link delete veth1_1"), "delete veth1_1");
> +       ASSERT_OK(system("ip link delete veth1_2"), "delete veth1_2");
> +       ASSERT_OK(system("ip netns delete ns_dst"), "delete ns_dst");
> +}
> +

> +out:
> +       bonding_cleanup(skeletons);
> +}
> +
> +

nit: extra line

> +/* Test the broadcast redirection using xdp_redirect_map_multi_prog and adding
> + * all the interfaces to it and checking that broadcasting won't send the packet
> + * to neither the ingress bond device (bond2) or its slave (veth2_1).
> + */
> +void test_xdp_bonding_redirect_multi(struct skeletons *skeletons)
> +{
> +       static const char * const ifaces[] = {"bond2", "veth2_1", "veth2_2"};
> +       int veth1_1_rx, veth1_2_rx;
> +       int err;
> +
> +       if (!test__start_subtest("xdp_bonding_redirect_multi"))
> +               return;
> +
> +       if (bonding_setup(skeletons, BOND_MODE_ROUNDROBIN, BOND_XMIT_POLICY_LAYER23,
> +                         BOND_ONE_NO_ATTACH))
> +               goto out;
> +
> +

nit: another extra empty line, please check if there are more

> +       if (!ASSERT_OK(setns_by_name("ns_dst"), "could not set netns to ns_dst"))
> +               goto out;
> +

[...]

> +       /* enslaving with a XDP program loaded fails */
> +       link = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, veth);
> +       if (!ASSERT_OK_PTR(link, "attach program to veth"))
> +               goto out;
> +
> +       err = system("ip link set veth master bond");
> +       if (!ASSERT_NEQ(err, 0, "attaching slave with xdp program expected to fail"))
> +               goto out;
> +
> +       bpf_link__detach(link);

same here and in a few more places: you need bpf_link__destroy()

> +       link = NULL;
> +
> +       err = system("ip link set veth master bond");
> +       if (!ASSERT_OK(err, "set veth master"))
> +               goto out;
> +
> +       /* attaching to slave when master has no program is allowed */
> +       link = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, veth);
> +       if (!ASSERT_OK_PTR(link, "attach program to slave when enslaved"))
> +               goto out;
> +
> +       /* attaching to master not allowed when slave has program loaded */
> +       link2 = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, bond);
> +       if (!ASSERT_ERR_PTR(link2, "attach program to master when slave has program"))
> +               goto out;
> +
> +       bpf_link__detach(link);
> +       link = NULL;
> +
> +       /* attaching XDP program to master allowed when slave has no program */
> +       link = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, bond);
> +       if (!ASSERT_OK_PTR(link, "attach program to master"))
> +               goto out;
> +
> +       /* attaching to slave not allowed when master has program loaded */
> +       link2 = bpf_program__attach_xdp(skeletons->xdp_dummy->progs.xdp_dummy_prog, bond);
> +       ASSERT_ERR_PTR(link2, "attach program to slave when master has program");
> +
> +out:
> +       if (link)
> +               bpf_link__detach(link);
> +       if (link2)
> +               bpf_link__detach(link2);

bpf_link__destroy() handles NULLs just fine, you don't have to do extra checks

> +
> +       system("ip link del veth");
> +       system("ip link del bond");
> +}
> +
> +static int libbpf_debug_print(enum libbpf_print_level level,
> +                             const char *format, va_list args)
> +{
> +       if (level != LIBBPF_WARN)
> +               vprintf(format, args);
> +       return 0;
> +}
> +
> +struct bond_test_case {
> +       char *name;
> +       int mode;
> +       int xmit_policy;
> +};
> +
> +static struct bond_test_case bond_test_cases[] = {
> +       { "xdp_bonding_roundrobin", BOND_MODE_ROUNDROBIN, BOND_XMIT_POLICY_LAYER23, },
> +       { "xdp_bonding_activebackup", BOND_MODE_ACTIVEBACKUP, BOND_XMIT_POLICY_LAYER23 },
> +
> +       { "xdp_bonding_xor_layer2", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER2, },
> +       { "xdp_bonding_xor_layer23", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER23, },
> +       { "xdp_bonding_xor_layer34", BOND_MODE_XOR, BOND_XMIT_POLICY_LAYER34, },
> +};
> +
> +void test_xdp_bonding(void)

this should be the only non-static function in this file, please fix
all the functions above

> +{
> +       libbpf_print_fn_t old_print_fn;
> +       struct skeletons skeletons = {};
> +       int i;
> +
> +       old_print_fn = libbpf_set_print(libbpf_debug_print);
> +
> +       root_netns_fd = open("/proc/self/ns/net", O_RDONLY);
> +       if (!ASSERT_GE(root_netns_fd, 0, "open /proc/self/ns/net"))
> +               goto out;
> +
> +       skeletons.xdp_dummy = xdp_dummy__open_and_load();
> +       if (!ASSERT_OK_PTR(skeletons.xdp_dummy, "xdp_dummy__open_and_load"))
> +               goto out;
> +
> +       skeletons.xdp_tx = xdp_tx__open_and_load();
> +       if (!ASSERT_OK_PTR(skeletons.xdp_tx, "xdp_tx__open_and_load"))
> +               goto out;
> +
> +       skeletons.xdp_redirect_multi_kern = xdp_redirect_multi_kern__open_and_load();
> +       if (!ASSERT_OK_PTR(skeletons.xdp_redirect_multi_kern,
> +                          "xdp_redirect_multi_kern__open_and_load"))
> +               goto out;
> +
> +       test_xdp_bonding_attach(&skeletons);

check for errors

> +
> +       for (i = 0; i < ARRAY_SIZE(bond_test_cases); i++) {
> +               struct bond_test_case *test_case = &bond_test_cases[i];
> +
> +               test_xdp_bonding_with_mode(
> +                       &skeletons,
> +                       test_case->name,
> +                       test_case->mode,
> +                       test_case->xmit_policy);
> +       }
> +
> +       test_xdp_bonding_redirect_multi(&skeletons);
> +
> +out:
> +       if (skeletons.xdp_dummy)
> +               xdp_dummy__destroy(skeletons.xdp_dummy);
> +       if (skeletons.xdp_tx)
> +               xdp_tx__destroy(skeletons.xdp_tx);
> +       if (skeletons.xdp_redirect_multi_kern)
> +               xdp_redirect_multi_kern__destroy(skeletons.xdp_redirect_multi_kern);

similarly, all libbpf destructors handle NULL and error pointers
cleanly, no need for extra ifs


> +
> +       libbpf_set_print(old_print_fn);
> +       if (root_netns_fd)
> +               close(root_netns_fd);
> +}
> --
> 2.17.1
>

* Re: [PATCH bpf-next v5 6/7] selftests/bpf: Fix xdp_tx.c prog section name
  2021-07-30  6:18   ` [PATCH bpf-next v5 6/7] selftests/bpf: Fix xdp_tx.c prog section name Jussi Maki
@ 2021-08-04 23:35     ` Andrii Nakryiko
  0 siblings, 0 replies; 71+ messages in thread
From: Andrii Nakryiko @ 2021-08-04 23:35 UTC (permalink / raw)
  To: Jussi Maki
  Cc: bpf, Networking, Daniel Borkmann, j.vosburgh, Andy Gospodarek,
	vfalico, Andrii Nakryiko, Maciej Fijalkowski, Magnus Karlsson

On Wed, Aug 4, 2021 at 5:45 AM Jussi Maki <joamaki@gmail.com> wrote:
>
> The program type cannot be deduced from 'tx' which causes an invalid
> argument error when trying to load xdp_tx.o using the skeleton.
> Rename the section name to "xdp/tx" so that libbpf can deduce the type.
>
> Signed-off-by: Jussi Maki <joamaki@gmail.com>
> ---
>  tools/testing/selftests/bpf/progs/xdp_tx.c   | 2 +-
>  tools/testing/selftests/bpf/test_xdp_veth.sh | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/progs/xdp_tx.c b/tools/testing/selftests/bpf/progs/xdp_tx.c
> index 94e6c2b281cb..ece1fbbc0984 100644
> --- a/tools/testing/selftests/bpf/progs/xdp_tx.c
> +++ b/tools/testing/selftests/bpf/progs/xdp_tx.c
> @@ -3,7 +3,7 @@
>  #include <linux/bpf.h>
>  #include <bpf/bpf_helpers.h>
>
> -SEC("tx")
> +SEC("xdp/tx")

please use just SEC("xdp")

>  int xdp_tx(struct xdp_md *xdp)
>  {
>         return XDP_TX;
> diff --git a/tools/testing/selftests/bpf/test_xdp_veth.sh b/tools/testing/selftests/bpf/test_xdp_veth.sh
> index ba8ffcdaac30..c8e0b7d36f56 100755
> --- a/tools/testing/selftests/bpf/test_xdp_veth.sh
> +++ b/tools/testing/selftests/bpf/test_xdp_veth.sh
> @@ -108,7 +108,7 @@ ip link set dev veth2 xdp pinned $BPF_DIR/progs/redirect_map_1
>  ip link set dev veth3 xdp pinned $BPF_DIR/progs/redirect_map_2
>
>  ip -n ns1 link set dev veth11 xdp obj xdp_dummy.o sec xdp_dummy
> -ip -n ns2 link set dev veth22 xdp obj xdp_tx.o sec tx
> +ip -n ns2 link set dev veth22 xdp obj xdp_tx.o sec xdp/tx
>  ip -n ns3 link set dev veth33 xdp obj xdp_dummy.o sec xdp_dummy
>
>  trap cleanup EXIT
> --
> 2.17.1
>

* Re: [PATCH bpf-next v6 7/7] selftests/bpf: Add tests for XDP bonding
  2021-07-31  5:57   ` [PATCH bpf-next v6 7/7] selftests/bpf: Add tests for XDP bonding Jussi Maki
@ 2021-08-06 22:50     ` Andrii Nakryiko
  2021-08-09 14:24       ` Jussi Maki
  0 siblings, 1 reply; 71+ messages in thread
From: Andrii Nakryiko @ 2021-08-06 22:50 UTC (permalink / raw)
  To: Jussi Maki
  Cc: bpf, Networking, Daniel Borkmann, j.vosburgh, Andy Gospodarek,
	vfalico, Andrii Nakryiko, Maciej Fijalkowski, Magnus Karlsson

On Thu, Aug 5, 2021 at 9:10 AM Jussi Maki <joamaki@gmail.com> wrote:
>
> Add a test suite to test XDP bonding implementation
> over a pair of veth devices.
>
> Signed-off-by: Jussi Maki <joamaki@gmail.com>
> ---
>  .../selftests/bpf/prog_tests/xdp_bonding.c    | 520 ++++++++++++++++++
>  1 file changed, 520 insertions(+)
>

I don't pretend to understand what's going on in this selftest, but
it looks good from the generic selftest standpoint. One and a half small
issues below, please double-check (and probably fix the fd close
issue).

Acked-by: Andrii Nakryiko <andrii@kernel.org>


[...]

> +
> +/* Test the broadcast redirection using xdp_redirect_map_multi_prog and adding
> + * all the interfaces to it and checking that broadcasting won't send the packet
> + * to neither the ingress bond device (bond2) or its slave (veth2_1).
> + */
> +static void test_xdp_bonding_redirect_multi(struct skeletons *skeletons)
> +{
> +       static const char * const ifaces[] = {"bond2", "veth2_1", "veth2_2"};
> +       int veth1_1_rx, veth1_2_rx;
> +       int err;
> +
> +       if (bonding_setup(skeletons, BOND_MODE_ROUNDROBIN, BOND_XMIT_POLICY_LAYER23,
> +                         BOND_ONE_NO_ATTACH))
> +               goto out;
> +
> +
> +       if (!ASSERT_OK(setns_by_name("ns_dst"), "could not set netns to ns_dst"))
> +               goto out;
> +
> +       /* populate the devmap with the relevant interfaces */
> +       for (int i = 0; i < ARRAY_SIZE(ifaces); i++) {
> +               int ifindex = if_nametoindex(ifaces[i]);
> +               int map_fd = bpf_map__fd(skeletons->xdp_redirect_multi_kern->maps.map_all);
> +
> +               if (!ASSERT_GT(ifindex, 0, "could not get interface index"))
> +                       goto out;
> +
> +               err = bpf_map_update_elem(map_fd, &ifindex, &ifindex, 0);
> +               if (!ASSERT_OK(err, "add interface to map_all"))
> +                       goto out;
> +       }
> +
> +       if (xdp_attach(skeletons,
> +                      skeletons->xdp_redirect_multi_kern->progs.xdp_redirect_map_multi_prog,
> +                      "bond2"))
> +               goto out;
> +
> +       restore_root_netns();

the "goto out" below might call restore_root_netns() again, is that ok?

> +
> +       if (send_udp_packets(BOND_MODE_ROUNDROBIN))
> +               goto out;
> +
> +       veth1_1_rx = get_rx_packets("veth1_1");
> +       veth1_2_rx = get_rx_packets("veth1_2");
> +
> +       ASSERT_EQ(veth1_1_rx, 0, "expected no packets on veth1_1");
> +       ASSERT_GE(veth1_2_rx, NPACKETS, "expected packets on veth1_2");
> +
> +out:
> +       restore_root_netns();
> +       bonding_cleanup(skeletons);
> +}
> +

[...]

> +
> +void test_xdp_bonding(void)
> +{
> +       libbpf_print_fn_t old_print_fn;
> +       struct skeletons skeletons = {};
> +       int i;
> +
> +       old_print_fn = libbpf_set_print(libbpf_debug_print);
> +
> +       root_netns_fd = open("/proc/self/ns/net", O_RDONLY);
> +       if (!ASSERT_GE(root_netns_fd, 0, "open /proc/self/ns/net"))
> +               goto out;
> +
> +       skeletons.xdp_dummy = xdp_dummy__open_and_load();
> +       if (!ASSERT_OK_PTR(skeletons.xdp_dummy, "xdp_dummy__open_and_load"))
> +               goto out;
> +
> +       skeletons.xdp_tx = xdp_tx__open_and_load();
> +       if (!ASSERT_OK_PTR(skeletons.xdp_tx, "xdp_tx__open_and_load"))
> +               goto out;
> +
> +       skeletons.xdp_redirect_multi_kern = xdp_redirect_multi_kern__open_and_load();
> +       if (!ASSERT_OK_PTR(skeletons.xdp_redirect_multi_kern,
> +                          "xdp_redirect_multi_kern__open_and_load"))
> +               goto out;
> +
> +       if (!test__start_subtest("xdp_bonding_attach"))
> +               test_xdp_bonding_attach(&skeletons);
> +
> +       for (i = 0; i < ARRAY_SIZE(bond_test_cases); i++) {
> +               struct bond_test_case *test_case = &bond_test_cases[i];
> +
> +               if (!test__start_subtest(test_case->name))
> +                       test_xdp_bonding_with_mode(
> +                               &skeletons,
> +                               test_case->mode,
> +                               test_case->xmit_policy);
> +       }
> +
> +       if (!test__start_subtest("xdp_bonding_redirect_multi"))
> +               test_xdp_bonding_redirect_multi(&skeletons);
> +
> +out:
> +       xdp_dummy__destroy(skeletons.xdp_dummy);
> +       xdp_tx__destroy(skeletons.xdp_tx);
> +       xdp_redirect_multi_kern__destroy(skeletons.xdp_redirect_multi_kern);
> +
> +       libbpf_set_print(old_print_fn);
> +       if (root_netns_fd)

technically, fd could be 0, so for fds we have if (fd >= 0)
everywhere. Also, if open() above fails, root_netns_fd will be -1 and
you'll still attempt to close it.
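
A minimal sketch of the convention described above, assuming a hypothetical
`close_fd()` helper that is not part of the selftest: descriptor 0 is a valid
fd, so the guard tests `fd >= 0`, and a failed open() leaves -1, which the
guard skips.

```c
#include <fcntl.h>
#include <unistd.h>

/* Close *fd only if it holds a valid descriptor, then mark it invalid.
 * A plain truth test would skip fd 0 and would try to close -1.
 */
static void close_fd(int *fd)
{
	if (*fd >= 0) {
		close(*fd);
		*fd = -1;
	}
}
```

Resetting the variable to -1 after closing also makes a second call harmless,
which is convenient on shared cleanup paths.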

> +               close(root_netns_fd);
> +}
> --
> 2.17.1
>

* Re: [PATCH bpf-next v6 6/7] selftests/bpf: Fix xdp_tx.c prog section name
  2021-07-31  5:57   ` [PATCH bpf-next v6 6/7] selftests/bpf: Fix xdp_tx.c prog section name Jussi Maki
@ 2021-08-06 22:53     ` Andrii Nakryiko
  0 siblings, 0 replies; 71+ messages in thread
From: Andrii Nakryiko @ 2021-08-06 22:53 UTC (permalink / raw)
  To: Jussi Maki
  Cc: bpf, Networking, Daniel Borkmann, j.vosburgh, Andy Gospodarek,
	vfalico, Andrii Nakryiko, Maciej Fijalkowski, Magnus Karlsson

On Thu, Aug 5, 2021 at 9:10 AM Jussi Maki <joamaki@gmail.com> wrote:
>
> The program type cannot be deduced from 'tx' which causes an invalid
> argument error when trying to load xdp_tx.o using the skeleton.
> Rename the section name to "xdp" so that libbpf can deduce the type.
>
> Signed-off-by: Jussi Maki <joamaki@gmail.com>
> ---

LGTM.

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  tools/testing/selftests/bpf/progs/xdp_tx.c   | 2 +-
>  tools/testing/selftests/bpf/test_xdp_veth.sh | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/progs/xdp_tx.c b/tools/testing/selftests/bpf/progs/xdp_tx.c
> index 94e6c2b281cb..5f725c720e00 100644
> --- a/tools/testing/selftests/bpf/progs/xdp_tx.c
> +++ b/tools/testing/selftests/bpf/progs/xdp_tx.c
> @@ -3,7 +3,7 @@
>  #include <linux/bpf.h>
>  #include <bpf/bpf_helpers.h>
>
> -SEC("tx")
> +SEC("xdp")
>  int xdp_tx(struct xdp_md *xdp)
>  {
>         return XDP_TX;
> diff --git a/tools/testing/selftests/bpf/test_xdp_veth.sh b/tools/testing/selftests/bpf/test_xdp_veth.sh
> index ba8ffcdaac30..995278e684b6 100755
> --- a/tools/testing/selftests/bpf/test_xdp_veth.sh
> +++ b/tools/testing/selftests/bpf/test_xdp_veth.sh
> @@ -108,7 +108,7 @@ ip link set dev veth2 xdp pinned $BPF_DIR/progs/redirect_map_1
>  ip link set dev veth3 xdp pinned $BPF_DIR/progs/redirect_map_2
>
>  ip -n ns1 link set dev veth11 xdp obj xdp_dummy.o sec xdp_dummy
> -ip -n ns2 link set dev veth22 xdp obj xdp_tx.o sec tx
> +ip -n ns2 link set dev veth22 xdp obj xdp_tx.o sec xdp
>  ip -n ns3 link set dev veth33 xdp obj xdp_dummy.o sec xdp_dummy
>
>  trap cleanup EXIT
> --
> 2.17.1
>

* Re: [PATCH bpf-next v6 7/7] selftests/bpf: Add tests for XDP bonding
  2021-08-06 22:50     ` Andrii Nakryiko
@ 2021-08-09 14:24       ` Jussi Maki
  2021-08-09 21:41         ` Daniel Borkmann
  0 siblings, 1 reply; 71+ messages in thread
From: Jussi Maki @ 2021-08-09 14:24 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, Daniel Borkmann, j.vosburgh, Andy Gospodarek,
	vfalico, Andrii Nakryiko, Maciej Fijalkowski, Magnus Karlsson

On Sat, Aug 7, 2021 at 12:50 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Aug 5, 2021 at 9:10 AM Jussi Maki <joamaki@gmail.com> wrote:
> >
> > Add a test suite to test XDP bonding implementation
> > over a pair of veth devices.
> >
> > Signed-off-by: Jussi Maki <joamaki@gmail.com>
> > ---
> >  .../selftests/bpf/prog_tests/xdp_bonding.c    | 520 ++++++++++++++++++
> >  1 file changed, 520 insertions(+)
> >
>
> I don't pretend to understand what's going on in this selftest, but
> it looks good from the generic selftest standpoint. One and a half small
> issues below, please double-check (and probably fix the fd close
> issue).

Thanks for the reviews!

> > +       if (xdp_attach(skeletons,
> > +                      skeletons->xdp_redirect_multi_kern->progs.xdp_redirect_map_multi_prog,
> > +                      "bond2"))
> > +               goto out;
> > +
> > +       restore_root_netns();
>
> the "goto out" below might call restore_root_netns() again, is that ok?

Yep that's fine.

> > +       if (!test__start_subtest("xdp_bonding_redirect_multi"))
> > +               test_xdp_bonding_redirect_multi(&skeletons);
> > +
> > +out:
> > +       xdp_dummy__destroy(skeletons.xdp_dummy);
> > +       xdp_tx__destroy(skeletons.xdp_tx);
> > +       xdp_redirect_multi_kern__destroy(skeletons.xdp_redirect_multi_kern);
> > +
> > +       libbpf_set_print(old_print_fn);
> > +       if (root_netns_fd)
>
> technically, fd could be 0, so for fds we have if (fd >= 0)
> everywhere. Also, if open() above fails, root_netns_fd will be -1 and
> you'll still attempt to close it.

Good catch. Daniel, could you fix this when applying to be "if
(root_netns_fd >= 0)"?

* Re: [PATCH bpf-next v6 7/7] selftests/bpf: Add tests for XDP bonding
  2021-08-09 14:24       ` Jussi Maki
@ 2021-08-09 21:41         ` Daniel Borkmann
  0 siblings, 0 replies; 71+ messages in thread
From: Daniel Borkmann @ 2021-08-09 21:41 UTC (permalink / raw)
  To: Jussi Maki, Andrii Nakryiko
  Cc: bpf, Networking, j.vosburgh, Andy Gospodarek, vfalico,
	Andrii Nakryiko, Maciej Fijalkowski, Magnus Karlsson

On 8/9/21 4:24 PM, Jussi Maki wrote:
[...]
>>> +       if (!test__start_subtest("xdp_bonding_redirect_multi"))
>>> +               test_xdp_bonding_redirect_multi(&skeletons);
>>> +
>>> +out:
>>> +       xdp_dummy__destroy(skeletons.xdp_dummy);
>>> +       xdp_tx__destroy(skeletons.xdp_tx);
>>> +       xdp_redirect_multi_kern__destroy(skeletons.xdp_redirect_multi_kern);
>>> +
>>> +       libbpf_set_print(old_print_fn);
>>> +       if (root_netns_fd)
>>
>> technically, fd could be 0, so for fds we have if (fd >= 0)
>> everywhere. Also, if open() above fails, root_netns_fd will be -1 and
>> you'll still attempt to close it.
> 
> Good catch. Daniel, could you fix this when applying to be "if
> (root_netns_fd >= 0)"?

Yep, done now, I had to rebase due to 220ade77452c ("bonding: 3ad: fix the concurrency
between __bond_release_one() and bond_3ad_state_machine_handler()") which this series
here didn't take into account. Please double check.

Thanks everyone,
Daniel

* Re: [PATCH bpf-next v6 1/7] net: bonding: Refactor bond_xmit_hash for use with xdp_buff
  2021-07-31  5:57   ` [PATCH bpf-next v6 1/7] net: bonding: Refactor bond_xmit_hash for use with xdp_buff Jussi Maki
@ 2021-08-11  1:52     ` Jonathan Toppins
  2021-08-11  8:22       ` Jussi Maki
  0 siblings, 1 reply; 71+ messages in thread
From: Jonathan Toppins @ 2021-08-11  1:52 UTC (permalink / raw)
  To: Jussi Maki, bpf
  Cc: netdev, daniel, j.vosburgh, andy, vfalico, andrii,
	maciej.fijalkowski, magnus.karlsson

On 7/31/21 1:57 AM, Jussi Maki wrote:
> In preparation for adding XDP support to the bonding driver
> refactor the packet hashing functions to be able to work with
> any linear data buffer without an skb.
> 
> Signed-off-by: Jussi Maki <joamaki@gmail.com>
> ---
>   drivers/net/bonding/bond_main.c | 147 +++++++++++++++++++-------------
>   1 file changed, 90 insertions(+), 57 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index d22d78303311..dcec5cc4dab1 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -3611,55 +3611,80 @@ static struct notifier_block bond_netdev_notifier = {
>   
>   /*---------------------------- Hashing Policies -----------------------------*/
>   
> +/* Helper to access data in a packet, with or without a backing skb.
> + * If skb is given the data is linearized if necessary via pskb_may_pull.
> + */
> +static inline const void *bond_pull_data(struct sk_buff *skb,
> +					 const void *data, int hlen, int n)
> +{
> +	if (likely(n <= hlen))
> +		return data;
> +	else if (skb && likely(pskb_may_pull(skb, n)))
> +		return skb->head;
> +
> +	return NULL;
> +}
> +
>   /* L2 hash helper */
> -static inline u32 bond_eth_hash(struct sk_buff *skb)
> +static inline u32 bond_eth_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
>   {
> -	struct ethhdr *ep, hdr_tmp;
> +	struct ethhdr *ep;
>   
> -	ep = skb_header_pointer(skb, 0, sizeof(hdr_tmp), &hdr_tmp);
> -	if (ep)
> -		return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
> -	return 0;
> +	data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
> +	if (!data)
> +		return 0;
> +
> +	ep = (struct ethhdr *)(data + mhoff);
> +	return ep->h_dest[5] ^ ep->h_source[5] ^ ep->h_proto;
>   }
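
To make the skb-less path concrete, here is a user-space sketch under stated
assumptions: `pull_data()`, `eth_hash()` and `struct eth_hdr` are illustrative
stand-ins written for this note, and they cover only the case where no skb is
passed, so there is nothing to linearize. The sketch shows the bounds check
against `hlen` followed by the same XOR of the last destination byte, the last
source byte and the protocol field.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal Ethernet header layout (14 bytes), mirroring struct ethhdr. */
struct eth_hdr {
	uint8_t  h_dest[6];
	uint8_t  h_source[6];
	uint16_t h_proto;	/* kept in network byte order */
};

/* Without an skb, the first n bytes are either inside the linear buffer
 * of length hlen or the access is refused.
 */
static const void *pull_data(const void *data, int hlen, int n)
{
	return n <= hlen ? data : NULL;
}

/* L2 hash over a linear buffer; mhoff is the offset of the MAC header. */
static uint32_t eth_hash(const void *data, int mhoff, int hlen)
{
	struct eth_hdr ep;

	data = pull_data(data, hlen, mhoff + (int)sizeof(ep));
	if (!data)
		return 0;

	/* memcpy sidesteps alignment concerns in user space */
	memcpy(&ep, (const uint8_t *)data + mhoff, sizeof(ep));
	return ep.h_dest[5] ^ ep.h_source[5] ^ ep.h_proto;
}
```

A buffer shorter than `mhoff + 14` bytes hashes to 0 here, matching the early
return in the patch when the header cannot be reached.
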
>   
> -static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk,
> -			 int *noff, int *proto, bool l34)
> +static bool bond_flow_ip(struct sk_buff *skb, struct flow_keys *fk, const void *data,
> +			 int hlen, __be16 l2_proto, int *nhoff, int *ip_proto, bool l34)
>   {
>   	const struct ipv6hdr *iph6;
>   	const struct iphdr *iph;
>   
> -	if (skb->protocol == htons(ETH_P_IP)) {
> -		if (unlikely(!pskb_may_pull(skb, *noff + sizeof(*iph))))
> +	if (l2_proto == htons(ETH_P_IP)) {
> +		data = bond_pull_data(skb, data, hlen, *nhoff + sizeof(*iph));
> +		if (!data)
>   			return false;
> -		iph = (const struct iphdr *)(skb->data + *noff);
> +
> +		iph = (const struct iphdr *)(data + *nhoff);
>   		iph_to_flow_copy_v4addrs(fk, iph);
> -		*noff += iph->ihl << 2;
> +		*nhoff += iph->ihl << 2;
>   		if (!ip_is_fragment(iph))
> -			*proto = iph->protocol;
> -	} else if (skb->protocol == htons(ETH_P_IPV6)) {
> -		if (unlikely(!pskb_may_pull(skb, *noff + sizeof(*iph6))))
> +			*ip_proto = iph->protocol;
> +	} else if (l2_proto == htons(ETH_P_IPV6)) {
> +		data = bond_pull_data(skb, data, hlen, *nhoff + sizeof(*iph6));
> +		if (!data)
>   			return false;
> -		iph6 = (const struct ipv6hdr *)(skb->data + *noff);
> +
> +		iph6 = (const struct ipv6hdr *)(data + *nhoff);
>   		iph_to_flow_copy_v6addrs(fk, iph6);
> -		*noff += sizeof(*iph6);
> -		*proto = iph6->nexthdr;
> +		*nhoff += sizeof(*iph6);
> +		*ip_proto = iph6->nexthdr;
>   	} else {
>   		return false;
>   	}
>   
> -	if (l34 && *proto >= 0)
> -		fk->ports.ports = skb_flow_get_ports(skb, *noff, *proto);
> +	if (l34 && *ip_proto >= 0)
> +		fk->ports.ports = __skb_flow_get_ports(skb, *nhoff, *ip_proto, data, hlen);
>   
>   	return true;
>   }
>   
> -static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
> +static u32 bond_vlan_srcmac_hash(struct sk_buff *skb, const void *data, int mhoff, int hlen)
>   {
> -	struct ethhdr *mac_hdr = (struct ethhdr *)skb_mac_header(skb);
> +	struct ethhdr *mac_hdr;
>   	u32 srcmac_vendor = 0, srcmac_dev = 0;
>   	u16 vlan;
>   	int i;
>   
> +	data = bond_pull_data(skb, data, hlen, mhoff + sizeof(struct ethhdr));
> +	if (!data)
> +		return 0;
> +	mac_hdr = (struct ethhdr *)(data + mhoff);

The XDP changes are not introduced in this patch, but this function looks
the same in the later patches of the series. Assuming an xdp_buff reaches
this point, skb == NULL in the XDP call path, so how is a NULL dereference
avoided when skb is dereferenced later in the function?

Specifically, in this section:
...
	if (!skb_vlan_tag_present(skb))
		return srcmac_vendor ^ srcmac_dev;

	vlan = skb_vlan_tag_get(skb);
...

referencing net-next/master id: d1a4e0a9576fd2b29a0d13b306a9f52440908ab4
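
For illustration, the hazard and one possible guard can be sketched in
userspace C. This is only a minimal sketch under stated assumptions, not
the kernel code and not necessarily the fix the series ends up with:
mock_skb, its fields, and mock_vlan_srcmac_hash() are hypothetical
stand-ins for sk_buff, skb_vlan_tag_present()/skb_vlan_tag_get(), and
bond_vlan_srcmac_hash().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_ETH_ALEN 6   /* bytes in a MAC address */
#define MOCK_ETH_HLEN 14  /* dst MAC + src MAC + ethertype */

/* Hypothetical stand-in for the VLAN metadata carried by an sk_buff. */
struct mock_skb {
	int vlan_present;   /* plays the role of skb_vlan_tag_present() */
	uint16_t vlan_tci;  /* plays the role of skb_vlan_tag_get() */
};

/* Hash the source MAC and, only when an skb is available, the VLAN tag. */
static uint32_t mock_vlan_srcmac_hash(const struct mock_skb *skb,
				      const void *data, int mhoff, int hlen)
{
	const uint8_t *src;
	uint32_t srcmac_vendor = 0, srcmac_dev = 0;
	int i;

	assert(data != NULL);

	/* The Ethernet header must lie inside the linear area. */
	if (mhoff < 0 || mhoff + MOCK_ETH_HLEN > hlen)
		return 0;

	/* Source MAC follows the 6-byte destination MAC. */
	src = (const uint8_t *)data + mhoff + MOCK_ETH_ALEN;

	for (i = 0; i < 3; i++)
		srcmac_vendor = (srcmac_vendor << 8) | src[i];
	for (i = 3; i < MOCK_ETH_ALEN; i++)
		srcmac_dev = (srcmac_dev << 8) | src[i];

	/* Assumed guard: skb == NULL (XDP path) means no VLAN metadata,
	 * so fall back to the MAC-only hash instead of dereferencing skb.
	 */
	if (!skb || !skb->vlan_present)
		return srcmac_vendor ^ srcmac_dev;

	return skb->vlan_tci ^ srcmac_vendor ^ srcmac_dev;
}
```

With this shape the XDP path silently loses the VLAN contribution to the
hash, which may or may not be acceptable; refusing the policy outright is
the other option.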


> +
>   	for (i = 0; i < 3; i++)
>   		srcmac_vendor = (srcmac_vendor << 8) | mac_hdr->h_source[i];
>   
> @@ -3675,26 +3700,25 @@ static u32 bond_vlan_srcmac_hash(struct sk_buff *skb)
>   }
>   
>   /* Extract the appropriate headers based on bond's xmit policy */
> -static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
> -			      struct flow_keys *fk)
> +static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb, const void *data,
> +			      __be16 l2_proto, int nhoff, int hlen, struct flow_keys *fk)
>   {
>   	bool l34 = bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34;
> -	int noff, proto = -1;
> +	int ip_proto = -1;
>   
>   	switch (bond->params.xmit_policy) {
>   	case BOND_XMIT_POLICY_ENCAP23:
>   	case BOND_XMIT_POLICY_ENCAP34:
>   		memset(fk, 0, sizeof(*fk));
>   		return __skb_flow_dissect(NULL, skb, &flow_keys_bonding,
> -					  fk, NULL, 0, 0, 0, 0);
> +					  fk, data, l2_proto, nhoff, hlen, 0);
>   	default:
>   		break;
>   	}
>   
>   	fk->ports.ports = 0;
>   	memset(&fk->icmp, 0, sizeof(fk->icmp));
> -	noff = skb_network_offset(skb);
> -	if (!bond_flow_ip(skb, fk, &noff, &proto, l34))
> +	if (!bond_flow_ip(skb, fk, data, hlen, l2_proto, &nhoff, &ip_proto, l34))
>   		return false;
>   
>   	/* ICMP error packets contains at least 8 bytes of the header
> @@ -3702,22 +3726,20 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
>   	 * to correlate ICMP error packets within the same flow which
>   	 * generated the error.
>   	 */
> -	if (proto == IPPROTO_ICMP || proto == IPPROTO_ICMPV6) {
> -		skb_flow_get_icmp_tci(skb, &fk->icmp, skb->data,
> -				      skb_transport_offset(skb),
> -				      skb_headlen(skb));
> -		if (proto == IPPROTO_ICMP) {
> +	if (ip_proto == IPPROTO_ICMP || ip_proto == IPPROTO_ICMPV6) {
> +		skb_flow_get_icmp_tci(skb, &fk->icmp, data, nhoff, hlen);
> +		if (ip_proto == IPPROTO_ICMP) {
>   			if (!icmp_is_err(fk->icmp.type))
>   				return true;
>   
> -			noff += sizeof(struct icmphdr);
> -		} else if (proto == IPPROTO_ICMPV6) {
> +			nhoff += sizeof(struct icmphdr);
> +		} else if (ip_proto == IPPROTO_ICMPV6) {
>   			if (!icmpv6_is_err(fk->icmp.type))
>   				return true;
>   
> -			noff += sizeof(struct icmp6hdr);
> +			nhoff += sizeof(struct icmp6hdr);
>   		}
> -		return bond_flow_ip(skb, fk, &noff, &proto, l34);
> +		return bond_flow_ip(skb, fk, data, hlen, l2_proto, &nhoff, &ip_proto, l34);
>   	}
>   
>   	return true;
> @@ -3733,33 +3755,26 @@ static u32 bond_ip_hash(u32 hash, struct flow_keys *flow)
>   	return hash >> 1;
>   }
>   
> -/**
> - * bond_xmit_hash - generate a hash value based on the xmit policy
> - * @bond: bonding device
> - * @skb: buffer to use for headers
> - *
> - * This function will extract the necessary headers from the skb buffer and use
> - * them to generate a hash based on the xmit_policy set in the bonding device
> +/* Generate hash based on xmit policy. If @skb is given it is used to linearize
> + * the data as required, but this function can be used without it if the data is
> + * known to be linear (e.g. with xdp_buff).
>    */
> -u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
> +static u32 __bond_xmit_hash(struct bonding *bond, struct sk_buff *skb, const void *data,
> +			    __be16 l2_proto, int mhoff, int nhoff, int hlen)
>   {
>   	struct flow_keys flow;
>   	u32 hash;
>   
> -	if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
> -	    skb->l4_hash)
> -		return skb->hash;
> -
>   	if (bond->params.xmit_policy == BOND_XMIT_POLICY_VLAN_SRCMAC)
> -		return bond_vlan_srcmac_hash(skb);
> +		return bond_vlan_srcmac_hash(skb, data, mhoff, hlen);
>   
>   	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER2 ||
> -	    !bond_flow_dissect(bond, skb, &flow))
> -		return bond_eth_hash(skb);
> +	    !bond_flow_dissect(bond, skb, data, l2_proto, nhoff, hlen, &flow))
> +		return bond_eth_hash(skb, data, mhoff, hlen);
>   
>   	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
>   	    bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23) {
> -		hash = bond_eth_hash(skb);
> +		hash = bond_eth_hash(skb, data, mhoff, hlen);
>   	} else {
>   		if (flow.icmp.id)
>   			memcpy(&hash, &flow.icmp, sizeof(hash));
> @@ -3770,6 +3785,25 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
>   	return bond_ip_hash(hash, &flow);
>   }
>   
> +/**
> + * bond_xmit_hash - generate a hash value based on the xmit policy
> + * @bond: bonding device
> + * @skb: buffer to use for headers
> + *
> + * This function will extract the necessary headers from the skb buffer and use
> + * them to generate a hash based on the xmit_policy set in the bonding device
> + */
> +u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
> +{
> +	if (bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP34 &&
> +	    skb->l4_hash)
> +		return skb->hash;
> +
> +	return __bond_xmit_hash(bond, skb, skb->head, skb->protocol,
> +				skb->mac_header, skb->network_header,
> +				skb_headlen(skb));
> +}
> +
>   /*-------------------------- Device entry points ----------------------------*/
>   
>   void bond_work_init_all(struct bonding *bond)
> @@ -4399,8 +4433,7 @@ static netdev_tx_t bond_xmit_roundrobin(struct sk_buff *skb,
>   	return bond_tx_drop(bond_dev, skb);
>   }
>   
> -static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond,
> -						      struct sk_buff *skb)
> +static struct slave *bond_xmit_activebackup_slave_get(struct bonding *bond)
>   {
>   	return rcu_dereference(bond->curr_active_slave);
>   }
> @@ -4414,7 +4447,7 @@ static netdev_tx_t bond_xmit_activebackup(struct sk_buff *skb,
>   	struct bonding *bond = netdev_priv(bond_dev);
>   	struct slave *slave;
>   
> -	slave = bond_xmit_activebackup_slave_get(bond, skb);
> +	slave = bond_xmit_activebackup_slave_get(bond);
>   	if (slave)
>   		return bond_dev_queue_xmit(bond, skb, slave->dev);
>   
> @@ -4712,7 +4745,7 @@ static struct net_device *bond_xmit_get_slave(struct net_device *master_dev,
>   		slave = bond_xmit_roundrobin_slave_get(bond, skb);
>   		break;
>   	case BOND_MODE_ACTIVEBACKUP:
> -		slave = bond_xmit_activebackup_slave_get(bond, skb);
> +		slave = bond_xmit_activebackup_slave_get(bond);
>   		break;
>   	case BOND_MODE_8023AD:
>   	case BOND_MODE_XOR:
> 



* Re: [PATCH bpf-next v6 1/7] net: bonding: Refactor bond_xmit_hash for use with xdp_buff
  2021-08-11  1:52     ` Jonathan Toppins
@ 2021-08-11  8:22       ` Jussi Maki
  2021-08-11 14:05         ` Jonathan Toppins
  0 siblings, 1 reply; 71+ messages in thread
From: Jussi Maki @ 2021-08-11  8:22 UTC (permalink / raw)
  To: Jonathan Toppins
  Cc: bpf, Network Development, Daniel Borkmann, j.vosburgh,
	Andy Gospodarek, vfalico, Andrii Nakryiko, Maciej Fijalkowski,
	Karlsson, Magnus

Hi Jonathan,

Thanks for catching this. You're right, this will NULL deref if XDP
bonding is used with the VLAN_SRCMAC xmit policy. I think a very early
version restricted the xmit policies that were applicable, but that
restriction got dropped when this was refactored. I'll look into this
today and will either add support for the VLAN_SRCMAC xmit policy or
refuse it, and extend the tests to cover this.
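
The "refuse" option could look roughly like the userspace sketch below.
It is an assumption-laden illustration only: the enum, both function
names, and the choice of -EOPNOTSUPP are hypothetical and are not taken
from the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mirror of the xmit policies named in this thread. */
enum mock_xmit_policy {
	MOCK_POLICY_LAYER2,
	MOCK_POLICY_LAYER34,
	MOCK_POLICY_LAYER23,
	MOCK_POLICY_ENCAP23,
	MOCK_POLICY_ENCAP34,
	MOCK_POLICY_VLAN_SRCMAC,
};

/* VLAN_SRCMAC reads VLAN metadata from the skb, which the XDP path
 * does not have, so it is the one policy treated as unsupported here.
 */
static bool mock_xdp_policy_supported(enum mock_xmit_policy policy)
{
	assert(policy <= MOCK_POLICY_VLAN_SRCMAC);
	return policy != MOCK_POLICY_VLAN_SRCMAC;
}

/* Returns 0 if the configuration is acceptable, a negative errno if an
 * XDP program is (or would be) attached with an unsupported policy.
 */
static int mock_xdp_attach_check(enum mock_xmit_policy policy, bool has_prog)
{
	if (has_prog && !mock_xdp_policy_supported(policy))
		return -EOPNOTSUPP;
	return 0;
}
```

The same check would presumably be needed in both directions: when
attaching a program to a bond that already uses the policy, and when
switching the policy on a bond that already has a program attached.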



* Re: [PATCH bpf-next v6 1/7] net: bonding: Refactor bond_xmit_hash for use with xdp_buff
  2021-08-11  8:22       ` Jussi Maki
@ 2021-08-11 14:05         ` Jonathan Toppins
  2021-08-16  9:05           ` Jussi Maki
  0 siblings, 1 reply; 71+ messages in thread
From: Jonathan Toppins @ 2021-08-11 14:05 UTC (permalink / raw)
  To: Jussi Maki
  Cc: bpf, Network Development, Daniel Borkmann, j.vosburgh,
	Andy Gospodarek, vfalico, Andrii Nakryiko, Maciej Fijalkowski,
	Karlsson, Magnus

On 8/11/21 4:22 AM, Jussi Maki wrote:
> Hi Jonathan,
> 
> Thanks for catching this. You're right, this will NULL deref if XDP
> bonding is used with the VLAN_SRCMAC xmit policy. I think what
> happened was that a very early version restricted the xmit policies
> that were applicable, but it got dropped when this was refactored.
> I'll look into this today and will add in support (or refuse) the
> VLAN_SRCMAC xmit policy and extend the tests to cover this.

To support some customer requests, and to stop adding more and more
hashing policies, I was looking at adding a custom policy that exposes a
bitfield so userspace can select which header fields are included in the
hash. I was considering a flow dissector implementation that parses the
packet and then generates the hash from the dissected flow data. It
looks like the outer hashing functions as they exist now,
bond_xmit_hash() and bond_xmit_hash_xdp(), could each make the correctly
formatted call to __skb_flow_dissect(). We would then pass around the
resulting struct flow_keys (or a bonding-specific one that adds MAC
header parsing), so the actual hashing functions would not need to know
whether they are hashing an sk_buff or an xdp_buff. What do you think?

-Jon



* Re: [PATCH bpf-next v6 1/7] net: bonding: Refactor bond_xmit_hash for use with xdp_buff
  2021-08-11 14:05         ` Jonathan Toppins
@ 2021-08-16  9:05           ` Jussi Maki
  0 siblings, 0 replies; 71+ messages in thread
From: Jussi Maki @ 2021-08-16  9:05 UTC (permalink / raw)
  To: Jonathan Toppins, jiri
  Cc: bpf, Network Development, Daniel Borkmann, j.vosburgh,
	Andy Gospodarek, vfalico, Andrii Nakryiko, Maciej Fijalkowski,
	Karlsson, Magnus

On Wed, Aug 11, 2021 at 4:05 PM Jonathan Toppins <jtoppins@redhat.com> wrote:
>
> On 8/11/21 4:22 AM, Jussi Maki wrote:
> > Hi Jonathan,
> >
> > Thanks for catching this. You're right, this will NULL deref if XDP
> > bonding is used with the VLAN_SRCMAC xmit policy. I think what
> > happened was that a very early version restricted the xmit policies
> > that were applicable, but it got dropped when this was refactored.
> > I'll look into this today and will add in support (or refuse) the
> > VLAN_SRCMAC xmit policy and extend the tests to cover this.
>
> In support of some customer requests and to stop adding more and more
> hashing policies I was looking at adding a custom policy that exposes a
> bitfield so userspace can select which header items should be included
> in the hash. I was looking at a flow dissector implementation to parse
> the packet and then generate the hash from the flow data pulled. It
> looks like the outer hashing functions as they exist now,
> bond_xmit_hash() and bond_xmit_hash_xdp(), could make the correctly
> formatted call to __skb_flow_dissect(). We would then pass around the
> resultant struct flow_keys, or bonding specific one to add MAC header
> parsing support, and it appears we could avoid making the actual hashing
> functions know if they need to hash an sk_buff vs xdp_buff. What do you
> think?

That sounds great! I wasn't particularly happy with skb being optional,
as that was just waiting to break (as it did). The team driver does its
hashing with a BPF program provided by user space, and I'm trying to
figure out how to support XDP with it. I wonder if we could have a
single approach that would work for both bonding and team (e.g. use BPF
to hash). CC'ing Jiri as he wrote the team driver.


Thread overview: 71+ messages
2021-06-09 13:55 [PATCH bpf-next 0/3] XDP bonding support Jussi Maki
2021-06-09 13:55 ` [PATCH bpf-next 1/3] net: bonding: Add XDP support to the bonding driver Jussi Maki
2021-06-09 22:29   ` Maciej Fijalkowski
2021-06-09 23:29   ` Jay Vosburgh
2021-06-14  8:02     ` Jussi Maki
2021-06-17  3:40   ` kernel test robot
2021-06-17  6:35   ` kernel test robot
2021-06-22  7:24   ` kernel test robot
2021-06-09 13:55 ` [PATCH bpf-next 2/3] net: bonding: Use per-cpu rr_tx_counter Jussi Maki
2021-06-10  0:04   ` Jay Vosburgh
2021-06-14  7:54     ` Jussi Maki
2021-06-09 13:55 ` [PATCH bpf-next 3/3] selftests/bpf: Add tests for XDP bonding Jussi Maki
2021-06-09 22:07   ` Maciej Fijalkowski
2021-06-14  8:08     ` Jussi Maki
2021-06-14  8:48       ` Magnus Karlsson
2021-06-14 12:20         ` Jussi Maki
2021-06-10 17:24 ` [PATCH bpf-next 0/3] XDP bonding support Andrii Nakryiko
2021-06-14 12:25   ` Jussi Maki
2021-06-14 15:37     ` Jay Vosburgh
2021-06-15  5:34     ` Andrii Nakryiko
2021-06-24  9:18 ` [PATCH bpf-next v2 0/4] " joamaki
2021-06-24  9:18   ` [PATCH bpf-next v2 1/4] net: bonding: Refactor bond_xmit_hash for use with xdp_buff joamaki
2021-06-24  9:18   ` [PATCH bpf-next v2 2/4] net: core: Add support for XDP redirection to slave device joamaki
2021-06-24  9:18   ` [PATCH bpf-next v2 3/4] net: bonding: Add XDP support to the bonding driver joamaki
2021-06-24  9:18   ` [PATCH bpf-next v2 4/4] devmap: Exclude XDP broadcast to master device joamaki
2021-07-01 18:12     ` Jay Vosburgh
2021-07-05 11:44       ` Jussi Maki
2021-07-01 18:20   ` [PATCH bpf-next v2 0/4] XDP bonding support Jay Vosburgh
2021-07-05 10:32     ` Jussi Maki
2021-07-07 11:25 ` [PATCH bpf-next v3 0/5] " Jussi Maki
2021-07-07 11:25   ` [PATCH bpf-next v3 1/5] net: bonding: Refactor bond_xmit_hash for use with xdp_buff Jussi Maki
2021-07-07 11:25   ` [PATCH bpf-next v3 2/5] net: core: Add support for XDP redirection to slave device Jussi Maki
2021-07-07 11:25   ` [PATCH bpf-next v3 3/5] net: bonding: Add XDP support to the bonding driver Jussi Maki
2021-07-13  7:14     ` kernel test robot
2021-07-07 11:25   ` [PATCH bpf-next v3 4/5] devmap: Exclude XDP broadcast to master device Jussi Maki
2021-07-07 11:25   ` [PATCH bpf-next v3 5/5] net: core: Allow netdev_lower_get_next_private_rcu in bh context Jussi Maki
2021-07-28 23:43 ` [PATCH bpf-next v4 0/6] XDP bonding support joamaki
2021-07-28 23:43   ` [PATCH bpf-next v4 1/6] net: bonding: Refactor bond_xmit_hash for use with xdp_buff joamaki
2021-07-28 23:43   ` [PATCH bpf-next v4 2/6] net: core: Add support for XDP redirection to slave device joamaki
2021-07-28 23:43   ` [PATCH bpf-next v4 3/6] net: bonding: Add XDP support to the bonding driver joamaki
2021-07-28 23:43   ` [PATCH bpf-next v4 4/6] devmap: Exclude XDP broadcast to master device joamaki
2021-07-28 23:43   ` [PATCH bpf-next v4 5/6] net: core: Allow netdev_lower_get_next_private_rcu in bh context joamaki
2021-07-28 23:43   ` [PATCH bpf-next v4 6/6] selftests/bpf: Add tests for XDP bonding joamaki
2021-08-03  0:19     ` Andrii Nakryiko
2021-08-03  9:40       ` Jussi Maki
2021-07-30  6:18 ` [PATCH bpf-next v5 0/7] XDP bonding support Jussi Maki
2021-07-30  6:18   ` [PATCH bpf-next v5 1/7] net: bonding: Refactor bond_xmit_hash for use with xdp_buff Jussi Maki
2021-07-30  6:18   ` [PATCH bpf-next v5 2/7] net: core: Add support for XDP redirection to slave device Jussi Maki
2021-07-30  6:18   ` [PATCH bpf-next v5 3/7] net: bonding: Add XDP support to the bonding driver Jussi Maki
2021-07-30  6:18   ` [PATCH bpf-next v5 4/7] devmap: Exclude XDP broadcast to master device Jussi Maki
2021-07-30  6:18   ` [PATCH bpf-next v5 5/7] net: core: Allow netdev_lower_get_next_private_rcu in bh context Jussi Maki
2021-07-30  6:18   ` [PATCH bpf-next v5 6/7] selftests/bpf: Fix xdp_tx.c prog section name Jussi Maki
2021-08-04 23:35     ` Andrii Nakryiko
2021-07-30  6:18   ` [PATCH bpf-next v5 7/7] selftests/bpf: Add tests for XDP bonding Jussi Maki
2021-08-04 23:33     ` Andrii Nakryiko
2021-07-31  5:57 ` [PATCH bpf-next v6 0/7]: XDP bonding support Jussi Maki
2021-07-31  5:57   ` [PATCH bpf-next v6 1/7] net: bonding: Refactor bond_xmit_hash for use with xdp_buff Jussi Maki
2021-08-11  1:52     ` Jonathan Toppins
2021-08-11  8:22       ` Jussi Maki
2021-08-11 14:05         ` Jonathan Toppins
2021-08-16  9:05           ` Jussi Maki
2021-07-31  5:57   ` [PATCH bpf-next v6 2/7] net: core: Add support for XDP redirection to slave device Jussi Maki
2021-07-31  5:57   ` [PATCH bpf-next v6 3/7] net: bonding: Add XDP support to the bonding driver Jussi Maki
2021-07-31  5:57   ` [PATCH bpf-next v6 4/7] devmap: Exclude XDP broadcast to master device Jussi Maki
2021-07-31  5:57   ` [PATCH bpf-next v6 5/7] net: core: Allow netdev_lower_get_next_private_rcu in bh context Jussi Maki
2021-07-31  5:57   ` [PATCH bpf-next v6 6/7] selftests/bpf: Fix xdp_tx.c prog section name Jussi Maki
2021-08-06 22:53     ` Andrii Nakryiko
2021-07-31  5:57   ` [PATCH bpf-next v6 7/7] selftests/bpf: Add tests for XDP bonding Jussi Maki
2021-08-06 22:50     ` Andrii Nakryiko
2021-08-09 14:24       ` Jussi Maki
2021-08-09 21:41         ` Daniel Borkmann
