netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf 0/2] bpf: Rework bpf_redirect_neigh() to allow supplying nexthop from caller
@ 2020-10-19 16:04 Toke Høiland-Jørgensen
  2020-10-19 16:04 ` [PATCH bpf 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter Toke Høiland-Jørgensen
  2020-10-19 16:04 ` [PATCH bpf 2/2] selftests: Update test_tc_redirect.sh to use the modified bpf_redirect_neigh() Toke Høiland-Jørgensen
  0 siblings, 2 replies; 9+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-19 16:04 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: David Ahern, netdev, bpf

Based on previous discussion[0], we determined that it would be beneficial to
rework bpf_redirect_neigh() so the caller can supply the nexthop information
(e.g., from a previous call to bpf_fib_lookup()). This way, the two helpers can
be combined without incurring a second FIB lookup to find the nexthop, and
bpf_fib_lookup() becomes usable even if no nexthop entry currently exists.

This patch (and accompanying selftest update) accomplishes this by way of an
optional paramter to bpf_redirect_neigh(). This series is against the -bpf tree,
since we need to change this call signature before it becomes API.

[0] https://lore.kernel.org/bpf/393e17fc-d187-3a8d-2f0d-a627c7c63fca@iogearbox.net/

Changelog:

v1:
- Rebase on -bpf tree
- Fix compilation with INET/INET6 disabled (kbot)
- Keep v4/v6 signatures similar, use internal flag (Daniel)
- Use a separate selftest BPF program instead of modifying existing one (Daniel)
- Fix a few style nits (David Ahern)

---

Toke Høiland-Jørgensen (2):
      bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
      selftests: Update test_tc_redirect.sh to use the modified bpf_redirect_neigh()


 .../selftests/bpf/progs/test_tc_neigh.c       |   5 +-
 .../selftests/bpf/progs/test_tc_neigh_fib.c   | 142 ++++++++++++++++++
 .../testing/selftests/bpf/test_tc_redirect.sh |  27 +++-
 3 files changed, 169 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_tc_neigh_fib.c


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH bpf 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
  2020-10-19 16:04 [PATCH bpf 0/2] bpf: Rework bpf_redirect_neigh() to allow supplying nexthop from caller Toke Høiland-Jørgensen
@ 2020-10-19 16:04 ` Toke Høiland-Jørgensen
  2020-10-19 18:23   ` Daniel Borkmann
                     ` (2 more replies)
  2020-10-19 16:04 ` [PATCH bpf 2/2] selftests: Update test_tc_redirect.sh to use the modified bpf_redirect_neigh() Toke Høiland-Jørgensen
  1 sibling, 3 replies; 9+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-19 16:04 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: David Ahern, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

Based on the discussion in [0], update the bpf_redirect_neigh() helper to
accept an optional parameter specifying the nexthop information. This makes
it possible to combine bpf_fib_lookup() and bpf_redirect_neigh() without
incurring a duplicate FIB lookup - since the FIB lookup helper will return
the nexthop information even if no neighbour is present, this can simply be
passed on to bpf_redirect_neigh() if bpf_fib_lookup() returns
BPF_FIB_LKUP_RET_NO_NEIGH.

[0] https://lore.kernel.org/bpf/393e17fc-d187-3a8d-2f0d-a627c7c63fca@iogearbox.net/

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/filter.h         |    9 ++
 include/uapi/linux/bpf.h       |   22 +++++-
 net/core/filter.c              |  159 +++++++++++++++++++++++++---------------
 scripts/bpf_helpers_doc.py     |    1 
 tools/include/uapi/linux/bpf.h |   22 +++++-
 5 files changed, 145 insertions(+), 68 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 20fc24c9779a..ba9de7188cd0 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -607,12 +607,21 @@ struct bpf_skb_data_end {
 	void *data_end;
 };
 
+struct bpf_nh_params {
+	u8 nh_family;
+	union {
+		__u32 ipv4_nh;
+		struct in6_addr ipv6_nh;
+	};
+};
+
 struct bpf_redirect_info {
 	u32 flags;
 	u32 tgt_index;
 	void *tgt_value;
 	struct bpf_map *map;
 	u32 kern_flags;
+	struct bpf_nh_params nh;
 };
 
 DECLARE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index bf5a99d803e4..82fd73ac1a13 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3677,15 +3677,19 @@ union bpf_attr {
  * 	Return
  * 		The id is returned or 0 in case the id could not be retrieved.
  *
- * long bpf_redirect_neigh(u32 ifindex, u64 flags)
+ * long bpf_redirect_neigh(u32 ifindex, struct bpf_redir_neigh *params, int plen, u64 flags)
  * 	Description
  * 		Redirect the packet to another net device of index *ifindex*
  * 		and fill in L2 addresses from neighboring subsystem. This helper
  * 		is somewhat similar to **bpf_redirect**\ (), except that it
  * 		populates L2 addresses as well, meaning, internally, the helper
- * 		performs a FIB lookup based on the skb's networking header to
- * 		get the address of the next hop and then relies on the neighbor
- * 		lookup for the L2 address of the nexthop.
+ * 		relies on the neighbor lookup for the L2 address of the nexthop.
+ *
+ * 		The helper will perform a FIB lookup based on the skb's
+ * 		networking header to get the address of the next hop, unless
+ * 		this is supplied by the caller in the *params* argument. The
+ * 		*plen* argument indicates the len of *params* and should be set
+ * 		to 0 if *params* is NULL.
  *
  * 		The *flags* argument is reserved and must be 0. The helper is
  * 		currently only supported for tc BPF program types, and enabled
@@ -4906,6 +4910,16 @@ struct bpf_fib_lookup {
 	__u8	dmac[6];     /* ETH_ALEN */
 };
 
+struct bpf_redir_neigh {
+	/* network family for lookup (AF_INET, AF_INET6) */
+	__u8	nh_family;
+	/* network address of nexthop; skips fib lookup to find gateway */
+	union {
+		__be32		ipv4_nh;
+		__u32		ipv6_nh[4];  /* in6_addr; network order */
+	};
+};
+
 enum bpf_task_fd_type {
 	BPF_FD_TYPE_RAW_TRACEPOINT,	/* tp name */
 	BPF_FD_TYPE_TRACEPOINT,		/* tp name */
diff --git a/net/core/filter.c b/net/core/filter.c
index c5e2a1c5fd8d..af8634cd4ba2 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2165,12 +2165,12 @@ static int __bpf_redirect(struct sk_buff *skb, struct net_device *dev,
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
-static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb)
+static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb,
+			    struct net_device *dev, struct bpf_nh_params *nh)
 {
-	struct dst_entry *dst = skb_dst(skb);
-	struct net_device *dev = dst->dev;
 	u32 hh_len = LL_RESERVED_SPACE(dev);
 	const struct in6_addr *nexthop;
+	struct dst_entry *dst = NULL;
 	struct neighbour *neigh;
 
 	if (dev_xmit_recursion()) {
@@ -2196,8 +2196,13 @@ static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb)
 	}
 
 	rcu_read_lock_bh();
-	nexthop = rt6_nexthop(container_of(dst, struct rt6_info, dst),
-			      &ipv6_hdr(skb)->daddr);
+	if (!nh) {
+		dst = skb_dst(skb);
+		nexthop = rt6_nexthop(container_of(dst, struct rt6_info, dst),
+				      &ipv6_hdr(skb)->daddr);
+	} else {
+		nexthop = &nh->ipv6_nh;
+	}
 	neigh = ip_neigh_gw6(dev, nexthop);
 	if (likely(!IS_ERR(neigh))) {
 		int ret;
@@ -2210,36 +2215,43 @@ static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb)
 		return ret;
 	}
 	rcu_read_unlock_bh();
-	IP6_INC_STATS(dev_net(dst->dev),
-		      ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);
+	if (dst)
+		IP6_INC_STATS(dev_net(dst->dev),
+			      ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);
 out_drop:
 	kfree_skb(skb);
 	return -ENETDOWN;
 }
 
-static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev)
+static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev,
+				   struct bpf_nh_params *nh)
 {
 	const struct ipv6hdr *ip6h = ipv6_hdr(skb);
 	struct net *net = dev_net(dev);
 	int err, ret = NET_XMIT_DROP;
-	struct dst_entry *dst;
-	struct flowi6 fl6 = {
-		.flowi6_flags	= FLOWI_FLAG_ANYSRC,
-		.flowi6_mark	= skb->mark,
-		.flowlabel	= ip6_flowinfo(ip6h),
-		.flowi6_oif	= dev->ifindex,
-		.flowi6_proto	= ip6h->nexthdr,
-		.daddr		= ip6h->daddr,
-		.saddr		= ip6h->saddr,
-	};
 
-	dst = ipv6_stub->ipv6_dst_lookup_flow(net, NULL, &fl6, NULL);
-	if (IS_ERR(dst))
-		goto out_drop;
+	if (!nh) {
+		struct dst_entry *dst;
+		struct flowi6 fl6 = {
+			.flowi6_flags = FLOWI_FLAG_ANYSRC,
+			.flowi6_mark  = skb->mark,
+			.flowlabel    = ip6_flowinfo(ip6h),
+			.flowi6_oif   = dev->ifindex,
+			.flowi6_proto = ip6h->nexthdr,
+			.daddr	      = ip6h->daddr,
+			.saddr	      = ip6h->saddr,
+		};
+
+		dst = ipv6_stub->ipv6_dst_lookup_flow(net, NULL, &fl6, NULL);
+		if (IS_ERR(dst))
+			goto out_drop;
 
-	skb_dst_set(skb, dst);
+		skb_dst_set(skb, dst);
+	} else if (nh->nh_family != AF_INET6) {
+		goto out_drop;
+	}
 
-	err = bpf_out_neigh_v6(net, skb);
+	err = bpf_out_neigh_v6(net, skb, dev, nh);
 	if (unlikely(net_xmit_eval(err)))
 		dev->stats.tx_errors++;
 	else
@@ -2252,7 +2264,8 @@ static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev)
 	return ret;
 }
 #else
-static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev)
+static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev,
+				   struct net_device *dev, struct bpf_nh_params *nh)
 {
 	kfree_skb(skb);
 	return NET_XMIT_DROP;
@@ -2260,11 +2273,9 @@ static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev)
 #endif /* CONFIG_IPV6 */
 
 #if IS_ENABLED(CONFIG_INET)
-static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb)
+static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb,
+			    struct net_device *dev, struct bpf_nh_params *nh)
 {
-	struct dst_entry *dst = skb_dst(skb);
-	struct rtable *rt = container_of(dst, struct rtable, dst);
-	struct net_device *dev = dst->dev;
 	u32 hh_len = LL_RESERVED_SPACE(dev);
 	struct neighbour *neigh;
 	bool is_v6gw = false;
@@ -2292,7 +2303,21 @@ static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb)
 	}
 
 	rcu_read_lock_bh();
-	neigh = ip_neigh_for_gw(rt, skb, &is_v6gw);
+	if (!nh) {
+		struct dst_entry *dst = skb_dst(skb);
+		struct rtable *rt = container_of(dst, struct rtable, dst);
+
+		neigh = ip_neigh_for_gw(rt, skb, &is_v6gw);
+	} else if (nh->nh_family == AF_INET6) {
+		neigh = ip_neigh_gw6(dev, &nh->ipv6_nh);
+		is_v6gw = true;
+	} else if (nh->nh_family == AF_INET) {
+		neigh = ip_neigh_gw4(dev, nh->ipv4_nh);
+	} else {
+		rcu_read_unlock_bh();
+		goto out_drop;
+	}
+
 	if (likely(!IS_ERR(neigh))) {
 		int ret;
 
@@ -2309,33 +2334,37 @@ static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb)
 	return -ENETDOWN;
 }
 
-static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev)
+static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev,
+				   struct bpf_nh_params *nh)
 {
 	const struct iphdr *ip4h = ip_hdr(skb);
 	struct net *net = dev_net(dev);
 	int err, ret = NET_XMIT_DROP;
-	struct rtable *rt;
-	struct flowi4 fl4 = {
-		.flowi4_flags	= FLOWI_FLAG_ANYSRC,
-		.flowi4_mark	= skb->mark,
-		.flowi4_tos	= RT_TOS(ip4h->tos),
-		.flowi4_oif	= dev->ifindex,
-		.flowi4_proto	= ip4h->protocol,
-		.daddr		= ip4h->daddr,
-		.saddr		= ip4h->saddr,
-	};
 
-	rt = ip_route_output_flow(net, &fl4, NULL);
-	if (IS_ERR(rt))
-		goto out_drop;
-	if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) {
-		ip_rt_put(rt);
-		goto out_drop;
-	}
+	if (!nh) {
+		struct flowi4 fl4 = {
+			.flowi4_flags = FLOWI_FLAG_ANYSRC,
+			.flowi4_mark  = skb->mark,
+			.flowi4_tos   = RT_TOS(ip4h->tos),
+			.flowi4_oif   = dev->ifindex,
+			.flowi4_proto = ip4h->protocol,
+			.daddr	      = ip4h->daddr,
+			.saddr	      = ip4h->saddr,
+		};
+		struct rtable *rt;
+
+		rt = ip_route_output_flow(net, &fl4, NULL);
+		if (IS_ERR(rt))
+			goto out_drop;
+		if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) {
+			ip_rt_put(rt);
+			goto out_drop;
+		}
 
-	skb_dst_set(skb, &rt->dst);
+		skb_dst_set(skb, &rt->dst);
+	}
 
-	err = bpf_out_neigh_v4(net, skb);
+	err = bpf_out_neigh_v4(net, skb, dev, nh);
 	if (unlikely(net_xmit_eval(err)))
 		dev->stats.tx_errors++;
 	else
@@ -2348,14 +2377,16 @@ static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev)
 	return ret;
 }
 #else
-static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev)
+static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev,
+				   struct net_device *dev, struct bpf_nh_params *nh)
 {
 	kfree_skb(skb);
 	return NET_XMIT_DROP;
 }
 #endif /* CONFIG_INET */
 
-static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev)
+static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev,
+				struct bpf_nh_params *nh)
 {
 	struct ethhdr *ethh = eth_hdr(skb);
 
@@ -2370,9 +2401,9 @@ static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev)
 	skb_reset_network_header(skb);
 
 	if (skb->protocol == htons(ETH_P_IP))
-		return __bpf_redirect_neigh_v4(skb, dev);
+		return __bpf_redirect_neigh_v4(skb, dev, nh);
 	else if (skb->protocol == htons(ETH_P_IPV6))
-		return __bpf_redirect_neigh_v6(skb, dev);
+		return __bpf_redirect_neigh_v6(skb, dev, nh);
 out:
 	kfree_skb(skb);
 	return -ENOTSUPP;
@@ -2382,7 +2413,8 @@ static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev)
 enum {
 	BPF_F_NEIGH	= (1ULL << 1),
 	BPF_F_PEER	= (1ULL << 2),
-#define BPF_F_REDIRECT_INTERNAL	(BPF_F_NEIGH | BPF_F_PEER)
+	BPF_F_NEXTHOP	= (1ULL << 3),
+#define BPF_F_REDIRECT_INTERNAL	(BPF_F_NEIGH | BPF_F_PEER | BPF_F_NEXTHOP)
 };
 
 BPF_CALL_3(bpf_clone_redirect, struct sk_buff *, skb, u32, ifindex, u64, flags)
@@ -2455,8 +2487,8 @@ int skb_do_redirect(struct sk_buff *skb)
 		return -EAGAIN;
 	}
 	return flags & BPF_F_NEIGH ?
-	       __bpf_redirect_neigh(skb, dev) :
-	       __bpf_redirect(skb, dev, flags);
+		__bpf_redirect_neigh(skb, dev, flags & BPF_F_NEXTHOP ? &ri->nh : NULL) :
+		__bpf_redirect(skb, dev, flags);
 out_drop:
 	kfree_skb(skb);
 	return -EINVAL;
@@ -2504,16 +2536,21 @@ static const struct bpf_func_proto bpf_redirect_peer_proto = {
 	.arg2_type      = ARG_ANYTHING,
 };
 
-BPF_CALL_2(bpf_redirect_neigh, u32, ifindex, u64, flags)
+BPF_CALL_4(bpf_redirect_neigh, u32, ifindex, struct bpf_redir_neigh *, params,
+	   int, plen, u64, flags)
 {
 	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
 
-	if (unlikely(flags))
+	if (unlikely((plen && plen < sizeof(*params)) || flags))
 		return TC_ACT_SHOT;
 
-	ri->flags = BPF_F_NEIGH;
+	ri->flags = BPF_F_NEIGH | (plen ? BPF_F_NEXTHOP : 0);
 	ri->tgt_index = ifindex;
 
+	BUILD_BUG_ON(sizeof(struct bpf_redir_neigh) != sizeof(struct bpf_nh_params));
+	if (plen)
+		memcpy(&ri->nh, params, sizeof(ri->nh));
+
 	return TC_ACT_REDIRECT;
 }
 
@@ -2522,7 +2559,9 @@ static const struct bpf_func_proto bpf_redirect_neigh_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_ANYTHING,
-	.arg2_type	= ARG_ANYTHING,
+	.arg2_type      = ARG_PTR_TO_MEM_OR_NULL,
+	.arg3_type      = ARG_CONST_SIZE_OR_ZERO,
+	.arg4_type	= ARG_ANYTHING,
 };
 
 BPF_CALL_2(bpf_msg_apply_bytes, struct sk_msg *, msg, u32, bytes)
diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
index 7d86fdd190be..6769caae142f 100755
--- a/scripts/bpf_helpers_doc.py
+++ b/scripts/bpf_helpers_doc.py
@@ -453,6 +453,7 @@ class PrinterHelpers(Printer):
             'struct bpf_perf_event_data',
             'struct bpf_perf_event_value',
             'struct bpf_pidns_info',
+            'struct bpf_redir_neigh',
             'struct bpf_sk_lookup',
             'struct bpf_sock',
             'struct bpf_sock_addr',
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index bf5a99d803e4..82fd73ac1a13 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3677,15 +3677,19 @@ union bpf_attr {
  * 	Return
  * 		The id is returned or 0 in case the id could not be retrieved.
  *
- * long bpf_redirect_neigh(u32 ifindex, u64 flags)
+ * long bpf_redirect_neigh(u32 ifindex, struct bpf_redir_neigh *params, int plen, u64 flags)
  * 	Description
  * 		Redirect the packet to another net device of index *ifindex*
  * 		and fill in L2 addresses from neighboring subsystem. This helper
  * 		is somewhat similar to **bpf_redirect**\ (), except that it
  * 		populates L2 addresses as well, meaning, internally, the helper
- * 		performs a FIB lookup based on the skb's networking header to
- * 		get the address of the next hop and then relies on the neighbor
- * 		lookup for the L2 address of the nexthop.
+ * 		relies on the neighbor lookup for the L2 address of the nexthop.
+ *
+ * 		The helper will perform a FIB lookup based on the skb's
+ * 		networking header to get the address of the next hop, unless
+ * 		this is supplied by the caller in the *params* argument. The
+ * 		*plen* argument indicates the len of *params* and should be set
+ * 		to 0 if *params* is NULL.
  *
  * 		The *flags* argument is reserved and must be 0. The helper is
  * 		currently only supported for tc BPF program types, and enabled
@@ -4906,6 +4910,16 @@ struct bpf_fib_lookup {
 	__u8	dmac[6];     /* ETH_ALEN */
 };
 
+struct bpf_redir_neigh {
+	/* network family for lookup (AF_INET, AF_INET6) */
+	__u8	nh_family;
+	/* network address of nexthop; skips fib lookup to find gateway */
+	union {
+		__be32		ipv4_nh;
+		__u32		ipv6_nh[4];  /* in6_addr; network order */
+	};
+};
+
 enum bpf_task_fd_type {
 	BPF_FD_TYPE_RAW_TRACEPOINT,	/* tp name */
 	BPF_FD_TYPE_TRACEPOINT,		/* tp name */


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH bpf 2/2] selftests: Update test_tc_redirect.sh to use the modified bpf_redirect_neigh()
  2020-10-19 16:04 [PATCH bpf 0/2] bpf: Rework bpf_redirect_neigh() to allow supplying nexthop from caller Toke Høiland-Jørgensen
  2020-10-19 16:04 ` [PATCH bpf 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter Toke Høiland-Jørgensen
@ 2020-10-19 16:04 ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 9+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-19 16:04 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: David Ahern, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

This updates the test_tc_neigh prog in selftests to use the new syntax of
bpf_redirect_neigh(). To exercise the helper both with and without the
optional parameter, add an additional test_tc_neigh_fib test program, which
does a bpf_fib_lookup() followed by a call to bpf_redirect_neigh() instead
of looking up the ifindex in a map.

Update the test_tc_redirect.sh script to run both versions of the test, and
while we're add it, fix it to work on systems that have a consolidated
dual-stack 'ping' binary instead of separate ping/ping6 versions.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 tools/testing/selftests/bpf/progs/test_tc_neigh.c  |    5 -
 .../selftests/bpf/progs/test_tc_neigh_fib.c        |  142 ++++++++++++++++++++
 tools/testing/selftests/bpf/test_tc_redirect.sh    |   27 +++-
 3 files changed, 169 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_tc_neigh_fib.c

diff --git a/tools/testing/selftests/bpf/progs/test_tc_neigh.c b/tools/testing/selftests/bpf/progs/test_tc_neigh.c
index fe182616b112..b985ac4e7a81 100644
--- a/tools/testing/selftests/bpf/progs/test_tc_neigh.c
+++ b/tools/testing/selftests/bpf/progs/test_tc_neigh.c
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
+#include <stddef.h>
 #include <stdint.h>
 #include <stdbool.h>
 
@@ -118,7 +119,7 @@ SEC("dst_ingress") int tc_dst(struct __sk_buff *skb)
 	if (bpf_skb_store_bytes(skb, 0, &zero, sizeof(zero), 0) < 0)
 		return TC_ACT_SHOT;
 
-	return bpf_redirect_neigh(get_dev_ifindex(dev_src), 0);
+	return bpf_redirect_neigh(get_dev_ifindex(dev_src), NULL, 0, 0);
 }
 
 SEC("src_ingress") int tc_src(struct __sk_buff *skb)
@@ -142,7 +143,7 @@ SEC("src_ingress") int tc_src(struct __sk_buff *skb)
 	if (bpf_skb_store_bytes(skb, 0, &zero, sizeof(zero), 0) < 0)
 		return TC_ACT_SHOT;
 
-	return bpf_redirect_neigh(get_dev_ifindex(dev_dst), 0);
+	return bpf_redirect_neigh(get_dev_ifindex(dev_dst), NULL, 0, 0);
 }
 
 char __license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_tc_neigh_fib.c b/tools/testing/selftests/bpf/progs/test_tc_neigh_fib.c
new file mode 100644
index 000000000000..055c7582c86c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_tc_neigh_fib.c
@@ -0,0 +1,142 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <stdint.h>
+#include <stdbool.h>
+#include <stddef.h>
+
+#include <linux/bpf.h>
+#include <linux/stddef.h>
+#include <linux/pkt_cls.h>
+#include <linux/if_ether.h>
+#include <linux/in.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#ifndef ctx_ptr
+# define ctx_ptr(field)		(void *)(long)(field)
+#endif
+
+#define AF_INET 2
+#define AF_INET6 10
+
+static __always_inline int fill_fib_params_v4(struct __sk_buff *skb,
+					      struct bpf_fib_lookup *fib_params)
+{
+	void *data_end = ctx_ptr(skb->data_end);
+	void *data = ctx_ptr(skb->data);
+	struct iphdr *ip4h;
+
+	if (data + sizeof(struct ethhdr) > data_end)
+		return -1;
+
+	ip4h = (struct iphdr *)(data + sizeof(struct ethhdr));
+	if ((void *)(ip4h + 1) > data_end)
+		return -1;
+
+	fib_params->family = AF_INET;
+	fib_params->tos = ip4h->tos;
+	fib_params->l4_protocol = ip4h->protocol;
+	fib_params->sport = 0;
+	fib_params->dport = 0;
+	fib_params->tot_len = bpf_ntohs(ip4h->tot_len);
+	fib_params->ipv4_src = ip4h->saddr;
+	fib_params->ipv4_dst = ip4h->daddr;
+
+	return 0;
+}
+
+static __always_inline int fill_fib_params_v6(struct __sk_buff *skb,
+					      struct bpf_fib_lookup *fib_params)
+{
+	struct in6_addr *src = (struct in6_addr *)fib_params->ipv6_src;
+	struct in6_addr *dst = (struct in6_addr *)fib_params->ipv6_dst;
+	void *data_end = ctx_ptr(skb->data_end);
+	void *data = ctx_ptr(skb->data);
+	struct ipv6hdr *ip6h;
+
+	if (data + sizeof(struct ethhdr) > data_end)
+		return -1;
+
+	ip6h = (struct ipv6hdr *)(data + sizeof(struct ethhdr));
+	if ((void *)(ip6h + 1) > data_end)
+		return -1;
+
+	fib_params->family = AF_INET6;
+	fib_params->flowinfo = 0;
+	fib_params->l4_protocol = ip6h->nexthdr;
+	fib_params->sport = 0;
+	fib_params->dport = 0;
+	fib_params->tot_len = bpf_ntohs(ip6h->payload_len);
+	*src = ip6h->saddr;
+	*dst = ip6h->daddr;
+
+	return 0;
+}
+
+SEC("chk_egress") int tc_chk(struct __sk_buff *skb)
+{
+	void *data_end = ctx_ptr(skb->data_end);
+	void *data = ctx_ptr(skb->data);
+	__u32 *raw = data;
+
+	if (data + sizeof(struct ethhdr) > data_end)
+		return TC_ACT_SHOT;
+
+	return !raw[0] && !raw[1] && !raw[2] ? TC_ACT_SHOT : TC_ACT_OK;
+}
+
+SEC("redir_ingress") int tc_dst(struct __sk_buff *skb)
+{
+	struct bpf_fib_lookup fib_params = { .ifindex = skb->ingress_ifindex };
+	__u8 zero[ETH_ALEN * 2];
+	int ret = -1;
+
+	switch (skb->protocol) {
+	case __bpf_constant_htons(ETH_P_IP):
+		ret = fill_fib_params_v4(skb, &fib_params);
+		break;
+	case __bpf_constant_htons(ETH_P_IPV6):
+		ret = fill_fib_params_v6(skb, &fib_params);
+		break;
+	}
+
+	if (ret)
+		return TC_ACT_OK;
+
+	ret = bpf_fib_lookup(skb, &fib_params, sizeof(fib_params), 0);
+	if (ret == BPF_FIB_LKUP_RET_NOT_FWDED || ret < 0)
+		return TC_ACT_OK;
+
+	__builtin_memset(&zero, 0, sizeof(zero));
+	if (bpf_skb_store_bytes(skb, 0, &zero, sizeof(zero), 0) < 0)
+		return TC_ACT_SHOT;
+
+	if (ret == BPF_FIB_LKUP_RET_SUCCESS) {
+		void *data_end = ctx_ptr(skb->data_end);
+		struct ethhdr *eth = ctx_ptr(skb->data);
+
+		if (eth + 1 > data_end)
+			return TC_ACT_SHOT;
+
+		__builtin_memcpy(eth->h_dest, fib_params.dmac, ETH_ALEN);
+		__builtin_memcpy(eth->h_source, fib_params.smac, ETH_ALEN);
+
+		return bpf_redirect(fib_params.ifindex, 0);
+
+	} else if (ret == BPF_FIB_LKUP_RET_NO_NEIGH) {
+		struct bpf_redir_neigh nh_params = {};
+
+		nh_params.nh_family = fib_params.family;
+		__builtin_memcpy(&nh_params.ipv6_nh, &fib_params.ipv6_dst,
+				 sizeof(nh_params.ipv6_nh));
+
+		return bpf_redirect_neigh(fib_params.ifindex, &nh_params,
+					  sizeof(nh_params), 0);
+	}
+
+	return TC_ACT_SHOT;
+}
+
+char __license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_tc_redirect.sh b/tools/testing/selftests/bpf/test_tc_redirect.sh
index 6d7482562140..a4903739846d 100755
--- a/tools/testing/selftests/bpf/test_tc_redirect.sh
+++ b/tools/testing/selftests/bpf/test_tc_redirect.sh
@@ -24,8 +24,7 @@ command -v timeout >/dev/null 2>&1 || \
 	{ echo >&2 "timeout is not available"; exit 1; }
 command -v ping >/dev/null 2>&1 || \
 	{ echo >&2 "ping is not available"; exit 1; }
-command -v ping6 >/dev/null 2>&1 || \
-	{ echo >&2 "ping6 is not available"; exit 1; }
+if command -v ping6 >/dev/null 2>&1; then PING6=ping6; else PING6=ping; fi
 command -v perl >/dev/null 2>&1 || \
 	{ echo >&2 "perl is not available"; exit 1; }
 command -v jq >/dev/null 2>&1 || \
@@ -152,7 +151,7 @@ netns_test_connectivity()
 	echo -e "${TEST}: ${GREEN}PASS${NC}"
 
 	TEST="ICMPv6 connectivity test"
-	ip netns exec ${NS_SRC} ping6 $PING_ARG ${IP6_DST}
+	ip netns exec ${NS_SRC} $PING6 $PING_ARG ${IP6_DST}
 	if [ $? -ne 0 ]; then
 		echo -e "${TEST}: ${RED}FAIL${NC}"
 		exit 1
@@ -192,6 +191,24 @@ netns_setup_bpf()
 	done
 }
 
+netns_setup_bpf_fib()
+{
+	local obj=$1
+
+	ip netns exec ${NS_FWD} tc qdisc add dev veth_src_fwd clsact
+	ip netns exec ${NS_FWD} tc filter add dev veth_src_fwd ingress bpf da obj $obj sec redir_ingress
+	ip netns exec ${NS_FWD} tc filter add dev veth_src_fwd egress  bpf da obj $obj sec chk_egress
+
+	ip netns exec ${NS_FWD} tc qdisc add dev veth_dst_fwd clsact
+	ip netns exec ${NS_FWD} tc filter add dev veth_dst_fwd ingress bpf da obj $obj sec redir_ingress
+	ip netns exec ${NS_FWD} tc filter add dev veth_dst_fwd egress  bpf da obj $obj sec chk_egress
+
+	# bpf_fib_lookup() checks if forwarding is enabled
+	ip netns exec ${NS_FWD} sysctl -w net.ipv4.ip_forward=1
+	ip netns exec ${NS_FWD} sysctl -w net.ipv6.conf.veth_dst_fwd.forwarding=1
+	ip netns exec ${NS_FWD} sysctl -w net.ipv6.conf.veth_src_fwd.forwarding=1
+}
+
 trap netns_cleanup EXIT
 set -e
 
@@ -200,5 +217,9 @@ netns_setup_bpf test_tc_neigh.o
 netns_test_connectivity
 netns_cleanup
 netns_setup
+netns_setup_bpf_fib test_tc_neigh_fib.o
+netns_test_connectivity
+netns_cleanup
+netns_setup
 netns_setup_bpf test_tc_peer.o
 netns_test_connectivity


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
  2020-10-19 16:04 ` [PATCH bpf 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter Toke Høiland-Jørgensen
@ 2020-10-19 18:23   ` Daniel Borkmann
  2020-10-20  3:12     ` David Ahern
  2020-10-19 20:23   ` kernel test robot
  2020-10-20  3:08   ` David Ahern
  2 siblings, 1 reply; 9+ messages in thread
From: Daniel Borkmann @ 2020-10-19 18:23 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: David Ahern, netdev, bpf

On 10/19/20 6:04 PM, Toke Høiland-Jørgensen wrote:
> From: Toke Høiland-Jørgensen <toke@redhat.com>
> 
> Based on the discussion in [0], update the bpf_redirect_neigh() helper to
> accept an optional parameter specifying the nexthop information. This makes
> it possible to combine bpf_fib_lookup() and bpf_redirect_neigh() without
> incurring a duplicate FIB lookup - since the FIB lookup helper will return
> the nexthop information even if no neighbour is present, this can simply be
> passed on to bpf_redirect_neigh() if bpf_fib_lookup() returns
> BPF_FIB_LKUP_RET_NO_NEIGH.
> 
> [0] https://lore.kernel.org/bpf/393e17fc-d187-3a8d-2f0d-a627c7c63fca@iogearbox.net/
> 
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>

Looks good to me, thanks! I'll wait till David gets a chance as well to review.
One thing that would have made sense to me (probably worth a v2) is to keep the
fib lookup flag you had back then, meaning sth like BPF_FIB_SKIP_NEIGH which
would then return a BPF_FIB_LKUP_RET_NO_NEIGH before doing the neigh lookup inside
the bpf_ipv{4,6}_fib_lookup() so that programs can just unconditionally use the
combination of bpf_fib_lookup(skb, [...], BPF_FIB_SKIP_NEIGH) with the
bpf_redirect_neigh([...]) extension in that case and not do this bpf_redirect()
vs bpf_redirect_neigh() dance as you have in the selftest in patch 2/2.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
  2020-10-19 16:04 ` [PATCH bpf 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter Toke Høiland-Jørgensen
  2020-10-19 18:23   ` Daniel Borkmann
@ 2020-10-19 20:23   ` kernel test robot
  2020-10-20  3:08   ` David Ahern
  2 siblings, 0 replies; 9+ messages in thread
From: kernel test robot @ 2020-10-19 20:23 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Daniel Borkmann
  Cc: kbuild-all, clang-built-linux, David Ahern, netdev, bpf

[-- Attachment #1: Type: text/plain, Size: 3871 bytes --]

Hi "Toke,

I love your patch! Yet something to improve:

[auto build test ERROR on bpf/master]

url:    https://github.com/0day-ci/linux/commits/Toke-H-iland-J-rgensen/bpf-Rework-bpf_redirect_neigh-to-allow-supplying-nexthop-from-caller/20201020-000711
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git master
config: x86_64-randconfig-a016-20201019 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 094e9f4779eb9b5c6a49014f2f80b8cbb833572f)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/250f45235b06a53ae71587457c94d9c3a3a99f9a
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Toke-H-iland-J-rgensen/bpf-Rework-bpf_redirect_neigh-to-allow-supplying-nexthop-from-caller/20201020-000711
        git checkout 250f45235b06a53ae71587457c94d9c3a3a99f9a
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

>> net/core/filter.c:2268:27: error: redefinition of parameter 'dev'
                                      struct net_device *dev, struct bpf_nh_params *nh)
                                                         ^
   net/core/filter.c:2267:76: note: previous declaration is here
   static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev,
                                                                              ^
>> net/core/filter.c:2268:27: warning: omitting the parameter name in a function definition is a C2x extension [-Wc2x-extensions]
                                      struct net_device *dev, struct bpf_nh_params *nh)
                                                         ^
   1 warning and 1 error generated.

vim +/dev +2268 net/core/filter.c

  2225	
  2226	static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev,
  2227					   struct bpf_nh_params *nh)
  2228	{
  2229		const struct ipv6hdr *ip6h = ipv6_hdr(skb);
  2230		struct net *net = dev_net(dev);
  2231		int err, ret = NET_XMIT_DROP;
  2232	
  2233		if (!nh) {
  2234			struct dst_entry *dst;
  2235			struct flowi6 fl6 = {
  2236				.flowi6_flags = FLOWI_FLAG_ANYSRC,
  2237				.flowi6_mark  = skb->mark,
  2238				.flowlabel    = ip6_flowinfo(ip6h),
  2239				.flowi6_oif   = dev->ifindex,
  2240				.flowi6_proto = ip6h->nexthdr,
  2241				.daddr	      = ip6h->daddr,
  2242				.saddr	      = ip6h->saddr,
  2243			};
  2244	
  2245			dst = ipv6_stub->ipv6_dst_lookup_flow(net, NULL, &fl6, NULL);
  2246			if (IS_ERR(dst))
  2247				goto out_drop;
  2248	
  2249			skb_dst_set(skb, dst);
  2250		} else if (nh->nh_family != AF_INET6) {
  2251			goto out_drop;
  2252		}
  2253	
  2254		err = bpf_out_neigh_v6(net, skb, dev, nh);
  2255		if (unlikely(net_xmit_eval(err)))
  2256			dev->stats.tx_errors++;
  2257		else
  2258			ret = NET_XMIT_SUCCESS;
  2259		goto out_xmit;
  2260	out_drop:
  2261		dev->stats.tx_errors++;
  2262		kfree_skb(skb);
  2263	out_xmit:
  2264		return ret;
  2265	}
  2266	#else
  2267	static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev,
> 2268					   struct net_device *dev, struct bpf_nh_params *nh)
  2269	{
  2270		kfree_skb(skb);
  2271		return NET_XMIT_DROP;
  2272	}
  2273	#endif /* CONFIG_IPV6 */
  2274	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 31734 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
  2020-10-19 16:04 ` [PATCH bpf 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter Toke Høiland-Jørgensen
  2020-10-19 18:23   ` Daniel Borkmann
  2020-10-19 20:23   ` kernel test robot
@ 2020-10-20  3:08   ` David Ahern
  2 siblings, 0 replies; 9+ messages in thread
From: David Ahern @ 2020-10-20  3:08 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Daniel Borkmann
  Cc: David Ahern, netdev, bpf

On 10/19/20 10:04 AM, Toke Høiland-Jørgensen wrote:
> @@ -4906,6 +4910,16 @@ struct bpf_fib_lookup {
>  	__u8	dmac[6];     /* ETH_ALEN */
>  };
>  
> +struct bpf_redir_neigh {
> +	/* network family for lookup (AF_INET, AF_INET6) */
> +	__u8	nh_family;

Define every byte. This is a 3-byte hole that should have unused's and
the helper verifies they are set to 0.

> +	/* network address of nexthop; skips fib lookup to find gateway */
> +	union {
> +		__be32		ipv4_nh;
> +		__u32		ipv6_nh[4];  /* in6_addr; network order */
> +	};
> +};
> +
>  enum bpf_task_fd_type {
>  	BPF_FD_TYPE_RAW_TRACEPOINT,	/* tp name */
>  	BPF_FD_TYPE_TRACEPOINT,		/* tp name */



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
  2020-10-19 18:23   ` Daniel Borkmann
@ 2020-10-20  3:12     ` David Ahern
  2020-10-20  8:59       ` Daniel Borkmann
  0 siblings, 1 reply; 9+ messages in thread
From: David Ahern @ 2020-10-20  3:12 UTC (permalink / raw)
  To: Daniel Borkmann, Toke Høiland-Jørgensen
  Cc: David Ahern, netdev, bpf

On 10/19/20 12:23 PM, Daniel Borkmann wrote:
> Looks good to me, thanks! I'll wait till David gets a chance as well to
> review.
> One thing that would have made sense to me (probably worth a v2) is to
> keep the
> fib lookup flag you had back then, meaning sth like BPF_FIB_SKIP_NEIGH
> which
> would then return a BPF_FIB_LKUP_RET_NO_NEIGH before doing the neigh
> lookup inside
> the bpf_ipv{4,6}_fib_lookup() so that programs can just unconditionally
> use the
> combination of bpf_fib_lookup(skb, [...], BPF_FIB_SKIP_NEIGH) with the
> bpf_redirect_neigh([...]) extension in that case and not do this
> bpf_redirect()
> vs bpf_redirect_neigh() dance as you have in the selftest in patch 2/2.

That puts the overhead on bpf_ipv{4,6}_fib_lookup. The existiong helpers
return BPF_FIB_LKUP_RET_NO_NEIGH which is the key to the bpf program to
call the bpf_redirect_neigh - making the program deal with the overhead
as needed on failures.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
  2020-10-20  3:12     ` David Ahern
@ 2020-10-20  8:59       ` Daniel Borkmann
  2020-10-20 13:45         ` David Ahern
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Borkmann @ 2020-10-20  8:59 UTC (permalink / raw)
  To: David Ahern, Toke Høiland-Jørgensen; +Cc: David Ahern, netdev, bpf

On 10/20/20 5:12 AM, David Ahern wrote:
> On 10/19/20 12:23 PM, Daniel Borkmann wrote:
>> Looks good to me, thanks! I'll wait till David gets a chance as well to
>> review.
>> One thing that would have made sense to me (probably worth a v2) is to
>> keep the
>> fib lookup flag you had back then, meaning sth like BPF_FIB_SKIP_NEIGH
>> which
>> would then return a BPF_FIB_LKUP_RET_NO_NEIGH before doing the neigh
>> lookup inside
>> the bpf_ipv{4,6}_fib_lookup() so that programs can just unconditionally
>> use the
>> combination of bpf_fib_lookup(skb, [...], BPF_FIB_SKIP_NEIGH) with the
>> bpf_redirect_neigh([...]) extension in that case and not do this
>> bpf_redirect()
>> vs bpf_redirect_neigh() dance as you have in the selftest in patch 2/2.
> 
> That puts the overhead on bpf_ipv{4,6}_fib_lookup. The existiong helpers
> return BPF_FIB_LKUP_RET_NO_NEIGH which is the key to the bpf program to
> call the bpf_redirect_neigh - making the program deal with the overhead
> as needed on failures.

But if I know there's high chance anyway that __ipv*_neigh_lookup_noref*()
is failing for bpf_ipv{4,6}_fib_lookup() why even paying the price of the
hash table lookup in there? Simple test to skip and return early would be
much cheaper, and branch predictor should work it out just fine given that
setting is pretty much static anyway; I'm not sure I'm seeing why this would
be much of a concern..

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
  2020-10-20  8:59       ` Daniel Borkmann
@ 2020-10-20 13:45         ` David Ahern
  0 siblings, 0 replies; 9+ messages in thread
From: David Ahern @ 2020-10-20 13:45 UTC (permalink / raw)
  To: Daniel Borkmann, Toke Høiland-Jørgensen
  Cc: David Ahern, netdev, bpf

On 10/20/20 2:59 AM, Daniel Borkmann wrote:
> On 10/20/20 5:12 AM, David Ahern wrote:
>> On 10/19/20 12:23 PM, Daniel Borkmann wrote:
>>> Looks good to me, thanks! I'll wait till David gets a chance as well to
>>> review.
>>> One thing that would have made sense to me (probably worth a v2) is to
>>> keep the
>>> fib lookup flag you had back then, meaning sth like BPF_FIB_SKIP_NEIGH
>>> which
>>> would then return a BPF_FIB_LKUP_RET_NO_NEIGH before doing the neigh
>>> lookup inside
>>> the bpf_ipv{4,6}_fib_lookup() so that programs can just unconditionally
>>> use the
>>> combination of bpf_fib_lookup(skb, [...], BPF_FIB_SKIP_NEIGH) with the
>>> bpf_redirect_neigh([...]) extension in that case and not do this
>>> bpf_redirect()
>>> vs bpf_redirect_neigh() dance as you have in the selftest in patch 2/2.
>>
>> That puts the overhead on bpf_ipv{4,6}_fib_lookup. The existiong helpers
>> return BPF_FIB_LKUP_RET_NO_NEIGH which is the key to the bpf program to
>> call the bpf_redirect_neigh - making the program deal with the overhead
>> as needed on failures.
> 
> But if I know there's high chance anyway that __ipv*_neigh_lookup_noref*()
> is failing for bpf_ipv{4,6}_fib_lookup() why even paying the price of the
> hash table lookup in there? Simple test to skip and return early would be
> much cheaper, and branch predictor should work it out just fine given that
> setting is pretty much static anyway; I'm not sure I'm seeing why this
> would
> be much of a concern..

The death by a 1000 paper cuts mantra.

The new bpf_redirect_neigh helper only works for skb's; the older
bpf_fib_lookup helpers work for XDP and skb's.

The primary reason for a program to need both helpers back to back is
when the neighbor entry is invalid. A condition where the nexthop
address is valid yet the neighbor entry is not resolving or in an
invalid state should be a rare event - like startup.

The existing helper has enough 'if' checks in it, forced by the
multitude of features the stack supports. Rare runtime events should be
handled by the bpf programs.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-10-20 13:45 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-19 16:04 [PATCH bpf 0/2] bpf: Rework bpf_redirect_neigh() to allow supplying nexthop from caller Toke Høiland-Jørgensen
2020-10-19 16:04 ` [PATCH bpf 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter Toke Høiland-Jørgensen
2020-10-19 18:23   ` Daniel Borkmann
2020-10-20  3:12     ` David Ahern
2020-10-20  8:59       ` Daniel Borkmann
2020-10-20 13:45         ` David Ahern
2020-10-19 20:23   ` kernel test robot
2020-10-20  3:08   ` David Ahern
2020-10-19 16:04 ` [PATCH bpf 2/2] selftests: Update test_tc_redirect.sh to use the modified bpf_redirect_neigh() Toke Høiland-Jørgensen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).