[PATCH bpf v2 0/3] bpf: Rework bpf_redirect_neigh() to allow supplying nexthop from caller

* [PATCH bpf v2 0/3] bpf: Rework bpf_redirect_neigh() to allow supplying nexthop from caller
@ 2020-10-20 10:51 Toke Høiland-Jørgensen
  2020-10-20 10:51 ` [PATCH bpf v2 1/3] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter Toke Høiland-Jørgensen
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-20 10:51 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: David Ahern, netdev, bpf

Based on previous discussion[0], we determined that it would be beneficial to
rework bpf_redirect_neigh() so the caller can supply the nexthop information
(e.g., from a previous call to bpf_fib_lookup()). This way, the two helpers can
be combined without incurring a second FIB lookup to find the nexthop, and
bpf_fib_lookup() becomes usable even if no nexthop entry currently exists.

This patch (and accompanying selftest update) accomplishes this by way of an
optional parameter to bpf_redirect_neigh(). This series is against the -bpf tree,
since we need to change this call signature before it becomes API. As a
companion change, it also adds a flag to bpf_fib_lookup() that will make it skip
the neighbour lookup, for cases where the caller knows it is likely to fail
anyway, and wants to go straight to bpf_redirect_neigh().

[0] https://lore.kernel.org/bpf/393e17fc-d187-3a8d-2f0d-a627c7c63fca@iogearbox.net/

Changelog:

v2:
- Add 'unused' member to fill hole in bpf_redir_neigh struct (David Ahern)
- Fix compilation with INET/INET6 disabled - properly this time (kbot)
- Add back the BPF_FIB_LOOKUP_SKIP_NEIGH flag as new patch 2 (Daniel)

v1:
- Rebase on -bpf tree
- Fix compilation with INET/INET6 disabled (kbot)
- Keep v4/v6 signatures similar, use internal flag (Daniel)
- Use a separate selftest BPF program instead of modifying existing one (Daniel)
- Fix a few style nits (David Ahern)

---

Toke Høiland-Jørgensen (3):
      bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
      bpf_fib_lookup: optionally skip neighbour lookup
      selftests: Update test_tc_redirect.sh to use the modified bpf_redirect_neigh()

 include/uapi/linux/bpf.h                      |  10 +-
 net/core/filter.c                             |  16 +-
 tools/include/uapi/linux/bpf.h                |  10 +-
 .../selftests/bpf/progs/test_tc_neigh.c       |   5 +-
 .../selftests/bpf/progs/test_tc_neigh_fib.c   | 153 ++++++++++++++++++
 .../testing/selftests/bpf/test_tc_redirect.sh |  18 ++-
 6 files changed, 197 insertions(+), 15 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_tc_neigh_fib.c


* [PATCH bpf v2 1/3] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
  2020-10-20 10:51 [PATCH bpf v2 0/3] bpf: Rework bpf_redirect_neigh() to allow supplying nexthop from caller Toke Høiland-Jørgensen
@ 2020-10-20 10:51 ` Toke Høiland-Jørgensen
  2020-10-20 15:08   ` Daniel Borkmann
                     ` (2 more replies)
  2020-10-20 10:51 ` [PATCH bpf v2 2/3] bpf_fib_lookup: optionally skip neighbour lookup Toke Høiland-Jørgensen
  2020-10-20 10:51 ` [PATCH bpf v2 3/3] selftests: Update test_tc_redirect.sh to use the modified bpf_redirect_neigh() Toke Høiland-Jørgensen
  2 siblings, 3 replies; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-20 10:51 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: David Ahern, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

Based on the discussion in [0], update the bpf_redirect_neigh() helper to
accept an optional parameter specifying the nexthop information. This makes
it possible to combine bpf_fib_lookup() and bpf_redirect_neigh() without
incurring a duplicate FIB lookup - since the FIB lookup helper will return
the nexthop information even if no neighbour is present, this can simply be
passed on to bpf_redirect_neigh() if bpf_fib_lookup() returns
BPF_FIB_LKUP_RET_NO_NEIGH.

[0] https://lore.kernel.org/bpf/393e17fc-d187-3a8d-2f0d-a627c7c63fca@iogearbox.net/
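
For readers who want to reason about the new argument handling outside the
kernel, the checks can be mirrored in a small userspace model. This is only a
sketch: the struct and function names (redir_neigh_model,
model_redirect_neigh_args, the MODEL_* verdicts) are made up for illustration,
and the extra guards for a negative length or a NULL pointer with a non-zero
length are assumptions standing in for what the BPF verifier is expected to
enforce through the argument types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the uapi struct added by this patch (hypothetical name). */
struct redir_neigh_model {
	uint8_t nh_family;		/* AF_INET or AF_INET6 */
	uint8_t unused[3];		/* fills the hole; must be zero */
	union {
		uint32_t ipv4_nh;	/* network byte order */
		uint32_t ipv6_nh[4];	/* network byte order */
	};
};

/* Same spirit as the BUILD_BUG_ON in the patch: the layout has no hidden padding. */
static_assert(sizeof(struct redir_neigh_model) == 20, "unexpected struct size");

/* Hypothetical verdicts; the real helper returns TC_ACT_SHOT or TC_ACT_REDIRECT. */
enum model_verdict {
	MODEL_SHOT,			/* arguments rejected */
	MODEL_REDIRECT_FIB,		/* no nexthop given: helper does its own FIB lookup */
	MODEL_REDIRECT_NEXTHOP,		/* caller-supplied nexthop is used */
};

static enum model_verdict
model_redirect_neigh_args(const struct redir_neigh_model *params, int plen,
			  uint64_t flags)
{
	/* Assumption: in the kernel the verifier rules these two cases out. */
	if (plen < 0 || (plen && !params))
		return MODEL_SHOT;

	/* A non-zero length must cover the whole struct; flags are reserved. */
	if ((plen && (size_t)plen < sizeof(*params)) || flags)
		return MODEL_SHOT;

	/* The padding bytes must be zero when a nexthop is supplied. */
	if (plen && (params->unused[0] || params->unused[1] ||
		     params->unused[2]))
		return MODEL_SHOT;

	return plen ? MODEL_REDIRECT_NEXTHOP : MODEL_REDIRECT_FIB;
}
```

Note that the model keys the decision on plen rather than on the pointer, which
matches how the patch sets BPF_F_NEXTHOP.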

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/filter.h         |    9 ++
 include/uapi/linux/bpf.h       |   24 +++++-
 net/core/filter.c              |  163 +++++++++++++++++++++++++---------------
 scripts/bpf_helpers_doc.py     |    1 
 tools/include/uapi/linux/bpf.h |   24 +++++-
 5 files changed, 153 insertions(+), 68 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 20fc24c9779a..ba9de7188cd0 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -607,12 +607,21 @@ struct bpf_skb_data_end {
 	void *data_end;
 };
 
+struct bpf_nh_params {
+	u8 nh_family;
+	union {
+		__u32 ipv4_nh;
+		struct in6_addr ipv6_nh;
+	};
+};
+
 struct bpf_redirect_info {
 	u32 flags;
 	u32 tgt_index;
 	void *tgt_value;
 	struct bpf_map *map;
 	u32 kern_flags;
+	struct bpf_nh_params nh;
 };
 
 DECLARE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index bf5a99d803e4..9668cde9d684 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3677,15 +3677,19 @@ union bpf_attr {
  * 	Return
  * 		The id is returned or 0 in case the id could not be retrieved.
  *
- * long bpf_redirect_neigh(u32 ifindex, u64 flags)
+ * long bpf_redirect_neigh(u32 ifindex, struct bpf_redir_neigh *params, int plen, u64 flags)
  * 	Description
  * 		Redirect the packet to another net device of index *ifindex*
  * 		and fill in L2 addresses from neighboring subsystem. This helper
  * 		is somewhat similar to **bpf_redirect**\ (), except that it
  * 		populates L2 addresses as well, meaning, internally, the helper
- * 		performs a FIB lookup based on the skb's networking header to
- * 		get the address of the next hop and then relies on the neighbor
- * 		lookup for the L2 address of the nexthop.
+ * 		relies on the neighbor lookup for the L2 address of the nexthop.
+ *
+ * 		The helper will perform a FIB lookup based on the skb's
+ * 		networking header to get the address of the next hop, unless
+ * 		this is supplied by the caller in the *params* argument. The
+ * 		*plen* argument indicates the len of *params* and should be set
+ * 		to 0 if *params* is NULL.
  *
  * 		The *flags* argument is reserved and must be 0. The helper is
  * 		currently only supported for tc BPF program types, and enabled
@@ -4906,6 +4910,18 @@ struct bpf_fib_lookup {
 	__u8	dmac[6];     /* ETH_ALEN */
 };
 
+struct bpf_redir_neigh {
+	/* network family for lookup (AF_INET, AF_INET6) */
+	__u8 nh_family;
+	 /* avoid hole in struct - must be set to 0 */
+	__u8 unused[3];
+	/* network address of nexthop; skips fib lookup to find gateway */
+	union {
+		__be32		ipv4_nh;
+		__u32		ipv6_nh[4];  /* in6_addr; network order */
+	};
+};
+
 enum bpf_task_fd_type {
 	BPF_FD_TYPE_RAW_TRACEPOINT,	/* tp name */
 	BPF_FD_TYPE_TRACEPOINT,		/* tp name */
diff --git a/net/core/filter.c b/net/core/filter.c
index c5e2a1c5fd8d..fa09b4f141ae 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2165,12 +2165,12 @@ static int __bpf_redirect(struct sk_buff *skb, struct net_device *dev,
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
-static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb)
+static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb,
+			    struct net_device *dev, struct bpf_nh_params *nh)
 {
-	struct dst_entry *dst = skb_dst(skb);
-	struct net_device *dev = dst->dev;
 	u32 hh_len = LL_RESERVED_SPACE(dev);
 	const struct in6_addr *nexthop;
+	struct dst_entry *dst = NULL;
 	struct neighbour *neigh;
 
 	if (dev_xmit_recursion()) {
@@ -2196,8 +2196,13 @@ static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb)
 	}
 
 	rcu_read_lock_bh();
-	nexthop = rt6_nexthop(container_of(dst, struct rt6_info, dst),
-			      &ipv6_hdr(skb)->daddr);
+	if (!nh) {
+		dst = skb_dst(skb);
+		nexthop = rt6_nexthop(container_of(dst, struct rt6_info, dst),
+				      &ipv6_hdr(skb)->daddr);
+	} else {
+		nexthop = &nh->ipv6_nh;
+	}
 	neigh = ip_neigh_gw6(dev, nexthop);
 	if (likely(!IS_ERR(neigh))) {
 		int ret;
@@ -2210,36 +2215,43 @@ static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb)
 		return ret;
 	}
 	rcu_read_unlock_bh();
-	IP6_INC_STATS(dev_net(dst->dev),
-		      ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);
+	if (dst)
+		IP6_INC_STATS(dev_net(dst->dev),
+			      ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);
 out_drop:
 	kfree_skb(skb);
 	return -ENETDOWN;
 }
 
-static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev)
+static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev,
+				   struct bpf_nh_params *nh)
 {
 	const struct ipv6hdr *ip6h = ipv6_hdr(skb);
 	struct net *net = dev_net(dev);
 	int err, ret = NET_XMIT_DROP;
-	struct dst_entry *dst;
-	struct flowi6 fl6 = {
-		.flowi6_flags	= FLOWI_FLAG_ANYSRC,
-		.flowi6_mark	= skb->mark,
-		.flowlabel	= ip6_flowinfo(ip6h),
-		.flowi6_oif	= dev->ifindex,
-		.flowi6_proto	= ip6h->nexthdr,
-		.daddr		= ip6h->daddr,
-		.saddr		= ip6h->saddr,
-	};
 
-	dst = ipv6_stub->ipv6_dst_lookup_flow(net, NULL, &fl6, NULL);
-	if (IS_ERR(dst))
-		goto out_drop;
+	if (!nh) {
+		struct dst_entry *dst;
+		struct flowi6 fl6 = {
+			.flowi6_flags = FLOWI_FLAG_ANYSRC,
+			.flowi6_mark  = skb->mark,
+			.flowlabel    = ip6_flowinfo(ip6h),
+			.flowi6_oif   = dev->ifindex,
+			.flowi6_proto = ip6h->nexthdr,
+			.daddr	      = ip6h->daddr,
+			.saddr	      = ip6h->saddr,
+		};
+
+		dst = ipv6_stub->ipv6_dst_lookup_flow(net, NULL, &fl6, NULL);
+		if (IS_ERR(dst))
+			goto out_drop;
 
-	skb_dst_set(skb, dst);
+		skb_dst_set(skb, dst);
+	} else if (nh->nh_family != AF_INET6) {
+		goto out_drop;
+	}
 
-	err = bpf_out_neigh_v6(net, skb);
+	err = bpf_out_neigh_v6(net, skb, dev, nh);
 	if (unlikely(net_xmit_eval(err)))
 		dev->stats.tx_errors++;
 	else
@@ -2252,7 +2264,8 @@ static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev)
 	return ret;
 }
 #else
-static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev)
+static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev,
+				   struct bpf_nh_params *nh)
 {
 	kfree_skb(skb);
 	return NET_XMIT_DROP;
@@ -2260,11 +2273,9 @@ static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev)
 #endif /* CONFIG_IPV6 */
 
 #if IS_ENABLED(CONFIG_INET)
-static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb)
+static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb,
+			    struct net_device *dev, struct bpf_nh_params *nh)
 {
-	struct dst_entry *dst = skb_dst(skb);
-	struct rtable *rt = container_of(dst, struct rtable, dst);
-	struct net_device *dev = dst->dev;
 	u32 hh_len = LL_RESERVED_SPACE(dev);
 	struct neighbour *neigh;
 	bool is_v6gw = false;
@@ -2292,7 +2303,21 @@ static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb)
 	}
 
 	rcu_read_lock_bh();
-	neigh = ip_neigh_for_gw(rt, skb, &is_v6gw);
+	if (!nh) {
+		struct dst_entry *dst = skb_dst(skb);
+		struct rtable *rt = container_of(dst, struct rtable, dst);
+
+		neigh = ip_neigh_for_gw(rt, skb, &is_v6gw);
+	} else if (nh->nh_family == AF_INET6) {
+		neigh = ip_neigh_gw6(dev, &nh->ipv6_nh);
+		is_v6gw = true;
+	} else if (nh->nh_family == AF_INET) {
+		neigh = ip_neigh_gw4(dev, nh->ipv4_nh);
+	} else {
+		rcu_read_unlock_bh();
+		goto out_drop;
+	}
+
 	if (likely(!IS_ERR(neigh))) {
 		int ret;
 
@@ -2309,33 +2334,37 @@ static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb)
 	return -ENETDOWN;
 }
 
-static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev)
+static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev,
+				   struct bpf_nh_params *nh)
 {
 	const struct iphdr *ip4h = ip_hdr(skb);
 	struct net *net = dev_net(dev);
 	int err, ret = NET_XMIT_DROP;
-	struct rtable *rt;
-	struct flowi4 fl4 = {
-		.flowi4_flags	= FLOWI_FLAG_ANYSRC,
-		.flowi4_mark	= skb->mark,
-		.flowi4_tos	= RT_TOS(ip4h->tos),
-		.flowi4_oif	= dev->ifindex,
-		.flowi4_proto	= ip4h->protocol,
-		.daddr		= ip4h->daddr,
-		.saddr		= ip4h->saddr,
-	};
 
-	rt = ip_route_output_flow(net, &fl4, NULL);
-	if (IS_ERR(rt))
-		goto out_drop;
-	if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) {
-		ip_rt_put(rt);
-		goto out_drop;
-	}
+	if (!nh) {
+		struct flowi4 fl4 = {
+			.flowi4_flags = FLOWI_FLAG_ANYSRC,
+			.flowi4_mark  = skb->mark,
+			.flowi4_tos   = RT_TOS(ip4h->tos),
+			.flowi4_oif   = dev->ifindex,
+			.flowi4_proto = ip4h->protocol,
+			.daddr	      = ip4h->daddr,
+			.saddr	      = ip4h->saddr,
+		};
+		struct rtable *rt;
+
+		rt = ip_route_output_flow(net, &fl4, NULL);
+		if (IS_ERR(rt))
+			goto out_drop;
+		if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) {
+			ip_rt_put(rt);
+			goto out_drop;
+		}
 
-	skb_dst_set(skb, &rt->dst);
+		skb_dst_set(skb, &rt->dst);
+	}
 
-	err = bpf_out_neigh_v4(net, skb);
+	err = bpf_out_neigh_v4(net, skb, dev, nh);
 	if (unlikely(net_xmit_eval(err)))
 		dev->stats.tx_errors++;
 	else
@@ -2348,14 +2377,16 @@ static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev)
 	return ret;
 }
 #else
-static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev)
+static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev,
+				   struct bpf_nh_params *nh)
 {
 	kfree_skb(skb);
 	return NET_XMIT_DROP;
 }
 #endif /* CONFIG_INET */
 
-static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev)
+static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev,
+				struct bpf_nh_params *nh)
 {
 	struct ethhdr *ethh = eth_hdr(skb);
 
@@ -2370,9 +2401,9 @@ static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev)
 	skb_reset_network_header(skb);
 
 	if (skb->protocol == htons(ETH_P_IP))
-		return __bpf_redirect_neigh_v4(skb, dev);
+		return __bpf_redirect_neigh_v4(skb, dev, nh);
 	else if (skb->protocol == htons(ETH_P_IPV6))
-		return __bpf_redirect_neigh_v6(skb, dev);
+		return __bpf_redirect_neigh_v6(skb, dev, nh);
 out:
 	kfree_skb(skb);
 	return -ENOTSUPP;
@@ -2382,7 +2413,8 @@ static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev)
 enum {
 	BPF_F_NEIGH	= (1ULL << 1),
 	BPF_F_PEER	= (1ULL << 2),
-#define BPF_F_REDIRECT_INTERNAL	(BPF_F_NEIGH | BPF_F_PEER)
+	BPF_F_NEXTHOP	= (1ULL << 3),
+#define BPF_F_REDIRECT_INTERNAL	(BPF_F_NEIGH | BPF_F_PEER | BPF_F_NEXTHOP)
 };
 
 BPF_CALL_3(bpf_clone_redirect, struct sk_buff *, skb, u32, ifindex, u64, flags)
@@ -2455,8 +2487,8 @@ int skb_do_redirect(struct sk_buff *skb)
 		return -EAGAIN;
 	}
 	return flags & BPF_F_NEIGH ?
-	       __bpf_redirect_neigh(skb, dev) :
-	       __bpf_redirect(skb, dev, flags);
+		__bpf_redirect_neigh(skb, dev, flags & BPF_F_NEXTHOP ? &ri->nh : NULL) :
+		__bpf_redirect(skb, dev, flags);
 out_drop:
 	kfree_skb(skb);
 	return -EINVAL;
@@ -2504,16 +2536,25 @@ static const struct bpf_func_proto bpf_redirect_peer_proto = {
 	.arg2_type      = ARG_ANYTHING,
 };
 
-BPF_CALL_2(bpf_redirect_neigh, u32, ifindex, u64, flags)
+BPF_CALL_4(bpf_redirect_neigh, u32, ifindex, struct bpf_redir_neigh *, params,
+	   int, plen, u64, flags)
 {
 	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
 
-	if (unlikely(flags))
+	if (unlikely((plen && plen < sizeof(*params)) || flags))
+		return TC_ACT_SHOT;
+
+	if (unlikely(plen && (params->unused[0] || params->unused[1] ||
+			      params->unused[2])))
 		return TC_ACT_SHOT;
 
-	ri->flags = BPF_F_NEIGH;
+	ri->flags = BPF_F_NEIGH | (plen ? BPF_F_NEXTHOP : 0);
 	ri->tgt_index = ifindex;
 
+	BUILD_BUG_ON(sizeof(struct bpf_redir_neigh) != sizeof(struct bpf_nh_params));
+	if (plen)
+		memcpy(&ri->nh, params, sizeof(ri->nh));
+
 	return TC_ACT_REDIRECT;
 }
 
@@ -2522,7 +2563,9 @@ static const struct bpf_func_proto bpf_redirect_neigh_proto = {
 	.gpl_only	= false,
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_ANYTHING,
-	.arg2_type	= ARG_ANYTHING,
+	.arg2_type      = ARG_PTR_TO_MEM_OR_NULL,
+	.arg3_type      = ARG_CONST_SIZE_OR_ZERO,
+	.arg4_type	= ARG_ANYTHING,
 };
 
 BPF_CALL_2(bpf_msg_apply_bytes, struct sk_msg *, msg, u32, bytes)
diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
index 7d86fdd190be..6769caae142f 100755
--- a/scripts/bpf_helpers_doc.py
+++ b/scripts/bpf_helpers_doc.py
@@ -453,6 +453,7 @@ class PrinterHelpers(Printer):
             'struct bpf_perf_event_data',
             'struct bpf_perf_event_value',
             'struct bpf_pidns_info',
+            'struct bpf_redir_neigh',
             'struct bpf_sk_lookup',
             'struct bpf_sock',
             'struct bpf_sock_addr',
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index bf5a99d803e4..9668cde9d684 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3677,15 +3677,19 @@ union bpf_attr {
  * 	Return
  * 		The id is returned or 0 in case the id could not be retrieved.
  *
- * long bpf_redirect_neigh(u32 ifindex, u64 flags)
+ * long bpf_redirect_neigh(u32 ifindex, struct bpf_redir_neigh *params, int plen, u64 flags)
  * 	Description
  * 		Redirect the packet to another net device of index *ifindex*
  * 		and fill in L2 addresses from neighboring subsystem. This helper
  * 		is somewhat similar to **bpf_redirect**\ (), except that it
  * 		populates L2 addresses as well, meaning, internally, the helper
- * 		performs a FIB lookup based on the skb's networking header to
- * 		get the address of the next hop and then relies on the neighbor
- * 		lookup for the L2 address of the nexthop.
+ * 		relies on the neighbor lookup for the L2 address of the nexthop.
+ *
+ * 		The helper will perform a FIB lookup based on the skb's
+ * 		networking header to get the address of the next hop, unless
+ * 		this is supplied by the caller in the *params* argument. The
+ * 		*plen* argument indicates the len of *params* and should be set
+ * 		to 0 if *params* is NULL.
  *
  * 		The *flags* argument is reserved and must be 0. The helper is
  * 		currently only supported for tc BPF program types, and enabled
@@ -4906,6 +4910,18 @@ struct bpf_fib_lookup {
 	__u8	dmac[6];     /* ETH_ALEN */
 };
 
+struct bpf_redir_neigh {
+	/* network family for lookup (AF_INET, AF_INET6) */
+	__u8 nh_family;
+	 /* avoid hole in struct - must be set to 0 */
+	__u8 unused[3];
+	/* network address of nexthop; skips fib lookup to find gateway */
+	union {
+		__be32		ipv4_nh;
+		__u32		ipv6_nh[4];  /* in6_addr; network order */
+	};
+};
+
 enum bpf_task_fd_type {
 	BPF_FD_TYPE_RAW_TRACEPOINT,	/* tp name */
 	BPF_FD_TYPE_TRACEPOINT,		/* tp name */



* [PATCH bpf v2 2/3] bpf_fib_lookup: optionally skip neighbour lookup
  2020-10-20 10:51 [PATCH bpf v2 0/3] bpf: Rework bpf_redirect_neigh() to allow supplying nexthop from caller Toke Høiland-Jørgensen
  2020-10-20 10:51 ` [PATCH bpf v2 1/3] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter Toke Høiland-Jørgensen
@ 2020-10-20 10:51 ` Toke Høiland-Jørgensen
  2020-10-20 13:49   ` David Ahern
  2020-10-20 10:51 ` [PATCH bpf v2 3/3] selftests: Update test_tc_redirect.sh to use the modified bpf_redirect_neigh() Toke Høiland-Jørgensen
  2 siblings, 1 reply; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-20 10:51 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: David Ahern, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

The bpf_fib_lookup() helper performs a neighbour lookup for the destination
IP and returns BPF_FIB_LKUP_RET_NO_NEIGH if this fails, with the expectation
that the BPF program will deal with this condition, either by passing the
packet up the stack, or by using bpf_redirect_neigh().

The neighbour lookup is done via a hash table (through ___neigh_lookup_noref()),
which incurs some overhead. If the caller knows this is likely to fail
anyway, it may want to skip that and go unconditionally to
bpf_redirect_neigh(). For this use case, add a flag to bpf_fib_lookup()
that will make it skip the neighbour lookup and instead always return
BPF_FIB_LKUP_RET_NO_NEIGH (but still populate the gateway and target
ifindex).
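
The effect of the flag on the helper's result can be summarised with a small
userspace model. This is a sketch only: the MODEL_* names and
model_fib_lookup_tail() are hypothetical, the return-code numbers are stand-ins
for the uapi values, and the model assumes the route lookup itself has already
succeeded.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits as laid out by this patch (model-local names). */
enum {
	MODEL_FIB_LOOKUP_DIRECT     = 1U << 0,
	MODEL_FIB_LOOKUP_OUTPUT     = 1U << 1,
	MODEL_FIB_LOOKUP_SKIP_NEIGH = 1U << 2,
};

/* Stand-ins for the helper's return codes; the numbers are illustrative. */
enum {
	MODEL_RET_SUCCESS  = 0,
	MODEL_RET_NO_NEIGH = 7,
	MODEL_RET_EINVAL   = -22,
};

/* The three flags occupy the three lowest bits and do not overlap. */
static_assert((MODEL_FIB_LOOKUP_DIRECT | MODEL_FIB_LOOKUP_OUTPUT |
	       MODEL_FIB_LOOKUP_SKIP_NEIGH) == 7, "flag bits overlap");

/*
 * Models what happens after a successful route lookup:
 * neigh_present says whether a neighbour entry would have been found.
 */
static int model_fib_lookup_tail(uint32_t flags, int neigh_present)
{
	/* Unknown flag bits are rejected up front. */
	if (flags & ~(MODEL_FIB_LOOKUP_DIRECT | MODEL_FIB_LOOKUP_OUTPUT |
		      MODEL_FIB_LOOKUP_SKIP_NEIGH))
		return MODEL_RET_EINVAL;

	/* With SKIP_NEIGH the neighbour table is never consulted. */
	if (flags & MODEL_FIB_LOOKUP_SKIP_NEIGH)
		return MODEL_RET_NO_NEIGH;

	return neigh_present ? MODEL_RET_SUCCESS : MODEL_RET_NO_NEIGH;
}
```

In other words, a caller that sets the flag should never expect a success
return with MAC addresses filled in, even when a neighbour entry exists.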

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/uapi/linux/bpf.h       |   10 ++++++----
 net/core/filter.c              |   16 ++++++++++++++--
 tools/include/uapi/linux/bpf.h |   10 ++++++----
 3 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 9668cde9d684..4bfd3c72dae6 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4841,12 +4841,14 @@ struct bpf_raw_tracepoint_args {
 	__u64 args[0];
 };
 
-/* DIRECT:  Skip the FIB rules and go to FIB table associated with device
- * OUTPUT:  Do lookup from egress perspective; default is ingress
+/* DIRECT:      Skip the FIB rules and go to FIB table associated with device
+ * OUTPUT:      Do lookup from egress perspective; default is ingress
+ * SKIP_NEIGH:  Skip neighbour lookup and return BPF_FIB_LKUP_RET_NO_NEIGH on success
  */
 enum {
-	BPF_FIB_LOOKUP_DIRECT  = (1U << 0),
-	BPF_FIB_LOOKUP_OUTPUT  = (1U << 1),
+	BPF_FIB_LOOKUP_DIRECT	  = (1U << 0),
+	BPF_FIB_LOOKUP_OUTPUT	  = (1U << 1),
+	BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2),
 };
 
 enum {
diff --git a/net/core/filter.c b/net/core/filter.c
index fa09b4f141ae..9791e6311afa 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5382,6 +5382,9 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 		if (nhc->nhc_gw_family)
 			params->ipv4_dst = nhc->nhc_gw.ipv4;
 
+		if (flags & BPF_FIB_LOOKUP_SKIP_NEIGH)
+			return BPF_FIB_LKUP_RET_NO_NEIGH;
+
 		neigh = __ipv4_neigh_lookup_noref(dev,
 						 (__force u32)params->ipv4_dst);
 	} else {
@@ -5389,6 +5392,10 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 
 		params->family = AF_INET6;
 		*dst = nhc->nhc_gw.ipv6;
+
+		if (flags & BPF_FIB_LOOKUP_SKIP_NEIGH)
+			return BPF_FIB_LKUP_RET_NO_NEIGH;
+
 		neigh = __ipv6_neigh_lookup_noref_stub(dev, dst);
 	}
 
@@ -5501,6 +5508,9 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
 	params->rt_metric = res.f6i->fib6_metric;
 	params->ifindex = dev->ifindex;
 
+	if (flags & BPF_FIB_LOOKUP_SKIP_NEIGH)
+		return BPF_FIB_LKUP_RET_NO_NEIGH;
+
 	/* xdp and cls_bpf programs are run in RCU-bh so rcu_read_lock_bh is
 	 * not needed here.
 	 */
@@ -5518,7 +5528,8 @@ BPF_CALL_4(bpf_xdp_fib_lookup, struct xdp_buff *, ctx,
 	if (plen < sizeof(*params))
 		return -EINVAL;
 
-	if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT))
+	if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT |
+		      BPF_FIB_LOOKUP_SKIP_NEIGH))
 		return -EINVAL;
 
 	switch (params->family) {
@@ -5555,7 +5566,8 @@ BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb,
 	if (plen < sizeof(*params))
 		return -EINVAL;
 
-	if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT))
+	if (flags & ~(BPF_FIB_LOOKUP_DIRECT | BPF_FIB_LOOKUP_OUTPUT |
+		      BPF_FIB_LOOKUP_SKIP_NEIGH))
 		return -EINVAL;
 
 	switch (params->family) {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 9668cde9d684..4bfd3c72dae6 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4841,12 +4841,14 @@ struct bpf_raw_tracepoint_args {
 	__u64 args[0];
 };
 
-/* DIRECT:  Skip the FIB rules and go to FIB table associated with device
- * OUTPUT:  Do lookup from egress perspective; default is ingress
+/* DIRECT:      Skip the FIB rules and go to FIB table associated with device
+ * OUTPUT:      Do lookup from egress perspective; default is ingress
+ * SKIP_NEIGH:  Skip neighbour lookup and return BPF_FIB_LKUP_RET_NO_NEIGH on success
  */
 enum {
-	BPF_FIB_LOOKUP_DIRECT  = (1U << 0),
-	BPF_FIB_LOOKUP_OUTPUT  = (1U << 1),
+	BPF_FIB_LOOKUP_DIRECT	  = (1U << 0),
+	BPF_FIB_LOOKUP_OUTPUT	  = (1U << 1),
+	BPF_FIB_LOOKUP_SKIP_NEIGH = (1U << 2),
 };
 
 enum {



* [PATCH bpf v2 3/3] selftests: Update test_tc_redirect.sh to use the modified bpf_redirect_neigh()
  2020-10-20 10:51 [PATCH bpf v2 0/3] bpf: Rework bpf_redirect_neigh() to allow supplying nexthop from caller Toke Høiland-Jørgensen
  2020-10-20 10:51 ` [PATCH bpf v2 1/3] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter Toke Høiland-Jørgensen
  2020-10-20 10:51 ` [PATCH bpf v2 2/3] bpf_fib_lookup: optionally skip neighbour lookup Toke Høiland-Jørgensen
@ 2020-10-20 10:51 ` Toke Høiland-Jørgensen
  2 siblings, 0 replies; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-20 10:51 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: David Ahern, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

This updates the test_tc_neigh prog in selftests to use the new syntax of
bpf_redirect_neigh(). To exercise the helper both with and without the
optional parameter, add an additional test_tc_neigh_fib test program, which
does a bpf_fib_lookup() followed by a call to bpf_redirect_neigh() instead
of looking up the ifindex in a map. This second test uses the
BPF_FIB_LOOKUP_SKIP_NEIGH flag in one forwarding direction, but not in the
other, to test both ways of combining the two helpers.
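
The hand-off between the two helpers in the new test program amounts to copying
the family and the gateway address from the lookup result into the redirect
parameters. A userspace sketch of that step follows; the struct and function
names (model_fib_result, model_nh_params, model_fill_nh) are hypothetical
simplifications, not the uapi types.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the fields of the FIB lookup result used here. */
struct model_fib_result {
	uint8_t  family;	/* AF_INET (2) or AF_INET6 (10) */
	uint32_t dst[4];	/* gateway address; IPv4 uses dst[0] only */
	uint32_t ifindex;	/* egress device */
};

/* Simplified stand-in for the redirect parameters. */
struct model_nh_params {
	uint8_t  nh_family;
	uint8_t  unused[3];	/* must stay zero */
	uint32_t nh[4];		/* nexthop address; IPv4 uses nh[0] only */
};

static_assert(sizeof(struct model_nh_params) == 20, "unexpected struct size");

/*
 * Copy the nexthop out of a lookup result. All 16 address bytes are copied
 * regardless of family, as the test program does; for IPv4 only the first
 * word is meaningful.
 */
static void model_fill_nh(struct model_nh_params *nh,
			  const struct model_fib_result *fib)
{
	memset(nh, 0, sizeof(*nh));	/* also clears the unused bytes */
	nh->nh_family = fib->family;
	memcpy(nh->nh, fib->dst, sizeof(nh->nh));
}
```

Zeroing the whole struct first matters because the helper rejects non-zero
padding bytes.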

Update the test_tc_redirect.sh script to run both versions of the test, and
while we're at it, fix it to work on systems that have a consolidated
dual-stack 'ping' binary instead of separate ping/ping6 versions.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 tools/testing/selftests/bpf/progs/test_tc_neigh.c  |    5 -
 .../selftests/bpf/progs/test_tc_neigh_fib.c        |  153 ++++++++++++++++++++
 tools/testing/selftests/bpf/test_tc_redirect.sh    |   18 ++
 3 files changed, 171 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_tc_neigh_fib.c

diff --git a/tools/testing/selftests/bpf/progs/test_tc_neigh.c b/tools/testing/selftests/bpf/progs/test_tc_neigh.c
index fe182616b112..b985ac4e7a81 100644
--- a/tools/testing/selftests/bpf/progs/test_tc_neigh.c
+++ b/tools/testing/selftests/bpf/progs/test_tc_neigh.c
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
+#include <stddef.h>
 #include <stdint.h>
 #include <stdbool.h>
 
@@ -118,7 +119,7 @@ SEC("dst_ingress") int tc_dst(struct __sk_buff *skb)
 	if (bpf_skb_store_bytes(skb, 0, &zero, sizeof(zero), 0) < 0)
 		return TC_ACT_SHOT;
 
-	return bpf_redirect_neigh(get_dev_ifindex(dev_src), 0);
+	return bpf_redirect_neigh(get_dev_ifindex(dev_src), NULL, 0, 0);
 }
 
 SEC("src_ingress") int tc_src(struct __sk_buff *skb)
@@ -142,7 +143,7 @@ SEC("src_ingress") int tc_src(struct __sk_buff *skb)
 	if (bpf_skb_store_bytes(skb, 0, &zero, sizeof(zero), 0) < 0)
 		return TC_ACT_SHOT;
 
-	return bpf_redirect_neigh(get_dev_ifindex(dev_dst), 0);
+	return bpf_redirect_neigh(get_dev_ifindex(dev_dst), NULL, 0, 0);
 }
 
 char __license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_tc_neigh_fib.c b/tools/testing/selftests/bpf/progs/test_tc_neigh_fib.c
new file mode 100644
index 000000000000..14792ce3a85c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_tc_neigh_fib.c
@@ -0,0 +1,153 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <stdint.h>
+#include <stdbool.h>
+#include <stddef.h>
+
+#include <linux/bpf.h>
+#include <linux/stddef.h>
+#include <linux/pkt_cls.h>
+#include <linux/if_ether.h>
+#include <linux/in.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#ifndef ctx_ptr
+# define ctx_ptr(field)		(void *)(long)(field)
+#endif
+
+#define AF_INET 2
+#define AF_INET6 10
+
+static __always_inline int fill_fib_params_v4(struct __sk_buff *skb,
+					      struct bpf_fib_lookup *fib_params)
+{
+	void *data_end = ctx_ptr(skb->data_end);
+	void *data = ctx_ptr(skb->data);
+	struct iphdr *ip4h;
+
+	if (data + sizeof(struct ethhdr) > data_end)
+		return -1;
+
+	ip4h = (struct iphdr *)(data + sizeof(struct ethhdr));
+	if ((void *)(ip4h + 1) > data_end)
+		return -1;
+
+	fib_params->family = AF_INET;
+	fib_params->tos = ip4h->tos;
+	fib_params->l4_protocol = ip4h->protocol;
+	fib_params->sport = 0;
+	fib_params->dport = 0;
+	fib_params->tot_len = bpf_ntohs(ip4h->tot_len);
+	fib_params->ipv4_src = ip4h->saddr;
+	fib_params->ipv4_dst = ip4h->daddr;
+
+	return 0;
+}
+
+static __always_inline int fill_fib_params_v6(struct __sk_buff *skb,
+					      struct bpf_fib_lookup *fib_params)
+{
+	struct in6_addr *src = (struct in6_addr *)fib_params->ipv6_src;
+	struct in6_addr *dst = (struct in6_addr *)fib_params->ipv6_dst;
+	void *data_end = ctx_ptr(skb->data_end);
+	void *data = ctx_ptr(skb->data);
+	struct ipv6hdr *ip6h;
+
+	if (data + sizeof(struct ethhdr) > data_end)
+		return -1;
+
+	ip6h = (struct ipv6hdr *)(data + sizeof(struct ethhdr));
+	if ((void *)(ip6h + 1) > data_end)
+		return -1;
+
+	fib_params->family = AF_INET6;
+	fib_params->flowinfo = 0;
+	fib_params->l4_protocol = ip6h->nexthdr;
+	fib_params->sport = 0;
+	fib_params->dport = 0;
+	fib_params->tot_len = bpf_ntohs(ip6h->payload_len);
+	*src = ip6h->saddr;
+	*dst = ip6h->daddr;
+
+	return 0;
+}
+
+SEC("chk_egress") int tc_chk(struct __sk_buff *skb)
+{
+	void *data_end = ctx_ptr(skb->data_end);
+	void *data = ctx_ptr(skb->data);
+	__u32 *raw = data;
+
+	if (data + sizeof(struct ethhdr) > data_end)
+		return TC_ACT_SHOT;
+
+	return !raw[0] && !raw[1] && !raw[2] ? TC_ACT_SHOT : TC_ACT_OK;
+}
+
+static __always_inline int tc_redir(struct __sk_buff *skb, int fib_lookup_flags)
+{
+	struct bpf_fib_lookup fib_params = { .ifindex = skb->ingress_ifindex };
+	__u8 zero[ETH_ALEN * 2];
+	int ret = -1;
+
+	switch (skb->protocol) {
+	case __bpf_constant_htons(ETH_P_IP):
+		ret = fill_fib_params_v4(skb, &fib_params);
+		break;
+	case __bpf_constant_htons(ETH_P_IPV6):
+		ret = fill_fib_params_v6(skb, &fib_params);
+		break;
+	}
+
+	if (ret)
+		return TC_ACT_OK;
+
+	ret = bpf_fib_lookup(skb, &fib_params, sizeof(fib_params),
+			     fib_lookup_flags);
+	if (ret == BPF_FIB_LKUP_RET_NOT_FWDED || ret < 0)
+		return TC_ACT_OK;
+
+	__builtin_memset(&zero, 0, sizeof(zero));
+	if (bpf_skb_store_bytes(skb, 0, &zero, sizeof(zero), 0) < 0)
+		return TC_ACT_SHOT;
+
+	if (ret == BPF_FIB_LKUP_RET_NO_NEIGH) {
+		struct bpf_redir_neigh nh_params = {};
+
+		nh_params.nh_family = fib_params.family;
+		__builtin_memcpy(&nh_params.ipv6_nh, &fib_params.ipv6_dst,
+				 sizeof(nh_params.ipv6_nh));
+
+		return bpf_redirect_neigh(fib_params.ifindex, &nh_params,
+					  sizeof(nh_params), 0);
+
+	} else if (!fib_lookup_flags && ret == BPF_FIB_LKUP_RET_SUCCESS) {
+		void *data_end = ctx_ptr(skb->data_end);
+		struct ethhdr *eth = ctx_ptr(skb->data);
+
+		if (eth + 1 > data_end)
+			return TC_ACT_SHOT;
+
+		__builtin_memcpy(eth->h_dest, fib_params.dmac, ETH_ALEN);
+		__builtin_memcpy(eth->h_source, fib_params.smac, ETH_ALEN);
+
+		return bpf_redirect(fib_params.ifindex, 0);
+	}
+
+	return TC_ACT_SHOT;
+}
+
+SEC("dst_ingress") int tc_dst(struct __sk_buff *skb)
+{
+	return tc_redir(skb, 0);
+}
+
+SEC("src_ingress") int tc_src(struct __sk_buff *skb)
+{
+	return tc_redir(skb, BPF_FIB_LOOKUP_SKIP_NEIGH);
+}
+
+char __license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_tc_redirect.sh b/tools/testing/selftests/bpf/test_tc_redirect.sh
index 6d7482562140..8868aa1ca902 100755
--- a/tools/testing/selftests/bpf/test_tc_redirect.sh
+++ b/tools/testing/selftests/bpf/test_tc_redirect.sh
@@ -24,8 +24,7 @@ command -v timeout >/dev/null 2>&1 || \
 	{ echo >&2 "timeout is not available"; exit 1; }
 command -v ping >/dev/null 2>&1 || \
 	{ echo >&2 "ping is not available"; exit 1; }
-command -v ping6 >/dev/null 2>&1 || \
-	{ echo >&2 "ping6 is not available"; exit 1; }
+if command -v ping6 >/dev/null 2>&1; then PING6=ping6; else PING6=ping; fi
 command -v perl >/dev/null 2>&1 || \
 	{ echo >&2 "perl is not available"; exit 1; }
 command -v jq >/dev/null 2>&1 || \
@@ -152,7 +151,7 @@ netns_test_connectivity()
 	echo -e "${TEST}: ${GREEN}PASS${NC}"
 
 	TEST="ICMPv6 connectivity test"
-	ip netns exec ${NS_SRC} ping6 $PING_ARG ${IP6_DST}
+	ip netns exec ${NS_SRC} $PING6 $PING_ARG ${IP6_DST}
 	if [ $? -ne 0 ]; then
 		echo -e "${TEST}: ${RED}FAIL${NC}"
 		exit 1
@@ -170,6 +169,7 @@ hex_mem_str()
 netns_setup_bpf()
 {
 	local obj=$1
+	local use_forwarding=${2:-0}
 
 	ip netns exec ${NS_FWD} tc qdisc add dev veth_src_fwd clsact
 	ip netns exec ${NS_FWD} tc filter add dev veth_src_fwd ingress bpf da obj $obj sec src_ingress
@@ -179,6 +179,14 @@ netns_setup_bpf()
 	ip netns exec ${NS_FWD} tc filter add dev veth_dst_fwd ingress bpf da obj $obj sec dst_ingress
 	ip netns exec ${NS_FWD} tc filter add dev veth_dst_fwd egress  bpf da obj $obj sec chk_egress
 
+	if [ "$use_forwarding" -eq "1" ]; then
+		# bpf_fib_lookup() checks if forwarding is enabled
+		ip netns exec ${NS_FWD} sysctl -w net.ipv4.ip_forward=1
+		ip netns exec ${NS_FWD} sysctl -w net.ipv6.conf.veth_dst_fwd.forwarding=1
+		ip netns exec ${NS_FWD} sysctl -w net.ipv6.conf.veth_src_fwd.forwarding=1
+		return 0
+	fi
+
 	veth_src=$(ip netns exec ${NS_FWD} cat /sys/class/net/veth_src_fwd/ifindex)
 	veth_dst=$(ip netns exec ${NS_FWD} cat /sys/class/net/veth_dst_fwd/ifindex)
 
@@ -200,5 +208,9 @@ netns_setup_bpf test_tc_neigh.o
 netns_test_connectivity
 netns_cleanup
 netns_setup
+netns_setup_bpf test_tc_neigh_fib.o 1
+netns_test_connectivity
+netns_cleanup
+netns_setup
 netns_setup_bpf test_tc_peer.o
 netns_test_connectivity



* Re: [PATCH bpf v2 2/3] bpf_fib_lookup: optionally skip neighbour lookup
  2020-10-20 10:51 ` [PATCH bpf v2 2/3] bpf_fib_lookup: optionally skip neighbour lookup Toke Høiland-Jørgensen
@ 2020-10-20 13:49   ` David Ahern
  2020-10-20 15:04     ` Daniel Borkmann
  0 siblings, 1 reply; 19+ messages in thread
From: David Ahern @ 2020-10-20 13:49 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Daniel Borkmann
  Cc: David Ahern, netdev, bpf

On 10/20/20 4:51 AM, Toke Høiland-Jørgensen wrote:
> From: Toke Høiland-Jørgensen <toke@redhat.com>
> 
> The bpf_fib_lookup() helper performs a neighbour lookup for the destination
> IP and returns BPF_FIB_LKUP_RET_NO_NEIGH if this fails, with the expectation
> that the BPF program will deal with this condition, either by passing the
> packet up the stack, or by using bpf_redirect_neigh().
> 
> The neighbour lookup is done via a hash table (through ___neigh_lookup_noref()),
> which incurs some overhead. If the caller knows this is likely to fail
> anyway, it may want to skip that and go unconditionally to
> bpf_redirect_neigh(). For this use case, add a flag to bpf_fib_lookup()
> that will make it skip the neighbour lookup and instead always return
> BPF_FIB_LKUP_RET_NO_NEIGH (but still populate the gateway and target
> ifindex).
> 
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
>  include/uapi/linux/bpf.h       |   10 ++++++----
>  net/core/filter.c              |   16 ++++++++++++++--
>  tools/include/uapi/linux/bpf.h |   10 ++++++----
>  3 files changed, 26 insertions(+), 10 deletions(-)

Nack. Please don't.

As I mentioned in my reply to Daniel, I would prefer such logic be
pushed to the bpf programs. There is no reason for rare run time events
to warrant a new flag and new check in the existing FIB helpers. The bpf
programs can take the hit of the extra lookup.


* Re: [PATCH bpf v2 2/3] bpf_fib_lookup: optionally skip neighbour lookup
  2020-10-20 13:49   ` David Ahern
@ 2020-10-20 15:04     ` Daniel Borkmann
  2020-10-20 18:10       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Borkmann @ 2020-10-20 15:04 UTC (permalink / raw)
  To: David Ahern, Toke Høiland-Jørgensen; +Cc: David Ahern, netdev, bpf

On 10/20/20 3:49 PM, David Ahern wrote:
> On 10/20/20 4:51 AM, Toke Høiland-Jørgensen wrote:
>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>
>> The bpf_fib_lookup() helper performs a neighbour lookup for the destination
>> IP and returns BPF_FIB_LKUP_RET_NO_NEIGH if this fails, with the expectation
>> that the BPF program will deal with this condition, either by passing the
>> packet up the stack, or by using bpf_redirect_neigh().
>>
>> The neighbour lookup is done via a hash table (through ___neigh_lookup_noref()),
>> which incurs some overhead. If the caller knows this is likely to fail
>> anyway, it may want to skip that and go unconditionally to
>> bpf_redirect_neigh(). For this use case, add a flag to bpf_fib_lookup()
>> that will make it skip the neighbour lookup and instead always return
>> BPF_FIB_LKUP_RET_NO_NEIGH (but still populate the gateway and target
>> ifindex).
>>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> ---
>>   include/uapi/linux/bpf.h       |   10 ++++++----
>>   net/core/filter.c              |   16 ++++++++++++++--
>>   tools/include/uapi/linux/bpf.h |   10 ++++++----
>>   3 files changed, 26 insertions(+), 10 deletions(-)
> 
> Nack. Please don't.
> 
> As I mentioned in my reply to Daniel, I would prefer such logic be
> pushed to the bpf programs. There is no reason for rare run time events
> to warrant a new flag and new check in the existing FIB helpers. The bpf
> programs can take the hit of the extra lookup.

Fair enough, let's push it to progs then.


* Re: [PATCH bpf v2 1/3] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
  2020-10-20 10:51 ` [PATCH bpf v2 1/3] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter Toke Høiland-Jørgensen
@ 2020-10-20 15:08   ` Daniel Borkmann
  2020-10-20 18:08     ` Toke Høiland-Jørgensen
  2020-10-20 16:30   ` Jakub Kicinski
  2020-10-20 16:34   ` Jakub Kicinski
  2 siblings, 1 reply; 19+ messages in thread
From: Daniel Borkmann @ 2020-10-20 15:08 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: David Ahern, netdev, bpf

On 10/20/20 12:51 PM, Toke Høiland-Jørgensen wrote:
> From: Toke Høiland-Jørgensen <toke@redhat.com>
[...]
>   BPF_CALL_3(bpf_clone_redirect, struct sk_buff *, skb, u32, ifindex, u64, flags)
> @@ -2455,8 +2487,8 @@ int skb_do_redirect(struct sk_buff *skb)
>   		return -EAGAIN;
>   	}
>   	return flags & BPF_F_NEIGH ?
> -	       __bpf_redirect_neigh(skb, dev) :
> -	       __bpf_redirect(skb, dev, flags);
> +		__bpf_redirect_neigh(skb, dev, flags & BPF_F_NEXTHOP ? &ri->nh : NULL) :
> +		__bpf_redirect(skb, dev, flags);
>   out_drop:
>   	kfree_skb(skb);
>   	return -EINVAL;
> @@ -2504,16 +2536,25 @@ static const struct bpf_func_proto bpf_redirect_peer_proto = {
>   	.arg2_type      = ARG_ANYTHING,
>   };
>   
> -BPF_CALL_2(bpf_redirect_neigh, u32, ifindex, u64, flags)
> +BPF_CALL_4(bpf_redirect_neigh, u32, ifindex, struct bpf_redir_neigh *, params,
> +	   int, plen, u64, flags)
>   {
>   	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
>   
> -	if (unlikely(flags))
> +	if (unlikely((plen && plen < sizeof(*params)) || flags))
> +		return TC_ACT_SHOT;
> +
> +	if (unlikely(plen && (params->unused[0] || params->unused[1] ||
> +			      params->unused[2])))

small nit: maybe fold this into the prior check that already tests non-zero plen

if (unlikely((plen && (plen < sizeof(*params) ||
                        (params->unused[0] | params->unused[1] |
                         params->unused[2]))) || flags))
         return TC_ACT_SHOT;
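
For illustration only (not part of the patch): a minimal userspace sketch of
the two forms of the check, assuming a stand-in struct with the same layout as
the quoted bpf_redir_neigh. The names redir_neigh_params, reject_split(),
reject_folded() and checks_agree() are hypothetical, and plen is a size_t here
rather than the int used by the helper. It only compares which inputs each
form rejects.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in; field layout copied from the quoted uAPI struct. */
struct redir_neigh_params {
	uint8_t nh_family;
	uint8_t unused[3];
	union {
		uint32_t ipv4_nh;
		uint32_t ipv6_nh[4];
	};
};

/* Two separate checks, in the shape used by the posted patch. */
static bool reject_split(const struct redir_neigh_params *p, size_t plen,
			 uint64_t flags)
{
	if ((plen && plen < sizeof(*p)) || flags)
		return true;
	if (plen && (p->unused[0] || p->unused[1] || p->unused[2]))
		return true;
	return false;
}

/* Single folded check, in the shape suggested in review. */
static bool reject_folded(const struct redir_neigh_params *p, size_t plen,
			  uint64_t flags)
{
	return (plen && (plen < sizeof(*p) ||
			 (p->unused[0] | p->unused[1] | p->unused[2]))) ||
	       flags;
}

/* Compare both forms over a small grid of plen, flags and padding values. */
static bool checks_agree(void)
{
	const size_t plens[] = { 0, 1, sizeof(struct redir_neigh_params) };
	const uint64_t flag_vals[] = { 0, 1 };

	for (size_t i = 0; i < 3; i++)
		for (size_t j = 0; j < 2; j++)
			for (unsigned int bits = 0; bits < 8; bits++) {
				struct redir_neigh_params p = { 0 };

				p.unused[0] = bits & 1;
				p.unused[1] = (bits >> 1) & 1;
				p.unused[2] = (bits >> 2) & 1;
				if (reject_split(&p, plens[i], flag_vals[j]) !=
				    reject_folded(&p, plens[i], flag_vals[j]))
					return false;
			}
	return true;
}
```

In both forms the unused bytes are only read once plen is known to be at
least sizeof(*params), so the folded version should not read past a short
buffer either.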

>   		return TC_ACT_SHOT;
>   
> -	ri->flags = BPF_F_NEIGH;
> +	ri->flags = BPF_F_NEIGH | (plen ? BPF_F_NEXTHOP : 0);
>   	ri->tgt_index = ifindex;
>   
> +	BUILD_BUG_ON(sizeof(struct bpf_redir_neigh) != sizeof(struct bpf_nh_params));
> +	if (plen)
> +		memcpy(&ri->nh, params, sizeof(ri->nh));
> +
>   	return TC_ACT_REDIRECT;
>   }
>   


* Re: [PATCH bpf v2 1/3] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
  2020-10-20 10:51 ` [PATCH bpf v2 1/3] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter Toke Høiland-Jørgensen
  2020-10-20 15:08   ` Daniel Borkmann
@ 2020-10-20 16:30   ` Jakub Kicinski
  2020-10-20 18:08     ` Toke Høiland-Jørgensen
  2020-10-20 18:12     ` David Ahern
  2020-10-20 16:34   ` Jakub Kicinski
  2 siblings, 2 replies; 19+ messages in thread
From: Jakub Kicinski @ 2020-10-20 16:30 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel Borkmann, David Ahern, netdev, bpf

On Tue, 20 Oct 2020 12:51:02 +0200 Toke Høiland-Jørgensen wrote:
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 20fc24c9779a..ba9de7188cd0 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -607,12 +607,21 @@ struct bpf_skb_data_end {
>  	void *data_end;
>  };
>  
> +struct bpf_nh_params {
> +	u8 nh_family;
> +	union {
> +		__u32 ipv4_nh;
> +		struct in6_addr ipv6_nh;
> +	};
> +};

> @@ -4906,6 +4910,18 @@ struct bpf_fib_lookup {
>  	__u8	dmac[6];     /* ETH_ALEN */
>  };
>  
> +struct bpf_redir_neigh {
> +	/* network family for lookup (AF_INET, AF_INET6) */
> +	__u8 nh_family;
> +	 /* avoid hole in struct - must be set to 0 */
> +	__u8 unused[3];
> +	/* network address of nexthop; skips fib lookup to find gateway */
> +	union {
> +		__be32		ipv4_nh;
> +		__u32		ipv6_nh[4];  /* in6_addr; network order */
> +	};
> +};

Isn't this backward? The hole could be named in the internal structure.
This is a bit of a gray area, but if you name this hole in uAPI and
programs start referring to it you will never be able to reuse it.
So you may as well not require it to be zeroed..


* Re: [PATCH bpf v2 1/3] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
  2020-10-20 10:51 ` [PATCH bpf v2 1/3] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter Toke Høiland-Jørgensen
  2020-10-20 15:08   ` Daniel Borkmann
  2020-10-20 16:30   ` Jakub Kicinski
@ 2020-10-20 16:34   ` Jakub Kicinski
  2020-10-20 18:03     ` Toke Høiland-Jørgensen
  2 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2020-10-20 16:34 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel Borkmann, David Ahern, netdev, bpf

On Tue, 20 Oct 2020 12:51:02 +0200 Toke Høiland-Jørgensen wrote:
> +struct bpf_nh_params {
> +	u8 nh_family;
> +	union {
> +		__u32 ipv4_nh;
> +		struct in6_addr ipv6_nh;
> +	};
> +};

Folks, not directly related to this set, but there's a SRv6 patch going
around which adds ifindex, otherwise nh can't be link local.

I wonder if we want to consider this use case from the start (or the
close approximation of start in this case ;)).


* Re: [PATCH bpf v2 1/3] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
  2020-10-20 16:34   ` Jakub Kicinski
@ 2020-10-20 18:03     ` Toke Høiland-Jørgensen
  2020-10-20 18:14       ` David Ahern
  0 siblings, 1 reply; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-20 18:03 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Daniel Borkmann, David Ahern, netdev, bpf

Jakub Kicinski <kuba@kernel.org> writes:

> On Tue, 20 Oct 2020 12:51:02 +0200 Toke Høiland-Jørgensen wrote:
>> +struct bpf_nh_params {
>> +	u8 nh_family;
>> +	union {
>> +		__u32 ipv4_nh;
>> +		struct in6_addr ipv6_nh;
>> +	};
>> +};
>
> Folks, not directly related to this set, but there's a SRv6 patch going
> around which adds ifindex, otherwise nh can't be link local.
>
> I wonder if we want to consider this use case from the start (or the
> close approximation of start in this case ;)).

The ifindex is there, it's just in the function call signature instead
of the struct... Or did you mean something different?

-Toke



* Re: [PATCH bpf v2 1/3] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
  2020-10-20 16:30   ` Jakub Kicinski
@ 2020-10-20 18:08     ` Toke Høiland-Jørgensen
  2020-10-20 19:01       ` Jakub Kicinski
  2020-10-20 18:12     ` David Ahern
  1 sibling, 1 reply; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-20 18:08 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Daniel Borkmann, David Ahern, netdev, bpf

Jakub Kicinski <kuba@kernel.org> writes:

> On Tue, 20 Oct 2020 12:51:02 +0200 Toke Høiland-Jørgensen wrote:
>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>> index 20fc24c9779a..ba9de7188cd0 100644
>> --- a/include/linux/filter.h
>> +++ b/include/linux/filter.h
>> @@ -607,12 +607,21 @@ struct bpf_skb_data_end {
>>  	void *data_end;
>>  };
>>  
>> +struct bpf_nh_params {
>> +	u8 nh_family;
>> +	union {
>> +		__u32 ipv4_nh;
>> +		struct in6_addr ipv6_nh;
>> +	};
>> +};
>
>> @@ -4906,6 +4910,18 @@ struct bpf_fib_lookup {
>>  	__u8	dmac[6];     /* ETH_ALEN */
>>  };
>>  
>> +struct bpf_redir_neigh {
>> +	/* network family for lookup (AF_INET, AF_INET6) */
>> +	__u8 nh_family;
>> +	 /* avoid hole in struct - must be set to 0 */
>> +	__u8 unused[3];
>> +	/* network address of nexthop; skips fib lookup to find gateway */
>> +	union {
>> +		__be32		ipv4_nh;
>> +		__u32		ipv6_nh[4];  /* in6_addr; network order */
>> +	};
>> +};
>
> Isn't this backward? The hole could be named in the internal structure.
> This is a bit of a gray area, but if you name this hole in uAPI and
> programs start referring to it you will never be able to reuse it.
> So you may as well not require it to be zeroed..

Hmm, yeah, I suppose you're right. Doesn't the verifier prevent any part
of the memory from being uninitialised anyway? I seem to recall having run
into verifier complaints when I didn't initialise a struct on the stack...

-Toke



* Re: [PATCH bpf v2 1/3] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
  2020-10-20 15:08   ` Daniel Borkmann
@ 2020-10-20 18:08     ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-20 18:08 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: David Ahern, netdev, bpf

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 10/20/20 12:51 PM, Toke Høiland-Jørgensen wrote:
>> From: Toke Høiland-Jørgensen <toke@redhat.com>
> [...]
>>   BPF_CALL_3(bpf_clone_redirect, struct sk_buff *, skb, u32, ifindex, u64, flags)
>> @@ -2455,8 +2487,8 @@ int skb_do_redirect(struct sk_buff *skb)
>>   		return -EAGAIN;
>>   	}
>>   	return flags & BPF_F_NEIGH ?
>> -	       __bpf_redirect_neigh(skb, dev) :
>> -	       __bpf_redirect(skb, dev, flags);
>> +		__bpf_redirect_neigh(skb, dev, flags & BPF_F_NEXTHOP ? &ri->nh : NULL) :
>> +		__bpf_redirect(skb, dev, flags);
>>   out_drop:
>>   	kfree_skb(skb);
>>   	return -EINVAL;
>> @@ -2504,16 +2536,25 @@ static const struct bpf_func_proto bpf_redirect_peer_proto = {
>>   	.arg2_type      = ARG_ANYTHING,
>>   };
>>   
>> -BPF_CALL_2(bpf_redirect_neigh, u32, ifindex, u64, flags)
>> +BPF_CALL_4(bpf_redirect_neigh, u32, ifindex, struct bpf_redir_neigh *, params,
>> +	   int, plen, u64, flags)
>>   {
>>   	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
>>   
>> -	if (unlikely(flags))
>> +	if (unlikely((plen && plen < sizeof(*params)) || flags))
>> +		return TC_ACT_SHOT;
>> +
>> +	if (unlikely(plen && (params->unused[0] || params->unused[1] ||
>> +			      params->unused[2])))
>
> small nit: maybe fold this into the prior check that already tests non-zero plen
>
> if (unlikely((plen && (plen < sizeof(*params) ||
>                         (params->unused[0] | params->unused[1] |
>                          params->unused[2]))) || flags))
>          return TC_ACT_SHOT;

Well that was my first thought as well, but I thought it was uglier.
Isn't the compiler smart enough to make those two equivalent?

Given Jakub's comment, I guess this is moot anyway, as we should
just get rid of the member, no?

-Toke



* Re: [PATCH bpf v2 2/3] bpf_fib_lookup: optionally skip neighbour lookup
  2020-10-20 15:04     ` Daniel Borkmann
@ 2020-10-20 18:10       ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 19+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-20 18:10 UTC (permalink / raw)
  To: Daniel Borkmann, David Ahern; +Cc: David Ahern, netdev, bpf

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 10/20/20 3:49 PM, David Ahern wrote:
>> On 10/20/20 4:51 AM, Toke Høiland-Jørgensen wrote:
>>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>>
>>> The bpf_fib_lookup() helper performs a neighbour lookup for the destination
>>> IP and returns BPF_FIB_LKUP_RET_NO_NEIGH if this fails, with the expectation
>>> that the BPF program will deal with this condition, either by passing the
>>> packet up the stack, or by using bpf_redirect_neigh().
>>>
>>> The neighbour lookup is done via a hash table (through ___neigh_lookup_noref()),
>>> which incurs some overhead. If the caller knows this is likely to fail
>>> anyway, it may want to skip that and go unconditionally to
>>> bpf_redirect_neigh(). For this use case, add a flag to bpf_fib_lookup()
>>> that will make it skip the neighbour lookup and instead always return
>>> BPF_FIB_LKUP_RET_NO_NEIGH (but still populate the gateway and target
>>> ifindex).
>>>
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>> ---
>>>   include/uapi/linux/bpf.h       |   10 ++++++----
>>>   net/core/filter.c              |   16 ++++++++++++++--
>>>   tools/include/uapi/linux/bpf.h |   10 ++++++----
>>>   3 files changed, 26 insertions(+), 10 deletions(-)
>> 
>> Nack. Please don't.
>> 
>> As I mentioned in my reply to Daniel, I would prefer such logic be
>> pushed to the bpf programs. There is no reason for rare run time events
>> to warrant a new flag and new check in the existing FIB helpers. The bpf
>> programs can take the hit of the extra lookup.
>
> Fair enough, let's push it to progs then.

OK, with this and the other comments, this goes back to v1 + the
compilation fix. Will send that as v3...

-Toke



* Re: [PATCH bpf v2 1/3] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
  2020-10-20 16:30   ` Jakub Kicinski
  2020-10-20 18:08     ` Toke Høiland-Jørgensen
@ 2020-10-20 18:12     ` David Ahern
  2020-10-20 18:56       ` Jakub Kicinski
  1 sibling, 1 reply; 19+ messages in thread
From: David Ahern @ 2020-10-20 18:12 UTC (permalink / raw)
  To: Jakub Kicinski, Toke Høiland-Jørgensen
  Cc: Daniel Borkmann, David Ahern, netdev, bpf

On 10/20/20 10:30 AM, Jakub Kicinski wrote:
> On Tue, 20 Oct 2020 12:51:02 +0200 Toke Høiland-Jørgensen wrote:
>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>> index 20fc24c9779a..ba9de7188cd0 100644
>> --- a/include/linux/filter.h
>> +++ b/include/linux/filter.h
>> @@ -607,12 +607,21 @@ struct bpf_skb_data_end {
>>  	void *data_end;
>>  };
>>  
>> +struct bpf_nh_params {
>> +	u8 nh_family;
>> +	union {
>> +		__u32 ipv4_nh;
>> +		struct in6_addr ipv6_nh;
>> +	};
>> +};
> 
>> @@ -4906,6 +4910,18 @@ struct bpf_fib_lookup {
>>  	__u8	dmac[6];     /* ETH_ALEN */
>>  };
>>  
>> +struct bpf_redir_neigh {
>> +	/* network family for lookup (AF_INET, AF_INET6) */
>> +	__u8 nh_family;
>> +	 /* avoid hole in struct - must be set to 0 */
>> +	__u8 unused[3];
>> +	/* network address of nexthop; skips fib lookup to find gateway */
>> +	union {
>> +		__be32		ipv4_nh;
>> +		__u32		ipv6_nh[4];  /* in6_addr; network order */
>> +	};
>> +};
> 
> Isn't this backward? The hole could be named in the internal structure.
> This is a bit of a gray area, but if you name this hole in uAPI and
> programs start referring to it you will never be able to reuse it.
> So you may as well not require it to be zeroed..
> 

For uapi, naming the holes, stating they are unused, and requiring a 0
value allows them to be used later if an API change needs them.


* Re: [PATCH bpf v2 1/3] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
  2020-10-20 18:03     ` Toke Høiland-Jørgensen
@ 2020-10-20 18:14       ` David Ahern
  2020-10-20 18:50         ` Jakub Kicinski
  0 siblings, 1 reply; 19+ messages in thread
From: David Ahern @ 2020-10-20 18:14 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Jakub Kicinski
  Cc: Daniel Borkmann, David Ahern, netdev, bpf

On 10/20/20 12:03 PM, Toke Høiland-Jørgensen wrote:
> Jakub Kicinski <kuba@kernel.org> writes:
> 
>> On Tue, 20 Oct 2020 12:51:02 +0200 Toke Høiland-Jørgensen wrote:
>>> +struct bpf_nh_params {
>>> +	u8 nh_family;
>>> +	union {
>>> +		__u32 ipv4_nh;
>>> +		struct in6_addr ipv6_nh;
>>> +	};
>>> +};
>>
>> Folks, not directly related to this set, but there's a SRv6 patch going
>> around which adds ifindex, otherwise nh can't be link local.
>>
>> I wonder if we want to consider this use case from the start (or the
>> close approximation of start in this case ;)).
> 
> The ifindex is there, it's just in the function call signature instead
> of the struct... Or did you mean something different?
> 

ifindex as the first argument qualifies the device for the address.
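
As a userspace analogy only (this is not the helper's code): the sockets API
has the same pairing, where a link-local IPv6 address is only usable together
with an interface index in sin6_scope_id. A minimal sketch, assuming a
hypothetical make_scoped_addr() helper:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: build an IPv6 socket address from text and attach
 * the interface index as scope when the address is link-local.
 * Returns false if the text does not parse as an IPv6 address.
 */
static bool make_scoped_addr(const char *text, unsigned int ifindex,
			     struct sockaddr_in6 *out)
{
	memset(out, 0, sizeof(*out));
	out->sin6_family = AF_INET6;

	if (inet_pton(AF_INET6, text, &out->sin6_addr) != 1)
		return false;

	/* fe80::/10 is ambiguous without knowing which link it is on. */
	if (IN6_IS_ADDR_LINKLOCAL(&out->sin6_addr))
		out->sin6_scope_id = ifindex;

	return true;
}
```

The ifindex argument of bpf_redirect_neigh() plays the role that the scope id
plays here, which is why the nexthop struct itself does not need one.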


* Re: [PATCH bpf v2 1/3] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
  2020-10-20 18:14       ` David Ahern
@ 2020-10-20 18:50         ` Jakub Kicinski
  0 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2020-10-20 18:50 UTC (permalink / raw)
  To: David Ahern
  Cc: Toke Høiland-Jørgensen, Daniel Borkmann, David Ahern,
	netdev, bpf

On Tue, 20 Oct 2020 12:14:16 -0600 David Ahern wrote:
> On 10/20/20 12:03 PM, Toke Høiland-Jørgensen wrote:
> > Jakub Kicinski <kuba@kernel.org> writes:
> >> On Tue, 20 Oct 2020 12:51:02 +0200 Toke Høiland-Jørgensen wrote:  
> >>> +struct bpf_nh_params {
> >>> +	u8 nh_family;
> >>> +	union {
> >>> +		__u32 ipv4_nh;
> >>> +		struct in6_addr ipv6_nh;
> >>> +	};
> >>> +};  
> >>
> >> Folks, not directly related to this set, but there's a SRv6 patch going
> >> around which adds ifindex, otherwise nh can't be link local.
> >>
> >> I wonder if we want to consider this use case from the start (or the
> >> close approximation of start in this case ;)).  
> > 
> > The ifindex is there, it's just in the function call signature instead
> > of the struct... Or did you mean something different?
> 
> ifindex as the first argument qualifies the device for the address.

Ah, I should have read closer. Seeing there is a plen I assumed all
args would naturally be in the structure, but I'm guessing the case
where params are NULL will be quite common. Don't mind me.


* Re: [PATCH bpf v2 1/3] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
  2020-10-20 18:12     ` David Ahern
@ 2020-10-20 18:56       ` Jakub Kicinski
  0 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2020-10-20 18:56 UTC (permalink / raw)
  To: David Ahern
  Cc: Toke Høiland-Jørgensen, Daniel Borkmann, David Ahern,
	netdev, bpf

On Tue, 20 Oct 2020 12:12:32 -0600 David Ahern wrote:
> On 10/20/20 10:30 AM, Jakub Kicinski wrote:
> > On Tue, 20 Oct 2020 12:51:02 +0200 Toke Høiland-Jørgensen wrote:  
> >> diff --git a/include/linux/filter.h b/include/linux/filter.h
> >> index 20fc24c9779a..ba9de7188cd0 100644
> >> --- a/include/linux/filter.h
> >> +++ b/include/linux/filter.h
> >> @@ -607,12 +607,21 @@ struct bpf_skb_data_end {
> >>  	void *data_end;
> >>  };
> >>  
> >> +struct bpf_nh_params {
> >> +	u8 nh_family;
> >> +	union {
> >> +		__u32 ipv4_nh;
> >> +		struct in6_addr ipv6_nh;
> >> +	};
> >> +};  
> >   
> >> @@ -4906,6 +4910,18 @@ struct bpf_fib_lookup {
> >>  	__u8	dmac[6];     /* ETH_ALEN */
> >>  };
> >>  
> >> +struct bpf_redir_neigh {
> >> +	/* network family for lookup (AF_INET, AF_INET6) */
> >> +	__u8 nh_family;
> >> +	 /* avoid hole in struct - must be set to 0 */
> >> +	__u8 unused[3];
> >> +	/* network address of nexthop; skips fib lookup to find gateway */
> >> +	union {
> >> +		__be32		ipv4_nh;
> >> +		__u32		ipv6_nh[4];  /* in6_addr; network order */
> >> +	};
> >> +};  
> > 
> > Isn't this backward? The hole could be named in the internal structure.
> > This is a bit of a gray area, but if you name this hole in uAPI and
> > programs start referring to it you will never be able to reuse it.
> > So you may as well not require it to be zeroed..
> 
> For uapi, naming the holes, stating they are unused, and requiring a 0
> value allows them to be used later if an API change needs them.

I'm not sure what you're saying; if the field is referenced it can't be
removed. But we could use a union, so I guess it's not a deal breaker.


* Re: [PATCH bpf v2 1/3] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
  2020-10-20 18:08     ` Toke Høiland-Jørgensen
@ 2020-10-20 19:01       ` Jakub Kicinski
  2020-10-20 19:47         ` Daniel Borkmann
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2020-10-20 19:01 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Daniel Borkmann, David Ahern, netdev, bpf

On Tue, 20 Oct 2020 20:08:18 +0200 Toke Høiland-Jørgensen wrote:
> > Isn't this backward? The hole could be named in the internal structure.
> > This is a bit of a gray area, but if you name this hole in uAPI and
> > programs start referring to it you will never be able to reuse it.
> > So you may as well not require it to be zeroed..  
> 
> Hmm, yeah, I suppose you're right. Doesn't the verifier prevent any part
> of the memory from being uninitialised anyway? I seem to recall having run
> into verifier complaints when I didn't initialise a struct on the stack...

Good point, in which case we have a convenient way to zero the hole
after nh_family but no convenient way to zero the empty address space
for IPv4 :) (even though that one only needs to be zeroed for the
verifier)
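
One way a caller might sidestep both cases is to clear the whole struct before
filling it in, much as the selftest does with an empty initialiser. A minimal
userspace sketch, assuming a stand-in struct with the quoted layout; the names
redir_neigh_params, fill_ipv4_nexthop() and rest_is_zero() are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Userspace stand-in; field layout copied from the quoted uAPI struct. */
struct redir_neigh_params {
	uint8_t nh_family;
	uint8_t unused[3];
	union {
		uint32_t ipv4_nh;
		uint32_t ipv6_nh[4];
	};
};

/* Zero every byte first, then set only the fields an IPv4 nexthop needs. */
static void fill_ipv4_nexthop(struct redir_neigh_params *p, uint32_t addr_be)
{
	memset(p, 0, sizeof(*p));
	p->nh_family = AF_INET;
	p->ipv4_nh = addr_be;
}

/* True if every byte outside nh_family and ipv4_nh is zero. */
static bool rest_is_zero(const struct redir_neigh_params *p)
{
	const unsigned char *bytes = (const unsigned char *)p;
	size_t v4_start = offsetof(struct redir_neigh_params, ipv4_nh);
	size_t v4_end = v4_start + sizeof(p->ipv4_nh);

	/* Byte 0 is nh_family, so start at 1. */
	for (size_t i = 1; i < sizeof(*p); i++) {
		if (i >= v4_start && i < v4_end)
			continue;
		if (bytes[i])
			return false;
	}
	return true;
}
```

This clears the padding after nh_family and the 12 bytes of the union that an
IPv4 address does not cover in one step.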


* Re: [PATCH bpf v2 1/3] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
  2020-10-20 19:01       ` Jakub Kicinski
@ 2020-10-20 19:47         ` Daniel Borkmann
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Borkmann @ 2020-10-20 19:47 UTC (permalink / raw)
  To: Jakub Kicinski, Toke Høiland-Jørgensen; +Cc: David Ahern, netdev, bpf

On 10/20/20 9:01 PM, Jakub Kicinski wrote:
> On Tue, 20 Oct 2020 20:08:18 +0200 Toke Høiland-Jørgensen wrote:
>>> Isn't this backward? The hole could be named in the internal structure.
>>> This is a bit of a gray area, but if you name this hole in uAPI and
>>> programs start referring to it you will never be able to reuse it.
>>> So you may as well not require it to be zeroed..
>>
>> Hmm, yeah, I suppose you're right. Doesn't the verifier prevent any part
>> of the memory from being uninitialised anyway? I seem to recall having run
>> into verifier complaints when I didn't initialise a struct on the stack...
> 
> Good point, in which case we have a convenient way to zero the hole
> after nh_family but no convenient way to zero the empty address space
> for IPv4 :) (even though that one only needs to be zeroed for the
> verifier)

Technically, it's uninitialized, so it holds zero or any other garbage left over
from the program's previous use of the BPF stack. We could use a couple of
__u8 :8 after nh_family to have an unnamed placeholder (like in __bpf_md_ptr()),
or we might as well just switch to __u32 nh_family and avoid the hole that way
(which also gets rid of the extra check) ... given we have the liberty to extend
later anyway if ever needed.
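
To make the options concrete, here is a rough userspace sketch of the three
layouts under discussion. The struct names and the layouts_match() helper are
hypothetical, and the unnamed-placeholder variant relies on the common
compiler extension that allows bit-fields of a type other than int.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* v2 layout: named padding that the helper requires to be zero. */
struct nh_named_pad {
	uint8_t nh_family;
	uint8_t unused[3];
	union {
		uint32_t ipv4_nh;
		uint32_t ipv6_nh[4];
	};
};

/* Unnamed placeholders: the bytes exist but programs cannot refer to them. */
struct nh_unnamed_pad {
	uint8_t nh_family;
	uint8_t :8;
	uint8_t :8;
	uint8_t :8;
	union {
		uint32_t ipv4_nh;
		uint32_t ipv6_nh[4];
	};
};

/* Widened family field: there is no hole left to name or check. */
struct nh_wide_family {
	uint32_t nh_family;
	union {
		uint32_t ipv4_nh;
		uint32_t ipv6_nh[4];
	};
};

/* True if all three variants have the same size and address offset. */
static bool layouts_match(void)
{
	return sizeof(struct nh_named_pad) == sizeof(struct nh_unnamed_pad) &&
	       sizeof(struct nh_named_pad) == sizeof(struct nh_wide_family) &&
	       offsetof(struct nh_named_pad, ipv4_nh) ==
		       offsetof(struct nh_unnamed_pad, ipv4_nh) &&
	       offsetof(struct nh_named_pad, ipv4_nh) ==
		       offsetof(struct nh_wide_family, ipv4_nh);
}
```

Only the size and the offset of the address are compared; in the widened
variant the byte that holds the family value depends on endianness, so the
variants are not interchangeable byte for byte.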

