* [PATCH RFC bpf-next 0/2] bpf: Rework bpf_redirect_neigh() to allow supplying nexthop from caller
@ 2020-10-15 15:46 Toke Høiland-Jørgensen
2020-10-15 15:46 ` [PATCH RFC bpf-next 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter Toke Høiland-Jørgensen
2020-10-15 15:46 ` [PATCH RFC bpf-next 2/2] selftests: Update test_tc_neigh to use the modified bpf_redirect_neigh() Toke Høiland-Jørgensen
0 siblings, 2 replies; 12+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-15 15:46 UTC
To: Daniel Borkmann; +Cc: David Ahern, netdev, bpf
Based on previous discussion[0], we determined that it would be beneficial to
rework bpf_redirect_neigh() so the caller can supply the nexthop information
(e.g., from a previous call to bpf_fib_lookup()). This way, the two helpers can
be combined without incurring a second FIB lookup to find the nexthop, and
bpf_fib_lookup() becomes usable even if no neighbour entry currently exists for
the nexthop.
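The flow described above can be modelled in plain userspace C. This is only a
sketch of the decision a tc program would make after bpf_fib_lookup(); the enum
names below are hypothetical stand-ins invented for illustration, not kernel
constants, and no real helper is called.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the bpf_fib_lookup() outcomes discussed here. */
enum fib_result_sketch {
	FIB_SKETCH_SUCCESS,  /* route and neighbour entry both found */
	FIB_SKETCH_NO_NEIGH, /* route found, but no neighbour entry yet */
	FIB_SKETCH_OTHER,    /* anything else: not forwardable here */
};

/* Hypothetical action labels; they are not kernel constants. */
enum action_sketch {
	ACT_SKETCH_REDIRECT,       /* L2 addresses known: plain bpf_redirect() */
	ACT_SKETCH_REDIRECT_NEIGH, /* hand the nexthop to bpf_redirect_neigh() */
	ACT_SKETCH_DROP,
};

/* Map a lookup outcome to the forwarding action, without a second lookup. */
enum action_sketch pick_action_sketch(enum fib_result_sketch res)
{
	switch (res) {
	case FIB_SKETCH_SUCCESS:
		return ACT_SKETCH_REDIRECT;
	case FIB_SKETCH_NO_NEIGH:
		return ACT_SKETCH_REDIRECT_NEIGH;
	default:
		return ACT_SKETCH_DROP;
	}
}
```

The point of the middle case is that the nexthop found by the first lookup is
reused, so the redirect helper only has to resolve the neighbour.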
This patch (and accompanying selftest update) accomplishes this by way of an
optional parameter to bpf_redirect_neigh(). This is an API change, and so should
really be merged into the bpf tree to be part of the 5.10 cycle; however, since
bpf-next has not yet been merged into bpf, I'm sending this as an RFC against
bpf-next for discussion, and will repost against bpf once that merge happens
(Daniel, unless you have a better way of doing this, of course).
-Toke
[0] https://lore.kernel.org/bpf/393e17fc-d187-3a8d-2f0d-a627c7c63fca@iogearbox.net/
---
Toke Høiland-Jørgensen (2):
bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
selftests: Update test_tc_neigh to use the modified bpf_redirect_neigh()
.../selftests/bpf/progs/test_tc_neigh.c | 83 ++++++++++++++++---
.../testing/selftests/bpf/test_tc_redirect.sh | 8 +-
2 files changed, 78 insertions(+), 13 deletions(-)
* [PATCH RFC bpf-next 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
2020-10-15 15:46 [PATCH RFC bpf-next 0/2] bpf: Rework bpf_redirect_neigh() to allow supplying nexthop from caller Toke Høiland-Jørgensen
@ 2020-10-15 15:46 ` Toke Høiland-Jørgensen
2020-10-15 16:27 ` David Ahern
` (2 more replies)
2020-10-15 15:46 ` [PATCH RFC bpf-next 2/2] selftests: Update test_tc_neigh to use the modified bpf_redirect_neigh() Toke Høiland-Jørgensen
1 sibling, 3 replies; 12+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-15 15:46 UTC
To: Daniel Borkmann; +Cc: David Ahern, netdev, bpf
From: Toke Høiland-Jørgensen <toke@redhat.com>
Based on the discussion in [0], update the bpf_redirect_neigh() helper to
accept an optional parameter specifying the nexthop information. This makes
it possible to combine bpf_fib_lookup() and bpf_redirect_neigh() without
incurring a duplicate FIB lookup - since the FIB lookup helper will return
the nexthop information even if no neighbour is present, this can simply be
passed on to bpf_redirect_neigh() if bpf_fib_lookup() returns
BPF_FIB_LKUP_RET_NO_NEIGH.
[0] https://lore.kernel.org/bpf/393e17fc-d187-3a8d-2f0d-a627c7c63fca@iogearbox.net/
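For readers who want the argument handling in isolation, here is a userspace
model of how the helper in this patch treats the optional parameter. It is a
sketch under assumptions: all type and function names ending in _sketch are
invented for illustration, plen is taken as size_t rather than int, and the
NULL guard is an extra check added only so the model is safe to run outside
the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for the nexthop parameter struct; layout is illustrative. */
struct nh_params_sketch {
	uint8_t nh_family; /* 0 means "no nexthop supplied" */
	union {
		uint32_t ipv4_nh;
		uint32_t ipv6_nh[4];
	};
};

/* Hypothetical labels standing in for the tc verdicts returned by the helper. */
enum { SKETCH_SHOT = 0, SKETCH_REDIRECT = 1 };

/* Stand-in for the per-CPU redirect state that carries the nexthop. */
struct redirect_state_sketch {
	uint32_t tgt_index;
	struct nh_params_sketch nh;
};

int redirect_neigh_sketch(struct redirect_state_sketch *ri, uint32_t ifindex,
			  const struct nh_params_sketch *params, size_t plen,
			  uint64_t flags)
{
	assert(ri != NULL);

	/* Reject a too-short params buffer and any non-zero flags. */
	if ((plen && plen < sizeof(*params)) || flags)
		return SKETCH_SHOT;
	/* Extra guard for this sketch only: never copy from a NULL pointer. */
	if (plen && !params)
		return SKETCH_SHOT;

	ri->tgt_index = ifindex;
	if (plen)
		memcpy(&ri->nh, params, sizeof(ri->nh));
	else
		ri->nh.nh_family = 0; /* clear any value left by an earlier call */

	return SKETCH_REDIRECT;
}
```

Clearing nh_family when no parameter is given matters because the state is
reused between calls; a stale nexthop would otherwise leak into a later
redirect that expects the helper to do its own FIB lookup.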
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
include/linux/filter.h | 9 ++
include/uapi/linux/bpf.h | 23 +++++-
net/core/filter.c | 152 +++++++++++++++++++++++++---------------
scripts/bpf_helpers_doc.py | 1
tools/include/uapi/linux/bpf.h | 23 +++++-
5 files changed, 143 insertions(+), 65 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 20fc24c9779a..ba9de7188cd0 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -607,12 +607,21 @@ struct bpf_skb_data_end {
void *data_end;
};
+struct bpf_nh_params {
+ u8 nh_family;
+ union {
+ __u32 ipv4_nh;
+ struct in6_addr ipv6_nh;
+ };
+};
+
struct bpf_redirect_info {
u32 flags;
u32 tgt_index;
void *tgt_value;
struct bpf_map *map;
u32 kern_flags;
+ struct bpf_nh_params nh;
};
DECLARE_PER_CPU(struct bpf_redirect_info, bpf_redirect_info);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index bf5a99d803e4..980cc1363be8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3677,15 +3677,19 @@ union bpf_attr {
* Return
* The id is returned or 0 in case the id could not be retrieved.
*
- * long bpf_redirect_neigh(u32 ifindex, u64 flags)
+ * long bpf_redirect_neigh(u32 ifindex, struct bpf_redir_neigh *params, int plen, u64 flags)
* Description
* Redirect the packet to another net device of index *ifindex*
* and fill in L2 addresses from neighboring subsystem. This helper
* is somewhat similar to **bpf_redirect**\ (), except that it
* populates L2 addresses as well, meaning, internally, the helper
- * performs a FIB lookup based on the skb's networking header to
- * get the address of the next hop and then relies on the neighbor
- * lookup for the L2 address of the nexthop.
+ * relies on the neighbor lookup for the L2 address of the nexthop.
+ *
+ * The helper will perform a FIB lookup based on the skb's
+ * networking header to get the address of the next hop, unless
+ * this is supplied by the caller in the *params* argument. The
+ * *plen* argument indicates the len of *params* and should be set
+ * to 0 if *params* is NULL.
*
* The *flags* argument is reserved and must be 0. The helper is
* currently only supported for tc BPF program types, and enabled
@@ -4906,6 +4910,17 @@ struct bpf_fib_lookup {
__u8 dmac[6]; /* ETH_ALEN */
};
+struct bpf_redir_neigh {
+ /* network family for lookup (AF_INET, AF_INET6)
+ */
+ __u8 nh_family;
+ /* network address of nexthop; skips fib lookup to find gateway */
+ union {
+ __be32 ipv4_nh;
+ __u32 ipv6_nh[4]; /* in6_addr; network order */
+ };
+};
+
enum bpf_task_fd_type {
BPF_FD_TYPE_RAW_TRACEPOINT, /* tp name */
BPF_FD_TYPE_TRACEPOINT, /* tp name */
diff --git a/net/core/filter.c b/net/core/filter.c
index c5e2a1c5fd8d..d073031a3a61 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2165,12 +2165,11 @@ static int __bpf_redirect(struct sk_buff *skb, struct net_device *dev,
}
#if IS_ENABLED(CONFIG_IPV6)
-static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb)
+static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb,
+ struct net_device *dev, const struct in6_addr *nexthop)
{
- struct dst_entry *dst = skb_dst(skb);
- struct net_device *dev = dst->dev;
u32 hh_len = LL_RESERVED_SPACE(dev);
- const struct in6_addr *nexthop;
+ struct dst_entry *dst = NULL;
struct neighbour *neigh;
if (dev_xmit_recursion()) {
@@ -2196,8 +2195,11 @@ static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb)
}
rcu_read_lock_bh();
- nexthop = rt6_nexthop(container_of(dst, struct rt6_info, dst),
- &ipv6_hdr(skb)->daddr);
+ if (!nexthop) {
+ dst = skb_dst(skb);
+ nexthop = rt6_nexthop(container_of(dst, struct rt6_info, dst),
+ &ipv6_hdr(skb)->daddr);
+ }
neigh = ip_neigh_gw6(dev, nexthop);
if (likely(!IS_ERR(neigh))) {
int ret;
@@ -2210,36 +2212,46 @@ static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb)
return ret;
}
rcu_read_unlock_bh();
- IP6_INC_STATS(dev_net(dst->dev),
- ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);
+ if (dst)
+ IP6_INC_STATS(dev_net(dst->dev),
+ ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);
out_drop:
kfree_skb(skb);
return -ENETDOWN;
}
-static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev)
+static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev,
+ struct bpf_nh_params *nh)
{
const struct ipv6hdr *ip6h = ipv6_hdr(skb);
+ struct in6_addr *nexthop = NULL;
struct net *net = dev_net(dev);
int err, ret = NET_XMIT_DROP;
- struct dst_entry *dst;
- struct flowi6 fl6 = {
- .flowi6_flags = FLOWI_FLAG_ANYSRC,
- .flowi6_mark = skb->mark,
- .flowlabel = ip6_flowinfo(ip6h),
- .flowi6_oif = dev->ifindex,
- .flowi6_proto = ip6h->nexthdr,
- .daddr = ip6h->daddr,
- .saddr = ip6h->saddr,
- };
- dst = ipv6_stub->ipv6_dst_lookup_flow(net, NULL, &fl6, NULL);
- if (IS_ERR(dst))
- goto out_drop;
+ if (!nh->nh_family) {
+ struct dst_entry *dst;
+ struct flowi6 fl6 = {
+ .flowi6_flags = FLOWI_FLAG_ANYSRC,
+ .flowi6_mark = skb->mark,
+ .flowlabel = ip6_flowinfo(ip6h),
+ .flowi6_oif = dev->ifindex,
+ .flowi6_proto = ip6h->nexthdr,
+ .daddr = ip6h->daddr,
+ .saddr = ip6h->saddr,
+ };
+
+ dst = ipv6_stub->ipv6_dst_lookup_flow(net, NULL, &fl6, NULL);
+ if (IS_ERR(dst))
+ goto out_drop;
- skb_dst_set(skb, dst);
+ skb_dst_set(skb, dst);
+ } else if (nh->nh_family == AF_INET6) {
+ nexthop = &nh->ipv6_nh;
+ } else {
+ goto out_drop;
+ }
- err = bpf_out_neigh_v6(net, skb);
+ err = bpf_out_neigh_v6(net, skb, dev, nexthop);
if (unlikely(net_xmit_eval(err)))
dev->stats.tx_errors++;
else
@@ -2260,11 +2272,9 @@ static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev)
#endif /* CONFIG_IPV6 */
#if IS_ENABLED(CONFIG_INET)
-static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb)
+static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb,
+ struct net_device *dev, struct bpf_nh_params *nh)
{
- struct dst_entry *dst = skb_dst(skb);
- struct rtable *rt = container_of(dst, struct rtable, dst);
- struct net_device *dev = dst->dev;
u32 hh_len = LL_RESERVED_SPACE(dev);
struct neighbour *neigh;
bool is_v6gw = false;
@@ -2292,7 +2302,20 @@ static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb)
}
rcu_read_lock_bh();
- neigh = ip_neigh_for_gw(rt, skb, &is_v6gw);
+ if (!nh) {
+ struct dst_entry *dst = skb_dst(skb);
+ struct rtable *rt = container_of(dst, struct rtable, dst);
+
+ neigh = ip_neigh_for_gw(rt, skb, &is_v6gw);
+ } else if (nh->nh_family == AF_INET6) {
+ neigh = ip_neigh_gw6(dev, &nh->ipv6_nh);
+ is_v6gw = true;
+ } else if (nh->nh_family == AF_INET) {
+ neigh = ip_neigh_gw4(dev, nh->ipv4_nh);
+ } else {
+ goto out_drop;
+ }
+
if (likely(!IS_ERR(neigh))) {
int ret;
@@ -2309,33 +2332,38 @@ static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb)
return -ENETDOWN;
}
-static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev)
+static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev,
+ struct bpf_nh_params *nh)
{
const struct iphdr *ip4h = ip_hdr(skb);
struct net *net = dev_net(dev);
int err, ret = NET_XMIT_DROP;
- struct rtable *rt;
- struct flowi4 fl4 = {
- .flowi4_flags = FLOWI_FLAG_ANYSRC,
- .flowi4_mark = skb->mark,
- .flowi4_tos = RT_TOS(ip4h->tos),
- .flowi4_oif = dev->ifindex,
- .flowi4_proto = ip4h->protocol,
- .daddr = ip4h->daddr,
- .saddr = ip4h->saddr,
- };
- rt = ip_route_output_flow(net, &fl4, NULL);
- if (IS_ERR(rt))
- goto out_drop;
- if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) {
- ip_rt_put(rt);
- goto out_drop;
- }
+ if (!nh->nh_family) {
+ struct rtable *rt;
+ struct flowi4 fl4 = {
+ .flowi4_flags = FLOWI_FLAG_ANYSRC,
+ .flowi4_mark = skb->mark,
+ .flowi4_tos = RT_TOS(ip4h->tos),
+ .flowi4_oif = dev->ifindex,
+ .flowi4_proto = ip4h->protocol,
+ .daddr = ip4h->daddr,
+ .saddr = ip4h->saddr,
+ };
+
+ rt = ip_route_output_flow(net, &fl4, NULL);
+ if (IS_ERR(rt))
+ goto out_drop;
+ if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) {
+ ip_rt_put(rt);
+ goto out_drop;
+ }
- skb_dst_set(skb, &rt->dst);
+ skb_dst_set(skb, &rt->dst);
+ nh = NULL;
+ }
- err = bpf_out_neigh_v4(net, skb);
+ err = bpf_out_neigh_v4(net, skb, dev, nh);
if (unlikely(net_xmit_eval(err)))
dev->stats.tx_errors++;
else
@@ -2355,7 +2383,8 @@ static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev)
}
#endif /* CONFIG_INET */
-static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev)
+static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev,
+ struct bpf_nh_params *nh)
{
struct ethhdr *ethh = eth_hdr(skb);
@@ -2370,9 +2399,9 @@ static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev)
skb_reset_network_header(skb);
if (skb->protocol == htons(ETH_P_IP))
- return __bpf_redirect_neigh_v4(skb, dev);
+ return __bpf_redirect_neigh_v4(skb, dev, nh);
else if (skb->protocol == htons(ETH_P_IPV6))
- return __bpf_redirect_neigh_v6(skb, dev);
+ return __bpf_redirect_neigh_v6(skb, dev, nh);
out:
kfree_skb(skb);
return -ENOTSUPP;
@@ -2455,8 +2484,8 @@ int skb_do_redirect(struct sk_buff *skb)
return -EAGAIN;
}
return flags & BPF_F_NEIGH ?
- __bpf_redirect_neigh(skb, dev) :
- __bpf_redirect(skb, dev, flags);
+ __bpf_redirect_neigh(skb, dev, &ri->nh) :
+ __bpf_redirect(skb, dev, flags);
out_drop:
kfree_skb(skb);
return -EINVAL;
@@ -2504,16 +2533,23 @@ static const struct bpf_func_proto bpf_redirect_peer_proto = {
.arg2_type = ARG_ANYTHING,
};
-BPF_CALL_2(bpf_redirect_neigh, u32, ifindex, u64, flags)
+BPF_CALL_4(bpf_redirect_neigh, u32, ifindex, struct bpf_redir_neigh *, params,
+ int, plen, u64, flags)
{
struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
- if (unlikely(flags))
+ if (unlikely((plen && plen < sizeof(*params)) || flags))
return TC_ACT_SHOT;
ri->flags = BPF_F_NEIGH;
ri->tgt_index = ifindex;
+ BUILD_BUG_ON(sizeof(struct bpf_redir_neigh) != sizeof(struct bpf_nh_params));
+ if (plen)
+ memcpy(&ri->nh, params, sizeof(ri->nh));
+ else
+ ri->nh.nh_family = 0; /* clear previous value */
+
return TC_ACT_REDIRECT;
}
@@ -2522,7 +2558,9 @@ static const struct bpf_func_proto bpf_redirect_neigh_proto = {
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_ANYTHING,
- .arg2_type = ARG_ANYTHING,
+ .arg2_type = ARG_PTR_TO_MEM_OR_NULL,
+ .arg3_type = ARG_CONST_SIZE_OR_ZERO,
+ .arg4_type = ARG_ANYTHING,
};
BPF_CALL_2(bpf_msg_apply_bytes, struct sk_msg *, msg, u32, bytes)
diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
index 7d86fdd190be..6769caae142f 100755
--- a/scripts/bpf_helpers_doc.py
+++ b/scripts/bpf_helpers_doc.py
@@ -453,6 +453,7 @@ class PrinterHelpers(Printer):
'struct bpf_perf_event_data',
'struct bpf_perf_event_value',
'struct bpf_pidns_info',
+ 'struct bpf_redir_neigh',
'struct bpf_sk_lookup',
'struct bpf_sock',
'struct bpf_sock_addr',
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index bf5a99d803e4..980cc1363be8 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3677,15 +3677,19 @@ union bpf_attr {
* Return
* The id is returned or 0 in case the id could not be retrieved.
*
- * long bpf_redirect_neigh(u32 ifindex, u64 flags)
+ * long bpf_redirect_neigh(u32 ifindex, struct bpf_redir_neigh *params, int plen, u64 flags)
* Description
* Redirect the packet to another net device of index *ifindex*
* and fill in L2 addresses from neighboring subsystem. This helper
* is somewhat similar to **bpf_redirect**\ (), except that it
* populates L2 addresses as well, meaning, internally, the helper
- * performs a FIB lookup based on the skb's networking header to
- * get the address of the next hop and then relies on the neighbor
- * lookup for the L2 address of the nexthop.
+ * relies on the neighbor lookup for the L2 address of the nexthop.
+ *
+ * The helper will perform a FIB lookup based on the skb's
+ * networking header to get the address of the next hop, unless
+ * this is supplied by the caller in the *params* argument. The
+ * *plen* argument indicates the len of *params* and should be set
+ * to 0 if *params* is NULL.
*
* The *flags* argument is reserved and must be 0. The helper is
* currently only supported for tc BPF program types, and enabled
@@ -4906,6 +4910,17 @@ struct bpf_fib_lookup {
__u8 dmac[6]; /* ETH_ALEN */
};
+struct bpf_redir_neigh {
+ /* network family for lookup (AF_INET, AF_INET6)
+ */
+ __u8 nh_family;
+ /* network address of nexthop; skips fib lookup to find gateway */
+ union {
+ __be32 ipv4_nh;
+ __u32 ipv6_nh[4]; /* in6_addr; network order */
+ };
+};
+
enum bpf_task_fd_type {
BPF_FD_TYPE_RAW_TRACEPOINT, /* tp name */
BPF_FD_TYPE_TRACEPOINT, /* tp name */
* [PATCH RFC bpf-next 2/2] selftests: Update test_tc_neigh to use the modified bpf_redirect_neigh()
2020-10-15 15:46 [PATCH RFC bpf-next 0/2] bpf: Rework bpf_redirect_neigh() to allow supplying nexthop from caller Toke Høiland-Jørgensen
2020-10-15 15:46 ` [PATCH RFC bpf-next 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter Toke Høiland-Jørgensen
@ 2020-10-15 15:46 ` Toke Høiland-Jørgensen
2020-10-19 14:40 ` Daniel Borkmann
1 sibling, 1 reply; 12+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-15 15:46 UTC
To: Daniel Borkmann; +Cc: David Ahern, netdev, bpf
From: Toke Høiland-Jørgensen <toke@redhat.com>
This updates the test_tc_neigh selftest to use the new syntax of
bpf_redirect_neigh(). To exercise the helper both with and without the
optional parameter, one forwarding direction is changed to do a
bpf_fib_lookup() followed by a call to bpf_redirect_neigh(), while the
other direction keeps using the map-based ifindex lookup and lets the redirect
helper resolve the nexthop from the FIB.
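The hand-off from the lookup result to the redirect parameter can be tried out
in userspace. The sketch below assumes, as the selftest does, that the
destination field of the lookup result holds the nexthop address afterwards;
the struct and function names are cut-down, hypothetical stand-ins rather than
the real uapi definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_AF_INET  2
#define SKETCH_AF_INET6 10

/* Cut-down stand-in for the address part of the FIB lookup result. */
struct fib_dst_sketch {
	uint8_t family;
	union {
		uint32_t ipv4_dst;
		uint32_t ipv6_dst[4];
	};
};

/* Cut-down stand-in for the nexthop parameter passed to the redirect helper. */
struct nh_params_sketch {
	uint8_t nh_family;
	union {
		uint32_t ipv4_nh;
		uint32_t ipv6_nh[4];
	};
};

void fill_nh_sketch(struct nh_params_sketch *nh, const struct fib_dst_sketch *fib)
{
	assert(nh != NULL && fib != NULL);

	memset(nh, 0, sizeof(*nh));
	nh->nh_family = fib->family;
	/*
	 * One 16-byte copy covers both families: the IPv4 field shares
	 * storage with the first word of the IPv6 field in both unions.
	 */
	memcpy(nh->ipv6_nh, fib->ipv6_dst, sizeof(nh->ipv6_nh));
}
```

Copying the larger union member avoids a per-family branch in the program, at
the cost of relying on both structs keeping the same union layout.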
This also fixes the test_tc_redirect.sh script to work on systems that have
a consolidated dual-stack 'ping' binary instead of separate ping/ping6
versions.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
tools/testing/selftests/bpf/progs/test_tc_neigh.c | 83 ++++++++++++++++++---
tools/testing/selftests/bpf/test_tc_redirect.sh | 8 +-
2 files changed, 78 insertions(+), 13 deletions(-)
diff --git a/tools/testing/selftests/bpf/progs/test_tc_neigh.c b/tools/testing/selftests/bpf/progs/test_tc_neigh.c
index fe182616b112..ba03e603ba9b 100644
--- a/tools/testing/selftests/bpf/progs/test_tc_neigh.c
+++ b/tools/testing/selftests/bpf/progs/test_tc_neigh.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include <stdint.h>
#include <stdbool.h>
+#include <stddef.h>
#include <linux/bpf.h>
#include <linux/stddef.h>
@@ -32,6 +33,9 @@
a.s6_addr32[3] == b.s6_addr32[3])
#endif
+#define AF_INET 2
+#define AF_INET6 10
+
enum {
dev_src,
dev_dst,
@@ -45,7 +49,8 @@ struct bpf_map_def SEC("maps") ifindex_map = {
};
static __always_inline bool is_remote_ep_v4(struct __sk_buff *skb,
- __be32 addr)
+ __be32 addr,
+ struct bpf_fib_lookup *fib_params)
{
void *data_end = ctx_ptr(skb->data_end);
void *data = ctx_ptr(skb->data);
@@ -58,11 +63,26 @@ static __always_inline bool is_remote_ep_v4(struct __sk_buff *skb,
if ((void *)(ip4h + 1) > data_end)
return false;
- return ip4h->daddr == addr;
+ if (ip4h->daddr != addr)
+ return false;
+
+ if (fib_params) {
+ fib_params->family = AF_INET;
+ fib_params->tos = ip4h->tos;
+ fib_params->l4_protocol = ip4h->protocol;
+ fib_params->sport = 0;
+ fib_params->dport = 0;
+ fib_params->tot_len = bpf_ntohs(ip4h->tot_len);
+ fib_params->ipv4_src = ip4h->saddr;
+ fib_params->ipv4_dst = ip4h->daddr;
+ }
+
+ return true;
}
static __always_inline bool is_remote_ep_v6(struct __sk_buff *skb,
- struct in6_addr addr)
+ struct in6_addr addr,
+ struct bpf_fib_lookup *fib_params)
{
void *data_end = ctx_ptr(skb->data_end);
void *data = ctx_ptr(skb->data);
@@ -75,7 +95,24 @@ static __always_inline bool is_remote_ep_v6(struct __sk_buff *skb,
if ((void *)(ip6h + 1) > data_end)
return false;
- return v6_equal(ip6h->daddr, addr);
+ if (!v6_equal(ip6h->daddr, addr))
+ return false;
+
+ if (fib_params) {
+ struct in6_addr *src = (struct in6_addr *)fib_params->ipv6_src;
+ struct in6_addr *dst = (struct in6_addr *)fib_params->ipv6_dst;
+
+ fib_params->family = AF_INET6;
+ fib_params->flowinfo = 0;
+ fib_params->l4_protocol = ip6h->nexthdr;
+ fib_params->sport = 0;
+ fib_params->dport = 0;
+ fib_params->tot_len = bpf_ntohs(ip6h->payload_len);
+ *src = ip6h->saddr;
+ *dst = ip6h->daddr;
+ }
+
+ return true;
}
static __always_inline int get_dev_ifindex(int which)
@@ -99,15 +136,17 @@ SEC("chk_egress") int tc_chk(struct __sk_buff *skb)
SEC("dst_ingress") int tc_dst(struct __sk_buff *skb)
{
+ struct bpf_fib_lookup fib_params = { .ifindex = skb->ingress_ifindex };
__u8 zero[ETH_ALEN * 2];
bool redirect = false;
+ int ret;
switch (skb->protocol) {
case __bpf_constant_htons(ETH_P_IP):
- redirect = is_remote_ep_v4(skb, __bpf_constant_htonl(ip4_src));
+ redirect = is_remote_ep_v4(skb, __bpf_constant_htonl(ip4_src), &fib_params);
break;
case __bpf_constant_htons(ETH_P_IPV6):
- redirect = is_remote_ep_v6(skb, (struct in6_addr)ip6_src);
+ redirect = is_remote_ep_v6(skb, (struct in6_addr)ip6_src, &fib_params);
break;
}
@@ -118,7 +157,31 @@ SEC("dst_ingress") int tc_dst(struct __sk_buff *skb)
if (bpf_skb_store_bytes(skb, 0, &zero, sizeof(zero), 0) < 0)
return TC_ACT_SHOT;
- return bpf_redirect_neigh(get_dev_ifindex(dev_src), 0);
+ ret = bpf_fib_lookup(skb, &fib_params, sizeof(fib_params), 0);
+ bpf_printk("bpf_fib_lookup() ret: %d\n", ret);
+ if (ret == BPF_FIB_LKUP_RET_SUCCESS) {
+ void *data_end = ctx_ptr(skb->data_end);
+ struct ethhdr *eth = ctx_ptr(skb->data);
+
+ if (eth + 1 > data_end)
+ return TC_ACT_SHOT;
+
+ __builtin_memcpy(eth->h_dest, fib_params.dmac, ETH_ALEN);
+ __builtin_memcpy(eth->h_source, fib_params.smac, ETH_ALEN);
+
+ return bpf_redirect(fib_params.ifindex, 0);
+
+ } else if (ret == BPF_FIB_LKUP_RET_NO_NEIGH) {
+ struct bpf_redir_neigh nh_params = {};
+
+ nh_params.nh_family = fib_params.family;
+ __builtin_memcpy(&nh_params.ipv6_nh, &fib_params.ipv6_dst,
+ sizeof(nh_params.ipv6_nh));
+
+ return bpf_redirect_neigh(fib_params.ifindex, &nh_params,
+ sizeof(nh_params), 0);
+ }
+ return TC_ACT_SHOT;
}
SEC("src_ingress") int tc_src(struct __sk_buff *skb)
@@ -128,10 +191,10 @@ SEC("src_ingress") int tc_src(struct __sk_buff *skb)
switch (skb->protocol) {
case __bpf_constant_htons(ETH_P_IP):
- redirect = is_remote_ep_v4(skb, __bpf_constant_htonl(ip4_dst));
+ redirect = is_remote_ep_v4(skb, __bpf_constant_htonl(ip4_dst), NULL);
break;
case __bpf_constant_htons(ETH_P_IPV6):
- redirect = is_remote_ep_v6(skb, (struct in6_addr)ip6_dst);
+ redirect = is_remote_ep_v6(skb, (struct in6_addr)ip6_dst, NULL);
break;
}
@@ -142,7 +205,7 @@ SEC("src_ingress") int tc_src(struct __sk_buff *skb)
if (bpf_skb_store_bytes(skb, 0, &zero, sizeof(zero), 0) < 0)
return TC_ACT_SHOT;
- return bpf_redirect_neigh(get_dev_ifindex(dev_dst), 0);
+ return bpf_redirect_neigh(get_dev_ifindex(dev_dst), NULL, 0, 0);
}
char __license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_tc_redirect.sh b/tools/testing/selftests/bpf/test_tc_redirect.sh
index 6d7482562140..09b20f24d018 100755
--- a/tools/testing/selftests/bpf/test_tc_redirect.sh
+++ b/tools/testing/selftests/bpf/test_tc_redirect.sh
@@ -24,8 +24,7 @@ command -v timeout >/dev/null 2>&1 || \
{ echo >&2 "timeout is not available"; exit 1; }
command -v ping >/dev/null 2>&1 || \
{ echo >&2 "ping is not available"; exit 1; }
-command -v ping6 >/dev/null 2>&1 || \
- { echo >&2 "ping6 is not available"; exit 1; }
+if command -v ping6 >/dev/null 2>&1; then PING6=ping6; else PING6=ping; fi
command -v perl >/dev/null 2>&1 || \
{ echo >&2 "perl is not available"; exit 1; }
command -v jq >/dev/null 2>&1 || \
@@ -152,7 +151,7 @@ netns_test_connectivity()
echo -e "${TEST}: ${GREEN}PASS${NC}"
TEST="ICMPv6 connectivity test"
- ip netns exec ${NS_SRC} ping6 $PING_ARG ${IP6_DST}
+ ip netns exec ${NS_SRC} $PING6 $PING_ARG ${IP6_DST}
if [ $? -ne 0 ]; then
echo -e "${TEST}: ${RED}FAIL${NC}"
exit 1
@@ -179,6 +178,9 @@ netns_setup_bpf()
ip netns exec ${NS_FWD} tc filter add dev veth_dst_fwd ingress bpf da obj $obj sec dst_ingress
ip netns exec ${NS_FWD} tc filter add dev veth_dst_fwd egress bpf da obj $obj sec chk_egress
+ # bpf_fib_lookup() checks if forwarding is enabled
+ ip netns exec ${NS_FWD} sysctl -w net.ipv4.ip_forward=1 net.ipv6.conf.veth_dst_fwd.forwarding=1
+
veth_src=$(ip netns exec ${NS_FWD} cat /sys/class/net/veth_src_fwd/ifindex)
veth_dst=$(ip netns exec ${NS_FWD} cat /sys/class/net/veth_dst_fwd/ifindex)
* Re: [PATCH RFC bpf-next 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
2020-10-15 15:46 ` [PATCH RFC bpf-next 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter Toke Høiland-Jørgensen
@ 2020-10-15 16:27 ` David Ahern
2020-10-15 19:34 ` Toke Høiland-Jørgensen
2020-10-19 14:48 ` Daniel Borkmann
2020-10-19 15:01 ` Daniel Borkmann
2 siblings, 1 reply; 12+ messages in thread
From: David Ahern @ 2020-10-15 16:27 UTC
To: Toke Høiland-Jørgensen, Daniel Borkmann
Cc: David Ahern, netdev, bpf
On 10/15/20 9:46 AM, Toke Høiland-Jørgensen wrote:
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index bf5a99d803e4..980cc1363be8 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -3677,15 +3677,19 @@ union bpf_attr {
> * Return
> * The id is returned or 0 in case the id could not be retrieved.
> *
> - * long bpf_redirect_neigh(u32 ifindex, u64 flags)
> + * long bpf_redirect_neigh(u32 ifindex, struct bpf_redir_neigh *params, int plen, u64 flags)
why not fold ifindex into params? with params and plen this should be
extensible later if needed.
A couple of nits below that caught my eye.
> * Description
> * Redirect the packet to another net device of index *ifindex*
> * and fill in L2 addresses from neighboring subsystem. This helper
> * is somewhat similar to **bpf_redirect**\ (), except that it
> * populates L2 addresses as well, meaning, internally, the helper
> - * performs a FIB lookup based on the skb's networking header to
> - * get the address of the next hop and then relies on the neighbor
> - * lookup for the L2 address of the nexthop.
> + * relies on the neighbor lookup for the L2 address of the nexthop.
> + *
> + * The helper will perform a FIB lookup based on the skb's
> + * networking header to get the address of the next hop, unless
> + * this is supplied by the caller in the *params* argument. The
> + * *plen* argument indicates the len of *params* and should be set
> + * to 0 if *params* is NULL.
> *
> * The *flags* argument is reserved and must be 0. The helper is
> * currently only supported for tc BPF program types, and enabled
> @@ -4906,6 +4910,17 @@ struct bpf_fib_lookup {
> __u8 dmac[6]; /* ETH_ALEN */
> };
>
> +struct bpf_redir_neigh {
> + /* network family for lookup (AF_INET, AF_INET6)
> + */
second line for the comment is not needed.
> + __u8 nh_family;
> + /* network address of nexthop; skips fib lookup to find gateway */
> + union {
> + __be32 ipv4_nh;
> + __u32 ipv6_nh[4]; /* in6_addr; network order */
> + };
> +};
> +
> enum bpf_task_fd_type {
> BPF_FD_TYPE_RAW_TRACEPOINT, /* tp name */
> BPF_FD_TYPE_TRACEPOINT, /* tp name */
> diff --git a/net/core/filter.c b/net/core/filter.c
> index c5e2a1c5fd8d..d073031a3a61 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2165,12 +2165,11 @@ static int __bpf_redirect(struct sk_buff *skb, struct net_device *dev,
> }
>
> #if IS_ENABLED(CONFIG_IPV6)
> -static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb)
> +static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb,
> + struct net_device *dev, const struct in6_addr *nexthop)
> {
> - struct dst_entry *dst = skb_dst(skb);
> - struct net_device *dev = dst->dev;
> u32 hh_len = LL_RESERVED_SPACE(dev);
> - const struct in6_addr *nexthop;
> + struct dst_entry *dst = NULL;
> struct neighbour *neigh;
>
> if (dev_xmit_recursion()) {
> @@ -2196,8 +2195,11 @@ static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb)
> }
>
> rcu_read_lock_bh();
> - nexthop = rt6_nexthop(container_of(dst, struct rt6_info, dst),
> - &ipv6_hdr(skb)->daddr);
> + if (!nexthop) {
> + dst = skb_dst(skb);
> + nexthop = rt6_nexthop(container_of(dst, struct rt6_info, dst),
> + &ipv6_hdr(skb)->daddr);
> + }
> neigh = ip_neigh_gw6(dev, nexthop);
> if (likely(!IS_ERR(neigh))) {
> int ret;
> @@ -2210,36 +2212,46 @@ static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb)
> return ret;
> }
> rcu_read_unlock_bh();
> - IP6_INC_STATS(dev_net(dst->dev),
> - ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);
> + if (dst)
> + IP6_INC_STATS(dev_net(dst->dev),
> + ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);
> out_drop:
> kfree_skb(skb);
> return -ENETDOWN;
> }
>
> -static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev)
> +static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev,
> + struct bpf_nh_params *nh)
> {
> const struct ipv6hdr *ip6h = ipv6_hdr(skb);
> + struct in6_addr *nexthop = NULL;
> struct net *net = dev_net(dev);
> int err, ret = NET_XMIT_DROP;
> - struct dst_entry *dst;
> - struct flowi6 fl6 = {
> - .flowi6_flags = FLOWI_FLAG_ANYSRC,
> - .flowi6_mark = skb->mark,
> - .flowlabel = ip6_flowinfo(ip6h),
> - .flowi6_oif = dev->ifindex,
> - .flowi6_proto = ip6h->nexthdr,
> - .daddr = ip6h->daddr,
> - .saddr = ip6h->saddr,
> - };
>
> - dst = ipv6_stub->ipv6_dst_lookup_flow(net, NULL, &fl6, NULL);
> - if (IS_ERR(dst))
> - goto out_drop;
> + if (!nh->nh_family) {
> + struct dst_entry *dst;
reverse xmas tree ordering
> + struct flowi6 fl6 = {
> + .flowi6_flags = FLOWI_FLAG_ANYSRC,
> + .flowi6_mark = skb->mark,
> + .flowlabel = ip6_flowinfo(ip6h),
> + .flowi6_oif = dev->ifindex,
> + .flowi6_proto = ip6h->nexthdr,
> + .daddr = ip6h->daddr,
> + .saddr = ip6h->saddr,
> + };
> +
> + dst = ipv6_stub->ipv6_dst_lookup_flow(net, NULL, &fl6, NULL);
> + if (IS_ERR(dst))
> + goto out_drop;
>
> - skb_dst_set(skb, dst);
> + skb_dst_set(skb, dst);
> + } else if (nh->nh_family == AF_INET6) {
> + nexthop = &nh->ipv6_nh;
> + } else {
> + goto out_drop;
> + }
>
> - err = bpf_out_neigh_v6(net, skb);
> + err = bpf_out_neigh_v6(net, skb, dev, nexthop);
> if (unlikely(net_xmit_eval(err)))
> dev->stats.tx_errors++;
> else
* Re: [PATCH RFC bpf-next 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
2020-10-15 16:27 ` David Ahern
@ 2020-10-15 19:34 ` Toke Høiland-Jørgensen
2020-10-19 13:09 ` Daniel Borkmann
0 siblings, 1 reply; 12+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-15 19:34 UTC
To: David Ahern, Daniel Borkmann; +Cc: David Ahern, netdev, bpf
David Ahern <dsahern@gmail.com> writes:
> On 10/15/20 9:46 AM, Toke Høiland-Jørgensen wrote:
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index bf5a99d803e4..980cc1363be8 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -3677,15 +3677,19 @@ union bpf_attr {
>> * Return
>> * The id is returned or 0 in case the id could not be retrieved.
>> *
>> - * long bpf_redirect_neigh(u32 ifindex, u64 flags)
>> + * long bpf_redirect_neigh(u32 ifindex, struct bpf_redir_neigh *params, int plen, u64 flags)
>
> why not fold ifindex into params? with params and plen this should be
> extensible later if needed.
Figured this way would make it easier to run *without* the params (like
in the existing examples). But don't feel strongly about it, let's see
what Daniel thinks.
> A couple of nits below that caught my eye.
Thanks, will fix; the kernel bot also found a sparse warning, so I guess
I need to respin anyway (but waiting for Daniel's comments and/or
instructions on what tree to properly submit this to).
-Toke
* Re: [PATCH RFC bpf-next 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
2020-10-15 19:34 ` Toke Høiland-Jørgensen
@ 2020-10-19 13:09 ` Daniel Borkmann
2020-10-19 13:28 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 12+ messages in thread
From: Daniel Borkmann @ 2020-10-19 13:09 UTC
To: Toke Høiland-Jørgensen, David Ahern; +Cc: David Ahern, netdev, bpf
On 10/15/20 9:34 PM, Toke Høiland-Jørgensen wrote:
> David Ahern <dsahern@gmail.com> writes:
>> On 10/15/20 9:46 AM, Toke Høiland-Jørgensen wrote:
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index bf5a99d803e4..980cc1363be8 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -3677,15 +3677,19 @@ union bpf_attr {
>>> * Return
>>> * The id is returned or 0 in case the id could not be retrieved.
>>> *
>>> - * long bpf_redirect_neigh(u32 ifindex, u64 flags)
>>> + * long bpf_redirect_neigh(u32 ifindex, struct bpf_redir_neigh *params, int plen, u64 flags)
>>
>> why not fold ifindex into params? with params and plen this should be
>> extensible later if needed.
>
> Figured this way would make it easier to run *without* the params (like
> in the existing examples). But don't feel strongly about it, let's see
> what Daniel thinks.
My preference is what Toke has here; it simplifies use, since one can just
call bpf_redirect_neigh(ifindex, NULL, 0, 0) when only a single external-facing
device is used.
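To make the two call forms concrete, here is a minimal userspace model of the helper's entry checks. It is a sketch, not the kernel implementation: the struct layout is copied from the proposed struct bpf_redir_neigh, while redirect_state_model, redirect_neigh_model() and the extra NULL/negative-length checks are assumptions made so the example is safe to run outside the verifier.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-ins; the values mirror the kernel's TC action codes. */
#define TC_ACT_SHOT     2
#define TC_ACT_REDIRECT 7

/* Layout copied from the proposed struct bpf_redir_neigh in this patch. */
struct redir_neigh_model {
	uint8_t nh_family;           /* AF_INET or AF_INET6 */
	union {
		uint32_t ipv4_nh;    /* network byte order */
		uint32_t ipv6_nh[4]; /* network byte order */
	};
};

/* Hypothetical per-call state, standing in for struct bpf_redirect_info. */
struct redirect_state_model {
	uint32_t tgt_index;
	int have_nh;
	struct redir_neigh_model nh;
};

static int redirect_neigh_model(struct redirect_state_model *st, uint32_t ifindex,
				const struct redir_neigh_model *params, int plen,
				uint64_t flags)
{
	/* Reject unknown flags. */
	if (flags)
		return TC_ACT_SHOT;
	/*
	 * Reject a params buffer that is missing or too short. The NULL and
	 * negative-length checks are assumptions of this model; in the kernel
	 * the verifier constrains these arguments.
	 */
	if (plen && (!params || plen < 0 || (size_t)plen < sizeof(*params)))
		return TC_ACT_SHOT;

	st->tgt_index = ifindex;
	st->have_nh = plen != 0;
	if (plen)
		memcpy(&st->nh, params, sizeof(st->nh));
	return TC_ACT_REDIRECT;
}
```

With this model, redirect_neigh_model(&st, ifindex, NULL, 0, 0) corresponds to the simple single-device case, and passing a filled-in struct with plen = sizeof(struct) corresponds to the caller-supplied nexthop case.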
>> A couple of nits below that caught my eye.
>
> Thanks, will fix; the kernel bot also found a sparse warning, so I guess
> I need to respin anyway (but waiting for Daniel's comments and/or
> instructions on what tree to properly submit this to).
Given the API change, let's do bpf. (Will review the rest later today.)
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC bpf-next 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
2020-10-19 13:09 ` Daniel Borkmann
@ 2020-10-19 13:28 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 12+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-19 13:28 UTC (permalink / raw)
To: Daniel Borkmann, David Ahern; +Cc: David Ahern, netdev, bpf
Daniel Borkmann <daniel@iogearbox.net> writes:
> On 10/15/20 9:34 PM, Toke Høiland-Jørgensen wrote:
>> David Ahern <dsahern@gmail.com> writes:
>>> On 10/15/20 9:46 AM, Toke Høiland-Jørgensen wrote:
>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>>> index bf5a99d803e4..980cc1363be8 100644
>>>> --- a/include/uapi/linux/bpf.h
>>>> +++ b/include/uapi/linux/bpf.h
>>>> @@ -3677,15 +3677,19 @@ union bpf_attr {
>>>> * Return
>>>> * The id is returned or 0 in case the id could not be retrieved.
>>>> *
>>>> - * long bpf_redirect_neigh(u32 ifindex, u64 flags)
>>>> + * long bpf_redirect_neigh(u32 ifindex, struct bpf_redir_neigh *params, int plen, u64 flags)
>>>
>>> why not fold ifindex into params? with params and plen this should be
>>> extensible later if needed.
>>
>> Figured this way would make it easier to run *without* the params (like
>> in the existing examples). But don't feel strongly about it, let's see
>> what Daniel thinks.
>
> My preference is what Toke has here; it simplifies use, since one can just
> call bpf_redirect_neigh(ifindex, NULL, 0, 0) when only a single external-facing
> device is used.
>
>>> A couple of nits below that caught my eye.
>>
>> Thanks, will fix; the kernel bot also found a sparse warning, so I guess
>> I need to respin anyway (but waiting for Daniel's comments and/or
>> instructions on what tree to properly submit this to).
>
> Given the API change, let's do bpf. (Will review the rest later today.)
Right, ACK. I'll wait for your review, then resubmit against the bpf
tree :)
-Toke
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC bpf-next 2/2] selftests: Update test_tc_neigh to use the modified bpf_redirect_neigh()
2020-10-15 15:46 ` [PATCH RFC bpf-next 2/2] selftests: Update test_tc_neigh to use the modified bpf_redirect_neigh() Toke Høiland-Jørgensen
@ 2020-10-19 14:40 ` Daniel Borkmann
2020-10-19 14:48 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 12+ messages in thread
From: Daniel Borkmann @ 2020-10-19 14:40 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: David Ahern, netdev, bpf
On 10/15/20 5:46 PM, Toke Høiland-Jørgensen wrote:
> From: Toke Høiland-Jørgensen <toke@redhat.com>
>
> This updates the test_tc_neigh selftest to use the new syntax of
> bpf_redirect_neigh(). To exercise the helper both with and without the
> optional parameter, one forwarding direction is changed to do a
> bpf_fib_lookup() followed by a call to bpf_redirect_neigh(), while the
> other direction is using the map-based ifindex lookup letting the redirect
> helper resolve the nexthop from the FIB.
>
> This also fixes the test_tc_redirect.sh script to work on systems that have
> a consolidated dual-stack 'ping' binary instead of separate ping/ping6
> versions.
>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
I would prefer that you not mix the two tests. That is, one complete test
case uses only bpf_redirect_neigh(get_dev_ifindex(xxx), NULL, 0, 0) for both
directions, and another self-contained one uses fib lookup + bpf_redirect_neigh
with params, even if it means we duplicate test_tc_neigh.c slightly; I think
that's fine for the sake of test coverage.
Thanks,
Daniel
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC bpf-next 2/2] selftests: Update test_tc_neigh to use the modified bpf_redirect_neigh()
2020-10-19 14:40 ` Daniel Borkmann
@ 2020-10-19 14:48 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 12+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-19 14:48 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: David Ahern, netdev, bpf
Daniel Borkmann <daniel@iogearbox.net> writes:
> On 10/15/20 5:46 PM, Toke Høiland-Jørgensen wrote:
>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>
>> This updates the test_tc_neigh selftest to use the new syntax of
>> bpf_redirect_neigh(). To exercise the helper both with and without the
>> optional parameter, one forwarding direction is changed to do a
>> bpf_fib_lookup() followed by a call to bpf_redirect_neigh(), while the
>> other direction is using the map-based ifindex lookup letting the redirect
>> helper resolve the nexthop from the FIB.
>>
>> This also fixes the test_tc_redirect.sh script to work on systems that have
>> a consolidated dual-stack 'ping' binary instead of separate ping/ping6
>> versions.
>>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>
> I would prefer that you not mix the two tests. That is, one complete test
> case uses only bpf_redirect_neigh(get_dev_ifindex(xxx), NULL, 0, 0) for both
> directions, and another self-contained one uses fib lookup + bpf_redirect_neigh
> with params, even if it means we duplicate test_tc_neigh.c slightly; I think
> that's fine for the sake of test coverage.
Sure, can do :)
-Toke
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC bpf-next 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
2020-10-15 15:46 ` [PATCH RFC bpf-next 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter Toke Høiland-Jørgensen
2020-10-15 16:27 ` David Ahern
@ 2020-10-19 14:48 ` Daniel Borkmann
2020-10-19 14:56 ` Toke Høiland-Jørgensen
2020-10-19 15:01 ` Daniel Borkmann
2 siblings, 1 reply; 12+ messages in thread
From: Daniel Borkmann @ 2020-10-19 14:48 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: David Ahern, netdev, bpf
On 10/15/20 5:46 PM, Toke Høiland-Jørgensen wrote:
> From: Toke Høiland-Jørgensen <toke@redhat.com>
>
> Based on the discussion in [0], update the bpf_redirect_neigh() helper to
> accept an optional parameter specifying the nexthop information. This makes
> it possible to combine bpf_fib_lookup() and bpf_redirect_neigh() without
> incurring a duplicate FIB lookup - since the FIB lookup helper will return
> the nexthop information even if no neighbour is present, this can simply be
> passed on to bpf_redirect_neigh() if bpf_fib_lookup() returns
> BPF_FIB_LKUP_RET_NO_NEIGH.
>
> [0] https://lore.kernel.org/bpf/393e17fc-d187-3a8d-2f0d-a627c7c63fca@iogearbox.net/
>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Overall looks good from what I can tell, just small nits below on top of
David's feedback:
[...]
> -static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev)
> +static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev,
> + struct bpf_nh_params *nh)
> {
> const struct iphdr *ip4h = ip_hdr(skb);
> struct net *net = dev_net(dev);
> int err, ret = NET_XMIT_DROP;
> - struct rtable *rt;
> - struct flowi4 fl4 = {
> - .flowi4_flags = FLOWI_FLAG_ANYSRC,
> - .flowi4_mark = skb->mark,
> - .flowi4_tos = RT_TOS(ip4h->tos),
> - .flowi4_oif = dev->ifindex,
> - .flowi4_proto = ip4h->protocol,
> - .daddr = ip4h->daddr,
> - .saddr = ip4h->saddr,
> - };
>
> - rt = ip_route_output_flow(net, &fl4, NULL);
> - if (IS_ERR(rt))
> - goto out_drop;
> - if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) {
> - ip_rt_put(rt);
> - goto out_drop;
> - }
> + if (!nh->nh_family) {
> + struct rtable *rt;
> + struct flowi4 fl4 = {
> + .flowi4_flags = FLOWI_FLAG_ANYSRC,
> + .flowi4_mark = skb->mark,
> + .flowi4_tos = RT_TOS(ip4h->tos),
> + .flowi4_oif = dev->ifindex,
> + .flowi4_proto = ip4h->protocol,
> + .daddr = ip4h->daddr,
> + .saddr = ip4h->saddr,
> + };
> +
> + rt = ip_route_output_flow(net, &fl4, NULL);
> + if (IS_ERR(rt))
> + goto out_drop;
> + if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) {
> + ip_rt_put(rt);
> + goto out_drop;
> + }
>
> - skb_dst_set(skb, &rt->dst);
> + skb_dst_set(skb, &rt->dst);
> + nh = NULL;
> + }
>
> - err = bpf_out_neigh_v4(net, skb);
> + err = bpf_out_neigh_v4(net, skb, dev, nh);
> if (unlikely(net_xmit_eval(err)))
> dev->stats.tx_errors++;
> else
> @@ -2355,7 +2383,8 @@ static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev)
> }
> #endif /* CONFIG_INET */
>
> -static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev)
> +static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev,
> + struct bpf_nh_params *nh)
> {
> struct ethhdr *ethh = eth_hdr(skb);
>
> @@ -2370,9 +2399,9 @@ static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev)
> skb_reset_network_header(skb);
>
> if (skb->protocol == htons(ETH_P_IP))
> - return __bpf_redirect_neigh_v4(skb, dev);
> + return __bpf_redirect_neigh_v4(skb, dev, nh);
> else if (skb->protocol == htons(ETH_P_IPV6))
> - return __bpf_redirect_neigh_v6(skb, dev);
> + return __bpf_redirect_neigh_v6(skb, dev, nh);
> out:
> kfree_skb(skb);
> return -ENOTSUPP;
> @@ -2455,8 +2484,8 @@ int skb_do_redirect(struct sk_buff *skb)
> return -EAGAIN;
> }
> return flags & BPF_F_NEIGH ?
> - __bpf_redirect_neigh(skb, dev) :
> - __bpf_redirect(skb, dev, flags);
> + __bpf_redirect_neigh(skb, dev, &ri->nh) :
> + __bpf_redirect(skb, dev, flags);
> out_drop:
> kfree_skb(skb);
> return -EINVAL;
> @@ -2504,16 +2533,23 @@ static const struct bpf_func_proto bpf_redirect_peer_proto = {
> .arg2_type = ARG_ANYTHING,
> };
>
> -BPF_CALL_2(bpf_redirect_neigh, u32, ifindex, u64, flags)
> +BPF_CALL_4(bpf_redirect_neigh, u32, ifindex, struct bpf_redir_neigh *, params,
> + int, plen, u64, flags)
> {
> struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
>
> - if (unlikely(flags))
> + if (unlikely((plen && plen < sizeof(*params)) || flags))
> return TC_ACT_SHOT;
>
> ri->flags = BPF_F_NEIGH;
> ri->tgt_index = ifindex;
>
> + BUILD_BUG_ON(sizeof(struct bpf_redir_neigh) != sizeof(struct bpf_nh_params));
> + if (plen)
> + memcpy(&ri->nh, params, sizeof(ri->nh));
> + else
> + ri->nh.nh_family = 0; /* clear previous value */
I'd probably just add an internal flag and do ...
ri->flags = BPF_F_NEIGH | (plen ? BPF_F_NEXTHOP : 0);
... instead of above clearing, and skb_do_redirect() then becomes:
__bpf_redirect_neigh(skb, dev, flags & BPF_F_NEXTHOP ? &ri->nh : NULL)
... which would then also avoid this !nh->nh_family check where you later on
set nh = NULL to pass it onwards.
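As a rough userspace sketch of that flow (not the kernel code): the bit values of F_NEIGH_MODEL and F_NEXTHOP_MODEL below are made up for illustration, and the struct and function names are hypothetical stand-ins for bpf_redirect_info, the helper body, and the selection done in skb_do_redirect().

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit values only; the thread does not specify them. */
#define F_NEIGH_MODEL   (1u << 1)
#define F_NEXTHOP_MODEL (1u << 2)

/* Reduced stand-in for struct bpf_nh_params. */
struct nh_params_model {
	uint8_t nh_family;
};

/* Reduced stand-in for struct bpf_redirect_info. */
struct redirect_info_model {
	uint32_t flags;
	struct nh_params_model nh;
};

/* Helper side: record whether the caller supplied a nexthop. */
static void set_redirect_flags(struct redirect_info_model *ri, int plen)
{
	ri->flags = F_NEIGH_MODEL | (plen ? F_NEXTHOP_MODEL : 0);
}

/* Redirect side: pass on &ri->nh only when the flag is set, else NULL. */
static const struct nh_params_model *
select_nexthop(const struct redirect_info_model *ri)
{
	return (ri->flags & F_NEXTHOP_MODEL) ? &ri->nh : NULL;
}
```

Because the flag alone decides whether nh is consulted, a stale nh_family left over from an earlier call is simply ignored and needs no explicit clearing.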
> return TC_ACT_REDIRECT;
> }
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC bpf-next 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
2020-10-19 14:48 ` Daniel Borkmann
@ 2020-10-19 14:56 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 12+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-10-19 14:56 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: David Ahern, netdev, bpf
Daniel Borkmann <daniel@iogearbox.net> writes:
> On 10/15/20 5:46 PM, Toke Høiland-Jørgensen wrote:
>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>
>> Based on the discussion in [0], update the bpf_redirect_neigh() helper to
>> accept an optional parameter specifying the nexthop information. This makes
>> it possible to combine bpf_fib_lookup() and bpf_redirect_neigh() without
>> incurring a duplicate FIB lookup - since the FIB lookup helper will return
>> the nexthop information even if no neighbour is present, this can simply be
>> passed on to bpf_redirect_neigh() if bpf_fib_lookup() returns
>> BPF_FIB_LKUP_RET_NO_NEIGH.
>>
>> [0] https://lore.kernel.org/bpf/393e17fc-d187-3a8d-2f0d-a627c7c63fca@iogearbox.net/
>>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>
> Overall looks good from what I can tell, just small nits below on top of
> David's feedback:
>
> [...]
>> -static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev)
>> +static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev,
>> + struct bpf_nh_params *nh)
>> {
>> const struct iphdr *ip4h = ip_hdr(skb);
>> struct net *net = dev_net(dev);
>> int err, ret = NET_XMIT_DROP;
>> - struct rtable *rt;
>> - struct flowi4 fl4 = {
>> - .flowi4_flags = FLOWI_FLAG_ANYSRC,
>> - .flowi4_mark = skb->mark,
>> - .flowi4_tos = RT_TOS(ip4h->tos),
>> - .flowi4_oif = dev->ifindex,
>> - .flowi4_proto = ip4h->protocol,
>> - .daddr = ip4h->daddr,
>> - .saddr = ip4h->saddr,
>> - };
>>
>> - rt = ip_route_output_flow(net, &fl4, NULL);
>> - if (IS_ERR(rt))
>> - goto out_drop;
>> - if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) {
>> - ip_rt_put(rt);
>> - goto out_drop;
>> - }
>> + if (!nh->nh_family) {
>> + struct rtable *rt;
>> + struct flowi4 fl4 = {
>> + .flowi4_flags = FLOWI_FLAG_ANYSRC,
>> + .flowi4_mark = skb->mark,
>> + .flowi4_tos = RT_TOS(ip4h->tos),
>> + .flowi4_oif = dev->ifindex,
>> + .flowi4_proto = ip4h->protocol,
>> + .daddr = ip4h->daddr,
>> + .saddr = ip4h->saddr,
>> + };
>> +
>> + rt = ip_route_output_flow(net, &fl4, NULL);
>> + if (IS_ERR(rt))
>> + goto out_drop;
>> + if (rt->rt_type != RTN_UNICAST && rt->rt_type != RTN_LOCAL) {
>> + ip_rt_put(rt);
>> + goto out_drop;
>> + }
>>
>> - skb_dst_set(skb, &rt->dst);
>> + skb_dst_set(skb, &rt->dst);
>> + nh = NULL;
>> + }
>>
>> - err = bpf_out_neigh_v4(net, skb);
>> + err = bpf_out_neigh_v4(net, skb, dev, nh);
>> if (unlikely(net_xmit_eval(err)))
>> dev->stats.tx_errors++;
>> else
>> @@ -2355,7 +2383,8 @@ static int __bpf_redirect_neigh_v4(struct sk_buff *skb, struct net_device *dev)
>> }
>> #endif /* CONFIG_INET */
>>
>> -static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev)
>> +static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev,
>> + struct bpf_nh_params *nh)
>> {
>> struct ethhdr *ethh = eth_hdr(skb);
>>
>> @@ -2370,9 +2399,9 @@ static int __bpf_redirect_neigh(struct sk_buff *skb, struct net_device *dev)
>> skb_reset_network_header(skb);
>>
>> if (skb->protocol == htons(ETH_P_IP))
>> - return __bpf_redirect_neigh_v4(skb, dev);
>> + return __bpf_redirect_neigh_v4(skb, dev, nh);
>> else if (skb->protocol == htons(ETH_P_IPV6))
>> - return __bpf_redirect_neigh_v6(skb, dev);
>> + return __bpf_redirect_neigh_v6(skb, dev, nh);
>> out:
>> kfree_skb(skb);
>> return -ENOTSUPP;
>> @@ -2455,8 +2484,8 @@ int skb_do_redirect(struct sk_buff *skb)
>> return -EAGAIN;
>> }
>> return flags & BPF_F_NEIGH ?
>> - __bpf_redirect_neigh(skb, dev) :
>> - __bpf_redirect(skb, dev, flags);
>> + __bpf_redirect_neigh(skb, dev, &ri->nh) :
>> + __bpf_redirect(skb, dev, flags);
>> out_drop:
>> kfree_skb(skb);
>> return -EINVAL;
>> @@ -2504,16 +2533,23 @@ static const struct bpf_func_proto bpf_redirect_peer_proto = {
>> .arg2_type = ARG_ANYTHING,
>> };
>>
>> -BPF_CALL_2(bpf_redirect_neigh, u32, ifindex, u64, flags)
>> +BPF_CALL_4(bpf_redirect_neigh, u32, ifindex, struct bpf_redir_neigh *, params,
>> + int, plen, u64, flags)
>> {
>> struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
>>
>> - if (unlikely(flags))
>> + if (unlikely((plen && plen < sizeof(*params)) || flags))
>> return TC_ACT_SHOT;
>>
>> ri->flags = BPF_F_NEIGH;
>> ri->tgt_index = ifindex;
>>
>> + BUILD_BUG_ON(sizeof(struct bpf_redir_neigh) != sizeof(struct bpf_nh_params));
>> + if (plen)
>> + memcpy(&ri->nh, params, sizeof(ri->nh));
>> + else
>> + ri->nh.nh_family = 0; /* clear previous value */
>
> I'd probably just add an internal flag and do ...
>
> ri->flags = BPF_F_NEIGH | (plen ? BPF_F_NEXTHOP : 0);
>
> ... instead of above clearing, and skb_do_redirect() then becomes:
>
> __bpf_redirect_neigh(skb, dev, flags & BPF_F_NEXTHOP ? &ri->nh : NULL)
>
> ... which would then also avoid this !nh->nh_family check where you later on
> set nh = NULL to pass it onwards.
Ah yes, excellent idea! Will fix :)
-Toke
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH RFC bpf-next 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter
2020-10-15 15:46 ` [PATCH RFC bpf-next 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter Toke Høiland-Jørgensen
2020-10-15 16:27 ` David Ahern
2020-10-19 14:48 ` Daniel Borkmann
@ 2020-10-19 15:01 ` Daniel Borkmann
2 siblings, 0 replies; 12+ messages in thread
From: Daniel Borkmann @ 2020-10-19 15:01 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: David Ahern, netdev, bpf
On 10/15/20 5:46 PM, Toke Høiland-Jørgensen wrote:
[...]
> +struct bpf_redir_neigh {
> + /* network family for lookup (AF_INET, AF_INET6)
> + */
> + __u8 nh_family;
> + /* network address of nexthop; skips fib lookup to find gateway */
> + union {
> + __be32 ipv4_nh;
> + __u32 ipv6_nh[4]; /* in6_addr; network order */
> + };
> +};
> +
> enum bpf_task_fd_type {
> BPF_FD_TYPE_RAW_TRACEPOINT, /* tp name */
> BPF_FD_TYPE_TRACEPOINT, /* tp name */
> diff --git a/net/core/filter.c b/net/core/filter.c
> index c5e2a1c5fd8d..d073031a3a61 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2165,12 +2165,11 @@ static int __bpf_redirect(struct sk_buff *skb, struct net_device *dev,
> }
>
> #if IS_ENABLED(CONFIG_IPV6)
> -static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb)
> +static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb,
> + struct net_device *dev, const struct in6_addr *nexthop)
> {
> - struct dst_entry *dst = skb_dst(skb);
> - struct net_device *dev = dst->dev;
> u32 hh_len = LL_RESERVED_SPACE(dev);
> - const struct in6_addr *nexthop;
> + struct dst_entry *dst = NULL;
> struct neighbour *neigh;
>
> if (dev_xmit_recursion()) {
> @@ -2196,8 +2195,11 @@ static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb)
> }
>
> rcu_read_lock_bh();
> - nexthop = rt6_nexthop(container_of(dst, struct rt6_info, dst),
> - &ipv6_hdr(skb)->daddr);
> + if (!nexthop) {
> + dst = skb_dst(skb);
> + nexthop = rt6_nexthop(container_of(dst, struct rt6_info, dst),
> + &ipv6_hdr(skb)->daddr);
> + }
> neigh = ip_neigh_gw6(dev, nexthop);
> if (likely(!IS_ERR(neigh))) {
> int ret;
> @@ -2210,36 +2212,46 @@ static int bpf_out_neigh_v6(struct net *net, struct sk_buff *skb)
> return ret;
> }
> rcu_read_unlock_bh();
> - IP6_INC_STATS(dev_net(dst->dev),
> - ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);
> + if (dst)
> + IP6_INC_STATS(dev_net(dst->dev),
> + ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);
> out_drop:
> kfree_skb(skb);
> return -ENETDOWN;
> }
>
> -static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev)
> +static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev,
> + struct bpf_nh_params *nh)
> {
> const struct ipv6hdr *ip6h = ipv6_hdr(skb);
> + struct in6_addr *nexthop = NULL;
> struct net *net = dev_net(dev);
> int err, ret = NET_XMIT_DROP;
> - struct dst_entry *dst;
> - struct flowi6 fl6 = {
> - .flowi6_flags = FLOWI_FLAG_ANYSRC,
> - .flowi6_mark = skb->mark,
> - .flowlabel = ip6_flowinfo(ip6h),
> - .flowi6_oif = dev->ifindex,
> - .flowi6_proto = ip6h->nexthdr,
> - .daddr = ip6h->daddr,
> - .saddr = ip6h->saddr,
> - };
>
> - dst = ipv6_stub->ipv6_dst_lookup_flow(net, NULL, &fl6, NULL);
> - if (IS_ERR(dst))
> - goto out_drop;
> + if (!nh->nh_family) {
> + struct dst_entry *dst;
> + struct flowi6 fl6 = {
> + .flowi6_flags = FLOWI_FLAG_ANYSRC,
> + .flowi6_mark = skb->mark,
> + .flowlabel = ip6_flowinfo(ip6h),
> + .flowi6_oif = dev->ifindex,
> + .flowi6_proto = ip6h->nexthdr,
> + .daddr = ip6h->daddr,
> + .saddr = ip6h->saddr,
nit: Would be good for readability to keep the previous whitespace alignment intact.
> + };
> +
> + dst = ipv6_stub->ipv6_dst_lookup_flow(net, NULL, &fl6, NULL);
> + if (IS_ERR(dst))
> + goto out_drop;
>
> - skb_dst_set(skb, dst);
> + skb_dst_set(skb, dst);
> + } else if (nh->nh_family == AF_INET6) {
> + nexthop = &nh->ipv6_nh;
> + } else {
> + goto out_drop;
> + }
>
> - err = bpf_out_neigh_v6(net, skb);
> + err = bpf_out_neigh_v6(net, skb, dev, nexthop);
I'd probably model the bpf_out_neigh_v{4,6}() functions as similarly to each other as possible
in terms of the args we pass etc. In the v6 case you pass the nexthop in6_addr directly, whereas
v4 passes bpf_nh_params; I'd stick to the latter for v6 as well to keep it symmetric.
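One way the symmetric v6 side could look, modelled in userspace: nh_params_model is a stand-in for the kernel-internal struct bpf_nh_params, pick_nexthop_v6() is a hypothetical name, and route_nh stands in for whatever rt6_nexthop() would return from the skb's dst. This only sketches the argument shape, not the actual neighbour output path.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Stand-in for the kernel-internal struct bpf_nh_params. */
struct nh_params_model {
	unsigned char nh_family;
	union {
		struct in_addr  ipv4_nh;
		struct in6_addr ipv6_nh;
	};
};

/*
 * v6 nexthop selection taking the same nh argument as the v4 path.
 * Returns NULL where the real function would drop the packet.
 */
static const struct in6_addr *
pick_nexthop_v6(const struct nh_params_model *nh,
		const struct in6_addr *route_nh)
{
	if (!nh)
		return route_nh;      /* no caller-supplied nexthop: use the route's */
	if (nh->nh_family == AF_INET6)
		return &nh->ipv6_nh;  /* caller-supplied IPv6 nexthop */
	return NULL;                  /* unsupported family for the v6 path */
}
```

With both functions taking the nh pointer, the family check lives next to the neighbour lookup in each of them, and the callers only pass the pointer through.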
> if (unlikely(net_xmit_eval(err)))
> dev->stats.tx_errors++;
> else
> @@ -2260,11 +2272,9 @@ static int __bpf_redirect_neigh_v6(struct sk_buff *skb, struct net_device *dev)
> #endif /* CONFIG_IPV6 */
>
> #if IS_ENABLED(CONFIG_INET)
> -static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb)
> +static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb,
> + struct net_device *dev, struct bpf_nh_params *nh)
> {
> - struct dst_entry *dst = skb_dst(skb);
> - struct rtable *rt = container_of(dst, struct rtable, dst);
> - struct net_device *dev = dst->dev;
> u32 hh_len = LL_RESERVED_SPACE(dev);
> struct neighbour *neigh;
> bool is_v6gw = false;
> @@ -2292,7 +2302,20 @@ static int bpf_out_neigh_v4(struct net *net, struct sk_buff *skb)
> }
>
> rcu_read_lock_bh();
> - neigh = ip_neigh_for_gw(rt, skb, &is_v6gw);
> + if (!nh) {
> + struct dst_entry *dst = skb_dst(skb);
> + struct rtable *rt = container_of(dst, struct rtable, dst);
> +
> + neigh = ip_neigh_for_gw(rt, skb, &is_v6gw);
> + } else if (nh->nh_family == AF_INET6) {
> + neigh = ip_neigh_gw6(dev, &nh->ipv6_nh);
> + is_v6gw = true;
> + } else if (nh->nh_family == AF_INET) {
> + neigh = ip_neigh_gw4(dev, nh->ipv4_nh);
> + } else {
> + goto out_drop;
> + }
> +
> if (likely(!IS_ERR(neigh))) {
> int ret;
>
^ permalink raw reply [flat|nested] 12+ messages in thread
Thread overview: 12+ messages
2020-10-15 15:46 [PATCH RFC bpf-next 0/2] bpf: Rework bpf_redirect_neigh() to allow supplying nexthop from caller Toke Høiland-Jørgensen
2020-10-15 15:46 ` [PATCH RFC bpf-next 1/2] bpf_redirect_neigh: Support supplying the nexthop as a helper parameter Toke Høiland-Jørgensen
2020-10-15 16:27 ` David Ahern
2020-10-15 19:34 ` Toke Høiland-Jørgensen
2020-10-19 13:09 ` Daniel Borkmann
2020-10-19 13:28 ` Toke Høiland-Jørgensen
2020-10-19 14:48 ` Daniel Borkmann
2020-10-19 14:56 ` Toke Høiland-Jørgensen
2020-10-19 15:01 ` Daniel Borkmann
2020-10-15 15:46 ` [PATCH RFC bpf-next 2/2] selftests: Update test_tc_neigh to use the modified bpf_redirect_neigh() Toke Høiland-Jørgensen
2020-10-19 14:40 ` Daniel Borkmann
2020-10-19 14:48 ` Toke Høiland-Jørgensen