All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net 0/2] bpf: Fix bpf_redirect to an ipip/ip6tnl dev
@ 2016-11-09 23:36 Martin KaFai Lau
  2016-11-09 23:36 ` [PATCH net 1/2] " Martin KaFai Lau
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Martin KaFai Lau @ 2016-11-09 23:36 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, Kernel Team

Hi,

This patch set fixes a bug in bpf_redirect(dev, flags) when dev is an
ipip/ip6tnl.  The current problem is IP-EthHdr-IP is sent out instead of
IP-IP.

Patch 1 adds a dev->type test similar to dev_is_mac_header_xmit()
in act_mirred.c which is only available in net-next.  We can consider to
refactor it once this patch is pulled into net-next from net.

Thanks,
-- Martin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH net 1/2] bpf: Fix bpf_redirect to an ipip/ip6tnl dev
  2016-11-09 23:36 [PATCH net 0/2] bpf: Fix bpf_redirect to an ipip/ip6tnl dev Martin KaFai Lau
@ 2016-11-09 23:36 ` Martin KaFai Lau
  2019-01-10 20:41   ` Willem de Bruijn
  2016-11-09 23:36 ` [PATCH net 2/2] bpf: Add test for bpf_redirect to ipip/ip6tnl Martin KaFai Lau
  2016-11-13  4:39 ` [PATCH net 0/2] bpf: Fix bpf_redirect to an ipip/ip6tnl dev David Miller
  2 siblings, 1 reply; 8+ messages in thread
From: Martin KaFai Lau @ 2016-11-09 23:36 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, Kernel Team

If the bpf program calls bpf_redirect(dev, 0) and dev is
an ipip/ip6tnl, it currently includes the mac header.
e.g. If dev is ipip, the end result is IP-EthHdr-IP instead
of IP-IP.

The fix is to pull the mac header.  At ingress, skb_postpull_rcsum()
is not needed because the ethhdr should have been pulled once already
and then got pushed back just before calling the bpf_prog.
At egress, this patch calls skb_postpull_rcsum().

If bpf_redirect(dev, BPF_F_INGRESS) is called,
it also fails now because it calls dev_forward_skb() which
eventually calls eth_type_trans(skb, dev).  The eth_type_trans()
will set skb->type = PACKET_OTHERHOST because the mac address
does not match the redirecting dev->dev_addr.  The PACKET_OTHERHOST
will eventually cause the ip_rcv() errors out.  To fix this,
____dev_forward_skb() is added.

Joint work with Daniel Borkmann.

Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 include/linux/netdevice.h | 15 +++++++++++
 net/core/dev.c            | 17 +++++-------
 net/core/filter.c         | 68 +++++++++++++++++++++++++++++++++++++++++------
 3 files changed, 81 insertions(+), 19 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 91ee364..bf04a46 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3354,6 +3354,21 @@ int dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
 bool is_skb_forwardable(const struct net_device *dev,
 			const struct sk_buff *skb);
 
+static __always_inline int ____dev_forward_skb(struct net_device *dev,
+					       struct sk_buff *skb)
+{
+	if (skb_orphan_frags(skb, GFP_ATOMIC) ||
+	    unlikely(!is_skb_forwardable(dev, skb))) {
+		atomic_long_inc(&dev->rx_dropped);
+		kfree_skb(skb);
+		return NET_RX_DROP;
+	}
+
+	skb_scrub_packet(skb, true);
+	skb->priority = 0;
+	return 0;
+}
+
 void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev);
 
 extern int		netdev_budget;
diff --git a/net/core/dev.c b/net/core/dev.c
index eaad4c2..6666b28 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1766,19 +1766,14 @@ EXPORT_SYMBOL_GPL(is_skb_forwardable);
 
 int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
 {
-	if (skb_orphan_frags(skb, GFP_ATOMIC) ||
-	    unlikely(!is_skb_forwardable(dev, skb))) {
-		atomic_long_inc(&dev->rx_dropped);
-		kfree_skb(skb);
-		return NET_RX_DROP;
-	}
+	int ret = ____dev_forward_skb(dev, skb);
 
-	skb_scrub_packet(skb, true);
-	skb->priority = 0;
-	skb->protocol = eth_type_trans(skb, dev);
-	skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
+	if (likely(!ret)) {
+		skb->protocol = eth_type_trans(skb, dev);
+		skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
+	}
 
-	return 0;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(__dev_forward_skb);
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 00351cd..b391209 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1628,6 +1628,19 @@ static inline int __bpf_rx_skb(struct net_device *dev, struct sk_buff *skb)
 	return dev_forward_skb(dev, skb);
 }
 
+static inline int __bpf_rx_skb_no_mac(struct net_device *dev,
+				      struct sk_buff *skb)
+{
+	int ret = ____dev_forward_skb(dev, skb);
+
+	if (likely(!ret)) {
+		skb->dev = dev;
+		ret = netif_rx(skb);
+	}
+
+	return ret;
+}
+
 static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
 {
 	int ret;
@@ -1647,6 +1660,51 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
 	return ret;
 }
 
+static int __bpf_redirect_no_mac(struct sk_buff *skb, struct net_device *dev,
+				 u32 flags)
+{
+	/* skb->mac_len is not set on normal egress */
+	unsigned int mlen = skb->network_header - skb->mac_header;
+
+	__skb_pull(skb, mlen);
+
+	/* At ingress, the mac header has already been pulled once.
+	 * At egress, skb_pospull_rcsum has to be done in case that
+	 * the skb is originated from ingress (i.e. a forwarded skb)
+	 * to ensure that rcsum starts at net header.
+	 */
+	if (!skb_at_tc_ingress(skb))
+		skb_postpull_rcsum(skb, skb_mac_header(skb), mlen);
+	skb_pop_mac_header(skb);
+	skb_reset_mac_len(skb);
+	return flags & BPF_F_INGRESS ?
+	       __bpf_rx_skb_no_mac(dev, skb) : __bpf_tx_skb(dev, skb);
+}
+
+static int __bpf_redirect_common(struct sk_buff *skb, struct net_device *dev,
+				 u32 flags)
+{
+	bpf_push_mac_rcsum(skb);
+	return flags & BPF_F_INGRESS ?
+	       __bpf_rx_skb(dev, skb) : __bpf_tx_skb(dev, skb);
+}
+
+static int __bpf_redirect(struct sk_buff *skb, struct net_device *dev,
+			  u32 flags)
+{
+	switch (dev->type) {
+	case ARPHRD_TUNNEL:
+	case ARPHRD_TUNNEL6:
+	case ARPHRD_SIT:
+	case ARPHRD_IPGRE:
+	case ARPHRD_VOID:
+	case ARPHRD_NONE:
+		return __bpf_redirect_no_mac(skb, dev, flags);
+	default:
+		return __bpf_redirect_common(skb, dev, flags);
+	}
+}
+
 BPF_CALL_3(bpf_clone_redirect, struct sk_buff *, skb, u32, ifindex, u64, flags)
 {
 	struct net_device *dev;
@@ -1675,10 +1733,7 @@ BPF_CALL_3(bpf_clone_redirect, struct sk_buff *, skb, u32, ifindex, u64, flags)
 		return -ENOMEM;
 	}
 
-	bpf_push_mac_rcsum(clone);
-
-	return flags & BPF_F_INGRESS ?
-	       __bpf_rx_skb(dev, clone) : __bpf_tx_skb(dev, clone);
+	return __bpf_redirect(clone, dev, flags);
 }
 
 static const struct bpf_func_proto bpf_clone_redirect_proto = {
@@ -1722,10 +1777,7 @@ int skb_do_redirect(struct sk_buff *skb)
 		return -EINVAL;
 	}
 
-	bpf_push_mac_rcsum(skb);
-
-	return ri->flags & BPF_F_INGRESS ?
-	       __bpf_rx_skb(dev, skb) : __bpf_tx_skb(dev, skb);
+	return __bpf_redirect(skb, dev, ri->flags);
 }
 
 static const struct bpf_func_proto bpf_redirect_proto = {
-- 
2.5.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH net 2/2] bpf: Add test for bpf_redirect to ipip/ip6tnl
  2016-11-09 23:36 [PATCH net 0/2] bpf: Fix bpf_redirect to an ipip/ip6tnl dev Martin KaFai Lau
  2016-11-09 23:36 ` [PATCH net 1/2] " Martin KaFai Lau
@ 2016-11-09 23:36 ` Martin KaFai Lau
  2016-11-13  4:39 ` [PATCH net 0/2] bpf: Fix bpf_redirect to an ipip/ip6tnl dev David Miller
  2 siblings, 0 replies; 8+ messages in thread
From: Martin KaFai Lau @ 2016-11-09 23:36 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, Kernel Team

The test creates two netns, ns1 and ns2.  The host (the default netns)
has an ipip or ip6tnl dev configured for tunneling traffic to the ns2.

    ping VIPS from ns1 <----> host <--tunnel--> ns2 (VIPs at loopback)

The test is to have ns1 pinging VIPs configured at the loopback
interface in ns2.

The VIPs are 10.10.1.102 and 2401:face::66 (which are configured
at lo@ns2). [Note: 0x66 => 102].

At ns1, the VIPs are routed _via_ the host.

At the host, bpf programs are installed at the veth to redirect packets
from a veth to the ipip/ip6tnl.  The test is configured in a way so
that both ingress and egress can be tested.

At ns2, the ipip/ip6tnl dev is configured with the local and remote address
specified.  The return path is routed to the dev ipip/ip6tnl.

During egress test, the host also locally tests pinging the VIPs to ensure
that bpf_redirect at egress also works for the direct egress (i.e. not
forwarding from dev ve1 to ve2).

Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
---
 samples/bpf/Makefile              |   4 +
 samples/bpf/tc_l2_redirect.sh     | 173 ++++++++++++++++++++++++++++
 samples/bpf/tc_l2_redirect_kern.c | 236 ++++++++++++++++++++++++++++++++++++++
 samples/bpf/tc_l2_redirect_user.c |  73 ++++++++++++
 4 files changed, 486 insertions(+)
 create mode 100755 samples/bpf/tc_l2_redirect.sh
 create mode 100644 samples/bpf/tc_l2_redirect_kern.c
 create mode 100644 samples/bpf/tc_l2_redirect_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 12b7304..72c5867 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -27,6 +27,7 @@ hostprogs-y += xdp2
 hostprogs-y += test_current_task_under_cgroup
 hostprogs-y += trace_event
 hostprogs-y += sampleip
+hostprogs-y += tc_l2_redirect
 
 test_verifier-objs := test_verifier.o libbpf.o
 test_maps-objs := test_maps.o libbpf.o
@@ -56,6 +57,7 @@ test_current_task_under_cgroup-objs := bpf_load.o libbpf.o \
 				       test_current_task_under_cgroup_user.o
 trace_event-objs := bpf_load.o libbpf.o trace_event_user.o
 sampleip-objs := bpf_load.o libbpf.o sampleip_user.o
+tc_l2_redirect-objs := bpf_load.o libbpf.o tc_l2_redirect_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -72,6 +74,7 @@ always += test_probe_write_user_kern.o
 always += trace_output_kern.o
 always += tcbpf1_kern.o
 always += tcbpf2_kern.o
+always += tc_l2_redirect_kern.o
 always += lathist_kern.o
 always += offwaketime_kern.o
 always += spintest_kern.o
@@ -111,6 +114,7 @@ HOSTLOADLIBES_xdp2 += -lelf
 HOSTLOADLIBES_test_current_task_under_cgroup += -lelf
 HOSTLOADLIBES_trace_event += -lelf
 HOSTLOADLIBES_sampleip += -lelf
+HOSTLOADLIBES_tc_l2_redirect += -l elf
 
 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
 #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
diff --git a/samples/bpf/tc_l2_redirect.sh b/samples/bpf/tc_l2_redirect.sh
new file mode 100755
index 0000000..b0e1f09
--- /dev/null
+++ b/samples/bpf/tc_l2_redirect.sh
@@ -0,0 +1,173 @@
+#!/bin/bash
+
+[[ -z $TC ]] && TC='tc'
+[[ -z $IP ]] && IP='ip'
+
+REDIRECT_USER='./tc_l2_redirect'
+REDIRECT_BPF='./tc_l2_redirect_kern.o'
+
+RP_FILTER=$(< /proc/sys/net/ipv4/conf/all/rp_filter)
+IPV6_FORWARDING=$(< /proc/sys/net/ipv6/conf/all/forwarding)
+
+function config_common {
+	local tun_type=$1
+
+	$IP netns add ns1
+	$IP netns add ns2
+	$IP link add ve1 type veth peer name vens1
+	$IP link add ve2 type veth peer name vens2
+	$IP link set dev ve1 up
+	$IP link set dev ve2 up
+	$IP link set dev ve1 mtu 1500
+	$IP link set dev ve2 mtu 1500
+	$IP link set dev vens1 netns ns1
+	$IP link set dev vens2 netns ns2
+
+	$IP -n ns1 link set dev lo up
+	$IP -n ns1 link set dev vens1 up
+	$IP -n ns1 addr add 10.1.1.101/24 dev vens1
+	$IP -n ns1 addr add 2401:db01::65/64 dev vens1 nodad
+	$IP -n ns1 route add default via 10.1.1.1 dev vens1
+	$IP -n ns1 route add default via 2401:db01::1 dev vens1
+
+	$IP -n ns2 link set dev lo up
+	$IP -n ns2 link set dev vens2 up
+	$IP -n ns2 addr add 10.2.1.102/24 dev vens2
+	$IP -n ns2 addr add 2401:db02::66/64 dev vens2 nodad
+	$IP -n ns2 addr add 10.10.1.102 dev lo
+	$IP -n ns2 addr add 2401:face::66/64 dev lo nodad
+	$IP -n ns2 link add ipt2 type ipip local 10.2.1.102 remote 10.2.1.1
+	$IP -n ns2 link add ip6t2 type ip6tnl mode any local 2401:db02::66 remote 2401:db02::1
+	$IP -n ns2 link set dev ipt2 up
+	$IP -n ns2 link set dev ip6t2 up
+	$IP netns exec ns2 $TC qdisc add dev vens2 clsact
+	$IP netns exec ns2 $TC filter add dev vens2 ingress bpf da obj $REDIRECT_BPF sec drop_non_tun_vip
+	if [[ $tun_type == "ipip" ]]; then
+		$IP -n ns2 route add 10.1.1.0/24 dev ipt2
+		$IP netns exec ns2 sysctl -q -w net.ipv4.conf.all.rp_filter=0
+		$IP netns exec ns2 sysctl -q -w net.ipv4.conf.ipt2.rp_filter=0
+	else
+		$IP -n ns2 route add 10.1.1.0/24 dev ip6t2
+		$IP -n ns2 route add 2401:db01::/64 dev ip6t2
+		$IP netns exec ns2 sysctl -q -w net.ipv4.conf.all.rp_filter=0
+		$IP netns exec ns2 sysctl -q -w net.ipv4.conf.ip6t2.rp_filter=0
+	fi
+
+	$IP addr add 10.1.1.1/24 dev ve1
+	$IP addr add 2401:db01::1/64 dev ve1 nodad
+	$IP addr add 10.2.1.1/24 dev ve2
+	$IP addr add 2401:db02::1/64 dev ve2 nodad
+
+	$TC qdisc add dev ve2 clsact
+	$TC filter add dev ve2 ingress bpf da obj $REDIRECT_BPF sec l2_to_iptun_ingress_forward
+
+	sysctl -q -w net.ipv4.conf.all.rp_filter=0
+	sysctl -q -w net.ipv6.conf.all.forwarding=1
+}
+
+function cleanup {
+	set +e
+	[[ -z $DEBUG ]] || set +x
+	$IP netns delete ns1 >& /dev/null
+	$IP netns delete ns2 >& /dev/null
+	$IP link del ve1 >& /dev/null
+	$IP link del ve2 >& /dev/null
+	$IP link del ipt >& /dev/null
+	$IP link del ip6t >& /dev/null
+	sysctl -q -w net.ipv4.conf.all.rp_filter=$RP_FILTER
+	sysctl -q -w net.ipv6.conf.all.forwarding=$IPV6_FORWARDING
+	rm -f /sys/fs/bpf/tc/globals/tun_iface
+	[[ -z $DEBUG ]] || set -x
+	set -e
+}
+
+function l2_to_ipip {
+	echo -n "l2_to_ipip $1: "
+
+	local dir=$1
+
+	config_common ipip
+
+	$IP link add ipt type ipip external
+	$IP link set dev ipt up
+	sysctl -q -w net.ipv4.conf.ipt.rp_filter=0
+	sysctl -q -w net.ipv4.conf.ipt.forwarding=1
+
+	if [[ $dir == "egress" ]]; then
+		$IP route add 10.10.1.0/24 via 10.2.1.102 dev ve2
+		$TC filter add dev ve2 egress bpf da obj $REDIRECT_BPF sec l2_to_iptun_ingress_redirect
+		sysctl -q -w net.ipv4.conf.ve1.forwarding=1
+	else
+		$TC qdisc add dev ve1 clsact
+		$TC filter add dev ve1 ingress bpf da obj $REDIRECT_BPF sec l2_to_iptun_ingress_redirect
+	fi
+
+	$REDIRECT_USER -U /sys/fs/bpf/tc/globals/tun_iface -i $(< /sys/class/net/ipt/ifindex)
+
+	$IP netns exec ns1 ping -c1 10.10.1.102 >& /dev/null
+
+	if [[ $dir == "egress" ]]; then
+		# test direct egress to ve2 (i.e. not forwarding from
+		# ve1 to ve2).
+		ping -c1 10.10.1.102 >& /dev/null
+	fi
+
+	cleanup
+
+	echo "OK"
+}
+
+function l2_to_ip6tnl {
+	echo -n "l2_to_ip6tnl $1: "
+
+	local dir=$1
+
+	config_common ip6tnl
+
+	$IP link add ip6t type ip6tnl mode any external
+	$IP link set dev ip6t up
+	sysctl -q -w net.ipv4.conf.ip6t.rp_filter=0
+	sysctl -q -w net.ipv4.conf.ip6t.forwarding=1
+
+	if [[ $dir == "egress" ]]; then
+		$IP route add 10.10.1.0/24 via 10.2.1.102 dev ve2
+		$IP route add 2401:face::/64 via 2401:db02::66 dev ve2
+		$TC filter add dev ve2 egress bpf da obj $REDIRECT_BPF sec l2_to_ip6tun_ingress_redirect
+		sysctl -q -w net.ipv4.conf.ve1.forwarding=1
+	else
+		$TC qdisc add dev ve1 clsact
+		$TC filter add dev ve1 ingress bpf da obj $REDIRECT_BPF sec l2_to_ip6tun_ingress_redirect
+	fi
+
+	$REDIRECT_USER -U /sys/fs/bpf/tc/globals/tun_iface -i $(< /sys/class/net/ip6t/ifindex)
+
+	$IP netns exec ns1 ping -c1 10.10.1.102 >& /dev/null
+	$IP netns exec ns1 ping -6 -c1 2401:face::66 >& /dev/null
+
+	if [[ $dir == "egress" ]]; then
+		# test direct egress to ve2 (i.e. not forwarding from
+		# ve1 to ve2).
+		ping -c1 10.10.1.102 >& /dev/null
+		ping -6 -c1 2401:face::66 >& /dev/null
+	fi
+
+	cleanup
+
+	echo "OK"
+}
+
+cleanup
+test_names="l2_to_ipip l2_to_ip6tnl"
+test_dirs="ingress egress"
+if [[ $# -ge 2 ]]; then
+	test_names=$1
+	test_dirs=$2
+elif [[ $# -ge 1 ]]; then
+    	test_names=$1
+fi
+
+for t in $test_names; do
+	for d in $test_dirs; do
+		$t $d
+	done
+done
diff --git a/samples/bpf/tc_l2_redirect_kern.c b/samples/bpf/tc_l2_redirect_kern.c
new file mode 100644
index 0000000..92a4472
--- /dev/null
+++ b/samples/bpf/tc_l2_redirect_kern.c
@@ -0,0 +1,236 @@
+/* Copyright (c) 2016 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <uapi/linux/bpf.h>
+#include <uapi/linux/if_ether.h>
+#include <uapi/linux/if_packet.h>
+#include <uapi/linux/ip.h>
+#include <uapi/linux/ipv6.h>
+#include <uapi/linux/in.h>
+#include <uapi/linux/tcp.h>
+#include <uapi/linux/filter.h>
+#include <uapi/linux/pkt_cls.h>
+#include <net/ipv6.h>
+#include "bpf_helpers.h"
+
+#define _htonl __builtin_bswap32
+
+#define PIN_GLOBAL_NS		2
+struct bpf_elf_map {
+	__u32 type;
+	__u32 size_key;
+	__u32 size_value;
+	__u32 max_elem;
+	__u32 flags;
+	__u32 id;
+	__u32 pinning;
+};
+
+/* copy of 'struct ethhdr' without __packed */
+struct eth_hdr {
+	unsigned char   h_dest[ETH_ALEN];
+	unsigned char   h_source[ETH_ALEN];
+	unsigned short  h_proto;
+};
+
+struct bpf_elf_map SEC("maps") tun_iface = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.size_key = sizeof(int),
+	.size_value = sizeof(int),
+	.pinning = PIN_GLOBAL_NS,
+	.max_elem = 1,
+};
+
+static __always_inline bool is_vip_addr(__be16 eth_proto, __be32 daddr)
+{
+	if (eth_proto == htons(ETH_P_IP))
+		return (_htonl(0xffffff00) & daddr) == _htonl(0x0a0a0100);
+	else if (eth_proto == htons(ETH_P_IPV6))
+		return (daddr == _htonl(0x2401face));
+
+	return false;
+}
+
+SEC("l2_to_iptun_ingress_forward")
+int _l2_to_iptun_ingress_forward(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key tkey = {};
+	void *data = (void *)(long)skb->data;
+	struct eth_hdr *eth = data;
+	void *data_end = (void *)(long)skb->data_end;
+	int key = 0, *ifindex;
+
+	int ret;
+
+	if (data + sizeof(*eth) > data_end)
+		return TC_ACT_OK;
+
+	ifindex = bpf_map_lookup_elem(&tun_iface, &key);
+	if (!ifindex)
+		return TC_ACT_OK;
+
+	if (eth->h_proto == htons(ETH_P_IP)) {
+		char fmt4[] = "ingress forward to ifindex:%d daddr4:%x\n";
+		struct iphdr *iph = data + sizeof(*eth);
+
+		if (data + sizeof(*eth) + sizeof(*iph) > data_end)
+			return TC_ACT_OK;
+
+		if (iph->protocol != IPPROTO_IPIP)
+			return TC_ACT_OK;
+
+		bpf_trace_printk(fmt4, sizeof(fmt4), *ifindex,
+				 _htonl(iph->daddr));
+		return bpf_redirect(*ifindex, BPF_F_INGRESS);
+	} else if (eth->h_proto == htons(ETH_P_IPV6)) {
+		char fmt6[] = "ingress forward to ifindex:%d daddr6:%x::%x\n";
+		struct ipv6hdr *ip6h = data + sizeof(*eth);
+
+		if (data + sizeof(*eth) + sizeof(*ip6h) > data_end)
+			return TC_ACT_OK;
+
+		if (ip6h->nexthdr != IPPROTO_IPIP &&
+		    ip6h->nexthdr != IPPROTO_IPV6)
+			return TC_ACT_OK;
+
+		bpf_trace_printk(fmt6, sizeof(fmt6), *ifindex,
+				 _htonl(ip6h->daddr.s6_addr32[0]),
+				 _htonl(ip6h->daddr.s6_addr32[3]));
+		return bpf_redirect(*ifindex, BPF_F_INGRESS);
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("l2_to_iptun_ingress_redirect")
+int _l2_to_iptun_ingress_redirect(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key tkey = {};
+	void *data = (void *)(long)skb->data;
+	struct eth_hdr *eth = data;
+	void *data_end = (void *)(long)skb->data_end;
+	int key = 0, *ifindex;
+
+	int ret;
+
+	if (data + sizeof(*eth) > data_end)
+		return TC_ACT_OK;
+
+	ifindex = bpf_map_lookup_elem(&tun_iface, &key);
+	if (!ifindex)
+		return TC_ACT_OK;
+
+	if (eth->h_proto == htons(ETH_P_IP)) {
+		char fmt4[] = "e/ingress redirect daddr4:%x to ifindex:%d\n";
+		struct iphdr *iph = data + sizeof(*eth);
+		__be32 daddr = iph->daddr;
+
+		if (data + sizeof(*eth) + sizeof(*iph) > data_end)
+			return TC_ACT_OK;
+
+		if (!is_vip_addr(eth->h_proto, daddr))
+			return TC_ACT_OK;
+
+		bpf_trace_printk(fmt4, sizeof(fmt4), _htonl(daddr), *ifindex);
+	} else {
+		return TC_ACT_OK;
+	}
+
+	tkey.tunnel_id = 10000;
+	tkey.tunnel_ttl = 64;
+	tkey.remote_ipv4 = 0x0a020166; /* 10.2.1.102 */
+	bpf_skb_set_tunnel_key(skb, &tkey, sizeof(tkey), 0);
+	return bpf_redirect(*ifindex, 0);
+}
+
+SEC("l2_to_ip6tun_ingress_redirect")
+int _l2_to_ip6tun_ingress_redirect(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key tkey = {};
+	void *data = (void *)(long)skb->data;
+	struct eth_hdr *eth = data;
+	void *data_end = (void *)(long)skb->data_end;
+	int key = 0, *ifindex;
+
+	if (data + sizeof(*eth) > data_end)
+		return TC_ACT_OK;
+
+	ifindex = bpf_map_lookup_elem(&tun_iface, &key);
+	if (!ifindex)
+		return TC_ACT_OK;
+
+	if (eth->h_proto == htons(ETH_P_IP)) {
+		char fmt4[] = "e/ingress redirect daddr4:%x to ifindex:%d\n";
+		struct iphdr *iph = data + sizeof(*eth);
+
+		if (data + sizeof(*eth) + sizeof(*iph) > data_end)
+			return TC_ACT_OK;
+
+		if (!is_vip_addr(eth->h_proto, iph->daddr))
+			return TC_ACT_OK;
+
+		bpf_trace_printk(fmt4, sizeof(fmt4), _htonl(iph->daddr),
+				 *ifindex);
+	} else if (eth->h_proto == htons(ETH_P_IPV6)) {
+		char fmt6[] = "e/ingress redirect daddr6:%x to ifindex:%d\n";
+		struct ipv6hdr *ip6h = data + sizeof(*eth);
+
+		if (data + sizeof(*eth) + sizeof(*ip6h) > data_end)
+			return TC_ACT_OK;
+
+		if (!is_vip_addr(eth->h_proto, ip6h->daddr.s6_addr32[0]))
+			return TC_ACT_OK;
+
+		bpf_trace_printk(fmt6, sizeof(fmt6),
+				 _htonl(ip6h->daddr.s6_addr32[0]), *ifindex);
+	} else {
+		return TC_ACT_OK;
+	}
+
+	tkey.tunnel_id = 10000;
+	tkey.tunnel_ttl = 64;
+	/* 2401:db02:0:0:0:0:0:66 */
+	tkey.remote_ipv6[0] = _htonl(0x2401db02);
+	tkey.remote_ipv6[1] = 0;
+	tkey.remote_ipv6[2] = 0;
+	tkey.remote_ipv6[3] = _htonl(0x00000066);
+	bpf_skb_set_tunnel_key(skb, &tkey, sizeof(tkey), BPF_F_TUNINFO_IPV6);
+	return bpf_redirect(*ifindex, 0);
+}
+
+SEC("drop_non_tun_vip")
+int _drop_non_tun_vip(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key tkey = {};
+	void *data = (void *)(long)skb->data;
+	struct eth_hdr *eth = data;
+	void *data_end = (void *)(long)skb->data_end;
+
+	if (data + sizeof(*eth) > data_end)
+		return TC_ACT_OK;
+
+	if (eth->h_proto == htons(ETH_P_IP)) {
+		struct iphdr *iph = data + sizeof(*eth);
+
+		if (data + sizeof(*eth) + sizeof(*iph) > data_end)
+			return TC_ACT_OK;
+
+		if (is_vip_addr(eth->h_proto, iph->daddr))
+			return TC_ACT_SHOT;
+	} else if (eth->h_proto == htons(ETH_P_IPV6)) {
+		struct ipv6hdr *ip6h = data + sizeof(*eth);
+
+		if (data + sizeof(*eth) + sizeof(*ip6h) > data_end)
+			return TC_ACT_OK;
+
+		if (is_vip_addr(eth->h_proto, ip6h->daddr.s6_addr32[0]))
+			return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/samples/bpf/tc_l2_redirect_user.c b/samples/bpf/tc_l2_redirect_user.c
new file mode 100644
index 0000000..4013c53
--- /dev/null
+++ b/samples/bpf/tc_l2_redirect_user.c
@@ -0,0 +1,73 @@
+/* Copyright (c) 2016 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <linux/unistd.h>
+#include <linux/bpf.h>
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <string.h>
+#include <errno.h>
+
+#include "libbpf.h"
+
+static void usage(void)
+{
+	printf("Usage: tc_l2_ipip_redirect [...]\n");
+	printf("       -U <file>   Update an already pinned BPF array\n");
+	printf("       -i <ifindex> Interface index\n");
+	printf("       -h          Display this help\n");
+}
+
+int main(int argc, char **argv)
+{
+	const char *pinned_file = NULL;
+	int ifindex = -1;
+	int array_key = 0;
+	int array_fd = -1;
+	int ret = -1;
+	int opt;
+
+	while ((opt = getopt(argc, argv, "F:U:i:")) != -1) {
+		switch (opt) {
+		/* General args */
+		case 'U':
+			pinned_file = optarg;
+			break;
+		case 'i':
+			ifindex = atoi(optarg);
+			break;
+		default:
+			usage();
+			goto out;
+		}
+	}
+
+	if (ifindex < 0 || !pinned_file) {
+		usage();
+		goto out;
+	}
+
+	array_fd = bpf_obj_get(pinned_file);
+	if (array_fd < 0) {
+		fprintf(stderr, "bpf_obj_get(%s): %s(%d)\n",
+			pinned_file, strerror(errno), errno);
+		goto out;
+	}
+
+	/* bpf_tunnel_key.remote_ipv4 expects host byte orders */
+	ret = bpf_update_elem(array_fd, &array_key, &ifindex, 0);
+	if (ret) {
+		perror("bpf_update_elem");
+		goto out;
+	}
+
+out:
+	if (array_fd != -1)
+		close(array_fd);
+	return ret;
+}
-- 
2.5.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net 0/2] bpf: Fix bpf_redirect to an ipip/ip6tnl dev
  2016-11-09 23:36 [PATCH net 0/2] bpf: Fix bpf_redirect to an ipip/ip6tnl dev Martin KaFai Lau
  2016-11-09 23:36 ` [PATCH net 1/2] " Martin KaFai Lau
  2016-11-09 23:36 ` [PATCH net 2/2] bpf: Add test for bpf_redirect to ipip/ip6tnl Martin KaFai Lau
@ 2016-11-13  4:39 ` David Miller
  2 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2016-11-13  4:39 UTC (permalink / raw)
  To: kafai; +Cc: netdev, ast, daniel, kernel-team

From: Martin KaFai Lau <kafai@fb.com>
Date: Wed, 9 Nov 2016 15:36:32 -0800

> This patch set fixes a bug in bpf_redirect(dev, flags) when dev is an
> ipip/ip6tnl.  The current problem is IP-EthHdr-IP is sent out instead of
> IP-IP.
> 
> Patch 1 adds a dev->type test similar to dev_is_mac_header_xmit()
> in act_mirred.c which is only available in net-next.  We can consider to
> refactor it once this patch is pulled into net-next from net.

Series applied, thanks Martin.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net 1/2] bpf: Fix bpf_redirect to an ipip/ip6tnl dev
  2016-11-09 23:36 ` [PATCH net 1/2] " Martin KaFai Lau
@ 2019-01-10 20:41   ` Willem de Bruijn
  2019-01-10 22:22     ` Daniel Borkmann
  0 siblings, 1 reply; 8+ messages in thread
From: Willem de Bruijn @ 2019-01-10 20:41 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Network Development, Alexei Starovoitov, Daniel Borkmann, tgraf

On Wed, Nov 9, 2016 at 6:57 PM Martin KaFai Lau <kafai@fb.com> wrote:
>
> If the bpf program calls bpf_redirect(dev, 0) and dev is
> an ipip/ip6tnl, it currently includes the mac header.
> e.g. If dev is ipip, the end result is IP-EthHdr-IP instead
> of IP-IP.
>
> The fix is to pull the mac header.  At ingress, skb_postpull_rcsum()
> is not needed because the ethhdr should have been pulled once already
> and then got pushed back just before calling the bpf_prog.
> At egress, this patch calls skb_postpull_rcsum().
>
> If bpf_redirect(dev, BPF_F_INGRESS) is called,
> it also fails now because it calls dev_forward_skb() which
> eventually calls eth_type_trans(skb, dev).  The eth_type_trans()
> will set skb->type = PACKET_OTHERHOST because the mac address
> does not match the redirecting dev->dev_addr.  The PACKET_OTHERHOST
> will eventually cause the ip_rcv() errors out.  To fix this,
> ____dev_forward_skb() is added.
>
> Joint work with Daniel Borkmann.
>
> Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
> Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Alexei Starovoitov <ast@fb.com>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> ---
>  include/linux/netdevice.h | 15 +++++++++++
>  net/core/dev.c            | 17 +++++-------
>  net/core/filter.c         | 68 +++++++++++++++++++++++++++++++++++++++++------
>  3 files changed, 81 insertions(+), 19 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 91ee364..bf04a46 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3354,6 +3354,21 @@ int dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
>  bool is_skb_forwardable(const struct net_device *dev,
>                         const struct sk_buff *skb);
>
> +static __always_inline int ____dev_forward_skb(struct net_device *dev,
> +                                              struct sk_buff *skb)
> +{
> +       if (skb_orphan_frags(skb, GFP_ATOMIC) ||
> +           unlikely(!is_skb_forwardable(dev, skb))) {
> +               atomic_long_inc(&dev->rx_dropped);
> +               kfree_skb(skb);
> +               return NET_RX_DROP;
> +       }
> +
> +       skb_scrub_packet(skb, true);
> +       skb->priority = 0;
> +       return 0;
> +}
> +
>  void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev);
>
>  extern int             netdev_budget;
> diff --git a/net/core/dev.c b/net/core/dev.c
> index eaad4c2..6666b28 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1766,19 +1766,14 @@ EXPORT_SYMBOL_GPL(is_skb_forwardable);
>
>  int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
>  {
> -       if (skb_orphan_frags(skb, GFP_ATOMIC) ||
> -           unlikely(!is_skb_forwardable(dev, skb))) {
> -               atomic_long_inc(&dev->rx_dropped);
> -               kfree_skb(skb);
> -               return NET_RX_DROP;
> -       }
> +       int ret = ____dev_forward_skb(dev, skb);
>
> -       skb_scrub_packet(skb, true);
> -       skb->priority = 0;
> -       skb->protocol = eth_type_trans(skb, dev);
> -       skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
> +       if (likely(!ret)) {
> +               skb->protocol = eth_type_trans(skb, dev);
> +               skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
> +       }
>
> -       return 0;
> +       return ret;
>  }
>  EXPORT_SYMBOL_GPL(__dev_forward_skb);
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 00351cd..b391209 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -1628,6 +1628,19 @@ static inline int __bpf_rx_skb(struct net_device *dev, struct sk_buff *skb)
>         return dev_forward_skb(dev, skb);
>  }
>
> +static inline int __bpf_rx_skb_no_mac(struct net_device *dev,
> +                                     struct sk_buff *skb)
> +{
> +       int ret = ____dev_forward_skb(dev, skb);
> +
> +       if (likely(!ret)) {
> +               skb->dev = dev;
> +               ret = netif_rx(skb);
> +       }
> +
> +       return ret;
> +}
> +
>  static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
>  {
>         int ret;
> @@ -1647,6 +1660,51 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
>         return ret;
>  }
>
> +static int __bpf_redirect_no_mac(struct sk_buff *skb, struct net_device *dev,
> +                                u32 flags)
> +{
> +       /* skb->mac_len is not set on normal egress */
> +       unsigned int mlen = skb->network_header - skb->mac_header;
> +
> +       __skb_pull(skb, mlen);

syzbot found an issue when redirecting an LWT_XMIT program using
BPF_PROG_TEST_RUN.

That path produces packets with skb->data at skb->network_header, as
is_l2 is false in bpf_prog_test_run_skb.

I think the correct fix here is to pull from skb->data up to
skb->network_header, instead of unconditionally from skb->mac_header:

-       unsigned int mlen = skb->network_header - skb->mac_header;
+       unsigned int mlen = skb_network_offset(skb);

-       __skb_pull(skb, mlen);
-
-       /* At ingress, the mac header has already been pulled once.
-        * At egress, skb_pospull_rcsum has to be done in case that
-        * the skb is originated from ingress (i.e. a forwarded skb)
-        * to ensure that rcsum starts at net header.
-        */
-       if (!skb_at_tc_ingress(skb))
-               skb_postpull_rcsum(skb, skb_mac_header(skb), mlen);
+       if (mlen) {
+               __skb_pull(skb, mlen);
+
+               /* At ingress, the mac header has already been pulled once.
+                * At egress, skb_pospull_rcsum has to be done in case that
+                * the skb is originated from ingress (i.e. a forwarded skb)
+                * to ensure that rcsum starts at net header.
+                */
+               if (!skb_at_tc_ingress(skb))
+                       skb_postpull_rcsum(skb, skb_mac_header(skb), mlen);
+       }

Comments, concerns?

> +
> +       /* At ingress, the mac header has already been pulled once.
> +        * At egress, skb_pospull_rcsum has to be done in case that
> +        * the skb is originated from ingress (i.e. a forwarded skb)
> +        * to ensure that rcsum starts at net header.
> +        */
> +       if (!skb_at_tc_ingress(skb))
> +               skb_postpull_rcsum(skb, skb_mac_header(skb), mlen);
> +       skb_pop_mac_header(skb);
> +       skb_reset_mac_len(skb);
> +       return flags & BPF_F_INGRESS ?
> +              __bpf_rx_skb_no_mac(dev, skb) : __bpf_tx_skb(dev, skb);
> +}
> +
> +static int __bpf_redirect_common(struct sk_buff *skb, struct net_device *dev,
> +                                u32 flags)
> +{

This function has since been extended with check

+       /* Verify that a link layer header is carried */
+       if (unlikely(skb->mac_header >= skb->network_header)) {
+               kfree_skb(skb);
+               return -ERANGE;
+       }
+

I haven't checked yet, but I think that this will stillbe incorrect
from LWT_XMIT to a destination that expects skb->data at
skb->mac_header.

> +       bpf_push_mac_rcsum(skb);
> +       return flags & BPF_F_INGRESS ?
> +              __bpf_rx_skb(dev, skb) : __bpf_tx_skb(dev, skb);
> +}

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net 1/2] bpf: Fix bpf_redirect to an ipip/ip6tnl dev
  2019-01-10 20:41   ` Willem de Bruijn
@ 2019-01-10 22:22     ` Daniel Borkmann
  2019-01-10 22:44       ` Willem de Bruijn
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Borkmann @ 2019-01-10 22:22 UTC (permalink / raw)
  To: Willem de Bruijn, Martin KaFai Lau
  Cc: Network Development, Alexei Starovoitov, tgraf

On 01/10/2019 09:41 PM, Willem de Bruijn wrote:
> On Wed, Nov 9, 2016 at 6:57 PM Martin KaFai Lau <kafai@fb.com> wrote:
>>
>> If the bpf program calls bpf_redirect(dev, 0) and dev is
>> an ipip/ip6tnl, it currently includes the mac header.
>> e.g. If dev is ipip, the end result is IP-EthHdr-IP instead
>> of IP-IP.
>>
>> The fix is to pull the mac header.  At ingress, skb_postpull_rcsum()
>> is not needed because the ethhdr should have been pulled once already
>> and then got pushed back just before calling the bpf_prog.
>> At egress, this patch calls skb_postpull_rcsum().
>>
>> If bpf_redirect(dev, BPF_F_INGRESS) is called,
>> it also fails now because it calls dev_forward_skb() which
>> eventually calls eth_type_trans(skb, dev).  The eth_type_trans()
>> will set skb->type = PACKET_OTHERHOST because the mac address
>> does not match the redirecting dev->dev_addr.  The PACKET_OTHERHOST
>> will eventually cause the ip_rcv() errors out.  To fix this,
>> ____dev_forward_skb() is added.
>>
>> Joint work with Daniel Borkmann.
>>
>> Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
>> Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
>> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
>> Acked-by: Alexei Starovoitov <ast@fb.com>
>> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
>> ---
>>  include/linux/netdevice.h | 15 +++++++++++
>>  net/core/dev.c            | 17 +++++-------
>>  net/core/filter.c         | 68 +++++++++++++++++++++++++++++++++++++++++------
>>  3 files changed, 81 insertions(+), 19 deletions(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 91ee364..bf04a46 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -3354,6 +3354,21 @@ int dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
>>  bool is_skb_forwardable(const struct net_device *dev,
>>                         const struct sk_buff *skb);
>>
>> +static __always_inline int ____dev_forward_skb(struct net_device *dev,
>> +                                              struct sk_buff *skb)
>> +{
>> +       if (skb_orphan_frags(skb, GFP_ATOMIC) ||
>> +           unlikely(!is_skb_forwardable(dev, skb))) {
>> +               atomic_long_inc(&dev->rx_dropped);
>> +               kfree_skb(skb);
>> +               return NET_RX_DROP;
>> +       }
>> +
>> +       skb_scrub_packet(skb, true);
>> +       skb->priority = 0;
>> +       return 0;
>> +}
>> +
>>  void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev);
>>
>>  extern int             netdev_budget;
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index eaad4c2..6666b28 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -1766,19 +1766,14 @@ EXPORT_SYMBOL_GPL(is_skb_forwardable);
>>
>>  int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
>>  {
>> -       if (skb_orphan_frags(skb, GFP_ATOMIC) ||
>> -           unlikely(!is_skb_forwardable(dev, skb))) {
>> -               atomic_long_inc(&dev->rx_dropped);
>> -               kfree_skb(skb);
>> -               return NET_RX_DROP;
>> -       }
>> +       int ret = ____dev_forward_skb(dev, skb);
>>
>> -       skb_scrub_packet(skb, true);
>> -       skb->priority = 0;
>> -       skb->protocol = eth_type_trans(skb, dev);
>> -       skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
>> +       if (likely(!ret)) {
>> +               skb->protocol = eth_type_trans(skb, dev);
>> +               skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
>> +       }
>>
>> -       return 0;
>> +       return ret;
>>  }
>>  EXPORT_SYMBOL_GPL(__dev_forward_skb);
>>
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 00351cd..b391209 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -1628,6 +1628,19 @@ static inline int __bpf_rx_skb(struct net_device *dev, struct sk_buff *skb)
>>         return dev_forward_skb(dev, skb);
>>  }
>>
>> +static inline int __bpf_rx_skb_no_mac(struct net_device *dev,
>> +                                     struct sk_buff *skb)
>> +{
>> +       int ret = ____dev_forward_skb(dev, skb);
>> +
>> +       if (likely(!ret)) {
>> +               skb->dev = dev;
>> +               ret = netif_rx(skb);
>> +       }
>> +
>> +       return ret;
>> +}
>> +
>>  static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
>>  {
>>         int ret;
>> @@ -1647,6 +1660,51 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
>>         return ret;
>>  }
>>
>> +static int __bpf_redirect_no_mac(struct sk_buff *skb, struct net_device *dev,
>> +                                u32 flags)
>> +{
>> +       /* skb->mac_len is not set on normal egress */
>> +       unsigned int mlen = skb->network_header - skb->mac_header;
>> +
>> +       __skb_pull(skb, mlen);
> 
> syzbot found an issue when redirecting an LWT_XMIT program using
> BPF_PROG_TEST_RUN.
> 
> That path produces packets with skb->data at skb->network_header, as
> is_l2 is false in bpf_prog_test_run_skb.
> 
> I think the correct fix here is to pull from skb->data up to
> skb->network_header, instead of unconditionally from skb->mac_header:
> 
> -       unsigned int mlen = skb->network_header - skb->mac_header;
> +       unsigned int mlen = skb_network_offset(skb);
> 
> -       __skb_pull(skb, mlen);
> -
> -       /* At ingress, the mac header has already been pulled once.
> -        * At egress, skb_pospull_rcsum has to be done in case that
> -        * the skb is originated from ingress (i.e. a forwarded skb)
> -        * to ensure that rcsum starts at net header.
> -        */
> -       if (!skb_at_tc_ingress(skb))
> -               skb_postpull_rcsum(skb, skb_mac_header(skb), mlen);
> +       if (mlen) {
> +               __skb_pull(skb, mlen);
> +
> +               /* At ingress, the mac header has already been pulled once.
> +                * At egress, skb_pospull_rcsum has to be done in case that
> +                * the skb is originated from ingress (i.e. a forwarded skb)
> +                * to ensure that rcsum starts at net header.
> +                */
> +               if (!skb_at_tc_ingress(skb))
> +                       skb_postpull_rcsum(skb, skb_mac_header(skb), mlen);
> +       }
> 
> Comments, concerns?

Hmm, thanks for the report! Do you have some more info on the root cause,
are you saying that skb->mac_header is invalid (~0U) and therefore the
subsequent __skb_pull() will set skb->data to garbage? But then also the
skb_postpull_rcsum() from above would be wrong as well, no? Given we have
no actual LWT_* program under BPF kselftest, I'm actually wondering whether
the BPF_PROG_TEST_RUN framework correctly reflects skb setup from lwt with
all its assumptions, but if that is indeed correct and is the only such
case that causes trouble then another option would be to fix up run_lwt_bpf()
to correctly prep the skb before skb_do_redirect() to avoid potential
breakage on others. Not sure if this would do it, so wild shot in the dark:

diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index 3e85437..a648568 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -63,6 +63,7 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt,
                                     lwt->name ? : "<unknown>");
                        ret = BPF_OK;
                } else {
+                       skb_reset_mac_header(skb);
                        ret = skb_do_redirect(skb);
                        if (ret == 0)
                                ret = BPF_REDIRECT;

>> +
>> +       /* At ingress, the mac header has already been pulled once.
>> +        * At egress, skb_pospull_rcsum has to be done in case that
>> +        * the skb is originated from ingress (i.e. a forwarded skb)
>> +        * to ensure that rcsum starts at net header.
>> +        */
>> +       if (!skb_at_tc_ingress(skb))
>> +               skb_postpull_rcsum(skb, skb_mac_header(skb), mlen);
>> +       skb_pop_mac_header(skb);
>> +       skb_reset_mac_len(skb);
>> +       return flags & BPF_F_INGRESS ?
>> +              __bpf_rx_skb_no_mac(dev, skb) : __bpf_tx_skb(dev, skb);
>> +}
>> +
>> +static int __bpf_redirect_common(struct sk_buff *skb, struct net_device *dev,
>> +                                u32 flags)
>> +{
> 
> This function has since been extended with check
> 
> +       /* Verify that a link layer header is carried */
> +       if (unlikely(skb->mac_header >= skb->network_header)) {
> +               kfree_skb(skb);
> +               return -ERANGE;
> +       }
> +
> 
> I haven't checked yet, but I think that this will stillbe incorrect
> from LWT_XMIT to a destination that expects skb->data at
> skb->mac_header.

Hmm, but this is in __bpf_redirect_common(), the above issue described
is in __bpf_redirect_no_mac(). Meaning, if a LWT_XMIT prog wants to
egress redirect to a device that expects mac header, then I think the
above check is okay for dropping invalid skb, so the prog first needs
to call bpf_skb_change_head() helper to add a mac header instead.

>> +       bpf_push_mac_rcsum(skb);
>> +       return flags & BPF_F_INGRESS ?
>> +              __bpf_rx_skb(dev, skb) : __bpf_tx_skb(dev, skb);
>> +}

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net 1/2] bpf: Fix bpf_redirect to an ipip/ip6tnl dev
  2019-01-10 22:22     ` Daniel Borkmann
@ 2019-01-10 22:44       ` Willem de Bruijn
  2019-01-16  1:28         ` Willem de Bruijn
  0 siblings, 1 reply; 8+ messages in thread
From: Willem de Bruijn @ 2019-01-10 22:44 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Martin KaFai Lau, Network Development, Alexei Starovoitov, tgraf

On Thu, Jan 10, 2019 at 5:22 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 01/10/2019 09:41 PM, Willem de Bruijn wrote:
> > On Wed, Nov 9, 2016 at 6:57 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >>
> >> If the bpf program calls bpf_redirect(dev, 0) and dev is
> >> an ipip/ip6tnl, it currently includes the mac header.
> >> e.g. If dev is ipip, the end result is IP-EthHdr-IP instead
> >> of IP-IP.
> >>
> >> The fix is to pull the mac header.  At ingress, skb_postpull_rcsum()
> >> is not needed because the ethhdr should have been pulled once already
> >> and then got pushed back just before calling the bpf_prog.
> >> At egress, this patch calls skb_postpull_rcsum().
> >>
> >> If bpf_redirect(dev, BPF_F_INGRESS) is called,
> >> it also fails now because it calls dev_forward_skb() which
> >> eventually calls eth_type_trans(skb, dev).  The eth_type_trans()
> >> will set skb->type = PACKET_OTHERHOST because the mac address
> >> does not match the redirecting dev->dev_addr.  The PACKET_OTHERHOST
> >> will eventually cause the ip_rcv() errors out.  To fix this,
> >> ____dev_forward_skb() is added.
> >>
> >> Joint work with Daniel Borkmann.
> >>
> >> Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
> >> Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
> >> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> >> Acked-by: Alexei Starovoitov <ast@fb.com>
> >> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> >> ---
> >>  include/linux/netdevice.h | 15 +++++++++++
> >>  net/core/dev.c            | 17 +++++-------
> >>  net/core/filter.c         | 68 +++++++++++++++++++++++++++++++++++++++++------
> >>  3 files changed, 81 insertions(+), 19 deletions(-)
> >>
> >> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >> index 91ee364..bf04a46 100644
> >> --- a/include/linux/netdevice.h
> >> +++ b/include/linux/netdevice.h
> >> @@ -3354,6 +3354,21 @@ int dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
> >>  bool is_skb_forwardable(const struct net_device *dev,
> >>                         const struct sk_buff *skb);
> >>
> >> +static __always_inline int ____dev_forward_skb(struct net_device *dev,
> >> +                                              struct sk_buff *skb)
> >> +{
> >> +       if (skb_orphan_frags(skb, GFP_ATOMIC) ||
> >> +           unlikely(!is_skb_forwardable(dev, skb))) {
> >> +               atomic_long_inc(&dev->rx_dropped);
> >> +               kfree_skb(skb);
> >> +               return NET_RX_DROP;
> >> +       }
> >> +
> >> +       skb_scrub_packet(skb, true);
> >> +       skb->priority = 0;
> >> +       return 0;
> >> +}
> >> +
> >>  void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev);
> >>
> >>  extern int             netdev_budget;
> >> diff --git a/net/core/dev.c b/net/core/dev.c
> >> index eaad4c2..6666b28 100644
> >> --- a/net/core/dev.c
> >> +++ b/net/core/dev.c
> >> @@ -1766,19 +1766,14 @@ EXPORT_SYMBOL_GPL(is_skb_forwardable);
> >>
> >>  int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
> >>  {
> >> -       if (skb_orphan_frags(skb, GFP_ATOMIC) ||
> >> -           unlikely(!is_skb_forwardable(dev, skb))) {
> >> -               atomic_long_inc(&dev->rx_dropped);
> >> -               kfree_skb(skb);
> >> -               return NET_RX_DROP;
> >> -       }
> >> +       int ret = ____dev_forward_skb(dev, skb);
> >>
> >> -       skb_scrub_packet(skb, true);
> >> -       skb->priority = 0;
> >> -       skb->protocol = eth_type_trans(skb, dev);
> >> -       skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
> >> +       if (likely(!ret)) {
> >> +               skb->protocol = eth_type_trans(skb, dev);
> >> +               skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
> >> +       }
> >>
> >> -       return 0;
> >> +       return ret;
> >>  }
> >>  EXPORT_SYMBOL_GPL(__dev_forward_skb);
> >>
> >> diff --git a/net/core/filter.c b/net/core/filter.c
> >> index 00351cd..b391209 100644
> >> --- a/net/core/filter.c
> >> +++ b/net/core/filter.c
> >> @@ -1628,6 +1628,19 @@ static inline int __bpf_rx_skb(struct net_device *dev, struct sk_buff *skb)
> >>         return dev_forward_skb(dev, skb);
> >>  }
> >>
> >> +static inline int __bpf_rx_skb_no_mac(struct net_device *dev,
> >> +                                     struct sk_buff *skb)
> >> +{
> >> +       int ret = ____dev_forward_skb(dev, skb);
> >> +
> >> +       if (likely(!ret)) {
> >> +               skb->dev = dev;
> >> +               ret = netif_rx(skb);
> >> +       }
> >> +
> >> +       return ret;
> >> +}
> >> +
> >>  static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
> >>  {
> >>         int ret;
> >> @@ -1647,6 +1660,51 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb)
> >>         return ret;
> >>  }
> >>
> >> +static int __bpf_redirect_no_mac(struct sk_buff *skb, struct net_device *dev,
> >> +                                u32 flags)
> >> +{
> >> +       /* skb->mac_len is not set on normal egress */
> >> +       unsigned int mlen = skb->network_header - skb->mac_header;
> >> +
> >> +       __skb_pull(skb, mlen);
> >
> > syzbot found an issue when redirecting an LWT_XMIT program using
> > BPF_PROG_TEST_RUN.
> >
> > That path produces packets with skb->data at skb->network_header, as
> > is_l2 is false in bpf_prog_test_run_skb.
> >
> > I think the correct fix here is to pull from skb->data up to
> > skb->network_header, instead of unconditionally from skb->mac_header:
> >
> > -       unsigned int mlen = skb->network_header - skb->mac_header;
> > +       unsigned int mlen = skb_network_offset(skb);
> >
> > -       __skb_pull(skb, mlen);
> > -
> > -       /* At ingress, the mac header has already been pulled once.
> > -        * At egress, skb_pospull_rcsum has to be done in case that
> > -        * the skb is originated from ingress (i.e. a forwarded skb)
> > -        * to ensure that rcsum starts at net header.
> > -        */
> > -       if (!skb_at_tc_ingress(skb))
> > -               skb_postpull_rcsum(skb, skb_mac_header(skb), mlen);
> > +       if (mlen) {
> > +               __skb_pull(skb, mlen);
> > +
> > +               /* At ingress, the mac header has already been pulled once.
> > +                * At egress, skb_pospull_rcsum has to be done in case that
> > +                * the skb is originated from ingress (i.e. a forwarded skb)
> > +                * to ensure that rcsum starts at net header.
> > +                */
> > +               if (!skb_at_tc_ingress(skb))
> > +                       skb_postpull_rcsum(skb, skb_mac_header(skb), mlen);
> > +       }
> >
> > Comments, concerns?
>
> Hmm, thanks for the report! Do you have some more info on the root cause,

Thanks for the quick follow-up :)

> are you saying that skb->mac_header is invalid (~0U) and therefore the
> subsequent __skb_pull() will set skb->data to garbage?

The field are initialized correctly. At entry to __bpf_redirect_no_mac:

skb->len = 0
mac_hdr = 64
net_hdr = 78
mac_len = 0

In bpf_prog_test_run_skb, it built an exactly 14 byte packet.

Just before calling the bpf program:

skb->len = 0
mac offset = -14
mac_len = 0

It's all setup correctly due to the call to eth_type_trans.

> But then also the
> skb_postpull_rcsum() from above would be wrong as well, no? Given we have
> no actual LWT_* program under BPF kselftest, I'm actually wondering whether
> the BPF_PROG_TEST_RUN framework correctly reflects skb setup from lwt with
> all its assumptions, but if that is indeed correct and is the only such
> case that causes trouble then another option would be to fix up run_lwt_bpf()

This call stack does not seem to go through run_lwt_bpf():

[   60.305641] __bpf_redirect+0x287/0x1270
[   60.316567]  bpf_clone_redirect+0x37b/0x520
[   60.317097]  ___bpf_prog_run+0x24d4/0x6610
[   60.321365]  __bpf_prog_run512+0xea/0x150
[   60.329756]  bpf_test_run+0x257/0x6c0
[   60.330760]  bpf_prog_test_run_skb+0xc0a/0x1470
[   60.334455]  __do_sys_bpf+0x139e/0x3a60
[   60.341832]  __x64_sys_bpf+0x73/0xb0
[   60.342287]  do_syscall_64+0x192/0x770

Program that syzkaller cooked, (poorly) manually annotated:

[   43.588807]   code=0xb7 dst=2 src=0 off=0 imm=8
BPF_MOV64_IMM
[   43.589395]   code=0xbf dst=3 src=10 off=0 imm=0
BPF_MOV64_REG
[   43.589986]   code=0x7 dst=3 src=0 off=0 imm=-512
BPF_MISC
[   43.590581]   code=0x7a dst=10 src=0 off=-16 imm=-8
[   43.591210]   code=0x79 dst=4 src=10 off=-16 imm=0
[   43.591879]   code=0xb7 dst=6 src=0 off=0 imm=-1
BPF_MOV64_IMM
[   43.592517]   code=0x2d dst=4 src=6 off=5 imm=0
[   43.593181]   code=0x65 dst=4 src=0 off=4 imm=1
BPF_JMP
[   43.593856]   code=0x4 dst=4 src=0 off=0 imm=1618804737
[   43.594573]   code=0xb7 dst=3 src=0 off=0 imm=0
BPF_MOV64_IMM
[   43.595238]   code=0x6a dst=10 src=0 off=-512 imm=0
[   43.595920]   code=0x85 dst=0 src=0 off=0 imm=13
BPF_CALL clone_redirect
[   43.596560]   code=0xb7 dst=0 src=0 off=0 imm=201326592
BPF_MOV64_IMM
[   43.597278]   code=0x95 dst=0 src=0 off=0 imm=0
BPF_EXIT_INSN

> to correctly prep the skb before skb_do_redirect() to avoid potential
> breakage on others. Not sure if this would do it, so wild shot in the dark:
>
> diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
> index 3e85437..a648568 100644
> --- a/net/core/lwt_bpf.c
> +++ b/net/core/lwt_bpf.c
> @@ -63,6 +63,7 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt,
>                                      lwt->name ? : "<unknown>");
>                         ret = BPF_OK;
>                 } else {
> +                       skb_reset_mac_header(skb);
>                         ret = skb_do_redirect(skb);
>                         if (ret == 0)
>                                 ret = BPF_REDIRECT;
>
> >> +
> >> +       /* At ingress, the mac header has already been pulled once.
> >> +        * At egress, skb_pospull_rcsum has to be done in case that
> >> +        * the skb is originated from ingress (i.e. a forwarded skb)
> >> +        * to ensure that rcsum starts at net header.
> >> +        */
> >> +       if (!skb_at_tc_ingress(skb))
> >> +               skb_postpull_rcsum(skb, skb_mac_header(skb), mlen);
> >> +       skb_pop_mac_header(skb);
> >> +       skb_reset_mac_len(skb);
> >> +       return flags & BPF_F_INGRESS ?
> >> +              __bpf_rx_skb_no_mac(dev, skb) : __bpf_tx_skb(dev, skb);
> >> +}
> >> +
> >> +static int __bpf_redirect_common(struct sk_buff *skb, struct net_device *dev,
> >> +                                u32 flags)
> >> +{
> >
> > This function has since been extended with check
> >
> > +       /* Verify that a link layer header is carried */
> > +       if (unlikely(skb->mac_header >= skb->network_header)) {
> > +               kfree_skb(skb);
> > +               return -ERANGE;
> > +       }
> > +
> >
> > I haven't checked yet, but I think that this will stillbe incorrect
> > from LWT_XMIT to a destination that expects skb->data at
> > skb->mac_header.
>
> Hmm, but this is in __bpf_redirect_common(), the above issue described
> is in __bpf_redirect_no_mac(). Meaning, if a LWT_XMIT prog wants to
> egress redirect to a device that expects mac header, then I think the
> above check is okay for dropping invalid skb, so the prog first needs
> to call bpf_skb_change_head() helper to add a mac header instead.

Ah, I hadn't thought of the option for the program itself to have to
insert the header.

At least for the bpf_prog_test_run_skb case, the mac header will
already exist, just right before skb->data and skb->network_header. So
the test won't catch it. But this is likely not true for real LWT_XMIT
paths.
>
> >> +       bpf_push_mac_rcsum(skb);
> >> +       return flags & BPF_F_INGRESS ?
> >> +              __bpf_rx_skb(dev, skb) : __bpf_tx_skb(dev, skb);
> >> +}
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net 1/2] bpf: Fix bpf_redirect to an ipip/ip6tnl dev
  2019-01-10 22:44       ` Willem de Bruijn
@ 2019-01-16  1:28         ` Willem de Bruijn
  0 siblings, 0 replies; 8+ messages in thread
From: Willem de Bruijn @ 2019-01-16  1:28 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Martin KaFai Lau, Network Development, Alexei Starovoitov, tgraf

On Thu, Jan 10, 2019 at 5:44 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Thu, Jan 10, 2019 at 5:22 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > On 01/10/2019 09:41 PM, Willem de Bruijn wrote:
> > > On Wed, Nov 9, 2016 at 6:57 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > >>
> > >> If the bpf program calls bpf_redirect(dev, 0) and dev is
> > >> an ipip/ip6tnl, it currently includes the mac header.
> > >> e.g. If dev is ipip, the end result is IP-EthHdr-IP instead
> > >> of IP-IP.
> > >>
> > >> The fix is to pull the mac header.  At ingress, skb_postpull_rcsum()
> > >> is not needed because the ethhdr should have been pulled once already
> > >> and then got pushed back just before calling the bpf_prog.
> > >> At egress, this patch calls skb_postpull_rcsum().
> > >>
> > >> If bpf_redirect(dev, BPF_F_INGRESS) is called,
> > >> it also fails now because it calls dev_forward_skb() which
> > >> eventually calls eth_type_trans(skb, dev).  The eth_type_trans()
> > >> will set skb->type = PACKET_OTHERHOST because the mac address
> > >> does not match the redirecting dev->dev_addr.  The PACKET_OTHERHOST
> > >> will eventually cause the ip_rcv() errors out.  To fix this,
> > >> ____dev_forward_skb() is added.
> > >>
> > >> Joint work with Daniel Borkmann.
> > >>
> > >> Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
> > >> Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
> > >> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> > >> Acked-by: Alexei Starovoitov <ast@fb.com>
> > >> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > >> ---

> > >> +static int __bpf_redirect_no_mac(struct sk_buff *skb, struct net_device *dev,
> > >> +                                u32 flags)
> > >> +{
> > >> +       /* skb->mac_len is not set on normal egress */
> > >> +       unsigned int mlen = skb->network_header - skb->mac_header;
> > >> +
> > >> +       __skb_pull(skb, mlen);
> > >
> > > syzbot found an issue when redirecting an LWT_XMIT program using
> > > BPF_PROG_TEST_RUN.
> > >
> > > That path produces packets with skb->data at skb->network_header, as
> > > is_l2 is false in bpf_prog_test_run_skb.
> > >
> > > I think the correct fix here is to pull from skb->data up to
> > > skb->network_header, instead of unconditionally from skb->mac_header:
> > >
> > > -       unsigned int mlen = skb->network_header - skb->mac_header;
> > > +       unsigned int mlen = skb_network_offset(skb);
> > >
> > > -       __skb_pull(skb, mlen);
> > > -
> > > -       /* At ingress, the mac header has already been pulled once.
> > > -        * At egress, skb_pospull_rcsum has to be done in case that
> > > -        * the skb is originated from ingress (i.e. a forwarded skb)
> > > -        * to ensure that rcsum starts at net header.
> > > -        */
> > > -       if (!skb_at_tc_ingress(skb))
> > > -               skb_postpull_rcsum(skb, skb_mac_header(skb), mlen);
> > > +       if (mlen) {
> > > +               __skb_pull(skb, mlen);
> > > +
> > > +               /* At ingress, the mac header has already been pulled once.
> > > +                * At egress, skb_pospull_rcsum has to be done in case that
> > > +                * the skb is originated from ingress (i.e. a forwarded skb)
> > > +                * to ensure that rcsum starts at net header.
> > > +                */
> > > +               if (!skb_at_tc_ingress(skb))
> > > +                       skb_postpull_rcsum(skb, skb_mac_header(skb), mlen);
> > > +       }
> > >
> > > Comments, concerns?
> >
> > Hmm, thanks for the report! Do you have some more info on the root cause,
>
> > But then also the
> > skb_postpull_rcsum() from above would be wrong as well, no? Given we have
> > no actual LWT_* program under BPF kselftest, I'm actually wondering whether
> > the BPF_PROG_TEST_RUN framework correctly reflects skb setup from lwt with
> > all its assumptions, but if that is indeed correct and is the only such
> > case that causes trouble then another option would be to fix up run_lwt_bpf()
> > to correctly prep the skb before skb_do_redirect() to avoid potential
> > breakage on others. Not sure if this would do it, so wild shot in the dark:
> >
> > diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
> > index 3e85437..a648568 100644
> > --- a/net/core/lwt_bpf.c
> > +++ b/net/core/lwt_bpf.c
> > @@ -63,6 +63,7 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt,
> >                                      lwt->name ? : "<unknown>");
> >                         ret = BPF_OK;
> >                 } else {
> > +                       skb_reset_mac_header(skb);
> >                         ret = skb_do_redirect(skb);
> >                         if (ret == 0)
> >                                 ret = BPF_REDIRECT;
> >

Sent http://patchwork.ozlabs.org/patch/1025584 including your
suggestion. Thanks Daniel. mac_header is indeed not yet set at
ip_finish_output2 and lwtunnel_xmit.

This line is not strictly needed to fix the bad packet that syzkaller
cooked through bpf_prog_test_run_skb. But we care more about fixing
the real lightweight tunnel paths and both changes will be needed
there.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-01-16  1:29 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-09 23:36 [PATCH net 0/2] bpf: Fix bpf_redirect to an ipip/ip6tnl dev Martin KaFai Lau
2016-11-09 23:36 ` [PATCH net 1/2] " Martin KaFai Lau
2019-01-10 20:41   ` Willem de Bruijn
2019-01-10 22:22     ` Daniel Borkmann
2019-01-10 22:44       ` Willem de Bruijn
2019-01-16  1:28         ` Willem de Bruijn
2016-11-09 23:36 ` [PATCH net 2/2] bpf: Add test for bpf_redirect to ipip/ip6tnl Martin KaFai Lau
2016-11-13  4:39 ` [PATCH net 0/2] bpf: Fix bpf_redirect to an ipip/ip6tnl dev David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.