* [PATCH v2 net-next 0/7] udp: Flow dissection for tunnels
@ 2016-10-17 19:41 Tom Herbert
  2016-10-17 19:41 ` [PATCH v2 net-next 1/7] ipv6: Fix Makefile conditional to use CONFIG_INET Tom Herbert
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: Tom Herbert @ 2016-10-17 19:41 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

Now that we have a means to perform a UDP socket lookup without taking
a reference, it is feasible to have flow dissector crack open UDP
encapsulated packets. Generally, we would expect that the UDP source
port or the flow label in IPv6 would contain enough entropy about
the encapsulated flow. However, there will be cases, such as a static
UDP tunnel with fixed ports, where dissecting the encapsulated packet
is valuable.

The model here is similar to that implemented for UDP GRO. A
tunnel implementation (e.g. GUE) may set a flow_dissect function
in the udp_sk. In __skb_flow_dissect a case has been added for
UDP to check if there is a socket with flow_dissect set. If there
is, the function is called. The (per tunnel implementation)
function can parse the encapsulation headers and return the
next protocol for __skb_flow_dissect to process and its position
in nhoff.
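
As a rough illustration of this model, here is a user-space sketch in
plain C. It is not the kernel code: every toy_* name, the flat socket
table, and the simplified callback signature are assumptions made for
the example. It only shows the shape of the idea, a per-socket hook
that reports the inner protocol and advances nhoff.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; every toy_* name is
 * an assumption made for this sketch.
 */
enum { TOY_RET_PASS = 0, TOY_RET_IPPROTO };

#define TOY_UDP_HLEN		8	/* a UDP header is 8 bytes */
#define TOY_IPPROTO_IPIP	4

struct toy_sock {
	uint16_t port;		/* local UDP port, host byte order */
	/* Optional per-tunnel hook; NULL for an ordinary UDP socket. */
	int (*flow_dissect)(const struct toy_sock *sk, int *nhoff,
			    uint8_t *ip_proto);
};

/* FOU-like hook: the inner protocol is fixed per socket, so only the
 * UDP header has to be skipped.
 */
int toy_fou_dissect(const struct toy_sock *sk, int *nhoff, uint8_t *ip_proto)
{
	(void)sk;
	*ip_proto = TOY_IPPROTO_IPIP;
	*nhoff += TOY_UDP_HLEN;
	return TOY_RET_IPPROTO;
}

/* UDP case of the dissector: find the socket, call its hook if any. */
int toy_udp_dissect(const struct toy_sock *tbl, size_t n, uint16_t dport,
		    int *nhoff, uint8_t *ip_proto)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tbl[i].port != dport)
			continue;
		if (tbl[i].flow_dissect)
			return tbl[i].flow_dissect(&tbl[i], nhoff, ip_proto);
		break;
	}
	return TOY_RET_PASS;
}
```

A socket without a hook, or a port with no socket at all, yields
TOY_RET_PASS and leaves nhoff untouched, so ordinary UDP traffic is
hashed exactly as before.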

Since performing a UDP lookup on every packet might be expensive,
I added a static key check to bypass the lookup if there are no
sockets with flow_dissect set. I should mention that doing the
lookup wasn't particularly a big hit anyway.
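
The bypass can be pictured with a user-space analogue. The sketch below
is an assumption-laden stand-in: a real static key patches the branch
in the instruction stream, whereas this uses an ordinary C11 atomic
counter, and the toy_* names are hypothetical. It shows only the
enable/disable counting and the cheap check taken before any lookup.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Number of sockets that currently have a dissect hook installed. */
static atomic_int toy_dissect_users;

/* Called when a socket installs a flow_dissect hook. */
void toy_dissect_enable(void)
{
	atomic_fetch_add(&toy_dissect_users, 1);
}

/* Called when such a socket is destroyed. */
void toy_dissect_disable(void)
{
	atomic_fetch_sub(&toy_dissect_users, 1);
}

/* Fast-path check: skip the socket lookup when nobody asked for it. */
bool toy_dissect_needed(void)
{
	return atomic_load_explicit(&toy_dissect_users,
				    memory_order_relaxed) > 0;
}
```

Because the count is per system rather than per port, a single tunnel
socket with the hook set turns the lookup on for all UDP packets.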

Fou/gue was modified to perform tunnel dissection. This is enabled
on each listener socket via a netlink configuration option.

v2:
  - davem suggested that we don't need udp_flow_dissect and that
    udp{v6}_encap_needed could be used. Problem is that those are
    in respective udp.c and flow_dissector.c is in net/core. Keep
    udp_flow_dissect as more generic item.
  - Fixed Makefile issue where we were using CONFIG_NET instead of
    CONFIG_INET.
  - Added limits in flow dissector for controlling the number of nested
    encapsulations or EHs that are dissected.
  - Added CONFIG_INET around use of inet_offloads in flow_dissector.c.

Tested:

Running 200 streams with TCP_RR.

GRE/GUE variable source port (baseline)
RSS distributes packets, RFS is effective
1211702 tps
147/241/442 50/90/99% latencies
87.95% CPU utilization

GRE/GUE fixed source port
All packets to one CPU, RFS is ineffective
173680 tps
1170/1377/1853 50/90/99% latencies
7.42% CPU utilization

GRE/GUE fixed source port with deep hash enabled
All packets to one CPU, but now RFS is effective
730359 tps
263/325/464 50/90/99% latencies
38.25% CPU utilization (Interrupting CPU is maxed out)


Tom Herbert (7):
  ipv6: Fix Makefile conditional to use CONFIG_INET
  flow_dissector: Limit processing of next encaps and extensions
  udp: Add socket lookup functions with noref
  udp: UDP flow dissector
  udp: Add UDP flow dissection functions to IPv4 and IPv6
  udp: UDP tunnel flow dissection infrastructure
  fou: Support flow dissection

 include/linux/netdevice.h    |   5 ++
 include/linux/udp.h          |   7 +++
 include/net/flow_dissector.h |   8 +++
 include/net/udp.h            |  12 +++++
 include/net/udp_tunnel.h     |   5 ++
 include/uapi/linux/fou.h     |   1 +
 net/Makefile                 |   2 +-
 net/core/flow_dissector.c    | 122 ++++++++++++++++++++++++++++++++++++++-----
 net/ipv4/fou.c               |  68 +++++++++++++++++++++++-
 net/ipv4/udp.c               |  11 ++++
 net/ipv4/udp_offload.c       |  39 ++++++++++++++
 net/ipv4/udp_tunnel.c        |   5 ++
 net/ipv6/udp.c               |  10 ++++
 net/ipv6/udp_offload.c       |  40 +++++++++++++-
 14 files changed, 320 insertions(+), 15 deletions(-)

-- 
2.9.3

* [PATCH v2 net-next 1/7] ipv6: Fix Makefile conditional to use CONFIG_INET
  2016-10-17 19:41 [PATCH v2 net-next 0/7] udp: Flow dissection for tunnels Tom Herbert
@ 2016-10-17 19:41 ` Tom Herbert
  2016-10-17 19:41 ` [PATCH v2 net-next 2/7] flow_dissector: Limit processing of next encaps and extensions Tom Herbert
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Tom Herbert @ 2016-10-17 19:41 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

The ipv6 directory was being built based on CONFIG_NET, not CONFIG_INET.

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 net/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/Makefile b/net/Makefile
index 4cafaa2..82ffb91 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -17,7 +17,7 @@ obj-$(CONFIG_NETFILTER)		+= netfilter/
 obj-$(CONFIG_INET)		+= ipv4/
 obj-$(CONFIG_XFRM)		+= xfrm/
 obj-$(CONFIG_UNIX)		+= unix/
-obj-$(CONFIG_NET)		+= ipv6/
+obj-$(CONFIG_INET)		+= ipv6/
 obj-$(CONFIG_PACKET)		+= packet/
 obj-$(CONFIG_NET_KEY)		+= key/
 obj-$(CONFIG_BRIDGE)		+= bridge/
-- 
2.9.3

* [PATCH v2 net-next 2/7] flow_dissector: Limit processing of next encaps and extensions
  2016-10-17 19:41 [PATCH v2 net-next 0/7] udp: Flow dissection for tunnels Tom Herbert
  2016-10-17 19:41 ` [PATCH v2 net-next 1/7] ipv6: Fix Makefile conditional to use CONFIG_INET Tom Herbert
@ 2016-10-17 19:41 ` Tom Herbert
  2016-10-17 19:41 ` [PATCH v2 net-next 3/7] udp: Add socket lookup functions with noref Tom Herbert
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Tom Herbert @ 2016-10-17 19:41 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

Flow dissector does not limit the number of encapsulated packets or IPv6
extension headers that will be processed. This leaves it susceptible to
a DoS attack; for instance, a 1500 byte packet could contain 75 IPIP
headers.

This patch places limits on the number of encapsulations and IPv6 extension
headers that are processed in flow dissector.
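
To make the 75-header figure concrete (1500 / 20 = 75 option-less IPv4
headers), here is a user-space sketch of a depth-limited walk. It is an
illustration only: the toy_* names are hypothetical, and it assumes
fixed 20 byte IPv4 headers with the protocol byte at offset 9 rather
than parsing IHL as a real dissector would.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_DEPTH		10	/* mirrors MAX_DISSECT_DEPTH */
#define TOY_IP_HLEN		20	/* IPv4 header without options */
#define TOY_IP_PROTO_OFF	9	/* offset of the protocol byte */
#define TOY_IPPROTO_IPIP	4

/* Follow nested IPIP headers, but at most TOY_MAX_DEPTH of them.
 * Returns the offset where the walk stopped; *followed receives the
 * number of encapsulations actually followed.
 */
size_t toy_walk_ipip(const uint8_t *pkt, size_t len, int *followed)
{
	size_t off = 0;

	*followed = 0;
	while (off + TOY_IP_HLEN <= len &&
	       pkt[off + TOY_IP_PROTO_OFF] == TOY_IPPROTO_IPIP) {
		if (*followed == TOY_MAX_DEPTH)
			break;	/* limit hit: stop with what we have */
		(*followed)++;
		off += TOY_IP_HLEN;
	}
	return off;
}
```

With a 1500 byte buffer holding 75 chained IPIP headers, the walk stops
after 10 headers at offset 200 instead of visiting all 75.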

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 net/core/flow_dissector.c | 37 +++++++++++++++++++++++++++----------
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 1a7b80f..919bd02 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -91,6 +91,22 @@ __be32 __skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto,
 }
 EXPORT_SYMBOL(__skb_flow_get_ports);
 
+#define MAX_DISSECT_DEPTH	10
+#define MAX_DISSECT_EXT		10
+
+#define __DISSECT_AGAIN(_target, _depth, _limit) do {	\
+	(_depth)++;					\
+	if ((_depth) > (_limit))				\
+		goto out_good;				\
+	else						\
+		goto _target;				\
+} while (0)
+
+#define DISSECT_AGAIN(target) \
+	__DISSECT_AGAIN(target, depth, MAX_DISSECT_DEPTH)
+#define DISSECT_AGAIN_EXT(target) \
+	__DISSECT_AGAIN(target, ext_cnt, MAX_DISSECT_EXT)
+
 /**
  * __skb_flow_dissect - extract the flow_keys struct and return it
  * @skb: sk_buff to extract the flow from, can be NULL if the rest are specified
@@ -123,6 +139,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 	bool skip_vlan = false;
 	u8 ip_proto = 0;
 	bool ret = false;
+	int depth = 0, ext_cnt = 0;
 
 	if (!data) {
 		data = skb->data;
@@ -262,7 +279,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 			proto = vlan->h_vlan_encapsulated_proto;
 			nhoff += sizeof(*vlan);
 			if (skip_vlan)
-				goto again;
+				DISSECT_AGAIN(again);
 		}
 
 		skip_vlan = true;
@@ -285,7 +302,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 			}
 		}
 
-		goto again;
+		DISSECT_AGAIN(again);
 	}
 	case htons(ETH_P_PPP_SES): {
 		struct {
@@ -299,9 +316,9 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 		nhoff += PPPOE_SES_HLEN;
 		switch (proto) {
 		case htons(PPP_IP):
-			goto ip;
+			DISSECT_AGAIN(ip);
 		case htons(PPP_IPV6):
-			goto ipv6;
+			DISSECT_AGAIN(ipv6);
 		default:
 			goto out_bad;
 		}
@@ -472,7 +489,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 		if (flags & FLOW_DISSECTOR_F_STOP_AT_ENCAP)
 			goto out_good;
 
-		goto again;
+		DISSECT_AGAIN(again);
 	}
 	case NEXTHDR_HOP:
 	case NEXTHDR_ROUTING:
@@ -490,7 +507,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 		ip_proto = opthdr[0];
 		nhoff += (opthdr[1] + 1) << 3;
 
-		goto ip_proto_again;
+		DISSECT_AGAIN_EXT(ip_proto_again);
 	}
 	case NEXTHDR_FRAGMENT: {
 		struct frag_hdr _fh, *fh;
@@ -512,7 +529,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 		if (!(fh->frag_off & htons(IP6_OFFSET))) {
 			key_control->flags |= FLOW_DIS_FIRST_FRAG;
 			if (flags & FLOW_DISSECTOR_F_PARSE_1ST_FRAG)
-				goto ip_proto_again;
+				DISSECT_AGAIN_EXT(ip_proto_again);
 		}
 		goto out_good;
 	}
@@ -523,7 +540,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 		if (flags & FLOW_DISSECTOR_F_STOP_AT_ENCAP)
 			goto out_good;
 
-		goto ip;
+		DISSECT_AGAIN(ip);
 	case IPPROTO_IPV6:
 		proto = htons(ETH_P_IPV6);
 
@@ -531,10 +548,10 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 		if (flags & FLOW_DISSECTOR_F_STOP_AT_ENCAP)
 			goto out_good;
 
-		goto ipv6;
+		DISSECT_AGAIN(ipv6);
 	case IPPROTO_MPLS:
 		proto = htons(ETH_P_MPLS_UC);
-		goto mpls;
+		DISSECT_AGAIN(mpls);
 	default:
 		break;
 	}
-- 
2.9.3

* [PATCH v2 net-next 3/7] udp: Add socket lookup functions with noref
  2016-10-17 19:41 [PATCH v2 net-next 0/7] udp: Flow dissection for tunnels Tom Herbert
  2016-10-17 19:41 ` [PATCH v2 net-next 1/7] ipv6: Fix Makefile conditional to use CONFIG_INET Tom Herbert
  2016-10-17 19:41 ` [PATCH v2 net-next 2/7] flow_dissector: Limit processing of next encaps and extensions Tom Herbert
@ 2016-10-17 19:41 ` Tom Herbert
  2016-10-17 19:41 ` [PATCH v2 net-next 4/7] udp: UDP flow dissector Tom Herbert
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Tom Herbert @ 2016-10-17 19:41 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

Create udp4_lib_lookup_noref and udp6_lib_lookup_noref. These perform
a socket lookup on addresses and ports without taking a reference.

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/net/udp.h |  8 ++++++++
 net/ipv4/udp.c    |  8 ++++++++
 net/ipv6/udp.c    | 10 ++++++++++
 3 files changed, 26 insertions(+)

diff --git a/include/net/udp.h b/include/net/udp.h
index ea53a87..717a972 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -275,6 +275,10 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
 			       struct udp_table *tbl, struct sk_buff *skb);
 struct sock *udp4_lib_lookup_skb(struct sk_buff *skb,
 				 __be16 sport, __be16 dport);
+struct sock *udp4_lib_lookup_noref(struct net *net,
+				   __be32 saddr, __be16 sport,
+				   __be32 daddr, __be16 dport,
+				   int dif);
 struct sock *udp6_lib_lookup(struct net *net,
 			     const struct in6_addr *saddr, __be16 sport,
 			     const struct in6_addr *daddr, __be16 dport,
@@ -286,6 +290,10 @@ struct sock *__udp6_lib_lookup(struct net *net,
 			       struct sk_buff *skb);
 struct sock *udp6_lib_lookup_skb(struct sk_buff *skb,
 				 __be16 sport, __be16 dport);
+struct sock *udp6_lib_lookup_noref(struct net *net,
+				   const struct in6_addr *saddr, __be16 sport,
+				   const struct in6_addr *daddr, __be16 dport,
+				   int dif);
 
 /*
  * 	SNMP statistics for UDP and UDP-Lite
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 7d96dc2..7f84c51 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -595,6 +595,14 @@ struct sock *udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
 EXPORT_SYMBOL_GPL(udp4_lib_lookup);
 #endif
 
+struct sock *udp4_lib_lookup_noref(struct net *net, __be32 saddr, __be16 sport,
+				   __be32 daddr, __be16 dport, int dif)
+{
+	return __udp4_lib_lookup(net, saddr, sport, daddr, dport,
+				 dif, &udp_table, NULL);
+}
+EXPORT_SYMBOL_GPL(udp4_lib_lookup_noref);
+
 static inline bool __udp_is_mcast_sock(struct net *net, struct sock *sk,
 				       __be16 loc_port, __be32 loc_addr,
 				       __be16 rmt_port, __be32 rmt_addr,
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 9aa7c1c..6e382d9 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -317,6 +317,16 @@ struct sock *udp6_lib_lookup(struct net *net, const struct in6_addr *saddr, __be
 EXPORT_SYMBOL_GPL(udp6_lib_lookup);
 #endif
 
+struct sock *udp6_lib_lookup_noref(struct net *net,
+				   const struct in6_addr *saddr, __be16 sport,
+				   const struct in6_addr *daddr, __be16 dport,
+				   int dif)
+{
+	return __udp6_lib_lookup(net, saddr, sport, daddr, dport,
+				 dif, &udp_table, NULL);
+}
+EXPORT_SYMBOL_GPL(udp6_lib_lookup_noref);
+
 /*
  *	This should be easy, if there is something there we
  *	return it, otherwise we block.
-- 
2.9.3

* [PATCH v2 net-next 4/7] udp: UDP flow dissector
  2016-10-17 19:41 [PATCH v2 net-next 0/7] udp: Flow dissection for tunnels Tom Herbert
                   ` (2 preceding siblings ...)
  2016-10-17 19:41 ` [PATCH v2 net-next 3/7] udp: Add socket lookup functions with noref Tom Herbert
@ 2016-10-17 19:41 ` Tom Herbert
  2016-10-17 19:42 ` [PATCH v2 net-next 5/7] udp: Add UDP flow dissection functions to IPv4 and IPv6 Tom Herbert
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Tom Herbert @ 2016-10-17 19:41 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

Add infrastructure for performing per protocol flow dissection and
support flow dissection in UDP payloads (e.g. flow dissection on a
UDP encapsulated tunnel).

The per protocol flow dissector is invoked through the flow_dissect
function in the offload_callbacks of a protocol. The arguments of this
function include the information needed to do flow dissection, as derived
from __skb_flow_dissect, which is where the callback is intended to be
called from. The callback returns a code of the form
FLOW_DIS_RET_* that indicates the result. FLOW_DIS_RET_IPPROTO
means that the payload should be dissected as an IP protocol; the
specific protocol is returned in a pointer argument. Likewise,
FLOW_DIS_RET_PROTO indicates that the payload should be processed as
an ethertype, which is returned in another argument.

A case for IPPROTO_UDP was added to __skb_flow_dissect. Since the
UDP flow dissector involves a relatively expensive socket lookup,
there is a static key check first to see if there are any sockets
that have enabled flow dissection. After this check, the offload
ops for UDP for either IPv4 or IPv6 are considered. If the
flow_dissect function is set, it is called. Upon return the result
is processed (pass, out_bad, process as IP protocol, process
as ethertype). Note that if the result indicates a protocol must
be processed, it is expected that nhoff has been updated to point
to the encapsulated protocol header.
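
The result handling can be sketched in user space as a small mapping
from return code to next step. This is only an illustration: the toy_*
names are hypothetical, and TOY_FINISH lumps together "pass" and "depth
limit reached", which the real function treats as two different exits
(a break out of the switch and a jump to out_good).

```c
/* Stand-ins for the FLOW_DIS_RET_* codes. */
enum { TOY_RET_PASS = 0, TOY_RET_BAD, TOY_RET_IPPROTO, TOY_RET_PROTO };

enum toy_next {
	TOY_FINISH,		/* stop descending at this layer */
	TOY_FAIL,		/* malformed packet, like out_bad */
	TOY_AGAIN_IPPROTO,	/* re-dispatch on the IP protocol */
	TOY_AGAIN_ETHERTYPE,	/* re-dispatch on the ethertype */
};

/* Decide what the dissector does after a callback returned 'ret'.
 * Descending one more level is charged against the depth limit.
 */
enum toy_next toy_next_step(int ret, int *depth, int max_depth)
{
	switch (ret) {
	case TOY_RET_IPPROTO:
	case TOY_RET_PROTO:
		if (++(*depth) > max_depth)
			return TOY_FINISH;
		return ret == TOY_RET_IPPROTO ? TOY_AGAIN_IPPROTO
					      : TOY_AGAIN_ETHERTYPE;
	case TOY_RET_BAD:
		return TOY_FAIL;
	case TOY_RET_PASS:
	default:
		return TOY_FINISH;
	}
}
```

Unknown return codes fall through to the same path as a pass, matching
the default label in the switch added by this patch.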

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/linux/netdevice.h    |  5 +++
 include/linux/udp.h          |  7 ++++
 include/net/flow_dissector.h |  8 +++++
 include/net/udp.h            |  4 +++
 net/core/flow_dissector.c    | 85 ++++++++++++++++++++++++++++++++++++++++++--
 net/ipv4/udp.c               |  3 ++
 6 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index bf341b6..c5f4295 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2203,6 +2203,11 @@ struct offload_callbacks {
 	struct sk_buff		**(*gro_receive)(struct sk_buff **head,
 						 struct sk_buff *skb);
 	int			(*gro_complete)(struct sk_buff *skb, int nhoff);
+	int			(*flow_dissect)(const struct sk_buff *skb,
+		void *data, int hlen,
+		int *nhoff, u8 *ip_proto,
+		__be16 *proto,
+		 struct flow_dissector_key_addrs *key_addrs);
 };
 
 struct packet_offload {
diff --git a/include/linux/udp.h b/include/linux/udp.h
index d1fd8cd..608ebf4 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -79,6 +79,13 @@ struct udp_sock {
 	int			(*gro_complete)(struct sock *sk,
 						struct sk_buff *skb,
 						int nhoff);
+
+	/* Flow dissector function for UDP socket */
+	int			(*flow_dissect)(struct sock *sk,
+						const struct sk_buff *skb,
+						void *data, int hlen,
+						int *nhoff, u8 *ip_proto,
+						__be16 *proto);
 };
 
 static inline struct udp_sock *udp_sk(const struct sock *sk)
diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index d953492..9de4904 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -203,4 +203,12 @@ static inline void *skb_flow_dissector_target(struct flow_dissector *flow_dissec
 	return ((char *)target_container) + flow_dissector->offset[key_id];
 }
 
+/* Return codes from per socket flow dissector (e.g. UDP) */
+enum {
+	FLOW_DIS_RET_PASS = 0,
+	FLOW_DIS_RET_BAD,
+	FLOW_DIS_RET_IPPROTO,
+	FLOW_DIS_RET_PROTO,
+};
+
 #endif
diff --git a/include/net/udp.h b/include/net/udp.h
index 717a972..8d364e8 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -360,4 +360,8 @@ void udp_encap_enable(void);
 #if IS_ENABLED(CONFIG_IPV6)
 void udpv6_encap_enable(void);
 #endif
+
+void udp_flow_dissect_enable(void);
+void udp_flow_dissect_disable(void);
+
 #endif	/* _UDP_H */
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 919bd02..06ccfd5 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -8,6 +8,8 @@
 #include <net/ipv6.h>
 #include <net/gre.h>
 #include <net/pptp.h>
+#include <net/protocol.h>
+#include <net/udp.h>
 #include <linux/igmp.h>
 #include <linux/icmp.h>
 #include <linux/sctp.h>
@@ -57,6 +59,20 @@ void skb_flow_dissector_init(struct flow_dissector *flow_dissector,
 }
 EXPORT_SYMBOL(skb_flow_dissector_init);
 
+static struct static_key udp_flow_dissect __read_mostly;
+
+void udp_flow_dissect_enable(void)
+{
+	static_key_slow_inc(&udp_flow_dissect);
+}
+EXPORT_SYMBOL(udp_flow_dissect_enable);
+
+void udp_flow_dissect_disable(void)
+{
+	static_key_slow_dec(&udp_flow_dissect);
+}
+EXPORT_SYMBOL(udp_flow_dissect_disable);
+
 /**
  * __skb_flow_get_ports - extract the upper layer ports and return them
  * @skb: sk_buff to extract the ports from
@@ -131,7 +147,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 {
 	struct flow_dissector_key_control *key_control;
 	struct flow_dissector_key_basic *key_basic;
-	struct flow_dissector_key_addrs *key_addrs;
+	struct flow_dissector_key_addrs *key_addrs = NULL;
 	struct flow_dissector_key_ports *key_ports;
 	struct flow_dissector_key_tags *key_tags;
 	struct flow_dissector_key_vlan *key_vlan;
@@ -262,7 +278,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 	}
 	case htons(ETH_P_8021AD):
 	case htons(ETH_P_8021Q): {
-		const struct vlan_hdr *vlan;
+		const struct vlan_hdr *vlan = NULL;
 
 		if (skb_vlan_tag_present(skb))
 			proto = skb->protocol;
@@ -552,6 +568,71 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 	case IPPROTO_MPLS:
 		proto = htons(ETH_P_MPLS_UC);
 		DISSECT_AGAIN(mpls);
+	case IPPROTO_UDP:
+	{
+		const struct net_offload **offloads;
+		const struct net_offload *ops;
+		int ret;
+
+		if (!static_key_false(&udp_flow_dissect))
+			break;
+
+		if (depth) {
+			/* Only try to parse the UDP encapsulation if no
+			 * encapsulation has been encountered yet. With an
+			 * encapsulated packet there is a good chance that it is
+			 * in a different namespace so the UDP lookup to get
+			 * flow dissection may be invalid.
+			 */
+			break;
+		}
+
+		if (!key_addrs)
+			break;
+
+		/* See if there is a flow dissector for UDP protocol */
+
+		switch (key_control->addr_type) {
+#ifdef CONFIG_INET
+		case FLOW_DISSECTOR_KEY_IPV4_ADDRS:
+			offloads = inet_offloads;
+			break;
+		case FLOW_DISSECTOR_KEY_IPV6_ADDRS:
+			offloads = inet6_offloads;
+			break;
+#endif
+		default:
+			goto udp_finish;
+		}
+
+		rcu_read_lock();
+
+		ops = rcu_dereference(offloads[IPPROTO_UDP]);
+
+		if (!ops || !ops->callbacks.flow_dissect) {
+			rcu_read_unlock();
+			goto udp_finish;
+		}
+
+		ret = ops->callbacks.flow_dissect(skb, data, hlen, &nhoff,
+						  &ip_proto, &proto, key_addrs);
+
+		rcu_read_unlock();
+
+		switch (ret) {
+		case FLOW_DIS_RET_IPPROTO:
+			DISSECT_AGAIN(ip_proto_again);
+		case FLOW_DIS_RET_PROTO:
+			DISSECT_AGAIN(again);
+		case FLOW_DIS_RET_BAD:
+			goto out_bad;
+		case FLOW_DIS_RET_PASS:
+		default:
+			break;
+		}
+udp_finish:
+		break;
+	}
 	default:
 		break;
 	}
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 7f84c51..b4b528e 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1977,6 +1977,9 @@ void udp_destroy_sock(struct sock *sk)
 		if (encap_destroy)
 			encap_destroy(sk);
 	}
+
+	if (up->flow_dissect)
+		udp_flow_dissect_disable();
 }
 
 /*
-- 
2.9.3

* [PATCH v2 net-next 5/7] udp: Add UDP flow dissection functions to IPv4 and IPv6
  2016-10-17 19:41 [PATCH v2 net-next 0/7] udp: Flow dissection for tunnels Tom Herbert
                   ` (3 preceding siblings ...)
  2016-10-17 19:41 ` [PATCH v2 net-next 4/7] udp: UDP flow dissector Tom Herbert
@ 2016-10-17 19:42 ` Tom Herbert
  2016-10-17 19:42 ` [PATCH v2 net-next 6/7] udp: UDP tunnel flow dissection infrastructure Tom Herbert
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Tom Herbert @ 2016-10-17 19:42 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

Add per protocol offload callbacks for flow_dissect to UDP for
IPv4 and IPv6. Each callback extracts the port numbers and,
together with the packet addresses (given in an argument of
type flow_dissector_key_addrs), performs a lookup on the UDP
socket. If a socket is found and flow_dissect is set for the
socket then that function is called.

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 net/ipv4/udp_offload.c | 39 +++++++++++++++++++++++++++++++++++++++
 net/ipv6/udp_offload.c | 40 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index f9333c9..c7753ba 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -377,11 +377,50 @@ static int udp4_gro_complete(struct sk_buff *skb, int nhoff)
 	return udp_gro_complete(skb, nhoff, udp4_lib_lookup_skb);
 }
 
+/* Assumes rcu lock is held */
+static int udp4_flow_dissect(const struct sk_buff *skb, void *data, int hlen,
+			     int *nhoff, u8 *ip_proto, __be16 *proto,
+			     struct flow_dissector_key_addrs *key_addrs)
+{
+	u16 _ports[2], *ports;
+	struct net *net;
+	struct sock *sk;
+	int dif = -1;
+
+	/* See if there is a flow dissector in the UDP socket */
+
+	if (skb->dev) {
+		net = dev_net(skb->dev);
+		dif = skb->dev->ifindex;
+	} else if (skb->sk) {
+		net = sock_net(skb->sk);
+	} else {
+		return FLOW_DIS_RET_PASS;
+	}
+
+	ports = __skb_header_pointer(skb, *nhoff, sizeof(_ports),
+				     data, hlen, &_ports);
+	if (!ports)
+		return FLOW_DIS_RET_BAD;
+
+	sk = udp4_lib_lookup_noref(net,
+				   key_addrs->v4addrs.src, ports[0],
+				   key_addrs->v4addrs.dst, ports[1],
+				   dif);
+
+	if (sk && udp_sk(sk)->flow_dissect)
+		return udp_sk(sk)->flow_dissect(sk, skb, data, hlen, nhoff,
+						ip_proto, proto);
+	else
+		return FLOW_DIS_RET_PASS;
+}
+
 static const struct net_offload udpv4_offload = {
 	.callbacks = {
 		.gso_segment = udp4_ufo_fragment,
 		.gro_receive  =	udp4_gro_receive,
 		.gro_complete =	udp4_gro_complete,
+		.flow_dissect = udp4_flow_dissect,
 	},
 };
 
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index ac858c4..12d9a92 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -1,5 +1,5 @@
 /*
- *	IPV6 GSO/GRO offload support
+ *	ipv6 gso/gro offload support
  *	Linux INET6 implementation
  *
  *	This program is free software; you can redistribute it and/or
@@ -163,11 +163,49 @@ static int udp6_gro_complete(struct sk_buff *skb, int nhoff)
 	return udp_gro_complete(skb, nhoff, udp6_lib_lookup_skb);
 }
 
+/* Assumes rcu lock is held */
+static int udp6_flow_dissect(const struct sk_buff *skb, void *data, int hlen,
+			     int *nhoff, u8 *ip_proto, __be16 *proto,
+			     struct flow_dissector_key_addrs *key_addrs)
+{
+	u16 _ports[2], *ports;
+	struct net *net;
+	struct sock *sk;
+	int dif = -1;
+
+	/* See if there is a flow dissector in the UDP socket */
+
+	if (skb->dev) {
+		net = dev_net(skb->dev);
+		dif = skb->dev->ifindex;
+	} else if (skb->sk) {
+		net = sock_net(skb->sk);
+	} else {
+		return FLOW_DIS_RET_PASS;
+	}
+
+	ports = __skb_header_pointer(skb, *nhoff, sizeof(_ports),
+				     data, hlen, &_ports);
+	if (!ports)
+		return FLOW_DIS_RET_BAD;
+
+	sk = udp6_lib_lookup_noref(net,
+				   &key_addrs->v6addrs.src, ports[0],
+				   &key_addrs->v6addrs.dst, ports[1],
+				   dif);
+
+	if (sk && udp_sk(sk)->flow_dissect)
+		return udp_sk(sk)->flow_dissect(sk, skb, data, hlen, nhoff,
+						ip_proto, proto);
+	return FLOW_DIS_RET_PASS;
+}
+
 static const struct net_offload udpv6_offload = {
 	.callbacks = {
 		.gso_segment	=	udp6_ufo_fragment,
 		.gro_receive	=	udp6_gro_receive,
 		.gro_complete	=	udp6_gro_complete,
+		.flow_dissect	=	udp6_flow_dissect,
 	},
 };
 
-- 
2.9.3

* [PATCH v2 net-next 6/7] udp: UDP tunnel flow dissection infrastructure
  2016-10-17 19:41 [PATCH v2 net-next 0/7] udp: Flow dissection for tunnels Tom Herbert
                   ` (4 preceding siblings ...)
  2016-10-17 19:42 ` [PATCH v2 net-next 5/7] udp: Add UDP flow dissection functions to IPv4 and IPv6 Tom Herbert
@ 2016-10-17 19:42 ` Tom Herbert
  2016-10-17 19:42 ` [PATCH v2 net-next 7/7] fou: Support flow dissection Tom Herbert
  2016-10-18 15:48 ` [PATCH v2 net-next 0/7] udp: Flow dissection for tunnels David Miller
  7 siblings, 0 replies; 10+ messages in thread
From: Tom Herbert @ 2016-10-17 19:42 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

Add infrastructure to allow UDP tunnels to set up flow dissection.

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/net/udp_tunnel.h | 5 +++++
 net/ipv4/udp_tunnel.c    | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
index 02c5be0..81d2584 100644
--- a/include/net/udp_tunnel.h
+++ b/include/net/udp_tunnel.h
@@ -69,6 +69,10 @@ typedef struct sk_buff **(*udp_tunnel_gro_receive_t)(struct sock *sk,
 						     struct sk_buff *skb);
 typedef int (*udp_tunnel_gro_complete_t)(struct sock *sk, struct sk_buff *skb,
 					 int nhoff);
+typedef int (*udp_tunnel_flow_dissect_t)(struct sock *sk,
+					 const struct sk_buff *skb,
+					 void *data, int hlen, int *nhoff,
+					 u8 *ip_proto, __be16 *proto);
 
 struct udp_tunnel_sock_cfg {
 	void *sk_user_data;     /* user data used by encap_rcv call back */
@@ -78,6 +82,7 @@ struct udp_tunnel_sock_cfg {
 	udp_tunnel_encap_destroy_t encap_destroy;
 	udp_tunnel_gro_receive_t gro_receive;
 	udp_tunnel_gro_complete_t gro_complete;
+	udp_tunnel_flow_dissect_t flow_dissect;
 };
 
 /* Setup the given (UDP) sock to receive UDP encapsulated packets */
diff --git a/net/ipv4/udp_tunnel.c b/net/ipv4/udp_tunnel.c
index 58bd39f..4459288 100644
--- a/net/ipv4/udp_tunnel.c
+++ b/net/ipv4/udp_tunnel.c
@@ -72,6 +72,11 @@ void setup_udp_tunnel_sock(struct net *net, struct socket *sock,
 	udp_sk(sk)->gro_receive = cfg->gro_receive;
 	udp_sk(sk)->gro_complete = cfg->gro_complete;
 
+	if (cfg->flow_dissect) {
+		udp_sk(sk)->flow_dissect = cfg->flow_dissect;
+		udp_flow_dissect_enable();
+	}
+
 	udp_tunnel_encap_enable(sock);
 }
 EXPORT_SYMBOL_GPL(setup_udp_tunnel_sock);
-- 
2.9.3

* [PATCH v2 net-next 7/7] fou: Support flow dissection
  2016-10-17 19:41 [PATCH v2 net-next 0/7] udp: Flow dissection for tunnels Tom Herbert
                   ` (5 preceding siblings ...)
  2016-10-17 19:42 ` [PATCH v2 net-next 6/7] udp: UDP tunnel flow dissection infrastructure Tom Herbert
@ 2016-10-17 19:42 ` Tom Herbert
  2016-10-18 15:48 ` [PATCH v2 net-next 0/7] udp: Flow dissection for tunnels David Miller
  7 siblings, 0 replies; 10+ messages in thread
From: Tom Herbert @ 2016-10-17 19:42 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

This patch performs flow dissection for GUE and FOU. This is an
optional feature on the receiver and is set by the FOU_ATTR_DEEP_HASH
netlink configuration. When enabled, the UDP socket flow_dissect
function is set to fou_flow_dissect or gue_flow_dissect as
appropriate. These functions return FLOW_DIS_RET_IPPROTO and
set the IP protocol argument. In the case of GUE the header is
parsed to find the protocol number.
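
The GUE parsing step can be sketched in user space on a raw byte
buffer. This is an illustration under stated assumptions: the toy_*
names are hypothetical, and the bit layout (2-bit version, 1-bit
control flag, 5-bit hlen in the first byte, protocol in the second) is
taken to match the kernel's struct guehdr, which the real code reads
through bitfields rather than masks.

```c
#include <stddef.h>
#include <stdint.h>

enum { TOY_RET_PASS = 0, TOY_RET_BAD, TOY_RET_IPPROTO };

#define TOY_UDP_HLEN	8	/* UDP header */
#define TOY_GUE_HLEN	4	/* base GUE header */

/* On entry *nhoff is the offset of the UDP header. On
 * TOY_RET_IPPROTO it is moved to the encapsulated header.
 */
int toy_gue_dissect(const uint8_t *pkt, size_t len, size_t *nhoff,
		    uint8_t *ip_proto)
{
	size_t goff = *nhoff + TOY_UDP_HLEN;	/* start of GUE header */
	uint8_t b0;

	if (goff + TOY_GUE_HLEN > len)
		return TOY_RET_BAD;		/* header not all there */

	b0 = pkt[goff];
	switch (b0 >> 6) {			/* 2-bit version field */
	case 0:
		if (b0 & 0x20)			/* control message */
			return TOY_RET_PASS;
		/* hlen counts 32-bit words of optional fields */
		*nhoff = goff + TOY_GUE_HLEN + ((size_t)(b0 & 0x1f) << 2);
		*ip_proto = pkt[goff + 1];
		return TOY_RET_IPPROTO;
	case 1:
		/* Direct IP: the same byte starts the IP header, so its
		 * high nibble is the IP version.
		 */
		if ((b0 >> 4) == 4)
			*ip_proto = 4;		/* IPPROTO_IPIP */
		else if ((b0 >> 4) == 6)
			*ip_proto = 41;		/* IPPROTO_IPV6 */
		else
			return TOY_RET_PASS;
		*nhoff = goff;
		return TOY_RET_IPPROTO;
	default:
		return TOY_RET_PASS;
	}
}
```

For version 1 only the UDP header is skipped, because the byte that
carries the GUE version bits is already the first byte of the inner IP
header.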

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/uapi/linux/fou.h |  1 +
 net/ipv4/fou.c           | 68 +++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/fou.h b/include/uapi/linux/fou.h
index d2947c5..2c837eb 100644
--- a/include/uapi/linux/fou.h
+++ b/include/uapi/linux/fou.h
@@ -15,6 +15,7 @@ enum {
 	FOU_ATTR_IPPROTO,			/* u8 */
 	FOU_ATTR_TYPE,				/* u8 */
 	FOU_ATTR_REMCSUM_NOPARTIAL,		/* flag */
+	FOU_ATTR_DEEP_HASH,			/* flag */
 
 	__FOU_ATTR_MAX,
 };
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index cf50f7e..95ac5a8 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -27,7 +27,8 @@ struct fou {
 	struct rcu_head rcu;
 };
 
-#define FOU_F_REMCSUM_NOPARTIAL BIT(0)
+#define FOU_F_REMCSUM_NOPARTIAL	BIT(0)
+#define FOU_F_DEEP_HASH		BIT(1)
 
 struct fou_cfg {
 	u16 type;
@@ -281,6 +282,16 @@ static int fou_gro_complete(struct sock *sk, struct sk_buff *skb,
 	return err;
 }
 
+static int fou_flow_dissect(struct sock *sk, const struct sk_buff *skb,
+			    void *data, int hlen, int *nhoff, u8 *ip_proto,
+			    __be16 *proto)
+{
+	*ip_proto = fou_from_sock(sk)->protocol;
+	*nhoff += sizeof(struct udphdr);
+
+	return FLOW_DIS_RET_IPPROTO;
+}
+
 static struct guehdr *gue_gro_remcsum(struct sk_buff *skb, unsigned int off,
 				      struct guehdr *guehdr, void *data,
 				      size_t hdrlen, struct gro_remcsum *grc,
@@ -498,6 +509,48 @@ static int gue_gro_complete(struct sock *sk, struct sk_buff *skb, int nhoff)
 	return err;
 }
 
+static int gue_flow_dissect(struct sock *sk, const struct sk_buff *skb,
+			    void *data, int hlen, int *nhoff, u8 *ip_proto,
+			    __be16 *proto)
+{
+	struct guehdr _hdr, *hdr;
+
+	hdr = __skb_header_pointer(skb, *nhoff + sizeof(struct udphdr),
+				   sizeof(_hdr), data, hlen, &_hdr);
+	if (!hdr)
+		return FLOW_DIS_RET_BAD;
+
+	switch (hdr->version) {
+	case 0: /* Full GUE header present */
+		if (hdr->control)
+			return FLOW_DIS_RET_PASS;
+
+		*nhoff += sizeof(struct udphdr) + sizeof(_hdr) +
+			  (hdr->hlen << 2);
+		*ip_proto = hdr->proto_ctype;
+
+		return FLOW_DIS_RET_IPPROTO;
+	case 1:
+		/* Direct encapsulation of IPv4 or IPv6 */
+
+		switch (((struct iphdr *)hdr)->version) {
+		case 4:
+			*nhoff += sizeof(struct udphdr);
+			*ip_proto = IPPROTO_IPIP;
+			return FLOW_DIS_RET_IPPROTO;
+		case 6:
+			*nhoff += sizeof(struct udphdr);
+			*ip_proto = IPPROTO_IPV6;
+			return FLOW_DIS_RET_IPPROTO;
+		default:
+			return FLOW_DIS_RET_PASS;
+		}
+
+	default:
+		return FLOW_DIS_RET_PASS;
+	}
+}
+
 static int fou_add_to_port_list(struct net *net, struct fou *fou)
 {
 	struct fou_net *fn = net_generic(net, fou_net_id);
@@ -568,12 +621,16 @@ static int fou_create(struct net *net, struct fou_cfg *cfg,
 		tunnel_cfg.encap_rcv = fou_udp_recv;
 		tunnel_cfg.gro_receive = fou_gro_receive;
 		tunnel_cfg.gro_complete = fou_gro_complete;
+		if (cfg->flags & FOU_F_DEEP_HASH)
+			tunnel_cfg.flow_dissect = fou_flow_dissect;
 		fou->protocol = cfg->protocol;
 		break;
 	case FOU_ENCAP_GUE:
 		tunnel_cfg.encap_rcv = gue_udp_recv;
 		tunnel_cfg.gro_receive = gue_gro_receive;
 		tunnel_cfg.gro_complete = gue_gro_complete;
+		if (cfg->flags & FOU_F_DEEP_HASH)
+			tunnel_cfg.flow_dissect = gue_flow_dissect;
 		break;
 	default:
 		err = -EINVAL;
@@ -637,6 +694,7 @@ static const struct nla_policy fou_nl_policy[FOU_ATTR_MAX + 1] = {
 	[FOU_ATTR_IPPROTO] = { .type = NLA_U8, },
 	[FOU_ATTR_TYPE] = { .type = NLA_U8, },
 	[FOU_ATTR_REMCSUM_NOPARTIAL] = { .type = NLA_FLAG, },
+	[FOU_ATTR_DEEP_HASH] = { .type = NLA_FLAG },
 };
 
 static int parse_nl_config(struct genl_info *info,
@@ -677,6 +735,9 @@ static int parse_nl_config(struct genl_info *info,
 	if (info->attrs[FOU_ATTR_REMCSUM_NOPARTIAL])
 		cfg->flags |= FOU_F_REMCSUM_NOPARTIAL;
 
+	if (info->attrs[FOU_ATTR_DEEP_HASH])
+		cfg->flags |= FOU_F_DEEP_HASH;
+
 	return 0;
 }
 
@@ -717,6 +778,11 @@ static int fou_fill_info(struct fou *fou, struct sk_buff *msg)
 	if (fou->flags & FOU_F_REMCSUM_NOPARTIAL)
 		if (nla_put_flag(msg, FOU_ATTR_REMCSUM_NOPARTIAL))
 			return -1;
+
+	if (fou->flags & FOU_F_DEEP_HASH)
+		if (nla_put_flag(msg, FOU_ATTR_DEEP_HASH))
+			return -1;
+
 	return 0;
 }
 
-- 
2.9.3
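
For readers following the offset arithmetic in gue_flow_dissect above,
here is a minimal user-space sketch of the same decision tree. It is not
kernel code. It assumes the base GUE header is 4 bytes, with the first
byte holding version (top 2 bits), control (1 bit) and hlen (low 5 bits),
and the second byte holding proto_ctype. The names gue_dissect_sketch,
enum dis_ret, UDP_HDR_LEN and GUE_HDR_LEN are hypothetical stand-ins for
illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UDP_HDR_LEN	8	/* size of a UDP header */
#define GUE_HDR_LEN	4	/* assumed size of the base GUE header */
#define PROTO_IPIP	4	/* numeric value of IPPROTO_IPIP */
#define PROTO_IPV6	41	/* numeric value of IPPROTO_IPV6 */

/* Hypothetical stand-ins for the FLOW_DIS_RET_* codes in the patch. */
enum dis_ret { DIS_BAD, DIS_PASS, DIS_IPPROTO };

/*
 * pkt points at the buffer, *nhoff is the offset of the UDP header.
 * On DIS_IPPROTO, *nhoff is advanced to the inner header and *ip_proto
 * names the protocol found there.
 */
static enum dis_ret gue_dissect_sketch(const uint8_t *pkt, size_t len,
				       size_t *nhoff, uint8_t *ip_proto)
{
	size_t gue_off = *nhoff + UDP_HDR_LEN;
	const uint8_t *gue;

	/* Same role as the __skb_header_pointer() NULL check. */
	if (len < gue_off + GUE_HDR_LEN)
		return DIS_BAD;
	gue = pkt + gue_off;

	switch (gue[0] >> 6) {			/* version */
	case 0:					/* full GUE header present */
		if (gue[0] & 0x20)		/* control message: do not dissect */
			return DIS_PASS;
		/* hlen counts 32-bit words of optional fields */
		*nhoff += UDP_HDR_LEN + GUE_HDR_LEN +
			  ((size_t)(gue[0] & 0x1f) << 2);
		*ip_proto = gue[1];		/* proto_ctype */
		return DIS_IPPROTO;
	case 1:					/* direct IPv4/IPv6 encapsulation */
		switch (gue[0] >> 4) {		/* IP version nibble */
		case 4:
			*nhoff += UDP_HDR_LEN;
			*ip_proto = PROTO_IPIP;
			return DIS_IPPROTO;
		case 6:
			*nhoff += UDP_HDR_LEN;
			*ip_proto = PROTO_IPV6;
			return DIS_IPPROTO;
		default:
			return DIS_PASS;
		}
	default:
		return DIS_PASS;
	}
}
```

With a version 0 header carrying hlen = 1, the inner header starts
8 + 4 + 4 = 16 bytes past the UDP header offset. In the version 1 case
only the UDP header is skipped, because the IP header itself begins where
the GUE header would have been.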


* Re: [PATCH v2 net-next 0/7] udp: Flow dissection for tunnels
  2016-10-17 19:41 [PATCH v2 net-next 0/7] udp: Flow dissection for tunnels Tom Herbert
                   ` (6 preceding siblings ...)
  2016-10-17 19:42 ` [PATCH v2 net-next 7/7] fou: Support flow dissection Tom Herbert
@ 2016-10-18 15:48 ` David Miller
  2016-10-18 15:52   ` David Miller
  7 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2016-10-18 15:48 UTC (permalink / raw)
  To: tom; +Cc: netdev, kernel-team

From: Tom Herbert <tom@herbertland.com>
Date: Mon, 17 Oct 2016 12:41:55 -0700

> Now that we have a means to perform a UDP socket lookup without taking
> a reference, it is feasible to have flow dissector crack open UDP
> encapsulated packets. Generally, we would expect that the UDP source
> port or the flow label in IPv6 would contain enough entropy about
> the encapsulated flow. However, there will be cases, such as a static
> UDP tunnel with fixed ports, where dissecting the encapsulated packet
> is valuable.
> 
> The model here is similar to that implemented for UDP GRO. A
> tunnel implementation (e.g. GUE) may set a flow_dissect function
> in the udp_sk. In __skb_flow_dissect a case has been added for
> UDP to check if there is a socket with flow_dissect set. If there
> is, the function is called. The (per tunnel implementation)
> function can parse the encapsulation headers and return the
> next protocol for __skb_flow_dissect to process and its position
> in nhoff.
> 
> Since performing a UDP lookup on every packet might be expensive
> I added a static key check to bypass the lookup if there are no
> sockets with flow_dissect set. I should mention that doing the
> lookup wasn't particularly a big hit anyway.
> 
> Fou/gue was modified to perform tunnel dissection. This is enabled
> on each listener socket via a netlink configuration option.

Series applied, thanks Tom.


* Re: [PATCH v2 net-next 0/7] udp: Flow dissection for tunnels
  2016-10-18 15:48 ` [PATCH v2 net-next 0/7] udp: Flow dissection for tunnels David Miller
@ 2016-10-18 15:52   ` David Miller
  0 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2016-10-18 15:52 UTC (permalink / raw)
  To: tom; +Cc: netdev, kernel-team

From: David Miller <davem@davemloft.net>
Date: Tue, 18 Oct 2016 11:48:37 -0400 (EDT)

> Series applied, thanks Tom.

Actually, reverted.

Tom, would you mind build testing with ipv6 enabled? :-)

net/ipv6/udp_offload.c:208:19: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
   .flow_dissect = udp6_flow_dissect,
                   ^
net/ipv6/udp_offload.c:208:19: note: (near initialization for ‘udpv6_offload.callbacks.flow_dissect’)
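
For context on the diagnostic above: GCC reports incompatible-pointer-types
when a function pointer member is initialized with a function whose
prototype differs from the member's declared type. The prototypes involved
in udpv6_offload are not shown in this part of the thread, so the sketch
below uses a hypothetical callback table (callbacks_sketch, dissect_ok)
purely to illustrate the rule; it does not reproduce the actual mismatch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback table, not the kernel's offload callbacks. */
struct callbacks_sketch {
	int (*flow_dissect)(const void *pkt, size_t len, int *nhoff);
};

/* Prototype matches the member type exactly, so the initializer is accepted. */
static int dissect_ok(const void *pkt, size_t len, int *nhoff)
{
	(void)pkt;
	if (len < 8)
		return -1;
	*nhoff += 8;	/* skip a UDP-sized header */
	return 0;
}

/*
 * A function declared as, say,
 *	int dissect_other(const void *pkt, size_t len, long *nhoff);
 * has a different type. Using it in the initializer below would draw the
 * incompatible-pointer-types diagnostic, which -Werror turns into a
 * build failure.
 */
static const struct callbacks_sketch cb_sketch = {
	.flow_dissect = dissect_ok,
};
```

The usual fix is to make the function's prototype and the member's
declared type agree, rather than to cast the pointer.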


Thread overview: 10+ messages
2016-10-17 19:41 [PATCH v2 net-next 0/7] udp: Flow dissection for tunnels Tom Herbert
2016-10-17 19:41 ` [PATCH v2 net-next 1/7] ipv6: Fix Makefile conditional to use CONFIG_INET Tom Herbert
2016-10-17 19:41 ` [PATCH v2 net-next 2/7] flow_dissector: Limit processing of next encaps and extensions Tom Herbert
2016-10-17 19:41 ` [PATCH v2 net-next 3/7] udp: Add socket lookup functions with noref Tom Herbert
2016-10-17 19:41 ` [PATCH v2 net-next 4/7] udp: UDP flow dissector Tom Herbert
2016-10-17 19:42 ` [PATCH v2 net-next 5/7] udp: Add UDP flow dissection functions to IPv4 and IPv6 Tom Herbert
2016-10-17 19:42 ` [PATCH v2 net-next 6/7] udp: UDP tunnel flow dissection infrastructure Tom Herbert
2016-10-17 19:42 ` [PATCH v2 net-next 7/7] fou: Support flow dissection Tom Herbert
2016-10-18 15:48 ` [PATCH v2 net-next 0/7] udp: Flow dissection for tunnels David Miller
2016-10-18 15:52   ` David Miller