* [PATCH net-next 0/5] udp: Flow dissection for tunnels
@ 2016-10-12 23:25 Tom Herbert
  2016-10-12 23:25 ` [PATCH net-next 1/5] udp: Add socket lookup functions with noref Tom Herbert
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Tom Herbert @ 2016-10-12 23:25 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

Now that we have a means to perform a UDP socket lookup without taking
a reference, it is feasible to have flow dissector crack open UDP
encapsulated packets. Generally, we would expect that the UDP source
port or the flow label in IPv6 would contain enough entropy about
the encapsulated flow. However, there will be cases, such as a static
UDP tunnel with fixed ports, where dissecting the encapsulated packet
is valuable.

The model here is similar to that implemented for UDP GRO. A
tunnel implementation (e.g. GUE) may set a flow_dissect function
in the udp_sk. In __skb_flow_dissect a case has been added for
UDP to check if there is a socket with flow_dissect set. If there
is, the function is called. The (per tunnel implementation)
function can parse the encapsulation headers and return the
next protocol for __skb_flow_dissect to process and its position
in nhoff.

Since performing a UDP lookup on every packet might be expensive
I added a static key check to bypass the lookup if there are no
sockets with flow_dissect set. I should mention that doing the
lookup wasn't particularly a big hit anyway.
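
As a rough illustration of the bypass described above, the sketch below
models the gate with a plain atomic counter in userspace C. The names
toy_dissect_enable, toy_dissect_disable and toy_dissect_needed are
hypothetical. The kernel static key patches the branch at runtime rather
than loading a counter, so this only shows the enable/disable accounting,
not the actual mechanism.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the static key: counts sockets
 * that currently have a flow_dissect callback installed. */
static atomic_int toy_dissect_users;

/* Called when a tunnel socket installs a flow_dissect callback. */
static void toy_dissect_enable(void)
{
	atomic_fetch_add(&toy_dissect_users, 1);
}

/* Called when such a socket is destroyed. */
static void toy_dissect_disable(void)
{
	atomic_fetch_sub(&toy_dissect_users, 1);
}

/* Fast-path check: skip the socket lookup when no socket asked for it. */
static bool toy_dissect_needed(void)
{
	return atomic_load(&toy_dissect_users) > 0;
}
```

Every enable must be paired with exactly one disable, otherwise the gate
stays open (or closes early) for all other sockets.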

Fou/gue was modified to perform tunnel dissection. This is enabled
on each listener socket via a netlink configuration option.

Tested:

Running 200 streams with TCP_RR.

GRE/GUE variable source port (baseline)
RSS distributes packets, RFS is effective
1211702 tps
147/241/442 50/90/99% latencies
87.95% CPU utilization

GRE/GUE fixed source port
All packets to one CPU, RFS is ineffective
173680 tps
1170/1377/1853 50/90/99% latencies
7.42% CPU utilization

GRE/GUE fixed source port with deep hash enabled
All packets to one CPU, but now RFS is effective
730359 tps
263/325/464 50/90/99% latencies
38.25% CPU utilization (Interrupting CPU is maxed out)
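
To put the three runs side by side, the small helper below computes the
ratio between two of the reported transaction rates. It is only
arithmetic on the figures above; tps_ratio is a hypothetical name.

```c
/* Ratio of two transaction rates, e.g. deep hash vs. fixed source port. */
static double tps_ratio(double tps_a, double tps_b)
{
	return tps_a / tps_b;
}
```

With the numbers above, tps_ratio(730359, 173680) comes to roughly 4.2,
and tps_ratio(730359, 1211702) to roughly 0.6: deep hash is about four
times the fixed-port rate and recovers about 60% of the baseline.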


Tom Herbert (5):
  udp: Add socket lookup functions with noref
  udp: UDP flow dissector
  udp: Add UDP flow dissection functions to IPv4 and IPv6
  udp: UDP tunnel flow dissection infrastructure
  fou: Support flow dissection

 include/linux/netdevice.h    |  5 +++
 include/linux/udp.h          |  7 +++++
 include/net/flow_dissector.h |  8 +++++
 include/net/udp.h            | 12 ++++++++
 include/net/udp_tunnel.h     |  5 +++
 include/uapi/linux/fou.h     |  1 +
 net/core/flow_dissector.c    | 73 ++++++++++++++++++++++++++++++++++++++++++--
 net/ipv4/fou.c               | 68 ++++++++++++++++++++++++++++++++++++++++-
 net/ipv4/udp.c               | 11 +++++++
 net/ipv4/udp_offload.c       | 39 +++++++++++++++++++++++
 net/ipv4/udp_tunnel.c        |  5 +++
 net/ipv6/udp.c               | 10 ++++++
 net/ipv6/udp_offload.c       | 38 +++++++++++++++++++++++
 13 files changed, 279 insertions(+), 3 deletions(-)

-- 
2.9.3

* [PATCH net-next 1/5] udp: Add socket lookup functions with noref
  2016-10-12 23:25 [PATCH net-next 0/5] udp: Flow dissection for tunnels Tom Herbert
@ 2016-10-12 23:25 ` Tom Herbert
  2016-10-12 23:25 ` [PATCH net-next 2/5] udp: UDP flow dissector Tom Herbert
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Tom Herbert @ 2016-10-12 23:25 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

Create udp4_lib_lookup_noref and udp6_lib_lookup_noref. These perform
a socket lookup on addresses and ports without taking a reference.

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/net/udp.h |  8 ++++++++
 net/ipv4/udp.c    |  8 ++++++++
 net/ipv6/udp.c    | 10 ++++++++++
 3 files changed, 26 insertions(+)

diff --git a/include/net/udp.h b/include/net/udp.h
index ea53a87..717a972 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -275,6 +275,10 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
 			       struct udp_table *tbl, struct sk_buff *skb);
 struct sock *udp4_lib_lookup_skb(struct sk_buff *skb,
 				 __be16 sport, __be16 dport);
+struct sock *udp4_lib_lookup_noref(struct net *net,
+				   __be32 saddr, __be16 sport,
+				   __be32 daddr, __be16 dport,
+				   int dif);
 struct sock *udp6_lib_lookup(struct net *net,
 			     const struct in6_addr *saddr, __be16 sport,
 			     const struct in6_addr *daddr, __be16 dport,
@@ -286,6 +290,10 @@ struct sock *__udp6_lib_lookup(struct net *net,
 			       struct sk_buff *skb);
 struct sock *udp6_lib_lookup_skb(struct sk_buff *skb,
 				 __be16 sport, __be16 dport);
+struct sock *udp6_lib_lookup_noref(struct net *net,
+				   const struct in6_addr *saddr, __be16 sport,
+				   const struct in6_addr *daddr, __be16 dport,
+				   int dif);
 
 /*
  * 	SNMP statistics for UDP and UDP-Lite
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 7d96dc2..7f84c51 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -595,6 +595,14 @@ struct sock *udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
 EXPORT_SYMBOL_GPL(udp4_lib_lookup);
 #endif
 
+struct sock *udp4_lib_lookup_noref(struct net *net, __be32 saddr, __be16 sport,
+				   __be32 daddr, __be16 dport, int dif)
+{
+	return __udp4_lib_lookup(net, saddr, sport, daddr, dport,
+				 dif, &udp_table, NULL);
+}
+EXPORT_SYMBOL_GPL(udp4_lib_lookup_noref);
+
 static inline bool __udp_is_mcast_sock(struct net *net, struct sock *sk,
 				       __be16 loc_port, __be32 loc_addr,
 				       __be16 rmt_port, __be32 rmt_addr,
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 9aa7c1c..6e382d9 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -317,6 +317,16 @@ struct sock *udp6_lib_lookup(struct net *net, const struct in6_addr *saddr, __be
 EXPORT_SYMBOL_GPL(udp6_lib_lookup);
 #endif
 
+struct sock *udp6_lib_lookup_noref(struct net *net,
+				   const struct in6_addr *saddr, __be16 sport,
+				   const struct in6_addr *daddr, __be16 dport,
+				   int dif)
+{
+	return __udp6_lib_lookup(net, saddr, sport, daddr, dport,
+				 dif, &udp_table, NULL);
+}
+EXPORT_SYMBOL_GPL(udp6_lib_lookup_noref);
+
 /*
  *	This should be easy, if there is something there we
  *	return it, otherwise we block.
-- 
2.9.3

* [PATCH net-next 2/5] udp: UDP flow dissector
  2016-10-12 23:25 [PATCH net-next 0/5] udp: Flow dissection for tunnels Tom Herbert
  2016-10-12 23:25 ` [PATCH net-next 1/5] udp: Add socket lookup functions with noref Tom Herbert
@ 2016-10-12 23:25 ` Tom Herbert
  2016-10-12 23:25 ` [PATCH net-next 3/5] udp: Add UDP flow dissection functions to IPv4 and IPv6 Tom Herbert
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Tom Herbert @ 2016-10-12 23:25 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

Add infrastructure for performing per protocol flow dissection and
support flow dissection in UDP payloads (e.g. flow dissection on a
UDP encapsulated tunnel).

The per protocol flow dissector is invoked through the flow_dissect
function in the offload_callbacks of a protocol. The arguments of this
function include the information needed to do flow dissection, as
derived from __skb_flow_dissect, which is where the callback is
intended to be called from. The callback returns a code of the form
FLOW_DIS_RET_* that indicates the result. FLOW_DIS_RET_IPPROTO
means that the payload should be dissected as an IP protocol; the
specific protocol is returned in a pointer argument. Likewise,
FLOW_DIS_RET_PROTO indicates the payload should be processed as
an ethertype, which is returned in another argument.

A case for IPPROTO_UDP was added to __skb_flow_dissect. Since the
UDP flow dissector involves a relatively expensive socket lookup,
there is a static key check first to see if there are any sockets
that have enabled flow dissection. After this check, the offload
ops for UDP for either IPv4 or IPv6 are considered. If the
flow_dissect function is set, it is called. Upon return the result
is processed (pass, out_bad, process as IP protocol, process
as ethertype). Note that if the result indicates a protocol must
be processed, it is expected that nhoff has been updated to point to
the encapsulated protocol header.
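
The handling of the return codes can be sketched in userspace C as
follows. This is a simplified model, not the code in the patch: the
toy_* names, the depth limit and the FOU-like callback are all
assumptions made for illustration, and the ethertype case simply records
the value instead of jumping back into a full layer 3 parser.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the FLOW_DIS_RET_* values; the names are hypothetical. */
enum toy_ret { TOY_RET_PASS, TOY_RET_BAD, TOY_RET_IPPROTO, TOY_RET_PROTO };

#define TOY_IPPROTO_TCP	6
#define TOY_IPPROTO_UDP	17
#define TOY_UDP_HLEN	8
#define TOY_MAX_DEPTH	4	/* assumed bound on nested encapsulations */

/* Hypothetical per-tunnel callback: may update nhoff, ip_proto, proto. */
typedef enum toy_ret (*toy_dissect_t)(const uint8_t *data, size_t len,
				      size_t *nhoff, uint8_t *ip_proto,
				      uint16_t *proto);

struct toy_result {
	uint8_t ip_proto;	/* innermost IP protocol reached */
	uint16_t proto;		/* ethertype, if a callback returned one */
	size_t nhoff;		/* offset of the header for ip_proto */
};

/* FOU-like callback: the payload protocol is fixed per socket (TCP here). */
static enum toy_ret toy_fou_cb(const uint8_t *data, size_t len,
			       size_t *nhoff, uint8_t *ip_proto,
			       uint16_t *proto)
{
	(void)data;
	(void)proto;
	if (*nhoff > len || len - *nhoff < TOY_UDP_HLEN)
		return TOY_RET_BAD;	/* truncated UDP header */
	*nhoff += TOY_UDP_HLEN;		/* skip the outer UDP header */
	*ip_proto = TOY_IPPROTO_TCP;
	return TOY_RET_IPPROTO;
}

/* Returns false for a bad packet, true otherwise with *res filled in. */
static bool toy_dissect(const uint8_t *data, size_t len, uint8_t ip_proto,
			size_t nhoff, toy_dissect_t cb, struct toy_result *res)
{
	uint16_t proto = 0;
	int depth;

	for (depth = 0; depth < TOY_MAX_DEPTH; depth++) {
		enum toy_ret ret;

		if (ip_proto != TOY_IPPROTO_UDP || !cb)
			break;		/* nothing to open */
		ret = cb(data, len, &nhoff, &ip_proto, &proto);
		if (ret == TOY_RET_BAD)
			return false;
		if (ret != TOY_RET_IPPROTO)
			break;		/* PASS or PROTO: stop peeling here */
	}
	res->ip_proto = ip_proto;
	res->proto = proto;
	res->nhoff = nhoff;
	return true;
}
```

The callback owns the update of nhoff; the caller only decides where to
resume parsing based on the return code.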

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/linux/netdevice.h    |  5 +++
 include/linux/udp.h          |  7 +++++
 include/net/flow_dissector.h |  8 +++++
 include/net/udp.h            |  4 +++
 net/core/flow_dissector.c    | 73 ++++++++++++++++++++++++++++++++++++++++++--
 net/ipv4/udp.c               |  3 ++
 6 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 136ae6bb..51b43fb1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2199,6 +2199,11 @@ struct offload_callbacks {
 	struct sk_buff		**(*gro_receive)(struct sk_buff **head,
 						 struct sk_buff *skb);
 	int			(*gro_complete)(struct sk_buff *skb, int nhoff);
+	int			(*flow_dissect)(const struct sk_buff *skb,
+		void *data, int hlen,
+		int *nhoff, u8 *ip_proto,
+		__be16 *proto,
+		 struct flow_dissector_key_addrs *key_addrs);
 };
 
 struct packet_offload {
diff --git a/include/linux/udp.h b/include/linux/udp.h
index d1fd8cd..608ebf4 100644
--- a/include/linux/udp.h
+++ b/include/linux/udp.h
@@ -79,6 +79,13 @@ struct udp_sock {
 	int			(*gro_complete)(struct sock *sk,
 						struct sk_buff *skb,
 						int nhoff);
+
+	/* Flow dissector function for UDP socket */
+	int			(*flow_dissect)(struct sock *sk,
+						const struct sk_buff *skb,
+						void *data, int hlen,
+						int *nhoff, u8 *ip_proto,
+						__be16 *proto);
 };
 
 static inline struct udp_sock *udp_sk(const struct sock *sk)
diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index d953492..9de4904 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -203,4 +203,12 @@ static inline void *skb_flow_dissector_target(struct flow_dissector *flow_dissec
 	return ((char *)target_container) + flow_dissector->offset[key_id];
 }
 
+/* Return codes from per socket flow dissector (e.g. UDP) */
+enum {
+	FLOW_DIS_RET_PASS = 0,
+	FLOW_DIS_RET_BAD,
+	FLOW_DIS_RET_IPPROTO,
+	FLOW_DIS_RET_PROTO,
+};
+
 #endif
diff --git a/include/net/udp.h b/include/net/udp.h
index 717a972..8d364e8 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -360,4 +360,8 @@ void udp_encap_enable(void);
 #if IS_ENABLED(CONFIG_IPV6)
 void udpv6_encap_enable(void);
 #endif
+
+void udp_flow_dissect_enable(void);
+void udp_flow_dissect_disable(void);
+
 #endif	/* _UDP_H */
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 1a7b80f..5a4dfaf 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -8,6 +8,8 @@
 #include <net/ipv6.h>
 #include <net/gre.h>
 #include <net/pptp.h>
+#include <net/protocol.h>
+#include <net/udp.h>
 #include <linux/igmp.h>
 #include <linux/icmp.h>
 #include <linux/sctp.h>
@@ -57,6 +59,20 @@ void skb_flow_dissector_init(struct flow_dissector *flow_dissector,
 }
 EXPORT_SYMBOL(skb_flow_dissector_init);
 
+static struct static_key udp_flow_dissect __read_mostly;
+
+void udp_flow_dissect_enable(void)
+{
+	static_key_slow_inc(&udp_flow_dissect);
+}
+EXPORT_SYMBOL(udp_flow_dissect_enable);
+
+void udp_flow_dissect_disable(void)
+{
+	static_key_slow_dec(&udp_flow_dissect);
+}
+EXPORT_SYMBOL(udp_flow_dissect_disable);
+
 /**
  * __skb_flow_get_ports - extract the upper layer ports and return them
  * @skb: sk_buff to extract the ports from
@@ -115,7 +131,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 {
 	struct flow_dissector_key_control *key_control;
 	struct flow_dissector_key_basic *key_basic;
-	struct flow_dissector_key_addrs *key_addrs;
+	struct flow_dissector_key_addrs *key_addrs = NULL;
 	struct flow_dissector_key_ports *key_ports;
 	struct flow_dissector_key_tags *key_tags;
 	struct flow_dissector_key_vlan *key_vlan;
@@ -245,7 +261,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 	}
 	case htons(ETH_P_8021AD):
 	case htons(ETH_P_8021Q): {
-		const struct vlan_hdr *vlan;
+		const struct vlan_hdr *vlan = NULL;
 
 		if (skb_vlan_tag_present(skb))
 			proto = skb->protocol;
@@ -535,6 +551,59 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 	case IPPROTO_MPLS:
 		proto = htons(ETH_P_MPLS_UC);
 		goto mpls;
+	case IPPROTO_UDP:
+	{
+		const struct net_offload **offloads;
+		const struct net_offload *ops;
+		int ret;
+
+		if (!static_key_false(&udp_flow_dissect))
+			break;
+
+		if (!key_addrs)
+			break;
+
+		/* See if there is a flow dissector for UDP protocol */
+
+		switch (key_control->addr_type) {
+		case FLOW_DISSECTOR_KEY_IPV4_ADDRS:
+			offloads = inet_offloads;
+			break;
+		case FLOW_DISSECTOR_KEY_IPV6_ADDRS:
+			offloads = inet6_offloads;
+			break;
+		default:
+			goto udp_finish;
+		}
+
+		rcu_read_lock();
+
+		ops = rcu_dereference(offloads[IPPROTO_UDP]);
+
+		if (!ops || !ops->callbacks.flow_dissect) {
+			rcu_read_unlock();
+			break;
+		}
+
+		ret = ops->callbacks.flow_dissect(skb, data, hlen, &nhoff,
+						  &ip_proto, &proto, key_addrs);
+
+		rcu_read_unlock();
+
+		switch (ret) {
+		case FLOW_DIS_RET_IPPROTO:
+			goto ip_proto_again;
+		case FLOW_DIS_RET_PROTO:
+			goto again;
+		case FLOW_DIS_RET_BAD:
+			goto out_bad;
+		case FLOW_DIS_RET_PASS:
+		default:
+			break;
+		}
+udp_finish:
+		break;
+	}
 	default:
 		break;
 	}
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 7f84c51..b4b528e 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1977,6 +1977,9 @@ void udp_destroy_sock(struct sock *sk)
 		if (encap_destroy)
 			encap_destroy(sk);
 	}
+
+	if (up->flow_dissect)
+		udp_flow_dissect_disable();
 }
 
 /*
-- 
2.9.3

* [PATCH net-next 3/5] udp: Add UDP flow dissection functions to IPv4 and IPv6
  2016-10-12 23:25 [PATCH net-next 0/5] udp: Flow dissection for tunnels Tom Herbert
  2016-10-12 23:25 ` [PATCH net-next 1/5] udp: Add socket lookup functions with noref Tom Herbert
  2016-10-12 23:25 ` [PATCH net-next 2/5] udp: UDP flow dissector Tom Herbert
@ 2016-10-12 23:25 ` Tom Herbert
  2016-10-12 23:25 ` [PATCH net-next 4/5] udp: UDP tunnel flow dissection infrastructure Tom Herbert
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Tom Herbert @ 2016-10-12 23:25 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

Add per protocol offload callbacks for flow_dissect to UDP for
IPv4 and IPv6. The callback functions extract the port numbers
and, together with the packet addresses (given in an argument of
type flow_dissector_key_addrs), perform a lookup of the UDP
socket. If a socket is found and flow_dissect is set for the
socket, then that function is called.
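
A minimal userspace sketch of the two steps described above, port
extraction followed by a socket lookup, might look like this. The
toy_sock structure, toy_get_ports and toy_lookup are hypothetical; the
sketch converts ports to host byte order and scans a small array, whereas
the kernel keeps ports in network byte order and uses hashed lookup with
scoring.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a UDP socket; zero fields act as wildcards. */
struct toy_sock {
	uint32_t local_addr, remote_addr;
	uint16_t local_port, remote_port;	/* host byte order */
	int has_flow_dissect;			/* callback installed? */
};

/* Read source and destination port at offset nhoff, with a bounds check.
 * Returns 0 on success, -1 if the buffer is too short. */
static int toy_get_ports(const uint8_t *data, size_t len, size_t nhoff,
			 uint16_t ports[2])
{
	if (nhoff > len || len - nhoff < 4)
		return -1;
	ports[0] = (uint16_t)((data[nhoff] << 8) | data[nhoff + 1]);
	ports[1] = (uint16_t)((data[nhoff + 2] << 8) | data[nhoff + 3]);
	return 0;
}

/* First-match lookup: the packet destination must match the socket's
 * local side, the packet source its remote side (unless wildcarded). */
static const struct toy_sock *toy_lookup(const struct toy_sock *tbl, size_t n,
					 uint32_t saddr, uint16_t sport,
					 uint32_t daddr, uint16_t dport)
{
	size_t i;

	for (i = 0; i < n; i++) {
		const struct toy_sock *sk = &tbl[i];

		if (sk->local_port != dport)
			continue;
		if (sk->local_addr && sk->local_addr != daddr)
			continue;
		if (sk->remote_addr && sk->remote_addr != saddr)
			continue;
		if (sk->remote_port && sk->remote_port != sport)
			continue;
		return sk;
	}
	return NULL;
}
```

A caller would invoke the per-socket dissector only when the lookup
returns a socket with has_flow_dissect set, and otherwise pass.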

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 net/ipv4/udp_offload.c | 39 +++++++++++++++++++++++++++++++++++++++
 net/ipv6/udp_offload.c | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index f9333c9..c7753ba 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -377,11 +377,50 @@ static int udp4_gro_complete(struct sk_buff *skb, int nhoff)
 	return udp_gro_complete(skb, nhoff, udp4_lib_lookup_skb);
 }
 
+/* Assumes rcu lock is held */
+static int udp4_flow_dissect(const struct sk_buff *skb, void *data, int hlen,
+			     int *nhoff, u8 *ip_proto, __be16 *proto,
+			     struct flow_dissector_key_addrs *key_addrs)
+{
+	u16 _ports[2], *ports;
+	struct net *net;
+	struct sock *sk;
+	int dif = -1;
+
+	/* See if there is a flow dissector in the UDP socket */
+
+	if (skb->dev) {
+		net = dev_net(skb->dev);
+		dif = skb->dev->ifindex;
+	} else if (skb->sk) {
+		net = sock_net(skb->sk);
+	} else {
+		return FLOW_DIS_RET_PASS;
+	}
+
+	ports = __skb_header_pointer(skb, *nhoff, sizeof(_ports),
+				     data, hlen, &_ports);
+	if (!ports)
+		return FLOW_DIS_RET_BAD;
+
+	sk = udp4_lib_lookup_noref(net,
+				   key_addrs->v4addrs.src, ports[0],
+				   key_addrs->v4addrs.dst, ports[1],
+				   dif);
+
+	if (sk && udp_sk(sk)->flow_dissect)
+		return udp_sk(sk)->flow_dissect(sk, skb, data, hlen, nhoff,
+						ip_proto, proto);
+	else
+		return FLOW_DIS_RET_PASS;
+}
+
 static const struct net_offload udpv4_offload = {
 	.callbacks = {
 		.gso_segment = udp4_ufo_fragment,
 		.gro_receive  =	udp4_gro_receive,
 		.gro_complete =	udp4_gro_complete,
+		.flow_dissect = udp4_flow_dissect,
 	},
 };
 
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index ac858c4..b3f4a6c 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -163,11 +163,49 @@ static int udp6_gro_complete(struct sk_buff *skb, int nhoff)
 	return udp_gro_complete(skb, nhoff, udp6_lib_lookup_skb);
 }
 
+/* Assumes rcu lock is held */
+static int udp6_flow_dissect(const struct sk_buff *skb, void *data, int hlen,
+			     int *nhoff, u8 *ip_proto, __be16 *proto,
+			     struct flow_dissector_key_addrs *key_addrs)
+{
+	u16 _ports[2], *ports;
+	struct net *net;
+	struct sock *sk;
+	int dif = -1;
+
+	/* See if there is a flow dissector in the UDP socket */
+
+	if (skb->dev) {
+		net = dev_net(skb->dev);
+		dif = skb->dev->ifindex;
+	} else if (skb->sk) {
+		net = sock_net(skb->sk);
+	} else {
+		return FLOW_DIS_RET_PASS;
+	}
+
+	ports = __skb_header_pointer(skb, *nhoff, sizeof(_ports),
+				     data, hlen, &_ports);
+	if (!ports)
+		return FLOW_DIS_RET_BAD;
+
+	sk = udp6_lib_lookup_noref(net,
+				   &key_addrs->v6addrs.src, ports[0],
+				   &key_addrs->v6addrs.dst, ports[1],
+				   dif);
+
+	if (sk && udp_sk(sk)->flow_dissect)
+		return udp_sk(sk)->flow_dissect(sk, skb, data, hlen, nhoff,
+						ip_proto, proto);
+	return FLOW_DIS_RET_PASS;
+}
+
 static const struct net_offload udpv6_offload = {
 	.callbacks = {
 		.gso_segment	=	udp6_ufo_fragment,
 		.gro_receive	=	udp6_gro_receive,
 		.gro_complete	=	udp6_gro_complete,
+		.flow_dissect	=	udp6_flow_dissect,
 	},
 };
 
-- 
2.9.3

* [PATCH net-next 4/5] udp: UDP tunnel flow dissection infrastructure
  2016-10-12 23:25 [PATCH net-next 0/5] udp: Flow dissection for tunnels Tom Herbert
                   ` (2 preceding siblings ...)
  2016-10-12 23:25 ` [PATCH net-next 3/5] udp: Add UDP flow dissection functions to IPv4 and IPv6 Tom Herbert
@ 2016-10-12 23:25 ` Tom Herbert
  2016-10-12 23:25 ` [PATCH net-next 5/5] fou: Support flow dissection Tom Herbert
  2016-10-13 19:17 ` [PATCH net-next 0/5] udp: Flow dissection for tunnels David Miller
  5 siblings, 0 replies; 9+ messages in thread
From: Tom Herbert @ 2016-10-12 23:25 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

Add infrastructure to allow UDP tunnels to set up flow dissection.

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/net/udp_tunnel.h | 5 +++++
 net/ipv4/udp_tunnel.c    | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/include/net/udp_tunnel.h b/include/net/udp_tunnel.h
index 02c5be0..81d2584 100644
--- a/include/net/udp_tunnel.h
+++ b/include/net/udp_tunnel.h
@@ -69,6 +69,10 @@ typedef struct sk_buff **(*udp_tunnel_gro_receive_t)(struct sock *sk,
 						     struct sk_buff *skb);
 typedef int (*udp_tunnel_gro_complete_t)(struct sock *sk, struct sk_buff *skb,
 					 int nhoff);
+typedef int (*udp_tunnel_flow_dissect_t)(struct sock *sk,
+					 const struct sk_buff *skb,
+					 void *data, int hlen, int *nhoff,
+					 u8 *ip_proto, __be16 *proto);
 
 struct udp_tunnel_sock_cfg {
 	void *sk_user_data;     /* user data used by encap_rcv call back */
@@ -78,6 +82,7 @@ struct udp_tunnel_sock_cfg {
 	udp_tunnel_encap_destroy_t encap_destroy;
 	udp_tunnel_gro_receive_t gro_receive;
 	udp_tunnel_gro_complete_t gro_complete;
+	udp_tunnel_flow_dissect_t flow_dissect;
 };
 
 /* Setup the given (UDP) sock to receive UDP encapsulated packets */
diff --git a/net/ipv4/udp_tunnel.c b/net/ipv4/udp_tunnel.c
index 58bd39f..4459288 100644
--- a/net/ipv4/udp_tunnel.c
+++ b/net/ipv4/udp_tunnel.c
@@ -72,6 +72,11 @@ void setup_udp_tunnel_sock(struct net *net, struct socket *sock,
 	udp_sk(sk)->gro_receive = cfg->gro_receive;
 	udp_sk(sk)->gro_complete = cfg->gro_complete;
 
+	if (cfg->flow_dissect) {
+		udp_sk(sk)->flow_dissect = cfg->flow_dissect;
+		udp_flow_dissect_enable();
+	}
+
 	udp_tunnel_encap_enable(sock);
 }
 EXPORT_SYMBOL_GPL(setup_udp_tunnel_sock);
-- 
2.9.3

* [PATCH net-next 5/5] fou: Support flow dissection
  2016-10-12 23:25 [PATCH net-next 0/5] udp: Flow dissection for tunnels Tom Herbert
                   ` (3 preceding siblings ...)
  2016-10-12 23:25 ` [PATCH net-next 4/5] udp: UDP tunnel flow dissection infrastructure Tom Herbert
@ 2016-10-12 23:25 ` Tom Herbert
  2016-10-13 19:17 ` [PATCH net-next 0/5] udp: Flow dissection for tunnels David Miller
  5 siblings, 0 replies; 9+ messages in thread
From: Tom Herbert @ 2016-10-12 23:25 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

This patch performs flow dissection for GUE and FOU. This is an
optional feature on the receiver and is set by the FOU_ATTR_DEEP_HASH
netlink configuration. When enabled, the UDP socket flow_dissect
function is set to fou_flow_dissect or gue_flow_dissect as
appropriate. These functions return FLOW_DIS_RET_IPPROTO and
set the IP protocol argument. In the case of GUE the header is
parsed to find the protocol number.
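
The GUE parsing step can be sketched in userspace C on a raw byte buffer.
This is an illustrative model with hypothetical toy_* names, assuming the
usual GUE layout where the first byte carries a 2-bit version, a control
bit and a 5-bit header length in 32-bit words, and the second byte
carries the protocol. The data pointer is taken to point at the UDP
header, while nhoff is the caller's offset of that header in the packet.

```c
#include <stddef.h>
#include <stdint.h>

enum { TOY_RET_PASS, TOY_RET_BAD, TOY_RET_IPPROTO };

#define TOY_UDP_HLEN		8	/* UDP header size */
#define TOY_GUE_HLEN		4	/* base GUE header size */
#define TOY_IPPROTO_IPIP	4
#define TOY_IPPROTO_IPV6	41

static int toy_gue_dissect(const uint8_t *data, size_t len,
			   size_t *nhoff, uint8_t *ip_proto)
{
	const uint8_t *gue;

	if (len < TOY_UDP_HLEN + TOY_GUE_HLEN)
		return TOY_RET_BAD;		/* truncated header */
	gue = data + TOY_UDP_HLEN;

	switch (gue[0] >> 6) {			/* GUE version */
	case 0:					/* full GUE header present */
		if (gue[0] & 0x20)		/* control message: not a flow */
			return TOY_RET_PASS;
		/* skip UDP, base GUE header and optional fields */
		*nhoff += TOY_UDP_HLEN + TOY_GUE_HLEN +
			  ((size_t)(gue[0] & 0x1f) << 2);
		*ip_proto = gue[1];
		return TOY_RET_IPPROTO;
	case 1:					/* direct IPv4/IPv6 encapsulation */
		switch (gue[0] >> 4) {		/* IP version nibble */
		case 4:
			*ip_proto = TOY_IPPROTO_IPIP;
			break;
		case 6:
			*ip_proto = TOY_IPPROTO_IPV6;
			break;
		default:
			return TOY_RET_PASS;
		}
		*nhoff += TOY_UDP_HLEN;		/* IP header follows UDP directly */
		return TOY_RET_IPPROTO;
	default:
		return TOY_RET_PASS;
	}
}
```

Version 1 works because the top two bits of an IPv4 (0x4) or IPv6 (0x6)
version nibble are 01, so a bare IP header reads as GUE version 1.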

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/uapi/linux/fou.h |  1 +
 net/ipv4/fou.c           | 68 +++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/fou.h b/include/uapi/linux/fou.h
index d2947c5..2c837eb 100644
--- a/include/uapi/linux/fou.h
+++ b/include/uapi/linux/fou.h
@@ -15,6 +15,7 @@ enum {
 	FOU_ATTR_IPPROTO,			/* u8 */
 	FOU_ATTR_TYPE,				/* u8 */
 	FOU_ATTR_REMCSUM_NOPARTIAL,		/* flag */
+	FOU_ATTR_DEEP_HASH,			/* flag */
 
 	__FOU_ATTR_MAX,
 };
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index cf50f7e..95ac5a8 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -27,7 +27,8 @@ struct fou {
 	struct rcu_head rcu;
 };
 
-#define FOU_F_REMCSUM_NOPARTIAL BIT(0)
+#define FOU_F_REMCSUM_NOPARTIAL	BIT(0)
+#define FOU_F_DEEP_HASH		BIT(1)
 
 struct fou_cfg {
 	u16 type;
@@ -281,6 +282,16 @@ static int fou_gro_complete(struct sock *sk, struct sk_buff *skb,
 	return err;
 }
 
+static int fou_flow_dissect(struct sock *sk, const struct sk_buff *skb,
+			    void *data, int hlen, int *nhoff, u8 *ip_proto,
+			    __be16 *proto)
+{
+	*ip_proto = fou_from_sock(sk)->protocol;
+	*nhoff += sizeof(struct udphdr);
+
+	return FLOW_DIS_RET_IPPROTO;
+}
+
 static struct guehdr *gue_gro_remcsum(struct sk_buff *skb, unsigned int off,
 				      struct guehdr *guehdr, void *data,
 				      size_t hdrlen, struct gro_remcsum *grc,
@@ -498,6 +509,48 @@ static int gue_gro_complete(struct sock *sk, struct sk_buff *skb, int nhoff)
 	return err;
 }
 
+static int gue_flow_dissect(struct sock *sk, const struct sk_buff *skb,
+			    void *data, int hlen, int *nhoff, u8 *ip_proto,
+			    __be16 *proto)
+{
+	struct guehdr _hdr, *hdr;
+
+	hdr = __skb_header_pointer(skb, *nhoff + sizeof(struct udphdr),
+				   sizeof(_hdr), data, hlen, &_hdr);
+	if (!hdr)
+		return FLOW_DIS_RET_BAD;
+
+	switch (hdr->version) {
+	case 0: /* Full GUE header present */
+		if (hdr->control)
+			return FLOW_DIS_RET_PASS;
+
+		*nhoff += sizeof(struct udphdr) + sizeof(_hdr) +
+			  (hdr->hlen << 2);
+		*ip_proto = hdr->proto_ctype;
+
+		return FLOW_DIS_RET_IPPROTO;
+	case 1:
+		/* Direct encapsulation of IPv4 or IPv6 */
+
+		switch (((struct iphdr *)hdr)->version) {
+		case 4:
+			*nhoff += sizeof(struct udphdr);
+			*ip_proto = IPPROTO_IPIP;
+			return FLOW_DIS_RET_IPPROTO;
+		case 6:
+			*nhoff += sizeof(struct udphdr);
+			*ip_proto = IPPROTO_IPV6;
+			return FLOW_DIS_RET_IPPROTO;
+		default:
+			return FLOW_DIS_RET_PASS;
+		}
+
+	default:
+		return FLOW_DIS_RET_PASS;
+	}
+}
+
 static int fou_add_to_port_list(struct net *net, struct fou *fou)
 {
 	struct fou_net *fn = net_generic(net, fou_net_id);
@@ -568,12 +621,16 @@ static int fou_create(struct net *net, struct fou_cfg *cfg,
 		tunnel_cfg.encap_rcv = fou_udp_recv;
 		tunnel_cfg.gro_receive = fou_gro_receive;
 		tunnel_cfg.gro_complete = fou_gro_complete;
+		if (cfg->flags & FOU_F_DEEP_HASH)
+			tunnel_cfg.flow_dissect = fou_flow_dissect;
 		fou->protocol = cfg->protocol;
 		break;
 	case FOU_ENCAP_GUE:
 		tunnel_cfg.encap_rcv = gue_udp_recv;
 		tunnel_cfg.gro_receive = gue_gro_receive;
 		tunnel_cfg.gro_complete = gue_gro_complete;
+		if (cfg->flags & FOU_F_DEEP_HASH)
+			tunnel_cfg.flow_dissect = gue_flow_dissect;
 		break;
 	default:
 		err = -EINVAL;
@@ -637,6 +694,7 @@ static const struct nla_policy fou_nl_policy[FOU_ATTR_MAX + 1] = {
 	[FOU_ATTR_IPPROTO] = { .type = NLA_U8, },
 	[FOU_ATTR_TYPE] = { .type = NLA_U8, },
 	[FOU_ATTR_REMCSUM_NOPARTIAL] = { .type = NLA_FLAG, },
+	[FOU_ATTR_DEEP_HASH] = { .type = NLA_FLAG },
 };
 
 static int parse_nl_config(struct genl_info *info,
@@ -677,6 +735,9 @@ static int parse_nl_config(struct genl_info *info,
 	if (info->attrs[FOU_ATTR_REMCSUM_NOPARTIAL])
 		cfg->flags |= FOU_F_REMCSUM_NOPARTIAL;
 
+	if (info->attrs[FOU_ATTR_DEEP_HASH])
+		cfg->flags |= FOU_F_DEEP_HASH;
+
 	return 0;
 }
 
@@ -717,6 +778,11 @@ static int fou_fill_info(struct fou *fou, struct sk_buff *msg)
 	if (fou->flags & FOU_F_REMCSUM_NOPARTIAL)
 		if (nla_put_flag(msg, FOU_ATTR_REMCSUM_NOPARTIAL))
 			return -1;
+
+	if (fou->flags & FOU_F_DEEP_HASH)
+		if (nla_put_flag(msg, FOU_ATTR_DEEP_HASH))
+			return -1;
+
 	return 0;
 }
 
-- 
2.9.3

* Re: [PATCH net-next 0/5] udp: Flow dissection for tunnels
  2016-10-12 23:25 [PATCH net-next 0/5] udp: Flow dissection for tunnels Tom Herbert
                   ` (4 preceding siblings ...)
  2016-10-12 23:25 ` [PATCH net-next 5/5] fou: Support flow dissection Tom Herbert
@ 2016-10-13 19:17 ` David Miller
  2016-10-13 19:29   ` Tom Herbert
  5 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2016-10-13 19:17 UTC (permalink / raw)
  To: tom; +Cc: netdev, kernel-team

From: Tom Herbert <tom@herbertland.com>
Date: Wed, 12 Oct 2016 16:25:42 -0700

> Since performing a UDP lookup on every packet might be expensive I
> added a static key check to bypass the lookup if there are no
> sockets with flow_dissect set. I should mention that doing the
> lookup wasn't particularly a big hit anyway.

I think this new static key is unnecessary, as it is equivalent
to: (udp_encap_needed + udpv6_encap_needed)

This socket lookup is very heavy handed, and I realize that you
need this because we no longer store the encapsulation socket in
skb->sk these days.

Can you talk about the various code paths that lead into the
flow dissector and why the UDP socket lookup is needed?  Maybe
we can propagate it another way, at least on TX.

Thanks.

* Re: [PATCH net-next 0/5] udp: Flow dissection for tunnels
  2016-10-13 19:17 ` [PATCH net-next 0/5] udp: Flow dissection for tunnels David Miller
@ 2016-10-13 19:29   ` Tom Herbert
  2016-10-14 15:03     ` David Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Tom Herbert @ 2016-10-13 19:29 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Kernel Network Developers, Kernel Team

On Thu, Oct 13, 2016 at 12:17 PM, David Miller <davem@davemloft.net> wrote:
> From: Tom Herbert <tom@herbertland.com>
> Date: Wed, 12 Oct 2016 16:25:42 -0700
>
>> Since performing a UDP lookup on every packet might be expensive I
>> added a static key check to bypass the lookup if there are no
>> sockets with flow_dissect set. I should mention that doing the
>> lookup wasn't particularly a big hit anyway.
>
> I think this new static key is unnecessary, as it is equivalent
> to: (udp_encap_needed + udpv6_encap_needed)
>
Good point.

> This socket lookup is very heavy handed, and I realize that you
> need this because we no longer store the encapsulation socket in
> skb->sk these days.
>
I don't quite understand your point about the encapsulation socket.
The reason we do the socket lookup is the same as why we do it this
way in GRO already: identifying UDP encapsulation by just port is
insufficient and may lead to incorrect results. Once we removed the
atomic operation from the UDP socket lookup (thanks Eric!) it becomes
feasible to do the lookup in critical paths and so the special per
port offloads lookup was removed. In my testing I didn't see a
noticeable hit in performing the lookup, but it still seems prudent to
use the static key since UDP tunnels are a relatively narrow use case.

> Can you talk about the various code paths that lead into the
> flow dissector and why the UDP socket lookup is needed?  Maybe
> we can propagate it another way, at least on TX.
>
The code path I'm interested in is simple RX, particularly in the case
where UDP is being used as a static tunnel for many flows in IPv4
(assuming in IPv6 we have the flow label to provide entropy). For getting
the flow hash on TX we are already using sk->tx_hash if it's
available.

Tom

> Thanks.

* Re: [PATCH net-next 0/5] udp: Flow dissection for tunnels
  2016-10-13 19:29   ` Tom Herbert
@ 2016-10-14 15:03     ` David Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David Miller @ 2016-10-14 15:03 UTC (permalink / raw)
  To: tom; +Cc: netdev, kernel-team

From: Tom Herbert <tom@herbertland.com>
Date: Thu, 13 Oct 2016 12:29:46 -0700

> On Thu, Oct 13, 2016 at 12:17 PM, David Miller <davem@davemloft.net> wrote:
>> This socket lookup is very heavy handed, and I realize that you
>> need this because we no longer store the encapsulation socket in
>> skb->sk these days.
>>
> I don't quite understand your point about the encapsulation socket.

On the transmit side we used to have an issue wrt. what socket lives
on skb->sk when encapsulation is involved.

The problem is that we need skb->sk to be the transport level socket,
but in the output path we used to make tests on "skb->sk" to determine
things such as the multicast loopback flag.  But we should be looking
at the tunnel socket for that, otherwise we could do crazy things
like dereference an AF_PACKET socket as if it were an inet one.

As such we modified the output path to pass the inner tunnel socket
'sk' down through the call chain, as an argument to functions such as
ip_queue_xmit(), ip_local_out*(), etc.
