* [RFC PATCH 0/7] tou: Transports over UDP - part I
@ 2016-05-23 22:48 Tom Herbert
  2016-05-23 22:48 ` [RFC PATCH 1/7] fou: Get net from sock_net if dev_net unavailable Tom Herbert
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Tom Herbert @ 2016-05-23 22:48 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

Transports over UDP is intended to encapsulate TCP and other transport
protocols directly and securely in UDP.

The goal of this work is twofold:

1) Allow applications to run their own transport layer stack (i.e. from
   userspace). This eliminates dependencies on the OS (e.g. solves a
   major dependency issue for Facebook on clients).

2) Make transport layer headers (all of L4) invisible to the network
   so that network devices cannot perform intrusive actions at L4.
   This is enforced when DTLS is in use.

Note that #1 is really about running a transport stack in userspace
applications in clients, not necessarily servers. For servers we
intend to modify the kernel stack in order to leverage the existing
implementation for building scalable servers (hence these patches).

This is described in more detail in the Internet Draft:
https://tools.ietf.org/html/draft-herbert-transports-over-udp-00

In Part I we implement a straightforward encapsulation of TCP in GUE.
This implements the basic mechanics of TOU encapsulation for TCP but
does not yet implement the IP addressing interactions, so it is not
robust in the presence of NAT. TOU is enabled per socket with a new
socket option. This implementation includes GSO, GRO, and RCO (remote
checksum offload) support.
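
As an illustrative sketch only, a userspace caller might fill in the
option argument and enable TOU on a TCP socket roughly as follows. The
struct layout and the option number mirror patches 2/7 and 3/7 of this
RFC series; they are not part of any released kernel, and a mainline
kernel may assign that option number to something else. The helper
names (tou_encap_fill, tou_enable), the struct name tou_encap_arg and
the TOU_* constants are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Assumed values taken from this RFC series, not from a released kernel. */
#define TOU_IP_ENCAP_OPT  51        /* IP_TOU_ENCAP as proposed in patch 3/7 */
#define TOU_ENCAP_GUE     2         /* TUNNEL_ENCAP_GUE */
#define TOU_FLAG_REMCSUM  (1 << 2)  /* TUNNEL_ENCAP_FLAG_REMCSUM */

/* Userspace mirror of struct tou_encap from patch 2/7 (hypothetical name). */
struct tou_encap_arg {
	uint16_t type;   /* enum tunnel_encap_types */
	uint16_t flags;
	uint16_t sport;  /* network byte order; 0 lets the stack pick a port */
	uint16_t dport;  /* network byte order */
};

/* Fill the argument; ports are given in host byte order. */
void tou_encap_fill(struct tou_encap_arg *te, uint16_t type, uint16_t flags,
		    uint16_t sport, uint16_t dport)
{
	memset(te, 0, sizeof(*te));
	te->type = type;
	te->flags = flags;
	te->sport = htons(sport);
	te->dport = htons(dport);
}

/* Apply the argument to an IPv4 socket; returns 0 or -1 with errno set. */
int tou_enable(int fd, const struct tou_encap_arg *te)
{
	return setsockopt(fd, IPPROTO_IP, TOU_IP_ENCAP_OPT, te, sizeof(*te));
}
```

A caller would typically call tou_encap_fill() with TOU_ENCAP_GUE and
the peer's GUE port, then tou_enable() before connect(). On a kernel
without these patches the setsockopt call is not meaningful.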

These patches also establish the baseline performance of TOU
and isolate the performance cost of UDP encapsulation. Performance
results are below.

Tested: Various cases of TOU with IPv4, IPv6 using TCP_STREAM and
TCP_RR. Also, tested IPIP for comparing TOU encapsulation to IP
tunneling.

    - IPv6 native
      1 TCP_STREAM
	8394 tps
      200 TCP_RR
	1726825 tps
	100/177/361 90/95/99% latencies

    - IPv6 TOU with RCO
      1 TCP_STREAM
	7410 tps
      200 TCP_RR
	1445603 tps
	121/211/395 90/95/99% latencies

    - IPv4 native
      1 TCP_STREAM
	8525 tps
      200 TCP_RR
	1826729 tps
	94/166/345 90/95/99% latencies

    - IPv4 TOU with RCO
      1 TCP_STREAM
	7624 tps
      200 TCP_RR
	1599642 tps
	108/190/377 90/95/99% latencies

    - IPIP with GUE and RCO
      1 TCP_STREAM
	5092 tps
      200 TCP_RR
	1276716 tps
	137/237/445 90/95/99% latencies
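
To make the encapsulation cost in the figures above easier to read, the
relative change between a native run and a TOU run can be computed as
below. This is only a small arithmetic sketch; the function names are
hypothetical and the inputs are the TCP_RR numbers quoted above.

```c
#include <stdio.h>

/* Relative change of `encap` versus `native`, in percent (negative = slower). */
double tou_rel_change_pct(double native, double encap)
{
	return (encap - native) / native * 100.0;
}

/* Print the 200-flow TCP_RR deltas for the results listed above. */
void tou_print_rr_overhead(void)
{
	printf("IPv6 TOU vs native: %.1f%%\n",
	       tou_rel_change_pct(1726825.0, 1445603.0));
	printf("IPv4 TOU vs native: %.1f%%\n",
	       tou_rel_change_pct(1826729.0, 1599642.0));
}
```

With these inputs, TOU shows roughly a 16% drop in TCP_RR transactions
for IPv6 and roughly a 12% drop for IPv4 relative to native.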


Tom Herbert (7):
  fou: Get net from sock_net if dev_net unavailable
  tou: Base infrastructure for Transport over UDP
  ipv4: Support TOU
  tcp: Support for TOU
  ipv6: Support TOU
  tcp6: Support for TOU
  tou: Support for GSO

 include/linux/skbuff.h           |   2 +
 include/net/inet_sock.h          |   1 +
 include/net/udp.h                |   2 +
 include/uapi/linux/if_tunnel.h   |  10 +++
 include/uapi/linux/in.h          |   1 +
 include/uapi/linux/in6.h         |   1 +
 net/ipv4/Makefile                |   3 +-
 net/ipv4/af_inet.c               |   4 +
 net/ipv4/fou.c                   |  24 +++++-
 net/ipv4/ip_output.c             |  42 ++++++++--
 net/ipv4/ip_sockglue.c           |   7 ++
 net/ipv4/tcp_ipv4.c              |   9 ++-
 net/ipv4/tou.c                   | 132 +++++++++++++++++++++++++++++++
 net/ipv4/udp_offload.c           | 164 +++++++++++++++++++++++++++++++++++++--
 net/ipv6/inet6_connection_sock.c |  59 ++++++++++++--
 net/ipv6/ipv6_sockglue.c         |   7 ++
 net/ipv6/tcp_ipv6.c              |  11 +--
 net/ipv6/udp_offload.c           | 128 +++++++++++++++---------------
 18 files changed, 512 insertions(+), 95 deletions(-)
 create mode 100644 net/ipv4/tou.c

-- 
2.8.0.rc2


* [RFC PATCH 1/7] fou: Get net from sock_net if dev_net unavailable
  2016-05-23 22:48 [RFC PATCH 0/7] tou: Transports over UDP - part I Tom Herbert
@ 2016-05-23 22:48 ` Tom Herbert
  2016-05-24 22:01   ` David Miller
  2016-05-23 22:48 ` [RFC PATCH 2/7] tou: Base infrastructure for Transport over UDP Tom Herbert
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Tom Herbert @ 2016-05-23 22:48 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

With the implementation of Transports over UDP, the fou and gue build
header functions may be called before skb->dev is set. This patch checks
skb->dev and, if it is not set, tries to get net from sock_net. If
neither skb->dev nor skb->sk is set, -EINVAL is returned.
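
The lookup order can be modelled in isolation as follows. This is a
userspace sketch with simplified stand-in types, not the kernel
definitions; the helper name skb_net_or_null is hypothetical, and the
patch itself open-codes the same checks in both build header functions.

```c
#include <stddef.h>

/* Stand-in types: simplified models, not the kernel structures. */
struct net { int id; };
struct net_device { struct net *nd_net; };
struct sock { struct net *sk_net; };
struct sk_buff { struct net_device *dev; struct sock *sk; };

/* Prefer the device's namespace, fall back to the socket's, else NULL.
 * A NULL result corresponds to the -EINVAL return in the patch.
 */
struct net *skb_net_or_null(const struct sk_buff *skb)
{
	if (skb->dev)
		return skb->dev->nd_net;
	if (skb->sk)
		return skb->sk->sk_net;
	return NULL;
}
```

Factoring the check into one helper like this would avoid repeating it
in __fou_build_header and __gue_build_header, though whether that is
preferable is a design choice for the series.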

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 net/ipv4/fou.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index 5f9207c..96260c6 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -807,13 +807,20 @@ int __fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
 		       u8 *protocol, __be16 *sport, int type)
 {
 	int err;
+	struct net *net;
 
 	err = iptunnel_handle_offloads(skb, type);
 	if (err)
 		return err;
 
-	*sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev),
-						skb, 0, 0, false);
+	if (skb->dev)
+		net = dev_net(skb->dev);
+	else if (skb->sk)
+		net = sock_net(skb->sk);
+	else
+		return -EINVAL;
+
+	*sport = e->sport ? : udp_flow_src_port(net, skb, 0, 0, false);
 
 	return 0;
 }
@@ -845,6 +852,14 @@ int __gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
 	void *data;
 	bool need_priv = false;
 	int err;
+	struct net *net;
+
+	if (skb->dev)
+		net = dev_net(skb->dev);
+	else if (skb->sk)
+		net = sock_net(skb->sk);
+	else
+		return -EINVAL;
 
 	if ((e->flags & TUNNEL_ENCAP_FLAG_REMCSUM) &&
 	    skb->ip_summed == CHECKSUM_PARTIAL) {
@@ -860,8 +875,7 @@ int __gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
 		return err;
 
 	/* Get source port (based on flow hash) before skb_push */
-	*sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev),
-						skb, 0, 0, false);
+	*sport = e->sport ? : udp_flow_src_port(net, skb, 0, 0, false);
 
 	hdrlen = sizeof(struct guehdr) + optlen;
 
-- 
2.8.0.rc2


* [RFC PATCH 2/7] tou: Base infrastructure for Transport over UDP
  2016-05-23 22:48 [RFC PATCH 0/7] tou: Transports over UDP - part I Tom Herbert
  2016-05-23 22:48 ` [RFC PATCH 1/7] fou: Get net from sock_net if dev_net unavailable Tom Herbert
@ 2016-05-23 22:48 ` Tom Herbert
  2016-05-24 22:02   ` David Miller
  2016-05-23 22:48 ` [RFC PATCH 3/7] ipv4: Support TOU Tom Herbert
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Tom Herbert @ 2016-05-23 22:48 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

Add tou.c that implements common setsockopt functionality. This includes
initialization and argument structure for the setsockopt.

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/uapi/linux/if_tunnel.h |  10 ++++
 net/ipv4/Makefile              |   3 +-
 net/ipv4/af_inet.c             |   4 ++
 net/ipv4/tou.c                 | 132 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 148 insertions(+), 1 deletion(-)
 create mode 100644 net/ipv4/tou.c

diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index af4de90..c6b4afa 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -71,6 +71,16 @@ enum tunnel_encap_types {
 #define TUNNEL_ENCAP_FLAG_CSUM6		(1<<1)
 #define TUNNEL_ENCAP_FLAG_REMCSUM	(1<<2)
 
+/* Structure for Transport Over UDP (TOU) encapsulation. This is used in
+ * setsockopt of inet sockets.
+ */
+struct tou_encap {
+	u16			type; /* enum tunnel_encap_types */
+	u16			flags;
+	__be16			sport;
+	__be16			dport;
+};
+
 /* SIT-mode i_flags */
 #define	SIT_ISATAP	0x0001
 
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index bfa1336..3b46dd6 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -12,7 +12,8 @@ obj-y     := route.o inetpeer.o protocol.o \
 	     tcp_offload.o datagram.o raw.o udp.o udplite.o \
 	     udp_offload.o arp.o icmp.o devinet.o af_inet.o igmp.o \
 	     fib_frontend.o fib_semantics.o fib_trie.o \
-	     inet_fragment.o ping.o ip_tunnel_core.o gre_offload.o
+	     inet_fragment.o ping.o ip_tunnel_core.o gre_offload.o \
+	     tou.o
 
 obj-$(CONFIG_NET_IP_TUNNEL) += ip_tunnel.o
 obj-$(CONFIG_SYSCTL) += sysctl_net_ipv4.o
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 377424e..7a856f8 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -120,6 +120,7 @@
 #include <linux/mroute.h>
 #endif
 #include <net/l3mdev.h>
+#include <net/tou.h>
 
 
 /* The inetsw table contains everything that inet_create needs to
@@ -1822,6 +1823,9 @@ static int __init inet_init(void)
 	/* Add UDP-Lite (RFC 3828) */
 	udplite4_register();
 
+	/* Set TOU slab cache (Transport layer encapsulation over UDP) */
+	tou_init();
+
 	ping_init();
 
 	/*
diff --git a/net/ipv4/tou.c b/net/ipv4/tou.c
new file mode 100644
index 0000000..601466a
--- /dev/null
+++ b/net/ipv4/tou.c
@@ -0,0 +1,132 @@
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/socket.h>
+#include <linux/skbuff.h>
+#include <linux/ip.h>
+#include <linux/udp.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <net/genetlink.h>
+#include <net/gue.h>
+#include <net/ip.h>
+#include <net/protocol.h>
+#include <net/udp.h>
+#include <net/udp_tunnel.h>
+#include <net/xfrm.h>
+#include <net/tou.h>
+#include <net/ip6_tunnel.h>
+#include <uapi/linux/fou.h>
+#include <uapi/linux/genetlink.h>
+
+static struct kmem_cache *tou_cachep __read_mostly;
+
+int tou_encap_setsockopt(struct sock *sk, char __user *optval, int optlen,
+			 bool is_ipv6)
+{
+	struct tou_encap te;
+	struct ip_tunnel_encap encap;
+	struct inet_sock *inet = inet_sk(sk);
+	struct ip_tunnel_encap *e = inet->tou_encap;
+	int hlen = 0, old_hlen = 0;
+
+	if (optlen < sizeof(te))
+		return -EINVAL;
+
+	if (copy_from_user(&te, optval, sizeof(te)))
+		return -EFAULT;
+
+	if (e) {
+		old_hlen = is_ipv6 ? ip6_encap_hlen(e) : ip_encap_hlen(e);
+		if (unlikely(old_hlen < 0))
+			return -EINVAL;
+	}
+
+	if (te.type == TUNNEL_ENCAP_NONE) {
+		if (e) {
+			if (unlikely(old_hlen < 0))
+				return -EINVAL;
+
+			inet->tou_encap = NULL;
+			kmem_cache_free(tou_cachep, e);
+
+			goto adjust_ext_hdr;
+		} else {
+			return 0;
+		}
+	}
+
+	memset(&encap, 0, sizeof(encap));
+	encap.type = te.type;
+	encap.sport = te.sport;
+	encap.dport = te.dport;
+	encap.flags = te.flags;
+
+	hlen = is_ipv6 ? ip6_encap_hlen(e) : ip_encap_hlen(e);
+	if (hlen < 0)
+		return hlen;
+
+	if (!e) {
+		e = kmem_cache_alloc(tou_cachep, GFP_KERNEL);
+		if (!e)
+			return -ENOMEM;
+		inet->tou_encap = e;
+	}
+
+	*e = encap;
+
+adjust_ext_hdr:
+	if (inet->is_icsk) {
+		struct inet_connection_sock *icsk = inet_csk(sk);
+
+		/* For a connected socket add the overhead of encapsulation
+		 * (specifically the difference between the new encapsulation
+		 * and the old one if present) into the external header length
+		 * and adjust the mss.
+		 */
+		icsk->icsk_ext_hdr_len += (hlen - old_hlen);
+		icsk->icsk_sync_mss(sk, icsk->icsk_pmtu_cookie);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(tou_encap_setsockopt);
+
+int tou_encap_getsockopt(struct sock *sk, char __user *optval,
+			 int len, int __user *optlen, bool is_ipv6)
+{
+	struct tou_encap te;
+	struct inet_sock *inet = inet_sk(sk);
+	struct ip_tunnel_encap *e = inet->tou_encap;
+
+	if (len < sizeof(te))
+		return -EINVAL;
+
+	len = sizeof(te);
+
+	memset(&te, 0, sizeof(te));
+
+	if (!e) {
+		te.type = TUNNEL_ENCAP_NONE;
+	} else {
+		te.type = e->type;
+		te.sport = e->sport;
+		te.dport = e->dport;
+		te.flags = e->flags;
+	}
+
+	if (put_user(len, optlen))
+		return -EFAULT;
+
+	if (copy_to_user(optval, &te, len))
+		return -EFAULT;
+
+	return 0;
+}
+EXPORT_SYMBOL(tou_encap_getsockopt);
+
+void __init tou_init(void)
+{
+	tou_cachep = kmem_cache_create("tou_cache",
+				       sizeof(struct ip_tunnel_encap), 0,
+				       SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);
+}
-- 
2.8.0.rc2


* [RFC PATCH 3/7] ipv4: Support TOU
  2016-05-23 22:48 [RFC PATCH 0/7] tou: Transports over UDP - part I Tom Herbert
  2016-05-23 22:48 ` [RFC PATCH 1/7] fou: Get net from sock_net if dev_net unavailable Tom Herbert
  2016-05-23 22:48 ` [RFC PATCH 2/7] tou: Base infrastructure for Transport over UDP Tom Herbert
@ 2016-05-23 22:48 ` Tom Herbert
  2016-05-24  3:16   ` Eric Dumazet
  2016-05-23 22:48 ` [RFC PATCH 4/7] tcp: Support for TOU Tom Herbert
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Tom Herbert @ 2016-05-23 22:48 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

Add tou_encap structure to inet_sock. In transmit path (ip_queue_xmit)
check if encapsulation is enabled and call the build header op
if it is. Add IP_TOU_ENCAP setsockopt for IPv4 sockets.
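
The transmit-side dispatch described above follows a common pattern: look
up an ops table by encapsulation type and call its build_header hook,
failing with -EINVAL when no handler is registered. A standalone model of
that pattern, using hypothetical demo_* names and simplified stand-in
types rather than the kernel's ip_tunnel_encap_ops, might look like this:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ENCAP_OPS 8

/* Stand-in for the per-socket encapsulation settings. */
struct demo_encap {
	uint16_t type;
	uint16_t sport;
	uint16_t dport;
};

/* Stand-in for the ops table entry; only the hook used here is modelled. */
struct demo_encap_ops {
	int (*build_header)(const struct demo_encap *e, uint8_t *protocol);
};

const struct demo_encap_ops *demo_encaps[DEMO_MAX_ENCAP_OPS];

/* Example hook: a UDP-based encapsulation changes the IP protocol to UDP. */
int demo_udp_build_header(const struct demo_encap *e, uint8_t *protocol)
{
	(void)e;
	*protocol = 17; /* IPPROTO_UDP */
	return 0;
}

/* Dispatch to the registered hook, or fail if none exists for this type. */
int demo_encap_xmit(const struct demo_encap *e, uint8_t *protocol)
{
	const struct demo_encap_ops *ops;

	if (e->type >= DEMO_MAX_ENCAP_OPS)
		return -EINVAL;

	ops = demo_encaps[e->type];
	if (!ops || !ops->build_header)
		return -EINVAL;

	return ops->build_header(e, protocol);
}
```

The bounds check on the type is an addition in this sketch; the patch
indexes the table directly with e->type.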

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/net/inet_sock.h |  1 +
 include/uapi/linux/in.h |  1 +
 net/ipv4/ip_output.c    | 40 ++++++++++++++++++++++++++++++++++------
 net/ipv4/ip_sockglue.c  |  7 +++++++
 net/ipv4/tou.c          |  2 +-
 5 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 012b1f9..ce22bc9 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -209,6 +209,7 @@ struct inet_sock {
 	__be32			mc_addr;
 	struct ip_mc_socklist __rcu	*mc_list;
 	struct inet_cork_full	cork;
+	struct ip_tunnel_encap	*tou_encap;
 };
 
 #define IPCORK_OPT	1	/* ip-options has been held in ipcork.opt */
diff --git a/include/uapi/linux/in.h b/include/uapi/linux/in.h
index eaf9491..9827bff 100644
--- a/include/uapi/linux/in.h
+++ b/include/uapi/linux/in.h
@@ -152,6 +152,7 @@ struct in_addr {
 #define MCAST_MSFILTER			48
 #define IP_MULTICAST_ALL		49
 #define IP_UNICAST_IF			50
+#define IP_TOU_ENCAP			51
 
 #define MCAST_EXCLUDE	0
 #define MCAST_INCLUDE	1
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 124bf0a..e7dbded 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -78,6 +78,7 @@
 #include <linux/netfilter_bridge.h>
 #include <linux/netlink.h>
 #include <linux/tcp.h>
+#include <net/ip_tunnels.h>
 
 static int
 ip_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
@@ -382,11 +383,36 @@ int ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl)
 	struct rtable *rt;
 	struct iphdr *iph;
 	int res;
+	__be16 dport, sport;
+	u8 protocol = sk->sk_protocol;
 
 	/* Skip all of this if the packet is already routed,
 	 * f.e. by something like SCTP.
 	 */
 	rcu_read_lock();
+
+	if (inet->tou_encap) {
+		struct ip_tunnel_encap *e = inet->tou_encap;
+		const struct ip_tunnel_encap_ops *ops;
+
+		/* Transport layer protocol over UDP encapsulation */
+		dport = e->dport;
+		sport = e->sport;
+		ops = rcu_dereference(iptun_encaps[e->type]);
+		if (likely(ops && ops->build_header)) {
+			res = ops->build_header(skb, e, &protocol,
+						(struct flowi4 *)fl);
+			if (res < 0)
+				goto fail;
+		} else {
+			res = -EINVAL;
+			goto fail;
+		}
+	} else {
+		dport = inet->inet_dport;
+		sport = inet->inet_sport;
+	}
+
 	inet_opt = rcu_dereference(inet->inet_opt);
 	fl4 = &fl->u.ip4;
 	rt = skb_rtable(skb);
@@ -409,9 +435,9 @@ int ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl)
 		 */
 		rt = ip_route_output_ports(net, fl4, sk,
 					   daddr, inet->inet_saddr,
-					   inet->inet_dport,
-					   inet->inet_sport,
-					   sk->sk_protocol,
+					   dport,
+					   sport,
+					   protocol,
 					   RT_CONN_FLAGS(sk),
 					   sk->sk_bound_dev_if);
 		if (IS_ERR(rt))
@@ -434,7 +460,7 @@ packet_routed:
 	else
 		iph->frag_off = 0;
 	iph->ttl      = ip_select_ttl(inet, &rt->dst);
-	iph->protocol = sk->sk_protocol;
+	iph->protocol = protocol;
 	ip_copy_addrs(iph, fl4);
 
 	/* Transport layer set skb->h.foo itself. */
@@ -456,10 +482,12 @@ packet_routed:
 	return res;
 
 no_route:
-	rcu_read_unlock();
 	IP_INC_STATS(net, IPSTATS_MIB_OUTNOROUTES);
+	res = -EHOSTUNREACH;
+fail:
+	rcu_read_unlock();
 	kfree_skb(skb);
-	return -EHOSTUNREACH;
+	return res;
 }
 EXPORT_SYMBOL(ip_queue_xmit);
 
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 71a52f4..0c9d3f0 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -42,6 +42,7 @@
 #include <net/transp_v6.h>
 #endif
 #include <net/ip_fib.h>
+#include <net/tou.h>
 
 #include <linux/errqueue.h>
 #include <asm/uaccess.h>
@@ -1162,6 +1163,10 @@ mc_msf_out:
 		inet->min_ttl = val;
 		break;
 
+	case IP_TOU_ENCAP:
+		err = tou_encap_setsockopt(sk, optval, optlen, false);
+		break;
+
 	default:
 		err = -ENOPROTOOPT;
 		break;
@@ -1493,6 +1498,8 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
 	case IP_MINTTL:
 		val = inet->min_ttl;
 		break;
+	case IP_TOU_ENCAP:
+		return tou_encap_getsockopt(sk, optval, len, optlen, false);
 	default:
 		release_sock(sk);
 		return -ENOPROTOOPT;
diff --git a/net/ipv4/tou.c b/net/ipv4/tou.c
index 601466a..bbb44b6 100644
--- a/net/ipv4/tou.c
+++ b/net/ipv4/tou.c
@@ -61,7 +61,7 @@ int tou_encap_setsockopt(struct sock *sk, char __user *optval, int optlen,
 	encap.dport = te.dport;
 	encap.flags = te.flags;
 
-	hlen = is_ipv6 ? ip6_encap_hlen(e) : ip_encap_hlen(e);
+	hlen = is_ipv6 ? ip6_encap_hlen(&encap) : ip_encap_hlen(&encap);
 	if (hlen < 0)
 		return hlen;
 
-- 
2.8.0.rc2


* [RFC PATCH 4/7] tcp: Support for TOU
  2016-05-23 22:48 [RFC PATCH 0/7] tou: Transports over UDP - part I Tom Herbert
                   ` (2 preceding siblings ...)
  2016-05-23 22:48 ` [RFC PATCH 3/7] ipv4: Support TOU Tom Herbert
@ 2016-05-23 22:48 ` Tom Herbert
  2016-05-23 22:48 ` [RFC PATCH 5/7] ipv6: Support TOU Tom Herbert
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Tom Herbert @ 2016-05-23 22:48 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

The MSS needs to be adjusted to account for encapsulation overhead. This
is done by adding the encapsulation header length to icsk_ext_hdr_len.
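
As a rough worked example of the effect, the sketch below models the MSS
as the path MTU minus the network header, the TCP header and the
external header length. The header sizes are assumptions (an 8-byte UDP
header plus a 4-byte base GUE header with no optional fields), the
function names are hypothetical, and the real kernel computation
involves more terms than shown here.

```c
/* Assumed header sizes for plain GUE without optional fields. */
enum {
	TOU_UDP_HLEN      = 8,
	TOU_GUE_BASE_HLEN = 4,
};

/* Simplified MSS model: everything that is not payload is subtracted. */
int tou_mss(int pmtu, int net_hlen, int tcp_hlen, int ext_hdr_len)
{
	return pmtu - net_hlen - tcp_hlen - ext_hdr_len;
}

/* Mirrors the setsockopt adjustment: add the difference between the new
 * encapsulation header length and the old one.
 */
int tou_adjust_ext_hdr_len(int ext_hdr_len, int old_hlen, int new_hlen)
{
	return ext_hdr_len + (new_hlen - old_hlen);
}
```

Under these assumptions, a 1500-byte MTU with 20-byte IPv4 and TCP
headers gives an MSS of 1460 without encapsulation and 1448 with the
12-byte GUE overhead.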

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 net/ipv4/tcp_ipv4.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 3708de2..c344f667 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -74,6 +74,7 @@
 #include <net/xfrm.h>
 #include <net/secure_seq.h>
 #include <net/busy_poll.h>
+#include <net/tou.h>
 
 #include <linux/inet.h>
 #include <linux/ipv6.h>
@@ -205,9 +206,9 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	inet->inet_dport = usin->sin_port;
 	sk_daddr_set(sk, daddr);
 
-	inet_csk(sk)->icsk_ext_hdr_len = 0;
+	inet_csk(sk)->icsk_ext_hdr_len = tou_hdr_len(sk);
 	if (inet_opt)
-		inet_csk(sk)->icsk_ext_hdr_len = inet_opt->opt.optlen;
+		inet_csk(sk)->icsk_ext_hdr_len += inet_opt->opt.optlen;
 
 	tp->rx_opt.mss_clamp = TCP_MSS_DEFAULT;
 
@@ -1296,9 +1297,9 @@ struct sock *tcp_v4_syn_recv_sock(const struct sock *sk, struct sk_buff *skb,
 	newinet->mc_index     = inet_iif(skb);
 	newinet->mc_ttl	      = ip_hdr(skb)->ttl;
 	newinet->rcv_tos      = ip_hdr(skb)->tos;
-	inet_csk(newsk)->icsk_ext_hdr_len = 0;
+	inet_csk(newsk)->icsk_ext_hdr_len = tou_hdr_len(sk);
 	if (inet_opt)
-		inet_csk(newsk)->icsk_ext_hdr_len = inet_opt->opt.optlen;
+		inet_csk(newsk)->icsk_ext_hdr_len += inet_opt->opt.optlen;
 	newinet->inet_id = newtp->write_seq ^ jiffies;
 
 	if (!dst) {
-- 
2.8.0.rc2


* [RFC PATCH 5/7] ipv6: Support TOU
  2016-05-23 22:48 [RFC PATCH 0/7] tou: Transports over UDP - part I Tom Herbert
                   ` (3 preceding siblings ...)
  2016-05-23 22:48 ` [RFC PATCH 4/7] tcp: Support for TOU Tom Herbert
@ 2016-05-23 22:48 ` Tom Herbert
  2016-05-23 22:48 ` [RFC PATCH 6/7] tcp6: Support for TOU Tom Herbert
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Tom Herbert @ 2016-05-23 22:48 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

In the transmit path (inet6_csk_xmit), check if encapsulation is enabled
and call the build header op if it is. Add the IPV6_TOU_ENCAP setsockopt
for IPv6 sockets.

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/uapi/linux/in6.h         |  1 +
 net/ipv6/inet6_connection_sock.c | 56 ++++++++++++++++++++++++++++++++++------
 net/ipv6/ipv6_sockglue.c         |  7 +++++
 3 files changed, 56 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
index 318a482..9a610c3 100644
--- a/include/uapi/linux/in6.h
+++ b/include/uapi/linux/in6.h
@@ -282,6 +282,7 @@ struct in6_flowlabel_req {
 #define IPV6_RECVORIGDSTADDR    IPV6_ORIGDSTADDR
 #define IPV6_TRANSPARENT        75
 #define IPV6_UNICAST_IF         76
+#define IPV6_TOU_ENCAP		77
 
 /*
  * Multicast Routing:
diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index 532c3ef..5f2df4f 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -24,6 +24,7 @@
 #include <net/inet_ecn.h>
 #include <net/inet_hashtables.h>
 #include <net/ip6_route.h>
+#include <net/ip6_tunnel.h>
 #include <net/sock.h>
 #include <net/inet6_connection_sock.h>
 #include <net/sock_reuseport.h>
@@ -118,13 +119,11 @@ struct dst_entry *__inet6_csk_dst_check(struct sock *sk, u32 cookie)
 	return __sk_dst_check(sk, cookie);
 }
 
-static struct dst_entry *inet6_csk_route_socket(struct sock *sk,
-						struct flowi6 *fl6)
+static void inet6_csk_fill_flowi6(struct sock *sk,
+				  struct flowi6 *fl6)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	struct ipv6_pinfo *np = inet6_sk(sk);
-	struct in6_addr *final_p, final;
-	struct dst_entry *dst;
 
 	memset(fl6, 0, sizeof(*fl6));
 	fl6->flowi6_proto = sk->sk_protocol;
@@ -137,6 +136,14 @@ static struct dst_entry *inet6_csk_route_socket(struct sock *sk,
 	fl6->fl6_sport = inet->inet_sport;
 	fl6->fl6_dport = inet->inet_dport;
 	security_sk_classify_flow(sk, flowi6_to_flowi(fl6));
+}
+
+static struct dst_entry *inet6_csk_route_socket(struct sock *sk,
+						struct flowi6 *fl6)
+{
+	struct ipv6_pinfo *np = inet6_sk(sk);
+	struct in6_addr *final_p, final;
+	struct dst_entry *dst;
 
 	rcu_read_lock();
 	final_p = fl6_update_dst(fl6, rcu_dereference(np->opt), &final);
@@ -154,20 +161,46 @@ static struct dst_entry *inet6_csk_route_socket(struct sock *sk,
 
 int inet6_csk_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl_unused)
 {
+	struct inet_sock *inet = inet_sk(sk);
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct flowi6 fl6;
 	struct dst_entry *dst;
 	int res;
+	u8 protocol = sk->sk_protocol;
+
+	inet6_csk_fill_flowi6(sk, &fl6);
+
+	rcu_read_lock();
+
+	if (inet->tou_encap) {
+		struct ip_tunnel_encap *e = inet->tou_encap;
+		const struct ip6_tnl_encap_ops *ops;
+
+		/* Transport layer protocol over UDP encapsulation */
+		ops = rcu_dereference(ip6tun_encaps[e->type]);
+		if (likely(ops && ops->build_header)) {
+			res = ops->build_header(skb, e, &protocol, &fl6);
+			if (res < 0)
+				goto fail;
+		} else {
+			res = -EINVAL;
+			goto fail;
+		}
+
+		/* Changing ports and protocol to be routed */
+		fl6.fl6_sport = e->sport;
+		fl6.fl6_dport = e->dport;
+		fl6.flowi6_proto = protocol;
+	}
 
 	dst = inet6_csk_route_socket(sk, &fl6);
 	if (IS_ERR(dst)) {
 		sk->sk_err_soft = -PTR_ERR(dst);
 		sk->sk_route_caps = 0;
-		kfree_skb(skb);
-		return PTR_ERR(dst);
+		res = PTR_ERR(dst);
+		goto fail;
 	}
 
-	rcu_read_lock();
 	skb_dst_set_noref(skb, dst);
 
 	/* Restore final destination back after routing done */
@@ -177,14 +210,21 @@ int inet6_csk_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl_unused
 		       np->tclass);
 	rcu_read_unlock();
 	return res;
+fail:
+	rcu_read_unlock();
+	kfree_skb(skb);
+	return res;
 }
 EXPORT_SYMBOL_GPL(inet6_csk_xmit);
 
 struct dst_entry *inet6_csk_update_pmtu(struct sock *sk, u32 mtu)
 {
 	struct flowi6 fl6;
-	struct dst_entry *dst = inet6_csk_route_socket(sk, &fl6);
+	struct dst_entry *dst;
+
+	inet6_csk_fill_flowi6(sk, &fl6);
 
+	dst = inet6_csk_route_socket(sk, &fl6);
 	if (IS_ERR(dst))
 		return NULL;
 	dst->ops->update_pmtu(dst, sk, NULL, mtu);
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index a9895e1..1697c0e 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -52,6 +52,7 @@
 #include <net/udplite.h>
 #include <net/xfrm.h>
 #include <net/compat.h>
+#include <net/tou.h>
 
 #include <asm/uaccess.h>
 
@@ -868,6 +869,9 @@ pref_skip_coa:
 		np->autoflowlabel = valbool;
 		retv = 0;
 		break;
+	case IPV6_TOU_ENCAP:
+		retv = tou_encap_setsockopt(sk, optval, optlen, true);
+		break;
 	}
 
 	release_sock(sk);
@@ -1310,6 +1314,9 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
 		val = np->autoflowlabel;
 		break;
 
+	case IPV6_TOU_ENCAP:
+		return tou_encap_getsockopt(sk, optval, len, optlen, true);
+
 	default:
 		return -ENOPROTOOPT;
 	}
-- 
2.8.0.rc2


* [RFC PATCH 6/7] tcp6: Support for TOU
  2016-05-23 22:48 [RFC PATCH 0/7] tou: Transports over UDP - part I Tom Herbert
                   ` (4 preceding siblings ...)
  2016-05-23 22:48 ` [RFC PATCH 5/7] ipv6: Support TOU Tom Herbert
@ 2016-05-23 22:48 ` Tom Herbert
  2016-05-23 22:48 ` [RFC PATCH 7/7] tou: Support for GSO Tom Herbert
  2016-05-26 15:49 ` [RFC PATCH 0/7] tou: Transports over UDP - part I Alex Elsayed
  7 siblings, 0 replies; 15+ messages in thread
From: Tom Herbert @ 2016-05-23 22:48 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

The MSS needs to be adjusted to account for encapsulation overhead. This
is done by adding the encapsulation header length to icsk_ext_hdr_len.

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 net/ipv6/tcp_ipv6.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 79e33e0..6a3c2e7 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -62,6 +62,7 @@
 #include <net/inet_common.h>
 #include <net/secure_seq.h>
 #include <net/busy_poll.h>
+#include <net/tou.h>
 
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
@@ -210,7 +211,7 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 		err = tcp_v4_connect(sk, (struct sockaddr *)&sin, sizeof(sin));
 
 		if (err) {
-			icsk->icsk_ext_hdr_len = exthdrlen;
+			icsk->icsk_ext_hdr_len = tou_hdr_len(sk) + exthdrlen;
 			icsk->icsk_af_ops = &ipv6_specific;
 			sk->sk_backlog_rcv = tcp_v6_do_rcv;
 #ifdef CONFIG_TCP_MD5SIG
@@ -262,9 +263,9 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 	    ipv6_addr_equal(&fl6.daddr, &sk->sk_v6_daddr))
 		tcp_fetch_timewait_stamp(sk, dst);
 
-	icsk->icsk_ext_hdr_len = 0;
+	icsk->icsk_ext_hdr_len = tou_hdr_len(sk);
 	if (opt)
-		icsk->icsk_ext_hdr_len = opt->opt_flen +
+		icsk->icsk_ext_hdr_len += opt->opt_flen +
 					 opt->opt_nflen;
 
 	tp->rx_opt.mss_clamp = IPV6_MIN_MTU - sizeof(struct tcphdr) - sizeof(struct ipv6hdr);
@@ -1114,9 +1115,9 @@ static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff *
 		opt = ipv6_dup_options(newsk, opt);
 		RCU_INIT_POINTER(newnp->opt, opt);
 	}
-	inet_csk(newsk)->icsk_ext_hdr_len = 0;
+	inet_csk(newsk)->icsk_ext_hdr_len = tou_hdr_len(sk);
 	if (opt)
-		inet_csk(newsk)->icsk_ext_hdr_len = opt->opt_nflen +
+		inet_csk(newsk)->icsk_ext_hdr_len += opt->opt_nflen +
 						    opt->opt_flen;
 
 	tcp_ca_openreq_child(newsk, dst);
-- 
2.8.0.rc2


* [RFC PATCH 7/7] tou: Support for GSO
  2016-05-23 22:48 [RFC PATCH 0/7] tou: Transports over UDP - part I Tom Herbert
                   ` (5 preceding siblings ...)
  2016-05-23 22:48 ` [RFC PATCH 6/7] tcp6: Support for TOU Tom Herbert
@ 2016-05-23 22:48 ` Tom Herbert
  2016-05-24 14:59   ` Alexander Duyck
  2016-05-26 15:49 ` [RFC PATCH 0/7] tou: Transports over UDP - part I Alex Elsayed
  7 siblings, 1 reply; 15+ messages in thread
From: Tom Herbert @ 2016-05-23 22:48 UTC (permalink / raw)
  To: davem, netdev; +Cc: kernel-team

Add SKB_GSO_TOU. In udp[64]_ufo_fragment check for SKB_GSO_TOU. If this
is set call skb_udp_tou_segment. skb_udp_tou_segment is very similar
to skb_udp_tunnel_segment except that we only need to deal with the
L4 headers.

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/linux/skbuff.h           |   2 +
 include/net/udp.h                |   2 +
 net/ipv4/fou.c                   |   2 +
 net/ipv4/ip_output.c             |   2 +
 net/ipv4/udp_offload.c           | 164 +++++++++++++++++++++++++++++++++++++--
 net/ipv6/inet6_connection_sock.c |   3 +
 net/ipv6/udp_offload.c           | 128 +++++++++++++++---------------
 7 files changed, 236 insertions(+), 67 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 65968a9..b57e484 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -482,6 +482,8 @@ enum {
 	SKB_GSO_PARTIAL = 1 << 13,
 
 	SKB_GSO_TUNNEL_REMCSUM = 1 << 14,
+
+	SKB_GSO_TOU = 1 << 15,
 };
 
 #if BITS_PER_LONG > 32
diff --git a/include/net/udp.h b/include/net/udp.h
index ae07f37..4423234 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -262,6 +262,8 @@ unsigned int udp_poll(struct file *file, struct socket *sock, poll_table *wait);
 struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
 				       netdev_features_t features,
 				       bool is_ipv6);
+struct sk_buff *skb_udp_tou_segment(struct sk_buff *skb,
+				    netdev_features_t features, bool is_ipv6);
 int udp_lib_getsockopt(struct sock *sk, int level, int optname,
 		       char __user *optval, int __user *optlen);
 int udp_lib_setsockopt(struct sock *sk, int level, int optname,
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index 96260c6..1855fc2f 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -381,6 +381,8 @@ static struct sk_buff **gue_gro_receive(struct sock *sk,
 	/* Flag this frame as already having an outer encap header */
 	NAPI_GRO_CB(skb)->is_fou = 1;
 
+	skb_set_transport_header(skb, skb_gro_offset(skb));
+
 	rcu_read_lock();
 	offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
 	ops = rcu_dereference(offloads[guehdr->proto_ctype]);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index e7dbded..922c09c 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -408,6 +408,8 @@ int ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl)
 			res = -EINVAL;
 			goto fail;
 		}
+		skb_shinfo(skb)->gso_type |= SKB_GSO_TOU;
+		skb_set_inner_ipproto(skb, sk->sk_protocol);
 	} else {
 		dport = inet->inet_dport;
 		sport = inet->inet_sport;
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 81f253b..93ad42e 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -184,6 +184,156 @@ out_unlock:
 }
 EXPORT_SYMBOL(skb_udp_tunnel_segment);
 
+/* __skb_udp_tou_segment
+ *
+ * Handle segmentation of TOU (Transport Protocols over UDP). Note that this
+ * is very similar to __skb_udp_tunnel_segment; however, here we don't need to
+ * deal with MAC or network layers. Everything is done based on transport
+ * headers only.
+ */
+static struct sk_buff *__skb_udp_tou_segment(struct sk_buff *skb,
+	netdev_features_t features,
+	struct sk_buff *(*gso_inner_segment)(struct sk_buff *skb,
+					     netdev_features_t features),
+	bool is_ipv6)
+{
+	int tnl_hlen = skb_inner_transport_header(skb) -
+		       skb_transport_header(skb);
+	bool remcsum, need_csum, offload_csum, ufo;
+	struct sk_buff *segs = ERR_PTR(-EINVAL);
+	struct udphdr *uh = udp_hdr(skb);
+	int outer_hlen;
+	__wsum partial;
+
+	if (unlikely(!pskb_may_pull(skb, tnl_hlen)))
+		goto out;
+
+	/* Adjust partial header checksum to negate old length.
+	 * We cannot rely on the value contained in uh->len as it is
+	 * possible that the actual value exceeds the boundaries of the
+	 * 16 bit length field due to the header being added outside of an
+	 * IP or IPv6 frame that was already limited to 64K - 1.
+	 */
+	if (skb_shinfo(skb)->gso_type & SKB_GSO_PARTIAL)
+		partial = (__force __wsum)uh->len;
+	else
+		partial = (__force __wsum)htonl(skb->len);
+	partial = csum_sub(csum_unfold(uh->check), partial);
+
+	/* Setup inner skb. Only the transport header is relevant */
+	skb->encapsulation = 0;
+	SKB_GSO_CB(skb)->encap_level = 0;
+	__skb_pull(skb, tnl_hlen);
+	skb_reset_transport_header(skb);
+
+	need_csum = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL_CSUM);
+	skb->encap_hdr_csum = need_csum;
+
+	remcsum = !!(skb_shinfo(skb)->gso_type & SKB_GSO_TUNNEL_REMCSUM);
+	skb->remcsum_offload = remcsum;
+
+	ufo = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP);
+
+	/* Try to offload checksum if possible */
+	offload_csum = !!(need_csum &&
+			  (skb->dev->features &
+			   (is_ipv6 ? (NETIF_F_HW_CSUM | NETIF_F_IPV6_CSUM) :
+				      (NETIF_F_HW_CSUM | NETIF_F_IP_CSUM))));
+
+	features &= skb->dev->hw_enc_features;
+
+	/* The only checksum offload we care about from here on out is the
+	 * outer one so strip the existing checksum feature flags and
+	 * instead set the flag based on our outer checksum offload value.
+	 */
+	if (remcsum || ufo) {
+		features &= ~NETIF_F_CSUM_MASK;
+		if (!need_csum || offload_csum)
+			features |= NETIF_F_HW_CSUM;
+	}
+
+	/* segment inner packet. */
+	segs = gso_inner_segment(skb, features);
+	if (IS_ERR_OR_NULL(segs)) {
+		skb->encapsulation = 1;
+		skb_push(skb, tnl_hlen);
+		skb_reset_transport_header(skb);
+
+		goto out;
+	}
+
+	skb = segs;
+	do {
+		unsigned int len;
+
+		if (remcsum)
+			skb->ip_summed = CHECKSUM_NONE;
+
+		/* Adjust transport header back to UDP header */
+
+		skb->transport_header -= tnl_hlen;
+		uh = udp_hdr(skb);
+		len = skb->len - ((unsigned char *)uh - skb->data);
+
+		/* If we are only performing partial GSO the inner header
+		 * will be using a length value equal to only one MSS sized
+		 * segment instead of the entire frame.
+		 */
+		if (skb_is_gso(skb)) {
+			uh->len = htons(skb_shinfo(skb)->gso_size +
+					SKB_GSO_CB(skb)->data_offset +
+					skb->head - (unsigned char *)uh);
+		} else {
+			uh->len = htons(len);
+		}
+
+		if (!need_csum)
+			continue;
+
+		uh->check = ~csum_fold(csum_add(partial,
+				       (__force __wsum)htonl(len)));
+
+		if (skb->encapsulation || !offload_csum) {
+			uh->check = gso_make_checksum(skb, ~uh->check);
+			if (uh->check == 0)
+				uh->check = CSUM_MANGLED_0;
+		} else {
+			skb->ip_summed = CHECKSUM_PARTIAL;
+			skb->csum_start = skb_transport_header(skb) - skb->head;
+			skb->csum_offset = offsetof(struct udphdr, check);
+		}
+	} while ((skb = skb->next));
+out:
+	return segs;
+}
+
+struct sk_buff *skb_udp_tou_segment(struct sk_buff *skb,
+				    netdev_features_t features,
+				    bool is_ipv6)
+{
+	const struct net_offload **offloads;
+	const struct net_offload *ops;
+	struct sk_buff *segs = ERR_PTR(-EINVAL);
+	struct sk_buff *(*gso_inner_segment)(struct sk_buff *skb,
+					     netdev_features_t features);
+
+	rcu_read_lock();
+
+	offloads = is_ipv6 ? inet6_offloads : inet_offloads;
+	ops = rcu_dereference(offloads[skb->inner_ipproto]);
+	if (!ops || !ops->callbacks.gso_segment)
+		goto out_unlock;
+	gso_inner_segment = ops->callbacks.gso_segment;
+
+	segs = __skb_udp_tou_segment(skb, features, gso_inner_segment, is_ipv6);
+
+out_unlock:
+	rcu_read_unlock();
+
+	return segs;
+}
+EXPORT_SYMBOL(skb_udp_tou_segment);
+
 static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
 					 netdev_features_t features)
 {
@@ -193,11 +343,15 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
 	struct udphdr *uh;
 	struct iphdr *iph;
 
-	if (skb->encapsulation &&
-	    (skb_shinfo(skb)->gso_type &
-	     (SKB_GSO_UDP_TUNNEL|SKB_GSO_UDP_TUNNEL_CSUM))) {
-		segs = skb_udp_tunnel_segment(skb, features, false);
-		goto out;
+	if (skb->encapsulation) {
+		if (skb_shinfo(skb)->gso_type & SKB_GSO_TOU) {
+			segs = skb_udp_tou_segment(skb, features, false);
+			goto out;
+		} else if ((skb_shinfo(skb)->gso_type &
+		    (SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM))) {
+			segs = skb_udp_tunnel_segment(skb, features, false);
+			goto out;
+		}
 	}
 
 	if (!pskb_may_pull(skb, sizeof(struct udphdr)))
diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index 5f2df4f..3b8b2f4 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -187,6 +187,9 @@ int inet6_csk_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl_unused
 			goto fail;
 		}
 
+		skb_shinfo(skb)->gso_type |= SKB_GSO_TOU;
+		skb_set_inner_ipproto(skb, sk->sk_protocol);
+
 		/* Changing ports and protocol to be routed */
 		fl6.fl6_sport = e->sport;
 		fl6.fl6_dport = e->dport;
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index ac858c4..b53486b 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -29,6 +29,8 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
 	u8 frag_hdr_sz = sizeof(struct frag_hdr);
 	__wsum csum;
 	int tnl_hlen;
+	const struct ipv6hdr *ipv6h;
+	struct udphdr *uh;
 
 	mss = skb_shinfo(skb)->gso_size;
 	if (unlikely(skb->len <= mss))
@@ -47,74 +49,76 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
 		goto out;
 	}
 
-	if (skb->encapsulation && skb_shinfo(skb)->gso_type &
-	    (SKB_GSO_UDP_TUNNEL|SKB_GSO_UDP_TUNNEL_CSUM))
-		segs = skb_udp_tunnel_segment(skb, features, true);
-	else {
-		const struct ipv6hdr *ipv6h;
-		struct udphdr *uh;
-
-		if (!pskb_may_pull(skb, sizeof(struct udphdr)))
+	if (skb->encapsulation) {
+		if (skb_shinfo(skb)->gso_type & SKB_GSO_TOU) {
+			segs = skb_udp_tou_segment(skb, features, true);
+			goto out;
+		} else if (skb_shinfo(skb)->gso_type &
+			   (SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM)) {
+			segs = skb_udp_tunnel_segment(skb, features, true);
 			goto out;
-
-		/* Do software UFO. Complete and fill in the UDP checksum as HW cannot
-		 * do checksum of UDP packets sent as multiple IP fragments.
-		 */
-
-		uh = udp_hdr(skb);
-		ipv6h = ipv6_hdr(skb);
-
-		uh->check = 0;
-		csum = skb_checksum(skb, 0, skb->len, 0);
-		uh->check = udp_v6_check(skb->len, &ipv6h->saddr,
-					  &ipv6h->daddr, csum);
-		if (uh->check == 0)
-			uh->check = CSUM_MANGLED_0;
-
-		skb->ip_summed = CHECKSUM_NONE;
-
-		/* If there is no outer header we can fake a checksum offload
-		 * due to the fact that we have already done the checksum in
-		 * software prior to segmenting the frame.
-		 */
-		if (!skb->encap_hdr_csum)
-			features |= NETIF_F_HW_CSUM;
-
-		/* Check if there is enough headroom to insert fragment header. */
-		tnl_hlen = skb_tnl_header_len(skb);
-		if (skb->mac_header < (tnl_hlen + frag_hdr_sz)) {
-			if (gso_pskb_expand_head(skb, tnl_hlen + frag_hdr_sz))
-				goto out;
 		}
+	}
 
-		/* Find the unfragmentable header and shift it left by frag_hdr_sz
-		 * bytes to insert fragment header.
-		 */
-		unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
-		nexthdr = *prevhdr;
-		*prevhdr = NEXTHDR_FRAGMENT;
-		unfrag_len = (skb_network_header(skb) - skb_mac_header(skb)) +
-			     unfrag_ip6hlen + tnl_hlen;
-		packet_start = (u8 *) skb->head + SKB_GSO_CB(skb)->mac_offset;
-		memmove(packet_start-frag_hdr_sz, packet_start, unfrag_len);
-
-		SKB_GSO_CB(skb)->mac_offset -= frag_hdr_sz;
-		skb->mac_header -= frag_hdr_sz;
-		skb->network_header -= frag_hdr_sz;
-
-		fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen);
-		fptr->nexthdr = nexthdr;
-		fptr->reserved = 0;
-		if (!skb_shinfo(skb)->ip6_frag_id)
-			ipv6_proxy_select_ident(dev_net(skb->dev), skb);
-		fptr->identification = skb_shinfo(skb)->ip6_frag_id;
+	if (!pskb_may_pull(skb, sizeof(struct udphdr)))
+		goto out;
 
-		/* Fragment the skb. ipv6 header and the remaining fields of the
-		 * fragment header are updated in ipv6_gso_segment()
-		 */
-		segs = skb_segment(skb, features);
+	/* Do software UFO. Complete and fill in the UDP checksum as HW cannot
+	 * do checksum of UDP packets sent as multiple IP fragments.
+	 */
+
+	uh = udp_hdr(skb);
+	ipv6h = ipv6_hdr(skb);
+
+	uh->check = 0;
+	csum = skb_checksum(skb, 0, skb->len, 0);
+	uh->check = udp_v6_check(skb->len, &ipv6h->saddr,
+				  &ipv6h->daddr, csum);
+	if (uh->check == 0)
+		uh->check = CSUM_MANGLED_0;
+
+	skb->ip_summed = CHECKSUM_NONE;
+
+	/* If there is no outer header we can fake a checksum offload
+	 * due to the fact that we have already done the checksum in
+	 * software prior to segmenting the frame.
+	 */
+	if (!skb->encap_hdr_csum)
+		features |= NETIF_F_HW_CSUM;
+
+	/* Check if there is enough headroom to insert fragment header. */
+	tnl_hlen = skb_tnl_header_len(skb);
+	if (skb->mac_header < (tnl_hlen + frag_hdr_sz)) {
+		if (gso_pskb_expand_head(skb, tnl_hlen + frag_hdr_sz))
+			goto out;
 	}
 
+	/* Find the unfragmentable header and shift it left by frag_hdr_sz
+	 * bytes to insert fragment header.
+	 */
+	unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
+	nexthdr = *prevhdr;
+	*prevhdr = NEXTHDR_FRAGMENT;
+	unfrag_len = (skb_network_header(skb) - skb_mac_header(skb)) +
+		     unfrag_ip6hlen + tnl_hlen;
+	packet_start = (u8 *)skb->head + SKB_GSO_CB(skb)->mac_offset;
+	memmove(packet_start - frag_hdr_sz, packet_start, unfrag_len);
+
+	SKB_GSO_CB(skb)->mac_offset -= frag_hdr_sz;
+	skb->mac_header -= frag_hdr_sz;
+	skb->network_header -= frag_hdr_sz;
+
+	fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen);
+	fptr->nexthdr = nexthdr;
+	fptr->reserved = 0;
+	if (!skb_shinfo(skb)->ip6_frag_id)
+		ipv6_proxy_select_ident(dev_net(skb->dev), skb);
+	fptr->identification = skb_shinfo(skb)->ip6_frag_id;
+
+	/* Fragment the skb. ipv6 header and the remaining fields of the
+	 * fragment header are updated in ipv6_gso_segment()
+	 */
+	segs = skb_segment(skb, features);
 out:
 	return segs;
 }
-- 
2.8.0.rc2
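
The length handling in __skb_udp_tou_segment above relies on the fact that
an Internet checksum can be patched incrementally: the old length is
subtracted from the running sum once ("partial"), and each segment's own
length is added back. Below is a minimal userspace sketch of that arithmetic.
It is an illustration only: it works on 16-bit words rather than the kernel's
32-bit __wsum, and fold16, oc_add, oc_sub, oc_sum and replace_len are
hypothetical names, not kernel functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* One's-complement addition of two 16-bit values. */
static uint16_t oc_add(uint16_t a, uint16_t b)
{
	return fold16((uint32_t)a + b);
}

/* One's-complement subtraction: add the bitwise complement. */
static uint16_t oc_sub(uint16_t a, uint16_t b)
{
	return oc_add(a, (uint16_t)~b);
}

/* One's-complement sum over n 16-bit words. */
static uint16_t oc_sum(const uint16_t *words, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += words[i];
	return fold16(sum);
}

/* Hypothetical helper: swap the length term inside a running sum. */
static uint16_t replace_len(uint16_t sum, uint16_t old_len, uint16_t new_len)
{
	/* Negate the old length once; the patch keeps this in "partial". */
	uint16_t partial = oc_sub(sum, old_len);

	/* Add the per-segment length, as done for each skb in the loop. */
	return oc_add(partial, new_len);
}
```

For nonzero inputs, replace_len() on the sum of a buffer should match
oc_sum() recomputed over the same buffer with the length word overwritten,
which is why the segmentation loop never has to walk the payload again just
to account for the new length.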

* Re: [RFC PATCH 3/7] ipv4: Support TOU
  2016-05-23 22:48 ` [RFC PATCH 3/7] ipv4: Support TOU Tom Herbert
@ 2016-05-24  3:16   ` Eric Dumazet
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2016-05-24  3:16 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev, kernel-team

On Mon, 2016-05-23 at 15:48 -0700, Tom Herbert wrote:

...

>  static int
>  ip_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
> @@ -382,11 +383,36 @@ int ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl)
>  	struct rtable *rt;
>  	struct iphdr *iph;
>  	int res;
> +	__be16 dport, sport;
> +	u8 protocol = sk->sk_protocol;
>  
>  	/* Skip all of this if the packet is already routed,
>  	 * f.e. by something like SCTP.
>  	 */
>  	rcu_read_lock();
> +
> +	if (inet->tou_encap) {
> +		struct ip_tunnel_encap *e = inet->tou_encap;

You seem to rely on RCU, in spirit at least.

But ...

1) You read inet->tou_encap twice, so the second time could read a NULL
and then crash on a NULL dereference.

2) No RCU grace period is respected in tou_encap_setsockopt(), so use
after free is possible.
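
A minimal userspace sketch of point 1, assuming C11 atomics as a stand-in
for the kernel primitives: the shared pointer is loaded exactly once into a
local, and only that local is tested and dereferenced. The types and
get_ports() are hypothetical. In the kernel the equivalent read would be a
single rcu_dereference() under rcu_read_lock(); point 2 (freeing the old
object only after a grace period, for example with kfree_rcu() or
synchronize_rcu()) is not modeled here.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ip_tunnel_encap. */
struct encap {
	uint16_t sport;
	uint16_t dport;
};

/* Hypothetical stand-in for the socket holding the shared pointer. */
struct sock_like {
	_Atomic(struct encap *) tou_encap;
};

/*
 * Load the pointer once, then use only the local copy. A concurrent
 * writer that clears sk->tou_encap after the load cannot turn the
 * dereference below into a NULL dereference.
 *
 * Returns 1 and fills the ports if an encap is set, 0 otherwise.
 */
static int get_ports(struct sock_like *sk, uint16_t *sport, uint16_t *dport)
{
	struct encap *e = atomic_load_explicit(&sk->tou_encap,
					       memory_order_acquire);

	if (!e)
		return 0;

	*sport = e->sport;
	*dport = e->dport;
	return 1;
}
```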

> +		const struct ip_tunnel_encap_ops *ops;
> +
> +		/* Transport layer protocol over UDP encapsulation */


> +		dport = e->dport;
> +		sport = e->sport;

* Re: [RFC PATCH 7/7] tou: Support for GSO
  2016-05-23 22:48 ` [RFC PATCH 7/7] tou: Support for GSO Tom Herbert
@ 2016-05-24 14:59   ` Alexander Duyck
  2016-05-24 17:07     ` Tom Herbert
  0 siblings, 1 reply; 15+ messages in thread
From: Alexander Duyck @ 2016-05-24 14:59 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, Netdev, Kernel Team

On Mon, May 23, 2016 at 3:48 PM, Tom Herbert <tom@herbertland.com> wrote:
> Add SKB_GSO_TOU. In udp[64]_ufo_fragment check for SKB_GSO_TOU. If this
> is set call skb_udp_tou_segment. skb_udp_tou_segment is very similar
> to skb_udp_tunnel_segment except that we only need to deal with the
> L4 headers.
>
> Signed-off-by: Tom Herbert <tom@herbertland.com>
> ---
>  include/linux/skbuff.h           |   2 +
>  include/net/udp.h                |   2 +
>  net/ipv4/fou.c                   |   2 +
>  net/ipv4/ip_output.c             |   2 +
>  net/ipv4/udp_offload.c           | 164 +++++++++++++++++++++++++++++++++++++--
>  net/ipv6/inet6_connection_sock.c |   3 +
>  net/ipv6/udp_offload.c           | 128 +++++++++++++++---------------
>  7 files changed, 236 insertions(+), 67 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 65968a9..b57e484 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -482,6 +482,8 @@ enum {
>         SKB_GSO_PARTIAL = 1 << 13,
>
>         SKB_GSO_TUNNEL_REMCSUM = 1 << 14,
> +
> +       SKB_GSO_TOU = 1 << 15,
>  };
>

So where do you add the netdev feature bit?  From what I can tell that
was overlooked and as a result devices that support FCoE CRC will end
up corrupting TOU frames because netif_gso_ok currently ands the two
together.
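
A minimal sketch of the aliasing described above, under stated assumptions:
the GSO check shifts the gso_type bits into the feature word and tests them
there, so a GSO type without its own feature bit lands on whatever feature
sits just past the GSO range. The EX_ constants are illustrative values, not
copied from any header. In particular, the shift of 16 and FCoE CRC at bit
31 are assumptions that should be checked against
include/linux/netdev_features.h in the tree in question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the first GSO feature bit (illustrative). */
#define EX_GSO_SHIFT		16

/* gso_type bits as they appear in the quoted enum. */
#define EX_GSO_TUNNEL_REMCSUM	(1u << 14)
#define EX_GSO_TOU		(1u << 15)

/* Assumed: FCoE CRC is the first feature after the GSO range. */
#define EX_F_FCOE_CRC		(1ull << 31)

/* Same shape as the check: shift gso_type into feature space, then mask. */
static bool ex_gso_ok(uint64_t features, uint32_t gso_type)
{
	uint64_t feature = (uint64_t)gso_type << EX_GSO_SHIFT;

	return (features & feature) == feature;
}
```

With these values, ex_gso_ok(EX_F_FCOE_CRC, EX_GSO_TOU) is true even though
the device never advertised TOU segmentation, while a device with no
features set correctly fails the check.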

Also I am pretty sure we can offload this on the Intel NICs using the
GSO partial approach as we can just stuff the UDP header into the
space that we would use for IPv4 options or IPv6 extension headers and
it shouldn't complain.

- Alex

* Re: [RFC PATCH 7/7] tou: Support for GSO
  2016-05-24 14:59   ` Alexander Duyck
@ 2016-05-24 17:07     ` Tom Herbert
  0 siblings, 0 replies; 15+ messages in thread
From: Tom Herbert @ 2016-05-24 17:07 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: David Miller, Netdev, Kernel Team

On Tue, May 24, 2016 at 7:59 AM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Mon, May 23, 2016 at 3:48 PM, Tom Herbert <tom@herbertland.com> wrote:
>> Add SKB_GSO_TOU. In udp[64]_ufo_fragment check for SKB_GSO_TOU. If this
>> is set call skb_udp_tou_segment. skb_udp_tou_segment is very similar
>> to skb_udp_tunnel_segment except that we only need to deal with the
>> L4 headers.
>>
>> Signed-off-by: Tom Herbert <tom@herbertland.com>
>> ---
>>  include/linux/skbuff.h           |   2 +
>>  include/net/udp.h                |   2 +
>>  net/ipv4/fou.c                   |   2 +
>>  net/ipv4/ip_output.c             |   2 +
>>  net/ipv4/udp_offload.c           | 164 +++++++++++++++++++++++++++++++++++++--
>>  net/ipv6/inet6_connection_sock.c |   3 +
>>  net/ipv6/udp_offload.c           | 128 +++++++++++++++---------------
>>  7 files changed, 236 insertions(+), 67 deletions(-)
>>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index 65968a9..b57e484 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -482,6 +482,8 @@ enum {
>>         SKB_GSO_PARTIAL = 1 << 13,
>>
>>         SKB_GSO_TUNNEL_REMCSUM = 1 << 14,
>> +
>> +       SKB_GSO_TOU = 1 << 15,
>>  };
>>
>
> So where do you add the netdev feature bit?  From what I can tell that
> was overlooked and as a result devices that support FCoE CRC will end
> up corrupting TOU frames because netif_gso_ok currently ands the two
> together.
>
An obvious omission, thanks for pointing it out.

> Also I am pretty sure we can offload this on the Intel NICs using the
> GSO partial approach as we can just stuff the UDP header into the
> space that we would use for IPv4 options or IPv6 extension headers and
> it shouldn't complain.
>
That would be cool!

> - Alex

* Re: [RFC PATCH 1/7] fou: Get net from sock_net if dev_net unavailable
  2016-05-23 22:48 ` [RFC PATCH 1/7] fou: Get net from sock_net if dev_net unavailable Tom Herbert
@ 2016-05-24 22:01   ` David Miller
  2016-05-24 22:40     ` Tom Herbert
  0 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2016-05-24 22:01 UTC (permalink / raw)
  To: tom; +Cc: netdev, kernel-team

From: Tom Herbert <tom@herbertland.com>
Date: Mon, 23 May 2016 15:48:20 -0700

> diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
> index 5f9207c..96260c6 100644
> --- a/net/ipv4/fou.c
> +++ b/net/ipv4/fou.c
> @@ -807,13 +807,20 @@ int __fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
>  		       u8 *protocol, __be16 *sport, int type)
>  {
>  	int err;
> +	struct net *net;
>  

Please order local variables from longest to shortest line.

>  	err = iptunnel_handle_offloads(skb, type);
>  	if (err)
>  		return err;
>  
> -	*sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev),
> -						skb, 0, 0, false);
> +	if (skb->dev)
> +		net = dev_net(skb->dev);
> +	else if (skb->sk)
> +		net = sock_net(skb->sk);

This is getting ridiculous.  Why not just put the net namespace pointer into
the tunnel encap object?

* Re: [RFC PATCH 2/7] tou: Base infrastructure for Transport over UDP
  2016-05-23 22:48 ` [RFC PATCH 2/7] tou: Base infrastructure for Transport over UDP Tom Herbert
@ 2016-05-24 22:02   ` David Miller
  0 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2016-05-24 22:02 UTC (permalink / raw)
  To: tom; +Cc: netdev, kernel-team

From: Tom Herbert <tom@herbertland.com>
Date: Mon, 23 May 2016 15:48:21 -0700

> +int tou_encap_setsockopt(struct sock *sk, char __user *optval, int optlen,
> +			 bool is_ipv6)
> +{
> +	struct tou_encap te;
> +	struct ip_tunnel_encap encap;
> +	struct inet_sock *inet = inet_sk(sk);
> +	struct ip_tunnel_encap *e = inet->tou_encap;

This doesn't compile, because you don't add the tou_encap member to inet_sock
until patch #3.

* Re: [RFC PATCH 1/7] fou: Get net from sock_net if dev_net unavailable
  2016-05-24 22:01   ` David Miller
@ 2016-05-24 22:40     ` Tom Herbert
  0 siblings, 0 replies; 15+ messages in thread
From: Tom Herbert @ 2016-05-24 22:40 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Kernel Network Developers, Kernel Team

On Tue, May 24, 2016 at 3:01 PM, David Miller <davem@davemloft.net> wrote:
> From: Tom Herbert <tom@herbertland.com>
> Date: Mon, 23 May 2016 15:48:20 -0700
>
>> diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
>> index 5f9207c..96260c6 100644
>> --- a/net/ipv4/fou.c
>> +++ b/net/ipv4/fou.c
>> @@ -807,13 +807,20 @@ int __fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
>>                      u8 *protocol, __be16 *sport, int type)
>>  {
>>       int err;
>> +     struct net *net;
>>
>
> Please order local variables from longest to shortest line.
>
>>       err = iptunnel_handle_offloads(skb, type);
>>       if (err)
>>               return err;
>>
>> -     *sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev),
>> -                                             skb, 0, 0, false);
>> +     if (skb->dev)
>> +             net = dev_net(skb->dev);
>> +     else if (skb->sk)
>> +             net = sock_net(skb->sk);
>
> This is getting ridiculous.  Why not just put the net namespace pointer into
> the tunnel encap object?

Thanks, in this case it will be easier to just pass net in as an argument.

Tom

* Re: [RFC PATCH 0/7] tou: Transports over UDP - part I
  2016-05-23 22:48 [RFC PATCH 0/7] tou: Transports over UDP - part I Tom Herbert
                   ` (6 preceding siblings ...)
  2016-05-23 22:48 ` [RFC PATCH 7/7] tou: Support for GSO Tom Herbert
@ 2016-05-26 15:49 ` Alex Elsayed
  7 siblings, 0 replies; 15+ messages in thread
From: Alex Elsayed @ 2016-05-26 15:49 UTC (permalink / raw)
  To: netdev

Tom Herbert <tom <at> herbertland.com> writes:

> 
> Transports over UDP is intended to encapsulate TCP and other transport
> protocols directly and securely in UDP.
> 
> The goal of this work is twofold:
> 
> 1) Allow applications to run their own transport layer stack (i.e.from
>    userspace). This eliminates dependencies on the OS (e.g. solves a
>    major dependency issue for Facebook on clients).
> 
> 2) Make transport layer headers (all of L4) invisible to the network
>    so that they can't do intrusive actions at L4. This will be enforced
>    with DTLS in use.

Just popping in to note that this has significant similarities with the
DeDiS group's Tng project[1], which takes the approach of splitting the
"transport layer" into four sub-layers:

1.) Endpoint (What port?)
2.) Flow (Congestion control)
3.) Isolation (Integrity, confidentiality, and preventing middlebox mangling)
4.) Semantic (End-to-end guarantees, fate-sharing, stream vs. dgram vs.
seqpacket, etc)

[1] http://dedis.cs.yale.edu/2009/tng/

Thread overview: 15+ messages
2016-05-23 22:48 [RFC PATCH 0/7] tou: Transports over UDP - part I Tom Herbert
2016-05-23 22:48 ` [RFC PATCH 1/7] fou: Get net from sock_net if dev_net unavailable Tom Herbert
2016-05-24 22:01   ` David Miller
2016-05-24 22:40     ` Tom Herbert
2016-05-23 22:48 ` [RFC PATCH 2/7] tou: Base infrastructure for Transport over UDP Tom Herbert
2016-05-24 22:02   ` David Miller
2016-05-23 22:48 ` [RFC PATCH 3/7] ipv4: Support TOU Tom Herbert
2016-05-24  3:16   ` Eric Dumazet
2016-05-23 22:48 ` [RFC PATCH 4/7] tcp: Support for TOU Tom Herbert
2016-05-23 22:48 ` [RFC PATCH 5/7] ipv6: Support TOU Tom Herbert
2016-05-23 22:48 ` [RFC PATCH 6/7] tcp6: Support for TOU Tom Herbert
2016-05-23 22:48 ` [RFC PATCH 7/7] tou: Support for GSO Tom Herbert
2016-05-24 14:59   ` Alexander Duyck
2016-05-24 17:07     ` Tom Herbert
2016-05-26 15:49 ` [RFC PATCH 0/7] tou: Transports over UDP - part I Alex Elsayed
