* [PATCH v2 net-next 0/7] net: foo-over-udp (fou)
@ 2014-09-15  3:07 Tom Herbert
  2014-09-15  3:07 ` [PATCH v2 net-next 1/7] net: Export inet_offloads and inet6_offloads Tom Herbert
                   ` (7 more replies)
  0 siblings, 8 replies; 33+ messages in thread
From: Tom Herbert @ 2014-09-15  3:07 UTC (permalink / raw)
  To: davem, netdev

This patch series implements foo-over-udp (FOU). The idea is that we
can encapsulate different IP protocols in UDP packets. The rationale is
that networking devices such as NICs and switches are usually
implemented with UDP (and TCP) specific mechanisms for processing. For
instance, many switches and routers implement a 5-tuple hash for UDP
packets to perform Equal Cost Multipath routing (ECMP), and NICs do the
same for RSS. Many NICs also provide only rudimentary checksum offload
(basic TCP and UDP packets); with foo-over-udp we may be able to
leverage these NICs to offload checksums of tunneled packets (using
checksum unnecessary conversion and eventually remote checksum
offload).
 
An example encapsulation of IPIP over FOU is diagrammed below. As
illustrated, the packet overhead for FOU is the 8 byte UDP header.

+------------------+
|    IPv4 hdr      |
+------------------+
|     UDP hdr      |
+------------------+
|    IPv4 hdr      |
+------------------+
|     TCP hdr      |
+------------------+
|   TCP payload    |
+------------------+

Conceptually, FOU should be able to encapsulate any IP protocol.
The FOU header (the UDP header) is essentially inserted between the IP
header and the transport header, so in the case of TCP or UDP
encapsulation the inner transport checksum's pseudo-header would be
based on the outer IP header, and its length field must not include
the UDP (FOU) header.
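
For example, if a TCP segment were carried directly in FOU, its
checksum pseudo-header would use the outer IPv4 source and destination
addresses, protocol 6 (TCP), and a length equal to the outer UDP
length minus the 8-byte UDP header.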

* Receive

In this patch set the RX path for FOU is implemented in a new fou
module. To enable FOU for a particular protocol, a UDP FOU socket is
opened on the port that is to receive FOU packets, and the socket is
mapped to the IP protocol for those packets. The XFRM mechanism
(udp_encap_rcv) is used to receive encapsulated packets on that port.
Upon reception, the UDP header is removed and the packet is reinjected
into the stack for the protocol associated with the socket (by
returning -protocol from the udp_encap_rcv function).
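
A minimal sketch of this encap_rcv convention, condensed from the fou
module in patch 2 (struct fou is defined there; error handling
trimmed):

  /* Strip the 8-byte UDP header and tell the UDP stack to re-dispatch
   * the packet as IP protocol 'fou->protocol' by returning a negative
   * value from the encap_rcv callback.
   */
  static int fou_udp_recv(struct sock *sk, struct sk_buff *skb)
  {
          struct fou *fou = sk->sk_user_data;  /* set in fou_create() */
          struct iphdr *iph = ip_hdr(skb);
          size_t len = sizeof(struct udphdr);

          if (!fou)
                  return 1;       /* not a FOU socket; normal UDP receive */

          iph->tot_len = htons(ntohs(iph->tot_len) - len);
          __skb_pull(skb, len);
          skb_postpull_rcsum(skb, udp_hdr(skb), len);
          skb_reset_transport_header(skb);

          return -fou->protocol;  /* reinject as this IP protocol */
  }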

GRO is provided with the appropriate fou_gro_receive and
fou_gro_complete routines. These need to know the encapsulation
protocol, so we save it in the udp_offloads structure along with the
port and pass it to the FOU GRO routines in the napi_gro_cb structure.

* TX

This patch series implements FOU transmit encapsulation for IPIP, GRE,
and SIT. This is done with common infrastructure in ip_tunnel,
including an ip_tunnel_encap function to perform FOU encapsulation and
common configuration to enable FOU on IP tunnels. FOU is configured on
existing tunnels and does not create any new interfaces. The transmit
and receive paths are independent, so use of FOU may be asymmetric
between tunnel endpoints.

* Configuration

The fou module uses netlink to configure FOU receive ports. The ip
command can be augmented with a fou subcommand to support this, e.g.
to configure FOU for IPIP on port 5555:

  ip fou add port 5555 ipproto 4
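
The same configuration can also be done directly over generic netlink.
A user-space sketch using libnl-3 (libnl-3 and an installed
linux/fou.h header are assumptions of this example; error handling is
omitted, and CAP_NET_ADMIN is required):

  #include <stdint.h>
  #include <arpa/inet.h>
  #include <netlink/netlink.h>
  #include <netlink/msg.h>
  #include <netlink/attr.h>
  #include <netlink/genl/genl.h>
  #include <netlink/genl/ctrl.h>
  #include <linux/fou.h>

  /* Add a FOU receive port: equivalent to "ip fou add port <port>
   * ipproto <ipproto>".  Attribute and command names come from the
   * uapi header added in patch 2.
   */
  static int fou_add_port(uint16_t port, uint8_t ipproto)
  {
          struct nl_sock *sk = nl_socket_alloc();
          struct nl_msg *msg = nlmsg_alloc();
          int family, err;

          genl_connect(sk);
          family = genl_ctrl_resolve(sk, FOU_GENL_NAME);

          genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family, 0, 0,
                      FOU_CMD_ADD, FOU_GENL_VERSION);
          nla_put_u16(msg, FOU_ATTR_PORT, htons(port)); /* network order */
          nla_put_u8(msg, FOU_ATTR_IPPROTO, ipproto);   /* e.g. 4 = IPIP */

          err = nl_send_auto(sk, msg);
          nlmsg_free(msg);
          nl_socket_free(sk);
          return err < 0 ? err : 0;
  }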

GRE, IPIP, and SIT have been modified with netlink commands to
configure use of FOU on transmit. The "ip link" command will be
augmented with an encap subcommand (for supporting various forms of
secondary encapsulation). For instance, to configure an ipip tunnel
with FOU on port 5555:

  ip link add name tun1 type ipip \
    remote 192.168.1.1 local 192.168.1.2 ttl 225 \
    encap fou encap-sport auto encap-dport 5555

* Notes
  - This patch set does not implement GSO for FOU. The UDP encapsulation
    code assumes TEB, so that will need to be reimplemented.
  - When a packet is received through FOU, the UDP header is not
    actually removed from the skbuff; the transport header pointer and
    the length field in the IP header are updated (as in ESP/UDP RX).
    A side effect is that the IP header will appear to an external
    observer (e.g. tcpdump) to have an incorrect checksum; it will be
    off by the size of the UDP header. If necessary we could adjust
    the checksum to compensate.
  - Performance results are below. My expectation is that FOU should
    entail little overhead (clearly there is some work to do :-) ).
    Optimizing UDP socket lookup for encapsulation ports should help
    significantly.
  - I really don't expect/want devices to have special support for any
    of this. Generic checksum offload mechanisms (NETIF_F_HW_CSUM
    and use of CHECKSUM_COMPLETE) should be sufficient. RSS and flow
    steering are provided by commonly implemented UDP hashing. GRO/GSO
    seem fairly comparable with LRO/TSO already.

* Performance

I ran netperf TCP_RR and TCP_STREAM tests across various
configurations. This was performed on bnx2x, and I disabled TSO/GSO on
the sender to get a fair comparison of FOU versus non-FOU. CPU
utilization is reported for the receiver in TCP_STREAM.

  GRE
    IPv4, FOU, UDP checksum enabled
      TCP_STREAM
        24.85% CPU utilization
        9310.6 Mbps
      TCP_RR
        94.2% CPU utilization
        155/249/460 90/95/99% latencies
        1.17018e+06 tps
    IPv4, FOU, UDP checksum disabled
      TCP_STREAM
        31.04% CPU utilization
        9302.22 Mbps
      TCP_RR
        94.13% CPU utilization
        154/239/419 90/95/99% latencies
        1.17555e+06 tps
    IPv4, no FOU
      TCP_STREAM
        23.13% CPU utilization
        9354.58 Mbps
      TCP_RR
        90.24% CPU utilization
        156/228/360 90/95/99% latencies
        1.18169e+06 tps

  IPIP
    FOU, UDP checksum enabled
      TCP_STREAM
        24.13% CPU utilization
        9328 Mbps
      TCP_RR
        94.23% CPU utilization
        149/237/429 90/95/99% latencies
        1.19553e+06 tps
    FOU, UDP checksum disabled
      TCP_STREAM
        29.13% CPU utilization
        9370.25 Mbps
      TCP_RR
        94.13% CPU utilization
        149/232/398 90/95/99% latencies
        1.19225e+06 tps
    No FOU
      TCP_STREAM
        10.43% CPU utilization
        5302.03 Mbps
      TCP_RR
        51.53% CPU utilization
        215/324/475 90/95/99% latencies
        864998 tps

  SIT
    FOU, UDP checksum enabled
      TCP_STREAM
        30.38% CPU utilization
        9176.76 Mbps
      TCP_RR
        96.9% CPU utilization
        170/281/581 90/95/99% latencies
        1.03372e+06 tps
    FOU, UDP checksum disabled
      TCP_STREAM
        39.6% CPU utilization
        9176.57 Mbps
      TCP_RR
        97.14% CPU utilization
        167/272/548 90/95/99% latencies
        1.03203e+06 tps
    No FOU
      TCP_STREAM
        11.2% CPU utilization
        4636.05 Mbps
      TCP_RR
        59.51% CPU utilization
        232/346/489 90/95/99% latencies
        813199 tps

v2:
  - Removed encap IP tunnel ioctls, configuration is done by netlink
    only.
  - Don't export fou_create and fou_destroy; they are currently
    intended to be called only within the fou module.
  - Filled in tunnel netlink structures and functions for the new
    values.

Tom Herbert (7):
  net: Export inet_offloads and inet6_offloads
  fou: Support for foo-over-udp RX path
  fou: Add GRO support
  net: Changes to ip_tunnel to support foo-over-udp encapsulation
  sit: TX path for sit/UDP foo-over-udp encapsulation
  ipip: TX path for IPIP/UDP foo-over-udp encapsulation
  gre: TX path for GRE/UDP foo-over-udp encapsulation

 include/linux/netdevice.h      |   3 +-
 include/net/ip_tunnels.h       |  19 ++-
 include/uapi/linux/fou.h       |  32 ++++
 include/uapi/linux/if_tunnel.h |  16 ++
 net/ipv4/Kconfig               |  10 ++
 net/ipv4/Makefile              |   1 +
 net/ipv4/fou.c                 | 369 +++++++++++++++++++++++++++++++++++++++++
 net/ipv4/ip_gre.c              |  98 ++++++++++-
 net/ipv4/ip_tunnel.c           |  91 +++++++++-
 net/ipv4/ipip.c                |  86 +++++++++-
 net/ipv4/protocol.c            |   1 +
 net/ipv4/udp_offload.c         |   5 +-
 net/ipv6/protocol.c            |   1 +
 net/ipv6/sit.c                 | 107 ++++++++++--
 14 files changed, 815 insertions(+), 24 deletions(-)
 create mode 100644 include/uapi/linux/fou.h
 create mode 100644 net/ipv4/fou.c

-- 
2.1.0.rc2.206.gedb03e5

* [PATCH v2 net-next 1/7] net: Export inet_offloads and inet6_offloads
  2014-09-15  3:07 [PATCH v2 net-next 0/7] net: foo-over-udp (fou) Tom Herbert
@ 2014-09-15  3:07 ` Tom Herbert
  2014-09-15 13:33   ` Or Gerlitz
  2014-09-15 18:21   ` David Miller
  2014-09-15  3:07 ` [PATCH v2 net-next 2/7] fou: Support for foo-over-udp RX path Tom Herbert
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 33+ messages in thread
From: Tom Herbert @ 2014-09-15  3:07 UTC (permalink / raw)
  To: davem, netdev

Want to be able to call this in foo-over-udp offloads, etc.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/ipv4/protocol.c | 1 +
 net/ipv6/protocol.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/net/ipv4/protocol.c b/net/ipv4/protocol.c
index 46d6a1c..4b7c0ec 100644
--- a/net/ipv4/protocol.c
+++ b/net/ipv4/protocol.c
@@ -30,6 +30,7 @@
 
 const struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS] __read_mostly;
 const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS] __read_mostly;
+EXPORT_SYMBOL(inet_offloads);
 
 int inet_add_protocol(const struct net_protocol *prot, unsigned char protocol)
 {
diff --git a/net/ipv6/protocol.c b/net/ipv6/protocol.c
index e048cf1..e3770ab 100644
--- a/net/ipv6/protocol.c
+++ b/net/ipv6/protocol.c
@@ -51,6 +51,7 @@ EXPORT_SYMBOL(inet6_del_protocol);
 #endif
 
 const struct net_offload __rcu *inet6_offloads[MAX_INET_PROTOS] __read_mostly;
+EXPORT_SYMBOL(inet6_offloads);
 
 int inet6_add_offload(const struct net_offload *prot, unsigned char protocol)
 {
-- 
2.1.0.rc2.206.gedb03e5

* [PATCH v2 net-next 2/7] fou: Support for foo-over-udp RX path
  2014-09-15  3:07 [PATCH v2 net-next 0/7] net: foo-over-udp (fou) Tom Herbert
  2014-09-15  3:07 ` [PATCH v2 net-next 1/7] net: Export inet_offloads and inet6_offloads Tom Herbert
@ 2014-09-15  3:07 ` Tom Herbert
  2014-09-15  3:07 ` [PATCH v2 net-next 3/7] fou: Add GRO support Tom Herbert
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 33+ messages in thread
From: Tom Herbert @ 2014-09-15  3:07 UTC (permalink / raw)
  To: davem, netdev

This patch provides a receive path for foo-over-udp. This allows
direct encapsulation of IP protocols over UDP. The bound destination
port is used to map to an IP protocol, and the XFRM framework
(udp_encap_rcv) is used to receive encapsulated packets. Upon
reception, the encapsulation header is logically removed (pointer
to transport header is advanced) and the packet is reinjected into
the receive path with the IP protocol indicated by the mapping.

Netlink is used to configure FOU ports. The configuration information
includes the port number to bind to and the IP protocol corresponding
to that port.

This should support GRE/UDP
(http://tools.ietf.org/html/draft-yong-tsvwg-gre-in-udp-encap-02),
as well as the other IP tunneling protocols (IPIP, SIT).

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/uapi/linux/fou.h |  32 ++++++
 net/ipv4/Kconfig         |  10 ++
 net/ipv4/Makefile        |   1 +
 net/ipv4/fou.c           | 279 +++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 322 insertions(+)
 create mode 100644 include/uapi/linux/fou.h
 create mode 100644 net/ipv4/fou.c

diff --git a/include/uapi/linux/fou.h b/include/uapi/linux/fou.h
new file mode 100644
index 0000000..e03376d
--- /dev/null
+++ b/include/uapi/linux/fou.h
@@ -0,0 +1,32 @@
+/* fou.h - FOU Interface */
+
+#ifndef _UAPI_LINUX_FOU_H
+#define _UAPI_LINUX_FOU_H
+
+/* NETLINK_GENERIC related info
+ */
+#define FOU_GENL_NAME		"fou"
+#define FOU_GENL_VERSION	0x1
+
+enum {
+	FOU_ATTR_UNSPEC,
+	FOU_ATTR_PORT,				/* u16 */
+	FOU_ATTR_AF,				/* u8 */
+	FOU_ATTR_IPPROTO,			/* u8 */
+
+	__FOU_ATTR_MAX,
+};
+
+#define FOU_ATTR_MAX		(__FOU_ATTR_MAX - 1)
+
+enum {
+	FOU_CMD_UNSPEC,
+	FOU_CMD_ADD,
+	FOU_CMD_DEL,
+
+	__FOU_CMD_MAX,
+};
+
+#define FOU_CMD_MAX	(__FOU_CMD_MAX - 1)
+
+#endif /* _UAPI_LINUX_FOU_H */
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index dbc10d8..84f710b 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -311,6 +311,16 @@ config NET_UDP_TUNNEL
 	tristate
 	default n
 
+config NET_FOU
+	tristate "IP: Foo (IP protocols) over UDP"
+	select XFRM
+	select NET_UDP_TUNNEL
+	---help---
+	  Foo over UDP allows any IP protocol to be directly encapsulated
+	  over UDP include tunnels (IPIP, GRE, SIT). By encapsulating in UDP
+	  network mechanisms and optimizations for UDP (such as ECMP
+	  and RSS) can be leveraged to provide better service.
+
 config INET_AH
 	tristate "IP: AH transformation"
 	select XFRM_ALGO
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index 8ee1cd4..d78d404 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -20,6 +20,7 @@ obj-$(CONFIG_IP_MULTIPLE_TABLES) += fib_rules.o
 obj-$(CONFIG_IP_MROUTE) += ipmr.o
 obj-$(CONFIG_NET_IPIP) += ipip.o
 gre-y := gre_demux.o
+obj-$(CONFIG_NET_FOU) += fou.o
 obj-$(CONFIG_NET_IPGRE_DEMUX) += gre.o
 obj-$(CONFIG_NET_IPGRE) += ip_gre.o
 obj-$(CONFIG_NET_UDP_TUNNEL) += udp_tunnel.o
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
new file mode 100644
index 0000000..de2af74
--- /dev/null
+++ b/net/ipv4/fou.c
@@ -0,0 +1,279 @@
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/socket.h>
+#include <linux/skbuff.h>
+#include <linux/ip.h>
+#include <linux/udp.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <net/genetlink.h>
+#include <net/ip.h>
+#include <net/udp.h>
+#include <net/udp_tunnel.h>
+#include <net/xfrm.h>
+#include <uapi/linux/fou.h>
+#include <uapi/linux/genetlink.h>
+
+static DEFINE_SPINLOCK(fou_lock);
+static LIST_HEAD(fou_list);
+
+struct fou {
+	struct socket *sock;
+	u8 protocol;
+	u16 port;
+	struct list_head list;
+};
+
+struct fou_cfg {
+	u8 protocol;
+	struct udp_port_cfg udp_config;
+};
+
+static inline struct fou *fou_from_sock(struct sock *sk)
+{
+	return (struct fou *)sk->sk_user_data;
+}
+
+static int fou_udp_encap_recv_deliver(struct sk_buff *skb,
+				      u8 protocol, size_t len)
+{
+	struct iphdr *iph = ip_hdr(skb);
+
+	/* Remove 'len' bytes from the packet (UDP header and
+	 * FOU header if present), modify the protocol to the one
+	 * we found, and then call rcv_encap.
+	 */
+	iph->tot_len = htons(ntohs(iph->tot_len) - len);
+	__skb_pull(skb, len);
+	skb_postpull_rcsum(skb, udp_hdr(skb), len);
+	skb_reset_transport_header(skb);
+
+	return -protocol;
+}
+
+static int fou_udp_recv(struct sock *sk, struct sk_buff *skb)
+{
+	struct fou *fou = fou_from_sock(sk);
+
+	if (!fou)
+		return 1;
+
+	return fou_udp_encap_recv_deliver(skb, fou->protocol,
+					  sizeof(struct udphdr));
+}
+
+static int fou_add_to_port_list(struct fou *fou)
+{
+	struct fou *fout;
+
+	spin_lock(&fou_lock);
+	list_for_each_entry(fout, &fou_list, list) {
+		if (fou->port == fout->port) {
+			spin_unlock(&fou_lock);
+			return -EALREADY;
+		}
+	}
+
+	list_add(&fou->list, &fou_list);
+	spin_unlock(&fou_lock);
+
+	return 0;
+}
+
+static void fou_release(struct fou *fou)
+{
+	struct socket *sock = fou->sock;
+	struct sock *sk = sock->sk;
+
+	udp_del_offload(&fou->udp_offloads);
+
+	list_del(&fou->list);
+
+	/* Remove hooks into tunnel socket */
+	sk->sk_user_data = NULL;
+
+	sock_release(sock);
+
+	kfree(fou);
+}
+
+static int fou_create(struct net *net, struct fou_cfg *cfg,
+		      struct socket **sockp)
+{
+	struct fou *fou = NULL;
+	int err;
+	struct socket *sock = NULL;
+	struct sock *sk;
+
+	/* Open UDP socket */
+	err = udp_sock_create(net, &cfg->udp_config, &sock);
+	if (err < 0)
+		goto error;
+
+	sk = sock->sk;
+
+	/* Allocate FOU port structure */
+	fou = kzalloc(sizeof(*fou), GFP_KERNEL);
+	if (!fou) {
+		err = -ENOMEM;
+		goto error;
+	}
+
+	/* Mark socket as an encapsulation socket. See net/ipv4/udp.c */
+	fou->protocol = cfg->protocol;
+	fou->port =  cfg->udp_config.local_udp_port;
+	udp_sk(sk)->encap_rcv = fou_udp_recv;
+
+	udp_sk(sk)->encap_type = 1;
+	udp_encap_enable();
+
+	sk->sk_user_data = fou;
+	fou->sock = sock;
+
+	udp_set_convert_csum(sock->sk, true);
+
+	sk->sk_allocation = GFP_ATOMIC;
+
+	err = fou_add_to_port_list(fou);
+	if (err)
+		goto error;
+
+	if (sockp)
+		*sockp = sock;
+
+	return 0;
+
+error:
+	kfree(fou);
+	if (sock)
+		sock_release(sock);
+
+	return err;
+}
+
+static int fou_destroy(struct net *net, struct fou_cfg *cfg)
+{
+	struct fou *fou;
+	u16 port = htons(cfg->udp_config.local_udp_port);
+	int err = -EINVAL;
+
+	spin_lock(&fou_lock);
+	list_for_each_entry(fou, &fou_list, list) {
+		if (fou->port == port) {
+			fou_release(fou);
+			err = 0;
+			break;
+		}
+	}
+	spin_unlock(&fou_lock);
+
+	return err;
+}
+
+static struct genl_family fou_nl_family = {
+	.id		= GENL_ID_GENERATE,
+	.hdrsize	= 0,
+	.name		= FOU_GENL_NAME,
+	.version	= FOU_GENL_VERSION,
+	.maxattr	= FOU_ATTR_MAX,
+	.netnsok	= true,
+};
+
+static struct nla_policy fou_nl_policy[FOU_ATTR_MAX + 1] = {
+	[FOU_ATTR_PORT] = { .type = NLA_U16, },
+	[FOU_ATTR_AF] = { .type = NLA_U8, },
+	[FOU_ATTR_IPPROTO] = { .type = NLA_U8, },
+};
+
+static int parse_nl_config(struct genl_info *info,
+			   struct fou_cfg *cfg)
+{
+	memset(cfg, 0, sizeof(*cfg));
+
+	cfg->udp_config.family = AF_INET;
+
+	if (info->attrs[FOU_ATTR_AF]) {
+		u8 family = nla_get_u8(info->attrs[FOU_ATTR_AF]);
+
+		if (family != AF_INET && family != AF_INET6)
+			return -EINVAL;
+
+		cfg->udp_config.family = family;
+	}
+
+	if (info->attrs[FOU_ATTR_PORT]) {
+		u16 port = nla_get_u16(info->attrs[FOU_ATTR_PORT]);
+
+		cfg->udp_config.local_udp_port = port;
+	}
+
+	if (info->attrs[FOU_ATTR_IPPROTO])
+		cfg->protocol = nla_get_u8(info->attrs[FOU_ATTR_IPPROTO]);
+
+	return 0;
+}
+
+static int fou_nl_cmd_add_port(struct sk_buff *skb, struct genl_info *info)
+{
+	struct fou_cfg cfg;
+	int err;
+
+	err = parse_nl_config(info, &cfg);
+	if (err)
+		return err;
+
+	return fou_create(&init_net, &cfg, NULL);
+}
+
+static int fou_nl_cmd_rm_port(struct sk_buff *skb, struct genl_info *info)
+{
+	struct fou_cfg cfg;
+
+	parse_nl_config(info, &cfg);
+
+	return fou_destroy(&init_net, &cfg);
+}
+
+static const struct genl_ops fou_nl_ops[] = {
+	{
+		.cmd = FOU_CMD_ADD,
+		.doit = fou_nl_cmd_add_port,
+		.policy = fou_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+	{
+		.cmd = FOU_CMD_DEL,
+		.doit = fou_nl_cmd_rm_port,
+		.policy = fou_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+	},
+};
+
+static int __init fou_init(void)
+{
+	int ret;
+
+	ret = genl_register_family_with_ops(&fou_nl_family,
+					    fou_nl_ops);
+
+	return ret;
+}
+
+static void __exit fou_fini(void)
+{
+	struct fou *fou, *next;
+
+	genl_unregister_family(&fou_nl_family);
+
+	/* Close all the FOU sockets */
+
+	spin_lock(&fou_lock);
+	list_for_each_entry_safe(fou, next, &fou_list, list)
+		fou_release(fou);
+	spin_unlock(&fou_lock);
+}
+
+module_init(fou_init);
+module_exit(fou_fini);
+MODULE_AUTHOR("Tom Herbert <therbert@google.com>");
+MODULE_LICENSE("GPL");
-- 
2.1.0.rc2.206.gedb03e5

* [PATCH v2 net-next 3/7] fou: Add GRO support
  2014-09-15  3:07 [PATCH v2 net-next 0/7] net: foo-over-udp (fou) Tom Herbert
  2014-09-15  3:07 ` [PATCH v2 net-next 1/7] net: Export inet_offloads and inet6_offloads Tom Herbert
  2014-09-15  3:07 ` [PATCH v2 net-next 2/7] fou: Support for foo-over-udp RX path Tom Herbert
@ 2014-09-15  3:07 ` Tom Herbert
  2014-09-15 15:00   ` Or Gerlitz
  2014-09-15 18:22   ` David Miller
  2014-09-15  3:08 ` [PATCH v2 net-next 4/7] net: Changes to ip_tunnel to support foo-over-udp encapsulation Tom Herbert
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 33+ messages in thread
From: Tom Herbert @ 2014-09-15  3:07 UTC (permalink / raw)
  To: davem, netdev

Implement fou_gro_receive and fou_gro_complete, and populate these
in the corresponding udp_offloads for the socket. Add ipproto to
udp_offloads and pass it from UDP to the fou GRO routines in the proto
field of the napi_gro_cb structure.

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/linux/netdevice.h |  3 +-
 net/ipv4/fou.c            | 90 +++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/udp_offload.c    |  5 ++-
 3 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f9e81d1..d380574 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1874,7 +1874,7 @@ struct napi_gro_cb {
 	/* jiffies when first packet was created/queued */
 	unsigned long age;
 
-	/* Used in ipv6_gro_receive() */
+	/* Used in ipv6_gro_receive() and foo-over-udp */
 	u16	proto;
 
 	/* Used in udp_gro_receive */
@@ -1925,6 +1925,7 @@ struct packet_offload {
 
 struct udp_offload {
 	__be16			 port;
+	u8			 ipproto;
 	struct offload_callbacks callbacks;
 };
 
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index de2af74..2ddd3b8 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -8,6 +8,7 @@
 #include <linux/kernel.h>
 #include <net/genetlink.h>
 #include <net/ip.h>
+#include <net/protocol.h>
 #include <net/udp.h>
 #include <net/udp_tunnel.h>
 #include <net/xfrm.h>
@@ -21,6 +22,7 @@ struct fou {
 	struct socket *sock;
 	u8 protocol;
 	u16 port;
+	struct udp_offload udp_offloads;
 	struct list_head list;
 };
 
@@ -62,6 +64,70 @@ static int fou_udp_recv(struct sock *sk, struct sk_buff *skb)
 					  sizeof(struct udphdr));
 }
 
+static inline struct sk_buff **fou_gro_receive(struct sk_buff **head,
+					       struct sk_buff *skb,
+					       const struct net_offload
+							     **offloads)
+{
+	const struct net_offload *ops;
+	struct sk_buff **pp = NULL;
+	u8 proto = NAPI_GRO_CB(skb)->proto;
+
+	rcu_read_lock();
+	ops = rcu_dereference(offloads[proto]);
+	if (!ops || !ops->callbacks.gro_receive)
+		goto out_unlock;
+
+	pp = ops->callbacks.gro_receive(head, skb);
+
+out_unlock:
+	rcu_read_unlock();
+
+	return pp;
+}
+
+static inline int fou_gro_complete(struct sk_buff *skb, int nhoff,
+				   const struct net_offload **offloads)
+{
+	const struct net_offload *ops;
+	u8 proto = NAPI_GRO_CB(skb)->proto;
+	int err = -ENOSYS;
+
+	rcu_read_lock();
+	ops = rcu_dereference(offloads[proto]);
+	if (WARN_ON(!ops || !ops->callbacks.gro_complete))
+		goto out_unlock;
+
+	err = ops->callbacks.gro_complete(skb, nhoff);
+
+out_unlock:
+	rcu_read_unlock();
+
+	return err;
+}
+
+static struct sk_buff **fou4_gro_receive(struct sk_buff **head,
+					 struct sk_buff *skb)
+{
+	return fou_gro_receive(head, skb, inet_offloads);
+}
+
+static int fou4_gro_complete(struct sk_buff *skb, int nhoff)
+{
+	return fou_gro_complete(skb, nhoff, inet_offloads);
+}
+
+static struct sk_buff **fou6_gro_receive(struct sk_buff **head,
+					 struct sk_buff *skb)
+{
+	return fou_gro_receive(head, skb, inet6_offloads);
+}
+
+static int fou6_gro_complete(struct sk_buff *skb, int nhoff)
+{
+	return fou_gro_complete(skb, nhoff, inet6_offloads);
+}
+
 static int fou_add_to_port_list(struct fou *fou)
 {
 	struct fou *fout;
@@ -134,6 +200,29 @@ static int fou_create(struct net *net, struct fou_cfg *cfg,
 
 	sk->sk_allocation = GFP_ATOMIC;
 
+	switch (cfg->udp_config.family) {
+	case AF_INET:
+		fou->udp_offloads.callbacks.gro_receive = fou4_gro_receive;
+		fou->udp_offloads.callbacks.gro_complete = fou4_gro_complete;
+		break;
+	case AF_INET6:
+		fou->udp_offloads.callbacks.gro_receive = fou6_gro_receive;
+		fou->udp_offloads.callbacks.gro_complete = fou6_gro_complete;
+		break;
+	default:
+		err = -EPFNOSUPPORT;
+		goto error;
+	}
+
+	fou->udp_offloads.port = cfg->udp_config.local_udp_port;
+	fou->udp_offloads.ipproto = cfg->protocol;
+
+	if (cfg->udp_config.family == AF_INET) {
+		err = udp_add_offload(&fou->udp_offloads);
+		if (err)
+			goto error;
+	}
+
 	err = fou_add_to_port_list(fou);
 	if (err)
 		goto error;
@@ -160,6 +249,7 @@ static int fou_destroy(struct net *net, struct fou_cfg *cfg)
 	spin_lock(&fou_lock);
 	list_for_each_entry(fou, &fou_list, list) {
 		if (fou->port == port) {
+			udp_del_offload(&fou->udp_offloads);
 			fou_release(fou);
 			err = 0;
 			break;
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index adab393..d7c43f7 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -276,6 +276,7 @@ unflush:
 
 	skb_gro_pull(skb, sizeof(struct udphdr)); /* pull encapsulating udp header */
 	skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr));
+	NAPI_GRO_CB(skb)->proto = uo_priv->offload->ipproto;
 	pp = uo_priv->offload->callbacks.gro_receive(head, skb);
 
 out_unlock:
@@ -329,8 +330,10 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff)
 			break;
 	}
 
-	if (uo_priv != NULL)
+	if (uo_priv != NULL) {
+		NAPI_GRO_CB(skb)->proto = uo_priv->offload->ipproto;
 		err = uo_priv->offload->callbacks.gro_complete(skb, nhoff + sizeof(struct udphdr));
+	}
 
 	rcu_read_unlock();
 	return err;
-- 
2.1.0.rc2.206.gedb03e5

* [PATCH v2 net-next 4/7] net: Changes to ip_tunnel to support foo-over-udp encapsulation
  2014-09-15  3:07 [PATCH v2 net-next 0/7] net: foo-over-udp (fou) Tom Herbert
                   ` (2 preceding siblings ...)
  2014-09-15  3:07 ` [PATCH v2 net-next 3/7] fou: Add GRO support Tom Herbert
@ 2014-09-15  3:08 ` Tom Herbert
  2014-09-15  3:08 ` [PATCH v2 net-next 5/7] sit: TX path for sit/UDP " Tom Herbert
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 33+ messages in thread
From: Tom Herbert @ 2014-09-15  3:08 UTC (permalink / raw)
  To: davem, netdev

This patch changes IP tunnel to support (secondary) encapsulation,
Foo-over-UDP. Changes include:

1) Add tun_hlen as the tunnel header length and encap_hlen as the
   encapsulation header length; hlen becomes the grand total of these
   (see the example below).
2) Add common netlink defines to support FOU encapsulation.
3) Add routines to perform FOU encapsulation.
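
For example, for a GRE tunnel with key and checksum enabled running
over FOU, tun_hlen is 12 bytes (4-byte base GRE header plus 4 bytes
each for the checksum and key fields), encap_hlen is 8 bytes (the UDP
header), and hlen is therefore 20 bytes; for plain IPIP over FOU,
tun_hlen is 0 and hlen equals encap_hlen = 8.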

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/net/ip_tunnels.h       | 19 ++++++++-
 include/uapi/linux/if_tunnel.h | 12 ++++++
 net/ipv4/ip_tunnel.c           | 91 +++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 120 insertions(+), 2 deletions(-)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 8dd8cab..7f538ba 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -10,6 +10,7 @@
 #include <net/gro_cells.h>
 #include <net/inet_ecn.h>
 #include <net/ip.h>
+#include <net/netns/generic.h>
 #include <net/rtnetlink.h>
 
 #if IS_ENABLED(CONFIG_IPV6)
@@ -31,6 +32,13 @@ struct ip_tunnel_6rd_parm {
 };
 #endif
 
+struct ip_tunnel_encap {
+	__u16			type;
+	__u16			flags;
+	__be16			sport;
+	__be16			dport;
+};
+
 struct ip_tunnel_prl_entry {
 	struct ip_tunnel_prl_entry __rcu *next;
 	__be32				addr;
@@ -56,13 +64,18 @@ struct ip_tunnel {
 	/* These four fields used only by GRE */
 	__u32		i_seqno;	/* The last seen seqno	*/
 	__u32		o_seqno;	/* The last output seqno */
-	int		hlen;		/* Precalculated header length */
+	int		tun_hlen;	/* Precalculated header length */
 	int		mlink;
 
 	struct ip_tunnel_dst __percpu *dst_cache;
 
 	struct ip_tunnel_parm parms;
 
+	int		encap_hlen;	/* Encap header length (FOU,GUE) */
+	struct ip_tunnel_encap encap;
+
+	int		hlen;		/* tun_hlen + encap_hlen */
+
 	/* for SIT */
 #ifdef CONFIG_IPV6_SIT_6RD
 	struct ip_tunnel_6rd_parm ip6rd;
@@ -114,6 +127,8 @@ void ip_tunnel_delete_net(struct ip_tunnel_net *itn, struct rtnl_link_ops *ops);
 void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 		    const struct iphdr *tnl_params, const u8 protocol);
 int ip_tunnel_ioctl(struct net_device *dev, struct ip_tunnel_parm *p, int cmd);
+int ip_tunnel_encap(struct sk_buff *skb, struct ip_tunnel *t,
+		    u8 *protocol, struct flowi4 *fl4);
 int ip_tunnel_change_mtu(struct net_device *dev, int new_mtu);
 
 struct rtnl_link_stats64 *ip_tunnel_get_stats64(struct net_device *dev,
@@ -131,6 +146,8 @@ int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[],
 		      struct ip_tunnel_parm *p);
 void ip_tunnel_setup(struct net_device *dev, int net_id);
 void ip_tunnel_dst_reset_all(struct ip_tunnel *t);
+int ip_tunnel_encap_setup(struct ip_tunnel *t,
+			  struct ip_tunnel_encap *ipencap);
 
 /* Extract dsfield from inner protocol */
 static inline u8 ip_tunnel_get_dsfield(const struct iphdr *iph,
diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index 3bce9e9..9fedca7 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -53,10 +53,22 @@ enum {
 	IFLA_IPTUN_6RD_RELAY_PREFIX,
 	IFLA_IPTUN_6RD_PREFIXLEN,
 	IFLA_IPTUN_6RD_RELAY_PREFIXLEN,
+	IFLA_IPTUN_ENCAP_TYPE,
+	IFLA_IPTUN_ENCAP_FLAGS,
+	IFLA_IPTUN_ENCAP_SPORT,
+	IFLA_IPTUN_ENCAP_DPORT,
 	__IFLA_IPTUN_MAX,
 };
 #define IFLA_IPTUN_MAX	(__IFLA_IPTUN_MAX - 1)
 
+enum tunnel_encap_types {
+	TUNNEL_ENCAP_NONE,
+	TUNNEL_ENCAP_FOU,
+};
+
+#define TUNNEL_ENCAP_FLAG_CSUM		(1<<0)
+#define TUNNEL_ENCAP_FLAG_CSUM6		(1<<1)
+
 /* SIT-mode i_flags */
 #define	SIT_ISATAP	0x0001
 
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index afed1aa..e3a3dc9 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -55,6 +55,7 @@
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
+#include <net/udp.h>
 
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ipv6.h>
@@ -487,6 +488,91 @@ drop:
 }
 EXPORT_SYMBOL_GPL(ip_tunnel_rcv);
 
+static int ip_encap_hlen(struct ip_tunnel_encap *e)
+{
+	switch (e->type) {
+	case TUNNEL_ENCAP_NONE:
+		return 0;
+	case TUNNEL_ENCAP_FOU:
+		return sizeof(struct udphdr);
+	default:
+		return -EINVAL;
+	}
+}
+
+int ip_tunnel_encap_setup(struct ip_tunnel *t,
+			  struct ip_tunnel_encap *ipencap)
+{
+	int hlen;
+
+	memset(&t->encap, 0, sizeof(t->encap));
+
+	hlen = ip_encap_hlen(ipencap);
+	if (hlen < 0)
+		return hlen;
+
+	t->encap.type = ipencap->type;
+	t->encap.sport = ipencap->sport;
+	t->encap.dport = ipencap->dport;
+	t->encap.flags = ipencap->flags;
+
+	t->encap_hlen = hlen;
+	t->hlen = t->encap_hlen + t->tun_hlen;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ip_tunnel_encap_setup);
+
+static int fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
+			    size_t hdr_len, u8 *protocol, struct flowi4 *fl4)
+{
+	struct udphdr *uh;
+	__be16 sport;
+	bool csum = !!(e->flags & TUNNEL_ENCAP_FLAG_CSUM);
+	int type = csum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
+
+	skb = iptunnel_handle_offloads(skb, csum, type);
+
+	if (IS_ERR(skb))
+		return PTR_ERR(skb);
+
+	/* Get length and hash before making space in skb */
+
+	sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev),
+					       skb, 0, 0, false);
+
+	skb_push(skb, hdr_len);
+
+	skb_reset_transport_header(skb);
+	uh = udp_hdr(skb);
+
+	uh->dest = e->dport;
+	uh->source = sport;
+	uh->len = htons(skb->len);
+	uh->check = 0;
+	udp_set_csum(!(e->flags & TUNNEL_ENCAP_FLAG_CSUM), skb,
+		     fl4->saddr, fl4->daddr, skb->len);
+
+	*protocol = IPPROTO_UDP;
+
+	return 0;
+}
+
+int ip_tunnel_encap(struct sk_buff *skb, struct ip_tunnel *t,
+		    u8 *protocol, struct flowi4 *fl4)
+{
+	switch (t->encap.type) {
+	case TUNNEL_ENCAP_NONE:
+		return 0;
+	case TUNNEL_ENCAP_FOU:
+		return fou_build_header(skb, &t->encap, t->encap_hlen,
+					protocol, fl4);
+	default:
+		return -EINVAL;
+	}
+}
+EXPORT_SYMBOL(ip_tunnel_encap);
+
 static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
 			    struct rtable *rt, __be16 df)
 {
@@ -536,7 +622,7 @@ static int tnl_update_pmtu(struct net_device *dev, struct sk_buff *skb,
 }
 
 void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
-		    const struct iphdr *tnl_params, const u8 protocol)
+		    const struct iphdr *tnl_params, u8 protocol)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
 	const struct iphdr *inner_iph;
@@ -617,6 +703,9 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 	init_tunnel_flow(&fl4, protocol, dst, tnl_params->saddr,
 			 tunnel->parms.o_key, RT_TOS(tos), tunnel->parms.link);
 
+	if (ip_tunnel_encap(skb, tunnel, &protocol, &fl4) < 0)
+		goto tx_error;
+
 	rt = connected ? tunnel_rtable_get(tunnel, 0, &fl4.saddr) : NULL;
 
 	if (!rt) {
-- 
2.1.0.rc2.206.gedb03e5

* [PATCH v2 net-next 5/7] sit: TX path for sit/UDP foo-over-udp encapsulation
  2014-09-15  3:07 [PATCH v2 net-next 0/7] net: foo-over-udp (fou) Tom Herbert
                   ` (3 preceding siblings ...)
  2014-09-15  3:08 ` [PATCH v2 net-next 4/7] net: Changes to ip_tunnel to support foo-over-udp encapsulation Tom Herbert
@ 2014-09-15  3:08 ` Tom Herbert
  2014-09-15  3:08 ` [PATCH v2 net-next 6/7] ipip: TX path for IPIP/UDP " Tom Herbert
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 33+ messages in thread
From: Tom Herbert @ 2014-09-15  3:08 UTC (permalink / raw)
  To: davem, netdev

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/ipv6/sit.c | 107 +++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 97 insertions(+), 10 deletions(-)

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 86e3fa8..db75809 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -822,6 +822,8 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 	int addr_type;
 	u8 ttl;
 	int err;
+	u8 protocol = IPPROTO_IPV6;
+	int t_hlen = tunnel->hlen + sizeof(struct iphdr);
 
 	if (skb->protocol != htons(ETH_P_IPV6))
 		goto tx_error;
@@ -911,8 +913,14 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 		goto tx_error;
 	}
 
+	skb = iptunnel_handle_offloads(skb, false, SKB_GSO_SIT);
+	if (IS_ERR(skb)) {
+		ip_rt_put(rt);
+		goto out;
+	}
+
 	if (df) {
-		mtu = dst_mtu(&rt->dst) - sizeof(struct iphdr);
+		mtu = dst_mtu(&rt->dst) - t_hlen;
 
 		if (mtu < 68) {
 			dev->stats.collisions++;
@@ -947,7 +955,7 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 	/*
 	 * Okay, now see if we can stuff it in the buffer as-is.
 	 */
-	max_headroom = LL_RESERVED_SPACE(tdev)+sizeof(struct iphdr);
+	max_headroom = LL_RESERVED_SPACE(tdev) + t_hlen;
 
 	if (skb_headroom(skb) < max_headroom || skb_shared(skb) ||
 	    (skb_cloned(skb) && !skb_clone_writable(skb, 0))) {
@@ -969,14 +977,13 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 		ttl = iph6->hop_limit;
 	tos = INET_ECN_encapsulate(tos, ipv6_get_dsfield(iph6));
 
-	skb = iptunnel_handle_offloads(skb, false, SKB_GSO_SIT);
-	if (IS_ERR(skb)) {
+	if (ip_tunnel_encap(skb, tunnel, &protocol, &fl4) < 0) {
 		ip_rt_put(rt);
-		goto out;
+		goto tx_error;
 	}
 
 	err = iptunnel_xmit(skb->sk, rt, skb, fl4.saddr, fl4.daddr,
-			    IPPROTO_IPV6, tos, ttl, df,
+			    protocol, tos, ttl, df,
 			    !net_eq(tunnel->net, dev_net(dev)));
 	iptunnel_xmit_stats(err, &dev->stats, dev->tstats);
 	return NETDEV_TX_OK;
@@ -1059,8 +1066,10 @@ static void ipip6_tunnel_bind_dev(struct net_device *dev)
 		tdev = __dev_get_by_index(tunnel->net, tunnel->parms.link);
 
 	if (tdev) {
+		int t_hlen = tunnel->hlen + sizeof(struct iphdr);
+
 		dev->hard_header_len = tdev->hard_header_len + sizeof(struct iphdr);
-		dev->mtu = tdev->mtu - sizeof(struct iphdr);
+		dev->mtu = tdev->mtu - t_hlen;
 		if (dev->mtu < IPV6_MIN_MTU)
 			dev->mtu = IPV6_MIN_MTU;
 	}
@@ -1307,7 +1316,10 @@ done:
 
 static int ipip6_tunnel_change_mtu(struct net_device *dev, int new_mtu)
 {
-	if (new_mtu < IPV6_MIN_MTU || new_mtu > 0xFFF8 - sizeof(struct iphdr))
+	struct ip_tunnel *tunnel = netdev_priv(dev);
+	int t_hlen = tunnel->hlen + sizeof(struct iphdr);
+
+	if (new_mtu < IPV6_MIN_MTU || new_mtu > 0xFFF8 - t_hlen)
 		return -EINVAL;
 	dev->mtu = new_mtu;
 	return 0;
@@ -1338,12 +1350,15 @@ static void ipip6_dev_free(struct net_device *dev)
 
 static void ipip6_tunnel_setup(struct net_device *dev)
 {
+	struct ip_tunnel *tunnel = netdev_priv(dev);
+	int t_hlen = tunnel->hlen + sizeof(struct iphdr);
+
 	dev->netdev_ops		= &ipip6_netdev_ops;
 	dev->destructor		= ipip6_dev_free;
 
 	dev->type		= ARPHRD_SIT;
-	dev->hard_header_len	= LL_MAX_HEADER + sizeof(struct iphdr);
-	dev->mtu		= ETH_DATA_LEN - sizeof(struct iphdr);
+	dev->hard_header_len	= LL_MAX_HEADER + t_hlen;
+	dev->mtu		= ETH_DATA_LEN - t_hlen;
 	dev->flags		= IFF_NOARP;
 	dev->priv_flags	       &= ~IFF_XMIT_DST_RELEASE;
 	dev->iflink		= 0;
@@ -1466,6 +1481,40 @@ static void ipip6_netlink_parms(struct nlattr *data[],
 
 }
 
+/* This function returns true when ENCAP attributes are present in the nl msg */
+static bool ipip6_netlink_encap_parms(struct nlattr *data[],
+				      struct ip_tunnel_encap *ipencap)
+{
+	bool ret = false;
+
+	memset(ipencap, 0, sizeof(*ipencap));
+
+	if (!data)
+		return ret;
+
+	if (data[IFLA_IPTUN_ENCAP_TYPE]) {
+		ret = true;
+		ipencap->type = nla_get_u16(data[IFLA_IPTUN_ENCAP_TYPE]);
+	}
+
+	if (data[IFLA_IPTUN_ENCAP_FLAGS]) {
+		ret = true;
+		ipencap->flags = nla_get_u16(data[IFLA_IPTUN_ENCAP_FLAGS]);
+	}
+
+	if (data[IFLA_IPTUN_ENCAP_SPORT]) {
+		ret = true;
+		ipencap->sport = nla_get_u16(data[IFLA_IPTUN_ENCAP_SPORT]);
+	}
+
+	if (data[IFLA_IPTUN_ENCAP_DPORT]) {
+		ret = true;
+		ipencap->dport = nla_get_u16(data[IFLA_IPTUN_ENCAP_DPORT]);
+	}
+
+	return ret;
+}
+
 #ifdef CONFIG_IPV6_SIT_6RD
 /* This function returns true when 6RD attributes are present in the nl msg */
 static bool ipip6_netlink_6rd_parms(struct nlattr *data[],
@@ -1509,12 +1558,20 @@ static int ipip6_newlink(struct net *src_net, struct net_device *dev,
 {
 	struct net *net = dev_net(dev);
 	struct ip_tunnel *nt;
+	struct ip_tunnel_encap ipencap;
 #ifdef CONFIG_IPV6_SIT_6RD
 	struct ip_tunnel_6rd ip6rd;
 #endif
 	int err;
 
 	nt = netdev_priv(dev);
+
+	if (ipip6_netlink_encap_parms(data, &ipencap)) {
+		err = ip_tunnel_encap_setup(nt, &ipencap);
+		if (err < 0)
+			return err;
+	}
+
 	ipip6_netlink_parms(data, &nt->parms);
 
 	if (ipip6_tunnel_locate(net, &nt->parms, 0))
@@ -1537,15 +1594,23 @@ static int ipip6_changelink(struct net_device *dev, struct nlattr *tb[],
 {
 	struct ip_tunnel *t = netdev_priv(dev);
 	struct ip_tunnel_parm p;
+	struct ip_tunnel_encap ipencap;
 	struct net *net = t->net;
 	struct sit_net *sitn = net_generic(net, sit_net_id);
 #ifdef CONFIG_IPV6_SIT_6RD
 	struct ip_tunnel_6rd ip6rd;
 #endif
+	int err;
 
 	if (dev == sitn->fb_tunnel_dev)
 		return -EINVAL;
 
+	if (ipip6_netlink_encap_parms(data, &ipencap)) {
+		err = ip_tunnel_encap_setup(t, &ipencap);
+		if (err < 0)
+			return err;
+	}
+
 	ipip6_netlink_parms(data, &p);
 
 	if (((dev->flags & IFF_POINTOPOINT) && !p.iph.daddr) ||
@@ -1599,6 +1664,14 @@ static size_t ipip6_get_size(const struct net_device *dev)
 		/* IFLA_IPTUN_6RD_RELAY_PREFIXLEN */
 		nla_total_size(2) +
 #endif
+		/* IFLA_IPTUN_ENCAP_TYPE */
+		nla_total_size(2) +
+		/* IFLA_IPTUN_ENCAP_FLAGS */
+		nla_total_size(2) +
+		/* IFLA_IPTUN_ENCAP_SPORT */
+		nla_total_size(2) +
+		/* IFLA_IPTUN_ENCAP_DPORT */
+		nla_total_size(2) +
 		0;
 }
 
@@ -1630,6 +1703,16 @@ static int ipip6_fill_info(struct sk_buff *skb, const struct net_device *dev)
 		goto nla_put_failure;
 #endif
 
+	if (nla_put_u16(skb, IFLA_IPTUN_ENCAP_TYPE,
+			tunnel->encap.type) ||
+	    nla_put_u16(skb, IFLA_IPTUN_ENCAP_SPORT,
+			tunnel->encap.sport) ||
+	    nla_put_u16(skb, IFLA_IPTUN_ENCAP_DPORT,
+			tunnel->encap.dport) ||
+	    nla_put_u16(skb, IFLA_IPTUN_ENCAP_FLAGS,
+			tunnel->encap.dport))
+		goto nla_put_failure;
+
 	return 0;
 
 nla_put_failure:
@@ -1651,6 +1734,10 @@ static const struct nla_policy ipip6_policy[IFLA_IPTUN_MAX + 1] = {
 	[IFLA_IPTUN_6RD_PREFIXLEN]	= { .type = NLA_U16 },
 	[IFLA_IPTUN_6RD_RELAY_PREFIXLEN] = { .type = NLA_U16 },
 #endif
+	[IFLA_IPTUN_ENCAP_TYPE]		= { .type = NLA_U16 },
+	[IFLA_IPTUN_ENCAP_FLAGS]	= { .type = NLA_U16 },
+	[IFLA_IPTUN_ENCAP_SPORT]	= { .type = NLA_U16 },
+	[IFLA_IPTUN_ENCAP_DPORT]	= { .type = NLA_U16 },
 };
 
 static void ipip6_dellink(struct net_device *dev, struct list_head *head)
-- 
2.1.0.rc2.206.gedb03e5

* [PATCH v2 net-next 6/7] ipip: TX path for IPIP/UDP foo-over-udp encapsulation
  2014-09-15  3:07 [PATCH v2 net-next 0/7] net: foo-over-udp (fou) Tom Herbert
                   ` (4 preceding siblings ...)
  2014-09-15  3:08 ` [PATCH v2 net-next 5/7] sit: TX path for sit/UDP " Tom Herbert
@ 2014-09-15  3:08 ` Tom Herbert
  2014-09-15  3:08 ` [PATCH v2 net-next 7/7] gre: TX path for GRE/UDP " Tom Herbert
  2014-09-15 18:08 ` [PATCH v2 net-next 0/7] net: foo-over-udp (fou) Or Gerlitz
  7 siblings, 0 replies; 33+ messages in thread
From: Tom Herbert @ 2014-09-15  3:08 UTC (permalink / raw)
  To: davem, netdev

Signed-off-by: Tom Herbert <therbert@google.com>
---
 net/ipv4/ipip.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 83 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 62eaa00..2985551 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -301,7 +301,8 @@ static int ipip_tunnel_init(struct net_device *dev)
 	memcpy(dev->dev_addr, &tunnel->parms.iph.saddr, 4);
 	memcpy(dev->broadcast, &tunnel->parms.iph.daddr, 4);
 
-	tunnel->hlen = 0;
+	tunnel->tun_hlen = 0;
+	tunnel->hlen = tunnel->tun_hlen + tunnel->encap_hlen;
 	tunnel->parms.iph.protocol = IPPROTO_IPIP;
 	return ip_tunnel_init(dev);
 }
@@ -340,19 +341,73 @@ static void ipip_netlink_parms(struct nlattr *data[],
 		parms->iph.frag_off = htons(IP_DF);
 }
 
+/* This function returns true when ENCAP attributes are present in the nl msg */
+static bool ipip_netlink_encap_parms(struct nlattr *data[],
+				     struct ip_tunnel_encap *ipencap)
+{
+	bool ret = false;
+
+	memset(ipencap, 0, sizeof(*ipencap));
+
+	if (!data)
+		return ret;
+
+	if (data[IFLA_IPTUN_ENCAP_TYPE]) {
+		ret = true;
+		ipencap->type = nla_get_u16(data[IFLA_IPTUN_ENCAP_TYPE]);
+	}
+
+	if (data[IFLA_IPTUN_ENCAP_FLAGS]) {
+		ret = true;
+		ipencap->flags = nla_get_u16(data[IFLA_IPTUN_ENCAP_FLAGS]);
+	}
+
+	if (data[IFLA_IPTUN_ENCAP_SPORT]) {
+		ret = true;
+		ipencap->sport = nla_get_u16(data[IFLA_IPTUN_ENCAP_SPORT]);
+	}
+
+	if (data[IFLA_IPTUN_ENCAP_DPORT]) {
+		ret = true;
+		ipencap->dport = nla_get_u16(data[IFLA_IPTUN_ENCAP_DPORT]);
+	}
+
+	return ret;
+}
+
 static int ipip_newlink(struct net *src_net, struct net_device *dev,
 			struct nlattr *tb[], struct nlattr *data[])
 {
 	struct ip_tunnel_parm p;
+	struct ip_tunnel *t = netdev_priv(dev);
+	struct ip_tunnel_encap ipencap;
+	int err;
+
+	if (ipip_netlink_encap_parms(data, &ipencap)) {
+		err = ip_tunnel_encap_setup(t, &ipencap);
+		if (err < 0)
+			return err;
+	}
 
 	ipip_netlink_parms(data, &p);
-	return ip_tunnel_newlink(dev, tb, &p);
+	err = ip_tunnel_newlink(dev, tb, &p);
+
+	return err;
 }
 
 static int ipip_changelink(struct net_device *dev, struct nlattr *tb[],
 			   struct nlattr *data[])
 {
+	struct ip_tunnel *t = netdev_priv(dev);
 	struct ip_tunnel_parm p;
+	struct ip_tunnel_encap ipencap;
+	int err;
+
+	if (ipip_netlink_encap_parms(data, &ipencap)) {
+		err = ip_tunnel_encap_setup(t, &ipencap);
+		if (err < 0)
+			return err;
+	}
 
 	ipip_netlink_parms(data, &p);
 
@@ -360,7 +415,9 @@ static int ipip_changelink(struct net_device *dev, struct nlattr *tb[],
 	    (!(dev->flags & IFF_POINTOPOINT) && p.iph.daddr))
 		return -EINVAL;
 
-	return ip_tunnel_changelink(dev, tb, &p);
+	err = ip_tunnel_changelink(dev, tb, &p);
+
+	return err;
 }
 
 static size_t ipip_get_size(const struct net_device *dev)
@@ -378,6 +435,14 @@ static size_t ipip_get_size(const struct net_device *dev)
 		nla_total_size(1) +
 		/* IFLA_IPTUN_PMTUDISC */
 		nla_total_size(1) +
+		/* IFLA_IPTUN_ENCAP_TYPE */
+		nla_total_size(2) +
+		/* IFLA_IPTUN_ENCAP_FLAGS */
+		nla_total_size(2) +
+		/* IFLA_IPTUN_ENCAP_SPORT */
+		nla_total_size(2) +
+		/* IFLA_IPTUN_ENCAP_DPORT */
+		nla_total_size(2) +
 		0;
 }
 
@@ -394,6 +459,17 @@ static int ipip_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	    nla_put_u8(skb, IFLA_IPTUN_PMTUDISC,
 		       !!(parm->iph.frag_off & htons(IP_DF))))
 		goto nla_put_failure;
+
+	if (nla_put_u16(skb, IFLA_IPTUN_ENCAP_TYPE,
+			tunnel->encap.type) ||
+	    nla_put_u16(skb, IFLA_IPTUN_ENCAP_SPORT,
+			tunnel->encap.sport) ||
+	    nla_put_u16(skb, IFLA_IPTUN_ENCAP_DPORT,
+			tunnel->encap.dport) ||
+	    nla_put_u16(skb, IFLA_IPTUN_ENCAP_FLAGS,
+			tunnel->encap.dport))
+		goto nla_put_failure;
+
 	return 0;
 
 nla_put_failure:
@@ -407,6 +483,10 @@ static const struct nla_policy ipip_policy[IFLA_IPTUN_MAX + 1] = {
 	[IFLA_IPTUN_TTL]		= { .type = NLA_U8 },
 	[IFLA_IPTUN_TOS]		= { .type = NLA_U8 },
 	[IFLA_IPTUN_PMTUDISC]		= { .type = NLA_U8 },
+	[IFLA_IPTUN_ENCAP_TYPE]		= { .type = NLA_U16 },
+	[IFLA_IPTUN_ENCAP_FLAGS]	= { .type = NLA_U16 },
+	[IFLA_IPTUN_ENCAP_SPORT]	= { .type = NLA_U16 },
+	[IFLA_IPTUN_ENCAP_DPORT]	= { .type = NLA_U16 },
 };
 
 static struct rtnl_link_ops ipip_link_ops __read_mostly = {
-- 
2.1.0.rc2.206.gedb03e5

* [PATCH v2 net-next 7/7] gre: TX path for GRE/UDP foo-over-udp encapsulation
  2014-09-15  3:07 [PATCH v2 net-next 0/7] net: foo-over-udp (fou) Tom Herbert
                   ` (5 preceding siblings ...)
  2014-09-15  3:08 ` [PATCH v2 net-next 6/7] ipip: TX path for IPIP/UDP " Tom Herbert
@ 2014-09-15  3:08 ` Tom Herbert
  2014-09-15 18:08 ` [PATCH v2 net-next 0/7] net: foo-over-udp (fou) Or Gerlitz
  7 siblings, 0 replies; 33+ messages in thread
From: Tom Herbert @ 2014-09-15  3:08 UTC (permalink / raw)
  To: davem, netdev

Signed-off-by: Tom Herbert <therbert@google.com>
---
 include/uapi/linux/if_tunnel.h |  4 ++
 net/ipv4/ip_gre.c              | 98 +++++++++++++++++++++++++++++++++++++++---
 2 files changed, 95 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index 9fedca7..7c832af 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -106,6 +106,10 @@ enum {
 	IFLA_GRE_ENCAP_LIMIT,
 	IFLA_GRE_FLOWINFO,
 	IFLA_GRE_FLAGS,
+	IFLA_GRE_ENCAP_TYPE,
+	IFLA_GRE_ENCAP_FLAGS,
+	IFLA_GRE_ENCAP_SPORT,
+	IFLA_GRE_ENCAP_DPORT,
 	__IFLA_GRE_MAX,
 };
 
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 9b84254..5681344 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -239,7 +239,7 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev,
 	tpi.seq = htonl(tunnel->o_seqno);
 
 	/* Push GRE header. */
-	gre_build_header(skb, &tpi, tunnel->hlen);
+	gre_build_header(skb, &tpi, tunnel->tun_hlen);
 
 	ip_tunnel_xmit(skb, dev, tnl_params, tnl_params->protocol);
 }
@@ -310,7 +310,7 @@ out:
 static int ipgre_tunnel_ioctl(struct net_device *dev,
 			      struct ifreq *ifr, int cmd)
 {
-	int err = 0;
+	int err;
 	struct ip_tunnel_parm p;
 
 	if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof(p)))
@@ -470,13 +470,18 @@ static void ipgre_tunnel_setup(struct net_device *dev)
 static void __gre_tunnel_init(struct net_device *dev)
 {
 	struct ip_tunnel *tunnel;
+	int t_hlen;
 
 	tunnel = netdev_priv(dev);
-	tunnel->hlen = ip_gre_calc_hlen(tunnel->parms.o_flags);
+	tunnel->tun_hlen = ip_gre_calc_hlen(tunnel->parms.o_flags);
 	tunnel->parms.iph.protocol = IPPROTO_GRE;
 
-	dev->needed_headroom	= LL_MAX_HEADER + sizeof(struct iphdr) + 4;
-	dev->mtu		= ETH_DATA_LEN - sizeof(struct iphdr) - 4;
+	tunnel->hlen = tunnel->tun_hlen + tunnel->encap_hlen;
+
+	t_hlen = tunnel->hlen + sizeof(struct iphdr);
+
+	dev->needed_headroom	= LL_MAX_HEADER + t_hlen + 4;
+	dev->mtu		= ETH_DATA_LEN - t_hlen - 4;
 
 	dev->features		|= GRE_FEATURES;
 	dev->hw_features	|= GRE_FEATURES;
@@ -628,6 +633,40 @@ static void ipgre_netlink_parms(struct nlattr *data[], struct nlattr *tb[],
 		parms->iph.frag_off = htons(IP_DF);
 }
 
+/* This function returns true when ENCAP attributes are present in the nl msg */
+static bool ipgre_netlink_encap_parms(struct nlattr *data[],
+				      struct ip_tunnel_encap *ipencap)
+{
+	bool ret = false;
+
+	memset(ipencap, 0, sizeof(*ipencap));
+
+	if (!data)
+		return ret;
+
+	if (data[IFLA_GRE_ENCAP_TYPE]) {
+		ret = true;
+		ipencap->type = nla_get_u16(data[IFLA_GRE_ENCAP_TYPE]);
+	}
+
+	if (data[IFLA_GRE_ENCAP_FLAGS]) {
+		ret = true;
+		ipencap->flags = nla_get_u16(data[IFLA_GRE_ENCAP_FLAGS]);
+	}
+
+	if (data[IFLA_GRE_ENCAP_SPORT]) {
+		ret = true;
+		ipencap->sport = nla_get_u16(data[IFLA_GRE_ENCAP_SPORT]);
+	}
+
+	if (data[IFLA_GRE_ENCAP_DPORT]) {
+		ret = true;
+		ipencap->dport = nla_get_u16(data[IFLA_GRE_ENCAP_DPORT]);
+	}
+
+	return ret;
+}
+
 static int gre_tap_init(struct net_device *dev)
 {
 	__gre_tunnel_init(dev);
@@ -657,18 +696,40 @@ static int ipgre_newlink(struct net *src_net, struct net_device *dev,
 			 struct nlattr *tb[], struct nlattr *data[])
 {
 	struct ip_tunnel_parm p;
+	struct ip_tunnel *nt = netdev_priv(dev);
+	struct ip_tunnel_encap ipencap;
+	int err;
+
+	if (ipgre_netlink_encap_parms(data, &ipencap)) {
+		err = ip_tunnel_encap_setup(nt, &ipencap);
+		if (err < 0)
+			return err;
+	}
 
 	ipgre_netlink_parms(data, tb, &p);
-	return ip_tunnel_newlink(dev, tb, &p);
+	err = ip_tunnel_newlink(dev, tb, &p);
+
+	return err;
 }
 
 static int ipgre_changelink(struct net_device *dev, struct nlattr *tb[],
 			    struct nlattr *data[])
 {
+	struct ip_tunnel *t = netdev_priv(dev);
 	struct ip_tunnel_parm p;
+	struct ip_tunnel_encap ipencap;
+	int err;
+
+	if (ipgre_netlink_encap_parms(data, &ipencap)) {
+		err = ip_tunnel_encap_setup(t, &ipencap);
+		if (err < 0)
+			return err;
+	}
 
 	ipgre_netlink_parms(data, tb, &p);
-	return ip_tunnel_changelink(dev, tb, &p);
+	err = ip_tunnel_changelink(dev, tb, &p);
+
+	return err;
 }
 
 static size_t ipgre_get_size(const struct net_device *dev)
@@ -694,6 +755,14 @@ static size_t ipgre_get_size(const struct net_device *dev)
 		nla_total_size(1) +
 		/* IFLA_GRE_PMTUDISC */
 		nla_total_size(1) +
+		/* IFLA_GRE_ENCAP_TYPE */
+		nla_total_size(2) +
+		/* IFLA_GRE_ENCAP_FLAGS */
+		nla_total_size(2) +
+		/* IFLA_GRE_ENCAP_SPORT */
+		nla_total_size(2) +
+		/* IFLA_GRE_ENCAP_DPORT */
+		nla_total_size(2) +
 		0;
 }
 
@@ -714,6 +783,17 @@ static int ipgre_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	    nla_put_u8(skb, IFLA_GRE_PMTUDISC,
 		       !!(p->iph.frag_off & htons(IP_DF))))
 		goto nla_put_failure;
+
+	if (nla_put_u16(skb, IFLA_GRE_ENCAP_TYPE,
+			t->encap.type) ||
+	    nla_put_u16(skb, IFLA_GRE_ENCAP_SPORT,
+			t->encap.sport) ||
+	    nla_put_u16(skb, IFLA_GRE_ENCAP_DPORT,
+			t->encap.dport) ||
+	    nla_put_u16(skb, IFLA_GRE_ENCAP_FLAGS,
+			t->encap.dport))
+		goto nla_put_failure;
+
 	return 0;
 
 nla_put_failure:
@@ -731,6 +811,10 @@ static const struct nla_policy ipgre_policy[IFLA_GRE_MAX + 1] = {
 	[IFLA_GRE_TTL]		= { .type = NLA_U8 },
 	[IFLA_GRE_TOS]		= { .type = NLA_U8 },
 	[IFLA_GRE_PMTUDISC]	= { .type = NLA_U8 },
+	[IFLA_GRE_ENCAP_TYPE]	= { .type = NLA_U16 },
+	[IFLA_GRE_ENCAP_FLAGS]	= { .type = NLA_U16 },
+	[IFLA_GRE_ENCAP_SPORT]	= { .type = NLA_U16 },
+	[IFLA_GRE_ENCAP_DPORT]	= { .type = NLA_U16 },
 };
 
 static struct rtnl_link_ops ipgre_link_ops __read_mostly = {
-- 
2.1.0.rc2.206.gedb03e5

* Re: [PATCH v2 net-next 1/7] net: Export inet_offloads and inet6_offloads
  2014-09-15  3:07 ` [PATCH v2 net-next 1/7] net: Export inet_offloads and inet6_offloads Tom Herbert
@ 2014-09-15 13:33   ` Or Gerlitz
  2014-09-15 15:13     ` Tom Herbert
  2014-09-15 18:21   ` David Miller
  1 sibling, 1 reply; 33+ messages in thread
From: Or Gerlitz @ 2014-09-15 13:33 UTC (permalink / raw)
  To: Tom Herbert, Jerry Chu; +Cc: davem, netdev

On Mon, Sep 15, 2014 at 6:07 AM, Tom Herbert <therbert@google.com> wrote:
> Want to be able to call this in foo-over-udp offloads, etc.

In the L2 GRO case we added dedicated helpers,
gro_find_receive/complete_by_type; I'm not sure what the exact
rationale was there, but it is worth checking. Jerry?


>
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---
>  net/ipv4/protocol.c | 1 +
>  net/ipv6/protocol.c | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/net/ipv4/protocol.c b/net/ipv4/protocol.c
> index 46d6a1c..4b7c0ec 100644
> --- a/net/ipv4/protocol.c
> +++ b/net/ipv4/protocol.c
> @@ -30,6 +30,7 @@
>
>  const struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS] __read_mostly;
>  const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS] __read_mostly;
> +EXPORT_SYMBOL(inet_offloads);
>
>  int inet_add_protocol(const struct net_protocol *prot, unsigned char protocol)
>  {
> diff --git a/net/ipv6/protocol.c b/net/ipv6/protocol.c
> index e048cf1..e3770ab 100644
> --- a/net/ipv6/protocol.c
> +++ b/net/ipv6/protocol.c
> @@ -51,6 +51,7 @@ EXPORT_SYMBOL(inet6_del_protocol);
>  #endif
>
>  const struct net_offload __rcu *inet6_offloads[MAX_INET_PROTOS] __read_mostly;
> +EXPORT_SYMBOL(inet6_offloads);
>
>  int inet6_add_offload(const struct net_offload *prot, unsigned char protocol)
>  {
> --
> 2.1.0.rc2.206.gedb03e5
>

* Re: [PATCH v2 net-next 3/7] fou: Add GRO support
  2014-09-15  3:07 ` [PATCH v2 net-next 3/7] fou: Add GRO support Tom Herbert
@ 2014-09-15 15:00   ` Or Gerlitz
  2014-09-15 15:10     ` Tom Herbert
  2014-09-15 18:22   ` David Miller
  1 sibling, 1 reply; 33+ messages in thread
From: Or Gerlitz @ 2014-09-15 15:00 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev

On Mon, Sep 15, 2014 at 6:07 AM, Tom Herbert <therbert@google.com> wrote:
> Implement fou_gro_receive and fou_gro_complete, and populate these
> in the correponsing udp_offloads for the socket. Added ipproto to
> udp_offloads and pass this from UDP to the fou GRO routine in proto
> field of napi_gro_cb structure.


Do we really need that extra hop of fou4_gro_receive/complete?
Can't we somehow plant the GRO receive/complete handlers of (say) GRE
in the udp_offload struct for the UDP port that relates to (say)
GRE-over-UDP tunneling?

Or.

* Re: [PATCH v2 net-next 3/7] fou: Add GRO support
  2014-09-15 15:00   ` Or Gerlitz
@ 2014-09-15 15:10     ` Tom Herbert
  2014-09-15 17:03       ` Or Gerlitz
  0 siblings, 1 reply; 33+ messages in thread
From: Tom Herbert @ 2014-09-15 15:10 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: David Miller, Linux Netdev List

On Mon, Sep 15, 2014 at 8:00 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Mon, Sep 15, 2014 at 6:07 AM, Tom Herbert <therbert@google.com> wrote:
>> Implement fou_gro_receive and fou_gro_complete, and populate these
>> in the correponsing udp_offloads for the socket. Added ipproto to
>> udp_offloads and pass this from UDP to the fou GRO routine in proto
>> field of napi_gro_cb structure.
>
>
> Do we really need that  extra hop of fou4_gro_receive/complete?
> can't we somehow plant the gro receive/complete (say) GRE handlers in
> the udp offload
> struct with the UDP port that related to (say) GRE over UDP tunneling?
>
That would be nice, but it isn't obvious to me how to manage the
references. The offload functions are accessed with RCU pretty
consistently.

Tom

> Or.

* Re: [PATCH v2 net-next 1/7] net: Export inet_offloads and inet6_offloads
  2014-09-15 13:33   ` Or Gerlitz
@ 2014-09-15 15:13     ` Tom Herbert
  2014-09-15 17:15       ` Or Gerlitz
  0 siblings, 1 reply; 33+ messages in thread
From: Tom Herbert @ 2014-09-15 15:13 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Jerry Chu, David Miller, Linux Netdev List

On Mon, Sep 15, 2014 at 6:33 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Mon, Sep 15, 2014 at 6:07 AM, Tom Herbert <therbert@google.com> wrote:
>> Want to be able to call this in foo-over-udp offloads, etc.
>
> In the L2 gro case, we did dedicated helpers
> gro_find_receive/complete_by_type, not sure what was
> the exact rational there but worth checking, jerry?
>
It allows offload_base to be kept a static, but then
gro_find_receive_by_type can't be inlined.
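
(For reference, a sketch of how those L2 helpers are used, roughly as
in the existing GRE offload code; not part of this series:)

          /* head and skb as passed to a gro_receive callback */
          __be16 type = htons(ETH_P_IP);  /* ethertype of the inner packet */
          struct packet_offload *ptype;
          struct sk_buff **pp = NULL;

          rcu_read_lock();
          ptype = gro_find_receive_by_type(type);
          if (ptype)
                  pp = ptype->callbacks.gro_receive(head, skb);
          rcu_read_unlock();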

>
>>
>> Signed-off-by: Tom Herbert <therbert@google.com>
>> ---
>>  net/ipv4/protocol.c | 1 +
>>  net/ipv6/protocol.c | 1 +
>>  2 files changed, 2 insertions(+)
>>
>> diff --git a/net/ipv4/protocol.c b/net/ipv4/protocol.c
>> index 46d6a1c..4b7c0ec 100644
>> --- a/net/ipv4/protocol.c
>> +++ b/net/ipv4/protocol.c
>> @@ -30,6 +30,7 @@
>>
>>  const struct net_protocol __rcu *inet_protos[MAX_INET_PROTOS] __read_mostly;
>>  const struct net_offload __rcu *inet_offloads[MAX_INET_PROTOS] __read_mostly;
>> +EXPORT_SYMBOL(inet_offloads);
>>
>>  int inet_add_protocol(const struct net_protocol *prot, unsigned char protocol)
>>  {
>> diff --git a/net/ipv6/protocol.c b/net/ipv6/protocol.c
>> index e048cf1..e3770ab 100644
>> --- a/net/ipv6/protocol.c
>> +++ b/net/ipv6/protocol.c
>> @@ -51,6 +51,7 @@ EXPORT_SYMBOL(inet6_del_protocol);
>>  #endif
>>
>>  const struct net_offload __rcu *inet6_offloads[MAX_INET_PROTOS] __read_mostly;
>> +EXPORT_SYMBOL(inet6_offloads);
>>
>>  int inet6_add_offload(const struct net_offload *prot, unsigned char protocol)
>>  {
>> --
>> 2.1.0.rc2.206.gedb03e5
>>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 net-next 3/7] fou: Add GRO support
  2014-09-15 15:10     ` Tom Herbert
@ 2014-09-15 17:03       ` Or Gerlitz
  2014-09-15 17:21         ` Tom Herbert
  0 siblings, 1 reply; 33+ messages in thread
From: Or Gerlitz @ 2014-09-15 17:03 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, Linux Netdev List

On Mon, Sep 15, 2014 at 6:10 PM, Tom Herbert <therbert@google.com> wrote:
> On Mon, Sep 15, 2014 at 8:00 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>> On Mon, Sep 15, 2014 at 6:07 AM, Tom Herbert <therbert@google.com> wrote:
>>> Implement fou_gro_receive and fou_gro_complete, and populate these
>>> in the correponsing udp_offloads for the socket. Added ipproto to
>>> udp_offloads and pass this from UDP to the fou GRO routine in proto
>>> field of napi_gro_cb structure.
>>
>>
>> Do we really need that  extra hop of fou4_gro_receive/complete?
>> can't we somehow plant the gro receive/complete (say) GRE handlers in
>> the udp offload
>> struct with the UDP port that related to (say) GRE over UDP tunneling?
>>
> That would be nice, but it isn't obvious to me how to manage the
> references. The offload functions are accessed with RCU pretty consistently.

Currently udp_gro_receive calls rcu_read_lock() before it invokes
fou4/6_gro_receive and rcu_read_unlock() after the fou call returns.
The fou code repeats the same practice w.r.t. the (say) GRE GRO
receive callback, so... what happens if we eliminate the fou part
altogether?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 net-next 1/7] net: Export inet_offloads and inet6_offloads
  2014-09-15 15:13     ` Tom Herbert
@ 2014-09-15 17:15       ` Or Gerlitz
  2014-09-15 17:32         ` Tom Herbert
  0 siblings, 1 reply; 33+ messages in thread
From: Or Gerlitz @ 2014-09-15 17:15 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Jerry Chu, David Miller, Linux Netdev List

On Mon, Sep 15, 2014 at 6:13 PM, Tom Herbert <therbert@google.com> wrote:
> On Mon, Sep 15, 2014 at 6:33 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>> On Mon, Sep 15, 2014 at 6:07 AM, Tom Herbert <therbert@google.com> wrote:
>>> Want to be able to call this in foo-over-udp offloads, etc.
>>
>> In the L2 gro case, we did dedicated helpers
>> gro_find_receive/complete_by_type, not sure what was
>> the exact rational there but worth checking, jerry?
>>
> It allows offload_base to be kept a static, but then
> gro_find_receive_by_type can't be inlined.


So we have two similar locations in the networking stack acting
differently on the same/similar simple matter... a bit problematic
maintenance-wise, I would say.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 net-next 3/7] fou: Add GRO support
  2014-09-15 17:03       ` Or Gerlitz
@ 2014-09-15 17:21         ` Tom Herbert
  0 siblings, 0 replies; 33+ messages in thread
From: Tom Herbert @ 2014-09-15 17:21 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: David Miller, Linux Netdev List

On Mon, Sep 15, 2014 at 10:03 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Mon, Sep 15, 2014 at 6:10 PM, Tom Herbert <therbert@google.com> wrote:
>> On Mon, Sep 15, 2014 at 8:00 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>> On Mon, Sep 15, 2014 at 6:07 AM, Tom Herbert <therbert@google.com> wrote:
>>>> Implement fou_gro_receive and fou_gro_complete, and populate these
>>>> in the correponsing udp_offloads for the socket. Added ipproto to
>>>> udp_offloads and pass this from UDP to the fou GRO routine in proto
>>>> field of napi_gro_cb structure.
>>>
>>>
>>> Do we really need that  extra hop of fou4_gro_receive/complete?
>>> can't we somehow plant the gro receive/complete (say) GRE handlers in
>>> the udp offload
>>> struct with the UDP port that related to (say) GRE over UDP tunneling?
>>>
>> That would be nice, but it isn't obvious to me how to manage the
>> references. The offload functions are accessed with RCU pretty consistently.
>
> Currently udp_gro_receive calls rcu_read_lock() before it invokes
> fou4/6_gro_receive
> and rcu_read_unlock after the fou calls returns. The Fou call repeats
> the same practice
> w.r.t the (say) GRE gro receive callback, so... what happens if we
> eliminate the fou part all together?

Yes, we could conceivably index into inet_offloads directly from
udp_gro_receive in lieu of calling the offload functions in the
structure. It is more special-case code in UDP, though, and I'm not
sure that makes it a win.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 net-next 1/7] net: Export inet_offloads and inet6_offloads
  2014-09-15 17:15       ` Or Gerlitz
@ 2014-09-15 17:32         ` Tom Herbert
  0 siblings, 0 replies; 33+ messages in thread
From: Tom Herbert @ 2014-09-15 17:32 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Jerry Chu, David Miller, Linux Netdev List

On Mon, Sep 15, 2014 at 10:15 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Mon, Sep 15, 2014 at 6:13 PM, Tom Herbert <therbert@google.com> wrote:
>> On Mon, Sep 15, 2014 at 6:33 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>> On Mon, Sep 15, 2014 at 6:07 AM, Tom Herbert <therbert@google.com> wrote:
>>>> Want to be able to call this in foo-over-udp offloads, etc.
>>>
>>> In the L2 gro case, we did dedicated helpers
>>> gro_find_receive/complete_by_type, not sure what was
>>> the exact rational there but worth checking, jerry?
>>>
>> It allows offload_base to be kept a static, but then
>> gro_find_receive_by_type can't be inlined.
>
>
> so we have two similar locations in the networking stack acting
> differently on the same/similar simple
> matter... a bit problematic maintainance wise, I would say.

Yes, these should be similar. I think we'd need to clean up
gro_find_receive_by_type first: export the udp_offload base and
inline these functions. Have skb_mac_gso_segment call this as well.
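
For reference, the L2 helpers being compared against are used roughly
like this at a call site today (the helpers are the real API; the
surrounding function and its eth_type handling are illustrative):

/* Illustrative call site for the existing L2 GRO helpers.  eth_type is
 * assumed to have been pulled from the inner Ethernet header by the
 * caller.
 */
static struct sk_buff **l2_inner_gro_receive(__be16 eth_type,
					     struct sk_buff **head,
					     struct sk_buff *skb)
{
	struct packet_offload *ptype;
	struct sk_buff **pp = NULL;

	rcu_read_lock();
	ptype = gro_find_receive_by_type(eth_type);
	if (ptype)
		pp = ptype->callbacks.gro_receive(head, skb);
	rcu_read_unlock();

	return pp;
}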

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: foo-over-udp (fou)
  2014-09-15  3:07 [PATCH v2 net-next 0/7] net: foo-over-udp (fou) Tom Herbert
                   ` (6 preceding siblings ...)
  2014-09-15  3:08 ` [PATCH v2 net-next 7/7] gre: TX path for GRE/UDP " Tom Herbert
@ 2014-09-15 18:08 ` Or Gerlitz
  2014-09-15 19:15   ` Tom Herbert
  7 siblings, 1 reply; 33+ messages in thread
From: Or Gerlitz @ 2014-09-15 18:08 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, Linux Netdev List

On Mon, Sep 15, 2014 at 6:07 AM, Tom Herbert <therbert@google.com> wrote:
[...]
> * Notes
>   - This patch set does not implement GSO for FOU. The UDP encapsulation
>     code assumes TEB, so that will need to be reimplemented.

Can you please clarify this point a little further? Specifically,
today a few NICs are advertising NETIF_F_GSO_UDP_TUNNEL when they are
practically GSO capable only w.r.t. VXLAN. What happens when such a
NIC exposes this cap and a large guest frame goes through GRE-over-UDP
or similar tunneling?

>   - I really don't expect/want devices to have special support for any
>     of this. Generic checksum offload mechanisms (NETIF_HW_CSUM
>     and use of CHECKSUM_COMPLETE) should be sufficient. RSS and flow
>     steering is provided by commonly implemented UDP hashing. GRO/GSO
>     seem fairly comparable with LRO/TSO already.

Again, today NICs are advertising checksum offload capability in
hw_enc_features but aren't capable of computing (say) the TCP
checksum of the inner packet regardless of which actual tunneling is
used (e.g. VXLAN vs GRE) -- a bit inconsistent?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 net-next 1/7] net: Export inet_offloads and inet6_offloads
  2014-09-15  3:07 ` [PATCH v2 net-next 1/7] net: Export inet_offloads and inet6_offloads Tom Herbert
  2014-09-15 13:33   ` Or Gerlitz
@ 2014-09-15 18:21   ` David Miller
  1 sibling, 0 replies; 33+ messages in thread
From: David Miller @ 2014-09-15 18:21 UTC (permalink / raw)
  To: therbert; +Cc: netdev

From: Tom Herbert <therbert@google.com>
Date: Sun, 14 Sep 2014 20:07:57 -0700

> Want to be able to call this in foo-over-udp offloads, etc.
> 
> Signed-off-by: Tom Herbert <therbert@google.com>

I don't think inet{,6}_offloads are symbols you can "call" :-)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 net-next 3/7] fou: Add GRO support
  2014-09-15  3:07 ` [PATCH v2 net-next 3/7] fou: Add GRO support Tom Herbert
  2014-09-15 15:00   ` Or Gerlitz
@ 2014-09-15 18:22   ` David Miller
  1 sibling, 0 replies; 33+ messages in thread
From: David Miller @ 2014-09-15 18:22 UTC (permalink / raw)
  To: therbert; +Cc: netdev

From: Tom Herbert <therbert@google.com>
Date: Sun, 14 Sep 2014 20:07:59 -0700

> @@ -62,6 +64,70 @@ static int fou_udp_recv(struct sock *sk, struct sk_buff *skb)
>  					  sizeof(struct udphdr));
>  }
>  
> +static inline struct sk_buff **fou_gro_receive(struct sk_buff **head,
> +					       struct sk_buff *skb,
> +					       const struct net_offload
> +							     **offloads)

Please drop the "inline" in foo.c files and let the compiler decide.  And
then you won't need to split up the "offloads" variable declaration like
that.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: foo-over-udp (fou)
  2014-09-15 18:08 ` [PATCH v2 net-next 0/7] net: foo-over-udp (fou) Or Gerlitz
@ 2014-09-15 19:15   ` Tom Herbert
  2014-09-15 22:44     ` Jesse Gross
  2014-09-16 13:35     ` Or Gerlitz
  0 siblings, 2 replies; 33+ messages in thread
From: Tom Herbert @ 2014-09-15 19:15 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: David Miller, Linux Netdev List

On Mon, Sep 15, 2014 at 11:08 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Mon, Sep 15, 2014 at 6:07 AM, Tom Herbert <therbert@google.com> wrote:
> [...]
>> * Notes
>>   - This patch set does not implement GSO for FOU. The UDP encapsulation
>>     code assumes TEB, so that will need to be reimplemented.
>
> Can you please clarify this point little further? Specifically, today
> few NICs are
> advertizing NETIF_F_GSO_UDP_TUNNEL when they are practically GSO
> capable only w.r.t to VXLAN. What happens when such NIC expose this
> cap and a large guest frame goes through GRE over UDP or alike tunneling?
>
My interpretation is that NETIF_F_GSO_UDP_TUNNEL means L3/L4
encapsulation over UDP, not VXLAN. If the NIC implements things
properly following the generic interface then I believe it should
work with various flavors of UDP encapsulation (FOU, GUE, VXLAN,
VXLAN-gpe, geneve, LISP, L2TP, nvgre, or whatever else people might
dream up). This presumes that the encapsulation headers don't require
any per-segment update (so no GRE csum, for instance). The stack will
set up inner headers as needed, which should be enough to provide
devices with the offsets of the inner IP and TCP headers needed for
the TSO operation (the outer IP and UDP can be deduced as well).
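
Concretely, what such a device has to work with is roughly the
following per-skb state (the helpers and flags are real; the wrapper
and its offset arguments are made up for illustration):

/* Illustrative sketch: the state that lets a device find the inner
 * headers when doing TSO on a UDP-encapsulated packet.
 */
static void sketch_mark_udp_tunnel_gso(struct sk_buff *skb,
				       int inner_nh_off, int inner_th_off)
{
	skb->encapsulation = 1;
	skb_set_inner_network_header(skb, inner_nh_off);   /* inner IP hdr  */
	skb_set_inner_transport_header(skb, inner_th_off); /* inner TCP hdr */
	skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL;
}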

>>   - I really don't expect/want devices to have special support for any
>>     of this. Generic checksum offload mechanisms (NETIF_HW_CSUM
>>     and use of CHECKSUM_COMPLETE) should be sufficient. RSS and flow
>>     steering is provided by commonly implemented UDP hashing. GRO/GSO
>>     seem fairly comparable with LRO/TSO already.
>
> Again, today NICs are advertizing checksum offloads capability in
> enc_hw_features but aren't capable to compute (say) the TCP checksum
> of the inner
> packet regardless of which actual tunneling is used (e.g VXLAN vs
> GRE), a bit inconsistent?

I doubt this is true of all NICs! For instance, a NIC that implements
NETIF_F_HW_CSUM should have no problem computing encapsulated
checksums in just about any scenario. Both checksum offload and TSO
can be supported for arbitrary flavors of UDP encapsulation if NICs
use protocol-agnostic means as opposed to protocol-specific means
that require a lot of parsing of the packets themselves. Look at the
long-standing comments in sk_buff about why protocol-specific methods
like CHECKSUM_UNNECESSARY and NETIF_F_IP_CSUM are bad ideas. With the
emergence of encapsulation these are now *really* bad ideas!

If devices that interpret NETIF_F_GSO_UDP_TUNNEL as VXLAN would break
when presented with any other flavor of UDP encapsulation, then we
should probably define NETIF_F_GSO_VXLAN_TUNNEL just for that case to
maintain backwards compatibility.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: foo-over-udp (fou)
  2014-09-15 19:15   ` Tom Herbert
@ 2014-09-15 22:44     ` Jesse Gross
  2014-09-15 22:59       ` Tom Herbert
  2014-09-16 12:44       ` Or Gerlitz
  2014-09-16 13:35     ` Or Gerlitz
  1 sibling, 2 replies; 33+ messages in thread
From: Jesse Gross @ 2014-09-15 22:44 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Or Gerlitz, David Miller, Linux Netdev List

On Mon, Sep 15, 2014 at 12:15 PM, Tom Herbert <therbert@google.com> wrote:
> On Mon, Sep 15, 2014 at 11:08 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>> On Mon, Sep 15, 2014 at 6:07 AM, Tom Herbert <therbert@google.com> wrote:
>> [...]
>>> * Notes
>>>   - This patch set does not implement GSO for FOU. The UDP encapsulation
>>>     code assumes TEB, so that will need to be reimplemented.
>>
>> Can you please clarify this point little further? Specifically, today
>> few NICs are
>> advertizing NETIF_F_GSO_UDP_TUNNEL when they are practically GSO
>> capable only w.r.t to VXLAN. What happens when such NIC expose this
>> cap and a large guest frame goes through GRE over UDP or alike tunneling?
>>
> My interpretation is that NETIF_F_GSO_UDP_TUNNEL means L3/L4
> encapsulation over UDP, not VXLAN. If the NIC implements things
> properly following the generic interface then I believe it should work
> with various flavors of UDP encapsulation (FOU, GUE, VXLAN, VXLAN-gpe,
> geneve, LISP, L2TP, nvgre, or whatever else people might dream up).
> This presumes that any encapsulation headers doesn't require any per
> segment update (so no GRE csum for instance). The stack will set up
> inner headers as needed, which should enough to provide to devices the
> offsets inner IP and TCP header which are needed for the the TSO
> operation (outer IP and UDP can be deduced also).

From the NICs that I am familiar with this is mostly true. The main
part that is missing from the current implementation is a length
limit: just because the hardware can skip over headers to an offset
doesn't mean that it can do so to an arbitrary depth. For example, in
the NICs that are exposing VXLAN as NETIF_F_GSO_UDP_TUNNEL we can
probably assume that this is limited to 8 bytes. With the Intel NICs
that were just announced with Geneve support, this limit has been
increased to 64. If we add a parameter to the driver interface to
expose this then it should be generic across tunnels.
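
As a sketch of the kind of check such a parameter would enable (the
helper and the hw_max_encap_hlen argument are illustrative, not an
existing interface):

/* Illustrative only: compare the length of the encapsulation header(s)
 * between the outer UDP header and the start of the inner frame
 * against whatever depth the device's TSO engine can skip (roughly 8
 * bytes for the VXLAN-only parts mentioned above, 64 for the newer
 * Intel parts).  Assumes the inner mac header marks the start of the
 * encapsulated frame.
 */
static bool tso_can_skip_encap_hdr(const struct sk_buff *skb,
				   unsigned int hw_max_encap_hlen)
{
	unsigned int encap_hlen = skb_inner_mac_header(skb) -
				  skb_transport_header(skb) -
				  sizeof(struct udphdr);

	return encap_hlen <= hw_max_encap_hlen;
}

Whether such a limit should also count the outer UDP (or outer IP)
header is part of the "define precisely what length refers to" point
that comes up below.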

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: foo-over-udp (fou)
  2014-09-15 22:44     ` Jesse Gross
@ 2014-09-15 22:59       ` Tom Herbert
  2014-09-16  0:15         ` Jesse Gross
  2014-09-16 12:44       ` Or Gerlitz
  1 sibling, 1 reply; 33+ messages in thread
From: Tom Herbert @ 2014-09-15 22:59 UTC (permalink / raw)
  To: Jesse Gross; +Cc: Or Gerlitz, David Miller, Linux Netdev List

On Mon, Sep 15, 2014 at 3:44 PM, Jesse Gross <jesse@nicira.com> wrote:
> On Mon, Sep 15, 2014 at 12:15 PM, Tom Herbert <therbert@google.com> wrote:
>> On Mon, Sep 15, 2014 at 11:08 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>> On Mon, Sep 15, 2014 at 6:07 AM, Tom Herbert <therbert@google.com> wrote:
>>> [...]
>>>> * Notes
>>>>   - This patch set does not implement GSO for FOU. The UDP encapsulation
>>>>     code assumes TEB, so that will need to be reimplemented.
>>>
>>> Can you please clarify this point little further? Specifically, today
>>> few NICs are
>>> advertizing NETIF_F_GSO_UDP_TUNNEL when they are practically GSO
>>> capable only w.r.t to VXLAN. What happens when such NIC expose this
>>> cap and a large guest frame goes through GRE over UDP or alike tunneling?
>>>
>> My interpretation is that NETIF_F_GSO_UDP_TUNNEL means L3/L4
>> encapsulation over UDP, not VXLAN. If the NIC implements things
>> properly following the generic interface then I believe it should work
>> with various flavors of UDP encapsulation (FOU, GUE, VXLAN, VXLAN-gpe,
>> geneve, LISP, L2TP, nvgre, or whatever else people might dream up).
>> This presumes that any encapsulation headers doesn't require any per
>> segment update (so no GRE csum for instance). The stack will set up
>> inner headers as needed, which should enough to provide to devices the
>> offsets inner IP and TCP header which are needed for the the TSO
>> operation (outer IP and UDP can be deduced also).
>
> From the NICs that I am familiar with this is mostly true. The main
> part that is missing from the current implementation is a length
> limit: just because the hardware can skip over headers to an offset
> doesn't mean that it can do so to an arbitrary depth. For example, in
> the NICs that are exposing VXLAN as NETIF_F_GSO_UDP_TUNNEL we can
> probably assume that this is limited to 8 bytes. With the Intel NICs
> that were just announced with Geneve support, this limit has been
> increased to 64. If we add a parameter to the driver interface to
> expose this then it should be generic across tunnels.

Sounds reasonable, although I think you'll need to define precisely
what length refers to.

Tom

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: foo-over-udp (fou)
  2014-09-15 22:59       ` Tom Herbert
@ 2014-09-16  0:15         ` Jesse Gross
  0 siblings, 0 replies; 33+ messages in thread
From: Jesse Gross @ 2014-09-16  0:15 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Or Gerlitz, David Miller, Linux Netdev List

On Mon, Sep 15, 2014 at 3:59 PM, Tom Herbert <therbert@google.com> wrote:
> On Mon, Sep 15, 2014 at 3:44 PM, Jesse Gross <jesse@nicira.com> wrote:
>> On Mon, Sep 15, 2014 at 12:15 PM, Tom Herbert <therbert@google.com> wrote:
>>> On Mon, Sep 15, 2014 at 11:08 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>> On Mon, Sep 15, 2014 at 6:07 AM, Tom Herbert <therbert@google.com> wrote:
>>>> [...]
>>>>> * Notes
>>>>>   - This patch set does not implement GSO for FOU. The UDP encapsulation
>>>>>     code assumes TEB, so that will need to be reimplemented.
>>>>
>>>> Can you please clarify this point little further? Specifically, today
>>>> few NICs are
>>>> advertizing NETIF_F_GSO_UDP_TUNNEL when they are practically GSO
>>>> capable only w.r.t to VXLAN. What happens when such NIC expose this
>>>> cap and a large guest frame goes through GRE over UDP or alike tunneling?
>>>>
>>> My interpretation is that NETIF_F_GSO_UDP_TUNNEL means L3/L4
>>> encapsulation over UDP, not VXLAN. If the NIC implements things
>>> properly following the generic interface then I believe it should work
>>> with various flavors of UDP encapsulation (FOU, GUE, VXLAN, VXLAN-gpe,
>>> geneve, LISP, L2TP, nvgre, or whatever else people might dream up).
>>> This presumes that any encapsulation headers doesn't require any per
>>> segment update (so no GRE csum for instance). The stack will set up
>>> inner headers as needed, which should enough to provide to devices the
>>> offsets inner IP and TCP header which are needed for the the TSO
>>> operation (outer IP and UDP can be deduced also).
>>
>> From the NICs that I am familiar with this is mostly true. The main
>> part that is missing from the current implementation is a length
>> limit: just because the hardware can skip over headers to an offset
>> doesn't mean that it can do so to an arbitrary depth. For example, in
>> the NICs that are exposing VXLAN as NETIF_F_GSO_UDP_TUNNEL we can
>> probably assume that this is limited to 8 bytes. With the Intel NICs
>> that were just announced with Geneve support, this limit has been
>> increased to 64. If we add a parameter to the driver interface to
>> expose this then it should be generic across tunnels.
>
> Sounds reasonable, although I think you'll need to define precisely
> what length refers to.

I agree, the definition is important.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: foo-over-udp (fou)
  2014-09-15 22:44     ` Jesse Gross
  2014-09-15 22:59       ` Tom Herbert
@ 2014-09-16 12:44       ` Or Gerlitz
  2014-09-16 18:34         ` Tom Herbert
  1 sibling, 1 reply; 33+ messages in thread
From: Or Gerlitz @ 2014-09-16 12:44 UTC (permalink / raw)
  To: Jesse Gross, Tom Herbert; +Cc: David Miller, Linux Netdev List

On Tue, Sep 16, 2014 at 1:44 AM, Jesse Gross <jesse@nicira.com> wrote:
> On Mon, Sep 15, 2014 at 12:15 PM, Tom Herbert <therbert@google.com> wrote:

>> My interpretation is that NETIF_F_GSO_UDP_TUNNEL means L3/L4
>> encapsulation over UDP, not VXLAN.
>> If the NIC implements things properly following the generic interface then I believe it should work
>> with various flavors of UDP encapsulation (FOU, GUE, VXLAN, VXLAN-gpe,
>> geneve, LISP, L2TP, nvgre, or whatever else people might dream up).
>> This presumes that any encapsulation headers doesn't require any per
>> segment update (so no GRE csum for instance). The stack will set up
>> inner headers as needed, which should enough to provide to devices the
>> offsets inner IP and TCP header which are needed for the the TSO
>> operation (outer IP and UDP can be deduced also).



> From the NICs that I am familiar with this is mostly true. The main
> part that is missing from the current implementation is a length
> limit: just because the hardware can skip over headers to an offset
> doesn't mean that it can do so to an arbitrary depth. For example, in
> the NICs that are exposing VXLAN as NETIF_F_GSO_UDP_TUNNEL we can
> probably assume that this is limited to 8 bytes. With the Intel NICs
> that were just announced with Geneve support, this limit has been
> increased to 64. If we add a parameter to the driver interface to
> expose this then it should be generic across tunnels.

I'm not sure I see why the length limit became our primary concern here...

The fact is that we have a nice set of NIC drivers in the kernel that
do advertise the GSO_UDP_TUNNEL feature, but their HW isn't capable of
segmenting all of FOU, GUE, VXLAN, VXLAN-gpe, geneve, LISP, L2TP,
nvgre, or whatever else people might dream up, right?

So we need to fix that and let each NIC properly advertise up to the
stack what it can segment in HW and what it can't, which means the
networking code would have to do the segmentation in SW for (say) a
64KB guest TCP segment that just went through such an encapsulation.

As long as Linux didn't support any UDP encapsulation other than
VXLAN this worked, but it will soon be too easily broken, and I vote
for the fix to be part of the FOU series, so that the kernel is still
functional once the series is applied...

Even if the encapsulation headers need no per-segment update (and
they always do, e.g. the IP ID field of the outer IP header), certain
HW may still not be able to do TCP segmentation under any
encapsulation scheme.

And in that respect, I am not sure I follow the "if the NIC
implements things properly following the generic interface" comment.

Or.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: foo-over-udp (fou)
  2014-09-15 19:15   ` Tom Herbert
  2014-09-15 22:44     ` Jesse Gross
@ 2014-09-16 13:35     ` Or Gerlitz
  2014-09-16 15:00       ` Tom Herbert
  1 sibling, 1 reply; 33+ messages in thread
From: Or Gerlitz @ 2014-09-16 13:35 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, Linux Netdev List

On Mon, Sep 15, 2014 at 10:15 PM, Tom Herbert <therbert@google.com> wrote:
> On Mon, Sep 15, 2014 at 11:08 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>> On Mon, Sep 15, 2014 at 6:07 AM, Tom Herbert <therbert@google.com> wrote:

>>>   - I really don't expect/want devices to have special support for any
>>>     of this. Generic checksum offload mechanisms (NETIF_HW_CSUM
>>>     and use of CHECKSUM_COMPLETE) should be sufficient. RSS and flow
>>>     steering is provided by commonly implemented UDP hashing. GRO/GSO
>>>     seem fairly comparable with LRO/TSO already.

>> Again, today NICs are advertizing checksum offloads capability in
>> enc_hw_features but aren't capable to compute (say) the TCP checksum
>> of the inner packet regardless of which actual tunneling is used (e.g VXLAN vs
>> GRE), a bit inconsistent?

> I doubt this is true of all NICs! For instance, a NIC that implements
> NETIF_F_HW_CSUM should have no problem computing an encapsulated
> checksums in just about any scenario.

The comment for NETIF_F_HW_CSUM says "Can checksum all the packets"
-- so your interpretation is that NICs supporting that will always
report CHECKSUM_COMPLETE, OK. But there are a bunch of 10/40Gb/s NIC
drivers that don't report the HW_CSUM bit in either ->features or
->hw_enc_features; the system should act in a manner that supports
them.

> Both, checksum offload and TSO
> can be supported for arbitrary flavors of UDP encapsulation if NICs
> use protocol agnostic means as opposed to protocol specific means that
> require a lot of parsing the packets themselves. Look at the long
> standing comments in sk_buff about why protocol specific methods like
> CHECKSUM_UNNECESSARY and NETIF_F_IP_CSUM are bad ideas. With the
> emergence of encapsulation these are now *really* bad ideas!

So we go and throw away the HW?

Or.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: foo-over-udp (fou)
  2014-09-16 13:35     ` Or Gerlitz
@ 2014-09-16 15:00       ` Tom Herbert
  2014-09-16 20:04         ` Or Gerlitz
  0 siblings, 1 reply; 33+ messages in thread
From: Tom Herbert @ 2014-09-16 15:00 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: David Miller, Linux Netdev List

On Tue, Sep 16, 2014 at 6:35 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Mon, Sep 15, 2014 at 10:15 PM, Tom Herbert <therbert@google.com> wrote:
>> On Mon, Sep 15, 2014 at 11:08 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>> On Mon, Sep 15, 2014 at 6:07 AM, Tom Herbert <therbert@google.com> wrote:
>
>>>>   - I really don't expect/want devices to have special support for any
>>>>     of this. Generic checksum offload mechanisms (NETIF_HW_CSUM
>>>>     and use of CHECKSUM_COMPLETE) should be sufficient. RSS and flow
>>>>     steering is provided by commonly implemented UDP hashing. GRO/GSO
>>>>     seem fairly comparable with LRO/TSO already.
>
>>> Again, today NICs are advertizing checksum offloads capability in
>>> enc_hw_features but aren't capable to compute (say) the TCP checksum
>>> of the inner packet regardless of which actual tunneling is used (e.g VXLAN vs
>>> GRE), a bit inconsistent?
>
>> I doubt this is true of all NICs! For instance, a NIC that implements
>> NETIF_F_HW_CSUM should have no problem computing an encapsulated
>> checksums in just about any scenario.
>
> The comment for NETIF_F_HW_CSUM says "Can checksum all the packets" --
> so your interpretation is that NICs supporting that will always report
> CHECKSUM_COMPLETE, OK. But there are bunch of 10/40Gbs NIC drivers
> that don't report the HW_CSUM bit in neither of the ->features and
> ->hw_enc_features, the system should act in a manner that supports them.
>
>> Both, checksum offload and TSO
>> can be supported for arbitrary flavors of UDP encapsulation if NICs
>> use protocol agnostic means as opposed to protocol specific means that
>> require a lot of parsing the packets themselves. Look at the long
>> standing comments in sk_buff about why protocol specific methods like
>> CHECKSUM_UNNECESSARY and NETIF_F_IP_CSUM are bad ideas. With the
>> emergence of encapsulation these are now *really* bad ideas!
>
> So we go and throw away the HW?
>
No, and this is exactly the point! I shouldn't have to throw away
all my deployed hardware just to get support for the latest
encapsulation protocol du jour. Fortunately, we'll be able to code
around most of the limitations of deployed NICs that don't implement
generic mechanisms (with UDP RSS, checksum conversions, and remote
checksum offload). But if you're contemplating a new NIC, *please*
consider implementing generic, protocol-agnostic mechanisms.

> Or.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: foo-over-udp (fou)
  2014-09-16 12:44       ` Or Gerlitz
@ 2014-09-16 18:34         ` Tom Herbert
  2014-09-16 19:14           ` Or Gerlitz
  0 siblings, 1 reply; 33+ messages in thread
From: Tom Herbert @ 2014-09-16 18:34 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Jesse Gross, David Miller, Linux Netdev List

On Tue, Sep 16, 2014 at 5:44 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Tue, Sep 16, 2014 at 1:44 AM, Jesse Gross <jesse@nicira.com> wrote:
>> On Mon, Sep 15, 2014 at 12:15 PM, Tom Herbert <therbert@google.com> wrote:
>
>>> My interpretation is that NETIF_F_GSO_UDP_TUNNEL means L3/L4
>>> encapsulation over UDP, not VXLAN.
>>> If the NIC implements things properly following the generic interface then I believe it should work
>>> with various flavors of UDP encapsulation (FOU, GUE, VXLAN, VXLAN-gpe,
>>> geneve, LISP, L2TP, nvgre, or whatever else people might dream up).
>>> This presumes that any encapsulation headers doesn't require any per
>>> segment update (so no GRE csum for instance). The stack will set up
>>> inner headers as needed, which should enough to provide to devices the
>>> offsets inner IP and TCP header which are needed for the the TSO
>>> operation (outer IP and UDP can be deduced also).
>
>
>
>> From the NICs that I am familiar with this is mostly true. The main
>> part that is missing from the current implementation is a length
>> limit: just because the hardware can skip over headers to an offset
>> doesn't mean that it can do so to an arbitrary depth. For example, in
>> the NICs that are exposing VXLAN as NETIF_F_GSO_UDP_TUNNEL we can
>> probably assume that this is limited to 8 bytes. With the Intel NICs
>> that were just announced with Geneve support, this limit has been
>> increased to 64. If we add a parameter to the driver interface to
>> expose this then it should be generic across tunnels.
>
> I'm not sure to see why the length limit became our primary concern here...
>
Like Jesse mentioned above, it looks like some NICs may have assumed
all encapsulation headers are eight bytes (which allows HW to
implement everything with fixed offsets). But this length is not a
universal constant: FOU has a zero-length encapsulation header, and
GUE or geneve headers are variable. The driver should really be
checking whether the NIC can handle the length and, if it can't,
perform GSO in software -- I don't think we'll need to expose this in
the features.
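
A rough sketch of that driver-side fallback (example_xmit_one and
EXAMPLE_MAX_ENCAP_HLEN stand in for a real driver's transmit helper
and hardware limit; this is not code from any existing driver):

#define EXAMPLE_MAX_ENCAP_HLEN	8	/* bytes after the outer UDP header */

static netdev_tx_t example_xmit_one(struct sk_buff *skb,
				    struct net_device *dev);

static netdev_tx_t example_xmit(struct sk_buff *skb, struct net_device *dev)
{
	if (skb_is_gso(skb) && skb->encapsulation) {
		unsigned int encap_hlen = skb_inner_mac_header(skb) -
					  skb_transport_header(skb) -
					  sizeof(struct udphdr);

		if (encap_hlen > EXAMPLE_MAX_ENCAP_HLEN) {
			struct sk_buff *segs;

			/* HW can't parse this deep: segment in software
			 * and push the segments through the normal path.
			 */
			segs = skb_gso_segment(skb, dev->features &
					       ~NETIF_F_GSO_MASK);
			if (IS_ERR_OR_NULL(segs)) {
				dev_kfree_skb_any(skb);
				return NETDEV_TX_OK;
			}
			consume_skb(skb);

			while (segs) {
				struct sk_buff *next = segs->next;

				segs->next = NULL;
				example_xmit_one(segs, dev);
				segs = next;
			}
			return NETDEV_TX_OK;
		}
	}

	return example_xmit_one(skb, dev);
}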

> Fact is that we have nice set of NICs drivers in the kernel that do advertize
> the GSO_UDP_TUNNEL feature but their HW isn't capable to segment all of:
> FOU, GUE, VXLAN, VXLAN-gpe, geneve, LISP, L2TP, nvgre, or whatever
> else people might dream up, right?
>
> So we need to fix that and let each NIC properly advertize up to the
> stack what they
> can segment in HW and what not which means that networking code would have to
> do that in SW for (say) 64KB guest TCP segment that just went through
> this encapsulation.
>
> As long as Linux didn't support any UDP encapsulation other then VXLAN
> it worked,
> but soon will too easily broken, and I vote for the fix to be part of
> the FOU series, so we have
> the kernel functional also once it applied...
>
> Even if the encapsulated headers need no update per segment (and they always
> do, e.g the IP ID field of the outer IP header) still a certain HW may
> not be able
> to do TCP segmentation under any encapsulation scheme.
>
> And in that respect, I am not sure to follow on the " If the NIC
> implements things properly
> following the generic interface" comment.
>
> Or.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: foo-over-udp (fou)
  2014-09-16 18:34         ` Tom Herbert
@ 2014-09-16 19:14           ` Or Gerlitz
  2014-09-16 20:31             ` Tom Herbert
  0 siblings, 1 reply; 33+ messages in thread
From: Or Gerlitz @ 2014-09-16 19:14 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Jesse Gross, David Miller, Linux Netdev List

On Tue, Sep 16, 2014 at 9:34 PM, Tom Herbert <therbert@google.com> wrote:
> On Tue, Sep 16, 2014 at 5:44 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>> On Tue, Sep 16, 2014 at 1:44 AM, Jesse Gross <jesse@nicira.com> wrote:
>>> On Mon, Sep 15, 2014 at 12:15 PM, Tom Herbert <therbert@google.com> wrote:
>>
>>>> My interpretation is that NETIF_F_GSO_UDP_TUNNEL means L3/L4
>>>> encapsulation over UDP, not VXLAN.
>>>> If the NIC implements things properly following the generic interface then I believe it should work
>>>> with various flavors of UDP encapsulation (FOU, GUE, VXLAN, VXLAN-gpe,
>>>> geneve, LISP, L2TP, nvgre, or whatever else people might dream up).
>>>> This presumes that any encapsulation headers doesn't require any per
>>>> segment update (so no GRE csum for instance). The stack will set up
>>>> inner headers as needed, which should enough to provide to devices the
>>>> offsets inner IP and TCP header which are needed for the the TSO
>>>> operation (outer IP and UDP can be deduced also).
>>
>>
>>
>>> From the NICs that I am familiar with this is mostly true. The main
>>> part that is missing from the current implementation is a length
>>> limit: just because the hardware can skip over headers to an offset
>>> doesn't mean that it can do so to an arbitrary depth. For example, in
>>> the NICs that are exposing VXLAN as NETIF_F_GSO_UDP_TUNNEL we can
>>> probably assume that this is limited to 8 bytes. With the Intel NICs
>>> that were just announced with Geneve support, this limit has been
>>> increased to 64. If we add a parameter to the driver interface to
>>> expose this then it should be generic across tunnels.
>>
>> I'm not sure to see why the length limit became our primary concern here...

> Like Jesse mentioned above, looks like some NICs may have assumed all
> encapsulation headers are eight bytes (which allows HW to implement
> everything in fixed offsets). But this length is not a universal
> constant: FOU is zero length encapsulation headers, GUE or geneve is
> variable. The driver should really be checking if NIC can handle the
> length and if it can't perform GSO in software-- I don't think we'll
> need to expose this in the features.

I understand that for some NICs the claim is that the essence of the
limitation lies in an assumption of a fixed length for the
encapsulation headers -- and BTW for VXLAN that's 50 (= 14 + 20 + 8 +
8) bytes, not eight. So newer NICs or new brands of existing NICs
should be more flexible.

If I correctly read your comment "the driver should really be
checking whether the NIC can handle the length and, if it can't,
perform GSO in software" as saying that a SW GSO call should be made
from within the driver when it can't serve GSO under some encap
scheme -- I don't think this is the correct track; the driver should
advertise up to the stack what it can do in HW so that the stack does
in SW what's not supported.

Another clarification -- so FOU doesn't supersede GUE? What's the
difference between them...?

Or.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: foo-over-udp (fou)
  2014-09-16 15:00       ` Tom Herbert
@ 2014-09-16 20:04         ` Or Gerlitz
  2014-09-16 20:16           ` David Miller
  2014-09-16 20:35           ` Tom Herbert
  0 siblings, 2 replies; 33+ messages in thread
From: Or Gerlitz @ 2014-09-16 20:04 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, Linux Netdev List

On Tue, Sep 16, 2014 at 6:00 PM, Tom Herbert <therbert@google.com> wrote:
> [...] Fortunately, we'll be able to code
> around mosts of the limitations of deployed NICs that don't implement
> generic mechanisms (with UDP RSS, checksum conversions,  remote
> checksum offload) [...]

So UDP RSS is a clear requirement, and NICs have it by now.

Re checksum conversions, I assume you mean the ability on the RX path
to report CHECKSUM_COMPLETE on any sort of IP packet potentially
having multiple encapsulations, right? How would you phrase (and
model) the other way around, e.g. what is the generic requirement on
the TX path w.r.t. checksum offload?

And I wasn't sure what you mean by remote checksum offload -- can you
clarify?

Or.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: foo-over-udp (fou)
  2014-09-16 20:04         ` Or Gerlitz
@ 2014-09-16 20:16           ` David Miller
  2014-09-17 15:22             ` Or Gerlitz
  2014-09-16 20:35           ` Tom Herbert
  1 sibling, 1 reply; 33+ messages in thread
From: David Miller @ 2014-09-16 20:16 UTC (permalink / raw)
  To: gerlitz.or; +Cc: therbert, netdev

From: Or Gerlitz <gerlitz.or@gmail.com>
Date: Tue, 16 Sep 2014 23:04:30 +0300

> And I wasn't sure what do you mean by remote checksum offload, can
> you clarify?

http://vger.kernel.org/encapsulation_offloads.pdf

Page 14.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: foo-over-udp (fou)
  2014-09-16 19:14           ` Or Gerlitz
@ 2014-09-16 20:31             ` Tom Herbert
  0 siblings, 0 replies; 33+ messages in thread
From: Tom Herbert @ 2014-09-16 20:31 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Jesse Gross, David Miller, Linux Netdev List

On Tue, Sep 16, 2014 at 12:14 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Tue, Sep 16, 2014 at 9:34 PM, Tom Herbert <therbert@google.com> wrote:
>> On Tue, Sep 16, 2014 at 5:44 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>> On Tue, Sep 16, 2014 at 1:44 AM, Jesse Gross <jesse@nicira.com> wrote:
>>>> On Mon, Sep 15, 2014 at 12:15 PM, Tom Herbert <therbert@google.com> wrote:
>>>
>>>>> My interpretation is that NETIF_F_GSO_UDP_TUNNEL means L3/L4
>>>>> encapsulation over UDP, not VXLAN.
>>>>> If the NIC implements things properly following the generic interface then I believe it should work
>>>>> with various flavors of UDP encapsulation (FOU, GUE, VXLAN, VXLAN-gpe,
>>>>> geneve, LISP, L2TP, nvgre, or whatever else people might dream up).
>>>>> This presumes that any encapsulation headers doesn't require any per
>>>>> segment update (so no GRE csum for instance). The stack will set up
>>>>> inner headers as needed, which should enough to provide to devices the
>>>>> offsets inner IP and TCP header which are needed for the the TSO
>>>>> operation (outer IP and UDP can be deduced also).
>>>
>>>
>>>
>>>> From the NICs that I am familiar with this is mostly true. The main
>>>> part that is missing from the current implementation is a length
>>>> limit: just because the hardware can skip over headers to an offset
>>>> doesn't mean that it can do so to an arbitrary depth. For example, in
>>>> the NICs that are exposing VXLAN as NETIF_F_GSO_UDP_TUNNEL we can
>>>> probably assume that this is limited to 8 bytes. With the Intel NICs
>>>> that were just announced with Geneve support, this limit has been
>>>> increased to 64. If we add a parameter to the driver interface to
>>>> expose this then it should be generic across tunnels.
>>>
>>> I'm not sure to see why the length limit became our primary concern here...
>
>> Like Jesse mentioned above, looks like some NICs may have assumed all
>> encapsulation headers are eight bytes (which allows HW to implement
>> everything in fixed offsets). But this length is not a universal
>> constant: FOU is zero length encapsulation headers, GUE or geneve is
>> variable. The driver should really be checking if NIC can handle the
>> length and if it can't perform GSO in software-- I don't think we'll
>> need to expose this in the features.
>
> I understand that for some NICs there's a claim saying the essence of
> the limitation lies in an assumption on fixed length of the
> encapsulation headers  -- and BTW for VXLAN it's 50 (= 14 + 20 + 8 +
> 8) bytes, not eight. So newer NICs  or new brands of existing NICs
> should be more flexible.
>
> If I correctly read your comment "The driver should really be checking
> if NIC can handle the length and if it can't perform GSO in software"
> as saying that a SW GSO call should be made from within the driver
> when they can't serve GSO under some encap scheme -- I don't think
> this is the correct track, the driver should advertize up what they
> can do in HW so the stack does in SW what's not supported.
>
The problem is that it is likely impractical for drivers to advertise
all possible constraints of their HW. Right now we have the feature
flags, but that is very limited and we really can't afford to add a
new value for every permutation. There are just too many dimensions.
Some devices might expect fixed-length headers; some might not, but
might have other length constraints. Some may be perfectly happy with
v4/v4 but choke on combinations with IPv6. Others may not handle IP
options or extension headers, while some might be okay with them.
Some devices might not like certain packet layouts, etc.

There is precedent for the driver punting to software mechanisms when
it can't handle something. For instance, cxgb and veth call
skb_checksum_help for resolving UDP checksums, myri10ge and the
marvell controller call it for headers that are too large, and
gianfar calls it for some errata condition...

If we can't do GSO from within the drivers, then another alternative
would be to add an ndo_gso_check function to call when the stack is
deciding whether to do SW GSO.
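
A speculative sketch of that alternative (no such callback exists at
the time of this thread; the ndo_gso_check name and signature here
are assumptions):

/* Speculative: assumes a new net_device_ops member along the lines of
 *	bool (*ndo_gso_check)(struct sk_buff *skb, struct net_device *dev);
 * returning false when the device cannot segment this particular skb.
 */
static bool netif_needs_sw_gso(struct sk_buff *skb, struct net_device *dev)
{
	if (!skb_is_gso(skb))
		return false;

	if (dev->netdev_ops->ndo_gso_check &&
	    !dev->netdev_ops->ndo_gso_check(skb, dev))
		return true;	/* device vetoed: fall back to SW GSO */

	return false;
}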

> Another clarification - so FOU doesn't supersedes GUE? what's their
> difference...?
>
FOU is direct encapsulation of IP protocol packets in the UDP
payload. GUE adds an encapsulation header between the UDP header and
the encapsulated IP protocol packet:
http://tools.ietf.org/html/draft-herbert-gue-01
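
On the wire the difference looks roughly like this (field sizes
omitted; per the draft, the GUE header carries the inner protocol
number plus optional fields):

        FOU                          GUE
  +--------------+            +--------------+
  |   IPv4 hdr   |            |   IPv4 hdr   |
  +--------------+            +--------------+
  |   UDP hdr    |            |   UDP hdr    |
  +--------------+            +--------------+
  | encapsulated |            |   GUE hdr    |
  |  IP packet   |            +--------------+
  +--------------+            | encapsulated |
                              |  IP packet   |
                              +--------------+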


Tom

> Or.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: foo-over-udp (fou)
  2014-09-16 20:04         ` Or Gerlitz
  2014-09-16 20:16           ` David Miller
@ 2014-09-16 20:35           ` Tom Herbert
  1 sibling, 0 replies; 33+ messages in thread
From: Tom Herbert @ 2014-09-16 20:35 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: David Miller, Linux Netdev List

On Tue, Sep 16, 2014 at 1:04 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Tue, Sep 16, 2014 at 6:00 PM, Tom Herbert <therbert@google.com> wrote:
>> [...] Fortunately, we'll be able to code
>> around mosts of the limitations of deployed NICs that don't implement
>> generic mechanisms (with UDP RSS, checksum conversions,  remote
>> checksum offload) [...]
>
> So UDP RSS is a clear requirement and NICs have it by now.
>
> Re checksum conversions, I assume you mean the ability on the RX path
> to report CHECKSUM_COMPLETE on any sort of IP packet potentially
> having multiple encapsulations, right? how would you phrase (and
> model) the other way around, e.g what is the generic requirement on
> the TX path w.r.t checksum offload?
>
> And I wasn't sure what do you mean by remote checksum offload, can you clarify?
>
This is the part intended to solve TX checksum offload for
encapsulation on "dumb" devices. Unlike checksum conversion, this one
requires a bit of protocol support, so we need an extensible
encapsulation header.

http://tools.ietf.org/html/draft-herbert-remotecsumoffload-00

> Or.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 net-next 0/7] net: foo-over-udp (fou)
  2014-09-16 20:16           ` David Miller
@ 2014-09-17 15:22             ` Or Gerlitz
  0 siblings, 0 replies; 33+ messages in thread
From: Or Gerlitz @ 2014-09-17 15:22 UTC (permalink / raw)
  To: David Miller; +Cc: Tom Herbert, Linux Netdev List

On Tue, Sep 16, 2014 at 11:16 PM, David Miller <davem@davemloft.net> wrote:
> From: Or Gerlitz <gerlitz.or@gmail.com>
> Date: Tue, 16 Sep 2014 23:04:30 +0300
>
>> And I wasn't sure what do you mean by remote checksum offload, can you clarify?

> http://vger.kernel.org/encapsulation_offloads.pdf
> Page 14.

Oh, thanks for the pointer, will take a look.

Or.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2014-09-17 15:22 UTC | newest]

Thread overview: 33+ messages
2014-09-15  3:07 [PATCH v2 net-next 0/7] net: foo-over-udp (fou) Tom Herbert
2014-09-15  3:07 ` [PATCH v2 net-next 1/7] net: Export inet_offloads and inet6_offloads Tom Herbert
2014-09-15 13:33   ` Or Gerlitz
2014-09-15 15:13     ` Tom Herbert
2014-09-15 17:15       ` Or Gerlitz
2014-09-15 17:32         ` Tom Herbert
2014-09-15 18:21   ` David Miller
2014-09-15  3:07 ` [PATCH v2 net-next 2/7] fou: Support for foo-over-udp RX path Tom Herbert
2014-09-15  3:07 ` [PATCH v2 net-next 3/7] fou: Add GRO support Tom Herbert
2014-09-15 15:00   ` Or Gerlitz
2014-09-15 15:10     ` Tom Herbert
2014-09-15 17:03       ` Or Gerlitz
2014-09-15 17:21         ` Tom Herbert
2014-09-15 18:22   ` David Miller
2014-09-15  3:08 ` [PATCH v2 net-next 4/7] net: Changes to ip_tunnel to support foo-over-udp encapsulation Tom Herbert
2014-09-15  3:08 ` [PATCH v2 net-next 5/7] sit: TX path for sit/UDP " Tom Herbert
2014-09-15  3:08 ` [PATCH v2 net-next 6/7] ipip: TX path for IPIP/UDP " Tom Herbert
2014-09-15  3:08 ` [PATCH v2 net-next 7/7] gre: TX path for GRE/UDP " Tom Herbert
2014-09-15 18:08 ` [PATCH v2 net-next 0/7] net: foo-over-udp (fou) Or Gerlitz
2014-09-15 19:15   ` Tom Herbert
2014-09-15 22:44     ` Jesse Gross
2014-09-15 22:59       ` Tom Herbert
2014-09-16  0:15         ` Jesse Gross
2014-09-16 12:44       ` Or Gerlitz
2014-09-16 18:34         ` Tom Herbert
2014-09-16 19:14           ` Or Gerlitz
2014-09-16 20:31             ` Tom Herbert
2014-09-16 13:35     ` Or Gerlitz
2014-09-16 15:00       ` Tom Herbert
2014-09-16 20:04         ` Or Gerlitz
2014-09-16 20:16           ` David Miller
2014-09-17 15:22             ` Or Gerlitz
2014-09-16 20:35           ` Tom Herbert
