All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net v2 0/3] Tunneling fixes
@ 2016-03-19 16:31 Jesse Gross
  2016-03-19 16:32 ` [PATCH net v2 1/3] ipip: Properly mark ipip GRO packets as encapsulated Jesse Gross
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Jesse Gross @ 2016-03-19 16:31 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

This series fixes a problem that was reported where encapsulated packets
do not have their encapsulation offload markers stripped off when being
decapsulated. This causes a significant performance drop if the packets
are later retransmitted.

Fixing this revealed two other bugs which are also addressed as prerequisites:
 * GRO can aggregate packets for multiple layers of encapsulation which the
   stack cannot properly handle.
 * IPIP packets which are combined by GRO are not marked properly with their
   GSO type.

Note that this is based off the net-next tree as the current target for
bug fixes.

v2: No code changes, just additional information in commit messages and
    a new cover letter.

Jesse Gross (3):
  ipip: Properly mark ipip GRO packets as encapsulated.
  tunnels: Don't apply GRO to multiple layers of encapsulation.
  tunnels: Remove encapsulation offloads on decap.

 include/linux/netdevice.h |  4 ++--
 include/net/ip_tunnels.h  | 16 ++++++++++++++++
 net/core/dev.c            |  2 +-
 net/ipv4/af_inet.c        | 24 ++++++++++++++++++++++--
 net/ipv4/fou.c            | 13 +++++++++++--
 net/ipv4/gre_offload.c    |  5 +++++
 net/ipv4/ip_tunnel_core.c |  3 ++-
 net/ipv4/udp_offload.c    |  6 +++---
 net/ipv6/ip6_offload.c    | 15 ++++++++++++++-
 net/ipv6/sit.c            |  6 ++++--
 10 files changed, 80 insertions(+), 14 deletions(-)

-- 
2.5.0

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net v2 1/3] ipip: Properly mark ipip GRO packets as encapsulated.
  2016-03-19 16:31 [PATCH net v2 0/3] Tunneling fixes Jesse Gross
@ 2016-03-19 16:32 ` Jesse Gross
  2016-03-19 16:32 ` [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation Jesse Gross
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Jesse Gross @ 2016-03-19 16:32 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Patrick Boutilier, Eric Dumazet

ipip encapsulated packets can be merged together by GRO but the result
does not have the proper GSO type set or even marked as being
encapsulated at all. Later retransmission of these packets will likely
fail if the device does not support ipip offloads. This is similar to
the issue resolved in IPv6 sit in feec0cb3
("ipv6: gro: support sit protocol").

Reported-by: Patrick Boutilier <boutilpj@ednet.ns.ca>
Fixes: 9667e9bb ("ipip: Add gro callbacks to ipip offload")
Tested-by: Patrick Boutilier <boutilpj@ednet.ns.ca>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jesse Gross <jesse@kernel.org>
---
v2: No change.
---
 net/ipv4/af_inet.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 0cc923f..9659233 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1448,6 +1448,13 @@ out_unlock:
 	return err;
 }
 
+static int ipip_gro_complete(struct sk_buff *skb, int nhoff)
+{
+	skb->encapsulation = 1;
+	skb_shinfo(skb)->gso_type |= SKB_GSO_IPIP;
+	return inet_gro_complete(skb, nhoff);
+}
+
 int inet_ctl_sock_create(struct sock **sk, unsigned short family,
 			 unsigned short type, unsigned char protocol,
 			 struct net *net)
@@ -1676,7 +1683,7 @@ static const struct net_offload ipip_offload = {
 	.callbacks = {
 		.gso_segment	= inet_gso_segment,
 		.gro_receive	= inet_gro_receive,
-		.gro_complete	= inet_gro_complete,
+		.gro_complete	= ipip_gro_complete,
 	},
 };
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.
  2016-03-19 16:31 [PATCH net v2 0/3] Tunneling fixes Jesse Gross
  2016-03-19 16:32 ` [PATCH net v2 1/3] ipip: Properly mark ipip GRO packets as encapsulated Jesse Gross
@ 2016-03-19 16:32 ` Jesse Gross
  2016-03-26 19:41   ` Tom Herbert
  2016-03-19 16:32 ` [PATCH net v2 3/3] tunnels: Remove encapsulation offloads on decap Jesse Gross
  2016-03-20 20:33 ` [PATCH net v2 0/3] Tunneling fixes David Miller
  3 siblings, 1 reply; 18+ messages in thread
From: Jesse Gross @ 2016-03-19 16:32 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

When drivers express support for TSO of encapsulated packets, they
only mean that they can do it for one layer of encapsulation.
Supporting additional levels would mean updating, at a minimum,
more IP length fields and they are unaware of this.

No encapsulation device expresses support for handling offloaded
encapsulated packets, so we won't generate these types of frames
in the transmit path. However, GRO doesn't have a check for
multiple levels of encapsulation and will attempt to build them.

UDP tunnel GRO actually does prevent this situation but it only
handles multiple UDP tunnels stacked on top of each other. This
generalizes that solution to prevent any kind of tunnel stacking
that would cause problems.

Fixes: bf5a755f ("net-gre-gro: Add GRE support to the GRO stack")
Signed-off-by: Jesse Gross <jesse@kernel.org>
---
v2: No change.
---
 include/linux/netdevice.h |  4 ++--
 net/core/dev.c            |  2 +-
 net/ipv4/af_inet.c        | 15 ++++++++++++++-
 net/ipv4/gre_offload.c    |  5 +++++
 net/ipv4/udp_offload.c    |  6 +++---
 net/ipv6/ip6_offload.c    | 15 ++++++++++++++-
 6 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index be693b3..f9eebd5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2096,8 +2096,8 @@ struct napi_gro_cb {
 	/* This is non-zero if the packet may be of the same flow. */
 	u8	same_flow:1;
 
-	/* Used in udp_gro_receive */
-	u8	udp_mark:1;
+	/* Used in tunnel GRO receive */
+	u8	encap_mark:1;
 
 	/* GRO checksum is valid */
 	u8	csum_valid:1;
diff --git a/net/core/dev.c b/net/core/dev.c
index edb7179..43c74ca 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4438,7 +4438,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 		NAPI_GRO_CB(skb)->same_flow = 0;
 		NAPI_GRO_CB(skb)->flush = 0;
 		NAPI_GRO_CB(skb)->free = 0;
-		NAPI_GRO_CB(skb)->udp_mark = 0;
+		NAPI_GRO_CB(skb)->encap_mark = 0;
 		NAPI_GRO_CB(skb)->gro_remcsum_start = 0;
 
 		/* Setup for GRO checksum validation */
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 9659233..0fefba6 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1380,6 +1380,19 @@ out:
 	return pp;
 }
 
+static struct sk_buff **ipip_gro_receive(struct sk_buff **head,
+					 struct sk_buff *skb)
+{
+	if (NAPI_GRO_CB(skb)->encap_mark) {
+		NAPI_GRO_CB(skb)->flush = 1;
+		return NULL;
+	}
+
+	NAPI_GRO_CB(skb)->encap_mark = 1;
+
+	return inet_gro_receive(head, skb);
+}
+
 #define SECONDS_PER_DAY	86400
 
 /* inet_current_timestamp - Return IP network timestamp
@@ -1682,7 +1695,7 @@ static struct packet_offload ip_packet_offload __read_mostly = {
 static const struct net_offload ipip_offload = {
 	.callbacks = {
 		.gso_segment	= inet_gso_segment,
-		.gro_receive	= inet_gro_receive,
+		.gro_receive	= ipip_gro_receive,
 		.gro_complete	= ipip_gro_complete,
 	},
 };
diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index 540866d..dd03161 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -126,6 +126,11 @@ static struct sk_buff **gre_gro_receive(struct sk_buff **head,
 	struct packet_offload *ptype;
 	__be16 type;
 
+	if (NAPI_GRO_CB(skb)->encap_mark)
+		goto out;
+
+	NAPI_GRO_CB(skb)->encap_mark = 1;
+
 	off = skb_gro_offset(skb);
 	hlen = off + sizeof(*greh);
 	greh = skb_gro_header_fast(skb, off);
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 8a3405a..8007f73 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -311,14 +311,14 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
 	unsigned int off = skb_gro_offset(skb);
 	int flush = 1;
 
-	if (NAPI_GRO_CB(skb)->udp_mark ||
+	if (NAPI_GRO_CB(skb)->encap_mark ||
 	    (skb->ip_summed != CHECKSUM_PARTIAL &&
 	     NAPI_GRO_CB(skb)->csum_cnt == 0 &&
 	     !NAPI_GRO_CB(skb)->csum_valid))
 		goto out;
 
-	/* mark that this skb passed once through the udp gro layer */
-	NAPI_GRO_CB(skb)->udp_mark = 1;
+	/* mark that this skb passed once through the tunnel gro layer */
+	NAPI_GRO_CB(skb)->encap_mark = 1;
 
 	rcu_read_lock();
 	uo_priv = rcu_dereference(udp_offload_base);
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index eeca943..82e9f30 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -258,6 +258,19 @@ out:
 	return pp;
 }
 
+static struct sk_buff **sit_gro_receive(struct sk_buff **head,
+					struct sk_buff *skb)
+{
+	if (NAPI_GRO_CB(skb)->encap_mark) {
+		NAPI_GRO_CB(skb)->flush = 1;
+		return NULL;
+	}
+
+	NAPI_GRO_CB(skb)->encap_mark = 1;
+
+	return ipv6_gro_receive(head, skb);
+}
+
 static int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
 {
 	const struct net_offload *ops;
@@ -302,7 +315,7 @@ static struct packet_offload ipv6_packet_offload __read_mostly = {
 static const struct net_offload sit_offload = {
 	.callbacks = {
 		.gso_segment	= ipv6_gso_segment,
-		.gro_receive    = ipv6_gro_receive,
+		.gro_receive    = sit_gro_receive,
 		.gro_complete   = sit_gro_complete,
 	},
 };
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net v2 3/3] tunnels: Remove encapsulation offloads on decap.
  2016-03-19 16:31 [PATCH net v2 0/3] Tunneling fixes Jesse Gross
  2016-03-19 16:32 ` [PATCH net v2 1/3] ipip: Properly mark ipip GRO packets as encapsulated Jesse Gross
  2016-03-19 16:32 ` [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation Jesse Gross
@ 2016-03-19 16:32 ` Jesse Gross
  2016-03-20 20:33 ` [PATCH net v2 0/3] Tunneling fixes David Miller
  3 siblings, 0 replies; 18+ messages in thread
From: Jesse Gross @ 2016-03-19 16:32 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Ramu Ramamurthy

If a packet is either locally encapsulated or processed through GRO
it is marked with the offloads that it requires. However, when it is
decapsulated these tunnel offload indications are not removed. This
means that if we receive an encapsulated TCP packet, aggregate it with
GRO, decapsulate, and retransmit the resulting frame on a NIC that does
not support encapsulation, we won't be able to take advantage of hardware
offloads even though it is just a simple TCP packet at this point.

This fixes the problem by stripping off encapsulation offload indications
when packets are decapsulated.

The performance impacts of this bug are significant. In a test where a
Geneve encapsulated TCP stream is sent to a hypervisor, GRO'ed, decapsulated,
and bridged to a VM performance is improved by 60% (5Gbps->8Gbps) as a
result of avoiding unnecessary segmentation at the VM tap interface.

Reported-by: Ramu Ramamurthy <sramamur@linux.vnet.ibm.com>
Fixes: 68c33163 ("v4 GRE: Add TCP segmentation offload for GRE")
Signed-off-by: Jesse Gross <jesse@kernel.org>
---
v2: Additional performance numbers in the commit message.
---
 include/net/ip_tunnels.h  | 16 ++++++++++++++++
 net/ipv4/fou.c            | 13 +++++++++++--
 net/ipv4/ip_tunnel_core.c |  3 ++-
 net/ipv6/sit.c            |  6 ++++--
 4 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index c35dda9..56050f9 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -305,6 +305,22 @@ struct metadata_dst *iptunnel_metadata_reply(struct metadata_dst *md,
 
 struct sk_buff *iptunnel_handle_offloads(struct sk_buff *skb, int gso_type_mask);
 
+static inline int iptunnel_pull_offloads(struct sk_buff *skb)
+{
+	if (skb_is_gso(skb)) {
+		int err;
+
+		err = skb_unclone(skb, GFP_ATOMIC);
+		if (unlikely(err))
+			return err;
+		skb_shinfo(skb)->gso_type &= ~(NETIF_F_GSO_ENCAP_ALL >>
+					       NETIF_F_GSO_SHIFT);
+	}
+
+	skb->encapsulation = 0;
+	return 0;
+}
+
 static inline void iptunnel_xmit_stats(struct net_device *dev, int pkt_len)
 {
 	if (pkt_len > 0) {
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index 7804842..a0586b4 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -48,7 +48,7 @@ static inline struct fou *fou_from_sock(struct sock *sk)
 	return sk->sk_user_data;
 }
 
-static void fou_recv_pull(struct sk_buff *skb, size_t len)
+static int fou_recv_pull(struct sk_buff *skb, size_t len)
 {
 	struct iphdr *iph = ip_hdr(skb);
 
@@ -59,6 +59,7 @@ static void fou_recv_pull(struct sk_buff *skb, size_t len)
 	__skb_pull(skb, len);
 	skb_postpull_rcsum(skb, udp_hdr(skb), len);
 	skb_reset_transport_header(skb);
+	return iptunnel_pull_offloads(skb);
 }
 
 static int fou_udp_recv(struct sock *sk, struct sk_buff *skb)
@@ -68,9 +69,14 @@ static int fou_udp_recv(struct sock *sk, struct sk_buff *skb)
 	if (!fou)
 		return 1;
 
-	fou_recv_pull(skb, sizeof(struct udphdr));
+	if (fou_recv_pull(skb, sizeof(struct udphdr)))
+		goto drop;
 
 	return -fou->protocol;
+
+drop:
+	kfree_skb(skb);
+	return 0;
 }
 
 static struct guehdr *gue_remcsum(struct sk_buff *skb, struct guehdr *guehdr,
@@ -170,6 +176,9 @@ static int gue_udp_recv(struct sock *sk, struct sk_buff *skb)
 	__skb_pull(skb, sizeof(struct udphdr) + hdrlen);
 	skb_reset_transport_header(skb);
 
+	if (iptunnel_pull_offloads(skb))
+		goto drop;
+
 	return -guehdr->proto_ctype;
 
 drop:
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index d27276f..02dd990 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -114,7 +114,8 @@ int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, __be16 inner_proto,
 	skb->vlan_tci = 0;
 	skb_set_queue_mapping(skb, 0);
 	skb_scrub_packet(skb, xnet);
-	return 0;
+
+	return iptunnel_pull_offloads(skb);
 }
 EXPORT_SYMBOL_GPL(iptunnel_pull_header);
 
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index f45b8ff..8338430 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -681,14 +681,16 @@ static int ipip6_rcv(struct sk_buff *skb)
 		skb->mac_header = skb->network_header;
 		skb_reset_network_header(skb);
 		IPCB(skb)->flags = 0;
-		skb->protocol = htons(ETH_P_IPV6);
+		skb->dev = tunnel->dev;
 
 		if (packet_is_spoofed(skb, iph, tunnel)) {
 			tunnel->dev->stats.rx_errors++;
 			goto out;
 		}
 
-		__skb_tunnel_rx(skb, tunnel->dev, tunnel->net);
+		if (iptunnel_pull_header(skb, 0, htons(ETH_P_IPV6),
+		    !net_eq(tunnel->net, dev_net(tunnel->dev))))
+			goto out;
 
 		err = IP_ECN_decapsulate(iph, skb);
 		if (unlikely(err)) {
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2 0/3] Tunneling fixes
  2016-03-19 16:31 [PATCH net v2 0/3] Tunneling fixes Jesse Gross
                   ` (2 preceding siblings ...)
  2016-03-19 16:32 ` [PATCH net v2 3/3] tunnels: Remove encapsulation offloads on decap Jesse Gross
@ 2016-03-20 20:33 ` David Miller
  3 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2016-03-20 20:33 UTC (permalink / raw)
  To: jesse; +Cc: netdev

From: Jesse Gross <jesse@kernel.org>
Date: Sat, 19 Mar 2016 09:31:59 -0700

> This series fixes a problem that was reported where encapsulated packets
> do not have their encapsulation offload markers stripped off when being
> decapsulated. This causes a significant performance drop if the packets
> are later retransmitted.
> 
> Fixing this revealed two other bugs which are also addressed as prerequisites:
>  * GRO can aggregate packets for multiple layers of encapsulation which the
>    stack cannot properly handle.
>  * IPIP packets which are combined by GRO are not marked properly with their
>    GSO type.
> 
> Note that this is based off the net-next tree as the current target for
> bug fixes.
> 
> v2: No code changes, just additional information in commit messages and
>     a new cover letter.

Applied, thanks for respinning this Jesse.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.
  2016-03-19 16:32 ` [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation Jesse Gross
@ 2016-03-26 19:41   ` Tom Herbert
  2016-03-28  4:38     ` Jesse Gross
  0 siblings, 1 reply; 18+ messages in thread
From: Tom Herbert @ 2016-03-26 19:41 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, Linux Kernel Network Developers

On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross <jesse@kernel.org> wrote:
> When drivers express support for TSO of encapsulated packets, they
> only mean that they can do it for one layer of encapsulation.
> Supporting additional levels would mean updating, at a minimum,
> more IP length fields and they are unaware of this.
>
This patch completely breaks GRO for FOU and GUE.

> No encapsulation device expresses support for handling offloaded
> encapsulated packets, so we won't generate these types of frames
> in the transmit path. However, GRO doesn't have a check for
> multiple levels of encapsulation and will attempt to build them.
>
> UDP tunnel GRO actually does prevent this situation but it only
> handles multiple UDP tunnels stacked on top of each other. This
> generalizes that solution to prevent any kind of tunnel stacking
> that would cause problems.
>
GUE and FOU regularly create packets that will be both GSO UDP tunnels
and IPIP, SIT, GRE, etc. This is by design. There should be no
ambiguity in the drivers as to what this means. For instance, if
SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
driver can use ndo_features_check to validate.

So multiple levels of encapsulation with GRO is perfectly valid and I
would suggest to simply revert this patch. The one potential issue we
could have is the potential for GRO to construct a packet which is a
UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
In this case the GSO flags don't provide enough information to
distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
case). To make this clear we can set udp_mark in GRE, ipip, and sit
but *not* check for it.

Tom

> Fixes: bf5a755f ("net-gre-gro: Add GRE support to the GRO stack")
> Signed-off-by: Jesse Gross <jesse@kernel.org>
> ---
> v2: No change.
> ---
>  include/linux/netdevice.h |  4 ++--
>  net/core/dev.c            |  2 +-
>  net/ipv4/af_inet.c        | 15 ++++++++++++++-
>  net/ipv4/gre_offload.c    |  5 +++++
>  net/ipv4/udp_offload.c    |  6 +++---
>  net/ipv6/ip6_offload.c    | 15 ++++++++++++++-
>  6 files changed, 39 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index be693b3..f9eebd5 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -2096,8 +2096,8 @@ struct napi_gro_cb {
>         /* This is non-zero if the packet may be of the same flow. */
>         u8      same_flow:1;
>
> -       /* Used in udp_gro_receive */
> -       u8      udp_mark:1;
> +       /* Used in tunnel GRO receive */
> +       u8      encap_mark:1;
>
>         /* GRO checksum is valid */
>         u8      csum_valid:1;
> diff --git a/net/core/dev.c b/net/core/dev.c
> index edb7179..43c74ca 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4438,7 +4438,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
>                 NAPI_GRO_CB(skb)->same_flow = 0;
>                 NAPI_GRO_CB(skb)->flush = 0;
>                 NAPI_GRO_CB(skb)->free = 0;
> -               NAPI_GRO_CB(skb)->udp_mark = 0;
> +               NAPI_GRO_CB(skb)->encap_mark = 0;
>                 NAPI_GRO_CB(skb)->gro_remcsum_start = 0;
>
>                 /* Setup for GRO checksum validation */
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 9659233..0fefba6 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1380,6 +1380,19 @@ out:
>         return pp;
>  }
>
> +static struct sk_buff **ipip_gro_receive(struct sk_buff **head,
> +                                        struct sk_buff *skb)
> +{
> +       if (NAPI_GRO_CB(skb)->encap_mark) {
> +               NAPI_GRO_CB(skb)->flush = 1;
> +               return NULL;
> +       }
> +
> +       NAPI_GRO_CB(skb)->encap_mark = 1;
> +
> +       return inet_gro_receive(head, skb);
> +}
> +
>  #define SECONDS_PER_DAY        86400
>
>  /* inet_current_timestamp - Return IP network timestamp
> @@ -1682,7 +1695,7 @@ static struct packet_offload ip_packet_offload __read_mostly = {
>  static const struct net_offload ipip_offload = {
>         .callbacks = {
>                 .gso_segment    = inet_gso_segment,
> -               .gro_receive    = inet_gro_receive,
> +               .gro_receive    = ipip_gro_receive,
>                 .gro_complete   = ipip_gro_complete,
>         },
>  };
> diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
> index 540866d..dd03161 100644
> --- a/net/ipv4/gre_offload.c
> +++ b/net/ipv4/gre_offload.c
> @@ -126,6 +126,11 @@ static struct sk_buff **gre_gro_receive(struct sk_buff **head,
>         struct packet_offload *ptype;
>         __be16 type;
>
> +       if (NAPI_GRO_CB(skb)->encap_mark)
> +               goto out;
> +
> +       NAPI_GRO_CB(skb)->encap_mark = 1;
> +
>         off = skb_gro_offset(skb);
>         hlen = off + sizeof(*greh);
>         greh = skb_gro_header_fast(skb, off);
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 8a3405a..8007f73 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -311,14 +311,14 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
>         unsigned int off = skb_gro_offset(skb);
>         int flush = 1;
>
> -       if (NAPI_GRO_CB(skb)->udp_mark ||
> +       if (NAPI_GRO_CB(skb)->encap_mark ||
>             (skb->ip_summed != CHECKSUM_PARTIAL &&
>              NAPI_GRO_CB(skb)->csum_cnt == 0 &&
>              !NAPI_GRO_CB(skb)->csum_valid))
>                 goto out;
>
> -       /* mark that this skb passed once through the udp gro layer */
> -       NAPI_GRO_CB(skb)->udp_mark = 1;
> +       /* mark that this skb passed once through the tunnel gro layer */
> +       NAPI_GRO_CB(skb)->encap_mark = 1;
>
>         rcu_read_lock();
>         uo_priv = rcu_dereference(udp_offload_base);
> diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
> index eeca943..82e9f30 100644
> --- a/net/ipv6/ip6_offload.c
> +++ b/net/ipv6/ip6_offload.c
> @@ -258,6 +258,19 @@ out:
>         return pp;
>  }
>
> +static struct sk_buff **sit_gro_receive(struct sk_buff **head,
> +                                       struct sk_buff *skb)
> +{
> +       if (NAPI_GRO_CB(skb)->encap_mark) {
> +               NAPI_GRO_CB(skb)->flush = 1;
> +               return NULL;
> +       }
> +
> +       NAPI_GRO_CB(skb)->encap_mark = 1;
> +
> +       return ipv6_gro_receive(head, skb);
> +}
> +
>  static int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
>  {
>         const struct net_offload *ops;
> @@ -302,7 +315,7 @@ static struct packet_offload ipv6_packet_offload __read_mostly = {
>  static const struct net_offload sit_offload = {
>         .callbacks = {
>                 .gso_segment    = ipv6_gso_segment,
> -               .gro_receive    = ipv6_gro_receive,
> +               .gro_receive    = sit_gro_receive,
>                 .gro_complete   = sit_gro_complete,
>         },
>  };
> --
> 2.5.0
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.
  2016-03-26 19:41   ` Tom Herbert
@ 2016-03-28  4:38     ` Jesse Gross
  2016-03-28 16:31       ` Tom Herbert
  2016-03-28 19:33       ` David Miller
  0 siblings, 2 replies; 18+ messages in thread
From: Jesse Gross @ 2016-03-28  4:38 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, Linux Kernel Network Developers

On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert <tom@herbertland.com> wrote:
> On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross <jesse@kernel.org> wrote:
>> When drivers express support for TSO of encapsulated packets, they
>> only mean that they can do it for one layer of encapsulation.
>> Supporting additional levels would mean updating, at a minimum,
>> more IP length fields and they are unaware of this.
>>
> This patch completely breaks GRO for FOU and GUE.
>
>> No encapsulation device expresses support for handling offloaded
>> encapsulated packets, so we won't generate these types of frames
>> in the transmit path. However, GRO doesn't have a check for
>> multiple levels of encapsulation and will attempt to build them.
>>
>> UDP tunnel GRO actually does prevent this situation but it only
>> handles multiple UDP tunnels stacked on top of each other. This
>> generalizes that solution to prevent any kind of tunnel stacking
>> that would cause problems.
>>
> GUE and FOU regularly create packets that will be both GSO UDP tunnels
> and IPIP, SIT, GRE, etc. This is by design. There should be no
> ambiguity in the drivers as to what this means. For instance, if
> SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
> driver can use ndo_features_check to validate.
>
> So multiple levels of encapsulation with GRO is perfectly valid and I
> would suggest to simply revert this patch. The one potential issue we
> could have is the potential for GRO to construct a packet which is a
> UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
> In this case the GSO flags don't provide enough information to
> distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
> case). To make this clear we can set udp_mark in GRE, ipip, and sit
> but *not* check for it.

Generally speaking, multiple levels of encapsulation offload are not
valid. I think this is pretty clear from the fact that none of the
encapsulation drivers expose support for encapsulation offloads on
transmit and the drivers supporting NETIF_F_GSO_GRE and
NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.

Asking drivers to assume that this combination of flags means FOU
doesn't seem right to me. To the best of my knowledge, no driver uses
ndo_feature_check to do validation of multiple tunnel offload flags
since the assumption is that the stack will never try to do this.
Since FOU is being treated as only a single level of encapsulation, I
think it would be better to just advertise it that way on transmit
(i.e. only set SKB_GSO_UDP_TUNNEL).

The change that you are suggesting would result in packets generated
by GRO that cannot be handled properly on transmit in some cases and
would likely end up being dropped or malformed. GRO is just an
optimization and correctness must come first so we cannot use it if it
might corrupt traffic.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.
  2016-03-28  4:38     ` Jesse Gross
@ 2016-03-28 16:31       ` Tom Herbert
  2016-03-28 17:37         ` Alexander Duyck
  2016-03-28 19:33       ` David Miller
  1 sibling, 1 reply; 18+ messages in thread
From: Tom Herbert @ 2016-03-28 16:31 UTC (permalink / raw)
  To: Jesse Gross; +Cc: David Miller, Linux Kernel Network Developers

On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross <jesse@kernel.org> wrote:
> On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert <tom@herbertland.com> wrote:
>> On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross <jesse@kernel.org> wrote:
>>> When drivers express support for TSO of encapsulated packets, they
>>> only mean that they can do it for one layer of encapsulation.
>>> Supporting additional levels would mean updating, at a minimum,
>>> more IP length fields and they are unaware of this.
>>>
>> This patch completely breaks GRO for FOU and GUE.
>>
>>> No encapsulation device expresses support for handling offloaded
>>> encapsulated packets, so we won't generate these types of frames
>>> in the transmit path. However, GRO doesn't have a check for
>>> multiple levels of encapsulation and will attempt to build them.
>>>
>>> UDP tunnel GRO actually does prevent this situation but it only
>>> handles multiple UDP tunnels stacked on top of each other. This
>>> generalizes that solution to prevent any kind of tunnel stacking
>>> that would cause problems.
>>>
>> GUE and FOU regularly create packets that will be both GSO UDP tunnels
>> and IPIP, SIT, GRE, etc. This is by design. There should be no
>> ambiguity in the drivers as to what this means. For instance, if
>> SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
>> driver can use ndo_features_check to validate.
>>
>> So multiple levels of encapsulation with GRO is perfectly valid and I
>> would suggest to simply revert this patch. The one potential issue we
>> could have is the potential for GRO to construct a packet which is a
>> UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
>> In this case the GSO flags don't provide enough information to
>> distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
>> case). To make this clear we can set udp_mark in GRE, ipip, and sit
>> but *not* check for it.
>
> Generally speaking, multiple levels of encapsulation offload are not
> valid. I think this is pretty clear from the fact that none of the
> encapsulation drivers expose support for encapsulation offloads on
> transmit and the drivers supporting NETIF_F_GSO_GRE and
> NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.
>

Kernel software offload does support this combination just fine.
Seriously, I've tested that more than a thousand times. This is a few
HW implementations you're referring to. The limitations of these
drivers should not dictate how we build the software-- it needs to
work the other way around.

> Asking drivers to assume that this combination of flags means FOU
> doesn't seem right to me. To the best of my knowledge, no driver uses
> ndo_feature_check to do validation of multiple tunnel offload flags
> since the assumption is that the stack will never try to do this.
> Since FOU is being treated as only a single level of encapsulation, I
> think it would be better to just advertise it that way on transmit
> (i.e. only set SKB_GSO_UDP_TUNNEL).
>
If it's not FOU it will be something else. Arbitrarily declaring
multi-levels of encapsulation invalid is simply the wrong direction,
we should be increasing generality and capabilities of the kernel not
holding it down with artificial limits. This is why the generic TSO
work that Alexander and Edward are looking at is so important-- if
this flies then we can offload any combination of encapsulations with
out protocol specific information.

> The change that you are suggesting would result in packets generated
> by GRO that cannot be handled properly on transmit in some cases and
> would likely end up being dropped or malformed. GRO is just an
> optimization and correctness must come first so we cannot use it if it
> might corrupt traffic.

That's (a few) drivers problem. It's no different than if they had
decided that SKB_GSO_UDP_TUNNEL means vxlan, or they can support GRE
in IPv4 offload but not GRE in IPv6, or they can only handle headers
of a certain size, can't handle IPv6 ext. hdrs., etc. As I mentioned,
the long term solution is to eliminate the GSO_ flags and use a
generic segmentation offload interface. But in the interim, it is
*incumbent* on drivers to determine if they can handle a GSO packet
and the interfaces to do that exist. Restricting the capabilities of
GRO just to appease those drivers is not right. Breaking existing GRO
for their benefit is definitely not right.

Tom

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.
  2016-03-28 16:31       ` Tom Herbert
@ 2016-03-28 17:37         ` Alexander Duyck
  2016-03-28 18:47           ` Tom Herbert
  0 siblings, 1 reply; 18+ messages in thread
From: Alexander Duyck @ 2016-03-28 17:37 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Jesse Gross, David Miller, Linux Kernel Network Developers

On Mon, Mar 28, 2016 at 9:31 AM, Tom Herbert <tom@herbertland.com> wrote:
> On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross <jesse@kernel.org> wrote:
>> On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert <tom@herbertland.com> wrote:
>>> On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross <jesse@kernel.org> wrote:
>>>> When drivers express support for TSO of encapsulated packets, they
>>>> only mean that they can do it for one layer of encapsulation.
>>>> Supporting additional levels would mean updating, at a minimum,
>>>> more IP length fields and they are unaware of this.
>>>>
>>> This patch completely breaks GRO for FOU and GUE.
>>>
>>>> No encapsulation device expresses support for handling offloaded
>>>> encapsulated packets, so we won't generate these types of frames
>>>> in the transmit path. However, GRO doesn't have a check for
>>>> multiple levels of encapsulation and will attempt to build them.
>>>>
>>>> UDP tunnel GRO actually does prevent this situation but it only
>>>> handles multiple UDP tunnels stacked on top of each other. This
>>>> generalizes that solution to prevent any kind of tunnel stacking
>>>> that would cause problems.
>>>>
>>> GUE and FOU regularly create packets that will be both GSO UDP tunnels
>>> and IPIP, SIT, GRE, etc. This is by design. There should be no
>>> ambiguity in the drivers as to what this means. For instance, if
>>> SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
>>> driver can use ndo_features_check to validate.
>>>
>>> So multiple levels of encapsulation with GRO is perfectly valid and I
>>> would suggest to simply revert this patch. The one potential issue we
>>> could have is the potential for GRO to construct a packet which is a
>>> UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
>>> In this case the GSO flags don't provide enough information to
>>> distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
>>> case). To make this clear we can set udp_mark in GRE, ipip, and sit
>>> but *not* check for it.
>>
>> Generally speaking, multiple levels of encapsulation offload are not
>> valid. I think this is pretty clear from the fact that none of the
>> encapsulation drivers expose support for encapsulation offloads on
>> transmit and the drivers supporting NETIF_F_GSO_GRE and
>> NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.
>>
>
> Kernel software offload does support this combination just fine.
> Seriously, I've tested that more than a thousand times. This is a few
> HW implementations you're referring to. The limitations of these
> drivers should not dictate how we build the software-- it needs to
> work the other way around.

Jesse does have a point.  Drivers aren't checking for this kind of
thing currently as the transmit side doesn't generate these kind of
frames.  The fact that GRO is makes things a bit more messy as we will
really need to restructure the GSO code in order to handle it.  As far
as drivers testing for it I am pretty certain the i40e isn't.  I would
wonder if we need to add yet another GSO bit to indicate that we
support multiple layers of encapsulation.  I'm pretty sure the only
way we could possibly handle it would be in software since what you
are indicating is a indeterminate number of headers that all require
updates.

>> Asking drivers to assume that this combination of flags means FOU
>> doesn't seem right to me. To the best of my knowledge, no driver uses
>> ndo_feature_check to do validation of multiple tunnel offload flags
>> since the assumption is that the stack will never try to do this.
>> Since FOU is being treated as only a single level of encapsulation, I
>> think it would be better to just advertise it that way on transmit
>> (i.e. only set SKB_GSO_UDP_TUNNEL).
>>
> If it's not FOU it will be something else. Arbitrarily declaring
> multi-levels of encapsulation invalid is simply the wrong direction,
> we should be increasing generality and capabilities of the kernel not
> holding it down with artificial limits. This is why the generic TSO
> work that Alexander and Edward are looking at is so important-- if
> this flies then we can offload any combination of encapsulations with
> out protocol specific information.

The segmentation code isn't designed to handle more than 2 layers of
headers.  Currently we have the pointers for the inner headers and the
outer headers.  If you are talking about adding yet another level we
would need additional pointers in the skbuff to handle them and we
currently don't have them at present.

>> The change that you are suggesting would result in packets generated
>> by GRO that cannot be handled properly on transmit in some cases and
>> would likely end up being dropped or malformed. GRO is just an
>> optimization and correctness must come first so we cannot use it if it
>> might corrupt traffic.
>
> That's (a few) drivers problem. It's no different than if they had
> decided that SKB_GSO_UDP_TUNNEL means vxlan, or they can support GRE
> in IPv4 offload but not GRE in IPv6, or they can only handle headers
> of a certain size, can't handle IPv6 ext. hdrs., etc. As I mentioned,
> the long term solution is to eliminate the GSO_ flags and use a
> generic segmentation offload interface. But in the interim, it is
> *incumbent* on drivers to determine if they can handle a GSO packet
> and the interfaces to do that exist. Restricting the capabilities of
> GRO just to appease those drivers is not right. Breaking existing GRO
> for their benefit is definitely not right.

This isn't about if drivers can handle it.  It is about if the skbuff
can handle it.  The problem as it stands right now is that we start
losing data once we go past 1 level of encapsulation.  We only have
the standard mac_header, network_header, transport_header, and then
the inner set of the same pointers.  If we try to go multiple levels
deep we start losing data.

If we are wanting to start allowing unlimited headers then maybe we
need to start allowing for "partial" GRO to complement the partial GSO
implementation I have.  With that at least we might be able to track
the first and last headers and we could leave the remaining headers
static.  Then when the frame made it to the driver it would know that
the headers in the long list can be replicated and it only has to
update the outer network header and the inner transport header,
otherwise if it doesn't support that it can just chop up the frame in
software and send that out.

- Alex

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.
  2016-03-28 17:37         ` Alexander Duyck
@ 2016-03-28 18:47           ` Tom Herbert
  2016-03-28 19:31             ` Alexander Duyck
  0 siblings, 1 reply; 18+ messages in thread
From: Tom Herbert @ 2016-03-28 18:47 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jesse Gross, David Miller, Linux Kernel Network Developers

On Mon, Mar 28, 2016 at 10:37 AM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Mon, Mar 28, 2016 at 9:31 AM, Tom Herbert <tom@herbertland.com> wrote:
>> On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross <jesse@kernel.org> wrote:
>>> On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>> On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross <jesse@kernel.org> wrote:
>>>>> When drivers express support for TSO of encapsulated packets, they
>>>>> only mean that they can do it for one layer of encapsulation.
>>>>> Supporting additional levels would mean updating, at a minimum,
>>>>> more IP length fields and they are unaware of this.
>>>>>
>>>> This patch completely breaks GRO for FOU and GUE.
>>>>
>>>>> No encapsulation device expresses support for handling offloaded
>>>>> encapsulated packets, so we won't generate these types of frames
>>>>> in the transmit path. However, GRO doesn't have a check for
>>>>> multiple levels of encapsulation and will attempt to build them.
>>>>>
>>>>> UDP tunnel GRO actually does prevent this situation but it only
>>>>> handles multiple UDP tunnels stacked on top of each other. This
>>>>> generalizes that solution to prevent any kind of tunnel stacking
>>>>> that would cause problems.
>>>>>
>>>> GUE and FOU regularly create packets that will be both GSO UDP tunnels
>>>> and IPIP, SIT, GRE, etc. This is by design. There should be no
>>>> ambiguity in the drivers as to what this means. For instance, if
>>>> SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
>>>> driver can use ndo_features_check to validate.
>>>>
>>>> So multiple levels of encapsulation with GRO is perfectly valid and I
>>>> would suggest to simply revert this patch. The one potential issue we
>>>> could have is the potential for GRO to construct a packet which is a
>>>> UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
>>>> In this case the GSO flags don't provide enough information to
>>>> distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
>>>> case). To make this clear we can set udp_mark in GRE, ipip, and sit
>>>> but *not* check for it.
>>>
>>> Generally speaking, multiple levels of encapsulation offload are not
>>> valid. I think this is pretty clear from the fact that none of the
>>> encapsulation drivers expose support for encapsulation offloads on
>>> transmit and the drivers supporting NETIF_F_GSO_GRE and
>>> NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.
>>>
>>
>> Kernel software offload does support this combination just fine.
>> Seriously, I've tested that more than a thousand times. This is a few
>> HW implementations you're referring to. The limitations of these
>> drivers should not dictate how we build the software-- it needs to
>> work the other way around.
>
> Jesse does have a point.  Drivers aren't checking for this kind of
> thing currently as the transmit side doesn't generate these kind of
> frames.  The fact that GRO is makes things a bit more messy as we will
> really need to restructure the GSO code in order to handle it.  As far
> as drivers testing for it I am pretty certain the i40e isn't.  I would
> wonder if we need to add yet another GSO bit to indicate that we
> support multiple layers of encapsulation.  I'm pretty sure the only
> way we could possibly handle it would be in software since what you
> are indicating is a indeterminate number of headers that all require
> updates.
>
>>> Asking drivers to assume that this combination of flags means FOU
>>> doesn't seem right to me. To the best of my knowledge, no driver uses
>>> ndo_feature_check to do validation of multiple tunnel offload flags
>>> since the assumption is that the stack will never try to do this.
>>> Since FOU is being treated as only a single level of encapsulation, I
>>> think it would be better to just advertise it that way on transmit
>>> (i.e. only set SKB_GSO_UDP_TUNNEL).
>>>
>> If it's not FOU it will be something else. Arbitrarily declaring
>> multi-levels of encapsulation invalid is simply the wrong direction,
>> we should be increasing generality and capabilities of the kernel not
>> holding it down with artificial limits. This is why the generic TSO
>> work that Alexander and Edward are looking at is so important-- if
>> this flies then we can offload any combination of encapsulations with
>> out protocol specific information.
>
> The segmentation code isn't designed to handle more than 2 layers of
> headers.  Currently we have the pointers for the inner headers and the
> outer headers.  If you are talking about adding yet another level we
> would need additional pointers in the skbuff to handle them and we
> currently don't have them at present.
>
>>> The change that you are suggesting would result in packets generated
>>> by GRO that cannot be handled properly on transmit in some cases and
>>> would likely end up being dropped or malformed. GRO is just an
>>> optimization and correctness must come first so we cannot use it if it
>>> might corrupt traffic.
>>
>> That's (a few) drivers problem. It's no different than if they had
>> decided that SKB_GSO_UDP_TUNNEL means vxlan, or they can support GRE
>> in IPv4 offload but not GRE in IPv6, or they can only handle headers
>> of a certain size, can't handle IPv6 ext. hdrs., etc. As I mentioned,
>> the long term solution is to eliminate the GSO_ flags and use a
>> generic segmentation offload interface. But in the interim, it is
>> *incumbent* on drivers to determine if they can handle a GSO packet
>> and the interfaces to do that exist. Restricting the capabilities of
>> GRO just to appease those drivers is not right. Breaking existing GRO
>> for their benefit is definitely not right.
>
> This isn't about if drivers can handle it.  It is about if the skbuff
> can handle it.  The problem as it stands right now is that we start
> losing data once we go past 1 level of encapsulation.  We only have
> the standard mac_header, network_header, transport_header, and then
> the inner set of the same pointers.  If we try to go multiple levels
> deep we start losing data.
>
Huh? GUE/FOU has been running perfectly well with two levels of
encapsulation for over a year now. We never had to add anything to
skbuff to make that work. If "losing data" is a problem please provide
the *reproducible* test case for that and we'll debug that.

> If we are wanting to start allowing unlimited headers then maybe we
> need to start allowing for "partial" GRO to complement the partial GSO
> implementation I have.  With that at least we might be able to track
> the first and last headers and we could leave the remaining headers
> static.  Then when the frame made it to the driver it would know that
> the headers in the long list can be replicated and it only has to
> update the outer network header and the inner transport header,
> otherwise if it doesn't support that it can just chop up the frame in
> software and send that out.
>
> - Alex

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.
  2016-03-28 18:47           ` Tom Herbert
@ 2016-03-28 19:31             ` Alexander Duyck
  2016-03-28 20:03               ` Tom Herbert
  0 siblings, 1 reply; 18+ messages in thread
From: Alexander Duyck @ 2016-03-28 19:31 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Jesse Gross, David Miller, Linux Kernel Network Developers

On Mon, Mar 28, 2016 at 11:47 AM, Tom Herbert <tom@herbertland.com> wrote:
> On Mon, Mar 28, 2016 at 10:37 AM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On Mon, Mar 28, 2016 at 9:31 AM, Tom Herbert <tom@herbertland.com> wrote:
>>> On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross <jesse@kernel.org> wrote:
>>>> On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>>> On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross <jesse@kernel.org> wrote:
>>>>>> When drivers express support for TSO of encapsulated packets, they
>>>>>> only mean that they can do it for one layer of encapsulation.
>>>>>> Supporting additional levels would mean updating, at a minimum,
>>>>>> more IP length fields and they are unaware of this.
>>>>>>
>>>>> This patch completely breaks GRO for FOU and GUE.
>>>>>
>>>>>> No encapsulation device expresses support for handling offloaded
>>>>>> encapsulated packets, so we won't generate these types of frames
>>>>>> in the transmit path. However, GRO doesn't have a check for
>>>>>> multiple levels of encapsulation and will attempt to build them.
>>>>>>
>>>>>> UDP tunnel GRO actually does prevent this situation but it only
>>>>>> handles multiple UDP tunnels stacked on top of each other. This
>>>>>> generalizes that solution to prevent any kind of tunnel stacking
>>>>>> that would cause problems.
>>>>>>
>>>>> GUE and FOU regularly create packets that will be both GSO UDP tunnels
>>>>> and IPIP, SIT, GRE, etc. This is by design. There should be no
>>>>> ambiguity in the drivers as to what this means. For instance, if
>>>>> SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
>>>>> driver can use ndo_features_check to validate.
>>>>>
>>>>> So multiple levels of encapsulation with GRO is perfectly valid and I
>>>>> would suggest to simply revert this patch. The one potential issue we
>>>>> could have is the potential for GRO to construct a packet which is a
>>>>> UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
>>>>> In this case the GSO flags don't provide enough information to
>>>>> distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
>>>>> case). To make this clear we can set udp_mark in GRE, ipip, and sit
>>>>> but *not* check for it.
>>>>
>>>> Generally speaking, multiple levels of encapsulation offload are not
>>>> valid. I think this is pretty clear from the fact that none of the
>>>> encapsulation drivers expose support for encapsulation offloads on
>>>> transmit and the drivers supporting NETIF_F_GSO_GRE and
>>>> NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.
>>>>
>>>
>>> Kernel software offload does support this combination just fine.
>>> Seriously, I've tested that more than a thousand times. This is a few
>>> HW implementations you're referring to. The limitations of these
>>> drivers should not dictate how we build the software-- it needs to
>>> work the other way around.
>>
>> Jesse does have a point.  Drivers aren't checking for this kind of
>> thing currently as the transmit side doesn't generate these kind of
>> frames.  The fact that GRO is makes things a bit more messy as we will
>> really need to restructure the GSO code in order to handle it.  As far
>> as drivers testing for it I am pretty certain the i40e isn't.  I would
>> wonder if we need to add yet another GSO bit to indicate that we
>> support multiple layers of encapsulation.  I'm pretty sure the only
>> way we could possibly handle it would be in software since what you
>> are indicating is a indeterminate number of headers that all require
>> updates.
>>
>>>> Asking drivers to assume that this combination of flags means FOU
>>>> doesn't seem right to me. To the best of my knowledge, no driver uses
>>>> ndo_feature_check to do validation of multiple tunnel offload flags
>>>> since the assumption is that the stack will never try to do this.
>>>> Since FOU is being treated as only a single level of encapsulation, I
>>>> think it would be better to just advertise it that way on transmit
>>>> (i.e. only set SKB_GSO_UDP_TUNNEL).
>>>>
>>> If it's not FOU it will be something else. Arbitrarily declaring
>>> multi-levels of encapsulation invalid is simply the wrong direction,
>>> we should be increasing generality and capabilities of the kernel not
>>> holding it down with artificial limits. This is why the generic TSO
>>> work that Alexander and Edward are looking at is so important-- if
>>> this flies then we can offload any combination of encapsulations with
>>> out protocol specific information.
>>
>> The segmentation code isn't designed to handle more than 2 layers of
>> headers.  Currently we have the pointers for the inner headers and the
>> outer headers.  If you are talking about adding yet another level we
>> would need additional pointers in the skbuff to handle them and we
>> currently don't have them at present.
>>
>>>> The change that you are suggesting would result in packets generated
>>>> by GRO that cannot be handled properly on transmit in some cases and
>>>> would likely end up being dropped or malformed. GRO is just an
>>>> optimization and correctness must come first so we cannot use it if it
>>>> might corrupt traffic.
>>>
>>> That's (a few) drivers problem. It's no different than if they had
>>> decided that SKB_GSO_UDP_TUNNEL means vxlan, or they can support GRE
>>> in IPv4 offload but not GRE in IPv6, or they can only handle headers
>>> of a certain size, can't handle IPv6 ext. hdrs., etc. As I mentioned,
>>> the long term solution is to eliminate the GSO_ flags and use a
>>> generic segmentation offload interface. But in the interim, it is
>>> *incumbent* on drivers to determine if they can handle a GSO packet
>>> and the interfaces to do that exist. Restricting the capabilities of
>>> GRO just to appease those drivers is not right. Breaking existing GRO
>>> for their benefit is definitely not right.
>>
>> This isn't about if drivers can handle it.  It is about if the skbuff
>> can handle it.  The problem as it stands right now is that we start
>> losing data once we go past 1 level of encapsulation.  We only have
>> the standard mac_header, network_header, transport_header, and then
>> the inner set of the same pointers.  If we try to go multiple levels
>> deep we start losing data.
>>
> Huh? GUE/FOU has been running perfectly well with two levels of
> encapsulation for over a year now. We never had to add anything to
> skbuff to make that work. If "losing data" is a problem please provide
> the *reproducible* test case for that and we'll debug that.

I'm guessing most of your examples involve either a remote checksum
being enabled or using NICs that don't support any tunnel offloads?
Hardware needs to be able to identify where headers are in order to
perform most of their offloads for TSO.  We either have to parse to
find them or we are provided with them by the stack.  GSO can work
around it as long as we don't stack checksum based offload with
non-checksum of the same type.

mostIf for example you were to try and send a frame that had either an
inner or outer GRE tunnel in addition to a UDP tunnel most NICs would
probably screw it up.  Up until now that hasn't been an issue.  As we
start turning on offload support for multiple tunnel types thanks to
the partial offloads it will come back and bite us if we try to tell
hardware to handle more than 2 levels of headers.  I'm thinking if
nothing else we might have to add yet another bit to GSO for stacked
tunnels which can probably only be supported via partial GSO and only
if we can get away with just replicating headers.

It looks like we need to go through and probably clean up both the GSO
and GRO code.  First we have to get the GSO code setup so that it can
fully handle multiple levels of tunnels.  The code as it is now
assumes you would only have one level and we are configuring the
headers as such.  In addition the GRO code doesn't seem to place the
header offsets correctly.  For instance, from what I can tell it looks
like the inner transport offset is never updated.  We will probably
need to have pointers for the inner-most and outer-most set of headers
and from there we can start handling the offloads.

- Alex

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.
  2016-03-28  4:38     ` Jesse Gross
  2016-03-28 16:31       ` Tom Herbert
@ 2016-03-28 19:33       ` David Miller
  1 sibling, 0 replies; 18+ messages in thread
From: David Miller @ 2016-03-28 19:33 UTC (permalink / raw)
  To: jesse; +Cc: tom, netdev

From: Jesse Gross <jesse@kernel.org>
Date: Sun, 27 Mar 2016 21:38:34 -0700

> Generally speaking, multiple levels of encapsulation offload are not
> valid.

That's very much not true Jesse.

Please fix the regression you introduced or I will have no choice but to
revert your entire set of changes.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.
  2016-03-28 19:31             ` Alexander Duyck
@ 2016-03-28 20:03               ` Tom Herbert
  2016-03-28 20:34                 ` Alexander Duyck
  0 siblings, 1 reply; 18+ messages in thread
From: Tom Herbert @ 2016-03-28 20:03 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jesse Gross, David Miller, Linux Kernel Network Developers

On Mon, Mar 28, 2016 at 12:31 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Mon, Mar 28, 2016 at 11:47 AM, Tom Herbert <tom@herbertland.com> wrote:
>> On Mon, Mar 28, 2016 at 10:37 AM, Alexander Duyck
>> <alexander.duyck@gmail.com> wrote:
>>> On Mon, Mar 28, 2016 at 9:31 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>> On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross <jesse@kernel.org> wrote:
>>>>> On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>>>> On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross <jesse@kernel.org> wrote:
>>>>>>> When drivers express support for TSO of encapsulated packets, they
>>>>>>> only mean that they can do it for one layer of encapsulation.
>>>>>>> Supporting additional levels would mean updating, at a minimum,
>>>>>>> more IP length fields and they are unaware of this.
>>>>>>>
>>>>>> This patch completely breaks GRO for FOU and GUE.
>>>>>>
>>>>>>> No encapsulation device expresses support for handling offloaded
>>>>>>> encapsulated packets, so we won't generate these types of frames
>>>>>>> in the transmit path. However, GRO doesn't have a check for
>>>>>>> multiple levels of encapsulation and will attempt to build them.
>>>>>>>
>>>>>>> UDP tunnel GRO actually does prevent this situation but it only
>>>>>>> handles multiple UDP tunnels stacked on top of each other. This
>>>>>>> generalizes that solution to prevent any kind of tunnel stacking
>>>>>>> that would cause problems.
>>>>>>>
>>>>>> GUE and FOU regularly create packets that will be both GSO UDP tunnels
>>>>>> and IPIP, SIT, GRE, etc. This is by design. There should be no
>>>>>> ambiguity in the drivers as to what this means. For instance, if
>>>>>> SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
>>>>>> driver can use ndo_features_check to validate.
>>>>>>
>>>>>> So multiple levels of encapsulation with GRO is perfectly valid and I
>>>>>> would suggest to simply revert this patch. The one potential issue we
>>>>>> could have is the potential for GRO to construct a packet which is a
>>>>>> UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
>>>>>> In this case the GSO flags don't provide enough information to
>>>>>> distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
>>>>>> case). To make this clear we can set udp_mark in GRE, ipip, and sit
>>>>>> but *not* check for it.
>>>>>
>>>>> Generally speaking, multiple levels of encapsulation offload are not
>>>>> valid. I think this is pretty clear from the fact that none of the
>>>>> encapsulation drivers expose support for encapsulation offloads on
>>>>> transmit and the drivers supporting NETIF_F_GSO_GRE and
>>>>> NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.
>>>>>
>>>>
>>>> Kernel software offload does support this combination just fine.
>>>> Seriously, I've tested that more than a thousand times. This is a few
>>>> HW implementations you're referring to. The limitations of these
>>>> drivers should not dictate how we build the software-- it needs to
>>>> work the other way around.
>>>
>>> Jesse does have a point.  Drivers aren't checking for this kind of
>>> thing currently as the transmit side doesn't generate these kind of
>>> frames.  The fact that GRO is makes things a bit more messy as we will
>>> really need to restructure the GSO code in order to handle it.  As far
>>> as drivers testing for it I am pretty certain the i40e isn't.  I would
>>> wonder if we need to add yet another GSO bit to indicate that we
>>> support multiple layers of encapsulation.  I'm pretty sure the only
>>> way we could possibly handle it would be in software since what you
>>> are indicating is a indeterminate number of headers that all require
>>> updates.
>>>
>>>>> Asking drivers to assume that this combination of flags means FOU
>>>>> doesn't seem right to me. To the best of my knowledge, no driver uses
>>>>> ndo_feature_check to do validation of multiple tunnel offload flags
>>>>> since the assumption is that the stack will never try to do this.
>>>>> Since FOU is being treated as only a single level of encapsulation, I
>>>>> think it would be better to just advertise it that way on transmit
>>>>> (i.e. only set SKB_GSO_UDP_TUNNEL).
>>>>>
>>>> If it's not FOU it will be something else. Arbitrarily declaring
>>>> multi-levels of encapsulation invalid is simply the wrong direction,
>>>> we should be increasing generality and capabilities of the kernel not
>>>> holding it down with artificial limits. This is why the generic TSO
>>>> work that Alexander and Edward are looking at is so important-- if
>>>> this flies then we can offload any combination of encapsulations with
>>>> out protocol specific information.
>>>
>>> The segmentation code isn't designed to handle more than 2 layers of
>>> headers.  Currently we have the pointers for the inner headers and the
>>> outer headers.  If you are talking about adding yet another level we
>>> would need additional pointers in the skbuff to handle them and we
>>> currently don't have them at present.
>>>
>>>>> The change that you are suggesting would result in packets generated
>>>>> by GRO that cannot be handled properly on transmit in some cases and
>>>>> would likely end up being dropped or malformed. GRO is just an
>>>>> optimization and correctness must come first so we cannot use it if it
>>>>> might corrupt traffic.
>>>>
>>>> That's (a few) drivers problem. It's no different than if they had
>>>> decided that SKB_GSO_UDP_TUNNEL means vxlan, or they can support GRE
>>>> in IPv4 offload but not GRE in IPv6, or they can only handle headers
>>>> of a certain size, can't handle IPv6 ext. hdrs., etc. As I mentioned,
>>>> the long term solution is to eliminate the GSO_ flags and use a
>>>> generic segmentation offload interface. But in the interim, it is
>>>> *incumbent* on drivers to determine if they can handle a GSO packet
>>>> and the interfaces to do that exist. Restricting the capabilities of
>>>> GRO just to appease those drivers is not right. Breaking existing GRO
>>>> for their benefit is definitely not right.
>>>
>>> This isn't about if drivers can handle it.  It is about if the skbuff
>>> can handle it.  The problem as it stands right now is that we start
>>> losing data once we go past 1 level of encapsulation.  We only have
>>> the standard mac_header, network_header, transport_header, and then
>>> the inner set of the same pointers.  If we try to go multiple levels
>>> deep we start losing data.
>>>
>> Huh? GUE/FOU has been running perfectly well with two levels of
>> encapsulation for over a year now. We never had to add anything to
>> skbuff to make that work. If "losing data" is a problem please provide
>> the *reproducible* test case for that and we'll debug that.
>
> I'm guessing most of your examples involve either a remote checksum
> being enabled or using NICs that don't support any tunnel offloads?
> Hardware needs to be able to identify where headers are in order to
> perform most of their offloads for TSO.  We either have to parse to
> find them or we are provided with them by the stack.  GSO can work
> around it as long as we don't stack checksum based offload with
> non-checksum of the same type.
>
> mostIf for example you were to try and send a frame that had either an
> inner or outer GRE tunnel in addition to a UDP tunnel most NICs would
> probably screw it up.

Please be more precise. Obviously we're only talking about the few
NICs that even support UDP tunnel offload so it's not most NICs. Also,
there is a big difference between "probably" and "definitely"; no one
has provided a single data point that that there is even an issue. For
instance, looking a i40e I suspect this will work with GRE/UDP since
the driver already deals with offsets and shouldn't care about
intermediate levels of encapsulation.

> Up until now that hasn't been an issue.  As we
> start turning on offload support for multiple tunnel types thanks to
> the partial offloads it will come back and bite us if we try to tell
> hardware to handle more than 2 levels of headers.  I'm thinking if
> nothing else we might have to add yet another bit to GSO for stacked
> tunnels which can probably only be supported via partial GSO and only
> if we can get away with just replicating headers.
>
> It looks like we need to go through and probably clean up both the GSO
> and GRO code.  First we have to get the GSO code setup so that it can
> fully handle multiple levels of tunnels.  The code as it is now
> assumes you would only have one level and we are configuring the
> headers as such.  In addition the GRO code doesn't seem to place the
> header offsets correctly.  For instance, from what I can tell it looks
> like the inner transport offset is never updated.  We will probably
> need to have pointers for the inner-most and outer-most set of headers
> and from there we can start handling the offloads.
>
> - Alex

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.
  2016-03-28 20:03               ` Tom Herbert
@ 2016-03-28 20:34                 ` Alexander Duyck
  2016-03-28 22:10                   ` Tom Herbert
  0 siblings, 1 reply; 18+ messages in thread
From: Alexander Duyck @ 2016-03-28 20:34 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Jesse Gross, David Miller, Linux Kernel Network Developers

On Mon, Mar 28, 2016 at 1:03 PM, Tom Herbert <tom@herbertland.com> wrote:
> On Mon, Mar 28, 2016 at 12:31 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On Mon, Mar 28, 2016 at 11:47 AM, Tom Herbert <tom@herbertland.com> wrote:
>>> On Mon, Mar 28, 2016 at 10:37 AM, Alexander Duyck
>>> <alexander.duyck@gmail.com> wrote:
>>>> On Mon, Mar 28, 2016 at 9:31 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>>> On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross <jesse@kernel.org> wrote:
>>>>>> On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>>>>> On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross <jesse@kernel.org> wrote:
>>>>>>>> When drivers express support for TSO of encapsulated packets, they
>>>>>>>> only mean that they can do it for one layer of encapsulation.
>>>>>>>> Supporting additional levels would mean updating, at a minimum,
>>>>>>>> more IP length fields and they are unaware of this.
>>>>>>>>
>>>>>>> This patch completely breaks GRO for FOU and GUE.
>>>>>>>
>>>>>>>> No encapsulation device expresses support for handling offloaded
>>>>>>>> encapsulated packets, so we won't generate these types of frames
>>>>>>>> in the transmit path. However, GRO doesn't have a check for
>>>>>>>> multiple levels of encapsulation and will attempt to build them.
>>>>>>>>
>>>>>>>> UDP tunnel GRO actually does prevent this situation but it only
>>>>>>>> handles multiple UDP tunnels stacked on top of each other. This
>>>>>>>> generalizes that solution to prevent any kind of tunnel stacking
>>>>>>>> that would cause problems.
>>>>>>>>
>>>>>>> GUE and FOU regularly create packets that will be both GSO UDP tunnels
>>>>>>> and IPIP, SIT, GRE, etc. This is by design. There should be no
>>>>>>> ambiguity in the drivers as to what this means. For instance, if
>>>>>>> SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
>>>>>>> driver can use ndo_features_check to validate.
>>>>>>>
>>>>>>> So multiple levels of encapsulation with GRO is perfectly valid and I
>>>>>>> would suggest to simply revert this patch. The one potential issue we
>>>>>>> could have is the potential for GRO to construct a packet which is a
>>>>>>> UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
>>>>>>> In this case the GSO flags don't provide enough information to
>>>>>>> distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
>>>>>>> case). To make this clear we can set udp_mark in GRE, ipip, and sit
>>>>>>> but *not* check for it.
>>>>>>
>>>>>> Generally speaking, multiple levels of encapsulation offload are not
>>>>>> valid. I think this is pretty clear from the fact that none of the
>>>>>> encapsulation drivers expose support for encapsulation offloads on
>>>>>> transmit and the drivers supporting NETIF_F_GSO_GRE and
>>>>>> NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.
>>>>>>
>>>>>
>>>>> Kernel software offload does support this combination just fine.
>>>>> Seriously, I've tested that more than a thousand times. This is a few
>>>>> HW implementations you're referring to. The limitations of these
>>>>> drivers should not dictate how we build the software-- it needs to
>>>>> work the other way around.
>>>>
>>>> Jesse does have a point.  Drivers aren't checking for this kind of
>>>> thing currently as the transmit side doesn't generate these kind of
>>>> frames.  The fact that GRO is makes things a bit more messy as we will
>>>> really need to restructure the GSO code in order to handle it.  As far
>>>> as drivers testing for it I am pretty certain the i40e isn't.  I would
>>>> wonder if we need to add yet another GSO bit to indicate that we
>>>> support multiple layers of encapsulation.  I'm pretty sure the only
>>>> way we could possibly handle it would be in software since what you
>>>> are indicating is a indeterminate number of headers that all require
>>>> updates.
>>>>
>>>>>> Asking drivers to assume that this combination of flags means FOU
>>>>>> doesn't seem right to me. To the best of my knowledge, no driver uses
>>>>>> ndo_feature_check to do validation of multiple tunnel offload flags
>>>>>> since the assumption is that the stack will never try to do this.
>>>>>> Since FOU is being treated as only a single level of encapsulation, I
>>>>>> think it would be better to just advertise it that way on transmit
>>>>>> (i.e. only set SKB_GSO_UDP_TUNNEL).
>>>>>>
>>>>> If it's not FOU it will be something else. Arbitrarily declaring
>>>>> multi-levels of encapsulation invalid is simply the wrong direction,
>>>>> we should be increasing generality and capabilities of the kernel not
>>>>> holding it down with artificial limits. This is why the generic TSO
>>>>> work that Alexander and Edward are looking at is so important-- if
>>>>> this flies then we can offload any combination of encapsulations with
>>>>> out protocol specific information.
>>>>
>>>> The segmentation code isn't designed to handle more than 2 layers of
>>>> headers.  Currently we have the pointers for the inner headers and the
>>>> outer headers.  If you are talking about adding yet another level we
>>>> would need additional pointers in the skbuff to handle them and we
>>>> currently don't have them at present.
>>>>
>>>>>> The change that you are suggesting would result in packets generated
>>>>>> by GRO that cannot be handled properly on transmit in some cases and
>>>>>> would likely end up being dropped or malformed. GRO is just an
>>>>>> optimization and correctness must come first so we cannot use it if it
>>>>>> might corrupt traffic.
>>>>>
>>>>> That's (a few) drivers problem. It's no different than if they had
>>>>> decided that SKB_GSO_UDP_TUNNEL means vxlan, or they can support GRE
>>>>> in IPv4 offload but not GRE in IPv6, or they can only handle headers
>>>>> of a certain size, can't handle IPv6 ext. hdrs., etc. As I mentioned,
>>>>> the long term solution is to eliminate the GSO_ flags and use a
>>>>> generic segmentation offload interface. But in the interim, it is
>>>>> *incumbent* on drivers to determine if they can handle a GSO packet
>>>>> and the interfaces to do that exist. Restricting the capabilities of
>>>>> GRO just to appease those drivers is not right. Breaking existing GRO
>>>>> for their benefit is definitely not right.
>>>>
>>>> This isn't about if drivers can handle it.  It is about if the skbuff
>>>> can handle it.  The problem as it stands right now is that we start
>>>> losing data once we go past 1 level of encapsulation.  We only have
>>>> the standard mac_header, network_header, transport_header, and then
>>>> the inner set of the same pointers.  If we try to go multiple levels
>>>> deep we start losing data.
>>>>
>>> Huh? GUE/FOU has been running perfectly well with two levels of
>>> encapsulation for over a year now. We never had to add anything to
>>> skbuff to make that work. If "losing data" is a problem please provide
>>> the *reproducible* test case for that and we'll debug that.
>>
>> I'm guessing most of your examples involve either a remote checksum
>> being enabled or using NICs that don't support any tunnel offloads?
>> Hardware needs to be able to identify where headers are in order to
>> perform most of their offloads for TSO.  We either have to parse to
>> find them or we are provided with them by the stack.  GSO can work
>> around it as long as we don't stack checksum based offload with
>> non-checksum of the same type.
>>
>> mostIf for example you were to try and send a frame that had either an
>> inner or outer GRE tunnel in addition to a UDP tunnel most NICs would
>> probably screw it up.
>
> Please be more precise. Obviously we're only talking about the few
> NICs that even support UDP tunnel offload so it's not most NICs. Also,
> there is a big difference between "probably" and "definitely"; no one
> has provided a single data point that that there is even an issue. For
> instance, looking a i40e I suspect this will work with GRE/UDP since
> the driver already deals with offsets and shouldn't care about
> intermediate levels of encapsulation.

I'm sorry I cannot be super precise as I hadn't looked that closely at
the FOU or GUE code before.  Honestly I suspect most people probably
haven't.

So from what I can tell FOU and GUE isn't really even necessarily
doing stacked tunnels.  It looks like you support IPIP, SIT, or GRE.
So in the case of IPIP and SIT for instance the only real difference
between VXLAN and those tunnels with fou is that you don't have the
VXLAN or inner Ethernet headers.  So really you still only have two
levels of headers.  Even in the case of GRE you have that set after
the outer UDP header so in that case GRE ends up being treated as a
tunnel header since the outer transport offset was overwritten.  If we
had hardware that supported both UDP and GRE outer checksums it would
cause a mess as the hardware would place the GRE checksum in the wrong
spot.

So if anything all we probably would need to do is treat the FOU or
GUE stuff as a special case.  Basically if we end up having to do a
GRO over a FOU or GUE instance maybe we need to knock the encap_mark
back to 0 of it is at 1 and the next level is IPIP, SIT, or GRE
because really it can just be treated as a UDP tunnel.  Jesse's code
is still needed to catch the case where someone is trying to do
something like IPIP over VXLAN or whatever but it does seem like there
should be some wiggle room for FOU or GUE.

- Alex

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.
  2016-03-28 20:34                 ` Alexander Duyck
@ 2016-03-28 22:10                   ` Tom Herbert
  2016-03-28 23:34                     ` Jesse Gross
  0 siblings, 1 reply; 18+ messages in thread
From: Tom Herbert @ 2016-03-28 22:10 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jesse Gross, David Miller, Linux Kernel Network Developers

On Mon, Mar 28, 2016 at 1:34 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Mon, Mar 28, 2016 at 1:03 PM, Tom Herbert <tom@herbertland.com> wrote:
>> On Mon, Mar 28, 2016 at 12:31 PM, Alexander Duyck
>> <alexander.duyck@gmail.com> wrote:
>>> On Mon, Mar 28, 2016 at 11:47 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>> On Mon, Mar 28, 2016 at 10:37 AM, Alexander Duyck
>>>> <alexander.duyck@gmail.com> wrote:
>>>>> On Mon, Mar 28, 2016 at 9:31 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>>>> On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross <jesse@kernel.org> wrote:
>>>>>>> On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>>>>>> On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross <jesse@kernel.org> wrote:
>>>>>>>>> When drivers express support for TSO of encapsulated packets, they
>>>>>>>>> only mean that they can do it for one layer of encapsulation.
>>>>>>>>> Supporting additional levels would mean updating, at a minimum,
>>>>>>>>> more IP length fields and they are unaware of this.
>>>>>>>>>
>>>>>>>> This patch completely breaks GRO for FOU and GUE.
>>>>>>>>
>>>>>>>>> No encapsulation device expresses support for handling offloaded
>>>>>>>>> encapsulated packets, so we won't generate these types of frames
>>>>>>>>> in the transmit path. However, GRO doesn't have a check for
>>>>>>>>> multiple levels of encapsulation and will attempt to build them.
>>>>>>>>>
>>>>>>>>> UDP tunnel GRO actually does prevent this situation but it only
>>>>>>>>> handles multiple UDP tunnels stacked on top of each other. This
>>>>>>>>> generalizes that solution to prevent any kind of tunnel stacking
>>>>>>>>> that would cause problems.
>>>>>>>>>
>>>>>>>> GUE and FOU regularly create packets that will be both GSO UDP tunnels
>>>>>>>> and IPIP, SIT, GRE, etc. This is by design. There should be no
>>>>>>>> ambiguity in the drivers as to what this means. For instance, if
>>>>>>>> SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
>>>>>>>> driver can use ndo_features_check to validate.
>>>>>>>>
>>>>>>>> So multiple levels of encapsulation with GRO is perfectly valid and I
>>>>>>>> would suggest to simply revert this patch. The one potential issue we
>>>>>>>> could have is the potential for GRO to construct a packet which is a
>>>>>>>> UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
>>>>>>>> In this case the GSO flags don't provide enough information to
>>>>>>>> distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
>>>>>>>> case). To make this clear we can set udp_mark in GRE, ipip, and sit
>>>>>>>> but *not* check for it.
>>>>>>>
>>>>>>> Generally speaking, multiple levels of encapsulation offload are not
>>>>>>> valid. I think this is pretty clear from the fact that none of the
>>>>>>> encapsulation drivers expose support for encapsulation offloads on
>>>>>>> transmit and the drivers supporting NETIF_F_GSO_GRE and
>>>>>>> NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.
>>>>>>>
>>>>>>
>>>>>> Kernel software offload does support this combination just fine.
>>>>>> Seriously, I've tested that more than a thousand times. This is a few
>>>>>> HW implementations you're referring to. The limitations of these
>>>>>> drivers should not dictate how we build the software-- it needs to
>>>>>> work the other way around.
>>>>>
>>>>> Jesse does have a point.  Drivers aren't checking for this kind of
>>>>> thing currently as the transmit side doesn't generate these kind of
>>>>> frames.  The fact that GRO is makes things a bit more messy as we will
>>>>> really need to restructure the GSO code in order to handle it.  As far
>>>>> as drivers testing for it I am pretty certain the i40e isn't.  I would
>>>>> wonder if we need to add yet another GSO bit to indicate that we
>>>>> support multiple layers of encapsulation.  I'm pretty sure the only
>>>>> way we could possibly handle it would be in software since what you
>>>>> are indicating is a indeterminate number of headers that all require
>>>>> updates.
>>>>>
>>>>>>> Asking drivers to assume that this combination of flags means FOU
>>>>>>> doesn't seem right to me. To the best of my knowledge, no driver uses
>>>>>>> ndo_feature_check to do validation of multiple tunnel offload flags
>>>>>>> since the assumption is that the stack will never try to do this.
>>>>>>> Since FOU is being treated as only a single level of encapsulation, I
>>>>>>> think it would be better to just advertise it that way on transmit
>>>>>>> (i.e. only set SKB_GSO_UDP_TUNNEL).
>>>>>>>
>>>>>> If it's not FOU it will be something else. Arbitrarily declaring
>>>>>> multi-levels of encapsulation invalid is simply the wrong direction,
>>>>>> we should be increasing generality and capabilities of the kernel not
>>>>>> holding it down with artificial limits. This is why the generic TSO
>>>>>> work that Alexander and Edward are looking at is so important-- if
>>>>>> this flies then we can offload any combination of encapsulations with
>>>>>> out protocol specific information.
>>>>>
>>>>> The segmentation code isn't designed to handle more than 2 layers of
>>>>> headers.  Currently we have the pointers for the inner headers and the
>>>>> outer headers.  If you are talking about adding yet another level we
>>>>> would need additional pointers in the skbuff to handle them and we
>>>>> currently don't have them at present.
>>>>>
>>>>>>> The change that you are suggesting would result in packets generated
>>>>>>> by GRO that cannot be handled properly on transmit in some cases and
>>>>>>> would likely end up being dropped or malformed. GRO is just an
>>>>>>> optimization and correctness must come first so we cannot use it if it
>>>>>>> might corrupt traffic.
>>>>>>
>>>>>> That's (a few) drivers problem. It's no different than if they had
>>>>>> decided that SKB_GSO_UDP_TUNNEL means vxlan, or they can support GRE
>>>>>> in IPv4 offload but not GRE in IPv6, or they can only handle headers
>>>>>> of a certain size, can't handle IPv6 ext. hdrs., etc. As I mentioned,
>>>>>> the long term solution is to eliminate the GSO_ flags and use a
>>>>>> generic segmentation offload interface. But in the interim, it is
>>>>>> *incumbent* on drivers to determine if they can handle a GSO packet
>>>>>> and the interfaces to do that exist. Restricting the capabilities of
>>>>>> GRO just to appease those drivers is not right. Breaking existing GRO
>>>>>> for their benefit is definitely not right.
>>>>>
>>>>> This isn't about if drivers can handle it.  It is about if the skbuff
>>>>> can handle it.  The problem as it stands right now is that we start
>>>>> losing data once we go past 1 level of encapsulation.  We only have
>>>>> the standard mac_header, network_header, transport_header, and then
>>>>> the inner set of the same pointers.  If we try to go multiple levels
>>>>> deep we start losing data.
>>>>>
>>>> Huh? GUE/FOU has been running perfectly well with two levels of
>>>> encapsulation for over a year now. We never had to add anything to
>>>> skbuff to make that work. If "losing data" is a problem please provide
>>>> the *reproducible* test case for that and we'll debug that.
>>>
>>> I'm guessing most of your examples involve either a remote checksum
>>> being enabled or using NICs that don't support any tunnel offloads?
>>> Hardware needs to be able to identify where headers are in order to
>>> perform most of their offloads for TSO.  We either have to parse to
>>> find them or we are provided with them by the stack.  GSO can work
>>> around it as long as we don't stack checksum based offload with
>>> non-checksum of the same type.
>>>
>>> mostIf for example you were to try and send a frame that had either an
>>> inner or outer GRE tunnel in addition to a UDP tunnel most NICs would
>>> probably screw it up.
>>
>> Please be more precise. Obviously we're only talking about the few
>> NICs that even support UDP tunnel offload so it's not most NICs. Also,
>> there is a big difference between "probably" and "definitely"; no one
>> has provided a single data point that that there is even an issue. For
>> instance, looking a i40e I suspect this will work with GRE/UDP since
>> the driver already deals with offsets and shouldn't care about
>> intermediate levels of encapsulation.
>
> I'm sorry I cannot be super precise as I hadn't looked that closely at
> the FOU or GUE code before.  Honestly I suspect most people probably
> haven't.
>
Sorry, but one simple test would have identified this patch was a
regression. I don't mind that bugs are introduced with new patches,
but it is frustrating when testing of the patches is obviously
inadequate and then we get the "it's your stuff that's broken" knee
jerk reaction when a patch causes a regression for someone else.

> So from what I can tell FOU and GUE isn't really even necessarily
> doing stacked tunnels.  It looks like you support IPIP, SIT, or GRE.
> So in the case of IPIP and SIT for instance the only real difference
> between VXLAN and those tunnels with fou is that you don't have the
> VXLAN or inner Ethernet headers.  So really you still only have two
> levels of headers.  Even in the case of GRE you have that set after
> the outer UDP header so in that case GRE ends up being treated as a
> tunnel header since the outer transport offset was overwritten.  If we
> had hardware that supported both UDP and GRE outer checksums it would
> cause a mess as the hardware would place the GRE checksum in the wrong
> spot.
>
> So if anything all we probably would need to do is treat the FOU or
> GUE stuff as a special case.  Basically if we end up having to do a
> GRO over a FOU or GUE instance maybe we need to knock the encap_mark
> back to 0 of it is at 1 and the next level is IPIP, SIT, or GRE
> because really it can just be treated as a UDP tunnel.  Jesse's code
> is still needed to catch the case where someone is trying to do
> something like IPIP over VXLAN or whatever but it does seem like there
> should be some wiggle room for FOU or GUE.
>
Nothing is needed other than to revert this patch. There is no problem
with GRO. You nor Jesse have not identified any tangible problem. If a
driver really does had an issue with doing GRE/UDP with GSO then it
can filter that in check_features (this is like 2 lines of code). But
that has not been proven to be a problem, and I would expect that
anyone who wants to fix that better run tests showing there is a
problem.

Tom

> - Alex

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.
  2016-03-28 22:10                   ` Tom Herbert
@ 2016-03-28 23:34                     ` Jesse Gross
  2016-03-29  0:50                       ` Tom Herbert
  0 siblings, 1 reply; 18+ messages in thread
From: Jesse Gross @ 2016-03-28 23:34 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Alexander Duyck, David Miller, Linux Kernel Network Developers

On Mon, Mar 28, 2016 at 3:10 PM, Tom Herbert <tom@herbertland.com> wrote:
> On Mon, Mar 28, 2016 at 1:34 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On Mon, Mar 28, 2016 at 1:03 PM, Tom Herbert <tom@herbertland.com> wrote:
>>> On Mon, Mar 28, 2016 at 12:31 PM, Alexander Duyck
>>> <alexander.duyck@gmail.com> wrote:
>>>> On Mon, Mar 28, 2016 at 11:47 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>>> On Mon, Mar 28, 2016 at 10:37 AM, Alexander Duyck
>>>>> <alexander.duyck@gmail.com> wrote:
>>>>>> On Mon, Mar 28, 2016 at 9:31 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>>>>> On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross <jesse@kernel.org> wrote:
>>>>>>>> On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>>>>>>> On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross <jesse@kernel.org> wrote:
>>>>>>>>>> When drivers express support for TSO of encapsulated packets, they
>>>>>>>>>> only mean that they can do it for one layer of encapsulation.
>>>>>>>>>> Supporting additional levels would mean updating, at a minimum,
>>>>>>>>>> more IP length fields and they are unaware of this.
>>>>>>>>>>
>>>>>>>>> This patch completely breaks GRO for FOU and GUE.
>>>>>>>>>
>>>>>>>>>> No encapsulation device expresses support for handling offloaded
>>>>>>>>>> encapsulated packets, so we won't generate these types of frames
>>>>>>>>>> in the transmit path. However, GRO doesn't have a check for
>>>>>>>>>> multiple levels of encapsulation and will attempt to build them.
>>>>>>>>>>
>>>>>>>>>> UDP tunnel GRO actually does prevent this situation but it only
>>>>>>>>>> handles multiple UDP tunnels stacked on top of each other. This
>>>>>>>>>> generalizes that solution to prevent any kind of tunnel stacking
>>>>>>>>>> that would cause problems.
>>>>>>>>>>
>>>>>>>>> GUE and FOU regularly create packets that will be both GSO UDP tunnels
>>>>>>>>> and IPIP, SIT, GRE, etc. This is by design. There should be no
>>>>>>>>> ambiguity in the drivers as to what this means. For instance, if
>>>>>>>>> SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
>>>>>>>>> driver can use ndo_features_check to validate.
>>>>>>>>>
>>>>>>>>> So multiple levels of encapsulation with GRO is perfectly valid and I
>>>>>>>>> would suggest to simply revert this patch. The one potential issue we
>>>>>>>>> could have is the potential for GRO to construct a packet which is a
>>>>>>>>> UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
>>>>>>>>> In this case the GSO flags don't provide enough information to
>>>>>>>>> distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
>>>>>>>>> case). To make this clear we can set udp_mark in GRE, ipip, and sit
>>>>>>>>> but *not* check for it.
>>>>>>>>
>>>>>>>> Generally speaking, multiple levels of encapsulation offload are not
>>>>>>>> valid. I think this is pretty clear from the fact that none of the
>>>>>>>> encapsulation drivers expose support for encapsulation offloads on
>>>>>>>> transmit and the drivers supporting NETIF_F_GSO_GRE and
>>>>>>>> NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.
>>>>>>>>
>>>>>>>
>>>>>>> Kernel software offload does support this combination just fine.
>>>>>>> Seriously, I've tested that more than a thousand times. This is a few
>>>>>>> HW implementations you're referring to. The limitations of these
>>>>>>> drivers should not dictate how we build the software-- it needs to
>>>>>>> work the other way around.
>>>>>>
>>>>>> Jesse does have a point.  Drivers aren't checking for this kind of
>>>>>> thing currently as the transmit side doesn't generate these kind of
>>>>>> frames.  The fact that GRO is makes things a bit more messy as we will
>>>>>> really need to restructure the GSO code in order to handle it.  As far
>>>>>> as drivers testing for it I am pretty certain the i40e isn't.  I would
>>>>>> wonder if we need to add yet another GSO bit to indicate that we
>>>>>> support multiple layers of encapsulation.  I'm pretty sure the only
>>>>>> way we could possibly handle it would be in software since what you
>>>>>> are indicating is a indeterminate number of headers that all require
>>>>>> updates.
>>>>>>
>>>>>>>> Asking drivers to assume that this combination of flags means FOU
>>>>>>>> doesn't seem right to me. To the best of my knowledge, no driver uses
>>>>>>>> ndo_feature_check to do validation of multiple tunnel offload flags
>>>>>>>> since the assumption is that the stack will never try to do this.
>>>>>>>> Since FOU is being treated as only a single level of encapsulation, I
>>>>>>>> think it would be better to just advertise it that way on transmit
>>>>>>>> (i.e. only set SKB_GSO_UDP_TUNNEL).
>>>>>>>>
>>>>>>> If it's not FOU it will be something else. Arbitrarily declaring
>>>>>>> multi-levels of encapsulation invalid is simply the wrong direction,
>>>>>>> we should be increasing generality and capabilities of the kernel not
>>>>>>> holding it down with artificial limits. This is why the generic TSO
>>>>>>> work that Alexander and Edward are looking at is so important-- if
>>>>>>> this flies then we can offload any combination of encapsulations with
>>>>>>> out protocol specific information.
>>>>>>
>>>>>> The segmentation code isn't designed to handle more than 2 layers of
>>>>>> headers.  Currently we have the pointers for the inner headers and the
>>>>>> outer headers.  If you are talking about adding yet another level we
>>>>>> would need additional pointers in the skbuff to handle them and we
>>>>>> currently don't have them at present.
>>>>>>
>>>>>>>> The change that you are suggesting would result in packets generated
>>>>>>>> by GRO that cannot be handled properly on transmit in some cases and
>>>>>>>> would likely end up being dropped or malformed. GRO is just an
>>>>>>>> optimization and correctness must come first so we cannot use it if it
>>>>>>>> might corrupt traffic.
>>>>>>>
>>>>>>> That's (a few) drivers problem. It's no different than if they had
>>>>>>> decided that SKB_GSO_UDP_TUNNEL means vxlan, or they can support GRE
>>>>>>> in IPv4 offload but not GRE in IPv6, or they can only handle headers
>>>>>>> of a certain size, can't handle IPv6 ext. hdrs., etc. As I mentioned,
>>>>>>> the long term solution is to eliminate the GSO_ flags and use a
>>>>>>> generic segmentation offload interface. But in the interim, it is
>>>>>>> *incumbent* on drivers to determine if they can handle a GSO packet
>>>>>>> and the interfaces to do that exist. Restricting the capabilities of
>>>>>>> GRO just to appease those drivers is not right. Breaking existing GRO
>>>>>>> for their benefit is definitely not right.
>>>>>>
>>>>>> This isn't about if drivers can handle it.  It is about if the skbuff
>>>>>> can handle it.  The problem as it stands right now is that we start
>>>>>> losing data once we go past 1 level of encapsulation.  We only have
>>>>>> the standard mac_header, network_header, transport_header, and then
>>>>>> the inner set of the same pointers.  If we try to go multiple levels
>>>>>> deep we start losing data.
>>>>>>
>>>>> Huh? GUE/FOU has been running perfectly well with two levels of
>>>>> encapsulation for over a year now. We never had to add anything to
>>>>> skbuff to make that work. If "losing data" is a problem please provide
>>>>> the *reproducible* test case for that and we'll debug that.
>>>>
>>>> I'm guessing most of your examples involve either a remote checksum
>>>> being enabled or using NICs that don't support any tunnel offloads?
>>>> Hardware needs to be able to identify where headers are in order to
>>>> perform most of their offloads for TSO.  We either have to parse to
>>>> find them or we are provided with them by the stack.  GSO can work
>>>> around it as long as we don't stack checksum based offload with
>>>> non-checksum of the same type.
>>>>
>>>> mostIf for example you were to try and send a frame that had either an
>>>> inner or outer GRE tunnel in addition to a UDP tunnel most NICs would
>>>> probably screw it up.
>>>
>>> Please be more precise. Obviously we're only talking about the few
>>> NICs that even support UDP tunnel offload so it's not most NICs. Also,
>>> there is a big difference between "probably" and "definitely"; no one
>>> has provided a single data point that that there is even an issue. For
>>> instance, looking a i40e I suspect this will work with GRE/UDP since
>>> the driver already deals with offsets and shouldn't care about
>>> intermediate levels of encapsulation.
>>
>> I'm sorry I cannot be super precise as I hadn't looked that closely at
>> the FOU or GUE code before.  Honestly I suspect most people probably
>> haven't.
>>
> Sorry, but one simple test would have identified this patch was a
> regression. I don't mind that bugs are introduced with new patches,
> but it is frustrating when testing of the patches is obviously
> inadequate and then we get the "it's your stuff that's broken" knee
> jerk reaction when a patch causes a regression for someone else.
>
>> So from what I can tell FOU and GUE isn't really even necessarily
>> doing stacked tunnels.  It looks like you support IPIP, SIT, or GRE.
>> So in the case of IPIP and SIT for instance the only real difference
>> between VXLAN and those tunnels with fou is that you don't have the
>> VXLAN or inner Ethernet headers.  So really you still only have two
>> levels of headers.  Even in the case of GRE you have that set after
>> the outer UDP header so in that case GRE ends up being treated as a
>> tunnel header since the outer transport offset was overwritten.  If we
>> had hardware that supported both UDP and GRE outer checksums it would
>> cause a mess as the hardware would place the GRE checksum in the wrong
>> spot.
>>
>> So if anything all we probably would need to do is treat the FOU or
>> GUE stuff as a special case.  Basically if we end up having to do a
>> GRO over a FOU or GUE instance maybe we need to knock the encap_mark
>> back to 0 of it is at 1 and the next level is IPIP, SIT, or GRE
>> because really it can just be treated as a UDP tunnel.  Jesse's code
>> is still needed to catch the case where someone is trying to do
>> something like IPIP over VXLAN or whatever but it does seem like there
>> should be some wiggle room for FOU or GUE.
>>
> Nothing is needed other than to revert this patch. There is no problem
> with GRO. You nor Jesse have not identified any tangible problem. If a
> driver really does had an issue with doing GRE/UDP with GSO then it
> can filter that in check_features (this is like 2 lines of code). But
> that has not been proven to be a problem, and I would expect that
> anyone who wants to fix that better run tests showing there is a
> problem.

Let me try to give a very clear and specific example. Note that this
is pure software and does not involve any hardware offloading.

* A packet is received that is encapsulated with two layers of GRE. It
looks like this: Eth|IP|GRE|IP|GRE|IP|TCP
* The packet is processed through GRO successfully. skb->encapsulation
bit is set and skb_shinfo(skb)->gso_type is set to SKB_GSO_GRE |
SKB_GSO_TCPV4.
* The packet is bridged to an output device and is prepared for
transmission. skb_gso_segment() is called to segment the packet.
* As we descent down the GSO code, we get to gre_gso_segment() for the
first GRE header. skb->encapsulation is cleared and we keep going to
the next header.
* We return to gre_gso_segment() again for the second GRE header.
There is a check for skb->encapsulation being set but it is now clear.
GSO processing is aborted.
* The packet is dropped.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.
  2016-03-28 23:34                     ` Jesse Gross
@ 2016-03-29  0:50                       ` Tom Herbert
  2016-03-29  2:41                         ` Alexander Duyck
  0 siblings, 1 reply; 18+ messages in thread
From: Tom Herbert @ 2016-03-29  0:50 UTC (permalink / raw)
  To: Jesse Gross
  Cc: Alexander Duyck, David Miller, Linux Kernel Network Developers

On Mon, Mar 28, 2016 at 4:34 PM, Jesse Gross <jesse@kernel.org> wrote:
> On Mon, Mar 28, 2016 at 3:10 PM, Tom Herbert <tom@herbertland.com> wrote:
>> On Mon, Mar 28, 2016 at 1:34 PM, Alexander Duyck
>> <alexander.duyck@gmail.com> wrote:
>>> On Mon, Mar 28, 2016 at 1:03 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>> On Mon, Mar 28, 2016 at 12:31 PM, Alexander Duyck
>>>> <alexander.duyck@gmail.com> wrote:
>>>>> On Mon, Mar 28, 2016 at 11:47 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>>>> On Mon, Mar 28, 2016 at 10:37 AM, Alexander Duyck
>>>>>> <alexander.duyck@gmail.com> wrote:
>>>>>>> On Mon, Mar 28, 2016 at 9:31 AM, Tom Herbert <tom@herbertland.com> wrote:
>>>>>>>> On Sun, Mar 27, 2016 at 9:38 PM, Jesse Gross <jesse@kernel.org> wrote:
>>>>>>>>> On Sat, Mar 26, 2016 at 12:41 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>>>>>>>> On Sat, Mar 19, 2016 at 9:32 AM, Jesse Gross <jesse@kernel.org> wrote:
>>>>>>>>>>> When drivers express support for TSO of encapsulated packets, they
>>>>>>>>>>> only mean that they can do it for one layer of encapsulation.
>>>>>>>>>>> Supporting additional levels would mean updating, at a minimum,
>>>>>>>>>>> more IP length fields and they are unaware of this.
>>>>>>>>>>>
>>>>>>>>>> This patch completely breaks GRO for FOU and GUE.
>>>>>>>>>>
>>>>>>>>>>> No encapsulation device expresses support for handling offloaded
>>>>>>>>>>> encapsulated packets, so we won't generate these types of frames
>>>>>>>>>>> in the transmit path. However, GRO doesn't have a check for
>>>>>>>>>>> multiple levels of encapsulation and will attempt to build them.
>>>>>>>>>>>
>>>>>>>>>>> UDP tunnel GRO actually does prevent this situation but it only
>>>>>>>>>>> handles multiple UDP tunnels stacked on top of each other. This
>>>>>>>>>>> generalizes that solution to prevent any kind of tunnel stacking
>>>>>>>>>>> that would cause problems.
>>>>>>>>>>>
>>>>>>>>>> GUE and FOU regularly create packets that will be both GSO UDP tunnels
>>>>>>>>>> and IPIP, SIT, GRE, etc. This is by design. There should be no
>>>>>>>>>> ambiguity in the drivers as to what this means. For instance, if
>>>>>>>>>> SKB_GSO_UDP_TUNNEL and SKB_GSO_GRE are set that is GRE/UDP packet, a
>>>>>>>>>> driver can use ndo_features_check to validate.
>>>>>>>>>>
>>>>>>>>>> So multiple levels of encapsulation with GRO is perfectly valid and I
>>>>>>>>>> would suggest to simply revert this patch. The one potential issue we
>>>>>>>>>> could have is the potential for GRO to construct a packet which is a
>>>>>>>>>> UDP-encapsulation inside another encapsulation, like UDP-encap in GRE.
>>>>>>>>>> In this case the GSO flags don't provide enough information to
>>>>>>>>>> distinguish say between GRE/UDP (common case) and UDP/GRE (uncommon
>>>>>>>>>> case). To make this clear we can set udp_mark in GRE, ipip, and sit
>>>>>>>>>> but *not* check for it.
>>>>>>>>>
>>>>>>>>> Generally speaking, multiple levels of encapsulation offload are not
>>>>>>>>> valid. I think this is pretty clear from the fact that none of the
>>>>>>>>> encapsulation drivers expose support for encapsulation offloads on
>>>>>>>>> transmit and the drivers supporting NETIF_F_GSO_GRE and
>>>>>>>>> NETIF_F_GSO_UDP_TUNNEL do not mean they can handle GRE in VXLAN.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Kernel software offload does support this combination just fine.
>>>>>>>> Seriously, I've tested that more than a thousand times. This is a few
>>>>>>>> HW implementations you're referring to. The limitations of these
>>>>>>>> drivers should not dictate how we build the software-- it needs to
>>>>>>>> work the other way around.
>>>>>>>
>>>>>>> Jesse does have a point.  Drivers aren't checking for this kind of
>>>>>>> thing currently as the transmit side doesn't generate these kind of
>>>>>>> frames.  The fact that GRO is makes things a bit more messy as we will
>>>>>>> really need to restructure the GSO code in order to handle it.  As far
>>>>>>> as drivers testing for it I am pretty certain the i40e isn't.  I would
>>>>>>> wonder if we need to add yet another GSO bit to indicate that we
>>>>>>> support multiple layers of encapsulation.  I'm pretty sure the only
>>>>>>> way we could possibly handle it would be in software since what you
>>>>>>> are indicating is a indeterminate number of headers that all require
>>>>>>> updates.
>>>>>>>
>>>>>>>>> Asking drivers to assume that this combination of flags means FOU
>>>>>>>>> doesn't seem right to me. To the best of my knowledge, no driver uses
>>>>>>>>> ndo_feature_check to do validation of multiple tunnel offload flags
>>>>>>>>> since the assumption is that the stack will never try to do this.
>>>>>>>>> Since FOU is being treated as only a single level of encapsulation, I
>>>>>>>>> think it would be better to just advertise it that way on transmit
>>>>>>>>> (i.e. only set SKB_GSO_UDP_TUNNEL).
>>>>>>>>>
>>>>>>>> If it's not FOU it will be something else. Arbitrarily declaring
>>>>>>>> multi-levels of encapsulation invalid is simply the wrong direction,
>>>>>>>> we should be increasing generality and capabilities of the kernel not
>>>>>>>> holding it down with artificial limits. This is why the generic TSO
>>>>>>>> work that Alexander and Edward are looking at is so important-- if
>>>>>>>> this flies then we can offload any combination of encapsulations with
>>>>>>>> out protocol specific information.
>>>>>>>
>>>>>>> The segmentation code isn't designed to handle more than 2 layers of
>>>>>>> headers.  Currently we have the pointers for the inner headers and the
>>>>>>> outer headers.  If you are talking about adding yet another level we
>>>>>>> would need additional pointers in the skbuff to handle them and we
>>>>>>> currently don't have them at present.
>>>>>>>
>>>>>>>>> The change that you are suggesting would result in packets generated
>>>>>>>>> by GRO that cannot be handled properly on transmit in some cases and
>>>>>>>>> would likely end up being dropped or malformed. GRO is just an
>>>>>>>>> optimization and correctness must come first so we cannot use it if it
>>>>>>>>> might corrupt traffic.
>>>>>>>>
>>>>>>>> That's (a few) drivers problem. It's no different than if they had
>>>>>>>> decided that SKB_GSO_UDP_TUNNEL means vxlan, or they can support GRE
>>>>>>>> in IPv4 offload but not GRE in IPv6, or they can only handle headers
>>>>>>>> of a certain size, can't handle IPv6 ext. hdrs., etc. As I mentioned,
>>>>>>>> the long term solution is to eliminate the GSO_ flags and use a
>>>>>>>> generic segmentation offload interface. But in the interim, it is
>>>>>>>> *incumbent* on drivers to determine if they can handle a GSO packet
>>>>>>>> and the interfaces to do that exist. Restricting the capabilities of
>>>>>>>> GRO just to appease those drivers is not right. Breaking existing GRO
>>>>>>>> for their benefit is definitely not right.
>>>>>>>
>>>>>>> This isn't about if drivers can handle it.  It is about if the skbuff
>>>>>>> can handle it.  The problem as it stands right now is that we start
>>>>>>> losing data once we go past 1 level of encapsulation.  We only have
>>>>>>> the standard mac_header, network_header, transport_header, and then
>>>>>>> the inner set of the same pointers.  If we try to go multiple levels
>>>>>>> deep we start losing data.
>>>>>>>
>>>>>> Huh? GUE/FOU has been running perfectly well with two levels of
>>>>>> encapsulation for over a year now. We never had to add anything to
>>>>>> skbuff to make that work. If "losing data" is a problem please provide
>>>>>> the *reproducible* test case for that and we'll debug that.
>>>>>
>>>>> I'm guessing most of your examples involve either a remote checksum
>>>>> being enabled or using NICs that don't support any tunnel offloads?
>>>>> Hardware needs to be able to identify where headers are in order to
>>>>> perform most of their offloads for TSO.  We either have to parse to
>>>>> find them or we are provided with them by the stack.  GSO can work
>>>>> around it as long as we don't stack checksum based offload with
>>>>> non-checksum of the same type.
>>>>>
>>>>> mostIf for example you were to try and send a frame that had either an
>>>>> inner or outer GRE tunnel in addition to a UDP tunnel most NICs would
>>>>> probably screw it up.
>>>>
>>>> Please be more precise. Obviously we're only talking about the few
>>>> NICs that even support UDP tunnel offload so it's not most NICs. Also,
>>>> there is a big difference between "probably" and "definitely"; no one
>>>> has provided a single data point that that there is even an issue. For
>>>> instance, looking a i40e I suspect this will work with GRE/UDP since
>>>> the driver already deals with offsets and shouldn't care about
>>>> intermediate levels of encapsulation.
>>>
>>> I'm sorry I cannot be super precise as I hadn't looked that closely at
>>> the FOU or GUE code before.  Honestly I suspect most people probably
>>> haven't.
>>>
>> Sorry, but one simple test would have identified this patch was a
>> regression. I don't mind that bugs are introduced with new patches,
>> but it is frustrating when testing of the patches is obviously
>> inadequate and then we get the "it's your stuff that's broken" knee
>> jerk reaction when a patch causes a regression for someone else.
>>
>>> So from what I can tell FOU and GUE isn't really even necessarily
>>> doing stacked tunnels.  It looks like you support IPIP, SIT, or GRE.
>>> So in the case of IPIP and SIT for instance the only real difference
>>> between VXLAN and those tunnels with fou is that you don't have the
>>> VXLAN or inner Ethernet headers.  So really you still only have two
>>> levels of headers.  Even in the case of GRE you have that set after
>>> the outer UDP header so in that case GRE ends up being treated as a
>>> tunnel header since the outer transport offset was overwritten.  If we
>>> had hardware that supported both UDP and GRE outer checksums it would
>>> cause a mess as the hardware would place the GRE checksum in the wrong
>>> spot.
>>>
>>> So if anything all we probably would need to do is treat the FOU or
>>> GUE stuff as a special case.  Basically if we end up having to do a
>>> GRO over a FOU or GUE instance maybe we need to knock the encap_mark
>>> back to 0 of it is at 1 and the next level is IPIP, SIT, or GRE
>>> because really it can just be treated as a UDP tunnel.  Jesse's code
>>> is still needed to catch the case where someone is trying to do
>>> something like IPIP over VXLAN or whatever but it does seem like there
>>> should be some wiggle room for FOU or GUE.
>>>
>> Nothing is needed other than to revert this patch. There is no problem
>> with GRO. You nor Jesse have not identified any tangible problem. If a
>> driver really does had an issue with doing GRE/UDP with GSO then it
>> can filter that in check_features (this is like 2 lines of code). But
>> that has not been proven to be a problem, and I would expect that
>> anyone who wants to fix that better run tests showing there is a
>> problem.
>
> Let me try to give a very clear and specific example. Note that this
> is pure software and does not involve any hardware offloading.
>
> * A packet is received that is encapsulated with two layers of GRE. It
> looks like this: Eth|IP|GRE|IP|GRE|IP|TCP
> * The packet is processed through GRO successfully. skb->encapsulation
> bit is set and skb_shinfo(skb)->gso_type is set to SKB_GSO_GRE |
> SKB_GSO_TCPV4.
> * The packet is bridged to an output device and is prepared for
> transmission. skb_gso_segment() is called to segment the packet.
> * As we descent down the GSO code, we get to gre_gso_segment() for the
> first GRE header. skb->encapsulation is cleared and we keep going to
> the next header.
> * We return to gre_gso_segment() again for the second GRE header.
> There is a check for skb->encapsulation being set but it is now clear.
> GSO processing is aborted.
> * The packet is dropped.

If multiple level of GRE GSO is the problem then you are welcome to
fix it after FOU is fixed and when net-next opens (GSO partial should
fix this this anyway). But this is not at all the same problem as
saying all multiple levels of encapsulation are invalid so we need to
disallow that in GRO for everyone regardless of whether the packet is
is being forwarded or some driver can't understand a certain valid
combination.

Testing GUE is really not hard. There is no reason why GUE, Geneve,
and VXLAN should not be tested each time a change is made to any of
the common UDP tunneling code.

Configuring ipip-GUE can be done by:

./ip fou add port 6080 gue
./ip link add name tun1 type ipip remote 10.1.1.2 local 10.1.1.1 ttl
225 encap gue encap-sport auto encap-dport 6080 encap-csum
encap-remcsum
ifconfig tun1 192.168.1.1
ip route add 192.168.1.0/24 dev tun1

Configuring gre-GUE is just s/ipip/gre/ of above.

./netperf -H 192.168.1.2 -t TCP_STREAM -l 100

Is good for showing showing regression in a single flow. You should
see GSO/GRO being effective on both sides and perf should show only a
minute amount of time in csum_partial assuming that your device has
checksum offload for outer UDP (which should be about all).

./super_netperf 200 -H 192.168.1.2 -j -l 1 -t TCP_RR -- -r 1,1 -o
TRANSACTION_RATE,P50_LATENCY,P90_LATENCY,P99_LATENCY

Is run to test pps. We're looking for no regression in tput, latency,
or CPU utilization.

Next time you make changes to anything that affects common paths in
tunnels please run these tests and report the results in the commit
log so we can avoid regressions like this. You should also be running
an equivalent battery of VXLAN tests. The configuration I use is:

./ip link add vxlan0 type vxlan id 10 group 224.10.10.10 ttl 10 dev
eth0  udpcsum remcsumtx remcsumrx
ifconfig vxlan0 192.168.111.1
ip route add 192.168.111.0/24 dev vxlan0

Again we expect the same sort of results, GRO/GSO should be effective,
csum_partial should not be getting significant tine, etc...

Thanks,
Tom

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation.
  2016-03-29  0:50                       ` Tom Herbert
@ 2016-03-29  2:41                         ` Alexander Duyck
  0 siblings, 0 replies; 18+ messages in thread
From: Alexander Duyck @ 2016-03-29  2:41 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Jesse Gross, David Miller, Linux Kernel Network Developers

On Mon, Mar 28, 2016 at 5:50 PM, Tom Herbert <tom@herbertland.com> wrote:
> On Mon, Mar 28, 2016 at 4:34 PM, Jesse Gross <jesse@kernel.org> wrote:
>> * A packet is received that is encapsulated with two layers of GRE. It
>> looks like this: Eth|IP|GRE|IP|GRE|IP|TCP
>> * The packet is processed through GRO successfully. skb->encapsulation
>> bit is set and skb_shinfo(skb)->gso_type is set to SKB_GSO_GRE |
>> SKB_GSO_TCPV4.
>> * The packet is bridged to an output device and is prepared for
>> transmission. skb_gso_segment() is called to segment the packet.
>> * As we descent down the GSO code, we get to gre_gso_segment() for the
>> first GRE header. skb->encapsulation is cleared and we keep going to
>> the next header.
>> * We return to gre_gso_segment() again for the second GRE header.
>> There is a check for skb->encapsulation being set but it is now clear.
>> GSO processing is aborted.
>> * The packet is dropped.
>
> If multiple level of GRE GSO is the problem then you are welcome to
> fix it after FOU is fixed and when net-next opens (GSO partial should
> fix this this anyway). But this is not at all the same problem as
> saying all multiple levels of encapsulation are invalid so we need to
> disallow that in GRO for everyone regardless of whether the packet is
> is being forwarded or some driver can't understand a certain valid
> combination.

GSO partial is not magic.  It is not going to fix a great many of
these kind of issues.  All GSO partial gives us is a way for hardware
to segment frames if the headers can be replicated.  If GSO cannot
handle the frame in software then GSO partial still cannot handle it.

It looks like what we probably need to do is rewrite GSO in order to
support this as multiple tunnel levels are currently not supported.  I
figure it is something we can add once we have GSO partial in.  Maybe
call it GSO_STACKED.  I still have to figure out where we are going to
want the header pointers since there are obvious bits that wouldn't
work such as the part in UDP or GRE segmentation code where we assume
we can just reset the network header offset based off of the inner
network header offset.  If we have multiple levels of tunnels then
this is blatantly broken.  Maybe to get around it we might have to add
the tunnel header length as a value passed with UDP offloads.

> Testing GUE is really not hard. There is no reason why GUE, Geneve,
> and VXLAN should not be tested each time a change is made to any of
> the common UDP tunneling code.
>
> Configuring ipip-GUE can be done by:
>
> ./ip fou add port 6080 gue
> ./ip link add name tun1 type ipip remote 10.1.1.2 local 10.1.1.1 ttl
> 225 encap gue encap-sport auto encap-dport 6080 encap-csum
> encap-remcsum
> ifconfig tun1 192.168.1.1
> ip route add 192.168.1.0/24 dev tun1
>
> Configuring gre-GUE is just s/ipip/gre/ of above.
>
> ./netperf -H 192.168.1.2 -t TCP_STREAM -l 100

This is essentially the test case I ran to verify that the patch I
submitted fixed the issue.  I'm not sure we need to do too much
testing for the patch I submitted since the whole thing is pretty
straightforward.

> Is good for showing showing regression in a single flow. You should
> see GSO/GRO being effective on both sides and perf should show only a
> minute amount of time in csum_partial assuming that your device has
> checksum offload for outer UDP (which should be about all).
>
> ./super_netperf 200 -H 192.168.1.2 -j -l 1 -t TCP_RR -- -r 1,1 -o
> TRANSACTION_RATE,P50_LATENCY,P90_LATENCY,P99_LATENCY
>
> Is run to test pps. We're looking for no regression in tput, latency,
> or CPU utilization.
>
> Next time you make changes to anything that affects common paths in
> tunnels please run these tests and report the results in the commit
> log so we can avoid regressions like this. You should also be running
> an equivalent battery of VXLAN tests. The configuration I use is:
>
> ./ip link add vxlan0 type vxlan id 10 group 224.10.10.10 ttl 10 dev
> eth0  udpcsum remcsumtx remcsumrx
> ifconfig vxlan0 192.168.111.1
> ip route add 192.168.111.0/24 dev vxlan0
>
> Again we expect the same sort of results, GRO/GSO should be effective,
> csum_partial should not be getting significant tine, etc...

With my patch I can verify that GRO is working and coalescing frames
at least for the IPIP case as that was the only one I tested.

If there is something I missed I am willing to respin it and resubmit
it.  From what I can tell though it should restore the original
functionality for the FOU/GUE cases though since all it really does is
push the setting of the encap_mark back from the UDP header over to
the next level header which is the correct behavior (at least in my
opinion).

- Alex

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2016-03-29  2:41 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-19 16:31 [PATCH net v2 0/3] Tunneling fixes Jesse Gross
2016-03-19 16:32 ` [PATCH net v2 1/3] ipip: Properly mark ipip GRO packets as encapsulated Jesse Gross
2016-03-19 16:32 ` [PATCH net v2 2/3] tunnels: Don't apply GRO to multiple layers of encapsulation Jesse Gross
2016-03-26 19:41   ` Tom Herbert
2016-03-28  4:38     ` Jesse Gross
2016-03-28 16:31       ` Tom Herbert
2016-03-28 17:37         ` Alexander Duyck
2016-03-28 18:47           ` Tom Herbert
2016-03-28 19:31             ` Alexander Duyck
2016-03-28 20:03               ` Tom Herbert
2016-03-28 20:34                 ` Alexander Duyck
2016-03-28 22:10                   ` Tom Herbert
2016-03-28 23:34                     ` Jesse Gross
2016-03-29  0:50                       ` Tom Herbert
2016-03-29  2:41                         ` Alexander Duyck
2016-03-28 19:33       ` David Miller
2016-03-19 16:32 ` [PATCH net v2 3/3] tunnels: Remove encapsulation offloads on decap Jesse Gross
2016-03-20 20:33 ` [PATCH net v2 0/3] Tunneling fixes David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.