All of lore.kernel.org
 help / color / mirror / Atom feed
* [net PATCH 0/2] Fixes for GRO and GRE tunnels
@ 2016-04-01 18:05 Alexander Duyck
  2016-04-01 18:05 ` [net PATCH 1/2] GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU Alexander Duyck
  2016-04-01 18:05 ` [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864 Alexander Duyck
  0 siblings, 2 replies; 15+ messages in thread
From: Alexander Duyck @ 2016-04-01 18:05 UTC (permalink / raw)
  To: herbert, tom, jesse, alexander.duyck, edumazet, netdev, davem

This pair of patches addresses a few issues I have discovered over the last
week or so concerning GRO and GRE tunnels.

The first patch addresses an item I called out as an issue with FOU/GUE
encapsulating GRE, and I finally had a chance to test it and verify that
the code concerning it was broken so I took the opportunity to fix it so
that we cannot generate a FOU/GUE frame that is encapsulating a GRE tunnel
with checksum while requesting TSO/GSO for the frame.

The second patch actually addresses something I realized was an issue if we
feed a tunnel through GRO and back out through GSO.  Specifically it was
possible for GRO to generate overlapping IPv4 ID ranges as the outer IP IDs
were being ignored for tunnels.  Ignoring the IP IDs like this should only
be valid if the DF bit is set.  This is normally the case for IPIP, SIT,
and GRE tunnels, but not so for UDP tunnels.  In the case that the DF bit
is not set we store off the fact that there was a delta from what we were
expecting and when we hit the inner-most header we validate the value as to
avoid generating a frame which could lead to an IP ID collision on packets
that could eventually be fragmented.  A side effect is that the inner-most
IP ID test is relaxed as well, but the worst case scenario is that we GRO a
frame with a throw-away ID sequence anyway so if anything segmenting such a
frame with the wrong IP IDs should have no negative effects.

---

Alexander Duyck (2):
      GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU
      ipv4/GRO: Make GRO conform to RFC 6864


 include/linux/netdevice.h |    5 ++++-
 net/core/dev.c            |    2 ++
 net/ipv4/af_inet.c        |   23 ++++++++++++++++-------
 net/ipv4/fou.c            |    6 ++++++
 net/ipv4/gre_offload.c    |    8 ++++++++
 net/ipv4/ip_gre.c         |   13 ++++++++++---
 net/ipv6/ip6_offload.c    |    3 ---
 7 files changed, 46 insertions(+), 14 deletions(-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [net PATCH 1/2] GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU
  2016-04-01 18:05 [net PATCH 0/2] Fixes for GRO and GRE tunnels Alexander Duyck
@ 2016-04-01 18:05 ` Alexander Duyck
  2016-04-01 18:05 ` [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864 Alexander Duyck
  1 sibling, 0 replies; 15+ messages in thread
From: Alexander Duyck @ 2016-04-01 18:05 UTC (permalink / raw)
  To: herbert, tom, jesse, alexander.duyck, edumazet, netdev, davem

This patch fixes an issue I found in which we were dropping frames if we
had enabled checksums on GRE headers that were encapsulated by either FOU
or GUE.  Without this patch I was barely able to get 1 Gb/s of throughput.
With this patch applied I am now at least getting around 6 Gb/s.

The issue is due to the fact that with FOU or GUE applied we do not provide
a transport offset pointing to the GRE header, nor do we offload it in
software as the GRE header is completely skipped by GSO and treated like a
VXLAN or GENEVE type header.  As such we need to prevent the stack from
generating it and also prevent GRE from generating it via any interface we
create.

Fixes: c3483384ee511 ("gro: Allow tunnel stacking in the case of FOU/GUE")
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
 include/linux/netdevice.h |    5 ++++-
 net/core/dev.c            |    1 +
 net/ipv4/fou.c            |    6 ++++++
 net/ipv4/gre_offload.c    |    8 ++++++++
 net/ipv4/ip_gre.c         |   13 ++++++++++---
 5 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cb0d5d09c2e4..8395308a2445 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2120,7 +2120,10 @@ struct napi_gro_cb {
 	/* Used in foo-over-udp, set in udp[46]_gro_receive */
 	u8	is_ipv6:1;
 
-	/* 7 bit hole */
+	/* Used in GRE, set in fou/gue_gro_receive */
+	u8	is_fou:1;
+
+	/* 6 bit hole */
 
 	/* used to support CHECKSUM_COMPLETE for tunneling protocols */
 	__wsum	csum;
diff --git a/net/core/dev.c b/net/core/dev.c
index b9bcbe77d913..77a71cd68535 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4439,6 +4439,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
 		NAPI_GRO_CB(skb)->flush = 0;
 		NAPI_GRO_CB(skb)->free = 0;
 		NAPI_GRO_CB(skb)->encap_mark = 0;
+		NAPI_GRO_CB(skb)->is_fou = 0;
 		NAPI_GRO_CB(skb)->gro_remcsum_start = 0;
 
 		/* Setup for GRO checksum validation */
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index 5a94aea280d3..a39068b4a4d9 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -203,6 +203,9 @@ static struct sk_buff **fou_gro_receive(struct sk_buff **head,
 	 */
 	NAPI_GRO_CB(skb)->encap_mark = 0;
 
+	/* Flag this frame as already having an outer encap header */
+	NAPI_GRO_CB(skb)->is_fou = 1;
+
 	rcu_read_lock();
 	offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
 	ops = rcu_dereference(offloads[proto]);
@@ -368,6 +371,9 @@ static struct sk_buff **gue_gro_receive(struct sk_buff **head,
 	 */
 	NAPI_GRO_CB(skb)->encap_mark = 0;
 
+	/* Flag this frame as already having an outer encap header */
+	NAPI_GRO_CB(skb)->is_fou = 1;
+
 	rcu_read_lock();
 	offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
 	ops = rcu_dereference(offloads[guehdr->proto_ctype]);
diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index c47539d04b88..6a5bd4317866 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -150,6 +150,14 @@ static struct sk_buff **gre_gro_receive(struct sk_buff **head,
 	if ((greh->flags & ~(GRE_KEY|GRE_CSUM)) != 0)
 		goto out;
 
+	/* We can only support GRE_CSUM if we can track the location of
+	 * the GRE header.  In the case of FOU/GUE we cannot because the
+	 * outer UDP header displaces the GRE header leaving us in a state
+	 * of limbo.
+	 */
+	if ((greh->flags & GRE_CSUM) && NAPI_GRO_CB(skb)->is_fou)
+		goto out;
+
 	type = greh->protocol;
 
 	rcu_read_lock();
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 31936d387cfd..af5d1f38217f 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -862,9 +862,16 @@ static void __gre_tunnel_init(struct net_device *dev)
 	dev->hw_features	|= GRE_FEATURES;
 
 	if (!(tunnel->parms.o_flags & TUNNEL_SEQ)) {
-		/* TCP offload with GRE SEQ is not supported. */
-		dev->features    |= NETIF_F_GSO_SOFTWARE;
-		dev->hw_features |= NETIF_F_GSO_SOFTWARE;
+		/* TCP offload with GRE SEQ is not supported, nor
+		 * can we support 2 levels of outer headers requiring
+		 * an update.
+		 */
+		if (!(tunnel->parms.o_flags & TUNNEL_CSUM) ||
+		    (tunnel->encap.type == TUNNEL_ENCAP_NONE)) {
+			dev->features    |= NETIF_F_GSO_SOFTWARE;
+			dev->hw_features |= NETIF_F_GSO_SOFTWARE;
+		}
+
 		/* Can use a lockless transmit, unless we generate
 		 * output sequences
 		 */

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
  2016-04-01 18:05 [net PATCH 0/2] Fixes for GRO and GRE tunnels Alexander Duyck
  2016-04-01 18:05 ` [net PATCH 1/2] GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU Alexander Duyck
@ 2016-04-01 18:05 ` Alexander Duyck
  2016-04-01 18:49   ` Eric Dumazet
  1 sibling, 1 reply; 15+ messages in thread
From: Alexander Duyck @ 2016-04-01 18:05 UTC (permalink / raw)
  To: herbert, tom, jesse, alexander.duyck, edumazet, netdev, davem

RFC 6864 states that the IPv4 ID field MUST NOT be used for purposes other
than fragmentation and reassembly.  Currently we are looking at this field
as a way of identifying what frames can be aggregated and  which cannot for
GRO.  While this is valid for frames that do not have DF set, it is invalid
to do so if the bit is set.

In addition we were generating IPv4 ID collisions when 2 or more flows were
interleaved over the same tunnel.  To prevent that we store the result of
all IP ID checks via a "|=" instead of overwriting previous values.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
---
 net/core/dev.c         |    1 +
 net/ipv4/af_inet.c     |   23 ++++++++++++++++-------
 net/ipv6/ip6_offload.c |    3 ---
 3 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 77a71cd68535..3429632398a4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4352,6 +4352,7 @@ static void gro_list_prepare(struct napi_struct *napi, struct sk_buff *skb)
 		unsigned long diffs;
 
 		NAPI_GRO_CB(p)->flush = 0;
+		NAPI_GRO_CB(p)->flush_id = 0;
 
 		if (hash != skb_get_hash_raw(p)) {
 			NAPI_GRO_CB(p)->same_flow = 0;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 9e481992dbae..7d8733393934 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1347,14 +1347,23 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head,
 			(iph->tos ^ iph2->tos) |
 			((iph->frag_off ^ iph2->frag_off) & htons(IP_DF));
 
-		/* Save the IP ID check to be included later when we get to
-		 * the transport layer so only the inner most IP ID is checked.
-		 * This is because some GSO/TSO implementations do not
-		 * correctly increment the IP ID for the outer hdrs.
-		 */
-		NAPI_GRO_CB(p)->flush_id =
-			    ((u16)(ntohs(iph2->id) + NAPI_GRO_CB(p)->count) ^ id);
 		NAPI_GRO_CB(p)->flush |= flush;
+
+		/* For non-atomic datagrams we need to save the IP ID offset
+		 * to be included later.  If the frame has the DF bit set
+		 * we must ignore the IP ID value as per RFC 6864.
+		 */
+		if (iph2->frag_off & htons(IP_DF))
+			continue;
+
+		/* We must save the offset as it is possible to have multiple
+		 * flows using the same protocol and address pairs so we
+		 * need to wait until we can validate this is part of the
+		 * same flow with a 5-tuple or better to avoid unnecessary
+		 * collisions between flows.
+		 */
+		NAPI_GRO_CB(p)->flush_id |= ntohs(iph2->id) ^
+					    (u16)(id - NAPI_GRO_CB(p)->count);
 	}
 
 	NAPI_GRO_CB(skb)->flush |= flush;
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 82e9f3076028..9aa53f64dffd 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -238,9 +238,6 @@ static struct sk_buff **ipv6_gro_receive(struct sk_buff **head,
 		/* flush if Traffic Class fields are different */
 		NAPI_GRO_CB(p)->flush |= !!(first_word & htonl(0x0FF00000));
 		NAPI_GRO_CB(p)->flush |= flush;
-
-		/* Clear flush_id, there's really no concept of ID in IPv6. */
-		NAPI_GRO_CB(p)->flush_id = 0;
 	}
 
 	NAPI_GRO_CB(skb)->flush |= flush;

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
  2016-04-01 18:05 ` [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864 Alexander Duyck
@ 2016-04-01 18:49   ` Eric Dumazet
  2016-04-01 19:24     ` David Miller
  2016-04-02  1:57     ` Herbert Xu
  0 siblings, 2 replies; 15+ messages in thread
From: Eric Dumazet @ 2016-04-01 18:49 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: herbert, tom, jesse, alexander.duyck, edumazet, netdev, davem

On Fri, 2016-04-01 at 11:05 -0700, Alexander Duyck wrote:
> RFC 6864 states that the IPv4 ID field MUST NOT be used for purposes other
> than fragmentation and reassembly.  Currently we are looking at this field
> as a way of identifying what frames can be aggregated and  which cannot for
> GRO.  While this is valid for frames that do not have DF set, it is invalid
> to do so if the bit is set.
> 
> In addition we were generating IPv4 ID collisions when 2 or more flows were
> interleaved over the same tunnel.  To prevent that we store the result of
> all IP ID checks via a "|=" instead of overwriting previous values.

Note that for atomic datagrams (DF = 1), since fragmentation and
reassembly can not occur, maybe some people use ID field for other
purposes.

For example, TCP stack tracks per socket ID generation, even if it sends
DF=1 frames. Damn useful for tcpdump analysis and drop inference.


With your change, the resulting GRO packet would propagate the ID of
first frag. Most GSO/GSO engines will then provide a ID sequence, which
might not match original packets.

I do not particularly care, but it is worth mentioning that GRO+TSO
would not be idempotent anymore.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
  2016-04-01 18:49   ` Eric Dumazet
@ 2016-04-01 19:24     ` David Miller
  2016-04-01 19:58       ` Alexander Duyck
  2016-04-02  1:57     ` Herbert Xu
  1 sibling, 1 reply; 15+ messages in thread
From: David Miller @ 2016-04-01 19:24 UTC (permalink / raw)
  To: eric.dumazet
  Cc: aduyck, herbert, tom, jesse, alexander.duyck, edumazet, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 01 Apr 2016 11:49:03 -0700

> For example, TCP stack tracks per socket ID generation, even if it
> sends DF=1 frames. Damn useful for tcpdump analysis and drop
> inference.

Thanks for mentioning this, I never considered this use case.

> With your change, the resulting GRO packet would propagate the ID of
> first frag. Most GSO/GSO engines will then provide a ID sequence,
> which might not match original packets.
> 
> I do not particularly care, but it is worth mentioning that GRO+TSO
> would not be idempotent anymore.

Our eventual plan was to start emitting zero in the ID field for
outgoing TCP datagrams with DF set, since the issue that caused us to
generate incrementing IDs in the first place (buggy Microsoft SLHC
compression) we decided is not relevant and important enough to
accommodate any more.

So outside of your TCP behavior analysis case, there isn't a
compelling argument to keeping that code around any more, rather than
just put zero in the ID field.

I suppose we could keep the counter code around and allow it to be
enabled using a sysctl or socket option, but how strongly do you
really feel about this?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
  2016-04-01 19:24     ` David Miller
@ 2016-04-01 19:58       ` Alexander Duyck
  2016-04-01 21:13         ` Subash Abhinov Kasiviswanathan
  2016-04-02  2:16         ` David Miller
  0 siblings, 2 replies; 15+ messages in thread
From: Alexander Duyck @ 2016-04-01 19:58 UTC (permalink / raw)
  To: David Miller
  Cc: Eric Dumazet, Alex Duyck, Herbert Xu, Tom Herbert, Jesse Gross,
	Eric Dumazet, Netdev

On Fri, Apr 1, 2016 at 12:24 PM, David Miller <davem@davemloft.net> wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 01 Apr 2016 11:49:03 -0700
>
>> For example, TCP stack tracks per socket ID generation, even if it
>> sends DF=1 frames. Damn useful for tcpdump analysis and drop
>> inference.
>
> Thanks for mentioning this, I never considered this use case.

RFC 6864 is pretty explicit about this, IPv4 ID used only for
fragmentation.  https://tools.ietf.org/html/rfc6864#section-4.1

The goal with this change is to try and keep most of the existing
behavior in tact without violating this rule?  I would think the
sequence number should give you the ability to infer a drop in the
case of TCP.  In the case of UDP tunnels we are now getting a bit more
data since we were ignoring the outer IP header ID before.

>> With your change, the resulting GRO packet would propagate the ID of
>> first frag. Most GSO/GSO engines will then provide a ID sequence,
>> which might not match original packets.

Right.  But that is only in the case where the IP IDs did not already
increment or were left uninitialized meaning the transmitter was
probably already following RFC 6864 and chose a fixed value.  Odds are
in such a case we end up improving the performance if anything as
there are plenty of legacy systems out there that still require the
IPv4 ID increment in order to get LRO/GRO.

>> I do not particularly care, but it is worth mentioning that GRO+TSO
>> would not be idempotent anymore.

In the patch I mentioned we had already broken that.  I'm basically
just going through and fixing the cases for tunnels where we were
doing the outer header wrong while at the same time relaxing the
requirements for the inner header if DF is set.  I'll probably add
some documentation do the Documentation folder about it as well.  I'm
currently in the process of writing up documentation for GSO and GSO
partial for the upcoming patchset.  I can pretty easily throw in a few
comments about GRO as well.

> Our eventual plan was to start emitting zero in the ID field for
> outgoing TCP datagrams with DF set, since the issue that caused us to
> generate incrementing IDs in the first place (buggy Microsoft SLHC
> compression) we decided is not relevant and important enough to
> accommodate any more.

For the GSO partial stuff I was probably just going to have the IP ID
on the inner headers lurch forward in chunks equal to gso_segs when we
are doing the segmentation.  I didn't want to use a fixed value just
because that would likely make it easy to identify Linux devices being
a bump in the wire.  I figure if there are already sources that
weren't updating IP ID for their segmentation offloads then if we just
take that approach odds are we will blend in with the other devices
and be more difficult to single out.

Another reason for doing it this way is that different devices are
going to have different behaviors with GSO partial.  In the case of
the i40e driver it recognizes both inner and outer network headers so
it can increment both correctly.  In the case of igb and ixgbe they
only can support the outer header so the inner IP ID value would be
lurching by gso_size every time we move from one GSO frame to the
next.

> So outside of your TCP behavior analysis case, there isn't a
> compelling argument to keeping that code around any more, rather than
> just put zero in the ID field.
>
> I suppose we could keep the counter code around and allow it to be
> enabled using a sysctl or socket option, but how strongly do you
> really feel about this?

I'm not suggesting we drop the counter code for transmit.  What RFC
6864 says is "Originating sources MAY set the IPv4 ID field of atomic
datagrams to any value."

For transmit we can leave the IP ID code as is.  For receive we should
not be snooping into the IP ID for any frames that have the DF bit set
as devices that have adopted RFC 6864 on their transmit path will end
up causing issues.

- Alex

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
  2016-04-01 19:58       ` Alexander Duyck
@ 2016-04-01 21:13         ` Subash Abhinov Kasiviswanathan
  2016-04-02  2:16         ` David Miller
  1 sibling, 0 replies; 15+ messages in thread
From: Subash Abhinov Kasiviswanathan @ 2016-04-01 21:13 UTC (permalink / raw)
  To: 'Alexander Duyck', 'David Miller'
  Cc: 'Eric Dumazet', 'Alex Duyck',
	'Herbert Xu', 'Tom Herbert',
	'Jesse Gross', 'Eric Dumazet', 'Netdev'

|  For transmit we can leave the IP ID code as is.  For receive we should not be
|  snooping into the IP ID for any frames that have the DF bit set as devices
|  that have adopted RFC 6864 on their transmit path will end up causing issues.

Currently, GRO does not coalesce TCP packets originating from nodes which do IPv6 to IPv4 translation.
These packets have DF set to 1 and IP ID set 0 according to https://tools.ietf.org/html/rfc6145#page-17 

With this patch, we should be able to see coalescing happen for those cases as well.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
  2016-04-01 18:49   ` Eric Dumazet
  2016-04-01 19:24     ` David Miller
@ 2016-04-02  1:57     ` Herbert Xu
  2016-04-02  2:15       ` Eric Dumazet
  1 sibling, 1 reply; 15+ messages in thread
From: Herbert Xu @ 2016-04-02  1:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: aduyck, tom, jesse, alexander.duyck, edumazet, netdev, davem

Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> I do not particularly care, but it is worth mentioning that GRO+TSO
> would not be idempotent anymore.

We could easily fix that by adding a feature bit to control this,
something like SKB_GSO_TCP_FIXEDID.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
  2016-04-02  1:57     ` Herbert Xu
@ 2016-04-02  2:15       ` Eric Dumazet
  2016-04-02  2:19         ` Herbert Xu
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2016-04-02  2:15 UTC (permalink / raw)
  To: Herbert Xu; +Cc: aduyck, tom, jesse, alexander.duyck, edumazet, netdev, davem

On Sat, 2016-04-02 at 09:57 +0800, Herbert Xu wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > I do not particularly care, but it is worth mentioning that GRO+TSO
> > would not be idempotent anymore.
> 
> We could easily fix that by adding a feature bit to control this,
> something like SKB_GSO_TCP_FIXEDID.

I understood the patch allowed to aggregate 4 segments having

ID=12   ID=40   ID=80  ID=1000

-> resulting GRO packet with 4 segments and ID=12.  TSO would generate
12,13,14,15   or 12,12,12,12 with your flag ?

(Before the patch no aggregation took place and exact same packets were
forwarded with 12, 40, 80, 1000)

As I said, I am not sure we should care :)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
  2016-04-01 19:58       ` Alexander Duyck
  2016-04-01 21:13         ` Subash Abhinov Kasiviswanathan
@ 2016-04-02  2:16         ` David Miller
  2016-04-02  2:21           ` Eric Dumazet
  2016-04-02  6:02           ` Alexander Duyck
  1 sibling, 2 replies; 15+ messages in thread
From: David Miller @ 2016-04-02  2:16 UTC (permalink / raw)
  To: alexander.duyck
  Cc: eric.dumazet, aduyck, herbert, tom, jesse, edumazet, netdev

From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Fri, 1 Apr 2016 12:58:41 -0700

> RFC 6864 is pretty explicit about this, IPv4 ID used only for
> fragmentation.  https://tools.ietf.org/html/rfc6864#section-4.1
> 
> The goal with this change is to try and keep most of the existing
> behavior in tact without violating this rule?  I would think the
> sequence number should give you the ability to infer a drop in the
> case of TCP.  In the case of UDP tunnels we are now getting a bit more
> data since we were ignoring the outer IP header ID before.

When retransmits happen, the sequence numbers are the same.  But you
can then use the IP ID to see exactly what happened.  You can even
tell if multiple retransmits got reordered.

Eric's use case is extremely useful, and flat out eliminates ambiguity
when analyzing TCP traces.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
  2016-04-02  2:15       ` Eric Dumazet
@ 2016-04-02  2:19         ` Herbert Xu
  2016-04-02  2:26           ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Herbert Xu @ 2016-04-02  2:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: aduyck, tom, jesse, alexander.duyck, edumazet, netdev, davem

On Fri, Apr 01, 2016 at 07:15:33PM -0700, Eric Dumazet wrote:
> On Sat, 2016-04-02 at 09:57 +0800, Herbert Xu wrote:
> > 
> > We could easily fix that by adding a feature bit to control this,
> > something like SKB_GSO_TCP_FIXEDID.
> 
> I understood the patch allowed to aggregate 4 segments having
> 
> ID=12   ID=40   ID=80  ID=1000

Right.  But I haven't seen any justification that we should aggregate
such packets at all.  The only valid case that I have seen is for
the case of unchanging IDs, no?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
  2016-04-02  2:16         ` David Miller
@ 2016-04-02  2:21           ` Eric Dumazet
  2016-04-02 20:26             ` Rick Jones
  2016-04-02  6:02           ` Alexander Duyck
  1 sibling, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2016-04-02  2:21 UTC (permalink / raw)
  To: David Miller
  Cc: alexander.duyck, aduyck, herbert, tom, jesse, edumazet, netdev

On Fri, 2016-04-01 at 22:16 -0400, David Miller wrote:
> From: Alexander Duyck <alexander.duyck@gmail.com>
> Date: Fri, 1 Apr 2016 12:58:41 -0700
> 
> > RFC 6864 is pretty explicit about this, IPv4 ID used only for
> > fragmentation.  https://tools.ietf.org/html/rfc6864#section-4.1
> > 
> > The goal with this change is to try and keep most of the existing
> > behavior in tact without violating this rule?  I would think the
> > sequence number should give you the ability to infer a drop in the
> > case of TCP.  In the case of UDP tunnels we are now getting a bit more
> > data since we were ignoring the outer IP header ID before.
> 
> When retransmits happen, the sequence numbers are the same.  But you
> can then use the IP ID to see exactly what happened.  You can even
> tell if multiple retransmits got reordered.
> 
> Eric's use case is extremely useful, and flat out eliminates ambiguity
> when analyzing TCP traces.

Yes, our team (including Van Jacobson ;) ) would be sad to not have
sequential IP ID (but then we don't have them for IPv6 ;) )

Since the cost of generating them is pretty small (inet->inet_id
counter), we probably should keep them in linux. Their usage will phase
out as IPv6 wins the Internet war...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
  2016-04-02  2:19         ` Herbert Xu
@ 2016-04-02  2:26           ` Eric Dumazet
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2016-04-02  2:26 UTC (permalink / raw)
  To: Herbert Xu; +Cc: aduyck, tom, jesse, alexander.duyck, edumazet, netdev, davem

On Sat, 2016-04-02 at 10:19 +0800, Herbert Xu wrote:
> On Fri, Apr 01, 2016 at 07:15:33PM -0700, Eric Dumazet wrote:
> > On Sat, 2016-04-02 at 09:57 +0800, Herbert Xu wrote:
> > > 
> > > We could easily fix that by adding a feature bit to control this,
> > > something like SKB_GSO_TCP_FIXEDID.
> > 
> > I understood the patch allowed to aggregate 4 segments having
> > 
> > ID=12   ID=40   ID=80  ID=1000
> 
> Right.  But I haven't seen any justification that we should aggregate
> such packets at all.  The only valid case that I have seen is for
> the case of unchanging IDs, no?

Presumably repeats of "DF=1 ID=0" should be what we really need to
handle.

On my wish list, having some reordering logic in GRO would be far more
interesting than these minor details ;)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
  2016-04-02  2:16         ` David Miller
  2016-04-02  2:21           ` Eric Dumazet
@ 2016-04-02  6:02           ` Alexander Duyck
  1 sibling, 0 replies; 15+ messages in thread
From: Alexander Duyck @ 2016-04-02  6:02 UTC (permalink / raw)
  To: David Miller
  Cc: Eric Dumazet, Alex Duyck, Herbert Xu, Tom Herbert, Jesse Gross,
	Eric Dumazet, Netdev

On Fri, Apr 1, 2016 at 7:16 PM, David Miller <davem@davemloft.net> wrote:
> From: Alexander Duyck <alexander.duyck@gmail.com>
> Date: Fri, 1 Apr 2016 12:58:41 -0700
>
>> RFC 6864 is pretty explicit about this, IPv4 ID used only for
>> fragmentation.  https://tools.ietf.org/html/rfc6864#section-4.1
>>
>> The goal with this change is to try and keep most of the existing
>> behavior in tact without violating this rule?  I would think the
>> sequence number should give you the ability to infer a drop in the
>> case of TCP.  In the case of UDP tunnels we are now getting a bit more
>> data since we were ignoring the outer IP header ID before.
>
> When retransmits happen, the sequence numbers are the same.  But you
> can then use the IP ID to see exactly what happened.  You can even
> tell if multiple retransmits got reordered.
>
> Eric's use case is extremely useful, and flat out eliminates ambiguity
> when analyzing TCP traces.

I'm not really sure the IP ID is really all that useful.  On a 10G
link it takes about 80ms for it to wrap using an MTU of 1500.  That
value is going to keep dropping as we move up to 40G and 100G.  That
was one of the motivations behind RFC 6864 because it starts becoming
a bottle-neck if you want to keep the IP IDs truly unique.  In
addition while this change would allow you to combined disjointed IP
IDs I don't think you would lose the re-transmission as there would
likely be a gap in sequence numbers that would cause the flow to be
flushed from GRO, and it isn't as if we can prepend the retransmit to
the aggregated frame, we are always adding to the tail.

I would think the most likely case where this change would foul up any
IP IDs would be the garbage-in/garbage-out case like the IPv6 to IPv4
translation that is using the fixed IP ID of 0.  I agree that it isn't
desirable, however at the same time per RFC 6864 there is nothing
there to say we cannot do that.  In addition it would likely help to
mitigate some of the impact of IP ID on things like SLHC compression
since the resegmented frames would be incrementing so it would reduce
the number of times the IP ID would have to be updated.

- Alex

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864
  2016-04-02  2:21           ` Eric Dumazet
@ 2016-04-02 20:26             ` Rick Jones
  0 siblings, 0 replies; 15+ messages in thread
From: Rick Jones @ 2016-04-02 20:26 UTC (permalink / raw)
  To: Eric Dumazet, David Miller
  Cc: alexander.duyck, aduyck, herbert, tom, jesse, edumazet, netdev

On 04/01/2016 07:21 PM, Eric Dumazet wrote:
> On Fri, 2016-04-01 at 22:16 -0400, David Miller wrote:
>> From: Alexander Duyck <alexander.duyck@gmail.com>
>> Date: Fri, 1 Apr 2016 12:58:41 -0700
>>
>>> RFC 6864 is pretty explicit about this, IPv4 ID used only for
>>> fragmentation.  https://tools.ietf.org/html/rfc6864#section-4.1
>>>
>>> The goal with this change is to try and keep most of the existing
>>> behavior in tact without violating this rule?  I would think the
>>> sequence number should give you the ability to infer a drop in the
>>> case of TCP.  In the case of UDP tunnels we are now getting a bit more
>>> data since we were ignoring the outer IP header ID before.
>>
>> When retransmits happen, the sequence numbers are the same.  But you
>> can then use the IP ID to see exactly what happened.  You can even
>> tell if multiple retransmits got reordered.
>>
>> Eric's use case is extremely useful, and flat out eliminates ambiguity
>> when analyzing TCP traces.
>
> Yes, our team (including Van Jacobson ;) ) would be sad to not have
> sequential IP ID (but then we don't have them for IPv6 ;) )

Your team would not be the only one sad to see that go away.

rick jones

> Since the cost of generating them is pretty small (inet->inet_id
> counter), we probably should keep them in linux. Their usage will phase
> out as IPv6 wins the Internet war...
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-04-02 20:26 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-01 18:05 [net PATCH 0/2] Fixes for GRO and GRE tunnels Alexander Duyck
2016-04-01 18:05 ` [net PATCH 1/2] GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU Alexander Duyck
2016-04-01 18:05 ` [net PATCH 2/2] ipv4/GRO: Make GRO conform to RFC 6864 Alexander Duyck
2016-04-01 18:49   ` Eric Dumazet
2016-04-01 19:24     ` David Miller
2016-04-01 19:58       ` Alexander Duyck
2016-04-01 21:13         ` Subash Abhinov Kasiviswanathan
2016-04-02  2:16         ` David Miller
2016-04-02  2:21           ` Eric Dumazet
2016-04-02 20:26             ` Rick Jones
2016-04-02  6:02           ` Alexander Duyck
2016-04-02  1:57     ` Herbert Xu
2016-04-02  2:15       ` Eric Dumazet
2016-04-02  2:19         ` Herbert Xu
2016-04-02  2:26           ` Eric Dumazet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.