* [RFC PATCH] gro: Partly revert "net: gro: allow to build full sized skb"
@ 2016-04-21  7:40 Steffen Klassert
  2016-04-21 12:59 ` Eric Dumazet
  2016-04-21 16:02 ` Alexander Duyck
  0 siblings, 2 replies; 8+ messages in thread
From: Steffen Klassert @ 2016-04-21  7:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Sowmini Varadhan, netdev

This partly reverts the below mentioned patch because on
forwarding, such skbs can't be offloaded to a NIC.

We need this to get IPsec GRO for forwarding to work properly,
otherwise the GRO aggregated packets get segmented again by
the GSO layer. Although discovered when implementing IPsec GRO,
this is a general problem in the forwarding path.

-------------------------------------------------------------------------
commit 8a29111c7ca68d928dfab58636f3f6acf0ac04f7
Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Oct 8 09:02:23 2013 -0700

    net: gro: allow to build full sized skb

    skb_gro_receive() is currently limited to 16 or 17 MSS per GRO skb,
    typically 24616 bytes, because it fills up to MAX_SKB_FRAGS frags.

    It's relatively easy to extend the skb using frag_list to allow
    more frags to be appended into the last sk_buff.

    This still builds very efficient skbs, and allows reaching 45 MSS per
    skb.

    (45 MSS GRO packet uses one skb plus a frag_list containing 2 additional
    sk_buff)

    High speed TCP flows benefit from this extension by lowering TCP stack
    cpu usage (less packets stored in receive queue, less ACK packets
    processed)

    Forwarding setups could be hurt, as such skbs will need to be
    linearized, although its not a new problem, as GRO could already
    provide skbs with a frag_list.

    We could make the 65536 bytes threshold a tunable to mitigate this.

    (First time we need to linearize skb in skb_needs_linearize(), we could
    lower the tunable to ~16*1460 so that following skb_gro_receive() calls
    build smaller skbs)

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
---------------------------------------------------------------------------

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---

Hi Eric, this is a followup on our discussion at the netdev
conference. Would you still be ok with this revert, or do
you think there is a better solution in sight?

The full IPsec patchset for which I need this can be found here:

https://git.kernel.org/cgit/linux/kernel/git/klassert/linux-stk.git/log/?h=net-next-ipsec-offload-work

 net/core/skbuff.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4cc594c..fb11a9b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3347,7 +3347,7 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 		int nr_frags = pinfo->nr_frags + i;
 
 		if (nr_frags > MAX_SKB_FRAGS)
-			goto merge;
+			return -E2BIG;
 
 		offset -= headlen;
 		pinfo->nr_frags = nr_frags;
@@ -3380,7 +3380,7 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 		unsigned int first_offset;
 
 		if (nr_frags + 1 + skbinfo->nr_frags > MAX_SKB_FRAGS)
-			goto merge;
+			return -E2BIG;
 
 		first_offset = skb->data -
 			       (unsigned char *)page_address(page) +
@@ -3400,7 +3400,6 @@ int skb_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 		goto done;
 	}
 
-merge:
 	delta_truesize = skb->truesize;
 	if (offset > headlen) {
 		unsigned int eat = offset - headlen;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFC PATCH] gro: Partly revert "net: gro: allow to build full sized skb"
  2016-04-21  7:40 [RFC PATCH] gro: Partly revert "net: gro: allow to build full sized skb" Steffen Klassert
@ 2016-04-21 12:59 ` Eric Dumazet
  2016-04-22  9:13   ` Steffen Klassert
  2016-04-21 16:02 ` Alexander Duyck
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2016-04-21 12:59 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Sowmini Varadhan, netdev

On Thu, 2016-04-21 at 09:40 +0200, Steffen Klassert wrote:
> This partly reverts the below mentioned patch because on
> forwarding, such skbs can't be offloaded to a NIC.
> 
> We need this to get IPsec GRO for forwarding to work properly,
> otherwise the GRO aggregated packets get segmented again by
> the GSO layer. Although discovered when implementing IPsec GRO,
> this is a general problem in the forwarding path.
> 
> -------------------------------------------------------------------------
> commit 8a29111c7ca68d928dfab58636f3f6acf0ac04f7
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Tue Oct 8 09:02:23 2013 -0700
> 
>     net: gro: allow to build full sized skb
> 
>     skb_gro_receive() is currently limited to 16 or 17 MSS per GRO skb,
>     typically 24616 bytes, because it fills up to MAX_SKB_FRAGS frags.
> 
>     It's relatively easy to extend the skb using frag_list to allow
>     more frags to be appended into the last sk_buff.
> 
>     This still builds very efficient skbs, and allows reaching 45 MSS per
>     skb.
> 
>     (45 MSS GRO packet uses one skb plus a frag_list containing 2 additional
>     sk_buff)
> 
>     High speed TCP flows benefit from this extension by lowering TCP stack
>     cpu usage (less packets stored in receive queue, less ACK packets
>     processed)
> 
>     Forwarding setups could be hurt, as such skbs will need to be
>     linearized, although its not a new problem, as GRO could already
>     provide skbs with a frag_list.
> 
>     We could make the 65536 bytes threshold a tunable to mitigate this.
> 
>     (First time we need to linearize skb in skb_needs_linearize(), we could
>     lower the tunable to ~16*1460 so that following skb_gro_receive() calls
>     build smaller skbs)
> 
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> ---------------------------------------------------------------------------
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> ---
> 
> Hi Eric, this is a followup on our discussion at the netdev
> conference. Would you still be ok with this revert, or do
> you think there is a better solution in sight?

Note that some GRO enabled drivers would still generate frag_list.

(This happens if they are using skb with some TCP payload in skb->head
and skb->head was allocated with kmalloc())

We have the sysctl_max_skb_frags sysctl; we might add a sysctl
enabling/disabling GRO from building any frag_list.
Or simply reuse an existing one, like /proc/sys/net/ipv4/ip_forward?

Here at Google, we increased MAX_SKB_FRAGS, but this is a rather
intrusive change to be upstreamed :(


* Re: [RFC PATCH] gro: Partly revert "net: gro: allow to build full sized skb"
  2016-04-21  7:40 [RFC PATCH] gro: Partly revert "net: gro: allow to build full sized skb" Steffen Klassert
  2016-04-21 12:59 ` Eric Dumazet
@ 2016-04-21 16:02 ` Alexander Duyck
  2016-04-22  8:51   ` Steffen Klassert
  1 sibling, 1 reply; 8+ messages in thread
From: Alexander Duyck @ 2016-04-21 16:02 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Eric Dumazet, Sowmini Varadhan, Netdev

On Thu, Apr 21, 2016 at 12:40 AM, Steffen Klassert
<steffen.klassert@secunet.com> wrote:
> This partly reverts the below mentioned patch because on
> forwarding, such skbs can't be offloaded to a NIC.
>
> We need this to get IPsec GRO for forwarding to work properly,
> otherwise the GRO aggregated packets get segmented again by
> the GSO layer. Although discovered when implementing IPsec GRO,
> this is a general problem in the forwarding path.

I'm confused as to why you would need this to get IPsec GRO forwarding
to work.  Are you having to go through a device that doesn't have
NETIF_F_FRAGLIST defined?  Also what is the issue with having to go
through the GSO layer on segmentation?  It seems like we might be able
to do something like what we did with GSO partial to split frames so
that they are in chunks that wouldn't require NETIF_F_FRAGLIST.  Then
you could get the best of both worlds in that the stack would only
process one super-frame, and the transmitter could TSO a series of
frames that are some fixed MSS in size.

- Alex


* Re: [RFC PATCH] gro: Partly revert "net: gro: allow to build full sized skb"
  2016-04-21 16:02 ` Alexander Duyck
@ 2016-04-22  8:51   ` Steffen Klassert
  2016-04-22 17:14     ` Alexander Duyck
  0 siblings, 1 reply; 8+ messages in thread
From: Steffen Klassert @ 2016-04-22  8:51 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Eric Dumazet, Sowmini Varadhan, Netdev

On Thu, Apr 21, 2016 at 09:02:48AM -0700, Alexander Duyck wrote:
> On Thu, Apr 21, 2016 at 12:40 AM, Steffen Klassert
> <steffen.klassert@secunet.com> wrote:
> > This partly reverts the below mentioned patch because on
> > forwarding, such skbs can't be offloaded to a NIC.
> >
> > We need this to get IPsec GRO for forwarding to work properly,
> > otherwise the GRO aggregated packets get segmented again by
> > the GSO layer. Although discovered when implementing IPsec GRO,
> > this is a general problem in the forwarding path.
> 
> I'm confused as to why you would need this to get IPsec GRO forwarding
> to work. 

It works without this, but the performance numbers are not that good
if we have to do GSO in software.

> Are you having to go through a device that doesn't have
> NETIF_F_FRAGLIST defined?

I don't know of any NIC that can do TSO on a skbuff with a fraglist,
that's why I try to avoid having a buffer with a fraglist.

> Also what is the issue with having to go
> through the GSO layer on segmentation?  It seems like we might be able
> to do something like what we did with GSO partial to split frames so
> that they are in chunks that wouldn't require NETIF_F_FRAGLIST.  Then
> you could get the best of both worlds in that the stack would only
> process one super-frame, and the transmitter could TSO a series of
> frames that are some fixed MSS in size.

This could be interesting. Then we could have a buffer with
a fraglist, and the GSO layer splits it into skbuffs without
a fraglist that can be TSO offloaded. Something like this
might solve my performance problems.


* Re: [RFC PATCH] gro: Partly revert "net: gro: allow to build full sized skb"
  2016-04-21 12:59 ` Eric Dumazet
@ 2016-04-22  9:13   ` Steffen Klassert
  2016-04-22 12:39     ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Steffen Klassert @ 2016-04-22  9:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Sowmini Varadhan, netdev

On Thu, Apr 21, 2016 at 05:59:06AM -0700, Eric Dumazet wrote:
> On Thu, 2016-04-21 at 09:40 +0200, Steffen Klassert wrote:
> > 
> > Hi Eric, this is a followup on our discussion at the netdev
> > conference. Would you still be ok with this revert, or do
> > you think there is a better solution in sight?
> 
> Note that some GRO enabled drivers would still generate frag_list.
> 
> (This happens if they are using skb with some TCP payload in skb->head
> and skb->head was allocated with kmalloc())
> 
> We have the sysctl_max_skb_frags sysctl; we might add a sysctl
> enabling/disabling GRO from building any frag_list.
> Or simply reuse an existing one, like /proc/sys/net/ipv4/ip_forward?

Reusing the ipv4/ipv6 forwarding sysctls would probably be the
simplest solution, but maybe we can do the partial split in
the GSO layer that Alex proposed. Then we would not need to
change the way GRO builds the buffers.

> 
> Here at Google, we increased MAX_SKB_FRAGS, but this is a rather
> intrusive change to be upstreamed :(

I've played with MAX_SKB_FRAGS here too, but how many page
fragments can be handled seems to be device specific.
I wonder if we could increase MAX_SKB_FRAGS at the GRO
layer and let GSO split this buffer into something that
the transmitting device can handle?


* Re: [RFC PATCH] gro: Partly revert "net: gro: allow to build full sized skb"
  2016-04-22  9:13   ` Steffen Klassert
@ 2016-04-22 12:39     ` Eric Dumazet
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2016-04-22 12:39 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Sowmini Varadhan, netdev

On Fri, 2016-04-22 at 11:13 +0200, Steffen Klassert wrote:
> On Thu, Apr 21, 2016 at 05:59:06AM -0700, Eric Dumazet wrote:

> > Here at Google, we increased MAX_SKB_FRAGS, but this is a rather
> > intrusive change to be upstreamed :(
> 
> I've played with MAX_SKB_FRAGS here too, but how many page
> fragments can be handled seems to be device specific.
> I wonder if we could increase MAX_SKB_FRAGS at the GRO
> layer and let GSO split this buffer into something that
> the transmitting device can handle?

Yes, the same principle would apply.

Split a GRO packet into X TSO packets with whatever number of frags.

bnx2x has to linearize some skbs having more than 13 frags.


* Re: [RFC PATCH] gro: Partly revert "net: gro: allow to build full sized skb"
  2016-04-22  8:51   ` Steffen Klassert
@ 2016-04-22 17:14     ` Alexander Duyck
  2016-06-24  8:22       ` Steffen Klassert
  0 siblings, 1 reply; 8+ messages in thread
From: Alexander Duyck @ 2016-04-22 17:14 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Eric Dumazet, Sowmini Varadhan, Netdev

On Fri, Apr 22, 2016 at 1:51 AM, Steffen Klassert
<steffen.klassert@secunet.com> wrote:
> On Thu, Apr 21, 2016 at 09:02:48AM -0700, Alexander Duyck wrote:
>> On Thu, Apr 21, 2016 at 12:40 AM, Steffen Klassert
>> <steffen.klassert@secunet.com> wrote:
>> > This partly reverts the below mentioned patch because on
>> > forwarding, such skbs can't be offloaded to a NIC.
>> >
>> > We need this to get IPsec GRO for forwarding to work properly,
>> > otherwise the GRO aggregated packets get segmented again by
>> > the GSO layer. Although discovered when implementing IPsec GRO,
>> > this is a general problem in the forwarding path.
>>
>> I'm confused as to why you would need this to get IPsec GRO forwarding
>> to work.
>
> It works without this, but the performance numbers are not that good
> if we have to do GSO in software.

Well really GSO is only meant to perform better than if we didn't do
any GRO/GSO at all.  If that isn't the case I wouldn't consider it a
regression since, as Eric points out, there are other scenarios where
you end up with a chain of buffers stuck on the fraglist.  Mostly what
GRO/GSO gets you is fewer runs through the stack.

>> Are you having to go through a device that doesn't have
>> NETIF_F_FRAGLIST defined?
>
> I don't know of any NIC that can do TSO on a skbuff with a fraglist,
> that's why I try to avoid having a buffer with a fraglist.
>

Most of them don't.  There are only one or two NICs out there that
support transmitting a frame that has a fraglist.

>> Also what is the issue with having to go
>> through the GSO layer on segmentation?  It seems like we might be able
>> to do something like what we did with GSO partial to split frames so
>> that they are in chunks that wouldn't require NETIF_F_FRAGLIST.  Then
>> you could get the best of both worlds in that the stack would only
>> process one super-frame, and the transmitter could TSO a series of
>> frames that are some fixed MSS in size.
>
> This could be interesting. Then we could have a buffer with
> a fraglist, and the GSO layer splits it into skbuffs without
> a fraglist that can be TSO offloaded. Something like this
> might solve my performance problems.

Right.  It is something to think about.  I was considering what might
be involved to make a fraglist based skb a GSO type.  Then we might be
able to handle it kind of like what we do for the whole
SKB_GSO_DODGY/NETIF_F_GSO_ROBUST path.  Basically if we just need to
break the frame at the fraglist level it probably wouldn't be that
hard to do assuming each skb is MSS aligned in terms of size.

- Alex


* Re: [RFC PATCH] gro: Partly revert "net: gro: allow to build full sized skb"
  2016-04-22 17:14     ` Alexander Duyck
@ 2016-06-24  8:22       ` Steffen Klassert
  0 siblings, 0 replies; 8+ messages in thread
From: Steffen Klassert @ 2016-06-24  8:22 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Eric Dumazet, Sowmini Varadhan, Netdev

Sorry for replying to an old mail, but I wanted to keep the context.

On Fri, Apr 22, 2016 at 10:14:22AM -0700, Alexander Duyck wrote:
> On Fri, Apr 22, 2016 at 1:51 AM, Steffen Klassert
> <steffen.klassert@secunet.com> wrote:
> > On Thu, Apr 21, 2016 at 09:02:48AM -0700, Alexander Duyck wrote:
> >> On Thu, Apr 21, 2016 at 12:40 AM, Steffen Klassert
> >> <steffen.klassert@secunet.com> wrote:
> >> > This partly reverts the below mentioned patch because on
> >> > forwarding, such skbs can't be offloaded to a NIC.
> >> >
> >> > We need this to get IPsec GRO for forwarding to work properly,
> >> > otherwise the GRO aggregated packets get segmented again by
> >> > the GSO layer. Although discovered when implementing IPsec GRO,
> >> > this is a general problem in the forwarding path.
> >>
> >> I'm confused as to why you would need this to get IPsec GRO forwarding
> >> to work.
> >
> > It works without this, but the performance numbers are not that good
> > if we have to do GSO in software.
> 
> Well really GSO is only meant to perform better than if we didn't do
> any GRO/GSO at all.  If that isn't the case I wouldn't consider it a
> regression since, as Eric points out, there are other scenarios where
> you end up with a chain of buffers stuck on the fraglist.  Mostly what
> GRO/GSO gets you is fewer runs through the stack.
> 
> >> Are you having to go through a device that doesn't have
> >> NETIF_F_FRAGLIST defined?
> >
> > I don't know of any NIC that can do TSO on a skbuff with a fraglist,
> > that's why I try to avoid having a buffer with a fraglist.
> >
> 
> Most of them don't.  There are only one or two NICs out there that
> support transmitting a frame that has a fraglist.
> 
> >> Also what is the issue with having to go
> >> through the GSO layer on segmentation?  It seems like we might be able
> >> to do something like what we did with GSO partial to split frames so
> >> that they are in chunks that wouldn't require NETIF_F_FRAGLIST.  Then
> >> you could get the best of both worlds in that the stack would only
> >> process one super-frame, and the transmitter could TSO a series of
> >> frames that are some fixed MSS in size.
> >
> > This could be interesting. Then we could have a buffer with
> > a fraglist, and the GSO layer splits it into skbuffs without
> > a fraglist that can be TSO offloaded. Something like this
> > might solve my performance problems.
> 
> Right.  It is something to think about.  I was considering what might
> be involved to make a fraglist based skb a GSO type.  Then we might be
> able to handle it kind of like what we do for the whole
> SKB_GSO_DODGY/NETIF_F_GSO_ROBUST path.  Basically if we just need to
> break the frame at the fraglist level it probably wouldn't be that
> hard to do assuming each skb is MSS aligned in terms of size.

I've tried to implement the idea of splitting buffers at the frag_list
pointer and ended up with the patch below. With this patch, the
SKB_GSO_PARTIAL case is no longer the only case where skb_segment() can
return a GSO skb. I had to adapt some GSO handlers to this; I'm not
sure I found all the places that need it. It works in my case, but
needs review and maybe some more sophisticated tests.

I could not benchmark this with big packet sizes because my
10G interfaces become the limiting factor then. So I did an iperf
forwarding test with the TCP MSS reduced to 536 bytes.

Result with a recent net-next tree:

net-next 6.67 Gbits/sec

net-next + patch 8.20 Gbits/sec


Subject: [RFC PATCH] gso: Support partial splitting at the frag_list pointer

Since commit 8a29111c7 ("net: gro: allow to build full sized skb")
GRO may build buffers with a frag_list. This can hurt forwarding
because most NICs can't offload such packets; they need to be
segmented in software. This patch splits buffers with a frag_list
at the frag_list pointer into buffers that can be TSO offloaded.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/core/skbuff.c      | 90 +++++++++++++++++++++++++++++++++++++++++++++++++-
 net/ipv4/af_inet.c     |  7 ++--
 net/ipv4/gre_offload.c |  7 +++-
 net/ipv4/tcp_offload.c |  3 ++
 net/ipv4/udp_offload.c |  9 +++--
 net/ipv6/ip6_offload.c |  6 +++-
 6 files changed, 115 insertions(+), 7 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e7ec6d3..093c3cd 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3096,6 +3096,93 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 	sg = !!(features & NETIF_F_SG);
 	csum = !!can_checksum_protocol(features, proto);
 
+	headroom = skb_headroom(head_skb);
+
+	if (list_skb && net_gso_ok(features, skb_shinfo(head_skb)->gso_type) &&
+	    csum && sg && (mss != GSO_BY_FRAGS) &&
+	    !(features & NETIF_F_GSO_PARTIAL)) {
+		unsigned int lskb_segs;
+		unsigned int delta_segs, delta_len, delta_truesize;
+		struct sk_buff *nskb;
+		delta_segs = delta_len = delta_truesize = 0;
+
+		segs = __alloc_skb(skb_headlen(head_skb) + headroom,
+				   GFP_ATOMIC, skb_alloc_rx_flag(head_skb),
+				   NUMA_NO_NODE);
+		if (unlikely(!segs))
+			return ERR_PTR(-ENOMEM);
+
+		skb_reserve(segs, headroom);
+		skb_put(segs, skb_headlen(head_skb));
+		skb_copy_from_linear_data(head_skb, segs->data, segs->len);
+		copy_skb_header(segs, head_skb);
+
+		if (skb_shinfo(head_skb)->nr_frags) {
+			int i;
+
+			if (skb_orphan_frags(head_skb, GFP_ATOMIC))
+				goto err;
+
+			for (i = 0; i < skb_shinfo(head_skb)->nr_frags; i++) {
+				skb_shinfo(segs)->frags[i] = skb_shinfo(head_skb)->frags[i];
+				skb_frag_ref(head_skb, i);
+			}
+			skb_shinfo(segs)->nr_frags = i;
+		}
+
+		do {
+			nskb = skb_clone(list_skb, GFP_ATOMIC);
+			if (unlikely(!nskb))
+				goto err;
+
+			list_skb = list_skb->next;
+
+			if (!tail)
+				segs->next = nskb;
+			else
+				tail->next = nskb;
+
+			tail = nskb;
+
+			if (skb_cow_head(nskb, doffset + headroom))
+				goto err;
+
+			lskb_segs = nskb->len / mss;
+
+			skb_shinfo(nskb)->gso_size = mss;
+			skb_shinfo(nskb)->gso_type = skb_shinfo(head_skb)->gso_type;
+			skb_shinfo(nskb)->gso_segs = lskb_segs;
+
+
+			delta_segs += lskb_segs;
+			delta_len += nskb->len;
+			delta_truesize += nskb->truesize;
+
+			__skb_push(nskb, doffset);
+
+			skb_release_head_state(nskb);
+			__copy_skb_header(nskb, head_skb);
+
+			skb_headers_offset_update(nskb, skb_headroom(nskb) - headroom);
+			skb_reset_mac_len(nskb);
+
+			skb_copy_from_linear_data_offset(head_skb, -tnl_hlen,
+							 nskb->data - tnl_hlen,
+							 doffset + tnl_hlen);
+
+
+		} while (list_skb);
+
+		skb_shinfo(segs)->gso_segs -= delta_segs;
+		segs->len = head_skb->len - delta_len;
+		segs->data_len = head_skb->data_len - delta_len;
+		segs->truesize += head_skb->data_len - delta_truesize;
+
+		segs->prev = tail;
+
+		goto out;
+	}
+
 	/* GSO partial only requires that we trim off any excess that
 	 * doesn't fit into an MSS sized block, so take care of that
 	 * now.
@@ -3108,7 +3195,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 			partial_segs = 0;
 	}
 
-	headroom = skb_headroom(head_skb);
 	pos = skb_headlen(head_skb);
 
 	do {
@@ -3325,6 +3411,8 @@ perform_csum_check:
 		swap(tail->destructor, head_skb->destructor);
 		swap(tail->sk, head_skb->sk);
 	}
+
+out:
 	return segs;
 
 err:
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index d39e9e4..90ecc22 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1195,7 +1195,7 @@ EXPORT_SYMBOL(inet_sk_rebuild_header);
 struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 				 netdev_features_t features)
 {
-	bool udpfrag = false, fixedid = false, encap;
+	bool udpfrag = false, fixedid = false, gso_partial = false, encap;
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	const struct net_offload *ops;
 	unsigned int offset = 0;
@@ -1248,6 +1248,9 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 	if (IS_ERR_OR_NULL(segs))
 		goto out;
 
+	if (skb_shinfo(segs)->gso_type & SKB_GSO_PARTIAL)
+		gso_partial = true;
+
 	skb = segs;
 	do {
 		iph = (struct iphdr *)(skb_mac_header(skb) + nhoff);
@@ -1257,7 +1260,7 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 				iph->frag_off |= htons(IP_MF);
 			offset += skb->len - nhoff - ihl;
 			tot_len = skb->len - nhoff;
-		} else if (skb_is_gso(skb)) {
+		} else if (skb_is_gso(skb) && gso_partial) {
 			if (!fixedid) {
 				iph->id = htons(id);
 				id += skb_shinfo(skb)->gso_segs;
diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index ecd1e09..cf82e28 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -24,7 +24,7 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
 	__be16 protocol = skb->protocol;
 	u16 mac_len = skb->mac_len;
 	int gre_offset, outer_hlen;
-	bool need_csum, ufo;
+	bool need_csum, ufo, gso_partial;
 
 	if (!skb->encapsulation)
 		goto out;
@@ -69,6 +69,11 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
 		goto out;
 	}
 
+	if (skb_shinfo(segs)->gso_type & SKB_GSO_PARTIAL)
+		gso_partial = true;
+	else
+		gso_partial = false;
+
 	outer_hlen = skb_tnl_header_len(skb);
 	gre_offset = outer_hlen - tnl_hlen;
 	skb = segs;
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 5c59649..dddd227 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -107,6 +107,9 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 
 	/* Only first segment might have ooo_okay set */
 	segs->ooo_okay = ooo_okay;
+	if (skb_is_gso(segs) && !(skb_shinfo(segs)->gso_type & SKB_GSO_PARTIAL))
+		mss = (skb_tail_pointer(segs) - skb_transport_header(segs)) +
+		       segs->data_len - thlen;
 
 	delta = htonl(oldlen + (thlen + mss));
 
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 81f253b..dfb6a2c 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -21,7 +21,7 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
 	__be16 new_protocol, bool is_ipv6)
 {
 	int tnl_hlen = skb_inner_mac_header(skb) - skb_transport_header(skb);
-	bool remcsum, need_csum, offload_csum, ufo;
+	bool remcsum, need_csum, offload_csum, ufo, gso_partial;
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	struct udphdr *uh = udp_hdr(skb);
 	u16 mac_offset = skb->mac_header;
@@ -88,6 +88,11 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
 		goto out;
 	}
 
+	if (skb_shinfo(segs)->gso_type & SKB_GSO_PARTIAL)
+		gso_partial = true;
+	else
+		gso_partial = false;
+
 	outer_hlen = skb_tnl_header_len(skb);
 	udp_offset = outer_hlen - tnl_hlen;
 	skb = segs;
@@ -117,7 +122,7 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
 		 * will be using a length value equal to only one MSS sized
 		 * segment instead of the entire frame.
 		 */
-		if (skb_is_gso(skb)) {
+		if (skb_is_gso(skb) && gso_partial) {
 			uh->len = htons(skb_shinfo(skb)->gso_size +
 					SKB_GSO_CB(skb)->data_offset +
 					skb->head - (unsigned char *)uh);
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 22e90e5..0ec16ba 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -69,6 +69,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	int offset = 0;
 	bool encap, udpfrag;
 	int nhoff;
+	bool gso_partial = false;
 
 	skb_reset_network_header(skb);
 	nhoff = skb_network_header(skb) - skb_mac_header(skb);
@@ -101,9 +102,12 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	if (IS_ERR(segs))
 		goto out;
 
+	if (skb_shinfo(segs)->gso_type & SKB_GSO_PARTIAL)
+		gso_partial = true;
+
 	for (skb = segs; skb; skb = skb->next) {
 		ipv6h = (struct ipv6hdr *)(skb_mac_header(skb) + nhoff);
-		if (skb_is_gso(skb))
+		if (skb_is_gso(skb) && gso_partial)
 			payload_len = skb_shinfo(skb)->gso_size +
 				      SKB_GSO_CB(skb)->data_offset +
 				      skb->head - (unsigned char *)(ipv6h + 1);
-- 
1.9.1

