* problem with MPLS and TSO/GSO
@ 2016-07-25 16:39 Lennert Buytenhek
       [not found] ` <CAD=hENdOy-d0v9BskuvfqF3qdbrWCy2b-Dc-LSSUcZBmHy-X1A@mail.gmail.com>
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Lennert Buytenhek @ 2016-07-25 16:39 UTC (permalink / raw)
  To: David Ahern, Roopa Prabhu, Robert Shearman; +Cc: Alexander Duyck, netdev

Hi!

I am seeing pretty horrible TCP transmit performance (anywhere between
1 and 10 Mb/s, on a 10 Gb/s interface) when traffic is sent out over a
route that involves MPLS labeling, and this seems to be due to an
interaction between MPLS and TSO/GSO that causes all segmentable TCP
frames that are MPLS-labeled to be dropped on egress.

I initially ran into this issue with the ixgbe driver, but it is easily
reproduced with veth interfaces, and the script attached below this
email reproduces the issue.  The script configures three network
namespaces: one that transmits TCP data (netperf) with MPLS labels,
one that takes the MPLS traffic and pops the labels and forwards the
traffic on, and one that receives the traffic (netserver).  When not
using MPLS labeling, I get ~30000 Mb/s single-stream TCP performance
in this setup on my test box, and with MPLS labeling, I get ~2 Mb/s.

Some investigating shows that egress TCP frames that need to be
segmented are being dropped in validate_xmit_skb(), which calls
skb_gso_segment() which calls skb_mac_gso_segment() which returns
-EPROTONOSUPPORT because we apparently didn't have the right kernel
module (mpls_gso) loaded.
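
For reference, here is a paraphrased sketch of skb_mac_gso_segment()
from net/core/dev.c of this vintage -- not a verbatim copy.  With no
packet_offload registered for skb->protocol, which is the situation
when mpls_gso isn't loaded, the ERR_PTR(-EPROTONOSUPPORT) default
below is what validate_xmit_skb() ends up seeing:

struct sk_buff *skb_mac_gso_segment(struct sk_buff *skb,
                                    netdev_features_t features)
{
        /* Default: no offload handler found for this protocol. */
        struct sk_buff *segs = ERR_PTR(-EPROTONOSUPPORT);
        struct packet_offload *ptype;
        int vlan_depth = skb->mac_len;
        __be16 type = skb_network_protocol(skb, &vlan_depth);

        if (unlikely(!type))
                return ERR_PTR(-EINVAL);

        __skb_pull(skb, vlan_depth);

        /* Look up the gso_segment callback registered for this
         * ethertype; mpls_gso registers one for ETH_P_MPLS_UC.
         */
        rcu_read_lock();
        list_for_each_entry_rcu(ptype, &offload_base, list) {
                if (ptype->type == type && ptype->callbacks.gso_segment) {
                        segs = ptype->callbacks.gso_segment(skb, features);
                        break;
                }
        }
        rcu_read_unlock();

        __skb_push(skb, skb->data - skb_mac_header(skb));

        return segs;
}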

(It's somewhat poor design, IMHO, to degrade network performance by
15000x if someone didn't load a kernel module they didn't know they
should have loaded, and in a way that doesn't log any warnings or
errors and can only be diagnosed by adding printk calls to net/core/
and recompiling your kernel.)

(Also, I'm not sure why mpls_gso is needed when ixgbe seems to be
able to natively do TSO on MPLS-labeled traffic, maybe because ixgbe
doesn't advertise the necessary features in ->mpls_features?  But
adding those bits doesn't seem to change much.)
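
Concretely, "those bits" means something like the following in the
driver probe path.  This is a hypothetical, untested sketch; that
these particular flags are what ixgbe would need is an assumption:

        /* Hypothetical, not from the ixgbe tree: advertise MPLS
         * segmentation offload via mpls_features, which
         * net_mpls_features() in net/core/dev.c consults for
         * MPLS-tagged skbs.
         */
        netdev->mpls_features = NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_TSO;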

But, loading mpls_gso doesn't change much -- skb_gso_segment() then
starts returning -EINVAL instead, which is due to the
skb_network_protocol() call in skb_mac_gso_segment() returning zero.
And looking at skb_network_protocol(), I don't see how this is
supposed to work -- skb->protocol is 0 at this point, and there is no
way to figure out that what we are encapsulating is IP traffic,
because unlike VLAN tags, MPLS labels aren't followed by an inner
ethertype that says what kind of traffic is inside; you have to have
explicit knowledge of the payload type for MPLS.
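
And a paraphrased sketch of skb_network_protocol() from
net/core/dev.c (omitting the ETH_P_TEB handling; not verbatim) shows
the problem: it can only peel VLAN tags, because only a VLAN header
carries the encapsulated ethertype; for everything else it just
echoes skb->protocol back, so a zero stays zero:

__be16 skb_network_protocol(struct sk_buff *skb, int *depth)
{
        __be16 type = skb->protocol;
        unsigned int vlan_depth = skb->mac_len;

        /* VLAN headers carry the encapsulated ethertype, so they can
         * be walked; an MPLS label stack carries nothing comparable.
         */
        while (type == htons(ETH_P_8021Q) || type == htons(ETH_P_8021AD)) {
                struct vlan_hdr *vh;

                if (unlikely(!pskb_may_pull(skb, vlan_depth + VLAN_HLEN)))
                        return 0;

                vh = (struct vlan_hdr *)(skb->data + vlan_depth);
                type = vh->h_vlan_encapsulated_proto;
                vlan_depth += VLAN_HLEN;
        }

        *depth = vlan_depth;

        /* skb->protocol == 0 falls straight through to here. */
        return type;
}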

Any ideas?

Thanks in advance!


Cheers,
Lennert



=== problem.sh
#!/bin/sh

# ns0 sends out packets with mpls labels
# ns1 receives the labelled packets, pops the labels, and forwards to ns2
# ns2 receives the unlabelled packets and replies to ns0

ip netns add ns0
ip netns add ns1
ip netns add ns2

ip link add virt01 type veth peer name virt10
ip link set virt01 netns ns0
ip link set virt10 netns ns1

ip link add virt12 type veth peer name virt21
ip link set virt12 netns ns1
ip link set virt21 netns ns2

ip netns exec ns0 ip addr add 127.0.0.1/8 dev lo
ip netns exec ns0 ip link set lo up
ip netns exec ns0 ip addr add 172.16.20.20/24 dev virt01
ip netns exec ns0 ip link set virt01 up

ip netns exec ns1 ip addr add 127.0.0.1/8 dev lo
ip netns exec ns1 ip link set lo up
ip netns exec ns1 ip addr add 172.16.20.21/24 dev virt10
ip netns exec ns1 ip link set virt10 up
ip netns exec ns1 ip addr add 172.16.21.21/24 dev virt12
ip netns exec ns1 ip link set virt12 up

ip netns exec ns2 ip addr add 127.0.0.1/8 dev lo
ip netns exec ns2 ip link set lo up
ip netns exec ns2 ip addr add 172.16.21.22/24 dev virt21
ip netns exec ns2 ip link set virt21 up

modprobe mpls_iptunnel

ip netns exec ns0 ip route add 10.10.10.10/32 encap mpls 100 via inet 172.16.20.21 mtu lock 1496
#ip netns exec ns0 ip route add 172.16.21.0/24 via 172.16.20.21
ip netns exec ns0 ip route add 172.16.21.0/24 via 172.16.20.21 mtu lock 1496

ip netns exec ns1 sysctl -w net.ipv4.conf.all.rp_filter=0
ip netns exec ns1 sysctl -w net.ipv4.conf.default.rp_filter=0
ip netns exec ns1 sysctl -w net.ipv4.conf.lo.rp_filter=0
ip netns exec ns1 sysctl -w net.ipv4.conf.virt10.rp_filter=0
ip netns exec ns1 sysctl -w net.ipv4.conf.virt12.rp_filter=0
ip netns exec ns1 sysctl -w net.ipv4.ip_forward=1
ip netns exec ns1 sysctl -w net.mpls.conf.virt10.input=1
ip netns exec ns1 sysctl -w net.mpls.platform_labels=1000
ip netns exec ns1 ip -f mpls route add 100 via inet 172.16.21.22

ip netns exec ns2 ip addr add 10.10.10.10/32 dev lo
ip netns exec ns2 ip route add 172.16.20.0/24 via 172.16.21.21

ip netns exec ns0 ping -c 1 10.10.10.10

ip netns exec ns2 netserver

# non-mpls
ip netns exec ns0 netperf -c -C -H 172.16.21.22 -l 10 -t TCP_STREAM

# mpls (retry this with mpls_gso loaded)
ip netns exec ns0 netperf -c -C -H 10.10.10.10 -l 10 -t TCP_STREAM


* Re: problem with MPLS and TSO/GSO
       [not found] ` <CAD=hENdOy-d0v9BskuvfqF3qdbrWCy2b-Dc-LSSUcZBmHy-X1A@mail.gmail.com>
@ 2016-07-27  7:03   ` Lennert Buytenhek
  2016-07-31  7:07   ` Roopa Prabhu
  1 sibling, 0 replies; 7+ messages in thread
From: Lennert Buytenhek @ 2016-07-27  7:03 UTC (permalink / raw)
  To: zhuyj; +Cc: David Ahern, Roopa Prabhu, Robert Shearman, Alexander Duyck, netdev

On Wed, Jul 27, 2016 at 03:02:24PM +0800, zhuyj wrote:

> On Ubuntu 16.04 server (64-bit), when the attached script is run,
> the following error appears:
> 
> Error: either "to" is duplicate, or "encap" is a garbage.

Looks like your installed iproute2 package doesn't grok MPLS.


* Re: problem with MPLS and TSO/GSO
       [not found] ` <CAD=hENdOy-d0v9BskuvfqF3qdbrWCy2b-Dc-LSSUcZBmHy-X1A@mail.gmail.com>
  2016-07-27  7:03   ` Lennert Buytenhek
@ 2016-07-31  7:07   ` Roopa Prabhu
  2016-08-08 15:25     ` Simon Horman
  1 sibling, 1 reply; 7+ messages in thread
From: Roopa Prabhu @ 2016-07-31  7:07 UTC (permalink / raw)
  To: zhuyj
  Cc: Lennert Buytenhek, David Ahern, Robert Shearman, Alexander Duyck,
	netdev, Simon Horman

On 7/27/16, 12:02 AM, zhuyj wrote:
> On Ubuntu 16.04 server (64-bit), when the attached script is run,
> the following error appears:
>
> Error: either "to" is duplicate, or "encap" is a garbage.

This may just be because the iproute2 version on Ubuntu does not
support the route encap attributes yet.

[snip]

>
> On Tue, Jul 26, 2016 at 12:39 AM, Lennert Buytenhek <buytenh@wantstofly.org>
> wrote:
>
>> [snip]
>>
>> Some investigating shows that egress TCP frames that need to be
>> segmented are being dropped in validate_xmit_skb(), which calls
>> skb_gso_segment() which calls skb_mac_gso_segment() which returns
>> -EPROTONOSUPPORT because we apparently didn't have the right kernel
>> module (mpls_gso) loaded.
>>
>> (It's somewhat poor design, IMHO, to degrade network performance by
>> 15000x if someone didn't load a kernel module they didn't know they
>> should have loaded, and in a way that doesn't log any warnings or
>> errors and can only be diagnosed by adding printk calls to net/core/
>> and recompiling your kernel.)

It's possible that the right way to fix this is to always auto-select
MPLS_GSO when MPLS_IPTUNNEL is selected. I am guessing this from
looking at the openvswitch MPLS Kconfig entries and comparing them
with MPLS_IPTUNNEL. Will look some more.
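
Something along these lines, mirroring the openvswitch entry -- a
sketch, not a tested patch:

# net/mpls/Kconfig sketch (untested): the "select NET_MPLS_GSO" line
# is the addition, mirroring openvswitch's Kconfig.
config MPLS_IPTUNNEL
        tristate "MPLS: IP over MPLS tunnel support"
        depends on LWTUNNEL && MPLS_ROUTING
        select NET_MPLS_GSO
        ---help---
          mpls ip tunnel support.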

>>
>> (Also, I'm not sure why mpls_gso is needed when ixgbe seems to be
>> able to natively do TSO on MPLS-labeled traffic, maybe because ixgbe
>> doesn't advertise the necessary features in ->mpls_features?  But
>> adding those bits doesn't seem to change much.)
>>
>> But, loading mpls_gso doesn't change much -- skb_gso_segment() then
>> starts returning -EINVAL instead, which is due to the
>> skb_network_protocol() call in skb_mac_gso_segment() returning zero.
>> And looking at skb_network_protocol(), I don't see how this is
>> supposed to work -- skb->protocol is 0 at this point, and there is no
>> way to figure out that what we are encapsulating is IP traffic,
>> because unlike VLAN tags, MPLS labels aren't followed by an inner
>> ethertype that says what kind of traffic is inside; you have to have
>> explicit knowledge of the payload type for MPLS.
>>
>> Any ideas?
I was looking at the history of net/mpls/mpls_gso.c, and the initial
git log comment says that the driver expects the MPLS tunnel driver to
do a few things, which I think might be the problem. I do see
mpls_iptunnel.c setting skb->protocol but not skb->inner_protocol. I
wonder if fixing that will help?
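
For context, a paraphrased sketch (not verbatim) of the current
mpls_gso_segment() in net/mpls/mpls_gso.c, showing why the handler
depends on the tunnel driver having set skb->inner_protocol:

static struct sk_buff *mpls_gso_segment(struct sk_buff *skb,
                                        netdev_features_t features)
{
        struct sk_buff *segs;
        __be16 mpls_protocol = skb->protocol;

        /* Set up the inner skb: MPLS carries no inner ethertype, so
         * the payload type must come from skb->inner_protocol.  If the
         * tunnel driver never set it, skb->protocol becomes 0 here and
         * skb_network_protocol() below returns 0 (hence -EINVAL).
         */
        skb->protocol = skb->inner_protocol;

        /* Push the mac header back and segment the inner packet. */
        __skb_push(skb, skb->mac_len);
        segs = skb_mac_gso_segment(skb, skb->dev->mpls_features & features);

        /* Restore the outer protocol and re-pull the mac header. */
        skb->protocol = mpls_protocol;
        __skb_pull(skb, skb->data - skb_mac_header(skb));

        return segs;
}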

thanks,
Roopa


* Re: problem with MPLS and TSO/GSO
  2016-07-31  7:07   ` Roopa Prabhu
@ 2016-08-08 15:25     ` Simon Horman
  2016-08-10  5:44       ` Roopa Prabhu
  0 siblings, 1 reply; 7+ messages in thread
From: Simon Horman @ 2016-08-08 15:25 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: zhuyj, Lennert Buytenhek, David Ahern, Robert Shearman,
	Alexander Duyck, netdev

On Sun, Jul 31, 2016 at 12:07:10AM -0700, Roopa Prabhu wrote:
> [snip]
>
> I was looking at the history of net/mpls/mpls_gso.c, and the initial
> git log comment says that the driver expects the MPLS tunnel driver
> to do a few things, which I think might be the problem. I do see
> mpls_iptunnel.c setting skb->protocol but not skb->inner_protocol. I
> wonder if fixing that will help?

If the inner protocol is not set then I don't think segmentation can
work, as there is (or at least was, for the use case the code was
added for) no other way for the stack to know the protocol of the
inner packet.

On another note, I was recently poking around the code and wonder if
the following may be needed (this was in the context of my
under-construction L3 tunnel work for OvS, and it may only be needed
in that context):

diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
index 2055e57ed1c3..113cba89653d 100644
--- a/net/mpls/mpls_gso.c
+++ b/net/mpls/mpls_gso.c
@@ -39,16 +39,18 @@ static struct sk_buff *mpls_gso_segment(struct sk_buff *skb,
 	mpls_features = skb->dev->mpls_features & features;
 	segs = skb_mac_gso_segment(skb, mpls_features);
 
-
-	/* Restore outer protocol. */
-	skb->protocol = mpls_protocol;
-
 	/* Re-pull the mac header that the call to skb_mac_gso_segment()
 	 * above pulled.  It will be re-pushed after returning
 	 * skb_mac_gso_segment(), an indirect caller of this function.
 	 */
 	__skb_pull(skb, skb->data - skb_mac_header(skb));
 
+	/* Restore outer protocol. */
+	skb->protocol = mpls_protocol;
+	if (!IS_ERR(segs))
+		for (skb = segs; skb; skb = skb->next)
+			skb->protocol = mpls_protocol;
+
 	return segs;
 }
 


* Re: problem with MPLS and TSO/GSO
  2016-07-25 16:39 problem with MPLS and TSO/GSO Lennert Buytenhek
       [not found] ` <CAD=hENdOy-d0v9BskuvfqF3qdbrWCy2b-Dc-LSSUcZBmHy-X1A@mail.gmail.com>
@ 2016-08-08 17:48 ` David Ahern
  2016-08-10  3:52 ` David Ahern
  2 siblings, 0 replies; 7+ messages in thread
From: David Ahern @ 2016-08-08 17:48 UTC (permalink / raw)
  To: Lennert Buytenhek, Roopa Prabhu, Robert Shearman; +Cc: Alexander Duyck, netdev

On 7/25/16 10:39 AM, Lennert Buytenhek wrote:
> [snip]
>
> But, loading mpls_gso doesn't change much -- skb_gso_segment() then
> starts returning -EINVAL instead, which is due to the
> skb_network_protocol() call in skb_mac_gso_segment() returning zero.
> And looking at skb_network_protocol(), I don't see how this is
> supposed to work -- skb->protocol is 0 at this point, and there is no
> way to figure out that what we are encapsulating is IP traffic,
> because unlike VLAN tags, MPLS labels aren't followed by an inner
> ethertype that says what kind of traffic is inside; you have to have
> explicit knowledge of the payload type for MPLS.
> 
> Any ideas?

Something is up with the skb manipulations or settings by the MPLS
code. With the inner protocol set in mpls_output():

skb_set_inner_protocol(skb, skb->protocol);

I get EINVAL failures from inet_gso_segment() because the iphdr is not
valid (ihl is 0 and version is 0).
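
For reference, a paraphrased sketch of the failing check in
inet_gso_segment() in net/ipv4/af_inet.c (not verbatim): segs defaults
to ERR_PTR(-EINVAL) and is returned untouched when the bytes at
skb_network_header() don't look like an IPv4 header. My guess -- an
assumption, not verified -- is that the network header offset still
points at the MPLS label stack; the first byte of a small label value
is 0x00, which reads back as exactly version 0 / ihl 0.

static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
                                        netdev_features_t features)
{
        struct sk_buff *segs = ERR_PTR(-EINVAL);
        const struct iphdr *iph;
        int ihl;

        if (unlikely(!pskb_may_pull(skb, sizeof(*iph))))
                goto out;

        /* ip_hdr() reads at skb_network_header(). */
        iph = ip_hdr(skb);
        ihl = iph->ihl * 4;
        if (ihl < sizeof(*iph))         /* ihl == 0 lands here */
                goto out;

        /* ... actual segmentation elided ... */
out:
        return segs;
}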


Thanks for the script to repro with namespaces; much simpler to debug.


* Re: problem with MPLS and TSO/GSO
  2016-07-25 16:39 problem with MPLS and TSO/GSO Lennert Buytenhek
       [not found] ` <CAD=hENdOy-d0v9BskuvfqF3qdbrWCy2b-Dc-LSSUcZBmHy-X1A@mail.gmail.com>
  2016-08-08 17:48 ` David Ahern
@ 2016-08-10  3:52 ` David Ahern
  2 siblings, 0 replies; 7+ messages in thread
From: David Ahern @ 2016-08-10  3:52 UTC (permalink / raw)
  To: Lennert Buytenhek, Roopa Prabhu, Robert Shearman
  Cc: Alexander Duyck, netdev, Simon Horman

On 7/25/16 10:39 AM, Lennert Buytenhek wrote:
> Hi!
> 
> I am seeing pretty horrible TCP transmit performance (anywhere between
> 1 and 10 Mb/s, on a 10 Gb/s interface) when traffic is sent out over a
> route that involves MPLS labeling, and this seems to be due to an
> interaction between MPLS and TSO/GSO that causes all segmentable TCP
> frames that are MPLS-labeled to be dropped on egress.
...
> But, loading mpls_gso doesn't change much -- skb_gso_segment() then
> starts returning -EINVAL instead, which is due to the
> skb_network_protocol() call in skb_mac_gso_segment() returning zero.
> And looking at skb_network_protocol(), I don't see how this is
> supposed to work -- skb->protocol is 0 at this point, and there is no
> way to figure out that what we are encapsulating is IP traffic,
> because unlike VLAN tags, MPLS labels aren't followed by an inner
> ethertype that says what kind of traffic is inside; you have to have
> explicit knowledge of the payload type for MPLS.
> 
> Any ideas?

A quick update. I have a pretty good handle on the GSO changes for
MPLS, but I am still puzzled by a few things. Hopefully by the end of
the week I can send out a patch series. Current performance comparison
with my changes and a patch from Roopa:

MPLS
====
root@kenny-jessie3:~# ip netns exec ns0 netperf -c -C -H 10.10.10.10 -l 10 -t TCP_STREAM
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.10.10.10 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00      3510.26   48.11    48.11    4.491   4.491

non-MPLS
========
root@kenny-jessie3:~# ip netns exec ns0 netperf -c -C -H 172.16.21.22 -l 30 -t TCP_STREAM
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.21.22 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    30.00      9654.97   42.37    42.37    1.438   1.438


* Re: problem with MPLS and TSO/GSO
  2016-08-08 15:25     ` Simon Horman
@ 2016-08-10  5:44       ` Roopa Prabhu
  0 siblings, 0 replies; 7+ messages in thread
From: Roopa Prabhu @ 2016-08-10  5:44 UTC (permalink / raw)
  To: Simon Horman
  Cc: zhuyj, Lennert Buytenhek, David Ahern, Robert Shearman,
	Alexander Duyck, netdev

On 8/8/16, 8:25 AM, Simon Horman wrote:
> [snip]
>
> If the inner protocol is not set then I don't think segmentation can
> work, as there is (or at least was, for the use case the code was
> added for) no other way for the stack to know the protocol of the
> inner packet.
>
> On another note, I was recently poking around the code and wonder if
> the following may be needed (this was in the context of my
> under-construction L3 tunnel work for OvS, and it may only be needed
> in that context):

Thanks, Simon. We are still working on this; stay tuned.

