* large packet support in netfront driver and guest network throughput
@ 2013-09-12 17:53 Anirban Chakraborty
  2013-09-13 11:44 ` Wei Liu
  0 siblings, 1 reply; 16+ messages in thread
From: Anirban Chakraborty @ 2013-09-12 17:53 UTC (permalink / raw)
  To: xen-devel

Hi All,

I am sure this has been answered somewhere on the list in the past, but I can't find it. I was wondering whether the Linux guest netfront driver has GRO support in it. tcpdump in the guest shows packets coming in at 1500 bytes, although eth0 in dom0 and the vif corresponding to the Linux guest in dom0 show that they receive large packets:

In dom0:
eth0      Link encap:Ethernet  HWaddr 90:E2:BA:3A:B1:A4  
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
tcpdump -i eth0 -nnvv -s 1500 src 10.84.20.214
17:38:25.155373 IP (tos 0x0, ttl 64, id 54607, offset 0, flags [DF], proto TCP (6), length 29012)
    10.84.20.214.51041 > 10.84.20.213.5001: Flags [.], seq 276592:305552, ack 1, win 229, options [nop,nop,TS val 65594025 ecr 65569225], length 28960

vif4.0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF  
          UP BROADCAST RUNNING NOARP PROMISC  MTU:1500  Metric:1
tcpdump -i vif4.0 -nnvv -s 1500 src 10.84.20.214
17:38:25.156364 IP (tos 0x0, ttl 64, id 54607, offset 0, flags [DF], proto TCP (6), length 29012)
    10.84.20.214.51041 > 10.84.20.213.5001: Flags [.], seq 276592:305552, ack 1, win 229, options [nop,nop,TS val 65594025 ecr 65569225], length 28960


In the guest:
eth0      Link encap:Ethernet  HWaddr CA:FD:DE:AB:E1:E4  
          inet addr:10.84.20.213  Bcast:10.84.20.255  Mask:255.255.255.0
          inet6 addr: fe80::c8fd:deff:feab:e1e4/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
tcpdump -i eth0 -nnvv -s 1500 src 10.84.20.214
10:38:25.071418 IP (tos 0x0, ttl 64, id 15074, offset 0, flags [DF], proto TCP (6), length 1500)
    10.84.20.214.51040 > 10.84.20.213.5001: Flags [.], seq 17400:18848, ack 1, win 229, options [nop,nop,TS val 65594013 ecr 65569213], length 1448

Is the packet segmented into MTU-sized pieces on transfer from netback to netfront? Is GRO not supported in the guest?
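
For reference, a quick way to check whether GRO is enabled on the guest's eth0 (assuming ethtool is available in the guest and its output uses the usual feature names):

ethtool -k eth0 | grep generic-receive-offload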

I am seeing extremely low throughput on a 10Gb/s link. Two Linux guests (CentOS 6.4 64-bit, 4 VCPUs and 4GB of memory each) are running on two different XenServer 6.1 hosts, and an iperf session between them shows at most 3.2 Gbps.
I am using the Linux bridge as the network backend switch. Dom0 is configured with 2940MB of RAM.
In most cases, after a few runs the throughput drops to ~2.2 Gbps. top shows the netback thread in dom0 at about 70-80% CPU utilization. I have checked the dom0 network configuration and there is no QoS policy in place. So my questions are: is PCI passthrough the only option to get line rate in the guests? Is there a benchmark for the maximum throughput achievable in guests using PV drivers, without PCI passthrough? And what could cause the throughput in the guests to drop consistently (from ~3.2 to ~2.2 Gbps) after a few runs of iperf?

Any pointers will be highly appreciated.

thanks,
Anirban 


* Re: large packet support in netfront driver and guest network throughput
  2013-09-12 17:53 large packet support in netfront driver and guest network throughput Anirban Chakraborty
@ 2013-09-13 11:44 ` Wei Liu
  2013-09-13 17:09   ` Anirban Chakraborty
  0 siblings, 1 reply; 16+ messages in thread
From: Wei Liu @ 2013-09-13 11:44 UTC (permalink / raw)
  To: Anirban Chakraborty; +Cc: wei.liu2, xen-devel

On Thu, Sep 12, 2013 at 05:53:02PM +0000, Anirban Chakraborty wrote:
> Hi All,
> 
> I am sure this has been answered somewhere in the list in the past, but I can't find it. I was wondering if the linux guest netfront driver has GRO support in it. tcpdump shows packets coming in with 1500 bytes, although the eth0 in dom0 and the vif corresponding to the linux guest in dom0 is showing that they receive large packet:
> 
> In dom0:
> eth0      Link encap:Ethernet  HWaddr 90:E2:BA:3A:B1:A4  
>           UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
> tcpdump -i eth0 -nnvv -s 1500 src 10.84.20.214
> 17:38:25.155373 IP (tos 0x0, ttl 64, id 54607, offset 0, flags [DF], proto TCP (6), length 29012)
>     10.84.20.214.51041 > 10.84.20.213.5001: Flags [.], seq 276592:305552, ack 1, win 229, options [nop,nop,TS val 65594025 ecr 65569225], length 28960
> 
> vif4.0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF  
>           UP BROADCAST RUNNING NOARP PROMISC  MTU:1500  Metric:1
> tcpdump -i vif4.0 -nnvv -s 1500 src 10.84.20.214
> 17:38:25.156364 IP (tos 0x0, ttl 64, id 54607, offset 0, flags [DF], proto TCP (6), length 29012)
>     10.84.20.214.51041 > 10.84.20.213.5001: Flags [.], seq 276592:305552, ack 1, win 229, options [nop,nop,TS val 65594025 ecr 65569225], length 28960
> 
> 
> In the guest:
> eth0      Link encap:Ethernet  HWaddr CA:FD:DE:AB:E1:E4  
>           inet addr:10.84.20.213  Bcast:10.84.20.255  Mask:255.255.255.0
>           inet6 addr: fe80::c8fd:deff:feab:e1e4/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> tcpdump -i eth0 -nnvv -s 1500 src 10.84.20.214
> 10:38:25.071418 IP (tos 0x0, ttl 64, id 15074, offset 0, flags [DF], proto TCP (6), length 1500)
>     10.84.20.214.51040 > 10.84.20.213.5001: Flags [.], seq 17400:18848, ack 1, win 229, options [nop,nop,TS val 65594013 ecr 65569213], length 1448
> 
> Is the packet on transfer from netback to net front is segmented into MTU size? Is GRO not supported in the guest?

Here is what I see in the guest, with the iperf server running in the guest and the
iperf client running in Dom0. tcpdump was run with the rune you provided.

10.80.238.213.38895 > 10.80.239.197.5001: Flags [.], seq
5806480:5818064, ack 1, win 229, options [nop,nop,TS val 21968973 ecr
21832969], length 11584

This is an upstream kernel. The throughput from Dom0 to DomU is ~7.2Gb/s.

> 
> I am seeing extremely low throughput on a 10Gb/s link. Two linux guests (Centos 6.4 64bit, 4 VCPU and 4GB of memory) are running on two different XenServer 6.1s and iperf session between them shows at most 3.2 Gbps. 

XenServer might use a different Dom0 kernel with its own tuning. You could
also try contacting XenServer support for a better idea.

In general, off-host communication can be affected by various things. It
would be quite useful to identify the bottleneck first.

Try to run (rough example invocations below):
1. Dom0 to Dom0 iperf (or your workload)
2. Dom0 to DomU iperf
3. DomU to Dom0 iperf
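
Rough example invocations (assuming plain iperf2 and its default TCP test; adjust the IP and duration to your setup):

iperf -s                      # on the receiving end (Dom0 or DomU)
iperf -c 10.84.20.213 -t 30   # on the sending end, pointed at the receiver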

In order to get line rate, you need to at least get line rate from Dom0
to Dom0, IMHO. 10Gb/s line rate from guest to guest has not been
achieved at the moment...

Wei.

> I am using linux bridge as network backend switch. Dom0 is configured to have 2940MB of RAM.
> In most cases, after a few runs the throughput drops to ~2.2 Gbps. top shows that the netback thread in dom0 is having about 70-80% CPU utilization. I have checked the dom0 network configuration and there is no QoS policy in place etc. So, my question is that is PCI passthrough only option to get line rate in the guests? Is there any benchmark of maximum throughput achieved in the guests using PV drivers and without PCI pass thru? Also, what could be the reason for throughput drop in the guests (from ~3.2 to ~2.2 Gbps) consistently after few runs of iperf?
> 
> Any pointer will be highly appreciated.
> 
> thanks,
> Anirban 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


* Re: large packet support in netfront driver and guest network throughput
  2013-09-13 11:44 ` Wei Liu
@ 2013-09-13 17:09   ` Anirban Chakraborty
  2013-09-16 14:21     ` Wei Liu
  0 siblings, 1 reply; 16+ messages in thread
From: Anirban Chakraborty @ 2013-09-13 17:09 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel

On Sep 13, 2013, at 4:44 AM, Wei Liu <wei.liu2@citrix.com> wrote:

> On Thu, Sep 12, 2013 at 05:53:02PM +0000, Anirban Chakraborty wrote:
>> Hi All,
>> 
>> I am sure this has been answered somewhere in the list in the past, but I can't find it. I was wondering if the linux guest netfront driver has GRO support in it. tcpdump shows packets coming in with 1500 bytes, although the eth0 in dom0 and the vif corresponding to the linux guest in dom0 is showing that they receive large packet:
>> 
>> In dom0:
>> eth0      Link encap:Ethernet  HWaddr 90:E2:BA:3A:B1:A4  
>>          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
>> tcpdump -i eth0 -nnvv -s 1500 src 10.84.20.214
>> 17:38:25.155373 IP (tos 0x0, ttl 64, id 54607, offset 0, flags [DF], proto TCP (6), length 29012)
>>    10.84.20.214.51041 > 10.84.20.213.5001: Flags [.], seq 276592:305552, ack 1, win 229, options [nop,nop,TS val 65594025 ecr 65569225], length 28960
>> 
>> vif4.0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF  
>>          UP BROADCAST RUNNING NOARP PROMISC  MTU:1500  Metric:1
>> tcpdump -i vif4.0 -nnvv -s 1500 src 10.84.20.214
>> 17:38:25.156364 IP (tos 0x0, ttl 64, id 54607, offset 0, flags [DF], proto TCP (6), length 29012)
>>    10.84.20.214.51041 > 10.84.20.213.5001: Flags [.], seq 276592:305552, ack 1, win 229, options [nop,nop,TS val 65594025 ecr 65569225], length 28960
>> 
>> 
>> In the guest:
>> eth0      Link encap:Ethernet  HWaddr CA:FD:DE:AB:E1:E4  
>>          inet addr:10.84.20.213  Bcast:10.84.20.255  Mask:255.255.255.0
>>          inet6 addr: fe80::c8fd:deff:feab:e1e4/64 Scope:Link
>>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>> tcpdump -i eth0 -nnvv -s 1500 src 10.84.20.214
>> 10:38:25.071418 IP (tos 0x0, ttl 64, id 15074, offset 0, flags [DF], proto TCP (6), length 1500)
>>    10.84.20.214.51040 > 10.84.20.213.5001: Flags [.], seq 17400:18848, ack 1, win 229, options [nop,nop,TS val 65594013 ecr 65569213], length 1448
>> 
>> Is the packet on transfer from netback to net front is segmented into MTU size? Is GRO not supported in the guest?
> 
> Here is what I see in the guest, iperf server running in guest and iperf
> client running in Dom0. Tcpdump runs with the rune you provided.
> 
> 10.80.238.213.38895 > 10.80.239.197.5001: Flags [.], seq
> 5806480:5818064, ack 1, win 229, options [nop,nop,TS val 21968973 ecr
> 21832969], length 11584
> 
> This is a upstream kernel. The throughput from Dom0 to DomU is ~7.2Gb/s.

Thanks for your reply. The tcpdump was captured in dom0 of the host running the receiving guest [at both the vif and the physical interface], i.e. on the receive path of the iperf server. The iperf server was running in the guest (10.84.20.213) and the client was in another guest (on a different host) with IP 10.84.20.214. The traffic was between the two guests, not between dom0 and a guest.

> 
>> 
>> I am seeing extremely low throughput on a 10Gb/s link. Two linux guests (Centos 6.4 64bit, 4 VCPU and 4GB of memory) are running on two different XenServer 6.1s and iperf session between them shows at most 3.2 Gbps. 
> 
> XenServer might use different Dom0 kernel with their own tuning. You can
> also try to contact XenServer support for better idea?
> 

XenServer 6.1 is running a 2.6.32.43 kernel. Since the issue appears, from the tcpdump, to be in the netfront driver, I thought I would post it here. Note that the checksum offloads of the interfaces (virtual and physical) were not touched; the default setting (on) was used.
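
For completeness, the current offload settings of the dom0-side interfaces can be listed with ethtool (assuming ethtool is present in dom0; interface names are taken from the output above):

ethtool -k eth0
ethtool -k vif4.0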

> In general, off-host communication can be affected by various things. It
> would be quite useful to identify the bottleneck first.
> 
> Try to run:
> 1. Dom0 to Dom0 iperf (or you workload)
> 2. Dom0 to DomU iperf
> 3. DomU to Dom0 iperf

I tried dom0 to dom0 and got 9.4 Gbps, which is what I expected (with GRO turned on for the physical interface). However, when I run guest to guest, things fall off. Are large packets not supported in netfront? I thought otherwise. I looked at the code and I do not see any call to napi_gro_receive(); rather it is using netif_receive_skb(). netback seems to be sending GSO packets to netfront, but they are being segmented to 1500 bytes (as it appears from the tcpdump).

> 
> In order to get line rate, you need to at least get line rate from Dom0
> to Dom0 IMHO. 10G/s line rate from guest to guest has not yet been
> achieved at the moment…

What is the current number, without VCPU pinning etc., for a 1500-byte MTU? I am getting 2.2-3.2 Gbps for a 4-VCPU guest with 4GB of memory. It is the only VM running on that server, with no other traffic.

-Anirban


* Re: large packet support in netfront driver and guest network throughput
  2013-09-13 17:09   ` Anirban Chakraborty
@ 2013-09-16 14:21     ` Wei Liu
  2013-09-17  2:09       ` annie li
  0 siblings, 1 reply; 16+ messages in thread
From: Wei Liu @ 2013-09-16 14:21 UTC (permalink / raw)
  To: Anirban Chakraborty; +Cc: Wei Liu, xen-devel

On Fri, Sep 13, 2013 at 05:09:48PM +0000, Anirban Chakraborty wrote:
> On Sep 13, 2013, at 4:44 AM, Wei Liu <wei.liu2@citrix.com> wrote:
> 
> > On Thu, Sep 12, 2013 at 05:53:02PM +0000, Anirban Chakraborty wrote:
> >> Hi All,
> >> 
> >> I am sure this has been answered somewhere in the list in the past, but I can't find it. I was wondering if the linux guest netfront driver has GRO support in it. tcpdump shows packets coming in with 1500 bytes, although the eth0 in dom0 and the vif corresponding to the linux guest in dom0 is showing that they receive large packet:
> >> 
> >> In dom0:
> >> eth0      Link encap:Ethernet  HWaddr 90:E2:BA:3A:B1:A4  
> >>          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
> >> tcpdump -i eth0 -nnvv -s 1500 src 10.84.20.214
> >> 17:38:25.155373 IP (tos 0x0, ttl 64, id 54607, offset 0, flags [DF], proto TCP (6), length 29012)
> >>    10.84.20.214.51041 > 10.84.20.213.5001: Flags [.], seq 276592:305552, ack 1, win 229, options [nop,nop,TS val 65594025 ecr 65569225], length 28960
> >> 
> >> vif4.0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF  
> >>          UP BROADCAST RUNNING NOARP PROMISC  MTU:1500  Metric:1
> >> tcpdump -i vif4.0 -nnvv -s 1500 src 10.84.20.214
> >> 17:38:25.156364 IP (tos 0x0, ttl 64, id 54607, offset 0, flags [DF], proto TCP (6), length 29012)
> >>    10.84.20.214.51041 > 10.84.20.213.5001: Flags [.], seq 276592:305552, ack 1, win 229, options [nop,nop,TS val 65594025 ecr 65569225], length 28960
> >> 
> >> 
> >> In the guest:
> >> eth0      Link encap:Ethernet  HWaddr CA:FD:DE:AB:E1:E4  
> >>          inet addr:10.84.20.213  Bcast:10.84.20.255  Mask:255.255.255.0
> >>          inet6 addr: fe80::c8fd:deff:feab:e1e4/64 Scope:Link
> >>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >> tcpdump -i eth0 -nnvv -s 1500 src 10.84.20.214
> >> 10:38:25.071418 IP (tos 0x0, ttl 64, id 15074, offset 0, flags [DF], proto TCP (6), length 1500)
> >>    10.84.20.214.51040 > 10.84.20.213.5001: Flags [.], seq 17400:18848, ack 1, win 229, options [nop,nop,TS val 65594013 ecr 65569213], length 1448
> >> 
> >> Is the packet on transfer from netback to net front is segmented into MTU size? Is GRO not supported in the guest?
> > 
> > Here is what I see in the guest, iperf server running in guest and iperf
> > client running in Dom0. Tcpdump runs with the rune you provided.
> > 
> > 10.80.238.213.38895 > 10.80.239.197.5001: Flags [.], seq
> > 5806480:5818064, ack 1, win 229, options [nop,nop,TS val 21968973 ecr
> > 21832969], length 11584
> > 
> > This is a upstream kernel. The throughput from Dom0 to DomU is ~7.2Gb/s.
> 
> Thanks for your reply. The tcpdump was captured on dom0 of the guest [at both vif and the physical interfaces] , i.e. on the receive path of the server. iperf server was running on the guest (10.84.20.213) and the client was at another guest (on a different server) with IP 10.84.20.214. The traffic was between two guests, not between dom0 and the guest.
> 
> > 
> >> 
> >> I am seeing extremely low throughput on a 10Gb/s link. Two linux guests (Centos 6.4 64bit, 4 VCPU and 4GB of memory) are running on two different XenServer 6.1s and iperf session between them shows at most 3.2 Gbps. 
> > 
> > XenServer might use different Dom0 kernel with their own tuning. You can
> > also try to contact XenServer support for better idea?
> > 
> 
> XenServer 6.1 is running 2.6.32.43 kernel. Since the issue is in netfront driver, as it appears from the tcpdump, thats why I thought I post it here. Note that checksum offloads of the interfaces (virtual and physical) were not even touched, the default setting (which was set to on) was used.
> 
> > In general, off-host communication can be affected by various things. It
> > would be quite useful to identify the bottleneck first.
> > 
> > Try to run:
> > 1. Dom0 to Dom0 iperf (or you workload)
> > 2. Dom0 to DomU iperf
> > 3. DomU to Dom0 iperf
> 
> I tried dom0 to dom0 and I got 9.4 Gbps, which is what I expected (with GRO turned on in the physical interface). However, when I run guest to guest, things fall off. Is large packet not supported in netfront? I thought otherwise. I looked at the code and I do not see any call to napi_gro_receive(), rather it is using netif_receive_skb(). netback seems to be sending GSO packets to the netfront, but it is being segmented to 1500 byte (as it appears from the tcpdump).
> 

OK, I get your problem.

Indeed netfront doesn't make use of the GRO API at the moment. I've added
this to my list of things to work on. I will keep you posted when I get to it.

Thanks!

Wei.

> > 
> > In order to get line rate, you need to at least get line rate from Dom0
> > to Dom0 IMHO. 10G/s line rate from guest to guest has not yet been
> > achieved at the moment…
> 
> What is the current number, without VCPU pinning etc. for 1500 byte MTU? I am getting 2.2-3.2 Gbps for 4VCPU guest with 4GB of memory. It is the only vm running on that server without any other traffic.
> 
> -Anirban
> 



* Re: large packet support in netfront driver and guest network throughput
  2013-09-16 14:21     ` Wei Liu
@ 2013-09-17  2:09       ` annie li
  2013-09-17  8:25         ` Wei Liu
  0 siblings, 1 reply; 16+ messages in thread
From: annie li @ 2013-09-17  2:09 UTC (permalink / raw)
  To: Wei Liu; +Cc: Anirban Chakraborty, xen-devel


On 2013-9-16 22:21, Wei Liu wrote:
> On Fri, Sep 13, 2013 at 05:09:48PM +0000, Anirban Chakraborty wrote:
>> On Sep 13, 2013, at 4:44 AM, Wei Liu <wei.liu2@citrix.com> wrote:
>>
>>> On Thu, Sep 12, 2013 at 05:53:02PM +0000, Anirban Chakraborty wrote:
>>>> Hi All,
>>>>
>>>> I am sure this has been answered somewhere in the list in the past, but I can't find it. I was wondering if the linux guest netfront driver has GRO support in it. tcpdump shows packets coming in with 1500 bytes, although the eth0 in dom0 and the vif corresponding to the linux guest in dom0 is showing that they receive large packet:
>>>>
>>>> In dom0:
>>>> eth0      Link encap:Ethernet  HWaddr 90:E2:BA:3A:B1:A4
>>>>           UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
>>>> tcpdump -i eth0 -nnvv -s 1500 src 10.84.20.214
>>>> 17:38:25.155373 IP (tos 0x0, ttl 64, id 54607, offset 0, flags [DF], proto TCP (6), length 29012)
>>>>     10.84.20.214.51041 > 10.84.20.213.5001: Flags [.], seq 276592:305552, ack 1, win 229, options [nop,nop,TS val 65594025 ecr 65569225], length 28960
>>>>
>>>> vif4.0    Link encap:Ethernet  HWaddr FE:FF:FF:FF:FF:FF
>>>>           UP BROADCAST RUNNING NOARP PROMISC  MTU:1500  Metric:1
>>>> tcpdump -i vif4.0 -nnvv -s 1500 src 10.84.20.214
>>>> 17:38:25.156364 IP (tos 0x0, ttl 64, id 54607, offset 0, flags [DF], proto TCP (6), length 29012)
>>>>     10.84.20.214.51041 > 10.84.20.213.5001: Flags [.], seq 276592:305552, ack 1, win 229, options [nop,nop,TS val 65594025 ecr 65569225], length 28960
>>>>
>>>>
>>>> In the guest:
>>>> eth0      Link encap:Ethernet  HWaddr CA:FD:DE:AB:E1:E4
>>>>           inet addr:10.84.20.213  Bcast:10.84.20.255  Mask:255.255.255.0
>>>>           inet6 addr: fe80::c8fd:deff:feab:e1e4/64 Scope:Link
>>>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>> tcpdump -i eth0 -nnvv -s 1500 src 10.84.20.214
>>>> 10:38:25.071418 IP (tos 0x0, ttl 64, id 15074, offset 0, flags [DF], proto TCP (6), length 1500)
>>>>     10.84.20.214.51040 > 10.84.20.213.5001: Flags [.], seq 17400:18848, ack 1, win 229, options [nop,nop,TS val 65594013 ecr 65569213], length 1448
>>>>
>>>> Is the packet on transfer from netback to net front is segmented into MTU size? Is GRO not supported in the guest?
>>> Here is what I see in the guest, iperf server running in guest and iperf
>>> client running in Dom0. Tcpdump runs with the rune you provided.
>>>
>>> 10.80.238.213.38895 > 10.80.239.197.5001: Flags [.], seq
>>> 5806480:5818064, ack 1, win 229, options [nop,nop,TS val 21968973 ecr
>>> 21832969], length 11584
>>>
>>> This is a upstream kernel. The throughput from Dom0 to DomU is ~7.2Gb/s.
>> Thanks for your reply. The tcpdump was captured on dom0 of the guest [at both vif and the physical interfaces] , i.e. on the receive path of the server. iperf server was running on the guest (10.84.20.213) and the client was at another guest (on a different server) with IP 10.84.20.214. The traffic was between two guests, not between dom0 and the guest.
>>
>>>> I am seeing extremely low throughput on a 10Gb/s link. Two linux guests (Centos 6.4 64bit, 4 VCPU and 4GB of memory) are running on two different XenServer 6.1s and iperf session between them shows at most 3.2 Gbps.
>>> XenServer might use different Dom0 kernel with their own tuning. You can
>>> also try to contact XenServer support for better idea?
>>>
>> XenServer 6.1 is running 2.6.32.43 kernel. Since the issue is in netfront driver, as it appears from the tcpdump, thats why I thought I post it here. Note that checksum offloads of the interfaces (virtual and physical) were not even touched, the default setting (which was set to on) was used.
>>
>>> In general, off-host communication can be affected by various things. It
>>> would be quite useful to identify the bottleneck first.
>>>
>>> Try to run:
>>> 1. Dom0 to Dom0 iperf (or you workload)
>>> 2. Dom0 to DomU iperf
>>> 3. DomU to Dom0 iperf
>> I tried dom0 to dom0 and I got 9.4 Gbps, which is what I expected (with GRO turned on in the physical interface). However, when I run guest to guest, things fall off. Is large packet not supported in netfront? I thought otherwise. I looked at the code and I do not see any call to napi_gro_receive(), rather it is using netif_receive_skb(). netback seems to be sending GSO packets to the netfront, but it is being segmented to 1500 byte (as it appears from the tcpdump).
>>
> OK, I get your problem.
>
> Indeed netfront doesn't make use of GRO API at the moment.

This is true.
But I am wondering why the large packet is not segmented into MTU-sized
pieces with the upstream kernel. I did see large packets with the upstream
kernel on the receiving guest (tested between 2 DomUs on the same host).

Thanks
Annie
>   I've added
> this to my list to work on. I will keep you posted when I get to that.
>
> Thanks!
>
> Wei.
>
>>> In order to get line rate, you need to at least get line rate from Dom0
>>> to Dom0 IMHO. 10G/s line rate from guest to guest has not yet been
>>> achieved at the moment…
>> What is the current number, without VCPU pinning etc. for 1500 byte MTU? I am getting 2.2-3.2 Gbps for 4VCPU guest with 4GB of memory. It is the only vm running on that server without any other traffic.
>>
>> -Anirban
>>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel




* Re: large packet support in netfront driver and guest network throughput
  2013-09-17  2:09       ` annie li
@ 2013-09-17  8:25         ` Wei Liu
  2013-09-17 17:53           ` Anirban Chakraborty
  0 siblings, 1 reply; 16+ messages in thread
From: Wei Liu @ 2013-09-17  8:25 UTC (permalink / raw)
  To: annie li; +Cc: Anirban Chakraborty, Wei Liu, xen-devel

On Tue, Sep 17, 2013 at 10:09:21AM +0800, annie li wrote:
[...]
> >>>>Is the packet on transfer from netback to net front is segmented into MTU size? Is GRO not supported in the guest?
> >>>Here is what I see in the guest, iperf server running in guest and iperf
> >>>client running in Dom0. Tcpdump runs with the rune you provided.
> >>>
> >>>10.80.238.213.38895 > 10.80.239.197.5001: Flags [.], seq
> >>>5806480:5818064, ack 1, win 229, options [nop,nop,TS val 21968973 ecr
> >>>21832969], length 11584
> >>>
> >>>This is a upstream kernel. The throughput from Dom0 to DomU is ~7.2Gb/s.
> >>Thanks for your reply. The tcpdump was captured on dom0 of the guest [at both vif and the physical interfaces] , i.e. on the receive path of the server. iperf server was running on the guest (10.84.20.213) and the client was at another guest (on a different server) with IP 10.84.20.214. The traffic was between two guests, not between dom0 and the guest.
> >>
> >>>>I am seeing extremely low throughput on a 10Gb/s link. Two linux guests (Centos 6.4 64bit, 4 VCPU and 4GB of memory) are running on two different XenServer 6.1s and iperf session between them shows at most 3.2 Gbps.
> >>>XenServer might use different Dom0 kernel with their own tuning. You can
> >>>also try to contact XenServer support for better idea?
> >>>
> >>XenServer 6.1 is running 2.6.32.43 kernel. Since the issue is in netfront driver, as it appears from the tcpdump, thats why I thought I post it here. Note that checksum offloads of the interfaces (virtual and physical) were not even touched, the default setting (which was set to on) was used.
> >>
> >>>In general, off-host communication can be affected by various things. It
> >>>would be quite useful to identify the bottleneck first.
> >>>
> >>>Try to run:
> >>>1. Dom0 to Dom0 iperf (or you workload)
> >>>2. Dom0 to DomU iperf
> >>>3. DomU to Dom0 iperf
> >>I tried dom0 to dom0 and I got 9.4 Gbps, which is what I expected (with GRO turned on in the physical interface). However, when I run guest to guest, things fall off. Is large packet not supported in netfront? I thought otherwise. I looked at the code and I do not see any call to napi_gro_receive(), rather it is using netif_receive_skb(). netback seems to be sending GSO packets to the netfront, but it is being segmented to 1500 byte (as it appears from the tcpdump).
> >>
> >OK, I get your problem.
> >
> >Indeed netfront doesn't make use of GRO API at the moment.
> 
> This is true.
> But I am wondering why large packet is not segmented into mtu size
> with upstream kernel? I did see large packets with upsteam kernel on
> receive guest(test between 2 domus on same host).
> 

I think Anirban's setup is different. The traffic is from a DomU on
another host.

I will need to set up a testing environment with a 10G link to test this.

Anirban, can you share your setup, especially the DomU kernel version? Are
you using an upstream kernel in the DomU?

Wei.

> Thanks
> Annie
> >  I've added
> >this to my list to work on. I will keep you posted when I get to that.
> >
> >Thanks!
> >
> >Wei.
> >
> >>>In order to get line rate, you need to at least get line rate from Dom0
> >>>to Dom0 IMHO. 10G/s line rate from guest to guest has not yet been
> >>>achieved at the moment…
> >>What is the current number, without VCPU pinning etc. for 1500 byte MTU? I am getting 2.2-3.2 Gbps for 4VCPU guest with 4GB of memory. It is the only vm running on that server without any other traffic.
> >>
> >>-Anirban
> >>
> >_______________________________________________
> >Xen-devel mailing list
> >Xen-devel@lists.xen.org
> >http://lists.xen.org/xen-devel



* Re: large packet support in netfront driver and guest network throughput
  2013-09-17  8:25         ` Wei Liu
@ 2013-09-17 17:53           ` Anirban Chakraborty
  2013-09-18  2:28             ` annie li
  2013-09-18 15:48             ` Wei Liu
  0 siblings, 2 replies; 16+ messages in thread
From: Anirban Chakraborty @ 2013-09-17 17:53 UTC (permalink / raw)
  To: Wei Liu, annie li; +Cc: xen-devel



On 9/17/13 1:25 AM, "Wei Liu" <wei.liu2@citrix.com> wrote:

>On Tue, Sep 17, 2013 at 10:09:21AM +0800, annie li wrote:
>><snip>
>>>>I tried dom0 to dom0 and I got 9.4 Gbps, which is what I expected
>>>>(with GRO turned on in the physical interface). However, when I run
>>>>guest to guest, things fall off. Is large packet not supported in
>>>>netfront? I thought otherwise. I looked at the code and I do not see
>>>>any call to napi_gro_receive(), rather it is using
>>>>netif_receive_skb(). netback seems to be sending GSO packets to the
>>>>netfront, but it is being segmented to 1500 byte (as it appears from
>>>>the tcpdump).
>> >>
>> >OK, I get your problem.
>> >
>> >Indeed netfront doesn't make use of GRO API at the moment.
>> 
>> This is true.
>> But I am wondering why large packet is not segmented into mtu size
>> with upstream kernel? I did see large packets with upsteam kernel on
>> receive guest(test between 2 domus on same host).
>> 
>
>I think Anirban's setup is different. The traffic is from a DomU on
>another host.
>
>I will need to setup testing environment with 10G link to test this.
>
>Anirban, can you share your setup, especially DomU kernel version, are
>you using upstream kernel in DomU?

Sure.
I have two hosts, say h1 and h2, both running XenServer 6.1.
h1 runs a CentOS 6.4 64-bit guest, say guest1, and h2 runs an identical
guest, guest2.

The iperf server runs on guest1, with the iperf client connecting from guest2.

I haven't tried an upstream kernel yet. However, what I found is
that netback on the receiving host is transmitting GSO segments to the
guest (guest1), but the packets are segmented at the netfront interface.

Annie's setup has both guests running on the same host, in which case
packets are looped back.

-Anirban


* Re: large packet support in netfront driver and guest network throughput
  2013-09-17 17:53           ` Anirban Chakraborty
@ 2013-09-18  2:28             ` annie li
  2013-09-18 21:06               ` Anirban Chakraborty
  2013-09-18 15:48             ` Wei Liu
  1 sibling, 1 reply; 16+ messages in thread
From: annie li @ 2013-09-18  2:28 UTC (permalink / raw)
  To: Anirban Chakraborty; +Cc: Wei Liu, xen-devel


On 2013-9-18 1:53, Anirban Chakraborty wrote:
>
> On 9/17/13 1:25 AM, "Wei Liu" <wei.liu2@citrix.com> wrote:
>
>> On Tue, Sep 17, 2013 at 10:09:21AM +0800, annie li wrote:
>>> <snip>
>>>>> I tried dom0 to dom0 and I got 9.4 Gbps, which is what I expected
>>>>> (with GRO turned on in the physical interface). However, when I run
>>>>> guest to guest, things fall off. Is large packet not supported in
>>>>> netfront? I thought otherwise. I looked at the code and I do not see
>>>>> any call to napi_gro_receive(), rather it is using
>>>>> netif_receive_skb(). netback seems to be sending GSO packets to the
>>>>> netfront, but it is being segmented to 1500 byte (as it appears from
>>>>> the tcpdump).
>>>>>
>>>> OK, I get your problem.
>>>>
>>>> Indeed netfront doesn't make use of GRO API at the moment.
>>> This is true.
>>> But I am wondering why large packet is not segmented into mtu size
>>> with upstream kernel? I did see large packets with upsteam kernel on
>>> receive guest(test between 2 domus on same host).
>>>
>> I think Anirban's setup is different. The traffic is from a DomU on
>> another host.
>>
>> I will need to setup testing environment with 10G link to test this.
>>
>> Anirban, can you share your setup, especially DomU kernel version, are
>> you using upstream kernel in DomU?
> Sure..
> I have two hosts, say h1 and h2 running XenServer 6.1.
> h1 running Centos 6.4, 64bit kernel, say guest1 and h2 running identical
> guest, guest2.
>
> iperf server is running on guest1 with iperf client connecting from guest2.
>
> I haven't tried with upstream kernel yet. However, what I found out is
> that the netback on the receiving host is transmitting GSO segments to the
> guest (guest1), but the packets are segmented at the netfront interface.

Did you try the guests on the same host in your environment?

>
> Annie's setup has both the guests running on the same host, in which case
> packets are looped back.

If the guest does not segment packets in the same-host case, it should not
segment them in the different-host case either. In current upstream, the
netback->netfront mechanism does not treat these two cases differently.

Thanks
Annie


* Re: large packet support in netfront driver and guest network throughput
  2013-09-17 17:53           ` Anirban Chakraborty
  2013-09-18  2:28             ` annie li
@ 2013-09-18 15:48             ` Wei Liu
  2013-09-18 20:38               ` Anirban Chakraborty
  1 sibling, 1 reply; 16+ messages in thread
From: Wei Liu @ 2013-09-18 15:48 UTC (permalink / raw)
  To: Anirban Chakraborty; +Cc: annie li, Wei Liu, xen-devel

On Tue, Sep 17, 2013 at 05:53:43PM +0000, Anirban Chakraborty wrote:
> 
> 
> On 9/17/13 1:25 AM, "Wei Liu" <wei.liu2@citrix.com> wrote:
> 
> >On Tue, Sep 17, 2013 at 10:09:21AM +0800, annie li wrote:
> >><snip>
> >>>>I tried dom0 to dom0 and I got 9.4 Gbps, which is what I expected
> >>>>(with GRO turned on in the physical interface). However, when I run
> >>>>guest to guest, things fall off. Is large packet not supported in
> >>>>netfront? I thought otherwise. I looked at the code and I do not see
> >>>>any call to napi_gro_receive(), rather it is using
> >>>>netif_receive_skb(). netback seems to be sending GSO packets to the
> >>>>netfront, but it is being segmented to 1500 byte (as it appears from
> >>>>the tcpdump).
> >> >>
> >> >OK, I get your problem.
> >> >
> >> >Indeed netfront doesn't make use of GRO API at the moment.
> >> 
> >> This is true.
> >> But I am wondering why large packet is not segmented into mtu size
> >> with upstream kernel? I did see large packets with upsteam kernel on
> >> receive guest(test between 2 domus on same host).
> >> 
> >
> >I think Anirban's setup is different. The traffic is from a DomU on
> >another host.
> >
> >I will need to setup testing environment with 10G link to test this.
> >
> >Anirban, can you share your setup, especially DomU kernel version, are
> >you using upstream kernel in DomU?
> 
> Sure..
> I have two hosts, say h1 and h2 running XenServer 6.1.
> h1 running Centos 6.4, 64bit kernel, say guest1 and h2 running identical
> guest, guest2.
> 

Do you have the exact version of your DomU's kernel? Is it available
somewhere online?

> iperf server is running on guest1 with iperf client connecting from guest2.
> 
> I haven't tried with upstream kernel yet. However, what I found out is
> that the netback on the receiving host is transmitting GSO segments to the
> guest (guest1), but the packets are segmented at the netfront interface.
> 

I just tried: with a vanilla upstream kernel I can see large packets
on the DomU side.

I also tried converting netfront to use the GRO API (hopefully I didn't get
it wrong), but I didn't see much improvement -- which is expected, since I
already saw large packets even without GRO.

If you fancy trying the GRO API, see the attached patch. Note that you might
need to do some contextual adjustment, as this patch is for the upstream
kernel.

Wei.

---8<---
From ca532dd11d7b8f5f8ce9d2b8043dd974d9587cb0 Mon Sep 17 00:00:00 2001
From: Wei Liu <wei.liu2@citrix.com>
Date: Wed, 18 Sep 2013 16:46:23 +0100
Subject: [PATCH] xen-netfront: convert to GRO API

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netfront.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 36808bf..dd1011e 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -952,7 +952,7 @@ static int handle_incoming_queue(struct net_device *dev,
 		u64_stats_update_end(&stats->syncp);
 
 		/* Pass it up. */
-		netif_receive_skb(skb);
+		napi_gro_receive(&np->napi, skb);
 	}
 
 	return packets_dropped;
@@ -1051,6 +1051,8 @@ err:
 	if (work_done < budget) {
 		int more_to_do = 0;
 
+		napi_gro_flush(napi, false);
+
 		local_irq_save(flags);
 
 		RING_FINAL_CHECK_FOR_RESPONSES(&np->rx, more_to_do);
-- 
1.7.10.4


> Annie's setup has both the guests running on the same host, in which case
> packets are looped back.
> 
> -Anirban
> 


* Re: large packet support in netfront driver and guest network throughput
  2013-09-18 15:48             ` Wei Liu
@ 2013-09-18 20:38               ` Anirban Chakraborty
  2013-09-19  9:41                 ` Wei Liu
  0 siblings, 1 reply; 16+ messages in thread
From: Anirban Chakraborty @ 2013-09-18 20:38 UTC (permalink / raw)
  To: Wei Liu; +Cc: annie li, xen-devel


On Sep 18, 2013, at 8:48 AM, Wei Liu <wei.liu2@citrix.com> wrote:

> On Tue, Sep 17, 2013 at 05:53:43PM +0000, Anirban Chakraborty wrote:
>> 
>> 
>> On 9/17/13 1:25 AM, "Wei Liu" <wei.liu2@citrix.com> wrote:
>> 
>>> On Tue, Sep 17, 2013 at 10:09:21AM +0800, annie li wrote:
>>>> <snip>
>>>>>> I tried dom0 to dom0 and I got 9.4 Gbps, which is what I expected
>>>>>> (with GRO turned on in the physical interface). However, when I run
>>>>>> guest to guest, things fall off. Is large packet not supported in
>>>>>> netfront? I thought otherwise. I looked at the code and I do not see
>>>>>> any call to napi_gro_receive(), rather it is using
>>>>>> netif_receive_skb(). netback seems to be sending GSO packets to the
>>>>>> netfront, but it is being segmented to 1500 byte (as it appears from
>>>>>> the tcpdump).
>>>>>> 
>>>>> OK, I get your problem.
>>>>> 
>>>>> Indeed netfront doesn't make use of GRO API at the moment.
>>>> 
>>>> This is true.
>>>> But I am wondering why large packet is not segmented into mtu size
>>>> with upstream kernel? I did see large packets with upsteam kernel on
>>>> receive guest(test between 2 domus on same host).
>>>> 
>>> 
>>> I think Anirban's setup is different. The traffic is from a DomU on
>>> another host.
>>> 
>>> I will need to setup testing environment with 10G link to test this.
>>> 
>>> Anirban, can you share your setup, especially DomU kernel version, are
>>> you using upstream kernel in DomU?
>> 
>> Sure..
>> I have two hosts, say h1 and h2 running XenServer 6.1.
>> h1 running Centos 6.4, 64bit kernel, say guest1 and h2 running identical
>> guest, guest2.
>> 
> 
> Do you have exact version of your DomU' kernel? Is it available
> somewhere online?

Yes, it is 2.6.32-358.el6.x86_64. Sorry, I missed it out last time.

> 
>> iperf server is running on guest1 with iperf client connecting from guest2.
>> 
>> I haven't tried with upstream kernel yet. However, what I found out is
>> that the netback on the receiving host is transmitting GSO segments to the
>> guest (guest1), but the packets are segmented at the netfront interface.
>> 
> 
> I just tried, with vanilla upstream kernel I can see large packet size
> on DomU's side.
> 
> I also tried to convert netfront to use GRO API (hopefully I didn't get
> it wrong), I didn't see much improvement -- it's quite obvious because I
> already saw large packet even without GRO.
> 
> If you fancy trying GRO API, see attached patch. Note that you might
> need to do some contextual adjustment as this patch is for upstream
> kernel.
> 
> Wei.
> 
> ---8<---
> From ca532dd11d7b8f5f8ce9d2b8043dd974d9587cb0 Mon Sep 17 00:00:00 2001
> From: Wei Liu <wei.liu2@citrix.com>
> Date: Wed, 18 Sep 2013 16:46:23 +0100
> Subject: [PATCH] xen-netfront: convert to GRO API
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
> drivers/net/xen-netfront.c |    4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 36808bf..dd1011e 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -952,7 +952,7 @@ static int handle_incoming_queue(struct net_device *dev,
> 		u64_stats_update_end(&stats->syncp);
> 
> 		/* Pass it up. */
> -		netif_receive_skb(skb);
> +		napi_gro_receive(&np->napi, skb);
> 	}
> 
> 	return packets_dropped;
> @@ -1051,6 +1051,8 @@ err:
> 	if (work_done < budget) {
> 		int more_to_do = 0;
> 
> +		napi_gro_flush(napi, false);
> +
> 		local_irq_save(flags);
> 
> 		RING_FINAL_CHECK_FOR_RESPONSES(&np->rx, more_to_do);
> -- 
> 1.7.10.4


I was able to see a bit of improvement (from 2.65 to 3.6 Gbps) with the following patch (your patch plus advertising NETIF_F_GRO):
---------
diff --git a/xen-netfront.c.orig b/xen-netfront.c
index 23e467d..bc673d3 100644
--- a/xen-netfront.c.orig
+++ b/xen-netfront.c
@@ -818,6 +818,7 @@ static int handle_incoming_queue(struct net_device *dev,
 {
 	int packets_dropped = 0;
 	struct sk_buff *skb;
+	struct netfront_info *np = netdev_priv(dev);
 
 	while ((skb = __skb_dequeue(rxq)) != NULL) {
 		struct page *page = NETFRONT_SKB_CB(skb)->page;
@@ -846,7 +847,7 @@ static int handle_incoming_queue(struct net_device *dev,
 		dev->stats.rx_bytes += skb->len;
 
 		/* Pass it up. */
-		netif_receive_skb(skb);
+		napi_gro_receive(&np->napi, skb);
 	}
 
 	return packets_dropped;
@@ -981,6 +982,7 @@ err:
 	if (work_done < budget) {
 		int more_to_do = 0;
 
+		napi_gro_flush(napi);
 		local_irq_save(flags);
 
 		RING_FINAL_CHECK_FOR_RESPONSES(&np->rx, more_to_do);
@@ -1182,7 +1184,8 @@ static struct net_device * __devinit xennet_create_dev(struct xenbus_device *dev
 	netif_napi_add(netdev, &np->napi, xennet_poll, 64);
 
 	/* Assume all features and let xennet_set_features fix up.  */
-	netdev->features        = NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_TSO;
+	netdev->features        = NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_TSO |
+					NETIF_F_GRO;
 
 	SET_ETHTOOL_OPS(netdev, &xennet_ethtool_ops);
 	SET_NETDEV_DEV(netdev, &dev->dev);
-----------
tcpdump showed that the guest interface received large packets. I haven't checked with an upstream kernel as the guest, though.

Anirban


* Re: large packet support in netfront driver and guest network throughput
  2013-09-18  2:28             ` annie li
@ 2013-09-18 21:06               ` Anirban Chakraborty
  0 siblings, 0 replies; 16+ messages in thread
From: Anirban Chakraborty @ 2013-09-18 21:06 UTC (permalink / raw)
  To: annie li; +Cc: Wei Liu, xen-devel


On Sep 17, 2013, at 7:28 PM, annie li <annie.li@oracle.com> wrote:

> 
> On 2013-9-18 1:53, Anirban Chakraborty wrote:
>> 
>> On 9/17/13 1:25 AM, "Wei Liu" <wei.liu2@citrix.com> wrote:
>> 
>>> On Tue, Sep 17, 2013 at 10:09:21AM +0800, annie li wrote:
>>>> <snip>
>>>>>> I tried dom0 to dom0 and I got 9.4 Gbps, which is what I expected
>>>>>> (with GRO turned on in the physical interface). However, when I run
>>>>>> guest to guest, things fall off. Is large packet not supported in
>>>>>> netfront? I thought otherwise. I looked at the code and I do not see
>>>>>> any call to napi_gro_receive(), rather it is using
>>>>>> netif_receive_skb(). netback seems to be sending GSO packets to the
>>>>>> netfront, but it is being segmented to 1500 byte (as it appears from
>>>>>> the tcpdump).
>>>>>> 
>>>>> OK, I get your problem.
>>>>> 
>>>>> Indeed netfront doesn't make use of GRO API at the moment.
>>>> This is true.
>>>> But I am wondering why large packet is not segmented into mtu size
>>>> with upstream kernel? I did see large packets with upsteam kernel on
>>>> receive guest(test between 2 domus on same host).
>>>> 
>>> I think Anirban's setup is different. The traffic is from a DomU on
>>> another host.
>>> 
>>> I will need to setup testing environment with 10G link to test this.
>>> 
>>> Anirban, can you share your setup, especially DomU kernel version, are
>>> you using upstream kernel in DomU?
>> Sure..
>> I have two hosts, say h1 and h2 running XenServer 6.1.
>> h1 running Centos 6.4, 64bit kernel, say guest1 and h2 running identical
>> guest, guest2.
>> 
>> iperf server is running on guest1 with iperf client connecting from guest2.
>> 
>> I haven't tried with upstream kernel yet. However, what I found out is
>> that the netback on the receiving host is transmitting GSO segments to the
>> guest (guest1), but the packets are segmented at the netfront interface.
> 
> Did you try the guests on same host in your environment?

Yes, I did, and it showed 5.66 Gbps as opposed to 2.6-2.8 Gbps.
> 
>> 
>> Annie's setup has both the guests running on the same host, in which case
>> packets are looped back.
> 
> If guests does not segment packets for same host case, it should not do segment for different host case. For current upstream, Netback->netfront mechanism does not treat differently for these two cases.

For the two guests (CentOS 6.4, 2.6.32-358) on the same host, the receiving guest indeed receives large packets, while this is not true if the receiving guest is on a different host. Both vifs are connected to the same Linux bridge, so packets passed between them are forwarded without hitting the wire; this explains the higher throughput. However, I would still expect segmented packets at the receiving guest, but that is certainly not what I am seeing.

Anirban


* Re: large packet support in netfront driver and guest network throughput
  2013-09-18 20:38               ` Anirban Chakraborty
@ 2013-09-19  9:41                 ` Wei Liu
  2013-09-19 16:59                   ` Anirban Chakraborty
  0 siblings, 1 reply; 16+ messages in thread
From: Wei Liu @ 2013-09-19  9:41 UTC (permalink / raw)
  To: Anirban Chakraborty; +Cc: annie li, Wei Liu, xen-devel

On Wed, Sep 18, 2013 at 08:38:01PM +0000, Anirban Chakraborty wrote:
> 
> On Sep 18, 2013, at 8:48 AM, Wei Liu <wei.liu2@citrix.com> wrote:
> 
> > On Tue, Sep 17, 2013 at 05:53:43PM +0000, Anirban Chakraborty wrote:
> >> 
> >> 
> >> On 9/17/13 1:25 AM, "Wei Liu" <wei.liu2@citrix.com> wrote:
> >> 
> >>> On Tue, Sep 17, 2013 at 10:09:21AM +0800, annie li wrote:
> >>>> <snip>
> >>>>>> I tried dom0 to dom0 and I got 9.4 Gbps, which is what I expected
> >>>>>> (with GRO turned on in the physical interface). However, when I run
> >>>>>> guest to guest, things fall off. Is large packet not supported in
> >>>>>> netfront? I thought otherwise. I looked at the code and I do not see
> >>>>>> any call to napi_gro_receive(), rather it is using
> >>>>>> netif_receive_skb(). netback seems to be sending GSO packets to the
> >>>>>> netfront, but it is being segmented to 1500 byte (as it appears from
> >>>>>> the tcpdump).
> >>>>>> 
> >>>>> OK, I get your problem.
> >>>>> 
> >>>>> Indeed netfront doesn't make use of GRO API at the moment.
> >>>> 
> >>>> This is true.
> >>>> But I am wondering why large packet is not segmented into mtu size
> >>>> with upstream kernel? I did see large packets with upsteam kernel on
> >>>> receive guest(test between 2 domus on same host).
> >>>> 
> >>> 
> >>> I think Anirban's setup is different. The traffic is from a DomU on
> >>> another host.
> >>> 
> >>> I will need to setup testing environment with 10G link to test this.
> >>> 
> >>> Anirban, can you share your setup, especially DomU kernel version, are
> >>> you using upstream kernel in DomU?
> >> 
> >> Sure..
> >> I have two hosts, say h1 and h2 running XenServer 6.1.
> >> h1 running Centos 6.4, 64bit kernel, say guest1 and h2 running identical
> >> guest, guest2.
> >> 
> > 
> > Do you have exact version of your DomU' kernel? Is it available
> > somewhere online?
> 
> Yes, it is 2.6.32-358.el6.x86_64. Sorry, I missed it out last time.
> 

So that's a RHEL kernel; you might also want to ask Red Hat to have a
look at it?

> > 
> >> iperf server is running on guest1 with iperf client connecting from guest2.
> >> 
> >> I haven't tried with upstream kernel yet. However, what I found out is
> >> that the netback on the receiving host is transmitting GSO segments to the
> >> guest (guest1), but the packets are segmented at the netfront interface.
> >> 
> > 
> > I just tried, with vanilla upstream kernel I can see large packet size
> > on DomU's side.
> > 
> > I also tried to convert netfront to use GRO API (hopefully I didn't get
> > it wrong), I didn't see much improvement -- it's quite obvious because I
> > already saw large packet even without GRO.
> > 
> > If you fancy trying GRO API, see attached patch. Note that you might
> > need to do some contextual adjustment as this patch is for upstream
> > kernel.
> > 
> > Wei.
> > 
> > ---8<---
> > From ca532dd11d7b8f5f8ce9d2b8043dd974d9587cb0 Mon Sep 17 00:00:00 2001
> > From: Wei Liu <wei.liu2@citrix.com>
> > Date: Wed, 18 Sep 2013 16:46:23 +0100
> > Subject: [PATCH] xen-netfront: convert to GRO API
> > 
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > ---
> > drivers/net/xen-netfront.c |    4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > index 36808bf..dd1011e 100644
> > --- a/drivers/net/xen-netfront.c
> > +++ b/drivers/net/xen-netfront.c
> > @@ -952,7 +952,7 @@ static int handle_incoming_queue(struct net_device *dev,
> > 		u64_stats_update_end(&stats->syncp);
> > 
> > 		/* Pass it up. */
> > -		netif_receive_skb(skb);
> > +		napi_gro_receive(&np->napi, skb);
> > 	}
> > 
> > 	return packets_dropped;
> > @@ -1051,6 +1051,8 @@ err:
> > 	if (work_done < budget) {
> > 		int more_to_do = 0;
> > 
> > +		napi_gro_flush(napi, false);
> > +
> > 		local_irq_save(flags);
> > 
> > 		RING_FINAL_CHECK_FOR_RESPONSES(&np->rx, more_to_do);
> > -- 
> > 1.7.10.4
> 
> 
> I was able to see a bit of improvement (from 2.65 to 3.6 Gbps) with the following patch (your patch plus the advertisement of NETIF_F_GRO) :

OK, thanks for reporting back.

I'm curious about the packet size after enabling GRO. I can get 5Gb/s
upstream with a packet size of ~24K on a 10G NIC. It's not line rate yet;
there is certainly room for improvement.

Wei.

> ---------
> diff --git a/xen-netfront.c.orig b/xen-netfront.c
> index 23e467d..bc673d3 100644
> --- a/xen-netfront.c.orig
> +++ b/xen-netfront.c
> @@ -818,6 +818,7 @@ static int handle_incoming_queue(struct net_device *dev,
>  {
>  	int packets_dropped = 0;
>  	struct sk_buff *skb;
> +	struct netfront_info *np = netdev_priv(dev);
>  
>  	while ((skb = __skb_dequeue(rxq)) != NULL) {
>  		struct page *page = NETFRONT_SKB_CB(skb)->page;
> @@ -846,7 +847,7 @@ static int handle_incoming_queue(struct net_device *dev,
>  		dev->stats.rx_bytes += skb->len;
>  
>  		/* Pass it up. */
> -		netif_receive_skb(skb);
> +		napi_gro_receive(&np->napi, skb);
>  	}
>  
>  	return packets_dropped;
> @@ -981,6 +982,7 @@ err:
>  	if (work_done < budget) {
>  		int more_to_do = 0;
>  
> +		napi_gro_flush(napi);
>  		local_irq_save(flags);
>  
>  		RING_FINAL_CHECK_FOR_RESPONSES(&np->rx, more_to_do);
> @@ -1182,7 +1184,8 @@ static struct net_device * __devinit xennet_create_dev(struct xenbus_device *dev
>  	netif_napi_add(netdev, &np->napi, xennet_poll, 64);
>  
>  	/* Assume all features and let xennet_set_features fix up.  */
> -	netdev->features        = NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_TSO;
> +	netdev->features        = NETIF_F_IP_CSUM | NETIF_F_SG | NETIF_F_TSO |
> +					NETIF_F_GRO;
>  
>  	SET_ETHTOOL_OPS(netdev, &xennet_ethtool_ops);
>  	SET_NETDEV_DEV(netdev, &dev->dev);
> -----------
> tcpdump showed that the guest interface received large packets. I haven't checked upstream kernel as guest though.
> 
> Anirban
> 


* Re: large packet support in netfront driver and guest network throughput
  2013-09-19  9:41                 ` Wei Liu
@ 2013-09-19 16:59                   ` Anirban Chakraborty
  2013-09-19 18:43                     ` Wei Liu
  0 siblings, 1 reply; 16+ messages in thread
From: Anirban Chakraborty @ 2013-09-19 16:59 UTC (permalink / raw)
  To: Wei Liu; +Cc: annie li, xen-devel


On Sep 19, 2013, at 2:41 AM, Wei Liu <wei.liu2@citrix.com> wrote:

> On Wed, Sep 18, 2013 at 08:38:01PM +0000, Anirban Chakraborty wrote:
>> 
>> On Sep 18, 2013, at 8:48 AM, Wei Liu <wei.liu2@citrix.com> wrote:
>> 
>>> On Tue, Sep 17, 2013 at 05:53:43PM +0000, Anirban Chakraborty wrote:
>>>> 
>>>> 
>>>> On 9/17/13 1:25 AM, "Wei Liu" <wei.liu2@citrix.com> wrote:
>>>> <snip>
>> 
>> Yes, it is 2.6.32-358.el6.x86_64. Sorry, I missed it out last time.
>> 
> 
> So that's a RHEL kernel, you might also want to ask Redhat to have a
> look at that?

It is not a RHEL kernel, it is CentOS 6.4. This being a netfront driver issue, I think we should address it here.

> 
>>> <snip>-- 
>>> 1.7.10.4
>> 
>> 
>> I was able to see a bit of improvement (from 2.65 to 3.6 Gbps) with the following patch (your patch plus the advertisement of NETIF_F_GRO) :
> 
> OK, thanks for reporting back.
> 
> I'm curious about the packet size after enabling GRO. I can get 5G/s
> upstream with packet size ~24K on a 10G nic. It's not line rate yet,
> certainly there is space for improvement.

I am seeing varying packet sizes with the GRO patch, from 2K all the way up to 64K.
I do not think we can get line rate by enabling GRO alone. The netback thread that is handling the guest traffic is running on a different CPU (and possibly a different NUMA node) than the guest. If we could schedule netback for a guest on the same node as the guest, we should see better numbers.
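
One way to see which CPU the netback kernel thread currently runs on (assuming a standard procps ps in dom0 and a thread name containing "netback"):

ps -eo pid,psr,comm | grep netback
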
In any case, are you going to submit the patch upstream or should I do it? 
Thanks.

Anirban


* Re: large packet support in netfront driver and guest network throughput
  2013-09-19 16:59                   ` Anirban Chakraborty
@ 2013-09-19 18:43                     ` Wei Liu
  2013-09-19 19:04                       ` Wei Liu
  0 siblings, 1 reply; 16+ messages in thread
From: Wei Liu @ 2013-09-19 18:43 UTC (permalink / raw)
  To: Anirban Chakraborty; +Cc: annie li, Wei Liu, xen-devel

On Thu, Sep 19, 2013 at 04:59:49PM +0000, Anirban Chakraborty wrote:
[...]
> >> 
> >> I was able to see a bit of improvement (from 2.65 to 3.6 Gbps) with the following patch (your patch plus the advertisement of NETIF_F_GRO) :
> > 
> > OK, thanks for reporting back.
> > 
> > I'm curious about the packet size after enabling GRO. I can get 5G/s
> > upstream with packet size ~24K on a 10G nic. It's not line rate yet,
> > certainly there is space for improvement.
> 
> I am seeing varying packet sizes with the GRO patch, from 2K to all the way upto 64K.
> I do not think we can get line rate by enabling GRO only. The netback thread that is handling the guest traffic is running on a different CPU (and possibly different node) compared to the guest. If we can schedule netback for a guest to run on the same node as the guest, we should be able to see better numbers. 

You can use vCPU pinning to pin Dom0's vCPUs and the guest's vCPUs to the
same NUMA node. However, the domain's memory might still be striped across
different nodes.
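
A rough sketch with the plain xl toolstack (note: XenServer 6.1 uses the xe/xapi toolstack, so treat this only as an illustration; the CPU ranges are made up, check the real topology with "xl info -n"):

xl vcpu-pin Domain-0 all 0-3
xl vcpu-pin guest1 all 4-7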

> In any case, are you going to submit the patch upstream or should I do it? 

I will do that once net-next is open.

Wei.

> Thanks.
> 
> Anirban
> 
> 


* Re: large packet support in netfront driver and guest network throughput
  2013-09-19 18:43                     ` Wei Liu
@ 2013-09-19 19:04                       ` Wei Liu
  2013-09-19 20:54                         ` Anirban Chakraborty
  0 siblings, 1 reply; 16+ messages in thread
From: Wei Liu @ 2013-09-19 19:04 UTC (permalink / raw)
  To: Anirban Chakraborty; +Cc: annie li, Wei Liu, xen-devel

On Thu, Sep 19, 2013 at 07:43:15PM +0100, Wei Liu wrote:
[...]
> 
> > In any case, are you going to submit the patch upstream or should I do it? 
> 
> I will do that once net-next is open.
> 

I will add your SoB to that patch, is that OK?

Wei.

> Wei.
> 
> > Thanks.
> > 
> > Anirban
> > 
> > 


* Re: large packet support in netfront driver and guest network throughput
  2013-09-19 19:04                       ` Wei Liu
@ 2013-09-19 20:54                         ` Anirban Chakraborty
  0 siblings, 0 replies; 16+ messages in thread
From: Anirban Chakraborty @ 2013-09-19 20:54 UTC (permalink / raw)
  To: Wei Liu; +Cc: annie li, xen-devel


On Sep 19, 2013, at 12:04 PM, Wei Liu <wei.liu2@citrix.com> wrote:

> On Thu, Sep 19, 2013 at 07:43:15PM +0100, Wei Liu wrote:
> [...]
>> 
>>> In any case, are you going to submit the patch upstream or should I do it? 
>> 
>> I will do that once net-next is open.
>> 
> 
> I will add your SoB to that patch, is that OK?

That's fine. Thanks.
Anirban

