* [PATCH] examples/vhost: fix perf regression
@ 2016-07-19 13:53 Jianfeng Tan
  2016-07-20  1:44 ` Yuanhan Liu
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Jianfeng Tan @ 2016-07-19 13:53 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, zhihong.wang, Jianfeng Tan

We found a significant performance drop introduced by the commit below.
It shows up when the vhost example is started with --mergeable 0 and
the kernel virtio-net driver inside the VM does IP-based forwarding.

The root cause is that the commit below adds support for
VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6. With mergeable
disabled, this triggers the big_packets path of the virtio-net
driver. On this path the driver uses 19 descriptors with 18 4K-sized
pages to receive each packet, so that it can receive a big packet
of up to 64K. But QEMU creates only 256 descriptor entries for each
vq, so only 13 packets can be received at a time. The VM kernel
handles those packets quickly and then goes to sleep (HLT).
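
The 13-packet figure follows from integer division of the ring size by
the descriptors needed per packet. A minimal sketch of that arithmetic,
assuming the numbers quoted above (256 entries, 19 descriptors per
packet); the function names are hypothetical and are not part of DPDK,
QEMU, or the kernel:

```c
#include <assert.h>

/* Whole packets that fit in a receive ring when every packet
 * consumes a fixed number of descriptors. */
unsigned int
packets_per_ring(unsigned int ring_size, unsigned int descs_per_packet)
{
	assert(descs_per_packet > 0);
	return ring_size / descs_per_packet;
}

/* Descriptors left over that cannot hold another whole packet. */
unsigned int
unused_descs(unsigned int ring_size, unsigned int descs_per_packet)
{
	assert(descs_per_packet > 0);
	return ring_size % descs_per_packet;
}
```

With ring_size = 256 and descs_per_packet = 19, packets_per_ring()
gives 13 and unused_descs() gives 9, so the guest can post buffers for
only 13 packets before the ring is full.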

QEMU has no option to set the number of descriptor entries of a vq,
so when TSO is disabled in the vhost example we now also disable
VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6, together with
VIRTIO_NET_F_HOST_TSO4 and VIRTIO_NET_F_HOST_TSO6. This keeps the
VM kernel virtio driver off the big_packets path.

Fixes: 859b480d5afd ("vhost: add guest offload setting")

Reported-by: Qian Xu <qian.q.xu@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 examples/vhost/main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 3b98f42..92a9823 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -327,6 +327,8 @@ port_init(uint8_t port)
 	if (enable_tso == 0) {
 		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO4);
 		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO6);
+		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_GUEST_TSO4);
+		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_GUEST_TSO6);
 	}
 
 	rx_rings = (uint16_t)dev_info.max_rx_queues;
-- 
2.7.4


* Re: [PATCH] examples/vhost: fix perf regression
  2016-07-19 13:53 [PATCH] examples/vhost: fix perf regression Jianfeng Tan
@ 2016-07-20  1:44 ` Yuanhan Liu
  2016-07-20  2:44   ` Tan, Jianfeng
  2016-07-20  3:52 ` Xu, Qian Q
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Yuanhan Liu @ 2016-07-20  1:44 UTC (permalink / raw)
  To: Jianfeng Tan; +Cc: dev, zhihong.wang

On Tue, Jul 19, 2016 at 01:53:11PM +0000, Jianfeng Tan wrote:
> We find significant perfermance drop introduced by below commit,
> when vhost example is started with --mergeable 0 and inside vm,
> kernel virtio-net driver is used to do ip based forwarding.
> 
> The root cause is that below commit adds support for
> VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6, and when
> mergeable is disabled, it triggers big_packets path of virtio-net
> driver. In this path, virtio driver uses 19 desc with 18 4K-sized
> pages to receive each packet, so that it can receive a big packet
> with size of 64K. But QEMU only creates 256 desc entries for each
> vq, which results in that only 13 packets can be received. VM
> kernel can quickly handle those packets and go to sleep (HLT).
> 
> As QEMU has no option to set the desc entries of a vq, so here,
> we disable VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6
> with VIRTIO_NET_F_HOST_TSO4 and VIRTIO_NET_F_HOST_TSO6 when we
> disable tso of vhost example, to avoid VM kernel virtio driver
> go into big_packets path.
> 
> Fixes: 859b480d5afd ("vhost: add guest offload setting")
> 
> Reported-by: Qian Xu <qian.q.xu@intel.com>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>

We could apply this patch, but I don't think it actually fixes anything:

- it doesn't fix other vhost applications, say OVS, which is for sure
  way more widely used than vhost-example.

- it doesn't even fix the case where tso is enabled and mergeable-rx is
  disabled with this vhost-example.

Thanks for the good root-cause analysis, btw!

	--yliu


* Re: [PATCH] examples/vhost: fix perf regression
  2016-07-20  1:44 ` Yuanhan Liu
@ 2016-07-20  2:44   ` Tan, Jianfeng
  2016-07-20  3:16     ` Xu, Qian Q
  2016-07-20  4:00     ` Yuanhan Liu
  0 siblings, 2 replies; 16+ messages in thread
From: Tan, Jianfeng @ 2016-07-20  2:44 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, zhihong.wang

Hi Yuanhan,

On 7/20/2016 9:44 AM, Yuanhan Liu wrote:
> On Tue, Jul 19, 2016 at 01:53:11PM +0000, Jianfeng Tan wrote:
>> We find significant perfermance drop introduced by below commit,
>> when vhost example is started with --mergeable 0 and inside vm,
>> kernel virtio-net driver is used to do ip based forwarding.
>>
>> The root cause is that below commit adds support for
>> VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6, and when
>> mergeable is disabled, it triggers big_packets path of virtio-net
>> driver. In this path, virtio driver uses 19 desc with 18 4K-sized
>> pages to receive each packet, so that it can receive a big packet
>> with size of 64K. But QEMU only creates 256 desc entries for each
>> vq, which results in that only 13 packets can be received. VM
>> kernel can quickly handle those packets and go to sleep (HLT).
>>
>> As QEMU has no option to set the desc entries of a vq, so here,
>> we disable VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6
>> with VIRTIO_NET_F_HOST_TSO4 and VIRTIO_NET_F_HOST_TSO6 when we
>> disable tso of vhost example, to avoid VM kernel virtio driver
>> go into big_packets path.
>>
>> Fixes: 859b480d5afd ("vhost: add guest offload setting")
>>
>> Reported-by: Qian Xu <qian.q.xu@intel.com>
>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> We could apply this patch, but I don't think it actually fix anything:
>
> - it doesn't fix other vhost applications, say OVS, which is for sure
>    way more widly used than vhost-example.

If I remember correctly, OVS enables mergeable.

>
> - it doesn't even fix it when tso is enabled and mergeable-rx is disabled
>    with this vhost-example.

But we'd better keep users from suspecting that the performance drop
in the tso=off,mergeable=off case is caused by that commit, right?

Thanks,
Jianfeng

>
> Thanks for the good root-cause, btw!
>
> 	--yliu


* Re: [PATCH] examples/vhost: fix perf regression
  2016-07-20  2:44   ` Tan, Jianfeng
@ 2016-07-20  3:16     ` Xu, Qian Q
  2016-07-20  4:00     ` Yuanhan Liu
  1 sibling, 0 replies; 16+ messages in thread
From: Xu, Qian Q @ 2016-07-20  3:16 UTC (permalink / raw)
  To: Tan, Jianfeng, Yuanhan Liu; +Cc: dev, Wang, Zhihong

My comments below. 

-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Tan, Jianfeng
Sent: Wednesday, July 20, 2016 10:44 AM
To: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: dev@dpdk.org; Wang, Zhihong <zhihong.wang@intel.com>
Subject: Re: [dpdk-dev] [PATCH] examples/vhost: fix perf regression

Hi Yuanhan,

On 7/20/2016 9:44 AM, Yuanhan Liu wrote:
> On Tue, Jul 19, 2016 at 01:53:11PM +0000, Jianfeng Tan wrote:
>> We find significant perfermance drop introduced by below commit, when 
>> vhost example is started with --mergeable 0 and inside vm, kernel 
>> virtio-net driver is used to do ip based forwarding.
>>
>> The root cause is that below commit adds support for
>> VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6, and when 
>> mergeable is disabled, it triggers big_packets path of virtio-net 
>> driver. In this path, virtio driver uses 19 desc with 18 4K-sized 
>> pages to receive each packet, so that it can receive a big packet 
>> with size of 64K. But QEMU only creates 256 desc entries for each vq, 
>> which results in that only 13 packets can be received. VM kernel can 
>> quickly handle those packets and go to sleep (HLT).
>>
>> As QEMU has no option to set the desc entries of a vq, so here, we 
>> disable VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6 with 
>> VIRTIO_NET_F_HOST_TSO4 and VIRTIO_NET_F_HOST_TSO6 when we disable tso 
>> of vhost example, to avoid VM kernel virtio driver go into 
>> big_packets path.
>>
>> Fixes: 859b480d5afd ("vhost: add guest offload setting")
>>
>> Reported-by: Qian Xu <qian.q.xu@intel.com>
>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> We could apply this patch, but I don't think it actually fix anything:
>
> - it doesn't fix other vhost applications, say OVS, which is for sure
>    way more widly used than vhost-example.

If I remember it correctly, OVS will enable mergeable.

>
> - it doesn't even fix it when tso is enabled and mergeable-rx is disabled
>    with this vhost-example.

But we'd better avoid users go into such doubt that performance drops because of that commit under the case tso=off,mergeable=off, right?

[Qian] Normally, people who enable TSO should also turn on mergeable.
If they don't turn on mergeable, they should not expect high
performance, so this is not a problem; any low performance there comes
from improper settings.

As for a complete fix, we may need to go back to the TSO feature design
for vhost. Currently the feature negotiation code is in the
application, but it would be better handled in the vhost/virtio library
so that the application doesn't need to check or set the features. It
is too late for a complete fix now, so from my view the workaround is
OK for this release.

Thanks,
Jianfeng

>
> Thanks for the good root-cause, btw!
>
> 	--yliu


* Re: [PATCH] examples/vhost: fix perf regression
  2016-07-19 13:53 [PATCH] examples/vhost: fix perf regression Jianfeng Tan
  2016-07-20  1:44 ` Yuanhan Liu
@ 2016-07-20  3:52 ` Xu, Qian Q
  2016-07-20  4:38 ` Yuanhan Liu
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Xu, Qian Q @ 2016-07-20  3:52 UTC (permalink / raw)
  To: Tan, Jianfeng, dev; +Cc: yuanhan.liu, Wang, Zhihong, Tan, Jianfeng

Tested-by: Qian Xu <qian.q.xu@intel.com>

- Test Commit: 8f6f24342281f59de0df7bd976a32f714d39b9a9
- OS/Kernel: Fedora 21/4.1.13
- GCC: gcc (GCC) 4.9.2 20141101 (Red Hat 4.9.2-1)
- CPU: Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10
- NIC: Intel(R) Ethernet Controller X710 for 10GbE SFP+
- Total 2 cases, 2 passed, 0 failed. 

Test Case1: Virtio-net IPV4 fwd performance with mergeable=off

Summary: 
Launch the vhost-switch sample and a VM with 2 virtio-net devices, and let the 2 virtio-net devices run IPV4 forwarding. Send traffic to the NIC port so that it goes through both virtio-net devices, then check the performance.

Details: 
1. Bind one port to igb_uio. 
2. Run the vhost-switch sample with mergeable disabled (--mergeable 0). 
taskset -c 18-19 ./examples/vhost/build/vhost-switch -c 0xc0000 -n 4 --huge-dir /mnt/huge --socket-mem 1024,1024 -- -p 0x1 --mergeable 0 --vm2vm 0
3. Launch VM: 
taskset -c 22-23 \
/root/qemu-versions/qemu-2.6.0/x86_64-softmmu/qemu-system-x86_64 -name vm1 \
-cpu host -enable-kvm -m 2048 -object memory-backend-file,id=mem,size=2048M,mem-path=/mnt/huge,share=on -numa node,memdev=mem -mem-prealloc \
-smp cores=4 -drive file=/home/img/vm1.img  \
-chardev socket,id=char0,path=./vhost-net \
-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
-device virtio-net-pci,mac=52:54:00:00:00:01,netdev=mynet1,mrg_rxbuf=on \
-chardev socket,id=char1,path=./vhost-net \
-netdev type=vhost-user,id=mynet2,chardev=char1,vhostforce \
-device virtio-net-pci,mac=52:54:00:00:00:02,netdev=mynet2,mrg_rxbuf=on \
-netdev tap,id=ipvm1,ifname=tap3,script=/etc/qemu-ifup -device rtl8139,netdev=ipvm1,id=net0,mac=00:00:00:00:10:01 \
-vnc :3 -daemonize
4. Set IPV4 fwd rules in VM: 
virtio1=$1
virtio2=$2
systemctl stop firewalld.service
systemctl disable firewalld.service
systemctl stop ip6tables.service
systemctl disable ip6tables.service
systemctl stop iptables.service
systemctl disable iptables.service
systemctl stop NetworkManager.service
echo 1 >/proc/sys/net/ipv4/ip_forward
ip addr add 192.168.1.2/24 dev $virtio1
ip neigh add 192.168.1.1 lladdr 00:00:10:00:24:00 dev $virtio1
ip link set dev $virtio1 up

ip addr add 192.168.2.2/24 dev $virtio2
ip neigh add 192.168.2.1 lladdr 00:00:10:00:24:01 dev $virtio2
ip link set dev $virtio2 up

5. Send traffic to the NIC and measure the throughput coming back from virtio2. With the patch, the performance is restored. 

Test Case2: Virtio-net IPV4 fwd performance with mergeable=on
Same steps with one difference: set --mergeable 1 in the vhost-switch sample. The performance is as good as before. 



Thanks
Qian

-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianfeng Tan
Sent: Tuesday, July 19, 2016 9:53 PM
To: dev@dpdk.org
Cc: yuanhan.liu@linux.intel.com; Wang, Zhihong <zhihong.wang@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>
Subject: [dpdk-dev] [PATCH] examples/vhost: fix perf regression

We find significant perfermance drop introduced by below commit, when vhost example is started with --mergeable 0 and inside vm, kernel virtio-net driver is used to do ip based forwarding.

The root cause is that below commit adds support for
VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6, and when mergeable is disabled, it triggers big_packets path of virtio-net driver. In this path, virtio driver uses 19 desc with 18 4K-sized pages to receive each packet, so that it can receive a big packet with size of 64K. But QEMU only creates 256 desc entries for each vq, which results in that only 13 packets can be received. VM kernel can quickly handle those packets and go to sleep (HLT).

As QEMU has no option to set the desc entries of a vq, so here, we disable VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6 with VIRTIO_NET_F_HOST_TSO4 and VIRTIO_NET_F_HOST_TSO6 when we disable tso of vhost example, to avoid VM kernel virtio driver go into big_packets path.

Fixes: 859b480d5afd ("vhost: add guest offload setting")

Reported-by: Qian Xu <qian.q.xu@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 examples/vhost/main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c index 3b98f42..92a9823 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -327,6 +327,8 @@ port_init(uint8_t port)
 	if (enable_tso == 0) {
 		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO4);
 		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO6);
+		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_GUEST_TSO4);
+		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_GUEST_TSO6);
 	}
 
 	rx_rings = (uint16_t)dev_info.max_rx_queues;
--
2.7.4


* Re: [PATCH] examples/vhost: fix perf regression
  2016-07-20  2:44   ` Tan, Jianfeng
  2016-07-20  3:16     ` Xu, Qian Q
@ 2016-07-20  4:00     ` Yuanhan Liu
  1 sibling, 0 replies; 16+ messages in thread
From: Yuanhan Liu @ 2016-07-20  4:00 UTC (permalink / raw)
  To: Tan, Jianfeng; +Cc: dev, zhihong.wang

On Wed, Jul 20, 2016 at 10:44:13AM +0800, Tan, Jianfeng wrote:
> Hi Yuanhan,
> 
> On 7/20/2016 9:44 AM, Yuanhan Liu wrote:
> >On Tue, Jul 19, 2016 at 01:53:11PM +0000, Jianfeng Tan wrote:
> >>We find significant perfermance drop introduced by below commit,
> >>when vhost example is started with --mergeable 0 and inside vm,
> >>kernel virtio-net driver is used to do ip based forwarding.
> >>
> >>The root cause is that below commit adds support for
> >>VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6, and when
> >>mergeable is disabled, it triggers big_packets path of virtio-net
> >>driver. In this path, virtio driver uses 19 desc with 18 4K-sized
> >>pages to receive each packet, so that it can receive a big packet
> >>with size of 64K. But QEMU only creates 256 desc entries for each
> >>vq, which results in that only 13 packets can be received. VM
> >>kernel can quickly handle those packets and go to sleep (HLT).
> >>
> >>As QEMU has no option to set the desc entries of a vq, so here,
> >>we disable VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6
> >>with VIRTIO_NET_F_HOST_TSO4 and VIRTIO_NET_F_HOST_TSO6 when we
> >>disable tso of vhost example, to avoid VM kernel virtio driver
> >>go into big_packets path.
> >>
> >>Fixes: 859b480d5afd ("vhost: add guest offload setting")
> >>
> >>Reported-by: Qian Xu <qian.q.xu@intel.com>
> >>Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> >We could apply this patch, but I don't think it actually fix anything:
> >
> >- it doesn't fix other vhost applications, say OVS, which is for sure
> >   way more widly used than vhost-example.
> 
> If I remember it correctly, OVS will enable mergeable.

Yes, and actually, vhost-example should also have enabled it by default.
Meanwhile, all features can be enabled/disabled by the user.

> >
> >- it doesn't even fix it when tso is enabled and mergeable-rx is disabled
> >   with this vhost-example.
> 
> But we'd better avoid users go into such doubt that performance drops
> because of that commit under the case tso=off,mergeable=off, right?

I doubt anyone besides developers like us actually uses vhost-example,
meaning users can NOT see the benefit of this patch; it also means
that users __do__ still go into doubt about the performance drop in
the tso=off,mergeable=off case.

Actually, it looks wrong to me to fiddle with those flags in the
vhost-example. If you want to disable tso, you should disable it on
the qemu side, with something like:

    csum=off,gso=off,guest_tso4=off,guest_tso6=off,...

	--yliu


* Re: [PATCH] examples/vhost: fix perf regression
  2016-07-19 13:53 [PATCH] examples/vhost: fix perf regression Jianfeng Tan
  2016-07-20  1:44 ` Yuanhan Liu
  2016-07-20  3:52 ` Xu, Qian Q
@ 2016-07-20  4:38 ` Yuanhan Liu
  2016-07-20  5:50   ` Tan, Jianfeng
  2016-07-21  0:23 ` [PATCH v2] " Jianfeng Tan
  2016-07-21  0:42 ` [PATCH v3] " Jianfeng Tan
  4 siblings, 1 reply; 16+ messages in thread
From: Yuanhan Liu @ 2016-07-20  4:38 UTC (permalink / raw)
  To: Jianfeng Tan; +Cc: dev, zhihong.wang, Xu, Qian Q

On Tue, Jul 19, 2016 at 01:53:11PM +0000, Jianfeng Tan wrote:
> We find significant perfermance drop introduced by below commit,
> when vhost example is started with --mergeable 0 and inside vm,
> kernel virtio-net driver is used to do ip based forwarding.
> 
> The root cause is that below commit adds support for
> VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6, and when
> mergeable is disabled, it triggers big_packets path of virtio-net
> driver. In this path, virtio driver uses 19 desc with 18 4K-sized
> pages to receive each packet, so that it can receive a big packet
> with size of 64K. But QEMU only creates 256 desc entries for each
> vq, which results in that only 13 packets can be received. VM
> kernel can quickly handle those packets and go to sleep (HLT).
> 
> As QEMU has no option to set the desc entries of a vq, so here,
> we disable VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6
> with VIRTIO_NET_F_HOST_TSO4 and VIRTIO_NET_F_HOST_TSO6 when we
> disable tso of vhost example, to avoid VM kernel virtio driver
> go into big_packets path.
> 
> Fixes: 859b480d5afd ("vhost: add guest offload setting")

And here you are patching the vhost example to try to fix an "issue"
in the vhost lib; this is __logically__ wrong.

	--yliu
> 
> Reported-by: Qian Xu <qian.q.xu@intel.com>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
>  examples/vhost/main.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> index 3b98f42..92a9823 100644
> --- a/examples/vhost/main.c
> +++ b/examples/vhost/main.c
> @@ -327,6 +327,8 @@ port_init(uint8_t port)
>  	if (enable_tso == 0) {
>  		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO4);
>  		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO6);
> +		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_GUEST_TSO4);
> +		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_GUEST_TSO6);
>  	}
>  
>  	rx_rings = (uint16_t)dev_info.max_rx_queues;
> -- 
> 2.7.4


* Re: [PATCH] examples/vhost: fix perf regression
  2016-07-20  4:38 ` Yuanhan Liu
@ 2016-07-20  5:50   ` Tan, Jianfeng
  2016-07-20  6:13     ` Yuanhan Liu
  0 siblings, 1 reply; 16+ messages in thread
From: Tan, Jianfeng @ 2016-07-20  5:50 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, zhihong.wang, Xu, Qian Q



On 7/20/2016 12:38 PM, Yuanhan Liu wrote:
> On Tue, Jul 19, 2016 at 01:53:11PM +0000, Jianfeng Tan wrote:
>> We find significant perfermance drop introduced by below commit,
>> when vhost example is started with --mergeable 0 and inside vm,
>> kernel virtio-net driver is used to do ip based forwarding.
>>
>> The root cause is that below commit adds support for
>> VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6, and when
>> mergeable is disabled, it triggers big_packets path of virtio-net
>> driver. In this path, virtio driver uses 19 desc with 18 4K-sized
>> pages to receive each packet, so that it can receive a big packet
>> with size of 64K. But QEMU only creates 256 desc entries for each
>> vq, which results in that only 13 packets can be received. VM
>> kernel can quickly handle those packets and go to sleep (HLT).
>>
>> As QEMU has no option to set the desc entries of a vq, so here,
>> we disable VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6
>> with VIRTIO_NET_F_HOST_TSO4 and VIRTIO_NET_F_HOST_TSO6 when we
>> disable tso of vhost example, to avoid VM kernel virtio driver
>> go into big_packets path.
>>
>> Fixes: 859b480d5afd ("vhost: add guest offload setting")
> And here you are patching vhost example to try to fix an "issue"
> in vhost lib, this is __logically__ wrong.
>
> 	--yliu

This is not an issue from the vhost lib's perspective: the vhost lib
should provide all the features it supports by default, and
applications can enable/disable features according to their own
requirements. After this commit, the vhost example simply triggers a
slow path of the virtio driver. So this fix just makes sure the vhost
example does not go into the slow path by default.

By the way, should a fix patch only refer to the commits whose code it changes?

Thanks,
Jianfeng


* Re: [PATCH] examples/vhost: fix perf regression
  2016-07-20  5:50   ` Tan, Jianfeng
@ 2016-07-20  6:13     ` Yuanhan Liu
  2016-07-20  6:30       ` Tan, Jianfeng
  0 siblings, 1 reply; 16+ messages in thread
From: Yuanhan Liu @ 2016-07-20  6:13 UTC (permalink / raw)
  To: Tan, Jianfeng; +Cc: dev, zhihong.wang, Xu, Qian Q

On Wed, Jul 20, 2016 at 01:50:34PM +0800, Tan, Jianfeng wrote:
> 
> 
> On 7/20/2016 12:38 PM, Yuanhan Liu wrote:
> >On Tue, Jul 19, 2016 at 01:53:11PM +0000, Jianfeng Tan wrote:
> >>We find significant perfermance drop introduced by below commit,
> >>when vhost example is started with --mergeable 0 and inside vm,
> >>kernel virtio-net driver is used to do ip based forwarding.
> >>
> >>The root cause is that below commit adds support for
> >>VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6, and when
> >>mergeable is disabled, it triggers big_packets path of virtio-net
> >>driver. In this path, virtio driver uses 19 desc with 18 4K-sized
> >>pages to receive each packet, so that it can receive a big packet
> >>with size of 64K. But QEMU only creates 256 desc entries for each
> >>vq, which results in that only 13 packets can be received. VM
> >>kernel can quickly handle those packets and go to sleep (HLT).
> >>
> >>As QEMU has no option to set the desc entries of a vq, so here,
> >>we disable VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6
> >>with VIRTIO_NET_F_HOST_TSO4 and VIRTIO_NET_F_HOST_TSO6 when we
> >>disable tso of vhost example, to avoid VM kernel virtio driver
> >>go into big_packets path.
> >>
> >>Fixes: 859b480d5afd ("vhost: add guest offload setting")
> >And here you are patching vhost example to try to fix an "issue"
> >in vhost lib, this is __logically__ wrong.
> >
> >	--yliu
> 
> This is not an issue from vhost lib's perspective, vhost lib should provide
> all features it supports by default.

Bingo.., that's why "Fixes: 859b480d5afd ... " is wrong to me.
  
> Applications can enable/disable
> features according to their own requirements.

Yes, an application can, but applications normally don't do that. And
as stated in my earlier reply, qemu is the place to go for enabling or
disabling all those options, not vhost (and not vhost-example).

I think it's sometimes handier to be able to do that through
vhost-example options, and I guess that's why those options are
provided.

In other words, there is nothing wrong with commit 859b480d5afd.
If you want to "fix" anything here, the following commit is the one
that needs fixing:

    Fixes: 9fd72e3cbd29 ("examples/vhost: add virtio offload")

Because that commit only partially disables the TSO related features,
letting the virtio net driver go to the slow path.
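
To illustrate "partially disables", here is a simplified model of how
the guest's receive path could depend on the negotiated features, as
described in this thread. It is a sketch, not the kernel's or DPDK's
code: the bit numbers are the ones from the virtio specification, while
FEATURE_BIT, the enum, and all function names are hypothetical, and the
model only considers the features discussed here.

```c
#include <stdint.h>

/* Feature bit numbers as defined by the virtio specification. */
#define VIRTIO_NET_F_GUEST_TSO4  7
#define VIRTIO_NET_F_GUEST_TSO6  8
#define VIRTIO_NET_F_HOST_TSO4  11
#define VIRTIO_NET_F_HOST_TSO6  12
#define VIRTIO_NET_F_MRG_RXBUF  15

#define FEATURE_BIT(n) (1ULL << (n))	/* hypothetical helper */

enum rx_path { RX_MERGEABLE, RX_BIG_PACKETS, RX_SMALL };

/* Simplified model: mergeable buffers win; otherwise any guest TSO
 * feature selects the big_packets path. */
enum rx_path
guest_rx_path(uint64_t features)
{
	if (features & FEATURE_BIT(VIRTIO_NET_F_MRG_RXBUF))
		return RX_MERGEABLE;
	if (features & (FEATURE_BIT(VIRTIO_NET_F_GUEST_TSO4) |
			FEATURE_BIT(VIRTIO_NET_F_GUEST_TSO6)))
		return RX_BIG_PACKETS;
	return RX_SMALL;
}

/* The four TSO features offered, with mergeable left off. */
uint64_t
tso_features(void)
{
	return FEATURE_BIT(VIRTIO_NET_F_GUEST_TSO4) |
	       FEATURE_BIT(VIRTIO_NET_F_GUEST_TSO6) |
	       FEATURE_BIT(VIRTIO_NET_F_HOST_TSO4) |
	       FEATURE_BIT(VIRTIO_NET_F_HOST_TSO6);
}

/* Partial disable: only the host-side TSO bits are cleared. */
uint64_t
disable_host_tso_only(uint64_t features)
{
	return features & ~(FEATURE_BIT(VIRTIO_NET_F_HOST_TSO4) |
			    FEATURE_BIT(VIRTIO_NET_F_HOST_TSO6));
}

/* Full disable: host and guest TSO bits are cleared together. */
uint64_t
disable_all_tso(uint64_t features)
{
	return features & ~tso_features();
}
```

In this model, disable_host_tso_only(tso_features()) still leaves the
guest on RX_BIG_PACKETS, while disable_all_tso(tso_features()) yields
RX_SMALL.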

> And the vhost example after
> this commit just triggers a slow path of virtio driver. So this fix just
> makes sure vhost example does not go into the slow path by default.

I stated the first time that I do not object to having this patch
at all.

Meanwhile, the right "fix" is to disable all TSO related features
from QEMU. That way we should see no such issue with any vhost
application, not only this one, the one we mostly use internally.

As you can see, it's more about the usage.

> By the way, if a fix patch should only involve those commits it will change?

IMO, logically, yes.

	--yliu


* Re: [PATCH] examples/vhost: fix perf regression
  2016-07-20  6:13     ` Yuanhan Liu
@ 2016-07-20  6:30       ` Tan, Jianfeng
  0 siblings, 0 replies; 16+ messages in thread
From: Tan, Jianfeng @ 2016-07-20  6:30 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: dev, zhihong.wang, Xu, Qian Q



On 7/20/2016 2:13 PM, Yuanhan Liu wrote:
> On Wed, Jul 20, 2016 at 01:50:34PM +0800, Tan, Jianfeng wrote:
>>
>> On 7/20/2016 12:38 PM, Yuanhan Liu wrote:
>>> On Tue, Jul 19, 2016 at 01:53:11PM +0000, Jianfeng Tan wrote:
>>>> We find significant perfermance drop introduced by below commit,
>>>> when vhost example is started with --mergeable 0 and inside vm,
>>>> kernel virtio-net driver is used to do ip based forwarding.
>>>>
>>>> The root cause is that below commit adds support for
>>>> VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6, and when
>>>> mergeable is disabled, it triggers big_packets path of virtio-net
>>>> driver. In this path, virtio driver uses 19 desc with 18 4K-sized
>>>> pages to receive each packet, so that it can receive a big packet
>>>> with size of 64K. But QEMU only creates 256 desc entries for each
>>>> vq, which results in that only 13 packets can be received. VM
>>>> kernel can quickly handle those packets and go to sleep (HLT).
>>>>
>>>> As QEMU has no option to set the desc entries of a vq, so here,
>>>> we disable VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6
>>>> with VIRTIO_NET_F_HOST_TSO4 and VIRTIO_NET_F_HOST_TSO6 when we
>>>> disable tso of vhost example, to avoid VM kernel virtio driver
>>>> go into big_packets path.
>>>>
>>>> Fixes: 859b480d5afd ("vhost: add guest offload setting")
>>> And here you are patching vhost example to try to fix an "issue"
>>> in vhost lib, this is __logically__ wrong.
>>>
>>> 	--yliu
>> This is not an issue from vhost lib's perspective, vhost lib should provide
>> all features it supports by default.
> Bingo.., that's why "Fixes: 859b480d5afd ... " is wrong to me.
>    
>> Applications can enable/disable
>> features according to their own requirements.
> Yes, application can, but application normally doesn't do that. And
> as stated in my early reply, the qemu is the place you should go for
> all those options enabling/disabling, but not vhost (not vhost-example).
>
> I think it's sometimes more handy if we can do that by introducing
> some vhost-example options, and I guess that's why those options are
> given.
>
> In another word, there is nothing wrong about the commit 859b480d5afd,
> if you want to "fix" anything here, following commit is something
> we need fix:
>
>      Fixes: 9fd72e3cbd29 ("examples/vhost: add virtio offload")
>
> Because that commit just partially disables some TSO related features,
> letting the virtio net driver goes to the slow path.

Great, I see. And thanks for the detailed clarification. I'll send v2.

>
>> And the vhost example after
>> this commit just triggers a slow path of virtio driver. So this fix just
>> makes sure vhost example does not go into the slow path by default.
> I have made a statement in the first time, that I am not object to
> have this patch at all.
>
> Meanwhile, the right "fix" is you need disable all TSO related features
> from QEMU, in such way, we should see no such issue from all vhost
> application, but not only this one, the one we used mostly internally.
>
> As you can see, it's more about the usage.

Yes, I agree this is the BKM we should adopt and recommend to users.

Thanks,
Jianfeng

>
>> By the way, if a fix patch should only involve those commits it will change?
> IMO, logically, yes.
>
> 	--yliu


* [PATCH v2] examples/vhost: fix perf regression
  2016-07-19 13:53 [PATCH] examples/vhost: fix perf regression Jianfeng Tan
                   ` (2 preceding siblings ...)
  2016-07-20  4:38 ` Yuanhan Liu
@ 2016-07-21  0:23 ` Jianfeng Tan
  2016-07-21  0:42   ` Tan, Jianfeng
  2016-07-21  0:42 ` [PATCH v3] " Jianfeng Tan
  4 siblings, 1 reply; 16+ messages in thread
From: Jianfeng Tan @ 2016-07-21  0:23 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, zhihong.wang, qian.q.xu, Jianfeng Tan

We found a significant performance drop introduced by the commit below.
It shows up when the vhost example is started with --mergeable 0 and
the kernel virtio-net driver inside the VM does IP-based forwarding.

The root cause is that the commit below adds support for
VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6. With mergeable
disabled, this triggers the big_packets path of the virtio-net
driver. On this path the driver uses 19 descriptors with 18 4K-sized
pages to receive each packet, so that it can receive a big packet
of up to 64K. But QEMU creates only 256 descriptor entries for each
vq, so only 13 packets can be received at a time. The VM kernel
handles those packets quickly and then goes to sleep (HLT).

QEMU has no option to set the number of descriptor entries of a vq,
so when TSO is disabled in the vhost example we now also disable
VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6, together with
VIRTIO_NET_F_HOST_TSO4 and VIRTIO_NET_F_HOST_TSO6. This keeps the
VM kernel virtio driver off the big_packets path.

Fixes: 9fd72e3cbd29 ("examples/vhost: add virtio offload")

Reported-by: Qian Xu <qian.q.xu@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
v2: change the Fixes line to point to the proper commit.
 examples/vhost/main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 3b98f42..92a9823 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -327,6 +327,8 @@ port_init(uint8_t port)
 	if (enable_tso == 0) {
 		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO4);
 		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO6);
+		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_GUEST_TSO4);
+		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_GUEST_TSO6);
 	}
 
 	rx_rings = (uint16_t)dev_info.max_rx_queues;
-- 
2.7.4


* Re: [PATCH v2] examples/vhost: fix perf regression
  2016-07-21  0:23 ` [PATCH v2] " Jianfeng Tan
@ 2016-07-21  0:42   ` Tan, Jianfeng
  0 siblings, 0 replies; 16+ messages in thread
From: Tan, Jianfeng @ 2016-07-21  0:42 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, Wang, Zhihong, Xu, Qian Q

Self-NACK this patch because the commit log needs further changes.

> -----Original Message-----
> From: Tan, Jianfeng
> Sent: Thursday, July 21, 2016 8:24 AM
> To: dev@dpdk.org
> Cc: yuanhan.liu@linux.intel.com; Wang, Zhihong; Xu, Qian Q; Tan, Jianfeng
> Subject: [PATCH v2] examples/vhost: fix perf regression
> 
> We find significant performance drop introduced by below commit,
> when vhost example is started with --mergeable 0 and inside vm,
> kernel virtio-net driver is used to do ip based forwarding.
> 
> The root cause is that below commit adds support for
> VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6, and when
> mergeable is disabled, it triggers big_packets path of virtio-net
> driver. In this path, virtio driver uses 19 desc with 18 4K-sized
> pages to receive each packet, so that it can receive a big packet
> with size of 64K. But QEMU only creates 256 desc entries for each
> vq, which results in that only 13 packets can be received. VM
> kernel can quickly handle those packets and go to sleep (HLT).
> 
> As QEMU has no option to set the desc entries of a vq, so here,
> we disable VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6
> with VIRTIO_NET_F_HOST_TSO4 and VIRTIO_NET_F_HOST_TSO6 when we
> disable tso of vhost example, to avoid VM kernel virtio driver
> go into big_packets path.
> 
> Fixes: 9fd72e3cbd29 ("examples/vhost: add virtio offload")
> 
> Reported-by: Qian Xu <qian.q.xu@intel.com>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
> v2: change the Fixes line to point to proper commit to fix.
>  examples/vhost/main.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> index 3b98f42..92a9823 100644
> --- a/examples/vhost/main.c
> +++ b/examples/vhost/main.c
> @@ -327,6 +327,8 @@ port_init(uint8_t port)
>  	if (enable_tso == 0) {
>  		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO4);
>  		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO6);
> +		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_GUEST_TSO4);
> +		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_GUEST_TSO6);
>  	}
> 
>  	rx_rings = (uint16_t)dev_info.max_rx_queues;
> --
> 2.7.4

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3] examples/vhost: fix perf regression
  2016-07-19 13:53 [PATCH] examples/vhost: fix perf regression Jianfeng Tan
                   ` (3 preceding siblings ...)
  2016-07-21  0:23 ` [PATCH v2] " Jianfeng Tan
@ 2016-07-21  0:42 ` Jianfeng Tan
  2016-07-21  1:34   ` Yuanhan Liu
  4 siblings, 1 reply; 16+ messages in thread
From: Jianfeng Tan @ 2016-07-21  0:42 UTC (permalink / raw)
  To: dev; +Cc: yuanhan.liu, zhihong.wang, qian.q.xu, Jianfeng Tan

We found a significant performance drop introduced by the commit below
when the vhost example is started with --mergeable 0 and, inside the VM,
the kernel virtio-net driver is used for IP-based forwarding.

The commit, 859b480d5afd ("vhost: add guest offload setting"), adds
support for VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6
in the vhost lib. But inside the vhost example, disabling TSO only
covers the direction from virtio to vhost, not the opposite
direction. When mergeable is disabled, this triggers the big_packets
path of the virtio-net driver, which prepares to receive possible big
packets of up to 64K. Because mergeable is off, for each entry of the
avail ring the virtio driver uses 19 desc chained together, with one
desc pointing to the header and the other 18 desc pointing to 4K-sized
pages. But QEMU only creates 256 desc entries for each vq, so only 13
packets (256 / 19, rounded down) can be received at a time. The VM
kernel quickly handles those packets and goes to sleep (HLT).

As QEMU has no option to set the number of desc entries of a vq, we
disable VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6
together with VIRTIO_NET_F_HOST_TSO4 and VIRTIO_NET_F_HOST_TSO6 when
TSO is disabled in the vhost example, to keep the VM kernel virtio
driver from going into the big_packets path.

Fixes: 9fd72e3cbd29 ("examples/vhost: add virtio offload")

Reported-by: Qian Xu <qian.q.xu@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
v3: reword commit log.
v2: change the Fixes line to point to the proper commit.
 examples/vhost/main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 3b98f42..92a9823 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -327,6 +327,8 @@ port_init(uint8_t port)
 	if (enable_tso == 0) {
 		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO4);
 		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO6);
+		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_GUEST_TSO4);
+		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_GUEST_TSO6);
 	}
 
 	rx_rings = (uint16_t)dev_info.max_rx_queues;
-- 
2.7.4
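
The descriptor arithmetic in the commit log can be checked with a small
sketch. It is only an illustration and not part of the patch: the macro
and function names are hypothetical, and the numbers (256 ring entries,
1 header desc plus 18 page desc, 4K pages) are taken from the commit log
above.

```c
#include <assert.h>
#include <stdint.h>

/* Numbers quoted in the commit log; the names are illustrative only. */
#define VQ_DESC_ENTRIES 256u   /* desc entries QEMU creates per vq */
#define HDR_DESC        1u     /* one desc pointing to the header */
#define PAGE_DESC       18u    /* desc pointing to 4K-sized pages */
#define RX_PAGE_BYTES   4096u  /* size of each receive page */

/* Length of the desc chain posted for one receive buffer. */
static uint32_t descs_per_packet(void)
{
	return HDR_DESC + PAGE_DESC;
}

/* Payload bytes one chain can hold (the header desc is excluded). */
static uint32_t payload_bytes_per_packet(void)
{
	return PAGE_DESC * RX_PAGE_BYTES;
}

/* Number of complete chains that fit in a ring of ring_size entries. */
static uint32_t max_posted_buffers(uint32_t ring_size, uint32_t chain_len)
{
	assert(chain_len > 0);
	return ring_size / chain_len;
}
```

With these values, descs_per_packet() is 19 and
max_posted_buffers(VQ_DESC_ENTRIES, 19) is 13, while the 18 pages give
72K of payload space, enough for a 64K packet.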

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] examples/vhost: fix perf regression
  2016-07-21  0:42 ` [PATCH v3] " Jianfeng Tan
@ 2016-07-21  1:34   ` Yuanhan Liu
  2016-07-21  1:38     ` Xu, Qian Q
  2016-07-22  9:59     ` Thomas Monjalon
  0 siblings, 2 replies; 16+ messages in thread
From: Yuanhan Liu @ 2016-07-21  1:34 UTC (permalink / raw)
  To: Jianfeng Tan; +Cc: dev, zhihong.wang, qian.q.xu

On Thu, Jul 21, 2016 at 12:42:45AM +0000, Jianfeng Tan wrote:
> We find significant performance drop introduced by below commit,
> when vhost example is started with --mergeable 0 and inside vm,
> kernel virtio-net driver is used to do ip based forwarding.
> 
> The commit, 859b480d5afd ("vhost: add guest offload setting"), adds
> support for VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6,
> in vhost lib. But inside vhost example, the way to disable tso only
> excludes the direction from virtio to vhost, but not the opposite
> direction. When mergeable is disabled, it triggers big_packets path
> of virtio-net driver to prepare to receive possible big packets with
> size of 64K. Because mergeable is off, for each entry of avail ring,
> virtio driver uses 19 desc chained together, with one desc pointing
> to header, other 18 desc pointing to 4K-sized pages. But QEMU only
> creates 256 desc entries for each vq, which results in that only 13
> packets can be received. VM kernel can quickly handle those packets
> and go to sleep (HLT).
> 
> As QEMU has no option to set the desc entries of a vq, so here,
> we disable VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6
> with VIRTIO_NET_F_HOST_TSO4 and VIRTIO_NET_F_HOST_TSO6 when we
> disable tso of vhost example, to avoid VM kernel virtio driver
> go into big_packets path.
> 
> Fixes: 9fd72e3cbd29 ("examples/vhost: add virtio offload")
> 
> Reported-by: Qian Xu <qian.q.xu@intel.com>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
> v3: reword commit log.

Yes, much better. One minor nit: you forgot to carry the Tested-by from
Qian.

Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

Thanks.

	--yliu

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] examples/vhost: fix perf regression
  2016-07-21  1:34   ` Yuanhan Liu
@ 2016-07-21  1:38     ` Xu, Qian Q
  2016-07-22  9:59     ` Thomas Monjalon
  1 sibling, 0 replies; 16+ messages in thread
From: Xu, Qian Q @ 2016-07-21  1:38 UTC (permalink / raw)
  To: Yuanhan Liu, Tan, Jianfeng; +Cc: dev, Wang, Zhihong

Add the tested-by:)

Tested-by: Qian Xu <qian.q.xu@intel.com>

- Test Commit: 608487f3fc96704271c624d0f3fe9d7fb2187aea
- OS/Kernel: Fedora 21/4.1.13
- GCC: gcc (GCC) 4.9.2 20141101 (Red Hat 4.9.2-1)
- CPU: Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10
- NIC: Intel(R) Ethernet Controller X710 for 10GbE SFP+
- Total 2 cases, 2 passed, 0 failed. 

Test Case1: Virtio-net IPV4 fwd performance with mergeable=off

Summary: 
Launch the vhost-switch sample, and launch VM with 2 virtio-net devices, let 2 virtio-net run IPV4 fwd, send traffic to the NIC port and let the traffic go through 2 virtio-net devices. Check the performance.

Details: 
1. Bind one port to igb_uio. 
2. Run vhost switch sample with mergeable=0, disable mergeable.
taskset -c 18-19 ./examples/vhost/build/vhost-switch -c 0xc0000 -n 4 --huge-dir /mnt/huge --socket-mem 1024,1024 -- -p 0x1 --mergeable 0 --vm2vm 0
3. Launch VM:
taskset -c 22-23 \
/root/qemu-versions/qemu-2.6.0/x86_64-softmmu/qemu-system-x86_64 -name vm1 \
-cpu host -enable-kvm -m 2048 -object memory-backend-file,id=mem,size=2048M,mem-path=/mnt/huge,share=on -numa node,memdev=mem -mem-prealloc \
-smp cores=4 -drive file=/home/img/vm1.img \
-chardev socket,id=char0,path=./vhost-net \
-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
-device virtio-net-pci,mac=52:54:00:00:00:01,netdev=mynet1,mrg_rxbuf=on \
-chardev socket,id=char1,path=./vhost-net \
-netdev type=vhost-user,id=mynet2,chardev=char1,vhostforce \
-device virtio-net-pci,mac=52:54:00:00:00:02,netdev=mynet2,mrg_rxbuf=on \
-netdev tap,id=ipvm1,ifname=tap3,script=/etc/qemu-ifup -device rtl8139,netdev=ipvm1,id=net0,mac=00:00:00:00:10:01 \
-vnc :3 -daemonize
4. Set IPV4 fwd rules in VM:
virtio1=$1
virtio2=$2
systemctl stop firewalld.service
systemctl disable firewalld.service
systemctl stop ip6tables.service
systemctl disable ip6tables.service
systemctl stop iptables.service
systemctl disable iptables.service
systemctl stop NetworkManager.service
echo 1 >/proc/sys/net/ipv4/ip_forward
ip addr add 192.168.1.2/24 dev $virtio1
ip neigh add 192.168.1.1 lladdr 00:00:10:00:24:00 dev $virtio1
ip link set dev $virtio1 up

ip addr add 192.168.2.2/24 dev $virtio2
ip neigh add 192.168.2.1 lladdr 00:00:10:00:24:01 dev $virtio2
ip link set dev $virtio2 up

5. Send traffic to the NIC and check the performance of the traffic coming back from virtio2. With the patch, the performance is restored.

Test Case2: Virtio-net IPV4 fwd performance with mergeable=on

Similar steps; the only difference is setting mergeable=1 in the vhost-switch sample. The performance is as good as before.
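
The difference between the two test cases can be modeled with a short
sketch. It is a simplified model under stated assumptions, not the
actual driver or vhost code: the feature bit numbers follow the virtio
specification, while guest_rx_path(), feature_disable() and the rx_path
enum are hypothetical names, and the selection logic is an assumed
reading of how the Linux virtio-net driver picks its receive path.

```c
#include <stdint.h>

/* Feature bit numbers as defined by the virtio specification. */
#define VIRTIO_NET_F_GUEST_TSO4  7
#define VIRTIO_NET_F_GUEST_TSO6  8
#define VIRTIO_NET_F_HOST_TSO4   11
#define VIRTIO_NET_F_HOST_TSO6   12
#define VIRTIO_NET_F_MRG_RXBUF   15

/* Hypothetical labels for the guest receive paths discussed here. */
enum rx_path { RX_SMALL, RX_BIG_PACKETS, RX_MERGEABLE };

/* Hypothetical helper: clear the bits in mask from a feature set. */
static uint64_t feature_disable(uint64_t features, uint64_t mask)
{
	return features & ~mask;
}

/*
 * Assumed model of the guest's choice: mergeable buffers win when
 * negotiated; otherwise any guest TSO bit selects big_packets.
 * (The real driver may check further guest offload bits.)
 */
static enum rx_path guest_rx_path(uint64_t features)
{
	const uint64_t guest_tso = (1ULL << VIRTIO_NET_F_GUEST_TSO4) |
				   (1ULL << VIRTIO_NET_F_GUEST_TSO6);

	if (features & (1ULL << VIRTIO_NET_F_MRG_RXBUF))
		return RX_MERGEABLE;
	if (features & guest_tso)
		return RX_BIG_PACKETS;
	return RX_SMALL;
}
```

In this model, a feature set without VIRTIO_NET_F_MRG_RXBUF but with the
guest TSO bits yields RX_BIG_PACKETS, which corresponds to Test Case1
before the patch. Clearing the guest TSO bits as the patch does yields
RX_SMALL, and with VIRTIO_NET_F_MRG_RXBUF set the result is RX_MERGEABLE
either way, which corresponds to Test Case2.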

-----Original Message-----
From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com] 
Sent: Thursday, July 21, 2016 9:34 AM
To: Tan, Jianfeng <jianfeng.tan@intel.com>
Cc: dev@dpdk.org; Wang, Zhihong <zhihong.wang@intel.com>; Xu, Qian Q <qian.q.xu@intel.com>
Subject: Re: [PATCH v3] examples/vhost: fix perf regression

On Thu, Jul 21, 2016 at 12:42:45AM +0000, Jianfeng Tan wrote:
> We find significant performance drop introduced by below commit, when 
> vhost example is started with --mergeable 0 and inside vm, kernel 
> virtio-net driver is used to do ip based forwarding.
> 
> The commit, 859b480d5afd ("vhost: add guest offload setting"), adds 
> support for VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6, in 
> vhost lib. But inside vhost example, the way to disable tso only 
> excludes the direction from virtio to vhost, but not the opposite 
> direction. When mergeable is disabled, it triggers big_packets path of 
> virtio-net driver to prepare to receive possible big packets with size 
> of 64K. Because mergeable is off, for each entry of avail ring, virtio 
> driver uses 19 desc chained together, with one desc pointing to 
> header, other 18 desc pointing to 4K-sized pages. But QEMU only 
> creates 256 desc entries for each vq, which results in that only 13 
> packets can be received. VM kernel can quickly handle those packets 
> and go to sleep (HLT).
> 
> As QEMU has no option to set the desc entries of a vq, so here, we 
> disable VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6 with 
> VIRTIO_NET_F_HOST_TSO4 and VIRTIO_NET_F_HOST_TSO6 when we disable tso 
> of vhost example, to avoid VM kernel virtio driver go into big_packets 
> path.
> 
> Fixes: 9fd72e3cbd29 ("examples/vhost: add virtio offload")
> 
> Reported-by: Qian Xu <qian.q.xu@intel.com>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
> v3: reword commit log.

Yes, much better. One minor nit: you forgot to carry the Tested-by from Qian.

Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

Thanks.

	--yliu

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3] examples/vhost: fix perf regression
  2016-07-21  1:34   ` Yuanhan Liu
  2016-07-21  1:38     ` Xu, Qian Q
@ 2016-07-22  9:59     ` Thomas Monjalon
  1 sibling, 0 replies; 16+ messages in thread
From: Thomas Monjalon @ 2016-07-22  9:59 UTC (permalink / raw)
  To: Jianfeng Tan; +Cc: dev, Yuanhan Liu, zhihong.wang, qian.q.xu

2016-07-21 09:34, Yuanhan Liu:
> On Thu, Jul 21, 2016 at 12:42:45AM +0000, Jianfeng Tan wrote:
> > We find significant performance drop introduced by below commit,
> > when vhost example is started with --mergeable 0 and inside vm,
> > kernel virtio-net driver is used to do ip based forwarding.
> > 
> > The commit, 859b480d5afd ("vhost: add guest offload setting"), adds
> > support for VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6,
> > in vhost lib. But inside vhost example, the way to disable tso only
> > excludes the direction from virtio to vhost, but not the opposite
> > direction. When mergeable is disabled, it triggers big_packets path
> > of virtio-net driver to prepare to receive possible big packets with
> > size of 64K. Because mergeable is off, for each entry of avail ring,
> > virtio driver uses 19 desc chained together, with one desc pointing
> > to header, other 18 desc pointing to 4K-sized pages. But QEMU only
> > creates 256 desc entries for each vq, which results in that only 13
> > packets can be received. VM kernel can quickly handle those packets
> > and go to sleep (HLT).
> > 
> > As QEMU has no option to set the desc entries of a vq, so here,
> > we disable VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6
> > with VIRTIO_NET_F_HOST_TSO4 and VIRTIO_NET_F_HOST_TSO6 when we
> > disable tso of vhost example, to avoid VM kernel virtio driver
> > go into big_packets path.
> > 
> > Fixes: 9fd72e3cbd29 ("examples/vhost: add virtio offload")
> > 
> > Reported-by: Qian Xu <qian.q.xu@intel.com>
> > Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> > ---
> > v3: reword commit log.
> 
> Yes, much better. One minor nit: you forgot to carry the Tested-by from
> Qian.
> 
> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

Applied, thanks

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-07-22  9:59 UTC | newest]

Thread overview: 16+ messages
2016-07-19 13:53 [PATCH] examples/vhost: fix perf regression Jianfeng Tan
2016-07-20  1:44 ` Yuanhan Liu
2016-07-20  2:44   ` Tan, Jianfeng
2016-07-20  3:16     ` Xu, Qian Q
2016-07-20  4:00     ` Yuanhan Liu
2016-07-20  3:52 ` Xu, Qian Q
2016-07-20  4:38 ` Yuanhan Liu
2016-07-20  5:50   ` Tan, Jianfeng
2016-07-20  6:13     ` Yuanhan Liu
2016-07-20  6:30       ` Tan, Jianfeng
2016-07-21  0:23 ` [PATCH v2] " Jianfeng Tan
2016-07-21  0:42   ` Tan, Jianfeng
2016-07-21  0:42 ` [PATCH v3] " Jianfeng Tan
2016-07-21  1:34   ` Yuanhan Liu
2016-07-21  1:38     ` Xu, Qian Q
2016-07-22  9:59     ` Thomas Monjalon
