-----Original Message-----
From: Willem de Bruijn [mailto:willemdebruijn.kernel@gmail.com] 
Sent: March 30, 2020 9:52
To: Yi Yang (杨燚) - Cloud Services Group <yangyi01@inspur.com>
Cc: willemdebruijn.kernel@gmail.com; yang_y_yi@163.com; netdev@vger.kernel.org; u9012063@gmail.com
Subject: Re: [sent via vger.kernel.org]Re: [sent via vger.kernel.org]Re: [PATCH net-next] net/packet: fix TPACKET_V3 performance issue in case of TSO

> iperf3 test result
> -----------------------
> [yangyi@localhost ovs-master]$ sudo ../run-iperf3.sh
> iperf3: no process found
> Connecting to host 10.15.1.3, port 5201
> [  4] local 10.15.1.2 port 44976 connected to 10.15.1.3 port 5201
> [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
> [  4]   0.00-10.00  sec  19.6 GBytes  16.8 Gbits/sec  106586    307 KBytes
> [  4]  10.00-20.00  sec  19.5 GBytes  16.7 Gbits/sec  104625    215 KBytes
> [  4]  20.00-30.00  sec  20.0 GBytes  17.2 Gbits/sec  106962    301 KBytes

Thanks for the detailed info.

So there is more going on here than a simple network tap: veth, which calls netif_rx and thus schedules delivery with a NAPI after a softirq (twice), plus tpacket for recv and send, plus OVS processing. And this is a single flow, so it is more sensitive to batching, drops and interrupt moderation than a workload of many flows.

If anything, I would expect the ACKs on the return path to be the more likely cause for concern, as they are even less likely to fill a block before the timer. The return path is a separate packet socket?

With a small initial window size, I guess it might be possible for the entire window to be in transit. As no follow-up data will arrive, the block then waits for the timeout. But at 3Gbps that is no longer the case.
Again, the timeout is intrinsic to TPACKET_V3. If that is unacceptable, then TPACKET_V2 is a more logical choice. That also holds with respect to timely ACK responses.
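
To make the block-versus-timer argument concrete, here is a minimal sketch that compares the time needed to fill one ring block with the block retire timeout. The function names are hypothetical, and the model is deliberately crude: it ignores per-frame headers and alignment padding inside the block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Microseconds needed to fill one ring block at a steady rate.
 * Assumption: every byte of the block carries payload (no per-frame
 * header or padding overhead), so this is a lower bound. */
static uint64_t block_fill_time_us(uint64_t block_bytes, uint64_t rate_bps)
{
	assert(rate_bps > 0);
	/* bytes * 8 bits/byte * 1e6 us/s / (bits/s) */
	return block_bytes * 8ULL * 1000000ULL / rate_bps;
}

/* True when the block is expected to be retired by the timer
 * (timeout given in milliseconds) before it fills up. */
static bool retired_by_timeout(uint64_t block_bytes, uint64_t rate_bps,
			       uint64_t retire_tov_ms)
{
	return block_fill_time_us(block_bytes, rate_bps) >
	       retire_tov_ms * 1000ULL;
}
```

With illustrative numbers (not taken from this thread's setup), a 1 MiB block at 3 Gbit/s fills in roughly 2.8 ms, so an 8 ms timeout would rarely fire, while a 1 Mbit/s stream of ACKs would need several seconds to fill the same block and would always wait for the timer.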

Other users of TPACKET_V3 may be using fewer blocks of larger size. A change to retire blocks after 1 gso packet will negatively affect their workloads. At the very least this should be an optional feature, similar to how I suggested converting to microseconds.
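
For reference, a sketch of how a user might describe such a ring. `struct tpacket_req3` and its fields come from `<linux/if_packet.h>`; the helper `make_ring_req` and all sizes used with it are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <linux/if_packet.h>

/* Hypothetical helper: fill a TPACKET_V3 ring request.
 * retire_tov_ms is the block retire timeout in milliseconds. */
static struct tpacket_req3 make_ring_req(unsigned int block_size,
					 unsigned int block_nr,
					 unsigned int frame_size,
					 unsigned int retire_tov_ms)
{
	struct tpacket_req3 req;

	assert(frame_size > 0 && block_size % frame_size == 0);

	memset(&req, 0, sizeof(req));
	req.tp_block_size = block_size;
	req.tp_block_nr = block_nr;
	req.tp_frame_size = frame_size;
	/* Total frame count must match frames-per-block times blocks. */
	req.tp_frame_nr = (block_size / frame_size) * block_nr;
	req.tp_retire_blk_tov = retire_tov_ms;
	return req;
}
```

The result would be passed to `setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req))` after selecting `TPACKET_V3` with `PACKET_VERSION`. A user with, say, 4 blocks of 4 MiB relies on each block collecting many packets, which is the workload that retiring a block after a single GSO packet would penalize.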

[Yi Yang] My iperf3 test uses a TCP socket; the return path uses the same socket as the forward path. BTW, this patch retires the current block only if the packets carry a vnet header, and I don't know of any use case other than ours that uses the vnet header. I also have additional conditions that would restrict this further, but they hurt performance. I'll check whether V2 fixes our issue; if it does not, this patch will be the only way to fix it.

+
+       if (do_vnet) {
+               vnet_hdr_ok = virtio_net_hdr_from_skb(skb, &vnet_hdr,
+                                                     vio_le(), true, 0);
+               /* Improve performance by retiring current block for
+                * TPACKET_V3 in case of TSO.
+                */
+               if (vnet_hdr_ok == 0 && po->tp_version == TPACKET_V3 &&
+                   vnet_hdr.flags != 0 &&
+                   (vnet_hdr.gso_type == VIRTIO_NET_HDR_GSO_TCPV4 ||
+                       vnet_hdr.gso_type == VIRTIO_NET_HDR_GSO_TCPV6)) {
+                       retire_cur_frame = true;
+               }
+       }
+