Subject: Re: [PATCH RFC v8 02/11] vhost: use batched get_vq_desc version
To: Eugenio Perez Martin
Cc: "Michael S. Tsirkin", Konrad Rzeszutek Wilk, linux-kernel@vger.kernel.org, kvm list, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org
References: <20200611113404.17810-1-mst@redhat.com> <20200611113404.17810-3-mst@redhat.com> <20200611152257.GA1798@char.us.oracle.com> <20200622114622-mutt-send-email-mst@kernel.org> <20200622122546-mutt-send-email-mst@kernel.org> <419cc689-adae-7ba4-fe22-577b3986688c@redhat.com>
From: Jason Wang
Message-ID: <0a83aa03-8e3c-1271-82f5-4c07931edea3@redhat.com>
Date: Wed, 1 Jul 2020 22:09:53 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

On 2020/7/1 9:04 PM, Eugenio Perez Martin wrote:
> On Wed, Jul 1, 2020 at 2:40 PM Jason Wang wrote:
>>
>> On 2020/7/1 6:43 PM, Eugenio Perez Martin wrote:
>>> On Tue, Jun 23, 2020 at 6:15 PM Eugenio Perez Martin
>>> wrote:
>>>> On Mon, Jun 22, 2020 at 6:29 PM Michael S. Tsirkin wrote:
>>>>> On Mon, Jun 22, 2020 at 06:11:21PM +0200, Eugenio Perez Martin wrote:
>>>>>> On Mon, Jun 22, 2020 at 5:55 PM Michael S. Tsirkin wrote:
>>>>>>> On Fri, Jun 19, 2020 at 08:07:57PM +0200, Eugenio Perez Martin wrote:
>>>>>>>> On Mon, Jun 15, 2020 at 2:28 PM Eugenio Perez Martin
>>>>>>>> wrote:
>>>>>>>>> On Thu, Jun 11, 2020 at 5:22 PM Konrad Rzeszutek Wilk
>>>>>>>>> wrote:
>>>>>>>>>> On Thu, Jun 11, 2020 at 07:34:19AM -0400, Michael S. Tsirkin wrote:
>>>>>>>>>>> As testing shows no performance change, switch to that now.
>>>>>>>>>> What kind of testing? 100GiB? Low latency?
>>>>>>>>>>
>>>>>>>>> Hi Konrad.
>>>>>>>>>
>>>>>>>>> I tested this version of the patch:
>>>>>>>>> https://lkml.org/lkml/2019/10/13/42
>>>>>>>>>
>>>>>>>>> It was tested for throughput with DPDK's testpmd (as described in
>>>>>>>>> http://doc.dpdk.org/guides/howto/virtio_user_as_exceptional_path.html)
>>>>>>>>> and kernel pktgen. No latency tests were performed by me. Maybe it is
>>>>>>>>> interesting to perform a latency test or just a different set of tests
>>>>>>>>> over a recent version.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>> I have repeated the tests with v9, and results are a little bit different:
>>>>>>>> * If I test opening it with testpmd, I see no change between versions
>>>>>>> OK that is testpmd on guest, right? And vhost-net on the host?
>>>>>>>
>>>>>> Hi Michael.
>>>>>>
>>>>>> No, sorry, as described in
>>>>>> http://doc.dpdk.org/guides/howto/virtio_user_as_exceptional_path.html.
>>>>>> But I could add to test it in the guest too.
>>>>>>
>>>>>> These kinds of raw packets "bursts" do not show performance
>>>>>> differences, but I could test deeper if you think it would be worth
>>>>>> it.
>>>>> Oh ok, so this is without guest, with virtio-user.
>>>>> It might be worth checking dpdk within guest too just
>>>>> as another data point.
>>>>>
>>>> Ok, I will do it!
>>>>
>>>>>>>> * If I forward packets between two vhost-net interfaces in the guest
>>>>>>>> using a linux bridge in the host:
>>>>>>> And here I guess you mean virtio-net in the guest kernel?
>>>>>> Yes, sorry: Two virtio-net interfaces connected with a linux bridge in
>>>>>> the host. More precisely:
>>>>>> * Adding one of the interfaces to another namespace, assigning it an
>>>>>> IP, and starting netserver there.
>>>>>> * Assign another IP in the range manually to the other virtual net
>>>>>> interface, and start the desired test there.
>>>>>>
>>>>>> If you think it would be better to perform them differently please let me know.
>>>>> Not sure why you bother with namespaces since you said you are
>>>>> using L2 bridging. I guess it's unimportant.
>>>>>
>>>> Sorry, I think I should have provided more context about that.
>>>>
>>>> The only reason to use namespaces is to force the traffic of these
>>>> netperf tests to go through the external bridge. To test netperf
>>>> different possibilities than the testpmd (or pktgen or others "blast
>>>> of frames unconditionally" tests).
>>>>
>>>> This way, I make sure that is the same version of everything in the
>>>> guest, and is a little bit easier to manage cpu affinity, start and
>>>> stop testing...
>>>>
>>>> I could use a different VM for sending and receiving, but I find this
>>>> way a faster one and it should not introduce a lot of noise. I can
>>>> test with two VM if you think that this use of network namespace
>>>> introduces too much noise.
>>>>
>>>> Thanks!
>>>>
>>>>>>>> - netperf UDP_STREAM shows a performance increase of 1.8, almost
>>>>>>>> doubling performance. This gets lower as frame size increase.
>>> Regarding UDP_STREAM:
>>> * with event_idx=on: The performance difference is reduced a lot if
>>> applied affinity properly (manually assigning CPU on host/guest and
>>> setting IRQs on guest), making them perform equally with and without
>>> the patch again. Maybe the batching makes the scheduler perform
>>> better.
>>
>> Note that for UDP_STREAM, the result is pretty tricky to be analyzed. E.g
>> setting a sndbuf for TAP may help for the performance (reduce the drop).
>>
> Ok, will add that to the test. Thanks!

Actually, it's better to skip the UDP_STREAM test since:

- My understanding is that very few applications use raw UDP streams
- It's hard to analyze (usually you need to count the drop ratio, etc.)

>
>>>>>>>> - rests of the test goes noticeably worse: UDP_RR goes from ~6347
>>>>>>>> transactions/sec to 5830
>>> * Regarding UDP_RR, TCP_STREAM, and TCP_RR, proper CPU pinning makes
>>> them perform similarly again, only a very small performance drop
>>> observed. It could be just noise.
>>> ** All of them perform better than vanilla if event_idx=off, not sure
>>> why. I can try to repeat them if you suspect that can be a test
>>> failure.
>>>
>>> * With testpmd and event_idx=off, if I send from the VM to host, I see
>>> a performance increment especially in small packets. The buf api also
>>> increases performance compared with only batching: Sending the minimum
>>> packet size in testpmd makes pps go from 356kpps to 473 kpps.
>>
>> What's your setup for this. The number looks rather low. I'd expected
>> 1-2 Mpps at least.
>>
> Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz, 2 NUMA nodes of 16G memory
> each, and no device assigned to the NUMA node I'm testing in. Too low
> for testpmd AF_PACKET driver too?

I don't test AF_PACKET. I guess it should use V3, which is the mmap-based
zerocopy interface.

It might also be worth checking the CPU utilization of the vhost thread.
It needs to be stressed to 100%; otherwise there could be a bottleneck
somewhere.

>
>>> Sending
>>> 1024 length UDP-PDU makes it go from 570kpps to 64 kpps.
>>>
>>> Something strange I observe in these tests: I get more pps the bigger
>>> the transmitted buffer size is. Not sure why.
>>>
>>> ** Sending from the host to the VM does not make a big change with the
>>> patches in small packets scenario (minimum, 64 bytes, about 645
>>> without the patch, ~625 with batch and batch+buf api). If the packets
>>> are bigger, I can see a performance increase: with 256 bits,
>>
>> I think you meant bytes?
>>
> Yes, sorry.
>
>>> it goes
>>> from 590kpps to about 600kpps, and in case of 1500 bytes payload it
>>> gets from 348kpps to 528kpps, so it is clearly an improvement.
>>>
>>> * with testpmd and event_idx=on, batching+buf api perform similarly in
>>> both directions.
>>>
>>> All of testpmd tests were performed with no linux bridge, just a
>>> host's tap interface ( in xml),
>>
>> What DPDK driver did you use in the test (AF_PACKET?).
>>
> Yes, both testpmd are using AF_PACKET driver.

I see. Using AF_PACKET means extra layers of issues need to be analyzed,
which is probably not good.

>
>>> with a
>>> testpmd txonly and another in rxonly forward mode, and using the
>>> receiving side packets/bytes data. Guest's rps, xps and interrupts,
>>> and host's vhost threads affinity were also tuned in each test to
>>> schedule both testpmd and vhost in different processors.
>>
>> My feeling is that if we start from simple setup, it would be more
>> easier as a start. E.g start without an VM.
>>
>> 1) TX: testpmd(txonly) -> virtio-user -> vhost_net -> XDP_DROP on TAP
>> 2) RX: pktgen -> TAP -> vhost_net -> testpmd(rxonly)
>>
> Got it. Is there a reason to prefer pktgen over testpmd?

I think the reason is that with testpmd you must use a userspace kernel
interface (AF_PACKET), and it could not be as fast as pktgen since:

- pktgen talks directly to xmit of TAP
- with pktgen, the skb can be cloned

Thanks

>
>> Thanks
>>
>>
>>> I will send the v10 RFC with the small changes requested by Stefan and Jason.
>>>
>>> Thanks!
>>>
>>>
>>>>>>> OK so it seems plausible that we still have a bug where an interrupt
>>>>>>> is delayed. That is the main difference between pmd and virtio.
>>>>>>> Let's try disabling event index, and see what happens - that's
>>>>>>> the trickiest part of interrupts.
>>>>>>>
>>>>>> Got it, will get back with the results.
>>>>>>
>>>>>> Thank you very much!
>>>>>>
>>>>>>>> - TCP_STREAM goes from ~10.7 gbps to ~7Gbps
>>>>>>>> - TCP_RR from 6223.64 transactions/sec to 5739.44
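
For reference, the before/after figures quoted in this thread can be turned into relative changes. This is a minimal Python sketch and not part of the original messages: the `pct_change` helper and the row labels are made up for illustration, and the values are copied from the quoted results (some of them are approximate, marked with "~" in the thread). The rows come from different setups (with and without CPU pinning, event_idx on or off), so they are not directly comparable with each other.

```python
# Relative change for the before/after figures quoted in this thread.
# The labels are informal and hypothetical; only the numbers are taken
# from the quoted messages.

def pct_change(before, after):
    """Return the relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

results = {
    "netperf UDP_RR (trans/s)": (6347.0, 5830.0),
    "netperf TCP_RR (trans/s)": (6223.64, 5739.44),
    "netperf TCP_STREAM (Gbps)": (10.7, 7.0),
    "testpmd VM->host, min size (kpps)": (356.0, 473.0),
    "testpmd host->VM, 256 bytes (kpps)": (590.0, 600.0),
    "testpmd host->VM, 1500 bytes (kpps)": (348.0, 528.0),
}

for name, (before, after) in results.items():
    # A leading '+' marks an improvement, '-' a regression.
    print(f"{name}: {pct_change(before, after):+.1f}%")
```

Reporting percentages rather than raw numbers makes it easier to judge whether a difference such as the UDP_RR one could plausibly be noise, as suggested above.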