From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tiwei Bie
Subject: Re: [PATCH v6 00/11] implement packed virtqueues
Date: Fri, 21 Sep 2018 20:32:22 +0800
Message-ID: <20180921123222.GA25292@debian>
References: <20180921103308.16357-1-jfreimann@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Cc: dev@dpdk.org, maxime.coquelin@redhat.com, Gavin.Hu@arm.com, zhihong.wang@intel.com
To: Jens Freimann
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by dpdk.org (Postfix) with ESMTP id 80A33A49; Fri, 21 Sep 2018 14:33:34 +0200 (CEST)
Content-Disposition: inline
In-Reply-To: <20180921103308.16357-1-jfreimann@redhat.com>
List-Id: DPDK patches and discussions
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

On Fri, Sep 21, 2018 at 12:32:57PM +0200, Jens Freimann wrote:
> This is a basic implementation of packed virtqueues as specified in the
> Virtio 1.1 draft. A compiled version of the current draft is available
> at https://github.com/oasis-tcs/virtio-docs.git (or as .pdf at
> https://github.com/oasis-tcs/virtio-docs/blob/master/virtio-v1.1-packed-wd10.pdf
>
> A packed virtqueue is different from a split virtqueue in that it
> consists of only a single descriptor ring that replaces available and
> used ring, index and descriptor buffer.
>
> Each descriptor is readable and writable and has a flags field. These flags
> will mark if a descriptor is available or used. To detect new available descriptors
> even after the ring has wrapped, device and driver each have a
> single-bit wrap counter that is flipped from 0 to 1 and vice versa every time
> the last descriptor in the ring is used/made available.
>
> The idea behind this is to 1. improve performance by avoiding cache misses
> and 2. be easier for devices to implement.
>
> Regarding performance: with these patches I get 21.13 Mpps on my system
> as compared to 18.8 Mpps with the virtio 1.0 code.
> Packet size was 64

Did you enable multiple queues and use multiple cores on the
vhost side? If not, I guess the above performance gain is
a gain on the vhost side rather than on the virtio side.

If you use more cores on the vhost side or the virtio side,
do you see any performance changes?

Did you do any performance tests with the kernel vhost-net
backend (with zero-copy enabled and disabled)? I think we
also need some performance data for these two cases. It
would also help us make sure that this works with the
kernel backends.

And for the "virtio-PMD + vhost-PMD" test cases, I think
we need the performance data below:

#1. The maximum 1 core performance of virtio PMD when using split ring.
#2. The maximum 1 core performance of virtio PMD when using packed ring.
#3. The maximum 1 core performance of vhost PMD when using split ring.
#4. The maximum 1 core performance of vhost PMD when using packed ring.

Then we can have a clear understanding of the
performance gain in DPDK with packed ring.

And FYI, the maximum 1 core performance of virtio PMD can
be measured with the steps below:

1. Launch vhost-PMD with multiple queues, and use multiple
   CPU cores for forwarding.
2. Launch virtio-PMD with multiple queues and use 1 CPU
   core for forwarding.
3. Repeat the above two steps, adding more CPU cores for
   forwarding on the vhost-PMD side, until we can't see any
   further performance increase.

Besides, I just had a quick glance at the Tx implementation;
it still assumes the descs will be written back in order
by the device. You can find more details in my comments on
that patch.

Thanks

> bytes, 0.05% acceptable loss. Test setup is described as in
> http://dpdk.org/doc/guides/howto/pvp_reference_benchmark.html
>
> Packet generator:
> MoonGen
> Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
> Intel X710 NIC
> RHEL 7.4
>
> Device under test:
> Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
> Intel X710 NIC
> RHEL 7.4
>
> VM on DuT: RHEL7.4
>
> I plan to do more performance test with bigger frame sizes.
>
> changes from v5->v6:
> * fix VIRTQUEUE_DUMP macro
> * rework mergeable rx buffer path, support out of order (not sure if I
>   need a .next field to support chains)
> * move wmb in virtio_receive_pkts_packed() (Gavin)
> * rename to virtio_init_split/_packed (Maxime)
> * add support for ctrl virtqueues (Tiwei, thx Max for fixing)
> * rework tx path to support update_packet_stats and
>   virtqueue_xmit_offload, TODO: merge with split-ring code to
>   avoid a lot of duplicate code
> * remove unnecessary check for avoiding to call VIRTQUEUE_DUMP (Maxime)
>
> changes from v4->v5:
> * fix VIRTQUEUE_DUMP macro
> * fix wrap counter logic in transmit and receive functions
>
> changes from v3->v4:
> * added helpers to increment index and set available/used flags
> * driver keeps track of number of descriptors used
> * change logic in set_rxtx_funcs()
> * add patch for ctrl virtqueue with support for packed virtqueues
> * rename virtio-1.1.h to virtio-packed.h
> * fix wrong sizeof() in "vhost: vring address setup for packed queues"
> * fix coding style of function definition in "net/virtio: add packed
>   virtqueue helpers"
> * fix padding in vring_size()
> * move patches to enable packed virtqueues end of series
> * v4 has two open problems: I'm sending it out anyway for feedback/help:
>   * when VIRTIO_NET_F_MRG_RXBUF enabled only 128 packets are send in
>     guest, i.e. when ring is full for the first time.
>     I suspect a bug in
>     setting the avail/used flags
>
> changes from v2->v3:
> * implement event suppression
> * add code do dump packed virtqueues
> * don't use assert in vhost code
> * rename virtio-user parameter to packed-vq
> * support rxvf flush
>
> changes from v1->v2:
> * don't use VIRTQ_DESC_F_NEXT in used descriptors (Jason)
> * no rte_panice() in guest triggerable code (Maxime)
> * use unlikely when checking for vq (Maxime)
> * rename everything from _1_1 to _packed (Yuanhan)
> * add two more patches to implement mergeable receive buffers
>
> *** BLURB HERE ***
>
> Jens Freimann (10):
>   net/virtio: vring init for packed queues
>   net/virtio: add packed virtqueue defines
>   net/virtio: add packed virtqueue helpers
>   net/virtio: flush packed receive virtqueues
>   net/virtio: dump packed virtqueue data
>   net/virtio: implement transmit path for packed queues
>   net/virtio: implement receive path for packed queues
>   net/virtio: add support for mergeable buffers with packed virtqueues
>   net/virtio: add virtio send command packed queue support
>   net/virtio: enable packed virtqueues by default
>
> Yuanhan Liu (1):
>   net/virtio-user: add option to use packed queues
>
>  drivers/net/virtio/virtio_ethdev.c           | 135 ++++-
>  drivers/net/virtio/virtio_ethdev.h           |   5 +
>  drivers/net/virtio/virtio_pci.h              |   8 +
>  drivers/net/virtio/virtio_ring.h             |  96 +++-
>  drivers/net/virtio/virtio_rxtx.c             | 490 +++++++++++++++++-
>  .../net/virtio/virtio_user/virtio_user_dev.c |  10 +-
>  .../net/virtio/virtio_user/virtio_user_dev.h |   2 +-
>  drivers/net/virtio/virtio_user_ethdev.c      |  14 +-
>  drivers/net/virtio/virtqueue.c               |  21 +
>  drivers/net/virtio/virtqueue.h               |  50 +-
>  10 files changed, 796 insertions(+), 35 deletions(-)
>
> --
> 2.17.1
>
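
For reference, the wrap-counter scheme described in the quoted cover
letter can be sketched roughly as below. This is only an illustration
and not the code from this series: the sketch_* names and the
simplified structs are hypothetical, and the flag bit positions (7 for
AVAIL, 15 for USED) are taken from the Virtio 1.1 packed ring layout.
A real driver also needs memory barriers around the flags accesses,
which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits of a packed descriptor (positions per the Virtio 1.1 layout). */
#define SKETCH_DESC_F_AVAIL (1u << 7)
#define SKETCH_DESC_F_USED  (1u << 15)

/* Hypothetical, simplified descriptor: only the flags field matters here. */
struct sketch_desc {
	uint16_t flags;
};

/* Hypothetical ring cursor: position plus the single-bit wrap counter. */
struct sketch_cursor {
	uint16_t idx;
	bool wrap;
};

/* Available: AVAIL bit matches the wrap counter, USED bit does not. */
static inline bool
sketch_desc_is_avail(const struct sketch_desc *d, bool wrap)
{
	bool avail = !!(d->flags & SKETCH_DESC_F_AVAIL);
	bool used = !!(d->flags & SKETCH_DESC_F_USED);

	return avail == wrap && used != wrap;
}

/* Used: AVAIL and USED bits are equal and both match the wrap counter. */
static inline bool
sketch_desc_is_used(const struct sketch_desc *d, bool wrap)
{
	bool avail = !!(d->flags & SKETCH_DESC_F_AVAIL);
	bool used = !!(d->flags & SKETCH_DESC_F_USED);

	return avail == used && used == wrap;
}

/* Driver side: AVAIL := wrap, USED := !wrap. */
static inline void
sketch_mark_avail(struct sketch_desc *d, bool wrap)
{
	uint16_t f = (uint16_t)(d->flags &
			~(SKETCH_DESC_F_AVAIL | SKETCH_DESC_F_USED));

	f |= wrap ? SKETCH_DESC_F_AVAIL : SKETCH_DESC_F_USED;
	d->flags = f;
}

/* Device side: AVAIL := wrap, USED := wrap. */
static inline void
sketch_mark_used(struct sketch_desc *d, bool wrap)
{
	uint16_t f = (uint16_t)(d->flags &
			~(SKETCH_DESC_F_AVAIL | SKETCH_DESC_F_USED));

	if (wrap)
		f |= SKETCH_DESC_F_AVAIL | SKETCH_DESC_F_USED;
	d->flags = f;
}

/* Step to the next slot; flip the wrap counter after the last slot. */
static inline void
sketch_cursor_advance(struct sketch_cursor *c, uint16_t ring_size)
{
	assert(ring_size > 0);
	if (++c->idx >= ring_size) {
		c->idx = 0;
		c->wrap = !c->wrap;
	}
}
```

Note that these checks only look at one slot at a time. If the device
may write descriptors back out of order, the driver additionally has to
use the buffer id reported in the used descriptor rather than assume
the used slot corresponds to the oldest outstanding buffer.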