All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yuanhan Liu <yliu@fridaylinux.org>
To: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: dev@dpdk.org, remy.horton@intel.com, tiwei.bie@intel.com,
	mst@redhat.com, jfreiman@redhat.com, vkaplans@redhat.com,
	jasowang@redhat.com
Subject: Re: [PATCH v3 00/19] Vhost-user: Implement device IOTLB support
Date: Fri, 6 Oct 2017 14:24:26 +0800	[thread overview]
Message-ID: <20171006062426.GA1545@yliu-home> (raw)
In-Reply-To: <20171005083627.27828-1-maxime.coquelin@redhat.com>

Series applied to dpdk-next-virtio.

Thanks.

	--yliu

On Thu, Oct 05, 2017 at 10:36:08AM +0200, Maxime Coquelin wrote:
> This v3 lists the feature in the release note, and fixes the bug in
> is_vring_iotlb_update() reported by Yuanhan.
> 
> The purpose of this series is to add support for
> VIRTIO_F_IOMMU_PLATFORM feature, by implementing device IOTLB in the
> vhost-user backend. It improves the guest safety by enabling the
> possibility to isolate the Virtio device.
> 
> It makes possible to use Virtio PMD in guest with using VFIO driver
> without enable_unsafe_noiommu_mode parameter set, so that the DPDK
> application on guest can only access memory its has been allowed to,
> and preventing malicious/buggy DPDK application in guest to make
> vhost-user backend write random guest memory. Note that Virtio-net
> Kernel driver also support IOMMU.
> 
> The series depends on Qemu's "vhost-user: Specify and implement
> device IOTLB support" [0], available upstream and which will be part
> of Qemu v2.10 release.
> 
> Performance-wise, even if this RFC has still room for optimizations,
> no performance degradation is noticed with static mappings (i.e. DPDK
> on guest) with PVP benchmark:
> 	Traffic Generator: Moongen (lua-trafficgen)
> 	Acceptable Loss: 0.005%
> 	Validation run time: 1 min
> 	Guest DPDK version/commit: v17.05
> 	QEMU version/commit: master (6db174aed1fd)
> 	Virtio features: default
> 	CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
> 	NIC: 2 x X710
> 	Page size: 1G host/1G guest
> 	Results (bidirectional, total of the two flows):
> 	 - base: 18.8Mpps
> 	 - base + IOTLB series, IOMMU OFF: 18.8Mpps
> 	 - base + IOTLB series, IOMMU ON: 18.8Mpps (14.5Mpps w/o PATCH 21/21)
> 
> This is explained because IOTLB misses, which are very costly, only
> happen at startup time. Indeed, once used, the buffers are not
> invalidated, so if the IOTLB cache is large enough, there will be only
> cache hits. Also, the use of 1G huge pages improves the IOTLB cache
> searching time by reducing the number of entries.
> 
> With 2M hugepages, a performance degradation is seen with IOMMU on:
> 	Traffic Generator: Moongen (lua-trafficgen)
> 	Acceptable Loss: 0.005%
> 	Validation run time: 1 min
> 	Guest DPDK version/commit: v17.05
> 	QEMU version/commit: master (6db174aed1fd)
> 	Virtio features: default
> 	CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
> 	NIC: 2 x X710
> 	Page size: 2M host/2M guest
> 	Results (bidirectional, total of the two flows):
> 	 - base: 18.8Mpps
> 	 - base + IOTLB series, IOMMU OFF: 18.8Mpps
> 	 - base + IOTLB series, IOMMU ON: 13.5Mpps (12.4Mpps wo PATCH 21/21)
> 
> A possible improvement would be to merge contiguous IOTLB entries sharing
> the same permissions. A very rough patch implementing this idea fixes
> the performance degradation (18.8Mpps), but the required work to clean
> it would delay this series after v17.11.
> 
> With dynamic mappings (i.e. Virtio-net kernel driver), this is another
> story. The performance is so poor it makes it almost unusable. Indeed,
> since the Kernel driver unmaps the buffers as soon as they are handled,
> almost all descriptors buffers addresses translations result in an IOTLB
> miss. There is not much that can be done on DPDK side, except maybe
> batching IOTLB miss requests no to break bursts, but it would require
> a big rework. In Qemu, we may consider enabling IOMMU MAP notifications,
> so that DPDK receives the IOTLB updates without having to send IOTLB miss
> request.
> 
> Regarding the design choices:
>  - I initially intended to use userspace RCU library[1] for the cache
> implementation, but it would have added an external dependency, and the
> lib is not available in all distros. Qemu for example got rid of this
> dependency by copying some of the userspace RCU lib parts into Qemu tree,
> but this is not possible with DPDK due to licensing issues (RCU lib is
> LGPL v2). Thanks to Jason advice, I implemented the cache using rd/wr
> locks.
>  - I initially implemented a per-device IOTLB cache, but the concurrent
> acccesses on the IOTLB lock had huge impact on performance (~-40% in
> bidirectionnal, expect even worse with multiqueue). I move to a per-
> virtqueue IOTLB design, which prevents this concurrency.
>  - The slave IOTLB miss request supports reply-ack feature in spec, but
> this version doesn't block or busy-wait for the corresponding update so
> that other queues sharing the same lcore can be processed in the meantime.
> 
> For those who would like to test the series, I made it available on
> gitlab[2] (vhost_user_iotlb_v1 tag). The guest kernel command line requires
> the intel_iommu=on parameter, and the guest should be started with and
> iommu device attached to the virtio-net device. For example:
> 
> ./qemu-system-x86_64 \
>   -enable-kvm -m 4096 -smp 2 \
>   -M q35,kernel-irqchip=split \
>   -cpu host \
>   -device intel-iommu,device-iotlb=on,intremap \
>   -device ioh3420,id=root.1,chassis=1 \
>   -chardev socket,id=char0,path=/tmp/vhost-user1 \
>   -netdev type=vhost-user,id=hn2,chardev=char0 \
>   -device virtio-net-pci,netdev=hn2,id=v0,mq=off,mac=$MAC,bus=root.1,disable-modern=off,disable-legacy=on,iommu_platform=on,ats=on \
> ...
> 
> [0]: https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg00520.html
> [1]: http://liburcu.org/
> [2]: https://gitlab.com/mcoquelin/dpdk-next-virtio/commits/vhost_user_iotlb_v2
> 
> Changes since v1:
> =================
> - Add feature support in release note (Yuanhan)
> - Fix return value in is_vring_iotlb_update() (Yuanhan)
> 
> Changes since v1:
> =================
> - Implement random cache entry eviction instead of removing all entries
>   when cache is full (Yuanhan)
> - Adds bounds check on vring_idx (Remy)
> - Initialize slave_req_fd to -1 (Tiwei)
> - Remove virtio-net device lock (Tiwei)
> - Simplify vhost_user_iotlb_cache_remove() (Tiwei)
> - Squash iotlb lock usage optimization patch
> - Inline noiommu part of vhost_iova_to_vva to remove performance
>   regressions with IOMMU=off
> - Reworked iotlb files copyrights
> 
> Changes since RFC:
> ==================
> - Fix memory leak in error patch reported by Jens
> - Rework wait for IOTLB update by stopping the burst to let other
>   queues to be processed, if any. It implies the introduction an
>   iotlb_pending_list, so that iotlb miss requests aren't sent multiple
>   times for a same address.
> - Optimize iotlb lock usage to recover to same as IOMMU off performance
> - Fix device locking issue in rte_vhost_dequeue_burst() error path
> - Change virtio_dev_rx error handling for consistency with mergeable rx,
>   and to ease returning in case of IOTLB misses.
> - Fix checkpatch warnings reported by checkpatch@dpdk.org
> 
> Maxime Coquelin (19):
>   Revert "vhost: workaround MQ fails to startup"
>   vhost: make error handling consistent in rx path
>   vhost: prepare send_vhost_message() to slave requests
>   vhost: add support to slave requests channel
>   vhost: declare missing IOMMU-related definitions for old kernels
>   vhost: add iotlb helper functions
>   vhost: iotlb: add pending miss request list and helpers
>   vhost-user: add support to IOTLB miss slave requests
>   vhost: initialize vrings IOTLB caches
>   vhost-user: handle IOTLB update and invalidate requests
>   vhost: introduce guest IOVA to backend VA helper
>   vhost: use the guest IOVA to host VA helper
>   vhost: enable rings at the right time
>   vhost: don't dereference invalid dev pointer after its reallocation
>   vhost: postpone rings addresses translation
>   vhost-user: translate ring addresses when IOMMU enabled
>   vhost-user: iommu: postpone device creation until ring are mapped
>   vhost: iommu: Invalidate vring in case of matching IOTLB invalidate
>   vhost: enable IOMMU support
> 
>  doc/guides/rel_notes/release_17_11.rst |   4 +
>  lib/librte_vhost/Makefile              |   4 +-
>  lib/librte_vhost/iotlb.c               | 352 +++++++++++++++++++++++++++++++++
>  lib/librte_vhost/iotlb.h               |  76 +++++++
>  lib/librte_vhost/vhost.c               | 115 +++++++++--
>  lib/librte_vhost/vhost.h               |  61 +++++-
>  lib/librte_vhost/vhost_user.c          | 312 +++++++++++++++++++++++++----
>  lib/librte_vhost/vhost_user.h          |  20 +-
>  lib/librte_vhost/virtio_net.c          |  96 +++++++--
>  9 files changed, 962 insertions(+), 78 deletions(-)
>  create mode 100644 lib/librte_vhost/iotlb.c
>  create mode 100644 lib/librte_vhost/iotlb.h
> 
> -- 
> 2.13.6

      parent reply	other threads:[~2017-10-06  6:24 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-05  8:36 [PATCH v3 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 01/19] Revert "vhost: workaround MQ fails to startup" Maxime Coquelin
2017-11-01 17:11   ` Kavanagh, Mark B
2017-11-02  9:40     ` Maxime Coquelin
2017-11-03 13:05       ` Yuanhan Liu
2017-11-03 14:28         ` Maxime Coquelin
2017-11-06 12:00           ` Yuanhan Liu
2017-11-06 12:07             ` Maxime Coquelin
2017-11-06 12:24               ` Yuanhan Liu
2017-11-06 12:50                 ` Maxime Coquelin
2017-11-06 13:36                   ` Yuanhan Liu
2017-11-03 15:34       ` Thomas Monjalon
2017-11-03 16:31         ` Kavanagh, Mark B
2017-10-05  8:36 ` [PATCH v3 02/19] vhost: make error handling consistent in rx path Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 03/19] vhost: prepare send_vhost_message() to slave requests Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 04/19] vhost: add support to slave requests channel Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 05/19] vhost: declare missing IOMMU-related definitions for old kernels Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 06/19] vhost: add iotlb helper functions Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 07/19] vhost: iotlb: add pending miss request list and helpers Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 08/19] vhost-user: add support to IOTLB miss slave requests Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 09/19] vhost: initialize vrings IOTLB caches Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 10/19] vhost-user: handle IOTLB update and invalidate requests Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 11/19] vhost: introduce guest IOVA to backend VA helper Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 12/19] vhost: use the guest IOVA to host " Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 13/19] vhost: enable rings at the right time Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 14/19] vhost: don't dereference invalid dev pointer after its reallocation Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 15/19] vhost: postpone rings addresses translation Maxime Coquelin
2017-10-13  1:47   ` Yao, Lei A
2017-10-13  7:32     ` Maxime Coquelin
2017-10-13  7:55       ` Yao, Lei A
2017-10-13  7:56         ` Maxime Coquelin
2017-10-16  5:59         ` Yao, Lei A
2017-10-16  6:23           ` Yao, Lei A
2017-10-16  9:47             ` Maxime Coquelin
2017-10-16 10:47               ` Maxime Coquelin
2017-10-17  1:24                 ` Yao, Lei A
2017-10-17  8:06                   ` Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 16/19] vhost-user: translate ring addresses when IOMMU enabled Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 17/19] vhost-user: iommu: postpone device creation until ring are mapped Maxime Coquelin
2017-11-02  7:21   ` Yao, Lei A
2017-11-02  8:21     ` Maxime Coquelin
2017-11-02 16:02       ` Maxime Coquelin
2017-11-03  8:25         ` Maxime Coquelin
2017-11-03 15:15           ` Michael S. Tsirkin
2017-11-03 15:54             ` Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 18/19] vhost: iommu: Invalidate vring in case of matching IOTLB invalidate Maxime Coquelin
2017-10-05  8:36 ` [PATCH v3 19/19] vhost: enable IOMMU support Maxime Coquelin
2017-10-06  6:24 ` Yuanhan Liu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171006062426.GA1545@yliu-home \
    --to=yliu@fridaylinux.org \
    --cc=dev@dpdk.org \
    --cc=jasowang@redhat.com \
    --cc=jfreiman@redhat.com \
    --cc=maxime.coquelin@redhat.com \
    --cc=mst@redhat.com \
    --cc=remy.horton@intel.com \
    --cc=tiwei.bie@intel.com \
    --cc=vkaplans@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.