All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 00/19] Vhost-user: Implement device IOTLB support
@ 2017-07-04  9:49 Maxime Coquelin
  2017-07-04  9:49 ` [RFC 01/19] vhost: protect virtio_net device struct Maxime Coquelin
                   ` (19 more replies)
  0 siblings, 20 replies; 23+ messages in thread
From: Maxime Coquelin @ 2017-07-04  9:49 UTC (permalink / raw)
  To: dev, Yuanhan Liu; +Cc: mst, vkaplans, jasowang, jfreiman, Maxime Coquelin

This first RFC, which targets v17.11,  adds support for
VIRTIO_F_IOMMU_PLATFORM feature, by implementing device IOTLB in the
vhost-user backend. It improves the guest safety by enabling the
possibility to isolate the Virtio device.

It makes possible to use Virtio PMD in guest with using VFIO driver
without enable_unsafe_noiommu_mode parameter set, so that the DPDK
application on guest can only access memory its has been allowed to,
and preventing malicious/buggy DPDK application in guest to make
vhost-user backend write random guest memory. Note that Virtio-net
Kernel driver also support IOMMU.

The series depends on Qemu's "vhost-user: Specify and implement
device IOTLB support" [0], available upstream and which will be part
of Qemu v2.10 release.

Performance-wise, even if this RFC has still room for optimizations,
no performance degradation is noticed with static mappings (i.e. DPDK
on guest) with PVP benchmark:
	Traffic Generator: Moongen (lua-trafficgen)
	Acceptable Loss: 0.005%
	Validation run time: 1 min
	Guest DPDK version/commit: v17.05
	QEMU version/commit: master (6db174aed1fd)
	Virtio features: default
	CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
	NIC: 2 x X710
	Page size: 1G host/1G guest
	Results (bidirectional, total of the two flows):
	 - base: 18.8Mpps
	 - base + IOTLB series, IOMMU OFF: 18.8Mpps
	 - base + IOTLB series, IOMMU ON: 18.8Mpps

This is explained because IOTLB misses, which are very costly, only
happen at startup time. Indeed, once used, the buffers are not
invalidated, so if the IOTLB cache is large enough, there will be only
cache hit. Also, the use of 1G huge pages improves the IOTLB cache
searching time by reducing the number of entries. In next revision of
this series, I plan to provide PVP results with 2MB pages.

With dynamic mappings (i.e. Virtio-net kernel driver), this is another
story. The performance is so poor it makes it almost unusable. Indeed,
since the Kernel driver unmaps the buffers as soon as they are handled,
almost all descriptors buffers addresses translations result in an IOTLB
miss. There is not much that can be done on DPDK side. In Qemu, we may
consider enabling IOMMU MAP notifications, so that DPDK receives the
IOTLB updates without having to send IOTLB miss request.

Regarding the design choices:
 - I initially intended to use userspace RCU library[1] for the cache
implementation, but it would have added an external dependency, and the
lib is not available in all distros. Qemu for example got rid of this
dependency by copying some of the userspace RCU lib parts into Qemu tree,
but this is not possible with DPDK due to licensing issues (RCU lib is
LGPL v2). Thanks to Jason advice, I implemented the cache using rd/wr
locks.
 - I initially implemented a per-device IOTLB cache, but the concurrent
acccesses on the IOTLB lock had huge impact on performance (~-40% in
bidirectionnal, expect even worse with multiqueue). I move to a per-
virtqueue IOTLB design, which prevents this concurrency.
 - The slave IOTLB miss request supports reply-ack feature in spec, but
I moved to a busy-polling on IOTLB event to avoid a deadlock happening
when the device is stopped while a processing thread is waiting for an
IOTLB update.

For those who would like to test the series, I made it available on
gitlab[2] (vhost_user_iotlb_upstream_rfc tag). The guest kernel command
line requires the intel_iommu=on parameter, and the guest should be
started with and iommu device attached to the virtio-net device. For
example:

./qemu-system-x86_64 \
  -enable-kvm -m 4096 -smp 2 \
  -M q35,kernel-irqchip=split \
  -cpu host \
  -device intel-iommu,device-iotlb=on,intremap \
  -device ioh3420,id=root.1,chassis=1 \
  -chardev socket,id=char0,path=/tmp/vhost-user1 \
  -netdev type=vhost-user,id=hn2,chardev=char0 \
  -device virtio-net-pci,netdev=hn2,id=v0,mq=off,mac=$MAC,bus=root.1,disable-modern=off,disable-legacy=on,iommu_platform=on,ats=on \
...

[0]: https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg00520.html
[1]: http://liburcu.org/
[2]: https://gitlab.com/mcoquelin/dpdk-next-virtio/tree/vhost_user_iotlb_upstream_rfc

Maxime Coquelin (19):
  vhost: protect virtio_net device struct
  Revert "vhost: workaround MQ fails to startup"
  vhost: prepare send_vhost_message() to slave requests
  vhost: add support to slave requests channel
  vhost: declare missing IOMMU-related definitions for old kernels
  vhost: add iotlb helper functions
  vhost-user: add support to IOTLB miss slave requests
  vhost: initialize vrings IOTLB caches
  vhost: implement IOTLB events notification mechanism
  vhost-user: handle IOTLB update and invalidate requests
  vhost: introduce guest IOVA to backend VA helper
  vhost: use the guest IOVA to host VA helper
  vhost: enable rings at the right time
  vhost: don't dereference invalid dev pointer after its reallocation
  vhost: postpone rings adresses translation
  vhost-user: translate ring addresses when IOMMU enabled
  vhost-user: iommu: postpone device creation until ring are mapped
  vhost: iommu: Invalidate vring in case of matching IOTLB invalidate
  vhost: enable IOMMU support

 lib/librte_vhost/Makefile     |   4 +-
 lib/librte_vhost/iotlb.c      | 236 ++++++++++++++++++++++++++
 lib/librte_vhost/iotlb.h      |  47 +++++
 lib/librte_vhost/vhost.c      | 350 +++++++++++++++++++++++++++++++++-----
 lib/librte_vhost/vhost.h      |  53 +++++-
 lib/librte_vhost/vhost_user.c | 387 +++++++++++++++++++++++++++++++++---------
 lib/librte_vhost/vhost_user.h |  20 ++-
 lib/librte_vhost/virtio_net.c | 100 ++++++++---
 8 files changed, 1033 insertions(+), 164 deletions(-)
 create mode 100644 lib/librte_vhost/iotlb.c
 create mode 100644 lib/librte_vhost/iotlb.h

-- 
2.9.4

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [RFC 01/19] vhost: protect virtio_net device struct
  2017-07-04  9:49 [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
@ 2017-07-04  9:49 ` Maxime Coquelin
  2017-07-05 10:07   ` Jens Freimann
  2017-07-04  9:49 ` [RFC 02/19] Revert "vhost: workaround MQ fails to startup" Maxime Coquelin
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 23+ messages in thread
From: Maxime Coquelin @ 2017-07-04  9:49 UTC (permalink / raw)
  To: dev, Yuanhan Liu; +Cc: mst, vkaplans, jasowang, jfreiman, Maxime Coquelin

virtio_net device might be accessed while being reallocated
in case of NUMA awareness. This case might be theoretical,
but it will be needed anyway to protect vrings pages against
invalidation.

The virtio_net devs are now protected with a readers/writers
lock, so that before reallocating the device, it is ensured
that it is not being referenced by the processing threads.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.c      | 223 +++++++++++++++++++++++++++++++++++-------
 lib/librte_vhost/vhost.h      |   3 +-
 lib/librte_vhost/vhost_user.c |  73 +++++---------
 lib/librte_vhost/virtio_net.c |  17 +++-
 4 files changed, 228 insertions(+), 88 deletions(-)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 19c5a43..2a4bc91 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -45,16 +45,25 @@
 #include <rte_string_fns.h>
 #include <rte_memory.h>
 #include <rte_malloc.h>
+#include <rte_rwlock.h>
 #include <rte_vhost.h>
 
 #include "vhost.h"
 
-struct virtio_net *vhost_devices[MAX_VHOST_DEVICE];
+struct vhost_device {
+	struct virtio_net *dev;
+	rte_rwlock_t lock;
+};
 
-struct virtio_net *
-get_device(int vid)
+/* Declared as static so that .lock is initialized */
+static struct vhost_device vhost_devices[MAX_VHOST_DEVICE];
+
+static inline struct virtio_net *
+__get_device(int vid)
 {
-	struct virtio_net *dev = vhost_devices[vid];
+	struct virtio_net *dev;
+
+	dev = vhost_devices[vid].dev;
 
 	if (unlikely(!dev)) {
 		RTE_LOG(ERR, VHOST_CONFIG,
@@ -64,6 +73,78 @@ get_device(int vid)
 	return dev;
 }
 
+struct virtio_net *
+get_device(int vid)
+{
+	struct virtio_net *dev;
+
+	rte_rwlock_read_lock(&vhost_devices[vid].lock);
+
+	dev = __get_device(vid);
+	if (unlikely(!dev))
+		rte_rwlock_read_unlock(&vhost_devices[vid].lock);
+
+	return dev;
+}
+
+void
+put_device(int vid)
+{
+	rte_rwlock_read_unlock(&vhost_devices[vid].lock);
+}
+
+static struct virtio_net *
+get_device_wr(int vid)
+{
+	struct virtio_net *dev;
+
+	rte_rwlock_write_lock(&vhost_devices[vid].lock);
+
+	dev = __get_device(vid);
+	if (unlikely(!dev))
+		rte_rwlock_write_unlock(&vhost_devices[vid].lock);
+
+	return dev;
+}
+
+static void
+put_device_wr(int vid)
+{
+	rte_rwlock_write_unlock(&vhost_devices[vid].lock);
+}
+
+int
+realloc_device(int vid, int vq_index, int node)
+{
+	struct virtio_net *dev, *old_dev;
+	struct vhost_virtqueue *vq;
+
+	dev = rte_malloc_socket(NULL, sizeof(*dev), 0, node);
+	if (!dev)
+		return -1;
+
+	vq = rte_malloc_socket(NULL, sizeof(*vq), 0, node);
+	if (!vq)
+		return -1;
+
+	old_dev = get_device_wr(vid);
+	if (!old_dev)
+		return -1;
+
+	memcpy(dev, old_dev, sizeof(*dev));
+	memcpy(vq, old_dev->virtqueue[vq_index], sizeof(*vq));
+	dev->virtqueue[vq_index] = vq;
+
+	rte_free(old_dev->virtqueue[vq_index]);
+	rte_free(old_dev);
+
+	vhost_devices[vid].dev = dev;
+
+	put_device_wr(vid);
+
+	return 0;
+}
+
 static void
 cleanup_vq(struct vhost_virtqueue *vq, int destroy)
 {
@@ -194,7 +275,7 @@ vhost_new_device(void)
 	}
 
 	for (i = 0; i < MAX_VHOST_DEVICE; i++) {
-		if (vhost_devices[i] == NULL)
+		if (vhost_devices[i].dev == NULL)
 			break;
 	}
 	if (i == MAX_VHOST_DEVICE) {
@@ -204,8 +285,10 @@ vhost_new_device(void)
 		return -1;
 	}
 
-	vhost_devices[i] = dev;
+	rte_rwlock_write_lock(&vhost_devices[i].lock);
+	vhost_devices[i].dev = dev;
 	dev->vid = i;
+	rte_rwlock_write_unlock(&vhost_devices[i].lock);
 
 	return i;
 }
@@ -227,10 +310,15 @@ vhost_destroy_device(int vid)
 		dev->notify_ops->destroy_device(vid);
 	}
 
+	put_device(vid);
+	dev = get_device_wr(vid);
+
 	cleanup_device(dev, 1);
 	free_device(dev);
 
-	vhost_devices[vid] = NULL;
+	vhost_devices[vid].dev = NULL;
+
+	put_device_wr(vid);
 }
 
 void
@@ -248,6 +336,8 @@ vhost_set_ifname(int vid, const char *if_name, unsigned int if_len)
 
 	strncpy(dev->ifname, if_name, len);
 	dev->ifname[sizeof(dev->ifname) - 1] = '\0';
+
+	put_device(vid);
 }
 
 void
@@ -259,25 +349,30 @@ vhost_enable_dequeue_zero_copy(int vid)
 		return;
 
 	dev->dequeue_zero_copy = 1;
+
+	put_device(vid);
 }
 
 int
 rte_vhost_get_mtu(int vid, uint16_t *mtu)
 {
 	struct virtio_net *dev = get_device(vid);
+	int ret = 0;
 
 	if (!dev)
 		return -ENODEV;
 
 	if (!(dev->flags & VIRTIO_DEV_READY))
-		return -EAGAIN;
+		ret = -EAGAIN;
 
 	if (!(dev->features & VIRTIO_NET_F_MTU))
-		return -ENOTSUP;
+		ret = -ENOTSUP;
 
 	*mtu = dev->mtu;
 
-	return 0;
+	put_device(vid);
+
+	return ret;
 }
 
 int
@@ -296,9 +391,11 @@ rte_vhost_get_numa_node(int vid)
 	if (ret < 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"(%d) failed to query numa node: %d\n", vid, ret);
-		return -1;
+		numa_node = -1;
 	}
 
+	put_device(vid);
+
 	return numa_node;
 #else
 	RTE_SET_USED(vid);
@@ -310,22 +407,32 @@ uint32_t
 rte_vhost_get_queue_num(int vid)
 {
 	struct virtio_net *dev = get_device(vid);
+	uint32_t queue_num;
 
 	if (dev == NULL)
 		return 0;
 
-	return dev->nr_vring / 2;
+	queue_num = dev->nr_vring / 2;
+
+	put_device(vid);
+
+	return queue_num;
 }
 
 uint16_t
 rte_vhost_get_vring_num(int vid)
 {
 	struct virtio_net *dev = get_device(vid);
+	uint16_t vring_num;
 
 	if (dev == NULL)
 		return 0;
 
-	return dev->nr_vring;
+	vring_num = dev->nr_vring;
+
+	put_device(vid);
+
+	return vring_num;
 }
 
 int
@@ -341,6 +448,8 @@ rte_vhost_get_ifname(int vid, char *buf, size_t len)
 	strncpy(buf, dev->ifname, len);
 	buf[len - 1] = '\0';
 
+	put_device(vid);
+
 	return 0;
 }
 
@@ -354,6 +463,9 @@ rte_vhost_get_negotiated_features(int vid, uint64_t *features)
 		return -1;
 
 	*features = dev->features;
+
+	put_device(vid);
+
 	return 0;
 }
 
@@ -363,6 +475,7 @@ rte_vhost_get_mem_table(int vid, struct rte_vhost_memory **mem)
 	struct virtio_net *dev;
 	struct rte_vhost_memory *m;
 	size_t size;
+	int ret = 0;
 
 	dev = get_device(vid);
 	if (!dev)
@@ -370,14 +483,19 @@ rte_vhost_get_mem_table(int vid, struct rte_vhost_memory **mem)
 
 	size = dev->mem->nregions * sizeof(struct rte_vhost_mem_region);
 	m = malloc(sizeof(struct rte_vhost_memory) + size);
-	if (!m)
-		return -1;
+	if (!m) {
+		ret = -1;
+		goto out;
+	}
 
 	m->nregions = dev->mem->nregions;
 	memcpy(m->regions, dev->mem->regions, size);
 	*mem = m;
 
-	return 0;
+out:
+	put_device(vid);
+
+	return ret;
 }
 
 int
@@ -386,17 +504,22 @@ rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
 {
 	struct virtio_net *dev;
 	struct vhost_virtqueue *vq;
+	int ret = 0;
 
 	dev = get_device(vid);
 	if (!dev)
 		return -1;
 
-	if (vring_idx >= VHOST_MAX_VRING)
-		return -1;
+	if (vring_idx >= VHOST_MAX_VRING) {
+		ret = -1;
+		goto out;
+	}
 
 	vq = dev->virtqueue[vring_idx];
-	if (!vq)
-		return -1;
+	if (!vq) {
+		ret = -1;
+		goto out;
+	}
 
 	vring->desc  = vq->desc;
 	vring->avail = vq->avail;
@@ -407,7 +530,10 @@ rte_vhost_get_vhost_vring(int vid, uint16_t vring_idx,
 	vring->kickfd  = vq->kickfd;
 	vring->size    = vq->size;
 
-	return 0;
+out:
+	put_device(vid);
+
+	return ret;
 }
 
 uint16_t
@@ -415,6 +541,7 @@ rte_vhost_avail_entries(int vid, uint16_t queue_id)
 {
 	struct virtio_net *dev;
 	struct vhost_virtqueue *vq;
+	uint16_t avail_entries = 0;
 
 	dev = get_device(vid);
 	if (!dev)
@@ -422,15 +549,23 @@ rte_vhost_avail_entries(int vid, uint16_t queue_id)
 
 	vq = dev->virtqueue[queue_id];
 	if (!vq->enabled)
-		return 0;
+		goto out;
+
 
-	return *(volatile uint16_t *)&vq->avail->idx - vq->last_used_idx;
+	avail_entries = *(volatile uint16_t *)&vq->avail->idx;
+	avail_entries -= vq->last_used_idx;
+
+out:
+	put_device(vid);
+
+	return avail_entries;
 }
 
 int
 rte_vhost_enable_guest_notification(int vid, uint16_t queue_id, int enable)
 {
 	struct virtio_net *dev = get_device(vid);
+	int ret = 0;
 
 	if (dev == NULL)
 		return -1;
@@ -438,11 +573,16 @@ rte_vhost_enable_guest_notification(int vid, uint16_t queue_id, int enable)
 	if (enable) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"guest notification isn't supported.\n");
-		return -1;
+		ret = -1;
+		goto out;
 	}
 
 	dev->virtqueue[queue_id]->used->flags = VRING_USED_F_NO_NOTIFY;
-	return 0;
+
+out:
+	put_device(vid);
+
+	return ret;
 }
 
 void
@@ -454,6 +594,8 @@ rte_vhost_log_write(int vid, uint64_t addr, uint64_t len)
 		return;
 
 	vhost_log_write(dev, addr, len);
+
+	put_device(vid);
 }
 
 void
@@ -468,12 +610,15 @@ rte_vhost_log_used_vring(int vid, uint16_t vring_idx,
 		return;
 
 	if (vring_idx >= VHOST_MAX_VRING)
-		return;
+		goto out;
 	vq = dev->virtqueue[vring_idx];
 	if (!vq)
-		return;
+		goto out;
 
 	vhost_log_used_vring(dev, vq, offset, len);
+
+out:
+	put_device(vid);
 }
 
 uint32_t
@@ -481,6 +626,7 @@ rte_vhost_rx_queue_count(int vid, uint16_t qid)
 {
 	struct virtio_net *dev;
 	struct vhost_virtqueue *vq;
+	uint32_t queue_count;
 
 	dev = get_device(vid);
 	if (dev == NULL)
@@ -489,15 +635,26 @@ rte_vhost_rx_queue_count(int vid, uint16_t qid)
 	if (unlikely(qid >= dev->nr_vring || (qid & 1) == 0)) {
 		RTE_LOG(ERR, VHOST_DATA, "(%d) %s: invalid virtqueue idx %d.\n",
 			dev->vid, __func__, qid);
-		return 0;
+		queue_count = 0;
+		goto out;
 	}
 
 	vq = dev->virtqueue[qid];
-	if (vq == NULL)
-		return 0;
+	if (vq == NULL) {
+		queue_count = 0;
+		goto out;
+	}
 
-	if (unlikely(vq->enabled == 0 || vq->avail == NULL))
-		return 0;
+	if (unlikely(vq->enabled == 0 || vq->avail == NULL)) {
+		queue_count = 0;
+		goto out;
+	}
+
+	queue_count = *((volatile uint16_t *)&vq->avail->idx);
+	queue_count -= vq->last_avail_idx;
+
+out:
+	put_device(vid);
 
-	return *((volatile uint16_t *)&vq->avail->idx) - vq->last_avail_idx;
+	return queue_count;
 }
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 0f294f3..18ad69c 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -269,7 +269,6 @@ vhost_log_used_vring(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 extern uint64_t VHOST_FEATURES;
 #define MAX_VHOST_DEVICE	1024
-extern struct virtio_net *vhost_devices[MAX_VHOST_DEVICE];
 
 /* Convert guest physical address to host physical address */
 static __rte_always_inline phys_addr_t
@@ -292,6 +291,8 @@ gpa_to_hpa(struct virtio_net *dev, uint64_t gpa, uint64_t size)
 }
 
 struct virtio_net *get_device(int vid);
+void put_device(int vid);
+int realloc_device(int vid, int vq_index, int node);
 
 int vhost_new_device(void);
 void cleanup_device(struct virtio_net *dev, int destroy);
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index ad2e8d3..5b3b881 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -241,62 +241,31 @@ vhost_user_set_vring_num(struct virtio_net *dev,
 static struct virtio_net*
 numa_realloc(struct virtio_net *dev, int index)
 {
-	int oldnode, newnode;
-	struct virtio_net *old_dev;
-	struct vhost_virtqueue *old_vq, *vq;
-	int ret;
+	int oldnode, newnode, vid, ret;
 
-	old_dev = dev;
-	vq = old_vq = dev->virtqueue[index];
+	vid = dev->vid;
 
-	ret = get_mempolicy(&newnode, NULL, 0, old_vq->desc,
+	ret = get_mempolicy(&newnode, NULL, 0, dev->virtqueue[index]->desc,
 			    MPOL_F_NODE | MPOL_F_ADDR);
 
 	/* check if we need to reallocate vq */
-	ret |= get_mempolicy(&oldnode, NULL, 0, old_vq,
+	ret |= get_mempolicy(&oldnode, NULL, 0, dev->virtqueue[index],
 			     MPOL_F_NODE | MPOL_F_ADDR);
 	if (ret) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"Unable to get vq numa information.\n");
 		return dev;
 	}
-	if (oldnode != newnode) {
-		RTE_LOG(INFO, VHOST_CONFIG,
-			"reallocate vq from %d to %d node\n", oldnode, newnode);
-		vq = rte_malloc_socket(NULL, sizeof(*vq), 0, newnode);
-		if (!vq)
-			return dev;
-
-		memcpy(vq, old_vq, sizeof(*vq));
-		rte_free(old_vq);
-	}
 
-	/* check if we need to reallocate dev */
-	ret = get_mempolicy(&oldnode, NULL, 0, old_dev,
-			    MPOL_F_NODE | MPOL_F_ADDR);
-	if (ret) {
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"Unable to get dev numa information.\n");
-		goto out;
-	}
 	if (oldnode != newnode) {
 		RTE_LOG(INFO, VHOST_CONFIG,
-			"reallocate dev from %d to %d node\n",
-			oldnode, newnode);
-		dev = rte_malloc_socket(NULL, sizeof(*dev), 0, newnode);
-		if (!dev) {
-			dev = old_dev;
-			goto out;
-		}
-
-		memcpy(dev, old_dev, sizeof(*dev));
-		rte_free(old_dev);
+			"reallocate vq from %d to %d node\n", oldnode, newnode);
+		put_device(vid);
+		if (realloc_device(vid, index, newnode))
+			RTE_LOG(ERR, VHOST_CONFIG, "Failed to realloc device\n");
+		dev = get_device(vid);
 	}
 
-out:
-	dev->virtqueue[index] = vq;
-	vhost_devices[dev->vid] = dev;
-
 	return dev;
 }
 #else
@@ -336,9 +305,10 @@ qva_to_vva(struct virtio_net *dev, uint64_t qva)
  * This function then converts these to our address space.
  */
 static int
-vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg *msg)
+vhost_user_set_vring_addr(struct virtio_net **pdev, VhostUserMsg *msg)
 {
 	struct vhost_virtqueue *vq;
+	struct virtio_net *dev = *pdev;
 
 	if (dev->mem == NULL)
 		return -1;
@@ -356,7 +326,7 @@ vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg *msg)
 		return -1;
 	}
 
-	dev = numa_realloc(dev, msg->payload.addr.index);
+	*pdev = dev = numa_realloc(dev, msg->payload.addr.index);
 	vq = dev->virtqueue[msg->payload.addr.index];
 
 	vq->avail = (struct vring_avail *)(uintptr_t)qva_to_vva(dev,
@@ -966,7 +936,7 @@ vhost_user_msg_handler(int vid, int fd)
 {
 	struct virtio_net *dev;
 	struct VhostUserMsg msg;
-	int ret;
+	int ret = 0;
 
 	dev = get_device(vid);
 	if (dev == NULL)
@@ -978,7 +948,8 @@ vhost_user_msg_handler(int vid, int fd)
 			RTE_LOG(ERR, VHOST_CONFIG,
 				"failed to get callback ops for driver %s\n",
 				dev->ifname);
-			return -1;
+			ret = -1;
+			goto out;
 		}
 	}
 
@@ -994,10 +965,10 @@ vhost_user_msg_handler(int vid, int fd)
 			RTE_LOG(ERR, VHOST_CONFIG,
 				"vhost read incorrect message\n");
 
-		return -1;
+		ret = -1;
+		goto out;
 	}
 
-	ret = 0;
 	RTE_LOG(INFO, VHOST_CONFIG, "read message %s\n",
 		vhost_message_str[msg.request]);
 
@@ -1005,7 +976,8 @@ vhost_user_msg_handler(int vid, int fd)
 	if (ret < 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"failed to alloc queue\n");
-		return -1;
+		ret = -1;
+		goto out;
 	}
 
 	switch (msg.request) {
@@ -1054,7 +1026,7 @@ vhost_user_msg_handler(int vid, int fd)
 		vhost_user_set_vring_num(dev, &msg);
 		break;
 	case VHOST_USER_SET_VRING_ADDR:
-		vhost_user_set_vring_addr(dev, &msg);
+		vhost_user_set_vring_addr(&dev, &msg);
 		break;
 	case VHOST_USER_SET_VRING_BASE:
 		vhost_user_set_vring_base(dev, &msg);
@@ -1122,5 +1094,8 @@ vhost_user_msg_handler(int vid, int fd)
 		}
 	}
 
-	return 0;
+out:
+	put_device(vid);
+
+	return ret;
 }
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index ebfda1c..726d349 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -587,14 +587,19 @@ rte_vhost_enqueue_burst(int vid, uint16_t queue_id,
 	struct rte_mbuf **pkts, uint16_t count)
 {
 	struct virtio_net *dev = get_device(vid);
+	int ret = 0;
 
 	if (!dev)
 		return 0;
 
 	if (dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF))
-		return virtio_dev_merge_rx(dev, queue_id, pkts, count);
+		ret = virtio_dev_merge_rx(dev, queue_id, pkts, count);
 	else
-		return virtio_dev_rx(dev, queue_id, pkts, count);
+		ret = virtio_dev_rx(dev, queue_id, pkts, count);
+
+	put_device(vid);
+
+	return ret;
 }
 
 static inline bool
@@ -993,12 +998,12 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 	if (unlikely(!is_valid_virt_queue_idx(queue_id, 1, dev->nr_vring))) {
 		RTE_LOG(ERR, VHOST_DATA, "(%d) %s: invalid virtqueue idx %d.\n",
 			dev->vid, __func__, queue_id);
-		return 0;
+		goto out;
 	}
 
 	vq = dev->virtqueue[queue_id];
 	if (unlikely(vq->enabled == 0))
-		return 0;
+		goto out;
 
 	if (unlikely(dev->dequeue_zero_copy)) {
 		struct zcopy_mbuf *zmbuf, *next;
@@ -1048,7 +1053,7 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 		if (rarp_mbuf == NULL) {
 			RTE_LOG(ERR, VHOST_DATA,
 				"Failed to allocate memory for mbuf.\n");
-			return 0;
+			goto out;
 		}
 
 		if (make_rarp_packet(rarp_mbuf, &dev->mac)) {
@@ -1167,5 +1172,7 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 		i += 1;
 	}
 
+	put_device(vid);
+
 	return i;
 }
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 02/19] Revert "vhost: workaround MQ fails to startup"
  2017-07-04  9:49 [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
  2017-07-04  9:49 ` [RFC 01/19] vhost: protect virtio_net device struct Maxime Coquelin
@ 2017-07-04  9:49 ` Maxime Coquelin
  2017-07-04  9:49 ` [RFC 03/19] vhost: prepare send_vhost_message() to slave requests Maxime Coquelin
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2017-07-04  9:49 UTC (permalink / raw)
  To: dev, Yuanhan Liu; +Cc: mst, vkaplans, jasowang, jfreiman, Maxime Coquelin

This reverts commit 04d81227960b5c1cf2f11f492100979ead20c526.
---
 lib/librte_vhost/vhost_user.h | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 35ebd71..2ba22db 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -49,14 +49,10 @@
 #define VHOST_USER_PROTOCOL_F_REPLY_ACK	3
 #define VHOST_USER_PROTOCOL_F_NET_MTU 4
 
-/*
- * disable REPLY_ACK feature to workaround the buggy QEMU implementation.
- * Proved buggy QEMU includes v2.7 - v2.9.
- */
 #define VHOST_USER_PROTOCOL_FEATURES	((1ULL << VHOST_USER_PROTOCOL_F_MQ) | \
 					 (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) |\
 					 (1ULL << VHOST_USER_PROTOCOL_F_RARP) | \
-					 (0ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK) | \
+					 (1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK) | \
 					 (1ULL << VHOST_USER_PROTOCOL_F_NET_MTU))
 
 typedef enum VhostUserRequest {
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 03/19] vhost: prepare send_vhost_message() to slave requests
  2017-07-04  9:49 [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
  2017-07-04  9:49 ` [RFC 01/19] vhost: protect virtio_net device struct Maxime Coquelin
  2017-07-04  9:49 ` [RFC 02/19] Revert "vhost: workaround MQ fails to startup" Maxime Coquelin
@ 2017-07-04  9:49 ` Maxime Coquelin
  2017-07-04  9:49 ` [RFC 04/19] vhost: add support to slave requests channel Maxime Coquelin
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2017-07-04  9:49 UTC (permalink / raw)
  To: dev, Yuanhan Liu; +Cc: mst, vkaplans, jasowang, jfreiman, Maxime Coquelin

send_vhost_message() is currently only used to send
replies, so it modifies message flags to perpare the
reply.

With upcoming channel for backend initiated request,
this function can be used to send requests.

This patch introduces a new send_vhost_reply() that
does the message flags modifications, and makes
send_vhost_message() generic.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.c | 27 ++++++++++++++++-----------
 1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 5b3b881..8984dcb 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -877,8 +877,16 @@ read_vhost_message(int sockfd, struct VhostUserMsg *msg)
 static int
 send_vhost_message(int sockfd, struct VhostUserMsg *msg)
 {
-	int ret;
+	if (!msg)
+		return 0;
+
+	return send_fd_message(sockfd, (char *)msg,
+		VHOST_USER_HDR_SIZE + msg->size, NULL, 0);
+}
 
+static int
+send_vhost_reply(int sockfd, struct VhostUserMsg *msg)
+{
 	if (!msg)
 		return 0;
 
@@ -887,10 +895,7 @@ send_vhost_message(int sockfd, struct VhostUserMsg *msg)
 	msg->flags |= VHOST_USER_VERSION;
 	msg->flags |= VHOST_USER_REPLY_MASK;
 
-	ret = send_fd_message(sockfd, (char *)msg,
-		VHOST_USER_HDR_SIZE + msg->size, NULL, 0);
-
-	return ret;
+	return send_vhost_message(sockfd, msg);
 }
 
 /*
@@ -984,7 +989,7 @@ vhost_user_msg_handler(int vid, int fd)
 	case VHOST_USER_GET_FEATURES:
 		msg.payload.u64 = vhost_user_get_features(dev);
 		msg.size = sizeof(msg.payload.u64);
-		send_vhost_message(fd, &msg);
+		send_vhost_reply(fd, &msg);
 		break;
 	case VHOST_USER_SET_FEATURES:
 		vhost_user_set_features(dev, msg.payload.u64);
@@ -993,7 +998,7 @@ vhost_user_msg_handler(int vid, int fd)
 	case VHOST_USER_GET_PROTOCOL_FEATURES:
 		msg.payload.u64 = VHOST_USER_PROTOCOL_FEATURES;
 		msg.size = sizeof(msg.payload.u64);
-		send_vhost_message(fd, &msg);
+		send_vhost_reply(fd, &msg);
 		break;
 	case VHOST_USER_SET_PROTOCOL_FEATURES:
 		vhost_user_set_protocol_features(dev, msg.payload.u64);
@@ -1015,7 +1020,7 @@ vhost_user_msg_handler(int vid, int fd)
 
 		/* it needs a reply */
 		msg.size = sizeof(msg.payload.u64);
-		send_vhost_message(fd, &msg);
+		send_vhost_reply(fd, &msg);
 		break;
 	case VHOST_USER_SET_LOG_FD:
 		close(msg.fds[0]);
@@ -1035,7 +1040,7 @@ vhost_user_msg_handler(int vid, int fd)
 	case VHOST_USER_GET_VRING_BASE:
 		vhost_user_get_vring_base(dev, &msg);
 		msg.size = sizeof(msg.payload.state);
-		send_vhost_message(fd, &msg);
+		send_vhost_reply(fd, &msg);
 		break;
 
 	case VHOST_USER_SET_VRING_KICK:
@@ -1054,7 +1059,7 @@ vhost_user_msg_handler(int vid, int fd)
 	case VHOST_USER_GET_QUEUE_NUM:
 		msg.payload.u64 = VHOST_MAX_QUEUE_PAIRS;
 		msg.size = sizeof(msg.payload.u64);
-		send_vhost_message(fd, &msg);
+		send_vhost_reply(fd, &msg);
 		break;
 
 	case VHOST_USER_SET_VRING_ENABLE:
@@ -1077,7 +1082,7 @@ vhost_user_msg_handler(int vid, int fd)
 	if (msg.flags & VHOST_USER_NEED_REPLY) {
 		msg.payload.u64 = !!ret;
 		msg.size = sizeof(msg.payload.u64);
-		send_vhost_message(fd, &msg);
+		send_vhost_reply(fd, &msg);
 	}
 
 	if (!(dev->flags & VIRTIO_DEV_RUNNING) && virtio_is_ready(dev)) {
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 04/19] vhost: add support to slave requests channel
  2017-07-04  9:49 [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (2 preceding siblings ...)
  2017-07-04  9:49 ` [RFC 03/19] vhost: prepare send_vhost_message() to slave requests Maxime Coquelin
@ 2017-07-04  9:49 ` Maxime Coquelin
  2017-07-04  9:49 ` [RFC 05/19] vhost: declare missing IOMMU-related definitions for old kernels Maxime Coquelin
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2017-07-04  9:49 UTC (permalink / raw)
  To: dev, Yuanhan Liu; +Cc: mst, vkaplans, jasowang, jfreiman, Maxime Coquelin

Currently, only QEMU sends requests, the backend sends
replies. In some cases, the backend may need to send
requests to QEMU, like IOTLB miss events when IOMMU is
supported.

This patch introduces a new channel for such requests.
QEMU sends a file descriptor of a new socket using
VHOST_USER_SET_SLAVE_REQ_FD.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.h      |  2 ++
 lib/librte_vhost/vhost_user.c | 27 +++++++++++++++++++++++++++
 lib/librte_vhost/vhost_user.h | 10 +++++++++-
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 18ad69c..2340b0c 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -196,6 +196,8 @@ struct virtio_net {
 	uint32_t		nr_guest_pages;
 	uint32_t		max_guest_pages;
 	struct guest_page       *guest_pages;
+
+	int			slave_req_fd;
 } __rte_cache_aligned;
 
 
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 8984dcb..7b3c2f3 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -76,6 +76,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_SET_VRING_ENABLE]  = "VHOST_USER_SET_VRING_ENABLE",
 	[VHOST_USER_SEND_RARP]  = "VHOST_USER_SEND_RARP",
 	[VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
+	[VHOST_USER_SET_SLAVE_REQ_FD]  = "VHOST_USER_SET_SLAVE_REQ_FD",
 };
 
 static uint64_t
@@ -122,6 +123,11 @@ vhost_backend_cleanup(struct virtio_net *dev)
 		munmap((void *)(uintptr_t)dev->log_addr, dev->log_size);
 		dev->log_addr = 0;
 	}
+
+	if (dev->slave_req_fd >= 0) {
+		close(dev->slave_req_fd);
+		dev->slave_req_fd = -1;
+	}
 }
 
 /*
@@ -844,6 +850,23 @@ vhost_user_net_set_mtu(struct virtio_net *dev, struct VhostUserMsg *msg)
 	return 0;
 }
 
+static int
+vhost_user_set_req_fd(struct virtio_net *dev, struct VhostUserMsg *msg)
+{
+	int fd = msg->fds[0];
+
+	if (fd < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+				"Invalid file descriptor for slave channel (%d)\n",
+				fd);
+		return -1;
+	}
+
+	dev->slave_req_fd = fd;
+
+	return 0;
+}
+
 /* return bytes# of read on success or negative val on failure. */
 static int
 read_vhost_message(int sockfd, struct VhostUserMsg *msg)
@@ -1073,6 +1096,10 @@ vhost_user_msg_handler(int vid, int fd)
 		ret = vhost_user_net_set_mtu(dev, &msg);
 		break;
 
+	case VHOST_USER_SET_SLAVE_REQ_FD:
+		ret = vhost_user_set_req_fd(dev, &msg);
+		break;
+
 	default:
 		ret = -1;
 		break;
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 2ba22db..98f6e6f 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -48,12 +48,14 @@
 #define VHOST_USER_PROTOCOL_F_RARP	2
 #define VHOST_USER_PROTOCOL_F_REPLY_ACK	3
 #define VHOST_USER_PROTOCOL_F_NET_MTU 4
+#define VHOST_USER_PROTOCOL_F_SLAVE_REQ 5
 
 #define VHOST_USER_PROTOCOL_FEATURES	((1ULL << VHOST_USER_PROTOCOL_F_MQ) | \
 					 (1ULL << VHOST_USER_PROTOCOL_F_LOG_SHMFD) |\
 					 (1ULL << VHOST_USER_PROTOCOL_F_RARP) | \
 					 (1ULL << VHOST_USER_PROTOCOL_F_REPLY_ACK) | \
-					 (1ULL << VHOST_USER_PROTOCOL_F_NET_MTU))
+					 (1ULL << VHOST_USER_PROTOCOL_F_NET_MTU) | \
+					 (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_REQ))
 
 typedef enum VhostUserRequest {
 	VHOST_USER_NONE = 0,
@@ -77,9 +79,15 @@ typedef enum VhostUserRequest {
 	VHOST_USER_SET_VRING_ENABLE = 18,
 	VHOST_USER_SEND_RARP = 19,
 	VHOST_USER_NET_SET_MTU = 20,
+	VHOST_USER_SET_SLAVE_REQ_FD = 21,
 	VHOST_USER_MAX
 } VhostUserRequest;
 
+typedef enum VhostUserSlaveRequest {
+	VHOST_USER_SLAVE_NONE = 0,
+	VHOST_USER_SLAVE_MAX
+} VhostUserSlaveRequest;
+
 typedef struct VhostUserMemoryRegion {
 	uint64_t guest_phys_addr;
 	uint64_t memory_size;
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 05/19] vhost: declare missing IOMMU-related definitions for old kernels
  2017-07-04  9:49 [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (3 preceding siblings ...)
  2017-07-04  9:49 ` [RFC 04/19] vhost: add support to slave requests channel Maxime Coquelin
@ 2017-07-04  9:49 ` Maxime Coquelin
  2017-07-04  9:49 ` [RFC 06/19] vhost: add iotlb helper functions Maxime Coquelin
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2017-07-04  9:49 UTC (permalink / raw)
  To: dev, Yuanhan Liu; +Cc: mst, vkaplans, jasowang, jfreiman, Maxime Coquelin

These defines and enums have been introduced in upstream kernel v4.8,
and backported to RHEL 7.4.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.h | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 2340b0c..5c9d931 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -132,6 +132,37 @@ struct vhost_virtqueue {
  #define VIRTIO_NET_F_MTU 3
 #endif
 
+/* Declare IOMMU related bits for older kernels */
+#ifndef VIRTIO_F_IOMMU_PLATFORM
+
+#define VIRTIO_F_IOMMU_PLATFORM 33
+
+struct vhost_iotlb_msg {
+	__u64 iova;
+	__u64 size;
+	__u64 uaddr;
+#define VHOST_ACCESS_RO      0x1
+#define VHOST_ACCESS_WO      0x2
+#define VHOST_ACCESS_RW      0x3
+	__u8 perm;
+#define VHOST_IOTLB_MISS           1
+#define VHOST_IOTLB_UPDATE         2
+#define VHOST_IOTLB_INVALIDATE     3
+#define VHOST_IOTLB_ACCESS_FAIL    4
+	__u8 type;
+};
+
+#define VHOST_IOTLB_MSG 0x1
+
+struct vhost_msg {
+	int type;
+	union {
+		struct vhost_iotlb_msg iotlb;
+		__u8 padding[64];
+	};
+};
+#endif
+
 /*
  * Define virtio 1.0 for older kernels
  */
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 06/19] vhost: add iotlb helper functions
  2017-07-04  9:49 [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (4 preceding siblings ...)
  2017-07-04  9:49 ` [RFC 05/19] vhost: declare missing IOMMU-related definitions for old kernels Maxime Coquelin
@ 2017-07-04  9:49 ` Maxime Coquelin
  2017-07-04  9:49 ` [RFC 07/19] vhost-user: add support to IOTLB miss slave requests Maxime Coquelin
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2017-07-04  9:49 UTC (permalink / raw)
  To: dev, Yuanhan Liu; +Cc: mst, vkaplans, jasowang, jfreiman, Maxime Coquelin

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/Makefile |   4 +-
 lib/librte_vhost/iotlb.c  | 235 ++++++++++++++++++++++++++++++++++++++++++++++
 lib/librte_vhost/iotlb.h  |  47 ++++++++++
 lib/librte_vhost/vhost.c  |   1 +
 lib/librte_vhost/vhost.h  |   5 +
 5 files changed, 290 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_vhost/iotlb.c
 create mode 100644 lib/librte_vhost/iotlb.h

diff --git a/lib/librte_vhost/Makefile b/lib/librte_vhost/Makefile
index 4a116fe..e1084ab 100644
--- a/lib/librte_vhost/Makefile
+++ b/lib/librte_vhost/Makefile
@@ -47,8 +47,8 @@ LDLIBS += -lnuma
 endif
 
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c socket.c vhost.c vhost_user.c \
-				   virtio_net.c
+SRCS-$(CONFIG_RTE_LIBRTE_VHOST) := fd_man.c iotlb.c socket.c vhost.c \
+					vhost_user.c virtio_net.c
 
 # install includes
 SYMLINK-$(CONFIG_RTE_LIBRTE_VHOST)-include += rte_vhost.h
diff --git a/lib/librte_vhost/iotlb.c b/lib/librte_vhost/iotlb.c
new file mode 100644
index 0000000..02457fa
--- /dev/null
+++ b/lib/librte_vhost/iotlb.c
@@ -0,0 +1,235 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2017 Red Hat, Inc.
+ *   Copyright (c) 2017 Maxime Coquelin <maxime.coquelin@redhat.com>
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifdef RTE_LIBRTE_VHOST_NUMA
+#include <numaif.h>
+#endif
+
+#include <rte_tailq.h>
+
+#include "iotlb.h"
+#include "vhost.h"
+
+struct vhost_iotlb_entry {
+	TAILQ_ENTRY(vhost_iotlb_entry) next;
+
+	uint64_t iova;
+	uint64_t uaddr;
+	uint64_t size;
+	uint8_t perm;
+};
+
+/* ToDo: refine cache size */
+#define IOTLB_CACHE_SIZE 1024
+
+/* ToDo: Coalesce contiguous entries? */
+void vhost_user_iotlb_insert(struct vhost_virtqueue *vq, uint64_t iova,
+				uint64_t uaddr, uint64_t size, uint8_t perm)
+{
+	struct vhost_iotlb_entry *node, *new_node;
+	int ret;
+
+	ret = rte_mempool_get(vq->iotlb_pool, (void **)&new_node);
+	if (ret) {
+		RTE_LOG(ERR, VHOST_CONFIG, "IOTLB pool empty, invalidate cache\n");
+		vhost_user_iotlb_remove_all(vq);
+		ret = rte_mempool_get(vq->iotlb_pool, (void **)&new_node);
+		if (ret) {
+			RTE_LOG(ERR, VHOST_CONFIG, "IOTLB pool still empty, failure\n");
+			return;
+		}
+	}
+
+	new_node->iova = iova;
+	new_node->uaddr = uaddr;
+	new_node->size = size;
+	new_node->perm = perm;
+
+	rte_rwlock_write_lock(&vq->iotlb_lock);
+
+	TAILQ_FOREACH(node, &vq->iotlb_list, next) {
+		/*
+		 * IIUC, entries must be invalidated before being updated.
+		 * So if iova already in list, assume identical.
+		 */
+		if (node->iova == new_node->iova) {
+			rte_mempool_put(vq->iotlb_pool, new_node);
+			goto unlock;
+		} else if (node->iova > new_node->iova) {
+			TAILQ_INSERT_BEFORE(node, new_node, next);
+			goto unlock;
+		}
+	}
+
+	TAILQ_INSERT_TAIL(&vq->iotlb_list, new_node, next);
+
+unlock:
+	rte_rwlock_write_unlock(&vq->iotlb_lock);
+}
+
+void vhost_user_iotlb_remove(struct vhost_virtqueue *vq,
+					uint64_t iova, uint64_t size)
+{
+	struct vhost_iotlb_entry *node, *temp_node;
+
+	if (unlikely(!size))
+		return;
+
+	rte_rwlock_write_lock(&vq->iotlb_lock);
+
+	TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+		/* Sorted list */
+		if (unlikely(node->iova >= iova + size)) {
+			break;
+		} else if ((node->iova < iova + size) &&
+					(iova < node->iova + node->size)) {
+			TAILQ_REMOVE(&vq->iotlb_list, node, next);
+			rte_mempool_put(vq->iotlb_pool, node);
+			continue;
+		}
+	}
+
+	rte_rwlock_write_unlock(&vq->iotlb_lock);
+}
+
+void vhost_user_iotlb_remove_all(struct vhost_virtqueue *vq)
+{
+	struct vhost_iotlb_entry *node, *temp_node;
+
+	rte_rwlock_write_lock(&vq->iotlb_lock);
+
+	TAILQ_FOREACH_SAFE(node, &vq->iotlb_list, next, temp_node) {
+		TAILQ_REMOVE(&vq->iotlb_list, node, next);
+		rte_mempool_put(vq->iotlb_pool, node);
+	}
+
+	rte_rwlock_write_unlock(&vq->iotlb_lock);
+}
+
+uint64_t vhost_user_iotlb_find(struct vhost_virtqueue *vq, uint64_t iova,
+						uint64_t *size, uint8_t perm)
+{
+	struct vhost_iotlb_entry *node;
+	uint64_t offset, vva = 0, mapped = 0;
+
+	if (unlikely(!*size))
+		goto out;
+
+	rte_rwlock_read_lock(&vq->iotlb_lock);
+
+	TAILQ_FOREACH(node, &vq->iotlb_list, next) {
+		/* List sorted by iova */
+		if (unlikely(iova < node->iova))
+			break;
+
+		if (iova >= node->iova + node->size)
+			continue;
+
+		if (unlikely((perm & node->perm) != perm)) {
+			vva = 0;
+			break;
+		}
+
+		offset = iova - node->iova;
+		if (!vva)
+			vva = node->uaddr + offset;
+
+		mapped += node->size - offset;
+		iova = node->iova + node->size;
+
+		if (mapped >= *size)
+			break;
+	}
+
+	rte_rwlock_read_unlock(&vq->iotlb_lock);
+
+out:
+	if (mapped < *size)
+		*size = mapped;
+
+	/* Only part of the requested chunk is mapped */
+	if (unlikely(mapped < *size))
+		*size = mapped;
+
+	return vva;
+}
+
+int vhost_user_iotlb_init(struct virtio_net *dev, int vq_index)
+{
+	char pool_name[RTE_MEMPOOL_NAMESIZE];
+	struct vhost_virtqueue *vq = dev->virtqueue[vq_index];
+	int ret = -1, socket;
+
+	if (vq->iotlb_pool) {
+		/*
+		 * The cache has already been initialized,
+		 * just drop all entries
+		 */
+		vhost_user_iotlb_remove_all(vq);
+		return 0;
+	}
+
+#ifdef RTE_LIBRTE_VHOST_NUMA
+	ret = get_mempolicy(&socket, NULL, 0, vq, MPOL_F_NODE | MPOL_F_ADDR);
+#endif
+	if (ret)
+		socket = 0;
+
+	rte_rwlock_init(&vq->iotlb_lock);
+
+	TAILQ_INIT(&vq->iotlb_list);
+
+	snprintf(pool_name, sizeof(pool_name), "iotlb_cache_%d_%d",
+			dev->vid, vq_index);
+
+	/* If already created, free it and recreate */
+	vq->iotlb_pool = rte_mempool_lookup(pool_name);
+	if (vq->iotlb_pool)
+		rte_mempool_free(vq->iotlb_pool);
+
+	vq->iotlb_pool = rte_mempool_create(pool_name,
+			IOTLB_CACHE_SIZE, sizeof(struct vhost_iotlb_entry), 0,
+			0, 0, NULL, NULL, NULL, socket,
+			MEMPOOL_F_NO_CACHE_ALIGN |
+			MEMPOOL_F_SP_PUT |
+			MEMPOOL_F_SC_GET);
+	if (!vq->iotlb_pool) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+				"Failed to create IOTLB cache pool (%s)\n",
+				pool_name);
+		return -1;
+	}
+
+	return 0;
+}
+
diff --git a/lib/librte_vhost/iotlb.h b/lib/librte_vhost/iotlb.h
new file mode 100644
index 0000000..68a43ec
--- /dev/null
+++ b/lib/librte_vhost/iotlb.h
@@ -0,0 +1,47 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) 2017 Red Hat, Inc.
+ *   Copyright (c) 2017 Maxime Coquelin <maxime.coquelin@redhat.com>
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#ifndef _VHOST_IOTLB_H_
+#define _VHOST_IOTLB_H_
+
+#include "vhost.h"
+void vhost_user_iotlb_insert(struct vhost_virtqueue *vq, uint64_t iova,
+					uint64_t uaddr, uint64_t size,
+					uint8_t perm);
+void vhost_user_iotlb_remove(struct vhost_virtqueue *vq,
+					uint64_t iova, uint64_t size);
+void vhost_user_iotlb_remove_all(struct vhost_virtqueue *vq);
+uint64_t vhost_user_iotlb_find(struct vhost_virtqueue *vq, uint64_t iova,
+					uint64_t *size, uint8_t perm);
+int vhost_user_iotlb_init(struct virtio_net *dev, int vq_index);
+
+#endif /* _VHOST_IOTLB_H_ */
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 2a4bc91..5ca4de4 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -182,6 +182,7 @@ free_device(struct virtio_net *dev)
 		vq = dev->virtqueue[i];
 
 		rte_free(vq->shadow_used_ring);
+		rte_mempool_free(vq->iotlb_pool);
 
 		rte_free(vq);
 	}
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 5c9d931..7816a92 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -45,6 +45,7 @@
 
 #include <rte_log.h>
 #include <rte_ether.h>
+#include <rte_rwlock.h>
 
 #include "rte_vhost.h"
 
@@ -114,6 +115,10 @@ struct vhost_virtqueue {
 
 	struct vring_used_elem  *shadow_used_ring;
 	uint16_t                shadow_used_idx;
+
+	rte_rwlock_t	iotlb_lock;
+	struct rte_mempool *iotlb_pool;
+	TAILQ_HEAD(, vhost_iotlb_entry) iotlb_list;
 } __rte_cache_aligned;
 
 /* Old kernels have no such macros defined */
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 07/19] vhost-user: add support to IOTLB miss slave requests
  2017-07-04  9:49 [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (5 preceding siblings ...)
  2017-07-04  9:49 ` [RFC 06/19] vhost: add iotlb helper functions Maxime Coquelin
@ 2017-07-04  9:49 ` Maxime Coquelin
  2017-07-04  9:49 ` [RFC 08/19] vhost: initialize vrings IOTLB caches Maxime Coquelin
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2017-07-04  9:49 UTC (permalink / raw)
  To: dev, Yuanhan Liu; +Cc: mst, vkaplans, jasowang, jfreiman, Maxime Coquelin

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.c | 25 +++++++++++++++++++++++++
 lib/librte_vhost/vhost_user.h |  3 +++
 2 files changed, 28 insertions(+)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 7b3c2f3..8b4a2b3 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1131,3 +1131,28 @@ vhost_user_msg_handler(int vid, int fd)
 
 	return ret;
 }
+
+int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm)
+{
+	int ret;
+	struct VhostUserMsg msg = {
+		.request = VHOST_USER_SLAVE_IOTLB_MSG,
+		.flags = VHOST_USER_VERSION,
+		.size = sizeof(msg.payload.iotlb),
+		.payload.iotlb = {
+			.iova = iova,
+			.perm = perm,
+			.type = VHOST_IOTLB_MISS,
+		},
+	};
+
+	ret = send_vhost_message(dev->slave_req_fd, &msg);
+	if (ret < 0) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+				"Failed to send IOTLB miss message (%d)\n",
+				ret);
+		return ret;
+	}
+
+	return 0;
+}
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 98f6e6f..0b2aff1 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -85,6 +85,7 @@ typedef enum VhostUserRequest {
 
 typedef enum VhostUserSlaveRequest {
 	VHOST_USER_SLAVE_NONE = 0,
+	VHOST_USER_SLAVE_IOTLB_MSG = 1,
 	VHOST_USER_SLAVE_MAX
 } VhostUserSlaveRequest;
 
@@ -122,6 +123,7 @@ typedef struct VhostUserMsg {
 		struct vhost_vring_addr addr;
 		VhostUserMemory memory;
 		VhostUserLog    log;
+		struct vhost_iotlb_msg iotlb;
 	} payload;
 	int fds[VHOST_MEMORY_MAX_NREGIONS];
 } __attribute((packed)) VhostUserMsg;
@@ -134,6 +136,7 @@ typedef struct VhostUserMsg {
 
 /* vhost_user.c */
 int vhost_user_msg_handler(int vid, int fd);
+int vhost_user_iotlb_miss(struct virtio_net *dev, uint64_t iova, uint8_t perm);
 
 /* socket.c */
 int read_fd_message(int sockfd, char *buf, int buflen, int *fds, int fd_num);
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 08/19] vhost: initialize vrings IOTLB caches
  2017-07-04  9:49 [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (6 preceding siblings ...)
  2017-07-04  9:49 ` [RFC 07/19] vhost-user: add support to IOTLB miss slave requests Maxime Coquelin
@ 2017-07-04  9:49 ` Maxime Coquelin
  2017-07-04  9:49 ` [RFC 09/19] vhost: implement IOTLB events notification mechanism Maxime Coquelin
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2017-07-04  9:49 UTC (permalink / raw)
  To: dev, Yuanhan Liu; +Cc: mst, vkaplans, jasowang, jfreiman, Maxime Coquelin

The per-virtqueue IOTLB cache init is done at virtqueue
init time. init_vring_queue() now takes vring id as parameter,
so that the IOTLB cache mempool name can be generated.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 5ca4de4..06b8f20 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -48,6 +48,7 @@
 #include <rte_rwlock.h>
 #include <rte_vhost.h>
 
+#include "iotlb.h"
 #include "vhost.h"
 
 struct vhost_device {
@@ -191,13 +192,16 @@ free_device(struct virtio_net *dev)
 }
 
 static void
-init_vring_queue(struct vhost_virtqueue *vq)
+init_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
 {
+	struct vhost_virtqueue *vq = dev->virtqueue[vring_idx];
+
 	memset(vq, 0, sizeof(struct vhost_virtqueue));
 
 	vq->kickfd = VIRTIO_UNINITIALIZED_EVENTFD;
 	vq->callfd = VIRTIO_UNINITIALIZED_EVENTFD;
 
+	vhost_user_iotlb_init(dev, vring_idx);
 	/* Backends are set to -1 indicating an inactive device. */
 	vq->backend = -1;
 
@@ -211,12 +215,13 @@ init_vring_queue(struct vhost_virtqueue *vq)
 }
 
 static void
-reset_vring_queue(struct vhost_virtqueue *vq)
+reset_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
 {
+	struct vhost_virtqueue *vq = dev->virtqueue[vring_idx];
 	int callfd;
 
 	callfd = vq->callfd;
-	init_vring_queue(vq);
+	init_vring_queue(dev, vring_idx);
 	vq->callfd = callfd;
 }
 
@@ -233,7 +238,7 @@ alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
 	}
 
 	dev->virtqueue[vring_idx] = vq;
-	init_vring_queue(vq);
+	init_vring_queue(dev, vring_idx);
 
 	dev->nr_vring += 1;
 
@@ -255,7 +260,7 @@ reset_device(struct virtio_net *dev)
 	dev->flags = 0;
 
 	for (i = 0; i < dev->nr_vring; i++)
-		reset_vring_queue(dev->virtqueue[i]);
+		reset_vring_queue(dev, i);
 }
 
 /*
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 09/19] vhost: implement IOTLB events notification mechanism
  2017-07-04  9:49 [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (7 preceding siblings ...)
  2017-07-04  9:49 ` [RFC 08/19] vhost: initialize vrings IOTLB caches Maxime Coquelin
@ 2017-07-04  9:49 ` Maxime Coquelin
  2017-07-04  9:49 ` [RFC 10/19] vhost-user: handle IOTLB update and invalidate requests Maxime Coquelin
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2017-07-04  9:49 UTC (permalink / raw)
  To: dev, Yuanhan Liu; +Cc: mst, vkaplans, jasowang, jfreiman, Maxime Coquelin

In case of IOTLB miss, the processing threads wait for the
IOTLB events to parse the IOTLB cache and find requested
entries.

This event is to be raised on IOTLB update as it might match
the awaited entry, on IOTLB invalidate to stop the ring if the
invalidate matches ring addresses and on vring stop to be able
to destroy the ring.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/iotlb.c      |  1 +
 lib/librte_vhost/vhost.c      | 16 ++++++++++++++++
 lib/librte_vhost/vhost.h      |  2 ++
 lib/librte_vhost/vhost_user.c | 10 ++++++++++
 4 files changed, 29 insertions(+)

diff --git a/lib/librte_vhost/iotlb.c b/lib/librte_vhost/iotlb.c
index 02457fa..025d6c6 100644
--- a/lib/librte_vhost/iotlb.c
+++ b/lib/librte_vhost/iotlb.c
@@ -206,6 +206,7 @@ int vhost_user_iotlb_init(struct virtio_net *dev, int vq_index)
 		socket = 0;
 
 	rte_rwlock_init(&vq->iotlb_lock);
+	rte_atomic16_init(&vq->iotlb_event);
 
 	TAILQ_INIT(&vq->iotlb_list);
 
diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 06b8f20..3897f9c 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -191,6 +191,17 @@ free_device(struct virtio_net *dev)
 	rte_free(dev);
 }
 
+void notify_iotlb_event(struct virtio_net *dev)
+{
+	struct vhost_virtqueue *vq;
+	uint32_t i;
+
+	for (i = 0; i < dev->nr_vring; i++) {
+		vq = dev->virtqueue[i];
+		rte_atomic16_set(&vq->iotlb_event, 1);
+	}
+}
+
 static void
 init_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
 {
@@ -313,6 +324,11 @@ vhost_destroy_device(int vid)
 
 	if (dev->flags & VIRTIO_DEV_RUNNING) {
 		dev->flags &= ~VIRTIO_DEV_RUNNING;
+		/*
+		 * Unblock processing threads waiting for
+		 * IOTLB updates, if any.
+		 */
+		notify_iotlb_event(dev);
 		dev->notify_ops->destroy_device(vid);
 	}
 
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 7816a92..3242658 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -119,6 +119,7 @@ struct vhost_virtqueue {
 	rte_rwlock_t	iotlb_lock;
 	struct rte_mempool *iotlb_pool;
 	TAILQ_HEAD(, vhost_iotlb_entry) iotlb_list;
+	rte_atomic16_t		iotlb_event;
 } __rte_cache_aligned;
 
 /* Old kernels have no such macros defined */
@@ -350,5 +351,6 @@ struct vhost_device_ops const *vhost_driver_callback_get(const char *path);
  * TODO: fix it; we have one backend now
  */
 void vhost_backend_cleanup(struct virtio_net *dev);
+void notify_iotlb_event(struct virtio_net *dev);
 
 #endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 8b4a2b3..6b41fea 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -145,6 +145,11 @@ vhost_user_reset_owner(struct virtio_net *dev)
 {
 	if (dev->flags & VIRTIO_DEV_RUNNING) {
 		dev->flags &= ~VIRTIO_DEV_RUNNING;
+		/*
+		 * Unblock processing threads waiting for
+		 * IOTLB updates, if any.
+		 */
+		notify_iotlb_event(dev);
 		dev->notify_ops->destroy_device(dev->vid);
 	}
 
@@ -691,6 +696,11 @@ vhost_user_get_vring_base(struct virtio_net *dev,
 	/* We have to stop the queue (virtio) if it is running. */
 	if (dev->flags & VIRTIO_DEV_RUNNING) {
 		dev->flags &= ~VIRTIO_DEV_RUNNING;
+		/*
+		 * Unblock processing threads waiting for
+		 * IOTLB updates, if any.
+		 */
+		notify_iotlb_event(dev);
 		dev->notify_ops->destroy_device(dev->vid);
 	}
 
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 10/19] vhost-user: handle IOTLB update and invalidate requests
  2017-07-04  9:49 [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (8 preceding siblings ...)
  2017-07-04  9:49 ` [RFC 09/19] vhost: implement IOTLB events notification mechanism Maxime Coquelin
@ 2017-07-04  9:49 ` Maxime Coquelin
  2017-07-04  9:49 ` [RFC 11/19] vhost: introduce guest IOVA to backend VA helper Maxime Coquelin
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2017-07-04  9:49 UTC (permalink / raw)
  To: dev, Yuanhan Liu; +Cc: mst, vkaplans, jasowang, jfreiman, Maxime Coquelin

Vhost-user device IOTLB protocol extension introduces
VHOST_USER_IOTLB message type. The associated payload is the
vhost_iotlb_msg struct defined in Kernel, which in this was can
be either an IOTLB update or invalidate message.

On IOTLB update, the virtqueues get notified of a new entry.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.c | 45 +++++++++++++++++++++++++++++++++++++++++++
 lib/librte_vhost/vhost_user.h |  1 +
 2 files changed, 46 insertions(+)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 6b41fea..225f993 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -48,6 +48,7 @@
 #include <rte_malloc.h>
 #include <rte_log.h>
 
+#include "iotlb.h"
 #include "vhost.h"
 #include "vhost_user.h"
 
@@ -77,6 +78,7 @@ static const char *vhost_message_str[VHOST_USER_MAX] = {
 	[VHOST_USER_SEND_RARP]  = "VHOST_USER_SEND_RARP",
 	[VHOST_USER_NET_SET_MTU]  = "VHOST_USER_NET_SET_MTU",
 	[VHOST_USER_SET_SLAVE_REQ_FD]  = "VHOST_USER_SET_SLAVE_REQ_FD",
+	[VHOST_USER_IOTLB_MSG]  = "VHOST_USER_IOTLB_MSG",
 };
 
 static uint64_t
@@ -198,6 +200,7 @@ vhost_user_set_features(struct virtio_net *dev, uint64_t features)
 	} else {
 		dev->vhost_hlen = sizeof(struct virtio_net_hdr);
 	}
+
 	LOG_DEBUG(VHOST_CONFIG,
 		"(%d) mergeable RX buffers %s, virtio 1 %s\n",
 		dev->vid,
@@ -877,6 +880,44 @@ vhost_user_set_req_fd(struct virtio_net *dev, struct VhostUserMsg *msg)
 	return 0;
 }
 
+static int
+vhost_user_iotlb_msg(struct virtio_net *dev, struct VhostUserMsg *msg)
+{
+	struct vhost_iotlb_msg *imsg = &msg->payload.iotlb;
+	uint16_t i;
+	uint64_t vva;
+
+	switch (imsg->type) {
+	case VHOST_IOTLB_UPDATE:
+		vva = qva_to_vva(dev, imsg->uaddr);
+		if (!vva)
+			return -1;
+
+		for (i = 0; i < dev->nr_vring; i++) {
+			struct vhost_virtqueue *vq = dev->virtqueue[i];
+
+			vhost_user_iotlb_insert(vq, imsg->iova, vva,
+					imsg->size, imsg->perm);
+		}
+		notify_iotlb_event(dev);
+
+		break;
+	case VHOST_IOTLB_INVALIDATE:
+		for (i = 0; i < dev->nr_vring; i++) {
+			struct vhost_virtqueue *vq = dev->virtqueue[i];
+
+			vhost_user_iotlb_remove(vq, imsg->iova, imsg->size);
+		}
+		break;
+	default:
+		RTE_LOG(ERR, VHOST_CONFIG, "Invalid IOTLB message type (%d)\n",
+				imsg->type);
+		return -1;
+	}
+
+	return 0;
+}
+
 /* return bytes# of read on success or negative val on failure. */
 static int
 read_vhost_message(int sockfd, struct VhostUserMsg *msg)
@@ -1110,6 +1151,10 @@ vhost_user_msg_handler(int vid, int fd)
 		ret = vhost_user_set_req_fd(dev, &msg);
 		break;
 
+	case VHOST_USER_IOTLB_MSG:
+		ret = vhost_user_iotlb_msg(dev, &msg);
+		break;
+
 	default:
 		ret = -1;
 		break;
diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
index 0b2aff1..46c6ff9 100644
--- a/lib/librte_vhost/vhost_user.h
+++ b/lib/librte_vhost/vhost_user.h
@@ -80,6 +80,7 @@ typedef enum VhostUserRequest {
 	VHOST_USER_SEND_RARP = 19,
 	VHOST_USER_NET_SET_MTU = 20,
 	VHOST_USER_SET_SLAVE_REQ_FD = 21,
+	VHOST_USER_IOTLB_MSG = 22,
 	VHOST_USER_MAX
 } VhostUserRequest;
 
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 11/19] vhost: introduce guest IOVA to backend VA helper
  2017-07-04  9:49 [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (9 preceding siblings ...)
  2017-07-04  9:49 ` [RFC 10/19] vhost-user: handle IOTLB update and invalidate requests Maxime Coquelin
@ 2017-07-04  9:49 ` Maxime Coquelin
  2017-07-04  9:49 ` [RFC 12/19] vhost: use the guest IOVA to host " Maxime Coquelin
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2017-07-04  9:49 UTC (permalink / raw)
  To: dev, Yuanhan Liu; +Cc: mst, vkaplans, jasowang, jfreiman, Maxime Coquelin

This patch introduces vhost_iova_to_vva() function to translate
guest's IO virtual adresses to backend's virtual addresses.

When IOMMU is enabled, the IOTLB cache is queried to get the
translation. If missing from the IOTLB cache, an IOTLB_MISS request
is sent to Qemu, and IOTLB cache is queried again on IOTLB event
notification.

When IOMMU is disabled, the passed address is a guest's physical
address, so the legacy rte_vhost_gpa_to_vva() API is used.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.c | 35 +++++++++++++++++++++++++++++++++++
 lib/librte_vhost/vhost.h |  3 +++
 2 files changed, 38 insertions(+)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 3897f9c..7920216 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -47,9 +47,11 @@
 #include <rte_malloc.h>
 #include <rte_rwlock.h>
 #include <rte_vhost.h>
+#include <rte_rwlock.h>
 
 #include "iotlb.h"
 #include "vhost.h"
+#include "vhost_user.h"
 
 struct vhost_device {
 	struct virtio_net *dev;
@@ -59,6 +61,39 @@ struct vhost_device {
 /* Declared as static so that .lock is initialized */
 static struct vhost_device vhost_devices[MAX_VHOST_DEVICE];
 
+uint64_t vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
+			uint64_t iova, uint64_t size, uint8_t perm)
+{
+	uint64_t vva, tmp_size;
+	int ret;
+
+	if (unlikely(!size))
+		return 0;
+
+	if (!(dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)))
+		return rte_vhost_gpa_to_vva(dev->mem, iova);
+
+	do {
+		tmp_size = size;
+
+		vva = vhost_user_iotlb_find(vq, iova, &tmp_size, perm);
+		if (tmp_size == size)
+			return vva;
+
+		/* Not in IOTLB cache */
+		ret = vhost_user_iotlb_miss(dev, iova + tmp_size, perm);
+		if (ret)
+			return 0;
+
+		while (!rte_atomic16_cmpset((volatile uint16_t *)
+					&vq->iotlb_event.cnt, 1, 0))
+			rte_pause();
+
+	} while (likely(dev->flags & VIRTIO_DEV_RUNNING));
+
+	return 0;
+}
+
 static inline struct virtio_net *
 __get_device(int vid)
 {
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 3242658..5c39d88 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -353,4 +353,7 @@ struct vhost_device_ops const *vhost_driver_callback_get(const char *path);
 void vhost_backend_cleanup(struct virtio_net *dev);
 void notify_iotlb_event(struct virtio_net *dev);
 
+uint64_t vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
+			uint64_t iova, uint64_t size, uint8_t perm);
+
 #endif /* _VHOST_NET_CDEV_H_ */
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 12/19] vhost: use the guest IOVA to host VA helper
  2017-07-04  9:49 [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (10 preceding siblings ...)
  2017-07-04  9:49 ` [RFC 11/19] vhost: introduce guest IOVA to backend VA helper Maxime Coquelin
@ 2017-07-04  9:49 ` Maxime Coquelin
  2017-07-04  9:49 ` [RFC 13/19] vhost: enable rings at the right time Maxime Coquelin
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2017-07-04  9:49 UTC (permalink / raw)
  To: dev, Yuanhan Liu; +Cc: mst, vkaplans, jasowang, jfreiman, Maxime Coquelin

Replace rte_vhost_gpa_to_vva() calls with vhost_iova_to_vva(), which
requires to also pass the mapped len and the access permissions needed.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/virtio_net.c | 71 +++++++++++++++++++++++++++++--------------
 1 file changed, 49 insertions(+), 22 deletions(-)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 726d349..62c231a 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -154,8 +154,9 @@ virtio_enqueue_offload(struct rte_mbuf *m_buf, struct virtio_net_hdr *net_hdr)
 }
 
 static __rte_always_inline int
-copy_mbuf_to_desc(struct virtio_net *dev, struct vring_desc *descs,
-		  struct rte_mbuf *m, uint16_t desc_idx, uint32_t size)
+copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		  struct vring_desc *descs, struct rte_mbuf *m,
+		  uint16_t desc_idx, uint32_t size)
 {
 	uint32_t desc_avail, desc_offset;
 	uint32_t mbuf_avail, mbuf_offset;
@@ -166,7 +167,8 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vring_desc *descs,
 	uint16_t nr_desc = 1;
 
 	desc = &descs[desc_idx];
-	desc_addr = rte_vhost_gpa_to_vva(dev->mem, desc->addr);
+	desc_addr = vhost_iova_to_vva(dev, vq, desc->addr,
+					desc->len, VHOST_ACCESS_RW);
 	/*
 	 * Checking of 'desc_addr' placed outside of 'unlikely' macro to avoid
 	 * performance issue with some versions of gcc (4.8.4 and 5.3.0) which
@@ -205,7 +207,9 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vring_desc *descs,
 				return -1;
 
 			desc = &descs[desc->next];
-			desc_addr = rte_vhost_gpa_to_vva(dev->mem, desc->addr);
+			desc_addr = vhost_iova_to_vva(dev, vq, desc->addr,
+							desc->len,
+							VHOST_ACCESS_RW);
 			if (unlikely(!desc_addr))
 				return -1;
 
@@ -290,8 +294,10 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 
 		if (vq->desc[desc_idx].flags & VRING_DESC_F_INDIRECT) {
 			descs = (struct vring_desc *)(uintptr_t)
-				rte_vhost_gpa_to_vva(dev->mem,
-					vq->desc[desc_idx].addr);
+				vhost_iova_to_vva(dev,
+						vq, vq->desc[desc_idx].addr,
+						vq->desc[desc_idx].len,
+						VHOST_ACCESS_RO);
 			if (unlikely(!descs)) {
 				count = i;
 				break;
@@ -304,7 +310,7 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 			sz = vq->size;
 		}
 
-		err = copy_mbuf_to_desc(dev, descs, pkts[i], desc_idx, sz);
+		err = copy_mbuf_to_desc(dev, vq, descs, pkts[i], desc_idx, sz);
 		if (unlikely(err)) {
 			used_idx = (start_idx + i) & (vq->size - 1);
 			vq->used->ring[used_idx].len = dev->vhost_hlen;
@@ -350,7 +356,9 @@ fill_vec_buf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 	if (vq->desc[idx].flags & VRING_DESC_F_INDIRECT) {
 		descs = (struct vring_desc *)(uintptr_t)
-			rte_vhost_gpa_to_vva(dev->mem, vq->desc[idx].addr);
+			vhost_iova_to_vva(dev, vq, vq->desc[idx].addr,
+						vq->desc[idx].len,
+						VHOST_ACCESS_RO);
 		if (unlikely(!descs))
 			return -1;
 
@@ -425,8 +433,9 @@ reserve_avail_buf_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
 }
 
 static __rte_always_inline int
-copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct rte_mbuf *m,
-			    struct buf_vector *buf_vec, uint16_t num_buffers)
+copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct vhost_virtqueue *vq,
+				struct rte_mbuf *m, struct buf_vector *buf_vec,
+				uint16_t num_buffers)
 {
 	uint32_t vec_idx = 0;
 	uint64_t desc_addr;
@@ -439,7 +448,9 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct rte_mbuf *m,
 	if (unlikely(m == NULL))
 		return -1;
 
-	desc_addr = rte_vhost_gpa_to_vva(dev->mem, buf_vec[vec_idx].buf_addr);
+	desc_addr = vhost_iova_to_vva(dev, vq, buf_vec[vec_idx].buf_addr,
+						buf_vec[vec_idx].buf_len,
+						VHOST_ACCESS_RW);
 	if (buf_vec[vec_idx].buf_len < dev->vhost_hlen || !desc_addr)
 		return -1;
 
@@ -460,8 +471,11 @@ copy_mbuf_to_desc_mergeable(struct virtio_net *dev, struct rte_mbuf *m,
 		/* done with current desc buf, get the next one */
 		if (desc_avail == 0) {
 			vec_idx++;
-			desc_addr = rte_vhost_gpa_to_vva(dev->mem,
-					buf_vec[vec_idx].buf_addr);
+			desc_addr =
+				vhost_iova_to_vva(dev, vq,
+					buf_vec[vec_idx].buf_addr,
+					buf_vec[vec_idx].buf_len,
+					VHOST_ACCESS_RW);
 			if (unlikely(!desc_addr))
 				return -1;
 
@@ -558,7 +572,7 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 			dev->vid, vq->last_avail_idx,
 			vq->last_avail_idx + num_buffers);
 
-		if (copy_mbuf_to_desc_mergeable(dev, pkts[pkt_idx],
+		if (copy_mbuf_to_desc_mergeable(dev, vq, pkts[pkt_idx],
 						buf_vec, num_buffers) < 0) {
 			vq->shadow_used_idx -= num_buffers;
 			break;
@@ -755,8 +769,9 @@ put_zmbuf(struct zcopy_mbuf *zmbuf)
 }
 
 static __rte_always_inline int
-copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
-		  uint16_t max_desc, struct rte_mbuf *m, uint16_t desc_idx,
+copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		  struct vring_desc *descs, uint16_t max_desc,
+		  struct rte_mbuf *m, uint16_t desc_idx,
 		  struct rte_mempool *mbuf_pool)
 {
 	struct vring_desc *desc;
@@ -774,7 +789,10 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
 			(desc->flags & VRING_DESC_F_INDIRECT))
 		return -1;
 
-	desc_addr = rte_vhost_gpa_to_vva(dev->mem, desc->addr);
+	desc_addr = vhost_iova_to_vva(dev,
+					vq, desc->addr,
+					desc->len,
+					VHOST_ACCESS_RO);
 	if (unlikely(!desc_addr))
 		return -1;
 
@@ -794,7 +812,10 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
 		if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
 			return -1;
 
-		desc_addr = rte_vhost_gpa_to_vva(dev->mem, desc->addr);
+		desc_addr = vhost_iova_to_vva(dev,
+							vq, desc->addr,
+							desc->len,
+							VHOST_ACCESS_RO);
 		if (unlikely(!desc_addr))
 			return -1;
 
@@ -858,7 +879,10 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vring_desc *descs,
 			if (unlikely(desc->flags & VRING_DESC_F_INDIRECT))
 				return -1;
 
-			desc_addr = rte_vhost_gpa_to_vva(dev->mem, desc->addr);
+			desc_addr = vhost_iova_to_vva(dev,
+							vq, desc->addr,
+							desc->len,
+							VHOST_ACCESS_RO);
 			if (unlikely(!desc_addr))
 				return -1;
 
@@ -1104,8 +1128,10 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 
 		if (vq->desc[desc_indexes[i]].flags & VRING_DESC_F_INDIRECT) {
 			desc = (struct vring_desc *)(uintptr_t)
-				rte_vhost_gpa_to_vva(dev->mem,
-					vq->desc[desc_indexes[i]].addr);
+				vhost_iova_to_vva(dev, vq,
+						vq->desc[desc_indexes[i]].addr,
+						sizeof(*desc),
+						VHOST_ACCESS_RO);
 			if (unlikely(!desc))
 				break;
 
@@ -1125,7 +1151,8 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 			break;
 		}
 
-		err = copy_desc_to_mbuf(dev, desc, sz, pkts[i], idx, mbuf_pool);
+		err = copy_desc_to_mbuf(dev, vq, desc, sz,
+						pkts[i], idx, mbuf_pool);
 		if (unlikely(err)) {
 			rte_pktmbuf_free(pkts[i]);
 			break;
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 13/19] vhost: enable rings at the right time
  2017-07-04  9:49 [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (11 preceding siblings ...)
  2017-07-04  9:49 ` [RFC 12/19] vhost: use the guest IOVA to host " Maxime Coquelin
@ 2017-07-04  9:49 ` Maxime Coquelin
  2017-07-04  9:49 ` [RFC 14/19] vhost: don't dereference invalid dev pointer after its reallocation Maxime Coquelin
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2017-07-04  9:49 UTC (permalink / raw)
  To: dev, Yuanhan Liu; +Cc: mst, vkaplans, jasowang, jfreiman, Maxime Coquelin

When VHOST_USER_F_PROTOCOL_FEATURES is negotiated, the ring is not
enabled when started, but enabled through dedicated
VHOST_USER_SET_VRING_ENABLE request.

When not negotiated, the ring is started in enabled state, at
VHOST_USER_SET_VRING_KICK request time.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.c      | 6 ------
 lib/librte_vhost/vhost_user.c | 9 +++++++++
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 7920216..b1b6a97 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -251,12 +251,6 @@ init_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
 	/* Backends are set to -1 indicating an inactive device. */
 	vq->backend = -1;
 
-	/*
-	 * always set the vq to enabled; this is to keep compatibility
-	 * with the old QEMU, whereas there is no SET_VRING_ENABLE message.
-	 */
-	vq->enabled = 1;
-
 	TAILQ_INIT(&vq->zmbuf_list);
 }
 
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 225f993..e061e85 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -666,6 +666,15 @@ vhost_user_set_vring_kick(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 		"vring kick idx:%d file:%d\n", file.index, file.fd);
 
 	vq = dev->virtqueue[file.index];
+
+	/*
+	 * When VHOST_USER_F_PROTOCOL_FEATURES is not negotiated,
+	 * the ring starts already enabled. Otherwise, it is enabled via
+	 * the SET_VRING_ENABLE message.
+	 */
+	if (!(dev->features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES)))
+		vq->enabled = 1;
+
 	if (vq->kickfd >= 0)
 		close(vq->kickfd);
 	vq->kickfd = file.fd;
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 14/19] vhost: don't dereference invalid dev pointer after its reallocation
  2017-07-04  9:49 [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (12 preceding siblings ...)
  2017-07-04  9:49 ` [RFC 13/19] vhost: enable rings at the right time Maxime Coquelin
@ 2017-07-04  9:49 ` Maxime Coquelin
  2017-07-04  9:49 ` [RFC 15/19] vhost: postpone rings adresses translation Maxime Coquelin
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2017-07-04  9:49 UTC (permalink / raw)
  To: dev, Yuanhan Liu
  Cc: mst, vkaplans, jasowang, jfreiman, Maxime Coquelin, stable

numa_realloc() reallocates the virtio_net device structure and
updates the vhost_devices[] table with the new pointer if the rings
are allocated different NUMA node.

Problem is that vhost_user_msg_handler() still derenferences old
pointer afterward.

This patch prevents this by fetching again the dev pointer in
vhost_devices[] after messages have been handled.

Cc: stable@dpdk.org
Fixes: af295ad4698c ("vhost: realloc device and queues to same numa node as vring desc")
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index e061e85..99ac7d4 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -1170,6 +1170,12 @@ vhost_user_msg_handler(int vid, int fd)
 
 	}
 
+	/*
+	 * The virtio_net struct might have been reallocated on a different
+	 * NUMA node, so dev pointer might no more be valid.
+	 */
+	dev = get_device(vid);
+
 	if (msg.flags & VHOST_USER_NEED_REPLY) {
 		msg.payload.u64 = !!ret;
 		msg.size = sizeof(msg.payload.u64);
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 15/19] vhost: postpone rings adresses translation
  2017-07-04  9:49 [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (13 preceding siblings ...)
  2017-07-04  9:49 ` [RFC 14/19] vhost: don't dereference invalid dev pointer after its reallocation Maxime Coquelin
@ 2017-07-04  9:49 ` Maxime Coquelin
  2017-07-04  9:49 ` [RFC 16/19] vhost-user: translate ring addresses when IOMMU enabled Maxime Coquelin
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2017-07-04  9:49 UTC (permalink / raw)
  To: dev, Yuanhan Liu; +Cc: mst, vkaplans, jasowang, jfreiman, Maxime Coquelin

This patch postpones rings addresses translations and checks, as
addresses sent by the master shuld not be interpreted as long as
ring is not started and enabled[0].

When protocol features aren't negotiated, the ring is started in
enabled state, so the addresses translations are postponed to
vhost_user_set_vring_kick().
Otherwise, it is postponed to when ring is enabled, in
vhost_user_set_vring_enable().

[0]: http://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg04355.html

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.h      |  1 +
 lib/librte_vhost/vhost_user.c | 74 +++++++++++++++++++++++++++++++++----------
 2 files changed, 58 insertions(+), 17 deletions(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 5c39d88..832a80a 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -115,6 +115,7 @@ struct vhost_virtqueue {
 
 	struct vring_used_elem  *shadow_used_ring;
 	uint16_t                shadow_used_idx;
+	struct vhost_vring_addr ring_addrs;
 
 	rte_rwlock_t	iotlb_lock;
 	struct rte_mempool *iotlb_pool;
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 99ac7d4..d9712e9 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -319,10 +319,10 @@ qva_to_vva(struct virtio_net *dev, uint64_t qva)
  * This function then converts these to our address space.
  */
 static int
-vhost_user_set_vring_addr(struct virtio_net **pdev, VhostUserMsg *msg)
+vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg *msg)
 {
 	struct vhost_virtqueue *vq;
-	struct virtio_net *dev = *pdev;
+	struct vhost_vring_addr *addr = &msg->payload.addr;
 
 	if (dev->mem == NULL)
 		return -1;
@@ -330,35 +330,50 @@ vhost_user_set_vring_addr(struct virtio_net **pdev, VhostUserMsg *msg)
 	/* addr->index refers to the queue index. The txq 1, rxq is 0. */
 	vq = dev->virtqueue[msg->payload.addr.index];
 
+	/*
+	 * Rings addresses should not be interpreted as long as the ring is not
+	 * started and enabled
+	 */
+	memcpy(&vq->ring_addrs, addr, sizeof(*addr));
+
+	return 0;
+}
+
+static struct virtio_net *translate_ring_addresses(struct virtio_net *dev,
+		int vq_index)
+{
+	struct vhost_virtqueue *vq = dev->virtqueue[vq_index];
+	struct vhost_vring_addr *addr = &vq->ring_addrs;
+
 	/* The addresses are converted from QEMU virtual to Vhost virtual. */
 	vq->desc = (struct vring_desc *)(uintptr_t)qva_to_vva(dev,
-			msg->payload.addr.desc_user_addr);
+			addr->desc_user_addr);
 	if (vq->desc == 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"(%d) failed to find desc ring address.\n",
 			dev->vid);
-		return -1;
+		return NULL;
 	}
 
-	*pdev = dev = numa_realloc(dev, msg->payload.addr.index);
-	vq = dev->virtqueue[msg->payload.addr.index];
+	dev = numa_realloc(dev, vq_index);
+	vq = dev->virtqueue[vq_index];
 
 	vq->avail = (struct vring_avail *)(uintptr_t)qva_to_vva(dev,
-			msg->payload.addr.avail_user_addr);
+			addr->avail_user_addr);
 	if (vq->avail == 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"(%d) failed to find avail ring address.\n",
 			dev->vid);
-		return -1;
+		return NULL;
 	}
 
 	vq->used = (struct vring_used *)(uintptr_t)qva_to_vva(dev,
-			msg->payload.addr.used_user_addr);
+			addr->used_user_addr);
 	if (vq->used == 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"(%d) failed to find used ring address.\n",
 			dev->vid);
-		return -1;
+		return NULL;
 	}
 
 	if (vq->last_used_idx != vq->used->idx) {
@@ -370,7 +385,7 @@ vhost_user_set_vring_addr(struct virtio_net **pdev, VhostUserMsg *msg)
 		vq->last_avail_idx = vq->used->idx;
 	}
 
-	vq->log_guest_addr = msg->payload.addr.log_guest_addr;
+	vq->log_guest_addr = addr->log_guest_addr;
 
 	LOG_DEBUG(VHOST_CONFIG, "(%d) mapped address desc: %p\n",
 			dev->vid, vq->desc);
@@ -381,7 +396,7 @@ vhost_user_set_vring_addr(struct virtio_net **pdev, VhostUserMsg *msg)
 	LOG_DEBUG(VHOST_CONFIG, "(%d) log_guest_addr: %" PRIx64 "\n",
 			dev->vid, vq->log_guest_addr);
 
-	return 0;
+	return dev;
 }
 
 /*
@@ -652,10 +667,11 @@ vhost_user_set_vring_call(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 }
 
 static void
-vhost_user_set_vring_kick(struct virtio_net *dev, struct VhostUserMsg *pmsg)
+vhost_user_set_vring_kick(struct virtio_net **pdev, struct VhostUserMsg *pmsg)
 {
 	struct vhost_vring_file file;
 	struct vhost_virtqueue *vq;
+	struct virtio_net *dev = *pdev;
 
 	file.index = pmsg->payload.u64 & VHOST_USER_VRING_IDX_MASK;
 	if (pmsg->payload.u64 & VHOST_USER_VRING_NOFD_MASK)
@@ -665,6 +681,16 @@ vhost_user_set_vring_kick(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 	RTE_LOG(INFO, VHOST_CONFIG,
 		"vring kick idx:%d file:%d\n", file.index, file.fd);
 
+	/*
+	 * Interpret ring addresses only when ring is started and enabled.
+	 * This is now if protocol features aren't supported.
+	 */
+	if (!(dev->features & (1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) {
+		*pdev = dev = translate_ring_addresses(dev, file.index);
+		if (!dev)
+			return;
+	}
+
 	vq = dev->virtqueue[file.index];
 
 	/*
@@ -747,15 +773,29 @@ vhost_user_get_vring_base(struct virtio_net *dev,
  * enable the virtio queue pair.
  */
 static int
-vhost_user_set_vring_enable(struct virtio_net *dev,
+vhost_user_set_vring_enable(struct virtio_net **pdev,
 			    VhostUserMsg *msg)
 {
+	struct virtio_net *dev = *pdev;
 	int enable = (int)msg->payload.state.num;
 
 	RTE_LOG(INFO, VHOST_CONFIG,
 		"set queue enable: %d to qp idx: %d\n",
 		enable, msg->payload.state.index);
 
+	/*
+	 * Interpret ring addresses only when ring is started and enabled.
+	 * This is now if protocol features are supported.
+	 */
+	if (enable && (dev->features &
+				(1ULL << VHOST_USER_F_PROTOCOL_FEATURES))) {
+		dev = translate_ring_addresses(dev, msg->payload.state.index);
+		if (!dev)
+			return -1;
+
+		*pdev = dev;
+	}
+
 	if (dev->notify_ops->vring_state_changed)
 		dev->notify_ops->vring_state_changed(dev->vid,
 				msg->payload.state.index, enable);
@@ -1114,7 +1154,7 @@ vhost_user_msg_handler(int vid, int fd)
 		vhost_user_set_vring_num(dev, &msg);
 		break;
 	case VHOST_USER_SET_VRING_ADDR:
-		vhost_user_set_vring_addr(&dev, &msg);
+		vhost_user_set_vring_addr(dev, &msg);
 		break;
 	case VHOST_USER_SET_VRING_BASE:
 		vhost_user_set_vring_base(dev, &msg);
@@ -1127,7 +1167,7 @@ vhost_user_msg_handler(int vid, int fd)
 		break;
 
 	case VHOST_USER_SET_VRING_KICK:
-		vhost_user_set_vring_kick(dev, &msg);
+		vhost_user_set_vring_kick(&dev, &msg);
 		break;
 	case VHOST_USER_SET_VRING_CALL:
 		vhost_user_set_vring_call(dev, &msg);
@@ -1146,7 +1186,7 @@ vhost_user_msg_handler(int vid, int fd)
 		break;
 
 	case VHOST_USER_SET_VRING_ENABLE:
-		vhost_user_set_vring_enable(dev, &msg);
+		vhost_user_set_vring_enable(&dev, &msg);
 		break;
 	case VHOST_USER_SEND_RARP:
 		vhost_user_send_rarp(dev, &msg);
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 16/19] vhost-user: translate ring addresses when IOMMU enabled
  2017-07-04  9:49 [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (14 preceding siblings ...)
  2017-07-04  9:49 ` [RFC 15/19] vhost: postpone rings adresses translation Maxime Coquelin
@ 2017-07-04  9:49 ` Maxime Coquelin
  2017-07-04  9:49 ` [RFC 17/19] vhost-user: iommu: postpone device creation until ring are mapped Maxime Coquelin
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2017-07-04  9:49 UTC (permalink / raw)
  To: dev, Yuanhan Liu; +Cc: mst, vkaplans, jasowang, jfreiman, Maxime Coquelin

When IOMMU is enabled, the ring addresses set by the
VHOST_USER_SET_VRING_ADDR requests are guest's IO virtual addresses,
whereas Qemu virtual addresses when IOMMU is disabled.

When enabled and the required translation is not in the IOTLB cache,
an IOTLB miss request is sent, but being called by the vhost-user
socket handling thread, the function does not wait for the requested
IOTLB update.

The function will be called again on the next IOTLB update message
reception if matching the vring addresses.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost_user.c | 43 +++++++++++++++++++++++++++++++++----------
 1 file changed, 33 insertions(+), 10 deletions(-)

diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index d9712e9..8230578 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -290,10 +290,7 @@ numa_realloc(struct virtio_net *dev, int index __rte_unused)
 }
 #endif
 
-/*
- * Converts QEMU virtual address to Vhost virtual address. This function is
- * used to convert the ring addresses to our address space.
- */
+/* Converts QEMU virtual address to Vhost virtual address. */
 static uint64_t
 qva_to_vva(struct virtio_net *dev, uint64_t qva)
 {
@@ -314,6 +311,29 @@ qva_to_vva(struct virtio_net *dev, uint64_t qva)
 	return 0;
 }
 
+
+/*
+ * Converts ring address to Vhost virtual address.
+ * If IOMMU is enabled, the ring address is a guest IO virtual address,
+ * else it is a QEMU virtual address.
+ */
+static uint64_t
+ring_addr_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
+		uint64_t ra, uint64_t size)
+{
+	if (dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)) {
+		uint64_t vva;
+
+		vva = vhost_user_iotlb_find(vq, ra, &size, VHOST_ACCESS_RW);
+		if (!vva)
+			vhost_user_iotlb_miss(dev, ra, VHOST_ACCESS_RW);
+
+		return vva;
+	}
+
+	return qva_to_vva(dev, ra);
+}
+
 /*
  * The virtio device sends us the desc, used and avail ring addresses.
  * This function then converts these to our address space.
@@ -346,8 +366,11 @@ static struct virtio_net *translate_ring_addresses(struct virtio_net *dev,
 	struct vhost_vring_addr *addr = &vq->ring_addrs;
 
 	/* The addresses are converted from QEMU virtual to Vhost virtual. */
-	vq->desc = (struct vring_desc *)(uintptr_t)qva_to_vva(dev,
-			addr->desc_user_addr);
+	if (vq->desc && vq->avail && vq->used)
+		return dev;
+
+	vq->desc = (struct vring_desc *)(uintptr_t)ring_addr_to_vva(dev,
+			vq, addr->desc_user_addr, sizeof(struct vring_desc));
 	if (vq->desc == 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"(%d) failed to find desc ring address.\n",
@@ -358,8 +381,8 @@ static struct virtio_net *translate_ring_addresses(struct virtio_net *dev,
 	dev = numa_realloc(dev, vq_index);
 	vq = dev->virtqueue[vq_index];
 
-	vq->avail = (struct vring_avail *)(uintptr_t)qva_to_vva(dev,
-			addr->avail_user_addr);
+	vq->avail = (struct vring_avail *)(uintptr_t)ring_addr_to_vva(dev,
+			vq, addr->avail_user_addr, sizeof(struct vring_avail));
 	if (vq->avail == 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"(%d) failed to find avail ring address.\n",
@@ -367,8 +390,8 @@ static struct virtio_net *translate_ring_addresses(struct virtio_net *dev,
 		return NULL;
 	}
 
-	vq->used = (struct vring_used *)(uintptr_t)qva_to_vva(dev,
-			addr->used_user_addr);
+	vq->used = (struct vring_used *)(uintptr_t)ring_addr_to_vva(dev,
+			vq, addr->used_user_addr, sizeof(struct vring_used));
 	if (vq->used == 0) {
 		RTE_LOG(ERR, VHOST_CONFIG,
 			"(%d) failed to find used ring address.\n",
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 17/19] vhost-user: iommu: postpone device creation until ring are mapped
  2017-07-04  9:49 [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (15 preceding siblings ...)
  2017-07-04  9:49 ` [RFC 16/19] vhost-user: translate ring addresses when IOMMU enabled Maxime Coquelin
@ 2017-07-04  9:49 ` Maxime Coquelin
  2017-07-04  9:49 ` [RFC 18/19] vhost: iommu: Invalidate vring in case of matching IOTLB invalidate Maxime Coquelin
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2017-07-04  9:49 UTC (permalink / raw)
  To: dev, Yuanhan Liu; +Cc: mst, vkaplans, jasowang, jfreiman, Maxime Coquelin

Translating the start addresses of the rings is not enough, we need to
be sure all the ring is made available by the guest.

It depends on the size of the rings, which is not known on SET_VRING_ADDR
reception. Furthermore, we need to be be safe against vring pages
invalidates.

This patch introduces a new access_ok flag per virtqueue, which is set
when all the rings are mapped, and cleared as soon as a page used by a
ring is invalidated. The invalidation part is implemented in a following
patch.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.c      | 37 ++++++++++++++++++++++++++
 lib/librte_vhost/vhost.h      |  4 ++-
 lib/librte_vhost/vhost_user.c | 62 +++++++++++++++++++++++++++++++------------
 lib/librte_vhost/virtio_net.c | 12 +++++++++
 4 files changed, 97 insertions(+), 18 deletions(-)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index b1b6a97..3f193bf 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -237,6 +237,43 @@ void notify_iotlb_event(struct virtio_net *dev)
 	}
 }
 
+int
+vring_translate(struct virtio_net *dev, struct vhost_virtqueue *vq)
+{
+	uint64_t size;
+
+	if (!(dev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)))
+		goto out;
+
+	size = sizeof(struct vring_desc) * vq->size;
+	vq->desc = (struct vring_desc *)vhost_iova_to_vva(dev, vq,
+						vq->ring_addrs.desc_user_addr,
+						size, VHOST_ACCESS_RW);
+	if (!vq->desc)
+		return -1;
+
+	size = sizeof(struct vring_avail);
+	size += sizeof(uint16_t) * vq->size;
+	vq->avail = (struct vring_avail *)vhost_iova_to_vva(dev, vq,
+						vq->ring_addrs.avail_user_addr,
+						size, VHOST_ACCESS_RW);
+	if (!vq->avail)
+		return -1;
+
+	size = sizeof(struct vring_used);
+	size += sizeof(struct vring_used_elem) * vq->size;
+	vq->used = (struct vring_used *)vhost_iova_to_vva(dev, vq,
+						vq->ring_addrs.used_user_addr,
+						size, VHOST_ACCESS_RW);
+	if (!vq->used)
+		return -1;
+
+out:
+	vq->access_ok = 1;
+
+	return 0;
+}
+
 static void
 init_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
 {
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 832a80a..4b03977 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -103,6 +103,7 @@ struct vhost_virtqueue {
 	/* Currently unused as polling mode is enabled */
 	int			kickfd;
 	int			enabled;
+	int			access_ok;
 
 	/* Physical address of used ring, for logging */
 	uint64_t		log_guest_addr;
@@ -355,6 +356,7 @@ void vhost_backend_cleanup(struct virtio_net *dev);
 void notify_iotlb_event(struct virtio_net *dev);
 
 uint64_t vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
-			uint64_t iova, uint64_t size, uint8_t perm);
+				uint64_t iova, uint64_t size, uint8_t perm);
+int vring_translate(struct virtio_net *dev, struct vhost_virtqueue *vq);
 
 #endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 8230578..0ebaac7 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -356,6 +356,12 @@ vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg *msg)
 	 */
 	memcpy(&vq->ring_addrs, addr, sizeof(*addr));
 
+	vq->desc = NULL;
+	vq->avail = NULL;
+	vq->used = NULL;
+
+	vq->access_ok = 0;
+
 	return 0;
 }
 
@@ -372,10 +378,10 @@ static struct virtio_net *translate_ring_addresses(struct virtio_net *dev,
 	vq->desc = (struct vring_desc *)(uintptr_t)ring_addr_to_vva(dev,
 			vq, addr->desc_user_addr, sizeof(struct vring_desc));
 	if (vq->desc == 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
+		RTE_LOG(DEBUG, VHOST_CONFIG,
 			"(%d) failed to find desc ring address.\n",
 			dev->vid);
-		return NULL;
+		return dev;
 	}
 
 	dev = numa_realloc(dev, vq_index);
@@ -384,19 +390,19 @@ static struct virtio_net *translate_ring_addresses(struct virtio_net *dev,
 	vq->avail = (struct vring_avail *)(uintptr_t)ring_addr_to_vva(dev,
 			vq, addr->avail_user_addr, sizeof(struct vring_avail));
 	if (vq->avail == 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
+		RTE_LOG(DEBUG, VHOST_CONFIG,
 			"(%d) failed to find avail ring address.\n",
 			dev->vid);
-		return NULL;
+		return dev;
 	}
 
 	vq->used = (struct vring_used *)(uintptr_t)ring_addr_to_vva(dev,
 			vq, addr->used_user_addr, sizeof(struct vring_used));
 	if (vq->used == 0) {
-		RTE_LOG(ERR, VHOST_CONFIG,
+		RTE_LOG(DEBUG, VHOST_CONFIG,
 			"(%d) failed to find used ring address.\n",
 			dev->vid);
-		return NULL;
+		return dev;
 	}
 
 	if (vq->last_used_idx != vq->used->idx) {
@@ -642,7 +648,7 @@ vhost_user_set_mem_table(struct virtio_net *dev, struct VhostUserMsg *pmsg)
 static int
 vq_is_ready(struct vhost_virtqueue *vq)
 {
-	return vq && vq->desc   &&
+	return vq && vq->desc && vq->avail && vq->used &&
 	       vq->kickfd != VIRTIO_UNINITIALIZED_EVENTFD &&
 	       vq->callfd != VIRTIO_UNINITIALIZED_EVENTFD;
 }
@@ -953,8 +959,29 @@ vhost_user_set_req_fd(struct virtio_net *dev, struct VhostUserMsg *msg)
 }
 
 static int
-vhost_user_iotlb_msg(struct virtio_net *dev, struct VhostUserMsg *msg)
+is_vring_iotlb_update(struct vhost_virtqueue *vq, struct vhost_iotlb_msg *imsg)
 {
+	struct vhost_vring_addr *ra;
+	uint64_t start, end;
+
+	start = imsg->iova;
+	end = start + imsg->size;
+
+	ra = &vq->ring_addrs;
+	if (ra->desc_user_addr >= start && ra->desc_user_addr < end)
+		return 1;
+	if (ra->avail_user_addr >= start && ra->avail_user_addr < end)
+		return 1;
+	if (ra->used_user_addr >= start && ra->used_user_addr < end)
+		return 1;
+
+	return -1;
+}
+
+static int
+vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg)
+{
+	struct virtio_net *dev = *pdev;
 	struct vhost_iotlb_msg *imsg = &msg->payload.iotlb;
 	uint16_t i;
 	uint64_t vva;
@@ -970,6 +997,9 @@ vhost_user_iotlb_msg(struct virtio_net *dev, struct VhostUserMsg *msg)
 
 			vhost_user_iotlb_insert(vq, imsg->iova, vva,
 					imsg->size, imsg->perm);
+
+			if (is_vring_iotlb_update(vq, imsg))
+				*pdev = dev = translate_ring_addresses(dev, i);
 		}
 		notify_iotlb_event(dev);
 
@@ -1120,8 +1150,12 @@ vhost_user_msg_handler(int vid, int fd)
 		goto out;
 	}
 
-	RTE_LOG(INFO, VHOST_CONFIG, "read message %s\n",
-		vhost_message_str[msg.request]);
+	if (msg.request != VHOST_USER_IOTLB_MSG)
+		RTE_LOG(INFO, VHOST_CONFIG, "read message %s\n",
+			vhost_message_str[msg.request]);
+	else
+		RTE_LOG(DEBUG, VHOST_CONFIG, "read message %s\n",
+			vhost_message_str[msg.request]);
 
 	ret = vhost_user_check_and_alloc_queue_pair(dev, &msg);
 	if (ret < 0) {
@@ -1224,7 +1258,7 @@ vhost_user_msg_handler(int vid, int fd)
 		break;
 
 	case VHOST_USER_IOTLB_MSG:
-		ret = vhost_user_iotlb_msg(dev, &msg);
+		ret = vhost_user_iotlb_msg(&dev, &msg);
 		break;
 
 	default:
@@ -1233,12 +1267,6 @@ vhost_user_msg_handler(int vid, int fd)
 
 	}
 
-	/*
-	 * The virtio_net struct might have been reallocated on a different
-	 * NUMA node, so dev pointer might no more be valid.
-	 */
-	dev = get_device(vid);
-
 	if (msg.flags & VHOST_USER_NEED_REPLY) {
 		msg.payload.u64 = !!ret;
 		msg.size = sizeof(msg.payload.u64);
diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 62c231a..b0870df 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -263,6 +263,10 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
 	if (unlikely(vq->enabled == 0))
 		return 0;
 
+	if (unlikely(vq->access_ok == 0))
+		if (unlikely(vring_translate(dev, vq) < 0))
+			return 0;
+
 	avail_idx = *((volatile uint16_t *)&vq->avail->idx);
 	start_idx = vq->last_used_idx;
 	free_entries = avail_idx - start_idx;
@@ -547,6 +551,10 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
 	if (unlikely(vq->enabled == 0))
 		return 0;
 
+	if (unlikely(vq->access_ok == 0))
+		if (unlikely(vring_translate(dev, vq) < 0))
+			return 0;
+
 	count = RTE_MIN((uint32_t)MAX_PKT_BURST, count);
 	if (count == 0)
 		return 0;
@@ -1029,6 +1037,10 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 	if (unlikely(vq->enabled == 0))
 		goto out;
 
+	if (unlikely(vq->access_ok == 0))
+		if (unlikely(vring_translate(dev, vq) < 0))
+			return 0;
+
 	if (unlikely(dev->dequeue_zero_copy)) {
 		struct zcopy_mbuf *zmbuf, *next;
 		int nr_updated = 0;
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 18/19] vhost: iommu: Invalidate vring in case of matching IOTLB invalidate
  2017-07-04  9:49 [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (16 preceding siblings ...)
  2017-07-04  9:49 ` [RFC 17/19] vhost-user: iommu: postpone device creation until ring are mapped Maxime Coquelin
@ 2017-07-04  9:49 ` Maxime Coquelin
  2017-07-04  9:49 ` [RFC 19/19] vhost: enable IOMMU support Maxime Coquelin
  2017-08-31  9:10 ` [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
  19 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2017-07-04  9:49 UTC (permalink / raw)
  To: dev, Yuanhan Liu; +Cc: mst, vkaplans, jasowang, jfreiman, Maxime Coquelin

As soon as a page used by a ring is invalidated, the access_ok flag
is cleared, so that processing threads try to map them again.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.c      | 17 +++++++++++++++++
 lib/librte_vhost/vhost.h      |  1 +
 lib/librte_vhost/vhost_user.c | 38 +++++++++++++++++++++++++++++++++-----
 3 files changed, 51 insertions(+), 5 deletions(-)

diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c
index 3f193bf..971ff51 100644
--- a/lib/librte_vhost/vhost.c
+++ b/lib/librte_vhost/vhost.c
@@ -274,6 +274,23 @@ vring_translate(struct virtio_net *dev, struct vhost_virtqueue *vq)
 	return 0;
 }
 
+void vring_invalidate(struct virtio_net *dev, struct vhost_virtqueue *vq)
+{
+	int vid = dev->vid;
+
+	/* Ensure no thread is using the vrings */
+	put_device(vid);
+	dev = get_device_wr(vid);
+
+	vq->access_ok = 0;
+	vq->desc = NULL;
+	vq->avail = NULL;
+	vq->used = NULL;
+
+	put_device_wr(vid);
+	get_device(vid);
+}
+
 static void
 init_vring_queue(struct virtio_net *dev, uint32_t vring_idx)
 {
diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 4b03977..9aef00e 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -358,5 +358,6 @@ void notify_iotlb_event(struct virtio_net *dev);
 uint64_t vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
 				uint64_t iova, uint64_t size, uint8_t perm);
 int vring_translate(struct virtio_net *dev, struct vhost_virtqueue *vq);
+void vring_invalidate(struct virtio_net *dev, struct vhost_virtqueue *vq);
 
 #endif /* _VHOST_NET_CDEV_H_ */
diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c
index 0ebaac7..1ff83ca 100644
--- a/lib/librte_vhost/vhost_user.c
+++ b/lib/librte_vhost/vhost_user.c
@@ -356,11 +356,7 @@ vhost_user_set_vring_addr(struct virtio_net *dev, VhostUserMsg *msg)
 	 */
 	memcpy(&vq->ring_addrs, addr, sizeof(*addr));
 
-	vq->desc = NULL;
-	vq->avail = NULL;
-	vq->used = NULL;
-
-	vq->access_ok = 0;
+	vring_invalidate(dev, vq);
 
 	return 0;
 }
@@ -979,6 +975,35 @@ is_vring_iotlb_update(struct vhost_virtqueue *vq, struct vhost_iotlb_msg *imsg)
 }
 
 static int
+is_vring_iotlb_invalidate(struct vhost_virtqueue *vq,
+				struct vhost_iotlb_msg *imsg)
+{
+	uint64_t istart, iend, vstart, vend;
+
+	istart = imsg->iova;
+	iend = istart + imsg->size - 1;
+
+	vstart = (uint64_t)vq->desc;
+	vend = vstart + sizeof(struct vring_desc) * vq->size - 1;
+	if (vstart <= iend && istart <= vend)
+		return 1;
+
+	vstart = (uint64_t)vq->avail;
+	vend = vstart + sizeof(struct vring_avail);
+	vend += sizeof(uint16_t) * vq->size - 1;
+	if (vstart <= iend && istart <= vend)
+		return 1;
+
+	vstart = (uint64_t)vq->used;
+	vend = vstart + sizeof(struct vring_used);
+	vend += sizeof(struct vring_used_elem) * vq->size - 1;
+	if (vstart <= iend && istart <= vend)
+		return 1;
+
+	return 0;
+}
+
+static int
 vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg)
 {
 	struct virtio_net *dev = *pdev;
@@ -1009,6 +1034,9 @@ vhost_user_iotlb_msg(struct virtio_net **pdev, struct VhostUserMsg *msg)
 			struct vhost_virtqueue *vq = dev->virtqueue[i];
 
 			vhost_user_iotlb_remove(vq, imsg->iova, imsg->size);
+
+			if (is_vring_iotlb_invalidate(vq, imsg))
+				vring_invalidate(dev, vq);
 		}
 		break;
 	default:
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [RFC 19/19] vhost: enable IOMMU support
  2017-07-04  9:49 [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (17 preceding siblings ...)
  2017-07-04  9:49 ` [RFC 18/19] vhost: iommu: Invalidate vring in case of matching IOTLB invalidate Maxime Coquelin
@ 2017-07-04  9:49 ` Maxime Coquelin
  2017-08-31  9:10 ` [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
  19 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2017-07-04  9:49 UTC (permalink / raw)
  To: dev, Yuanhan Liu; +Cc: mst, vkaplans, jasowang, jfreiman, Maxime Coquelin

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/vhost.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h
index 9aef00e..ab6e66a 100644
--- a/lib/librte_vhost/vhost.h
+++ b/lib/librte_vhost/vhost.h
@@ -196,7 +196,8 @@ struct vhost_msg {
 				(1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
 				(1ULL << VIRTIO_NET_F_GUEST_TSO6) | \
 				(1ULL << VIRTIO_RING_F_INDIRECT_DESC) | \
-				(1ULL << VIRTIO_NET_F_MTU))
+				(1ULL << VIRTIO_NET_F_MTU) | \
+				(1ULL << VIRTIO_F_IOMMU_PLATFORM))
 
 
 struct guest_page {
-- 
2.9.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [RFC 01/19] vhost: protect virtio_net device struct
  2017-07-04  9:49 ` [RFC 01/19] vhost: protect virtio_net device struct Maxime Coquelin
@ 2017-07-05 10:07   ` Jens Freimann
  2017-07-07  7:31     ` Maxime Coquelin
  0 siblings, 1 reply; 23+ messages in thread
From: Jens Freimann @ 2017-07-05 10:07 UTC (permalink / raw)
  To: Maxime Coquelin; +Cc: dev, Yuanhan Liu, mst, vkaplans, jasowang

On Tue, Jul 04, 2017 at 11:49:04AM +0200, Maxime Coquelin wrote:
>virtio_net device might be accessed while being reallocated
>in case of NUMA awareness. This case might be theoretical,
>but it will be needed anyway to protect vrings pages against
>invalidation.
>
>The virtio_net devs are now protected with a readers/writers
>lock, so that before reallocating the device, it is ensured
>that it is not being referenced by the processing threads.
>
>Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>---
> lib/librte_vhost/vhost.c      | 223 +++++++++++++++++++++++++++++++++++-------
> lib/librte_vhost/vhost.h      |   3 +-
> lib/librte_vhost/vhost_user.c |  73 +++++---------
> lib/librte_vhost/virtio_net.c |  17 +++-
> 4 files changed, 228 insertions(+), 88 deletions(-)
[...]
>+int
>+realloc_device(int vid, int vq_index, int node)
>+{
>+	struct virtio_net *dev, *old_dev;
>+	struct vhost_virtqueue *vq;
>+
>+	dev = rte_malloc_socket(NULL, sizeof(*dev), 0, node);
>+	if (!dev)
>+		return -1;
>+
>+	vq = rte_malloc_socket(NULL, sizeof(*vq), 0, node);
>+	if (!vq)
>+		return -1;
>+
>+	old_dev = get_device_wr(vid);
>+	if (!old_dev)
>+		return -1;

Should we free vq and dev here?

regards,
Jens 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 01/19] vhost: protect virtio_net device struct
  2017-07-05 10:07   ` Jens Freimann
@ 2017-07-07  7:31     ` Maxime Coquelin
  0 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2017-07-07  7:31 UTC (permalink / raw)
  To: Jens Freimann; +Cc: dev, Yuanhan Liu, mst, vkaplans, jasowang



On 07/05/2017 12:07 PM, Jens Freimann wrote:
> On Tue, Jul 04, 2017 at 11:49:04AM +0200, Maxime Coquelin wrote:
>> virtio_net device might be accessed while being reallocated
>> in case of NUMA awareness. This case might be theoretical,
>> but it will be needed anyway to protect vrings pages against
>> invalidation.
>>
>> The virtio_net devs are now protected with a readers/writers
>> lock, so that before reallocating the device, it is ensured
>> that it is not being referenced by the processing threads.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>> lib/librte_vhost/vhost.c      | 223 
>> +++++++++++++++++++++++++++++++++++-------
>> lib/librte_vhost/vhost.h      |   3 +-
>> lib/librte_vhost/vhost_user.c |  73 +++++---------
>> lib/librte_vhost/virtio_net.c |  17 +++-
>> 4 files changed, 228 insertions(+), 88 deletions(-)
> [...]
>> +int
>> +realloc_device(int vid, int vq_index, int node)
>> +{
>> +    struct virtio_net *dev, *old_dev;
>> +    struct vhost_virtqueue *vq;
>> +
>> +    dev = rte_malloc_socket(NULL, sizeof(*dev), 0, node);
>> +    if (!dev)
>> +        return -1;
>> +
>> +    vq = rte_malloc_socket(NULL, sizeof(*vq), 0, node);
>> +    if (!vq)
>> +        return -1;
>> +
>> +    old_dev = get_device_wr(vid);
>> +    if (!old_dev)
>> +        return -1;
> 
> Should we free vq and dev here?

Of course we should.
This will be fixed in next release.

Thanks,
Maxime

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC 00/19] Vhost-user: Implement device IOTLB support
  2017-07-04  9:49 [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
                   ` (18 preceding siblings ...)
  2017-07-04  9:49 ` [RFC 19/19] vhost: enable IOMMU support Maxime Coquelin
@ 2017-08-31  9:10 ` Maxime Coquelin
  19 siblings, 0 replies; 23+ messages in thread
From: Maxime Coquelin @ 2017-08-31  9:10 UTC (permalink / raw)
  To: dev, Yuanhan Liu; +Cc: mst, vkaplans, jasowang, jfreiman

Hi,

On 07/04/2017 11:49 AM, Maxime Coquelin wrote:
> This first RFC, which targets v17.11,  adds support for
> VIRTIO_F_IOMMU_PLATFORM feature, by implementing device IOTLB in the
> vhost-user backend. It improves the guest safety by enabling the
> possibility to isolate the Virtio device.
> 
> It makes possible to use Virtio PMD in guest with using VFIO driver
> without enable_unsafe_noiommu_mode parameter set, so that the DPDK
> application on guest can only access memory its has been allowed to,
> and preventing malicious/buggy DPDK application in guest to make
> vhost-user backend write random guest memory. Note that Virtio-net
> Kernel driver also support IOMMU.
> 
> The series depends on Qemu's "vhost-user: Specify and implement
> device IOTLB support" [0], available upstream and which will be part
> of Qemu v2.10 release.
> 
> Performance-wise, even if this RFC has still room for optimizations,
> no performance degradation is noticed with static mappings (i.e. DPDK
> on guest) with PVP benchmark:
> 	Traffic Generator: Moongen (lua-trafficgen)
> 	Acceptable Loss: 0.005%
> 	Validation run time: 1 min
> 	Guest DPDK version/commit: v17.05
> 	QEMU version/commit: master (6db174aed1fd)
> 	Virtio features: default
> 	CPU: Intel(R) Xeon(R) CPU E5-2667 v4 @ 3.20GHz
> 	NIC: 2 x X710
> 	Page size: 1G host/1G guest
> 	Results (bidirectional, total of the two flows):
> 	 - base: 18.8Mpps
> 	 - base + IOTLB series, IOMMU OFF: 18.8Mpps
> 	 - base + IOTLB series, IOMMU ON: 18.8Mpps

It seems that I did a mistake when benchmarking with IOMMU on.
Actually, with this RFC, the result is 14.5Mpps, which is a noticeable
performance degradation.

Next revision fixing this issue is coming soon, performance is recovered
to 18.8Mpps.

Cheers,
Maxime

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2017-08-31  9:10 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-04  9:49 [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin
2017-07-04  9:49 ` [RFC 01/19] vhost: protect virtio_net device struct Maxime Coquelin
2017-07-05 10:07   ` Jens Freimann
2017-07-07  7:31     ` Maxime Coquelin
2017-07-04  9:49 ` [RFC 02/19] Revert "vhost: workaround MQ fails to startup" Maxime Coquelin
2017-07-04  9:49 ` [RFC 03/19] vhost: prepare send_vhost_message() to slave requests Maxime Coquelin
2017-07-04  9:49 ` [RFC 04/19] vhost: add support to slave requests channel Maxime Coquelin
2017-07-04  9:49 ` [RFC 05/19] vhost: declare missing IOMMU-related definitions for old kernels Maxime Coquelin
2017-07-04  9:49 ` [RFC 06/19] vhost: add iotlb helper functions Maxime Coquelin
2017-07-04  9:49 ` [RFC 07/19] vhost-user: add support to IOTLB miss slave requests Maxime Coquelin
2017-07-04  9:49 ` [RFC 08/19] vhost: initialize vrings IOTLB caches Maxime Coquelin
2017-07-04  9:49 ` [RFC 09/19] vhost: implement IOTLB events notification mechanism Maxime Coquelin
2017-07-04  9:49 ` [RFC 10/19] vhost-user: handle IOTLB update and invalidate requests Maxime Coquelin
2017-07-04  9:49 ` [RFC 11/19] vhost: introduce guest IOVA to backend VA helper Maxime Coquelin
2017-07-04  9:49 ` [RFC 12/19] vhost: use the guest IOVA to host " Maxime Coquelin
2017-07-04  9:49 ` [RFC 13/19] vhost: enable rings at the right time Maxime Coquelin
2017-07-04  9:49 ` [RFC 14/19] vhost: don't dereference invalid dev pointer after its reallocation Maxime Coquelin
2017-07-04  9:49 ` [RFC 15/19] vhost: postpone rings adresses translation Maxime Coquelin
2017-07-04  9:49 ` [RFC 16/19] vhost-user: translate ring addresses when IOMMU enabled Maxime Coquelin
2017-07-04  9:49 ` [RFC 17/19] vhost-user: iommu: postpone device creation until ring are mapped Maxime Coquelin
2017-07-04  9:49 ` [RFC 18/19] vhost: iommu: Invalidate vring in case of matching IOTLB invalidate Maxime Coquelin
2017-07-04  9:49 ` [RFC 19/19] vhost: enable IOMMU support Maxime Coquelin
2017-08-31  9:10 ` [RFC 00/19] Vhost-user: Implement device IOTLB support Maxime Coquelin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.