KVM Archive on lore.kernel.org
* [PATCH 0/5] vDPA support
@ 2020-01-16 12:42 Jason Wang
  2020-01-16 12:42 ` [PATCH 1/5] vhost: factor out IOTLB Jason Wang
                   ` (5 more replies)
  0 siblings, 6 replies; 76+ messages in thread
From: Jason Wang @ 2020-01-16 12:42 UTC
  To: mst, jasowang, linux-kernel, kvm, virtualization, netdev
  Cc: tiwei.bie, jgg, maxime.coquelin, cunming.liang, zhihong.wang,
	rob.miller, xiao.w.wang, haotian.wang, lingshan.zhu, eperezma,
	lulu, parav, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, jiri, shahafs, hanand, mhabets

Hi all:

Based on the comments and discussion on mdev-based hardware virtio
offloading support [1], this series proposes a different approach to
supporting vDPA devices.

Instead of leveraging VFIO/mdev, which may not work for some vendors,
this series introduces a dedicated vDPA bus and leverages vhost for
userspace drivers. This helps devices that are not a good fit for VFIO
and may reduce the conflict over the bus template for virtual devices
proposed in [1].

The vDPA support is split into the following parts:

1) vDPA core (bus, device and driver abstraction)
2) virtio vDPA transport for kernel virtio driver to control vDPA
   device
3) vhost vDPA bus driver for userspace vhost driver to control vDPA
   device
4) vendor vDPA drivers
5) management API

Both 1) and 2) are included in this series. Tiwei will work on part
3). For 4), Ling Shan will work on and post the IFCVF driver. For 5),
we leave the implementation to vendors, but it would be better to agree
on how management creates/configures/destroys vDPA devices.
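
As a rough illustration of how parts 1) to 4) fit together, here is a
minimal userspace sketch. It is not the API from this series: every
toy_* name below is hypothetical, and the two bus drivers share one
probe routine only to keep the example short. The point is that the
core just pairs a device with a bus driver, and each bus driver talks
to the device solely through its ops table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device-side operations (stand-in for a vendor driver). */
struct toy_vdpa_ops {
	uint64_t (*get_features)(void *priv);
	void (*kick_vq)(void *priv, unsigned int idx);
};

/* Hypothetical device object that would sit on the bus. */
struct toy_vdpa_device {
	const struct toy_vdpa_ops *ops;
	void *priv;
};

/* Hypothetical bus driver: parts 2) and 3) would each provide one. */
struct toy_vdpa_driver {
	const char *name;
	int (*probe)(struct toy_vdpa_device *dev);
};

/* Core: only pairs a device with a driver and calls probe(). */
int toy_vdpa_bind(const struct toy_vdpa_driver *drv,
		  struct toy_vdpa_device *dev)
{
	if (!drv || !drv->probe || !dev || !dev->ops)
		return -1;
	return drv->probe(dev);
}

/* A software device, loosely standing in for the simulator. */
struct toy_sim {
	unsigned int kicks;
};

static uint64_t toy_sim_get_features(void *priv)
{
	(void)priv;
	return 0x1; /* arbitrary feature bit for the demo */
}

static void toy_sim_kick_vq(void *priv, unsigned int idx)
{
	(void)idx;
	((struct toy_sim *)priv)->kicks++;
}

const struct toy_vdpa_ops toy_sim_ops = {
	.get_features = toy_sim_get_features,
	.kick_vq = toy_sim_kick_vq,
};

/* Both bus drivers reach the device only through its ops table. */
static int toy_probe(struct toy_vdpa_device *dev)
{
	assert(dev->ops->get_features && dev->ops->kick_vq);
	if (!dev->ops->get_features(dev->priv))
		return -1;
	dev->ops->kick_vq(dev->priv, 0);
	return 0;
}

const struct toy_vdpa_driver toy_virtio_vdpa = { "toy-virtio-vdpa", toy_probe };
const struct toy_vdpa_driver toy_vhost_vdpa = { "toy-vhost-vdpa", toy_probe };
```

Binding the same toy device first to toy_virtio_vdpa and then to
toy_vhost_vdpa exercises the same ops table twice, which is the
property the split above is after.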

The sample driver is kept but renamed to vdpa_sim. An on-chip IOMMU
implementation is added to the sample device to make it work for both
the kernel virtio driver and the userspace vhost driver. It implements
a sysfs-based management API, but it can switch to any other (e.g.
devlink) if necessary.

Please refer to each patch for more information.

Comments are welcome.

[1] https://lkml.org/lkml/2019/11/18/261

Jason Wang (5):
  vhost: factor out IOTLB
  vringh: IOTLB support
  vDPA: introduce vDPA bus
  virtio: introduce a vDPA based transport
  vdpasim: vDPA device simulator

 MAINTAINERS                    |   2 +
 drivers/vhost/Kconfig          |   7 +
 drivers/vhost/Kconfig.vringh   |   1 +
 drivers/vhost/Makefile         |   2 +
 drivers/vhost/net.c            |   2 +-
 drivers/vhost/vhost.c          | 221 +++------
 drivers/vhost/vhost.h          |  36 +-
 drivers/vhost/vhost_iotlb.c    | 171 +++++++
 drivers/vhost/vringh.c         | 434 +++++++++++++++++-
 drivers/virtio/Kconfig         |  15 +
 drivers/virtio/Makefile        |   2 +
 drivers/virtio/vdpa/Kconfig    |  26 ++
 drivers/virtio/vdpa/Makefile   |   3 +
 drivers/virtio/vdpa/vdpa.c     | 141 ++++++
 drivers/virtio/vdpa/vdpa_sim.c | 796 +++++++++++++++++++++++++++++++++
 drivers/virtio/virtio_vdpa.c   | 400 +++++++++++++++++
 include/linux/vdpa.h           | 191 ++++++++
 include/linux/vhost_iotlb.h    |  45 ++
 include/linux/vringh.h         |  36 ++
 19 files changed, 2327 insertions(+), 204 deletions(-)
 create mode 100644 drivers/vhost/vhost_iotlb.c
 create mode 100644 drivers/virtio/vdpa/Kconfig
 create mode 100644 drivers/virtio/vdpa/Makefile
 create mode 100644 drivers/virtio/vdpa/vdpa.c
 create mode 100644 drivers/virtio/vdpa/vdpa_sim.c
 create mode 100644 drivers/virtio/virtio_vdpa.c
 create mode 100644 include/linux/vdpa.h
 create mode 100644 include/linux/vhost_iotlb.h

-- 
2.19.1



* [PATCH 1/5] vhost: factor out IOTLB
  2020-01-16 12:42 [PATCH 0/5] vDPA support Jason Wang
@ 2020-01-16 12:42 ` Jason Wang
  2020-01-17  4:14   ` Randy Dunlap
                     ` (2 more replies)
  2020-01-16 12:42 ` [PATCH 2/5] vringh: IOTLB support Jason Wang
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 76+ messages in thread
From: Jason Wang @ 2020-01-16 12:42 UTC
  To: mst, jasowang, linux-kernel, kvm, virtualization, netdev
  Cc: tiwei.bie, jgg, maxime.coquelin, cunming.liang, zhihong.wang,
	rob.miller, xiao.w.wang, haotian.wang, lingshan.zhu, eperezma,
	lulu, parav, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, jiri, shahafs, hanand, mhabets

This patch factors out IOTLB into a dedicated module so that it can be
reused by other modules like vringh. Users may enable automatic
retiring (evicting the oldest entry once the entry limit is reached) by
specifying the VHOST_IOTLB_FLAG_RETIRE flag, which fits the vhost
device IOTLB implementation.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 MAINTAINERS                 |   1 +
 drivers/vhost/Kconfig       |   7 ++
 drivers/vhost/Makefile      |   2 +
 drivers/vhost/net.c         |   2 +-
 drivers/vhost/vhost.c       | 221 +++++++++++-------------------------
 drivers/vhost/vhost.h       |  36 ++----
 drivers/vhost/vhost_iotlb.c | 171 ++++++++++++++++++++++++++++
 include/linux/vhost_iotlb.h |  45 ++++++++
 8 files changed, 304 insertions(+), 181 deletions(-)
 create mode 100644 drivers/vhost/vhost_iotlb.c
 create mode 100644 include/linux/vhost_iotlb.h
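
For readers who want to see the retire behaviour in isolation, here is
a minimal userspace sketch. It is a model, not the kernel code: the
toy_* names are hypothetical, a singly linked list in insertion order
replaces the kernel list, and a linear scan replaces the interval
tree. The retire condition mirrors the one in vhost_iotlb_add_range()
below, so, as there, the limit only takes effect when the flag is set.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_FLAG_RETIRE 0x1 /* models VHOST_IOTLB_FLAG_RETIRE */

struct toy_map {
	struct toy_map *next; /* insertion order, oldest first */
	uint64_t start, last, addr;
};

struct toy_iotlb {
	struct toy_map *head, **tail;
	unsigned int limit, nmaps, flags;
};

void toy_iotlb_init(struct toy_iotlb *t, unsigned int limit,
		    unsigned int flags)
{
	t->head = NULL;
	t->tail = &t->head;
	t->limit = limit;
	t->nmaps = 0;
	t->flags = flags;
}

/* Drop the oldest mapping (head of the list). */
static void toy_retire_oldest(struct toy_iotlb *t)
{
	struct toy_map *old = t->head;

	assert(old);
	t->head = old->next;
	if (!t->head)
		t->tail = &t->head;
	free(old);
	t->nmaps--;
}

/* Returns 0 on success, -1 on a bad range or allocation failure. */
int toy_iotlb_add_range(struct toy_iotlb *t, uint64_t start,
			uint64_t last, uint64_t addr)
{
	struct toy_map *map;

	if (last < start)
		return -1;

	/* Same condition as the patch: full, and retiring enabled. */
	if (t->limit && t->nmaps == t->limit &&
	    (t->flags & TOY_FLAG_RETIRE))
		toy_retire_oldest(t);

	map = malloc(sizeof(*map));
	if (!map)
		return -1;

	map->start = start;
	map->last = last;
	map->addr = addr;
	map->next = NULL;
	*t->tail = map;
	t->tail = &map->next;
	t->nmaps++;
	return 0;
}

/* Linear scan standing in for the interval-tree overlap lookup. */
const struct toy_map *toy_iotlb_first(const struct toy_iotlb *t,
				      uint64_t start, uint64_t last)
{
	const struct toy_map *m;

	for (m = t->head; m; m = m->next)
		if (m->start <= last && start <= m->last)
			return m;
	return NULL;
}
```

With a limit of 2 and the retire flag set, adding a third range drops
the first one, so a later lookup of the first range misses. Unlike the
interval tree, the linear scan returns the oldest overlapping entry
rather than the one with the lowest start.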

diff --git a/MAINTAINERS b/MAINTAINERS
index 2549f10eb0b1..d4bda9c900fa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17607,6 +17607,7 @@ T:	git git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git
 S:	Maintained
 F:	drivers/vhost/
 F:	include/uapi/linux/vhost.h
+F:	include/linux/vhost_iotlb.h
 
 VIRTIO INPUT DRIVER
 M:	Gerd Hoffmann <kraxel@redhat.com>
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 3d03ccbd1adc..f21c45aa5e07 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -36,6 +36,7 @@ config VHOST_VSOCK
 
 config VHOST
 	tristate
+	depends on VHOST_IOTLB
 	---help---
 	  This option is selected by any driver which needs to access
 	  the core of vhost.
@@ -54,3 +55,9 @@ config VHOST_CROSS_ENDIAN_LEGACY
 	  adds some overhead, it is disabled by default.
 
 	  If unsure, say "N".
+
+config VHOST_IOTLB
+	tristate
+	default m
+	help
+	  Generic IOTLB implementation for vhost and vringh.
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 6c6df24f770c..df99756fbb26 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -11,3 +11,5 @@ vhost_vsock-y := vsock.o
 obj-$(CONFIG_VHOST_RING) += vringh.o
 
 obj-$(CONFIG_VHOST)	+= vhost.o
+
+obj-$(CONFIG_VHOST_IOTLB) += vhost_iotlb.o
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index e158159671fa..e4a20d7a2921 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1594,7 +1594,7 @@ static long vhost_net_reset_owner(struct vhost_net *n)
 	struct socket *tx_sock = NULL;
 	struct socket *rx_sock = NULL;
 	long err;
-	struct vhost_umem *umem;
+	struct vhost_iotlb *umem;
 
 	mutex_lock(&n->dev.mutex);
 	err = vhost_dev_check_owner(&n->dev);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index f44340b41494..9059b95cac83 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -50,10 +50,6 @@ enum {
 #define vhost_used_event(vq) ((__virtio16 __user *)&vq->avail->ring[vq->num])
 #define vhost_avail_event(vq) ((__virtio16 __user *)&vq->used->ring[vq->num])
 
-INTERVAL_TREE_DEFINE(struct vhost_umem_node,
-		     rb, __u64, __subtree_last,
-		     START, LAST, static inline, vhost_umem_interval_tree);
-
 #ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY
 static void vhost_disable_cross_endian(struct vhost_virtqueue *vq)
 {
@@ -581,21 +577,25 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
 }
 EXPORT_SYMBOL_GPL(vhost_dev_set_owner);
 
-struct vhost_umem *vhost_dev_reset_owner_prepare(void)
+static struct vhost_iotlb *iotlb_alloc(void)
+{
+	return vhost_iotlb_alloc(max_iotlb_entries,
+				 VHOST_IOTLB_FLAG_RETIRE);
+}
+
+struct vhost_iotlb *vhost_dev_reset_owner_prepare(void)
 {
-	return kvzalloc(sizeof(struct vhost_umem), GFP_KERNEL);
+	return iotlb_alloc();
 }
 EXPORT_SYMBOL_GPL(vhost_dev_reset_owner_prepare);
 
 /* Caller should have device mutex */
-void vhost_dev_reset_owner(struct vhost_dev *dev, struct vhost_umem *umem)
+void vhost_dev_reset_owner(struct vhost_dev *dev, struct vhost_iotlb *umem)
 {
 	int i;
 
 	vhost_dev_cleanup(dev);
 
-	/* Restore memory to default empty mapping. */
-	INIT_LIST_HEAD(&umem->umem_list);
 	dev->umem = umem;
 	/* We don't need VQ locks below since vhost_dev_cleanup makes sure
 	 * VQs aren't running.
@@ -618,28 +618,6 @@ void vhost_dev_stop(struct vhost_dev *dev)
 }
 EXPORT_SYMBOL_GPL(vhost_dev_stop);
 
-static void vhost_umem_free(struct vhost_umem *umem,
-			    struct vhost_umem_node *node)
-{
-	vhost_umem_interval_tree_remove(node, &umem->umem_tree);
-	list_del(&node->link);
-	kfree(node);
-	umem->numem--;
-}
-
-static void vhost_umem_clean(struct vhost_umem *umem)
-{
-	struct vhost_umem_node *node, *tmp;
-
-	if (!umem)
-		return;
-
-	list_for_each_entry_safe(node, tmp, &umem->umem_list, link)
-		vhost_umem_free(umem, node);
-
-	kvfree(umem);
-}
-
 static void vhost_clear_msg(struct vhost_dev *dev)
 {
 	struct vhost_msg_node *node, *n;
@@ -677,9 +655,9 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
 		eventfd_ctx_put(dev->log_ctx);
 	dev->log_ctx = NULL;
 	/* No one will access memory at this point */
-	vhost_umem_clean(dev->umem);
+	vhost_iotlb_free(dev->umem);
 	dev->umem = NULL;
-	vhost_umem_clean(dev->iotlb);
+	vhost_iotlb_free(dev->iotlb);
 	dev->iotlb = NULL;
 	vhost_clear_msg(dev);
 	wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
@@ -715,27 +693,26 @@ static bool vhost_overflow(u64 uaddr, u64 size)
 }
 
 /* Caller should have vq mutex and device mutex. */
-static bool vq_memory_access_ok(void __user *log_base, struct vhost_umem *umem,
+static bool vq_memory_access_ok(void __user *log_base, struct vhost_iotlb *umem,
 				int log_all)
 {
-	struct vhost_umem_node *node;
+	struct vhost_iotlb_map *map;
 
 	if (!umem)
 		return false;
 
-	list_for_each_entry(node, &umem->umem_list, link) {
-		unsigned long a = node->userspace_addr;
+	list_for_each_entry(map, &umem->list, link) {
+		unsigned long a = map->addr;
 
-		if (vhost_overflow(node->userspace_addr, node->size))
+		if (vhost_overflow(map->addr, map->size))
 			return false;
 
 
-		if (!access_ok((void __user *)a,
-				    node->size))
+		if (!access_ok((void __user *)a, map->size))
 			return false;
 		else if (log_all && !log_access_ok(log_base,
-						   node->start,
-						   node->size))
+						   map->start,
+						   map->size))
 			return false;
 	}
 	return true;
@@ -745,17 +722,17 @@ static inline void __user *vhost_vq_meta_fetch(struct vhost_virtqueue *vq,
 					       u64 addr, unsigned int size,
 					       int type)
 {
-	const struct vhost_umem_node *node = vq->meta_iotlb[type];
+	const struct vhost_iotlb_map *map = vq->meta_iotlb[type];
 
-	if (!node)
+	if (!map)
 		return NULL;
 
-	return (void *)(uintptr_t)(node->userspace_addr + addr - node->start);
+	return (void *)(uintptr_t)(map->addr + addr - map->start);
 }
 
 /* Can we switch to this memory table? */
 /* Caller should have device mutex but not vq mutex */
-static bool memory_access_ok(struct vhost_dev *d, struct vhost_umem *umem,
+static bool memory_access_ok(struct vhost_dev *d, struct vhost_iotlb *umem,
 			     int log_all)
 {
 	int i;
@@ -1020,47 +997,6 @@ static inline int vhost_get_desc(struct vhost_virtqueue *vq,
 	return vhost_copy_from_user(vq, desc, vq->desc + idx, sizeof(*desc));
 }
 
-static int vhost_new_umem_range(struct vhost_umem *umem,
-				u64 start, u64 size, u64 end,
-				u64 userspace_addr, int perm)
-{
-	struct vhost_umem_node *tmp, *node;
-
-	if (!size)
-		return -EFAULT;
-
-	node = kmalloc(sizeof(*node), GFP_ATOMIC);
-	if (!node)
-		return -ENOMEM;
-
-	if (umem->numem == max_iotlb_entries) {
-		tmp = list_first_entry(&umem->umem_list, typeof(*tmp), link);
-		vhost_umem_free(umem, tmp);
-	}
-
-	node->start = start;
-	node->size = size;
-	node->last = end;
-	node->userspace_addr = userspace_addr;
-	node->perm = perm;
-	INIT_LIST_HEAD(&node->link);
-	list_add_tail(&node->link, &umem->umem_list);
-	vhost_umem_interval_tree_insert(node, &umem->umem_tree);
-	umem->numem++;
-
-	return 0;
-}
-
-static void vhost_del_umem_range(struct vhost_umem *umem,
-				 u64 start, u64 end)
-{
-	struct vhost_umem_node *node;
-
-	while ((node = vhost_umem_interval_tree_iter_first(&umem->umem_tree,
-							   start, end)))
-		vhost_umem_free(umem, node);
-}
-
 static void vhost_iotlb_notify_vq(struct vhost_dev *d,
 				  struct vhost_iotlb_msg *msg)
 {
@@ -1117,9 +1053,9 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
 			break;
 		}
 		vhost_vq_meta_reset(dev);
-		if (vhost_new_umem_range(dev->iotlb, msg->iova, msg->size,
-					 msg->iova + msg->size - 1,
-					 msg->uaddr, msg->perm)) {
+		if (vhost_iotlb_add_range(dev->iotlb, msg->iova,
+					  msg->iova + msg->size - 1,
+					  msg->uaddr, msg->perm)) {
 			ret = -ENOMEM;
 			break;
 		}
@@ -1131,8 +1067,8 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
 			break;
 		}
 		vhost_vq_meta_reset(dev);
-		vhost_del_umem_range(dev->iotlb, msg->iova,
-				     msg->iova + msg->size - 1);
+		vhost_iotlb_del_range(dev->iotlb, msg->iova,
+				      msg->iova + msg->size - 1);
 		break;
 	default:
 		ret = -EINVAL;
@@ -1311,44 +1247,42 @@ static bool vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
 }
 
 static void vhost_vq_meta_update(struct vhost_virtqueue *vq,
-				 const struct vhost_umem_node *node,
+				 const struct vhost_iotlb_map *map,
 				 int type)
 {
 	int access = (type == VHOST_ADDR_USED) ?
 		     VHOST_ACCESS_WO : VHOST_ACCESS_RO;
 
-	if (likely(node->perm & access))
-		vq->meta_iotlb[type] = node;
+	if (likely(map->perm & access))
+		vq->meta_iotlb[type] = map;
 }
 
 static bool iotlb_access_ok(struct vhost_virtqueue *vq,
 			    int access, u64 addr, u64 len, int type)
 {
-	const struct vhost_umem_node *node;
-	struct vhost_umem *umem = vq->iotlb;
+	const struct vhost_iotlb_map *map;
+	struct vhost_iotlb *umem = vq->iotlb;
 	u64 s = 0, size, orig_addr = addr, last = addr + len - 1;
 
 	if (vhost_vq_meta_fetch(vq, addr, len, type))
 		return true;
 
 	while (len > s) {
-		node = vhost_umem_interval_tree_iter_first(&umem->umem_tree,
-							   addr,
-							   last);
-		if (node == NULL || node->start > addr) {
+		map = vhost_iotlb_itree_first(umem, addr, last);
+		if (map == NULL || map->start > addr) {
 			vhost_iotlb_miss(vq, addr, access);
 			return false;
-		} else if (!(node->perm & access)) {
+		} else if (!(map->perm & access)) {
 			/* Report the possible access violation by
 			 * request another translation from userspace.
 			 */
 			return false;
 		}
 
-		size = node->size - addr + node->start;
+		size = map->size - addr + map->start;
 
 		if (orig_addr == addr && size >= len)
-			vhost_vq_meta_update(vq, node, type);
+			vhost_vq_meta_update(vq, map, type);
 
 		s += size;
 		addr += size;
@@ -1364,12 +1298,12 @@ int vq_meta_prefetch(struct vhost_virtqueue *vq)
 	if (!vq->iotlb)
 		return 1;
 
-	return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
+	return iotlb_access_ok(vq, VHOST_MAP_RO, (u64)(uintptr_t)vq->desc,
 			       vhost_get_desc_size(vq, num), VHOST_ADDR_DESC) &&
-	       iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->avail,
+	       iotlb_access_ok(vq, VHOST_MAP_RO, (u64)(uintptr_t)vq->avail,
 			       vhost_get_avail_size(vq, num),
 			       VHOST_ADDR_AVAIL) &&
-	       iotlb_access_ok(vq, VHOST_ACCESS_WO, (u64)(uintptr_t)vq->used,
+	       iotlb_access_ok(vq, VHOST_MAP_WO, (u64)(uintptr_t)vq->used,
 			       vhost_get_used_size(vq, num), VHOST_ADDR_USED);
 }
 EXPORT_SYMBOL_GPL(vq_meta_prefetch);
@@ -1408,25 +1342,11 @@ bool vhost_vq_access_ok(struct vhost_virtqueue *vq)
 }
 EXPORT_SYMBOL_GPL(vhost_vq_access_ok);
 
-static struct vhost_umem *vhost_umem_alloc(void)
-{
-	struct vhost_umem *umem = kvzalloc(sizeof(*umem), GFP_KERNEL);
-
-	if (!umem)
-		return NULL;
-
-	umem->umem_tree = RB_ROOT_CACHED;
-	umem->numem = 0;
-	INIT_LIST_HEAD(&umem->umem_list);
-
-	return umem;
-}
-
 static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 {
 	struct vhost_memory mem, *newmem;
 	struct vhost_memory_region *region;
-	struct vhost_umem *newumem, *oldumem;
+	struct vhost_iotlb *newumem, *oldumem;
 	unsigned long size = offsetof(struct vhost_memory, regions);
 	int i;
 
@@ -1448,7 +1368,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 		return -EFAULT;
 	}
 
-	newumem = vhost_umem_alloc();
+	newumem = iotlb_alloc();
 	if (!newumem) {
 		kvfree(newmem);
 		return -ENOMEM;
@@ -1457,13 +1377,12 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 	for (region = newmem->regions;
 	     region < newmem->regions + mem.nregions;
 	     region++) {
-		if (vhost_new_umem_range(newumem,
-					 region->guest_phys_addr,
-					 region->memory_size,
-					 region->guest_phys_addr +
-					 region->memory_size - 1,
-					 region->userspace_addr,
-					 VHOST_ACCESS_RW))
+		if (vhost_iotlb_add_range(newumem,
+					  region->guest_phys_addr,
+					  region->guest_phys_addr +
+					  region->memory_size - 1,
+					  region->userspace_addr,
+					  VHOST_MAP_RW))
 			goto err;
 	}
 
@@ -1481,11 +1400,11 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
 	}
 
 	kvfree(newmem);
-	vhost_umem_clean(oldumem);
+	vhost_iotlb_free(oldumem);
 	return 0;
 
 err:
-	vhost_umem_clean(newumem);
+	vhost_iotlb_free(newumem);
 	kvfree(newmem);
 	return -EFAULT;
 }
@@ -1726,10 +1645,10 @@ EXPORT_SYMBOL_GPL(vhost_vring_ioctl);
 
 int vhost_init_device_iotlb(struct vhost_dev *d, bool enabled)
 {
-	struct vhost_umem *niotlb, *oiotlb;
+	struct vhost_iotlb *niotlb, *oiotlb;
 	int i;
 
-	niotlb = vhost_umem_alloc();
+	niotlb = iotlb_alloc();
 	if (!niotlb)
 		return -ENOMEM;
 
@@ -1745,7 +1664,7 @@ int vhost_init_device_iotlb(struct vhost_dev *d, bool enabled)
 		mutex_unlock(&vq->mutex);
 	}
 
-	vhost_umem_clean(oiotlb);
+	vhost_iotlb_free(oiotlb);
 
 	return 0;
 }
@@ -1875,8 +1794,8 @@ static int log_write(void __user *log_base,
 
 static int log_write_hva(struct vhost_virtqueue *vq, u64 hva, u64 len)
 {
-	struct vhost_umem *umem = vq->umem;
-	struct vhost_umem_node *u;
+	struct vhost_iotlb *umem = vq->umem;
+	struct vhost_iotlb_map *u;
 	u64 start, end, l, min;
 	int r;
 	bool hit = false;
@@ -1886,16 +1805,15 @@ static int log_write_hva(struct vhost_virtqueue *vq, u64 hva, u64 len)
 		/* More than one GPAs can be mapped into a single HVA. So
 		 * iterate all possible umems here to be safe.
 		 */
-		list_for_each_entry(u, &umem->umem_list, link) {
-			if (u->userspace_addr > hva - 1 + len ||
-			    u->userspace_addr - 1 + u->size < hva)
+		list_for_each_entry(u, &umem->list, link) {
+			if (u->addr > hva - 1 + len ||
+			    u->addr - 1 + u->size < hva)
 				continue;
-			start = max(u->userspace_addr, hva);
-			end = min(u->userspace_addr - 1 + u->size,
-				  hva - 1 + len);
+			start = max(u->addr, hva);
+			end = min(u->addr - 1 + u->size, hva - 1 + len);
 			l = end - start + 1;
 			r = log_write(vq->log_base,
-				      u->start + start - u->userspace_addr,
+				      u->start + start - u->addr,
 				      l);
 			if (r < 0)
 				return r;
@@ -2046,9 +1964,9 @@ EXPORT_SYMBOL_GPL(vhost_vq_init_access);
 static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len,
 			  struct iovec iov[], int iov_size, int access)
 {
-	const struct vhost_umem_node *node;
+	const struct vhost_iotlb_map *map;
 	struct vhost_dev *dev = vq->dev;
-	struct vhost_umem *umem = dev->iotlb ? dev->iotlb : dev->umem;
+	struct vhost_iotlb *umem = dev->iotlb ? dev->iotlb : dev->umem;
 	struct iovec *_iov;
 	u64 s = 0;
 	int ret = 0;
@@ -2060,25 +1978,24 @@ static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len,
 			break;
 		}
 
-		node = vhost_umem_interval_tree_iter_first(&umem->umem_tree,
-							addr, addr + len - 1);
-		if (node == NULL || node->start > addr) {
+		map = vhost_iotlb_itree_first(umem, addr, addr + len - 1);
+		if (map == NULL || map->start > addr) {
 			if (umem != dev->iotlb) {
 				ret = -EFAULT;
 				break;
 			}
 			ret = -EAGAIN;
 			break;
-		} else if (!(node->perm & access)) {
+		} else if (!(map->perm & access)) {
 			ret = -EPERM;
 			break;
 		}
 
 		_iov = iov + ret;
-		size = node->size - addr + node->start;
+		size = map->size - addr + map->start;
 		_iov->iov_len = min((u64)len - s, size);
 		_iov->iov_base = (void __user *)(unsigned long)
-			(node->userspace_addr + addr - node->start);
+				 (map->addr + addr - map->start);
 		s += size;
 		addr += size;
 		++ret;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index a123fd70847e..b99c6ffb6be1 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -12,6 +12,7 @@
 #include <linux/virtio_config.h>
 #include <linux/virtio_ring.h>
 #include <linux/atomic.h>
+#include <linux/vhost_iotlb.h>
 
 struct vhost_work;
 typedef void (*vhost_work_fn_t)(struct vhost_work *work);
@@ -52,27 +53,6 @@ struct vhost_log {
 	u64 len;
 };
 
-#define START(node) ((node)->start)
-#define LAST(node) ((node)->last)
-
-struct vhost_umem_node {
-	struct rb_node rb;
-	struct list_head link;
-	__u64 start;
-	__u64 last;
-	__u64 size;
-	__u64 userspace_addr;
-	__u32 perm;
-	__u32 flags_padding;
-	__u64 __subtree_last;
-};
-
-struct vhost_umem {
-	struct rb_root_cached umem_tree;
-	struct list_head umem_list;
-	int numem;
-};
-
 enum vhost_uaddr_type {
 	VHOST_ADDR_DESC = 0,
 	VHOST_ADDR_AVAIL = 1,
@@ -90,7 +70,7 @@ struct vhost_virtqueue {
 	struct vring_desc __user *desc;
 	struct vring_avail __user *avail;
 	struct vring_used __user *used;
-	const struct vhost_umem_node *meta_iotlb[VHOST_NUM_ADDRS];
+	const struct vhost_iotlb_map *meta_iotlb[VHOST_NUM_ADDRS];
 	struct file *kick;
 	struct eventfd_ctx *call_ctx;
 	struct eventfd_ctx *error_ctx;
@@ -128,8 +108,8 @@ struct vhost_virtqueue {
 	struct iovec *indirect;
 	struct vring_used_elem *heads;
 	/* Protected by virtqueue mutex. */
-	struct vhost_umem *umem;
-	struct vhost_umem *iotlb;
+	struct vhost_iotlb *umem;
+	struct vhost_iotlb *iotlb;
 	void *private_data;
 	u64 acked_features;
 	u64 acked_backend_features;
@@ -164,8 +144,8 @@ struct vhost_dev {
 	struct eventfd_ctx *log_ctx;
 	struct llist_head work_list;
 	struct task_struct *worker;
-	struct vhost_umem *umem;
-	struct vhost_umem *iotlb;
+	struct vhost_iotlb *umem;
+	struct vhost_iotlb *iotlb;
 	spinlock_t iotlb_lock;
 	struct list_head read_list;
 	struct list_head pending_list;
@@ -182,8 +162,8 @@ void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs,
 long vhost_dev_set_owner(struct vhost_dev *dev);
 bool vhost_dev_has_owner(struct vhost_dev *dev);
 long vhost_dev_check_owner(struct vhost_dev *);
-struct vhost_umem *vhost_dev_reset_owner_prepare(void);
-void vhost_dev_reset_owner(struct vhost_dev *, struct vhost_umem *);
+struct vhost_iotlb *vhost_dev_reset_owner_prepare(void);
+void vhost_dev_reset_owner(struct vhost_dev *dev, struct vhost_iotlb *iotlb);
 void vhost_dev_cleanup(struct vhost_dev *);
 void vhost_dev_stop(struct vhost_dev *);
 long vhost_dev_ioctl(struct vhost_dev *, unsigned int ioctl, void __user *argp);
diff --git a/drivers/vhost/vhost_iotlb.c b/drivers/vhost/vhost_iotlb.c
new file mode 100644
index 000000000000..e08710f1690c
--- /dev/null
+++ b/drivers/vhost/vhost_iotlb.c
@@ -0,0 +1,171 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (C) 2020 Red Hat, Inc.
+ * Author: Jason Wang <jasowang@redhat.com>
+ *
+ * IOTLB implementation for vhost.
+ */
+#include <linux/slab.h>
+#include <linux/vhost_iotlb.h>
+#include <linux/module.h>
+
+#define MOD_VERSION  "0.1"
+#define MOD_DESC     "VHOST IOTLB"
+#define MOD_AUTHOR   "Jason Wang <jasowang@redhat.com>"
+#define MOD_LICENSE  "GPL v2"
+
+#define START(map) ((map)->start)
+#define LAST(map) ((map)->last)
+
+INTERVAL_TREE_DEFINE(struct vhost_iotlb_map,
+		     rb, __u64, __subtree_last,
+		     START, LAST, static inline, vhost_iotlb_itree);
+
+static void iotlb_map_free(struct vhost_iotlb *iotlb,
+			   struct vhost_iotlb_map *map)
+{
+	vhost_iotlb_itree_remove(map, &iotlb->root);
+	list_del(&map->link);
+	kfree(map);
+	iotlb->nmaps--;
+}
+
+/**
+ * vhost_iotlb_add_range - add a new range to vhost IOTLB
+ * @iotlb: the IOTLB
+ * @start: start of the IOVA range
+ * @last: last of IOVA range
+ * @addr: the address that is mapped to @start
+ * @perm: access permission of this range
+ *
+ * Returns an error if @last is smaller than @start or if memory
+ * allocation fails
+ */
+int vhost_iotlb_add_range(struct vhost_iotlb *iotlb,
+			  u64 start, u64 last,
+			  u64 addr, unsigned int perm)
+{
+	struct vhost_iotlb_map *map;
+
+	if (last < start)
+		return -EFAULT;
+
+	if (iotlb->limit &&
+	    iotlb->nmaps == iotlb->limit &&
+	    iotlb->flags & VHOST_IOTLB_FLAG_RETIRE) {
+		map = list_first_entry(&iotlb->list, typeof(*map), link);
+		iotlb_map_free(iotlb, map);
+	}
+
+	map = kmalloc(sizeof(*map), GFP_ATOMIC);
+	if (!map)
+		return -ENOMEM;
+
+	map->start = start;
+	map->size = last - start + 1;
+	map->last = last;
+	map->addr = addr;
+	map->perm = perm;
+
+	iotlb->nmaps++;
+	vhost_iotlb_itree_insert(map, &iotlb->root);
+
+	INIT_LIST_HEAD(&map->link);
+	list_add_tail(&map->link, &iotlb->list);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vhost_iotlb_add_range);
+
+/**
+ * vhost_iotlb_del_range - delete overlapped ranges from vhost IOTLB
+ * @iotlb: the IOTLB
+ * @start: start of the IOVA range
+ * @last: last of IOVA range
+ */
+void vhost_iotlb_del_range(struct vhost_iotlb *iotlb, u64 start, u64 last)
+{
+	struct vhost_iotlb_map *map;
+
+	while ((map = vhost_iotlb_itree_iter_first(&iotlb->root,
+						   start, last)))
+		iotlb_map_free(iotlb, map);
+}
+EXPORT_SYMBOL_GPL(vhost_iotlb_del_range);
+
+/**
+ * vhost_iotlb_alloc - allocate a new vhost IOTLB
+ * @limit: maximum number of IOTLB entries
+ * @flags: VHOST_IOTLB_FLAG_XXX
+ *
+ * Returns NULL if memory allocation fails
+ */
+struct vhost_iotlb *vhost_iotlb_alloc(unsigned int limit, unsigned int flags)
+{
+	struct vhost_iotlb *iotlb = kzalloc(sizeof(*iotlb), GFP_KERNEL);
+
+	if (!iotlb)
+		return NULL;
+
+	iotlb->root = RB_ROOT_CACHED;
+	iotlb->limit = limit;
+	iotlb->nmaps = 0;
+	iotlb->flags = flags;
+	INIT_LIST_HEAD(&iotlb->list);
+
+	return iotlb;
+}
+EXPORT_SYMBOL_GPL(vhost_iotlb_alloc);
+
+/**
+ * vhost_iotlb_reset - reset vhost IOTLB (free all IOTLB entries)
+ * @iotlb: the IOTLB to be reset
+ */
+void vhost_iotlb_reset(struct vhost_iotlb *iotlb)
+{
+	vhost_iotlb_del_range(iotlb, 0ULL, 0ULL - 1);
+}
+EXPORT_SYMBOL_GPL(vhost_iotlb_reset);
+
+/**
+ * vhost_iotlb_free - reset and free vhost IOTLB
+ * @iotlb: the IOTLB to be freed
+ */
+void vhost_iotlb_free(struct vhost_iotlb *iotlb)
+{
+	if (iotlb) {
+		vhost_iotlb_reset(iotlb);
+		kfree(iotlb);
+	}
+}
+EXPORT_SYMBOL_GPL(vhost_iotlb_free);
+
+/**
+ * vhost_iotlb_itree_first - return the first overlapped range
+ * @iotlb: the IOTLB
+ * @start: start of IOVA range
+ * @last: last of IOVA range
+ */
+struct vhost_iotlb_map *
+vhost_iotlb_itree_first(struct vhost_iotlb *iotlb, u64 start, u64 last)
+{
+	return vhost_iotlb_itree_iter_first(&iotlb->root, start, last);
+}
+EXPORT_SYMBOL_GPL(vhost_iotlb_itree_first);
+
+/**
+ * vhost_iotlb_itree_next - return the next overlapped range
+ * @map: the starting map node
+ * @start: start of IOVA range
+ * @last: last of IOVA range
+ */
+struct vhost_iotlb_map *
+vhost_iotlb_itree_next(struct vhost_iotlb_map *map, u64 start, u64 last)
+{
+	return vhost_iotlb_itree_iter_next(map, start, last);
+}
+EXPORT_SYMBOL_GPL(vhost_iotlb_itree_next);
+
+MODULE_VERSION(MOD_VERSION);
+MODULE_DESCRIPTION(MOD_DESC);
+MODULE_AUTHOR(MOD_AUTHOR);
+MODULE_LICENSE(MOD_LICENSE);
diff --git a/include/linux/vhost_iotlb.h b/include/linux/vhost_iotlb.h
new file mode 100644
index 000000000000..a44c61f5627b
--- /dev/null
+++ b/include/linux/vhost_iotlb.h
@@ -0,0 +1,45 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_VHOST_IOTLB_H
+#define _LINUX_VHOST_IOTLB_H
+
+#include <linux/interval_tree_generic.h>
+
+struct vhost_iotlb_map {
+	struct rb_node rb;
+	struct list_head link;
+	u64 start;
+	u64 last;
+	u64 size;
+	u64 addr;
+#define VHOST_MAP_RO 0x1
+#define VHOST_MAP_WO 0x2
+#define VHOST_MAP_RW 0x3
+	u32 perm;
+	u32 flags_padding;
+	u64 __subtree_last;
+};
+
+#define VHOST_IOTLB_FLAG_RETIRE 0x1
+
+struct vhost_iotlb {
+	struct rb_root_cached root;
+	struct list_head list;
+	unsigned int limit;
+	unsigned int nmaps;
+	unsigned int flags;
+};
+
+int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, u64 start, u64 last,
+			  u64 addr, unsigned int perm);
+void vhost_iotlb_del_range(struct vhost_iotlb *iotlb, u64 start, u64 last);
+
+struct vhost_iotlb *vhost_iotlb_alloc(unsigned int limit, unsigned int flags);
+void vhost_iotlb_free(struct vhost_iotlb *iotlb);
+void vhost_iotlb_reset(struct vhost_iotlb *iotlb);
+
+struct vhost_iotlb_map *
+vhost_iotlb_itree_first(struct vhost_iotlb *iotlb, u64 start, u64 last);
+struct vhost_iotlb_map *
+vhost_iotlb_itree_next(struct vhost_iotlb_map *map, u64 start, u64 last);
+
+#endif
-- 
2.19.1



* [PATCH 2/5] vringh: IOTLB support
  2020-01-16 12:42 [PATCH 0/5] vDPA support Jason Wang
  2020-01-16 12:42 ` [PATCH 1/5] vhost: factor out IOTLB Jason Wang
@ 2020-01-16 12:42 ` Jason Wang
  2020-01-17 21:54   ` kbuild test robot
  2020-01-17 22:33   ` kbuild test robot
  2020-01-16 12:42 ` [PATCH 3/5] vDPA: introduce vDPA bus Jason Wang
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 76+ messages in thread
From: Jason Wang @ 2020-01-16 12:42 UTC
  To: mst, jasowang, linux-kernel, kvm, virtualization, netdev
  Cc: tiwei.bie, jgg, maxime.coquelin, cunming.liang, zhihong.wang,
	rob.miller, xiao.w.wang, haotian.wang, lingshan.zhu, eperezma,
	lulu, parav, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, jiri, shahafs, hanand, mhabets

This patch implements a third memory accessor for vringh, alongside the
existing kernel and userspace accessors. The idea is to allow vringh
to do address translation through an IOTLB, which is implemented via
the vhost IOTLB interval tree. Users should set up an IOVA to PA
mapping in this IOTLB.

This allows us to:

- Use vringh to access virtqueues with vIOMMU
- Use vringh to implement software vDPA devices

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/Kconfig.vringh |   1 +
 drivers/vhost/vringh.c       | 434 +++++++++++++++++++++++++++++++++--
 include/linux/vringh.h       |  36 +++
 3 files changed, 448 insertions(+), 23 deletions(-)
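
To make the translation step concrete, here is a minimal userspace
sketch of walking an IOVA range through an IOTLB and splitting it into
physically contiguous segments. It is a model under stated
assumptions, not the vringh code: the toy_* names are hypothetical, a
plain array with a linear lookup replaces the interval tree, and the
permission test follows the "any requested bit present" check used by
translate_desc() in patch 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAP_RO 0x1
#define TOY_MAP_WO 0x2
#define TOY_MAP_RW 0x3

struct toy_map {
	uint64_t start, last; /* inclusive IOVA range */
	uint64_t addr;        /* address that @start maps to */
	unsigned int perm;
};

struct toy_seg {
	uint64_t pa;
	uint64_t len;
};

/* Linear lookup standing in for the interval-tree query. */
static const struct toy_map *toy_lookup(const struct toy_map *maps,
					size_t n, uint64_t iova)
{
	for (size_t i = 0; i < n; i++)
		if (maps[i].start <= iova && iova <= maps[i].last)
			return &maps[i];
	return NULL;
}

/*
 * Translate [iova, iova + len) into segments.
 * Returns the number of segments written, or -1 on a translation
 * miss, a permission failure, or when @segs is too small.
 */
int toy_translate(const struct toy_map *maps, size_t nmaps,
		  uint64_t iova, uint64_t len, unsigned int access,
		  struct toy_seg *segs, int max_segs)
{
	int n = 0;

	assert(segs && max_segs > 0);

	while (len) {
		const struct toy_map *m = toy_lookup(maps, nmaps, iova);
		uint64_t left, chunk;

		if (!m || !(m->perm & access) || n == max_segs)
			return -1;

		/* Bytes left in this mapping, minus one to avoid overflow. */
		left = m->last - iova;
		chunk = (left < len - 1) ? left + 1 : len;

		segs[n].pa = m->addr + (iova - m->start);
		segs[n].len = chunk;
		n++;

		iova += chunk;
		len -= chunk;
	}
	return n;
}
```

A buffer that straddles two mappings comes back as two segments, one
per mapping, which is why the accessors in this patch have to handle
a single descriptor expanding into several pieces.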

diff --git a/drivers/vhost/Kconfig.vringh b/drivers/vhost/Kconfig.vringh
index c1fe36a9b8d4..294a47cff35f 100644
--- a/drivers/vhost/Kconfig.vringh
+++ b/drivers/vhost/Kconfig.vringh
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 config VHOST_RING
 	tristate
+	select VHOST_IOTLB
 	---help---
 	  This option is selected by any driver which needs to access
 	  the host side of a virtio ring.
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c
index a0a2d74967ef..5a793cc77c4e 100644
--- a/drivers/vhost/vringh.c
+++ b/drivers/vhost/vringh.c
@@ -13,6 +13,9 @@
 #include <linux/uaccess.h>
 #include <linux/slab.h>
 #include <linux/export.h>
+#include <linux/bvec.h>
+#include <linux/highmem.h>
+#include <linux/vhost_iotlb.h>
 #include <uapi/linux/virtio_config.h>
 
 static __printf(1,2) __cold void vringh_bad(const char *fmt, ...)
@@ -71,9 +74,11 @@ static inline int __vringh_get_head(const struct vringh *vrh,
 }
 
 /* Copy some bytes to/from the iovec.  Returns num copied. */
-static inline ssize_t vringh_iov_xfer(struct vringh_kiov *iov,
+static inline ssize_t vringh_iov_xfer(struct vringh *vrh,
+				      struct vringh_kiov *iov,
 				      void *ptr, size_t len,
-				      int (*xfer)(void *addr, void *ptr,
+				      int (*xfer)(const struct vringh *vrh,
+						  void *addr, void *ptr,
 						  size_t len))
 {
 	int err, done = 0;
@@ -82,7 +87,7 @@ static inline ssize_t vringh_iov_xfer(struct vringh_kiov *iov,
 		size_t partlen;
 
 		partlen = min(iov->iov[iov->i].iov_len, len);
-		err = xfer(iov->iov[iov->i].iov_base, ptr, partlen);
+		err = xfer(vrh, iov->iov[iov->i].iov_base, ptr, partlen);
 		if (err)
 			return err;
 		done += partlen;
@@ -96,6 +101,7 @@ static inline ssize_t vringh_iov_xfer(struct vringh_kiov *iov,
 			/* Fix up old iov element then increment. */
 			iov->iov[iov->i].iov_len = iov->consumed;
 			iov->iov[iov->i].iov_base -= iov->consumed;
+
 			
 			iov->consumed = 0;
 			iov->i++;
@@ -227,7 +233,8 @@ static int slow_copy(struct vringh *vrh, void *dst, const void *src,
 				      u64 addr,
 				      struct vringh_range *r),
 		     struct vringh_range *range,
-		     int (*copy)(void *dst, const void *src, size_t len))
+		     int (*copy)(const struct vringh *vrh,
+				 void *dst, const void *src, size_t len))
 {
 	size_t part, len = sizeof(struct vring_desc);
 
@@ -241,7 +248,7 @@ static int slow_copy(struct vringh *vrh, void *dst, const void *src,
 		if (!rcheck(vrh, addr, &part, range, getrange))
 			return -EINVAL;
 
-		err = copy(dst, src, part);
+		err = copy(vrh, dst, src, part);
 		if (err)
 			return err;
 
@@ -262,7 +269,8 @@ __vringh_iov(struct vringh *vrh, u16 i,
 					     struct vringh_range *)),
 	     bool (*getrange)(struct vringh *, u64, struct vringh_range *),
 	     gfp_t gfp,
-	     int (*copy)(void *dst, const void *src, size_t len))
+	     int (*copy)(const struct vringh *vrh,
+			 void *dst, const void *src, size_t len))
 {
 	int err, count = 0, up_next, desc_max;
 	struct vring_desc desc, *descs;
@@ -291,7 +299,7 @@ __vringh_iov(struct vringh *vrh, u16 i,
 			err = slow_copy(vrh, &desc, &descs[i], rcheck, getrange,
 					&slowrange, copy);
 		else
-			err = copy(&desc, &descs[i], sizeof(desc));
+			err = copy(vrh, &desc, &descs[i], sizeof(desc));
 		if (unlikely(err))
 			goto fail;
 
@@ -404,7 +412,8 @@ static inline int __vringh_complete(struct vringh *vrh,
 				    unsigned int num_used,
 				    int (*putu16)(const struct vringh *vrh,
 						  __virtio16 *p, u16 val),
-				    int (*putused)(struct vring_used_elem *dst,
+				    int (*putused)(const struct vringh *vrh,
+						   struct vring_used_elem *dst,
 						   const struct vring_used_elem
 						   *src, unsigned num))
 {
@@ -420,12 +429,12 @@ static inline int __vringh_complete(struct vringh *vrh,
 	/* Compiler knows num_used == 1 sometimes, hence extra check */
 	if (num_used > 1 && unlikely(off + num_used >= vrh->vring.num)) {
 		u16 part = vrh->vring.num - off;
-		err = putused(&used_ring->ring[off], used, part);
+		err = putused(vrh, &used_ring->ring[off], used, part);
 		if (!err)
-			err = putused(&used_ring->ring[0], used + part,
+			err = putused(vrh, &used_ring->ring[0], used + part,
 				      num_used - part);
 	} else
-		err = putused(&used_ring->ring[off], used, num_used);
+		err = putused(vrh, &used_ring->ring[off], used, num_used);
 
 	if (err) {
 		vringh_bad("Failed to write %u used entries %u at %p",
@@ -564,13 +573,15 @@ static inline int putu16_user(const struct vringh *vrh, __virtio16 *p, u16 val)
 	return put_user(v, (__force __virtio16 __user *)p);
 }
 
-static inline int copydesc_user(void *dst, const void *src, size_t len)
+static inline int copydesc_user(const struct vringh *vrh,
+				void *dst, const void *src, size_t len)
 {
 	return copy_from_user(dst, (__force void __user *)src, len) ?
 		-EFAULT : 0;
 }
 
-static inline int putused_user(struct vring_used_elem *dst,
+static inline int putused_user(const struct vringh *vrh,
+			       struct vring_used_elem *dst,
 			       const struct vring_used_elem *src,
 			       unsigned int num)
 {
@@ -578,13 +589,15 @@ static inline int putused_user(struct vring_used_elem *dst,
 			    sizeof(*dst) * num) ? -EFAULT : 0;
 }
 
-static inline int xfer_from_user(void *src, void *dst, size_t len)
+static inline int xfer_from_user(const struct vringh *vrh, void *src,
+				 void *dst, size_t len)
 {
 	return copy_from_user(dst, (__force void __user *)src, len) ?
 		-EFAULT : 0;
 }
 
-static inline int xfer_to_user(void *dst, void *src, size_t len)
+static inline int xfer_to_user(const struct vringh *vrh,
+			       void *dst, void *src, size_t len)
 {
 	return copy_to_user((__force void __user *)dst, src, len) ?
 		-EFAULT : 0;
@@ -706,7 +719,7 @@ EXPORT_SYMBOL(vringh_getdesc_user);
  */
 ssize_t vringh_iov_pull_user(struct vringh_iov *riov, void *dst, size_t len)
 {
-	return vringh_iov_xfer((struct vringh_kiov *)riov,
+	return vringh_iov_xfer(NULL, (struct vringh_kiov *)riov,
 			       dst, len, xfer_from_user);
 }
 EXPORT_SYMBOL(vringh_iov_pull_user);
@@ -722,7 +735,7 @@ EXPORT_SYMBOL(vringh_iov_pull_user);
 ssize_t vringh_iov_push_user(struct vringh_iov *wiov,
 			     const void *src, size_t len)
 {
-	return vringh_iov_xfer((struct vringh_kiov *)wiov,
+	return vringh_iov_xfer(NULL, (struct vringh_kiov *)wiov,
 			       (void *)src, len, xfer_to_user);
 }
 EXPORT_SYMBOL(vringh_iov_push_user);
@@ -832,13 +845,15 @@ static inline int putu16_kern(const struct vringh *vrh, __virtio16 *p, u16 val)
 	return 0;
 }
 
-static inline int copydesc_kern(void *dst, const void *src, size_t len)
+static inline int copydesc_kern(const struct vringh *vrh,
+				void *dst, const void *src, size_t len)
 {
 	memcpy(dst, src, len);
 	return 0;
 }
 
-static inline int putused_kern(struct vring_used_elem *dst,
+static inline int putused_kern(const struct vringh *vrh,
+			       struct vring_used_elem *dst,
 			       const struct vring_used_elem *src,
 			       unsigned int num)
 {
@@ -846,13 +861,15 @@ static inline int putused_kern(struct vring_used_elem *dst,
 	return 0;
 }
 
-static inline int xfer_kern(void *src, void *dst, size_t len)
+static inline int xfer_kern(const struct vringh *vrh, void *src,
+			    void *dst, size_t len)
 {
 	memcpy(dst, src, len);
 	return 0;
 }
 
-static inline int kern_xfer(void *dst, void *src, size_t len)
+static inline int kern_xfer(const struct vringh *vrh, void *dst,
+			    void *src, size_t len)
 {
 	memcpy(dst, src, len);
 	return 0;
@@ -949,7 +966,7 @@ EXPORT_SYMBOL(vringh_getdesc_kern);
  */
 ssize_t vringh_iov_pull_kern(struct vringh_kiov *riov, void *dst, size_t len)
 {
-	return vringh_iov_xfer(riov, dst, len, xfer_kern);
+	return vringh_iov_xfer(NULL, riov, dst, len, xfer_kern);
 }
 EXPORT_SYMBOL(vringh_iov_pull_kern);
 
@@ -964,7 +981,7 @@ EXPORT_SYMBOL(vringh_iov_pull_kern);
 ssize_t vringh_iov_push_kern(struct vringh_kiov *wiov,
 			     const void *src, size_t len)
 {
-	return vringh_iov_xfer(wiov, (void *)src, len, kern_xfer);
+	return vringh_iov_xfer(NULL, wiov, (void *)src, len, kern_xfer);
 }
 EXPORT_SYMBOL(vringh_iov_push_kern);
 
@@ -1042,4 +1059,375 @@ int vringh_need_notify_kern(struct vringh *vrh)
 }
 EXPORT_SYMBOL(vringh_need_notify_kern);
 
+static int iotlb_translate(const struct vringh *vrh,
+			   u64 addr, u64 len, struct bio_vec iov[],
+			   int iov_size, u32 perm)
+{
+	struct vhost_iotlb_map *map;
+	struct vhost_iotlb *iotlb = vrh->iotlb;
+	int ret = 0;
+	u64 s = 0;
+
+	while (len > s) {
+		u64 size, pa, pfn;
+
+		if (unlikely(ret >= iov_size)) {
+			ret = -ENOBUFS;
+			break;
+		}
+
+		map = vhost_iotlb_itree_first(iotlb, addr,
+					      addr + len - 1);
+		if (!map || map->start > addr) {
+			ret = -EINVAL;
+			break;
+		} else if (!(map->perm & perm)) {
+			ret = -EPERM;
+			break;
+		}
+
+		size = map->size - addr + map->start;
+		pa = map->addr + addr - map->start;
+		pfn = pa >> PAGE_SHIFT;
+		iov[ret].bv_page = pfn_to_page(pfn);
+		iov[ret].bv_len = min(len - s, size);
+		iov[ret].bv_offset = pa & (PAGE_SIZE - 1);
+		s += size;
+		addr += size;
+		++ret;
+	}
+
+	return ret;
+}
+
+static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
+				  void *src, size_t len)
+{
+	struct iov_iter iter;
+	struct bio_vec iov[16];
+	int ret;
+
+	ret = iotlb_translate(vrh, (u64)src, len, iov, 16, VHOST_MAP_RO);
+	if (ret < 0)
+		return ret;
+
+	iov_iter_bvec(&iter, READ, iov, ret, len);
+
+	ret = copy_from_iter(dst, len, &iter);
+
+	return ret;
+}
+
+static inline int copy_to_iotlb(const struct vringh *vrh, void *dst,
+				void *src, size_t len)
+{
+	struct iov_iter iter;
+	struct bio_vec iov[16];
+	int ret;
+
+	ret = iotlb_translate(vrh, (u64)dst, len, iov, 16, VHOST_MAP_WO);
+	if (ret < 0)
+		return ret;
+
+	iov_iter_bvec(&iter, WRITE, iov, ret, len);
+
+	return copy_to_iter(src, len, &iter);
+}
+
+static inline int getu16_iotlb(const struct vringh *vrh,
+			       u16 *val, const __virtio16 *p)
+{
+	struct bio_vec iov;
+	void *kaddr, *from;
+	int ret;
+
+	/* Atomic read is needed for getu16 */
+	ret = iotlb_translate(vrh, (u64)p, sizeof(*p),
+			      &iov, 1, VHOST_MAP_RO);
+	if (ret < 0)
+		return ret;
+
+	kaddr = kmap_atomic(iov.bv_page);
+	from = kaddr + iov.bv_offset;
+	*val = vringh16_to_cpu(vrh, READ_ONCE(*(__virtio16 *)from));
+	kunmap_atomic(kaddr);
+
+	return 0;
+}
+
+static inline int putu16_iotlb(const struct vringh *vrh,
+			       __virtio16 *p, u16 val)
+{
+	struct bio_vec iov;
+	void *kaddr, *to;
+	int ret;
+
+	/* Atomic write is needed for putu16 */
+	ret = iotlb_translate(vrh, (u64)p, sizeof(*p),
+			      &iov, 1, VHOST_MAP_WO);
+	if (ret < 0)
+		return ret;
+
+	kaddr = kmap_atomic(iov.bv_page);
+	to = kaddr + iov.bv_offset;
+	WRITE_ONCE(*(__virtio16 *)to, cpu_to_vringh16(vrh, val));
+	kunmap_atomic(kaddr);
+
+	return 0;
+}
+
+static inline int copydesc_iotlb(const struct vringh *vrh,
+				 void *dst, const void *src, size_t len)
+{
+	int ret;
+
+	ret = copy_from_iotlb(vrh, dst, (void *)src, len);
+	if (ret != len)
+		return -EFAULT;
+
+	return 0;
+}
+
+static inline int xfer_from_iotlb(const struct vringh *vrh, void *src,
+				  void *dst, size_t len)
+{
+	int ret;
+
+	ret = copy_from_iotlb(vrh, dst, src, len);
+	if (ret != len)
+		return -EFAULT;
+
+	return 0;
+}
+
+static inline int xfer_to_iotlb(const struct vringh *vrh,
+			       void *dst, void *src, size_t len)
+{
+	int ret;
+
+	ret = copy_to_iotlb(vrh, dst, src, len);
+	if (ret != len)
+		return -EFAULT;
+
+	return 0;
+}
+
+static inline int putused_iotlb(const struct vringh *vrh,
+				struct vring_used_elem *dst,
+				const struct vring_used_elem *src,
+				unsigned int num)
+{
+	int size = num * sizeof(*dst);
+	int ret;
+
+	ret = copy_to_iotlb(vrh, dst, (void *)src, num * sizeof(*dst));
+	if (ret != size)
+		return -EFAULT;
+
+	return 0;
+}
+
+/**
+ * vringh_init_iotlb - initialize a vringh for a ring with IOTLB.
+ * @vrh: the vringh to initialize.
+ * @features: the feature bits for this ring.
+ * @num: the number of elements.
+ * @weak_barriers: true if we only need memory barriers, not I/O.
+ * @desc: the userspace descriptor pointer.
+ * @avail: the userspace avail pointer.
+ * @used: the userspace used pointer.
+ *
+ * Returns an error if num is invalid.
+ */
+int vringh_init_iotlb(struct vringh *vrh, u64 features,
+		      unsigned int num, bool weak_barriers,
+		      struct vring_desc *desc,
+		      struct vring_avail *avail,
+		      struct vring_used *used)
+{
+	/* Sane power of 2 please! */
+	if (!num || num > 0xffff || (num & (num - 1))) {
+		vringh_bad("Bad ring size %u", num);
+		return -EINVAL;
+	}
+
+	vrh->little_endian = (features & (1ULL << VIRTIO_F_VERSION_1));
+	vrh->event_indices = (features & (1 << VIRTIO_RING_F_EVENT_IDX));
+	vrh->weak_barriers = weak_barriers;
+	vrh->completed = 0;
+	vrh->last_avail_idx = 0;
+	vrh->last_used_idx = 0;
+	vrh->vring.num = num;
+	vrh->vring.desc = desc;
+	vrh->vring.avail = avail;
+	vrh->vring.used = used;
+	return 0;
+}
+EXPORT_SYMBOL(vringh_init_iotlb);
+
+/**
+ * vringh_set_iotlb - associate an IOTLB with a vringh.
+ * @vrh: the vring
+ * @iotlb: iotlb associated with this vring
+ */
+void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb)
+{
+	vrh->iotlb = iotlb;
+}
+EXPORT_SYMBOL(vringh_set_iotlb);
+
+/**
+ * vringh_getdesc_iotlb - get next available descriptor from ring with
+ * IOTLB.
+ * @vrh: the kernelspace vring.
+ * @riov: where to put the readable descriptors (or NULL)
+ * @wiov: where to put the writable descriptors (or NULL)
+ * @head: head index we received, for passing to vringh_complete_iotlb().
+ * @gfp: flags for allocating larger riov/wiov.
+ *
+ * Returns 0 if there was no descriptor, 1 if there was, or -errno.
+ *
+ * Note that on error return, you can tell the difference between an
+ * invalid ring and a single invalid descriptor: in the former case,
+ * *head will be vrh->vring.num.  You may be able to ignore an invalid
+ * descriptor, but there's not much you can do with an invalid ring.
+ *
+ * Note that you may need to clean up riov and wiov, even on error!
+ */
+int vringh_getdesc_iotlb(struct vringh *vrh,
+			 struct vringh_kiov *riov,
+			 struct vringh_kiov *wiov,
+			 u16 *head,
+			 gfp_t gfp)
+{
+	int err;
+
+	err = __vringh_get_head(vrh, getu16_iotlb, &vrh->last_avail_idx);
+	if (err < 0)
+		return err;
+
+	/* Empty... */
+	if (err == vrh->vring.num)
+		return 0;
+
+	*head = err;
+	err = __vringh_iov(vrh, *head, riov, wiov, no_range_check, NULL,
+			   gfp, copydesc_iotlb);
+	if (err)
+		return err;
+
+	return 1;
+}
+EXPORT_SYMBOL(vringh_getdesc_iotlb);
+
+/**
+ * vringh_iov_pull_iotlb - copy bytes from vring_iov.
+ * @vrh: the vring.
+ * @riov: the riov as passed to vringh_getdesc_iotlb() (updated as we consume)
+ * @dst: the place to copy.
+ * @len: the maximum length to copy.
+ *
+ * Returns the bytes copied <= len or a negative errno.
+ */
+ssize_t vringh_iov_pull_iotlb(struct vringh *vrh,
+			      struct vringh_kiov *riov,
+			      void *dst, size_t len)
+{
+	return vringh_iov_xfer(vrh, riov, dst, len, xfer_from_iotlb);
+}
+EXPORT_SYMBOL(vringh_iov_pull_iotlb);
+
+/**
+ * vringh_iov_push_iotlb - copy bytes into vring_iov.
+ * @vrh: the vring.
+ * @wiov: the wiov as passed to vringh_getdesc_iotlb() (updated as we consume)
+ * @src: the place to copy from.
+ * @len: the maximum length to copy.
+ *
+ * Returns the bytes copied <= len or a negative errno.
+ */
+ssize_t vringh_iov_push_iotlb(struct vringh *vrh,
+			      struct vringh_kiov *wiov,
+			      const void *src, size_t len)
+{
+	return vringh_iov_xfer(vrh, wiov, (void *)src, len, xfer_to_iotlb);
+}
+EXPORT_SYMBOL(vringh_iov_push_iotlb);
+
+/**
+ * vringh_abandon_iotlb - we've decided not to handle the descriptor(s).
+ * @vrh: the vring.
+ * @num: the number of descriptors to put back (ie. num
+ *	 vringh_getdesc_iotlb() to undo).
+ *
+ * The next vringh_getdesc_iotlb() will return the old descriptor(s) again.
+ */
+void vringh_abandon_iotlb(struct vringh *vrh, unsigned int num)
+{
+	/* We only update vring_avail_event(vr) when we want to be notified,
+	 * so we haven't changed that yet.
+	 */
+	vrh->last_avail_idx -= num;
+}
+EXPORT_SYMBOL(vringh_abandon_iotlb);
+
+/**
+ * vringh_complete_iotlb - we've finished with descriptor, publish it.
+ * @vrh: the vring.
+ * @head: the head as filled in by vringh_getdesc_iotlb.
+ * @len: the length of data we have written.
+ *
+ * You should check vringh_need_notify_iotlb() after one or more calls
+ * to this function.
+ */
+int vringh_complete_iotlb(struct vringh *vrh, u16 head, u32 len)
+{
+	struct vring_used_elem used;
+
+	used.id = cpu_to_vringh32(vrh, head);
+	used.len = cpu_to_vringh32(vrh, len);
+
+	return __vringh_complete(vrh, &used, 1, putu16_iotlb, putused_iotlb);
+}
+EXPORT_SYMBOL(vringh_complete_iotlb);
+
+/**
+ * vringh_notify_enable_iotlb - we want to know if something changes.
+ * @vrh: the vring.
+ *
+ * This always enables notifications, but returns false if there are
+ * now more buffers available in the vring.
+ */
+bool vringh_notify_enable_iotlb(struct vringh *vrh)
+{
+	return __vringh_notify_enable(vrh, getu16_iotlb, putu16_iotlb);
+}
+EXPORT_SYMBOL(vringh_notify_enable_iotlb);
+
+/**
+ * vringh_notify_disable_iotlb - don't tell us if something changes.
+ * @vrh: the vring.
+ *
+ * This is our normal running state: we disable and then only enable when
+ * we're going to sleep.
+ */
+void vringh_notify_disable_iotlb(struct vringh *vrh)
+{
+	__vringh_notify_disable(vrh, putu16_iotlb);
+}
+EXPORT_SYMBOL(vringh_notify_disable_iotlb);
+
+/**
+ * vringh_need_notify_iotlb - must we tell the other side about used buffers?
+ * @vrh: the vring we've called vringh_complete_iotlb() on.
+ *
+ * Returns -errno or 0 if we don't need to tell the other side, 1 if we do.
+ */
+int vringh_need_notify_iotlb(struct vringh *vrh)
+{
+	return __vringh_need_notify(vrh, getu16_iotlb);
+}
+EXPORT_SYMBOL(vringh_need_notify_iotlb);
+
+
 MODULE_LICENSE("GPL");
diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index d237087eb257..bd0503ca6f8f 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -14,6 +14,8 @@
 #include <linux/virtio_byteorder.h>
 #include <linux/uio.h>
 #include <linux/slab.h>
+#include <linux/dma-direction.h>
+#include <linux/vhost_iotlb.h>
 #include <asm/barrier.h>
 
 /* virtio_ring with information needed for host access. */
@@ -39,6 +41,9 @@ struct vringh {
 	/* The vring (note: it may contain user pointers!) */
 	struct vring vring;
 
+	/* IOTLB for this vring */
+	struct vhost_iotlb *iotlb;
+
 	/* The function to call to notify the guest about added buffers */
 	void (*notify)(struct vringh *);
 };
@@ -248,4 +253,35 @@ static inline __virtio64 cpu_to_vringh64(const struct vringh *vrh, u64 val)
 {
 	return __cpu_to_virtio64(vringh_is_little_endian(vrh), val);
 }
+
+void vringh_set_iotlb(struct vringh *vrh, struct vhost_iotlb *iotlb);
+
+int vringh_init_iotlb(struct vringh *vrh, u64 features,
+		      unsigned int num, bool weak_barriers,
+		      struct vring_desc *desc,
+		      struct vring_avail *avail,
+		      struct vring_used *used);
+
+int vringh_getdesc_iotlb(struct vringh *vrh,
+			 struct vringh_kiov *riov,
+			 struct vringh_kiov *wiov,
+			 u16 *head,
+			 gfp_t gfp);
+
+ssize_t vringh_iov_pull_iotlb(struct vringh *vrh,
+			      struct vringh_kiov *riov,
+			      void *dst, size_t len);
+ssize_t vringh_iov_push_iotlb(struct vringh *vrh,
+			      struct vringh_kiov *wiov,
+			      const void *src, size_t len);
+
+void vringh_abandon_iotlb(struct vringh *vrh, unsigned int num);
+
+int vringh_complete_iotlb(struct vringh *vrh, u16 head, u32 len);
+
+bool vringh_notify_enable_iotlb(struct vringh *vrh);
+void vringh_notify_disable_iotlb(struct vringh *vrh);
+
+int vringh_need_notify_iotlb(struct vringh *vrh);
+
 #endif /* _LINUX_VRINGH_H */
-- 
2.19.1


* [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-16 12:42 [PATCH 0/5] vDPA support Jason Wang
  2020-01-16 12:42 ` [PATCH 1/5] vhost: factor out IOTLB Jason Wang
  2020-01-16 12:42 ` [PATCH 2/5] vringh: IOTLB support Jason Wang
@ 2020-01-16 12:42 ` Jason Wang
  2020-01-16 15:22   ` Jason Gunthorpe
                     ` (2 more replies)
  2020-01-16 12:42 ` [PATCH 4/5] virtio: introduce a vDPA based transport Jason Wang
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 76+ messages in thread
From: Jason Wang @ 2020-01-16 12:42 UTC (permalink / raw)
  To: mst, jasowang, linux-kernel, kvm, virtualization, netdev
  Cc: tiwei.bie, jgg, maxime.coquelin, cunming.liang, zhihong.wang,
	rob.miller, xiao.w.wang, haotian.wang, lingshan.zhu, eperezma,
	lulu, parav, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, jiri, shahafs, hanand, mhabets

A vDPA device is a device whose datapath complies with the virtio
specification while its control path is vendor specific. vDPA devices
can be either physically located on the hardware or emulated by
software. vDPA hardware devices are usually implemented through PCIe
with the following types:

- PF (Physical Function) - A single Physical Function
- VF (Virtual Function) - Device that supports single root I/O
  virtualization (SR-IOV). Its Virtual Function (VF) represents a
  virtualized instance of the device that can be assigned to different
  partitions
- VDEV (Virtual Device) - With technologies such as Intel Scalable
  IOV, a virtual device composed by the host OS utilizing one or more
  ADIs (Assignable Device Interfaces).
- SF (Sub Function) - Vendor specific interface to slice the Physical
  Function into multiple sub functions that can be assigned to
  different partitions as virtual devices.

From a driver's perspective, depending on how and where the DMA
translation is done, vDPA devices are split into two types:

- Platform specific DMA translation - From the driver's perspective,
  the device can be used on a platform where device access to data in
  memory is limited and/or translated. An example is a PCIe vDPA
  device whose DMA requests are tagged in a bus specific (e.g. PCIe)
  way. DMA translation and protection are done at the PCIe bus IOMMU
  level.
- Device specific DMA translation - The device implements DMA
  isolation and protection through its own logic. An example is a vDPA
  device which uses an on-chip IOMMU.
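
As a rough illustration of how a caller could tell the two types
apart, the user-space sketch below keys off an optional set_map
callback, which this patch documents as only needed for devices with
device specific DMA translation. All toy_* names are hypothetical,
and what the platform path does when set_map is absent is an
assumption of this sketch, not something this patch defines.

```c
/* User-space sketch only; every toy_* name is hypothetical. */
#include <assert.h>
#include <stddef.h>

struct toy_iotlb {
	int nmaps;		/* number of IOVA -> PA mappings */
};

struct toy_vdpa;

struct toy_dma_ops {
	/* Optional: only provided by devices with an on-chip IOMMU. */
	int (*set_map)(struct toy_vdpa *dev, const struct toy_iotlb *iotlb);
};

struct toy_vdpa {
	const struct toy_dma_ops *ops;
	int device_maps;	/* mappings handed to the device itself */
	int platform_maps;	/* mappings a platform IOMMU would hold */
};

/* Device specific translation: the device consumes the mappings. */
static int onchip_set_map(struct toy_vdpa *dev, const struct toy_iotlb *iotlb)
{
	dev->device_maps = iotlb->nmaps;
	return 0;
}

static const struct toy_dma_ops onchip_ops = { .set_map = onchip_set_map };
static const struct toy_dma_ops platform_ops = { .set_map = NULL };

static int toy_update_mappings(struct toy_vdpa *dev,
			       const struct toy_iotlb *iotlb)
{
	assert(dev && dev->ops && iotlb);

	if (dev->ops->set_map)
		return dev->ops->set_map(dev, iotlb);

	/* Assumption: without set_map, the platform IOMMU is programmed. */
	dev->platform_maps = iotlb->nmaps;
	return 0;
}
```

The caller never needs to know which kind of device it is talking to;
the presence or absence of the callback decides the path.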

To hide the differences and complexity of the above device types and
IOMMU options, and to present a generic virtio device to the upper
layer, a device agnostic framework is required.

This patch introduces a software vDPA bus which abstracts the
common attributes of vDPA device, vDPA bus driver and the
communication method (vdpa_config_ops) between the vDPA device
abstraction and the vDPA bus driver.

With the abstraction of vDPA bus and vDPA bus operations, the
differences and complexity of the underlying hardware are hidden from
the upper layer. The vDPA bus drivers on top can use a unified
vdpa_config_ops to control different types of vDPA device.
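
To make the "unified ops" point concrete, here is a small user-space
sketch of the pattern. It is a cut-down analogue, not the kernel
structures from this patch: the toy_* types are hypothetical, and
only three of the operations are mirrored. The bus-driver side
negotiates features purely through the ops table and never looks at
how the device implements them.

```c
/* User-space sketch only; every toy_* name is hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct toy_dev;

/* Cut-down analogue of a config ops table. */
struct toy_config_ops {
	uint64_t (*get_features)(struct toy_dev *dev);
	int (*set_features)(struct toy_dev *dev, uint64_t features);
	uint16_t (*get_vq_num_max)(struct toy_dev *dev);
};

struct toy_dev {
	const struct toy_config_ops *config;
	uint64_t dev_features;	/* what the device offers */
	uint64_t drv_features;	/* what the driver accepted */
	uint16_t vq_num_max;
};

/* "Vendor" side: one possible implementation of the ops. */
static uint64_t toy_get_features(struct toy_dev *dev)
{
	return dev->dev_features;
}

static int toy_set_features(struct toy_dev *dev, uint64_t features)
{
	/* Reject bits the device never offered. */
	if (features & ~dev->dev_features)
		return -EINVAL;
	dev->drv_features = features;
	return 0;
}

static uint16_t toy_get_vq_num_max(struct toy_dev *dev)
{
	return dev->vq_num_max;
}

static const struct toy_config_ops toy_ops = {
	.get_features	= toy_get_features,
	.set_features	= toy_set_features,
	.get_vq_num_max	= toy_get_vq_num_max,
};

/* "Bus driver" side: only ever talks to the device through config. */
static int toy_negotiate(struct toy_dev *dev, uint64_t wanted)
{
	uint64_t offered;

	assert(dev && dev->config);

	offered = dev->config->get_features(dev);
	return dev->config->set_features(dev, offered & wanted);
}
```

A second device type would supply its own ops table, and
toy_negotiate() would work against it unchanged.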

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 MAINTAINERS                  |   1 +
 drivers/virtio/Kconfig       |   2 +
 drivers/virtio/Makefile      |   1 +
 drivers/virtio/vdpa/Kconfig  |   9 ++
 drivers/virtio/vdpa/Makefile |   2 +
 drivers/virtio/vdpa/vdpa.c   | 141 ++++++++++++++++++++++++++
 include/linux/vdpa.h         | 191 +++++++++++++++++++++++++++++++++++
 7 files changed, 347 insertions(+)
 create mode 100644 drivers/virtio/vdpa/Kconfig
 create mode 100644 drivers/virtio/vdpa/Makefile
 create mode 100644 drivers/virtio/vdpa/vdpa.c
 create mode 100644 include/linux/vdpa.h

diff --git a/MAINTAINERS b/MAINTAINERS
index d4bda9c900fa..578d2a581e3b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17540,6 +17540,7 @@ F:	tools/virtio/
 F:	drivers/net/virtio_net.c
 F:	drivers/block/virtio_blk.c
 F:	include/linux/virtio*.h
+F:	include/linux/vdpa.h
 F:	include/uapi/linux/virtio_*.h
 F:	drivers/crypto/virtio/
 F:	mm/balloon_compaction.c
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 078615cf2afc..9c4fdb64d9ac 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -96,3 +96,5 @@ config VIRTIO_MMIO_CMDLINE_DEVICES
 	 If unsure, say 'N'.
 
 endif # VIRTIO_MENU
+
+source "drivers/virtio/vdpa/Kconfig"
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 3a2b5c5dcf46..fdf5eacd0d0a 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
 virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
 obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
+obj-$(CONFIG_VDPA) += vdpa/
diff --git a/drivers/virtio/vdpa/Kconfig b/drivers/virtio/vdpa/Kconfig
new file mode 100644
index 000000000000..3032727b4d98
--- /dev/null
+++ b/drivers/virtio/vdpa/Kconfig
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config VDPA
+	tristate
+        default n
+        help
+          Enable this module to support vDPA device that uses a
+          datapath which complies with virtio specifications with
+          vendor specific control path.
+
diff --git a/drivers/virtio/vdpa/Makefile b/drivers/virtio/vdpa/Makefile
new file mode 100644
index 000000000000..ee6a35e8a4fb
--- /dev/null
+++ b/drivers/virtio/vdpa/Makefile
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_VDPA) += vdpa.o
diff --git a/drivers/virtio/vdpa/vdpa.c b/drivers/virtio/vdpa/vdpa.c
new file mode 100644
index 000000000000..2b0e4a9f105d
--- /dev/null
+++ b/drivers/virtio/vdpa/vdpa.c
@@ -0,0 +1,141 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * vDPA bus.
+ *
+ * Copyright (c) 2019, Red Hat. All rights reserved.
+ *     Author: Jason Wang <jasowang@redhat.com>
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/idr.h>
+#include <linux/vdpa.h>
+
+#define MOD_VERSION  "0.1"
+#define MOD_DESC     "vDPA bus"
+#define MOD_AUTHOR   "Jason Wang <jasowang@redhat.com>"
+#define MOD_LICENSE  "GPL v2"
+
+static DEFINE_IDA(vdpa_index_ida);
+
+struct device *vdpa_get_parent(struct vdpa_device *vdpa)
+{
+	return vdpa->dev.parent;
+}
+EXPORT_SYMBOL(vdpa_get_parent);
+
+void vdpa_set_parent(struct vdpa_device *vdpa, struct device *parent)
+{
+	vdpa->dev.parent = parent;
+}
+EXPORT_SYMBOL(vdpa_set_parent);
+
+struct vdpa_device *dev_to_vdpa(struct device *_dev)
+{
+	return container_of(_dev, struct vdpa_device, dev);
+}
+EXPORT_SYMBOL_GPL(dev_to_vdpa);
+
+struct device *vdpa_to_dev(struct vdpa_device *vdpa)
+{
+	return &vdpa->dev;
+}
+EXPORT_SYMBOL_GPL(vdpa_to_dev);
+
+static int vdpa_dev_probe(struct device *d)
+{
+	struct vdpa_device *dev = dev_to_vdpa(d);
+	struct vdpa_driver *drv = drv_to_vdpa(dev->dev.driver);
+	int ret = 0;
+
+	if (drv && drv->probe)
+		ret = drv->probe(d);
+
+	return ret;
+}
+
+static int vdpa_dev_remove(struct device *d)
+{
+	struct vdpa_device *dev = dev_to_vdpa(d);
+	struct vdpa_driver *drv = drv_to_vdpa(dev->dev.driver);
+
+	if (drv && drv->remove)
+		drv->remove(d);
+
+	return 0;
+}
+
+static struct bus_type vdpa_bus = {
+	.name  = "vdpa",
+	.probe = vdpa_dev_probe,
+	.remove = vdpa_dev_remove,
+};
+
+int register_vdpa_device(struct vdpa_device *vdpa)
+{
+	int err;
+
+	if (!vdpa_get_parent(vdpa))
+		return -EINVAL;
+
+	if (!vdpa->config)
+		return -EINVAL;
+
+	err = ida_simple_get(&vdpa_index_ida, 0, 0, GFP_KERNEL);
+	if (err < 0)
+		return -EFAULT;
+
+	vdpa->dev.bus = &vdpa_bus;
+	device_initialize(&vdpa->dev);
+
+	vdpa->index = err;
+	dev_set_name(&vdpa->dev, "vdpa%u", vdpa->index);
+
+	err = device_add(&vdpa->dev);
+	if (err)
+		ida_simple_remove(&vdpa_index_ida, vdpa->index);
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(register_vdpa_device);
+
+void unregister_vdpa_device(struct vdpa_device *vdpa)
+{
+	int index = vdpa->index;
+
+	device_unregister(&vdpa->dev);
+	ida_simple_remove(&vdpa_index_ida, index);
+}
+EXPORT_SYMBOL_GPL(unregister_vdpa_device);
+
+int register_vdpa_driver(struct vdpa_driver *driver)
+{
+	driver->drv.bus = &vdpa_bus;
+	return driver_register(&driver->drv);
+}
+EXPORT_SYMBOL_GPL(register_vdpa_driver);
+
+void unregister_vdpa_driver(struct vdpa_driver *driver)
+{
+	driver_unregister(&driver->drv);
+}
+EXPORT_SYMBOL_GPL(unregister_vdpa_driver);
+
+static int vdpa_init(void)
+{
+	if (bus_register(&vdpa_bus) != 0)
+		panic("vDPA bus registration failed");
+	return 0;
+}
+
+static void __exit vdpa_exit(void)
+{
+	bus_unregister(&vdpa_bus);
+	ida_destroy(&vdpa_index_ida);
+}
+core_initcall(vdpa_init);
+module_exit(vdpa_exit);
+
+MODULE_VERSION(MOD_VERSION);
+MODULE_AUTHOR(MOD_AUTHOR);
+MODULE_LICENSE(MOD_LICENSE);
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
new file mode 100644
index 000000000000..47760137ef66
--- /dev/null
+++ b/include/linux/vdpa.h
@@ -0,0 +1,191 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_VDPA_H
+#define _LINUX_VDPA_H
+
+#include <linux/device.h>
+#include <linux/interrupt.h>
+#include <linux/vhost_iotlb.h>
+
+/**
+ * vDPA callback definition.
+ * @callback: interrupt callback function
+ * @private: the data passed to the callback function
+ */
+struct vdpa_callback {
+	irqreturn_t (*callback)(void *data);
+	void *private;
+};
+
+/**
+ * vDPA device - representation of a vDPA device
+ * @dev: underlying device
+ * @config: the configuration ops for this device.
+ * @index: device index
+ */
+struct vdpa_device {
+	struct device dev;
+	const struct vdpa_config_ops *config;
+	int index;
+};
+
+/**
+ * vDPA_config_ops - operations for configuring a vDPA device.
+ * Note: vDPA device drivers are required to implement all of the
+ * operations unless an operation is marked optional in the following list.
+ * @set_vq_address:		Set the address of virtqueue
+ *				@vdev: vdpa device
+ *				@idx: virtqueue index
+ *				@desc_area: address of desc area
+ *				@driver_area: address of driver area
+ *				@device_area: address of device area
+ *				Returns integer: success (0) or error (< 0)
+ * @set_vq_num:			Set the size of virtqueue
+ *				@vdev: vdpa device
+ *				@idx: virtqueue index
+ *				@num: the size of virtqueue
+ * @kick_vq:			Kick the virtqueue
+ *				@vdev: vdpa device
+ *				@idx: virtqueue index
+ * @set_vq_cb:			Set the interrupt callback function for
+ *				a virtqueue
+ *				@vdev: vdpa device
+ *				@idx: virtqueue index
+ *				@cb: virtio-vdev interrupt callback structure
+ * @set_vq_ready:		Set ready status for a virtqueue
+ *				@vdev: vdpa device
+ *				@idx: virtqueue index
+ *				@ready: ready (true) not ready(false)
+ * @get_vq_ready:		Get ready status for a virtqueue
+ *				@vdev: vdpa device
+ *				@idx: virtqueue index
+ *				Returns boolean: ready (true) or not (false)
+ * @set_vq_state:		Set the state for a virtqueue
+ *				@vdev: vdpa device
+ *				@idx: virtqueue index
+ *				@state: virtqueue state (last_avail_idx)
+ *				Returns integer: success (0) or error (< 0)
+ * @get_vq_state:		Get the state for a virtqueue
+ *				@vdev: vdpa device
+ *				@idx: virtqueue index
+ *				Returns virtqueue state (last_avail_idx)
+ * @get_vq_align:		Get the virtqueue align requirement
+ *				for the device
+ *				@vdev: vdpa device
+ *				Returns virtqueue alignment requirement
+ * @get_features:		Get virtio features supported by the device
+ *				@vdev: vdpa device
+ *				Returns the virtio features supported by the
+ *				device
+ * @set_features:		Set virtio features supported by the driver
+ *				@vdev: vdpa device
+ *				@features: features supported by the driver
+ *				Returns integer: success (0) or error (< 0)
+ * @set_config_cb:		Set the config interrupt callback
+ *				@vdev: vdpa device
+ *				@cb: virtio-vdev interrupt callback structure
+ * @get_vq_num_max:		Get the max size of virtqueue
+ *				@vdev: vdpa device
+ *				Returns u16: max size of virtqueue
+ * @get_device_id:		Get virtio device id
+ *				@vdev: vdpa device
+ *				Returns u32: virtio device id
+ * @get_vendor_id:		Get id for the vendor that provides this device
+ *				@vdev: vdpa device
+ *				Returns u32: virtio vendor id
+ * @get_status:			Get the device status
+ *				@vdev: vdpa device
+ *				Returns u8: virtio device status
+ * @set_status:			Set the device status
+ *				@vdev: vdpa device
+ *				@status: virtio device status
+ * @get_config:			Read from device specific configuration space
+ *				@vdev: vdpa device
+ *				@offset: offset from the beginning of
+ *				configuration space
+ *				@buf: buffer used to read to
+ *				@len: the length to read from
+ *				configuration space
+ * @set_config:			Write to device specific configuration space
+ *				@vdev: vdpa device
+ *				@offset: offset from the beginning of
+ *				configuration space
+ *				@buf: buffer used to write from
+ *				@len: the length to write to
+ *				configuration space
+ * @get_generation:		Get device config generation (optional)
+ *				@vdev: vdpa device
+ *				Returns u32: device generation
+ * @set_map:			Set device memory mapping (optional),
+ *				only needed for devices that use
+ *				device specific DMA translation
+ *				(on-chip IOMMU)
+ *				@vdev: vdpa device
+ *				@iotlb: vhost memory mapping to be
+ *				used by the vDPA
+ *				Returns integer: success (0) or error (< 0)
+ */
+struct vdpa_config_ops {
+	/* Virtqueue ops */
+	int (*set_vq_address)(struct vdpa_device *vdev,
+			      u16 idx, u64 desc_area, u64 driver_area,
+			      u64 device_area);
+	void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
+	void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
+	void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
+			  struct vdpa_callback *cb);
+	void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
+	bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
+	int (*set_vq_state)(struct vdpa_device *vdev, u16 idx, u64 state);
+	u64 (*get_vq_state)(struct vdpa_device *vdev, u16 idx);
+
+	/* Device ops */
+	u16 (*get_vq_align)(struct vdpa_device *vdev);
+	u64 (*get_features)(struct vdpa_device *vdev);
+	int (*set_features)(struct vdpa_device *vdev, u64 features);
+	void (*set_config_cb)(struct vdpa_device *vdev,
+			      struct vdpa_callback *cb);
+	u16 (*get_vq_num_max)(struct vdpa_device *vdev);
+	u32 (*get_device_id)(struct vdpa_device *vdev);
+	u32 (*get_vendor_id)(struct vdpa_device *vdev);
+	u8 (*get_status)(struct vdpa_device *vdev);
+	void (*set_status)(struct vdpa_device *vdev, u8 status);
+	void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
+			   void *buf, unsigned int len);
+	void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
+			   const void *buf, unsigned int len);
+	u32 (*get_generation)(struct vdpa_device *vdev);
+
+	/* Mem table */
+	int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
+};
+
+int register_vdpa_device(struct vdpa_device *vdpa);
+void unregister_vdpa_device(struct vdpa_device *vdpa);
+
+struct device *vdpa_get_parent(struct vdpa_device *vdpa);
+void vdpa_set_parent(struct vdpa_device *vdpa, struct device *parent);
+
+struct vdpa_device *dev_to_vdpa(struct device *_dev);
+struct device *vdpa_to_dev(struct vdpa_device *vdpa);
+
+/**
+ * struct vdpa_driver - operations for a vDPA driver
+ * @drv: underlying device driver
+ * @probe: the function to call when a device is found.  Returns 0 or -errno.
+ * @remove: the function to call when a device is removed.
+ */
+struct vdpa_driver {
+	struct device_driver drv;
+	int (*probe)(struct device *dev);
+	void (*remove)(struct device *dev);
+};
+
+int register_vdpa_driver(struct vdpa_driver *drv);
+void unregister_vdpa_driver(struct vdpa_driver *drv);
+
+static inline struct vdpa_driver *drv_to_vdpa(struct device_driver *drv)
+{
+	return container_of(drv, struct vdpa_driver, drv);
+}
+
+#endif /* _LINUX_VDPA_H */
-- 
2.19.1



* [PATCH 4/5] virtio: introduce a vDPA based transport
  2020-01-16 12:42 [PATCH 0/5] vDPA support Jason Wang
                   ` (2 preceding siblings ...)
  2020-01-16 12:42 ` [PATCH 3/5] vDPA: introduce vDPA bus Jason Wang
@ 2020-01-16 12:42 ` Jason Wang
  2020-01-16 15:38   ` Jason Gunthorpe
  2020-01-17  4:10   ` Randy Dunlap
  2020-01-16 12:42 ` [PATCH 5/5] vdpasim: vDPA device simulator Jason Wang
  2020-01-21  8:44 ` [PATCH 0/5] vDPA support Tian, Kevin
  5 siblings, 2 replies; 76+ messages in thread
From: Jason Wang @ 2020-01-16 12:42 UTC (permalink / raw)
  To: mst, jasowang, linux-kernel, kvm, virtualization, netdev
  Cc: tiwei.bie, jgg, maxime.coquelin, cunming.liang, zhihong.wang,
	rob.miller, xiao.w.wang, haotian.wang, lingshan.zhu, eperezma,
	lulu, parav, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, jiri, shahafs, hanand, mhabets

This patch introduces a vDPA transport for virtio. It lets the kernel
virtio drivers drive a vDPA device, i.e. a device that is capable of
populating the virtqueue directly.

A new virtio-vdpa driver is registered to the vDPA bus. When a vDPA
device is probed by this driver, it registers a virtio device whose
config ops are backed by vDPA. This means it is a software transport
between the vDPA driver and the vDPA device. The transport is
implemented through the config ops (vdpa_config_ops) of the vDPA
device.
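
As a rough illustration of the forwarding described above, the
following userspace sketch models how a transport-level request could
be delegated to per-device ops. All mock_ names are hypothetical
stand-ins for illustration only; they are not the kernel structures
or functions from this series.

```c
#include <assert.h>
#include <string.h>

struct mock_vdpa_device;

/* Hypothetical, trimmed-down stand-in for vdpa_config_ops. */
struct mock_vdpa_config_ops {
	void (*get_config)(struct mock_vdpa_device *vdpa, unsigned int offset,
			   void *buf, unsigned int len);
	void (*set_status)(struct mock_vdpa_device *vdpa, unsigned char status);
	unsigned char (*get_status)(struct mock_vdpa_device *vdpa);
};

/* Hypothetical device: ops table plus a tiny config space and status. */
struct mock_vdpa_device {
	const struct mock_vdpa_config_ops *config;
	unsigned char cfg[4];
	unsigned char status;
};

/* Hypothetical transport state: it only needs to find the device. */
struct mock_virtio_vdpa_device {
	struct mock_vdpa_device *vdpa;
};

/* "Vendor" side: the device implements the ops. */
static void mock_dev_get_config(struct mock_vdpa_device *vdpa,
				unsigned int offset, void *buf,
				unsigned int len)
{
	/* Copy only when the whole request lies inside the config space. */
	if (offset <= sizeof(vdpa->cfg) && len <= sizeof(vdpa->cfg) - offset)
		memcpy(buf, vdpa->cfg + offset, len);
}

static void mock_dev_set_status(struct mock_vdpa_device *vdpa,
				unsigned char status)
{
	vdpa->status = status;
}

static unsigned char mock_dev_get_status(struct mock_vdpa_device *vdpa)
{
	return vdpa->status;
}

static const struct mock_vdpa_config_ops mock_ops = {
	.get_config = mock_dev_get_config,
	.set_status = mock_dev_set_status,
	.get_status = mock_dev_get_status,
};

/* Transport side: every request is simply forwarded to the device ops. */
static void mock_transport_get(struct mock_virtio_vdpa_device *vd_dev,
			       unsigned int offset, void *buf,
			       unsigned int len)
{
	struct mock_vdpa_device *vdpa = vd_dev->vdpa;

	assert(vdpa->config && vdpa->config->get_config);
	vdpa->config->get_config(vdpa, offset, buf, len);
}

/* A reset is modelled as writing a status of 0. */
static void mock_transport_reset(struct mock_virtio_vdpa_device *vd_dev)
{
	struct mock_vdpa_device *vdpa = vd_dev->vdpa;

	vdpa->config->set_status(vdpa, 0);
}

/* Returns 1 when the forwarded read and the reset behave as expected. */
int mock_transport_demo(void)
{
	struct mock_vdpa_device vdpa = {
		.config = &mock_ops,
		.cfg = { 0x11, 0x22, 0x33, 0x44 },
		.status = 0x0f,
	};
	struct mock_virtio_vdpa_device vd_dev = { .vdpa = &vdpa };
	unsigned char byte = 0;

	mock_transport_get(&vd_dev, 2, &byte, 1);
	mock_transport_reset(&vd_dev);

	return byte == 0x33 && vdpa.config->get_status(&vdpa) == 0;
}
```

The point of the indirection is that the transport holds no device
state of its own; swapping the ops table swaps the device behind it.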

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/Kconfig       |  13 ++
 drivers/virtio/Makefile      |   1 +
 drivers/virtio/virtio_vdpa.c | 400 +++++++++++++++++++++++++++++++++++
 3 files changed, 414 insertions(+)
 create mode 100644 drivers/virtio/virtio_vdpa.c

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 9c4fdb64d9ac..b4276999d17d 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -43,6 +43,19 @@ config VIRTIO_PCI_LEGACY
 
 	  If unsure, say Y.
 
+config VIRTIO_VDPA
+	tristate "vDPA driver for virtio devices"
+	depends on VDPA && VIRTIO
+	default n
+	help
+	  This driver provides support for virtio based paravirtual
+	  device driver over vDPA bus. For this to be useful, you need
+	  an appropriate vDPA device implementation that operates on a
+	  physical device to allow the datapath of virtio to be
+	  offloaded to hardware.
+
+	  If unsure, say M.
+
 config VIRTIO_PMEM
 	tristate "Support for virtio pmem driver"
 	depends on VIRTIO
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index fdf5eacd0d0a..3407ac03fe60 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -6,4 +6,5 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
 virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
 obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
+obj-$(CONFIG_VIRTIO_VDPA) += virtio_vdpa.o
 obj-$(CONFIG_VDPA) += vdpa/
diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
new file mode 100644
index 000000000000..86936e5e7ec3
--- /dev/null
+++ b/drivers/virtio/virtio_vdpa.c
@@ -0,0 +1,400 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VIRTIO based driver for vDPA device
+ *
+ * Copyright (c) 2020, Red Hat. All rights reserved.
+ *     Author: Jason Wang <jasowang@redhat.com>
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/uuid.h>
+#include <linux/virtio.h>
+#include <linux/vdpa.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_ring.h>
+
+#define MOD_VERSION  "0.1"
+#define MOD_AUTHOR   "Jason Wang <jasowang@redhat.com>"
+#define MOD_DESC     "vDPA bus driver for virtio devices"
+#define MOD_LICENSE  "GPL v2"
+
+#define to_virtio_vdpa_device(dev) \
+	container_of(dev, struct virtio_vdpa_device, vdev)
+
+struct virtio_vdpa_device {
+	struct virtio_device vdev;
+	struct vdpa_device *vdpa;
+	u64 features;
+
+	/* The lock to protect virtqueue list */
+	spinlock_t lock;
+	/* List of virtio_vdpa_vq_info */
+	struct list_head virtqueues;
+};
+
+struct virtio_vdpa_vq_info {
+	/* the actual virtqueue */
+	struct virtqueue *vq;
+
+	/* the list node for the virtqueues list */
+	struct list_head node;
+};
+
+static struct vdpa_device *vd_get_vdpa(struct virtio_device *vdev)
+{
+	struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vdev);
+	struct vdpa_device *vdpa = vd_dev->vdpa;
+
+	return vdpa;
+}
+
+static void virtio_vdpa_get(struct virtio_device *vdev, unsigned offset,
+			    void *buf, unsigned len)
+{
+	struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+	const struct vdpa_config_ops *ops = vdpa->config;
+
+	ops->get_config(vdpa, offset, buf, len);
+}
+
+static void virtio_vdpa_set(struct virtio_device *vdev, unsigned offset,
+			    const void *buf, unsigned len)
+{
+	struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+	const struct vdpa_config_ops *ops = vdpa->config;
+
+	ops->set_config(vdpa, offset, buf, len);
+}
+
+static u32 virtio_vdpa_generation(struct virtio_device *vdev)
+{
+	struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+	const struct vdpa_config_ops *ops = vdpa->config;
+
+	if (ops->get_generation)
+		return ops->get_generation(vdpa);
+
+	return 0;
+}
+
+static u8 virtio_vdpa_get_status(struct virtio_device *vdev)
+{
+	struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+	const struct vdpa_config_ops *ops = vdpa->config;
+
+	return ops->get_status(vdpa);
+}
+
+static void virtio_vdpa_set_status(struct virtio_device *vdev, u8 status)
+{
+	struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+	const struct vdpa_config_ops *ops = vdpa->config;
+
+	ops->set_status(vdpa, status);
+}
+
+static void virtio_vdpa_reset(struct virtio_device *vdev)
+{
+	struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+	const struct vdpa_config_ops *ops = vdpa->config;
+
+	ops->set_status(vdpa, 0);
+}
+
+static bool virtio_vdpa_notify(struct virtqueue *vq)
+{
+	struct vdpa_device *vdpa = vd_get_vdpa(vq->vdev);
+	const struct vdpa_config_ops *ops = vdpa->config;
+
+	ops->kick_vq(vdpa, vq->index);
+
+	return true;
+}
+
+static irqreturn_t virtio_vdpa_config_cb(void *private)
+{
+	struct virtio_vdpa_device *vd_dev = private;
+
+	virtio_config_changed(&vd_dev->vdev);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t virtio_vdpa_virtqueue_cb(void *private)
+{
+	struct virtio_vdpa_vq_info *info = private;
+
+	return vring_interrupt(0, info->vq);
+}
+
+static struct virtqueue *
+virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned int index,
+		     void (*callback)(struct virtqueue *vq),
+		     const char *name, bool ctx)
+{
+	struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vdev);
+	struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+	const struct vdpa_config_ops *ops = vdpa->config;
+	struct virtio_vdpa_vq_info *info;
+	struct vdpa_callback cb;
+	struct virtqueue *vq;
+	u64 desc_addr, driver_addr, device_addr;
+	unsigned long flags;
+	u32 align, num;
+	int err;
+
+	if (!name)
+		return NULL;
+
+	/* Queue shouldn't already be set up. */
+	if (ops->get_vq_ready(vdpa, index))
+		return ERR_PTR(-ENOENT);
+
+	/* Allocate and fill out our active queue description */
+	info = kmalloc(sizeof(*info), GFP_KERNEL);
+	if (!info)
+		return ERR_PTR(-ENOMEM);
+
+	num = ops->get_vq_num_max(vdpa);
+	if (num == 0) {
+		err = -ENOENT;
+		goto error_new_virtqueue;
+	}
+
+	/* Create the vring */
+	align = ops->get_vq_align(vdpa);
+	vq = vring_create_virtqueue(index, num, align, vdev,
+				    true, true, ctx,
+				    virtio_vdpa_notify, callback, name);
+	if (!vq) {
+		err = -ENOMEM;
+		goto error_new_virtqueue;
+	}
+
+	/* Setup virtqueue callback */
+	cb.callback = virtio_vdpa_virtqueue_cb;
+	cb.private = info;
+	ops->set_vq_cb(vdpa, index, &cb);
+	ops->set_vq_num(vdpa, index, virtqueue_get_vring_size(vq));
+
+	desc_addr = virtqueue_get_desc_addr(vq);
+	driver_addr = virtqueue_get_avail_addr(vq);
+	device_addr = virtqueue_get_used_addr(vq);
+
+	if (ops->set_vq_address(vdpa, index,
+				desc_addr, driver_addr,
+				device_addr)) {
+		err = -EINVAL;
+		goto err_vq;
+	}
+
+	ops->set_vq_ready(vdpa, index, 1);
+
+	vq->priv = info;
+	info->vq = vq;
+
+	spin_lock_irqsave(&vd_dev->lock, flags);
+	list_add(&info->node, &vd_dev->virtqueues);
+	spin_unlock_irqrestore(&vd_dev->lock, flags);
+
+	return vq;
+
+err_vq:
+	vring_del_virtqueue(vq);
+error_new_virtqueue:
+	ops->set_vq_ready(vdpa, index, 0);
+	WARN_ON(ops->get_vq_ready(vdpa, index));
+	kfree(info);
+	return ERR_PTR(err);
+}
+
+static void virtio_vdpa_del_vq(struct virtqueue *vq)
+{
+	struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vq->vdev);
+	struct vdpa_device *vdpa = vd_dev->vdpa;
+	const struct vdpa_config_ops *ops = vdpa->config;
+	struct virtio_vdpa_vq_info *info = vq->priv;
+	unsigned int index = vq->index;
+	unsigned long flags;
+
+	spin_lock_irqsave(&vd_dev->lock, flags);
+	list_del(&info->node);
+	spin_unlock_irqrestore(&vd_dev->lock, flags);
+
+	/* Select and deactivate the queue */
+	ops->set_vq_ready(vdpa, index, 0);
+	WARN_ON(ops->get_vq_ready(vdpa, index));
+
+	vring_del_virtqueue(vq);
+
+	kfree(info);
+}
+
+static void virtio_vdpa_del_vqs(struct virtio_device *vdev)
+{
+	struct virtqueue *vq, *n;
+
+	list_for_each_entry_safe(vq, n, &vdev->vqs, list)
+		virtio_vdpa_del_vq(vq);
+}
+
+static int virtio_vdpa_find_vqs(struct virtio_device *vdev, unsigned nvqs,
+				struct virtqueue *vqs[],
+				vq_callback_t *callbacks[],
+				const char * const names[],
+				const bool *ctx,
+				struct irq_affinity *desc)
+{
+	struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vdev);
+	struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+	const struct vdpa_config_ops *ops = vdpa->config;
+	struct vdpa_callback cb;
+	int i, err, queue_idx = 0;
+
+	for (i = 0; i < nvqs; ++i) {
+		if (!names[i]) {
+			vqs[i] = NULL;
+			continue;
+		}
+
+		vqs[i] = virtio_vdpa_setup_vq(vdev, queue_idx++,
+					      callbacks[i], names[i], ctx ?
+					      ctx[i] : false);
+		if (IS_ERR(vqs[i])) {
+			err = PTR_ERR(vqs[i]);
+			goto err_setup_vq;
+		}
+	}
+
+	cb.callback = virtio_vdpa_config_cb;
+	cb.private = vd_dev;
+	ops->set_config_cb(vdpa, &cb);
+
+	return 0;
+
+err_setup_vq:
+	virtio_vdpa_del_vqs(vdev);
+	return err;
+}
+
+static u64 virtio_vdpa_get_features(struct virtio_device *vdev)
+{
+	struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+	const struct vdpa_config_ops *ops = vdpa->config;
+
+	return ops->get_features(vdpa);
+}
+
+static int virtio_vdpa_finalize_features(struct virtio_device *vdev)
+{
+	struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+	const struct vdpa_config_ops *ops = vdpa->config;
+
+	/* Give virtio_ring a chance to accept features. */
+	vring_transport_features(vdev);
+
+	return ops->set_features(vdpa, vdev->features);
+}
+
+static const char *virtio_vdpa_bus_name(struct virtio_device *vdev)
+{
+	struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vdev);
+	struct vdpa_device *vdpa = vd_dev->vdpa;
+
+	return dev_name(vdpa_to_dev(vdpa));
+}
+
+static const struct virtio_config_ops virtio_vdpa_config_ops = {
+	.get		= virtio_vdpa_get,
+	.set		= virtio_vdpa_set,
+	.generation	= virtio_vdpa_generation,
+	.get_status	= virtio_vdpa_get_status,
+	.set_status	= virtio_vdpa_set_status,
+	.reset		= virtio_vdpa_reset,
+	.find_vqs	= virtio_vdpa_find_vqs,
+	.del_vqs	= virtio_vdpa_del_vqs,
+	.get_features	= virtio_vdpa_get_features,
+	.finalize_features = virtio_vdpa_finalize_features,
+	.bus_name	= virtio_vdpa_bus_name,
+};
+
+static void virtio_vdpa_release_dev(struct device *_d)
+{
+	struct virtio_device *vdev =
+	       container_of(_d, struct virtio_device, dev);
+	struct virtio_vdpa_device *vd_dev =
+	       container_of(vdev, struct virtio_vdpa_device, vdev);
+	struct vdpa_device *vdpa = vd_dev->vdpa;
+
+	devm_kfree(&vdpa->dev, vd_dev);
+}
+
+static int virtio_vdpa_probe(struct device *dev)
+{
+	struct vdpa_device *vdpa = dev_to_vdpa(dev);
+	const struct vdpa_config_ops *ops = vdpa->config;
+	struct virtio_vdpa_device *vd_dev;
+	int rc;
+
+	vd_dev = devm_kzalloc(dev, sizeof(*vd_dev), GFP_KERNEL);
+	if (!vd_dev)
+		return -ENOMEM;
+
+	vd_dev->vdev.dev.parent = &vdpa->dev;
+	vd_dev->vdev.dev.release = virtio_vdpa_release_dev;
+	vd_dev->vdev.config = &virtio_vdpa_config_ops;
+	vd_dev->vdpa = vdpa;
+	INIT_LIST_HEAD(&vd_dev->virtqueues);
+	spin_lock_init(&vd_dev->lock);
+
+	vd_dev->vdev.id.device = ops->get_device_id(vdpa);
+	if (vd_dev->vdev.id.device == 0)
+		return -ENODEV;
+
+	vd_dev->vdev.id.vendor = ops->get_vendor_id(vdpa);
+	rc = register_virtio_device(&vd_dev->vdev);
+	if (rc)
+		put_device(&vd_dev->vdev.dev);
+	else
+		dev_set_drvdata(dev, vd_dev);
+
+	return rc;
+}
+
+static void virtio_vdpa_remove(struct device *dev)
+{
+	struct virtio_vdpa_device *vd_dev = dev_get_drvdata(dev);
+
+	unregister_virtio_device(&vd_dev->vdev);
+}
+
+static struct vdpa_driver virtio_vdpa_driver = {
+	.drv = {
+		.name	= "virtio_vdpa",
+	},
+	.probe	= virtio_vdpa_probe,
+	.remove = virtio_vdpa_remove,
+};
+
+static int __init virtio_vdpa_init(void)
+{
+	return register_vdpa_driver(&virtio_vdpa_driver);
+}
+
+static void __exit virtio_vdpa_exit(void)
+{
+	unregister_vdpa_driver(&virtio_vdpa_driver);
+}
+
+module_init(virtio_vdpa_init)
+module_exit(virtio_vdpa_exit)
+
+MODULE_VERSION(MOD_VERSION);
+MODULE_LICENSE(MOD_LICENSE);
+MODULE_AUTHOR(MOD_AUTHOR);
+MODULE_DESCRIPTION(MOD_DESC);
-- 
2.19.1



* [PATCH 5/5] vdpasim: vDPA device simulator
  2020-01-16 12:42 [PATCH 0/5] vDPA support Jason Wang
                   ` (3 preceding siblings ...)
  2020-01-16 12:42 ` [PATCH 4/5] virtio: introduce a vDPA based transport Jason Wang
@ 2020-01-16 12:42 ` Jason Wang
  2020-01-16 15:47   ` Jason Gunthorpe
                     ` (4 more replies)
  2020-01-21  8:44 ` [PATCH 0/5] vDPA support Tian, Kevin
  5 siblings, 5 replies; 76+ messages in thread
From: Jason Wang @ 2020-01-16 12:42 UTC (permalink / raw)
  To: mst, jasowang, linux-kernel, kvm, virtualization, netdev
  Cc: tiwei.bie, jgg, maxime.coquelin, cunming.liang, zhihong.wang,
	rob.miller, xiao.w.wang, haotian.wang, lingshan.zhu, eperezma,
	lulu, parav, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, jiri, shahafs, hanand, mhabets

This patch implements a software vDPA networking device. The datapath
is implemented through vringh and a workqueue. The device has an
on-chip IOMMU which translates IOVA to PA. For kernel virtio drivers,
the vDPA simulator driver provides dma_ops. For vhost drivers, the
set_map() method of vdpa_config_ops is implemented to accept mappings
from vhost.
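
To make the IOVA to PA translation above concrete, here is a small
userspace sketch of a range-based translation table with permission
checks. It is only a model under stated assumptions: the mock_ names,
the fixed-size array and the permission values are hypothetical and
do not reproduce the vhost IOTLB implementation used by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical permission bits for this sketch. */
#define MOCK_MAP_RO 0x1
#define MOCK_MAP_WO 0x2
#define MOCK_MAP_RW 0x3

#define MOCK_IOTLB_MAX 16

/* One mapping: IOVA range [start, last] backed by PA starting at addr. */
struct mock_iotlb_map {
	uint64_t start, last, addr;
	unsigned int perm;
};

struct mock_iotlb {
	struct mock_iotlb_map maps[MOCK_IOTLB_MAX];
	size_t nmaps;
};

/* Record a mapping; returns 0 on success, -1 if invalid or table full. */
int mock_iotlb_add_range(struct mock_iotlb *tlb, uint64_t start,
			 uint64_t last, uint64_t addr, unsigned int perm)
{
	assert(tlb);
	if (last < start || tlb->nmaps == MOCK_IOTLB_MAX)
		return -1;
	tlb->maps[tlb->nmaps++] =
		(struct mock_iotlb_map){ start, last, addr, perm };
	return 0;
}

/* Translate iova; returns 0 and fills *pa, or -1 on miss/permission fault. */
int mock_iotlb_translate(const struct mock_iotlb *tlb, uint64_t iova,
			 unsigned int perm, uint64_t *pa)
{
	size_t i;

	assert(tlb && pa);
	for (i = 0; i < tlb->nmaps; i++) {
		const struct mock_iotlb_map *m = &tlb->maps[i];

		if (iova < m->start || iova > m->last)
			continue;
		if ((m->perm & perm) != perm)
			return -1;
		*pa = m->addr + (iova - m->start);
		return 0;
	}
	return -1;
}

/* Returns 1 when an identity mapping translates and faults as expected. */
int mock_iotlb_demo(void)
{
	struct mock_iotlb tlb = { .nmaps = 0 };
	uint64_t pa = 0;

	/* Identity-map one 4 KiB page that the device may only read. */
	if (mock_iotlb_add_range(&tlb, 0x1000, 0x1fff, 0x1000, MOCK_MAP_RO))
		return 0;
	if (mock_iotlb_translate(&tlb, 0x1234, MOCK_MAP_RO, &pa) ||
	    pa != 0x1234)
		return 0;

	/* A write through a read-only mapping and an unmapped IOVA both fail. */
	return mock_iotlb_translate(&tlb, 0x1234, MOCK_MAP_WO, &pa) == -1 &&
	       mock_iotlb_translate(&tlb, 0x2000, MOCK_MAP_RO, &pa) == -1;
}
```

With an identity mapping (IOVA equal to PA), as the simulator's DMA
ops use for simplicity, no IOVA allocator is needed.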

A sysfs based management interface is implemented; devices are
created and removed through:

/sys/devices/virtual/vdpa_simulator/netdev/{create|remove}

Netlink based lifecycle management could be implemented for vDPA
simulator as well.

Currently, the vDPA device simulator loops TX traffic back to RX, so
its main use case is vDPA feature testing, prototyping and
development.
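
The loopback behaviour can be pictured with the following userspace
sketch, which copies a "TX" buffer into an "RX" buffer in chunks
through a bounce buffer. This is a simplified model, not the patch's
datapath: the mock_ names are hypothetical, plain arrays stand in for
the virtqueue descriptors, and the small bounce size stands in for
the page-sized buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_BOUNCE_SIZE 8	/* stand-in for a page-sized bounce buffer */

/*
 * Copy up to tx_len bytes from tx to rx through a bounce buffer.
 * Returns the number of bytes written to rx (at most rx_cap).
 */
size_t mock_loopback(const unsigned char *tx, size_t tx_len,
		     unsigned char *rx, size_t rx_cap)
{
	unsigned char bounce[MOCK_BOUNCE_SIZE];
	size_t consumed = 0, total_write = 0;

	assert(tx && rx);
	while (consumed < tx_len && total_write < rx_cap) {
		size_t read = tx_len - consumed;
		size_t write = rx_cap - total_write;

		/* "Pull" one chunk from TX into the bounce buffer. */
		if (read > sizeof(bounce))
			read = sizeof(bounce);
		memcpy(bounce, tx + consumed, read);
		consumed += read;

		/* "Push" as much of that chunk as RX can still hold. */
		if (write > read)
			write = read;
		memcpy(rx + total_write, bounce, write);
		total_write += write;
	}

	return total_write;
}

/* Returns 1 when a packet comes back unchanged on the RX side. */
int mock_loopback_demo(void)
{
	const unsigned char pkt[] = "hello, vdpa loopback";
	unsigned char rx[32] = { 0 };
	size_t n = mock_loopback(pkt, sizeof(pkt), rx, sizeof(rx));

	return n == sizeof(pkt) && memcmp(pkt, rx, n) == 0;
}
```

If the RX side is smaller than the packet, the sketch truncates the
copy and reports only the bytes that were actually written.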

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/vdpa/Kconfig    |  17 +
 drivers/virtio/vdpa/Makefile   |   1 +
 drivers/virtio/vdpa/vdpa_sim.c | 796 +++++++++++++++++++++++++++++++++
 3 files changed, 814 insertions(+)
 create mode 100644 drivers/virtio/vdpa/vdpa_sim.c

diff --git a/drivers/virtio/vdpa/Kconfig b/drivers/virtio/vdpa/Kconfig
index 3032727b4d98..12ec25d48423 100644
--- a/drivers/virtio/vdpa/Kconfig
+++ b/drivers/virtio/vdpa/Kconfig
@@ -7,3 +7,20 @@ config VDPA
           datapath which complies with virtio specifications with
           vendor specific control path.
 
+menuconfig VDPA_MENU
+	bool "VDPA drivers"
+	default n
+
+if VDPA_MENU
+
+config VDPA_SIM
+	tristate "vDPA device simulator"
+	select VDPA
+	default n
+	help
+	  vDPA networking device simulator which loops TX traffic back
+	  to RX. This device is used for testing, prototyping and
+	  development of vDPA.
+
+endif # VDPA_MENU
+
diff --git a/drivers/virtio/vdpa/Makefile b/drivers/virtio/vdpa/Makefile
index ee6a35e8a4fb..5ec0e6ae3c57 100644
--- a/drivers/virtio/vdpa/Makefile
+++ b/drivers/virtio/vdpa/Makefile
@@ -1,2 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_VDPA) += vdpa.o
+obj-$(CONFIG_VDPA_SIM) += vdpa_sim.o
diff --git a/drivers/virtio/vdpa/vdpa_sim.c b/drivers/virtio/vdpa/vdpa_sim.c
new file mode 100644
index 000000000000..85a235f99e3d
--- /dev/null
+++ b/drivers/virtio/vdpa/vdpa_sim.c
@@ -0,0 +1,796 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VDPA networking device simulator.
+ *
+ * Copyright (c) 2020, Red Hat Inc. All rights reserved.
+ *     Author: Jason Wang <jasowang@redhat.com>
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/fs.h>
+#include <linux/poll.h>
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include <linux/wait.h>
+#include <linux/uuid.h>
+#include <linux/iommu.h>
+#include <linux/sysfs.h>
+#include <linux/file.h>
+#include <linux/etherdevice.h>
+#include <linux/vringh.h>
+#include <linux/vdpa.h>
+#include <linux/vhost_iotlb.h>
+#include <uapi/linux/virtio_config.h>
+#include <uapi/linux/virtio_net.h>
+
+#define DRV_VERSION  "0.1"
+#define DRV_AUTHOR   "Jason Wang <jasowang@redhat.com>"
+#define DRV_DESC     "vDPA Device Simulator"
+#define DRV_LICENSE  "GPL v2"
+
+struct vdpasim_dev {
+	struct class	*vd_class;
+	struct idr	vd_idr;
+	struct device	dev;
+	struct kobject  *devices_kobj;
+};
+
+static struct vdpasim_dev *vdpasim_dev;
+
+struct vdpasim_virtqueue {
+	struct vringh vring;
+	struct vringh_kiov iov;
+	unsigned short head;
+	bool ready;
+	u64 desc_addr;
+	u64 device_addr;
+	u64 driver_addr;
+	u32 num;
+	void *private;
+	irqreturn_t (*cb)(void *data);
+};
+
+#define VDPASIM_QUEUE_ALIGN PAGE_SIZE
+#define VDPASIM_QUEUE_MAX 256
+#define VDPASIM_DEVICE_ID 0x1
+#define VDPASIM_VENDOR_ID 0
+#define VDPASIM_VQ_NUM 0x2
+#define VDPASIM_CLASS_NAME "vdpa_simulator"
+#define VDPASIM_NAME "netdev"
+
+static u64 vdpasim_features = (1ULL << VIRTIO_F_ANY_LAYOUT) |
+			      (1ULL << VIRTIO_F_VERSION_1)  |
+			      (1ULL << VIRTIO_F_IOMMU_PLATFORM);
+
+/* State of each vdpasim device */
+struct vdpasim {
+	struct vdpasim_virtqueue vqs[2];
+	struct work_struct work;
+	/* spinlock to synchronize virtqueue state */
+	spinlock_t lock;
+	struct vdpa_device vdpa;
+	struct virtio_net_config config;
+	struct vhost_iotlb *iommu;
+	void *buffer;
+	u32 status;
+	u32 generation;
+	u64 features;
+	struct list_head next;
+	guid_t uuid;
+	char name[64];
+};
+
+static struct mutex vsim_list_lock;
+static struct list_head vsim_devices_list;
+
+static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
+{
+	return container_of(vdpa, struct vdpasim, vdpa);
+}
+
+static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
+{
+	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
+
+	vringh_init_iotlb(&vq->vring, vdpasim_features, VDPASIM_QUEUE_MAX,
+			  false, (struct vring_desc *)(uintptr_t)vq->desc_addr,
+			  (struct vring_avail *)(uintptr_t)vq->driver_addr,
+			  (struct vring_used *)(uintptr_t)vq->device_addr);
+}
+
+static void vdpasim_vq_reset(struct vdpasim_virtqueue *vq)
+{
+	vq->ready = 0;
+	vq->desc_addr = 0;
+	vq->driver_addr = 0;
+	vq->device_addr = 0;
+	vq->cb = NULL;
+	vq->private = NULL;
+	vringh_init_iotlb(&vq->vring, vdpasim_features, VDPASIM_QUEUE_MAX,
+			  false, 0, 0, 0);
+}
+
+static void vdpasim_reset(struct vdpasim *vdpasim)
+{
+	int i;
+
+	for (i = 0; i < VDPASIM_VQ_NUM; i++)
+		vdpasim_vq_reset(&vdpasim->vqs[i]);
+
+	vhost_iotlb_reset(vdpasim->iommu);
+
+	vdpasim->features = 0;
+	vdpasim->status = 0;
+	++vdpasim->generation;
+}
+
+static void vdpasim_work(struct work_struct *work)
+{
+	struct vdpasim *vdpasim = container_of(work, struct
+						 vdpasim, work);
+	struct vdpasim_virtqueue *txq = &vdpasim->vqs[1];
+	struct vdpasim_virtqueue *rxq = &vdpasim->vqs[0];
+	ssize_t read, write, total_write;
+	int err;
+	int pkts = 0;
+
+	spin_lock(&vdpasim->lock);
+
+	if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
+		goto out;
+
+	if (!txq->ready || !rxq->ready)
+		goto out;
+
+	while (true) {
+		total_write = 0;
+		err = vringh_getdesc_iotlb(&txq->vring, &txq->iov, NULL,
+					   &txq->head, GFP_ATOMIC);
+		if (err <= 0)
+			break;
+
+		err = vringh_getdesc_iotlb(&rxq->vring, NULL, &rxq->iov,
+					   &rxq->head, GFP_ATOMIC);
+		if (err <= 0) {
+			vringh_complete_iotlb(&txq->vring, txq->head, 0);
+			break;
+		}
+
+		while (true) {
+			read = vringh_iov_pull_iotlb(&txq->vring, &txq->iov,
+						     vdpasim->buffer,
+						     PAGE_SIZE);
+			if (read <= 0)
+				break;
+
+			write = vringh_iov_push_iotlb(&rxq->vring, &rxq->iov,
+						      vdpasim->buffer, read);
+			if (write <= 0)
+				break;
+
+			total_write += write;
+		}
+
+		/* Make sure data is written before advancing index */
+		smp_wmb();
+
+		vringh_complete_iotlb(&txq->vring, txq->head, 0);
+		vringh_complete_iotlb(&rxq->vring, rxq->head, total_write);
+
+		/* Make sure used is visible before raising the interrupt. */
+		smp_wmb();
+
+		local_bh_disable();
+		if (txq->cb)
+			txq->cb(txq->private);
+		if (rxq->cb)
+			rxq->cb(rxq->private);
+		local_bh_enable();
+
+		if (++pkts > 4) {
+			schedule_work(&vdpasim->work);
+			goto out;
+		}
+	}
+
+out:
+	spin_unlock(&vdpasim->lock);
+}
+
+static int dir_to_perm(enum dma_data_direction dir)
+{
+	int perm = -EFAULT;
+
+	switch (dir) {
+	case DMA_FROM_DEVICE:
+		perm = VHOST_MAP_WO;
+		break;
+	case DMA_TO_DEVICE:
+		perm = VHOST_MAP_RO;
+		break;
+	case DMA_BIDIRECTIONAL:
+		perm = VHOST_MAP_RW;
+		break;
+	default:
+		break;
+	}
+
+	return perm;
+}
+
+static dma_addr_t vdpasim_map_page(struct device *dev, struct page *page,
+				   unsigned long offset, size_t size,
+				   enum dma_data_direction dir,
+				   unsigned long attrs)
+{
+	struct vdpa_device *vdpa = dev_to_vdpa(dev);
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vhost_iotlb *iommu = vdpasim->iommu;
+	u64 pa = (page_to_pfn(page) << PAGE_SHIFT) + offset;
+	int ret, perm = dir_to_perm(dir);
+
+	if (perm < 0)
+		return DMA_MAPPING_ERROR;
+
+	/* For simplicity, use identical mapping to avoid e.g iova
+	 * allocator.
+	 */
+	ret = vhost_iotlb_add_range(iommu, pa, pa + size - 1,
+				    pa, perm);
+	if (ret)
+		return DMA_MAPPING_ERROR;
+
+	return (dma_addr_t)(pa);
+}
+
+static void vdpasim_unmap_page(struct device *dev, dma_addr_t dma_addr,
+			       size_t size, enum dma_data_direction dir,
+			       unsigned long attrs)
+{
+	struct vdpa_device *vdpa = dev_to_vdpa(dev);
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vhost_iotlb *iommu = vdpasim->iommu;
+
+	vhost_iotlb_del_range(iommu, (u64)dma_addr,
+			      (u64)dma_addr + size - 1);
+}
+
+static void *vdpasim_alloc_coherent(struct device *dev, size_t size,
+				    dma_addr_t *dma_addr, gfp_t flag,
+				    unsigned long attrs)
+{
+	struct vdpa_device *vdpa = dev_to_vdpa(dev);
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vhost_iotlb *iommu = vdpasim->iommu;
+	void *addr = kmalloc(size, flag);
+	int ret;
+
+	if (!addr)
+		*dma_addr = DMA_MAPPING_ERROR;
+	else {
+		u64 pa = virt_to_phys(addr);
+
+		ret = vhost_iotlb_add_range(iommu, (u64)pa,
+					    (u64)pa + size - 1,
+					    pa, VHOST_MAP_RW);
+		if (ret) {
+			kfree(addr);
+			addr = NULL;
+			*dma_addr = DMA_MAPPING_ERROR;
+		} else
+			*dma_addr = (dma_addr_t)pa;
+	}
+
+	return addr;
+}
+
+static void vdpasim_free_coherent(struct device *dev, size_t size,
+				void *vaddr, dma_addr_t dma_addr,
+				unsigned long attrs)
+{
+	struct vdpa_device *vdpa = dev_to_vdpa(dev);
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vhost_iotlb *iommu = vdpasim->iommu;
+
+	vhost_iotlb_del_range(iommu, (u64)dma_addr,
+			       (u64)dma_addr + size - 1);
+	kfree(vaddr);
+}
+
+static const struct dma_map_ops vdpasim_dma_ops = {
+	.map_page = vdpasim_map_page,
+	.unmap_page = vdpasim_unmap_page,
+	.alloc = vdpasim_alloc_coherent,
+	.free = vdpasim_free_coherent,
+};
+
+static void vdpasim_release_dev(struct device *_d)
+{
+	struct vdpa_device *vdpa = dev_to_vdpa(_d);
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+
+	sysfs_remove_link(vdpasim_dev->devices_kobj, vdpasim->name);
+
+	mutex_lock(&vsim_list_lock);
+	list_del(&vdpasim->next);
+	mutex_unlock(&vsim_list_lock);
+
+	kfree(vdpasim->buffer);
+	kfree(vdpasim);
+}
+
+static const struct vdpa_config_ops vdpasim_net_config_ops;
+
+static int vdpasim_create(const guid_t *uuid)
+{
+	struct vdpasim *vdpasim, *tmp;
+	struct virtio_net_config *config;
+	struct vdpa_device *vdpa;
+	struct device *dev;
+	int ret = -ENOMEM;
+
+	mutex_lock(&vsim_list_lock);
+	list_for_each_entry(tmp, &vsim_devices_list, next) {
+		if (guid_equal(&tmp->uuid, uuid)) {
+			mutex_unlock(&vsim_list_lock);
+			return -EEXIST;
+		}
+	}
+
+	vdpasim = kzalloc(sizeof(*vdpasim), GFP_KERNEL);
+	if (!vdpasim)
+		goto err_vdpa_alloc;
+
+	vdpasim->buffer = kmalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!vdpasim->buffer)
+		goto err_buffer_alloc;
+
+	vdpasim->iommu = vhost_iotlb_alloc(2048, 0);
+	if (!vdpasim->iommu)
+		goto err_iotlb;
+
+	config = &vdpasim->config;
+	config->mtu = 1500;
+	config->status = VIRTIO_NET_S_LINK_UP;
+	eth_random_addr(config->mac);
+
+	INIT_WORK(&vdpasim->work, vdpasim_work);
+	spin_lock_init(&vdpasim->lock);
+
+	guid_copy(&vdpasim->uuid, uuid);
+
+	list_add(&vdpasim->next, &vsim_devices_list);
+
+	mutex_unlock(&vsim_list_lock);
+
+	vdpa = &vdpasim->vdpa;
+	vdpa->config = &vdpasim_net_config_ops;
+	vdpa_set_parent(vdpa, &vdpasim_dev->dev);
+	vdpa->dev.release = vdpasim_release_dev;
+
+	vringh_set_iotlb(&vdpasim->vqs[0].vring, vdpasim->iommu);
+	vringh_set_iotlb(&vdpasim->vqs[1].vring, vdpasim->iommu);
+
+	dev = &vdpa->dev;
+	dev->coherent_dma_mask = DMA_BIT_MASK(64);
+	set_dma_ops(dev, &vdpasim_dma_ops);
+
+	ret = register_vdpa_device(vdpa);
+	if (ret)
+		goto err_register;
+
+	sprintf(vdpasim->name, "%pU", uuid);
+
+	ret = sysfs_create_link(vdpasim_dev->devices_kobj, &vdpa->dev.kobj,
+				vdpasim->name);
+	if (ret)
+		goto err_link;
+
+	return 0;
+
+err_link:
+err_register:
+	vhost_iotlb_free(vdpasim->iommu);
+	mutex_lock(&vsim_list_lock);
+	list_del(&vdpasim->next);
+err_iotlb:
+	kfree(vdpasim->buffer);
+err_buffer_alloc:
+	kfree(vdpasim);
+err_vdpa_alloc:
+	/* Early failures arrive here with vsim_list_lock still held. */
+	mutex_unlock(&vsim_list_lock);
+	return ret;
+}
+
+static int vdpasim_remove(const guid_t *uuid)
+{
+	struct vdpasim *vds, *tmp;
+	struct vdpa_device *vdpa = NULL;
+	int ret = -EINVAL;
+
+	mutex_lock(&vsim_list_lock);
+	list_for_each_entry_safe(vds, tmp, &vsim_devices_list, next) {
+		if (guid_equal(&vds->uuid, uuid)) {
+			vdpa = &vds->vdpa;
+			ret = 0;
+			break;
+		}
+	}
+	mutex_unlock(&vsim_list_lock);
+
+	if (vdpa)
+		unregister_vdpa_device(vdpa);
+
+	return ret;
+}
+
+static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
+				  u64 desc_area, u64 driver_area,
+				  u64 device_area)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
+
+	vq->desc_addr = desc_area;
+	vq->driver_addr = driver_area;
+	vq->device_addr = device_area;
+
+	return 0;
+}
+
+static void vdpasim_set_vq_num(struct vdpa_device *vdpa, u16 idx, u32 num)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
+
+	vq->num = num;
+}
+
+static void vdpasim_kick_vq(struct vdpa_device *vdpa, u16 idx)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
+
+	if (vq->ready)
+		schedule_work(&vdpasim->work);
+}
+
+static void vdpasim_set_vq_cb(struct vdpa_device *vdpa, u16 idx,
+			      struct vdpa_callback *cb)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
+
+	vq->cb = cb->callback;
+	vq->private = cb->private;
+}
+
+static void vdpasim_set_vq_ready(struct vdpa_device *vdpa, u16 idx, bool ready)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
+
+	spin_lock(&vdpasim->lock);
+	vq->ready = ready;
+	if (vq->ready)
+		vdpasim_queue_ready(vdpasim, idx);
+	spin_unlock(&vdpasim->lock);
+}
+
+static bool vdpasim_get_vq_ready(struct vdpa_device *vdpa, u16 idx)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
+
+	return vq->ready;
+}
+
+static int vdpasim_set_vq_state(struct vdpa_device *vdpa, u16 idx, u64 state)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
+	struct vringh *vrh = &vq->vring;
+
+	spin_lock(&vdpasim->lock);
+	vrh->last_avail_idx = state;
+	spin_unlock(&vdpasim->lock);
+
+	return 0;
+}
+
+static u64 vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
+	struct vringh *vrh = &vq->vring;
+
+	return vrh->last_avail_idx;
+}
+
+static u16 vdpasim_get_vq_align(struct vdpa_device *vdpa)
+{
+	return VDPASIM_QUEUE_ALIGN;
+}
+
+static u64 vdpasim_get_features(struct vdpa_device *vdpa)
+{
+	return vdpasim_features;
+}
+
+static int vdpasim_set_features(struct vdpa_device *vdpa, u64 features)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+
+	/* DMA mapping must be done by driver */
+	if (!(features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)))
+		return -EINVAL;
+
+	vdpasim->features = features & vdpasim_features;
+
+	return 0;
+}
+
+static void vdpasim_set_config_cb(struct vdpa_device *vdpa,
+				  struct vdpa_callback *cb)
+{
+	/* We don't support config interrupt */
+}
+
+static u16 vdpasim_get_vq_num_max(struct vdpa_device *vdpa)
+{
+	return VDPASIM_QUEUE_MAX;
+}
+
+static u32 vdpasim_get_device_id(struct vdpa_device *vdpa)
+{
+	return VDPASIM_DEVICE_ID;
+}
+
+static u32 vdpasim_get_vendor_id(struct vdpa_device *vdpa)
+{
+	return VDPASIM_VENDOR_ID;
+}
+
+static u8 vdpasim_get_status(struct vdpa_device *vdpa)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	u8 status;
+
+	spin_lock(&vdpasim->lock);
+	status = vdpasim->status;
+	spin_unlock(&vdpasim->lock);
+
+	return status;
+}
+
+static void vdpasim_set_status(struct vdpa_device *vdpa, u8 status)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+
+	spin_lock(&vdpasim->lock);
+	vdpasim->status = status;
+	if (status == 0)
+		vdpasim_reset(vdpasim);
+	spin_unlock(&vdpasim->lock);
+}
+
+static void vdpasim_get_config(struct vdpa_device *vdpa, unsigned int offset,
+			     void *buf, unsigned int len)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+
+	if (offset + len <= sizeof(struct virtio_net_config))
+		memcpy(buf, (u8 *)&vdpasim->config + offset, len);
+}
+
+static void vdpasim_set_config(struct vdpa_device *vdpa, unsigned int offset,
+			     const void *buf, unsigned int len)
+{
+	/* No writable config supported by vdpasim */
+}
+
+static u32 vdpasim_get_generation(struct vdpa_device *vdpa)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+
+	return vdpasim->generation;
+}
+
+static int vdpasim_set_map(struct vdpa_device *vdpa,
+			   struct vhost_iotlb *iotlb)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	struct vhost_iotlb_map *map;
+	u64 start = 0ULL, last = 0ULL - 1;
+	int ret;
+
+	vhost_iotlb_reset(vdpasim->iommu);
+
+	for (map = vhost_iotlb_itree_first(iotlb, start, last); map;
+	     map = vhost_iotlb_itree_next(map, start, last)) {
+		ret = vhost_iotlb_add_range(vdpasim->iommu, map->start,
+					    map->last, map->addr, map->perm);
+		if (ret)
+			goto err;
+	}
+	return 0;
+
+err:
+	vhost_iotlb_reset(vdpasim->iommu);
+	return ret;
+}
+
+static const struct vdpa_config_ops vdpasim_net_config_ops = {
+	.set_vq_address         = vdpasim_set_vq_address,
+	.set_vq_num             = vdpasim_set_vq_num,
+	.kick_vq                = vdpasim_kick_vq,
+	.set_vq_cb              = vdpasim_set_vq_cb,
+	.set_vq_ready           = vdpasim_set_vq_ready,
+	.get_vq_ready           = vdpasim_get_vq_ready,
+	.set_vq_state           = vdpasim_set_vq_state,
+	.get_vq_state           = vdpasim_get_vq_state,
+	.get_vq_align           = vdpasim_get_vq_align,
+	.get_features           = vdpasim_get_features,
+	.set_features           = vdpasim_set_features,
+	.set_config_cb          = vdpasim_set_config_cb,
+	.get_vq_num_max         = vdpasim_get_vq_num_max,
+	.get_device_id          = vdpasim_get_device_id,
+	.get_vendor_id          = vdpasim_get_vendor_id,
+	.get_status             = vdpasim_get_status,
+	.set_status             = vdpasim_set_status,
+	.get_config             = vdpasim_get_config,
+	.set_config             = vdpasim_set_config,
+	.get_generation         = vdpasim_get_generation,
+	.set_map                = vdpasim_set_map,
+};
+
+static void vdpasim_device_release(struct device *dev)
+{
+	struct vdpasim_dev *vdpasim_dev =
+	       container_of(dev, struct vdpasim_dev, dev);
+
+	vdpasim_dev->dev.bus = NULL;
+	idr_destroy(&vdpasim_dev->vd_idr);
+	class_destroy(vdpasim_dev->vd_class);
+	vdpasim_dev->vd_class = NULL;
+	kfree(vdpasim_dev);
+}
+
+static ssize_t create_store(struct kobject *kobj, struct kobj_attribute *attr,
+			    const char *buf, size_t count)
+{
+	char *str;
+	guid_t uuid;
+	int ret;
+
+	if ((count < UUID_STRING_LEN) || (count > UUID_STRING_LEN + 1))
+		return -EINVAL;
+
+	str = kstrndup(buf, count, GFP_KERNEL);
+	if (!str)
+		return -ENOMEM;
+
+	ret = guid_parse(str, &uuid);
+	kfree(str);
+	if (ret)
+		return ret;
+
+	ret = vdpasim_create(&uuid);
+	if (ret)
+		return ret;
+
+	return count;
+}
+
+static ssize_t remove_store(struct kobject *kobj, struct kobj_attribute *attr,
+			    const char *buf, size_t count)
+{
+	char *str;
+	guid_t uuid;
+	int ret;
+
+	if ((count < UUID_STRING_LEN) || (count > UUID_STRING_LEN + 1))
+		return -EINVAL;
+
+	str = kstrndup(buf, count, GFP_KERNEL);
+	if (!str)
+		return -ENOMEM;
+
+	ret = guid_parse(str, &uuid);
+	kfree(str);
+	if (ret)
+		return ret;
+
+	ret = vdpasim_remove(&uuid);
+	if (ret)
+		return ret;
+
+	return count;
+}
+
+static struct kobj_attribute create_attribute = __ATTR_WO(create);
+static struct kobj_attribute remove_attribute = __ATTR_WO(remove);
+
+static struct attribute *attrs[] = {
+	&create_attribute.attr,
+	&remove_attribute.attr,
+	NULL,
+};
+
+static struct attribute_group attr_group = {
+	.attrs = attrs,
+};
+
+static int __init vdpasim_dev_init(void)
+{
+	struct device *dev;
+	int ret = 0;
+
+	vdpasim_dev = kzalloc(sizeof(*vdpasim_dev), GFP_KERNEL);
+	if (!vdpasim_dev)
+		return -ENOMEM;
+
+	idr_init(&vdpasim_dev->vd_idr);
+
+	vdpasim_dev->vd_class = class_create(THIS_MODULE, VDPASIM_CLASS_NAME);
+
+	if (IS_ERR(vdpasim_dev->vd_class)) {
+		pr_err("Error: failed to register vdpasim_dev class\n");
+		ret = PTR_ERR(vdpasim_dev->vd_class);
+		goto err_class;
+	}
+
+	dev = &vdpasim_dev->dev;
+	dev->class = vdpasim_dev->vd_class;
+	dev->release = vdpasim_device_release;
+	dev_set_name(dev, "%s", VDPASIM_NAME);
+
+	ret = device_register(&vdpasim_dev->dev);
+	if (ret)
+		goto err_register;
+
+	ret = sysfs_create_group(&vdpasim_dev->dev.kobj, &attr_group);
+	if (ret)
+		goto err_create;
+
+	vdpasim_dev->devices_kobj = kobject_create_and_add("devices",
+							   &dev->kobj);
+	if (!vdpasim_dev->devices_kobj) {
+		ret = -ENOMEM;
+		goto err_devices;
+	}
+
+	mutex_init(&vsim_list_lock);
+	INIT_LIST_HEAD(&vsim_devices_list);
+
+	return 0;
+
+err_devices:
+	sysfs_remove_group(&vdpasim_dev->dev.kobj, &attr_group);
+err_create:
+	device_unregister(&vdpasim_dev->dev);
+err_register:
+	class_destroy(vdpasim_dev->vd_class);
+err_class:
+	kfree(vdpasim_dev);
+	vdpasim_dev = NULL;
+	return ret;
+}
+
+static void __exit vdpasim_dev_exit(void)
+{
+	device_unregister(&vdpasim_dev->dev);
+}
+
+module_init(vdpasim_dev_init)
+module_exit(vdpasim_dev_exit)
+
+MODULE_VERSION(DRV_VERSION);
+MODULE_LICENSE(DRV_LICENSE);
+MODULE_AUTHOR(DRV_AUTHOR);
+MODULE_DESCRIPTION(DRV_DESC);
-- 
2.19.1



* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-16 12:42 ` [PATCH 3/5] vDPA: introduce vDPA bus Jason Wang
@ 2020-01-16 15:22   ` Jason Gunthorpe
  2020-01-17  3:03     ` Jason Wang
  2020-01-17  4:16   ` Randy Dunlap
  2020-01-17 12:13   ` Michael S. Tsirkin
  2 siblings, 1 reply; 76+ messages in thread
From: Jason Gunthorpe @ 2020-01-16 15:22 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, linux-kernel, kvm, virtualization, netdev, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller,
	xiao.w.wang, haotian.wang, lingshan.zhu, eperezma, lulu,
	Parav Pandit, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, Jiri Pirko, Shahaf Shuler, hanand, mhabets

On Thu, Jan 16, 2020 at 08:42:29PM +0800, Jason Wang wrote:
> A vDPA device is a device that uses a datapath which complies with the
> virtio specifications with a vendor specific control path. vDPA devices
> can be either physically located on the hardware or emulated by
> software. vDPA hardware devices are usually implemented through PCIE
> with the following types:
> 
> - PF (Physical Function) - A single Physical Function
> - VF (Virtual Function) - Device that supports single root I/O
>   virtualization (SR-IOV). Its Virtual Function (VF) represents a
>   virtualized instance of the device that can be assigned to different
>   partitions

> - VDEV (Virtual Device) - With technologies such as Intel Scalable
>   IOV, a virtual device composed by host OS utilizing one or more
>   ADIs.
> - SF (Sub function) - Vendor specific interface to slice the Physical
>   Function to multiple sub functions that can be assigned to different
>   partitions as virtual devices.

I really hope we don't end up with two different ways to spell this
same thing.

> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0
> +obj-$(CONFIG_VDPA) += vdpa.o
> diff --git a/drivers/virtio/vdpa/vdpa.c b/drivers/virtio/vdpa/vdpa.c
> new file mode 100644
> index 000000000000..2b0e4a9f105d
> +++ b/drivers/virtio/vdpa/vdpa.c
> @@ -0,0 +1,141 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * vDPA bus.
> + *
> + * Copyright (c) 2019, Red Hat. All rights reserved.
> + *     Author: Jason Wang <jasowang@redhat.com>

2020 these days

> + *
> + */
> +
> +#include <linux/module.h>
> +#include <linux/idr.h>
> +#include <linux/vdpa.h>
> +
> +#define MOD_VERSION  "0.1"

I think module versions are discouraged these days

> +#define MOD_DESC     "vDPA bus"
> +#define MOD_AUTHOR   "Jason Wang <jasowang@redhat.com>"
> +#define MOD_LICENSE  "GPL v2"
> +
> +static DEFINE_IDA(vdpa_index_ida);
> +
> +struct device *vdpa_get_parent(struct vdpa_device *vdpa)
> +{
> +	return vdpa->dev.parent;
> +}
> +EXPORT_SYMBOL(vdpa_get_parent);
> +
> +void vdpa_set_parent(struct vdpa_device *vdpa, struct device *parent)
> +{
> +	vdpa->dev.parent = parent;
> +}
> +EXPORT_SYMBOL(vdpa_set_parent);
> +
> +struct vdpa_device *dev_to_vdpa(struct device *_dev)
> +{
> +	return container_of(_dev, struct vdpa_device, dev);
> +}
> +EXPORT_SYMBOL_GPL(dev_to_vdpa);
> +
> +struct device *vdpa_to_dev(struct vdpa_device *vdpa)
> +{
> +	return &vdpa->dev;
> +}
> +EXPORT_SYMBOL_GPL(vdpa_to_dev);

Why these trivial accessors? Seems unnecessary, or should at least be
static inlines in a header
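
A minimal sketch of what the header-only form could look like. This is a
userspace model, not the posted patch: struct device and struct vdpa_device
are cut down to the fields the accessors touch, and container_of() is
redefined locally so the snippet compiles outside the kernel.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(); illustration only. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins for the kernel structures (assumed layout). */
struct device {
	struct device *parent;
};

struct vdpa_device {
	int index;
	struct device dev;
};

/* Header-style static inlines: no exported symbol, no call overhead. */
static inline struct vdpa_device *dev_to_vdpa(struct device *dev)
{
	return container_of(dev, struct vdpa_device, dev);
}

static inline struct device *vdpa_to_dev(struct vdpa_device *vdpa)
{
	return &vdpa->dev;
}

static inline struct device *vdpa_get_parent(struct vdpa_device *vdpa)
{
	return vdpa->dev.parent;
}
```

With this shape the EXPORT_SYMBOL lines go away entirely, since nothing
is left in vdpa.c to export.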

> +int register_vdpa_device(struct vdpa_device *vdpa)
> +{

Usually we want to see symbols consistently prefixed with vdpa_*, is
there a reason why register/unregister are swapped?

> +	int err;
> +
> +	if (!vdpa_get_parent(vdpa))
> +		return -EINVAL;
> +
> +	if (!vdpa->config)
> +		return -EINVAL;
> +
> +	err = ida_simple_get(&vdpa_index_ida, 0, 0, GFP_KERNEL);
> +	if (err < 0)
> +		return -EFAULT;
> +
> +	vdpa->dev.bus = &vdpa_bus;
> +	device_initialize(&vdpa->dev);

IMHO device_initialize should not be called inside something called
register, too often we find out that the caller drivers need the device
to be initialized earlier, i.e. to use the kref, or something.

I find the best flow is to have some init function that does the
device_initialize and sets the device_name that the driver can call
early.
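
A rough userspace sketch of that flow, under stated assumptions:
vdpa_init_device() and vdpa_register_device() are hypothetical names (neither
is in the posted patch), struct device is reduced to a refcount plus a
release callback, and the 'fail' argument exists only to exercise the error
path.

```c
#include <stdlib.h>

/* Simplified stand-in for struct device: only the refcount is modelled. */
struct device {
	int refcount;
	void (*release)(struct device *dev);
};

struct vdpa_device {
	struct device dev;	/* first member, so the pointers coincide */
	int registered;
};

static int released;	/* counts release() calls; illustration only */

static void vdpa_release(struct device *dev)
{
	released++;
	free((struct vdpa_device *)dev);
}

static void put_device(struct device *dev)
{
	if (--dev->refcount == 0)
		dev->release(dev);
}

/* Hypothetical early init: from here on, teardown is always put_device(). */
static struct vdpa_device *vdpa_init_device(void)
{
	struct vdpa_device *vdpa = calloc(1, sizeof(*vdpa));

	if (!vdpa)
		return NULL;
	vdpa->dev.refcount = 1;		/* models device_initialize() */
	vdpa->dev.release = vdpa_release;
	return vdpa;
}

/* Hypothetical register step: only publishes the device. */
static int vdpa_register_device(struct vdpa_device *vdpa, int fail)
{
	if (fail)
		return -1;
	vdpa->registered = 1;
	return 0;
}

/* Driver-side create path: one unwind label, never a bare free(). */
static struct vdpa_device *create(int fail)
{
	struct vdpa_device *vdpa = vdpa_init_device();

	if (!vdpa)
		return NULL;
	if (vdpa_register_device(vdpa, fail))
		goto err_put;
	return vdpa;

err_put:
	put_device(&vdpa->dev);
	return NULL;
}
```

The point of the split is that the driver never has to reason about whether
the memory is still "plain" or already refcounted: once init has run, every
exit goes through put_device().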

Shouldn't there be a device/driver matching process of some kind?

Jason


* Re: [PATCH 4/5] virtio: introduce a vDPA based transport
  2020-01-16 12:42 ` [PATCH 4/5] virtio: introduce a vDPA based transport Jason Wang
@ 2020-01-16 15:38   ` Jason Gunthorpe
  2020-01-17  9:32     ` Jason Wang
  2020-01-17  4:10   ` Randy Dunlap
  1 sibling, 1 reply; 76+ messages in thread
From: Jason Gunthorpe @ 2020-01-16 15:38 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, linux-kernel, kvm, virtualization, netdev, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller,
	xiao.w.wang, haotian.wang, lingshan.zhu, eperezma, lulu,
	Parav Pandit, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, Jiri Pirko, Shahaf Shuler, hanand, mhabets

On Thu, Jan 16, 2020 at 08:42:30PM +0800, Jason Wang wrote:
> diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
> new file mode 100644
> index 000000000000..86936e5e7ec3
> +++ b/drivers/virtio/virtio_vdpa.c
> @@ -0,0 +1,400 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * VIRTIO based driver for vDPA device
> + *
> + * Copyright (c) 2020, Red Hat. All rights reserved.
> + *     Author: Jason Wang <jasowang@redhat.com>
> + *
> + */
> +
> +#include <linux/init.h>
> +#include <linux/module.h>
> +#include <linux/device.h>
> +#include <linux/kernel.h>
> +#include <linux/slab.h>
> +#include <linux/uuid.h>
> +#include <linux/virtio.h>
> +#include <linux/vdpa.h>
> +#include <linux/virtio_config.h>
> +#include <linux/virtio_ring.h>
> +
> +#define MOD_VERSION  "0.1"
> +#define MOD_AUTHOR   "Jason Wang <jasowang@redhat.com>"
> +#define MOD_DESC     "vDPA bus driver for virtio devices"
> +#define MOD_LICENSE  "GPL v2"
> +
> +#define to_virtio_vdpa_device(dev) \
> +	container_of(dev, struct virtio_vdpa_device, vdev)

Should be a static function

> +struct virtio_vdpa_device {
> +	struct virtio_device vdev;
> +	struct vdpa_device *vdpa;
> +	u64 features;
> +
> +	/* The lock to protect virtqueue list */
> +	spinlock_t lock;
> +	/* List of virtio_vdpa_vq_info */
> +	struct list_head virtqueues;
> +};
> +
> +struct virtio_vdpa_vq_info {
> +	/* the actual virtqueue */
> +	struct virtqueue *vq;
> +
> +	/* the list node for the virtqueues list */
> +	struct list_head node;
> +};
> +
> +static struct vdpa_device *vd_get_vdpa(struct virtio_device *vdev)
> +{
> +	struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vdev);
> +	struct vdpa_device *vdpa = vd_dev->vdpa;
> +
> +	return vdpa;

Bit of a long way to say

  return to_virtio_vdpa_device(vdev)->vdpa

?

> +err_vq:
> +	vring_del_virtqueue(vq);
> +error_new_virtqueue:
> +	ops->set_vq_ready(vdpa, index, 0);
> +	WARN_ON(ops->get_vq_ready(vdpa, index));

A warn_on during error unwind? Sketchy, deserves a comment I think

> +static void virtio_vdpa_release_dev(struct device *_d)
> +{
> +	struct virtio_device *vdev =
> +	       container_of(_d, struct virtio_device, dev);
> +	struct virtio_vdpa_device *vd_dev =
> +	       container_of(vdev, struct virtio_vdpa_device, vdev);
> +	struct vdpa_device *vdpa = vd_dev->vdpa;
> +
> +	devm_kfree(&vdpa->dev, vd_dev);
> +}

It is unusual for the release function to not be owned by the
subsystem, through the class. I'm not sure there are enough module ref
counts to ensure that this function is not unloaded?

Usually to make this all work sanely the subsystem provides some
allocation function

 vdpa_dev = vdpa_alloc_dev(parent, ops, sizeof(struct virtio_vdpa_device))
 struct virtio_vdpa_device *priv = vdpa_priv(vdpa_dev)

Then the subsystem naturally owns all the memory.

Otherwise it gets tricky to ensure that the module doesn't unload
before all the krefs are put.
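
For illustration, a userspace sketch of such an allocator. The names
vdpa_alloc_dev() and vdpa_priv() follow the pseudo-code above and are
assumptions, as are vdpa_free_dev() and struct virtio_vdpa_priv; the kernel
version would use kzalloc() and free from the subsystem's release callback
instead.

```c
#include <stddef.h>
#include <stdlib.h>

struct vdpa_config_ops;		/* opaque in this sketch */

/* Simplified stand-in for struct device. */
struct device {
	struct device *parent;
};

struct vdpa_device {
	struct device dev;
	const struct vdpa_config_ops *config;
	/* driver-private area is placed directly after this struct */
};

/* One allocation holds both the subsystem object and the driver's state. */
static struct vdpa_device *vdpa_alloc_dev(struct device *parent,
					  const struct vdpa_config_ops *ops,
					  size_t priv_size)
{
	struct vdpa_device *vdpa = calloc(1, sizeof(*vdpa) + priv_size);

	if (!vdpa)
		return NULL;
	vdpa->dev.parent = parent;
	vdpa->config = ops;
	return vdpa;
}

/* Private area starts right past the struct, aligned like the struct is. */
static void *vdpa_priv(struct vdpa_device *vdpa)
{
	return vdpa + 1;
}

/* In the kernel this would live in the subsystem-owned release(). */
static void vdpa_free_dev(struct vdpa_device *vdpa)
{
	free(vdpa);
}

/* Example of driver-private state carried in the same allocation. */
struct virtio_vdpa_priv {
	unsigned long long features;
};
```

Because the free happens in subsystem code, the driver module does not need
to stay loaded just to run a release function.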

> +
> +static int virtio_vdpa_probe(struct device *dev)
> +{
> +	struct vdpa_device *vdpa = dev_to_vdpa(dev);

The probe function for a class should accept the class's type already,
no casting.

> +	const struct vdpa_config_ops *ops = vdpa->config;
> +	struct virtio_vdpa_device *vd_dev;
> +	int rc;
> +
> +	vd_dev = devm_kzalloc(dev, sizeof(*vd_dev), GFP_KERNEL);
> +	if (!vd_dev)
> +		return -ENOMEM;

This is not right, the struct device lifetime is controlled by a kref,
not via devm. If you want to use a devm unwind then the unwind is
put_device, not devm_kfree.

In this simple situation I don't see a reason to use devm.

> +	vd_dev->vdev.dev.parent = &vdpa->dev;
> +	vd_dev->vdev.dev.release = virtio_vdpa_release_dev;
> +	vd_dev->vdev.config = &virtio_vdpa_config_ops;
> +	vd_dev->vdpa = vdpa;
> +	INIT_LIST_HEAD(&vd_dev->virtqueues);
> +	spin_lock_init(&vd_dev->lock);
> +
> +	vd_dev->vdev.id.device = ops->get_device_id(vdpa);
> +	if (vd_dev->vdev.id.device == 0)
> +		return -ENODEV;
> +
> +	vd_dev->vdev.id.vendor = ops->get_vendor_id(vdpa);
> +	rc = register_virtio_device(&vd_dev->vdev);
> +	if (rc)
> +		put_device(dev);

And an ugly unwind like this is why you want to have device_initialize()
exposed to the driver, so there is a clear pairing that calling
device_initialize() must be followed by put_device. This should also
use the goto unwind style

> +	else
> +		dev_set_drvdata(dev, vd_dev);
> +
> +	return rc;
> +}
> +
> +static void virtio_vdpa_remove(struct device *dev)
> +{

Remove should also already accept the right type

> +	struct virtio_vdpa_device *vd_dev = dev_get_drvdata(dev);
> +
> +	unregister_virtio_device(&vd_dev->vdev);
> +}
> +
> +static struct vdpa_driver virtio_vdpa_driver = {
> +	.drv = {
> +		.name	= "virtio_vdpa",
> +	},
> +	.probe	= virtio_vdpa_probe,
> +	.remove = virtio_vdpa_remove,
> +};

Still a little unclear on binding, is this supposed to bind to all
vdpa devices?

Where are the various THIS_MODULEs I expect to see in a scheme like
this?

All function pointers must be protected by a held module reference
count, i.e. the above probe/remove and all the pointers in ops.

> +static int __init virtio_vdpa_init(void)
> +{
> +	return register_vdpa_driver(&virtio_vdpa_driver);
> +}
> +
> +static void __exit virtio_vdpa_exit(void)
> +{
> +	unregister_vdpa_driver(&virtio_vdpa_driver);
> +}
> +
> +module_init(virtio_vdpa_init)
> +module_exit(virtio_vdpa_exit)

Best to provide the usual 'module_pci_driver' like scheme for this
boilerplate.
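
Something along these lines, sketched here as a userspace model. The macro
name module_vdpa_driver and the vdpa_register_driver() /
vdpa_unregister_driver() names are assumptions, not part of the posted
series; in the kernel the macro would normally be a one-line wrapper around
module_driver(), which also emits the module_init()/module_exit() hooks.

```c
/* Cut-down driver object; 'registered' only exists to observe the calls. */
struct vdpa_driver {
	const char *name;
	int registered;
};

static int vdpa_register_driver(struct vdpa_driver *drv)
{
	drv->registered = 1;
	return 0;
}

static void vdpa_unregister_driver(struct vdpa_driver *drv)
{
	drv->registered = 0;
}

/* Generates <driver>_init() and <driver>_exit() from a single line. */
#define module_vdpa_driver(drv)					\
static int drv##_init(void)					\
{								\
	return vdpa_register_driver(&(drv));			\
}								\
static void drv##_exit(void)					\
{								\
	vdpa_unregister_driver(&(drv));				\
}

static struct vdpa_driver virtio_vdpa_driver = {
	.name = "virtio_vdpa",
};

module_vdpa_driver(virtio_vdpa_driver)
```

Each bus driver then needs only the driver struct and the macro line.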

> +MODULE_VERSION(MOD_VERSION);
> +MODULE_LICENSE(MOD_LICENSE);
> +MODULE_AUTHOR(MOD_AUTHOR);
> +MODULE_DESCRIPTION(MOD_DESC);

Why the indirection with 2nd defines?

Jason


* Re: [PATCH 5/5] vdpasim: vDPA device simulator
  2020-01-16 12:42 ` [PATCH 5/5] vdpasim: vDPA device simulator Jason Wang
@ 2020-01-16 15:47   ` Jason Gunthorpe
  2020-01-17  9:32     ` Jason Wang
  2020-01-17  4:12   ` Randy Dunlap
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 76+ messages in thread
From: Jason Gunthorpe @ 2020-01-16 15:47 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, linux-kernel, kvm, virtualization, netdev, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller,
	xiao.w.wang, haotian.wang, lingshan.zhu, eperezma, lulu,
	Parav Pandit, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, Jiri Pirko, Shahaf Shuler, hanand, mhabets

On Thu, Jan 16, 2020 at 08:42:31PM +0800, Jason Wang wrote:
> This patch implements a software vDPA networking device. The datapath
> is implemented through vringh and workqueue. The device has an on-chip
> IOMMU which translates IOVA to PA. For kernel virtio drivers, vDPA
> simulator driver provides dma_ops. For vhost drivers, the set_map() method
> of vdpa_config_ops is implemented to accept mappings from vhost.
> 
> A sysfs based management interface is implemented, devices are
> created and removed through:
> 
> /sys/devices/virtual/vdpa_simulator/netdev/{create|remove}

This is very gross, creating a class just to get a create/remove and
then not using the class for anything else? Yuk.

> Netlink based lifecycle management could be implemented for vDPA
> simulator as well.

This is just begging for a netlink based approach.

Certainly netlink driven removal should be an agreeable standard for
all devices, I think.

> +struct vdpasim_virtqueue {
> +	struct vringh vring;
> +	struct vringh_kiov iov;
> +	unsigned short head;
> +	bool ready;
> +	u64 desc_addr;
> +	u64 device_addr;
> +	u64 driver_addr;
> +	u32 num;
> +	void *private;
> +	irqreturn_t (*cb)(void *data);
> +};
> +
> +#define VDPASIM_QUEUE_ALIGN PAGE_SIZE
> +#define VDPASIM_QUEUE_MAX 256
> +#define VDPASIM_DEVICE_ID 0x1
> +#define VDPASIM_VENDOR_ID 0
> +#define VDPASIM_VQ_NUM 0x2
> +#define VDPASIM_CLASS_NAME "vdpa_simulator"
> +#define VDPASIM_NAME "netdev"
> +
> +u64 vdpasim_features = (1ULL << VIRTIO_F_ANY_LAYOUT) |
> +		       (1ULL << VIRTIO_F_VERSION_1)  |
> +		       (1ULL << VIRTIO_F_IOMMU_PLATFORM);

Is not using static here intentional?

> +static void vdpasim_release_dev(struct device *_d)
> +{
> +	struct vdpa_device *vdpa = dev_to_vdpa(_d);
> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +
> +	sysfs_remove_link(vdpasim_dev->devices_kobj, vdpasim->name);
> +
> +	mutex_lock(&vsim_list_lock);
> +	list_del(&vdpasim->next);
> +	mutex_unlock(&vsim_list_lock);
> +
> +	kfree(vdpasim->buffer);
> +	kfree(vdpasim);
> +}

It is again a bit weird to see a release function in a driver. This
stuff is usually in the remove function.

> +static int vdpasim_create(const guid_t *uuid)
> +{
> +	struct vdpasim *vdpasim, *tmp;
> +	struct virtio_net_config *config;
> +	struct vdpa_device *vdpa;
> +	struct device *dev;
> +	int ret = -ENOMEM;
> +
> +	mutex_lock(&vsim_list_lock);
> +	list_for_each_entry(tmp, &vsim_devices_list, next) {
> +		if (guid_equal(&tmp->uuid, uuid)) {
> +			mutex_unlock(&vsim_list_lock);
> +			return -EEXIST;
> +		}
> +	}
> +
> +	vdpasim = kzalloc(sizeof(*vdpasim), GFP_KERNEL);
> +	if (!vdpasim)
> +		goto err_vdpa_alloc;
> +
> +	vdpasim->buffer = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +	if (!vdpasim->buffer)
> +		goto err_buffer_alloc;
> +
> +	vdpasim->iommu = vhost_iotlb_alloc(2048, 0);
> +	if (!vdpasim->iommu)
> +		goto err_iotlb;
> +
> +	config = &vdpasim->config;
> +	config->mtu = 1500;
> +	config->status = VIRTIO_NET_S_LINK_UP;
> +	eth_random_addr(config->mac);
> +
> +	INIT_WORK(&vdpasim->work, vdpasim_work);
> +	spin_lock_init(&vdpasim->lock);
> +
> +	guid_copy(&vdpasim->uuid, uuid);
> +
> +	list_add(&vdpasim->next, &vsim_devices_list);
> +	vdpa = &vdpasim->vdpa;
> +
> +	mutex_unlock(&vsim_list_lock);
> +
> +	vdpa = &vdpasim->vdpa;
> +	vdpa->config = &vdpasim_net_config_ops;
> +	vdpa_set_parent(vdpa, &vdpasim_dev->dev);
> +	vdpa->dev.release = vdpasim_release_dev;
> +
> +	vringh_set_iotlb(&vdpasim->vqs[0].vring, vdpasim->iommu);
> +	vringh_set_iotlb(&vdpasim->vqs[1].vring, vdpasim->iommu);
> +
> +	dev = &vdpa->dev;
> +	dev->coherent_dma_mask = DMA_BIT_MASK(64);
> +	set_dma_ops(dev, &vdpasim_dma_ops);
> +
> +	ret = register_vdpa_device(vdpa);
> +	if (ret)
> +		goto err_register;
> +
> +	sprintf(vdpasim->name, "%pU", uuid);
> +
> +	ret = sysfs_create_link(vdpasim_dev->devices_kobj, &vdpa->dev.kobj,
> +				vdpasim->name);
> +	if (ret)
> +		goto err_link;

The goto err_link does the wrong unwind: once register has completed,
the error unwind is unregister & put_device, not kfree. This is why I
recommend always initializing the device early, and always using
put_device during error unwinds.

This whole guid thing seems unnecessary when the device is immediately
assigned a vdpa index from the ida. If you were not using sysfs you'd
just return that index from the creation netlink.

Jason


* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-16 15:22   ` Jason Gunthorpe
@ 2020-01-17  3:03     ` Jason Wang
  2020-01-17 13:54       ` Jason Gunthorpe
  2020-01-21  8:40       ` Tian, Kevin
  0 siblings, 2 replies; 76+ messages in thread
From: Jason Wang @ 2020-01-17  3:03 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mst, linux-kernel, kvm, virtualization, netdev, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller,
	xiao.w.wang, haotian.wang, lingshan.zhu, eperezma, lulu,
	Parav Pandit, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, Jiri Pirko, Shahaf Shuler, hanand, mhabets


On 2020/1/16 11:22 PM, Jason Gunthorpe wrote:
> On Thu, Jan 16, 2020 at 08:42:29PM +0800, Jason Wang wrote:
>> A vDPA device is a device that uses a datapath which complies with the
>> virtio specifications with a vendor specific control path. vDPA devices
>> can be either physically located on the hardware or emulated by
>> software. vDPA hardware devices are usually implemented through PCIE
>> with the following types:
>>
>> - PF (Physical Function) - A single Physical Function
>> - VF (Virtual Function) - Device that supports single root I/O
>>    virtualization (SR-IOV). Its Virtual Function (VF) represents a
>>    virtualized instance of the device that can be assigned to different
>>    partitions
>> - VDEV (Virtual Device) - With technologies such as Intel Scalable
>>    IOV, a virtual device composed by host OS utilizing one or more
>>    ADIs.
>> - SF (Sub function) - Vendor specific interface to slice the Physical
>>    Function to multiple sub functions that can be assigned to different
>>    partitions as virtual devices.
> I really hope we don't end up with two different ways to spell this
> same thing.


I think you meant ADI vs SF. It looks to me that ADI is limited to the
scope of scalable IOV, but SF is not.


>
>> @@ -0,0 +1,2 @@
>> +# SPDX-License-Identifier: GPL-2.0
>> +obj-$(CONFIG_VDPA) += vdpa.o
>> diff --git a/drivers/virtio/vdpa/vdpa.c b/drivers/virtio/vdpa/vdpa.c
>> new file mode 100644
>> index 000000000000..2b0e4a9f105d
>> +++ b/drivers/virtio/vdpa/vdpa.c
>> @@ -0,0 +1,141 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * vDPA bus.
>> + *
>> + * Copyright (c) 2019, Red Hat. All rights reserved.
>> + *     Author: Jason Wang <jasowang@redhat.com>
> 2020 these days


Will fix.


>
>> + *
>> + */
>> +
>> +#include <linux/module.h>
>> +#include <linux/idr.h>
>> +#include <linux/vdpa.h>
>> +
>> +#define MOD_VERSION  "0.1"
> I think module versions are discouraged these days


Will remove.


>
>> +#define MOD_DESC     "vDPA bus"
>> +#define MOD_AUTHOR   "Jason Wang <jasowang@redhat.com>"
>> +#define MOD_LICENSE  "GPL v2"
>> +
>> +static DEFINE_IDA(vdpa_index_ida);
>> +
>> +struct device *vdpa_get_parent(struct vdpa_device *vdpa)
>> +{
>> +	return vdpa->dev.parent;
>> +}
>> +EXPORT_SYMBOL(vdpa_get_parent);
>> +
>> +void vdpa_set_parent(struct vdpa_device *vdpa, struct device *parent)
>> +{
>> +	vdpa->dev.parent = parent;
>> +}
>> +EXPORT_SYMBOL(vdpa_set_parent);
>> +
>> +struct vdpa_device *dev_to_vdpa(struct device *_dev)
>> +{
>> +	return container_of(_dev, struct vdpa_device, dev);
>> +}
>> +EXPORT_SYMBOL_GPL(dev_to_vdpa);
>> +
>> +struct device *vdpa_to_dev(struct vdpa_device *vdpa)
>> +{
>> +	return &vdpa->dev;
>> +}
>> +EXPORT_SYMBOL_GPL(vdpa_to_dev);
> Why these trivial accessors? Seems unnecessary, or should at least be
> static inlines in a header


Will fix.


>
>> +int register_vdpa_device(struct vdpa_device *vdpa)
>> +{
> Usually we want to see symbols consistently prefixed with vdpa_*, is
> there a reason why register/unregister are swapped?


I followed the naming from virtio. I will switch to vdpa_*.


>
>> +	int err;
>> +
>> +	if (!vdpa_get_parent(vdpa))
>> +		return -EINVAL;
>> +
>> +	if (!vdpa->config)
>> +		return -EINVAL;
>> +
>> +	err = ida_simple_get(&vdpa_index_ida, 0, 0, GFP_KERNEL);
>> +	if (err < 0)
>> +		return -EFAULT;
>> +
>> +	vdpa->dev.bus = &vdpa_bus;
>> +	device_initialize(&vdpa->dev);
> IMHO device_initialize should not be called inside something called
> register, too often we find out that the caller drivers need the device
> to be initialized earlier, i.e. to use the kref, or something.
>
> I find the best flow is to have some init function that does the
> device_initialize and sets the device_name that the driver can call
> early.


Ok, will do.


>
> Shouldn't there be a device/driver matching process of some kind?


The question is what we want to match here.

1) "virtio" vs "vhost": I implemented a matching method for this in the mdev
series, but it looks unnecessary for a vDPA device driver to know about
this. Anyway, we can use sysfs driver bind/unbind to switch drivers.
2) virtio device id and vendor id: I'm not sure we need this, considering
that the two drivers so far (virtio/vhost) are both bus drivers.

Thanks


>
> Jason
>



* Re: [PATCH 4/5] virtio: introduce a vDPA based transport
  2020-01-16 12:42 ` [PATCH 4/5] virtio: introduce a vDPA based transport Jason Wang
  2020-01-16 15:38   ` Jason Gunthorpe
@ 2020-01-17  4:10   ` Randy Dunlap
  1 sibling, 0 replies; 76+ messages in thread
From: Randy Dunlap @ 2020-01-17  4:10 UTC (permalink / raw)
  To: Jason Wang, mst, linux-kernel, kvm, virtualization, netdev
  Cc: tiwei.bie, jgg, maxime.coquelin, cunming.liang, zhihong.wang,
	rob.miller, xiao.w.wang, haotian.wang, lingshan.zhu, eperezma,
	lulu, parav, kevin.tian, stefanha, hch, aadam, jakub.kicinski,
	jiri, shahafs, hanand, mhabets

Hi,

On 1/16/20 4:42 AM, Jason Wang wrote:
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 9c4fdb64d9ac..b4276999d17d 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -43,6 +43,19 @@ config VIRTIO_PCI_LEGACY
>  
>  	  If unsure, say Y.
>  
> +config VIRTIO_VDPA
> +	tristate "vDPA driver for virtio devices"
> +	depends on VDPA && VIRTIO
> +	default n
> +	help
> +	  This driver provides support for virtio based paravirtual

	                                   virtio-based

> +	  device driver over vDPA bus. For this to be useful, you need
> +	  an appropriate vDPA device implementation that operates on a
> +          physical device to allow the datapath of virtio to be

use tab + 2 spaces above for indentation, not lots of spaces.

> +	  offloaded to hardware.
> +
> +	  If unsure, say M.
> +
>  config VIRTIO_PMEM
>  	tristate "Support for virtio pmem driver"
>  	depends on VIRTIO


-- 
~Randy



* Re: [PATCH 5/5] vdpasim: vDPA device simulator
  2020-01-16 12:42 ` [PATCH 5/5] vdpasim: vDPA device simulator Jason Wang
  2020-01-16 15:47   ` Jason Gunthorpe
@ 2020-01-17  4:12   ` Randy Dunlap
  2020-01-17  9:35     ` Jason Wang
  2020-01-18 18:18   ` kbuild test robot
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 76+ messages in thread
From: Randy Dunlap @ 2020-01-17  4:12 UTC (permalink / raw)
  To: Jason Wang, mst, linux-kernel, kvm, virtualization, netdev
  Cc: tiwei.bie, jgg, maxime.coquelin, cunming.liang, zhihong.wang,
	rob.miller, xiao.w.wang, haotian.wang, lingshan.zhu, eperezma,
	lulu, parav, kevin.tian, stefanha, hch, aadam, jakub.kicinski,
	jiri, shahafs, hanand, mhabets

On 1/16/20 4:42 AM, Jason Wang wrote:
> diff --git a/drivers/virtio/vdpa/Kconfig b/drivers/virtio/vdpa/Kconfig
> index 3032727b4d98..12ec25d48423 100644
> --- a/drivers/virtio/vdpa/Kconfig
> +++ b/drivers/virtio/vdpa/Kconfig
> @@ -7,3 +7,20 @@ config VDPA
>            datapath which complies with virtio specifications with
>            vendor specific control path.
>  
> +menuconfig VDPA_MENU
> +	bool "VDPA drivers"
> +	default n
> +
> +if VDPA_MENU
> +
> +config VDPA_SIM
> +	tristate "vDPA device simulator"
> +        select VDPA
> +        default n
> +        help
> +          vDPA networking device simulator which loop TX traffic back

	                                            loops

> +          to RX. This device is used for testing, prototyping and
> +          development of vDPA.
> +
> +endif # VDPA_MENU

Most lines above use spaces for indentation, while they should use
tab + 2 spaces.

-- 
~Randy



* Re: [PATCH 1/5] vhost: factor out IOTLB
  2020-01-16 12:42 ` [PATCH 1/5] vhost: factor out IOTLB Jason Wang
@ 2020-01-17  4:14   ` Randy Dunlap
  2020-01-17  9:34     ` Jason Wang
  2020-01-18  0:01   ` kbuild test robot
  2020-01-18  0:40   ` kbuild test robot
  2 siblings, 1 reply; 76+ messages in thread
From: Randy Dunlap @ 2020-01-17  4:14 UTC (permalink / raw)
  To: Jason Wang, mst, linux-kernel, kvm, virtualization, netdev
  Cc: tiwei.bie, jgg, maxime.coquelin, cunming.liang, zhihong.wang,
	rob.miller, xiao.w.wang, haotian.wang, lingshan.zhu, eperezma,
	lulu, parav, kevin.tian, stefanha, hch, aadam, jakub.kicinski,
	jiri, shahafs, hanand, mhabets

On 1/16/20 4:42 AM, Jason Wang wrote:
> diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
> index 3d03ccbd1adc..f21c45aa5e07 100644
> --- a/drivers/vhost/Kconfig
> +++ b/drivers/vhost/Kconfig
> @@ -36,6 +36,7 @@ config VHOST_VSOCK
>  
>  config VHOST
>  	tristate
> +        depends on VHOST_IOTLB
>  	---help---
>  	  This option is selected by any driver which needs to access
>  	  the core of vhost.
> @@ -54,3 +55,9 @@ config VHOST_CROSS_ENDIAN_LEGACY
>  	  adds some overhead, it is disabled by default.
>  
>  	  If unsure, say "N".
> +
> +config VHOST_IOTLB
> +	tristate
> +        default m
> +        help
> +          Generic IOTLB implementation for vhost and vringh.

Use tab + 2 spaces for Kconfig indentation.

-- 
~Randy
Reported-by: Randy Dunlap <rdunlap@infradead.org>


* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-16 12:42 ` [PATCH 3/5] vDPA: introduce vDPA bus Jason Wang
  2020-01-16 15:22   ` Jason Gunthorpe
@ 2020-01-17  4:16   ` Randy Dunlap
  2020-01-17  9:34     ` Jason Wang
  2020-01-17 12:13   ` Michael S. Tsirkin
  2 siblings, 1 reply; 76+ messages in thread
From: Randy Dunlap @ 2020-01-17  4:16 UTC (permalink / raw)
  To: Jason Wang, mst, linux-kernel, kvm, virtualization, netdev
  Cc: tiwei.bie, jgg, maxime.coquelin, cunming.liang, zhihong.wang,
	rob.miller, xiao.w.wang, haotian.wang, lingshan.zhu, eperezma,
	lulu, parav, kevin.tian, stefanha, hch, aadam, jakub.kicinski,
	jiri, shahafs, hanand, mhabets

On 1/16/20 4:42 AM, Jason Wang wrote:
> diff --git a/drivers/virtio/vdpa/Kconfig b/drivers/virtio/vdpa/Kconfig
> new file mode 100644
> index 000000000000..3032727b4d98
> --- /dev/null
> +++ b/drivers/virtio/vdpa/Kconfig
> @@ -0,0 +1,9 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config VDPA
> +	tristate
> +        default n
> +        help
> +          Enable this module to support vDPA device that uses a

	                                        devices

> +          datapath which complies with virtio specifications with
> +          vendor specific control path.
> +

Use tab + 2 spaces for Kconfig indentation.

-- 
~Randy


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 4/5] virtio: introduce a vDPA based transport
  2020-01-16 15:38   ` Jason Gunthorpe
@ 2020-01-17  9:32     ` Jason Wang
  2020-01-17 14:00       ` Jason Gunthorpe
  0 siblings, 1 reply; 76+ messages in thread
From: Jason Wang @ 2020-01-17  9:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mst, linux-kernel, kvm, virtualization, netdev, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller,
	xiao.w.wang, haotian.wang, lingshan.zhu, eperezma, lulu,
	Parav Pandit, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, Jiri Pirko, Shahaf Shuler, hanand, mhabets


On 2020/1/16 11:38 PM, Jason Gunthorpe wrote:
> On Thu, Jan 16, 2020 at 08:42:30PM +0800, Jason Wang wrote:
>> diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
>> new file mode 100644
>> index 000000000000..86936e5e7ec3
>> +++ b/drivers/virtio/virtio_vdpa.c
>> @@ -0,0 +1,400 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * VIRTIO based driver for vDPA device
>> + *
>> + * Copyright (c) 2020, Red Hat. All rights reserved.
>> + *     Author: Jason Wang <jasowang@redhat.com>
>> + *
>> + */
>> +
>> +#include <linux/init.h>
>> +#include <linux/module.h>
>> +#include <linux/device.h>
>> +#include <linux/kernel.h>
>> +#include <linux/slab.h>
>> +#include <linux/uuid.h>
>> +#include <linux/virtio.h>
>> +#include <linux/vdpa.h>
>> +#include <linux/virtio_config.h>
>> +#include <linux/virtio_ring.h>
>> +
>> +#define MOD_VERSION  "0.1"
>> +#define MOD_AUTHOR   "Jason Wang <jasowang@redhat.com>"
>> +#define MOD_DESC     "vDPA bus driver for virtio devices"
>> +#define MOD_LICENSE  "GPL v2"
>> +
>> +#define to_virtio_vdpa_device(dev) \
>> +	container_of(dev, struct virtio_vdpa_device, vdev)
> Should be a static function


Ok.


>
>> +struct virtio_vdpa_device {
>> +	struct virtio_device vdev;
>> +	struct vdpa_device *vdpa;
>> +	u64 features;
>> +
>> +	/* The lock to protect virtqueue list */
>> +	spinlock_t lock;
>> +	/* List of virtio_vdpa_vq_info */
>> +	struct list_head virtqueues;
>> +};
>> +
>> +struct virtio_vdpa_vq_info {
>> +	/* the actual virtqueue */
>> +	struct virtqueue *vq;
>> +
>> +	/* the list node for the virtqueues list */
>> +	struct list_head node;
>> +};
>> +
>> +static struct vdpa_device *vd_get_vdpa(struct virtio_device *vdev)
>> +{
>> +	struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vdev);
>> +	struct vdpa_device *vdpa = vd_dev->vdpa;
>> +
>> +	return vdpa;
> Bit of a long way to say
>
>    return to_virtio_vdpa_device(vdev)->vdpa
>
> ?


Right.
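
The two review points above (a typed static function instead of a macro,
and the shorter accessor) can be illustrated with a minimal userspace
sketch. The container_of() definition and the two stub structs below are
simplified stand-ins for the kernel versions and are assumptions made for
illustration; this is not necessarily what a later revision of the patch
looks like.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins; the real structs carry many more fields. */
struct virtio_device { int id; };
struct vdpa_device { int index; };

struct virtio_vdpa_device {
	struct virtio_device vdev;
	struct vdpa_device *vdpa;
};

/* A static inline function type-checks its argument, unlike the macro. */
static inline struct virtio_vdpa_device *
to_virtio_vdpa_device(struct virtio_device *dev)
{
	return container_of(dev, struct virtio_vdpa_device, vdev);
}

/* The accessor collapses to a single expression. */
static struct vdpa_device *vd_get_vdpa(struct virtio_device *vdev)
{
	return to_virtio_vdpa_device(vdev)->vdpa;
}
```

With the function form, passing anything other than a struct
virtio_device pointer is diagnosed by the compiler rather than being
silently accepted by the macro.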


>
>> +err_vq:
>> +	vring_del_virtqueue(vq);
>> +error_new_virtqueue:
>> +	ops->set_vq_ready(vdpa, index, 0);
>> +	WARN_ON(ops->get_vq_ready(vdpa, index));
> A warn_on during error unwind? Sketchy, deserves a comment I think


Yes, it's a hint of a bug in the vDPA driver. Will add a comment.


>
>> +static void virtio_vdpa_release_dev(struct device *_d)
>> +{
>> +	struct virtio_device *vdev =
>> +	       container_of(_d, struct virtio_device, dev);
>> +	struct virtio_vdpa_device *vd_dev =
>> +	       container_of(vdev, struct virtio_vdpa_device, vdev);
>> +	struct vdpa_device *vdpa = vd_dev->vdpa;
>> +
>> +	devm_kfree(&vdpa->dev, vd_dev);
>> +}
> It is unusual for the release function to not be owned by the
> subsystem, through the class.


This is how virtio_pci and virtio_mmio work now. Virtio devices may have 
different transports which require different release functions. I think 
this is the reason why virtio


> I'm not sure there are enough module ref
> counts to ensure that this function is not unloaded?


Let me double check this.


>
> Usually to make this all work sanely the subsystem provides some
> allocation function
>
>   vdpa_dev = vdpa_alloc_dev(parent, ops, sizeof(struct virtio_vdpa_device))
>   struct virtio_vdpa_device *priv = vdpa_priv(vdpa_dev)
>
> Then the subsystem naturally owns all the memory.
>
> Otherwise it gets tricky to ensure that the module doesn't unload
> before all the krefs are put.


I see.
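
The allocation scheme suggested above could look roughly like the
following userspace sketch. vdpa_alloc_dev() and vdpa_priv() are the
names used in the suggestion; their exact signatures, the void * parent,
the vdpa_release_dev() helper, and the use of calloc()/free() in place of
the kernel allocator and the struct device release callback are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct vdpa_config_ops;	/* opaque in this sketch */

struct vdpa_device {
	const struct vdpa_config_ops *config;
	void *parent;		/* stand-in for struct device *parent */
	/* Driver-private data lives in the same allocation. */
	max_align_t priv[];
};

/* The subsystem allocates the device and the driver's private area. */
static struct vdpa_device *vdpa_alloc_dev(void *parent,
					  const struct vdpa_config_ops *ops,
					  size_t priv_size)
{
	struct vdpa_device *vdpa = calloc(1, sizeof(*vdpa) + priv_size);

	if (!vdpa)
		return NULL;
	vdpa->parent = parent;
	vdpa->config = ops;
	return vdpa;
}

/* Drivers reach their private data through the subsystem helper. */
static void *vdpa_priv(struct vdpa_device *vdpa)
{
	return vdpa->priv;
}

/* One free releases both; in the kernel this would run from the
 * subsystem-owned release callback once the last reference is put. */
static void vdpa_release_dev(struct vdpa_device *vdpa)
{
	free(vdpa);
}
```

Because the subsystem owns the single allocation, the code that frees it
also lives in the subsystem, so a driver module can go away without
leaving a dangling release function behind.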


>
>> +
>> +static int virtio_vdpa_probe(struct device *dev)
>> +{
>> +	struct vdpa_device *vdpa = dev_to_vdpa(dev);
> The probe function for a class should accept the classes type already,
> no casting.


Right.


>
>> +	const struct vdpa_config_ops *ops = vdpa->config;
>> +	struct virtio_vdpa_device *vd_dev;
>> +	int rc;
>> +
>> +	vd_dev = devm_kzalloc(dev, sizeof(*vd_dev), GFP_KERNEL);
>> +	if (!vd_dev)
>> +		return -ENOMEM;
> This is not right, the struct device lifetime is controlled by a kref,
> not via devm. If you want to use a devm unwind then the unwind is
> put_device, not devm_kfree.


I'm not sure I get the point here. The lifetime is bound to the underlying
vDPA device, and devres allows it to be freed before the vDPA device is
released. But I agree that using devres of the underlying vDPA device looks weird.


>
> In this simple situation I don't see a reason to use devm.
>
>> +	vd_dev->vdev.dev.parent = &vdpa->dev;
>> +	vd_dev->vdev.dev.release = virtio_vdpa_release_dev;
>> +	vd_dev->vdev.config = &virtio_vdpa_config_ops;
>> +	vd_dev->vdpa = vdpa;
>> +	INIT_LIST_HEAD(&vd_dev->virtqueues);
>> +	spin_lock_init(&vd_dev->lock);
>> +
>> +	vd_dev->vdev.id.device = ops->get_device_id(vdpa);
>> +	if (vd_dev->vdev.id.device == 0)
>> +		return -ENODEV;
>> +
>> +	vd_dev->vdev.id.vendor = ops->get_vendor_id(vdpa);
>> +	rc = register_virtio_device(&vd_dev->vdev);
>> +	if (rc)
>> +		put_device(dev);
> And an ugly unwind like this is why you want to have device_initialize()
> exposed to the driver,


In this context, which "driver" did you mean here? (Note, virtio-vdpa is 
the driver for vDPA bus here).


>   so there is a clear pairing that calling
> device_initialize() must be followed by put_device. This should also
> use the goto unwind style
>
>> +	else
>> +		dev_set_drvdata(dev, vd_dev);
>> +
>> +	return rc;
>> +}
>> +
>> +static void virtio_vdpa_remove(struct device *dev)
>> +{
> Remove should also already accept the right type


Yes.
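
The point about probe and remove taking the bus's own type can be
sketched in userspace as follows. The stub struct device and struct
device_driver, the demo_probe() driver callback, and the
last_probed_index variable are hypothetical stand-ins; only the idea
(the bus core converts once, drivers receive a struct vdpa_device *)
comes from the review.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal stand-ins for the driver-core types. */
struct device_driver { const char *name; };
struct device { struct device_driver *driver; };

struct vdpa_device {
	struct device dev;
	int index;
};

/* Driver callbacks take the bus type directly, no casting needed. */
struct vdpa_driver {
	struct device_driver drv;
	int (*probe)(struct vdpa_device *vdpa);
	void (*remove)(struct vdpa_device *vdpa);
};

/* The bus core does the generic-to-typed conversion exactly once. */
static int vdpa_dev_probe(struct device *d)
{
	struct vdpa_device *vdpa = container_of(d, struct vdpa_device, dev);
	struct vdpa_driver *drv;

	if (!d->driver)
		return 0;
	drv = container_of(d->driver, struct vdpa_driver, drv);
	return drv->probe ? drv->probe(vdpa) : 0;
}

/* Hypothetical driver: records which device it was asked to probe. */
static int last_probed_index = -1;

static int demo_probe(struct vdpa_device *vdpa)
{
	last_probed_index = vdpa->index;
	return 0;
}
```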


>
>> +	struct virtio_vdpa_device *vd_dev = dev_get_drvdata(dev);
>> +
>> +	unregister_virtio_device(&vd_dev->vdev);
>> +}
>> +
>> +static struct vdpa_driver virtio_vdpa_driver = {
>> +	.drv = {
>> +		.name	= "virtio_vdpa",
>> +	},
>> +	.probe	= virtio_vdpa_probe,
>> +	.remove = virtio_vdpa_remove,
>> +};
> Still a little unclear on binding, is this supposed to bind to all
> vdpa devices?


Yes, it is expected to drive all vDPA devices.


>
> Where are the various THIS_MODULE's I expect to see in a scheme like
> this?
>
> All function pointers must be protected by a held module reference
> count, ie the above probe/remove and all the pointers in ops.


Will double check, since I don't see this in other virtio transport 
drivers (PCI or MMIO).


>
>> +static int __init virtio_vdpa_init(void)
>> +{
>> +	return register_vdpa_driver(&virtio_vdpa_driver);
>> +}
>> +
>> +static void __exit virtio_vdpa_exit(void)
>> +{
>> +	unregister_vdpa_driver(&virtio_vdpa_driver);
>> +}
>> +
>> +module_init(virtio_vdpa_init)
>> +module_exit(virtio_vdpa_exit)
> Best to provide the usual 'module_pci_driver' like scheme for this
> boiler plate.


Ok.


>
>> +MODULE_VERSION(MOD_VERSION);
>> +MODULE_LICENSE(MOD_LICENSE);
>> +MODULE_AUTHOR(MOD_AUTHOR);
>> +MODULE_DESCRIPTION(MOD_DESC);
> Why the indirection with 2nd defines?


Will fix.

Thanks


>
> Jason
>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 5/5] vdpasim: vDPA device simulator
  2020-01-16 15:47   ` Jason Gunthorpe
@ 2020-01-17  9:32     ` Jason Wang
  2020-01-17 14:10       ` Jason Gunthorpe
  0 siblings, 1 reply; 76+ messages in thread
From: Jason Wang @ 2020-01-17  9:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mst, linux-kernel, kvm, virtualization, netdev, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller,
	xiao.w.wang, haotian.wang, lingshan.zhu, eperezma, lulu,
	Parav Pandit, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, Jiri Pirko, Shahaf Shuler, hanand, mhabets, kuba


On 2020/1/16 11:47 PM, Jason Gunthorpe wrote:
> On Thu, Jan 16, 2020 at 08:42:31PM +0800, Jason Wang wrote:
>> This patch implements a software vDPA networking device. The datapath
>> is implemented through vringh and workqueue. The device has an on-chip
>> IOMMU which translates IOVA to PA. For kernel virtio drivers, vDPA
>> simulator driver provides dma_ops. For vhost drivers, the set_map() method
>> of vdpa_config_ops is implemented to accept mappings from vhost.
>>
>> A sysfs based management interface is implemented, devices are
>> created and removed through:
>>
>> /sys/devices/virtual/vdpa_simulator/netdev/{create|remove}
> This is very gross, creating a class just to get a create/remove and
> then not using the class for anything else? Yuk.


It includes more information, e.g. the devices and the link between the
vdpa_sim device and the vdpa device.


>
>> Netlink based lifecycle management could be implemented for vDPA
>> simulator as well.
> This is just begging for a netlink based approach.
>
> Certainly netlink driven removal should be an agreeable standard for
> all devices, I think.


Well, I think Parav had some proposals during the discussion of the mdev
approach. But I'm not sure if he had any RFC code for me to integrate
into vdpasim.

Or do you want me to propose the netlink API? If yes, would you prefer
a new virtio-dedicated one, or one that is a subset of devlink?

But it might be better to reach an agreement for all the vendors here.

Rob, Steve, Tiwei, Lingshan, Harpreet, Martin, Jakub, please share your 
thoughts about the management API here.


>
>> +struct vdpasim_virtqueue {
>> +	struct vringh vring;
>> +	struct vringh_kiov iov;
>> +	unsigned short head;
>> +	bool ready;
>> +	u64 desc_addr;
>> +	u64 device_addr;
>> +	u64 driver_addr;
>> +	u32 num;
>> +	void *private;
>> +	irqreturn_t (*cb)(void *data);
>> +};
>> +
>> +#define VDPASIM_QUEUE_ALIGN PAGE_SIZE
>> +#define VDPASIM_QUEUE_MAX 256
>> +#define VDPASIM_DEVICE_ID 0x1
>> +#define VDPASIM_VENDOR_ID 0
>> +#define VDPASIM_VQ_NUM 0x2
>> +#define VDPASIM_CLASS_NAME "vdpa_simulator"
>> +#define VDPASIM_NAME "netdev"
>> +
>> +u64 vdpasim_features = (1ULL << VIRTIO_F_ANY_LAYOUT) |
>> +		       (1ULL << VIRTIO_F_VERSION_1)  |
>> +		       (1ULL << VIRTIO_F_IOMMU_PLATFORM);
> Is not using static here intentional?


No, let me fix.


>
>> +static void vdpasim_release_dev(struct device *_d)
>> +{
>> +	struct vdpa_device *vdpa = dev_to_vdpa(_d);
>> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
>> +
>> +	sysfs_remove_link(vdpasim_dev->devices_kobj, vdpasim->name);
>> +
>> +	mutex_lock(&vsim_list_lock);
>> +	list_del(&vdpasim->next);
>> +	mutex_unlock(&vsim_list_lock);
>> +
>> +	kfree(vdpasim->buffer);
>> +	kfree(vdpasim);
>> +}
> It is again a bit weird to see a release function in a driver. This
> stuff is usually in the remove function.


Will fix.


>
>> +static int vdpasim_create(const guid_t *uuid)
>> +{
>> +	struct vdpasim *vdpasim, *tmp;
>> +	struct virtio_net_config *config;
>> +	struct vdpa_device *vdpa;
>> +	struct device *dev;
>> +	int ret = -ENOMEM;
>> +
>> +	mutex_lock(&vsim_list_lock);
>> +	list_for_each_entry(tmp, &vsim_devices_list, next) {
>> +		if (guid_equal(&tmp->uuid, uuid)) {
>> +			mutex_unlock(&vsim_list_lock);
>> +			return -EEXIST;
>> +		}
>> +	}
>> +
>> +	vdpasim = kzalloc(sizeof(*vdpasim), GFP_KERNEL);
>> +	if (!vdpasim)
>> +		goto err_vdpa_alloc;
>> +
>> +	vdpasim->buffer = kmalloc(PAGE_SIZE, GFP_KERNEL);
>> +	if (!vdpasim->buffer)
>> +		goto err_buffer_alloc;
>> +
>> +	vdpasim->iommu = vhost_iotlb_alloc(2048, 0);
>> +	if (!vdpasim->iommu)
>> +		goto err_iotlb;
>> +
>> +	config = &vdpasim->config;
>> +	config->mtu = 1500;
>> +	config->status = VIRTIO_NET_S_LINK_UP;
>> +	eth_random_addr(config->mac);
>> +
>> +	INIT_WORK(&vdpasim->work, vdpasim_work);
>> +	spin_lock_init(&vdpasim->lock);
>> +
>> +	guid_copy(&vdpasim->uuid, uuid);
>> +
>> +	list_add(&vdpasim->next, &vsim_devices_list);
>> +	vdpa = &vdpasim->vdpa;
>> +
>> +	mutex_unlock(&vsim_list_lock);
>> +
>> +	vdpa = &vdpasim->vdpa;
>> +	vdpa->config = &vdpasim_net_config_ops;
>> +	vdpa_set_parent(vdpa, &vdpasim_dev->dev);
>> +	vdpa->dev.release = vdpasim_release_dev;
>> +
>> +	vringh_set_iotlb(&vdpasim->vqs[0].vring, vdpasim->iommu);
>> +	vringh_set_iotlb(&vdpasim->vqs[1].vring, vdpasim->iommu);
>> +
>> +	dev = &vdpa->dev;
>> +	dev->coherent_dma_mask = DMA_BIT_MASK(64);
>> +	set_dma_ops(dev, &vdpasim_dma_ops);
>> +
>> +	ret = register_vdpa_device(vdpa);
>> +	if (ret)
>> +		goto err_register;
>> +
>> +	sprintf(vdpasim->name, "%pU", uuid);
>> +
>> +	ret = sysfs_create_link(vdpasim_dev->devices_kobj, &vdpa->dev.kobj,
>> +				vdpasim->name);
>> +	if (ret)
>> +		goto err_link;
> The goto err_link does the wrong unwind, once register is completed
> the error unwind is unregister & put_device, not kfree. This is why I
> recommend always initializing the device early, and always using
> put_device during error unwinds.


Will fix.
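
The "initialize early, then always unwind with put_device" pattern
recommended above can be modelled in userspace as in the sketch below.
Everything here (struct sim_dev, sim_dev_initialize(), sim_put_device(),
the fail_register switch and the released counter) is hypothetical and
only mimics the reference-counting behaviour of device_initialize() and
put_device(); it is not the vdpasim code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct sim_dev {
	int refcount;
	void *buffer;
	void (*release)(struct sim_dev *dev);
};

static int released;	/* counts release calls, for demonstration */

/* Single place that frees everything the device owns. */
static void sim_release(struct sim_dev *dev)
{
	free(dev->buffer);	/* free(NULL) is a no-op */
	free(dev);
	released++;
}

/* Models device_initialize(): from here on, only put may free. */
static void sim_dev_initialize(struct sim_dev *dev)
{
	dev->refcount = 1;
	dev->release = sim_release;
}

/* Models put_device(): drop a reference, release on the last one. */
static void sim_put_device(struct sim_dev *dev)
{
	if (--dev->refcount == 0)
		dev->release(dev);
}

static int sim_create(int fail_register, struct sim_dev **out)
{
	struct sim_dev *dev;
	int ret;

	dev = calloc(1, sizeof(*dev));
	if (!dev)
		return -ENOMEM;
	sim_dev_initialize(dev);	/* initialize before anything can fail */

	dev->buffer = malloc(4096);
	if (!dev->buffer) {
		ret = -ENOMEM;
		goto err_put;
	}

	if (fail_register) {		/* stands in for a failed registration */
		ret = -EINVAL;
		goto err_put;
	}

	*out = dev;
	return 0;

err_put:
	sim_put_device(dev);		/* one unwind path, never a bare free */
	return ret;
}
```

Every failure after initialization funnels through the same label, so
there is no point at which the unwind has to choose between freeing
memory directly and dropping a reference.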


>
> This whole guid thing seems unnecessary when the device is immediately
> assigned a vdpa index from the ida.


The problem here is that the user needs to know which vdpa_sim is the one
that was just created.


> If you were not using sysfs you'd
> just return that index from the creation netlink.


Yes it is.

Thanks


>
> Jason
>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 1/5] vhost: factor out IOTLB
  2020-01-17  4:14   ` Randy Dunlap
@ 2020-01-17  9:34     ` Jason Wang
  0 siblings, 0 replies; 76+ messages in thread
From: Jason Wang @ 2020-01-17  9:34 UTC (permalink / raw)
  To: Randy Dunlap, mst, linux-kernel, kvm, virtualization, netdev
  Cc: tiwei.bie, jgg, maxime.coquelin, cunming.liang, zhihong.wang,
	rob.miller, xiao.w.wang, haotian.wang, lingshan.zhu, eperezma,
	lulu, parav, kevin.tian, stefanha, hch, aadam, jakub.kicinski,
	jiri, shahafs, hanand, mhabets


On 2020/1/17 12:14 PM, Randy Dunlap wrote:
> On 1/16/20 4:42 AM, Jason Wang wrote:
>> diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
>> index 3d03ccbd1adc..f21c45aa5e07 100644
>> --- a/drivers/vhost/Kconfig
>> +++ b/drivers/vhost/Kconfig
>> @@ -36,6 +36,7 @@ config VHOST_VSOCK
>>   
>>   config VHOST
>>   	tristate
>> +        depends on VHOST_IOTLB
>>   	---help---
>>   	  This option is selected by any driver which needs to access
>>   	  the core of vhost.
>> @@ -54,3 +55,9 @@ config VHOST_CROSS_ENDIAN_LEGACY
>>   	  adds some overhead, it is disabled by default.
>>   
>>   	  If unsure, say "N".
>> +
>> +config VHOST_IOTLB
>> +	tristate
>> +        default m
>> +        help
>> +          Generic IOTLB implementation for vhost and vringh.
> Use tab + 2 spaces for Kconfig indentation.


Will fix.

I wonder why checkpatch doesn't complain about this :)

Thanks


>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-17  4:16   ` Randy Dunlap
@ 2020-01-17  9:34     ` Jason Wang
  0 siblings, 0 replies; 76+ messages in thread
From: Jason Wang @ 2020-01-17  9:34 UTC (permalink / raw)
  To: Randy Dunlap, mst, linux-kernel, kvm, virtualization, netdev
  Cc: tiwei.bie, jgg, maxime.coquelin, cunming.liang, zhihong.wang,
	rob.miller, xiao.w.wang, haotian.wang, lingshan.zhu, eperezma,
	lulu, parav, kevin.tian, stefanha, hch, aadam, jakub.kicinski,
	jiri, shahafs, hanand, mhabets


On 2020/1/17 12:16 PM, Randy Dunlap wrote:
> On 1/16/20 4:42 AM, Jason Wang wrote:
>> diff --git a/drivers/virtio/vdpa/Kconfig b/drivers/virtio/vdpa/Kconfig
>> new file mode 100644
>> index 000000000000..3032727b4d98
>> --- /dev/null
>> +++ b/drivers/virtio/vdpa/Kconfig
>> @@ -0,0 +1,9 @@
>> +# SPDX-License-Identifier: GPL-2.0-only
>> +config VDPA
>> +	tristate
>> +        default n
>> +        help
>> +          Enable this module to support vDPA device that uses a
> 	                                        devices
>
>> +          datapath which complies with virtio specifications with
>> +          vendor specific control path.
>> +
> Use tab + 2 spaces for Kconfig indentation.


Will fix.

Thanks

>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 5/5] vdpasim: vDPA device simulator
  2020-01-17  4:12   ` Randy Dunlap
@ 2020-01-17  9:35     ` Jason Wang
  0 siblings, 0 replies; 76+ messages in thread
From: Jason Wang @ 2020-01-17  9:35 UTC (permalink / raw)
  To: Randy Dunlap, mst, linux-kernel, kvm, virtualization, netdev
  Cc: tiwei.bie, jgg, maxime.coquelin, cunming.liang, zhihong.wang,
	rob.miller, xiao.w.wang, haotian.wang, lingshan.zhu, eperezma,
	lulu, parav, kevin.tian, stefanha, hch, aadam, jakub.kicinski,
	jiri, shahafs, hanand, mhabets


On 2020/1/17 12:12 PM, Randy Dunlap wrote:
> On 1/16/20 4:42 AM, Jason Wang wrote:
>> diff --git a/drivers/virtio/vdpa/Kconfig b/drivers/virtio/vdpa/Kconfig
>> index 3032727b4d98..12ec25d48423 100644
>> --- a/drivers/virtio/vdpa/Kconfig
>> +++ b/drivers/virtio/vdpa/Kconfig
>> @@ -7,3 +7,20 @@ config VDPA
>>             datapath which complies with virtio specifications with
>>             vendor specific control path.
>>   
>> +menuconfig VDPA_MENU
>> +	bool "VDPA drivers"
>> +	default n
>> +
>> +if VDPA_MENU
>> +
>> +config VDPA_SIM
>> +	tristate "vDPA device simulator"
>> +        select VDPA
>> +        default n
>> +        help
>> +          vDPA networking device simulator which loop TX traffic back
> 	                                            loops
>
>> +          to RX. This device is used for testing, prototyping and
>> +          development of vDPA.
>> +
>> +endif # VDPA_MENU
> Most lines above use spaces for indentation, while they should use
> tab + 2 spaces.


Right, will fix.

Thanks


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-16 12:42 ` [PATCH 3/5] vDPA: introduce vDPA bus Jason Wang
  2020-01-16 15:22   ` Jason Gunthorpe
  2020-01-17  4:16   ` Randy Dunlap
@ 2020-01-17 12:13   ` Michael S. Tsirkin
  2020-01-17 13:52     ` Jason Wang
  2 siblings, 1 reply; 76+ messages in thread
From: Michael S. Tsirkin @ 2020-01-17 12:13 UTC (permalink / raw)
  To: Jason Wang
  Cc: linux-kernel, kvm, virtualization, netdev, tiwei.bie, jgg,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller,
	xiao.w.wang, haotian.wang, lingshan.zhu, eperezma, lulu, parav,
	kevin.tian, stefanha, rdunlap, hch, aadam, jakub.kicinski, jiri,
	shahafs, hanand, mhabets

On Thu, Jan 16, 2020 at 08:42:29PM +0800, Jason Wang wrote:
> vDPA device is a device that uses a datapath which complies with the
> virtio specifications with vendor specific control path. vDPA devices
> can be both physically located on the hardware or emulated by
> software. vDPA hardware devices are usually implemented through PCIE
> with the following types:
> 
> - PF (Physical Function) - A single Physical Function
> - VF (Virtual Function) - Device that supports single root I/O
>   virtualization (SR-IOV). Its Virtual Function (VF) represents a
>   virtualized instance of the device that can be assigned to different
>   partitions
> - VDEV (Virtual Device) - With technologies such as Intel Scalable
>   IOV, a virtual device composed by host OS utilizing one or more
>   ADIs.
> - SF (Sub function) - Vendor specific interface to slice the Physical
>   Function to multiple sub functions that can be assigned to different
>   partitions as virtual devices.
> 
> From a driver's perspective, depending on how and where the DMA
> translation is done, vDPA devices are split into two types:
> 
> - Platform specific DMA translation - From the driver's perspective,
>   the device can be used on a platform where device access to data in
>   memory is limited and/or translated. An example is a PCIE vDPA whose
>   DMA request was tagged via a bus (e.g PCIE) specific way. DMA
>   translation and protection are done at PCIE bus IOMMU level.
> - Device specific DMA translation - The device implements DMA
>   isolation and protection through its own logic. An example is a vDPA
>   device which uses on-chip IOMMU.
> 
> To hide the differences and complexity of the above types for a vDPA
> device/IOMMU options and in order to present a generic virtio device
> to the upper layer, a device agnostic framework is required.
> 
> This patch introduces a software vDPA bus which abstracts the
> common attributes of vDPA device, vDPA bus driver and the
> communication method (vdpa_config_ops) between the vDPA device
> abstraction and the vDPA bus driver:
> 
> With the abstraction of vDPA bus and vDPA bus operations, the
> difference and complexity of the under layer hardware is hidden from
> upper layer. The vDPA bus drivers on top can use a unified
> vdpa_config_ops to control different types of vDPA device.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  MAINTAINERS                  |   1 +
>  drivers/virtio/Kconfig       |   2 +
>  drivers/virtio/Makefile      |   1 +
>  drivers/virtio/vdpa/Kconfig  |   9 ++
>  drivers/virtio/vdpa/Makefile |   2 +
>  drivers/virtio/vdpa/vdpa.c   | 141 ++++++++++++++++++++++++++
>  include/linux/vdpa.h         | 191 +++++++++++++++++++++++++++++++++++
>  7 files changed, 347 insertions(+)
>  create mode 100644 drivers/virtio/vdpa/Kconfig
>  create mode 100644 drivers/virtio/vdpa/Makefile
>  create mode 100644 drivers/virtio/vdpa/vdpa.c
>  create mode 100644 include/linux/vdpa.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index d4bda9c900fa..578d2a581e3b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -17540,6 +17540,7 @@ F:	tools/virtio/
>  F:	drivers/net/virtio_net.c
>  F:	drivers/block/virtio_blk.c
>  F:	include/linux/virtio*.h
> +F:	include/linux/vdpa.h
>  F:	include/uapi/linux/virtio_*.h
>  F:	drivers/crypto/virtio/
>  F:	mm/balloon_compaction.c
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 078615cf2afc..9c4fdb64d9ac 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -96,3 +96,5 @@ config VIRTIO_MMIO_CMDLINE_DEVICES
>  	 If unsure, say 'N'.
>  
>  endif # VIRTIO_MENU
> +
> +source "drivers/virtio/vdpa/Kconfig"
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 3a2b5c5dcf46..fdf5eacd0d0a 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> +obj-$(CONFIG_VDPA) += vdpa/
> diff --git a/drivers/virtio/vdpa/Kconfig b/drivers/virtio/vdpa/Kconfig
> new file mode 100644
> index 000000000000..3032727b4d98
> --- /dev/null
> +++ b/drivers/virtio/vdpa/Kconfig
> @@ -0,0 +1,9 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config VDPA
> +	tristate
> +        default n
> +        help
> +          Enable this module to support vDPA device that uses a
> +          datapath which complies with virtio specifications with
> +          vendor specific control path.
> +
> diff --git a/drivers/virtio/vdpa/Makefile b/drivers/virtio/vdpa/Makefile
> new file mode 100644
> index 000000000000..ee6a35e8a4fb
> --- /dev/null
> +++ b/drivers/virtio/vdpa/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0
> +obj-$(CONFIG_VDPA) += vdpa.o
> diff --git a/drivers/virtio/vdpa/vdpa.c b/drivers/virtio/vdpa/vdpa.c
> new file mode 100644
> index 000000000000..2b0e4a9f105d
> --- /dev/null
> +++ b/drivers/virtio/vdpa/vdpa.c
> @@ -0,0 +1,141 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * vDPA bus.
> + *
> + * Copyright (c) 2019, Red Hat. All rights reserved.
> + *     Author: Jason Wang <jasowang@redhat.com>
> + *
> + */
> +
> +#include <linux/module.h>
> +#include <linux/idr.h>
> +#include <linux/vdpa.h>
> +
> +#define MOD_VERSION  "0.1"
> +#define MOD_DESC     "vDPA bus"
> +#define MOD_AUTHOR   "Jason Wang <jasowang@redhat.com>"
> +#define MOD_LICENSE  "GPL v2"
> +
> +static DEFINE_IDA(vdpa_index_ida);
> +
> +struct device *vdpa_get_parent(struct vdpa_device *vdpa)
> +{
> +	return vdpa->dev.parent;
> +}
> +EXPORT_SYMBOL(vdpa_get_parent);
> +
> +void vdpa_set_parent(struct vdpa_device *vdpa, struct device *parent)
> +{
> +	vdpa->dev.parent = parent;
> +}
> +EXPORT_SYMBOL(vdpa_set_parent);
> +
> +struct vdpa_device *dev_to_vdpa(struct device *_dev)
> +{
> +	return container_of(_dev, struct vdpa_device, dev);
> +}
> +EXPORT_SYMBOL_GPL(dev_to_vdpa);
> +
> +struct device *vdpa_to_dev(struct vdpa_device *vdpa)
> +{
> +	return &vdpa->dev;
> +}
> +EXPORT_SYMBOL_GPL(vdpa_to_dev);
> +
> +static int vdpa_dev_probe(struct device *d)
> +{
> +	struct vdpa_device *dev = dev_to_vdpa(d);
> +	struct vdpa_driver *drv = drv_to_vdpa(dev->dev.driver);
> +	int ret = 0;
> +
> +	if (drv && drv->probe)
> +		ret = drv->probe(d);
> +
> +	return ret;
> +}
> +
> +static int vdpa_dev_remove(struct device *d)
> +{
> +	struct vdpa_device *dev = dev_to_vdpa(d);
> +	struct vdpa_driver *drv = drv_to_vdpa(dev->dev.driver);
> +
> +	if (drv && drv->remove)
> +		drv->remove(d);
> +
> +	return 0;
> +}
> +
> +static struct bus_type vdpa_bus = {
> +	.name  = "vdpa",
> +	.probe = vdpa_dev_probe,
> +	.remove = vdpa_dev_remove,
> +};
> +
> +int register_vdpa_device(struct vdpa_device *vdpa)
> +{
> +	int err;
> +
> +	if (!vdpa_get_parent(vdpa))
> +		return -EINVAL;
> +
> +	if (!vdpa->config)
> +		return -EINVAL;
> +
> +	err = ida_simple_get(&vdpa_index_ida, 0, 0, GFP_KERNEL);
> +	if (err < 0)
> +		return -EFAULT;
> +
> +	vdpa->dev.bus = &vdpa_bus;
> +	device_initialize(&vdpa->dev);
> +
> +	vdpa->index = err;
> +	dev_set_name(&vdpa->dev, "vdpa%u", vdpa->index);
> +
> +	err = device_add(&vdpa->dev);
> +	if (err)
> +		ida_simple_remove(&vdpa_index_ida, vdpa->index);
> +
> +	return err;
> +}
> +EXPORT_SYMBOL_GPL(register_vdpa_device);
> +
> +void unregister_vdpa_device(struct vdpa_device *vdpa)
> +{
> +	int index = vdpa->index;
> +
> +	device_unregister(&vdpa->dev);
> +	ida_simple_remove(&vdpa_index_ida, index);
> +}
> +EXPORT_SYMBOL_GPL(unregister_vdpa_device);
> +
> +int register_vdpa_driver(struct vdpa_driver *driver)
> +{
> +	driver->drv.bus = &vdpa_bus;
> +	return driver_register(&driver->drv);
> +}
> +EXPORT_SYMBOL_GPL(register_vdpa_driver);
> +
> +void unregister_vdpa_driver(struct vdpa_driver *driver)
> +{
> +	driver_unregister(&driver->drv);
> +}
> +EXPORT_SYMBOL_GPL(unregister_vdpa_driver);
> +
> +static int vdpa_init(void)
> +{
> +	if (bus_register(&vdpa_bus) != 0)
> +		panic("virtio bus registration failed");
> +	return 0;
> +}
> +
> +static void __exit vdpa_exit(void)
> +{
> +	bus_unregister(&vdpa_bus);
> +	ida_destroy(&vdpa_index_ida);
> +}
> +core_initcall(vdpa_init);
> +module_exit(vdpa_exit);
> +
> +MODULE_VERSION(MOD_VERSION);
> +MODULE_AUTHOR(MOD_AUTHOR);
> +MODULE_LICENSE(MOD_LICENSE);
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> new file mode 100644
> index 000000000000..47760137ef66
> --- /dev/null
> +++ b/include/linux/vdpa.h
> @@ -0,0 +1,191 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_VDPA_H
> +#define _LINUX_VDPA_H
> +
> +#include <linux/device.h>
> +#include <linux/interrupt.h>
> +#include <linux/vhost_iotlb.h>
> +
> +/**
> + * vDPA callback definition.
> + * @callback: interrupt callback function
> + * @private: the data passed to the callback function
> + */
> +struct vdpa_callback {
> +	irqreturn_t (*callback)(void *data);
> +	void *private;
> +};
> +
> +/**
> + * vDPA device - representation of a vDPA device
> + * @dev: underlying device
> + * @config: the configuration ops for this device.
> + * @index: device index
> + */
> +struct vdpa_device {
> +	struct device dev;
> +	const struct vdpa_config_ops *config;
> +	int index;
> +};
> +
> +/**
> + * vDPA_config_ops - operations for configuring a vDPA device.
> + * Note: vDPA device drivers are required to implement all of the
> + * operations unless it is optional mentioned in the following list.
> + * @set_vq_address:		Set the address of virtqueue
> + *				@vdev: vdpa device
> + *				@idx: virtqueue index
> + *				@desc_area: address of desc area
> + *				@driver_area: address of driver area
> + *				@device_area: address of device area
> + *				Returns integer: success (0) or error (< 0)
> + * @set_vq_num:			Set the size of virtqueue
> + *				@vdev: vdpa device
> + *				@idx: virtqueue index
> + *				@num: the size of virtqueue
> + * @kick_vq:			Kick the virtqueue
> + *				@vdev: vdpa device
> + *				@idx: virtqueue index


This seems wrong: kicks are data path so drivers should not
do it in a vendor specific way. How about an API
returning the device/resource that can then be
mapped as appropriate?


> + * @set_vq_cb:			Set the interrupt callback function for
> + *				a virtqueue
> + *				@vdev: vdpa device
> + *				@idx: virtqueue index
> + *				@cb: virtio-vdev interrupt callback structure


Calls are data path too, I think we need some way to map MSI?

> + * @set_vq_ready:		Set ready status for a virtqueue
> + *				@vdev: vdpa device
> + *				@idx: virtqueue index
> + *				@ready: ready (true) not ready(false)
> + * @get_vq_ready:		Get ready status for a virtqueue
> + *				@vdev: vdpa device
> + *				@idx: virtqueue index
> + *				Returns boolean: ready (true) or not (false)
> + * @set_vq_state:		Set the state for a virtqueue
> + *				@vdev: vdpa device
> + *				@idx: virtqueue index
> + *				@state: virtqueue state (last_avail_idx)
> + *				Returns integer: success (0) or error (< 0)
> + * @get_vq_state:		Get the state for a virtqueue
> + *				@vdev: vdpa device
> + *				@idx: virtqueue index
> + *				Returns virtqueue state (last_avail_idx)
> + * @get_vq_align:		Get the virtqueue align requirement
> + *				for the device
> + *				@vdev: vdpa device
> + *				Returns virtqueue align requirement


Where does this come from? Spec dictates that for a data path,
vendor specific values for this will break userspace ...

> + * @get_features:		Get virtio features supported by the device
> + *				@vdev: vdpa device
> + *				Returns the virtio features support by the
> + *				device
> + * @set_features:		Set virtio features supported by the driver
> + *				@vdev: vdpa device
> + *				@features: feature support by the driver
> + *				Returns integer: success (0) or error (< 0)
> + * @set_config_cb:		Set the config interrupt callback
> + *				@vdev: vdpa device
> + *				@cb: virtio-vdev interrupt callback structure
> + * @get_vq_num_max:		Get the max size of virtqueue
> + *				@vdev: vdpa device
> + *				Returns u16: max size of virtqueue


I'm not sure this has to be uniform across VQs.

> + * @get_device_id:		Get virtio device id
> + *				@vdev: vdpa device
> + *				Returns u32: virtio device id


Is this the virtio ID? PCI ID?

> + * @get_vendor_id:		Get id for the vendor that provides this device
> + *				@vdev: vdpa device
> + *				Returns u32: virtio vendor id

What's the idea behind this? Userspace normally doesn't interact with
this ... debugging?

> + * @get_status:			Get the device status
> + *				@vdev: vdpa device
> + *				Returns u8: virtio device status
> + * @set_status:			Set the device status
> + *				@vdev: vdpa device
> + *				@status: virtio device status
> + * @get_config:			Read from device specific configuration space
> + *				@vdev: vdpa device
> + *				@offset: offset from the beginning of
> + *				configuration space
> + *				@buf: buffer used to read to
> + *				@len: the length to read from
> + *				configuration space
> + * @set_config:			Write to device specific configuration space
> + *				@vdev: vdpa device
> + *				@offset: offset from the beginning of
> + *				configuration space
> + *				@buf: buffer used to write from
> + *				@len: the length to write to
> + *				configuration space
> + * @get_generation:		Get device config generation (optional)
> + *				@vdev: vdpa device
> + *				Returns u32: device generation
> + * @set_map:			Set device memory mapping, optional
> + *				and only needed for device that using
> + *				device specific DMA translation
> + *				(on-chip IOMMU)
> + *				@vdev: vdpa device
> + *				@iotlb: vhost memory mapping to be
> + *				used by the vDPA
> + *				Returns integer: success (0) or error (< 0)

OK so any change just swaps in a completely new mapping?
Wouldn't this make minor changes such as memory hotplug
quite expensive?

> + */
> +struct vdpa_config_ops {
> +	/* Virtqueue ops */
> +	int (*set_vq_address)(struct vdpa_device *vdev,
> +			      u16 idx, u64 desc_area, u64 driver_area,
> +			      u64 device_area);
> +	void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
> +	void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
> +	void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
> +			  struct vdpa_callback *cb);
> +	void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
> +	bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
> +	int (*set_vq_state)(struct vdpa_device *vdev, u16 idx, u64 state);
> +	u64 (*get_vq_state)(struct vdpa_device *vdev, u16 idx);
> +
> +	/* Device ops */
> +	u16 (*get_vq_align)(struct vdpa_device *vdev);
> +	u64 (*get_features)(struct vdpa_device *vdev);
> +	int (*set_features)(struct vdpa_device *vdev, u64 features);
> +	void (*set_config_cb)(struct vdpa_device *vdev,
> +			      struct vdpa_callback *cb);
> +	u16 (*get_vq_num_max)(struct vdpa_device *vdev);
> +	u32 (*get_device_id)(struct vdpa_device *vdev);
> +	u32 (*get_vendor_id)(struct vdpa_device *vdev);
> +	u8 (*get_status)(struct vdpa_device *vdev);
> +	void (*set_status)(struct vdpa_device *vdev, u8 status);
> +	void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
> +			   void *buf, unsigned int len);
> +	void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
> +			   const void *buf, unsigned int len);
> +	u32 (*get_generation)(struct vdpa_device *vdev);
> +
> +	/* Mem table */
> +	int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
> +};
> +
> +int register_vdpa_device(struct vdpa_device *vdpa);
> +void unregister_vdpa_device(struct vdpa_device *vdpa);
> +
> +struct device *vdpa_get_parent(struct vdpa_device *vdpa);
> +void vdpa_set_parent(struct vdpa_device *vdpa, struct device *parent);
> +
> +struct vdpa_device *dev_to_vdpa(struct device *_dev);
> +struct device *vdpa_to_dev(struct vdpa_device *vdpa);
> +
> +/**
> + * vdpa_driver - operations for a vDPA driver
> + * @driver: underlying device driver
> + * @probe: the function to call when a device is found.  Returns 0 or -errno.
> + * @remove: the function to call when a device is removed.
> + */
> +struct vdpa_driver {
> +	struct device_driver drv;
> +	int (*probe)(struct device *dev);
> +	void (*remove)(struct device *dev);
> +};
> +
> +int register_vdpa_driver(struct vdpa_driver *drv);
> +void unregister_vdpa_driver(struct vdpa_driver *drv);
> +
> +static inline struct vdpa_driver *drv_to_vdpa(struct device_driver *drv)
> +{
> +	return container_of(drv, struct vdpa_driver, drv);
> +}
> +
> +#endif /* _LINUX_VDPA_H */
> -- 
> 2.19.1



* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-17 12:13   ` Michael S. Tsirkin
@ 2020-01-17 13:52     ` Jason Wang
       [not found]       ` <CAJPjb1+fG9L3=iKbV4Vn13VwaeDZZdcfBPvarogF_Nzhk+FnKg@mail.gmail.com>
  0 siblings, 1 reply; 76+ messages in thread
From: Jason Wang @ 2020-01-17 13:52 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, kvm, virtualization, netdev, tiwei.bie, jgg,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller,
	xiao.w.wang, haotian.wang, lingshan.zhu, eperezma, lulu, parav,
	kevin.tian, stefanha, rdunlap, hch, aadam, jakub.kicinski, jiri,
	shahafs, hanand, mhabets


On 2020/1/17 8:13 PM, Michael S. Tsirkin wrote:
> On Thu, Jan 16, 2020 at 08:42:29PM +0800, Jason Wang wrote:
>> vDPA device is a device that uses a datapath which complies with the
>> virtio specifications with vendor specific control path. vDPA devices
>> can be both physically located on the hardware or emulated by
>> software. vDPA hardware devices are usually implemented through PCIE
>> with the following types:
>>
>> - PF (Physical Function) - A single Physical Function
>> - VF (Virtual Function) - Device that supports single root I/O
>>    virtualization (SR-IOV). Its Virtual Function (VF) represents a
>>    virtualized instance of the device that can be assigned to different
>>    partitions
>> - VDEV (Virtual Device) - With technologies such as Intel Scalable
>>    IOV, a virtual device composed by host OS utilizing one or more
>>    ADIs.
>> - SF (Sub function) - Vendor specific interface to slice the Physical
>>    Function to multiple sub functions that can be assigned to different
>>    partitions as virtual devices.
>>
>> From a driver's perspective, depends on how and where the DMA
>> translation is done, vDPA devices are split into two types:
>>
>> - Platform specific DMA translation - From the driver's perspective,
>>    the device can be used on a platform where device access to data in
>>    memory is limited and/or translated. An example is a PCIE vDPA whose
>>    DMA request was tagged via a bus (e.g PCIE) specific way. DMA
>>    translation and protection are done at PCIE bus IOMMU level.
>> - Device specific DMA translation - The device implements DMA
>>    isolation and protection through its own logic. An example is a vDPA
>>    device which uses on-chip IOMMU.
>>
>> To hide the differences and complexity of the above types for a vDPA
>> device/IOMMU options and in order to present a generic virtio device
>> to the upper layer, a device agnostic framework is required.
>>
>> This patch introduces a software vDPA bus which abstracts the
>> common attributes of vDPA device, vDPA bus driver and the
>> communication method (vdpa_config_ops) between the vDPA device
>> abstraction and the vDPA bus driver:
>>
>> With the abstraction of vDPA bus and vDPA bus operations, the
>> difference and complexity of the under layer hardware is hidden from
>> upper layer. The vDPA bus drivers on top can use a unified
>> vdpa_config_ops to control different types of vDPA device.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>   MAINTAINERS                  |   1 +
>>   drivers/virtio/Kconfig       |   2 +
>>   drivers/virtio/Makefile      |   1 +
>>   drivers/virtio/vdpa/Kconfig  |   9 ++
>>   drivers/virtio/vdpa/Makefile |   2 +
>>   drivers/virtio/vdpa/vdpa.c   | 141 ++++++++++++++++++++++++++
>>   include/linux/vdpa.h         | 191 +++++++++++++++++++++++++++++++++++
>>   7 files changed, 347 insertions(+)
>>   create mode 100644 drivers/virtio/vdpa/Kconfig
>>   create mode 100644 drivers/virtio/vdpa/Makefile
>>   create mode 100644 drivers/virtio/vdpa/vdpa.c
>>   create mode 100644 include/linux/vdpa.h
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index d4bda9c900fa..578d2a581e3b 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -17540,6 +17540,7 @@ F:	tools/virtio/
>>   F:	drivers/net/virtio_net.c
>>   F:	drivers/block/virtio_blk.c
>>   F:	include/linux/virtio*.h
>> +F:	include/linux/vdpa.h
>>   F:	include/uapi/linux/virtio_*.h
>>   F:	drivers/crypto/virtio/
>>   F:	mm/balloon_compaction.c
>> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
>> index 078615cf2afc..9c4fdb64d9ac 100644
>> --- a/drivers/virtio/Kconfig
>> +++ b/drivers/virtio/Kconfig
>> @@ -96,3 +96,5 @@ config VIRTIO_MMIO_CMDLINE_DEVICES
>>   	 If unsure, say 'N'.
>>   
>>   endif # VIRTIO_MENU
>> +
>> +source "drivers/virtio/vdpa/Kconfig"
>> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
>> index 3a2b5c5dcf46..fdf5eacd0d0a 100644
>> --- a/drivers/virtio/Makefile
>> +++ b/drivers/virtio/Makefile
>> @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>>   virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>>   obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>>   obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
>> +obj-$(CONFIG_VDPA) += vdpa/
>> diff --git a/drivers/virtio/vdpa/Kconfig b/drivers/virtio/vdpa/Kconfig
>> new file mode 100644
>> index 000000000000..3032727b4d98
>> --- /dev/null
>> +++ b/drivers/virtio/vdpa/Kconfig
>> @@ -0,0 +1,9 @@
>> +# SPDX-License-Identifier: GPL-2.0-only
>> +config VDPA
>> +	tristate
>> +        default n
>> +        help
>> +          Enable this module to support vDPA device that uses a
>> +          datapath which complies with virtio specifications with
>> +          vendor specific control path.
>> +
>> diff --git a/drivers/virtio/vdpa/Makefile b/drivers/virtio/vdpa/Makefile
>> new file mode 100644
>> index 000000000000..ee6a35e8a4fb
>> --- /dev/null
>> +++ b/drivers/virtio/vdpa/Makefile
>> @@ -0,0 +1,2 @@
>> +# SPDX-License-Identifier: GPL-2.0
>> +obj-$(CONFIG_VDPA) += vdpa.o
>> diff --git a/drivers/virtio/vdpa/vdpa.c b/drivers/virtio/vdpa/vdpa.c
>> new file mode 100644
>> index 000000000000..2b0e4a9f105d
>> --- /dev/null
>> +++ b/drivers/virtio/vdpa/vdpa.c
>> @@ -0,0 +1,141 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * vDPA bus.
>> + *
>> + * Copyright (c) 2019, Red Hat. All rights reserved.
>> + *     Author: Jason Wang <jasowang@redhat.com>
>> + *
>> + */
>> +
>> +#include <linux/module.h>
>> +#include <linux/idr.h>
>> +#include <linux/vdpa.h>
>> +
>> +#define MOD_VERSION  "0.1"
>> +#define MOD_DESC     "vDPA bus"
>> +#define MOD_AUTHOR   "Jason Wang <jasowang@redhat.com>"
>> +#define MOD_LICENSE  "GPL v2"
>> +
>> +static DEFINE_IDA(vdpa_index_ida);
>> +
>> +struct device *vdpa_get_parent(struct vdpa_device *vdpa)
>> +{
>> +	return vdpa->dev.parent;
>> +}
>> +EXPORT_SYMBOL(vdpa_get_parent);
>> +
>> +void vdpa_set_parent(struct vdpa_device *vdpa, struct device *parent)
>> +{
>> +	vdpa->dev.parent = parent;
>> +}
>> +EXPORT_SYMBOL(vdpa_set_parent);
>> +
>> +struct vdpa_device *dev_to_vdpa(struct device *_dev)
>> +{
>> +	return container_of(_dev, struct vdpa_device, dev);
>> +}
>> +EXPORT_SYMBOL_GPL(dev_to_vdpa);
>> +
>> +struct device *vdpa_to_dev(struct vdpa_device *vdpa)
>> +{
>> +	return &vdpa->dev;
>> +}
>> +EXPORT_SYMBOL_GPL(vdpa_to_dev);
>> +
>> +static int vdpa_dev_probe(struct device *d)
>> +{
>> +	struct vdpa_device *dev = dev_to_vdpa(d);
>> +	struct vdpa_driver *drv = drv_to_vdpa(dev->dev.driver);
>> +	int ret = 0;
>> +
>> +	if (drv && drv->probe)
>> +		ret = drv->probe(d);
>> +
>> +	return ret;
>> +}
>> +
>> +static int vdpa_dev_remove(struct device *d)
>> +{
>> +	struct vdpa_device *dev = dev_to_vdpa(d);
>> +	struct vdpa_driver *drv = drv_to_vdpa(dev->dev.driver);
>> +
>> +	if (drv && drv->remove)
>> +		drv->remove(d);
>> +
>> +	return 0;
>> +}
>> +
>> +static struct bus_type vdpa_bus = {
>> +	.name  = "vdpa",
>> +	.probe = vdpa_dev_probe,
>> +	.remove = vdpa_dev_remove,
>> +};
>> +
>> +int register_vdpa_device(struct vdpa_device *vdpa)
>> +{
>> +	int err;
>> +
>> +	if (!vdpa_get_parent(vdpa))
>> +		return -EINVAL;
>> +
>> +	if (!vdpa->config)
>> +		return -EINVAL;
>> +
>> +	err = ida_simple_get(&vdpa_index_ida, 0, 0, GFP_KERNEL);
>> +	if (err < 0)
>> +		return -EFAULT;
>> +
>> +	vdpa->dev.bus = &vdpa_bus;
>> +	device_initialize(&vdpa->dev);
>> +
>> +	vdpa->index = err;
>> +	dev_set_name(&vdpa->dev, "vdpa%u", vdpa->index);
>> +
>> +	err = device_add(&vdpa->dev);
>> +	if (err)
>> +		ida_simple_remove(&vdpa_index_ida, vdpa->index);
>> +
>> +	return err;
>> +}
>> +EXPORT_SYMBOL_GPL(register_vdpa_device);
>> +
>> +void unregister_vdpa_device(struct vdpa_device *vdpa)
>> +{
>> +	int index = vdpa->index;
>> +
>> +	device_unregister(&vdpa->dev);
>> +	ida_simple_remove(&vdpa_index_ida, index);
>> +}
>> +EXPORT_SYMBOL_GPL(unregister_vdpa_device);
>> +
>> +int register_vdpa_driver(struct vdpa_driver *driver)
>> +{
>> +	driver->drv.bus = &vdpa_bus;
>> +	return driver_register(&driver->drv);
>> +}
>> +EXPORT_SYMBOL_GPL(register_vdpa_driver);
>> +
>> +void unregister_vdpa_driver(struct vdpa_driver *driver)
>> +{
>> +	driver_unregister(&driver->drv);
>> +}
>> +EXPORT_SYMBOL_GPL(unregister_vdpa_driver);
>> +
>> +static int vdpa_init(void)
>> +{
>> +	if (bus_register(&vdpa_bus) != 0)
>> +		panic("virtio bus registration failed");
>> +	return 0;
>> +}
>> +
>> +static void __exit vdpa_exit(void)
>> +{
>> +	bus_unregister(&vdpa_bus);
>> +	ida_destroy(&vdpa_index_ida);
>> +}
>> +core_initcall(vdpa_init);
>> +module_exit(vdpa_exit);
>> +
>> +MODULE_VERSION(MOD_VERSION);
>> +MODULE_AUTHOR(MOD_AUTHOR);
>> +MODULE_LICENSE(MOD_LICENSE);
>> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
>> new file mode 100644
>> index 000000000000..47760137ef66
>> --- /dev/null
>> +++ b/include/linux/vdpa.h
>> @@ -0,0 +1,191 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _LINUX_VDPA_H
>> +#define _LINUX_VDPA_H
>> +
>> +#include <linux/device.h>
>> +#include <linux/interrupt.h>
>> +#include <linux/vhost_iotlb.h>
>> +
>> +/**
>> + * vDPA callback definition.
>> + * @callback: interrupt callback function
>> + * @private: the data passed to the callback function
>> + */
>> +struct vdpa_callback {
>> +	irqreturn_t (*callback)(void *data);
>> +	void *private;
>> +};
>> +
>> +/**
>> + * vDPA device - representation of a vDPA device
>> + * @dev: underlying device
>> + * @config: the configuration ops for this device.
>> + * @index: device index
>> + */
>> +struct vdpa_device {
>> +	struct device dev;
>> +	const struct vdpa_config_ops *config;
>> +	int index;
>> +};
>> +
>> +/**
>> + * vDPA_config_ops - operations for configuring a vDPA device.
>> + * Note: vDPA device drivers are required to implement all of the
>> + * operations unless it is optional mentioned in the following list.
>> + * @set_vq_address:		Set the address of virtqueue
>> + *				@vdev: vdpa device
>> + *				@idx: virtqueue index
>> + *				@desc_area: address of desc area
>> + *				@driver_area: address of driver area
>> + *				@device_area: address of device area
>> + *				Returns integer: success (0) or error (< 0)
>> + * @set_vq_num:			Set the size of virtqueue
>> + *				@vdev: vdpa device
>> + *				@idx: virtqueue index
>> + *				@num: the size of virtqueue
>> + * @kick_vq:			Kick the virtqueue
>> + *				@vdev: vdpa device
>> + *				@idx: virtqueue index
>
> This seems wrong: kicks are data path so drivers should not
> do it in a vendor specific way.


I'm not sure I get this since the doorbell is pretty vendor specific.

The idea here is to start from simple and common cases that can work for
both kernel virtio drivers and vhost:

- For the kernel, kick_vq() is called from vq->notify() directly
- For vhost, vhost is in charge of hooking the eventfd to kick_vq()
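
To make the two cases above concrete, here is a rough userspace sketch
(not code from this series). Both paths end up in the same kick_vq() op.
The reduced struct layouts, sim_kick(), virtio_notify() and
vhost_eventfd_signalled() are hypothetical names made up for the example:

```c
#include <assert.h>
#include <stdint.h>

struct vdpa_device;

/* Reduced ops table: only the kick op from the patch is modelled. */
struct vdpa_config_ops {
	void (*kick_vq)(struct vdpa_device *vdev, uint16_t idx);
};

struct vdpa_device {
	const struct vdpa_config_ops *config;
	unsigned int kicks[4];	/* hypothetical per-queue doorbell counter */
};

/* Hypothetical vendor doorbell: real hardware would write an MMIO register. */
static void sim_kick(struct vdpa_device *vdev, uint16_t idx)
{
	assert(idx < 4);
	vdev->kicks[idx]++;
}

static const struct vdpa_config_ops sim_ops = { .kick_vq = sim_kick };

/* Kernel path: what a vq->notify() hook could do. */
static void virtio_notify(struct vdpa_device *vdev, uint16_t idx)
{
	vdev->config->kick_vq(vdev, idx);
}

/* vhost path: called once the kick eventfd has been signalled. */
static void vhost_eventfd_signalled(struct vdpa_device *vdev, uint16_t idx)
{
	vdev->config->kick_vq(vdev, idx);
}
```

The vendor-specific part stays behind the op, so neither caller needs to
know how the doorbell is actually rung.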


>   How about an API
> returning the device/resource that can then be
> mapped as appropriate?
>

Yes, this could be a further optimization on top, but it is not a must (it
only works if, e.g., the doorbell does not share MMIO space with other
functions). For vhost we need something like this, and need to hook it to
the mmap() of the vhost file descriptor.


>> + * @set_vq_cb:			Set the interrupt callback function for
>> + *				a virtqueue
>> + *				@vdev: vdpa device
>> + *				@idx: virtqueue index
>> + *				@cb: virtio-vdev interrupt callback structure
>
> Calls are data path too, I think we need some way to map MSI?


Similarly, this could be an optimization on top, and we can start from
simple and common cases:

- For the kernel, the vq callback could be mapped to the MSI interrupt
handler directly
- For vhost, the eventfd wakeup could be hooked into the cb here


>
>> + * @set_vq_ready:		Set ready status for a virtqueue
>> + *				@vdev: vdpa device
>> + *				@idx: virtqueue index
>> + *				@ready: ready (true) not ready(false)
>> + * @get_vq_ready:		Get ready status for a virtqueue
>> + *				@vdev: vdpa device
>> + *				@idx: virtqueue index
>> + *				Returns boolean: ready (true) or not (false)
>> + * @set_vq_state:		Set the state for a virtqueue
>> + *				@vdev: vdpa device
>> + *				@idx: virtqueue index
>> + *				@state: virtqueue state (last_avail_idx)
>> + *				Returns integer: success (0) or error (< 0)
>> + * @get_vq_state:		Get the state for a virtqueue
>> + *				@vdev: vdpa device
>> + *				@idx: virtqueue index
>> + *				Returns virtqueue state (last_avail_idx)
>> + * @get_vq_align:		Get the virtqueue align requirement
>> + *				for the device
>> + *				@vdev: vdpa device
>> + *				Returns virtqueue algin requirement
>
> Where does this come from? Spec dictates that for a data path,
> vendor specific values for this will break userspace ...


It comes from the align parameter of vring_create_virtqueue(). We can 
expose the alignment to userspace if necessary. If it's not necessary, I 
can drop this method here.


>
>> + * @get_features:		Get virtio features supported by the device
>> + *				@vdev: vdpa device
>> + *				Returns the virtio features support by the
>> + *				device
>> + * @set_features:		Set virtio features supported by the driver
>> + *				@vdev: vdpa device
>> + *				@features: feature support by the driver
>> + *				Returns integer: success (0) or error (< 0)
>> + * @set_config_cb:		Set the config interrupt callback
>> + *				@vdev: vdpa device
>> + *				@cb: virtio-vdev interrupt callback structure
>> + * @get_vq_num_max:		Get the max size of virtqueue
>> + *				@vdev: vdpa device
>> + *				Returns u16: max size of virtqueue
>
> I'm not sure this has to be uniform across VQs.


Let me add an index parameter to this.
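
For illustration only, a per-queue variant might look like the sketch
below. The extra idx argument, the reduced struct vdpa_device and
sim_get_vq_num_max() are assumptions for this example, not the final API:

```c
#include <assert.h>
#include <stdint.h>

#define SIM_NUM_VQS 2

/* Reduced device: only the hypothetical per-queue limits are modelled. */
struct vdpa_device {
	uint16_t vq_num_max[SIM_NUM_VQS];
};

/* Hypothetical per-virtqueue variant of get_vq_num_max(). */
static uint16_t sim_get_vq_num_max(struct vdpa_device *vdev, uint16_t idx)
{
	assert(idx < SIM_NUM_VQS);
	return vdev->vq_num_max[idx];
}
```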


>
>> + * @get_device_id:		Get virtio device id
>> + *				@vdev: vdpa device
>> + *				Returns u32: virtio device id
>
> is this the virtio ID? PCI ID?


Virtio ID


>
>> + * @get_vendor_id:		Get id for the vendor that provides this device
>> + *				@vdev: vdpa device
>> + *				Returns u32: virtio vendor id
> what's the idea behind this? userspace normally doesn't interact with
> this ... debugging?


This allows a vendor-specific driver on top of the vDPA bus. If there is
no interest in this, I can drop it.


>
>> + * @get_status:			Get the device status
>> + *				@vdev: vdpa device
>> + *				Returns u8: virtio device status
>> + * @set_status:			Set the device status
>> + *				@vdev: vdpa device
>> + *				@status: virtio device status
>> + * @get_config:			Read from device specific configuration space
>> + *				@vdev: vdpa device
>> + *				@offset: offset from the beginning of
>> + *				configuration space
>> + *				@buf: buffer used to read to
>> + *				@len: the length to read from
>> + *				configuration space
>> + * @set_config:			Write to device specific configuration space
>> + *				@vdev: vdpa device
>> + *				@offset: offset from the beginning of
>> + *				configuration space
>> + *				@buf: buffer used to write from
>> + *				@len: the length to write to
>> + *				configuration space
>> + * @get_generation:		Get device config generation (optional)
>> + *				@vdev: vdpa device
>> + *				Returns u32: device generation
>> + * @set_map:			Set device memory mapping, optional
>> + *				and only needed for device that using
>> + *				device specific DMA translation
>> + *				(on-chip IOMMU)
>> + *				@vdev: vdpa device
>> + *				@iotlb: vhost memory mapping to be
>> + *				used by the vDPA
>> + *				Returns integer: success (0) or error (< 0)
> OK so any change just swaps in a completely new mapping?
> Wouldn't this make minor changes such as memory hotplug
> quite expensive?


My understanding is that incremental updating of the on-chip IOMMU
may degrade performance, so vendor vDPA drivers may want to know
all the mappings at once. Technically, we can keep the incremental API
here and let the vendor vDPA drivers record the full mapping
internally, which may slightly increase the complexity of the vendor
driver. We need more input from vendors here.
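
A rough userspace sketch of that second option, under my own assumptions
(struct map_table stands in for the vhost_iotlb, and every sim_* name is
hypothetical): the driver records each incremental change in a shadow
table and then programs the whole table into the device in one go.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_MAPS 8

struct map_entry {
	uint64_t iova, size, pa;
};

/* Hypothetical stand-in for the vhost_iotlb passed to set_map(). */
struct map_table {
	struct map_entry e[MAX_MAPS];
	unsigned int n;
};

struct sim_dev {
	struct map_table shadow;	/* full mapping recorded by the driver */
	struct map_table hw;		/* what the on-chip IOMMU currently holds */
	unsigned int reloads;		/* number of full-table programmings */
};

/* Whole-table op in the style of set_map(): replace everything at once. */
static int sim_set_map(struct sim_dev *d, const struct map_table *t)
{
	assert(t->n <= MAX_MAPS);
	d->hw = *t;
	d->reloads++;
	return 0;
}

/* Incremental add: update the shadow copy, then push the full table. */
static int sim_map_add(struct sim_dev *d, uint64_t iova, uint64_t size,
		       uint64_t pa)
{
	if (d->shadow.n == MAX_MAPS)
		return -1;
	d->shadow.e[d->shadow.n++] = (struct map_entry){ iova, size, pa };
	return sim_set_map(d, &d->shadow);
}

/* Incremental remove: drop the entry starting at iova, then push the table. */
static int sim_map_del(struct sim_dev *d, uint64_t iova)
{
	for (unsigned int i = 0; i < d->shadow.n; i++) {
		if (d->shadow.e[i].iova != iova)
			continue;
		/* Fill the hole with the last entry; order is not preserved. */
		d->shadow.e[i] = d->shadow.e[--d->shadow.n];
		return sim_set_map(d, &d->shadow);
	}
	return -1;
}
```

The cost Michael points out is visible in the reloads counter: every
small change still reprograms the whole table.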

Thanks


>
>> + */
>> +struct vdpa_config_ops {
>> +	/* Virtqueue ops */
>> +	int (*set_vq_address)(struct vdpa_device *vdev,
>> +			      u16 idx, u64 desc_area, u64 driver_area,
>> +			      u64 device_area);
>> +	void (*set_vq_num)(struct vdpa_device *vdev, u16 idx, u32 num);
>> +	void (*kick_vq)(struct vdpa_device *vdev, u16 idx);
>> +	void (*set_vq_cb)(struct vdpa_device *vdev, u16 idx,
>> +			  struct vdpa_callback *cb);
>> +	void (*set_vq_ready)(struct vdpa_device *vdev, u16 idx, bool ready);
>> +	bool (*get_vq_ready)(struct vdpa_device *vdev, u16 idx);
>> +	int (*set_vq_state)(struct vdpa_device *vdev, u16 idx, u64 state);
>> +	u64 (*get_vq_state)(struct vdpa_device *vdev, u16 idx);
>> +
>> +	/* Device ops */
>> +	u16 (*get_vq_align)(struct vdpa_device *vdev);
>> +	u64 (*get_features)(struct vdpa_device *vdev);
>> +	int (*set_features)(struct vdpa_device *vdev, u64 features);
>> +	void (*set_config_cb)(struct vdpa_device *vdev,
>> +			      struct vdpa_callback *cb);
>> +	u16 (*get_vq_num_max)(struct vdpa_device *vdev);
>> +	u32 (*get_device_id)(struct vdpa_device *vdev);
>> +	u32 (*get_vendor_id)(struct vdpa_device *vdev);
>> +	u8 (*get_status)(struct vdpa_device *vdev);
>> +	void (*set_status)(struct vdpa_device *vdev, u8 status);
>> +	void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
>> +			   void *buf, unsigned int len);
>> +	void (*set_config)(struct vdpa_device *vdev, unsigned int offset,
>> +			   const void *buf, unsigned int len);
>> +	u32 (*get_generation)(struct vdpa_device *vdev);
>> +
>> +	/* Mem table */
>> +	int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb);
>> +};
>> +
>> +int register_vdpa_device(struct vdpa_device *vdpa);
>> +void unregister_vdpa_device(struct vdpa_device *vdpa);
>> +
>> +struct device *vdpa_get_parent(struct vdpa_device *vdpa);
>> +void vdpa_set_parent(struct vdpa_device *vdpa, struct device *parent);
>> +
>> +struct vdpa_device *dev_to_vdpa(struct device *_dev);
>> +struct device *vdpa_to_dev(struct vdpa_device *vdpa);
>> +
>> +/**
>> + * vdpa_driver - operations for a vDPA driver
>> + * @driver: underlying device driver
>> + * @probe: the function to call when a device is found.  Returns 0 or -errno.
>> + * @remove: the function to call when a device is removed.
>> + */
>> +struct vdpa_driver {
>> +	struct device_driver drv;
>> +	int (*probe)(struct device *dev);
>> +	void (*remove)(struct device *dev);
>> +};
>> +
>> +int register_vdpa_driver(struct vdpa_driver *drv);
>> +void unregister_vdpa_driver(struct vdpa_driver *drv);
>> +
>> +static inline struct vdpa_driver *drv_to_vdpa(struct device_driver *drv)
>> +{
>> +	return container_of(drv, struct vdpa_driver, drv);
>> +}
>> +
>> +#endif /* _LINUX_VDPA_H */
>> -- 
>> 2.19.1



* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-17  3:03     ` Jason Wang
@ 2020-01-17 13:54       ` Jason Gunthorpe
  2020-01-20  7:50         ` Jason Wang
  2020-01-20 12:17         ` Michael S. Tsirkin
  2020-01-21  8:40       ` Tian, Kevin
  1 sibling, 2 replies; 76+ messages in thread
From: Jason Gunthorpe @ 2020-01-17 13:54 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, linux-kernel, kvm, virtualization, netdev, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller,
	xiao.w.wang, haotian.wang, lingshan.zhu, eperezma, lulu,
	Parav Pandit, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, Jiri Pirko, Shahaf Shuler, hanand, mhabets

On Fri, Jan 17, 2020 at 11:03:12AM +0800, Jason Wang wrote:
> 
> On 2020/1/16 11:22 PM, Jason Gunthorpe wrote:
> > On Thu, Jan 16, 2020 at 08:42:29PM +0800, Jason Wang wrote:
> > > vDPA device is a device that uses a datapath which complies with the
> > > virtio specifications with vendor specific control path. vDPA devices
> > > can be both physically located on the hardware or emulated by
> > > software. vDPA hardware devices are usually implemented through PCIE
> > > with the following types:
> > > 
> > > - PF (Physical Function) - A single Physical Function
> > > - VF (Virtual Function) - Device that supports single root I/O
> > >    virtualization (SR-IOV). Its Virtual Function (VF) represents a
> > >    virtualized instance of the device that can be assigned to different
> > >    partitions
> > > - VDEV (Virtual Device) - With technologies such as Intel Scalable
> > >    IOV, a virtual device composed by host OS utilizing one or more
> > >    ADIs.
> > > - SF (Sub function) - Vendor specific interface to slice the Physical
> > >    Function to multiple sub functions that can be assigned to different
> > >    partitions as virtual devices.
> > I really hope we don't end up with two different ways to spell this
> > same thing.
> 
> I think you meant ADI vs SF. It looks to me that ADI is limited to the scope
> of scalable IOV but SF not.

I think if one looks carefully you'd find that SF and ADI are using
very similar techniques. For instance we'd also like to use the code
reorg of the MSIX vector setup with SFs that Intel is calling IMS.

Really SIOV is simply a bundle of pre-existing stuff under a tidy
name, whatever code skeleton we come up with for SFs should be re-used
for ADI.

> > Shouldn't there be a device/driver matching process of some kind?
> 
> 
> The question is what do we want do match here.
> 
> 1) "virtio" vs "vhost", I implemented matching method for this in mdev
> series, but it looks unnecessary for vDPA device driver to know about this.
> Anyway we can use sysfs driver bind/unbind to switch drivers
> 2) virtio device id and vendor id. I'm not sure we need this consider the
> two drivers so far (virtio/vhost) are all bus drivers.

As we seem to be contemplating some dynamic creation of vdpa devices I
think upon creation time it should be specified what mode they should
run in and then all driver binding and autoloading should happen
automatically. Telling the user to bind/unbind is a very poor
experience.

Jason


* Re: [PATCH 4/5] virtio: introduce a vDPA based transport
  2020-01-17  9:32     ` Jason Wang
@ 2020-01-17 14:00       ` Jason Gunthorpe
  2020-01-20  7:52         ` Jason Wang
  0 siblings, 1 reply; 76+ messages in thread
From: Jason Gunthorpe @ 2020-01-17 14:00 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, linux-kernel, kvm, virtualization, netdev, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller,
	xiao.w.wang, haotian.wang, lingshan.zhu, eperezma, lulu,
	Parav Pandit, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, Jiri Pirko, Shahaf Shuler, hanand, mhabets

On Fri, Jan 17, 2020 at 05:32:35PM +0800, Jason Wang wrote:
> > 
> > > +	const struct vdpa_config_ops *ops = vdpa->config;
> > > +	struct virtio_vdpa_device *vd_dev;
> > > +	int rc;
> > > +
> > > +	vd_dev = devm_kzalloc(dev, sizeof(*vd_dev), GFP_KERNEL);
> > > +	if (!vd_dev)
> > > +		return -ENOMEM;
> > This is not right, the struct device lifetime is controled by a kref,
> > not via devm. If you want to use a devm unwind then the unwind is
> > put_device, not devm_kfree.
> 
> I'm not sure I get the point here. The lifetime is bound to underlying vDPA
> device and devres allow to be freed before the vpda device is released. But
> I agree using devres of underlying vdpa device looks wired.

Once device_initialize is called the only way to free a struct device
is via put_device, while here you have a devm trigger that will
unconditionally do kfree on a struct device without respecting the
reference count.

Reference-counted memory must never be allocated with devm.
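
As a rough userspace model of the rule (not kernel code; fake_device,
vd_dev and all helpers below are hypothetical, and the counter is not
atomic as a real kref would be): the object is allocated with a plain
allocator and freed only from the release callback that the last put
triggers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model of a kref-managed struct device. */
struct fake_device {
	int refcount;
	void (*release)(struct fake_device *dev);
};

static int released;	/* counts release() calls, for the example only */

static void fake_device_initialize(struct fake_device *dev,
				   void (*release)(struct fake_device *))
{
	dev->refcount = 1;
	dev->release = release;
}

static void fake_get_device(struct fake_device *dev)
{
	dev->refcount++;
}

/* The last put, and only the last put, frees the object via release(). */
static void fake_put_device(struct fake_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		dev->release(dev);
}

struct vd_dev {
	struct fake_device dev;	/* first member, so the cast below is valid */
};

static void vd_dev_release(struct fake_device *dev)
{
	released++;
	free((struct vd_dev *)dev);
}

/* Allocate with plain calloc(), not a scope-bound allocator such as devm. */
static struct vd_dev *vd_dev_create(void)
{
	struct vd_dev *vd = calloc(1, sizeof(*vd));

	if (vd)
		fake_device_initialize(&vd->dev, vd_dev_release);
	return vd;
}
```

A scope-bound free would release the memory while another holder still
has a reference, which is exactly the case the second put covers here.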

> > > +	vd_dev->vdev.dev.release = virtio_vdpa_release_dev;
> > > +	vd_dev->vdev.config = &virtio_vdpa_config_ops;
> > > +	vd_dev->vdpa = vdpa;
> > > +	INIT_LIST_HEAD(&vd_dev->virtqueues);
> > > +	spin_lock_init(&vd_dev->lock);
> > > +
> > > +	vd_dev->vdev.id.device = ops->get_device_id(vdpa);
> > > +	if (vd_dev->vdev.id.device == 0)
> > > +		return -ENODEV;
> > > +
> > > +	vd_dev->vdev.id.vendor = ops->get_vendor_id(vdpa);
> > > +	rc = register_virtio_device(&vd_dev->vdev);
> > > +	if (rc)
> > > +		put_device(dev);
> > And a ugly unwind like this is why you want to have device_initialize()
> > exposed to the driver,
> 
> In this context, which "driver" did you mean here? (Note, virtio-vdpa is the
> driver for vDPA bus here).

'driver' is the thing using the 'core' library calls to implement a
device, so here the 'vd_dev' is the driver and
'register_virtio_device' is the core.

> > 
> > Where is the various THIS_MODULE's I expect to see in a scheme like
> > this?
> > 
> > All function pointers must be protected by a held module reference
> > count, ie the above probe/remove and all the pointers in ops.
> 
> Will double check, since I don't see this in other virtio transport drivers
> (PCI or MMIO).

pci_register_driver is a macro that provides a THIS_MODULE, and the
pci core code sets driver.owner, then the rest of the stuff related to
driver ops is supposed to work against that to protect the driver ops.

For the device module refcounting you either need to ensure that
'unregister' is a strong fence and guarantees that no device ops are
called past unregister (noting that this is impossible for release),
or you need to hold the module lock until release.

It is common to see non-core subsystems get this stuff wrong.

Jason


* Re: [PATCH 5/5] vdpasim: vDPA device simulator
  2020-01-17  9:32     ` Jason Wang
@ 2020-01-17 14:10       ` Jason Gunthorpe
  2020-01-20  8:01         ` Jason Wang
  2020-02-04  4:19         ` Jason Wang
  0 siblings, 2 replies; 76+ messages in thread
From: Jason Gunthorpe @ 2020-01-17 14:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, linux-kernel, kvm, virtualization, netdev, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller,
	xiao.w.wang, haotian.wang, lingshan.zhu, eperezma, lulu,
	Parav Pandit, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, Jiri Pirko, Shahaf Shuler, hanand, mhabets, kuba

On Fri, Jan 17, 2020 at 05:32:39PM +0800, Jason Wang wrote:
> 
> On 2020/1/16 11:47 PM, Jason Gunthorpe wrote:
> > On Thu, Jan 16, 2020 at 08:42:31PM +0800, Jason Wang wrote:
> > > This patch implements a software vDPA networking device. The datapath
> > > is implemented through vringh and workqueue. The device has an on-chip
> > > IOMMU which translates IOVA to PA. For kernel virtio drivers, vDPA
> > > simulator driver provides dma_ops. For vhost driers, set_map() methods
> > > of vdpa_config_ops is implemented to accept mappings from vhost.
> > > 
> > > A sysfs based management interface is implemented, devices are
> > > created and removed through:
> > > 
> > > /sys/devices/virtual/vdpa_simulator/netdev/{create|remove}
> > This is very gross, creating a class just to get a create/remove and
> > then not using the class for anything else? Yuk.
> 
> 
> It includes more information, e.g the devices and the link from vdpa_sim
> device and vdpa device.

I feel like regardless of how the device is created there should be a
consistent virtio centric management for post-creation tasks, such as
introspection and destruction.

A virtio struct device should already have back pointers to its parent
device, which should be enough to discover the vdpa_sim; none of the
extra sysfs munging should be needed.

> > > Netlink based lifecycle management could be implemented for vDPA
> > > simulator as well.
> > This is just begging for a netlink based approach.
> > 
> > Certainly netlink driven removal should be an agreeable standard for
> > all devices, I think.
> 
> 
> Well, I think Parav had some proposals during the discussion of mdev
> approach. But I'm not sure if he had any RFC codes for me to integrate it
> into vdpasim.
>
> Or do you want me to propose the netlink API? If yes, would you prefer a
> new virtio-dedicated one or a subset of devlink?

Well, let's see what feedback Parav has

Jason

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/5] vringh: IOTLB support
  2020-01-16 12:42 ` [PATCH 2/5] vringh: IOTLB support Jason Wang
@ 2020-01-17 21:54   ` kbuild test robot
  2020-01-17 22:33   ` kbuild test robot
  1 sibling, 0 replies; 76+ messages in thread
From: kbuild test robot @ 2020-01-17 21:54 UTC (permalink / raw)
  To: Jason Wang
  Cc: kbuild-all, mst, jasowang, linux-kernel, kvm, virtualization,
	netdev, tiwei.bie, jgg, maxime.coquelin, cunming.liang,
	zhihong.wang, rob.miller, xiao.w.wang, haotian.wang,
	lingshan.zhu, eperezma, lulu, parav, kevin.tian, stefanha,
	rdunlap, hch, aadam, jakub.kicinski, jiri, shahafs, hanand,
	mhabets


[-- Attachment #1: Type: text/plain, Size: 2662 bytes --]

Hi Jason,

I love your patch! Yet something to improve:

[auto build test ERROR on vhost/linux-next]
[also build test ERROR on linux/master linus/master v5.5-rc6 next-20200117]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Jason-Wang/vDPA-support/20200117-170243
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: nios2-randconfig-a001-20200117 (attached as .config)
compiler: nios2-linux-gcc (GCC) 7.5.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.5.0 make.cross ARCH=nios2 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/vhost/vringh.o: In function `iotlb_translate':
>> drivers/vhost/vringh.c:1079: undefined reference to `vhost_iotlb_itree_first'
   drivers/vhost/vringh.c:1079:(.text+0x45c): relocation truncated to fit: R_NIOS2_CALL26 against `vhost_iotlb_itree_first'

vim +1079 drivers/vhost/vringh.c

  1061	
  1062	static int iotlb_translate(const struct vringh *vrh,
  1063				   u64 addr, u64 len, struct bio_vec iov[],
  1064				   int iov_size, u32 perm)
  1065	{
  1066		struct vhost_iotlb_map *map;
  1067		struct vhost_iotlb *iotlb = vrh->iotlb;
  1068		int ret = 0;
  1069		u64 s = 0;
  1070	
  1071		while (len > s) {
  1072			u64 size, pa, pfn;
  1073	
  1074			if (unlikely(ret >= iov_size)) {
  1075				ret = -ENOBUFS;
  1076				break;
  1077			}
  1078	
> 1079			map = vhost_iotlb_itree_first(iotlb, addr,
  1080						      addr + len - 1);
  1081			if (!map || map->start > addr) {
  1082				ret = -EINVAL;
  1083				break;
  1084			} else if (!(map->perm & perm)) {
  1085				ret = -EPERM;
  1086				break;
  1087			}
  1088	
  1089			size = map->size - addr + map->start;
  1090			pa = map->addr + addr - map->start;
  1091			pfn = pa >> PAGE_SHIFT;
  1092			iov[ret].bv_page = pfn_to_page(pfn);
  1093			iov[ret].bv_len = min(len - s, size);
  1094			iov[ret].bv_offset = pa & (PAGE_SIZE - 1);
  1095			s += size;
  1096			addr += size;
  1097			++ret;
  1098		}
  1099	
  1100		return ret;
  1101	}
  1102	
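For readers following the listing above outside a kernel tree, the translation loop can be approximated in userspace as below. This is only a sketch under stated assumptions: `struct map_entry`, `struct segment`, `find_first()` and `translate()` are hypothetical stand-ins, a linear scan over a small array replaces the interval tree behind `vhost_iotlb_itree_first()`, and segments are recorded as (pa, len) pairs instead of `bio_vec` entries. The sketch advances by the bytes actually consumed per segment, which is a choice made here and not a statement about the posted patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define PERM_RO 0x1
#define PERM_WO 0x2

/* One IOVA range [start, start + size) mapped to physical address addr. */
struct map_entry { uint64_t start, size, addr; uint32_t perm; };
/* One physically contiguous piece of the translated buffer. */
struct segment { uint64_t pa, len; };

/* Lowest-starting entry that overlaps [first, last], or NULL if none. */
static const struct map_entry *find_first(const struct map_entry *maps,
					  size_t n, uint64_t first,
					  uint64_t last)
{
	const struct map_entry *best = NULL;

	for (size_t i = 0; i < n; i++) {
		uint64_t m_last = maps[i].start + maps[i].size - 1;

		if (maps[i].start <= last && m_last >= first &&
		    (!best || maps[i].start < best->start))
			best = &maps[i];
	}
	return best;
}

/* Returns the number of segments filled in, or a negative errno value. */
static int translate(const struct map_entry *maps, size_t n, uint64_t addr,
		     uint64_t len, struct segment *segs, int nsegs,
		     uint32_t perm)
{
	uint64_t done = 0;
	int ret = 0;

	while (done < len) {
		const struct map_entry *map;
		uint64_t avail, chunk;

		if (ret >= nsegs)
			return -ENOBUFS;	/* caller's array is too small */

		map = find_first(maps, n, addr, addr + (len - done) - 1);
		if (!map || map->start > addr)
			return -EINVAL;		/* hole in the IOVA space */
		if (!(map->perm & perm))
			return -EPERM;

		/* Bytes left in this mapping, counted from addr onwards. */
		avail = map->size - (addr - map->start);
		chunk = (len - done) < avail ? (len - done) : avail;

		segs[ret].pa = map->addr + (addr - map->start);
		segs[ret].len = chunk;
		done += chunk;
		addr += chunk;
		ret++;
	}
	return ret;
}
```

A buffer that straddles two mappings comes back as two segments; a gap between mappings, a permission mismatch, or too few output slots each produce a distinct error code.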

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 25345 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/5] vringh: IOTLB support
  2020-01-16 12:42 ` [PATCH 2/5] vringh: IOTLB support Jason Wang
  2020-01-17 21:54   ` kbuild test robot
@ 2020-01-17 22:33   ` kbuild test robot
  1 sibling, 0 replies; 76+ messages in thread
From: kbuild test robot @ 2020-01-17 22:33 UTC (permalink / raw)
  To: Jason Wang
  Cc: kbuild-all, mst, jasowang, linux-kernel, kvm, virtualization,
	netdev, tiwei.bie, jgg, maxime.coquelin, cunming.liang,
	zhihong.wang, rob.miller, xiao.w.wang, haotian.wang,
	lingshan.zhu, eperezma, lulu, parav, kevin.tian, stefanha,
	rdunlap, hch, aadam, jakub.kicinski, jiri, shahafs, hanand,
	mhabets


[-- Attachment #1: Type: text/plain, Size: 2835 bytes --]

Hi Jason,

I love your patch! Perhaps something to improve:

[auto build test WARNING on vhost/linux-next]
[also build test WARNING on linux/master linus/master v5.5-rc6 next-20200117]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Jason-Wang/vDPA-support/20200117-170243
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: sh-randconfig-a001-20200117 (attached as .config)
compiler: sh4-linux-gcc (GCC) 7.5.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.5.0 make.cross ARCH=sh 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   drivers/vhost/vringh.c: In function 'copy_from_iotlb':
>> drivers/vhost/vringh.c:1110:29: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
     ret = iotlb_translate(vrh, (u64)src, len, iov, 16, VHOST_MAP_RO);
                                ^
   drivers/vhost/vringh.c: In function 'copy_to_iotlb':
   drivers/vhost/vringh.c:1128:29: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
     ret = iotlb_translate(vrh, (u64)dst, len, iov, 16, VHOST_MAP_WO);
                                ^
   drivers/vhost/vringh.c: In function 'getu16_iotlb':
   drivers/vhost/vringh.c:1145:29: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
     ret = iotlb_translate(vrh, (u64)p, sizeof(*p),
                                ^
   drivers/vhost/vringh.c: In function 'putu16_iotlb':
   drivers/vhost/vringh.c:1166:29: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
     ret = iotlb_translate(vrh, (u64)p, sizeof(*p),
                                ^

vim +1110 drivers/vhost/vringh.c

  1102	
  1103	static inline int copy_from_iotlb(const struct vringh *vrh, void *dst,
  1104					  void *src, size_t len)
  1105	{
  1106		struct iov_iter iter;
  1107		struct bio_vec iov[16];
  1108		int ret;
  1109	
> 1110		ret = iotlb_translate(vrh, (u64)src, len, iov, 16, VHOST_MAP_RO);
  1111		if (ret < 0)
  1112			return ret;
  1113	
  1114		iov_iter_bvec(&iter, READ, iov, ret, len);
  1115	
  1116		ret = copy_from_iter(dst, len, &iter);
  1117	
  1118		return ret;
  1119	}
  1120	
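The warnings above come from casting a pointer straight to a 64-bit integer on a target where pointers are 32 bits wide (sh4 here). A common way to make the widening explicit is to go through a pointer-sized integer first. The helper below is a userspace sketch of that idiom; `ptr_to_u64()` is a hypothetical name, and whether the series ended up using this approach is not stated in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Convert a pointer to a 64-bit address value. The intermediate
 * uintptr_t cast has the same width as the pointer, so the compiler
 * sees a same-size pointer-to-integer cast followed by an ordinary
 * integer widening, and -Wpointer-to-int-cast stays quiet.
 */
static inline uint64_t ptr_to_u64(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}
```

With such a helper, a call like the one flagged at line 1110 would pass `ptr_to_u64(src)` instead of `(u64)src`.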

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28673 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 1/5] vhost: factor out IOTLB
  2020-01-16 12:42 ` [PATCH 1/5] vhost: factor out IOTLB Jason Wang
  2020-01-17  4:14   ` Randy Dunlap
@ 2020-01-18  0:01   ` kbuild test robot
  2020-01-18  0:40   ` kbuild test robot
  2 siblings, 0 replies; 76+ messages in thread
From: kbuild test robot @ 2020-01-18  0:01 UTC (permalink / raw)
  To: Jason Wang
  Cc: kbuild-all, mst, jasowang, linux-kernel, kvm, virtualization,
	netdev, tiwei.bie, jgg, maxime.coquelin, cunming.liang,
	zhihong.wang, rob.miller, xiao.w.wang, haotian.wang,
	lingshan.zhu, eperezma, lulu, parav, kevin.tian, stefanha,
	rdunlap, hch, aadam, jakub.kicinski, jiri, shahafs, hanand,
	mhabets


[-- Attachment #1: Type: text/plain, Size: 2387 bytes --]

Hi Jason,

I love your patch! Yet something to improve:

[auto build test ERROR on vhost/linux-next]
[also build test ERROR on linux/master linus/master v5.5-rc6 next-20200117]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Jason-Wang/vDPA-support/20200117-170243
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: i386-randconfig-g003-20200117 (attached as .config)
compiler: gcc-7 (Debian 7.5.0-3) 7.5.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ld: drivers/vhost/vhost.o: in function `vhost_dev_reset_owner_prepare':
>> (.text+0x18b): undefined reference to `vhost_iotlb_alloc'
   ld: drivers/vhost/vhost.o: in function `vhost_init_device_iotlb':
   (.text+0x204): undefined reference to `vhost_iotlb_alloc'
>> ld: (.text+0x26c): undefined reference to `vhost_iotlb_free'
   ld: drivers/vhost/vhost.o: in function `iotlb_access_ok':
>> vhost.c:(.text+0x60e): undefined reference to `vhost_iotlb_itree_first'
   ld: drivers/vhost/vhost.o: in function `translate_desc':
   vhost.c:(.text+0x85f): undefined reference to `vhost_iotlb_itree_first'
   ld: drivers/vhost/vhost.o: in function `vhost_dev_cleanup':
>> (.text+0xd4a): undefined reference to `vhost_iotlb_free'
   ld: (.text+0xd59): undefined reference to `vhost_iotlb_free'
   ld: drivers/vhost/vhost.o: in function `vhost_chr_write_iter':
>> (.text+0x1c3b): undefined reference to `vhost_iotlb_del_range'
>> ld: (.text+0x1d78): undefined reference to `vhost_iotlb_add_range'
   ld: drivers/vhost/vhost.o: in function `vhost_dev_ioctl':
   (.text+0x3d6d): undefined reference to `vhost_iotlb_alloc'
   ld: (.text+0x3dd5): undefined reference to `vhost_iotlb_add_range'
   ld: (.text+0x3de4): undefined reference to `vhost_iotlb_free'
   ld: (.text+0x3e7e): undefined reference to `vhost_iotlb_free'

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34728 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 1/5] vhost: factor out IOTLB
  2020-01-16 12:42 ` [PATCH 1/5] vhost: factor out IOTLB Jason Wang
  2020-01-17  4:14   ` Randy Dunlap
  2020-01-18  0:01   ` kbuild test robot
@ 2020-01-18  0:40   ` kbuild test robot
  2 siblings, 0 replies; 76+ messages in thread
From: kbuild test robot @ 2020-01-18  0:40 UTC (permalink / raw)
  To: Jason Wang
  Cc: kbuild-all, mst, jasowang, linux-kernel, kvm, virtualization,
	netdev, tiwei.bie, jgg, maxime.coquelin, cunming.liang,
	zhihong.wang, rob.miller, xiao.w.wang, haotian.wang,
	lingshan.zhu, eperezma, lulu, parav, kevin.tian, stefanha,
	rdunlap, hch, aadam, jakub.kicinski, jiri, shahafs, hanand,
	mhabets


[-- Attachment #1: Type: text/plain, Size: 5445 bytes --]

Hi Jason,

I love your patch! Yet something to improve:

[auto build test ERROR on vhost/linux-next]
[also build test ERROR on linux/master linus/master v5.5-rc6 next-20200117]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Jason-Wang/vDPA-support/20200117-170243
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: i386-randconfig-d002-20200117 (attached as .config)
compiler: gcc-7 (Debian 7.5.0-3) 7.5.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ld: drivers/vhost/vhost.o: in function `iotlb_alloc':
>> drivers/vhost/vhost.c:582: undefined reference to `vhost_iotlb_alloc'
>> ld: drivers/vhost/vhost.c:582: undefined reference to `vhost_iotlb_alloc'
   ld: drivers/vhost/vhost.o: in function `vhost_init_device_iotlb':
>> drivers/vhost/vhost.c:1667: undefined reference to `vhost_iotlb_free'
   ld: drivers/vhost/vhost.o: in function `iotlb_access_ok':
>> drivers/vhost/vhost.c:1271: undefined reference to `vhost_iotlb_itree_first'
   ld: drivers/vhost/vhost.o: in function `translate_desc':
   drivers/vhost/vhost.c:1981: undefined reference to `vhost_iotlb_itree_first'
   ld: drivers/vhost/vhost.o: in function `vhost_dev_cleanup':
   drivers/vhost/vhost.c:658: undefined reference to `vhost_iotlb_free'
>> ld: drivers/vhost/vhost.c:660: undefined reference to `vhost_iotlb_free'
   ld: drivers/vhost/vhost.o: in function `vhost_process_iotlb_msg':
>> drivers/vhost/vhost.c:1070: undefined reference to `vhost_iotlb_del_range'
>> ld: drivers/vhost/vhost.c:1056: undefined reference to `vhost_iotlb_add_range'
   ld: drivers/vhost/vhost.o: in function `iotlb_alloc':
>> drivers/vhost/vhost.c:582: undefined reference to `vhost_iotlb_alloc'
   ld: drivers/vhost/vhost.o: in function `vhost_set_memory':
>> drivers/vhost/vhost.c:1380: undefined reference to `vhost_iotlb_add_range'
   ld: drivers/vhost/vhost.c:1407: undefined reference to `vhost_iotlb_free'
   ld: drivers/vhost/vhost.c:1403: undefined reference to `vhost_iotlb_free'

vim +582 drivers/vhost/vhost.c

   579	
   580	static struct vhost_iotlb *iotlb_alloc(void)
   581	{
 > 582		return vhost_iotlb_alloc(max_iotlb_entries,
   583					 VHOST_IOTLB_FLAG_RETIRE);
   584	}
   585	
   586	struct vhost_iotlb *vhost_dev_reset_owner_prepare(void)
   587	{
   588		return iotlb_alloc();
   589	}
   590	EXPORT_SYMBOL_GPL(vhost_dev_reset_owner_prepare);
   591	
   592	/* Caller should have device mutex */
   593	void vhost_dev_reset_owner(struct vhost_dev *dev, struct vhost_iotlb *umem)
   594	{
   595		int i;
   596	
   597		vhost_dev_cleanup(dev);
   598	
   599		dev->umem = umem;
   600		/* We don't need VQ locks below since vhost_dev_cleanup makes sure
   601		 * VQs aren't running.
   602		 */
   603		for (i = 0; i < dev->nvqs; ++i)
   604			dev->vqs[i]->umem = umem;
   605	}
   606	EXPORT_SYMBOL_GPL(vhost_dev_reset_owner);
   607	
   608	void vhost_dev_stop(struct vhost_dev *dev)
   609	{
   610		int i;
   611	
   612		for (i = 0; i < dev->nvqs; ++i) {
   613			if (dev->vqs[i]->kick && dev->vqs[i]->handle_kick) {
   614				vhost_poll_stop(&dev->vqs[i]->poll);
   615				vhost_poll_flush(&dev->vqs[i]->poll);
   616			}
   617		}
   618	}
   619	EXPORT_SYMBOL_GPL(vhost_dev_stop);
   620	
   621	static void vhost_clear_msg(struct vhost_dev *dev)
   622	{
   623		struct vhost_msg_node *node, *n;
   624	
   625		spin_lock(&dev->iotlb_lock);
   626	
   627		list_for_each_entry_safe(node, n, &dev->read_list, node) {
   628			list_del(&node->node);
   629			kfree(node);
   630		}
   631	
   632		list_for_each_entry_safe(node, n, &dev->pending_list, node) {
   633			list_del(&node->node);
   634			kfree(node);
   635		}
   636	
   637		spin_unlock(&dev->iotlb_lock);
   638	}
   639	
   640	void vhost_dev_cleanup(struct vhost_dev *dev)
   641	{
   642		int i;
   643	
   644		for (i = 0; i < dev->nvqs; ++i) {
   645			if (dev->vqs[i]->error_ctx)
   646				eventfd_ctx_put(dev->vqs[i]->error_ctx);
   647			if (dev->vqs[i]->kick)
   648				fput(dev->vqs[i]->kick);
   649			if (dev->vqs[i]->call_ctx)
   650				eventfd_ctx_put(dev->vqs[i]->call_ctx);
   651			vhost_vq_reset(dev, dev->vqs[i]);
   652		}
   653		vhost_dev_free_iovecs(dev);
   654		if (dev->log_ctx)
   655			eventfd_ctx_put(dev->log_ctx);
   656		dev->log_ctx = NULL;
   657		/* No one will access memory at this point */
   658		vhost_iotlb_free(dev->umem);
   659		dev->umem = NULL;
 > 660		vhost_iotlb_free(dev->iotlb);
   661		dev->iotlb = NULL;
   662		vhost_clear_msg(dev);
   663		wake_up_interruptible_poll(&dev->wait, EPOLLIN | EPOLLRDNORM);
   664		WARN_ON(!llist_empty(&dev->work_list));
   665		if (dev->worker) {
   666			kthread_stop(dev->worker);
   667			dev->worker = NULL;
   668			dev->kcov_handle = 0;
   669		}
   670		if (dev->mm)
   671			mmput(dev->mm);
   672		dev->mm = NULL;
   673	}
   674	EXPORT_SYMBOL_GPL(vhost_dev_cleanup);
   675	

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 32011 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 5/5] vdpasim: vDPA device simulator
  2020-01-16 12:42 ` [PATCH 5/5] vdpasim: vDPA device simulator Jason Wang
  2020-01-16 15:47   ` Jason Gunthorpe
  2020-01-17  4:12   ` Randy Dunlap
@ 2020-01-18 18:18   ` kbuild test robot
  2020-01-28  3:32   ` Dan Carpenter
  2020-02-04  8:21   ` Zhu Lingshan
  4 siblings, 0 replies; 76+ messages in thread
From: kbuild test robot @ 2020-01-18 18:18 UTC (permalink / raw)
  To: Jason Wang
  Cc: kbuild-all, mst, jasowang, linux-kernel, kvm, virtualization,
	netdev, tiwei.bie, jgg, maxime.coquelin, cunming.liang,
	zhihong.wang, rob.miller, xiao.w.wang, haotian.wang,
	lingshan.zhu, eperezma, lulu, parav, kevin.tian, stefanha,
	rdunlap, hch, aadam, jakub.kicinski, jiri, shahafs, hanand,
	mhabets


[-- Attachment #1: Type: text/plain, Size: 2415 bytes --]

Hi Jason,

I love your patch! Yet something to improve:

[auto build test ERROR on vhost/linux-next]
[also build test ERROR on linux/master linus/master v5.5-rc6 next-20200117]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Jason-Wang/vDPA-support/20200117-170243
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: m68k-allmodconfig (attached as .config)
compiler: m68k-linux-gcc (GCC) 7.5.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.5.0 make.cross ARCH=m68k 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   drivers/virtio/vdpa/vdpa_sim.c: In function 'vdpasim_queue_ready':
>> drivers/virtio/vdpa/vdpa_sim.c:101:19: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
               false, (struct vring_desc *)vq->desc_addr,
                      ^
   drivers/virtio/vdpa/vdpa_sim.c:102:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
        (struct vring_avail *)vq->driver_addr,
        ^
   drivers/virtio/vdpa/vdpa_sim.c:103:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
        (struct vring_used *)vq->device_addr);
        ^
--
>> ERROR: "vhost_iotlb_free" [drivers/virtio//vdpa/vdpa_sim.ko] undefined!
>> ERROR: "vhost_iotlb_alloc" [drivers/virtio//vdpa/vdpa_sim.ko] undefined!
>> ERROR: "vhost_iotlb_itree_next" [drivers/virtio//vdpa/vdpa_sim.ko] undefined!
>> ERROR: "vhost_iotlb_itree_first" [drivers/virtio//vdpa/vdpa_sim.ko] undefined!
>> ERROR: "vhost_iotlb_reset" [drivers/virtio//vdpa/vdpa_sim.ko] undefined!
>> ERROR: "vhost_iotlb_add_range" [drivers/virtio//vdpa/vdpa_sim.ko] undefined!
>> ERROR: "vhost_iotlb_del_range" [drivers/virtio//vdpa/vdpa_sim.ko] undefined!

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 51799 bytes --]

^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [PATCH 3/5] vDPA: introduce vDPA bus
       [not found]       ` <CAJPjb1+fG9L3=iKbV4Vn13VwaeDZZdcfBPvarogF_Nzhk+FnKg@mail.gmail.com>
@ 2020-01-19  9:07         ` Shahaf Shuler
  2020-01-19  9:59           ` Michael S. Tsirkin
  2020-01-20  8:43           ` Jason Wang
  2020-01-20  8:19         ` Jason Wang
  1 sibling, 2 replies; 76+ messages in thread
From: Shahaf Shuler @ 2020-01-19  9:07 UTC (permalink / raw)
  To: Rob Miller, Jason Wang
  Cc: Michael S. Tsirkin, linux-kernel, kvm, virtualization, Netdev,
	Bie, Tiwei, Jason Gunthorpe, maxime.coquelin, Liang, Cunming,
	Wang, Zhihong, Wang, Xiao W, haotian.wang, Zhu, Lingshan,
	eperezma, lulu, Parav Pandit, Tian, Kevin, stefanha, rdunlap,
	hch, Ariel Adam, jakub.kicinski, Jiri Pirko, hanand, mhabets

Friday, January 17, 2020 4:13 PM, Rob Miller:
Subject: Re: [PATCH 3/5] vDPA: introduce vDPA bus
>>On 2020/1/17 8:13 PM, Michael S. Tsirkin wrote:
>>> On Thu, Jan 16, 2020 at 08:42:29PM +0800, Jason Wang wrote:

[...]

>>> + * @set_map:                        Set device memory mapping, optional
>>> + *                          and only needed for device that using
>>> + *                          device specific DMA translation
>>> + *                          (on-chip IOMMU)
>>> + *                          @vdev: vdpa device
>>> + *                          @iotlb: vhost memory mapping to be
>>> + *                          used by the vDPA
>>> + *                          Returns integer: success (0) or error (< 0)
>> OK so any change just swaps in a completely new mapping?
>> Wouldn't this make minor changes such as memory hotplug
>> quite expensive?

What is the concern? Traversing the rb tree, or fully replacing the on-chip IOMMU translations?
If the latter, then I think we can do such an optimization at the driver level (i.e. update only the diff between the two mappings).
If the former, then I think memory hotplug is a heavy flow regardless. Do you think the extra cycles for the tree traversal will be visible in any way?

>
>My understanding is that the incremental updating of the on chip IOMMU 
>may degrade the  performance. So vendor vDPA drivers may want to know 
>all the mappings at once. 

Yes, exactly. In the Mellanox case, for instance, many optimizations can be performed on a given memory layout.

>Technically, we can keep the incremental API 
>here and let the vendor vDPA drivers to record the full mapping 
>internally which may slightly increase the complexity of vendor driver. 

What will be the trigger for the driver to know it has received the last mapping in this series and can now push it to the on-chip IOMMU?

>We need more inputs from vendors here.
>
>Thanks



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-19  9:07         ` Shahaf Shuler
@ 2020-01-19  9:59           ` Michael S. Tsirkin
  2020-01-20  8:44             ` Jason Wang
  2020-01-20  8:43           ` Jason Wang
  1 sibling, 1 reply; 76+ messages in thread
From: Michael S. Tsirkin @ 2020-01-19  9:59 UTC (permalink / raw)
  To: Shahaf Shuler
  Cc: Rob Miller, Jason Wang, linux-kernel, kvm, virtualization,
	Netdev, Bie, Tiwei, Jason Gunthorpe, maxime.coquelin, Liang,
	Cunming, Wang, Zhihong, Wang, Xiao W, haotian.wang, Zhu,
	Lingshan, eperezma, lulu, Parav Pandit, Tian, Kevin, stefanha,
	rdunlap, hch, Ariel Adam, jakub.kicinski, Jiri Pirko, hanand,
	mhabets

On Sun, Jan 19, 2020 at 09:07:09AM +0000, Shahaf Shuler wrote:
> >Technically, we can keep the incremental API 
> >here and let the vendor vDPA drivers to record the full mapping 
> >internally which may slightly increase the complexity of vendor driver. 
> 
> What will be the trigger for the driver to know it has received the last mapping in this series and can now push it to the on-chip IOMMU?

Some kind of invalidate API?
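One possible reading of this suggestion, sketched in userspace C: incremental add/delete calls only update a shadow table kept by the vendor driver, and a separate commit call marks the end of a batch, at which point the full layout is pushed to the device once. Every name here (`struct shadow_iommu`, `shadow_add()`, `shadow_del()`, `shadow_commit()`) is hypothetical and none of it is part of the proposed vdpa_config_ops; the "hardware" is just a second array.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPS 8

struct mapping { uint64_t iova, size, pa; };

struct shadow_iommu {
	struct mapping pending[MAX_MAPS];	/* driver-side shadow copy */
	size_t npending;
	struct mapping hw[MAX_MAPS];		/* stand-in for the on-chip table */
	size_t nhw;
	unsigned int hw_updates;		/* how often the device was touched */
};

/* Record a new mapping; nothing is pushed to the device yet. */
static int shadow_add(struct shadow_iommu *s, uint64_t iova, uint64_t size,
		      uint64_t pa)
{
	if (s->npending == MAX_MAPS)
		return -1;
	s->pending[s->npending++] = (struct mapping){ iova, size, pa };
	return 0;
}

/* Drop the mapping starting at iova from the shadow copy. */
static int shadow_del(struct shadow_iommu *s, uint64_t iova)
{
	for (size_t i = 0; i < s->npending; i++) {
		if (s->pending[i].iova == iova) {
			/* Order is irrelevant here, so fill the hole with the last entry. */
			s->pending[i] = s->pending[--s->npending];
			return 0;
		}
	}
	return -1;
}

/* End of batch: push the complete layout to the device in one go. */
static void shadow_commit(struct shadow_iommu *s)
{
	for (size_t i = 0; i < s->npending; i++)
		s->hw[i] = s->pending[i];
	s->nhw = s->npending;
	s->hw_updates++;
}
```

In this shape the core API stays incremental, while a vendor driver that wants the whole memory layout at once sees it exactly when the commit arrives.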

-- 
MST


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-17 13:54       ` Jason Gunthorpe
@ 2020-01-20  7:50         ` Jason Wang
  2020-01-20 12:17         ` Michael S. Tsirkin
  1 sibling, 0 replies; 76+ messages in thread
From: Jason Wang @ 2020-01-20  7:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mst, linux-kernel, kvm, virtualization, netdev, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller,
	xiao.w.wang, haotian.wang, lingshan.zhu, eperezma, lulu,
	Parav Pandit, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, Jiri Pirko, Shahaf Shuler, hanand, mhabets


On 2020/1/17 9:54 PM, Jason Gunthorpe wrote:
> On Fri, Jan 17, 2020 at 11:03:12AM +0800, Jason Wang wrote:
>> On 2020/1/16 11:22 PM, Jason Gunthorpe wrote:
>>> On Thu, Jan 16, 2020 at 08:42:29PM +0800, Jason Wang wrote:
>>>> vDPA device is a device that uses a datapath which complies with the
>>>> virtio specifications with vendor specific control path. vDPA devices
>>>> can be both physically located on the hardware or emulated by
>>>> software. vDPA hardware devices are usually implemented through PCIE
>>>> with the following types:
>>>>
>>>> - PF (Physical Function) - A single Physical Function
>>>> - VF (Virtual Function) - Device that supports single root I/O
>>>>     virtualization (SR-IOV). Its Virtual Function (VF) represents a
>>>>     virtualized instance of the device that can be assigned to different
>>>>     partitions
>>>> - VDEV (Virtual Device) - With technologies such as Intel Scalable
>>>>     IOV, a virtual device composed by host OS utilizing one or more
>>>>     ADIs.
>>>> - SF (Sub function) - Vendor specific interface to slice the Physical
>>>>     Function to multiple sub functions that can be assigned to different
>>>>     partitions as virtual devices.
>>> I really hope we don't end up with two different ways to spell this
>>> same thing.
>> I think you meant ADI vs SF. It looks to me that ADI is limited to the scope
>> of scalable IOV but SF not.
> I think if one looks carefully you'd find that SF and ADI are using
> very similar techniques. For instance we'd also like to use the code
> reorg of the MSIX vector setup with SFs that Intel is calling IMS.
>
> Really SIOV is simply a bundle of pre-existing stuff under a tidy
> name, whatever code skeleton we come up with for SFs should be re-used
> for ADI.


Ok, but do you prefer to mention ADI only for the next version?


>
>>> Shouldn't there be a device/driver matching process of some kind?
>>
>> The question is what we want to match here.
>>
>> 1) "virtio" vs "vhost", I implemented matching method for this in mdev
>> series, but it looks unnecessary for vDPA device driver to know about this.
>> Anyway we can use sysfs driver bind/unbind to switch drivers
>> 2) virtio device id and vendor id. I'm not sure we need this considering the
>> two drivers so far (virtio/vhost) are all bus drivers.
> As we seem to be contemplating some dynamic creation of vdpa devices I
> think upon creation time it should be specified what mode they should
> run in, and then all driver binding and autoloading should happen
> automatically. Telling the user to bind/unbind is a very poor
> experience.
>
> Jason


Ok, I will add the type (virtio vs vhost) and driver matching method back.
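As a rough illustration of type-based matching, assuming each device carries the mode chosen at creation time and each bus driver declares the mode it serves: the sketch below is userspace C with hypothetical names (`enum sk_type`, `struct sk_device`, `struct sk_driver`, `sk_match()`, `sk_find_driver()`), and does not show the match callback of the actual series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mode selected when the device is created. */
enum sk_type { SK_TYPE_VIRTIO, SK_TYPE_VHOST };

struct sk_device { enum sk_type type; };
struct sk_driver { const char *name; enum sk_type type; };

/* A driver is eligible only if it serves the device's mode. */
static bool sk_match(const struct sk_device *dev, const struct sk_driver *drv)
{
	return dev->type == drv->type;
}

/* Walk the registered drivers and return the first match, or NULL. */
static const struct sk_driver *sk_find_driver(const struct sk_device *dev,
					      const struct sk_driver *drivers,
					      size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (sk_match(dev, &drivers[i]))
			return &drivers[i];
	}
	return NULL;
}
```

With a check of this kind on the bus, binding follows from the type given at creation, and the user no longer has to bind or unbind by hand through sysfs.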

Thanks


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 4/5] virtio: introduce a vDPA based transport
  2020-01-17 14:00       ` Jason Gunthorpe
@ 2020-01-20  7:52         ` Jason Wang
  0 siblings, 0 replies; 76+ messages in thread
From: Jason Wang @ 2020-01-20  7:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mst, linux-kernel, kvm, virtualization, netdev, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller,
	xiao.w.wang, haotian.wang, lingshan.zhu, eperezma, lulu,
	Parav Pandit, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, Jiri Pirko, Shahaf Shuler, hanand, mhabets


On 2020/1/17 10:00 PM, Jason Gunthorpe wrote:
> On Fri, Jan 17, 2020 at 05:32:35PM +0800, Jason Wang wrote:
>>>> +	const struct vdpa_config_ops *ops = vdpa->config;
>>>> +	struct virtio_vdpa_device *vd_dev;
>>>> +	int rc;
>>>> +
>>>> +	vd_dev = devm_kzalloc(dev, sizeof(*vd_dev), GFP_KERNEL);
>>>> +	if (!vd_dev)
>>>> +		return -ENOMEM;
>>> This is not right, the struct device lifetime is controlled by a kref,
>>> not via devm. If you want to use a devm unwind then the unwind is
>>> put_device, not devm_kfree.
>> I'm not sure I get the point here. The lifetime is bound to the underlying vDPA
>> device, and devres allows it to be freed before the vdpa device is released. But
>> I agree using devres of the underlying vdpa device looks weird.
> Once device_initialize is called the only way to free a struct device
> is via put_device, while here you have a devm trigger that will
> unconditionally do kfree on a struct device without respecting the
> reference count.
>
> reference counted memory must never be allocated with devm.


Right, fixed.
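The rule being agreed on here, shown as a userspace sketch: memory that carries a reference count is freed only by its release callback, which runs when the last reference is dropped, and never by an unconditional free tied to some other object's lifetime. The names (`struct obj`, `obj_alloc()`, `obj_get()`, `obj_put()`) are hypothetical and only mimic the shape of kref and put_device(); the counter is not atomic, so this is single-threaded illustration only.

```c
#include <assert.h>
#include <stdlib.h>

struct obj {
	unsigned int refs;
	void (*release)(struct obj *obj);
};

/* Counts how many objects have been released, for demonstration. */
static int released_count;

/* The only place where the memory is freed. */
static void obj_release(struct obj *obj)
{
	released_count++;
	free(obj);
}

/* Allocate with one reference held by the caller. */
static struct obj *obj_alloc(void)
{
	struct obj *obj = calloc(1, sizeof(*obj));

	if (!obj)
		return NULL;
	obj->refs = 1;
	obj->release = obj_release;
	return obj;
}

static void obj_get(struct obj *obj)
{
	assert(obj->refs > 0);
	obj->refs++;
}

/* Drop one reference; the last put triggers the release callback. */
static void obj_put(struct obj *obj)
{
	assert(obj->refs > 0);
	if (--obj->refs == 0)
		obj->release(obj);
}
```

An error path after allocation therefore unwinds with `obj_put()`, the analogue of put_device(), so any other holder of a reference keeps a valid object until it drops its own.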


>
>>>> +	vd_dev->vdev.dev.release = virtio_vdpa_release_dev;
>>>> +	vd_dev->vdev.config = &virtio_vdpa_config_ops;
>>>> +	vd_dev->vdpa = vdpa;
>>>> +	INIT_LIST_HEAD(&vd_dev->virtqueues);
>>>> +	spin_lock_init(&vd_dev->lock);
>>>> +
>>>> +	vd_dev->vdev.id.device = ops->get_device_id(vdpa);
>>>> +	if (vd_dev->vdev.id.device == 0)
>>>> +		return -ENODEV;
>>>> +
>>>> +	vd_dev->vdev.id.vendor = ops->get_vendor_id(vdpa);
>>>> +	rc = register_virtio_device(&vd_dev->vdev);
>>>> +	if (rc)
>>>> +		put_device(dev);
>>> And a ugly unwind like this is why you want to have device_initialize()
>>> exposed to the driver,
>> In this context, which "driver" did you mean here? (Note, virtio-vdpa is the
>> driver for vDPA bus here).
> 'driver' is the thing using the 'core' library calls to implement a
> device, so here the 'vd_dev' is the driver and
> 'register_virtio_device' is the core


Ok.


>
>>> Where are the various THIS_MODULEs I expect to see in a scheme like
>>> this?
>>>
>>> All function pointers must be protected by a held module reference
>>> count, ie the above probe/remove and all the pointers in ops.
>> Will double check, since I don't see this in other virtio transport drivers
>> (PCI or MMIO).
> pci_register_driver is a macro that provides a THIS_MODULE, and the
> pci core code sets driver.owner, then the rest of the stuff related to
> driver ops is supposed to work against that to protect the driver ops.
>
> For the device module refcounting you either need to ensure that
> 'unregister' is a strong fence and guarantees that no device ops are
> called past unregister (noting that this is impossible for release),
> or you need to hold the module lock until release.
>
> It is common to see non-core subsystems get this stuff wrong.
>
> Jason


Ok. I see.

Thanks


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 5/5] vdpasim: vDPA device simulator
  2020-01-17 14:10       ` Jason Gunthorpe
@ 2020-01-20  8:01         ` Jason Wang
  2020-02-04  4:19         ` Jason Wang
  1 sibling, 0 replies; 76+ messages in thread
From: Jason Wang @ 2020-01-20  8:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: mst, linux-kernel, kvm, virtualization, netdev, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller,
	xiao.w.wang, haotian.wang, lingshan.zhu, eperezma, lulu,
	Parav Pandit, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, Jiri Pirko, Shahaf Shuler, hanand, mhabets, kuba


On 2020/1/17 10:10 PM, Jason Gunthorpe wrote:
> On Fri, Jan 17, 2020 at 05:32:39PM +0800, Jason Wang wrote:
>> On 2020/1/16 11:47 PM, Jason Gunthorpe wrote:
>>> On Thu, Jan 16, 2020 at 08:42:31PM +0800, Jason Wang wrote:
>>>> This patch implements a software vDPA networking device. The datapath
>>>> is implemented through vringh and workqueue. The device has an on-chip
>>>> IOMMU which translates IOVA to PA. For kernel virtio drivers, vDPA
>>>> simulator driver provides dma_ops. For vhost drivers, set_map() methods
>>>> of vdpa_config_ops is implemented to accept mappings from vhost.
>>>>
>>>> A sysfs based management interface is implemented, devices are
>>>> created and removed through:
>>>>
>>>> /sys/devices/virtual/vdpa_simulator/netdev/{create|remove}
>>> This is very gross, creating a class just to get a create/remove and
>>> then not using the class for anything else? Yuk.
>>
>> It includes more information, e.g the devices and the link from vdpa_sim
>> device and vdpa device.
> I feel like regardless of how the device is created there should be a
> consistent virtio centric management for post-creation tasks, such as
> introspection and destruction


Right. Actually, this could be done through sysfs as well, by having an
intermediate step such as "activate" and introducing attributes for
post-creation tasks.


>
> A virtio struct device should already have back pointers to its parent
> device, which should be enough to discover the vdpa_sim, none of the
> extra sysfs munging should be needed.
>
>>>> Netlink based lifecycle management could be implemented for vDPA
>>>> simulator as well.
>>> This is just begging for a netlink based approach.
>>>
>>> Certainly netlink driven removal should be an agreeable standard for
>>> all devices, I think.
>>
>> Well, I think Parav had some proposals during the discussion of mdev
>> approach. But I'm not sure if he had any RFC codes for me to integrate it
>> into vdpasim.
>>
>> Or do you want me to propose the netlink API? If yes, would you prefer a
>> new virtio dedicated one or a subset of devlink?
> Well, let's see what feedback Parav has
>
> Jason


Ok.

Thanks


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
       [not found]       ` <CAJPjb1+fG9L3=iKbV4Vn13VwaeDZZdcfBPvarogF_Nzhk+FnKg@mail.gmail.com>
  2020-01-19  9:07         ` Shahaf Shuler
@ 2020-01-20  8:19         ` Jason Wang
  1 sibling, 0 replies; 76+ messages in thread
From: Jason Wang @ 2020-01-20  8:19 UTC (permalink / raw)
  To: Rob Miller
  Cc: Michael S. Tsirkin, linux-kernel, kvm, virtualization, Netdev,
	Bie, Tiwei, Jason Gunthorpe, maxime.coquelin, Liang, Cunming,
	Wang, Zhihong, Wang, Xiao W, haotian.wang, Zhu, Lingshan,
	eperezma, lulu, Parav Pandit, Tian, Kevin, stefanha, rdunlap,
	hch, Ariel Adam, jakub.kicinski, Jiri Pirko, shahafs, hanand,
	mhabets


On 2020/1/17 下午10:12, Rob Miller wrote:
>
>
>     >> + * @get_vendor_id:          Get id for the vendor that provides this device
>     >> + *                          @vdev: vdpa device
>     >> + *                          Returns u32: virtio vendor id
>     > what's the idea behind this? userspace normally doesn't interact with
>     > this ... debugging?
>
>
>     This allows some vendor specific driver on top of vDPA bus. If this is
>     not of interest, I can drop this.
>
> RJM>] wouldn't usage of get_device_id & get_vendor_id, beyond
> reporting, tend to lead to vendor specific code in the
> framework instead of leaving the framework agnostic and leaving the
> vendor specific stuff to the vendor's vDPA device driver?


For the virtio device id, I think it is needed so that kernel/userspace
knows which driver to load (e.g. loading virtio-net for a networking device).

For the virtio vendor id, it is needed by the kernel virtio driver, since
the virtio bus can match drivers based on the virtio vendor id. So it doesn't
prevent a third-party vendor specific driver for a virtio device.

Maybe we can report VIRTIO_DEV_ANY_ID as the vendor id to forbid vendor
specific stuff.

Thanks



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-19  9:07         ` Shahaf Shuler
  2020-01-19  9:59           ` Michael S. Tsirkin
@ 2020-01-20  8:43           ` Jason Wang
  2020-01-20 17:49             ` Jason Gunthorpe
  1 sibling, 1 reply; 76+ messages in thread
From: Jason Wang @ 2020-01-20  8:43 UTC (permalink / raw)
  To: Shahaf Shuler, Rob Miller
  Cc: Michael S. Tsirkin, linux-kernel, kvm, virtualization, Netdev,
	Bie, Tiwei, Jason Gunthorpe, maxime.coquelin, Liang, Cunming,
	Wang, Zhihong, Wang, Xiao W, haotian.wang, Zhu, Lingshan,
	eperezma, lulu, Parav Pandit, Tian, Kevin, stefanha, rdunlap,
	hch, Ariel Adam, jakub.kicinski, Jiri Pirko, hanand, mhabets


On 2020/1/19 下午5:07, Shahaf Shuler wrote:
> Friday, January 17, 2020 4:13 PM, Rob Miller:
> Subject: Re: [PATCH 3/5] vDPA: introduce vDPA bus
>>> On 2020/1/17 下午8:13, Michael S. Tsirkin wrote:
>>>> On Thu, Jan 16, 2020 at 08:42:29PM +0800, Jason Wang wrote:
> [...]
>
>>>> + * @set_map:                        Set device memory mapping, optional
>>>> + *                          and only needed for device that using
>>>> + *                          device specific DMA translation
>>>> + *                          (on-chip IOMMU)
>>>> + *                          @vdev: vdpa device
>>>> + *                          @iotlb: vhost memory mapping to be
>>>> + *                          used by the vDPA
>>>> + *                          Returns integer: success (0) or error (< 0)
>>> OK so any change just swaps in a completely new mapping?
>>> Wouldn't this make minor changes such as memory hotplug
>>> quite expensive?
> What is the concern? Traversing the rb tree or fully replacing the on-chip IOMMU translations?
> If the latter, then I think we can do such an optimization at the driver level (i.e. update only the diff between the two mappings).


This is similar to the design of the platform IOMMU part of vhost-vdpa. We
decided to send diffs to the platform IOMMU there. If it's OK to do that in
the driver, we can replace set_map with an incremental API like map()/unmap().

Then the driver needs to maintain the rbtree itself.
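
To illustrate what "the driver maintains the mapping itself" could look
like with an incremental API, here is a minimal userspace sketch. It is
only an illustration under assumptions: sketch_table, sketch_dma_map() and
sketch_dma_unmap() are hypothetical names that are not part of this series,
a small fixed array stands in for the rbtree, and address overflow is not
handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_MAPS 16

/* One translation entry; hypothetical layout for illustration only. */
struct sketch_map {
	uint64_t iova;	/* first IOVA of the range */
	uint64_t size;	/* length in bytes */
	uint64_t pa;	/* translation target */
};

/* Driver-private table; an array stands in for the rbtree here. */
struct sketch_table {
	struct sketch_map maps[SKETCH_MAX_MAPS];
	size_t nmaps;
};

/* Incremental add: returns 0 on success, -1 on overlap or a full table. */
static int sketch_dma_map(struct sketch_table *t, uint64_t iova,
			  uint64_t size, uint64_t pa)
{
	size_t i;

	assert(size > 0);
	if (t->nmaps == SKETCH_MAX_MAPS)
		return -1;
	for (i = 0; i < t->nmaps; i++) {
		const struct sketch_map *m = &t->maps[i];

		/* Two half-open ranges overlap if each starts before the other ends. */
		if (iova < m->iova + m->size && m->iova < iova + size)
			return -1;
	}
	t->maps[t->nmaps++] = (struct sketch_map){ iova, size, pa };
	return 0;
}

/* Incremental remove of an exact range: returns 0 on success, -1 if absent. */
static int sketch_dma_unmap(struct sketch_table *t, uint64_t iova,
			    uint64_t size)
{
	size_t i;

	for (i = 0; i < t->nmaps; i++) {
		if (t->maps[i].iova == iova && t->maps[i].size == size) {
			/* Fill the hole with the last entry; order is not kept. */
			t->maps[i] = t->maps[--t->nmaps];
			return 0;
		}
	}
	return -1;
}
```

With set_map() the core hands over the whole table at once; with this
shape each call carries a single change and the bookkeeping moves into the
vendor driver.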


> If the first one, then I think memory hotplug is a heavy flow regardless. Do you think the extra cycles for the tree traverse will be visible in any way?


I think if the driver can pause the DMA while the new mapping is being
set up, it should be fine.


>   
>
>> My understanding is that the incremental updating of the on chip IOMMU
>> may degrade the  performance. So vendor vDPA drivers may want to know
>> all the mappings at once.
> Yes, exactly. In the Mellanox case, for instance, many optimizations can be performed on a given memory layout.
>
>> Technically, we can keep the incremental API
>> here and let the vendor vDPA drivers to record the full mapping
>> internally which may slightly increase the complexity of vendor driver.
> What will be the trigger for the driver to know it received the last mapping on this series and it can now push it to the on-chip IOMMU?


For the GPA->HVA(HPA) mapping, we can have a flag for this.

But for the GIOVA->HVA(HPA) mapping, which could be changed by the guest, it
looks to me like there's no concept of a "last mapping". I guess in this
case the mappings need to be set up from scratch. This could be expensive,
but considering that most applications use static mappings (e.g. dpdk in
the guest), it should be OK.
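
A minimal userspace sketch of the "flag" idea, just to make the difference
between the two cases concrete. The names sketch_batch and sketch_update()
are hypothetical and not part of this series; the counter only stands in
for programming the on-chip IOMMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state for batching mapping updates. */
struct sketch_batch {
	size_t pending;	/* updates recorded but not yet pushed */
	size_t pushes;	/* times the table was pushed to the device */
};

/* Record one update; push only when the caller marks it as the last one. */
static void sketch_update(struct sketch_batch *b, bool last)
{
	assert(b != NULL);
	b->pending++;
	if (last) {
		/* A real driver would program the on-chip IOMMU here. */
		b->pushes++;
		b->pending = 0;
	}
}
```

For a GPA based layout the caller can set the flag once at the end of a
series, so several updates cost one push. With guest-driven GIOVA changes
every update would have to carry the flag, so each one costs a push.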

Thanks


>
>> We need more inputs from vendors here.
>>
>> Thanks
>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-19  9:59           ` Michael S. Tsirkin
@ 2020-01-20  8:44             ` Jason Wang
  2020-01-20 12:09               ` Michael S. Tsirkin
  0 siblings, 1 reply; 76+ messages in thread
From: Jason Wang @ 2020-01-20  8:44 UTC (permalink / raw)
  To: Michael S. Tsirkin, Shahaf Shuler
  Cc: Rob Miller, linux-kernel, kvm, virtualization, Netdev, Bie,
	Tiwei, Jason Gunthorpe, maxime.coquelin, Liang, Cunming, Wang,
	Zhihong, Wang, Xiao W, haotian.wang, Zhu, Lingshan, eperezma,
	lulu, Parav Pandit, Tian, Kevin, stefanha, rdunlap, hch,
	Ariel Adam, jakub.kicinski, Jiri Pirko, hanand, mhabets


On 2020/1/19 下午5:59, Michael S. Tsirkin wrote:
> On Sun, Jan 19, 2020 at 09:07:09AM +0000, Shahaf Shuler wrote:
>>> Technically, we can keep the incremental API
>>> here and let the vendor vDPA drivers to record the full mapping
>>> internally which may slightly increase the complexity of vendor driver.
>> What will be the trigger for the driver to know it received the last mapping on this series and it can now push it to the on-chip IOMMU?
> Some kind of invalidate API?
>

The problem is how to deal with the case of vIOMMU. When vIOMMU is
enabled there's no concept of a last mapping.

Thanks


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-20  8:44             ` Jason Wang
@ 2020-01-20 12:09               ` Michael S. Tsirkin
  2020-01-21  3:32                 ` Jason Wang
  0 siblings, 1 reply; 76+ messages in thread
From: Michael S. Tsirkin @ 2020-01-20 12:09 UTC (permalink / raw)
  To: Jason Wang
  Cc: Shahaf Shuler, Rob Miller, linux-kernel, kvm, virtualization,
	Netdev, Bie, Tiwei, Jason Gunthorpe, maxime.coquelin, Liang,
	Cunming, Wang, Zhihong, Wang, Xiao W, haotian.wang, Zhu,
	Lingshan, eperezma, lulu, Parav Pandit, Tian, Kevin, stefanha,
	rdunlap, hch, Ariel Adam, jakub.kicinski, Jiri Pirko, hanand,
	mhabets

On Mon, Jan 20, 2020 at 04:44:34PM +0800, Jason Wang wrote:
> 
> On 2020/1/19 下午5:59, Michael S. Tsirkin wrote:
> > On Sun, Jan 19, 2020 at 09:07:09AM +0000, Shahaf Shuler wrote:
> > > > Technically, we can keep the incremental API
> > > > here and let the vendor vDPA drivers to record the full mapping
> > > > internally which may slightly increase the complexity of vendor driver.
> > > What will be the trigger for the driver to know it received the last mapping on this series and it can now push it to the on-chip IOMMU?
> > Some kind of invalidate API?
> > 
> 
> The problem is how to deal with the case of vIOMMU. When vIOMMU is enabling
> there's no concept of last mapping.
> 
> Thanks

Most IOMMUs have a translation cache so have an invalidate API too.

-- 
MST


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-17 13:54       ` Jason Gunthorpe
  2020-01-20  7:50         ` Jason Wang
@ 2020-01-20 12:17         ` Michael S. Tsirkin
  2020-01-20 17:50           ` Jason Gunthorpe
  1 sibling, 1 reply; 76+ messages in thread
From: Michael S. Tsirkin @ 2020-01-20 12:17 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jason Wang, linux-kernel, kvm, virtualization, netdev, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller,
	xiao.w.wang, haotian.wang, lingshan.zhu, eperezma, lulu,
	Parav Pandit, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, Jiri Pirko, Shahaf Shuler, hanand, mhabets

On Fri, Jan 17, 2020 at 01:54:42PM +0000, Jason Gunthorpe wrote:
> > 1) "virtio" vs "vhost", I implemented matching method for this in mdev
> > series, but it looks unnecessary for vDPA device driver to know about this.
> > Anyway we can use sysfs driver bind/unbind to switch drivers
> > 2) virtio device id and vendor id. I'm not sure we need this considering the
> > two drivers so far (virtio/vhost) are all bus drivers.
> 
> As we seem to be contemplating some dynamic creation of vdpa devices I
> think upon creation time it should be specified what mode they should
> run it and then all driver binding and autoloading should happen
> automatically. Telling the user to bind/unbind is a very poor
> experience.

Maybe but OTOH it's an existing interface. I think we can reasonably
start with bind/unbind and then add ability to specify
the mode later. bind/unbind come from core so they will be
maintained anyway.
-- 
MST


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-20  8:43           ` Jason Wang
@ 2020-01-20 17:49             ` Jason Gunthorpe
  2020-01-20 20:51               ` Shahaf Shuler
                                 ` (2 more replies)
  0 siblings, 3 replies; 76+ messages in thread
From: Jason Gunthorpe @ 2020-01-20 17:49 UTC (permalink / raw)
  To: Jason Wang
  Cc: Shahaf Shuler, Rob Miller, Michael S. Tsirkin, linux-kernel, kvm,
	virtualization, Netdev, Bie, Tiwei, maxime.coquelin, Liang,
	Cunming, Wang, Zhihong, Wang, Xiao W, haotian.wang, Zhu,
	Lingshan, eperezma, lulu, Parav Pandit, Tian, Kevin, stefanha,
	rdunlap, hch, Ariel Adam, jakub.kicinski, Jiri Pirko, hanand,
	mhabets

On Mon, Jan 20, 2020 at 04:43:53PM +0800, Jason Wang wrote:
> This is similar to the design of platform IOMMU part of vhost-vdpa. We
> decide to send diffs to platform IOMMU there. If it's ok to do that in
> driver, we can replace set_map with incremental API like map()/unmap().
> 
> Then driver need to maintain rbtree itself.

I think we really need to see two modes, one where there is a fixed
translation without dynamic vIOMMU driven changes and one that
supports vIOMMU.

There are different optimization goals in the drivers for these two
configurations.

> > If the first one, then I think memory hotplug is a heavy flow
> > regardless. Do you think the extra cycles for the tree traverse
> > will be visible in any way?
> 
> I think if the driver can pause the DMA during the time for setting up new
> mapping, it should be fine.

This is very tricky for any driver if the mapping change hits the
virtio rings. :(

Even an IOMMU-using driver is going to have problems with that..

Jason

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-20 12:17         ` Michael S. Tsirkin
@ 2020-01-20 17:50           ` Jason Gunthorpe
  2020-01-20 21:56             ` Michael S. Tsirkin
  0 siblings, 1 reply; 76+ messages in thread
From: Jason Gunthorpe @ 2020-01-20 17:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, linux-kernel, kvm, virtualization, netdev, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller,
	xiao.w.wang, haotian.wang, lingshan.zhu, eperezma, lulu,
	Parav Pandit, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, Jiri Pirko, Shahaf Shuler, hanand, mhabets

On Mon, Jan 20, 2020 at 07:17:26AM -0500, Michael S. Tsirkin wrote:
> On Fri, Jan 17, 2020 at 01:54:42PM +0000, Jason Gunthorpe wrote:
> > > 1) "virtio" vs "vhost", I implemented matching method for this in mdev
> > > series, but it looks unnecessary for vDPA device driver to know about this.
> > > Anyway we can use sysfs driver bind/unbind to switch drivers
> > > 2) virtio device id and vendor id. I'm not sure we need this consider the
> > > two drivers so far (virtio/vhost) are all bus drivers.
> > 
> > As we seem to be contemplating some dynamic creation of vdpa devices I
> > think upon creation time it should be specified what mode they should
> > run it and then all driver binding and autoloading should happen
> > automatically. Telling the user to bind/unbind is a very poor
> > experience.
> 
> Maybe but OTOH it's an existing interface. I think we can reasonably
> start with bind/unbind and then add ability to specify
> the mode later. bind/unbind come from core so they will be
> maintained anyway.

Existing where? For vfio? vfio is the only thing I am aware of doing
that, and this is not vfio..

Jason

^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-20 17:49             ` Jason Gunthorpe
@ 2020-01-20 20:51               ` Shahaf Shuler
  2020-01-20 21:25                 ` Michael S. Tsirkin
  2020-01-20 21:48               ` Michael S. Tsirkin
  2020-01-21  4:00               ` Jason Wang
  2 siblings, 1 reply; 76+ messages in thread
From: Shahaf Shuler @ 2020-01-20 20:51 UTC (permalink / raw)
  To: Jason Gunthorpe, Jason Wang
  Cc: Rob Miller, Michael S. Tsirkin, linux-kernel, kvm,
	virtualization, Netdev, Bie, Tiwei, maxime.coquelin, Liang,
	Cunming, Wang, Zhihong, Wang, Xiao W, haotian.wang, Zhu,
	Lingshan, eperezma, lulu, Parav Pandit, Tian, Kevin, stefanha,
	rdunlap, hch, Ariel Adam, jakub.kicinski, Jiri Pirko, hanand,
	mhabets

Monday, January 20, 2020 7:50 PM, Jason Gunthorpe:
> Subject: Re: [PATCH 3/5] vDPA: introduce vDPA bus
> 
> On Mon, Jan 20, 2020 at 04:43:53PM +0800, Jason Wang wrote:
> > This is similar to the design of platform IOMMU part of vhost-vdpa. We
> > decide to send diffs to platform IOMMU there. If it's ok to do that in
> > driver, we can replace set_map with incremental API like map()/unmap().
> >
> > Then driver need to maintain rbtree itself.
> 
> I think we really need to see two modes, one where there is a fixed
> translation without dynamic vIOMMU driven changes and one that supports
> vIOMMU.
> 
> There are different optimization goals in the drivers for these two
> configurations.

+1.
It will be best to have one API for static config (i.e. mapping can be set only before virtio device gets active), and one API for dynamic changes that can be set after the virtio device is active. 

> 
> > > If the first one, then I think memory hotplug is a heavy flow
> > > regardless. Do you think the extra cycles for the tree traverse will
> > > be visible in any way?
> >
> > I think if the driver can pause the DMA during the time for setting up
> > new mapping, it should be fine.
> 
> This is very tricky for any driver if the mapping change hits the virtio rings. :(
> 
> Even a IOMMU using driver is going to have problems with that..
> 
> Jason

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-20 20:51               ` Shahaf Shuler
@ 2020-01-20 21:25                 ` Michael S. Tsirkin
  2020-01-20 21:47                   ` Shahaf Shuler
  2020-01-21 14:07                   ` Jason Gunthorpe
  0 siblings, 2 replies; 76+ messages in thread
From: Michael S. Tsirkin @ 2020-01-20 21:25 UTC (permalink / raw)
  To: Shahaf Shuler
  Cc: Jason Gunthorpe, Jason Wang, Rob Miller, linux-kernel, kvm,
	virtualization, Netdev, Bie, Tiwei, maxime.coquelin, Liang,
	Cunming, Wang, Zhihong, Wang, Xiao W, haotian.wang, Zhu,
	Lingshan, eperezma, lulu, Parav Pandit, Tian, Kevin, stefanha,
	rdunlap, hch, Ariel Adam, jakub.kicinski, Jiri Pirko, hanand,
	mhabets

On Mon, Jan 20, 2020 at 08:51:43PM +0000, Shahaf Shuler wrote:
> Monday, January 20, 2020 7:50 PM, Jason Gunthorpe:
> > Subject: Re: [PATCH 3/5] vDPA: introduce vDPA bus
> > 
> > On Mon, Jan 20, 2020 at 04:43:53PM +0800, Jason Wang wrote:
> > > This is similar to the design of platform IOMMU part of vhost-vdpa. We
> > > decide to send diffs to platform IOMMU there. If it's ok to do that in
> > > driver, we can replace set_map with incremental API like map()/unmap().
> > >
> > > Then driver need to maintain rbtree itself.
> > 
> > I think we really need to see two modes, one where there is a fixed
> > translation without dynamic vIOMMU driven changes and one that supports
> > vIOMMU.
> > 
> > There are different optimization goals in the drivers for these two
> > configurations.
> 
> +1.
> It will be best to have one API for static config (i.e. mapping can be
> set only before virtio device gets active), and one API for dynamic
> changes that can be set after the virtio device is active. 

Frankly I don't see when we'd use the static one.
Memory hotplug is enabled for most guests...

> > 
> > > > If the first one, then I think memory hotplug is a heavy flow
> > > > regardless. Do you think the extra cycles for the tree traverse will
> > > > be visible in any way?
> > >
> > > I think if the driver can pause the DMA during the time for setting up
> > > new mapping, it should be fine.
> > 
> > This is very tricky for any driver if the mapping change hits the virtio rings. :(
> > 
> > Even a IOMMU using driver is going to have problems with that..
> > 
> > Jason


^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-20 21:25                 ` Michael S. Tsirkin
@ 2020-01-20 21:47                   ` Shahaf Shuler
  2020-01-20 21:59                     ` Michael S. Tsirkin
  2020-01-21 14:07                   ` Jason Gunthorpe
  1 sibling, 1 reply; 76+ messages in thread
From: Shahaf Shuler @ 2020-01-20 21:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Gunthorpe, Jason Wang, Rob Miller, linux-kernel, kvm,
	virtualization, Netdev, Bie, Tiwei, maxime.coquelin, Liang,
	Cunming, Wang, Zhihong, Wang, Xiao W, haotian.wang, Zhu,
	Lingshan, eperezma, lulu, Parav Pandit, Tian, Kevin, stefanha,
	rdunlap, hch, Ariel Adam, jakub.kicinski, Jiri Pirko, hanand,
	mhabets

Monday, January 20, 2020 11:25 PM, Michael S. Tsirkin:
> Subject: Re: [PATCH 3/5] vDPA: introduce vDPA bus
> 
> On Mon, Jan 20, 2020 at 08:51:43PM +0000, Shahaf Shuler wrote:
> > Monday, January 20, 2020 7:50 PM, Jason Gunthorpe:
> > > Subject: Re: [PATCH 3/5] vDPA: introduce vDPA bus
> > >
> > > On Mon, Jan 20, 2020 at 04:43:53PM +0800, Jason Wang wrote:
> > > > This is similar to the design of platform IOMMU part of
> > > > vhost-vdpa. We decide to send diffs to platform IOMMU there. If
> > > > it's ok to do that in driver, we can replace set_map with incremental API
> like map()/unmap().
> > > >
> > > > Then driver need to maintain rbtree itself.
> > >
> > > I think we really need to see two modes, one where there is a fixed
> > > translation without dynamic vIOMMU driven changes and one that
> > > supports vIOMMU.
> > >
> > > There are different optimization goals in the drivers for these two
> > > configurations.
> >
> > +1.
> > It will be best to have one API for static config (i.e. mapping can be
> > set only before virtio device gets active), and one API for dynamic
> > changes that can be set after the virtio device is active.
> 
> Frankly I don't see when we'd use the static one.
> Memory hotplug is enabled for most guests...

The fact that memory hotplug is enabled doesn't necessarily mean there is no cold-plugged memory on the hot plugged slots.
So is your claim that the majority of guests are deployed w/o any cold-plugged memory?

> 
> > >
> > > > > If the first one, then I think memory hotplug is a heavy flow
> > > > > regardless. Do you think the extra cycles for the tree traverse
> > > > > will be visible in any way?
> > > >
> > > > I think if the driver can pause the DMA during the time for
> > > > setting up new mapping, it should be fine.
> > >
> > > This is very tricky for any driver if the mapping change hits the
> > > virtio rings. :(
> > >
> > > Even a IOMMU using driver is going to have problems with that..
> > >
> > > Jason


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-20 17:49             ` Jason Gunthorpe
  2020-01-20 20:51               ` Shahaf Shuler
@ 2020-01-20 21:48               ` Michael S. Tsirkin
  2020-01-21  4:00               ` Jason Wang
  2 siblings, 0 replies; 76+ messages in thread
From: Michael S. Tsirkin @ 2020-01-20 21:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jason Wang, Shahaf Shuler, Rob Miller, linux-kernel, kvm,
	virtualization, Netdev, Bie, Tiwei, maxime.coquelin, Liang,
	Cunming, Wang, Zhihong, Wang, Xiao W, haotian.wang, Zhu,
	Lingshan, eperezma, lulu, Parav Pandit, Tian, Kevin, stefanha,
	rdunlap, hch, Ariel Adam, jakub.kicinski, Jiri Pirko, hanand,
	mhabets

On Mon, Jan 20, 2020 at 05:49:39PM +0000, Jason Gunthorpe wrote:
> On Mon, Jan 20, 2020 at 04:43:53PM +0800, Jason Wang wrote:
> > This is similar to the design of platform IOMMU part of vhost-vdpa. We
> > decide to send diffs to platform IOMMU there. If it's ok to do that in
> > driver, we can replace set_map with incremental API like map()/unmap().
> > 
> > Then driver need to maintain rbtree itself.
> 
> I think we really need to see two modes, one where there is a fixed
> translation without dynamic vIOMMU driven changes and one that
> supports vIOMMU.
> 
> There are different optimization goals in the drivers for these two
> configurations.
> 
> > > If the first one, then I think memory hotplug is a heavy flow
> > > regardless. Do you think the extra cycles for the tree traverse
> > > will be visible in any way?
> > 
> > I think if the driver can pause the DMA during the time for setting up new
> > mapping, it should be fine.
> 
> This is very tricky for any driver if the mapping change hits the
> virtio rings. :(
> 
> Even a IOMMU using driver is going to have problems with that..
> 
> Jason

I think for starters we can assume this doesn't happen,
so any change doesn't affect any buffers in use.
Certainly true e.g. for memory hotplug.

-- 
MST


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-20 17:50           ` Jason Gunthorpe
@ 2020-01-20 21:56             ` Michael S. Tsirkin
  2020-01-21 14:12               ` Jason Gunthorpe
  0 siblings, 1 reply; 76+ messages in thread
From: Michael S. Tsirkin @ 2020-01-20 21:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jason Wang, linux-kernel, kvm, virtualization, netdev, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller,
	xiao.w.wang, haotian.wang, lingshan.zhu, eperezma, lulu,
	Parav Pandit, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, Jiri Pirko, Shahaf Shuler, hanand, mhabets

On Mon, Jan 20, 2020 at 05:50:55PM +0000, Jason Gunthorpe wrote:
> On Mon, Jan 20, 2020 at 07:17:26AM -0500, Michael S. Tsirkin wrote:
> > On Fri, Jan 17, 2020 at 01:54:42PM +0000, Jason Gunthorpe wrote:
> > > > 1) "virtio" vs "vhost", I implemented matching method for this in mdev
> > > > series, but it looks unnecessary for vDPA device driver to know about this.
> > > > Anyway we can use sysfs driver bind/unbind to switch drivers
> > > > 2) virtio device id and vendor id. I'm not sure we need this consider the
> > > > two drivers so far (virtio/vhost) are all bus drivers.
> > > 
> > > As we seem to be contemplating some dynamic creation of vdpa devices I
> > > think upon creation time it should be specified what mode they should
> > > run it and then all driver binding and autoloading should happen
> > > automatically. Telling the user to bind/unbind is a very poor
> > > experience.
> > 
> > Maybe but OTOH it's an existing interface. I think we can reasonably
> > start with bind/unbind and then add ability to specify
> > the mode later. bind/unbind come from core so they will be
> > maintained anyway.
> 
> Existing where?

Driver core.

> For vfio? vfio is the only thing I am aware doing
> that, and this is not vfio..
> 
> Jason


vfio is not doing anything. Anyone can use a combination
of unbind and driver_override to attach a driver to a device.

It's not a great interface but it's there without any code,
and it will stay there without maintenance overhead
if we later add a nicer one.

-- 
MST


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-20 21:47                   ` Shahaf Shuler
@ 2020-01-20 21:59                     ` Michael S. Tsirkin
  2020-01-21  6:01                       ` Shahaf Shuler
  0 siblings, 1 reply; 76+ messages in thread
From: Michael S. Tsirkin @ 2020-01-20 21:59 UTC (permalink / raw)
  To: Shahaf Shuler
  Cc: Jason Gunthorpe, Jason Wang, Rob Miller, linux-kernel, kvm,
	virtualization, Netdev, Bie, Tiwei, maxime.coquelin, Liang,
	Cunming, Wang, Zhihong, Wang, Xiao W, haotian.wang, Zhu,
	Lingshan, eperezma, lulu, Parav Pandit, Tian, Kevin, stefanha,
	rdunlap, hch, Ariel Adam, jakub.kicinski, Jiri Pirko, hanand,
	mhabets

On Mon, Jan 20, 2020 at 09:47:18PM +0000, Shahaf Shuler wrote:
> Monday, January 20, 2020 11:25 PM, Michael S. Tsirkin:
> > Subject: Re: [PATCH 3/5] vDPA: introduce vDPA bus
> > 
> > On Mon, Jan 20, 2020 at 08:51:43PM +0000, Shahaf Shuler wrote:
> > > Monday, January 20, 2020 7:50 PM, Jason Gunthorpe:
> > > > Subject: Re: [PATCH 3/5] vDPA: introduce vDPA bus
> > > >
> > > > On Mon, Jan 20, 2020 at 04:43:53PM +0800, Jason Wang wrote:
> > > > > This is similar to the design of platform IOMMU part of
> > > > > vhost-vdpa. We decide to send diffs to platform IOMMU there. If
> > > > > it's ok to do that in driver, we can replace set_map with incremental API
> > like map()/unmap().
> > > > >
> > > > > Then driver need to maintain rbtree itself.
> > > >
> > > > I think we really need to see two modes, one where there is a fixed
> > > > translation without dynamic vIOMMU driven changes and one that
> > > > supports vIOMMU.
> > > >
> > > > There are different optimization goals in the drivers for these two
> > > > configurations.
> > >
> > > +1.
> > > It will be best to have one API for static config (i.e. mapping can be
> > > set only before virtio device gets active), and one API for dynamic
> > > changes that can be set after the virtio device is active.
> > 
> > Frankly I don't see when we'd use the static one.
> > Memory hotplug is enabled for most guests...
> 
> The fact memory hotplug is enabled doesn't necessarily means there is not cold-plugged memory on the hot plugged slots. 
> So your claim is majority of guests are deployed w/o any cold-plugged memory? 

Sorry for not being clear. I was merely saying that the dynamic one
can't be optional, while the static one can. So how about we
start with just the dynamic one, then add the static one
as a later optimization?


> > 
> > > >
> > > > > > If the first one, then I think memory hotplug is a heavy flow
> > > > > > regardless. Do you think the extra cycles for the tree traverse
> > > > > > will be visible in any way?
> > > > >
> > > > > I think if the driver can pause the DMA during the time for
> > > > > setting up new mapping, it should be fine.
> > > >
> > > > This is very tricky for any driver if the mapping change hits the
> > > > virtio rings. :(
> > > >
> > > > Even a IOMMU using driver is going to have problems with that..
> > > >
> > > > Jason


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-20 12:09               ` Michael S. Tsirkin
@ 2020-01-21  3:32                 ` Jason Wang
  0 siblings, 0 replies; 76+ messages in thread
From: Jason Wang @ 2020-01-21  3:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Shahaf Shuler, Rob Miller, linux-kernel, kvm, virtualization,
	Netdev, Bie, Tiwei, Jason Gunthorpe, maxime.coquelin, Liang,
	Cunming, Wang, Zhihong, Wang, Xiao W, haotian.wang, Zhu,
	Lingshan, eperezma, lulu, Parav Pandit, Tian, Kevin, stefanha,
	rdunlap, hch, Ariel Adam, jakub.kicinski, Jiri Pirko, hanand,
	mhabets


On 2020/1/20 下午8:09, Michael S. Tsirkin wrote:
> On Mon, Jan 20, 2020 at 04:44:34PM +0800, Jason Wang wrote:
>> On 2020/1/19 下午5:59, Michael S. Tsirkin wrote:
>>> On Sun, Jan 19, 2020 at 09:07:09AM +0000, Shahaf Shuler wrote:
>>>>> Technically, we can keep the incremental API
>>>>> here and let the vendor vDPA drivers to record the full mapping
>>>>> internally which may slightly increase the complexity of vendor driver.
>>>> What will be the trigger for the driver to know it received the last mapping on this series and it can now push it to the on-chip IOMMU?
>>> Some kind of invalidate API?
>>>
>> The problem is how to deal with the case of vIOMMU. When vIOMMU is enabling
>> there's no concept of last mapping.
>>
>> Thanks
> Most IOMMUs have a translation cache so have an invalidate API too.


Ok, then I get you.

But in this case, when vIOMMU is enabled, each new map becomes a "last
mapping".

Thanks

>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-20 17:49             ` Jason Gunthorpe
  2020-01-20 20:51               ` Shahaf Shuler
  2020-01-20 21:48               ` Michael S. Tsirkin
@ 2020-01-21  4:00               ` Jason Wang
  2020-01-21  5:47                 ` Michael S. Tsirkin
  2 siblings, 1 reply; 76+ messages in thread
From: Jason Wang @ 2020-01-21  4:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Shahaf Shuler, Rob Miller, Michael S. Tsirkin, linux-kernel, kvm,
	virtualization, Netdev, Bie, Tiwei, maxime.coquelin, Liang,
	Cunming, Wang, Zhihong, Wang, Xiao W, haotian.wang, Zhu,
	Lingshan, eperezma, lulu, Parav Pandit, Tian, Kevin, stefanha,
	rdunlap, hch, Ariel Adam, jakub.kicinski, Jiri Pirko, hanand,
	mhabets


On 2020/1/21 1:49 AM, Jason Gunthorpe wrote:
> On Mon, Jan 20, 2020 at 04:43:53PM +0800, Jason Wang wrote:
>> This is similar to the design of platform IOMMU part of vhost-vdpa. We
>> decide to send diffs to platform IOMMU there. If it's ok to do that in
>> driver, we can replace set_map with incremental API like map()/unmap().
>>
>> Then driver need to maintain rbtree itself.
> I think we really need to see two modes, one where there is a fixed
> translation without dynamic vIOMMU driven changes and one that
> supports vIOMMU.


I think in this case you mean the method proposed by Shahaf, which sends
diffs of the "fixed translation" to the device?

It would be tricky to handle the following case, for example:

old map [4G, 16G) new map [4G, 8G)

If we do

1) flush [4G, 16G)
2) add [4G, 8G)

There could be a window between 1) and 2) in which [4G, 8G) is unmapped.

Avoiding that requires an IOMMU that can do

1) remove [8G, 16G)
2) flush [8G, 16G)
3) change [4G, 8G)

....
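
To make the second sequence concrete, here is a minimal user-space sketch
(not kernel code) of computing which part of the old map has to be removed
while the overlap stays mapped. struct iova_range and range_subtract() are
hypothetical names made up for illustration; nothing like them is in this
series.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/* Half-open IOVA range [start, end). Hypothetical type, illustration only. */
struct iova_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Compute the parts of a that are not covered by b. Writes at most two
 * pieces into out[] and returns how many pieces were written.
 */
static int range_subtract(struct iova_range a, struct iova_range b,
			  struct iova_range out[2])
{
	int n = 0;

	/* No overlap at all: a survives unchanged (if it is non-empty). */
	if (b.end <= a.start || b.start >= a.end) {
		if (a.start < a.end)
			out[n++] = a;
		return n;
	}

	/* Piece of a below b. */
	if (a.start < b.start)
		out[n++] = (struct iova_range){ a.start, b.start };

	/* Piece of a above b. */
	if (b.end < a.end)
		out[n++] = (struct iova_range){ b.end, a.end };

	return n;
}
```

For old = [4G, 16G) and new = [4G, 8G), range_subtract(old, new, out) gives
the single piece [8G, 16G) to remove and flush, and
range_subtract(new, old, out) gives nothing to add, so [4G, 8G) is never
unmapped.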

>
> There are different optimization goals in the drivers for these two
> configurations.
>
>>> If the first one, then I think memory hotplug is a heavy flow
>>> regardless. Do you think the extra cycles for the tree traverse
>>> will be visible in any way?
>> I think if the driver can pause the DMA during the time for setting up new
>> mapping, it should be fine.
> This is very tricky for any driver if the mapping change hits the
> virtio rings. :(
>
> Even a IOMMU using driver is going to have problems with that..
>
> Jason


Or I wonder whether ATS/PRI can help here. E.g. during an I/O page fault,
the driver/device can wait for the new mapping to be set and then replay
the DMA.

Thanks

>



* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-21  4:00               ` Jason Wang
@ 2020-01-21  5:47                 ` Michael S. Tsirkin
  2020-01-21  8:00                   ` Jason Wang
  0 siblings, 1 reply; 76+ messages in thread
From: Michael S. Tsirkin @ 2020-01-21  5:47 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jason Gunthorpe, Shahaf Shuler, Rob Miller, linux-kernel, kvm,
	virtualization, Netdev, Bie, Tiwei, maxime.coquelin, Liang,
	Cunming, Wang, Zhihong, Wang, Xiao W, haotian.wang, Zhu,
	Lingshan, eperezma, lulu, Parav Pandit, Tian, Kevin, stefanha,
	rdunlap, hch, Ariel Adam, jakub.kicinski, Jiri Pirko, hanand,
	mhabets

On Tue, Jan 21, 2020 at 12:00:57PM +0800, Jason Wang wrote:
> 
> On 2020/1/21 1:49 AM, Jason Gunthorpe wrote:
> > On Mon, Jan 20, 2020 at 04:43:53PM +0800, Jason Wang wrote:
> > > This is similar to the design of platform IOMMU part of vhost-vdpa. We
> > > decide to send diffs to platform IOMMU there. If it's ok to do that in
> > > driver, we can replace set_map with incremental API like map()/unmap().
> > > 
> > > Then driver need to maintain rbtree itself.
> > I think we really need to see two modes, one where there is a fixed
> > translation without dynamic vIOMMU driven changes and one that
> > supports vIOMMU.
> 
> 
> I think in this case, you meant the method proposed by Shahaf that sends
> diffs of "fixed translation" to device?
> 
> It would be kind of tricky to deal with the following case for example:
> 
> old map [4G, 16G) new map [4G, 8G)
> 
> If we do
> 
> 1) flush [4G, 16G)
> 2) add [4G, 8G)
> 
> There could be a window between 1) and 2).
> 
> It requires the IOMMU that can do
> 
> 1) remove [8G, 16G)
> 2) flush [8G, 16G)
> 3) change [4G, 8G)
> 
> ....

Basically what I had in mind is something like qemu memory api

0. begin
1. remove [8G, 16G)
2. add [4G, 8G)
3. commit

Anyway, I'm fine with a one-shot API for now, we can
improve it later.
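
A rough user-space sketch of the begin/remove/add/commit flow, under heavy
assumptions: mappings are tracked at 1 GiB granularity in a 64-bit bitmap,
and all names (toy_txn_map, txn_begin() and friends) are hypothetical, not
an existing kernel or QEMU interface. The only point is that the state the
device sees changes at commit and nowhere else.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model: bit i set means [i GiB, (i + 1) GiB) is mapped. */
struct toy_txn_map {
	uint64_t active;	/* what the device translates against */
	uint64_t staged;	/* pending state, invisible until commit */
};

/* Bitmask for [start_gib, end_gib); requires start_gib < end_gib <= 64. */
static uint64_t gib_mask(unsigned int start_gib, unsigned int end_gib)
{
	uint64_t upto_end = end_gib == 64 ? ~0ULL : (1ULL << end_gib) - 1;

	return upto_end & ~((1ULL << start_gib) - 1);
}

/* 0. begin: start from a copy of the currently active map. */
static void txn_begin(struct toy_txn_map *m)
{
	m->staged = m->active;
}

/* 1. remove: only the staged copy is touched. */
static void txn_remove(struct toy_txn_map *m, unsigned int start_gib,
		       unsigned int end_gib)
{
	m->staged &= ~gib_mask(start_gib, end_gib);
}

/* 2. add: only the staged copy is touched. */
static void txn_add(struct toy_txn_map *m, unsigned int start_gib,
		    unsigned int end_gib)
{
	m->staged |= gib_mask(start_gib, end_gib);
}

/* 3. commit: the staged map becomes the active one in a single step. */
static void txn_commit(struct toy_txn_map *m)
{
	m->active = m->staged;
}
```

Starting from [4G, 16G), the sequence begin, remove [8G, 16G), add [4G, 8G),
commit leaves the active map untouched until commit and [4G, 8G) afterwards.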

> > 
> > There are different optimization goals in the drivers for these two
> > configurations.
> > 
> > > > If the first one, then I think memory hotplug is a heavy flow
> > > > regardless. Do you think the extra cycles for the tree traverse
> > > > will be visible in any way?
> > > I think if the driver can pause the DMA during the time for setting up new
> > > mapping, it should be fine.
> > This is very tricky for any driver if the mapping change hits the
> > virtio rings. :(
> > 
> > Even a IOMMU using driver is going to have problems with that..
> > 
> > Jason
> 
> 
> Or I wonder whether ATS/PRI can help here. E.g during I/O page fault,
> driver/device can wait for the new mapping to be set and then replay the
> DMA.
> 
> Thanks
> 


-- 
MST



* RE: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-20 21:59                     ` Michael S. Tsirkin
@ 2020-01-21  6:01                       ` Shahaf Shuler
  2020-01-21  7:57                         ` Jason Wang
  0 siblings, 1 reply; 76+ messages in thread
From: Shahaf Shuler @ 2020-01-21  6:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Gunthorpe, Jason Wang, Rob Miller, linux-kernel, kvm,
	virtualization, Netdev, Bie, Tiwei, maxime.coquelin, Liang,
	Cunming, Wang, Zhihong, Wang, Xiao W, haotian.wang, Zhu,
	Lingshan, eperezma, lulu, Parav Pandit, Tian, Kevin, stefanha,
	rdunlap, hch, Ariel Adam, jakub.kicinski, Jiri Pirko, hanand,
	mhabets

Tuesday, January 21, 2020 12:00 AM, Michael S. Tsirkin:
> Subject: Re: [PATCH 3/5] vDPA: introduce vDPA bus
> 
> On Mon, Jan 20, 2020 at 09:47:18PM +0000, Shahaf Shuler wrote:
> > Monday, January 20, 2020 11:25 PM, Michael S. Tsirkin:
> > > Subject: Re: [PATCH 3/5] vDPA: introduce vDPA bus
> > >
> > > On Mon, Jan 20, 2020 at 08:51:43PM +0000, Shahaf Shuler wrote:
> > > > Monday, January 20, 2020 7:50 PM, Jason Gunthorpe:
> > > > > Subject: Re: [PATCH 3/5] vDPA: introduce vDPA bus
> > > > >
> > > > > On Mon, Jan 20, 2020 at 04:43:53PM +0800, Jason Wang wrote:
> > > > > > This is similar to the design of platform IOMMU part of
> > > > > > vhost-vdpa. We decide to send diffs to platform IOMMU there.
> > > > > > If it's ok to do that in driver, we can replace set_map with
> > > > > > incremental API
> > > like map()/unmap().
> > > > > >
> > > > > > Then driver need to maintain rbtree itself.
> > > > >
> > > > > I think we really need to see two modes, one where there is a
> > > > > fixed translation without dynamic vIOMMU driven changes and one
> > > > > that supports vIOMMU.
> > > > >
> > > > > There are different optimization goals in the drivers for these
> > > > > two configurations.
> > > >
> > > > +1.
> > > > It will be best to have one API for static config (i.e. mapping
> > > > can be set only before virtio device gets active), and one API for
> > > > dynamic changes that can be set after the virtio device is active.
> > >
> > > Frankly I don't see when we'd use the static one.
> > > Memory hotplug is enabled for most guests...
> >
> > The fact memory hotplug is enabled doesn't necessarily means there is not
> cold-plugged memory on the hot plugged slots.
> > So your claim is majority of guests are deployed w/o any cold-plugged
> memory?
> 
> Sorry for not being clear. I was merely saying that dynamic one can't be
> optional, and static one can. So how about we start just with the dynamic
> one, then add the static one as a later optimization?

Since we have the use case (cold-plugged memory to the guest, e.g. when populated w/ hugepages), I think we should start w/ both. The static one can be optional for drivers.

Moreover, I am not yet clear about the suggested API for the dynamic case; can you share the prototype you have in mind?
Also, will it be:
1. multiple add_map calls, followed by a flag telling the driver to apply them
or
2. each add_map applied by the driver as a standalone operation?



* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-21  6:01                       ` Shahaf Shuler
@ 2020-01-21  7:57                         ` Jason Wang
  0 siblings, 0 replies; 76+ messages in thread
From: Jason Wang @ 2020-01-21  7:57 UTC (permalink / raw)
  To: Shahaf Shuler, Michael S. Tsirkin
  Cc: Jason Gunthorpe, Rob Miller, linux-kernel, kvm, virtualization,
	Netdev, Bie, Tiwei, maxime.coquelin, Liang, Cunming, Wang,
	Zhihong, Wang, Xiao W, haotian.wang, Zhu, Lingshan, eperezma,
	lulu, Parav Pandit, Tian, Kevin, stefanha, rdunlap, hch,
	Ariel Adam, jakub.kicinski, Jiri Pirko, hanand, mhabets


On 2020/1/21 2:01 PM, Shahaf Shuler wrote:
> Tuesday, January 21, 2020 12:00 AM, Michael S. Tsirkin:
>> Subject: Re: [PATCH 3/5] vDPA: introduce vDPA bus
>>
>> On Mon, Jan 20, 2020 at 09:47:18PM +0000, Shahaf Shuler wrote:
>>> Monday, January 20, 2020 11:25 PM, Michael S. Tsirkin:
>>>> Subject: Re: [PATCH 3/5] vDPA: introduce vDPA bus
>>>>
>>>> On Mon, Jan 20, 2020 at 08:51:43PM +0000, Shahaf Shuler wrote:
>>>>> Monday, January 20, 2020 7:50 PM, Jason Gunthorpe:
>>>>>> Subject: Re: [PATCH 3/5] vDPA: introduce vDPA bus
>>>>>>
>>>>>> On Mon, Jan 20, 2020 at 04:43:53PM +0800, Jason Wang wrote:
>>>>>>> This is similar to the design of platform IOMMU part of
>>>>>>> vhost-vdpa. We decide to send diffs to platform IOMMU there.
>>>>>>> If it's ok to do that in driver, we can replace set_map with
>>>>>>> incremental API
>>>> like map()/unmap().
>>>>>>> Then driver need to maintain rbtree itself.
>>>>>> I think we really need to see two modes, one where there is a
>>>>>> fixed translation without dynamic vIOMMU driven changes and one
>>>>>> that supports vIOMMU.
>>>>>>
>>>>>> There are different optimization goals in the drivers for these
>>>>>> two configurations.
>>>>> +1.
>>>>> It will be best to have one API for static config (i.e. mapping
>>>>> can be set only before virtio device gets active), and one API for
>>>>> dynamic changes that can be set after the virtio device is active.
>>>> Frankly I don't see when we'd use the static one.
>>>> Memory hotplug is enabled for most guests...
>>> The fact memory hotplug is enabled doesn't necessarily means there is not
>> cold-plugged memory on the hot plugged slots.
>>> So your claim is majority of guests are deployed w/o any cold-plugged
>> memory?
>>
>> Sorry for not being clear. I was merely saying that dynamic one can't be
>> optional, and static one can. So how about we start just with the dynamic
>> one, then add the static one as a later optimization?
> Since we have the use case (cold plugged memory to guest, e.g. when populated w/ hugepages) I think we should start w/ both. The static one can be optional for drivers.
>
> Moreover am not yet clear about the suggested API for dynamic, can you share the prototype you have in mind?
> Also will it be :
> 1. multiple add_map and then flag the driver to set
> Or
> 2. each add_map should be set by the driver as stand alone.


For the dynamic one, it looks to me that introducing add_map()/del_map() bus
operations is much cleaner than reusing the current set_map() one.
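
A hypothetical sketch of what such a pair of operations could look like,
written as plain user-space C. Only the names add_map/del_map come from this
thread; the signatures, struct toy_map_ops and the toy_* driver below are
assumptions for illustration and are not the vDPA config ops in this series.
Here each call takes effect on its own, and the "driver" keeps its own small
table instead of an rbtree.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_MAX_MAPS 8

/* One IOVA -> PA mapping kept by the toy driver. */
struct toy_map {
	uint64_t iova;
	uint64_t size;
	uint64_t pa;
};

struct toy_dev {
	struct toy_map maps[TOY_MAX_MAPS];
	int nr_maps;
};

/* Hypothetical incremental mapping operations (signatures are assumed). */
struct toy_map_ops {
	int (*add_map)(struct toy_dev *dev, uint64_t iova, uint64_t size,
		       uint64_t pa);
	int (*del_map)(struct toy_dev *dev, uint64_t iova, uint64_t size);
};

static int toy_add_map(struct toy_dev *dev, uint64_t iova, uint64_t size,
		       uint64_t pa)
{
	int i;

	/* Reject empty ranges and ranges that wrap around. */
	if (!size || iova + size < iova)
		return -EINVAL;
	if (dev->nr_maps == TOY_MAX_MAPS)
		return -ENOSPC;

	/* Reject anything overlapping an existing mapping. */
	for (i = 0; i < dev->nr_maps; i++) {
		const struct toy_map *m = &dev->maps[i];

		if (iova < m->iova + m->size && m->iova < iova + size)
			return -EEXIST;
	}

	dev->maps[dev->nr_maps++] = (struct toy_map){ iova, size, pa };
	return 0;
}

static int toy_del_map(struct toy_dev *dev, uint64_t iova, uint64_t size)
{
	int i;

	/* Only exact matches are removed; no range splitting in this toy. */
	for (i = 0; i < dev->nr_maps; i++) {
		if (dev->maps[i].iova == iova && dev->maps[i].size == size) {
			dev->maps[i] = dev->maps[--dev->nr_maps];
			return 0;
		}
	}
	return -ENOENT;
}

static const struct toy_map_ops toy_ops = {
	.add_map = toy_add_map,
	.del_map = toy_del_map,
};
```

With set_map() the caller hands over the whole translation at once; with a
pair like this, the bookkeeping (and any batching) moves into the driver.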

Thanks


>



* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-21  5:47                 ` Michael S. Tsirkin
@ 2020-01-21  8:00                   ` Jason Wang
  2020-01-21  8:15                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 76+ messages in thread
From: Jason Wang @ 2020-01-21  8:00 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Gunthorpe, Shahaf Shuler, Rob Miller, linux-kernel, kvm,
	virtualization, Netdev, Bie, Tiwei, maxime.coquelin, Liang,
	Cunming, Wang, Zhihong, Wang, Xiao W, haotian.wang, Zhu,
	Lingshan, eperezma, lulu, Parav Pandit, Tian, Kevin, stefanha,
	rdunlap, hch, Ariel Adam, jakub.kicinski, Jiri Pirko, hanand,
	mhabets


On 2020/1/21 1:47 PM, Michael S. Tsirkin wrote:
> On Tue, Jan 21, 2020 at 12:00:57PM +0800, Jason Wang wrote:
>> On 2020/1/21 1:49 AM, Jason Gunthorpe wrote:
>>> On Mon, Jan 20, 2020 at 04:43:53PM +0800, Jason Wang wrote:
>>>> This is similar to the design of platform IOMMU part of vhost-vdpa. We
>>>> decide to send diffs to platform IOMMU there. If it's ok to do that in
>>>> driver, we can replace set_map with incremental API like map()/unmap().
>>>>
>>>> Then driver need to maintain rbtree itself.
>>> I think we really need to see two modes, one where there is a fixed
>>> translation without dynamic vIOMMU driven changes and one that
>>> supports vIOMMU.
>>
>> I think in this case, you meant the method proposed by Shahaf that sends
>> diffs of "fixed translation" to device?
>>
>> It would be kind of tricky to deal with the following case for example:
>>
>> old map [4G, 16G) new map [4G, 8G)
>>
>> If we do
>>
>> 1) flush [4G, 16G)
>> 2) add [4G, 8G)
>>
>> There could be a window between 1) and 2).
>>
>> It requires the IOMMU that can do
>>
>> 1) remove [8G, 16G)
>> 2) flush [8G, 16G)
>> 3) change [4G, 8G)
>>
>> ....
> Basically what I had in mind is something like qemu memory api
>
> 0. begin
> 1. remove [8G, 16G)
> 2. add [4G, 8G)
> 3. commit


This sounds more flexible, e.g. a driver may choose to implement the static
mapping one through commit. But a question here: it looks to me that this
still requires DMA to be synced with at least the commit. Otherwise the
device may get a DMA fault? Or is the device expected to pause DMA during
begin?

Thanks


>
> Anyway, I'm fine with a one-shot API for now, we can
> improve it later.
>
>>> There are different optimization goals in the drivers for these two
>>> configurations.
>>>
>>>>> If the first one, then I think memory hotplug is a heavy flow
>>>>> regardless. Do you think the extra cycles for the tree traverse
>>>>> will be visible in any way?
>>>> I think if the driver can pause the DMA during the time for setting up new
>>>> mapping, it should be fine.
>>> This is very tricky for any driver if the mapping change hits the
>>> virtio rings. :(
>>>
>>> Even a IOMMU using driver is going to have problems with that..
>>>
>>> Jason
>>
>> Or I wonder whether ATS/PRI can help here. E.g during I/O page fault,
>> driver/device can wait for the new mapping to be set and then replay the
>> DMA.
>>
>> Thanks
>>
>



* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-21  8:00                   ` Jason Wang
@ 2020-01-21  8:15                     ` Michael S. Tsirkin
  2020-01-21  8:35                       ` Jason Wang
  2020-01-21 14:05                       ` Jason Gunthorpe
  0 siblings, 2 replies; 76+ messages in thread
From: Michael S. Tsirkin @ 2020-01-21  8:15 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jason Gunthorpe, Shahaf Shuler, Rob Miller, linux-kernel, kvm,
	virtualization, Netdev, Bie, Tiwei, maxime.coquelin, Liang,
	Cunming, Wang, Zhihong, Wang, Xiao W, haotian.wang, Zhu,
	Lingshan, eperezma, lulu, Parav Pandit, Tian, Kevin, stefanha,
	rdunlap, hch, Ariel Adam, jakub.kicinski, Jiri Pirko, hanand,
	mhabets

On Tue, Jan 21, 2020 at 04:00:38PM +0800, Jason Wang wrote:
> 
> On 2020/1/21 1:47 PM, Michael S. Tsirkin wrote:
> > On Tue, Jan 21, 2020 at 12:00:57PM +0800, Jason Wang wrote:
> > > On 2020/1/21 1:49 AM, Jason Gunthorpe wrote:
> > > > On Mon, Jan 20, 2020 at 04:43:53PM +0800, Jason Wang wrote:
> > > > > This is similar to the design of platform IOMMU part of vhost-vdpa. We
> > > > > decide to send diffs to platform IOMMU there. If it's ok to do that in
> > > > > driver, we can replace set_map with incremental API like map()/unmap().
> > > > > 
> > > > > Then driver need to maintain rbtree itself.
> > > > I think we really need to see two modes, one where there is a fixed
> > > > translation without dynamic vIOMMU driven changes and one that
> > > > supports vIOMMU.
> > > 
> > > I think in this case, you meant the method proposed by Shahaf that sends
> > > diffs of "fixed translation" to device?
> > > 
> > > It would be kind of tricky to deal with the following case for example:
> > > 
> > > old map [4G, 16G) new map [4G, 8G)
> > > 
> > > If we do
> > > 
> > > 1) flush [4G, 16G)
> > > 2) add [4G, 8G)
> > > 
> > > There could be a window between 1) and 2).
> > > 
> > > It requires the IOMMU that can do
> > > 
> > > 1) remove [8G, 16G)
> > > 2) flush [8G, 16G)
> > > 3) change [4G, 8G)
> > > 
> > > ....
> > Basically what I had in mind is something like qemu memory api
> > 
> > 0. begin
> > 1. remove [8G, 16G)
> > 2. add [4G, 8G)
> > 3. commit
> 
> 
> This sounds more flexible e.g driver may choose to implement static mapping
> one through commit. But a question here, it looks to me this still requires
> the DMA to be synced with at least commit here. Otherwise device may get DMA
> fault? Or device is expected to be paused DMA during begin?
> 
> Thanks

For example, commit might switch one set of tables for another,
without needing to pause DMA.
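
One way to picture that table switch, as a user-space C11 sketch and not
anything taken from the series: keep two tables, fill the one the DMA path
is not looking at, then publish it with a single atomic pointer store. All
names (xlate_table, toy_iommu, toy_commit() etc.) are hypothetical, and each
table holds just one range to keep it short.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* A "table" with a single translation range [iova_start, iova_end). */
struct xlate_table {
	uint64_t iova_start;
	uint64_t iova_end;
	uint64_t pa_base;
};

struct toy_iommu {
	struct xlate_table tables[2];
	_Atomic(struct xlate_table *) active;	/* table used for lookups */
};

static void toy_init(struct toy_iommu *mmu, struct xlate_table initial)
{
	mmu->tables[0] = initial;
	atomic_init(&mmu->active, &mmu->tables[0]);
}

/* Fill the table that is not active, then switch to it in one store. */
static void toy_commit(struct toy_iommu *mmu, struct xlate_table next)
{
	struct xlate_table *cur =
		atomic_load_explicit(&mmu->active, memory_order_relaxed);
	struct xlate_table *shadow =
		cur == &mmu->tables[0] ? &mmu->tables[1] : &mmu->tables[0];

	*shadow = next;
	atomic_store_explicit(&mmu->active, shadow, memory_order_release);
}

/* Lookup path: sees either the old table or the new one, never a mix. */
static int toy_translate(struct toy_iommu *mmu, uint64_t iova, uint64_t *pa)
{
	struct xlate_table *t =
		atomic_load_explicit(&mmu->active, memory_order_acquire);

	if (iova < t->iova_start || iova >= t->iova_end)
		return -1;	/* would be a DMA fault */
	*pa = t->pa_base + (iova - t->iova_start);
	return 0;
}
```

The sketch ignores one real problem: before the old table is reused for the
next commit, lookups still running against it have to finish, so some kind
of grace period would be needed on top of this.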

> 
> > 
> > Anyway, I'm fine with a one-shot API for now, we can
> > improve it later.
> > 
> > > > There are different optimization goals in the drivers for these two
> > > > configurations.
> > > > 
> > > > > > If the first one, then I think memory hotplug is a heavy flow
> > > > > > regardless. Do you think the extra cycles for the tree traverse
> > > > > > will be visible in any way?
> > > > > I think if the driver can pause the DMA during the time for setting up new
> > > > > mapping, it should be fine.
> > > > This is very tricky for any driver if the mapping change hits the
> > > > virtio rings. :(
> > > > 
> > > > Even a IOMMU using driver is going to have problems with that..
> > > > 
> > > > Jason
> > > 
> > > Or I wonder whether ATS/PRI can help here. E.g during I/O page fault,
> > > driver/device can wait for the new mapping to be set and then replay the
> > > DMA.
> > > 
> > > Thanks
> > > 
> > 



* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-21  8:15                     ` Michael S. Tsirkin
@ 2020-01-21  8:35                       ` Jason Wang
  2020-01-21 11:09                         ` Shahaf Shuler
  2020-01-21 14:05                       ` Jason Gunthorpe
  1 sibling, 1 reply; 76+ messages in thread
From: Jason Wang @ 2020-01-21  8:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Gunthorpe, Shahaf Shuler, Rob Miller, linux-kernel, kvm,
	virtualization, Netdev, Bie, Tiwei, maxime.coquelin, Liang,
	Cunming, Wang, Zhihong, Wang, Xiao W, haotian.wang, Zhu,
	Lingshan, eperezma, lulu, Parav Pandit, Tian, Kevin, stefanha,
	rdunlap, hch, Ariel Adam, Jiri Pirko, hanand, mhabets


On 2020/1/21 4:15 PM, Michael S. Tsirkin wrote:
> On Tue, Jan 21, 2020 at 04:00:38PM +0800, Jason Wang wrote:
>> On 2020/1/21 1:47 PM, Michael S. Tsirkin wrote:
>>> On Tue, Jan 21, 2020 at 12:00:57PM +0800, Jason Wang wrote:
>>>> On 2020/1/21 1:49 AM, Jason Gunthorpe wrote:
>>>>> On Mon, Jan 20, 2020 at 04:43:53PM +0800, Jason Wang wrote:
>>>>>> This is similar to the design of platform IOMMU part of vhost-vdpa. We
>>>>>> decide to send diffs to platform IOMMU there. If it's ok to do that in
>>>>>> driver, we can replace set_map with incremental API like map()/unmap().
>>>>>>
>>>>>> Then driver need to maintain rbtree itself.
>>>>> I think we really need to see two modes, one where there is a fixed
>>>>> translation without dynamic vIOMMU driven changes and one that
>>>>> supports vIOMMU.
>>>> I think in this case, you meant the method proposed by Shahaf that sends
>>>> diffs of "fixed translation" to device?
>>>>
>>>> It would be kind of tricky to deal with the following case for example:
>>>>
>>>> old map [4G, 16G) new map [4G, 8G)
>>>>
>>>> If we do
>>>>
>>>> 1) flush [4G, 16G)
>>>> 2) add [4G, 8G)
>>>>
>>>> There could be a window between 1) and 2).
>>>>
>>>> It requires the IOMMU that can do
>>>>
>>>> 1) remove [8G, 16G)
>>>> 2) flush [8G, 16G)
>>>> 3) change [4G, 8G)
>>>>
>>>> ....
>>> Basically what I had in mind is something like qemu memory api
>>>
>>> 0. begin
>>> 1. remove [8G, 16G)
>>> 2. add [4G, 8G)
>>> 3. commit
>>
>> This sounds more flexible e.g driver may choose to implement static mapping
>> one through commit. But a question here, it looks to me this still requires
>> the DMA to be synced with at least commit here. Otherwise device may get DMA
>> fault? Or device is expected to be paused DMA during begin?
>>
>> Thanks
> For example, commit might switch one set of tables for another,
> without need to pause DMA.


Yes, I think that works, but it needs confirmation from Shahaf or Jason.

Thanks



>
>>> Anyway, I'm fine with a one-shot API for now, we can
>>> improve it later.
>>>
>>>>> There are different optimization goals in the drivers for these two
>>>>> configurations.
>>>>>
>>>>>>> If the first one, then I think memory hotplug is a heavy flow
>>>>>>> regardless. Do you think the extra cycles for the tree traverse
>>>>>>> will be visible in any way?
>>>>>> I think if the driver can pause the DMA during the time for setting up new
>>>>>> mapping, it should be fine.
>>>>> This is very tricky for any driver if the mapping change hits the
>>>>> virtio rings. :(
>>>>>
>>>>> Even a IOMMU using driver is going to have problems with that..
>>>>>
>>>>> Jason
>>>> Or I wonder whether ATS/PRI can help here. E.g during I/O page fault,
>>>> driver/device can wait for the new mapping to be set and then replay the
>>>> DMA.
>>>>
>>>> Thanks
>>>>



* RE: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-17  3:03     ` Jason Wang
  2020-01-17 13:54       ` Jason Gunthorpe
@ 2020-01-21  8:40       ` Tian, Kevin
  2020-01-21  9:41         ` Jason Wang
  1 sibling, 1 reply; 76+ messages in thread
From: Tian, Kevin @ 2020-01-21  8:40 UTC (permalink / raw)
  To: Jason Wang, Jason Gunthorpe
  Cc: mst, linux-kernel, kvm, virtualization, netdev, Bie, Tiwei,
	maxime.coquelin, Liang, Cunming, Wang, Zhihong, rob.miller, Wang,
	Xiao W, haotian.wang, Zhu, Lingshan, eperezma, lulu,
	Parav Pandit, stefanha, rdunlap, hch, aadam, jakub.kicinski,
	Jiri Pirko, Shahaf Shuler, hanand, mhabets

> From: Jason Wang <jasowang@redhat.com>
> Sent: Friday, January 17, 2020 11:03 AM
> 
> 
> On 2020/1/16 11:22 PM, Jason Gunthorpe wrote:
> > On Thu, Jan 16, 2020 at 08:42:29PM +0800, Jason Wang wrote:
> >> vDPA device is a device that uses a datapath which complies with the
> >> virtio specifications with vendor specific control path. vDPA devices
> >> can be both physically located on the hardware or emulated by
> >> software. vDPA hardware devices are usually implemented through PCIE
> >> with the following types:
> >>
> >> - PF (Physical Function) - A single Physical Function
> >> - VF (Virtual Function) - Device that supports single root I/O
> >>    virtualization (SR-IOV). Its Virtual Function (VF) represents a
> >>    virtualized instance of the device that can be assigned to different
> >>    partitions
> >> - VDEV (Virtual Device) - With technologies such as Intel Scalable
> >>    IOV, a virtual device composed by host OS utilizing one or more
> >>    ADIs.

The concept of VDEV includes both software bits and ADIs. If you
only talk about hardware types, using ADI is more accurate.

> >> - SF (Sub function) - Vendor specific interface to slice the Physical
> >>    Function to multiple sub functions that can be assigned to different
> >>    partitions as virtual devices.
> > I really hope we don't end up with two different ways to spell this
> > same thing.
> 
> 
> I think you meant ADI vs SF. It looks to me that ADI is limited to the
> scope of scalable IOV but SF not.

ADI is just a term for the minimally assignable resource in Scalable IOV.
'Assignable' implies several things, e.g. the resource can be independently
mapped to/accessed by user space or a guest, DMAs between two
ADIs are isolated, operating one ADI doesn't affect another ADI,
etc. I'm not clear about other vendor-specific interfaces, but I suppose
they need to meet similar requirements. Then do we really want to
differentiate ADI vs. SF? What about merging them, with ADI as just
one example of finer-grained slicing?

> 
> 
> >
> >> @@ -0,0 +1,2 @@
> >> +# SPDX-License-Identifier: GPL-2.0
> >> +obj-$(CONFIG_VDPA) += vdpa.o
> >> diff --git a/drivers/virtio/vdpa/vdpa.c b/drivers/virtio/vdpa/vdpa.c
> >> new file mode 100644
> >> index 000000000000..2b0e4a9f105d
> >> +++ b/drivers/virtio/vdpa/vdpa.c
> >> @@ -0,0 +1,141 @@
> >> +// SPDX-License-Identifier: GPL-2.0-only
> >> +/*
> >> + * vDPA bus.
> >> + *
> >> + * Copyright (c) 2019, Red Hat. All rights reserved.
> >> + *     Author: Jason Wang <jasowang@redhat.com>
> > 2020 these days
> 
> 
> Will fix.
> 
> 
> >
> >> + *
> >> + */
> >> +
> >> +#include <linux/module.h>
> >> +#include <linux/idr.h>
> >> +#include <linux/vdpa.h>
> >> +
> >> +#define MOD_VERSION  "0.1"
> > I think module versions are discouraged these days
> 
> 
> Will remove.
> 
> 
> >
> >> +#define MOD_DESC     "vDPA bus"
> >> +#define MOD_AUTHOR   "Jason Wang <jasowang@redhat.com>"
> >> +#define MOD_LICENSE  "GPL v2"
> >> +
> >> +static DEFINE_IDA(vdpa_index_ida);
> >> +
> >> +struct device *vdpa_get_parent(struct vdpa_device *vdpa)
> >> +{
> >> +	return vdpa->dev.parent;
> >> +}
> >> +EXPORT_SYMBOL(vdpa_get_parent);
> >> +
> >> +void vdpa_set_parent(struct vdpa_device *vdpa, struct device *parent)
> >> +{
> >> +	vdpa->dev.parent = parent;
> >> +}
> >> +EXPORT_SYMBOL(vdpa_set_parent);
> >> +
> >> +struct vdpa_device *dev_to_vdpa(struct device *_dev)
> >> +{
> >> +	return container_of(_dev, struct vdpa_device, dev);
> >> +}
> >> +EXPORT_SYMBOL_GPL(dev_to_vdpa);
> >> +
> >> +struct device *vdpa_to_dev(struct vdpa_device *vdpa)
> >> +{
> >> +	return &vdpa->dev;
> >> +}
> >> +EXPORT_SYMBOL_GPL(vdpa_to_dev);
> > Why these trivial accessors? Seems unnecessary, or should at least be
> > static inlines in a header
> 
> 
> Will fix.
> 
> 
> >
> >> +int register_vdpa_device(struct vdpa_device *vdpa)
> >> +{
> > Usually we want to see symbols consistently prefixed with vdpa_*, is
> > there a reason why register/unregister are swapped?
> 
> 
> I follow the name from virtio. I will switch to vdpa_*.
> 
> 
> >
> >> +	int err;
> >> +
> >> +	if (!vdpa_get_parent(vdpa))
> >> +		return -EINVAL;
> >> +
> >> +	if (!vdpa->config)
> >> +		return -EINVAL;
> >> +
> >> +	err = ida_simple_get(&vdpa_index_ida, 0, 0, GFP_KERNEL);
> >> +	if (err < 0)
> >> +		return -EFAULT;
> >> +
> >> +	vdpa->dev.bus = &vdpa_bus;
> >> +	device_initialize(&vdpa->dev);
> > IMHO device_initialize should not be called inside something called
> > register, too often we find out that the caller drivers need the device
> > to be initialized earlier, ie to use the kref, or something.
> >
> > I find the best flow is to have some init function that does the
> > device_initialize and sets the device_name that the driver can call
> > early.
> 
> 
> Ok, will do.
> 
> 
> >
> > Shouldn't there be a device/driver matching process of some kind?
> 
> 
> The question is what do we want to match here.
> 
> 1) "virtio" vs "vhost", I implemented matching method for this in mdev
> series, but it looks unnecessary for vDPA device driver to know about
> this. Anyway we can use sysfs driver bind/unbind to switch drivers
> 2) virtio device id and vendor id. I'm not sure we need this considering
> the two drivers so far (virtio/vhost) are all bus drivers.
> 
> Thanks
> 
> 
> >
> > Jason
> >



* RE: [PATCH 0/5] vDPA support
  2020-01-16 12:42 [PATCH 0/5] vDPA support Jason Wang
                   ` (4 preceding siblings ...)
  2020-01-16 12:42 ` [PATCH 5/5] vdpasim: vDPA device simulator Jason Wang
@ 2020-01-21  8:44 ` Tian, Kevin
  2020-01-21  9:39   ` Jason Wang
  5 siblings, 1 reply; 76+ messages in thread
From: Tian, Kevin @ 2020-01-21  8:44 UTC (permalink / raw)
  To: Jason Wang, mst, linux-kernel, kvm, virtualization, netdev
  Cc: Bie, Tiwei, jgg, maxime.coquelin, Liang, Cunming, Wang, Zhihong,
	rob.miller, Wang, Xiao W, haotian.wang, Zhu, Lingshan, eperezma,
	lulu, parav, stefanha, rdunlap, hch, aadam, jakub.kicinski, jiri,
	shahafs, hanand, mhabets

> From: Jason Wang
> Sent: Thursday, January 16, 2020 8:42 PM
> 
> Hi all:
> 
> Based on the comments and discussion for mdev based hardware virtio
> offloading support[1]. A different approach to support vDPA device is
> proposed in this series.

Can you point to the actual link which triggered the direction change?
A quick glance at that thread doesn't reveal such information...

> 
> Instead of leveraging VFIO/mdev which may not work for some
> vendors. This series tries to introduce a dedicated vDPA bus and
> leverage vhost for userspace drivers. This help for the devices that
> are not fit for VFIO and may reduce the conflict when try to propose a
> bus template for virtual devices in [1].
> 
> The vDPA support is split into following parts:
> 
> 1) vDPA core (bus, device and driver abstraction)
> 2) virtio vDPA transport for kernel virtio driver to control vDPA
>    device
> 3) vhost vDPA bus driver for userspace vhost driver to control vDPA
>    device
> 4) vendor vDPA drivers
> 5) management API
> 
> Both 1) and 2) are included in this series. Tiwei will work on part
> 3). For 4), Ling Shan will work and post IFCVF driver. For 5) we leave
> it to vendor to implement, but it's better to come into an agreement
> for management to create/configure/destroy vDPA device.
> 
> The sample driver is kept but renamed to vdpa_sim. An on-chip IOMMU
> implementation is added to sample device to make it work for both
> kernel virtio driver and userspace vhost driver. It implements a sysfs
> based management API, but it can switch to any other (e.g devlink) if
> necessary.
> 
> Please refer each patch for more information.
> 
> Comments are welcomed.
> 
> [1] https://lkml.org/lkml/2019/11/18/261
> 
> Jason Wang (5):
>   vhost: factor out IOTLB
>   vringh: IOTLB support
>   vDPA: introduce vDPA bus
>   virtio: introduce a vDPA based transport
>   vdpasim: vDPA device simulator
> 
>  MAINTAINERS                    |   2 +
>  drivers/vhost/Kconfig          |   7 +
>  drivers/vhost/Kconfig.vringh   |   1 +
>  drivers/vhost/Makefile         |   2 +
>  drivers/vhost/net.c            |   2 +-
>  drivers/vhost/vhost.c          | 221 +++------
>  drivers/vhost/vhost.h          |  36 +-
>  drivers/vhost/vhost_iotlb.c    | 171 +++++++
>  drivers/vhost/vringh.c         | 434 +++++++++++++++++-
>  drivers/virtio/Kconfig         |  15 +
>  drivers/virtio/Makefile        |   2 +
>  drivers/virtio/vdpa/Kconfig    |  26 ++
>  drivers/virtio/vdpa/Makefile   |   3 +
>  drivers/virtio/vdpa/vdpa.c     | 141 ++++++
>  drivers/virtio/vdpa/vdpa_sim.c | 796 +++++++++++++++++++++++++++++++++
>  drivers/virtio/virtio_vdpa.c   | 400 +++++++++++++++++
>  include/linux/vdpa.h           | 191 ++++++++
>  include/linux/vhost_iotlb.h    |  45 ++
>  include/linux/vringh.h         |  36 ++
>  19 files changed, 2327 insertions(+), 204 deletions(-)
>  create mode 100644 drivers/vhost/vhost_iotlb.c
>  create mode 100644 drivers/virtio/vdpa/Kconfig
>  create mode 100644 drivers/virtio/vdpa/Makefile
>  create mode 100644 drivers/virtio/vdpa/vdpa.c
>  create mode 100644 drivers/virtio/vdpa/vdpa_sim.c
>  create mode 100644 drivers/virtio/virtio_vdpa.c
>  create mode 100644 include/linux/vdpa.h
>  create mode 100644 include/linux/vhost_iotlb.h
> 
> --
> 2.19.1



* Re: [PATCH 0/5] vDPA support
  2020-01-21  8:44 ` [PATCH 0/5] vDPA support Tian, Kevin
@ 2020-01-21  9:39   ` Jason Wang
  0 siblings, 0 replies; 76+ messages in thread
From: Jason Wang @ 2020-01-21  9:39 UTC (permalink / raw)
  To: Tian, Kevin, mst, linux-kernel, kvm, virtualization, netdev
  Cc: Bie, Tiwei, jgg, maxime.coquelin, Liang, Cunming, Wang, Zhihong,
	rob.miller, Wang, Xiao W, haotian.wang, Zhu, Lingshan, eperezma,
	lulu, parav, stefanha, rdunlap, hch, aadam, jakub.kicinski, jiri,
	shahafs, hanand, mhabets


On 2020/1/21 4:44 PM, Tian, Kevin wrote:
>> From: Jason Wang
>> Sent: Thursday, January 16, 2020 8:42 PM
>>
>> Hi all:
>>
>> Based on the comments and discussion for mdev based hardware virtio
>> offloading support[1]. A different approach to support vDPA device is
>> proposed in this series.
> Can you point to the actual link which triggered the direction change?
> A quick glimpse in that thread doesn't reveal such information...


Right, please see this link; the actual discussion happened on the
virtual-bus thread for some reason...

https://patchwork.ozlabs.org/patch/1195895/

Thanks


>
>> Instead of leveraging VFIO/mdev which may not work for some
>> vendors. This series tries to introduce a dedicated vDPA bus and
>> leverage vhost for userspace drivers. This help for the devices that
>> are not fit for VFIO and may reduce the conflict when try to propose a
>> bus template for virtual devices in [1].
>>
>> The vDPA support is split into following parts:
>>
>> 1) vDPA core (bus, device and driver abstraction)
>> 2) virtio vDPA transport for kernel virtio driver to control vDPA
>>     device
>> 3) vhost vDPA bus driver for userspace vhost driver to control vDPA
>>     device
>> 4) vendor vDPA drivers
>> 5) management API
>>
>> Both 1) and 2) are included in this series. Tiwei will work on part
>> 3). For 4), Ling Shan will work and post IFCVF driver. For 5) we leave
>> it to vendor to implement, but it's better to come into an agreement
>> for management to create/configure/destroy vDPA device.
>>
>> The sample driver is kept but renamed to vdpa_sim. An on-chip IOMMU
>> implementation is added to sample device to make it work for both
>> kernel virtio driver and userspace vhost driver. It implements a sysfs
>> based management API, but it can switch to any other (e.g devlink) if
>> necessary.
>>
>> Please refer each patch for more information.
>>
>> Comments are welcomed.
>>
>> [1] https://lkml.org/lkml/2019/11/18/261
>>
>> Jason Wang (5):
>>    vhost: factor out IOTLB
>>    vringh: IOTLB support
>>    vDPA: introduce vDPA bus
>>    virtio: introduce a vDPA based transport
>>    vdpasim: vDPA device simulator
>>
>>   MAINTAINERS                    |   2 +
>>   drivers/vhost/Kconfig          |   7 +
>>   drivers/vhost/Kconfig.vringh   |   1 +
>>   drivers/vhost/Makefile         |   2 +
>>   drivers/vhost/net.c            |   2 +-
>>   drivers/vhost/vhost.c          | 221 +++------
>>   drivers/vhost/vhost.h          |  36 +-
>>   drivers/vhost/vhost_iotlb.c    | 171 +++++++
>>   drivers/vhost/vringh.c         | 434 +++++++++++++++++-
>>   drivers/virtio/Kconfig         |  15 +
>>   drivers/virtio/Makefile        |   2 +
>>   drivers/virtio/vdpa/Kconfig    |  26 ++
>>   drivers/virtio/vdpa/Makefile   |   3 +
>>   drivers/virtio/vdpa/vdpa.c     | 141 ++++++
>>   drivers/virtio/vdpa/vdpa_sim.c | 796 +++++++++++++++++++++++++++++++++
>>   drivers/virtio/virtio_vdpa.c   | 400 +++++++++++++++++
>>   include/linux/vdpa.h           | 191 ++++++++
>>   include/linux/vhost_iotlb.h    |  45 ++
>>   include/linux/vringh.h         |  36 ++
>>   19 files changed, 2327 insertions(+), 204 deletions(-)
>>   create mode 100644 drivers/vhost/vhost_iotlb.c
>>   create mode 100644 drivers/virtio/vdpa/Kconfig
>>   create mode 100644 drivers/virtio/vdpa/Makefile
>>   create mode 100644 drivers/virtio/vdpa/vdpa.c
>>   create mode 100644 drivers/virtio/vdpa/vdpa_sim.c
>>   create mode 100644 drivers/virtio/virtio_vdpa.c
>>   create mode 100644 include/linux/vdpa.h
>>   create mode 100644 include/linux/vhost_iotlb.h
>>
>> --
>> 2.19.1


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-21  8:40       ` Tian, Kevin
@ 2020-01-21  9:41         ` Jason Wang
  0 siblings, 0 replies; 76+ messages in thread
From: Jason Wang @ 2020-01-21  9:41 UTC (permalink / raw)
  To: Tian, Kevin, Jason Gunthorpe
  Cc: mst, linux-kernel, kvm, virtualization, netdev, Bie, Tiwei,
	maxime.coquelin, Liang, Cunming, Wang, Zhihong, rob.miller, Wang,
	Xiao W, haotian.wang, Zhu, Lingshan, eperezma, lulu,
	Parav Pandit, stefanha, rdunlap, hch, aadam, jakub.kicinski,
	Jiri Pirko, Shahaf Shuler, hanand, mhabets


On 2020/1/21 4:40 PM, Tian, Kevin wrote:
>> From: Jason Wang <jasowang@redhat.com>
>> Sent: Friday, January 17, 2020 11:03 AM
>>
>>
>> On 2020/1/16 11:22 PM, Jason Gunthorpe wrote:
>>> On Thu, Jan 16, 2020 at 08:42:29PM +0800, Jason Wang wrote:
>>>> vDPA device is a device that uses a datapath which complies with the
>>>> virtio specifications with vendor specific control path. vDPA devices
>>>> can be both physically located on the hardware or emulated by
>>>> software. vDPA hardware devices are usually implemented through PCIE
>>>> with the following types:
>>>>
>>>> - PF (Physical Function) - A single Physical Function
>>>> - VF (Virtual Function) - Device that supports single root I/O
>>>>     virtualization (SR-IOV). Its Virtual Function (VF) represents a
>>>>     virtualized instance of the device that can be assigned to different
>>>>     partitions
>>>> - VDEV (Virtual Device) - With technologies such as Intel Scalable
>>>>     IOV, a virtual device composed by host OS utilizing one or more
>>>>     ADIs.
> The concept of VDEV includes both software bits and ADIs. If you
> only talk about hardware types, using ADI is more accurate.


Ok.


>
>>>> - SF (Sub function) - Vendor specific interface to slice the Physical
>>>>     Function to multiple sub functions that can be assigned to different
>>>>     partitions as virtual devices.
>>> I really hope we don't end up with two different ways to spell this
>>> same thing.
>>
>> I think you meant ADI vs SF. It looks to me that ADI is limited to the
>> scope of scalable IOV but SF not.
> ADI is just a term for the minimally assignable resource in Scalable IOV.
> 'Assignable' implies several things, e.g. the resource can be independently
> mapped to/accessed by user space or a guest, DMAs between two
> ADIs are isolated, operating one ADI doesn't affect another ADI,
> etc. I'm not clear about other vendor-specific interfaces, but I suppose
> they need to meet similar requirements. Then do we really want to
> differentiate ADI vs. SF? What about merging them, with ADI as just
> one example of finer-grained slicing?


I think so. That's what Jason G wants as well.

Thanks


>
>>
>>>> @@ -0,0 +1,2 @@
>>>> +# SPDX-License-Identifier: GPL-2.0
>>>> +obj-$(CONFIG_VDPA) += vdpa.o
>>>> diff --git a/drivers/virtio/vdpa/vdpa.c b/drivers/virtio/vdpa/vdpa.c
>>>> new file mode 100644
>>>> index 000000000000..2b0e4a9f105d
>>>> +++ b/drivers/virtio/vdpa/vdpa.c
>>>> @@ -0,0 +1,141 @@
>>>> +// SPDX-License-Identifier: GPL-2.0-only
>>>> +/*
>>>> + * vDPA bus.
>>>> + *
>>>> + * Copyright (c) 2019, Red Hat. All rights reserved.
>>>> + *     Author: Jason Wang <jasowang@redhat.com>
>>> 2020 these days
>>
>> Will fix.
>>
>>
>>>> + *
>>>> + */
>>>> +
>>>> +#include <linux/module.h>
>>>> +#include <linux/idr.h>
>>>> +#include <linux/vdpa.h>
>>>> +
>>>> +#define MOD_VERSION  "0.1"
>>> I think module versions are discouraged these days
>>
>> Will remove.
>>
>>
>>>> +#define MOD_DESC     "vDPA bus"
>>>> +#define MOD_AUTHOR   "Jason Wang <jasowang@redhat.com>"
>>>> +#define MOD_LICENSE  "GPL v2"
>>>> +
>>>> +static DEFINE_IDA(vdpa_index_ida);
>>>> +
>>>> +struct device *vdpa_get_parent(struct vdpa_device *vdpa)
>>>> +{
>>>> +	return vdpa->dev.parent;
>>>> +}
>>>> +EXPORT_SYMBOL(vdpa_get_parent);
>>>> +
>>>> +void vdpa_set_parent(struct vdpa_device *vdpa, struct device *parent)
>>>> +{
>>>> +	vdpa->dev.parent = parent;
>>>> +}
>>>> +EXPORT_SYMBOL(vdpa_set_parent);
>>>> +
>>>> +struct vdpa_device *dev_to_vdpa(struct device *_dev)
>>>> +{
>>>> +	return container_of(_dev, struct vdpa_device, dev);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(dev_to_vdpa);
>>>> +
>>>> +struct device *vdpa_to_dev(struct vdpa_device *vdpa)
>>>> +{
>>>> +	return &vdpa->dev;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(vdpa_to_dev);
>>> Why these trivial accessors? Seems unnecessary, or should at least be
>>> static inlines in a header
>>
>> Will fix.
>>
>>
>>>> +int register_vdpa_device(struct vdpa_device *vdpa)
>>>> +{
>>> Usually we want to see symbols consistently prefixed with vdpa_*, is
>>> there a reason why register/unregister are swapped?
>>
>> I follow the name from virtio. I will switch to vdpa_*.
>>
>>
>>>> +	int err;
>>>> +
>>>> +	if (!vdpa_get_parent(vdpa))
>>>> +		return -EINVAL;
>>>> +
>>>> +	if (!vdpa->config)
>>>> +		return -EINVAL;
>>>> +
>>>> +	err = ida_simple_get(&vdpa_index_ida, 0, 0, GFP_KERNEL);
>>>> +	if (err < 0)
>>>> +		return -EFAULT;
>>>> +
>>>> +	vdpa->dev.bus = &vdpa_bus;
>>>> +	device_initialize(&vdpa->dev);
>>> IMHO device_initialize should not be called inside something called
>>> register, too often we find out that the caller drivers need the device
>>> to be initialized earlier, ie to use the kref, or something.
>>>
>>> I find the best flow is to have some init function that does the
>>> device_initialize and sets the device_name that the driver can call
>>> early.
>>
>> Ok, will do.
>>
>>
>>> Shouldn't there be a device/driver matching process of some kind?
>>
>> The question is what do we want to match here.
>>
>> 1) "virtio" vs "vhost", I implemented a matching method for this in the mdev
>> series, but it looks unnecessary for a vDPA device driver to know about
>> this. Anyway we can use sysfs driver bind/unbind to switch drivers
>> 2) virtio device id and vendor id. I'm not sure we need this, considering
>> the two drivers so far (virtio/vhost) are all bus drivers.
>>
>> Thanks
>>
>>
>>> Jason
>>>
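
As a rough illustration of the two review points above (static inline
accessors, and an init step separate from register), here is a userspace C
sketch. The struct device stand-in, the refcount field, and the names
vdpa_init_device()/vdpa_register_device() are assumptions made for this
example; they are not taken from the series or from the kernel driver core.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's struct device; illustration only. */
struct device {
	struct device *parent;
	char name[32];
	int refcount;    /* models the embedded kref */
	int registered;  /* models "visible on the bus" */
};

struct vdpa_device {
	struct device dev;
	const void *config;  /* opaque stand-in for the config ops */
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Trivial accessors as static inlines instead of exported symbols. */
static inline struct vdpa_device *dev_to_vdpa(struct device *dev)
{
	return container_of(dev, struct vdpa_device, dev);
}

static inline struct device *vdpa_to_dev(struct vdpa_device *vdpa)
{
	return &vdpa->dev;
}

/* Step 1 (hypothetical name): set up refcount and name; cannot fail. */
static void vdpa_init_device(struct vdpa_device *vdpa, struct device *parent,
			     const void *config, int index)
{
	vdpa->dev.parent = parent;
	vdpa->dev.refcount = 1;
	vdpa->dev.registered = 0;
	vdpa->config = config;
	snprintf(vdpa->dev.name, sizeof(vdpa->dev.name), "vdpa%d", index);
}

/* Step 2 (hypothetical name): validate, then publish the device. */
static int vdpa_register_device(struct vdpa_device *vdpa)
{
	if (!vdpa->dev.parent || !vdpa->config)
		return -EINVAL;
	assert(vdpa->dev.refcount > 0); /* init must have run first */
	vdpa->dev.registered = 1;
	return 0;
}
```

A caller would run vdpa_init_device() early, take whatever references it
needs, and only call vdpa_register_device() once the device is ready to be
seen on the bus.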


^ permalink raw reply	[flat|nested] 76+ messages in thread

* RE: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-21  8:35                       ` Jason Wang
@ 2020-01-21 11:09                         ` Shahaf Shuler
  2020-01-22  6:36                           ` Jason Wang
  0 siblings, 1 reply; 76+ messages in thread
From: Shahaf Shuler @ 2020-01-21 11:09 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: Jason Gunthorpe, Rob Miller, linux-kernel, kvm, virtualization,
	Netdev, Bie, Tiwei, maxime.coquelin, Liang, Cunming, Wang,
	Zhihong, Wang, Xiao W, haotian.wang, Zhu, Lingshan, eperezma,
	lulu, Parav Pandit, Tian, Kevin, stefanha, rdunlap, hch,
	Ariel Adam, Jiri Pirko, hanand, mhabets

Tuesday, January 21, 2020 10:35 AM, Jason Wang:
> Subject: Re: [PATCH 3/5] vDPA: introduce vDPA bus
> 
> 
> On 2020/1/21 4:15 PM, Michael S. Tsirkin wrote:
> > On Tue, Jan 21, 2020 at 04:00:38PM +0800, Jason Wang wrote:
> >> On 2020/1/21 1:47 PM, Michael S. Tsirkin wrote:
> >>> On Tue, Jan 21, 2020 at 12:00:57PM +0800, Jason Wang wrote:
> >>>> On 2020/1/21 1:49 AM, Jason Gunthorpe wrote:
> >>>>> On Mon, Jan 20, 2020 at 04:43:53PM +0800, Jason Wang wrote:
> >>>>>> This is similar to the design of platform IOMMU part of
> >>>>>> vhost-vdpa. We decide to send diffs to platform IOMMU there. If
> >>>>>> it's ok to do that in driver, we can replace set_map with incremental
> >>>>>> API like map()/unmap().
> >>>>>>
> >>>>>> Then driver need to maintain rbtree itself.
> >>>>> I think we really need to see two modes, one where there is a
> >>>>> fixed translation without dynamic vIOMMU driven changes and one
> >>>>> that supports vIOMMU.
> >>>> I think in this case, you meant the method proposed by Shahaf that
> >>>> sends diffs of "fixed translation" to device?
> >>>>
> >>>> It would be kind of tricky to deal with the following case for example:
> >>>>
> >>>> old map [4G, 16G) new map [4G, 8G)
> >>>>
> >>>> If we do
> >>>>
> >>>> 1) flush [4G, 16G)
> >>>> 2) add [4G, 8G)
> >>>>
> >>>> There could be a window between 1) and 2).
> >>>>
> >>>> It requires the IOMMU that can do
> >>>>
> >>>> 1) remove [8G, 16G)
> >>>> 2) flush [8G, 16G)
> >>>> 3) change [4G, 8G)
> >>>>
> >>>> ....
> >>> Basically what I had in mind is something like qemu memory api
> >>>
> >>> 0. begin
> >>> 1. remove [8G, 16G)
> >>> 2. add [4G, 8G)
> >>> 3. commit
> >>
> >> This sounds more flexible e.g driver may choose to implement static
> >> mapping one through commit. But a question here, it looks to me this
> >> still requires the DMA to be synced with at least commit here.
> >> Otherwise device may get DMA fault? Or device is expected to be paused
> >> DMA during begin?
> >>
> >> Thanks
> > For example, commit might switch one set of tables for another,
> > without need to pause DMA.
> 
> 
> Yes, I think that works but need confirmation from Shahaf or Jason.

From my side, as I wrote, I would like to see the suggested function prototypes along with a definition of what is expected from the driver when each one is called.
It is not 100% clear to me what the outcome of remove/flush/change/commit should be.
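
One possible shape for such prototypes, modelled in userspace C purely for
discussion. It follows the begin/remove/add/commit flow suggested earlier in
the thread, with commit flipping between two tables. All names here
(sim_iommu, map_begin(), map_add(), map_remove(), map_commit(),
map_translate()) are hypothetical, and the single index flip only models
atomicity in software; whether real hardware can switch tables atomically is
exactly what is being debated.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ENTRIES 16

struct map_entry { uint64_t iova, size, pa; };

struct map_table {
	struct map_entry e[MAX_ENTRIES];
	int n;
};

/* Hypothetical device state: two tables, only tbl[active] serves DMA. */
struct sim_iommu {
	struct map_table tbl[2];
	int active;
	int in_txn;
};

/* begin: stage a copy of the live table; the device keeps using the old one. */
static void map_begin(struct sim_iommu *m)
{
	assert(!m->in_txn);
	m->tbl[!m->active] = m->tbl[m->active];
	m->in_txn = 1;
}

/* add: stage a new range in the shadow table; -1 if the table is full. */
static int map_add(struct sim_iommu *m, uint64_t iova, uint64_t size,
		   uint64_t pa)
{
	struct map_table *s = &m->tbl[!m->active];

	assert(m->in_txn);
	if (s->n == MAX_ENTRIES)
		return -1;
	s->e[s->n++] = (struct map_entry){ iova, size, pa };
	return 0;
}

/* remove: drop the staged entry starting at iova; -1 if there is none. */
static int map_remove(struct sim_iommu *m, uint64_t iova)
{
	struct map_table *s = &m->tbl[!m->active];

	assert(m->in_txn);
	for (int i = 0; i < s->n; i++) {
		if (s->e[i].iova == iova) {
			s->e[i] = s->e[--s->n];
			return 0;
		}
	}
	return -1;
}

/* commit: publish the shadow table with a single index flip. */
static void map_commit(struct sim_iommu *m)
{
	assert(m->in_txn);
	m->active = !m->active;
	m->in_txn = 0;
}

/* Translate through the live table only; 0 on hit, -1 on fault. */
static int map_translate(const struct sim_iommu *m, uint64_t iova,
			 uint64_t *pa)
{
	const struct map_table *t = &m->tbl[m->active];

	for (int i = 0; i < t->n; i++) {
		if (iova >= t->e[i].iova && iova - t->e[i].iova < t->e[i].size) {
			*pa = t->e[i].pa + (iova - t->e[i].iova);
			return 0;
		}
	}
	return -1;
}
```

In this model nothing staged between begin and commit is visible to
map_translate(), so a lookup sees either the old map or the new one, never a
half-updated table.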

> 
> Thanks
> 
> 
> 
> >
> >>> Anyway, I'm fine with a one-shot API for now, we can improve it
> >>> later.
> >>>
> >>>>> There are different optimization goals in the drivers for these
> >>>>> two configurations.
> >>>>>
> >>>>>>> If the first one, then I think memory hotplug is a heavy flow
> >>>>>>> regardless. Do you think the extra cycles for the tree traverse
> >>>>>>> will be visible in any way?
> >>>>>> I think if the driver can pause the DMA during the time for
> >>>>>> setting up new mapping, it should be fine.
> >>>>> This is very tricky for any driver if the mapping change hits the
> >>>>> virtio rings. :(
> >>>>>
> >>>>> Even a IOMMU using driver is going to have problems with that..
> >>>>>
> >>>>> Jason
> >>>> Or I wonder whether ATS/PRI can help here. E.g during I/O page
> >>>> fault, driver/device can wait for the new mapping to be set and
> >>>> then replay the DMA.
> >>>>
> >>>> Thanks
> >>>>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-21  8:15                     ` Michael S. Tsirkin
  2020-01-21  8:35                       ` Jason Wang
@ 2020-01-21 14:05                       ` Jason Gunthorpe
  2020-01-21 14:17                         ` Michael S. Tsirkin
  1 sibling, 1 reply; 76+ messages in thread
From: Jason Gunthorpe @ 2020-01-21 14:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Shahaf Shuler, Rob Miller, linux-kernel, kvm,
	virtualization, Netdev, Bie, Tiwei, maxime.coquelin, Liang,
	Cunming, Wang, Zhihong, Wang, Xiao W, haotian.wang, Zhu,
	Lingshan, eperezma, lulu, Parav Pandit, Tian, Kevin, stefanha,
	rdunlap, hch, Ariel Adam, jakub.kicinski, Jiri Pirko, hanand,
	mhabets

On Tue, Jan 21, 2020 at 03:15:43AM -0500, Michael S. Tsirkin wrote:
> > This sounds more flexible e.g driver may choose to implement static mapping
> > one through commit. But a question here, it looks to me this still requires
> > the DMA to be synced with at least commit here. Otherwise device may get DMA
> > fault? Or device is expected to be paused DMA during begin?
> > 
> > Thanks
> 
> For example, commit might switch one set of tables for another,
> without need to pause DMA.

I'm not aware of any hardware that can do something like this
completely atomically..

Any mapping change API has to be based around add/remove regions
without any active DMA (ie active DMA is a guest error the guest can
be crashed if it does this)

Jason
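
To make the [4G, 16G) to [4G, 8G) example from earlier in the thread
concrete, the sketch below computes which sub-ranges an incremental
map()/unmap() style API would have to remove and add. It is a userspace
illustration only; struct range, range_subtract() and map_diff() are
hypothetical names, and a real map would hold many ranges rather than one.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open IOVA range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Write the parts of a that are not covered by b into out[] (at most two
 * pieces) and return how many pieces were written.
 */
static int range_subtract(struct range a, struct range b, struct range out[2])
{
	int n = 0;

	if (a.start >= a.end)
		return 0;	/* a is empty, nothing survives */

	/* b is empty or does not overlap a: a survives untouched. */
	if (b.start >= b.end || b.end <= a.start || b.start >= a.end) {
		out[n++] = a;
		return n;
	}

	if (a.start < b.start)	/* piece of a below b */
		out[n++] = (struct range){ a.start, b.start };
	if (b.end < a.end)	/* piece of a above b */
		out[n++] = (struct range){ b.end, a.end };
	return n;
}

/*
 * Diff two single-range maps:
 *   to_unmap = old_map minus new_map
 *   to_map   = new_map minus old_map
 */
static void map_diff(struct range old_map, struct range new_map,
		     struct range to_unmap[2], int *n_unmap,
		     struct range to_map[2], int *n_map)
{
	*n_unmap = range_subtract(old_map, new_map, to_unmap);
	*n_map = range_subtract(new_map, old_map, to_map);
}
```

For old map [4G, 16G) and new map [4G, 8G) this yields a single range to
remove, [8G, 16G), and nothing to add, so [4G, 8G) is never torn down.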

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-20 21:25                 ` Michael S. Tsirkin
  2020-01-20 21:47                   ` Shahaf Shuler
@ 2020-01-21 14:07                   ` Jason Gunthorpe
  2020-01-21 14:16                     ` Michael S. Tsirkin
  1 sibling, 1 reply; 76+ messages in thread
From: Jason Gunthorpe @ 2020-01-21 14:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Shahaf Shuler, Jason Wang, Rob Miller, linux-kernel, kvm,
	virtualization, Netdev, Bie, Tiwei, maxime.coquelin, Liang,
	Cunming, Wang, Zhihong, Wang, Xiao W, haotian.wang, Zhu,
	Lingshan, eperezma, lulu, Parav Pandit, Tian, Kevin, stefanha,
	rdunlap, hch, Ariel Adam, jakub.kicinski, Jiri Pirko, hanand,
	mhabets

On Mon, Jan 20, 2020 at 04:25:23PM -0500, Michael S. Tsirkin wrote:
> On Mon, Jan 20, 2020 at 08:51:43PM +0000, Shahaf Shuler wrote:
> > Monday, January 20, 2020 7:50 PM, Jason Gunthorpe:
> > > Subject: Re: [PATCH 3/5] vDPA: introduce vDPA bus
> > > 
> > > On Mon, Jan 20, 2020 at 04:43:53PM +0800, Jason Wang wrote:
> > > > This is similar to the design of platform IOMMU part of vhost-vdpa. We
> > > > decide to send diffs to platform IOMMU there. If it's ok to do that in
> > > > driver, we can replace set_map with incremental API like map()/unmap().
> > > >
> > > > Then driver need to maintain rbtree itself.
> > > 
> > > I think we really need to see two modes, one where there is a fixed
> > > translation without dynamic vIOMMU driven changes and one that supports
> > > vIOMMU.
> > > 
> > > There are different optimization goals in the drivers for these two
> > > configurations.
> > 
> > +1.
> > It will be best to have one API for static config (i.e. mapping can be
> > set only before virtio device gets active), and one API for dynamic
> > changes that can be set after the virtio device is active. 
> 
> Frankly I don't see when we'd use the static one.
> Memory hotplug is enabled for most guests...

If someone wants to run a full performance application, like dpdk,
then they may wish to trade memory hotplug in that VM for more
performance.

Perhaps Shahaf can quantify the performance delta?

Jason

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-20 21:56             ` Michael S. Tsirkin
@ 2020-01-21 14:12               ` Jason Gunthorpe
  2020-01-21 14:15                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 76+ messages in thread
From: Jason Gunthorpe @ 2020-01-21 14:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, linux-kernel, kvm, virtualization, netdev, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller,
	xiao.w.wang, haotian.wang, lingshan.zhu, eperezma, lulu,
	Parav Pandit, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, Jiri Pirko, Shahaf Shuler, hanand, mhabets

On Mon, Jan 20, 2020 at 04:56:06PM -0500, Michael S. Tsirkin wrote:
> > For vfio? vfio is the only thing I am aware doing
> > that, and this is not vfio..
> 
> vfio is not doing anything. anyone can use a combination
> of unbind and driver_override to attach a driver to a device.
> 
> It's not a great interface but it's there without any code,
> and it will stay there without maintenance overhead
> if we later add a nicer one.

Well, it is not a great interface, and it is only really used in
normal cases by vfio.

I don't think it is a good idea to design new subsystems with that
idea in mind, particularly since detaching the vdpa driver would not
trigger destruction of the underlying dynamic resource (ie the SF).

We need a way to trigger that destruction..

Jason 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-21 14:12               ` Jason Gunthorpe
@ 2020-01-21 14:15                 ` Michael S. Tsirkin
  2020-01-21 14:16                   ` Jason Gunthorpe
  0 siblings, 1 reply; 76+ messages in thread
From: Michael S. Tsirkin @ 2020-01-21 14:15 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jason Wang, linux-kernel, kvm, virtualization, netdev, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller,
	xiao.w.wang, haotian.wang, lingshan.zhu, eperezma, lulu,
	Parav Pandit, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, Jiri Pirko, Shahaf Shuler, hanand, mhabets

On Tue, Jan 21, 2020 at 02:12:05PM +0000, Jason Gunthorpe wrote:
> On Mon, Jan 20, 2020 at 04:56:06PM -0500, Michael S. Tsirkin wrote:
> > > For vfio? vfio is the only thing I am aware doing
> > > that, and this is not vfio..
> > 
> > vfio is not doing anything. anyone can use a combination
> > of unbind and driver_override to attach a driver to a device.
> > 
> > It's not a great interface but it's there without any code,
> > and it will stay there without maintenance overhead
> > if we later add a nicer one.
> 
> Well, it is not a great interface, and it is only really used in
> normal cases by vfio.
> 
> I don't think it is a good idea to design new subsystems with that
> idea in mind, particularly since detaching the vdpa driver would not
> trigger destruction of the underlying dynamic resource (ie the SF).
> 
> We need a way to trigger that destruction..
> 
> Jason 

You wanted a netlink command for this, right?

-- 
MST


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-21 14:07                   ` Jason Gunthorpe
@ 2020-01-21 14:16                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 76+ messages in thread
From: Michael S. Tsirkin @ 2020-01-21 14:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Shahaf Shuler, Jason Wang, Rob Miller, linux-kernel, kvm,
	virtualization, Netdev, Bie, Tiwei, maxime.coquelin, Liang,
	Cunming, Wang, Zhihong, Wang, Xiao W, haotian.wang, Zhu,
	Lingshan, eperezma, lulu, Parav Pandit, Tian, Kevin, stefanha,
	rdunlap, hch, Ariel Adam, jakub.kicinski, Jiri Pirko, hanand,
	mhabets

On Tue, Jan 21, 2020 at 02:07:59PM +0000, Jason Gunthorpe wrote:
> On Mon, Jan 20, 2020 at 04:25:23PM -0500, Michael S. Tsirkin wrote:
> > On Mon, Jan 20, 2020 at 08:51:43PM +0000, Shahaf Shuler wrote:
> > > Monday, January 20, 2020 7:50 PM, Jason Gunthorpe:
> > > > Subject: Re: [PATCH 3/5] vDPA: introduce vDPA bus
> > > > 
> > > > On Mon, Jan 20, 2020 at 04:43:53PM +0800, Jason Wang wrote:
> > > > > This is similar to the design of platform IOMMU part of vhost-vdpa. We
> > > > > decide to send diffs to platform IOMMU there. If it's ok to do that in
> > > > > driver, we can replace set_map with incremental API like map()/unmap().
> > > > >
> > > > > Then driver need to maintain rbtree itself.
> > > > 
> > > > I think we really need to see two modes, one where there is a fixed
> > > > translation without dynamic vIOMMU driven changes and one that supports
> > > > vIOMMU.
> > > > 
> > > > There are different optimization goals in the drivers for these two
> > > > configurations.
> > > 
> > > +1.
> > > It will be best to have one API for static config (i.e. mapping can be
> > > set only before virtio device gets active), and one API for dynamic
> > > changes that can be set after the virtio device is active. 
> > 
> > Frankly I don't see when we'd use the static one.
> > Memory hotplug is enabled for most guests...
> 
> If someone wants to run a full performance application, like dpdk,
> then they may wish to trade memory hotplug in that VM for more
> performance.

Right. But let's get basic functionality working first.

> Perhaps Shahaf can quantify the performance delta?
> 
> Jason


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-21 14:15                 ` Michael S. Tsirkin
@ 2020-01-21 14:16                   ` Jason Gunthorpe
  0 siblings, 0 replies; 76+ messages in thread
From: Jason Gunthorpe @ 2020-01-21 14:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, linux-kernel, kvm, virtualization, netdev, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller,
	xiao.w.wang, haotian.wang, lingshan.zhu, eperezma, lulu,
	Parav Pandit, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, Jiri Pirko, Shahaf Shuler, hanand, mhabets

On Tue, Jan 21, 2020 at 09:15:14AM -0500, Michael S. Tsirkin wrote:
> On Tue, Jan 21, 2020 at 02:12:05PM +0000, Jason Gunthorpe wrote:
> > On Mon, Jan 20, 2020 at 04:56:06PM -0500, Michael S. Tsirkin wrote:
> > > > For vfio? vfio is the only thing I am aware doing
> > > > that, and this is not vfio..
> > > 
> > > vfio is not doing anything. anyone can use a combination
> > > of unbind and driver_override to attach a driver to a device.
> > > 
> > > It's not a great interface but it's there without any code,
> > > and it will stay there without maintenance overhead
> > > if we later add a nicer one.
> > 
> > Well, it is not a great interface, and it is only really used in
> > normal cases by vfio.
> > 
> > I don't think it is a good idea to design new subsystems with that
> > idea in mind, particularly since detaching the vdpa driver would not
> > trigger destruction of the underlying dynamic resource (ie the SF).
> > 
> > We need a way to trigger that destruction..
> > 
> > Jason 
> 
> You wanted a netlink command for this, right?

It is my suggestion.

Based on experience here: we started out with sysfs and it was OK, but
slow. When we added container support the entire sysfs thing
completely exploded and we had to replace it with netlink.

Jason

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-21 14:05                       ` Jason Gunthorpe
@ 2020-01-21 14:17                         ` Michael S. Tsirkin
  2020-01-22  6:18                           ` Jason Wang
  0 siblings, 1 reply; 76+ messages in thread
From: Michael S. Tsirkin @ 2020-01-21 14:17 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Jason Wang, Shahaf Shuler, Rob Miller, linux-kernel, kvm,
	virtualization, Netdev, Bie, Tiwei, maxime.coquelin, Liang,
	Cunming, Wang, Zhihong, Wang, Xiao W, haotian.wang, Zhu,
	Lingshan, eperezma, lulu, Parav Pandit, Tian, Kevin, stefanha,
	rdunlap, hch, Ariel Adam, jakub.kicinski, Jiri Pirko, hanand,
	mhabets

On Tue, Jan 21, 2020 at 02:05:04PM +0000, Jason Gunthorpe wrote:
> On Tue, Jan 21, 2020 at 03:15:43AM -0500, Michael S. Tsirkin wrote:
> > > This sounds more flexible e.g driver may choose to implement static mapping
> > > one through commit. But a question here, it looks to me this still requires
> > > the DMA to be synced with at least commit here. Otherwise device may get DMA
> > > fault? Or device is expected to be paused DMA during begin?
> > > 
> > > Thanks
> > 
> > For example, commit might switch one set of tables for another,
> > without need to pause DMA.
> 
> I'm not aware of any hardware that can do something like this
> completely atomically..

FWIW VTD can do this atomically.

> Any mapping change API has to be based around add/remove regions
> without any active DMA (ie active DMA is a guest error the guest can
> be crashed if it does this)
> 
> Jason

Right, lots of cases are well served by only changing parts of
mapping that aren't in active use. Memory hotplug is such a case.
That's not the same as a completely static mapping.

-- 
MST


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-21 14:17                         ` Michael S. Tsirkin
@ 2020-01-22  6:18                           ` Jason Wang
  0 siblings, 0 replies; 76+ messages in thread
From: Jason Wang @ 2020-01-22  6:18 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Gunthorpe
  Cc: Shahaf Shuler, Rob Miller, linux-kernel, kvm, virtualization,
	Netdev, Bie, Tiwei, maxime.coquelin, Liang, Cunming, Wang,
	Zhihong, Wang, Xiao W, haotian.wang, Zhu, Lingshan, eperezma,
	lulu, Parav Pandit, Tian, Kevin, stefanha, rdunlap, hch,
	Ariel Adam, jakub.kicinski, Jiri Pirko, hanand, mhabets


On 2020/1/21 10:17 PM, Michael S. Tsirkin wrote:
> On Tue, Jan 21, 2020 at 02:05:04PM +0000, Jason Gunthorpe wrote:
>> On Tue, Jan 21, 2020 at 03:15:43AM -0500, Michael S. Tsirkin wrote:
>>>> This sounds more flexible e.g driver may choose to implement static mapping
>>>> one through commit. But a question here, it looks to me this still requires
>>>> the DMA to be synced with at least commit here. Otherwise device may get DMA
>>>> fault? Or device is expected to be paused DMA during begin?
>>>>
>>>> Thanks
>>> For example, commit might switch one set of tables for another,
>>> without need to pause DMA.
>> I'm not aware of any hardware that can do something like this
>> completely atomically..
> FWIW VTD can do this atomically.
>
>> Any mapping change API has to be based around add/remove regions
>> without any active DMA (ie active DMA is a guest error the guest can
>> be crashed if it does this)
>>
>> Jason
> Right, lots of cases are well served by only changing parts of
> mapping that aren't in active use. Memory hotplug is such a case.
> That's not the same as a completely static mapping.


For hotplug it should be fine with current QEMU, since each DIMM belongs to
a different memory region. So each DIMM should have its own dedicated map
entries in the IOMMU.

But I'm not sure whether the merging logic in the current vhost memory
listener may cause trouble; we may need to disable it.

Thanks


>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 3/5] vDPA: introduce vDPA bus
  2020-01-21 11:09                         ` Shahaf Shuler
@ 2020-01-22  6:36                           ` Jason Wang
  0 siblings, 0 replies; 76+ messages in thread
From: Jason Wang @ 2020-01-22  6:36 UTC (permalink / raw)
  To: Shahaf Shuler, Michael S. Tsirkin
  Cc: Jason Gunthorpe, Rob Miller, linux-kernel, kvm, virtualization,
	Netdev, Bie, Tiwei, maxime.coquelin, Liang, Cunming, Wang,
	Zhihong, Wang, Xiao W, haotian.wang, Zhu, Lingshan, eperezma,
	lulu, Parav Pandit, Tian, Kevin, stefanha, rdunlap, hch,
	Ariel Adam, Jiri Pirko, hanand, mhabets


On 2020/1/21 7:09 PM, Shahaf Shuler wrote:
> Tuesday, January 21, 2020 10:35 AM, Jason Wang:
>> Subject: Re: [PATCH 3/5] vDPA: introduce vDPA bus
>>
>>
>> On 2020/1/21 4:15 PM, Michael S. Tsirkin wrote:
>>> On Tue, Jan 21, 2020 at 04:00:38PM +0800, Jason Wang wrote:
>>>> On 2020/1/21 1:47 PM, Michael S. Tsirkin wrote:
>>>>> On Tue, Jan 21, 2020 at 12:00:57PM +0800, Jason Wang wrote:
>>>>>> On 2020/1/21 1:49 AM, Jason Gunthorpe wrote:
>>>>>>> On Mon, Jan 20, 2020 at 04:43:53PM +0800, Jason Wang wrote:
>>>>>>>> This is similar to the design of platform IOMMU part of
>>>>>>>> vhost-vdpa. We decide to send diffs to platform IOMMU there. If
>>>>>>>> it's ok to do that in driver, we can replace set_map with incremental
>>>>>>>> API like map()/unmap().
>>>>>>>> Then driver need to maintain rbtree itself.
>>>>>>> I think we really need to see two modes, one where there is a
>>>>>>> fixed translation without dynamic vIOMMU driven changes and one
>>>>>>> that supports vIOMMU.
>>>>>> I think in this case, you meant the method proposed by Shahaf that
>>>>>> sends diffs of "fixed translation" to device?
>>>>>>
>>>>>> It would be kind of tricky to deal with the following case for example:
>>>>>>
>>>>>> old map [4G, 16G) new map [4G, 8G)
>>>>>>
>>>>>> If we do
>>>>>>
>>>>>> 1) flush [4G, 16G)
>>>>>> 2) add [4G, 8G)
>>>>>>
>>>>>> There could be a window between 1) and 2).
>>>>>>
>>>>>> It requires the IOMMU that can do
>>>>>>
>>>>>> 1) remove [8G, 16G)
>>>>>> 2) flush [8G, 16G)
>>>>>> 3) change [4G, 8G)
>>>>>>
>>>>>> ....
>>>>> Basically what I had in mind is something like qemu memory api
>>>>>
>>>>> 0. begin
>>>>> 1. remove [8G, 16G)
>>>>> 2. add [4G, 8G)
>>>>> 3. commit
>>>> This sounds more flexible e.g driver may choose to implement static
>>>> mapping one through commit. But a question here, it looks to me this
>>>> still requires the DMA to be synced with at least commit here.
>>>> Otherwise device may get DMA fault? Or device is expected to be paused
>>>> DMA during begin?
>>>> Thanks
>>> For example, commit might switch one set of tables for another,
>>> without need to pause DMA.
>> Yes, I think that works but need confirmation from Shahaf or Jason.
>  From my side, as I wrote, I would like to see the suggested function prototype along w/ the definition of the expectation from driver upon calling those.
> It is not 100% clear to me what should be the outcome of remove/flush/change/commit


Right, I can do this in the next version once the discussion has converged.

Thanks


>


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 5/5] vdpasim: vDPA device simulator
  2020-01-16 12:42 ` [PATCH 5/5] vdpasim: vDPA device simulator Jason Wang
                     ` (2 preceding siblings ...)
  2020-01-18 18:18   ` kbuild test robot
@ 2020-01-28  3:32   ` Dan Carpenter
  2020-02-04  4:07     ` Jason Wang
  2020-02-04  8:21   ` Zhu Lingshan
  4 siblings, 1 reply; 76+ messages in thread
From: Dan Carpenter @ 2020-01-28  3:32 UTC (permalink / raw)
  To: kbuild, Jason Wang
  Cc: kbuild-all, mst, jasowang, linux-kernel, kvm, virtualization,
	netdev, tiwei.bie, jgg, maxime.coquelin, cunming.liang,
	zhihong.wang, rob.miller, xiao.w.wang, haotian.wang,
	lingshan.zhu, eperezma, lulu, parav, kevin.tian, stefanha,
	rdunlap, hch, aadam, jakub.kicinski, jiri, shahafs, hanand,
	mhabets

Hi Jason,

url:    https://github.com/0day-ci/linux/commits/Jason-Wang/vDPA-support/20200117-170243
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next

If you fix the issue, kindly add the following tags
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
drivers/virtio/vdpa/vdpa_sim.c:288 vdpasim_alloc_coherent() warn: returning freed memory 'addr'

# https://github.com/0day-ci/linux/commit/55047769b3e974d68b2aab5ce0022459b172a23f
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 55047769b3e974d68b2aab5ce0022459b172a23f
vim +/addr +288 drivers/virtio/vdpa/vdpa_sim.c

55047769b3e974 Jason Wang 2020-01-16  263  static void *vdpasim_alloc_coherent(struct device *dev, size_t size,
55047769b3e974 Jason Wang 2020-01-16  264  				    dma_addr_t *dma_addr, gfp_t flag,
55047769b3e974 Jason Wang 2020-01-16  265  				    unsigned long attrs)
55047769b3e974 Jason Wang 2020-01-16  266  {
55047769b3e974 Jason Wang 2020-01-16  267  	struct vdpa_device *vdpa = dev_to_vdpa(dev);
55047769b3e974 Jason Wang 2020-01-16  268  	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
55047769b3e974 Jason Wang 2020-01-16  269  	struct vhost_iotlb *iommu = vdpasim->iommu;
55047769b3e974 Jason Wang 2020-01-16  270  	void *addr = kmalloc(size, flag);
55047769b3e974 Jason Wang 2020-01-16  271  	int ret;
55047769b3e974 Jason Wang 2020-01-16  272  
55047769b3e974 Jason Wang 2020-01-16  273  	if (!addr)
55047769b3e974 Jason Wang 2020-01-16  274  		*dma_addr = DMA_MAPPING_ERROR;
55047769b3e974 Jason Wang 2020-01-16  275  	else {
55047769b3e974 Jason Wang 2020-01-16  276  		u64 pa = virt_to_phys(addr);
55047769b3e974 Jason Wang 2020-01-16  277  
55047769b3e974 Jason Wang 2020-01-16  278  		ret = vhost_iotlb_add_range(iommu, (u64)pa,
55047769b3e974 Jason Wang 2020-01-16  279  					    (u64)pa + size - 1,
55047769b3e974 Jason Wang 2020-01-16  280  					    pa, VHOST_MAP_RW);
55047769b3e974 Jason Wang 2020-01-16  281  		if (ret) {
55047769b3e974 Jason Wang 2020-01-16  282  			kfree(addr);
                                                                ^^^^^^^^^^^
55047769b3e974 Jason Wang 2020-01-16  283  			*dma_addr = DMA_MAPPING_ERROR;
55047769b3e974 Jason Wang 2020-01-16  284  		} else
55047769b3e974 Jason Wang 2020-01-16  285  			*dma_addr = (dma_addr_t)pa;
55047769b3e974 Jason Wang 2020-01-16  286  	}
55047769b3e974 Jason Wang 2020-01-16  287  
55047769b3e974 Jason Wang 2020-01-16 @288  	return addr;
                                                ^^^^^^^^^^^^
55047769b3e974 Jason Wang 2020-01-16  289  }
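
One possible shape for the fix, sketched as a small userspace model and not as the actual patch: return NULL once the buffer has been freed, so the caller never sees a dangling pointer. The model_* names are hypothetical stand-ins (malloc()/free() for kmalloc()/kfree(), model_iotlb_add_range() for vhost_iotlb_add_range()), and the pointer value is used directly as the "physical" address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_DMA_MAPPING_ERROR (~(uint64_t)0)

/* Test knob: make the next mapping attempt fail. */
static int model_fail_add;

/* Hypothetical stand-in for vhost_iotlb_add_range(). */
static int model_iotlb_add_range(uint64_t start, uint64_t last)
{
	(void)start;
	(void)last;
	return model_fail_add ? -12 /* -ENOMEM */ : 0;
}

static void *model_alloc_coherent(size_t size, uint64_t *dma_addr)
{
	void *addr = malloc(size);
	uint64_t pa;

	*dma_addr = MODEL_DMA_MAPPING_ERROR;
	if (!addr)
		return NULL;

	/* Identity "translation": this model has no real physical addresses. */
	pa = (uint64_t)(uintptr_t)addr;
	if (model_iotlb_add_range(pa, pa + size - 1)) {
		free(addr);
		return NULL;	/* never hand back the freed pointer */
	}

	*dma_addr = pa;
	return addr;
}
```

The only behavioural change compared to the flagged code is the early return NULL after the free; the success path still returns the buffer and its DMA address.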

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation


* Re: [PATCH 5/5] vdpasim: vDPA device simulator
  2020-01-28  3:32   ` Dan Carpenter
@ 2020-02-04  4:07     ` Jason Wang
  0 siblings, 0 replies; 76+ messages in thread
From: Jason Wang @ 2020-02-04  4:07 UTC (permalink / raw)
  To: Dan Carpenter, kbuild
  Cc: kbuild-all, mst, linux-kernel, kvm, virtualization, netdev,
	tiwei.bie, jgg, maxime.coquelin, cunming.liang, zhihong.wang,
	rob.miller, xiao.w.wang, haotian.wang, lingshan.zhu, eperezma,
	lulu, parav, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, jiri, shahafs, hanand, mhabets


On 2020/1/28 11:32 AM, Dan Carpenter wrote:
> Hi Jason,
>
> url:    https://github.com/0day-ci/linux/commits/Jason-Wang/vDPA-support/20200117-170243
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next


Will fix this.

Thanks


>
> smatch warnings:
> drivers/virtio/vdpa/vdpa_sim.c:288 vdpasim_alloc_coherent() warn: returning freed memory 'addr'
>



* Re: [PATCH 5/5] vdpasim: vDPA device simulator
  2020-01-17 14:10       ` Jason Gunthorpe
  2020-01-20  8:01         ` Jason Wang
@ 2020-02-04  4:19         ` Jason Wang
  1 sibling, 0 replies; 76+ messages in thread
From: Jason Wang @ 2020-02-04  4:19 UTC (permalink / raw)
  To: Jason Gunthorpe, Parav Pandit
  Cc: mst, linux-kernel, kvm, virtualization, netdev, tiwei.bie,
	maxime.coquelin, cunming.liang, zhihong.wang, rob.miller,
	xiao.w.wang, haotian.wang, lingshan.zhu, eperezma, lulu,
	kevin.tian, stefanha, rdunlap, hch, aadam, jakub.kicinski,
	Jiri Pirko, Shahaf Shuler, hanand, mhabets, kuba


On 2020/1/17 10:10 PM, Jason Gunthorpe wrote:
>>>> Netlink based lifecycle management could be implemented for vDPA
>>>> simulator as well.
>>> This is just begging for a netlink based approach.
>>>
>>> Certainly netlink driven removal should be an agreeable standard for
>>> all devices, I think.
>> Well, I think Parav had some proposals during the discussion of the mdev
>> approach. But I'm not sure if he had any RFC code for me to integrate
>> into vdpasim.
>>
>> Or do you want me to propose the netlink API? If yes, would you prefer a
>> new virtio-dedicated one or a subset of devlink?
> Well, let's see what feedback Parav has
>
> Jason


Hi Parav:

Do you have any update on this? If it still requires some time, I will
post a V2 that sticks to the sysfs-based API.

Thanks



* Re: [PATCH 5/5] vdpasim: vDPA device simulator
  2020-01-16 12:42 ` [PATCH 5/5] vdpasim: vDPA device simulator Jason Wang
                     ` (3 preceding siblings ...)
  2020-01-28  3:32   ` Dan Carpenter
@ 2020-02-04  8:21   ` Zhu Lingshan
  2020-02-04  8:28     ` Jason Wang
  4 siblings, 1 reply; 76+ messages in thread
From: Zhu Lingshan @ 2020-02-04  8:21 UTC (permalink / raw)
  To: Jason Wang, mst, linux-kernel, kvm, virtualization, netdev
  Cc: tiwei.bie, jgg, maxime.coquelin, cunming.liang, zhihong.wang,
	rob.miller, xiao.w.wang, haotian.wang, lingshan.zhu, eperezma,
	lulu, parav, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, jiri, shahafs, hanand, mhabets


On 1/16/2020 8:42 PM, Jason Wang wrote:
> This patch implements a software vDPA networking device. The datapath
> is implemented through vringh and workqueue. The device has an on-chip
> IOMMU which translates IOVA to PA. For kernel virtio drivers, the vDPA
> simulator driver provides dma_ops. For vhost drivers, the set_map() method
> of vdpa_config_ops is implemented to accept mappings from vhost.
>
> A sysfs based management interface is implemented, devices are
> created and removed through:
>
> /sys/devices/virtual/vdpa_simulator/netdev/{create|remove}
>
> Netlink based lifecycle management could be implemented for vDPA
> simulator as well.
>
> Currently, the vDPA device simulator loops TX traffic back to RX, so
> the main use case for the device is vDPA feature testing, prototyping
> and development.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>   drivers/virtio/vdpa/Kconfig    |  17 +
>   drivers/virtio/vdpa/Makefile   |   1 +
>   drivers/virtio/vdpa/vdpa_sim.c | 796 +++++++++++++++++++++++++++++++++
>   3 files changed, 814 insertions(+)
>   create mode 100644 drivers/virtio/vdpa/vdpa_sim.c
>
> diff --git a/drivers/virtio/vdpa/Kconfig b/drivers/virtio/vdpa/Kconfig
> index 3032727b4d98..12ec25d48423 100644
> --- a/drivers/virtio/vdpa/Kconfig
> +++ b/drivers/virtio/vdpa/Kconfig
> @@ -7,3 +7,20 @@ config VDPA
>             datapath which complies with virtio specifications with
>             vendor specific control path.
>   
> +menuconfig VDPA_MENU
> +	bool "VDPA drivers"
> +	default n
> +
> +if VDPA_MENU
> +
> +config VDPA_SIM
> +	tristate "vDPA device simulator"
> +        select VDPA
> +        default n
> +        help
> +          vDPA networking device simulator which loops TX traffic back
> +          to RX. This device is used for testing, prototyping and
> +          development of vDPA.
> +
> +endif # VDPA_MENU
> +
> diff --git a/drivers/virtio/vdpa/Makefile b/drivers/virtio/vdpa/Makefile
> index ee6a35e8a4fb..5ec0e6ae3c57 100644
> --- a/drivers/virtio/vdpa/Makefile
> +++ b/drivers/virtio/vdpa/Makefile
> @@ -1,2 +1,3 @@
>   # SPDX-License-Identifier: GPL-2.0
>   obj-$(CONFIG_VDPA) += vdpa.o
> +obj-$(CONFIG_VDPA_SIM) += vdpa_sim.o
> diff --git a/drivers/virtio/vdpa/vdpa_sim.c b/drivers/virtio/vdpa/vdpa_sim.c
> new file mode 100644
> index 000000000000..85a235f99e3d
> --- /dev/null
> +++ b/drivers/virtio/vdpa/vdpa_sim.c
> @@ -0,0 +1,796 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * VDPA networking device simulator.
> + *
> + * Copyright (c) 2020, Red Hat Inc. All rights reserved.
> + *     Author: Jason Wang <jasowang@redhat.com>
> + *
> + */
> +
> +#include <linux/init.h>
> +#include <linux/module.h>
> +#include <linux/device.h>
> +#include <linux/kernel.h>
> +#include <linux/fs.h>
> +#include <linux/poll.h>
> +#include <linux/slab.h>
> +#include <linux/sched.h>
> +#include <linux/wait.h>
> +#include <linux/uuid.h>
> +#include <linux/iommu.h>
> +#include <linux/sysfs.h>
> +#include <linux/file.h>
> +#include <linux/etherdevice.h>
> +#include <linux/vringh.h>
> +#include <linux/vdpa.h>
> +#include <linux/vhost_iotlb.h>
> +#include <uapi/linux/virtio_config.h>
> +#include <uapi/linux/virtio_net.h>
> +
> +#define DRV_VERSION  "0.1"
> +#define DRV_AUTHOR   "Jason Wang <jasowang@redhat.com>"
> +#define DRV_DESC     "vDPA Device Simulator"
> +#define DRV_LICENSE  "GPL v2"
> +
> +struct vdpasim_dev {
> +	struct class	*vd_class;
> +	struct idr	vd_idr;
> +	struct device	dev;
> +	struct kobject  *devices_kobj;
> +};
> +
> +struct vdpasim_dev *vdpasim_dev;
> +
> +struct vdpasim_virtqueue {
> +	struct vringh vring;
> +	struct vringh_kiov iov;
> +	unsigned short head;
> +	bool ready;
> +	u64 desc_addr;
> +	u64 device_addr;
> +	u64 driver_addr;
> +	u32 num;
> +	void *private;
> +	irqreturn_t (*cb)(void *data);
> +};
> +
> +#define VDPASIM_QUEUE_ALIGN PAGE_SIZE
> +#define VDPASIM_QUEUE_MAX 256
> +#define VDPASIM_DEVICE_ID 0x1
> +#define VDPASIM_VENDOR_ID 0
> +#define VDPASIM_VQ_NUM 0x2
> +#define VDPASIM_CLASS_NAME "vdpa_simulator"
> +#define VDPASIM_NAME "netdev"
> +
> +u64 vdpasim_features = (1ULL << VIRTIO_F_ANY_LAYOUT) |
> +		       (1ULL << VIRTIO_F_VERSION_1)  |
> +		       (1ULL << VIRTIO_F_IOMMU_PLATFORM);
> +
> +/* State of each vdpasim device */
> +struct vdpasim {
> +	struct vdpasim_virtqueue vqs[2];
> +	struct work_struct work;
> +	/* spinlock to synchronize virtqueue state */
> +	spinlock_t lock;
> +	struct vdpa_device vdpa;
> +	struct virtio_net_config config;
> +	struct vhost_iotlb *iommu;
> +	void *buffer;
> +	u32 status;
> +	u32 generation;
> +	u64 features;
> +	struct list_head next;
> +	guid_t uuid;
> +	char name[64];
> +};
> +
> +static struct mutex vsim_list_lock;
> +static struct list_head vsim_devices_list;
> +
> +static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
> +{
> +	return container_of(vdpa, struct vdpasim, vdpa);
> +}
> +
> +static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
> +{
> +	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> +	int ret;
> +
> +	ret = vringh_init_iotlb(&vq->vring, vdpasim_features, VDPASIM_QUEUE_MAX,
> +			        false, (struct vring_desc *)vq->desc_addr,
> +				(struct vring_avail *)vq->driver_addr,
> +				(struct vring_used *)vq->device_addr);
> +}
> +
> +static void vdpasim_vq_reset(struct vdpasim_virtqueue *vq)
> +{
> +	vq->ready = 0;
> +	vq->desc_addr = 0;
> +	vq->driver_addr = 0;
> +	vq->device_addr = 0;
> +	vq->cb = NULL;
> +	vq->private = NULL;
> +	vringh_init_iotlb(&vq->vring, vdpasim_features, VDPASIM_QUEUE_MAX,
> +			  false, 0, 0, 0);
> +}
> +
> +static void vdpasim_reset(struct vdpasim *vdpasim)
> +{
> +	int i;
> +
> +	for (i = 0; i < VDPASIM_VQ_NUM; i++)
> +		vdpasim_vq_reset(&vdpasim->vqs[i]);
> +
> +	vhost_iotlb_reset(vdpasim->iommu);
> +
> +	vdpasim->features = 0;
> +	vdpasim->status = 0;
> +	++vdpasim->generation;
> +}
> +
> +static void vdpasim_work(struct work_struct *work)
> +{
> +	struct vdpasim *vdpasim = container_of(work, struct
> +						 vdpasim, work);
> +	struct vdpasim_virtqueue *txq = &vdpasim->vqs[1];
> +	struct vdpasim_virtqueue *rxq = &vdpasim->vqs[0];
> +	size_t read, write, total_write;
> +	int err;
> +	int pkts = 0;
> +
> +	spin_lock(&vdpasim->lock);
> +
> +	if (!(vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK))
> +		goto out;
> +
> +	if (!txq->ready || !rxq->ready)
> +		goto out;
> +
> +	while (true) {
> +		total_write = 0;
> +		err = vringh_getdesc_iotlb(&txq->vring, &txq->iov, NULL,
> +					   &txq->head, GFP_ATOMIC);
> +		if (err <= 0)
> +			break;
> +
> +		err = vringh_getdesc_iotlb(&rxq->vring, NULL, &rxq->iov,
> +					   &rxq->head, GFP_ATOMIC);
> +		if (err <= 0) {
> +			vringh_complete_iotlb(&txq->vring, txq->head, 0);
> +			break;
> +		}
> +
> +		while (true) {
> +			read = vringh_iov_pull_iotlb(&txq->vring, &txq->iov,
> +						     vdpasim->buffer,
> +						     PAGE_SIZE);
> +			if (read <= 0)
> +				break;
> +
> +			write = vringh_iov_push_iotlb(&rxq->vring, &rxq->iov,
> +						      vdpasim->buffer, read);
> +			if (write <= 0)
> +				break;
> +
> +			total_write += write;
> +		}
> +
> +		/* Make sure data is written before advancing index */
> +		smp_wmb();
> +
> +		vringh_complete_iotlb(&txq->vring, txq->head, 0);
> +		vringh_complete_iotlb(&rxq->vring, rxq->head, total_write);
> +
> +		/* Make sure used is visible before raising the interrupt. */
> +		smp_wmb();
> +
> +		local_bh_disable();
> +		if (txq->cb)
> +			txq->cb(txq->private);
> +		if (rxq->cb)
> +			rxq->cb(rxq->private);
> +		local_bh_enable();
> +
> +		if (++pkts > 4) {
> +			schedule_work(&vdpasim->work);
> +			goto out;
> +		}
> +	}
> +
> +out:
> +	spin_unlock(&vdpasim->lock);
> +}
> +
> +static int dir_to_perm(enum dma_data_direction dir)
> +{
> +	int perm = -EFAULT;
> +
> +	switch (dir) {
> +	case DMA_FROM_DEVICE:
> +		perm = VHOST_MAP_WO;
> +		break;
> +	case DMA_TO_DEVICE:
> +		perm = VHOST_MAP_RO;
> +		break;
> +	case DMA_BIDIRECTIONAL:
> +		perm = VHOST_MAP_RW;
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	return perm;
> +}
> +
> +static dma_addr_t vdpasim_map_page(struct device *dev, struct page *page,
> +				   unsigned long offset, size_t size,
> +				   enum dma_data_direction dir,
> +				   unsigned long attrs)
> +{
> +	struct vdpa_device *vdpa = dev_to_vdpa(dev);
> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +	struct vhost_iotlb *iommu = vdpasim->iommu;
> +	u64 pa = (page_to_pfn(page) << PAGE_SHIFT) + offset;
> +	int ret, perm = dir_to_perm(dir);
> +
> +	if (perm < 0)
> +		return DMA_MAPPING_ERROR;
> +
> +	/* For simplicity, use identical mapping to avoid e.g iova
> +	 * allocator.
> +	 */
> +	ret = vhost_iotlb_add_range(iommu, pa, pa + size - 1,
> +				    pa, dir_to_perm(dir));
> +	if (ret)
> +		return DMA_MAPPING_ERROR;
> +
> +	return (dma_addr_t)(pa);
> +}
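
For readers following the "identical mapping" comment in the hunk above, this is a minimal userspace model of what the on-chip IOMMU does with such an entry. It is only a sketch: struct model_iotlb, model_add_range() and model_translate() are made-up names, the linear table replaces the interval tree, and the permission values are assumptions chosen so that RW is the union of RO and WO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { MODEL_MAP_RO = 0x1, MODEL_MAP_WO = 0x2, MODEL_MAP_RW = 0x3 };

/* One mapping: IOVA range [start, last] (inclusive) backed at addr. */
struct model_map {
	uint64_t start, last, addr;
	int perm;
};

struct model_iotlb {
	struct model_map maps[16];
	size_t n;
};

static int model_add_range(struct model_iotlb *t, uint64_t start,
			   uint64_t last, uint64_t addr, int perm)
{
	if (last < start || t->n == 16)
		return -1;
	t->maps[t->n++] = (struct model_map){ start, last, addr, perm };
	return 0;
}

/* Translate iova for an access that needs perm; returns 0 on success. */
static int model_translate(const struct model_iotlb *t, uint64_t iova,
			   int perm, uint64_t *pa)
{
	size_t i;

	for (i = 0; i < t->n; i++) {
		const struct model_map *m = &t->maps[i];

		if (iova < m->start || iova > m->last)
			continue;
		if ((m->perm & perm) != perm)
			return -1;	/* mapped, but not for this access */
		*pa = m->addr + (iova - m->start);
		return 0;
	}
	return -1;			/* not mapped at all */
}
```

With an identity entry (addr equal to start) the translated address equals the IOVA, which is why map_page can return pa as the dma_addr_t without an IOVA allocator.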
> +
> +static void vdpasim_unmap_page(struct device *dev, dma_addr_t dma_addr,
> +			       size_t size, enum dma_data_direction dir,
> +			       unsigned long attrs)
> +{
> +	struct vdpa_device *vdpa = dev_to_vdpa(dev);
> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +	struct vhost_iotlb *iommu = vdpasim->iommu;
> +
> +	vhost_iotlb_del_range(iommu, (u64)dma_addr,
> +			      (u64)dma_addr + size - 1);
> +}
> +
> +static void *vdpasim_alloc_coherent(struct device *dev, size_t size,
> +				    dma_addr_t *dma_addr, gfp_t flag,
> +				    unsigned long attrs)
> +{
> +	struct vdpa_device *vdpa = dev_to_vdpa(dev);
> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +	struct vhost_iotlb *iommu = vdpasim->iommu;
> +	void *addr = kmalloc(size, flag);
> +	int ret;
> +
> +	if (!addr)
> +		*dma_addr = DMA_MAPPING_ERROR;
> +	else {
> +		u64 pa = virt_to_phys(addr);
> +
> +		ret = vhost_iotlb_add_range(iommu, (u64)pa,
> +					    (u64)pa + size - 1,
> +					    pa, VHOST_MAP_RW);
> +		if (ret) {
> +			kfree(addr);
> +			*dma_addr = DMA_MAPPING_ERROR;
> +		} else
> +			*dma_addr = (dma_addr_t)pa;
> +	}
> +
> +	return addr;
> +}
> +
> +static void vdpasim_free_coherent(struct device *dev, size_t size,
> +				void *vaddr, dma_addr_t dma_addr,
> +				unsigned long attrs)
> +{
> +	struct vdpa_device *vdpa = dev_to_vdpa(dev);
> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +	struct vhost_iotlb *iommu = vdpasim->iommu;
> +
> +	vhost_iotlb_del_range(iommu, (u64)dma_addr,
> +			       (u64)dma_addr + size - 1);
> +	kfree((void *)dma_addr);
> +}
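
A side note on the hunk above: kfree((void *)dma_addr) looks suspicious, since dma_addr holds the value produced by virt_to_phys() in the alloc path while kfree() wants the CPU pointer. The userspace model below sketches the distinction under stated assumptions: the model_* helpers are hypothetical, malloc()/free() stand in for kmalloc()/kfree(), and MODEL_PHYS_OFFSET is a made-up offset so that the "physical" address differs from the pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_PHYS_OFFSET 0x100000ULL	/* made-up virt-to-"phys" offset */

/* Records the last range removed, so the behaviour can be observed. */
static uint64_t model_last_del_start, model_last_del_last;

/* Hypothetical stand-in for vhost_iotlb_del_range(). */
static void model_iotlb_del_range(uint64_t start, uint64_t last)
{
	model_last_del_start = start;
	model_last_del_last = last;
}

static uint64_t model_virt_to_phys(const void *vaddr)
{
	return (uint64_t)(uintptr_t)vaddr + MODEL_PHYS_OFFSET;
}

static void model_free_coherent(size_t size, void *vaddr, uint64_t dma_addr)
{
	/* The IOTLB entry is keyed by the DMA address ... */
	model_iotlb_del_range(dma_addr, dma_addr + size - 1);
	/* ... but the allocator must get back the CPU pointer it handed out. */
	free(vaddr);
}
```

In the kernel the equivalent would presumably be kfree(vaddr); that is an assumption about the intended fix, not something confirmed in this thread.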
> +
> +static const struct dma_map_ops vdpasim_dma_ops = {
> +	.map_page = vdpasim_map_page,
> +	.unmap_page = vdpasim_unmap_page,
> +	.alloc = vdpasim_alloc_coherent,
> +	.free = vdpasim_free_coherent,
> +};
> +

Hey Jason,

IMHO, it would be nice if the dma_ops of the parent device could be reused.
A vdpa_device is expected to represent a physical device (this simulator
being the exception); however, there is not enough information in
vdpa_device.dev to indicate which kind of physical device it is attached
to. Namely, get_arch_dma_ops(struct bus_type *) cannot work on
vdpa_device.dev. It then seems that device drivers need to implement a
wrapper around the dma_ops of their parent devices. Can this work be done
in the vdpa framework, since it looks like a common task? Can
"vd_dev->vdev.dev.parent = vdpa->dev->parent;" in
virtio_vdpa_probe() do the work?
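
To illustrate what such a wrapper would amount to, here is a small userspace model; it is a sketch of the forwarding idea only. struct model_device, struct model_dma_ops and the fixed IOVA offset used by the parent are all invented for the example and do not correspond to the kernel's struct device or struct dma_map_ops layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_device;

/* Cut-down ops table with a single operation. */
struct model_dma_ops {
	uint64_t (*map_page)(struct model_device *dev, uint64_t pa,
			     size_t size);
};

struct model_device {
	struct model_device *parent;
	const struct model_dma_ops *dma_ops;
};

/* Parent (think: the PCI function) maps PA to IOVA with a fixed offset. */
static uint64_t parent_map_page(struct model_device *dev, uint64_t pa,
				size_t size)
{
	(void)dev;
	(void)size;
	return pa + 0x100000;
}

static const struct model_dma_ops parent_ops = {
	.map_page = parent_map_page,
};

/* Wrapper installed on the child: redo the call on behalf of the parent. */
static uint64_t fwd_map_page(struct model_device *dev, uint64_t pa,
			     size_t size)
{
	struct model_device *p = dev->parent;

	return p->dma_ops->map_page(p, pa, size);
}

static const struct model_dma_ops fwd_ops = {
	.map_page = fwd_map_page,
};
```

Every vendor driver would need the same boilerplate for each operation, which is the argument for handling it once in the framework or for pointing the virtio device directly at the real parent.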

Thanks,
BR
Zhu Lingshan

> +static void vdpasim_release_dev(struct device *_d)
> +{
> +	struct vdpa_device *vdpa = dev_to_vdpa(_d);
> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +
> +	sysfs_remove_link(vdpasim_dev->devices_kobj, vdpasim->name);
> +
> +	mutex_lock(&vsim_list_lock);
> +	list_del(&vdpasim->next);
> +	mutex_unlock(&vsim_list_lock);
> +
> +	kfree(vdpasim->buffer);
> +	kfree(vdpasim);
> +}
> +
> +static const struct vdpa_config_ops vdpasim_net_config_ops;
> +
> +static int vdpasim_create(const guid_t *uuid)
> +{
> +	struct vdpasim *vdpasim, *tmp;
> +	struct virtio_net_config *config;
> +	struct vdpa_device *vdpa;
> +	struct device *dev;
> +	int ret = -ENOMEM;
> +
> +	mutex_lock(&vsim_list_lock);
> +	list_for_each_entry(tmp, &vsim_devices_list, next) {
> +		if (guid_equal(&tmp->uuid, uuid)) {
> +			mutex_unlock(&vsim_list_lock);
> +			return -EEXIST;
> +		}
> +	}
> +
> +	vdpasim = kzalloc(sizeof(*vdpasim), GFP_KERNEL);
> +	if (!vdpasim)
> +		goto err_vdpa_alloc;
> +
> +	vdpasim->buffer = kmalloc(PAGE_SIZE, GFP_KERNEL);
> +	if (!vdpasim->buffer)
> +		goto err_buffer_alloc;
> +
> +	vdpasim->iommu = vhost_iotlb_alloc(2048, 0);
> +	if (!vdpasim->iommu)
> +		goto err_iotlb;
> +
> +	config = &vdpasim->config;
> +	config->mtu = 1500;
> +	config->status = VIRTIO_NET_S_LINK_UP;
> +	eth_random_addr(config->mac);
> +
> +	INIT_WORK(&vdpasim->work, vdpasim_work);
> +	spin_lock_init(&vdpasim->lock);
> +
> +	guid_copy(&vdpasim->uuid, uuid);
> +
> +	list_add(&vdpasim->next, &vsim_devices_list);
> +	vdpa = &vdpasim->vdpa;
> +
> +	mutex_unlock(&vsim_list_lock);
> +
> +	vdpa = &vdpasim->vdpa;
> +	vdpa->config = &vdpasim_net_config_ops;
> +	vdpa_set_parent(vdpa, &vdpasim_dev->dev);
> +	vdpa->dev.release = vdpasim_release_dev;
> +
> +	vringh_set_iotlb(&vdpasim->vqs[0].vring, vdpasim->iommu);
> +	vringh_set_iotlb(&vdpasim->vqs[1].vring, vdpasim->iommu);
> +
> +	dev = &vdpa->dev;
> +	dev->coherent_dma_mask = DMA_BIT_MASK(64);
> +	set_dma_ops(dev, &vdpasim_dma_ops);
> +
> +	ret = register_vdpa_device(vdpa);
> +	if (ret)
> +		goto err_register;
> +
> +	sprintf(vdpasim->name, "%pU", uuid);
> +
> +	ret = sysfs_create_link(vdpasim_dev->devices_kobj, &vdpa->dev.kobj,
> +				vdpasim->name);
> +	if (ret)
> +		goto err_link;
> +
> +	return 0;
> +
> +err_link:
> +err_register:
> +	vhost_iotlb_free(vdpasim->iommu);
> +	mutex_lock(&vsim_list_lock);
> +	list_del(&vdpasim->next);
> +	mutex_unlock(&vsim_list_lock);
> +err_iotlb:
> +	kfree(vdpasim->buffer);
> +err_buffer_alloc:
> +	kfree(vdpasim);
> +err_vdpa_alloc:
> +	return ret;
> +}
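
A remark on the error handling in the hunk above: the three allocation failures jump to labels that return without dropping vsim_list_lock, so those paths appear to leave the mutex held. One way to avoid that, sketched as a userspace model, is to allocate before taking the lock and hold it only for the duplicate check and the list insertion. Everything here is hypothetical (model_lock()/model_unlock() just count nesting, model_create() and struct model_sim are stand-ins), and the errno values are written as plain numbers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for vsim_list_lock: counts how many times the lock is held. */
static int model_lock_depth;
static void model_lock(void)   { model_lock_depth++; }
static void model_unlock(void) { model_lock_depth--; }

struct model_sim {
	char id[37];
	void *buffer;
	struct model_sim *next;
};

static struct model_sim *model_list;
static int model_fail_buffer;	/* test knob: fail the buffer allocation */

static int model_create(const char *id)
{
	struct model_sim *sim, *tmp;
	int ret = -12;			/* -ENOMEM */

	if (strlen(id) >= sizeof(sim->id))
		return -22;		/* -EINVAL */

	/* Allocate everything before touching the shared list. */
	sim = calloc(1, sizeof(*sim));
	if (!sim)
		return ret;
	sim->buffer = model_fail_buffer ? NULL : malloc(4096);
	if (!sim->buffer)
		goto err_free_sim;
	strcpy(sim->id, id);

	model_lock();
	for (tmp = model_list; tmp; tmp = tmp->next) {
		if (!strcmp(tmp->id, id)) {
			ret = -17;	/* -EEXIST */
			goto err_unlock;
		}
	}
	sim->next = model_list;
	model_list = sim;
	model_unlock();
	return 0;

err_unlock:
	model_unlock();
	free(sim->buffer);
err_free_sim:
	free(sim);
	return ret;
}
```

Each exit path in this version releases the lock exactly once if it was taken, which is the property the goto ladder in the patch seems to be missing.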
> +
> +static int vdpasim_remove(const guid_t *uuid)
> +{
> +	struct vdpasim *vds, *tmp;
> +	struct vdpa_device *vdpa = NULL;
> +	int ret = -EINVAL;
> +
> +	mutex_lock(&vsim_list_lock);
> +	list_for_each_entry_safe(vds, tmp, &vsim_devices_list, next) {
> +		if (guid_equal(&vds->uuid, uuid)) {
> +			vdpa = &vds->vdpa;
> +			ret = 0;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&vsim_list_lock);
> +
> +	if (vdpa)
> +		unregister_vdpa_device(vdpa);
> +
> +	return ret;
> +}
> +
> +static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> +				  u64 desc_area, u64 driver_area,
> +				  u64 device_area)
> +{
> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> +
> +	vq->desc_addr = desc_area;
> +	vq->driver_addr = driver_area;
> +	vq->device_addr = device_area;
> +
> +	return 0;
> +}
> +
> +static void vdpasim_set_vq_num(struct vdpa_device *vdpa, u16 idx, u32 num)
> +{
> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> +
> +	vq->num = num;
> +}
> +
> +static void vdpasim_kick_vq(struct vdpa_device *vdpa, u16 idx)
> +{
> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> +
> +	if (vq->ready)
> +		schedule_work(&vdpasim->work);
> +}
> +
> +static void vdpasim_set_vq_cb(struct vdpa_device *vdpa, u16 idx,
> +			      struct vdpa_callback *cb)
> +{
> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> +
> +	vq->cb = cb->callback;
> +	vq->private = cb->private;
> +}
> +
> +static void vdpasim_set_vq_ready(struct vdpa_device *vdpa, u16 idx, bool ready)
> +{
> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> +
> +	spin_lock(&vdpasim->lock);
> +	vq->ready = ready;
> +	if (vq->ready)
> +		vdpasim_queue_ready(vdpasim, idx);
> +	spin_unlock(&vdpasim->lock);
> +}
> +
> +static bool vdpasim_get_vq_ready(struct vdpa_device *vdpa, u16 idx)
> +{
> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> +
> +	return vq->ready;
> +}
> +
> +static int vdpasim_set_vq_state(struct vdpa_device *vdpa, u16 idx, u64 state)
> +{
> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> +	struct vringh *vrh = &vq->vring;
> +
> +	spin_lock(&vdpasim->lock);
> +	vrh->last_avail_idx = state;
> +	spin_unlock(&vdpasim->lock);
> +
> +	return 0;
> +}
> +
> +static u64 vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx)
> +{
> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> +	struct vringh *vrh = &vq->vring;
> +
> +	return vrh->last_avail_idx;
> +}
> +
> +static u16 vdpasim_get_vq_align(struct vdpa_device *vdpa)
> +{
> +	return VDPASIM_QUEUE_ALIGN;
> +}
> +
> +static u64 vdpasim_get_features(struct vdpa_device *vdpa)
> +{
> +	return vdpasim_features;
> +}
> +
> +static int vdpasim_set_features(struct vdpa_device *vdpa, u64 features)
> +{
> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +
> +	/* DMA mapping must be done by driver */
> +	if (!(features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)))
> +		return -EINVAL;
> +
> +	vdpasim->features = features & vdpasim_features;
> +
> +	return 0;
> +}
> +
> +static void vdpasim_set_config_cb(struct vdpa_device *vdpa,
> +				  struct vdpa_callback *cb)
> +{
> +	/* We don't support config interrupt */
> +}
> +
> +static u16 vdpasim_get_vq_num_max(struct vdpa_device *vdpa)
> +{
> +	return VDPASIM_QUEUE_MAX;
> +}
> +
> +static u32 vdpasim_get_device_id(struct vdpa_device *vdpa)
> +{
> +	return VDPASIM_DEVICE_ID;
> +}
> +
> +static u32 vdpasim_get_vendor_id(struct vdpa_device *vdpa)
> +{
> +	return VDPASIM_VENDOR_ID;
> +}
> +
> +static u8 vdpasim_get_status(struct vdpa_device *vdpa)
> +{
> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +	u8 status;
> +
> +	spin_lock(&vdpasim->lock);
> +	status = vdpasim->status;
> +	spin_unlock(&vdpasim->lock);
> +
> +	return vdpasim->status;
> +}
> +
> +static void vdpasim_set_status(struct vdpa_device *vdpa, u8 status)
> +{
> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +
> +	spin_lock(&vdpasim->lock);
> +	vdpasim->status = status;
> +	if (status == 0)
> +		vdpasim_reset(vdpasim);
> +	spin_unlock(&vdpasim->lock);
> +}
> +
> +static void vdpasim_get_config(struct vdpa_device *vdpa, unsigned int offset,
> +			     void *buf, unsigned int len)
> +{
> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +
> +	if (offset + len < sizeof(struct virtio_net_config))
> +		memcpy(buf, &vdpasim->config + offset, len);
> +}
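
Two details in the hunk above look off: "&vdpasim->config + offset" is arithmetic on a struct pointer, so it advances by offset whole structures instead of offset bytes, and the "<" check rejects a read that ends exactly at the end of the config space. The following userspace sketch shows a byte-granular version. struct model_net_config is a made-up layout for illustration, and the function returns an int purely so the example can be checked; the real callback returns void.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Made-up config layout; not the real struct virtio_net_config. */
struct model_net_config {
	uint8_t mac[6];
	uint16_t status;
	uint16_t max_virtqueue_pairs;
	uint16_t mtu;
};

static int model_get_config(const struct model_net_config *cfg,
			    unsigned int offset, void *buf, unsigned int len)
{
	/* Written this way so that offset + len cannot overflow. */
	if (offset > sizeof(*cfg) || len > sizeof(*cfg) - offset)
		return -1;

	/* Cast to a byte pointer first, so offset is counted in bytes. */
	memcpy(buf, (const uint8_t *)cfg + offset, len);
	return 0;
}
```

Reading the whole structure (offset 0, len sizeof(*cfg)) succeeds here, whereas the "<" comparison in the patch would silently skip it.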
> +
> +static void vdpasim_set_config(struct vdpa_device *vdpa, unsigned int offset,
> +			     const void *buf, unsigned int len)
> +{
> +	/* No writable config supported by vdpasim */
> +}
> +
> +static u32 vdpasim_get_generation(struct vdpa_device *vdpa)
> +{
> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +
> +	return vdpasim->generation;
> +}
> +
> +static int vdpasim_set_map(struct vdpa_device *vdpa,
> +			   struct vhost_iotlb *iotlb)
> +{
> +	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> +	struct vhost_iotlb_map *map;
> +	u64 start = 0ULL, last = 0ULL - 1;
> +	int ret;
> +
> +	vhost_iotlb_reset(vdpasim->iommu);
> +
> +	for (map = vhost_iotlb_itree_first(iotlb, start, last); map;
> +	     map = vhost_iotlb_itree_next(map, start, last)) {
> +		ret = vhost_iotlb_add_range(vdpasim->iommu, map->start,
> +					    map->last, map->addr, map->perm);
> +		if (ret)
> +			goto err;
> +	}
> +	return 0;
> +
> +err:
> +	vhost_iotlb_reset(vdpasim->iommu);
> +	return ret;
> +}
> +
> +static const struct vdpa_config_ops vdpasim_net_config_ops = {
> +	.set_vq_address         = vdpasim_set_vq_address,
> +	.set_vq_num             = vdpasim_set_vq_num,
> +	.kick_vq                = vdpasim_kick_vq,
> +	.set_vq_cb              = vdpasim_set_vq_cb,
> +	.set_vq_ready           = vdpasim_set_vq_ready,
> +	.get_vq_ready           = vdpasim_get_vq_ready,
> +	.set_vq_state           = vdpasim_set_vq_state,
> +	.get_vq_state           = vdpasim_get_vq_state,
> +	.get_vq_align           = vdpasim_get_vq_align,
> +	.get_features           = vdpasim_get_features,
> +	.set_features           = vdpasim_set_features,
> +	.set_config_cb          = vdpasim_set_config_cb,
> +	.get_vq_num_max         = vdpasim_get_vq_num_max,
> +	.get_device_id          = vdpasim_get_device_id,
> +	.get_vendor_id          = vdpasim_get_vendor_id,
> +	.get_status             = vdpasim_get_status,
> +	.set_status             = vdpasim_set_status,
> +	.get_config             = vdpasim_get_config,
> +	.set_config             = vdpasim_set_config,
> +	.get_generation         = vdpasim_get_generation,
> +	.set_map                = vdpasim_set_map,
> +};
> +
> +static void vdpasim_device_release(struct device *dev)
> +{
> +	struct vdpasim_dev *vdpasim_dev =
> +	       container_of(dev, struct vdpasim_dev, dev);
> +
> +	vdpasim_dev->dev.bus = NULL;
> +	idr_destroy(&vdpasim_dev->vd_idr);
> +	class_destroy(vdpasim_dev->vd_class);
> +	vdpasim_dev->vd_class = NULL;
> +	kfree(vdpasim_dev);
> +}
> +
> +static ssize_t create_store(struct kobject *kobj, struct kobj_attribute *attr,
> +			    const char *buf, size_t count)
> +{
> +	char *str;
> +	guid_t uuid;
> +	int ret;
> +
> +	if ((count < UUID_STRING_LEN) || (count > UUID_STRING_LEN + 1))
> +		return -EINVAL;
> +
> +	str = kstrndup(buf, count, GFP_KERNEL);
> +	if (!str)
> +		return -ENOMEM;
> +
> +	ret = guid_parse(str, &uuid);
> +	kfree(str);
> +	if (ret)
> +		return ret;
> +
> +	ret = vdpasim_create(&uuid);
> +	if (ret)
> +		return ret;
> +
> +	return count;
> +}
> +
> +static ssize_t remove_store(struct kobject *kobj, struct kobj_attribute *attr,
> +			    const char *buf, size_t count)
> +{
> +	char *str;
> +	guid_t uuid;
> +	int ret;
> +
> +	if ((count < UUID_STRING_LEN) || (count > UUID_STRING_LEN + 1))
> +		return -EINVAL;
> +
> +	str = kstrndup(buf, count, GFP_KERNEL);
> +	if (!str)
> +		return -ENOMEM;
> +
> +	ret = guid_parse(str, &uuid);
> +	kfree(str);
> +	if (ret)
> +		return ret;
> +
> +	ret = vdpasim_remove(&uuid);
> +	if (ret)
> +		return ret;
> +
> +	return count;
> +}
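
For completeness, this is roughly how userspace would drive the two store handlers above: write a 36-character UUID string to the create or remove attribute. The helper below is a sketch with a hypothetical name, write_uuid(); the path is passed in by the caller, and the one from the commit message would be /sys/devices/virtual/vdpa_simulator/netdev/create. MODEL_UUID_STRING_LEN mirrors the kernel's UUID_STRING_LEN and is redefined here only to keep the example self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MODEL_UUID_STRING_LEN 36

/* Write a textual UUID to the given attribute file; 0 on success. */
static int write_uuid(const char *path, const char *uuid)
{
	FILE *f;
	int ret = 0;

	/* The store handlers reject anything but 36 or 37 bytes. */
	if (strlen(uuid) != MODEL_UUID_STRING_LEN)
		return -EINVAL;

	f = fopen(path, "w");
	if (!f)
		return -errno;
	if (fputs(uuid, f) == EOF)
		ret = -EIO;
	if (fclose(f) == EOF && !ret)
		ret = -EIO;
	return ret;
}
```

The same call with the remove attribute path would tear the device down again, assuming the sysfs layout stays as described in the commit message.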
> +
> +static struct kobj_attribute create_attribute = __ATTR_WO(create);
> +static struct kobj_attribute remove_attribute = __ATTR_WO(remove);
> +
> +static struct attribute *attrs[] = {
> +	&create_attribute.attr,
> +	&remove_attribute.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group attr_group = {
> +	.attrs = attrs,
> +};
> +
> +static int __init vdpasim_dev_init(void)
> +{
> +	struct device *dev;
> +	int ret = 0;
> +
> +	vdpasim_dev = kzalloc(sizeof(*vdpasim_dev), GFP_KERNEL);
> +	if (!vdpasim_dev)
> +		return -ENOMEM;
> +
> +	idr_init(&vdpasim_dev->vd_idr);
> +
> +	vdpasim_dev->vd_class = class_create(THIS_MODULE, VDPASIM_CLASS_NAME);
> +
> +	if (IS_ERR(vdpasim_dev->vd_class)) {
> +		pr_err("Error: failed to register vdpasim_dev class\n");
> +		ret = PTR_ERR(vdpasim_dev->vd_class);
> +		goto err_class;
> +	}
> +
> +	dev = &vdpasim_dev->dev;
> +	dev->class = vdpasim_dev->vd_class;
> +	dev->release = vdpasim_device_release;
> +	dev_set_name(dev, "%s", VDPASIM_NAME);
> +
> +	ret = device_register(&vdpasim_dev->dev);
> +	if (ret)
> +		goto err_register;
> +
> +	ret = sysfs_create_group(&vdpasim_dev->dev.kobj, &attr_group);
> +	if (ret)
> +		goto err_create;
> +
> +	vdpasim_dev->devices_kobj = kobject_create_and_add("devices",
> +							   &dev->kobj);
> +	if (!vdpasim_dev->devices_kobj) {
> +		ret = -ENOMEM;
> +		goto err_devices;
> +	}
> +
> +	mutex_init(&vsim_list_lock);
> +	INIT_LIST_HEAD(&vsim_devices_list);
> +
> +	return 0;
> +
> +err_devices:
> +	sysfs_remove_group(&vdpasim_dev->dev.kobj, &attr_group);
> +err_create:
> +	device_unregister(&vdpasim_dev->dev);
> +err_register:
> +	class_destroy(vdpasim_dev->vd_class);
> +err_class:
> +	kfree(vdpasim_dev);
> +	vdpasim_dev = NULL;
> +	return ret;
> +}
> +
> +static void __exit vdpasim_dev_exit(void)
> +{
> +	device_unregister(&vdpasim_dev->dev);
> +}
> +
> +module_init(vdpasim_dev_init)
> +module_exit(vdpasim_dev_exit)
> +
> +MODULE_VERSION(DRV_VERSION);
> +MODULE_LICENSE(DRV_LICENSE);
> +MODULE_AUTHOR(DRV_AUTHOR);
> +MODULE_DESCRIPTION(DRV_DESC);
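
A note on the create/remove handlers quoted above: both accept a write of UUID_STRING_LEN bytes, or one byte more (presumably for the newline that echo appends), before handing the string to guid_parse(). The following is a minimal user-space sketch of that validation, not the kernel code itself. The helper names uuid_write_len_ok() and uuid_format_ok() are hypothetical, and the format check only approximates the textual layout that guid_parse() expects.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

#define UUID_STRING_LEN 36

/* Hypothetical helper: mirrors the count check in the quoted handlers.
 * Accepts exactly 36 bytes, or 37 to allow one trailing byte. */
static bool uuid_write_len_ok(size_t count)
{
	return count >= UUID_STRING_LEN && count <= UUID_STRING_LEN + 1;
}

/* Hypothetical helper: checks the 8-4-4-4-12 hex layout of the first
 * 36 characters. A short string fails at its terminating NUL, so the
 * loop never reads past the end of the buffer. */
static bool uuid_format_ok(const char *s)
{
	size_t i;

	assert(s != NULL);
	for (i = 0; i < UUID_STRING_LEN; i++) {
		if (i == 8 || i == 13 || i == 18 || i == 23) {
			if (s[i] != '-')
				return false;
		} else if (!isxdigit((unsigned char)s[i])) {
			return false;
		}
	}
	return true;
}
```

With these helpers, a 36-character UUID with or without a trailing newline passes the length check, while anything shorter or longer is rejected, matching the -EINVAL path in the handlers.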


* Re: [PATCH 5/5] vdpasim: vDPA device simulator
  2020-02-04  8:21   ` Zhu Lingshan
@ 2020-02-04  8:28     ` Jason Wang
  2020-02-04 12:52       ` Jason Gunthorpe
  0 siblings, 1 reply; 76+ messages in thread
From: Jason Wang @ 2020-02-04  8:28 UTC (permalink / raw)
  To: Zhu Lingshan, mst, linux-kernel, kvm, virtualization, netdev
  Cc: tiwei.bie, jgg, maxime.coquelin, cunming.liang, zhihong.wang,
	rob.miller, xiao.w.wang, haotian.wang, lingshan.zhu, eperezma,
	lulu, parav, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, jiri, shahafs, hanand, mhabets


On 2020/2/4 4:21 PM, Zhu Lingshan wrote:
>> +static const struct dma_map_ops vdpasim_dma_ops = {
>> +    .map_page = vdpasim_map_page,
>> +    .unmap_page = vdpasim_unmap_page,
>> +    .alloc = vdpasim_alloc_coherent,
>> +    .free = vdpasim_free_coherent,
>> +};
>> +
>
> Hey Jason,
>
> IMHO, it would be nice if the dma_ops of the parent device could be
> reused. A vdpa_device is expected to represent a physical device
> (except for this simulator); however, there is not enough information
> in vdpa_device.dev to indicate which kind of physical device it is
> attached to. In particular, get_arch_dma_ops(struct bus_type *) cannot
> work on vdpa_device.dev, so it seems device drivers would need to
> implement a wrapper around the dma_ops of their parent devices. Can
> this be done in the vdpa framework, since it looks like a common task?
> Would "vd_dev->vdev.dev.parent = vdpa->dev->parent;" in
> virtio_vdpa_probe() do the job?
>
> Thanks,
> BR
> Zhu Lingshan 


Good catch.

I think we can.

Thanks



* Re: [PATCH 5/5] vdpasim: vDPA device simulator
  2020-02-04  8:28     ` Jason Wang
@ 2020-02-04 12:52       ` Jason Gunthorpe
  2020-02-05  3:14         ` Jason Wang
  0 siblings, 1 reply; 76+ messages in thread
From: Jason Gunthorpe @ 2020-02-04 12:52 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhu Lingshan, mst, linux-kernel, kvm, virtualization, netdev,
	tiwei.bie, maxime.coquelin, cunming.liang, zhihong.wang,
	rob.miller, xiao.w.wang, haotian.wang, lingshan.zhu, eperezma,
	lulu, parav, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, jiri, shahafs, hanand, mhabets

On Tue, Feb 04, 2020 at 04:28:27PM +0800, Jason Wang wrote:
> 
> On 2020/2/4 4:21 PM, Zhu Lingshan wrote:
> > > +static const struct dma_map_ops vdpasim_dma_ops = {
> > > +    .map_page = vdpasim_map_page,
> > > +    .unmap_page = vdpasim_unmap_page,
> > > +    .alloc = vdpasim_alloc_coherent,
> > > +    .free = vdpasim_free_coherent,
> > > +};
> > > +
> > 
> > Hey Jason,
> > 
> > > IMHO, it would be nice if the dma_ops of the parent device could be
> > > reused. A vdpa_device is expected to represent a physical device
> > > (except for this simulator); however, there is not enough information
> > > in vdpa_device.dev to indicate which kind of physical device it is
> > > attached to. In particular, get_arch_dma_ops(struct bus_type *) cannot
> > > work on vdpa_device.dev, so it seems device drivers would need to
> > > implement a wrapper around the dma_ops of their parent devices. Can
> > > this be done in the vdpa framework, since it looks like a common task?
> > > Would "vd_dev->vdev.dev.parent = vdpa->dev->parent;" in
> > > virtio_vdpa_probe() do the job?
> > 
> > Thanks,
> > BR
> > Zhu Lingshan
> 
> 
> Good catch.
> 
> I think we can.

IMHO you need to specify some 'dma_device' rather than playing tricks
with dma_ops or assuming that the parent is always the device used for
DMA.

Jason


* Re: [PATCH 5/5] vdpasim: vDPA device simulator
  2020-02-04 12:52       ` Jason Gunthorpe
@ 2020-02-05  3:14         ` Jason Wang
  0 siblings, 0 replies; 76+ messages in thread
From: Jason Wang @ 2020-02-05  3:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Zhu Lingshan, mst, linux-kernel, kvm, virtualization, netdev,
	tiwei.bie, maxime.coquelin, cunming.liang, zhihong.wang,
	rob.miller, xiao.w.wang, haotian.wang, lingshan.zhu, eperezma,
	lulu, parav, kevin.tian, stefanha, rdunlap, hch, aadam,
	jakub.kicinski, jiri, shahafs, hanand, mhabets


On 2020/2/4 8:52 PM, Jason Gunthorpe wrote:
> On Tue, Feb 04, 2020 at 04:28:27PM +0800, Jason Wang wrote:
>> On 2020/2/4 4:21 PM, Zhu Lingshan wrote:
>>>> +static const struct dma_map_ops vdpasim_dma_ops = {
>>>> +    .map_page = vdpasim_map_page,
>>>> +    .unmap_page = vdpasim_unmap_page,
>>>> +    .alloc = vdpasim_alloc_coherent,
>>>> +    .free = vdpasim_free_coherent,
>>>> +};
>>>> +
>>> Hey Jason,
>>>
>>> IMHO, it would be nice if the dma_ops of the parent device could be
>>> reused. A vdpa_device is expected to represent a physical device
>>> (except for this simulator); however, there is not enough information
>>> in vdpa_device.dev to indicate which kind of physical device it is
>>> attached to. In particular, get_arch_dma_ops(struct bus_type *) cannot
>>> work on vdpa_device.dev, so it seems device drivers would need to
>>> implement a wrapper around the dma_ops of their parent devices. Can
>>> this be done in the vdpa framework, since it looks like a common task?
>>> Would "vd_dev->vdev.dev.parent = vdpa->dev->parent;" in
>>> virtio_vdpa_probe() do the job?
>>>
>>> Thanks,
>>> BR
>>> Zhu Lingshan
>>
>> Good catch.
>>
>> I think we can.
> IMHO you need to specify some 'dma_device' rather than playing tricks
> with dma_ops or assuming that the parent is always the device used for
> DMA.
>
> Jason


Right, this is what I had in mind, and it was also discussed in the
vhost-vdpa thread.

I will go this way.

Thanks
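
To make the 'dma_device' suggestion concrete, here is a minimal user-space sketch of the idea, under stated assumptions: struct device is a simplified stand-in for the kernel structure, and the dma_dev field plus the vdpa_init_device() and vdpa_get_dma_dev() helpers are illustrative names, not taken from this series. The point is only that the vendor driver names the DMA-capable device explicitly, so a bus driver never has to infer it from the parent.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-in for the kernel's struct device. */
struct device {
	const char *name;
	struct device *parent;
};

/* Hypothetical vDPA device that carries an explicit DMA device. */
struct vdpa_device {
	struct device dev;
	struct device *dma_dev;	/* device that actually performs DMA */
};

/* The vendor driver states which device does DMA at setup time;
 * it may or may not be the same as the parent. */
static void vdpa_init_device(struct vdpa_device *vdpa, const char *name,
			     struct device *parent, struct device *dma_dev)
{
	assert(vdpa != NULL && dma_dev != NULL);
	vdpa->dev.name = name;
	vdpa->dev.parent = parent;
	vdpa->dma_dev = dma_dev;
}

/* Bus drivers ask for the DMA device instead of guessing from parent. */
static struct device *vdpa_get_dma_dev(struct vdpa_device *vdpa)
{
	return vdpa->dma_dev;
}
```

In this sketch a hardware driver would pass the device that owns the DMA mappings, which need not be the parent, while a simulator could pass a pointer to its own embedded device and keep its custom dma_ops there.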





Thread overview: 76+ messages
2020-01-16 12:42 [PATCH 0/5] vDPA support Jason Wang
2020-01-16 12:42 ` [PATCH 1/5] vhost: factor out IOTLB Jason Wang
2020-01-17  4:14   ` Randy Dunlap
2020-01-17  9:34     ` Jason Wang
2020-01-18  0:01   ` kbuild test robot
2020-01-18  0:40   ` kbuild test robot
2020-01-16 12:42 ` [PATCH 2/5] vringh: IOTLB support Jason Wang
2020-01-17 21:54   ` kbuild test robot
2020-01-17 22:33   ` kbuild test robot
2020-01-16 12:42 ` [PATCH 3/5] vDPA: introduce vDPA bus Jason Wang
2020-01-16 15:22   ` Jason Gunthorpe
2020-01-17  3:03     ` Jason Wang
2020-01-17 13:54       ` Jason Gunthorpe
2020-01-20  7:50         ` Jason Wang
2020-01-20 12:17         ` Michael S. Tsirkin
2020-01-20 17:50           ` Jason Gunthorpe
2020-01-20 21:56             ` Michael S. Tsirkin
2020-01-21 14:12               ` Jason Gunthorpe
2020-01-21 14:15                 ` Michael S. Tsirkin
2020-01-21 14:16                   ` Jason Gunthorpe
2020-01-21  8:40       ` Tian, Kevin
2020-01-21  9:41         ` Jason Wang
2020-01-17  4:16   ` Randy Dunlap
2020-01-17  9:34     ` Jason Wang
2020-01-17 12:13   ` Michael S. Tsirkin
2020-01-17 13:52     ` Jason Wang
     [not found]       ` <CAJPjb1+fG9L3=iKbV4Vn13VwaeDZZdcfBPvarogF_Nzhk+FnKg@mail.gmail.com>
2020-01-19  9:07         ` Shahaf Shuler
2020-01-19  9:59           ` Michael S. Tsirkin
2020-01-20  8:44             ` Jason Wang
2020-01-20 12:09               ` Michael S. Tsirkin
2020-01-21  3:32                 ` Jason Wang
2020-01-20  8:43           ` Jason Wang
2020-01-20 17:49             ` Jason Gunthorpe
2020-01-20 20:51               ` Shahaf Shuler
2020-01-20 21:25                 ` Michael S. Tsirkin
2020-01-20 21:47                   ` Shahaf Shuler
2020-01-20 21:59                     ` Michael S. Tsirkin
2020-01-21  6:01                       ` Shahaf Shuler
2020-01-21  7:57                         ` Jason Wang
2020-01-21 14:07                   ` Jason Gunthorpe
2020-01-21 14:16                     ` Michael S. Tsirkin
2020-01-20 21:48               ` Michael S. Tsirkin
2020-01-21  4:00               ` Jason Wang
2020-01-21  5:47                 ` Michael S. Tsirkin
2020-01-21  8:00                   ` Jason Wang
2020-01-21  8:15                     ` Michael S. Tsirkin
2020-01-21  8:35                       ` Jason Wang
2020-01-21 11:09                         ` Shahaf Shuler
2020-01-22  6:36                           ` Jason Wang
2020-01-21 14:05                       ` Jason Gunthorpe
2020-01-21 14:17                         ` Michael S. Tsirkin
2020-01-22  6:18                           ` Jason Wang
2020-01-20  8:19         ` Jason Wang
2020-01-16 12:42 ` [PATCH 4/5] virtio: introduce a vDPA based transport Jason Wang
2020-01-16 15:38   ` Jason Gunthorpe
2020-01-17  9:32     ` Jason Wang
2020-01-17 14:00       ` Jason Gunthorpe
2020-01-20  7:52         ` Jason Wang
2020-01-17  4:10   ` Randy Dunlap
2020-01-16 12:42 ` [PATCH 5/5] vdpasim: vDPA device simulator Jason Wang
2020-01-16 15:47   ` Jason Gunthorpe
2020-01-17  9:32     ` Jason Wang
2020-01-17 14:10       ` Jason Gunthorpe
2020-01-20  8:01         ` Jason Wang
2020-02-04  4:19         ` Jason Wang
2020-01-17  4:12   ` Randy Dunlap
2020-01-17  9:35     ` Jason Wang
2020-01-18 18:18   ` kbuild test robot
2020-01-28  3:32   ` Dan Carpenter
2020-02-04  4:07     ` Jason Wang
2020-02-04  8:21   ` Zhu Lingshan
2020-02-04  8:28     ` Jason Wang
2020-02-04 12:52       ` Jason Gunthorpe
2020-02-05  3:14         ` Jason Wang
2020-01-21  8:44 ` [PATCH 0/5] vDPA support Tian, Kevin
2020-01-21  9:39   ` Jason Wang
