All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/7] use vfio_dma_rw to read/write IOVAs from CPU side
@ 2020-02-24  8:43 Yan Zhao
  2020-02-24  8:46 ` [PATCH v3 1/7] vfio: allow external user to get vfio group from device Yan Zhao
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: Yan Zhao @ 2020-02-24  8:43 UTC (permalink / raw)
  To: alex.williamson, zhenyuw
  Cc: intel-gvt-dev, kvm, linux-kernel, pbonzini, kevin.tian, peterx, Yan Zhao

It is better for a device model to use IOVAs to read/write memory to
perform some sort of virtual DMA on behalf of the device.

patch 1 exports VFIO group to external user so that it can hold the group
reference until finishing using of it. It saves ~500 cycles that are spent
on VFIO group looking up, referencing and dereferencing. (this data is
measured with 1 VFIO user).

patch 2 introduces interface vfio_dma_rw().

patch 3 introduces interfaces vfio_pin_pages_from_group() and
vfio_unpin_pages_from_group() to get rid of VFIO group looking-up in
vfio_pin_pages() and vfio_unpin_pages().

patch 4-5 let kvmgt switch from calling kvm_read/write_guest() to calling
vfio_dma_rw to rw IOVAs.

patch 6 let kvmgt switch to use lighter version of vfio_pin/unpin_pages(),
i.e. vfio_pin/unpin_pages_from_group()

patch 7 enables kvmgt to read/write IOVAs of size larger than PAGE_SIZE.


Performance:

Comparison between vfio_dma_rw() and kvm_read/write_guest():

1. avergage CPU cycles of each interface measured with 1 running VM:
 --------------------------------------------------
|  rw       |          avg cycles of               |
|  size     | (vfio_dma_rw - kvm_read/write_guest) |
|---------- ---------------------------------------|
| <= 1 page |            +155 ~ +195               |        
|--------------------------------------------------|
| 5 pages   |                -530                  |
|--------------------------------------------------|
| 20 pages  |           -2005 ~ -2533              |
 --------------------------------------------------

2. average scores

base: base code before applying code in this series. use
kvm_read/write_pages() to rw IOVAs

base + this series: use vfio_dma_rw() to read IOVAs and use
vfio_pin/unpin_pages_from_group(), and kvmgt is able to rw several pages
at a time.

Scores of benchmarks running in 1 VM each:
 -----------------------------------------------------------------
|                    | glmark2 | lightsmark | openarena | heavens |
|-----------------------------------------------------------------|
|       base         |  1248   |  219.70    |  114.9    |   560   |
|-----------------------------------------------------------------|
|base + this series  |  1248   |  225.8     |   115     |   559   |
 -----------------------------------------------------------------

Sum of scores of two benchmark instances running in 2 VMs each:
 -------------------------------------------------------
|                    | glmark2 | lightsmark | openarena |
|-------------------------------------------------------|
|       base         |  812    |  211.46    |  115.3    |
|-------------------------------------------------------|
|base + this series  |  814    |  214.69    |  115.9    |
 -------------------------------------------------------


Changelogs:
v2 --> v3:
- add vfio_group_get_external_user_from_dev() to improve performance (Alex)
- add vfio_pin/unpin_pages_from_group() to avoid repeated looking up of
  VFIO group in vfio_pin/unpin_pages() (Alex)
- add a check for IOMMU_READ permission. (Alex)
- rename vfio_iommu_type1_rw_dma_nopin() to
  vfio_iommu_type1_dma_rw_chunk(). (Alex)
- in kvmgt, change "write ? vfio_dma_rw(...,true) :
  vfio_dma_rw(...,false)" to vfio_dma_rw(dev, gpa, buf, len, write)
  (Alex and Paolo)
- in kvmgt, instead of read/write context pages 1:1, combining the
  reads/writes of continuous IOVAs to take advantage of vfio_dma_rw() for
  faster crossing page boundary accesses.

v1 --> v2:
- rename vfio_iova_rw to vfio_dma_rw, vfio iommu driver ops .iova_rw
to .dma_rw. (Alex).
- change iova and len from unsigned long to dma_addr_t and size_t,
respectively. (Alex)
- fix possible overflow in dma->vaddr + iova - dma->iova + offset (Alex)
- split DMAs from on page boundary to on max available size to eliminate
  redundant searching of vfio_dma and switching mm. (Alex)
- add a check for IOMMU_WRITE permission.

 Yan Zhao (7):
  vfio: allow external user to get vfio group from device
  vfio: introduce vfio_dma_rw to read/write a range of IOVAs
  vfio: avoid inefficient lookup of VFIO group in vfio_pin/unpin_pages
  drm/i915/gvt: hold reference of VFIO group during opening of vgpu
  drm/i915/gvt: subsitute kvm_read/write_guest with vfio_dma_rw
  drm/i915/gvt: avoid unnecessary lookup in each vfio pin & unpin pages
  drm/i915/gvt: rw more pages a time for shadow context

 drivers/gpu/drm/i915/gvt/gvt.h       |   1 +
 drivers/gpu/drm/i915/gvt/kvmgt.c     |  43 +++----
 drivers/gpu/drm/i915/gvt/scheduler.c | 101 +++++++++++-----
 drivers/vfio/vfio.c                  | 175 +++++++++++++++++++++++++++
 drivers/vfio/vfio_iommu_type1.c      |  77 ++++++++++++
 include/linux/vfio.h                 |  13 ++
 6 files changed, 358 insertions(+), 52 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v3 1/7] vfio: allow external user to get vfio group from device
  2020-02-24  8:43 [PATCH v3 0/7] use vfio_dma_rw to read/write IOVAs from CPU side Yan Zhao
@ 2020-02-24  8:46 ` Yan Zhao
  2020-02-24 19:15   ` Alex Williamson
  2020-02-24  8:47 ` [PATCH v3 2/7] vfio: introduce vfio_dma_rw to read/write a range of IOVAs Yan Zhao
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: Yan Zhao @ 2020-02-24  8:46 UTC (permalink / raw)
  To: alex.williamson
  Cc: zhenyuw, intel-gvt-dev, kvm, linux-kernel, pbonzini, kevin.tian,
	peterx, Yan Zhao

external user is able to
1. add a device into an vfio group
2. call vfio_group_get_external_user_from_dev() with the device pointer
to get vfio_group associated with this device and increments the container
user counter to prevent the VFIO group from disposal before KVM exits.
3. When the external KVM finishes, it calls vfio_group_put_external_user()
to release the VFIO group.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/vfio/vfio.c  | 37 +++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h |  2 ++
 2 files changed, 39 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index c8482624ca34..914bdf4b9d73 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1720,6 +1720,43 @@ struct vfio_group *vfio_group_get_external_user(struct file *filep)
 }
 EXPORT_SYMBOL_GPL(vfio_group_get_external_user);
 
+/**
+ * External user API, exported by symbols to be linked dynamically.
+ *
+ * The protocol includes:
+ * 1. External user add a device into a vfio group
+ *
+ * 2. The external user calls vfio_group_get_external_user_from_dev()
+ * with the device pointer
+ * to verify that:
+ *	- there's a vfio group associated with it and is initialized;
+ *	- IOMMU is set for the vfio group.
+ * If both checks passed, vfio_group_get_external_user_from_dev()
+ * increments the container user counter to prevent
+ * the VFIO group from disposal before KVM exits.
+ *
+ * 3. When the external KVM finishes, it calls
+ * vfio_group_put_external_user() to release the VFIO group.
+ * This call decrements the container user counter.
+ */
+
+struct vfio_group *vfio_group_get_external_user_from_dev(struct device *dev)
+{
+	struct vfio_group *group;
+	int ret;
+
+	group = vfio_group_get_from_dev(dev);
+	if (!group)
+		return ERR_PTR(-ENODEV);
+
+	ret = vfio_group_add_container_user(group);
+	if (ret)
+		return ERR_PTR(ret);
+
+	return group;
+}
+EXPORT_SYMBOL_GPL(vfio_group_get_external_user_from_dev);
+
 void vfio_group_put_external_user(struct vfio_group *group)
 {
 	vfio_group_try_dissolve_container(group);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e42a711a2800..2e1fa0c7396f 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -94,6 +94,8 @@ extern void vfio_unregister_iommu_driver(
  */
 extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
 extern void vfio_group_put_external_user(struct vfio_group *group);
+extern
+struct vfio_group *vfio_group_get_external_user_from_dev(struct device *dev);
 extern bool vfio_external_group_match_file(struct vfio_group *group,
 					   struct file *filep);
 extern int vfio_external_user_iommu_id(struct vfio_group *group);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 2/7] vfio: introduce vfio_dma_rw to read/write a range of IOVAs
  2020-02-24  8:43 [PATCH v3 0/7] use vfio_dma_rw to read/write IOVAs from CPU side Yan Zhao
  2020-02-24  8:46 ` [PATCH v3 1/7] vfio: allow external user to get vfio group from device Yan Zhao
@ 2020-02-24  8:47 ` Yan Zhao
  2020-02-24 19:14   ` Alex Williamson
  2020-03-06  1:21   ` Yan Zhao
  2020-02-24  8:47 ` [PATCH v3 3/7] vfio: avoid inefficient lookup of VFIO group in vfio_pin/unpin_pages Yan Zhao
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 17+ messages in thread
From: Yan Zhao @ 2020-02-24  8:47 UTC (permalink / raw)
  To: alex.williamson
  Cc: zhenyuw, intel-gvt-dev, kvm, linux-kernel, pbonzini, kevin.tian,
	peterx, Yan Zhao

vfio_dma_rw will read/write a range of user space memory pointed to by
IOVA into/from a kernel buffer without enforcing pinning the user space
memory.

TODO: mark the IOVAs to user space memory dirty if they are written in
vfio_dma_rw().

Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/vfio/vfio.c             | 49 +++++++++++++++++++++
 drivers/vfio/vfio_iommu_type1.c | 77 +++++++++++++++++++++++++++++++++
 include/linux/vfio.h            |  5 +++
 3 files changed, 131 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 914bdf4b9d73..902867627cbf 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1998,6 +1998,55 @@ int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn, int npage)
 }
 EXPORT_SYMBOL(vfio_unpin_pages);
 
+
+/*
+ * This interface allows the CPUs to perform some sort of virtual DMA on
+ * behalf of the device.
+ *
+ * CPUs read/write a range of IOVAs pointing to user space memory into/from
+ * a kernel buffer.
+ *
+ * As the read/write of user space memory is conducted via the CPUs and is
+ * not a real device DMA, it is not necessary to pin the user space memory.
+ *
+ * The caller needs to call vfio_group_get_external_user() or
+ * vfio_group_get_external_user_from_dev() prior to calling this interface,
+ * so as to prevent the VFIO group from disposal in the middle of the call.
+ * But it can keep the reference to the VFIO group for several calls into
+ * this interface.
+ * After finishing using of the VFIO group, the caller needs to release the
+ * VFIO group by calling vfio_group_put_external_user().
+ *
+ * @group [in]: vfio group of a device
+ * @iova [in] : base IOVA of a user space buffer
+ * @data [in] : pointer to kernel buffer
+ * @len [in]  : kernel buffer length
+ * @write     : indicate read or write
+ * Return error code on failure or 0 on success.
+ */
+int vfio_dma_rw(struct vfio_group *group, dma_addr_t iova,
+		void *data, size_t len, bool write)
+{
+	struct vfio_container *container;
+	struct vfio_iommu_driver *driver;
+	int ret = 0;
+
+	if (!group || !data || len <= 0)
+		return -EINVAL;
+
+	container = group->container;
+	driver = container->iommu_driver;
+
+	if (likely(driver && driver->ops->dma_rw))
+		ret = driver->ops->dma_rw(container->iommu_data,
+					  iova, data, len, write);
+	else
+		ret = -ENOTTY;
+
+	return ret;
+}
+EXPORT_SYMBOL(vfio_dma_rw);
+
 static int vfio_register_iommu_notifier(struct vfio_group *group,
 					unsigned long *events,
 					struct notifier_block *nb)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2ada8e6cdb88..74e1c425943c 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -27,6 +27,7 @@
 #include <linux/iommu.h>
 #include <linux/module.h>
 #include <linux/mm.h>
+#include <linux/mmu_context.h>
 #include <linux/rbtree.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/mm.h>
@@ -2326,6 +2327,81 @@ static int vfio_iommu_type1_unregister_notifier(void *iommu_data,
 	return blocking_notifier_chain_unregister(&iommu->notifier, nb);
 }
 
+static size_t vfio_iommu_type1_dma_rw_chunk(struct vfio_iommu *iommu,
+					    dma_addr_t iova, void *data,
+					    size_t count, bool write)
+{
+	struct mm_struct *mm;
+	unsigned long vaddr;
+	struct vfio_dma *dma;
+	bool kthread = current->mm == NULL;
+	size_t done = 0;
+	size_t offset;
+
+	dma = vfio_find_dma(iommu, iova, 1);
+	if (!dma)
+		return 0;
+
+	if ((write && !(dma->prot & IOMMU_WRITE)) ||
+			!(dma->prot & IOMMU_READ))
+		return 0;
+
+	mm = get_task_mm(dma->task);
+
+	if (!mm)
+		return 0;
+
+	if (kthread)
+		use_mm(mm);
+	else if (current->mm != mm)
+		goto out;
+
+	offset = iova - dma->iova;
+
+	if (count > dma->size - offset)
+		count = dma->size - offset;
+
+	vaddr = dma->vaddr + offset;
+
+	if (write)
+		done = __copy_to_user((void __user *)vaddr,
+				       data, count) ? 0 : count;
+	else
+		done = __copy_from_user(data, (void __user *)vaddr,
+					count) ? 0 : count;
+
+	if (kthread)
+		unuse_mm(mm);
+out:
+	mmput(mm);
+	return done;
+}
+
+static int vfio_iommu_type1_dma_rw(void *iommu_data, dma_addr_t iova,
+				   void *data, size_t count, bool write)
+{
+	struct vfio_iommu *iommu = iommu_data;
+	int ret = 0;
+	size_t done = 0;
+
+	mutex_lock(&iommu->lock);
+	while (count > 0) {
+		done = vfio_iommu_type1_dma_rw_chunk(iommu, iova, data,
+						     count, write);
+		if (!done) {
+			ret = -EFAULT;
+			break;
+		}
+
+		count -= done;
+		data += done;
+		iova += done;
+	}
+
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
 static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
 	.name			= "vfio-iommu-type1",
 	.owner			= THIS_MODULE,
@@ -2338,6 +2414,7 @@ static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
 	.unpin_pages		= vfio_iommu_type1_unpin_pages,
 	.register_notifier	= vfio_iommu_type1_register_notifier,
 	.unregister_notifier	= vfio_iommu_type1_unregister_notifier,
+	.dma_rw			= vfio_iommu_type1_dma_rw,
 };
 
 static int __init vfio_iommu_type1_init(void)
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 2e1fa0c7396f..fea0cb1e61d2 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -82,6 +82,8 @@ struct vfio_iommu_driver_ops {
 					     struct notifier_block *nb);
 	int		(*unregister_notifier)(void *iommu_data,
 					       struct notifier_block *nb);
+	int		(*dma_rw)(void *iommu_data, dma_addr_t iova,
+				  void *data, size_t count, bool write);
 };
 
 extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
@@ -109,6 +111,9 @@ extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
 extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn,
 			    int npage);
 
+extern int vfio_dma_rw(struct vfio_group *group, dma_addr_t iova, void *data,
+		       size_t len, bool write);
+
 /* each type has independent events */
 enum vfio_notify_type {
 	VFIO_IOMMU_NOTIFY = 0,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 3/7] vfio: avoid inefficient lookup of VFIO group in vfio_pin/unpin_pages
  2020-02-24  8:43 [PATCH v3 0/7] use vfio_dma_rw to read/write IOVAs from CPU side Yan Zhao
  2020-02-24  8:46 ` [PATCH v3 1/7] vfio: allow external user to get vfio group from device Yan Zhao
  2020-02-24  8:47 ` [PATCH v3 2/7] vfio: introduce vfio_dma_rw to read/write a range of IOVAs Yan Zhao
@ 2020-02-24  8:47 ` Yan Zhao
  2020-02-24  8:47 ` [PATCH v3 4/7] drm/i915/gvt: hold reference of VFIO group during opening of vgpu Yan Zhao
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Yan Zhao @ 2020-02-24  8:47 UTC (permalink / raw)
  To: alex.williamson
  Cc: zhenyuw, intel-gvt-dev, kvm, linux-kernel, pbonzini, kevin.tian,
	peterx, Yan Zhao

in vfio_pin_pages() and vfio_unpin_pages(), before calling into iommu
driver, vfio group for the device is first searched and gets reference
count increased; and after calling into iommu driver, the vfio group is
dereference.

This searching/ref/deref operations can be combined for several calls by
invoking  holding reference of the VFIO group prior to use it and not
releasing the VFIO group until finishing using it.

vfio_pin_pages_from_group() and vfio_unpin_pages_from_group() are
introduced to avoid the above inefficient lookup of VFIO group.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/vfio/vfio.c  | 89 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h |  6 +++
 2 files changed, 95 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 902867627cbf..08d9e8fb33e5 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1998,6 +1998,95 @@ int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn, int npage)
 }
 EXPORT_SYMBOL(vfio_unpin_pages);
 
+/*
+ * Pin a set of guest PFNs and return their associated host PFNs for local
+ * domain only.
+ *
+ * The caller needs to call vfio_group_get_external_user() or
+ * vfio_group_get_external_user_from_dev() prior to calling this interface,
+ * so as to prevent the VFIO group from disposal in the middle of the call.
+ * But it can keep the reference to the VFIO group for several calls into
+ * this interface.
+ * After finishing using of the VFIO group, the caller needs to release the
+ * VFIO group by calling vfio_group_put_external_user().
+ *
+ * @group [in]   : VFIO group of a device
+ * @user_pfn [in]: array of user/guest PFNs to be pinned.
+ * @npage [in]   : count of elements in user_pfn array.  This count should not
+ *		   be greater VFIO_PIN_PAGES_MAX_ENTRIES.
+ * @prot [in]    : protection flags
+ * @phys_pfn[out]: array of host PFNs
+ * Return error or number of pages pinned.
+ */
+int vfio_pin_pages_from_group(struct vfio_group *group,
+			      unsigned long *user_pfn, int npage,
+			      int prot, unsigned long *phys_pfn)
+{
+	struct vfio_container *container;
+	struct vfio_iommu_driver *driver;
+	int ret;
+
+	if (!group || !user_pfn || !phys_pfn || !npage)
+		return -EINVAL;
+
+	if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
+		return -E2BIG;
+
+	container = group->container;
+	driver = container->iommu_driver;
+	if (likely(driver && driver->ops->pin_pages))
+		ret = driver->ops->pin_pages(container->iommu_data, user_pfn,
+					     npage, prot, phys_pfn);
+	else
+		ret = -ENOTTY;
+
+	return ret;
+}
+EXPORT_SYMBOL(vfio_pin_pages_from_group);
+
+/*
+ * Unpin set of host PFNs for local domain only.
+ *
+ * The caller needs to call vfio_group_get_external_user() or
+ * vfio_group_get_external_user_from_dev() prior to calling this interface,
+ * so as to prevent the VFIO group from disposal in the middle of the call.
+ * But it can keep the reference to the VFIO group for several calls into
+ * this interface.
+ * After finishing using of the VFIO group, the caller needs to release the
+ * VFIO group by calling vfio_group_put_external_user().
+ *
+ * @group [in]   : vfio group of a device
+ * @user_pfn [in]: array of user/guest PFNs to be unpinned. Number of user/guest
+ *		   PFNs should not be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
+ * @npage [in]   : count of elements in user_pfn array.  This count should not
+ *                 be greater than VFIO_PIN_PAGES_MAX_ENTRIES.
+ * Return error or number of pages unpinned.
+ */
+int vfio_unpin_pages_from_group(struct vfio_group *group,
+				unsigned long *user_pfn, int npage)
+{
+	struct vfio_container *container;
+	struct vfio_iommu_driver *driver;
+	int ret;
+
+	if (!group || !user_pfn || !npage)
+		return -EINVAL;
+
+	if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
+		return -E2BIG;
+
+	container = group->container;
+	driver = container->iommu_driver;
+	if (likely(driver && driver->ops->unpin_pages))
+		ret = driver->ops->unpin_pages(container->iommu_data, user_pfn,
+					       npage);
+	else
+		ret = -ENOTTY;
+
+	return ret;
+}
+EXPORT_SYMBOL(vfio_unpin_pages_from_group);
+
 
 /*
  * This interface allows the CPUs to perform some sort of virtual DMA on
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index fea0cb1e61d2..aacf6611c084 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -111,6 +111,12 @@ extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
 extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn,
 			    int npage);
 
+extern int vfio_pin_pages_from_group(struct vfio_group *group,
+				     unsigned long *user_pfn, int npage,
+				     int prot, unsigned long *phys_pfn);
+extern int vfio_unpin_pages_from_group(struct vfio_group *group,
+				       unsigned long *user_pfn, int npage);
+
 extern int vfio_dma_rw(struct vfio_group *group, dma_addr_t iova, void *data,
 		       size_t len, bool write);
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 4/7] drm/i915/gvt: hold reference of VFIO group during opening of vgpu
  2020-02-24  8:43 [PATCH v3 0/7] use vfio_dma_rw to read/write IOVAs from CPU side Yan Zhao
                   ` (2 preceding siblings ...)
  2020-02-24  8:47 ` [PATCH v3 3/7] vfio: avoid inefficient lookup of VFIO group in vfio_pin/unpin_pages Yan Zhao
@ 2020-02-24  8:47 ` Yan Zhao
  2020-02-24  8:48 ` [PATCH v3 5/7] drm/i915/gvt: subsitute kvm_read/write_guest with vfio_dma_rw Yan Zhao
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: Yan Zhao @ 2020-02-24  8:47 UTC (permalink / raw)
  To: zhenyuw
  Cc: alex.williamson, intel-gvt-dev, kvm, linux-kernel, pbonzini,
	kevin.tian, peterx, Yan Zhao

hold reference count of the VFIO group for each vgpu at vgpu opening and
release the reference at vgpu releasing.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/gpu/drm/i915/gvt/gvt.h   |  1 +
 drivers/gpu/drm/i915/gvt/kvmgt.c | 13 +++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
index 0081b051d3e0..5230ac80b84c 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.h
+++ b/drivers/gpu/drm/i915/gvt/gvt.h
@@ -219,6 +219,7 @@ struct intel_vgpu {
 		struct work_struct release_work;
 		atomic_t released;
 		struct vfio_device *vfio_device;
+		struct vfio_group *vfio_group;
 	} vdev;
 #endif
 
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index bd79a9718cc7..ed4c79cc3e09 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -747,6 +747,7 @@ static int intel_vgpu_open(struct mdev_device *mdev)
 	struct intel_vgpu *vgpu = mdev_get_drvdata(mdev);
 	unsigned long events;
 	int ret;
+	struct vfio_group *vfio_group;
 
 	vgpu->vdev.iommu_notifier.notifier_call = intel_vgpu_iommu_notifier;
 	vgpu->vdev.group_notifier.notifier_call = intel_vgpu_group_notifier;
@@ -769,6 +770,14 @@ static int intel_vgpu_open(struct mdev_device *mdev)
 		goto undo_iommu;
 	}
 
+	vfio_group = vfio_group_get_external_user_from_dev(mdev_dev(mdev));
+	if (IS_ERR_OR_NULL(vfio_group)) {
+		ret = !vfio_group ? -EFAULT : PTR_ERR(vfio_group);
+		gvt_vgpu_err("vfio_group_get_external_user_from_dev failed\n");
+		goto undo_register_group;
+	}
+	vgpu->vdev.vfio_group = vfio_group;
+
 	/* Take a module reference as mdev core doesn't take
 	 * a reference for vendor driver.
 	 */
@@ -785,6 +794,9 @@ static int intel_vgpu_open(struct mdev_device *mdev)
 	return ret;
 
 undo_group:
+	vfio_group_put_external_user(vgpu->vdev.vfio_group);
+
+undo_register_group:
 	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
 					&vgpu->vdev.group_notifier);
 
@@ -834,6 +846,7 @@ static void __intel_vgpu_release(struct intel_vgpu *vgpu)
 	kvmgt_guest_exit(info);
 
 	intel_vgpu_release_msi_eventfd_ctx(vgpu);
+	vfio_group_put_external_user(vgpu->vdev.vfio_group);
 
 	vgpu->vdev.kvm = NULL;
 	vgpu->handle = 0;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 5/7] drm/i915/gvt: subsitute kvm_read/write_guest with vfio_dma_rw
  2020-02-24  8:43 [PATCH v3 0/7] use vfio_dma_rw to read/write IOVAs from CPU side Yan Zhao
                   ` (3 preceding siblings ...)
  2020-02-24  8:47 ` [PATCH v3 4/7] drm/i915/gvt: hold reference of VFIO group during opening of vgpu Yan Zhao
@ 2020-02-24  8:48 ` Yan Zhao
  2020-02-24  8:48 ` [PATCH v3 6/7] drm/i915/gvt: avoid unnecessary lookup in each vfio pin & unpin pages Yan Zhao
  2020-02-24  8:48 ` [PATCH v3 7/7] drm/i915/gvt: rw more pages a time for shadow context Yan Zhao
  6 siblings, 0 replies; 17+ messages in thread
From: Yan Zhao @ 2020-02-24  8:48 UTC (permalink / raw)
  To: zhenyuw
  Cc: alex.williamson, intel-gvt-dev, kvm, linux-kernel, pbonzini,
	kevin.tian, peterx, Yan Zhao

As a device model, it is better to read/write guest memory using vfio
interface, so that vfio is able to maintain dirty info of device IOVAs.

Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c | 22 +---------------------
 1 file changed, 1 insertion(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index ed4c79cc3e09..3d6362fd94e7 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1979,33 +1979,13 @@ static int kvmgt_rw_gpa(unsigned long handle, unsigned long gpa,
 			void *buf, unsigned long len, bool write)
 {
 	struct kvmgt_guest_info *info;
-	struct kvm *kvm;
-	int idx, ret;
-	bool kthread = current->mm == NULL;
 
 	if (!handle_valid(handle))
 		return -ESRCH;
 
 	info = (struct kvmgt_guest_info *)handle;
-	kvm = info->kvm;
-
-	if (kthread) {
-		if (!mmget_not_zero(kvm->mm))
-			return -EFAULT;
-		use_mm(kvm->mm);
-	}
 
-	idx = srcu_read_lock(&kvm->srcu);
-	ret = write ? kvm_write_guest(kvm, gpa, buf, len) :
-		      kvm_read_guest(kvm, gpa, buf, len);
-	srcu_read_unlock(&kvm->srcu, idx);
-
-	if (kthread) {
-		unuse_mm(kvm->mm);
-		mmput(kvm->mm);
-	}
-
-	return ret;
+	return vfio_dma_rw(info->vgpu->vdev.vfio_group, gpa, buf, len, write);
 }
 
 static int kvmgt_read_gpa(unsigned long handle, unsigned long gpa,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 6/7] drm/i915/gvt: avoid unnecessary lookup in each vfio pin & unpin pages
  2020-02-24  8:43 [PATCH v3 0/7] use vfio_dma_rw to read/write IOVAs from CPU side Yan Zhao
                   ` (4 preceding siblings ...)
  2020-02-24  8:48 ` [PATCH v3 5/7] drm/i915/gvt: subsitute kvm_read/write_guest with vfio_dma_rw Yan Zhao
@ 2020-02-24  8:48 ` Yan Zhao
  2020-02-24  8:48 ` [PATCH v3 7/7] drm/i915/gvt: rw more pages a time for shadow context Yan Zhao
  6 siblings, 0 replies; 17+ messages in thread
From: Yan Zhao @ 2020-02-24  8:48 UTC (permalink / raw)
  To: zhenyuw
  Cc: alex.williamson, intel-gvt-dev, kvm, linux-kernel, pbonzini,
	kevin.tian, peterx, Yan Zhao

substitute vfio_pin_pages() and vfio_unpin_pages() with
vfio_pin_pages_from_group() and vfio_unpin_pages_from_group(), so that
it will not go through looking up, referencing, dereferencing of VFIO
group in each call.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 3d6362fd94e7..aa7c6f2f1fb8 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -129,7 +129,8 @@ static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
 	for (npage = 0; npage < total_pages; npage++) {
 		unsigned long cur_gfn = gfn + npage;
 
-		ret = vfio_unpin_pages(mdev_dev(vgpu->vdev.mdev), &cur_gfn, 1);
+		ret = vfio_unpin_pages_from_group(vgpu->vdev.vfio_group,
+						  &cur_gfn, 1);
 		WARN_ON(ret != 1);
 	}
 }
@@ -152,8 +153,9 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
 		unsigned long cur_gfn = gfn + npage;
 		unsigned long pfn;
 
-		ret = vfio_pin_pages(mdev_dev(vgpu->vdev.mdev), &cur_gfn, 1,
-				     IOMMU_READ | IOMMU_WRITE, &pfn);
+		ret = vfio_pin_pages_from_group(vgpu->vdev.vfio_group,
+						&cur_gfn, 1,
+						IOMMU_READ | IOMMU_WRITE, &pfn);
 		if (ret != 1) {
 			gvt_vgpu_err("vfio_pin_pages failed for gfn 0x%lx, ret %d\n",
 				     cur_gfn, ret);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v3 7/7] drm/i915/gvt: rw more pages a time for shadow context
  2020-02-24  8:43 [PATCH v3 0/7] use vfio_dma_rw to read/write IOVAs from CPU side Yan Zhao
                   ` (5 preceding siblings ...)
  2020-02-24  8:48 ` [PATCH v3 6/7] drm/i915/gvt: avoid unnecessary lookup in each vfio pin & unpin pages Yan Zhao
@ 2020-02-24  8:48 ` Yan Zhao
  6 siblings, 0 replies; 17+ messages in thread
From: Yan Zhao @ 2020-02-24  8:48 UTC (permalink / raw)
  To: zhenyuw
  Cc: alex.williamson, intel-gvt-dev, kvm, linux-kernel, pbonzini,
	kevin.tian, peterx, Yan Zhao

1. as shadow context is pinned in intel_vgpu_setup_submission() and
unpinned in intel_vgpu_clean_submission(), its base virtual address of
is safely obtained from lrc_reg_state. no need to call kmap()/kunmap()
repeatedly.

2. IOVA(GPA)s of context pages are checked in this patch and if they are
continuous, read/write them together in one intel_gvt_hypervisor_read_gpa()
/intel_gvt_hypervisor_write_gpa().

after the two changes in this patch,
average cycles for populate_shadow_context() and update_guest_context()
are reduced by ~10000-20000 cycles, depending on the average continuous
pages in each read/write.

(1) comparison of cycles of
populate_shadow_context() + update_guest_context() when executing
different benchmarks
 -------------------------------------------------------------
|       cycles      | glmark2     | lightsmark  | openarena   |
|-------------------------------------------------------------|
| before this patch | 65968       | 97852       | 61373       |
|  after this patch | 56017 (85%) | 73862 (75%) | 47463 (77%) |
 -------------------------------------------------------------

(2) average count of pages read/written a time in
populate_shadow_context() and update_guest_context()
for each benchmark

 -----------------------------------------------------------
|     page cnt      | glmark2     | lightsmark  | openarena |
|-----------------------------------------------------------|
| before this patch |    1        |      1      |    1      |
|  after this patch |    5.25     |     19.99   |   20      |
 ------------------------------------------------------------

(3) comparison of benchmarks scores
 ---------------------------------------------------------------------
|      score        | glmark2       | lightsmark     | openarena      |
|---------------------------------------------------------------------|
| before this patch | 1244          | 222.18         | 114.4          |
|  after this patch | 1248 (100.3%) | 225.8 (101.6%) | 115.0 (100.9%) |
 ---------------------------------------------------------------------

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/gpu/drm/i915/gvt/scheduler.c | 101 +++++++++++++++++++--------
 1 file changed, 73 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
index 5b2a7d072ec9..c2b521b099bc 100644
--- a/drivers/gpu/drm/i915/gvt/scheduler.c
+++ b/drivers/gpu/drm/i915/gvt/scheduler.c
@@ -129,16 +129,22 @@ static int populate_shadow_context(struct intel_vgpu_workload *workload)
 	struct intel_vgpu *vgpu = workload->vgpu;
 	struct intel_gvt *gvt = vgpu->gvt;
 	int ring_id = workload->ring_id;
-	struct drm_i915_gem_object *ctx_obj =
-		workload->req->hw_context->state->obj;
+	struct intel_context *ctx_ce = workload->req->hw_context;
 	struct execlist_ring_context *shadow_ring_context;
-	struct page *page;
 	void *dst;
+	void *context_base;
 	unsigned long context_gpa, context_page_num;
+	unsigned long seq_gpa_base; /* first gpa of successive GPAs */
+	int seq_page_cnt; /* cnt of GPAs that are successive */
 	int i;
 
-	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
-	shadow_ring_context = kmap(page);
+	GEM_BUG_ON(!intel_context_is_pinned(ctx_ce));
+
+	context_base = (void *) ctx_ce->lrc_reg_state -
+				(LRC_STATE_PN << I915_GTT_PAGE_SHIFT);
+
+	shadow_ring_context = context_base +
+				(LRC_STATE_PN << I915_GTT_PAGE_SHIFT);
 
 	sr_oa_regs(workload, (u32 *)shadow_ring_context, true);
 #define COPY_REG(name) \
@@ -170,7 +176,6 @@ static int populate_shadow_context(struct intel_vgpu_workload *workload)
 			I915_GTT_PAGE_SIZE - sizeof(*shadow_ring_context));
 
 	sr_oa_regs(workload, (u32 *)shadow_ring_context, false);
-	kunmap(page);
 
 	if (IS_RESTORE_INHIBIT(shadow_ring_context->ctx_ctrl.val))
 		return 0;
@@ -185,8 +190,12 @@ static int populate_shadow_context(struct intel_vgpu_workload *workload)
 	if (IS_BROADWELL(gvt->dev_priv) && ring_id == RCS0)
 		context_page_num = 19;
 
-	i = 2;
-	while (i < context_page_num) {
+
+	/* find continuous GPAs from gma until the first discontinuous GPA
+	 * read continuous GPAs into dst virtual address
+	 */
+	seq_page_cnt = 0;
+	for (i = 2; i < context_page_num; i++) {
 		context_gpa = intel_vgpu_gma_to_gpa(vgpu->gtt.ggtt_mm,
 				(u32)((workload->ctx_desc.lrca + i) <<
 				I915_GTT_PAGE_SHIFT));
@@ -195,12 +204,26 @@ static int populate_shadow_context(struct intel_vgpu_workload *workload)
 			return -EFAULT;
 		}
 
-		page = i915_gem_object_get_page(ctx_obj, i);
-		dst = kmap(page);
-		intel_gvt_hypervisor_read_gpa(vgpu, context_gpa, dst,
-				I915_GTT_PAGE_SIZE);
-		kunmap(page);
-		i++;
+		if (seq_page_cnt == 0) {
+			seq_gpa_base = context_gpa;
+			dst = context_base + (i << I915_GTT_PAGE_SHIFT);
+		} else if (context_gpa != seq_gpa_base +
+					(seq_page_cnt << I915_GTT_PAGE_SHIFT))
+			goto read;
+
+		seq_page_cnt++;
+
+		if (i == context_page_num - 1)
+			goto read;
+
+		continue;
+
+read:
+		intel_gvt_hypervisor_read_gpa(vgpu, seq_gpa_base, dst,
+					seq_page_cnt << I915_GTT_PAGE_SHIFT);
+		seq_page_cnt = 1;
+		seq_gpa_base = context_gpa;
+		dst = context_base + (i << I915_GTT_PAGE_SHIFT);
 	}
 	return 0;
 }
@@ -787,20 +810,24 @@ static void update_guest_context(struct intel_vgpu_workload *workload)
 	struct i915_request *rq = workload->req;
 	struct intel_vgpu *vgpu = workload->vgpu;
 	struct intel_gvt *gvt = vgpu->gvt;
-	struct drm_i915_gem_object *ctx_obj = rq->hw_context->state->obj;
+	struct intel_context *ctx_ce = workload->req->hw_context;
 	struct execlist_ring_context *shadow_ring_context;
-	struct page *page;
-	void *src;
 	unsigned long context_gpa, context_page_num;
+	unsigned long seq_gpa_base; /* first gpa of successive GPAs */
+	int seq_page_cnt; /* cnt of GPAs that are successive */
 	int i;
 	struct drm_i915_private *dev_priv = gvt->dev_priv;
 	u32 ring_base;
 	u32 head, tail;
 	u16 wrap_count;
+	void *src;
+	void *context_base;
 
 	gvt_dbg_sched("ring id %d workload lrca %x\n", rq->engine->id,
 		      workload->ctx_desc.lrca);
 
+	GEM_BUG_ON(!intel_context_is_pinned(ctx_ce));
+
 	head = workload->rb_head;
 	tail = workload->rb_tail;
 	wrap_count = workload->guest_rb_head >> RB_HEAD_WRAP_CNT_OFF;
@@ -824,9 +851,14 @@ static void update_guest_context(struct intel_vgpu_workload *workload)
 	if (IS_BROADWELL(gvt->dev_priv) && rq->engine->id == RCS0)
 		context_page_num = 19;
 
-	i = 2;
+	context_base = (void *) ctx_ce->lrc_reg_state -
+		 LRC_STATE_PN * PAGE_SIZE;
 
-	while (i < context_page_num) {
+	/* find continuous GPAs from gma until the first discontinuous GPA
+	 * write continuous GPAs from src virtual address
+	 */
+	seq_page_cnt = 0;
+	for (i = 2; i < context_page_num; i++) {
 		context_gpa = intel_vgpu_gma_to_gpa(vgpu->gtt.ggtt_mm,
 				(u32)((workload->ctx_desc.lrca + i) <<
 					I915_GTT_PAGE_SHIFT));
@@ -835,19 +867,33 @@ static void update_guest_context(struct intel_vgpu_workload *workload)
 			return;
 		}
 
-		page = i915_gem_object_get_page(ctx_obj, i);
-		src = kmap(page);
-		intel_gvt_hypervisor_write_gpa(vgpu, context_gpa, src,
-				I915_GTT_PAGE_SIZE);
-		kunmap(page);
-		i++;
+		if (seq_page_cnt == 0) {
+			seq_gpa_base = context_gpa;
+			src = context_base + (i << I915_GTT_PAGE_SHIFT);
+		} else if (context_gpa != seq_gpa_base +
+					(seq_page_cnt << I915_GTT_PAGE_SHIFT))
+			goto write;
+
+		seq_page_cnt++;
+
+		if (i == context_page_num - 1)
+			goto write;
+
+		continue;
+
+write:
+		intel_gvt_hypervisor_write_gpa(vgpu, seq_gpa_base, src,
+					seq_page_cnt << I915_GTT_PAGE_SHIFT);
+		seq_page_cnt = 1;
+		seq_gpa_base = context_gpa;
+		src = context_base + (i << I915_GTT_PAGE_SHIFT);
 	}
 
 	intel_gvt_hypervisor_write_gpa(vgpu, workload->ring_context_gpa +
 		RING_CTX_OFF(ring_header.val), &workload->rb_tail, 4);
 
-	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
-	shadow_ring_context = kmap(page);
+	shadow_ring_context = context_base +
+				(LRC_STATE_PN << I915_GTT_PAGE_SHIFT);
 
 #define COPY_REG(name) \
 	intel_gvt_hypervisor_write_gpa(vgpu, workload->ring_context_gpa + \
@@ -865,7 +911,6 @@ static void update_guest_context(struct intel_vgpu_workload *workload)
 			sizeof(*shadow_ring_context),
 			I915_GTT_PAGE_SIZE - sizeof(*shadow_ring_context));
 
-	kunmap(page);
 }
 
 void intel_vgpu_clean_workloads(struct intel_vgpu *vgpu,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/7] vfio: introduce vfio_dma_rw to read/write a range of IOVAs
  2020-02-24  8:47 ` [PATCH v3 2/7] vfio: introduce vfio_dma_rw to read/write a range of IOVAs Yan Zhao
@ 2020-02-24 19:14   ` Alex Williamson
  2020-02-25  3:44     ` Yan Zhao
  2020-03-06  1:21   ` Yan Zhao
  1 sibling, 1 reply; 17+ messages in thread
From: Alex Williamson @ 2020-02-24 19:14 UTC (permalink / raw)
  To: Yan Zhao
  Cc: zhenyuw, intel-gvt-dev, kvm, linux-kernel, pbonzini, kevin.tian, peterx

On Mon, 24 Feb 2020 03:47:15 -0500
Yan Zhao <yan.y.zhao@intel.com> wrote:

> vfio_dma_rw will read/write a range of user space memory pointed to by
> IOVA into/from a kernel buffer without enforcing pinning the user space
> memory.
> 
> TODO: mark the IOVAs to user space memory dirty if they are written in
> vfio_dma_rw().
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>  drivers/vfio/vfio.c             | 49 +++++++++++++++++++++
>  drivers/vfio/vfio_iommu_type1.c | 77 +++++++++++++++++++++++++++++++++
>  include/linux/vfio.h            |  5 +++
>  3 files changed, 131 insertions(+)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 914bdf4b9d73..902867627cbf 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1998,6 +1998,55 @@ int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn, int npage)
>  }
>  EXPORT_SYMBOL(vfio_unpin_pages);
>  
> +
> +/*
> + * This interface allows the CPUs to perform some sort of virtual DMA on
> + * behalf of the device.
> + *
> + * CPUs read/write a range of IOVAs pointing to user space memory into/from
> + * a kernel buffer.
> + *
> + * As the read/write of user space memory is conducted via the CPUs and is
> + * not a real device DMA, it is not necessary to pin the user space memory.
> + *
> + * The caller needs to call vfio_group_get_external_user() or
> + * vfio_group_get_external_user_from_dev() prior to calling this interface,
> + * so as to prevent the VFIO group from disposal in the middle of the call.
> + * But it can keep the reference to the VFIO group for several calls into
> + * this interface.
> + * After finishing using of the VFIO group, the caller needs to release the
> + * VFIO group by calling vfio_group_put_external_user().
> + *
> + * @group [in]: vfio group of a device
> + * @iova [in] : base IOVA of a user space buffer
> + * @data [in] : pointer to kernel buffer
> + * @len [in]  : kernel buffer length
> + * @write     : indicate read or write
> + * Return error code on failure or 0 on success.
> + */
> +int vfio_dma_rw(struct vfio_group *group, dma_addr_t iova,
> +		void *data, size_t len, bool write)
> +{
> +	struct vfio_container *container;
> +	struct vfio_iommu_driver *driver;
> +	int ret = 0;
> +
> +	if (!group || !data || len <= 0)
> +		return -EINVAL;
> +
> +	container = group->container;
> +	driver = container->iommu_driver;
> +
> +	if (likely(driver && driver->ops->dma_rw))
> +		ret = driver->ops->dma_rw(container->iommu_data,
> +					  iova, data, len, write);
> +	else
> +		ret = -ENOTTY;
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(vfio_dma_rw);
> +
>  static int vfio_register_iommu_notifier(struct vfio_group *group,
>  					unsigned long *events,
>  					struct notifier_block *nb)
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 2ada8e6cdb88..74e1c425943c 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -27,6 +27,7 @@
>  #include <linux/iommu.h>
>  #include <linux/module.h>
>  #include <linux/mm.h>
> +#include <linux/mmu_context.h>
>  #include <linux/rbtree.h>
>  #include <linux/sched/signal.h>
>  #include <linux/sched/mm.h>
> @@ -2326,6 +2327,81 @@ static int vfio_iommu_type1_unregister_notifier(void *iommu_data,
>  	return blocking_notifier_chain_unregister(&iommu->notifier, nb);
>  }
>  
> +static size_t vfio_iommu_type1_dma_rw_chunk(struct vfio_iommu *iommu,
> +					    dma_addr_t iova, void *data,
> +					    size_t count, bool write)
> +{
> +	struct mm_struct *mm;
> +	unsigned long vaddr;
> +	struct vfio_dma *dma;
> +	bool kthread = current->mm == NULL;
> +	size_t done = 0;
> +	size_t offset;
> +
> +	dma = vfio_find_dma(iommu, iova, 1);
> +	if (!dma)
> +		return 0;
> +
> +	if ((write && !(dma->prot & IOMMU_WRITE)) ||
> +			!(dma->prot & IOMMU_READ))
> +		return 0;
> +
> +	mm = get_task_mm(dma->task);
> +
> +	if (!mm)
> +		return 0;
> +
> +	if (kthread)
> +		use_mm(mm);
> +	else if (current->mm != mm)
> +		goto out;
> +
> +	offset = iova - dma->iova;
> +
> +	if (count > dma->size - offset)
> +		count = dma->size - offset;
> +
> +	vaddr = dma->vaddr + offset;
> +
> +	if (write)
> +		done = __copy_to_user((void __user *)vaddr,
> +				       data, count) ? 0 : count;
> +	else
> +		done = __copy_from_user(data, (void __user *)vaddr,
> +					count) ? 0 : count;
> +
> +	if (kthread)
> +		unuse_mm(mm);
> +out:
> +	mmput(mm);
> +	return done;


Return 0 on error?  Why wouldn't this function decide the errno rather
than masking them all as -EFAULT by the callee below?  Thanks,

Alex

> +}
> +
> +static int vfio_iommu_type1_dma_rw(void *iommu_data, dma_addr_t iova,
> +				   void *data, size_t count, bool write)
> +{
> +	struct vfio_iommu *iommu = iommu_data;
> +	int ret = 0;
> +	size_t done = 0;
> +
> +	mutex_lock(&iommu->lock);
> +	while (count > 0) {
> +		done = vfio_iommu_type1_dma_rw_chunk(iommu, iova, data,
> +						     count, write);
> +		if (!done) {
> +			ret = -EFAULT;
> +			break;
> +		}
> +
> +		count -= done;
> +		data += done;
> +		iova += done;
> +	}
> +
> +	mutex_unlock(&iommu->lock);
> +	return ret;
> +}
> +
>  static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
>  	.name			= "vfio-iommu-type1",
>  	.owner			= THIS_MODULE,
> @@ -2338,6 +2414,7 @@ static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
>  	.unpin_pages		= vfio_iommu_type1_unpin_pages,
>  	.register_notifier	= vfio_iommu_type1_register_notifier,
>  	.unregister_notifier	= vfio_iommu_type1_unregister_notifier,
> +	.dma_rw			= vfio_iommu_type1_dma_rw,
>  };
>  
>  static int __init vfio_iommu_type1_init(void)
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 2e1fa0c7396f..fea0cb1e61d2 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -82,6 +82,8 @@ struct vfio_iommu_driver_ops {
>  					     struct notifier_block *nb);
>  	int		(*unregister_notifier)(void *iommu_data,
>  					       struct notifier_block *nb);
> +	int		(*dma_rw)(void *iommu_data, dma_addr_t iova,
> +				  void *data, size_t count, bool write);
>  };
>  
>  extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
> @@ -109,6 +111,9 @@ extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
>  extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn,
>  			    int npage);
>  
> +extern int vfio_dma_rw(struct vfio_group *group, dma_addr_t iova, void *data,
> +		       size_t len, bool write);
> +
>  /* each type has independent events */
>  enum vfio_notify_type {
>  	VFIO_IOMMU_NOTIFY = 0,


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/7] vfio: allow external user to get vfio group from device
  2020-02-24  8:46 ` [PATCH v3 1/7] vfio: allow external user to get vfio group from device Yan Zhao
@ 2020-02-24 19:15   ` Alex Williamson
  2020-02-25  3:35     ` Yan Zhao
  0 siblings, 1 reply; 17+ messages in thread
From: Alex Williamson @ 2020-02-24 19:15 UTC (permalink / raw)
  To: Yan Zhao
  Cc: zhenyuw, intel-gvt-dev, kvm, linux-kernel, pbonzini, kevin.tian, peterx

On Mon, 24 Feb 2020 03:46:41 -0500
Yan Zhao <yan.y.zhao@intel.com> wrote:

> external user is able to
> 1. add a device into an vfio group

How so?  The device is added via existing mechanisms, the only thing
added here is an interface to get a group reference from a struct
device.

> 2. call vfio_group_get_external_user_from_dev() with the device pointer
> to get vfio_group associated with this device and increments the container
> user counter to prevent the VFIO group from disposal before KVM exits.
> 3. When the external KVM finishes, it calls vfio_group_put_external_user()
> to release the VFIO group.
> 
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>  drivers/vfio/vfio.c  | 37 +++++++++++++++++++++++++++++++++++++
>  include/linux/vfio.h |  2 ++
>  2 files changed, 39 insertions(+)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index c8482624ca34..914bdf4b9d73 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1720,6 +1720,43 @@ struct vfio_group *vfio_group_get_external_user(struct file *filep)
>  }
>  EXPORT_SYMBOL_GPL(vfio_group_get_external_user);
>  
> +/**
> + * External user API, exported by symbols to be linked dynamically.
> + *
> + * The protocol includes:
> + * 1. External user add a device into a vfio group
> + *
> + * 2. The external user calls vfio_group_get_external_user_from_dev()
> + * with the device pointer
> + * to verify that:
> + *	- there's a vfio group associated with it and is initialized;
> + *	- IOMMU is set for the vfio group.
> + * If both checks passed, vfio_group_get_external_user_from_dev()
> + * increments the container user counter to prevent
> + * the VFIO group from disposal before KVM exits.
> + *
> + * 3. When the external KVM finishes, it calls
> + * vfio_group_put_external_user() to release the VFIO group.
> + * This call decrements the container user counter.
> + */

I don't think we need to duplicate this whole comment block for a
_from_dev() version of the existing vfio_group_get_external_user().
Please merge the comments.

> +
> +struct vfio_group *vfio_group_get_external_user_from_dev(struct device *dev)
> +{
> +	struct vfio_group *group;
> +	int ret;
> +
> +	group = vfio_group_get_from_dev(dev);
> +	if (!group)
> +		return ERR_PTR(-ENODEV);
> +
> +	ret = vfio_group_add_container_user(group);
> +	if (ret)
> +		return ERR_PTR(ret);

Error path leaks group reference.

> +
> +	return group;
> +}
> +EXPORT_SYMBOL_GPL(vfio_group_get_external_user_from_dev);
> +
>  void vfio_group_put_external_user(struct vfio_group *group)
>  {
>  	vfio_group_try_dissolve_container(group);
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index e42a711a2800..2e1fa0c7396f 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -94,6 +94,8 @@ extern void vfio_unregister_iommu_driver(
>   */
>  extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
>  extern void vfio_group_put_external_user(struct vfio_group *group);
> +extern
> +struct vfio_group *vfio_group_get_external_user_from_dev(struct device *dev);

Slight cringe at this line wrap, personally would prefer to wrap the
args as done repeatedly elsewhere in this file.  Thanks,

Alex

>  extern bool vfio_external_group_match_file(struct vfio_group *group,
>  					   struct file *filep);
>  extern int vfio_external_user_iommu_id(struct vfio_group *group);


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/7] vfio: allow external user to get vfio group from device
  2020-02-24 19:15   ` Alex Williamson
@ 2020-02-25  3:35     ` Yan Zhao
  2020-03-05 19:01       ` Alex Williamson
  0 siblings, 1 reply; 17+ messages in thread
From: Yan Zhao @ 2020-02-25  3:35 UTC (permalink / raw)
  To: Alex Williamson
  Cc: zhenyuw, intel-gvt-dev, kvm, linux-kernel, pbonzini, Tian, Kevin, peterx

On Tue, Feb 25, 2020 at 03:15:04AM +0800, Alex Williamson wrote:
> On Mon, 24 Feb 2020 03:46:41 -0500
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > external user is able to
> > 1. add a device into an vfio group
> 
> How so?  The device is added via existing mechanisms, the only thing
> added here is an interface to get a group reference from a struct
> device.
> 
> > 2. call vfio_group_get_external_user_from_dev() with the device pointer
> > to get vfio_group associated with this device and increments the container
> > user counter to prevent the VFIO group from disposal before KVM exits.
> > 3. When the external KVM finishes, it calls vfio_group_put_external_user()
> > to release the VFIO group.
> > 
> > Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> >  drivers/vfio/vfio.c  | 37 +++++++++++++++++++++++++++++++++++++
> >  include/linux/vfio.h |  2 ++
> >  2 files changed, 39 insertions(+)
> > 
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index c8482624ca34..914bdf4b9d73 100644
> > --- a/drivers/vfio/vfio.c
> > +++ b/drivers/vfio/vfio.c
> > @@ -1720,6 +1720,43 @@ struct vfio_group *vfio_group_get_external_user(struct file *filep)
> >  }
> >  EXPORT_SYMBOL_GPL(vfio_group_get_external_user);
> >  
> > +/**
> > + * External user API, exported by symbols to be linked dynamically.
> > + *
> > + * The protocol includes:
> > + * 1. External user add a device into a vfio group
> > + *
> > + * 2. The external user calls vfio_group_get_external_user_from_dev()
> > + * with the device pointer
> > + * to verify that:
> > + *	- there's a vfio group associated with it and is initialized;
> > + *	- IOMMU is set for the vfio group.
> > + * If both checks passed, vfio_group_get_external_user_from_dev()
> > + * increments the container user counter to prevent
> > + * the VFIO group from disposal before KVM exits.
> > + *
> > + * 3. When the external KVM finishes, it calls
> > + * vfio_group_put_external_user() to release the VFIO group.
> > + * This call decrements the container user counter.
> > + */
> 
> I don't think we need to duplicate this whole comment block for a
> _from_dev() version of the existing vfio_group_get_external_user().
> Please merge the comments.
ok. but I have a question: for an external user, as it already has group
fd, it can use vfio_group_get_external_user() directly, is there a
necessity for it to call vfio_group_get_external_user_from_dev() ?

If an external user wants to call this interface, it needs to first get
device fd, passes the device fd to kernel and kernel retrieves the pointer
to struct device, right?


> > +
> > +struct vfio_group *vfio_group_get_external_user_from_dev(struct device *dev)
> > +{
> > +	struct vfio_group *group;
> > +	int ret;
> > +
> > +	group = vfio_group_get_from_dev(dev);
> > +	if (!group)
> > +		return ERR_PTR(-ENODEV);
> > +
> > +	ret = vfio_group_add_container_user(group);
> > +	if (ret)
> > +		return ERR_PTR(ret);
> 
> Error path leaks group reference.
>
oops, sorry for that.

> > +
> > +	return group;
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_group_get_external_user_from_dev);
> > +
> >  void vfio_group_put_external_user(struct vfio_group *group)
> >  {
> >  	vfio_group_try_dissolve_container(group);
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index e42a711a2800..2e1fa0c7396f 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -94,6 +94,8 @@ extern void vfio_unregister_iommu_driver(
> >   */
> >  extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
> >  extern void vfio_group_put_external_user(struct vfio_group *group);
> > +extern
> > +struct vfio_group *vfio_group_get_external_user_from_dev(struct device *dev);
> 
> Slight cringe at this line wrap, personally would prefer to wrap the
> args as done repeatedly elsewhere in this file.  Thanks,
>
yeah, I tried to do in that way, but the name of this interface is too long,
as well as its return type, it passes 80 characters limit even with just one
arg...

is it better to wrap in below way?
extern struct vfio_group *vfio_group_get_external_user_from_dev(struct device
								*dev);

or just a shorter interface name?
extern struct vfio_group *vfio_group_get_user_from_dev(struct device *dev);

Thanks
Yan
> 
> >  extern bool vfio_external_group_match_file(struct vfio_group *group,
> >  					   struct file *filep);
> >  extern int vfio_external_user_iommu_id(struct vfio_group *group);
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/7] vfio: introduce vfio_dma_rw to read/write a range of IOVAs
  2020-02-24 19:14   ` Alex Williamson
@ 2020-02-25  3:44     ` Yan Zhao
  0 siblings, 0 replies; 17+ messages in thread
From: Yan Zhao @ 2020-02-25  3:44 UTC (permalink / raw)
  To: Alex Williamson
  Cc: zhenyuw, intel-gvt-dev, kvm, linux-kernel, pbonzini, Tian, Kevin, peterx

On Tue, Feb 25, 2020 at 03:14:42AM +0800, Alex Williamson wrote:
> On Mon, 24 Feb 2020 03:47:15 -0500
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > vfio_dma_rw will read/write a range of user space memory pointed to by
> > IOVA into/from a kernel buffer without enforcing pinning the user space
> > memory.
> > 
> > TODO: mark the IOVAs to user space memory dirty if they are written in
> > vfio_dma_rw().
> > 
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> >  drivers/vfio/vfio.c             | 49 +++++++++++++++++++++
> >  drivers/vfio/vfio_iommu_type1.c | 77 +++++++++++++++++++++++++++++++++
> >  include/linux/vfio.h            |  5 +++
> >  3 files changed, 131 insertions(+)
> > 
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index 914bdf4b9d73..902867627cbf 100644
> > --- a/drivers/vfio/vfio.c
> > +++ b/drivers/vfio/vfio.c
> > @@ -1998,6 +1998,55 @@ int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn, int npage)
> >  }
> >  EXPORT_SYMBOL(vfio_unpin_pages);
> >  
> > +
> > +/*
> > + * This interface allows the CPUs to perform some sort of virtual DMA on
> > + * behalf of the device.
> > + *
> > + * CPUs read/write a range of IOVAs pointing to user space memory into/from
> > + * a kernel buffer.
> > + *
> > + * As the read/write of user space memory is conducted via the CPUs and is
> > + * not a real device DMA, it is not necessary to pin the user space memory.
> > + *
> > + * The caller needs to call vfio_group_get_external_user() or
> > + * vfio_group_get_external_user_from_dev() prior to calling this interface,
> > + * so as to prevent the VFIO group from disposal in the middle of the call.
> > + * But it can keep the reference to the VFIO group for several calls into
> > + * this interface.
> > + * After finishing using of the VFIO group, the caller needs to release the
> > + * VFIO group by calling vfio_group_put_external_user().
> > + *
> > + * @group [in]: vfio group of a device
> > + * @iova [in] : base IOVA of a user space buffer
> > + * @data [in] : pointer to kernel buffer
> > + * @len [in]  : kernel buffer length
> > + * @write     : indicate read or write
> > + * Return error code on failure or 0 on success.
> > + */
> > +int vfio_dma_rw(struct vfio_group *group, dma_addr_t iova,
> > +		void *data, size_t len, bool write)
> > +{
> > +	struct vfio_container *container;
> > +	struct vfio_iommu_driver *driver;
> > +	int ret = 0;
> > +
> > +	if (!group || !data || len <= 0)
> > +		return -EINVAL;
> > +
> > +	container = group->container;
> > +	driver = container->iommu_driver;
> > +
> > +	if (likely(driver && driver->ops->dma_rw))
> > +		ret = driver->ops->dma_rw(container->iommu_data,
> > +					  iova, data, len, write);
> > +	else
> > +		ret = -ENOTTY;
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL(vfio_dma_rw);
> > +
> >  static int vfio_register_iommu_notifier(struct vfio_group *group,
> >  					unsigned long *events,
> >  					struct notifier_block *nb)
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index 2ada8e6cdb88..74e1c425943c 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -27,6 +27,7 @@
> >  #include <linux/iommu.h>
> >  #include <linux/module.h>
> >  #include <linux/mm.h>
> > +#include <linux/mmu_context.h>
> >  #include <linux/rbtree.h>
> >  #include <linux/sched/signal.h>
> >  #include <linux/sched/mm.h>
> > @@ -2326,6 +2327,81 @@ static int vfio_iommu_type1_unregister_notifier(void *iommu_data,
> >  	return blocking_notifier_chain_unregister(&iommu->notifier, nb);
> >  }
> >  
> > +static size_t vfio_iommu_type1_dma_rw_chunk(struct vfio_iommu *iommu,
> > +					    dma_addr_t iova, void *data,
> > +					    size_t count, bool write)
> > +{
> > +	struct mm_struct *mm;
> > +	unsigned long vaddr;
> > +	struct vfio_dma *dma;
> > +	bool kthread = current->mm == NULL;
> > +	size_t done = 0;
> > +	size_t offset;
> > +
> > +	dma = vfio_find_dma(iommu, iova, 1);
> > +	if (!dma)
> > +		return 0;
> > +
> > +	if ((write && !(dma->prot & IOMMU_WRITE)) ||
> > +			!(dma->prot & IOMMU_READ))
> > +		return 0;
> > +
> > +	mm = get_task_mm(dma->task);
> > +
> > +	if (!mm)
> > +		return 0;
> > +
> > +	if (kthread)
> > +		use_mm(mm);
> > +	else if (current->mm != mm)
> > +		goto out;
> > +
> > +	offset = iova - dma->iova;
> > +
> > +	if (count > dma->size - offset)
> > +		count = dma->size - offset;
> > +
> > +	vaddr = dma->vaddr + offset;
> > +
> > +	if (write)
> > +		done = __copy_to_user((void __user *)vaddr,
> > +				       data, count) ? 0 : count;
> > +	else
> > +		done = __copy_from_user(data, (void __user *)vaddr,
> > +					count) ? 0 : count;
> > +
> > +	if (kthread)
> > +		unuse_mm(mm);
> > +out:
> > +	mmput(mm);
> > +	return done;
> 
> 
> Return 0 on error?  Why wouldn't this function decide the errno rather
> than masking them all as -EFAULT by the callee below?  Thanks,
 
ok. let me return negative errno on error. Thanks!
Yan
> 
> > +}
> > +
> > +static int vfio_iommu_type1_dma_rw(void *iommu_data, dma_addr_t iova,
> > +				   void *data, size_t count, bool write)
> > +{
> > +	struct vfio_iommu *iommu = iommu_data;
> > +	int ret = 0;
> > +	size_t done = 0;
> > +
> > +	mutex_lock(&iommu->lock);
> > +	while (count > 0) {
> > +		done = vfio_iommu_type1_dma_rw_chunk(iommu, iova, data,
> > +						     count, write);
> > +		if (!done) {
> > +			ret = -EFAULT;
> > +			break;
> > +		}
> > +
> > +		count -= done;
> > +		data += done;
> > +		iova += done;
> > +	}
> > +
> > +	mutex_unlock(&iommu->lock);
> > +	return ret;
> > +}
> > +
> >  static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
> >  	.name			= "vfio-iommu-type1",
> >  	.owner			= THIS_MODULE,
> > @@ -2338,6 +2414,7 @@ static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
> >  	.unpin_pages		= vfio_iommu_type1_unpin_pages,
> >  	.register_notifier	= vfio_iommu_type1_register_notifier,
> >  	.unregister_notifier	= vfio_iommu_type1_unregister_notifier,
> > +	.dma_rw			= vfio_iommu_type1_dma_rw,
> >  };
> >  
> >  static int __init vfio_iommu_type1_init(void)
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index 2e1fa0c7396f..fea0cb1e61d2 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -82,6 +82,8 @@ struct vfio_iommu_driver_ops {
> >  					     struct notifier_block *nb);
> >  	int		(*unregister_notifier)(void *iommu_data,
> >  					       struct notifier_block *nb);
> > +	int		(*dma_rw)(void *iommu_data, dma_addr_t iova,
> > +				  void *data, size_t count, bool write);
> >  };
> >  
> >  extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
> > @@ -109,6 +111,9 @@ extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
> >  extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn,
> >  			    int npage);
> >  
> > +extern int vfio_dma_rw(struct vfio_group *group, dma_addr_t iova, void *data,
> > +		       size_t len, bool write);
> > +
> >  /* each type has independent events */
> >  enum vfio_notify_type {
> >  	VFIO_IOMMU_NOTIFY = 0,
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/7] vfio: allow external user to get vfio group from device
  2020-02-25  3:35     ` Yan Zhao
@ 2020-03-05 19:01       ` Alex Williamson
  2020-03-06  1:12         ` Yan Zhao
  0 siblings, 1 reply; 17+ messages in thread
From: Alex Williamson @ 2020-03-05 19:01 UTC (permalink / raw)
  To: Yan Zhao
  Cc: zhenyuw, intel-gvt-dev, kvm, linux-kernel, pbonzini, Tian, Kevin, peterx

On Mon, 24 Feb 2020 22:35:42 -0500
Yan Zhao <yan.y.zhao@intel.com> wrote:

> On Tue, Feb 25, 2020 at 03:15:04AM +0800, Alex Williamson wrote:
> > On Mon, 24 Feb 2020 03:46:41 -0500
> > Yan Zhao <yan.y.zhao@intel.com> wrote:
> >   
> > > external user is able to
> > > 1. add a device into an vfio group  
> > 
> > How so?  The device is added via existing mechanisms, the only thing
> > added here is an interface to get a group reference from a struct
> > device.
> >   
> > > 2. call vfio_group_get_external_user_from_dev() with the device pointer
> > > to get vfio_group associated with this device and increments the container
> > > user counter to prevent the VFIO group from disposal before KVM exits.
> > > 3. When the external KVM finishes, it calls vfio_group_put_external_user()
> > > to release the VFIO group.
> > > 
> > > Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> > > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > > ---
> > >  drivers/vfio/vfio.c  | 37 +++++++++++++++++++++++++++++++++++++
> > >  include/linux/vfio.h |  2 ++
> > >  2 files changed, 39 insertions(+)
> > > 
> > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > > index c8482624ca34..914bdf4b9d73 100644
> > > --- a/drivers/vfio/vfio.c
> > > +++ b/drivers/vfio/vfio.c
> > > @@ -1720,6 +1720,43 @@ struct vfio_group *vfio_group_get_external_user(struct file *filep)
> > >  }
> > >  EXPORT_SYMBOL_GPL(vfio_group_get_external_user);
> > >  
> > > +/**
> > > + * External user API, exported by symbols to be linked dynamically.
> > > + *
> > > + * The protocol includes:
> > > + * 1. External user add a device into a vfio group
> > > + *
> > > + * 2. The external user calls vfio_group_get_external_user_from_dev()
> > > + * with the device pointer
> > > + * to verify that:
> > > + *	- there's a vfio group associated with it and is initialized;
> > > + *	- IOMMU is set for the vfio group.
> > > + * If both checks passed, vfio_group_get_external_user_from_dev()
> > > + * increments the container user counter to prevent
> > > + * the VFIO group from disposal before KVM exits.
> > > + *
> > > + * 3. When the external KVM finishes, it calls
> > > + * vfio_group_put_external_user() to release the VFIO group.
> > > + * This call decrements the container user counter.
> > > + */  
> > 
> > I don't think we need to duplicate this whole comment block for a
> > _from_dev() version of the existing vfio_group_get_external_user().
> > Please merge the comments.  
> ok. but I have a question: for an external user, as it already has group
> fd, it can use vfio_group_get_external_user() directly, is there a
> necessity for it to call vfio_group_get_external_user_from_dev() ?
> 
> If an external user wants to call this interface, it needs to first get
> device fd, passes the device fd to kernel and kernel retrieves the pointer
> to struct device, right?

If you have the fd already, then yeah, let's not add a _from_dev()
version, but how would an mdev vendor driver have the fd?  IIRC, the
existing interface is designed this way to allow the user to prove
ownership, whereas using a _from_dev() interface would be for trusted
parts of the kernel, where we can theoretically trust that code isn't
simply locating a device in order to perform malicious actions in the
user (because they'd have more direct ways than this to be malicious to
the user already).

> > > +
> > > +struct vfio_group *vfio_group_get_external_user_from_dev(struct device *dev)
> > > +{
> > > +	struct vfio_group *group;
> > > +	int ret;
> > > +
> > > +	group = vfio_group_get_from_dev(dev);
> > > +	if (!group)
> > > +		return ERR_PTR(-ENODEV);
> > > +
> > > +	ret = vfio_group_add_container_user(group);
> > > +	if (ret)
> > > +		return ERR_PTR(ret);  
> > 
> > Error path leaks group reference.
> >  
> oops, sorry for that.
> 
> > > +
> > > +	return group;
> > > +}
> > > +EXPORT_SYMBOL_GPL(vfio_group_get_external_user_from_dev);
> > > +
> > >  void vfio_group_put_external_user(struct vfio_group *group)
> > >  {
> > >  	vfio_group_try_dissolve_container(group);
> > > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > > index e42a711a2800..2e1fa0c7396f 100644
> > > --- a/include/linux/vfio.h
> > > +++ b/include/linux/vfio.h
> > > @@ -94,6 +94,8 @@ extern void vfio_unregister_iommu_driver(
> > >   */
> > >  extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
> > >  extern void vfio_group_put_external_user(struct vfio_group *group);
> > > +extern
> > > +struct vfio_group *vfio_group_get_external_user_from_dev(struct device *dev);  
> > 
> > Slight cringe at this line wrap, personally would prefer to wrap the
> > args as done repeatedly elsewhere in this file.  Thanks,
> >  
> yeah, I tried to do in that way, but the name of this interface is too long,
> as well as its return type, it passes 80 characters limit even with just one
> arg...
> 
> is it better to wrap in below way?
> extern struct vfio_group *vfio_group_get_external_user_from_dev(struct device
> 								*dev);
> 
> or just a shorter interface name?
> extern struct vfio_group *vfio_group_get_user_from_dev(struct device *dev);

I'd probably tend towards the former, keeping "external" in the name
makes it clear that it belongs to a certain class of functions with
similar conventions.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 1/7] vfio: allow external user to get vfio group from device
  2020-03-05 19:01       ` Alex Williamson
@ 2020-03-06  1:12         ` Yan Zhao
  0 siblings, 0 replies; 17+ messages in thread
From: Yan Zhao @ 2020-03-06  1:12 UTC (permalink / raw)
  To: Alex Williamson
  Cc: zhenyuw, intel-gvt-dev, kvm, linux-kernel, pbonzini, Tian, Kevin, peterx

On Fri, Mar 06, 2020 at 03:01:49AM +0800, Alex Williamson wrote:
> On Mon, 24 Feb 2020 22:35:42 -0500
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > On Tue, Feb 25, 2020 at 03:15:04AM +0800, Alex Williamson wrote:
> > > On Mon, 24 Feb 2020 03:46:41 -0500
> > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > >   
> > > > external user is able to
> > > > 1. add a device into an vfio group  
> > > 
> > > How so?  The device is added via existing mechanisms, the only thing
> > > added here is an interface to get a group reference from a struct
> > > device.
> > >   
> > > > 2. call vfio_group_get_external_user_from_dev() with the device pointer
> > > > to get vfio_group associated with this device and increments the container
> > > > user counter to prevent the VFIO group from disposal before KVM exits.
> > > > 3. When the external KVM finishes, it calls vfio_group_put_external_user()
> > > > to release the VFIO group.
> > > > 
> > > > Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> > > > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > > > ---
> > > >  drivers/vfio/vfio.c  | 37 +++++++++++++++++++++++++++++++++++++
> > > >  include/linux/vfio.h |  2 ++
> > > >  2 files changed, 39 insertions(+)
> > > > 
> > > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > > > index c8482624ca34..914bdf4b9d73 100644
> > > > --- a/drivers/vfio/vfio.c
> > > > +++ b/drivers/vfio/vfio.c
> > > > @@ -1720,6 +1720,43 @@ struct vfio_group *vfio_group_get_external_user(struct file *filep)
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(vfio_group_get_external_user);
> > > >  
> > > > +/**
> > > > + * External user API, exported by symbols to be linked dynamically.
> > > > + *
> > > > + * The protocol includes:
> > > > + * 1. External user add a device into a vfio group
> > > > + *
> > > > + * 2. The external user calls vfio_group_get_external_user_from_dev()
> > > > + * with the device pointer
> > > > + * to verify that:
> > > > + *	- there's a vfio group associated with it and is initialized;
> > > > + *	- IOMMU is set for the vfio group.
> > > > + * If both checks passed, vfio_group_get_external_user_from_dev()
> > > > + * increments the container user counter to prevent
> > > > + * the VFIO group from disposal before KVM exits.
> > > > + *
> > > > + * 3. When the external KVM finishes, it calls
> > > > + * vfio_group_put_external_user() to release the VFIO group.
> > > > + * This call decrements the container user counter.
> > > > + */  
> > > 
> > > I don't think we need to duplicate this whole comment block for a
> > > _from_dev() version of the existing vfio_group_get_external_user().
> > > Please merge the comments.  
> > ok. but I have a question: for an external user, as it already has group
> > fd, it can use vfio_group_get_external_user() directly, is there a
> > necessity for it to call vfio_group_get_external_user_from_dev() ?
> > 
> > If an external user wants to call this interface, it needs to first get
> > device fd, passes the device fd to kernel and kernel retrieves the pointer
> > to struct device, right?
> 
> If you have the fd already, then yeah, let's not add a _from_dev()
> version, but how would an mdev vendor driver have the fd?  IIRC, the
> existing interface is designed this way to allow the user to prove
> ownership, whereas using a _from_dev() interface would be for trusted
> parts of the kernel, where we can theoretically trust that code isn't
> simply locating a device in order to perform malicious actions in the
> user (because they'd have more direct ways than this to be malicious to
> the user already).

ok, thanks for this explanation!

> > > > +
> > > > +struct vfio_group *vfio_group_get_external_user_from_dev(struct device *dev)
> > > > +{
> > > > +	struct vfio_group *group;
> > > > +	int ret;
> > > > +
> > > > +	group = vfio_group_get_from_dev(dev);
> > > > +	if (!group)
> > > > +		return ERR_PTR(-ENODEV);
> > > > +
> > > > +	ret = vfio_group_add_container_user(group);
> > > > +	if (ret)
> > > > +		return ERR_PTR(ret);  
> > > 
> > > Error path leaks group reference.
> > >  
> > oops, sorry for that.
> > 
> > > > +
> > > > +	return group;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(vfio_group_get_external_user_from_dev);
> > > > +
> > > >  void vfio_group_put_external_user(struct vfio_group *group)
> > > >  {
> > > >  	vfio_group_try_dissolve_container(group);
> > > > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > > > index e42a711a2800..2e1fa0c7396f 100644
> > > > --- a/include/linux/vfio.h
> > > > +++ b/include/linux/vfio.h
> > > > @@ -94,6 +94,8 @@ extern void vfio_unregister_iommu_driver(
> > > >   */
> > > >  extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
> > > >  extern void vfio_group_put_external_user(struct vfio_group *group);
> > > > +extern
> > > > +struct vfio_group *vfio_group_get_external_user_from_dev(struct device *dev);  
> > > 
> > > Slight cringe at this line wrap, personally would prefer to wrap the
> > > args as done repeatedly elsewhere in this file.  Thanks,
> > >  
> > yeah, I tried to do in that way, but the name of this interface is too long,
> > as well as its return type, it passes 80 characters limit even with just one
> > arg...
> > 
> > is it better to wrap in below way?
> > extern struct vfio_group *vfio_group_get_external_user_from_dev(struct device
> > 								*dev);
> > 
> > or just a shorter interface name?
> > extern struct vfio_group *vfio_group_get_user_from_dev(struct device *dev);
> 
> I'd probably tend towards the former, keeping "external" in the name
> makes it clear that it belongs to a certain class of functions with
> similar conventions.  Thanks,
> 
Got it!
Thanks!

Yan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/7] vfio: introduce vfio_dma_rw to read/write a range of IOVAs
  2020-02-24  8:47 ` [PATCH v3 2/7] vfio: introduce vfio_dma_rw to read/write a range of IOVAs Yan Zhao
  2020-02-24 19:14   ` Alex Williamson
@ 2020-03-06  1:21   ` Yan Zhao
  2020-03-06 16:27     ` Alex Williamson
  1 sibling, 1 reply; 17+ messages in thread
From: Yan Zhao @ 2020-03-06  1:21 UTC (permalink / raw)
  To: alex.williamson
  Cc: zhenyuw, intel-gvt-dev, kvm, linux-kernel, pbonzini, Tian, Kevin, peterx

On Mon, Feb 24, 2020 at 04:47:15PM +0800, Zhao, Yan Y wrote:
> vfio_dma_rw will read/write a range of user space memory pointed to by
> IOVA into/from a kernel buffer without enforcing pinning the user space
> memory.
> 
> TODO: mark the IOVAs to user space memory dirty if they are written in
> vfio_dma_rw().
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>  drivers/vfio/vfio.c             | 49 +++++++++++++++++++++
>  drivers/vfio/vfio_iommu_type1.c | 77 +++++++++++++++++++++++++++++++++
>  include/linux/vfio.h            |  5 +++
>  3 files changed, 131 insertions(+)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 914bdf4b9d73..902867627cbf 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1998,6 +1998,55 @@ int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn, int npage)
>  }
>  EXPORT_SYMBOL(vfio_unpin_pages);
>  
> +
> +/*
> + * This interface allows the CPUs to perform some sort of virtual DMA on
> + * behalf of the device.
> + *
> + * CPUs read/write a range of IOVAs pointing to user space memory into/from
> + * a kernel buffer.
> + *
> + * As the read/write of user space memory is conducted via the CPUs and is
> + * not a real device DMA, it is not necessary to pin the user space memory.
> + *
> + * The caller needs to call vfio_group_get_external_user() or
> + * vfio_group_get_external_user_from_dev() prior to calling this interface,
> + * so as to prevent the VFIO group from disposal in the middle of the call.
> + * But it can keep the reference to the VFIO group for several calls into
> + * this interface.
> + * After finishing using of the VFIO group, the caller needs to release the
> + * VFIO group by calling vfio_group_put_external_user().
> + *
> + * @group [in]: vfio group of a device
> + * @iova [in] : base IOVA of a user space buffer
> + * @data [in] : pointer to kernel buffer
> + * @len [in]  : kernel buffer length
> + * @write     : indicate read or write
> + * Return error code on failure or 0 on success.
> + */
> +int vfio_dma_rw(struct vfio_group *group, dma_addr_t iova,
> +		void *data, size_t len, bool write)
hi Alex
May I rename this interface to vfio_dma_rw_from_group() that takes
VFIO group as arg and add another interface vfio_dma_rw(struct device *dev...) ?
That might be easier for a driver to use the second one if it does not care about
performance much.

Thanks
Yan

> +{
> +	struct vfio_container *container;
> +	struct vfio_iommu_driver *driver;
> +	int ret = 0;
> +
> +	if (!group || !data || len <= 0)
> +		return -EINVAL;
> +
> +	container = group->container;
> +	driver = container->iommu_driver;
> +
> +	if (likely(driver && driver->ops->dma_rw))
> +		ret = driver->ops->dma_rw(container->iommu_data,
> +					  iova, data, len, write);
> +	else
> +		ret = -ENOTTY;
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(vfio_dma_rw);
> +
>  static int vfio_register_iommu_notifier(struct vfio_group *group,
>  					unsigned long *events,
>  					struct notifier_block *nb)
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 2ada8e6cdb88..74e1c425943c 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -27,6 +27,7 @@
>  #include <linux/iommu.h>
>  #include <linux/module.h>
>  #include <linux/mm.h>
> +#include <linux/mmu_context.h>
>  #include <linux/rbtree.h>
>  #include <linux/sched/signal.h>
>  #include <linux/sched/mm.h>
> @@ -2326,6 +2327,81 @@ static int vfio_iommu_type1_unregister_notifier(void *iommu_data,
>  	return blocking_notifier_chain_unregister(&iommu->notifier, nb);
>  }
>  
> +static size_t vfio_iommu_type1_dma_rw_chunk(struct vfio_iommu *iommu,
> +					    dma_addr_t iova, void *data,
> +					    size_t count, bool write)
> +{
> +	struct mm_struct *mm;
> +	unsigned long vaddr;
> +	struct vfio_dma *dma;
> +	bool kthread = current->mm == NULL;
> +	size_t done = 0;
> +	size_t offset;
> +
> +	dma = vfio_find_dma(iommu, iova, 1);
> +	if (!dma)
> +		return 0;
> +
> +	if ((write && !(dma->prot & IOMMU_WRITE)) ||
> +			!(dma->prot & IOMMU_READ))
> +		return 0;
> +
> +	mm = get_task_mm(dma->task);
> +
> +	if (!mm)
> +		return 0;
> +
> +	if (kthread)
> +		use_mm(mm);
> +	else if (current->mm != mm)
> +		goto out;
> +
> +	offset = iova - dma->iova;
> +
> +	if (count > dma->size - offset)
> +		count = dma->size - offset;
> +
> +	vaddr = dma->vaddr + offset;
> +
> +	if (write)
> +		done = __copy_to_user((void __user *)vaddr,
> +				       data, count) ? 0 : count;
> +	else
> +		done = __copy_from_user(data, (void __user *)vaddr,
> +					count) ? 0 : count;
> +
> +	if (kthread)
> +		unuse_mm(mm);
> +out:
> +	mmput(mm);
> +	return done;
> +}
> +
> +static int vfio_iommu_type1_dma_rw(void *iommu_data, dma_addr_t iova,
> +				   void *data, size_t count, bool write)
> +{
> +	struct vfio_iommu *iommu = iommu_data;
> +	int ret = 0;
> +	size_t done = 0;
> +
> +	mutex_lock(&iommu->lock);
> +	while (count > 0) {
> +		done = vfio_iommu_type1_dma_rw_chunk(iommu, iova, data,
> +						     count, write);
> +		if (!done) {
> +			ret = -EFAULT;
> +			break;
> +		}
> +
> +		count -= done;
> +		data += done;
> +		iova += done;
> +	}
> +
> +	mutex_unlock(&iommu->lock);
> +	return ret;
> +}
> +
>  static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
>  	.name			= "vfio-iommu-type1",
>  	.owner			= THIS_MODULE,
> @@ -2338,6 +2414,7 @@ static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
>  	.unpin_pages		= vfio_iommu_type1_unpin_pages,
>  	.register_notifier	= vfio_iommu_type1_register_notifier,
>  	.unregister_notifier	= vfio_iommu_type1_unregister_notifier,
> +	.dma_rw			= vfio_iommu_type1_dma_rw,
>  };
>  
>  static int __init vfio_iommu_type1_init(void)
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 2e1fa0c7396f..fea0cb1e61d2 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -82,6 +82,8 @@ struct vfio_iommu_driver_ops {
>  					     struct notifier_block *nb);
>  	int		(*unregister_notifier)(void *iommu_data,
>  					       struct notifier_block *nb);
> +	int		(*dma_rw)(void *iommu_data, dma_addr_t iova,
> +				  void *data, size_t count, bool write);
>  };
>  
>  extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
> @@ -109,6 +111,9 @@ extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
>  extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn,
>  			    int npage);
>  
> +extern int vfio_dma_rw(struct vfio_group *group, dma_addr_t iova, void *data,
> +		       size_t len, bool write);
> +
>  /* each type has independent events */
>  enum vfio_notify_type {
>  	VFIO_IOMMU_NOTIFY = 0,
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/7] vfio: introduce vfio_dma_rw to read/write a range of IOVAs
  2020-03-06  1:21   ` Yan Zhao
@ 2020-03-06 16:27     ` Alex Williamson
  2020-03-09  1:00       ` Yan Zhao
  0 siblings, 1 reply; 17+ messages in thread
From: Alex Williamson @ 2020-03-06 16:27 UTC (permalink / raw)
  To: Yan Zhao
  Cc: zhenyuw, intel-gvt-dev, kvm, linux-kernel, pbonzini, Tian, Kevin, peterx

On Thu, 5 Mar 2020 20:21:48 -0500
Yan Zhao <yan.y.zhao@intel.com> wrote:

> On Mon, Feb 24, 2020 at 04:47:15PM +0800, Zhao, Yan Y wrote:
> > vfio_dma_rw will read/write a range of user space memory pointed to by
> > IOVA into/from a kernel buffer without enforcing pinning the user space
> > memory.
> > 
> > TODO: mark the IOVAs to user space memory dirty if they are written in
> > vfio_dma_rw().
> > 
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> >  drivers/vfio/vfio.c             | 49 +++++++++++++++++++++
> >  drivers/vfio/vfio_iommu_type1.c | 77 +++++++++++++++++++++++++++++++++
> >  include/linux/vfio.h            |  5 +++
> >  3 files changed, 131 insertions(+)
> > 
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index 914bdf4b9d73..902867627cbf 100644
> > --- a/drivers/vfio/vfio.c
> > +++ b/drivers/vfio/vfio.c
> > @@ -1998,6 +1998,55 @@ int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn, int npage)
> >  }
> >  EXPORT_SYMBOL(vfio_unpin_pages);
> >  
> > +
> > +/*
> > + * This interface allows the CPUs to perform some sort of virtual DMA on
> > + * behalf of the device.
> > + *
> > + * CPUs read/write a range of IOVAs pointing to user space memory into/from
> > + * a kernel buffer.
> > + *
> > + * As the read/write of user space memory is conducted via the CPUs and is
> > + * not a real device DMA, it is not necessary to pin the user space memory.
> > + *
> > + * The caller needs to call vfio_group_get_external_user() or
> > + * vfio_group_get_external_user_from_dev() prior to calling this interface,
> > + * so as to prevent the VFIO group from disposal in the middle of the call.
> > + * But it can keep the reference to the VFIO group for several calls into
> > + * this interface.
> > + * After finishing using of the VFIO group, the caller needs to release the
> > + * VFIO group by calling vfio_group_put_external_user().
> > + *
> > + * @group [in]: vfio group of a device
> > + * @iova [in] : base IOVA of a user space buffer
> > + * @data [in] : pointer to kernel buffer
> > + * @len [in]  : kernel buffer length
> > + * @write     : indicate read or write
> > + * Return error code on failure or 0 on success.
> > + */
> > +int vfio_dma_rw(struct vfio_group *group, dma_addr_t iova,
> > +		void *data, size_t len, bool write)  
> hi Alex
> May I rename this interface to vfio_dma_rw_from_group() that takes
> VFIO group as arg and add another interface vfio_dma_rw(struct device *dev...) ?
> That might be easier for a driver to use the second one if it does not care about
> performance much.

Perhaps vfio_group_dma_rw() and vfio_dev_dma_rw()?  I'd be reluctant to
add the latter, if a caller doesn't care about performance then they
won't mind making a couple calls to get and release the group reference.
Thanks,

Alex


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v3 2/7] vfio: introduce vfio_dma_rw to read/write a range of IOVAs
  2020-03-06 16:27     ` Alex Williamson
@ 2020-03-09  1:00       ` Yan Zhao
  0 siblings, 0 replies; 17+ messages in thread
From: Yan Zhao @ 2020-03-09  1:00 UTC (permalink / raw)
  To: Alex Williamson
  Cc: zhenyuw, intel-gvt-dev, kvm, linux-kernel, pbonzini, Tian, Kevin, peterx

On Sat, Mar 07, 2020 at 12:27:46AM +0800, Alex Williamson wrote:
> On Thu, 5 Mar 2020 20:21:48 -0500
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > On Mon, Feb 24, 2020 at 04:47:15PM +0800, Zhao, Yan Y wrote:
> > > vfio_dma_rw will read/write a range of user space memory pointed to by
> > > IOVA into/from a kernel buffer without enforcing pinning the user space
> > > memory.
> > > 
> > > TODO: mark the IOVAs to user space memory dirty if they are written in
> > > vfio_dma_rw().
> > > 
> > > Cc: Kevin Tian <kevin.tian@intel.com>
> > > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > > ---
> > >  drivers/vfio/vfio.c             | 49 +++++++++++++++++++++
> > >  drivers/vfio/vfio_iommu_type1.c | 77 +++++++++++++++++++++++++++++++++
> > >  include/linux/vfio.h            |  5 +++
> > >  3 files changed, 131 insertions(+)
> > > 
> > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > > index 914bdf4b9d73..902867627cbf 100644
> > > --- a/drivers/vfio/vfio.c
> > > +++ b/drivers/vfio/vfio.c
> > > @@ -1998,6 +1998,55 @@ int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn, int npage)
> > >  }
> > >  EXPORT_SYMBOL(vfio_unpin_pages);
> > >  
> > > +
> > > +/*
> > > + * This interface allows the CPUs to perform some sort of virtual DMA on
> > > + * behalf of the device.
> > > + *
> > > + * CPUs read/write a range of IOVAs pointing to user space memory into/from
> > > + * a kernel buffer.
> > > + *
> > > + * As the read/write of user space memory is conducted via the CPUs and is
> > > + * not a real device DMA, it is not necessary to pin the user space memory.
> > > + *
> > > + * The caller needs to call vfio_group_get_external_user() or
> > > + * vfio_group_get_external_user_from_dev() prior to calling this interface,
> > > + * so as to prevent the VFIO group from disposal in the middle of the call.
> > > + * But it can keep the reference to the VFIO group for several calls into
> > > + * this interface.
> > > + * After finishing using of the VFIO group, the caller needs to release the
> > > + * VFIO group by calling vfio_group_put_external_user().
> > > + *
> > > + * @group [in]: vfio group of a device
> > > + * @iova [in] : base IOVA of a user space buffer
> > > + * @data [in] : pointer to kernel buffer
> > > + * @len [in]  : kernel buffer length
> > > + * @write     : indicate read or write
> > > + * Return error code on failure or 0 on success.
> > > + */
> > > +int vfio_dma_rw(struct vfio_group *group, dma_addr_t iova,
> > > +		void *data, size_t len, bool write)  
> > hi Alex
> > May I rename this interface to vfio_dma_rw_from_group() that takes
> > VFIO group as arg and add another interface vfio_dma_rw(struct device *dev...) ?
> > That might be easier for a driver to use the second one if it does not care about
> > performance much.
> 
> Perhaps vfio_group_dma_rw() and vfio_dev_dma_rw()?  I'd be reluctant to
> add the latter, if a caller doesn't care about performance then they
> won't mind making a couple calls to get and release the group reference.
> Thanks,
>
yes, it makes sense. Then I withdraw this request :)

Thanks
Yan

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-03-09  1:10 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-24  8:43 [PATCH v3 0/7] use vfio_dma_rw to read/write IOVAs from CPU side Yan Zhao
2020-02-24  8:46 ` [PATCH v3 1/7] vfio: allow external user to get vfio group from device Yan Zhao
2020-02-24 19:15   ` Alex Williamson
2020-02-25  3:35     ` Yan Zhao
2020-03-05 19:01       ` Alex Williamson
2020-03-06  1:12         ` Yan Zhao
2020-02-24  8:47 ` [PATCH v3 2/7] vfio: introduce vfio_dma_rw to read/write a range of IOVAs Yan Zhao
2020-02-24 19:14   ` Alex Williamson
2020-02-25  3:44     ` Yan Zhao
2020-03-06  1:21   ` Yan Zhao
2020-03-06 16:27     ` Alex Williamson
2020-03-09  1:00       ` Yan Zhao
2020-02-24  8:47 ` [PATCH v3 3/7] vfio: avoid inefficient lookup of VFIO group in vfio_pin/unpin_pages Yan Zhao
2020-02-24  8:47 ` [PATCH v3 4/7] drm/i915/gvt: hold reference of VFIO group during opening of vgpu Yan Zhao
2020-02-24  8:48 ` [PATCH v3 5/7] drm/i915/gvt: subsitute kvm_read/write_guest with vfio_dma_rw Yan Zhao
2020-02-24  8:48 ` [PATCH v3 6/7] drm/i915/gvt: avoid unnecessary lookup in each vfio pin & unpin pages Yan Zhao
2020-02-24  8:48 ` [PATCH v3 7/7] drm/i915/gvt: rw more pages a time for shadow context Yan Zhao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.