linux-kernel.vger.kernel.org archive mirror
* [PATCH v4 0/7] use vfio_dma_rw to read/write IOVAs from CPU side
@ 2020-03-13  3:05 Yan Zhao
  2020-03-13  3:07 ` [PATCH v4 1/7] vfio: allow external user to get vfio group from device Yan Zhao
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Yan Zhao @ 2020-03-13  3:05 UTC (permalink / raw)
  To: intel-gvt-dev, kvm, linux-kernel
  Cc: alex.williamson, zhenyuw, pbonzini, kevin.tian, peterx, Yan Zhao

It is better for a device model to use IOVAs to read/write memory to
perform some sort of virtual DMA on behalf of the device.

patch 1 exports the VFIO group to an external user so that it can hold the
group reference until it has finished using the group. This saves ~500 cycles
that would otherwise be spent on VFIO group lookup, referencing and
dereferencing. (this data is measured with 1 VFIO user)

patch 2 introduces interface vfio_dma_rw().

patch 3 introduces interfaces vfio_group_pin_pages() and
vfio_group_unpin_pages() to get rid of the VFIO group lookup in
vfio_pin_pages() and vfio_unpin_pages().

patches 4-5 let kvmgt switch from calling kvm_read/write_guest() to calling
vfio_dma_rw() to read/write IOVAs.

patch 6 lets kvmgt switch to the lighter versions of vfio_pin/unpin_pages(),
i.e. vfio_group_pin/unpin_pages().

patch 7 enables kvmgt to read/write IOVAs of size larger than PAGE_SIZE.

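For reference, a minimal usage sketch of the interfaces added by patches 1-3,
as seen from a vendor driver (the mdev device, IOVA and buffer below are only
placeholders, and error handling is abbreviated):

	struct vfio_group *group;
	char buf[256];
	int ret;

	/* look up and reference the VFIO group once, e.g. at open (patch 1) */
	group = vfio_group_get_external_user_from_dev(mdev_dev(mdev));
	if (IS_ERR_OR_NULL(group))
		return !group ? -EFAULT : PTR_ERR(group);

	/* CPU-side "virtual DMA": read 256 bytes at IOVA 0x1000 (patch 2) */
	ret = vfio_dma_rw(group, 0x1000, buf, sizeof(buf), false);

	/* drop the group reference when done, e.g. at release */
	vfio_group_put_external_user(group);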

Performance:

Comparison between vfio_dma_rw() and kvm_read/write_guest():

1. average CPU cycles of each interface measured with 1 running VM:
 --------------------------------------------------
|  rw       |          avg cycles of               |
|  size     | (vfio_dma_rw - kvm_read/write_guest) |
|---------- ---------------------------------------|
| <= 1 page |            +155 ~ +195               |        
|--------------------------------------------------|
| 5 pages   |                -530                  |
|--------------------------------------------------|
| 20 pages  |           -2005 ~ -2533              |
 --------------------------------------------------

2. average scores

base: base code before applying this series; it uses kvm_read/write_guest()
to read/write IOVAs.

base + this series: uses vfio_dma_rw() to read/write IOVAs and
vfio_group_pin/unpin_pages(), and kvmgt is able to read/write several pages
at a time.

Scores of benchmarks running in 1 VM each:
 -----------------------------------------------------------------
|                    | glmark2 | lightsmark | openarena | heavens |
|-----------------------------------------------------------------|
|       base         |  1248   |  219.70    |  114.9    |   560   |
|-----------------------------------------------------------------|
|base + this series  |  1248   |  225.8     |   115     |   559   |
 -----------------------------------------------------------------

Sum of scores of two benchmark instances running in 2 VMs each:
 -------------------------------------------------------
|                    | glmark2 | lightsmark | openarena |
|-------------------------------------------------------|
|       base         |  812    |  211.46    |  115.3    |
|-------------------------------------------------------|
|base + this series  |  814    |  214.69    |  115.9    |
 -------------------------------------------------------


Changelogs:
v3 --> v4:
- rebased to 5.6.0-rc4+
- adjust wrap position for vfio_group_get_external_user_from_dev() in
  header file. (Alex)
- updated function description of vfio_group_get_external_user_from_dev()
  (Alex)
- fixed error path group reference leak in
  vfio_group_get_external_user_from_dev() (Alex)
- return 0 for success or errno in vfio_dma_rw_chunk(). (Alex)
- renamed iova to user_iova in interface vfio_dma_rw().
- renamed vfio_pin_pages_from_group() and vfio_unpin_pages_from_group() to
  vfio_group_pin_pages() and vfio_group_unpin_pages()
- renamed user_pfn to user_iova_pfn in vfio_group_pin_pages() and
  vfio_group_unpin_pages()

v2 --> v3:
- add vfio_group_get_external_user_from_dev() to improve performance (Alex)
- add vfio_pin/unpin_pages_from_group() to avoid repeated looking up of
  VFIO group in vfio_pin/unpin_pages() (Alex)
- add a check for IOMMU_READ permission. (Alex)
- rename vfio_iommu_type1_rw_dma_nopin() to
  vfio_iommu_type1_dma_rw_chunk(). (Alex)
- in kvmgt, change "write ? vfio_dma_rw(...,true) :
  vfio_dma_rw(...,false)" to vfio_dma_rw(dev, gpa, buf, len, write)
  (Alex and Paolo)
- in kvmgt, instead of read/write context pages 1:1, combining the
  reads/writes of continuous IOVAs to take advantage of vfio_dma_rw() for
  faster crossing page boundary accesses.

v1 --> v2:
- rename vfio_iova_rw to vfio_dma_rw, vfio iommu driver ops .iova_rw
to .dma_rw. (Alex).
- change iova and len from unsigned long to dma_addr_t and size_t,
respectively. (Alex)
- fix possible overflow in dma->vaddr + iova - dma->iova + offset (Alex)
- split DMAs from on page boundary to on max available size to eliminate
  redundant searching of vfio_dma and switching mm. (Alex)
- add a check for IOMMU_WRITE permission.


Yan Zhao (7):
  vfio: allow external user to get vfio group from device
  vfio: introduce vfio_dma_rw to read/write a range of IOVAs
  vfio: avoid inefficient operations on VFIO group in
    vfio_pin/unpin_pages
  drm/i915/gvt: hold reference of VFIO group during opening of vgpu
  drm/i915/gvt: subsitute kvm_read/write_guest with vfio_dma_rw
  drm/i915/gvt: switch to user vfio_group_pin/upin_pages
  drm/i915/gvt: rw more pages a time for shadow context

 drivers/gpu/drm/i915/gvt/kvmgt.c     |  46 ++++---
 drivers/gpu/drm/i915/gvt/scheduler.c |  97 ++++++++++-----
 drivers/vfio/vfio.c                  | 180 +++++++++++++++++++++++++++
 drivers/vfio/vfio_iommu_type1.c      |  76 +++++++++++
 include/linux/vfio.h                 |  13 ++
 5 files changed, 360 insertions(+), 52 deletions(-)

-- 
2.17.1



* [PATCH v4 1/7] vfio: allow external user to get vfio group from device
  2020-03-13  3:05 [PATCH v4 0/7] use vfio_dma_rw to read/write IOVAs from CPU side Yan Zhao
@ 2020-03-13  3:07 ` Yan Zhao
  2020-03-13  3:09 ` [PATCH v4 2/7] vfio: introduce vfio_dma_rw to read/write a range of IOVAs Yan Zhao
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Yan Zhao @ 2020-03-13  3:07 UTC (permalink / raw)
  To: intel-gvt-dev, kvm, linux-kernel
  Cc: alex.williamson, zhenyuw, pbonzini, kevin.tian, peterx, Yan Zhao

An external user calls vfio_group_get_external_user_from_dev() with a device
pointer to get the VFIO group associated with this device.
The VFIO group is checked to be viable and to have an IOMMU set. Then the
container user counter is increased and a VFIO group reference is held to
prevent the VFIO group from being disposed of before the external user exits.

When the external user finishes using the VFIO group, it calls
vfio_group_put_external_user() to release the VFIO group reference and
decrement the container user counter.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/vfio/vfio.c  | 38 ++++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h |  2 ++
 2 files changed, 40 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index c8482624ca34..97b972bfb735 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1720,6 +1720,44 @@ struct vfio_group *vfio_group_get_external_user(struct file *filep)
 }
 EXPORT_SYMBOL_GPL(vfio_group_get_external_user);
 
+/**
+ * External user API, exported by symbols to be linked dynamically.
+ * The external user passes in a device pointer
+ * to verify that:
+ *	- A VFIO group is associated with the device;
+ *	- IOMMU is set for the group.
+ * If both checks passed, vfio_group_get_external_user_from_dev()
+ * increments the container user counter to prevent the VFIO group
+ * from disposal before external user exits and returns the pointer
+ * to the VFIO group.
+ *
+ * When the external user finishes using the VFIO group, it calls
+ * vfio_group_put_external_user() to release the VFIO group and
+ * decrement the container user counter.
+ *
+ * @dev [in]	: device
+ * Return error PTR or pointer to VFIO group.
+ */
+
+struct vfio_group *vfio_group_get_external_user_from_dev(struct device *dev)
+{
+	struct vfio_group *group;
+	int ret;
+
+	group = vfio_group_get_from_dev(dev);
+	if (!group)
+		return ERR_PTR(-ENODEV);
+
+	ret = vfio_group_add_container_user(group);
+	if (ret) {
+		vfio_group_put(group);
+		return ERR_PTR(ret);
+	}
+
+	return group;
+}
+EXPORT_SYMBOL_GPL(vfio_group_get_external_user_from_dev);
+
 void vfio_group_put_external_user(struct vfio_group *group)
 {
 	vfio_group_try_dissolve_container(group);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e42a711a2800..fb71e0ac0e76 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -94,6 +94,8 @@ extern void vfio_unregister_iommu_driver(
  */
 extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
 extern void vfio_group_put_external_user(struct vfio_group *group);
+extern struct vfio_group *vfio_group_get_external_user_from_dev(struct device
+								*dev);
 extern bool vfio_external_group_match_file(struct vfio_group *group,
 					   struct file *filep);
 extern int vfio_external_user_iommu_id(struct vfio_group *group);
-- 
2.17.1



* [PATCH v4 2/7] vfio: introduce vfio_dma_rw to read/write a range of IOVAs
  2020-03-13  3:05 [PATCH v4 0/7] use vfio_dma_rw to read/write IOVAs from CPU side Yan Zhao
  2020-03-13  3:07 ` [PATCH v4 1/7] vfio: allow external user to get vfio group from device Yan Zhao
@ 2020-03-13  3:09 ` Yan Zhao
  2020-04-05 16:17   ` Kees Cook
  2020-03-13  3:09 ` [PATCH v4 3/7] vfio: avoid inefficient operations on VFIO group in vfio_pin/unpin_pages Yan Zhao
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Yan Zhao @ 2020-03-13  3:09 UTC (permalink / raw)
  To: intel-gvt-dev, kvm, linux-kernel
  Cc: alex.williamson, zhenyuw, pbonzini, kevin.tian, peterx, Yan Zhao

vfio_dma_rw() reads/writes a range of user space memory, addressed by IOVA,
into/from a kernel buffer without requiring the user space memory to be
pinned.

TODO: mark the IOVAs to user space memory dirty if they are written in
vfio_dma_rw().

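Purely as an illustration of the write direction (group is assumed to have
been obtained as in patch 1, and status_iova is a made-up guest IOVA), a
device model could push data back to guest memory with the same call; no page
pinning is involved since the copy is performed by the CPU:

	u32 status = 0;		/* hypothetical completion status */
	int ret;

	/* write == true copies from the kernel buffer into the guest IOVA */
	ret = vfio_dma_rw(group, status_iova, &status, sizeof(status), true);
	if (ret)
		pr_err("vfio_dma_rw failed: %d\n", ret);
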
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/vfio/vfio.c             | 49 +++++++++++++++++++++
 drivers/vfio/vfio_iommu_type1.c | 76 +++++++++++++++++++++++++++++++++
 include/linux/vfio.h            |  5 +++
 3 files changed, 130 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 97b972bfb735..6997f711b925 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1999,6 +1999,55 @@ int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn, int npage)
 }
 EXPORT_SYMBOL(vfio_unpin_pages);
 
+
+/*
+ * This interface allows the CPUs to perform some sort of virtual DMA on
+ * behalf of the device.
+ *
+ * CPUs read/write from/into a range of IOVAs pointing to user space memory
+ * into/from a kernel buffer.
+ *
+ * As the read/write of user space memory is conducted via the CPUs and is
+ * not a real device DMA, it is not necessary to pin the user space memory.
+ *
+ * The caller needs to call vfio_group_get_external_user() or
+ * vfio_group_get_external_user_from_dev() prior to calling this interface,
+ * so as to prevent the VFIO group from disposal in the middle of the call.
+ * But it can keep the reference to the VFIO group for several calls into
+ * this interface.
+ * After finishing using the VFIO group, the caller needs to release the
+ * VFIO group by calling vfio_group_put_external_user().
+ *
+ * @group [in]		: VFIO group
+ * @user_iova [in]	: base IOVA of a user space buffer
+ * @data [in]		: pointer to kernel buffer
+ * @len [in]		: kernel buffer length
+ * @write		: indicate read or write
+ * Return error code on failure or 0 on success.
+ */
+int vfio_dma_rw(struct vfio_group *group, dma_addr_t user_iova,
+		void *data, size_t len, bool write)
+{
+	struct vfio_container *container;
+	struct vfio_iommu_driver *driver;
+	int ret = 0;
+
+	if (!group || !data || len <= 0)
+		return -EINVAL;
+
+	container = group->container;
+	driver = container->iommu_driver;
+
+	if (likely(driver && driver->ops->dma_rw))
+		ret = driver->ops->dma_rw(container->iommu_data,
+					  user_iova, data, len, write);
+	else
+		ret = -ENOTTY;
+
+	return ret;
+}
+EXPORT_SYMBOL(vfio_dma_rw);
+
 static int vfio_register_iommu_notifier(struct vfio_group *group,
 					unsigned long *events,
 					struct notifier_block *nb)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index a177bf2c6683..9fdfae1cb17a 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -27,6 +27,7 @@
 #include <linux/iommu.h>
 #include <linux/module.h>
 #include <linux/mm.h>
+#include <linux/mmu_context.h>
 #include <linux/rbtree.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/mm.h>
@@ -2305,6 +2306,80 @@ static int vfio_iommu_type1_unregister_notifier(void *iommu_data,
 	return blocking_notifier_chain_unregister(&iommu->notifier, nb);
 }
 
+static int vfio_iommu_type1_dma_rw_chunk(struct vfio_iommu *iommu,
+					 dma_addr_t user_iova, void *data,
+					 size_t count, bool write,
+					 size_t *copied)
+{
+	struct mm_struct *mm;
+	unsigned long vaddr;
+	struct vfio_dma *dma;
+	bool kthread = current->mm == NULL;
+	size_t offset;
+
+	*copied = 0;
+
+	dma = vfio_find_dma(iommu, user_iova, 1);
+	if (!dma)
+		return -EINVAL;
+
+	if ((write && !(dma->prot & IOMMU_WRITE)) ||
+			!(dma->prot & IOMMU_READ))
+		return -EPERM;
+
+	mm = get_task_mm(dma->task);
+
+	if (!mm)
+		return -EPERM;
+
+	if (kthread)
+		use_mm(mm);
+	else if (current->mm != mm)
+		goto out;
+
+	offset = user_iova - dma->iova;
+
+	if (count > dma->size - offset)
+		count = dma->size - offset;
+
+	vaddr = dma->vaddr + offset;
+
+	if (write)
+		*copied = __copy_to_user((void __user *)vaddr, data,
+					 count) ? 0 : count;
+	else
+		*copied = __copy_from_user(data, (void __user *)vaddr,
+					   count) ? 0 : count;
+	if (kthread)
+		unuse_mm(mm);
+out:
+	mmput(mm);
+	return *copied ? 0 : -EFAULT;
+}
+
+static int vfio_iommu_type1_dma_rw(void *iommu_data, dma_addr_t user_iova,
+				   void *data, size_t count, bool write)
+{
+	struct vfio_iommu *iommu = iommu_data;
+	int ret = 0;
+	size_t done;
+
+	mutex_lock(&iommu->lock);
+	while (count > 0) {
+		ret = vfio_iommu_type1_dma_rw_chunk(iommu, user_iova, data,
+						    count, write, &done);
+		if (ret)
+			break;
+
+		count -= done;
+		data += done;
+		user_iova += done;
+	}
+
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
 static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
 	.name			= "vfio-iommu-type1",
 	.owner			= THIS_MODULE,
@@ -2317,6 +2392,7 @@ static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
 	.unpin_pages		= vfio_iommu_type1_unpin_pages,
 	.register_notifier	= vfio_iommu_type1_register_notifier,
 	.unregister_notifier	= vfio_iommu_type1_unregister_notifier,
+	.dma_rw			= vfio_iommu_type1_dma_rw,
 };
 
 static int __init vfio_iommu_type1_init(void)
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index fb71e0ac0e76..34b2fdf4de6e 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -82,6 +82,8 @@ struct vfio_iommu_driver_ops {
 					     struct notifier_block *nb);
 	int		(*unregister_notifier)(void *iommu_data,
 					       struct notifier_block *nb);
+	int		(*dma_rw)(void *iommu_data, dma_addr_t user_iova,
+				  void *data, size_t count, bool write);
 };
 
 extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
@@ -109,6 +111,9 @@ extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
 extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn,
 			    int npage);
 
+extern int vfio_dma_rw(struct vfio_group *group, dma_addr_t user_iova,
+		       void *data, size_t len, bool write);
+
 /* each type has independent events */
 enum vfio_notify_type {
 	VFIO_IOMMU_NOTIFY = 0,
-- 
2.17.1



* [PATCH v4 3/7] vfio: avoid inefficient operations on VFIO group in vfio_pin/unpin_pages
  2020-03-13  3:05 [PATCH v4 0/7] use vfio_dma_rw to read/write IOVAs from CPU side Yan Zhao
  2020-03-13  3:07 ` [PATCH v4 1/7] vfio: allow external user to get vfio group from device Yan Zhao
  2020-03-13  3:09 ` [PATCH v4 2/7] vfio: introduce vfio_dma_rw to read/write a range of IOVAs Yan Zhao
@ 2020-03-13  3:09 ` Yan Zhao
  2020-03-13  3:10 ` [PATCH v4 4/7] drm/i915/gvt: hold reference of VFIO group during opening of vgpu Yan Zhao
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Yan Zhao @ 2020-03-13  3:09 UTC (permalink / raw)
  To: intel-gvt-dev, kvm, linux-kernel
  Cc: alex.williamson, zhenyuw, pbonzini, kevin.tian, peterx, Yan Zhao

vfio_group_pin_pages() and vfio_group_unpin_pages() are introduced to avoid
the inefficient search/check/ref/deref operations on the VFIO group that are
performed on each call into vfio_pin_pages() and vfio_unpin_pages().

The VFIO group is taken as an argument directly. Callers perform the
search/check/ref/deref operations on the VFIO group once, by calling
vfio_group_get_external_user()/vfio_group_get_external_user_from_dev()
beforehand and vfio_group_put_external_user() afterwards.

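A rough sketch of the intended calling pattern (dev, the gfn value and the
single-page pin below are placeholders for illustration only):

	struct vfio_group *group;
	unsigned long gfn = 0x1000, pfn;
	int ret;

	group = vfio_group_get_external_user_from_dev(dev);
	if (IS_ERR_OR_NULL(group))
		return !group ? -EFAULT : PTR_ERR(group);

	/* no group lookup/check/ref/deref inside each pin/unpin call */
	ret = vfio_group_pin_pages(group, &gfn, 1,
				   IOMMU_READ | IOMMU_WRITE, &pfn);
	if (ret == 1)
		vfio_group_unpin_pages(group, &gfn, 1);

	vfio_group_put_external_user(group);
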
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/vfio/vfio.c  | 91 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h |  6 +++
 2 files changed, 97 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 6997f711b925..210fcf426643 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1999,6 +1999,97 @@ int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn, int npage)
 }
 EXPORT_SYMBOL(vfio_unpin_pages);
 
+/*
+ * Pin a set of guest IOVA PFNs and return their associated host PFNs for a
+ * VFIO group.
+ *
+ * The caller needs to call vfio_group_get_external_user() or
+ * vfio_group_get_external_user_from_dev() prior to calling this interface,
+ * so as to prevent the VFIO group from disposal in the middle of the call.
+ * But it can keep the reference to the VFIO group for several calls into
+ * this interface.
+ * After finishing using the VFIO group, the caller needs to release the
+ * VFIO group by calling vfio_group_put_external_user().
+ *
+ * @group [in]		: VFIO group
+ * @user_iova_pfn [in]	: array of user/guest IOVA PFNs to be pinned.
+ * @npage [in]		: count of elements in user_iova_pfn array.
+ *			  This count should not be greater than
+ *			  VFIO_PIN_PAGES_MAX_ENTRIES.
+ * @prot [in]		: protection flags
+ * @phys_pfn [out]	: array of host PFNs
+ * Return error or number of pages pinned.
+ */
+int vfio_group_pin_pages(struct vfio_group *group,
+			 unsigned long *user_iova_pfn, int npage,
+			 int prot, unsigned long *phys_pfn)
+{
+	struct vfio_container *container;
+	struct vfio_iommu_driver *driver;
+	int ret;
+
+	if (!group || !user_iova_pfn || !phys_pfn || !npage)
+		return -EINVAL;
+
+	if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
+		return -E2BIG;
+
+	container = group->container;
+	driver = container->iommu_driver;
+	if (likely(driver && driver->ops->pin_pages))
+		ret = driver->ops->pin_pages(container->iommu_data,
+					     user_iova_pfn, npage,
+					     prot, phys_pfn);
+	else
+		ret = -ENOTTY;
+
+	return ret;
+}
+EXPORT_SYMBOL(vfio_group_pin_pages);
+
+/*
+ * Unpin a set of guest IOVA PFNs for a VFIO group.
+ *
+ * The caller needs to call vfio_group_get_external_user() or
+ * vfio_group_get_external_user_from_dev() prior to calling this interface,
+ * so as to prevent the VFIO group from disposal in the middle of the call.
+ * But it can keep the reference to the VFIO group for several calls into
+ * this interface.
+ * After finishing using the VFIO group, the caller needs to release the
+ * VFIO group by calling vfio_group_put_external_user().
+ *
+ * @group [in]		: vfio group
+ * @user_iova_pfn [in]	: array of user/guest IOVA PFNs to be unpinned.
+ * @npage [in]		: count of elements in user_iova_pfn array.
+ *			  This count should not be greater than
+ *			  VFIO_PIN_PAGES_MAX_ENTRIES.
+ * Return error or number of pages unpinned.
+ */
+int vfio_group_unpin_pages(struct vfio_group *group,
+			   unsigned long *user_iova_pfn, int npage)
+{
+	struct vfio_container *container;
+	struct vfio_iommu_driver *driver;
+	int ret;
+
+	if (!group || !user_iova_pfn || !npage)
+		return -EINVAL;
+
+	if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
+		return -E2BIG;
+
+	container = group->container;
+	driver = container->iommu_driver;
+	if (likely(driver && driver->ops->unpin_pages))
+		ret = driver->ops->unpin_pages(container->iommu_data,
+					       user_iova_pfn, npage);
+	else
+		ret = -ENOTTY;
+
+	return ret;
+}
+EXPORT_SYMBOL(vfio_group_unpin_pages);
+
 
 /*
  * This interface allows the CPUs to perform some sort of virtual DMA on
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 34b2fdf4de6e..be2bd358b952 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -111,6 +111,12 @@ extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
 extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn,
 			    int npage);
 
+extern int vfio_group_pin_pages(struct vfio_group *group,
+				unsigned long *user_iova_pfn, int npage,
+				int prot, unsigned long *phys_pfn);
+extern int vfio_group_unpin_pages(struct vfio_group *group,
+				  unsigned long *user_iova_pfn, int npage);
+
 extern int vfio_dma_rw(struct vfio_group *group, dma_addr_t user_iova,
 		       void *data, size_t len, bool write);
 
-- 
2.17.1



* [PATCH v4 4/7] drm/i915/gvt: hold reference of VFIO group during opening of vgpu
  2020-03-13  3:05 [PATCH v4 0/7] use vfio_dma_rw to read/write IOVAs from CPU side Yan Zhao
                   ` (2 preceding siblings ...)
  2020-03-13  3:09 ` [PATCH v4 3/7] vfio: avoid inefficient operations on VFIO group in vfio_pin/unpin_pages Yan Zhao
@ 2020-03-13  3:10 ` Yan Zhao
  2020-03-13  3:11 ` [PATCH v4 5/7] drm/i915/gvt: subsitute kvm_read/write_guest with vfio_dma_rw Yan Zhao
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Yan Zhao @ 2020-03-13  3:10 UTC (permalink / raw)
  To: intel-gvt-dev, kvm, linux-kernel
  Cc: alex.williamson, zhenyuw, pbonzini, kevin.tian, peterx, Yan Zhao

Hold a reference to the VFIO group for each vgpu when the vgpu is opened and
release the reference when the vgpu is released.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 074c4efb58eb..811cee28ae06 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -131,6 +131,7 @@ struct kvmgt_vdev {
 	struct work_struct release_work;
 	atomic_t released;
 	struct vfio_device *vfio_device;
+	struct vfio_group *vfio_group;
 };
 
 static inline struct kvmgt_vdev *kvmgt_vdev(struct intel_vgpu *vgpu)
@@ -792,6 +793,7 @@ static int intel_vgpu_open(struct mdev_device *mdev)
 	struct kvmgt_vdev *vdev = kvmgt_vdev(vgpu);
 	unsigned long events;
 	int ret;
+	struct vfio_group *vfio_group;
 
 	vdev->iommu_notifier.notifier_call = intel_vgpu_iommu_notifier;
 	vdev->group_notifier.notifier_call = intel_vgpu_group_notifier;
@@ -814,6 +816,14 @@ static int intel_vgpu_open(struct mdev_device *mdev)
 		goto undo_iommu;
 	}
 
+	vfio_group = vfio_group_get_external_user_from_dev(mdev_dev(mdev));
+	if (IS_ERR_OR_NULL(vfio_group)) {
+		ret = !vfio_group ? -EFAULT : PTR_ERR(vfio_group);
+		gvt_vgpu_err("vfio_group_get_external_user_from_dev failed\n");
+		goto undo_register;
+	}
+	vdev->vfio_group = vfio_group;
+
 	/* Take a module reference as mdev core doesn't take
 	 * a reference for vendor driver.
 	 */
@@ -830,6 +840,10 @@ static int intel_vgpu_open(struct mdev_device *mdev)
 	return ret;
 
 undo_group:
+	vfio_group_put_external_user(vdev->vfio_group);
+	vdev->vfio_group = NULL;
+
+undo_register:
 	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
 					&vdev->group_notifier);
 
@@ -884,6 +898,7 @@ static void __intel_vgpu_release(struct intel_vgpu *vgpu)
 	kvmgt_guest_exit(info);
 
 	intel_vgpu_release_msi_eventfd_ctx(vgpu);
+	vfio_group_put_external_user(vdev->vfio_group);
 
 	vdev->kvm = NULL;
 	vgpu->handle = 0;
-- 
2.17.1



* [PATCH v4 5/7] drm/i915/gvt: subsitute kvm_read/write_guest with vfio_dma_rw
  2020-03-13  3:05 [PATCH v4 0/7] use vfio_dma_rw to read/write IOVAs from CPU side Yan Zhao
                   ` (3 preceding siblings ...)
  2020-03-13  3:10 ` [PATCH v4 4/7] drm/i915/gvt: hold reference of VFIO group during opening of vgpu Yan Zhao
@ 2020-03-13  3:11 ` Yan Zhao
  2020-03-13  3:11 ` [PATCH v4 6/7] drm/i915/gvt: switch to user vfio_group_pin/upin_pages Yan Zhao
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Yan Zhao @ 2020-03-13  3:11 UTC (permalink / raw)
  To: intel-gvt-dev, kvm, linux-kernel
  Cc: alex.williamson, zhenyuw, pbonzini, kevin.tian, peterx, Yan Zhao

As a device model, it is better to read/write guest memory through the vfio
interface, so that vfio is able to maintain dirty page info for device IOVAs.

Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c | 23 ++---------------------
 1 file changed, 2 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 811cee28ae06..cee7376ba39d 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -2050,33 +2050,14 @@ static int kvmgt_rw_gpa(unsigned long handle, unsigned long gpa,
 			void *buf, unsigned long len, bool write)
 {
 	struct kvmgt_guest_info *info;
-	struct kvm *kvm;
-	int idx, ret;
-	bool kthread = current->mm == NULL;
 
 	if (!handle_valid(handle))
 		return -ESRCH;
 
 	info = (struct kvmgt_guest_info *)handle;
-	kvm = info->kvm;
-
-	if (kthread) {
-		if (!mmget_not_zero(kvm->mm))
-			return -EFAULT;
-		use_mm(kvm->mm);
-	}
 
-	idx = srcu_read_lock(&kvm->srcu);
-	ret = write ? kvm_write_guest(kvm, gpa, buf, len) :
-		      kvm_read_guest(kvm, gpa, buf, len);
-	srcu_read_unlock(&kvm->srcu, idx);
-
-	if (kthread) {
-		unuse_mm(kvm->mm);
-		mmput(kvm->mm);
-	}
-
-	return ret;
+	return vfio_dma_rw(kvmgt_vdev(info->vgpu)->vfio_group,
+			   gpa, buf, len, write);
 }
 
 static int kvmgt_read_gpa(unsigned long handle, unsigned long gpa,
-- 
2.17.1



* [PATCH v4 6/7] drm/i915/gvt: switch to user vfio_group_pin/upin_pages
  2020-03-13  3:05 [PATCH v4 0/7] use vfio_dma_rw to read/write IOVAs from CPU side Yan Zhao
                   ` (4 preceding siblings ...)
  2020-03-13  3:11 ` [PATCH v4 5/7] drm/i915/gvt: subsitute kvm_read/write_guest with vfio_dma_rw Yan Zhao
@ 2020-03-13  3:11 ` Yan Zhao
  2020-03-13  3:12 ` [PATCH v4 7/7] drm/i915/gvt: rw more pages a time for shadow context Yan Zhao
  2020-03-13 22:29 ` [PATCH v4 0/7] use vfio_dma_rw to read/write IOVAs from CPU side Alex Williamson
  7 siblings, 0 replies; 14+ messages in thread
From: Yan Zhao @ 2020-03-13  3:11 UTC (permalink / raw)
  To: intel-gvt-dev, kvm, linux-kernel
  Cc: alex.williamson, zhenyuw, pbonzini, kevin.tian, peterx, Yan Zhao

Substitute vfio_pin_pages() and vfio_unpin_pages() with
vfio_group_pin_pages() and vfio_group_unpin_pages(), so that each call no
longer goes through the lookup, check, referencing and dereferencing of the
VFIO group.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index cee7376ba39d..eee530453aa6 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -152,6 +152,7 @@ static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
 		unsigned long size)
 {
 	struct drm_i915_private *i915 = vgpu->gvt->gt->i915;
+	struct kvmgt_vdev *vdev = kvmgt_vdev(vgpu);
 	int total_pages;
 	int npage;
 	int ret;
@@ -161,7 +162,7 @@ static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
 	for (npage = 0; npage < total_pages; npage++) {
 		unsigned long cur_gfn = gfn + npage;
 
-		ret = vfio_unpin_pages(mdev_dev(kvmgt_vdev(vgpu)->mdev), &cur_gfn, 1);
+		ret = vfio_group_unpin_pages(vdev->vfio_group, &cur_gfn, 1);
 		drm_WARN_ON(&i915->drm, ret != 1);
 	}
 }
@@ -170,6 +171,7 @@ static void gvt_unpin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
 static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
 		unsigned long size, struct page **page)
 {
+	struct kvmgt_vdev *vdev = kvmgt_vdev(vgpu);
 	unsigned long base_pfn = 0;
 	int total_pages;
 	int npage;
@@ -184,8 +186,8 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
 		unsigned long cur_gfn = gfn + npage;
 		unsigned long pfn;
 
-		ret = vfio_pin_pages(mdev_dev(kvmgt_vdev(vgpu)->mdev), &cur_gfn, 1,
-				     IOMMU_READ | IOMMU_WRITE, &pfn);
+		ret = vfio_group_pin_pages(vdev->vfio_group, &cur_gfn, 1,
+					   IOMMU_READ | IOMMU_WRITE, &pfn);
 		if (ret != 1) {
 			gvt_vgpu_err("vfio_pin_pages failed for gfn 0x%lx, ret %d\n",
 				     cur_gfn, ret);
-- 
2.17.1



* [PATCH v4 7/7] drm/i915/gvt: rw more pages a time for shadow context
  2020-03-13  3:05 [PATCH v4 0/7] use vfio_dma_rw to read/write IOVAs from CPU side Yan Zhao
                   ` (5 preceding siblings ...)
  2020-03-13  3:11 ` [PATCH v4 6/7] drm/i915/gvt: switch to user vfio_group_pin/upin_pages Yan Zhao
@ 2020-03-13  3:12 ` Yan Zhao
  2020-03-16  3:23   ` Zhenyu Wang
  2020-03-13 22:29 ` [PATCH v4 0/7] use vfio_dma_rw to read/write IOVAs from CPU side Alex Williamson
  7 siblings, 1 reply; 14+ messages in thread
From: Yan Zhao @ 2020-03-13  3:12 UTC (permalink / raw)
  To: intel-gvt-dev, kvm, linux-kernel
  Cc: alex.williamson, zhenyuw, pbonzini, kevin.tian, peterx, Yan Zhao

1. As the shadow context is pinned in intel_vgpu_setup_submission() and
unpinned in intel_vgpu_clean_submission(), its base virtual address can
safely be obtained from lrc_reg_state. There is no need to call
kmap()/kunmap() repeatedly.

2. The IOVAs (GPAs) of the context pages are checked in this patch, and if
they are consecutive, they are read/written together in a single
intel_gvt_hypervisor_read_gpa() / intel_gvt_hypervisor_write_gpa() call
(a simplified sketch of the batching loop follows below).

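The batching logic, reduced to a simplified sketch: gpa_of_page() stands in
for the intel_vgpu_gma_to_gpa() lookup and is not a real helper, and the
final flush is hoisted out of the loop here for clarity rather than matching
the patch line for line:

	gpa_size = 0;
	for (i = 2; i < context_page_num; i++) {
		gpa = gpa_of_page(i);	/* hypothetical GPA lookup */

		if (gpa_size == 0) {
			/* start a new run of consecutive GPAs */
			gpa_base = gpa;
			dst = context_base + (i << I915_GTT_PAGE_SHIFT);
		} else if (gpa != gpa_base + gpa_size) {
			/* run broken: flush it, then restart at page i */
			intel_gvt_hypervisor_read_gpa(vgpu, gpa_base, dst, gpa_size);
			gpa_base = gpa;
			gpa_size = 0;
			dst = context_base + (i << I915_GTT_PAGE_SHIFT);
		}
		gpa_size += I915_GTT_PAGE_SIZE;
	}
	if (gpa_size)	/* flush the last run */
		intel_gvt_hypervisor_read_gpa(vgpu, gpa_base, dst, gpa_size);
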
After the two changes in this patch, the average cycles for
populate_shadow_context() and update_guest_context() are reduced by
~10000-20000 cycles, depending on the average number of consecutive pages in
each read/write.

(1) comparison of cycles of
populate_shadow_context() + update_guest_context() when executing
different benchmarks
 -------------------------------------------------------------
|       cycles      | glmark2     | lightsmark  | openarena   |
|-------------------------------------------------------------|
| before this patch | 65968       | 97852       | 61373       |
|  after this patch | 56017 (85%) | 73862 (75%) | 47463 (77%) |
 -------------------------------------------------------------

(2) average count of pages read/written a time in
populate_shadow_context() and update_guest_context()
for each benchmark

 -----------------------------------------------------------
|     page cnt      | glmark2     | lightsmark  | openarena |
|-----------------------------------------------------------|
| before this patch |    1        |      1      |    1      |
|  after this patch |    5.25     |     19.99   |   20      |
 ------------------------------------------------------------

(3) comparison of benchmarks scores
 ---------------------------------------------------------------------
|      score        | glmark2       | lightsmark     | openarena      |
|---------------------------------------------------------------------|
| before this patch | 1244          | 222.18         | 114.4          |
|  after this patch | 1248 (100.3%) | 225.8 (101.6%) | 115.0 (100.9%) |
 ---------------------------------------------------------------------

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/gpu/drm/i915/gvt/scheduler.c | 95 ++++++++++++++++++++--------
 1 file changed, 67 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
index 1c95bf8cbed0..852d924f6abc 100644
--- a/drivers/gpu/drm/i915/gvt/scheduler.c
+++ b/drivers/gpu/drm/i915/gvt/scheduler.c
@@ -128,16 +128,21 @@ static int populate_shadow_context(struct intel_vgpu_workload *workload)
 {
 	struct intel_vgpu *vgpu = workload->vgpu;
 	struct intel_gvt *gvt = vgpu->gvt;
-	struct drm_i915_gem_object *ctx_obj =
-		workload->req->context->state->obj;
+	struct intel_context *ctx = workload->req->context;
 	struct execlist_ring_context *shadow_ring_context;
-	struct page *page;
 	void *dst;
+	void *context_base;
 	unsigned long context_gpa, context_page_num;
+	unsigned long gpa_base; /* first gpa of consecutive GPAs */
+	unsigned long gpa_size; /* size of consecutive GPAs */
 	int i;
 
-	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
-	shadow_ring_context = kmap(page);
+	GEM_BUG_ON(!intel_context_is_pinned(ctx));
+
+	context_base = (void *) ctx->lrc_reg_state -
+				(LRC_STATE_PN << I915_GTT_PAGE_SHIFT);
+
+	shadow_ring_context = (void *) ctx->lrc_reg_state;
 
 	sr_oa_regs(workload, (u32 *)shadow_ring_context, true);
 #define COPY_REG(name) \
@@ -169,7 +174,6 @@ static int populate_shadow_context(struct intel_vgpu_workload *workload)
 			I915_GTT_PAGE_SIZE - sizeof(*shadow_ring_context));
 
 	sr_oa_regs(workload, (u32 *)shadow_ring_context, false);
-	kunmap(page);
 
 	if (IS_RESTORE_INHIBIT(shadow_ring_context->ctx_ctrl.val))
 		return 0;
@@ -184,8 +188,12 @@ static int populate_shadow_context(struct intel_vgpu_workload *workload)
 	if (IS_BROADWELL(gvt->gt->i915) && workload->engine->id == RCS0)
 		context_page_num = 19;
 
-	i = 2;
-	while (i < context_page_num) {
+
+	/* find consecutive GPAs from gma until the first non-consecutive GPA.
+	 * read from the consecutive GPAs into dst virtual address
+	 */
+	gpa_size = 0;
+	for (i = 2; i < context_page_num; i++) {
 		context_gpa = intel_vgpu_gma_to_gpa(vgpu->gtt.ggtt_mm,
 				(u32)((workload->ctx_desc.lrca + i) <<
 				I915_GTT_PAGE_SHIFT));
@@ -194,12 +202,24 @@ static int populate_shadow_context(struct intel_vgpu_workload *workload)
 			return -EFAULT;
 		}
 
-		page = i915_gem_object_get_page(ctx_obj, i);
-		dst = kmap(page);
-		intel_gvt_hypervisor_read_gpa(vgpu, context_gpa, dst,
-				I915_GTT_PAGE_SIZE);
-		kunmap(page);
-		i++;
+		if (gpa_size == 0) {
+			gpa_base = context_gpa;
+			dst = context_base + (i << I915_GTT_PAGE_SHIFT);
+		} else if (context_gpa != gpa_base + gpa_size)
+			goto read;
+
+		gpa_size += I915_GTT_PAGE_SIZE;
+
+		if (i == context_page_num - 1)
+			goto read;
+
+		continue;
+
+read:
+		intel_gvt_hypervisor_read_gpa(vgpu, gpa_base, dst, gpa_size);
+		gpa_base = context_gpa;
+		gpa_size = I915_GTT_PAGE_SIZE;
+		dst = context_base + (i << I915_GTT_PAGE_SHIFT);
 	}
 	return 0;
 }
@@ -784,19 +804,23 @@ static void update_guest_context(struct intel_vgpu_workload *workload)
 {
 	struct i915_request *rq = workload->req;
 	struct intel_vgpu *vgpu = workload->vgpu;
-	struct drm_i915_gem_object *ctx_obj = rq->context->state->obj;
+	struct intel_context *ctx = workload->req->context;
 	struct execlist_ring_context *shadow_ring_context;
-	struct page *page;
-	void *src;
 	unsigned long context_gpa, context_page_num;
+	unsigned long gpa_base; /* first gpa of consecutive GPAs */
+	unsigned long gpa_size; /* size of consecutive GPAs*/
 	int i;
 	u32 ring_base;
 	u32 head, tail;
 	u16 wrap_count;
+	void *src;
+	void *context_base;
 
 	gvt_dbg_sched("ring id %d workload lrca %x\n", rq->engine->id,
 		      workload->ctx_desc.lrca);
 
+	GEM_BUG_ON(!intel_context_is_pinned(ctx));
+
 	head = workload->rb_head;
 	tail = workload->rb_tail;
 	wrap_count = workload->guest_rb_head >> RB_HEAD_WRAP_CNT_OFF;
@@ -820,9 +844,14 @@ static void update_guest_context(struct intel_vgpu_workload *workload)
 	if (IS_BROADWELL(rq->i915) && rq->engine->id == RCS0)
 		context_page_num = 19;
 
-	i = 2;
+	context_base = (void *) ctx->lrc_reg_state -
+			(LRC_STATE_PN << I915_GTT_PAGE_SHIFT);
 
-	while (i < context_page_num) {
+	/* find consecutive GPAs from gma until the first non-consecutive GPA.
+	 * write to the consecutive GPAs from src virtual address
+	 */
+	gpa_size = 0;
+	for (i = 2; i < context_page_num; i++) {
 		context_gpa = intel_vgpu_gma_to_gpa(vgpu->gtt.ggtt_mm,
 				(u32)((workload->ctx_desc.lrca + i) <<
 					I915_GTT_PAGE_SHIFT));
@@ -831,19 +860,30 @@ static void update_guest_context(struct intel_vgpu_workload *workload)
 			return;
 		}
 
-		page = i915_gem_object_get_page(ctx_obj, i);
-		src = kmap(page);
-		intel_gvt_hypervisor_write_gpa(vgpu, context_gpa, src,
-				I915_GTT_PAGE_SIZE);
-		kunmap(page);
-		i++;
+		if (gpa_size == 0) {
+			gpa_base = context_gpa;
+			src = context_base + (i << I915_GTT_PAGE_SHIFT);
+		} else if (context_gpa != gpa_base + gpa_size)
+			goto write;
+
+		gpa_size += I915_GTT_PAGE_SIZE;
+
+		if (i == context_page_num - 1)
+			goto write;
+
+		continue;
+
+write:
+		intel_gvt_hypervisor_write_gpa(vgpu, gpa_base, src, gpa_size);
+		gpa_base = context_gpa;
+		gpa_size = I915_GTT_PAGE_SIZE;
+		src = context_base + (i << I915_GTT_PAGE_SHIFT);
 	}
 
 	intel_gvt_hypervisor_write_gpa(vgpu, workload->ring_context_gpa +
 		RING_CTX_OFF(ring_header.val), &workload->rb_tail, 4);
 
-	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
-	shadow_ring_context = kmap(page);
+	shadow_ring_context = (void *) ctx->lrc_reg_state;
 
 #define COPY_REG(name) \
 	intel_gvt_hypervisor_write_gpa(vgpu, workload->ring_context_gpa + \
@@ -861,7 +901,6 @@ static void update_guest_context(struct intel_vgpu_workload *workload)
 			sizeof(*shadow_ring_context),
 			I915_GTT_PAGE_SIZE - sizeof(*shadow_ring_context));
 
-	kunmap(page);
 }
 
 void intel_vgpu_clean_workloads(struct intel_vgpu *vgpu,
-- 
2.17.1



* Re: [PATCH v4 0/7] use vfio_dma_rw to read/write IOVAs from CPU side
  2020-03-13  3:05 [PATCH v4 0/7] use vfio_dma_rw to read/write IOVAs from CPU side Yan Zhao
                   ` (6 preceding siblings ...)
  2020-03-13  3:12 ` [PATCH v4 7/7] drm/i915/gvt: rw more pages a time for shadow context Yan Zhao
@ 2020-03-13 22:29 ` Alex Williamson
  2020-03-15 23:56   ` Yan Zhao
  2020-03-16  3:24   ` Zhenyu Wang
  7 siblings, 2 replies; 14+ messages in thread
From: Alex Williamson @ 2020-03-13 22:29 UTC (permalink / raw)
  To: Yan Zhao
  Cc: intel-gvt-dev, kvm, linux-kernel, zhenyuw, pbonzini, kevin.tian,
	peterx, Kirti Wankhede, Neo Jia (cjia@nvidia.com)

[Cc +NVIDIA]

On Thu, 12 Mar 2020 23:05:48 -0400
Yan Zhao <yan.y.zhao@intel.com> wrote:

> It is better for a device model to use IOVAs to read/write memory to
> perform some sort of virtual DMA on behalf of the device.
> 
> patch 1 exports VFIO group to external user so that it can hold the group
> reference until finishing using of it. It saves ~500 cycles that are spent
> on VFIO group looking up, referencing and dereferencing. (this data is
> measured with 1 VFIO user).
> 
> patch 2 introduces interface vfio_dma_rw().
> 
> patch 3 introduces interfaces vfio_group_pin_pages() and
> vfio_group_unpin_pages() to get rid of VFIO group looking-up in
> vfio_pin_pages() and vfio_unpin_pages().
> 
> patch 4-5 let kvmgt switch from calling kvm_read/write_guest() to calling
> vfio_dma_rw to rw IOVAs.
> 
> patch 6 let kvmgt switch to use lighter version of vfio_pin/unpin_pages(),
> i.e. vfio_group_pin/unpin_pages()
> 
> patch 7 enables kvmgt to read/write IOVAs of size larger than PAGE_SIZE.

This looks pretty good to me, hopefully Kirti and Neo can find some
advantage with this series as well.  Given that we're also trying to
get the migration interface and dirty page tracking integrated for
v5.7, would it make sense to merge the first 3 patches via my next
branch?  This is probably the easiest way to update the dirty tracking.
Thanks,

Alex

> Performance:
> 
> Comparison between vfio_dma_rw() and kvm_read/write_guest():
> 
> 1. avergage CPU cycles of each interface measured with 1 running VM:
>  --------------------------------------------------
> |  rw       |          avg cycles of               |
> |  size     | (vfio_dma_rw - kvm_read/write_guest) |
> |---------- ---------------------------------------|
> | <= 1 page |            +155 ~ +195               |        
> |--------------------------------------------------|
> | 5 pages   |                -530                  |
> |--------------------------------------------------|
> | 20 pages  |           -2005 ~ -2533              |
>  --------------------------------------------------
> 
> 2. average scores
> 
> base: base code before applying code in this series. use
> kvm_read/write_pages() to rw IOVAs
> 
> base + this series: use vfio_dma_rw() to read IOVAs and use
> vfio_group_pin/unpin_pages(), and kvmgt is able to rw several pages
> at a time.
> 
> Scores of benchmarks running in 1 VM each:
>  -----------------------------------------------------------------
> |                    | glmark2 | lightsmark | openarena | heavens |
> |-----------------------------------------------------------------|
> |       base         |  1248   |  219.70    |  114.9    |   560   |
> |-----------------------------------------------------------------|
> |base + this series  |  1248   |  225.8     |   115     |   559   |
>  -----------------------------------------------------------------
> 
> Sum of scores of two benchmark instances running in 2 VMs each:
>  -------------------------------------------------------
> |                    | glmark2 | lightsmark | openarena |
> |-------------------------------------------------------|
> |       base         |  812    |  211.46    |  115.3    |
> |-------------------------------------------------------|
> |base + this series  |  814    |  214.69    |  115.9    |
>  -------------------------------------------------------
> 
> 
> Changelogs:
> v3 --> v4:
> - rebased to 5.6.0-rc4+
> - adjust wrap position for vfio_group_get_external_user_from_dev() in
>   header file.(Alex)
> - updated function description of vfio_group_get_external_user_from_dev()
>   (Alex)
> - fixed Error path group reference leaking in
>   vfio_group_get_external_user_from_dev()  (Alex)
> - reurn 0 for success or errno in vfio_dma_rw_chunk(). (Alex)
> - renamed iova to user_iova in interface vfio_dam_rw().
> - renamed vfio_pin_pages_from_group() and vfio_unpin_pages_from_group() to
>   vfio_group_pin_pages() and vfio_group_unpin_pages()
> - renamed user_pfn to user_iova_pfn in vfio_group_pin_pages() and
>   vfio_group_unpin_pages()
> 
> v2 --> v3:
> - add vfio_group_get_external_user_from_dev() to improve performance (Alex)
> - add vfio_pin/unpin_pages_from_group() to avoid repeated looking up of
>   VFIO group in vfio_pin/unpin_pages() (Alex)
> - add a check for IOMMU_READ permission. (Alex)
> - rename vfio_iommu_type1_rw_dma_nopin() to
>   vfio_iommu_type1_dma_rw_chunk(). (Alex)
> - in kvmgt, change "write ? vfio_dma_rw(...,true) :
>   vfio_dma_rw(...,false)" to vfio_dma_rw(dev, gpa, buf, len, write)
>   (Alex and Paolo)
> - in kvmgt, instead of read/write context pages 1:1, combining the
>   reads/writes of continuous IOVAs to take advantage of vfio_dma_rw() for
>   faster crossing page boundary accesses.
> 
> v1 --> v2:
> - rename vfio_iova_rw to vfio_dma_rw, vfio iommu driver ops .iova_rw
> to .dma_rw. (Alex).
> - change iova and len from unsigned long to dma_addr_t and size_t,
> respectively. (Alex)
> - fix possible overflow in dma->vaddr + iova - dma->iova + offset (Alex)
> - split DMAs from on page boundary to on max available size to eliminate
>   redundant searching of vfio_dma and switching mm. (Alex)
> - add a check for IOMMU_WRITE permission.
> 
> 
> Yan Zhao (7):
>   vfio: allow external user to get vfio group from device
>   vfio: introduce vfio_dma_rw to read/write a range of IOVAs
>   vfio: avoid inefficient operations on VFIO group in
>     vfio_pin/unpin_pages
>   drm/i915/gvt: hold reference of VFIO group during opening of vgpu
>   drm/i915/gvt: subsitute kvm_read/write_guest with vfio_dma_rw
>   drm/i915/gvt: switch to user vfio_group_pin/upin_pages
>   drm/i915/gvt: rw more pages a time for shadow context
> 
>  drivers/gpu/drm/i915/gvt/kvmgt.c     |  46 ++++---
>  drivers/gpu/drm/i915/gvt/scheduler.c |  97 ++++++++++-----
>  drivers/vfio/vfio.c                  | 180 +++++++++++++++++++++++++++
>  drivers/vfio/vfio_iommu_type1.c      |  76 +++++++++++
>  include/linux/vfio.h                 |  13 ++
>  5 files changed, 360 insertions(+), 52 deletions(-)
> 



* Re: [PATCH v4 0/7] use vfio_dma_rw to read/write IOVAs from CPU side
  2020-03-13 22:29 ` [PATCH v4 0/7] use vfio_dma_rw to read/write IOVAs from CPU side Alex Williamson
@ 2020-03-15 23:56   ` Yan Zhao
  2020-03-16  3:24   ` Zhenyu Wang
  1 sibling, 0 replies; 14+ messages in thread
From: Yan Zhao @ 2020-03-15 23:56 UTC (permalink / raw)
  To: Alex Williamson
  Cc: intel-gvt-dev, kvm, linux-kernel, zhenyuw, pbonzini, Tian, Kevin,
	peterx, Kirti Wankhede, Neo Jia (cjia@nvidia.com)

On Sat, Mar 14, 2020 at 06:29:58AM +0800, Alex Williamson wrote:
> [Cc +NVIDIA]
> 
> On Thu, 12 Mar 2020 23:05:48 -0400
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > It is better for a device model to use IOVAs to read/write memory to
> > perform some sort of virtual DMA on behalf of the device.
> > 
> > patch 1 exports VFIO group to external user so that it can hold the group
> > reference until finishing using of it. It saves ~500 cycles that are spent
> > on VFIO group looking up, referencing and dereferencing. (this data is
> > measured with 1 VFIO user).
> > 
> > patch 2 introduces interface vfio_dma_rw().
> > 
> > patch 3 introduces interfaces vfio_group_pin_pages() and
> > vfio_group_unpin_pages() to get rid of VFIO group looking-up in
> > vfio_pin_pages() and vfio_unpin_pages().
> > 
> > patch 4-5 let kvmgt switch from calling kvm_read/write_guest() to calling
> > vfio_dma_rw to rw IOVAs.
> > 
> > patch 6 let kvmgt switch to use lighter version of vfio_pin/unpin_pages(),
> > i.e. vfio_group_pin/unpin_pages()
> > 
> > patch 7 enables kvmgt to read/write IOVAs of size larger than PAGE_SIZE.
> 
> This looks pretty good to me, hopefully Kirti and Neo can find some
> advantage with this series as well.  Given that we're also trying to
> get the migration interface and dirty page tracking integrated for
> v5.7, would it make sense to merge the first 3 patches via my next
> branch?  This is probably the easiest way to update the dirty tracking.
> Thanks,
> 
sure. glad to hear that :)

Thanks
Yan

> 
> > Performance:
> > 
> > Comparison between vfio_dma_rw() and kvm_read/write_guest():
> > 
> > 1. avergage CPU cycles of each interface measured with 1 running VM:
> >  --------------------------------------------------
> > |  rw       |          avg cycles of               |
> > |  size     | (vfio_dma_rw - kvm_read/write_guest) |
> > |---------- ---------------------------------------|
> > | <= 1 page |            +155 ~ +195               |        
> > |--------------------------------------------------|
> > | 5 pages   |                -530                  |
> > |--------------------------------------------------|
> > | 20 pages  |           -2005 ~ -2533              |
> >  --------------------------------------------------
> > 
> > 2. average scores
> > 
> > base: base code before applying code in this series. use
> > kvm_read/write_pages() to rw IOVAs
> > 
> > base + this series: use vfio_dma_rw() to read IOVAs and use
> > vfio_group_pin/unpin_pages(), and kvmgt is able to rw several pages
> > at a time.
> > 
> > Scores of benchmarks running in 1 VM each:
> >  -----------------------------------------------------------------
> > |                    | glmark2 | lightsmark | openarena | heavens |
> > |-----------------------------------------------------------------|
> > |       base         |  1248   |  219.70    |  114.9    |   560   |
> > |-----------------------------------------------------------------|
> > |base + this series  |  1248   |  225.8     |   115     |   559   |
> >  -----------------------------------------------------------------
> > 
> > Sum of scores of two benchmark instances running in 2 VMs each:
> >  -------------------------------------------------------
> > |                    | glmark2 | lightsmark | openarena |
> > |-------------------------------------------------------|
> > |       base         |  812    |  211.46    |  115.3    |
> > |-------------------------------------------------------|
> > |base + this series  |  814    |  214.69    |  115.9    |
> >  -------------------------------------------------------
> > 
> > 
> > Changelogs:
> > v3 --> v4:
> > - rebased to 5.6.0-rc4+
> > - adjust wrap position for vfio_group_get_external_user_from_dev() in
> >   header file.(Alex)
> > - updated function description of vfio_group_get_external_user_from_dev()
> >   (Alex)
> > - fixed Error path group reference leaking in
> >   vfio_group_get_external_user_from_dev()  (Alex)
> > - reurn 0 for success or errno in vfio_dma_rw_chunk(). (Alex)
> > - renamed iova to user_iova in interface vfio_dam_rw().
> > - renamed vfio_pin_pages_from_group() and vfio_unpin_pages_from_group() to
> >   vfio_group_pin_pages() and vfio_group_unpin_pages()
> > - renamed user_pfn to user_iova_pfn in vfio_group_pin_pages() and
> >   vfio_group_unpin_pages()
> > 
> > v2 --> v3:
> > - add vfio_group_get_external_user_from_dev() to improve performance (Alex)
> > - add vfio_pin/unpin_pages_from_group() to avoid repeated looking up of
> >   VFIO group in vfio_pin/unpin_pages() (Alex)
> > - add a check for IOMMU_READ permission. (Alex)
> > - rename vfio_iommu_type1_rw_dma_nopin() to
> >   vfio_iommu_type1_dma_rw_chunk(). (Alex)
> > - in kvmgt, change "write ? vfio_dma_rw(...,true) :
> >   vfio_dma_rw(...,false)" to vfio_dma_rw(dev, gpa, buf, len, write)
> >   (Alex and Paolo)
> > - in kvmgt, instead of read/write context pages 1:1, combining the
> >   reads/writes of continuous IOVAs to take advantage of vfio_dma_rw() for
> >   faster crossing page boundary accesses.
> > 
> > v1 --> v2:
> > - rename vfio_iova_rw to vfio_dma_rw, vfio iommu driver ops .iova_rw
> > to .dma_rw. (Alex).
> > - change iova and len from unsigned long to dma_addr_t and size_t,
> > respectively. (Alex)
> > - fix possible overflow in dma->vaddr + iova - dma->iova + offset (Alex)
> > - split DMAs from on page boundary to on max available size to eliminate
> >   redundant searching of vfio_dma and switching mm. (Alex)
> > - add a check for IOMMU_WRITE permission.
> > 
> > 
> > Yan Zhao (7):
> >   vfio: allow external user to get vfio group from device
> >   vfio: introduce vfio_dma_rw to read/write a range of IOVAs
> >   vfio: avoid inefficient operations on VFIO group in
> >     vfio_pin/unpin_pages
> >   drm/i915/gvt: hold reference of VFIO group during opening of vgpu
> >   drm/i915/gvt: subsitute kvm_read/write_guest with vfio_dma_rw
> >   drm/i915/gvt: switch to user vfio_group_pin/upin_pages
> >   drm/i915/gvt: rw more pages a time for shadow context
> > 
> >  drivers/gpu/drm/i915/gvt/kvmgt.c     |  46 ++++---
> >  drivers/gpu/drm/i915/gvt/scheduler.c |  97 ++++++++++-----
> >  drivers/vfio/vfio.c                  | 180 +++++++++++++++++++++++++++
> >  drivers/vfio/vfio_iommu_type1.c      |  76 +++++++++++
> >  include/linux/vfio.h                 |  13 ++
> >  5 files changed, 360 insertions(+), 52 deletions(-)
> > 
> 


* Re: [PATCH v4 7/7] drm/i915/gvt: rw more pages a time for shadow context
  2020-03-13  3:12 ` [PATCH v4 7/7] drm/i915/gvt: rw more pages a time for shadow context Yan Zhao
@ 2020-03-16  3:23   ` Zhenyu Wang
  0 siblings, 0 replies; 14+ messages in thread
From: Zhenyu Wang @ 2020-03-16  3:23 UTC (permalink / raw)
  To: Yan Zhao
  Cc: intel-gvt-dev, kvm, linux-kernel, alex.williamson, zhenyuw,
	pbonzini, kevin.tian, peterx


On 2020.03.12 23:12:33 -0400, Yan Zhao wrote:
> 1. as shadow context is pinned in intel_vgpu_setup_submission() and
> unpinned in intel_vgpu_clean_submission(), its base virtual address of
> is safely obtained from lrc_reg_state. no need to call kmap()/kunmap()
> repeatedly.
> 
> 2. IOVA(GPA)s of context pages are checked in this patch and if they are
> consecutive, read/write them together in one
> intel_gvt_hypervisor_read_gpa() / intel_gvt_hypervisor_write_gpa().
> 
> after the two changes in this patch,

Better to split the kmap removal and the consecutive-copy change into
separate patches for bisecting.

> average cycles for populate_shadow_context() and update_guest_context()
> are reduced by ~10000-20000 cycles, depending on the average number of
> consecutive pages in each read/write.
> 
> (1) comparison of cycles of
> populate_shadow_context() + update_guest_context() when executing
> different benchmarks
>  -------------------------------------------------------------
> |       cycles      | glmark2     | lightsmark  | openarena   |
> |-------------------------------------------------------------|
> | before this patch | 65968       | 97852       | 61373       |
> |  after this patch | 56017 (85%) | 73862 (75%) | 47463 (77%) |
>  -------------------------------------------------------------
> 
> (2) average count of pages read/written a time in
> populate_shadow_context() and update_guest_context()
> for each benchmark
> 
>  -----------------------------------------------------------
> |     page cnt      | glmark2     | lightsmark  | openarena |
> |-----------------------------------------------------------|
> | before this patch |    1        |      1      |    1      |
> |  after this patch |    5.25     |     19.99   |   20      |
>  ------------------------------------------------------------
> 
> (3) comparison of benchmarks scores
>  ---------------------------------------------------------------------
> |      score        | glmark2       | lightsmark     | openarena      |
> |---------------------------------------------------------------------|
> | before this patch | 1244          | 222.18         | 114.4          |
> |  after this patch | 1248 (100.3%) | 225.8 (101.6%) | 115.0 (100.9%) |
>  ---------------------------------------------------------------------
> 
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>  drivers/gpu/drm/i915/gvt/scheduler.c | 95 ++++++++++++++++++++--------
>  1 file changed, 67 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
> index 1c95bf8cbed0..852d924f6abc 100644
> --- a/drivers/gpu/drm/i915/gvt/scheduler.c
> +++ b/drivers/gpu/drm/i915/gvt/scheduler.c
> @@ -128,16 +128,21 @@ static int populate_shadow_context(struct intel_vgpu_workload *workload)
>  {
>  	struct intel_vgpu *vgpu = workload->vgpu;
>  	struct intel_gvt *gvt = vgpu->gvt;
> -	struct drm_i915_gem_object *ctx_obj =
> -		workload->req->context->state->obj;
> +	struct intel_context *ctx = workload->req->context;
>  	struct execlist_ring_context *shadow_ring_context;
> -	struct page *page;
>  	void *dst;
> +	void *context_base;
>  	unsigned long context_gpa, context_page_num;
> +	unsigned long gpa_base; /* first gpa of consecutive GPAs */
> +	unsigned long gpa_size; /* size of consecutive GPAs */
>  	int i;
>  
> -	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
> -	shadow_ring_context = kmap(page);
> +	GEM_BUG_ON(!intel_context_is_pinned(ctx));
> +
> +	context_base = (void *) ctx->lrc_reg_state -
> +				(LRC_STATE_PN << I915_GTT_PAGE_SHIFT);
> +
> +	shadow_ring_context = (void *) ctx->lrc_reg_state;
>  
>  	sr_oa_regs(workload, (u32 *)shadow_ring_context, true);
>  #define COPY_REG(name) \
> @@ -169,7 +174,6 @@ static int populate_shadow_context(struct intel_vgpu_workload *workload)
>  			I915_GTT_PAGE_SIZE - sizeof(*shadow_ring_context));
>  
>  	sr_oa_regs(workload, (u32 *)shadow_ring_context, false);
> -	kunmap(page);
>  
>  	if (IS_RESTORE_INHIBIT(shadow_ring_context->ctx_ctrl.val))
>  		return 0;
> @@ -184,8 +188,12 @@ static int populate_shadow_context(struct intel_vgpu_workload *workload)
>  	if (IS_BROADWELL(gvt->gt->i915) && workload->engine->id == RCS0)
>  		context_page_num = 19;
>  
> -	i = 2;
> -	while (i < context_page_num) {
> +
> +	/* find consecutive GPAs from gma until the first inconsecutive GPA.
> +	 * read from the continuous GPAs into dst virtual address
> +	 */
> +	gpa_size = 0;
> +	for (i = 2; i < context_page_num; i++) {
>  		context_gpa = intel_vgpu_gma_to_gpa(vgpu->gtt.ggtt_mm,
>  				(u32)((workload->ctx_desc.lrca + i) <<
>  				I915_GTT_PAGE_SHIFT));
> @@ -194,12 +202,24 @@ static int populate_shadow_context(struct intel_vgpu_workload *workload)
>  			return -EFAULT;
>  		}
>  
> -		page = i915_gem_object_get_page(ctx_obj, i);
> -		dst = kmap(page);
> -		intel_gvt_hypervisor_read_gpa(vgpu, context_gpa, dst,
> -				I915_GTT_PAGE_SIZE);
> -		kunmap(page);
> -		i++;
> +		if (gpa_size == 0) {
> +			gpa_base = context_gpa;
> +			dst = context_base + (i << I915_GTT_PAGE_SHIFT);
> +		} else if (context_gpa != gpa_base + gpa_size)
> +			goto read;
> +
> +		gpa_size += I915_GTT_PAGE_SIZE;
> +
> +		if (i == context_page_num - 1)
> +			goto read;
> +
> +		continue;
> +
> +read:
> +		intel_gvt_hypervisor_read_gpa(vgpu, gpa_base, dst, gpa_size);
> +		gpa_base = context_gpa;
> +		gpa_size = I915_GTT_PAGE_SIZE;
> +		dst = context_base + (i << I915_GTT_PAGE_SHIFT);
>  	}
>  	return 0;
>  }
> @@ -784,19 +804,23 @@ static void update_guest_context(struct intel_vgpu_workload *workload)
>  {
>  	struct i915_request *rq = workload->req;
>  	struct intel_vgpu *vgpu = workload->vgpu;
> -	struct drm_i915_gem_object *ctx_obj = rq->context->state->obj;
> +	struct intel_context *ctx = workload->req->context;
>  	struct execlist_ring_context *shadow_ring_context;
> -	struct page *page;
> -	void *src;
>  	unsigned long context_gpa, context_page_num;
> +	unsigned long gpa_base; /* first gpa of consecutive GPAs */
> +	unsigned long gpa_size; /* size of consecutive GPAs*/
>  	int i;
>  	u32 ring_base;
>  	u32 head, tail;
>  	u16 wrap_count;
> +	void *src;
> +	void *context_base;
>  
>  	gvt_dbg_sched("ring id %d workload lrca %x\n", rq->engine->id,
>  		      workload->ctx_desc.lrca);
>  
> +	GEM_BUG_ON(!intel_context_is_pinned(ctx));
> +
>  	head = workload->rb_head;
>  	tail = workload->rb_tail;
>  	wrap_count = workload->guest_rb_head >> RB_HEAD_WRAP_CNT_OFF;
> @@ -820,9 +844,14 @@ static void update_guest_context(struct intel_vgpu_workload *workload)
>  	if (IS_BROADWELL(rq->i915) && rq->engine->id == RCS0)
>  		context_page_num = 19;
>  
> -	i = 2;
> +	context_base = (void *) ctx->lrc_reg_state -
> +			(LRC_STATE_PN << I915_GTT_PAGE_SHIFT);
>  
> -	while (i < context_page_num) {
> +	/* find consecutive GPAs from gma until the first inconsecutive GPA.
> +	 * write to the consecutive GPAs from src virtual address
> +	 */
> +	gpa_size = 0;
> +	for (i = 2; i < context_page_num; i++) {
>  		context_gpa = intel_vgpu_gma_to_gpa(vgpu->gtt.ggtt_mm,
>  				(u32)((workload->ctx_desc.lrca + i) <<
>  					I915_GTT_PAGE_SHIFT));
> @@ -831,19 +860,30 @@ static void update_guest_context(struct intel_vgpu_workload *workload)
>  			return;
>  		}
>  
> -		page = i915_gem_object_get_page(ctx_obj, i);
> -		src = kmap(page);
> -		intel_gvt_hypervisor_write_gpa(vgpu, context_gpa, src,
> -				I915_GTT_PAGE_SIZE);
> -		kunmap(page);
> -		i++;
> +		if (gpa_size == 0) {
> +			gpa_base = context_gpa;
> +			src = context_base + (i << I915_GTT_PAGE_SHIFT);
> +		} else if (context_gpa != gpa_base + gpa_size)
> +			goto write;
> +
> +		gpa_size += I915_GTT_PAGE_SIZE;
> +
> +		if (i == context_page_num - 1)
> +			goto write;
> +
> +		continue;
> +
> +write:
> +		intel_gvt_hypervisor_write_gpa(vgpu, gpa_base, src, gpa_size);
> +		gpa_base = context_gpa;
> +		gpa_size = I915_GTT_PAGE_SIZE;
> +		src = context_base + (i << I915_GTT_PAGE_SHIFT);
>  	}
>  
>  	intel_gvt_hypervisor_write_gpa(vgpu, workload->ring_context_gpa +
>  		RING_CTX_OFF(ring_header.val), &workload->rb_tail, 4);
>  
> -	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
> -	shadow_ring_context = kmap(page);
> +	shadow_ring_context = (void *) ctx->lrc_reg_state;
>  
>  #define COPY_REG(name) \
>  	intel_gvt_hypervisor_write_gpa(vgpu, workload->ring_context_gpa + \
> @@ -861,7 +901,6 @@ static void update_guest_context(struct intel_vgpu_workload *workload)
>  			sizeof(*shadow_ring_context),
>  			I915_GTT_PAGE_SIZE - sizeof(*shadow_ring_context));
>  
> -	kunmap(page);
>  }
>  
>  void intel_vgpu_clean_workloads(struct intel_vgpu *vgpu,
> -- 
> 2.17.1
> 
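For readers following this change, the batching idea is: walk the guest context pages, extend a pending run while each page's GPA is contiguous with the previous one, and issue a single hypervisor read (or write) per run instead of one per page. A minimal sketch of that pattern follows; it is illustrative only, not the kvmgt code, and copy_ctx_pages(), page_to_gpa() and read_gpa() are placeholder names:

/*
 * Illustrative run-coalescing loop (placeholder helpers, not kvmgt code).
 * Each run of physically contiguous guest pages is copied with one call.
 */
static void copy_ctx_pages(void *dst_base, int nr_pages,
			   unsigned long (*page_to_gpa)(int idx),
			   void (*read_gpa)(unsigned long gpa, void *dst,
					    size_t len))
{
	unsigned long run_gpa = 0, run_len = 0;
	void *run_dst = NULL;
	int i;

	for (i = 0; i < nr_pages; i++) {
		unsigned long gpa = page_to_gpa(i);

		if (run_len && gpa != run_gpa + run_len) {
			/* contiguity broken: flush the pending run */
			read_gpa(run_gpa, run_dst, run_len);
			run_len = 0;
		}
		if (!run_len) {
			run_gpa = gpa;
			run_dst = dst_base + ((size_t)i << PAGE_SHIFT);
		}
		run_len += PAGE_SIZE;
	}
	if (run_len)
		read_gpa(run_gpa, run_dst, run_len);	/* flush the tail */
}

Flushing any remaining run after the loop is one way to structure this; the patch above instead flushes the final run from inside the loop body on the last iteration.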

-- 
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 0/7] use vfio_dma_rw to read/write IOVAs from CPU side
  2020-03-13 22:29 ` [PATCH v4 0/7] use vfio_dma_rw to read/write IOVAs from CPU side Alex Williamson
  2020-03-15 23:56   ` Yan Zhao
@ 2020-03-16  3:24   ` Zhenyu Wang
  1 sibling, 0 replies; 14+ messages in thread
From: Zhenyu Wang @ 2020-03-16  3:24 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yan Zhao, intel-gvt-dev, kvm, linux-kernel, zhenyuw, pbonzini,
	kevin.tian, peterx, Kirti Wankhede, Neo Jia (cjia@nvidia.com)

[-- Attachment #1: Type: text/plain, Size: 6829 bytes --]

On 2020.03.13 16:29:58 -0600, Alex Williamson wrote:
> [Cc +NVIDIA]
> 
> On Thu, 12 Mar 2020 23:05:48 -0400
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > It is better for a device model to use IOVAs to read/write memory to
> > perform some sort of virtual DMA on behalf of the device.
> > 
> > patch 1 exports VFIO group to external user so that it can hold the group
> > reference until finishing using of it. It saves ~500 cycles that are spent
> > on VFIO group looking up, referencing and dereferencing. (this data is
> > measured with 1 VFIO user).
> > 
> > patch 2 introduces interface vfio_dma_rw().
> > 
> > patch 3 introduces interfaces vfio_group_pin_pages() and
> > vfio_group_unpin_pages() to get rid of VFIO group looking-up in
> > vfio_pin_pages() and vfio_unpin_pages().
> > 
> > patch 4-5 let kvmgt switch from calling kvm_read/write_guest() to calling
> > vfio_dma_rw to rw IOVAs.
> > 
> > patch 6 let kvmgt switch to use lighter version of vfio_pin/unpin_pages(),
> > i.e. vfio_group_pin/unpin_pages()
> > 
> > patch 7 enables kvmgt to read/write IOVAs of size larger than PAGE_SIZE.
> 
> This looks pretty good to me, hopefully Kirti and Neo can find some
> advantage with this series as well.  Given that we're also trying to
> get the migration interface and dirty page tracking integrated for
> v5.7, would it make sense to merge the first 3 patches via my next
> branch?  This is probably the easiest way to update the dirty tracking.

I think that should be OK. The other kvmgt changes for 5.7 are already queued, and once the merge window opens I can send the remaining kvmgt patches.

For kvmgt patches 4-6,

Reviewed-by: Zhenyu Wang <zhenyuw@linux.intel.com>

> 
> Alex
> 
> > Performance:
> > 
> > Comparison between vfio_dma_rw() and kvm_read/write_guest():
> > 
> > 1. avergage CPU cycles of each interface measured with 1 running VM:
> >  --------------------------------------------------
> > |  rw       |          avg cycles of               |
> > |  size     | (vfio_dma_rw - kvm_read/write_guest) |
> > |---------- ---------------------------------------|
> > | <= 1 page |            +155 ~ +195               |        
> > |--------------------------------------------------|
> > | 5 pages   |                -530                  |
> > |--------------------------------------------------|
> > | 20 pages  |           -2005 ~ -2533              |
> >  --------------------------------------------------
> > 
> > 2. average scores
> > 
> > base: base code before applying code in this series. use
> > kvm_read/write_pages() to rw IOVAs
> > 
> > base + this series: use vfio_dma_rw() to read IOVAs and use
> > vfio_group_pin/unpin_pages(), and kvmgt is able to rw several pages
> > at a time.
> > 
> > Scores of benchmarks running in 1 VM each:
> >  -----------------------------------------------------------------
> > |                    | glmark2 | lightsmark | openarena | heavens |
> > |-----------------------------------------------------------------|
> > |       base         |  1248   |  219.70    |  114.9    |   560   |
> > |-----------------------------------------------------------------|
> > |base + this series  |  1248   |  225.8     |   115     |   559   |
> >  -----------------------------------------------------------------
> > 
> > Sum of scores of two benchmark instances running in 2 VMs each:
> >  -------------------------------------------------------
> > |                    | glmark2 | lightsmark | openarena |
> > |-------------------------------------------------------|
> > |       base         |  812    |  211.46    |  115.3    |
> > |-------------------------------------------------------|
> > |base + this series  |  814    |  214.69    |  115.9    |
> >  -------------------------------------------------------
> > 
> > 
> > Changelogs:
> > v3 --> v4:
> > - rebased to 5.6.0-rc4+
> > - adjust wrap position for vfio_group_get_external_user_from_dev() in
> >   header file.(Alex)
> > - updated function description of vfio_group_get_external_user_from_dev()
> >   (Alex)
> > - fixed Error path group reference leaking in
> >   vfio_group_get_external_user_from_dev()  (Alex)
> > - reurn 0 for success or errno in vfio_dma_rw_chunk(). (Alex)
> > - renamed iova to user_iova in interface vfio_dam_rw().
> > - renamed vfio_pin_pages_from_group() and vfio_unpin_pages_from_group() to
> >   vfio_group_pin_pages() and vfio_group_unpin_pages()
> > - renamed user_pfn to user_iova_pfn in vfio_group_pin_pages() and
> >   vfio_group_unpin_pages()
> > 
> > v2 --> v3:
> > - add vfio_group_get_external_user_from_dev() to improve performance (Alex)
> > - add vfio_pin/unpin_pages_from_group() to avoid repeated looking up of
> >   VFIO group in vfio_pin/unpin_pages() (Alex)
> > - add a check for IOMMU_READ permission. (Alex)
> > - rename vfio_iommu_type1_rw_dma_nopin() to
> >   vfio_iommu_type1_dma_rw_chunk(). (Alex)
> > - in kvmgt, change "write ? vfio_dma_rw(...,true) :
> >   vfio_dma_rw(...,false)" to vfio_dma_rw(dev, gpa, buf, len, write)
> >   (Alex and Paolo)
> > - in kvmgt, instead of read/write context pages 1:1, combining the
> >   reads/writes of continuous IOVAs to take advantage of vfio_dma_rw() for
> >   faster crossing page boundary accesses.
> > 
> > v1 --> v2:
> > - rename vfio_iova_rw to vfio_dma_rw, vfio iommu driver ops .iova_rw
> > to .dma_rw. (Alex).
> > - change iova and len from unsigned long to dma_addr_t and size_t,
> > respectively. (Alex)
> > - fix possible overflow in dma->vaddr + iova - dma->iova + offset (Alex)
> > - split DMAs from on page boundary to on max available size to eliminate
> >   redundant searching of vfio_dma and switching mm. (Alex)
> > - add a check for IOMMU_WRITE permission.
> > 
> > 
> > Yan Zhao (7):
> >   vfio: allow external user to get vfio group from device
> >   vfio: introduce vfio_dma_rw to read/write a range of IOVAs
> >   vfio: avoid inefficient operations on VFIO group in
> >     vfio_pin/unpin_pages
> >   drm/i915/gvt: hold reference of VFIO group during opening of vgpu
> >   drm/i915/gvt: subsitute kvm_read/write_guest with vfio_dma_rw
> >   drm/i915/gvt: switch to user vfio_group_pin/upin_pages
> >   drm/i915/gvt: rw more pages a time for shadow context
> > 
> >  drivers/gpu/drm/i915/gvt/kvmgt.c     |  46 ++++---
> >  drivers/gpu/drm/i915/gvt/scheduler.c |  97 ++++++++++-----
> >  drivers/vfio/vfio.c                  | 180 +++++++++++++++++++++++++++
> >  drivers/vfio/vfio_iommu_type1.c      |  76 +++++++++++
> >  include/linux/vfio.h                 |  13 ++
> >  5 files changed, 360 insertions(+), 52 deletions(-)
> > 
> 
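As a rough illustration of the calling convention this series gives device models: grab the VFIO group reference once, reuse it across vfio_dma_rw() calls, and drop it when done. The sketch below assumes the signatures described in this series and its changelog (in particular the return convention of vfio_group_get_external_user_from_dev() is an assumption here); kvmgt itself takes the reference at vgpu open and releases it at close rather than per call.

#include <linux/err.h>
#include <linux/vfio.h>

/*
 * Sketch of a device-model caller of vfio_dma_rw(); assumed signatures,
 * error handling trimmed for brevity.
 */
static int demo_read_iova(struct device *mdev_dev, dma_addr_t iova,
			  void *buf, size_t len)
{
	struct vfio_group *group;
	int ret;

	group = vfio_group_get_external_user_from_dev(mdev_dev);
	if (IS_ERR_OR_NULL(group))
		return group ? PTR_ERR(group) : -ENODEV;

	/* read 'len' bytes at IOVA 'iova' into the kernel buffer 'buf' */
	ret = vfio_dma_rw(group, iova, buf, len, false);

	vfio_group_put_external_user(group);
	return ret;
}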

-- 
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 2/7] vfio: introduce vfio_dma_rw to read/write a range of IOVAs
  2020-03-13  3:09 ` [PATCH v4 2/7] vfio: introduce vfio_dma_rw to read/write a range of IOVAs Yan Zhao
@ 2020-04-05 16:17   ` Kees Cook
  2020-04-07  3:48     ` Yan Zhao
  0 siblings, 1 reply; 14+ messages in thread
From: Kees Cook @ 2020-04-05 16:17 UTC (permalink / raw)
  To: Yan Zhao
  Cc: intel-gvt-dev, kvm, linux-kernel, alex.williamson, zhenyuw,
	pbonzini, kevin.tian, peterx

On Thu, Mar 12, 2020 at 11:09:01PM -0400, Yan Zhao wrote:
> vfio_dma_rw will read/write a range of user space memory pointed to by
> IOVA into/from a kernel buffer without enforcing pinning the user space
> memory.
> 
> TODO: mark the IOVAs to user space memory dirty if they are written in
> vfio_dma_rw().
> 
> Cc: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>  drivers/vfio/vfio.c             | 49 +++++++++++++++++++++
>  drivers/vfio/vfio_iommu_type1.c | 76 +++++++++++++++++++++++++++++++++
>  include/linux/vfio.h            |  5 +++
>  3 files changed, 130 insertions(+)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 97b972bfb735..6997f711b925 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -1999,6 +1999,55 @@ int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn, int npage)
>  }
>  EXPORT_SYMBOL(vfio_unpin_pages);
>  
> +
> +/*
> + * This interface allows the CPUs to perform some sort of virtual DMA on
> + * behalf of the device.
> + *
> + * CPUs read/write from/into a range of IOVAs pointing to user space memory
> + * into/from a kernel buffer.
> + *
> + * As the read/write of user space memory is conducted via the CPUs and is
> + * not a real device DMA, it is not necessary to pin the user space memory.
> + *
> + * The caller needs to call vfio_group_get_external_user() or
> + * vfio_group_get_external_user_from_dev() prior to calling this interface,
> + * so as to prevent the VFIO group from disposal in the middle of the call.
> + * But it can keep the reference to the VFIO group for several calls into
> + * this interface.
> + * After finishing using of the VFIO group, the caller needs to release the
> + * VFIO group by calling vfio_group_put_external_user().
> + *
> + * @group [in]		: VFIO group
> + * @user_iova [in]	: base IOVA of a user space buffer
> + * @data [in]		: pointer to kernel buffer
> + * @len [in]		: kernel buffer length
> + * @write		: indicate read or write
> + * Return error code on failure or 0 on success.
> + */
> +int vfio_dma_rw(struct vfio_group *group, dma_addr_t user_iova,
> +		void *data, size_t len, bool write)
> +{
> +	struct vfio_container *container;
> +	struct vfio_iommu_driver *driver;
> +	int ret = 0;
> +
> +	if (!group || !data || len <= 0)
> +		return -EINVAL;
> +
> +	container = group->container;
> +	driver = container->iommu_driver;
> +
> +	if (likely(driver && driver->ops->dma_rw))
> +		ret = driver->ops->dma_rw(container->iommu_data,
> +					  user_iova, data, len, write);
> +	else
> +		ret = -ENOTTY;
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(vfio_dma_rw);
> +
>  static int vfio_register_iommu_notifier(struct vfio_group *group,
>  					unsigned long *events,
>  					struct notifier_block *nb)
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index a177bf2c6683..9fdfae1cb17a 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -27,6 +27,7 @@
>  #include <linux/iommu.h>
>  #include <linux/module.h>
>  #include <linux/mm.h>
> +#include <linux/mmu_context.h>
>  #include <linux/rbtree.h>
>  #include <linux/sched/signal.h>
>  #include <linux/sched/mm.h>
> @@ -2305,6 +2306,80 @@ static int vfio_iommu_type1_unregister_notifier(void *iommu_data,
>  	return blocking_notifier_chain_unregister(&iommu->notifier, nb);
>  }
>  
> +static int vfio_iommu_type1_dma_rw_chunk(struct vfio_iommu *iommu,
> +					 dma_addr_t user_iova, void *data,
> +					 size_t count, bool write,
> +					 size_t *copied)
> +{
> +	struct mm_struct *mm;
> +	unsigned long vaddr;
> +	struct vfio_dma *dma;
> +	bool kthread = current->mm == NULL;
> +	size_t offset;
> +
> +	*copied = 0;
> +
> +	dma = vfio_find_dma(iommu, user_iova, 1);
> +	if (!dma)
> +		return -EINVAL;
> +
> +	if ((write && !(dma->prot & IOMMU_WRITE)) ||
> +			!(dma->prot & IOMMU_READ))
> +		return -EPERM;
> +
> +	mm = get_task_mm(dma->task);
> +
> +	if (!mm)
> +		return -EPERM;
> +
> +	if (kthread)
> +		use_mm(mm);
> +	else if (current->mm != mm)
> +		goto out;
> +
> +	offset = user_iova - dma->iova;
> +
> +	if (count > dma->size - offset)
> +		count = dma->size - offset;
> +
> +	vaddr = dma->vaddr + offset;
> +
> +	if (write)
> +		*copied = __copy_to_user((void __user *)vaddr, data,
> +					 count) ? 0 : count;
> +	else
> +		*copied = __copy_from_user(data, (void __user *)vaddr,
> +					   count) ? 0 : count;

Why are these using __copy_*_user()? Where are the access_ok() checks?

-Kees

> +	if (kthread)
> +		unuse_mm(mm);
> +out:
> +	mmput(mm);
> +	return *copied ? 0 : -EFAULT;
> +}
> +
> +static int vfio_iommu_type1_dma_rw(void *iommu_data, dma_addr_t user_iova,
> +				   void *data, size_t count, bool write)
> +{
> +	struct vfio_iommu *iommu = iommu_data;
> +	int ret = 0;
> +	size_t done;
> +
> +	mutex_lock(&iommu->lock);
> +	while (count > 0) {
> +		ret = vfio_iommu_type1_dma_rw_chunk(iommu, user_iova, data,
> +						    count, write, &done);
> +		if (ret)
> +			break;
> +
> +		count -= done;
> +		data += done;
> +		user_iova += done;
> +	}
> +
> +	mutex_unlock(&iommu->lock);
> +	return ret;
> +}
> +
>  static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
>  	.name			= "vfio-iommu-type1",
>  	.owner			= THIS_MODULE,
> @@ -2317,6 +2392,7 @@ static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
>  	.unpin_pages		= vfio_iommu_type1_unpin_pages,
>  	.register_notifier	= vfio_iommu_type1_register_notifier,
>  	.unregister_notifier	= vfio_iommu_type1_unregister_notifier,
> +	.dma_rw			= vfio_iommu_type1_dma_rw,
>  };
>  
>  static int __init vfio_iommu_type1_init(void)
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index fb71e0ac0e76..34b2fdf4de6e 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -82,6 +82,8 @@ struct vfio_iommu_driver_ops {
>  					     struct notifier_block *nb);
>  	int		(*unregister_notifier)(void *iommu_data,
>  					       struct notifier_block *nb);
> +	int		(*dma_rw)(void *iommu_data, dma_addr_t user_iova,
> +				  void *data, size_t count, bool write);
>  };
>  
>  extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
> @@ -109,6 +111,9 @@ extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
>  extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn,
>  			    int npage);
>  
> +extern int vfio_dma_rw(struct vfio_group *group, dma_addr_t user_iova,
> +		       void *data, size_t len, bool write);
> +
>  /* each type has independent events */
>  enum vfio_notify_type {
>  	VFIO_IOMMU_NOTIFY = 0,
> -- 
> 2.17.1
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4 2/7] vfio: introduce vfio_dma_rw to read/write a range of IOVAs
  2020-04-05 16:17   ` Kees Cook
@ 2020-04-07  3:48     ` Yan Zhao
  0 siblings, 0 replies; 14+ messages in thread
From: Yan Zhao @ 2020-04-07  3:48 UTC (permalink / raw)
  To: Kees Cook
  Cc: intel-gvt-dev, kvm, linux-kernel, alex.williamson, zhenyuw,
	pbonzini, kevin.tian, peterx

On Sun, Apr 05, 2020 at 09:17:13AM -0700, Kees Cook wrote:
> On Thu, Mar 12, 2020 at 11:09:01PM -0400, Yan Zhao wrote:
> > vfio_dma_rw will read/write a range of user space memory pointed to by
> > IOVA into/from a kernel buffer without enforcing pinning the user space
> > memory.
> > 
> > TODO: mark the IOVAs to user space memory dirty if they are written in
> > vfio_dma_rw().
> > 
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> >  drivers/vfio/vfio.c             | 49 +++++++++++++++++++++
> >  drivers/vfio/vfio_iommu_type1.c | 76 +++++++++++++++++++++++++++++++++
> >  include/linux/vfio.h            |  5 +++
> >  3 files changed, 130 insertions(+)
> > 
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index 97b972bfb735..6997f711b925 100644
> > --- a/drivers/vfio/vfio.c
> > +++ b/drivers/vfio/vfio.c
> > @@ -1999,6 +1999,55 @@ int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn, int npage)
> >  }
> >  EXPORT_SYMBOL(vfio_unpin_pages);
> >  
> > +
> > +/*
> > + * This interface allows the CPUs to perform some sort of virtual DMA on
> > + * behalf of the device.
> > + *
> > + * CPUs read/write from/into a range of IOVAs pointing to user space memory
> > + * into/from a kernel buffer.
> > + *
> > + * As the read/write of user space memory is conducted via the CPUs and is
> > + * not a real device DMA, it is not necessary to pin the user space memory.
> > + *
> > + * The caller needs to call vfio_group_get_external_user() or
> > + * vfio_group_get_external_user_from_dev() prior to calling this interface,
> > + * so as to prevent the VFIO group from disposal in the middle of the call.
> > + * But it can keep the reference to the VFIO group for several calls into
> > + * this interface.
> > + * After finishing using of the VFIO group, the caller needs to release the
> > + * VFIO group by calling vfio_group_put_external_user().
> > + *
> > + * @group [in]		: VFIO group
> > + * @user_iova [in]	: base IOVA of a user space buffer
> > + * @data [in]		: pointer to kernel buffer
> > + * @len [in]		: kernel buffer length
> > + * @write		: indicate read or write
> > + * Return error code on failure or 0 on success.
> > + */
> > +int vfio_dma_rw(struct vfio_group *group, dma_addr_t user_iova,
> > +		void *data, size_t len, bool write)
> > +{
> > +	struct vfio_container *container;
> > +	struct vfio_iommu_driver *driver;
> > +	int ret = 0;
> > +
> > +	if (!group || !data || len <= 0)
> > +		return -EINVAL;
> > +
> > +	container = group->container;
> > +	driver = container->iommu_driver;
> > +
> > +	if (likely(driver && driver->ops->dma_rw))
> > +		ret = driver->ops->dma_rw(container->iommu_data,
> > +					  user_iova, data, len, write);
> > +	else
> > +		ret = -ENOTTY;
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL(vfio_dma_rw);
> > +
> >  static int vfio_register_iommu_notifier(struct vfio_group *group,
> >  					unsigned long *events,
> >  					struct notifier_block *nb)
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index a177bf2c6683..9fdfae1cb17a 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -27,6 +27,7 @@
> >  #include <linux/iommu.h>
> >  #include <linux/module.h>
> >  #include <linux/mm.h>
> > +#include <linux/mmu_context.h>
> >  #include <linux/rbtree.h>
> >  #include <linux/sched/signal.h>
> >  #include <linux/sched/mm.h>
> > @@ -2305,6 +2306,80 @@ static int vfio_iommu_type1_unregister_notifier(void *iommu_data,
> >  	return blocking_notifier_chain_unregister(&iommu->notifier, nb);
> >  }
> >  
> > +static int vfio_iommu_type1_dma_rw_chunk(struct vfio_iommu *iommu,
> > +					 dma_addr_t user_iova, void *data,
> > +					 size_t count, bool write,
> > +					 size_t *copied)
> > +{
> > +	struct mm_struct *mm;
> > +	unsigned long vaddr;
> > +	struct vfio_dma *dma;
> > +	bool kthread = current->mm == NULL;
> > +	size_t offset;
> > +
> > +	*copied = 0;
> > +
> > +	dma = vfio_find_dma(iommu, user_iova, 1);
> > +	if (!dma)
> > +		return -EINVAL;
> > +
> > +	if ((write && !(dma->prot & IOMMU_WRITE)) ||
> > +			!(dma->prot & IOMMU_READ))
> > +		return -EPERM;
> > +
> > +	mm = get_task_mm(dma->task);
> > +
> > +	if (!mm)
> > +		return -EPERM;
> > +
> > +	if (kthread)
> > +		use_mm(mm);
> > +	else if (current->mm != mm)
> > +		goto out;
> > +
> > +	offset = user_iova - dma->iova;
> > +
> > +	if (count > dma->size - offset)
> > +		count = dma->size - offset;
> > +
> > +	vaddr = dma->vaddr + offset;
> > +
> > +	if (write)
> > +		*copied = __copy_to_user((void __user *)vaddr, data,
> > +					 count) ? 0 : count;
> > +	else
> > +		*copied = __copy_from_user(data, (void __user *)vaddr,
> > +					   count) ? 0 : count;
> 
> Why are these using __copy_*_user()? Where are the access_ok() checks?
> 
Sorry, my fault. I thought an access_ok() check had already been performed on vaddr when it was added to the vfio_dma tree.
I will send a fix to use copy_to/from_user() instead.

Thanks
Yan
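
For reference, a minimal sketch of what such a fix could look like in vfio_iommu_type1_dma_rw_chunk(); the checked copy_to_user()/copy_from_user() helpers perform the access_ok() range validation internally, which the raw __copy_*_user() variants skip:

	/* checked copy helpers include the access_ok() validation */
	if (write)
		*copied = copy_to_user((void __user *)vaddr, data,
				       count) ? 0 : count;
	else
		*copied = copy_from_user(data, (void __user *)vaddr,
					 count) ? 0 : count;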

> 
> > +	if (kthread)
> > +		unuse_mm(mm);
> > +out:
> > +	mmput(mm);
> > +	return *copied ? 0 : -EFAULT;
> > +}
> > +
> > +static int vfio_iommu_type1_dma_rw(void *iommu_data, dma_addr_t user_iova,
> > +				   void *data, size_t count, bool write)
> > +{
> > +	struct vfio_iommu *iommu = iommu_data;
> > +	int ret = 0;
> > +	size_t done;
> > +
> > +	mutex_lock(&iommu->lock);
> > +	while (count > 0) {
> > +		ret = vfio_iommu_type1_dma_rw_chunk(iommu, user_iova, data,
> > +						    count, write, &done);
> > +		if (ret)
> > +			break;
> > +
> > +		count -= done;
> > +		data += done;
> > +		user_iova += done;
> > +	}
> > +
> > +	mutex_unlock(&iommu->lock);
> > +	return ret;
> > +}
> > +
> >  static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
> >  	.name			= "vfio-iommu-type1",
> >  	.owner			= THIS_MODULE,
> > @@ -2317,6 +2392,7 @@ static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
> >  	.unpin_pages		= vfio_iommu_type1_unpin_pages,
> >  	.register_notifier	= vfio_iommu_type1_register_notifier,
> >  	.unregister_notifier	= vfio_iommu_type1_unregister_notifier,
> > +	.dma_rw			= vfio_iommu_type1_dma_rw,
> >  };
> >  
> >  static int __init vfio_iommu_type1_init(void)
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index fb71e0ac0e76..34b2fdf4de6e 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -82,6 +82,8 @@ struct vfio_iommu_driver_ops {
> >  					     struct notifier_block *nb);
> >  	int		(*unregister_notifier)(void *iommu_data,
> >  					       struct notifier_block *nb);
> > +	int		(*dma_rw)(void *iommu_data, dma_addr_t user_iova,
> > +				  void *data, size_t count, bool write);
> >  };
> >  
> >  extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
> > @@ -109,6 +111,9 @@ extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
> >  extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn,
> >  			    int npage);
> >  
> > +extern int vfio_dma_rw(struct vfio_group *group, dma_addr_t user_iova,
> > +		       void *data, size_t len, bool write);
> > +
> >  /* each type has independent events */
> >  enum vfio_notify_type {
> >  	VFIO_IOMMU_NOTIFY = 0,
> > -- 
> > 2.17.1
> > 
> 
> -- 
> Kees Cook

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-04-07  3:58 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-13  3:05 [PATCH v4 0/7] use vfio_dma_rw to read/write IOVAs from CPU side Yan Zhao
2020-03-13  3:07 ` [PATCH v4 1/7] vfio: allow external user to get vfio group from device Yan Zhao
2020-03-13  3:09 ` [PATCH v4 2/7] vfio: introduce vfio_dma_rw to read/write a range of IOVAs Yan Zhao
2020-04-05 16:17   ` Kees Cook
2020-04-07  3:48     ` Yan Zhao
2020-03-13  3:09 ` [PATCH v4 3/7] vfio: avoid inefficient operations on VFIO group in vfio_pin/unpin_pages Yan Zhao
2020-03-13  3:10 ` [PATCH v4 4/7] drm/i915/gvt: hold reference of VFIO group during opening of vgpu Yan Zhao
2020-03-13  3:11 ` [PATCH v4 5/7] drm/i915/gvt: subsitute kvm_read/write_guest with vfio_dma_rw Yan Zhao
2020-03-13  3:11 ` [PATCH v4 6/7] drm/i915/gvt: switch to user vfio_group_pin/upin_pages Yan Zhao
2020-03-13  3:12 ` [PATCH v4 7/7] drm/i915/gvt: rw more pages a time for shadow context Yan Zhao
2020-03-16  3:23   ` Zhenyu Wang
2020-03-13 22:29 ` [PATCH v4 0/7] use vfio_dma_rw to read/write IOVAs from CPU side Alex Williamson
2020-03-15 23:56   ` Yan Zhao
2020-03-16  3:24   ` Zhenyu Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).