* [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
@ 2021-01-25  9:03 Shenming Lu
  2021-01-25  9:03 ` [RFC PATCH v1 1/4] vfio/type1: Add a bitmap to track IOPF mapped pages Shenming Lu
                   ` (4 more replies)
  0 siblings, 5 replies; 31+ messages in thread
From: Shenming Lu @ 2021-01-25  9:03 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, kvm, linux-kernel
  Cc: Jean-Philippe Brucker, Eric Auger, Lu Baolu, Kevin Tian,
	wanghaibin.wang, yuzenghui, lushenming

Hi,

The static pinning and mapping problem in VFIO and possible solutions
have been discussed a lot [1, 2]. One of the solutions is to add I/O
page fault support for VFIO devices. Different from those relatively
complicated software approaches such as presenting a vIOMMU that provides
the DMA buffer information (might include para-virtualized optimizations),
IOPF mainly depends on the hardware faulting capability, such as the PCIe
PRI extension or Arm SMMU stall model. What's more, the IOPF support in
the IOMMU driver is being implemented in SVA [3]. So shall we consider
adding IOPF support for VFIO passthrough based on the IOPF part of SVA
at present?

We have implemented a basic demo only for one stage of translation (GPA
-> HPA in virtualization, note that it can be configured at either stage),
and tested it on a Hisilicon Kunpeng920 board. The nested mode is more complicated
since VFIO only handles the second stage page faults (same as the non-nested
case), while the first stage page faults need to be further delivered to
the guest, which is being implemented in [4] on ARM. My thought on this
is to report the page faults to VFIO regardless of the occurred stage (try
to carry the stage information), and handle them respectively according to the
configured mode in VFIO. Or the IOMMU driver might evolve to support more...
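
Just to illustrate the idea, here is a rough sketch of such a dispatch in
the VFIO fault handler. Note that this assumes the IOMMU driver could
carry the faulting stage in struct iommu_fault; the stage field and the
forwarding helper below are made up, not existing interfaces:

        /* Hypothetical stage-aware dispatch in the VFIO fault handler. */
        switch (fault->stage) {         /* assumed field, not in mainline */
        case 2:
                /* GPA -> HPA: fix it in VFIO via the IOMMU mapping interface */
                return vfio_iommu_type1_dynamic_dma_map(iommu_data, fault, dev);
        case 1:
                /* GVA -> GPA: forward to the guest through the vSVA path */
                return vfio_forward_fault_to_guest(fault, dev); /* made up */
        default:
                return -EOPNOTSUPP;
        }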

Might TODO:
 - Optimize the faulting path, and measure the performance (it might still
   be a big issue).
 - Add support for PRI.
 - Add an MMU notifier to avoid pinning.
 - Add support for the nested mode.
...

Any comments and suggestions are very welcome. :-)

Links:
[1] Lesokhin I, et al. Page Fault Support for Network Controllers. In ASPLOS,
    2016.
[2] Tian K, et al. coIOMMU: A Virtual IOMMU with Cooperative DMA Buffer Tracking
    for Efficient Memory Management in Direct I/O. In USENIX ATC, 2020.
[3] iommu: I/O page faults for SMMUv3:
    https://patchwork.kernel.org/project/linux-arm-kernel/cover/20210121123623.2060416-1-jean-philippe@linaro.org/
[4] SMMUv3 Nested Stage Setup (VFIO part):
    https://patchwork.kernel.org/project/kvm/cover/20201116110030.32335-1-eric.auger@redhat.com/

Thanks,
Shenming


Shenming Lu (4):
  vfio/type1: Add a bitmap to track IOPF mapped pages
  vfio: Add a page fault handler
  vfio: Try to enable IOPF for VFIO devices
  vfio: Allow to pin and map dynamically

 drivers/vfio/vfio.c             |  75 ++++++++++++++++++
 drivers/vfio/vfio_iommu_type1.c | 131 +++++++++++++++++++++++++++++++-
 include/linux/vfio.h            |   6 ++
 3 files changed, 211 insertions(+), 1 deletion(-)

-- 
2.19.1



* [RFC PATCH v1 1/4] vfio/type1: Add a bitmap to track IOPF mapped pages
  2021-01-25  9:03 [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough Shenming Lu
@ 2021-01-25  9:03 ` Shenming Lu
  2021-01-29 22:58   ` Alex Williamson
  2021-01-25  9:04 ` [RFC PATCH v1 2/4] vfio: Add a page fault handler Shenming Lu
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 31+ messages in thread
From: Shenming Lu @ 2021-01-25  9:03 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, kvm, linux-kernel
  Cc: Jean-Philippe Brucker, Eric Auger, Lu Baolu, Kevin Tian,
	wanghaibin.wang, yuzenghui, lushenming

When IOPF is enabled, the pages are pinned and mapped on demand; we add
a bitmap to track them.

Signed-off-by: Shenming Lu <lushenming@huawei.com>
---
 drivers/vfio/vfio_iommu_type1.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 0b4dedaa9128..f1d4de5ab094 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -95,6 +95,7 @@ struct vfio_dma {
 	struct task_struct	*task;
 	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
 	unsigned long		*bitmap;
+	unsigned long		*iommu_mapped_bitmap;	/* IOPF mapped bitmap */
 };
 
 struct vfio_group {
@@ -143,6 +144,8 @@ struct vfio_regions {
 #define DIRTY_BITMAP_PAGES_MAX	 ((u64)INT_MAX)
 #define DIRTY_BITMAP_SIZE_MAX	 DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
 
+#define IOMMU_MAPPED_BITMAP_BYTES(n) DIRTY_BITMAP_BYTES(n)
+
 static int put_pfn(unsigned long pfn, int prot);
 
 static struct vfio_group *vfio_iommu_find_iommu_group(struct vfio_iommu *iommu,
@@ -949,6 +952,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
 	vfio_unlink_dma(iommu, dma);
 	put_task_struct(dma->task);
 	vfio_dma_bitmap_free(dma);
+	kfree(dma->iommu_mapped_bitmap);
 	kfree(dma);
 	iommu->dma_avail++;
 }
@@ -1354,6 +1358,14 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 		goto out_unlock;
 	}
 
+	dma->iommu_mapped_bitmap = kvzalloc(IOMMU_MAPPED_BITMAP_BYTES(size / PAGE_SIZE),
+					    GFP_KERNEL);
+	if (!dma->iommu_mapped_bitmap) {
+		ret = -ENOMEM;
+		kfree(dma);
+		goto out_unlock;
+	}
+
 	iommu->dma_avail--;
 	dma->iova = iova;
 	dma->vaddr = vaddr;
-- 
2.19.1



* [RFC PATCH v1 2/4] vfio: Add a page fault handler
  2021-01-25  9:03 [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough Shenming Lu
  2021-01-25  9:03 ` [RFC PATCH v1 1/4] vfio/type1: Add a bitmap to track IOPF mapped pages Shenming Lu
@ 2021-01-25  9:04 ` Shenming Lu
  2021-01-27 17:42   ` Christoph Hellwig
  2021-01-25  9:04 ` [RFC PATCH v1 3/4] vfio: Try to enable IOPF for VFIO devices Shenming Lu
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 31+ messages in thread
From: Shenming Lu @ 2021-01-25  9:04 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, kvm, linux-kernel
  Cc: Jean-Philippe Brucker, Eric Auger, Lu Baolu, Kevin Tian,
	wanghaibin.wang, yuzenghui, lushenming

VFIO manages the passthrough DMA mapping itself. In order to support
IOPF for VFIO devices, we need to add a VFIO page fault handler to
serve the reported page faults from the IOMMU driver.

Signed-off-by: Shenming Lu <lushenming@huawei.com>
---
 drivers/vfio/vfio.c             | 35 ++++++++++++++++++++
 drivers/vfio/vfio_iommu_type1.c | 58 +++++++++++++++++++++++++++++++++
 include/linux/vfio.h            |  5 +++
 3 files changed, 98 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 4ad8a35667a7..ff7797260d0f 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -2349,6 +2349,41 @@ struct iommu_domain *vfio_group_iommu_domain(struct vfio_group *group)
 }
 EXPORT_SYMBOL_GPL(vfio_group_iommu_domain);
 
+int vfio_iommu_dev_fault_handler(struct iommu_fault *fault, void *data)
+{
+	struct device *dev = (struct device *)data;
+	struct vfio_container *container;
+	struct vfio_group *group;
+	struct vfio_iommu_driver *driver;
+	int ret;
+
+	if (!dev)
+		return -EINVAL;
+
+	group = vfio_group_get_from_dev(dev);
+	if (!group)
+		return -ENODEV;
+
+	ret = vfio_group_add_container_user(group);
+	if (ret)
+		goto out;
+
+	container = group->container;
+	driver = container->iommu_driver;
+	if (likely(driver && driver->ops->dynamic_dma_map))
+		ret = driver->ops->dynamic_dma_map(container->iommu_data,
+						   fault, dev);
+	else
+		ret = -ENOTTY;
+
+	vfio_group_try_dissolve_container(group);
+
+out:
+	vfio_group_put(group);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(vfio_iommu_dev_fault_handler);
+
 /**
  * Module/class support
  */
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index f1d4de5ab094..ac6f00c97897 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -145,6 +145,8 @@ struct vfio_regions {
 #define DIRTY_BITMAP_SIZE_MAX	 DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
 
 #define IOMMU_MAPPED_BITMAP_BYTES(n) DIRTY_BITMAP_BYTES(n)
+#define IOMMU_MAPPED_BITMAP_GET(dma, i)	((dma->iommu_mapped_bitmap[i / BITS_PER_LONG]	\
+					 >> (i % BITS_PER_LONG)) & 0x1)
 
 static int put_pfn(unsigned long pfn, int prot);
 
@@ -2992,6 +2994,61 @@ static int vfio_iommu_type1_dma_rw(void *iommu_data, dma_addr_t user_iova,
 	return ret;
 }
 
+static int vfio_iommu_type1_dynamic_dma_map(void *iommu_data,
+					    struct iommu_fault *fault,
+					    struct device *dev)
+{
+	struct vfio_iommu *iommu = iommu_data;
+	dma_addr_t iova = ALIGN_DOWN(fault->prm.addr, PAGE_SIZE);
+	struct vfio_dma *dma;
+	int access_flags = 0;
+	unsigned long bit_offset, vaddr, pfn;
+	enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
+	struct iommu_page_response resp = {0};
+
+	if (fault->type != IOMMU_FAULT_PAGE_REQ)
+		return -EOPNOTSUPP;
+
+	mutex_lock(&iommu->lock);
+
+	dma = vfio_find_dma(iommu, iova, PAGE_SIZE);
+	if (!dma)
+		goto out_invalid;
+
+	if (fault->prm.perm & IOMMU_FAULT_PERM_READ)
+		access_flags |= IOMMU_READ;
+	if (fault->prm.perm & IOMMU_FAULT_PERM_WRITE)
+		access_flags |= IOMMU_WRITE;
+	if ((dma->prot & access_flags) != access_flags)
+		goto out_invalid;
+
+	bit_offset = (iova - dma->iova) >> PAGE_SHIFT;
+	if (IOMMU_MAPPED_BITMAP_GET(dma, bit_offset))
+		goto out_success;
+
+	vaddr = iova - dma->iova + dma->vaddr;
+	if (vfio_pin_page_external(dma, vaddr, &pfn, true))
+		goto out_invalid;
+
+	if (vfio_iommu_map(iommu, iova, pfn, 1, dma->prot)) {
+		vfio_unpin_page_external(dma, iova, true);
+		goto out_invalid;
+	}
+
+	bitmap_set(dma->iommu_mapped_bitmap, bit_offset, 1);
+
+out_success:
+	status = IOMMU_PAGE_RESP_SUCCESS;
+
+out_invalid:
+	mutex_unlock(&iommu->lock);
+	resp.version		= IOMMU_PAGE_RESP_VERSION_1;
+	resp.grpid		= fault->prm.grpid;
+	resp.code		= status;
+	iommu_page_response(dev, &resp);
+	return 0;
+}
+
 static struct iommu_domain *
 vfio_iommu_type1_group_iommu_domain(void *iommu_data,
 				    struct iommu_group *iommu_group)
@@ -3028,6 +3085,7 @@ static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
 	.register_notifier	= vfio_iommu_type1_register_notifier,
 	.unregister_notifier	= vfio_iommu_type1_unregister_notifier,
 	.dma_rw			= vfio_iommu_type1_dma_rw,
+	.dynamic_dma_map	= vfio_iommu_type1_dynamic_dma_map,
 	.group_iommu_domain	= vfio_iommu_type1_group_iommu_domain,
 };
 
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index f45940b38a02..6d535f029f21 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -90,6 +90,9 @@ struct vfio_iommu_driver_ops {
 					       struct notifier_block *nb);
 	int		(*dma_rw)(void *iommu_data, dma_addr_t user_iova,
 				  void *data, size_t count, bool write);
+	int		(*dynamic_dma_map)(void *iommu_data,
+					   struct iommu_fault *fault,
+					   struct device *dev);
 	struct iommu_domain *(*group_iommu_domain)(void *iommu_data,
 						   struct iommu_group *group);
 };
@@ -153,6 +156,8 @@ extern int vfio_unregister_notifier(struct device *dev,
 struct kvm;
 extern void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
 
+extern int vfio_iommu_dev_fault_handler(struct iommu_fault *fault, void *data);
+
 /*
  * Sub-module helpers
  */
-- 
2.19.1



* [RFC PATCH v1 3/4] vfio: Try to enable IOPF for VFIO devices
  2021-01-25  9:03 [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough Shenming Lu
  2021-01-25  9:03 ` [RFC PATCH v1 1/4] vfio/type1: Add a bitmap to track IOPF mapped pages Shenming Lu
  2021-01-25  9:04 ` [RFC PATCH v1 2/4] vfio: Add a page fault handler Shenming Lu
@ 2021-01-25  9:04 ` Shenming Lu
  2021-01-26 14:33   ` kernel test robot
                     ` (2 more replies)
  2021-01-25  9:04 ` [RFC PATCH v1 4/4] vfio: Allow to pin and map dynamically Shenming Lu
  2021-01-29 22:57 ` [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough Alex Williamson
  4 siblings, 3 replies; 31+ messages in thread
From: Shenming Lu @ 2021-01-25  9:04 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, kvm, linux-kernel
  Cc: Jean-Philippe Brucker, Eric Auger, Lu Baolu, Kevin Tian,
	wanghaibin.wang, yuzenghui, lushenming

If IOMMU_DEV_FEAT_IOPF is set for the VFIO device, which means that
the delivery of page faults of this device from the IOMMU is enabled,
we register the VFIO page fault handler to complete the whole faulting
path (HW+SW). Also add an iopf_enabled field in struct vfio_device to
record it.

Signed-off-by: Shenming Lu <lushenming@huawei.com>
---
 drivers/vfio/vfio.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index ff7797260d0f..fd885d99ee0f 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -97,6 +97,7 @@ struct vfio_device {
 	struct vfio_group		*group;
 	struct list_head		group_next;
 	void				*device_data;
+	bool				iopf_enabled;
 };
 
 #ifdef CONFIG_VFIO_NOIOMMU
@@ -532,6 +533,21 @@ static struct vfio_group *vfio_group_get_from_dev(struct device *dev)
 /**
  * Device objects - create, release, get, put, search
  */
+
+static void vfio_device_enable_iopf(struct vfio_device *device)
+{
+	struct device *dev = device->dev;
+
+	if (!iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_IOPF))
+		return;
+
+	if (WARN_ON(iommu_register_device_fault_handler(dev,
+					vfio_iommu_dev_fault_handler, dev)))
+		return;
+
+	device->iopf_enabled = true;
+}
+
 static
 struct vfio_device *vfio_group_create_device(struct vfio_group *group,
 					     struct device *dev,
@@ -549,6 +565,8 @@ struct vfio_device *vfio_group_create_device(struct vfio_group *group,
 	device->group = group;
 	device->ops = ops;
 	device->device_data = device_data;
+	/* By default try to enable IOPF */
+	vfio_device_enable_iopf(device);
 	dev_set_drvdata(dev, device);
 
 	/* No need to get group_lock, caller has group reference */
@@ -573,6 +591,8 @@ static void vfio_device_release(struct kref *kref)
 	mutex_unlock(&group->device_lock);
 
 	dev_set_drvdata(device->dev, NULL);
+	if (device->iopf_enabled)
+		WARN_ON(iommu_unregister_device_fault_handler(device->dev));
 
 	kfree(device);
 
-- 
2.19.1



* [RFC PATCH v1 4/4] vfio: Allow to pin and map dynamically
  2021-01-25  9:03 [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough Shenming Lu
                   ` (2 preceding siblings ...)
  2021-01-25  9:04 ` [RFC PATCH v1 3/4] vfio: Try to enable IOPF for VFIO devices Shenming Lu
@ 2021-01-25  9:04 ` Shenming Lu
  2021-01-29 22:57 ` [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough Alex Williamson
  4 siblings, 0 replies; 31+ messages in thread
From: Shenming Lu @ 2021-01-25  9:04 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, kvm, linux-kernel
  Cc: Jean-Philippe Brucker, Eric Auger, Lu Baolu, Kevin Tian,
	wanghaibin.wang, yuzenghui, lushenming

If IOPF is enabled for the whole VFIO container, there is no need to
statically pin and map the entire DMA range; we can do it on demand.
And unmap and unpin according to the IOPF mapped bitmap when removing
the DMA mapping.

Signed-off-by: Shenming Lu <lushenming@huawei.com>
---
 drivers/vfio/vfio.c             | 20 +++++++++++
 drivers/vfio/vfio_iommu_type1.c | 61 ++++++++++++++++++++++++++++++++-
 include/linux/vfio.h            |  1 +
 3 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index fd885d99ee0f..466959f4d661 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -2404,6 +2404,26 @@ int vfio_iommu_dev_fault_handler(struct iommu_fault *fault, void *data)
 }
 EXPORT_SYMBOL_GPL(vfio_iommu_dev_fault_handler);
 
+/*
+ * Return 0 if IOPF is enabled (nonzero stops iommu_group_for_each_dev()).
+ */
+int vfio_device_iopf_enabled(struct device *dev, void *data)
+{
+	struct vfio_device *device;
+	int ret = 0;
+
+	device = vfio_device_get_from_dev(dev);
+	if (!device)
+		return -ENODEV;
+
+	if (!device->iopf_enabled)
+		ret = 1;
+
+	vfio_device_put(device);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(vfio_device_iopf_enabled);
+
 /**
  * Module/class support
  */
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index ac6f00c97897..da84155513e4 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -864,6 +864,43 @@ static size_t unmap_unpin_slow(struct vfio_domain *domain,
 	return unmapped;
 }
 
+static long vfio_clear_iommu_mapped_bitmap(struct vfio_iommu *iommu,
+					   struct vfio_dma *dma,
+					   bool do_accounting)
+{
+	dma_addr_t iova = dma->iova;
+	size_t size = dma->size;
+	uint64_t i, npages = size / PAGE_SIZE;
+	long unlocked = 0;
+
+	for (i = 0; i < npages; i++, iova += PAGE_SIZE) {
+		if (IOMMU_MAPPED_BITMAP_GET(dma, i)) {
+			struct vfio_domain *d;
+			phys_addr_t phys;
+
+			d = list_first_entry(&iommu->domain_list,
+					     struct vfio_domain, next);
+			phys = iommu_iova_to_phys(d->domain, iova);
+			if (WARN_ON(!phys))
+				continue;
+
+			list_for_each_entry(d, &iommu->domain_list, next) {
+				iommu_unmap(d->domain, iova, PAGE_SIZE);
+				cond_resched();
+			}
+			vfio_unpin_pages_remote(dma, iova, phys >> PAGE_SHIFT,
+						1, do_accounting);
+
+			bitmap_clear(dma->iommu_mapped_bitmap, i, 1);
+			unlocked++;
+		}
+	}
+
+	if (do_accounting)
+		return 0;
+	return unlocked;
+}
+
 static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 			     bool do_accounting)
 {
@@ -880,6 +917,10 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 	if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
 		return 0;
 
+	if (!dma->iommu_mapped)
+		return vfio_clear_iommu_mapped_bitmap(iommu, dma,
+						      do_accounting);
+
 	/*
 	 * We use the IOMMU to track the physical addresses, otherwise we'd
 	 * need a much more complicated tracking system.  Unfortunately that
@@ -1302,6 +1343,23 @@ static bool vfio_iommu_iova_dma_valid(struct vfio_iommu *iommu,
 	return list_empty(iova);
 }
 
+static bool vfio_iommu_iopf_enabled(struct vfio_iommu *iommu)
+{
+	struct vfio_domain *d;
+
+	list_for_each_entry(d, &iommu->domain_list, next) {
+		struct vfio_group *g;
+
+		list_for_each_entry(g, &d->group_list, next) {
+			if (iommu_group_for_each_dev(g->iommu_group, NULL,
+						     vfio_device_iopf_enabled))
+				return false;
+		}
+	}
+
+	return true;
+}
+
 static int vfio_dma_do_map(struct vfio_iommu *iommu,
 			   struct vfio_iommu_type1_dma_map *map)
 {
@@ -1408,7 +1466,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	vfio_link_dma(iommu, dma);
 
 	/* Don't pin and map if container doesn't contain IOMMU capable domain*/
-	if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu))
+	if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) ||
+	    vfio_iommu_iopf_enabled(iommu))
 		dma->size = size;
 	else
 		ret = vfio_pin_map_dma(iommu, dma, size);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 6d535f029f21..cea1e9fd4bb4 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -157,6 +157,7 @@ struct kvm;
 extern void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
 
 extern int vfio_iommu_dev_fault_handler(struct iommu_fault *fault, void *data);
+extern int vfio_device_iopf_enabled(struct device *dev, void *data);
 
 /*
  * Sub-module helpers
-- 
2.19.1



* Re: [RFC PATCH v1 3/4] vfio: Try to enable IOPF for VFIO devices
  2021-01-25  9:04 ` [RFC PATCH v1 3/4] vfio: Try to enable IOPF for VFIO devices Shenming Lu
@ 2021-01-26 14:33   ` kernel test robot
  2021-01-27  6:31   ` kernel test robot
  2021-01-29 22:42   ` Alex Williamson
  2 siblings, 0 replies; 31+ messages in thread
From: kernel test robot @ 2021-01-26 14:33 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2564 bytes --]

Hi Shenming,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on vfio/next]
[also build test ERROR on v5.11-rc5 next-20210125]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Shenming-Lu/vfio-type1-Add-a-bitmap-to-track-IOPF-mapped-pages/20210126-141137
base:   https://github.com/awilliam/linux-vfio.git next
config: arm-randconfig-r014-20210126 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 925ae8c790c7e354f12ec14a6cac6aa49fc75b29)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm cross compiling tool for clang build
        # apt-get install binutils-arm-linux-gnueabi
        # https://github.com/0day-ci/linux/commit/275e8cc280f586d526fb8bf83f6b1575e52256a1
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Shenming-Lu/vfio-type1-Add-a-bitmap-to-track-IOPF-mapped-pages/20210126-141137
        git checkout 275e8cc280f586d526fb8bf83f6b1575e52256a1
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> drivers/vfio/vfio.c:541:34: error: use of undeclared identifier 'IOMMU_DEV_FEAT_IOPF'
           if (!iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_IOPF))
                                           ^
   1 error generated.


vim +/IOMMU_DEV_FEAT_IOPF +541 drivers/vfio/vfio.c

   532	
   533	/**
   534	 * Device objects - create, release, get, put, search
   535	 */
   536	
   537	static void vfio_device_enable_iopf(struct vfio_device *device)
   538	{
   539		struct device *dev = device->dev;
   540	
 > 541		if (!iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_IOPF))
   542			return;
   543	
   544		if (WARN_ON(iommu_register_device_fault_handler(dev,
   545						vfio_iommu_dev_fault_handler, dev)))
   546			return;
   547	
   548		device->iopf_enabled = true;
   549	}
   550	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 36340 bytes --]


* Re: [RFC PATCH v1 3/4] vfio: Try to enable IOPF for VFIO devices
  2021-01-25  9:04 ` [RFC PATCH v1 3/4] vfio: Try to enable IOPF for VFIO devices Shenming Lu
  2021-01-26 14:33   ` kernel test robot
@ 2021-01-27  6:31   ` kernel test robot
  2021-01-29 22:42   ` Alex Williamson
  2 siblings, 0 replies; 31+ messages in thread
From: kernel test robot @ 2021-01-27  6:31 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2446 bytes --]

Hi Shenming,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on vfio/next]
[also build test ERROR on v5.11-rc5 next-20210125]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Shenming-Lu/vfio-type1-Add-a-bitmap-to-track-IOPF-mapped-pages/20210126-141137
base:   https://github.com/awilliam/linux-vfio.git next
config: x86_64-rhel-8.3 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/275e8cc280f586d526fb8bf83f6b1575e52256a1
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Shenming-Lu/vfio-type1-Add-a-bitmap-to-track-IOPF-mapped-pages/20210126-141137
        git checkout 275e8cc280f586d526fb8bf83f6b1575e52256a1
        # save the attached .config to linux build tree
        make W=1 ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/vfio/vfio.c: In function 'vfio_device_enable_iopf':
>> drivers/vfio/vfio.c:541:34: error: 'IOMMU_DEV_FEAT_IOPF' undeclared (first use in this function); did you mean 'IOMMU_DEV_FEAT_SVA'?
     541 |  if (!iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_IOPF))
         |                                  ^~~~~~~~~~~~~~~~~~~
         |                                  IOMMU_DEV_FEAT_SVA
   drivers/vfio/vfio.c:541:34: note: each undeclared identifier is reported only once for each function it appears in


vim +541 drivers/vfio/vfio.c

   532	
   533	/**
   534	 * Device objects - create, release, get, put, search
   535	 */
   536	
   537	static void vfio_device_enable_iopf(struct vfio_device *device)
   538	{
   539		struct device *dev = device->dev;
   540	
 > 541		if (!iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_IOPF))
   542			return;
   543	
   544		if (WARN_ON(iommu_register_device_fault_handler(dev,
   545						vfio_iommu_dev_fault_handler, dev)))
   546			return;
   547	
   548		device->iopf_enabled = true;
   549	}
   550	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 40834 bytes --]


* Re: [RFC PATCH v1 2/4] vfio: Add a page fault handler
  2021-01-25  9:04 ` [RFC PATCH v1 2/4] vfio: Add a page fault handler Shenming Lu
@ 2021-01-27 17:42   ` Christoph Hellwig
  2021-01-28  6:10     ` Shenming Lu
  0 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2021-01-27 17:42 UTC (permalink / raw)
  To: Shenming Lu
  Cc: Alex Williamson, Cornelia Huck, kvm, linux-kernel,
	Jean-Philippe Brucker, Eric Auger, Lu Baolu, Kevin Tian,
	wanghaibin.wang, yuzenghui

On Mon, Jan 25, 2021 at 05:04:00PM +0800, Shenming Lu wrote:
> +EXPORT_SYMBOL_GPL(vfio_iommu_dev_fault_handler);

This function is only used in vfio.c itself, so it should not be
exported, but rather marked static.


* Re: [RFC PATCH v1 2/4] vfio: Add a page fault handler
  2021-01-27 17:42   ` Christoph Hellwig
@ 2021-01-28  6:10     ` Shenming Lu
  0 siblings, 0 replies; 31+ messages in thread
From: Shenming Lu @ 2021-01-28  6:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Alex Williamson, Cornelia Huck, kvm, linux-kernel,
	Jean-Philippe Brucker, Eric Auger, Lu Baolu, Kevin Tian,
	wanghaibin.wang, yuzenghui

On 2021/1/28 1:42, Christoph Hellwig wrote:
> On Mon, Jan 25, 2021 at 05:04:00PM +0800, Shenming Lu wrote:
>> +EXPORT_SYMBOL_GPL(vfio_iommu_dev_fault_handler);
> 
> This function is only used in vfio.c itself, so it should not be
> exported, but rather marked static.
> .
> 

Yeah, it makes sense. Thanks,

Shenming


* Re: [RFC PATCH v1 3/4] vfio: Try to enable IOPF for VFIO devices
  2021-01-25  9:04 ` [RFC PATCH v1 3/4] vfio: Try to enable IOPF for VFIO devices Shenming Lu
  2021-01-26 14:33   ` kernel test robot
  2021-01-27  6:31   ` kernel test robot
@ 2021-01-29 22:42   ` Alex Williamson
  2021-01-30  9:31     ` Shenming Lu
  2 siblings, 1 reply; 31+ messages in thread
From: Alex Williamson @ 2021-01-29 22:42 UTC (permalink / raw)
  To: Shenming Lu
  Cc: Cornelia Huck, kvm, linux-kernel, Jean-Philippe Brucker,
	Eric Auger, Lu Baolu, Kevin Tian, wanghaibin.wang, yuzenghui

On Mon, 25 Jan 2021 17:04:01 +0800
Shenming Lu <lushenming@huawei.com> wrote:

> If IOMMU_DEV_FEAT_IOPF is set for the VFIO device, which means that
> the delivery of page faults of this device from the IOMMU is enabled,
> we register the VFIO page fault handler to complete the whole faulting
> path (HW+SW). Also add an iopf_enabled field in struct vfio_device to
> record it.
> 
> Signed-off-by: Shenming Lu <lushenming@huawei.com>
> ---
>  drivers/vfio/vfio.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index ff7797260d0f..fd885d99ee0f 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -97,6 +97,7 @@ struct vfio_device {
>  	struct vfio_group		*group;
>  	struct list_head		group_next;
>  	void				*device_data;
> +	bool				iopf_enabled;
>  };
>  
>  #ifdef CONFIG_VFIO_NOIOMMU
> @@ -532,6 +533,21 @@ static struct vfio_group *vfio_group_get_from_dev(struct device *dev)
>  /**
>   * Device objects - create, release, get, put, search
>   */
> +
> +static void vfio_device_enable_iopf(struct vfio_device *device)
> +{
> +	struct device *dev = device->dev;
> +
> +	if (!iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_IOPF))
> +		return;
> +
> +	if (WARN_ON(iommu_register_device_fault_handler(dev,
> +					vfio_iommu_dev_fault_handler, dev)))

The layering here is wrong, vfio-core doesn't manage the IOMMU, we have
backend IOMMU drivers for that.  We can't even assume we have IOMMU API
support here, that's what the type1 backend handles.  Thanks,

Alex

> +		return;
> +
> +	device->iopf_enabled = true;
> +}
> +
>  static
>  struct vfio_device *vfio_group_create_device(struct vfio_group *group,
>  					     struct device *dev,
> @@ -549,6 +565,8 @@ struct vfio_device *vfio_group_create_device(struct vfio_group *group,
>  	device->group = group;
>  	device->ops = ops;
>  	device->device_data = device_data;
> +	/* By default try to enable IOPF */
> +	vfio_device_enable_iopf(device);
>  	dev_set_drvdata(dev, device);
>  
>  	/* No need to get group_lock, caller has group reference */
> @@ -573,6 +591,8 @@ static void vfio_device_release(struct kref *kref)
>  	mutex_unlock(&group->device_lock);
>  
>  	dev_set_drvdata(device->dev, NULL);
> +	if (device->iopf_enabled)
> +		WARN_ON(iommu_unregister_device_fault_handler(device->dev));
>  
>  	kfree(device);
>  



* Re: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
  2021-01-25  9:03 [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough Shenming Lu
                   ` (3 preceding siblings ...)
  2021-01-25  9:04 ` [RFC PATCH v1 4/4] vfio: Allow to pin and map dynamically Shenming Lu
@ 2021-01-29 22:57 ` Alex Williamson
  2021-01-30  9:30   ` Shenming Lu
  2021-02-01  7:56   ` Tian, Kevin
  4 siblings, 2 replies; 31+ messages in thread
From: Alex Williamson @ 2021-01-29 22:57 UTC (permalink / raw)
  To: Shenming Lu
  Cc: Cornelia Huck, kvm, linux-kernel, Jean-Philippe Brucker,
	Eric Auger, Lu Baolu, Kevin Tian, wanghaibin.wang, yuzenghui

On Mon, 25 Jan 2021 17:03:58 +0800
Shenming Lu <lushenming@huawei.com> wrote:

> Hi,
> 
> The static pinning and mapping problem in VFIO and possible solutions
> have been discussed a lot [1, 2]. One of the solutions is to add I/O
> page fault support for VFIO devices. Different from those relatively
> complicated software approaches such as presenting a vIOMMU that provides
> the DMA buffer information (might include para-virtualized optimizations),
> IOPF mainly depends on the hardware faulting capability, such as the PCIe
> PRI extension or Arm SMMU stall model. What's more, the IOPF support in
> the IOMMU driver is being implemented in SVA [3]. So shall we consider
> adding IOPF support for VFIO passthrough based on the IOPF part of SVA
> at present?
> 
> We have implemented a basic demo only for one stage of translation (GPA
> -> HPA in virtualization, note that it can be configured at either stage),  
> and tested it on a Hisilicon Kunpeng920 board. The nested mode is more complicated
> since VFIO only handles the second stage page faults (same as the non-nested
> case), while the first stage page faults need to be further delivered to
> the guest, which is being implemented in [4] on ARM. My thought on this
> is to report the page faults to VFIO regardless of the occurred stage (try
> to carry the stage information), and handle them respectively according to the
> configured mode in VFIO. Or the IOMMU driver might evolve to support more...
> 
> Might TODO:
>  - Optimize the faulting path, and measure the performance (it might still
>    be a big issue).
>  - Add support for PRI.
>  - Add an MMU notifier to avoid pinning.
>  - Add support for the nested mode.
> ...
> 
> Any comments and suggestions are very welcome. :-)

I expect performance to be pretty bad here, the lookup involved per
fault is excessive.  There are cases where a user is not going to be
willing to have a slow ramp up of performance for their devices as they
fault in pages, so we might need to consider making this
configurable through the vfio interface.  Our page mapping also only
grows here, should mappings expire or do we need a least recently
mapped tracker to avoid exceeding the user's locked memory limit?  How
does a user know what to set for a locked memory limit?  The behavior
here would lead to cases where an idle system might be ok, but as soon
as load increases with more inflight DMA, we start seeing
"unpredictable" I/O faults from the user perspective.  Seems like there
are lots of outstanding considerations and I'd also like to hear from
the SVA folks about how this meshes with their work.  Thanks,

Alex



* Re: [RFC PATCH v1 1/4] vfio/type1: Add a bitmap to track IOPF mapped pages
  2021-01-25  9:03 ` [RFC PATCH v1 1/4] vfio/type1: Add a bitmap to track IOPF mapped pages Shenming Lu
@ 2021-01-29 22:58   ` Alex Williamson
  2021-01-30  9:31     ` Shenming Lu
  0 siblings, 1 reply; 31+ messages in thread
From: Alex Williamson @ 2021-01-29 22:58 UTC (permalink / raw)
  To: Shenming Lu
  Cc: Cornelia Huck, kvm, linux-kernel, Jean-Philippe Brucker,
	Eric Auger, Lu Baolu, Kevin Tian, wanghaibin.wang, yuzenghui

On Mon, 25 Jan 2021 17:03:59 +0800
Shenming Lu <lushenming@huawei.com> wrote:

> When IOPF is enabled, the pages are pinned and mapped on demand; we add
> a bitmap to track them.
> 
> Signed-off-by: Shenming Lu <lushenming@huawei.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 0b4dedaa9128..f1d4de5ab094 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -95,6 +95,7 @@ struct vfio_dma {
>  	struct task_struct	*task;
>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
>  	unsigned long		*bitmap;
> +	unsigned long		*iommu_mapped_bitmap;	/* IOPF mapped bitmap */
>  };
>  
>  struct vfio_group {
> @@ -143,6 +144,8 @@ struct vfio_regions {
>  #define DIRTY_BITMAP_PAGES_MAX	 ((u64)INT_MAX)
>  #define DIRTY_BITMAP_SIZE_MAX	 DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
>  
> +#define IOMMU_MAPPED_BITMAP_BYTES(n) DIRTY_BITMAP_BYTES(n)
> +
>  static int put_pfn(unsigned long pfn, int prot);
>  
>  static struct vfio_group *vfio_iommu_find_iommu_group(struct vfio_iommu *iommu,
> @@ -949,6 +952,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>  	vfio_unlink_dma(iommu, dma);
>  	put_task_struct(dma->task);
>  	vfio_dma_bitmap_free(dma);
> +	kfree(dma->iommu_mapped_bitmap);
>  	kfree(dma);
>  	iommu->dma_avail++;
>  }
> @@ -1354,6 +1358,14 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>  		goto out_unlock;
>  	}
>  
> +	dma->iommu_mapped_bitmap = kvzalloc(IOMMU_MAPPED_BITMAP_BYTES(size / PAGE_SIZE),
> +					    GFP_KERNEL);

This is a lot of bloat for all the platforms that don't support this
feature.  Thanks,

Alex

> +	if (!dma->iommu_mapped_bitmap) {
> +		ret = -ENOMEM;
> +		kfree(dma);
> +		goto out_unlock;
> +	}
> +
>  	iommu->dma_avail--;
>  	dma->iova = iova;
>  	dma->vaddr = vaddr;



* Re: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
  2021-01-29 22:57 ` [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough Alex Williamson
@ 2021-01-30  9:30   ` Shenming Lu
  2021-02-01  7:56   ` Tian, Kevin
  1 sibling, 0 replies; 31+ messages in thread
From: Shenming Lu @ 2021-01-30  9:30 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Cornelia Huck, kvm, linux-kernel, Jean-Philippe Brucker,
	Eric Auger, Lu Baolu, Kevin Tian, wanghaibin.wang, yuzenghui

On 2021/1/30 6:57, Alex Williamson wrote:
> On Mon, 25 Jan 2021 17:03:58 +0800
> Shenming Lu <lushenming@huawei.com> wrote:
> 
>> Hi,
>>
>> The static pinning and mapping problem in VFIO and possible solutions
>> have been discussed a lot [1, 2]. One of the solutions is to add I/O
>> page fault support for VFIO devices. Different from those relatively
>> complicated software approaches such as presenting a vIOMMU that provides
>> the DMA buffer information (might include para-virtualized optimizations),
>> IOPF mainly depends on the hardware faulting capability, such as the PCIe
>> PRI extension or Arm SMMU stall model. What's more, the IOPF support in
>> the IOMMU driver is being implemented in SVA [3]. So shall we consider
>> adding IOPF support for VFIO passthrough based on the IOPF part of SVA
>> at present?
>>
>> We have implemented a basic demo only for one stage of translation (GPA
>> -> HPA in virtualization, note that it can be configured at either stage),  
>> and tested it on a Hisilicon Kunpeng920 board. The nested mode is more complicated
>> since VFIO only handles the second stage page faults (same as the non-nested
>> case), while the first stage page faults need to be further delivered to
>> the guest, which is being implemented in [4] on ARM. My thought on this
>> is to report the page faults to VFIO regardless of the occurred stage (try
>> to carry the stage information), and handle them respectively according to the
>> configured mode in VFIO. Or the IOMMU driver might evolve to support more...
>>
>> Might TODO:
>>  - Optimize the faulting path, and measure the performance (it might still
>>    be a big issue).
>>  - Add support for PRI.
>>  - Add an MMU notifier to avoid pinning.
>>  - Add support for the nested mode.
>> ...
>>
>> Any comments and suggestions are very welcome. :-)
> 
> I expect performance to be pretty bad here, the lookup involved per
> fault is excessive.

We might consider prepinning more pages as a further optimization.
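
For example, a minimal sketch of prepinning a window after the faulting
page (the window size is arbitrary, and vfio_pin_map_one_page() is a
made-up helper wrapping the pin+map steps of the fault handler):

        #define IOPF_PREFAULT_PAGES     16      /* arbitrary, needs tuning */

        /*
         * The faulting page at 'iova' has just been pinned and mapped;
         * speculatively pin+map the next few pages of the same vfio_dma.
         */
        dma_addr_t limit = min_t(dma_addr_t, dma->iova + dma->size,
                                 iova + IOPF_PREFAULT_PAGES * PAGE_SIZE);
        dma_addr_t next;

        for (next = iova + PAGE_SIZE; next < limit; next += PAGE_SIZE)
                if (vfio_pin_map_one_page(iommu, dma, next))    /* made up */
                        break;  /* best effort: stop at the first failure */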

> There are cases where a user is not going to be
> willing to have a slow ramp up of performance for their devices as they
> fault in pages, so we might need to consider making this
> configurable through the vfio interface.

Yeah, makes sense, I will try to implement this: maybe add an ioctl called
VFIO_IOMMU_ENABLE_IOPF for Type1 VFIO IOMMU...
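
Just a sketch of what the uapi side could look like (the ioctl number and
layout here are placeholders, not final):

        /*
         * VFIO_IOMMU_ENABLE_IOPF (hypothetical) - enable I/O page fault
         * handling for a Type1 container; intended to be called before
         * VFIO_IOMMU_MAP_DMA.
         */
        struct vfio_iommu_type1_enable_iopf {
                __u32   argsz;
                __u32   flags;
        };
        #define VFIO_IOMMU_ENABLE_IOPF  _IO(VFIO_TYPE, VFIO_BASE + 18)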

> Our page mapping also only
> grows here, should mappings expire or do we need a least recently
> mapped tracker to avoid exceeding the user's locked memory limit?  How
> does a user know what to set for a locked memory limit?

Yeah, we can add an LRU (mapped) tracker to release the pages when exceeding
a memory limit, and maybe have a thread to periodically check this.
As for the memory limit, maybe we could give the user some levels
(10% (default)/30%/50%/70%/unlimited of the total user memory (mapping size))
to choose from via the VFIO_IOMMU_ENABLE_IOPF ioctl...
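
A very rough sketch of the tracker side (all names below are invented,
and locking/accounting are elided):

        /* Hypothetical LRU of IOPF-mapped pages, one node per mapped page. */
        struct vfio_iopf_mapped_page {
                struct list_head        lru;
                struct vfio_dma         *dma;
                dma_addr_t              iova;
        };

        /*
         * Fault path: after pin+map, list_add_tail() the node.
         * Reclaim path (periodic worker): unmap and unpin from the head
         * while the container is over the configured limit.
         */
        while (iommu->iopf_mapped_pages > iommu->iopf_mapped_limit) {
                struct vfio_iopf_mapped_page *p;

                p = list_first_entry(&iommu->iopf_lru,
                                     struct vfio_iopf_mapped_page, lru);
                list_del(&p->lru);
                vfio_iopf_unmap_unpin_one(iommu, p);    /* made-up helper */
                kfree(p);
        }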

> The behavior
> here would lead to cases where an idle system might be ok, but as soon
> as load increases with more inflight DMA, we start seeing
> "unpredictable" I/O faults from the user perspective.

"unpredictable" I/O faults? We might see more problems after more testing...

Thanks,
Shenming

> Seems like there
> are lots of outstanding considerations and I'd also like to hear from
> the SVA folks about how this meshes with their work.  Thanks,
> 
> Alex
> 
> .
>


* Re: [RFC PATCH v1 1/4] vfio/type1: Add a bitmap to track IOPF mapped pages
  2021-01-29 22:58   ` Alex Williamson
@ 2021-01-30  9:31     ` Shenming Lu
  0 siblings, 0 replies; 31+ messages in thread
From: Shenming Lu @ 2021-01-30  9:31 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Cornelia Huck, kvm, linux-kernel, Jean-Philippe Brucker,
	Eric Auger, Lu Baolu, Kevin Tian, wanghaibin.wang, yuzenghui

On 2021/1/30 6:58, Alex Williamson wrote:
> On Mon, 25 Jan 2021 17:03:59 +0800
> Shenming Lu <lushenming@huawei.com> wrote:
> 
>> When IOPF is enabled, the pages are pinned and mapped on demand; we add
>> a bitmap to track them.
>>
>> Signed-off-by: Shenming Lu <lushenming@huawei.com>
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 12 ++++++++++++
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 0b4dedaa9128..f1d4de5ab094 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -95,6 +95,7 @@ struct vfio_dma {
>>  	struct task_struct	*task;
>>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
>>  	unsigned long		*bitmap;
>> +	unsigned long		*iommu_mapped_bitmap;	/* IOPF mapped bitmap */
>>  };
>>  
>>  struct vfio_group {
>> @@ -143,6 +144,8 @@ struct vfio_regions {
>>  #define DIRTY_BITMAP_PAGES_MAX	 ((u64)INT_MAX)
>>  #define DIRTY_BITMAP_SIZE_MAX	 DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
>>  
>> +#define IOMMU_MAPPED_BITMAP_BYTES(n) DIRTY_BITMAP_BYTES(n)
>> +
>>  static int put_pfn(unsigned long pfn, int prot);
>>  
>>  static struct vfio_group *vfio_iommu_find_iommu_group(struct vfio_iommu *iommu,
>> @@ -949,6 +952,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>  	vfio_unlink_dma(iommu, dma);
>>  	put_task_struct(dma->task);
>>  	vfio_dma_bitmap_free(dma);
>> +	kfree(dma->iommu_mapped_bitmap);
>>  	kfree(dma);
>>  	iommu->dma_avail++;
>>  }
>> @@ -1354,6 +1358,14 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>  		goto out_unlock;
>>  	}
>>  
>> +	dma->iommu_mapped_bitmap = kvzalloc(IOMMU_MAPPED_BITMAP_BYTES(size / PAGE_SIZE),
>> +					    GFP_KERNEL);
> 
> This is a lot of bloat for all the platforms that don't support this
> feature.  Thanks,

Yes, I will make this dedicated to IOPF.

Thanks,
Shenming

> 
> Alex
> 
>> +	if (!dma->iommu_mapped_bitmap) {
>> +		ret = -ENOMEM;
>> +		kfree(dma);
>> +		goto out_unlock;
>> +	}
>> +
>>  	iommu->dma_avail--;
>>  	dma->iova = iova;
>>  	dma->vaddr = vaddr;
> 
> .
> 


* Re: [RFC PATCH v1 3/4] vfio: Try to enable IOPF for VFIO devices
  2021-01-29 22:42   ` Alex Williamson
@ 2021-01-30  9:31     ` Shenming Lu
  0 siblings, 0 replies; 31+ messages in thread
From: Shenming Lu @ 2021-01-30  9:31 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Cornelia Huck, kvm, linux-kernel, Jean-Philippe Brucker,
	Eric Auger, Lu Baolu, Kevin Tian, wanghaibin.wang, yuzenghui

On 2021/1/30 6:42, Alex Williamson wrote:
> On Mon, 25 Jan 2021 17:04:01 +0800
> Shenming Lu <lushenming@huawei.com> wrote:
> 
>> If IOMMU_DEV_FEAT_IOPF is set for the VFIO device, which means that
>> the delivery of page faults of this device from the IOMMU is enabled,
>> we register the VFIO page fault handler to complete the whole faulting
>> path (HW+SW). Also add an iopf_enabled field in struct vfio_device to
>> record it.
>>
>> Signed-off-by: Shenming Lu <lushenming@huawei.com>
>> ---
>>  drivers/vfio/vfio.c | 20 ++++++++++++++++++++
>>  1 file changed, 20 insertions(+)
>>
>> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
>> index ff7797260d0f..fd885d99ee0f 100644
>> --- a/drivers/vfio/vfio.c
>> +++ b/drivers/vfio/vfio.c
>> @@ -97,6 +97,7 @@ struct vfio_device {
>>  	struct vfio_group		*group;
>>  	struct list_head		group_next;
>>  	void				*device_data;
>> +	bool				iopf_enabled;
>>  };
>>  
>>  #ifdef CONFIG_VFIO_NOIOMMU
>> @@ -532,6 +533,21 @@ static struct vfio_group *vfio_group_get_from_dev(struct device *dev)
>>  /**
>>   * Device objects - create, release, get, put, search
>>   */
>> +
>> +static void vfio_device_enable_iopf(struct vfio_device *device)
>> +{
>> +	struct device *dev = device->dev;
>> +
>> +	if (!iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_IOPF))
>> +		return;
>> +
>> +	if (WARN_ON(iommu_register_device_fault_handler(dev,
>> +					vfio_iommu_dev_fault_handler, dev)))
> 
> The layering here is wrong, vfio-core doesn't manage the IOMMU, we have
> backend IOMMU drivers for that.  We can't even assume we have IOMMU API
> support here, that's what the type1 backend handles.  Thanks,

Thanks for pointing it out, I will correct it: maybe do the enabling via
the VFIO_IOMMU_ENABLE_IOPF ioctl mentioned in the cover, suggest that the
user call it before VFIO_IOMMU_MAP_DMA, and also move the iopf_enabled field
from struct vfio_device to struct vfio_iommu...

Thanks,
Shenming

> 
> Alex
> 
>> +		return;
>> +
>> +	device->iopf_enabled = true;
>> +}
>> +
>>  static
>>  struct vfio_device *vfio_group_create_device(struct vfio_group *group,
>>  					     struct device *dev,
>> @@ -549,6 +565,8 @@ struct vfio_device *vfio_group_create_device(struct vfio_group *group,
>>  	device->group = group;
>>  	device->ops = ops;
>>  	device->device_data = device_data;
>> +	/* By default try to enable IOPF */
>> +	vfio_device_enable_iopf(device);
>>  	dev_set_drvdata(dev, device);
>>  
>>  	/* No need to get group_lock, caller has group reference */
>> @@ -573,6 +591,8 @@ static void vfio_device_release(struct kref *kref)
>>  	mutex_unlock(&group->device_lock);
>>  
>>  	dev_set_drvdata(device->dev, NULL);
>> +	if (device->iopf_enabled)
>> +		WARN_ON(iommu_unregister_device_fault_handler(device->dev));
>>  
>>  	kfree(device);
>>  
> 
> .
> 


* RE: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
  2021-01-29 22:57 ` [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough Alex Williamson
  2021-01-30  9:30   ` Shenming Lu
@ 2021-02-01  7:56   ` Tian, Kevin
  2021-02-02  6:41     ` Shenming Lu
  1 sibling, 1 reply; 31+ messages in thread
From: Tian, Kevin @ 2021-02-01  7:56 UTC (permalink / raw)
  To: Alex Williamson, Shenming Lu
  Cc: Cornelia Huck, kvm, linux-kernel, Jean-Philippe Brucker,
	Eric Auger, Lu Baolu, wanghaibin.wang, yuzenghui

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Saturday, January 30, 2021 6:58 AM
> 
> On Mon, 25 Jan 2021 17:03:58 +0800
> Shenming Lu <lushenming@huawei.com> wrote:
> 
> > Hi,
> >
> > The static pinning and mapping problem in VFIO and possible solutions
> > have been discussed a lot [1, 2]. One of the solutions is to add I/O
> > page fault support for VFIO devices. Different from those relatively
> > complicated software approaches such as presenting a vIOMMU that
> provides
> > the DMA buffer information (might include para-virtualized optimizations),
> > IOPF mainly depends on the hardware faulting capability, such as the PCIe
> > PRI extension or Arm SMMU stall model. What's more, the IOPF support in
> > the IOMMU driver is being implemented in SVA [3]. So shall we consider
> > adding IOPF support for VFIO passthrough based on the IOPF part of SVA
> > at present?
> >
> > We have implemented a basic demo only for one stage of translation (GPA
> > -> HPA in virtualization, note that it can be configured at either stage),
> > and tested it on a Hisilicon Kunpeng920 board. The nested mode is more
> complicated
> > since VFIO only handles the second stage page faults (same as the non-
> nested
> > case), while the first stage page faults need to be further delivered to
> > the guest, which is being implemented in [4] on ARM. My thought on this
> > is to report the page faults to VFIO regardless of the occurred stage (try
> > to carry the stage information), and handle them respectively according to the
> > configured mode in VFIO. Or the IOMMU driver might evolve to support
> more...
> >
> > Might TODO:
> >  - Optimize the faulting path, and measure the performance (it might still
> >    be a big issue).
> >  - Add support for PRI.
> >  - Add an MMU notifier to avoid pinning.
> >  - Add support for the nested mode.
> > ...
> >
> > Any comments and suggestions are very welcome. :-)
> 
> I expect performance to be pretty bad here, the lookup involved per
> fault is excessive.  There are cases where a user is not going to be
> willing to have a slow ramp up of performance for their devices as they
> fault in pages, so we might need to consider making this
> configurable through the vfio interface.  Our page mapping also only

There is another factor to be considered. The presence of
IOMMU_DEV_FEAT_IOPF just indicates the device's capability of triggering
I/O page faults through the IOMMU, but does not exactly mean that the
device can tolerate I/O page faults for arbitrary DMA requests. In
reality, many devices allow I/O faulting only in selective contexts.
However, there is no standard way (e.g. from the PCISIG) for the device
to report whether arbitrary I/O faults are allowed. Then we may have to
maintain device-specific knowledge in software, e.g. in an opt-in table
listing devices which allow arbitrary faults. For devices which only
support selective faulting, a mediator (either through vendor extensions
on vfio-pci-core or an mdev wrapper) might be necessary to help lock down
non-faultable mappings and then enable faulting on the remaining
mappings.
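
Just to illustrate with an entirely hypothetical interface: the mediator
could publish which IOVA ranges of the device may fault, and the fault
path would then reject anything that should have stayed locked down:

        /* Hypothetical check at the top of the VFIO fault handler. */
        if (!vfio_dev_iova_may_fault(dev, fault->prm.addr))     /* made up */
                goto out_invalid;       /* respond IOMMU_PAGE_RESP_INVALID */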

> grows here, should mappings expire or do we need a least recently
> mapped tracker to avoid exceeding the user's locked memory limit?  How
> does a user know what to set for a locked memory limit?  The behavior
> here would lead to cases where an idle system might be ok, but as soon
> as load increases with more inflight DMA, we start seeing
> "unpredictable" I/O faults from the user perspective.  Seems like there
> are lots of outstanding considerations and I'd also like to hear from
> the SVA folks about how this meshes with their work.  Thanks,
> 

The main overlap between this feature and SVA is the IOPF reporting
framework, which currently still has a gap in supporting both in nested
mode, as discussed here:

https://lore.kernel.org/linux-acpi/YAaxjmJW+ZMvrhac@myrica/

Once that gap is resolved in the future, the VFIO fault handler just
adopts different actions according to the fault level: 1st-level faults
are forwarded to userspace through the vSVA path while 2nd-level faults
are fixed (or warned about if not intended) by VFIO itself through the
IOMMU mapping interface.

Thanks
Kevin


* Re: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
  2021-02-01  7:56   ` Tian, Kevin
@ 2021-02-02  6:41     ` Shenming Lu
  2021-02-04  6:52       ` Tian, Kevin
  0 siblings, 1 reply; 31+ messages in thread
From: Shenming Lu @ 2021-02-02  6:41 UTC (permalink / raw)
  To: Tian, Kevin, Alex Williamson
  Cc: Cornelia Huck, kvm, linux-kernel, Jean-Philippe Brucker,
	Eric Auger, Lu Baolu, wanghaibin.wang, yuzenghui

On 2021/2/1 15:56, Tian, Kevin wrote:
>> From: Alex Williamson <alex.williamson@redhat.com>
>> Sent: Saturday, January 30, 2021 6:58 AM
>>
>> On Mon, 25 Jan 2021 17:03:58 +0800
>> Shenming Lu <lushenming@huawei.com> wrote:
>>
>>> Hi,
>>>
>>> The static pinning and mapping problem in VFIO and possible solutions
>>> have been discussed a lot [1, 2]. One of the solutions is to add I/O
>>> page fault support for VFIO devices. Different from those relatively
>>> complicated software approaches such as presenting a vIOMMU that
>> provides
>>> the DMA buffer information (might include para-virtualized optimizations),
>>> IOPF mainly depends on the hardware faulting capability, such as the PCIe
>>> PRI extension or Arm SMMU stall model. What's more, the IOPF support in
>>> the IOMMU driver is being implemented in SVA [3]. So do we consider to
>>> add IOPF support for VFIO passthrough based on the IOPF part of SVA at
>>> present?
>>>
>>> We have implemented a basic demo only for one stage of translation (GPA
>>> -> HPA in virtualization, note that it can be configured at either stage),
>>> and tested on Hisilicon Kunpeng920 board. The nested mode is more
>> complicated
>>> since VFIO only handles the second stage page faults (same as the non-
>> nested
>>> case), while the first stage page faults need to be further delivered to
>>> the guest, which is being implemented in [4] on ARM. My thought on this
>>> is to report the page faults to VFIO regardless of the occured stage (try
>>> to carry the stage information), and handle respectively according to the
>>> configured mode in VFIO. Or the IOMMU driver might evolve to support
>> more...
>>>
>>> Might TODO:
>>>  - Optimize the faulting path, and measure the performance (it might still
>>>    be a big issue).
>>>  - Add support for PRI.
>>>  - Add a MMU notifier to avoid pinning.
>>>  - Add support for the nested mode.
>>> ...
>>>
>>> Any comments and suggestions are very welcome. :-)
>>
>> I expect performance to be pretty bad here, the lookup involved per
>> fault is excessive.  There are cases where a user is not going to be
>> willing to have a slow ramp up of performance for their devices as they
>> fault in pages, so we might need to consider making this
>> configurable through the vfio interface.  Our page mapping also only
> 
> There is another factor to be considered. The presence of
> IOMMU_DEV_FEAT_IOPF just indicates the device's capability of triggering
> I/O page faults through the IOMMU, but does not exactly mean that the
> device can tolerate I/O page faults for arbitrary DMA requests.

Yes, so I added an iopf_enabled field in VFIO to indicate the whole-path faulting
capability, and set it to true after registering a VFIO page fault handler.

> In reality, many
> devices allow I/O faulting only in selective contexts. However, there
> is no standard way (e.g. from the PCISIG) for the device to report
> whether arbitrary I/O faults are allowed. Then we may have to maintain
> device-specific knowledge in software, e.g. in an opt-in table listing
> devices which allow arbitrary faults. For devices which only support
> selective faulting, a mediator (either through vendor extensions on
> vfio-pci-core or an mdev wrapper) might be necessary to help lock down
> non-faultable mappings and then enable faulting on the remaining mappings.

For devices which only support selective faulting, could they tell the
IOMMU driver and let it filter out non-faultable faults? Or do I get it wrong?

> 
>> grows here, should mappings expire or do we need a least recently
>> mapped tracker to avoid exceeding the user's locked memory limit?  How
>> does a user know what to set for a locked memory limit?  The behavior
>> here would lead to cases where an idle system might be ok, but as soon
>> as load increases with more inflight DMA, we start seeing
>> "unpredictable" I/O faults from the user perspective.  Seems like there
>> are lots of outstanding considerations and I'd also like to hear from
>> the SVA folks about how this meshes with their work.  Thanks,
>>
> 
> The main overlap between this feature and SVA is the IOPF reporting
> framework, which currently still has a gap in supporting both in nested
> mode, as discussed here:
> 
> https://lore.kernel.org/linux-acpi/YAaxjmJW+ZMvrhac@myrica/
> 
> Once that gap is resolved in the future, the VFIO fault handler just
> adopts different actions according to the fault level: 1st-level faults
> are forwarded to userspace through the vSVA path while 2nd-level faults
> are fixed (or warned about if not intended) by VFIO itself through the
> IOMMU mapping interface.

I understand what you mean is:
from the perspective of VFIO, we first need to set FEAT_IOPF, and then register its
own handler with a flag to indicate FLAT or NESTED and which level is concerned;
thus the VFIO handler can handle the page faults directly according to the carried
level information.

Is there any plan for evolving (implementing) the IOMMU driver to support this? Or
could we help with this?  :-)

Thanks,
Shenming

> 
> Thanks
> Kevin
> 


* RE: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
  2021-02-02  6:41     ` Shenming Lu
@ 2021-02-04  6:52       ` Tian, Kevin
  2021-02-05 10:37         ` Jean-Philippe Brucker
                           ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Tian, Kevin @ 2021-02-04  6:52 UTC (permalink / raw)
  To: Shenming Lu, Alex Williamson
  Cc: Cornelia Huck, kvm, linux-kernel, Jean-Philippe Brucker,
	Eric Auger, Lu Baolu, wanghaibin.wang, yuzenghui, Liu, Yi L, Pan,
	Jacob jun

> From: Shenming Lu <lushenming@huawei.com>
> Sent: Tuesday, February 2, 2021 2:42 PM
> 
> On 2021/2/1 15:56, Tian, Kevin wrote:
> >> From: Alex Williamson <alex.williamson@redhat.com>
> >> Sent: Saturday, January 30, 2021 6:58 AM
> >>
> >> On Mon, 25 Jan 2021 17:03:58 +0800
> >> Shenming Lu <lushenming@huawei.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> The static pinning and mapping problem in VFIO and possible solutions
> >>> have been discussed a lot [1, 2]. One of the solutions is to add I/O
> >>> page fault support for VFIO devices. Different from those relatively
> >>> complicated software approaches such as presenting a vIOMMU that
> >> provides
> >>> the DMA buffer information (might include para-virtualized
> optimizations),
> >>> IOPF mainly depends on the hardware faulting capability, such as the
> PCIe
> >>> PRI extension or Arm SMMU stall model. What's more, the IOPF support
> in
> >>> the IOMMU driver is being implemented in SVA [3]. So do we consider to
> >>> add IOPF support for VFIO passthrough based on the IOPF part of SVA at
> >>> present?
> >>>
> >>> We have implemented a basic demo only for one stage of translation
> (GPA
> >>> -> HPA in virtualization, note that it can be configured at either stage),
> >>> and tested on Hisilicon Kunpeng920 board. The nested mode is more
> >> complicated
> >>> since VFIO only handles the second stage page faults (same as the non-
> >> nested
> >>> case), while the first stage page faults need to be further delivered to
> >>> the guest, which is being implemented in [4] on ARM. My thought on this
> >>> is to report the page faults to VFIO regardless of the occured stage (try
> >>> to carry the stage information), and handle respectively according to the
> >>> configured mode in VFIO. Or the IOMMU driver might evolve to support
> >> more...
> >>>
> >>> Might TODO:
> >>>  - Optimize the faulting path, and measure the performance (it might still
> >>>    be a big issue).
> >>>  - Add support for PRI.
> >>>  - Add a MMU notifier to avoid pinning.
> >>>  - Add support for the nested mode.
> >>> ...
> >>>
> >>> Any comments and suggestions are very welcome. :-)
> >>
> >> I expect performance to be pretty bad here, the lookup involved per
> >> fault is excessive.  There are cases where a user is not going to be
> >> willing to have a slow ramp up of performance for their devices as they
> >> fault in pages, so we might need to considering making this
> >> configurable through the vfio interface.  Our page mapping also only
> >
> > There is another factor to be considered. The presence of IOMMU_
> > DEV_FEAT_IOPF just indicates the device capability of triggering I/O
> > page fault through the IOMMU, but not exactly means that the device
> > can tolerate I/O page fault for arbitrary DMA requests.
> 
> Yes, so I add a iopf_enabled field in VFIO to indicate the whole path faulting
> capability and set it to true after registering a VFIO page fault handler.
> 
> > In reality, many
> > devices allow I/O faulting only in selective contexts. However, there
> > is no standard way (e.g. PCISIG) for the device to report whether
> > arbitrary I/O fault is allowed. Then we may have to maintain device
> > specific knowledge in software, e.g. in an opt-in table to list devices
> > which allows arbitrary faults. For devices which only support selective
> > faulting, a mediator (either through vendor extensions on vfio-pci-core
> > or a mdev wrapper) might be necessary to help lock down non-faultable
> > mappings and then enable faulting on the rest mappings.
> 
> For devices which only support selective faulting, they could tell it to the
> IOMMU driver and let it filter out non-faultable faults? Do I get it wrong?

Not exactly to the IOMMU driver. There is already a vfio_pin_pages() for
selective page-pinning. The matter is that 'they' imply some
device-specific logic to decide which pages must be pinned, and such
knowledge is outside of VFIO.

From an enabling p.o.v we could possibly do it in a phased approach: first
handle devices which tolerate arbitrary DMA faults, and then extend
to devices with selective-faulting. The former is simpler, but with one
main open question: whether we want to maintain such device IDs in a static
table in VFIO or rely on some hints from other components (e.g. the PF
driver in the VF assignment case). Let's see how Alex thinks about it.
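
Just to make the static-table option concrete, one possible shape
(purely illustrative; the IDs are placeholders, and whether VFIO should
carry such a table at all is exactly the open question above):

static const struct pci_device_id vfio_arbitrary_iopf_ids[] = {
	{ PCI_DEVICE(0x1234, 0x5678) },	/* placeholder vendor/device ID */
	{ }
};

/* True if the device is known to tolerate arbitrary DMA faults. */
static bool vfio_pci_tolerates_arbitrary_faults(struct pci_dev *pdev)
{
	return pci_match_id(vfio_arbitrary_iopf_ids, pdev) != NULL;
}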

> 
> >
> >> grows here, should mappings expire or do we need a least recently
> >> mapped tracker to avoid exceeding the user's locked memory limit?  How
> >> does a user know what to set for a locked memory limit?  The behavior
> >> here would lead to cases where an idle system might be ok, but as soon
> >> as load increases with more inflight DMA, we start seeing
> >> "unpredictable" I/O faults from the user perspective.  Seems like there
> >> are lots of outstanding considerations and I'd also like to hear from
> >> the SVA folks about how this meshes with their work.  Thanks,
> >>
> >
> > The main overlap between this feature and SVA is the IOPF reporting
> > framework, which currently still has gap to support both in nested
> > mode, as discussed here:
> >
> > https://lore.kernel.org/linux-acpi/YAaxjmJW+ZMvrhac@myrica/
> >
> > Once that gap is resolved in the future, the VFIO fault handler just
> > adopts different actions according to the fault-level: 1st level faults
> > are forwarded to userspace thru the vSVA path while 2nd-level faults
> > are fixed (or warned if not intended) by VFIO itself thru the IOMMU
> > mapping interface.
> 
> I understand what you mean is:
> From the perspective of VFIO, first, we need to set FEAT_IOPF, and then
> regster its
> own handler with a flag to indicate FLAT or NESTED and which level is
> concerned,
> thus the VFIO handler can handle the page faults directly according to the
> carried
> level information.
> 
> Is there any plan for evolving(implementing) the IOMMU driver to support
> this? Or
> could we help this?  :-)
> 

Yes, it's in the plan but just hasn't happened yet. We are still focusing on the
guest SVA part, thus only the 1st-level page fault (+Yi/Jacob). You are always
welcome to collaborate/help if you have time. 😊

Thanks
Kevin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
  2021-02-04  6:52       ` Tian, Kevin
@ 2021-02-05 10:37         ` Jean-Philippe Brucker
  2021-02-07  8:20           ` Tian, Kevin
  2021-02-09 11:06         ` Liu, Yi L
  2021-03-18  7:53         ` Shenming Lu
  2 siblings, 1 reply; 31+ messages in thread
From: Jean-Philippe Brucker @ 2021-02-05 10:37 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Shenming Lu, Alex Williamson, Cornelia Huck, kvm, linux-kernel,
	Eric Auger, Lu Baolu, wanghaibin.wang, yuzenghui, Liu, Yi L, Pan,
	Jacob jun

Hi,

On Thu, Feb 04, 2021 at 06:52:10AM +0000, Tian, Kevin wrote:
> > >>> The static pinning and mapping problem in VFIO and possible solutions
> > >>> have been discussed a lot [1, 2]. One of the solutions is to add I/O
> > >>> page fault support for VFIO devices. Different from those relatively
> > >>> complicated software approaches such as presenting a vIOMMU that
> > >> provides
> > >>> the DMA buffer information (might include para-virtualized
> > optimizations),

I'm curious about the performance difference between this and the
map/unmap vIOMMU, as well as the coIOMMU. This is probably a lot faster,
but those don't depend on IOPF, which is a pretty rare feature at the
moment.

[...]
> > > In reality, many
> > > devices allow I/O faulting only in selective contexts. However, there
> > > is no standard way (e.g. PCISIG) for the device to report whether
> > > arbitrary I/O fault is allowed. Then we may have to maintain device
> > > specific knowledge in software, e.g. in an opt-in table to list devices
> > > which allows arbitrary faults. For devices which only support selective
> > > faulting, a mediator (either through vendor extensions on vfio-pci-core
> > > or a mdev wrapper) might be necessary to help lock down non-faultable
> > > mappings and then enable faulting on the rest mappings.
> > 
> > For devices which only support selective faulting, they could tell it to the
> > IOMMU driver and let it filter out non-faultable faults? Do I get it wrong?
> 
> Not exactly to IOMMU driver. There is already a vfio_pin_pages() for
> selectively page-pinning. The matter is that 'they' imply some device
> specific logic to decide which pages must be pinned and such knowledge
> is outside of VFIO.
> 
> From enabling p.o.v we could possibly do it in phased approach. First 
> handles devices which tolerate arbitrary DMA faults, and then extends
> to devices with selective-faulting. The former is simpler, but with one
> main open whether we want to maintain such device IDs in a static
> table in VFIO or rely on some hints from other components (e.g. PF
> driver in VF assignment case). Let's see how Alex thinks about it.

Do you think selective-faulting will be the norm, or only a problem for
initial IOPF implementations?  To me it's the selective-faulting kind of
device that will be the odd one out, but that's pure speculation. Either
way maintaining a device list seems like a pain.

[...]
> Yes, it's in plan but just not happened yet. We are still focusing on guest
> SVA part thus only the 1st-level page fault (+Yi/Jacob). It's always welcomed 
> to collaborate/help if you have time. 😊

By the way, the current fault report API is missing a way to invalidate
partial faults: when the IOMMU device's PRI queue overflows, it may
auto-respond to page request groups that were already partially reported
by the IOMMU driver. Upon detecting an overflow, the IOMMU driver needs to
tell all fault consumers to discard their partial groups.
iopf_queue_discard_partial() [1] does this for the internal IOPF handler
but we have nothing for the lower-level fault handler at the moment. And
it gets more complicated when injecting IOPFs to guests, we'd need a
mechanism to recall partial groups all the way through kernel->userspace
and userspace->guest.
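
To make the overflow case concrete, a sketch against the API in [1]
(the surrounding driver structure is made up):

/* On PRI queue overflow the hardware may have auto-responded to groups
 * that were only partially reported, so every partial group held by
 * the IOPF handler must be dropped before restarting the queue.
 */
static void handle_priq_overflow(struct my_smmu_device *smmu)
{
	dev_warn(smmu->dev, "PRI queue overflow, discarding partial groups\n");
	iopf_queue_discard_partial(smmu->iopf_queue);
}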

Shenming suggests [2] to also use the IOPF handler for IOPFs managed by
device drivers. It's worth considering in my opinion because we could hold
partial groups within the kernel and only report full groups to device
drivers (and guests). In addition we'd consolidate tracking of IOPFs,
since they're done both by iommu_report_device_fault() and the IOPF
handler at the moment.

Note that I plan to upstream the IOPF patch [1] as is because it was
already in good shape for 5.12, and consolidating the fault handler will
require some thinking.

Thanks,
Jean


[1] https://lore.kernel.org/linux-iommu/20210127154322.3959196-7-jean-philippe@linaro.org/
[2] https://lore.kernel.org/linux-iommu/f79f06be-e46b-a65a-3951-3e7dbfa66b4a@huawei.com/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
  2021-02-05 10:37         ` Jean-Philippe Brucker
@ 2021-02-07  8:20           ` Tian, Kevin
  2021-02-07 11:47             ` Shenming Lu
  0 siblings, 1 reply; 31+ messages in thread
From: Tian, Kevin @ 2021-02-07  8:20 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Shenming Lu, Alex Williamson, Cornelia Huck, kvm, linux-kernel,
	Eric Auger, Lu Baolu, wanghaibin.wang, yuzenghui, Liu, Yi L, Pan,
	Jacob jun

> From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> Sent: Friday, February 5, 2021 6:37 PM
> 
> Hi,
> 
> On Thu, Feb 04, 2021 at 06:52:10AM +0000, Tian, Kevin wrote:
> > > >>> The static pinning and mapping problem in VFIO and possible
> solutions
> > > >>> have been discussed a lot [1, 2]. One of the solutions is to add I/O
> > > >>> page fault support for VFIO devices. Different from those relatively
> > > >>> complicated software approaches such as presenting a vIOMMU that
> > > >> provides
> > > >>> the DMA buffer information (might include para-virtualized
> > > optimizations),
> 
> I'm curious about the performance difference between this and the
> map/unmap vIOMMU, as well as the coIOMMU. This is probably a lot faster
> but those don't depend on IOPF which is a pretty rare feature at the
> moment.
> 
> [...]
> > > > In reality, many
> > > > devices allow I/O faulting only in selective contexts. However, there
> > > > is no standard way (e.g. PCISIG) for the device to report whether
> > > > arbitrary I/O fault is allowed. Then we may have to maintain device
> > > > specific knowledge in software, e.g. in an opt-in table to list devices
> > > > which allows arbitrary faults. For devices which only support selective
> > > > faulting, a mediator (either through vendor extensions on vfio-pci-core
> > > > or a mdev wrapper) might be necessary to help lock down non-
> faultable
> > > > mappings and then enable faulting on the rest mappings.
> > >
> > > For devices which only support selective faulting, they could tell it to the
> > > IOMMU driver and let it filter out non-faultable faults? Do I get it wrong?
> >
> > Not exactly to IOMMU driver. There is already a vfio_pin_pages() for
> > selectively page-pinning. The matter is that 'they' imply some device
> > specific logic to decide which pages must be pinned and such knowledge
> > is outside of VFIO.
> >
> > From enabling p.o.v we could possibly do it in phased approach. First
> > handles devices which tolerate arbitrary DMA faults, and then extends
> > to devices with selective-faulting. The former is simpler, but with one
> > main open whether we want to maintain such device IDs in a static
> > table in VFIO or rely on some hints from other components (e.g. PF
> > driver in VF assignment case). Let's see how Alex thinks about it.
> 
> Do you think selective-faulting will be the norm, or only a problem for
> initial IOPF implementations?  To me it's the selective-faulting kind of
> device that will be the odd one out, but that's pure speculation. Either
> way maintaining a device list seems like a pain.

I would think it's the norm for quite some time (e.g. multiple years), as from
what I have learned, turning a complex accelerator into an implementation
tolerating arbitrary DMA faults is very complex (in every critical path) and
not cost-effective (tracking in-flight requests). It might be OK for some
purposely-built devices in specific usages, but for most it has to be an
evolving path toward the 100%-faultable goal...

> 
> [...]
> > Yes, it's in plan but just not happened yet. We are still focusing on guest
> > SVA part thus only the 1st-level page fault (+Yi/Jacob). It's always
> welcomed
> > to collaborate/help if you have time. 😊
> 
> By the way the current fault report API is missing a way to invalidate
> partial faults: when the IOMMU device's PRI queue overflows, it may
> auto-respond to page request groups that were already partially reported
> by the IOMMU driver. Upon detecting an overflow, the IOMMU driver needs
> to
> tell all fault consumers to discard their partial groups.
> iopf_queue_discard_partial() [1] does this for the internal IOPF handler
> but we have nothing for the lower-level fault handler at the moment. And
> it gets more complicated when injecting IOPFs to guests, we'd need a
> mechanism to recall partial groups all the way through kernel->userspace
> and userspace->guest.

I don't know how to recall partial groups through emulated vIOMMUs
(at least for virtual VT-d). Possibly it could be supported by virtio-iommu.
But in any case I consider it more of an optimization than a functional
requirement (and it could be avoided with Shenming's suggestion below).

> 
> Shenming suggests [2] to also use the IOPF handler for IOPFs managed by
> device drivers. It's worth considering in my opinion because we could hold
> partial groups within the kernel and only report full groups to device
> drivers (and guests). In addition we'd consolidate tracking of IOPFs,
> since they're done both by iommu_report_device_fault() and the IOPF
> handler at the moment.

I also think it's the right thing to do. Conceptually, w/ or w/o DEV_FEAT_IOPF
just reflects how IOPFs are delivered to the system software. In the end,
IOPFs are all about permission violations in the IOMMU page tables, thus
we should try to reuse/consolidate the IOMMU fault reporting stack as
much as possible.

> 
> Note that I plan to upstream the IOPF patch [1] as is because it was
> already in good shape for 5.12, and consolidating the fault handler will
> require some thinking.

This plan makes sense.

> 
> Thanks,
> Jean
> 
> 
> [1] https://lore.kernel.org/linux-iommu/20210127154322.3959196-7-jean-
> philippe@linaro.org/
> [2] https://lore.kernel.org/linux-iommu/f79f06be-e46b-a65a-3951-
> 3e7dbfa66b4a@huawei.com/

Thanks
Kevin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
  2021-02-07  8:20           ` Tian, Kevin
@ 2021-02-07 11:47             ` Shenming Lu
  0 siblings, 0 replies; 31+ messages in thread
From: Shenming Lu @ 2021-02-07 11:47 UTC (permalink / raw)
  To: Tian, Kevin, Jean-Philippe Brucker
  Cc: Alex Williamson, Cornelia Huck, kvm, linux-kernel, Eric Auger,
	Lu Baolu, wanghaibin.wang, yuzenghui, Liu, Yi L, Pan, Jacob jun

On 2021/2/7 16:20, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker <jean-philippe@linaro.org>
>> Sent: Friday, February 5, 2021 6:37 PM
>>
>> Hi,
>>
>> On Thu, Feb 04, 2021 at 06:52:10AM +0000, Tian, Kevin wrote:
>>>>>>> The static pinning and mapping problem in VFIO and possible
>> solutions
>>>>>>> have been discussed a lot [1, 2]. One of the solutions is to add I/O
>>>>>>> page fault support for VFIO devices. Different from those relatively
>>>>>>> complicated software approaches such as presenting a vIOMMU that
>>>>>> provides
>>>>>>> the DMA buffer information (might include para-virtualized
>>>> optimizations),
>>
>> I'm curious about the performance difference between this and the
>> map/unmap vIOMMU, as well as the coIOMMU. This is probably a lot faster
>> but those don't depend on IOPF which is a pretty rare feature at the
>> moment.

Yeah, I will give the performance data later.

>>
>> [...]
>>>>> In reality, many
>>>>> devices allow I/O faulting only in selective contexts. However, there
>>>>> is no standard way (e.g. PCISIG) for the device to report whether
>>>>> arbitrary I/O fault is allowed. Then we may have to maintain device
>>>>> specific knowledge in software, e.g. in an opt-in table to list devices
>>>>> which allows arbitrary faults. For devices which only support selective
>>>>> faulting, a mediator (either through vendor extensions on vfio-pci-core
>>>>> or a mdev wrapper) might be necessary to help lock down non-
>> faultable
>>>>> mappings and then enable faulting on the rest mappings.
>>>>
>>>> For devices which only support selective faulting, they could tell it to the
>>>> IOMMU driver and let it filter out non-faultable faults? Do I get it wrong?
>>>
>>> Not exactly to IOMMU driver. There is already a vfio_pin_pages() for
>>> selectively page-pinning. The matter is that 'they' imply some device
>>> specific logic to decide which pages must be pinned and such knowledge
>>> is outside of VFIO.
>>>
>>> From enabling p.o.v we could possibly do it in phased approach. First
>>> handles devices which tolerate arbitrary DMA faults, and then extends
>>> to devices with selective-faulting. The former is simpler, but with one
>>> main open whether we want to maintain such device IDs in a static
>>> table in VFIO or rely on some hints from other components (e.g. PF
>>> driver in VF assignment case). Let's see how Alex thinks about it.
>>
>> Do you think selective-faulting will be the norm, or only a problem for
>> initial IOPF implementations?  To me it's the selective-faulting kind of
>> device that will be the odd one out, but that's pure speculation. Either
>> way maintaining a device list seems like a pain.
> 
> I would think it's norm for quite some time (e.g. multiple years), as from
> what I learned turning a complex accelerator to an implementation 
> tolerating arbitrary DMA fault is way complex (in every critical path) and
> not cost effective (tracking in-fly requests). It might be OK for some 
> purposely-built devices in specific usage but for most it has to be an 
> evolving path toward the 100%-faultable goal...
> 
>>
>> [...]
>>> Yes, it's in plan but just not happened yet. We are still focusing on guest
>>> SVA part thus only the 1st-level page fault (+Yi/Jacob). It's always
>> welcomed
>>> to collaborate/help if you have time. 😊
>>
>> By the way the current fault report API is missing a way to invalidate
>> partial faults: when the IOMMU device's PRI queue overflows, it may
>> auto-respond to page request groups that were already partially reported
>> by the IOMMU driver. Upon detecting an overflow, the IOMMU driver needs
>> to
>> tell all fault consumers to discard their partial groups.
>> iopf_queue_discard_partial() [1] does this for the internal IOPF handler
>> but we have nothing for the lower-level fault handler at the moment. And
>> it gets more complicated when injecting IOPFs to guests, we'd need a
>> mechanism to recall partial groups all the way through kernel->userspace
>> and userspace->guest.
> 
> I didn't know how to recall partial groups through emulated vIOMMUs
> (at least for virtual VT-d). Possibly it could be supported by virtio-iommu.
> But in any case I consider it more like an optimization instead of a functional
> requirement (and could be avoided in below Shenming's suggestion).
> 
>>
>> Shenming suggests [2] to also use the IOPF handler for IOPFs managed by
>> device drivers. It's worth considering in my opinion because we could hold
>> partial groups within the kernel and only report full groups to device
>> drivers (and guests). In addition we'd consolidate tracking of IOPFs,
>> since they're done both by iommu_report_device_fault() and the IOPF
>> handler at the moment.
> 
> I also think it's the right thing to do. In concept w/ or w/o DEV_FEAT_IOPF
> just reflects how IOPFs are delivered to the system software. In the end 
> IOPFs are all about permission violations in the IOMMU page tables thus
> we should try to reuse/consolidate the IOMMU fault reporting stack as 
> much as possible.
> 
>>
>> Note that I plan to upstream the IOPF patch [1] as is because it was
>> already in good shape for 5.12, and consolidating the fault handler will
>> require some thinking.
> 
> This plan makes sense.

Yeah, maybe this problem could be left to the implementation of a device (VFIO)
specific fault handler... :-)

Thanks,
Shenming

> 
>>
>> Thanks,
>> Jean
>>
>>
>> [1] https://lore.kernel.org/linux-iommu/20210127154322.3959196-7-jean-
>> philippe@linaro.org/
>> [2] https://lore.kernel.org/linux-iommu/f79f06be-e46b-a65a-3951-
>> 3e7dbfa66b4a@huawei.com/
> 
> Thanks
> Kevin
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
  2021-02-04  6:52       ` Tian, Kevin
  2021-02-05 10:37         ` Jean-Philippe Brucker
@ 2021-02-09 11:06         ` Liu, Yi L
  2021-02-10  8:02           ` Shenming Lu
  2021-03-18  7:53         ` Shenming Lu
  2 siblings, 1 reply; 31+ messages in thread
From: Liu, Yi L @ 2021-02-09 11:06 UTC (permalink / raw)
  To: Tian, Kevin, Shenming Lu, Alex Williamson
  Cc: Cornelia Huck, kvm, linux-kernel, Jean-Philippe Brucker,
	Eric Auger, Lu Baolu, wanghaibin.wang, yuzenghui, Pan, Jacob jun

> From: Tian, Kevin <kevin.tian@intel.com>
> Sent: Thursday, February 4, 2021 2:52 PM
> 
> > From: Shenming Lu <lushenming@huawei.com>
> > Sent: Tuesday, February 2, 2021 2:42 PM
> >
> > On 2021/2/1 15:56, Tian, Kevin wrote:
> > >> From: Alex Williamson <alex.williamson@redhat.com>
> > >> Sent: Saturday, January 30, 2021 6:58 AM
> > >>
> > >> On Mon, 25 Jan 2021 17:03:58 +0800
> > >> Shenming Lu <lushenming@huawei.com> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> The static pinning and mapping problem in VFIO and possible
> solutions
> > >>> have been discussed a lot [1, 2]. One of the solutions is to add I/O
> > >>> page fault support for VFIO devices. Different from those relatively
> > >>> complicated software approaches such as presenting a vIOMMU that
> > >> provides
> > >>> the DMA buffer information (might include para-virtualized
> > optimizations),
> > >>> IOPF mainly depends on the hardware faulting capability, such as the
> > PCIe
> > >>> PRI extension or Arm SMMU stall model. What's more, the IOPF
> support
> > in
> > >>> the IOMMU driver is being implemented in SVA [3]. So do we
> consider to
> > >>> add IOPF support for VFIO passthrough based on the IOPF part of SVA
> at
> > >>> present?
> > >>>
> > >>> We have implemented a basic demo only for one stage of translation
> > (GPA
> > >>> -> HPA in virtualization, note that it can be configured at either stage),
> > >>> and tested on Hisilicon Kunpeng920 board. The nested mode is more
> > >> complicated
> > >>> since VFIO only handles the second stage page faults (same as the
> non-
> > >> nested
> > >>> case), while the first stage page faults need to be further delivered to
> > >>> the guest, which is being implemented in [4] on ARM. My thought on
> this
> > >>> is to report the page faults to VFIO regardless of the occured stage
> (try
> > >>> to carry the stage information), and handle respectively according to
> the
> > >>> configured mode in VFIO. Or the IOMMU driver might evolve to
> support
> > >> more...
> > >>>
> > >>> Might TODO:
> > >>>  - Optimize the faulting path, and measure the performance (it might
> still
> > >>>    be a big issue).
> > >>>  - Add support for PRI.
> > >>>  - Add a MMU notifier to avoid pinning.
> > >>>  - Add support for the nested mode.
> > >>> ...
> > >>>
> > >>> Any comments and suggestions are very welcome. :-)
> > >>
> > >> I expect performance to be pretty bad here, the lookup involved per
> > >> fault is excessive.  There are cases where a user is not going to be
> > >> willing to have a slow ramp up of performance for their devices as they
> > >> fault in pages, so we might need to considering making this
> > >> configurable through the vfio interface.  Our page mapping also only
> > >
> > > There is another factor to be considered. The presence of IOMMU_
> > > DEV_FEAT_IOPF just indicates the device capability of triggering I/O
> > > page fault through the IOMMU, but not exactly means that the device
> > > can tolerate I/O page fault for arbitrary DMA requests.
> >
> > Yes, so I add a iopf_enabled field in VFIO to indicate the whole path
> faulting
> > capability and set it to true after registering a VFIO page fault handler.
> >
> > > In reality, many
> > > devices allow I/O faulting only in selective contexts. However, there
> > > is no standard way (e.g. PCISIG) for the device to report whether
> > > arbitrary I/O fault is allowed. Then we may have to maintain device
> > > specific knowledge in software, e.g. in an opt-in table to list devices
> > > which allows arbitrary faults. For devices which only support selective
> > > faulting, a mediator (either through vendor extensions on vfio-pci-core
> > > or a mdev wrapper) might be necessary to help lock down non-faultable
> > > mappings and then enable faulting on the rest mappings.
> >
> > For devices which only support selective faulting, they could tell it to the
> > IOMMU driver and let it filter out non-faultable faults? Do I get it wrong?
> 
> Not exactly to IOMMU driver. There is already a vfio_pin_pages() for
> selectively page-pinning. The matter is that 'they' imply some device
> specific logic to decide which pages must be pinned and such knowledge
> is outside of VFIO.
> 
> From enabling p.o.v we could possibly do it in phased approach. First
> handles devices which tolerate arbitrary DMA faults, and then extends
> to devices with selective-faulting. The former is simpler, but with one
> main open whether we want to maintain such device IDs in a static
> table in VFIO or rely on some hints from other components (e.g. PF
> driver in VF assignment case). Let's see how Alex thinks about it.
> 
> >
> > >
> > >> grows here, should mappings expire or do we need a least recently
> > >> mapped tracker to avoid exceeding the user's locked memory limit?
> How
> > >> does a user know what to set for a locked memory limit?  The behavior
> > >> here would lead to cases where an idle system might be ok, but as
> soon
> > >> as load increases with more inflight DMA, we start seeing
> > >> "unpredictable" I/O faults from the user perspective.  Seems like there
> > >> are lots of outstanding considerations and I'd also like to hear from
> > >> the SVA folks about how this meshes with their work.  Thanks,
> > >>
> > >
> > > The main overlap between this feature and SVA is the IOPF reporting
> > > framework, which currently still has gap to support both in nested
> > > mode, as discussed here:
> > >
> > > https://lore.kernel.org/linux-acpi/YAaxjmJW+ZMvrhac@myrica/
> > >
> > > Once that gap is resolved in the future, the VFIO fault handler just
> > > adopts different actions according to the fault-level: 1st level faults
> > > are forwarded to userspace thru the vSVA path while 2nd-level faults
> > > are fixed (or warned if not intended) by VFIO itself thru the IOMMU
> > > mapping interface.
> >
> > I understand what you mean is:
> > From the perspective of VFIO, first, we need to set FEAT_IOPF, and then
> > regster its
> > own handler with a flag to indicate FLAT or NESTED and which level is
> > concerned,
> > thus the VFIO handler can handle the page faults directly according to the
> > carried
> > level information.
> >
> > Is there any plan for evolving(implementing) the IOMMU driver to
> support
> > this? Or
> > could we help this?  :-)
> >
> 
> Yes, it's in plan but just not happened yet. We are still focusing on guest
> SVA part thus only the 1st-level page fault (+Yi/Jacob). It's always welcomed
> to collaborate/help if you have time. 😊

Yeah, I saw Eric's page fault support patch is listed as a reference. BTW,
one thing needs clarifying: currently only one iommu fault handler is
supported for a single device. So the fault handler added in this series
should be consolidated with the one added in Eric's series.
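
To illustrate the constraint (a sketch only; both handler names are
made up):

/* The IOMMU core keeps a single fault handler per device, so a second
 * registration fails (-EBUSY in the current code), which is why the
 * two handlers need consolidating.
 */
static int demo_single_handler_per_device(struct device *dev)
{
	int ret;

	ret = iommu_register_device_fault_handler(dev, vfio_fault_handler, dev);
	if (ret)
		return ret;

	/* A second registration (e.g. from the nested-stage series) fails. */
	ret = iommu_register_device_fault_handler(dev, vsva_fault_handler, dev);
	WARN_ON(ret != -EBUSY);
	return 0;
}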

Regards,
Yi Liu

> Thanks
> Kevin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
  2021-02-09 11:06         ` Liu, Yi L
@ 2021-02-10  8:02           ` Shenming Lu
  0 siblings, 0 replies; 31+ messages in thread
From: Shenming Lu @ 2021-02-10  8:02 UTC (permalink / raw)
  To: Liu, Yi L, Tian, Kevin, Alex Williamson
  Cc: Cornelia Huck, kvm, linux-kernel, Jean-Philippe Brucker,
	Eric Auger, Lu Baolu, wanghaibin.wang, yuzenghui, Pan, Jacob jun

On 2021/2/9 19:06, Liu, Yi L wrote:
>> From: Tian, Kevin <kevin.tian@intel.com>
>> Sent: Thursday, February 4, 2021 2:52 PM
>>
>>> From: Shenming Lu <lushenming@huawei.com>
>>> Sent: Tuesday, February 2, 2021 2:42 PM
>>>
>>> On 2021/2/1 15:56, Tian, Kevin wrote:
>>>>> From: Alex Williamson <alex.williamson@redhat.com>
>>>>> Sent: Saturday, January 30, 2021 6:58 AM
>>>>>
>>>>> On Mon, 25 Jan 2021 17:03:58 +0800
>>>>> Shenming Lu <lushenming@huawei.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> The static pinning and mapping problem in VFIO and possible
>> solutions
>>>>>> have been discussed a lot [1, 2]. One of the solutions is to add I/O
>>>>>> page fault support for VFIO devices. Different from those relatively
>>>>>> complicated software approaches such as presenting a vIOMMU that
>>>>> provides
>>>>>> the DMA buffer information (might include para-virtualized
>>> optimizations),
>>>>>> IOPF mainly depends on the hardware faulting capability, such as the
>>> PCIe
>>>>>> PRI extension or Arm SMMU stall model. What's more, the IOPF
>> support
>>> in
>>>>>> the IOMMU driver is being implemented in SVA [3]. So do we
>> consider to
>>>>>> add IOPF support for VFIO passthrough based on the IOPF part of SVA
>> at
>>>>>> present?
>>>>>>
>>>>>> We have implemented a basic demo only for one stage of translation
>>> (GPA
>>>>>> -> HPA in virtualization, note that it can be configured at either stage),
>>>>>> and tested on Hisilicon Kunpeng920 board. The nested mode is more
>>>>> complicated
>>>>>> since VFIO only handles the second stage page faults (same as the
>> non-
>>>>> nested
>>>>>> case), while the first stage page faults need to be further delivered to
>>>>>> the guest, which is being implemented in [4] on ARM. My thought on
>> this
>>>>>> is to report the page faults to VFIO regardless of the occured stage
>> (try
>>>>>> to carry the stage information), and handle respectively according to
>> the
>>>>>> configured mode in VFIO. Or the IOMMU driver might evolve to
>> support
>>>>> more...
>>>>>>
>>>>>> Might TODO:
>>>>>>  - Optimize the faulting path, and measure the performance (it might
>> still
>>>>>>    be a big issue).
>>>>>>  - Add support for PRI.
>>>>>>  - Add a MMU notifier to avoid pinning.
>>>>>>  - Add support for the nested mode.
>>>>>> ...
>>>>>>
>>>>>> Any comments and suggestions are very welcome. :-)
>>>>>
>>>>> I expect performance to be pretty bad here, the lookup involved per
>>>>> fault is excessive.  There are cases where a user is not going to be
>>>>> willing to have a slow ramp up of performance for their devices as they
>>>>> fault in pages, so we might need to considering making this
>>>>> configurable through the vfio interface.  Our page mapping also only
>>>>
>>>> There is another factor to be considered. The presence of IOMMU_
>>>> DEV_FEAT_IOPF just indicates the device capability of triggering I/O
>>>> page fault through the IOMMU, but not exactly means that the device
>>>> can tolerate I/O page fault for arbitrary DMA requests.
>>>
>>> Yes, so I add a iopf_enabled field in VFIO to indicate the whole path
>> faulting
>>> capability and set it to true after registering a VFIO page fault handler.
>>>
>>>> In reality, many
>>>> devices allow I/O faulting only in selective contexts. However, there
>>>> is no standard way (e.g. PCISIG) for the device to report whether
>>>> arbitrary I/O fault is allowed. Then we may have to maintain device
>>>> specific knowledge in software, e.g. in an opt-in table to list devices
>>>> which allows arbitrary faults. For devices which only support selective
>>>> faulting, a mediator (either through vendor extensions on vfio-pci-core
>>>> or a mdev wrapper) might be necessary to help lock down non-faultable
>>>> mappings and then enable faulting on the rest mappings.
>>>
>>> For devices which only support selective faulting, they could tell it to the
>>> IOMMU driver and let it filter out non-faultable faults? Do I get it wrong?
>>
>> Not exactly to IOMMU driver. There is already a vfio_pin_pages() for
>> selectively page-pinning. The matter is that 'they' imply some device
>> specific logic to decide which pages must be pinned and such knowledge
>> is outside of VFIO.
>>
>> From enabling p.o.v we could possibly do it in phased approach. First
>> handles devices which tolerate arbitrary DMA faults, and then extends
>> to devices with selective-faulting. The former is simpler, but with one
>> main open whether we want to maintain such device IDs in a static
>> table in VFIO or rely on some hints from other components (e.g. PF
>> driver in VF assignment case). Let's see how Alex thinks about it.
>>
>>>
>>>>
>>>>> grows here, should mappings expire or do we need a least recently
>>>>> mapped tracker to avoid exceeding the user's locked memory limit?
>> How
>>>>> does a user know what to set for a locked memory limit?  The behavior
>>>>> here would lead to cases where an idle system might be ok, but as
>> soon
>>>>> as load increases with more inflight DMA, we start seeing
>>>>> "unpredictable" I/O faults from the user perspective.  Seems like there
>>>>> are lots of outstanding considerations and I'd also like to hear from
>>>>> the SVA folks about how this meshes with their work.  Thanks,
>>>>>
>>>>
>>>> The main overlap between this feature and SVA is the IOPF reporting
>>>> framework, which currently still has gap to support both in nested
>>>> mode, as discussed here:
>>>>
>>>> https://lore.kernel.org/linux-acpi/YAaxjmJW+ZMvrhac@myrica/
>>>>
>>>> Once that gap is resolved in the future, the VFIO fault handler just
>>>> adopts different actions according to the fault-level: 1st level faults
>>>> are forwarded to userspace thru the vSVA path while 2nd-level faults
>>>> are fixed (or warned if not intended) by VFIO itself thru the IOMMU
>>>> mapping interface.
>>>
>>> I understand what you mean is:
>>> From the perspective of VFIO, first, we need to set FEAT_IOPF, and then
>>> regster its
>>> own handler with a flag to indicate FLAT or NESTED and which level is
>>> concerned,
>>> thus the VFIO handler can handle the page faults directly according to the
>>> carried
>>> level information.
>>>
>>> Is there any plan for evolving(implementing) the IOMMU driver to
>> support
>>> this? Or
>>> could we help this?  :-)
>>>
>>
>> Yes, it's in plan but just not happened yet. We are still focusing on guest
>> SVA part thus only the 1st-level page fault (+Yi/Jacob). It's always welcomed
>> to collaborate/help if you have time. 😊
> 
> yeah, I saw Eric's page fault support patch is listed as reference. BTW.
> one thing needs to clarify, currently only one iommu fault handler supported
> for a single device. So for the fault handler added in this series, it should
> be consolidated with the one added in Eric's series.

Yeah, they could be combined. And maybe we could register the handler in this series
with the IOMMU driver, and when it receives page faults, further call the handler in
Eric's series (maybe implemented as a callback in vfio_device_ops) if the fault occurs
at the 1st level. We have to communicate with Eric about this. :-)

Thanks,
Shenming

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
  2021-02-04  6:52       ` Tian, Kevin
  2021-02-05 10:37         ` Jean-Philippe Brucker
  2021-02-09 11:06         ` Liu, Yi L
@ 2021-03-18  7:53         ` Shenming Lu
  2021-03-18  9:07           ` Tian, Kevin
  2 siblings, 1 reply; 31+ messages in thread
From: Shenming Lu @ 2021-03-18  7:53 UTC (permalink / raw)
  To: Tian, Kevin, Alex Williamson
  Cc: Cornelia Huck, kvm, linux-kernel, Jean-Philippe Brucker,
	Eric Auger, Lu Baolu, wanghaibin.wang, yuzenghui, Liu, Yi L, Pan,
	Jacob jun

On 2021/2/4 14:52, Tian, Kevin wrote:
>>> In reality, many
>>> devices allow I/O faulting only in selective contexts. However, there
>>> is no standard way (e.g. PCISIG) for the device to report whether
>>> arbitrary I/O fault is allowed. Then we may have to maintain device
>>> specific knowledge in software, e.g. in an opt-in table to list devices
>>> which allows arbitrary faults. For devices which only support selective
>>> faulting, a mediator (either through vendor extensions on vfio-pci-core
>>> or a mdev wrapper) might be necessary to help lock down non-faultable
>>> mappings and then enable faulting on the rest mappings.
>>
>> For devices which only support selective faulting, they could tell it to the
>> IOMMU driver and let it filter out non-faultable faults? Do I get it wrong?
> 
> Not exactly to IOMMU driver. There is already a vfio_pin_pages() for
> selectively page-pinning. The matter is that 'they' imply some device
> specific logic to decide which pages must be pinned and such knowledge
> is outside of VFIO.
> 
> From enabling p.o.v we could possibly do it in phased approach. First 
> handles devices which tolerate arbitrary DMA faults, and then extends
> to devices with selective-faulting. The former is simpler, but with one
> main open whether we want to maintain such device IDs in a static
> table in VFIO or rely on some hints from other components (e.g. PF
> driver in VF assignment case). Let's see how Alex thinks about it.

Hi Kevin,

You mentioned selective-faulting some time ago. I still have some doubts
about it:
There is already a vfio_pin_pages() which is used for limiting the IOMMU
group dirty scope to pinned pages; could it also be used for indicating that
the faultable scope is limited to the pinned pages and the rest of the mappings
are non-faultable and should be pinned and mapped immediately? But it
seems to be a little weird and not exactly what you meant... I would
be grateful if you could help to explain further. :-)

Thanks,
Shenming

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
  2021-03-18  7:53         ` Shenming Lu
@ 2021-03-18  9:07           ` Tian, Kevin
  2021-03-18 11:53             ` Shenming Lu
  0 siblings, 1 reply; 31+ messages in thread
From: Tian, Kevin @ 2021-03-18  9:07 UTC (permalink / raw)
  To: Shenming Lu, Alex Williamson
  Cc: Cornelia Huck, kvm, linux-kernel, Jean-Philippe Brucker,
	Eric Auger, Lu Baolu, wanghaibin.wang, yuzenghui, Liu, Yi L, Pan,
	Jacob jun

> From: Shenming Lu <lushenming@huawei.com>
> Sent: Thursday, March 18, 2021 3:53 PM
> 
> On 2021/2/4 14:52, Tian, Kevin wrote:>>> In reality, many
> >>> devices allow I/O faulting only in selective contexts. However, there
> >>> is no standard way (e.g. PCISIG) for the device to report whether
> >>> arbitrary I/O fault is allowed. Then we may have to maintain device
> >>> specific knowledge in software, e.g. in an opt-in table to list devices
> >>> which allows arbitrary faults. For devices which only support selective
> >>> faulting, a mediator (either through vendor extensions on vfio-pci-core
> >>> or a mdev wrapper) might be necessary to help lock down non-faultable
> >>> mappings and then enable faulting on the rest mappings.
> >>
> >> For devices which only support selective faulting, they could tell it to the
> >> IOMMU driver and let it filter out non-faultable faults? Do I get it wrong?
> >
> > Not exactly to IOMMU driver. There is already a vfio_pin_pages() for
> > selectively page-pinning. The matter is that 'they' imply some device
> > specific logic to decide which pages must be pinned and such knowledge
> > is outside of VFIO.
> >
> > From enabling p.o.v we could possibly do it in phased approach. First
> > handles devices which tolerate arbitrary DMA faults, and then extends
> > to devices with selective-faulting. The former is simpler, but with one
> > main open whether we want to maintain such device IDs in a static
> > table in VFIO or rely on some hints from other components (e.g. PF
> > driver in VF assignment case). Let's see how Alex thinks about it.
> 
> Hi Kevin,
> 
> You mentioned selective-faulting some time ago. I still have some doubt
> about it:
> There is already a vfio_pin_pages() which is used for limiting the IOMMU
> group dirty scope to pinned pages, could it also be used for indicating
> the faultable scope is limited to the pinned pages and the rest mappings
> is non-faultable that should be pinned and mapped immediately? But it
> seems to be a little weird and not exactly to what you meant... I will
> be grateful if you can help to explain further. :-)
> 

The opposite, i.e. the vendor driver uses vfio_pin_pages() to lock down
pages that are not faultable (based on its specific knowledge) and then
the rest of the memory becomes faultable.
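
A hedged sketch of what that might look like in a vendor/mdev driver
(everything except the vfio_pin_pages() call itself is illustrative,
including how the driver learns which IOVA range must not fault):

static int lock_down_nonfaultable(struct device *dev,
				  unsigned long iova, int npage)
{
	unsigned long *user_pfn, *phys_pfn;
	int i, ret;

	user_pfn = kmalloc_array(npage, sizeof(*user_pfn), GFP_KERNEL);
	phys_pfn = kmalloc_array(npage, sizeof(*phys_pfn), GFP_KERNEL);
	if (!user_pfn || !phys_pfn) {
		ret = -ENOMEM;
		goto out;
	}

	for (i = 0; i < npage; i++)
		user_pfn[i] = (iova >> PAGE_SHIFT) + i;

	/* Pin and map just this range; the rest stays faultable. */
	ret = vfio_pin_pages(dev, user_pfn, npage,
			     IOMMU_READ | IOMMU_WRITE, phys_pfn);
out:
	kfree(user_pfn);
	kfree(phys_pfn);
	return ret < 0 ? ret : 0;
}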

Thanks
Kevin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
  2021-03-18  9:07           ` Tian, Kevin
@ 2021-03-18 11:53             ` Shenming Lu
  2021-03-18 12:32               ` Tian, Kevin
  2021-03-19  0:33               ` Lu Baolu
  0 siblings, 2 replies; 31+ messages in thread
From: Shenming Lu @ 2021-03-18 11:53 UTC (permalink / raw)
  To: Tian, Kevin, Alex Williamson
  Cc: Cornelia Huck, kvm, linux-kernel, Jean-Philippe Brucker,
	Eric Auger, Lu Baolu, wanghaibin.wang, yuzenghui, Liu, Yi L, Pan,
	Jacob jun

On 2021/3/18 17:07, Tian, Kevin wrote:
>> From: Shenming Lu <lushenming@huawei.com>
>> Sent: Thursday, March 18, 2021 3:53 PM
>>
>> On 2021/2/4 14:52, Tian, Kevin wrote:>>> In reality, many
>>>>> devices allow I/O faulting only in selective contexts. However, there
>>>>> is no standard way (e.g. PCISIG) for the device to report whether
>>>>> arbitrary I/O fault is allowed. Then we may have to maintain device
>>>>> specific knowledge in software, e.g. in an opt-in table to list devices
>>>>> which allows arbitrary faults. For devices which only support selective
>>>>> faulting, a mediator (either through vendor extensions on vfio-pci-core
>>>>> or a mdev wrapper) might be necessary to help lock down non-faultable
>>>>> mappings and then enable faulting on the rest mappings.
>>>>
>>>> For devices which only support selective faulting, they could tell it to the
>>>> IOMMU driver and let it filter out non-faultable faults? Do I get it wrong?
>>>
>>> Not exactly to IOMMU driver. There is already a vfio_pin_pages() for
>>> selectively page-pinning. The matter is that 'they' imply some device
>>> specific logic to decide which pages must be pinned and such knowledge
>>> is outside of VFIO.
>>>
>>> From enabling p.o.v we could possibly do it in phased approach. First
>>> handles devices which tolerate arbitrary DMA faults, and then extends
>>> to devices with selective-faulting. The former is simpler, but with one
>>> main open whether we want to maintain such device IDs in a static
>>> table in VFIO or rely on some hints from other components (e.g. PF
>>> driver in VF assignment case). Let's see how Alex thinks about it.
>>
>> Hi Kevin,
>>
>> You mentioned selective-faulting some time ago. I still have some doubt
>> about it:
>> There is already a vfio_pin_pages() which is used for limiting the IOMMU
>> group dirty scope to pinned pages, could it also be used for indicating
>> the faultable scope is limited to the pinned pages and the rest mappings
>> is non-faultable that should be pinned and mapped immediately? But it
>> seems to be a little weird and not exactly to what you meant... I will
>> be grateful if you can help to explain further. :-)
>>
> 
> The opposite, i.e. the vendor driver uses vfio_pin_pages to lock down
> pages that are not faultable (based on its specific knowledge) and then
> the rest memory becomes faultable.

Ahh...
Thus, from the perspective of VFIO IOMMU, if IOPF is enabled for such a device,
only the page faults within the pinned range are valid in the registered
iommu fault handler...
I have another question here: for the IOMMU-backed devices, they are already
all pinned and mapped when attaching, so is there a need to call vfio_pin_pages()
to lock down pages for them? Did I miss something?...

Thanks,
Shenming

> 
> Thanks
> Kevin
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
  2021-03-18 11:53             ` Shenming Lu
@ 2021-03-18 12:32               ` Tian, Kevin
  2021-03-18 12:47                 ` Shenming Lu
  2021-03-19  0:33               ` Lu Baolu
  1 sibling, 1 reply; 31+ messages in thread
From: Tian, Kevin @ 2021-03-18 12:32 UTC (permalink / raw)
  To: Shenming Lu, Alex Williamson
  Cc: Cornelia Huck, kvm, linux-kernel, Jean-Philippe Brucker,
	Eric Auger, Lu Baolu, wanghaibin.wang, yuzenghui, Liu, Yi L, Pan,
	Jacob jun

> From: Shenming Lu <lushenming@huawei.com>
> Sent: Thursday, March 18, 2021 7:54 PM
> 
> On 2021/3/18 17:07, Tian, Kevin wrote:
> >> From: Shenming Lu <lushenming@huawei.com>
> >> Sent: Thursday, March 18, 2021 3:53 PM
> >>
> >> On 2021/2/4 14:52, Tian, Kevin wrote:>>> In reality, many
> >>>>> devices allow I/O faulting only in selective contexts. However, there
> >>>>> is no standard way (e.g. PCISIG) for the device to report whether
> >>>>> arbitrary I/O fault is allowed. Then we may have to maintain device
> >>>>> specific knowledge in software, e.g. in an opt-in table to list devices
> >>>>> which allows arbitrary faults. For devices which only support selective
> >>>>> faulting, a mediator (either through vendor extensions on vfio-pci-core
> >>>>> or a mdev wrapper) might be necessary to help lock down non-
> faultable
> >>>>> mappings and then enable faulting on the rest mappings.
> >>>>
> >>>> For devices which only support selective faulting, they could tell it to the
> >>>> IOMMU driver and let it filter out non-faultable faults? Do I get it wrong?
> >>>
> >>> Not exactly to IOMMU driver. There is already a vfio_pin_pages() for
> >>> selectively page-pinning. The matter is that 'they' imply some device
> >>> specific logic to decide which pages must be pinned and such knowledge
> >>> is outside of VFIO.
> >>>
> >>> From enabling p.o.v we could possibly do it in phased approach. First
> >>> handles devices which tolerate arbitrary DMA faults, and then extends
> >>> to devices with selective-faulting. The former is simpler, but with one
> >>> main open whether we want to maintain such device IDs in a static
> >>> table in VFIO or rely on some hints from other components (e.g. PF
> >>> driver in VF assignment case). Let's see how Alex thinks about it.
> >>
> >> Hi Kevin,
> >>
> >> You mentioned selective-faulting some time ago. I still have some doubt
> >> about it:
> >> There is already a vfio_pin_pages() which is used for limiting the IOMMU
> >> group dirty scope to pinned pages, could it also be used for indicating
> >> the faultable scope is limited to the pinned pages and the rest mappings
> >> is non-faultable that should be pinned and mapped immediately? But it
> >> seems to be a little weird and not exactly to what you meant... I will
> >> be grateful if you can help to explain further. :-)
> >>
> >
> > The opposite, i.e. the vendor driver uses vfio_pin_pages to lock down
> > pages that are not faultable (based on its specific knowledge) and then
> > the rest memory becomes faultable.
> 
> Ahh...
> Thus, from the perspective of VFIO IOMMU, if IOPF enabled for such device,
> only the page faults within the pinned range are valid in the registered
> iommu fault handler...
> I have another question here, for the IOMMU backed devices, they are
> already
> all pinned and mapped when attaching, is there a need to call
> vfio_pin_pages()
> to lock down pages for them? Did I miss something?...
> 

If a device is marked as supporting I/O page fault (fully or selectively),
there should be no pinning at attach or DMA_MAP time (as this
series supposes). Then for devices with selective-faulting, the vendor
driver will lock down the pages which are not faultable at run-time,
e.g. when intercepting guest registration of a ring buffer...

Thanks
Kevin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
  2021-03-18 12:32               ` Tian, Kevin
@ 2021-03-18 12:47                 ` Shenming Lu
  0 siblings, 0 replies; 31+ messages in thread
From: Shenming Lu @ 2021-03-18 12:47 UTC (permalink / raw)
  To: Tian, Kevin, Alex Williamson
  Cc: Cornelia Huck, kvm, linux-kernel, Jean-Philippe Brucker,
	Eric Auger, Lu Baolu, wanghaibin.wang, yuzenghui, Liu, Yi L, Pan,
	Jacob jun

On 2021/3/18 20:32, Tian, Kevin wrote:
>> From: Shenming Lu <lushenming@huawei.com>
>> Sent: Thursday, March 18, 2021 7:54 PM
>>
>> On 2021/3/18 17:07, Tian, Kevin wrote:
>>>> From: Shenming Lu <lushenming@huawei.com>
>>>> Sent: Thursday, March 18, 2021 3:53 PM
>>>>
>>>> On 2021/2/4 14:52, Tian, Kevin wrote:>>> In reality, many
>>>>>>> devices allow I/O faulting only in selective contexts. However, there
>>>>>>> is no standard way (e.g. PCISIG) for the device to report whether
>>>>>>> arbitrary I/O fault is allowed. Then we may have to maintain device
>>>>>>> specific knowledge in software, e.g. in an opt-in table to list devices
>>>>>>> which allows arbitrary faults. For devices which only support selective
>>>>>>> faulting, a mediator (either through vendor extensions on vfio-pci-core
>>>>>>> or a mdev wrapper) might be necessary to help lock down non-
>> faultable
>>>>>>> mappings and then enable faulting on the rest mappings.
>>>>>>
>>>>>> For devices which only support selective faulting, they could tell it to the
>>>>>> IOMMU driver and let it filter out non-faultable faults? Do I get it wrong?
>>>>>
>>>>> Not exactly to IOMMU driver. There is already a vfio_pin_pages() for
>>>>> selectively page-pinning. The matter is that 'they' imply some device
>>>>> specific logic to decide which pages must be pinned and such knowledge
>>>>> is outside of VFIO.
>>>>>
>>>>> From enabling p.o.v we could possibly do it in phased approach. First
>>>>> handles devices which tolerate arbitrary DMA faults, and then extends
>>>>> to devices with selective-faulting. The former is simpler, but with one
>>>>> main open whether we want to maintain such device IDs in a static
>>>>> table in VFIO or rely on some hints from other components (e.g. PF
>>>>> driver in VF assignment case). Let's see how Alex thinks about it.
>>>>
>>>> Hi Kevin,
>>>>
>>>> You mentioned selective-faulting some time ago. I still have some doubt
>>>> about it:
>>>> There is already a vfio_pin_pages() which is used for limiting the IOMMU
>>>> group dirty scope to pinned pages, could it also be used for indicating
>>>> the faultable scope is limited to the pinned pages and the rest mappings
>>>> is non-faultable that should be pinned and mapped immediately? But it
>>>> seems to be a little weird and not exactly to what you meant... I will
>>>> be grateful if you can help to explain further. :-)
>>>>
>>>
>>> The opposite, i.e. the vendor driver uses vfio_pin_pages to lock down
>>> pages that are not faultable (based on its specific knowledge) and then
>>> the rest memory becomes faultable.
>>
>> Ahh...
>> Thus, from the perspective of VFIO IOMMU, if IOPF enabled for such device,
>> only the page faults within the pinned range are valid in the registered
>> iommu fault handler...
>> I have another question here, for the IOMMU backed devices, they are
>> already
>> all pinned and mapped when attaching, is there a need to call
>> vfio_pin_pages()
>> to lock down pages for them? Did I miss something?...
>>
> 
> If a device is marked as supporting I/O page fault (fully or selectively), 
> there should be no pinning at attach or DMA_MAP time (suppose as 
> this series does). Then for devices with selective-faulting its vendor 
> driver will lock down the pages which are not faultable at run-time, 
> e.g. when intercepting guest registration of a ring buffer...

Got it. Thanks a lot for this! :-)

Shenming

> 
> Thanks
> Kevin
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
  2021-03-18 11:53             ` Shenming Lu
  2021-03-18 12:32               ` Tian, Kevin
@ 2021-03-19  0:33               ` Lu Baolu
  2021-03-19  1:30                 ` Keqian Zhu
  1 sibling, 1 reply; 31+ messages in thread
From: Lu Baolu @ 2021-03-19  0:33 UTC (permalink / raw)
  To: Shenming Lu, Tian, Kevin, Alex Williamson
  Cc: baolu.lu, Cornelia Huck, kvm, linux-kernel,
	Jean-Philippe Brucker, Eric Auger, wanghaibin.wang, yuzenghui,
	Liu, Yi L, Pan, Jacob jun

On 3/18/21 7:53 PM, Shenming Lu wrote:
> On 2021/3/18 17:07, Tian, Kevin wrote:
>>> From: Shenming Lu <lushenming@huawei.com>
>>> Sent: Thursday, March 18, 2021 3:53 PM
>>>
>>> On 2021/2/4 14:52, Tian, Kevin wrote:>>> In reality, many
>>>>>> devices allow I/O faulting only in selective contexts. However, there
>>>>>> is no standard way (e.g. PCISIG) for the device to report whether
>>>>>> arbitrary I/O fault is allowed. Then we may have to maintain device
>>>>>> specific knowledge in software, e.g. in an opt-in table to list devices
>>>>>> which allows arbitrary faults. For devices which only support selective
>>>>>> faulting, a mediator (either through vendor extensions on vfio-pci-core
>>>>>> or a mdev wrapper) might be necessary to help lock down non-faultable
>>>>>> mappings and then enable faulting on the rest mappings.
>>>>>
>>>>> For devices which only support selective faulting, they could tell it to the
>>>>> IOMMU driver and let it filter out non-faultable faults? Do I get it wrong?
>>>>
>>>> Not exactly to IOMMU driver. There is already a vfio_pin_pages() for
>>>> selectively page-pinning. The matter is that 'they' imply some device
>>>> specific logic to decide which pages must be pinned and such knowledge
>>>> is outside of VFIO.
>>>>
>>>>  From enabling p.o.v we could possibly do it in phased approach. First
>>>> handles devices which tolerate arbitrary DMA faults, and then extends
>>>> to devices with selective-faulting. The former is simpler, but with one
>>>> main open whether we want to maintain such device IDs in a static
>>>> table in VFIO or rely on some hints from other components (e.g. PF
>>>> driver in VF assignment case). Let's see how Alex thinks about it.
>>>
>>> Hi Kevin,
>>>
>>> You mentioned selective faulting some time ago. I still have some doubts
>>> about it:
>>> There is already a vfio_pin_pages() which is used for limiting the IOMMU
>>> group dirty scope to pinned pages; could it also be used to indicate that
>>> the faultable scope is limited to the pinned pages, while the rest of the
>>> mappings are non-faultable and should be pinned and mapped immediately?
>>> But that seems a little weird and not exactly what you meant... I will
>>> be grateful if you can help to explain further. :-)
>>>
>>
>> The opposite, i.e. the vendor driver uses vfio_pin_pages to lock down
>> pages that are not faultable (based on its specific knowledge) and then
>> the rest of the memory becomes faultable.
> 
> Ahh...
> Thus, from the perspective of the VFIO IOMMU, if IOPF is enabled for such
> a device, only the page faults within the pinned range are valid in the
> registered iommu fault handler...

Isn't it the opposite? The pinned pages will never generate any page faults.
I might be missing some context here.

> I have another question: for IOMMU-backed devices, which are already all
> pinned and mapped at attach time, is there a need to call vfio_pin_pages()
> to lock down pages for them? Did I miss something?...

Best regards,
baolu

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
  2021-03-19  0:33               ` Lu Baolu
@ 2021-03-19  1:30                 ` Keqian Zhu
  2021-03-20  1:35                   ` Lu Baolu
  0 siblings, 1 reply; 31+ messages in thread
From: Keqian Zhu @ 2021-03-19  1:30 UTC (permalink / raw)
  To: Lu Baolu, Shenming Lu, Tian, Kevin, Alex Williamson
  Cc: Cornelia Huck, kvm, linux-kernel, Jean-Philippe Brucker,
	Eric Auger, wanghaibin.wang, yuzenghui, Liu, Yi L, Pan,
	Jacob jun

Hi Baolu,

On 2021/3/19 8:33, Lu Baolu wrote:
> On 3/18/21 7:53 PM, Shenming Lu wrote:
>> On 2021/3/18 17:07, Tian, Kevin wrote:
>>>> From: Shenming Lu <lushenming@huawei.com>
>>>> Sent: Thursday, March 18, 2021 3:53 PM
>>>>
>>>> On 2021/2/4 14:52, Tian, Kevin wrote:
>>>>>>> In reality, many devices allow I/O faulting only in selective
>>>>>>> contexts. However, there is no standard way (e.g. from the PCISIG) for
>>>>>>> the device to report whether arbitrary I/O faults are allowed. Then we
>>>>>>> may have to maintain device-specific knowledge in software, e.g. in an
>>>>>>> opt-in table listing devices which allow arbitrary faults. For devices
>>>>>>> which only support selective faulting, a mediator (either through
>>>>>>> vendor extensions on vfio-pci-core or an mdev wrapper) might be
>>>>>>> necessary to help lock down non-faultable mappings and then enable
>>>>>>> faulting on the remaining mappings.
>>>>>>
>>>>>> For devices which only support selective faulting, could they tell the
>>>>>> IOMMU driver and let it filter out non-faultable faults? Do I get it wrong?
>>>>>
>>>>> Not exactly to the IOMMU driver. There is already a vfio_pin_pages() for
>>>>> selective page-pinning. The matter is that 'they' implies some
>>>>> device-specific logic to decide which pages must be pinned, and such
>>>>> knowledge is outside of VFIO.
>>>>>
>>>>> From an enabling p.o.v. we could possibly do it in a phased approach:
>>>>> first handle devices which tolerate arbitrary DMA faults, then extend
>>>>> to devices with selective faulting. The former is simpler, but one main
>>>>> open question is whether we want to maintain such device IDs in a static
>>>>> table in VFIO or rely on hints from other components (e.g. the PF
>>>>> driver in the VF assignment case). Let's see how Alex thinks about it.
>>>>
>>>> Hi Kevin,
>>>>
>>>> You mentioned selective faulting some time ago. I still have some doubts
>>>> about it:
>>>> There is already a vfio_pin_pages() which is used for limiting the IOMMU
>>>> group dirty scope to pinned pages; could it also be used to indicate that
>>>> the faultable scope is limited to the pinned pages, while the rest of the
>>>> mappings are non-faultable and should be pinned and mapped immediately?
>>>> But that seems a little weird and not exactly what you meant... I will
>>>> be grateful if you can help to explain further. :-)
>>>>
>>>
>>> The opposite, i.e. the vendor driver uses vfio_pin_pages to lock down
>>> pages that are not faultable (based on its specific knowledge) and then
>>> the rest of the memory becomes faultable.
>>
>> Ahh...
>> Thus, from the perspective of the VFIO IOMMU, if IOPF is enabled for such
>> a device, only the page faults within the pinned range are valid in the
>> registered iommu fault handler...
> 
> Isn't it the opposite? The pinned pages will never generate any page faults.
> I might be missing some context here.
It seems that vfio_pin_pages() just pins some pages and records the pinned
scope in the pfn_list of the vfio_dma. No mapping is established, so we can
still have page faults.

IIUC, vfio_pin_pages() is used to:
1. pin pages for non-IOMMU-backed devices;
2. mark the dirty scope for both non-IOMMU-backed and IOMMU-backed devices.
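[For illustration, a simplified sketch of this pin-and-record behaviour; it
is an approximation of the vfio_iommu_type1.c internals being discussed,
not the actual code:

#include <linux/rbtree.h>
#include <linux/slab.h>

/* pinned-page tracking, roughly as in drivers/vfio/vfio_iommu_type1.c */
struct vfio_pfn {
	struct rb_node	node;
	dma_addr_t	iova;		/* pinned device address */
	unsigned long	pfn;		/* host pfn backing it */
	atomic_t	ref_count;
};

static int vfio_record_pinned_page(struct vfio_dma *dma, dma_addr_t iova,
				   unsigned long pfn)
{
	struct vfio_pfn *vpfn = kzalloc(sizeof(*vpfn), GFP_KERNEL);

	if (!vpfn)
		return -ENOMEM;

	vpfn->iova = iova;
	vpfn->pfn = pfn;
	atomic_set(&vpfn->ref_count, 1);
	/* record the pinned scope in dma->pfn_list (an rbtree) */
	vfio_link_pfn(dma, vpfn);
	/*
	 * Note: no iommu_map() happens here, so the IOMMU can still
	 * fault on this IOVA until a mapping is installed.
	 */
	return 0;
}
]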

Thanks,
Keqian

> 
>> I have another question here, for the IOMMU backed devices, they are already
>> all pinned and mapped when attaching, is there a need to call vfio_pin_pages()
>> to lock down pages for them? Did I miss something?...
> 
> Best regards,
> baolu
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough
  2021-03-19  1:30                 ` Keqian Zhu
@ 2021-03-20  1:35                   ` Lu Baolu
  0 siblings, 0 replies; 31+ messages in thread
From: Lu Baolu @ 2021-03-20  1:35 UTC (permalink / raw)
  To: Keqian Zhu, Shenming Lu, Tian, Kevin, Alex Williamson
  Cc: baolu.lu, Cornelia Huck, kvm, linux-kernel,
	Jean-Philippe Brucker, Eric Auger, wanghaibin.wang, yuzenghui,
	Liu, Yi L, Pan, Jacob jun

On 3/19/21 9:30 AM, Keqian Zhu wrote:
> Hi Baolu,
> 
> On 2021/3/19 8:33, Lu Baolu wrote:
>> On 3/18/21 7:53 PM, Shenming Lu wrote:
>>> On 2021/3/18 17:07, Tian, Kevin wrote:
>>>>> From: Shenming Lu <lushenming@huawei.com>
>>>>> Sent: Thursday, March 18, 2021 3:53 PM
>>>>>
>>>>> On 2021/2/4 14:52, Tian, Kevin wrote:
>>>>>>>> In reality, many devices allow I/O faulting only in selective
>>>>>>>> contexts. However, there is no standard way (e.g. from the PCISIG) for
>>>>>>>> the device to report whether arbitrary I/O faults are allowed. Then we
>>>>>>>> may have to maintain device-specific knowledge in software, e.g. in an
>>>>>>>> opt-in table listing devices which allow arbitrary faults. For devices
>>>>>>>> which only support selective faulting, a mediator (either through
>>>>>>>> vendor extensions on vfio-pci-core or an mdev wrapper) might be
>>>>>>>> necessary to help lock down non-faultable mappings and then enable
>>>>>>>> faulting on the remaining mappings.
>>>>>>> For devices which only support selective faulting, could they tell the
>>>>>>> IOMMU driver and let it filter out non-faultable faults? Do I get it wrong?
>>>>>> Not exactly to the IOMMU driver. There is already a vfio_pin_pages() for
>>>>>> selective page-pinning. The matter is that 'they' implies some
>>>>>> device-specific logic to decide which pages must be pinned, and such
>>>>>> knowledge is outside of VFIO.
>>>>>>
>>>>>> From an enabling p.o.v. we could possibly do it in a phased approach:
>>>>>> first handle devices which tolerate arbitrary DMA faults, then extend
>>>>>> to devices with selective faulting. The former is simpler, but one main
>>>>>> open question is whether we want to maintain such device IDs in a static
>>>>>> table in VFIO or rely on hints from other components (e.g. the PF
>>>>>> driver in the VF assignment case). Let's see how Alex thinks about it.
>>>>> Hi Kevin,
>>>>>
>>>>> You mentioned selective faulting some time ago. I still have some doubts
>>>>> about it:
>>>>> There is already a vfio_pin_pages() which is used for limiting the IOMMU
>>>>> group dirty scope to pinned pages; could it also be used to indicate that
>>>>> the faultable scope is limited to the pinned pages, while the rest of the
>>>>> mappings are non-faultable and should be pinned and mapped immediately?
>>>>> But that seems a little weird and not exactly what you meant... I will
>>>>> be grateful if you can help to explain further. :-)
>>>>>
>>>> The opposite, i.e. the vendor driver uses vfio_pin_pages to lock down
>>>> pages that are not faultable (based on its specific knowledge) and then
>>>> the rest of the memory becomes faultable.
>>> Ahh...
>>> Thus, from the perspective of the VFIO IOMMU, if IOPF is enabled for such
>>> a device, only the page faults within the pinned range are valid in the
>>> registered iommu fault handler...
>> Isn't it the opposite? The pinned pages will never generate any page faults.
>> I might be missing some context here.
> It seems that vfio_pin_pages() just pins some pages and records the pinned
> scope in the pfn_list of the vfio_dma. No mapping is established, so we can
> still have page faults.

Makes sense. Thanks a lot for the explanation.

> 
> IIUC, vfio_pin_pages() is used to:
> 1. pin pages for non-IOMMU-backed devices;
> 2. mark the dirty scope for both non-IOMMU-backed and IOMMU-backed devices.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 31+ messages in thread


Thread overview: 31+ messages
2021-01-25  9:03 [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough Shenming Lu
2021-01-25  9:03 ` [RFC PATCH v1 1/4] vfio/type1: Add a bitmap to track IOPF mapped pages Shenming Lu
2021-01-29 22:58   ` Alex Williamson
2021-01-30  9:31     ` Shenming Lu
2021-01-25  9:04 ` [RFC PATCH v1 2/4] vfio: Add a page fault handler Shenming Lu
2021-01-27 17:42   ` Christoph Hellwig
2021-01-28  6:10     ` Shenming Lu
2021-01-25  9:04 ` [RFC PATCH v1 3/4] vfio: Try to enable IOPF for VFIO devices Shenming Lu
2021-01-26 14:33   ` kernel test robot
2021-01-27  6:31   ` kernel test robot
2021-01-29 22:42   ` Alex Williamson
2021-01-30  9:31     ` Shenming Lu
2021-01-25  9:04 ` [RFC PATCH v1 4/4] vfio: Allow to pin and map dynamically Shenming Lu
2021-01-29 22:57 ` [RFC PATCH v1 0/4] vfio: Add IOPF support for VFIO passthrough Alex Williamson
2021-01-30  9:30   ` Shenming Lu
2021-02-01  7:56   ` Tian, Kevin
2021-02-02  6:41     ` Shenming Lu
2021-02-04  6:52       ` Tian, Kevin
2021-02-05 10:37         ` Jean-Philippe Brucker
2021-02-07  8:20           ` Tian, Kevin
2021-02-07 11:47             ` Shenming Lu
2021-02-09 11:06         ` Liu, Yi L
2021-02-10  8:02           ` Shenming Lu
2021-03-18  7:53         ` Shenming Lu
2021-03-18  9:07           ` Tian, Kevin
2021-03-18 11:53             ` Shenming Lu
2021-03-18 12:32               ` Tian, Kevin
2021-03-18 12:47                 ` Shenming Lu
2021-03-19  0:33               ` Lu Baolu
2021-03-19  1:30                 ` Keqian Zhu
2021-03-20  1:35                   ` Lu Baolu
