* [PATCH 0/2] vfio/type1/pci: IOMMU PFNMAP invalidation
@ 2020-05-14 16:51 Alex Williamson
  2020-05-14 16:51 ` [PATCH 1/2] vfio: Introduce bus driver to IOMMU invalidation interface Alex Williamson
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Alex Williamson @ 2020-05-14 16:51 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, cohuck, jgg, peterx

This is a follow-on series to "vfio-pci: Block user access to disabled
device MMIO"[1], which extends user access blocking of disabled MMIO
ranges to include unmapping the ranges from the IOMMU.  The first patch
adds an invalidation callback path, allowing vfio bus drivers to signal
the IOMMU backend to unmap ranges with vma level granularity.  This
signaling is done both when the MMIO range becomes inaccessible due to
memory disabling and when a vma is closed, making up for the
lack of tracking or pinning for non-page backed vmas.  The second
patch adds registration and testing interfaces such that the IOMMU
backend driver can test whether a given PFNMAP vma is provided by a
vfio bus driver supporting invalidation.  We can then implement more
restricted semantics to only allow PFNMAP DMA mappings when we have
such support, which becomes the new default.

Jason, if you'd like Suggested-by credit for the ideas here I'd be
glad to add it.  Thanks,

Alex

[1]https://lore.kernel.org/kvm/158871401328.15589.17598154478222071285.stgit@gimli.home/

---

Alex Williamson (2):
      vfio: Introduce bus driver to IOMMU invalidation interface
      vfio: Introduce strict PFNMAP mappings


 drivers/vfio/pci/vfio_pci.c         |   41 ++++++++++-
 drivers/vfio/pci/vfio_pci_private.h |    1 +
 drivers/vfio/vfio.c                 |   76 ++++++++++++++++++++
 drivers/vfio/vfio_iommu_type1.c     |  130 +++++++++++++++++++++++++++--------
 include/linux/vfio.h                |    9 ++
 5 files changed, 222 insertions(+), 35 deletions(-)


* [PATCH 1/2] vfio: Introduce bus driver to IOMMU invalidation interface
  2020-05-14 16:51 [PATCH 0/2] vfio/type1/pci: IOMMU PFNMAP invalidation Alex Williamson
@ 2020-05-14 16:51 ` Alex Williamson
  2020-05-20  0:19   ` Jason Gunthorpe
  2020-05-14 16:52 ` [PATCH 2/2] vfio: Introduce strict PFNMAP mappings Alex Williamson
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Alex Williamson @ 2020-05-14 16:51 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, cohuck, jgg, peterx

VFIO bus drivers, like vfio-pci, can allow mmaps of non-page backed
device memory, such as MMIO regions of the device.  The user may then
map these ranges through the IOMMU, for example to enable peer-to-peer
DMA between devices.  When these ranges are zapped or removed from the
user, such as when the MMIO region is disabled or the device is
released, we should also remove the IOMMU mapping.  This provides
kernel level enforcement of the behavior we already see from userspace
drivers like QEMU, where these ranges are unmapped when they become
inaccessible.  This userspace behavior is still recommended, as this
support only provides invalidation, dropping the IOMMU mappings of
affected vmas; those mappings are not automatically re-established
when the vma is re-mapped.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
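
For context, userspace creates the PFNMAP mappings in question roughly
as follows (illustrative sketch only; error handling omitted, and the
BAR offset/size are assumed to come from VFIO_DEVICE_GET_REGION_INFO):

#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

static void map_bar_for_p2p(int container_fd, int device_fd,
			    off_t bar_offset, size_t bar_size, __u64 iova)
{
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.iova = iova,
		.size = bar_size,
	};
	/* mmap of a device BAR yields a VM_PFNMAP vma */
	void *vaddr = mmap(NULL, bar_size, PROT_READ | PROT_WRITE,
			   MAP_SHARED, device_fd, bar_offset);

	/* map the BAR through the IOMMU, e.g. for p2p DMA by another device */
	map.vaddr = (__u64)(unsigned long)vaddr;
	ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}
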
 drivers/vfio/pci/vfio_pci.c         |   34 +++++++++-
 drivers/vfio/pci/vfio_pci_private.h |    1 +
 drivers/vfio/vfio.c                 |   14 ++++
 drivers/vfio/vfio_iommu_type1.c     |  123 ++++++++++++++++++++++++++---------
 include/linux/vfio.h                |    5 +
 5 files changed, 142 insertions(+), 35 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 49ae9faa6099..100fe5f6bc22 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -521,6 +521,8 @@ static void vfio_pci_release(void *device_data)
 		vfio_pci_vf_token_user_add(vdev, -1);
 		vfio_spapr_pci_eeh_release(vdev->pdev);
 		vfio_pci_disable(vdev);
+		vfio_group_put_external_user(vdev->group);
+		vdev->group = NULL;
 	}
 
 	mutex_unlock(&vdev->reflck->lock);
@@ -539,6 +541,15 @@ static int vfio_pci_open(void *device_data)
 	mutex_lock(&vdev->reflck->lock);
 
 	if (!vdev->refcnt) {
+		struct pci_dev *pdev = vdev->pdev;
+
+		vdev->group = vfio_group_get_external_user_from_dev(&pdev->dev);
+		if (IS_ERR_OR_NULL(vdev->group)) {
+			ret = vdev->group ? PTR_ERR(vdev->group) : -ENODEV;
+			vdev->group = NULL;
+			goto error;
+		}
+
 		ret = vfio_pci_enable(vdev);
 		if (ret)
 			goto error;
@@ -549,8 +560,13 @@ static int vfio_pci_open(void *device_data)
 	vdev->refcnt++;
 error:
 	mutex_unlock(&vdev->reflck->lock);
-	if (ret)
+	if (ret) {
 		module_put(THIS_MODULE);
+		if (vdev->group) {
+			vfio_group_put_external_user(vdev->group);
+			vdev->group = NULL;
+		}
+	}
 	return ret;
 }
 
@@ -1370,7 +1386,7 @@ static ssize_t vfio_pci_write(void *device_data, const char __user *buf,
 /* Return 1 on zap and vma_lock acquired, 0 on contention (only with @try) */
 static int vfio_pci_zap_and_vma_lock(struct vfio_pci_device *vdev, bool try)
 {
-	struct vfio_pci_mmap_vma *mmap_vma, *tmp;
+	struct vfio_pci_mmap_vma *mmap_vma;
 
 	/*
 	 * Lock ordering:
@@ -1420,6 +1436,7 @@ static int vfio_pci_zap_and_vma_lock(struct vfio_pci_device *vdev, bool try)
 			return 1;
 		mutex_unlock(&vdev->vma_lock);
 
+again:
 		if (try) {
 			if (!down_read_trylock(&mm->mmap_sem)) {
 				mmput(mm);
@@ -1438,8 +1455,8 @@ static int vfio_pci_zap_and_vma_lock(struct vfio_pci_device *vdev, bool try)
 			} else {
 				mutex_lock(&vdev->vma_lock);
 			}
-			list_for_each_entry_safe(mmap_vma, tmp,
-						 &vdev->vma_list, vma_next) {
+			list_for_each_entry(mmap_vma,
+					    &vdev->vma_list, vma_next) {
 				struct vm_area_struct *vma = mmap_vma->vma;
 
 				if (vma->vm_mm != mm)
@@ -1450,6 +1467,10 @@ static int vfio_pci_zap_and_vma_lock(struct vfio_pci_device *vdev, bool try)
 
 				zap_vma_ptes(vma, vma->vm_start,
 					     vma->vm_end - vma->vm_start);
+				mutex_unlock(&vdev->vma_lock);
+				up_read(&mm->mmap_sem);
+				vfio_invalidate_pfnmap_vma(vdev->group, vma);
+				goto again;
 			}
 			mutex_unlock(&vdev->vma_lock);
 		}
@@ -1494,16 +1515,21 @@ static void vfio_pci_mmap_close(struct vm_area_struct *vma)
 {
 	struct vfio_pci_device *vdev = vma->vm_private_data;
 	struct vfio_pci_mmap_vma *mmap_vma;
+	bool found = false;
 
 	mutex_lock(&vdev->vma_lock);
 	list_for_each_entry(mmap_vma, &vdev->vma_list, vma_next) {
 		if (mmap_vma->vma == vma) {
 			list_del(&mmap_vma->vma_next);
 			kfree(mmap_vma);
+			found = true;
 			break;
 		}
 	}
 	mutex_unlock(&vdev->vma_lock);
+
+	if (found)
+		vfio_invalidate_pfnmap_vma(vdev->group, vma);
 }
 
 static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index c4f25f1e80d7..abfb60674506 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -140,6 +140,7 @@ struct vfio_pci_device {
 	struct mutex		vma_lock;
 	struct list_head	vma_list;
 	struct rw_semaphore	memory_lock;
+	struct vfio_group	*group;
 };
 
 #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 765e0e5d83ed..0fff057b7cd9 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -2151,6 +2151,20 @@ int vfio_dma_rw(struct vfio_group *group, dma_addr_t user_iova,
 }
 EXPORT_SYMBOL(vfio_dma_rw);
 
+void vfio_invalidate_pfnmap_vma(struct vfio_group *group,
+				struct vm_area_struct *vma)
+{
+	struct vfio_container *container;
+	struct vfio_iommu_driver *driver;
+
+	container = group->container;
+	driver = container->iommu_driver;
+
+	if (likely(driver && driver->ops->invalidate_pfnmap_vma))
+		driver->ops->invalidate_pfnmap_vma(container->iommu_data, vma);
+}
+EXPORT_SYMBOL(vfio_invalidate_pfnmap_vma);
+
 static int vfio_register_iommu_notifier(struct vfio_group *group,
 					unsigned long *events,
 					struct notifier_block *nb)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 4a4cb7cd86b2..62ba6bd8a486 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -91,6 +91,7 @@ struct vfio_dma {
 	bool			lock_cap;	/* capable(CAP_IPC_LOCK) */
 	struct task_struct	*task;
 	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
+	struct vm_area_struct	*pfnmap_vma;
 };
 
 struct vfio_group {
@@ -343,21 +344,25 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
 	return ret;
 }
 
-static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
-			 int prot, unsigned long *pfn)
+static int vaddr_get_pfn(struct vfio_dma *dma, struct mm_struct *mm,
+			 unsigned long vaddr, unsigned long *pfn)
 {
 	struct page *page[1];
 	struct vm_area_struct *vma;
 	unsigned int flags = 0;
 	int ret;
 
-	if (prot & IOMMU_WRITE)
+	if (dma->prot & IOMMU_WRITE)
 		flags |= FOLL_WRITE;
 
 	down_read(&mm->mmap_sem);
 	ret = pin_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
 				    page, NULL, NULL);
 	if (ret == 1) {
+		if (dma->pfnmap_vma) {
+			unpin_user_page(page[0]);
+			/* exit via done to drop mmap_sem */
+			ret = -EINVAL;
+			goto done;
+		}
 		*pfn = page_to_pfn(page[0]);
 		ret = 0;
 		goto done;
@@ -369,7 +374,15 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 	vma = find_vma_intersection(mm, vaddr, vaddr + 1);
 
 	if (vma && vma->vm_flags & VM_PFNMAP) {
-		ret = follow_fault_pfn(vma, mm, vaddr, pfn, prot & IOMMU_WRITE);
+		if ((dma->pfnmap_vma && dma->pfnmap_vma != vma) ||
+		    (!dma->pfnmap_vma && vaddr != dma->vaddr)) {
+			ret = -EPERM;
+			goto done;
+		}
+
+		dma->pfnmap_vma = vma;
+
+		ret = follow_fault_pfn(vma, mm, vaddr, pfn, flags & FOLL_WRITE);
 		if (ret == -EAGAIN)
 			goto retry;
 
@@ -399,7 +412,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 	if (!current->mm)
 		return -ENODEV;
 
-	ret = vaddr_get_pfn(current->mm, vaddr, dma->prot, pfn_base);
+	ret = vaddr_get_pfn(dma, current->mm, vaddr, pfn_base);
 	if (ret)
 		return ret;
 
@@ -426,7 +439,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 	/* Lock all the consecutive pages from pfn_base */
 	for (vaddr += PAGE_SIZE, iova += PAGE_SIZE; pinned < npage;
 	     pinned++, vaddr += PAGE_SIZE, iova += PAGE_SIZE) {
-		ret = vaddr_get_pfn(current->mm, vaddr, dma->prot, &pfn);
+		ret = vaddr_get_pfn(dma, current->mm, vaddr, &pfn);
 		if (ret)
 			break;
 
@@ -496,7 +509,7 @@ static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
 	if (!mm)
 		return -ENODEV;
 
-	ret = vaddr_get_pfn(mm, vaddr, dma->prot, pfn_base);
+	ret = vaddr_get_pfn(dma, mm, vaddr, pfn_base);
 	if (!ret && do_accounting && !is_invalid_reserved_pfn(*pfn_base)) {
 		ret = vfio_lock_acct(dma, 1, true);
 		if (ret) {
@@ -860,6 +873,75 @@ static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
 	return bitmap;
 }
 
+static struct vfio_dma *vfio_find_dma_by_pfnmap_vma(struct vfio_iommu *iommu,
+						    struct vm_area_struct *vma)
+{
+	struct rb_node *node = rb_first(&iommu->dma_list);
+
+	for (; node; node = rb_next(node)) {
+		struct vfio_dma *dma = rb_entry(node, struct vfio_dma, node);
+
+		if (dma->pfnmap_vma == vma)
+			return dma;
+	}
+
+	return NULL;
+}
+
+/*
+ * Called with iommu->lock
+ * Return 1 if lock dropped to notify pfn holders, zero otherwise
+ */
+static int unmap_dma_pfn_list(struct vfio_iommu *iommu, struct vfio_dma *dma,
+			      struct vfio_dma **dma_last, int *retries)
+{
+	if (!RB_EMPTY_ROOT(&dma->pfn_list)) {
+		struct vfio_iommu_type1_dma_unmap nb_unmap;
+
+		if (*dma_last == dma) {
+			BUG_ON(++(*retries) > 10);
+		} else {
+			*dma_last = dma;
+			*retries = 0;
+		}
+
+		nb_unmap.iova = dma->iova;
+		nb_unmap.size = dma->size;
+
+		/*
+		 * Notify anyone (mdev vendor drivers) to invalidate and
+		 * unmap iovas within the range we're about to unmap.
+		 * Vendor drivers MUST unpin pages in response to an
+		 * invalidation.
+		 */
+		mutex_unlock(&iommu->lock);
+		blocking_notifier_call_chain(&iommu->notifier,
+					    VFIO_IOMMU_NOTIFY_DMA_UNMAP,
+					    &nb_unmap);
+		return 1;
+	}
+	return 0;
+}
+
+static void vfio_iommu_type1_invalidate_pfnmap_vma(void *iommu_data,
+						   struct vm_area_struct *vma)
+{
+	struct vfio_iommu *iommu = iommu_data;
+	struct vfio_dma *dma, *dma_last = NULL;
+	int retries = 0;
+
+again:
+	mutex_lock(&iommu->lock);
+
+	while ((dma = vfio_find_dma_by_pfnmap_vma(iommu, vma))) {
+		if (unmap_dma_pfn_list(iommu, dma, &dma_last, &retries))
+			goto again;
+
+		vfio_remove_dma(iommu, dma);
+	}
+	mutex_unlock(&iommu->lock);
+}
+
 static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 			     struct vfio_iommu_type1_dma_unmap *unmap)
 {
@@ -936,31 +1018,9 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 		if (dma->task->mm != current->mm)
 			break;
 
-		if (!RB_EMPTY_ROOT(&dma->pfn_list)) {
-			struct vfio_iommu_type1_dma_unmap nb_unmap;
-
-			if (dma_last == dma) {
-				BUG_ON(++retries > 10);
-			} else {
-				dma_last = dma;
-				retries = 0;
-			}
-
-			nb_unmap.iova = dma->iova;
-			nb_unmap.size = dma->size;
-
-			/*
-			 * Notify anyone (mdev vendor drivers) to invalidate and
-			 * unmap iovas within the range we're about to unmap.
-			 * Vendor drivers MUST unpin pages in response to an
-			 * invalidation.
-			 */
-			mutex_unlock(&iommu->lock);
-			blocking_notifier_call_chain(&iommu->notifier,
-						    VFIO_IOMMU_NOTIFY_DMA_UNMAP,
-						    &nb_unmap);
+		if (unmap_dma_pfn_list(iommu, dma, &dma_last, &retries))
 			goto again;
-		}
+
 		unmapped += dma->size;
 		vfio_remove_dma(iommu, dma);
 	}
@@ -2423,6 +2483,7 @@ static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
 	.register_notifier	= vfio_iommu_type1_register_notifier,
 	.unregister_notifier	= vfio_iommu_type1_unregister_notifier,
 	.dma_rw			= vfio_iommu_type1_dma_rw,
+	.invalidate_pfnmap_vma	= vfio_iommu_type1_invalidate_pfnmap_vma,
 };
 
 static int __init vfio_iommu_type1_init(void)
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 5d92ee15d098..13abfecc1530 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -88,6 +88,8 @@ struct vfio_iommu_driver_ops {
 					       struct notifier_block *nb);
 	int		(*dma_rw)(void *iommu_data, dma_addr_t user_iova,
 				  void *data, size_t count, bool write);
+	void		(*invalidate_pfnmap_vma)(void *iommu_data,
+						 struct vm_area_struct *vma);
 };
 
 extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
@@ -124,6 +126,9 @@ extern int vfio_group_unpin_pages(struct vfio_group *group,
 extern int vfio_dma_rw(struct vfio_group *group, dma_addr_t user_iova,
 		       void *data, size_t len, bool write);
 
+extern void vfio_invalidate_pfnmap_vma(struct vfio_group *group,
+				       struct vm_area_struct *vma);
+
 /* each type has independent events */
 enum vfio_notify_type {
 	VFIO_IOMMU_NOTIFY = 0,


* [PATCH 2/2] vfio: Introduce strict PFNMAP mappings
  2020-05-14 16:51 [PATCH 0/2] vfio/type1/pci: IOMMU PFNMAP invalidation Alex Williamson
  2020-05-14 16:51 ` [PATCH 1/2] vfio: Introduce bus driver to IOMMU invalidation interface Alex Williamson
@ 2020-05-14 16:52 ` Alex Williamson
  2020-05-20  0:20   ` Jason Gunthorpe
  2020-05-14 21:25 ` [PATCH 0/2] vfio/type1/pci: IOMMU PFNMAP invalidation Peter Xu
  2020-05-20  0:23 ` Jason Gunthorpe
  3 siblings, 1 reply; 13+ messages in thread
From: Alex Williamson @ 2020-05-14 16:52 UTC (permalink / raw)
  To: kvm; +Cc: linux-kernel, cohuck, jgg, peterx

We can't pin PFNMAP IOMMU mappings like we can standard page-backed
mappings, therefore without an invalidation mechanism we can't know
if we should have revoked a user's mapping.  Now that we have an
invalidation callback mechanism we can create an interface for vfio
bus drivers to indicate their support for invalidation by registering
supported vm_ops functions with vfio-core.  A vfio IOMMU backend
driver can then test a vma's vm_ops against those registered as
supporting invalidation to determine whether to allow such a mapping.
The type1
backend then adopts a new 'strict_mmio_maps' module option, enabled
by default, restricting IOMMU mapping of PFNMAP vmas to only those
supporting invalidation callbacks.  vfio-pci is updated to register
vfio_pci_mmap_ops as supporting this feature.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
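
For illustration, a bus driver opts into this scheme roughly as below
(hypothetical driver; the "foo" names are placeholders, and the actual
vfio-pci registration is in the diff):

/* vm_ops whose .close path calls vfio_invalidate_pfnmap_vma() */
static const struct vm_operations_struct foo_vfio_mmap_ops = {
	.close = foo_vfio_mmap_close,
	.fault = foo_vfio_mmap_fault,
};

static int __init foo_vfio_init(void)
{
	/* advertise that vmas carrying these vm_ops support invalidation */
	return vfio_register_vma_inv_ops(&foo_vfio_mmap_ops);
}

static void __exit foo_vfio_exit(void)
{
	vfio_unregister_vma_inv_ops(&foo_vfio_mmap_ops);
}
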
 drivers/vfio/pci/vfio_pci.c     |    7 ++++
 drivers/vfio/vfio.c             |   62 +++++++++++++++++++++++++++++++++++++++
 drivers/vfio/vfio_iommu_type1.c |    9 +++++-
 include/linux/vfio.h            |    4 +++
 4 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 100fe5f6bc22..dbfe6a11aa74 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -2281,6 +2281,7 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev)
 
 static void __exit vfio_pci_cleanup(void)
 {
+	vfio_unregister_vma_inv_ops(&vfio_pci_mmap_ops);
 	pci_unregister_driver(&vfio_pci_driver);
 	vfio_pci_uninit_perm_bits();
 }
@@ -2340,10 +2341,16 @@ static int __init vfio_pci_init(void)
 	if (ret)
 		goto out_driver;
 
+	ret = vfio_register_vma_inv_ops(&vfio_pci_mmap_ops);
+	if (ret)
+		goto out_inv_ops;
+
 	vfio_pci_fill_ids();
 
 	return 0;
 
+out_inv_ops:
+	pci_unregister_driver(&vfio_pci_driver);
 out_driver:
 	vfio_pci_uninit_perm_bits();
 	return ret;
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 0fff057b7cd9..0f0a9d3b38aa 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -47,6 +47,8 @@ static struct vfio {
 	struct cdev			group_cdev;
 	dev_t				group_devt;
 	wait_queue_head_t		release_q;
+	struct list_head		vma_inv_ops_list;
+	struct mutex			vma_inv_ops_lock;
 } vfio;
 
 struct vfio_iommu_driver {
@@ -98,6 +100,11 @@ struct vfio_device {
 	void				*device_data;
 };
 
+struct vfio_vma_inv_ops {
+	const struct vm_operations_struct	*ops;
+	struct list_head			ops_next;
+};
+
 #ifdef CONFIG_VFIO_NOIOMMU
 static bool noiommu __read_mostly;
 module_param_named(enable_unsafe_noiommu_mode,
@@ -2332,6 +2339,58 @@ int vfio_unregister_notifier(struct device *dev, enum vfio_notify_type type,
 }
 EXPORT_SYMBOL(vfio_unregister_notifier);
 
+int vfio_register_vma_inv_ops(const struct vm_operations_struct *ops)
+{
+	struct vfio_vma_inv_ops *inv_ops;
+
+	inv_ops = kmalloc(sizeof(*inv_ops), GFP_KERNEL);
+	if (!inv_ops)
+		return -ENOMEM;
+
+	inv_ops->ops = ops;
+
+	mutex_lock(&vfio.vma_inv_ops_lock);
+	list_add(&inv_ops->ops_next, &vfio.vma_inv_ops_list);
+	mutex_unlock(&vfio.vma_inv_ops_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL(vfio_register_vma_inv_ops);
+
+void vfio_unregister_vma_inv_ops(const struct vm_operations_struct *ops)
+{
+	struct vfio_vma_inv_ops *inv_ops;
+
+	mutex_lock(&vfio.vma_inv_ops_lock);
+	list_for_each_entry(inv_ops, &vfio.vma_inv_ops_list, ops_next) {
+		if (inv_ops->ops == ops) {
+			list_del(&inv_ops->ops_next);
+			kfree(inv_ops);
+			break;
+		}
+	}
+	mutex_unlock(&vfio.vma_inv_ops_lock);
+}
+EXPORT_SYMBOL(vfio_unregister_vma_inv_ops);
+
+bool vfio_vma_has_inv_ops(struct vm_area_struct *vma)
+{
+	struct vfio_vma_inv_ops *inv_ops;
+	bool ret = false;
+
+	mutex_lock(&vfio.vma_inv_ops_lock);
+	list_for_each_entry(inv_ops, &vfio.vma_inv_ops_list, ops_next) {
+		if (inv_ops->ops == vma->vm_ops) {
+			ret = true;
+			break;
+		}
+	}
+	mutex_unlock(&vfio.vma_inv_ops_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(vfio_vma_has_inv_ops);
+
 /**
  * Module/class support
  */
@@ -2355,8 +2414,10 @@ static int __init vfio_init(void)
 	idr_init(&vfio.group_idr);
 	mutex_init(&vfio.group_lock);
 	mutex_init(&vfio.iommu_drivers_lock);
+	mutex_init(&vfio.vma_inv_ops_lock);
 	INIT_LIST_HEAD(&vfio.group_list);
 	INIT_LIST_HEAD(&vfio.iommu_drivers_list);
+	INIT_LIST_HEAD(&vfio.vma_inv_ops_list);
 	init_waitqueue_head(&vfio.release_q);
 
 	ret = misc_register(&vfio_dev);
@@ -2403,6 +2464,7 @@ static int __init vfio_init(void)
 static void __exit vfio_cleanup(void)
 {
 	WARN_ON(!list_empty(&vfio.group_list));
+	WARN_ON(!list_empty(&vfio.vma_inv_ops_list));
 
 #ifdef CONFIG_VFIO_NOIOMMU
 	vfio_unregister_iommu_driver(&vfio_noiommu_ops);
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 62ba6bd8a486..8d6286d89230 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -61,6 +61,11 @@ module_param_named(dma_entry_limit, dma_entry_limit, uint, 0644);
 MODULE_PARM_DESC(dma_entry_limit,
 		 "Maximum number of user DMA mappings per container (65535).");
 
+static bool strict_mmio_maps = true;
+module_param_named(strict_mmio_maps, strict_mmio_maps, bool, 0644);
+MODULE_PARM_DESC(strict_mmio_maps,
+		 "Restrict DMA mappings of MMIO to those provided by vfio bus drivers supporting invalidation (true).");
+
 struct vfio_iommu {
 	struct list_head	domain_list;
 	struct list_head	iova_list;
@@ -375,7 +380,9 @@ static int vaddr_get_pfn(struct vfio_dma *dma, struct mm_struct *mm,
 
 	if (vma && vma->vm_flags & VM_PFNMAP) {
 		if ((dma->pfnmap_vma && dma->pfnmap_vma != vma) ||
-		    (!dma->pfnmap_vma && vaddr != dma->vaddr)) {
+		    (!dma->pfnmap_vma && (vaddr != dma->vaddr ||
+					  (strict_mmio_maps &&
+					   !vfio_vma_has_inv_ops(vma))))) {
 			ret = -EPERM;
 			goto done;
 		}
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 13abfecc1530..edc393f1287d 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -149,6 +149,10 @@ extern int vfio_unregister_notifier(struct device *dev,
 				    enum vfio_notify_type type,
 				    struct notifier_block *nb);
 
+extern int vfio_register_vma_inv_ops(const struct vm_operations_struct *ops);
+extern void vfio_unregister_vma_inv_ops(const struct vm_operations_struct *ops);
+extern bool vfio_vma_has_inv_ops(struct vm_area_struct *vma);
+
 struct kvm;
 extern void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
 


* Re: [PATCH 0/2] vfio/type1/pci: IOMMU PFNMAP invalidation
  2020-05-14 16:51 [PATCH 0/2] vfio/type1/pci: IOMMU PFNMAP invalidation Alex Williamson
  2020-05-14 16:51 ` [PATCH 1/2] vfio: Introduce bus driver to IOMMU invalidation interface Alex Williamson
  2020-05-14 16:52 ` [PATCH 2/2] vfio: Introduce strict PFNMAP mappings Alex Williamson
@ 2020-05-14 21:25 ` Peter Xu
  2020-05-14 22:17   ` Alex Williamson
  2020-05-20  0:23 ` Jason Gunthorpe
  3 siblings, 1 reply; 13+ messages in thread
From: Peter Xu @ 2020-05-14 21:25 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, linux-kernel, cohuck, jgg

On Thu, May 14, 2020 at 10:51:46AM -0600, Alex Williamson wrote:
> This is a follow-on series to "vfio-pci: Block user access to disabled
> device MMIO"[1], which extends user access blocking of disabled MMIO
> ranges to include unmapping the ranges from the IOMMU.  The first patch
> adds an invalidation callback path, allowing vfio bus drivers to signal
> the IOMMU backend to unmap ranges with vma level granularity.  This
> signaling is done both when the MMIO range becomes inaccessible due to
> memory disabling and when a vma is closed, making up for the
> lack of tracking or pinning for non-page backed vmas.  The second
> patch adds registration and testing interfaces such that the IOMMU
> backend driver can test whether a given PFNMAP vma is provided by a
> vfio bus driver supporting invalidation.  We can then implement more
> restricted semantics to only allow PFNMAP DMA mappings when we have
> such support, which becomes the new default.

Hi, Alex,

IIUC we'll directly tear down the IOMMU page table without telling
userspace for those PFNMAP pages.  I'm wondering whether there could be
any side effect on the userspace side if userspace has cached this
mapping information somehow.  E.g., is there a way for userspace to
know about this event?

Currently, QEMU VT-d will maintain all the IOVA mappings for each assigned
device when used with vfio-pci.  In this case, QEMU will probably need
to depend on some invalidations sent from the guest (either userspace
or kernel) device drivers to invalidate such IOVA mappings after they
get removed from the hardware IOMMU page table underneath.  I haven't
thought deeply about what would happen if the vIOMMU ends up with an
inconsistent mapping of the real device.

Thanks,

-- 
Peter Xu


* Re: [PATCH 0/2] vfio/type1/pci: IOMMU PFNMAP invalidation
  2020-05-14 21:25 ` [PATCH 0/2] vfio/type1/pci: IOMMU PFNMAP invalidation Peter Xu
@ 2020-05-14 22:17   ` Alex Williamson
  2020-05-14 22:24     ` Jason Gunthorpe
  0 siblings, 1 reply; 13+ messages in thread
From: Alex Williamson @ 2020-05-14 22:17 UTC (permalink / raw)
  To: Peter Xu; +Cc: kvm, linux-kernel, cohuck, jgg

On Thu, 14 May 2020 17:25:38 -0400
Peter Xu <peterx@redhat.com> wrote:

> On Thu, May 14, 2020 at 10:51:46AM -0600, Alex Williamson wrote:
> > This is a follow-on series to "vfio-pci: Block user access to disabled
> > device MMIO"[1], which extends user access blocking of disabled MMIO
> > ranges to include unmapping the ranges from the IOMMU.  The first patch
> > adds an invalidation callback path, allowing vfio bus drivers to signal
> > the IOMMU backend to unmap ranges with vma level granularity.  This
> > signaling is done both when the MMIO range becomes inaccessible due to
> > memory disabling and when a vma is closed, making up for the
> > lack of tracking or pinning for non-page backed vmas.  The second
> > patch adds registration and testing interfaces such that the IOMMU
> > backend driver can test whether a given PFNMAP vma is provided by a
> > vfio bus driver supporting invalidation.  We can then implement more
> > restricted semantics to only allow PFNMAP DMA mappings when we have
> > such support, which becomes the new default.  
> 
> Hi, Alex,
> 
> IIUC we'll directly tear down the IOMMU page table without telling
> userspace for those PFNMAP pages.  I'm wondering whether there could be
> any side effect on the userspace side if userspace has cached this
> mapping information somehow.  E.g., is there a way for userspace to
> know about this event?
> 
> Currently, QEMU VT-d will maintain all the IOVA mappings for each assigned
> device when used with vfio-pci.  In this case, QEMU will probably need
> to depend on some invalidations sent from the guest (either userspace
> or kernel) device drivers to invalidate such IOVA mappings after they
> get removed from the hardware IOMMU page table underneath.  I haven't
> thought deeply about what would happen if the vIOMMU ends up with an
> inconsistent mapping of the real device.

Full disclosure: I haven't tested vIOMMU, so there might be issues.  Let's
puzzle through this.  Without a vIOMMU, the vfio MemoryListener in QEMU
makes use of address_space_memory, which is essentially the vCPU view
of memory.  When the memory bit of a PCI device is disabled, QEMU
correctly removes the MMIO regions of the device from this AddressSpace.
When re-enabled, they get re-added.  In that case what we're doing here
is a little bit redundant, the IOMMU mappings get dropped in response
to the memory bit and the subsequent callback from the MemoryListener
is effectively a no-op since the range is already unmapped.  When the
memory bit is re-enabled, the AddressSpace gets updated, the
MemoryListener fires and we re-fault the mmap as we're re-adding the
IOMMU mapping.
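
(For reference, the unmap that the MemoryListener issues in that case
boils down to an ioctl along these lines; sketch only, with iova/size
taken from the MemoryRegionSection being removed:

	struct vfio_iommu_type1_dma_unmap unmap = {
		.argsz = sizeof(unmap),
		.iova = iova,
		.size = size,
	};

	ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap);

With this series applied, that ioctl simply finds nothing left to
unmap.)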

When we have a vIOMMU, the address_space_memory behavior should be the
same as above; the vIOMMU isn't involved in vCPU to device access.  So
I think our concern is explicit mappings, much like the ones vfio
itself makes.
That feels like a gap.  When the vma is being closed, I think dropping
those mappings out from under the user is probably still the right
approach and I think this series would still be useful if we only did
that much.  I think this would also address Jason's primary concern.
It's better to get an IOMMU fault from the user trying to access those
mappings than it is to leave them in place.

OTOH, if we're dropping mappings in response to disabling the memory
bit, we're changing a potential disabled MMIO access fault into an
IOMMU fault, where the platform response might very well be fatal in
either case.  Maybe we need to look at a more temporary invalidation
due to the memory enable bit.  If we could remap the range to a kernel
page we could maybe avoid the IOMMU fault and maybe even have a crude
test for whether any data was written to the page while that mapping
was in place (ie. simulating more restricted error handling, though
more asynchronous than done at the platform level).  Let me look into
it.  Thanks,

Alex
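
P.S. Roughly the shape I'm imagining for the fault path; entirely
untested, and "dummy_page" would be a new per-device scratch page that
doesn't exist today:

static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
{
	struct vfio_pci_device *vdev = vmf->vma->vm_private_data;

	/*
	 * While the memory bit is disabled, back the vma with a
	 * scratch kernel page rather than device MMIO, avoiding
	 * both the disabled MMIO access and the IOMMU fault.
	 */
	if (!__vfio_pci_memory_enabled(vdev))
		return vmf_insert_pfn(vmf->vma, vmf->address,
				      page_to_pfn(vdev->dummy_page));

	/* ... otherwise insert the real MMIO pfn as we do today ... */
}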


* Re: [PATCH 0/2] vfio/type1/pci: IOMMU PFNMAP invalidation
  2020-05-14 22:17   ` Alex Williamson
@ 2020-05-14 22:24     ` Jason Gunthorpe
  2020-05-14 22:55       ` Alex Williamson
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2020-05-14 22:24 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Peter Xu, kvm, linux-kernel, cohuck

On Thu, May 14, 2020 at 04:17:12PM -0600, Alex Williamson wrote:

> that much.  I think this would also address Jason's primary concern.
> It's better to get an IOMMU fault from the user trying to access those
> mappings than it is to leave them in place.

Yes, there are a few options here - if the pages are available for use
by the IOMMU and *asynchronously* someone else revokes them, then the
only way to protect the kernel is to block them from the IOMMU.

For this to be sane the revocation must be under complete control of
the VFIO user. ie if a user decides to disable MMIO traffic then of
course the IOMMU should block P2P transfer to the MMIO bar. It is user
error to have not disabled those transfers in the first place.

When this is all done inside a guest the whole logic applies. On bare
metal you might get some AER or crash or MCE. In virtualization you'll
get an IOMMU fault.

> due to the memory enable bit.  If we could remap the range to a kernel
> page we could maybe avoid the IOMMU fault and maybe even have a crude
> test for whether any data was written to the page while that mapping
> was in place (ie. simulating more restricted error handling, though
> more asynchronous than done at the platform level).  

I'm not sure if this makes sense, can't we arrange to directly trap the
IOMMU failure and route it into qemu if that is what is desired?

I'll try to look at this next week, swamped right now

Thanks,
Jason

* Re: [PATCH 0/2] vfio/type1/pci: IOMMU PFNMAP invalidation
  2020-05-14 22:24     ` Jason Gunthorpe
@ 2020-05-14 22:55       ` Alex Williamson
  2020-05-15 15:22         ` Peter Xu
  2020-05-20  0:24         ` Jason Gunthorpe
  0 siblings, 2 replies; 13+ messages in thread
From: Alex Williamson @ 2020-05-14 22:55 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Peter Xu, kvm, linux-kernel, cohuck

On Thu, 14 May 2020 19:24:15 -0300
Jason Gunthorpe <jgg@ziepe.ca> wrote:

> On Thu, May 14, 2020 at 04:17:12PM -0600, Alex Williamson wrote:
> 
> > that much.  I think this would also address Jason's primary concern.
> > It's better to get an IOMMU fault from the user trying to access those
> > mappings than it is to leave them in place.  
> 
> Yes, there are a few options here - if the pages are available for use
> by the IOMMU and *asynchronously* someone else revokes them, then the
> only way to protect the kernel is to block them from the IOMMU.
> 
> For this to be sane the revocation must be under complete control of
> the VFIO user. ie if a user decides to disable MMIO traffic then of
> course the IOMMU should block P2P transfer to the MMIO bar. It is user
> error to have not disabled those transfers in the first place.
> 
> When this is all done inside a guest the whole logic applies. On bare
> metal you might get some AER or crash or MCE. In virtualization you'll
> get an IOMMU fault.
> 
> > due to the memory enable bit.  If we could remap the range to a kernel
> > page we could maybe avoid the IOMMU fault and maybe even have a crude
> > test for whether any data was written to the page while that mapping
> > was in place (ie. simulating more restricted error handling, though
> > more asynchronous than done at the platform level).    
> 
> > I'm not sure if this makes sense, can't we arrange to directly trap the
> IOMMU failure and route it into qemu if that is what is desired?

Can't guarantee it, some systems wire that directly into their
management processor so that they can "protect their users" regardless
of whether they want or need it.  Yay firmware first error handling,
*sigh*.  Thanks,

Alex


* Re: [PATCH 0/2] vfio/type1/pci: IOMMU PFNMAP invalidation
  2020-05-14 22:55       ` Alex Williamson
@ 2020-05-15 15:22         ` Peter Xu
  2020-05-15 15:54           ` Alex Williamson
  2020-05-20  0:24         ` Jason Gunthorpe
  1 sibling, 1 reply; 13+ messages in thread
From: Peter Xu @ 2020-05-15 15:22 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Jason Gunthorpe, kvm, linux-kernel, cohuck

On Thu, May 14, 2020 at 04:55:17PM -0600, Alex Williamson wrote:
> > I'm not sure if this makes sense, can't we arrange to directly trap the
> > IOMMU failure and route it into qemu if that is what is desired?
> 
> Can't guarantee it, some systems wire that directly into their
> management processor so that they can "protect their users" regardless
> of whether they want or need it.  Yay firmware first error handling,
> *sigh*.  Thanks,

Sorry to go slightly off topic - Alex, does this mean the general approach
of fault reporting from vfio to userspace is not gonna work either?

Thanks,

-- 
Peter Xu


* Re: [PATCH 0/2] vfio/type1/pci: IOMMU PFNMAP invalidation
  2020-05-15 15:22         ` Peter Xu
@ 2020-05-15 15:54           ` Alex Williamson
  0 siblings, 0 replies; 13+ messages in thread
From: Alex Williamson @ 2020-05-15 15:54 UTC (permalink / raw)
  To: Peter Xu; +Cc: Jason Gunthorpe, kvm, linux-kernel, cohuck

On Fri, 15 May 2020 11:22:51 -0400
Peter Xu <peterx@redhat.com> wrote:

> On Thu, May 14, 2020 at 04:55:17PM -0600, Alex Williamson wrote:
> > > I'm not sure if this makes sense, can't we arrange to directly trap the
> > > IOMMU failure and route it into qemu if that is what is desired?  
> > 
> > Can't guarantee it, some systems wire that directly into their
> > management processor so that they can "protect their users" regardless
> > of whether they want or need it.  Yay firmware first error handling,
> > *sigh*.  Thanks,  
> 
> Sorry to go slightly off topic - Alex, does this mean the general approach
> of fault reporting from vfio to userspace is not gonna work either?

AFAIK these platforms only generate a fatal fault on certain classes of
access which imply a potential for data loss, for example a DMA write to
an invalid PTE.  The actual IOMMU page faulting mechanism should
not be affected by this, or at least one would hope.  Thanks,

Alex


* Re: [PATCH 1/2] vfio: Introduce bus driver to IOMMU invalidation interface
  2020-05-14 16:51 ` [PATCH 1/2] vfio: Introduce bus driver to IOMMU invalidation interface Alex Williamson
@ 2020-05-20  0:19   ` Jason Gunthorpe
  0 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2020-05-20  0:19 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, linux-kernel, cohuck, peterx

On Thu, May 14, 2020 at 10:51:58AM -0600, Alex Williamson wrote:
> @@ -1450,6 +1467,10 @@ static int vfio_pci_zap_and_vma_lock(struct vfio_pci_device *vdev, bool try)
>  
>  				zap_vma_ptes(vma, vma->vm_start,
>  					     vma->vm_end - vma->vm_start);
> +				mutex_unlock(&vdev->vma_lock);
> +				up_read(&mm->mmap_sem);
> +				vfio_invalidate_pfnmap_vma(vdev->group, vma);
> +				goto again;

The vma pointer can't leave the read side of the mmap_sem

> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 4a4cb7cd86b2..62ba6bd8a486 100644
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -91,6 +91,7 @@ struct vfio_dma {
>  	bool			lock_cap;	/* capable(CAP_IPC_LOCK) */
>  	struct task_struct	*task;
>  	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
> +	struct vm_area_struct	*pfnmap_vma;

This is also confusing, how does it prevent pfnmap_vma from becoming
freed?

Jason

* Re: [PATCH 2/2] vfio: Introduce strict PFNMAP mappings
  2020-05-14 16:52 ` [PATCH 2/2] vfio: Introduce strict PFNMAP mappings Alex Williamson
@ 2020-05-20  0:20   ` Jason Gunthorpe
  0 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2020-05-20  0:20 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, linux-kernel, cohuck, peterx

On Thu, May 14, 2020 at 10:52:09AM -0600, Alex Williamson wrote:
>  	vfio_unregister_iommu_driver(&vfio_noiommu_ops);
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 62ba6bd8a486..8d6286d89230 100644
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -61,6 +61,11 @@ module_param_named(dma_entry_limit, dma_entry_limit, uint, 0644);
>  MODULE_PARM_DESC(dma_entry_limit,
>  		 "Maximum number of user DMA mappings per container (65535).");
>  
> +static bool strict_mmio_maps = true;
> +module_param_named(strict_mmio_maps, strict_mmio_maps, bool, 0644);
> +MODULE_PARM_DESC(strict_mmio_maps,
> +		 "Restrict DMA mappings of MMIO to those provided by vfio bus drivers supporting invalidation (true).");
> +

This should probably explain that 'false' allows some kind of security
issue and maybe taint the kernel?
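
e.g. an untested sketch, replacing the plain module_param_named() with
a cb variant so that opting out of the safe default leaves a record:

static int strict_mmio_maps_set(const char *val,
				const struct kernel_param *kp)
{
	int ret = param_set_bool(val, kp);

	/* taint if the user disables the strict default */
	if (!ret && !strict_mmio_maps)
		add_taint(TAINT_USER, LOCKDEP_STILL_OK);

	return ret;
}

static const struct kernel_param_ops strict_mmio_maps_ops = {
	.set = strict_mmio_maps_set,
	.get = param_get_bool,
};
module_param_cb(strict_mmio_maps, &strict_mmio_maps_ops,
		&strict_mmio_maps, 0644);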

Do you think there is a reason to have this anyhow?

Jason

* Re: [PATCH 0/2] vfio/type1/pci: IOMMU PFNMAP invalidation
  2020-05-14 16:51 [PATCH 0/2] vfio/type1/pci: IOMMU PFNMAP invalidation Alex Williamson
                   ` (2 preceding siblings ...)
  2020-05-14 21:25 ` [PATCH 0/2] vfio/type1/pci: IOMMU PFNMAP invalidation Peter Xu
@ 2020-05-20  0:23 ` Jason Gunthorpe
  3 siblings, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2020-05-20  0:23 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm, linux-kernel, cohuck, peterx

On Thu, May 14, 2020 at 10:51:46AM -0600, Alex Williamson wrote:
> This is a follow-on series to "vfio-pci: Block user access to disabled
> device MMIO"[1], which extends user access blocking of disabled MMIO
> ranges to include unmapping the ranges from the IOMMU.  The first patch
> adds an invalidation callback path, allowing vfio bus drivers to signal
> the IOMMU backend to unmap ranges with vma level granularity.  This
> signaling is done both when the MMIO range becomes inaccessible due to
> memory disabling and when a vma is closed, making up for the
> lack of tracking or pinning for non-page backed vmas.  The second
> patch adds registration and testing interfaces such that the IOMMU
> backend driver can test whether a given PFNMAP vma is provided by a
> vfio bus driver supporting invalidation.  We can then implement more
> restricted semantics to only allow PFNMAP DMA mappings when we have
> such support, which becomes the new default.
> 
> Jason, if you'd like Suggested-by credit for the ideas here I'd be
> glad to add it.  Thanks,

Certainly a Reported-by would be OK

The only thing I don't like here is that this creates a P2P DMA mapping
scheme for VMAs with invalidation that is completely private to vfio.

Many of us want this in other subsystems, and there are legitimate uses
for vfio to import BAR memory for P2P from other places than itself.

So I would really rather this be supported by the core kernel in some
way.

That said, this is a bug fix, and we still don't have much agreement
on what the core kernel version should look like, let alone how it
should work with an IOMMU. 

So maybe this goes ahead as is and we can figure out how to replace it
with something general later on?

Jason

* Re: [PATCH 0/2] vfio/type1/pci: IOMMU PFNMAP invalidation
  2020-05-14 22:55       ` Alex Williamson
  2020-05-15 15:22         ` Peter Xu
@ 2020-05-20  0:24         ` Jason Gunthorpe
  1 sibling, 0 replies; 13+ messages in thread
From: Jason Gunthorpe @ 2020-05-20  0:24 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Peter Xu, kvm, linux-kernel, cohuck

On Thu, May 14, 2020 at 04:55:17PM -0600, Alex Williamson wrote:
> On Thu, 14 May 2020 19:24:15 -0300
> Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
> > On Thu, May 14, 2020 at 04:17:12PM -0600, Alex Williamson wrote:
> > 
> > > that much.  I think this would also address Jason's primary concern.
> > > It's better to get an IOMMU fault from the user trying to access those
> > > mappings than it is to leave them in place.  
> > 
> > Yes, there are a few options here - if the pages are available for use
> > by the IOMMU and *asynchronously* someone else revokes them, then the
> > only way to protect the kernel is to block them from the IOMMU.
> > 
> > For this to be sane the revocation must be under complete control of
> > the VFIO user. ie if a user decides to disable MMIO traffic then of
> > course the IOMMU should block P2P transfer to the MMIO bar. It is user
> > error to have not disabled those transfers in the first place.
> > 
> > When this is all done inside a guest the whole logic applies. On bare
> > metal you might get some AER or crash or MCE. In virtualization you'll
> > get an IOMMU fault.
> > 
> > > due to the memory enable bit.  If we could remap the range to a kernel
> > > page we could maybe avoid the IOMMU fault and maybe even have a crude
> > > test for whether any data was written to the page while that mapping
> > > was in place (ie. simulating more restricted error handling, though
> > > more asynchronous than done at the platform level).    
> > 
> > I'm not sure if this makes sense, can't we arrange to directly trap the
> > IOMMU failure and route it into qemu if that is what is desired?
> 
> Can't guarantee it, some systems wire that directly into their
> management processor so that they can "protect their users" regardless
> of whether they want or need it.  Yay firmware first error handling,
> *sigh*.  Thanks,

I feel like those systems should just lose the ability to reliably
mirror IOMMU errors to their guests - trying to emulate it by scanning
memory/etc sounds too horrible.

Jason
