All of lore.kernel.org
* [PATCH V6 0/7] fixes for virtual address update
@ 2022-12-16 18:50 Steve Sistare
  2022-12-16 18:50 ` [PATCH V6 1/7] vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR Steve Sistare
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Steve Sistare @ 2022-12-16 18:50 UTC (permalink / raw)
  To: kvm
  Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe, Kevin Tian,
	Steve Sistare

Fix bugs in the interfaces that allow the underlying memory object of an
iova range to be mapped in a new address space.  These interfaces allow
userland to block vfio mediated device kernel threads indefinitely, and
they do not propagate the locked_vm count to a new mm.  Also fix a
pre-existing bug that allows locked_vm to underflow.

The fixes impose restrictions that eliminate waiting conditions, so
revert the dead code:
  commit 898b9eaeb3fe ("vfio/type1: block on invalid vaddr")
  commit 487ace134053 ("vfio/type1: implement notify callback")
  commit ec5e32940cc9 ("vfio: iommu driver notify callback")

Changes in V2 (thanks Alex):
  * do not allow group attach while vaddrs are invalid
  * add patches to delete dead code
  * add WARN_ON for never-should-happen conditions
  * check for changed mm in unmap.
  * check for vfio_lock_acct failure in remap

Changes in V3 (ditto!):
  * return errno at WARN_ON sites, and make it unique
  * correctly check for dma task mm change
  * change dma owner to current when vaddr is updated
  * add Fixes to commit messages
  * refactored new code in vfio_dma_do_map

Changes in V4:
  * misc cosmetic changes

Changes in V5 (thanks Jason and Kevin):
  * grab mm and use it for locked_vm accounting
  * separate patches for underflow and restoring locked_vm
  * account for reserved pages
  * improve error messages

Changes in V6:
  * drop "count reserved pages" patch
  * add "track locked_vm" patch
  * grab current->mm not group_leader->mm
  * simplify vfio_change_dma_owner
  * fix commit messages

Steve Sistare (7):
  vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR
  vfio/type1: prevent underflow of locked_vm via exec()
  vfio/type1: track locked_vm per dma
  vfio/type1: restore locked_vm
  vfio/type1: revert "block on invalid vaddr"
  vfio/type1: revert "implement notify callback"
  vfio: revert "iommu driver notify callback"

 drivers/vfio/container.c        |   5 -
 drivers/vfio/vfio.h             |   7 --
 drivers/vfio/vfio_iommu_type1.c | 226 ++++++++++++++++++----------------------
 include/uapi/linux/vfio.h       |  15 +--
 4 files changed, 111 insertions(+), 142 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH V6 1/7] vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR
  2022-12-16 18:50 [PATCH V6 0/7] fixes for virtual address update Steve Sistare
@ 2022-12-16 18:50 ` Steve Sistare
  2022-12-16 18:50 ` [PATCH V6 2/7] vfio/type1: prevent underflow of locked_vm via exec() Steve Sistare
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Steve Sistare @ 2022-12-16 18:50 UTC (permalink / raw)
  To: kvm
  Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe, Kevin Tian,
	Steve Sistare

Disable the VFIO_UPDATE_VADDR capability if mediated devices are present.
Their kernel threads could be blocked indefinitely by a misbehaving
userland that updates vaddrs while the threads are trying to pin or unpin
pages.

Do not allow groups to be added to the container while vaddrs are invalid,
so we never need to block user threads from pinning, and can delete the
vaddr-waiting code in a subsequent patch.

Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
 include/uapi/linux/vfio.h       | 15 ++++++++------
 2 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 23c24fe..144f5bb 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -861,6 +861,12 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 
 	mutex_lock(&iommu->lock);
 
+	if (WARN_ONCE(iommu->vaddr_invalid_count,
+		      "vfio_pin_pages not allowed with VFIO_UPDATE_VADDR\n")) {
+		ret = -EBUSY;
+		goto pin_done;
+	}
+
 	/*
 	 * Wait for all necessary vaddr's to be valid so they can be used in
 	 * the main loop without dropping the lock, to avoid racing vs unmap.
@@ -1343,6 +1349,12 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 
 	mutex_lock(&iommu->lock);
 
+	/* Cannot update vaddr if mdev is present. */
+	if (invalidate_vaddr && !list_empty(&iommu->emulated_iommu_groups)) {
+		ret = -EBUSY;
+		goto unlock;
+	}
+
 	pgshift = __ffs(iommu->pgsize_bitmap);
 	pgsize = (size_t)1 << pgshift;
 
@@ -2185,11 +2197,16 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	struct iommu_domain_geometry *geo;
 	LIST_HEAD(iova_copy);
 	LIST_HEAD(group_resv_regions);
-	int ret = -EINVAL;
+	int ret = -EBUSY;
 
 	mutex_lock(&iommu->lock);
 
+	/* Attach could require pinning, so disallow while vaddr is invalid. */
+	if (iommu->vaddr_invalid_count)
+		goto out_unlock;
+
 	/* Check for duplicates */
+	ret = -EINVAL;
 	if (vfio_iommu_find_iommu_group(iommu, iommu_group))
 		goto out_unlock;
 
@@ -2660,6 +2677,16 @@ static int vfio_domains_have_enforce_cache_coherency(struct vfio_iommu *iommu)
 	return ret;
 }
 
+static bool vfio_iommu_has_emulated(struct vfio_iommu *iommu)
+{
+	bool ret;
+
+	mutex_lock(&iommu->lock);
+	ret = !list_empty(&iommu->emulated_iommu_groups);
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
 static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu,
 					    unsigned long arg)
 {
@@ -2668,8 +2695,13 @@ static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu,
 	case VFIO_TYPE1v2_IOMMU:
 	case VFIO_TYPE1_NESTING_IOMMU:
 	case VFIO_UNMAP_ALL:
-	case VFIO_UPDATE_VADDR:
 		return 1;
+	case VFIO_UPDATE_VADDR:
+		/*
+		 * Disable this feature if mdevs are present.  They cannot
+		 * safely pin/unpin/rw while vaddrs are being updated.
+		 */
+		return iommu && !vfio_iommu_has_emulated(iommu);
 	case VFIO_DMA_CC_IOMMU:
 		if (!iommu)
 			return 0;
@@ -3138,6 +3170,13 @@ static int vfio_iommu_type1_dma_rw(void *iommu_data, dma_addr_t user_iova,
 	size_t done;
 
 	mutex_lock(&iommu->lock);
+
+	if (WARN_ONCE(iommu->vaddr_invalid_count,
+		      "vfio_dma_rw not allowed with VFIO_UPDATE_VADDR\n")) {
+		ret = -EBUSY;
+		goto out;
+	}
+
 	while (count > 0) {
 		ret = vfio_iommu_type1_dma_rw_chunk(iommu, user_iova, data,
 						    count, write, &done);
@@ -3149,6 +3188,7 @@ static int vfio_iommu_type1_dma_rw(void *iommu_data, dma_addr_t user_iova,
 		user_iova += done;
 	}
 
+out:
 	mutex_unlock(&iommu->lock);
 	return ret;
 }
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index d7d8e09..4e8d344 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -49,7 +49,11 @@
 /* Supports VFIO_DMA_UNMAP_FLAG_ALL */
 #define VFIO_UNMAP_ALL			9
 
-/* Supports the vaddr flag for DMA map and unmap */
+/*
+ * Supports the vaddr flag for DMA map and unmap.  Not supported for mediated
+ * devices, so this capability is subject to change as groups are added or
+ * removed.
+ */
 #define VFIO_UPDATE_VADDR		10
 
 /*
@@ -1215,8 +1219,7 @@ struct vfio_iommu_type1_info_dma_avail {
  * Map process virtual addresses to IO virtual addresses using the
  * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
  *
- * If flags & VFIO_DMA_MAP_FLAG_VADDR, update the base vaddr for iova, and
- * unblock translation of host virtual addresses in the iova range.  The vaddr
+ * If flags & VFIO_DMA_MAP_FLAG_VADDR, update the base vaddr for iova. The vaddr
  * must have previously been invalidated with VFIO_DMA_UNMAP_FLAG_VADDR.  To
  * maintain memory consistency within the user application, the updated vaddr
  * must address the same memory object as originally mapped.  Failure to do so
@@ -1267,9 +1270,9 @@ struct vfio_bitmap {
  * must be 0.  This cannot be combined with the get-dirty-bitmap flag.
  *
  * If flags & VFIO_DMA_UNMAP_FLAG_VADDR, do not unmap, but invalidate host
- * virtual addresses in the iova range.  Tasks that attempt to translate an
- * iova's vaddr will block.  DMA to already-mapped pages continues.  This
- * cannot be combined with the get-dirty-bitmap flag.
+ * virtual addresses in the iova range.  DMA to already-mapped pages continues.
+ * Groups may not be added to the container while any addresses are invalid.
+ * This cannot be combined with the get-dirty-bitmap flag.
  */
 struct vfio_iommu_type1_dma_unmap {
 	__u32	argsz;
-- 
1.8.3.1



* [PATCH V6 2/7] vfio/type1: prevent underflow of locked_vm via exec()
  2022-12-16 18:50 [PATCH V6 0/7] fixes for virtual address update Steve Sistare
  2022-12-16 18:50 ` [PATCH V6 1/7] vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR Steve Sistare
@ 2022-12-16 18:50 ` Steve Sistare
  2022-12-19  7:48   ` Tian, Kevin
  2022-12-16 18:50 ` [PATCH V6 3/7] vfio/type1: track locked_vm per dma Steve Sistare
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Steve Sistare @ 2022-12-16 18:50 UTC (permalink / raw)
  To: kvm
  Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe, Kevin Tian,
	Steve Sistare

When a vfio container is preserved across exec, the task does not change,
but it gets a new mm with locked_vm=0.  If the user later unmaps a dma
mapping, locked_vm underflows to a large unsigned value, and a subsequent
dma map request fails with ENOMEM in __account_locked_vm.

To avoid underflow, grab and save the mm at the time a dma is mapped.
Use that mm when adjusting locked_vm, rather than re-acquiring the saved
task's mm, which may have changed.  If the saved mm is dead, do nothing.

Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/vfio/vfio_iommu_type1.c | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 144f5bb..71f980b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -100,6 +100,7 @@ struct vfio_dma {
 	struct task_struct	*task;
 	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
 	unsigned long		*bitmap;
+	struct mm_struct	*mm;
 };
 
 struct vfio_batch {
@@ -420,8 +421,8 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
 	if (!npage)
 		return 0;
 
-	mm = async ? get_task_mm(dma->task) : dma->task->mm;
-	if (!mm)
+	mm = dma->mm;
+	if (async && !mmget_not_zero(mm))
 		return -ESRCH; /* process exited */
 
 	ret = mmap_write_lock_killable(mm);
@@ -794,8 +795,8 @@ static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
 	struct mm_struct *mm;
 	int ret;
 
-	mm = get_task_mm(dma->task);
-	if (!mm)
+	mm = dma->mm;
+	if (!mmget_not_zero(mm))
 		return -ENODEV;
 
 	ret = vaddr_get_pfns(mm, vaddr, 1, dma->prot, pfn_base, pages);
@@ -1180,6 +1181,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
 	vfio_unmap_unpin(iommu, dma, true);
 	vfio_unlink_dma(iommu, dma);
 	put_task_struct(dma->task);
+	mmdrop(dma->mm);
 	vfio_dma_bitmap_free(dma);
 	if (dma->vaddr_invalid) {
 		iommu->vaddr_invalid_count--;
@@ -1664,15 +1666,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	 * against the locked memory limit and we need to be able to do both
 	 * outside of this call path as pinning can be asynchronous via the
 	 * external interfaces for mdev devices.  RLIMIT_MEMLOCK requires a
-	 * task_struct and VM locked pages requires an mm_struct, however
-	 * holding an indefinite mm reference is not recommended, therefore we
-	 * only hold a reference to a task.  We could hold a reference to
-	 * current, however QEMU uses this call path through vCPU threads,
-	 * which can be killed resulting in a NULL mm and failure in the unmap
-	 * path when called via a different thread.  Avoid this problem by
-	 * using the group_leader as threads within the same group require
-	 * both CLONE_THREAD and CLONE_VM and will therefore use the same
-	 * mm_struct.
+	 * task_struct and VM locked pages requires an mm_struct.
 	 *
 	 * Previously we also used the task for testing CAP_IPC_LOCK at the
 	 * time of pinning and accounting, however has_capability() makes use
@@ -1687,6 +1681,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	get_task_struct(current->group_leader);
 	dma->task = current->group_leader;
 	dma->lock_cap = capable(CAP_IPC_LOCK);
+	dma->mm = current->mm;
+	mmgrab(dma->mm);
 
 	dma->pfn_list = RB_ROOT;
 
@@ -3122,9 +3118,8 @@ static int vfio_iommu_type1_dma_rw_chunk(struct vfio_iommu *iommu,
 			!(dma->prot & IOMMU_READ))
 		return -EPERM;
 
-	mm = get_task_mm(dma->task);
-
-	if (!mm)
+	mm = dma->mm;
+	if (!mmget_not_zero(mm))
 		return -EPERM;
 
 	if (kthread)
-- 
1.8.3.1



* [PATCH V6 3/7] vfio/type1: track locked_vm per dma
  2022-12-16 18:50 [PATCH V6 0/7] fixes for virtual address update Steve Sistare
  2022-12-16 18:50 ` [PATCH V6 1/7] vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR Steve Sistare
  2022-12-16 18:50 ` [PATCH V6 2/7] vfio/type1: prevent underflow of locked_vm via exec() Steve Sistare
@ 2022-12-16 18:50 ` Steve Sistare
  2022-12-19  7:51   ` Tian, Kevin
  2022-12-16 18:50 ` [PATCH V6 4/7] vfio/type1: restore locked_vm Steve Sistare
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Steve Sistare @ 2022-12-16 18:50 UTC (permalink / raw)
  To: kvm
  Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe, Kevin Tian,
	Steve Sistare

Track locked_vm per dma struct, and create a new subroutine, both for use
in a subsequent patch.  No functional change.

Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/vfio/vfio_iommu_type1.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 71f980b..889e920 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -101,6 +101,7 @@ struct vfio_dma {
 	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
 	unsigned long		*bitmap;
 	struct mm_struct	*mm;
+	long			locked_vm;
 };
 
 struct vfio_batch {
@@ -413,22 +414,21 @@ static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn *vpfn)
 	return ret;
 }
 
-static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
+static int task_lock_acct(struct task_struct *task, struct mm_struct *mm,
+			  bool lock_cap, long npage, bool async)
 {
-	struct mm_struct *mm;
 	int ret;
 
 	if (!npage)
 		return 0;
 
-	mm = dma->mm;
 	if (async && !mmget_not_zero(mm))
 		return -ESRCH; /* process exited */
 
 	ret = mmap_write_lock_killable(mm);
 	if (!ret) {
-		ret = __account_locked_vm(mm, abs(npage), npage > 0, dma->task,
-					  dma->lock_cap);
+		ret = __account_locked_vm(mm, abs(npage), npage > 0, task,
+					  lock_cap);
 		mmap_write_unlock(mm);
 	}
 
@@ -438,6 +438,16 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
 	return ret;
 }
 
+static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
+{
+	int ret;
+
+	ret = task_lock_acct(dma->task, dma->mm, dma->lock_cap, npage, async);
+	if (!ret)
+		dma->locked_vm += npage;
+	return ret;
+}
+
 /*
  * Some mappings aren't backed by a struct page, for example an mmap'd
  * MMIO range for our own or another device.  These use a different
-- 
1.8.3.1



* [PATCH V6 4/7] vfio/type1: restore locked_vm
  2022-12-16 18:50 [PATCH V6 0/7] fixes for virtual address update Steve Sistare
                   ` (2 preceding siblings ...)
  2022-12-16 18:50 ` [PATCH V6 3/7] vfio/type1: track locked_vm per dma Steve Sistare
@ 2022-12-16 18:50 ` Steve Sistare
  2022-12-19  7:54   ` Tian, Kevin
  2022-12-16 18:50 ` [PATCH V6 5/7] vfio/type1: revert "block on invalid vaddr" Steve Sistare
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Steve Sistare @ 2022-12-16 18:50 UTC (permalink / raw)
  To: kvm
  Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe, Kevin Tian,
	Steve Sistare

When a vfio container is preserved across exec or fork-exec, the new
task's mm has a locked_vm count of 0.  After a dma vaddr is updated using
VFIO_DMA_MAP_FLAG_VADDR, locked_vm remains 0, and the pinned memory does
not count against the task's RLIMIT_MEMLOCK.

To restore the correct locked_vm count, when VFIO_DMA_MAP_FLAG_VADDR is
used and the dma's mm has changed, add the dma's locked_vm count to
the new mm->locked_vm, subject to the rlimit.

Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/vfio/vfio_iommu_type1.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 889e920..4e79bce 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1590,6 +1590,35 @@ static bool vfio_iommu_iova_dma_valid(struct vfio_iommu *iommu,
 	return list_empty(iova);
 }
 
+static int vfio_change_dma_owner(struct vfio_dma *dma)
+{
+	int ret = 0;
+	struct task_struct *task = current->group_leader;
+	struct mm_struct *mm = current->mm;
+
+	if (task->mm != dma->mm) {
+		long npage = dma->locked_vm;
+		bool lock_cap = capable(CAP_IPC_LOCK);
+
+		ret = task_lock_acct(task, mm, lock_cap, npage, false);
+		if (ret)
+			return ret;
+
+		task_lock_acct(dma->task, dma->mm, dma->lock_cap, -npage, true);
+
+		if (dma->task != task) {
+			put_task_struct(dma->task);
+			dma->task = get_task_struct(task);
+		}
+		mmdrop(dma->mm);
+		dma->mm = mm;
+		mmgrab(dma->mm);
+		dma->lock_cap = lock_cap;
+	}
+
+	return ret;
+}
+
 static int vfio_dma_do_map(struct vfio_iommu *iommu,
 			   struct vfio_iommu_type1_dma_map *map)
 {
@@ -1639,6 +1668,9 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 			   dma->size != size) {
 			ret = -EINVAL;
 		} else {
+			ret = vfio_change_dma_owner(dma);
+			if (ret)
+				goto out_unlock;
 			dma->vaddr = vaddr;
 			dma->vaddr_invalid = false;
 			iommu->vaddr_invalid_count--;
-- 
1.8.3.1



* [PATCH V6 5/7] vfio/type1: revert "block on invalid vaddr"
  2022-12-16 18:50 [PATCH V6 0/7] fixes for virtual address update Steve Sistare
                   ` (3 preceding siblings ...)
  2022-12-16 18:50 ` [PATCH V6 4/7] vfio/type1: restore locked_vm Steve Sistare
@ 2022-12-16 18:50 ` Steve Sistare
  2022-12-19  7:54   ` Tian, Kevin
  2022-12-16 18:50 ` [PATCH V6 6/7] vfio/type1: revert "implement notify callback" Steve Sistare
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Steve Sistare @ 2022-12-16 18:50 UTC (permalink / raw)
  To: kvm
  Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe, Kevin Tian,
	Steve Sistare

Revert this dead code:
  commit 898b9eaeb3fe ("vfio/type1: block on invalid vaddr")

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/vfio/vfio_iommu_type1.c | 94 +++--------------------------------------
 1 file changed, 5 insertions(+), 89 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 4e79bce..2471f87 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -72,7 +72,6 @@ struct vfio_iommu {
 	unsigned int		vaddr_invalid_count;
 	uint64_t		pgsize_bitmap;
 	uint64_t		num_non_pinned_groups;
-	wait_queue_head_t	vaddr_wait;
 	bool			v2;
 	bool			nesting;
 	bool			dirty_page_tracking;
@@ -154,8 +153,6 @@ struct vfio_regions {
 #define DIRTY_BITMAP_PAGES_MAX	 ((u64)INT_MAX)
 #define DIRTY_BITMAP_SIZE_MAX	 DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
 
-#define WAITED 1
-
 static int put_pfn(unsigned long pfn, int prot);
 
 static struct vfio_iommu_group*
@@ -606,61 +603,6 @@ static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
 	return ret;
 }
 
-static int vfio_wait(struct vfio_iommu *iommu)
-{
-	DEFINE_WAIT(wait);
-
-	prepare_to_wait(&iommu->vaddr_wait, &wait, TASK_KILLABLE);
-	mutex_unlock(&iommu->lock);
-	schedule();
-	mutex_lock(&iommu->lock);
-	finish_wait(&iommu->vaddr_wait, &wait);
-	if (kthread_should_stop() || !iommu->container_open ||
-	    fatal_signal_pending(current)) {
-		return -EFAULT;
-	}
-	return WAITED;
-}
-
-/*
- * Find dma struct and wait for its vaddr to be valid.  iommu lock is dropped
- * if the task waits, but is re-locked on return.  Return result in *dma_p.
- * Return 0 on success with no waiting, WAITED on success if waited, and -errno
- * on error.
- */
-static int vfio_find_dma_valid(struct vfio_iommu *iommu, dma_addr_t start,
-			       size_t size, struct vfio_dma **dma_p)
-{
-	int ret = 0;
-
-	do {
-		*dma_p = vfio_find_dma(iommu, start, size);
-		if (!*dma_p)
-			return -EINVAL;
-		else if (!(*dma_p)->vaddr_invalid)
-			return ret;
-		else
-			ret = vfio_wait(iommu);
-	} while (ret == WAITED);
-
-	return ret;
-}
-
-/*
- * Wait for all vaddr in the dma_list to become valid.  iommu lock is dropped
- * if the task waits, but is re-locked on return.  Return 0 on success with no
- * waiting, WAITED on success if waited, and -errno on error.
- */
-static int vfio_wait_all_valid(struct vfio_iommu *iommu)
-{
-	int ret = 0;
-
-	while (iommu->vaddr_invalid_count && ret >= 0)
-		ret = vfio_wait(iommu);
-
-	return ret;
-}
-
 /*
  * Attempt to pin pages.  We really don't want to track all the pfns and
  * the iommu can only map chunks of consecutive pfns anyway, so get the
@@ -861,7 +803,6 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 	unsigned long remote_vaddr;
 	struct vfio_dma *dma;
 	bool do_accounting;
-	dma_addr_t iova;
 
 	if (!iommu || !pages)
 		return -EINVAL;
@@ -878,22 +819,6 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 		goto pin_done;
 	}
 
-	/*
-	 * Wait for all necessary vaddr's to be valid so they can be used in
-	 * the main loop without dropping the lock, to avoid racing vs unmap.
-	 */
-again:
-	if (iommu->vaddr_invalid_count) {
-		for (i = 0; i < npage; i++) {
-			iova = user_iova + PAGE_SIZE * i;
-			ret = vfio_find_dma_valid(iommu, iova, PAGE_SIZE, &dma);
-			if (ret < 0)
-				goto pin_done;
-			if (ret == WAITED)
-				goto again;
-		}
-	}
-
 	/* Fail if no dma_umap notifier is registered */
 	if (list_empty(&iommu->device_list)) {
 		ret = -EINVAL;
@@ -909,6 +834,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 
 	for (i = 0; i < npage; i++) {
 		unsigned long phys_pfn;
+		dma_addr_t iova;
 		struct vfio_pfn *vpfn;
 
 		iova = user_iova + PAGE_SIZE * i;
@@ -1193,10 +1119,8 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
 	put_task_struct(dma->task);
 	mmdrop(dma->mm);
 	vfio_dma_bitmap_free(dma);
-	if (dma->vaddr_invalid) {
+	if (dma->vaddr_invalid)
 		iommu->vaddr_invalid_count--;
-		wake_up_all(&iommu->vaddr_wait);
-	}
 	kfree(dma);
 	iommu->dma_avail++;
 }
@@ -1674,7 +1598,6 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 			dma->vaddr = vaddr;
 			dma->vaddr_invalid = false;
 			iommu->vaddr_invalid_count--;
-			wake_up_all(&iommu->vaddr_wait);
 		}
 		goto out_unlock;
 	} else if (dma) {
@@ -1757,10 +1680,6 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 	unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 	int ret;
 
-	ret = vfio_wait_all_valid(iommu);
-	if (ret < 0)
-		return ret;
-
 	/* Arbitrarily pick the first domain in the list for lookups */
 	if (!list_empty(&iommu->domain_list))
 		d = list_first_entry(&iommu->domain_list,
@@ -2651,7 +2570,6 @@ static void *vfio_iommu_type1_open(unsigned long arg)
 	mutex_init(&iommu->lock);
 	mutex_init(&iommu->device_list_lock);
 	INIT_LIST_HEAD(&iommu->device_list);
-	init_waitqueue_head(&iommu->vaddr_wait);
 	iommu->pgsize_bitmap = PAGE_MASK;
 	INIT_LIST_HEAD(&iommu->emulated_iommu_groups);
 
@@ -3148,13 +3066,12 @@ static int vfio_iommu_type1_dma_rw_chunk(struct vfio_iommu *iommu,
 	struct vfio_dma *dma;
 	bool kthread = current->mm == NULL;
 	size_t offset;
-	int ret;
 
 	*copied = 0;
 
-	ret = vfio_find_dma_valid(iommu, user_iova, 1, &dma);
-	if (ret < 0)
-		return ret;
+	dma = vfio_find_dma(iommu, user_iova, 1);
+	if (!dma)
+		return -EINVAL;
 
 	if ((write && !(dma->prot & IOMMU_WRITE)) ||
 			!(dma->prot & IOMMU_READ))
@@ -3263,7 +3180,6 @@ static void vfio_iommu_type1_notify(void *iommu_data,
 	mutex_lock(&iommu->lock);
 	iommu->container_open = false;
 	mutex_unlock(&iommu->lock);
-	wake_up_all(&iommu->vaddr_wait);
 }
 
 static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
-- 
1.8.3.1



* [PATCH V6 6/7] vfio/type1: revert "implement notify callback"
  2022-12-16 18:50 [PATCH V6 0/7] fixes for virtual address update Steve Sistare
                   ` (4 preceding siblings ...)
  2022-12-16 18:50 ` [PATCH V6 5/7] vfio/type1: revert "block on invalid vaddr" Steve Sistare
@ 2022-12-16 18:50 ` Steve Sistare
  2022-12-19  7:55   ` Tian, Kevin
  2022-12-16 18:50 ` [PATCH V6 7/7] vfio: revert "iommu driver " Steve Sistare
  2022-12-19 18:42 ` [PATCH V6 0/7] fixes for virtual address update Steven Sistare
  7 siblings, 1 reply; 19+ messages in thread
From: Steve Sistare @ 2022-12-16 18:50 UTC (permalink / raw)
  To: kvm
  Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe, Kevin Tian,
	Steve Sistare

This is dead code.  Revert it.
  commit 487ace134053 ("vfio/type1: implement notify callback")

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/vfio/vfio_iommu_type1.c | 15 ---------------
 1 file changed, 15 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2471f87..e814c60 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -75,7 +75,6 @@ struct vfio_iommu {
 	bool			v2;
 	bool			nesting;
 	bool			dirty_page_tracking;
-	bool			container_open;
 	struct list_head	emulated_iommu_groups;
 };
 
@@ -2566,7 +2565,6 @@ static void *vfio_iommu_type1_open(unsigned long arg)
 	INIT_LIST_HEAD(&iommu->iova_list);
 	iommu->dma_list = RB_ROOT;
 	iommu->dma_avail = dma_entry_limit;
-	iommu->container_open = true;
 	mutex_init(&iommu->lock);
 	mutex_init(&iommu->device_list_lock);
 	INIT_LIST_HEAD(&iommu->device_list);
@@ -3170,18 +3168,6 @@ static int vfio_iommu_type1_dma_rw(void *iommu_data, dma_addr_t user_iova,
 	return domain;
 }
 
-static void vfio_iommu_type1_notify(void *iommu_data,
-				    enum vfio_iommu_notify_type event)
-{
-	struct vfio_iommu *iommu = iommu_data;
-
-	if (event != VFIO_IOMMU_CONTAINER_CLOSE)
-		return;
-	mutex_lock(&iommu->lock);
-	iommu->container_open = false;
-	mutex_unlock(&iommu->lock);
-}
-
 static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
 	.name			= "vfio-iommu-type1",
 	.owner			= THIS_MODULE,
@@ -3196,7 +3182,6 @@ static void vfio_iommu_type1_notify(void *iommu_data,
 	.unregister_device	= vfio_iommu_type1_unregister_device,
 	.dma_rw			= vfio_iommu_type1_dma_rw,
 	.group_iommu_domain	= vfio_iommu_type1_group_iommu_domain,
-	.notify			= vfio_iommu_type1_notify,
 };
 
 static int __init vfio_iommu_type1_init(void)
-- 
1.8.3.1



* [PATCH V6 7/7] vfio: revert "iommu driver notify callback"
  2022-12-16 18:50 [PATCH V6 0/7] fixes for virtual address update Steve Sistare
                   ` (5 preceding siblings ...)
  2022-12-16 18:50 ` [PATCH V6 6/7] vfio/type1: revert "implement notify callback" Steve Sistare
@ 2022-12-16 18:50 ` Steve Sistare
  2022-12-19  7:55   ` Tian, Kevin
  2022-12-19 18:42 ` [PATCH V6 0/7] fixes for virtual address update Steven Sistare
  7 siblings, 1 reply; 19+ messages in thread
From: Steve Sistare @ 2022-12-16 18:50 UTC (permalink / raw)
  To: kvm
  Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe, Kevin Tian,
	Steve Sistare

Revert this dead code:
  commit ec5e32940cc9 ("vfio: iommu driver notify callback")

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/vfio/container.c | 5 -----
 drivers/vfio/vfio.h      | 7 -------
 2 files changed, 12 deletions(-)

diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c
index d74164a..5bfd10d 100644
--- a/drivers/vfio/container.c
+++ b/drivers/vfio/container.c
@@ -382,11 +382,6 @@ static int vfio_fops_open(struct inode *inode, struct file *filep)
 static int vfio_fops_release(struct inode *inode, struct file *filep)
 {
 	struct vfio_container *container = filep->private_data;
-	struct vfio_iommu_driver *driver = container->iommu_driver;
-
-	if (driver && driver->ops->notify)
-		driver->ops->notify(container->iommu_data,
-				    VFIO_IOMMU_CONTAINER_CLOSE);
 
 	filep->private_data = NULL;
 
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index bcad54b..8a439c6 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -62,11 +62,6 @@ struct vfio_group {
 	struct blocking_notifier_head	notifier;
 };
 
-/* events for the backend driver notify callback */
-enum vfio_iommu_notify_type {
-	VFIO_IOMMU_CONTAINER_CLOSE = 0,
-};
-
 /**
  * struct vfio_iommu_driver_ops - VFIO IOMMU driver callbacks
  */
@@ -97,8 +92,6 @@ struct vfio_iommu_driver_ops {
 				  void *data, size_t count, bool write);
 	struct iommu_domain *(*group_iommu_domain)(void *iommu_data,
 						   struct iommu_group *group);
-	void		(*notify)(void *iommu_data,
-				  enum vfio_iommu_notify_type event);
 };
 
 struct vfio_iommu_driver {
-- 
1.8.3.1



* RE: [PATCH V6 2/7] vfio/type1: prevent underflow of locked_vm via exec()
  2022-12-16 18:50 ` [PATCH V6 2/7] vfio/type1: prevent underflow of locked_vm via exec() Steve Sistare
@ 2022-12-19  7:48   ` Tian, Kevin
  2022-12-20 15:01     ` Steven Sistare
  0 siblings, 1 reply; 19+ messages in thread
From: Tian, Kevin @ 2022-12-19  7:48 UTC (permalink / raw)
  To: Steve Sistare, kvm; +Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe

> From: Steve Sistare <steven.sistare@oracle.com>
> Sent: Saturday, December 17, 2022 2:51 AM
> 
> When a vfio container is preserved across exec, the task does not change,
> but it gets a new mm with locked_vm=0.  If the user later unmaps a dma
> mapping, locked_vm underflows to a large unsigned value, and a subsequent
> dma map request fails with ENOMEM in __account_locked_vm.
> 
> To avoid underflow, grab and save the mm at the time a dma is mapped.
> Use that mm when adjusting locked_vm, rather than re-acquiring the saved
> task's mm, which may have changed.  If the saved mm is dead, do nothing.

worth clarifying that locked_vm of the new mm is still not fixed.

> @@ -1664,15 +1666,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>  	 * against the locked memory limit and we need to be able to do both
>  	 * outside of this call path as pinning can be asynchronous via the
>  	 * external interfaces for mdev devices.  RLIMIT_MEMLOCK requires a
> -	 * task_struct and VM locked pages requires an mm_struct, however
> -	 * holding an indefinite mm reference is not recommended, therefore we
> -	 * only hold a reference to a task.  We could hold a reference to
> -	 * current, however QEMU uses this call path through vCPU threads,
> -	 * which can be killed resulting in a NULL mm and failure in the unmap
> -	 * path when called via a different thread.  Avoid this problem by
> -	 * using the group_leader as threads within the same group require
> -	 * both CLONE_THREAD and CLONE_VM and will therefore use the same
> -	 * mm_struct.
> +	 * task_struct and VM locked pages requires an mm_struct.

IMHO the rationale why choosing group_leader still applies...

otherwise this looks good to me:

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* RE: [PATCH V6 3/7] vfio/type1: track locked_vm per dma
  2022-12-16 18:50 ` [PATCH V6 3/7] vfio/type1: track locked_vm per dma Steve Sistare
@ 2022-12-19  7:51   ` Tian, Kevin
  0 siblings, 0 replies; 19+ messages in thread
From: Tian, Kevin @ 2022-12-19  7:51 UTC (permalink / raw)
  To: Steve Sistare, kvm; +Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe

> From: Steve Sistare <steven.sistare@oracle.com>
> Sent: Saturday, December 17, 2022 2:51 AM
> 
> Track locked_vm per dma struct, and create a new subroutine, both for use
> in a subsequent patch.  No functional change.
> 
> Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")
> Cc: stable@vger.kernel.org
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>, with one nit:
 
> -static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
> +static int task_lock_acct(struct task_struct *task, struct mm_struct *mm,
> +			  bool lock_cap, long npage, bool async)

is it more accurate to call it mm_lock_acct() given almost all invocations
in this function are around mm?


* RE: [PATCH V6 4/7] vfio/type1: restore locked_vm
  2022-12-16 18:50 ` [PATCH V6 4/7] vfio/type1: restore locked_vm Steve Sistare
@ 2022-12-19  7:54   ` Tian, Kevin
  0 siblings, 0 replies; 19+ messages in thread
From: Tian, Kevin @ 2022-12-19  7:54 UTC (permalink / raw)
  To: Steve Sistare, kvm; +Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe

> From: Steve Sistare <steven.sistare@oracle.com>
> Sent: Saturday, December 17, 2022 2:51 AM
> 
> +static int vfio_change_dma_owner(struct vfio_dma *dma)
> +{
> +	int ret = 0;
> +	struct task_struct *task = current->group_leader;
> +	struct mm_struct *mm = current->mm;
> +
> +	if (task->mm != dma->mm) {

	if (mm != dma->mm) {
		...

with that fixed:

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

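The one-line fix Kevin suggests sits inside an ownership transfer. A userspace sketch of that idea (stand-in types and names; the real vfio_change_dma_owner also takes and drops mm references with mmgrab()/mmdrop(), re-checks limits, and can fail) moves a dma's locked_vm charge from the old mm to the new one:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins, not the kernel's structures. */
struct mm_model { unsigned long locked_vm; };

struct dma_model {
	struct mm_model *mm;		/* mm currently charged for this mapping */
	unsigned long locked_vm;	/* pages charged */
};

/* When vaddr is updated after exec(), the container's dmas still charge
 * the pre-exec mm.  Transfer the charge to the caller's mm so future
 * unmaps debit the right counter. */
static int change_dma_owner(struct dma_model *dma, struct mm_model *new_mm)
{
	if (new_mm == dma->mm)	/* Kevin's point: compare the mm, not task->mm */
		return 0;

	new_mm->locked_vm += dma->locked_vm;	/* charge the new mm ... */
	dma->mm->locked_vm -= dma->locked_vm;	/* ... credit the old one */
	dma->mm = new_mm;			/* mmgrab/mmdrop in the real code */
	return 0;
}
```

This is only the accounting skeleton; error handling and the case of a dead old mm are omitted.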

* RE: [PATCH V6 5/7] vfio/type1: revert "block on invalid vaddr"
  2022-12-16 18:50 ` [PATCH V6 5/7] vfio/type1: revert "block on invalid vaddr" Steve Sistare
@ 2022-12-19  7:54   ` Tian, Kevin
  0 siblings, 0 replies; 19+ messages in thread
From: Tian, Kevin @ 2022-12-19  7:54 UTC (permalink / raw)
  To: Steve Sistare, kvm; +Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe

> From: Steve Sistare <steven.sistare@oracle.com>
> Sent: Saturday, December 17, 2022 2:51 AM
> 
> Revert this dead code:
>   commit 898b9eaeb3fe ("vfio/type1: block on invalid vaddr")
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* RE: [PATCH V6 6/7] vfio/type1: revert "implement notify callback"
  2022-12-16 18:50 ` [PATCH V6 6/7] vfio/type1: revert "implement notify callback" Steve Sistare
@ 2022-12-19  7:55   ` Tian, Kevin
  0 siblings, 0 replies; 19+ messages in thread
From: Tian, Kevin @ 2022-12-19  7:55 UTC (permalink / raw)
  To: Steve Sistare, kvm; +Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe

> From: Steve Sistare <steven.sistare@oracle.com>
> Sent: Saturday, December 17, 2022 2:51 AM
> 
> This is dead code.  Revert it.
>   commit 487ace134053 ("vfio/type1: implement notify callback")
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* RE: [PATCH V6 7/7] vfio: revert "iommu driver notify callback"
  2022-12-16 18:50 ` [PATCH V6 7/7] vfio: revert "iommu driver " Steve Sistare
@ 2022-12-19  7:55   ` Tian, Kevin
  0 siblings, 0 replies; 19+ messages in thread
From: Tian, Kevin @ 2022-12-19  7:55 UTC (permalink / raw)
  To: Steve Sistare, kvm; +Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe

> From: Steve Sistare <steven.sistare@oracle.com>
> Sent: Saturday, December 17, 2022 2:51 AM
> 
> Revert this dead code:
>   commit ec5e32940cc9 ("vfio: iommu driver notify callback")
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* Re: [PATCH V6 0/7] fixes for virtual address update
  2022-12-16 18:50 [PATCH V6 0/7] fixes for virtual address update Steve Sistare
                   ` (6 preceding siblings ...)
  2022-12-16 18:50 ` [PATCH V6 7/7] vfio: revert "iommu driver " Steve Sistare
@ 2022-12-19 18:42 ` Steven Sistare
  2022-12-19 20:55   ` Alex Williamson
  7 siblings, 1 reply; 19+ messages in thread
From: Steven Sistare @ 2022-12-19 18:42 UTC (permalink / raw)
  To: kvm, Alex Williamson, Jason Gunthorpe; +Cc: Cornelia Huck, Kevin Tian

Alex, Jason, any comments before I post the (hopefully final) version?

- Steve

On 12/16/2022 1:50 PM, Steve Sistare wrote:
> Fix bugs in the interfaces that allow the underlying memory object of an
> iova range to be mapped in a new address space.  They allow userland to
> indefinitely block vfio mediated device kernel threads, and do not
> propagate the locked_vm count to a new mm.  Also fix a pre-existing bug
> that allows locked_vm underflow.
> 
> The fixes impose restrictions that eliminate waiting conditions, so
> revert the dead code:
>   commit 898b9eaeb3fe ("vfio/type1: block on invalid vaddr")
>   commit 487ace134053 ("vfio/type1: implement notify callback")
>   commit ec5e32940cc9 ("vfio: iommu driver notify callback")
> 
> Changes in V2 (thanks Alex):
>   * do not allow group attach while vaddrs are invalid
>   * add patches to delete dead code
>   * add WARN_ON for never-should-happen conditions
>   * check for changed mm in unmap.
>   * check for vfio_lock_acct failure in remap
> 
> Changes in V3 (ditto!):
>   * return errno at WARN_ON sites, and make it unique
>   * correctly check for dma task mm change
>   * change dma owner to current when vaddr is updated
>   * add Fixes to commit messages
>   * refactored new code in vfio_dma_do_map
> 
> Changes in V4:
>   * misc cosmetic changes
> 
> Changes in V5 (thanks Jason and Kevin):
>   * grab mm and use it for locked_vm accounting
>   * separate patches for underflow and restoring locked_vm
>   * account for reserved pages
>   * improve error messages
> 
> Changes in V6:
>   * drop "count reserved pages" patch
>   * add "track locked_vm" patch
>   * grab current->mm not group_leader->mm
>   * simplify vfio_change_dma_owner
>   * fix commit messages
> 
> Steve Sistare (7):
>   vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR
>   vfio/type1: prevent underflow of locked_vm via exec()
>   vfio/type1: track locked_vm per dma
>   vfio/type1: restore locked_vm
>   vfio/type1: revert "block on invalid vaddr"
>   vfio/type1: revert "implement notify callback"
>   vfio: revert "iommu driver notify callback"
> 
>  drivers/vfio/container.c        |   5 -
>  drivers/vfio/vfio.h             |   7 --
>  drivers/vfio/vfio_iommu_type1.c | 226 ++++++++++++++++++----------------------
>  include/uapi/linux/vfio.h       |  15 +--
>  4 files changed, 111 insertions(+), 142 deletions(-)
> 


* Re: [PATCH V6 0/7] fixes for virtual address update
  2022-12-19 18:42 ` [PATCH V6 0/7] fixes for virtual address update Steven Sistare
@ 2022-12-19 20:55   ` Alex Williamson
  0 siblings, 0 replies; 19+ messages in thread
From: Alex Williamson @ 2022-12-19 20:55 UTC (permalink / raw)
  To: Steven Sistare; +Cc: kvm, Jason Gunthorpe, Cornelia Huck, Kevin Tian

On Mon, 19 Dec 2022 13:42:06 -0500
Steven Sistare <steven.sistare@oracle.com> wrote:

> Alex, Jason, any comments before I post the (hopefully final) version?

I like Kevin's comments, nothing additional from me.  Thanks,

Alex

> On 12/16/2022 1:50 PM, Steve Sistare wrote:
> > Fix bugs in the interfaces that allow the underlying memory object of an
> > iova range to be mapped in a new address space.  They allow userland to
> > indefinitely block vfio mediated device kernel threads, and do not
> > propagate the locked_vm count to a new mm.  Also fix a pre-existing bug
> > that allows locked_vm underflow.
> > 
> > The fixes impose restrictions that eliminate waiting conditions, so
> > revert the dead code:
> >   commit 898b9eaeb3fe ("vfio/type1: block on invalid vaddr")
> >   commit 487ace134053 ("vfio/type1: implement notify callback")
> >   commit ec5e32940cc9 ("vfio: iommu driver notify callback")
> > 
> > Changes in V2 (thanks Alex):
> >   * do not allow group attach while vaddrs are invalid
> >   * add patches to delete dead code
> >   * add WARN_ON for never-should-happen conditions
> >   * check for changed mm in unmap.
> >   * check for vfio_lock_acct failure in remap
> > 
> > Changes in V3 (ditto!):
> >   * return errno at WARN_ON sites, and make it unique
> >   * correctly check for dma task mm change
> >   * change dma owner to current when vaddr is updated
> >   * add Fixes to commit messages
> >   * refactored new code in vfio_dma_do_map
> > 
> > Changes in V4:
> >   * misc cosmetic changes
> > 
> > Changes in V5 (thanks Jason and Kevin):
> >   * grab mm and use it for locked_vm accounting
> >   * separate patches for underflow and restoring locked_vm
> >   * account for reserved pages
> >   * improve error messages
> > 
> > Changes in V6:
> >   * drop "count reserved pages" patch
> >   * add "track locked_vm" patch
> >   * grab current->mm not group_leader->mm
> >   * simplify vfio_change_dma_owner
> >   * fix commit messages
> > 
> > Steve Sistare (7):
> >   vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR
> >   vfio/type1: prevent underflow of locked_vm via exec()
> >   vfio/type1: track locked_vm per dma
> >   vfio/type1: restore locked_vm
> >   vfio/type1: revert "block on invalid vaddr"
> >   vfio/type1: revert "implement notify callback"
> >   vfio: revert "iommu driver notify callback"
> > 
> >  drivers/vfio/container.c        |   5 -
> >  drivers/vfio/vfio.h             |   7 --
> >  drivers/vfio/vfio_iommu_type1.c | 226 ++++++++++++++++++----------------------
> >  include/uapi/linux/vfio.h       |  15 +--
> >  4 files changed, 111 insertions(+), 142 deletions(-)
> >   
> 



* Re: [PATCH V6 2/7] vfio/type1: prevent underflow of locked_vm via exec()
  2022-12-19  7:48   ` Tian, Kevin
@ 2022-12-20 15:01     ` Steven Sistare
  2022-12-20 21:59       ` Alex Williamson
  0 siblings, 1 reply; 19+ messages in thread
From: Steven Sistare @ 2022-12-20 15:01 UTC (permalink / raw)
  To: Tian, Kevin, kvm; +Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe

On 12/19/2022 2:48 AM, Tian, Kevin wrote:
>> From: Steve Sistare <steven.sistare@oracle.com>
>> Sent: Saturday, December 17, 2022 2:51 AM
>>
>> When a vfio container is preserved across exec, the task does not change,
>> but it gets a new mm with locked_vm=0.  If the user later unmaps a dma
>> mapping, locked_vm underflows to a large unsigned value, and a subsequent
>> dma map request fails with ENOMEM in __account_locked_vm.
>>
>> To avoid underflow, grab and save the mm at the time a dma is mapped.
>> Use that mm when adjusting locked_vm, rather than re-acquiring the saved
>> task's mm, which may have changed.  If the saved mm is dead, do nothing.
> 
> worth clarifying that locked_vm of the new mm is still not fixed.

Will do.

>> @@ -1664,15 +1666,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>  	 * against the locked memory limit and we need to be able to do both
>>  	 * outside of this call path as pinning can be asynchronous via the
>>  	 * external interfaces for mdev devices.  RLIMIT_MEMLOCK requires a
>> -	 * task_struct and VM locked pages requires an mm_struct, however
>> -	 * holding an indefinite mm reference is not recommended, therefore we
>> -	 * only hold a reference to a task.  We could hold a reference to
>> -	 * current, however QEMU uses this call path through vCPU threads,
>> -	 * which can be killed resulting in a NULL mm and failure in the unmap
>> -	 * path when called via a different thread.  Avoid this problem by
>> -	 * using the group_leader as threads within the same group require
>> -	 * both CLONE_THREAD and CLONE_VM and will therefore use the same
>> -	 * mm_struct.
>> +	 * task_struct and VM locked pages requires an mm_struct.
> 
> IMHO the rationale why choosing group_leader still applies...

I don't see why it still applies.  With the new code, we may save a reference 
to current or current->group_leader, without error.  "NULL mm and failure in the 
unmap path" will not happen with mmgrab.  task->signal->rlimit is shared, so it 
does not matter which task we use, or whether the task is dead, as long as 
one of the tasks lives, which is guaranteed by the mmget_not_zero() guard.  Am
I missing something?

I kept current->group_leader for ease of debugging, so that all dma's are owned 
by the same task.

- Steve

> otherwise this looks good to me:
> 
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>


* Re: [PATCH V6 2/7] vfio/type1: prevent underflow of locked_vm via exec()
  2022-12-20 15:01     ` Steven Sistare
@ 2022-12-20 21:59       ` Alex Williamson
  2022-12-20 22:06         ` Steven Sistare
  0 siblings, 1 reply; 19+ messages in thread
From: Alex Williamson @ 2022-12-20 21:59 UTC (permalink / raw)
  To: Steven Sistare; +Cc: Tian, Kevin, kvm, Cornelia Huck, Jason Gunthorpe

On Tue, 20 Dec 2022 10:01:21 -0500
Steven Sistare <steven.sistare@oracle.com> wrote:

> On 12/19/2022 2:48 AM, Tian, Kevin wrote:
> >> From: Steve Sistare <steven.sistare@oracle.com>
> >> Sent: Saturday, December 17, 2022 2:51 AM
> >> @@ -1664,15 +1666,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> >>  	 * against the locked memory limit and we need to be able to do both
> >>  	 * outside of this call path as pinning can be asynchronous via the
> >>  	 * external interfaces for mdev devices.  RLIMIT_MEMLOCK requires a
> >> -	 * task_struct and VM locked pages requires an mm_struct, however
> >> -	 * holding an indefinite mm reference is not recommended, therefore we
> >> -	 * only hold a reference to a task.  We could hold a reference to
> >> -	 * current, however QEMU uses this call path through vCPU threads,
> >> -	 * which can be killed resulting in a NULL mm and failure in the unmap
> >> -	 * path when called via a different thread.  Avoid this problem by
> >> -	 * using the group_leader as threads within the same group require
> >> -	 * both CLONE_THREAD and CLONE_VM and will therefore use the same
> >> -	 * mm_struct.
> >> +	 * task_struct and VM locked pages requires an mm_struct.
> > 
> > IMHO the rationale why choosing group_leader still applies...  
> 
> I don't see why it still applies.  With the new code, we may save a reference 
> to current or current->group_leader, without error.  "NULL mm and failure in the 
> unmap path" will not happen with mmgrab.  task->signal->rlimit is shared, so it 
> does not matter which task we use, or whether the task is dead, as long as 
> one of the tasks lives, which is guaranteed by the mmget_not_zero() guard.  Am
> I missing something?
> 
> I kept current->group_leader for ease of debugging, so that all dma's are owned 
> by the same task.

That much at least would be a good comment to add since the above
removes all justification for why we're storing group_leader as the
task rather than current.  Ex:

	QEMU typically calls this path through vCPU threads, which can
	terminate due to vCPU hotplug, therefore historically we've used
	group_leader for task tracking.  With the addition of grabbing
	the mm for the life of the DMA tracking structure, this is not
	so much a concern, but we continue to use group_leader for
	debug'ability, ie. all DMA tracking is owned by the same task.

Given the upcoming holidays and reviewers starting to disappear, I
suggest we take this up as a fixes series for v6.2 after the new year.
The remainder of the vfio next branch for v6.2 has already been merged.
Thanks,

Alex



* Re: [PATCH V6 2/7] vfio/type1: prevent underflow of locked_vm via exec()
  2022-12-20 21:59       ` Alex Williamson
@ 2022-12-20 22:06         ` Steven Sistare
  0 siblings, 0 replies; 19+ messages in thread
From: Steven Sistare @ 2022-12-20 22:06 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Tian, Kevin, kvm, Cornelia Huck, Jason Gunthorpe

On 12/20/2022 4:59 PM, Alex Williamson wrote:
> On Tue, 20 Dec 2022 10:01:21 -0500
> Steven Sistare <steven.sistare@oracle.com> wrote:
> 
>> On 12/19/2022 2:48 AM, Tian, Kevin wrote:
>>>> From: Steve Sistare <steven.sistare@oracle.com>
>>>> Sent: Saturday, December 17, 2022 2:51 AM
>>>> @@ -1664,15 +1666,7 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>>  	 * against the locked memory limit and we need to be able to do both
>>>>  	 * outside of this call path as pinning can be asynchronous via the
>>>>  	 * external interfaces for mdev devices.  RLIMIT_MEMLOCK requires a
>>>> -	 * task_struct and VM locked pages requires an mm_struct, however
>>>> -	 * holding an indefinite mm reference is not recommended, therefore we
>>>> -	 * only hold a reference to a task.  We could hold a reference to
>>>> -	 * current, however QEMU uses this call path through vCPU threads,
>>>> -	 * which can be killed resulting in a NULL mm and failure in the unmap
>>>> -	 * path when called via a different thread.  Avoid this problem by
>>>> -	 * using the group_leader as threads within the same group require
>>>> -	 * both CLONE_THREAD and CLONE_VM and will therefore use the same
>>>> -	 * mm_struct.
>>>> +	 * task_struct and VM locked pages requires an mm_struct.
>>>
>>> IMHO the rationale why choosing group_leader still applies...  
>>
>> I don't see why it still applies.  With the new code, we may save a reference 
>> to current or current->group_leader, without error.  "NULL mm and failure in the 
>> unmap path" will not happen with mmgrab.  task->signal->rlimit is shared, so it 
>> does not matter which task we use, or whether the task is dead, as long as 
>> one of the tasks lives, which is guaranteed by the mmget_not_zero() guard.  Am
>> I missing something?
>>
>> I kept current->group_leader for ease of debugging, so that all dma's are owned 
>> by the same task.
> 
> That much at least would be a good comment to add since the above
> removes all justification for why we're storing group_leader as the
> task rather than current.  Ex:
> 
> 	QEMU typically calls this path through vCPU threads, which can
> 	terminate due to vCPU hotplug, therefore historically we've used
> 	group_leader for task tracking.  With the addition of grabbing
> 	the mm for the life of the DMA tracking structure, this is not
> 	so much a concern, but we continue to use group_leader for
> 	debug'ability, ie. all DMA tracking is owned by the same task.
> 
> Given the upcoming holidays and reviewers starting to disappear, I
> suggest we take this up as a fixes series for v6.2 after the new year.
> The remainder of the vfio next branch for v6.2 has already been merged.
> Thanks,

Will do - Steve

