* [PATCH V5 0/7] fixes for virtual address update
From: Steve Sistare @ 2022-12-15 21:56 UTC (permalink / raw)
  To: kvm
  Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe, Kevin Tian,
	Steve Sistare

Fix bugs in the interfaces that allow the underlying memory object of an
iova range to be mapped in a new address space.  These interfaces allow
userland to block vfio mediated-device kernel threads indefinitely, and
they do not propagate the locked_vm count to a new mm.  Also fix a
pre-existing bug that allows locked_vm to underflow.

The fixes impose restrictions that eliminate waiting conditions, so
revert the dead code:
  commit 898b9eaeb3fe ("vfio/type1: block on invalid vaddr")
  commit 487ace134053 ("vfio/type1: implement notify callback")
  commit ec5e32940cc9 ("vfio: iommu driver notify callback")

Changes in V2 (thanks Alex):
  * do not allow group attach while vaddrs are invalid
  * add patches to delete dead code
  * add WARN_ON for never-should-happen conditions
  * check for changed mm in unmap
  * check for vfio_lock_acct failure in remap

Changes in V3 (ditto!):
  * return errno at WARN_ON sites, and make it unique
  * correctly check for dma task mm change
  * change dma owner to current when vaddr is updated
  * add Fixes to commit messages
  * refactor new code in vfio_dma_do_map

Changes in V4:
  * misc cosmetic changes

Changes in V5 (thanks Jason and Kevin):
  * grab mm and use it for locked_vm accounting
  * separate patches for underflow and restoring locked_vm
  * account for reserved pages
  * improve error messages

Steve Sistare (7):
  vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR
  vfio/type1: prevent locked_vm underflow
  vfio/type1: count reserved pages
  vfio/type1: restore locked_vm
  vfio/type1: revert "block on invalid vaddr"
  vfio/type1: revert "implement notify callback"
  vfio: revert "iommu driver notify callback"

 drivers/vfio/container.c        |   5 -
 drivers/vfio/vfio.h             |   7 --
 drivers/vfio/vfio_iommu_type1.c | 215 +++++++++++++++++++---------------------
 include/uapi/linux/vfio.h       |  15 +--
 4 files changed, 110 insertions(+), 132 deletions(-)

-- 
1.8.3.1


* [PATCH V5 1/7] vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR
From: Steve Sistare @ 2022-12-15 21:56 UTC (permalink / raw)
  To: kvm
  Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe, Kevin Tian,
	Steve Sistare

Disable the VFIO_UPDATE_VADDR capability if mediated devices are present.
Their kernel threads could be blocked indefinitely by a misbehaving
userland if they try to pin or unpin pages while vaddrs are being updated.

Do not allow groups to be added to the container while vaddrs are invalid,
so we never need to block user threads from pinning, and can delete the
vaddr-waiting code in a subsequent patch.
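
For context, a minimal userspace sketch of the update-vaddr flow this
patch constrains (container_fd, iova, size, and new_vaddr are assumed
to come from an existing mapping; error handling elided):

	/* The capability can come and go as groups are added or removed. */
	if (ioctl(container_fd, VFIO_CHECK_EXTENSION, VFIO_UPDATE_VADDR) != 1)
		return -1;	/* e.g. an mdev is present */

	/* Invalidate the host vaddr; the iova stays mapped for DMA. */
	struct vfio_iommu_type1_dma_unmap unmap = {
		.argsz = sizeof(unmap),
		.flags = VFIO_DMA_UNMAP_FLAG_VADDR,
		.iova  = iova,
		.size  = size,
	};
	ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, &unmap);

	/* ... exec or fork; new_vaddr must address the same memory object ... */

	/* Supply the new vaddr for the same iova range. */
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_VADDR | VFIO_DMA_MAP_FLAG_READ |
			 VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (__u64)new_vaddr,
		.iova  = iova,
		.size  = size,
	};
	ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);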

Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/vfio/vfio_iommu_type1.c | 44 +++++++++++++++++++++++++++++++++++++++--
 include/uapi/linux/vfio.h       | 15 ++++++++------
 2 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 23c24fe..144f5bb 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -861,6 +861,12 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 
 	mutex_lock(&iommu->lock);
 
+	if (WARN_ONCE(iommu->vaddr_invalid_count,
+		      "vfio_pin_pages not allowed with VFIO_UPDATE_VADDR\n")) {
+		ret = -EBUSY;
+		goto pin_done;
+	}
+
 	/*
 	 * Wait for all necessary vaddr's to be valid so they can be used in
 	 * the main loop without dropping the lock, to avoid racing vs unmap.
@@ -1343,6 +1349,12 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 
 	mutex_lock(&iommu->lock);
 
+	/* Cannot update vaddr if mdev is present. */
+	if (invalidate_vaddr && !list_empty(&iommu->emulated_iommu_groups)) {
+		ret = -EBUSY;
+		goto unlock;
+	}
+
 	pgshift = __ffs(iommu->pgsize_bitmap);
 	pgsize = (size_t)1 << pgshift;
 
@@ -2185,11 +2197,16 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 	struct iommu_domain_geometry *geo;
 	LIST_HEAD(iova_copy);
 	LIST_HEAD(group_resv_regions);
-	int ret = -EINVAL;
+	int ret = -EBUSY;
 
 	mutex_lock(&iommu->lock);
 
+	/* Attach could require pinning, so disallow while vaddr is invalid. */
+	if (iommu->vaddr_invalid_count)
+		goto out_unlock;
+
 	/* Check for duplicates */
+	ret = -EINVAL;
 	if (vfio_iommu_find_iommu_group(iommu, iommu_group))
 		goto out_unlock;
 
@@ -2660,6 +2677,16 @@ static int vfio_domains_have_enforce_cache_coherency(struct vfio_iommu *iommu)
 	return ret;
 }
 
+static bool vfio_iommu_has_emulated(struct vfio_iommu *iommu)
+{
+	bool ret;
+
+	mutex_lock(&iommu->lock);
+	ret = !list_empty(&iommu->emulated_iommu_groups);
+	mutex_unlock(&iommu->lock);
+	return ret;
+}
+
 static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu,
 					    unsigned long arg)
 {
@@ -2668,8 +2695,13 @@ static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu,
 	case VFIO_TYPE1v2_IOMMU:
 	case VFIO_TYPE1_NESTING_IOMMU:
 	case VFIO_UNMAP_ALL:
-	case VFIO_UPDATE_VADDR:
 		return 1;
+	case VFIO_UPDATE_VADDR:
+		/*
+		 * Disable this feature if mdevs are present.  They cannot
+		 * safely pin/unpin/rw while vaddrs are being updated.
+		 */
+		return iommu && !vfio_iommu_has_emulated(iommu);
 	case VFIO_DMA_CC_IOMMU:
 		if (!iommu)
 			return 0;
@@ -3138,6 +3170,13 @@ static int vfio_iommu_type1_dma_rw(void *iommu_data, dma_addr_t user_iova,
 	size_t done;
 
 	mutex_lock(&iommu->lock);
+
+	if (WARN_ONCE(iommu->vaddr_invalid_count,
+		      "vfio_dma_rw not allowed with VFIO_UPDATE_VADDR\n")) {
+		ret = -EBUSY;
+		goto out;
+	}
+
 	while (count > 0) {
 		ret = vfio_iommu_type1_dma_rw_chunk(iommu, user_iova, data,
 						    count, write, &done);
@@ -3149,6 +3188,7 @@ static int vfio_iommu_type1_dma_rw(void *iommu_data, dma_addr_t user_iova,
 		user_iova += done;
 	}
 
+out:
 	mutex_unlock(&iommu->lock);
 	return ret;
 }
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index d7d8e09..4e8d344 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -49,7 +49,11 @@
 /* Supports VFIO_DMA_UNMAP_FLAG_ALL */
 #define VFIO_UNMAP_ALL			9
 
-/* Supports the vaddr flag for DMA map and unmap */
+/*
+ * Supports the vaddr flag for DMA map and unmap.  Not supported for mediated
+ * devices, so this capability is subject to change as groups are added or
+ * removed.
+ */
 #define VFIO_UPDATE_VADDR		10
 
 /*
@@ -1215,8 +1219,7 @@ struct vfio_iommu_type1_info_dma_avail {
  * Map process virtual addresses to IO virtual addresses using the
  * provided struct vfio_dma_map. Caller sets argsz. READ &/ WRITE required.
  *
- * If flags & VFIO_DMA_MAP_FLAG_VADDR, update the base vaddr for iova, and
- * unblock translation of host virtual addresses in the iova range.  The vaddr
+ * If flags & VFIO_DMA_MAP_FLAG_VADDR, update the base vaddr for iova. The vaddr
  * must have previously been invalidated with VFIO_DMA_UNMAP_FLAG_VADDR.  To
  * maintain memory consistency within the user application, the updated vaddr
  * must address the same memory object as originally mapped.  Failure to do so
@@ -1267,9 +1270,9 @@ struct vfio_bitmap {
  * must be 0.  This cannot be combined with the get-dirty-bitmap flag.
  *
  * If flags & VFIO_DMA_UNMAP_FLAG_VADDR, do not unmap, but invalidate host
- * virtual addresses in the iova range.  Tasks that attempt to translate an
- * iova's vaddr will block.  DMA to already-mapped pages continues.  This
- * cannot be combined with the get-dirty-bitmap flag.
+ * virtual addresses in the iova range.  DMA to already-mapped pages continues.
+ * Groups may not be added to the container while any addresses are invalid.
+ * This cannot be combined with the get-dirty-bitmap flag.
  */
 struct vfio_iommu_type1_dma_unmap {
 	__u32	argsz;
-- 
1.8.3.1


* [PATCH V5 2/7] vfio/type1: prevent locked_vm underflow
From: Steve Sistare @ 2022-12-15 21:56 UTC (permalink / raw)
  To: kvm
  Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe, Kevin Tian,
	Steve Sistare

When a vfio container is preserved across exec, the task does not change,
but it gets a new mm with locked_vm=0.  If the user later unmaps a dma
mapping, locked_vm underflows to a large unsigned value, and a subsequent
dma map request fails with ENOMEM in __account_locked_vm.

To avoid underflow, grab and save the mm at the time a dma is mapped.
Use that mm when adjusting locked_vm, rather than re-acquiring the saved
task's mm, which may have changed.  If the saved mm is dead, do nothing.
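
A rough reproducer sketch (setup elided; assumes the container fd is
inherited across exec, i.e. opened without O_CLOEXEC, and that SIZE
bytes were mapped beforehand):

	/* before exec: old mm->locked_vm == SIZE / PAGE_SIZE */
	execv("/proc/self/exe", argv);	/* same task, new mm, locked_vm == 0 */

	/* after exec, reusing the same container fd: */
	ioctl(fd, VFIO_IOMMU_UNMAP_DMA, &unmap);
				/* locked_vm -= SIZE / PAGE_SIZE underflows */
	ioctl(fd, VFIO_IOMMU_MAP_DMA, &map);
				/* __account_locked_vm sees the huge count
				   and fails with ENOMEM */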

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/vfio/vfio_iommu_type1.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 144f5bb..cd49b656 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -100,6 +100,7 @@ struct vfio_dma {
 	struct task_struct	*task;
 	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
 	unsigned long		*bitmap;
+	struct mm_struct	*mm;
 };
 
 struct vfio_batch {
@@ -420,8 +421,8 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
 	if (!npage)
 		return 0;
 
-	mm = async ? get_task_mm(dma->task) : dma->task->mm;
-	if (!mm)
+	mm = dma->mm;
+	if (async && !mmget_not_zero(mm))
 		return -ESRCH; /* process exited */
 
 	ret = mmap_write_lock_killable(mm);
@@ -794,8 +795,8 @@ static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
 	struct mm_struct *mm;
 	int ret;
 
-	mm = get_task_mm(dma->task);
-	if (!mm)
+	mm = dma->mm;
+	if (!mmget_not_zero(mm))
 		return -ENODEV;
 
 	ret = vaddr_get_pfns(mm, vaddr, 1, dma->prot, pfn_base, pages);
@@ -1180,6 +1181,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
 	vfio_unmap_unpin(iommu, dma, true);
 	vfio_unlink_dma(iommu, dma);
 	put_task_struct(dma->task);
+	mmdrop(dma->mm);
 	vfio_dma_bitmap_free(dma);
 	if (dma->vaddr_invalid) {
 		iommu->vaddr_invalid_count--;
@@ -1687,6 +1689,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 	get_task_struct(current->group_leader);
 	dma->task = current->group_leader;
 	dma->lock_cap = capable(CAP_IPC_LOCK);
+	dma->mm = dma->task->mm;
+	mmgrab(dma->mm);
 
 	dma->pfn_list = RB_ROOT;
 
@@ -3122,9 +3126,8 @@ static int vfio_iommu_type1_dma_rw_chunk(struct vfio_iommu *iommu,
 			!(dma->prot & IOMMU_READ))
 		return -EPERM;
 
-	mm = get_task_mm(dma->task);
-
-	if (!mm)
+	mm = dma->mm;
+	if (!mmget_not_zero(mm))
 		return -EPERM;
 
 	if (kthread)
-- 
1.8.3.1


* [PATCH V5 3/7] vfio/type1: count reserved pages
From: Steve Sistare @ 2022-12-15 21:57 UTC (permalink / raw)
  To: kvm
  Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe, Kevin Tian,
	Steve Sistare

A pinned dma mapping may include reserved pages, which are not counted
in the task's locked_vm.  Maintain a count of reserved pages for
iommu-capable devices, so that locked_vm can be restored after fork or
exec in a subsequent patch.
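
A sketch of the resulting accounting identity, using names from this
series:

	/* pages actually charged to locked_vm for an iommu-backed dma */
	long npage = (dma->size >> PAGE_SHIFT) - dma->reserved_pages;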

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/vfio/vfio_iommu_type1.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index cd49b656..add87cd 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -101,6 +101,7 @@ struct vfio_dma {
 	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
 	unsigned long		*bitmap;
 	struct mm_struct	*mm;
+	long			reserved_pages;
 };
 
 struct vfio_batch {
@@ -662,7 +663,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 {
 	unsigned long pfn;
 	struct mm_struct *mm = current->mm;
-	long ret, pinned = 0, lock_acct = 0;
+	long ret, pinned = 0, lock_acct = 0, reserved_pages = 0;
 	bool rsvd;
 	dma_addr_t iova = vaddr - dma->vaddr + dma->iova;
 
@@ -716,7 +717,9 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 			 * externally pinned pages are already counted against
 			 * the user.
 			 */
-			if (!rsvd && !vfio_find_vpfn(dma, iova)) {
+			if (rsvd) {
+				reserved_pages++;
+			} else if (!vfio_find_vpfn(dma, iova)) {
 				if (!dma->lock_cap &&
 				    mm->locked_vm + lock_acct + 1 > limit) {
 					pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
@@ -746,6 +749,8 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 
 out:
 	ret = vfio_lock_acct(dma, lock_acct, false);
+	if (!ret)
+		dma->reserved_pages += reserved_pages;
 
 unpin_out:
 	if (batch->size == 1 && !batch->offset) {
@@ -771,7 +776,7 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
 				    unsigned long pfn, long npage,
 				    bool do_accounting)
 {
-	long unlocked = 0, locked = 0;
+	long unlocked = 0, locked = 0, reserved_pages = 0;
 	long i;
 
 	for (i = 0; i < npage; i++, iova += PAGE_SIZE) {
@@ -779,9 +784,12 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
 			unlocked++;
 			if (vfio_find_vpfn(dma, iova))
 				locked++;
+		} else {
+			reserved_pages++;
 		}
 	}
 
+	dma->reserved_pages -= reserved_pages;
 	if (do_accounting)
 		vfio_lock_acct(dma, locked - unlocked, true);
 
-- 
1.8.3.1


* [PATCH V5 4/7] vfio/type1: restore locked_vm
From: Steve Sistare @ 2022-12-15 21:57 UTC (permalink / raw)
  To: kvm
  Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe, Kevin Tian,
	Steve Sistare

When a vfio container is preserved across exec or fork-exec, the new
task's mm has a locked_vm count of 0.  After a dma vaddr is updated using
VFIO_DMA_MAP_FLAG_VADDR, locked_vm remains 0, and the pinned memory does
not count against the task's RLIMIT_MEMLOCK.

To restore the correct locked_vm count, when VFIO_DMA_MAP_FLAG_VADDR is
used and the dma's mm has changed, add the mapping's pinned page count to
the new mm->locked_vm, subject to the rlimit.  Now that mediated devices
are excluded when using VFIO_UPDATE_VADDR, the amount of pinned memory
equals the size of the mapping less the reserved page count.
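
A quick sanity check of the arithmetic, with hypothetical numbers and
4 KiB pages:

	/* dma->size = 1 GiB, dma->reserved_pages = 32 (hypothetical) */
	long npage = ((1UL << 30) >> 12) - 32;	/* 262144 - 32 = 262112 */
	/*
	 * __account_locked_vm() adds npage to the new mm->locked_vm,
	 * subject to RLIMIT_MEMLOCK; vfio_lock_acct(dma, -npage, true)
	 * then removes the same amount from the old mm.
	 */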

Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/vfio/vfio_iommu_type1.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index add87cd..70b52e9 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -1588,6 +1588,38 @@ static bool vfio_iommu_iova_dma_valid(struct vfio_iommu *iommu,
 	return list_empty(iova);
 }
 
+static int vfio_change_dma_owner(struct vfio_dma *dma)
+{
+	struct task_struct *new_task = current->group_leader;
+
+	if (new_task->mm != dma->mm) {
+		long npage = (dma->size >> PAGE_SHIFT) - dma->reserved_pages;
+		bool new_lock_cap = capable(CAP_IPC_LOCK);
+		int ret = mmap_write_lock_killable(new_task->mm);
+
+		if (ret)
+			return ret;
+
+		ret = __account_locked_vm(new_task->mm, npage, true,
+					  new_task, new_lock_cap);
+		mmap_write_unlock(new_task->mm);
+		if (ret)
+			return ret;
+
+		vfio_lock_acct(dma, -npage, true);
+		if (dma->task != new_task) {
+			put_task_struct(dma->task);
+			dma->task = get_task_struct(new_task);
+		}
+		mmdrop(dma->mm);
+		dma->mm = new_task->mm;
+		mmgrab(dma->mm);
+		dma->lock_cap = new_lock_cap;
+	}
+
+	return 0;
+}
+
 static int vfio_dma_do_map(struct vfio_iommu *iommu,
 			   struct vfio_iommu_type1_dma_map *map)
 {
@@ -1637,6 +1669,9 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 			   dma->size != size) {
 			ret = -EINVAL;
 		} else {
+			ret = vfio_change_dma_owner(dma);
+			if (ret)
+				goto out_unlock;
 			dma->vaddr = vaddr;
 			dma->vaddr_invalid = false;
 			iommu->vaddr_invalid_count--;
-- 
1.8.3.1


* [PATCH V5 5/7] vfio/type1: revert "block on invalid vaddr"
From: Steve Sistare @ 2022-12-15 21:57 UTC (permalink / raw)
  To: kvm
  Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe, Kevin Tian,
	Steve Sistare

Revert this dead code:
  commit 898b9eaeb3fe ("vfio/type1: block on invalid vaddr")

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/vfio/vfio_iommu_type1.c | 94 +++--------------------------------------
 1 file changed, 5 insertions(+), 89 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 70b52e9..c08c0a3 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -72,7 +72,6 @@ struct vfio_iommu {
 	unsigned int		vaddr_invalid_count;
 	uint64_t		pgsize_bitmap;
 	uint64_t		num_non_pinned_groups;
-	wait_queue_head_t	vaddr_wait;
 	bool			v2;
 	bool			nesting;
 	bool			dirty_page_tracking;
@@ -154,8 +153,6 @@ struct vfio_regions {
 #define DIRTY_BITMAP_PAGES_MAX	 ((u64)INT_MAX)
 #define DIRTY_BITMAP_SIZE_MAX	 DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
 
-#define WAITED 1
-
 static int put_pfn(unsigned long pfn, int prot);
 
 static struct vfio_iommu_group*
@@ -597,61 +594,6 @@ static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
 	return ret;
 }
 
-static int vfio_wait(struct vfio_iommu *iommu)
-{
-	DEFINE_WAIT(wait);
-
-	prepare_to_wait(&iommu->vaddr_wait, &wait, TASK_KILLABLE);
-	mutex_unlock(&iommu->lock);
-	schedule();
-	mutex_lock(&iommu->lock);
-	finish_wait(&iommu->vaddr_wait, &wait);
-	if (kthread_should_stop() || !iommu->container_open ||
-	    fatal_signal_pending(current)) {
-		return -EFAULT;
-	}
-	return WAITED;
-}
-
-/*
- * Find dma struct and wait for its vaddr to be valid.  iommu lock is dropped
- * if the task waits, but is re-locked on return.  Return result in *dma_p.
- * Return 0 on success with no waiting, WAITED on success if waited, and -errno
- * on error.
- */
-static int vfio_find_dma_valid(struct vfio_iommu *iommu, dma_addr_t start,
-			       size_t size, struct vfio_dma **dma_p)
-{
-	int ret = 0;
-
-	do {
-		*dma_p = vfio_find_dma(iommu, start, size);
-		if (!*dma_p)
-			return -EINVAL;
-		else if (!(*dma_p)->vaddr_invalid)
-			return ret;
-		else
-			ret = vfio_wait(iommu);
-	} while (ret == WAITED);
-
-	return ret;
-}
-
-/*
- * Wait for all vaddr in the dma_list to become valid.  iommu lock is dropped
- * if the task waits, but is re-locked on return.  Return 0 on success with no
- * waiting, WAITED on success if waited, and -errno on error.
- */
-static int vfio_wait_all_valid(struct vfio_iommu *iommu)
-{
-	int ret = 0;
-
-	while (iommu->vaddr_invalid_count && ret >= 0)
-		ret = vfio_wait(iommu);
-
-	return ret;
-}
-
 /*
  * Attempt to pin pages.  We really don't want to track all the pfns and
  * the iommu can only map chunks of consecutive pfns anyway, so get the
@@ -859,7 +801,6 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 	unsigned long remote_vaddr;
 	struct vfio_dma *dma;
 	bool do_accounting;
-	dma_addr_t iova;
 
 	if (!iommu || !pages)
 		return -EINVAL;
@@ -876,22 +817,6 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 		goto pin_done;
 	}
 
-	/*
-	 * Wait for all necessary vaddr's to be valid so they can be used in
-	 * the main loop without dropping the lock, to avoid racing vs unmap.
-	 */
-again:
-	if (iommu->vaddr_invalid_count) {
-		for (i = 0; i < npage; i++) {
-			iova = user_iova + PAGE_SIZE * i;
-			ret = vfio_find_dma_valid(iommu, iova, PAGE_SIZE, &dma);
-			if (ret < 0)
-				goto pin_done;
-			if (ret == WAITED)
-				goto again;
-		}
-	}
-
 	/* Fail if no dma_umap notifier is registered */
 	if (list_empty(&iommu->device_list)) {
 		ret = -EINVAL;
@@ -907,6 +832,7 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 
 	for (i = 0; i < npage; i++) {
 		unsigned long phys_pfn;
+		dma_addr_t iova;
 		struct vfio_pfn *vpfn;
 
 		iova = user_iova + PAGE_SIZE * i;
@@ -1191,10 +1117,8 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
 	put_task_struct(dma->task);
 	mmdrop(dma->mm);
 	vfio_dma_bitmap_free(dma);
-	if (dma->vaddr_invalid) {
+	if (dma->vaddr_invalid)
 		iommu->vaddr_invalid_count--;
-		wake_up_all(&iommu->vaddr_wait);
-	}
 	kfree(dma);
 	iommu->dma_avail++;
 }
@@ -1675,7 +1599,6 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
 			dma->vaddr = vaddr;
 			dma->vaddr_invalid = false;
 			iommu->vaddr_invalid_count--;
-			wake_up_all(&iommu->vaddr_wait);
 		}
 		goto out_unlock;
 	} else if (dma) {
@@ -1766,10 +1689,6 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 	unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 	int ret;
 
-	ret = vfio_wait_all_valid(iommu);
-	if (ret < 0)
-		return ret;
-
 	/* Arbitrarily pick the first domain in the list for lookups */
 	if (!list_empty(&iommu->domain_list))
 		d = list_first_entry(&iommu->domain_list,
@@ -2660,7 +2579,6 @@ static void *vfio_iommu_type1_open(unsigned long arg)
 	mutex_init(&iommu->lock);
 	mutex_init(&iommu->device_list_lock);
 	INIT_LIST_HEAD(&iommu->device_list);
-	init_waitqueue_head(&iommu->vaddr_wait);
 	iommu->pgsize_bitmap = PAGE_MASK;
 	INIT_LIST_HEAD(&iommu->emulated_iommu_groups);
 
@@ -3157,13 +3075,12 @@ static int vfio_iommu_type1_dma_rw_chunk(struct vfio_iommu *iommu,
 	struct vfio_dma *dma;
 	bool kthread = current->mm == NULL;
 	size_t offset;
-	int ret;
 
 	*copied = 0;
 
-	ret = vfio_find_dma_valid(iommu, user_iova, 1, &dma);
-	if (ret < 0)
-		return ret;
+	dma = vfio_find_dma(iommu, user_iova, 1);
+	if (!dma)
+		return -EINVAL;
 
 	if ((write && !(dma->prot & IOMMU_WRITE)) ||
 			!(dma->prot & IOMMU_READ))
@@ -3272,7 +3189,6 @@ static void vfio_iommu_type1_notify(void *iommu_data,
 	mutex_lock(&iommu->lock);
 	iommu->container_open = false;
 	mutex_unlock(&iommu->lock);
-	wake_up_all(&iommu->vaddr_wait);
 }
 
 static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
-- 
1.8.3.1


* [PATCH V5 6/7] vfio/type1: revert "implement notify callback"
From: Steve Sistare @ 2022-12-15 21:57 UTC (permalink / raw)
  To: kvm
  Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe, Kevin Tian,
	Steve Sistare

This is dead code.  Revert it.
  commit 487ace134053 ("vfio/type1: implement notify callback")

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/vfio/vfio_iommu_type1.c | 15 ---------------
 1 file changed, 15 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index c08c0a3..3faec81 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -75,7 +75,6 @@ struct vfio_iommu {
 	bool			v2;
 	bool			nesting;
 	bool			dirty_page_tracking;
-	bool			container_open;
 	struct list_head	emulated_iommu_groups;
 };
 
@@ -2575,7 +2574,6 @@ static void *vfio_iommu_type1_open(unsigned long arg)
 	INIT_LIST_HEAD(&iommu->iova_list);
 	iommu->dma_list = RB_ROOT;
 	iommu->dma_avail = dma_entry_limit;
-	iommu->container_open = true;
 	mutex_init(&iommu->lock);
 	mutex_init(&iommu->device_list_lock);
 	INIT_LIST_HEAD(&iommu->device_list);
@@ -3179,18 +3177,6 @@ static int vfio_iommu_type1_dma_rw(void *iommu_data, dma_addr_t user_iova,
 	return domain;
 }
 
-static void vfio_iommu_type1_notify(void *iommu_data,
-				    enum vfio_iommu_notify_type event)
-{
-	struct vfio_iommu *iommu = iommu_data;
-
-	if (event != VFIO_IOMMU_CONTAINER_CLOSE)
-		return;
-	mutex_lock(&iommu->lock);
-	iommu->container_open = false;
-	mutex_unlock(&iommu->lock);
-}
-
 static const struct vfio_iommu_driver_ops vfio_iommu_driver_ops_type1 = {
 	.name			= "vfio-iommu-type1",
 	.owner			= THIS_MODULE,
@@ -3205,7 +3191,6 @@ static void vfio_iommu_type1_notify(void *iommu_data,
 	.unregister_device	= vfio_iommu_type1_unregister_device,
 	.dma_rw			= vfio_iommu_type1_dma_rw,
 	.group_iommu_domain	= vfio_iommu_type1_group_iommu_domain,
-	.notify			= vfio_iommu_type1_notify,
 };
 
 static int __init vfio_iommu_type1_init(void)
-- 
1.8.3.1


* [PATCH V5 7/7] vfio: revert "iommu driver notify callback"
From: Steve Sistare @ 2022-12-15 21:57 UTC (permalink / raw)
  To: kvm
  Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe, Kevin Tian,
	Steve Sistare

Revert this dead code:
  commit ec5e32940cc9 ("vfio: iommu driver notify callback")

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 drivers/vfio/container.c | 5 -----
 drivers/vfio/vfio.h      | 7 -------
 2 files changed, 12 deletions(-)

diff --git a/drivers/vfio/container.c b/drivers/vfio/container.c
index d74164a..5bfd10d 100644
--- a/drivers/vfio/container.c
+++ b/drivers/vfio/container.c
@@ -382,11 +382,6 @@ static int vfio_fops_open(struct inode *inode, struct file *filep)
 static int vfio_fops_release(struct inode *inode, struct file *filep)
 {
 	struct vfio_container *container = filep->private_data;
-	struct vfio_iommu_driver *driver = container->iommu_driver;
-
-	if (driver && driver->ops->notify)
-		driver->ops->notify(container->iommu_data,
-				    VFIO_IOMMU_CONTAINER_CLOSE);
 
 	filep->private_data = NULL;
 
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index bcad54b..8a439c6 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -62,11 +62,6 @@ struct vfio_group {
 	struct blocking_notifier_head	notifier;
 };
 
-/* events for the backend driver notify callback */
-enum vfio_iommu_notify_type {
-	VFIO_IOMMU_CONTAINER_CLOSE = 0,
-};
-
 /**
  * struct vfio_iommu_driver_ops - VFIO IOMMU driver callbacks
  */
@@ -97,8 +92,6 @@ struct vfio_iommu_driver_ops {
 				  void *data, size_t count, bool write);
 	struct iommu_domain *(*group_iommu_domain)(void *iommu_data,
 						   struct iommu_group *group);
-	void		(*notify)(void *iommu_data,
-				  enum vfio_iommu_notify_type event);
 };
 
 struct vfio_iommu_driver {
-- 
1.8.3.1


* Re: [PATCH V5 3/7] vfio/type1: count reserved pages
From: Steven Sistare @ 2022-12-15 22:15 UTC (permalink / raw)
  To: kvm; +Cc: Alex Williamson, Cornelia Huck, Jason Gunthorpe, Kevin Tian

I just realized it makes more sense to directly count dma->locked_vm, which
can be done with a few lines in vfio_lock_acct.  I will do that tomorrow,
along with addressing any new comments from this review.

- Steve
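
A sketch of that direction, assuming a new locked_vm field in struct
vfio_dma (the field is hypothetical at this point in the series):

	static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
	{
		...
		ret = __account_locked_vm(mm, abs(npage), npage > 0,
					  dma->task, dma->lock_cap);
		if (!ret)
			dma->locked_vm += npage;	/* per-dma count */
		...
	}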


* Re: [PATCH V5 2/7] vfio/type1: prevent locked_vm underflow
From: Jason Gunthorpe @ 2022-12-16 14:09 UTC (permalink / raw)
  To: Steve Sistare; +Cc: kvm, Alex Williamson, Cornelia Huck, Kevin Tian

On Thu, Dec 15, 2022 at 01:56:59PM -0800, Steve Sistare wrote:
> When a vfio container is preserved across exec, the task does not change,
> but it gets a new mm with locked_vm=0.  If the user later unmaps a dma
> mapping, locked_vm underflows to a large unsigned value, and a subsequent
> dma map request fails with ENOMEM in __account_locked_vm.
> 
> To avoid underflow, grab and save the mm at the time a dma is mapped.
> Use that mm when adjusting locked_vm, rather than re-acquiring the saved
> task's mm, which may have changed.  If the saved mm is dead, do nothing.
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)

Add fixes lines and a CC stable

The subject should be more like 'vfio/typ1: Prevent corruption of mm->locked_vm via exec()'

> @@ -1687,6 +1689,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>  	get_task_struct(current->group_leader);
>  	dma->task = current->group_leader;
>  	dma->lock_cap = capable(CAP_IPC_LOCK);
> +	dma->mm = dma->task->mm;

This should be current->mm, current->group_leader->mm is not quite the
same thing (and maybe another bug, I'm not sure)

Jason
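
A sketch of the suggested change, against vfio_dma_do_map() in patch 2:

	dma->mm = current->mm;
	mmgrab(dma->mm);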

* Re: [PATCH V5 1/7] vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR
From: Jason Gunthorpe @ 2022-12-16 14:10 UTC (permalink / raw)
  To: Steve Sistare; +Cc: kvm, Alex Williamson, Cornelia Huck, Kevin Tian

On Thu, Dec 15, 2022 at 01:56:58PM -0800, Steve Sistare wrote:
> Disable the VFIO_UPDATE_VADDR capability if mediated devices are present.
> Their kernel threads could be blocked indefinitely by a misbehaving
> userland if they try to pin or unpin pages while vaddrs are being updated.
> 
> Do not allow groups to be added to the container while vaddrs are invalid,
> so we never need to block user threads from pinning, and can delete the
> vaddr-waiting code in a subsequent patch.
> 
> Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>

No blank line here

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason

* Re: [PATCH V5 4/7] vfio/type1: restore locked_vm
From: Jason Gunthorpe @ 2022-12-16 14:12 UTC (permalink / raw)
  To: Steve Sistare; +Cc: kvm, Alex Williamson, Cornelia Huck, Kevin Tian

On Thu, Dec 15, 2022 at 01:57:01PM -0800, Steve Sistare wrote:
> When a vfio container is preserved across exec or fork-exec, the new
> task's mm has a locked_vm count of 0.  After a dma vaddr is updated using
> VFIO_DMA_MAP_FLAG_VADDR, locked_vm remains 0, and the pinned memory does
> not count against the task's RLIMIT_MEMLOCK.
> 
> To restore the correct locked_vm count, when VFIO_DMA_MAP_FLAG_VADDR is
> used and the dma's mm has changed, add the mapping's pinned page count to
> the new mm->locked_vm, subject to the rlimit.  Now that mediated devices
> are excluded when using VFIO_UPDATE_VADDR, the amount of pinned memory
> equals the size of the mapping less the reserved page count.
> 
> Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr")
> 
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 35 +++++++++++++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index add87cd..70b52e9 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1588,6 +1588,38 @@ static bool vfio_iommu_iova_dma_valid(struct vfio_iommu *iommu,
>  	return list_empty(iova);
>  }
>  
> +static int vfio_change_dma_owner(struct vfio_dma *dma)
> +{
> +	struct task_struct *new_task = current->group_leader;
> +
> +	if (new_task->mm != dma->mm) {
> +		long npage = (dma->size >> PAGE_SHIFT) - dma->reserved_pages;
> +		bool new_lock_cap = capable(CAP_IPC_LOCK);
> +		int ret = mmap_write_lock_killable(new_task->mm);
> +
> +		if (ret)
> +			return ret;
> +
> +		ret = __account_locked_vm(new_task->mm, npage, true,
> +					  new_task, new_lock_cap);
> +		mmap_write_unlock(new_task->mm);
> +		if (ret)
> +			return ret;
> +
> +		vfio_lock_acct(dma, -npage, true);
> +		if (dma->task != new_task) {
> +			put_task_struct(dma->task);
> +			dma->task = get_task_struct(new_task);
> +		}
> +		mmdrop(dma->mm);
> +		dma->mm = new_task->mm;

This also should be current->mm not current->group_leader->mm

Jason

* Re: [PATCH V5 2/7] vfio/type1: prevent locked_vm underflow
From: Steven Sistare @ 2022-12-16 15:42 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: kvm, Alex Williamson, Cornelia Huck, Kevin Tian

On 12/16/2022 9:09 AM, Jason Gunthorpe wrote:
> On Thu, Dec 15, 2022 at 01:56:59PM -0800, Steve Sistare wrote:
>> When a vfio container is preserved across exec, the task does not change,
>> but it gets a new mm with locked_vm=0.  If the user later unmaps a dma
>> mapping, locked_vm underflows to a large unsigned value, and a subsequent
>> dma map request fails with ENOMEM in __account_locked_vm.
>>
>> To avoid underflow, grab and save the mm at the time a dma is mapped.
>> Use that mm when adjusting locked_vm, rather than re-acquiring the saved
>> task's mm, which may have changed.  If the saved mm is dead, do nothing.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 17 ++++++++++-------
>>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> Add fixes lines and a CC stable

This predates the update vaddr functionality, so AFAICT:

    Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation")

I'll wait on cc'ing stable until alex has chimed in.

> The subject should be more like 'vfio/typ1: Prevent corruption of mm->locked_vm via exec()'

Underflow is a more precise description of the first corruption. How about:

vfio/type1: Prevent underflow of locked_vm via exec()

>> @@ -1687,6 +1689,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>  	get_task_struct(current->group_leader);
>>  	dma->task = current->group_leader;
>>  	dma->lock_cap = capable(CAP_IPC_LOCK);
>> +	dma->mm = dma->task->mm;
> 
> This should be current->mm, current->group_leader->mm is not quite the
> same thing (and maybe another bug, I'm not sure)

When are they different -- when the leader is a zombie?

BTW I just noticed I need to update the comments about mm preceding these lines.

- Steve

* Re: [PATCH V5 2/7] vfio/type1: prevent locked_vm underflow
From: Alex Williamson @ 2022-12-16 16:10 UTC (permalink / raw)
  To: Steven Sistare; +Cc: Jason Gunthorpe, kvm, Cornelia Huck, Kevin Tian

On Fri, 16 Dec 2022 10:42:13 -0500
Steven Sistare <steven.sistare@oracle.com> wrote:

> On 12/16/2022 9:09 AM, Jason Gunthorpe wrote:
> > On Thu, Dec 15, 2022 at 01:56:59PM -0800, Steve Sistare wrote:  
> >> When a vfio container is preserved across exec, the task does not change,
> >> but it gets a new mm with locked_vm=0.  If the user later unmaps a dma
> >> mapping, locked_vm underflows to a large unsigned value, and a subsequent
> >> dma map request fails with ENOMEM in __account_locked_vm.
> >>
> >> To avoid underflow, grab and save the mm at the time a dma is mapped.
> >> Use that mm when adjusting locked_vm, rather than re-acquiring the saved
> >> task's mm, which may have changed.  If the saved mm is dead, do nothing.
> >>
> >> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> >> ---
> >>  drivers/vfio/vfio_iommu_type1.c | 17 ++++++++++-------
> >>  1 file changed, 10 insertions(+), 7 deletions(-)  
> > 
> > Add fixes lines and a CC stable  
> 
> This predates the update vaddr functionality, so AFAICT:
> 
>     Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation")
> 
> I'll wait on cc'ing stable until alex has chimed in.

Technically, adding the stable Cc tag is still the correct approach per
the stable process docs, but the Fixes: tag alone is generally
sufficient to crank up the backport engines.  The original
implementation is probably the correct commit to identify, exec was
certainly not considered there.  Thanks,

Alex
 
> > The subject should be more like 'vfio/typ1: Prevent corruption of mm->locked_vm via exec()'  
> 
> Underflow is a more precise description of the first corruption. How about:
> 
> vfio/type1: Prevent underflow of locked_vm via exec()
> 
> >> @@ -1687,6 +1689,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> >>  	get_task_struct(current->group_leader);
> >>  	dma->task = current->group_leader;
> >>  	dma->lock_cap = capable(CAP_IPC_LOCK);
> >> +	dma->mm = dma->task->mm;  
> > 
> > This should be current->mm, current->group_leader->mm is not quite the
> > same thing (and maybe another bug, I'm not sure)  
> 
> When are they different -- when the leader is a zombie?
> 
> BTW I just noticed I need to update the comments about mm preceding these lines.
> 
> - Steve
> 


* Re: [PATCH V5 2/7] vfio/type1: prevent locked_vm underflow
From: Steven Sistare @ 2022-12-16 16:16 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Jason Gunthorpe, kvm, Cornelia Huck, Kevin Tian

On 12/16/2022 11:10 AM, Alex Williamson wrote:
> On Fri, 16 Dec 2022 10:42:13 -0500
> Steven Sistare <steven.sistare@oracle.com> wrote:
> 
>> On 12/16/2022 9:09 AM, Jason Gunthorpe wrote:
>>> On Thu, Dec 15, 2022 at 01:56:59PM -0800, Steve Sistare wrote:  
>>>> When a vfio container is preserved across exec, the task does not change,
>>>> but it gets a new mm with locked_vm=0.  If the user later unmaps a dma
>>>> mapping, locked_vm underflows to a large unsigned value, and a subsequent
>>>> dma map request fails with ENOMEM in __account_locked_vm.
>>>>
>>>> To avoid underflow, grab and save the mm at the time a dma is mapped.
>>>> Use that mm when adjusting locked_vm, rather than re-acquiring the saved
>>>> task's mm, which may have changed.  If the saved mm is dead, do nothing.
>>>>
>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>> ---
>>>>  drivers/vfio/vfio_iommu_type1.c | 17 ++++++++++-------
>>>>  1 file changed, 10 insertions(+), 7 deletions(-)  
>>>
>>> Add fixes lines and a CC stable  
>>
>> This predates the update vaddr functionality, so AFAICT:
>>
>>     Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation")
>>
>> I'll wait on cc'ing stable until alex has chimed in.
> 
> Technically, adding the stable Cc tag is still the correct approach per
> the stable process docs, but the Fixes: tag alone is generally
> sufficient to crank up the backport engines.  The original
> implementation is probably the correct commit to identify, exec was
> certainly not considered there.  Thanks,

Should I cc stable on the whole series, or re-send individually?  If the
latter, which ones?

- Steve
  
>>> The subject should be more like 'vfio/typ1: Prevent corruption of mm->locked_vm via exec()'  
>>
>> Underflow is a more precise description of the first corruption. How about:
>>
>> vfio/type1: Prevent underflow of locked_vm via exec()
>>
>>>> @@ -1687,6 +1689,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>>  	get_task_struct(current->group_leader);
>>>>  	dma->task = current->group_leader;
>>>>  	dma->lock_cap = capable(CAP_IPC_LOCK);
>>>> +	dma->mm = dma->task->mm;  
>>>
>>> This should be current->mm, current->group_leader->mm is not quite the
>>> same thing (and maybe another bug, I'm not sure)  
>>
>> When are they different -- when the leader is a zombie?
>>
>> BTW I just noticed I need to update the comments about mm preceding these lines.
>>
>> - Steve
>>
> 

* Re: [PATCH V5 2/7] vfio/type1: prevent locked_vm underflow
From: Alex Williamson @ 2022-12-16 16:33 UTC (permalink / raw)
  To: Steven Sistare; +Cc: Jason Gunthorpe, kvm, Cornelia Huck, Kevin Tian

On Fri, 16 Dec 2022 11:16:59 -0500
Steven Sistare <steven.sistare@oracle.com> wrote:

> On 12/16/2022 11:10 AM, Alex Williamson wrote:
> > On Fri, 16 Dec 2022 10:42:13 -0500
> > Steven Sistare <steven.sistare@oracle.com> wrote:
> >   
> >> On 12/16/2022 9:09 AM, Jason Gunthorpe wrote:  
> >>> On Thu, Dec 15, 2022 at 01:56:59PM -0800, Steve Sistare wrote:    
> >>>> When a vfio container is preserved across exec, the task does not change,
> >>>> but it gets a new mm with locked_vm=0.  If the user later unmaps a dma
> >>>> mapping, locked_vm underflows to a large unsigned value, and a subsequent
> >>>> dma map request fails with ENOMEM in __account_locked_vm.
> >>>>
> >>>> To avoid underflow, grab and save the mm at the time a dma is mapped.
> >>>> Use that mm when adjusting locked_vm, rather than re-acquiring the saved
> >>>> task's mm, which may have changed.  If the saved mm is dead, do nothing.
> >>>>
> >>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> >>>> ---
> >>>>  drivers/vfio/vfio_iommu_type1.c | 17 ++++++++++-------
> >>>>  1 file changed, 10 insertions(+), 7 deletions(-)    
> >>>
> >>> Add fixes lines and a CC stable    
> >>
> >> This predates the update vaddr functionality, so AFAICT:
> >>
> >>     Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation")
> >>
> >> I'll wait on cc'ing stable until alex has chimed in.  
> > 
> > Technically, adding the stable Cc tag is still the correct approach per
> > the stable process docs, but the Fixes: tag alone is generally
> > sufficient to crank up the backport engines.  The original
> > implementation is probably the correct commit to identify, exec was
> > certainly not considered there.  Thanks,  
> 
> Should I cc stable on the whole series, or re-send individually?  If the
> latter, which ones?

Only for the Fixes: commits, and note that Cc: stable is effectively
just a tag like Fixes:, you don't actually need to copy the stable@vger
list on the patch (and IIRC you'll get some correctional email if you
do).  Thanks,

Alex
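
For reference, the trailer block for patch 2 would then read:

  Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation")
  Cc: stable@vger.kernel.org
  Signed-off-by: Steve Sistare <steven.sistare@oracle.com>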


* Re: [PATCH V5 2/7] vfio/type1: prevent locked_vm underflow
From: Jason Gunthorpe @ 2022-12-16 17:07 UTC (permalink / raw)
  To: Steven Sistare; +Cc: kvm, Alex Williamson, Cornelia Huck, Kevin Tian

On Fri, Dec 16, 2022 at 10:42:13AM -0500, Steven Sistare wrote:
> On 12/16/2022 9:09 AM, Jason Gunthorpe wrote:
> > On Thu, Dec 15, 2022 at 01:56:59PM -0800, Steve Sistare wrote:
> >> When a vfio container is preserved across exec, the task does not change,
> >> but it gets a new mm with locked_vm=0.  If the user later unmaps a dma
> >> mapping, locked_vm underflows to a large unsigned value, and a subsequent
> >> dma map request fails with ENOMEM in __account_locked_vm.
> >>
> >> To avoid underflow, grab and save the mm at the time a dma is mapped.
> >> Use that mm when adjusting locked_vm, rather than re-acquiring the saved
> >> task's mm, which may have changed.  If the saved mm is dead, do nothing.
> >>
> >> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> >> ---
> >>  drivers/vfio/vfio_iommu_type1.c | 17 ++++++++++-------
> >>  1 file changed, 10 insertions(+), 7 deletions(-)
> > 
> > Add fixes lines and a CC stable
> 
> This predates the update vaddr functionality, so AFAICT:
> 
>     Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation")
> 
> I'll wait on cc'ing stable until alex has chimed in.

Yes

> > The subject should be more like 'vfio/typ1: Prevent corruption of mm->locked_vm via exec()'
> 
> Underflow is a more precise description of the first corruption. How about:
> 
> vfio/type1: Prevent underflow of locked_vm via exec()

sure
 
> >> @@ -1687,6 +1689,8 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> >>  	get_task_struct(current->group_leader);
> >>  	dma->task = current->group_leader;
> >>  	dma->lock_cap = capable(CAP_IPC_LOCK);
> >> +	dma->mm = dma->task->mm;
> > 
> > This should be current->mm, current->group_leader->mm is not quite the
> > same thing (and maybe another bug, I'm not sure)
> 
> When are they different -- when the leader is a zombie?

I'm actually not sure if they can be different, but if they are
different then group_leader is the wrong one.  Better not to chance it.

Jason
