* [PATCH v2 0/6] Fully lock the container members of struct vfio_group
@ 2022-05-16 23:41 Jason Gunthorpe
  2022-05-16 23:41 ` [PATCH v2 1/6] vfio: Add missing locking for struct vfio_group::kvm Jason Gunthorpe
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2022-05-16 23:41 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, kvm
  Cc: Xiao Guangrong, Jike Song, Kevin Tian, Kirti Wankhede, Nicolin Chen

The atomic-based scheme for tracking the group->container and group->kvm
has two race conditions. Simplify it by adding an rwsem to protect those
values and related state, and remove the atomics.
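
As a quick orientation, here is a sketch of the locking model the series
converges on (illustrative only, not the literal driver code; the struct is
abbreviated and the sketch_*() function names are placeholders):

struct vfio_group {
	/* ... other members unchanged ... */
	struct rw_semaphore	group_rwsem;
	/* everything below is protected by group_rwsem: */
	struct vfio_container	*container;
	unsigned int		container_users;
	struct kvm		*kvm;
	struct file		*opened_file;
};

static void sketch_update_kvm(struct vfio_group *group, struct kvm *kvm)
{
	down_write(&group->group_rwsem);	/* exclusive for updates */
	group->kvm = kvm;
	up_write(&group->group_rwsem);
}

static bool sketch_has_container(struct vfio_group *group)
{
	bool ret;

	down_read(&group->group_rwsem);		/* shared for lookups */
	ret = group->container != NULL;
	up_read(&group->group_rwsem);
	return ret;
}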

This is on github: https://github.com/jgunthorpe/linux/commits/vfio_group_locking

v2:
 - Updated comments and commit messages
 - Rebased on vfio next
 - Left the dev_warn in place, will adjust it later
 - s/singleton_file/opened_file/
v1: https://lore.kernel.org/r/0-v1-c1d14aae2e8f+2f4-vfio_group_locking_jgg@nvidia.com

Cc: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Jason Gunthorpe (6):
  vfio: Add missing locking for struct vfio_group::kvm
  vfio: Change struct vfio_group::opened from an atomic to bool
  vfio: Split up vfio_group_get_device_fd()
  vfio: Fully lock struct vfio_group::container
  vfio: Simplify the life cycle of the group FD
  vfio: Change struct vfio_group::container_users to a non-atomic int

 drivers/vfio/vfio.c | 266 +++++++++++++++++++++++++++-----------------
 1 file changed, 163 insertions(+), 103 deletions(-)


base-commit: 6a985ae80befcf2c00e7c889336bfe9e9739e2ef
-- 
2.36.0



* [PATCH v2 1/6] vfio: Add missing locking for struct vfio_group::kvm
  2022-05-16 23:41 [PATCH v2 0/6] Fully lock the container members of struct vfio_group Jason Gunthorpe
@ 2022-05-16 23:41 ` Jason Gunthorpe
  2022-05-16 23:41 ` [PATCH v2 2/6] vfio: Change struct vfio_group::opened from an atomic to bool Jason Gunthorpe
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2022-05-16 23:41 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, kvm
  Cc: Xiao Guangrong, Jike Song, Kevin Tian, Kirti Wankhede, Nicolin Chen

Without locking, userspace can trigger a UAF by racing
KVM_DEV_VFIO_GROUP_DEL with VFIO_GROUP_GET_DEVICE_FD:

              CPU1                               CPU2
					    ioctl(KVM_DEV_VFIO_GROUP_DEL)
 ioctl(VFIO_GROUP_GET_DEVICE_FD)
    vfio_group_get_device_fd
     open_device()
      intel_vgpu_open_device()
        vfio_register_notifier()
	 vfio_register_group_notifier()
	   blocking_notifier_call_chain(&group->notifier,
               VFIO_GROUP_NOTIFY_SET_KVM, group->kvm);

					      set_kvm()
						group->kvm = NULL
					    close()
					     kfree(kvm)

             intel_vgpu_group_notifier()
                vdev->kvm = data
    [..]
        kvm_get_kvm(vgpu->kvm);
	    // UAF!

Add a simple rwsem in the group to protect the kvm while the notifier is
using it.

Note this doesn't fix the race internal to i915 where userspace can
trigger two VFIO_GROUP_NOTIFY_SET_KVM notifications before we reach a
consumer of vgpu->kvm and trigger this same UAF; it just makes the
notifier self-consistent.
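
As a rough sketch of the core change (illustrative only; the sketch_*()
names are placeholders for the vfio_file_set_kvm() and
vfio_register_group_notifier() hunks below):

static void sketch_set_kvm(struct vfio_group *group, struct kvm *kvm)
{
	/* writer: update and notify under the rwsem held for write */
	down_write(&group->group_rwsem);
	group->kvm = kvm;
	blocking_notifier_call_chain(&group->notifier,
				     VFIO_GROUP_NOTIFY_SET_KVM, kvm);
	up_write(&group->group_rwsem);
}

static void sketch_replay_kvm(struct vfio_group *group)
{
	/* reader: replay the current kvm under the rwsem held for read */
	down_read(&group->group_rwsem);
	if (group->kvm)
		blocking_notifier_call_chain(&group->notifier,
					     VFIO_GROUP_NOTIFY_SET_KVM,
					     group->kvm);
	up_read(&group->group_rwsem);
}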

Fixes: ccd46dbae77d ("vfio: support notifier chain in vfio_group")
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 1758d96f43f4a6..4261eeec9e73c6 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -76,6 +76,7 @@ struct vfio_group {
 	atomic_t			opened;
 	enum vfio_group_type		type;
 	unsigned int			dev_counter;
+	struct rw_semaphore		group_rwsem;
 	struct kvm			*kvm;
 	struct blocking_notifier_head	notifier;
 };
@@ -360,6 +361,7 @@ static struct vfio_group *vfio_group_alloc(struct iommu_group *iommu_group,
 	group->cdev.owner = THIS_MODULE;
 
 	refcount_set(&group->users, 1);
+	init_rwsem(&group->group_rwsem);
 	INIT_LIST_HEAD(&group->device_list);
 	mutex_init(&group->device_lock);
 	group->iommu_group = iommu_group;
@@ -1694,9 +1696,11 @@ void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
 	if (file->f_op != &vfio_group_fops)
 		return;
 
+	down_write(&group->group_rwsem);
 	group->kvm = kvm;
 	blocking_notifier_call_chain(&group->notifier,
 				     VFIO_GROUP_NOTIFY_SET_KVM, kvm);
+	up_write(&group->group_rwsem);
 }
 EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
 
@@ -2004,15 +2008,22 @@ static int vfio_register_group_notifier(struct vfio_group *group,
 		return -EINVAL;
 
 	ret = blocking_notifier_chain_register(&group->notifier, nb);
+	if (ret)
+		return ret;
 
 	/*
 	 * The attaching of kvm and vfio_group might already happen, so
 	 * here we replay once upon registration.
 	 */
-	if (!ret && set_kvm && group->kvm)
-		blocking_notifier_call_chain(&group->notifier,
-					VFIO_GROUP_NOTIFY_SET_KVM, group->kvm);
-	return ret;
+	if (set_kvm) {
+		down_read(&group->group_rwsem);
+		if (group->kvm)
+			blocking_notifier_call_chain(&group->notifier,
+						     VFIO_GROUP_NOTIFY_SET_KVM,
+						     group->kvm);
+		up_read(&group->group_rwsem);
+	}
+	return 0;
 }
 
 int vfio_register_notifier(struct vfio_device *device,
-- 
2.36.0



* [PATCH v2 2/6] vfio: Change struct vfio_group::opened from an atomic to bool
  2022-05-16 23:41 [PATCH v2 0/6] Fully lock the container members of struct vfio_group Jason Gunthorpe
  2022-05-16 23:41 ` [PATCH v2 1/6] vfio: Add missing locking for struct vfio_group::kvm Jason Gunthorpe
@ 2022-05-16 23:41 ` Jason Gunthorpe
  2022-05-16 23:41 ` [PATCH v2 3/6] vfio: Split up vfio_group_get_device_fd() Jason Gunthorpe
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2022-05-16 23:41 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, kvm
  Cc: Xiao Guangrong, Jike Song, Kevin Tian, Kirti Wankhede, Nicolin Chen

This is not a performance path; just use the group_rwsem to protect the
value.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio.c | 46 ++++++++++++++++++++++++++-------------------
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 4261eeec9e73c6..12d4b3efd4639e 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -73,7 +73,7 @@ struct vfio_group {
 	struct mutex			device_lock;
 	struct list_head		vfio_next;
 	struct list_head		container_next;
-	atomic_t			opened;
+	bool				opened;
 	enum vfio_group_type		type;
 	unsigned int			dev_counter;
 	struct rw_semaphore		group_rwsem;
@@ -1213,30 +1213,30 @@ static int vfio_group_fops_open(struct inode *inode, struct file *filep)
 {
 	struct vfio_group *group =
 		container_of(inode->i_cdev, struct vfio_group, cdev);
-	int opened;
+	int ret;
 
-	/* users can be zero if this races with vfio_group_put() */
-	if (!refcount_inc_not_zero(&group->users))
-		return -ENODEV;
+	down_write(&group->group_rwsem);
 
-	if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO)) {
-		vfio_group_put(group);
-		return -EPERM;
+	/* users can be zero if this races with vfio_group_put() */
+	if (!refcount_inc_not_zero(&group->users)) {
+		ret = -ENODEV;
+		goto err_unlock;
 	}
 
-	/* Do we need multiple instances of the group open?  Seems not. */
-	opened = atomic_cmpxchg(&group->opened, 0, 1);
-	if (opened) {
-		vfio_group_put(group);
-		return -EBUSY;
+	if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO)) {
+		ret = -EPERM;
+		goto err_put;
 	}
 
-	/* Is something still in use from a previous open? */
-	if (group->container) {
-		atomic_dec(&group->opened);
-		vfio_group_put(group);
-		return -EBUSY;
+	/*
+	 * Do we need multiple instances of the group open?  Seems not.
+	 * Is something still in use from a previous open?
+	 */
+	if (group->opened || group->container) {
+		ret = -EBUSY;
+		goto err_put;
 	}
+	group->opened = true;
 
 	/* Warn if previous user didn't cleanup and re-init to drop them */
 	if (WARN_ON(group->notifier.head))
@@ -1244,7 +1244,13 @@ static int vfio_group_fops_open(struct inode *inode, struct file *filep)
 
 	filep->private_data = group;
 
+	up_write(&group->group_rwsem);
 	return 0;
+err_put:
+	vfio_group_put(group);
+err_unlock:
+	up_write(&group->group_rwsem);
+	return ret;
 }
 
 static int vfio_group_fops_release(struct inode *inode, struct file *filep)
@@ -1255,7 +1261,9 @@ static int vfio_group_fops_release(struct inode *inode, struct file *filep)
 
 	vfio_group_try_dissolve_container(group);
 
-	atomic_dec(&group->opened);
+	down_write(&group->group_rwsem);
+	group->opened = false;
+	up_write(&group->group_rwsem);
 
 	vfio_group_put(group);
 
-- 
2.36.0



* [PATCH v2 3/6] vfio: Split up vfio_group_get_device_fd()
  2022-05-16 23:41 [PATCH v2 0/6] Fully lock the container members of struct vfio_group Jason Gunthorpe
  2022-05-16 23:41 ` [PATCH v2 1/6] vfio: Add missing locking for struct vfio_group::kvm Jason Gunthorpe
  2022-05-16 23:41 ` [PATCH v2 2/6] vfio: Change struct vfio_group::opened from an atomic to bool Jason Gunthorpe
@ 2022-05-16 23:41 ` Jason Gunthorpe
  2022-05-16 23:41 ` [PATCH v2 4/6] vfio: Fully lock struct vfio_group::container Jason Gunthorpe
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2022-05-16 23:41 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, kvm
  Cc: Xiao Guangrong, Jike Song, Kevin Tian, Kirti Wankhede, Nicolin Chen

The split follows the pairing with the destroy functions:

 - vfio_group_get_device_fd() destroyed by close()

 - vfio_device_open() destroyed by vfio_device_fops_release()

 - vfio_device_assign_container() destroyed by
   vfio_group_try_dissolve_container()

The next patch will put a lock around vfio_device_assign_container().

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio.c | 79 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 56 insertions(+), 23 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 12d4b3efd4639e..21db0e8d0d4004 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1064,12 +1064,9 @@ static bool vfio_assert_device_open(struct vfio_device *device)
 	return !WARN_ON_ONCE(!READ_ONCE(device->open_count));
 }
 
-static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
+static int vfio_device_assign_container(struct vfio_device *device)
 {
-	struct vfio_device *device;
-	struct file *filep;
-	int fdno;
-	int ret = 0;
+	struct vfio_group *group = device->group;
 
 	if (0 == atomic_read(&group->container_users) ||
 	    !group->container->iommu_driver)
@@ -1078,13 +1075,22 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
 	if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO))
 		return -EPERM;
 
-	device = vfio_device_get_from_name(group, buf);
-	if (IS_ERR(device))
-		return PTR_ERR(device);
+	atomic_inc(&group->container_users);
+	return 0;
+}
+
+static struct file *vfio_device_open(struct vfio_device *device)
+{
+	struct file *filep;
+	int ret;
+
+	ret = vfio_device_assign_container(device);
+	if (ret)
+		return ERR_PTR(ret);
 
 	if (!try_module_get(device->dev->driver->owner)) {
 		ret = -ENODEV;
-		goto err_device_put;
+		goto err_unassign_container;
 	}
 
 	mutex_lock(&device->dev_set->lock);
@@ -1100,15 +1106,11 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
 	 * We can't use anon_inode_getfd() because we need to modify
 	 * the f_mode flags directly to allow more than just ioctls
 	 */
-	fdno = ret = get_unused_fd_flags(O_CLOEXEC);
-	if (ret < 0)
-		goto err_close_device;
-
 	filep = anon_inode_getfile("[vfio-device]", &vfio_device_fops,
 				   device, O_RDWR);
 	if (IS_ERR(filep)) {
 		ret = PTR_ERR(filep);
-		goto err_fd;
+		goto err_close_device;
 	}
 
 	/*
@@ -1118,17 +1120,15 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
 	 */
 	filep->f_mode |= (FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE);
 
-	atomic_inc(&group->container_users);
-
-	fd_install(fdno, filep);
-
-	if (group->type == VFIO_NO_IOMMU)
+	if (device->group->type == VFIO_NO_IOMMU)
 		dev_warn(device->dev, "vfio-noiommu device opened by user "
 			 "(%s:%d)\n", current->comm, task_pid_nr(current));
-	return fdno;
+	/*
+	 * On success the ref of device is moved to the file and
+	 * put in vfio_device_fops_release()
+	 */
+	return filep;
 
-err_fd:
-	put_unused_fd(fdno);
 err_close_device:
 	mutex_lock(&device->dev_set->lock);
 	if (device->open_count == 1 && device->ops->close_device)
@@ -1137,7 +1137,40 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
 	device->open_count--;
 	mutex_unlock(&device->dev_set->lock);
 	module_put(device->dev->driver->owner);
-err_device_put:
+err_unassign_container:
+	vfio_group_try_dissolve_container(device->group);
+	return ERR_PTR(ret);
+}
+
+static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
+{
+	struct vfio_device *device;
+	struct file *filep;
+	int fdno;
+	int ret;
+
+	device = vfio_device_get_from_name(group, buf);
+	if (IS_ERR(device))
+		return PTR_ERR(device);
+
+	fdno = get_unused_fd_flags(O_CLOEXEC);
+	if (fdno < 0) {
+		ret = fdno;
+		goto err_put_device;
+	}
+
+	filep = vfio_device_open(device);
+	if (IS_ERR(filep)) {
+		ret = PTR_ERR(filep);
+		goto err_put_fdno;
+	}
+
+	fd_install(fdno, filep);
+	return fdno;
+
+err_put_fdno:
+	put_unused_fd(fdno);
+err_put_device:
 	vfio_device_put(device);
 	return ret;
 }
-- 
2.36.0



* [PATCH v2 4/6] vfio: Fully lock struct vfio_group::container
  2022-05-16 23:41 [PATCH v2 0/6] Fully lock the container members of struct vfio_group Jason Gunthorpe
                   ` (2 preceding siblings ...)
  2022-05-16 23:41 ` [PATCH v2 3/6] vfio: Split up vfio_group_get_device_fd() Jason Gunthorpe
@ 2022-05-16 23:41 ` Jason Gunthorpe
  2022-05-16 23:41 ` [PATCH v2 5/6] vfio: Simplify the life cycle of the group FD Jason Gunthorpe
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2022-05-16 23:41 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, kvm
  Cc: Xiao Guangrong, Jike Song, Kevin Tian, Kirti Wankhede, Nicolin Chen

This is necessary to avoid various user-triggerable races, for instance
racing SET_CONTAINER/UNSET_CONTAINER:

                                  ioctl(VFIO_GROUP_SET_CONTAINER)
ioctl(VFIO_GROUP_UNSET_CONTAINER)
 vfio_group_unset_container
    int users = atomic_cmpxchg(&group->container_users, 1, 0);
    // users == 1 container_users == 0
    __vfio_group_unset_container(group);
      container = group->container;
                                    vfio_group_set_container()
	                              if (!atomic_read(&group->container_users))
				        down_write(&container->group_lock);
				        group->container = container;
				        up_write(&container->group_lock);

      down_write(&container->group_lock);
      group->container = NULL;
      up_write(&container->group_lock);
      vfio_container_put(container);
      /* woops we lost/leaked the new container  */

This can then go on to a NULL pointer deref since container == NULL and
container_users == 1.

Wrap all touches of container, except those on a performance path with a
known open device, with the group_rwsem.

The only user of vfio_group_add_container_user() holds the user count for
a simple operation. Change it to just hold the group_rwsem over the
operation and delete vfio_group_add_container_user(). Containers now only
gain a user when a device FD is opened.
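
For illustration, a minimal sketch of how the two ioctl paths now
serialize (placeholder wrappers only; the real change is in the
VFIO_GROUP_SET_CONTAINER/UNSET_CONTAINER hunks below):

static int sketch_ioctl_set_container(struct vfio_group *group, int fd)
{
	int ret;

	/* the check of group->container/container_users and the update
	 * of group->container can no longer interleave */
	down_write(&group->group_rwsem);
	ret = vfio_group_set_container(group, fd);
	up_write(&group->group_rwsem);
	return ret;
}

static int sketch_ioctl_unset_container(struct vfio_group *group)
{
	int ret;

	down_write(&group->group_rwsem);
	ret = vfio_group_unset_container(group);
	up_write(&group->group_rwsem);
	return ret;
}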

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio.c | 66 +++++++++++++++++++++++++++------------------
 1 file changed, 40 insertions(+), 26 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 21db0e8d0d4004..81330c8ca7fea8 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -918,6 +918,8 @@ static void __vfio_group_unset_container(struct vfio_group *group)
 	struct vfio_container *container = group->container;
 	struct vfio_iommu_driver *driver;
 
+	lockdep_assert_held_write(&group->group_rwsem);
+
 	down_write(&container->group_lock);
 
 	driver = container->iommu_driver;
@@ -953,6 +955,8 @@ static int vfio_group_unset_container(struct vfio_group *group)
 {
 	int users = atomic_cmpxchg(&group->container_users, 1, 0);
 
+	lockdep_assert_held_write(&group->group_rwsem);
+
 	if (!users)
 		return -EINVAL;
 	if (users != 1)
@@ -971,8 +975,10 @@ static int vfio_group_unset_container(struct vfio_group *group)
  */
 static void vfio_group_try_dissolve_container(struct vfio_group *group)
 {
+	down_write(&group->group_rwsem);
 	if (0 == atomic_dec_if_positive(&group->container_users))
 		__vfio_group_unset_container(group);
+	up_write(&group->group_rwsem);
 }
 
 static int vfio_group_set_container(struct vfio_group *group, int container_fd)
@@ -982,6 +988,8 @@ static int vfio_group_set_container(struct vfio_group *group, int container_fd)
 	struct vfio_iommu_driver *driver;
 	int ret = 0;
 
+	lockdep_assert_held_write(&group->group_rwsem);
+
 	if (atomic_read(&group->container_users))
 		return -EINVAL;
 
@@ -1039,23 +1047,6 @@ static int vfio_group_set_container(struct vfio_group *group, int container_fd)
 	return ret;
 }
 
-static int vfio_group_add_container_user(struct vfio_group *group)
-{
-	if (!atomic_inc_not_zero(&group->container_users))
-		return -EINVAL;
-
-	if (group->type == VFIO_NO_IOMMU) {
-		atomic_dec(&group->container_users);
-		return -EPERM;
-	}
-	if (!group->container->iommu_driver) {
-		atomic_dec(&group->container_users);
-		return -EINVAL;
-	}
-
-	return 0;
-}
-
 static const struct file_operations vfio_device_fops;
 
 /* true if the vfio_device has open_device() called but not close_device() */
@@ -1068,6 +1059,8 @@ static int vfio_device_assign_container(struct vfio_device *device)
 {
 	struct vfio_group *group = device->group;
 
+	lockdep_assert_held_write(&group->group_rwsem);
+
 	if (0 == atomic_read(&group->container_users) ||
 	    !group->container->iommu_driver)
 		return -EINVAL;
@@ -1084,7 +1077,9 @@ static struct file *vfio_device_open(struct vfio_device *device)
 	struct file *filep;
 	int ret;
 
+	down_write(&device->group->group_rwsem);
 	ret = vfio_device_assign_container(device);
+	up_write(&device->group->group_rwsem);
 	if (ret)
 		return ERR_PTR(ret);
 
@@ -1197,11 +1192,13 @@ static long vfio_group_fops_unl_ioctl(struct file *filep,
 
 		status.flags = 0;
 
+		down_read(&group->group_rwsem);
 		if (group->container)
 			status.flags |= VFIO_GROUP_FLAGS_CONTAINER_SET |
 					VFIO_GROUP_FLAGS_VIABLE;
 		else if (!iommu_group_dma_owner_claimed(group->iommu_group))
 			status.flags |= VFIO_GROUP_FLAGS_VIABLE;
+		up_read(&group->group_rwsem);
 
 		if (copy_to_user((void __user *)arg, &status, minsz))
 			return -EFAULT;
@@ -1219,11 +1216,15 @@ static long vfio_group_fops_unl_ioctl(struct file *filep,
 		if (fd < 0)
 			return -EINVAL;
 
+		down_write(&group->group_rwsem);
 		ret = vfio_group_set_container(group, fd);
+		up_write(&group->group_rwsem);
 		break;
 	}
 	case VFIO_GROUP_UNSET_CONTAINER:
+		down_write(&group->group_rwsem);
 		ret = vfio_group_unset_container(group);
+		up_write(&group->group_rwsem);
 		break;
 	case VFIO_GROUP_GET_DEVICE_FD:
 	{
@@ -1709,15 +1710,19 @@ bool vfio_file_enforced_coherent(struct file *file)
 	if (file->f_op != &vfio_group_fops)
 		return true;
 
-	/*
-	 * Since the coherency state is determined only once a container is
-	 * attached the user must do so before they can prove they have
-	 * permission.
-	 */
-	if (vfio_group_add_container_user(group))
-		return true;
-	ret = vfio_ioctl_check_extension(group->container, VFIO_DMA_CC_IOMMU);
-	vfio_group_try_dissolve_container(group);
+	down_read(&group->group_rwsem);
+	if (group->container) {
+		ret = vfio_ioctl_check_extension(group->container,
+						 VFIO_DMA_CC_IOMMU);
+	} else {
+		/*
+		 * Since the coherency state is determined only once a container
+		 * is attached the user must do so before they can prove they
+		 * have permission.
+		 */
+		ret = true;
+	}
+	up_read(&group->group_rwsem);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
@@ -1910,6 +1915,7 @@ int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn,
 	if (group->dev_counter > 1)
 		return -EINVAL;
 
+	/* group->container cannot change while a vfio device is open */
 	container = group->container;
 	driver = container->iommu_driver;
 	if (likely(driver && driver->ops->pin_pages))
@@ -1945,6 +1951,7 @@ int vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn,
 	if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
 		return -E2BIG;
 
+	/* group->container cannot change while a vfio device is open */
 	container = device->group->container;
 	driver = container->iommu_driver;
 	if (likely(driver && driver->ops->unpin_pages))
@@ -1984,6 +1991,7 @@ int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova, void *data,
 	if (!data || len <= 0 || !vfio_assert_device_open(device))
 		return -EINVAL;
 
+	/* group->container cannot change while a vfio device is open */
 	container = device->group->container;
 	driver = container->iommu_driver;
 
@@ -2004,6 +2012,7 @@ static int vfio_register_iommu_notifier(struct vfio_group *group,
 	struct vfio_iommu_driver *driver;
 	int ret;
 
+	down_read(&group->group_rwsem);
 	container = group->container;
 	driver = container->iommu_driver;
 	if (likely(driver && driver->ops->register_notifier))
@@ -2011,6 +2020,8 @@ static int vfio_register_iommu_notifier(struct vfio_group *group,
 						     events, nb);
 	else
 		ret = -ENOTTY;
+	up_read(&group->group_rwsem);
+
 	return ret;
 }
 
@@ -2021,6 +2032,7 @@ static int vfio_unregister_iommu_notifier(struct vfio_group *group,
 	struct vfio_iommu_driver *driver;
 	int ret;
 
+	down_read(&group->group_rwsem);
 	container = group->container;
 	driver = container->iommu_driver;
 	if (likely(driver && driver->ops->unregister_notifier))
@@ -2028,6 +2040,8 @@ static int vfio_unregister_iommu_notifier(struct vfio_group *group,
 						       nb);
 	else
 		ret = -ENOTTY;
+	up_read(&group->group_rwsem);
+
 	return ret;
 }
 
-- 
2.36.0



* [PATCH v2 5/6] vfio: Simplify the life cycle of the group FD
  2022-05-16 23:41 [PATCH v2 0/6] Fully lock the container members of struct vfio_group Jason Gunthorpe
                   ` (3 preceding siblings ...)
  2022-05-16 23:41 ` [PATCH v2 4/6] vfio: Fully lock struct vfio_group::container Jason Gunthorpe
@ 2022-05-16 23:41 ` Jason Gunthorpe
  2022-05-16 23:41 ` [PATCH v2 6/6] vfio: Change struct vfio_group::container_users to a non-atomic int Jason Gunthorpe
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2022-05-16 23:41 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, kvm
  Cc: Xiao Guangrong, Jike Song, Kevin Tian, Kirti Wankhede, Nicolin Chen

Once userspace opens a group FD it is prevented from opening another
instance of that same group FD until all the prior group FDs and users of
the container are done.

The first is done trivially by checking the group->opened during group FD
open.

However, things get a little weird if userspace creates a device FD and
then closes the group FD. The group FD still cannot be re-opened, but this
time it is because the group->container is still set and container_users
is elevated by the device FD.

Due to this mismatched lifecycle we have the
vfio_group_try_dissolve_container() which tries to auto-free a container
after the group FD is closed but the device FD remains open.

Instead have the device FD hold onto a reference to the single group
FD. This directly prevents vfio_group_fops_release() from being called
when any device FD exists and makes the lifecycle model more
understandable.

vfio_group_try_dissolve_container() is removed as the only place a
container is auto-deleted is during vfio_group_fops_release(). At this
point the container_users is either 1 or 0 since all device FDs must be
closed.

Change group->opened to group->opened_file, which points to the single
struct file * that is open for the group. If group->opened_file is
NULL then group->container == NULL.

If all device FDs have closed then the group's notifier list must be
empty.
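
A minimal sketch of the new reference pattern (placeholder names; the
real code is vfio_device_assign_container() /
vfio_device_unassign_container() in the diff below):

static void sketch_device_assign(struct vfio_group *group)
{
	/* caller already holds group_rwsem for write; every open device
	 * FD pins the group's struct file */
	get_file(group->opened_file);
	atomic_inc(&group->container_users);
}

static void sketch_device_unassign(struct vfio_group *group)
{
	down_write(&group->group_rwsem);
	atomic_dec(&group->container_users);
	fput(group->opened_file);	/* may drop the last group FD ref */
	up_write(&group->group_rwsem);
}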

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio.c | 52 +++++++++++++++++++++------------------------
 1 file changed, 24 insertions(+), 28 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 81330c8ca7fea8..149c25840130f9 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -73,11 +73,11 @@ struct vfio_group {
 	struct mutex			device_lock;
 	struct list_head		vfio_next;
 	struct list_head		container_next;
-	bool				opened;
 	enum vfio_group_type		type;
 	unsigned int			dev_counter;
 	struct rw_semaphore		group_rwsem;
 	struct kvm			*kvm;
+	struct file			*opened_file;
 	struct blocking_notifier_head	notifier;
 };
 
@@ -967,20 +967,6 @@ static int vfio_group_unset_container(struct vfio_group *group)
 	return 0;
 }
 
-/*
- * When removing container users, anything that removes the last user
- * implicitly removes the group from the container.  That is, if the
- * group file descriptor is closed, as well as any device file descriptors,
- * the group is free.
- */
-static void vfio_group_try_dissolve_container(struct vfio_group *group)
-{
-	down_write(&group->group_rwsem);
-	if (0 == atomic_dec_if_positive(&group->container_users))
-		__vfio_group_unset_container(group);
-	up_write(&group->group_rwsem);
-}
-
 static int vfio_group_set_container(struct vfio_group *group, int container_fd)
 {
 	struct fd f;
@@ -1068,10 +1054,19 @@ static int vfio_device_assign_container(struct vfio_device *device)
 	if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO))
 		return -EPERM;
 
+	get_file(group->opened_file);
 	atomic_inc(&group->container_users);
 	return 0;
 }
 
+static void vfio_device_unassign_container(struct vfio_device *device)
+{
+	down_write(&device->group->group_rwsem);
+	atomic_dec(&device->group->container_users);
+	fput(device->group->opened_file);
+	up_write(&device->group->group_rwsem);
+}
+
 static struct file *vfio_device_open(struct vfio_device *device)
 {
 	struct file *filep;
@@ -1133,7 +1128,7 @@ static struct file *vfio_device_open(struct vfio_device *device)
 	mutex_unlock(&device->dev_set->lock);
 	module_put(device->dev->driver->owner);
 err_unassign_container:
-	vfio_group_try_dissolve_container(device->group);
+	vfio_device_unassign_container(device);
 	return ERR_PTR(ret);
 }
 
@@ -1264,18 +1259,12 @@ static int vfio_group_fops_open(struct inode *inode, struct file *filep)
 
 	/*
 	 * Do we need multiple instances of the group open?  Seems not.
-	 * Is something still in use from a previous open?
 	 */
-	if (group->opened || group->container) {
+	if (group->opened_file) {
 		ret = -EBUSY;
 		goto err_put;
 	}
-	group->opened = true;
-
-	/* Warn if previous user didn't cleanup and re-init to drop them */
-	if (WARN_ON(group->notifier.head))
-		BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
-
+	group->opened_file = filep;
 	filep->private_data = group;
 
 	up_write(&group->group_rwsem);
@@ -1293,10 +1282,17 @@ static int vfio_group_fops_release(struct inode *inode, struct file *filep)
 
 	filep->private_data = NULL;
 
-	vfio_group_try_dissolve_container(group);
-
 	down_write(&group->group_rwsem);
-	group->opened = false;
+	/*
+	 * Device FDs hold a group file reference, therefore the group release
+	 * is only called when there are no open devices.
+	 */
+	WARN_ON(group->notifier.head);
+	if (group->container) {
+		WARN_ON(atomic_read(&group->container_users) != 1);
+		__vfio_group_unset_container(group);
+	}
+	group->opened_file = NULL;
 	up_write(&group->group_rwsem);
 
 	vfio_group_put(group);
@@ -1328,7 +1324,7 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
 
 	module_put(device->dev->driver->owner);
 
-	vfio_group_try_dissolve_container(device->group);
+	vfio_device_unassign_container(device);
 
 	vfio_device_put(device);
 
-- 
2.36.0



* [PATCH v2 6/6] vfio: Change struct vfio_group::container_users to a non-atomic int
  2022-05-16 23:41 [PATCH v2 0/6] Fully lock the container members of struct vfio_group Jason Gunthorpe
                   ` (4 preceding siblings ...)
  2022-05-16 23:41 ` [PATCH v2 5/6] vfio: Simplify the life cycle of the group FD Jason Gunthorpe
@ 2022-05-16 23:41 ` Jason Gunthorpe
  2022-05-17  5:11 ` [PATCH v2 0/6] Fully lock the container members of struct vfio_group Nicolin Chen
  2022-05-18 17:51 ` Alex Williamson
  7 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2022-05-16 23:41 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, kvm
  Cc: Xiao Guangrong, Jike Song, Kevin Tian, Kirti Wankhede, Nicolin Chen

Now that everything is fully locked there is no need for container_users
to remain an atomic; change it to an unsigned int.

Use 'if (group->container)' as the test to determine if the container is
present or not instead of using container_users.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio.c | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 149c25840130f9..cfcff7764403fc 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -66,7 +66,7 @@ struct vfio_group {
 	struct device 			dev;
 	struct cdev			cdev;
 	refcount_t			users;
-	atomic_t			container_users;
+	unsigned int			container_users;
 	struct iommu_group		*iommu_group;
 	struct vfio_container		*container;
 	struct list_head		device_list;
@@ -429,7 +429,7 @@ static void vfio_group_put(struct vfio_group *group)
 	 * properly hold the group reference.
 	 */
 	WARN_ON(!list_empty(&group->device_list));
-	WARN_ON(atomic_read(&group->container_users));
+	WARN_ON(group->container || group->container_users);
 	WARN_ON(group->notifier.head);
 
 	list_del(&group->vfio_next);
@@ -930,6 +930,7 @@ static void __vfio_group_unset_container(struct vfio_group *group)
 	iommu_group_release_dma_owner(group->iommu_group);
 
 	group->container = NULL;
+	group->container_users = 0;
 	list_del(&group->container_next);
 
 	/* Detaching the last group deprivileges a container, remove iommu */
@@ -953,17 +954,13 @@ static void __vfio_group_unset_container(struct vfio_group *group)
  */
 static int vfio_group_unset_container(struct vfio_group *group)
 {
-	int users = atomic_cmpxchg(&group->container_users, 1, 0);
-
 	lockdep_assert_held_write(&group->group_rwsem);
 
-	if (!users)
+	if (!group->container)
 		return -EINVAL;
-	if (users != 1)
+	if (group->container_users != 1)
 		return -EBUSY;
-
 	__vfio_group_unset_container(group);
-
 	return 0;
 }
 
@@ -976,7 +973,7 @@ static int vfio_group_set_container(struct vfio_group *group, int container_fd)
 
 	lockdep_assert_held_write(&group->group_rwsem);
 
-	if (atomic_read(&group->container_users))
+	if (group->container || WARN_ON(group->container_users))
 		return -EINVAL;
 
 	if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO))
@@ -1020,12 +1017,12 @@ static int vfio_group_set_container(struct vfio_group *group, int container_fd)
 	}
 
 	group->container = container;
+	group->container_users = 1;
 	container->noiommu = (group->type == VFIO_NO_IOMMU);
 	list_add(&group->container_next, &container->group_list);
 
 	/* Get a reference on the container and mark a user within the group */
 	vfio_container_get(container);
-	atomic_inc(&group->container_users);
 
 unlock_out:
 	up_write(&container->group_lock);
@@ -1047,22 +1044,23 @@ static int vfio_device_assign_container(struct vfio_device *device)
 
 	lockdep_assert_held_write(&group->group_rwsem);
 
-	if (0 == atomic_read(&group->container_users) ||
-	    !group->container->iommu_driver)
+	if (!group->container || !group->container->iommu_driver ||
+	    WARN_ON(!group->container_users))
 		return -EINVAL;
 
 	if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO))
 		return -EPERM;
 
 	get_file(group->opened_file);
-	atomic_inc(&group->container_users);
+	group->container_users++;
 	return 0;
 }
 
 static void vfio_device_unassign_container(struct vfio_device *device)
 {
 	down_write(&device->group->group_rwsem);
-	atomic_dec(&device->group->container_users);
+	WARN_ON(device->group->container_users <= 1);
+	device->group->container_users--;
 	fput(device->group->opened_file);
 	up_write(&device->group->group_rwsem);
 }
@@ -1289,7 +1287,7 @@ static int vfio_group_fops_release(struct inode *inode, struct file *filep)
 	 */
 	WARN_ON(group->notifier.head);
 	if (group->container) {
-		WARN_ON(atomic_read(&group->container_users) != 1);
+		WARN_ON(group->container_users != 1);
 		__vfio_group_unset_container(group);
 	}
 	group->opened_file = NULL;
-- 
2.36.0



* Re: [PATCH v2 0/6] Fully lock the container members of struct vfio_group
  2022-05-16 23:41 [PATCH v2 0/6] Fully lock the container members of struct vfio_group Jason Gunthorpe
                   ` (5 preceding siblings ...)
  2022-05-16 23:41 ` [PATCH v2 6/6] vfio: Change struct vfio_group::container_users to a non-atomic int Jason Gunthorpe
@ 2022-05-17  5:11 ` Nicolin Chen
  2022-05-17 15:37   ` Matthew Rosato
  2022-05-18 17:51 ` Alex Williamson
  7 siblings, 1 reply; 10+ messages in thread
From: Nicolin Chen @ 2022-05-17  5:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, Cornelia Huck, kvm, Xiao Guangrong, Jike Song,
	Kevin Tian, Kirti Wankhede

On Mon, May 16, 2022 at 08:41:16PM -0300, Jason Gunthorpe wrote:
> The atomic-based scheme for tracking the group->container and group->kvm
> has two race conditions. Simplify it by adding an rwsem to protect those
> values and related state, and remove the atomics.
> 
> This is on github: https://github.com/jgunthorpe/linux/commits/vfio_group_locking
> 
> v2:
>  - Updated comments and commit messages
>  - Rebased on vfio next
>  - Left the dev_warn in place, will adjust it later
>  - s/singleton_file/opened_file/
> v1: https://lore.kernel.org/r/0-v1-c1d14aae2e8f+2f4-vfio_group_locking_jgg@nvidia.com
> 
> Cc: Nicolin Chen <nicolinc@nvidia.com>

Sanity tested on x86_64 and ARM64.

Tested-by: Nicolin Chen <nicolinc@nvidia.com>


* Re: [PATCH v2 0/6] Fully lock the container members of struct vfio_group
  2022-05-17  5:11 ` [PATCH v2 0/6] Fully lock the container members of struct vfio_group Nicolin Chen
@ 2022-05-17 15:37   ` Matthew Rosato
  0 siblings, 0 replies; 10+ messages in thread
From: Matthew Rosato @ 2022-05-17 15:37 UTC (permalink / raw)
  To: Nicolin Chen, Jason Gunthorpe
  Cc: Alex Williamson, Cornelia Huck, kvm, Xiao Guangrong, Jike Song,
	Kevin Tian, Kirti Wankhede

On 5/17/22 1:11 AM, Nicolin Chen wrote:
> On Mon, May 16, 2022 at 08:41:16PM -0300, Jason Gunthorpe wrote:
>> The atomic-based scheme for tracking the group->container and group->kvm
>> has two race conditions. Simplify it by adding an rwsem to protect those
>> values and related state, and remove the atomics.
>>
>> This is on github: https://github.com/jgunthorpe/linux/commits/vfio_group_locking
>>
>> v2:
>>   - Updated comments and commit messages
>>   - Rebased on vfio next
>>   - Left the dev_warn in place, will adjust it later
>>   - s/singleton_file/opened_file/
>> v1: https://lore.kernel.org/r/0-v1-c1d14aae2e8f+2f4-vfio_group_locking_jgg@nvidia.com
>>
>> Cc: Nicolin Chen <nicolinc@nvidia.com>
> 
> Sanity tested on x86_64 and ARM64.
> 
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>

Also sanity tested this series on s390x (vfio-pci and vfio-ap)

Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>


* Re: [PATCH v2 0/6] Fully lock the container members of struct vfio_group
  2022-05-16 23:41 [PATCH v2 0/6] Fully lock the container members of struct vfio_group Jason Gunthorpe
                   ` (6 preceding siblings ...)
  2022-05-17  5:11 ` [PATCH v2 0/6] Fully lock the container members of struct vfio_group Nicolin Chen
@ 2022-05-18 17:51 ` Alex Williamson
  7 siblings, 0 replies; 10+ messages in thread
From: Alex Williamson @ 2022-05-18 17:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Cornelia Huck, kvm, Xiao Guangrong, Jike Song, Kevin Tian,
	Kirti Wankhede, Nicolin Chen

On Mon, 16 May 2022 20:41:16 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> The atomic-based scheme for tracking the group->container and group->kvm
> has two race conditions. Simplify it by adding an rwsem to protect those
> values and related state, and remove the atomics.
> 
> This is on github: https://github.com/jgunthorpe/linux/commits/vfio_group_locking
> 
> v2:
>  - Updated comments and commit messages
>  - Rebased on vfio next
>  - Left the dev_warn in place, will adjust it later
>  - s/singleton_file/opened_file/
> v1: https://lore.kernel.org/r/0-v1-c1d14aae2e8f+2f4-vfio_group_locking_jgg@nvidia.com
> 
> Cc: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> Jason Gunthorpe (6):
>   vfio: Add missing locking for struct vfio_group::kvm
>   vfio: Change struct vfio_group::opened from an atomic to bool
>   vfio: Split up vfio_group_get_device_fd()
>   vfio: Fully lock struct vfio_group::container
>   vfio: Simplify the life cycle of the group FD
>   vfio: Change struct vfio_group::container_users to a non-atomic int
> 
>  drivers/vfio/vfio.c | 266 +++++++++++++++++++++++++++-----------------
>  1 file changed, 163 insertions(+), 103 deletions(-)
> 
> 
> base-commit: 6a985ae80befcf2c00e7c889336bfe9e9739e2ef

Applied to vfio next branch for v5.19.  Thanks!

Alex


