All of lore.kernel.org
* [PATCH 0/5] Update vfio_group to use the modern cdev lifecycle
@ 2021-10-01 23:22 Jason Gunthorpe
  2021-10-01 23:22 ` [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier() Jason Gunthorpe
                   ` (4 more replies)
  0 siblings, 5 replies; 29+ messages in thread
From: Jason Gunthorpe @ 2021-10-01 23:22 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, kvm
  Cc: Christoph Hellwig, Tian, Kevin, Liu Yi L

These days drivers with state should use cdev_device_add() and
cdev_device_del() to manage the cdev and sysfs lifetime. This simple
pattern ties all the state (vfio, dev, and cdev) together in one memory
structure and uses container_of() to navigate between the layers.
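The pattern described above can be sketched in plain userspace C. Everything here is a hypothetical stand-in (fake_device, fake_cdev, fake_group are not kernel types); it only shows how embedding the sub-objects in one allocation lets container_of() navigate between the layers:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the layered state that cdev_device_add() ties
 * together in a single memory structure. */
struct fake_device { int id; };
struct fake_cdev { int minor; };

struct fake_group {
	struct fake_device dev;   /* sysfs identity */
	struct fake_cdev cdev;    /* char dev identity */
	int users;                /* driver-private state */
};

/* container_of(): recover the outer struct from a member pointer by
 * subtracting the member's offset. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct fake_group *group_from_cdev(struct fake_cdev *c)
{
	return container_of(c, struct fake_group, cdev);
}
```

In the real series the same arithmetic recovers the vfio_group from inode->i_cdev in the fops open path.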

This is a followup to the discussion here:

https://lore.kernel.org/kvm/20210921155705.GN327412@nvidia.com/

This builds on Christoph's work to revise how the vfio_group works and is
against the latest VFIO tree.

Jason Gunthorpe (5):
  vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier()
  vfio: Do not open code the group list search in vfio_create_group()
  vfio: Don't leak a group reference if the group already exists
  vfio: Use a refcount_t instead of a kref in the vfio_group
  vfio: Use cdev_device_add() instead of device_create()

 drivers/vfio/vfio.c | 363 +++++++++++++++++---------------------------
 1 file changed, 140 insertions(+), 223 deletions(-)


base-commit: d9a0cd510c3383b61db6f70a84e0c3487f836a63
-- 
2.33.0



* [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier()
  2021-10-01 23:22 [PATCH 0/5] Update vfio_group to use the modern cdev lifecycle Jason Gunthorpe
@ 2021-10-01 23:22 ` Jason Gunthorpe
  2021-10-04 22:25   ` Alex Williamson
                     ` (2 more replies)
  2021-10-01 23:22 ` [PATCH 2/5] vfio: Do not open code the group list search in vfio_create_group() Jason Gunthorpe
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 29+ messages in thread
From: Jason Gunthorpe @ 2021-10-01 23:22 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, kvm
  Cc: Christoph Hellwig, Tian, Kevin, Liu Yi L

iommu_group_register_notifier()/iommu_group_unregister_notifier() are
built on a blocking_notifier_chain, which integrates an rwsem. That
rwsem guarantees the notifier function cannot still be running once
unregistration returns.

When considering how the notifier function interacts with create/destroy
of the group there are two edge cases: the notifier starts before
list_add(&vfio.group_list), and the notifier runs after the kref
becomes 0.

Prior to vfio_create_group() unlocking and returning we have
   container_users == 0
   device_list == empty
And this cannot change until the mutex is unlocked.

After the kref goes to zero we must also have
   container_users == 0
   device_list == empty

Both must hold because these are balanced operations, and a kref of 0
with either nonzero would mean some caller became unbalanced. Add the
missing assertion that container_users must be zero as well.

These two facts are important because when checking each operation we see:

- IOMMU_GROUP_NOTIFY_ADD_DEVICE
   Empty device_list avoids the WARN_ON in vfio_group_nb_add_dev()
   0 container_users ends the call
- IOMMU_GROUP_NOTIFY_BOUND_DRIVER
   0 container_users ends the call

Finally, we have IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER, which only deletes
items from the unbound list. During creation this list is empty; once
the kref reaches 0 nothing can read the list, and it will be freed soon.

Since vfio_group_release() doesn't hold the appropriate lock to
manipulate the unbound_list and could race with the notifier, move the
cleanup to directly before the kfree().

This allows deleting all of the deferred group put code.
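The cleanup moved next to the kfree() walks unbound_list with
list_for_each_entry_safe(), whose point is that each iteration frees the
current node. A minimal userspace sketch of that idiom (hand-rolled
singly linked list, not the kernel's list.h):

```c
#include <assert.h>
#include <stdlib.h>

struct node { struct node *next; };

/* Free every node; 'tmp' caches the successor before the current node
 * is freed, mirroring list_for_each_entry_safe(). Returns the number
 * of nodes freed. */
static int free_all(struct node *head)
{
	struct node *cur, *tmp;
	int n = 0;

	for (cur = head; cur; cur = tmp) {
		tmp = cur->next;   /* read before free: cur is dead after */
		free(cur);
		n++;
	}
	return n;
}

/* Helper to build a test list. */
static struct node *push(struct node *head)
{
	struct node *n = malloc(sizeof(*n));
	n->next = head;
	return n;
}
```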

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio.c | 89 +++++----------------------------------------
 1 file changed, 9 insertions(+), 80 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 08b27b64f0f935..32a53cb3598524 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -324,12 +324,20 @@ static void vfio_container_put(struct vfio_container *container)
 
 static void vfio_group_unlock_and_free(struct vfio_group *group)
 {
+	struct vfio_unbound_dev *unbound, *tmp;
+
 	mutex_unlock(&vfio.group_lock);
 	/*
 	 * Unregister outside of lock.  A spurious callback is harmless now
 	 * that the group is no longer in vfio.group_list.
 	 */
 	iommu_group_unregister_notifier(group->iommu_group, &group->nb);
+
+	list_for_each_entry_safe(unbound, tmp,
+				 &group->unbound_list, unbound_next) {
+		list_del(&unbound->unbound_next);
+		kfree(unbound);
+	}
 	kfree(group);
 }
 
@@ -361,13 +369,6 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
 
 	group->nb.notifier_call = vfio_iommu_group_notifier;
 
-	/*
-	 * blocking notifiers acquire a rwsem around registering and hold
-	 * it around callback.  Therefore, need to register outside of
-	 * vfio.group_lock to avoid A-B/B-A contention.  Our callback won't
-	 * do anything unless it can find the group in vfio.group_list, so
-	 * no harm in registering early.
-	 */
 	ret = iommu_group_register_notifier(iommu_group, &group->nb);
 	if (ret) {
 		kfree(group);
@@ -415,18 +416,12 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
 static void vfio_group_release(struct kref *kref)
 {
 	struct vfio_group *group = container_of(kref, struct vfio_group, kref);
-	struct vfio_unbound_dev *unbound, *tmp;
 	struct iommu_group *iommu_group = group->iommu_group;
 
 	WARN_ON(!list_empty(&group->device_list));
+	WARN_ON(atomic_read(&group->container_users));
 	WARN_ON(group->notifier.head);
 
-	list_for_each_entry_safe(unbound, tmp,
-				 &group->unbound_list, unbound_next) {
-		list_del(&unbound->unbound_next);
-		kfree(unbound);
-	}
-
 	device_destroy(vfio.class, MKDEV(MAJOR(vfio.group_devt), group->minor));
 	list_del(&group->vfio_next);
 	vfio_free_group_minor(group->minor);
@@ -439,61 +434,12 @@ static void vfio_group_put(struct vfio_group *group)
 	kref_put_mutex(&group->kref, vfio_group_release, &vfio.group_lock);
 }
 
-struct vfio_group_put_work {
-	struct work_struct work;
-	struct vfio_group *group;
-};
-
-static void vfio_group_put_bg(struct work_struct *work)
-{
-	struct vfio_group_put_work *do_work;
-
-	do_work = container_of(work, struct vfio_group_put_work, work);
-
-	vfio_group_put(do_work->group);
-	kfree(do_work);
-}
-
-static void vfio_group_schedule_put(struct vfio_group *group)
-{
-	struct vfio_group_put_work *do_work;
-
-	do_work = kmalloc(sizeof(*do_work), GFP_KERNEL);
-	if (WARN_ON(!do_work))
-		return;
-
-	INIT_WORK(&do_work->work, vfio_group_put_bg);
-	do_work->group = group;
-	schedule_work(&do_work->work);
-}
-
 /* Assume group_lock or group reference is held */
 static void vfio_group_get(struct vfio_group *group)
 {
 	kref_get(&group->kref);
 }
 
-/*
- * Not really a try as we will sleep for mutex, but we need to make
- * sure the group pointer is valid under lock and get a reference.
- */
-static struct vfio_group *vfio_group_try_get(struct vfio_group *group)
-{
-	struct vfio_group *target = group;
-
-	mutex_lock(&vfio.group_lock);
-	list_for_each_entry(group, &vfio.group_list, vfio_next) {
-		if (group == target) {
-			vfio_group_get(group);
-			mutex_unlock(&vfio.group_lock);
-			return group;
-		}
-	}
-	mutex_unlock(&vfio.group_lock);
-
-	return NULL;
-}
-
 static
 struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
 {
@@ -691,14 +637,6 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
 	struct device *dev = data;
 	struct vfio_unbound_dev *unbound;
 
-	/*
-	 * Need to go through a group_lock lookup to get a reference or we
-	 * risk racing a group being removed.  Ignore spurious notifies.
-	 */
-	group = vfio_group_try_get(group);
-	if (!group)
-		return NOTIFY_OK;
-
 	switch (action) {
 	case IOMMU_GROUP_NOTIFY_ADD_DEVICE:
 		vfio_group_nb_add_dev(group, dev);
@@ -749,15 +687,6 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
 		mutex_unlock(&group->unbound_lock);
 		break;
 	}
-
-	/*
-	 * If we're the last reference to the group, the group will be
-	 * released, which includes unregistering the iommu group notifier.
-	 * We hold a read-lock on that notifier list, unregistering needs
-	 * a write-lock... deadlock.  Release our reference asynchronously
-	 * to avoid that situation.
-	 */
-	vfio_group_schedule_put(group);
 	return NOTIFY_OK;
 }
 
-- 
2.33.0



* [PATCH 2/5] vfio: Do not open code the group list search in vfio_create_group()
  2021-10-01 23:22 [PATCH 0/5] Update vfio_group to use the modern cdev lifecycle Jason Gunthorpe
  2021-10-01 23:22 ` [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier() Jason Gunthorpe
@ 2021-10-01 23:22 ` Jason Gunthorpe
  2021-10-02  3:19   ` kernel test robot
                     ` (2 more replies)
  2021-10-01 23:22 ` [PATCH 3/5] vfio: Don't leak a group reference if the group already exists Jason Gunthorpe
                   ` (2 subsequent siblings)
  4 siblings, 3 replies; 29+ messages in thread
From: Jason Gunthorpe @ 2021-10-01 23:22 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, kvm
  Cc: Christoph Hellwig, Tian, Kevin, Liu Yi L

Split the search loop out of vfio_group_get_from_iommu() into a new
__vfio_group_get_from_iommu() helper so that vfio_create_group() can
call it instead of open coding the same list search.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio.c | 55 ++++++++++++++++++++++++---------------------
 1 file changed, 30 insertions(+), 25 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 32a53cb3598524..1cb12033b02240 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -344,10 +344,35 @@ static void vfio_group_unlock_and_free(struct vfio_group *group)
 /**
  * Group objects - create, release, get, put, search
  */
+static struct vfio_group *
+__vfio_group_get_from_iommu(struct iommu_group *iommu_group)
+{
+	struct vfio_group *group;
+
+	list_for_each_entry(group, &vfio.group_list, vfio_next) {
+		if (group->iommu_group == iommu_group) {
+			vfio_group_get(group);
+			return group;
+		}
+	}
+	return NULL;
+}
+
+static struct vfio_group *
+vfio_group_get_from_iommu(struct iommu_group *iommu_group)
+{
+	struct vfio_group *group;
+
+	mutex_lock(&vfio.group_lock);
+	group = __vfio_group_get_from_iommu(iommu_group);
+	mutex_unlock(&vfio.group_lock);
+	return group;
+}
+
 static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
 		enum vfio_group_type type)
 {
-	struct vfio_group *group, *tmp;
+	struct vfio_group *group, *existing_group;
 	struct device *dev;
 	int ret, minor;
 
@@ -378,12 +403,10 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
 	mutex_lock(&vfio.group_lock);
 
 	/* Did we race creating this group? */
-	list_for_each_entry(tmp, &vfio.group_list, vfio_next) {
-		if (tmp->iommu_group == iommu_group) {
-			vfio_group_get(tmp);
-			vfio_group_unlock_and_free(group);
-			return tmp;
-		}
+	existing_group = __vfio_group_get_from_iommu(iommu_group);
+	if (existing_group) {
+		vfio_group_unlock_and_free(group);
+		return existing_group;
 	}
 
 	minor = vfio_alloc_group_minor(group);
@@ -440,24 +463,6 @@ static void vfio_group_get(struct vfio_group *group)
 	kref_get(&group->kref);
 }
 
-static
-struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
-{
-	struct vfio_group *group;
-
-	mutex_lock(&vfio.group_lock);
-	list_for_each_entry(group, &vfio.group_list, vfio_next) {
-		if (group->iommu_group == iommu_group) {
-			vfio_group_get(group);
-			mutex_unlock(&vfio.group_lock);
-			return group;
-		}
-	}
-	mutex_unlock(&vfio.group_lock);
-
-	return NULL;
-}
-
 static struct vfio_group *vfio_group_get_from_minor(int minor)
 {
 	struct vfio_group *group;
-- 
2.33.0



* [PATCH 3/5] vfio: Don't leak a group reference if the group already exists
  2021-10-01 23:22 [PATCH 0/5] Update vfio_group to use the modern cdev lifecycle Jason Gunthorpe
  2021-10-01 23:22 ` [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier() Jason Gunthorpe
  2021-10-01 23:22 ` [PATCH 2/5] vfio: Do not open code the group list search in vfio_create_group() Jason Gunthorpe
@ 2021-10-01 23:22 ` Jason Gunthorpe
  2021-10-04 22:25   ` Alex Williamson
  2021-10-01 23:22 ` [PATCH 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group Jason Gunthorpe
  2021-10-01 23:22 ` [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create() Jason Gunthorpe
  4 siblings, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2021-10-01 23:22 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, kvm
  Cc: Christoph Hellwig, Tian, Kevin, Liu Yi L

If vfio_create_group() searches the group list and returns an already
existing group it does not put back the iommu_group reference that the
caller passed in.

Change the semantics of vfio_create_group() so it does not move the
reference in from the caller, but instead obtains a new reference
internally and leaves the caller's reference alone. The two callers must
now call iommu_group_put().

This is an unlikely race as the only caller that could hit it has already
searched the group list before attempting to create the group.
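The new convention can be illustrated with a plain reference counter.
This is a userspace sketch, not kernel API; attach()/detach() are
hypothetical stand-ins for vfio_create_group() and group teardown:

```c
#include <assert.h>

/* Sketch of the reference rule this patch adopts: the callee takes its
 * own reference and never consumes the caller's, so the caller always
 * pairs its get with its own put. */
struct obj { int refs; };

static void obj_get(struct obj *o) { o->refs++; }
static void obj_put(struct obj *o) { o->refs--; }

/* "create": keeps the object alive via its own internal reference. */
static void attach(struct obj *o)
{
	obj_get(o);   /* internal reference, dropped by detach() */
}

static void detach(struct obj *o)
{
	obj_put(o);
}
```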

Fixes: cba3345cc494 ("vfio: VFIO core")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 1cb12033b02240..bf233943dc992f 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -338,6 +338,7 @@ static void vfio_group_unlock_and_free(struct vfio_group *group)
 		list_del(&unbound->unbound_next);
 		kfree(unbound);
 	}
+	iommu_group_put(group->iommu_group);
 	kfree(group);
 }
 
@@ -389,6 +390,8 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
 	atomic_set(&group->opened, 0);
 	init_waitqueue_head(&group->container_q);
 	group->iommu_group = iommu_group;
+	/* put in vfio_group_unlock_and_free() */
+	iommu_group_ref_get(iommu_group);
 	group->type = type;
 	BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
 
@@ -396,8 +399,8 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
 
 	ret = iommu_group_register_notifier(iommu_group, &group->nb);
 	if (ret) {
-		kfree(group);
-		return ERR_PTR(ret);
+		group = ERR_PTR(ret);
+		goto err_put_group;
 	}
 
 	mutex_lock(&vfio.group_lock);
@@ -432,6 +435,9 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
 
 	mutex_unlock(&vfio.group_lock);
 
+err_put_group:
+	iommu_group_put(iommu_group);
+	kfree(group);
 	return group;
 }
 
@@ -439,7 +445,6 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
 static void vfio_group_release(struct kref *kref)
 {
 	struct vfio_group *group = container_of(kref, struct vfio_group, kref);
-	struct iommu_group *iommu_group = group->iommu_group;
 
 	WARN_ON(!list_empty(&group->device_list));
 	WARN_ON(atomic_read(&group->container_users));
@@ -449,7 +454,6 @@ static void vfio_group_release(struct kref *kref)
 	list_del(&group->vfio_next);
 	vfio_free_group_minor(group->minor);
 	vfio_group_unlock_and_free(group);
-	iommu_group_put(iommu_group);
 }
 
 static void vfio_group_put(struct vfio_group *group)
@@ -734,7 +738,7 @@ static struct vfio_group *vfio_noiommu_group_alloc(struct device *dev,
 		ret = PTR_ERR(group);
 		goto out_remove_device;
 	}
-
+	iommu_group_put(iommu_group);
 	return group;
 
 out_remove_device:
@@ -776,10 +780,6 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
 
 	/* a newly created vfio_group keeps the reference. */
 	group = vfio_create_group(iommu_group, VFIO_IOMMU);
-	if (IS_ERR(group))
-		goto out_put;
-	return group;
-
 out_put:
 	iommu_group_put(iommu_group);
 	return group;
-- 
2.33.0



* [PATCH 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group
  2021-10-01 23:22 [PATCH 0/5] Update vfio_group to use the modern cdev lifecycle Jason Gunthorpe
                   ` (2 preceding siblings ...)
  2021-10-01 23:22 ` [PATCH 3/5] vfio: Don't leak a group reference if the group already exists Jason Gunthorpe
@ 2021-10-01 23:22 ` Jason Gunthorpe
  2021-10-04 22:25   ` Alex Williamson
                     ` (2 more replies)
  2021-10-01 23:22 ` [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create() Jason Gunthorpe
  4 siblings, 3 replies; 29+ messages in thread
From: Jason Gunthorpe @ 2021-10-01 23:22 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, kvm
  Cc: Christoph Hellwig, Tian, Kevin, Liu Yi L

The next patch adds a struct device to the struct vfio_group, and it is
confusing/bad practice to have two krefs in the same struct. This kref
controls the period during which the vfio_group is registered in sysfs
and visible in the internal lookup. Switch it to a refcount_t instead.

The refcount_dec_and_mutex_lock() is still required because we need
atomicity of the list searches and sysfs presence.
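The contract being relied on can be sketched in userspace with C11
atomics and a pthread mutex. This is an illustration of the semantics,
not the kernel's refcount_t implementation: a non-final reference is
dropped without touching the lock, while the final drop returns true
with the mutex held, so the list removal happens atomically with the
count reaching zero:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Sketch of refcount_dec_and_mutex_lock() semantics: drop one
 * reference; iff it was the last one, return true with the mutex held
 * so the caller can tear down under the lock. */
static bool dec_and_mutex_lock(atomic_int *refs, pthread_mutex_t *lock)
{
	int old = atomic_load(refs);

	/* Fast path: not the last reference, decrement and return. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(refs, &old, old - 1))
			return false;
	}

	/* Possibly the last reference: take the lock, then re-check,
	 * since a concurrent get may have raised the count again. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(refs, 1) != 1) {
		pthread_mutex_unlock(lock);
		return false;
	}
	return true;   /* count hit zero; mutex is held */
}
```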

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index bf233943dc992f..dbe7edd88ce35c 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -69,7 +69,7 @@ struct vfio_unbound_dev {
 };
 
 struct vfio_group {
-	struct kref			kref;
+	refcount_t users;
 	int				minor;
 	atomic_t			container_users;
 	struct iommu_group		*iommu_group;
@@ -381,7 +381,7 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
 	if (!group)
 		return ERR_PTR(-ENOMEM);
 
-	kref_init(&group->kref);
+	refcount_set(&group->users, 1);
 	INIT_LIST_HEAD(&group->device_list);
 	mutex_init(&group->device_lock);
 	INIT_LIST_HEAD(&group->unbound_list);
@@ -441,10 +441,10 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
 	return group;
 }
 
-/* called with vfio.group_lock held */
-static void vfio_group_release(struct kref *kref)
+static void vfio_group_put(struct vfio_group *group)
 {
-	struct vfio_group *group = container_of(kref, struct vfio_group, kref);
+	if (!refcount_dec_and_mutex_lock(&group->users, &vfio.group_lock))
+		return;
 
 	WARN_ON(!list_empty(&group->device_list));
 	WARN_ON(atomic_read(&group->container_users));
@@ -456,15 +456,9 @@ static void vfio_group_release(struct kref *kref)
 	vfio_group_unlock_and_free(group);
 }
 
-static void vfio_group_put(struct vfio_group *group)
-{
-	kref_put_mutex(&group->kref, vfio_group_release, &vfio.group_lock);
-}
-
-/* Assume group_lock or group reference is held */
 static void vfio_group_get(struct vfio_group *group)
 {
-	kref_get(&group->kref);
+	refcount_inc(&group->users);
 }
 
 static struct vfio_group *vfio_group_get_from_minor(int minor)
@@ -1662,6 +1656,7 @@ struct vfio_group *vfio_group_get_external_user(struct file *filep)
 	if (ret)
 		return ERR_PTR(ret);
 
+	/* Since the caller holds the fget on the file users must be >= 1 */
 	vfio_group_get(group);
 
 	return group;
-- 
2.33.0



* [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create()
  2021-10-01 23:22 [PATCH 0/5] Update vfio_group to use the modern cdev lifecycle Jason Gunthorpe
                   ` (3 preceding siblings ...)
  2021-10-01 23:22 ` [PATCH 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group Jason Gunthorpe
@ 2021-10-01 23:22 ` Jason Gunthorpe
  2021-10-12  8:33   ` Tian, Kevin
  2021-10-12  8:57   ` Liu, Yi L
  4 siblings, 2 replies; 29+ messages in thread
From: Jason Gunthorpe @ 2021-10-01 23:22 UTC (permalink / raw)
  To: Alex Williamson, Cornelia Huck, kvm
  Cc: Christoph Hellwig, Tian, Kevin, Liu Yi L

Modernize how vfio is creating the group char dev and sysfs presence.

These days drivers with state should use cdev_device_add() and
cdev_device_del() to manage the cdev and sysfs lifetime.

This API requires the driver to put the struct device and struct cdev
inside its state struct (vfio_group), and then use the usual
device_initialize()/cdev_device_add()/cdev_device_del() sequence.

Split the code to make this possible:

 - vfio_group_alloc()/vfio_group_release() are paired functions to
   alloc/free the vfio_group. Release is done under the struct device
   kref.

 - vfio_create_group()/vfio_group_put() are pairs that manage the
   sysfs/cdev lifetime. Once the users count reaches zero the vfio
   group's userspace presence is destroyed.

 - The IDR is replaced with an IDA. container_of(inode->i_cdev)
   is used to get back to the vfio_group during fops open. The IDA
   assigns unique minor numbers.
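The try-get that replaces the IDR lookup in fops open can be sketched
with C11 atomics in userspace (an illustration of
refcount_inc_not_zero() semantics, not the kernel implementation). A
reference is only granted while the count is still nonzero; once the
last user is gone, a racing open fails:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Sketch of the try-get: atomically increment 'users' only if it has
 * not already dropped to zero. Returns true when a reference was
 * obtained, false when the object is already dying. */
static bool try_get(atomic_int *users)
{
	int old = atomic_load(users);

	while (old != 0) {
		if (atomic_compare_exchange_weak(users, &old, old + 1))
			return true;    /* reference obtained */
	}
	return false;                   /* open would return -ENODEV */
}
```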

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/vfio.c | 200 +++++++++++++++++++++-----------------------
 1 file changed, 94 insertions(+), 106 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index dbe7edd88ce35c..01e04947250f40 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -43,9 +43,8 @@ static struct vfio {
 	struct list_head		iommu_drivers_list;
 	struct mutex			iommu_drivers_lock;
 	struct list_head		group_list;
-	struct idr			group_idr;
-	struct mutex			group_lock;
-	struct cdev			group_cdev;
+	struct mutex			group_lock; /* locks group_list */
+	struct ida			group_ida;
 	dev_t				group_devt;
 } vfio;
 
@@ -69,14 +68,14 @@ struct vfio_unbound_dev {
 };
 
 struct vfio_group {
+	struct device dev;
+	struct cdev cdev;
 	refcount_t users;
-	int				minor;
 	atomic_t			container_users;
 	struct iommu_group		*iommu_group;
 	struct vfio_container		*container;
 	struct list_head		device_list;
 	struct mutex			device_lock;
-	struct device			*dev;
 	struct notifier_block		nb;
 	struct list_head		vfio_next;
 	struct list_head		container_next;
@@ -98,6 +97,7 @@ MODULE_PARM_DESC(enable_unsafe_noiommu_mode, "Enable UNSAFE, no-IOMMU mode.  Thi
 #endif
 
 static DEFINE_XARRAY(vfio_device_set_xa);
+static const struct file_operations vfio_group_fops;
 
 int vfio_assign_device_set(struct vfio_device *device, void *set_id)
 {
@@ -281,19 +281,6 @@ void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops *ops)
 }
 EXPORT_SYMBOL_GPL(vfio_unregister_iommu_driver);
 
-/**
- * Group minor allocation/free - both called with vfio.group_lock held
- */
-static int vfio_alloc_group_minor(struct vfio_group *group)
-{
-	return idr_alloc(&vfio.group_idr, group, 0, MINORMASK + 1, GFP_KERNEL);
-}
-
-static void vfio_free_group_minor(int minor)
-{
-	idr_remove(&vfio.group_idr, minor);
-}
-
 static int vfio_iommu_group_notifier(struct notifier_block *nb,
 				     unsigned long action, void *data);
 static void vfio_group_get(struct vfio_group *group);
@@ -322,26 +309,6 @@ static void vfio_container_put(struct vfio_container *container)
 	kref_put(&container->kref, vfio_container_release);
 }
 
-static void vfio_group_unlock_and_free(struct vfio_group *group)
-{
-	struct vfio_unbound_dev *unbound, *tmp;
-
-	mutex_unlock(&vfio.group_lock);
-	/*
-	 * Unregister outside of lock.  A spurious callback is harmless now
-	 * that the group is no longer in vfio.group_list.
-	 */
-	iommu_group_unregister_notifier(group->iommu_group, &group->nb);
-
-	list_for_each_entry_safe(unbound, tmp,
-				 &group->unbound_list, unbound_next) {
-		list_del(&unbound->unbound_next);
-		kfree(unbound);
-	}
-	iommu_group_put(group->iommu_group);
-	kfree(group);
-}
-
 /**
  * Group objects - create, release, get, put, search
  */
@@ -370,75 +337,112 @@ vfio_group_get_from_iommu(struct iommu_group *iommu_group)
 	return group;
 }
 
-static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
-		enum vfio_group_type type)
+static void vfio_group_release(struct device *dev)
 {
-	struct vfio_group *group, *existing_group;
-	struct device *dev;
-	int ret, minor;
+	struct vfio_group *group = container_of(dev, struct vfio_group, dev);
+	struct vfio_unbound_dev *unbound, *tmp;
+
+	list_for_each_entry_safe(unbound, tmp,
+				 &group->unbound_list, unbound_next) {
+		list_del(&unbound->unbound_next);
+		kfree(unbound);
+	}
+
+	mutex_destroy(&group->device_lock);
+	mutex_destroy(&group->unbound_lock);
+	iommu_group_put(group->iommu_group);
+	ida_free(&vfio.group_ida, MINOR(group->dev.devt));
+	kfree(group);
+}
+
+static struct vfio_group *vfio_group_alloc(struct iommu_group *iommu_group,
+					   enum vfio_group_type type)
+{
+	struct vfio_group *group;
+	int minor;
 
 	group = kzalloc(sizeof(*group), GFP_KERNEL);
 	if (!group)
 		return ERR_PTR(-ENOMEM);
 
+	minor = ida_alloc_max(&vfio.group_ida, MINORMASK, GFP_KERNEL);
+	if (minor < 0) {
+		kfree(group);
+		return ERR_PTR(minor);
+	}
+
+	device_initialize(&group->dev);
+	group->dev.devt = MKDEV(MAJOR(vfio.group_devt), minor);
+	group->dev.class = vfio.class;
+	group->dev.release = vfio_group_release;
+	cdev_init(&group->cdev, &vfio_group_fops);
+	group->cdev.owner = THIS_MODULE;
+
 	refcount_set(&group->users, 1);
 	INIT_LIST_HEAD(&group->device_list);
 	mutex_init(&group->device_lock);
 	INIT_LIST_HEAD(&group->unbound_list);
 	mutex_init(&group->unbound_lock);
-	atomic_set(&group->container_users, 0);
-	atomic_set(&group->opened, 0);
 	init_waitqueue_head(&group->container_q);
 	group->iommu_group = iommu_group;
-	/* put in vfio_group_unlock_and_free() */
+	/* put in vfio_group_release() */
 	iommu_group_ref_get(iommu_group);
 	group->type = type;
 	BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
 
-	group->nb.notifier_call = vfio_iommu_group_notifier;
+	return group;
+}
 
-	ret = iommu_group_register_notifier(iommu_group, &group->nb);
-	if (ret) {
-		group = ERR_PTR(ret);
-		goto err_put_group;
+static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
+		enum vfio_group_type type)
+{
+	struct vfio_group *group;
+	struct vfio_group *ret;
+	int err;
+
+	group = vfio_group_alloc(iommu_group, type);
+	if (IS_ERR(group))
+		return group;
+
+	err = dev_set_name(&group->dev, "%s%d",
+			   group->type == VFIO_NO_IOMMU ? "noiommu-" : "",
+			   iommu_group_id(iommu_group));
+	if (err) {
+		ret = ERR_PTR(err);
+		goto err_put;
+	}
+
+	group->nb.notifier_call = vfio_iommu_group_notifier;
+	err = iommu_group_register_notifier(iommu_group, &group->nb);
+	if (err) {
+		ret = ERR_PTR(err);
+		goto err_put;
 	}
 
 	mutex_lock(&vfio.group_lock);
 
 	/* Did we race creating this group? */
-	existing_group = __vfio_group_get_from_iommu(iommu_group);
-	if (existing_group) {
-		vfio_group_unlock_and_free(group);
-		return existing_group;
-	}
+	ret = __vfio_group_get_from_iommu(iommu_group);
+	if (ret)
+		goto err_unlock;
 
-	minor = vfio_alloc_group_minor(group);
-	if (minor < 0) {
-		vfio_group_unlock_and_free(group);
-		return ERR_PTR(minor);
+	err = cdev_device_add(&group->cdev, &group->dev);
+	if (err) {
+		ret = ERR_PTR(err);
+		goto err_unlock;
 	}
 
-	dev = device_create(vfio.class, NULL,
-			    MKDEV(MAJOR(vfio.group_devt), minor), group, "%s%d",
-			    group->type == VFIO_NO_IOMMU ? "noiommu-" : "",
-			    iommu_group_id(iommu_group));
-	if (IS_ERR(dev)) {
-		vfio_free_group_minor(minor);
-		vfio_group_unlock_and_free(group);
-		return ERR_CAST(dev);
-	}
-
-	group->minor = minor;
-	group->dev = dev;
-
 	list_add(&group->vfio_next, &vfio.group_list);
 
 	mutex_unlock(&vfio.group_lock);
-
-err_put_group:
-	iommu_group_put(iommu_group);
-	kfree(group);
 	return group;
+
+err_unlock:
+	mutex_unlock(&vfio.group_lock);
+	iommu_group_unregister_notifier(group->iommu_group, &group->nb);
+err_put:
+	put_device(&group->dev);
+	return ret;
 }
 
 static void vfio_group_put(struct vfio_group *group)
@@ -450,10 +454,12 @@ static void vfio_group_put(struct vfio_group *group)
 	WARN_ON(atomic_read(&group->container_users));
 	WARN_ON(group->notifier.head);
 
-	device_destroy(vfio.class, MKDEV(MAJOR(vfio.group_devt), group->minor));
 	list_del(&group->vfio_next);
-	vfio_free_group_minor(group->minor);
-	vfio_group_unlock_and_free(group);
+	cdev_device_del(&group->cdev, &group->dev);
+	mutex_unlock(&vfio.group_lock);
+
+	iommu_group_unregister_notifier(group->iommu_group, &group->nb);
+	put_device(&group->dev);
 }
 
 static void vfio_group_get(struct vfio_group *group)
@@ -461,20 +467,10 @@ static void vfio_group_get(struct vfio_group *group)
 	refcount_inc(&group->users);
 }
 
-static struct vfio_group *vfio_group_get_from_minor(int minor)
+/* returns true if the get was obtained */
+static bool vfio_group_try_get(struct vfio_group *group)
 {
-	struct vfio_group *group;
-
-	mutex_lock(&vfio.group_lock);
-	group = idr_find(&vfio.group_idr, minor);
-	if (!group) {
-		mutex_unlock(&vfio.group_lock);
-		return NULL;
-	}
-	vfio_group_get(group);
-	mutex_unlock(&vfio.group_lock);
-
-	return group;
+	return refcount_inc_not_zero(&group->users);
 }
 
 static struct vfio_group *vfio_group_get_from_dev(struct device *dev)
@@ -1484,11 +1480,11 @@ static long vfio_group_fops_unl_ioctl(struct file *filep,
 
 static int vfio_group_fops_open(struct inode *inode, struct file *filep)
 {
-	struct vfio_group *group;
+	struct vfio_group *group =
+		container_of(inode->i_cdev, struct vfio_group, cdev);
 	int opened;
 
-	group = vfio_group_get_from_minor(iminor(inode));
-	if (!group)
+	if (!vfio_group_try_get(group))
 		return -ENODEV;
 
 	if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO)) {
@@ -2296,7 +2292,7 @@ static int __init vfio_init(void)
 {
 	int ret;
 
-	idr_init(&vfio.group_idr);
+	ida_init(&vfio.group_ida);
 	mutex_init(&vfio.group_lock);
 	mutex_init(&vfio.iommu_drivers_lock);
 	INIT_LIST_HEAD(&vfio.group_list);
@@ -2321,11 +2317,6 @@ static int __init vfio_init(void)
 	if (ret)
 		goto err_alloc_chrdev;
 
-	cdev_init(&vfio.group_cdev, &vfio_group_fops);
-	ret = cdev_add(&vfio.group_cdev, vfio.group_devt, MINORMASK + 1);
-	if (ret)
-		goto err_cdev_add;
-
 	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
 
 #ifdef CONFIG_VFIO_NOIOMMU
@@ -2333,8 +2324,6 @@ static int __init vfio_init(void)
 #endif
 	return 0;
 
-err_cdev_add:
-	unregister_chrdev_region(vfio.group_devt, MINORMASK + 1);
 err_alloc_chrdev:
 	class_destroy(vfio.class);
 	vfio.class = NULL;
@@ -2350,8 +2339,7 @@ static void __exit vfio_cleanup(void)
 #ifdef CONFIG_VFIO_NOIOMMU
 	vfio_unregister_iommu_driver(&vfio_noiommu_ops);
 #endif
-	idr_destroy(&vfio.group_idr);
-	cdev_del(&vfio.group_cdev);
+	ida_destroy(&vfio.group_ida);
 	unregister_chrdev_region(vfio.group_devt, MINORMASK + 1);
 	class_destroy(vfio.class);
 	vfio.class = NULL;
-- 
2.33.0



* Re: [PATCH 2/5] vfio: Do not open code the group list search in vfio_create_group()
  2021-10-01 23:22 ` [PATCH 2/5] vfio: Do not open code the group list search in vfio_create_group() Jason Gunthorpe
@ 2021-10-02  3:19   ` kernel test robot
  2021-10-12  6:37   ` Tian, Kevin
  2021-10-12  8:52   ` Liu, Yi L
  2 siblings, 0 replies; 29+ messages in thread
From: kernel test robot @ 2021-10-02  3:19 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 5423 bytes --]

Hi Jason,

I love your patch! Perhaps something to improve:

[auto build test WARNING on d9a0cd510c3383b61db6f70a84e0c3487f836a63]

url:    https://github.com/0day-ci/linux/commits/Jason-Gunthorpe/Update-vfio_group-to-use-the-modern-cdev-lifecycle/20211002-072315
base:   d9a0cd510c3383b61db6f70a84e0c3487f836a63
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/863433fdbdd6ce5de9dcd683278f3bbfa34b1d0c
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jason-Gunthorpe/Update-vfio_group-to-use-the-modern-cdev-lifecycle/20211002-072315
        git checkout 863433fdbdd6ce5de9dcd683278f3bbfa34b1d0c
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=powerpc SHELL=/bin/bash drivers/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   drivers/vfio/vfio.c:236: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * IOMMU driver registration
   drivers/vfio/vfio.c:288: warning: Function parameter or member 'group' not described in 'vfio_alloc_group_minor'
   drivers/vfio/vfio.c:288: warning: expecting prototype for Group minor allocation/free(). Prototype was for vfio_alloc_group_minor() instead
   drivers/vfio/vfio.c:308: warning: Function parameter or member 'container' not described in 'vfio_container_get'
   drivers/vfio/vfio.c:308: warning: expecting prototype for Container objects(). Prototype was for vfio_container_get() instead
   drivers/vfio/vfio.c:349: warning: Function parameter or member 'iommu_group' not described in '__vfio_group_get_from_iommu'
>> drivers/vfio/vfio.c:349: warning: expecting prototype for Group objects(). Prototype was for __vfio_group_get_from_iommu() instead
   drivers/vfio/vfio.c:502: warning: Function parameter or member 'device' not described in 'vfio_device_put'
   drivers/vfio/vfio.c:502: warning: expecting prototype for Device objects(). Prototype was for vfio_device_put() instead
   drivers/vfio/vfio.c:605: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * Async device support
   drivers/vfio/vfio.c:699: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * VFIO driver API
   drivers/vfio/vfio.c:848: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * Get a reference to the vfio_device for a device.  Even if the
   drivers/vfio/vfio.c:982: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * VFIO base fd, /dev/vfio/vfio
   drivers/vfio/vfio.c:1200: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * VFIO Group fd, /dev/vfio/$GROUP
   drivers/vfio/vfio.c:1552: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * VFIO Device fd
   drivers/vfio/vfio.c:1627: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * External user API, exported by symbols to be linked dynamically.
   drivers/vfio/vfio.c:1672: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * External user API, exported by symbols to be linked dynamically.
   drivers/vfio/vfio.c:1751: warning: Function parameter or member 'caps' not described in 'vfio_info_cap_add'
   drivers/vfio/vfio.c:1751: warning: Function parameter or member 'size' not described in 'vfio_info_cap_add'
   drivers/vfio/vfio.c:1751: warning: Function parameter or member 'id' not described in 'vfio_info_cap_add'
   drivers/vfio/vfio.c:1751: warning: Function parameter or member 'version' not described in 'vfio_info_cap_add'
   drivers/vfio/vfio.c:1751: warning: expecting prototype for Sub(). Prototype was for vfio_info_cap_add() instead
   drivers/vfio/vfio.c:2285: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * Module/class support


vim +349 drivers/vfio/vfio.c

   343	
   344	/**
   345	 * Group objects - create, release, get, put, search
   346	 */
   347	static struct vfio_group *
   348	__vfio_group_get_from_iommu(struct iommu_group *iommu_group)
 > 349	{
   350		struct vfio_group *group;
   351	
   352		list_for_each_entry(group, &vfio.group_list, vfio_next) {
   353			if (group->iommu_group == iommu_group) {
   354				vfio_group_get(group);
   355				return group;
   356			}
   357		}
   358		return NULL;
   359	}
   360	

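[Editorial note: the warning flagged at line 349 comes from scripts/kernel-doc: a `/**` opener marks the comment as kernel-doc, so a plain section banner must open with `/*` instead. Illustrative comment forms only, not the actual fix submitted:]

```c
/*
 * Group objects - create, release, get, put, search
 *
 * Plain banner: a single-star opener, ignored by scripts/kernel-doc.
 */

/**
 * vfio_alloc_group_minor() - allocate a minor for a vfio_group
 * @group: the group needing a minor number
 *
 * Kernel-doc: the double-star opener must be followed by the name of
 * the function it documents plus a line for each parameter, or the
 * W=1 build emits the warnings shown above.
 */
```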
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org


[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 74076 bytes --]


* Re: [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier()
  2021-10-01 23:22 ` [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier() Jason Gunthorpe
@ 2021-10-04 22:25   ` Alex Williamson
  2021-10-04 22:34     ` Jason Gunthorpe
  2021-10-12  6:32   ` Tian, Kevin
  2021-10-12  8:51   ` Liu, Yi L
  2 siblings, 1 reply; 29+ messages in thread
From: Alex Williamson @ 2021-10-04 22:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin, Liu Yi L

On Fri,  1 Oct 2021 20:22:20 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> iommu_group_register_notifier()/iommu_group_unregister_notifier() are
> built using a blocking_notifier_chain which integrates a rwsem. The
> notifier function cannot be running outside its registration.
> 
> When considering how the notifier function interacts with create/destroy
> of the group there are two fringe cases, the notifier starts before
> list_add(&vfio.group_list) and the notifier runs after the kref
> becomes 0.
> 
> Prior to vfio_create_group() unlocking and returning we have
>    container_users == 0
>    device_list == empty
> And this cannot change until the mutex is unlocked.
> 
> After the kref goes to zero we must also have
>    container_users == 0
>    device_list == empty
> 
> Both are required because they are balanced operations and a 0 kref means
> some caller became unbalanced. Add the missing assertion that
> container_users must be zero as well.
> 
> These two facts are important because when checking each operation we see:
> 
> - IOMMU_GROUP_NOTIFY_ADD_DEVICE
>    Empty device_list avoids the WARN_ON in vfio_group_nb_add_dev()
>    0 container_users ends the call
> - IOMMU_GROUP_NOTIFY_BOUND_DRIVER
>    0 container_users ends the call
> 
> Finally, we have IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER, which only deletes
> items from the unbound list. During creation this list is empty, during
> kref == 0 nothing can read this list, and it will be freed soon.
> 
> Since the vfio_group_release() doesn't hold the appropriate lock to
> manipulate the unbound_list and could race with the notifier, move the
> cleanup to directly before the kfree.
> 
> This allows deleting all of the deferred group put code.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/vfio/vfio.c | 89 +++++----------------------------------------
>  1 file changed, 9 insertions(+), 80 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 08b27b64f0f935..32a53cb3598524 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -324,12 +324,20 @@ static void vfio_container_put(struct vfio_container *container)
>  
>  static void vfio_group_unlock_and_free(struct vfio_group *group)
>  {
> +	struct vfio_unbound_dev *unbound, *tmp;
> +
>  	mutex_unlock(&vfio.group_lock);
>  	/*
>  	 * Unregister outside of lock.  A spurious callback is harmless now
>  	 * that the group is no longer in vfio.group_list.
>  	 */

This comment is indirectly referencing the vfio_group_try_get() in the
notifier callback, but as you describe in the commit log, it's actually
the container_users value that prevents this from racing group release
now.  Otherwise, tricky but looks good.  Thanks,

Alex


>  	iommu_group_unregister_notifier(group->iommu_group, &group->nb);
> +
> +	list_for_each_entry_safe(unbound, tmp,
> +				 &group->unbound_list, unbound_next) {
> +		list_del(&unbound->unbound_next);
> +		kfree(unbound);
> +	}
>  	kfree(group);
>  }
>  
> @@ -361,13 +369,6 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
>  
>  	group->nb.notifier_call = vfio_iommu_group_notifier;
>  
> -	/*
> -	 * blocking notifiers acquire a rwsem around registering and hold
> -	 * it around callback.  Therefore, need to register outside of
> -	 * vfio.group_lock to avoid A-B/B-A contention.  Our callback won't
> -	 * do anything unless it can find the group in vfio.group_list, so
> -	 * no harm in registering early.
> -	 */
>  	ret = iommu_group_register_notifier(iommu_group, &group->nb);
>  	if (ret) {
>  		kfree(group);
> @@ -415,18 +416,12 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
>  static void vfio_group_release(struct kref *kref)
>  {
>  	struct vfio_group *group = container_of(kref, struct vfio_group, kref);
> -	struct vfio_unbound_dev *unbound, *tmp;
>  	struct iommu_group *iommu_group = group->iommu_group;
>  
>  	WARN_ON(!list_empty(&group->device_list));
> +	WARN_ON(atomic_read(&group->container_users));
>  	WARN_ON(group->notifier.head);
>  
> -	list_for_each_entry_safe(unbound, tmp,
> -				 &group->unbound_list, unbound_next) {
> -		list_del(&unbound->unbound_next);
> -		kfree(unbound);
> -	}
> -
>  	device_destroy(vfio.class, MKDEV(MAJOR(vfio.group_devt), group->minor));
>  	list_del(&group->vfio_next);
>  	vfio_free_group_minor(group->minor);
> @@ -439,61 +434,12 @@ static void vfio_group_put(struct vfio_group *group)
>  	kref_put_mutex(&group->kref, vfio_group_release, &vfio.group_lock);
>  }
>  
> -struct vfio_group_put_work {
> -	struct work_struct work;
> -	struct vfio_group *group;
> -};
> -
> -static void vfio_group_put_bg(struct work_struct *work)
> -{
> -	struct vfio_group_put_work *do_work;
> -
> -	do_work = container_of(work, struct vfio_group_put_work, work);
> -
> -	vfio_group_put(do_work->group);
> -	kfree(do_work);
> -}
> -
> -static void vfio_group_schedule_put(struct vfio_group *group)
> -{
> -	struct vfio_group_put_work *do_work;
> -
> -	do_work = kmalloc(sizeof(*do_work), GFP_KERNEL);
> -	if (WARN_ON(!do_work))
> -		return;
> -
> -	INIT_WORK(&do_work->work, vfio_group_put_bg);
> -	do_work->group = group;
> -	schedule_work(&do_work->work);
> -}
> -
>  /* Assume group_lock or group reference is held */
>  static void vfio_group_get(struct vfio_group *group)
>  {
>  	kref_get(&group->kref);
>  }
>  
> -/*
> - * Not really a try as we will sleep for mutex, but we need to make
> - * sure the group pointer is valid under lock and get a reference.
> - */
> -static struct vfio_group *vfio_group_try_get(struct vfio_group *group)
> -{
> -	struct vfio_group *target = group;
> -
> -	mutex_lock(&vfio.group_lock);
> -	list_for_each_entry(group, &vfio.group_list, vfio_next) {
> -		if (group == target) {
> -			vfio_group_get(group);
> -			mutex_unlock(&vfio.group_lock);
> -			return group;
> -		}
> -	}
> -	mutex_unlock(&vfio.group_lock);
> -
> -	return NULL;
> -}
> -
>  static
>  struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
>  {
> @@ -691,14 +637,6 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
>  	struct device *dev = data;
>  	struct vfio_unbound_dev *unbound;
>  
> -	/*
> -	 * Need to go through a group_lock lookup to get a reference or we
> -	 * risk racing a group being removed.  Ignore spurious notifies.
> -	 */
> -	group = vfio_group_try_get(group);
> -	if (!group)
> -		return NOTIFY_OK;
> -
>  	switch (action) {
>  	case IOMMU_GROUP_NOTIFY_ADD_DEVICE:
>  		vfio_group_nb_add_dev(group, dev);
> @@ -749,15 +687,6 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
>  		mutex_unlock(&group->unbound_lock);
>  		break;
>  	}
> -
> -	/*
> -	 * If we're the last reference to the group, the group will be
> -	 * released, which includes unregistering the iommu group notifier.
> -	 * We hold a read-lock on that notifier list, unregistering needs
> -	 * a write-lock... deadlock.  Release our reference asynchronously
> -	 * to avoid that situation.
> -	 */
> -	vfio_group_schedule_put(group);
>  	return NOTIFY_OK;
>  }
>  



* Re: [PATCH 3/5] vfio: Don't leak a group reference if the group already exists
  2021-10-01 23:22 ` [PATCH 3/5] vfio: Don't leak a group reference if the group already exists Jason Gunthorpe
@ 2021-10-04 22:25   ` Alex Williamson
  2021-10-04 22:36     ` Jason Gunthorpe
  0 siblings, 1 reply; 29+ messages in thread
From: Alex Williamson @ 2021-10-04 22:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin, Liu Yi L

On Fri,  1 Oct 2021 20:22:22 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> If vfio_create_group() searches the group list and returns an already
> existing group it does not put back the iommu_group reference that the
> caller passed in.
> 
> Change the semantic of vfio_create_group() to not move the reference in
> from the caller, but instead obtain a new reference inside and leave the
> caller's reference alone. The two callers must now call iommu_group_put().
> 
> This is an unlikely race as the only caller that could hit it has already
> searched the group list before attempting to create the group.
> 
> Fixes: cba3345cc494 ("vfio: VFIO core")
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/vfio/vfio.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 1cb12033b02240..bf233943dc992f 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -338,6 +338,7 @@ static void vfio_group_unlock_and_free(struct vfio_group *group)
>  		list_del(&unbound->unbound_next);
>  		kfree(unbound);
>  	}
> +	iommu_group_put(group->iommu_group);
>  	kfree(group);
>  }
>  
> @@ -389,6 +390,8 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
>  	atomic_set(&group->opened, 0);
>  	init_waitqueue_head(&group->container_q);
>  	group->iommu_group = iommu_group;
> +	/* put in vfio_group_unlock_and_free() */
> +	iommu_group_ref_get(iommu_group);
>  	group->type = type;
>  	BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
>  
> @@ -396,8 +399,8 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
>  
>  	ret = iommu_group_register_notifier(iommu_group, &group->nb);
>  	if (ret) {
> -		kfree(group);
> -		return ERR_PTR(ret);
> +		group = ERR_PTR(ret);
> +		goto err_put_group;
>  	}
>  
>  	mutex_lock(&vfio.group_lock);
> @@ -432,6 +435,9 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
>  
>  	mutex_unlock(&vfio.group_lock);
>  
> +err_put_group:
> +	iommu_group_put(iommu_group);
> +	kfree(group);

????

In the non-error path we're releasing the caller's reference which is
now their responsibility to release, but in any case we're freeing the
object that we return?  That can't be right.

>  	return group;
>  }
>  
> @@ -439,7 +445,6 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
>  static void vfio_group_release(struct kref *kref)
>  {
>  	struct vfio_group *group = container_of(kref, struct vfio_group, kref);
> -	struct iommu_group *iommu_group = group->iommu_group;
>  
>  	WARN_ON(!list_empty(&group->device_list));
>  	WARN_ON(atomic_read(&group->container_users));
> @@ -449,7 +454,6 @@ static void vfio_group_release(struct kref *kref)
>  	list_del(&group->vfio_next);
>  	vfio_free_group_minor(group->minor);
>  	vfio_group_unlock_and_free(group);
> -	iommu_group_put(iommu_group);
>  }
>  
>  static void vfio_group_put(struct vfio_group *group)
> @@ -734,7 +738,7 @@ static struct vfio_group *vfio_noiommu_group_alloc(struct device *dev,
>  		ret = PTR_ERR(group);
>  		goto out_remove_device;
>  	}
> -
> +	iommu_group_put(iommu_group);
>  	return group;
>  
>  out_remove_device:
> @@ -776,10 +780,6 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
>  
>  	/* a newly created vfio_group keeps the reference. */

This comment is now incorrect.  Thanks,

Alex


>  	group = vfio_create_group(iommu_group, VFIO_IOMMU);
> -	if (IS_ERR(group))
> -		goto out_put;
> -	return group;
> -
>  out_put:
>  	iommu_group_put(iommu_group);
>  	return group;



* Re: [PATCH 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group
  2021-10-01 23:22 ` [PATCH 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group Jason Gunthorpe
@ 2021-10-04 22:25   ` Alex Williamson
  2021-10-04 22:39     ` Jason Gunthorpe
  2021-10-12  7:08   ` Tian, Kevin
  2021-10-12  9:04   ` Liu, Yi L
  2 siblings, 1 reply; 29+ messages in thread
From: Alex Williamson @ 2021-10-04 22:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin, Liu Yi L

On Fri,  1 Oct 2021 20:22:23 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> The next patch adds a struct device to the struct vfio_group, and it is
> confusing/bad practice to have two krefs in the same struct. This kref is
> controlling the period when the vfio_group is registered in sysfs, and
> visible in the internal lookup. Switch it to a refcount_t instead.
> 
> The refcount_dec_and_mutex_lock() is still required because we need
> atomicity of the list searches and sysfs presence.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/vfio/vfio.c | 19 +++++++------------
>  1 file changed, 7 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index bf233943dc992f..dbe7edd88ce35c 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -69,7 +69,7 @@ struct vfio_unbound_dev {
>  };
>  
>  struct vfio_group {
> -	struct kref			kref;
> +	refcount_t users;

Follow indenting for existing structs please.  The next patch even
mixes following and changing formatting, so I'm not sure what rule is
being used here.  Thanks,

Alex

>  	int				minor;
>  	atomic_t			container_users;
>  	struct iommu_group		*iommu_group;
> @@ -381,7 +381,7 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
>  	if (!group)
>  		return ERR_PTR(-ENOMEM);
>  
> -	kref_init(&group->kref);
> +	refcount_set(&group->users, 1);
>  	INIT_LIST_HEAD(&group->device_list);
>  	mutex_init(&group->device_lock);
>  	INIT_LIST_HEAD(&group->unbound_list);
> @@ -441,10 +441,10 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
>  	return group;
>  }
>  
> -/* called with vfio.group_lock held */
> -static void vfio_group_release(struct kref *kref)
> +static void vfio_group_put(struct vfio_group *group)
>  {
> -	struct vfio_group *group = container_of(kref, struct vfio_group, kref);
> +	if (!refcount_dec_and_mutex_lock(&group->users, &vfio.group_lock))
> +		return;
>  
>  	WARN_ON(!list_empty(&group->device_list));
>  	WARN_ON(atomic_read(&group->container_users));
> @@ -456,15 +456,9 @@ static void vfio_group_release(struct kref *kref)
>  	vfio_group_unlock_and_free(group);
>  }
>  
> -static void vfio_group_put(struct vfio_group *group)
> -{
> -	kref_put_mutex(&group->kref, vfio_group_release, &vfio.group_lock);
> -}
> -
> -/* Assume group_lock or group reference is held */
>  static void vfio_group_get(struct vfio_group *group)
>  {
> -	kref_get(&group->kref);
> +	refcount_inc(&group->users);
>  }
>  
>  static struct vfio_group *vfio_group_get_from_minor(int minor)
> @@ -1662,6 +1656,7 @@ struct vfio_group *vfio_group_get_external_user(struct file *filep)
>  	if (ret)
>  		return ERR_PTR(ret);
>  
> +	/* Since the caller holds the fget on the file users must be >= 1 */
>  	vfio_group_get(group);
>  
>  	return group;



* Re: [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier()
  2021-10-04 22:25   ` Alex Williamson
@ 2021-10-04 22:34     ` Jason Gunthorpe
  2021-10-05  4:01       ` Alex Williamson
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2021-10-04 22:34 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin, Liu Yi L

On Mon, Oct 04, 2021 at 04:25:32PM -0600, Alex Williamson wrote:
> On Fri,  1 Oct 2021 20:22:20 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > iommu_group_register_notifier()/iommu_group_unregister_notifier() are
> > built using a blocking_notifier_chain which integrates a rwsem. The
> > notifier function cannot be running outside its registration.
> > 
> > When considering how the notifier function interacts with create/destroy
> > of the group there are two fringe cases, the notifier starts before
> > list_add(&vfio.group_list) and the notifier runs after the kref
> > becomes 0.
> > 
> > Prior to vfio_create_group() unlocking and returning we have
> >    container_users == 0
> >    device_list == empty
> > And this cannot change until the mutex is unlocked.
> > 
> > After the kref goes to zero we must also have
> >    container_users == 0
> >    device_list == empty
> > 
> > Both are required because they are balanced operations and a 0 kref means
> > some caller became unbalanced. Add the missing assertion that
> > container_users must be zero as well.
> > 
> > These two facts are important because when checking each operation we see:
> > 
> > - IOMMU_GROUP_NOTIFY_ADD_DEVICE
> >    Empty device_list avoids the WARN_ON in vfio_group_nb_add_dev()
> >    0 container_users ends the call
> > - IOMMU_GROUP_NOTIFY_BOUND_DRIVER
> >    0 container_users ends the call
> > 
> > Finally, we have IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER, which only deletes
> > items from the unbound list. During creation this list is empty, during
> > kref == 0 nothing can read this list, and it will be freed soon.
> > 
> > Since the vfio_group_release() doesn't hold the appropriate lock to
> > manipulate the unbound_list and could race with the notifier, move the
> > cleanup to directly before the kfree.
> > 
> > This allows deleting all of the deferred group put code.
> > 
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> >  drivers/vfio/vfio.c | 89 +++++----------------------------------------
> >  1 file changed, 9 insertions(+), 80 deletions(-)
> > 
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index 08b27b64f0f935..32a53cb3598524 100644
> > +++ b/drivers/vfio/vfio.c
> > @@ -324,12 +324,20 @@ static void vfio_container_put(struct vfio_container *container)
> >  
> >  static void vfio_group_unlock_and_free(struct vfio_group *group)
> >  {
> > +	struct vfio_unbound_dev *unbound, *tmp;
> > +
> >  	mutex_unlock(&vfio.group_lock);
> >  	/*
> >  	 * Unregister outside of lock.  A spurious callback is harmless now
> >  	 * that the group is no longer in vfio.group_list.
> >  	 */
> 
> This comment is indirectly referencing the vfio_group_try_get() in the
> notifier callback, but as you describe in the commit log, it's actually
> the container_users value that prevents this from racing group release
> now.  Otherwise, tricky but looks good.  Thanks,

Do you think the comment should be deleted in this commit? I think I
got it later on..

Jason


* Re: [PATCH 3/5] vfio: Don't leak a group reference if the group already exists
  2021-10-04 22:25   ` Alex Williamson
@ 2021-10-04 22:36     ` Jason Gunthorpe
  2021-10-05  4:01       ` Alex Williamson
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2021-10-04 22:36 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin, Liu Yi L

On Mon, Oct 04, 2021 at 04:25:43PM -0600, Alex Williamson wrote:
> On Fri,  1 Oct 2021 20:22:22 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > If vfio_create_group() searches the group list and returns an already
> > existing group it does not put back the iommu_group reference that the
> > caller passed in.
> > 
> > Change the semantic of vfio_create_group() to not move the reference in
> > from the caller, but instead obtain a new reference inside and leave the
> > caller's reference alone. The two callers must now call iommu_group_put().
> > 
> > This is an unlikely race as the only caller that could hit it has already
> > searched the group list before attempting to create the group.
> > 
> > Fixes: cba3345cc494 ("vfio: VFIO core")
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> >  drivers/vfio/vfio.c | 18 +++++++++---------
> >  1 file changed, 9 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index 1cb12033b02240..bf233943dc992f 100644
> > +++ b/drivers/vfio/vfio.c
> > @@ -338,6 +338,7 @@ static void vfio_group_unlock_and_free(struct vfio_group *group)
> >  		list_del(&unbound->unbound_next);
> >  		kfree(unbound);
> >  	}
> > +	iommu_group_put(group->iommu_group);
> >  	kfree(group);
> >  }
> >  
> > @@ -389,6 +390,8 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> >  	atomic_set(&group->opened, 0);
> >  	init_waitqueue_head(&group->container_q);
> >  	group->iommu_group = iommu_group;
> > +	/* put in vfio_group_unlock_and_free() */
> > +	iommu_group_ref_get(iommu_group);

      ^^^^^^^^^^^^^^^^^

> >  	group->type = type;
> >  	BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
> >  
> > @@ -396,8 +399,8 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> >  
> >  	ret = iommu_group_register_notifier(iommu_group, &group->nb);
> >  	if (ret) {
> > -		kfree(group);
> > -		return ERR_PTR(ret);
> > +		group = ERR_PTR(ret);
> > +		goto err_put_group;
> >  	}
> >  
> >  	mutex_lock(&vfio.group_lock);
> > @@ -432,6 +435,9 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> >  
> >  	mutex_unlock(&vfio.group_lock);
> >  
> > +err_put_group:
> > +	iommu_group_put(iommu_group);
> > +	kfree(group);
> 
> ????
> 
> In the non-error path we're releasing the caller's reference which is
> now their responsibility to release,

This release is paired with the get added one hunk above in the
same function.

> but in any case we're freeing the object that we return?  That
> can't be right.

Yes, that is a rebasing mistake pulling this back from the last patch
that had a "return ret" here, thanks

> > @@ -776,10 +780,6 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
> >  
> >  	/* a newly created vfio_group keeps the reference. */
> 
> This comment is now incorrect.  Thanks,

Indeed

Jason


* Re: [PATCH 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group
  2021-10-04 22:25   ` Alex Williamson
@ 2021-10-04 22:39     ` Jason Gunthorpe
  0 siblings, 0 replies; 29+ messages in thread
From: Jason Gunthorpe @ 2021-10-04 22:39 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin, Liu Yi L

On Mon, Oct 04, 2021 at 04:25:51PM -0600, Alex Williamson wrote:
> On Fri,  1 Oct 2021 20:22:23 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > The next patch adds a struct device to the struct vfio_group, and it is
> > confusing/bad practice to have two krefs in the same struct. This kref is
> > controlling the period when the vfio_group is registered in sysfs, and
> > visible in the internal lookup. Switch it to a refcount_t instead.
> > 
> > The refcount_dec_and_mutex_lock() is still required because we need
> > atomicity of the list searches and sysfs presence.
> > 
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> >  drivers/vfio/vfio.c | 19 +++++++------------
> >  1 file changed, 7 insertions(+), 12 deletions(-)
> > 
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index bf233943dc992f..dbe7edd88ce35c 100644
> > +++ b/drivers/vfio/vfio.c
> > @@ -69,7 +69,7 @@ struct vfio_unbound_dev {
> >  };
> >  
> >  struct vfio_group {
> > -	struct kref			kref;
> > +	refcount_t users;
> 
> Follow indenting for existing structs please.  The next patch even
> mixes following and changing formatting, so I'm not sure what rule is
> being used here.  Thanks,

Sure, I generally err toward nixing the vertical alignment. It
generally is a net harm to backporting and I don't care much for the
poor readability of 30 chars of whitespace between related
words. I'll look over the patches again and check.

Jason


* Re: [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier()
  2021-10-04 22:34     ` Jason Gunthorpe
@ 2021-10-05  4:01       ` Alex Williamson
  2021-10-05 16:17         ` Jason Gunthorpe
  0 siblings, 1 reply; 29+ messages in thread
From: Alex Williamson @ 2021-10-05  4:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin, Liu Yi L

On Mon, 4 Oct 2021 19:34:31 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Mon, Oct 04, 2021 at 04:25:32PM -0600, Alex Williamson wrote:
> > On Fri,  1 Oct 2021 20:22:20 -0300
> > Jason Gunthorpe <jgg@nvidia.com> wrote:
> >   
> > > iommu_group_register_notifier()/iommu_group_unregister_notifier() are
> > > built using a blocking_notifier_chain which integrates a rwsem. The
> > > notifier function cannot be running outside its registration.
> > > 
> > > When considering how the notifier function interacts with create/destroy
> > > of the group there are two fringe cases, the notifier starts before
> > > list_add(&vfio.group_list) and the notifier runs after the kref
> > > becomes 0.
> > > 
> > > Prior to vfio_create_group() unlocking and returning we have
> > >    container_users == 0
> > >    device_list == empty
> > > And this cannot change until the mutex is unlocked.
> > > 
> > > After the kref goes to zero we must also have
> > >    container_users == 0
> > >    device_list == empty
> > > 
> > > Both are required because they are balanced operations and a 0 kref means
> > > some caller became unbalanced. Add the missing assertion that
> > > container_users must be zero as well.
> > > 
> > > These two facts are important because when checking each operation we see:
> > > 
> > > - IOMMU_GROUP_NOTIFY_ADD_DEVICE
> > >    Empty device_list avoids the WARN_ON in vfio_group_nb_add_dev()
> > >    0 container_users ends the call
> > > - IOMMU_GROUP_NOTIFY_BOUND_DRIVER
> > >    0 container_users ends the call
> > > 
> > > Finally, we have IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER, which only deletes
> > > items from the unbound list. During creation this list is empty, during
> > > kref == 0 nothing can read this list, and it will be freed soon.
> > > 
> > > Since the vfio_group_release() doesn't hold the appropriate lock to
> > > manipulate the unbound_list and could race with the notifier, move the
> > > cleanup to directly before the kfree.
> > > 
> > > This allows deleting all of the deferred group put code.
> > > 
> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > >  drivers/vfio/vfio.c | 89 +++++----------------------------------------
> > >  1 file changed, 9 insertions(+), 80 deletions(-)
> > > 
> > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > > index 08b27b64f0f935..32a53cb3598524 100644
> > > +++ b/drivers/vfio/vfio.c
> > > @@ -324,12 +324,20 @@ static void vfio_container_put(struct vfio_container *container)
> > >  
> > >  static void vfio_group_unlock_and_free(struct vfio_group *group)
> > >  {
> > > +	struct vfio_unbound_dev *unbound, *tmp;
> > > +
> > >  	mutex_unlock(&vfio.group_lock);
> > >  	/*
> > >  	 * Unregister outside of lock.  A spurious callback is harmless now
> > >  	 * that the group is no longer in vfio.group_list.
> > >  	 */  
> > 
> > This comment is indirectly referencing the vfio_group_try_get() in the
> > notifier callback, but as you describe in the commit log, it's actually
> > the container_users value that prevents this from racing group release
> > now.  Otherwise, tricky but looks good.  Thanks,  
> 
> Do you think the comment should be deleted in this commit? I think I
> got it later on.

I think the commit log argument is that notifies racing the group
release are harmless so long as the container is unused, and releasing
a group with active container users would be unbalanced, which
justifies the WARN_ON added here.  This comment does get removed in the
final patch, but maybe that logic can be noted in a comment by the new
sanity test.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/5] vfio: Don't leak a group reference if the group already exists
  2021-10-04 22:36     ` Jason Gunthorpe
@ 2021-10-05  4:01       ` Alex Williamson
  2021-10-05 14:45         ` Jason Gunthorpe
  0 siblings, 1 reply; 29+ messages in thread
From: Alex Williamson @ 2021-10-05  4:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin, Liu Yi L

On Mon, 4 Oct 2021 19:36:41 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Mon, Oct 04, 2021 at 04:25:43PM -0600, Alex Williamson wrote:
> > On Fri,  1 Oct 2021 20:22:22 -0300
> > Jason Gunthorpe <jgg@nvidia.com> wrote:
> >   
> > > If vfio_create_group() searches the group list and returns an already
> > > existing group it does not put back the iommu_group reference that the
> > > caller passed in.
> > > 
> > > Change the semantic of vfio_create_group() to not move the reference in
> > > from the caller, but instead obtain a new reference inside and leave the
> > > caller's reference alone. The two callers must now call iommu_group_put().
> > > 
> > > This is an unlikely race as the only caller that could hit it has already
> > > searched the group list before attempting to create the group.
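The ownership rule being adopted here, where the callee takes its own reference instead of consuming the caller's, can be sketched with a toy refcount (the wrapper/key names are illustrative, not the vfio types):

```c
#include <assert.h>

/* Illustrative refcounted object: the wrapper takes its own
 * reference on the key object instead of consuming the caller's,
 * so the caller always drops the reference it obtained itself. */
struct key { int refs; };

static void key_get(struct key *k) { k->refs++; }
static void key_put(struct key *k) { k->refs--; }

struct wrapper { struct key *key; int refs; };

/* Hypothetical helper mirroring the reworked vfio_create_group():
 * the wrapper holds its own key reference for its whole lifetime. */
static void wrapper_init(struct wrapper *w, struct key *k)
{
	key_get(k);	/* put back in wrapper_destroy() */
	w->key = k;
	w->refs = 1;
}

static void wrapper_destroy(struct wrapper *w)
{
	key_put(w->key);
}
```

The caller's get/put now always balance locally, and the wrapper pins the key for its own lifetime, mirroring the iommu_group_ref_get()/iommu_group_put() pairing added in this patch.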
> > > 
> > > Fixes: cba3345cc494 ("vfio: VFIO core")
> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > >  drivers/vfio/vfio.c | 18 +++++++++---------
> > >  1 file changed, 9 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > > index 1cb12033b02240..bf233943dc992f 100644
> > > --- a/drivers/vfio/vfio.c
> > > +++ b/drivers/vfio/vfio.c
> > > @@ -338,6 +338,7 @@ static void vfio_group_unlock_and_free(struct vfio_group *group)
> > >  		list_del(&unbound->unbound_next);
> > >  		kfree(unbound);
> > >  	}
> > > +	iommu_group_put(group->iommu_group);
> > >  	kfree(group);
> > >  }
> > >  
> > > @@ -389,6 +390,8 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> > >  	atomic_set(&group->opened, 0);
> > >  	init_waitqueue_head(&group->container_q);
> > >  	group->iommu_group = iommu_group;
> > > +	/* put in vfio_group_unlock_and_free() */
> > > +	iommu_group_ref_get(iommu_group);  
> 
>       ^^^^^^^^^^^^^^^^^
> 
> > >  	group->type = type;
> > >  	BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
> > >  
> > > @@ -396,8 +399,8 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> > >  
> > >  	ret = iommu_group_register_notifier(iommu_group, &group->nb);
> > >  	if (ret) {
> > > -		kfree(group);
> > > -		return ERR_PTR(ret);
> > > +		group = ERR_PTR(ret);
> > > +		goto err_put_group;
> > >  	}
> > >  
> > >  	mutex_lock(&vfio.group_lock);
> > > @@ -432,6 +435,9 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> > >  
> > >  	mutex_unlock(&vfio.group_lock);
> > >  
> > > +err_put_group:
> > > +	iommu_group_put(iommu_group);
> > > +	kfree(group);  
> > 
> > ????
> > 
> > In the non-error path we're releasing the caller's reference which is
> > now their responsibility to release,  
> 
> This release is paired with the get in the same function, added one
> hunk above.

Note that this is the common exit path until the last patch in the
series pulls returning the successfully created/found group above the
error condition exit paths.  As it stands, this patch unconditionally
releases the reference it claims to newly create.  Thanks,

Alex
 
> > but in any case we're freeing the object that we return?  That
> > can't be right.  
> 
> Yes, that is a rebasing mistake pulling this back from the last patch
> that had a "return ret" here, thanks
> 
> > > @@ -776,10 +780,6 @@ static struct vfio_group *vfio_group_find_or_alloc(struct device *dev)
> > >  
> > >  	/* a newly created vfio_group keeps the reference. */  
> > 
> > This comment is now incorrect.  Thanks,  
> 
> Indeed
> 
> Jason
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 3/5] vfio: Don't leak a group reference if the group already exists
  2021-10-05  4:01       ` Alex Williamson
@ 2021-10-05 14:45         ` Jason Gunthorpe
  0 siblings, 0 replies; 29+ messages in thread
From: Jason Gunthorpe @ 2021-10-05 14:45 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin, Liu Yi L

On Mon, Oct 04, 2021 at 10:01:54PM -0600, Alex Williamson wrote:

> Note that this is the common exit path until the last patch in the
> series pulls returning the successfully created/found group above the
> error condition exit paths.  As it stands, this patch unconditionally
> releases the reference it claims to newly create.  Thanks,

Oh, I see, you are pointing at the missing return. Yes, I got that too
when I fixed the rest of it.

Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier()
  2021-10-05  4:01       ` Alex Williamson
@ 2021-10-05 16:17         ` Jason Gunthorpe
  0 siblings, 0 replies; 29+ messages in thread
From: Jason Gunthorpe @ 2021-10-05 16:17 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin, Liu Yi L

On Mon, Oct 04, 2021 at 10:01:52PM -0600, Alex Williamson wrote:

> I think the commit log argument is that notifies racing the group
> release are harmless so long as the container is unused, and releasing
> a group with active container users would be unbalanced, which
> justifies the WARN_ON added here.  

Yes

I changed it like this:

@@ -327,6 +327,10 @@ static void vfio_group_unlock_and_free(struct vfio_group *group)
        struct vfio_unbound_dev *unbound, *tmp;
 
        mutex_unlock(&vfio.group_lock);
+       /*
+        * Unregister outside of lock.  A spurious callback is harmless now
+        * that the group is no longer in vfio.group_list.
+        */
        iommu_group_unregister_notifier(group->iommu_group, &group->nb);
 
        list_for_each_entry_safe(unbound, tmp,
@@ -413,12 +418,6 @@ static void vfio_group_release(struct kref *kref)
        struct vfio_group *group = container_of(kref, struct vfio_group, kref);
        struct iommu_group *iommu_group = group->iommu_group;
 
-       /*
-        * These data structures all have paired operations that can only be
-        * undone when the caller holds a live reference on the group. Since all
-        * pairs must be undone these WARN_ON's indicate some caller did not
-        * properly hold the group reference.
-        */
        WARN_ON(!list_empty(&group->device_list));
        WARN_ON(atomic_read(&group->container_users));
        WARN_ON(group->notifier.head);

Thanks,
Jason

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier()
  2021-10-01 23:22 ` [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier() Jason Gunthorpe
  2021-10-04 22:25   ` Alex Williamson
@ 2021-10-12  6:32   ` Tian, Kevin
  2021-10-12  8:51   ` Liu, Yi L
  2 siblings, 0 replies; 29+ messages in thread
From: Tian, Kevin @ 2021-10-12  6:32 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm
  Cc: Christoph Hellwig, Liu, Yi L, Lu, Baolu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, October 2, 2021 7:22 AM
> 
> iommu_group_register_notifier()/iommu_group_unregister_notifier() are
> built using a blocking_notifier_chain which integrates a rwsem. The
> notifier function cannot be running outside its registration.
> 
> When considering how the notifier function interacts with create/destroy
> of the group there are two fringe cases, the notifier starts before
> list_add(&vfio.group_list) and the notifier runs after the kref
> becomes 0.
> 
> Prior to vfio_create_group() unlocking and returning we have
>    container_users == 0
>    device_list == empty
> And this cannot change until the mutex is unlocked.
> 
> After the kref goes to zero we must also have
>    container_users == 0
>    device_list == empty
> 
> Both are required because they are balanced operations and a 0 kref means
> some caller became unbalanced. Add the missing assertion that
> container_users must be zero as well.
> 
> These two facts are important because when checking each operation we see:
> 
> - IOMMU_GROUP_NOTIFY_ADD_DEVICE
>    Empty device_list avoids the WARN_ON in vfio_group_nb_add_dev()
>    0 container_users ends the call
> - IOMMU_GROUP_NOTIFY_BOUND_DRIVER
>    0 container_users ends the call
> 
> Finally, we have IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER, which only
> deletes
> items from the unbound list. During creation this list is empty, during
> kref == 0 nothing can read this list, and it will be freed soon.
> 
> Since the vfio_group_release() doesn't hold the appropriate lock to
> manipulate the unbound_list and could race with the notifier, move the
> cleanup to directly before the kfree.
> 
> This allows deleting all of the deferred group put code.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

btw once we get the dma ownership check in the iommu core, the entire
iommu group notifier and unbound list will be removed. Then this
deferred code will be unnecessary even w/o this tricky but precise
analysis. 😊

> ---
>  drivers/vfio/vfio.c | 89 +++++----------------------------------------
>  1 file changed, 9 insertions(+), 80 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 08b27b64f0f935..32a53cb3598524 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -324,12 +324,20 @@ static void vfio_container_put(struct
> vfio_container *container)
> 
>  static void vfio_group_unlock_and_free(struct vfio_group *group)
>  {
> +	struct vfio_unbound_dev *unbound, *tmp;
> +
>  	mutex_unlock(&vfio.group_lock);
>  	/*
>  	 * Unregister outside of lock.  A spurious callback is harmless now
>  	 * that the group is no longer in vfio.group_list.
>  	 */
> 	iommu_group_unregister_notifier(group->iommu_group, &group->nb);
> +
> +	list_for_each_entry_safe(unbound, tmp,
> +				 &group->unbound_list, unbound_next) {
> +		list_del(&unbound->unbound_next);
> +		kfree(unbound);
> +	}
>  	kfree(group);
>  }
> 
> @@ -361,13 +369,6 @@ static struct vfio_group *vfio_create_group(struct
> iommu_group *iommu_group,
> 
>  	group->nb.notifier_call = vfio_iommu_group_notifier;
> 
> -	/*
> -	 * blocking notifiers acquire a rwsem around registering and hold
> -	 * it around callback.  Therefore, need to register outside of
> -	 * vfio.group_lock to avoid A-B/B-A contention.  Our callback won't
> -	 * do anything unless it can find the group in vfio.group_list, so
> -	 * no harm in registering early.
> -	 */
>  	ret = iommu_group_register_notifier(iommu_group, &group->nb);
>  	if (ret) {
>  		kfree(group);
> @@ -415,18 +416,12 @@ static struct vfio_group *vfio_create_group(struct
> iommu_group *iommu_group,
>  static void vfio_group_release(struct kref *kref)
>  {
>  	struct vfio_group *group = container_of(kref, struct vfio_group, kref);
> -	struct vfio_unbound_dev *unbound, *tmp;
>  	struct iommu_group *iommu_group = group->iommu_group;
> 
>  	WARN_ON(!list_empty(&group->device_list));
> +	WARN_ON(atomic_read(&group->container_users));
>  	WARN_ON(group->notifier.head);
> 
> -	list_for_each_entry_safe(unbound, tmp,
> -				 &group->unbound_list, unbound_next) {
> -		list_del(&unbound->unbound_next);
> -		kfree(unbound);
> -	}
> -
> 	device_destroy(vfio.class, MKDEV(MAJOR(vfio.group_devt), group->minor));
>  	list_del(&group->vfio_next);
>  	vfio_free_group_minor(group->minor);
> @@ -439,61 +434,12 @@ static void vfio_group_put(struct vfio_group
> *group)
>  	kref_put_mutex(&group->kref, vfio_group_release,
> &vfio.group_lock);
>  }
> 
> -struct vfio_group_put_work {
> -	struct work_struct work;
> -	struct vfio_group *group;
> -};
> -
> -static void vfio_group_put_bg(struct work_struct *work)
> -{
> -	struct vfio_group_put_work *do_work;
> -
> -	do_work = container_of(work, struct vfio_group_put_work, work);
> -
> -	vfio_group_put(do_work->group);
> -	kfree(do_work);
> -}
> -
> -static void vfio_group_schedule_put(struct vfio_group *group)
> -{
> -	struct vfio_group_put_work *do_work;
> -
> -	do_work = kmalloc(sizeof(*do_work), GFP_KERNEL);
> -	if (WARN_ON(!do_work))
> -		return;
> -
> -	INIT_WORK(&do_work->work, vfio_group_put_bg);
> -	do_work->group = group;
> -	schedule_work(&do_work->work);
> -}
> -
>  /* Assume group_lock or group reference is held */
>  static void vfio_group_get(struct vfio_group *group)
>  {
>  	kref_get(&group->kref);
>  }
> 
> -/*
> - * Not really a try as we will sleep for mutex, but we need to make
> - * sure the group pointer is valid under lock and get a reference.
> - */
> -static struct vfio_group *vfio_group_try_get(struct vfio_group *group)
> -{
> -	struct vfio_group *target = group;
> -
> -	mutex_lock(&vfio.group_lock);
> -	list_for_each_entry(group, &vfio.group_list, vfio_next) {
> -		if (group == target) {
> -			vfio_group_get(group);
> -			mutex_unlock(&vfio.group_lock);
> -			return group;
> -		}
> -	}
> -	mutex_unlock(&vfio.group_lock);
> -
> -	return NULL;
> -}
> -
>  static
>  struct vfio_group *vfio_group_get_from_iommu(struct iommu_group
> *iommu_group)
>  {
> @@ -691,14 +637,6 @@ static int vfio_iommu_group_notifier(struct
> notifier_block *nb,
>  	struct device *dev = data;
>  	struct vfio_unbound_dev *unbound;
> 
> -	/*
> -	 * Need to go through a group_lock lookup to get a reference or we
> -	 * risk racing a group being removed.  Ignore spurious notifies.
> -	 */
> -	group = vfio_group_try_get(group);
> -	if (!group)
> -		return NOTIFY_OK;
> -
>  	switch (action) {
>  	case IOMMU_GROUP_NOTIFY_ADD_DEVICE:
>  		vfio_group_nb_add_dev(group, dev);
> @@ -749,15 +687,6 @@ static int vfio_iommu_group_notifier(struct
> notifier_block *nb,
>  		mutex_unlock(&group->unbound_lock);
>  		break;
>  	}
> -
> -	/*
> -	 * If we're the last reference to the group, the group will be
> -	 * released, which includes unregistering the iommu group notifier.
> -	 * We hold a read-lock on that notifier list, unregistering needs
> -	 * a write-lock... deadlock.  Release our reference asynchronously
> -	 * to avoid that situation.
> -	 */
> -	vfio_group_schedule_put(group);
>  	return NOTIFY_OK;
>  }
> 
> --
> 2.33.0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 2/5] vfio: Do not open code the group list search in vfio_create_group()
  2021-10-01 23:22 ` [PATCH 2/5] vfio: Do not open code the group list search in vfio_create_group() Jason Gunthorpe
  2021-10-02  3:19   ` kernel test robot
@ 2021-10-12  6:37   ` Tian, Kevin
  2021-10-12  8:52   ` Liu, Yi L
  2 siblings, 0 replies; 29+ messages in thread
From: Tian, Kevin @ 2021-10-12  6:37 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm
  Cc: Christoph Hellwig, Liu, Yi L

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, October 2, 2021 7:22 AM
> 
> Split vfio_group_get_from_iommu() into __vfio_group_get_from_iommu()
> so
> that vfio_create_group() can call it to consolidate this duplicated code.
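The __-prefixed helper follows the usual kernel convention: the double-underscore variant assumes the lock is already held, and the plain variant is the locking wrapper. A minimal userspace illustration of the pattern (names are illustrative):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Convention: __lookup() requires list_lock to be held by the caller;
 * lookup() is the locked wrapper.  Paths that already hold the lock
 * (like the create path re-checking for a racing creation) call the
 * __ variant directly. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

struct node { struct node *next; int key; };
static struct node *list_head;

static struct node *__lookup(int key)	/* caller holds list_lock */
{
	struct node *n;

	for (n = list_head; n; n = n->next)
		if (n->key == key)
			return n;
	return NULL;
}

static struct node *lookup(int key)
{
	struct node *n;

	pthread_mutex_lock(&list_lock);
	n = __lookup(key);
	pthread_mutex_unlock(&list_lock);
	return n;
}
```

vfio_create_group(), which already holds vfio.group_lock when it re-checks for a racing creation, calls the __ variant directly.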
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

> ---
>  drivers/vfio/vfio.c | 55 ++++++++++++++++++++++++---------------------
>  1 file changed, 30 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 32a53cb3598524..1cb12033b02240 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -344,10 +344,35 @@ static void vfio_group_unlock_and_free(struct
> vfio_group *group)
>  /**
>   * Group objects - create, release, get, put, search
>   */
> +static struct vfio_group *
> +__vfio_group_get_from_iommu(struct iommu_group *iommu_group)
> +{
> +	struct vfio_group *group;
> +
> +	list_for_each_entry(group, &vfio.group_list, vfio_next) {
> +		if (group->iommu_group == iommu_group) {
> +			vfio_group_get(group);
> +			return group;
> +		}
> +	}
> +	return NULL;
> +}
> +
> +static struct vfio_group *
> +vfio_group_get_from_iommu(struct iommu_group *iommu_group)
> +{
> +	struct vfio_group *group;
> +
> +	mutex_lock(&vfio.group_lock);
> +	group = __vfio_group_get_from_iommu(iommu_group);
> +	mutex_unlock(&vfio.group_lock);
> +	return group;
> +}
> +
>  static struct vfio_group *vfio_create_group(struct iommu_group
> *iommu_group,
>  		enum vfio_group_type type)
>  {
> -	struct vfio_group *group, *tmp;
> +	struct vfio_group *group, *existing_group;
>  	struct device *dev;
>  	int ret, minor;
> 
> @@ -378,12 +403,10 @@ static struct vfio_group *vfio_create_group(struct
> iommu_group *iommu_group,
>  	mutex_lock(&vfio.group_lock);
> 
>  	/* Did we race creating this group? */
> -	list_for_each_entry(tmp, &vfio.group_list, vfio_next) {
> -		if (tmp->iommu_group == iommu_group) {
> -			vfio_group_get(tmp);
> -			vfio_group_unlock_and_free(group);
> -			return tmp;
> -		}
> +	existing_group = __vfio_group_get_from_iommu(iommu_group);
> +	if (existing_group) {
> +		vfio_group_unlock_and_free(group);
> +		return existing_group;
>  	}
> 
>  	minor = vfio_alloc_group_minor(group);
> @@ -440,24 +463,6 @@ static void vfio_group_get(struct vfio_group *group)
>  	kref_get(&group->kref);
>  }
> 
> -static
> -struct vfio_group *vfio_group_get_from_iommu(struct iommu_group
> *iommu_group)
> -{
> -	struct vfio_group *group;
> -
> -	mutex_lock(&vfio.group_lock);
> -	list_for_each_entry(group, &vfio.group_list, vfio_next) {
> -		if (group->iommu_group == iommu_group) {
> -			vfio_group_get(group);
> -			mutex_unlock(&vfio.group_lock);
> -			return group;
> -		}
> -	}
> -	mutex_unlock(&vfio.group_lock);
> -
> -	return NULL;
> -}
> -
>  static struct vfio_group *vfio_group_get_from_minor(int minor)
>  {
>  	struct vfio_group *group;
> --
> 2.33.0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group
  2021-10-01 23:22 ` [PATCH 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group Jason Gunthorpe
  2021-10-04 22:25   ` Alex Williamson
@ 2021-10-12  7:08   ` Tian, Kevin
  2021-10-12  9:04   ` Liu, Yi L
  2 siblings, 0 replies; 29+ messages in thread
From: Tian, Kevin @ 2021-10-12  7:08 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm
  Cc: Christoph Hellwig, Liu, Yi L

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, October 2, 2021 7:22 AM
> 
> The next patch adds a struct device to the struct vfio_group, and it is
> confusing/bad practice to have two krefs in the same struct. This kref is
> controlling the period when the vfio_group is registered in sysfs, and
> visible in the internal lookup. Switch it to a refcount_t instead.
> 
> The refcount_dec_and_mutex_lock() is still required because we need
> atomicity of the list searches and sysfs presence.
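The semantics of refcount_dec_and_mutex_lock(), which drops the count and returns with the mutex held only when the count hits zero, can be approximated in userspace as follows (this sketch always takes the mutex, whereas the kernel helper has a lock-free fast path; names are illustrative):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace sketch of refcount_dec_and_mutex_lock(): decrement the
 * counter and return true with the mutex held iff it reached zero. */
static bool refcount_dec_and_mutex_lock(int *refs, pthread_mutex_t *lock)
{
	pthread_mutex_lock(lock);
	if (--(*refs) > 0) {
		pthread_mutex_unlock(lock);
		return false;
	}
	return true;	/* zero: caller owns the lock for teardown */
}

static pthread_mutex_t group_lock = PTHREAD_MUTEX_INITIALIZER;
static int freed;

static void group_put(int *users)
{
	if (!refcount_dec_and_mutex_lock(users, &group_lock))
		return;
	/* Last reference: remove from lists/sysfs while holding the
	 * lock so a concurrent lookup cannot find a dying group. */
	freed = 1;
	pthread_mutex_unlock(&group_lock);
}
```

Returning with the lock held lets the caller delete the group from the list and sysfs before any concurrent lookup can take a new reference.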
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>

> ---
>  drivers/vfio/vfio.c | 19 +++++++------------
>  1 file changed, 7 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index bf233943dc992f..dbe7edd88ce35c 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -69,7 +69,7 @@ struct vfio_unbound_dev {
>  };
> 
>  struct vfio_group {
> -	struct kref			kref;
> +	refcount_t users;
>  	int				minor;
>  	atomic_t			container_users;
>  	struct iommu_group		*iommu_group;
> @@ -381,7 +381,7 @@ static struct vfio_group *vfio_create_group(struct
> iommu_group *iommu_group,
>  	if (!group)
>  		return ERR_PTR(-ENOMEM);
> 
> -	kref_init(&group->kref);
> +	refcount_set(&group->users, 1);
>  	INIT_LIST_HEAD(&group->device_list);
>  	mutex_init(&group->device_lock);
>  	INIT_LIST_HEAD(&group->unbound_list);
> @@ -441,10 +441,10 @@ static struct vfio_group *vfio_create_group(struct
> iommu_group *iommu_group,
>  	return group;
>  }
> 
> -/* called with vfio.group_lock held */
> -static void vfio_group_release(struct kref *kref)
> +static void vfio_group_put(struct vfio_group *group)
>  {
> -	struct vfio_group *group = container_of(kref, struct vfio_group, kref);
> +	if (!refcount_dec_and_mutex_lock(&group->users, &vfio.group_lock))
> +		return;
> 
>  	WARN_ON(!list_empty(&group->device_list));
>  	WARN_ON(atomic_read(&group->container_users));
> @@ -456,15 +456,9 @@ static void vfio_group_release(struct kref *kref)
>  	vfio_group_unlock_and_free(group);
>  }
> 
> -static void vfio_group_put(struct vfio_group *group)
> -{
> -	kref_put_mutex(&group->kref, vfio_group_release,
> &vfio.group_lock);
> -}
> -
> -/* Assume group_lock or group reference is held */
>  static void vfio_group_get(struct vfio_group *group)
>  {
> -	kref_get(&group->kref);
> +	refcount_inc(&group->users);
>  }
> 
>  static struct vfio_group *vfio_group_get_from_minor(int minor)
> @@ -1662,6 +1656,7 @@ struct vfio_group
> *vfio_group_get_external_user(struct file *filep)
>  	if (ret)
>  		return ERR_PTR(ret);
> 
> +	/* Since the caller holds the fget on the file users must be >= 1 */
>  	vfio_group_get(group);
> 
>  	return group;
> --
> 2.33.0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create()
  2021-10-01 23:22 ` [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create() Jason Gunthorpe
@ 2021-10-12  8:33   ` Tian, Kevin
  2021-10-12 12:05     ` Jason Gunthorpe
  2021-10-12  8:57   ` Liu, Yi L
  1 sibling, 1 reply; 29+ messages in thread
From: Tian, Kevin @ 2021-10-12  8:33 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm
  Cc: Christoph Hellwig, Liu, Yi L

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, October 2, 2021 7:22 AM
>
[...]
 
> +static void vfio_group_release(struct device *dev)
>  {
> -	struct vfio_group *group, *existing_group;
> -	struct device *dev;
> -	int ret, minor;
> +	struct vfio_group *group = container_of(dev, struct vfio_group, dev);
> +	struct vfio_unbound_dev *unbound, *tmp;
> +
> +	list_for_each_entry_safe(unbound, tmp,
> +				 &group->unbound_list, unbound_next) {
> +		list_del(&unbound->unbound_next);
> +		kfree(unbound);
> +	}

Move to vfio_group_put()? This is not paired with vfio_group_alloc()...

[...]
>  static int vfio_group_fops_open(struct inode *inode, struct file *filep)
>  {
> -	struct vfio_group *group;
> +	struct vfio_group *group =
> +		container_of(inode->i_cdev, struct vfio_group, cdev);
>  	int opened;

A curiosity question. According to cdev_device_del(), any cdev already
open will remain with its fops callable. What prevents the vfio_group
from being released after cdev_device_del() has been called? Is it
because a cdev open holds a reference to the device, so put_device()
will not hit zero in vfio_group_put()?

Thanks
Kevin
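For context: the cdev_device_add() pattern the series adopts keeps the vfio state, struct device, and struct cdev in one allocation and navigates between them with container_of(); an open cdev holds a reference on its parent device, so the device's release callback cannot run until the last file reference drops. A userspace sketch of just the container_of() navigation (the struct members are illustrative stand-ins, not the kernel definitions):

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-ins for struct device / struct cdev. */
struct device { int id; };
struct cdev { int major; };

/* One allocation ties vfio state, device, and cdev together,
 * mirroring the layout the cover letter describes. */
struct vfio_group {
	struct device dev;
	struct cdev cdev;
	int opened;
};

/* A fops handler only sees the cdev pointer and recovers the
 * containing group, as vfio_group_fops_open() does via i_cdev. */
static struct vfio_group *group_from_cdev(struct cdev *cdev)
{
	return container_of(cdev, struct vfio_group, cdev);
}
```

This is how vfio_group_fops_open() recovers the group from inode->i_cdev without any lookup table.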

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier()
  2021-10-01 23:22 ` [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier() Jason Gunthorpe
  2021-10-04 22:25   ` Alex Williamson
  2021-10-12  6:32   ` Tian, Kevin
@ 2021-10-12  8:51   ` Liu, Yi L
  2 siblings, 0 replies; 29+ messages in thread
From: Liu, Yi L @ 2021-10-12  8:51 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm
  Cc: Christoph Hellwig, Tian, Kevin

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, October 2, 2021 7:22 AM
> 
> iommu_group_register_notifier()/iommu_group_unregister_notifier() are
> built using a blocking_notifier_chain which integrates a rwsem. The
> notifier function cannot be running outside its registration.
> 
> When considering how the notifier function interacts with create/destroy
> of the group there are two fringe cases, the notifier starts before
> list_add(&vfio.group_list) and the notifier runs after the kref
> becomes 0.
> 
> Prior to vfio_create_group() unlocking and returning we have
>    container_users == 0
>    device_list == empty
> And this cannot change until the mutex is unlocked.
> 
> After the kref goes to zero we must also have
>    container_users == 0
>    device_list == empty
> 
> Both are required because they are balanced operations and a 0 kref means
> some caller became unbalanced. Add the missing assertion that
> container_users must be zero as well.
> 
> These two facts are important because when checking each operation we see:
> 
> - IOMMU_GROUP_NOTIFY_ADD_DEVICE
>    Empty device_list avoids the WARN_ON in vfio_group_nb_add_dev()
>    0 container_users ends the call
> - IOMMU_GROUP_NOTIFY_BOUND_DRIVER
>    0 container_users ends the call
> 
> Finally, we have IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER, which only
> deletes
> items from the unbound list. During creation this list is empty, during
> kref == 0 nothing can read this list, and it will be freed soon.
> 
> Since the vfio_group_release() doesn't hold the appropriate lock to
> manipulate the unbound_list and could race with the notifier, move the
> cleanup to directly before the kfree.
> 
> This allows deleting all of the deferred group put code.

Looks good to me.

Reviewed-by: Liu Yi L <yi.l.liu@intel.com>

Regards,
Yi Liu

> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/vfio/vfio.c | 89 +++++----------------------------------------
>  1 file changed, 9 insertions(+), 80 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 08b27b64f0f935..32a53cb3598524 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -324,12 +324,20 @@ static void vfio_container_put(struct vfio_container
> *container)
> 
>  static void vfio_group_unlock_and_free(struct vfio_group *group)
>  {
> +	struct vfio_unbound_dev *unbound, *tmp;
> +
>  	mutex_unlock(&vfio.group_lock);
>  	/*
>  	 * Unregister outside of lock.  A spurious callback is harmless now
>  	 * that the group is no longer in vfio.group_list.
>  	 */
>  	iommu_group_unregister_notifier(group->iommu_group, &group->nb);
> +
> +	list_for_each_entry_safe(unbound, tmp,
> +				 &group->unbound_list, unbound_next) {
> +		list_del(&unbound->unbound_next);
> +		kfree(unbound);
> +	}
>  	kfree(group);
>  }
> 
> @@ -361,13 +369,6 @@ static struct vfio_group *vfio_create_group(struct
> iommu_group *iommu_group,
> 
>  	group->nb.notifier_call = vfio_iommu_group_notifier;
> 
> -	/*
> -	 * blocking notifiers acquire a rwsem around registering and hold
> -	 * it around callback.  Therefore, need to register outside of
> -	 * vfio.group_lock to avoid A-B/B-A contention.  Our callback won't
> -	 * do anything unless it can find the group in vfio.group_list, so
> -	 * no harm in registering early.
> -	 */
>  	ret = iommu_group_register_notifier(iommu_group, &group->nb);
>  	if (ret) {
>  		kfree(group);
> @@ -415,18 +416,12 @@ static struct vfio_group *vfio_create_group(struct
> iommu_group *iommu_group,
>  static void vfio_group_release(struct kref *kref)
>  {
>  	struct vfio_group *group = container_of(kref, struct vfio_group, kref);
> -	struct vfio_unbound_dev *unbound, *tmp;
>  	struct iommu_group *iommu_group = group->iommu_group;
> 
>  	WARN_ON(!list_empty(&group->device_list));
> +	WARN_ON(atomic_read(&group->container_users));
>  	WARN_ON(group->notifier.head);
> 
> -	list_for_each_entry_safe(unbound, tmp,
> -				 &group->unbound_list, unbound_next) {
> -		list_del(&unbound->unbound_next);
> -		kfree(unbound);
> -	}
> -
>  	device_destroy(vfio.class, MKDEV(MAJOR(vfio.group_devt), group-
> >minor));
>  	list_del(&group->vfio_next);
>  	vfio_free_group_minor(group->minor);
> @@ -439,61 +434,12 @@ static void vfio_group_put(struct vfio_group *group)
>  	kref_put_mutex(&group->kref, vfio_group_release, &vfio.group_lock);
>  }
> 
> -struct vfio_group_put_work {
> -	struct work_struct work;
> -	struct vfio_group *group;
> -};
> -
> -static void vfio_group_put_bg(struct work_struct *work)
> -{
> -	struct vfio_group_put_work *do_work;
> -
> -	do_work = container_of(work, struct vfio_group_put_work, work);
> -
> -	vfio_group_put(do_work->group);
> -	kfree(do_work);
> -}
> -
> -static void vfio_group_schedule_put(struct vfio_group *group)
> -{
> -	struct vfio_group_put_work *do_work;
> -
> -	do_work = kmalloc(sizeof(*do_work), GFP_KERNEL);
> -	if (WARN_ON(!do_work))
> -		return;
> -
> -	INIT_WORK(&do_work->work, vfio_group_put_bg);
> -	do_work->group = group;
> -	schedule_work(&do_work->work);
> -}
> -
>  /* Assume group_lock or group reference is held */
>  static void vfio_group_get(struct vfio_group *group)
>  {
>  	kref_get(&group->kref);
>  }
> 
> -/*
> - * Not really a try as we will sleep for mutex, but we need to make
> - * sure the group pointer is valid under lock and get a reference.
> - */
> -static struct vfio_group *vfio_group_try_get(struct vfio_group *group)
> -{
> -	struct vfio_group *target = group;
> -
> -	mutex_lock(&vfio.group_lock);
> -	list_for_each_entry(group, &vfio.group_list, vfio_next) {
> -		if (group == target) {
> -			vfio_group_get(group);
> -			mutex_unlock(&vfio.group_lock);
> -			return group;
> -		}
> -	}
> -	mutex_unlock(&vfio.group_lock);
> -
> -	return NULL;
> -}
> -
>  static
>  struct vfio_group *vfio_group_get_from_iommu(struct iommu_group
> *iommu_group)
>  {
> @@ -691,14 +637,6 @@ static int vfio_iommu_group_notifier(struct
> notifier_block *nb,
>  	struct device *dev = data;
>  	struct vfio_unbound_dev *unbound;
> 
> -	/*
> -	 * Need to go through a group_lock lookup to get a reference or we
> -	 * risk racing a group being removed.  Ignore spurious notifies.
> -	 */
> -	group = vfio_group_try_get(group);
> -	if (!group)
> -		return NOTIFY_OK;
> -
>  	switch (action) {
>  	case IOMMU_GROUP_NOTIFY_ADD_DEVICE:
>  		vfio_group_nb_add_dev(group, dev);
> @@ -749,15 +687,6 @@ static int vfio_iommu_group_notifier(struct
> notifier_block *nb,
>  		mutex_unlock(&group->unbound_lock);
>  		break;
>  	}
> -
> -	/*
> -	 * If we're the last reference to the group, the group will be
> -	 * released, which includes unregistering the iommu group notifier.
> -	 * We hold a read-lock on that notifier list, unregistering needs
> -	 * a write-lock... deadlock.  Release our reference asynchronously
> -	 * to avoid that situation.
> -	 */
> -	vfio_group_schedule_put(group);
>  	return NOTIFY_OK;
>  }
> 
> --
> 2.33.0


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH 2/5] vfio: Do not open code the group list search in vfio_create_group()
  2021-10-01 23:22 ` [PATCH 2/5] vfio: Do not open code the group list search in vfio_create_group() Jason Gunthorpe
  2021-10-02  3:19   ` kernel test robot
  2021-10-12  6:37   ` Tian, Kevin
@ 2021-10-12  8:52   ` Liu, Yi L
  2 siblings, 0 replies; 29+ messages in thread
From: Liu, Yi L @ 2021-10-12  8:52 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm
  Cc: Christoph Hellwig, Tian, Kevin

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, October 2, 2021 7:22 AM
> 
> Split vfio_group_get_from_iommu() into __vfio_group_get_from_iommu() so
> that vfio_create_group() can call it to consolidate this duplicated code.

Reviewed-by: Liu Yi L <yi.l.liu@intel.com>

Regards,
Yi Liu

> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/vfio/vfio.c | 55 ++++++++++++++++++++++++---------------------
>  1 file changed, 30 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 32a53cb3598524..1cb12033b02240 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -344,10 +344,35 @@ static void vfio_group_unlock_and_free(struct vfio_group *group)
>  /**
>   * Group objects - create, release, get, put, search
>   */
> +static struct vfio_group *
> +__vfio_group_get_from_iommu(struct iommu_group *iommu_group)
> +{
> +	struct vfio_group *group;
> +
> +	list_for_each_entry(group, &vfio.group_list, vfio_next) {
> +		if (group->iommu_group == iommu_group) {
> +			vfio_group_get(group);
> +			return group;
> +		}
> +	}
> +	return NULL;
> +}
> +
> +static struct vfio_group *
> +vfio_group_get_from_iommu(struct iommu_group *iommu_group)
> +{
> +	struct vfio_group *group;
> +
> +	mutex_lock(&vfio.group_lock);
> +	group = __vfio_group_get_from_iommu(iommu_group);
> +	mutex_unlock(&vfio.group_lock);
> +	return group;
> +}
> +
>  static struct vfio_group *vfio_create_group(struct iommu_group
> *iommu_group,
>  		enum vfio_group_type type)
>  {
> -	struct vfio_group *group, *tmp;
> +	struct vfio_group *group, *existing_group;
>  	struct device *dev;
>  	int ret, minor;
> 
> @@ -378,12 +403,10 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
>  	mutex_lock(&vfio.group_lock);
> 
>  	/* Did we race creating this group? */
> -	list_for_each_entry(tmp, &vfio.group_list, vfio_next) {
> -		if (tmp->iommu_group == iommu_group) {
> -			vfio_group_get(tmp);
> -			vfio_group_unlock_and_free(group);
> -			return tmp;
> -		}
> +	existing_group = __vfio_group_get_from_iommu(iommu_group);
> +	if (existing_group) {
> +		vfio_group_unlock_and_free(group);
> +		return existing_group;
>  	}
> 
>  	minor = vfio_alloc_group_minor(group);
> @@ -440,24 +463,6 @@ static void vfio_group_get(struct vfio_group *group)
>  	kref_get(&group->kref);
>  }
> 
> -static
> -struct vfio_group *vfio_group_get_from_iommu(struct iommu_group *iommu_group)
> -{
> -	struct vfio_group *group;
> -
> -	mutex_lock(&vfio.group_lock);
> -	list_for_each_entry(group, &vfio.group_list, vfio_next) {
> -		if (group->iommu_group == iommu_group) {
> -			vfio_group_get(group);
> -			mutex_unlock(&vfio.group_lock);
> -			return group;
> -		}
> -	}
> -	mutex_unlock(&vfio.group_lock);
> -
> -	return NULL;
> -}
> -
>  static struct vfio_group *vfio_group_get_from_minor(int minor)
>  {
>  	struct vfio_group *group;
> --
> 2.33.0



* RE: [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create()
  2021-10-01 23:22 ` [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create() Jason Gunthorpe
  2021-10-12  8:33   ` Tian, Kevin
@ 2021-10-12  8:57   ` Liu, Yi L
  2021-10-13 12:49     ` Jason Gunthorpe
  1 sibling, 1 reply; 29+ messages in thread
From: Liu, Yi L @ 2021-10-12  8:57 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm
  Cc: Christoph Hellwig, Tian, Kevin

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, October 2, 2021 7:22 AM
> 
> Modernize how vfio is creating the group char dev and sysfs presence.
> 
> These days drivers with state should use cdev_device_add() and
> cdev_device_del() to manage the cdev and sysfs lifetime.
> 
> This API requires the driver to put the struct device and struct cdev
> inside its state struct (vfio_group), and then use the usual
> device_initialize()/cdev_device_add()/cdev_device_del() sequence.
> 
> Split the code to make this possible:
> 
>  - vfio_group_alloc()/vfio_group_release() are paired functions to
>    alloc/free the vfio_group. release is done under the struct device
>    kref.
> 
>  - vfio_create_group()/vfio_group_put() are pairs that manage the
>    sysfs/cdev lifetime. Once the uses count is zero the vfio group's
>    userspace presence is destroyed.
> 
>  - The IDR is replaced with an IDA. container_of(inode->i_cdev)
>    is used to get back to the vfio_group during fops open. The IDA
>    assigns unique minor numbers.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/vfio/vfio.c | 200 +++++++++++++++++++++-----------------------
>  1 file changed, 94 insertions(+), 106 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index dbe7edd88ce35c..01e04947250f40 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -43,9 +43,8 @@ static struct vfio {
>  	struct list_head		iommu_drivers_list;
>  	struct mutex			iommu_drivers_lock;
>  	struct list_head		group_list;
> -	struct idr			group_idr;
> -	struct mutex			group_lock;
> -	struct cdev			group_cdev;
> +	struct mutex			group_lock; /* locks group_list */
> +	struct ida			group_ida;
>  	dev_t				group_devt;
>  } vfio;
> 
> @@ -69,14 +68,14 @@ struct vfio_unbound_dev {
>  };
> 
>  struct vfio_group {
> +	struct device dev;
> +	struct cdev cdev;
>  	refcount_t users;
> -	int				minor;
>  	atomic_t			container_users;
>  	struct iommu_group		*iommu_group;
>  	struct vfio_container		*container;
>  	struct list_head		device_list;
>  	struct mutex			device_lock;
> -	struct device			*dev;
>  	struct notifier_block		nb;
>  	struct list_head		vfio_next;
>  	struct list_head		container_next;
> @@ -98,6 +97,7 @@ MODULE_PARM_DESC(enable_unsafe_noiommu_mode, "Enable UNSAFE, no-IOMMU mode.  Thi
>  #endif
> 
>  static DEFINE_XARRAY(vfio_device_set_xa);
> +static const struct file_operations vfio_group_fops;
> 
>  int vfio_assign_device_set(struct vfio_device *device, void *set_id)
>  {
> @@ -281,19 +281,6 @@ void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops *ops)
>  }
>  EXPORT_SYMBOL_GPL(vfio_unregister_iommu_driver);
> 
> -/**
> - * Group minor allocation/free - both called with vfio.group_lock held
> - */
> -static int vfio_alloc_group_minor(struct vfio_group *group)
> -{
> -	return idr_alloc(&vfio.group_idr, group, 0, MINORMASK + 1, GFP_KERNEL);
> -}
> -
> -static void vfio_free_group_minor(int minor)
> -{
> -	idr_remove(&vfio.group_idr, minor);
> -}
> -
>  static int vfio_iommu_group_notifier(struct notifier_block *nb,
>  				     unsigned long action, void *data);
>  static void vfio_group_get(struct vfio_group *group);
> @@ -322,26 +309,6 @@ static void vfio_container_put(struct vfio_container *container)
>  	kref_put(&container->kref, vfio_container_release);
>  }
> 
> -static void vfio_group_unlock_and_free(struct vfio_group *group)
> -{
> -	struct vfio_unbound_dev *unbound, *tmp;
> -
> -	mutex_unlock(&vfio.group_lock);
> -	/*
> -	 * Unregister outside of lock.  A spurious callback is harmless now
> -	 * that the group is no longer in vfio.group_list.
> -	 */
> -	iommu_group_unregister_notifier(group->iommu_group, &group->nb);
> -
> -	list_for_each_entry_safe(unbound, tmp,
> -				 &group->unbound_list, unbound_next) {
> -		list_del(&unbound->unbound_next);
> -		kfree(unbound);
> -	}
> -	iommu_group_put(group->iommu_group);
> -	kfree(group);
> -}
> -
>  /**
>   * Group objects - create, release, get, put, search
>   */
> @@ -370,75 +337,112 @@ vfio_group_get_from_iommu(struct iommu_group *iommu_group)
>  	return group;
>  }
> 
> -static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> -		enum vfio_group_type type)
> +static void vfio_group_release(struct device *dev)
>  {
> -	struct vfio_group *group, *existing_group;
> -	struct device *dev;
> -	int ret, minor;
> +	struct vfio_group *group = container_of(dev, struct vfio_group, dev);
> +	struct vfio_unbound_dev *unbound, *tmp;
> +
> +	list_for_each_entry_safe(unbound, tmp,
> +				 &group->unbound_list, unbound_next) {
> +		list_del(&unbound->unbound_next);
> +		kfree(unbound);
> +	}
> +
> +	mutex_destroy(&group->device_lock);
> +	mutex_destroy(&group->unbound_lock);
> +	iommu_group_put(group->iommu_group);
> +	ida_free(&vfio.group_ida, MINOR(group->dev.devt));
> +	kfree(group);
> +}
> +
> +static struct vfio_group *vfio_group_alloc(struct iommu_group *iommu_group,
> +					   enum vfio_group_type type)
> +{
> +	struct vfio_group *group;
> +	int minor;
> 
>  	group = kzalloc(sizeof(*group), GFP_KERNEL);
>  	if (!group)
>  		return ERR_PTR(-ENOMEM);
> 
> +	minor = ida_alloc_max(&vfio.group_ida, MINORMASK, GFP_KERNEL);
> +	if (minor < 0) {
> +		kfree(group);
> +		return ERR_PTR(minor);
> +	}
> +
> +	device_initialize(&group->dev);
> +	group->dev.devt = MKDEV(MAJOR(vfio.group_devt), minor);
> +	group->dev.class = vfio.class;
> +	group->dev.release = vfio_group_release;
> +	cdev_init(&group->cdev, &vfio_group_fops);
> +	group->cdev.owner = THIS_MODULE;
> +
>  	refcount_set(&group->users, 1);
>  	INIT_LIST_HEAD(&group->device_list);
>  	mutex_init(&group->device_lock);
>  	INIT_LIST_HEAD(&group->unbound_list);
>  	mutex_init(&group->unbound_lock);
> -	atomic_set(&group->container_users, 0);
> -	atomic_set(&group->opened, 0);
>  	init_waitqueue_head(&group->container_q);
>  	group->iommu_group = iommu_group;
> -	/* put in vfio_group_unlock_and_free() */
> +	/* put in vfio_group_release() */
>  	iommu_group_ref_get(iommu_group);
>  	group->type = type;
>  	BLOCKING_INIT_NOTIFIER_HEAD(&group->notifier);
> 
> -	group->nb.notifier_call = vfio_iommu_group_notifier;
> +	return group;
> +}
> 
> -	ret = iommu_group_register_notifier(iommu_group, &group->nb);
> -	if (ret) {
> -		group = ERR_PTR(ret);
> -		goto err_put_group;
> +static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
> +		enum vfio_group_type type)
> +{
> +	struct vfio_group *group;
> +	struct vfio_group *ret;
> +	int err;
> +
> +	group = vfio_group_alloc(iommu_group, type);
> +	if (IS_ERR(group))
> +		return group;
> +
> +	err = dev_set_name(&group->dev, "%s%d",
> +			   group->type == VFIO_NO_IOMMU ? "noiommu-" : "",
> +			   iommu_group_id(iommu_group));
> +	if (err) {
> +		ret = ERR_PTR(err);
> +		goto err_put;
> +	}
> +
> +	group->nb.notifier_call = vfio_iommu_group_notifier;
> +	err = iommu_group_register_notifier(iommu_group, &group->nb);
> +	if (err) {
> +		ret = ERR_PTR(err);
> +		goto err_put;
>  	}
> 
>  	mutex_lock(&vfio.group_lock);
> 
>  	/* Did we race creating this group? */
> -	existing_group = __vfio_group_get_from_iommu(iommu_group);
> -	if (existing_group) {
> -		vfio_group_unlock_and_free(group);
> -		return existing_group;
> -	}
> +	ret = __vfio_group_get_from_iommu(iommu_group);
> +	if (ret)
> +		goto err_unlock;
> 
> -	minor = vfio_alloc_group_minor(group);
> -	if (minor < 0) {
> -		vfio_group_unlock_and_free(group);
> -		return ERR_PTR(minor);
> +	err = cdev_device_add(&group->cdev, &group->dev);
> +	if (err) {
> +		ret = ERR_PTR(err);
> +		goto err_unlock;

Should this err branch put the vfio_group reference acquired above by
__vfio_group_get_from_iommu(iommu_group)?

Regards,
Yi Liu

>  	}
> 
> -	dev = device_create(vfio.class, NULL,
> -			    MKDEV(MAJOR(vfio.group_devt), minor), group, "%s%d",
> -			    group->type == VFIO_NO_IOMMU ? "noiommu-" : "",
> -			    iommu_group_id(iommu_group));
> -	if (IS_ERR(dev)) {
> -		vfio_free_group_minor(minor);
> -		vfio_group_unlock_and_free(group);
> -		return ERR_CAST(dev);
> -	}
> -
> -	group->minor = minor;
> -	group->dev = dev;
> -
>  	list_add(&group->vfio_next, &vfio.group_list);
> 
>  	mutex_unlock(&vfio.group_lock);
> -
> -err_put_group:
> -	iommu_group_put(iommu_group);
> -	kfree(group);
>  	return group;
> +
> +err_unlock:
> +	mutex_unlock(&vfio.group_lock);
> +	iommu_group_unregister_notifier(group->iommu_group, &group->nb);
> +err_put:
> +	put_device(&group->dev);
> +	return ret;
>  }
> 
>  static void vfio_group_put(struct vfio_group *group)
> @@ -450,10 +454,12 @@ static void vfio_group_put(struct vfio_group *group)
>  	WARN_ON(atomic_read(&group->container_users));
>  	WARN_ON(group->notifier.head);
> 
> -	device_destroy(vfio.class, MKDEV(MAJOR(vfio.group_devt), group->minor));
>  	list_del(&group->vfio_next);
> -	vfio_free_group_minor(group->minor);
> -	vfio_group_unlock_and_free(group);
> +	cdev_device_del(&group->cdev, &group->dev);
> +	mutex_unlock(&vfio.group_lock);
> +
> +	iommu_group_unregister_notifier(group->iommu_group, &group->nb);
> +	put_device(&group->dev);
>  }
> 
>  static void vfio_group_get(struct vfio_group *group)
> @@ -461,20 +467,10 @@ static void vfio_group_get(struct vfio_group *group)
>  	refcount_inc(&group->users);
>  }
> 
> -static struct vfio_group *vfio_group_get_from_minor(int minor)
> +/* returns true if the get was obtained */
> +static bool vfio_group_try_get(struct vfio_group *group)
>  {
> -	struct vfio_group *group;
> -
> -	mutex_lock(&vfio.group_lock);
> -	group = idr_find(&vfio.group_idr, minor);
> -	if (!group) {
> -		mutex_unlock(&vfio.group_lock);
> -		return NULL;
> -	}
> -	vfio_group_get(group);
> -	mutex_unlock(&vfio.group_lock);
> -
> -	return group;
> +	return refcount_inc_not_zero(&group->users);
>  }
> 
>  static struct vfio_group *vfio_group_get_from_dev(struct device *dev)
> @@ -1484,11 +1480,11 @@ static long vfio_group_fops_unl_ioctl(struct file *filep,
> 
>  static int vfio_group_fops_open(struct inode *inode, struct file *filep)
>  {
> -	struct vfio_group *group;
> +	struct vfio_group *group =
> +		container_of(inode->i_cdev, struct vfio_group, cdev);
>  	int opened;
> 
> -	group = vfio_group_get_from_minor(iminor(inode));
> -	if (!group)
> +	if (!vfio_group_try_get(group))
>  		return -ENODEV;
> 
>  	if (group->type == VFIO_NO_IOMMU && !capable(CAP_SYS_RAWIO)) {
> @@ -2296,7 +2292,7 @@ static int __init vfio_init(void)
>  {
>  	int ret;
> 
> -	idr_init(&vfio.group_idr);
> +	ida_init(&vfio.group_ida);
>  	mutex_init(&vfio.group_lock);
>  	mutex_init(&vfio.iommu_drivers_lock);
>  	INIT_LIST_HEAD(&vfio.group_list);
> @@ -2321,11 +2317,6 @@ static int __init vfio_init(void)
>  	if (ret)
>  		goto err_alloc_chrdev;
> 
> -	cdev_init(&vfio.group_cdev, &vfio_group_fops);
> -	ret = cdev_add(&vfio.group_cdev, vfio.group_devt, MINORMASK + 1);
> -	if (ret)
> -		goto err_cdev_add;
> -
>  	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
> 
>  #ifdef CONFIG_VFIO_NOIOMMU
> @@ -2333,8 +2324,6 @@ static int __init vfio_init(void)
>  #endif
>  	return 0;
> 
> -err_cdev_add:
> -	unregister_chrdev_region(vfio.group_devt, MINORMASK + 1);
>  err_alloc_chrdev:
>  	class_destroy(vfio.class);
>  	vfio.class = NULL;
> @@ -2350,8 +2339,7 @@ static void __exit vfio_cleanup(void)
>  #ifdef CONFIG_VFIO_NOIOMMU
>  	vfio_unregister_iommu_driver(&vfio_noiommu_ops);
>  #endif
> -	idr_destroy(&vfio.group_idr);
> -	cdev_del(&vfio.group_cdev);
> +	ida_destroy(&vfio.group_ida);
>  	unregister_chrdev_region(vfio.group_devt, MINORMASK + 1);
>  	class_destroy(vfio.class);
>  	vfio.class = NULL;
> --
> 2.33.0



* RE: [PATCH 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group
  2021-10-01 23:22 ` [PATCH 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group Jason Gunthorpe
  2021-10-04 22:25   ` Alex Williamson
  2021-10-12  7:08   ` Tian, Kevin
@ 2021-10-12  9:04   ` Liu, Yi L
  2 siblings, 0 replies; 29+ messages in thread
From: Liu, Yi L @ 2021-10-12  9:04 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson, Cornelia Huck, kvm
  Cc: Christoph Hellwig, Tian, Kevin

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, October 2, 2021 7:22 AM
> 
> The next patch adds a struct device to the struct vfio_group, and it is
> confusing/bad practice to have two krefs in the same struct. This kref is
> controlling the period when the vfio_group is registered in sysfs, and
> visible in the internal lookup. Switch it to a refcount_t instead.
> 
> The refcount_dec_and_mutex_lock() is still required because we need
> atomicity of the list searches and sysfs presence.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/vfio/vfio.c | 19 +++++++------------
>  1 file changed, 7 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index bf233943dc992f..dbe7edd88ce35c 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -69,7 +69,7 @@ struct vfio_unbound_dev {
>  };
> 
>  struct vfio_group {
> -	struct kref			kref;
> +	refcount_t users;
>  	int				minor;
>  	atomic_t			container_users;
>  	struct iommu_group		*iommu_group;
> @@ -381,7 +381,7 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
>  	if (!group)
>  		return ERR_PTR(-ENOMEM);
> 
> -	kref_init(&group->kref);
> +	refcount_set(&group->users, 1);
>  	INIT_LIST_HEAD(&group->device_list);
>  	mutex_init(&group->device_lock);
>  	INIT_LIST_HEAD(&group->unbound_list);
> @@ -441,10 +441,10 @@ static struct vfio_group *vfio_create_group(struct iommu_group *iommu_group,
>  	return group;
>  }
> 
> -/* called with vfio.group_lock held */
> -static void vfio_group_release(struct kref *kref)
> +static void vfio_group_put(struct vfio_group *group)
>  {
> -	struct vfio_group *group = container_of(kref, struct vfio_group, kref);
> +	if (!refcount_dec_and_mutex_lock(&group->users, &vfio.group_lock))
> +		return;
> 
>  	WARN_ON(!list_empty(&group->device_list));
>  	WARN_ON(atomic_read(&group->container_users));
> @@ -456,15 +456,9 @@ static void vfio_group_release(struct kref *kref)
>  	vfio_group_unlock_and_free(group);
>  }
> 
> -static void vfio_group_put(struct vfio_group *group)
> -{
> -	kref_put_mutex(&group->kref, vfio_group_release, &vfio.group_lock);
> -}
> -
> -/* Assume group_lock or group reference is held */
>  static void vfio_group_get(struct vfio_group *group)
>  {
> -	kref_get(&group->kref);
> +	refcount_inc(&group->users);
>  }
> 
>  static struct vfio_group *vfio_group_get_from_minor(int minor)
> @@ -1662,6 +1656,7 @@ struct vfio_group *vfio_group_get_external_user(struct file *filep)
>  	if (ret)
>  		return ERR_PTR(ret);
> 
> +	/* Since the caller holds the fget on the file users must be >= 1 */

s/on the file users must be >= 1/on the file, group->users must be >= 1/

I'm not a native speaker, but that seems clearer to me. ^_^

Reviewed-by: Liu Yi L <yi.l.liu@intel.com>

Regards,
Yi Liu

>  	vfio_group_get(group);
> 
>  	return group;
> --
> 2.33.0
> 



* Re: [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create()
  2021-10-12  8:33   ` Tian, Kevin
@ 2021-10-12 12:05     ` Jason Gunthorpe
  2021-10-13  1:07       ` Tian, Kevin
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2021-10-12 12:05 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Alex Williamson, Cornelia Huck, kvm, Christoph Hellwig, Liu, Yi L

On Tue, Oct 12, 2021 at 08:33:49AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Saturday, October 2, 2021 7:22 AM
> >
> [...]
>  
> > +static void vfio_group_release(struct device *dev)
> >  {
> > -	struct vfio_group *group, *existing_group;
> > -	struct device *dev;
> > -	int ret, minor;
> > +	struct vfio_group *group = container_of(dev, struct vfio_group, dev);
> > +	struct vfio_unbound_dev *unbound, *tmp;
> > +
> > +	list_for_each_entry_safe(unbound, tmp,
> > +				 &group->unbound_list, unbound_next) {
> > +		list_del(&unbound->unbound_next);
> > +		kfree(unbound);
> > +	}
> 
> move to vfio_group_put()? this is not paired with vfio_group_alloc()...

Lists are tricky for pairing analysis: vfio_group_alloc() creates an
empty list and release restores the list to empty.

> >  static int vfio_group_fops_open(struct inode *inode, struct file *filep)
> >  {
> > -	struct vfio_group *group;
> > +	struct vfio_group *group =
> > +		container_of(inode->i_cdev, struct vfio_group, cdev);
> >  	int opened;
> 
> A curiosity question. According to cdev_device_del() any cdev already
> open will remain with their fops callable. 

Correct

> What prevents vfio_group from being released after cdev_device_del()
> has been called? Is it because cdev open will hold a reference to
> device thus put_device() will not hit zero in vfio_group_put()?

Yes, that is right. The extra reference is hidden deep inside the FS
code and is actually a reference on the cdev struct, which in turn
holds a kobject parent reference on the struct device. It is
complicated under the covers, but from an API perspective if a struct
file exists then so does the vfio_group.

Jason


* RE: [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create()
  2021-10-12 12:05     ` Jason Gunthorpe
@ 2021-10-13  1:07       ` Tian, Kevin
  0 siblings, 0 replies; 29+ messages in thread
From: Tian, Kevin @ 2021-10-13  1:07 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, Cornelia Huck, kvm, Christoph Hellwig, Liu, Yi L

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Tuesday, October 12, 2021 8:05 PM
> 
> On Tue, Oct 12, 2021 at 08:33:49AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Saturday, October 2, 2021 7:22 AM
> > >
> > [...]
> >
> > > +static void vfio_group_release(struct device *dev)
> > >  {
> > > -	struct vfio_group *group, *existing_group;
> > > -	struct device *dev;
> > > -	int ret, minor;
> > > +	struct vfio_group *group = container_of(dev, struct vfio_group, dev);
> > > +	struct vfio_unbound_dev *unbound, *tmp;
> > > +
> > > +	list_for_each_entry_safe(unbound, tmp,
> > > +				 &group->unbound_list, unbound_next) {
> > > +		list_del(&unbound->unbound_next);
> > > +		kfree(unbound);
> > > +	}
> >
> > move to vfio_group_put()? this is not paired with vfio_group_alloc()...
> 
> Lists are tricky for pairing analysis, the vfio_group_alloc() creates
> an empty list and release restores the list to empty.

Items are added to this list after vfio_create_group() (at the start of
vfio_unregister_group_dev()), so I feel it makes more sense to restore
the list to empty before put_device() in vfio_group_put(). But not a
strong opinion...

> 
> > >  static int vfio_group_fops_open(struct inode *inode, struct file *filep)
> > >  {
> > > -	struct vfio_group *group;
> > > +	struct vfio_group *group =
> > > +		container_of(inode->i_cdev, struct vfio_group, cdev);
> > >  	int opened;
> >
> > A curiosity question. According to cdev_device_del() any cdev already
> > open will remain with their fops callable.
> 
> Correct
> 
> > What prevents vfio_group from being released after cdev_device_del()
> > has been called? Is it because cdev open will hold a reference to
> > device thus put_device() will not hit zero in vfio_group_put()?
> 
> Yes, that is right. The extra reference is hidden deep inside the FS
> code and is actually a reference on the cdev struct, which in turn
> holds a kobject parent reference on the struct device. It is
> complicated under the covers, but from an API perspective if a struct
> file exists then so does the vfio_group.
> 

Make sense. Thanks for explanation.


* Re: [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create()
  2021-10-12  8:57   ` Liu, Yi L
@ 2021-10-13 12:49     ` Jason Gunthorpe
  2021-10-13 14:15       ` Liu, Yi L
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Gunthorpe @ 2021-10-13 12:49 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Alex Williamson, Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin

On Tue, Oct 12, 2021 at 08:57:22AM +0000, Liu, Yi L wrote:
> > +	ret = __vfio_group_get_from_iommu(iommu_group);
> > +	if (ret)
> > +		goto err_unlock;
> > 
> > -	minor = vfio_alloc_group_minor(group);
> > -	if (minor < 0) {
> > -		vfio_group_unlock_and_free(group);
> > -		return ERR_PTR(minor);
> > +	err = cdev_device_add(&group->cdev, &group->dev);
> > +	if (err) {
> > +		ret = ERR_PTR(err);
> > +		goto err_unlock;
> 
> should this err branch put the vfio_group reference acquired
> in above __vfio_group_get_from_iommu(iommu_group);?

No, if ret is non-NULL then the reference is returned to the caller via
the err_unlock path.

The path that reaches cdev_device_add() has ret == NULL, so there is no
reference to put back.

Jason


* RE: [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create()
  2021-10-13 12:49     ` Jason Gunthorpe
@ 2021-10-13 14:15       ` Liu, Yi L
  0 siblings, 0 replies; 29+ messages in thread
From: Liu, Yi L @ 2021-10-13 14:15 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, Cornelia Huck, kvm, Christoph Hellwig, Tian, Kevin

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, October 13, 2021 8:49 PM
> 
> On Tue, Oct 12, 2021 at 08:57:22AM +0000, Liu, Yi L wrote:
> > > +	ret = __vfio_group_get_from_iommu(iommu_group);
> > > +	if (ret)
> > > +		goto err_unlock;
> > >
> > > -	minor = vfio_alloc_group_minor(group);
> > > -	if (minor < 0) {
> > > -		vfio_group_unlock_and_free(group);
> > > -		return ERR_PTR(minor);
> > > +	err = cdev_device_add(&group->cdev, &group->dev);
> > > +	if (err) {
> > > +		ret = ERR_PTR(err);
> > > +		goto err_unlock;
> >
> > should this err branch put the vfio_group reference acquired
> > in above __vfio_group_get_from_iommu(iommu_group);?
> 
> No, if ret was !NULL then the reference is returned to the caller via
> the err_unlock
> 
> This path to cdev_device_add has ret = NULL so there is no reference
> to put back.

Oh, yes. We only go ahead to this place when ret is NULL.

Thanks,
Yi Liu


end of thread, other threads:[~2021-10-13 14:16 UTC | newest]

2021-10-01 23:22 [PATCH 0/5] Update vfio_group to use the modern cdev lifecycle Jason Gunthorpe
2021-10-01 23:22 ` [PATCH 1/5] vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier() Jason Gunthorpe
2021-10-04 22:25   ` Alex Williamson
2021-10-04 22:34     ` Jason Gunthorpe
2021-10-05  4:01       ` Alex Williamson
2021-10-05 16:17         ` Jason Gunthorpe
2021-10-12  6:32   ` Tian, Kevin
2021-10-12  8:51   ` Liu, Yi L
2021-10-01 23:22 ` [PATCH 2/5] vfio: Do not open code the group list search in vfio_create_group() Jason Gunthorpe
2021-10-02  3:19   ` kernel test robot
2021-10-12  6:37   ` Tian, Kevin
2021-10-12  8:52   ` Liu, Yi L
2021-10-01 23:22 ` [PATCH 3/5] vfio: Don't leak a group reference if the group already exists Jason Gunthorpe
2021-10-04 22:25   ` Alex Williamson
2021-10-04 22:36     ` Jason Gunthorpe
2021-10-05  4:01       ` Alex Williamson
2021-10-05 14:45         ` Jason Gunthorpe
2021-10-01 23:22 ` [PATCH 4/5] vfio: Use a refcount_t instead of a kref in the vfio_group Jason Gunthorpe
2021-10-04 22:25   ` Alex Williamson
2021-10-04 22:39     ` Jason Gunthorpe
2021-10-12  7:08   ` Tian, Kevin
2021-10-12  9:04   ` Liu, Yi L
2021-10-01 23:22 ` [PATCH 5/5] vfio: Use cdev_device_add() instead of device_create() Jason Gunthorpe
2021-10-12  8:33   ` Tian, Kevin
2021-10-12 12:05     ` Jason Gunthorpe
2021-10-13  1:07       ` Tian, Kevin
2021-10-12  8:57   ` Liu, Yi L
2021-10-13 12:49     ` Jason Gunthorpe
2021-10-13 14:15       ` Liu, Yi L
